Hi guys, some of you might notice spammy postings once in a while - we are on it with our platform provider. Please be patient; you can help us by reporting those posts!
Martin Fiser, Head of Professional Services @ Keboola
There is a stellar Keboola Academy course for Generic Extractor - I would encourage everyone to start there:
However, I think this guide might be pretty useful for kickstarting a new configuration:
- The mentioned Keboola Academy course for Generic Extractor: https://academy.keboola.com/courses/generic-extractor
- For Users: https://help.keboola.com/extractors/other/generic/
- For Developers: https://developers.keboola.com/extend/generic-extractor/
- Refer to Fisa’s repository for various samples of Generic Extractor configuration
- GitHub repo of Generic Extractor with various examples
Steps to follow
- Authentication type
- Base URL
- Basic endpoint(s)
- Is Mapping needed? (in most cases - yes, especially for incremental loads; a minimal configuration sketch covering these steps follows below)
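To make these steps concrete, here is a minimal sketch of a Generic Extractor configuration covering a base URL, Basic HTTP authentication, and a single endpoint. The URL, credentials, and endpoint name are placeholders, not a real API:

```json
{
  "api": {
    "baseUrl": "https://example.com/api/",
    "authentication": {
      "type": "basic"
    }
  },
  "config": {
    "username": "yourUser",
    "#password": "yourPassword",
    "jobs": [
      {
        "endpoint": "orders",
        "dataType": "orders"
      }
    ]
  }
}
```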
(Important) If you run into serious issues and cannot get a solution from your peers, stop sinking more time into it and switch to a custom component (Python, PHP, etc.)
- REST HTTP API Introduction
- JSON Introduction
- Basic Configuration (same as ‘tutorial’ above)
- Pagination Tutorial
- Jobs Tutorial (special emphasis on 'Child Jobs'; see the config sketch after this list)
- Mapping Tutorial (copy and save the example under ‘Review’ - very useful)
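To illustrate the pagination and child-jobs concepts together, here is a hedged sketch: offset pagination plus a child job that calls a per-user endpoint, substituting the parent row's `id` field into the `{user_id}` placeholder. Endpoint and field names are made up; check the tutorials for the options your API actually needs:

```json
{
  "api": {
    "baseUrl": "https://example.com/api/",
    "pagination": {
      "method": "offset",
      "limit": 100
    }
  },
  "config": {
    "jobs": [
      {
        "endpoint": "users",
        "dataType": "users",
        "children": [
          {
            "endpoint": "users/{user_id}/orders",
            "dataType": "orders",
            "placeholders": {
              "user_id": "id"
            }
          }
        ]
      }
    ]
  }
}
```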
Note: It is important to make use of the Column Mapping feature to avoid duplicate rows during extraction.
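For example, this sketch marks the `id` column as a primary key under `config.mappings`, so repeated extractions update existing rows instead of appending duplicates (column names are illustrative):

```json
{
  "config": {
    "jobs": [
      {
        "endpoint": "users",
        "dataType": "users"
      }
    ],
    "mappings": {
      "users": {
        "id": {
          "type": "column",
          "mapping": {
            "destination": "id",
            "primaryKey": true
          }
        },
        "name": {
          "type": "column",
          "mapping": {
            "destination": "name"
          }
        }
      }
    }
  }
}
```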
Review various Authentication types
Note: Don't consume yourself with advanced authentication types, e.g., OAuth, in the beginning. Start with 'URL Query', 'Basic HTTP', or 'Login', and know where to come back in case another authentication type is required.
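For instance, a URL-query authentication section could look roughly like this, with the secret stored in an encrypted `#apiKey` attribute; the `api_key` parameter name is a placeholder, so check the authentication docs for your API's exact expectations:

```json
{
  "api": {
    "baseUrl": "https://example.com/api/",
    "authentication": {
      "type": "query",
      "query": {
        "api_key": { "attr": "#apiKey" }
      }
    }
  },
  "config": {
    "#apiKey": "yourSecretKey"
  }
}
```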
Adapted from an internal article by @Michal Hruska @Keboola
There are several ways to share data:
- Data Catalog (business-level UX)
- Keboola Storage Writer - PUSH (analyst/engineering-level UX)
- Keboola Storage Extractor - PULL (analyst/engineering-level UX)
- Direct Storage Access - TBD (analyst/engineering-level UX)
- Keboola Storage API (developer-level UX)
The Data Catalog gives an overview of data shared to and from the project, and it allows you to share data in an efficient, controlled, and auditable way.
There are several options for how you can share data:
- Project Members – To the entire organization. Any user of any project in the organization can link the data bucket.
- Organization Members – To administrators of the organization. Any user of any project in the organization can link the data bucket provided that they are also an administrator of the organization.
- Selected Projects – To specified projects. Any user of the listed projects in the organization can link the data bucket.
- Selected Users – To specified users. Any listed users in the organization can link the data bucket.
Shared catalog details
Creating a new catalog
Subscribing to an existing shared catalog
Keboola Storage Writer
This writer loads one or more tables from your current project into a different Keboola Connection project. The component can be used in situations where the Data Catalog cannot, e.g., moving data between two different organizations or regions.
Keboola Storage Extractor
The extractor uses the source project's Storage API token to set up a data extraction tunnel between the source project and the destination (current) project. The API token can be limited to selected buckets, tables, or a single table if needed.
Direct Storage Access
TBD - as per platform capability feature request
Keboola Storage API
A direct connection to Keboola Storage through the Storage API, as described here:
Like the Storage Writer and Storage Extractor above, it requires a Keboola Storage API token, which can be limited as mentioned before. The Storage API supports quick synchronous and more robust asynchronous data load requests, as well as data preview requests, etc. More in the official documentation.
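As a quick illustration, here is a minimal Python sketch of calling the Storage API directly with the `requests` library, listing tables and fetching a synchronous data preview. The token, stack hostname, and table ID are placeholders:

```python
import requests

# Placeholders: use your own token and your stack's hostname
TOKEN = "your-storage-api-token"
BASE_URL = "https://connection.keboola.com"

headers = {"X-StorageApi-Token": TOKEN}

# List the tables this token can see
resp = requests.get(f"{BASE_URL}/v2/storage/tables", headers=headers)
resp.raise_for_status()
for table in resp.json():
    print(table["id"])

# Quick synchronous preview of one table (returns CSV)
preview = requests.get(
    f"{BASE_URL}/v2/storage/tables/in.c-my-bucket.my-table/data-preview",
    headers=headers,
)
preview.raise_for_status()
print(preview.text)
```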
Since we are bringing feature parity between different stacks (mostly the existing stacks and the pay-as-you-go one), I think it might be beneficial to discuss the new features and publish a short guide on how to do the same things (e.g., testing and developing SQL queries) in workspaces. Let's have a look at SQL workspaces now:
A workspace serves several purposes and can be used as
- an interactive development environment (IDE) to create transformations.
- an analytical workspace where you can interactively perform experiments and modelling with live production data.
- an ephemeral workspace created on each run of a transformation to provide the staging area in which the transformation operates. Ephemeral transformation workspaces are not visible in the transformation UI, hence we won’t mention them further.
When a workspace is created, it enters the Active state and can be used.
- Database (Snowflake, Redshift, and Synapse) workspaces are billed by the runtime of queries executed in them. As such, we leave them in active state until you delete them.
| Sandbox (legacy) | Workspace |
| --- | --- |
| 1-click setup | 1-click setup |
| Single sandbox per user × project | Multiple private or shared workspaces |
| No table unload | UI-based load/unload (user can also add new tables afterwards) |
| Scaling up via support ticket | UI-based scaling up (*some features may come later) |
| Has pre-set duration | Can be terminated, resumed, and deleted |
How to create a workspace to develop and test SQL queries: you can create a workspace by clicking the button on the right:
Then you can specify whether the workspace should be shared:
Once it's up, you can open the workspace detail:
You can see that the input mapping has been set correctly (the workspace was created from a transformation):
Just click Credentials (the link on the right) to get the same credentials you are used to from the "old" sandbox:
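Once connected with those credentials, you can iterate on your transformation query right in the workspace before saving it back to the transformation; for example (table and column names are made up):

```sql
-- Illustrative only: test an aggregation against an input-mapped table
SELECT
    "customer_id",
    COUNT(*)      AS order_count,
    SUM("amount") AS total_amount
FROM "orders"
GROUP BY "customer_id";
```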
Let us know about your experience working with Sandboxes and Workspaces, which features you miss, and how we can improve them!
Hi all, as you may know, currently our GenEx does not support using user-defined fields in child jobs:
Since this question appears once in a while in our support system, we would love your feedback on this missing functionality. Is there someone who would appreciate such a feature? Could you describe your use case? Have you already submitted a wishlist (idea) item? Thanks!
CC: @František Řehoř
The trigger behaves in two different ways:
- When created, the last run is set to `now() - cooldown period`, so that it immediately starts recording events.
- When updated, the last run is not changed; only the cooldown period is changed. I'll describe it below:
- 9:50 trigger created with cooldown period: 2h (last run set to 09:50 minus cooldown 02:00 = 07:50, so 2h cooldown ends at 09:50)
- 10:00 table is updated, cooldown not in effect
- 10:00 trigger fired → "last run" = 10:00 (cooldown would end at 12:00)
- 10:01 trigger updated, set cooldown = 5min (cooldown is still in effect, but only until 10:05 - last run 10:00 + cooldown 00:05)
To sum up: when created, the cooldown is not in effect for the first run. When updated, it is immediately recalculated, but it remains in effect.
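To make the two behaviors concrete, here is a minimal Python sketch of the logic described above; the class and method names are illustrative, not Keboola's actual implementation:

```python
from datetime import datetime, timedelta
from typing import Optional

class Trigger:
    """Illustrative model of the trigger cooldown behavior."""

    def __init__(self, cooldown: timedelta, created_at: Optional[datetime] = None):
        now = created_at or datetime.now()
        self.cooldown = cooldown
        # On creation, last_run is backdated by the cooldown period,
        # so the first qualifying event can fire immediately.
        self.last_run = now - cooldown

    def update_cooldown(self, cooldown: timedelta) -> None:
        # On update, only the cooldown changes; last_run stays as-is,
        # so the remaining wait is recalculated from the old last_run.
        self.cooldown = cooldown

    def maybe_fire(self, now: datetime) -> bool:
        # Fire only once the cooldown window since last_run has passed.
        if now >= self.last_run + self.cooldown:
            self.last_run = now
            return True
        return False

# Replaying the example timeline above (the date is arbitrary):
t = Trigger(timedelta(hours=2), created_at=datetime(2021, 1, 1, 9, 50))
assert t.maybe_fire(datetime(2021, 1, 1, 10, 0))      # 10:00 fires; last run = 10:00
t.update_cooldown(timedelta(minutes=5))               # 10:01 update; cooldown ends 10:05
assert not t.maybe_fire(datetime(2021, 1, 1, 10, 4))  # still cooling down
assert t.maybe_fire(datetime(2021, 1, 1, 10, 5))      # fires again
```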