Hi guys, some of you might notice spammy postings once in a while - we are on it with our platform provider. Please be patient; you can help us by reporting those posts!
Martin Fiser, Head of Professional Services @ Keboola
There is a stellar Keboola Academy course for Generic Extractor - I would encourage everyone to start there:
However, I think this guide might be pretty useful for kickstarting a new configuration:
- The mentioned Keboola Academy course for Generic Extractor: https://academy.keboola.com/courses/generic-extractor
- For Users: https://help.keboola.com/extractors/other/generic/
- For Developers: https://developers.keboola.com/extend/generic-extractor/
- Refer to Fisa’s repository for various samples of Generic Extractor configuration
- GitHub repo of Generic Extractor with various examples
Steps to follow
- Authentication type
- Base URL
- Basic endpoint(s)
- Is Mapping needed? (in most cases - yes, especially for incremental loads; a minimal configuration sketch covering these steps follows below)
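To make these steps concrete, here is a minimal sketch of a Generic Extractor configuration covering a base URL, Basic HTTP authentication, and a single endpoint. The URL, credentials, and endpoint name are placeholders, not a real API:

```json
{
  "api": {
    "baseUrl": "https://example.com/api/",
    "authentication": {
      "type": "basic"
    }
  },
  "config": {
    "username": "yourUser",
    "#password": "yourPassword",
    "jobs": [
      {
        "endpoint": "orders",
        "dataType": "orders"
      }
    ]
  }
}
```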
(Important) If you run into serious issues and cannot get a solution from your peers, stop sinking more time into it and switch to a custom component (Python, PHP, etc.)
- REST HTTP API Introduction
- JSON Introduction
- Basic Configuration (same as ‘tutorial’ above)
- Pagination Tutorial
- Jobs Tutorial (special emphasis on 'Child Jobs'; see the config sketch after this list)
- Mapping Tutorial (copy and save the example under ‘Review’ - very useful)
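To illustrate the pagination and child-jobs concepts together, here is a hedged sketch: offset pagination plus a child job that calls a per-user endpoint, substituting the parent row's `id` field into the `{user_id}` placeholder. Endpoint and field names are made up; check the tutorials for the options your API actually needs:

```json
{
  "api": {
    "baseUrl": "https://example.com/api/",
    "pagination": {
      "method": "offset",
      "limit": 100
    }
  },
  "config": {
    "jobs": [
      {
        "endpoint": "users",
        "dataType": "users",
        "children": [
          {
            "endpoint": "users/{user_id}/orders",
            "dataType": "orders",
            "placeholders": {
              "user_id": "id"
            }
          }
        ]
      }
    ]
  }
}
```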
Note: It is important to make use of the Column Mapping feature to avoid duplicate rows during extraction.
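For example, this sketch marks the `id` column as a primary key under `config.mappings`, so repeated extractions update existing rows instead of appending duplicates (column names are illustrative):

```json
{
  "config": {
    "jobs": [
      {
        "endpoint": "users",
        "dataType": "users"
      }
    ],
    "mappings": {
      "users": {
        "id": {
          "type": "column",
          "mapping": {
            "destination": "id",
            "primaryKey": true
          }
        },
        "name": {
          "type": "column",
          "mapping": {
            "destination": "name"
          }
        }
      }
    }
  }
}
```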
Review various Authentication types
Note: Don't consume yourself with advanced authentication types, e.g., OAuth, in the beginning. Start with 'URL Query', 'Basic HTTP', or 'Login', and know where to come back in case another authentication type is required.
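For instance, a URL-query authentication section could look roughly like this, with the secret stored in an encrypted `#apiKey` attribute; the `api_key` parameter name is a placeholder, so check the authentication docs for your API's exact expectations:

```json
{
  "api": {
    "baseUrl": "https://example.com/api/",
    "authentication": {
      "type": "query",
      "query": {
        "api_key": { "attr": "#apiKey" }
      }
    }
  },
  "config": {
    "#apiKey": "yourSecretKey"
  }
}
```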
Adapted from an internal article by @Michal Hruska @Keboola
There are several ways to share data:
- Data Catalog (business-level UX)
- Keboola Storage Writer - PUSH (analyst/engineering-level UX)
- Keboola Storage Extractor - PULL (analyst/engineering-level UX)
- Direct Storage Access - TBD (analyst/engineering-level UX)
- Keboola Storage API (developer-level UX)
The Data Catalog gives an overview of data shared to and from the project, and it allows you to share data in an efficient, controlled, and auditable way.
There are several options for how you can share data:
- Project Members – To the entire organization. Any user of any project in the organization can link the data bucket.
- Organization Members – To administrators of the organization. Any user of any project in the organization can link the data bucket provided that they are also an administrator of the organization.
- Selected Projects – To specified projects. Any user of the listed projects in the organization can link the data bucket.
- Selected Users – To specified users. Any listed users in the organization can link the data bucket.
Shared catalog details
Creating a new catalog
Subscribing to an existing shared catalog
Keboola Storage Writer
This writer loads one or more tables from your current project into a different Keboola Connection project. The component can be used in situations where the Data Catalog cannot, e.g., moving data between two different organizations or regions.
Keboola Storage Extractor
The extractor uses the source project's Storage API token to set up a data extraction tunnel between the source project and the destination (current) project. The API token can be limited to selected buckets, tables, or a single table if needed.
Direct Storage Access
TBD - as per platform capability feature request
Keboola Storage API
A direct connection to Keboola Storage through the Storage API, as described here:
Like the Storage Writer and Storage Extractor above, it requires a Keboola Storage API token, which can be limited as mentioned before. The Storage API supports quick synchronous and more robust asynchronous data load requests, as well as data preview requests, etc. More in the official documentation.
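As a quick illustration, here is a minimal Python sketch of calling the Storage API directly with the `requests` library, listing tables and fetching a synchronous data preview. The token, stack hostname, and table ID are placeholders:

```python
import requests

# Placeholders: use your own token and your stack's hostname
TOKEN = "your-storage-api-token"
BASE_URL = "https://connection.keboola.com"

headers = {"X-StorageApi-Token": TOKEN}

# List the tables this token can see
resp = requests.get(f"{BASE_URL}/v2/storage/tables", headers=headers)
resp.raise_for_status()
for table in resp.json():
    print(table["id"])

# Quick synchronous preview of one table (returns CSV)
preview = requests.get(
    f"{BASE_URL}/v2/storage/tables/in.c-my-bucket.my-table/data-preview",
    headers=headers,
)
preview.raise_for_status()
print(preview.text)
```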
Since we are bringing feature parity between different stacks (mostly the existing stacks and the pay-as-you-go one), I think it might be beneficial to discuss the new features and publish a short guide on how to do the same things (e.g., testing and developing SQL queries) in workspaces. Let's have a look at SQL workspaces now:
A workspace serves several purposes and can be used as
- an interactive development environment (IDE) to create transformations.
- an analytical workspace where you can interactively perform experiments and modelling with live production data.
- an ephemeral workspace created on each run of a transformation to provide the staging area in which the transformation operates. Ephemeral transformation workspaces are not visible in the transformation UI, hence we won’t mention them further.
When a workspace is created, it enters the Active state and can be used.
- Database (Snowflake, Redshift, and Synapse) workspaces are billed by the runtime of queries executed in them. As such, we leave them in active state until you delete them.
| Sandbox (legacy) | Workspace |
| --- | --- |
| 1-click setup | 1-click setup |
| Single sandbox per user × project | Multiple private or shared workspaces |
| No table unload | UI-based load/unload (user can also add new tables afterwards) |
| Scaling up via support ticket | UI-based scaling up (*some features may come later) |
| Has pre-set duration | Can be terminated, resumed, and deleted |
How to create a workspace to develop and test SQL queries: you can create a workspace by clicking the button on the right:
Then you can specify whether the workspace should be shared:
Once it's up, you can open the workspace detail:
You can see that the input mapping has been set correctly (the workspace was created from a transformation):
Just click Credentials (the link on the right) to get the same credentials you are used to from the "old" sandbox:
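Once connected with those credentials, you can iterate on your transformation query right in the workspace before saving it back to the transformation; for example (table and column names are made up):

```sql
-- Illustrative only: test an aggregation against an input-mapped table
SELECT
    "customer_id",
    COUNT(*)      AS order_count,
    SUM("amount") AS total_amount
FROM "orders"
GROUP BY "customer_id";
```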
Let us know about your experience working with Sandboxes and Workspaces, which features you miss, and how we can improve them!
Hi all, as you may know, currently our GenEx does not support using user-defined fields in child jobs:
Since this question appears once in a while in our support system, we would love your feedback on this missing functionality. Is there someone who would appreciate such a feature? Could you describe your use case? Have you already submitted a wishlist (idea) item? Thanks!
CC: @František Řehoř
The trigger behaves in two different ways:
- When created, the last run is set to `now() - cooldown period`, so that it immediately starts recording events.
- When updated, the last run is not changed; only the cooldown period is changed. I'll describe it below:
- 9:50 trigger created with cooldown period: 2h (last run set to 09:50 minus cooldown 02:00 = 07:50, so 2h cooldown ends at 09:50)
- 10:00 table is updated, cooldown not in effect
- 10:00 trigger fired → "last run" = 10:00 (cooldown would end at 12:00)
- 10:01 trigger updated, set cooldown = 5min (cooldown is still in effect, but only until 10:05 - last run 10:00 + cooldown 00:05)
To sum up: when created, the cooldown is not in effect for the first run. When updated, it is immediately recalculated, but it remains in effect.
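To make the two behaviors concrete, here is a minimal Python sketch of the logic described above; the class and method names are illustrative, not Keboola's actual implementation:

```python
from datetime import datetime, timedelta
from typing import Optional

class Trigger:
    """Illustrative model of the trigger cooldown behavior."""

    def __init__(self, cooldown: timedelta, created_at: Optional[datetime] = None):
        now = created_at or datetime.now()
        self.cooldown = cooldown
        # On creation, last_run is backdated by the cooldown period,
        # so the first qualifying event can fire immediately.
        self.last_run = now - cooldown

    def update_cooldown(self, cooldown: timedelta) -> None:
        # On update, only the cooldown changes; last_run stays as-is,
        # so the remaining wait is recalculated from the old last_run.
        self.cooldown = cooldown

    def maybe_fire(self, now: datetime) -> bool:
        # Fire only once the cooldown window since last_run has passed.
        if now >= self.last_run + self.cooldown:
            self.last_run = now
            return True
        return False

# Replaying the example timeline above (the date is arbitrary):
t = Trigger(timedelta(hours=2), created_at=datetime(2021, 1, 1, 9, 50))
assert t.maybe_fire(datetime(2021, 1, 1, 10, 0))      # 10:00 fires; last run = 10:00
t.update_cooldown(timedelta(minutes=5))               # 10:01 update; cooldown ends 10:05
assert not t.maybe_fire(datetime(2021, 1, 1, 10, 4))  # still cooling down
assert t.maybe_fire(datetime(2021, 1, 1, 10, 5))      # fires again
```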