Greetings, everyone!
Context:
Basically, all I want is to copy the prepared tables from Keboola storage into 1:1 tables in the application database.
The easiest way to solve this is obviously to use PostgreSQL writer and just write to the application database, but that basically means that local development of the application requires a remote database or to dump&restore the data from production. And if you want to change/add something, there is no easy way to test it without going through the production database first, which I dislike very much.
Right now, we think the best solution is to use Snowflake writer and have it provision a database with all the data, and then have a process in the application that can “pull” the data.
This solves both production and local development in the same way and we won’t have to maintain any hacks or alternative data-sync methods to avoid the problem that Keboola simply cannot write to a developer’s localhost database. And even if it could (throught VPN for example), it would mean maintaining a writer for production and then a copy of that writer for each developer and I don’t like that very much.
The disadvantage is that we won’t be using any of the Snowflake advantages and it will be just a dumb storage we can connect to using a JDBC driver. We’re also considering having our own PostgreSQL data-warehouse, which Keboola can write into and then the applications can pull the data from there into its database, in case the Keboola-provisioned Snowflake would become too expensive.
We have also considered using API of the Keboola storage directly, but in order to get the data out, we have to first run a job that writes it somewhere and then download a bunch of CSV files, which defeats the purpose of having it easily queriable by SQL.
Question:
What other options do we have? How else can I get the data from Keboola to application databases?
Thank you!
Hi @Filip Procházka This looks like you would prefer some sort of pull mechanism. There are multiple ways I think:
Unless I am mistaken, you need to get data into DB on localhost, then you will still need some way of connecting it to external store - so either pull as mentioned above, or VPN and push it through it.
just out of the curiosity, is the main purpose the development process for multiple devs? Any chance our workspaces would suffice? you can have a multiple of them (private/shared), load those data in/out, etc.
If you need 1:1 copy of the data in production, then you will still need to get to the production machine, as mentioned push (connectivity) or pull (application that handles the pull process)
Hi Martin,
we need a copy of the data inside the application database. We don't want to query the data warehouse from the application directly and there is a need for some application-specific processing of the data, which I don't want to do in the data warehouse (or in ETL before it).
Generally, we prefer querying the data using SQL to copying CSV files from S3 - it's simpler to get the data into the database tables.
Right now, we're trying the "write into a data-warehouse database (snowflake), pull from application" strategy and so far it looks good. I just wanted to know if there are more options to choose from - if we're missing something.
About the Keboola storage API - my colleague was exploring it and he was not able to get the data out of the storage directly, he was required to run some kind of async export job that would write the data somewhere as CSV files, he was not able to just query the API and get raw data right away in the response. Did we miss something here?
Thank you
There is deprecated but still functional Sync API for data loads, though that one is not suitable for fast queries/large volumes of data, hence the new ASYNC version of the API.
Additionally, single tenants/customers with own SFLK have the option to query data directly, though I have to remind that this is the case when you would try to create transactional DB out of the analytical (OLTP vs. MPP DBs).
Perfectly understandable.
We will go with the Snowflake writers (with databases provisioned by Keboola) for now and we'll see how that goes.
Thank you for the info.