One of the larger undertakings of our product team over the last few quarters has been the new transformations module and its interface. Our new users on the Azure stack actually don’t get to see the “traditional” transformations any more, and the new transformations are coming to AWS soon as well. I thought it a good idea to share the reasons for the change and to introduce the new concept ahead of the wide release.

Why did we rework the transformations in the first place? There was quite a bit of technical debt in the old system (it was monolithic, baked deep into KBC, and therefore an exception to our current container-based architecture). Its flaws led users to various workarounds that proved hard to maintain, and it caused confusion because the UI did not necessarily reflect the underlying functionality in an intuitive way. Our users also wanted more, such as using the sandboxes for playing with data rather than solely as development environments for the transformations. We also wanted to introduce new features (more on that later) that just weren’t feasible in that setup.

So how do the new transformations address it? First of all, we are introducing the concept of Workspaces (a replacement for the original “sandbox”). Workspaces have their own configurations (such as input mapping), one user can have multiple workspaces even on the same backend, and there is the option to share workspaces with colleagues for collaboration. A workspace can be easily converted into a transformation and vice versa.

For those who are familiar with our original setup, a new transformation is a rough equivalent of a phase in a transformation bucket. That means it is a block of functionality (input mapping, output mapping, code blocks) that gets executed together. Within a transformation, one can have several code blocks that are executed consecutively (replacing the old dependencies between transformations). That way, everything that is executed together is organized on one screen (and, of course, in the underlying configuration JSON).
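To make the structure concrete, here is a minimal Python sketch of the idea: one transformation bundles an input mapping, an ordered list of code blocks, and an output mapping, and runs them as a unit. This is only an illustration, not our actual configuration schema or runtime; the class names, the `execute` method, and the table identifiers are all made up for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical model for illustration only; NOT the real configuration schema.
Tables = Dict[str, List[dict]]


@dataclass
class CodeBlock:
    name: str
    run: Callable[[Tables], None]  # reads and writes tables in the working area


@dataclass
class Transformation:
    input_mapping: Dict[str, str]   # storage table -> name inside the transformation
    output_mapping: Dict[str, str]  # name inside the transformation -> storage table
    blocks: List[CodeBlock] = field(default_factory=list)

    def execute(self, storage: Tables) -> Tables:
        # 1. Copy the mapped input tables into an isolated working area.
        work: Tables = {
            alias: [dict(row) for row in storage[source]]
            for source, alias in self.input_mapping.items()
        }
        # 2. Run the code blocks consecutively, in the order they are listed.
        for block in self.blocks:
            block.run(work)
        # 3. Return only the tables named in the output mapping.
        return {dest: work[name] for name, dest in self.output_mapping.items()}


def keep_paid(tables: Tables) -> None:
    tables["paid"] = [r for r in tables["orders"] if r["status"] == "paid"]


def add_total(tables: Tables) -> None:
    tables["summary"] = [{"total": sum(r["amount"] for r in tables["paid"])}]


# Made-up table identifiers and data.
storage = {
    "in.c-main.orders": [
        {"status": "paid", "amount": 10},
        {"status": "open", "amount": 5},
        {"status": "paid", "amount": 7},
    ]
}

transformation = Transformation(
    input_mapping={"in.c-main.orders": "orders"},
    output_mapping={"summary": "out.c-main.summary"},
    blocks=[CodeBlock("filter", keep_paid), CodeBlock("aggregate", add_total)],
)
result = transformation.execute(storage)
```

The second block relies on the table produced by the first one, which is the kind of ordering that previously had to be expressed as dependencies between separate transformations.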

Besides the more powerful workspaces mentioned above, the new solution opened the door to some new features. Parameters allow for modified execution based on external input (without the need to switch code versions etc.). Shared code makes it easier to reuse functions once they have been developed. Code patterns take it one step further by reducing the amount of typing required when solving common problems. The setup also makes it easier to introduce new backends rapidly. Recently we introduced Oracle and Spark backends (in private beta, let us know if you want to try them out!) - we can run PySpark code on the Spark cluster with a ready-made workflow helping you to deploy and maintain models in your production environment.
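As a rough illustration of the parameters idea, the sketch below keeps one version of a code block and fills in a value supplied from outside at run time. It uses Python's standard `string.Template` purely as a stand-in; the placeholder syntax, the `render` helper, and the table and column names are assumptions made for this example and do not describe how the product implements parameters.

```python
from string import Template

# Hypothetical code block with one placeholder; the syntax is illustrative only.
SQL_TEMPLATE = Template(
    'CREATE TABLE "recent_orders" AS '
    'SELECT * FROM "orders" WHERE "created_at" >= \'$since\';'
)


def render(template: Template, parameters: dict) -> str:
    # substitute() raises KeyError when a required parameter is missing,
    # so a misconfigured run fails before any code is executed.
    return template.substitute(parameters)


# The same code serves different runs; only the parameter value changes.
january_run = render(SQL_TEMPLATE, {"since": "2021-01-01"})
june_run = render(SQL_TEMPLATE, {"since": "2021-06-01"})
```

The point is that the code itself stays untouched between runs, so there is no need to keep and switch between near-identical versions of it.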

This is just the beginning. We will be adding parallel processing of code within the same block, as well as smart grouping/chaining of transformations that are part of a bigger process and only make sense when put together. We are using user feedback to improve the UI (reducing the number of clicks required, tying the transformation closer to a workspace etc.). We will also provide tools to speed up migration from old to new transformations once the new ones are introduced to your existing projects. Continuity will be ensured, as we will support both for the foreseeable future, while hoping that the better features will motivate users to migrate.

You can see the new transformations in action in our PAYG projects. Documentation can be found here.

Any and all feedback is welcome to ensure the product works as well for you as possible!

For more details please contact your CSM or reach out to us at support@keboola.com.