Development environment

This page briefly explains where the code is written and executed when working on a Dataiku DSS instance.

Tools for editing code

This section describes the options available for editing code in Dataiku DSS, depending on your use case.

Notebooks

If you want to explore your data interactively and experiment with small pieces of code, notebooks are the way to go. They let you execute your code in consecutive blocks called cells and visualize the output of each cell.

Dataiku DSS can spawn complete code notebook environments server-side:

  • SQL notebooks to run interactive queries on your SQL databases

  • Code notebooks to execute Python or R code in a simple-yet-effective interface based on Jupyter notebooks

Both kinds of notebooks are natively embedded in the Dataiku DSS web interface, which makes them easy to navigate and to share with other users on the same instance. Additionally, Python and R notebook sources (.ipynb files) can be synchronized with remote Git repositories.

IDEs

If you already use an IDE such as Visual Studio Code or PyCharm on your client machine, you can install the relevant extension or plugin to connect it to your Dataiku DSS instance and edit source code directly from the IDE.

If you prefer to edit your source code remotely, Dataiku DSS can embed a Visual Studio Code editor directly in its interface. This option relies on the platform’s “Code Studios” feature and requires no setup on your client machine, since it is fully managed by the platform.

See also

VSCode/IntelliJ extension for Dataiku DSS

PyCharm plugin for Dataiku DSS

Code Studios

Managing dependencies

Writing code often involves third-party packages that must be installed separately. In Python, for example, you would typically use virtual environments to install your dependencies and isolate them from the rest of the system.
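As a point of comparison, the standard-library `venv` module can create such an isolated environment programmatically. This is a generic Python sketch, not a Dataiku DSS API. The helper names `create_isolated_env` and `env_python` and the directory name `demo-env` are illustrative.

```python
import sys
import tempfile
import venv
from pathlib import Path


def create_isolated_env(parent: Path, name: str = "demo-env") -> Path:
    """Create a bare virtual environment (without pip) under `parent`."""
    env_dir = parent / name
    # with_pip=False skips bootstrapping pip, which keeps creation fast and offline.
    venv.create(env_dir, with_pip=False)
    return env_dir


def env_python(env_dir: Path) -> Path:
    """Return the path of the environment's interpreter (the layout differs on Windows)."""
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = create_isolated_env(Path(tmp))
        print("Environment created at:", env_dir)
        print("Interpreter:", env_python(env_dir))
        # Dependencies would then be installed with that interpreter, e.g.:
        #   <interpreter> -m pip install <package>
```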

In Dataiku DSS, the equivalent of a virtual environment is called a “code environment”. It lets you choose which Python version and which custom packages your code runs with. Once a code environment is set up, its dependencies can be imported from any piece of code that Dataiku DSS runs with that environment.
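One quick way to check which environment a notebook or recipe is actually using is to inspect the interpreter from Python itself. The sketch below uses only the standard library. `describe_environment` is an illustrative helper, and the package names passed to it are placeholders for whatever your code environment is expected to provide.

```python
import sys
from importlib import metadata


def describe_environment(packages):
    """Report the running interpreter and the installed version of each requested package."""
    report = {
        "python_version": "{}.{}.{}".format(*sys.version_info[:3]),
        # The interpreter path usually reveals which environment is active.
        "executable": sys.executable,
        "packages": {},
    }
    for name in packages:
        try:
            report["packages"][name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this environment.
            report["packages"][name] = None
    return report


if __name__ == "__main__":
    info = describe_environment(["pandas", "scikit-learn"])
    print("Python", info["python_version"], "at", info["executable"])
    for pkg, version in info["packages"].items():
        print(f"  {pkg}: {version or 'not installed'}")
```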

Building a shared code base

Once a project’s code passes a certain size or complexity threshold, it is important to modularize it into classes and functions. Doing so also lets other users import these items directly instead of re-implementing them. In Dataiku DSS, this shared code repository takes the form of “project libraries”.

Project libraries also let you decouple your code’s logic (which holds the business or domain expertise) from the Dataiku DSS-specific code that handles workflow orchestration.
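To illustrate that separation, the domain logic can live in a plain project library module with no Dataiku-specific imports, while the recipe only handles input and output. The module name `churn_rules`, the function names, and the thresholds below are all hypothetical. In a real recipe, the records would typically be read from and written to Dataiku datasets through the `dataiku` package. That package is deliberately left out here so the business logic can be tested on its own.

```python
# --- Project library module (hypothetical file: churn_rules.py) ---------------
# Pure business logic: no dependency on Dataiku, so it can be unit-tested anywhere.

def churn_risk(days_since_last_order: int, n_orders: int) -> str:
    """Classify a customer's churn risk using illustrative thresholds."""
    if days_since_last_order > 180:
        return "high"
    if days_since_last_order > 90 and n_orders < 3:
        return "medium"
    return "low"


# --- Recipe code (orchestration only) -----------------------------------------
# A recipe would do `from churn_rules import churn_risk` and read/write datasets;
# here, plain lists of dicts stand in for those datasets.

def score_customers(rows):
    """Return copies of the input records with an added `churn_risk` field."""
    return [
        {**row, "churn_risk": churn_risk(row["days_since_last_order"], row["n_orders"])}
        for row in rows
    ]


if __name__ == "__main__":
    customers = [
        {"id": 1, "days_since_last_order": 200, "n_orders": 10},
        {"id": 2, "days_since_last_order": 100, "n_orders": 1},
        {"id": 3, "days_since_last_order": 10, "n_orders": 5},
    ]
    for row in score_customers(customers):
        print(row["id"], row["churn_risk"])
```

Because `churn_risk` knows nothing about datasets or recipes, other projects can import and reuse it unchanged.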

Bringing an external code base

As a new Dataiku DSS user, you have probably already worked on an existing code base that lives independently of the instance. You can make the items of this code base directly importable in Dataiku DSS through a project library feature called “Git references”. Provided that the external code base is hosted on a remote Git repository, this feature lets you pull a specific branch of that repository into Dataiku DSS, where it is materialized as a project library.

By doing so, you can have your Dataiku DSS workflows operate hand-in-hand with any external code repository.
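In plain Python terms, a directory becomes importable once it is on the module search path. The standard-library sketch below mimics this with a temporary directory standing in for a checked-out repository. It is only an analogy: it does not claim to show how Dataiku DSS implements Git references. The package `external_lib`, its `normalize` function, and the helpers `make_fake_checkout` and `import_from` are hypothetical.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_fake_checkout(root: Path) -> Path:
    """Write a tiny package that stands in for code pulled from a Git repository."""
    pkg = root / "external_lib"
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    (pkg / "text_utils.py").write_text(
        "def normalize(s):\n"
        "    return ' '.join(s.split()).lower()\n"
    )
    return root


def import_from(root: Path, module_name: str):
    """Put `root` on the module search path, then import `module_name` from it."""
    if str(root) not in sys.path:
        sys.path.insert(0, str(root))
    importlib.invalidate_caches()  # ensure freshly written files are picked up
    return importlib.import_module(module_name)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        checkout = make_fake_checkout(Path(tmp))
        text_utils = import_from(checkout, "external_lib.text_utils")
        print(text_utils.normalize("  Hello   DSS  "))
```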