Coding with Dataiku#
Coding capabilities in Dataiku#
Using Dataiku, you can build more flexible and powerful data, analytics, machine learning, and AI solutions, while still benefiting from the platform’s visual and collaborative features.
Enhanced Customization and Flexibility#
You can create custom code recipes using Python, R, and SQL. Custom recipes can offer alternatives to existing data transformations, algorithms, and execution contexts. You’re not limited to Dataiku’s standard offerings. You can build any logic required to meet your needs.
Advanced usage of Generative AI and Agents#
Dataiku serves as a secure, centralized gateway to a range of LLMs and agent functionality. You can test different models without changing your core logic, establish custom connections, create custom LLMs, or host LLMs on your own infrastructure. You can also build code-driven components, including tools, agents, guardrails, and retrieval-augmented workflows.
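As a rough illustration of testing different models while keeping the core logic unchanged, here is a minimal sketch based on the LLM completion calls in the Dataiku Python API (`get_llm`, `new_completion`, `with_message`, `execute`). The helper names `ask_llm` and `default_project` and the LLM identifier in the usage note are hypothetical. Check the exact calls against the API reference for your Dataiku version.

```python
def ask_llm(project, llm_id, prompt):
    """Send one user prompt to an LLM exposed by Dataiku and return its text.

    `project` is expected to be a Dataiku project handle. Only `llm_id`
    changes when you want to try another model.
    """
    llm = project.get_llm(llm_id)        # handle on the chosen LLM
    completion = llm.new_completion()    # start building a completion query
    completion.with_message(prompt)      # add the user message
    response = completion.execute()      # run the query through Dataiku
    if not response.success:
        raise RuntimeError("The LLM query did not succeed")
    return response.text


def default_project():
    """Return a handle on the current project (only works inside Dataiku)."""
    import dataiku  # imported lazily: the package is provided by Dataiku
    return dataiku.api_client().get_default_project()
```

Inside a notebook or recipe, a call might look like `ask_llm(default_project(), "my-llm-id", "Summarize this ticket")`, where `"my-llm-id"` is a placeholder for an identifier configured on your instance.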
Advanced Machine Learning#
Although Dataiku’s visual machine learning tools are powerful, you can also use external libraries or write your own training and inference logic. These custom models remain part of the Dataiku platform, enabling them to be deployed, monitored, and reused in governed workflows.
Automation and Integration#
You can use Dataiku’s APIs and automation features to control many aspects of the platform, including datasets, scenarios, machine learning, and AI workflows. These capabilities also make it easier to integrate Dataiku into CI/CD processes and production operations.
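For example, a CI/CD job could trigger a scenario and check how it ended. The sketch below assumes the `dataikuapi` client package and its scenario calls (`get_project`, `get_scenario`, `run_and_wait`, and the `outcome` property of the returned run). The helper names and the host, API key, project key, and scenario identifier in the usage note are hypothetical placeholders.

```python
def run_scenario(client, project_key, scenario_id):
    """Run a scenario, wait for it to finish, and return its outcome string."""
    project = client.get_project(project_key)
    scenario = project.get_scenario(scenario_id)
    # no_fail=True asks the client not to raise on failure, so that the
    # caller can inspect the outcome and decide what to do.
    run = scenario.run_and_wait(no_fail=True)
    return run.outcome


def make_client(host, api_key):
    """Build an API client for a Dataiku instance (requires dataikuapi)."""
    import dataikuapi  # imported lazily so this file loads without the package
    return dataikuapi.DSSClient(host, api_key)
```

A pipeline step could then call `run_scenario(make_client("https://dss.example.com", "<api-key>"), "MYPROJECT", "rebuild_all")` and fail the job when the returned outcome is not `"SUCCESS"`.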
Extensibility with Plugins#
One of the most significant benefits for coders is the ability to develop custom plugins. Plugins can add new functionality to Dataiku, such as new dataset connectors, custom data preparation steps, or custom visualizations. This lets you extend the platform to meet your organization’s specific needs and share the new capabilities with other users, both technical and non-technical.
The following sections describe where code is written, executed, and managed when working on a Dataiku instance.
Tools for coding#
Notebooks#
If you want to explore your data interactively and experiment with small pieces of code, notebooks are a natural place to start. They let you run code in blocks called cells and inspect the output of each one.
Dataiku provides complete notebook environments that run server-side:
SQL notebooks to run interactive queries on your SQL databases.
Code notebooks to execute Python or R code in a simple yet effective interface based on Jupyter notebooks.
These notebook options are embedded directly in the Dataiku web interface, so you can move easily through your project and share your work with other users on the same instance.
Additionally, Python and R notebook sources (.ipynb files) can be synchronized with remote Git repositories.
Code recipe#
Once you have tested your code in a notebook or elsewhere, you can integrate it into your workflow. Code recipes let you run code on many different kinds of inputs and outputs, including datasets, managed folders, knowledge banks, and agents. This makes it easier to adapt Dataiku workflows to your specific needs. A code recipe can replace a visual recipe if you’re more comfortable with code.
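A Python code recipe that reads one dataset and writes another can be sketched as follows. It assumes the `dataiku` package available inside Dataiku and its `Dataset.get_dataframe` and `Dataset.write_with_schema` methods. The dataset names (`orders`, `orders_clean`), the column names, and the helper functions are hypothetical. Keeping the transformation in a plain function makes it easy to try out in a notebook first.

```python
def keep_complete_rows(df, required_columns):
    """Drop rows with a missing value in any of the required columns."""
    return df.dropna(subset=required_columns).reset_index(drop=True)


def run_recipe():
    """Read the input dataset, clean it, and write the output dataset."""
    import dataiku  # provided by Dataiku when the recipe runs

    input_dataset = dataiku.Dataset("orders")          # hypothetical input
    output_dataset = dataiku.Dataset("orders_clean")   # hypothetical output

    df = input_dataset.get_dataframe()                 # load as a pandas DataFrame
    cleaned = keep_complete_rows(df, ["customer_id", "amount"])
    output_dataset.write_with_schema(cleaned)          # write rows and schema
```

In an actual recipe, the body of `run_recipe` would typically sit at the top level of the script, with the input and output names matching those declared for the recipe.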
IDE and Code Studio#
If you already use an IDE like Visual Studio Code or PyCharm on your client machine, installing the relevant extension or plugin lets you connect it to your Dataiku instance and edit source code directly.
If you prefer to edit source code remotely, Dataiku can embed a Visual Studio Code editor directly in its interface. This option relies on the platform’s Code Studios feature and does not require any setup on your client machine, since the platform manages it for you.
See also
Code Studios (in the reference documentation)
VSCode/IntelliJ extension for Dataiku
The Visual Studio marketplace page to install and configure the extension
PyCharm plugin for Dataiku
The JetBrains marketplace page to install and configure the plugin
Code Studios
Managing dependencies#
Writing code often means working with third-party packages that must be installed separately. In Python, for example, you would typically use virtual environments to manage these dependencies.
In Dataiku, the equivalent is called a code environment. It lets you choose which Python version and custom packages your code should use. Once the code environment is set up, its dependencies can be imported from any code run by Dataiku.
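To check which interpreter and package versions a piece of code actually sees, a small standard-library helper can be run from a notebook or recipe. This is a generic Python sketch, not a Dataiku API. The function name `describe_environment` and the package names in the usage note are illustrative.

```python
import sys
from importlib import metadata


def describe_environment(package_names):
    """Return the Python version and the installed version of each package.

    Packages that are not installed in the active environment map to None.
    """
    versions = {}
    for name in package_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    python_version = ".".join(str(part) for part in sys.version_info[:3])
    return {"python": python_version, "packages": versions}
```

Calling `describe_environment(["pandas", "scikit-learn"])` from code that uses a given code environment shows whether the expected versions are the ones in effect.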
Bringing an external code base#
You may already have an existing code base outside your Dataiku instance. You can make items from that code base directly importable in Dataiku by using a project library feature called “Git references.” If the external code is hosted in a remote Git repository, this feature lets you pull a specific branch into Dataiku and expose it as a project library.
This lets your Dataiku workflows work alongside an external code repository.
Git integration#
Dataiku supports Git operations natively. There are multiple ways to use Git in Dataiku, from managing your code to versioning your project. The documentation includes several guides for working with Git within Dataiku.
