Setting up the Dataiku API local environment
Dataiku provides a Python client library to programmatically interact with its platform. If you work directly within the Dataiku web interface, then the client is available out-of-the-box and the authentication is done transparently from the current context. If you work outside of the Dataiku web interface, there are a few additional setup steps to complete.
This tutorial will help you set up your local environment by walking you through each of these steps.
Prerequisites
Access to a Dataiku DSS instance via an API key (see the documentation for explanations on how to generate one)
Python >= 3.7 with the following packages:
numpy==1.23.5
pandas==1.3.5
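These prerequisites can be checked from Python itself. The snippet below is only a minimal sketch and not part of the Dataiku tooling: check_prerequisites is a hypothetical helper name, and the pinned versions are simply copied from the list above. On Python 3.7 it assumes the separate importlib_metadata backport is installed.

```python
import sys

try:
    from importlib import metadata  # standard library from Python 3.8 onwards
except ImportError:
    # Assumption: on Python 3.7 the importlib_metadata backport is installed
    import importlib_metadata as metadata

REQUIRED_PYTHON = (3, 7)
PINNED = {"numpy": "1.23.5", "pandas": "1.3.5"}


def check_prerequisites(pinned=PINNED, minimum=REQUIRED_PYTHON):
    """Hypothetical helper: compare the interpreter and packages to the prerequisites."""
    report = {"python_ok": sys.version_info[:2] >= minimum}
    for name, wanted in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # the package is not installed in this environment
        report[name] = {
            "wanted": wanted,
            "installed": installed,
            "ok": installed == wanted,
        }
    return report


if __name__ == "__main__":
    for key, value in check_prerequisites().items():
        print(key, value)
```

Any entry whose "ok" field is False points to a package to install or re-pin before going further.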
Building your local virtual environment
The first step is to create a Python virtual environment on your computer in which you will install all the required dependencies.
Generate a new virtual environment using the tool of your choice (venv, pipenv, poetry). In this tutorial we’ll use venv and call our environment dataiku_local_env:
# Create the virtual environment
python3 -m venv dataiku_local_env
# Activate the virtual environment
source dataiku_local_env/bin/activate
Install the dataiku package by running the following command (replace https://dss.example with your own instance’s complete URL):
pip install https://dss.example/public/packages/dataiku-internal-client.tar.gz
Warning
If your instance has a self-signed or expired certificate, you will need to add the --trusted-host flag in order to connect over HTTPS. The flag takes the instance's host name (plus the port, if any), not the full URL:
pip install https://dss.example/public/packages/dataiku-internal-client.tar.gz --trusted-host dss.example
Install the dataikuapi package directly from the PyPI repository:
pip install dataiku-api-client
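Before moving on, it can be useful to confirm that both packages are visible from the activated virtual environment. The following is a minimal sketch, assuming the module names dataiku and dataikuapi used in this tutorial; installed_modules is a hypothetical helper, not a Dataiku function.

```python
import importlib.util


def installed_modules(names=("dataiku", "dataikuapi")):
    """Hypothetical helper: report which top-level modules can be located."""
    # find_spec returns None when a top-level module cannot be found,
    # and it does not import the module itself.
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in installed_modules().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

If one of the modules is reported as missing, check that the virtual environment is activated in the shell where the pip commands were run.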
Once all relevant packages are installed, you can establish the connection with your Dataiku instance.
Connecting to your Dataiku instance
The connection with your Dataiku instance will be established by the dataiku package, which will look for:
the instance URL,
the API key to use for authentication.
You can provide this information in different ways:
directly inside your code, by using the set_remote_dss() method and replacing YOURAPIKEY with your own API key:
import dataiku
dataiku.set_remote_dss("https://dss.example", "YOURAPIKEY")
Warning
If your instance has a self-signed or expired certificate, in order to connect with HTTPS you will need to pass the no_check_certificate flag:
import dataiku
dataiku.set_remote_dss("https://dss.example", "YOURAPIKEY", no_check_certificate=True)
with environment variables to be initialized before starting your Python environment:
export DKU_DSS_URL="https://dss.example"
export DKU_API_KEY="YOURAPIKEY"
with a configuration file that should be located at $HOME/.dataiku/config.json (or %USERPROFILE%/.dataiku/config.json on Windows) with the following structure:
{
    "dss_instances": {
        "default": {
            "url": "https://dss.example",
            "api_key": "YOURAPIKEY"
        }
    },
    "default_instance": "default"
}
Warning
If your instance has a self-signed or expired certificate, in order to connect with HTTPS you will need to add the no_check_certificate property:
{
    "dss_instances": {
        "default": {
            "url": "https://dss.example",
            "api_key": "YOURAPIKEY",
            "no_check_certificate": true
        }
    },
    "default_instance": "default"
}
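If you prefer to generate the configuration file rather than write it by hand, the structure shown above can be produced with the standard json module. This is only a sketch: build_config and write_config are hypothetical helper names, and the destination path is left as a required argument so that you can pass the location given above for your platform.

```python
import json
from pathlib import Path


def build_config(url, api_key, no_check_certificate=False):
    """Hypothetical helper: build the configuration structure shown above."""
    instance = {"url": url, "api_key": api_key}
    if no_check_certificate:
        # Only added for instances with a self-signed or expired certificate
        instance["no_check_certificate"] = True
    return {"dss_instances": {"default": instance}, "default_instance": "default"}


def write_config(config, path):
    """Hypothetical helper: write the configuration as JSON, creating parent folders."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return path
```

Because the file holds an API key in clear text, it is worth keeping it readable by your own user account only.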
Testing your setup
The last step is to check if you can properly connect to your Dataiku instance. For that, you can use the code snippet below:
import dataiku
# Uncomment this if you are not using environment variables or a configuration file
# dataiku.set_remote_dss("https://dss.example", "YOURAPIKEY")
client = dataiku.api_client()
# Uncomment this if your instance has a self-signed certificate
# client._session.verify = False
info = client.get_auth_info()
print(info)
If all goes well, you should see an output similar to this:
{
    "authSource": "PERSONAL_API_KEY",
    "via": [],
    "authIdentifier": "your-user-name",
    "groups": ["one_group", "another_group"],
    "userProfile": "DESIGNER",
    "associatedDSSUser": "your-user-name",
    "userForImpersonation": "your-user-name"
}
If so, congratulations: your setup is now fully operational! You can move on and set up Dataiku plugins/extensions in your favorite IDE or learn more about how to automate things using the public API.
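In a script, you may want to inspect the returned dictionary rather than print it in full. The sketch below assumes the keys shown in the sample output (authIdentifier, userProfile, groups) and falls back to placeholders when one is absent; summarize_auth_info is a hypothetical helper, not part of the dataiku package.

```python
def summarize_auth_info(info):
    """Hypothetical helper: build a one-line summary of a get_auth_info() result."""
    user = info.get("authIdentifier", "<unknown>")
    profile = info.get("userProfile", "<unknown>")
    groups = ", ".join(info.get("groups", [])) or "none"
    return f"Connected as {user} (profile: {profile}; groups: {groups})"


# Usage against a live instance (requires a working connection):
# import dataiku
# print(summarize_auth_info(dataiku.api_client().get_auth_info()))
```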