Using Dataiku’s Python packages#
Using the client from inside Dataiku#
No setup is required when you use the client from inside Dataiku.
The packages are preinstalled, and you don’t need to provide an API key.
The client will inherit credentials from the current context.
Both packages (dataiku and dataikuapi) can be used.
The easiest way to create a client is:
import dataiku
client = dataiku.api_client()
# client is now a DSSClient and can perform all authorized actions.
# For example, list the project keys to which you have access
client.list_project_keys()
Using the client from outside Dataiku#
There are a few additional setup steps to complete. This tutorial will help you set up your local environment by following these steps.
Prerequisites#
Access to a Dataiku DSS instance via an API key (see the documentation for explanations on how to generate one)
Python >= 3.7 with the following packages:
numpy
pandas
Note
This tutorial has been tested with Python 3.10 and the following package versions:
numpy==2.1.1
pandas==2.2.2
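The prerequisites above can be checked from Python before going further. The following is a minimal sketch, not part of the Dataiku tooling: the function name `check_prerequisites` is hypothetical, and it only reports whether the interpreter meets the stated minimum version and whether each package can be found.

```python
import sys
from importlib.util import find_spec

MIN_PYTHON = (3, 7)              # minimum version stated in the prerequisites
REQUIRED = ("numpy", "pandas")   # packages listed in the prerequisites


def check_prerequisites(packages=REQUIRED, min_python=MIN_PYTHON):
    """Return a report on the Python version and on package availability.

    Hypothetical helper: it only checks that each package can be located,
    it does not import the package or compare its version.
    """
    report = {"python_ok": sys.version_info[:2] >= min_python}
    for name in packages:
        # find_spec returns None when a top-level package is not installed
        report[name] = find_spec(name) is not None
    return report


print(check_prerequisites())
```

A `False` entry for numpy or pandas means the package still has to be installed in the environment you intend to use.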
Building your local virtual environment#
The first step is to create a Python virtual environment on your computer in which you will install all the required dependencies.
Generate a new virtual environment using the tool of your choice (venv, Pipfile, poetry). In this tutorial, we’ll use venv and call our environment dataiku_local_env:
# Create the virtual environment
python3 -m venv dataiku_local_env
# Activate the virtual environment
source dataiku_local_env/bin/activate
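To confirm that the interpreter you are running belongs to the activated environment, you can compare `sys.prefix` with `sys.base_prefix`. This is a small sketch that assumes a venv-style environment; the helper name `in_virtual_environment` is illustrative only.

```python
import sys


def in_virtual_environment():
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print("Virtual environment active:", in_virtual_environment())
print("Interpreter prefix:", sys.prefix)
```

If the first line prints `False` after activation, the `python` on your PATH is probably not the one from dataiku_local_env.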
Installing the dataiku package#
Install the dataiku package by running the following command:
pip install http(s)://DSS_HOST:DSS_PORT/public/packages/dataiku-internal-client.tar.gz
Warning
If your instance has a self-signed or expired certificate, you will need to add the --trusted-host flag to connect over HTTPS:
pip install http(s)://DSS_HOST:DSS_PORT/public/packages/dataiku-internal-client.tar.gz --trusted-host=DSS_HOST:DSS_PORT
Alternatively, add the following line to your requirements.txt file:
http(s)://DSS_HOST:DSS_PORT/public/packages/dataiku-internal-client.tar.gz
Then install the requirements with pip install -r requirements.txt
If you use HTTPS without a proper certificate, you may need to add --trusted-host=DSS_HOST:DSS_PORT to your pip command line.
Alternatively, download the package’s tar.gz file manually from your DSS instance:
http(s)://DSS_HOST:DSS_PORT/public/packages/dataiku-internal-client.tar.gz
Then install it with:
pip install dataiku-internal-client.tar.gz
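The URL pattern used in the commands above can be assembled programmatically, which may help when scripting the installation for several instances. The sketch below is illustrative only: `client_package_url` and `pip_install_args` are hypothetical helpers, and the host `dss.example` and port `11200` are placeholders for your own instance.

```python
def client_package_url(host, port, use_https=True):
    """Build the download URL of the dataiku package for a given instance."""
    scheme = "https" if use_https else "http"
    return f"{scheme}://{host}:{port}/public/packages/dataiku-internal-client.tar.gz"


def pip_install_args(host, port, use_https=True, trust_host=False):
    """Return the pip command line as a list of arguments (nothing is executed)."""
    args = ["pip", "install", client_package_url(host, port, use_https)]
    if trust_host:
        # Only needed for self-signed or expired certificates
        args.append(f"--trusted-host={host}:{port}")
    return args


# Placeholder host and port; replace them with your own instance
print(" ".join(pip_install_args("dss.example", 11200, trust_host=True)))
```

The list returned by `pip_install_args` can be passed to `subprocess.run` if you want to run the installation from a script rather than copy the command by hand.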
Installing the dataikuapi package#
Install the dataikuapi package directly from the PyPI repository:
pip install dataiku-api-client
Outside a virtual environment, this installs the client into the system-wide Python installation, so you may need to replace pip with sudo pip.
Note that this will always install the latest version of the API client. You might need to request a version that is compatible with your DSS version.
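To see which version of the client ended up in your environment, you can query the installed distribution metadata. This is a minimal sketch using the standard library (`importlib.metadata`, available from Python 3.8); the helper name `installed_version` is hypothetical, and the sketch does not tell you which client version matches which DSS version.

```python
from importlib import metadata  # standard library, Python 3.8+


def installed_version(distribution="dataiku-api-client"):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
if version is None:
    print("dataiku-api-client is not installed in this environment")
else:
    print("dataiku-api-client version:", version)
```

If the reported version is not the one you need, pip's standard `==` specifier (for example `pip install dataiku-api-client==<version>`) lets you request a specific release.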
When connecting from outside Dataiku, you need an API key. See Public API Keys for more information on creating an API key and the associated privileges.
You also need to connect using the base URL of your DSS instance.
Once all relevant packages are installed, you can connect with your Dataiku instance.
Connecting to your Dataiku instance#
The dataiku package establishes the connection with your Dataiku instance. To do so, it looks for:
the instance URL,
the API key to use for authentication.
You can provide this information in different ways:
directly inside your code, by using the set_remote_dss() method and replacing YOURAPIKEY with your own API key:
import dataiku
dataiku.set_remote_dss("https://dss.example", "YOURAPIKEY")
Warning
If your instance has a self-signed or expired certificate, in order to connect with HTTPS you will need to pass the no_check_certificate flag:
import dataiku
dataiku.set_remote_dss("https://dss.example", "YOURAPIKEY", no_check_certificate=True)
client = dataiku.api_client()
print(client.list_project_keys())
with environment variables to be initialized before starting your Python environment:
export DKU_DSS_URL="https://dss.example"
export DKU_API_KEY="YOURAPIKEY"
Then, in Python:
import dataiku
client = dataiku.api_client()
print(client.list_project_keys())
Warning
You cannot turn off certificate checking via environment variables.
with a configuration file that should be located at $HOME/.dataiku/config.json (or %USERPROFILE%/.dataiku/config.json on Windows) with the following structure:
{
  "dss_instances": {
    "default": {
      "url": "https://dss.example",
      "api_key": "YOURAPIKEY"
    }
  },
  "default_instance": "default"
}
Then, in Python:
import dataiku
client = dataiku.api_client()
print(client.list_project_keys())
Warning
If your instance has a self-signed or expired certificate, in order to connect with HTTPS you will need to add the no_check_certificate property:
{
  "dss_instances": {
    "default": {
      "url": "https://dss.example",
      "api_key": "YOURAPIKEY",
      "no_check_certificate": true
    }
  },
  "default_instance": "default"
}
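The configuration file described above is plain JSON, so it can also be generated from a script. The sketch below is one possible way to do it, under stated assumptions: `build_config` and `write_config` are hypothetical helpers, the target path is passed explicitly by the caller, and the demonstration writes to a temporary directory so that no real settings are overwritten.

```python
import json
import tempfile
from pathlib import Path


def build_config(url, api_key, no_check_certificate=False):
    """Build the single-instance structure shown in this tutorial."""
    instance = {"url": url, "api_key": api_key}
    if no_check_certificate:
        # Only add the property when certificate checking must be disabled
        instance["no_check_certificate"] = True
    return {"dss_instances": {"default": instance}, "default_instance": "default"}


def write_config(config, config_path):
    """Write the configuration as JSON, creating parent folders if needed."""
    path = Path(config_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return path


# Demonstration in a temporary directory with placeholder values
with tempfile.TemporaryDirectory() as tmp:
    target = write_config(
        build_config("https://dss.example", "YOURAPIKEY"),
        Path(tmp) / ".dataiku" / "config.json",
    )
    print(target.read_text(encoding="utf-8"))
```

Because the file contains an API key, consider restricting its permissions once it is written to its final location.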
If at some point you need to clear the connection settings, you can do so with the following code:
dataiku.clear_remote_dss()
The configuration will be cleared. If you are using the client within your DSS instance, it will target the API of your instance.
With the dataikuapi package, you work with the API by creating a DSSClient object, which establishes the connection with DSS.
Once the connection is established, the DSSClient object serves as the entry point to the other calls.
import dataikuapi
host = "http://localhost:11200"
apiKey = "some_key"
client = dataikuapi.DSSClient(host, apiKey)
# client is now a DSSClient and can perform all authorized actions.
# For example, list the project keys to which the API key has access
client.list_project_keys()
Warning
If your DSS has SSL enabled, the package will verify the certificate. In order for this to work, you may need to add the root authority that signed the DSS SSL certificate to your local trust store. Please refer to your OS or Python manual for instructions.
If this is not possible, you can also disable checking the SSL certificate by using DSSClient(host, apiKey, insecure_tls=True)
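Hardcoding the host and API key, as in the snippet above, is convenient for a first test but easy to leak into version control. One possible alternative is to read both values from the environment before creating the DSSClient. This is only a sketch: the helper `connection_settings` is hypothetical, and reusing the DKU_DSS_URL and DKU_API_KEY names here is a convenience assumption (nothing in this tutorial states that dataikuapi reads them on its own).

```python
import os


def connection_settings(environ=os.environ):
    """Read the instance URL and API key from environment variables.

    Hypothetical helper: the variable names are reused from the dataiku
    package section purely for convenience.
    """
    try:
        return environ["DKU_DSS_URL"], environ["DKU_API_KEY"]
    except KeyError as missing:
        raise RuntimeError(f"Missing environment variable: {missing.args[0]}") from None


# Usage (requires the dataikuapi package and a reachable instance):
#   import dataikuapi
#   host, api_key = connection_settings()
#   client = dataikuapi.DSSClient(host, api_key)
#   print(client.list_project_keys())
```

Failing early with a clear message tends to be easier to diagnose than an authentication error raised later by the first API call.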
Testing your setup#
The last step is to check if you can properly connect to your Dataiku instance. For that, you can use the code snippet below:
import dataiku
# Uncomment this if you are not using environment variables or a configuration file
# dataiku.set_remote_dss("https://dss.example", "YOURAPIKEY")
client = dataiku.api_client()
# Uncomment this if your instance has a self-signed certificate
# client._session.verify = False
info = client.get_auth_info()
print(info)
If all goes well, you should see an output similar to this:
{
"authSource": "PERSONAL_API_KEY",
"via": [],
"authIdentifier": "your-user-name",
"groups": ["one_group", "another_group"],
"userProfile": "DESIGNER",
"associatedDSSUSer": "your-user-name",
"userForImpersonation": "your-user-name"
}
If so, congratulations: your setup is now fully operational! You can move on and set up Dataiku plugins/extensions in your favorite IDE or learn more about how to automate things using the public API.
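If you run this check regularly, for instance at the start of a script, a compact summary can be easier to read than the full dictionary. The sketch below works on a dictionary shaped like the sample output above; `summarize_auth_info` is a hypothetical helper, and the sample values are the placeholders from this tutorial rather than real data.

```python
def summarize_auth_info(info):
    """Return a one-line summary of a get_auth_info()-style dictionary."""
    user = info.get("authIdentifier", "<unknown>")
    source = info.get("authSource", "<unknown>")
    groups = ", ".join(info.get("groups", [])) or "none"
    return f"{user} authenticated via {source} (groups: {groups})"


# Placeholder values copied from the sample output in this tutorial
sample = {
    "authSource": "PERSONAL_API_KEY",
    "authIdentifier": "your-user-name",
    "groups": ["one_group", "another_group"],
}
print(summarize_auth_info(sample))
```

With a real connection, you would pass the result of `client.get_auth_info()` instead of the `sample` dictionary.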