Setting up the Dataiku API local environment

Dataiku provides a Python client library to interact with its platform programmatically. If you work directly within the Dataiku web interface, the client is available out of the box and authentication is handled transparently from the current context. If you work outside of the Dataiku web interface, a few additional setup steps are required.

This tutorial walks you through each of these steps to set up your local environment.

Prerequisites

  • Access to a Dataiku DSS instance via an API key (see the documentation for instructions on how to generate one)

  • Python >= 3.7 with the following packages:

    • numpy==1.23.5

    • pandas==1.3.5
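
If you want to confirm these prerequisites from inside Python, the minimal sketch below compares the running interpreter and the installed numpy and pandas versions against the ones listed above. The function name check_prerequisites is hypothetical and not part of any Dataiku API. It only reads each package's __version__ attribute and treats a missing package as "not installed".

```python
import sys
from importlib import import_module

# Minimum Python version and pinned package versions from the prerequisites list.
REQUIRED_PYTHON = (3, 7)
PINNED = {"numpy": "1.23.5", "pandas": "1.3.5"}


def check_prerequisites():
    """Hypothetical helper: report whether each prerequisite is satisfied."""
    report = {"python_ok": sys.version_info[:2] >= REQUIRED_PYTHON}
    for name, expected in PINNED.items():
        try:
            installed = getattr(import_module(name), "__version__", None)
        except ImportError:
            # Package is not installed in the current environment.
            installed = None
        report[name] = {
            "expected": expected,
            "installed": installed,
            "ok": installed == expected,
        }
    return report


if __name__ == "__main__":
    for key, value in check_prerequisites().items():
        print(key, value)
```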

Building your local virtual environment

The first step is to create a Python virtual environment on your computer in which you will install all the required dependencies.

  • Create a new virtual environment with the tool of your choice (venv, pipenv, Poetry). In this tutorial we’ll use venv and name our environment dataiku_local_env:

    # Create the virtual environment
    python3 -m venv dataiku_local_env
    
    # Activate the virtual environment
    # (on Windows, run dataiku_local_env\Scripts\activate instead)
    source dataiku_local_env/bin/activate
    
  • Install the dataiku package from your instance by running the following command (replace https://dss.example with your own instance’s full URL):

    pip install https://dss.example/public/packages/dataiku-internal-client.tar.gz
    

    Warning

    If your instance has a self-signed or expired certificate, pip will reject the HTTPS download. To proceed anyway, add the --trusted-host flag followed by your instance’s host name (without the https:// scheme):

    pip install https://dss.example/public/packages/dataiku-internal-client.tar.gz --trusted-host dss.example
    
  • Install the dataiku-api-client package (which provides the dataikuapi module) directly from PyPI:

    pip install dataiku-api-client
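
To double-check the result before going further, the minimal sketch below reports whether a virtual environment is active and whether the dataiku and dataikuapi modules can be located. The helper name check_local_env is hypothetical. importlib.util.find_spec() only locates a module without importing it, so a True value does not guarantee that the client will work, only that it is installed in the active interpreter.

```python
import importlib.util
import sys


def check_local_env():
    """Hypothetical helper: is a venv active, and are the client modules installed?"""
    return {
        # Inside a venv, sys.prefix points at the environment while
        # sys.base_prefix still points at the base interpreter.
        "in_virtualenv": sys.prefix != sys.base_prefix,
        # find_spec() returns None when a top-level module cannot be found.
        "dataiku": importlib.util.find_spec("dataiku") is not None,
        "dataikuapi": importlib.util.find_spec("dataikuapi") is not None,
    }


if __name__ == "__main__":
    print(check_local_env())
```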
    

Once all relevant packages are installed, you can establish the connection with your Dataiku instance.

Connecting to your Dataiku instance

The connection to your Dataiku instance is established by the dataiku package, which looks for:

  • the instance URL,

  • the API key to use for authentication.

You can provide this information in different ways:

  • directly in your code, using the set_remote_dss() function and replacing YOURAPIKEY with your own API key:

    import dataiku
    dataiku.set_remote_dss("https://dss.example", "YOURAPIKEY")
    

    Warning

    If your instance has a self-signed or expired certificate, you will need to pass the no_check_certificate argument to connect over HTTPS:

    import dataiku
    dataiku.set_remote_dss("https://dss.example", "YOURAPIKEY", no_check_certificate=True)
    
  • with environment variables, set before starting your Python process:

    export DKU_DSS_URL="https://dss.example"
    export DKU_API_KEY="YOURAPIKEY"
    
  • with a configuration file located at $HOME/.dataiku/config.json (or %USERPROFILE%/.dataiku/config.json on Windows), with the following structure:

    {
      "dss_instances": {
        "default": {
          "url": "https://dss.example",
          "api_key": "YOURAPIKEY"
        }
      },
      "default_instance": "default"
    }
    

    Warning

    If your instance has a self-signed or expired certificate, in order to connect with HTTPS you will need to add the no_check_certificate property:

    {
      "dss_instances": {
        "default": {
          "url": "https://dss.example",
          "api_key": "YOURAPIKEY",
          "no_check_certificate": true
        }
      },
      "default_instance": "default"
    }
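
If you prefer to generate the configuration file rather than edit it by hand, here is a minimal sketch using only the standard library. The helper write_dataiku_config is hypothetical and not part of the dataiku package. It writes the same structure as the examples above to .dataiku/config.json under the directory returned by Path.home(), which on Windows normally resolves to %USERPROFILE%. The optional home argument exists only so the sketch can target a scratch directory.

```python
import json
from pathlib import Path


def write_dataiku_config(url, api_key, no_check_certificate=False, home=None):
    """Hypothetical helper: write a config.json with the structure shown above."""
    home = Path(home) if home is not None else Path.home()
    instance = {"url": url, "api_key": api_key}
    if no_check_certificate:
        # Only needed for self-signed or expired certificates.
        instance["no_check_certificate"] = True
    config = {
        "dss_instances": {"default": instance},
        "default_instance": "default",
    }
    path = home / ".dataiku" / "config.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2))
    # The file contains your API key: restrict it to the owner (POSIX permissions).
    path.chmod(0o600)
    return path
```

Restricting the file to owner read/write (0o600) is a precaution, since the file stores a secret. On Windows, chmod only toggles the read-only flag, so it has little effect there.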
    

Testing your setup

The last step is to check that you can connect to your Dataiku instance. To do so, run the code snippet below:

import json

import dataiku

# Uncomment this if you are not using environment variables or a configuration file
# dataiku.set_remote_dss("https://dss.example", "YOURAPIKEY")

client = dataiku.api_client()

# Uncomment this if your instance has a self-signed certificate
# client._session.verify = False

# get_auth_info() returns a dict; pretty-print it as JSON for readability
info = client.get_auth_info()
print(json.dumps(info, indent=4))

If all goes well, you should see an output similar to this:

{
    "authSource": "PERSONAL_API_KEY",
    "via": [],
    "authIdentifier": "your-user-name",
    "groups": ["one_group", "another_group"],
    "userProfile": "DESIGNER",
    "associatedDSSUser": "your-user-name",
    "userForImpersonation": "your-user-name"
}
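
In scripts, you may want a short confirmation line rather than the full dict. The minimal sketch below builds one from the dict returned by client.get_auth_info(). The helper summarize_auth_info is hypothetical. It relies only on the keys visible in the sample output above, and falls back to placeholders in case your DSS version returns a different set of keys.

```python
def summarize_auth_info(info):
    """Hypothetical helper: one-line summary of client.get_auth_info() output."""
    # Keys taken from the sample output above; other DSS versions may differ.
    user = info.get("authIdentifier", "<unknown>")
    source = info.get("authSource", "<unknown>")
    groups = ", ".join(info.get("groups", [])) or "no groups"
    return f"Authenticated as {user} via {source} (groups: {groups})"


# Sample dict mirroring the expected output shown above.
sample = {
    "authSource": "PERSONAL_API_KEY",
    "authIdentifier": "your-user-name",
    "groups": ["one_group", "another_group"],
}
print(summarize_auth_info(sample))
# prints: Authenticated as your-user-name via PERSONAL_API_KEY (groups: one_group, another_group)
```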

If so, congratulations: your setup is fully operational! You can now move on to setting up Dataiku plugins/extensions in your favorite IDE, or learn more about automating tasks with the public API.