Projects#

Projects are the main unit for organising workflows within the Dataiku platform.

Basic operations#

This section provides common examples of how to programmatically manipulate Projects.

Listing Projects#

The main identifier for Projects is the Project Key. The following can be run to access the list of Project Keys on a Dataiku instance:

import dataiku
client = dataiku.api_client()

# Get a list of Project Keys
project_keys = client.list_project_keys()
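Because `list_project_keys()` returns plain strings, the result can be filtered on the client side. The following is a minimal sketch: `find_project_keys` is a hypothetical helper name, and the idea that related Projects share a key prefix is an assumption about your naming convention, not something enforced by Dataiku.

```python
def find_project_keys(client, prefix):
    """Return the sorted Project Keys that start with `prefix`.

    Hypothetical helper: it relies only on client.list_project_keys().
    The match is case-sensitive.
    """
    return sorted(k for k in client.list_project_keys() if k.startswith(prefix))


# Usage inside Dataiku (assumes the `dataiku` package is importable):
# import dataiku
# churn_keys = find_project_keys(dataiku.api_client(), "CHURN")
```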

Handling an existing Project#

To manipulate a Project and its associated items you first need to get its handle, in the form of a dataikuapi.dss.project.DSSProject object. If the Project already exists on the instance, run:

project = client.get_project("CHURN")

You can also directly get a handle on the current Project you are working on:

project = client.get_default_project()
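A handle returned by `get_project()` may not prove that the Project exists, in which case a wrong key would only surface as an error on a later call. If you want to check up front, one possible approach is to compare the key against `list_project_keys()`. The helper name `get_project_or_none` below is hypothetical.

```python
def get_project_or_none(client, project_key):
    """Return a Project handle, or None if the key is not listed on the instance.

    Hypothetical helper built from client.list_project_keys() and
    client.get_project(); it only sees the Projects visible to the caller.
    """
    if project_key not in client.list_project_keys():
        return None
    return client.get_project(project_key)


# Usage inside Dataiku:
# project = get_project_or_none(client, "CHURN")
# if project is None:
#     print("Project CHURN was not found")
```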

Creating a new Project#

The following code will create a new empty Project and return its handle:

project = client.create_project(project_key="MYPROJECT",
                                name="My very own project",
                                owner="alice")

You can also duplicate an existing Project and get a handle on its copy:

original_project = client.get_project("CHURN")
copy_result = original_project.duplicate(target_project_key="CHURNCOPY",
                                          target_project_name="Churn (copy)")
project = client.get_project(copy_result.get('targetProjectKey', None))

Finally, you can import a Project archive (zip file) and get a handle on the resulting Project. The Project Key used for the import must not already be in use on the instance.

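archive_path = "/path/to/archive.zip"
target_key = "IMPORTED_PROJECT"  # hypothetical key, must not already exist on the instance
with open(archive_path, "rb") as f:
    # Assumption: the "targetProjectKey" import setting selects the key to import under
    import_result = client.prepare_project_import(f).execute(settings={"targetProjectKey": target_key})
project = client.get_project(target_key)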
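Since creating, duplicating and importing all require a Project Key that is not yet in use, it can help to compute a free key first. The sketch below is one possible approach: `pick_free_project_key` is a hypothetical helper, and the `_2`, `_3`, ... suffix scheme is an arbitrary convention chosen for illustration.

```python
def pick_free_project_key(existing_keys, base_key):
    """Return base_key if unused, else base_key with the first free numeric suffix.

    Hypothetical helper; the "_<n>" suffix scheme is an illustrative convention.
    """
    existing = set(existing_keys)
    if base_key not in existing:
        return base_key
    suffix = 2
    while f"{base_key}_{suffix}" in existing:
        suffix += 1
    return f"{base_key}_{suffix}"


# Usage inside Dataiku:
# new_key = pick_free_project_key(client.list_project_keys(), "CHURN")
# project = client.create_project(project_key=new_key, name="Churn", owner="alice")
```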

Accessing Project items#

Once your Project handle is created, you can use it to create, list and interact with Project items:

# Print the names of all Datasets in the Project:
for d in project.list_datasets():
    print(d.name)

# Create a new empty Managed Folder:
folder = project.create_managed_folder(name="myfolder")

# Get a handle on a Dataset:
customer_data = project.get_dataset("customers")

Exporting a Project#

To create a Project export archive and save it locally (i.e. on the Dataiku instance server), run the following:

import os
dir_path = "path/to/your/project/export/directory"
archive_name = f"{project.project_key}.zip"
with project.get_export_stream() as s:
    target = os.path.join(dir_path, archive_name)
    with open(target, "wb") as f:
        for chunk in s.stream(512):
            f.write(chunk)
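The export snippet above can be wrapped in a reusable function. This is a sketch under the same assumptions as the original code (the stream object is a context manager exposing `stream()`); the function name `export_project_to_dir` and the choice to create the target directory when it is missing are illustrative additions.

```python
import os


def export_project_to_dir(project, dir_path, chunk_size=512):
    """Write the Project's export archive into dir_path and return the file path.

    Hypothetical helper; it uses only project.project_key and
    project.get_export_stream(), as in the snippet above.
    """
    os.makedirs(dir_path, exist_ok=True)  # create the directory if it is missing
    target = os.path.join(dir_path, f"{project.project_key}.zip")
    with project.get_export_stream() as s:
        with open(target, "wb") as f:
            for chunk in s.stream(chunk_size):
                f.write(chunk)
    return target


# Usage inside Dataiku:
# path = export_project_to_dir(project, "path/to/your/project/export/directory")
```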

Deleting a Project#

To delete a Project and all its associated objects, run the following:

project.delete()

Warning

While the Project’s Dataset objects will be deleted, by default the underlying data will remain. To clear the data as well, pass clear_managed_datasets=True to delete(). The deletion is permanent, so use this method with caution.
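Given that deletion cannot be undone, a small guard can reduce the risk of deleting the wrong Project. The sketch below is only one possible safeguard: `delete_project_checked` is a hypothetical helper that requires the caller to repeat the Project Key before calling `delete()`.

```python
def delete_project_checked(project, confirm_key, clear_data=False):
    """Delete the Project only if confirm_key matches its Project Key.

    Hypothetical guard around project.delete(); clear_data is forwarded
    as the clear_managed_datasets argument.
    """
    if project.project_key != confirm_key:
        raise ValueError(
            f"Refusing to delete {project.project_key!r}: "
            f"confirmation key {confirm_key!r} does not match"
        )
    project.delete(clear_managed_datasets=clear_data)


# Usage inside Dataiku:
# delete_project_checked(client.get_project("CHURNCOPY"), "CHURNCOPY", clear_data=True)
```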

Detailed examples#

This section contains more advanced examples on Projects.

Editing Project permissions#

You can programmatically add or change Group permissions for a given Project using the set_permissions() method. In the following example, the ‘readers’ Group is added to the DKU_TSHIRTS Project with read-only permissions:

import dataiku

PROJECT_KEY = "DKU_TSHIRTS"
GROUP = "readers"

client = dataiku.api_client()
project = client.get_project(PROJECT_KEY)
permissions = project.get_permissions()

new_perm = {
    "group": GROUP,
    "admin": False,
    "executeApp": False,
    "exportDatasetsData": False,
    "manageAdditionalDashboardUsers": False,
    "manageDashboardAuthorizations": False,
    "manageExposedElements": False,
    "moderateDashboards": False,
    "readDashboards": True,
    "readProjectContent": True,
    "runScenarios": False,
    "shareToWorkspaces": False,
    "writeDashboards": False,
    "writeProjectContent": False
}

permissions["permissions"].append(new_perm)
project.set_permissions(permissions)
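Appending unconditionally adds a second entry if the Group is already listed in the Project permissions. A possible way to make the update repeatable is to replace an existing entry for the same Group and append only when none is found. The helper name `upsert_group_permission` is hypothetical, and the sketch assumes the structure shown above: a dict whose "permissions" key holds a list of entries, with Group entries carrying a "group" key.

```python
def upsert_group_permission(permissions, new_perm):
    """Replace the entry for new_perm["group"] if present, otherwise append it.

    Hypothetical helper; mutates and returns the permissions dict obtained
    from project.get_permissions(). Entries without a "group" key are left alone.
    """
    entries = permissions.setdefault("permissions", [])
    for i, entry in enumerate(entries):
        if entry.get("group") == new_perm["group"]:
            entries[i] = new_perm
            break
    else:
        # No existing entry for this Group was found
        entries.append(new_perm)
    return permissions


# Usage inside Dataiku:
# permissions = upsert_group_permission(project.get_permissions(), new_perm)
# project.set_permissions(permissions)
```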

Creating a Project with custom settings#

You can add pre-built properties to your Projects when creating them using the API. This example illustrates how to generate a Project and define the following properties:

  • name

  • description

  • tags

  • status

  • checklist

First, create a helper function to generate the checklist:

from datetime import datetime


def create_checklist(author, items):
    checklist = {
        "title": "To-do list",
        "createdOn": 0,
        "items": []
    }
    for item in items:
        checklist["items"].append({
            "createdBy": author,
            "createdOn": int(datetime.now().timestamp()),
            "done": False,
            "stateChangedOn": 0,
            "text": item
        })
    return checklist

You can now write the creation function, which wraps the create_project() method and returns a handle to the newly-created Project:

def create_custom_project(client,
                          project_key,
                          name,
                          custom_tags,
                          description,
                          checklist_items):
    current_user = client.get_auth_info()["authIdentifier"]
    project = client.create_project(project_key=project_key,
                                    name=name,
                                    owner=current_user,
                                    description=description)
    # Add tags
    tags = project.get_tags()
    tags["tags"] = {k: {} for k in custom_tags}
    project.set_tags(tags)

    # Add checklist
    metadata = project.get_metadata()
    metadata["checklists"]["checklists"].append(create_checklist(author=current_user,
                                                                 items=checklist_items))
    project.set_metadata(metadata)

    # Set default status to "Draft"
    settings = project.get_settings()
    settings.settings["projectStatus"] = "Draft"
    settings.save()

    return project

This is how you would call this function:

import dataiku

client = dataiku.api_client()
tags = ["work-in-progress", "machine-learning", "priority-high"]
checklist = [
    "Connect to data sources",
    "Clean, aggregate and join data",
    "Train ML model",
    "Evaluate ML model",
    "Deploy ML model to production"
]
            
project = create_custom_project(client=client,
                                project_key="MYPROJECT",
                                name="A custom Project",
                                custom_tags=tags,
                                description="This is a cool Project",
                                checklist_items=checklist)

Export multiple Projects at once#

If, instead of exporting a single Project, you want to export several Projects in one go and store the resulting archives in a local Managed Folder, you can extend the usage of get_export_stream() as in the following example:


import dataiku
import os

from datetime import datetime

PROJECT_KEY = "BACKUP_PROJECTS"
FOLDER_NAME = "exports"
PROJECT_KEYS_TO_EXPORT = ["FOO", "BAR"]

# Generate timestamp (e.g. 20221201-123000)
ts = datetime.now().strftime("%Y%m%d-%H%M%S")

client = dataiku.api_client()
project = client.get_project(PROJECT_KEY)
folder_path = dataiku.Folder(FOLDER_NAME).get_path()
for pkey in PROJECT_KEYS_TO_EXPORT:
    zip_name = f"{pkey}-{ts}.zip"
    pkey_project = client.get_project(pkey)
    with pkey_project.get_export_stream() as es:
        target = os.path.join(folder_path, zip_name)
        with open(target, "wb") as f:
            for chunk in es.stream(512):
                f.write(chunk)
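If this export runs on a schedule, the folder will accumulate archives over time. As a possible follow-up step, the sketch below keeps only the most recent archives of a Project. It relies on the `{pkey}-{ts}.zip` naming used above, whose timestamps sort chronologically as strings; the helper name `prune_old_archives` and the `keep` parameter are hypothetical.

```python
import os
import re

# Matches the timestamp format used above, e.g. 20221201-123000
_TS_PATTERN = re.compile(r"\d{8}-\d{6}")


def prune_old_archives(folder_path, pkey, keep=3):
    """Delete all but the `keep` newest archives of one Project.

    Hypothetical helper; returns the list of deleted file names.
    Only files named "<pkey>-<YYYYmmdd-HHMMSS>.zip" are considered.
    """
    prefix, suffix = f"{pkey}-", ".zip"
    archives = sorted(
        name for name in os.listdir(folder_path)
        if name.startswith(prefix) and name.endswith(suffix)
        and _TS_PATTERN.fullmatch(name[len(prefix):-len(suffix)])
    )
    to_delete = archives[:-keep] if keep > 0 else archives
    for name in to_delete:
        os.remove(os.path.join(folder_path, name))
    return to_delete


# Usage after the export loop above:
# for pkey in PROJECT_KEYS_TO_EXPORT:
#     prune_old_archives(folder_path, pkey, keep=5)
```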

Reference documentation#

dataikuapi package#

dataikuapi.dss.project.DSSProject(client, ...)

A handle to interact with a project on the DSS instance.

dataikuapi.dss.project.DSSProjectGit(client, ...)

Handle to manage the git repository of a DSS project (fetch, push, pull, ...).

dataiku package#

dataiku.Project([project_key])

This is a handle to interact with the current project.