Projects#
Projects are the main unit for organising workflows within the Dataiku platform.
Basic operations#
This section provides common examples of how to programmatically manipulate Projects.
Listing Projects#
The main identifier for Projects is the Project Key. The following can be run to access the list of Project Keys on a Dataiku instance:
import dataiku
client = dataiku.api_client()
# Get a list of Project Keys
project_keys = client.list_project_keys()
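On an instance with many Projects it can help to narrow that list down. The following is a minimal sketch, assuming only that `list_project_keys()` returns a list of strings as above. The helper name `find_project_keys` and the idea of filtering by prefix are illustrative and not part of the Dataiku API:

```python
def find_project_keys(client, prefix=""):
    """Return the sorted Project Keys that start with `prefix`.

    `client` is any object exposing list_project_keys(), for example the
    handle returned by dataiku.api_client(). Hypothetical helper, not
    part of the Dataiku API.
    """
    return sorted(k for k in client.list_project_keys() if k.startswith(prefix))
```

For example, `find_project_keys(client, prefix="CHURN")` would return only the keys beginning with `CHURN`.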
Handling an existing Project#
To manipulate a Project and its associated items you first need to get its handle, in the form of a dataikuapi.dss.project.DSSProject object. If the Project already exists on the instance, run:
project = client.get_project("CHURN")
You can also directly get a handle on the current Project you are working on:
project = client.get_default_project()
Creating a new Project#
The following code will create a new empty Project and return its handle:
project = client.create_project(project_key="MYPROJECT",
                                name="My very own project",
                                owner="alice")
You can also duplicate an existing Project and get a handle on its copy:
original_project = client.get_project("CHURN")
copy_result = original_project.duplicate(target_project_key="CHURNCOPY",
                                         target_project_name="Churn (copy)")
project = client.get_project(copy_result.get('targetProjectKey', None))
Finally, you can import a Project archive (zip file) and get a handle on the resulting Project.
The imported Project must not already exist on the instance: its projectKey must be unique.
archive_path = "/path/to/archive.zip"
with open(archive_path, "rb") as f:
    import_result = client.prepare_project_import(f).execute()
# TODO Get handle
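Because a Project Key must be unique on the instance, a script that may run more than once can check for the key before creating anything. The sketch below relies only on calls shown earlier (`list_project_keys()`, `get_project()` and `create_project()`). The name `get_or_create_project` is hypothetical, and the check is not atomic, so it assumes no other process creates the same key at the same time:

```python
def get_or_create_project(client, project_key, name, owner):
    """Return a handle on `project_key`, creating the Project if it is missing.

    Hypothetical helper: it only wraps client calls shown in this section.
    """
    if project_key in client.list_project_keys():
        # The key is already taken: reuse the existing Project.
        return client.get_project(project_key)
    # The key is free: create a new empty Project.
    return client.create_project(project_key=project_key, name=name, owner=owner)
```

Calling it twice with the same key should then return a handle on the same Project rather than attempting a second creation.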
Accessing Project items#
Once your Project handle is created, you can use it to create, list and interact with Project items:
# Print the names of all Datasets in the Project:
for d in project.list_datasets():
    print(d.name)
# Create a new empty Managed Folder:
folder = project.create_managed_folder(name="myfolder")
# Get a handle on a Dataset:
customer_data = project.get_dataset("customers")
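To check that a Dataset is present before requesting a handle on it, the listing call can be reused. This is a minimal sketch, assuming the items returned by `list_datasets()` expose a `name` attribute as in the loop above. `has_dataset` is a hypothetical helper name:

```python
def has_dataset(project, dataset_name):
    """Return True if the Project lists a Dataset called `dataset_name`.

    Hypothetical helper built on project.list_datasets().
    """
    names = {d.name for d in project.list_datasets()}
    return dataset_name in names
```

A typical use would be `if has_dataset(project, "customers"): customer_data = project.get_dataset("customers")`.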
Exporting a Project#
To create a Project export archive and save it locally (i.e. on the Dataiku instance server), run the following:
import os
dir_path = "path/to/your/project/export/directory"
archive_name = f"{project.project_key}.zip"
with project.get_export_stream() as s:
    target = os.path.join(dir_path, archive_name)
    with open(target, "wb") as f:
        for chunk in s.stream(512):
            f.write(chunk)
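The same steps can be wrapped in a reusable function. The sketch below assumes, as in the snippet above, that `get_export_stream()` returns a context manager whose `stream(n)` method yields byte chunks. The function name `export_project`, its `chunk_size` parameter and the creation of the target directory are illustrative choices:

```python
import os

def export_project(project, dir_path, chunk_size=512):
    """Write the Project's export archive into `dir_path` and return its path.

    Hypothetical helper wrapping project.get_export_stream().
    """
    os.makedirs(dir_path, exist_ok=True)  # create the directory if missing
    target = os.path.join(dir_path, f"{project.project_key}.zip")
    with project.get_export_stream() as s, open(target, "wb") as f:
        # Copy the archive chunk by chunk to avoid holding it all in memory.
        for chunk in s.stream(chunk_size):
            f.write(chunk)
    return target
```

Returning the path makes it easy to log or post-process the archive once the export completes.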
Deleting a Project#
To delete a Project and all its associated objects, run the following:
project.delete()
Warning
While the Project’s Dataset objects will be deleted, by default the underlying data will remain. To clear the data as well, set the clear_managed_datasets argument to True. The deletion operation is permanent, so use this method with caution.
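Since deletion cannot be undone, a script may want an explicit confirmation step before calling `delete()`. The following is a sketch under that assumption. The wrapper `delete_project` and its `confirm_key` and `clear_data` parameters are hypothetical, and only the `clear_managed_datasets` argument comes from the API described above:

```python
def delete_project(project, confirm_key, clear_data=False):
    """Delete `project` only if `confirm_key` matches its Project Key.

    Hypothetical safety wrapper around project.delete().
    """
    if confirm_key != project.project_key:
        raise ValueError(
            f"Refusing to delete {project.project_key!r}: "
            f"confirmation key {confirm_key!r} does not match"
        )
    # clear_managed_datasets=True also removes the underlying data.
    project.delete(clear_managed_datasets=clear_data)
```

With this wrapper, `delete_project(project, "CHURN", clear_data=True)` deletes the CHURN Project and its managed data, while a mistyped key raises an error instead of deleting anything.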
Detailed examples#
This section contains more advanced examples on Projects.
Editing Project permissions#
You can programmatically add or change Group permissions for a given Project using the set_permissions() method. In the following example, the ‘readers’ Group is added to the DKU_TSHIRTS Project with read-only permissions:
import dataiku
PROJECT_KEY = "DKU_TSHIRTS"
GROUP = "readers"
client = dataiku.api_client()
project = client.get_project(PROJECT_KEY)
permissions = project.get_permissions()
new_perm = {
    "group": GROUP,
    "admin": False,
    "executeApp": False,
    "exportDatasetsData": False,
    "manageAdditionalDashboardUsers": False,
    "manageDashboardAuthorizations": False,
    "manageExposedElements": False,
    "moderateDashboards": False,
    "readDashboards": True,
    "readProjectContent": True,
    "runScenarios": False,
    "shareToWorkspaces": False,
    "writeDashboards": False,
    "writeProjectContent": False
}
permissions["permissions"].append(new_perm)
project.set_permissions(permissions)
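Appending unconditionally adds a second entry when the Group is already listed for the Project. One way to avoid that is to replace an existing entry instead. The sketch below assumes the structure used above, namely a dict with a "permissions" list whose Group entries carry a "group" key. `upsert_group_permission` is a hypothetical helper that only manipulates that dict and calls nothing on the API:

```python
def upsert_group_permission(permissions, new_perm):
    """Replace the entry for new_perm["group"] if present, else append it.

    `permissions` is the dict returned by project.get_permissions().
    Hypothetical helper: it edits the dict in place and returns it.
    """
    items = permissions["permissions"]
    for i, perm in enumerate(items):
        if perm.get("group") == new_perm["group"]:
            items[i] = new_perm  # overwrite the existing Group entry
            break
    else:
        items.append(new_perm)  # no entry for this Group yet
    return permissions
```

The result can then be passed to `project.set_permissions()` exactly as in the example above.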
Creating a Project with custom settings#
You can add pre-built properties to your Projects when creating them using the API. This example illustrates how to generate a Project and define the following properties:
name
description
tags
status
checklist
First, create a helper function to generate the checklist:
from datetime import datetime
def create_checklist(author, items):
    # Build a checklist with one unchecked entry per item
    checklist = {
        "title": "To-do list",
        "createdOn": 0,
        "items": []
    }
    for item in items:
        checklist["items"].append({
            "createdBy": author,
            "createdOn": int(datetime.now().timestamp()),
            "done": False,
            "stateChangedOn": 0,
            "text": item
        })
    return checklist
You can now write the creation function, which wraps the create_project() method and returns a handle to the newly-created Project:
def create_custom_project(client,
                          project_key,
                          name,
                          custom_tags,
                          description,
                          checklist_items):
    current_user = client.get_auth_info()["authIdentifier"]
    project = client.create_project(project_key=project_key,
                                    name=name,
                                    owner=current_user,
                                    description=description)
    # Add tags
    tags = project.get_tags()
    tags["tags"] = {k: {} for k in custom_tags}
    project.set_tags(tags)
    # Add checklist
    metadata = project.get_metadata()
    metadata["checklists"]["checklists"].append(create_checklist(author=current_user,
                                                                 items=checklist_items))
    project.set_metadata(metadata)
    # Set default status to "Draft"
    settings = project.get_settings()
    settings.settings["projectStatus"] = "Draft"
    settings.save()
    return project
This is how you would call this function:
client = dataiku.api_client()
tags = ["work-in-progress", "machine-learning", "priority-high"]
checklist = [
    "Connect to data sources",
    "Clean, aggregate and join data",
    "Train ML model",
    "Evaluate ML model",
    "Deploy ML model to production"
]
project = create_custom_project(client=client,
                                project_key="MYPROJECT",
                                name="A custom Project",
                                custom_tags=tags,
                                description="This is a cool Project",
                                checklist_items=checklist)
Export multiple Projects at once#
If, instead of exporting a single Project, you want to export several Projects in one go and store the resulting archives in a local Managed Folder, you can extend the usage of get_export_stream() as in the following example:
import dataiku
import os
from datetime import datetime
PROJECT_KEY = "BACKUP_PROJECTS"
FOLDER_NAME = "exports"
PROJECT_KEYS_TO_EXPORT = ["FOO", "BAR"]
# Generate timestamp (e.g. 20221201-123000)
ts = datetime \
    .now() \
    .strftime("%Y%m%d-%H%M%S")
client = dataiku.api_client()
project = client.get_project(PROJECT_KEY)
folder_path = dataiku.Folder(FOLDER_NAME) \
    .get_path()
for pkey in PROJECT_KEYS_TO_EXPORT:
    zip_name = f"{pkey}-{ts}.zip"
    pkey_project = client.get_project(pkey)
    with pkey_project.get_export_stream() as es:
        target = os.path.join(folder_path, zip_name)
        with open(target, "wb") as f:
            for chunk in es.stream(512):
                f.write(chunk)
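The archive naming used in this loop can be isolated into a small function, which makes the format easy to check without touching a Dataiku instance. This is a sketch only: `archive_name` and its optional `when` parameter are hypothetical names, and the timestamp format is the one from the example above:

```python
from datetime import datetime

def archive_name(project_key, when=None):
    """Return '<project_key>-<YYYYmmdd-HHMMSS>.zip' for the given moment.

    Hypothetical helper; `when` defaults to the current local time.
    """
    when = when or datetime.now()
    return f"{project_key}-{when.strftime('%Y%m%d-%H%M%S')}.zip"

# A fixed date gives a predictable name:
print(archive_name("FOO", datetime(2022, 12, 1, 12, 30, 0)))  # FOO-20221201-123000.zip
```

Passing the same `when` value for every Project keeps all archives from one run under a single timestamp, as the loop above does with `ts`.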
Reference documentation#
dataikuapi package#
| A handle to interact with a project on the DSS instance. |
| Handle to manage the git repository of a DSS project (fetch, push, pull, ...) |
dataiku package#
| This is a handle to interact with the current project |