Project deployment#

This tutorial walks you through deploying a project. You will create a bundle, publish it to the Project Deployer, choose the infrastructure to use, and deploy the bundle to it:

  • The bundle will represent the state of the project as you want to use it.

  • The deployment infrastructures group the resources and settings used to deploy the bundle.

  • The Project Deployer is the tool that carries out the publication and deployment.

Prerequisites#

  • Dataiku >= 13.4

  • Python >= 3.10

Introduction#

In the life of a project, you will reach a point where it is considered ready for the next steps. Depending on the goals and topics of the project, you may have some or all of the following:

  • well-documented workflows

  • optimized data pipelines

  • quality rules and/or checks

  • agents adding value to the data

  • dashboards and webapps

The project will now need to be deployed to a QA or preproduction instance to prepare for the launch in production.

For this tutorial, you will need a project to interact with. You can use one of your own, or use one of the many projects created in other tutorials from this developer guide.

You have multiple possibilities for developing and testing the code.

Exporting a bundle#

The first step in deploying your project is to export a bundle. A bundle is a snapshot of a project’s state at a specific point in time, including the data required to recompute tasks. You can find more in-depth information about bundles in the documentation.

import dataiku

# Get the API client to interact with Dataiku
client = dataiku.api_client()

# Export a bundle for your project
PROJECT_KEY = "" # Fill with your project key
project = client.get_project(PROJECT_KEY)
BUNDLE_ID = "" # Fill with the unique identifier for the bundle
release_notes = "" # Indicates the changes introduced by this bundle
bundle = project.export_bundle(BUNDLE_ID, release_notes)
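
Since each bundle needs a unique identifier, one option is to derive it from a version label and a UTC timestamp. The helper below is a minimal sketch: `make_bundle_id` and its naming scheme are illustrative assumptions, not part of the Dataiku API, and you should check that the resulting identifiers fit your own conventions.

```python
from datetime import datetime, timezone


def make_bundle_id(version, now=None):
    """Build a bundle identifier from a version label and a UTC timestamp.

    Hypothetical naming scheme: '<version>-YYYYMMDD-HHMMSS'.
    """
    if not version:
        raise ValueError("version must be a non-empty string")
    # Default to the current UTC time so identifiers sort chronologically
    now = now or datetime.now(timezone.utc)
    return f"{version}-{now.strftime('%Y%m%d-%H%M%S')}"
```

You could then write `BUNDLE_ID = make_bundle_id("v1")` before calling `export_bundle()`.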

Publishing the bundle#

Once the bundle is available in the project, you need to publish it to the Project Deployer. The Project Deployer centralizes bundles, deployments, and deployment infrastructures.

# Publish the bundle in a published project
PUBLISHED_PROJECT_ID = "" # Fill with the identifier of the published project
published_project = project.publish_bundle(BUNDLE_ID, PUBLISHED_PROJECT_ID)

Note

Using the method publish_bundle() creates a new published project when the specified PUBLISHED_PROJECT_ID doesn’t exist. If you prefer to create a published project before using it, you can refer to this code snippet for guidance.
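
As a rough sketch of creating the published project beforehand, the helper below looks for an existing published project and creates one only when it is missing. It assumes the Project Deployer handle (obtained with `client.get_projectdeployer()`) provides `list_projects()`, `get_project()`, and `create_project()`, and that the listed handles expose an `id` property; verify these against the API reference for your Dataiku version. The name `ensure_published_project` is hypothetical.

```python
def ensure_published_project(project_deployer, published_project_id):
    """Return a handle on the published project, creating it if it is missing.

    Assumes the handles returned by list_projects() expose an `id` property.
    """
    existing_ids = {p.id for p in project_deployer.list_projects(as_objects=True)}
    if published_project_id in existing_ids:
        # The published project already exists: reuse it
        return project_deployer.get_project(published_project_id)
    # Otherwise create it on the Project Deployer
    return project_deployer.create_project(published_project_id)
```

Passing the Project Deployer handle as an argument keeps the helper independent of how the client was created.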

Choosing an infrastructure#

The bundle will be deployed on a deployment infrastructure. The deployment infrastructure is a mechanism for organizing the resources and settings used during bundle deployment. You will find more details in the Deployment infrastructures documentation.

The choice depends on how you organize your Dataiku projects and infrastructure, but you can use this code snippet to list the existing deployment infrastructure identifiers.
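
One possible way to list the infrastructure identifiers is sketched below. It assumes the Project Deployer handle provides `list_infras()` and that each returned handle exposes an `id` property; confirm both against the API reference for your Dataiku version. The function name `list_infra_ids` is hypothetical.

```python
def list_infra_ids(project_deployer):
    """Return the identifiers of the deployment infrastructures.

    Assumes each handle returned by list_infras() exposes an `id` property.
    """
    return [infra.id for infra in project_deployer.list_infras(as_objects=True)]


# Example usage on a Dataiku instance (not executed here):
# project_deployer = client.get_projectdeployer()
# print(list_infra_ids(project_deployer))
```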

Creating a deployment#

With the bundle and infrastructure identifiers in hand, you can now use the create_deployment() method of the DSSProjectDeployer class to create a deployment. This action will push all the necessary information to the targeted deployment infrastructure.

# Get the project deployer to deploy and control deployment
project_deployer = client.get_projectdeployer()

INFRA_ID = "" # Fill with the infrastructure ID you chose
DEPLOYMENT_ID = "" # Fill with your deployment ID
deployment = project_deployer.create_deployment(
    deployment_id=DEPLOYMENT_ID,
    project_key=PUBLISHED_PROJECT_ID,
    infra_id=INFRA_ID,
    bundle_id=BUNDLE_ID,
)

# Start the deployment
update = deployment.start_update()
update.wait_for_result()
print(f"Deployment state: {update.state}")
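
The deployment steps above can also be wrapped in a function that rejects empty placeholders before contacting the Project Deployer. This is a minimal sketch that uses only the calls shown in this tutorial; the function name `deploy_bundle` and the validation logic are illustrative assumptions.

```python
def deploy_bundle(project_deployer, deployment_id, published_project_id, infra_id, bundle_id):
    """Create a deployment, run its update, and return the final update state."""
    settings = {
        "deployment_id": deployment_id,
        "published_project_id": published_project_id,
        "infra_id": infra_id,
        "bundle_id": bundle_id,
    }
    # Fail early if any placeholder was left empty
    missing = [name for name, value in settings.items() if not value]
    if missing:
        raise ValueError(f"Missing values for: {', '.join(missing)}")

    deployment = project_deployer.create_deployment(
        deployment_id=deployment_id,
        project_key=published_project_id,
        infra_id=infra_id,
        bundle_id=bundle_id,
    )
    # Start the update and block until it finishes
    update = deployment.start_update()
    update.wait_for_result()
    return update.state
```

Failing early on an empty identifier avoids sending an incomplete request to the Project Deployer.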

Complete code#

Here is the complete code for this tutorial:

deploy.py
import dataiku

# Get the API client to interact with Dataiku
client = dataiku.api_client()

# Export a bundle for your project
PROJECT_KEY = "" # Fill with your project key
project = client.get_project(PROJECT_KEY)
BUNDLE_ID = "" # Fill with the unique identifier for the bundle
release_notes = "" # Indicates the changes introduced by this bundle
bundle = project.export_bundle(BUNDLE_ID, release_notes)

# Publish the bundle in a published project
PUBLISHED_PROJECT_ID = "" # Fill with the identifier of the published project
published_project = project.publish_bundle(BUNDLE_ID, PUBLISHED_PROJECT_ID)

# Get the project deployer to deploy and control deployment
project_deployer = client.get_projectdeployer()

INFRA_ID = "" # Fill with the infrastructure ID you chose
DEPLOYMENT_ID = "" # Fill with your deployment ID
deployment = project_deployer.create_deployment(
    deployment_id=DEPLOYMENT_ID,
    project_key=PUBLISHED_PROJECT_ID,
    infra_id=INFRA_ID,
    bundle_id=BUNDLE_ID,
)

# Start the deployment
update = deployment.start_update()
update.wait_for_result()
print(f"Deployment state: {update.state}")

Wrapping up#

Congratulations! You are now able to deploy projects. For additional information, refer to the documentation on Production deployments and bundles.

Reference documentation#

Classes#

dataikuapi.DSSClient(host[, api_key, ...])

Entry point for the DSS API client

dataikuapi.dss.project.DSSProject(client, ...)

A handle to interact with a project on the DSS instance.

dataikuapi.dss.projectdeployer.DSSProjectDeployer(client)

Handle to interact with the Project Deployer.

dataikuapi.dss.projectdeployer.DSSProjectDeployerDeployment(...)

A deployment on the Project Deployer.

Functions#

create_deployment(deployment_id, ...[, ...])

Create a deployment and return the handle to interact with it.

export_bundle(bundle_id[, release_notes, ...])

Creates a new project bundle on the Design node

get_project(project_key)

Get a handle to interact with a specific project.

get_projectdeployer()

Gets a handle to work with the Project Deployer

publish_bundle(bundle_id[, ...])

Publish a bundle on the Project Deployer.

start_update()

Start an asynchronous update of this deployment.