Projects#

For usage information and examples, please see Projects.

class dataikuapi.dss.project.DSSProject(client, project_key)#

A handle to interact with a project on the DSS instance.

Important

Do not create this class directly, instead use dataikuapi.DSSClient.get_project()

get_summary()#

Returns a summary of the project. The summary is a read-only view of some of the state of the project. You cannot edit the resulting dict and use it to update the project state on DSS; use the other, more specific methods of this dataikuapi.dss.project.DSSProject object instead.

Returns:

a dict containing a summary of the project. The dict contains at least a projectKey field

Return type:

dict

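Usage example (a minimal sketch; project is assumed to be a DSSProject handle obtained via dataikuapi.DSSClient.get_project()):

summary = project.get_summary()
print(summary["projectKey"])  # projectKey is always present in the summary
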
get_project_folder()#

Get the folder containing this project

Return type:

dataikuapi.dss.projectfolder.DSSProjectFolder

move_to_folder(folder)#

Moves this project to a project folder

Parameters:

folder (dataikuapi.dss.projectfolder.DSSProjectFolder) – destination folder

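Usage example (a minimal sketch; the destination folder is assumed to be retrieved from the client, here the root project folder):

root_folder = client.get_root_project_folder()
project.move_to_folder(root_folder)
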
delete(clear_managed_datasets=False, clear_output_managed_folders=False, clear_job_and_scenario_logs=True, **kwargs)#

Delete the project

Attention

This call requires an API key with admin rights

Parameters:
  • clear_managed_datasets (bool) – Should the data of managed datasets be cleared (defaults to False)

  • clear_output_managed_folders (bool) – Should the data of managed folders used as outputs of recipes be cleared (defaults to False)

  • clear_job_and_scenario_logs (bool) – Should the job and scenario logs be cleared (defaults to True)

get_export_stream(options=None)#

Return a stream of the exported project

Warning

You need to close the stream after download. Failure to do so will result in the DSSClient becoming unusable.

Parameters:

options (dict) –

Dictionary of export options (defaults to {}). The following options are available:

  • exportUploads (boolean): Exports the data of Uploaded datasets (defaults to False)

  • exportManagedFS (boolean): Exports the data of managed Filesystem datasets (defaults to False)

  • exportAnalysisModels (boolean): Exports the models trained in analysis (defaults to False)

  • exportSavedModels (boolean): Exports the models trained in saved models (defaults to False)

  • exportManagedFolders (boolean): Exports the data of managed folders (defaults to False)

  • exportAllInputDatasets (boolean): Exports the data of all input datasets (defaults to False)

  • exportAllDatasets (boolean): Exports the data of all datasets (defaults to False)

  • exportAllInputManagedFolders (boolean): Exports the data of all input managed folders (defaults to False)

  • exportGitRepository (boolean): Exports the Git repository history (defaults to False)

  • exportInsightsData (boolean): Exports the data of static insights (defaults to False)

Returns:

a stream of the export archive

Return type:

file-like object

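Usage example (a minimal sketch showing how to save the export and always close the stream; the target path is illustrative):

import shutil

stream = project.get_export_stream(options={"exportUploads": True})
try:
    with open("/tmp/myproject_export.zip", "wb") as f:
        shutil.copyfileobj(stream, f)
finally:
    # closing the stream is mandatory, otherwise the DSSClient becomes unusable
    stream.close()
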
export_to_file(path, options=None)#

Export the project to a file

Parameters:
  • path (str) – the path of the file in which the exported project should be saved

  • options (dict) –

    Dictionary of export options (defaults to {}). The following options are available:

    • exportUploads (boolean): Exports the data of Uploaded datasets (defaults to False)

    • exportManagedFS (boolean): Exports the data of managed Filesystem datasets (defaults to False)

    • exportAnalysisModels (boolean): Exports the models trained in analysis (defaults to False)

    • exportSavedModels (boolean): Exports the models trained in saved models (defaults to False)

    • exportModelEvaluationStores (boolean): Exports the evaluation stores (defaults to False)

    • exportManagedFolders (boolean): Exports the data of managed folders (defaults to False)

    • exportAllInputDatasets (boolean): Exports the data of all input datasets (defaults to False)

    • exportAllDatasets (boolean): Exports the data of all datasets (defaults to False)

    • exportAllInputManagedFolders (boolean): Exports the data of all input managed folders (defaults to False)

    • exportGitRepository (boolean): Exports the Git repository history (defaults to False)

    • exportInsightsData (boolean): Exports the data of static insights (defaults to False)

duplicate(target_project_key, target_project_name, duplication_mode='MINIMAL', export_analysis_models=True, export_saved_models=True, export_git_repository=True, export_insights_data=True, remapping=None, target_project_folder=None)#

Duplicate the project

Parameters:
  • target_project_key (str) – The key of the new project

  • target_project_name (str) – The name of the new project

  • duplication_mode (str) – can be one of the following values: MINIMAL, SHARING, FULL, NONE (defaults to MINIMAL)

  • export_analysis_models (bool) – (defaults to True)

  • export_saved_models (bool) – (defaults to True)

  • export_git_repository (bool) – (defaults to True)

  • export_insights_data (bool) – (defaults to True)

  • remapping (dict) – dict of connections to be remapped for the new project (defaults to {})

  • target_project_folder (A dataikuapi.dss.projectfolder.DSSProjectFolder) – the project folder where to put the duplicated project (defaults to None)

Returns:

A dict containing the original and duplicated project’s keys

Return type:

dict

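Usage example (a minimal sketch; the target key and name are illustrative):

result = project.duplicate("MYPROJECT_COPY", "My project (copy)")
print(result)  # contains the original and duplicated project keys
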
get_metadata()#

Get the metadata attached to this project. The metadata contains the label, description, checklists, tags and custom metadata of the project.

Note

For more information on available metadata, please see https://doc.dataiku.com/dss/api/latest/rest/

Returns:

the project metadata.

Return type:

dict

set_metadata(metadata)#

Set the metadata on this project.

Usage example:

project_metadata = project.get_metadata()
project_metadata['tags'] = ['tag1','tag2']
project.set_metadata(project_metadata)
Parameters:

metadata (dict) – the new state of the metadata for the project. You should only set a metadata object that has been retrieved using the get_metadata() call.

get_settings()#

Gets the settings of this project. This does not contain permissions. See get_permissions()

Returns:

a handle to read, modify and save the settings

Return type:

dataikuapi.dss.project.DSSProjectSettings

get_permissions()#

Get the permissions attached to this project

Returns:

A dict containing the owner and the permissions, as a list of pairs of group name and permission type

Return type:

dict

set_permissions(permissions)#

Sets the permissions on this project

Usage example:

project_permissions = project.get_permissions()
project_permissions['permissions'].append({'group':'data_scientists',
                                            'readProjectContent': True,
                                            'readDashboards': True})
project.set_permissions(project_permissions)
Parameters:

permissions (dict) – a permissions object with the same structure as the one returned by get_permissions() call

get_interest()#

Get the interest of this project. The interest consists of the number of watchers and the number of stars.

Returns:

a dict object containing the interest of the project with two fields:

  • starCount: number of stars for this project

  • watchCount: number of users watching this project

Return type:

dict

get_timeline(item_count=100)#

Get the timeline of this project. The timeline consists of information about the creation of this project (by whom, and when), the last modification of this project (by whom and when), a list of contributors, and a list of modifications. This list of modifications contains at most item_count elements (defaults to 100). If item_count is greater than the actual number of modifications, item_count is adjusted.

Parameters:

item_count (int) – maximum number of modifications to retrieve in the items list

Returns:

a timeline where the top-level fields are :

  • allContributors: all contributors who have been involved in this project

  • items: a history of the modifications of the project

  • createdBy: who created this project

  • createdOn: when the project was created

  • lastModifiedBy: who modified this project for the last time

  • lastModifiedOn: when this modification took place

Return type:

dict

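Usage example (a minimal sketch using the fields listed above):

timeline = project.get_timeline(item_count=10)
print(timeline["createdBy"], timeline["createdOn"])
for modification in timeline["items"]:
    print(modification)
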
list_datasets(as_type='listitems')#

List the datasets in this project.

Parameters:

as_type (str) – How to return the list. Supported values are “listitems” and “objects” (defaults to listitems).

Returns:

The list of the datasets. If “as_type” is “listitems”, each one as a dataikuapi.dss.dataset.DSSDatasetListItem. If “as_type” is “objects”, each one as a dataikuapi.dss.dataset.DSSDataset

Return type:

list

get_dataset(dataset_name)#

Get a handle to interact with a specific dataset

Parameters:

dataset_name (str) – the name of the desired dataset

Returns:

A dataset handle

Return type:

dataikuapi.dss.dataset.DSSDataset

create_dataset(dataset_name, type, params=None, formatType=None, formatParams=None)#

Create a new dataset in the project, and return a handle to interact with it.

The precise structure of params and formatParams depends on the specific dataset type and dataset format type. To know which fields exist for a given dataset type and format type, create a dataset from the UI, and use get_dataset() to retrieve the configuration of the dataset and inspect it. Then reproduce a similar structure in the create_dataset() call.

Not all settings of a dataset can be set at creation time (for example partitioning). After creation, you’ll have the ability to modify the dataset

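Usage example (a sketch of the inspection workflow described above; "existing_dataset" is a hypothetical dataset created from the UI, and the raw settings are assumed to expose type, params, formatType and formatParams fields):

existing = project.get_dataset("existing_dataset")
raw = existing.get_settings().get_raw()  # inspect params and formatParams here
dataset = project.create_dataset("my_new_dataset", raw["type"],
                                 params=raw["params"],
                                 formatType=raw.get("formatType"),
                                 formatParams=raw.get("formatParams"))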
Parameters:
  • dataset_name (str) – the name of the dataset to create. Must not already exist

  • type (str) – the type of the dataset

  • params (dict) – the parameters for the type, as a python dict (defaults to {})

  • formatType (str) – an optional format to create the dataset with (only for file-oriented datasets)

  • formatParams (dict) – the parameters to the format, as a python dict (only for file-oriented datasets, default to {})

Returns:

A dataset handle

Return type:

dataikuapi.dss.dataset.DSSDataset

create_upload_dataset(dataset_name, connection=None)#

Create a new dataset of type ‘UploadedFiles’ in the project, and return a handle to interact with it.

Parameters:
  • dataset_name (str) – the name of the dataset to create. Must not already exist

  • connection (str) – the name of the upload connection (defaults to None)

Returns:

A dataset handle

Return type:

dataikuapi.dss.dataset.DSSDataset

create_filesystem_dataset(dataset_name, connection, path_in_connection)#

Create a new filesystem dataset in the project, and return a handle to interact with it.

Parameters:
  • dataset_name (str) – the name of the dataset to create. Must not already exist

  • connection (str) – the name of the connection

  • path_in_connection (str) – the path of the dataset in the connection

Returns:

A dataset handle

Return type:

dataikuapi.dss.dataset.DSSDataset

create_s3_dataset(dataset_name, connection, path_in_connection, bucket=None)#

Creates a new external S3 dataset in the project and returns a dataikuapi.dss.dataset.DSSDataset to interact with it.

The created dataset does not have its format and schema initialized; it is recommended to use autodetect_settings() on the returned object

Parameters:
  • dataset_name (str) – the name of the dataset to create. Must not already exist

  • connection (str) – the name of the connection

  • path_in_connection (str) – the path of the dataset in the connection

  • bucket (str) – the name of the s3 bucket (defaults to None)

Returns:

A dataset handle

Return type:

dataikuapi.dss.dataset.DSSDataset

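Usage example (a minimal sketch; connection, path and bucket names are illustrative):

dataset = project.create_s3_dataset("my_s3_dataset", "my_s3_connection", "/input/data/", bucket="my-bucket")
settings = dataset.autodetect_settings()  # as recommended above, detect format and schema
settings.save()
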
create_gcs_dataset(dataset_name, connection, path_in_connection, bucket=None)#

Creates a new external GCS dataset in the project and returns a dataikuapi.dss.dataset.DSSDataset to interact with it.

The created dataset does not have its format and schema initialized; it is recommended to use autodetect_settings() on the returned object

Parameters:
  • dataset_name (str) – the name of the dataset to create. Must not already exist

  • connection (str) – the name of the connection

  • path_in_connection (str) – the path of the dataset in the connection

  • bucket (str) – the name of the GCS bucket (defaults to None)

Returns:

A dataset handle

Return type:

dataikuapi.dss.dataset.DSSDataset

create_azure_blob_dataset(dataset_name, connection, path_in_connection, container=None)#

Creates a new external Azure dataset in the project and returns a dataikuapi.dss.dataset.DSSDataset to interact with it.

The created dataset does not have its format and schema initialized; it is recommended to use autodetect_settings() on the returned object

Parameters:
  • dataset_name (str) – the name of the dataset to create. Must not already exist

  • connection (str) – the name of the connection

  • path_in_connection (str) – the path of the dataset in the connection

  • container (str) – the name of the storage account container (defaults to None)

Returns:

A dataset handle

Return type:

dataikuapi.dss.dataset.DSSDataset

create_fslike_dataset(dataset_name, dataset_type, connection, path_in_connection, extra_params=None)#

Create a new file-based dataset in the project, and return a handle to interact with it.

Parameters:
  • dataset_name (str) – the name of the dataset to create. Must not already exist

  • dataset_type (str) – the type of the dataset

  • connection (str) – the name of the connection

  • path_in_connection (str) – the path of the dataset in the connection

  • extra_params (dict) – a python dict of extra parameters (defaults to None)

Returns:

A dataset handle

Return type:

dataikuapi.dss.dataset.DSSDataset

create_sql_table_dataset(dataset_name, type, connection, table, schema, catalog=None)#

Create a new SQL table dataset in the project, and return a handle to interact with it.

Parameters:
  • dataset_name (str) – the name of the dataset to create. Must not already exist

  • type (str) – the type of the dataset

  • connection (str) – the name of the connection

  • table (str) – the name of the table in the connection

  • schema (str) – the schema of the table

  • catalog (str) – [optional] the catalog of the table

Returns:

A dataset handle

Return type:

dataikuapi.dss.dataset.DSSDataset

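Usage example (a minimal sketch; the dataset type and connection name are illustrative and depend on your DSS connections):

dataset = project.create_sql_table_dataset("my_table_dataset", "PostgreSQL",
                                           "my_postgresql_connection", "my_table", "public")
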
new_managed_dataset_creation_helper(dataset_name)#

Caution

Deprecated. Please use new_managed_dataset()

new_managed_dataset(dataset_name)#

Initializes the creation of a new managed dataset. Returns a dataikuapi.dss.dataset.DSSManagedDatasetCreationHelper or one of its subclasses to complete the creation of the managed dataset.

Usage example:

builder = project.new_managed_dataset("my_dataset")
builder.with_store_into("target_connection")
dataset = builder.create()
Parameters:

dataset_name (str) – Name of the dataset to create

Returns:

An object to create the managed dataset

Return type:

dataikuapi.dss.dataset.DSSManagedDatasetCreationHelper

get_labeling_task(labeling_task_id)#

Get a handle to interact with a specific labeling task

Parameters:

labeling_task_id (str) – the id of the desired labeling task

Returns:

A labeling task handle

Return type:

dataikuapi.dss.labeling_task.DSSLabelingTask

list_streaming_endpoints(as_type='listitems')#

List the streaming endpoints in this project.

Parameters:

as_type (str) – How to return the list. Supported values are “listitems” and “objects” (defaults to listitems).

Returns:

The list of the streaming endpoints. If “as_type” is “listitems”, each one as a dataikuapi.dss.streaming_endpoint.DSSStreamingEndpointListItem. If “as_type” is “objects”, each one as a dataikuapi.dss.streaming_endpoint.DSSStreamingEndpoint

Return type:

list

get_streaming_endpoint(streaming_endpoint_name)#

Get a handle to interact with a specific streaming endpoint

Parameters:

streaming_endpoint_name (str) – the name of the desired streaming endpoint

Returns:

A streaming endpoint handle

Return type:

dataikuapi.dss.streaming_endpoint.DSSStreamingEndpoint

create_streaming_endpoint(streaming_endpoint_name, type, params=None)#

Create a new streaming endpoint in the project, and return a handle to interact with it.

The precise structure of params depends on the specific streaming endpoint type. To know which fields exist for a given streaming endpoint type, create a streaming endpoint from the UI, and use get_streaming_endpoint() to retrieve the configuration of the streaming endpoint and inspect it. Then reproduce a similar structure in the create_streaming_endpoint() call.

Not all settings of a streaming endpoint can be set at creation time (for example partitioning). After creation, you’ll have the ability to modify the streaming endpoint.

Parameters:
  • streaming_endpoint_name (str) – the name for the new streaming endpoint

  • type (str) – the type of the streaming endpoint

  • params (dict) – the parameters for the type, as a python dict (defaults to {})

Returns:

A streaming endpoint handle

Return type:

dataikuapi.dss.streaming_endpoint.DSSStreamingEndpoint

create_kafka_streaming_endpoint(streaming_endpoint_name, connection=None, topic=None)#

Create a new kafka streaming endpoint in the project, and return a handle to interact with it.

Parameters:
  • streaming_endpoint_name (str) – the name for the new streaming endpoint

  • connection (str) – the name of the kafka connection (defaults to None)

  • topic (str) – the name of the kafka topic (defaults to None)

Returns:

A streaming endpoint handle

Return type:

dataikuapi.dss.streaming_endpoint.DSSStreamingEndpoint

create_httpsse_streaming_endpoint(streaming_endpoint_name, url=None)#

Create a new https streaming endpoint in the project, and return a handle to interact with it.

Parameters:
  • streaming_endpoint_name (str) – the name for the new streaming endpoint

  • url (str) – the url of the endpoint (defaults to None)

Returns:

A streaming endpoint handle

Return type:

dataikuapi.dss.streaming_endpoint.DSSStreamingEndpoint

new_managed_streaming_endpoint(streaming_endpoint_name, streaming_endpoint_type=None)#

Initializes the creation of a new streaming endpoint. Returns a dataikuapi.dss.streaming_endpoint.DSSManagedStreamingEndpointCreationHelper to complete the creation of the streaming endpoint

Parameters:
  • streaming_endpoint_name (str) – Name of the new streaming endpoint - must be unique in the project

  • streaming_endpoint_type (str) – Type of the new streaming endpoint (optional if it can be inferred from a connection type)

Returns:

An object to create the streaming endpoint

Return type:

DSSManagedStreamingEndpointCreationHelper

create_prediction_ml_task(input_dataset, target_variable, ml_backend_type='PY_MEMORY', guess_policy='DEFAULT', prediction_type=None, wait_guess_complete=True)#

Creates a new prediction task in a new visual analysis lab for a dataset.

Parameters:
  • input_dataset (str) – the dataset to use for training/testing the model

  • target_variable (str) – the variable to predict

  • ml_backend_type (str) – ML backend to use, one of PY_MEMORY, MLLIB or H2O (defaults to PY_MEMORY)

  • guess_policy (str) – Policy to use for setting the default parameters. Valid values are: DEFAULT, SIMPLE_FORMULA, DECISION_TREE, EXPLANATORY and PERFORMANCE (defaults to DEFAULT)

  • prediction_type (str) – The type of prediction problem this is. If not provided the prediction type will be guessed. Valid values are: BINARY_CLASSIFICATION, REGRESSION, MULTICLASS (defaults to None)

  • wait_guess_complete (boolean) – if False, the returned ML task will be in ‘guessing’ state, i.e. analyzing the input dataset to determine feature handling and algorithms (defaults to True). You should wait for the guessing to be completed by calling wait_guess_complete on the returned object before doing anything else (in particular calling train or get_settings)

Returns:

A ML task handle of type ‘PREDICTION’

Return type:

dataikuapi.dss.ml.DSSMLTask

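Usage example (a minimal sketch; dataset and column names are illustrative):

mltask = project.create_prediction_ml_task("customers_labeled", "churn")
mltask.train()  # trains with the guessed settings and waits for completion
print(mltask.get_trained_models_ids())
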
create_clustering_ml_task(input_dataset, ml_backend_type='PY_MEMORY', guess_policy='KMEANS', wait_guess_complete=True)#

Creates a new clustering task in a new visual analysis lab for a dataset.

The returned ML task will be in ‘guessing’ state, i.e. analyzing the input dataset to determine feature handling and algorithms.

You should wait for the guessing to be completed by calling wait_guess_complete on the returned object before doing anything else (in particular calling train or get_settings)

Parameters:
  • ml_backend_type (str) – ML backend to use, one of PY_MEMORY, MLLIB or H2O (defaults to PY_MEMORY)

  • guess_policy (str) – Policy to use for setting the default parameters. Valid values are: KMEANS and ANOMALY_DETECTION (defaults to KMEANS)

  • wait_guess_complete (boolean) – if False, the returned ML task will be in ‘guessing’ state, i.e. analyzing the input dataset to determine feature handling and algorithms (defaults to True). You should wait for the guessing to be completed by calling wait_guess_complete on the returned object before doing anything else (in particular calling train or get_settings)

Returns:

A ML task handle of type ‘CLUSTERING’

Return type:

dataikuapi.dss.ml.DSSMLTask

create_timeseries_forecasting_ml_task(input_dataset, target_variable, time_variable, timeseries_identifiers=None, guess_policy='TIMESERIES_DEFAULT', wait_guess_complete=True)#

Creates a new time series forecasting task in a new visual analysis lab for a dataset.

Parameters:
  • input_dataset (string) – The dataset to use for training/testing the model

  • target_variable (string) – The variable to forecast

  • time_variable (string) – Column to be used as time variable. Should be a Date (parsed) column.

  • timeseries_identifiers (list) – List of columns to be used as time series identifiers (when the dataset has multiple series)

  • guess_policy (string) – Policy to use for setting the default parameters. Valid values are: TIMESERIES_DEFAULT, TIMESERIES_STATISTICAL, and TIMESERIES_DEEP_LEARNING

  • wait_guess_complete (boolean) – If False, the returned ML task will be in ‘guessing’ state, i.e. analyzing the input dataset to determine feature handling and algorithms. You should wait for the guessing to be completed by calling wait_guess_complete on the returned object before doing anything else (in particular calling train or get_settings)

Returns:

dataikuapi.dss.ml.DSSMLTask

create_causal_prediction_ml_task(input_dataset, outcome_variable, treatment_variable, prediction_type=None, wait_guess_complete=True)#

Creates a new causal prediction task in a new visual analysis lab for a dataset.

Parameters:
  • input_dataset (string) – The dataset to use for training/testing the model

  • outcome_variable (string) – The outcome to predict.

  • treatment_variable (string) – Column to be used as treatment variable.

  • prediction_type (string or None) – Valid values are: “CAUSAL_BINARY_CLASSIFICATION”, “CAUSAL_REGRESSION” or None (in this case prediction_type will be set by the Guesser)

  • wait_guess_complete (boolean) – If False, the returned ML task will be in ‘guessing’ state, i.e. analyzing the input dataset to determine feature handling and algorithms. You should wait for the guessing to be completed by calling wait_guess_complete on the returned object before doing anything else (in particular calling train or get_settings)

Returns:

dataikuapi.dss.ml.DSSMLTask

list_ml_tasks()#

List the ML tasks in this project

Returns:

the list of the ML tasks summaries, each one as a python dict

Return type:

list

get_ml_task(analysis_id, mltask_id)#

Get a handle to interact with a specific ML task

Parameters:
  • analysis_id (str) – the identifier of the visual analysis containing the desired ML task

  • mltask_id (str) – the identifier of the desired ML task

Returns:

A ML task handle

Return type:

dataikuapi.dss.ml.DSSMLTask

list_mltask_queues()#

List non-empty ML task queues in this project

Returns:

an iterable listing of MLTask queues (each a dict)

Return type:

dataikuapi.dss.ml.DSSMLTaskQueues

create_analysis(input_dataset)#

Creates a new visual analysis lab for a dataset.

Parameters:

input_dataset (str) – the dataset to use for the analysis

Returns:

A visual analysis handle

Return type:

dataikuapi.dss.analysis.DSSAnalysis

list_analyses()#

List the visual analyses in this project

Returns:

the list of the visual analyses summaries, each one as a python dict

Return type:

list

get_analysis(analysis_id)#

Get a handle to interact with a specific visual analysis

Parameters:

analysis_id (str) – the identifier of the desired visual analysis

Returns:

A visual analysis handle

Return type:

dataikuapi.dss.analysis.DSSAnalysis

list_saved_models()#

List the saved models in this project

Returns:

the list of the saved models, each one as a python dict

Return type:

list

get_saved_model(sm_id)#

Get a handle to interact with a specific saved model

Parameters:

sm_id (str) – the identifier of the desired saved model

Returns:

A saved model handle

Return type:

dataikuapi.dss.savedmodel.DSSSavedModel

create_mlflow_pyfunc_model(name, prediction_type=None)#

Creates a new external saved model for storing and managing MLflow models

Parameters:
  • name (str) – Human readable name for the new saved model in the flow

  • prediction_type (str) – Optional (but needed for most operations). One of BINARY_CLASSIFICATION, MULTICLASS, REGRESSION or None. Defaults to None, standing for other prediction types. If the Saved Model has a None prediction type, scoring, inclusion in a bundle or in an API service will be possible, but features related to performance analysis and explainability will not be available.

Returns:

The created saved model handle

Return type:

dataikuapi.dss.savedmodel.DSSSavedModel

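Usage example (a minimal sketch; the model directory path is illustrative, and importing a version is assumed to be done with import_mlflow_version_from_path() on the returned handle):

sm = project.create_mlflow_pyfunc_model("my mlflow model", prediction_type="BINARY_CLASSIFICATION")
sm.import_mlflow_version_from_path("v01", "/path/to/exported/mlflow_model")
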
create_external_model(name, prediction_type, configuration)#

Creates a new Saved model that can contain external remote endpoints as versions.

Parameters:
  • name (string) – Human-readable name for the new saved model in the flow

  • prediction_type (string) – One of BINARY_CLASSIFICATION, MULTICLASS or REGRESSION

  • configuration (dict) –

    A dictionary containing the desired external saved model configuration.

    • For SageMaker, the syntax is:

      configuration = {
          "protocol": "sagemaker",
          "region": "<region-name>",
          "connection": "<connection-name>"
      }
      

      Where the parameters have the following meaning:

      • region: The AWS region of the endpoint, e.g. eu-west-1

      • connection: (optional) The DSS SageMaker connection to use for authentication. If not defined, credentials will be derived from environment. See the reference documentation for details.

    • For AzureML, syntax is:

      configuration = {
          "protocol": "azure-ml",
          "connection": "<connection-name>",
          "subscription_id": "<id>",
          "resource_group": "<rg>",
          "workspace": "<workspace>"
      }
      

      Where the parameters have the following meaning:

      • connection: (optional) The DSS Azure ML connection to use for authentication. If not defined, credentials will be derived from environment. See the reference documentation for details.

      • subscription_id: The Azure subscription ID

      • resource_group: The Azure resource group

      • workspace: The Azure ML workspace

    • For Vertex AI, syntax is:

      configuration = {
          "protocol": "vertex-ai",
          "region": "<region-name>",
          "connection": "<connection-name>",
          "project_id": "<name> or <id>"
      }
      

      Where the parameters have the following meaning:

      • region: The GCP region of the endpoint, e.g. europe-west1

      • connection: (optional) The DSS Vertex AI connection to use for authentication. If not defined, credentials will be derived from environment. See the reference documentation for details.

      • project_id: The ID or name of the GCP project

  • Example: create a saved model for SageMaker endpoints serving binary classification models in region eu-west-1

    import dataiku
    client = dataiku.api_client()
    project = client.get_default_project()
    configuration = {
        "protocol": "sagemaker",
        "region": "eu-west-1"
    }
    sm = project.create_external_model("SageMaker Proxy Model", "BINARY_CLASSIFICATION", configuration)
    
  • Example: create a saved model for Vertex AI endpoints serving regression models in region europe-west1, on project “my-project”, performing authentication using DSS connection “vertex_conn” of type “Vertex AI”.

    import dataiku
    client = dataiku.api_client()
    project = client.get_default_project()
    configuration = {
        "protocol": "vertex-ai",
        "region": "europe-west1",
        "connection": "vertex_conn",
        "project_id": "my-project"
    }
    sm = project.create_external_model("Vertex AI Proxy Model", "REGRESSION", configuration)
    
list_managed_folders()#

List the managed folders in this project

Returns:

the list of the managed folders, each one as a python dict

Return type:

list

get_managed_folder(odb_id)#

Get a handle to interact with a specific managed folder

Parameters:

odb_id (str) – the identifier of the desired managed folder

Returns:

A managed folder handle

Return type:

dataikuapi.dss.managedfolder.DSSManagedFolder

create_managed_folder(name, folder_type=None, connection_name='filesystem_folders')#

Create a new managed folder in the project, and return a handle to interact with it

Parameters:
  • name (str) – the name of the managed folder

  • folder_type (str) – type of storage (defaults to None)

  • connection_name (str) – the connection name (defaults to filesystem_folders)

Returns:

A managed folder handle

Return type:

dataikuapi.dss.managedfolder.DSSManagedFolder

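Usage example (a minimal sketch; the local file path is illustrative and put_file() is assumed to be available on the returned handle):

folder = project.create_managed_folder("my_folder")
with open("/tmp/report.csv", "rb") as f:
    folder.put_file("report.csv", f)
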
list_model_evaluation_stores()#

List the model evaluation stores in this project.

Returns:

The list of the model evaluation stores

Return type:

list of dataikuapi.dss.modelevaluationstore.DSSModelEvaluationStore

get_model_evaluation_store(mes_id)#

Get a handle to interact with a specific model evaluation store

Parameters:

mes_id (str) – the id of the desired model evaluation store

Returns:

A model evaluation store handle

Return type:

dataikuapi.dss.modelevaluationstore.DSSModelEvaluationStore

create_model_evaluation_store(name)#

Create a new model evaluation store in the project, and return a handle to interact with it.

Parameters:

name (str) – the name for the new model evaluation store

Returns:

A model evaluation store handle

Return type:

dataikuapi.dss.modelevaluationstore.DSSModelEvaluationStore

list_model_comparisons()#

List the model comparisons in this project.

Returns:

The list of the model comparisons

Return type:

list

get_model_comparison(mec_id)#

Get a handle to interact with a specific model comparison

Parameters:

mec_id (str) – the id of the desired model comparison

Returns:

A model comparison handle

Return type:

dataikuapi.dss.modelcomparison.DSSModelComparison

create_model_comparison(name, prediction_type)#

Create a new model comparison in the project, and return a handle to interact with it.

Parameters:
  • name (str) – the name for the new model comparison

  • prediction_type (str) – one of BINARY_CLASSIFICATION, REGRESSION, MULTICLASS, TIMESERIES_FORECAST, CAUSAL_BINARY_CLASSIFICATION, CAUSAL_REGRESSION

Returns:

A new model comparison handle

Return type:

dataikuapi.dss.modelcomparison.DSSModelComparison

list_jobs()#

List the jobs in this project

Returns:

a list of the jobs, each one as a python dict, containing both the definition and the state

Return type:

list

get_job(id)#

Get a handle to interact with a specific job

Parameters:

id (str) – the id of the desired job

Returns:

A job handle

Return type:

dataikuapi.dss.job.DSSJob

start_job(definition)#

Create a new job, and return a handle to interact with it

Parameters:

definition (dict) –

The definition should contain:

  • the type of job (RECURSIVE_BUILD, NON_RECURSIVE_FORCED_BUILD, RECURSIVE_FORCED_BUILD, RECURSIVE_MISSING_ONLY_BUILD)

  • a list of outputs to build from the available types: (DATASET, MANAGED_FOLDER, SAVED_MODEL, STREAMING_ENDPOINT)

  • (Optional) a refreshHiveMetastore field (True or False) to specify whether to re-synchronize the Hive metastore for recomputed HDFS datasets.

Returns:

A job handle

Return type:

dataikuapi.dss.job.DSSJob

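Usage example (a sketch of a raw job definition; the exact fields of each outputs entry, shown here as "type" and "id", are assumptions to adapt to your setup; new_job() below is usually more convenient):

definition = {
    "type": "RECURSIVE_BUILD",
    "outputs": [{"type": "DATASET", "id": "mydataset"}]
}
job = project.start_job(definition)
print(job.get_status())
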
start_job_and_wait(definition, no_fail=False)#

Starts a new job and waits for it to complete.

Parameters:
  • definition (dict) –

    The definition should contain:

    • the type of job (RECURSIVE_BUILD, NON_RECURSIVE_FORCED_BUILD, RECURSIVE_FORCED_BUILD, RECURSIVE_MISSING_ONLY_BUILD)

    • a list of outputs to build from the available types: (DATASET, MANAGED_FOLDER, SAVED_MODEL, STREAMING_ENDPOINT)

    • (Optional) a refreshHiveMetastore field (True or False) to specify whether to re-synchronize the Hive metastore for recomputed HDFS datasets.

  • no_fail (bool) – if true, the function won’t fail even if the job fails or aborts (defaults to False)

Returns:

the final status of the job

Return type:

str

new_job(job_type='NON_RECURSIVE_FORCED_BUILD')#

Create a job to be run. You need to add outputs to the job (i.e. what you want to build) before running it.

job_builder = project.new_job()
job_builder.with_output("mydataset")
complete_job = job_builder.start_and_wait()
print("Job %s done" % complete_job.id)
Parameters:

job_type (str) – the type of job (RECURSIVE_BUILD, NON_RECURSIVE_FORCED_BUILD, RECURSIVE_FORCED_BUILD, RECURSIVE_MISSING_ONLY_BUILD) (defaults to NON_RECURSIVE_FORCED_BUILD)

Returns:

A job handle

Return type:

dataikuapi.dss.project.JobDefinitionBuilder

new_job_definition_builder(job_type='NON_RECURSIVE_FORCED_BUILD')#

Caution

Deprecated. Please use new_job()

list_jupyter_notebooks(active=False, as_type='object')#

List the jupyter notebooks of a project.

Parameters:
  • active (bool) – if True, only return currently running jupyter notebooks (defaults to False).

  • as_type (str) – How to return the list. Supported values are “listitems” and “object” (defaults to object).

Returns:

The list of the notebooks. If “as_type” is “listitems”, each one as a dataikuapi.dss.jupyternotebook.DSSJupyterNotebookListItem. If “as_type” is “objects”, each one as a dataikuapi.dss.jupyternotebook.DSSJupyterNotebook

Return type:

list of dataikuapi.dss.jupyternotebook.DSSJupyterNotebook or list of dataikuapi.dss.jupyternotebook.DSSJupyterNotebookListItem

get_jupyter_notebook(notebook_name)#

Get a handle to interact with a specific jupyter notebook

Parameters:

notebook_name (str) – The name of the jupyter notebook to retrieve

Returns:

A handle to interact with this jupyter notebook

Return type:

dataikuapi.dss.jupyternotebook.DSSJupyterNotebook jupyter notebook handle

create_jupyter_notebook(notebook_name, notebook_content)#

Create a new jupyter notebook and get a handle to interact with it

Parameters:
  • notebook_name (str) – the name of the notebook to create

  • notebook_content (dict) – the data of the notebook to create, as a dict. The data will be converted to a JSON string internally. Use DSSJupyterNotebook.get_content() on a similar existing DSSJupyterNotebook object in order to get a sample definition object.

Returns:

A handle to interact with the newly created jupyter notebook

Return type:

dataikuapi.dss.jupyternotebook.DSSJupyterNotebook jupyter notebook handle

list_continuous_activities(as_objects=True)#

List the continuous activities in this project

Parameters:

as_objects (bool) – if True, returns a list of dataikuapi.dss.continuousactivity.DSSContinuousActivity objects, else returns a list of python dicts (defaults to True)

Returns:

a list of the continuous activities, each one as a python dict, containing both the definition and the state

Return type:

list

get_continuous_activity(recipe_id)#

Get a handle to interact with a specific continuous activity

Parameters:

recipe_id (str) – the identifier of the recipe controlled by the continuous activity

Returns:

A continuous activity handle

Return type:

dataikuapi.dss.continuousactivity.DSSContinuousActivity

get_variables()#

Gets the variables of this project.

Returns:

a dictionary containing two dictionaries : “standard” and “local”. “standard” are regular variables, exported with bundles. “local” variables are not part of the bundles for this project

Return type:

dict

set_variables(obj)#

Sets the variables of this project.

Warning

If executed from a python recipe, the changes made by set_variables will not be “seen” in that recipe. Use the internal API dataiku.get_custom_variables() instead if this behavior is needed

Parameters:

obj (dict) – must be a modified version of the object returned by get_variables

update_variables(variables, type='standard')#

Updates a set of variables for this project

Parameters:
  • variables (dict) – a dict of variable name -> value to set. Keys of the dict must be strings. Values in the dict can be strings, numbers, booleans, lists or dicts

  • type (str) – Can be “standard” to update regular variables or “local” to update local-only variables that are not part of bundles for this project (defaults to standard)

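Usage example (a minimal sketch; the variable name and value are illustrative):

# read-modify-write of all variables
variables = project.get_variables()
variables["standard"]["campaign_date"] = "2024-01-01"
project.set_variables(variables)

# or update a single standard variable directly
project.update_variables({"campaign_date": "2024-01-01"})
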
list_api_services(as_type='listitems')#

List the API services in this project

Parameters:

as_type (str) – How to return the list. Supported values are “listitems” and “objects” (defaults to listitems).

Returns:

The list of the API services. If “as_type” is “listitems”, each one as a dataikuapi.dss.apiservice.DSSAPIServiceListItem. If “as_type” is “objects”, each one as a dataikuapi.dss.apiservice.DSSAPIService

Return type:

list

create_api_service(service_id)#

Create a new API service, and return a handle to interact with it. The newly-created service does not have any endpoint.

Parameters:

service_id (str) – the ID of the API service to create

Returns:

An API Service handle

Return type:

dataikuapi.dss.apiservice.DSSAPIService

get_api_service(service_id)#

Get a handle to interact with a specific API Service from the API Designer

Parameters:

service_id (str) – The identifier of the API Designer API Service to retrieve

Returns:

A handle to interact with this API Service

Return type:

dataikuapi.dss.apiservice.DSSAPIService

list_exported_bundles()#

List all the bundles created in this project on the Design Node.

Returns:

A dictionary of all bundles for a project on the Design node.

Return type:

dict

export_bundle(bundle_id)#

Creates a new project bundle on the Design node

Parameters:

bundle_id (str) – bundle id tag

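Usage example (a minimal sketch creating a bundle and publishing it with publish_bundle(), documented below; the bundle id is illustrative):

project.export_bundle("v1")
project.publish_bundle("v1")
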
delete_exported_bundle(bundle_id)#

Deletes a project bundle from the Design node

Parameters:

bundle_id (str) – bundle id tag

get_exported_bundle_archive_stream(bundle_id)#

Download a bundle archive that can be deployed in a DSS automation Node, as a binary stream.

Warning

The stream must be closed after use. Use a with statement to ensure the stream is closed at the end of the block. For example:

with project.get_exported_bundle_archive_stream('v1') as fp:
    # use fp

# or explicitly close the stream after use
fp = project.get_exported_bundle_archive_stream('v1')
# use fp, then close
fp.close()
Parameters:

bundle_id (str) – the identifier of the bundle

download_exported_bundle_archive_to_file(bundle_id, path)#

Download a bundle archive that can be deployed in a DSS automation Node into the given output file.

Parameters:
  • bundle_id (str) – the identifier of the bundle

  • path (str) – if “-”, will write to /dev/stdout

publish_bundle(bundle_id, published_project_key=None)#

Publish a bundle on the Project Deployer.

Parameters:
  • bundle_id (str) – The identifier of the bundle

  • published_project_key (str) – The key of the project on the Project Deployer where the bundle will be published. A new published project will be created if none matches the key. If the parameter is not set, the key from the current DSSProject is used.

Returns:

a dict with info on the bundle state once published. It contains the keys “publishedOn” for the publish date, “publishedBy” for the user who published the bundle, and “publishedProjectKey” for the key of the Project Deployer project used.

Return type:

dict

list_imported_bundles()#

List all the bundles imported for this project, on the Automation node.

Returns:

a dict containing bundle imports for a project, on the Automation node.

Return type:

dict

import_bundle_from_archive(archive_path)#

Imports a bundle from a zip archive path on the Automation node.

Parameters:

archive_path (str) – A full path to a zip archive, for example /home/dataiku/my-bundle-v1.zip

import_bundle_from_stream(fp)#

Imports a bundle from a file stream, on the Automation node.

Usage example:

project = client.get_project('MY_PROJECT')
with open('/home/dataiku/my-bundle-v1.zip', 'rb') as f:
    project.import_bundle_from_stream(f)
Parameters:

fp (file-like) – file handler.

activate_bundle(bundle_id, scenarios_to_enable=None)#

Activates a bundle in this project.

Parameters:
  • bundle_id (str) – The ID of the bundle to activate

  • scenarios_to_enable (dict) – An optional dict of scenarios to enable or disable upon bundle activation. The format of the dict should be scenario IDs as keys with values of True or False (defaults to {}).

Returns:

A report containing any error or warning messages that occurred during bundle activation

Return type:

dict

preload_bundle(bundle_id)#

Preloads a bundle that has been imported on the Automation node

Parameters:

bundle_id (str) – the bundle_id for an existing imported bundle

list_scenarios(as_type='listitems')#

List the scenarios in this project.

Parameters:

as_type (str) – How to return the list. Supported values are “listitems” and “objects” (defaults to listitems).

Returns:

The list of the scenarios. If “as_type” is “listitems”, each one as a dataikuapi.dss.scenario.DSSScenarioListItem. If “as_type” is “objects”, each one as a dataikuapi.dss.scenario.DSSScenario

Return type:

list

get_scenario(scenario_id)#

Get a handle to interact with a specific scenario

Parameters:

scenario_id (str) – the ID of the desired scenario

Returns:

A scenario handle

Return type:

dataikuapi.dss.scenario.DSSScenario

create_scenario(scenario_name, type, definition=None)#

Create a new scenario in the project, and return a handle to interact with it

Parameters:
  • scenario_name (str) – The name for the new scenario. This does not need to be unique (although this is strongly recommended)

  • type (str) – The type of the scenario. Must be one of ‘step_based’ or ‘custom_python’

  • definition (dict) – the JSON definition of the scenario. Use get_definition(with_status=False) on an existing DSSScenario object in order to get a sample definition object (defaults to {‘params’: {}})

Returns:

a dataikuapi.dss.scenario.DSSScenario handle to interact with the newly-created scenario

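Usage example (a minimal sketch; the scenario name is illustrative):

scenario = project.create_scenario("Nightly rebuild", "step_based")
settings = scenario.get_settings()
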
list_recipes(as_type='listitems')#

List the recipes in this project

Parameters:

as_type (str) – How to return the list. Supported values are “listitems” and “objects” (defaults to listitems).

Returns:

The list of the recipes. If “as_type” is “listitems”, each one as a dataikuapi.dss.recipe.DSSRecipeListItem. If “as_type” is “objects”, each one as a dataikuapi.dss.recipe.DSSRecipe

Return type:

list

get_recipe(recipe_name)#

Gets a dataikuapi.dss.recipe.DSSRecipe handle to interact with a recipe

Parameters:

recipe_name (str) – The name of the recipe

Returns:

A recipe handle

Return type:

dataikuapi.dss.recipe.DSSRecipe

create_recipe(recipe_proto, creation_settings)#

Create a new recipe in the project, and return a handle to interact with it. We strongly recommend that you use the creator helpers instead of calling this directly.

Some recipe types require additional parameters in creation_settings:

  • ‘grouping’ : a ‘groupKey’ column name

  • ‘python’, ‘sql_query’, ‘hive’, ‘impala’ : the code of the recipe as a ‘payload’ string

Parameters:
  • recipe_proto (dict) – a prototype for the recipe object. Must contain at least ‘type’ and ‘name’

  • creation_settings (dict) – recipe-specific creation settings

Returns:

A recipe handle

Return type:

dataikuapi.dss.recipe.DSSRecipe

new_recipe(type, name=None)#

Initializes the creation of a new recipe. Returns a dataikuapi.dss.recipe.DSSRecipeCreator or one of its subclasses to complete the creation of the recipe.

Usage example:

grouping_recipe_builder = project.new_recipe("grouping")
grouping_recipe_builder.with_input("dataset_to_group_on")
# Create a new managed dataset for the output in the "filesystem_managed" connection
grouping_recipe_builder.with_new_output("grouped_dataset", "filesystem_managed")
grouping_recipe_builder.with_group_key("column")
recipe = grouping_recipe_builder.build()

# After the recipe is created, you can edit its settings
recipe_settings = recipe.get_settings()
recipe_settings.set_column_aggregations("value", sum=True)
recipe_settings.save()

# And you may need to apply new schemas to the outputs
recipe.compute_schema_updates().apply()
Parameters:
  • type (str) – Type of the recipe

  • name (str) – Optional, base name for the new recipe.

Returns:

A new DSS Recipe Creator handle

Return type:

dataikuapi.dss.recipe.DSSRecipeCreator

get_flow()#
Returns:

A Flow handle

Return type:

A dataikuapi.dss.flow.DSSProjectFlow

sync_datasets_acls()#

Resync permissions on HDFS datasets in this project

Attention

This call requires an API key with admin rights

Returns:

a handle to the task of resynchronizing the permissions

Return type:

dataikuapi.dss.future.DSSFuture

list_running_notebooks(as_objects=True)#

Caution

Deprecated. Use DSSProject.list_jupyter_notebooks()

List the currently-running notebooks

Returns:

list of notebooks. Each object contains at least a ‘name’ field

Return type:

list

get_tags()#

List the tags of this project.

Returns:

a dictionary containing the tags with a color

Return type:

dict

set_tags(tags=None)#

Set the tags of this project.

Parameters:

tags (dict) – must be a modified version of the object returned by list_tags (defaults to {})

list_macros(as_objects=False)#

List the macros accessible in this project

Parameters:

as_objects – if True, return the macros as dataikuapi.dss.macro.DSSMacro macro handles instead of a list of python dicts (defaults to False)

Returns:

the list of the macros

Return type:

list

get_macro(runnable_type)#

Get a handle to interact with a specific macro

Parameters:

runnable_type (str) – the identifier of a macro

Returns:

A macro handle

Return type:

dataikuapi.dss.macro.DSSMacro

get_wiki()#

Get the wiki

Returns:

the wiki associated to the project

Return type:

dataikuapi.dss.wiki.DSSWiki

get_object_discussions()#

Get a handle to manage discussions on the project

Returns:

the handle to manage discussions

Return type:

dataikuapi.dss.discussion.DSSObjectDiscussions

init_tables_import()#

Start an operation to import Hive or SQL tables as datasets into this project

Returns:

a dataikuapi.dss.project.TablesImportDefinition to add tables to import

Return type:

dataikuapi.dss.project.TablesImportDefinition

list_sql_schemas(connection_name)#

Lists schemas from which tables can be imported in a SQL connection

Parameters:

connection_name (str) – name of the SQL connection

Returns:

an array of schema names

Return type:

list

list_hive_databases()#

Lists Hive databases from which tables can be imported

Returns:

an array of databases names

Return type:

list

list_sql_tables(connection_name, schema_name=None)#

Lists tables to import in a SQL connection

Parameters:
  • connection_name (str) – name of the SQL connection

  • schema_name (str) – Optional, name of the schema in the SQL connection in which to list tables.

Returns:

an array of tables

Return type:

list

list_hive_tables(hive_database)#

Lists tables to import in a Hive database

Parameters:

hive_database (str) – name of the Hive database

Returns:

an array of tables

Return type:

list

list_elasticsearch_indices_or_aliases(connection_name)#

Lists the Elasticsearch indices or aliases from which tables can be imported in an Elasticsearch connection

Parameters:

connection_name (str) – name of the Elasticsearch connection

get_app_manifest()#

Gets the manifest of the application if the project is an app template or an app instance, fails otherwise.

Returns:

the manifest of the application associated to the project

Return type:

dataikuapi.dss.app.DSSAppManifest

setup_mlflow(managed_folder, host=None)#

Set up the dss-plugin for MLflow

Parameters:
  • managed_folder – the managed folder where the MLflow experiment data will be stored

  • host (str) – (optional, defaults to None)

get_mlflow_extension()#

Get a handle to interact with the extension of MLflow provided by DSS

Returns:

A Mlflow Extension handle

Return type:

dataikuapi.dss.mlflow.DSSMLflowExtension

list_code_studios(as_type='listitems')#

List the code studio objects in this project

Parameters:

as_type (str) – How to return the list. Supported values are “listitems” and “objects” (defaults to listitems).

Returns:

the list of the code studio objects, each one as a python dict

Return type:

list

get_code_studio(code_studio_id)#

Get a handle to interact with a specific code studio object

Parameters:

code_studio_id (str) – the identifier of the desired code studio object

Returns:

A code studio object handle

Return type:

dataikuapi.dss.codestudio.DSSCodeStudioObject

create_code_studio(name, template_id)#

Create a new code studio object in the project, and return a handle to interact with it

Parameters:
  • name (str) – the name of the code studio object

  • template_id (str) – the identifier of a code studio template

Returns:

A code studio object handle

Return type:

dataikuapi.dss.codestudio.DSSCodeStudioObject

get_library()#

Get a handle to manage the project library

Returns:

A dataikuapi.dss.projectlibrary.DSSLibrary handle

Return type:

dataikuapi.dss.projectlibrary.DSSLibrary

list_llms(purpose='GENERIC_COMPLETION', as_type='listitems')#

List the LLMs usable in this project

Parameters:
  • purpose (str) – Usage purpose of the LLM. Main values are GENERIC_COMPLETION and TEXT_EMBEDDING_EXTRACTION

  • as_type (str) – How to return the list. Supported values are “listitems” and “objects”.

Returns:

The list of the LLMs. If “as_type” is “listitems”, each one as a llm.DSSLLMListItem. If “as_type” is “objects”, each one as a llm.DSSLLM

Return type:

list

get_llm(llm_id)#

Get a handle to interact with a specific LLM

list_knowledge_banks(as_type='listitems')#

List the knowledge banks of this project

Parameters:

as_type (str) – How to return the list. Supported values are “listitems” and “objects”.

Returns:

The list of the knowledge banks. If “as_type” is “listitems”, each one as a knowledgebank.DSSKnowledgeBankListItem. If “as_type” is “objects”, each one as a knowledgebank.DSSKnowledgeBank

Return type:

list

get_knowledge_bank(id)#

Get a handle to interact with a specific knowledge bank

list_webapps(as_type='listitems')#

List the webapp heads of this project

Parameters:

as_type (str) – How to return the list. Supported values are “listitems” and “objects”.

Returns:

The list of the webapps. If “as_type” is “listitems”, each one as a webapp.DSSWebAppListItem. If “as_type” is “objects”, each one as a webapp.DSSWebApp

Return type:

list

get_webapp(webapp_id)#

Get a handle to interact with a specific webapp

Parameters:

webapp_id (str) – the identifier of a webapp

Returns:

A dataikuapi.dss.webapp.DSSWebApp webapp handle

list_dashboards(as_type='listitems')#

List the Dashboards in this project.

Returns:

The list of the dashboards.

Return type:

list

get_dashboard(dashboard_id)#

Get a handle to interact with a specific dashboard object

Parameters:

dashboard_id (str) – the identifier of the desired dashboard object

Returns:

A dataikuapi.dss.dashboard.DSSDashboard dashboard object handle

create_dashboard(dashboard_name, settings=None)#

Create a new dashboard in the project, and return a handle to interact with it

Parameters:
  • dashboard_name (str) – The name for the new dashboard. This does not need to be unique (although this is strongly recommended)

  • settings (dict) – the JSON definition of the dashboard. Use get_settings() on an existing DSSDashboard object in order to get a sample settings object (defaults to {‘pages’: []})

Returns:

a dashboard.DSSDashboard handle to interact with the newly-created dashboard

list_insights(as_type='listitems')#

List the Insights in this project.

Returns:

The list of the insights.

Return type:

list

get_insight(insight_id)#

Get a handle to interact with a specific insight object

Parameters:

insight_id (str) – the identifier of the desired insight object

Returns:

A dataikuapi.dss.insight.DSSInsight insight object handle

create_insight(creation_info)#

Create a new insight in the project, and return a handle to interact with it

Parameters:

creation_info (dict) – the JSON definition of the insight creation. Use get_settings() on an existing DSSInsight object in order to get a sample settings object

Returns:

a insight.DSSInsight handle to interact with the newly-created insight

get_project_git()#

Gets a handle to perform operations on the project’s git repository.

Returns:

a handle to perform git operations on the project

Return type:

dataikuapi.dss.project.DSSProjectGit

get_data_quality_status(only_monitored=True)#

Get the aggregated quality status of a project with the list of the datasets and their associated status

Parameters:

only_monitored (bool) – if True, only retrieve monitored datasets (defaults to True).

Returns:

The dict of data quality dataset statuses.

Return type:

dict, with DATASET_NAME as key

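Usage example (a minimal sketch iterating over the per-dataset statuses):

statuses = project.get_data_quality_status()
for dataset_name, status in statuses.items():
    print(dataset_name, status)
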
get_data_quality_timeline(min_timestamp=None, max_timestamp=None)#

Get the list of quality statuses aggregated per day over the timeframe [min_timestamp, max_timestamp]. For each day it includes the current and worst outcome, as well as the details of the dataset runs within the period; previously deleted monitored datasets are also included, with “(deleted)” appended to their id. By default, the timeframe covers the last 14 days.

Parameters:
  • min_timestamp (int) – timestamp representing the beginning of the timeframe

  • max_timestamp (int) – timestamp representing the end of the timeframe

Returns:

list of datasets per day in the timeline

Return type:

list of dict

class dataikuapi.dss.project.DSSProjectGit(client, project_key)#

Handle to manage the git repository of a DSS project (fetch, push, pull, …)

get_status()#

Get the current state of the project’s git repository

Returns:

A dict containing the following keys: ‘currentBranch’, ‘remotes’, ‘trackingCount’, ‘clean’, ‘hasUncommittedChanges’, ‘added’, ‘changed’, ‘removed’, ‘missing’, ‘modified’, ‘conflicting’, ‘untracked’ and ‘untrackedFolders’

Return type:

dict

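Usage example (a minimal sketch of a typical workflow using the methods documented below; the remote URL and commit message are illustrative):

git = project.get_project_git()
print(git.get_status()["currentBranch"])
git.set_remote("git@github.com:user/repo.git")
git.commit("Commit pending changes from the DSS API")
git.push()
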
get_remote(name='origin')#

Get the URL of the remote repository.

Parameters:

name (str) – The name of the remote. Defaults to “origin”.

Returns:

The URL of the remote origin if set, None otherwise.

Return type:

str or None

set_remote(url, name='origin')#

Set the URL of the remote repository.

Parameters:
  • url (str) – The URL of the remote repository (git@github.com:user/repo.git).

  • name (str) – The name of the remote to set. Defaults to “origin”.

remove_remote(name='origin')#

Remove the remote origin of the project’s git repository.

Parameters:

name (str) – The name of the remote to remove. Defaults to “origin”.

list_branches(remote=False)#

List all branches (local only or local & remote) of the project’s git repository.

Parameters:

remote (bool) – Whether to include remote branches.

Returns:

A list of branch names.

Return type:

list

create_branch(branch_name, commit=None)#

Create a new local branch on the project’s git repository and switch to it.

Parameters:
  • branch_name (str) – The name of the branch to create.

  • commit (str) – Hash of a commit to create the branch from (optional).

Returns:

A dict containing keys ‘success’ and ‘output’ with information about the command execution.

Return type:

dict

delete_branch(branch_name, force_delete=False, remote=False, delete_remotely=False)#

Delete a local or remote branch on the project’s git repository.

Parameters:
  • branch_name (str) – The name of the branch to delete.

  • remote (bool) – True if the branch to delete is a remote branch; False if it’s a local branch.

  • delete_remotely (bool) – True to delete a remote branch both locally and remotely; False to delete the remote branch on the local repository only.

  • force_delete (bool) – True to force the deletion even if some commits have not been pushed; False to fail in case some commits have not been pushed.

get_current_branch()#

Get the name of the current branch

Returns:

The name of the current branch

Return type:

str

list_tags()#

Lists all existing tags.

Returns:

A list of dict objects, each one containing the following keys: ‘name’, ‘shortName’, ‘commit’ (hash of the commit associated with the tag), ‘annotations’ and ‘readOnly’.

Return type:

list

create_tag(name, reference='HEAD', message='')#

Create a tag for the specified or current reference.

Parameters:
  • name (str) – The name of the tag to create.

  • reference (str) – ID of a commit to tag. Defaults to HEAD.

  • message (str) – The tag annotation message. Defaults to an empty string.

delete_tag(name)#

Remove a tag from the local repository

Parameters:

name (str) – The name of the tag to delete.
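A sketch of tagging the current HEAD, listing tags and cleaning the tag up afterwards; the tag name and message are placeholders, and the git handle is obtained under the same assumption as above:

    import dataikuapi

    client = dataikuapi.DSSClient("https://dss.example.com:11200", "YOUR_API_KEY")
    git = client.get_project("MY_PROJECT").get_project_git()  # assumed accessor, see above

    git.create_tag("v1.0", message="First stable release")
    print([t["name"] for t in git.list_tags()])  # tag names, including "v1.0"
    git.delete_tag("v1.0")  # removes the tag from the local repository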

switch(branch_name)#

Switch the current repository to the specified branch.

Parameters:

branch_name (str) – The name of the branch to switch to.

Returns:

A dict containing keys ‘success’, ‘messages’, ‘output’ with information about the command execution.

Return type:

dict

checkout(branch_name)#

Switch the current repository to the specified branch (identical to `switch`).

Parameters:

branch_name (str) – The name of the branch to checkout.

Returns:

A dict containing keys ‘success’, ‘messages’, ‘output’ with information about the command execution.

Return type:

dict

fetch()#

Fetch branches and/or tags (collectively, “refs”) from the remote repository to the project’s git repository.

Returns:

A dict containing keys ‘success’, ‘logs’, ‘output’ with information about the command execution.

Return type:

dict

pull(branch_name=None)#

Incorporate changes from a remote repository into the current branch on the project’s git repository.

Parameters:

branch_name (str) – The name of the branch to pull. If None, pull from the current branch.

Returns:

A dict containing keys ‘success’, ‘logs’, ‘output’ with information about the command execution.

Return type:

dict

push(branch_name=None)#

Update the remote repository with the project’s local commits.

Parameters:

branch_name (str) – The name of the branch to push. If None, push commits from the current branch.

Returns:

A dict containing keys ‘success’, ‘logs’, ‘output’ with information about the command execution.

Return type:

dict

log(path=None, start_commit=None, count=1000)#

List commits in the project’s git repository.

Parameters:
  • path (str) – Path to filter the logs (optional). If specified, only commits impacting files located in the provided path are returned.

  • start_commit (str) – ID of the first commit to list. Use the value found in the nextCommit field from a previous response (optional).

  • count (int) – Maximum number of commits to return (1000 by default).

Returns:

A dict containing a key ‘entries’ and optionally a second key ‘nextCommit’ if there are more commits.

Return type:

dict
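The nextCommit field can be fed back into start_commit to page through a long history. A minimal sketch, with the usual assumption about how the git handle is obtained:

    import dataikuapi

    client = dataikuapi.DSSClient("https://dss.example.com:11200", "YOUR_API_KEY")
    git = client.get_project("MY_PROJECT").get_project_git()  # assumed accessor, see above

    start = None
    while True:
        page = git.log(start_commit=start, count=100)
        for entry in page["entries"]:
            print(entry)
        start = page.get("nextCommit")
        if start is None:
            break  # no more commits to list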

diff(commit_from=None, commit_to=None)#

Show changes between the working copy and the last commit (commit_from=None, commit_to=None), between two commits (commit_from=SOME_ID, commit_to=SOME_ID), or made in a given commit (commit_from=SOME_ID, commit_to=None).

Parameters:
  • commit_from (str) – ID of the first commit or None

  • commit_to (str) – ID of the second commit or None

Returns:

A dict containing the following keys: ‘commitFrom’ (dict), ‘commitTo’ (dict), ‘addedLines’ (int), ‘removedLines’ (int), ‘changedFiles’ (int), ‘entries’ (array of dict)

Return type:

dict

commit(message)#

Commit pending changes in the project’s git repository with the given message.

Note: Untracked files are automatically added before committing.

Parameters:

message (str) – The commit message.
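Committing pending changes and pushing them to the remote could look like the sketch below; the commit message is a placeholder, and the git handle is obtained under the same assumption as above:

    import dataikuapi

    client = dataikuapi.DSSClient("https://dss.example.com:11200", "YOUR_API_KEY")
    git = client.get_project("MY_PROJECT").get_project_git()  # assumed accessor, see above

    if git.get_status()["hasUncommittedChanges"]:
        git.commit("Update flow and recipes")  # untracked files are added automatically
        result = git.push()                    # push the current branch
        print("Push succeeded:", result["success"])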

revert_to_revision(commit)#

Revert the project content to the supplied revision.

Parameters:

commit (str) – Hash of a valid commit as returned by the log method.

Returns:

A dict containing a key ‘success’ and optionally a second key ‘logs’ with information about the command execution.

Return type:

dict

revert_commit(commit)#

Revert the changes that the specified commit introduces

Parameters:

commit (str) – ID of a valid commit as returned by the log method.

Returns:

A dict containing the following keys: ‘success’ and optionally ‘logs’ with information about the command execution.

Return type:

dict

reset_to_head()#

Drop uncommitted changes in the project’s git repository (hard reset to HEAD).

reset_to_upstream()#

Drop local changes in the project’s git repository and hard reset to the upstream branch.

drop_and_rebuild(i_know_what_i_am_doing=False)#

Fully drop the current git repository and rebuild a new one from scratch. CAUTION: ALL HISTORY WILL BE LOST. ONLY CALL THIS METHOD IF YOU KNOW WHAT YOU ARE DOING.

Parameters:

i_know_what_i_am_doing (bool) – True if you really want to wipe out all git history for this project.

list_libraries()#

Get the list of all external libraries for this project

Returns:

A list of external libraries.

Return type:

list

add_library(repository, local_target_path, checkout, path_in_git_repository='', add_to_python_path=True)#

Add a new external library to the project and pull it.

Parameters:
  • repository (str) – The remote repository.

  • local_target_path (str) – The local target path (relative to root of libraries).

  • checkout (str) – The branch, commit, or tag to check out.

  • path_in_git_repository (str) – The path in the git repository.

  • add_to_python_path (bool) – Whether to add the reference to the Python path.

Returns:

a dataikuapi.dss.future.DSSFuture representing the pull process

Return type:

dataikuapi.dss.future.DSSFuture
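A sketch of adding an external library and waiting for the pull to complete. The repository URL, target path and branch are placeholders; it assumes the git handle accessor used in the earlier sketches and that the returned DSSFuture exposes wait_for_result():

    import dataikuapi

    client = dataikuapi.DSSClient("https://dss.example.com:11200", "YOUR_API_KEY")
    git = client.get_project("MY_PROJECT").get_project_git()  # assumed accessor, see above

    future = git.add_library(
        repository="git@github.com:acme/shared-python-libs.git",
        local_target_path="shared_libs",
        checkout="main",
    )
    future.wait_for_result()  # block until the pull finishes (assumed DSSFuture API)
    print(git.list_libraries())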

set_library(git_reference_path, remote, remotePath, checkout)#

Set an existing external library.

Parameters:
  • git_reference_path (str) – The path of the external library.

  • remote (str) – The remote repository.

  • remotePath (str) – The path in the git repository.

  • checkout (str) – The branch, commit, or tag to check out.

Returns:

The path of the external library.

Return type:

str

remove_library(git_reference_path, delete_directory)#

Remove an external library from the project.

Parameters:
  • git_reference_path (str) – The path of the external library.

  • delete_directory (bool) – Whether to delete the local directory associated with the reference.

reset_library(git_reference_path)#

Reset changes to HEAD from the external library.

Parameters:

git_reference_path (str) – The path of the external library to reset.

Returns:

a dataikuapi.dss.future.DSSFuture representing the reset process

Return type:

dataikuapi.dss.future.DSSFuture

push_library(git_reference_path, commit_message)#

Push changes to the external library

Parameters:
  • git_reference_path (str) – The path of the external library.

  • commit_message (str) – The commit message for the push.

Returns:

a dataikuapi.dss.future.DSSFuture representing the push process

Return type:

dataikuapi.dss.future.DSSFuture

push_all_libraries(commit_message)#

Push changes for all libraries in the project.

Parameters:

commit_message (str) – The commit message for the push.

Returns:

a dataikuapi.dss.future.DSSFuture representing the push process

Return type:

dataikuapi.dss.future.DSSFuture
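Pushing local changes for every external library at once, under the same assumptions as the previous sketch:

    import dataikuapi

    client = dataikuapi.DSSClient("https://dss.example.com:11200", "YOUR_API_KEY")
    git = client.get_project("MY_PROJECT").get_project_git()  # assumed accessor, see above

    future = git.push_all_libraries("Sync shared library changes")
    future.wait_for_result()  # block until the push finishes (assumed DSSFuture API)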

reset_all_libraries()#

Reset changes for all libraries in the project.

Returns:

a dataikuapi.dss.future.DSSFuture representing the reset process

Return type:

dataikuapi.dss.future.DSSFuture

class dataiku.Project(project_key=None)#

This is a handle to interact with the current project

Note

This class is also available as dataiku.Project

get_last_metric_values()#

Get the set of last values of the metrics on this project.

Return type:

dataiku.core.metrics.ComputedMetrics

get_metric_history(metric_lookup)#

Get the set of all values a given metric took on this project.

Parameters:

metric_lookup (string) – metric name or unique identifier

Return type:

dict

save_external_metric_values(values_dict)#

Save metrics on this project.

The metrics are saved with the type “external”

Parameters:

values_dict (dict) – the values to save, as a dict. The keys of the dict are used as metric names
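Inside a DSS recipe or notebook, saving external metric values and reading the latest values back might look like the following sketch; the metric names and values are placeholders:

    import dataiku

    project = dataiku.Project()  # handle on the current project

    # Save custom values; each key becomes an "external" metric name
    project.save_external_metric_values({
        "rows_ingested": 12500,
        "pipeline_duration_seconds": 342,
    })

    last_values = project.get_last_metric_values()  # dataiku.core.metrics.ComputedMetrics
    print(last_values)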

get_last_check_values()#

Get the set of last values of the checks on this project.

Return type:

dataiku.core.metrics.ComputedChecks

get_check_history(check_lookup)#

Get the set of all values a given check took on this project.

Parameters:

check_lookup (string) – check name or unique identifier

Return type:

dict

set_variables(variables)#

Set all variables of the current project

Parameters:

variables (dict) – must be a modified version of the object returned by get_variables

get_variables()#

Get project variables.

Parameters:

typed (bool) – true to try to cast the variables into their original type (e.g. int rather than string)

Returns:

A dictionary containing two dictionaries: “standard” and “local”. “standard” variables are regular variables, exported with bundles; “local” variables are not part of the bundles for this project.
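The intended pattern is to read the variables, modify the returned object, and write it back with set_variables. A minimal sketch, with placeholder variable names and values:

    import dataiku

    project = dataiku.Project()  # handle on the current project

    variables = project.get_variables()
    variables["standard"]["model_version"] = "v42"  # exported with bundles
    variables["local"]["debug_mode"] = True         # local variables are not bundled
    project.set_variables(variables)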

save_external_check_values(values_dict)#

Save checks on this project.

The checks are saved with the type “external”.

Parameters:

values_dict (dict) – the values to save, as a dict. The keys of the dict are used as check names