Projects#
For usage information and examples, please see Projects.
- class dataikuapi.dss.project.DSSProject(client, project_key)#
A handle to interact with a project on the DSS instance.
Important
Do not create this class directly, instead use
dataikuapi.DSSClient.get_project()
- get_summary()#
Returns a summary of the project. The summary is a read-only view of some of the state of the project. You cannot edit the resulting dict and use it to update the project state on DSS; instead, use the other, more specific methods of this
dataikuapi.dss.project.DSSProject
object.
- Returns:
a dict containing a summary of the project. The dict contains at least a projectKey field
- Return type:
dict
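Usage example (a minimal sketch; the host URL, API key and project key are placeholders):
import dataikuapi

client = dataikuapi.DSSClient("https://dss.example.com:11200", "an-api-key")
project = client.get_project("MY_PROJECT")
summary = project.get_summary()
print(summary["projectKey"])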
- get_project_folder()#
Get the folder containing this project
- Return type:
dataikuapi.dss.projectfolder.DSSProjectFolder
- move_to_folder(folder)#
Moves this project to a project folder
- Parameters:
folder (
dataikuapi.dss.projectfolder.DSSProjectFolder
) – destination folder
- delete(clear_managed_datasets=False, clear_output_managed_folders=False, clear_job_and_scenario_logs=True, **kwargs)#
Delete the project
Attention
This call requires an API key with admin rights
- Parameters:
clear_managed_datasets (bool) – Should the data of managed datasets be cleared (defaults to False)
clear_output_managed_folders (bool) – Should the data of managed folders used as outputs of recipes be cleared (defaults to False)
clear_job_and_scenario_logs (bool) – Should the job and scenario logs be cleared (defaults to True)
- get_export_stream(options=None)#
Return a stream of the exported project
Warning
You need to close the stream after download. Failure to do so will result in the DSSClient becoming unusable.
- Parameters:
options (dict) –
Dictionary of export options (defaults to {}). The following options are available:
exportUploads (boolean): Exports the data of Uploaded datasets (default to False)
exportManagedFS (boolean): Exports the data of managed Filesystem datasets (default to False)
exportAnalysisModels (boolean): Exports the models trained in analysis (default to False)
exportSavedModels (boolean): Exports the models trained in saved models (default to False)
exportManagedFolders (boolean): Exports the data of managed folders (default to False)
exportAllInputDatasets (boolean): Exports the data of all input datasets (default to False)
exportAllDatasets (boolean): Exports the data of all datasets (default to False)
exportAllInputManagedFolders (boolean): Exports the data of all input managed folders (default to False)
exportGitRepository (boolean): Exports the Git repository history (default to False)
exportInsightsData (boolean): Exports the data of static insights (default to False)
- Returns:
a stream of the export archive
- Return type:
file-like object
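Usage example (a sketch; the project key and target path are placeholders, and for simple downloads export_to_file() is usually more convenient):
import shutil

stream = project.get_export_stream(options={"exportUploads": True})
try:
    with open("/tmp/my_project_export.zip", "wb") as f:
        shutil.copyfileobj(stream, f)
finally:
    # Always close the stream, otherwise the DSSClient becomes unusable
    stream.close()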
- export_to_file(path, options=None)#
Export the project to a file
- Parameters:
path (str) – the path of the file in which the exported project should be saved
options (dict) –
Dictionary of export options (defaults to {}). The following options are available:
exportUploads (boolean): Exports the data of Uploaded datasets (default to False)
exportManagedFS (boolean): Exports the data of managed Filesystem datasets (default to False)
exportAnalysisModels (boolean): Exports the models trained in analysis (default to False)
exportSavedModels (boolean): Exports the models trained in saved models (default to False)
exportModelEvaluationStores (boolean): Exports the evaluation stores (default to False)
exportManagedFolders (boolean): Exports the data of managed folders (default to False)
exportAllInputDatasets (boolean): Exports the data of all input datasets (default to False)
exportAllDatasets (boolean): Exports the data of all datasets (default to False)
exportAllInputManagedFolders (boolean): Exports the data of all input managed folders (default to False)
exportGitRepository (boolean): Exports the Git repository history (default to False)
exportInsightsData (boolean): Exports the data of static insights (default to False)
- duplicate(target_project_key, target_project_name, duplication_mode='MINIMAL', export_analysis_models=True, export_saved_models=True, export_git_repository=True, export_insights_data=True, remapping=None, target_project_folder=None)#
Duplicate the project
- Parameters:
target_project_key (str) – The key of the new project
target_project_name (str) – The name of the new project
duplication_mode (str) – can be one of the following values: MINIMAL, SHARING, FULL, NONE (defaults to MINIMAL)
export_analysis_models (bool) – (defaults to True)
export_saved_models (bool) – (defaults to True)
export_git_repository (bool) – (defaults to True)
export_insights_data (bool) – (defaults to True)
remapping (dict) – dict of connections to be remapped for the new project (defaults to {})
target_project_folder (A
dataikuapi.dss.projectfolder.DSSProjectFolder
) – the project folder where to put the duplicated project (defaults to None)
- Returns:
A dict containing the original and duplicated project’s keys
- Return type:
dict
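Usage example (a sketch; the target project key and name are placeholders):
result = project.duplicate("MY_PROJECT_COPY", "My project (copy)", duplication_mode="MINIMAL")
print(result)  # contains the original and duplicated project keys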
- get_metadata()#
Get the metadata attached to this project. The metadata contains label, description checklists, tags and custom metadata of the project.
Note
For more information on available metadata, please see https://doc.dataiku.com/dss/api/latest/rest/
- Returns:
the project metadata.
- Return type:
dict
- set_metadata(metadata)#
Set the metadata on this project.
Usage example:
project_metadata = project.get_metadata()
project_metadata['tags'] = ['tag1','tag2']
project.set_metadata(project_metadata)
- Parameters:
metadata (dict) – the new state of the metadata for the project. You should only set a metadata object that has been retrieved using the
get_metadata()
call.
- get_settings()#
Gets the settings of this project. This does not contain permissions. See
get_permissions()
- Returns:
a handle to read, modify and save the settings
- Return type:
dataikuapi.dss.project.DSSProjectSettings
- get_permissions()#
Get the permissions attached to this project
- Returns:
A dict containing the owner and the permissions, as a list of pairs of group name and permission type
- Return type:
dict
- set_permissions(permissions)#
Sets the permissions on this project
Usage example:
project_permissions = project.get_permissions()
project_permissions['permissions'].append({'group':'data_scientists', 'readProjectContent': True, 'readDashboards': True})
project.set_permissions(project_permissions)
- Parameters:
permissions (dict) – a permissions object with the same structure as the one returned by
get_permissions()
call
- get_interest()#
Get the interest of this project. The interest means the number of watchers and the number of stars.
- Returns:
a dict object containing the interest of the project with two fields:
starCount: number of stars for this project
watchCount: number of users watching this project
- Return type:
dict
- get_timeline(item_count=100)#
Get the timeline of this project. The timeline consists of information about the creation of this project (by whom, and when), the last modification of this project (by whom and when), a list of contributors, and a list of modifications. This list of modifications contains a maximum of item_count elements (defaults to 100). If item_count is greater than the real number of modifications, item_count is adjusted.
- Parameters:
item_count (int) – maximum number of modifications to retrieve in the items list
- Returns:
a timeline where the top-level fields are :
allContributors: all contributors who have been involved in this project
items: a history of the modifications of the project
createdBy: who created this project
createdOn: when the project was created
lastModifiedBy: who modified this project for the last time
lastModifiedOn: when this modification took place
- Return type:
dict
- list_datasets(as_type='listitems')#
List the datasets in this project.
- Parameters:
as_type (str) – How to return the list. Supported values are “listitems” and “objects” (defaults to listitems).
- Returns:
The list of the datasets. If “as_type” is “listitems”, each one as a
dataikuapi.dss.dataset.DSSDatasetListItem
. If “as_type” is “objects”, each one as a
dataikuapi.dss.dataset.DSSDataset
- Return type:
list
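Usage example (a sketch; attribute access on list items is assumed to follow the DSSDatasetListItem API):
# Lightweight list items (no extra API call per dataset)
for item in project.list_datasets():
    print(item.name)

# Full handles, usable for further operations
datasets = project.list_datasets(as_type="objects")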
- get_dataset(dataset_name)#
Get a handle to interact with a specific dataset
- Parameters:
dataset_name (str) – the name of the desired dataset
- Returns:
A dataset handle
- Return type:
- create_dataset(dataset_name, type, params=None, formatType=None, formatParams=None)#
Create a new dataset in the project, and return a handle to interact with it.
The precise structure of params and formatParams depends on the specific dataset type and dataset format type. To know which fields exist for a given dataset type and format type, create a dataset from the UI, and use
get_dataset()
to retrieve the configuration of the dataset and inspect it. Then reproduce a similar structure in the
create_dataset()
call.
Not all settings of a dataset can be set at creation time (for example partitioning). After creation, you’ll have the ability to modify the dataset.
- Parameters:
dataset_name (str) – the name of the dataset to create. Must not already exist
type (str) – the type of the dataset
params (dict) – the parameters for the type, as a python dict (defaults to {})
formatType (str) – an optional format to create the dataset with (only for file-oriented datasets)
formatParams (dict) – the parameters to the format, as a python dict (only for file-oriented datasets, default to {})
- Returns:
A dataset handle
- Return type:
- create_upload_dataset(dataset_name, connection=None)#
Create a new dataset of type ‘UploadedFiles’ in the project, and return a handle to interact with it.
- Parameters:
dataset_name (str) – the name of the dataset to create. Must not already exist
connection (str) – the name of the upload connection (defaults to None)
- Returns:
A dataset handle
- Return type:
- create_filesystem_dataset(dataset_name, connection, path_in_connection)#
Create a new filesystem dataset in the project, and return a handle to interact with it.
- Parameters:
dataset_name (str) – the name of the dataset to create. Must not already exist
connection (str) – the name of the connection
path_in_connection (str) – the path of the dataset in the connection
- Returns:
A dataset handle
- Return type:
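Usage example (a sketch; the connection name and path are placeholders, and the format/schema autodetection follows the pattern recommended for the cloud-storage variants below):
dataset = project.create_filesystem_dataset("my_fs_dataset", "filesystem_root", "/data/input_files")
settings = dataset.autodetect_settings()
settings.save()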
- create_s3_dataset(dataset_name, connection, path_in_connection, bucket=None)#
Creates a new external S3 dataset in the project and returns a
dataikuapi.dss.dataset.DSSDataset
to interact with it.
The created dataset does not have its format and schema initialized; it is recommended to use
autodetect_settings()
on the returned object.
- Parameters:
dataset_name (str) – the name of the dataset to create. Must not already exist
connection (str) – the name of the connection
path_in_connection (str) – the path of the dataset in the connection
bucket (str) – the name of the s3 bucket (defaults to None)
- Returns:
A dataset handle
- Return type:
- create_gcs_dataset(dataset_name, connection, path_in_connection, bucket=None)#
Creates a new external GCS dataset in the project and returns a
dataikuapi.dss.dataset.DSSDataset
to interact with it.
The created dataset does not have its format and schema initialized; it is recommended to use
autodetect_settings()
on the returned object.
- Parameters:
dataset_name (str) – the name of the dataset to create. Must not already exist
connection (str) – the name of the connection
path_in_connection (str) – the path of the dataset in the connection
bucket (str) – the name of the GCS bucket (defaults to None)
- Returns:
A dataset handle
- Return type:
- create_azure_blob_dataset(dataset_name, connection, path_in_connection, container=None)#
Creates a new external Azure dataset in the project and returns a
dataikuapi.dss.dataset.DSSDataset
to interact with it.
The created dataset does not have its format and schema initialized; it is recommended to use
autodetect_settings()
on the returned object.
- Parameters:
dataset_name (str) – the name of the dataset to create. Must not already exist
connection (str) – the name of the connection
path_in_connection (str) – the path of the dataset in the connection
container (str) – the name of the storage account container (defaults to None)
- Returns:
A dataset handle
- Return type:
- create_fslike_dataset(dataset_name, dataset_type, connection, path_in_connection, extra_params=None)#
Create a new file-based dataset in the project, and return a handle to interact with it.
- Parameters:
dataset_name (str) – the name of the dataset to create. Must not already exist
dataset_type (str) – the type of the dataset
connection (str) – the name of the connection
path_in_connection (str) – the path of the dataset in the connection
extra_params (dict) – a python dict of extra parameters (defaults to None)
- Returns:
A dataset handle
- Return type:
- create_sql_table_dataset(dataset_name, type, connection, table, schema, catalog=None)#
Create a new SQL table dataset in the project, and return a handle to interact with it.
- Parameters:
dataset_name (str) – the name of the dataset to create. Must not already exist
type (str) – the type of the dataset
connection (str) – the name of the connection
table (str) – the name of the table in the connection
schema (str) – the schema of the table
catalog (str) – [optional] the catalog of the table
- Returns:
A dataset handle
- Return type:
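Usage example (a sketch; the dataset type, connection, table and schema names are placeholders to adapt to your SQL database):
dataset = project.create_sql_table_dataset("orders_dataset", "PostgreSQL", "my_postgres_connection", "orders", "public")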
- new_managed_dataset_creation_helper(dataset_name)#
Caution
Deprecated. Please use
new_managed_dataset()
- new_managed_dataset(dataset_name)#
Initializes the creation of a new managed dataset. Returns a
dataikuapi.dss.dataset.DSSManagedDatasetCreationHelper
or one of its subclasses to complete the creation of the managed dataset.
Usage example:
builder = project.new_managed_dataset("my_dataset")
builder.with_store_into("target_connection")
dataset = builder.create()
- Parameters:
dataset_name (str) – Name of the dataset to create
- Returns:
An object to create the managed dataset
- Return type:
- get_labeling_task(labeling_task_id)#
Get a handle to interact with a specific labeling task
- Parameters:
labeling_task_id (str) – the id of the desired labeling task
- Returns:
A labeling task handle
- Return type:
dataikuapi.dss.labeling_task.DSSLabelingTask
- list_streaming_endpoints(as_type='listitems')#
List the streaming endpoints in this project.
- Parameters:
as_type (str) – How to return the list. Supported values are “listitems” and “objects” (defaults to listitems).
- Returns:
The list of the streaming endpoints. If “as_type” is “listitems”, each one as a
dataikuapi.dss.streaming_endpoint.DSSStreamingEndpointListItem
. If “as_type” is “objects”, each one as a
dataikuapi.dss.streaming_endpoint.DSSStreamingEndpoint
- Return type:
list
- get_streaming_endpoint(streaming_endpoint_name)#
Get a handle to interact with a specific streaming endpoint
- Parameters:
streaming_endpoint_name (str) – the name of the desired streaming endpoint
- Returns:
A streaming endpoint handle
- Return type:
- create_streaming_endpoint(streaming_endpoint_name, type, params=None)#
Create a new streaming endpoint in the project, and return a handle to interact with it.
The precise structure of params depends on the specific streaming endpoint type. To know which fields exist for a given streaming endpoint type, create a streaming endpoint from the UI, and use
get_streaming_endpoint()
to retrieve the configuration of the streaming endpoint and inspect it. Then reproduce a similar structure in the
create_streaming_endpoint()
call.
Not all settings of a streaming endpoint can be set at creation time (for example partitioning). After creation, you’ll have the ability to modify the streaming endpoint.
- Parameters:
streaming_endpoint_name (str) – the name for the new streaming endpoint
type (str) – the type of the streaming endpoint
params (dict) – the parameters for the type, as a python dict (defaults to {})
- Returns:
A streaming endpoint handle
- Return type:
- create_kafka_streaming_endpoint(streaming_endpoint_name, connection=None, topic=None)#
Create a new kafka streaming endpoint in the project, and return a handle to interact with it.
- Parameters:
streaming_endpoint_name (str) – the name for the new streaming endpoint
connection (str) – the name of the kafka connection (defaults to None)
topic (str) – the name of the kafka topic (defaults to None)
- Returns:
A streaming endpoint handle
- Return type:
- create_httpsse_streaming_endpoint(streaming_endpoint_name, url=None)#
Create a new https streaming endpoint in the project, and return a handle to interact with it.
- Parameters:
streaming_endpoint_name (str) – the name for the new streaming endpoint
url (str) – the url of the endpoint (defaults to None)
- Returns:
A streaming endpoint handle
- Return type:
- new_managed_streaming_endpoint(streaming_endpoint_name, streaming_endpoint_type=None)#
Initializes the creation of a new streaming endpoint. Returns a
dataikuapi.dss.streaming_endpoint.DSSManagedStreamingEndpointCreationHelper
to complete the creation of the streaming endpoint.
- Parameters:
streaming_endpoint_name (str) – Name of the new streaming endpoint - must be unique in the project
streaming_endpoint_type (str) – Type of the new streaming endpoint (optional if it can be inferred from a connection type)
- Returns:
An object to create the streaming endpoint
- Return type:
- create_prediction_ml_task(input_dataset, target_variable, ml_backend_type='PY_MEMORY', guess_policy='DEFAULT', prediction_type=None, wait_guess_complete=True)#
Creates a new prediction task in a new visual analysis lab for a dataset.
- Parameters:
input_dataset (str) – the dataset to use for training/testing the model
target_variable (str) – the variable to predict
ml_backend_type (str) – ML backend to use, one of PY_MEMORY, MLLIB or H2O (defaults to PY_MEMORY)
guess_policy (str) – Policy to use for setting the default parameters. Valid values are: DEFAULT, SIMPLE_FORMULA, DECISION_TREE, EXPLANATORY and PERFORMANCE (defaults to DEFAULT)
prediction_type (str) – The type of prediction problem this is. If not provided the prediction type will be guessed. Valid values are: BINARY_CLASSIFICATION, REGRESSION, MULTICLASS (defaults to None)
wait_guess_complete (boolean) – if False, the returned ML task will be in ‘guessing’ state, i.e. analyzing the input dataset to determine feature handling and algorithms (defaults to True). You should wait for the guessing to be completed by calling wait_guess_complete on the returned object before doing anything else (in particular calling train or get_settings)
- Returns:
A ML task handle of type ‘PREDICTION’
- Return type:
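Usage example (a sketch; the dataset and column names are placeholders):
mltask = project.create_prediction_ml_task(
    input_dataset="customers",
    target_variable="churn",
    prediction_type="BINARY_CLASSIFICATION",
    wait_guess_complete=True)
trained_model_ids = mltask.train()  # trains with the guessed settings and waits for completion
print(trained_model_ids)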
- create_clustering_ml_task(input_dataset, ml_backend_type='PY_MEMORY', guess_policy='KMEANS', wait_guess_complete=True)#
Creates a new clustering task in a new visual analysis lab for a dataset.
The returned ML task will be in ‘guessing’ state, i.e. analyzing the input dataset to determine feature handling and algorithms.
You should wait for the guessing to be completed by calling wait_guess_complete on the returned object before doing anything else (in particular calling train or get_settings)
- Parameters:
ml_backend_type (str) – ML backend to use, one of PY_MEMORY, MLLIB or H2O (defaults to PY_MEMORY)
guess_policy (str) – Policy to use for setting the default parameters. Valid values are: KMEANS and ANOMALY_DETECTION (defaults to KMEANS)
wait_guess_complete (boolean) – if False, the returned ML task will be in ‘guessing’ state, i.e. analyzing the input dataset to determine feature handling and algorithms (defaults to True). You should wait for the guessing to be completed by calling wait_guess_complete on the returned object before doing anything else (in particular calling train or get_settings)
- Returns:
A ML task handle of type ‘CLUSTERING’
- Return type:
- create_timeseries_forecasting_ml_task(input_dataset, target_variable, time_variable, timeseries_identifiers=None, guess_policy='TIMESERIES_DEFAULT', wait_guess_complete=True)#
Creates a new time series forecasting task in a new visual analysis lab for a dataset.
- Parameters:
input_dataset (string) – The dataset to use for training/testing the model
target_variable (string) – The variable to forecast
time_variable (string) – Column to be used as time variable. Should be a Date (parsed) column.
timeseries_identifiers (list) – List of columns to be used as time series identifiers (when the dataset has multiple series)
guess_policy (string) – Policy to use for setting the default parameters. Valid values are: TIMESERIES_DEFAULT, TIMESERIES_STATISTICAL, and TIMESERIES_DEEP_LEARNING
wait_guess_complete (boolean) – If False, the returned ML task will be in ‘guessing’ state, i.e. analyzing the input dataset to determine feature handling and algorithms. You should wait for the guessing to be completed by calling
wait_guess_complete
on the returned object before doing anything else (in particular calling
train
or
get_settings
)
- Returns:
dataikuapi.dss.ml.DSSMLTask
- create_causal_prediction_ml_task(input_dataset, outcome_variable, treatment_variable, prediction_type=None, wait_guess_complete=True)#
Creates a new causal prediction task in a new visual analysis lab for a dataset.
- Parameters:
input_dataset (string) – The dataset to use for training/testing the model
outcome_variable (string) – The outcome to predict.
treatment_variable (string) – Column to be used as treatment variable.
prediction_type (string or None) – Valid values are: “CAUSAL_BINARY_CLASSIFICATION”, “CAUSAL_REGRESSION” or None (in this case prediction_type will be set by the Guesser)
wait_guess_complete (boolean) – If False, the returned ML task will be in ‘guessing’ state, i.e. analyzing the input dataset to determine feature handling and algorithms. You should wait for the guessing to be completed by calling
wait_guess_complete
on the returned object before doing anything else (in particular calling
train
or
get_settings
)
- Returns:
dataikuapi.dss.ml.DSSMLTask
- list_ml_tasks()#
List the ML tasks in this project
- Returns:
the list of the ML tasks summaries, each one as a python dict
- Return type:
list
- get_ml_task(analysis_id, mltask_id)#
Get a handle to interact with a specific ML task
- Parameters:
analysis_id (str) – the identifier of the visual analysis containing the desired ML task
mltask_id (str) – the identifier of the desired ML task
- Returns:
A ML task handle
- Return type:
- list_mltask_queues()#
List non-empty ML task queues in this project
- Returns:
an iterable listing of MLTask queues (each a dict)
- Return type:
dataikuapi.dss.ml.DSSMLTaskQueues
- create_analysis(input_dataset)#
Creates a new visual analysis lab for a dataset.
- Parameters:
input_dataset (str) – the dataset to use for the analysis
- Returns:
A visual analysis handle
- Return type:
dataikuapi.dss.analysis.DSSAnalysis
- list_analyses()#
List the visual analyses in this project
- Returns:
the list of the visual analyses summaries, each one as a python dict
- Return type:
list
- get_analysis(analysis_id)#
Get a handle to interact with a specific visual analysis
- Parameters:
analysis_id (str) – the identifier of the desired visual analysis
- Returns:
A visual analysis handle
- Return type:
dataikuapi.dss.analysis.DSSAnalysis
- list_saved_models()#
List the saved models in this project
- Returns:
the list of the saved models, each one as a python dict
- Return type:
list
- get_saved_model(sm_id)#
Get a handle to interact with a specific saved model
- Parameters:
sm_id (str) – the identifier of the desired saved model
- Returns:
A saved model handle
- Return type:
- create_mlflow_pyfunc_model(name, prediction_type=None)#
Creates a new external saved model for storing and managing MLFlow models
- Parameters:
name (str) – Human readable name for the new saved model in the flow
prediction_type (str) – Optional (but needed for most operations). One of BINARY_CLASSIFICATION, MULTICLASS, REGRESSION or None. Defaults to None, standing for other prediction types. If the Saved Model has a None prediction type, scoring, inclusion in a bundle or in an API service will be possible, but features related to performance analysis and explainability will not be available.
- Returns:
The created saved model handle
- Return type:
- create_finetuned_llm_saved_model(name)#
Creates a new finetuned LLM Saved Model for finetuning using Python code
- Parameters:
name (str) – Human-readable name for the new saved model in the flow
- Returns:
The created saved model handle
- Return type:
- create_external_model(name, prediction_type, configuration)#
Creates a new Saved model that can contain external remote endpoints as versions.
- Parameters:
name (string) – Human-readable name for the new saved model in the flow
prediction_type (string) – One of BINARY_CLASSIFICATION, MULTICLASS or REGRESSION
configuration (dict) –
A dictionary containing the desired external saved model configuration.
For SageMaker, the syntax is:
configuration = { "protocol": "sagemaker", "region": "<region-name>" "connection": "<connection-name>" }
Where the parameters have the following meaning:
region
: The AWS region of the endpoint, e.g.eu-west-1
connection
: (optional) The DSS SageMaker connection to use for authentication. If not defined, credentials will be derived from environment. See the reference documentation for details.
For Databricks, the syntax is:
configuration = { "protocol": "databricks", "connection": "<connection-name>" }
For AzureML, the syntax is:
configuration = {
    "protocol": "azure-ml",
    "connection": "<connection-name>",
    "subscription_id": "<id>",
    "resource_group": "<rg>",
    "workspace": "<workspace>"
}
Where the parameters have the following meaning:
connection
: (optional) The DSS Azure ML connection to use for authentication. If not defined, credentials will be derived from the environment. See the reference documentation for details.
subscription_id
: The Azure subscription ID
resource_group
: The Azure resource group
workspace
: The Azure ML workspace
For Vertex AI, the syntax is:
configuration = {
    "protocol": "vertex-ai",
    "region": "<region-name>",
    "connection": "<connection-name>",
    "project_id": "<name> or <id>"
}
Where the parameters have the following meaning:
region
: The GCP region of the endpoint, e.g. europe-west1
connection
: (optional) The DSS Vertex AI connection to use for authentication. If not defined, credentials will be derived from the environment. See the reference documentation for details.
project_id
: The ID or name of the GCP project
Example: create a saved model for SageMaker endpoints serving binary classification models in region eu-west-1
import dataiku

client = dataiku.api_client()
project = client.get_default_project()
configuration = {
    "protocol": "sagemaker",
    "region": "eu-west-1"
}
sm = project.create_external_model("SageMaker Proxy Model", "BINARY_CLASSIFICATION", configuration)
Example: create a saved model for Vertex AI endpoints serving regression models in region europe-west1, on project “my-project”, performing authentication using DSS connection “vertex_conn” of type “Vertex AI”.
import dataiku

client = dataiku.api_client()
project = client.get_default_project()
configuration = {
    "protocol": "vertex-ai",
    "region": "europe-west1",
    "connection": "vertex_conn",
    "project_id": "my-project"
}
sm = project.create_external_model("Vertex AI Proxy Model", "REGRESSION", configuration)
- list_managed_folders()#
List the managed folders in this project
- Returns:
the list of the managed folders, each one as a python dict
- Return type:
list
- get_managed_folder(odb_id)#
Get a handle to interact with a specific managed folder
- Parameters:
odb_id (str) – the identifier of the desired managed folder
- Returns:
A managed folder handle
- Return type:
- create_managed_folder(name, folder_type=None, connection_name='filesystem_folders')#
Create a new managed folder in the project, and return a handle to interact with it
- Parameters:
name (str) – the name of the managed folder
folder_type (str) – type of storage (defaults to None)
connection_name (str) – the connection name (defaults to filesystem_folders)
- Returns:
A managed folder handle
- Return type:
- list_model_evaluation_stores()#
List the model evaluation stores in this project.
- Returns:
The list of the model evaluation stores
- Return type:
list of
dataikuapi.dss.modelevaluationstore.DSSModelEvaluationStore
- get_model_evaluation_store(mes_id)#
Get a handle to interact with a specific model evaluation store
- Parameters:
mes_id (str) – the id of the desired model evaluation store
- Returns:
A model evaluation store handle
- Return type:
- create_model_evaluation_store(name)#
Create a new model evaluation store in the project, and return a handle to interact with it.
- Parameters:
name (str) – the name for the new model evaluation store
- Returns:
A model evaluation store handle
- Return type:
- list_model_comparisons()#
List the model comparisons in this project.
- Returns:
The list of the model comparisons
- Return type:
list
- get_model_comparison(mec_id)#
Get a handle to interact with a specific model comparison
- Parameters:
mec_id (str) – the id of the desired model comparison
- Returns:
A model comparison handle
- Return type:
dataikuapi.dss.modelcomparison.DSSModelComparison
- create_model_comparison(name, prediction_type)#
Create a new model comparison in the project, and return a handle to interact with it.
- Parameters:
name (str) – the name for the new model comparison
prediction_type (str) – one of BINARY_CLASSIFICATION, REGRESSION, MULTICLASS, TIMESERIES_FORECAST, CAUSAL_BINARY_CLASSIFICATION, CAUSAL_REGRESSION
- Returns:
A new model comparison handle
- Return type:
dataikuapi.dss.modelcomparison.DSSModelComparison
- list_jobs()#
List the jobs in this project
- Returns:
a list of the jobs, each one as a python dict, containing both the definition and the state
- Return type:
list
- get_job(id)#
Get a handle to interact with a specific job
- Parameters:
id (str) – the id of the desired job
- Returns:
A job handle
- Return type:
- start_job(definition)#
Create a new job, and return a handle to interact with it
- Parameters:
definition (dict) –
The definition should contain:
the type of job (RECURSIVE_BUILD, NON_RECURSIVE_FORCED_BUILD, RECURSIVE_FORCED_BUILD, RECURSIVE_MISSING_ONLY_BUILD)
a list of outputs to build from the available types: (DATASET, MANAGED_FOLDER, SAVED_MODEL, STREAMING_ENDPOINT)
(Optional) a refreshHiveMetastore field (True or False) to specify whether to re-synchronize the Hive metastore for recomputed HDFS datasets.
- Returns:
A job handle
- Return type:
- start_job_and_wait(definition, no_fail=False)#
Starts a new job and waits for it to complete.
- Parameters:
definition (dict) –
The definition should contain:
the type of job (RECURSIVE_BUILD, NON_RECURSIVE_FORCED_BUILD, RECURSIVE_FORCED_BUILD, RECURSIVE_MISSING_ONLY_BUILD)
a list of outputs to build from the available types: (DATASET, MANAGED_FOLDER, SAVED_MODEL, STREAMING_ENDPOINT)
(Optional) a refreshHiveMetastore field (True or False) to specify whether to re-synchronize the Hive metastore for recomputed HDFS datasets.
no_fail (bool) – if true, the function won’t fail even if the job fails or aborts (defaults to False)
- Returns:
the final status of the job
- Return type:
str
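Usage example (a sketch; the dataset name is a placeholder and the definition shape follows the description above — new_job() is generally the more convenient way to build such a definition):
definition = {
    "type": "NON_RECURSIVE_FORCED_BUILD",
    "outputs": [{"id": "my_dataset", "type": "DATASET"}]
}
state = project.start_job_and_wait(definition)
print(state)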
- new_job(job_type='NON_RECURSIVE_FORCED_BUILD')#
Create a job to be run. You need to add outputs to the job (i.e. what you want to build) before running it.
Usage example:
job_builder = project.new_job()
job_builder.with_output("mydataset")
complete_job = job_builder.start_and_wait()
print("Job %s done" % complete_job.id)
- Parameters:
job_type (str) – the type of job (RECURSIVE_BUILD, NON_RECURSIVE_FORCED_BUILD, RECURSIVE_FORCED_BUILD, RECURSIVE_MISSING_ONLY_BUILD) (defaults to NON_RECURSIVE_FORCED_BUILD)
- Returns:
A job handle
- Return type:
- new_job_definition_builder(job_type='NON_RECURSIVE_FORCED_BUILD')#
Caution
Deprecated. Please use
new_job()
- list_jupyter_notebooks(active=False, as_type='object')#
List the jupyter notebooks of a project.
- Parameters:
active (bool) – if True, only return currently running jupyter notebooks (defaults to False).
as_type (string) – How to return the list. Supported values are “listitems” and “object” (defaults to object).
- Returns:
The list of the notebooks. If “as_type” is “listitems”, each one as a
dataikuapi.dss.jupyternotebook.DSSJupyterNotebookListItem
, if “as_type” is “objects”, each one as a
dataikuapi.dss.jupyternotebook.DSSJupyterNotebook
- Return type:
list of
dataikuapi.dss.jupyternotebook.DSSJupyterNotebookListItem
or list of
dataikuapi.dss.jupyternotebook.DSSJupyterNotebook
- get_jupyter_notebook(notebook_name)#
Get a handle to interact with a specific jupyter notebook
- Parameters:
notebook_name (str) – The name of the jupyter notebook to retrieve
- Returns:
A handle to interact with this jupyter notebook
- Return type:
dataikuapi.dss.jupyternotebook.DSSJupyterNotebook
jupyter notebook handle
- create_jupyter_notebook(notebook_name, notebook_content)#
Create a new jupyter notebook and get a handle to interact with it
- Parameters:
notebook_name (str) – the name of the notebook to create
notebook_content (dict) – the data of the notebook to create, as a dict. The data will be converted to a JSON string internally. Use
DSSJupyterNotebook.get_content()
on a similar existing DSSJupyterNotebook object in order to get a sample definition object.
- Returns:
A handle to interact with the newly created jupyter notebook
- Return type:
dataikuapi.dss.jupyternotebook.DSSJupyterNotebook
jupyter notebook handle
- list_sql_notebooks(as_type='listitems')#
List the SQL notebooks of a project
- Parameters:
as_type (string) – How to return the list. Supported values are “listitems” and “objects” (defaults to listitems)
- Returns:
The list of the notebooks. If “as_type” is “listitems”, each one as a
dataikuapi.dss.sqlnotebook.DSSSQLNotebookListItem
, if “as_type” is “objects”, each one as a
dataikuapi.dss.sqlnotebook.DSSSQLNotebook
- Return type:
List of
dataikuapi.dss.sqlnotebook.DSSSQLNotebookListItem
or list of
dataikuapi.dss.sqlnotebook.DSSSQLNotebook
- get_sql_notebook(notebook_id)#
Get a handle to interact with a specific SQL notebook
- Parameters:
notebook_id (string) – The id of the SQL notebook to retrieve
- Returns:
A handle to interact with this SQL notebook
- Return type:
dataikuapi.dss.sqlnotebook.DSSSQLNotebook
SQL notebook handle
- create_sql_notebook(notebook_content)#
Create a new SQL notebook and get a handle to interact with it
- Parameters:
notebook_content (dict) – The data of the notebook to create, as a dict. The data will be converted to a JSON string internally. Use
DSSSQLNotebook.get_content()
on a similar existing DSSSQLNotebook object in order to get a sample definition object
- Returns:
A handle to interact with the newly created SQL notebook
- Return type:
dataikuapi.dss.sqlnotebook.DSSSQLNotebook
SQL notebook handle
- list_continuous_activities(as_objects=True)#
List the continuous activities in this project
- Parameters:
as_objects (bool) – if True, returns a list of
dataikuapi.dss.continuousactivity.DSSContinuousActivity
objects, else returns a list of python dicts (defaults to True)
- Returns:
a list of the continuous activities, each one as a python dict, containing both the definition and the state
- Return type:
list
- get_continuous_activity(recipe_id)#
Get a handle to interact with a specific continuous activity
- Parameters:
recipe_id (str) – the identifier of the recipe controlled by the continuous activity
- Returns:
A continuous activity handle
- Return type:
- get_variables()#
Gets the variables of this project.
- Returns:
a dictionary containing two dictionaries: “standard” and “local”. “standard” variables are regular variables, exported with bundles. “local” variables are not part of the bundles for this project
- Return type:
dict
- set_variables(obj)#
Sets the variables of this project.
Warning
If executed from a python recipe, the changes made by set_variables will not be “seen” in that recipe. Use the internal API dataiku.get_custom_variables() instead if this behavior is needed
- Parameters:
obj (dict) – must be a modified version of the object returned by get_variables
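Usage example (a sketch; the variable name and value are placeholders):
variables = project.get_variables()
variables["standard"]["input_path"] = "/data/incoming"
project.set_variables(variables)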
- update_variables(variables, type='standard')#
Updates a set of variables for this project
- Parameters:
variables (dict) – a dict of variable name -> value to set. Keys of the dict must be strings. Values in the dict can be strings, numbers, booleans, lists or dicts
type (str) – Can be “standard” to update regular variables or “local” to update local-only variables that are not part of bundles for this project (defaults to standard)
- list_api_services(as_type='listitems')#
List the API services in this project
- Parameters:
as_type (str) – How to return the list. Supported values are “listitems” and “objects” (defaults to listitems).
- Returns:
The list of the API services. If “as_type” is “listitems”, each one as a
dataikuapi.dss.apiservice.DSSAPIServiceListItem
. If “as_type” is “objects”, each one as a
dataikuapi.dss.apiservice.DSSAPIService
- Return type:
list
- create_api_service(service_id)#
Create a new API service, and returns a handle to interact with it. The newly-created service does not have any endpoint.
- Parameters:
service_id (str) – the ID of the API service to create
- Returns:
An API Service handle
- Return type:
- get_api_service(service_id)#
Get a handle to interact with a specific API Service from the API Designer
- Parameters:
service_id (str) – The identifier of the API Designer API Service to retrieve
- Returns:
A handle to interact with this API Service
- Return type:
- list_exported_bundles()#
List all the bundles created in this project on the Design Node.
- Returns:
A dictionary of all bundles for a project on the Design node.
- Return type:
dict
- export_bundle(bundle_id)#
Creates a new project bundle on the Design node
- Parameters:
bundle_id (str) – bundle id tag
- delete_exported_bundle(bundle_id)#
Deletes a project bundle from the Design node
- Parameters:
bundle_id (str) – bundle id tag
- get_exported_bundle_archive_stream(bundle_id)#
Download a bundle archive that can be deployed in a DSS automation Node, as a binary stream.
Warning
The stream must be closed after use. Use a with statement to handle closing the stream at the end of the block by default. For example:
with project.get_exported_bundle_archive_stream('v1') as fp:
    # use fp
    ...

# or explicitly close the stream after use
fp = project.get_exported_bundle_archive_stream('v1')
# use fp, then close
fp.close()
- Parameters:
bundle_id (str) – the identifier of the bundle
- download_exported_bundle_archive_to_file(bundle_id, path)#
Download a bundle archive that can be deployed in a DSS automation Node into the given output file.
- Parameters:
bundle_id (str) – the identifier of the bundle
path (str) – if “-”, will write to /dev/stdout
- publish_bundle(bundle_id, published_project_key=None)#
Publish a bundle on the Project Deployer.
- Parameters:
bundle_id (str) – The identifier of the bundle
published_project_key (str) – The key of the project on the Project Deployer where the bundle will be published. A new published project will be created if none matches the key. If the parameter is not set, the key from the current
DSSProject
is used.
- Returns:
a dict with info on the bundle state once published. It contains the keys “publishedOn” for the publish date, “publishedBy” for the user who published the bundle, and “publishedProjectKey” for the key of the Project Deployer project used.
- Return type:
dict
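Usage example (a sketch, to be run against the Design node; the bundle id is a placeholder):
project.export_bundle("v1")
info = project.publish_bundle("v1")
print(info["publishedOn"], info["publishedBy"], info["publishedProjectKey"])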
- list_imported_bundles()#
List all the bundles imported for this project, on the Automation node.
- Returns:
a dict containing bundle imports for a project, on the Automation node.
- Return type:
dict
- import_bundle_from_archive(archive_path)#
Imports a bundle from a zip archive path on the Automation node.
- Parameters:
archive_path (str) – A full path to a zip archive, for example /home/dataiku/my-bundle-v1.zip
- import_bundle_from_stream(fp)#
Imports a bundle from a file stream, on the Automation node.
Usage example:
project = client.get_project('MY_PROJECT')
with open('/home/dataiku/my-bundle-v1.zip', 'rb') as f:
    project.import_bundle_from_stream(f)
- Parameters:
fp (file-like) – file handler.
- activate_bundle(bundle_id, scenarios_to_enable=None)#
Activates a bundle in this project.
- Parameters:
bundle_id (str) – The ID of the bundle to activate
scenarios_to_enable (dict) – An optional dict of scenarios to enable or disable upon bundle activation. The format of the dict should be scenario IDs as keys with values of True or False (defaults to {}).
- Returns:
A report containing any error or warning messages that occurred during bundle activation
- Return type:
dict
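Usage example (a sketch, to be run against the Automation node; automation_client is assumed to be a DSSClient pointing to that node, and the archive path and bundle id are placeholders):
project = automation_client.get_project("MY_PROJECT")
project.import_bundle_from_archive("/home/dataiku/my-bundle-v1.zip")
report = project.activate_bundle("v1")
print(report)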
- preload_bundle(bundle_id)#
Preloads a bundle that has been imported on the Automation node
- Parameters:
bundle_id (str) – the bundle_id for an existing imported bundle
- list_scenarios(as_type='listitems')#
List the scenarios in this project.
- Parameters:
as_type (str) – How to return the list. Supported values are “listitems” and “objects” (defaults to listitems).
- Returns:
The list of the scenarios. If “as_type” is “listitems”, each one as a
dataikuapi.dss.scenario.DSSScenarioListItem
. If “as_type” is “objects”, each one as a
dataikuapi.dss.scenario.DSSScenario
- Return type:
list
- get_scenario(scenario_id)#
Get a handle to interact with a specific scenario
- Parameters:
scenario_id (str) – the ID of the desired scenario
- Returns:
A scenario handle
- Return type:
- create_scenario(scenario_name, type, definition=None)#
Create a new scenario in the project, and return a handle to interact with it
- Parameters:
scenario_name (str) – The name for the new scenario. This does not need to be unique (although this is strongly recommended)
type (str) – The type of the scenario. Must be one of ‘step_based’ or ‘custom_python’
definition (dict) – the JSON definition of the scenario. Use get_definition(with_status=False) on an existing DSSScenario object in order to get a sample definition object (defaults to {‘params’: {}})
- Returns:
a
dataikuapi.dss.scenario.DSSScenario
handle to interact with the newly-created scenario
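Usage example (a sketch; the scenario name is a placeholder):
scenario = project.create_scenario("Nightly build", "step_based")
settings = scenario.get_settings()  # then add steps and triggers through the settings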
- list_recipes(as_type='listitems')#
List the recipes in this project
- Parameters:
as_type (str) – How to return the list. Supported values are “listitems” and “objects” (defaults to listitems).
- Returns:
The list of the recipes. If “as_type” is “listitems”, each one as a
dataikuapi.dss.recipe.DSSRecipeListItem
. If “as_type” is “objects”, each one as a
dataikuapi.dss.recipe.DSSRecipe
- Return type:
list
- get_recipe(recipe_name)#
Gets a
dataikuapi.dss.recipe.DSSRecipe
handle to interact with a recipe
- Parameters:
recipe_name (str) – The name of the recipe
- Returns:
A recipe handle
- Return type:
- create_recipe(recipe_proto, creation_settings)#
Create a new recipe in the project, and return a handle to interact with it. We strongly recommend that you use the creator helpers instead of calling this directly.
Some recipe types require additional parameters in creation_settings:
‘grouping’ : a ‘groupKey’ column name
‘python’, ‘sql_query’, ‘hive’, ‘impala’ : the code of the recipe as a ‘payload’ string
- Parameters:
recipe_proto (dict) – a prototype for the recipe object. Must contain at least ‘type’ and ‘name’
creation_settings (dict) – recipe-specific creation settings
- Returns:
A recipe handle
- Return type:
- new_recipe(type, name=None)#
Initializes the creation of a new recipe. Returns a
dataikuapi.dss.recipe.DSSRecipeCreator
or one of its subclasses to complete the creation of the recipe.
Usage example:
grouping_recipe_builder = project.new_recipe("grouping")
grouping_recipe_builder.with_input("dataset_to_group_on")
# Create a new managed dataset for the output in the "filesystem_managed" connection
grouping_recipe_builder.with_new_output("grouped_dataset", "filesystem_managed")
grouping_recipe_builder.with_group_key("column")
recipe = grouping_recipe_builder.build()

# After the recipe is created, you can edit its settings
recipe_settings = recipe.get_settings()
recipe_settings.set_column_aggregations("value", sum=True)
recipe_settings.save()

# And you may need to apply new schemas to the outputs
recipe.compute_schema_updates().apply()
- Parameters:
type (str) – Type of the recipe
name (str) – Optional, base name for the new recipe.
- Returns:
A new DSS Recipe Creator handle
- Return type:
- get_flow()#
- Returns:
A Flow handle
- Return type:
- sync_datasets_acls()#
Resync permissions on HDFS datasets in this project
Attention
This call requires an API key with admin rights
- Returns:
a handle to the task of resynchronizing the permissions
- Return type:
- list_running_notebooks(as_objects=True)#
Caution
Deprecated. Use
DSSProject.list_jupyter_notebooks()
List the currently-running notebooks
- Returns:
list of notebooks. Each object contains at least a ‘name’ field
- Return type:
list
- get_tags()#
List the tags of this project.
- Returns:
a dictionary containing the tags with a color
- Return type:
dict
- set_tags(tags=None)#
Set the tags of this project.
- Parameters:
tags (dict) – must be a modified version of the object returned by get_tags() (defaults to {})
- list_macros(as_objects=False)#
List the macros accessible in this project
- Parameters:
as_objects (bool) – if True, return the macros as
dataikuapi.dss.macro.DSSMacro
macro handles instead of a list of python dicts (defaults to False)
- Returns:
the list of the macros
- Return type:
list
- get_macro(runnable_type)#
Get a handle to interact with a specific macro
- Parameters:
runnable_type (str) – the identifier of a macro
- Returns:
A macro handle
- Return type:
- get_wiki()#
Get the wiki
- Returns:
the wiki associated to the project
- Return type:
- get_object_discussions()#
Get a handle to manage discussions on the project
- Returns:
the handle to manage discussions
- Return type:
- init_tables_import()#
Start an operation to import Hive or SQL tables as datasets into this project
- Returns:
a
dataikuapi.dss.project.TablesImportDefinition
to add tables to import
- Return type:
- list_sql_schemas(connection_name)#
Lists schemas from which tables can be imported in a SQL connection
- Parameters:
connection_name (str) – name of the SQL connection
- Returns:
an array of schemas names
- Return type:
list
- list_hive_databases()#
Lists Hive databases from which tables can be imported
- Returns:
an array of databases names
- Return type:
list
- list_sql_tables(connection_name, schema_name=None)#
Lists tables to import in a SQL connection
- Parameters:
connection_name (str) – name of the SQL connection
schema_name (str) – Optional, name of the schema in the SQL connection in which to list tables.
- Returns:
an array of tables
- Return type:
list
- list_hive_tables(hive_database)#
Lists tables to import in a Hive database
- Parameters:
hive_database (str) – name of the Hive database
- Returns:
an array of tables
- Return type:
list
- list_elasticsearch_indices_or_aliases(connection_name)#
Lists the Elasticsearch indices or aliases from which datasets can be created in an Elasticsearch connection
- Parameters:
connection_name (str) – name of the Elasticsearch connection
- get_app_manifest()#
Gets the manifest of the application if the project is an app template or an app instance, fails otherwise.
- Returns:
the manifest of the application associated to the project
- Return type:
- setup_mlflow(managed_folder, host=None)#
Set up the dss-plugin for MLflow
- Parameters:
managed_folder (object) – the managed folder where MLflow artifacts should be stored. Can be either a managed folder id as a string, a
dataikuapi.dss.managedfolder.DSSManagedFolder
, or a
dataiku.Folder
host (str) – set up a custom host if the backend used is not DSS (defaults to None).
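Usage example (a sketch; the managed folder id is a placeholder):
folder = project.get_managed_folder("A_FOLDER_ID")
project.setup_mlflow(managed_folder=folder)
mlflow_ext = project.get_mlflow_extension()  # for DSS-specific MLflow operations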
- get_mlflow_extension()#
Get a handle to interact with the extension of MLflow provided by DSS
- Returns:
An MLflow Extension handle
- Return type:
- list_code_studios(as_type='listitems')#
List the code studio objects in this project
- Parameters:
as_type (str) – How to return the list. Supported values are “listitems” and “objects” (defaults to listitems).
- Returns:
the list of the code studio objects, each one as a python dict
- Return type:
list
- get_code_studio(code_studio_id)#
Get a handle to interact with a specific code studio object
- Parameters:
code_studio_id (str) – the identifier of the desired code studio object
- Returns:
A code studio object handle
- Return type:
- create_code_studio(name, template_id)#
Create a new code studio object in the project, and return a handle to interact with it
- Parameters:
name (str) – the name of the code studio object
template_id (str) – the identifier of a code studio template
- Returns:
A code studio object handle
- Return type:
- get_library()#
Get a handle to manage the project library
- Returns:
- Return type:
- list_llms(purpose='GENERIC_COMPLETION', as_type='listitems')#
List the LLMs usable in this project
- Parameters:
purpose (str) – Usage purpose of the LLM. Main values are GENERIC_COMPLETION and TEXT_EMBEDDING_EXTRACTION
as_type (str) – How to return the list. Supported values are “listitems” and “objects”.
- Returns:
The list of LLMs. If “as_type” is “listitems”, each one as a
dataikuapi.dss.llm.DSSLLMListItem
. If “as_type” is “objects”, each one as a
dataikuapi.dss.llm.DSSLLM
- Return type:
list
- get_llm(llm_id)#
Get a handle to interact with a specific LLM
- Parameters:
llm_id (str) – the identifier of an LLM
- Returns:
A
dataikuapi.dss.llm.DSSLLM
LLM handle
- list_knowledge_banks(as_type='listitems')#
List the knowledge banks of this project
- Parameters:
as_type (str) – How to return the list. Supported values are “listitems” and “objects”.
- Returns:
The list of knowledge banks. If “as_type” is “listitems”, each one as a
dataikuapi.dss.knowledgebank.DSSKnowledgeBankListItem
. If “as_type” is “objects”, each one as a
dataikuapi.dss.knowledgebank.DSSKnowledgeBank
- Return type:
list
- get_knowledge_bank(id)#
Get a handle to interact with a specific knowledge bank
- Parameters:
id – the identifier of a knowledge bank
- Returns:
A
dataikuapi.dss.knowledgebank.DSSKnowledgeBank
knowledge bank handle
- list_webapps(as_type='listitems')#
List the webapp heads of this project
- Parameters:
as_type (str) – How to return the list. Supported values are “listitems” and “objects”.
- Returns:
The list of the webapps. If “as_type” is “listitems”, each one as a
dataikuapi.dss.webapp.DSSWebAppListItem
. If “as_type” is “objects”, each one as a
dataikuapi.dss.webapp.DSSWebApp
- Return type:
list
- get_webapp(webapp_id)#
Get a handle to interact with a specific webapp
- Parameters:
webapp_id (str) – the identifier of a webapp
- Returns:
A
dataikuapi.dss.webapp.DSSWebApp
webapp handle
- list_dashboards(as_type='listitems')#
List the Dashboards in this project.
- Returns:
The list of the dashboards.
- Return type:
list
- get_dashboard(dashboard_id)#
Get a handle to interact with a specific dashboard object
- Parameters:
dashboard_id (str) – the identifier of the desired dashboard object
- Returns:
A
dataikuapi.dss.dashboard.DSSDashboard
dashboard object handle
- create_dashboard(dashboard_name, settings=None)#
Create a new dashboard in the project, and return a handle to interact with it
- Parameters:
dashboard_name (str) – The name for the new dashboard. This does not need to be unique (although this is strongly recommended)
settings (dict) – the JSON definition of the dashboard. Use
get_settings()
on an existing
DSSDashboard
object in order to get a sample settings object (defaults to {‘pages’: []})
- Returns:
a
dashboard.DSSDashboard
handle to interact with the newly-created dashboard
- list_insights(as_type='listitems')#
List the Insights in this project.
- Returns:
The list of the insights.
- Return type:
list
- get_insight(insight_id)#
Get a handle to interact with a specific insight object
- Parameters:
insight_id (str) – the identifier of the desired insight object
- Returns:
A
dataikuapi.dss.insight.DSSInsight
insight object handle
- create_insight(creation_info)#
Create a new insight in the project, and return a handle to interact with it
- Parameters:
creation_info (dict) – the JSON definition of the insight creation. Use
get_settings()
on an existing
DSSInsight
object in order to get a sample settings object
- Returns:
a
insight.DSSInsight
handle to interact with the newly-created insight
- get_project_git()#
Gets a handle to perform operations on the project’s git repository.
- Returns:
a handle to perform git operations on project.
- Return type:
- get_data_quality_status(only_monitored=True)#
Get the aggregated quality status of a project with the list of the datasets and their associated status
- Parameters:
only_monitored (bool) – if True, only retrieve monitored datasets (defaults to True).
- Returns:
The dict of data quality dataset statuses.
- Return type:
dict with DATASET_NAME as key
- get_data_quality_timeline(min_timestamp=None, max_timestamp=None)#
Get the list of quality statuses aggregated per day during the timeframe [min_timestamp, max_timestamp]. It includes the current & worst outcome for each day and the details of the dataset runs within the period; it also includes previously deleted monitored datasets, with the mention “(deleted)” appended to their id. The default parameters cover the last 14 days.
- Parameters:
min_timestamp (int) – timestamp representing the beginning of the timeframe
max_timestamp (int) – timestamp representing the end of the timeframe
- Returns:
list of datasets per day in the timeline
- Return type:
list of dict
- class dataikuapi.dss.project.DSSProjectGit(client, project_key)#
Handle to manage the git repository of a DSS project (fetch, push, pull, …)
- get_status()#
Get the current state of the project’s git repository
- Returns:
A dict containing the following keys: ‘currentBranch’, ‘remotes’, ‘trackingCount’, ‘clean’, ‘hasUncommittedChanges’, ‘added’, ‘changed’, ‘removed’, ‘missing’, ‘modified’, ‘conflicting’, ‘untracked’ and ‘untrackedFolders’
- Return type:
dict
- get_remote(name='origin')#
Get the URL of the remote repository.
- Parameters:
name (str) – The name of the remote. Defaults to “origin”.
- Returns:
The URL of the remote origin if set, None otherwise.
- Return type:
str or None
- set_remote(url, name='origin')#
Set the URL of the remote repository.
- Parameters:
url (str) – The URL of the remote repository (git@github.com:user/repo.git).
name (str) – The name of the remote to set. Defaults to “origin”.
- remove_remote(name='origin')#
Remove the remote origin of the project’s git repository.
- Parameters:
name (str) – The name of the remote to remove. Defaults to “origin”.
- list_branches(remote=False)#
List all branches (local only or local & remote) of the project’s git repository.
- Parameters:
remote (bool) – Whether to include remote branches.
- Returns:
A list of branch names.
- Return type:
list
- create_branch(branch_name, commit=None)#
Create a new local branch on the project’s git repository and switches to it.
- Parameters:
branch_name (str) – The name of the branch to create.
commit (str) – Hash of a commit to create the branch from (optional).
- Returns:
A dict containing keys ‘success’ and ‘output’ with information about the command execution.
- Return type:
dict
- delete_branch(branch_name, force_delete=False, remote=False, delete_remotely=False)#
Delete a local or remote branch on the project’s git repository.
- Parameters:
branch_name (str) – The name of the branch to delete.
remote (bool) – True if the branch to delete is a remote branch; False if it’s a local branch.
delete_remotely (bool) – True to delete a remote branch both locally and remotely; False to delete the remote branch on the local repository only.
force_delete (bool) – True to force the deletion even if some commits have not been pushed; False to fail in case some commits have not been pushed.
- get_current_branch()#
Get the name of the current branch
- Returns:
The name of the current branch
- Return type:
str
- list_tags()#
Lists all existing tags.
- Returns:
A list of dict objects, each one containing the following keys: ‘name’, ‘shortName’, ‘commit’ (hash of the commit associated with the tag), ‘annotations’ and ‘readOnly’.
- Return type:
list
- create_tag(name, reference='HEAD', message='')#
Create a tag for the specified or current reference.
- Parameters:
name (str) – The name of the tag to create.
reference (str) – ID of a commit to tag. Defaults to HEAD.
message (str) – An optional annotation message for the tag.
- delete_tag(name)#
Remove a tag from the local repository
- Parameters:
name (str) – The name of the tag to delete.
- switch(branch_name)#
Switch the current repository to the specified branch.
- Parameters:
branch_name (str) – The name of the branch to switch to.
- Returns:
A dict containing keys ‘success’, ‘messages’, ‘output’ with information about the command execution.
- Return type:
dict
- checkout(branch_name)#
Switch the current repository to the specified branch (identical to switch())
- Parameters:
branch_name (str) – The name of the branch to checkout.
- Returns:
A dict containing keys ‘success’, ‘messages’, ‘output’ with information about the command execution.
- Return type:
dict
- fetch()#
Fetch branches and/or tags (collectively, “refs”) from the remote repository to the project’s git repository.
- Returns:
A dict containing keys ‘success’, ‘logs’, ‘output’ with information about the command execution.
- Return type:
dict
- pull(branch_name=None)#
Incorporate changes from a remote repository into the current branch on the project’s git repository.
- Parameters:
branch_name (str) – The name of the branch to pull. If None, pull from the current branch.
- Returns:
A dict containing keys ‘success’, ‘logs’, ‘output’ with information about the command execution.
- Return type:
dict
- push(branch_name=None)#
Update the remote repository with the project’s local commits.
- Parameters:
branch_name (str) – The name of the branch to push. If None, push commits from the current branch.
- Returns:
A dict containing keys ‘success’, ‘logs’, ‘output’ with information about the command execution.
- Return type:
dict
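A typical synchronization sequence, reusing the `git` handle from the setup sketch above, might look like this:

```python
# Refresh remote refs, then bring the current branch up to date
git.fetch()
pull_result = git.pull()
if not pull_result["success"]:
    print(pull_result["logs"])

# Push local commits of the current branch to the remote
push_result = git.push()
print(push_result["output"])
```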
- log(path=None, start_commit=None, count=1000)#
List commits in the project’s git repository.
- Parameters:
path (str) – Path to filter the logs (optional). If specified, only commits impacting files located in the provided path are returned.
start_commit (str) – ID of the first commit to list. Use the value found in the nextCommit field from a previous response (optional).
count (int) – Maximum number of commits to return (defaults to 1000).
- Returns:
A dict containing a key ‘entries’ and optionally a second key ‘nextCommit’ if there are more commits to list.
- Return type:
dict
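For example, the whole history can be walked by following the nextCommit field; a sketch reusing the `git` handle from the setup above:

```python
# Collect all commits, page by page, following nextCommit
commits = []
page = git.log(count=100)
commits.extend(page["entries"])
while "nextCommit" in page:
    page = git.log(start_commit=page["nextCommit"], count=100)
    commits.extend(page["entries"])
print("%d commits in total" % len(commits))
```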
- diff(commit_from=None, commit_to=None)#
Show changes between the working copy and the last commit (commit_from=None, commit_to=None), between two commits (commit_from=SOME_ID, commit_to=SOME_ID), or made in a given commit (commit_from=SOME_ID, commit_to=None).
- Parameters:
commit_from (str) – ID of the first commit or None
commit_to (str) – ID of the second commit or None
- Returns:
A dict containing the following keys: ‘commitFrom’ (dict), ‘commitTo’ (dict), ‘addedLines’ (int), ‘removedLines’ (int), ‘changedFiles’ (int), ‘entries’ (array of dict)
- Return type:
dict
- commit(message)#
Commit pending changes in the project’s git repository with the given message.
Note: Untracked files are automatically added before committing.
- Parameters:
message (str) – The commit message.
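For instance, to review pending changes before committing them (reusing the `git` handle from the setup sketch above; the commit message is a placeholder):

```python
# Inspect what changed in the working copy since the last commit
pending = git.diff()
print("%d files changed, +%d / -%d lines" % (
    pending["changedFiles"], pending["addedLines"], pending["removedLines"]))

# Commit everything, including untracked files
git.commit("Update recipes and project variables")
```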
- revert_to_revision(commit)#
Revert the project content to the supplied revision.
- Parameters:
commit (str) – Hash of a valid commit as returned by the log method.
- Returns:
A dict containing a key ‘success’ and optionally a second key ‘logs’ with information about the command execution.
- Return type:
dict
- revert_commit(commit)#
Revert the changes that the specified commit introduces
- Parameters:
commit (str) – ID of a valid commit as returned by the log method.
- Returns:
A dict containing the following keys: ‘success’ and optionally ‘logs’ with information about the command execution.
- Return type:
dict
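A short sketch of rolling back, reusing the `git` handle from the setup above (the commit hash is a placeholder; use a value returned by log()):

```python
# Hash of a valid commit, as returned by git.log() (placeholder value here)
old_commit = "0a1b2c3d"

# Roll the whole project content back to that revision
result = git.revert_to_revision(old_commit)
print(result["success"])

# Or undo only the changes introduced by that single commit
git.revert_commit(old_commit)
```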
- reset_to_head()#
Drop uncommitted changes in the project’s git repository (hard reset to HEAD).
- reset_to_upstream()#
Drop local changes in the project’s git repository and hard reset to the upstream branch.
- drop_and_rebuild(i_know_what_i_am_doing=False)#
Fully drop the current git repository and rebuild a new one from scratch. CAUTION: ALL HISTORY WILL BE LOST. ONLY CALL THIS METHOD IF YOU KNOW WHAT YOU ARE DOING.
- Parameters:
i_know_what_i_am_doing (bool) – True if you really want to wipe out all git history for this project.
- list_libraries()#
Get the list of all external libraries for this project
- Returns:
A list of external libraries.
- Return type:
list
- add_library(repository, local_target_path, checkout, path_in_git_repository='', add_to_python_path=True)#
Add a new external library to the project and pull it.
- Parameters:
repository (str) – The remote repository.
local_target_path (str) – The local target path (relative to root of libraries).
checkout (str) – The branch, commit, or tag to check out.
path_in_git_repository (str) – The path in the git repository.
add_to_python_path (bool) – Whether to add the reference to the Python path.
- Returns:
a dataikuapi.dss.future.DSSFuture representing the pull process
- Return type:
dataikuapi.dss.future.DSSFuture
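For example, reusing the `git` handle from the setup sketch above (the repository URL and paths are placeholders; the wait_for_result() call on the returned DSSFuture is an assumption):

```python
# Add an external library from a git repository and pull it
future = git.add_library(
    repository="git@github.com:user/shared-libs.git",  # placeholder repository
    local_target_path="shared-libs",
    checkout="main",
    path_in_git_repository="python",
    add_to_python_path=True,
)
# Assumption: DSSFuture exposes wait_for_result() to block until the pull finishes
future.wait_for_result()

print(git.list_libraries())
```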
- set_library(git_reference_path, remote, remotePath, checkout)#
Update the settings of an existing external library (remote repository, path in the repository and checkout).
- Parameters:
git_reference_path (str) – The path of the external library.
remote (str) – The remote repository.
remotePath (str) – The path in the git repository.
checkout (str) – The branch, commit, or tag to check out.
- Returns:
The path of the external library.
- Return type:
str
- remove_library(git_reference_path, delete_directory)#
Remove an external library from the project.
- Parameters:
git_reference_path (str) – The path of the external library.
delete_directory (bool) – Whether to delete the local directory associated with the reference.
- reset_library(git_reference_path)#
Reset changes to HEAD from the external library.
- Parameters:
git_reference_path (str) – The path of the external library to reset.
- Returns:
a dataikuapi.dss.future.DSSFuture representing the reset process
- Return type:
dataikuapi.dss.future.DSSFuture
- push_library(git_reference_path, commit_message)#
Push changes to the external library
- Parameters:
git_reference_path (str) – The path of the external library.
commit_message (str) – The commit message for the push.
- Returns:
a dataikuapi.dss.future.DSSFuture representing the push process
- Return type:
dataikuapi.dss.future.DSSFuture
- push_all_libraries(commit_message)#
Push changes for all libraries in the project.
- Parameters:
commit_message (str) – The commit message for the push.
- Returns:
a dataikuapi.dss.future.DSSFuture representing the push process
- Return type:
dataikuapi.dss.future.DSSFuture
- reset_all_libraries()#
Reset changes for all libraries in the project.
- Returns:
a dataikuapi.dss.future.DSSFuture representing the reset process
- Return type:
dataikuapi.dss.future.DSSFuture
- class dataiku.Project(project_key=None)#
This is a handle to interact with the current project
Note
This class is also available as
dataiku.Project
- get_last_metric_values()#
Get the set of last values of the metrics on this project.
- Return type:
- get_metric_history(metric_lookup)#
Get the set of all values a given metric took on this project.
- Parameters:
metric_lookup (string) – metric name or unique identifier
- Return type:
dict
- save_external_metric_values(values_dict)#
Save metrics on this project.
The metrics are saved with the type “external”
- Parameters:
values_dict (dict) – the values to save, as a dict. The keys of the dict are used as metric names
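For example, from inside a DSS recipe or notebook (the metric names and values are placeholders):

```python
import dataiku

# Handle to the current project
project = dataiku.Project()

# Save custom values as "external" metrics, keyed by metric name
project.save_external_metric_values({"rows_ingested": 12500, "error_rate": 0.02})

# Inspect the latest metric values and the history of one metric
print(project.get_last_metric_values())
print(project.get_metric_history("rows_ingested"))  # lookup by name; exact lookup syntax may differ
```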
- get_last_check_values()#
Get the set of last values of the checks on this project.
- Return type:
- get_check_history(check_lookup)#
Get the set of all values a given check took on this project.
- Parameters:
check_lookup (string) – check name or unique identifier
- Return type:
dict
- set_variables(variables)#
Set all variables of the current project
- Parameters:
variables (dict) – must be a modified version of the object returned by get_variables
- get_variables()#
Get project variables
- Parameters:
typed (bool) – True to try to cast the variables into their original type (e.g. int rather than string)
- Returns:
A dictionary containing two dictionaries: “standard” and “local”. “standard” variables are regular variables, exported with bundles. “local” variables are not part of the bundles for this project
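A minimal sketch of the read/modify/write cycle for project variables, from inside a DSS recipe or notebook (the variable names are placeholders):

```python
import dataiku

project = dataiku.Project()

# Read, modify, then write back the full variables object
variables = project.get_variables()
variables["standard"]["model_version"] = "v2"  # exported with bundles
variables["local"]["debug_mode"] = True        # stays local to this project
project.set_variables(variables)
```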
- save_external_check_values(values_dict)#
Save checks on this project.
The checks are saved with the type “external”.
- Parameters:
values_dict (dict) – the values to save, as a dict. The keys of the dict are used as check names