Flow creation and management#

For usage information and examples, see Flow creation and management

class dataikuapi.dss.flow.DSSProjectFlow(client, project)#
get_graph()#

Get the flow graph.

Returns:

A handle to use the flow graph

Return type:

DSSProjectFlowGraph
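
A minimal usage sketch (host, API key and project key are placeholders):

    import dataikuapi

    client = dataikuapi.DSSClient("http://localhost:11200", "your-api-key")
    project = client.get_project("MYPROJECT")
    flow = project.get_flow()
    graph = flow.get_graph()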

create_zone(name, color='#2ab1ac')#

Creates a new flow zone.

Parameters:
  • name (str) – new flow zone name

  • color (str) – new flow zone color, in hexadecimal format #RRGGBB (defaults to #2ab1ac)

Returns:

the newly created zone

Return type:

DSSFlowZone
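
A minimal sketch (zone name and color are illustrative):

    zone = flow.create_zone("Data preparation", color="#aa4444")
    print(zone.id, zone.name)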

get_zone(id)#

Gets a single Flow zone by id.

Parameters:

id (str) – flow zone id

Returns:

A flow zone

Return type:

DSSFlowZone

get_default_zone()#

Returns the default zone of the Flow.

Returns:

A flow zone

Return type:

DSSFlowZone

list_zones()#

Lists all zones in the Flow.

Returns:

A list of flow zones

Return type:

list of DSSFlowZone

get_zone_of_object(obj)#

Finds the zone to which this object belongs.

If the object is not found in any specific zone, it belongs to the default zone, and the default zone is returned.

Parameters:

obj (DSSDataset, DSSManagedFolder, DSSSavedModel, DSSRecipe, DSSModelEvaluationStore or DSSStreamingEndpoint) – An object to search

Returns:

The flow zone containing the object

Return type:

DSSFlowZone
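
For example, assuming the project contains a dataset named "orders":

    dataset = project.get_dataset("orders")  # hypothetical dataset name
    zone = flow.get_zone_of_object(dataset)
    print(zone.id, zone.name)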

replace_input_computable(current_ref, new_ref, type='DATASET')#

This method replaces, across the whole Flow, all references to a “computable” (Dataset, Managed Folder or Saved Model) used as an input of recipes with a reference to another computable.

No specific checks are performed. It is your responsibility to ensure that the schema of the new dataset will be compatible with the previous one (in the case of datasets).

If new_ref references an object in a foreign project, this method will automatically ensure that new_ref is exposed to the current project.

Parameters:
  • current_ref (str) – Either a “simple” object id (dataset name or model/folder/model evaluation store/streaming endpoint id) or a foreign object reference in the form “FOREIGN_PROJECT_KEY.local_id”

  • new_ref (str) – Either a “simple” object id (dataset name or model/folder/model evaluation store/streaming endpoint id) or a foreign object reference in the form “FOREIGN_PROJECT_KEY.local_id”

  • type (str) – The type of object being replaced (DATASET, SAVED_MODEL, MANAGED_FOLDER, MODEL_EVALUATION_STORE, STREAMING_ENDPOINT) (defaults to DATASET)
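
A minimal sketch (dataset names are hypothetical):

    # Make every recipe that currently reads "orders_raw" read "orders_clean" instead
    flow.replace_input_computable("orders_raw", "orders_clean", type="DATASET")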

generate_documentation(folder_id=None, path=None)#

Start the flow document generation from a template docx file in a managed folder, or from the default template if neither folder id nor path is specified.

Parameters:
  • folder_id (str) – the id of the managed folder (defaults to None)

  • path (str) – the path to the file from the root of the folder (defaults to None)

Returns:

The flow document generation future

Return type:

DSSFuture

generate_documentation_from_custom_template(fp)#

Start the flow document generation from a docx template (as a file object).

Parameters:

fp (object) – A file-like object pointing to a template docx file

Returns:

The flow document generation future

Return type:

DSSFuture

download_documentation_stream(export_id)#

Download a flow documentation, as a binary stream.

Warning

You need to close the stream after download. Failure to do so will result in the DSSClient becoming unusable.

Parameters:

export_id (str) – the id of the generated flow documentation returned as the result of the future

Returns:

the generated flow documentation file as a stream

Return type:

requests.Response
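
A sketch of saving the stream to disk while making sure it gets closed (export_id comes from a previously completed generation future):

    stream = flow.download_documentation_stream(export_id)
    try:
        with open("flow_documentation.docx", "wb") as f:
            for chunk in stream.iter_content(chunk_size=8192):
                if chunk:
                    f.write(chunk)
    finally:
        stream.close()  # mandatory: an unclosed stream makes the DSSClient unusable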

download_documentation_to_file(export_id, path)#

Download a flow documentation into the given output file.

Parameters:
  • export_id (str) – the id of the generated flow documentation returned as the result of the future

  • path (str) – the path where to download the flow documentation
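
Putting generation and download together, a sketch of the full workflow (the "exportId" result key is an assumption; the output path is illustrative):

    future = flow.generate_documentation()  # use the default template
    result = future.wait_for_result()
    export_id = result["exportId"]  # assumed key in the future's result
    flow.download_documentation_to_file(export_id, "/tmp/flow_documentation.docx")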

start_tool(type, data={})#

Start a tool or open a view in the flow.

Parameters:
  • type (str) – one of {COPY, CHECK_CONSISTENCY, PROPAGATE_SCHEMA} (tools) or {TAGS, CUSTOM_FIELDS, CONNECTIONS, COUNT_OF_RECORDS, FILESIZE, FILEFORMATS, RECIPES_ENGINES, RECIPES_CODE_ENVS, IMPALA_WRITE_MODE, HIVE_MODE, SPARK_ENGINE, SPARK_CONFIG, SPARK_PIPELINES, SQL_PIPELINES, PARTITIONING, PARTITIONS, SCENARIOS, CREATION, LAST_MODIFICATION, LAST_BUILD, RECENT_ACTIVITY, WATCH} (views)

  • data (dict) – initial data for the tool (defaults to {})

Returns:

A handle to interact with the newly-created tool or view

Return type:

DSSFlowTool
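
A minimal sketch of opening a view (get_state() and stop() are assumed to be available on the returned DSSFlowTool handle):

    tool = flow.start_tool("TAGS")
    state = tool.get_state()  # assumption: DSSFlowTool exposes get_state()
    tool.stop()               # assumption: and stop()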

new_schema_propagation(dataset_name)#

Start an automatic schema propagation from a dataset.

Parameters:

dataset_name (str) – name of a dataset to start propagating from

Returns:

A handle to set options and start the propagation

Return type:

DSSSchemaPropagationRunBuilder

class dataikuapi.dss.flow.DSSProjectFlowGraph(flow, data)#
get_source_computables(as_type='dict')#
Parameters:

as_type (str) – How to return the source computables. Possible values are “dict” and “object” (defaults to dict)

Returns:

The list of source computables

Return type:

If as_type=dict, each computable is returned as a dict containing at least “ref” and “type”. If as_type=object, each computable is returned as a DSSDataset, DSSManagedFolder, DSSSavedModel, DSSModelEvaluationStore or DSSStreamingEndpoint
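
A sketch comparing the two return styles:

    graph = flow.get_graph()
    for computable in graph.get_source_computables():  # list of dicts
        print(computable["type"], computable["ref"])
    objects = graph.get_source_computables(as_type="object")  # list of DSS objects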

get_source_recipes(as_type='dict')#
Parameters:

as_type (str) – How to return the source recipes. Possible values are “dict” and “object” (defaults to dict)

Returns:

The list of source recipes

Return type:

If as_type=dict, each recipe is returned as a dict containing at least “ref” and “type”. If as_type=object, each recipe is returned as a DSSRecipe.

get_source_datasets()#
Returns:

The list of source datasets for this project

Return type:

List of DSSDataset

get_successor_recipes(node, as_type='dict')#
Parameters:
  • node (str or DSSDataset) – Either a name or a dataset object

  • as_type (str) – How to return the successor recipes. Possible values are “dict” and “object” (defaults to dict)

Returns:

A list of recipes that are a successor of the given graph node

Return type:

If as_type=dict, each recipe is returned as a dict containing at least “ref” and “type”. If as_type=object, each recipe is returned as a DSSRecipe.
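
A sketch listing the recipes directly downstream of a dataset (dataset name is hypothetical):

    graph = flow.get_graph()
    for recipe in graph.get_successor_recipes("orders", as_type="dict"):
        print(recipe["type"], recipe["ref"])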

get_successor_computables(node, as_type='dict')#
Parameters:
  • node (str or DSSRecipe) – Either a name or a recipe object

  • as_type (str) – How to return the successor computables. Possible values are “dict” and “object” (defaults to dict).

Returns:

A list of computables that are a successor of a given graph node

Return type:

If as_type=dict, each computable is returned as a dict containing at least “ref” and “type”. If as_type=object, each computable is returned as a DSSDataset, DSSManagedFolder, DSSSavedModel, DSSModelEvaluationStore or DSSStreamingEndpoint

get_items_in_traversal_order(as_type='dict')#

Get the list of nodes in left-to-right order.

Parameters:

as_type (str) – How to return the nodes. Possible values are “dict” and “object” (defaults to dict).

Returns:

A list of nodes

Return type:

If as_type=dict, each item is returned as a dict containing at least “ref” and “type”. If as_type=object, each item is returned as a DSSDataset, DSSManagedFolder, DSSSavedModel, DSSModelEvaluationStore, DSSStreamingEndpoint or DSSRecipe
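
A sketch walking the whole Flow from sources to sinks:

    for item in flow.get_graph().get_items_in_traversal_order():
        print(item["type"], item["ref"])  # each dict has at least "ref" and "type"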

class dataikuapi.dss.flow.DSSFlowZone(flow, data)#

A zone in the Flow.

Important

Do not create this object manually, use DSSProjectFlow.get_zone() or DSSProjectFlow.list_zones()

property id#
property name#
property color#
get_settings()#

Gets the settings of this zone in order to modify them.

Returns:

The settings of the flow zone

Return type:

DSSFlowZoneSettings

add_item(obj)#

Adds an item to this zone.

The item will automatically be moved from its existing zone. Additional items may be moved to this zone as a result of the operation (notably the recipe generating obj).

Parameters:

obj (DSSDataset, DSSManagedFolder, DSSSavedModel, DSSRecipe, DSSModelEvaluationStore or DSSStreamingEndpoint) – object to add to the zone

add_items(items)#

Adds items to this zone.

The items will automatically be moved from their existing zones. Additional items may be moved to this zone as a result of the operations (notably the recipe generating the items).

Parameters:

items (list of DSSDataset, DSSManagedFolder, DSSSavedModel, DSSRecipe, DSSModelEvaluationStore or DSSStreamingEndpoint) – A list of objects to add to the zone

property items#

The list of items explicitly belonging to this zone.

This list is read-only. To add an object, use add_item(); the object will be removed from its current zone, if any. To remove an object from a zone without placing it in another specific zone, add it to the default zone: flow.get_zone('default').add_item(item)

Note

The “default” zone content is defined as all items that are not explicitly in another zone. It cannot directly be listed with the items property. To get the list of items including those in the default zone, use the get_graph() method.

Returns:

the items in the zone

Return type:

list of DSSDataset, DSSManagedFolder, DSSSavedModel, DSSRecipe, DSSModelEvaluationStore or DSSStreamingEndpoint
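
A sketch of moving a dataset into a zone and back to the default zone (names are hypothetical):

    prep_zone = flow.create_zone("Preparation")
    dataset = project.get_dataset("orders")     # hypothetical dataset
    prep_zone.add_item(dataset)
    print(prep_zone.items)                      # items explicitly in this zone
    flow.get_zone("default").add_item(dataset)  # back to the default zone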

add_shared(obj)#

Share an item to this zone.

The item will not be automatically unshared from its existing zone.

Parameters:

obj (DSSDataset, DSSManagedFolder, DSSSavedModel, DSSRecipe, DSSModelEvaluationStore or DSSStreamingEndpoint) – object to share to the zone

remove_shared(obj)#

Remove a shared item from this zone.

Parameters:

obj (DSSDataset, DSSManagedFolder, DSSSavedModel, DSSModelEvaluationStore or DSSStreamingEndpoint) – object to remove from the zone

property shared#

The list of items that have been explicitly pre-shared to this zone.

This list is read-only. To modify it, use add_shared() and remove_shared()

Returns:

the items shared to this zone

Return type:

list of DSSDataset, DSSManagedFolder, DSSSavedModel, DSSModelEvaluationStore or DSSStreamingEndpoint
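
A sketch of pre-sharing a dataset into a zone without moving it (ids and names are hypothetical):

    other_zone = flow.get_zone("xYz42")      # hypothetical zone id
    dataset = project.get_dataset("orders")  # hypothetical dataset
    other_zone.add_shared(dataset)
    print(other_zone.shared)
    other_zone.remove_shared(dataset)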

get_graph()#

Get the flow graph.

Returns:

A handle to use the flow graph

Return type:

DSSProjectFlowGraph

delete()#

Delete the zone. All its items will be moved to the default zone.

class dataikuapi.dss.flow.DSSFlowZoneSettings(zone)#

The settings of a flow zone.

Important

Do not create this directly, use DSSFlowZone.get_settings().

get_raw()#

Gets the raw settings of the zone.

Note

You cannot modify the items and shared elements through this class. Instead, use DSSFlowZone.add_item() and the other DSSFlowZone methods

property name#
property color#
save()#

Saves the settings of the zone
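
A sketch of renaming and recoloring a zone obtained from DSSProjectFlow (this assumes name and color are settable properties of DSSFlowZoneSettings):

    settings = zone.get_settings()
    settings.name = "Feature engineering"  # assumption: settable property
    settings.color = "#aa4444"             # assumption: settable property
    settings.save()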

class dataikuapi.dss.flow.DSSSchemaPropagationRunBuilder(project, client, dataset_name)#

Important

Do not create this directly, use DSSProjectFlow.new_schema_propagation().

set_auto_rebuild(auto_rebuild)#

Sets whether to automatically rebuild datasets if needed while propagating.

Parameters:

auto_rebuild (bool) – whether to automatically rebuild datasets if needed (defaults to True)

set_default_partitioning_value(dimension, value)#

In the case of partitioned flows, sets the default partition value to use when rebuilding, for a specific dimension name.

Parameters:
  • dimension (str) – a partitioning dimension name

  • value (str) – a partitioning dimension value

set_partition_for_computable(full_id, partition)#

In the case of partitioned flows, sets the partition id to use when building a particular computable. Overrides the default partitioning value per dimension.

Parameters:
  • full_id (str) – Full name of the computable, in the form PROJECTKEY.id

  • partition (str) – a full partition id (all dimensions)

stop_at(recipe_name)#

Sets the given recipe as a schema propagation stop mark.

Parameters:

recipe_name (str) – the name of the recipe

mark_recipe_as_ok(name)#

Marks a recipe as always considered OK during propagation.

Parameters:

name (str) – recipe to mark as ok

set_grouping_update_options(recipe=None, remove_missing_aggregates=True, remove_missing_keys=True, new_aggregates={})#

Sets update options for grouping recipes.

Parameters:
  • recipe (str) – if None, applies to all grouping recipes; otherwise, applies only to the recipe with this name (defaults to None)

  • remove_missing_aggregates (bool) – whether to remove missing aggregates (defaults to True)

  • remove_missing_keys (bool) – whether to remove missing keys (defaults to True)

  • new_aggregates (dict) – new aggregates (defaults to {})

set_window_update_options(recipe=None, remove_missing_aggregates=True, remove_missing_in_window=True, new_aggregates={})#

Sets update options for window recipes.

Parameters:
  • recipe (str) – if None, applies to all window recipes; otherwise, applies only to the recipe with this name (defaults to None)

  • remove_missing_aggregates (bool) – whether to remove missing aggregates (defaults to True)

  • remove_missing_in_window (bool) – whether to remove missing keys in windows (defaults to True)

  • new_aggregates (dict) – new aggregates (defaults to {})

set_join_update_options(recipe=None, remove_missing_join_conditions=True, remove_missing_join_values=True, new_selected_columns={})#

Sets update options for join recipes.

Parameters:
  • recipe (str) – if None, applies to all join recipes; otherwise, applies only to the recipe with this name (defaults to None)

  • remove_missing_join_conditions (bool) – whether to remove missing join conditions (defaults to True)

  • remove_missing_join_values (bool) – whether to remove missing join values (defaults to True)

  • new_selected_columns (dict) – new selected columns (defaults to {})

start()#

Starts the actual propagation. Returns a future to wait for completion.

Returns:

A future representing the schema propagation job

Return type:

DSSFuture
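
Putting the builder together, a sketch of a complete propagation run (dataset and recipe names are hypothetical):

    prop = flow.new_schema_propagation("orders_raw")   # hypothetical source dataset
    prop.set_auto_rebuild(False)                       # don't rebuild datasets along the way
    prop.mark_recipe_as_ok("compute_orders_enriched")  # hypothetical recipe name
    prop.set_grouping_update_options(remove_missing_aggregates=False)
    future = prop.start()
    result = future.wait_for_result()                  # block until propagation completes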