Flow creation and management#
For usage information and examples, see Flow creation and management
- class dataikuapi.dss.flow.DSSProjectFlow(client, project)#
- get_graph()#
Get the flow graph.
- Returns:
A handle to use the flow graph
- Return type:
DSSProjectFlowGraph
- create_zone(name, color='#2ab1ac')#
Creates a new flow zone.
- Parameters:
name (str) – new flow zone name
color (str) – new flow zone color, in hexadecimal format #RRGGBB (defaults to #2ab1ac)
- Returns:
the newly created zone
- Return type:
DSSFlowZone
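As a minimal sketch, a zone can be created from a project's flow handle. The host URL, API key, project key and zone names below are illustrative placeholders:

```python
import dataikuapi

# Connect to the DSS instance (host and API key are placeholders)
client = dataikuapi.DSSClient("http://localhost:11200", "YOUR_API_KEY")
project = client.get_project("MYPROJECT")
flow = project.get_flow()

# Create a zone with the default teal color, then one with a custom color
staging_zone = flow.create_zone("staging")
reporting_zone = flow.create_zone("reporting", color="#ff6600")
```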
- get_zone(id)#
Gets a single Flow zone by id.
- Parameters:
id (str) – flow zone id
- Returns:
A flow zone
- Return type:
DSSFlowZone
- get_default_zone()#
Returns the default zone of the Flow.
- Returns:
A flow zone
- Return type:
DSSFlowZone
- list_zones()#
Lists all zones in the Flow.
- Returns:
A list of flow zones
- Return type:
list of DSSFlowZone
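For example, zones can be enumerated and looked up by id (connection details and the zone id are placeholders; the "default" zone id is taken from this page):

```python
import dataikuapi

client = dataikuapi.DSSClient("http://localhost:11200", "YOUR_API_KEY")
project = client.get_project("MYPROJECT")
flow = project.get_flow()

# List every zone in the Flow with its id, name and color
for zone in flow.list_zones():
    print(zone.id, zone.name, zone.color)

# Look a single zone up by id; the default zone always exists
default_zone = flow.get_zone("default")
```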
- get_zone_of_object(obj)#
Finds the zone to which this object belongs.
If the object is not found in any specific zone, it belongs to the default zone, and the default zone is returned.
- Parameters:
obj (DSSDataset, DSSManagedFolder, DSSSavedModel, DSSRecipe, DSSModelEvaluationStore or DSSStreamingEndpoint) – An object to search
- Returns:
The flow zone containing the object
- Return type:
DSSFlowZone
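As a short sketch, the zone of a given dataset can be found like this (connection details and the dataset name are placeholders):

```python
import dataikuapi

client = dataikuapi.DSSClient("http://localhost:11200", "YOUR_API_KEY")
project = client.get_project("MYPROJECT")
flow = project.get_flow()

# Find which zone a dataset lives in; objects not explicitly placed
# in a zone are reported as belonging to the default zone
dataset = project.get_dataset("mydataset")
zone = flow.get_zone_of_object(dataset)
print(zone.name)
```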
- replace_input_computable(current_ref, new_ref, type='DATASET')#
This method replaces, throughout the Flow, all references to a “computable” (dataset, managed folder or saved model) used as a recipe input with a reference to another computable.
No specific checks are performed. It is your responsibility to ensure that the schema of the new dataset will be compatible with the previous one (in the case of datasets).
If new_ref references an object in a foreign project, this method will automatically ensure that new_ref is exposed to the current project.
- Parameters:
current_ref (str) – Either a “simple” object id (dataset name or model/folder/model evaluation store/streaming endpoint id) or a foreign object reference in the form “FOREIGN_PROJECT_KEY.local_id”
new_ref (str) – Either a “simple” object id (dataset name or model/folder/model evaluation store/streaming endpoint id) or a foreign object reference in the form “FOREIGN_PROJECT_KEY.local_id”
type (str) – The type of object being replaced (DATASET, SAVED_MODEL, MANAGED_FOLDER, MODEL_EVALUATION_STORE, STREAMING_ENDPOINT) (defaults to DATASET)
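A minimal sketch of both reference forms (connection details and object ids are placeholders):

```python
import dataikuapi

client = dataikuapi.DSSClient("http://localhost:11200", "YOUR_API_KEY")
project = client.get_project("MYPROJECT")
flow = project.get_flow()

# Swap every recipe input referencing "old_dataset" to "new_dataset".
# No schema compatibility check is performed, so verify it yourself.
flow.replace_input_computable("old_dataset", "new_dataset", type="DATASET")

# A foreign reference uses the "FOREIGN_PROJECT_KEY.local_id" form;
# the referenced object is automatically exposed to the current project
flow.replace_input_computable("old_dataset", "OTHERPROJ.shared_dataset")
```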
- generate_documentation(folder_id=None, path=None)#
Start the flow document generation from a template docx file in a managed folder, or from the default template if neither folder id nor path is specified.
- Parameters:
folder_id (str) – the id of the managed folder (defaults to None)
path (str) – the path to the file from the root of the folder (defaults to None)
- Returns:
The flow document generation future
- Return type:
DSSFuture
- generate_documentation_from_custom_template(fp)#
Start the flow document generation from a docx template (as a file object).
- Parameters:
fp (object) – A file-like object pointing to a template docx file
- Returns:
The flow document generation future
- Return type:
DSSFuture
- download_documentation_stream(export_id)#
Download a flow documentation, as a binary stream.
Warning
You need to close the stream after download. Failure to do so will result in the DSSClient becoming unusable.
- Parameters:
export_id (str) – the id of the generated flow documentation returned as the result of the future
- Returns:
the generated flow documentation file as a stream
- Return type:
requests.Response
- download_documentation_to_file(export_id, path)#
Download a flow documentation into the given output file.
- Parameters:
export_id (str) – the id of the generated flow documentation returned as the result of the future
path (str) – the path where to download the flow documentation
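Putting the three documentation methods together, a generate-wait-download workflow might look like the sketch below. Connection details are placeholders, and the exact key under which the future's result exposes the export id is an assumption, not confirmed by this page:

```python
import dataikuapi

client = dataikuapi.DSSClient("http://localhost:11200", "YOUR_API_KEY")
project = client.get_project("MYPROJECT")
flow = project.get_flow()

# Generate the flow documentation from the default template and
# wait for the generation future to complete
future = flow.generate_documentation()
result = future.wait_for_result()

# The result of the future carries the export id; the key name
# "exportId" below is an assumption, inspect `result` to confirm
export_id = result["exportId"]

# Download directly to a file (no stream to close in this variant)
flow.download_documentation_to_file(export_id, "/tmp/flow_doc.docx")
```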
- start_tool(type, data={})#
Start a tool or open a view in the flow.
- Parameters:
type (str) – one of {COPY, CHECK_CONSISTENCY, PROPAGATE_SCHEMA} (tools) or {TAGS, CUSTOM_FIELDS, CONNECTIONS, COUNT_OF_RECORDS, FILESIZE, FILEFORMATS, RECIPES_ENGINES, RECIPES_CODE_ENVS, IMPALA_WRITE_MODE, HIVE_MODE, SPARK_ENGINE, SPARK_CONFIG, SPARK_PIPELINES, SQL_PIPELINES, PARTITIONING, PARTITIONS, SCENARIOS, CREATION, LAST_MODIFICATION, LAST_BUILD, RECENT_ACTIVITY, WATCH} (views)
data (dict) – initial data for the tool (defaults to {})
- Returns:
A handle to interact with the newly-created tool or view
- Return type:
DSSFlowTool
- new_schema_propagation(dataset_name)#
Start an automatic schema propagation from a dataset.
- Parameters:
dataset_name (str) – name of a dataset to start propagating from
- Returns:
A handle to set options and start the propagation
- Return type:
DSSSchemaPropagationRunBuilder
- class dataikuapi.dss.flow.DSSProjectFlowGraph(flow, data)#
- get_source_computables(as_type='dict')#
- Parameters:
as_type (str) – How to return the source computables. Possible values are “dict” and “object” (defaults to dict)
- Returns:
The list of source computables
- Return type:
If as_type=dict, each computable is returned as a dict containing at least “ref” and “type”. If as_type=object, each computable is returned as a DSSDataset, DSSManagedFolder, DSSSavedModel, DSSModelEvaluationStore or DSSStreamingEndpoint.
- get_source_recipes(as_type='dict')#
- Parameters:
as_type (str) – How to return the source recipes. Possible values are “dict” and “object” (defaults to dict)
- Returns:
The list of source recipes
- Return type:
If as_type=dict, each recipe is returned as a dict containing at least “ref” and “type”. If as_type=object, each recipe is returned as a DSSRecipe.
- get_source_datasets()#
- Returns:
The list of source datasets for this project
- Return type:
List of DSSDataset
- get_successor_recipes(node, as_type='dict')#
- Parameters:
node (str or DSSDataset) – Either a name or a dataset object
as_type (str) – How to return the successor recipes. Possible values are “dict” and “object” (defaults to dict)
- Returns:
A list of recipes that are a successor of the given graph node
- Return type:
If as_type=dict, each recipe is returned as a dict containing at least “ref” and “type”. If as_type=object, each recipe is returned as a DSSRecipe.
- get_successor_computables(node, as_type='dict')#
- Parameters:
node (str or DSSRecipe) – Either a name or a recipe object
as_type (str) – How to return the successor computables. Possible values are “dict” and “object” (defaults to dict).
- Returns:
A list of computables that are a successor of a given graph node
- Return type:
If as_type=dict, each computable is returned as a dict containing at least “ref” and “type”. If as_type=object, each computable is returned as a DSSDataset, DSSManagedFolder, DSSSavedModel, DSSModelEvaluationStore or DSSStreamingEndpoint.
- get_items_in_traversal_order(as_type='dict')#
Get the list of nodes in left to right order.
- Parameters:
as_type (str) – How to return the nodes. Possible values are “dict” and “object” (defaults to dict).
- Returns:
A list of nodes
- Return type:
If as_type=dict, each item is returned as a dict containing at least “ref” and “type”. If as_type=object, each item is returned as a DSSDataset, DSSManagedFolder, DSSSavedModel, DSSModelEvaluationStore, DSSStreamingEndpoint or DSSRecipe.
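As a sketch, the graph can be walked from its sources through to its sinks (connection details are placeholders):

```python
import dataikuapi

client = dataikuapi.DSSClient("http://localhost:11200", "YOUR_API_KEY")
project = client.get_project("MYPROJECT")
flow = project.get_flow()
graph = flow.get_graph()

# Source computables as dicts, each with at least "ref" and "type"
for computable in graph.get_source_computables():
    print("source:", computable["type"], computable["ref"])

# All nodes in left-to-right (traversal) order
for item in graph.get_items_in_traversal_order():
    print(item["type"], item["ref"])

# The same queries can return handle objects instead of dicts
for recipe in graph.get_source_recipes(as_type="object"):
    print(recipe)
```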
- class dataikuapi.dss.flow.DSSFlowZone(flow, data)#
A zone in the Flow.
Important
Do not create this object manually, use DSSProjectFlow.get_zone() or DSSProjectFlow.list_zones().
- property id#
- property name#
- property color#
- get_settings()#
Gets the settings of this zone in order to modify them.
- Returns:
The settings of the flow zone
- Return type:
DSSFlowZoneSettings
- add_item(obj)#
Adds an item to this zone.
The item will automatically be moved from its existing zone. Additional items may be moved to this zone as a result of the operation (notably the recipe generating obj).
- Parameters:
obj (DSSDataset, DSSManagedFolder, DSSSavedModel, DSSRecipe, DSSModelEvaluationStore or DSSStreamingEndpoint) – object to add to the zone
- add_items(items)#
Adds items to this zone.
The items will automatically be moved from their existing zones. Additional items may be moved to this zone as a result of the operations (notably the recipe generating the items).
- Parameters:
items (list of DSSDataset, DSSManagedFolder, DSSSavedModel, DSSRecipe, DSSModelEvaluationStore or DSSStreamingEndpoint) – A list of objects to add to the zone
- property items#
The list of items explicitly belonging to this zone.
This list is read-only. To add an object, use add_item(); this removes it from its current zone, if any. To remove an object from a zone without placing it in another specific zone, add it to the default zone: flow.get_zone('default').add_item(item)
Note
The “default” zone content is defined as all items that are not explicitly in another zone. It cannot directly be listed with the items property. To get the list of items including those in the default zone, use the get_graph() method.
- Returns:
the items in the zone
- Return type:
list of DSSDataset, DSSManagedFolder, DSSSavedModel, DSSRecipe, DSSModelEvaluationStore or DSSStreamingEndpoint
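A short sketch of moving items between zones (connection details, the zone id and the dataset name are placeholders):

```python
import dataikuapi

client = dataikuapi.DSSClient("http://localhost:11200", "YOUR_API_KEY")
project = client.get_project("MYPROJECT")
flow = project.get_flow()

# Move a dataset into a zone; it is removed from its previous zone,
# and the recipe producing it may be moved along with it
zone = flow.get_zone("my_zone_id")
dataset = project.get_dataset("mydataset")
zone.add_item(dataset)

# Inspect the items explicitly placed in the zone (read-only list)
for item in zone.items:
    print(item)

# Send the item back to the default zone
flow.get_zone("default").add_item(dataset)
```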
- add_shared(obj)#
Share an item to this zone.
The item will not be automatically unshared from its existing zone.
- Parameters:
obj (DSSDataset, DSSManagedFolder, DSSSavedModel, DSSRecipe, DSSModelEvaluationStore or DSSStreamingEndpoint) – object to share to the zone
- remove_shared(obj)#
Remove a shared item from this zone.
- Parameters:
obj (DSSDataset, DSSManagedFolder, DSSSavedModel, DSSModelEvaluationStore or DSSStreamingEndpoint) – object to remove from the zone
- property shared#
The list of items that have been explicitly pre-shared to this zone.
This list is read-only. To modify it, use add_shared() and remove_shared().
- Returns:
the items shared to this zone
- Return type:
list of DSSDataset, DSSManagedFolder, DSSSavedModel, DSSModelEvaluationStore or DSSStreamingEndpoint
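Sharing differs from moving: the item stays in its own zone and is only made visible in the target zone. A minimal sketch (connection details, the zone id and the dataset name are placeholders):

```python
import dataikuapi

client = dataikuapi.DSSClient("http://localhost:11200", "YOUR_API_KEY")
project = client.get_project("MYPROJECT")
flow = project.get_flow()

# Pre-share a dataset into a zone without moving it, then unshare it
zone = flow.get_zone("my_zone_id")
dataset = project.get_dataset("mydataset")
zone.add_shared(dataset)
print(zone.shared)          # read-only list of pre-shared items
zone.remove_shared(dataset)
```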
- get_graph()#
Get the flow graph.
- Returns:
A handle to use the flow graph
- Return type:
DSSProjectFlowGraph
- delete()#
Delete the zone; all its items will be moved to the default zone.
- class dataikuapi.dss.flow.DSSFlowZoneSettings(zone)#
The settings of a flow zone.
Important
Do not create this directly, use DSSFlowZone.get_settings().
- get_raw()#
Gets the raw settings of the zone.
Note
You cannot modify the items and shared elements through this class. Instead, use DSSFlowZone.add_item() and the other zone methods.
- property name#
- property color#
- save()#
Saves the settings of the zone
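As a sketch, a zone can be renamed through its settings object. Connection details and the zone id are placeholders, and the key names in the raw settings dict are assumptions, inspect get_raw() to confirm them:

```python
import dataikuapi

client = dataikuapi.DSSClient("http://localhost:11200", "YOUR_API_KEY")
project = client.get_project("MYPROJECT")
flow = project.get_flow()

# Fetch the zone settings, edit the raw dict, then save
zone = flow.get_zone("my_zone_id")
settings = zone.get_settings()
raw = settings.get_raw()
raw["name"] = "renamed zone"   # assumed key name
raw["color"] = "#2ab1ac"       # assumed key name
settings.save()
```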
- class dataikuapi.dss.flow.DSSSchemaPropagationRunBuilder(project, client, dataset_name)#
Important
Do not create this directly, use DSSProjectFlow.new_schema_propagation().
- set_auto_rebuild(auto_rebuild)#
Sets whether to automatically rebuild datasets if needed while propagating.
- Parameters:
auto_rebuild (bool) – whether to automatically rebuild datasets if needed (defaults to True)
- set_default_partitioning_value(dimension, value)#
In the case of partitioned flows, sets the default partition value to use when rebuilding, for a specific dimension name.
- Parameters:
dimension (str) – a partitioning dimension name
value (str) – a partitioning dimension value
- set_partition_for_computable(full_id, partition)#
In the case of partitioned flows, sets the partition id to use when building a particular computable. Overrides the default partitioning value per dimension.
- Parameters:
full_id (str) – Full name of the computable, in the form PROJECTKEY.id
partition (str) – a full partition id (all dimensions)
- stop_at(recipe_name)#
Sets the given recipe as a schema propagation stop mark.
- Parameters:
recipe_name (str) – the name of the recipe
- mark_recipe_as_ok(name)#
Marks a recipe as always considered as OK during propagation.
- Parameters:
name (str) – recipe to mark as ok
- set_grouping_update_options(recipe=None, remove_missing_aggregates=True, remove_missing_keys=True, new_aggregates={})#
Sets update options for grouping recipes.
- Parameters:
recipe (str) – if None, applies to all grouping recipes. Else, applies only to this name (defaults to None)
remove_missing_aggregates (bool) – whether to remove missing aggregates (defaults to True)
remove_missing_keys (bool) – whether to remove missing keys (defaults to True)
new_aggregates (dict) – new aggregates (defaults to {})
- set_window_update_options(recipe=None, remove_missing_aggregates=True, remove_missing_in_window=True, new_aggregates={})#
Sets update options for window recipes.
- Parameters:
recipe (str) – if None, applies to all window recipes. Else, applies only to this name (defaults to None)
remove_missing_aggregates (bool) – whether to remove missing aggregates (defaults to True)
remove_missing_in_window (bool) – whether to remove missing keys in windows (defaults to True)
new_aggregates (dict) – new aggregates (defaults to {})
- set_join_update_options(recipe=None, remove_missing_join_conditions=True, remove_missing_join_values=True, new_selected_columns={})#
Sets update options for join recipes.
- Parameters:
recipe (str) – if None, applies to all join recipes. Else, applies only to this name (defaults to None)
remove_missing_join_conditions (bool) – whether to remove missing join conditions (defaults to True)
remove_missing_join_values (bool) – whether to remove missing join values (defaults to True)
new_selected_columns (dict) – new selected columns (defaults to {})
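Tying the builder methods together, a schema propagation run might be configured as below. Connection details and object names are placeholders, and the start() method that launches the run and returns a future is an assumption not documented on this page:

```python
import dataikuapi

client = dataikuapi.DSSClient("http://localhost:11200", "YOUR_API_KEY")
project = client.get_project("MYPROJECT")
flow = project.get_flow()

# Configure a propagation starting from a source dataset
builder = flow.new_schema_propagation("mydataset")
builder.set_auto_rebuild(False)                  # do not rebuild datasets
builder.mark_recipe_as_ok("compute_something")   # skip checks on this recipe
builder.stop_at("final_sync")                    # do not propagate past here
builder.set_grouping_update_options(remove_missing_aggregates=False)

# Launch the propagation; the start() name is an assumption
future = builder.start()
result = future.wait_for_result()
```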