Script a blueprint migration#
Dataiku Govern requires a Python script to define the conditions for blueprint migrations. Blueprint migration scripts allow you to control how an artifact’s information is mapped from one blueprint version to another. This page explains how to write such a script.
Note
The migration script cannot edit the structure of blueprint versions. It can only map the information of an artifact within the framework of existing blueprint versions.
See also
If you are only interested in applying blueprint migrations, check out How-to | Switch artifact templates (blueprint versions).
Pre-populated lines#
The pre-populated lines in the script establish the source and target blueprints.
from govern.core.migration_handler import get_migration_handler
handler = get_migration_handler()
### Get the artifact to migrate
target_artifact = handler.target_artifact
### Get the source enriched artifact
source_enriched_artifact = handler.source_enriched_artifact
### Get the target blueprint version definition
target_blueprint_version = handler.target_blueprint_version
The script also contains a few commented-out lines that can help you rename the target artifact, create default target fields, and so on.
Note
source_enriched_artifact contains multiple objects:
the source blueprint (source_enriched_artifact.blueprint)
the source blueprint version (source_enriched_artifact.blueprint_version)
the source artifact (source_enriched_artifact.artifact)
The most important object to manipulate is target_artifact.
Important
The target_artifact object in the migration script is a copy of the incoming artifact that will be migrated.
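To illustrate what working on a copy means, here is a minimal sketch that uses plain Python stand-ins instead of the real Govern objects. It assumes that the copy is independent of the source artifact and that the source artifact exposes a `fields` dictionary like the target does; neither detail is confirmed on this page.

```python
import copy
from types import SimpleNamespace

# Hypothetical stand-ins for the objects exposed by the migration handler.
source_artifact = SimpleNamespace(fields={"field1": "original"})
target_artifact = copy.deepcopy(source_artifact)

# Editing the target leaves the source values available for reference.
target_artifact.fields["field1"] = "migrated"
```

Under these assumptions, the script can still read the original value from the source artifact after it has overwritten the value on the target.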
Managing fields#
When writing a migration script, one main task is to migrate the fields. For each field being migrated, there are four possible cases to consider:
Case | Old blueprint version | New blueprint version |
---|---|---|
Case 1 | Field exists | Field exists with the same definition |
Case 2 | Field exists | Field exists with different definition |
Case 3 | Field exists | Field doesn’t exist |
Case 4 | Field doesn’t exist | Field exists |
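The four cases can be expressed as a small decision function. The sketch below is only an illustration: it models each blueprint version as a plain dictionary mapping a field ID to a simplified type description, which is an assumption and not how Govern represents blueprint versions. The name `classify_field` is hypothetical.

```python
def classify_field(name, old_definitions, new_definitions):
    """Return the case number (1 to 4), or None if the field is in neither version."""
    in_old = name in old_definitions
    in_new = name in new_definitions
    if in_old and in_new:
        # Same definition: case 1. Changed definition: case 2.
        return 1 if old_definitions[name] == new_definitions[name] else 2
    if in_old:
        return 3  # removed in the new version
    if in_new:
        return 4  # added in the new version
    return None

# Hypothetical definitions: field ID -> simplified type description.
old_definitions = {"field1": "text", "field2": "text", "field3": "text"}
new_definitions = {"field1": "text", "field2": "list of text", "field4": "text"}

all_names = sorted(set(old_definitions) | set(new_definitions))
cases = {name: classify_field(name, old_definitions, new_definitions) for name in all_names}
print(cases)
# {'field1': 1, 'field2': 2, 'field3': 3, 'field4': 4}
```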
Note
In the examples below, fields refers to the Python dictionary accessed with target_artifact.fields.
Case 1#
The field already exists in the old blueprint version (BPV) and still exists in the new BPV with the same definition.
→ Nothing to do in the migration script, because the copied value is valid.
Case 2#
The field already exists in the old BPV, and the field still exists in the new BPV but with a different definition.
→ The field must be overridden in the target_artifact to match the new definition because the copied value won’t be valid.
Example: changed field2 from a text to a list of text#
Error when applying the migration:
Error executing migration mp.mig1: The migrated artifact generated by the migration is not valid. Field field2 is a list. , caused by: ValidationException: Field field2 is a list.
Possible solution in migration script:
# Wrap the existing value to become the first element of the list
fields['field2'] = [fields['field2']]
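A slightly more defensive variant of the same idea is sketched below on a plain dictionary. The helper name `to_list_value` is hypothetical, and the choice to map a missing value to an empty list is an assumption; check what the new field definition accepts before relying on it.

```python
def to_list_value(value):
    """Convert a single value into the list shape expected by the new definition."""
    if value is None:
        return []          # no value to carry over (assumed acceptable)
    if isinstance(value, list):
        return value       # already a list, leave it untouched
    return [value]         # wrap the single value

fields = {"field2": "some text"}
fields["field2"] = to_list_value(fields.get("field2"))
```

Handling the `None` and already-a-list cases keeps the script from producing `[None]` or a nested list when artifacts are in different states.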
Case 3#
The field exists in the previous BPV, and the field is removed in the new BPV.
→ The field must be removed from target_artifact because the copied value no longer matches any field in the new BPV.
Example: removed field3 in the target BPV#
Error when applying the migration:
Error executing migration mp.mig1: The migrated artifact generated by the migration is not valid. Some fields of Artifact were not found in its blueprint version reference. Field names: field3, caused by: ValidationException: Some fields of Artifact were not found in its blueprint version reference. Field names: field3
Possible solution in migration script:
# Remove the field safely (does nothing if the field doesn't exist)
fields.pop('field3', None)
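When several fields were removed at once, the same `pop` call can be applied in a loop. This is a minimal sketch on a plain dictionary; the field IDs in `removed_field_ids` are hypothetical.

```python
fields = {"field1": "kept", "field3": "obsolete", "field5": "obsolete too"}

# Hypothetical IDs of fields dropped from the target blueprint version.
removed_field_ids = ["field3", "field5", "field6"]

for field_id in removed_field_ids:
    # pop with a default never raises, even for IDs the artifact never had
    fields.pop(field_id, None)
```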
Case 4#
The field does not exist in the previous BPV, and the field is created in the new BPV.
→ The field can be set in the target_artifact, and must be set if it is marked as mandatory.
Example: added a mandatory text field4 in the target BPV#
Error when applying the migration:
Error executing migration mp.mig1: The migrated artifact generated by the migration is not valid. Field ID ‘field4’ is required, caused by: ValidationException: Field ID ‘field4’ is required
Possible solution in migration script:
# Set the new field to a static value
fields['field4'] = 'new value'
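A static value is not the only option. The sketch below, again on a plain dictionary, shows two variants: filling the field only when it has no value yet, and deriving the value from a field that was carried over. The ID `field5` and the derived text are hypothetical.

```python
fields = {"field1": "Project Alpha"}

# Only fill field4 if it has no value yet, so an existing value is never overwritten.
fields.setdefault("field4", "new value")

# Alternatively, derive the new value from a field that was carried over.
fields["field5"] = "Migrated from " + fields.get("field1", "unknown")
```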
To sum up:#
For the migration to succeed, every field must be handled according to whichever of the four cases applies to it.
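Putting the examples from this page together, a field migration could look like the sketch below. It runs on a plain dictionary so it can be tried outside Govern; in a real script, `fields` would be `target_artifact.fields`. The function name `migrate_fields` is hypothetical.

```python
def migrate_fields(fields):
    """Apply cases 2, 3 and 4 in place; case 1 fields need no change."""
    # Case 2: field2 changed from a text to a list of text.
    if "field2" in fields and not isinstance(fields["field2"], list):
        fields["field2"] = [fields["field2"]]
    # Case 3: field3 no longer exists in the target blueprint version.
    fields.pop("field3", None)
    # Case 4: field4 is new and mandatory, so give it a value.
    fields.setdefault("field4", "new value")
    return fields

fields = {"field1": "a", "field2": "b", "field3": "c"}
migrate_fields(fields)
print(fields)
# {'field1': 'a', 'field2': ['b'], 'field4': 'new value'}
```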
Managing workflow steps#
To change the current step:
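# Change the current step to be 'new_step'
# Note: if the artifact has no "status" entry, .get() returns a temporary dict and nothing is changed
target_artifact.json.get("status", {})["stepId"] = "new_step"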
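The behavior of this pattern depends on whether a "status" entry already exists. The sketch below uses plain dictionaries as stand-ins for `target_artifact.json`, which is an assumption about its shape. It also shows `setdefault` as an alternative that creates the entry; whether creating a status is valid for an artifact without a workflow is not covered on this page.

```python
# Plain dictionaries standing in for target_artifact.json (an assumption).
with_status = {"status": {"stepId": "old_step"}}
without_status = {}

# With an existing "status" entry, the step is updated in place.
with_status.get("status", {})["stepId"] = "new_step"

# Without one, the write goes into a throwaway dict and is lost.
without_status.get("status", {})["stepId"] = "new_step"

# setdefault stores the new dict on the parent before writing to it.
created_status = {}
created_status.setdefault("status", {})["stepId"] = "new_step"
```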
TLDR#
Here is a short list of potential manipulations.
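Action | Code |
---|---|
Define a default value for a new field | fields['field4'] = 'new value' |
Remove values from a deleted field | fields.pop('field3', None) |
Move the data from one field to another | |
Change the current workflow step | target_artifact.json.get("status", {})["stepId"] = "new_step" |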
If you want a new field to appear empty in the target blueprint, you don’t need to define anything in the blueprint migration script. There are no values to migrate.
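Moving data from one field to another combines the removal and assignment patterns shown earlier. Here is a minimal sketch on a plain dictionary; the IDs `old_field` and `new_field` are hypothetical, and the value is assumed to already fit the definition of the destination field.

```python
fields = {"old_field": "value to keep", "field1": "untouched"}

# Only move the value when the source field is actually present.
if "old_field" in fields:
    fields["new_field"] = fields.pop("old_field")
```

Guarding on the presence of the source field avoids writing an empty value into the destination field.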