Data Quality#

You can interact with Data Quality through the API, for example to list, compute, create, and delete the rules of a dataset.

Basic operations#

Listing Data Quality rules of a dataset#

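import dataiku

# Inside DSS; from outside DSS, build a dataikuapi.DSSClient(host, api_key) instead
client = dataiku.api_client()
project = client.get_project("SomeProjectId")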
dataset = project.get_dataset("SomeDatasetId")
ruleset = dataset.get_data_quality_rules()
rules = ruleset.list_rules()
# Returns a list of DSSDataQualityRule

for rule in rules:
    # Main information about the rule
    print("Rule id: %s" % rule.id)
    print("name: %s" % rule.name)
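
Since `list_rules()` returns a plain list, picking one rule out of it is a matter of filtering on the attributes shown above. The following is a minimal sketch of such a lookup; `find_rule_by_name` is a hypothetical helper, not part of `dataikuapi`, and it relies only on the `name` attribute used in the listing example.

```python
def find_rule_by_name(rules, name):
    """Return the first rule whose name matches exactly, or None if absent.

    `rules` is the list returned by ruleset.list_rules(). Only the `name`
    attribute is read. Illustrative helper, not part of dataikuapi.
    """
    for rule in rules:
        if rule.name == name:
            return rule
    return None


# Usage, assuming `ruleset` was obtained as in the listing example and that
# a rule with this (hypothetical) name exists:
# rule = find_rule_by_name(ruleset.list_rules(), "Row count check")
# if rule is not None:
#     print("Found rule %s" % rule.id)
```

If several rules share the same name, this sketch returns the first one in list order; matching on `rule.id` avoids that ambiguity.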

Computing a rule#

project = client.get_project("SomeProjectId")
dataset = project.get_dataset("SomeDatasetId")
ruleset = dataset.get_data_quality_rules()
rules = ruleset.list_rules()
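# compute() starts the computation and returns a future
future = rules[0].compute()
# wait_for_result() blocks until the computation finishes and returns its result
result = future.wait_for_result()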
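
The same two calls can be applied to every rule of a dataset rather than only the first one. The sketch below is one possible way to do that; `compute_all_rules` is a hypothetical helper, not part of `dataikuapi`. It assumes that `wait_for_result()` returns the outcome of the computation and stores that value as-is, without assuming anything about its structure.

```python
def compute_all_rules(ruleset):
    """Compute every rule of a ruleset and return {rule id: result}.

    Illustrative helper, not part of dataikuapi. Rules are computed one at
    a time: each compute() call is followed by wait_for_result(), which
    blocks until that rule has finished.
    """
    results = {}
    for rule in ruleset.list_rules():
        future = rule.compute()
        # Assumption: wait_for_result() returns the computation outcome.
        results[rule.id] = future.wait_for_result()
    return results


# Usage, assuming `dataset` was obtained as in the example above:
# results = compute_all_rules(dataset.get_data_quality_rules())
# for rule_id, result in results.items():
#     print(rule_id, result)
```

Computing sequentially keeps the sketch simple; starting all futures first and waiting afterwards would be an alternative, at the cost of putting more load on the instance at once.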

Creating a rule#

project = client.get_project("SomeProjectId")
dataset = project.get_dataset("SomeDatasetId")
ruleset = dataset.get_data_quality_rules()
rule_config = {
    "type": "RecordCountInRangeRule",
    "softMinimum": 10,
    "softMinimumEnabled": True,
    "displayName": "My newly created rule.",
}
new_rule = ruleset.create_rule(rule_config)
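
When several rules of this kind are needed, the configuration dictionary can be produced by a small function instead of being written out each time. The sketch below is a hypothetical helper, not part of `dataikuapi`; it uses only the four keys shown in the example above and does not cover any other option of this rule type. The non-negativity check is an assumption added for illustration, on the grounds that a record count cannot be negative.

```python
def record_count_min_rule(display_name, soft_minimum):
    """Build a RecordCountInRangeRule config with only a soft minimum enabled.

    Illustrative helper, not part of dataikuapi. The keys are the ones used
    in the rule creation example; nothing else is set.
    """
    # Assumption: a negative minimum record count is never meaningful.
    if soft_minimum < 0:
        raise ValueError("soft_minimum must be non-negative")
    return {
        "type": "RecordCountInRangeRule",
        "softMinimum": soft_minimum,
        "softMinimumEnabled": True,
        "displayName": display_name,
    }


# Usage, assuming `ruleset` was obtained as in the example above:
# created = ruleset.create_rule(record_count_min_rule("At least 10 rows", 10))
```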

Deleting a rule#

project = client.get_project("SomeProjectId")
dataset = project.get_dataset("SomeDatasetId")
ruleset = dataset.get_data_quality_rules()
rules = ruleset.list_rules()
rules[0].delete()
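
Deleting `rules[0]` depends on the order in which the rules are listed. A more targeted approach is to select rules by their `id` before deleting them. The following is a minimal sketch under that idea; `delete_rules_by_id` is a hypothetical helper, not part of `dataikuapi`, and it uses only `list_rules()`, `rule.id`, and `rule.delete()` as shown above.

```python
def delete_rules_by_id(ruleset, rule_ids):
    """Delete the rules whose id is in rule_ids and return the ids deleted.

    Illustrative helper, not part of dataikuapi. Ids that match no rule
    are silently ignored.
    """
    wanted = set(rule_ids)
    deleted = []
    for rule in ruleset.list_rules():
        if rule.id in wanted:
            rule.delete()
            deleted.append(rule.id)
    return deleted


# Usage, assuming `ruleset` was obtained as in the example above and
# "SomeRuleId" is a placeholder for a real rule id:
# removed = delete_rules_by_id(ruleset, ["SomeRuleId"])
# print("Deleted %d rule(s)" % len(removed))
```

Returning the list of deleted ids lets the caller detect requested ids that did not match any rule.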

Reference documentation#

dataikuapi.dss.data_quality.DSSDataQualityRuleSet(...)

Base settings class for dataset data quality rules.

dataikuapi.dss.data_quality.DSSDataQualityRule(...)

A rule defined on a dataset.

dataikuapi.dss.data_quality.DSSDataQualityRuleResult(data)

The result of a rule defined on a dataset.