Data Quality

You can list, compute, create, and delete the Data Quality rules of a dataset through the Python API.

Basic operations

Listing Data Quality rules of a dataset

# `client` is a DSSClient handle, for example dataikuapi.DSSClient(host, api_key)
project = client.get_project("SomeProjectId")
dataset = project.get_dataset("SomeDatasetId")
ruleset = dataset.get_data_quality_rules()
# list_rules() returns a list of DSSDataQualityRule
rules = ruleset.list_rules()

for rule in rules:
    # Access the main information of the rule
    print("Rule id: %s" % rule.id)
    print("name: %s" % rule.name)

Computing a rule

project = client.get_project("SomeProjectId")
dataset = project.get_dataset("SomeDatasetId")
ruleset = dataset.get_data_quality_rules()
rules = ruleset.list_rules()
future = rules[0].compute()
future.wait_for_result()
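
When a dataset carries several rules, it can be convenient to compute them all in one pass. The following is a minimal sketch rather than part of the API: compute_all_rules is a hypothetical helper that relies only on list_rules(), compute() and wait_for_result() as used above, and it makes no assumption about the shape of each result.

```python
def compute_all_rules(ruleset, partition=None):
    """Compute every rule of a rule set and return {rule id: result}.

    Hypothetical helper: `ruleset` is expected to be the handle returned
    by dataset.get_data_quality_rules().
    """
    results = {}
    for rule in ruleset.list_rules():
        # Pass the partition only when one was requested, so the default
        # behaviour of compute() is left untouched otherwise.
        if partition is None:
            future = rule.compute()
        else:
            future = rule.compute(partition)
        # Block until this computation finishes, then store its result as-is.
        results[rule.id] = future.wait_for_result()
    return results
```

A call such as compute_all_rules(dataset.get_data_quality_rules()) runs the rules one after the other, since each future is awaited before the next computation starts.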

Creating a rule

project = client.get_project("SomeProjectId")
dataset = project.get_dataset("SomeDatasetId")
ruleset = dataset.get_data_quality_rules()
rule_config = {
    "type": "RecordCountInRangeRule",
    "softMinimum": 10,
    "softMinimumEnabled": True,
    "displayName": "My newly created rule.",
}
new_rule = ruleset.create_rule(rule_config)
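
Running the creation snippet twice may leave two rules with the same display name. The sketch below shows one way to guard against that. create_rule_if_missing is a hypothetical helper, and it assumes that the name attribute of an existing rule matches the "displayName" key of the configuration used to create it.

```python
def create_rule_if_missing(ruleset, rule_config):
    """Create a rule unless one with the same display name already exists.

    Hypothetical helper. Assumption: rule.name reflects the "displayName"
    entry of the rule configuration.
    Returns a (rule, created) tuple.
    """
    wanted_name = rule_config["displayName"]
    for rule in ruleset.list_rules():
        if rule.name == wanted_name:
            # A rule with this name is already defined: reuse it.
            return rule, False
    # No match found: create the rule from the given configuration.
    return ruleset.create_rule(rule_config), True
```

With the rule_config dictionary from the example above, rule, created = create_rule_if_missing(ruleset, rule_config) returns the existing rule and False on a second call, provided the name assumption holds.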

Deleting a rule

project = client.get_project("SomeProjectId")
dataset = project.get_dataset("SomeDatasetId")
ruleset = dataset.get_data_quality_rules()
rules = ruleset.list_rules()
rules[0].delete()
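
Deleting rules[0] depends on the order in which the rules are listed. Selecting the rule by name is usually safer. The following sketch uses a hypothetical helper, delete_rules_named, built only on list_rules(), the name and id attributes, and delete().

```python
def delete_rules_named(ruleset, name):
    """Delete every rule whose name equals `name` and return their ids.

    Hypothetical helper: `ruleset` is expected to be the handle returned
    by dataset.get_data_quality_rules().
    """
    deleted_ids = []
    for rule in ruleset.list_rules():
        if rule.name == name:
            # Record the id before deleting, for reporting purposes.
            deleted_ids.append(rule.id)
            rule.delete()
    return deleted_ids
```

An empty list is returned when no rule carries the given name, so the caller can tell whether anything was removed.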

Reference documentation

Classes

dataikuapi.DSSClient(host[, api_key, ...])

Entry point for the DSS API client.

dataikuapi.dss.data_quality.DSSDataQualityRule(...)

A rule defined on a dataset.

dataikuapi.dss.data_quality.DSSDataQualityRuleSet(...)

Base settings class for dataset data quality rules.

dataikuapi.dss.dataset.DSSDataset(client, ...)

A dataset on the DSS instance.

dataikuapi.dss.future.DSSFuture(client, job_id)

A future represents a long-running task on a DSS instance.

dataikuapi.dss.project.DSSProject(client, ...)

A handle to interact with a project on the DSS instance.

Functions

compute([partition])

Compute the rule on a given partition or the full dataset.

create_rule([config])

Create a data quality rule on the current dataset.

delete()

Delete the rule from the dataset configuration.

get_data_quality_rules()

Get a handle to interact with the data quality rules of the dataset.

get_dataset(dataset_name)

Get a handle to interact with a specific dataset.

get_project(project_key)

Get a handle to interact with a specific project.

list_rules([as_type])

Get the list of rules defined on the dataset.

wait_for_result()

Waits for the completion of the long-running task, and returns its result.