Data Quality

For usage information and examples, see Data Quality.

class dataikuapi.dss.data_quality.DSSDataQualityRuleSet(project_key, dataset_name, client)

Base settings class for dataset data quality rules.

Caution

Do not instantiate this class directly; use dataikuapi.dss.dataset.DSSDataset.get_data_quality_rules() instead.

list_rules(as_type='objects')

Get the list of rules defined on the dataset.

Parameters:

as_type (str) – How to return the rules. Possible values are “dict” and “objects” (defaults to “objects”).

Returns:

The rules defined on the dataset.

Return type:

a list of DSSDataQualityRule if as_type is “objects”, a list of dict if as_type is “dict”
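As a minimal sketch of how list_rules() might be used, the helper below maps each rule id to its name. The helper name rule_names_by_id is hypothetical; it relies only on list_rules() and on the id and name properties documented on this page. The project key and dataset name in the trailing comment are placeholders.

```python
def rule_names_by_id(rule_set):
    """Map each rule id to its name for a DSSDataQualityRuleSet-like object."""
    # as_type="objects" (the default) yields DSSDataQualityRule instances,
    # which expose the `id` and `name` properties.
    return {rule.id: rule.name for rule in rule_set.list_rules(as_type="objects")}


# Possible use (placeholders: host, api_key, "MYPROJECT", "mydataset"):
#   import dataikuapi
#   client = dataikuapi.DSSClient(host, api_key)
#   dataset = client.get_project("MYPROJECT").get_dataset("mydataset")
#   print(rule_names_by_id(dataset.get_data_quality_rules()))
```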

create_rule(config=None)

Create a data quality rule on the current dataset.

Parameters:

config (object) – The config of the rule

Returns:

The created data quality rule

Return type:

DSSDataQualityRule

get_partitions_status(partitions='NP')

Get the last computed status of the specified partition(s).

Parameters:

partitions – A list of partition names, or the name of a single partition, whose last status should be retrieved (or “ALL” for the whole-dataset partition). If the dataset is not partitioned, use “NP” or None.

Returns:

The status of the specified partitions, if they exist.

Return type:

object

compute_rules(partition='NP')

Compute all enabled data quality rules of the current dataset.

Parameters:

partition (str) – If the dataset is partitioned, the name of the partition to compute (or “ALL” to compute on the whole dataset). If the dataset is not partitioned, use “NP” or None.

Returns:

The job computing the data quality rules.

Return type:

dataikuapi.dss.future.DSSFuture
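Because compute_rules() returns a DSSFuture rather than a finished result, a caller usually has to wait for the job. The sketch below does that with DSSFuture.wait_for_result(); the helper name compute_rules_and_wait is hypothetical, and the shape of the returned result is not described on this page, so it is passed through untouched.

```python
def compute_rules_and_wait(rule_set, partition="NP"):
    """Trigger computation of the enabled rules and block until the job ends."""
    future = rule_set.compute_rules(partition=partition)
    # wait_for_result() blocks until the long-running task completes and
    # returns its result (it raises if the task failed).
    return future.wait_for_result()
```

For long computations it may be preferable to keep the future and poll it instead of blocking.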

get_status()

Get the status of the dataset. For a partitioned dataset, this is the worst result among the last computed partitions.

Returns:

The status of the dataset.

Return type:

str
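One way get_status() could be used is as a gate in a script that should stop when data quality is not acceptable. This page does not list the possible status strings, so in the sketch below the caller supplies the accepted values; the helper name require_status and the "OK" value in the usage comment are assumptions.

```python
def require_status(rule_set, accepted_statuses):
    """Return the dataset status, or raise if it is not an accepted value."""
    status = rule_set.get_status()
    if status not in accepted_statuses:
        raise ValueError(
            f"Data quality status {status!r} is not one of {sorted(accepted_statuses)}"
        )
    return status


# Possible use ("OK" is an assumed status value, not confirmed by this page):
#   require_status(dataset.get_data_quality_rules(), {"OK"})
```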

get_status_by_partition(include_all_partitions=False)

Return the status of the dataset, detailed per partition used to compute it, if any. If the dataset is not partitioned, the result contains a single entry.

Parameters:

include_all_partitions (boolean) – Whether to include all partitions that have a data quality status, or only the ones relevant to the current status of the dataset. Default is False.

Returns:

The current status of each of the last built partitions of the dataset

Return type:

dict

get_last_rules_results(partition='NP')

Return the last result of every rule defined on the dataset for a specified partition. If the dataset is not partitioned, the last results of all rules are returned.

Parameters:

partition (str) – If the dataset is partitioned, the name of the partition to get the detailed rules results (or “ALL” to refer to the whole dataset). If the dataset is not partitioned, use “NP” or None.

Returns:

The last result of each rule on the specified partition

Return type:

a list of DSSDataQualityRuleResult
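The list returned by get_last_rules_results() can be summarized without knowing the possible outcome values in advance. The sketch below counts results per outcome; the helper name count_outcomes is hypothetical, and it uses only the outcome property of DSSDataQualityRuleResult.

```python
from collections import Counter


def count_outcomes(rule_set, partition="NP"):
    """Count the last rule results of a dataset, grouped by outcome."""
    results = rule_set.get_last_rules_results(partition=partition)
    # Each result exposes an `outcome` property; Counter tallies each value.
    return Counter(result.outcome for result in results)
```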

get_rules_history(min_timestamp=None, max_timestamp=None, results_per_page=10000, page=0, rule_ids=None)

Get the history of computed rules.

Parameters:
  • min_timestamp (int) – Timestamp marking the beginning of the timeframe (inclusive).

  • max_timestamp (int) – Timestamp marking the end of the timeframe (inclusive).

  • results_per_page (int) – The maximum number of records to return. Defaults to the last 10,000 records.

  • page (int) – The page to return. Defaults to the first page (page=0).

  • rule_ids (list) – A list of rule ids whose history should be retrieved. Defaults to all the rules on the dataset.

Returns:

The detailed execution results of the data quality rules matching the given filters

Return type:

a list of DSSDataQualityRuleResult
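The page and results_per_page parameters suggest that a long history can be read in chunks. The generator below is a sketch of such a loop. It assumes that a page shorter than the requested size marks the end of the history, which this page does not state; the name iter_rules_history and the page_size argument are hypothetical.

```python
def iter_rules_history(rule_set, page_size=1000, **filters):
    """Yield history results page by page until a short page is returned."""
    page = 0
    while True:
        batch = rule_set.get_rules_history(
            results_per_page=page_size, page=page, **filters
        )
        yield from batch
        # Assumption: fewer results than requested means there is no next page.
        if len(batch) < page_size:
            break
        page += 1


# Possible use, forwarding the documented filters as keyword arguments:
#   for result in iter_rules_history(rule_set, rule_ids=["some_rule_id"]):
#       print(result.compute_date, result.name, result.outcome)
```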

class dataikuapi.dss.data_quality.DSSDataQualityRule(rule, dataset_name, project_key, client)

A rule defined on a dataset.

Caution

Do not instantiate this class directly; use DSSDataQualityRuleSet.list_rules() instead.

get_raw()

Get the raw representation of this DSSDataQualityRule

Return type:

dict

property id
property name

compute(partition='NP')

Compute the rule on a given partition or the full dataset.

Parameters:

partition (str) – If the dataset is partitioned, the name of the partition to compute (or “ALL” to compute on the whole dataset). If the dataset is not partitioned, use “NP” or None.

Returns:

The job computing the rule.

Return type:

dataikuapi.dss.future.DSSFuture
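To compute a single rule, it first has to be located in the rule set. The sketch below looks a rule up by its name property and waits for the returned DSSFuture; the helper name compute_rule_by_name is hypothetical, and it picks the first match if several rules share a name.

```python
def compute_rule_by_name(rule_set, rule_name, partition="NP"):
    """Compute the first rule whose name matches, and wait for the job."""
    for rule in rule_set.list_rules():
        if rule.name == rule_name:
            # compute() returns a DSSFuture; wait_for_result() blocks until done.
            return rule.compute(partition=partition).wait_for_result()
    raise KeyError(f"No data quality rule named {rule_name!r}")
```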

save()

Save the settings of a rule.

Returns:

‘Success’

Return type:

str
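A common pattern with settings objects is to edit the raw dict and then save. The sketch below follows that pattern under two assumptions that this page does not confirm: that get_raw() returns the same dict that save() persists, and that the keys passed in changes are valid for the rule type. The helper name update_rule_settings is hypothetical.

```python
def update_rule_settings(rule, changes):
    """Apply `changes` to the rule's raw settings, then save the rule."""
    raw = rule.get_raw()
    # Assumption: mutating the dict returned by get_raw() is what save() persists.
    raw.update(changes)
    return rule.save()
```

Inspecting rule.get_raw() on an existing rule is a reasonable way to find out which keys are available before changing any of them.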

delete()

Delete the rule from the dataset configuration.

get_last_result(partition='NP')

Return the last result of the rule on a specified dataset/partition.

Parameters:

partition (str) – If the dataset is partitioned, the name of the partition to get the detailed rules results (or “ALL” to refer to the whole dataset). If the dataset is not partitioned, use “NP” or None.

Returns:

The last result of the rule on the specified partition

Return type:

DSSDataQualityRuleResult

get_rule_history(min_timestamp=None, max_timestamp=None, results_per_page=10000, page=0)

Get the history of the current rule.

Parameters:
  • min_timestamp (int) – Timestamp marking the beginning of the timeframe (inclusive).

  • max_timestamp (int) – Timestamp marking the end of the timeframe (inclusive).

  • results_per_page (int) – The maximum number of records to return. Defaults to the last 10,000 records.

  • page (int) – The page to return. Defaults to the first page (page=0).

Returns:

The detailed execution results of the data quality rule within the given timeframe

Return type:

a list of DSSDataQualityRuleResult
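To see how a single rule behaved over time, its history can be reduced to pairs of computation date and outcome. In the sketch below the helper name rule_outcome_timeline is hypothetical; it keeps the order returned by get_rule_history(), since this page does not say how results are sorted or in which unit timestamps are expressed.

```python
def rule_outcome_timeline(rule, min_timestamp=None, max_timestamp=None):
    """Return (compute_date, outcome) pairs from the history of one rule."""
    history = rule.get_rule_history(
        min_timestamp=min_timestamp, max_timestamp=max_timestamp
    )
    # The order of the results is kept exactly as returned by the API.
    return [(result.compute_date, result.outcome) for result in history]
```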

class dataikuapi.dss.data_quality.DSSDataQualityRuleResult(data)

The result of a rule defined on a dataset.

get_raw()

Get the raw representation of this DSSDataQualityRuleResult

Return type:

dict

property id
property name
property outcome
property message
property compute_date
property run_origin
property partition
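The properties listed above can be gathered into a flat dict, for example to build a table of results. This is only a sketch: the names RESULT_FIELDS and result_to_row are hypothetical, and the value types of each property are not described on this page.

```python
# The properties documented on DSSDataQualityRuleResult.
RESULT_FIELDS = (
    "id", "name", "outcome", "message", "compute_date", "run_origin", "partition",
)


def result_to_row(result):
    """Convert a rule result into a plain dict keyed by property name."""
    return {field: getattr(result, field) for field in RESULT_FIELDS}


# Possible use:
#   rows = [result_to_row(r) for r in rule_set.get_last_rules_results()]
```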