Data Quality#
For usage information and examples, see Data Quality
- class dataikuapi.dss.data_quality.DSSDataQualityRuleSet(project_key, dataset_name, client)#
Base settings class for dataset data quality rules.
Caution
Do not instantiate this class directly; use dataikuapi.dss.dataset.DSSDataset.get_data_quality_rules() instead.
- list_rules(as_type='objects')#
Get the list of rules defined on the dataset.
- Parameters:
as_type (str) – How to return the rules. Possible values are “dict” and “objects” (defaults to “objects”).
- Returns:
The rules defined on the dataset.
- Return type:
a list of DSSDataQualityRule if as_type is “objects”, a list of dict if as_type is “dict”
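As an illustration of list_rules(), here is a minimal sketch that collects the id and name of every rule. It assumes `rule_set` is the object returned by DSSDataset.get_data_quality_rules(); the helper name `summarize_rules` is hypothetical and not part of the API.

```python
def summarize_rules(rule_set):
    """Return (id, name) pairs for every rule defined on the dataset.

    `rule_set` is assumed to be a DSSDataQualityRuleSet; this helper is
    illustrative and not part of dataikuapi.
    """
    # as_type="objects" yields DSSDataQualityRule instances,
    # which expose the `id` and `name` properties.
    rules = rule_set.list_rules(as_type="objects")
    return [(rule.id, rule.name) for rule in rules]
```

Passing as_type="dict" instead would return plain dicts, which is more convenient when the raw rule settings need to be inspected or serialized.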
- create_rule(config=None)#
Create a data quality rule on the current dataset.
- Parameters:
config (object) – The config of the rule
- Returns:
The created data quality rule
- Return type:
DSSDataQualityRule
- get_partitions_status(partitions='NP')#
Get the last computed status of the specified partition(s).
- Parameters:
partitions – The name of a partition, or a list of partition names, for which to get the last status (or “ALL” to refer to the whole dataset). If the dataset is not partitioned, use “NP” or None.
- Returns:
The status of the specified partitions, if they exist
- Return type:
object
- compute_rules(partition='NP')#
Compute all enabled data quality rules of the current dataset.
- Parameters:
partition (str) – If the dataset is partitioned, the name of the partition to compute (or “ALL” to compute on the whole dataset). If the dataset is not partitioned, use “NP” or None.
- Returns:
Job of the currently computed data quality rules.
- Return type:
dataikuapi.dss.job.DSSJob
- get_status()#
Get the status of the dataset. For a partitioned dataset, this is the worst result among the last computed partitions.
- Returns:
The status of the dataset.
- Return type:
str
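A minimal sketch of gating a workflow on get_status(). The set of status strings is not listed in this reference, so the default value "OK" below is an assumption; pass whichever values your DSS instance actually reports. The helper name `dataset_passes` is hypothetical.

```python
def dataset_passes(rule_set, accepted_statuses=("OK",)):
    """Return (passed, status) for the dataset's data quality status.

    `rule_set` is assumed to be a DSSDataQualityRuleSet.
    ASSUMPTION: "OK" is used as an example of a passing status; the actual
    status values are not documented in this section.
    """
    status = rule_set.get_status()  # documented to return a str
    return status in accepted_statuses, status
```

Returning the raw status alongside the boolean keeps the caller free to log or branch on values that this sketch does not know about.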
- get_status_by_partition(include_all_partitions=False)#
Return the status of the dataset, detailed for each partition used to compute it, if any. If the dataset is not partitioned, the result contains a single entry.
- Parameters:
include_all_partitions (boolean) – Whether to include all partitions that have a data quality status, or only those relevant to the current status of the dataset. Defaults to False.
- Returns:
The current status of each of the last built partitions of the dataset
- Return type:
dict
- get_last_rules_results(partition='NP')#
Return the last result of all the rules defined on the dataset for a specified partition. If the dataset is not partitioned, this returns the last results of all rules.
- Parameters:
partition (str) – If the dataset is partitioned, the name of the partition to get the detailed rules results (or “ALL” to refer to the whole dataset). If the dataset is not partitioned, use “NP” or None.
- Returns:
The last result of each rule on the specified partition
- Return type:
a list of DSSDataQualityRuleResult
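To show how the results of get_last_rules_results() can be consumed, the following sketch tallies rules by outcome. It assumes `rule_set` is a DSSDataQualityRuleSet; the helper name `count_outcomes` is hypothetical, and no particular set of outcome values is assumed.

```python
from collections import Counter


def count_outcomes(rule_set, partition="NP"):
    """Count the last rule results on a partition, grouped by outcome.

    `rule_set` is assumed to be a DSSDataQualityRuleSet; each returned
    DSSDataQualityRuleResult exposes an `outcome` property.
    """
    results = rule_set.get_last_rules_results(partition=partition)
    return Counter(result.outcome for result in results)
```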
- get_rules_history(min_timestamp=None, max_timestamp=None, results_per_page=10000, page=0, rule_ids=None)#
Get the history of computed rules.
- Parameters:
min_timestamp (int) – Timestamp marking the beginning of the timeframe (inclusive).
max_timestamp (int) – Timestamp marking the end of the timeframe (inclusive).
results_per_page (int) – The maximum number of records to return. By default, the last 10,000 records are returned.
page (int) – The page to return. Defaults to the first page (page=0).
rule_ids (list) – A list of rule IDs whose history to retrieve. Defaults to all the rules on the dataset.
- Returns:
The detailed executions of the data quality rules matching the given filters
- Return type:
a list of DSSDataQualityRuleResult
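The `results_per_page` and `page` parameters suggest a simple pagination loop. The sketch below is one way to walk the full history; it assumes that a page shorter than the requested size marks the last page, which this reference does not state explicitly. The helper name `iter_rules_history` and its `page_size` parameter are hypothetical.

```python
def iter_rules_history(rule_set, min_timestamp=None, max_timestamp=None,
                       rule_ids=None, page_size=1000):
    """Yield every history record, fetching one page at a time.

    `rule_set` is assumed to be a DSSDataQualityRuleSet.
    ASSUMPTION: a page with fewer than `page_size` records is the last one.
    """
    page = 0
    while True:
        batch = rule_set.get_rules_history(
            min_timestamp=min_timestamp,
            max_timestamp=max_timestamp,
            results_per_page=page_size,
            page=page,
            rule_ids=rule_ids,
        )
        yield from batch
        if len(batch) < page_size:
            break  # short (or empty) page: nothing left to fetch
        page += 1
```

Using a generator keeps memory use bounded when the history is long, since only one page is held at a time.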
- class dataikuapi.dss.data_quality.DSSDataQualityRule(rule, dataset_name, project_key, client)#
A rule defined on a dataset.
Caution
Do not instantiate this class directly; use DSSDataQualityRuleSet.list_rules() instead.
- get_raw()#
Get the raw representation of this DSSDataQualityRule.
- Return type:
dict
- property id#
- property name#
- compute(partition='NP')#
Compute the rule on a given partition or the full dataset.
- Parameters:
partition (str) – If the dataset is partitioned, the name of the partition to compute (or “ALL” to compute on the whole dataset). If the dataset is not partitioned, use “NP” or None.
- Returns:
A job of the computation of the rule.
- Return type:
dataikuapi.dss.job.DSSJob
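A minimal sketch of computing a single rule selected by name, combining DSSDataQualityRuleSet.list_rules() with the `name` property and compute(). The helper name `compute_rule_by_name` is hypothetical, and `rule_set` is assumed to be a DSSDataQualityRuleSet.

```python
def compute_rule_by_name(rule_set, rule_name, partition="NP"):
    """Compute the first rule whose name matches `rule_name`.

    Returns whatever DSSDataQualityRule.compute() returns (documented as a
    job for the computation). Raises LookupError if no rule matches.
    """
    for rule in rule_set.list_rules():
        if rule.name == rule_name:
            return rule.compute(partition=partition)
    raise LookupError("No data quality rule named %r" % rule_name)
```

Rule names are not guaranteed to be unique in this sketch; when several rules share a name, matching on the `id` property is the safer choice.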
- save()#
Save the settings of a rule.
- Returns:
‘Success’
- Return type:
str
- delete()#
Delete the rule from the dataset configuration.
- get_last_result(partition='NP')#
Return the last result of the rule on a specified dataset/partition.
- Parameters:
partition (str) – If the dataset is partitioned, the name of the partition to get the detailed rule result (or “ALL” to refer to the whole dataset). If the dataset is not partitioned, use “NP” or None.
- Returns:
The last result of the rule on the specified partition
- Return type:
DSSDataQualityRuleResult
- get_rule_history(min_timestamp=None, max_timestamp=None, results_per_page=10000, page=0)#
Get the history of the current rule.
- Parameters:
min_timestamp (int) – Timestamp marking the beginning of the timeframe (inclusive).
max_timestamp (int) – Timestamp marking the end of the timeframe (inclusive).
results_per_page (int) – The maximum number of records to return. By default, the last 10,000 records are returned.
page (int) – The page to return. Defaults to the first page (page=0).
- Returns:
The detailed executions of the data quality rule within the given timeframe
- Return type:
a list of DSSDataQualityRuleResult
- class dataikuapi.dss.data_quality.DSSDataQualityRuleResult(data)#
The result of a rule defined on a dataset.
Caution
Do not instantiate this class directly; use one of:
DSSDataQualityRuleSet.get_last_rules_results()
or DSSDataQualityRuleSet.get_rules_history()
or DSSDataQualityRule.get_last_result()
or DSSDataQualityRule.get_rule_history()
- get_raw()#
Get the raw representation of this DSSDataQualityRuleResult.
- Return type:
dict
- property id#
- property name#
- property outcome#
- property message#
- property compute_date#
- property run_origin#
- property partition#
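The properties listed above can be flattened into a plain dict for reporting or export. This is a minimal sketch; the helper name `result_to_row` is hypothetical, and the types of the property values (for example `compute_date`) are not specified in this reference, so they are copied as-is.

```python
def result_to_row(result):
    """Flatten a DSSDataQualityRuleResult into a plain dict.

    Only the properties documented on the class are read; values are
    passed through unchanged because their types are not documented here.
    """
    return {
        "id": result.id,
        "name": result.name,
        "outcome": result.outcome,
        "message": result.message,
        "compute_date": result.compute_date,
        "run_origin": result.run_origin,
        "partition": result.partition,
    }
```

A list of such rows, built from get_last_rules_results() or get_rules_history(), can then be handed to any tabular tool.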