Metrics and checks#
Note
Two main classes handle metrics and checks in Dataiku’s Python APIs:
- dataiku.core.metrics.ComputedMetrics in the dataiku package. It was initially designed for usage within DSS.
- dataikuapi.dss.metrics.ComputedMetrics in the dataikuapi package. It was initially designed for usage outside of DSS.
Both classes have fairly similar capabilities.
For usage information and examples, see Metrics and checks
dataiku package API#
- class dataiku.core.metrics.ComputedMetrics(raw)#
Handle to the metrics of a DSS object and their last computed value
Important
Do not create this class directly, instead use dataiku.Dataset.get_last_metric_values(), dataiku.Folder.get_last_metric_values(), dataiku.ModelEvaluationStore.get_last_metric_values() or dataiku.Project.get_last_metric_values()
- get_metric_by_id(metric_id)#
Retrieve the info for a given metric
Usage example
import dataiku

dataset = dataiku.Dataset("my_dataset")
metrics = dataset.get_last_metric_values()
count_files_metric = metrics.get_metric_by_id("basic:COUNT_FILES")
# one entry per partition for which a value was computed
for value in count_files_metric['lastValues']:
    print("partition=%s -> count of files=%s" % (value['partition'], value['value']))
- Parameters:
metric_id (string) – identifier of the metric
- Returns:
information about the metric and its values. Top-level fields are
metric : definition of the metric
meta : display metadata, as a dict of name and fullName
computingProbe : name of the probe that computes the metric
displayedAsMetric : whether the metric is among the metrics displayed on the “Status” tab of the object
notExistingViews : list of the possible types of metrics datasets not yet created on the object
partitionsWithValue : list of the partition identifiers for which some value of the metric exists
lastValues : list of the last computed value, per partition. Each list element has
partition : the partition identifier, as a string.
value : the metric value, as a string
dataType : expected type of value (one of BIGINT, DOUBLE, STRING, BOOLEAN)
computed : timestamp of computation, in milliseconds since epoch
- Return type:
dict
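The structure described above can be walked with plain dictionary access. The sketch below uses a hand-written sample dict in place of a real get_metric_by_id() result; the sample values and the helper name latest_value_per_partition are illustrative assumptions, not part of the API.

```python
def latest_value_per_partition(metric_info):
    """Map each partition identifier to its last computed value (still a string)."""
    return {
        point["partition"]: point["value"]
        for point in metric_info.get("lastValues", [])
    }


# Hand-written sample shaped like the documented fields (all values are made up).
sample_metric_info = {
    "metric": {"id": "basic:COUNT_FILES", "dataType": "BIGINT"},
    "partitionsWithValue": ["2024-01-01", "2024-01-02"],
    "lastValues": [
        {"partition": "2024-01-01", "value": "3", "dataType": "BIGINT", "computed": 1704153600000},
        {"partition": "2024-01-02", "value": "5", "dataType": "BIGINT", "computed": 1704240000000},
    ],
}

values = latest_value_per_partition(sample_metric_info)
print(values)
```

Note that the values stay strings at this level; casting them is a separate step.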
- get_global_data(metric_id)#
Get the global value point of a given metric, or throws.
For a partitioned dataset, the global value is the value of the metric computed on the whole dataset (coded as partition ‘ALL’).
- Parameters:
metric_id (string) – identifier of the metric
- Returns:
the metric data, as a dict. Fields are
partition : the partition identifier, as a string.
value : the metric value, as a string
dataType : expected type of value (one of BIGINT, DOUBLE, STRING, BOOLEAN)
computed : timestamp of computation, in milliseconds since epoch
- Return type:
dict
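Since computed is expressed in milliseconds since epoch, it usually needs converting before display or comparison. A minimal sketch, assuming a data point shaped like the dict described above; the helper name computed_as_datetime and the sample point are hypothetical.

```python
from datetime import datetime, timezone


def computed_as_datetime(point):
    """Convert the 'computed' field (milliseconds since epoch) to an aware UTC datetime."""
    return datetime.fromtimestamp(point["computed"] / 1000.0, tz=timezone.utc)


# Made-up data point with the documented fields.
sample_point = {
    "partition": "ALL",
    "value": "1200",
    "dataType": "BIGINT",
    "computed": 1704067200000,
}

print(computed_as_datetime(sample_point).isoformat())
```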
- get_global_value(metric_id)#
Get the global value of a given metric, or throws.
For a partitioned dataset, the global value is the value of the metric computed on the whole dataset (coded as partition ‘ALL’).
Usage example
import dataiku

dataset = dataiku.Dataset("my_dataset")
metrics = dataset.get_last_metric_values()
print("record count = %s" % metrics.get_global_value('records:COUNT_RECORDS'))
- Parameters:
metric_id (string) – identifier of the metric
- Returns:
the global value of the metric
- Return type:
str, int or float
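Because the lookup throws when no global value exists, callers may want a fallback. The sketch below wraps the call in a try/except; the exception type is not stated here, so catching Exception broadly is an assumption, and FakeComputedMetrics is a hypothetical stand-in used only to make the example runnable outside DSS.

```python
def global_value_or_default(metrics, metric_id, default=None):
    """Return the global value of a metric, or `default` if the lookup raises."""
    try:
        return metrics.get_global_value(metric_id)
    except Exception:  # exception type is not documented here, so catch broadly
        return default


class FakeComputedMetrics:
    """Hypothetical stand-in for ComputedMetrics, used only to run this sketch."""

    def __init__(self, values):
        self._values = values

    def get_global_value(self, metric_id):
        # Raises KeyError when the metric has no value, mimicking "or throws".
        return self._values[metric_id]


fake = FakeComputedMetrics({"records:COUNT_RECORDS": 1200})
present = global_value_or_default(fake, "records:COUNT_RECORDS")
missing = global_value_or_default(fake, "basic:COUNT_FILES", default=0)
print(present, missing)
```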
- get_partition_data(metric_id, partition)#
Get the value point of a given metric for a given partition, or throws.
- Parameters:
metric_id (string) – identifier of the metric
partition (string) – partition identifier
- Returns:
the metric data, as a dict. Fields are
partition : the partition identifier, as a string.
value : the metric value, as a string
dataType : expected type of value (one of BIGINT, DOUBLE, STRING, BOOLEAN)
computed : timestamp of computation, in milliseconds since epoch
- Return type:
dict
- get_partition_value(metric_id, partition)#
Get the value of a given metric for a given partition, or throws.
- Parameters:
metric_id (string) – identifier of the metric
partition (string) – partition identifier
- Returns:
the value of the metric for the partition
- Return type:
str, int or float
- get_first_partition_data(metric_id)#
Get a value point of a given metric, or throws. The first value encountered is returned.
- Parameters:
metric_id (string) – identifier of the metric
- Returns:
the metric data, as a dict. Fields are
partition : the partition identifier, as a string.
value : the metric value, as a string
dataType : expected type of value (one of BIGINT, DOUBLE, STRING, BOOLEAN)
computed : timestamp of computation, in milliseconds since epoch
- Return type:
dict
- get_partition_data_for_version(metric_id, version_id)#
Get the metric of the first partition matching version_id, for saved models
- Parameters:
metric_id (string) – identifier of the metric
version_id (string) – identifier of the version
- Returns:
the metric data, as a dict. Fields are
partition : the partition identifier, as a string.
value : the metric value, as a string
dataType : expected type of value (one of BIGINT, DOUBLE, STRING, BOOLEAN)
computed : timestamp of computation, in milliseconds since epoch
- Return type:
dict
- get_all_ids()#
Get the identifiers of all metrics defined in this object
- Returns:
list of metric identifiers
- Return type:
list[string]
- static get_value_from_data(data)#
Retrieves the value from a metric point, cast to the appropriate type (str, int or float).
For other types, the value is not cast and left as a string.
- Parameters:
data (dict) – a value point for a metric, retrieved with get_global_data() or get_partition_data()
- Returns:
the value, cast to the appropriate Python type
- Return type:
str, int or float
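The casting rule described above can be approximated as follows. This is a sketch of the documented behavior, not the library's actual implementation; the helper name cast_metric_value, the mapping of BIGINT to int and DOUBLE to float, and the sample points are assumptions.

```python
def cast_metric_value(data):
    """Cast the string value of a metric point according to its dataType."""
    data_type = data.get("dataType")
    raw = data["value"]
    if data_type == "BIGINT":
        return int(raw)
    if data_type == "DOUBLE":
        return float(raw)
    # STRING, BOOLEAN and anything else are left as strings.
    return raw


count = cast_metric_value({"value": "1200", "dataType": "BIGINT"})
ratio = cast_metric_value({"value": "0.25", "dataType": "DOUBLE"})
flag = cast_metric_value({"value": "true", "dataType": "BOOLEAN"})
print(count, ratio, flag)
```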
- class dataiku.core.metrics.MetricDataPoint(raw)#
A value of a metric, on a partition
Note
Instances of this class are only created by Python checks
- get_metric()#
Get the definition of the metric
- Returns:
a dict defining the metric. Fields are
id : the metric full identifier
type : type of the probe computing the metric
metricType : type of the metric for the probe
dataType : type of the value computed (one of BIGINT, DOUBLE, STRING, BOOLEAN)
column : (optional) name of the column the metric is computed for
- Return type:
dict
- get_metric_id()#
Get the metric’s full identifier.
- Return type:
string
- get_partition()#
Get the identifier of the partition on which the value was computed.
- Return type:
string
- get_value()#
Get the raw value of the metric.
Usage example:
# the code for a Python check that errors if there are more
# than 10k records in the dataset.
# the parameters of process() are filled by DSS:
# - last_values is a dict of metric name to MetricDataPoint
# - dataset is a handle on the dataset
# - partition_id is the partition for which the check is run
def process(last_values, dataset, partition_id):
    # get the MetricDataPoint
    last_known_record_count = last_values.get('records:COUNT_RECORDS')
    if last_known_record_count is None:
        return 'EMPTY', 'Record count not yet computed'
    record_count = int(last_known_record_count.get_value())
    if record_count < 10000:
        return 'OK'
    else:
        return 'ERROR', 'Too many records'
- Return type:
string
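A variant of the check above with an intermediate warning level might look like the sketch below. It assumes a Python check may return any of the outcomes OK, WARNING, ERROR or EMPTY; the two thresholds are arbitrary, and StubMetricDataPoint is a hypothetical stand-in that exposes only get_value(), so the function can be exercised outside DSS.

```python
WARN_THRESHOLD = 8000    # illustrative value
ERROR_THRESHOLD = 10000  # illustrative value


class StubMetricDataPoint:
    """Hypothetical stand-in for MetricDataPoint, exposing only get_value()."""

    def __init__(self, value):
        self._value = value

    def get_value(self):
        return self._value  # raw value, as a string


def process(last_values, dataset, partition_id):
    point = last_values.get("records:COUNT_RECORDS")
    if point is None:
        return "EMPTY", "Record count not yet computed"
    record_count = int(point.get_value())
    if record_count > ERROR_THRESHOLD:
        return "ERROR", "Too many records: %d" % record_count
    if record_count > WARN_THRESHOLD:
        return "WARNING", "Record count is getting high: %d" % record_count
    return "OK"


# Exercise the check locally with stubbed metric values.
result = process({"records:COUNT_RECORDS": StubMetricDataPoint("9000")}, None, "ALL")
print(result)
```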
- get_compute_time()#
Get the time at which the value was computed.
- Return type:
datetime.datetime
- get_type()#
Get the type of the value.
- Returns:
a type, one of BIGINT, DOUBLE, BOOLEAN, STRING
- Return type:
string
- class dataiku.core.metrics.ComputedChecks(raw)#
Handle to the checks of a DSS object and their last computed value
Important
Do not create this class directly, instead use
dataiku.Project.get_last_check_values()
- get_check_by_name(check_name)#
Retrieve the info for a given check
- Parameters:
check_name (string) – identifier of the check
- get_global_data(check_name)#
Get the global value point of a given check, or throws.
For a partitioned dataset, the global value is the value of the check computed on the whole dataset (coded as partition ‘ALL’).
- Parameters:
check_name (string) – identifier of the check
- Returns:
the check data, as a dict. Fields are
partition : the partition identifier, as a string.
outcome : one of OK, ERROR, WARNING, EMPTY
message : (optional) message of the check
computed : timestamp of computation, in milliseconds since epoch
- Return type:
dict
- get_global_value(check_name)#
Get the global value of a given check, or throws.
For a partitioned dataset, the global value is the value of the check computed on the whole dataset (coded as partition ‘ALL’).
- Parameters:
check_name (string) – identifier of the check
- Returns:
outcome of the check (OK, ERROR, WARNING or EMPTY)
- Return type:
string
- get_partition_data(check_name, partition)#
Get the value point of a given check for a given partition, or throws.
- Parameters:
check_name (string) – identifier of the check
partition (string) – partition identifier
- Returns:
the check data, as a dict. Fields are
partition : the partition identifier, as a string.
outcome : one of OK, ERROR, WARNING, EMPTY
message : (optional) message of the check
computed : timestamp of computation, in milliseconds since epoch
- Return type:
dict
- get_partition_value(check_name, partition)#
Get the value of a given check for a given partition, or throws.
- Parameters:
check_name (string) – identifier of the check
partition (string) – partition identifier
- Returns:
outcome of the check for this partition (OK, ERROR, WARNING or EMPTY)
- Return type:
string
- get_first_partition_data(check_name)#
Get a value point of a given check, or throws. The first value encountered is returned.
- Parameters:
check_name (string) – identifier of the check
- Returns:
the check data, as a dict. Fields are
partition : the partition identifier, as a string.
outcome : one of OK, ERROR, WARNING, EMPTY
message : (optional) message of the check
computed : timestamp of computation, in milliseconds since epoch
- Return type:
dict
- get_partition_data_for_version(check_name, version_id)#
Get the check of the first partition matching version_id, for saved models
- Parameters:
check_name (string) – identifier of the check
version_id (string) – identifier of the version
- Returns:
the check data, as a dict. Fields are
partition : the partition identifier, as a string.
outcome : one of OK, ERROR, WARNING, EMPTY
message : (optional) message of the check
computed : timestamp of computation, in milliseconds since epoch
- Return type:
dict
- get_all_names()#
Get the identifiers of all checks defined in this object
- Returns:
list of check identifiers
- Return type:
list[string]
- static get_outcome_from_data(data)#
Retrieves the value from a check data point
- Parameters:
data (dict) – a value point for a check, retrieved with get_global_data() or get_partition_data()
- Returns:
a check result (OK, ERROR, WARNING or EMPTY)
- Return type:
string
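When several check data points are collected, for example one per partition, it can be handy to reduce them to a single outcome. The sketch below does this on plain dicts shaped like the check data described above; the severity ordering and the helper name worst_outcome are assumptions made for illustration, not something DSS defines here.

```python
# Assumed ordering, from least to most severe.
SEVERITY = {"OK": 0, "EMPTY": 1, "WARNING": 2, "ERROR": 3}


def worst_outcome(check_points):
    """Return the most severe outcome among check data points, or EMPTY if none."""
    outcomes = [point["outcome"] for point in check_points]
    if not outcomes:
        return "EMPTY"
    return max(outcomes, key=lambda outcome: SEVERITY[outcome])


# Made-up check data points with the documented fields.
sample_points = [
    {"partition": "2024-01-01", "outcome": "OK", "computed": 1704153600000},
    {"partition": "2024-01-02", "outcome": "WARNING", "message": "high count", "computed": 1704240000000},
]

overall = worst_outcome(sample_points)
print(overall)
```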
- class dataiku.core.metrics.CheckDataPoint(raw)#
A value of a check, on a partition
Note
Instances of this class are only created by Python checks
- get_check()#
Returns the definition of the check
- Returns:
a dict of the check definition. Notable fields are
type : the type of check
meta : the display metadata, as a dict of name and label
- Return type:
dict
- get_partition()#
Returns the partition on which the value was computed
- Returns:
a partition identifier
- Return type:
string
- get_value()#
Returns the value of the check, as a string
- Returns:
one of OK, ERROR, WARNING, EMPTY (means “no data, check can’t be computed”)
- Return type:
string
- get_compute_time()#
Returns the time at which the value was computed
- Return type:
datetime.datetime
dataikuapi package API#
- class dataikuapi.dss.metrics.ComputedMetrics(raw)#
Handle to the metrics of a DSS object and their last computed value
Important
Do not create this class directly, instead use DSSDataset.get_last_metric_values(), DSSSavedModel.get_metric_values() or DSSManagedFolder.get_last_metric_values().
- get_metric_by_id(id)#
Retrieve the info for a given metric
Usage example
# project is a DSSProject handle, for example obtained with DSSClient.get_project()
dataset = project.get_dataset("my_dataset")
metrics = dataset.get_last_metric_values()
count_files_metric = metrics.get_metric_by_id("basic:COUNT_FILES")
for value in count_files_metric['lastValues']:
    print("partition=%s -> count of files=%s" % (value['partition'], value['value']))
- Parameters:
metric_id (string) – identifier of the metric
- Returns:
information about the metric and its values. Since the last value of the metric depends on the partition considered, the last values of the metric are given in a sub-list of the dict.
- Return type:
dict
- get_global_data(metric_id)#
Get the global value point of a given metric, or throws.
For a partitioned dataset, the global value is the value of the metric computed on the whole dataset (coded as partition ‘ALL’).
- Parameters:
metric_id (string) – identifier of the metric
- Returns:
the metric data, as a dict. The value itself is a value string field.
- Return type:
dict
- get_global_value(metric_id)#
Get the global value of a given metric, or throws.
For a partitioned dataset, the global value is the value of the metric computed on the whole dataset (coded as partition ‘ALL’).
Usage example
# project is a DSSProject handle, for example obtained with DSSClient.get_project()
dataset = project.get_dataset("my_dataset")
metrics = dataset.get_last_metric_values()
print("record count = %s" % metrics.get_global_value('records:COUNT_RECORDS'))
- Parameters:
metric_id (string) – identifier of the metric
- Returns:
the global value of the metric
- Return type:
str, int or float
- get_partition_data(metric_id, partition)#
Get the value point of a given metric for a given partition, or throws.
- Parameters:
metric_id (string) – identifier of the metric
partition (string) – partition identifier
- Returns:
the metric data, as a dict. The value itself is a value string field.
- Return type:
dict
- get_partition_value(metric_id, partition)#
Get the value of a given metric for a given partition, or throws.
- Parameters:
metric_id (string) – identifier of the metric
partition (string) – partition identifier
- Returns:
the value of the metric for the partition
- Return type:
str, int or float
- get_first_partition_data(metric_id)#
Get a value point of a given metric, or throws. The first value encountered is returned.
- Parameters:
metric_id (string) – identifier of the metric
- Returns:
the metric data, as a dict. The value itself is a value string field.
- Return type:
dict
- get_all_ids()#
Get the identifiers of all metrics defined in this object
- Returns:
list of metric identifiers
- Return type:
list[string]
- static get_value_from_data(data)#
Retrieves the value from a metric point, cast to the appropriate type (str, int or float).
For other types, the value is not cast and left as a string.
- Parameters:
data (dict) – a value point for a metric, retrieved with get_global_data(), get_partition_data() or get_first_partition_data()
- Returns:
the value, cast to the appropriate Python type
- Return type:
str, int or float
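The methods above can be combined to take a snapshot of every metric's global value. The sketch below relies only on get_all_ids(), get_global_data() and get_value_from_data() as documented; catching Exception for metrics without a global value is an assumption, and FakeComputedMetrics is a hypothetical stand-in so the example runs without a DSS instance.

```python
def snapshot_global_values(metrics):
    """Collect the cast global value of every metric that has one."""
    snapshot = {}
    for metric_id in metrics.get_all_ids():
        try:
            data = metrics.get_global_data(metric_id)
        except Exception:  # no global value for this metric; skip it
            continue
        snapshot[metric_id] = metrics.get_value_from_data(data)
    return snapshot


class FakeComputedMetrics:
    """Hypothetical stand-in for ComputedMetrics, used only to run this sketch."""

    def __init__(self, points):
        self._points = points  # metric id -> data point dict, or None

    def get_all_ids(self):
        return list(self._points)

    def get_global_data(self, metric_id):
        point = self._points[metric_id]
        if point is None:
            raise ValueError("no global value for %s" % metric_id)
        return point

    @staticmethod
    def get_value_from_data(data):
        # Simplified cast, standing in for the documented static method.
        casts = {"BIGINT": int, "DOUBLE": float}
        return casts.get(data["dataType"], str)(data["value"])


fake = FakeComputedMetrics({
    "records:COUNT_RECORDS": {"partition": "ALL", "value": "1200", "dataType": "BIGINT"},
    "basic:COUNT_FILES": None,
})
snapshot = snapshot_global_values(fake)
print(snapshot)
```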