Metrics and checks#

Note

There are two main parts related to handling of metrics and checks in Dataiku’s Python APIs:

dataiku.core.metrics.ComputedMetrics in the dataiku package. It was initially designed for usage within DSS.
dataikuapi.dss.metrics.ComputedMetrics in the dataikuapi package. It was initially designed for usage outside of DSS.

Both classes have fairly similar capabilities.

For usage information and examples, see Metrics and checks

dataiku package API#

class dataiku.core.metrics.ComputedMetrics(raw)#

Handle to the metrics of a DSS object and their last computed value

Important

Do not create this class directly, instead use dataiku.Dataset.get_last_metric_values(), dataiku.Folder.get_last_metric_values(), dataiku.ModelEvaluationStore.get_last_metric_values() or dataiku.Project.get_last_metric_values()

get_metric_by_id(metric_id)#

Retrieve the info for a given metric

Usage example

import dataiku

dataset = dataiku.Dataset("my_dataset")
metrics = dataset.get_last_metric_values()
count_files_metric = metrics.get_metric_by_id("basic:COUNT_FILES")
# one entry per partition for which the metric has been computed
for value in count_files_metric['lastValues']:
    print("partition=%s -> count of files=%s" % (value['partition'], value['value']))

Parameters:

metric_id (string) – identifier of the metric

Returns:

information about the metric and its values. Top-level fields are

metric : definition of the metric
meta : display metadata, as a dict of name and fullName
computingProbe : name of the probe that computes the metric
displayedAsMetric : whether the metric is among the metrics displayed on the “Status” tab of the object
notExistingViews : list of the possible types of metrics datasets not yet created on the object
partitionsWithValue : list of the partition identifiers for which some value of the metric exists
lastValues : list of the last computed value, per partition. Each list element has
- partition : the partition identifier, as a string.
- value : the metric value, as a string
- dataType : expected type of value (one of BIGINT, DOUBLE, STRING, BOOLEAN)
- computed : timestamp of computation, in milliseconds since epoch

Return type:

dict
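The structure described above can be explored with plain dictionary access. The following is a minimal sketch that uses a hand-written stand-in for the returned dict: the sample values and the `values_by_partition` helper are illustrative assumptions, not part of the API. It builds a mapping from partition identifier to raw value out of `lastValues`.

```python
# Stand-in for the dict returned by get_metric_by_id(); all values are made up.
metric_info = {
    "metric": {"id": "basic:COUNT_FILES", "dataType": "BIGINT"},
    "partitionsWithValue": ["2024-01-01", "2024-01-02"],
    "lastValues": [
        {"partition": "2024-01-01", "value": "3", "dataType": "BIGINT", "computed": 1704153600000},
        {"partition": "2024-01-02", "value": "5", "dataType": "BIGINT", "computed": 1704240000000},
    ],
}


def values_by_partition(info):
    """Map each partition identifier to its raw (string) metric value."""
    return {point["partition"]: point["value"] for point in info["lastValues"]}


by_partition = values_by_partition(metric_info)
print(by_partition)
```

Note that the values stay strings here, as documented for the `value` field.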

get_global_data(metric_id)#

Get the global value point of a given metric, or raise an exception if none is found.

For a partitioned dataset, the global value is the value of the metric computed on the whole dataset (coded as partition ‘ALL’).

Parameters:

metric_id (string) – identifier of the metric

Returns:

the metric data, as a dict. Fields are

partition : the partition identifier, as a string.
value : the metric value, as a string
dataType : expected type of value (one of BIGINT, DOUBLE, STRING, BOOLEAN)
computed : timestamp of computation, in milliseconds since epoch

Return type:

dict
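Since `computed` is expressed in milliseconds since epoch, it usually needs a conversion before it can be displayed or compared. The sketch below shows one way to do this with the standard library; the `computed_at` helper and the sample value point are hypothetical and only mirror the fields listed above.

```python
from datetime import datetime, timezone


def computed_at(point):
    """Convert the 'computed' field (milliseconds since epoch) to an aware UTC datetime."""
    return datetime.fromtimestamp(point["computed"] / 1000.0, tz=timezone.utc)


# Stand-in for a value point; all values are made up.
point = {"partition": "ALL", "value": "1200", "dataType": "BIGINT", "computed": 1704153600000}
print(computed_at(point).isoformat())
```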

get_global_value(metric_id)#

Get the global value of a given metric, or raise an exception if none is found.

For a partitioned dataset, the global value is the value of the metric computed on the whole dataset (coded as partition ‘ALL’).

Usage example

import dataiku

dataset = dataiku.Dataset("my_dataset")
metrics = dataset.get_last_metric_values()
print("record count = %s" % metrics.get_global_value('records:COUNT_RECORDS'))

Parameters:

metric_id (string) – identifier of the metric

Returns:

the global value of the metric

Return type:

str, int or float

get_partition_data(metric_id, partition)#

Get the value point of a given metric for a given partition, or raise an exception if none is found.

Parameters:

metric_id (string) – identifier of the metric
partition (string) – partition identifier

Returns:

the metric data, as a dict. Fields are

partition : the partition identifier, as a string.
value : the metric value, as a string
dataType : expected type of value (one of BIGINT, DOUBLE, STRING, BOOLEAN)
computed : timestamp of computation, in milliseconds since epoch

Return type:

dict

get_partition_value(metric_id, partition)#

Get the value of a given metric for a given partition, or raise an exception if none is found.

Parameters:

metric_id (string) – identifier of the metric
partition (string) – partition identifier

Returns:

the value of the metric for the partition

Return type:

str, int or float
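Because the lookup raises when no value exists for the partition, callers that expect gaps may want a fallback. Below is a minimal sketch of such a wrapper. It assumes the failure surfaces as a generic `Exception` (the exact exception type is not stated here), and both `partition_value_or_default` and `FakeMetrics` are hypothetical names introduced for illustration.

```python
def partition_value_or_default(metrics, metric_id, partition, default=None):
    """Return the metric value for a partition, or `default` if the lookup raises."""
    try:
        return metrics.get_partition_value(metric_id, partition)
    except Exception:
        # Assumption: a missing value is reported through a generic exception.
        return default


class FakeMetrics:
    """Hypothetical stand-in for ComputedMetrics, used only to make the sketch runnable."""

    def __init__(self, values):
        self._values = values

    def get_partition_value(self, metric_id, partition):
        try:
            return self._values[(metric_id, partition)]
        except KeyError:
            raise Exception("No data found for partition %s" % partition)


metrics = FakeMetrics({("records:COUNT_RECORDS", "2024-01-01"): 42})
found = partition_value_or_default(metrics, "records:COUNT_RECORDS", "2024-01-01")
missing = partition_value_or_default(metrics, "records:COUNT_RECORDS", "2024-01-02", default=0)
print(found, missing)
```

In DSS, `metrics` would be the object returned by `get_last_metric_values()` rather than the stand-in.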

get_first_partition_data(metric_id)#

Get a value point of a given metric, or raise an exception if none is found. The first value encountered is returned.

Parameters:

metric_id (string) – identifier of the metric

Returns:

the metric data, as a dict. Fields are

partition : the partition identifier, as a string.
value : the metric value, as a string
dataType : expected type of value (one of BIGINT, DOUBLE, STRING, BOOLEAN)
computed : timestamp of computation, in milliseconds since epoch

Return type:

dict

get_partition_data_for_version(metric_id, version_id)#

Get the metric of the first partition matching version_id, for saved models

Parameters:

metric_id (string) – identifier of the metric
version_id (string) – identifier of the version

Returns:

the metric data, as a dict. Fields are

partition : the partition identifier, as a string.
value : the metric value, as a string
dataType : expected type of value (one of BIGINT, DOUBLE, STRING, BOOLEAN)
computed : timestamp of computation, in milliseconds since epoch

Return type:

dict

get_all_ids()#

Get the identifiers of all metrics defined in this object

Returns:

list of metric identifiers

Return type:

list[string]
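A common use of the identifier list is to collect every global value in one pass. The sketch below does this against any object exposing `get_all_ids()` and `get_global_value()`; the `collect_global_values` helper, the `FakeMetrics` stand-in and the choice to store `None` for metrics without a global value are assumptions made for illustration.

```python
def collect_global_values(metrics):
    """Map every metric identifier to its global value, or None when the lookup raises."""
    values = {}
    for metric_id in metrics.get_all_ids():
        try:
            values[metric_id] = metrics.get_global_value(metric_id)
        except Exception:
            # Assumption: a metric without a global value raises a generic exception.
            values[metric_id] = None
    return values


class FakeMetrics:
    """Hypothetical stand-in for ComputedMetrics, used only to make the sketch runnable."""

    def __init__(self, global_values):
        self._global_values = global_values

    def get_all_ids(self):
        return list(self._global_values)

    def get_global_value(self, metric_id):
        value = self._global_values[metric_id]
        if value is None:
            raise Exception("No global value for metric %s" % metric_id)
        return value


metrics = FakeMetrics({"records:COUNT_RECORDS": 1200, "basic:COUNT_FILES": None})
summary = collect_global_values(metrics)
print(summary)
```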

static get_value_from_data(data)#

Retrieves the value from a metric point, cast to the appropriate type (str, int or float).

For other types, the value is not cast and left as a string.

Parameters:

data (dict) – a value point for a metric, retrieved with get_global_data() or get_partition_data()

Returns:

the value, cast to the appropriate Python type

Return type:

str, int or float
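To make the casting rule concrete, here is a minimal sketch of an equivalent conversion driven by the `dataType` field. It is not the library's implementation: the `cast_value` helper and the mapping of BIGINT to `int` and DOUBLE to `float` are assumptions derived from the description above.

```python
def cast_value(point):
    """Cast the string 'value' of a metric point according to its 'dataType'."""
    data_type = point.get("dataType")
    raw = point["value"]
    if data_type == "BIGINT":
        return int(raw)
    if data_type == "DOUBLE":
        return float(raw)
    # STRING, BOOLEAN and anything else are left as strings.
    return raw


# Stand-ins for value points; all values are made up.
count = cast_value({"partition": "ALL", "value": "1200", "dataType": "BIGINT"})
ratio = cast_value({"partition": "ALL", "value": "0.25", "dataType": "DOUBLE"})
flag = cast_value({"partition": "ALL", "value": "true", "dataType": "BOOLEAN"})
print(count, ratio, flag)
```

Under these assumptions a BOOLEAN value comes back as the string it was stored as, so any conversion to `bool` is left to the caller.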

class dataiku.core.metrics.MetricDataPoint(raw)#

A value of a metric, on a partition

Note

Instances of this class are only created by Python checks

get_metric()#

Get the definition of the metric

Returns:

a dict defining the metric. Fields are

id : the metric full identifier
type : type of the probe computing the metric
metricType : type of the metric for the probe
dataType : type of the value computed (one of BIGINT, DOUBLE, STRING, BOOLEAN)
column : (optional) name of the column the metric is computed for

Return type:

dict

get_metric_id()#

Get the metric’s full identifier.

Return type:

string

get_partition()#

Get the identifier of the partition on which the value was computed.

Return type:

string

get_value()#

Get the raw value of the metric.

Usage example:

# The code for a Python check that errors if there are more
# than 10k records in the dataset.
# The parameters of process() are filled by DSS:
# - last_values is a dict of metric name to MetricDataPoint
# - dataset is a handle on the dataset
# - partition_id is the partition for which the check is run
def process(last_values, dataset, partition_id):
    # get the MetricDataPoint
    last_known_record_count = last_values.get('records:COUNT_RECORDS')
    if last_known_record_count is None:
        return 'EMPTY', 'Record count not yet computed'
    # the raw value is a string, so convert it before comparing
    record_count = int(last_known_record_count.get_value())
    if record_count <= 10000:
        return 'OK'
    else:
        return 'ERROR', 'Too many records'

Return type:

string
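The same pattern extends to a check with an intermediate WARNING level. The following is a sketch only: the two thresholds and the `FakePoint` stand-in for MetricDataPoint are hypothetical, and the return shapes simply follow the example above (a bare outcome, or an outcome with a message).

```python
WARNING_THRESHOLD = 8000   # hypothetical threshold
ERROR_THRESHOLD = 10000    # hypothetical threshold


def process(last_values, dataset, partition_id):
    point = last_values.get('records:COUNT_RECORDS')
    if point is None:
        return 'EMPTY', 'Record count not yet computed'
    # get_value() returns the raw value as a string
    record_count = int(point.get_value())
    if record_count > ERROR_THRESHOLD:
        return 'ERROR', 'Too many records: %d' % record_count
    if record_count > WARNING_THRESHOLD:
        return 'WARNING', 'Record count is getting high: %d' % record_count
    return 'OK'


class FakePoint:
    """Hypothetical stand-in for MetricDataPoint, used only to exercise process()."""

    def __init__(self, value):
        self._value = value

    def get_value(self):
        return self._value


print(process({'records:COUNT_RECORDS': FakePoint("9000")}, None, "ALL"))
```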

get_compute_time()#

Get the time at which the value was computed.

Return type:

datetime.datetime

get_type()#

Get the type of the value.

Returns:

the type of the value, one of BIGINT, DOUBLE, BOOLEAN, STRING

Return type:

string

class dataiku.core.metrics.ComputedChecks(raw)#

Handle to the checks of a DSS object and their last computed value

Important

Do not create this class directly, instead use dataiku.Project.get_last_check_values()

get_check_by_name(check_name)#

Retrieve the info for a given check

Parameters:

check_name (string) – identifier of the check

get_global_data(check_name)#

Get the global value point of a given check, or raise an exception if none is found.

For a partitioned dataset, the global value is the value of the check computed on the whole dataset (coded as partition ‘ALL’).

Parameters:

check_name (string) – identifier of the check

Returns:

the check data, as a dict. Fields are

partition : the partition identifier, as a string.
outcome : one of OK, ERROR, WARNING, EMPTY
message : (optional) message of the check
computed : timestamp of computation, in milliseconds since epoch

Return type:

dict
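Since `message` is optional, code that reports on check data should not assume the key is present. A minimal sketch follows, using a hand-written stand-in for the returned dict; the `describe_check` helper, the check name and the sample values are illustrative assumptions.

```python
def describe_check(check_name, point):
    """Build a one-line summary of a check data point, including its message if any."""
    line = "%s on partition %s: %s" % (check_name, point["partition"], point["outcome"])
    message = point.get("message")  # optional field
    if message:
        line += " (%s)" % message
    return line


# Stand-ins for check data points; all values are made up.
failed = {"partition": "ALL", "outcome": "ERROR", "message": "Too many records", "computed": 1704153600000}
passed = {"partition": "ALL", "outcome": "OK", "computed": 1704153600000}
print(describe_check("row_count_check", failed))
print(describe_check("row_count_check", passed))
```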

get_global_value(check_name)#

Get the global value of a given check, or raise an exception if none is found.

For a partitioned dataset, the global value is the value of the check computed on the whole dataset (coded as partition ‘ALL’).

Parameters:

check_name (string) – identifier of the check

Returns:

outcome of the check (OK, ERROR, WARNING or EMPTY)

Return type:

string

get_partition_data(check_name, partition)#

Get the value point of a given check for a given partition, or raise an exception if none is found.

Parameters:

check_name (string) – identifier of the check
partition (string) – partition identifier

Returns:

the check data, as a dict. Fields are

partition : the partition identifier, as a string.
outcome : one of OK, ERROR, WARNING, EMPTY
message : (optional) message of the check
computed : timestamp of computation, in milliseconds since epoch

Return type:

dict

get_partition_value(check_name, partition)#

Get the value of a given check for a given partition, or raise an exception if none is found.

Parameters:

check_name (string) – identifier of the check
partition (string) – partition identifier

Returns:

outcome of the check for this partition (OK, ERROR, WARNING or EMPTY)

Return type:

string

get_first_partition_data(check_name)#

Get a value point of a given check, or raise an exception if none is found. The first value encountered is returned.

Parameters:

check_name (string) – identifier of the check

Returns:

the check data, as a dict. Fields are

partition : the partition identifier, as a string.
outcome : one of OK, ERROR, WARNING, EMPTY
message : (optional) message of the check
computed : timestamp of computation, in milliseconds since epoch

Return type:

dict

get_partition_data_for_version(check_name, version_id)#

Get the check of the first partition matching version_id, for saved models

Parameters:

check_name (string) – identifier of the check
version_id (string) – identifier of the version

Returns:

the check data, as a dict. Fields are

partition : the partition identifier, as a string.
outcome : one of OK, ERROR, WARNING, EMPTY
message : (optional) message of the check
computed : timestamp of computation, in milliseconds since epoch

Return type:

dict

get_all_names()#

Get the identifiers of all checks defined in this object

Returns:

list of check identifiers

Return type:

list[string]

static get_outcome_from_data(data)#

Retrieves the value from a check data point

Parameters:

data (dict) – a value point for a check, retrieved with get_global_data() or get_partition_data()

Returns:

a check result (OK, ERROR, WARNING or EMPTY)

Return type:

string
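When several check data points have to be reduced to a single status, one option is to keep the most severe outcome. The sketch below assumes a severity ordering of OK, EMPTY, WARNING, ERROR; that ordering, the `worst_outcome` helper and the sample points are assumptions made for illustration and are not defined by the API.

```python
# Assumed ordering, from least to most severe.
SEVERITY = {"OK": 0, "EMPTY": 1, "WARNING": 2, "ERROR": 3}


def worst_outcome(points):
    """Return the most severe outcome among check data points, or EMPTY if there are none."""
    outcomes = [point["outcome"] for point in points]
    return max(outcomes, key=SEVERITY.__getitem__, default="EMPTY")


# Stand-ins for check data points; all values are made up.
points = [
    {"partition": "2024-01-01", "outcome": "OK", "computed": 1704153600000},
    {"partition": "2024-01-02", "outcome": "WARNING", "message": "Few records", "computed": 1704240000000},
]
overall = worst_outcome(points)
print(overall)
```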

class dataiku.core.metrics.CheckDataPoint(raw)#

A value of a check, on a partition

Note

Instances of this class are only created by Python checks

get_check()#

Returns the definition of the check

Returns:

a dict of the check definition. Notable fields are

type : the type of check
meta : the display metadata, as a dict of name and label

Return type:

dict

get_partition()#

Returns the partition on which the value was computed

Returns:

a partition identifier

Return type:

string

get_value()#

Returns the value of the check, as a string

Returns:

one of OK, ERROR, WARNING, EMPTY (means “no data, check can’t be computed”)

Return type:

string

get_compute_time()#

Returns the time at which the value was computed

Return type:

datetime.datetime

dataikuapi package API#

class dataikuapi.dss.metrics.ComputedMetrics(raw)#

Handle to the metrics of a DSS object and their last computed value

Important

Do not create this class directly, instead use DSSDataset.get_last_metric_values(), DSSSavedModel.get_metric_values(), DSSManagedFolder.get_last_metric_values().

get_metric_by_id(id)#

Retrieve the info for a given metric

Usage example

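# project is a DSSProject handle, for example obtained with DSSClient.get_project()
dataset = project.get_dataset("my_dataset")
metrics = dataset.get_last_metric_values()
count_files_metric = metrics.get_metric_by_id("basic:COUNT_FILES")
# one entry per partition for which the metric has been computed
for value in count_files_metric['lastValues']:
    print("partition=%s -> count of files=%s" % (value['partition'], value['value']))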

Parameters:

id (string) – identifier of the metric

Returns:

information about the metric and its values. Since the last value of the metric depends on the partition considered, the last values of the metric are given in a sub-list of the dict.

Return type:

dict
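The lookup chain used in the example can be wrapped in a small function that takes the project handle as an argument. This is a sketch under stated assumptions: `file_counts_by_partition` is a hypothetical helper, and the three `_Fake*` classes are stand-ins that only exist so the block runs without a DSS instance. With a real instance, `project` would be a DSSProject handle.

```python
def file_counts_by_partition(project, dataset_name):
    """Return {partition: raw value} for the basic:COUNT_FILES metric of a dataset."""
    dataset = project.get_dataset(dataset_name)
    metrics = dataset.get_last_metric_values()
    info = metrics.get_metric_by_id("basic:COUNT_FILES")
    return {point["partition"]: point["value"] for point in info["lastValues"]}


class _FakeMetrics:
    """Hypothetical stand-in for dataikuapi.dss.metrics.ComputedMetrics."""

    def get_metric_by_id(self, id):
        # All values are made up.
        return {"lastValues": [{"partition": "2024-01-01", "value": "3"}]}


class _FakeDataset:
    """Hypothetical stand-in for a dataset handle."""

    def get_last_metric_values(self):
        return _FakeMetrics()


class _FakeProject:
    """Hypothetical stand-in for a project handle."""

    def get_dataset(self, name):
        return _FakeDataset()


counts = file_counts_by_partition(_FakeProject(), "my_dataset")
print(counts)
```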

get_global_data(metric_id)#

Get the global value point of a given metric, or raise an exception if none is found.

For a partitioned dataset, the global value is the value of the metric computed on the whole dataset (coded as partition ‘ALL’).

Parameters:

metric_id (string) – identifier of the metric

Returns:

the metric data, as a dict. The value itself is held in the value field, as a string.

Return type:

dict

get_global_value(metric_id)#

Get the global value of a given metric, or raise an exception if none is found.

For a partitioned dataset, the global value is the value of the metric computed on the whole dataset (coded as partition ‘ALL’).

Usage example

# project is a DSSProject handle, for example obtained with DSSClient.get_project()
dataset = project.get_dataset("my_dataset")
metrics = dataset.get_last_metric_values()
print("record count = %s" % metrics.get_global_value('records:COUNT_RECORDS'))

Parameters:

metric_id (string) – identifier of the metric

Returns:

the global value of the metric

Return type:

str, int or float

get_partition_data(metric_id, partition)#

Get the value point of a given metric for a given partition, or raise an exception if none is found.

Parameters:

metric_id (string) – identifier of the metric
partition (string) – partition identifier

Returns:

the metric data, as a dict. The value itself is held in the value field, as a string.

Return type:

dict

get_partition_value(metric_id, partition)#

Get the value of a given metric for a given partition, or raise an exception if none is found.

Parameters:

metric_id (string) – identifier of the metric
partition (string) – partition identifier

Returns:

the value of the metric for the partition

Return type:

str, int or float

get_first_partition_data(metric_id)#

Get a value point of a given metric, or raise an exception if none is found. The first value encountered is returned.

Parameters:

metric_id (string) – identifier of the metric

Returns:

the metric data, as a dict. The value itself is held in the value field, as a string.

Return type:

dict

get_all_ids()#

Get the identifiers of all metrics defined in this object

Returns:

list of metric identifiers

Return type:

list[string]

static get_value_from_data(data)#

Retrieves the value from a metric point, cast to the appropriate type (str, int or float).

For other types, the value is not cast and left as a string.

Parameters:

data (dict) – a value point for a metric, retrieved with get_global_data(), get_partition_data() or get_first_partition_data()

Returns:

the value, cast to the appropriate Python type

Return type:

str, int or float