Metrics and checks#

Note

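There are two main parts related to handling of metrics and checks in Dataiku’s Python APIs:

  • dataiku.core.metrics.ComputedMetrics in the dataiku package

  • dataikuapi.dss.metrics.ComputedMetrics in the dataikuapi package

Both classes have fairly similar capabilities.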

For usage information and examples, see Metrics and checks

dataiku package API#

class dataiku.core.metrics.ComputedMetrics(raw)#

Handle to the metrics of a DSS object and their last computed value

get_metric_by_id(metric_id)#

Retrieve the info for a given metric

Usage example

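import dataiku

# Handle on the dataset, then the last computed values of its metrics
dataset = dataiku.Dataset("my_dataset")
metrics = dataset.get_last_metric_values()
count_files_metric = metrics.get_metric_by_id("basic:COUNT_FILES")
# One entry per partition for which a value was computed
for value in count_files_metric['lastValues']:
    print("partition=%s -> count of files=%s" % (value['partition'], value['value']))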
Parameters:

metric_id (string) – identifier of the metric

Returns:

information about the metric and its values. Top-level fields are

  • metric : definition of the metric

  • meta : display metadata, as a dict of name and fullName

  • computingProbe : name of the probe that computes the metric

  • displayedAsMetric : whether the metric is among the metrics displayed on the “Status” tab of the object

  • notExistingViews : list of the possible types of metrics datasets not yet created on the object

  • partitionsWithValue : list of the partition identifiers for which some value of the metric exists

  • lastValues : list of the last computed values, one per partition. Each list element has

    • partition : the partition identifier, as a string.

    • value : the metric value, as a string

    • dataType : expected type of value (one of BIGINT, DOUBLE, STRING, BOOLEAN)

    • computed : timestamp of computation, in milliseconds since epoch

Return type:

dict
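
As a minimal sketch of how this structure can be walked, the snippet below uses a hand-written sample dict shaped like the fields listed above (it is not real DSS output). The helper name last_values_by_partition is hypothetical and not part of the dataiku package.

```python
from datetime import datetime, timezone

# Hand-written sample mimicking the documented fields; not real DSS output.
sample_metric_info = {
    "metric": {"id": "basic:COUNT_FILES", "dataType": "BIGINT"},
    "partitionsWithValue": ["2024-01-01", "2024-01-02"],
    "lastValues": [
        {"partition": "2024-01-01", "value": "3",
         "dataType": "BIGINT", "computed": 1704067200000},
        {"partition": "2024-01-02", "value": "5",
         "dataType": "BIGINT", "computed": 1704153600000},
    ],
}


def last_values_by_partition(metric_info):
    """Map each partition identifier to (raw value string, UTC computation time)."""
    result = {}
    for point in metric_info["lastValues"]:
        # 'computed' is in milliseconds since epoch, so convert to seconds first
        computed_at = datetime.fromtimestamp(point["computed"] / 1000, tz=timezone.utc)
        result[point["partition"]] = (point["value"], computed_at)
    return result


by_partition = last_values_by_partition(sample_metric_info)
for partition, (value, computed_at) in sorted(by_partition.items()):
    print("partition=%s value=%s computed=%s" % (partition, value, computed_at.isoformat()))
```

Note that the values stay strings here, exactly as they appear in the value field.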

get_global_data(metric_id)#

Get the global value point of a given metric. Raises an exception if the value cannot be found.

For a partitioned dataset, the global value is the value of the metric computed on the whole dataset (coded as partition ‘ALL’).

Parameters:

metric_id (string) – identifier of the metric

Returns:

the metric data, as a dict. Fields are

  • partition : the partition identifier, as a string.

  • value : the metric value, as a string

  • dataType : expected type of value (one of BIGINT, DOUBLE, STRING, BOOLEAN)

  • computed : timestamp of computation, in milliseconds since epoch

Return type:

dict
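
To illustrate what "coded as partition ‘ALL’" means in practice, here is a rough sketch of selecting the global value point from a list of value points. The function find_global_point and the sample data are hypothetical. The sketch assumes the lookup is a plain match on the partition field, which may differ from what the library actually does (for example for non-partitioned datasets).

```python
# Hand-written sample value points; not real DSS output.
sample_last_values = [
    {"partition": "2024-01-01", "value": "3", "dataType": "BIGINT", "computed": 1704067200000},
    {"partition": "ALL", "value": "8", "dataType": "BIGINT", "computed": 1704153600000},
]


def find_global_point(last_values):
    """Return the value point whose partition is 'ALL', or raise if there is none.

    Assumption: the global point is identified only by partition == 'ALL'.
    """
    for point in last_values:
        if point["partition"] == "ALL":
            return point
    raise ValueError("no value point for partition 'ALL'")


global_point = find_global_point(sample_last_values)
print("global value = %s" % global_point["value"])
```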

get_global_value(metric_id)#

Get the global value of a given metric. Raises an exception if the value cannot be found.

For a partitioned dataset, the global value is the value of the metric computed on the whole dataset (coded as partition ‘ALL’).

Usage example

import dataiku

dataset = dataiku.Dataset("my_dataset")
metrics = dataset.get_last_metric_values()
print("record count = %s" % metrics.get_global_value('records:COUNT_RECORDS'))
Parameters:

metric_id (string) – identifier of the metric

Returns:

the global value of the metric

Return type:

str, int or float

get_partition_data(metric_id, partition)#

Get the value point of a given metric for a given partition. Raises an exception if the value cannot be found.

Parameters:
  • metric_id (string) – identifier of the metric

  • partition (string) – partition identifier

Returns:

the metric data, as a dict. Fields are

  • partition : the partition identifier, as a string.

  • value : the metric value, as a string

  • dataType : expected type of value (one of BIGINT, DOUBLE, STRING, BOOLEAN)

  • computed : timestamp of computation, in milliseconds since epoch

Return type:

dict

get_partition_value(metric_id, partition)#

Get the value of a given metric for a given partition. Raises an exception if the value cannot be found.

Parameters:
  • metric_id (string) – identifier of the metric

  • partition (string) – partition identifier

Returns:

the value of the metric for the partition

Return type:

str, int or float

get_first_partition_data(metric_id)#

Get a value point of a given metric; the first value encountered is returned. Raises an exception if no value can be found.

Parameters:

metric_id (string) – identifier of the metric

Returns:

the metric data, as a dict. Fields are

  • partition : the partition identifier, as a string.

  • value : the metric value, as a string

  • dataType : expected type of value (one of BIGINT, DOUBLE, STRING, BOOLEAN)

  • computed : timestamp of computation, in milliseconds since epoch

Return type:

dict

get_partition_data_for_version(metric_id, version_id)#

For saved models, get the metric value point of the first partition matching version_id.

Parameters:
  • metric_id (string) – identifier of the metric

  • version_id (string) – identifier of the version

Returns:

the metric data, as a dict. Fields are

  • partition : the partition identifier, as a string.

  • value : the metric value, as a string

  • dataType : expected type of value (one of BIGINT, DOUBLE, STRING, BOOLEAN)

  • computed : timestamp of computation, in milliseconds since epoch

Return type:

dict

get_all_ids()#

Get the identifiers of all metrics defined in this object

Returns:

list of metric identifiers

Return type:

list[string]
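
Because the getters above raise when a value is missing, a loop over get_all_ids() usually needs some error handling. The following is a sketch only: collect_global_values is a hypothetical helper, and StubMetrics is a stand-in written for this example so the snippet runs outside DSS. The broad except clause is an assumption, since the exception type is not specified here.

```python
class StubMetrics:
    """Hypothetical stand-in exposing the two ComputedMetrics methods used below."""

    def __init__(self, global_values):
        self._global_values = global_values

    def get_all_ids(self):
        return list(self._global_values)

    def get_global_value(self, metric_id):
        value = self._global_values[metric_id]
        if value is None:
            raise Exception("no global value for %s" % metric_id)
        return value


def collect_global_values(metrics):
    """Return {metric_id: global value}, with None where the lookup raised."""
    values = {}
    for metric_id in metrics.get_all_ids():
        try:
            values[metric_id] = metrics.get_global_value(metric_id)
        except Exception:
            # Assumption: any exception here means "no global value available"
            values[metric_id] = None
    return values


summary = collect_global_values(
    StubMetrics({"records:COUNT_RECORDS": 1200, "basic:COUNT_FILES": None})
)
print(summary)
```

Inside DSS, the same helper could presumably be given the object returned by get_last_metric_values() instead of the stub.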

static get_value_from_data(data)#

Retrieves the value from a metric point, cast to the appropriate type (str, int or float).

Values of other types are not cast and are left as strings.

Parameters:

data (dict) – a value point for a metric, retrieved with get_global_data() or get_partition_data()

Returns:

the value, cast to the appropriate Python type

Return type:

str, int or float
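
The casting described above can be pictured with the sketch below. It is not the library's implementation: cast_metric_value is a hypothetical function, and the mapping of BIGINT to int and DOUBLE to float is an assumption derived from the documented dataType values.

```python
def cast_metric_value(point):
    """Cast the string in point['value'] according to point['dataType'].

    Assumed mapping: BIGINT -> int, DOUBLE -> float, anything else stays a string.
    """
    data_type = point.get("dataType", "STRING")
    raw = point["value"]
    if data_type == "BIGINT":
        return int(raw)
    if data_type == "DOUBLE":
        return float(raw)
    # STRING, BOOLEAN and unknown types are returned unchanged
    return raw


print(cast_metric_value({"value": "42", "dataType": "BIGINT"}))
print(cast_metric_value({"value": "0.5", "dataType": "DOUBLE"}))
print(cast_metric_value({"value": "true", "dataType": "BOOLEAN"}))
```

In this sketch a BOOLEAN value comes back as the string "true", not as a Python bool, which matches the statement that other types are left as strings.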

class dataiku.core.metrics.MetricDataPoint(raw)#

A value of a metric, on a partition

Note

Instances of this class are only created by Python checks

get_metric()#

Get the definition of the metric

Returns:

a dict defining the metric. Fields are

  • id : the metric full identifier

  • type : type of the probe computing the metric

  • metricType : type of the metric for the probe

  • dataType : type of the value computed (one of BIGINT, DOUBLE, STRING, BOOLEAN)

  • column : (optional) name of the column the metric is computed for

Return type:

dict

get_metric_id()#

Get the metric’s full identifier.

Return type:

string

get_partition()#

Get the identifier of the partition on which the value was computed.

Return type:

string

get_value()#

Get the raw value of the metric.

Usage example:

# Code for a Python check that errors when the dataset has
# 10k records or more.
# The parameters of process() are filled by DSS:
# - last_values is a dict of metric name to MetricDataPoint
# - dataset is a handle on the dataset
# - partition_id is the partition for which the check is run
def process(last_values, dataset, partition_id):
    # get the MetricDataPoint
    last_known_record_count = last_values.get('records:COUNT_RECORDS')
    if last_known_record_count is None:
        return 'EMPTY', 'Record count not yet computed'
    record_count = int(last_known_record_count.get_value())
    if record_count < 10000:
        return 'OK'
    else:
        return 'ERROR', 'Too many records'
Return type:

string

get_compute_time()#

Get the time at which the value was computed.

Return type:

datetime.datetime

get_type()#

Get the type of the value.

Returns:

a type, one of BIGINT, DOUBLE, BOOLEAN, STRING

Return type:

string
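
A Python check can use get_type() to guard against an unexpected value type before converting the raw value. The sketch below follows the process() signature shown in the usage example for get_value(). StubDataPoint is a hypothetical stand-in for MetricDataPoint so the snippet runs outside DSS, and the choice of outcomes and messages is only illustrative.

```python
class StubDataPoint:
    """Hypothetical stand-in exposing the two MetricDataPoint getters used below."""

    def __init__(self, value, data_type):
        self._value = value
        self._type = data_type

    def get_value(self):
        return self._value

    def get_type(self):
        return self._type


def process(last_values, dataset, partition_id):
    point = last_values.get('records:COUNT_RECORDS')
    if point is None:
        return 'EMPTY', 'Record count not yet computed'
    # Refuse to convert values that are not declared as numeric
    if point.get_type() not in ('BIGINT', 'DOUBLE'):
        return 'ERROR', 'Unexpected metric type: %s' % point.get_type()
    record_count = float(point.get_value())
    if record_count == 0:
        return 'WARNING', 'Dataset is empty'
    return 'OK', 'Record count is %d' % record_count


# Exercise the check with stand-in data points
print(process({'records:COUNT_RECORDS': StubDataPoint('12', 'BIGINT')}, None, None))
print(process({'records:COUNT_RECORDS': StubDataPoint('abc', 'STRING')}, None, None))
```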

class dataiku.core.metrics.ComputedChecks(raw)#

Handle to the checks of a DSS object and their last computed value

Important

Do not create this class directly; instead, use dataiku.Project.get_last_check_values()

get_check_by_name(check_name)#

Retrieve the info for a given check

Parameters:

check_name (string) – identifier of the check

get_global_data(check_name)#

Get the global value point of a given check. Raises an exception if the value cannot be found.

For a partitioned dataset, the global value is the value of the check computed on the whole dataset (coded as partition ‘ALL’).

Parameters:

check_name (string) – identifier of the check

Returns:

the check data, as a dict. Fields are

  • partition : the partition identifier, as a string.

  • outcome : one of OK, ERROR, WARNING, EMPTY

  • message : (optional) message of the check

  • computed : timestamp of computation, in milliseconds since epoch

Return type:

dict

get_global_value(check_name)#

Get the global value of a given check. Raises an exception if the value cannot be found.

For a partitioned dataset, the global value is the value of the check computed on the whole dataset (coded as partition ‘ALL’).

Parameters:

check_name (string) – identifier of the check

Returns:

outcome of the check (OK, ERROR, WARNING or EMPTY)

Return type:

string

get_partition_data(check_name, partition)#

Get the value point of a given check for a given partition. Raises an exception if the value cannot be found.

Parameters:
  • check_name (string) – identifier of the check

  • partition (string) – partition identifier

Returns:

the check data, as a dict. Fields are

  • partition : the partition identifier, as a string.

  • outcome : one of OK, ERROR, WARNING, EMPTY

  • message : (optional) message of the check

  • computed : timestamp of computation, in milliseconds since epoch

Return type:

dict

get_partition_value(check_name, partition)#

Get the value of a given check for a given partition. Raises an exception if the value cannot be found.

Parameters:
  • check_name (string) – identifier of the check

  • partition (string) – partition identifier

Returns:

outcome of the check for this partition (OK, ERROR, WARNING or EMPTY)

Return type:

string

get_first_partition_data(check_name)#

Get a value point of a given check; the first value encountered is returned. Raises an exception if no value can be found.

Parameters:

check_name (string) – identifier of the check

Returns:

the check data, as a dict. Fields are

  • partition : the partition identifier, as a string.

  • outcome : one of OK, ERROR, WARNING, EMPTY

  • message : (optional) message of the check

  • computed : timestamp of computation, in milliseconds since epoch

Return type:

dict

get_partition_data_for_version(check_name, version_id)#

For saved models, get the check value point of the first partition matching version_id.

Parameters:
  • check_name (string) – identifier of the check

  • version_id (string) – identifier of the version

Returns:

the check data, as a dict. Fields are

  • partition : the partition identifier, as a string.

  • outcome : one of OK, ERROR, WARNING, EMPTY

  • message : (optional) message of the check

  • computed : timestamp of computation, in milliseconds since epoch

Return type:

dict

get_all_names()#

Get the identifiers of all checks defined in this object

Returns:

list of check identifiers

Return type:

list[string]

static get_outcome_from_data(data)#

Retrieves the value from a check data point

Parameters:

data (dict) – a value point for a check, retrieved with get_global_data() or get_partition_data()

Returns:

a check result (OK, ERROR, WARNING or EMPTY)

Return type:

string
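
When several check data points need to be summarized (for example one per partition), one possible approach is to keep the most severe outcome. The sketch below works on plain dicts shaped like the check data described above. The function worst_outcome and the severity ordering (EMPTY lowest, ERROR highest) are assumptions made for this example, not something defined by the API.

```python
# Assumed ordering for this example only: higher number means more severe.
SEVERITY = {"EMPTY": 0, "OK": 1, "WARNING": 2, "ERROR": 3}


def worst_outcome(points):
    """Return the most severe outcome among check data points, or 'EMPTY' if none."""
    if not points:
        return "EMPTY"
    return max((point["outcome"] for point in points), key=SEVERITY.__getitem__)


# Hand-written sample check data points; not real DSS output.
sample_points = [
    {"partition": "2024-01-01", "outcome": "OK", "computed": 1704067200000},
    {"partition": "2024-01-02", "outcome": "WARNING", "message": "Low row count",
     "computed": 1704153600000},
]
print(worst_outcome(sample_points))
```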

class dataiku.core.metrics.CheckDataPoint(raw)#

A value of a check, on a partition

Note

Instances of this class are only created by Python checks

get_check()#

Returns the definition of the check

Returns:

a dict of the check definition. Notable fields are

  • type : the type of check

  • meta : the display metadata, as a dict of name and label

Return type:

dict

get_partition()#

Returns the partition on which the value was computed

Returns:

a partition identifier

Return type:

string

get_value()#

Returns the value of the check, as a string

Returns:

one of OK, ERROR, WARNING, EMPTY (EMPTY means “no data, check can’t be computed”)

Return type:

string

get_compute_time()#

Returns the time at which the value was computed

Return type:

datetime.datetime
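
Since get_compute_time() returns a datetime.datetime, it can be compared against a reference time to detect outdated results. The helper is_stale below is hypothetical, and the sketch assumes both datetimes are comparable (both naive or both timezone-aware); this page does not state which kind is returned.

```python
from datetime import datetime, timedelta


def is_stale(compute_time, now, max_age=timedelta(hours=24)):
    """Return True when compute_time is older than max_age relative to now."""
    return now - compute_time > max_age


# Example with fixed, naive datetimes so the result does not depend on the clock
reference = datetime(2024, 1, 3, 12, 0, 0)
print(is_stale(datetime(2024, 1, 1, 12, 0, 0), reference))
print(is_stale(datetime(2024, 1, 3, 9, 0, 0), reference))
```

In a real check, now would typically come from datetime.now(), and compute_time from a data point's get_compute_time().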

dataikuapi package API#

class dataikuapi.dss.metrics.ComputedMetrics(raw)#

Handle to the metrics of a DSS object and their last computed value

get_metric_by_id(id)#

Retrieve the info for a given metric

Usage example

# project is a dataikuapi DSSProject handle
dataset = project.get_dataset("my_dataset")
metrics = dataset.get_last_metric_values()
count_files_metric = metrics.get_metric_by_id("basic:COUNT_FILES")
# One entry per partition for which a value was computed
for value in count_files_metric['lastValues']:
    print("partition=%s -> count of files=%s" % (value['partition'], value['value']))
Parameters:

id (string) – identifier of the metric

Returns:

information about the metric and its values. Since the last value of the metric depends on the partition considered, the last values of the metric are given in a sub-list of the dict.

Return type:

dict

get_global_data(metric_id)#

Get the global value point of a given metric. Raises an exception if the value cannot be found.

For a partitioned dataset, the global value is the value of the metric computed on the whole dataset (coded as partition ‘ALL’).

Parameters:

metric_id (string) – identifier of the metric

Returns:

the metric data, as a dict. The value itself is stored as a string in the value field.

Return type:

dict

get_global_value(metric_id)#

Get the global value of a given metric. Raises an exception if the value cannot be found.

For a partitioned dataset, the global value is the value of the metric computed on the whole dataset (coded as partition ‘ALL’).

Usage example

# project is a dataikuapi DSSProject handle
dataset = project.get_dataset("my_dataset")
metrics = dataset.get_last_metric_values()
print("record count = %s" % metrics.get_global_value('records:COUNT_RECORDS'))
Parameters:

metric_id (string) – identifier of the metric

Returns:

the global value of the metric

Return type:

str, int or float

get_partition_data(metric_id, partition)#

Get the value point of a given metric for a given partition. Raises an exception if the value cannot be found.

Parameters:
  • metric_id (string) – identifier of the metric

  • partition (string) – partition identifier

Returns:

the metric data, as a dict. The value itself is stored as a string in the value field.

Return type:

dict

get_partition_value(metric_id, partition)#

Get the value of a given metric for a given partition. Raises an exception if the value cannot be found.

Parameters:
  • metric_id (string) – identifier of the metric

  • partition (string) – partition identifier

Returns:

the value of the metric for the partition

Return type:

str, int or float

get_first_partition_data(metric_id)#

Get a value point of a given metric; the first value encountered is returned. Raises an exception if no value can be found.

Parameters:

metric_id (string) – identifier of the metric

Returns:

the metric data, as a dict. The value itself is stored as a string in the value field.

Return type:

dict

get_all_ids()#

Get the identifiers of all metrics defined in this object

Returns:

list of metric identifiers

Return type:

list[string]
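
The identifiers used in the examples on this page ("basic:COUNT_FILES", "records:COUNT_RECORDS") start with a prefix followed by a colon. As a small sketch, the list returned by get_all_ids() could be grouped by that prefix. The helper group_ids_by_prefix is hypothetical, the sample identifiers are hand-written, and the assumption that every identifier contains a colon may not hold for all metrics.

```python
from collections import defaultdict


def group_ids_by_prefix(metric_ids):
    """Group metric identifiers by the text before their first ':'."""
    groups = defaultdict(list)
    for metric_id in metric_ids:
        # Identifiers without a colon end up under their full text as prefix
        prefix = metric_id.split(":", 1)[0]
        groups[prefix].append(metric_id)
    return dict(groups)


# Hand-written sample list, standing in for the result of get_all_ids()
sample_ids = ["basic:COUNT_FILES", "basic:SIZE", "records:COUNT_RECORDS"]
grouped = group_ids_by_prefix(sample_ids)
for prefix, ids in sorted(grouped.items()):
    print("%s -> %s" % (prefix, ", ".join(ids)))
```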

static get_value_from_data(data)#

Retrieves the value from a metric point, cast to the appropriate type (str, int or float).

Values of other types are not cast and are left as strings.

Parameters:

data (dict) – a value point for a metric, retrieved with get_global_data(), get_partition_data() or get_first_partition_data()

Returns:

the value, cast to the appropriate Python type

Return type:

str, int or float