API for plugin datasets
- class dataiku.connector.Connector(config, plugin_config=None)
The base interface for a custom Python connector.
- get_read_schema()
Returns the schema that this connector generates when returning rows.
The returned schema may be None if the schema is not known in advance. In that case, the dataset schema will be inferred from the first rows.
Additional columns returned by generate_rows are discarded if and only if connector.json contains "strictSchema": true.
The schema must be a dict with a single key, "columns", containing an array of {'name': name, 'type': type} entries.
Example:
return {"columns": [{"name": "col1", "type": "string"}, {"name": "col2", "type": "float"}]}
Supported types are: string, int, bigint, float, double, date, boolean
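As a minimal sketch of the schema format described above, the hypothetical helper below (`build_schema` is not part of the Dataiku API) builds the dict from (name, type) pairs and rejects any type outside the supported list:

```python
# Types listed as supported in the documentation above.
SUPPORTED_TYPES = {"string", "int", "bigint", "float", "double", "date", "boolean"}


def build_schema(columns):
    """Build a schema dict from (name, type) pairs.

    Hypothetical helper for illustration; not part of dataiku.connector.
    """
    for name, col_type in columns:
        if col_type not in SUPPORTED_TYPES:
            raise ValueError("Unsupported type %r for column %r" % (col_type, name))
    return {"columns": [{"name": name, "type": col_type} for name, col_type in columns]}


schema = build_schema([("col1", "string"), ("col2", "float")])
```

A connector's get_read_schema() could then simply return the result of such a helper.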
- generate_rows(dataset_schema=None, dataset_partitioning=None, partition_id=None, records_limit=-1)
The main reading method.
Returns a generator over the rows of the dataset (or partition). Each yielded row must be a dictionary, indexed by column name.
The dataset schema and partitioning are given for informational purposes only.
Example:
from apiLibrary import apiClient

# Connect to API service.
client = apiClient()
# Get a list of JSON objects, where each element corresponds to a row in the dataset.
data = client.get_data()
for datum in data:
    yield {
        "col1": datum["api_json_key1"],
        "col2": datum["api_json_key2"]
    }
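The records_limit parameter is not described above. The sketch below assumes that a negative value (the default, -1) means "no limit" and that a non-negative value caps the number of yielded rows. `fetch_records` is a hypothetical stand-in for the API call, and `SketchConnector` is a plain class; a real plugin would subclass dataiku.connector.Connector instead.

```python
import itertools


def fetch_records():
    """Hypothetical stand-in for an API call; yields raw JSON-like dicts."""
    for i in range(1000):
        yield {"api_json_key1": "item-%d" % i, "api_json_key2": i * 0.5}


class SketchConnector(object):
    def generate_rows(self, dataset_schema=None, dataset_partitioning=None,
                      partition_id=None, records_limit=-1):
        records = fetch_records()
        # Assumption: a negative records_limit means "return everything".
        if records_limit >= 0:
            records = itertools.islice(records, records_limit)
        for datum in records:
            # Each row is a dictionary indexed by column name.
            yield {"col1": datum["api_json_key1"], "col2": datum["api_json_key2"]}


preview = list(SketchConnector().generate_rows(records_limit=3))
```

Using itertools.islice keeps the method lazy, so a small preview does not force the whole source to be read.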
- get_writer(dataset_schema=None, dataset_partitioning=None, partition_id=None, write_mode='OVERWRITE')
Returns a writer object to write to the dataset (or to a partition).
The dataset_schema given here will match the rows passed in to the writer.
write_mode is either OVERWRITE or APPEND. It will only be APPEND if the plugin explicitly supports append mode; see the supportAppend flag in the connector metadata.
Note
The writer is responsible for clearing the partition, if relevant.
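The writer interface itself is not specified in this section. The sketch below assumes a writer exposing `write_row(row)` and `close()`, that rows arrive as sequences of values in schema order, and that dataset_schema has the same {"columns": [...]} shape as in get_read_schema(). The in-memory `store` is purely illustrative. It shows the writer clearing the partition in OVERWRITE mode and keeping existing rows in APPEND mode.

```python
class InMemoryWriter(object):
    """Hypothetical writer; the write_row/close method names are assumptions."""

    def __init__(self, store, partition_id, column_names, write_mode):
        self.column_names = column_names
        if write_mode == "OVERWRITE":
            # The writer is responsible for clearing the partition.
            store[partition_id] = []
        # In APPEND mode, keep whatever rows already exist.
        self.rows = store.setdefault(partition_id, [])
        self.closed = False

    def write_row(self, row):
        # Assumption: row holds values in the same order as the schema columns.
        self.rows.append(dict(zip(self.column_names, row)))

    def close(self):
        self.closed = True


class SketchConnector(object):
    def __init__(self):
        self.store = {}  # partition_id -> list of row dicts

    def get_writer(self, dataset_schema=None, dataset_partitioning=None,
                   partition_id=None, write_mode='OVERWRITE'):
        names = [col["name"] for col in dataset_schema["columns"]]
        return InMemoryWriter(self.store, partition_id, names, write_mode)


connector = SketchConnector()
schema = {"columns": [{"name": "a", "type": "string"}, {"name": "b", "type": "int"}]}
writer = connector.get_writer(schema, partition_id="p1")
writer.write_row(("x", 1))
writer.close()
```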
- get_partitioning()
Return the partitioning schema that the connector defines.
Example:
return {
    "dimensions": [{
        "name": "date",  # Name of column to partition on.
        "type": "time",
        "params": {"period": "DAY"}
    }]
}
- list_partitions(partitioning)
Return the list of partitions for the partitioning scheme passed as parameter.
- partition_exists(partitioning, partition_id)
Return whether the partition passed as parameter exists.
Implementation is only required if the corresponding flag is set to True in the connector definition.
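As an illustration of list_partitions and partition_exists for a day-based time dimension, the sketch below assumes that partition identifiers are strings of the form YYYY-MM-DD and that the connector knows the first and last available day. Both are assumptions made for the example, and `SketchConnector` is a plain class rather than a real dataiku.connector.Connector subclass.

```python
import datetime


class SketchConnector(object):
    def __init__(self, first_day, last_day):
        # Hypothetical bounds of the data available in the source system.
        self.first_day = first_day
        self.last_day = last_day

    def list_partitions(self, partitioning):
        # One partition identifier per day, inclusive of both bounds.
        n_days = (self.last_day - self.first_day).days + 1
        return [
            (self.first_day + datetime.timedelta(days=i)).strftime("%Y-%m-%d")
            for i in range(n_days)
        ]

    def partition_exists(self, partitioning, partition_id):
        return partition_id in self.list_partitions(partitioning)


connector = SketchConnector(datetime.date(2024, 1, 30), datetime.date(2024, 2, 1))
partitions = connector.list_partitions(None)
```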
- get_records_count(partitioning=None, partition_id=None)
Returns the count of records for the dataset (or a partition).
Implementation is only required if the corresponding flag is set to True in the connector definition.
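A minimal sketch of get_records_count, assuming that partition_id=None means "count the whole dataset". The `rows_by_partition` mapping is a hypothetical in-memory stand-in for the real data source, which would normally expose a cheaper way to count records than loading them.

```python
class SketchConnector(object):
    def __init__(self, rows_by_partition):
        # Hypothetical in-memory data: partition_id -> list of rows.
        self.rows_by_partition = rows_by_partition

    def get_records_count(self, partitioning=None, partition_id=None):
        if partition_id is None:
            # Assumption: no partition_id means the whole dataset.
            return sum(len(rows) for rows in self.rows_by_partition.values())
        return len(self.rows_by_partition.get(partition_id, []))


connector = SketchConnector({"2024-01-30": [{"col1": "a"}, {"col1": "b"}],
                             "2024-01-31": [{"col1": "c"}]})
```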
- get_connector_resource()
You may create a folder DATA_DIR/plugins/dev/<plugin id>/resource/ to hold resources useful for your plugin, e.g. data files; this method returns the path of this folder.
This resource folder is meant to be read-only and is included in the .zip release of your plugin. Do not put resources next to connector.py or recipe.py.
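The sketch below shows one way a connector might read a data file from the resource folder. To stay runnable outside Dataiku it uses a hypothetical `StubConnector` whose get_connector_resource() returns a temporary directory; in a real plugin the method is inherited from dataiku.connector.Connector. The file name mapping.json is also an assumption.

```python
import json
import os
import tempfile


class StubConnector(object):
    """Stand-in for dataiku.connector.Connector, for illustration only."""

    def __init__(self, resource_dir):
        self._resource_dir = resource_dir

    def get_connector_resource(self):
        return self._resource_dir


class SketchConnector(StubConnector):
    def load_mapping(self):
        # Read-only access to a file shipped in the plugin's resource folder.
        path = os.path.join(self.get_connector_resource(), "mapping.json")
        with open(path) as f:
            return json.load(f)


# Simulate a resource folder containing one data file.
resource_dir = tempfile.mkdtemp()
with open(os.path.join(resource_dir, "mapping.json"), "w") as f:
    json.dump({"api_json_key1": "col1"}, f)

mapping = SketchConnector(resource_dir).load_mapping()
```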