Dataiku Developer Guide

You are viewing the developer guide for version 14 of DSS.

Interaction with PySpark

dataiku.spark.start_spark_context_and_setup_sql_context(load_defaults=True, hive_db='dataiku', conf={}): Helper to start a Spark context and a SQL context "like DSS recipes do". This helper is provided mainly for informational purposes and is not used by default.

dataiku.spark.setup_sql_context(sc, hive_db='dataiku', conf={}): Helper to set up a SQL context on the Spark context sc "like DSS recipes do". This helper is provided mainly for informational purposes and is not used by default.

dataiku.spark.distribute_py_libs(sc)

dataiku.spark.get_dataframe(sqlContext, dataset): Opens a DSS dataset as a Spark SQL dataframe. The dataset argument must be a dataiku.Dataset object.
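As a minimal sketch of how get_dataframe is typically called, the function below builds a SQL context and opens a dataset by name. The function name load_dataset_as_dataframe and the dataset name used in the usage note are hypothetical. The code assumes it runs where the dataiku and pyspark packages are available (for example a DSS PySpark recipe or notebook), and it uses SQLContext because get_dataframe takes one, even though recent Spark versions favor SparkSession.

```python
def load_dataset_as_dataframe(dataset_name):
    """Open a DSS dataset as a Spark SQL dataframe (illustrative helper)."""
    # Imports are deferred so that this module can still be imported
    # in an environment where dataiku or pyspark is not installed.
    import dataiku
    from dataiku import spark as dkuspark
    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    # Reuse the running Spark context if there is one, else start one.
    sc = SparkContext.getOrCreate()
    sql_context = SQLContext(sc)

    # get_dataframe expects a dataiku.Dataset object, not a bare name.
    dataset = dataiku.Dataset(dataset_name)
    return dkuspark.get_dataframe(sql_context, dataset)
```

A call such as load_dataset_as_dataframe("orders") would then return a dataframe for a dataset named "orders", assuming such a dataset exists in the current project.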

dataiku.spark.write_schema_from_dataframe(dataset, dataframe): Sets the schema of an existing dataset so that it is write-compatible with the given Spark SQL dataframe.

dataiku.spark.write_dataframe(dataset, dataframe, delete_first=True): Saves a Spark SQL dataframe into an existing DSS dataset.

dataiku.spark.write_with_schema(dataset, dataframe, delete_first=True): Writes a Spark SQL dataframe into an existing DSS dataset. This first overrides the schema of the dataset to match the schema of the dataframe.
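The read and write helpers can be combined into a read, transform, write step, sketched below under some assumptions. The function name filter_dataset, its replace_schema flag, and the dataset names in the usage note are hypothetical. The only calls into dataiku.spark are get_dataframe, write_with_schema, and write_dataframe as documented above, and the transformation uses the standard PySpark DataFrame.filter method with a SQL expression string.

```python
def filter_dataset(sql_context, input_name, output_name, condition,
                   replace_schema=True):
    """Copy the rows of one DSS dataset that match `condition` into another.

    Illustrative helper: `condition` is a SQL expression string passed to
    DataFrame.filter, e.g. "amount > 100".
    """
    # Deferred imports: only needed when the function is actually called.
    import dataiku
    from dataiku import spark as dkuspark

    source = dataiku.Dataset(input_name)
    target = dataiku.Dataset(output_name)

    # Read the input dataset as a Spark SQL dataframe and filter it.
    df = dkuspark.get_dataframe(sql_context, source)
    filtered = df.filter(condition)

    if replace_schema:
        # Overrides the output schema to match the dataframe, then writes.
        dkuspark.write_with_schema(target, filtered)
    else:
        # Writes into the dataset without touching its existing schema.
        dkuspark.write_dataframe(target, filtered)
    return filtered
```

For example, filter_dataset(sqlContext, "orders", "large_orders", "amount > 100") would write the matching rows to "large_orders", assuming both datasets already exist. Passing replace_schema=False is the choice to make when the output schema has been set by hand and should be kept as is.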

dataiku.spark.apply_prepare_recipe(df, recipe_name, project_key=None)