Interaction with PySpark
- dataiku.spark.reduceNotebookLogs(sql_context, log_level='WARN')
- dataiku.spark.reduce_notebook_logs(sql_context, log_level='WARN')
Alter the logging level of the Spark context.
Spark spawns threads that run tasks in the background, and some of them write to the log at the DEBUG level. Spark itself also logs heavily at that level. In notebooks, cell output can therefore be flooded with uninformative DEBUG messages, so it is advisable to change the logging level to WARN.
This method has no effect unless it is called from a notebook.
- Parameters:
sql_context – the current SQLContext or SparkSession
log_level – desired logging level (INFO, WARN, ERROR, DEBUG)
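As an illustration, a notebook cell could wrap the call as sketched below. The wrapper name `quiet_spark_logs` and its validation step are hypothetical and not part of the `dataiku` API. The sketch assumes that `sql_context` already exists in the notebook and that the `dataiku` package is importable, which is normally only the case inside DSS.

```python
def quiet_spark_logs(sql_context, log_level="WARN"):
    """Reduce Spark log noise in a notebook (hypothetical wrapper)."""
    # Lazy import: the dataiku package is assumed to be available,
    # as it is inside a DSS notebook.
    import dataiku.spark as dkuspark

    # Levels listed in the parameter description above.
    allowed = ("INFO", "WARN", "ERROR", "DEBUG")
    if log_level not in allowed:
        raise ValueError("log_level must be one of %s" % (allowed,))

    # Documented as having no effect when called outside a notebook.
    dkuspark.reduce_notebook_logs(sql_context, log_level=log_level)
    return log_level
```

Calling `quiet_spark_logs(sqlContext)` once, right after the context is created, should keep later cells free of DEBUG output.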
- dataiku.spark.start_spark_context_and_setup_sql_context(load_defaults=True, hive_db='dataiku', conf={})
Helper to start a Spark Context and a SQL Context “like DSS recipes do”. This helper is provided mainly for informational purposes and is not used by default.
- dataiku.spark.setup_sql_context(sc, hive_db='dataiku', conf={})
Helper to start a SQL Context “like DSS recipes do”. This helper is provided mainly for informational purposes and is not used by default.
- dataiku.spark.distribute_py_libs(sc)
- dataiku.spark.get_dataframe(sqlContext, dataset)
Opens a DSS dataset as a Spark SQL DataFrame. The dataset argument must be a dataiku.Dataset object.
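A minimal sketch of reading a dataset this way is shown below. The helper name `load_dataset_as_dataframe` is hypothetical. It assumes that an SQLContext (or SparkSession) has already been created and that the named dataset exists in the current project.

```python
def load_dataset_as_dataframe(sql_context, dataset_name):
    """Open a DSS dataset as a Spark SQL DataFrame (hypothetical helper)."""
    # Lazy imports: the dataiku package is assumed to be available,
    # as it is inside DSS.
    import dataiku
    import dataiku.spark as dkuspark

    # get_dataframe expects a dataiku.Dataset object, not a plain name.
    dataset = dataiku.Dataset(dataset_name)
    return dkuspark.get_dataframe(sql_context, dataset)
```

In a notebook this might be used as `df = load_dataset_as_dataframe(sqlContext, "my_dataset")`, where `"my_dataset"` is a placeholder name.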
- dataiku.spark.write_schema_from_dataframe(dataset, dataframe)
Sets the schema of an existing dataset so that it is write-compatible with the given Spark SQL DataFrame.
- dataiku.spark.write_dataframe(dataset, dataframe, delete_first=True)
Saves a Spark SQL DataFrame into an existing DSS dataset.
- dataiku.spark.write_with_schema(dataset, dataframe, delete_first=True)
Writes a Spark SQL DataFrame into an existing DSS dataset. The schema of the dataset is first overwritten to match the schema of the DataFrame.
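The choice between the two write functions can be sketched as follows. The helper `save_dataframe` and its `update_schema` flag are hypothetical names introduced for illustration. The sketch assumes the output dataset already exists and that the `dataiku` package is importable.

```python
def save_dataframe(dataset_name, dataframe, update_schema=True, delete_first=True):
    """Write a Spark SQL DataFrame to an existing DSS dataset (hypothetical helper)."""
    # Lazy imports: the dataiku package is assumed to be available,
    # as it is inside DSS.
    import dataiku
    import dataiku.spark as dkuspark

    dataset = dataiku.Dataset(dataset_name)
    if update_schema:
        # Overwrites the dataset schema to match the DataFrame, then writes.
        dkuspark.write_with_schema(dataset, dataframe, delete_first=delete_first)
    else:
        # Writes only; the dataset schema is left as it is.
        dkuspark.write_dataframe(dataset, dataframe, delete_first=delete_first)
    return dataset
```

Using `write_with_schema` is convenient when the DataFrame's columns may have changed. `write_dataframe` suits cases where the dataset schema is managed separately, for example through `write_schema_from_dataframe`.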
- dataiku.spark.apply_prepare_recipe(df, recipe_name, project_key=None)
