There are two main classes related to managed folder handling in Dataiku’s Python APIs:

dataiku.Folder, in the dataiku package. It was initially designed for use within DSS, in recipes and Jupyter notebooks.

dataikuapi.dss.managedfolder.DSSManagedFolder, in the dataikuapi package. It was initially designed for use outside of DSS.
Both classes have fairly similar capabilities, but we recommend using dataiku.Folder within DSS.
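As a quick illustration of the two entry points, the sketch below shows how each package hands you a folder object. The host, API key, and project key are placeholder assumptions; the folder id matches the example further down.

```python
def get_folder_inside_dss(folder_ref):
    # Inside DSS (code recipe or notebook): dataiku.Folder resolves the
    # folder from the current project and flow context, by name or id.
    import dataiku
    return dataiku.Folder(folder_ref)


def get_folder_outside_dss(host, api_key, project_key, folder_id):
    # Outside DSS: authenticate explicitly against the instance with
    # dataikuapi, then walk down from the project to the managed folder.
    import dataikuapi
    client = dataikuapi.DSSClient(host, api_key)
    project = client.get_project(project_key)
    return project.get_managed_folder(folder_id)


# Example call with placeholder credentials (not real values):
# folder = get_folder_outside_dss("https://dss.example.com:11200",
#                                 "my-api-key", "MYPROJECT", "NvrBgKDk")
```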
For more details on the two packages, please see the Getting started section.
This section contains more advanced examples on Managed Folders.
Load a model from a remote Managed Folder
If you have a trained model artifact stored remotely (e.g. using a cloud object storage Connection like AWS S3), you can leverage it in a code Recipe. To do so, you first need to download the artifact and temporarily store it on the Dataiku instance’s local filesystem. The following code sample illustrates this with a serialized TensorFlow model, and assumes that the model files are stored under a spam_detection directory inside a Managed Folder:
```python
import os
import shutil
import tempfile

import dataiku
import tensorflow as tf

folder = dataiku.Folder("NvrBgKDk")
model_folder = "spam_detection"

# Create a temporary directory on the local filesystem
with tempfile.TemporaryDirectory() as tmpdirname:
    # Loop through every file of the TF model and copy it locally
    # to the temporary directory
    for file_name in folder.list_paths_in_partition():
        # Paths returned by list_paths_in_partition() start with "/"
        local_file_path = tmpdirname + file_name
        # Create the parent directory locally if needed
        if not os.path.exists(os.path.dirname(local_file_path)):
            os.makedirs(os.path.dirname(local_file_path))
        # Copy the file from remote storage to the local filesystem
        with folder.get_download_stream(file_name) as f_remote, \
                open(local_file_path, "wb") as f_local:
            shutil.copyfileobj(f_remote, f_local)
    # Load the model from the local copy
    model = tf.keras.models.load_model(os.path.join(tmpdirname, model_folder))
```
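Going the other way (for instance, after retraining the model locally), you can write files back to the Managed Folder with Folder.upload_stream(). The sketch below is a minimal helper for uploading a whole local directory; upload_directory() takes a dataiku.Folder handle, and iter_local_files() is a hypothetical helper name introduced here for illustration.

```python
import os
from pathlib import Path


def iter_local_files(root):
    """Yield (absolute_path, remote_path) for every file under root.

    remote_path is the file's path relative to root, prefixed with "/",
    matching the path style used by Folder.list_paths_in_partition().
    """
    root = Path(root)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            yield str(path), "/" + path.relative_to(root).as_posix()


def upload_directory(folder, local_dir):
    """Copy every file under local_dir into the managed folder.

    folder is a dataiku.Folder handle; upload_stream() streams each
    file's content to the folder's backing storage.
    """
    for local_path, remote_path in iter_local_files(local_dir):
        with open(local_path, "rb") as f:
            folder.upload_stream(remote_path, f)
```

You would call this as `upload_directory(dataiku.Folder("NvrBgKDk"), tmpdirname)` after saving the model locally.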