In this tutorial, you’ll learn how to build a basic Machine Learning project in Dataiku, from data exploration to machine learning model development, using mainly Jupyter Notebooks.
Have access to a Dataiku 11+ instance
Create a Python>=3.6 code environment named heart-attack-project with the following required packages:
In Dataiku, the equivalent of virtual environments is called a “code environment.” In the code environment documentation, you can find more information and instructions for creating a new Python code environment.
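As a quick sanity check, a first notebook cell can confirm that the kernel's interpreter meets the Python>=3.6 requirement stated above. This is a minimal sketch, not part of the tutorial project: the helper name python_version_ok is hypothetical, and the check covers only the Python version, not the required packages.

```python
import sys

# Minimum version taken from the prerequisites above.
MIN_PYTHON = (3, 6)


def python_version_ok(version_info=None, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum.

    `python_version_ok` is a hypothetical helper written for this sketch.
    When no version is given, the running interpreter is checked.
    """
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    status = "OK" if python_version_ok() else "too old"
    print("Python", sys.version.split()[0], status)
```

Running the cell with the heart-attack-project code environment selected should report a version at or above 3.6.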
1. Import the project
On the Dataiku homepage, select + NEW PROJECT > DSS Tutorials. In the Quick Start section, select Developers Quick Start.
Alternatively, you can download the project from this page and then upload the project on your Dataiku instance: + NEW PROJECT > Import project.
2. Set the code environment
To ensure the code environment is automatically selected for running all the Python scripts in your project, we will change the project settings to use it by default.
On the top bar, select … > Settings > Code env selection.
In the Default Python code env:
Change Mode to Select an environment.
In the Environment parameter, select the code environment you’ve just created.
Click the Save button or use the save keyboard shortcut.
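The same default can also be set from code rather than through the settings screen. The sketch below assumes the Dataiku public API client exposes get_settings(), set_python_code_env(), and save() on the project settings object, so check this against the API reference for your Dataiku version. The wrapper function set_default_python_env is a hypothetical name introduced here.

```python
def set_default_python_env(project, env_name):
    """Make `env_name` the project's default Python code environment.

    `project` is expected to be a Dataiku project handle. The method
    names used below are assumed from the Dataiku public API and may
    differ between versions.
    """
    settings = project.get_settings()
    settings.set_python_code_env(env_name)
    settings.save()  # nothing is persisted until save() is called
    return settings


# Usage inside a Dataiku notebook (not executed here):
#   import dataiku
#   project = dataiku.api_client().get_default_project()
#   set_default_python_env(project, "heart-attack-project")
```

Taking the project handle as a parameter keeps the function easy to try against any project, and avoids importing the dataiku package outside of a Dataiku instance.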
Set up the project
This tutorial comes with the following:
a README.md file (stored in the project Wiki)
an input dataset: the Heart Failure Prediction Dataset
three Jupyter Notebooks that you will leverage to build the project
a Python repository stored in the project library, with some Python functions that will be used in the different notebooks.
The project aims to build a binary classification model that predicts the risk of heart failure from health information. To do so, you’ll go through the standard steps of a Machine Learning project: data exploration, data preparation, machine learning modeling using different ML models, and model evaluation.
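To make the last of these steps concrete, here is a minimal, library-free sketch of how a binary classifier's predictions can be scored. It is an illustration only: the function name binary_metrics and the labels are made up for this example, and the project notebooks may rely on different metrics or libraries.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, and recall for 0/1 labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    total = len(pairs)
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Made-up labels for illustration only (not taken from the dataset):
# 1 = heart failure, 0 = no heart failure.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(binary_metrics(y_true, y_pred))
```

For a health-risk model, recall is often watched closely, since a false negative corresponds to a missed at-risk patient.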
The project is composed of three notebooks (they can be found in the Notebooks section: </> > Notebooks) that you will run one by one. For each notebook:
Ensure you’re using the heart-attack-project code environment (see prerequisites above).
Run the notebook cell by cell.
For notebooks 1 and 3, follow the instructions in the last section of each notebook to build a new step in the project workflow.
You’ll find the details of these notebooks and the associated outputs in the following sections: