Introduction

Dataiku is a data science platform that accelerates the development of data and ML projects. It reduces the time you spend managing infrastructure (access to databases, compute resources, development and production environments) so that you can focus on the highest-value tasks in a collaborative space.

[Figure: dku-diagram]

Dataiku supports you across five main pillars throughout the lifecycle of your data and ML projects:

  1. Access your data and compute resources: connect to your databases, seamlessly access all your data assets, and run on elastic resources wherever you want.

  2. Build your data preparation pipeline: perform transformation steps, either offloaded to your data storage or in memory, using your preferred language (Python, R, SQL, Spark, etc.), and get a visual representation of your workflow. Structure your code with Git-versioned libraries and scripts.

  3. Develop & evaluate ML models: train ML models with the Python frameworks of your choice using notebooks or your preferred IDE, track and compare your experiments, and automatically generate pre-built evaluation interfaces with performance metrics, feature importance, and more.

  4. Deploy & monitor your model/pipeline: deploy to API endpoints, orchestrate your pipeline, and build monitoring interfaces for your projects.

  5. Collaborate and accelerate data teams: build reusable components for non-technical counterparts and share your work through webapps and advanced visualizations.

In the following pages, you will get a high-level overview of the platform’s capabilities from a new user’s perspective. For more in-depth, hands-on walkthroughs, check out the available tutorials.