Writing a macro for project creation#

Prerequisites#

  • Dataiku >= 12.0

  • Access to a Dataiku instance with the “Develop plugins” permission

  • Access to an existing project with the following permissions:
    • “Read project content”

    • “Write project content”

Note

We recommend reading this tutorial. We will assume that you already have a plugin created.

Introduction#

This tutorial will show you how to create a macro dedicated to project creation.

The purpose of this macro is to streamline project creation. Once created, the macro is accessible under the +New project button on the DSS home page, as shown in Figure 1. This tutorial walks through building the macro step by step.

Figure 1: Project creation macro.#

To create a project creation macro, go to the plugin editor, click the +New component button, and choose the macro component. This creates a subfolder named python-runnables in your plugin directory, containing a subfolder named after your macro. That subfolder holds two files: runnable.json and runnable.py. The JSON file holds your macro’s configuration, including its parameters. The Python file defines how the macro is executed.

To ensure a successful implementation, it’s essential first to define the requirements of the macro. In this case, the macro is designed to assist users in setting up a new project. The macro will prompt the user to provide a name for their project and select a dedicated code environment and default cluster. Additionally, users can apply tags to their projects and create a starter wiki to get their projects off the ground.

Macro configuration#

Fill in the meta section as usual. If you need help, you will find dedicated information in this documentation. Set the macroRoles field to PROJECT_CREATOR and the resultType field to NONE, as shown in Code 1.

Code 1: Macro’s global configuration#
  "meta": {
    "label": "Create and set up a project",
    "description": "Project creation (from macro)",
    "icon": "icon-thumbs-up"
  },
  "permissions": [
    "ADMIN"
  ],
  "resultType": "NONE",
  "macroRoles": [
    {
      "type": "PROJECT_CREATOR"
    }
  ],
  "impersonate": false,

The macro’s scope is deliberately narrow, so the parameters it needs are easy to determine.

The first parameter is the project name (project_name), which should be a concise but descriptive label for the project. The second parameter is the code environment (code_envs), which becomes the project’s default code environment.

The third parameter is the cluster name (default_cluster), which should be defined based on the infrastructure used for the project.

The fourth parameter (tags) is a list of strings representing the tags for the project. These tags help to categorize the project, making it easier to search for and identify relevant content.

Finally, the fifth parameter is the wiki content (wiki_content), which is used only if the user wants a wiki. A Boolean parameter (additional_wiki) lets the user indicate whether to create one, and the wiki_content field is displayed only when it is checked.

Defining all these parameters leads to the code shown in Code 2.

Code 2: Macro’s parameters configuration#
  "impersonate": false,
  "params": [
    {
      "name": "project_name",
      "label": "Project name",
      "type": "STRING",
      "description": "Name of the project",
      "mandatory": true
    },
    {
      "name": "code_envs",
      "label": "Select a code env",
      "description": "Default code env for the project",
      "type": "CODE_ENV"
    },
    {
      "name": "default_cluster",
      "label": "Select a default container",
      "type": "CLUSTER"
    },
    {
      "name": "tags",
      "label": "Additional tags",
      "type": "STRINGS"
    },
    {
      "name": "additional_wiki",
      "label": "Do you want to add a wiki page?",
      "type": "BOOLEAN"
    },
    {
      "name": "wiki_content",
      "label": "Wiki page",
      "type": "TEXTAREA",
      "visibilityCondition": "model.additional_wiki"
    }
  ]
}
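
On the Python side, these parameters arrive as a plain dictionary (the config argument of the runnable). Below is a minimal sketch of reading them defensively, assuming optional fields may be absent or empty. The helper name read_macro_config is hypothetical and not part of the Dataiku API.

```python
def read_macro_config(config):
    """Return the macro parameters with safe defaults (hypothetical helper)."""
    project_name = (config.get("project_name") or "").strip()
    if not project_name:
        # project_name is declared as mandatory in runnable.json
        raise ValueError("project_name is mandatory")
    additional_wiki = bool(config.get("additional_wiki"))
    return {
        "project_name": project_name,
        "code_envs": config.get("code_envs") or None,
        "default_cluster": config.get("default_cluster") or None,
        # An unfilled STRINGS parameter may come back as None: normalize to a list
        "tags": list(config.get("tags") or []),
        "additional_wiki": additional_wiki,
        # Ignore any wiki text when the user did not ask for a wiki
        "wiki_content": (config.get("wiki_content") or "") if additional_wiki else "",
    }
```

Normalizing the values in one place keeps the run function free of None checks.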

Macro execution#

For more information about the initial generated code, you can refer to this documentation.

Code 3 presents the whole code of the macro; the comments explain how it works. All of the processing happens in the run function. A helper function named create_checklist generates a checklist from a list of strings. This function would normally live in a separate file (for example, in a library), but for presentation purposes, it is included in the Python file.
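
As an illustration of moving the helper out of the runnable, here is a minimal sketch of create_checklist as a module-level function, for instance in a file under the plugin’s python-lib folder. The title and now_ms parameters are hypothetical additions, and the timestamp unit (epoch milliseconds) is an assumption to verify on your instance.

```python
import time


def create_checklist(author, items, title="To-do list", now_ms=None):
    """Build a checklist dict from a list of strings.

    now_ms is a hypothetical parameter that makes the function easy to test;
    when omitted, the current time is used (assumed unit: epoch milliseconds).
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return {
        "title": title,
        "createdOn": now_ms,
        "items": [
            {
                "createdBy": author,
                "createdOn": now_ms,
                "done": False,
                "stateChangedOn": 0,
                "text": text,
            }
            for text in items
        ],
    }
```

Because the function no longer depends on the runnable instance, other macros in the same plugin could reuse it.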

As part of the processing:

  • We first create a client for interacting with Dataiku and gather information about the connected user.

  • Then, we generate a unique project key which is used to identify the project throughout the system.

  • After that, we create the project and set the default code environment and cluster, essential components for any project. Additionally, we create a checklist for the project and add tags to it.

  • Finally, if the user desires, we make a wiki for the project, which can be used to document the project’s progress and guide team members.
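
The unique-key step above can be sketched as a pure function, which makes it easy to test outside of DSS. This is a sketch under assumptions: make_project_key is a hypothetical name, and the sanitization assumes project keys should contain only letters, digits, and underscores.

```python
import random
import re
import string


def make_project_key(project_name, existing_keys, suffix_length=10):
    """Derive a project key from a name, avoiding the keys already in use."""
    # Replace every character outside A-Z, 0-9 and underscore
    base = re.sub(r"[^A-Z0-9_]", "_", project_name.upper())
    existing = set(existing_keys)
    key = base
    # On collision, append a random uppercase suffix until the key is free
    while key in existing:
        suffix = "".join(random.choices(string.ascii_uppercase, k=suffix_length))
        key = f"{base}_{suffix}"
    return key
```

Inside the macro, existing_keys would be the list of project keys returned by the client.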

If you want to impose a particular configuration, you can add these steps to the procedure without requesting the user’s input.
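
For instance, a team might want every project created by the macro to carry a fixed tag. A minimal sketch, assuming a hypothetical MANDATORY_TAGS list and merge_tags helper (neither is part of the Dataiku API):

```python
# Hypothetical tags that every macro-created project should carry
MANDATORY_TAGS = ["created-by-macro"]


def merge_tags(user_tags, mandatory_tags=MANDATORY_TAGS):
    """Return the mandatory tags followed by the user's tags, without duplicates."""
    merged = []
    for tag in list(mandatory_tags) + list(user_tags or []):
        tag = tag.strip()
        # Skip blank entries and tags that are already present
        if tag and tag not in merged:
            merged.append(tag)
    return merged
```

The result could then replace the user-supplied tag list before the tags are written to the project.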

Wrapping up#

Congratulations! You have completed this tutorial and built your first macro for project creation. Understanding all these basic concepts allows you to create more complex macros.

For example, you can import some datasets from the feature store, define other settings for your newly created project, or set specific permissions for users/groups.
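
As an illustration of the permissions idea, here is a minimal sketch that adds a read-only entry for a group to a project permissions dictionary. The helper grant_read_access and the group name are hypothetical, and the entry fields are an assumption based on the dictionary returned by the project’s get_permissions method; verify them against your instance before relying on them.

```python
def grant_read_access(permissions, group):
    """Add a read-only entry for `group` if it has no entry yet (hypothetical helper)."""
    entries = permissions.setdefault("permissions", [])
    if not any(entry.get("group") == group for entry in entries):
        # Assumed field names; check the dict returned by get_permissions()
        entries.append({"group": group, "readProjectContent": True})
    return permissions


# Inside run(), after the project is created (requires a DSS instance):
#   permissions = new_project.get_permissions()
#   new_project.set_permissions(grant_read_access(permissions, "data_scientists"))
```

Keeping the dictionary manipulation separate from the API calls makes the logic testable without a running instance.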

Here is the complete version of the code presented in this tutorial:

runnable.json
{
  "meta": {
    "label": "Create and set up a project",
    "description": "Project creation (from macro)",
    "icon": "icon-thumbs-up"
  },
  "permissions": [
    "ADMIN"
  ],
  "resultType": "NONE",
  "macroRoles": [
    {
      "type": "PROJECT_CREATOR"
    }
  ],
  "impersonate": false,
  "params": [
    {
      "name": "project_name",
      "label": "Project name",
      "type": "STRING",
      "description": "Name of the project",
      "mandatory": true
    },
    {
      "name": "code_envs",
      "label": "Select a code env",
      "description": "Default code env for the project",
      "type": "CODE_ENV"
    },
    {
      "name": "default_cluster",
      "label": "Select a default container",
      "type": "CLUSTER"
    },
    {
      "name": "tags",
      "label": "Additional tags",
      "type": "STRINGS"
    },
    {
      "name": "additional_wiki",
      "label": "Do you want to add a wiki page?",
      "type": "BOOLEAN"
    },
    {
      "name": "wiki_content",
      "label": "Wiki page",
      "type": "TEXTAREA",
      "visibilityCondition": "model.additional_wiki"
    }
  ]
}
runnable.py
Code 3: Macro’s processing#
# This file is the actual code for the Python runnable project-creation
import random
import string
from datetime import datetime
import dataiku
from dataiku.runnables import Runnable


class MyRunnable(Runnable):
    """Macro that creates a project and applies the initial setup chosen by the user"""

    def __init__(self, project_key, config, plugin_config):
        """
        :param project_key: the project in which the runnable executes
        :param config: the dict of the configuration of the object
        :param plugin_config: contains the plugin settings
        """
        self.config = config
        self.plugin_config = plugin_config
        self.project_name = self.config.get('project_name')
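        # Keep only uppercase letters, digits and underscores in the project key
        allowed = string.ascii_uppercase + string.digits + '_'
        self.project_key = ''.join(c if c in allowed else '_' for c in self.project_name.upper())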
        self.code_envs = self.config.get('code_envs')
        self.default_cluster = self.config.get('default_cluster')
        # Fall back to an empty list when the user did not provide any tag
        self.tags = self.config.get('tags') or []
        self.additional_wiki = self.config.get('additional_wiki')
        self.wiki_content = self.config.get('wiki_content')

    def get_progress_target(self):
        """
        If the runnable will return some progress info, have this function return a tuple of
        (target, unit) where unit is one of: SIZE, FILES, RECORDS, NONE
        """
        return None

    def create_checklist(self, author, items):
        """
        Generate a checklist from a list of items

        :param author: Author of the checklist
        :param items: list of items
        :return: the checklist
        """
        checklist = {
            "title": "To-do list",
            "createdOn": 0,
            "items": []
        }
        for item in items:
            checklist["items"].append({
                "createdBy": author,
                # Assumption: DSS expects timestamps in epoch milliseconds
                "createdOn": int(datetime.now().timestamp() * 1000),
                "done": False,
                "stateChangedOn": 0,
                "text": item
            })
        return checklist

    def run(self, progress_callback):
        """
        Create the project and apply the requested setup. Can return a string or raise an exception.
        The progress_callback is a function expecting 1 value: current progress
        """

        # Create a (Dataiku) client for interacting with Dataiku and collect the connected user.
        user_client = dataiku.api_client()
        user_auth_info = user_client.get_auth_info()

        # Generate a unique project_key, appending a random suffix on collision
        base_key = self.project_key
        existing_keys = set(user_client.list_project_keys())
        while self.project_key in existing_keys:
            self.project_key = base_key + '_' + ''.join(random.choices(string.ascii_uppercase, k=10))

        # Create the project
        new_project = user_client.create_project(self.project_key, self.project_name,
                                                 user_auth_info.get('authIdentifier'))

        # Set the default code env and cluster
        settings = new_project.get_settings()
        settings.set_python_code_env(self.code_envs)
        settings.set_k8s_cluster(self.default_cluster)
        settings.save()

        # Add tags to the project settings
        tags = new_project.get_tags()
        tags["tags"] = {t: {} for t in self.tags}
        new_project.set_tags(tags)

        # Create the checklist and add tags to the project
        to_dos = ["Import some datasets",
                  "Create your first recipe/notebook",
                  "Enjoy coding"]
        metadata = new_project.get_metadata()
        metadata["checklists"]["checklists"].append(self.create_checklist(user_auth_info.get('authIdentifier'),
                                                                          items=to_dos))
        metadata['tags'] = list(self.tags)
        new_project.set_metadata(metadata)

        # Create the wiki if the user wants it.
        if self.additional_wiki:
            wiki = new_project.get_wiki()
            article = wiki.create_article("Home page",
                                          content=self.wiki_content)
            settings = wiki.get_settings()
            settings.set_home_article_id(article.article_id)
            settings.save()