How to create a custom tool for integration into a visual agent#
By default, Dataiku provides a set of general-purpose tools that can be used in Visual Agents. Because the tools a company needs depend heavily on its business, Dataiku cannot provide every one of them. Instead, it offers a way to integrate your own: the custom tool. This tutorial shows how to create one.
This tutorial relies on the same use case as Building and using an agent with Dataiku’s LLM Mesh and Langchain and LLM Mesh agentic applications: retrieving customer information from a provided ID, then fetching additional data about the customer’s company through an internet search. By the end of this tutorial, you will know how to create a custom tool and how to use it in a Visual Agent.
Prerequisites#
You have followed the Creating and configuring a plugin tutorial or already know how to develop a plugin.
Dataiku >= 13.4
Develop plugins permission
An SQL Dataset named pro_customers_sql. You can create this dataset by uploading this CSV file.
Creating the plugin environment#
To develop a custom tool, you must first create a plugin.
Go to the main menu, click the Plugins menu, and select Write your own from the Add plugin button.
Then, choose a meaningful name, such as “toolbox.”
Once the plugin is created, click the Create a code environment button and select Python as the default language.
In the requirements.txt file (located in toolbox/code-env/python/spec), add the duckduckgo_search requirement.
Once you have saved the modification, go to the Summary tab to build the plugin code environment.
The custom tool will run in this code environment whenever it is used.
Under the toolbox directory, create a folder named python-agent-tools.
This directory is where you write the code of your custom tools.
Usually, you create a new component by clicking the New component button. At the time of writing, however, the custom tools component is not visible in the list of components.
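The layout reached at this point can be pictured with the short sketch below. It recreates the two paths mentioned above in a temporary local directory, purely as an illustration: in Dataiku you create these files through the plugin editor, and the files that Dataiku generates for you are omitted. The function name scaffold_plugin is hypothetical.

```python
import tempfile
from pathlib import Path


def scaffold_plugin(root: Path) -> list:
    """Recreate the plugin layout described above under `root` (illustration only)."""
    spec_dir = root / "toolbox" / "code-env" / "python" / "spec"
    tools_dir = root / "toolbox" / "python-agent-tools"
    spec_dir.mkdir(parents=True)
    tools_dir.mkdir(parents=True)
    # The only requirement this tutorial adds to the code environment
    (spec_dir / "requirements.txt").write_text("duckduckgo_search\n")
    # Return every created path, relative to root, in sorted order
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*"))


with tempfile.TemporaryDirectory() as tmp:
    layout = scaffold_plugin(Path(tmp))
    for path in layout:
        print(path)
```

Each custom tool you write later gets its own subfolder under python-agent-tools.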
Creating the first tool – Dataset Lookup#
The first tool you will create is a dataset lookup tool. Dataiku already provides a similar tool by default, named Look up a record in a dataset, which is more configurable than the one you will create. For the sake of this tutorial, however, you will implement a simpler version yourself: the goal is to understand how a tool is built so that you can then adapt it to your needs.
Dataset lookup tool: used to execute SQL queries on the pro_customers_sql dataset to retrieve customer information (name, role, company), given a customer ID.
To create this tool, create a folder named dataset-lookup (for example) under the python-agent-tools directory.
In this folder, create two files: tool.json and tool.py.
The tool.json file contains the description of the custom tool, like any other component, and the tool.py file contains the tool’s code.
Code 1 shows a possible configuration of this tool, and Code 2 shows how to implement it.
Code 1: Dataset Lookup – tool.json
{
    "id": "dataset-lookup",
    "meta": {
        "label": "Dataset Lookup",
        "description": "Provide a name, job title and company of a customer, given the customer's ID"
    },
    "params": []
}
Code 2: Dataset Lookup – tool.py
from dataiku.llm.agent_tools import BaseAgentTool
import logging
import dataiku
from dataiku import SQLExecutor2


class DatasetLookupTool(BaseAgentTool):
    def set_config(self, config, plugin_config):
        self.logger = logging.getLogger(__name__)
        self.config = config
        self.plugin_config = plugin_config

    def get_descriptor(self, tool):
        # Describes the tool and the input it expects
        return {
            "description": """Provide a name, job title and company of a customer, given the customer's ID""",
            "inputSchema": {
                "title": "Input for a customer id",
                "type": "object",
                "properties": {
                    "id": {
                        "type": "string",
                        "description": "The customer Id"
                    }
                }
            }
        }

    def invoke(self, input, trace):
        self.logger.setLevel(logging.DEBUG)
        self.logger.debug(input)

        # The arguments matching the inputSchema are under the "input" key
        args = input["input"]
        customerId = args["id"]

        dataset = dataiku.Dataset("pro_customers_sql")
        table_name = dataset.get_location_info().get('info', {}).get('table')
        executor = SQLExecutor2(dataset=dataset)
        # Escape single quotes by doubling them (standard SQL string escaping)
        eid = customerId.replace("'", "''")
        query_reader = executor.query_to_iter(
            f"""SELECT name, job, company FROM "{table_name}" WHERE id = '{eid}'""")
        # Return the first matching row, if any
        for (name, job, company) in query_reader.iter_tuples():
            return {"output": f"""The customer's name is "{name}", holding the position "{job}" at the company named "{company}"."""}
        return {"output": f"No information can be found about the customer {customerId}"}
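Because the customer ID comes from the LLM, it is worth checking how it ends up in the query string. The sketch below factors the query construction into a standalone helper so that it can be tried outside Dataiku. The helper name build_lookup_query and the sample IDs are hypothetical, and doubling single quotes is the standard SQL escaping convention, assumed here to suit your database.

```python
def build_lookup_query(table_name: str, customer_id: str) -> str:
    """Build the lookup query, doubling single quotes so the ID stays a string literal."""
    escaped_id = customer_id.replace("'", "''")
    return f"""SELECT name, job, company FROM "{table_name}" WHERE id = '{escaped_id}'"""


# Hypothetical table name and IDs, for illustration only
print(build_lookup_query("pro_customers_sql", "some_customer_id"))
print(build_lookup_query("pro_customers_sql", "x' OR '1'='1"))
```

In the second call, the quotes inside the ID are doubled, so the whole value is compared against the id column instead of being interpreted as SQL.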
Once the plugin is saved, the new tool becomes available in Dataiku.
To find it, open the project where you plan to use the tool, go to the Analysis menu, select Agent Tools, and then click the New agent tool button.
Your tool should appear in the list, as shown in Figure 1.
If it does not, you may need to reload Dataiku to force it to reload the plugin.
The title and the description come from the label and description fields in Code 1.

Fig. 1: Custom tool visible in the list.#
At the top of the modal, enter a meaningful name for this tool, such as Get Customer Info, select your custom tool, and click the Create button. Your tool is now ready to use in Dataiku. However, you should also enter an additional description, as shown in Figure 2. For example: “Use this tool when you need to retrieve information about a customer ID. The expected output is the name, the job title, and the company.” This description helps the LLM understand when to use the tool.

Fig. 2: Creation of a tool.#
If you want to see your tool in action, click the Quick test tab, provide the data you want to use, and click the Run button.
If everything goes well, you should see something similar to Figure 3.
The inputSchema returned by get_descriptor in Code 2 is mandatory.
Dataiku uses it to provide the correct input to the tool.
You can find this inputSchema in the Quick test tab under the Tool Schema block, as shown in Figure 3.

Fig. 3: Testing a tool.#
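To make the link between the inputSchema and the data received by invoke more concrete, here is a minimal sketch that checks a payload against the schema of the dataset lookup tool. It is not how Dataiku validates inputs. The helper check_input is hypothetical, it only handles the types used in this tutorial, and it treats every declared property as required, which is an assumption of this sketch.

```python
# Same inputSchema as the one returned by get_descriptor in the dataset lookup tool
INPUT_SCHEMA = {
    "title": "Input for a customer id",
    "type": "object",
    "properties": {
        "id": {"type": "string", "description": "The customer Id"}
    },
}

# Maps JSON Schema type names to Python types (only the ones needed here)
JSON_TYPES = {"string": str, "object": dict}


def check_input(payload: dict, schema: dict) -> list:
    """Return the problems found in payload["input"]; an empty list means OK."""
    args = payload.get("input")
    if not isinstance(args, dict):
        return ["'input' must be an object"]
    problems = []
    for name, spec in schema["properties"].items():
        if name not in args:
            problems.append(f"missing property '{name}'")
        elif not isinstance(args[name], JSON_TYPES[spec["type"]]):
            problems.append(f"property '{name}' must be of type {spec['type']}")
    return problems


# Hypothetical payloads, shaped like the argument that invoke reads
print(check_input({"input": {"id": "some_customer_id"}}, INPUT_SCHEMA))
print(check_input({"input": {}}, INPUT_SCHEMA))
```

The payload shape, with the arguments nested under the input key, mirrors what invoke reads in the tool code.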
Wrapping up#
Congratulations! You now know how to create a custom tool and make it available in Dataiku. You can now create a second tool (for searching the internet) and follow the Creating and using a Visual Agent tutorial. Below, you will find a possible implementation of this second tool.
Creating the second tool – Internet search#
Dataiku also provides an internet search tool by default, which uses Google to search for information on the Internet. In this tutorial, you will instead make a “Get Company Info” tool that uses the DuckDuckGo search engine. The process for creating it is the same as for the first tool.
Create a folder named internet-search (for example) under the python-agent-tools directory, and create the same two files in it: tool.json and tool.py.
You will find a possible implementation in Code 3 and Code 4, respectively.
Code 3: Internet Search – tool.json
{
    "id": "internet-search",
    "meta": {
        "label": "Internet search",
        "description": "Provide general information about a company, given the company's name."
    },
    "params": []
}
Code 4: Internet Search – tool.py
from dataiku.llm.agent_tools import BaseAgentTool
import logging
from duckduckgo_search import DDGS


class InternetSearchTool(BaseAgentTool):
    def set_config(self, config, plugin_config):
        self.logger = logging.getLogger(__name__)
        self.config = config
        self.plugin_config = plugin_config

    def get_descriptor(self, tool):
        # Describes the tool and the input it expects
        return {
            "description": """Provide general information about a company, given the company's name.""",
            "inputSchema": {
                "title": "Input for a company",
                "type": "object",
                "properties": {
                    "company": {
                        "type": "string",
                        "description": "The company you need info on"
                    }
                }
            }
        }

    def invoke(self, input, trace):
        self.logger.info(input)

        # The arguments matching the inputSchema are under the "input" key
        args = input["input"]
        company_name = args["company"]

        # Keep only the first search result
        with DDGS() as ddgs:
            results = list(ddgs.text(f"{company_name} (company)", max_results=1))
            if results:
                return {"output": f"Information found about {company_name}: {results[0]['body']}"}
        return {"output": f"No information found about {company_name}"}
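The output formatting of this tool can be tried without any network access by separating it from the search call. The sketch below is one way to do that. The function name summarize_results and the sample result are hypothetical, and the fake result only carries the body key, which is the only key the tool code reads.

```python
def summarize_results(company_name: str, results: list) -> dict:
    """Turn raw search results into the tool's output dictionary."""
    if results:
        return {"output": f"Information found about {company_name}: {results[0]['body']}"}
    return {"output": f"No information found about {company_name}"}


# Made-up search result, for illustration only
fake_results = [{"body": "Example Corp is a fictional company."}]

print(summarize_results("Example Corp", fake_results))
print(summarize_results("Example Corp", []))
```

Keeping the formatting in its own function lets you check both the found and not-found branches before testing the full tool in the Quick test tab.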