Building and using an agent with Dataiku’s LLM Mesh and Langchain
The impressive text generation capabilities of Large Language Models (LLMs) can be further enhanced by integrating them with additional modules: planning, memory, and tools. The resulting LLM-based agents can perform tasks such as accessing databases, incorporating contextual understanding from external sensors, or interfacing with other software to execute more complex actions. This integration allows for more dynamic and practical applications, making LLMs active participants in decision-making processes.
This tutorial builds an LLM agent around a practical use case: retrieving customer information based on a provided ID and fetching additional data about the customer’s company through an internet search. By the end of this tutorial, you will have a structured understanding of how to integrate LLMs with external tools to create functional and efficient agents.
Prerequisites
Dataiku >= 12.6.2
Python >= 3.10
A code environment with the following packages:
langchain  # tested with 1.2.15
langchain-core  # tested with 1.2.28
langchain-classic  # tested with 1.0.3
ddgs  # tested with 9.13.0
An SQL dataset called pro_customers_sql in the flow, like the one shown in Table 1.
| id | name | job | company |
|---|---|---|---|
| tcook | Tim Cook | CEO | Apple |
| snadella | Satya Nadella | CEO | Microsoft |
| jbezos | Jeff Bezos | CEO | Amazon |
| fdouetteau | Florian Douetteau | CEO | Dataiku |
| wcoyote | Wile E. Coyote | Business Developer | ACME |
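The snippets in this tutorial were tested with specific package versions, so you may want to check what your code environment actually provides. The following is a minimal sketch that uses only the standard library's importlib.metadata. The installed_versions helper is hypothetical, and the package names come from the prerequisites list above.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names from the prerequisites list
REQUIRED_PACKAGES = ["langchain", "langchain-core", "langchain-classic", "ddgs"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            versions[name] = None
    return versions


for name, installed in installed_versions(REQUIRED_PACKAGES).items():
    print(f"{name}: {installed or 'not installed'}")
```

Compare the output with the tested versions. A missing package shows up as "not installed".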
LLM initialization and library import
To begin with, you need to set up a development environment by importing the necessary libraries and initializing the chat LLM you want to use to create the agent.
The tutorial relies on the LLM Mesh to access the model and on the Langchain package to orchestrate the process.
The DKUChatModel class allows you to call a model previously registered in the LLM Mesh and makes it usable as a Langchain chat model.
import dataiku
# Prepare the LLM
from dataiku.langchain.dku_llm import DKUChatModel
LLM_ID = "" # Replace with a valid LLM id
llm = DKUChatModel(llm_id=LLM_ID, temperature=0)
Tip
You’ll need to provide DKUChatModel with an llm_id, the internal identifier Dataiku uses for a model in the LLM Mesh.
The documentation provides instructions on obtaining an LLM ID.
The following code snippet prints the list of all the models your project has access to.
import dataiku
client = dataiku.api_client()
project = client.get_default_project()
llm_list = project.list_llms()
# Use a distinct loop variable so the llm object defined earlier is not overwritten
for llm_item in llm_list:
    print(f"- {llm_item.description} (id: {llm_item.id})")
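If your project has access to many models, you can also pick an ID programmatically. The following is a small sketch that assumes each item returned by project.list_llms() exposes id and description, as in the loop above. The find_llm_ids helper is hypothetical, and the stand-in items and their IDs are made up for the demonstration.

```python
from types import SimpleNamespace


def find_llm_ids(llm_items, keyword):
    """Return the IDs of the LLMs whose description contains keyword (case-insensitive)."""
    keyword = keyword.lower()
    return [item.id for item in llm_items
            if keyword in (item.description or "").lower()]


# Stand-in items with made-up IDs; in DSS, pass project.list_llms() instead
demo_items = [
    SimpleNamespace(id="my-connection:model-large", description="Example large model"),
    SimpleNamespace(id="my-connection:model-small", description="Example small model"),
]
print(find_llm_ids(demo_items, "small"))  # ['my-connection:model-small']
```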
Tools’ definition
In this section, you will define the external tools that your LLM agent will use to perform more advanced tasks. In our case, there are two tools:
Dataset lookup tool: used to execute SQL queries on the pro_customers_sql dataset to retrieve customer information (name, role, company), given a customer ID. Code 2 shows an implementation of this tool.
Internet search tool: used to perform internet searches to fetch more detailed information about the customer’s company. Code 3 shows an implementation of this tool.
Note
Langchain offers three main ways to define custom tools: the @tool decorator,
the StructuredTool.from_function() method, which takes a Python function as input,
or subclassing the built-in BaseTool class, which requires providing metadata and at least a _run method.
This tutorial uses the last option because we noticed that the LLM tends to use tools defined this way more consistently. But don’t hesitate to try all three methods yourself.
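Whichever option you choose, the LLM ultimately sees each tool as a name, a description, and a schema of its arguments. This is why clear descriptions matter. The following standard-library sketch derives a JSON-schema-like description from a function signature. It is a rough illustration only, not LangChain's implementation, which builds on Pydantic models such as the args_schema classes in the code below. The describe_tool helper and the stand-in get_customer_info function are hypothetical.

```python
import inspect


def describe_tool(func, name, description):
    """Build a JSON-schema-like tool description from a function signature (illustration only)."""
    type_names = {str: "string", int: "integer", float: "number", bool: "boolean"}
    properties, required = {}, []
    for param in inspect.signature(func).parameters.values():
        # Unknown or missing annotations default to "string" in this simplified sketch
        properties[param.name] = {"type": type_names.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(param.name)
    return {
        "name": name,
        "description": description,
        "parameters": {"type": "object", "properties": properties, "required": required},
    }


def get_customer_info(id: str) -> str:
    # Hypothetical stand-in for the real lookup defined in the tool below
    return f"Looked up customer {id}"


schema = describe_tool(
    get_customer_info,
    "GetCustomerInfo",
    "Provide a name, job title and company of a customer, given the customer's ID",
)
print(schema["parameters"])
```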
import dataiku
from dataiku import SQLExecutor2
from dataiku.sql import Constant, toSQL, Dialects
from typing import Type
from pydantic import BaseModel, Field
from langchain.tools import BaseTool
class CustomerInfo(BaseModel):
    """Parameter for GetCustomerInfo"""
    id: str = Field(description="customer ID")
class GetCustomerInfo(BaseTool):
    """Gathering customer information"""
    name: str = "GetCustomerInfo"
    description: str = "Provide a name, job title and company of a customer, given the customer's ID"
    args_schema: Type[BaseModel] = CustomerInfo
    def _run(self, id: str):
        dataset = dataiku.Dataset("pro_customers_sql")
        table_name = dataset.get_location_info().get('info', {}).get('quotedResolvedTableName')
        executor = SQLExecutor2(dataset=dataset)
        # Render the ID as an escaped SQL literal rather than pasting raw input into the query
        cid = Constant(str(id))
        escaped_cid = toSQL(cid, dialect=Dialects.POSTGRES)  # Replace by your DB
        query_reader = executor.query_to_iter(
            f"""SELECT * FROM {table_name} where "id"={escaped_cid}""")
        # Return on the first matching row (columns: id, name, job, company)
        for (user_id, name, job, company) in query_reader.iter_tuples():
            return f"The customer's name is \"{name}\", holding the position \"{job}\" at the company named \"{company}\""
        return f"No information can be found about the customer {id}"
    async def _arun(self, id: str):
        raise NotImplementedError("This tool does not support async")
Note
The SQL query, and the dialect passed to toSQL, may need to be adapted to your SQL engine.
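For example, SQLite uses "?" placeholders. With a parameterized query, the driver binds the ID instead of your code escaping it by hand. SQLite is used here only because it ships with Python; this is not a Dataiku connection. The following minimal sketch reproduces the tool's lookup logic against an in-memory copy of Table 1. The make_connection and get_customer_info helpers are hypothetical.

```python
import sqlite3

# The rows of Table 1, loaded into an in-memory SQLite database for a local test
ROWS = [
    ("tcook", "Tim Cook", "CEO", "Apple"),
    ("snadella", "Satya Nadella", "CEO", "Microsoft"),
    ("jbezos", "Jeff Bezos", "CEO", "Amazon"),
    ("fdouetteau", "Florian Douetteau", "CEO", "Dataiku"),
    ("wcoyote", "Wile E. Coyote", "Business Developer", "ACME"),
]


def make_connection():
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE pro_customers_sql ("id" TEXT, name TEXT, job TEXT, company TEXT)')
    conn.executemany("INSERT INTO pro_customers_sql VALUES (?, ?, ?, ?)", ROWS)
    return conn


def get_customer_info(conn, customer_id):
    # The "?" placeholder lets sqlite3 bind the value, so no manual escaping is needed
    row = conn.execute(
        'SELECT name, job, company FROM pro_customers_sql WHERE "id" = ?',
        (str(customer_id),),
    ).fetchone()
    if row is None:
        return f"No information can be found about the customer {customer_id}"
    name, job, company = row
    return f"The customer's name is \"{name}\", holding the position \"{job}\" at the company named \"{company}\""


conn = make_connection()
print(get_customer_info(conn, "fdouetteau"))
```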
from typing import Type
from pydantic import BaseModel, Field
from langchain.tools import BaseTool
from ddgs import DDGS
class CompanyInfo(BaseModel):
    """Parameter for GetCompanyInfo"""
    name: str = Field(description="Company's name")
class GetCompanyInfo(BaseTool):
    """Gathering company information"""
    name: str = "GetCompanyInfo"
    description: str = "Provide general information about a company, given the company's name."
    args_schema: Type[BaseModel] = CompanyInfo
    def _run(self, name: str):
        # First search with a "(company)" suffix to disambiguate the name
        results = DDGS().text(name + " (company)", max_results=1)
        result = "Information found about " + name + ": " + results[0]["body"] + "\n" \
            if len(results) > 0 and "body" in results[0] \
            else None
        # Fall back to a search on the bare name if nothing was found
        if not result:
            results = DDGS().text(name, max_results=1)
            result = "Information found about " + name + ": " + results[0]["body"] + "\n" \
                if len(results) > 0 and "body" in results[0] \
                else "No information can be found about the company " + name
        return result
    async def _arun(self, name: str):
        raise NotImplementedError("This tool does not support async")
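The two-pass lookup in _run searches first with a "(company)" suffix, then with the bare name. This logic can be separated from the search backend, which makes it testable without network access. The following sketch assumes a search callable that returns a list of dicts with a "body" key, matching how the tool above reads DDGS().text results. The first_body, company_info, and fake_search names are hypothetical.

```python
from typing import Callable, Dict, List, Optional

# A search callable takes a query and a maximum number of results,
# and returns a list of result dicts carrying a "body" key
SearchFn = Callable[[str, int], List[Dict[str, str]]]


def first_body(search: SearchFn, query: str) -> Optional[str]:
    """Return the "body" of the first search result, or None if there is none."""
    results = search(query, 1)
    if results and "body" in results[0]:
        return results[0]["body"]
    return None


def company_info(name: str, search: SearchFn) -> str:
    # Try the disambiguated query first, then fall back to the bare company name
    for query in (f"{name} (company)", name):
        body = first_body(search, query)
        if body is not None:
            return f"Information found about {name}: {body}\n"
    return f"No information can be found about the company {name}"


# Stand-in backend: only the bare query "ACME" returns a result
def fake_search(query: str, max_results: int) -> List[Dict[str, str]]:
    data = {"ACME": [{"title": "ACME", "body": "A fictional corporation."}]}
    return data.get(query, [])[:max_results]


print(company_info("ACME", fake_search))
```

To use this logic with a real backend, a callable such as `lambda query, n: DDGS().text(query, max_results=n)` matches the calls made in _run above.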
LLM agent creation
With the tools defined, the next step is to create an agent that can effectively use them. This tutorial uses the ReAct logic, which combines the LLM’s ability to reason (e.g., chain-of-thought prompting) and to act (e.g., interfacing with external software) through a purposely crafted prompt.
Alternatively, when the LLM supports tool calls, you can use the tool-calling logic, with LangGraph working alongside Langchain to drive the reasoning and tool-execution flow. More information can be found in the tool calls section of the developer guide.
Note
Langchain offers a hub for community members to share pre-built prompt templates and other resources. The prompt below has been taken from there, and it is also possible to fetch it directly with the following code:
# Only needed if you want to use the default prompt (in Langchain 1.x, hub is provided by langchain-classic)
from langchain_classic import hub
prompt = hub.pull("hwchase17/react")
# Initialize the agent
from langchain.agents import create_agent
# Link the tools
tools = [GetCustomerInfo(), GetCompanyInfo()]
tool_names = ", ".join(tool.name for tool in tools)
# create_agent sends system_prompt as-is (no template filling), so render the tools into it directly
tool_descriptions = "\n".join(f"{tool.name}: {tool.description}" for tool in tools)
prompt = f"""Answer the following questions as best you can. You have only access to the following tools:
{tool_descriptions}
Use the following format:
Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question
Begin!"""
agent = create_agent(model=llm, tools=tools, system_prompt=prompt)
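In a classic text-based ReAct loop, each model turn is plain text that has to be read back as either a tool call or a final answer. The following sketch is a simplified parser for the Thought/Action/Action Input/Final Answer format requested by the prompt above. It is an illustration, not LangChain's own output parser, and ToolStep, FinalAnswer, and parse_react are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Union


@dataclass
class ToolStep:
    tool: str
    tool_input: str


@dataclass
class FinalAnswer:
    text: str


# "Action: <tool>" followed on the next line by "Action Input: <input>"
ACTION_RE = re.compile(r"Action:[ \t]*(.+?)[ \t]*\nAction Input:[ \t]*(.+)")


def parse_react(output: str) -> Union[ToolStep, FinalAnswer]:
    """Interpret one completion written in the Thought/Action/Final Answer format."""
    if "Final Answer:" in output:
        return FinalAnswer(output.split("Final Answer:", 1)[1].strip())
    match = ACTION_RE.search(output)
    if match is None:
        raise ValueError(f"Could not parse LLM output: {output!r}")
    return ToolStep(tool=match.group(1), tool_input=match.group(2).strip())


step = parse_react(
    "Thought: I need the customer record\n"
    "Action: GetCustomerInfo\n"
    "Action Input: fdouetteau"
)
print(step)  # ToolStep(tool='GetCustomerInfo', tool_input='fdouetteau')
```

Checking for "Final Answer:" first follows the prompt's instruction to end with a final answer. A production parser would also need to handle malformed or mixed outputs.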
Note
Langchain offers a hub for community members to share pre-built prompt templates and other resources. The prompt below has been taken from there, and it is also possible to fetch it directly with the following code:
# Only needed if you want to use the default prompt (in Langchain 1.x, hub is provided by langchain-classic)
from langchain_classic import hub
prompt = hub.pull("hwchase17/openai-tools-agent")
from typing import Annotated
from typing_extensions import TypedDict
from langchain_core.messages import HumanMessage, SystemMessage
from langgraph.graph import StateGraph, END
from langgraph.graph.message import add_messages
from langgraph.prebuilt import ToolNode
tools = [GetCustomerInfo(), GetCompanyInfo()]
tool_names = ["GetCustomerInfo", "GetCompanyInfo"]
# bind_tools advertises the tool schemas to the LLM so it can emit tool_calls
llm_with_tools = llm.bind_tools(tools)
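# Render each tool as "name: description" instead of the Python repr of the tool objects
tool_descriptions = "\n".join(f"{tool.name}: {tool.description}" for tool in tools)
prompt = f"""Answer the following questions as best you can. You have only access to the following tools:
{tool_descriptions}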
Use the following format:
Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of {tool_names}
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question
Begin!"""
class State(TypedDict):
    messages: Annotated[list, add_messages]
def call_llm(state: State) -> dict:
    # Prepend the system prompt on every turn so it is always in context
    response = llm_with_tools.invoke([SystemMessage(content=prompt)] + state["messages"])
    return {"messages": [response]}
def should_continue(state: State) -> str:
    # Route to the tool executor when the LLM requested tool calls, else stop
    return "tools" if state["messages"][-1].tool_calls else END
# Two-node graph: agent ↔ tools, with a conditional exit after each LLM call
graph = StateGraph(State)
graph.add_node("agent", call_llm)
graph.add_node("tools", ToolNode(tools)) # executes tool_calls and appends ToolMessages
graph.set_entry_point("agent")
graph.add_conditional_edges("agent", should_continue, ["tools", END])
graph.add_edge("tools", "agent") # always return to the LLM after running tools
agent = graph.compile()
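Without LangGraph, the compiled graph reduces to a loop. The loop calls the LLM and stops if it requested no tool. Otherwise, it runs the requested tools, appends their results, and calls the LLM again. The following dependency-free sketch shows that control flow with a scripted stand-in for the LLM. Message, run_agent, and scripted_llm are hypothetical and only mirror the roles of call_llm, should_continue, and ToolNode.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Message:
    role: str  # "user", "assistant" or "tool"
    content: str = ""
    tool_calls: List[dict] = field(default_factory=list)
    name: str = ""


def run_agent(llm: Callable[[List[Message]], Message],
              tools: Dict[str, Callable[..., str]],
              messages: List[Message],
              max_turns: int = 10) -> List[Message]:
    """Plain-Python version of the agent <-> tools loop."""
    for _ in range(max_turns):
        reply = llm(messages)                 # like the "agent" node
        messages.append(reply)
        if not reply.tool_calls:              # like should_continue returning END
            return messages
        for call in reply.tool_calls:         # like the "tools" node
            result = tools[call["name"]](**call["args"])
            messages.append(Message("tool", result, name=call["name"]))
    raise RuntimeError("Agent did not finish within max_turns")


# Scripted stand-in for the LLM: first request a tool, then answer from its result
def scripted_llm(messages: List[Message]) -> Message:
    tool_results = [m for m in messages if m.role == "tool"]
    if not tool_results:
        return Message("assistant", tool_calls=[{"name": "GetCustomerInfo", "args": {"id": "fdouetteau"}}])
    return Message("assistant", f"Summary: {tool_results[-1].content}")


history = run_agent(
    scripted_llm,
    {"GetCustomerInfo": lambda id: f"Customer {id} works at Dataiku"},
    [Message("user", "Who is fdouetteau?")],
)
print(history[-1].content)
```

The max_turns guard plays a role similar to LangGraph's recursion limit: it stops an agent that keeps cycling between the LLM and the tools.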
LLM agent invocation
Finally, you can call the invoke method on your agent.
Depending on how much detail you want about the intermediate steps
and the “decisions” the agent takes, Langchain offers several methods as well as a debug mode.
They are shown below.
from langchain_core.globals import set_debug
set_debug(False) ## Set to True to get debug traces
customer_id = "fdouetteau"
content = f"""Give all the professional information you can about the customer with ID: {customer_id}. Also include information about the company if you can."""
## invoke runs the agent to completion and returns the final state, including all exchanged messages
agent.invoke(
{"messages": [{"role": "user", "content": content}]}
)
## You can also iterate on intermediate steps, to print them or run any tests
from langchain_core.messages import HumanMessage
content=f"""Give all the professional information you can about the customer with ID: {customer_id}.
Also include information about the company if you can."""
for step in agent.stream(
    {"messages": [HumanMessage(content=content)]},
    stream_mode="updates"
):
    print("*"*20, "\n")
    if 'model' in step.keys():
        for message in step['model']['messages']:
            message.pretty_print()
    elif 'tools' in step.keys():
        print("Calling tools:")
        for tool_message in step['tools']['messages']:
            print(f"# tool: {tool_message.name}")
            print(f" {tool_message.content}")
customer_id = "fdouetteau"
agent.invoke({
"messages": [
HumanMessage(content=f"Give all the professional information you can about the customer with ID: {customer_id}. Also include information about the company if you can.")
]
})
## You can also iterate on intermediate steps, to print them or run any tests
from langchain_core.messages import HumanMessage
content=f"""Give all the professional information you can about the customer with ID: {customer_id}.
Also include information about the company if you can."""
for step in agent.stream(
    {"messages": [HumanMessage(content=content)]},
    stream_mode="updates"
):
    print("*"*20, "\n")
    # In this graph, the LLM node is named "agent" (see graph.add_node)
    if 'agent' in step.keys():
        for message in step['agent']['messages']:
            message.pretty_print()
    elif 'tools' in step.keys():
        print("Calling tools:")
        for tool_message in step['tools']['messages']:
            print(f"# tool: {tool_message.name}")
            print(f" {tool_message.content}")
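In both variants, agent.invoke() returns a state dictionary whose "messages" list ends with the model's final reply. Each chunk streamed with stream_mode="updates" is a dictionary keyed by node name. The following sketch shows two hypothetical helpers that work with both variants, exercised here with stand-in objects. They only assume that messages expose content (plus name or tool_calls where relevant), as the LangChain message objects used above do.

```python
from types import SimpleNamespace


def final_answer(result):
    """Return the text of the last message in the state returned by agent.invoke()."""
    return result["messages"][-1].content


def describe_step(step):
    """Turn one stream_mode="updates" chunk into printable lines.

    The LLM node is named "model" with create_agent and "agent" in the
    hand-built graph; the tool node is named "tools" in both variants.
    """
    lines = []
    for node, update in step.items():
        for message in (update or {}).get("messages", []):
            if node == "tools":
                lines.append(f"# tool: {message.name}\n    {message.content}")
            elif getattr(message, "tool_calls", None):
                names = ", ".join(call["name"] for call in message.tool_calls)
                lines.append(f"[{node}] requests tools: {names}")
            else:
                lines.append(f"[{node}] {message.content}")
    return lines


# Stand-in objects; with a real agent, pass the agent.invoke(...) result and agent.stream(...) chunks
demo_result = {"messages": [SimpleNamespace(content="question"), SimpleNamespace(content="answer")]}
print(final_answer(demo_result))  # answer
```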
Wrapping up
This tutorial provided a walk-through for building an LLM-based agent capable of interacting with external tools to fetch and process information. Modularizing the approach (from initialization and tool definition to the creation and invocation of the agent) ensures clarity, reusability, and efficiency, making it suitable for tackling similar tasks.
