Creating an API endpoint for using an LLM-Based agent#
In this tutorial, you will learn how to create an API endpoint using a headless API webapp for an LLM-based agent. This tutorial builds on two others: Building and using an agent with Dataiku’s LLM Mesh and Langchain and Creating an API endpoint from webapps. You will use Dash as the application framework, but you can readily adapt this tutorial to your preferred framework. If you already have a working LLM-based agent, you can jump directly to the Defining the routes section.
Prerequisites#
Dataiku >= 13.1
You must download this dataset and create an SQL dataset named pro_customers_sql.
A code environment with the following packages:
dash
langchain==0.2.0
duckduckgo_search==6.1.0
Tools’ definition#
You will define the external tools your LLM agent will use, as you did in the previous tutorial. In this case, the tools are:
Dataset lookup tool: executes SQL queries on the pro_customers_sql dataset to retrieve customer information (name, role, company), given a customer ID. Code 1 shows an implementation of this tool.
Internet search tool: performs internet searches to fetch more detailed information about the customer’s company. Code 2 shows an implementation of this tool.
import dataiku
from langchain.tools import BaseTool
from dataiku import SQLExecutor2
from langchain.pydantic_v1 import BaseModel, Field
from typing import Type


class CustomerInfo(BaseModel):
    """Parameter for GetCustomerInfo"""
    id: str = Field(description="customer ID")


class GetCustomerInfo(BaseTool):
    """Gathering customer information"""

    name = "GetCustomerInfo"
    description = "Provide a name, job title and company of a customer, given the customer's ID"
    args_schema: Type[BaseModel] = CustomerInfo

    def _run(self, id: str):
        dataset = dataiku.Dataset("pro_customers_sql")
        table_name = dataset.get_location_info().get('info', {}).get('table')
        executor = SQLExecutor2(dataset=dataset)
        # Escape single quotes by doubling them (standard SQL string-literal quoting)
        eid = id.replace("'", "''")
        query_reader = executor.query_to_iter(
            f"""SELECT "name", "job", "company" FROM "{table_name}" WHERE "id" = '{eid}'""")
        # Return a sentence built from the first matching row, if any
        for (name, job, company) in query_reader.iter_tuples():
            return f"The customer's name is \"{name}\", holding the position \"{job}\" at the company named {company}"
        return f"No information can be found about the customer {id}"

    def _arun(self, id: str):
        raise NotImplementedError("This tool does not support async")
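The lookup tool interpolates the customer ID into a SQL string literal, so the quoting step deserves attention. As a minimal sketch, assuming a database that follows the standard SQL rule of doubling single quotes inside string literals, the query construction can be isolated and checked on its own. The helper names `quote_sql_literal` and `build_customer_query` are hypothetical and not part of the Dataiku API.

```python
def quote_sql_literal(value: str) -> str:
    """Return value as a single-quoted SQL string literal.

    Assumes standard SQL quoting, where a single quote inside a literal
    is written as two single quotes.
    """
    return "'" + value.replace("'", "''") + "'"


def build_customer_query(table_name: str, customer_id: str) -> str:
    """Build the lookup query for one customer ID (hypothetical helper)."""
    # Double any embedded double quote so the table name stays one identifier.
    safe_table = table_name.replace('"', '""')
    return (
        f'SELECT "name", "job", "company" FROM "{safe_table}" '
        f'WHERE "id" = {quote_sql_literal(customer_id)}'
    )
```

Keeping the quoting in one place makes it easier to test the tool against unusual IDs before the agent ever sees them.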
from duckduckgo_search import DDGS


class CompanyInfo(BaseModel):
    """Parameter for the GetCompanyInfo"""
    name: str = Field(description="Company's name")


class GetCompanyInfo(BaseTool):
    """Class for gathering in the company information"""

    name = "GetCompanyInfo"
    description = "Provide general information about a company, given the company's name."
    args_schema: Type[BaseModel] = CompanyInfo

    def _run(self, name: str):
        results = DDGS().answers(name + " (company)")
        result = "Information found about " + name + ": " + results[0]["text"] + "\n" \
            if len(results) > 0 and "text" in results[0] \
            else None
        if not result:
            results = DDGS().answers(name)
            result = "Information found about " + name + ": " + results[0]["text"] + "\n" \
                if len(results) > 0 and "text" in results[0] \
                else "No information can be found about the company " + name
        return result

    def _arun(self, name: str):
        raise NotImplementedError("This tool does not support async")
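The search tool tries a company-specific query first and falls back to the bare name. The sketch below isolates that fallback logic so it can be exercised without network access. It assumes the search callable returns a list of dictionaries that may carry a "text" key, as the code above does; `first_answer_text` and `lookup_company` are hypothetical names introduced for illustration.

```python
from typing import Callable, Dict, List, Optional


def first_answer_text(results: List[Dict[str, str]]) -> Optional[str]:
    """Return the "text" field of the first result, or None if it is absent."""
    if len(results) > 0 and "text" in results[0]:
        return results[0]["text"]
    return None


def lookup_company(name: str, search: Callable[[str], List[Dict[str, str]]]) -> str:
    """Try a company-specific query first, then fall back to the bare name."""
    for query in (name + " (company)", name):
        text = first_answer_text(search(query))
        if text is not None:
            return "Information found about " + name + ": " + text + "\n"
    return "No information can be found about the company " + name
```

Inside the tool, the `search` argument would be the search method used above, while a test can pass a stub function instead.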
LLM agent creation#
With the tools defined, the next step is to create an agent that can effectively utilize these tools. This tutorial uses the ReAct logic, which combines the LLM’s ability for reasoning (e.g., chain-of-thought prompting, etc.) and acting (e.g., interfacing with external software, etc.) through a purposely crafted prompt.
import dataiku
from dataiku.langchain.dku_llm import DKUChatLLM
from langchain_core.prompts import ChatPromptTemplate
from langchain.agents import AgentExecutor, create_react_agent
LLM_ID = "<A valid LLM ID>" # Replace with a valid LLM id
llm = DKUChatLLM(llm_id=LLM_ID, temperature=0)
# Initializes the agent
# Link the tools
tools = [GetCustomerInfo(), GetCompanyInfo()]
tool_names = [tool.name for tool in tools]
prompt = ChatPromptTemplate.from_template(
"""Answer the following questions as best you can. You have only access to the following tools:
{tools}
Use the following format:
Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question
Begin!
Question: {input}
Thought:{agent_scratchpad}""")
agent = create_react_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools,
                               verbose=True, return_intermediate_steps=True, handle_parsing_errors=True)
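To make the Thought/Action/Observation format of the prompt more concrete, here is a toy version of the loop that an agent executor runs. It is a simplified sketch for illustration only, not LangChain's actual parser or executor: `parse_react_step` and `run_react` are hypothetical names, the LLM is any callable that returns text, and the tools are plain functions keyed by name.

```python
import re

# Matches "Action: <tool>" followed by "Action Input: <input>" on the next line.
ACTION_RE = re.compile(r"Action\s*:\s*(.*?)\s*\n\s*Action Input\s*:\s*(.*)", re.DOTALL)


def parse_react_step(text: str) -> dict:
    """Classify one LLM completion as a final answer or a tool call."""
    if "Final Answer:" in text:
        return {"type": "finish", "output": text.split("Final Answer:", 1)[1].strip()}
    match = ACTION_RE.search(text)
    if match is None:
        raise ValueError("Could not parse LLM output: " + text)
    return {
        "type": "action",
        "tool": match.group(1).strip(),
        "tool_input": match.group(2).strip(),
    }


def run_react(llm, tools, question, max_steps=5):
    """Alternate LLM calls and tool calls until a final answer appears."""
    scratchpad = ""
    for _ in range(max_steps):
        completion = llm(question, scratchpad)
        step = parse_react_step(completion)
        if step["type"] == "finish":
            return step["output"]
        # Run the requested tool and feed its result back as an observation.
        observation = tools[step["tool"]](step["tool_input"])
        scratchpad += completion + "\nObservation: " + observation + "\nThought:"
    raise RuntimeError("No final answer after max_steps iterations")
```

The scratchpad plays the role of the `{agent_scratchpad}` placeholder in the prompt: it accumulates earlier steps so that each new LLM call sees the observations gathered so far.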
Defining the routes#
The first step is to define the routes you want your API to handle. Each route is responsible for a single, simple process. Dataiku provides an easy way to describe those routes, and because the webapp relies on a Flask server, you can return the resource types you need. To use this functionality, enable API access in the webapp’s settings, as shown in Figure 1.
This tutorial relies on a single route, parametrized by the customer’s ID, that queries the LLM and returns the appropriate answer to the user. Once you have set the code env in the settings panel, define the route as shown in Code 4.
@app.server.route("/get_customer_info/<customer_id>")
def get_customer_info(customer_id):
    """
    Ask the agent to retrieve information about the customer

    Args:
        customer_id: the customer ID

    Returns:
        Information about the customer
    """
    return agent_executor.invoke(
        {
            "input": f"""Give all the professional information you can about the customer with ID: {customer_id}. Also include information about the company if you can.""",
            "tools": tools,
            "tool_names": tool_names
        })["output"]
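The route above returns only the final answer as plain text. Because the executor is created with return_intermediate_steps=True, the result also carries the tool calls that led to the answer. The sketch below shows one possible way to expose both as JSON. It assumes each intermediate step is an (action, observation) pair whose action has `tool` and `tool_input` attributes, and `agent_result_to_json` is a hypothetical helper name.

```python
import json
from typing import Any, Dict


def agent_result_to_json(result: Dict[str, Any]) -> str:
    """Serialize an agent executor result into a JSON string.

    Assumes `result` holds an "output" string and, optionally, an
    "intermediate_steps" list of (action, observation) pairs.
    """
    steps = [
        {
            "tool": action.tool,
            "tool_input": action.tool_input,
            "observation": str(observation),
        }
        for action, observation in result.get("intermediate_steps", [])
    ]
    return json.dumps({"output": result["output"], "steps": steps})
```

A route could then return `agent_result_to_json(...)` applied to the full dictionary returned by the executor, instead of indexing it with "output".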
Testing the API#
Once you have set up everything, you can test your API in several ways. All of them require the WEBAPP_ID and the PROJECT_KEY. The WEBAPP_ID is the sequence of characters before the underscore in the webapp URL. For example, if the webapp URL in DSS is /projects/HEADLESS/webapps/kUDF1mQ_api/view, the WEBAPP_ID is kUDF1mQ, and the PROJECT_KEY is HEADLESS.
Additionally, depending on how you want to access your API, you may need an API key. Please read this documentation if you need help setting up an API key.
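The rule for reading the identifiers off the webapp URL can be written down as a small helper. This is only a sketch based on the URL shape given in the example above; `parse_webapp_url` is a hypothetical name, and the pattern assumes the WEBAPP_ID itself contains no underscore.

```python
import re
from typing import Tuple

# /projects/<PROJECT_KEY>/webapps/<WEBAPP_ID>_<name>/...
WEBAPP_URL_RE = re.compile(r"/projects/(?P<project>[^/]+)/webapps/(?P<webapp>[^_/]+)_")


def parse_webapp_url(path: str) -> Tuple[str, str]:
    """Return (PROJECT_KEY, WEBAPP_ID) extracted from a webapp URL path."""
    match = WEBAPP_URL_RE.search(path)
    if match is None:
        raise ValueError("Not a webapp URL: " + path)
    return match.group("project"), match.group("webapp")
```

For the example URL, `parse_webapp_url("/projects/HEADLESS/webapps/kUDF1mQ_api/view")` gives the pair HEADLESS and kUDF1mQ.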
Via browser#
In a browser, enter the URL:
http://<DSS_ADDRESS>:<DSS_PORT>/web-apps-backends/<PROJECT_KEY>/<WEBAPP_ID>/get_customer_info/<customer_ID>
The user must be logged in to Dataiku to access this resource.
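If you build this URL in code rather than typing it, it helps to percent-encode the customer ID so that unusual characters do not change the path. A minimal sketch follows; `build_endpoint_url` is a hypothetical helper, and the host and port in the usage note are placeholders rather than real values.

```python
from urllib.parse import quote


def build_endpoint_url(dss_address: str, dss_port: int, project_key: str,
                       webapp_id: str, customer_id: str) -> str:
    """Assemble the headless API URL for one customer ID."""
    base = f"http://{dss_address}:{dss_port}/web-apps-backends/{project_key}/{webapp_id}"
    # safe="" also encodes "/" so the ID always stays a single path segment.
    return f"{base}/get_customer_info/{quote(customer_id, safe='')}"
```

For example, `build_endpoint_url("dss.example.com", 11200, "HEADLESS", "kUDF1mQ", "fdouetteau")` produces a URL ending in /web-apps-backends/HEADLESS/kUDF1mQ/get_customer_info/fdouetteau.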
Via command line#
Using cURL requires an API key to access the headless API, or an equivalent way of authenticating, depending on the authentication method set on the Dataiku instance.
curl -X GET --header 'Authorization: Bearer <USE_YOUR_API_KEY>' \
'http://<DSS_ADDRESS>:<DSS_PORT>/web-apps-backends/<PROJECT_KEY>/<WEBAPP_ID>/get_customer_info/<customer_ID>'
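The same request can be prepared from Python with the standard library only. This sketch mirrors the cURL command above and assumes the instance accepts the API key as a Bearer token in the same way; `build_request` is a hypothetical helper name.

```python
import urllib.request


def build_request(url: str, api_key: str) -> urllib.request.Request:
    """Prepare a GET request carrying the API key as a Bearer token."""
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", "Bearer " + api_key)
    return request
```

To send it, pass the prepared request to `urllib.request.urlopen` and read the response body. Agent calls can be slow, so a generous timeout is advisable.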
Via Python#
You can access the headless API using the Python API. Depending on whether you are inside or outside Dataiku, you will use the dataiku or the dataikuapi package, respectively, as shown in Code 5.
import dataiku, dataikuapi

API_KEY = "bx73rdSrUHol2qfmmefetUBCPUaJd3BY"  # Example value; replace with your own API key
DSS_LOCATION = "http://dss.example.com/"
PROJECT_KEY = "HEADLESS"
WEBAPP_ID = "kUDF1mQ"

# Keep only one of the two client definitions below.
# If you are outside Dataiku, use this function call
client = dataikuapi.DSSClient(DSS_LOCATION, API_KEY)
# If you are inside Dataiku, use this function call instead
# client = dataiku.api_client()

project = client.get_project(PROJECT_KEY)
webapp = project.get_webapp(WEBAPP_ID)
backend = webapp.get_backend_client()

# Query the agent about one customer ID
print(backend.session.get(backend.base_url + '/get_customer_info/fdouetteau').text)
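When you call the endpoint from a script, you will usually want to fail loudly on an HTTP error instead of printing an error page. The sketch below wraps the call shown above. It assumes `backend.session` behaves like a `requests.Session`, so that `timeout` and `raise_for_status()` are available; `fetch_customer_info` is a hypothetical helper name.

```python
def fetch_customer_info(backend, customer_id: str, timeout: float = 120.0) -> str:
    """Call the get_customer_info route and return the response body.

    Raises an exception if the server answers with an HTTP error status.
    """
    response = backend.session.get(
        backend.base_url + "/get_customer_info/" + customer_id,
        timeout=timeout,
    )
    response.raise_for_status()
    return response.text
```

With the `backend` object created above, `print(fetch_customer_info(backend, "fdouetteau"))` would replace the final `print` call.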
Wrapping up#
Congratulations! You have completed this tutorial and have a working API serving an agent. You can try to tweak the agent or integrate this API into a more complex process.
Here is the complete code of the headless web application:
app.py
from dash import html
import dataiku
from dataiku.langchain.dku_llm import DKUChatLLM
from langchain.tools import BaseTool
from dataiku import SQLExecutor2
from langchain.pydantic_v1 import BaseModel, Field
from typing import Type
from duckduckgo_search import DDGS
from langchain_core.prompts import ChatPromptTemplate
from langchain.agents import AgentExecutor, create_react_agent
LLM_ID = "<A valid LLM ID>" # Replace with a valid LLM id
llm = DKUChatLLM(llm_id=LLM_ID, temperature=0)
class CustomerInfo(BaseModel):
    """Parameter for GetCustomerInfo"""
    id: str = Field(description="customer ID")


class GetCustomerInfo(BaseTool):
    """Gathering customer information"""

    name = "GetCustomerInfo"
    description = "Provide a name, job title and company of a customer, given the customer's ID"
    args_schema: Type[BaseModel] = CustomerInfo

    def _run(self, id: str):
        dataset = dataiku.Dataset("pro_customers_sql")
        table_name = dataset.get_location_info().get('info', {}).get('table')
        executor = SQLExecutor2(dataset=dataset)
        # Escape single quotes by doubling them (standard SQL string-literal quoting)
        eid = id.replace("'", "''")
        query_reader = executor.query_to_iter(
            f"""SELECT "name", "job", "company" FROM "{table_name}" WHERE "id" = '{eid}'""")
        # Return a sentence built from the first matching row, if any
        for (name, job, company) in query_reader.iter_tuples():
            return f"The customer's name is \"{name}\", holding the position \"{job}\" at the company named {company}"
        return f"No information can be found about the customer {id}"

    def _arun(self, id: str):
        raise NotImplementedError("This tool does not support async")
class CompanyInfo(BaseModel):
    """Parameter for the GetCompanyInfo"""
    name: str = Field(description="Company's name")


class GetCompanyInfo(BaseTool):
    """Class for gathering in the company information"""

    name = "GetCompanyInfo"
    description = "Provide general information about a company, given the company's name."
    args_schema: Type[BaseModel] = CompanyInfo

    def _run(self, name: str):
        results = DDGS().answers(name + " (company)")
        result = "Information found about " + name + ": " + results[0]["text"] + "\n" \
            if len(results) > 0 and "text" in results[0] \
            else None
        if not result:
            results = DDGS().answers(name)
            result = "Information found about " + name + ": " + results[0]["text"] + "\n" \
                if len(results) > 0 and "text" in results[0] \
                else "No information can be found about the company " + name
        return result

    def _arun(self, name: str):
        raise NotImplementedError("This tool does not support async")
# Initializes the agent
# Link the tools
tools = [GetCustomerInfo(), GetCompanyInfo()]
tool_names = [tool.name for tool in tools]
prompt = ChatPromptTemplate.from_template(
"""Answer the following questions as best you can. You have only access to the following tools:
{tools}
Use the following format:
Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question
Begin!
Question: {input}
Thought:{agent_scratchpad}""")
agent = create_react_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools,
                               verbose=True, return_intermediate_steps=True, handle_parsing_errors=True)
@app.server.route("/get_customer_info/<customer_id>")
def get_customer_info(customer_id):
    """
    Ask the agent to retrieve information about the customer

    Args:
        customer_id: the customer ID

    Returns:
        Information about the customer
    """
    return agent_executor.invoke(
        {
            "input": f"""Give all the professional information you can about the customer with ID: {customer_id}. Also include information about the company if you can.""",
            "tools": tools,
            "tool_names": tool_names
        })["output"]
# build your Dash app
app.layout = html.Div()