Using Dash and Azure OpenAI to build a GPT-powered web app assistant#
Prerequisites#
- Dataiku >= 12.0
- “Use” permission on a code environment using Python >= 3.9 with the following packages:
  - openai (tested with version 0.27.8)
  - dash (tested with version 2.10.2)
  - dash-bootstrap-components (tested with version 1.4.1)
- Access to an existing project with the following permissions:
  - “Read project content”
  - “Write project content”
- Access to an Azure subscription with the required permissions to use the Azure OpenAI service.
Introduction#
The OpenAI API offers powerful tools that enable data scientists to integrate cutting-edge natural language processing (NLP) capabilities into their applications. In particular, it exposes its latest large language models (LLM) from the GPT family to be easily queried. Combining these tools with the coder-oriented features of Dataiku further empowers the platform users to configure and run NLP tasks in a project.
In this tutorial, you will learn how to call a GPT model into a Dataiku web app for a simple question answering task.
Setting up the Azure OpenAI API client#
The code environment you will be using comes with the official Python client for the OpenAI API, with some Azure specificities. This package provides a convenient set of helpers on top of OpenAI’s REST APIs, avoiding the need for writing low-level HTTP calls.
Authentication#
From your Azure subscription, retrieve the ENDPOINT, API-KEY and DEPLOYMENT-NAME values following the instructions from the Azure documentation.
Next, create a Dataiku user secret in your profile called AZURE_OPENAI_API_KEY. You can set it up with the Dataiku public API by following the instructions here and reading more about it in the product documentation.
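If you prefer to create the secret programmatically, a minimal sketch using the Dataiku public API might look like the following. This is illustrative: `store_api_key` is a helper name introduced here, and it assumes the `add_secret`/`save` methods of the user settings object in the `dataikuapi` client.

```python
def store_api_key(secret_value, secret_name="AZURE_OPENAI_API_KEY"):
    """Sketch: store an API key as a Dataiku user secret via the public API.

    Assumes it runs inside Dataiku (notebook or recipe) where the
    `dataiku` package is available; `secret_value` is your API-KEY.
    """
    import dataiku  # imported lazily: only available on a Dataiku instance

    client = dataiku.api_client()
    user = client.get_own_user()
    settings = user.get_settings()
    # add_secret registers (or overwrites) a named secret for the current user
    settings.add_secret(secret_name, secret_value)
    settings.save()
```

Running this once with your API-KEY value makes the secret retrievable later through `get_auth_info(with_secrets=True)`, as done in `auth.py` below.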
Configuration#
In this part, you will populate your project library with essential building blocks, making your code modular and easily reusable.
Go to your project library and, under python/, create a new directory called azure_gpt_utils.
Inside that directory, create two files:
- __init__.py, which should be left empty
- auth.py, which will fetch the Azure OpenAI API secret key from the user secrets (Code 1: auth.py):
import dataiku


def get_api_key(secret_name: str = "AZURE_OPENAI_API_KEY") -> str:
    """
    Get an API key from Dataiku user secrets.

    Args:
        secret_name: The secret key you want to retrieve.

    Returns:
        The value of the secret key (if found).
    """
    client = dataiku.api_client()
    auth_info = client.get_auth_info(with_secrets=True)
    secret_value = None
    for secret in auth_info["secrets"]:
        if secret["key"] == secret_name:
            secret_value = secret["value"]
            break
    if not secret_value:
        raise Exception(f"User secret key {secret_name} not found.")
    return secret_value
Calling the GPT model through the OpenAI API#
You are now equipped to leverage the OpenAI models. In practice, you will use the *Chat Completions API*, which allows you to design advanced prompts with the latest available models. In short, your input will be made of a list of messages, each with a specific role:
- system messages define how the model should behave
- user messages contain the requests or comments provided by the user, to which the model should respond
From the azure_gpt_utils
directory, create a new file called chat.py
with Code 2: Chat GPT API call.
import openai

from typing import Dict, List

from .auth import get_api_key

DEFAULT_TEMPERATURE = 0
DEFAULT_MAX_TOKENS = 500
DEFAULT_MODEL = "gpt-3.5-turbo"

openai.api_key = get_api_key("AZURE_OPENAI_API_KEY")
openai.api_base = "https://xxx.openai.azure.com"  # Replace with your ENDPOINT value
openai.api_type = "azure"
openai.api_version = "2023-05-15"
deployment_name = "xxx"  # Replace with your DEPLOYMENT-NAME value


def send_prompt(messages: List[Dict[str, str]], model: str = DEFAULT_MODEL):
    """
    Send a conversation to the chat model.

    Args:
        messages: The messages with the context.
        model: Model used.

    Returns:
        The response of the bot.
    """
    response = openai.ChatCompletion.create(
        engine=deployment_name,
        messages=messages,
        temperature=DEFAULT_TEMPERATURE,
        max_tokens=DEFAULT_MAX_TOKENS
    )
    return response['choices'][0]['message']['content']
To check if everything is working as expected, you can run Code 3 in a notebook.
from azure_gpt_utils.chat import send_prompt

base_system_message = "You are a helpful data-scientist assistant."
system_message = f"{base_system_message.strip()}"
question = "When was Dataiku created?"
messages = [
    {"role": "system", "content": system_message},
    {"role": "user", "content": question}
]
answer = send_prompt(messages)
print(answer)

messages.append({"role": "assistant", "content": answer})
messages.append({"role": "user", "content": "What are its activities?"})
while len(messages) > 10:
    messages.pop(0)
response = send_prompt(messages)
print(response)
Building the web app#
CSS file#
Before sketching the web app, you must create a CSS file the web app can access.
Go to the Application menu > Global Shared Code > Static web resources. Inside the local-static directory, create a file named loading-state.css with the content shown in Code 4. As shown in the next section, this code will help display an indicator while the model processes a query.
*[data-dash-is-loading="true"]{
visibility: hidden;
}
*[data-dash-is-loading="true"]::before{
content: "Thinking...";
display: inline-block;
visibility: visible;
}
Sketching the Webapp#
First, you need to create an empty Dash webapp. If you don’t know how to create one, please refer to this mini-tutorial.
Then, import all the required libraries and configure the webapp to use the CSS file that you created before. Code 5 shows how to do it.
import dash
from dash import html
from dash import dcc
import dash_bootstrap_components as dbc
from dash.dependencies import Input
from dash.dependencies import Output
from dash.dependencies import State
from dash.exceptions import PreventUpdate
import dataiku
from azure_gpt_utils.chat import send_prompt
from dataset_io.save import write_question_answer_sql
from dataset_io.save import write_question_answer_csv
Now you can design the application, which needs a user input and a ChatGPT output. As the webapp focuses on a question-answering bot, you should keep some context (the previously asked questions and answers). You also have to limit the context size to avoid long queries and reduce costs. There are various ways to do that; the most understandable is capping the number of messages kept. OpenAI usually counts tokens instead, but a token count is not a human-readable metric. Finally, you may want a button to reset the conversation.
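The message-cap idea can be sketched in isolation. The `trim_context` helper below is illustrative (it is not part of the tutorial's code): it drops the oldest user/assistant messages while preserving the system message at index 0, mirroring the `messages.pop(1)` loop used in the callback later on.

```python
def trim_context(messages, max_messages):
    """Drop the oldest non-system messages until the context fits the cap."""
    messages = list(messages)  # work on a copy
    while len(messages) > int(max_messages):
        # index 0 holds the system message, so remove the oldest entry after it
        messages.pop(1)
    return messages


history = [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "Q1"},
    {"role": "assistant", "content": "A1"},
    {"role": "user", "content": "Q2"},
    {"role": "assistant", "content": "A2"},
]
trimmed = trim_context(history, 3)
# The system message survives; only the most recent exchange is kept
```

Popping at index 1 rather than 0 is the whole trick: the behavior-defining system message is never evicted, no matter how small the cap.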
To sum up, we need the following:
a number input (for the size of the kept messages)
a button to reset the conversation
a text input (for the user input)
a text to display the response.
Code 6 implements this application. The highlighted line is where the application will display a processing spinner when the user sends a request. This line works in conjunction with the CSS defined earlier.
# use the style of examples on the Plotly documentation
app.config.external_stylesheets = [dbc.themes.BOOTSTRAP, "/local/static/loading-state.css"]

search_text_layout = html.Div([
    dcc.Store(id='messages', data=[
        {"role": "system", "content": "You are a helpful assistant"}]),
    dbc.Row([
        dbc.Label("Max messages", html_for="max_messages", width=2),
        dbc.Col(dbc.Input(id="max_messages", value="5", type="number", min=1, max=10), width=2),
        dbc.Col(width=6),
        dbc.Col(dbc.Button("Reset conversation", id="flush_messages", n_clicks=0, class_name="btn-danger"), width=2,
                class_name="d-grid col-2 gap-2")
    ], class_name="mb-3", ),
    dbc.Row([
        dbc.Label("Ask your question", html_for="search_input", width=2),
        dbc.Col(html.Div(children=[
            dbc.Input(id="search_input", placeholder="What can I do for you?"),
            dcc.Loading(id="ls-loading-1", children=[html.Div(id="ls-loading-output-1")], type="default")]), width=10),
    ], className="mb-3", ),
])

# build your Dash app
app.layout = html.Div([
    search_text_layout,
    dbc.Row([
        dbc.Col(width=2),
        dbc.Col(dbc.Textarea(id="text_output", style={"height": "200px"}), width=10)], class_name="mb-3"),
    dbc.Row(
        [dbc.Col(dbc.Button("Save this answer", id="save_answer", n_clicks=0, class_name="btn-primary", size="lg"))],
        justify="end", className="d-grid gap-2 col-12 mx-auto", )
], className="container-fluid mt-3")
Now that you have designed the application, you only have to connect the components and call the associated functions. The first highlighted line in Code 7 is the callback associated with the Q&A processing. The second one is for resetting the current conversation.
@app.callback(
    [Output("ls-loading-output-1", "children"),
     Output("text_output", "value"),
     Output("messages", "data")],
    Input("search_input", "n_submit"),
    State("search_input", "value"),
    State("max_messages", "value"),
    State("messages", "data"),
    running=[
        (Output("search_button", "disabled"), True, False),
    ],
    prevent_initial_call=True
)
def get_answer(_, question, max_messages, messages):
    """
    Ask a question to the chat model (with some context), and give back the response.

    Args:
        _: number of times enter was pressed in the input text (not used)
        question: the question (with the context)
        max_messages: number of context messages to keep
        messages: the context

    Returns:
        the response, and an updated version of the context
    """
    if not question or not max_messages or not messages:
        raise PreventUpdate
    # Drop the oldest user/assistant messages, keeping the system message at index 0
    while len(messages) > int(max_messages):
        messages.pop(1)
    messages.append({"role": "user", "content": question})
    try:
        answer = send_prompt(messages)
        messages.append({"role": "assistant", "content": answer})
        return ["", answer, messages]
    except Exception:
        return ["", "Something went wrong", messages]


@app.callback(
    Output("messages", "data", allow_duplicate=True),
    Input("flush_messages", "n_clicks"),
    prevent_initial_call=True
)
def reset_conversation(_clicks):
    """
    Reset the conversation.

    Args:
        _clicks: number of clicks on the flush button (unused)

    Returns:
        a new context for the conversation
    """
    return [{"role": "system", "content": "You are a helpful assistant"}]
Saving the response into a dataset#
Requesting GPT assistance can be costly, so you may want to save each answered question together with its response. Before requesting an answer from the GPT assistant, check whether the question has been asked before: if so, reply with the stored response; otherwise, query the GPT assistant. Depending on the dataset type, appending data to an existing dataset can be done in various ways. Let’s consider two kinds of datasets:
- An SQL-like dataset: you can use dataiku.SQLExecutor2 to insert data into the dataset.
- A CSV-like dataset: you can load the data as a dataframe, add the new rows, and save the dataframe back. This method requires loading the whole dataset into memory and may be inappropriate for big datasets.
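The look-before-you-ask idea can be sketched with a plain dictionary cache. The helper and names below (`answer_with_cache`, `fake_model`) are illustrative, not part of the tutorial's code; in the webapp, the model call would be `send_prompt` and the cache would be backed by the history dataset.

```python
def answer_with_cache(question, cache, ask_model):
    """Return a cached answer when available; otherwise query the model and store the result."""
    if question in cache:
        return cache[question], True  # cache hit: no API call, no cost
    answer = ask_model(question)
    cache[question] = answer
    return answer, False


calls = []


def fake_model(q):
    """Stand-in for the real GPT call; records how often it is invoked."""
    calls.append(q)
    return f"answer to {q}"


cache = {}
a1, hit1 = answer_with_cache("When was Dataiku created?", cache, fake_model)
a2, hit2 = answer_with_cache("When was Dataiku created?", cache, fake_model)
# The second request is served from the cache: the model is called only once
```

The same pattern applies whichever storage backend you choose below: query the dataset for the question first, and only fall through to the API on a miss.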
Using the SQLExecutor#
Considering you already have an SQL dataset, you can use the SQLExecutor2 to insert a row into the dataset, letting the database engine optimize the write for you. To insert a new row, you will use an INSERT statement, like sql = f"""INSERT INTO "{table_name}" (question, answer) VALUES {values_string}""". With this kind of statement, you may run into trouble if the answer contains specific characters (like ', ", …). As you have no control over the response (and fine-tuning the prompt to prevent the assistant from generating such characters would be tough), you must encode the response into a safe format before inserting it into the dataset.
Code 10 shows how to encode the question and the response before inserting the data with the SQL statement and dataiku.SQLExecutor2.
def write_question_answer_sql(client, dataset_name, connection, connection_type, project_key, question, answer):
    dataset = dataiku.Dataset(dataset_name)
    table_name = dataset.get_location_info().get('info', {}).get('table')
    value_string = (f"('{base64.b64encode(question.encode('utf-8')).decode('utf-8')}', "
                    f"'{base64.b64encode(answer.encode('utf-8')).decode('utf-8')}')")
    sql = f"""INSERT INTO "{table_name}" (question, answer) VALUES {value_string}"""
    client.sql_query(sql, connection=connection, type=connection_type, project_key=dataset.project_key,
                     post_queries=['COMMIT'])
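Because the stored values are base64-encoded, they must be decoded when the rows are read back. A minimal round-trip, independent of any database, shows why the encoding makes the INSERT safe:

```python
import base64

answer = 'It said: "don\'t worry" -- quotes and all'
# Encode before building the INSERT statement, as the tutorial's function does
encoded = base64.b64encode(answer.encode("utf-8")).decode("utf-8")
# The encoded form uses only letters, digits, +, / and =, so it is safe
# between single quotes in a SQL literal
safe_for_sql = "'" not in encoded and '"' not in encoded
# Decode when reading the row back from the dataset
decoded = base64.b64decode(encoded.encode("utf-8")).decode("utf-8")
```

Any code that later searches the history dataset for a previously asked question must apply the same encoding to the lookup key, or decode the stored column first.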
Using a dataframe#
If your dataset fits into memory, you can rely on the dataframe to append the data to an existing CSV-like dataset. The principle is straightforward:
Read the dataset as a dataframe.
Create the data.
Append them to the dataframe.
Save back the dataframe.
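The steps above can be sketched standalone. Note that on recent pandas versions (2.0 and later) DataFrame.append has been removed, so pd.concat is the way to append a row; the column names follow the tutorial, and the dataframes here stand in for what dataset.get_dataframe() would return:

```python
import pandas as pd

# Existing history, as loaded by dataset.get_dataframe()
df = pd.DataFrame({"question": ["Q1"], "answer": ["A1"]})

# New question/answer pair to append
row = {"question": ["Q2"], "answer": ["A2"]}
df = pd.concat([df, pd.DataFrame(row)], ignore_index=True)
# df now holds both rows and can be written back with dataset.get_writer()
```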
Code 10 shows how to do this.
import dataiku
import pandas as pd


def write_question_answer_csv(dataset_name, question, answer):
    dataset = dataiku.Dataset(dataset_name)
    row = {
        "question": [question],
        "answer": [answer]
    }
    df = dataset.get_dataframe()
    # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
    df = pd.concat([df, pd.DataFrame(row)], ignore_index=True)
    with dataset.get_writer() as writer:
        writer.write_dataframe(df)
Wrapping up#
Congratulations! You have completed this tutorial and built a Dash web application that enables ChatGPT integration inside a Dataiku webapp. Understanding all these basic concepts allows you to create more complex applications.
You can add a field to this web application to tweak the way Chat GPT answers or try to reduce the number of tokens used in a query. If you save the data into a dataset, you can look for the question before requesting the ChatGPT assistant.
We put the code for saving the data into a separate library file (dataset_io/save.py) and used the CSV saving version in the webapp.

Complete code#

auth.py#
import dataiku


def get_api_key(secret_name: str = "AZURE_OPENAI_API_KEY") -> str:
    """
    Get an API key from Dataiku user secrets.

    Args:
        secret_name: The secret key you want to retrieve.

    Returns:
        The value of the secret key (if found).
    """
    client = dataiku.api_client()
    auth_info = client.get_auth_info(with_secrets=True)
    secret_value = None
    for secret in auth_info["secrets"]:
        if secret["key"] == secret_name:
            secret_value = secret["value"]
            break
    if not secret_value:
        raise Exception(f"User secret key {secret_name} not found.")
    return secret_value
chat.py#
import openai

from typing import Dict, List

from .auth import get_api_key

DEFAULT_TEMPERATURE = 0
DEFAULT_MAX_TOKENS = 500
DEFAULT_MODEL = "gpt-3.5-turbo"

openai.api_key = get_api_key("AZURE_OPENAI_API_KEY")
openai.api_base = "https://xxx.openai.azure.com"  # Replace with your ENDPOINT value
openai.api_type = "azure"
openai.api_version = "2023-05-15"
deployment_name = "xxx"  # Replace with your DEPLOYMENT-NAME value


def send_prompt(messages: List[Dict[str, str]], model: str = DEFAULT_MODEL):
    """
    Send a conversation to the chat model.

    Args:
        messages: The messages with the context.
        model: Model used.

    Returns:
        The response of the bot.
    """
    response = openai.ChatCompletion.create(
        engine=deployment_name,
        messages=messages,
        temperature=DEFAULT_TEMPERATURE,
        max_tokens=DEFAULT_MAX_TOKENS
    )
    return response['choices'][0]['message']['content']
loading-state.css#
*[data-dash-is-loading="true"]{
visibility: hidden;
}
*[data-dash-is-loading="true"]::before{
content: "Thinking...";
display: inline-block;
visibility: visible;
}
save.py#
import dataiku
import base64
import pandas as pd


def write_question_answer_sql(client, dataset_name, connection, connection_type, project_key, question, answer):
    """
    Save data into an SQL-like dataset.

    Args:
        client: the dataiku client
        dataset_name: name of the SQL dataset
        connection: name of the SQL connection used for saving
        connection_type: type of connection
        project_key: project key
        question: the question
        answer: the answer
    """
    dataset = dataiku.Dataset(dataset_name)
    table_name = dataset.get_location_info().get('info', {}).get('table')
    value_string = (f"('{base64.b64encode(question.encode('utf-8')).decode('utf-8')}', "
                    f"'{base64.b64encode(answer.encode('utf-8')).decode('utf-8')}')")
    sql = f"""INSERT INTO "{table_name}" (question, answer) VALUES {value_string}"""
    client.sql_query(sql, connection=connection, type=connection_type, project_key=dataset.project_key,
                     post_queries=['COMMIT'])


def write_question_answer_csv(dataset_name, question, answer):
    """
    Save data into a CSV-like dataset.

    Args:
        dataset_name: the CSV dataset
        question: the question
        answer: the answer
    """
    dataset = dataiku.Dataset(dataset_name)
    row = {
        "question": [question],
        "answer": [answer]
    }
    df = dataset.get_dataframe()
    # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
    df = pd.concat([df, pd.DataFrame(row)], ignore_index=True)
    with dataset.get_writer() as writer:
        writer.write_dataframe(df)
webapp.py#
import dash
from dash import html
from dash import dcc
import dash_bootstrap_components as dbc
from dash.dependencies import Input
from dash.dependencies import Output
from dash.dependencies import State
from dash.exceptions import PreventUpdate

import dataiku

from azure_gpt_utils.chat import send_prompt
from dataset_io.save import write_question_answer_sql
from dataset_io.save import write_question_answer_csv

# use the style of examples on the Plotly documentation
app.config.external_stylesheets = [dbc.themes.BOOTSTRAP, "/local/static/loading-state.css"]

search_text_layout = html.Div([
    dcc.Store(id='messages', data=[
        {"role": "system", "content": "You are a helpful assistant"}]),
    dbc.Row([
        dbc.Label("Max messages", html_for="max_messages", width=2),
        dbc.Col(dbc.Input(id="max_messages", value="5", type="number", min=1, max=10), width=2),
        dbc.Col(width=6),
        dbc.Col(dbc.Button("Reset conversation", id="flush_messages", n_clicks=0, class_name="btn-danger"), width=2,
                class_name="d-grid col-2 gap-2")
    ], class_name="mb-3", ),
    dbc.Row([
        dbc.Label("Ask your question", html_for="search_input", width=2),
        dbc.Col(html.Div(children=[
            dbc.Input(id="search_input", placeholder="What can I do for you?"),
            dcc.Loading(id="ls-loading-1", children=[html.Div(id="ls-loading-output-1")], type="default")]), width=10),
    ], className="mb-3", ),
])

# build your Dash app
app.layout = html.Div([
    search_text_layout,
    dbc.Row([
        dbc.Col(width=2),
        dbc.Col(dbc.Textarea(id="text_output", style={"height": "200px"}), width=10)], class_name="mb-3"),
    dbc.Row(
        [dbc.Col(dbc.Button("Save this answer", id="save_answer", n_clicks=0, class_name="btn-primary", size="lg"))],
        justify="end", className="d-grid gap-2 col-12 mx-auto", )
], className="container-fluid mt-3")
@app.callback(
    [Output("ls-loading-output-1", "children"),
     Output("text_output", "value"),
     Output("messages", "data")],
    Input("search_input", "n_submit"),
    State("search_input", "value"),
    State("max_messages", "value"),
    State("messages", "data"),
    running=[
        (Output("search_button", "disabled"), True, False),
    ],
    prevent_initial_call=True
)
def get_answer(_, question, max_messages, messages):
    """
    Ask a question to the chat model (with some context), and give back the response.

    Args:
        _: number of times enter was pressed in the input text (not used)
        question: the question (with the context)
        max_messages: number of context messages to keep
        messages: the context

    Returns:
        the response, and an updated version of the context
    """
    if not question or not max_messages or not messages:
        raise PreventUpdate
    # Drop the oldest user/assistant messages, keeping the system message at index 0
    while len(messages) > int(max_messages):
        messages.pop(1)
    messages.append({"role": "user", "content": question})
    try:
        answer = send_prompt(messages)
        messages.append({"role": "assistant", "content": answer})
        return ["", answer, messages]
    except Exception:
        return ["", "Something went wrong", messages]


@app.callback(
    Output("messages", "data", allow_duplicate=True),
    Input("flush_messages", "n_clicks"),
    prevent_initial_call=True
)
def reset_conversation(_clicks):
    """
    Reset the conversation.

    Args:
        _clicks: number of clicks on the flush button (unused)

    Returns:
        a new context for the conversation
    """
    return [{"role": "system", "content": "You are a helpful assistant"}]
@app.callback(
    Output("save_answer", "n_clicks"),
    Input("save_answer", "n_clicks"),
    State("search_input", "value"),
    State("text_output", "value"),
    prevent_initial_call=True
)
def save_answer(_clicks, question, answer):
    """
    Save the answer.

    Args:
        _clicks: number of clicks on the save button (unused)
        question: the question
        answer: the answer
    """
    ## Uncomment these lines if you need to save into an SQL dataset
    # client = dataiku.api_client()
    # dataset_name = "History_SQL"
    # connection = "postgres"
    # connection_type = "sql"
    # project_key = client.get_default_project()
    # write_question_answer_sql(client, dataset_name, connection, connection_type, project_key, question, answer)

    ## Saving into a CSV dataset
    dataset_name = "History"
    write_question_answer_csv(dataset_name, question, answer)