Using Dash and Azure OpenAI to build a GPT-powered web app assistant#

Prerequisites#

  • Dataiku >= 12.0

  • “Use” permission on a code environment using Python >= 3.9 with the following packages:
    • openai (tested with version 0.27.8)

    • dash (tested with version 2.10.2)

    • dash-bootstrap-components (tested with version 1.4.1)

  • Access to an existing project with the following permissions:
    • “Read project content”

    • “Write project content”

  • Access to an Azure subscription with the required permissions to use the Azure OpenAI service.

Introduction#

The OpenAI API offers powerful tools that enable data scientists to integrate cutting-edge natural language processing (NLP) capabilities into their applications. In particular, it exposes its latest large language models (LLM) from the GPT family to be easily queried. Combining these tools with the coder-oriented features of Dataiku further empowers the platform users to configure and run NLP tasks in a project.

In this tutorial, you will learn how to call a GPT model from a Dataiku web app for a simple question-answering task.

Setting up the Azure OpenAI API client#

The code environment you will be using comes with the official Python client for the OpenAI API, with some Azure specificities. This package provides a convenient set of helpers on top of OpenAI’s REST APIs, avoiding the need for writing low-level HTTP calls.

Authentication#

From your Azure subscription, retrieve the ENDPOINT, API-KEY and DEPLOYMENT-NAME values following the instructions from the Azure documentation.

Next, create a Dataiku user secret in your profile called AZURE_OPENAI_API_KEY. You can set it up with the Dataiku public API by following the instructions here and reading more about it in the product documentation.

Configuration#

In this part, you will populate your project library with essential building blocks, making your code modular and easily reusable.

Go to your project library, and under python/, create a new directory called azure_gpt_utils. Inside that directory, create two files:

  • __init__.py that should be left empty

  • auth.py that will fetch the Azure OpenAI API secret key from the user secrets: (Code 1: auth.py)

Code 1: auth.py#
import dataiku


def get_api_key(secret_name: str = "AZURE_OPENAI_API_KEY") -> str:
    """
    Get an API Key from Dataiku user secrets
    Args:
        secret_name: The secret key you want to retrieve.

    Returns:
        The value of the secret key (if found)
    """
    client = dataiku.api_client()
    auth_info = client.get_auth_info(with_secrets=True)
    secret_value = None
    for secret in auth_info["secrets"]:
        if secret["key"] == secret_name:
            secret_value = secret["value"]
            break
    if not secret_value:
        raise Exception(f"User secret key {secret_name} not found.")
    else:
        return secret_value

Calling the GPT model through the OpenAI API#

You are now equipped to leverage the OpenAI models. In practice, you will use the *Chat Completions API*, which will allow you to design advanced prompts with the latest available models. In short, your input will be made of a list of messages, each with a specific type:

  • system messages define how the model should behave

  • user messages contain the requests or comments provided by the user to which the model should respond
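As an illustration, a conversation is simply a list of role-tagged dictionaries. The helper below (build_messages is a hypothetical name, not part of the tutorial's library) sketches how such a list is assembled:

```python
def build_messages(system, *turns):
    """Build a Chat Completions message list from a system prompt
    and alternating user/assistant turns."""
    messages = [{"role": "system", "content": system}]
    roles = ["user", "assistant"]
    for i, content in enumerate(turns):
        # Turns alternate: user, assistant, user, ...
        messages.append({"role": roles[i % 2], "content": content})
    return messages


msgs = build_messages("You are a helpful assistant.",
                      "When was Dataiku created?")
# msgs[0] is the system message, msgs[1] the user question
```

The same structure is what the Chat Completions API expects as its messages argument.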

From the azure_gpt_utils directory, create a new file called chat.py with Code 2: Chat GPT API call.

Code 2: Chat GPT API call#
import openai
from typing import Dict, List

from .auth import get_api_key

DEFAULT_TEMPERATURE = 0
DEFAULT_MAX_TOKENS = 500

openai.api_key = get_api_key("AZURE_OPENAI_API_KEY")
openai.api_base = "https://xxx.openai.azure.com"  # Replace with your ENDPOINT value
openai.api_type = "azure"
openai.api_version = "2023-05-15"

deployment_name = "xxx"  # Replace with your DEPLOYMENT-NAME value


def send_prompt(messages: List[Dict[str, str]]) -> str:
    """
    Send a conversation to the chat model and return its answer.
    Args:
        messages: The list of messages (conversation context).

    Returns:
        The response of the model.
    """
    response = openai.ChatCompletion.create(
        engine=deployment_name,
        messages=messages,
        temperature=DEFAULT_TEMPERATURE,
        max_tokens=DEFAULT_MAX_TOKENS
    )
    return response['choices'][0]['message']['content']

To check if everything is working as expected, you can run Code 3 in a notebook.

Code 3: Code for testing if all requirements are met.#
from azure_gpt_utils.chat import send_prompt

base_system_message = "You are a helpful data-scientist assistant."
system_message = base_system_message.strip()

question = "When was Dataiku created?"

messages=[
    {"role": "system", "content": system_message},
    {"role": "user", "content": question}
]

answer = send_prompt(messages)
print(answer)

messages.append({"role": "assistant", "content": answer})
messages.append({"role": "user", "content": "What are its activities?"})

while len(messages) > 10:
    messages.pop(0)

response = send_prompt(messages)
print(response)

Building the web app#

CSS file#

Before sketching the web app, you must create an accessible CSS file. Go to the Application menu > Global Shared Code > Static web resources. Inside the local-static directory, create a file named loading-state.css with the content shown in Code 4. As shown in the next section, this code will help display an indicator while the model is being queried.

Code 4: CSS code for indication of a loading state#
*[data-dash-is-loading="true"]{
    visibility: hidden;
}
*[data-dash-is-loading="true"]::before{
    content: "Thinking...";
    display: inline-block;
    visibility: visible;
}

Sketching the Webapp#

First, you need to create an empty Dash webapp. If you don’t know how to create one, please refer to this mini-tutorial.

Then, import all the required libraries and configure the webapp to use the CSS file that you created before. Code 5 shows how to do it.

Code 5: Import packages#
import dash
from dash import html
from dash import dcc
import dash_bootstrap_components as dbc
from dash.dependencies import Input
from dash.dependencies import Output
from dash.dependencies import State
from dash.exceptions import PreventUpdate
import dataiku

from azure_gpt_utils.chat import send_prompt
from dataset_io.save import write_question_answer_sql
from dataset_io.save import write_question_answer_csv

Now you can design the application. It needs a user input and an output area for ChatGPT’s answer. As the webapp focuses on a question-answering bot, you should keep some context (the previously asked questions and answers). You also have to limit the context size to avoid long queries and reduce costs. There are various ways to do that; the most understandable is capping the number of messages kept. OpenAI usually relies on a token count for this, but tokens are not a human-readable metric. Finally, you may need a button to reset the conversation.

To sum up, we need the following:

  • a number input (for the size of the kept messages)

  • a button to reset the conversation

  • a text input (for the user input)

  • a text to display the response.
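The message cap described above can be sketched as a small pure-Python helper (trim_context is a hypothetical name; the webapp inlines the same logic in its callback later on): drop the oldest turns first while always preserving the system message at index 0.

```python
def trim_context(messages, max_messages):
    """Keep at most max_messages entries, preserving the system
    message at index 0 and dropping the oldest turns first."""
    messages = list(messages)  # work on a copy
    while len(messages) > max_messages:
        messages.pop(1)  # index 0 is the system message, never dropped
    return messages


history = [{"role": "system", "content": "You are a helpful assistant"}] + [
    {"role": "user", "content": f"question {i}"} for i in range(8)
]
trimmed = trim_context(history, 5)
# 5 messages remain: the system message plus the 4 most recent questions
```

Counting whole messages rather than tokens keeps the setting understandable for end users, at the price of a looser bound on the actual query size.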

Code 6 implements this application. The dcc.Loading component is where the application displays a processing spinner while the user’s request is handled; it works in conjunction with the CSS defined earlier.

Code 6: Design of the application.#
# use the style of examples on the Plotly documentation
app.config.external_stylesheets = [dbc.themes.BOOTSTRAP, "/local/static/loading-state.css"]

search_text_layout = html.Div([
    dcc.Store(id='messages', data=[
        {"role": "system", "content": "You are a helpful assistant"}]),
    dbc.Row([
        dbc.Label("Max messages", html_for="max_messages", width=2),
        dbc.Col(dbc.Input(id="max_messages", value="5", type="number", min=1, max=10), width=2),
        dbc.Col(width=6),
        dbc.Col(dbc.Button("Reset conversation", id="flush_messages", n_clicks=0, class_name="btn-danger"), width=2,
                class_name="d-grid col-2 gap-2")
    ], class_name="mb-3", ),
    dbc.Row([
        dbc.Label("Ask your question", html_for="search_input", width=2),
        dbc.Col(html.Div(children=[
            dbc.Input(id="search_input", placeholder="What can I do for you?"),
            dcc.Loading(id="ls-loading-1", children=[html.Div(id="ls-loading-output-1")], type="default")]), width=10),
    ], className="mb-3", ),
])

# build your Dash app
app.layout = html.Div([
    search_text_layout,
    dbc.Row([
        dbc.Col(width=2),
        dbc.Col(dbc.Textarea(id="text_output", style={"height": "200px"}), width=10)], class_name="mb-3"),
], className="container-fluid mt-3")

Now that you have designed the application, you only have to connect the components and call the associated functions. The first callback in Code 7 handles the Q&A processing; the second one resets the current conversation.

Code 7: Callbacks of the webapp#
@app.callback(
    [Output("ls-loading-output-1", "children"),
     Output("text_output", "value"),
     Output("messages", "data")],
    Input("search_input", "n_submit"),
    State("search_input", "value"),
    State("max_messages", "value"),
    State("messages", "data"),
    prevent_initial_call=True
)
def get_answer(_, question, max_messages, messages):
    """
    Ask a question to Chat GPT (with some context), and give back the response
    Args:
        _: number of enter pressed in the input text (not used)
        question: the question (with the context)
        max_messages: number of context messages to keep
        messages: the context

    Returns:
        the response, and an updated version of the context
    """
    if not question or not max_messages or not messages:
        raise PreventUpdate

    while len(messages) > int(max_messages):
        messages.pop(1)

    messages.append({"role": "user", "content": question})
    try:
        answer = send_prompt(messages)

        messages.append({"role": "assistant", "content": answer})
        return ["", answer, messages]
    except Exception:
        return ["", "Something went wrong", messages]

@app.callback(
    Output("messages", "data", allow_duplicate=True),
    Input("flush_messages", "n_clicks"),
    prevent_initial_call=True
)
def reset_conversation(_clicks):
    """
    Reset the conversation
    Args:
        _clicks: number of clicks on the flush button (unused)

    Returns:
        a new context for the conversation
    """
    return [{"role": "system", "content": "You are a helpful assistant"}]


Saving the response into a dataset#

Requesting GPT assistance might be costly, so you could save already answered questions and their responses. Before requesting an answer from the GPT assistant, check whether the question has been asked before: if it has, reply with the stored response; otherwise, query the assistant. Depending on the dataset type, adding data to an existing dataset can be done in various ways. Let’s consider two kinds of datasets:

  • An SQL-like dataset: you can use the dataiku.SQLExecutor2 to insert data into the dataset.

  • A CSV-like dataset: you can load the data as a dataframe, append the new rows, and save the dataframe back. This method requires loading the whole dataset into memory and may be inappropriate for big datasets.
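The look-before-you-ask idea can be sketched with an in-memory dict standing in for the history dataset (answer_with_cache and fake_gpt are hypothetical names; in the webapp the lookup would target the saved dataset and send_prompt):

```python
cache = {}  # question -> answer; stands in for the history dataset


def answer_with_cache(question, send_to_gpt):
    """Return a cached answer when the question was already asked,
    otherwise query the assistant and remember the result."""
    if question in cache:
        return cache[question], True  # cache hit: no API call
    answer = send_to_gpt(question)
    cache[question] = answer
    return answer, False


calls = []


def fake_gpt(q):
    """Stand-in for the real GPT call, recording each invocation."""
    calls.append(q)
    return f"answer to {q!r}"


first, hit1 = answer_with_cache("When was Dataiku created?", fake_gpt)
second, hit2 = answer_with_cache("When was Dataiku created?", fake_gpt)
# the second request is served from the cache; fake_gpt ran only once
```

Note that this exact-match lookup only saves a call when the question is repeated verbatim; fuzzier matching is possible but out of scope here.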

Adding a button to save the data#

Before digging into the details, you should add a button to the webapp allowing the user to save the question and the answer into a dataset. This is done in two steps:

  • First, add a save button to the layout:
    Code 8: Adding a button for saving the Q&A#
    dbc.Row([dbc.Col(dbc.Button("Save this answer",
                                id="save_answer",
                                n_clicks=0,
                                class_name="btn-primary",
                                size="lg"))],
            justify="end",
            className="d-grid gap-2 col-12 mx-auto", )
    
  • Then connect this button to a callback in charge of saving the Q&A:
    Code 9: Connecting the button to a callback#
    @app.callback(
        Output("save_answer", "n_clicks"),
        Input("save_answer", "n_clicks"),
        State("search_input", "value"),
        State("text_output", "value"),
        prevent_initial_call = True
    )
    def save_answer(_, question, answer):
        ...

Depending on the dataset kind, the implementation of the callback will change.
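One way to keep that branch out of the callback is a small dispatcher, sketched below (make_saver is a hypothetical helper; the tutorial itself simply comments one branch out):

```python
def make_saver(kind, sql_writer, csv_writer):
    """Return a save(question, answer) function for the given dataset kind."""
    writers = {"sql": sql_writer, "csv": csv_writer}
    if kind not in writers:
        raise ValueError(f"Unsupported dataset kind: {kind}")
    return writers[kind]


saved = []
save = make_saver("csv",
                  sql_writer=lambda q, a: saved.append(("sql", q, a)),
                  csv_writer=lambda q, a: saved.append(("csv", q, a)))
save("When was Dataiku created?", "In 2013.")
# saved == [("csv", "When was Dataiku created?", "In 2013.")]
```

In the webapp, sql_writer and csv_writer would be partial applications of write_question_answer_sql and write_question_answer_csv.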

Using the SQLExecutor#

Considering you already have an SQL dataset, you can use the SQLExecutor to insert a row into the dataset, letting the database engine do the work for you. To insert a new row, you will need an INSERT statement, like sql = f”””INSERT INTO “{table_name}” (question, answer) VALUES {values_string}”””. With this kind of statement, you may run into trouble if the answer contains specific characters (like ‘, “, …). As you have no control over the response (and prompt engineering can hardly guarantee the assistant never produces such characters), you must encode the question and the answer into a safe format before inserting them into the dataset.
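For instance, base64 maps any text, quotes included, onto the characters A–Z, a–z, 0–9, +, / and =, none of which can terminate a single-quoted SQL literal, and the original text is recovered exactly on decoding:

```python
import base64

# Quotes in the answer would break a naive single-quoted SQL literal
answer = "It's a 'smart' assistant \"framework\""

encoded = base64.b64encode(answer.encode("utf-8")).decode("utf-8")
# the base64 alphabet contains neither single nor double quotes,
# so encoded is safe to embed inside VALUES ('...')

decoded = base64.b64decode(encoded.encode("utf-8")).decode("utf-8")
# the round trip is lossless: decoded == answer
```

The trade-off is that the stored columns are no longer human-readable; any consumer of the dataset must decode them the same way.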

Code 10 shows how to encode the question and the response before inserting the data using the SQL statement and the dataiku.SQLExecutor2.

Code 10: Saving the data into a SQL-Like dataset#
import base64

import dataiku


def write_question_answer_sql(client, dataset_name, connection, connection_type, project_key, question, answer):
    dataset = dataiku.Dataset(dataset_name)
    table_name = dataset.get_location_info().get('info', {}).get('table')
    value_string = (f"('{base64.b64encode(question.encode('utf-8')).decode('utf-8')}', "
                    f"'{base64.b64encode(answer.encode('utf-8')).decode('utf-8')}')")
    sql = f"""INSERT INTO "{table_name}" (question, answer) VALUES {value_string}"""
    client.sql_query(sql, connection=connection, type=connection_type, project_key=dataset.project_key,
                     post_queries=['COMMIT'])

Using dataframe#

If your dataset fits into memory, you can rely on the dataframe to append the data to an existing CSV-like dataset. The principle is straightforward:

  • Read the dataset as a dataframe.

  • Create the data.

  • Append them to the dataframe.

  • Save back the dataframe.

Code 11 shows how to do this.

Code 11: Saving the data into a CSV-like dataset#
import dataiku
import pandas as pd


def write_question_answer_csv(dataset_name, question, answer):
    dataset = dataiku.Dataset(dataset_name)
    row = {
        "question": [question],
        "answer": [answer]
    }
    df = dataset.get_dataframe()
    # DataFrame.append was removed in pandas 2.0; use pd.concat instead
    df = pd.concat([df, pd.DataFrame(row)], ignore_index=True)
    with dataset.get_writer() as writer:
        writer.write_dataframe(df)

Wrapping up#

Congratulations! You have completed this tutorial and built a Dash web application that enables ChatGPT integration inside a Dataiku webapp. Understanding all these basic concepts allows you to create more complex applications.

You can add a field to this web application to tweak the way ChatGPT answers, or try to reduce the number of tokens used in a query. If you save the data into a dataset, you can look the question up before requesting the ChatGPT assistant.

We put the code for saving the data into a separate library file (dataset_io/save.py) and used the CSV version in the webapp. The complete files are reproduced below.

auth.py
import dataiku


def get_api_key(secret_name: str = "AZURE_OPENAI_API_KEY") -> str:
    """
    Get an API Key from Dataiku user secrets
    Args:
        secret_name: The secret key you want to retrieve.

    Returns:
        The value of the secret key (if found)
    """
    client = dataiku.api_client()
    auth_info = client.get_auth_info(with_secrets=True)
    secret_value = None
    for secret in auth_info["secrets"]:
        if secret["key"] == secret_name:
            secret_value = secret["value"]
            break
    if not secret_value:
        raise Exception(f"User secret key {secret_name} not found.")
    else:
        return secret_value
chat.py
import openai
from typing import Dict, List

from .auth import get_api_key

DEFAULT_TEMPERATURE = 0
DEFAULT_MAX_TOKENS = 500

openai.api_key = get_api_key("AZURE_OPENAI_API_KEY")
openai.api_base = "https://xxx.openai.azure.com"  # Replace with your ENDPOINT value
openai.api_type = "azure"
openai.api_version = "2023-05-15"

deployment_name = "xxx"  # Replace with your DEPLOYMENT-NAME value


def send_prompt(messages: List[Dict[str, str]]) -> str:
    """
    Send a conversation to the chat model and return its answer.
    Args:
        messages: The list of messages (conversation context).

    Returns:
        The response of the model.
    """
    response = openai.ChatCompletion.create(
        engine=deployment_name,
        messages=messages,
        temperature=DEFAULT_TEMPERATURE,
        max_tokens=DEFAULT_MAX_TOKENS
    )
    return response['choices'][0]['message']['content']
loading-state.css
*[data-dash-is-loading="true"]{
    visibility: hidden;
}
*[data-dash-is-loading="true"]::before{
    content: "Thinking...";
    display: inline-block;
    visibility: visible;
}
save.py
import dataiku
import base64
import pandas as pd


def write_question_answer_sql(client, dataset_name, connection, connection_type, project_key, question, answer):
    """
    Save data into a SQL like dataset
    Args:
        client: the dataiku client
        dataset_name: name of the SQL dataset
        connection: name of the SQL connection used for saving
        connection_type: type of connection
        project_key: project key
        question: the question
        answer: the answer
    """
    dataset = dataiku.Dataset(dataset_name)
    table_name = dataset.get_location_info().get('info', {}).get('table')
    value_string = f"('{base64.b64encode(question.encode('utf-8')).decode('utf-8')}', '{base64.b64encode(answer.encode('utf-8')).decode('utf-8')}')"
    sql = f"""INSERT INTO "{table_name}" (question, answer) VALUES {value_string}"""
    client.sql_query(sql, connection=connection, type=connection_type, project_key=dataset.project_key,
                     post_queries=['COMMIT'])


def write_question_answer_csv(dataset_name, question, answer):
    """
    Save data into a CSV like dataset
    Args:
        dataset_name: the CSV dataset
        question: the question
        answer: the answer
    """
    dataset = dataiku.Dataset(dataset_name)
    row = {
        "question": [question],
        "answer": [answer]
    }
    df = dataset.get_dataframe()
    # DataFrame.append was removed in pandas 2.0; use pd.concat instead
    df = pd.concat([df, pd.DataFrame(row)], ignore_index=True)
    with dataset.get_writer() as writer:
        writer.write_dataframe(df)
webapp.py
import dash
from dash import html
from dash import dcc
import dash_bootstrap_components as dbc
from dash.dependencies import Input
from dash.dependencies import Output
from dash.dependencies import State
from dash.exceptions import PreventUpdate
import dataiku

from azure_gpt_utils.chat import send_prompt
from dataset_io.save import write_question_answer_sql
from dataset_io.save import write_question_answer_csv

# use the style of examples on the Plotly documentation
app.config.external_stylesheets = [dbc.themes.BOOTSTRAP, "/local/static/loading-state.css"]

search_text_layout = html.Div([
    dcc.Store(id='messages', data=[
        {"role": "system", "content": "You are a helpful assistant"}]),
    dbc.Row([
        dbc.Label("Max messages", html_for="max_messages", width=2),
        dbc.Col(dbc.Input(id="max_messages", value="5", type="number", min=1, max=10), width=2),
        dbc.Col(width=6),
        dbc.Col(dbc.Button("Reset conversation", id="flush_messages", n_clicks=0, class_name="btn-danger"), width=2,
                class_name="d-grid col-2 gap-2")
    ], class_name="mb-3", ),
    dbc.Row([
        dbc.Label("Ask your question", html_for="search_input", width=2),
        dbc.Col(html.Div(children=[
            dbc.Input(id="search_input", placeholder="What can I do for you?"),
            dcc.Loading(id="ls-loading-1", children=[html.Div(id="ls-loading-output-1")], type="default")]), width=10),
    ], className="mb-3", ),
])

# build your Dash app
app.layout = html.Div([
    search_text_layout,
    dbc.Row([
        dbc.Col(width=2),
        dbc.Col(dbc.Textarea(id="text_output", style={"height": "200px"}), width=10)], class_name="mb-3"),
    dbc.Row(
        [dbc.Col(dbc.Button("Save this answer", id="save_answer", n_clicks=0, class_name="btn-primary", size="lg"))],
        justify="end", className="d-grid gap-2 col-12 mx-auto", )
], className="container-fluid mt-3")


@app.callback(
    [Output("ls-loading-output-1", "children"),
     Output("text_output", "value"),
     Output("messages", "data")],
    Input("search_input", "n_submit"),
    State("search_input", "value"),
    State("max_messages", "value"),
    State("messages", "data"),
    prevent_initial_call=True
)
def get_answer(_, question, max_messages, messages):
    """
    Ask a question to Chat GPT (with some context), and give back the response
    Args:
        _: number of enter pressed in the input text (not used)
        question: the question (with the context)
        max_messages: number of context messages to keep
        messages: the context

    Returns:
        the response, and an updated version of the context
    """
    if not question or not max_messages or not messages:
        raise PreventUpdate

    while len(messages) > int(max_messages):
        messages.pop(1)

    messages.append({"role": "user", "content": question})
    try:
        answer = send_prompt(messages)

        messages.append({"role": "assistant", "content": answer})
        return ["", answer, messages]
    except Exception:
        return ["", "Something went wrong", messages]

@app.callback(
    Output("messages", "data", allow_duplicate=True),
    Input("flush_messages", "n_clicks"),
    prevent_initial_call=True
)
def reset_conversation(_clicks):
    """
    Reset the conversation
    Args:
        _clicks: number of clicks on the flush button (unused)

    Returns:
        a new context for the conversation
    """
    return [{"role": "system", "content": "You are a helpful assistant"}]


@app.callback(
    Output("save_answer", "n_clicks"),
    Input("save_answer", "n_clicks"),
    State("search_input", "value"),
    State("text_output", "value"),
    prevent_initial_call=True
)
def save_answer(_clicks, question, answer):
    """
    Save the answer
    Args:
        _clicks: number of clicks on the save button (unused)
        question: the question
        answer: the answer

    Returns:

    """
    ## Uncomment these lines if you need to save into an SQL dataset

    #    client =  dataiku.api_client()
    #    dataset_name = "History_SQL"
    #    connection = "postgres"
    #    connection_type = "sql"
    #    project_key = dataiku.default_project_key()

    #    write_question_answer_sql(client, dataset_name, connection, connection_type, project_key, question, answer)

    ## Saving into a CSV dataset

    dataset_name = "History"
    write_question_answer_csv(dataset_name, question, answer)