Using OpenAI-compatible API calls via the LLM Mesh#

An OpenAI-compatible Python client is available for the LLM Mesh. You can use it to send both Chat Completions API and Responses API requests through the same governed endpoint. Instead of handling separate API keys and endpoints for each provider, you can use the LLM Mesh to:

  • Access multiple models using OpenAI’s standard Python client format

  • Maintain centralized governance, monitoring, and cost control

  • Switch easily between different LLM providers

Prerequisites#

Before starting, ensure you have:

  • Dataiku >= 13.2 for Chat Completions API examples

  • Dataiku >= 14.4.3 for Responses API examples

  • A valid Dataiku API key

  • Project permissions for “Read project content” and “Write project content”

  • An existing OpenAI LLM Mesh connection

  • Python environment with the openai package installed

OpenAI client for the LLM Mesh#

Set up the OpenAI client by pointing it to your LLM Mesh configuration. You will need several pieces of information for access and authentication:

  • A public Dataiku URL to access the LLM Mesh API

  • An API key for Dataiku

  • The LLM ID

from openai import OpenAI

# Specify the Dataiku OpenAI-compatible public API URL, e.g. http://my.dss/public/api/projects/PROJECT_KEY/llms/openai/v1/
BASE_URL = ""

# Use your Dataiku API key instead of an OpenAI secret
API_KEY = ""

# Fill with your LLM ID - to list available LLM IDs, you can use dataiku.api_client().get_default_project().list_llms()
LLM_ID = ""

# Initialize the OpenAI client
client = OpenAI(
    base_url=BASE_URL,
    api_key=API_KEY
)

# Default parameters
DEFAULT_TEMPERATURE = 0
DEFAULT_MAX_TOKENS = 500

Tip

If you need to find the LLM ID, you can list all LLM Mesh connections configured for your project with the Dataiku client. Call the project’s list_llms() method and note the ID of the OpenAI model you want to use. It will look something like openai:CONNECTION-NAME:MODEL-NAME.
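A small helper can filter that list down to the OpenAI connections. This is a sketch, not part of the Dataiku API: the helper name is ours, and it assumes each list entry exposes its ID either as a dict key or as an attribute, depending on your client version.

```python
def openai_llm_ids(llms):
    """Keep only IDs from OpenAI connections (format openai:CONNECTION-NAME:MODEL-NAME)."""
    ids = [llm["id"] if isinstance(llm, dict) else llm.id for llm in llms]
    return [llm_id for llm_id in ids if llm_id.startswith("openai:")]

# Usage inside Dataiku (assumed environment):
#   import dataiku
#   project = dataiku.api_client().get_default_project()
#   print(openai_llm_ids(project.list_llms()))
```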

Choosing an API surface#

The LLM Mesh exposes two OpenAI-compatible endpoints from the same base URL:

  • client.chat.completions.create(...) for the classic chat-completions format

  • client.responses.create(...) for OpenAI’s Responses API

Use the Chat Completions API when you want the familiar messages=[...] format. Use the Responses API when you want the newer input format, typed content items, or event-based streaming. The OpenAI client automatically targets the matching LLM Mesh endpoint for the method you call.

Making requests to OpenAI via LLM Mesh#

Now you can make requests to the LLM just like you would with the standard OpenAI API:

# Create a prompt

context = '''You are a capable ghost writer 
  who helps college applicants'''

content = '''Write a complete 350-word short essay 
  for a college application on the topic - 
  My first memories.'''


prompt = [
    {"role": "system", 
      "content": context}, 
    {'role': 'user',
      'content': content}
]

# Send the request
try:
    response = client.chat.completions.create(
        model=LLM_ID,
        messages=prompt,
        temperature=DEFAULT_TEMPERATURE,
        max_tokens=DEFAULT_MAX_TOKENS
    )
    
    print(response.choices[0].message.content)
except Exception as e:
    print(f"Error making request: {e}")

Note

The Responses API uses input instead of messages, returns generated text in response.output_text, and streams typed events instead of chat-completion delta chunks.

Using typed input with the Responses API#

For simple prompts, a string input is enough. For multimodal inputs or multi-turn conversations, pass a list of typed items instead:

typed_input = [
    {
        "role": "user",
        "content": [
            {
                "type": "input_text",
                "text": "Summarize the three most important points about governed LLM access."
            }
        ]
    }
]

response = client.responses.create(
    model=LLM_ID,
    input=typed_input,
    max_output_tokens=DEFAULT_MAX_TOKENS
)

print(response.output_text)

On follow-up, you can also pass prior response.output items back into the next input list, together with any function_call_output items you generate locally.
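A sketch of that follow-up pattern, with items represented as plain dicts (the real SDK returns typed objects with the same shape, and the helper name is ours):

```python
def build_followup_input(prior_output_items, user_text):
    """Carry a previous turn's output items forward, then append the new user message."""
    next_input = list(prior_output_items)  # e.g. the items from response.output
    next_input.append({
        "role": "user",
        "content": [{"type": "input_text", "text": user_text}],
    })
    return next_input

# Usage (assumed): client.responses.create(model=LLM_ID,
#     input=build_followup_input(response.output, "And a fourth point?"))
```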

Wrapping up#

Now that you have the basic setup working, you can:

  • Experiment with both client.chat.completions.create(...) and client.responses.create(...)

  • Use typed input items with the Responses API for multimodal prompts or tool-calling loops

  • Try structured outputs with client.responses.parse(...) when your model and provider support them

  • Use other LLM providers available through the LLM Mesh

  • Learn more about the OpenAI-compatible setup in the LLM Mesh concept page

Streaming#

The Chat Completions API and the Responses API both support streaming through the same LLM Mesh endpoint, but the event shape differs:

  • The Chat Completions API streams delta chunks

  • The Responses API streams typed events such as response.created, response.output_text.delta, and response.completed

chat_completions_streaming.py
Code 1 – Streaming with the Chat Completions API#
print("📚 .. imports ... ")
print("🤖 .. Python client for OpenAI API calls ...")
from openai import OpenAI

print("⏱ .. library for timing ...")
import time
import httpx # in case of self-signed certificates
print("\n\n")

# Specify the Dataiku OpenAI-compatible public API URL, e.g. http://my.dss/public/api/projects/PROJECT_KEY/llms/openai/v1/
BASE_URL = ""

# Use your Dataiku API key instead of an OpenAI secret
API_KEY = ""

# Fill with your LLM ID - to list available LLM IDs, you can use dataiku.api_client().get_default_project().list_llms()
LLM_ID = ""

# Create an OpenAI client
open_client = OpenAI(
  base_url=BASE_URL,
  api_key=API_KEY,
  http_client=httpx.Client(verify=False)  # in case of self-signed certificates
)

print("🔑 .. client created, key set ...")


DEFAULT_TEMPERATURE = 0
DEFAULT_MAX_TOKENS = 1000

print("\n\n")

context = '''You are a capable ghost writer 
  who helps college applicants'''

content = '''Write a complete 500-word short essay 
  for a college application on the topic - 
  My first memories.'''

prompt = [
    {"role": "system", 
     "content": context}, 
    {'role': 'user',
     'content': content}
]


print(f"This is the prompt: {content}")
print("\n\n")  

print("⏲ .. Record the time before the request is sent ..")
start_time = time.time()

print("📤 .. Send a ChatCompletion request ...")
response = open_client.chat.completions.create(
    model=LLM_ID,
    stream=True,
    messages=prompt,
    temperature=DEFAULT_TEMPERATURE,
    max_tokens=DEFAULT_MAX_TOKENS
)


collected_chunks = []
collected_messages = []

# iterate through the stream of events
for chunk in response:
    chunk_time = time.time() - start_time  # calculate the time delay of the chunk
    collected_chunks.append(chunk)  # save the event response
    chunk_message = chunk.choices[0].delta  # extract the message
    collected_messages.append(chunk_message)  # save the message
    if chunk_message.content is not None:
        print(chunk_message.content, end="")

print("\n\n\n")  

# print the time delay and the reassembled text
print(f"Full response received {chunk_time:.2f} seconds after request")
full_reply_content = ''.join(m.content for m in collected_messages if m.content is not None)
print(full_reply_content)
responses_streaming.py
Code 2 – Streaming with the Responses API#
from openai import OpenAI

import httpx  # Optional: useful for self-signed certificates

# Specify the Dataiku OpenAI-compatible public API URL, e.g. http://my.dss/public/api/projects/PROJECT_KEY/llms/openai/v1/
BASE_URL = ""

# Use your Dataiku API key instead of an OpenAI secret
API_KEY = ""

# Fill with your LLM ID - to list available LLM IDs, you can use dataiku.api_client().get_default_project().list_llms()
LLM_ID = ""

client = OpenAI(
    base_url=BASE_URL,
    api_key=API_KEY,
    http_client=httpx.Client(verify=False),  # Optional: for self-signed certificates
)

stream = client.responses.create(
    model=LLM_ID,
    input="Write a short poem about governed AI platforms.",
    stream=True,
)

collected_text = []

for event in stream:
    if event.type == "response.output_text.delta":
        print(event.delta, end="")
        collected_text.append(event.delta)
    elif event.type == "response.completed":
        print("\n")

# Reassemble the streamed deltas into the full response text
full_text = "".join(collected_text)
print(full_text)
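The delta-collection loop above can be factored into a reusable helper. This sketch works over dict-shaped events for illustration; the real SDK yields typed event objects with the same field names.

```python
def accumulate_output_text(events):
    """Join the text deltas from a Responses API event stream into one string."""
    parts = []
    for event in events:
        if event["type"] == "response.output_text.delta":
            parts.append(event["delta"])
    return "".join(parts)
```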

Remember that all requests go through the LLM Mesh, which provides monitoring and governance capabilities while keeping the familiar OpenAI API interface.