LLM Mesh#
The LLM Mesh is the common backbone for Enterprise Generative AI Applications. For more details on the LLM Mesh features of Dataiku, please visit Generative AI and LLM Mesh.
The LLM Mesh API allows you to:
Send completion and embedding queries to all LLMs supported by the LLM Mesh
Stream responses from LLMs that support it
Query LLMs using multimodal inputs (image and text)
Query the LLM Mesh from LangChain code
Interact with knowledge banks, and perform semantic search
Create a fine-tuned saved model
Read LLM Mesh metadata#
List and get LLMs#
import dataiku
client = dataiku.api_client()
project = client.get_default_project()
llm_list = project.list_llms()
By default, list_llms() returns a list of DSSLLMListItem. To get more details:
for llm in llm_list:
    print(f"- {llm.description} (id: {llm.id})")
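When a project exposes many LLMs, grouping the ids can make the list easier to scan. The sketch below assumes that ids follow the `type:connection:model` pattern seen in the examples on this page (for instance `openai:myopenai:gpt-4o`). `group_llm_ids_by_type` is a hypothetical helper, not part of the Dataiku API.

```python
from collections import defaultdict


def group_llm_ids_by_type(llm_list):
    """Group LLM ids by the text before the first ':' (assumed to be the LLM type)."""
    groups = defaultdict(list)
    for llm in llm_list:
        # Assumption: the first segment of the id identifies the kind of connection
        llm_type = llm.id.split(":", 1)[0]
        groups[llm_type].append(llm.id)
    return dict(groups)
```

It could be called as `group_llm_ids_by_type(project.list_llms())`.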
Perform completion queries on LLMs#
Your first simple completion query#
This sample gets a handle on an LLM and uses a completion query to ask it to “write a haiku on GPT models.”
import dataiku
# Fill with your LLM id. For example, if you have an OpenAI connection called "myopenai", LLM_ID can be "openai:myopenai:gpt-4o"
# To get the list of LLM ids, you can use project.list_llms() (see above)
LLM_ID = ""
# Create a handle for the LLM of your choice
client = dataiku.api_client()
project = client.get_default_project()
llm = project.get_llm(LLM_ID)
# Create and run a completion query
completion = llm.new_completion()
completion.with_message("Write a haiku on GPT models")
resp = completion.execute()
# Display the LLM output
if resp.success:
    print(resp.text)
# GPT, a marvel,
# Deep learning's symphony plays,
# Thoughts dance, words unveil.
Multi-turn and system prompts#
You can have multiple messages in the completion object, each with a role:
completion = llm.new_completion()
# First, put a system prompt
completion.with_message("You are a poetic assistant who always answers in haikus", role="system")
# Then, give an example, or send the conversation history
completion.with_message("What is a transformer", role="user")
completion.with_message("Transformers, marvels\nOf the deep learning research\nAttention, you need", role="assistant")
# Then, the last query of the user
completion.with_message("What's your name", role="user")
resp = completion.execute()
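If the conversation history is already stored as a list, it can be replayed into the query in a loop. This is a minimal sketch: `add_history` is a hypothetical helper, and it relies only on the `with_message(text, role=...)` call shown above.

```python
def add_history(completion, history):
    """Append (role, content) pairs to a completion query, preserving their order."""
    for role, content in history:
        completion.with_message(content, role=role)
    return completion
```

For example, `add_history(llm.new_completion(), [("system", "You are a poetic assistant"), ("user", "What's your name")]).execute()` would send both messages in one query.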
Multimodal input#
Multimodal input is supported on a subset of the LLMs in the LLM Mesh:
OpenAI
Bedrock Anthropic Claude
Azure OpenAI
Gemini Pro
completion = llm.new_completion()
with open("myimage.jpg", "rb") as f:
    image = f.read()
mp_message = completion.new_multipart_message()
mp_message.with_text("The image represents an artwork. Describe it as it would be described by art critics")
mp_message.with_inline_image(image)
# Add it to the completion request
mp_message.add()
resp = completion.execute()
Completion settings#
You can adjust the settings of the completion query:
completion = llm.new_completion()
completion.with_message("Write a haiku on GPT models")
completion.settings["temperature"] = 0.7
completion.settings["topK"] = 10
completion.settings["topP"] = 0.3
completion.settings["maxOutputTokens"] = 2048
completion.settings["stopSequences"] = [".", "\n"]
completion.settings["presencePenalty"] = 0.6
completion.settings["frequencyPenalty"] = 0.9
completion.settings["logitBias"] = {
    1489: 60, # apply a logit bias of 60 on token value "1489"
}
completion.settings["logProbs"] = True
completion.settings["topLogProbs"] = 3
resp = completion.execute()
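Because settings are plain dictionary keys, a misspelled key is easy to miss. The sketch below checks keys against the names that appear on this page before applying them. `KNOWN_SETTINGS` and `apply_settings` are hypothetical, and the set is only what this page shows, so it may be incomplete.

```python
# Setting names taken from the examples on this page (possibly not exhaustive)
KNOWN_SETTINGS = {
    "temperature", "topK", "topP", "maxOutputTokens", "stopSequences",
    "presencePenalty", "frequencyPenalty", "logitBias", "logProbs",
    "topLogProbs", "tools", "toolChoice",
}


def apply_settings(completion, **settings):
    """Copy settings onto completion.settings, rejecting unrecognized keys."""
    unknown = set(settings) - KNOWN_SETTINGS
    if unknown:
        raise KeyError(f"Unrecognized setting(s): {sorted(unknown)}")
    completion.settings.update(settings)
    return completion
```

A call such as `apply_settings(completion, temperature=0.7, maxOutputTokens=2048)` would then fail early on a typo like `maxOutputToken`.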
Response streaming#
from dataikuapi.dss.llm import DSSLLMStreamedCompletionChunk, DSSLLMStreamedCompletionFooter
completion = llm.new_completion()
completion.with_message("Please explain special relativity")
for chunk in completion.execute_streamed():
    if isinstance(chunk, DSSLLMStreamedCompletionChunk):
        print("Received text: %s" % chunk.data["text"])
    elif isinstance(chunk, DSSLLMStreamedCompletionFooter):
        print("Completion is complete: %s" % chunk.data)
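To keep the full answer rather than print each piece, the streamed text can be accumulated. This is a sketch under the assumption, taken from the sample above, that text chunks expose a `data` dictionary with a `"text"` key. `collect_streamed_text` is a hypothetical helper, and the chunk class is passed in so the function does not depend on any import.

```python
def collect_streamed_text(chunks, chunk_type):
    """Concatenate the "text" field of every streamed item that is a chunk_type."""
    parts = []
    for chunk in chunks:
        # Footers and chunks without text are skipped
        if isinstance(chunk, chunk_type) and "text" in chunk.data:
            parts.append(chunk.data["text"])
    return "".join(parts)
```

In DSS this would be called as `collect_streamed_text(completion.execute_streamed(), DSSLLMStreamedCompletionChunk)`.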
Text embedding#
import dataiku
EMBEDDING_MODEL_ID = "" # Fill with your embedding model id, for example: openai:myopenai:text-embedding-3-small
# Create a handle for the embedding model of your choice
client = dataiku.api_client()
project = client.get_default_project()
emb_model = project.get_llm(EMBEDDING_MODEL_ID)
# Create and run an embedding query
txt = "The quick brown fox jumps over the lazy dog."
emb_query = emb_model.new_embeddings()
emb_query.add_text(txt)
emb_resp = emb_query.execute()
# Display the embedding output
print(emb_resp.get_embeddings())
# [[0.000237455,
# -0.103262354,
# ...
# ]]
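A common next step is to compare two embeddings. The sketch below computes cosine similarity in plain Python on vectors such as the inner lists returned by `get_embeddings()`. `cosine_similarity` is a hypothetical helper; a numerical library would be a better fit for large batches.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("Vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("Cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)
```

Values close to 1 indicate vectors pointing in nearly the same direction, which is usually read as semantically similar texts.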
Tool calls#
Tool calls (sometimes referred to as “function calling”) allow you to augment an LLM with “tools”: functions that it can ask to call, providing the arguments. Your client code can then perform those calls and provide the output back to the LLM so that it can generate the next response.
Tool calls are supported on the compatible completion models of some LLM connections:
OpenAI
Azure OpenAI
Azure LLM
Anthropic Claude
Anthropic Claude models on AWS Bedrock connections
MistralAI
Define tools#
You can define tools as settings in the completion query. Tool parameters are defined as JSON Schema objects. See the JSON Schema reference for documentation about the format.
Tools can also be automatically prepared and invoked from Python code, e.g. using Langchain.
completion = llm.new_completion()
completion.settings["tools"] = [
    {
        "type": "function",
        "function": {
            "name": "multiply",
            "description": "Multiply integers",
            "parameters": {
                "type": "object",
                "properties": {
                    "a": {
                        "type": "integer",
                        "description": "The first integer to multiply",
                    },
                    "b": {
                        "type": "integer",
                        "description": "The other integer to multiply",
                    },
                },
                "required": ["a", "b"],
            }
        }
    }
]
completion.with_message("What is 3 * 6 ?")
resp = completion.execute()
# {
#   "ok": true,
#   "finishReason": "tool_calls",
#   "toolCalls": [
#     {
#       "type": "function",
#       "function": {
#         "name": "multiply",
#         "arguments": "{\"a\":3,\"b\":6}"
#       },
#       "id": "call_da9P8tJ0TcnvdFbsJtpUwqZr"
#     }
#   ],
#   ...
# }
Response streaming with tool calls#
LLM responses that include tool calls can also be streamed. Depending on the LLM, response chunks may include either complete or partial tool calls. When the LLM sends partial tool calls, the streamed chunk contains an extra field, index, which makes it possible to reconstruct the whole LLM response.
for chunk in completion.execute_streamed():
    if isinstance(chunk, DSSLLMStreamedCompletionChunk):
        if "text" in chunk.data:
            print("Received text: %s" % chunk.data["text"])
        if "toolCalls" in chunk.data:
            print("Received tool call: %s" % chunk.data["toolCalls"])
    elif isinstance(chunk, DSSLLMStreamedCompletionFooter):
        print("Completion is complete: %s" % chunk.data)
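Partial tool calls can be stitched back together by grouping on the index field. The exact shape of a partial entry is not documented on this page, so the sketch below assumes each entry is a dictionary with an `index` key, an optional `id`, and a `function` dictionary whose `arguments` string arrives in fragments. `merge_partial_tool_calls` is a hypothetical helper, to be adapted to what your LLM actually returns.

```python
def merge_partial_tool_calls(partial_calls):
    """Rebuild complete tool calls from fragments that share the same index."""
    merged = {}
    for part in partial_calls:
        call = merged.setdefault(
            part["index"],
            {"type": "function", "function": {"name": "", "arguments": ""}},
        )
        if part.get("id"):
            call["id"] = part["id"]
        function = part.get("function", {})
        if function.get("name"):
            call["function"]["name"] = function["name"]
        # Assumption: argument fragments arrive in order and are concatenated
        call["function"]["arguments"] += function.get("arguments") or ""
    return [merged[index] for index in sorted(merged)]
```

The fragments would be gathered from `chunk.data["toolCalls"]` across all chunks before calling the helper.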
Provide tool outputs#
Tool calls can then be parsed and executed. In order to provide the tool response in the chat messages, use the following methods:
tool_calls = resp.tool_calls
call_id = tool_calls[0]["id"]
completion.with_tool_calls(tool_calls)
completion.with_tool_output("18", tool_call_id=call_id)
resp = completion.execute()
# {
#   "ok": true,
#   "finishReason": "stop",
#   "text": "3 multiplied by 6 is 18.",
#   ...
# }
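The sample above hard-codes the tool output "18". A minimal sketch of computing it from the tool call itself could look as follows, assuming each tool call has the shape shown in the response earlier (a `function` entry with a `name` and a JSON-encoded `arguments` string). `TOOLS` and `run_tool_call` are hypothetical names.

```python
import json


def multiply(a, b):
    """Local implementation of the 'multiply' tool declared in the settings."""
    return a * b


# Maps tool names, as declared to the LLM, to local Python callables
TOOLS = {"multiply": multiply}


def run_tool_call(tool_call, tools=TOOLS):
    """Decode the arguments of a tool call, run the tool, return the result as text."""
    function = tool_call["function"]
    arguments = json.loads(function["arguments"])
    result = tools[function["name"]](**arguments)
    return str(result)
```

The result can then be passed back with `completion.with_tool_output(run_tool_call(call), tool_call_id=call["id"])` for each call in `resp.tool_calls`.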
Control tool usage#
Tool usage can be constrained in the completion settings:
completion = llm.new_completion()
# Let the LLM decide whether to call a tool
completion.settings["toolChoice"] = {"type": "auto"}
# The LLM must call at least one tool
completion.settings["toolChoice"] = {"type": "required"}
# The LLM must not call any tool
completion.settings["toolChoice"] = {"type": "none"}
# The LLM must call the tool with name 'multiply'
completion.settings["toolChoice"] = {"type": "tool_name", "name": "multiply"}
Knowledge Banks (KB)#
List and get KBs#
To list the KBs present in a project:
import dataiku
client = dataiku.api_client()
project = client.get_default_project()
kb_list = project.list_knowledge_banks()
By default, list_knowledge_banks() returns a list of DSSKnowledgeBankListItem.
To get more details:
for kb in kb_list:
    print(f"{kb.name} (id: {kb.id})")
To get a “core handle” on the KB (i.e. to retrieve a KnowledgeBank object):
KB_ID = "" # Fill with your KB id
kb_public_api = project.get_knowledge_bank(KB_ID)
kb_core = kb_public_api.as_core_knowledge_bank()
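Since KB ids are not always easy to remember, the id can be looked up from the name. This sketch relies only on the `name` and `id` attributes used in the listing above. `find_knowledge_bank_id` is a hypothetical helper.

```python
def find_knowledge_bank_id(kb_list, name):
    """Return the id of the single knowledge bank with the given name."""
    matches = [kb.id for kb in kb_list if kb.name == name]
    if len(matches) != 1:
        raise LookupError(
            f"Expected exactly one knowledge bank named {name!r}, found {len(matches)}"
        )
    return matches[0]
```

The returned id can be used as `KB_ID` in `project.get_knowledge_bank(KB_ID)`.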
LangChain integration#
Dataiku LLM model objects can be turned into langchain-compatible objects, making it easy to:
stream responses
run asynchronous queries
batch queries
chain several models and adapters
integrate with the wider langchain ecosystem
Transforming LLM handles to LangChain model#
# In this sample, llm is the result of calling project.get_llm() (see above)
# Turn a regular LLM handle into a langchain-compatible one
langchain_llm = llm.as_langchain_llm()
# Run a single completion query
langchain_llm.invoke("Write a haiku on GPT models")
# Run a batch of completion queries
langchain_llm.batch(["Write a haiku on GPT models", "Write a haiku on GPT models in German"])
# Run a completion query and stream the response
for chunk in langchain_llm.stream("Write a haiku on GPT models"):
    print(chunk, end="", flush=True)
See the langchain documentation for more details.
You can also turn it into a langchain “chat model”, a specific type of LLM geared towards conversation:
# In this sample, llm is the result of calling project.get_llm() (see above)
# Turn a regular LLM handle into a langchain-compatible one
langchain_llm = llm.as_langchain_chat_model()
# Run a simple query
langchain_llm.invoke("Write a haiku on GPT models")
# Run a chat query
from langchain_core.messages import HumanMessage, SystemMessage
messages = [
    SystemMessage(content="You're a helpful assistant"),
    HumanMessage(content="What is the purpose of model regularization?"),
]
langchain_llm.invoke(messages)
# Streaming and chaining
from langchain.prompts import ChatPromptTemplate
prompt = ChatPromptTemplate.from_template("Tell me a joke about {topic}")
chain = prompt | langchain_llm
for chunk in chain.stream({"topic": "parrot"}):
    print(chunk.content, end="", flush=True)
See the langchain documentation for more details.
Creating Langchain models directly#
If running from inside DSS, you can also directly create the Langchain model:
from dataiku.langchain.dku_llm import DKULLM, DKUChatLLM
langchain_llm = DKUChatLLM(llm_id="your llm id") # For example: openai:myopenai:gpt-4o
Response streaming#
LangChain adapters (DKULLM and DKUChatLLM) also support response streaming:
from dataiku.langchain.dku_llm import DKULLM, DKUChatLLM
from langchain_core.messages import HumanMessage, SystemMessage
langchain_llm = DKUChatLLM(llm_id="your llm id") # For example: openai:myopenai:gpt-4o
messages = [
    SystemMessage(content="You're a helpful assistant"),
    HumanMessage(content="What is the purpose of model regularization?"),
]
for gen in langchain_llm.stream(messages):
    print(gen)
Using knowledge banks as LangChain retrievers#
Core handles allow users to leverage the Langchain library and, through it:
query the KB for semantic similarity search
combine the KB with an LLM to form a chain and perform complex workflows such as retrieval-augmented generation (RAG).
In practice, core handles expose KBs as a Langchain-native vector store through two different methods:
as_langchain_retriever() returns a generic VectorStoreRetriever object
as_langchain_vectorstore() returns an object whose class corresponds to the KB type. For example, for a FAISS-based KB, you will get a langchain.vectorstores.faiss.FAISS object.
import dataiku
KB_ID = "" # Fill with your KB id
client = dataiku.api_client()
project = client.get_default_project()
kb_core = project.get_knowledge_bank(KB_ID).as_core_knowledge_bank()
# Return a langchain.vectorstores.base.VectorStoreRetriever
lc_generic_vs = kb_core.as_langchain_retriever()
# Return an object whose type depends on the KB type
lc_vs = kb_core.as_langchain_vectorstore()
# [...] Move forward with similarity search or RAG
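For the RAG case, the retrieved documents are typically folded into the prompt sent to the LLM. The sketch below assumes the retriever returns LangChain documents exposing a `page_content` attribute. `build_rag_prompt`, its wording, and the `max_documents` default are illustrative choices, not part of the Dataiku or LangChain APIs.

```python
def build_rag_prompt(question, documents, max_documents=4):
    """Build a prompt that asks the LLM to answer from the retrieved passages."""
    context = "\n\n".join(
        f"[{position}] {doc.page_content}"
        for position, doc in enumerate(documents[:max_documents], start=1)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```

With a recent LangChain version, the documents could come from `lc_generic_vs.invoke(question)`, and the resulting prompt could be sent through `completion.with_message(...)` or a LangChain chat model.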
Using tool calls#
The LangChain chat model adapter supports tool calling, assuming that the underlying LLM supports it too.
import dataiku
from langchain_core.tools import tool
from langchain_core.messages import HumanMessage
# Define tools
@tool
def add(a: int, b: int) -> int:
    """Adds a and b."""
    return a + b
@tool
def multiply(a: int, b: int) -> int:
    """Multiplies a and b."""
    return a * b
tools_by_name = {"add": add, "multiply": multiply}
tools = [add, multiply]
tool_choice = {"type": "auto"}
# Get the LangChain chat model, bind it to the tools
client = dataiku.api_client()
project = client.get_default_project()
llm_id = "<your llm id>" # For example: "openai:myopenai:gpt-4o"
llm = project.get_llm(llm_id).as_langchain_chat_model()
llm_with_tools = llm.bind_tools(tools, tool_choice=tool_choice)
# Ask your question
messages = [HumanMessage("What is 3 * 12? and 6 + 4?")]
ai_msg = llm_with_tools.invoke(messages)
messages.append(ai_msg)
# Retrieve tool calls, run them and put the results in the chat messages
for tool_call in ai_msg.tool_calls:
    tool_name = tool_call["name"]
    selected_tool = tools_by_name[tool_name]
    tool_msg = selected_tool.invoke(tool_call)
    messages.append(tool_msg)
# Get the final response
ai_msg = llm_with_tools.invoke(messages)
ai_msg.content
# '3 * 12 is 36, and 6 + 4 is 10.'
Fine-tuning#
Create a Fine-tuned LLM Saved Model version#
Note
This API capability is experimental.
Visual model fine-tuning is also available to customers with the Advanced LLM Mesh add-on.
With a Python recipe or notebook, it is possible to fine-tune an LLM from the HuggingFace Hub and save it as a Fine-tuned LLM Saved Model version. This is done with the create_finetuned_llm_version() method, which takes an LLM Mesh connection name as input. Settings on this connection, such as usage permissions, guardrails, code environment, or container configuration, will apply at inference time.
The above method must be called on an existing Saved Model. Create one either programmatically with create_finetuned_llm_saved_model() (if you are in a notebook and don’t have one yet) or visually from the Saved Models list via +New Saved Model > Fine-tuned LLM (if you want to do this in a Python recipe, its output Saved Model must exist before the recipe can be created).
Here we fine-tune using several open-source frameworks from HuggingFace: transformers, trl & peft.
Attention
Note that fine-tuning a local LLM requires significant computational resources (GPU). The code samples below show state-of-the-art techniques to optimize memory usage and processing time, but this depends on your setup and might not always work. Also, beware that the size of your training (and optionally validation) dataset(s) greatly impacts the memory use and storage during fine-tuning.
A smaller LLM can be fine-tuned even when only a small GPU is available. Phi3 Mini is a good example, with “only” 3.8B parameters.
There are many techniques available to reduce memory usage and speed up computation. One of them is called Low-Rank Adaptation (LoRA). It consists of freezing the weights of the base model and adding new, trainable matrices to the Transformer architecture. It drastically reduces the number of trainable parameters and, hence, the GPU memory requirement.
import datasets
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer
from dataiku import recipe
from dataiku.llm.finetuning import formatters
base_model_name = "microsoft/Phi-3-mini-4k-instruct"
assert base_model_name, ("please specify a base LLM, it must be available"
                         " on HuggingFace hub")
connection_name = "a_huggingface_connection_name"
assert connection_name, ("please specify a connection name, the fine-tuned "
                         "LLM will be available from this connection")
##################
# Initial setup
##################
# Here, we're assuming that your training dataset is composed of 2 columns:
# the input (user message) and expected output (assistant message).
# If using a validation dataset, format should be the same.
user_message_column = "input"
assistant_message_column = "output"
columns = [user_message_column, assistant_message_column]
system_message_column = "" # optional
static_system_message = "" # optional
if system_message_column:
    columns.append(system_message_column)
# Turn Dataiku datasets into SFTTrainer datasets.
training_dataset = recipe.get_inputs()[0]
df = training_dataset.get_dataframe(columns=columns)
train_dataset = datasets.Dataset.from_pandas(df)
validation_dataset = None
eval_dataset = None
if len(recipe.get_inputs()) > 1:
    validation_dataset = recipe.get_inputs()[1]
    df = validation_dataset.get_dataframe(columns=columns)
    eval_dataset = datasets.Dataset.from_pandas(df)
saved_model = recipe.get_outputs()[0]
##################
# Model loading
##################
model = AutoModelForCausalLM.from_pretrained(base_model_name)
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
# It is mandatory to define a formatting function for fine-tuning,
# because ultimately, the model is fed with only one string:
# the concatenation of your input columns, in a specific format.
# Here, we leverage the apply_chat_template method, which depends on
# the tokenizer. For more information, see
# https://huggingface.co/docs/transformers/v4.43.3/chat_templating
formatting_func = formatters.ConversationalPromptFormatter(tokenizer.apply_chat_template,
                                                           *columns)
##################
# Fine-tune using SFTTrainer
##################
with saved_model.create_finetuned_llm_version(connection_name) as finetuned_llm_version:
    # feel free to customize, the only requirement is for a transformers model
    # to be created in finetuned_llm_version.working_directory
    # TRL package offers many possibilities to configure the training job.
    # For the full list,
    # see https://huggingface.co/docs/transformers/v4.43.3/en/main_classes/trainer#transformers.TrainingArguments
    train_conf = SFTConfig(
        output_dir=finetuned_llm_version.working_directory,
        save_safetensors=True,
        gradient_checkpointing=True,
        num_train_epochs=1,
        logging_steps=5,
        eval_strategy="steps" if eval_dataset else "no",
    )
    # LoRA is one of the most popular adapter-based methods to reduce memory-usage
    # and speed up fine-tuning
    peft_conf = LoraConfig(
        r=16,
        lora_alpha=32,
        lora_dropout=0.05,
        task_type="CAUSAL_LM",
        target_modules="all-linear",
    )
    trainer = SFTTrainer(
        model=model,
        tokenizer=tokenizer,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        formatting_func=formatting_func,
        args=train_conf,
        peft_config=peft_conf,
    )
    trainer.train()
    trainer.save_model()
    # Finally, we are logging training information to the Saved Model version
    config = finetuned_llm_version.config
    config["trainingDataset"] = training_dataset.short_name
    if validation_dataset:
        config["validationDataset"] = validation_dataset.short_name
    config["userMessageColumn"] = user_message_column
    config["assistantMessageColumn"] = assistant_message_column
    config["systemMessageColumn"] = system_message_column
    config["staticSystemMessage"] = static_system_message
    config["batchSize"] = trainer.state.train_batch_size
    config["eventLog"] = trainer.state.log_history
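To give a feel for how much LoRA shrinks the trainable parameter count, here is a back-of-the-envelope sketch. For a weight matrix of shape d_out x d_in, LoRA trains two low-rank matrices of shapes d_out x r and r x d_in instead of the full matrix. The 4096 dimension is an illustrative value, not read from any model configuration, and the rank matches the `r=16` used in the sample.

```python
def lora_trainable_params(d_out, d_in, rank):
    """Parameters in the two LoRA matrices (d_out x rank and rank x d_in)."""
    return rank * (d_out + d_in)


def full_params(d_out, d_in):
    """Parameters in the original, frozen weight matrix."""
    return d_out * d_in


dim = 4096   # illustrative layer width (assumption)
rank = 16    # same rank as in the LoraConfig above
ratio = lora_trainable_params(dim, dim, rank) / full_params(dim, dim)
print(f"Trainable share for one layer: {ratio:.2%}")  # prints 0.78%
```

The share shrinks further as the layer gets wider, since the LoRA count grows linearly with the dimension while the full matrix grows quadratically.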
It is also possible to fine-tune larger models, for instance, Mistral 7B. In that case, quantization can help further reducing the memory footprint. A paper called QLoRA shows how the LoRA technique can efficiently fine-tune quantized LLMs while limiting the performance loss.
import datasets
import torch
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import SFTConfig, SFTTrainer
from dataiku import recipe
from dataiku.llm.finetuning import formatters
base_model_name = "mistralai/Mistral-7B-Instruct-v0.2"
assert base_model_name, ("please specify a base LLM, it must be available"
                         " on HuggingFace hub")
connection_name = "a_huggingface_connection_name"
assert connection_name, ("please specify a connection name, the fine-tuned"
                         " LLM will be available from this connection")
##################
# Initial setup
##################
# Here, we're assuming that your training dataset is composed of 2 columns:
# the input (user message) and expected output (assistant message).
# If using a validation dataset, format should be the same.
user_message_column = "input"
assistant_message_column = "output"
columns = [user_message_column, assistant_message_column]
system_message_column = "" # optional
static_system_message = "" # optional
if system_message_column:
    columns.append(system_message_column)
# Turn Dataiku datasets into SFTTrainer datasets.
training_dataset = recipe.get_inputs()[0]
df = training_dataset.get_dataframe(columns=columns)
train_dataset = datasets.Dataset.from_pandas(df)
validation_dataset = None
eval_dataset = None
if len(recipe.get_inputs()) > 1:
    validation_dataset = recipe.get_inputs()[1]
    df = validation_dataset.get_dataframe(columns=columns)
    eval_dataset = datasets.Dataset.from_pandas(df)
saved_model = recipe.get_outputs()[0]
##################
# Model loading
##################
# Here, we are quantizing the Mistral model. It means that the weights
# are represented with lower-precision data types (like "Normal Float 4"
# from the [QLoRA paper](https://arxiv.org/pdf/2305.14314)) to optimize
# memory usage.
# We also change the data type used for matrix multiplication to speed
# up compute.
# One can of course use double (/nested) quantization, but with inevitable
# important precision loss.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=False,
)
model = AutoModelForCausalLM.from_pretrained(base_model_name,
                                             quantization_config=bnb_config)
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
tokenizer.pad_token = tokenizer.unk_token
tokenizer.padding_side = "right"
# It is mandatory to define a formatting function for fine-tuning,
# because ultimately, the model is fed with only one string:
# the concatenation of your input columns, in a specific format.
# Here, we leverage the apply_chat_template method, which depends
# on the tokenizer. For more information,
# see https://huggingface.co/docs/transformers/v4.43.3/chat_templating
formatting_func = formatters.ConversationalPromptFormatter(tokenizer.apply_chat_template,
                                                           *columns)
##################
# Fine-tune using SFTTrainer
##################
with saved_model.create_finetuned_llm_version(connection_name) as finetuned_llm_version:
    # feel free to customize, the only requirement is for a transformers model
    # to be created in finetuned_llm_version.working_directory
    # TRL package offers many possibilities to configure the training job.
    # For the full list, see
    # https://huggingface.co/docs/transformers/v4.43.3/en/main_classes/trainer#transformers.TrainingArguments
    train_conf = SFTConfig(
        output_dir=finetuned_llm_version.working_directory,
        save_safetensors=True,
        gradient_checkpointing=True,
        num_train_epochs=1,
        logging_steps=5,
        eval_strategy="steps" if eval_dataset else "no",
    )
    # LoRA is one of the most popular adapter-based methods to reduce memory-usage
    # and speed up fine-tuning
    peft_conf = LoraConfig(
        r=16,
        lora_alpha=32,
        lora_dropout=0.05,
        task_type="CAUSAL_LM",
        target_modules="all-linear",
    )
    trainer = SFTTrainer(
        model=model,
        tokenizer=tokenizer,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        formatting_func=formatting_func,
        args=train_conf,
        peft_config=peft_conf,
    )
    trainer.train()
    trainer.save_model()
    # Finally, we are logging training information to the Saved Model version
    config = finetuned_llm_version.config
    config["trainingDataset"] = training_dataset.short_name
    if validation_dataset:
        config["validationDataset"] = validation_dataset.short_name
    config["userMessageColumn"] = user_message_column
    config["assistantMessageColumn"] = assistant_message_column
    config["systemMessageColumn"] = system_message_column
    config["staticSystemMessage"] = static_system_message
    config["batchSize"] = trainer.state.train_batch_size
    config["eventLog"] = trainer.state.log_history
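As rough arithmetic on why 4-bit quantization helps, the memory taken by the weights alone scales with the number of bits per weight. The sketch below uses a rounded 7 billion parameter count as an assumption and ignores activations, gradients, optimizer state, and quantization metadata, so real usage will be higher.

```python
def weight_memory_gb(n_params, bits_per_weight):
    """Approximate memory needed to store the weights, in gigabytes (10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


n_params = 7_000_000_000  # rounded parameter count for a "7B" model (assumption)
print(weight_memory_gb(n_params, 16))  # 16-bit weights: 14.0
print(weight_memory_gb(n_params, 4))   # 4-bit weights: 3.5
```

Under these assumptions, going from 16-bit to 4-bit weights divides the weight footprint by four.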
In these examples, we used popular techniques to optimize memory usage and processing time, like LoRA, quantization or gradient checkpointing. Note that the research and open source community is constantly coming up with new ways to make fine-tuning more accessible, while trying to avoid too much performance loss. For more information on other techniques you could try, see for instance the Transformers or PEFT documentation.
OpenAI-compatible API#
The OpenAI-compatible API provides an easy way to query the LLM Mesh as it is built on top of the LLM Mesh API and implements the most used parts of OpenAI’s API for text completion.
The OpenAI-compatible API allows you to send chat completion queries to all LLMs supported by the LLM Mesh, using a standard OpenAI format. This includes, for models that support it:
Streamed chat completion responses
Multimodal inputs (image and text)
Tool calls
JSON output mode
Attention
Some arguments from OpenAI’s API reference are not supported.
Chat completion request:
n
response_format
seed
service_tier
parallel_tool_calls
user
function_call (deprecated)
functions (deprecated)
Chat completion response:
choices.message.refusal
choices.logprobs.refusal
created
service_tier
system_fingerprint
usage.completion_tokens_details
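When porting existing OpenAI client code, it can help to strip the unsupported request arguments before sending the query. The sketch below hard-codes the request list above. `UNSUPPORTED_REQUEST_ARGS` and `split_request_args` are hypothetical names, and the list may change between DSS versions.

```python
# Request arguments listed above as not supported
UNSUPPORTED_REQUEST_ARGS = frozenset({
    "n", "response_format", "seed", "service_tier",
    "parallel_tool_calls", "user", "function_call", "functions",
})


def split_request_args(kwargs):
    """Return (supported kwargs, sorted names of the arguments that were dropped)."""
    supported = {
        key: value for key, value in kwargs.items()
        if key not in UNSUPPORTED_REQUEST_ARGS
    }
    dropped = sorted(set(kwargs) & UNSUPPORTED_REQUEST_ARGS)
    return supported, dropped
```

The first element of the result can be unpacked into `openai_client.chat.completions.create(**supported)`, and the second can be logged so that silently ignored options are visible.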
Your first OpenAI completion query#
from openai import OpenAI
# Specify the DSS OpenAI-compatible public API URL, e.g. http://my.dss/public/api/projects/PROJECT_KEY/llms/openai/v1/
BASE_URL = ""
# Fill with your DSS API Key
API_KEY = ""
# Fill with your LLM id. For example, if you have a HuggingFace connection called "myhf", LLM_ID can be "huggingfacelocal:myhf:meta-llama/Meta-Llama-3.1-8B-Instruct:TEXT_GENERATION_LLAMA_2:promptDriven=true"
# To get the list of LLM ids, you can use openai_client.models.list() or project.list_llms() through the dataiku client
LLM_ID = ""
# Create an OpenAI client
openai_client = OpenAI(
    base_url=BASE_URL,
    api_key=API_KEY
)
resp = openai_client.chat.completions.create(
    model=LLM_ID,
    messages=[{"role": "user", "content": "Write a haiku on GPT models" }],
)
if resp and resp.choices:
    print(resp.choices[0].message.content)
# GPT, a marvel,
# Deep learning's symphony plays,
# Thoughts dance, words unveil.
To stream the response instead, pass stream=True and iterate over the returned chunks:
from openai import OpenAI
# Specify the DSS OpenAI-compatible public API URL, e.g. http://my.dss/public/api/projects/PROJECT_KEY/llms/openai/v1/
BASE_URL = ""
# Fill with your DSS API Key
API_KEY = ""
# Fill with your LLM id. For example, if you have a HuggingFace connection called "myhf", LLM_ID can be "huggingfacelocal:myhf:meta-llama/Meta-Llama-3.1-8B-Instruct:TEXT_GENERATION_LLAMA_2:promptDriven=true"
# To get the list of LLM ids, you can use openai_client.models.list() or project.list_llms() through the dataiku client
LLM_ID = ""
# Create an OpenAI client
openai_client = OpenAI(
    base_url=BASE_URL,
    api_key=API_KEY
)
resp = openai_client.chat.completions.create(
    model=LLM_ID,
    messages=[{"role": "user", "content": "Write a haiku on GPT models" }],
    stream=True
)
for chunk in resp:
    if chunk.choices and chunk.choices[0].delta and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content)
# Words
# weave
# through
# the
# code
# ,
#
#
# Silent
# thoughts
# brought
# into
# light
# ,
# M
# inds
# connect
# in
# spark
# .
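The output above shows one fragment per line because print() appends a newline after each streamed delta. To rebuild the full text, the deltas can be joined instead. This sketch reuses the same attribute checks as the loop above. `join_stream` is a hypothetical helper.

```python
def join_stream(chunks):
    """Concatenate the content of every streamed delta into a single string."""
    parts = []
    for chunk in chunks:
        if chunk.choices and chunk.choices[0].delta and chunk.choices[0].delta.content:
            parts.append(chunk.choices[0].delta.content)
    return "".join(parts)
```

It would be called as `join_stream(resp)` on the streamed response, in place of the printing loop.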
Reference documentation#
A handle to interact with a DSS-managed LLM.
An item in a list of LLMs.
A handle to interact with a completion query.
A handle to interact with a multi-completion query.
A handle to interact with a completion response.
A handle to interact with an embedding query.
A handle to interact with an embedding query result.
An item in a list of knowledge banks.
A handle to interact with a DSS-managed knowledge bank.
Langchain-compatible wrapper around Dataiku-mediated chat LLMs.
Langchain-compatible wrapper around Dataiku-mediated LLMs.
Langchain-compatible wrapper around Dataiku-mediated embedding LLMs.