LLM Mesh#
For usage information and examples, please see LLM Mesh
- class dataikuapi.dss.llm.DSSLLM(client, project_key, llm_id)#
A handle to interact with a DSS-managed LLM.
Important
Do not create this class directly, use
dataikuapi.dss.project.DSSProject.get_llm() instead.
- new_completion()#
Create a new completion query.
- Returns:
A handle on the generated completion query.
- Return type:
dataikuapi.dss.llm.DSSLLMCompletionQuery
- new_completions()#
Create a new multi-completion query.
- Returns:
A handle on the generated multi-completion query.
- Return type:
dataikuapi.dss.llm.DSSLLMCompletionsQuery
- new_embeddings(text_overflow_mode='FAIL')#
Create a new embedding query.
- Parameters:
text_overflow_mode (str) – How to handle longer texts than what the model supports. Either ‘TRUNCATE’ or ‘FAIL’.
- Returns:
A handle on the generated embeddings query.
- Return type:
dataikuapi.dss.llm.DSSLLMEmbeddingsQuery
- new_images_generation()#
- new_reranking()#
Create a new reranking query.
- Returns:
A handle on the generated reranking query.
- Return type:
dataikuapi.dss.llm.DSSLLMRerankingQuery
- as_langchain_llm(**data)#
Create a langchain-compatible LLM object for this LLM.
- Returns:
A langchain-compatible LLM object.
- Return type:
dataikuapi.dss.langchain.DKULLM
- as_langchain_chat_model(**data)#
Create a langchain-compatible chat LLM object for this LLM.
- Returns:
A langchain-compatible LLM object.
- Return type:
dataikuapi.dss.langchain.DKUChatModel
- as_langchain_embeddings(**data)#
Create a langchain-compatible embeddings object for this LLM.
- Returns:
A langchain-compatible embeddings object.
- Return type:
dataikuapi.dss.langchain.DKUEmbeddings
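Usage example (a minimal sketch; the project handle and LLM id are hypothetical):
llm = project.get_llm("my_llm_id")  # hypothetical LLM id
completion = llm.new_completion()
completion.with_message("You are a concise assistant.", role="system")
completion.with_message("Write one sentence about data pipelines.")
resp = completion.execute()
if resp.success:
    print(resp.text)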
- class dataikuapi.dss.llm.DSSLLMListItem(client, project_key, data)#
An item in a list of LLMs
Important
Do not instantiate this class directly, instead use
dataikuapi.dss.project.DSSProject.list_llms().
- to_llm()#
Convert the current item.
- Returns:
A handle for the LLM.
- Return type:
dataikuapi.dss.llm.DSSLLM
- property id#
- Returns:
The id of the LLM.
- Return type:
string
- property type#
- Returns:
The type of the LLM
- Return type:
string
- property description#
- Returns:
The description of the LLM
- Return type:
string
- class dataikuapi.dss.llm.DSSLLMCompletionsQuery(llm)#
A handle to interact with a multi-completion query. Completion queries allow you to send a prompt to a DSS-managed LLM and retrieve its response.
Important
Do not create this class directly, use
dataikuapi.dss.llm.DSSLLM.new_completions() instead.
- property settings#
- Returns:
The completion query settings.
- Return type:
dict
- new_completion()#
- new_guardrail(type)#
Start adding a guardrail to the request. You need to configure the returned object, and call add() to actually add it
- execute()#
Run the completions query and retrieve the LLM response.
- Returns:
The LLM response.
- Return type:
dataikuapi.dss.llm.DSSLLMCompletionsResponse
- with_json_output(schema=None, strict=None, compatible=None)#
Request the model to generate a valid JSON response, for models that support it.
Note that some models may also require you to explicitly request JSON output in the user or system prompt.
Caution
JSON output support is experimental for locally-running Hugging Face models.
- Parameters:
schema (dict) – (optional) If specified, request the model to produce a JSON response that adheres to the provided schema. Support varies across models/providers.
strict (bool) – (optional) If a schema is provided, whether to strictly enforce it. Support varies across models/providers.
compatible (bool) – (optional) Allow DSS to modify the schema in order to increase compatibility, depending on known limitations of the model/provider. Defaults to automatic.
- with_structured_output(model_type, strict=None, compatible=None)#
Instruct the model to generate a response as an instance of a specified Pydantic model.
This functionality relies on with_json_output and requires that the model support JSON output with a schema.
Caution
Structured output support is experimental for locally-running Hugging Face models.
- Parameters:
model_type (pydantic.BaseModel) – A Pydantic model class used for structuring the response.
strict (bool) – (optional) see with_json_output()
compatible (bool) – (optional) see with_json_output()
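Usage example (a hedged sketch of batching several prompts in one multi-completion query; llm is assumed to be a DSSLLM handle, and each entry of responses is assumed to expose the same fields as a single completion response):
completions = llm.new_completions()
for prompt in ["Summarize document A", "Summarize document B"]:
    single = completions.new_completion()
    single.with_message(prompt)
resp = completions.execute()
for single_resp in resp.responses:
    print(single_resp.text)  # assumption: each item behaves like a DSSLLMCompletionResponse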
- class dataikuapi.dss.llm.DSSLLMCompletionsQuerySingleQuery#
- new_multipart_message(role='user')#
Start adding a multipart-message to the completion query.
Use this to add image parts to the message.
- Parameters:
role (str) – The message role. Use system to set the LLM behavior, assistant to store predefined responses, user to provide requests or comments for the LLM to answer to. Defaults to user.
- Return type:
dataikuapi.dss.llm.DSSLLMCompletionQueryMultipartMessage
- with_message(message, role='user')#
Add a message to the completion query.
- Parameters:
message (str) – The message text.
role (str) – The message role. Use system to set the LLM behavior, assistant to store predefined responses, user to provide requests or comments for the LLM to answer to. Defaults to user.
- with_memory_fragment(memory_fragment)#
Add a memory fragment to the completion query.
- Parameters:
memory_fragment (dict) – The memory fragment returned by the model on the previous turn.
- with_tool_calls(tool_calls, role='assistant')#
Add tool calls to the completion query.
Caution
Tool calls support is experimental for locally-running Hugging Face models.
- Parameters:
tool_calls (list[dict]) – Calls to tools that the LLM requested to use.
role (str) – The message role. Defaults to assistant.
- with_tool_validation_requests(tool_validation_requests)#
Add tool validation requests to the completion query.
- Parameters:
tool_validation_requests (list[dict]) – Validation requests for tools that the agent requested to use.
- with_tool_validation_response(validation_request_id, validated, arguments=None)#
Add a tool validation response to the completion query.
- Parameters:
validation_request_id (str) – The validation request id, as provided by the agent in the conversation messages.
validated (bool) – Whether to validate or reject the tool call.
arguments (str) – (optional) The arguments to use for the tool call. If None, uses the arguments from the validation request.
- new_multipart_tool_output(tool_call_id, role='tool', output='')#
Start adding a multipart tool output to the completion query.
- Parameters:
tool_call_id (str) – The tool call id, as provided by the LLM in the conversation messages.
role (str) – The message role. Defaults to tool.
output (str) – The tool’s output. Defaults to an empty string.
- Return type:
dataikuapi.dss.llm.DSSLLMCompletionQueryMultipartToolOutput
- with_tool_output(tool_output, tool_call_id, role='tool')#
Add a tool message to the completion query.
- Parameters:
tool_output (str) – The tool output, as a string.
tool_call_id (str) – The tool call id, as provided by the LLM in the conversation messages.
role (str) – The message role. Defaults to tool.
- with_context(context)#
- class dataikuapi.dss.llm.DSSLLMCompletionsResponse(raw_resp, response_parser=None)#
A handle to interact with a multi-completion response.
Important
Do not create this class directly, use
dataikuapi.dss.llm.DSSLLMCompletionsQuery.execute() instead.
- property responses#
The array of responses
- class dataikuapi.dss.llm.DSSLLMRerankingQuery(llm)#
A handle to interact with a reranking query. Reranking queries allow you to send a text query and a list of documents to a DSS-managed ranking model and retrieve the documents ranked according to their relevance to the query.
Important
Do not create this class directly, use
dataikuapi.dss.llm.DSSLLM.new_reranking() instead.
- with_query(text)#
Sets the reranking text query.
- Parameters:
text (str) – The reranking text query.
- with_document(text)#
Adds a text document to the list of documents to be reranked.
- Parameters:
text (str) – The text document to be reranked.
- execute()#
Run the reranking query and retrieve the LLM response.
- Returns:
The LLM response.
- Return type:
dataikuapi.dss.llm.DSSLLMRerankingResponse
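Usage example (a short sketch of a reranking round-trip; llm is assumed to be a DSSLLM handle pointing to a reranking-capable model):
reranking = llm.new_reranking()
reranking.with_query("What is the refund policy?")
reranking.with_document("Refunds are processed within 14 days.")
reranking.with_document("Our offices are closed on Sundays.")
resp = reranking.execute()
if resp.success:
    for doc in resp.documents:
        print(doc)
else:
    print(resp.error_message)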
- class dataikuapi.dss.llm.DSSLLMRerankingResponse(raw_resp)#
A handle to interact with a ranking query result.
Important
Do not create this class directly, use
dataikuapi.dss.llm.DSSLLMRerankingQuery.execute() instead.
- property success#
- Returns:
The outcome of the reranking query.
- Return type:
bool
- property error_message#
- Returns:
The error message if the reranking query failed, None otherwise.
- Return type:
Union[str, None]
- property documents#
- Returns:
The array of reranked documents.
- Return type:
- property trace#
- Returns:
The trace of the reranking query if available, None otherwise.
- Return type:
Union[dict, None]
- class dataikuapi.dss.llm.DSSLLMCompletionQuery(llm)#
A handle to interact with a completion query. Completion queries allow you to send a prompt to a DSS-managed LLM and retrieve its response.
Important
Do not create this class directly, use
dataikuapi.dss.llm.DSSLLM.new_completion() instead.
- property settings#
- Returns:
The completion query settings.
- Return type:
dict
- new_guardrail(type)#
Start adding a guardrail to the request. You need to configure the returned object, and call add() to actually add it
- execute()#
Run the completion query and retrieve the LLM response.
- Returns:
The LLM response.
- Return type:
dataikuapi.dss.llm.DSSLLMCompletionResponse
- execute_streamed()#
Run the completion query and retrieve the LLM response as streamed chunks.
- Returns:
An iterator over the LLM response chunks
- Return type:
Iterator[Union[DSSLLMStreamedCompletionChunk, DSSLLMStreamedCompletionFooter]]
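Usage example (a minimal streaming sketch; llm is assumed to be a DSSLLM handle, and the exact payload of each chunk depends on the chunk and footer classes):
completion = llm.new_completion()
completion.with_message("Explain photosynthesis in a few words.")
for chunk in completion.execute_streamed():
    # each item is a DSSLLMStreamedCompletionChunk, the last one a DSSLLMStreamedCompletionFooter
    print(chunk)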
- new_multipart_message(role='user')#
Start adding a multipart-message to the completion query.
Use this to add image parts to the message.
- Parameters:
role (str) – The message role. Use system to set the LLM behavior, assistant to store predefined responses, user to provide requests or comments for the LLM to answer to. Defaults to user.
- Return type:
dataikuapi.dss.llm.DSSLLMCompletionQueryMultipartMessage
- new_multipart_tool_output(tool_call_id, role='tool', output='')#
Start adding a multipart tool output to the completion query.
- Parameters:
tool_call_id (str) – The tool call id, as provided by the LLM in the conversation messages.
role (str) – The message role. Defaults to tool.
output (str) – The tool’s output. Defaults to an empty string.
- Return type:
dataikuapi.dss.llm.DSSLLMCompletionQueryMultipartToolOutput
- with_context(context)#
- with_json_output(schema=None, strict=None, compatible=None)#
Request the model to generate a valid JSON response, for models that support it.
Note that some models may also require you to explicitly request JSON output in the user or system prompt.
Caution
JSON output support is experimental for locally-running Hugging Face models.
- Parameters:
schema (dict) – (optional) If specified, request the model to produce a JSON response that adheres to the provided schema. Support varies across models/providers.
strict (bool) – (optional) If a schema is provided, whether to strictly enforce it. Support varies across models/providers.
compatible (bool) – (optional) Allow DSS to modify the schema in order to increase compatibility, depending on known limitations of the model/provider. Defaults to automatic.
- with_memory_fragment(memory_fragment)#
Add a memory fragment to the completion query.
- Parameters:
memory_fragment (dict) – The memory fragment returned by the model on the previous turn.
- with_message(message, role='user')#
Add a message to the completion query.
- Parameters:
message (str) – The message text.
role (str) – The message role. Use system to set the LLM behavior, assistant to store predefined responses, user to provide requests or comments for the LLM to answer to. Defaults to user.
- with_structured_output(model_type, strict=None, compatible=None)#
Instruct the model to generate a response as an instance of a specified Pydantic model.
This functionality relies on with_json_output and requires that the model support JSON output with a schema.
Caution
Structured output support is experimental for locally-running Hugging Face models.
- Parameters:
model_type (pydantic.BaseModel) – A Pydantic model class used for structuring the response.
strict (bool) – (optional) see with_json_output()
compatible (bool) – (optional) see with_json_output()
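Usage example (a hedged structured-output sketch; llm is assumed to be a DSSLLM handle, and the parsed property is assumed to expose the structured result):
from pydantic import BaseModel

class MovieReview(BaseModel):
    title: str
    rating: int

completion = llm.new_completion()
completion.with_message("Review the movie Inception and rate it out of 10.")
completion.with_structured_output(MovieReview)
resp = completion.execute()
review = resp.parsed  # assumption: holds a MovieReview instance when the query succeeds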
- with_tool_calls(tool_calls, role='assistant')#
Add tool calls to the completion query.
Caution
Tool calls support is experimental for locally-running Hugging Face models.
- Parameters:
tool_calls (list[dict]) – Calls to tools that the LLM requested to use.
role (str) – The message role. Defaults to assistant.
- with_tool_output(tool_output, tool_call_id, role='tool')#
Add a tool message to the completion query.
- Parameters:
tool_output (str) – The tool output, as a string.
tool_call_id (str) – The tool call id, as provided by the LLM in the conversation messages.
role (str) – The message role. Defaults to tool.
- with_tool_validation_requests(tool_validation_requests)#
Add tool validation requests to the completion query.
- Parameters:
tool_validation_requests (list[dict]) – Validation requests for tools that the agent requested to use.
- with_tool_validation_response(validation_request_id, validated, arguments=None)#
Add a tool validation response to the completion query.
- Parameters:
validation_request_id (str) – The validation request id, as provided by the agent in the conversation messages.
validated (bool) – Whether to validate or reject the tool call.
arguments (str) – (optional) The arguments to use for the tool call. If None, uses the arguments from the validation request.
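Usage example (a hedged sketch of replaying a tool call on the next turn; it assumes a previous response resp whose tool_calls list is populated and whose entries carry an "id" field, and does not cover how tools are declared on the query):
follow_up = llm.new_completion()
follow_up.with_message("What is the weather in Paris?")
follow_up.with_tool_calls(resp.tool_calls)  # replay the calls requested by the assistant
follow_up.with_tool_output('{"temperature": "18C"}', tool_call_id=resp.tool_calls[0]["id"])
print(follow_up.execute().text)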
- class dataikuapi.dss.llm.DSSLLMCompletionQueryMultipartMessage(q, role)#
Important
Do not create this class directly, use
dataikuapi.dss.llm.DSSLLMCompletionQuery.new_multipart_message() or dataikuapi.dss.llm.DSSLLMCompletionsQuerySingleQuery.new_multipart_message().
- add()#
Add this message to the completion query
- class dataikuapi.dss.llm.DSSLLMCompletionQueryMultipartToolOutput(q, tool_call_id, role, output)#
Important
Do not create this class directly, use
dataikuapi.dss.llm.DSSLLMCompletionQuery.new_multipart_tool_output() or dataikuapi.dss.llm.DSSLLMCompletionsQuerySingleQuery.new_multipart_tool_output().
- add()#
Add this tool output to the completion query
- class dataikuapi.dss.llm.DSSLLMCompletionResponse(raw_resp=None, text=None, finish_reason=None, response_parser=None, trace=None)#
Response to a completion
- property json#
- Returns:
LLM response parsed as a JSON object
- property parsed#
- property success#
- Returns:
The outcome of the completion query.
- Return type:
bool
- property text#
- Returns:
The raw text of the LLM response.
- Return type:
Union[str, None]
- property tool_calls#
- Returns:
The tool calls of the LLM response.
- Return type:
Union[list, None]
- property tool_validation_requests#
- Returns:
The tool validation requests of the agent response.
- Return type:
Union[list, None]
- property memory_fragment#
- Returns:
Data generated by the model that must be passed back in the next query.
- Return type:
Union[dict, None]
- property log_probs#
- Returns:
The log probs of the LLM response.
- Return type:
Union[list, None]
- property trace#
- property total_usage#
- class dataikuapi.dss.llm.DSSLLMEmbeddingsQuery(llm, text_overflow_mode)#
A handle to interact with an embedding query. Embedding queries allow you to transform text into embedding vectors using a DSS-managed model.
Important
Do not create this class directly, use
dataikuapi.dss.llm.DSSLLM.new_embeddings() instead.
- add_text(text)#
Add text to the embedding query.
- Parameters:
text (str) – Text to add to the query.
- add_image(image, text=None)#
Add an image to the embedding query.
- Parameters:
image – Image content as bytes or str (base64)
text – Optional text (requires a multimodal model)
- new_guardrail(type)#
Start adding a guardrail to the request. You need to configure the returned object, and call add() to actually add it
- execute()#
Run the embedding query.
- Returns:
The results of the embedding query.
- Return type:
dataikuapi.dss.llm.DSSLLMEmbeddingsResponse
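Usage example (a minimal sketch; llm is assumed to be a DSSLLM handle pointing to an embedding model):
emb_query = llm.new_embeddings()
emb_query.add_text("The quick brown fox")
emb_query.add_text("jumps over the lazy dog")
resp = emb_query.execute()
vectors = resp.get_embeddings()  # one vector (list of floats) per added text
print(len(vectors), len(vectors[0]))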
- class dataikuapi.dss.llm.DSSLLMEmbeddingsResponse(raw_resp)#
A handle to interact with an embedding query result.
Important
Do not create this class directly, use
dataikuapi.dss.llm.DSSLLMEmbeddingsQuery.execute() instead.
- get_embeddings()#
Retrieve vectors resulting from the embeddings query.
- Returns:
A list of lists containing all embedding vectors.
- Return type:
list
- class dataikuapi.dss.llm.DSSLLMImageGenerationQuery(llm)#
A handle to interact with an image generation query.
Important
Do not create this class directly, use
dataikuapi.dss.llm.DSSLLM.new_images_generation() instead.
- with_prompt(prompt, weight=None)#
Add a prompt to the image generation query.
- Parameters:
prompt (str) – The prompt text.
weight (float) – Optional weight between 0 and 1 for the prompt.
- with_negative_prompt(prompt, weight=None)#
Add a negative prompt to the image generation query.
- Parameters:
prompt (str) – The prompt text.
weight (float) – Optional weight between 0 and 1 for the negative prompt.
- with_original_image(image, mode=None, weight=None)#
Add an image to the generation query.
To edit specific pixels of the original image, apply a mask by calling with_mask():
>>> query.with_original_image(image, mode="INPAINTING") # replace the pixels using a mask
To edit an image:
>>> query.with_original_image(image, mode="MASK_FREE") # edit the original image according to the prompt
>>> query.with_original_image(image, mode="VARY") # generates a variation of the original image
- Parameters:
image (Union[str, bytes]) – The original image as str in base 64 or bytes.
mode (str) – The editing mode. Mode support varies across models/providers.
weight (float) – The original image weight between 0 and 1.
- with_mask(mode, image=None)#
Add a mask for editing to the generation query. Call this method alongside with_original_image().
To edit parts of the image using a black mask (replace the black pixels):
>>> query.with_mask("MASK_IMAGE_BLACK", image=black_mask)
To edit parts of the image that are transparent (replace the transparent pixels):
>>> query.with_mask("ORIGINAL_IMAGE_ALPHA")
- Parameters:
mode (str) – The mask mode. Mode support varies across models/providers.
image (Union[str, bytes]) – The mask image to apply when editing, as a base64 str or bytes.
- new_guardrail(type)#
Start adding a guardrail to the request. You need to configure the returned object, and call add() to actually add it
- property height#
- Returns:
The generated image height in pixels.
- Return type:
Optional[int]
- property width#
- Returns:
The generated image width in pixels.
- Return type:
Optional[int]
- property fidelity#
- Returns:
From 0.0 to 1.0, how strongly to adhere to the prompt.
- Return type:
Optional[float]
- property quality#
- Returns:
Quality of the image to generate. Valid values depend on the targeted model.
- Return type:
Optional[str]
- property seed#
- Returns:
Seed of the image to generate, gives deterministic results when set.
- Return type:
Optional[int]
- property style#
- Returns:
Style of the image to generate. Valid values depend on the targeted model.
- Return type:
Optional[str]
- property images_to_generate#
- Returns:
Number of images to generate per query. Valid values depend on the targeted model.
- Return type:
Optional[int]
- property aspect_ratio#
- Returns:
The width/height aspect ratio or None if either is not set.
- Return type:
Optional[float]
- execute()#
Executes the image generation
- Return type:
dataikuapi.dss.llm.DSSLLMImageGenerationResponse
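Usage example (a short sketch; llm is assumed to be a DSSLLM handle pointing to an image generation model):
img_query = llm.new_images_generation()
img_query.with_prompt("A lighthouse at sunset, oil painting style")
resp = img_query.execute()
if resp.success:
    with open("lighthouse.png", "wb") as f:
        f.write(resp.first_image())  # bytes by default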
- class dataikuapi.dss.llm.DSSLLMImageGenerationResponse(raw_resp)#
A handle to interact with an image generation response.
Important
Do not create this class directly, use
dataikuapi.dss.llm.DSSLLMImageGenerationQuery.execute() instead.
- property success#
- Returns:
The outcome of the image generation query.
- Return type:
bool
- first_image(as_type='bytes')#
- Parameters:
as_type (str) – The type of image to return: ‘bytes’ for raw bytes, ‘str’ for a base64 string.
- Returns:
The first generated image as bytes or str depending on the as_type parameter.
- Return type:
Union[bytes,str]
- get_images(as_type='bytes')#
- Parameters:
as_type (str) – The type of images to return: ‘bytes’ for raw bytes, ‘str’ for base64 strings.
- Returns:
The generated images as bytes or str depending on the as_type parameter.
- Return type:
Union[List[bytes], List[str]]
- property images#
- Returns:
The generated images in bytes format.
- Return type:
List[bytes]
- property trace#
- property total_usage#
- class dataikuapi.dss.knowledgebank.DSSKnowledgeBankListItem(client, data)#
An item in a list of knowledge banks
Important
Do not instantiate this class directly, instead use
dataikuapi.dss.project.DSSProject.list_knowledge_banks().
- to_knowledge_bank()#
Convert the current item.
- Returns:
A handle for the knowledge bank.
- Return type:
dataikuapi.dss.knowledgebank.DSSKnowledgeBank
- as_core_knowledge_bank()#
Get the dataiku.KnowledgeBank object corresponding to this knowledge bank
- Return type:
dataiku.KnowledgeBank
- property project_key#
- Returns:
The project key
- Return type:
string
- property id#
- Returns:
The id of the knowledge bank.
- Return type:
string
- property name#
- Returns:
The name of the knowledge bank.
- Return type:
string
- class dataikuapi.dss.knowledgebank.DSSKnowledgeBank(client, project_key, id)#
A handle to interact with a DSS-managed knowledge bank.
Important
Do not create this class directly, use
dataikuapi.dss.project.DSSProject.get_knowledge_bank() instead.
- property id#
- as_core_knowledge_bank()#
Get the dataiku.KnowledgeBank object corresponding to this knowledge bank
- Return type:
dataiku.KnowledgeBank
- as_langchain_retriever(**data)#
Get the current version of this knowledge bank as a Langchain Retriever object.
- Parameters:
data (dict) – keyword arguments to pass to the dataikuapi.dss.knowledgebank.DSSKnowledgeBank.search() function
- Returns:
a langchain-compatible retriever
- Return type:
dataikuapi.dss.langchain.knowledge_bank.DKUKnowledgeBankRetriever
- get_settings()#
Get the knowledge bank’s definition
- Returns:
a handle on the knowledge bank definition
- Return type:
dataikuapi.dss.knowledgebank.DSSKnowledgeBankSettings
- delete()#
Delete the knowledge bank
- build(job_type='NON_RECURSIVE_FORCED_BUILD', wait=True)#
Start a new job to build this knowledge bank and wait for it to complete. Raises if the job failed.
job = knowledge_bank.build()
print("Job %s done" % job.id)
- Parameters:
job_type – the job type. One of RECURSIVE_BUILD, NON_RECURSIVE_FORCED_BUILD or RECURSIVE_FORCED_BUILD
wait (bool) – whether to wait for the job completion before returning the job handle, defaults to True
- Returns:
the dataikuapi.dss.job.DSSJob job handle corresponding to the build job
- Return type:
dataikuapi.dss.job.DSSJob
- search(query, max_documents=10, search_type='SIMILARITY', similarity_threshold=0.5, mmr_documents_count=20, mmr_factor=0.25, hybrid_use_advanced_reranking=False, hybrid_rrf_rank_constant=60, hybrid_rrf_rank_window_size=4)#
Search for documents in a knowledge bank
MMR and HYBRID search types are not supported by every vector store.
- Parameters:
query (str) – what to search for
max_documents (int) – the maximum number of documents to return, default to 10
search_type (str) – the search algorithm to use. One of SIMILARITY, SIMILARITY_THRESHOLD, MMR or HYBRID. Defaults to SIMILARITY
similarity_threshold (float) – only return documents with a similarity score above this threshold, typically between 0 and 1, only applied with search_type=SIMILARITY_THRESHOLD, defaults to 0.5
mmr_documents_count (int) – number of documents to consider before selecting the documents to retrieve, only applied with search_type=MMR, defaults to 20
mmr_factor (float) – weight of diversity vs relevancy, between 0 and 1, where 0 favors maximum diversity and 1 favors maximum relevancy, only applied with search_type=MMR, defaults to 0.25
hybrid_use_advanced_reranking (bool) – whether to use proprietary rerankers, valid for Azure AI and ElasticSearch vector stores, defaults to False
hybrid_rrf_rank_constant (int) – higher values give more weight to lower-ranked documents, valid for ElasticSearch vector stores, defaults to 60
hybrid_rrf_rank_window_size (int) – number of documents to consider from each search type, valid for ElasticSearch vector stores, defaults to 4
- Returns:
a result object with a list of documents that matched the query
- Return type:
dataikuapi.dss.knowledgebank.DSSKnowledgeBankSearchResult
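Usage example (a minimal search sketch; the project handle and knowledge bank id are hypothetical):
kb = project.get_knowledge_bank("my_kb_id")  # hypothetical knowledge bank id
result = kb.search("How do I configure SSO?", max_documents=5)
for doc in result.documents:
    print(doc.score, doc.text[:80], doc.metadata)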
- class dataikuapi.dss.knowledgebank.DSSKnowledgeBankSettings(client, project_key, settings)#
Settings for a knowledge bank
Important
Do not instantiate directly, use
dataikuapi.dss.knowledgebank.DSSKnowledgeBank.get_settings() instead
- property project_key#
Returns the project key of the knowledge bank
- Return type:
str
- property id#
Returns the identifier of the knowledge bank
- Return type:
str
- property vector_store_type#
Returns the type of storage backing the vector store (could be CHROMA, PINECONE, ELASTICSEARCH, AZURE_AI_SEARCH, VERTEX_AI_GCS_BASED, FAISS, QDRANT_LOCAL)
- Return type:
str
- set_metadata_schema(schema)#
Sets the schema for metadata fields.
- Parameters:
schema (Dict[str, str]) – the schema, as a mapping metadata_field -> type
- set_images_folder(managed_folder_id, project_key=None)#
Sets the images folder to use with this knowledge bank.
- Parameters:
managed_folder_id (str) – The (managed) images folder id.
project_key (Optional[str]) – The images folder project key, if different from this knowledge bank’s project key. Defaults to None.
- get_images_folder()#
Returns the images folder of the knowledge bank, if any.
- Returns:
the managed folder or None
- Return type:
DSSManagedFolder | None
- get_raw()#
Returns the raw settings of the knowledge bank
- Returns:
the raw settings of the knowledge bank
- Return type:
dict
- save()#
Saves the settings on the knowledge bank
- class dataikuapi.dss.knowledgebank.DSSKnowledgeBankSearchResult(kb, documents)#
The result of a search in a knowledge bank, contains documents that matched the query
Each document is a
dataikuapi.dss.knowledgebank.DSSKnowledgeBankSearchResultDocument
- property documents#
Returns a list of documents that matched a search query
- Returns:
a list of result documents
- Return type:
list of dataikuapi.dss.knowledgebank.DSSKnowledgeBankSearchResultDocument
- property managed_folder_id#
- class dataikuapi.dss.knowledgebank.DSSKnowledgeBankSearchResultDocument(result, text, score, metadata)#
A document found by searching a knowledge bank with dataikuapi.dss.knowledgebank.DSSKnowledgeBank.search()
- property text#
Returns the text from the knowledge bank for this document
- Returns:
the text for this document
- Return type:
str
- property score#
Returns the match score for this document
- Returns:
the score for this document
- Return type:
float
- property metadata#
Returns metadata from the knowledge bank for this document
- Returns:
metadata for this document
- Return type:
dict
- property images#
Returns images for this document
- Returns:
a list of images references or None
- Return type:
list[ManagedFolderImageRef] | None
- property file_ref#
Returns the file reference for this document
- Returns:
a file reference or None
- Return type:
ManagedFolderDocumentRef | None
- class dataiku.KnowledgeBank(id, project_key=None, context_project_key=None)#
This is a handle to interact with a Dataiku Knowledge Bank flow object
- set_context_project_key(context_project_key)#
Set the context project key used to report calls to the embedding LLM associated with this knowledge bank.
- Parameters:
context_project_key (str) – the context project key
- get_current_version(trusted_object: TrustedObject | None = None)#
Gets the current version for this knowledge bank.
- Parameters:
trusted_object (Optional["TrustedObject"]) – the optional trusted object using the kb
- Return type:
str
- get_writer()#
Gets a writer on the latest vector store files on disk. For local vector stores, downloads metadata files as well as data files. For remote vector stores, only downloads metadata files.
The vector store files are automatically uploaded when the context manager is closed.
Note
Each call creates an isolated writer which works on its own folder.
- as_langchain_retriever(search_type='similarity', search_kwargs=None, vectorstore_kwargs=None, **retriever_kwargs)#
Get the current version of this knowledge bank as a Langchain Retriever object.
- Return type:
langchain_core.vectorstores.VectorStoreRetriever
- as_langchain_vectorstore(**vectorstore_kwargs)#
Get the current version of this knowledge bank as a Langchain Vectorstore object.
- Return type:
langchain_core.vectorstores.VectorStore
- get_multipart_context(docs)#
Convert retrieved documents from the vector store to a multipart context. The multipart context contains the parts that can be added to a completion query
- Parameters:
docs (List[Document]) – A list of retrieved documents from the langchain retriever
- Raises:
Exception – If the knowledge bank does not contain multimodal content
- Returns:
A multipart context object composed by a list of parts containing text or images
- Return type:
MultipartContext
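Usage example (a hedged retrieval-augmented sketch; llm is assumed to be a DSSLLM handle, the knowledge bank id and search_kwargs are hypothetical, and add_to_completion_query() is assumed to also accept a single completion query even though the documented parameter type is DSSLLMCompletionsQuerySingleQuery):
import dataiku

kb = dataiku.KnowledgeBank("my_kb_id")  # hypothetical knowledge bank id
retriever = kb.as_langchain_retriever(search_kwargs={"k": 4})
docs = retriever.invoke("What does the diagram on page 3 show?")

context = kb.get_multipart_context(docs)
completion = llm.new_completion()
if context.is_text_only():
    completion.with_message(context.to_text())
else:
    context.add_to_completion_query(completion)  # adds text and image parts as one multipart message
completion.with_message("Answer the question using the retrieved context above.")
print(completion.execute().text)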
- class dataikuapi.dss.langchain.knowledge_bank.DKUKnowledgeBankRetriever(*args: Any, **kwargs: Any)#
Langchain-compatible retriever for a knowledge bank
Important
Do not instantiate directly, use
dataikuapi.dss.knowledgebank.DSSKnowledgeBank.as_langchain_retriever() instead
- SEARCH_PARAMETERS_NAMES: ClassVar = ['max_documents', 'search_type', 'similarity_threshold', 'mmr_documents_count', 'mmr_factor', 'hybrid_use_advanced_reranking', 'hybrid_rrf_rank_constant', 'hybrid_rrf_rank_window_size']#
Valid parameter names for the search method
- class dataikuapi.dss.retrieval_augmented_llm.DSSRetrievalAugmentedLLMListItem(client, data)#
An item in a list of retrieval-augmented LLMs
Important
Do not instantiate this class directly, instead use
dataikuapi.dss.project.DSSProject.list_retrieval_augmented_llms().
- property project_key#
- Returns:
The project key
- Return type:
string
- property id#
- Returns:
The id of the retrieval-augmented LLM.
- Return type:
string
- property name#
- Returns:
The name of the retrieval-augmented LLM.
- Return type:
string
- as_llm()#
Returns this retrieval-augmented LLM as a usable dataikuapi.dss.llm.DSSLLM for querying
- class dataikuapi.dss.retrieval_augmented_llm.DSSRetrievalAugmentedLLM(client, project_key, id)#
A handle to interact with a DSS-managed retrieval-augmented LLM.
Important
Do not create this class directly, use
dataikuapi.dss.project.DSSProject.get_retrieval_augmented_llm() instead.
- property id#
- as_llm()#
Returns this retrieval-augmented LLM as a usable dataikuapi.dss.llm.DSSLLM for querying
- get_settings()#
Get the retrieval-augmented LLM’s definition
- Returns:
a handle on the retrieval-augmented LLM definition
- Return type:
dataikuapi.dss.retrieval_augmented_llm.DSSRetrievalAugmentedLLMSettings
- delete()#
Delete the retrieval-augmented LLM
- class dataikuapi.dss.retrieval_augmented_llm.DSSRetrievalAugmentedLLMSettings(client, settings)#
Settings for a retrieval-augmented LLM
Important
Do not instantiate directly, use
dataikuapi.dss.retrieval_augmented_llm.DSSRetrievalAugmentedLLM.get_settings() instead
- get_version_ids()#
- property active_version#
Returns the active version of this retrieval-augmented LLM. May return None if no version is declared as active
- get_version_settings(version_id)#
- get_raw()#
Returns the raw settings of the retrieval-augmented LLM
- Returns:
the raw settings of the retrieval-augmented LLM
- Return type:
dict
- save()#
Saves the settings on the retrieval-augmented LLM
- class dataikuapi.dss.retrieval_augmented_llm.DSSRetrievalAugmentedLLMVersionSettings(version_settings)#
- get_raw()#
- property llm_id#
Get or set the id of the LLM used by this version
- Return type:
str
- class dataiku.core.knowledge_bank.MultipartContext#
A reference to a list of text or images parts that can be added to a completion query
- append(part)#
- Parameters:
part (MultipartContent) – Part of a completion query
- add_to_completion_query(completion, role='user')#
Add the accumulated parts as a new multipart-message to the completion query
- Parameters:
completion (DSSLLMCompletionsQuerySingleQuery) – the completion query to be edited
role (str) – The message role. Use system to set the LLM behavior, assistant to store predefined responses, user to provide requests or comments for the LLM to answer to. Defaults to user.
- is_text_only()#
- Returns:
True if all the accumulated parts are text parts, False otherwise
- Return type:
bool
- to_text()#
- Returns:
the concatenation of accumulated text parts (other parts are skipped)
- Return type:
str
- class dataiku.core.vector_stores.data.writer.VectorStoreWriter(project_key: str, kb_full_id: str, isolated_folder: VectorStoreIsolatedFolder)#
A helper class to write vector store data to the underlying knowledge bank folder.
Important
Do not create this class directly, use
dataiku.KnowledgeBank.get_writer()
- property folder_path: str#
The path to the underlying folder on the filesystem.
- clear()#
Clears the vector store data stored in the underlying folder.
- save()#
Saves the content of the underlying folder as a new knowledge bank version.
- Returns:
the created version
- Return type:
str
- as_langchain_vectorstore(**vectorstore_kwargs) VectorStore#
Gets this writer as a Langchain Vectorstore object
- Return type:
langchain_core.vectorstores.VectorStore
- get_metadata_formatter() DocumentMetadataFormatter#
Gets the metadata formatter to help writing documents to this vector store.
- Return type:
DocumentMetadataFormatter
- class dataiku.core.vector_stores.data.metadata.DocumentMetadataFormatter(project_key: str, vector_store_implementation)#
Helper class to format vector store documents metadata for usage within Dataiku.
Important
Do not create this class directly, use
VectorStoreWriter.get_metadata_formatter() instead.
- with_security_tokens(security_tokens: List[str])#
Adds the security tokens in the metadata.
- Parameters:
security_tokens – The security tokens.
- with_original_document(folder_id: str, path: str, project_key: str | None = None)#
Adds the original document information in the metadata.
- Parameters:
folder_id – The id of the managed folder that contains the original document.
path – The original document path in the managed folder.
project_key – The managed folder project key. Defaults to the project key of the knowledge bank.
- with_original_document_ref(document_ref: ManagedFolderDocumentRef, project_key: str | None = None)#
Adds the original document information in the metadata.
- Parameters:
document_ref – The reference to the original document.
project_key – The managed folder project key. Defaults to the project key of the knowledge bank.
- with_original_document_page_range(page_start: int, page_end: int)#
Adds the page range in the original document. This metadata is intended to start at index 1.
- Parameters:
page_start – The original document page where the extract started. Must be positive, and lower or equal to page_end.
page_end – The original document page where the extract ended. Must be positive, and greater or equal to page_start.
- with_original_document_section_outline(section_outline: List[str])#
Adds a section outline in the metadata. Section outlines can be derived from the document’s extracted content. For example, it may contain the titles of the sections that contain this part of the original document, from top-level headers to lower-level headers.
- Parameters:
section_outline – The section outline.
- static make_captioned_images(caption: str, image_paths: List[str]) Dict[str, str | List[str]]#
Construct a captioned-images dictionary from a caption and image paths.
- Parameters:
caption – The caption.
image_paths – The paths to the images, relative to the managed folder that is configured in the knowledge bank.
- with_retrieval_content(text: str | None = None, image_paths: List[str] | None = None, captioned_images: Dict[str, str | List[str]] | None = None)#
- Adds the retrieval content in the metadata. Accepts either:
text content
image paths relative to the knowledge bank images folder
captioned images
- Parameters:
text – The text content.
image_paths – The paths to the images, relative to the managed folder that is configured in the knowledge bank.
captioned_images – Captioned images constructed using make_captioned_images().
- format_metadata(document: Document) Document#
Formats the metadata in the provided document, so that it can be used for retrieval in Dataiku.
- Parameters:
document – The Langchain document which metadata must be formatted.
- Returns:
The document with updated metadata.
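Usage example (a rough end-to-end sketch of writing a document with formatted metadata into a knowledge bank; the knowledge bank id, folder id, and path are hypothetical, and the exact interplay between the with_* builder calls and format_metadata() may differ):
import dataiku
from langchain_core.documents import Document

kb = dataiku.KnowledgeBank("my_kb_id")  # hypothetical knowledge bank id
with kb.get_writer() as writer:
    formatter = writer.get_metadata_formatter()
    formatter.with_original_document(folder_id="my_folder_id", path="policies/refunds.pdf")
    formatter.with_original_document_page_range(1, 2)
    doc = formatter.format_metadata(Document(page_content="Refunds are processed within 14 days."))
    writer.as_langchain_vectorstore().add_documents([doc])
    writer.save()  # records the folder content as a new knowledge bank version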
- class dataikuapi.dss.langchain.DKULLM(*args: Any, **kwargs: Any)#
Langchain-compatible wrapper around Dataiku-mediated LLMs
Note
Direct instantiation of this class is possible from within DSS, though it’s recommended to instead use
dataikuapi.dss.llm.DSSLLM.as_langchain_llm().
Example:
llm = dkullm.as_langchain_llm()

# single prompt
print(llm.invoke("tell me a joke"))

# multiple prompts with batching
for response in llm.batch(["tell me a joke in English", "tell me a joke in French"]):
    print(response)

# streaming, with stop sequence
for chunk in llm.stream("Explain photosynthesis in a few words in English then French", stop=["dioxyde de"]):
    print(chunk, end="", flush=True)
- llm_id: str#
LLM identifier to use
- max_tokens: int = None#
Denotes the number of tokens to predict per generation. Deprecated: use key “maxOutputTokens” in field “completion_settings”.
- temperature: float = None#
A non-negative float that tunes the degree of randomness in generation. Deprecated: use key “temperature” in field “completion_settings”.
- top_k: int = None#
Number of tokens to pick from when sampling. Deprecated: use key “topK” in field “completion_settings”.
- top_p: float = None#
Sample from the top tokens whose probabilities add up to p. Deprecated: use key “topP” in field “completion_settings”.
- completion_settings: dict = {}#
Settings applied to completion queries, all keys are optional and can include: maxOutputTokens, temperature, topK, topP, frequencyPenalty, presencePenalty, logitBias, logProbs and topLogProbs.
- class dataikuapi.dss.langchain.DKUChatModel(*args: Any, **kwargs: Any)#
Langchain-compatible wrapper around Dataiku-mediated chat LLMs
Note
Direct instantiation of this class is possible from within DSS, though it’s recommended to instead use
dataikuapi.dss.llm.DSSLLM.as_langchain_chat_model().
Example:
from langchain_core.prompts import ChatPromptTemplate

llm = dkullm.as_langchain_chat_model()
prompt = ChatPromptTemplate.from_template("tell me a joke about {topic}")
chain = prompt | llm
for chunk in chain.stream({"topic": "parrot"}):
    print(chunk.content, end="", flush=True)
- llm_id: str#
LLM identifier to use
- max_tokens: int = None#
Denotes the number of tokens to predict per generation. Deprecated: use key “maxOutputTokens” in field “completion_settings”.
- temperature: float = None#
A non-negative float that tunes the degree of randomness in generation. Deprecated: use key “temperature” in field “completion_settings”.
- top_k: int = None#
Number of tokens to pick from when sampling. Deprecated: use key “topK” in field “completion_settings”.
- top_p: float = None#
Sample from the top tokens whose probabilities add up to p. Deprecated: use key “topP” in field “completion_settings”.
- completion_settings: dict = {}#
Settings applied to completion queries, all keys are optional and can include: maxOutputTokens, temperature, topK, topP, frequencyPenalty, presencePenalty, logitBias, logProbs and topLogProbs.
- bind_tools(tools: Sequence[Dict[str, Any] | Type[pydantic.BaseModel] | Callable | langchain_core.tools.BaseTool], tool_choice: dict | str | Literal['auto', 'none', 'required', 'any'] | bool | None = None, strict: bool | None = None, compatible: bool | None = None, **kwargs: Any)#
Bind tool-like objects to this chat model.
- Args:
- tools: A list of tool definitions to bind to this chat model.
Can be a dictionary, pydantic model, callable, or BaseTool. Pydantic models, callables, and BaseTools will be automatically converted to their schema dictionary representation.
- tool_choice: Which tool to request the model to call.
- Options are:
name of the tool (str): call the corresponding tool;
“auto”: automatically select a tool (or no tool);
“none”: do not call a tool;
“any” or “required”: force at least one tool call;
True: call the one given tool (requires tools to be of length 1);
a dict of the form: {“type”: “tool_name”, “name”: “<<tool_name>>”}, or {“type”: “required”}, or {“type”: “any”} or {“type”: “none”}, or {“type”: “auto”};
- strict: If specified, request the model to produce a JSON tool call that adheres to the provided schema. Support varies across models/providers.
- compatible: Allow DSS to modify the schema in order to increase compatibility, depending on known limitations of the model/provider. Defaults to automatic.
kwargs: Any additional parameters to bind.
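Usage example (a hedged sketch of binding a Pydantic tool schema; dkullm is assumed to be a DSSLLM handle as in the examples above, and the tool is hypothetical):
from pydantic import BaseModel, Field

class GetWeather(BaseModel):
    """Get the current weather for a city."""
    city: str = Field(description="Name of the city")

chat = dkullm.as_langchain_chat_model()
chat_with_tools = chat.bind_tools([GetWeather], tool_choice="auto")
message = chat_with_tools.invoke("What's the weather like in Lisbon?")
print(message.tool_calls)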
- class dataikuapi.dss.langchain.DKUEmbeddings(*args: Any, **kwargs: Any)#
Langchain-compatible wrapper around Dataiku-mediated embedding LLMs
Note
Direct instantiation of this class is possible from within DSS, though it’s recommended to instead use
dataikuapi.dss.llm.DSSLLM.as_langchain_embeddings().
- llm_id: str#
LLM identifier to use
- embed_documents(texts: List[str]) List[List[float]]#
Call out to Dataiku-mediated LLM
- Args:
texts: The list of texts to embed.
- Returns:
List of embeddings, one for each text.
- async aembed_documents(texts: List[str]) List[List[float]]#
- embed_query(text: str) List[float]#
- async aembed_query(text: str) List[float]#
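Usage example (a minimal sketch; dkullm is assumed to be a DSSLLM handle pointing to an embedding model):
embeddings = dkullm.as_langchain_embeddings()
vectors = embeddings.embed_documents(["first text", "second text"])
query_vector = embeddings.embed_query("a question")
print(len(vectors), len(query_vector))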
- class dataikuapi.dss.document_extractor.DocumentExtractor(client, project_key)#
A handle to interact with a DSS-managed Document Extractor.
- vlm_extract(images, llm_id, llm_prompt=None, window_size=1, window_overlap=0)#
Extract text content from images using a vision LLM: for each group of ‘window_size’ consecutive images, prompt the given vision LLM to summarize them in plain text.
- Parameters:
images (iterable(InlineImageRef) | iterable(ManagedFolderImageRef)) – iterable over the images to be described by the vision LLM
llm_id (str) – the identifier of a vision LLM
llm_prompt (str) – Custom prompt to extract text from the images
window_size (int) – Number of consecutive images to represent in a single output. Use -1 for all images.
window_overlap (int) – Number of overlapping images between two windows of images. Must be less than window_size.
- Returns:
Extracted text content per group of images
- Return type:
dataikuapi.dss.document_extractor.VlmExtractorResponse
- structured_extract(document, max_section_depth=6, image_handling_mode='IGNORE', ocr_engine='AUTO', languages='en', llm_id=None, llm_prompt=None, output_managed_folder=None, image_validation=True)#
Splits a document (txt, md, pdf, docx, pptx, html, png, jpg, jpeg) into a structured hierarchy of sections and texts
- Parameters:
document (DocumentRef) – document to split
max_section_depth (int) – Maximum depth of sections to extract - consider deeper sections as plain text. If set to 0, extract the whole document as one single section.
image_handling_mode (str) – How to handle images in the document. Can be one of: ‘IGNORE’, ‘OCR’, ‘VLM_ANNOTATE’.
ocr_engine (str) – Engine to perform the OCR, either ‘AUTO’, ‘EASYOCR’ or ‘TESSERACT’. Auto uses tesseract if available, otherwise easyOCR.
languages (str | list) – OCR languages to use for recognition. List (either a comma-separated string, or list of strings) of ISO639 languages codes.
llm_id (str) – ID of the (vision-capable) LLM to use for annotating images when image_handling_mode is ‘VLM_ANNOTATE’
llm_prompt (str) – Custom prompt to extract text from the images
output_managed_folder (str) – id of a managed folder in which to store the images from the document. When unspecified, inline images are returned in the response.
image_validation (boolean) – Whether to validate images before processing. If True, images classified as barcodes, icons, logos, QR codes, signatures, or stamps are skipped.
- Returns:
Structured content of the document
- Return type:
dataikuapi.dss.document_extractor.StructuredExtractorResponse
- text_extract(document, image_handling_mode='IGNORE', ocr_engine='AUTO', languages='en')#
Extract raw text from a document (txt, md, pdf, docx, pptx, html, png, jpg, jpeg).
Some documents like PDF or PowerPoint have an inherent structure (page, bookmarks or slides); for those documents, the returned results contain this structure. Otherwise, the document’s structure is not inferred, resulting in one or more text item(s).
PDF files are converted to images and processed using OCR if image_handling_mode is set to ‘OCR’, recommended for scanned PDFs. Otherwise, their text content is extracted.
- Parameters:
document (DocumentRef) – document to extract text from
image_handling_mode (str) – How to handle images in the document, either ‘IGNORE’ or ‘OCR’.
ocr_engine (str) – Engine to perform the OCR, either ‘AUTO’, ‘EASYOCR’ or ‘TESSERACT’. Auto uses tesseract if available, otherwise easyOCR.
languages (str | list) – OCR languages to use for recognition. List (either a comma-separated string, or list of strings) of ISO639 languages codes.
- Returns:
Text content of the document
- Return type:
dataikuapi.dss.document_extractor.TextExtractorResponse
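Usage example (a minimal sketch; the dataikuapi client, project key, managed folder id, and document path are hypothetical):
from dataikuapi.dss.document_extractor import DocumentExtractor, ManagedFolderDocumentRef

doc_extractor = DocumentExtractor(client, "MY_PROJECT")  # hypothetical project key
doc_ref = ManagedFolderDocumentRef("contracts/agreement.pdf", "my_folder_id")
resp = doc_extractor.text_extract(doc_ref, image_handling_mode="OCR")
if resp.success:
    print(resp.text_content)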
- generate_pages_screenshots(document, output_managed_folder=None, offset=0, fetch_size=10, keep_fetched=True)#
Generate per-page screenshots of a document, returning an iterable over the screenshots.
Usage example:
doc_extractor = DocumentExtractor(client, "project_key")
document_ref = ManagedFolderDocumentRef('path_in_folder/document.pdf', folder_id)
fetch_size = 10
response = doc_extractor.generate_pages_screenshots(document_ref, fetch_size=fetch_size)

# The first 10 screenshots (fetch_size) are computed & retrieved immediately within the response.
first_screenshot = response.fetch_screenshot(0)  # InlineImageRef or ManagedFolderImageRef

# Iterating through the first 10 items is instantaneous as they are already fetched.
# Iterating from the 11th item triggers new backend requests (processing pages 11-20, fetch screenshots).
for idx, screenshot in enumerate(response):
    if (idx % fetch_size == 0) and idx != 0:
        print(f"Computing the next {fetch_size} screenshots")
    print(f"Screenshot #{idx}: {screenshot.as_json()}")

# Alternatively, response being an iterable, you can compute & fetch all screenshots at once:
response = doc_extractor.generate_pages_screenshots(document_ref)
screenshots = list(response)  # list of InlineImageRef or ManagedFolderImageRef objects
- Parameters:
document (DocumentRef) – input document (txt | pdf | docx | doc | odt | pptx | ppt | odp | xlsx | xls | xlsm | xlsb | ods | png | jpg | jpeg).
output_managed_folder (str) – id of a managed folder to store the generated screenshots as png. When unspecified, return inline images in the response.
offset (int) – start extraction from offset screenshots.
fetch_size (int) – number of screenshots to fetch in each request, iterating on the next result automatically sends a new request for another fetch_size screenshots
keep_fetched (boolean) – whether to keep previous screenshots requests within this response object when fetching next ones.
- Returns:
An iterable over the result screenshots
- Return type:
dataikuapi.dss.document_extractor.ScreenshotterResponse
- convert_to_pdf(document, output_managed_folder=None, path_in_output_folder=None)#
Convert a document to PDF format.
- Parameters:
document (DocumentRef) – input document (docx | doc | odt | pptx | ppt | odp | xlsx | xls | xlsm | xlsb | ods | png | jpg | jpeg).
output_managed_folder (str) – id of an optional managed folder to store the generated PDF document. If unspecified, the document is not stored and should be downloaded from the returned PDFConversionResponse.
path_in_output_folder (str) – optional path of the generated PDF document in the output managed folder. If unspecified and the input document is in a managed folder, defaults to the input document path (with a .pdf extension).
- Returns:
A PDFConversionResponse, used to reference and download the resulting PDF.
- Return type:
dataikuapi.dss.document_extractor.PDFConversionResponse
- class dataikuapi.dss.document_extractor.PDFConversionResponse(client, project_key, document, output_managed_folder, path_in_output_folder=None)#
A handle to interact with a document PDF conversion result.
Important
Do not create this class directly, use
convert_to_pdf() instead.
- get_raw()#
- stream()#
Download the converted PDF as a binary stream.
- Returns:
The converted PDF file as a binary stream.
- Return type:
requests.Response
- download_to_file(path)#
Download the converted PDF to a local file.
- Parameters:
path (str) – the path where to download the PDF file
- Returns:
None
- property document#
- Returns:
The reference to the stored PDF if applicable, otherwise None
- Return type:
- property success#
- Returns:
The outcome of the PDF conversion request.
- Return type:
bool
- class dataikuapi.dss.document_extractor.ScreenshotterResponse(client, project_key, screenshotter_request, keep_fetched)#
A handle to interact with a screenshotter result. Iterable over the
ImageRef screenshots.
Important
Do not create this class directly, use
generate_pages_screenshots() instead.
- get_raw()#
- fetch_screenshot(screenshot_index)#
- property success#
- Returns:
The outcome of the extractor request / latest fetch.
- Return type:
bool
- property has_next#
- Returns:
Whether there are more screenshots to extract after this response
- Return type:
bool
- property total_count#
- Returns:
Total number of screenshots that can be extracted from the document. In most cases corresponds to the number of pages of the document.
- Return type:
int
- property document#
- Returns:
The reference to the screenshotted document.
- Return type:
- class dataikuapi.dss.document_extractor.TextExtractorResponse(data)#
A handle to interact with a document text extractor result.
Important
Do not create this class directly, use
text_extract() instead.
- get_raw()#
- property success#
- Returns:
The outcome of the text extraction request.
- Return type:
bool
- property content#
The content of the document as extracted by text_extract() can contain some structure inherent to the document. For instance, PDF documents are extracted page by page, and PowerPoint documents slide by slide. Some PDF documents contain bookmarks that can be used to separate them into sections. Other documents yield a single section with one or more text item(s).
This property returns a dict that represents this structure.
- Returns:
The structure of the document as a dictionary
- Return type:
dict
- property text_content#
- Returns:
The textual content of the document as a string.
- Return type:
str
- class dataikuapi.dss.document_extractor.StructuredExtractorResponse(data)#
A handle to interact with a document structured extractor result.
Important
Do not create this class directly, use
structured_extract() instead.
- get_raw()#
- property success#
- Returns:
The outcome of the structured extractor request.
- Return type:
bool
- property content#
- Returns:
The structure of the document as a dictionary
- Return type:
dict
- property text_chunks#
- Returns:
A flattened text-only view of the document’s chunks, along with their outline.
- Return type:
list[dict]
- class dataikuapi.dss.document_extractor.VlmExtractorResponse(data)#
A handle to interact with a VLM extractor result.
Important
Do not create this class directly, use
vlm_extract() instead.
- get_raw()#
- property success#
- Returns:
The outcome of the extractor request.
- Return type:
bool
- property chunks#
Content extracted from the original document, split into chunks
- Returns:
extracted text content per chunk.
- Return type:
list[str]
- class dataikuapi.dss.document_extractor.DocumentRef(mime_type=None)#
A reference to a document file.
Important
- Do not create this class directly, use one of its implementations:
LocalFileDocumentRef for a local file to be uploaded
ManagedFolderDocumentRef for a file inside a DSS-managed folder
- as_json()#
- class dataikuapi.dss.document_extractor.LocalFileDocumentRef(fp, mime_type=None)#
A reference to a client-local file.
Usage example:
with open("/Users/mdupont/document.pdf", "rb") as f: file_ref = LocalFileDocumentRef(f) # upload the document & generate images of the document's pages: images = list(doc_ex.generate_pages_screenshots(file_ref))
- as_json()#
- class dataikuapi.dss.document_extractor.ManagedFolderDocumentRef(file_path, managed_folder_id, mime_type=None)#
A reference to a file in a DSS-managed folder.
Usage example:
file_ref = ManagedFolderDocumentRef('path_in_folder/document.pdf', folder_id)
# generate images of the document's pages:
resp = doc_ex.generate_pages_screenshots(file_ref)
- as_json()#
- class dataikuapi.dss.document_extractor.ImageRef#
A reference to a single image
Important
- Do not create this class directly, use one of its implementations:
InlineImageRef for an inline (bytes / base64 string) image
ManagedFolderImageRef for an image stored in a DSS-managed folder
- as_json()#
- class dataikuapi.dss.document_extractor.InlineImageRef(image, mime_type=None)#
A reference to an inline image.
Usage example:
with open("/Users/mdupont/image.jpg", "rb") as f: image_ref = InlineImageRef(f.read()) # Extract a text summary from the image using a vision LLM: resp = doc_ex.vlm_extract([image_ref], 'llm_id')
- as_json()#
- class dataikuapi.dss.document_extractor.ManagedFolderImageRef(managed_folder_id, image_path)#
A reference to an image stored in a DSS-managed folder.
Usage example:
managed_img = ManagedFolderImageRef('managed_folder_id', 'path_in_folder/image.png')
# Extract a text summary from the image using a vision LLM:
resp = doc_ex.vlm_extract([managed_img], 'llm_id')
- as_json()#
