This document provides the detailed interface specifications required for Dify model plugin development, including the model provider implementation, interface definitions for the six model types (LLM, TextEmbedding, Rerank, Speech2text, Text2speech, Moderation), and complete specifications for related data structures such as PromptMessage and LLMResult. It serves as a development reference for developers implementing model integrations.
This section introduces the interface methods and parameter descriptions that providers and each model type need to implement. Before developing a model plugin, it is recommended that you first read Model Design Rules and Model Plugin Introduction.
Inherit the __base.model_provider.ModelProvider base class and implement the following interface:
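A minimal sketch of the expected method, following the signature conventions of Dify's model runtime (docstring wording is illustrative):

```python
def validate_provider_credentials(self, credentials: dict) -> None:
    """
    Validate the provider credentials. Defined inside your ModelProvider subclass.
    You can reuse the validate_credentials method of any model type under this
    provider, or implement the validation yourself (e.g. by fetching the model list).

    :param credentials: credentials defined by provider_credential_schema
    """
```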
Parameters:

credentials
(object) Credential information. The credential parameters are defined by the provider YAML configuration file's provider_credential_schema, passed in as api_key, etc. If validation fails, throw an errors.validate.CredentialsValidateFailedError error.

Note: Predefined models need to fully implement this interface, while custom model providers only need a simple implementation, as follows:
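For example (the provider class name here is a placeholder):

```python
class ExampleProvider(ModelProvider):
    def validate_provider_credentials(self, credentials: dict) -> None:
        # A custom-model-only provider has no provider-level credentials
        # to verify, so the method body can be left empty.
        pass
```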
Models are divided into six different types, each with a different base class to inherit from and different methods to implement.

All model types need to implement the following two methods consistently:
Similar to provider credential validation, this validates individual models.
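A sketch of the validation method signature (docstring wording is illustrative):

```python
def validate_credentials(self, model: str, credentials: dict) -> None:
    """
    Validate the credentials for the given model.
    Raise errors.validate.CredentialsValidateFailedError if validation fails.

    :param model: model name
    :param credentials: model credentials
    """
```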
Parameters:

model
(string) Model name
credentials
(object) Credential information. The credential parameters are defined by the provider YAML configuration file's provider_credential_schema or model_credential_schema, passed in as api_key, etc. If validation fails, throw an errors.validate.CredentialsValidateFailedError error.
When a model invocation exception occurs, it needs to be mapped to a specified InvokeError type in Runtime, which helps Dify handle different errors differently.

Runtime Errors:

InvokeConnectionError
Connection error during invocation
InvokeServerUnavailableError
Service provider unavailable
InvokeRateLimitError
Rate limit reached
InvokeAuthorizationError
Authentication failed
InvokeBadRequestError
Incorrect parameters passed

You can also directly throw the corresponding errors and define the mapping as follows, so that in subsequent calls you can directly throw exceptions such as InvokeConnectionError.
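A sketch of the mapping property; the concrete exception classes on the right-hand side depend on the SDK or HTTP client your plugin uses (the requests exceptions below are placeholders):

```python
import requests

# Defined inside your model class.
@property
def _invoke_error_mapping(self) -> dict[type[InvokeError], list[type[Exception]]]:
    """
    Map model invocation errors to unified InvokeError types.
    The key is the error type exposed to the caller; the value is the list
    of errors raised by the model/SDK that should be converted to that key.
    """
    return {
        InvokeConnectionError: [requests.exceptions.ConnectionError],
        InvokeServerUnavailableError: [requests.exceptions.HTTPError],
        InvokeRateLimitError: [],
        InvokeAuthorizationError: [requests.exceptions.InvalidHeader],
        InvokeBadRequestError: [requests.exceptions.InvalidURL],
    }
```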
Inherit the __base.large_language_model.LargeLanguageModel base class and implement the following interface:
Implement the core method for LLM invocation, which can support both streaming and synchronous responses.
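A sketch of the invocation method signature, following the model runtime conventions (entity types such as PromptMessage and LLMResult are imported from the plugin SDK; exact import paths vary by SDK version):

```python
from typing import Generator, Optional, Union

def _invoke(self, model: str, credentials: dict,
            prompt_messages: list[PromptMessage], model_parameters: dict,
            tools: Optional[list[PromptMessageTool]] = None,
            stop: Optional[list[str]] = None,
            stream: bool = True, user: Optional[str] = None) \
        -> Union[LLMResult, Generator]:
    """
    Invoke the large language model.

    :param model: model name
    :param credentials: model credentials
    :param prompt_messages: prompt messages
    :param model_parameters: model parameters
    :param tools: tools for tool calling
    :param stop: stop sequences
    :param stream: whether to stream the response
    :param user: unique user id
    :return: full response (LLMResult) or a stream chunk generator
    """
```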
Parameters:

model
(string) Model name
credentials
(object) Credential information. The credential parameters are defined by the provider YAML configuration file's provider_credential_schema or model_credential_schema, passed in as api_key, etc.
prompt_messages
(array[PromptMessage]) Prompt list. If the model is of Completion type, the list only needs to include one UserPromptMessage element; if the model is of Chat type, messages need to be passed in as a list of SystemPromptMessage, UserPromptMessage, AssistantPromptMessage, and ToolPromptMessage elements.
model_parameters
(object) Model parameters defined by the model YAML configuration's parameter_rules.
tools
(array[PromptMessageTool]) [optional] Tool list, equivalent to function in function calling. This is the tool list passed to the model for tool calling.
stop
(array[string]) [optional] Stop sequences. The model will stop generating before outputting any of the defined stop strings.
stream
(bool) Whether to stream output, default is True
user
(string) [optional] A unique identifier for the user that can help the provider monitor and detect abuse.

Return Value

For streaming output, it returns Generator[LLMResultChunk]; for non-streaming output, it returns LLMResult.
Pre-calculate the tokens for a request. If the model does not provide a token pre-calculation interface, you can directly return 0.
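A sketch of the signature:

```python
from typing import Optional

def get_num_tokens(self, model: str, credentials: dict,
                   prompt_messages: list[PromptMessage],
                   tools: Optional[list[PromptMessageTool]] = None) -> int:
    """
    Pre-calculate the number of tokens for the given prompt messages.
    Return 0 if the model provides no token-counting interface.
    """
```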
Parameter explanations are the same as in LLM Invocation above. This interface needs to calculate tokens using the appropriate tokenizer for the corresponding model. If the corresponding model does not provide a tokenizer, you can use the _get_num_tokens_by_gpt2(text: str) method in the AIModel base class for calculation.
When a provider supports adding custom LLMs, this method can be implemented to allow custom models to obtain model rules. By default, it returns None.
For most fine-tuned models under the OpenAI provider, the base model can be obtained from the fine-tuned model name, such as gpt-3.5-turbo-1106, and the predefined parameter rules of the base model can then be returned. Refer to the specific implementation of OpenAI.
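A sketch of the method signature (AIModelEntity is assumed here to be the model-rule entity of the model runtime):

```python
from typing import Optional

def get_customizable_model_schema(self, model: str,
                                  credentials: dict) -> Optional[AIModelEntity]:
    """
    Return the parameter rules (schema) for a customizable model,
    or None if the provider does not support custom models.
    """
```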
Inherit the __base.text_embedding_model.TextEmbeddingModel base class and implement the following interface:
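A sketch of the invocation method signature (newer SDK versions may add parameters):

```python
from typing import Optional

def _invoke(self, model: str, credentials: dict,
            texts: list[str], user: Optional[str] = None) -> TextEmbeddingResult:
    """
    Invoke the text embedding model.

    :param model: model name
    :param credentials: model credentials
    :param texts: texts to embed, can be processed in batch
    :param user: unique user id
    :return: embeddings result
    """
```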
Parameters:

model
(string) Model name
credentials
(object) Credential information. The credential parameters are defined by the provider YAML configuration file's provider_credential_schema or model_credential_schema, passed in as api_key, etc.
texts
(array[string]) Text list, can be processed in batch
user
(string) [optional] A unique identifier for the user that can help the provider monitor and detect abuse.

Return:

TextEmbeddingResult entity.
Parameter explanations can be found in the Embedding Invocation section above. Similar to the LargeLanguageModel above, this interface needs to calculate tokens using the appropriate tokenizer for the corresponding model. If the corresponding model does not provide a tokenizer, you can use the _get_num_tokens_by_gpt2(text: str) method in the AIModel base class for calculation.
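A sketch of the signature:

```python
def get_num_tokens(self, model: str, credentials: dict, texts: list[str]) -> int:
    """
    Pre-calculate the number of tokens for the given texts.
    Return 0 if the model provides no token-counting interface.
    """
```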
Inherit the __base.rerank_model.RerankModel base class and implement the following interface:
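A sketch of the invocation method signature:

```python
from typing import Optional

def _invoke(self, model: str, credentials: dict,
            query: str, docs: list[str],
            score_threshold: Optional[float] = None,
            top_n: Optional[int] = None,
            user: Optional[str] = None) -> RerankResult:
    """
    Invoke the rerank model.

    :param query: query content
    :param docs: segments to rerank
    :param score_threshold: optional score threshold
    :param top_n: optional number of top segments to keep
    :return: rerank result
    """
```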
Parameters:

model
(string) Model name
credentials
(object) Credential information. The credential parameters are defined by the provider YAML configuration file's provider_credential_schema or model_credential_schema, passed in as api_key, etc.
query
(string) Query request content
docs
(array[string]) List of segments that need to be reranked
score_threshold
(float) [optional] Score threshold
top_n
(int) [optional] Take the top n segments
user
(string) [optional] A unique identifier for the user that can help the provider monitor and detect abuse.

Return:

RerankResult entity.
Inherit the __base.speech2text_model.Speech2TextModel base class and implement the following interface:
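A sketch of the invocation method signature:

```python
from typing import IO, Optional

def _invoke(self, model: str, credentials: dict,
            file: IO[bytes], user: Optional[str] = None) -> str:
    """
    Invoke the speech-to-text model.

    :param file: audio file stream
    :return: transcribed text
    """
```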
Parameters:

model
(string) Model name
credentials
(object) Credential information. The credential parameters are defined by the provider YAML configuration file's provider_credential_schema or model_credential_schema, passed in as api_key, etc.
file
(File) File stream
user
(string) [optional] A unique identifier for the user that can help the provider monitor and detect abuse.

Return:

String after speech conversion.
Inherit the __base.text2speech_model.Text2SpeechModel base class and implement the following interface:
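A sketch of the invocation method signature; the exact return type varies by SDK version, and an audio byte stream is assumed here:

```python
from typing import Generator, Optional, Union

def _invoke(self, model: str, credentials: dict,
            content_text: str, streaming: bool,
            user: Optional[str] = None) -> Union[bytes, Generator[bytes, None, None]]:
    """
    Invoke the text-to-speech model.

    :param content_text: text to convert
    :param streaming: whether to stream the audio output
    :return: audio bytes, or a generator of audio chunks when streaming
    """
```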
Parameters:

model
(string) Model name
credentials
(object) Credential information. The credential parameters are defined by the provider YAML configuration file's provider_credential_schema or model_credential_schema, passed in as api_key, etc.
content_text
(string) Text content to be converted
streaming
(bool) Whether to stream output
user
(string) [optional] A unique identifier for the user that can help the provider monitor and detect abuse.

Return:

Audio stream after text conversion.
Inherit the __base.moderation_model.ModerationModel base class and implement the following interface:
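A sketch of the invocation method signature:

```python
from typing import Optional

def _invoke(self, model: str, credentials: dict,
            text: str, user: Optional[str] = None) -> bool:
    """
    Invoke the moderation model.

    :param text: text content to check
    :return: True if the input text is flagged as unsafe, False otherwise
    """
```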
Parameters:

model
(string) Model name
credentials
(object) Credential information. The credential parameters are defined by the provider YAML configuration file's provider_credential_schema or model_credential_schema, passed in as api_key, etc.
text
(string) Text content
user
(string) [optional] A unique identifier for the user that can help the provider monitor and detect abuse.

Return:

False indicates the input text is safe, True indicates it is not.
Message role
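This corresponds to the PromptMessageRole enum; a sketch, with the four roles implied by the message types described below (exact values may vary by SDK version):

```python
from enum import Enum

class PromptMessageRole(Enum):
    SYSTEM = "system"
    USER = "user"
    ASSISTANT = "assistant"
    TOOL = "tool"
```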
Message content type, divided into plain text and images.
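This corresponds to the PromptMessageContentType enum; a sketch, covering the two types named above:

```python
from enum import Enum

class PromptMessageContentType(Enum):
    TEXT = "text"
    IMAGE = "image"
```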
Message content base class, used only for parameter declaration, cannot be initialized.
Currently supports two types: text and images, and can support text and multiple images simultaneously.
You need to initialize TextPromptMessageContent and ImagePromptMessageContent separately.
When passing in text and images, text needs to be constructed as this entity as part of the content list.
When passing in text and images, images need to be constructed as this entity as part of the content list. data can be a url or a base64-encoded image string.
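For example, a user message combining text and an image might be constructed like this (the text and image URL are placeholders):

```python
UserPromptMessage(
    content=[
        TextPromptMessageContent(data="What do you see in this picture?"),
        ImagePromptMessageContent(data="https://example.com/image.jpg"),
    ]
)
```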
Base class for all Role message bodies, used only for parameter declaration, cannot be initialized.
UserPromptMessage message body, represents user messages.
Represents model response messages, typically used for few-shot examples or chat history input. Here tool_calls is the list of tool calls returned by the model after tools are passed to it.
Represents system messages, typically used to set system instructions for the model.
Represents tool messages, used to pass results to the model for next-step planning after a tool has been executed. The base class's content field carries the tool execution result.
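A sketch of the entity; besides the inherited content field it carries the id of the tool call it answers (field names follow the model runtime conventions and may vary by version):

```python
class ToolPromptMessage(PromptMessage):
    role: PromptMessageRole = PromptMessageRole.TOOL
    tool_call_id: str  # id of the tool call this result belongs to
```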
Delta entity within each iteration of a streaming response.
Iteration entity in a streaming response.
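A sketch of the two streaming entities, LLMResultChunkDelta and LLMResultChunk; the field sets follow Dify's model runtime conventions and may differ between versions:

```python
from typing import Optional
from pydantic import BaseModel

class LLMResultChunkDelta(BaseModel):
    index: int                             # sequence number of this chunk
    message: AssistantPromptMessage        # incremental assistant message
    usage: Optional[LLMUsage] = None       # token usage, usually on the last chunk
    finish_reason: Optional[str] = None    # set when generation stops

class LLMResultChunk(BaseModel):
    model: str                             # model name
    prompt_messages: list[PromptMessage]   # prompt messages of the request
    system_fingerprint: Optional[str] = None
    delta: LLMResultChunkDelta             # this iteration's delta
```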