Evaluation
- giskard.rag.evaluate(answer_fn: Callable | Sequence[AgentAnswer | str], testset: QATestset | None = None, knowledge_base: KnowledgeBase | None = None, llm_client: LLMClient | None = None, agent_description: str = 'This agent is a chatbot that answers question from users.', metrics: Sequence[Callable] | None = None) → RAGReport
Evaluate an agent by comparing its answers on a QATestset.
- Parameters:
answer_fn (Union[Callable, Sequence[Union[AgentAnswer,str]]]) – The prediction function of the agent to evaluate, or a list of precomputed answers on the test set.
testset (QATestset, optional) – The test set to evaluate the agent on. If not provided, a knowledge base must be provided and a default test set will be generated from it. Note that if answer_fn is a list of answers, the testset is required.
knowledge_base (KnowledgeBase, optional) – The knowledge base of the agent to evaluate. If not provided, a testset must be provided.
llm_client (LLMClient, optional) – The LLM client to use for the evaluation. If not provided, a default OpenAI client will be used.
agent_description (str, optional) – Description of the agent to be tested.
metrics (Optional[Sequence[Callable]], optional) – Metrics to compute on the test set.
- Returns:
The report of the evaluation.
- Return type:
RAGReport
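The two calling modes described above (a prediction function, or a list of precomputed answers) can be sketched as follows. This is a minimal illustration, not a reference implementation: the echo-style `answer_fn`, the helper names `run_evaluation` and `run_evaluation_precomputed`, and the test set path are hypothetical. It also assumes that the prediction function receives the question plus an optional `history` argument, and that a saved test set can be loaded with `QATestset.load`. Because no `llm_client` is passed, the default OpenAI client would be used, which presumably requires credentials to be configured.

```python
from typing import Optional, Sequence


def answer_fn(question: str, history: Optional[Sequence[dict]] = None) -> str:
    """Hypothetical stand-in for a real agent; replace with a call to your RAG pipeline."""
    return f"I don't know the answer to: {question}"


def run_evaluation(testset_path: str):
    """Callable mode: evaluate() queries answer_fn for each question in the test set."""
    # Imported lazily so the stub agent above can be exercised without giskard installed.
    from giskard.rag import QATestset, evaluate

    testset = QATestset.load(testset_path)  # assumed loader for a saved test set
    return evaluate(answer_fn, testset=testset)


def run_evaluation_precomputed(answers: Sequence[str], testset_path: str):
    """Precomputed mode: answers is a list, so the testset argument is required."""
    from giskard.rag import QATestset, evaluate

    testset = QATestset.load(testset_path)
    return evaluate(answers, testset=testset)
```

Calling `run_evaluation("my_testset.jsonl")` (path hypothetical) would return the `RAGReport` described above. Passing `knowledge_base=` instead of `testset=` is the alternative when no test set has been prepared, in which case a default one is created from the knowledge base.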