Evaluation

giskard.rag.evaluate(answer_fn: Callable | Sequence[AgentAnswer | str], testset: QATestset | None = None, knowledge_base: KnowledgeBase | None = None, llm_client: LLMClient | None = None, agent_description: str = 'This agent is a chatbot that answers question from users.', metrics: Sequence[Callable] | None = None) → RAGReport

Evaluate an agent by comparing its answers with the reference answers in a QATestset.

Parameters:

answer_fn (Union[Callable, Sequence[Union[AgentAnswer,str]]]) – The prediction function of the agent to evaluate, or a list of precomputed answers on the testset.
testset (QATestset, optional) – The test set to evaluate the agent on. If not provided, a knowledge base must be provided, and a default testset will be created from it. If answer_fn is a list of answers, the testset is required.
knowledge_base (KnowledgeBase, optional) – The knowledge base of the agent to evaluate. If not provided, a testset must be provided.
llm_client (LLMClient, optional) – The LLM client to use for the evaluation. If not provided, a default OpenAI client will be used.
agent_description (str, optional) – Description of the agent to be tested.
metrics (Optional[Sequence[Callable]], optional) – Metrics to compute on the test set.

Returns:

The report of the evaluation.

Return type:

RAGReport
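
The constraints between answer_fn, testset and knowledge_base described above can be sketched in plain Python. This is only an illustration of the documented rules, not giskard's own validation code: check_evaluate_inputs is a hypothetical helper, and the strings it returns are made up for the example.

```python
from typing import Callable, Optional, Sequence, Union


def check_evaluate_inputs(
    answer_fn: Union[Callable, Sequence],
    testset: Optional[object] = None,
    knowledge_base: Optional[object] = None,
) -> str:
    """Hypothetical helper (not part of giskard) mirroring the documented rules."""
    # At least one of testset / knowledge_base must be supplied.
    if testset is None and knowledge_base is None:
        raise ValueError("Provide a testset, a knowledge_base, or both.")
    # Precomputed answers can only be matched against an existing testset.
    if not callable(answer_fn) and testset is None:
        raise ValueError("A testset is required when answer_fn is a list of answers.")
    # A supplied testset is used directly; otherwise one is built from the knowledge base.
    if testset is not None:
        return "use provided testset"
    return "create default testset from knowledge base"
```

Under these assumptions, passing a callable with only a knowledge base is accepted, while passing a list of precomputed answers without a testset is rejected.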

class giskard.llm.evaluators.CorrectnessEvaluator(answer_col='reference_answer', *args, **kwargs)

Assess the correctness of a model's answers given questions and their associated reference answers.