Contextual Relevancy
The contextual relevancy metric measures the quality of your RAG pipeline's retriever by evaluating the overall relevance of the information presented in your retrieval_context for a given input. deepeval's contextual relevancy metric is a self-explaining LLM-Eval, meaning it outputs a reason for its metric score.
Required Arguments
To use the ContextualRelevancyMetric, you'll have to provide the following arguments when creating an LLMTestCase:
inputactual_outputretrieval_context
Similar to ContextualPrecisionMetric, the ContextualRelevancyMetric uses retrieval_context from your RAG pipeline for evaluation.
Example
from deepeval import evaluate
from deepeval.metrics import ContextualRelevancyMetric
from deepeval.test_case import LLMTestCase
# Replace this with the actual output from your LLM application
actual_output = "We offer a 30-day full refund at no extra cost."
# Replace this with the actual retrieved context from your RAG pipeline
retrieval_context = ["All customers are eligible for a 30 day full refund at no extra cost."]
metric = ContextualRelevancyMetric(
    threshold=0.7,
    model="gpt-4",
    include_reason=True
)
test_case = LLMTestCase(
    input="What if these shoes don't fit?",
    actual_output=actual_output,
    retrieval_context=retrieval_context
)
metric.measure(test_case)
print(metric.score)
print(metric.reason)
# or evaluate test cases in bulk
evaluate([test_case], [metric])
There are four optional parameters when creating a ContextualRelevancyMetricMetric:
- [Optional] 
threshold: a float representing the minimum passing threshold, defaulted to 0.5. - [Optional] 
model: a string specifying which of OpenAI's GPT models to use, OR any custom LLM model of typeDeepEvalBaseLLM. Defaulted to 'gpt-4-0125-preview'. - [Optional] 
include_reason: a boolean which when set toTrue, will include a reason for its evaluation score. Defaulted toTrue. - [Optional] 
multithreading: a boolean which when set toTrue, enables concurrent evaluation of said metric. Defaulted toTrue. 
How Is It Calculated?
The ContextualRelevancyMetric score is calculated according to the following equation:
Although similar to how the AnswerRelevancyMetric is calculated, the ContextualRelevancyMetric first uses an LLM to extract all statements made in the retrieval_context instead, before using the same LLM to classify whether each statement is relevant to the input.