Sub-Question Query Generation Evaluation

The SubQuestionQueryGeneration operator decomposes a question into sub-questions and generates a response for each using a RAG query engine. Given the added complexity, we include the previous evaluations and add:

Sub Query Completeness: Checks that the sub-questions accurately and comprehensively cover the original query.

How to do it?

Install UpTrain and LlamaIndex

pip install -q html2text llama-index pandas tqdm uptrain cohere

Import required libraries

from llama_index import (
    ServiceContext,
    VectorStoreIndex,
)
from llama_index.node_parser import SentenceSplitter
from llama_index.readers import SimpleWebPageReader
from llama_index.callbacks import CallbackManager, UpTrainCallbackHandler
from llama_index.postprocessor.cohere_rerank import CohereRerank
from llama_index.service_context import set_global_service_context
from llama_index.query_engine.sub_question_query_engine import (
    SubQuestionQueryEngine,
)
from llama_index.tools.query_engine import QueryEngineTool
from llama_index.tools.types import ToolMetadata

Setup UpTrain Open-Source Software (OSS)

You can use the open-source evaluation service to evaluate your model. In this case, you will need to provide an OpenAI API key.

Parameters:

key_type="openai"
api_key="OPENAI_API_KEY"
project_name_prefix="PROJECT_NAME_PREFIX"

callback_handler = UpTrainCallbackHandler(
    key_type="openai",
    api_key="sk-...",  # Replace with your OpenAI API key
    project_name_prefix="llama",
)
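# Attach the handler to a ServiceContext so the query engines built below are traced.
service_context = ServiceContext.from_defaults(
    callback_manager=CallbackManager([callback_handler])
)
set_global_service_context(service_context)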
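Hardcoding the key in the handler call is easy to leak into version control. A minimal sketch of reading it from the environment instead is shown here; the helper name read_openai_key is hypothetical and not part of UpTrain or LlamaIndex, and it assumes the key is exported as OPENAI_API_KEY.

```python
import os


def read_openai_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in `env_var`, failing loudly if it is missing.

    Hypothetical helper for illustration; not part of UpTrain or LlamaIndex.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before running.")
    return key


# Possible usage with the handler (assumes the imports from the section above):
# callback_handler = UpTrainCallbackHandler(
#     key_type="openai",
#     api_key=read_openai_key(),
#     project_name_prefix="llama",
# )
```

Failing early with a clear message is usually easier to debug than an authentication error raised later from inside an evaluation run.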

Load and Parse Documents

Load documents from Paul Graham’s essay “What I Worked On”.

documents = SimpleWebPageReader().load_data(
  [
      "https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt"
  ]
)

Parse the document into nodes.

parser = SentenceSplitter()
nodes = parser.get_nodes_from_documents(documents)

Sub-Question Query Generation Evaluation

The sub-question query engine tackles the problem of answering a complex query using multiple data sources. It first breaks the complex query down into sub-questions for each relevant data source, then gathers all the intermediate responses and synthesizes a final response.

The UpTrain callback handler automatically captures each sub-question and its response once generated, and runs the following three evaluations (graded from 0 to 1) on each response:

Context Relevance: Determines if the context retrieved for the query is relevant to that query.
Factual Accuracy: Assesses if the LLM is hallucinating or providing incorrect information.
Response Completeness: Checks if the response contains all the information requested by the query.

In addition to the above evaluations, the callback handler will also run the following evaluation:

Sub Query Completeness: Checks if the sub-questions accurately and completely cover the original query.
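The decompose, answer, and synthesize flow described above can be sketched in plain Python. This is an illustrative outline only, not LlamaIndex's implementation: every name here (SubAnswer, answer_with_sub_questions, and the callable parameters) is hypothetical, and on_sub_answer merely marks the point where a callback handler such as UpTrain's could score each sub-question and response.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SubAnswer:
    sub_question: str
    response: str


def answer_with_sub_questions(
    query: str,
    decompose: Callable[[str], List[str]],
    answer: Callable[[str], str],
    synthesize: Callable[[str, List[SubAnswer]], str],
    on_sub_answer: Callable[[SubAnswer], None] = lambda _: None,
) -> str:
    """Break `query` into sub-questions, answer each, then combine the answers."""
    sub_answers: List[SubAnswer] = []
    for sub_question in decompose(query):
        sub_answer = SubAnswer(sub_question, answer(sub_question))
        # Hook point: per-sub-question evaluations could run here.
        on_sub_answer(sub_answer)
        sub_answers.append(sub_answer)
    # The original query and the full set of sub-questions are both available
    # here, which is what a sub-query completeness check would compare.
    return synthesize(query, sub_answers)
```

In this sketch the per-response scores belong to the loop body, while Sub Query Completeness can only be judged once all sub-questions are known.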

# build index and query engine
vector_query_engine = VectorStoreIndex.from_documents(
    documents=documents, use_async=True, service_context=service_context
).as_query_engine()

query_engine_tools = [
    QueryEngineTool(
        query_engine=vector_query_engine,
        metadata=ToolMetadata(
            name="documents",
            description="Paul Graham essay on What I Worked On",
        ),
    ),
]

query_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=query_engine_tools,
    service_context=service_context,
    use_async=True,
)

response = query_engine.query(
    "How was Paul Grahams life different before, during, and after YC?"
)

Question: What did Paul Graham work on during YC?
Context Relevance Score: 0.5
Factual Accuracy Score: 1.0
Response Completeness Score: 0.5


Question: What did Paul Graham work on after YC?
Context Relevance Score: 0.5
Factual Accuracy Score: 1.0
Response Completeness Score: 0.5


Question: What did Paul Graham work on before YC?
Context Relevance Score: 1.0
Factual Accuracy Score: 1.0
Response Completeness Score: 0.0


Question: How was Paul Grahams life different before, during, and after YC?
Sub Query Completeness Score: 1.0
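To triage results like these programmatically, the printed log can be parsed into a dictionary. The sketch below assumes the exact "Question: ..." and "<Metric> Score: <value>" line format shown above, which may differ between versions; parse_scores and below_threshold are hypothetical helpers, not UpTrain APIs.

```python
import re
from typing import Dict, List, Tuple

# Matches lines such as "Context Relevance Score: 0.5".
SCORE_LINE = re.compile(r"^(?P<metric>.+?) Score: (?P<value>\d+(?:\.\d+)?)$")


def parse_scores(log: str) -> Dict[str, Dict[str, float]]:
    """Map each question in the log to its {metric: score} pairs."""
    results: Dict[str, Dict[str, float]] = {}
    current = None
    for raw in log.splitlines():
        line = raw.strip()
        if line.startswith("Question: "):
            current = line[len("Question: "):]
            results[current] = {}
        elif current is not None:
            match = SCORE_LINE.match(line)
            if match:
                results[current][match.group("metric")] = float(match.group("value"))
    return results


def below_threshold(
    results: Dict[str, Dict[str, float]], threshold: float = 0.5
) -> List[Tuple[str, str, float]]:
    """List (question, metric, score) triples scoring strictly below `threshold`."""
    return [
        (question, metric, score)
        for question, metrics in results.items()
        for metric, score in metrics.items()
        if score < threshold
    ]
```

Applied to the output above with the default threshold, this would flag only the Response Completeness score of 0.0 for the "before YC" sub-question.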

