Multi-Document Agents
In this notebook, we look at how to build RAG over a large number of documents by combining the DocumentAgents concept with a ReAct Agent.
Installation
!pip install llama-index
!pip install llama-index-llms-anthropic
!pip install llama-index-embeddings-huggingface
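Set up logging
# NOTE: This is only necessary in a Jupyter notebook.
# Details: Jupyter runs an event loop behind the scenes.
#          This results in nested event loops when we start an event loop to make async queries.
#          This is normally not allowed, so we use nest_asyncio to allow it for convenience.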
import nest_asyncio
nest_asyncio.apply()
import logging
import sys
# Set up the root logger
logger = logging.getLogger()
logger.setLevel(logging.INFO)  # Set the logger level to INFO
# Clear any existing handlers
logger.handlers = []
# Set up a StreamHandler that writes to sys.stdout (Colab's output)
handler = logging.StreamHandler(sys.stdout)
handler.setLevel(logging.INFO)  # Set the handler level to INFO
# Add the handler to the logger
logger.addHandler(handler)
from IPython.display import display, HTML
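The nest_asyncio comments above refer to nested event loops. As a minimal, standard-library-only illustration (it does not use nest_asyncio itself), this sketch shows the error Python raises when `asyncio.run()` is called while a loop is already running. That is the situation inside Jupyter, and it is what `nest_asyncio.apply()` works around. The coroutine names `inner` and `outer` are hypothetical.

```python
import asyncio


async def inner() -> int:
    return 42


async def outer() -> str:
    # We are already inside a running loop here, just like a Jupyter cell.
    coro = inner()
    try:
        return str(asyncio.run(coro))  # not allowed without nest_asyncio
    except RuntimeError as exc:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return f"RuntimeError: {exc}"


result = asyncio.run(outer())
print(result)
```

Running this as a plain script prints the RuntimeError message instead of 42, because asyncio refuses to start a second loop inside the first.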
Set up the Anthropic API key
import os
os.environ['ANTHROPIC_API_KEY'] = 'YOUR ANTHROPIC API KEY'
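Hard-coding the key in a notebook makes it easy to leak when the notebook is shared. A small alternative, sketched here under the assumption that you only need the `ANTHROPIC_API_KEY` environment variable to be set before the LLM is created, is to reuse an existing value or prompt for it with `getpass` (which does not echo the input). `ensure_env_var` is a hypothetical helper, not part of LlamaIndex or the Anthropic SDK.

```python
import os
from getpass import getpass
from typing import Callable


def ensure_env_var(name: str, prompt: Callable[[str], str] = getpass) -> str:
    """Return the value of environment variable `name`, prompting only if it is unset."""
    value = os.environ.get(name)
    if not value:
        value = prompt(f"{name}: ")
        os.environ[name] = value
    return value


# In the notebook (prompts once, without echoing the key):
# ensure_env_var("ANTHROPIC_API_KEY")
```

The `prompt` parameter is injectable so the helper can be exercised without an interactive terminal.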
Set up the LLM and embedding model
We will use Claude-3 Opus, the LLM recently released by Anthropic.
from llama_index.llms.anthropic import Anthropic
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
llm = Anthropic(temperature=0.0, model='claude-3-opus-20240229')
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-base-en-v1.5")
from llama_index.core import Settings
Settings.llm = llm
Settings.embed_model = embed_model
Settings.chunk_size = 512
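`Settings.chunk_size = 512` tells LlamaIndex to split each document into chunks of roughly 512 tokens before embedding them. To make "fixed chunk size with overlap" concrete, here is a simplified sliding-window chunker over whitespace-separated words. It is an illustration only, not LlamaIndex's splitter, which (as far as we know) measures length in tokenizer tokens and prefers sentence boundaries. `chunk_with_overlap` is a hypothetical name.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def chunk_with_overlap(tokens: Sequence[T], chunk_size: int, overlap: int) -> List[List[T]]:
    """Split tokens into windows of at most chunk_size, each sharing `overlap` items with the previous one."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks: List[List[T]] = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the input
    return chunks


words = "one two three four five six seven eight nine ten".split()
for chunk in chunk_with_overlap(words, chunk_size=4, overlap=1):
    print(chunk)
```

With `chunk_size=4` and `overlap=1`, the ten words become three chunks, and each chunk starts with the last word of the previous one, so context that straddles a boundary is not lost entirely.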
Download documents
We will use the Wikipedia pages for Toronto, Seattle, Chicago, Boston, and Houston to build the RAG pipeline.
wiki_titles = ["Toronto", "Seattle", "Chicago", "Boston", "Houston"]
from pathlib import Path
import requests
for title in wiki_titles:
    response = requests.get(
        "https://en.wikipedia.org/w/api.php",
        params={
            "action": "query",
            "format": "json",
            "titles": title,
            "prop": "extracts",
            # 'exintro': True,
            "explaintext": True,
        },
    ).json()
    page = next(iter(response["query"]["pages"].values()))
    wiki_text = page["extract"]
    data_path = Path("data")
    # Create the output directory if it does not exist yet
    data_path.mkdir(exist_ok=True)
    with open(data_path / f"{title}.txt", "w", encoding="utf-8") as fp:
        fp.write(wiki_text)
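The line `next(iter(response["query"]["pages"].values()))` works because the MediaWiki query API returns `pages` as a dict keyed by page ID, and only one title is requested per call. In the default JSON format, a title that does not exist comes back as an entry with a `"missing"` key and no `"extract"`, so the loop above would fail with a bare `KeyError`. This sketch isolates the parsing step and runs it on a hypothetical, heavily truncated response; `extract_page_text` and the sample data are illustrative assumptions.

```python
from typing import Any, Dict


def extract_page_text(response: Dict[str, Any]) -> str:
    """Return the plain-text extract of the single page in a MediaWiki query response."""
    pages = response["query"]["pages"]  # dict keyed by page ID (as a string)
    page = next(iter(pages.values()))   # exactly one title was requested
    if "missing" in page:
        raise KeyError(f"No Wikipedia page titled {page.get('title')!r}")
    return page["extract"]


# Hypothetical response shape; real responses carry the full article text.
sample = {
    "query": {
        "pages": {
            "12345": {"pageid": 12345, "ns": 0, "title": "Toronto", "extract": "Placeholder extract text."}
        }
    }
}
print(extract_page_text(sample))
```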
Load documents
# Load all Wikipedia documents
from llama_index.core import SimpleDirectoryReader
city_docs = {}
for wiki_title in wiki_titles:
    city_docs[wiki_title] = SimpleDirectoryReader(
        input_files=[f"data/{wiki_title}.txt"]
    ).load_data()
Build a ReAct Agent for each city
from llama_index.core.agent import ReActAgent
from llama_index.core import VectorStoreIndex, SummaryIndex
from llama_index.core.tools import QueryEngineTool, ToolMetadata
# Build a dictionary of agents, one per city
agents = {}
for wiki_title in wiki_titles:
    # Build a vector index
    vector_index = VectorStoreIndex.from_documents(
        city_docs[wiki_title],
    )
    # Build a summary index
    summary_index = SummaryIndex.from_documents(
        city_docs[wiki_title],
    )
    # Define the query engines
    vector_query_engine = vector_index.as_query_engine()
    summary_query_engine = summary_index.as_query_engine()
    # Define the tools
    query_engine_tools = [
        QueryEngineTool(
            query_engine=vector_query_engine,
            metadata=ToolMetadata(
                name="vector_tool",
                description=(
                    f"Useful for retrieving specific context from {wiki_title}"
                ),
            ),
        ),
        QueryEngineTool(
            query_engine=summary_query_engine,
            metadata=ToolMetadata(
                name="summary_tool",
                description=(
                    "Useful for summarization questions related to"
                    f" {wiki_title}"
                ),
            ),
        ),
    ]
    # Build the agent
    agent = ReActAgent.from_tools(
        query_engine_tools,
        llm=llm,
        verbose=True,
    )
    agents[wiki_title] = agent
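With `verbose=True`, each agent prints a Thought / Action / Action Input / Observation / Answer trace, as seen in the test queries. The sketch below is a toy loop that consumes that text format, to show the mechanism; it is not LlamaIndex's ReActAgent implementation. The scripted LLM, the `demo_tools` stubs, and all function names are hypothetical stand-ins for Claude and the two query engine tools.

```python
import ast
import re
from typing import Callable, Dict

ACTION_RE = re.compile(r"Action:\s*(\w+)\s*\nAction Input:\s*(\{.*\})")
ANSWER_RE = re.compile(r"Answer:\s*(.*)", re.DOTALL)


def react_loop(llm_step: Callable[[str], str], tools: Dict[str, Callable[[str], str]],
               question: str, max_steps: int = 5) -> str:
    """Toy Thought/Action/Observation loop: call tools until the model emits an Answer."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm_step(transcript)
        transcript += step + "\n"
        answer = ANSWER_RE.search(step)
        if answer:  # the model can answer without more tools
            return answer.group(1).strip()
        action = ACTION_RE.search(step)
        if action is None:
            raise ValueError(f"Could not parse a tool call from: {step!r}")
        tool_name, raw_input = action.group(1), action.group(2)
        # The logs show Python-style dicts ({'input': ...}), so use literal_eval, not json.
        tool_input = ast.literal_eval(raw_input)["input"]
        tool = tools.get(tool_name)
        observation = tool(tool_input) if tool else f"Unknown tool {tool_name!r}"
        transcript += f"Observation: {observation}\n"
    raise RuntimeError("Reached max_steps without an answer")


def scripted_llm(transcript: str) -> str:
    """Hypothetical stand-in for Claude: call vector_tool once, then answer from the observation."""
    if "Observation:" not in transcript:
        return ("Thought: I need to use a tool to help me answer the question.\n"
                "Action: vector_tool\n"
                "Action Input: {'input': 'What is the population of Toronto?'}")
    observation = transcript.rsplit("Observation: ", 1)[-1].strip()
    return f"Thought: I can answer without using any more tools.\nAnswer: {observation}"


demo_tools = {
    "vector_tool": lambda q: "The population of Toronto in 2021 was 2,794,356.",
    "summary_tool": lambda q: "A summary would go here.",
}
print(react_loop(scripted_llm, demo_tools, "What is the population of Toronto?"))
```

The real agent builds its prompt from the tool names and descriptions defined above, which is why the `description` strings matter: they are what the LLM reads when deciding between `vector_tool` and `summary_tool`.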
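Define IndexNodes for these agents
from llama_index.core.schema import IndexNode
# Define the top-level nodes
objects = []
for wiki_title in wiki_titles:
    # Define an index node that points to this city's agent
    wiki_summary = (
        f"This content contains Wikipedia articles about {wiki_title}. Use"
        " this index if you need to look up specific facts about"
        f" {wiki_title}.\nDo not use this index if you want to analyze"
        " multiple cities."
    )
    node = IndexNode(
        text=wiki_summary, index_id=wiki_title, obj=agents[wiki_title]
    )
    objects.append(node)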
Define the top-level retriever that selects an agent
vector_index = VectorStoreIndex(
objects=objects,
)
query_engine = vector_index.as_query_engine(similarity_top_k=1, verbose=True)
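Here the top-level index embeds only the short `wiki_summary` text of each IndexNode. With `similarity_top_k=1`, the single best-matching node wins, and because that node carries an agent in `obj`, the query is handed to that agent (the "Retrieval entering Toronto: ReActAgent" lines in the logs). This sketch imitates the routing step with a toy bag-of-words cosine similarity in place of the BAAI/bge embedding model; it is an illustration under that assumption, not LlamaIndex's retriever. `toy_embed`, `cosine`, `route`, `toy_summaries`, and `toy_agents` are hypothetical names.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List, Tuple


def toy_embed(text: str) -> Counter:
    """Bag-of-words vector; a crude stand-in for an embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def route(query: str, summaries: Dict[str, str], agent_map: Dict[str, Callable[[str], str]],
          top_k: int = 1) -> List[Tuple[str, str]]:
    """Rank summaries by similarity to the query, then pass the query to the top_k agents."""
    q_vec = toy_embed(query)
    ranked = sorted(summaries, key=lambda name: cosine(q_vec, toy_embed(summaries[name])), reverse=True)
    return [(name, agent_map[name](query)) for name in ranked[:top_k]]


cities = ["Toronto", "Seattle", "Chicago", "Boston", "Houston"]
toy_summaries = {
    city: (
        f"This content contains Wikipedia articles about {city}. Use this index if you"
        f" need to look up specific facts about {city}. Do not use this index if you"
        " want to analyze multiple cities."
    )
    for city in cities
}
# Hypothetical agents: each one just reports which city handled the query.
toy_agents = {city: (lambda q, city=city: f"[{city} agent] {q}") for city in cities}

print(route("Who and when was Houston founded?", toy_summaries, toy_agents))
```

Since the summaries differ only in the city name, the name in the query decides the match. A question that names no city, or several, has no clear winner, which is why each summary tells the retriever not to use it for multi-city analysis.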
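Test queries
Depending on the query, the top-level retriever should pick the matching city agent, which should in turn choose its vector tool or its summary tool.
# Should use the Toronto agent -> vector tool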
response = query_engine.query("What is the population of Toronto?")
Retrieval entering Toronto: ReActAgent
Retrieving from object ReActAgent with query What is the population of Toronto?
HTTP Request: POST https://api.anthropic.com/v1/messages "HTTP/1.1 200 OK"
Thought: I need to use a tool to help me answer the question.
Action: vector_tool
Action Input: {'input': 'What is the population of Toronto?'}
HTTP Request: POST https://api.anthropic.com/v1/messages "HTTP/1.1 200 OK"
Observation: According to the context information, the population of Toronto in 2021 was 2,794,356, making it the fourth-most populous city in North America.
HTTP Request: POST https://api.anthropic.com/v1/messages "HTTP/1.1 200 OK"
Thought: I can answer without using any more tools.
Answer: According to the information provided, the population of Toronto in 2021 was 2,794,356, making it the fourth-most populous city in North America.
HTTP Request: POST https://api.anthropic.com/v1/messages "HTTP/1.1 200 OK"
display(HTML(f'<p style="font-size:20px">{response.response}</p>'))
The population of Toronto is 2,794,356 as of 2021. It is the fourth-most populous city in North America.
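The `display(HTML(...))` calls interpolate `response.response` straight into HTML, so any `<`, `>` or `&` in the LLM's answer would be interpreted as markup. A small hedged alternative is to escape the text first with the standard library's `html.escape`; `big_paragraph` is a hypothetical helper that produces the same `<p>` wrapper the notebook uses.

```python
import html


def big_paragraph(text: str, font_size_px: int = 20) -> str:
    """Wrap text in a large-font <p>, escaping HTML special characters first."""
    return f'<p style="font-size:{font_size_px}px">{html.escape(text)}</p>'


# In the notebook: display(HTML(big_paragraph(response.response)))
print(big_paragraph("Revenue < costs & margins"))
```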
# Should use the Houston agent -> vector tool
response = query_engine.query("Who and when was Houston founded?")
Retrieval entering Houston: ReActAgent
Retrieving from object ReActAgent with query Who and when was Houston founded?
HTTP Request: POST https://api.anthropic.com/v1/messages "HTTP/1.1 200 OK"
Thought: I need to use a tool to help me answer the question about who founded Houston and when it was founded.
Action: vector_tool
Action Input: {'input': 'Who founded Houston and when was it founded?'}
HTTP Request: POST https://api.anthropic.com/v1/messages "HTTP/1.1 200 OK"
Observation: Houston was founded by land investors on August 30, 1836, at the confluence of Buffalo Bayou and White Oak Bayou, a point now known as Allen's Landing. The city was incorporated on June 5, 1837 and named after former General Sam Houston, who was president of the Republic of Texas at the time.
HTTP Request: POST https://api.anthropic.com/v1/messages "HTTP/1.1 200 OK"
Thought: The vector_tool provided the key information needed to answer the question of who founded Houston and when it was founded. I can now provide a complete answer without using any more tools.
Answer: Houston was founded by land investors on August 30, 1836. The city was incorporated on June 5, 1837 and named after Sam Houston, who was the president of the Republic of Texas at the time. The location where Houston was founded is at the confluence of Buffalo Bayou and White Oak Bayou, which is now known as Allen's Landing.
HTTP Request: POST https://api.anthropic.com/v1/messages "HTTP/1.1 200 OK"
display(HTML(f'<p style="font-size:20px">{response.response}</p>'))
Houston was founded by land investors on August 30, 1836. The city was named after Sam Houston, who was serving as the president of the Republic of Texas at that time.
# Should use the Boston agent -> summary tool
response = query_engine.query("Summarize about the sports teams in Boston")
Retrieval entering Boston: ReActAgent
Retrieving from object ReActAgent with query Summarize about the sports teams in Boston
HTTP Request: POST https://api.anthropic.com/v1/messages "HTTP/1.1 200 OK"
Thought: I need to use a tool to help me answer the question.
Action: summary_tool
Action Input: {'input': 'Summarize the sports teams in Boston'}
HTTP Request: POST https://api.anthropic.com/v1/messages "HTTP/1.1 200 OK"
Observation: Boston has teams in the four major North American men's professional sports leagues plus Major League Soccer, and has won 39 championships in these leagues:
- The Boston Red Sox (MLB) play at Fenway Park. They are one of the most storied franchises in baseball.
- The Boston Celtics (NBA) play at TD Garden. Along with the Los Angeles Lakers, they have won the most NBA championships with 17.
- The Boston Bruins (NHL) also play at TD Garden. They were the first American NHL team and are an Original Six franchise.
- The New England Patriots (NFL) play in nearby Foxborough. They have won 6 Super Bowls in the 2000s and 2010s.
- The New England Revolution (MLS) also play in Foxborough.
Boston also has several other professional sports teams like the Boston Breakers (women's soccer) and Boston Cannons (lacrosse). The area's many colleges field competitive NCAA Division I teams, especially in ice hockey. The annual Boston Marathon is one of the world's most famous running events.
HTTP Request: POST https://api.anthropic.com/v1/messages "HTTP/1.1 200 OK"
Thought: The summary tool provided a good overview of the major sports teams in Boston. I think I can provide a concise summary answer to the original question based on this information.
Answer: Boston is home to successful professional sports teams in baseball (Red Sox), basketball (Celtics), hockey (Bruins), football (Patriots), and soccer (Revolution). The Red Sox, Celtics, and Bruins are some of the most historic franchises in their respective leagues. In total, Boston teams have won 39 championships in the four major North American sports leagues and MLS. The area also hosts the famous Boston Marathon each year and has many competitive college sports programs, especially in ice hockey.
HTTP Request: POST https://api.anthropic.com/v1/messages "HTTP/1.1 200 OK"
display(HTML(f'<p style="font-size:20px">{response.response}</p>'))
Boston is a city with a rich sports tradition, boasting several highly successful professional teams across multiple leagues. In baseball, the Boston Red Sox are one of the most storied franchises in the sport's history. The Boston Celtics have a similarly impressive legacy in basketball, with numerous championships to their name. Hockey fans in the city passionately support the Boston Bruins, another team with a long and successful history. The New England Patriots, who play in the nearby town of Foxborough, have been a dominant force in the NFL for many years. Even in the relatively newer MLS, the New England Revolution have made their mark on the Boston sports scene. These teams have combined to win an impressive 39 championships across the five leagues. Beyond professional sports, Boston is also known for hosting the prestigious Boston Marathon annually and having strong college sports programs, particularly in ice hockey.
# Should use the Chicago agent -> summary tool
response = query_engine.query(
"Give me a summary on all the positive aspects of Chicago"
)
Retrieval entering Chicago: ReActAgent
Retrieving from object ReActAgent with query Give me a summary on all the positive aspects of Chicago
HTTP Request: POST https://api.anthropic.com/v1/messages "HTTP/1.1 200 OK"
Thought: I need to use a tool to help me summarize the positive aspects of Chicago.
Action: summary_tool
Action Input: {'input': 'Provide a summary of the positive aspects and attributes of the city of Chicago'}
HTTP Request: POST https://api.anthropic.com/v1/messages "HTTP/1.1 200 OK"
Observation: Based on the provided information, some of the positive aspects and attributes of Chicago include:
- Chicago is the third most populous city in the U.S. and a major transportation hub. It is an important component in global distribution.
- The city has a diverse economy, with major industries including finance, commerce, industry, technology, telecommunications, and transportation. Chicago has the second largest central business district in the U.S.
- Chicago is an international hub for finance, culture, commerce, and industry. It has the largest derivatives market in the world and is a major financial center.
- The city is a popular tourist destination known for its architecture, skyscrapers, parks, museums, and unique cuisine. Major attractions include Millennium Park, Navy Pier, the Magnificent Mile, and numerous cultural institutions.
- Chicago has a rich history in the visual and performing arts, literature, film, theater, and music. It has influential art museums, a strong tradition of jazz and blues music, and acclaimed theater, dance, and comedy scenes.
- The city has an extensive public transportation system including rail lines and buses. It also has a large network of parks and open spaces.
- Chicago is home to several top universities, numerous major company headquarters, and a variety of unique neighborhoods and communities reflecting its diverse population and history.
In summary, Chicago stands out for its economic strength, cultural vibrancy, tourist appeal, transportation infrastructure, higher education, and cosmopolitan character. The context portrays Chicago in an overall positive light as a thriving, world-class American city.
HTTP Request: POST https://api.anthropic.com/v1/messages "HTTP/1.1 200 OK"
Thought: The summary tool provided a good overview of the positive aspects of Chicago. I think I can provide a satisfactory answer to the original question based on the information given.
Answer: Chicago has many positive attributes that make it a vibrant and appealing city:
Economy and Business: Chicago has a diverse and thriving economy, serving as a major hub for industries like finance, commerce, telecommunications, and transportation. It is home to the second largest central business district in the U.S. and the world's largest derivatives market.
Tourism and Culture: Chicago is a popular tourist destination, known for its iconic architecture, world-class museums, unique cuisine, and beautiful parks like Millennium Park. The city has a rich artistic heritage in visual arts, music (especially jazz and blues), theater, dance and comedy.
Transportation: Chicago is a critical transportation center for the U.S. with an extensive public transit network of trains and buses. Its airports and rail lines make it an important link in the nation's distribution network.
Education: Numerous prestigious universities call Chicago home, adding to its intellectual capital.
Diversity: Chicago's many neighborhoods reflect the diverse backgrounds and cultures of its residents, resulting in a cosmopolitan character.
In summary, Chicago stands out for its robust economy, vibrant cultural scene, strong transportation infrastructure, acclaimed educational institutions, diversity, and global status - making it an attractive place to live, work and visit.
HTTP Request: POST https://api.anthropic.com/v1/messages "HTTP/1.1 200 OK"
display(HTML(f'<p style="font-size:20px">{response.response}</p>'))
Chicago is a dynamic and appealing city with many positive attributes: It has a thriving, diverse economy serving as a major hub for finance, commerce, telecommunications and transportation. Chicago is a popular tourist destination, known for its iconic architecture, world-class museums, unique cuisine, and beautiful parks. The city is a critical transportation center with an extensive public transit network, airports and rail lines. Numerous prestigious universities call Chicago home. Its neighborhoods reflect the diverse backgrounds and cultures of residents, giving the city a cosmopolitan character. In summary, Chicago stands out for its robust economy, vibrant culture, strong transportation infrastructure, acclaimed educational institutions, diversity, and global status - making it an attractive place to live, work and visit.