How to use functions with a knowledge base

This notebook builds on the concepts in the argument generation notebook by creating an agent with access to a knowledge base and two functions that it can call based on the user's requirements.

We'll create an agent that uses data from arXiv to answer questions about academic subjects. It has two functions at its disposal:

  • get_articles: A function that gets arXiv articles about a subject and summarizes them for the user, with links included.
  • read_article_and_summarize: This function takes a previously searched article, reads it in its entirety, and summarizes the core argument, evidence, and conclusions.

This will get you familiar with a multi-function workflow that can choose from multiple services, and where some of the data from the first function is persisted to be used by the second.

Walkthrough

This guide takes you through the following workflow:

  • Search utilities: Creating the two functions that access arXiv for answers.
  • Configure Agent: Building up the agent behaviour that will assess whether a function is needed and, if so, call that function and present the results back to the agent.
  • arXiv conversation: Putting all of this together in a live conversation.
!pip install scipy --quiet
!pip install tenacity --quiet
!pip install tiktoken==0.3.3 --quiet
!pip install termcolor --quiet
!pip install openai --quiet
!pip install arxiv --quiet
!pip install pandas --quiet
!pip install PyPDF2 --quiet
!pip install tqdm --quiet
import arxiv
import ast
import concurrent.futures
import json
import os
import pandas as pd
import tiktoken
from csv import writer
from IPython.display import display, Markdown, Latex
from openai import OpenAI
from PyPDF2 import PdfReader
from scipy import spatial
from tenacity import retry, wait_random_exponential, stop_after_attempt
from tqdm import tqdm
from termcolor import colored

GPT_MODEL = "gpt-4o-mini"
EMBEDDING_MODEL = "text-embedding-ada-002"
client = OpenAI()

Search utilities

We'll first set up some utilities that will underpin our two functions.

Downloaded papers will be stored in a directory (we use ./data/papers here). We create a file, arxiv_library.csv, to store the embeddings and details of downloaded papers so they can be retrieved with summarize_text.

directory = './data/papers'

# Check if the directory already exists
if not os.path.exists(directory):
    # If the directory doesn't exist, create it and any necessary intermediate directories
    os.makedirs(directory)
    print(f"Directory '{directory}' created successfully.")
else:
    # If the directory already exists, print a message indicating it
    print(f"Directory '{directory}' already exists.")
Directory './data/papers' already exists.
# Set a directory to store downloaded papers
data_dir = os.path.join(os.curdir, "data", "papers")
paper_dir_filepath = "./data/papers/arxiv_library.csv"

# Generate a blank dataframe where we can store downloaded files
df = pd.DataFrame(list())
df.to_csv(paper_dir_filepath)
@retry(wait=wait_random_exponential(min=1, max=40), stop=stop_after_attempt(3))
def embedding_request(text):
    response = client.embeddings.create(input=text, model=EMBEDDING_MODEL)
    return response


@retry(wait=wait_random_exponential(min=1, max=40), stop=stop_after_attempt(3))
def get_articles(query, library=paper_dir_filepath, top_k=10):
    """This function gets the top_k articles based on a user's query, sorted by relevance.
    It also downloads the files and stores them in arxiv_library.csv so they can be retrieved later by read_article_and_summarize.
    """
    client = arxiv.Client()
    search = arxiv.Search(
        query = query,
        max_results = top_k
    )
    result_list = []
    for result in client.results(search):
        result_dict = {}
        result_dict.update({"title": result.title})
        result_dict.update({"summary": result.summary})

        # Take the first link as the article URL and the second as the PDF URL
        result_dict.update({"article_url": [x.href for x in result.links][0]})
        result_dict.update({"pdf_url": [x.href for x in result.links][1]})
        result_list.append(result_dict)

        # Store references in library file
        response = embedding_request(text=result.title)
        file_reference = [
            result.title,
            result.download_pdf(data_dir),
            response.data[0].embedding,
        ]

        # Write to file
        with open(library, "a") as f_object:
            writer_object = writer(f_object)
            writer_object.writerow(file_reference)
    return result_list
# Test that the search is working
result_output = get_articles("ppo reinforcement learning")
result_output[0]
{'title': 'Proximal Policy Optimization and its Dynamic Version for Sequence Generation',
 'summary': 'In sequence generation task, many works use policy gradient for model\noptimization to tackle the intractable backpropagation issue when maximizing\nthe non-differentiable evaluation metrics or fooling the discriminator in\nadversarial learning. In this paper, we replace policy gradient with proximal\npolicy optimization (PPO), which is a proved more efficient reinforcement\nlearning algorithm, and propose a dynamic approach for PPO (PPO-dynamic). We\ndemonstrate the efficacy of PPO and PPO-dynamic on conditional sequence\ngeneration tasks including synthetic experiment and chit-chat chatbot. The\nresults show that PPO and PPO-dynamic can beat policy gradient by stability and\nperformance.',
 'article_url': 'http://arxiv.org/abs/1808.07982v1',
 'pdf_url': 'http://arxiv.org/pdf/1808.07982v1'}
def strings_ranked_by_relatedness(
    query: str,
    df: pd.DataFrame,
    relatedness_fn=lambda x, y: 1 - spatial.distance.cosine(x, y),
    top_n: int = 100,
) -> list[str]:
    """Returns a list of strings and relatednesses, sorted from most related to least."""
    query_embedding_response = embedding_request(query)
    query_embedding = query_embedding_response.data[0].embedding
    strings_and_relatednesses = [
        (row["filepath"], relatedness_fn(query_embedding, row["embedding"]))
        for i, row in df.iterrows()
    ]
    strings_and_relatednesses.sort(key=lambda x: x[1], reverse=True)
    strings, relatednesses = zip(*strings_and_relatednesses)
    return strings[:top_n]
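As a quick, purely illustrative sanity check (not part of the workflow that follows), strings_ranked_by_relatedness can be exercised against a tiny in-memory library. The two filepaths below are hypothetical placeholders; the only assumption is the OpenAI client configured above.

# Illustrative only: build a toy two-row "library" and rank it against a query
toy_library = pd.DataFrame(
    {
        "filepath": ["./data/papers/ppo_paper.pdf", "./data/papers/gnn_paper.pdf"],
        "embedding": [
            embedding_request("Proximal Policy Optimization for sequence generation").data[0].embedding,
            embedding_request("Graph neural networks for molecule property prediction").data[0].embedding,
        ],
    }
)
strings_ranked_by_relatedness("PPO reinforcement learning", toy_library, top_n=2)
# Expected: the PPO-related filepath should come first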
def read_pdf(filepath):
    """Takes a filepath to a PDF and returns a string of the PDF's contents"""
    # creating a pdf reader object
    reader = PdfReader(filepath)
    pdf_text = ""
    page_number = 0
    for page in reader.pages:
        page_number += 1
        pdf_text += page.extract_text() + f"\nPage Number: {page_number}"
    return pdf_text


# Split a text into smaller chunks of size n, preferably ending at the end of a sentence
def create_chunks(text, n, tokenizer):
    """Returns successive n-sized chunks from provided text."""
    tokens = tokenizer.encode(text)
    i = 0
    while i < len(tokens):
        # Find the nearest end of sentence within a range of 0.5 * n and 1.5 * n tokens
        j = min(i + int(1.5 * n), len(tokens))
        while j > i + int(0.5 * n):
            # Decode the tokens and check for full stop or newline
            chunk = tokenizer.decode(tokens[i:j])
            if chunk.endswith(".") or chunk.endswith("\n"):
                break
            j -= 1
        # If no end of sentence found, use n tokens as the chunk size
        if j == i + int(0.5 * n):
            j = min(i + n, len(tokens))
        yield tokens[i:j]
        i = j
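
# Illustrative usage of create_chunks (a quick sanity check, not required by the
# workflow below):
#   enc = tiktoken.get_encoding("cl100k_base")
#   demo_text = "The quick brown fox jumps over the lazy dog. " * 20
#   [enc.decode(chunk) for chunk in create_chunks(demo_text, 30, enc)]
# Most decoded chunks end at a sentence boundary, because the loop above walks
# backwards from 1.5 * n tokens looking for a full stop or newline.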


def extract_chunk(content, template_prompt):
    """This function applies a prompt to some input content. In this case it returns a summarized chunk of text"""
    prompt = template_prompt + content
    response = client.chat.completions.create(
        model=GPT_MODEL, messages=[{"role": "user", "content": prompt}], temperature=0
    )
    return response.choices[0].message.content


def summarize_text(query):
    """This function does the following:

    - Reads in the arxiv_library.csv file, including the embeddings
    - Finds the closest file to the user's query
    - Scrapes the text out of the file and chunks it
    - Summarizes each chunk in parallel
    - Does one final summary and returns this to the user"""

    # A prompt to dictate how the recursive summarizations should approach the input paper
    summary_prompt = """Summarize this text from an academic paper. Extract any key points with reasoning.

Content:"""

    # If the library is empty (no searches have been performed yet), we perform one and download the results
    library_df = pd.read_csv(paper_dir_filepath).reset_index()
    if len(library_df) == 0:
        print("No papers searched yet, downloading first.")
        get_articles(query)
        print("Papers downloaded, continuing")
        library_df = pd.read_csv(paper_dir_filepath).reset_index()
    else:
        print("Existing papers found... Articles:", len(library_df))
    library_df.columns = ["title", "filepath", "embedding"]
    library_df["embedding"] = library_df["embedding"].apply(ast.literal_eval)
    strings = strings_ranked_by_relatedness(query, library_df, top_n=1)
    print("Chunking text from paper")
    pdf_text = read_pdf(strings[0])

    # Initialise tokenizer
    tokenizer = tiktoken.get_encoding("cl100k_base")
    results = ""

    # Chunk up the document into 1500 token chunks
    chunks = create_chunks(pdf_text, 1500, tokenizer)
    text_chunks = [tokenizer.decode(chunk) for chunk in chunks]
    print("Summarizing each chunk of text")

    # Parallel process the summaries
    with concurrent.futures.ThreadPoolExecutor(
        max_workers=len(text_chunks)
    ) as executor:
        futures = [
            executor.submit(extract_chunk, chunk, summary_prompt)
            for chunk in text_chunks
        ]
        with tqdm(total=len(text_chunks)) as pbar:
            for _ in concurrent.futures.as_completed(futures):
                pbar.update(1)
        for future in futures:
            data = future.result()
            results += data

    # Final summary
    print("Summarizing into overall summary")
    response = client.chat.completions.create(
        model=GPT_MODEL,
        messages=[
            {
                "role": "user",
                "content": f"""Write a summary collated from this collection of key points extracted from an academic paper.
                        The summary should highlight the core argument, conclusions and evidence, and answer the user's query.
                        User query: {query}
                        The summary should be structured in bulleted lists following the headings Core Argument, Evidence, and Conclusions.
                        Key points:\n{results}\nSummary:\n""",
            }
        ],
        temperature=0,
    )
    return response
# Test the summarize_text function works
chat_test_response = summarize_text("PPO reinforcement learning sequence generation")
Existing papers found... Articles: 10
Chunking text from paper
Summarizing each chunk of text


100%|███████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:05<00:00,  1.40s/it]


Summarizing into overall summary

Core Argument

  • The paper argues that Proximal Policy Optimization (PPO) and its dynamic variant (PPO-dynamic) significantly improve sequence generation tasks, particularly for chit-chat chatbots, by addressing the instability and suboptimal performance associated with traditional policy gradient methods.

Evidence

  • Challenges with Traditional Methods: Traditional policy gradient methods, like REINFORCE, suffer from unstable training and poor performance due to large updates and similar action tendencies, especially in non-differentiable evaluation contexts (e.g., BLEU scores).
  • PPO Advantages: PPO regularizes policy updates, enhancing training stability and enabling the generation of coherent and diverse chatbot responses.
  • Dynamic PPO Approach: PPO-dynamic introduces adaptive constraints on KL-divergence, allowing for dynamic adjustments based on action probabilities, which leads to improved training performance.
  • Experimental Validation: The authors conducted experiments on synthetic counting tasks and real-world chit-chat scenarios, demonstrating that PPO and PPO-dynamic outperform traditional methods like REINFORCE and SeqGAN in terms of stability and performance metrics (e.g., BLEU-2 scores).
  • Results: PPO-dynamic showed faster convergence and higher precision in the counting task, and it achieved the best performance in the chit-chat task, indicating its effectiveness in generating diverse and contextually appropriate responses.

Conclusions

  • The paper concludes that the introduction of PPO and PPO-dynamic enhances the training stability and output diversity in sequence generation tasks, making them more suitable for applications like chatbots.
  • The dynamic variant of PPO not only improves performance but also accelerates convergence, addressing the limitations of traditional policy gradient methods and providing a robust framework for reinforcement learning in sequence generation.

Configure Agent

We'll create our agent in this step, including a Conversation class to support multiple turns with the API, and some Python functions to enable interaction between the ChatCompletion API and our knowledge base functions.

@retry(wait=wait_random_exponential(min=1, max=40), stop=stop_after_attempt(3))
def chat_completion_request(messages, functions=None, model=GPT_MODEL):
    try:
        response = client.chat.completions.create(
            model=model,
            messages=messages,
            functions=functions,
        )
        return response
    except Exception as e:
        print("Unable to generate ChatCompletion response")
        print(f"Exception: {e}")
        return e
class Conversation:
    def __init__(self):
        self.conversation_history = []

    def add_message(self, role, content):
        message = {"role": role, "content": content}
        self.conversation_history.append(message)

    def display_conversation(self, detailed=False):
        role_to_color = {
            "system": "red",
            "user": "green",
            "assistant": "blue",
            "function": "magenta",
        }
        for message in self.conversation_history:
            print(
                colored(
                    f"{message['role']}: {message['content']}\n\n",
                    role_to_color[message["role"]],
                )
            )
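As a minimal illustration (no API call involved), the Conversation helper can be exercised on its own; the messages below are throwaway examples.

# Illustrative only: populate a throwaway conversation and pretty-print it
demo_conversation = Conversation()
demo_conversation.add_message("system", "You are a helpful assistant.")
demo_conversation.add_message("user", "Hello there!")
demo_conversation.display_conversation()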
# Initiate our get_articles and read_article_and_summarize functions
arxiv_functions = [
    {
        "name": "get_articles",
        "description": """Use this function to get academic papers from arXiv to answer user questions.""",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": f"""
                            User query in JSON. Responses should be summarized and should include the article URL reference
                            """,
                }
            },
            "required": ["query"],
        },
    },
    {
        "name": "read_article_and_summarize",
        "description": """Use this function to read whole papers and provide a summary for users.
        You should NEVER call this function before get_articles has been called in the conversation.""",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": f"""
                            Description of the article in plain text based on the user's query
                            """,
                }
            },
            "required": ["query"],
        },
    }
]
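Before wiring these schemas into the agent loop below, it can be useful to inspect a single function-calling response by hand. The sketch below assumes only the chat_completion_request helper and arxiv_functions defined above; the commented values are examples of what the model might return, not guaranteed output.

# Illustrative only: make one request with the function schemas and inspect the result
demo_messages = [
    {"role": "user", "content": "Find me some papers about PPO reinforcement learning"}
]
demo_response = chat_completion_request(demo_messages, functions=arxiv_functions)
demo_choice = demo_response.choices[0]
if demo_choice.finish_reason == "function_call":
    # The model asked for a function call; its arguments arrive as a JSON string
    print(demo_choice.message.function_call.name)  # e.g. "get_articles"
    print(json.loads(demo_choice.message.function_call.arguments))  # e.g. {"query": "..."}
else:
    print(demo_choice.message.content)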
def chat_completion_with_function_execution(messages, functions=None):
    """This function makes a ChatCompletion API call with the option of adding functions"""
    response = chat_completion_request(messages, functions)
    full_message = response.choices[0]
    if full_message.finish_reason == "function_call":
        print(f"Function generation requested, calling function")
        return call_arxiv_function(messages, full_message)
    else:
        print(f"Function not required, responding to user")
        return response


def call_arxiv_function(messages, full_message):
    """Executes the function call requested by the model and returns the result.
    Extend this by adding clauses to the if/elif statement below."""

    if full_message.message.function_call.name == "get_articles":
        try:
            parsed_output = json.loads(
                full_message.message.function_call.arguments
            )
            print("Getting search results")
            results = get_articles(parsed_output["query"])
        except Exception as e:
            print("Function execution failed")
            print(f"Error message: {e}")
            raise
        messages.append(
            {
                "role": "function",
                "name": full_message.message.function_call.name,
                "content": str(results),
            }
        )
        try:
            print("Got search results, summarizing content")
            response = chat_completion_request(messages)
            return response
        except Exception as e:
            print(type(e))
            raise Exception("Function chat request failed")

    elif (
        full_message.message.function_call.name == "read_article_and_summarize"
    ):
        parsed_output = json.loads(
            full_message.message.function_call.arguments
        )
        print("Finding and reading paper")
        summary = summarize_text(parsed_output["query"])
        return summary

    else:
        raise Exception("Function does not exist and cannot be called")

arXiv conversation

Let's put this all together in a conversation to test our functions.

# Start with a system message
paper_system_message = """You are arXivGPT, a helpful assistant that pulls academic papers to answer user questions.
You summarize the papers clearly so the customer can decide which to read to answer their question.
You always provide the article_url and title so the user can understand the name of the paper and click through to access it.
Begin!"""
paper_conversation = Conversation()
paper_conversation.add_message("system", paper_system_message)
# Add a user message
paper_conversation.add_message("user", "Hi, how does PPO reinforcement learning work?")
chat_response = chat_completion_with_function_execution(
    paper_conversation.conversation_history, functions=arxiv_functions
)
assistant_message = chat_response.choices[0].message.content
paper_conversation.add_message("assistant", assistant_message)
display(Markdown(assistant_message))
Function generation requested, calling function
Getting search results
Got search results, summarizing content

Here are some recent papers on Proximal Policy Optimization (PPO) in reinforcement learning that explain its mechanics and various improvements:

  1. Proximal Policy Optimization and its Dynamic Version for Sequence Generation - Summary: This paper applies PPO to sequence generation tasks, demonstrating that it outperforms traditional policy gradient methods in stability and performance. It introduces a dynamic version of PPO for these tasks. - PDF

  2. CIM-PPO: Proximal Policy Optimization with Liu-Correntropy Induced Metric - Summary: This work examines the asymmetry of the KL-divergence in PPO-KL and proposes PPO-CIM as an enhanced version with lower computational cost and improved policy updates, validated in experiments on continuous-action tasks. - PDF

  3. A2C is a special case of PPO - Summary: This paper shows that A2C can be viewed as a special case of PPO, providing theoretical justification and empirical evidence of their equivalence under controlled conditions. - PDF

  4. Proximal Policy Optimization via Enhanced Exploration Efficiency - Summary: This paper enhances the PPO algorithm with improved exploration strategies, proposing IEM-PPO, which achieves better sample efficiency and rewards than standard methods in complex environments. - PDF

  5. ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models - Summary: The ReMax method is proposed as an alternative to PPO for training large language models, reducing the complexity of hyperparameter tuning and improving training efficiency. - PDF

  6. Reward Scale Robustness for Proximal Policy Optimization via DreamerV3 Tricks - Summary: This work examines the applicability of DreamerV3's tricks to PPO, revealing mixed results and offering insights into PPO's clipping mechanism and performance. - PDF

  7. Neural PPO-Clip Attains Global Optimality: A Hinge Loss Perspective - Summary: This paper provides a theoretical foundation for PPO-Clip and introduces a new framework for interpreting its mechanism, showing improved convergence properties. - PDF

  8. Colored Noise in PPO: Improved Exploration and Performance through Correlated Action Sampling - Summary: This study proposes a PPO variant that uses correlated noise for improved exploration, showing better performance than traditional approaches. - PDF

  9. A dynamical clipping approach with task feedback for Proximal Policy Optimization - Summary: This paper proposes Pb-PPO, which dynamically adjusts PPO's clipping bound to improve returns, showing improved performance across a variety of tasks. - PDF

  10. PPO-UE: Proximal Policy Optimization via Uncertainty-Aware Exploration - Summary: This paper introduces PPO-UE, which incorporates uncertainty-aware exploration, and shows improved convergence speed and performance compared to standard PPO. - PDF

These papers provide a comprehensive overview of the developments and improvements to PPO and how it operates within reinforcement learning frameworks. You can click through the titles to access the full articles.

# Add another user message to induce our system to use the second tool
paper_conversation.add_message(
    "user",
    "Can you read the PPO sequence generation paper for me and give me a summary",
)
updated_response = chat_completion_with_function_execution(
    paper_conversation.conversation_history, functions=arxiv_functions
)
display(Markdown(updated_response.choices[0].message.content))
Function generation requested, calling function
Finding and reading paper
Existing papers found... Articles: 20
Chunking text from paper
Summarizing each chunk of text


100%|███████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:04<00:00,  1.21s/it]


Summarizing into overall summary

Core Argument

  • The paper argues that Proximal Policy Optimization (PPO) and its dynamic variant (PPO-dynamic) significantly improve performance on sequence generation tasks, particularly chit-chat chatbots, compared to traditional policy gradient methods.
  • It highlights the instability and suboptimal performance of traditional policy gradient methods such as REINFORCE during training, and positions PPO as a more stable and efficient alternative.

Evidence

  • Challenges with policy gradients: Traditional methods suffer from unstable training and poor performance due to large updates and similar action tendencies, especially with non-differentiable evaluation metrics (e.g., BLEU scores).
  • PPO advantages: PPO regularizes policy updates, improving stability and enabling the generation of coherent and diverse chatbot responses.
  • Dynamic PPO approach: PPO-dynamic introduces adaptive constraints on the KL-divergence, allowing dynamic adjustments based on action probabilities, which improves training performance.
  • Experimental validation: Experiments on a synthetic counting task and real-world chit-chat scenarios show that PPO and PPO-dynamic outperform traditional methods such as REINFORCE and SeqGAN in stability and performance metrics (e.g., BLEU-2 scores).
  • Results: PPO-dynamic showed faster convergence and higher precision on the counting task and achieved the best performance on the chit-chat task, indicating its effectiveness at generating diverse and contextually appropriate responses.

Conclusions

  • The paper concludes that introducing PPO and PPO-dynamic improves training stability and output diversity in sequence generation tasks, making them better suited to applications such as chatbots.
  • The dynamic variant of PPO not only improves performance but also accelerates convergence, addressing the limitations of traditional policy gradient methods and providing a robust framework for reinforcement learning in sequence generation.