How to make your completions outputs reproducible with the new seed parameter

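TLDR: Developers can now specify the seed parameter in Chat Completion requests to receive (mostly) consistent outputs. To help you keep track of backend changes that can affect outputs, we expose the system_fingerprint field. If this value differs between responses, you may see different outputs because of changes we've made to our systems. Please note that this feature is in beta and is currently supported only for gpt-4-1106-preview and gpt-3.5-turbo-1106.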

Context

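Reproducibility has long been one of the most requested features from our API user community. For example, the ability to get reproducible numerical results unlocks many use cases that are sensitive to numerical changes.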

Model level features for consistent outputs

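The Chat Completions and Completions APIs are non-deterministic by default (which means model outputs may differ from request to request), but they now offer some control over determinism through a few model level controls.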

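This makes consistent completions possible, which gives you much finer control over model behavior in anything built on top of the API. It is also very useful for reproducing results and for testing, because you know what to expect from a given request.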
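To build intuition for why a seed helps, here is a toy analogy in plain Python: sampling "tokens" from a fixed probability distribution with `random.Random`. This is not how OpenAI's servers implement sampling; the logits and function names are hypothetical, and the real API only promises best-effort determinism. It shows that without a seed each call draws fresh randomness, while a fixed integer seed replays the same sequence of draws.

```python
import math
import random
from typing import List, Optional


def softmax(logits: List[float], temperature: float) -> List[float]:
    """Convert logits to probabilities; temperature must be > 0 (lower = sharper)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_tokens(
    logits: List[float], n: int, temperature: float = 0.7, seed: Optional[int] = None
) -> List[int]:
    """Draw n token indices from a toy distribution (hypothetical helper).

    seed=None: fresh randomness on every call.
    integer seed: the same sequence of draws on every call.
    """
    rng = random.Random(seed)
    probs = softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=n)


toy_logits = [2.0, 1.5, 0.3, -1.0]  # made-up scores for four candidate tokens
print(sample_tokens(toy_logits, 8, seed=123))
print(sample_tokens(toy_logits, 8, seed=123))  # same seed, same draws
```

The two seeded calls print the same list; drop the `seed` argument and the lists will usually differ.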

Implementing consistent outputs

To receive mostly deterministic outputs across API calls:

  • seed 参数设置为您选择的任何整数,但在请求之间使用相同的值。例如,12345
  • 将所有其他参数(prompt、temperature、top_p 等)设置为相同的值。
  • 在响应中,检查 system_fingerprint 字段。系统指纹是模型权重、基础设施和其他 OpenAI 服务器用于生成 completions 的配置选项的组合标识符。每当您更改请求参数,或 OpenAI 更新我们模型所服务的 But 基础架构的数值配置时(这可能每年发生几次),它都会发生变化。
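The first two steps amount to "send exactly the same request every time". A minimal sketch, assuming you build requests in one place: the hypothetical `build_chat_request` helper below returns keyword arguments for `client.chat.completions.create(...)`, so the seed, temperature, max_tokens and messages cannot drift between calls. The default values are illustrative only.

```python
from typing import Optional


def build_chat_request(
    user_request: str,
    system_message: str = "You are a helpful assistant.",
    seed: Optional[int] = 12345,
    temperature: float = 0.0,
    max_tokens: int = 200,
    model: str = "gpt-3.5-turbo-1106",
) -> dict:
    """Hypothetical helper: one source of truth for every request parameter."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_message},
            {"role": "user", "content": user_request},
        ],
        "seed": seed,
        "temperature": temperature,
        "max_tokens": max_tokens,
    }


req_a = build_chat_request("Generate a short excerpt of news about a journey to Mars.")
req_b = build_chat_request("Generate a short excerpt of news about a journey to Mars.")
assert req_a == req_b  # identical parameters, including the seed
```

With the openai SDK (1.x) you would then call `OpenAI().chat.completions.create(**req_a)` and read `response.system_fingerprint` for the third step.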

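If the seed, request parameters, and system_fingerprint all match across your requests, then model outputs will mostly be identical. Even when the request parameters and system_fingerprint match, responses may still differ slightly because of the inherent non-determinism of our models.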
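The matching rule above can be expressed as a key: two responses are candidates for identical output only if their request parameters (seed included) and their system_fingerprint values all match. This is a sketch with hypothetical helper names; an equal key means "expect mostly the same output", not a guarantee.

```python
import json


def reproducibility_key(request_kwargs: dict, system_fingerprint: str) -> str:
    """Combine request parameters (including seed) with the returned fingerprint."""
    # sort_keys makes the serialization independent of dict insertion order
    params = json.dumps(request_kwargs, sort_keys=True)
    return f"{system_fingerprint}|{params}"


def expect_mostly_same(req_a: dict, fp_a: str, req_b: dict, fp_b: str) -> bool:
    """True when seed, other parameters and fingerprint all match."""
    return reproducibility_key(req_a, fp_a) == reproducibility_key(req_b, fp_b)


req = {
    "model": "gpt-3.5-turbo-1106",
    "seed": 123,
    "temperature": 0,
    "messages": [{"role": "user", "content": "hi"}],
}
print(expect_mostly_same(req, "fp_772e8125bb", dict(req), "fp_772e8125bb"))  # True
print(expect_mostly_same(req, "fp_772e8125bb", {**req, "seed": 124}, "fp_772e8125bb"))  # False
```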

Model level controls for consistent outputs - seed and system_fingerprint

seed

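If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.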

system_fingerprint

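This fingerprint represents the backend configuration that the model runs with. It can be used together with the seed request parameter to understand when backend changes have been made that might impact determinism. If the fingerprint is unchanged, users can expect "almost always the same result"; if it has changed, outputs for the same seed and parameters may differ.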
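One way to act on system_fingerprint is to watch it across responses and flag when it changes. The `FingerprintMonitor` class below is a hypothetical helper, not part of the openai SDK; you would feed it `response.system_fingerprint` after each call. The second fingerprint in the usage loop is made up.

```python
from typing import Optional


class FingerprintMonitor:
    """Hypothetical helper: track system_fingerprint values across responses."""

    def __init__(self) -> None:
        self.last: Optional[str] = None  # most recent fingerprint seen
        self.changes = 0  # how many times the backend configuration changed

    def observe(self, fingerprint: str) -> bool:
        """Record a fingerprint; return True if it differs from the previous one."""
        changed = self.last is not None and fingerprint != self.last
        if changed:
            self.changes += 1
        self.last = fingerprint
        return changed


monitor = FingerprintMonitor()
for fp in ["fp_772e8125bb", "fp_772e8125bb", "fp_hypothetical_new"]:
    if monitor.observe(fp):
        print(f"Backend changed: now {fp}; outputs for the same seed may drift")
```

Only the third value triggers the message, because the first observation has nothing to compare against.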

Example: Generating a short excerpt with a fixed seed

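In this example, we demonstrate how to generate a short excerpt using a fixed seed. This is particularly useful when you need consistent results for testing or debugging, or for applications that require consistent outputs.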

Python SDK

Note: Switch to the latest version of the SDK (1.3.3 at the time of writing).
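If you want the notebook to fail fast on an old SDK, you can check the installed version with the standard library's `importlib.metadata`. This is a sketch: the helper names are hypothetical, and treating 1.3.3 as the minimum simply reuses the version this notebook was written against.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def parse_version(v: str) -> tuple:
    """Extract (major, minor, patch) from strings like '1.3.3' or '1.0.0b1'.

    Pre-release suffixes are ignored, so '1.0.0b1' compares equal to '1.0.0'.
    """
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", v)
    if match is None:
        raise ValueError(f"Unrecognized version string: {v!r}")
    return tuple(int(part) for part in match.groups())


def openai_sdk_is_at_least(required: str = "1.3.3") -> bool:
    """Return False if the openai package is missing or older than `required`."""
    try:
        installed = version("openai")
    except PackageNotFoundError:
        return False
    return parse_version(installed) >= parse_version(required)


if not openai_sdk_is_at_least():
    print("The openai SDK is missing or older than 1.3.3; run: pip install --upgrade openai")
```

Comparing integer tuples avoids the string-comparison trap where "1.10.0" would sort before "1.3.3".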

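!pip install --upgrade openai # switch to the latest version of OpenAI (1.3.3 at the time of writing)
import openai
import asyncio
from typing import Optional
from IPython.display import display, HTML

# utils/embeddings_utils.py is a local helper module from the OpenAI Cookbook repository,
# not part of the openai package
from utils.embeddings_utils import (
    get_embedding,
    distances_from_embeddings
)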

GPT_MODEL = "gpt-3.5-turbo-1106"


async def get_chat_response(
    system_message: str,
    user_request: str,
    seed: Optional[int] = None,
    temperature: float = 0.7,
):
    # Note: openai.chat.completions.create is a blocking call, so the coroutines
    # passed to asyncio.gather below run one after another rather than concurrently.
    try:
        messages = [
            {"role": "system", "content": system_message},
            {"role": "user", "content": user_request},
        ]

        response = openai.chat.completions.create(
            model=GPT_MODEL,
            messages=messages,
            seed=seed,  # with seed=None, no fixed seed is requested (default behavior)
            max_tokens=200,
            temperature=temperature,
        )

        response_content = response.choices[0].message.content
        system_fingerprint = response.system_fingerprint
        prompt_tokens = response.usage.prompt_tokens
        completion_tokens = response.usage.completion_tokens

        # Render the response and its metadata as a small HTML table in the notebook
        table = f"""
        <table>
        <tr><th>Response</th><td>{response_content}</td></tr>
        <tr><th>System Fingerprint</th><td>{system_fingerprint}</td></tr>
        <tr><th>Number of prompt tokens</th><td>{prompt_tokens}</td></tr>
        <tr><th>Number of completion tokens</th><td>{completion_tokens}</td></tr>
        </table>
        """
        display(HTML(table))

        return response_content
    except Exception as e:
        print(f"An error occurred: {e}")
        return None

def calculate_average_distance(responses):
    """
    Calculate the average distance between the embedding of the first response
    and the embeddings of the remaining responses.
    Lower values mean the responses are more similar.
    """
    # Skip failed requests, which get_chat_response returns as None
    responses = [response for response in responses if response is not None]

    # Calculate embeddings for each response
    response_embeddings = [get_embedding(response) for response in responses]

    # Compute distances between the first response and the rest
    distances = distances_from_embeddings(response_embeddings[0], response_embeddings[1:])

    # Calculate and return the average distance
    return sum(distances) / len(distances)
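To make the metric concrete, here is a dependency-free sketch of what calculate_average_distance computes, assuming `distances_from_embeddings` uses its default cosine metric (cosine distance = 1 - cosine similarity). The function names are hypothetical and the 2-D vectors are made up; real embeddings have far more dimensions. Because this is a distance, smaller numbers mean more similar responses.

```python
import math
from typing import List, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity: 0 for the same direction, 1 for orthogonal, 2 for opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def average_distance_to_first(embeddings: List[Sequence[float]]) -> float:
    """Mean cosine distance from the first embedding to every other one."""
    first, rest = embeddings[0], embeddings[1:]
    distances = [cosine_distance(first, e) for e in rest]
    return sum(distances) / len(distances)


toy = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # made-up 2-D "embeddings"
print(average_distance_to_first(toy))  # (0.0 + 1.0) / 2 = 0.5
```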

First, let's try generating a few different versions of a short excerpt about "a journey to Mars" without the seed parameter. This is the default behavior:

topic = "a journey to Mars"
system_message = "You are a helpful assistant."
user_request = f"Generate a short excerpt of news about {topic}."

responses = []


async def get_response(i):
    print(f'Output {i + 1}\n{"-" * 10}')
    response = await get_chat_response(
        system_message=system_message, user_request=user_request
    )
    return response


responses = await asyncio.gather(*[get_response(i) for i in range(5)])
average_distance = calculate_average_distance(responses)
print(f"The average distance between responses is: {average_distance}")
Output 1
----------
Response: "NASA's Mars mission reaches critical stage as spacecraft successfully enters orbit around the red planet. The historic journey, which began over a year ago, has captured the world's attention as scientists and astronauts prepare to land on Mars for the first time. The mission is expected to provide valuable insights into the planet's geology, atmosphere, and potential for sustaining human life in the future."
System Fingerprint: fp_772e8125bb
Number of prompt tokens: 29
Number of completion tokens: 76
Output 2
----------
Response: "NASA's Perseverance rover successfully landed on Mars, marking a major milestone in the mission to explore the red planet. The rover is equipped with advanced scientific instruments to search for signs of ancient microbial life and collect samples of rock and soil for future return to Earth. This historic achievement paves the way for further exploration and potential human missions to Mars in the near future."
System Fingerprint: fp_772e8125bb
Number of prompt tokens: 29
Number of completion tokens: 76
Output 3
----------
Response: "SpaceX successfully launched the first manned mission to Mars yesterday, marking a historic milestone in space exploration. The crew of four astronauts will spend the next six months traveling to the red planet, where they will conduct groundbreaking research and experiments. This mission represents a significant step towards establishing a human presence on Mars and paves the way for future interplanetary travel."
System Fingerprint: fp_772e8125bb
Number of prompt tokens: 29
Number of completion tokens: 72
Output 4
----------
Response: "NASA's latest Mars mission exceeds expectations as the Perseverance rover uncovers tantalizing clues about the Red Planet's past. Scientists are thrilled by the discovery of ancient riverbeds and sedimentary rocks, raising hopes of finding signs of past life on Mars. With this exciting progress, the dream of sending humans to Mars feels closer than ever before."
System Fingerprint: fp_772e8125bb
Number of prompt tokens: 29
Number of completion tokens: 72
Output 5
----------
Response: "NASA's Perseverance Rover Successfully Lands on Mars, Begins Exploration Mission

In a historic moment for space exploration, NASA's Perseverance rover has successfully landed on the surface of Mars. After a seven-month journey, the rover touched down in the Jezero Crater, a location scientists believe may have once held a lake and could potentially contain signs of ancient microbial life.

The rover's primary mission is to search for evidence of past life on Mars and collect rock and soil samples for future return to Earth. Equipped with advanced scientific instruments, including cameras, spectrometers, and a drill, Perseverance will begin its exploration of the Martian surface, providing valuable data and insights into the planet's geology and potential habitability.

This successful landing marks a significant milestone in humanity's quest to understand the red planet and paves the way for future manned missions to Mars. NASA's Perseverance rover is poised to unravel the mysteries of Mars and unlock new possibilities
System Fingerprint: fp_772e8125bb
Number of prompt tokens: 29
Number of completion tokens: 200

The average distance between responses is: 0.1136714512418833

Now, let's run the same code with a constant seed of 123 and a temperature of 0, and compare the responses and their system_fingerprint values.

SEED = 123
responses = []


async def get_response(i):
    print(f'Output {i + 1}\n{"-" * 10}')
    response = await get_chat_response(
        system_message=system_message,
        seed=SEED,
        temperature=0,
        user_request=user_request,
    )
    return response


responses = await asyncio.gather(*[get_response(i) for i in range(5)])

average_distance = calculate_average_distance(responses)
print(f"The average distance between responses is: {average_distance}")
Output 1
----------
Response: "NASA's Perseverance Rover Successfully Lands on Mars

In a historic achievement, NASA's Perseverance rover has successfully landed on the surface of Mars, marking a major milestone in the exploration of the red planet. The rover, which traveled over 293 million miles from Earth, is equipped with state-of-the-art instruments designed to search for signs of ancient microbial life and collect rock and soil samples for future return to Earth. This mission represents a significant step forward in our understanding of Mars and the potential for human exploration of the planet in the future."
System Fingerprint: fp_772e8125bb
Number of prompt tokens: 29
Number of completion tokens: 113
Output 2
----------
Response: "NASA's Perseverance rover successfully lands on Mars, marking a historic milestone in space exploration. The rover is equipped with advanced scientific instruments to search for signs of ancient microbial life and collect samples for future return to Earth. This mission paves the way for future human exploration of the red planet, as scientists and engineers continue to push the boundaries of space travel and expand our understanding of the universe."
System Fingerprint: fp_772e8125bb
Number of prompt tokens: 29
Number of completion tokens: 81
Output 3
----------
Response: "NASA's Perseverance rover successfully lands on Mars, marking a historic milestone in space exploration. The rover is equipped with advanced scientific instruments to search for signs of ancient microbial life and collect samples for future return to Earth. This mission paves the way for future human exploration of the red planet, as NASA continues to push the boundaries of space exploration."
System Fingerprint: fp_772e8125bb
Number of prompt tokens: 29
Number of completion tokens: 72
Output 4
----------
Response: "NASA's Perseverance rover successfully lands on Mars, marking a historic milestone in space exploration. The rover is equipped with advanced scientific instruments to search for signs of ancient microbial life and collect samples for future return to Earth. This mission paves the way for future human exploration of the red planet, as scientists and engineers continue to push the boundaries of space travel and expand our understanding of the universe."
System Fingerprint: fp_772e8125bb
Number of prompt tokens: 29
Number of completion tokens: 81
Output 5
----------
Response: "NASA's Perseverance rover successfully lands on Mars, marking a historic milestone in space exploration. The rover is equipped with advanced scientific instruments to search for signs of ancient microbial life and collect samples for future return to Earth. This mission paves the way for future human exploration of the red planet, as scientists and engineers continue to push the boundaries of space travel."
System Fingerprint: fp_772e8125bb
Number of prompt tokens: 29
Number of completion tokens: 74
The average distance between responses is: 0.0449054397632461

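As we can observe, the fixed seed (together with a temperature of 0) produces much more consistent results: the average distance between responses dropped from about 0.114 to about 0.045, and outputs 2 through 5 share their first two sentences. The outputs are still not all identical (only outputs 2 and 4 match exactly), even though the system_fingerprint stayed the same, which illustrates why the results are described as mostly deterministic.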
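Embedding distance measures semantic closeness; a complementary check is how many responses are verbatim identical. The hypothetical `exact_match_summary` helper below counts that. Since the full texts are long, the usage example uses placeholder strings that reproduce the pattern of the seeded run above (outputs 2 and 4 identical, the other three different).

```python
from collections import Counter
from typing import List, Optional


def exact_match_summary(responses: List[Optional[str]]) -> dict:
    """Count verbatim-identical responses; None entries (failed calls) are skipped."""
    valid = [r for r in responses if r is not None]
    counts = Counter(valid)
    largest = counts.most_common(1)[0][1] if counts else 0
    return {
        "total": len(valid),
        "distinct": len(counts),
        "largest_identical_group": largest,
    }


# Placeholders with the same pattern as the seeded run: outputs 2 and 4 matched exactly.
seeded_pattern = ["text_1", "text_2", "text_3", "text_2", "text_5"]
print(exact_match_summary(seeded_pattern))
# {'total': 5, 'distinct': 4, 'largest_identical_group': 2}
```

In the notebook you would pass the `responses` list returned by `asyncio.gather` instead of the placeholders.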

Conclusion

We demonstrated how to use a fixed integer seed to generate consistent outputs from our model. This is particularly useful in scenarios where reproducibility is important. However, the seed only makes outputs more consistent: it does not guarantee identical outputs, and it says nothing about the quality of the output. To get reproducible outputs, set the seed to the same integer across Chat Completions calls, keep every other parameter (temperature, max_tokens, etc.) the same, and check that the system_fingerprint has not changed.

A natural extension is to use a consistent seed when benchmarking or evaluating the performance of different prompts or models, so that each version is evaluated under the same conditions, making comparisons fairer and results more reliable.
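The benchmarking idea can be sketched as a small harness that passes the same seed and temperature to every prompt variant. Everything here is hypothetical: `complete` stands in for a wrapper around the Chat Completions call, `fake_complete` lets the sketch run offline, and the length-based score is a toy metric rather than a real evaluation.

```python
from typing import Callable, Dict

# Hypothetical signature: (prompt, seed, temperature) -> completion text
CompletionFn = Callable[[str, int, float], str]


def evaluate_prompts(
    prompts: Dict[str, str],
    complete: CompletionFn,
    score: Callable[[str], float],
    seed: int = 123,
    temperature: float = 0.0,
) -> Dict[str, float]:
    """Score each prompt variant under the same seed and temperature."""
    results = {}
    for name, prompt in prompts.items():
        output = complete(prompt, seed, temperature)  # identical sampling settings
        results[name] = score(output)
    return results


def fake_complete(prompt: str, seed: int, temperature: float) -> str:
    # Offline stand-in for a real API call, so the sketch runs without a key.
    return f"{prompt} [seed={seed}, t={temperature}]"


scores = evaluate_prompts(
    {"short": "Summarize Mars news.", "detailed": "Summarize Mars news in detail."},
    complete=fake_complete,
    score=lambda text: float(len(text)),  # toy metric: output length
)
print(scores)
```

Keeping the seed and temperature as harness parameters, rather than inside each prompt's call, makes it hard to accidentally evaluate two variants under different sampling settings.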