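How to make your completions outputs reproducible with the new seed parameter
TLDR: Developers can now specify a seed parameter in the Chat Completion request to receive (mostly) consistent outputs. To help you keep track of these changes, we expose the system_fingerprint field. If this value differs between responses, you may see different outputs because of changes we have made to our systems. Please note that this feature is in beta and is currently supported only for gpt-4-1106-preview and gpt-3.5-turbo-1106.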
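Background
Reproducibility has been a major request from our user community when using our APIs. For example, once users can obtain reproducible numerical results, they can unlock many use cases that are sensitive to numerical changes.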
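Model-level features for consistent outputs
The Chat Completions and Completions APIs are non-deterministic by default (which means model outputs may differ from request to request), but they now offer some control over determinism through a few model-level controls.
This enables consistent completions, which gives you greater control over model behavior for anything built on top of the APIs. It is also useful for reproducing results and for testing, because you know much more precisely what output to expect.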
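Implementing consistent outputs
To receive mostly deterministic outputs across API calls:
- Set the seed parameter to any integer of your choice, but use the same value across requests. For example, 12345.
- Set all other parameters (prompt, temperature, top_p, etc.) to the same values across requests.
- In the response, check the system_fingerprint field. The system fingerprint is an identifier for the current combination of model weights, infrastructure, and other configuration options used by OpenAI servers to generate the completion. It changes whenever you change request parameters, or whenever OpenAI updates the numerical configuration of the infrastructure serving our models (which may happen a few times a year).
If the seed, the request parameters, and the system_fingerprint all match across your requests, then model outputs will mostly be identical. Even when the request parameters and system_fingerprint match, responses may still differ slightly because of the inherent non-determinism of our models.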
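The matching rule above can be sketched as a small check. This is only an illustration under stated assumptions: RunRecord and expect_mostly_identical are hypothetical names, not part of the OpenAI SDK, and the record is assumed to be something you log yourself for every call (with the fingerprint copied from the response).

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class RunRecord:
    """What you might log for one API call (hypothetical helper type)."""
    seed: Optional[int]
    params: Dict[str, Any]              # model, messages, temperature, top_p, ...
    system_fingerprint: Optional[str]   # copied from the response


def expect_mostly_identical(a: RunRecord, b: RunRecord) -> bool:
    """True only when seed, request parameters and fingerprint all match."""
    if a.seed is None or b.seed is None:
        return False  # no seed was set, so sampling is not pinned
    if a.system_fingerprint is None or b.system_fingerprint is None:
        return False  # unknown backend state, cannot compare
    return (
        a.seed == b.seed
        and a.params == b.params
        and a.system_fingerprint == b.system_fingerprint
    )


params = {"model": "gpt-3.5-turbo-1106", "temperature": 0, "top_p": 1}
first = RunRecord(seed=12345, params=params, system_fingerprint="fp_772e8125bb")
second = RunRecord(seed=12345, params=dict(params), system_fingerprint="fp_772e8125bb")
print(expect_mostly_identical(first, second))
```

A result of True only means that mostly identical outputs are expected; as noted above, small differences can still occur.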
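Model-level controls for consistent outputs - seed and system_fingerprint
seed
If specified, our system will make a best effort to sample deterministically, so that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
system_fingerprint
This fingerprint represents the backend configuration that the model runs with. It can be used together with the seed request parameter to understand when backend changes have been made that might affect determinism. It is the indicator of whether users should expect "almost always the same result".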
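One way to monitor backend changes is to remember the last fingerprint you saw and flag any difference. The sketch below is a minimal illustration; FingerprintMonitor is a hypothetical helper, and the fingerprint strings are placeholders that you would replace with the system_fingerprint value read from each response.

```python
from typing import List, Optional


class FingerprintMonitor:
    """Tracks system_fingerprint values and reports changes (hypothetical helper)."""

    def __init__(self) -> None:
        self.history: List[str] = []  # distinct fingerprints, in order of appearance

    def observe(self, fingerprint: Optional[str]) -> bool:
        """Record a fingerprint; return True if it differs from the previous one."""
        if fingerprint is None:
            return False  # nothing to compare
        changed = bool(self.history) and self.history[-1] != fingerprint
        if not self.history or changed:
            self.history.append(fingerprint)
        return changed


monitor = FingerprintMonitor()
for fp in ["fp_aaa", "fp_aaa", "fp_bbb"]:  # placeholder values
    if monitor.observe(fp):
        print(f"Backend configuration changed, now {fp}; outputs may differ.")
```

When a change is reported, previously recorded outputs for the same seed should no longer be treated as the expected result.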
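Example: Generating a short excerpt with a fixed seed
In this example, we will demonstrate how to generate a short excerpt using a fixed seed. This is particularly useful when you need consistent results for testing, debugging, or applications that require consistent outputs.
Python SDK
Note: Switch to the latest version of the SDK (1.3.3 at the time of writing).
!pip install --upgrade openai # Switch to the latest version of OpenAI (1.3.3 at the time of writing)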
import openai
import asyncio
from IPython.display import display, HTML
from utils.embeddings_utils import (
    get_embedding,
    distances_from_embeddings
)

GPT_MODEL = "gpt-3.5-turbo-1106"
async def get_chat_response(
    system_message: str, user_request: str, seed: int = None, temperature: float = 0.7
):
    try:
        messages = [
            {"role": "system", "content": system_message},
            {"role": "user", "content": user_request},
        ]
        # Note: this is the synchronous client call, so requests run one after another.
        response = openai.chat.completions.create(
            model=GPT_MODEL,
            messages=messages,
            seed=seed,
            max_tokens=200,
            temperature=temperature,
        )
        response_content = response.choices[0].message.content
        system_fingerprint = response.system_fingerprint
        prompt_tokens = response.usage.prompt_tokens
        completion_tokens = response.usage.total_tokens - response.usage.prompt_tokens
        # Render the response and its metadata as an HTML table in the notebook
        table = f"""
        <table>
        <tr><th>Response</th><td>{response_content}</td></tr>
        <tr><th>System Fingerprint</th><td>{system_fingerprint}</td></tr>
        <tr><th>Number of prompt tokens</th><td>{prompt_tokens}</td></tr>
        <tr><th>Number of completion tokens</th><td>{completion_tokens}</td></tr>
        </table>
        """
        display(HTML(table))
        return response_content
    except Exception as e:
        print(f"An error occurred: {e}")
        return None
def calculate_average_distance(responses):
    """
    This function calculates the average distance between the embeddings of the responses.
    The distance between embeddings is a measure of how similar the responses are.
    """
    # Calculate embeddings for each response
    response_embeddings = [get_embedding(response) for response in responses]
    # Compute distances between the first response and the rest
    distances = distances_from_embeddings(response_embeddings[0], response_embeddings[1:])
    # Calculate the average distance
    average_distance = sum(distances) / len(distances)
    # Return the average distance
    return average_distance
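To make the metric concrete, here is a dependency-free sketch of the same calculation. It assumes that distances_from_embeddings uses cosine distance, which is an assumption about that helper since its source is not shown here. The function names are illustrative and the vectors are toy values, not real embeddings.

```python
import math
from typing import List, Sequence


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """1 - cosine similarity; 0 means the vectors point in the same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def average_distance_to_first(embeddings: List[Sequence[float]]) -> float:
    """Average cosine distance between the first embedding and each of the others."""
    if len(embeddings) < 2:
        raise ValueError("need at least two embeddings to compare")
    first, rest = embeddings[0], embeddings[1:]
    distances = [cosine_distance(first, other) for other in rest]
    return sum(distances) / len(distances)


toy_embeddings = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # toy vectors
print(average_distance_to_first(toy_embeddings))
```

A lower average means the responses are closer to the first one, so a value near 0 indicates nearly identical outputs.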
First, let's try generating a few different versions of a short excerpt about "a journey to Mars" without the seed parameter. This is the default behavior:
topic = "a journey to Mars"
system_message = "You are a helpful assistant."
user_request = f"Generate a short excerpt of news about {topic}."

responses = []


async def get_response(i):
    print(f'Output {i + 1}\n{"-" * 10}')
    response = await get_chat_response(
        system_message=system_message, user_request=user_request
    )
    return response


responses = await asyncio.gather(*[get_response(i) for i in range(5)])
average_distance = calculate_average_distance(responses)
print(f"The average distance between responses is: {average_distance}")
Output 1
----------
Response | "NASA's Mars mission reaches critical stage as spacecraft successfully enters orbit around the red planet. The historic journey, which began over a year ago, has captured the world's attention as scientists and astronauts prepare to land on Mars for the first time. The mission is expected to provide valuable insights into the planet's geology, atmosphere, and potential for sustaining human life in the future." |
---|---|
System Fingerprint | fp_772e8125bb |
Number of prompt tokens | 29 |
Number of completion tokens | 76 |
Output 2
----------
Response | "NASA's Perseverance rover successfully landed on Mars, marking a major milestone in the mission to explore the red planet. The rover is equipped with advanced scientific instruments to search for signs of ancient microbial life and collect samples of rock and soil for future return to Earth. This historic achievement paves the way for further exploration and potential human missions to Mars in the near future." |
---|---|
System Fingerprint | fp_772e8125bb |
Number of prompt tokens | 29 |
Number of completion tokens | 76 |
Output 3
----------
Response | "SpaceX successfully launched the first manned mission to Mars yesterday, marking a historic milestone in space exploration. The crew of four astronauts will spend the next six months traveling to the red planet, where they will conduct groundbreaking research and experiments. This mission represents a significant step towards establishing a human presence on Mars and paves the way for future interplanetary travel." |
---|---|
System Fingerprint | fp_772e8125bb |
Number of prompt tokens | 29 |
Number of completion tokens | 72 |
Output 4
----------
Response | "NASA's latest Mars mission exceeds expectations as the Perseverance rover uncovers tantalizing clues about the Red Planet's past. Scientists are thrilled by the discovery of ancient riverbeds and sedimentary rocks, raising hopes of finding signs of past life on Mars. With this exciting progress, the dream of sending humans to Mars feels closer than ever before." |
---|---|
System Fingerprint | fp_772e8125bb |
Number of prompt tokens | 29 |
Number of completion tokens | 72 |
Output 5
----------
Response | "NASA's Perseverance Rover Successfully Lands on Mars, Begins Exploration Mission In a historic moment for space exploration, NASA's Perseverance rover has successfully landed on the surface of Mars. After a seven-month journey, the rover touched down in the Jezero Crater, a location scientists believe may have once held a lake and could potentially contain signs of ancient microbial life. The rover's primary mission is to search for evidence of past life on Mars and collect rock and soil samples for future return to Earth. Equipped with advanced scientific instruments, including cameras, spectrometers, and a drill, Perseverance will begin its exploration of the Martian surface, providing valuable data and insights into the planet's geology and potential habitability. This successful landing marks a significant milestone in humanity's quest to understand the red planet and paves the way for future manned missions to Mars. NASA's Perseverance rover is poised to unravel the mysteries of Mars and unlock new possibilities |
The average distance between responses is: 0.1136714512418833
Now, let's try to run the same code with a constant seed of 123 and a temperature of 0, and compare the responses and system_fingerprint.
SEED = 123
responses = []


async def get_response(i):
    print(f'Output {i + 1}\n{"-" * 10}')
    response = await get_chat_response(
        system_message=system_message,
        seed=SEED,
        temperature=0,
        user_request=user_request,
    )
    return response


responses = await asyncio.gather(*[get_response(i) for i in range(5)])
average_distance = calculate_average_distance(responses)
print(f"The average distance between responses is: {average_distance}")
Output 1
----------
Response | "NASA's Perseverance Rover Successfully Lands on Mars In a historic achievement, NASA's Perseverance rover has successfully landed on the surface of Mars, marking a major milestone in the exploration of the red planet. The rover, which traveled over 293 million miles from Earth, is equipped with state-of-the-art instruments designed to search for signs of ancient microbial life and collect rock and soil samples for future return to Earth. This mission represents a significant step forward in our understanding of Mars and the potential for human exploration of the planet in the future." |
Output 2
----------
Response | "NASA's Perseverance rover successfully lands on Mars, marking a historic milestone in space exploration. The rover is equipped with advanced scientific instruments to search for signs of ancient microbial life and collect samples for future return to Earth. This mission paves the way for future human exploration of the red planet, as scientists and engineers continue to push the boundaries of space travel and expand our understanding of the universe." |
---|---|
System Fingerprint | fp_772e8125bb |
Number of prompt tokens | 29 |
Number of completion tokens | 81 |
Output 3
----------
Response | "NASA's Perseverance rover successfully lands on Mars, marking a historic milestone in space exploration. The rover is equipped with advanced scientific instruments to search for signs of ancient microbial life and collect samples for future return to Earth. This mission paves the way for future human exploration of the red planet, as NASA continues to push the boundaries of space exploration." |
---|---|
System Fingerprint | fp_772e8125bb |
Number of prompt tokens | 29 |
Number of completion tokens | 72 |
Output 4
----------
Response | "NASA's Perseverance rover successfully lands on Mars, marking a historic milestone in space exploration. The rover is equipped with advanced scientific instruments to search for signs of ancient microbial life and collect samples for future return to Earth. This mission paves the way for future human exploration of the red planet, as scientists and engineers continue to push the boundaries of space travel and expand our understanding of the universe." |
---|---|
System Fingerprint | fp_772e8125bb |
Number of prompt tokens | 29 |
Number of completion tokens | 81 |
Output 5
----------
Response | "NASA's Perseverance rover successfully lands on Mars, marking a historic milestone in space exploration. The rover is equipped with advanced scientific instruments to search for signs of ancient microbial life and collect samples for future return to Earth. This mission paves the way for future human exploration of the red planet, as scientists and engineers continue to push the boundaries of space travel." |
---|---|
System Fingerprint | fp_772e8125bb |
Number of prompt tokens | 29 |
Number of completion tokens | 74 |
The average distance between responses is: 0.0449054397632461
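As we can observe, fixing the seed (together with a temperature of 0) gives much more consistent results: the average distance drops from about 0.114 to about 0.045, and several responses are identical or nearly identical.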
Conclusion
We demonstrated how to use a fixed integer seed to generate more consistent outputs from our model. This is particularly useful in scenarios where reproducibility is important. However, while the seed improves consistency, it does not guarantee the quality of the output. To get reproducible outputs, set the seed to the same integer across Chat Completions calls, and match all other parameters such as temperature and max_tokens. A further use of reproducible outputs is to keep the seed constant when benchmarking or evaluating different prompts or models, so that each version is evaluated under the same conditions, making the comparisons fair and the results reliable.
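The benchmarking idea can be sketched by building every request from one place, so that the seed and sampling parameters cannot drift between prompt variants. This is a minimal sketch under stated assumptions: EVAL_SEED, build_request, and build_eval_requests are hypothetical names, the prompts are examples, and no API call is made here.

```python
from typing import Any, Dict, List

EVAL_SEED = 123  # any integer works, as long as every run uses the same one


def build_request(model: str, system_message: str, user_request: str) -> Dict[str, Any]:
    """Build keyword arguments for a chat completion with pinned sampling settings."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_message},
            {"role": "user", "content": user_request},
        ],
        "seed": EVAL_SEED,
        "temperature": 0,
        "max_tokens": 200,
    }


def build_eval_requests(
    model: str, system_message: str, prompts: List[str]
) -> List[Dict[str, Any]]:
    """One request per prompt variant, all sharing the same seed and parameters."""
    return [build_request(model, system_message, prompt) for prompt in prompts]


prompt_variants = [
    "Generate a short excerpt of news about a journey to Mars.",
    "Write a brief news excerpt about a journey to Mars.",
]
requests = build_eval_requests(
    "gpt-3.5-turbo-1106", "You are a helpful assistant.", prompt_variants
)
print(len(requests), {r["seed"] for r in requests})
```

Each dictionary can then be passed to the client as openai.chat.completions.create(**request), and recording the returned system_fingerprint next to each result shows whether the runs shared a backend configuration.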