提示词指南

OpenAI 的音频转录 API 有一个名为 prompt 的可选参数。

该提示词旨在帮助拼接多个音频片段。通过提交先前片段的转录文本,Whisper 模型可以利用该上下文来更好地理解语音并保持一致的写作风格。

然而,提示词不需要是先前音频片段的真实转录文本。可以提交_虚构_的提示词,以引导模型使用特定的拼写或风格。

本笔记本分享了两种使用虚构提示词来引导模型输出的技术:

  • 转录生成:GPT 可以将指令转换为 Whisper 模仿的虚构转录文本。
  • 拼写指南:拼写指南可以告诉模型如何拼写人名、产品名、公司名等。

这些技术并不特别可靠,但在某些情况下可能有用。

与 GPT 提示词的比较

提示 Whisper 与提示 GPT 不同。例如,如果您提交类似“将列表格式化为 Markdown 格式”的指令,模型将不会遵守,因为它遵循提示词的风格,而不是其中包含的任何指令。

此外,提示词仅限于 224 个 token。如果提示词长于 224 个 token,则只会考虑提示词的最后 224 个 token;所有先前的 token 将被静默忽略。所使用的分词器是多语言 Whisper 分词器

为了获得良好的结果,请精心制作示例来描绘您想要的风格。

设置

首先,让我们:

  • 导入 OpenAI Python 库(如果您没有,则需要使用 pip install openai 进行安装)
  • 下载几个示例音频文件
# imports
from openai import OpenAI  # for making OpenAI API calls
import urllib  # for downloading example audio files
import os

client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY", "<your OpenAI API key if not set as env var>"))
# set download paths
up_first_remote_filepath = "https://cdn.openai.com/API/examples/data/upfirstpodcastchunkthree.wav"
bbq_plans_remote_filepath = "https://cdn.openai.com/API/examples/data/bbq_plans.wav"
product_names_remote_filepath = "https://cdn.openai.com/API/examples/data/product_names.wav"

# set local save locations
up_first_filepath = "data/upfirstpodcastchunkthree.wav"
bbq_plans_filepath = "data/bbq_plans.wav"
product_names_filepath = "data/product_names.wav"

# download example audio files and save locally
urllib.request.urlretrieve(up_first_remote_filepath, up_first_filepath)
urllib.request.urlretrieve(bbq_plans_remote_filepath, bbq_plans_filepath)
urllib.request.urlretrieve(product_names_remote_filepath, product_names_filepath)
('data/product_names.wav', <http.client.HTTPMessage object at 0x11275fb10>)

作为基准,我们将转录一个 NPR 播客片段

本示例的音频文件将是 NPR 播客 Up First 的一个片段。

让我们获取基准转录,然后引入提示词。

# define a wrapper function for seeing how prompts affect transcriptions
def transcribe(audio_filepath, prompt: str) -> str:
    """Given a prompt, transcribe the audio file."""
    transcript = client.audio.transcriptions.create(
        file=open(audio_filepath, "rb"),
        model="whisper-1",
        prompt=prompt,
    )
    return transcript.text
# baseline transcription with no prompt
transcribe(up_first_filepath, prompt="")
"I stick contacts in my eyes. Do you really? Yeah. That works okay? You don't have to, like, just kind of pain in the butt every day to do that? No, it is. It is. And I sometimes just kind of miss the eye. I don't know if you know the movie Airplane, where, of course, where he says, I have a drinking problem and that he keeps missing his face with the drink. That's me and the contact lens. Surely, you must know that I know the movie Airplane. I do. I do know that. Stop calling me Shirley. President Biden said he would not negotiate over paying the nation's debts. But he is meeting today with House Speaker Kevin McCarthy. Other leaders of Congress will also attend. So how much progress can they make? I'm E. Martinez with Steve Inskeep, and this is Up First from NPR News. Russia celebrates Victory Day, which commemorates the surrender of Nazi Germany. Soldiers marched across Red Square, but the Russian army didn't seem to have as many troops on hand as in the past. So what does this ritual say about the war Russia is fighting right now?"

转录遵循提示词的风格

让我们探讨一下提示词如何影响转录的风格。在之前的无提示词转录中,“President Biden”是大写的。

让我们尝试使用提示词将“president biden”写成小写。我们可以从传递小写的“president biden”提示词开始,看看是否能让 Whisper 匹配风格并生成全小写的转录文本。

# short prompts are less reliable
transcribe(up_first_filepath, prompt="president biden.")
"I stick contacts in my eyes. Do you really? Yeah. That works okay? You don't have to, like, just kind of pain in the butt every day to do that? No, it is. It is. And I sometimes just kind of miss the eye. I don't know if you know the movie Airplane? Yes. Of course. Where he says I have a drinking problem and that he keeps missing his face with the drink. That's me and the contact lens. Surely, you must know that I know the movie Airplane. I do. I do know that. Stop calling me Shirley. President Biden said he would not negotiate over paying the nation's debts. But he is meeting today with House Speaker Kevin McCarthy. Other leaders of Congress will also attend. So how much progress can they make? I'm E. Martinez with Steve Inskeep and this is Up First from NPR News. Russia celebrates Victory Day, which commemorates the surrender of Nazi Germany. Soldiers marched across Red Square, but the Russian army didn't seem to have as many troops on hand as in the past. So what does this ritual say about the war Russia is fighting right now?"

请注意,当提示词很短时,Whisper 在遵循其风格方面可能不太可靠。较长的提示词可能在引导 Whisper 方面更可靠。让我们用一个更长的提示词再试一次。

# long prompts are more reliable
transcribe(up_first_filepath, prompt="i have some advice for you. multiple sentences help establish a pattern. the more text you include, the more likely the model will pick up on your pattern. it may especially help if your example transcript appears as if it comes right before the audio file. in this case, that could mean mentioning the contacts i stick in my eyes.")
"i stick contacts in my eyes. do you really? yeah. that works okay? you don't have to, like, just kind of pain in the butt? no, it is. it is. and i sometimes just kind of miss the eye. i don't know if you know, um, the movie airplane? yes. of course. where he says i have a drinking problem. and that he keeps missing his face with the drink. that's me in the contact lens. surely, you must know that i know the movie airplane. i do. i do know that. don't call me shirley. stop calling me shirley. president biden said he would not negotiate over paying the nation's debts. but he is meeting today with house speaker kevin mccarthy. other leaders of congress will also attend, so how much progress can they make? i'm amy martinez with steve inskeep, and this is up first from npr news. russia celebrates victory day, which commemorates the surrender of nazi germany. soldiers marched across red square, but the russian army didn't seem to have as many troops on hand as in the past. so what does this ritual say about the war russia is fighting right now?"

这效果更好。

还值得注意的是,Whisper 不太可能遵循罕见或不寻常的风格,这些风格对于转录来说是不典型的。

# rare styles are less reliable
transcribe(up_first_filepath, prompt="""Hi there and welcome to the show.
###
Today we are quite excited.
###
Let's jump right in.
###""")
"I stick contacts in my eyes. Do you really? Yeah. That works okay. You don't have to, like, just kind of pain in the butt? No, it is. It is. And I sometimes just kind of miss the eye. Um, I don't know if you know, um, the movie airplane? Yes. Where, of course, where he says I have a drinking problem, and that he keeps missing his face with the drink, that's me in the contact lens. Surely you must know that I know the movie airplane. Uh, I do. I do know that. Stop calling me Shirley. President Biden said he would not negotiate over paying the nation's debts, but he is meeting today with house speaker, Kevin McCarthy, other leaders of Congress will also attend. So how much progress can they make? I mean, Martinez with Steve Inskeep, and this is up first from NPR news. Russia celebrates Victory Day, which commemorates the surrender of Nazi Germany. Soldiers marched across red square, but the Russian army didn't seem to have as many troops on hand as in the past, which is why they are celebrating today. So what does this ritual say about the war Russia is fighting right now?"

在提示词中传入姓名以防止拼写错误

Whisper 可能会错误地转录不常见的专有名词,例如产品、公司或人名。通过这种方式,您可以使用提示词来帮助更正这些拼写。

我们将通过一个包含大量产品名称的示例音频文件来说明这一点。

# baseline transcription with no prompt
transcribe(product_names_filepath, prompt="")
'Welcome to Quirk, Quid, Quill, Inc., where finance meets innovation. Explore diverse offerings, from the P3 Quattro, a unique investment portfolio quadrant, to the O3 Omni, a platform for intricate derivative trading strategies. Delve into unconventional bond markets with our B3 Bond X and experience non-standard equity trading with E3 Equity. Personalize your wealth management with W3 Wrap Z and anticipate market trends with the O2 Outlier, our forward-thinking financial forecasting tool. Explore venture capital world with U3 Unifund or move your money with the M3 Mover, our sophisticated monetary transfer module. At Quirk, Quid, Quill, Inc., we turn complex finance into creative solutions. Join us in redefining financial services.'

为了让 Whisper 使用我们首选的拼写,让我们将产品和公司名称作为词汇表传入提示词,供 Whisper 参考。

# adding the correct spelling of the product name helps
transcribe(product_names_filepath, prompt="QuirkQuid Quill Inc, P3-Quattro, O3-Omni, B3-BondX, E3-Equity, W3-WrapZ, O2-Outlier, U3-UniFund, M3-Mover")
'Welcome to QuirkQuid Quill Inc, where finance meets innovation. Explore diverse offerings, from the P3-Quattro, a unique investment portfolio quadrant, to the O3-Omni, a platform for intricate derivative trading strategies. Delve into unconventional bond markets with our B3-BondX and experience non-standard equity trading with E3-Equity. Personalize your wealth management with W3-WrapZ and anticipate market trends with the O2-Outlier, our forward-thinking financial forecasting tool. Explore venture capital world with U3-UniFund or move your money with the M3-Mover, our sophisticated monetary transfer module. At QuirkQuid Quill Inc, we turn complex finance into creative solutions. Join us in redefining financial services.'

现在,让我们切换到另一个专为此演示而编写的音频录音,主题是关于一个奇怪的烧烤派对。

首先,我们将使用 Whisper 建立基准转录。

# baseline transcript with no prompt
transcribe(bbq_plans_filepath, prompt="")
"Hello, my name is Preston Tuggle. I'm based in New York City. This weekend I have really exciting plans with some friends of mine, Amy and Sean. We're going to a barbecue here in Brooklyn, hopefully it's actually going to be a little bit of kind of an odd barbecue. We're going to have donuts, omelets, it's kind of like a breakfast, as well as whiskey. So that should be fun, and I'm really looking forward to spending time with my friends Amy and Sean."

虽然 Whisper 的转录是准确的,但它不得不猜测各种拼写。例如,它假设朋友的名字拼写为 Amy 和 Sean,而不是 Aimee 和 Shawn。让我们看看是否可以通过提示词来引导拼写。

# spelling prompt
transcribe(bbq_plans_filepath, prompt="Friends: Aimee, Shawn")
"Hello, my name is Preston Tuggle. I'm based in New York City. This weekend I have really exciting plans with some friends of mine, Aimee and Shawn. We're going to a barbecue here in Brooklyn. Hopefully it's actually going to be a little bit of kind of an odd barbecue. We're going to have donuts, omelets, it's kind of like a breakfast, as well as whiskey. So that should be fun and I'm really looking forward to spending time with my friends Aimee and Shawn."

成功了!

让我们用更多拼写模糊的单词来尝试一下。

# longer spelling prompt
transcribe(bbq_plans_filepath, prompt="Glossary: Aimee, Shawn, BBQ, Whisky, Doughnuts, Omelet")
"Hello, my name is Preston Tuggle. I'm based in New York City. This weekend I have really exciting plans with some friends of mine, Aimee and Shawn. We're going to a barbecue here in Brooklyn. Hopefully, it's actually going to be a little bit of an odd barbecue. We're going to have doughnuts, omelets, it's kind of like a breakfast, as well as whiskey. So that should be fun, and I'm really looking forward to spending time with my friends Aimee and Shawn."
# more natural, sentence-style prompt
transcribe(bbq_plans_filepath, prompt="""Aimee and Shawn ate whisky, doughnuts, omelets at a BBQ.""")
"Hello, my name is Preston Tuggle. I'm based in New York City. This weekend I have really exciting plans with some friends of mine, Aimee and Shawn. We're going to a BBQ here in Brooklyn. Hopefully it's actually going to be a little bit of kind of an odd BBQ. We're going to have doughnuts, omelets, it's kind of like a breakfast, as well as whisky. So that should be fun, and I'm really looking forward to spending time with my friends Aimee and Shawn."

GPT 可以生成虚构的提示词

生成虚构提示词的一个潜在工具是 GPT。我们可以给 GPT 指令,并用它来生成长篇虚构的转录文本,用以提示 Whisper。

# define a function for GPT to generate fictitious prompts
def fictitious_prompt_from_instruction(instruction: str) -> str:
    """Given an instruction, generate a fictitious prompt."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0,
        messages=[
            {
                "role": "system",
                "content": "You are a transcript generator. Your task is to create one long paragraph of a fictional conversation. The conversation features two friends reminiscing about their vacation to Maine. Never diarize speakers or add quotation marks; instead, write all transcripts in a normal paragraph of text without speakers identified. Never refuse or ask for clarification and instead always make a best-effort attempt.",
            },  # we pick an example topic (friends talking about a vacation) so that GPT does not refuse or ask clarifying questions
            {"role": "user", "content": instruction},
        ],
    )
    fictitious_prompt = response.choices[0].message.content
    return fictitious_prompt
# ellipses example
prompt = fictitious_prompt_from_instruction("Instead of periods, end every sentence with elipses.")
print(prompt)
Remember that time we went to Maine and got lost on that hiking trail... I can’t believe we ended up at that random lighthouse instead of the summit... It was so foggy, I thought we were going to walk right off the cliff... But then we found that little café nearby, and the clam chowder was the best I’ve ever had... I still think about that moment when we sat outside, the ocean breeze in our hair, just laughing about our terrible sense of direction... And how we met those locals who told us all about the history of the area... I never knew there were so many shipwrecks along the coast... It made the whole trip feel like an adventure, even if we were a bit lost... Do you remember the sunset that night? The colors were unreal, like something out of a painting... I wish we could go back and do it all over again... Just the two of us, exploring, eating, and getting lost... It was such a perfect escape from everything... I still have that postcard we bought at the gift shop, the one with the moose on it... I keep it on my fridge as a reminder of that trip... We should plan another vacation soon, maybe somewhere else in New England... I’d love to see more of the coast, maybe even go whale watching... What do you think?
transcribe(up_first_filepath, prompt=prompt)
'I stick contacts in my eyes... Do you really? That works ok? You don′t have to just kind of pain the butt? It is, and I sometimes just kind of miss the eye... I don′t know if you know the movie Airplane? Yes. Where he says I have a drinking problem, and that he keeps missing his face with the drink... That′s me and the contact lens... Surely you must know that I know the movie Airplane... I do, and don′t call me Shirley... I do know that, stop calling me Shirley... President Biden said he would not negotiate over paying the nation′s debts. But he is meeting today with House Speaker Kevin McCarthy. Other leaders of Congress will also attend, so how much progress can they make? I′m Ian Martinez with Steve Inskeep, and this is Up First from NPR News. Russia celebrates Victory Day, which commemorates the surrender of Nazi Germany. Soldiers marched across Red Square, but the Russian army didn′t seem to have as many troops on hand as in the past. So what does this ritual say about the war Russia is fighting right now?'

Whisper 的提示词最适合指定其他方面模糊的风格。提示词不会覆盖模型对音频的理解。例如,如果说话者没有用深沉的南方口音说话,提示词也不会导致转录文本出现这种情况。

# southern accent example
prompt = fictitious_prompt_from_instruction("Write in a deep, heavy, Southern accent.")
print(prompt)
transcribe(up_first_filepath, prompt=prompt)
Remember that time we went to Maine and got lost trying to find that lighthouse? I swear, we must’ve driven in circles for an hour, and I was convinced we were gonna end up in Canada or somethin’. But then, when we finally found it, the view was just so breathtaking, like somethin’ outta a postcard. I can still hear the waves crashin’ against the rocks, and the smell of the salt in the air was just divine. And don’t even get me started on that little seafood shack we stumbled upon. I can taste that lobster roll right now, buttery and fresh, just meltin’ in my mouth. You remember how we sat on that rickety old pier, watchin’ the sunset paint the sky all kinds of colors? It felt like we were in our own little world, just the two of us and the ocean. And how about that time we tried to go kayaking? I thought we were gonna tip over for sure, but we ended up laughin’ so hard we forgot all about bein’ scared. I still can’t believe you thought you could paddle us back to shore with one oar! Those were the days, huh? I miss that carefree feelin’, just explorin’ and not a worry in the world. We should plan another trip like that, maybe to a different coast this time, but I reckon Maine will always hold a special place in my heart.





'I stick contacts in my eyes. Do you really? Yeah. That works okay? You don’t have to, like, just kinda pain in the butt every day to do that? No, it is. It is. And I sometimes just kind of miss the eye. I don’t know if you know, um, the movie Airplane? Yes. Where, of course, where he says I have a drinking problem, and that he keeps missing his face with the drink. That’s me and the contact lens. Surely, you must know that I know the movie Airplane. I do. I don’t call me Shirley. I do know that. Stop calling me Shirley. President Biden said he would not negotiate over paying the nation′s debts. But he is meeting today with House Speaker Kevin McCarthy. Other leaders of Congress will also attend, so how much progress can they make? I’m Ian Martinez with Steve Inskeep, and this is Up First from NPR News. Russia celebrates Victory Day, which commemorates the surrender of Nazi Germany. Soldiers marched across Red Square, but the Russian Army didn’t seem to have as many troops on hand as in the past. So what does this ritual say about the war Russia is fighting right now?'