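We are addressing transcription accuracy, particularly for company names and product references. Our solution uses a two-pronged strategy that combines Whisper's prompt parameter with GPT-4's post-processing capabilities.
There are two ways to correct inaccuracies:
- We feed a list of correct spellings directly into Whisper's prompt parameter to guide the initial transcription.
- We use GPT-4 to fix misspellings after transcription, supplying the same list of correct spellings in its prompt.
These strategies aim to ensure accurate transcription of unfamiliar proper nouns.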
Setup
First, let's:
- Import the OpenAI Python library (if you don't have it, install it with pip install openai)
- Download the example audio file
# imports
from openai import OpenAI # for making OpenAI API calls
import urllib.request # for downloading example audio files
import os # for accessing environment variables
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY", "<your OpenAI API key if not set as env var>"))
# set download paths
ZyntriQix_remote_filepath = "https://cdn.openai.com/API/examples/data/ZyntriQix.wav"
# set local save locations
ZyntriQix_filepath = "data/ZyntriQix.wav"
# make sure the local data directory exists, then download the example audio file
os.makedirs(os.path.dirname(ZyntriQix_filepath), exist_ok=True)
urllib.request.urlretrieve(ZyntriQix_remote_filepath, ZyntriQix_filepath)
('data/ZyntriQix.wav', <http.client.HTTPMessage object at 0x10559a910>)
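Setting our baseline with a fictitious audio recording
Our reference point is a monologue that ChatGPT generated from prompts given by the author. The author then recorded the voiceover. In other words, the author both guided ChatGPT's output through prompts and brought it to life by speaking it.
Our fictitious company, ZyntriQix, offers a range of tech products. These include Digique Plus, CynapseFive, VortiQore V8, EchoNix Array, OrbitalLink Seven, and DigiFractal Matrix. We also spearhead several initiatives, such as PULSE, RAPT, B.R.I.C.K., Q.U.A.R.T.Z., and F.L.I.N.T.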
# define a wrapper function for seeing how prompts affect transcriptions
def transcribe(prompt: str, audio_filepath: str) -> str:
    """Given a prompt, transcribe the audio file."""
    # open the file in a context manager so the handle is closed after the request
    with open(audio_filepath, "rb") as audio_file:
        transcript = client.audio.transcriptions.create(
            file=audio_file,
            model="whisper-1",
            prompt=prompt,
        )
    return transcript.text
# baseline transcription with no prompt
transcribe(prompt="", audio_filepath=ZyntriQix_filepath)
"Have you heard of ZentricX? This tech giant boasts products like Digi-Q+, Synapse 5, VortiCore V8, Echo Nix Array, and not to forget the latest Orbital Link 7 and Digifractal Matrix. Their innovation arsenal also includes the Pulse framework, Wrapped system, they've developed a brick infrastructure court system, and launched the Flint initiative, all highlighting their commitment to relentless innovation. ZentricX, in just 30 years, has soared from a startup to a tech titan, serving us tech marvels alongside a stimulating linguistic challenge. Quite an adventure, wouldn't you agree?"
Whisper mistranscribed our company name and several product names, and it miscapitalized our acronyms. Let's pass the list of correct names in the prompt.
# add the correct spelling names to the prompt
transcribe(
    prompt="ZyntriQix, Digique Plus, CynapseFive, VortiQore V8, EchoNix Array, OrbitalLink Seven, DigiFractal Matrix, PULSE, RAPT, B.R.I.C.K., Q.U.A.R.T.Z., F.L.I.N.T.",
    audio_filepath=ZyntriQix_filepath,
)
"Have you heard of ZyntriQix? This tech giant boasts products like Digique Plus, CynapseFive, VortiQore V8, EchoNix Array, and not to forget the latest OrbitalLink Seven and DigiFractal Matrix. Their innovation arsenal also includes the PULSE framework, RAPT system. They've developed a B.R.I.C.K. infrastructure, Q.U.A.R.T.Z. system, and launched the F.L.I.N.T. initiative, all highlighting their commitment to relentless innovation. ZyntriQix in just 30 years has soared from a startup to a tech titan, serving us tech marvels alongside a stimulating linguistic challenge. Quite an adventure, wouldn't you agree?"
When we pass the full product list, some product names are transcribed correctly while others are still misspelled.
# add a full product list to the prompt
transcribe(
    prompt="ZyntriQix, Digique Plus, CynapseFive, VortiQore V8, EchoNix Array, OrbitalLink Seven, DigiFractal Matrix, PULSE, RAPT, AstroPixel Array, QuantumFlare Five, CyberPulse Six, VortexDrive Matrix, PhotonLink Ten, TriCircuit Array, PentaSync Seven, UltraWave Eight, QuantumVertex Nine, HyperHelix X, DigiSpiral Z, PentaQuark Eleven, TetraCube Twelve, GigaPhase Thirteen, EchoNeuron Fourteen, FusionPulse V15, MetaQuark Sixteen, InfiniCircuit Seventeen, TeraPulse Eighteen, ExoMatrix Nineteen, OrbiSync Twenty, QuantumHelix TwentyOne, NanoPhase TwentyTwo, TeraFractal TwentyThree, PentaHelix TwentyFour, ExoCircuit TwentyFive, HyperQuark TwentySix, B.R.I.C.K., Q.U.A.R.T.Z., F.L.I.N.T.",
    audio_filepath=ZyntriQix_filepath,
)
"Have you heard of ZentricX? This tech giant boasts products like DigiCube Plus, Synapse 5, VortiCore V8, EchoNix Array, and not to forget the latest Orbital Link 7 and Digifractal Matrix. Their innovation arsenal also includes the PULSE framework, RAPT system. They've developed a brick infrastructure court system and launched the F.L.I.N.T. initiative, all highlighting their commitment to relentless innovation. ZentricX in just 30 years has soared from a startup to a tech titan, serving us tech marvels alongside a stimulating linguistic challenge. Quite an adventure, wouldn't you agree?"
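You can use GPT-4 to fix spelling mistakes
Leveraging GPT-4 is especially useful when the speech content is unknown beforehand and we have a list of product names on hand.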
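Compared with relying solely on Whisper's prompt parameter, which only considers the final 224 tokens of the prompt, post-processing with GPT-4 is notably more scalable. GPT-4 lets us work with much longer lists of correct spellings, making it a more robust approach for large product lists.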
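Because Whisper's prompt budget is small, a long list may need trimming before it is passed to transcribe. The following is a minimal sketch, not part of the original notebook, that keeps as many terms as fit in a given budget, in priority order. The helper names fit_terms_to_budget and rough_token_count are hypothetical, and the 4-characters-per-token estimate is a crude assumption; unusual product names may well use more tokens, so a real tokenizer should replace it where accuracy matters.

```python
from typing import Callable, List


def rough_token_count(text: str) -> int:
    """Crude stand-in for a tokenizer: assume roughly 4 characters per token."""
    return (len(text) + 3) // 4


def fit_terms_to_budget(
    terms: List[str],
    max_tokens: int,
    count_tokens: Callable[[str], int] = rough_token_count,
) -> str:
    """Join terms with ', ' in the given order, stopping before the budget is exceeded."""
    kept: List[str] = []
    for term in terms:
        candidate = ", ".join(kept + [term])
        if count_tokens(candidate) > max_tokens:
            # Stop at the first term that does not fit, so priority order is preserved.
            break
        kept.append(term)
    return ", ".join(kept)


# Tiny budget chosen only to make the trimming visible.
prioritized = ["ZyntriQix", "Digique Plus", "CynapseFive", "VortiQore V8"]
print(fit_terms_to_budget(prioritized, max_tokens=6))
# prints: ZyntriQix, Digique Plus
```

The terms most likely to appear in the audio should come first in the list, since anything past the budget is dropped.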
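However, this post-processing technique has its own limitations. It is constrained by the context window of the chosen model, which can be a challenge when dealing with a large number of unique terms. For example, a company with thousands of SKUs may find that GPT-4's context window is too small for its needs and may have to explore alternative solutions.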
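One possible alternative for very large vocabularies, offered here only as a sketch and not something the original notebook implements, is to shortlist the terms that loosely resemble something in the raw transcript and send only that shortlist to GPT-4. The example below uses difflib from the Python standard library; the function name shortlist_terms, the 0.6 cutoff, and the 3-word window are all illustrative assumptions that would need tuning on real data.

```python
import difflib
import re
from typing import List


def shortlist_terms(
    transcript: str,
    vocabulary: List[str],
    cutoff: float = 0.6,
    max_ngram: int = 3,
) -> List[str]:
    """Return vocabulary terms that loosely resemble some word n-gram in the transcript."""
    words = re.findall(r"[A-Za-z0-9+\-]+", transcript)
    # Collect every run of 1..max_ngram consecutive words as a lowercase candidate.
    candidates = set()
    for n in range(1, max_ngram + 1):
        for i in range(len(words) - n + 1):
            candidates.add(" ".join(words[i : i + n]).lower())
    candidate_list = sorted(candidates)

    hits = []
    for term in vocabulary:
        # Keep the term if at least one candidate is similar enough to it.
        if difflib.get_close_matches(term.lower(), candidate_list, n=1, cutoff=cutoff):
            hits.append(term)
    return hits


raw = "Have you heard of ZentricX? They make VortiCore V8."
vocab = ["ZyntriQix", "VortiQore V8", "QuantumHelix TwentyOne"]
print(shortlist_terms(raw, vocab))
# prints: ['ZyntriQix', 'VortiQore V8']
```

Character-level similarity will miss names that Whisper mangles beyond recognition, so a shortlist like this trades some recall for a much smaller prompt.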
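Interestingly, GPT-4 post-processing seems to be more reliable than using Whisper alone. Leveraging the product list in this way improves the reliability of our results. However, the added reliability comes at a price: this approach increases cost and can add latency.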
# define a wrapper function that transcribes the audio, then asks GPT-4 to correct the spelling
def transcribe_with_spellcheck(system_message: str, audio_filepath: str) -> str:
    """Transcribe the audio file with no Whisper prompt, then have GPT-4 correct the transcript."""
    completion = client.chat.completions.create(
        model="gpt-4",
        temperature=0,
        messages=[
            {"role": "system", "content": system_message},
            {
                "role": "user",
                "content": transcribe(prompt="", audio_filepath=audio_filepath),
            },
        ],
    )
    return completion.choices[0].message.content
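Now, let's feed the original product list to GPT-4 and evaluate its performance. This tests whether the model can spell our proprietary product names correctly without knowing in advance which terms will appear in the transcript. In our experiment, GPT-4 spelled our product names correctly, which suggests it can be a reliable tool for ensuring transcription accuracy, although the Q.U.A.R.T.Z. system still appears as "court system" in the output.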
system_prompt = "You are a helpful assistant for the company ZyntriQix. Your task is to correct any spelling discrepancies in the transcribed text. Make sure that the names of the following products are spelled correctly: ZyntriQix, Digique Plus, CynapseFive, VortiQore V8, EchoNix Array, OrbitalLink Seven, DigiFractal Matrix, PULSE, RAPT, B.R.I.C.K., Q.U.A.R.T.Z., F.L.I.N.T."
new_text = transcribe_with_spellcheck(system_prompt, audio_filepath=ZyntriQix_filepath)
print(new_text)
Have you heard of ZyntriQix? This tech giant boasts products like Digique Plus, CynapseFive, VortiQore V8, EchoNix Array, and not to forget the latest OrbitalLink Seven and DigiFractal Matrix. Their innovation arsenal also includes the PULSE framework, RAPT system, they've developed a B.R.I.C.K. infrastructure court system, and launched the F.L.I.N.T. initiative, all highlighting their commitment to relentless innovation. ZyntriQix, in just 30 years, has soared from a startup to a tech titan, serving us tech marvels alongside a stimulating linguistic challenge. Quite an adventure, wouldn't you agree?
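The system prompt above is a hard-coded string. When the product list lives in a file or database, it may be easier to assemble the prompt from a Python list. The sketch below shows one way to do that; build_spellcheck_prompt is a hypothetical helper rather than part of the original notebook, and it reuses the wording of the prompt above.

```python
from typing import List


def build_spellcheck_prompt(company: str, terms: List[str]) -> str:
    """Assemble a spell-check system prompt from a company name and a list of terms."""
    # Drop duplicate terms while keeping the original order of the list.
    unique_terms = list(dict.fromkeys(terms))
    return (
        f"You are a helpful assistant for the company {company}. "
        "Your task is to correct any spelling discrepancies in the transcribed text. "
        "Make sure that the names of the following products are spelled correctly: "
        + ", ".join(unique_terms)
    )


product_names = ["ZyntriQix", "Digique Plus", "CynapseFive", "PULSE", "RAPT", "PULSE"]
print(build_spellcheck_prompt("ZyntriQix", product_names))
```

The resulting string can be passed as system_message to transcribe_with_spellcheck in place of the literal prompt.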
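In this case, we supply a comprehensive product list that includes all the previously used spellings along with additional new names. This simulates a real-life scenario in which we have a large SKU list and do not know exactly which terms will appear in the transcript. Feeding this extensive list of product names into the system yields correctly spelled product names in the output.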
system_prompt = "You are a helpful assistant for the company ZyntriQix. Your task is to correct any spelling discrepancies in the transcribed text. Make sure that the names of the following products are spelled correctly: ZyntriQix, Digique Plus, CynapseFive, VortiQore V8, EchoNix Array, OrbitalLink Seven, DigiFractal Matrix, PULSE, RAPT, AstroPixel Array, QuantumFlare Five, CyberPulse Six, VortexDrive Matrix, PhotonLink Ten, TriCircuit Array, PentaSync Seven, UltraWave Eight, QuantumVertex Nine, HyperHelix X, DigiSpiral Z, PentaQuark Eleven, TetraCube Twelve, GigaPhase Thirteen, EchoNeuron Fourteen, FusionPulse V15, MetaQuark Sixteen, InfiniCircuit Seventeen, TeraPulse Eighteen, ExoMatrix Nineteen, OrbiSync Twenty, QuantumHelix TwentyOne, NanoPhase TwentyTwo, TeraFractal TwentyThree, PentaHelix TwentyFour, ExoCircuit TwentyFive, HyperQuark TwentySix, GigaLink TwentySeven, FusionMatrix TwentyEight, InfiniFractal TwentyNine, MetaSync Thirty, B.R.I.C.K., Q.U.A.R.T.Z., F.L.I.N.T. Only add necessary punctuation such as periods, commas, and capitalization, and use only the context provided."
new_text = transcribe_with_spellcheck(system_prompt, audio_filepath=ZyntriQix_filepath)
print(new_text)
Have you heard of ZyntriQix? This tech giant boasts products like Digique Plus, CynapseFive, VortiQore V8, EchoNix Array, and not to forget the latest OrbitalLink Seven and DigiFractal Matrix. Their innovation arsenal also includes the PULSE framework, RAPT system, they've developed a B.R.I.C.K. infrastructure court system, and launched the F.L.I.N.T. initiative, all highlighting their commitment to relentless innovation. ZyntriQix, in just 30 years, has soared from a startup to a tech titan, serving us tech marvels alongside a stimulating linguistic challenge. Quite an adventure, wouldn't you agree?
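Here we use GPT-4 as a spell checker with the same list of correct spellings used in the previous prompt. This time we ask it to first list and count the misspelled words and then substitute the correct ones.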
system_prompt = "You are a helpful assistant for the company ZyntriQix. Your first task is to list the words that are not spelled correctly according to the list provided to you and to tell me the number of misspelled words. Your next task is to insert those correct words in place of the misspelled ones. List: ZyntriQix, Digique Plus, CynapseFive, VortiQore V8, EchoNix Array, OrbitalLink Seven, DigiFractal Matrix, PULSE, RAPT, AstroPixel Array, QuantumFlare Five, CyberPulse Six, VortexDrive Matrix, PhotonLink Ten, TriCircuit Array, PentaSync Seven, UltraWave Eight, QuantumVertex Nine, HyperHelix X, DigiSpiral Z, PentaQuark Eleven, TetraCube Twelve, GigaPhase Thirteen, EchoNeuron Fourteen, FusionPulse V15, MetaQuark Sixteen, InfiniCircuit Seventeen, TeraPulse Eighteen, ExoMatrix Nineteen, OrbiSync Twenty, QuantumHelix TwentyOne, NanoPhase TwentyTwo, TeraFractal TwentyThree, PentaHelix TwentyFour, ExoCircuit TwentyFive, HyperQuark TwentySix, GigaLink TwentySeven, FusionMatrix TwentyEight, InfiniFractal TwentyNine, MetaSync Thirty, B.R.I.C.K., Q.U.A.R.T.Z., F.L.I.N.T."
new_text = transcribe_with_spellcheck(system_prompt, audio_filepath=ZyntriQix_filepath)
print(new_text)
The misspelled words are: ZentricX, Digi-Q+, Synapse 5, VortiCore V8, Echo Nix Array, Orbital Link 7, Digifractal Matrix, Pulse, Wrapped, brick, Flint, and 30. The total number of misspelled words is 12.
The corrected paragraph is:
Have you heard of ZyntriQix? This tech giant boasts products like Digique Plus, CynapseFive, VortiQore V8, EchoNix Array, and not to forget the latest OrbitalLink Seven and DigiFractal Matrix. Their innovation arsenal also includes the PULSE framework, RAPT system, they've developed a B.R.I.C.K. infrastructure court system, and launched the F.L.I.N.T. initiative, all highlighting their commitment to relentless innovation. ZyntriQix, in just MetaSync Thirty years, has soared from a startup to a tech titan, serving us tech marvels alongside a stimulating linguistic challenge. Quite an adventure, wouldn't you agree?
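In this last output the model also flagged "30" as misspelled and replaced it with "MetaSync Thirty", an over-correction. A lightweight check can surface edits like this for review. The sketch below is one possible approach using difflib from the standard library, not something the original notebook does: it aligns the raw and corrected transcripts word by word and reports replacements where the new text looks nothing like the old. The function name find_suspicious_edits and the 0.5 similarity threshold are illustrative assumptions.

```python
import difflib
from typing import List, Tuple


def find_suspicious_edits(
    raw: str, corrected: str, min_similarity: float = 0.5
) -> List[Tuple[str, str]]:
    """Return (old, new) pairs where the corrected text barely resembles what it replaced."""
    raw_words, new_words = raw.split(), corrected.split()
    matcher = difflib.SequenceMatcher(None, raw_words, new_words, autojunk=False)
    suspicious = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        old = " ".join(raw_words[i1:i2])
        new = " ".join(new_words[j1:j2])
        # A genuine spelling fix should still look somewhat like the original text.
        similarity = difflib.SequenceMatcher(None, old.lower(), new.lower()).ratio()
        if similarity < min_similarity:
            suspicious.append((old, new))
    return suspicious


raw = "ZentricX, in just 30 years, has soared from a startup to a tech titan"
fixed = "ZyntriQix, in just MetaSync Thirty years, has soared from a startup to a tech titan"
print(find_suspicious_edits(raw, fixed))
# prints: [('30', 'MetaSync Thirty')]
```

The fix from "ZentricX," to "ZyntriQix," passes because the two strings are similar, while the replacement of "30" is reported. Flagged pairs are only candidates for human review, not proof of an error.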