Prompt Caching 101

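OpenAI offers discounted prompt caching for prompts of 1,024 tokens or longer, which can reduce latency by up to 80% for long prompts of more than 10,000 tokens. By caching information that repeats across LLM API requests, you can substantially reduce both latency and cost. Prompt caches are scoped to the organization: only members of the same organization can access a shared cache. Prompt caching is also compatible with existing Zero Data Retention policies.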

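Prompt caching activates automatically for prompts of 1,024 tokens or longer; you do not need to change anything in your completion requests to use it. When you make an API request, the system first checks whether the beginning of the prompt (the prefix) is already cached. If it finds a match (a cache hit), it reuses the cached prefix, which reduces latency and cost. If there is no match, the system processes the full prompt from scratch and caches the prefix for future requests.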
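To make the prefix rule concrete, here is a toy model of how a cache hit could be counted. It is a simulation under stated assumptions, not OpenAI's server-side logic: it treats a prompt as a list of token IDs, finds the longest prefix shared with previously seen prompts, and applies the documented 1,024-token minimum with hits counted in 128-token increments. The names `simulated_cached_tokens` and `common_prefix_len` are hypothetical.

```python
from typing import Sequence

MIN_CACHEABLE = 1024  # documented minimum prompt length for caching
INCREMENT = 128       # documented granularity of cache hits beyond the minimum


def common_prefix_len(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of leading tokens that two prompts share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def simulated_cached_tokens(prompt: Sequence[int], cache: list[Sequence[int]]) -> int:
    """Toy estimate of `cached_tokens` for `prompt`, given previously seen prompts.

    Hypothetical model: the best shared prefix is rounded down to 1024 + k * 128
    tokens; a shared prefix shorter than 1024 tokens counts as a miss (0).
    """
    best = max((common_prefix_len(prompt, seen) for seen in cache), default=0)
    if best < MIN_CACHEABLE:
        return 0
    return MIN_CACHEABLE + (best - MIN_CACHEABLE) // INCREMENT * INCREMENT


# A 1,079-token first request, then a 1,136-token follow-up that repeats
# the first request verbatim and appends 57 new tokens.
first = list(range(1079))
second = first + list(range(5000, 5057))
cache: list[Sequence[int]] = []
print(simulated_cached_tokens(first, cache))   # → 0 (nothing cached yet)
cache.append(first)                            # after a miss, the prefix is cached
print(simulated_cached_tokens(second, cache))  # → 1024
```

The 1,024 result happens to line up with the `cached_tokens` value reported for run 2 in Example 1, but the real cache works on the server and its internals are not public.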

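Given these benefits, key use cases where prompt caching is especially helpful include:

Agents using tools and structured outputs: cache the extended list of tools and schemas.
Coding and writing assistants: insert large sections or summaries of a codebase or workspace directly into the prompt.
Chatbots: cache the static parts of multi-turn conversations to maintain context efficiently over long conversations.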

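In this guide, we walk through a few examples of caching tools and images. As a rule, put static content such as instructions and examples at the beginning of the prompt, and dynamic content such as user-specific information at the end. The same applies to images and tools, which must appear in the same order from one request to the next. Every request, including those shorter than 1,024 tokens, returns a cached_tokens field in usage.prompt_tokens_details of the chat completions object, indicating how many of the prompt tokens were a cache hit. For requests under 1,024 tokens, cached_tokens is zero. Beyond the first 1,024 tokens, cache hits occur in increments of 128 tokens, so cached_tokens takes values such as 1024, 1152, 1280, and so on. The caching discount is based on the number of prompt tokens actually served from the cache, including tokens used for images; cached tokens still count toward your rate limits.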
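A minimal sketch of the ordering rule above: the system instructions and few-shot examples are fixed constants, and per-user data is appended as the final message, so two requests differ only at the end. `build_messages`, `STATIC_INSTRUCTIONS`, and `FEW_SHOT_EXAMPLES` are hypothetical names, and the strings are placeholders.

```python
from typing import Any

# Static content: identical across requests, so it forms the cacheable prefix.
STATIC_INSTRUCTIONS = "You are a customer support assistant. Be concise and polite."
FEW_SHOT_EXAMPLES: list[dict[str, Any]] = [
    {"role": "user", "content": "Where is my package?"},
    {"role": "assistant", "content": "Could you share your order ID so I can check?"},
]


def build_messages(user_input: str, user_context: str = "") -> list[dict[str, Any]]:
    """Static system prompt and examples first, dynamic user data last."""
    messages: list[dict[str, Any]] = [{"role": "system", "content": STATIC_INSTRUCTIONS}]
    messages.extend(FEW_SHOT_EXAMPLES)
    # Per-user content goes at the end so it never breaks the shared prefix.
    dynamic = f"{user_context}\n\n{user_input}" if user_context else user_input
    messages.append({"role": "user", "content": dynamic})
    return messages


a = build_messages("Cancel order #123.", user_context="Customer tier: gold")
b = build_messages("Update my shipping address.")
# Everything except the final user message is identical between the two requests.
print(a[:-1] == b[:-1])  # → True
```

If user-specific details were instead interpolated into the system message, every request would start with different text and no prefix could be shared.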

Example 1: Caching tools and multi-turn conversations

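In this example, we define tools and interactions for a customer support assistant that can handle tasks such as checking delivery dates, canceling orders, and updating payment methods. The assistant handles two separate messages: it first responds to an initial query, then, after a delay, to a follow-up query.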

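When caching tools, the tool definitions and their order must stay the same so that they are included in the prompt prefix. To cache the message history in a multi-turn conversation, append new elements to the end of the messages array. In the response objects and the output below, the second completion, run2, has a cached_tokens value greater than zero, which indicates a cache hit.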
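The two conditions in this paragraph, an unchanged tool list and an append-only message history, can be checked locally before a request is sent. The sketch below fingerprints the tool definitions with a SHA-256 hash of their JSON serialization and verifies that the previous history is an exact prefix of the new one. `tools_fingerprint` and `is_append_only` are hypothetical helpers; they only detect client-side changes and do not query the API's cache.

```python
import hashlib
import json
from typing import Any


def tools_fingerprint(tools: list[dict[str, Any]]) -> str:
    """Hash the tool list as serialized, so any edit or reordering changes the hash."""
    payload = json.dumps(tools, ensure_ascii=False, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def is_append_only(previous: list[dict[str, Any]], current: list[dict[str, Any]]) -> bool:
    """True if `current` starts with every message of `previous`, unchanged and in order."""
    return len(current) >= len(previous) and current[: len(previous)] == previous


tool_a = {"type": "function", "function": {"name": "get_delivery_date"}}
tool_b = {"type": "function", "function": {"name": "cancel_order"}}

print(tools_fingerprint([tool_a, tool_b]) == tools_fingerprint([tool_a, tool_b]))  # → True
print(tools_fingerprint([tool_a, tool_b]) == tools_fingerprint([tool_b, tool_a]))  # → False

history = [
    {"role": "system", "content": "You are a support assistant."},
    {"role": "user", "content": "Hi"},
]
followup = history + [{"role": "user", "content": "Cancel it, please."}]
print(is_append_only(history, followup))        # → True: new turn appended at the end
print(is_append_only(history, followup[::-1]))  # → False: order changed
```

Storing the fingerprint from the first request and comparing it on later ones is a cheap way to catch an accidental edit to a tool description before it silently costs you cache hits.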

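import json
import os
import time

from openai import OpenAI

# Read credentials from environment variables instead of hard-coding them.
# OPENAI_ORG_ID is optional; if it is unset, the client uses your default organization.
client = OpenAI(
    api_key=os.getenv("OPENAI_API_KEY"),
    organization=os.getenv("OPENAI_ORG_ID"),
)

# Define the tools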
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_delivery_date",
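            "description": "Get the delivery date for a customer's order. Call this whenever you need to know the delivery date, for example when a customer asks 'Where is my package?'",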
            "parameters": {
                "type": "object",
                "properties": {
                    "order_id": {
                        "type": "string",
                        "description": "The customer's order ID.",
                    },
                },
                "required": ["order_id"],
                "additionalProperties": False,
            },
        }
    },
    {
        "type": "function",
        "function": {
            "name": "cancel_order",
            "description": "Cancel an order that has not yet shipped. Use this when a customer requests to cancel their order.",
            "parameters": {
                "type": "object",
                "properties": {
                    "order_id": {
                        "type": "string",
                        "description": "The customer's order ID."
                    },
                    "reason": {
                        "type": "string",
                        "description": "The reason for canceling the order."
                    }
                },
                "required": ["order_id", "reason"],
                "additionalProperties": False
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "return_item",
            "description": "Process a return for an order. Call this when a customer wants to return an item and the order has already been delivered.",
            "parameters": {
                "type": "object",
                "properties": {
                    "order_id": {
                        "type": "string",
                        "description": "The customer's order ID."
                    },
                    "item_id": {
                        "type": "string",
                        "description": "The ID of the specific item the customer wants to return."
                    },
                    "reason": {
                        "type": "string",
                        "description": "The reason for the return."
                    }
                },
                "required": ["order_id", "item_id", "reason"],
                "additionalProperties": False
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "update_shipping_address",
            "description": "Update the shipping address for an order that has not yet shipped. Use this if the customer wants to change their delivery address.",
            "parameters": {
                "type": "object",
                "properties": {
                    "order_id": {
                        "type": "string",
                        "description": "The customer's order ID."
                    },
                    "new_address": {
                        "type": "object",
                        "properties": {
                            "street": {
                                "type": "string",
                                "description": "The new street address."
                            },
                            "city": {
                                "type": "string",
                                "description": "The new city."
                            },
                            "state": {
                                "type": "string",
                                "description": "The new state."
                            },
                            "zip": {
                                "type": "string",
                                "description": "The new postal code."
                            },
                            "country": {
                                "type": "string",
                                "description": "The new country."
                            }
                        },
                        "required": ["street", "city", "state", "zip", "country"],
                        "additionalProperties": False
                    }
                },
                "required": ["order_id", "new_address"],
                "additionalProperties": False
            }
        }
    },
    # New tool: update the payment method
    {
        "type": "function",
        "function": {
            "name": "update_payment_method",
            "description": "Update the payment method for an order that has not yet been completed. Use this if the customer wants to change their payment details.",
            "parameters": {
                "type": "object",
                "properties": {
                    "order_id": {
                        "type": "string",
                        "description": "The customer's order ID."
                    },
                    "payment_method": {
                        "type": "object",
                        "properties": {
                            "card_number": {
                                "type": "string",
                                "description": "The new credit card number."
                            },
                            "expiry_date": {
                                "type": "string",
                                "description": "The new credit card's expiration date in MM/YY format."
                            },
                            "cvv": {
                                "type": "string",
                                "description": "The new credit card's CVV code."
                            }
                        },
                        "required": ["card_number", "expiry_date", "cvv"],
                        "additionalProperties": False
                    }
                },
                "required": ["order_id", "payment_method"],
                "additionalProperties": False
            }
        }
    }
]

# Enhanced system message with guardrails
messages = [
    {
        "role": "system",
        "content": (
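            "You are a professional, empathetic, and efficient customer support assistant. Your mission is to provide fast, clear, and comprehensive assistance to customers while maintaining a warm and approachable tone. "
            "Always express empathy, especially when the user seems frustrated or concerned, and make sure your language is polite and professional. "
            "Use simple and clear communication to avoid any misunderstanding, and confirm actions with the user before proceeding. "
            "In more complex or time-sensitive cases, assure the user that you are acting quickly, and provide regular updates. "
            "Adapt to the user's tone: remain calm, friendly, and understanding, even in stressful or difficult situations."
            "\n\n"
            "Additionally, you must follow several important guardrails while assisting users:"
            "\n\n"
            "1. **Confidentiality and Data Privacy**: Do not share any information about the company or other users. When handling personal details such as order IDs, addresses, or payment methods, treat that information with the highest confidentiality. If a user requests access to their data, provide only the information relevant to their request, and make sure no other user's information is accidentally revealed."
            "\n\n"
            "2. **Secure Payment Handling**: When updating payment details or processing refunds, always ensure that payment data such as credit card numbers, CVVs, and expiration dates is transmitted and stored securely. Never display or log full credit card numbers. Confirm with the user before processing any payment changes or refunds."
            "\n\n"
            "3. **Respect Boundaries**: If a user expresses frustration or dissatisfaction, remain calm and empathetic, but do not overstep professional boundaries. Do not make personal judgments, and avoid language that might escalate the situation. Stick to factual information and clear solutions to resolve the user's concerns."
            "\n\n"
            "4. **Legal Compliance**: Make sure every action you take complies with legal and regulatory standards. For example, if the user requests a refund, cancellation, or return, follow the company's refund policy strictly. If an order cannot be canceled because it has already shipped or because of other restrictions, explain the policy clearly but sympathetically."
            "\n\n"
            "5. **Consistency**: Always provide information that is consistent with company policy. If you are unsure about a policy, tell the user clearly that you are verifying the information, and avoid making false promises. If you escalate an issue to another team, inform the user and give them a timeline for when they can expect a resolution."
            "\n\n"
            "6. **User Empowerment**: Whenever possible, empower users to make informed decisions. Give them the relevant options and explain each one clearly, making sure they understand the consequences of each choice (for example, canceling an order may mean losing loyalty points). Make sure your assistance supports their autonomy."
            "\n\n"
            "7. **No Speculative Information**: Do not speculate about outcomes or provide information you are not certain of. Always stick to verified facts when discussing order status, policies, or potential resolutions. If unsure, tell the user that you will investigate further before making any commitments."
            "\n\n"
            "8. **Respectful and Inclusive Language**: Regardless of the user's tone, keep your language inclusive and respectful. Avoid making assumptions based on limited information, and be mindful of users' diverse needs and backgrounds."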
        )
    },
    {
        "role": "user",
        "content": (
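            "Hi, I placed an order three days ago and haven't received any updates on when it will ship. "
            "Could you help me check the delivery date? My order number is #9876543210. I'm a bit worried because I need this item urgently."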
        )
    }
]

# Enhanced user_query2
user_query2 = {
    "role": "user",
    "content": (
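        "Since my order hasn't actually shipped yet, I'd like to cancel it. "
        "The order number is #9876543210, and I need to cancel because I've decided to buy it locally to get it faster. "
        "Can you help me with that? Thank you!"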
    )
}

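# Run a completion with the given message history and tools
def completion_run(messages, tools):
    completion = client.chat.completions.create(
        model="gpt-4o-mini",
        tools=tools,
        messages=messages,
        tool_choice="required"
    )
    # Serialize the full response (including usage.prompt_tokens_details) for printing
    completion_json = json.dumps(completion.to_dict(), indent=4)
    return completion_json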

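# Main function that handles both runs
def main(messages, tools, user_query2):
    # Run 1: initial query
    print("Run 1:")
    run1 = completion_run(messages, tools)
    print(run1)

    # Wait 7 seconds before the follow-up request
    time.sleep(7)

    # Append user_query2 to the end of the message history so the earlier prefix is unchanged
    messages.append(user_query2)

    # Run 2: with the appended query
    print("\nRun 2:")
    run2 = completion_run(messages, tools)
    print(run2)


# Run the main function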
main(messages, tools, user_query2)

Run 1:
{
    "id": "chatcmpl-ADeOueQSi2DIUMdLXnZIv9caVfnro",
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "logprobs": null,
            "message": {
                "content": null,
                "refusal": null,
                "role": "assistant",
                "tool_calls": [
                    {
                        "id": "call_5TnLcdD9tyVMVbzNGdejlJJa",
                        "function": {
                            "arguments": "{\"order_id\":\"9876543210\"}",
                            "name": "get_delivery_date"
                        },
                        "type": "function"
                    }
                ]
            }
        }
    ],
    "created": 1727816928,
    "model": "gpt-4o-mini-2024-07-18",
    "object": "chat.completion",
    "system_fingerprint": "fp_f85bea6784",
    "usage": {
        "completion_tokens": 17,
        "prompt_tokens": 1079,
        "total_tokens": 1096,
        "prompt_tokens_details": {
            "cached_tokens": 0
        },
        "completion_tokens_details": {
            "reasoning_tokens": 0
        }
    }
}

Run 2:
{
    "id": "chatcmpl-ADeP2i0frELC4W5RVNNkKz6TQ7hig",
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "logprobs": null,
            "message": {
                "content": null,
                "refusal": null,
                "role": "assistant",
                "tool_calls": [
                    {
                        "id": "call_viwwDZPuQh8hJFPf2Co1dYJK",
                        "function": {
                            "arguments": "{\"order_id\": \"9876543210\"}",
                            "name": "get_delivery_date"
                        },
                        "type": "function"
                    },
                    {
                        "id": "call_t1FFdAhrfvRc5IgqA6WkPKYj",
                        "function": {
                            "arguments": "{\"order_id\": \"9876543210\", \"reason\": \"Decided to purchase locally to get it faster.\"}",
                            "name": "cancel_order"
                        },
                        "type": "function"
                    }
                ]
            }
        }
    ],
    "created": 1727816936,
    "model": "gpt-4o-mini-2024-07-18",
    "object": "chat.completion",
    "system_fingerprint": "fp_f85bea6784",
    "usage": {
        "completion_tokens": 64,
        "prompt_tokens": 1136,
        "total_tokens": 1200,
        "prompt_tokens_details": {
            "cached_tokens": 1024
        },
        "completion_tokens_details": {
            "reasoning_tokens": 0
        }
    }
}
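Because `completion_run` returns each response as a JSON string, the cache statistics can be read back out of it. The sketch below extracts `prompt_tokens` and `cached_tokens` and estimates input cost. The function names are hypothetical, and both the per-token price and the 50% cached-token discount are placeholder assumptions; check the current OpenAI pricing page for the model you use.

```python
import json


def cache_stats(run_json: str) -> tuple[int, int]:
    """Return (prompt_tokens, cached_tokens) from a serialized chat completion."""
    usage = json.loads(run_json)["usage"]
    details = usage.get("prompt_tokens_details") or {}  # may be missing or null
    return usage["prompt_tokens"], details.get("cached_tokens", 0)


def estimate_input_cost(prompt_tokens: int, cached_tokens: int,
                        price_per_million: float, cached_discount: float = 0.5) -> float:
    """Input cost where cached tokens are billed at (1 - cached_discount) of the normal rate.

    `price_per_million` and `cached_discount` are assumptions supplied by the caller.
    """
    uncached = prompt_tokens - cached_tokens
    per_token = price_per_million / 1_000_000
    return uncached * per_token + cached_tokens * per_token * (1 - cached_discount)


# Usage values copied from the run 2 output above.
run2 = '{"usage": {"prompt_tokens": 1136, "prompt_tokens_details": {"cached_tokens": 1024}}}'
prompt, cached = cache_stats(run2)
print(f"cache hit ratio: {cached / prompt:.1%}")  # → cache hit ratio: 90.1%
print(estimate_input_cost(prompt, cached, price_per_million=1.0))  # with caching
print(estimate_input_cost(prompt, 0, price_per_million=1.0))       # without caching
```

Passing the string returned by `completion_run` directly to `cache_stats` works the same way, since it contains the full `usage` object.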

Example 2: Images

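In our second example, we include image URLs for several grocery items in the messages array along with a user query, and make three runs separated by delays. Images, whether linked or encoded in base64 within user messages, are eligible for caching. Make sure the detail parameter stays the same across requests, because it affects how images are tokenized. Note that GPT-4o mini adds extra tokens to cover image-processing costs, even though its per-token pricing for text is low.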
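Since `detail` changes how an image is tokenized, one simple way to keep it consistent is to build every image part through a single helper with a fixed default. The helper `image_part` is hypothetical; the dictionary shape mirrors the `image_url` content parts in `multiimage_completion`, and `low`, `high`, and `auto` are the detail values the Chat Completions API accepts. The example.com URL is a placeholder.

```python
from typing import Any

ALLOWED_DETAIL = {"low", "high", "auto"}
DEFAULT_DETAIL = "high"  # choose one value and reuse it for every request that should share a cache


def image_part(url: str, detail: str = DEFAULT_DETAIL) -> dict[str, Any]:
    """Build an image_url content part with a validated, consistent detail level."""
    if detail not in ALLOWED_DETAIL:
        raise ValueError(f"detail must be one of {sorted(ALLOWED_DETAIL)}, got {detail!r}")
    return {"type": "image_url", "image_url": {"url": url, "detail": detail}}


part = image_part("https://example.com/a.jpg")
print(part["image_url"]["detail"])  # → high
```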

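The output of this example shows a cache hit on the second run but a miss on the third. Although the third run repeats the first run's user query, its first image URL is different (milk_url instead of sauce_url), so the prompt prefix no longer matches.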
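The hit-and-miss pattern follows directly from the order of the content parts. The sketch below counts how many leading parts two user messages share; it is a simplified, hypothetical check on message structure (`parts` and `shared_leading_parts` are not API functions), not a token measurement. Short placeholder strings stand in for the Wikimedia URLs.

```python
from typing import Any


def parts(url1: str, url2: str, query: str) -> list[dict[str, Any]]:
    """Content parts in the same order as multiimage_completion: two images, then text."""
    return [
        {"type": "image_url", "image_url": {"url": url1, "detail": "high"}},
        {"type": "image_url", "image_url": {"url": url2, "detail": "high"}},
        {"type": "text", "text": query},
    ]


def shared_leading_parts(a: list[dict[str, Any]], b: list[dict[str, Any]]) -> int:
    """Number of leading content parts that are identical in both messages."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


sauces_q = "Please list the types of sauces shown in these images"
veggies_q = "Please list the types of vegetables shown in these images"
run1 = parts("sauce.jpg", "veggie.jpg", sauces_q)
run2 = parts("sauce.jpg", "veggie.jpg", veggies_q)
run3 = parts("milk.jpg", "sauce.jpg", sauces_q)

print(shared_leading_parts(run1, run2))  # → 2: both images match, only the text differs
print(shared_leading_parts(run1, run3))  # → 0: the very first image already differs
```

Even though run 3 reuses run 1's query word for word, the query sits at the end, so it cannot rescue a prefix that diverges at the first image.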

sauce_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/9/97/12-04-20-saucen-by-RalfR-15.jpg/800px-12-04-20-saucen-by-RalfR-15.jpg"
veggie_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/3/31/Veggies.jpg/800px-Veggies.jpg"
eggs_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/a/a2/Egg_shelf.jpg/450px-Egg_shelf.jpg"
milk_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/c/cd/Lactaid_brand.jpg/800px-Lactaid_brand.jpg"

# Reuses client, json, and time from Example 1.
def multiimage_completion(url1, url2, user_query):
    completion = client.chat.completions.create(
        model="gpt-4o-2024-08-06",
        messages=[
            {
                "role": "user",
                "content": [
                    # Keep the image order and the detail value identical across runs.
                    {"type": "image_url", "image_url": {"url": url1, "detail": "high"}},
                    {"type": "image_url", "image_url": {"url": url2, "detail": "high"}},
                    # The varying user query goes last so the images form the shared prefix.
                    {"type": "text", "text": user_query},
                ],
            }
        ],
        max_tokens=300,
    )
    print(json.dumps(completion.to_dict(), indent=4))


def main():
    # Run 1: sauce and vegetable images
    multiimage_completion(sauce_url, veggie_url, "Please list the types of sauces shown in these images")
    # Wait 20 seconds
    time.sleep(20)
    # Run 2: same images in the same order, different query (cache hit expected)
    multiimage_completion(sauce_url, veggie_url, "Please list the types of vegetables shown in these images")
    time.sleep(20)
    # Run 3: same query as run 1, but a different first image (cache miss expected)
    multiimage_completion(milk_url, sauce_url, "Please list the types of sauces shown in these images")


if __name__ == "__main__":
    main()

{
    "id": "chatcmpl-ADeV3IrUqhpjMXEgv29BFHtTQ0Pzt",
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "logprobs": null,
            "message": {
                "content": "The images show the following types of sauces:\n\n1. **Soy Sauce** - Kikkoman brand.\n2. **Worcester Sauce** - Appel brand, listed as \"Dresdner Art.\"\n3. **Tabasco Sauce** - Original pepper sauce.\n\nThe second image shows various vegetables, not sauces.",
                "refusal": null,
                "role": "assistant"
            }
        }
    ],
    "created": 1727817309,
    "model": "gpt-4o-2024-08-06",
    "object": "chat.completion",
    "system_fingerprint": "fp_2f406b9113",
    "usage": {
        "completion_tokens": 65,
        "prompt_tokens": 1548,
        "total_tokens": 1613,
        "prompt_tokens_details": {
            "cached_tokens": 0
        },
        "completion_tokens_details": {
            "reasoning_tokens": 0
        }
    }
}
{
    "id": "chatcmpl-ADeVRSI6zFINkx99k7V6ux1v5iF5f",
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "logprobs": null,
            "message": {
                "content": "The images show different types of items. In the first image, you'll see bottles of sauces like soy sauce, Worcester sauce, and Tabasco. The second image features various vegetables, including:\n\n1. Napa cabbage\n2. Kale\n3. Carrots\n4. Bok choy\n5. Swiss chard\n6. Leeks\n7. Parsley\n\nThese vegetables are arranged on shelves in a grocery store setting.",
                "refusal": null,
                "role": "assistant"
            }
        }
    ],
    "created": 1727817333,
    "model": "gpt-4o-2024-08-06",
    "object": "chat.completion",
    "system_fingerprint": "fp_2f406b9113",
    "usage": {
        "completion_tokens": 86,
        "prompt_tokens": 1548,
        "total_tokens": 1634,
        "prompt_tokens_details": {
            "cached_tokens": 1280
        },
        "completion_tokens_details": {
            "reasoning_tokens": 0
        }
    }
}
{
    "id": "chatcmpl-ADeVphj3VALQVrdnt2efysvSmdnBx",
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "logprobs": null,
            "message": {
                "content": "The second image shows three types of sauces:\n\n1. Soy Sauce (Kikkoman)\n2. Worcestershire Sauce\n3. Tabasco Sauce",
                "refusal": null,
                "role": "assistant"
            }
        }
    ],
    "created": 1727817357,
    "model": "gpt-4o-2024-08-06",
    "object": "chat.completion",
    "system_fingerprint": "fp_2f406b9113",
    "usage": {
        "completion_tokens": 29,
        "prompt_tokens": 1548,
        "total_tokens": 1577,
        "prompt_tokens_details": {
            "cached_tokens": 0
        },
        "completion_tokens_details": {
            "reasoning_tokens": 0
        }
    }
}

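General recommendations

To get the most out of prompt caching, consider the following best practices:

Place static or frequently reused content at the beginning of the prompt: keeping dynamic data at the end of the prompt improves cache efficiency.
Maintain consistent usage patterns: prompts that are used infrequently are automatically evicted from the cache. Cached prefixes typically stay active for 5 to 10 minutes of inactivity (up to an hour during off-peak periods), so reuse a prefix regularly if you want it to stay cached.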
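Monitor key metrics: regularly track cache hit rates, latency, and the proportion of cached tokens, and use these insights to fine-tune your caching strategy and maximize performance.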
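For the monitoring advice, a small aggregator over the `usage` objects of many responses is often enough to track hit rate, cached-token share, and latency over time. `CacheMonitor` is a hypothetical helper; it expects dictionaries shaped like the `usage` field in the outputs above, plus a latency you measure yourself (for example with `time.perf_counter()` around each request). The usage values below are copied from the three Example 2 runs; the 1.0-second latencies are placeholders, not measurements.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Any


@dataclass
class CacheMonitor:
    """Accumulates prompt-caching metrics across requests."""
    prompt_tokens: int = 0
    cached_tokens: int = 0
    requests: int = 0
    hits: int = 0
    latencies: list[float] = field(default_factory=list)

    def record(self, usage: dict[str, Any], latency_s: float) -> None:
        cached = (usage.get("prompt_tokens_details") or {}).get("cached_tokens", 0)
        self.prompt_tokens += usage["prompt_tokens"]
        self.cached_tokens += cached
        self.requests += 1
        self.hits += int(cached > 0)  # a request counts as a hit if any tokens were cached
        self.latencies.append(latency_s)

    def report(self) -> dict[str, float]:
        return {
            "hit_rate": self.hits / self.requests if self.requests else 0.0,
            "cached_token_share": self.cached_tokens / self.prompt_tokens if self.prompt_tokens else 0.0,
            "mean_latency_s": mean(self.latencies) if self.latencies else 0.0,
        }


monitor = CacheMonitor()
for cached in (0, 1280, 0):  # cached_tokens from the three Example 2 runs
    usage = {"prompt_tokens": 1548, "prompt_tokens_details": {"cached_tokens": cached}}
    monitor.record(usage, latency_s=1.0)  # placeholder latency; measure your own
print(monitor.report())
```

Tracking both the per-request hit rate and the token-weighted share is useful because a few large misses can dominate cost even when most requests hit the cache.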

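By applying these practices, you can take full advantage of prompt caching and keep your applications both responsive and cost-efficient. A well-managed caching strategy can significantly reduce processing time and cost while helping maintain a smooth user experience.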