Using Vision with Tools

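In this recipe, we'll show how to combine vision and tool use. Claude reads an image of a nutrition label and uses a custom tool to return structured nutrition information.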

Setup

First, let's install the required libraries and set up the Anthropic API client:

%pip install anthropic IPython
from IPython.display import Image
from anthropic import Anthropic
import base64

# Anthropic() reads the API key from the ANTHROPIC_API_KEY environment variable
client = Anthropic()
MODEL_NAME = "claude-3-opus-20240229"

Defining the nutrition label extraction tool

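Next, we'll define a custom tool named "print_nutrition_info". Claude calls this tool with the nutrition information it reads from the label image. The tool's input schema defines five properties: calories, total fat, cholesterol, total carbohydrates, and protein. Each one is a required integer: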

nutrition_tool = {
    "name": "print_nutrition_info",
    "description": "Extracts nutrition information from an image of a nutrition label",
    "input_schema": {
        "type": "object",
        "properties": {
            "calories": {"type": "integer", "description": "The number of calories per serving"},
            "total_fat": {"type": "integer", "description": "The amount of total fat in grams per serving"},
            "cholesterol": {"type": "integer", "description": "The amount of cholesterol in milligrams per serving"},
            "total_carbs": {"type": "integer", "description": "The amount of total carbohydrates in grams per serving"},
            "protein": {"type": "integer", "description": "The amount of protein in grams per serving"}
        },
        "required": ["calories", "total_fat", "cholesterol", "total_carbs", "protein"]
    }
}
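Claude is expected to fill in every field defined in input_schema. Even so, it can be worth checking the returned input before you use it downstream. The following is a minimal sketch of such a check, assuming only the standard library. It uses a trimmed copy of the schema above. check_nutrition_input is a hypothetical helper, not part of the Anthropic SDK. It checks only required keys and "integer" types and is not a full JSON Schema validator.

```python
from typing import Any

# Trimmed copy of the input_schema defined above (descriptions omitted)
NUTRITION_SCHEMA: dict[str, Any] = {
    "type": "object",
    "properties": {
        "calories": {"type": "integer"},
        "total_fat": {"type": "integer"},
        "cholesterol": {"type": "integer"},
        "total_carbs": {"type": "integer"},
        "protein": {"type": "integer"},
    },
    "required": ["calories", "total_fat", "cholesterol", "total_carbs", "protein"],
}


def check_nutrition_input(data: dict[str, Any], schema: dict[str, Any] = NUTRITION_SCHEMA) -> list[str]:
    """Return a list of problems; an empty list means the input passed these checks.

    Hypothetical helper: checks required keys, unknown keys, and "integer"
    types only, not the full JSON Schema specification.
    """
    problems = []
    for key in schema["required"]:
        if key not in data:
            problems.append(f"missing required field: {key}")
    for key, value in data.items():
        prop = schema["properties"].get(key)
        if prop is None:
            problems.append(f"unexpected field: {key}")
        elif prop["type"] == "integer" and (isinstance(value, bool) or not isinstance(value, int)):
            # bool is a subclass of int in Python, so it is rejected explicitly
            problems.append(f"{key} should be an integer, got {type(value).__name__}")
    return problems
```

Running the check on the tool input before printing or storing it catches a missing field early. For example, a made-up input with only "calories" would report the other four required fields as missing.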

Analyzing the nutrition label image

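Now let's put it all together. We'll load the nutrition label image and pass it to Claude along with a prompt. Claude then calls the "print_nutrition_info" tool, returning the structured nutrition information as a well-formatted JSON object: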

Image(filename='../images/tool_use/nutrition_label.png')


def get_base64_encoded_image(image_path):
    with open(image_path, "rb") as image_file:
        binary_data = image_file.read()
        base_64_encoded_data = base64.b64encode(binary_data)
        base_64_string = base_64_encoded_data.decode('utf-8')
        return base_64_string

message_list = [
    {
        "role": "user",
        "content": [
            {"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": get_base64_encoded_image("../images/tool_use/nutrition_label.png")}},
            {"type": "text", "text": "Please print the nutrition information from this nutrition label image."}
        ]
    }
]
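The inline dictionary in message_list hard-codes "image/png" as the media_type. Here is a sketch of an alternative: a hypothetical helper, image_block, that builds the same content block from raw bytes and infers the media type from the filename with the standard-library mimetypes module. The set of allowed types is an assumption based on the image formats the Messages API documents (JPEG, PNG, GIF, WebP). Check the current Anthropic documentation before relying on it.

```python
import base64
import mimetypes
from typing import Any

# Assumption: image formats accepted by the Messages API
ALLOWED_MEDIA_TYPES = {"image/jpeg", "image/png", "image/gif", "image/webp"}


def image_block(data: bytes, filename: str) -> dict[str, Any]:
    """Build a base64 image content block, guessing media_type from the file extension.

    Hypothetical helper: raises ValueError if the extension does not map to
    one of ALLOWED_MEDIA_TYPES.
    """
    media_type, _ = mimetypes.guess_type(filename)
    if media_type not in ALLOWED_MEDIA_TYPES:
        raise ValueError(f"unsupported image type for {filename!r}: {media_type}")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": base64.b64encode(data).decode("utf-8"),
        },
    }
```

In the notebook, the block could be built from the file's bytes and the same path, for example by passing the result of reading "../images/tool_use/nutrition_label.png" in binary mode together with that path, and placed first in the user message's content list.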

response = client.messages.create(
    model=MODEL_NAME,
    max_tokens=4096,
    messages=message_list,
    tools=[nutrition_tool]
)
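The request above lets Claude decide whether to call the tool, which is why the handler below has a "This shouldn't happen!" branch. The Messages API also accepts a tool_choice parameter. Setting it to {"type": "tool", "name": ...} asks Claude to call that specific tool. The sketch below only assembles the keyword arguments, so you can inspect them without a network call. build_forced_tool_request is a hypothetical name.

```python
from typing import Any


def build_forced_tool_request(
    model: str,
    messages: list[dict[str, Any]],
    tool: dict[str, Any],
    max_tokens: int = 4096,
) -> dict[str, Any]:
    """Assemble keyword arguments for client.messages.create that force one tool."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        "messages": messages,
        "tools": [tool],
        # Ask Claude to call this specific tool instead of choosing on its own
        "tool_choice": {"type": "tool", "name": tool["name"]},
    }
```

To send the request, unpack the result into the client call: client.messages.create(**build_forced_tool_request(MODEL_NAME, message_list, nutrition_tool)).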

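# stop_reason is "tool_use" when Claude stops in order to call a tool
if response.stop_reason == "tool_use":
    # The tool_use block is usually last; it may be preceded by a text block
    last_content_block = response.content[-1]
    if last_content_block.type == 'tool_use':
        tool_name = last_content_block.name
        tool_inputs = last_content_block.input  # dict matching the tool's input_schema
        print(f"=======Claude Wants To Call The {tool_name} Tool=======")
        print(tool_inputs)

else:
    print("No tool was called. This shouldn't happen!")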

=======Claude Wants To Call The print_nutrition_info Tool=======
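The handler above only inspects the last content block. It also prints the Python dict's repr rather than JSON. A slightly more defensive sketch scans every block for a tool_use block with the expected name and renders its input with json.dumps. The helper names find_tool_input and to_pretty_json are hypothetical. They rely only on the type, name, and input attributes the handler above already uses, so SimpleNamespace objects can stand in for SDK blocks offline. The values in the demo are made up, not output from the API.

```python
import json
from types import SimpleNamespace
from typing import Any, Iterable, Optional


def find_tool_input(content: Iterable[Any], tool_name: str) -> Optional[dict]:
    """Return the input of the first tool_use block that calls tool_name, or None."""
    for block in content:
        if getattr(block, "type", None) == "tool_use" and getattr(block, "name", None) == tool_name:
            return block.input
    return None


def to_pretty_json(tool_input: dict) -> str:
    """Format a tool input dictionary as indented JSON."""
    return json.dumps(tool_input, indent=2)


# Offline stand-ins for response.content (made-up values, not read from the label)
demo_content = [
    SimpleNamespace(type="text", text="Reading the label..."),
    SimpleNamespace(
        type="tool_use",
        name="print_nutrition_info",
        input={"calories": 230, "total_fat": 8, "cholesterol": 0, "total_carbs": 37, "protein": 3},
    ),
]

demo_input = find_tool_input(demo_content, "print_nutrition_info")
if demo_input is not None:
    print(to_pretty_json(demo_input))
```

In the notebook, find_tool_input(response.content, "print_nutrition_info") would replace the response.content[-1] lookup. It still works if Claude emits text after the tool_use block.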