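Using Vision with Tools
In this recipe, we'll demonstrate how to combine Vision with tool use to analyze an image of a nutrition label and extract structured nutrition information through a custom tool.
Setup
First, let's install the necessary libraries and set up the Anthropic API client: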
%pip install anthropic IPython
from IPython.display import Image
from anthropic import Anthropic
import base64
client = Anthropic()
MODEL_NAME = "claude-3-opus-20240229"
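Define the Nutrition Label Extraction Tool
Next, we'll define a custom tool named "print_nutrition_info" that extracts structured nutrition information from an image of a nutrition label. The tool has properties for calories, total fat, cholesterol, total carbohydrates, and protein; each is an integer and all five are required: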
nutrition_tool = {
    "name": "print_nutrition_info",
    "description": "Extracts nutrition information from an image of a nutrition label",
    "input_schema": {
        "type": "object",
        "properties": {
            "calories": {"type": "integer", "description": "The number of calories per serving"},
            "total_fat": {"type": "integer", "description": "The amount of total fat in grams per serving"},
            "cholesterol": {"type": "integer", "description": "The amount of cholesterol in milligrams per serving"},
            "total_carbs": {"type": "integer", "description": "The amount of total carbohydrates in grams per serving"},
            "protein": {"type": "integer", "description": "The amount of protein in grams per serving"}
        },
        "required": ["calories", "total_fat", "cholesterol", "total_carbs", "protein"]
    }
}
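Since the schema lists required fields and integer types, it can be useful to check an input dictionary against it before relying on the values. The following is a minimal sketch, not part of the Anthropic SDK: `check_tool_input` and `example_tool` are hypothetical names introduced here, and `example_tool` is a shortened stand-in for the tool definition above.

```python
# Hypothetical helper (not part of the Anthropic SDK): checks a tool-input
# dict against the "required" list and "integer" types of a tool's schema.

def check_tool_input(tool, tool_input):
    """Return a list of problems; an empty list means no problem was found."""
    schema = tool["input_schema"]
    problems = []
    # Every key named in "required" must be present.
    for key in schema.get("required", []):
        if key not in tool_input:
            problems.append(f"missing required field: {key}")
    # Every supplied key must be declared, and integers must really be ints.
    for key, value in tool_input.items():
        spec = schema["properties"].get(key)
        if spec is None:
            problems.append(f"unexpected field: {key}")
        elif spec.get("type") == "integer":
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(value, bool) or not isinstance(value, int):
                problems.append(f"field {key} should be an integer")
    return problems


# Shortened stand-in for the tool definition, so this block runs on its own.
example_tool = {
    "name": "print_nutrition_info",
    "input_schema": {
        "type": "object",
        "properties": {
            "calories": {"type": "integer"},
            "protein": {"type": "integer"},
        },
        "required": ["calories", "protein"],
    },
}

# Placeholder values, chosen only to exercise the checks.
ok_problems = check_tool_input(example_tool, {"calories": 100, "protein": 3})
bad_problems = check_tool_input(example_tool, {"calories": "100"})
print(ok_problems)
print(bad_problems)
```

This covers only the two constraints the schema above actually uses; a full JSON Schema validator would be the more thorough option for richer schemas.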
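Analyze the Nutrition Label Image
Now, let's put it all together. We'll load the nutrition label image, pass it to Claude along with a prompt, and have Claude call the "print_nutrition_info" tool to extract the structured nutrition information into a well-formed JSON object: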
Image(filename='../images/tool_use/nutrition_label.png')
def get_base64_encoded_image(image_path):
    # Read the image as bytes and return it as a base64-encoded UTF-8 string
    with open(image_path, "rb") as image_file:
        binary_data = image_file.read()
        base_64_encoded_data = base64.b64encode(binary_data)
        base_64_string = base_64_encoded_data.decode('utf-8')
        return base_64_string

# A single user turn containing the image followed by the text prompt
message_list = [
    {
        "role": "user",
        "content": [
            {"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": get_base64_encoded_image("../images/tool_use/nutrition_label.png")}},
            {"type": "text", "text": "Please print the nutrition information from this nutrition label image."}
        ]
    }
]

response = client.messages.create(
    model=MODEL_NAME,
    max_tokens=4096,
    messages=message_list,
    tools=[nutrition_tool]
)

if response.stop_reason == "tool_use":
    # The tool call is expected in the last content block of the response
    last_content_block = response.content[-1]
    if last_content_block.type == 'tool_use':
        tool_name = last_content_block.name
        tool_inputs = last_content_block.input
        print(f"=======Claude Wants To Call The {tool_name} Tool=======")
        print(tool_inputs)
else:
    print("No tool was called. This shouldn't happen!")
=======Claude Wants To Call The print_nutrition_info Tool=======
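The code above reads the tool call from the last content block. As an alternative, the sketch below scans every block for type "tool_use" and prints each input as indented JSON. It is an illustration under stated assumptions: `find_tool_calls` is a hypothetical helper, the blocks are stand-in objects exposing only the `.type`, `.name`, and `.input` attributes used above, and the zero values are placeholders rather than numbers read from the label.

```python
import json
from types import SimpleNamespace


def find_tool_calls(content_blocks):
    """Return (name, input) pairs for every block whose type is 'tool_use'."""
    return [
        (block.name, block.input)
        for block in content_blocks
        if block.type == "tool_use"
    ]


# Stand-in objects that mimic only the attributes used in this recipe.
# The zeros are placeholders, NOT values extracted from the label image.
fake_content = [
    SimpleNamespace(type="text", text="(placeholder text block)"),
    SimpleNamespace(
        type="tool_use",
        name="print_nutrition_info",
        input={
            "calories": 0,
            "total_fat": 0,
            "cholesterol": 0,
            "total_carbs": 0,
            "protein": 0,
        },
    ),
]

calls = find_tool_calls(fake_content)
for name, tool_input in calls:
    print(f"Tool requested: {name}")
    print(json.dumps(tool_input, indent=2))
```

With a real response, `response.content` would take the place of `fake_content`.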