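Generate images with high input fidelity

This cookbook shows how to use the input_fidelity parameter, available in the Image API and in the Responses image generation tool, to preserve the distinctive features of an input image.

Setting input_fidelity="high" is especially useful when editing images that contain faces, logos, or any other detail that must be reproduced faithfully in the output.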
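
As a rough sketch of where the parameter goes, the helpers below build the keyword arguments for an image edit call and a tool entry for the Responses image generation tool. The helper names (edit_kwargs, image_tool) are hypothetical, and the set of accepted values ("low" and "high") is an assumption to verify against the API reference.

```python
# Hypothetical helpers: names and the accepted-value set are assumptions.
ALLOWED_FIDELITY = {"low", "high"}


def _check(input_fidelity):
    """Reject values outside the assumed set of accepted options."""
    if input_fidelity not in ALLOWED_FIDELITY:
        raise ValueError(f"unsupported input_fidelity: {input_fidelity!r}")


def edit_kwargs(prompt, input_fidelity="high", model="gpt-image-1"):
    """Build keyword arguments for an image edit call (the image is passed separately)."""
    _check(input_fidelity)
    return {"model": model, "prompt": prompt, "input_fidelity": input_fidelity}


def image_tool(input_fidelity="high"):
    """Build an image generation tool entry for a Responses API request."""
    _check(input_fidelity)
    return {"type": "image_generation", "input_fidelity": input_fidelity}


print(edit_kwargs("Make the mug olive green"))
print(image_tool())
```

The first dictionary could be unpacked into client.images.edit(image=..., **kwargs), and the second placed in the tools list of a Responses request.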

If you are not yet familiar with image generation using the OpenAI API, we recommend starting with our introductory image generation cookbook.

Setup

%pip install pillow openai -U  # (skip if already installed)

import base64, os
from io import BytesIO
from PIL import Image
from IPython.display import display, Image as IPImage
from openai import OpenAI

client = OpenAI()
# If the API key is not set globally, pass it explicitly
#client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY", "<your OpenAI API key if not set as an env var>"))

folder_path = "imgs"
os.makedirs(folder_path, exist_ok=True)

def resize_img(image, target_w):
    w, h = image.size
    target_h = int(round(h * (target_w / float(w))))
    resized_image = image.resize((target_w, target_h), Image.LANCZOS)
    return resized_image

def edit_img(input_img, prompt):
    result = client.images.edit(
        model="gpt-image-1",
        image=input_img,
        prompt=prompt,
        input_fidelity="high",
        quality="high",
        output_format="jpeg"
    )

    image_base64 = result.data[0].b64_json
    image_bytes = base64.b64decode(image_base64)
    image = Image.open(BytesIO(image_bytes))
    return image
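
The arithmetic in resize_img can be checked without loading an image. This small sketch reproduces the same calculation on plain integers; scaled_size is a hypothetical helper name, not part of the cookbook.

```python
def scaled_size(width, height, target_w):
    """Return (target_w, target_h), keeping the width/height ratio as resize_img does."""
    target_h = int(round(height * (target_w / float(width))))
    return target_w, target_h


print(scaled_size(1024, 1536, 300))  # (300, 450)
print(scaled_size(1536, 1024, 300))  # (300, 200)
```

Only the width is chosen by the caller; the height follows from the original aspect ratio, so portrait and landscape inputs are displayed without distortion.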

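Precise editing

High input fidelity lets you make subtle edits to an image without altering unrelated areas. This makes it well suited to controlled, localized changes.

Example use cases:

Item edits: Change an isolated element (for example, swap the color of a mug) while leaving everything else untouched.
Element removal: Cleanly remove an isolated element without changing the rest of the picture.
Element addition: Seamlessly insert a new object into a scene.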

edit_input_path = "imgs/desk.png"
edit_input_img = open(edit_input_path, "rb")
display(IPImage(edit_input_path))
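
The cells in this section pass the same open file handle, edit_input_img, to several requests. Whether an already-read handle is rewound before the next upload depends on the HTTP layer, so a more defensive pattern is to open the file anew for each request. The sketch below assumes a callable shaped like edit_img(input_img, prompt); run_edits and fake_edit are hypothetical names, and fake_edit only stands in for the API call so the example runs offline.

```python
import os
import tempfile


def run_edits(path, prompts, edit_fn):
    """Call edit_fn once per prompt, opening a fresh file handle each time."""
    results = {}
    for name, prompt in prompts.items():
        # The with-block guarantees each request reads from byte 0
        # and that the handle is closed afterwards.
        with open(path, "rb") as fh:
            results[name] = edit_fn(fh, prompt)
    return results


def fake_edit(fh, prompt):
    """Stand-in for edit_img: reports how many bytes the handle yielded."""
    return len(fh.read())


# Demo with a temporary placeholder file instead of imgs/desk.png
with tempfile.NamedTemporaryFile(delete=False, suffix=".png") as tmp:
    tmp.write(b"not a real image")
    demo_path = tmp.name

sizes = run_edits(
    demo_path,
    {"edit": "Make the mug olive green", "remove": "Remove the mug from the desk"},
    fake_edit,
)
os.remove(demo_path)
print(sizes)
```

With the real helper, run_edits("imgs/desk.png", prompts, edit_img) would return one PIL image per prompt.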

Item edits

edit_prompt = "Make the mug olive green"
edit_result = edit_img(edit_input_img, edit_prompt)

# Display the result
edit_resized_result = resize_img(edit_result, 300)
display(edit_resized_result)

Remove item

remove_prompt = "Remove the mug from the desk"
remove_result = edit_img(edit_input_img, remove_prompt)

# Display the result
remove_resized_result = resize_img(remove_result, 300)
display(remove_resized_result)

Add item

add_prompt = "Add a sticky note to the monitor that says 'Be right back!'"
add_result = edit_img(edit_input_img, add_prompt)

# Display the result
add_resized_result = resize_img(add_result, 300)
display(add_resized_result)

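Face preservation

With high input fidelity, faces are preserved far more accurately than in standard mode. Use it when people need to remain recognizable across edits.

Example use cases:

Image editing: Edit a photo while preserving facial features.
Personalization: Create avatars that still look like the original person, with a different background or style.
Photo merging: Combine faces from multiple photos into a single image.

Note: Currently, although all input images are preserved with high fidelity, only the first image you provide retains extra richness in texture. When working with multiple faces from different photos, try combining all the faces you need into a single composite image before sending the request (see the example below).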
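
Because only the first input keeps the extra texture detail, the order of the image list matters when several files are sent in one request. A minimal sketch of putting the most important image first; face_first is a hypothetical helper, and the file names are the ones used later in this cookbook.

```python
def face_first(paths, face_path):
    """Return paths reordered so face_path comes first; other items keep their order."""
    if face_path not in paths:
        raise ValueError(f"{face_path!r} is not among the inputs")
    return [face_path] + [p for p in paths if p != face_path]


ordered = face_first(["imgs/bag.png", "imgs/model.png"], "imgs/model.png")
print(ordered)  # ['imgs/model.png', 'imgs/bag.png']
```

When more than one input contains a face, reordering is not enough, which is why the note above suggests merging them into one composite image instead.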

face_input_path = "imgs/woman_portrait.png"
face_input_img = open(face_input_path, "rb")
display(IPImage(face_input_path))

Image editing

edit_face_prompt = "Add soft neon purple and teal lighting and a glowing backlight."
edit_face_result = edit_img(face_input_img, edit_face_prompt)

# Display the result
edit_face_resized_result = resize_img(edit_face_result, 300)
display(edit_face_resized_result)

Avatar

avatar_prompt = "Generate an avatar of this person in a digital art style, with vivid splashes of color."
avatar_result = edit_img(face_input_img, avatar_prompt)

# Display the result
avatar_resized_result = resize_img(avatar_result, 300)
display(avatar_resized_result)

Merging multiple pictures with faces

second_woman_input_path = "imgs/woman_smiling.jpg"
second_woman_input_img = open(second_woman_input_path, "rb")
display(IPImage(second_woman_input_path))

def combine_imgs(left_path, right_path, bg_color=(255, 255, 255)):
    # Convert to RGBA so images with transparency paste safely
    left = Image.open(left_path).convert("RGBA")
    right = Image.open(right_path).convert("RGBA")

    # Resize the right image to match the height of the left image
    target_h = left.height
    scale = target_h / float(right.height)
    target_w = int(round(right.width * scale))
    right = right.resize((target_w, target_h), Image.LANCZOS)

    # Create a new canvas wide enough for both images
    total_w = left.width + right.width
    canvas = Image.new("RGBA", (total_w, target_h), bg_color + (255,))

    # Paste side by side, using each image's alpha channel as the mask
    canvas.paste(left, (0, 0), left)
    canvas.paste(right, (left.width, 0), right)

    return canvas

combined_img = combine_imgs(second_woman_input_path, face_input_path)
display(combined_img)
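
The sizing logic in combine_imgs can be traced on plain numbers: the right image is scaled to the left image's height, and the canvas is as wide as both together. side_by_side_size is a hypothetical helper, and the dimensions below are made-up inputs for illustration.

```python
def side_by_side_size(left_size, right_size):
    """Return (canvas_w, canvas_h) for two (width, height) images placed side by side."""
    left_w, left_h = left_size
    right_w, right_h = right_size
    # Scale the right image so its height matches the left image
    scale = left_h / float(right_h)
    scaled_right_w = int(round(right_w * scale))
    return left_w + scaled_right_w, left_h


print(side_by_side_size((800, 1000), (600, 2000)))  # (1100, 1000)
```

Matching heights rather than widths keeps both faces at a comparable scale in the composite, at the cost of a wider canvas.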

import io

# Utility function to convert a PIL image to an in-memory bytes buffer
def pil_to_bytes(img, fmt="PNG"):
    buf = io.BytesIO()
    img.save(buf, format=fmt)
    buf.seek(0)
    return buf

combined_img_bytes = pil_to_bytes(combined_img)

combined_prompt = "Put these two women in the same picture with their arms around each other's shoulders, as if they were part of the same photo."
combined_result = edit_img(("combined.png", combined_img_bytes, "image/png"), combined_prompt)

# Display the result
combined_resized_result = resize_img(combined_result, 300)
display(combined_resized_result)

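Brand consistency

Sometimes preserving brand identity in generated images is essential. High input fidelity ensures that logos and other distinctive design elements stay true to the original assets.

Example use cases:

Marketing assets: Generate banners or social posts that include your brand logo without distortion.
Mockups: Place your logo or other brand assets into templates or lifestyle scenes without unintended changes.
Product photography: Change a product's background for different campaigns while keeping product details crisp.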

logo_input_path = "imgs/logo.png"
logo_input_img = open(logo_input_path, "rb")
display(IPImage(logo_input_path))

Marketing assets

marketing_prompt = "Generate a beautiful, modern hero banner featuring this logo in the center. It should look futuristic, with blue and violet hues."
marketing_result = edit_img(logo_input_img, marketing_prompt)

# Display the result
marketing_resized_result = resize_img(marketing_result, 300)
display(marketing_resized_result)

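Mockups

mockup_prompt = "Generate a highly realistic picture of a hand holding a tilted iPhone, with an app on the screen that showcases this logo in the center with a loading animation below"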
mockup_result = edit_img(logo_input_img, mockup_prompt)

# Display the result
mockup_resized_result = resize_img(mockup_result, 300)
display(mockup_resized_result)

Product photography

bag_input_path = "imgs/bag.png"
bag_input_img = open(bag_input_path, "rb")
display(IPImage(bag_input_path))

product_prompt = "Generate a beautiful ad with this bag in the center, on a dark background with a glow at the center of the background."
product_result = edit_img(bag_input_img, product_prompt)

# Display the result
product_resized_result = resize_img(product_result, 300)
display(product_resized_result)

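Fashion and product retouching

E-commerce and fashion often require editing outfits or product details without compromising realism. High input fidelity ensures that fabric textures, patterns, and logos remain consistent.

Example use cases:

Outfit variations: Change the color or style of clothing on a model photo.
Accessory addition: Add jewelry, hats, or other accessories to a model photo without changing the model's pose or face.
Product extraction: Show the same product or outfit in a new setting while keeping its details intact.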

model_input_path = "imgs/model.png"
model_input_img = open(model_input_path, "rb")
display(IPImage(model_input_path))

Outfit variations

variation_prompt = "Edit this picture so that the model wears a blue tank top instead of the coat and sweater."
variation_result = edit_img(model_input_img, variation_prompt)

# Display the result
variation_resized_result = resize_img(variation_result, 300)
display(variation_resized_result)

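Accessory addition

In this example, we combine 2 input images. The image containing the face should be provided as the first input, because the first image retains more detail.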

input_imgs = [
    ("model.png", open("imgs/model.png", "rb"), "image/png"),
    ("bag.png", open("imgs/bag.png", "rb"), "image/png"),
]

accessory_prompt = "Add the crossbody bag to the outfit."
accessory_result = edit_img(input_imgs, accessory_prompt)

# Display the result
accessory_resized_result = resize_img(accessory_result, 300)
display(accessory_resized_result)
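
The input_imgs list above leaves both file handles open. One alternative, sketched here, reads each file into memory and guesses the MIME type from the file name. It assumes the SDK accepts (filename, bytes, content_type) tuples in the same way it accepts the (filename, file object, content_type) tuples used above; as_upload is a hypothetical helper, and the demo uses placeholder files rather than real images.

```python
import mimetypes
import os
import tempfile


def as_upload(path):
    """Return (filename, content_bytes, mime_type) for one image file."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"{path!r} does not look like an image file")
    with open(path, "rb") as fh:  # closed as soon as the bytes are read
        return os.path.basename(path), fh.read(), mime


# Demo with placeholder files instead of imgs/model.png and imgs/bag.png
names = ("model.png", "bag.png")
with tempfile.TemporaryDirectory() as tmp_dir:
    for name in names:
        with open(os.path.join(tmp_dir, name), "wb") as fh:
            fh.write(b"placeholder")
    # Keep the image containing the face first in the list
    uploads = [as_upload(os.path.join(tmp_dir, name)) for name in names]

print([(name, mime) for name, _, mime in uploads])
```

Under that assumption, the resulting list could be passed to edit_img in place of input_imgs.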

Product extraction

extraction_prompt = "Generate a picture of this exact same jacket on a white background"
extraction_result = edit_img(model_input_img, extraction_prompt)

# Display the result
extraction_resized_result = resize_img(extraction_result, 300)
display(extraction_resized_result)

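Wrapping up

In this guide, we covered how to enable high input fidelity to better preserve important visual details from input images.

Use the example use cases above as inspiration, and try the parameter with your own images to see where high input fidelity makes the biggest difference.

Keep in mind that high input fidelity consumes more image input tokens than the default setting. Also, while all input images are processed with high input fidelity, the first image in the list retains the finest detail and the richest texture, which is especially important for faces.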
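
One way to see the token difference for your own images is to compare the usage reported for the same edit at both settings. The sketch below works on plain dictionaries and assumes the usage data exposes an input_tokens_details mapping with an image_tokens count; those field names, and the helper names, are assumptions to check against the response you actually receive.

```python
def image_input_tokens(usage):
    """Extract the image input token count from a usage mapping, or None if absent."""
    details = usage.get("input_tokens_details") or {}
    return details.get("image_tokens")


def fidelity_overhead(low_usage, high_usage):
    """Return how many more image input tokens the high-fidelity request used."""
    low = image_input_tokens(low_usage)
    high = image_input_tokens(high_usage)
    if low is None or high is None:
        raise ValueError("usage data does not include image token counts")
    return high - low


# Placeholder numbers purely to exercise the helper; they are not measurements.
low = {"input_tokens_details": {"image_tokens": 100, "text_tokens": 10}}
high = {"input_tokens_details": {"image_tokens": 250, "text_tokens": 10}}
print(fidelity_overhead(low, high))  # 150
```

Running the same prompt and image at both settings and feeding the two usage records to fidelity_overhead gives a per-image figure to weigh against the quality gain.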

Happy building!