Milvus 和 OpenAI 入门指南

寻找你的下一本书

在本 notebook 中,我们将通过 OpenAI 为书籍描述生成嵌入,并在 Milvus 中使用这些嵌入来查找相关书籍。此示例中的数据集来源于 HuggingFace datasets,包含一百万多条标题-描述对。

让我们开始下载本 notebook 所需的库:

  • openai 用于与 OpenAI 嵌入服务通信
  • pymilvus 用于与 Milvus 服务器通信
  • datasets 用于下载数据集
  • tqdm 用于进度条
! pip install openai pymilvus datasets tqdm
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Requirement already satisfied: openai in /Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages (0.27.2)
Requirement already satisfied: pymilvus in /Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages (2.2.2)
Requirement already satisfied: datasets in /Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages (2.10.1)
Requirement already satisfied: tqdm in /Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages (4.64.1)
Requirement already satisfied: aiohttp in /Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages (from openai) (3.8.4)
Requirement already satisfied: requests>=2.20 in /Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages (from openai) (2.28.2)
Requirement already satisfied: pandas>=1.2.4 in /Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages (from pymilvus) (1.5.3)
Requirement already satisfied: ujson<=5.4.0,>=2.0.0 in /Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages (from pymilvus) (5.1.0)
Requirement already satisfied: mmh3<=3.0.0,>=2.0 in /Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages (from pymilvus) (3.0.0)
Requirement already satisfied: grpcio<=1.48.0,>=1.47.0 in /Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages (from pymilvus) (1.47.2)
Requirement already satisfied: grpcio-tools<=1.48.0,>=1.47.0 in /Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages (from pymilvus) (1.47.2)
Requirement already satisfied: huggingface-hub<1.0.0,>=0.2.0 in /Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages (from datasets) (0.12.1)
Requirement already satisfied: dill<0.3.7,>=0.3.0 in /Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages (from datasets) (0.3.6)
Requirement already satisfied: xxhash in /Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages (from datasets) (3.2.0)
Requirement already satisfied: pyyaml>=5.1 in /Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages (from datasets) (5.4.1)
Requirement already satisfied: fsspec[http]>=2021.11.1 in /Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages (from datasets) (2023.1.0)
Requirement already satisfied: packaging in /Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages (from datasets) (23.0)
Requirement already satisfied: numpy>=1.17 in /Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages (from datasets) (1.23.5)
Requirement already satisfied: multiprocess in /Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages (from datasets) (0.70.14)
Requirement already satisfied: pyarrow>=6.0.0 in /Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages (from datasets) (10.0.1)
Requirement already satisfied: responses<0.19 in /Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages (from datasets) (0.18.0)
Requirement already satisfied: multidict<7.0,>=4.5 in /Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages (from aiohttp->openai) (6.0.4)
Requirement already satisfied: frozenlist>=1.1.1 in /Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages (from aiohttp->openai) (1.3.3)
Requirement already satisfied: async-timeout<5.0,>=4.0.0a3 in /Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages (from aiohttp->openai) (4.0.2)
Requirement already satisfied: yarl<2.0,>=1.0 in /Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages (from aiohttp->openai) (1.8.2)
Requirement already satisfied: aiosignal>=1.1.2 in /Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages (from aiohttp->openai) (1.3.1)
Requirement already satisfied: charset-normalizer<4.0,>=2.0 in /Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages (from aiohttp->openai) (3.0.1)
Requirement already satisfied: attrs>=17.3.0 in /Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages (from aiohttp->openai) (22.2.0)
Requirement already satisfied: six>=1.5.2 in /Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages (from grpcio<=1.48.0,>=1.47.0->pymilvus) (1.16.0)
Requirement already satisfied: protobuf<4.0dev,>=3.12.0 in /Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages (from grpcio-tools<=1.48.0,>=1.47.0->pymilvus) (3.20.1)
Requirement already satisfied: setuptools in /Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages (from grpcio-tools<=1.48.0,>=1.47.0->pymilvus) (65.6.3)
Requirement already satisfied: filelock in /Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages (from huggingface-hub<1.0.0,>=0.2.0->datasets) (3.9.0)
Requirement already satisfied: typing-extensions>=3.7.4.3 in /Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages (from huggingface-hub<1.0.0,>=0.2.0->datasets) (4.5.0)
Requirement already satisfied: python-dateutil>=2.8.1 in /Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages (from pandas>=1.2.4->pymilvus) (2.8.2)
Requirement already satisfied: pytz>=2020.1 in /Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages (from pandas>=1.2.4->pymilvus) (2022.7.1)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages (from requests>=2.20->openai) (1.26.14)
Requirement already satisfied: idna<4,>=2.5 in /Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages (from requests>=2.20->openai) (3.4)
Requirement already satisfied: certifi>=2017.4.17 in /Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages (from requests>=2.20->openai) (2022.12.7)

安装了所需的包之后,我们就可以开始了。让我们先启动 Milvus 服务。运行的文件是此文件所在文件夹中的 docker-compose.yaml。此命令启动一个 Milvus 单机实例,我们将用于此测试。

! docker compose up -d
 [1A [1B [0G [?25l[+] Running 0/0
 [37m ⠋ Network milvus  Creating                                                0.1s
 [0m [?25h [1A [1A [0G [?25l [34m[+] Running 1/1 [0m
 [34m ⠿ Network milvus          Created                                         0.1s
 [0m [37m ⠋ Container milvus-minio  Creating                                        0.1s
 [0m [37m ⠋ Container milvus-etcd   Creating                                        0.1s
 [0m [?25h [1A [1A [1A [1A [0G [?25l[+] Running 1/3
 [34m ⠿ Network milvus          Created                                         0.1s
 [0m [37m ⠙ Container milvus-minio  Creating                                        0.2s
 [0m [37m ⠙ Container milvus-etcd   Creating                                        0.2s
 [0m [?25h [1A [1A [1A [1A [0G [?25l[+] Running 1/3
 [34m ⠿ Network milvus          Created                                         0.1s
 [0m [37m ⠹ Container milvus-minio  Creating                                        0.3s
 [0m [37m ⠹ Container milvus-etcd   Creating                                        0.3s
 [0m [?25h [1A [1A [1A [1A [0G [?25l [34m[+] Running 3/3 [0m
 [34m ⠿ Network milvus               Created                                    0.1s
 [0m [34m ⠿ Container milvus-minio       Created                                    0.3s
 [0m [34m ⠿ Container milvus-etcd        Created                                    0.3s
 [0m [37m ⠋ Container milvus-standalone  Creating                                   0.1s
 [0m [?25h [1A [1A [1A [1A [1A [0G [?25l[+] Running 3/4
 [34m ⠿ Network milvus               Created                                    0.1s
 [0m [34m ⠿ Container milvus-minio       Created                                    0.3s
 [0m [34m ⠿ Container milvus-etcd        Created                                    0.3s
 [0m [37m ⠙ Container milvus-standalone  Creating                                   0.2s
 [0m [?25h [1A [1A [1A [1A [1A [0G [?25l[+] Running 3/4
 [34m ⠿ Network milvus               Created                                    0.1s
 [0m [37m ⠹ Container milvus-minio       Creating                                   0.3s
 [0m [37m ⠹ Container milvus-etcd        Creating                                   0.3s
 [0m [34m ⠿ Container milvus-standalone  Created                                    0.3s
 [0m [?25h [1A [1A [1A [1A [1A [0G [?25l[+] Running 2/4
 [34m ⠿ Network milvus               Created                                    0.1s
 [0m [37m ⠿ Container milvus-minio       Starting                                   0.7s
 [0m [37m ⠿ Container milvus-etcd        Starting                                   0.7s
 [0m [34m ⠿ Container milvus-standalone  Created                                    0.3s
 [0m [?25h [1A [1A [1A [1A [1A [0G [?25l[+] Running 2/4
 [34m ⠿ Network milvus               Created                                    0.1s
 [0m [37m ⠿ Container milvus-minio       Starting                                   0.8s
 [0m [37m ⠿ Container milvus-etcd        Starting                                   0.8s
 [0m [34m ⠿ Container milvus-standalone  Created                                    0.3s
 [0m [?25h [1A [1A [1A [1A [1A [0G [?25l[+] Running 2/4
 [34m ⠿ Network milvus               Created                                    0.1s
 [0m [37m ⠿ Container milvus-minio       Starting                                   0.9s
 [0m [37m ⠿ Container milvus-etcd        Starting                                   0.9s
 [0m [34m ⠿ Container milvus-standalone  Created                                    0.3s
 [0m [?25h [1A [1A [1A [1A [1A [0G [?25l[+] Running 2/4
 [34m ⠿ Network milvus               Created                                    0.1s
 [0m [37m ⠿ Container milvus-minio       Starting                                   1.0s
 [0m [37m ⠿ Container milvus-etcd        Starting                                   1.0s
 [0m [34m ⠿ Container milvus-standalone  Created                                    0.3s
 [0m [?25h [1A [1A [1A [1A [1A [0G [?25l[+] Running 2/4
 [34m ⠿ Network milvus               Created                                    0.1s
 [0m [37m ⠿ Container milvus-minio       Starting                                   1.1s
 [0m [37m ⠿ Container milvus-etcd        Starting                                   1.1s
 [0m [34m ⠿ Container milvus-standalone  Created                                    0.3s
 [0m [?25h [1A [1A [1A [1A [1A [0G [?25l[+] Running 2/4
 [34m ⠿ Network milvus               Created                                    0.1s
 [0m [37m ⠿ Container milvus-minio       Starting                                   1.2s
 [0m [37m ⠿ Container milvus-etcd        Starting                                   1.2s
 [0m [34m ⠿ Container milvus-standalone  Created                                    0.3s
 [0m [?25h [1A [1A [1A [1A [1A [0G [?25l[+] Running 2/4
 [34m ⠿ Network milvus               Created                                    0.1s
 [0m [37m ⠿ Container milvus-minio       Starting                                   1.3s
 [0m [37m ⠿ Container milvus-etcd        Starting                                   1.3s
 [0m [34m ⠿ Container milvus-standalone  Created                                    0.3s
 [0m [?25h [1A [1A [1A [1A [1A [0G [?25l[+] Running 2/4
 [34m ⠿ Network milvus               Created                                    0.1s
 [0m [37m ⠿ Container milvus-minio       Starting                                   1.4s
 [0m [37m ⠿ Container milvus-etcd        Starting                                   1.4s
 [0m [34m ⠿ Container milvus-standalone  Created                                    0.3s
 [0m [?25h [1A [1A [1A [1A [1A [0G [?25l[+] Running 2/4
 [34m ⠿ Network milvus               Created                                    0.1s
 [0m [37m ⠿ Container milvus-minio       Starting                                   1.5s
 [0m [37m ⠿ Container milvus-etcd        Starting                                   1.5s
 [0m [34m ⠿ Container milvus-standalone  Created                                    0.3s
 [0m [?25h [1A [1A [1A [1A [1A [0G [?25l[+] Running 2/4
 [34m ⠿ Network milvus               Created                                    0.1s
 [0m [37m ⠿ Container milvus-minio       Starting                                   1.6s
 [0m [37m ⠿ Container milvus-etcd        Starting                                   1.6s
 [0m [34m ⠿ Container milvus-standalone  Created                                    0.3s
 [0m [?25h [1A [1A [1A [1A [1A [0G [?25l[+] Running 3/4
 [34m ⠿ Network milvus               Created                                    0.1s
 [0m [37m ⠿ Container milvus-minio       Starting                                   1.8s
 [0m [34m ⠿ Container milvus-etcd        Started                                    1.7s
 [0m [34m ⠿ Container milvus-standalone  Created                                    0.3s
 [0m [?25h [1A [1A [1A [1A [1A [0G [?25l[+] Running 3/4
 [34m ⠿ Network milvus               Created                                    0.1s
 [0m [34m ⠿ Container milvus-minio       Started                                    1.8s
 [0m [34m ⠿ Container milvus-etcd        Started                                    1.7s
 [0m [37m ⠿ Container milvus-standalone  Starting                                   1.6s
 [0m [?25h [1A [1A [1A [1A [1A [0G [?25l[+] Running 3/4
 [34m ⠿ Network milvus               Created                                    0.1s
 [0m [34m ⠿ Container milvus-minio       Started                                    1.8s
 [0m [34m ⠿ Container milvus-etcd        Started                                    1.7s
 [0m [37m ⠿ Container milvus-standalone  Starting                                   1.7s
 [0m [?25h [1A [1A [1A [1A [1A [0G [?25l[+] Running 3/4
 [34m ⠿ Network milvus               Created                                    0.1s
 [0m [34m ⠿ Container milvus-minio       Started                                    1.8s
 [0m [34m ⠿ Container milvus-etcd        Started                                    1.7s
 [0m [37m ⠿ Container milvus-standalone  Starting                                   1.8s
 [0m [?25h [1A [1A [1A [1A [1A [0G [?25l[+] Running 3/4
 [34m ⠿ Network milvus               Created                                    0.1s
 [0m [34m ⠿ Container milvus-minio       Started                                    1.8s
 [0m [34m ⠿ Container milvus-etcd        Started                                    1.7s
 [0m [37m ⠿ Container milvus-standalone  Starting                                   1.9s
 [0m [?25h [1A [1A [1A [1A [1A [0G [?25l[+] Running 3/4
 [34m ⠿ Network milvus               Created                                    0.1s
 [0m [34m ⠿ Container milvus-minio       Started                                    1.8s
 [0m [34m ⠿ Container milvus-etcd        Started                                    1.7s
 [0m [37m ⠿ Container milvus-standalone  Starting                                   2.0s
 [0m [?25h [1A [1A [1A [1A [1A [0G [?25l[+] Running 3/4
 [34m ⠿ Network milvus               Created                                    0.1s
 [0m [34m ⠿ Container milvus-minio       Started                                    1.8s
 [0m [34m ⠿ Container milvus-etcd        Started                                    1.7s
 [0m [37m ⠿ Container milvus-standalone  Starting                                   2.1s
 [0m [?25h [1A [1A [1A [1A [1A [0G [?25l[+] Running 3/4
 [34m ⠿ Network milvus               Created                                    0.1s
 [0m [34m ⠿ Container milvus-minio       Started                                    1.8s
 [0m [34m ⠿ Container milvus-etcd        Started                                    1.7s
 [0m [37m ⠿ Container milvus-standalone  Starting                                   2.2s
 [0m [?25h [1A [1A [1A [1A [1A [0G [?25l[+] Running 3/4
 [34m ⠿ Network milvus               Created                                    0.1s
 [0m [34m ⠿ Container milvus-minio       Started                                    1.8s
 [0m [34m ⠿ Container milvus-etcd        Started                                    1.7s
 [0m [37m ⠿ Container milvus-standalone  Starting                                   2.3s
 [0m [?25h [1A [1A [1A [1A [1A [0G [?25l[+] Running 3/4
 [34m ⠿ Network milvus               Created                                    0.1s
 [0m [34m ⠿ Container milvus-minio       Started                                    1.8s
 [0m [34m ⠿ Container milvus-etcd        Started                                    1.7s
 [0m [37m ⠿ Container milvus-standalone  Starting                                   2.4s
 [0m [?25h [1A [1A [1A [1A [1A [0G [?25l[+] Running 3/4
 [34m ⠿ Network milvus               Created                                    0.1s
 [0m [34m ⠿ Container milvus-minio       Started                                    1.8s
 [0m [34m ⠿ Container milvus-etcd        Started                                    1.7s
 [0m [37m ⠿ Container milvus-standalone  Starting                                   2.5s
 [0m [?25h [1A [1A [1A [1A [1A [0G [?25l[+] Running 3/4
 [34m ⠿ Network milvus               Created                                    0.1s
 [0m [34m ⠿ Container milvus-minio       Started                                    1.8s
 [0m [34m ⠿ Container milvus-etcd        Started                                    1.7s
 [0m [37m ⠿ Container milvus-standalone  Starting                                   2.6s
 [0m [?25h [1A [1A [1A [1A [1A [0G [?25l[+] Running 4/4 [0m
 [34m ⠿ Network milvus               Created                                    0.1s
 [0m [34m ⠿ Container milvus-minio       Started                                    1.8s
 [0m [34m ⠿ Container milvus-etcd        Started                                    1.7s
 [0m [34m ⠿ Container milvus-standalone  Started                                    2.6s
 [0m [?25h

在 Milvus 运行起来后,我们可以设置全局变量:

  • HOST:Milvus 主机地址
  • PORT:Milvus 端口号
  • COLLECTION_NAME:Milvus 中集合的名称
  • DIMENSION:嵌入的维度
  • OPENAI_ENGINE:要使用的嵌入模型
  • openai.api_key:你的 OpenAI 账户密钥
  • INDEX_PARAM:集合要使用的索引设置
  • QUERY_PARAM:要使用的搜索参数
  • BATCH_SIZE:一次嵌入和插入多少文本
import openai

HOST = 'localhost'
PORT = 19530
COLLECTION_NAME = 'book_search'
DIMENSION = 1536
OPENAI_ENGINE = 'text-embedding-3-small'
openai.api_key = 'sk-your_key'

INDEX_PARAM = {
    'metric_type':'L2',
    'index_type':"HNSW",
    'params':{'M': 8, 'efConstruction': 64}
}

QUERY_PARAM = {
    "metric_type": "L2",
    "params": {"ef": 64},
}

BATCH_SIZE = 1000

Milvus

此部分涉及 Milvus 和为该用例设置数据库。在 Milvus 中,我们需要设置一个集合并索引该集合。

from pymilvus import connections, utility, FieldSchema, Collection, CollectionSchema, DataType

# 连接到 Milvus 数据库
connections.connect(host=HOST, port=PORT)
# 如果集合已存在,则删除它
if utility.has_collection(COLLECTION_NAME):
    utility.drop_collection(COLLECTION_NAME)
# 创建包含 id、title 和 embedding 的集合。
fields = [
    FieldSchema(name='id', dtype=DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema(name='title', dtype=DataType.VARCHAR, max_length=64000),
    FieldSchema(name='description', dtype=DataType.VARCHAR, max_length=64000),
    FieldSchema(name='embedding', dtype=DataType.FLOAT_VECTOR, dim=DIMENSION)
]
schema = CollectionSchema(fields=fields)
collection = Collection(name=COLLECTION_NAME, schema=schema)
# 在集合上创建索引并加载它。
collection.create_index(field_name="embedding", index_params=INDEX_PARAM)
collection.load()

数据集

在 Milvus 启动并运行后,我们可以开始获取数据。Hugging Face Datasets 是一个包含许多用户数据集的中心,在此示例中,我们使用的是 Skelebor 的书籍数据集。此数据集包含一百万多本书的标题-描述对。我们将嵌入每个描述并将其与标题一起存储在 Milvus 中。

import datasets

# 下载数据集并仅使用 `train` 部分(文件大小约为 800Mb)
dataset = datasets.load_dataset('Skelebor/book_titles_and_descriptions_en_clean', split='train')
/Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages/tqdm/auto.py:22: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm
Found cached dataset parquet (/Users/filiphaltmayer/.cache/huggingface/datasets/Skelebor___parquet/Skelebor--book_titles_and_descriptions_en_clean-3596935b1d8a7747/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec)

插入数据

现在我们已将数据加载到本地,可以开始嵌入并将其插入 Milvus。嵌入函数接收文本并以列表格式返回嵌入。

# 将文本转换为嵌入的简单函数
def embed(texts):
    embeddings = openai.Embedding.create(
        input=texts,
        engine=OPENAI_ENGINE
    )
    return [x['embedding'] for x in embeddings['data']]

下一步是实际插入。由于数据点很多,如果您想立即进行测试,可以提前停止插入单元块并继续。这样做可能会降低结果的准确性(因为数据点较少),但应该仍然足够好。

from tqdm import tqdm

data = [
    [], # title
    [], # description
]

# 分批嵌入和插入
for i in tqdm(range(0, len(dataset))):
    data[0].append(dataset[i]['title'])
    data[1].append(dataset[i]['description'])
    if len(data[0]) % BATCH_SIZE == 0:
        data.append(embed(data[1]))
        collection.insert(data)
        data = [[],[]]

# 嵌入并插入剩余数据
if len(data[0]) != 0:
    data.append(embed(data[1]))
    collection.insert(data)
    data = [[],[]]
  0%|          | 1999/1032335 [00:06<57:22, 299.31it/s]



---------------------------------------------------------------------------

KeyboardInterrupt                         Traceback (most recent call last)

Cell In[18], line 13
     11 data[1].append(dataset[i]['description'])
     12 if len(data[0]) % BATCH_SIZE == 0:
---> 13     data.append(embed(data[1]))
     14     collection.insert(data)
     15     data = [[],[]]


Cell In[17], line 3, in embed(texts)
      2 def embed(texts):
----> 3     embeddings = openai.Embedding.create(
      4         input=texts,
      5         engine=OPENAI_ENGINE
      6     )
      7     return [x['embedding'] for x in embeddings['data']]


File ~/miniconda3/envs/haystack/lib/python3.9/site-packages/openai/api_resources/embedding.py:33, in Embedding.create(cls, *args, **kwargs)
     31 while True:
     32     try:
---> 33         response = super().create(*args, **kwargs)
     35         # If a user specifies base64, we'll just return the encoded string.
     36         # This is only for the default case.
     37         if not user_provided_encoding_format:


File ~/miniconda3/envs/haystack/lib/python3.9/site-packages/openai/api_resources/abstract/engine_api_resource.py:153, in EngineAPIResource.create(cls, api_key, api_base, api_type, request_id, api_version, organization, **params)
    127 @classmethod
    128 def create(
    129     cls,
   (...)
    136     **params,
    137 ):
    138     (
    139         deployment_id,
    140         engine,
   (...)
    150         api_key, api_base, api_type, api_version, organization, **params
    151     )
    152     response, _, api_key = requestor.request(
    153         "post",
    154         url,
    155         params=params,
    156         headers=headers,
    157         stream=stream,
    158         request_id=request_id,
    159         request_timeout=request_timeout,
    160     )
    161     if stream:
    162         # must be an iterator
    163         assert not isinstance(response, OpenAIResponse)


File ~/miniconda3/envs/haystack/lib/python3.9/site-packages/openai/api_requestor.py:216, in APIRequestor.request(self, method, url, params, headers, files, stream, request_id, request_timeout)
    205 def request(
    206     self,
    207     method,
   (...)
    214     request_timeout: Optional[Union[float, Tuple[float, float]]] = None,
    215 ) -> Tuple[Union[OpenAIResponse, Iterator[OpenAIResponse]], bool, str]:
--> 216     result = self.request_raw(
    217         method.lower(),
    218         url,
    219         params=params,
    220         supplied_headers=headers,
    221         files=files,
    222         stream=stream,
    223         request_id=request_id,
    224         request_timeout=request_timeout,
    225     )
    226     resp, got_stream = self._interpret_response(result, stream)
    227     return resp, got_stream, self.api_key


File ~/miniconda3/envs/haystack/lib/python3.9/site-packages/openai/api_requestor.py:516, in APIRequestor.request_raw(self, method, url, params, supplied_headers, files, stream, request_id, request_timeout)
    514     _thread_context.session = _make_session()
    515 try:
--> 516     result = _thread_context.session.request(
    517         method,
    518         abs_url,
    519         headers=headers,
    520         data=data,
    521         files=files,
    522         stream=stream,
    523         timeout=request_timeout if request_timeout else TIMEOUT_SECS,
    524     )
    525 except requests.exceptions.Timeout as e:
    526     raise error.Timeout("Request timed out: {}".format(e)) from e


File ~/miniconda3/envs/haystack/lib/python3.9/site-packages/requests/sessions.py:587, in Session.request(self, method, url, params, data, headers, cookies, files, auth, timeout, allow_redirects, proxies, hooks, stream, verify, cert, json)
    582 send_kwargs = {
    583     "timeout": timeout,
    584     "allow_redirects": allow_redirects,
    585 }
    586 send_kwargs.update(settings)
--> 587 resp = self.send(prep, **send_kwargs)
    589 return resp


File ~/miniconda3/envs/haystack/lib/python3.9/site-packages/requests/sessions.py:701, in Session.send(self, request, **kwargs)
    698 start = preferred_clock()
    700 # Send the request
--> 701 r = adapter.send(request, **kwargs)
    703 # Total elapsed time of the request (approximately)
    704 elapsed = preferred_clock() - start


File ~/miniconda3/envs/haystack/lib/python3.9/site-packages/requests/adapters.py:489, in HTTPAdapter.send(self, request, stream, timeout, verify, cert, proxies)
    487 try:
    488     if not chunked:
--> 489         resp = conn.urlopen(
    490             method=request.method,
    491             url=url,
    492             body=request.body,
    493             headers=request.headers,
    494             redirect=False,
    495             assert_same_host=False,
    496             preload_content=False,
    497             decode_content=False,
    498             retries=self.max_retries,
    499             timeout=timeout,
    500         )
    502     # Send the request.
    503     else:
    504         if hasattr(conn, "proxy_pool"):


File ~/miniconda3/envs/haystack/lib/python3.9/site-packages/urllib3/connectionpool.py:703, in HTTPConnectionPool.urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
    700     _prepare_proxy(conn)
    701 # Make the request on the httplib connection object.
--> 702 httplib_response = self._make_request(
    703     conn,
    704     method,
    705     url,
    706     timeout=timeout_obj,
    707     body=body,
    708     headers=headers,
    709     chunked=chunked,
    710 )
    711 # If we're going to release the connection in ``finally:``, then
    712 # the response doesn't need to know about the connection. Otherwise
    713 # it will also try to release it and we'll have a double-release
    714 # mess.
    715 response_conn = conn if not release_conn else None


File ~/miniconda3/envs/haystack/lib/python3.9/site-packages/urllib3/connectionpool.py:449, in HTTPConnectionPool._make_request(self, conn, method, url, timeout, chunked, **httplib_request_kw)
    444             httplib_response = conn.getresponse()
    445         except BaseException as e:
    446             # Remove the TypeError from the exception chain in
    447             # Python 3 (including for exceptions like SystemExit).
    448             # Otherwise it looks like a bug in the code.
--> 449             six.raise_from(e, None)
    450 except (SocketTimeout, BaseSSLError, SocketError) as e:
    451     self._raise_timeout(err=e, url=url, timeout_value=read_timeout)


File <string>:3, in raise_from(value, from_value)


File ~/miniconda3/envs/haystack/lib/python3.9/site-packages/urllib3/connectionpool.py:444, in HTTPConnectionPool._make_request(self, conn, method, url, timeout, chunked, **httplib_request_kw)
    441 except TypeError:
    442     # Python 3
    443     try:
--> 444         httplib_response = conn.getresponse()
    445     except BaseException as e:
    446         # Remove the TypeError from the exception chain in
    447         # Python 3 (including for exceptions like SystemExit).
    448         # Otherwise it looks like a code bug.
    449         six.raise_from(e, None)


File ~/miniconda3/envs/haystack/lib/python3.9/http/client.py:1377, in HTTPConnection.getresponse(self)
   1375 try:
   1376     try:
-> 1377         response.begin()
   1378     except ConnectionError:
   1379         self.close()


File ~/miniconda3/envs/haystack/lib/python3.9/http/client.py:320, in HTTPResponse.begin(self)
    318 # read until we get a non-100 response
    319 while True:
--> 320     version, status, reason = self._read_status()
    321     if status != CONTINUE:
    322         break


File ~/miniconda3/envs/haystack/lib/python3.9/http/client.py:281, in HTTPResponse._read_status(self)
    280 def _read_status(self):
--> 281     line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
    282     if len(line) > _MAXLINE:
    283         raise LineTooLong("status line")


File ~/miniconda3/envs/haystack/lib/python3.9/socket.py:704, in SocketIO.readinto(self, b)
    702 while True:
    703     try:
--> 704         return self._sock.recv_into(b)
    705     except timeout:
    706         self._timeout_occurred = True


File ~/miniconda3/envs/haystack/lib/python3.9/ssl.py:1242, in SSLSocket.recv_into(self, buffer, nbytes, flags)
    1238     if flags != 0:
    1239         raise ValueError(
    1240           "non-zero flags not allowed in calls to recv_into() on %s" %
    1241           self.__class__)
-> 1242     return self.read(nbytes, buffer)
    1243 else:
    1244     return super().recv_into(buffer, nbytes, flags)


File ~/miniconda3/envs/haystack/lib/python3.9/ssl.py:1100, in SSLSocket.read(self, len, buffer)
    1098 try:
    1099     if buffer is not None:
-> 1100         return self._sslobj.read(len, buffer)
    1101     else:
    1102         return self._sslobj.read(len)


KeyboardInterrupt:

查询数据库

在我们的数据安全地插入 Milvus 后,我们现在可以执行查询了。查询接收一个或多个字符串,并搜索它们。结果会打印出您提供的描述以及包含结果分数、结果标题和结果书籍描述的结果。

import textwrap

def query(queries, top_k = 5):
    if type(queries) != list:
        queries = [queries]
    res = collection.search(embed(queries), anns_field='embedding', param=QUERY_PARAM, limit = top_k, output_fields=['title', 'description'])
    for i, hit in enumerate(res):
        print('Description:', queries[i])
        print('Results:')
        for ii, hits in enumerate(hit):
            print('\t' + 'Rank:', ii + 1, 'Score:', hits.score, 'Title:', hits.entity.get('title'))
            print(textwrap.fill(hits.entity.get('description'), 88))
            print()
query('Book about a k-9 from europe')
RPC error: [search], <MilvusException: (code=1, message=code: UnexpectedError, reason: code: CollectionNotExists, reason: can't find collection: book_search)>, <Time:{'RPC start': '2023-03-17 14:22:18.368461', 'RPC error': '2023-03-17 14:22:18.382086'}>



---------------------------------------------------------------------------

MilvusException                           Traceback (most recent call last)

Cell In[32], line 1
----> 1 query('Book about a k-9 from europe')


Cell In[31], line 6, in query(queries, top_k)
      4 if type(queries) != list:
      5     queries = [queries]
----> 6 res = collection.search(embed(queries), anns_field='embedding', param=QUERY_PARAM, limit = top_k, output_fields=['title', 'description'])
      7 for i, hit in enumerate(res):
      8     print('Description:', queries[i])


File ~/miniconda3/envs/haystack/lib/python3.9/site-packages/pymilvus/orm/collection.py:614, in Collection.search(self, data, anns_field, param, limit, expr, partition_names, output_fields, timeout, round_decimal, **kwargs)
    611     raise DataTypeNotMatchException(message=ExceptionsMessage.ExprType % type(expr))
    613 conn = self._get_connection()
--> 614 res = conn.search(self._name, data, anns_field, param, limit, expr,
    615                   partition_names, output_fields, round_decimal, timeout=timeout,
    616                   schema=self._schema_dict, **kwargs)
    617 if kwargs.get("_async", False):
    618     return SearchFuture(res)


File ~/miniconda3/envs/haystack/lib/python3.9/site-packages/pymilvus/decorators.py:109, in error_handler.<locals>.wrapper.<locals>.handler(*args, **kwargs)
    107     record_dict["RPC error"] = str(datetime.datetime.now())
    108     LOGGER.error(f"RPC error: [{inner_name}], {e}, <Time:{record_dict}>")
--> 109     raise e
    110 except grpc.FutureTimeoutError as e:
    111     record_dict["gRPC timeout"] = str(datetime.datetime.now())


File ~/miniconda3/envs/haystack/lib/python3.9/site-packages/pymilvus/decorators.py:105, in error_handler.<locals>.wrapper.<locals>.handler(*args, **kwargs)
    103 try:
    104     record_dict["RPC start"] = str(datetime.datetime.now())
--> 105     return func(*args, **kwargs)
    106 except MilvusException as e:
    107     record_dict["RPC error"] = str(datetime.datetime.now())


File ~/miniconda3/envs/haystack/lib/python3.9/site-packages/pymilvus/decorators.py:136, in tracing_request.<locals>.wrapper.<locals>.handler(self, *args, **kwargs)
    134 if req_id:
    135     self.set_onetime_request_id(req_id)
--> 136 ret = func(self, *args, **kwargs)
    137 return ret


File ~/miniconda3/envs/haystack/lib/python3.9/site-packages/pymilvus/decorators.py:85, in retry_on_rpc_failure.<locals>.wrapper.<locals>.handler(self, *args, **kwargs)
     83         back_off = min(back_off * back_off_multiplier, max_back_off)
     84     else:
---> 85         raise e
     86 except Exception as e:
     87     raise e


File ~/miniconda3/envs/haystack/lib/python3.9/site-packages/pymilvus/decorators.py:50, in retry_on_rpc_failure.<locals>.wrapper.<locals>.handler(self, *args, **kwargs)
     48 while True:
     49     try:
---> 50         return func(self, *args, **kwargs)
     51     except grpc.RpcError as e:
     52         # DEADLINE_EXCEEDED means that the task wat not completed
     53         # UNAVAILABLE means that the service is not reachable currently
     54         # Reference: https://grpc.github.io/grpc/python/grpc.html#grpc-status-code
     55         if e.code() != grpc.StatusCode.DEADLINE_EXCEEDED and e.code() != grpc.StatusCode.UNAVAILABLE:


File ~/miniconda3/envs/haystack/lib/python3.9/site-packages/pymilvus/client/grpc_handler.py:472, in GrpcHandler.search(self, collection_name, data, anns_field, param, limit, expression, partition_names, output_fields, round_decimal, timeout, schema, **kwargs)
    467 requests = Prepare.search_requests_with_expr(collection_name, data, anns_field, param, limit, schema,
    468                                              expression, partition_names, output_fields, round_decimal,
    469                                              **kwargs)
    471 auto_id = schema["auto_id"]
--> 472 return self._execute_search_requests(requests, timeout, round_decimal=round_decimal, auto_id=auto_id, **kwargs)


File ~/miniconda3/envs/haystack/lib/python3.9/site-packages/pymilvus/client/grpc_handler.py:441, in GrpcHandler._execute_search_requests(self, requests, timeout, **kwargs)
    439 if kwargs.get("_async", False):
    440     return SearchFuture(None, None, True, pre_err)
--> 441 raise pre_err


File ~/miniconda3/envs/haystack/lib/python3.9/site-packages/pymilvus/client/grpc_handler.py:432, in GrpcHandler._execute_search_requests(self, requests, timeout, **kwargs)
    429     response = self._stub.Search(request, timeout=timeout)
    431     if response.status.error_code != 0:
--> 432         raise MilvusException(response.status.error_code, response.status.reason)
    434     raws.append(response)
    435 round_decimal = kwargs.get("round_decimal", -1)


MilvusException: <MilvusException: (code=1, message=code: UnexpectedError, reason: code: CollectionNotExists, reason: can't find collection: book_search)>