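Getting Started with Milvus and OpenAI
Finding Your Next Book
In this notebook, we use OpenAI to generate embeddings for book descriptions and then use those embeddings in Milvus to find relevant books. The dataset in this example comes from Hugging Face Datasets and contains more than one million title-description pairs.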
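Let's start by installing the libraries this notebook needs:
- openai: used to communicate with the OpenAI embedding service
- pymilvus: used to communicate with the Milvus server
- datasets: used to download the dataset
- tqdm: used for progress bars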
! pip install openai pymilvus datasets tqdm
Requirement already satisfied: openai (0.27.2), pymilvus (2.2.2), datasets (2.10.1), tqdm (4.64.1), and their dependencies (output abridged; Python 3.9 environment)
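With the required packages installed, we can get started. First, launch the Milvus service using the docker-compose.yaml file located in the same folder as this notebook. The command below starts a Milvus standalone instance, which we will use for this test.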
! docker compose up -d
[+] Running 4/4
 ⠿ Network milvus Created 0.1s
 ⠿ Container milvus-minio Started 1.8s
 ⠿ Container milvus-etcd Started 1.7s
 ⠿ Container milvus-standalone Started 2.6s
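With Milvus up and running, we can set the global variables:
- HOST: the Milvus host address
- PORT: the Milvus port number
- COLLECTION_NAME: the name of the collection in Milvus
- DIMENSION: the dimension of the embeddings
- OPENAI_ENGINE: the embedding model to use
- openai.api_key: your OpenAI account key
- INDEX_PARAM: the index settings to use for the collection
- QUERY_PARAM: the search parameters to use
- BATCH_SIZE: how many texts to embed and insert at once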
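import openai

HOST = 'localhost'
PORT = 19530
COLLECTION_NAME = 'book_search'
DIMENSION = 1536  # output size of the embedding model below
OPENAI_ENGINE = 'text-embedding-3-small'
openai.api_key = 'sk-your_key'  # placeholder: replace with your own key

# HNSW index built with the L2 (Euclidean) metric
INDEX_PARAM = {
    'metric_type': 'L2',
    'index_type': 'HNSW',
    'params': {'M': 8, 'efConstruction': 64}
}

# Search-time parameters; the metric must match the one used by the index
QUERY_PARAM = {
    'metric_type': 'L2',
    'params': {'ef': 64},
}

BATCH_SIZE = 1000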
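Hardcoding the key in a notebook makes it easy to leak when the file is shared. A minimal sketch of one alternative is to read the key from an environment variable. The helper name `read_api_key` and the variable name `OPENAI_API_KEY` are assumptions for illustration, not part of the original notebook.

```python
import os


def read_api_key(env_var="OPENAI_API_KEY"):
    """Return the API key stored in an environment variable, or raise if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before running the notebook")
    return key
```

Assuming the variable was exported in the shell that started Jupyter, the assignment above could then become `openai.api_key = read_api_key()`.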
Milvus
This section covers Milvus and setting up the database for this use case. In Milvus, we need to create a collection and build an index on it.
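The containers may need a few seconds after `docker compose up -d` returns before they accept connections. A small readiness check, sketched here with only the standard library, can make the connection step less flaky. The helper name `wait_for_port` and its defaults are assumptions, and an open TCP port does not guarantee that Milvus has finished initializing.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Return True once a TCP connection to host:port succeeds, False if `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

With the globals defined earlier, a call such as `wait_for_port(HOST, PORT)` could be placed just before connecting.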
from pymilvus import connections, utility, FieldSchema, Collection, CollectionSchema, DataType

# Connect to the Milvus database
connections.connect(host=HOST, port=PORT)

# Drop the collection if it already exists
if utility.has_collection(COLLECTION_NAME):
    utility.drop_collection(COLLECTION_NAME)

# Create a collection with id, title, description, and embedding fields
fields = [
    FieldSchema(name='id', dtype=DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema(name='title', dtype=DataType.VARCHAR, max_length=64000),
    FieldSchema(name='description', dtype=DataType.VARCHAR, max_length=64000),
    FieldSchema(name='embedding', dtype=DataType.FLOAT_VECTOR, dim=DIMENSION)
]
schema = CollectionSchema(fields=fields)
collection = Collection(name=COLLECTION_NAME, schema=schema)

# Create an index on the embedding field and load the collection
collection.create_index(field_name="embedding", index_params=INDEX_PARAM)
collection.load()
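The index uses the L2 metric, so a smaller score means a closer match. As a rough illustration of what the index approximates, here is a brute-force nearest-neighbor sketch in plain Python. The function names and toy vectors are made up for this example. HNSW is an approximate index, so its results can differ from an exact search, and the score values Milvus reports may be scaled differently from the distances computed here (the ranking by closeness is the same idea).

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_search(query, vectors, top_k=5):
    """Return (index, distance) pairs for the top_k vectors closest to `query`."""
    scored = [(i, l2_distance(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:top_k]


# Toy 2-dimensional vectors standing in for 1536-dimensional embeddings
toy_vectors = [[0.0, 0.0], [1.0, 1.0], [3.0, 4.0]]
print(brute_force_search([0.9, 1.2], toy_vectors, top_k=2))
```

An exact scan like this compares the query against every stored vector, which is why an index is used once the collection holds many vectors.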
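Dataset
With Milvus up and running, we can start fetching the data. Hugging Face Datasets is a hub that hosts many user-contributed datasets; in this example we use Skelebor's book dataset. It contains title-description pairs for more than one million books. We will embed each description and store it in Milvus together with its title.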
import datasets
# Download the dataset and use only the `train` split (the file is about 800 MB)
dataset = datasets.load_dataset('Skelebor/book_titles_and_descriptions_en_clean', split='train')
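Insert the Data
Now that the data is available locally, we can embed it and insert it into Milvus. The embedding function takes a list of texts and returns their embeddings as a list of vectors.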
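# Simple function that converts texts to embeddings.
# It uses the pre-1.0 openai SDK interface (openai.Embedding), which was removed in openai 1.0.
def embed(texts):
    embeddings = openai.Embedding.create(
        input=texts,
        engine=OPENAI_ENGINE
    )
    return [x['embedding'] for x in embeddings['data']]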
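Embedding a large dataset means many API calls, and some of them may fail transiently (for example because of rate limits or network hiccups). One common way to handle this is to retry with exponential backoff. The wrapper below is a generic sketch, not part of the original notebook. The name `with_retries` and its parameters are assumptions, and in practice `retry_on` would be narrowed to the specific exception types you consider transient.

```python
import random
import time


def with_retries(func, max_attempts=5, base_delay=1.0, retry_on=(Exception,), sleep=time.sleep):
    """Wrap `func` so failed calls are retried with exponential backoff plus jitter."""
    def wrapper(*args, **kwargs):
        for attempt in range(1, max_attempts + 1):
            try:
                return func(*args, **kwargs)
            except retry_on:
                if attempt == max_attempts:
                    raise  # out of attempts: surface the last error
                # Delay doubles each attempt (base, 2*base, 4*base, ...) plus up to 10% jitter
                delay = base_delay * 2 ** (attempt - 1)
                sleep(delay + random.uniform(0, 0.1 * delay))
    return wrapper
```

A hypothetical usage would be `embed_with_retries = with_retries(embed)`, then calling `embed_with_retries` wherever `embed` is called.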
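The next step is the actual insertion. Because there are so many data points, you can stop the insertion cell early and move on if you want to test right away. Doing so will likely reduce the quality of the results, since fewer data points are indexed, but they should still be good enough.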
from tqdm import tqdm

data = [
    [],  # title
    [],  # description
]

# Embed and insert in batches
for i in tqdm(range(0, len(dataset))):
    data[0].append(dataset[i]['title'])
    data[1].append(dataset[i]['description'])
    if len(data[0]) % BATCH_SIZE == 0:
        data.append(embed(data[1]))
        collection.insert(data)
        data = [[], []]

# Embed and insert whatever is left over
if len(data[0]) != 0:
    data.append(embed(data[1]))
    collection.insert(data)
    data = [[], []]
0%| | 1999/1032335 [00:06<57:22, 299.31it/s]
KeyboardInterrupt (the cell was interrupted manually; traceback omitted)
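The batching pattern in the insertion loop (fill a buffer, flush it when it reaches the batch size, then flush the remainder once at the end) can be isolated into a small generator. This is a sketch for illustration; the name `batched` is an assumption and the original notebook does not use such a helper.

```python
def batched(items, batch_size):
    """Yield consecutive lists of up to `batch_size` items; the last one may be shorter."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the remainder, mirroring the final insert after the loop
        yield batch


print([len(b) for b in batched(range(2500), 1000)])  # [1000, 1000, 500]
```

Keeping the remainder handling in one place avoids repeating the embed-and-insert code after the loop.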
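Query the Database
With our data safely inserted into Milvus, we can now run a query. The query function takes one or more strings and searches for them. For each query it prints the description you provided, followed by the results: each result's score, title, and book description.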
import textwrap

def query(queries, top_k=5):
    # Accept a single string as well as a list of strings
    if not isinstance(queries, list):
        queries = [queries]
    res = collection.search(embed(queries), anns_field='embedding', param=QUERY_PARAM, limit=top_k, output_fields=['title', 'description'])
    # One list of hits per query, in the same order as `queries`
    for i, hits in enumerate(res):
        print('Description:', queries[i])
        print('Results:')
        for rank, hit in enumerate(hits, start=1):
            print('\t' + 'Rank:', rank, 'Score:', hit.score, 'Title:', hit.entity.get('title'))
            print(textwrap.fill(hit.entity.get('description'), 88))
            print()

query('Book about a k-9 from europe')
MilvusException: <MilvusException: (code=1, message=code: UnexpectedError, reason: code: CollectionNotExists, reason: can't find collection: book_search)>
(Traceback omitted. In the recorded run, the book_search collection did not exist on the server when the query cell ran, so the search failed. Re-run the collection setup and insertion cells before querying.)
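The result formatting can be tried without a running Milvus server by separating it from the search call. The sketch below builds the same kind of text the query function prints for one hit. The helper name `format_hit` and the sample values are made up for illustration, and treating a missing description as an empty string is an added assumption.

```python
import textwrap


def format_hit(rank, score, title, description, width=88):
    """Build the text for one search result: a header line, then the wrapped description."""
    header = f"\tRank: {rank} Score: {score} Title: {title}"
    # Wrap long descriptions to `width` columns; a missing description becomes an empty body
    body = textwrap.fill(description or "", width)
    return header + "\n" + body


print(format_hit(1, 0.25, "Example Title", "A short, made-up description used only for illustration."))
```

Inside the query loop, such a helper could be called with `hit.score`, `hit.entity.get('title')`, and `hit.entity.get('description')`.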