Very slow reindexing #28452

ducanh997 · 2023-11-15T08:17:31Z

Milvus Version: v2.2.14 Standalone

I've added 8 million float vectors to Milvus, each with 768 dimensions. To add them, I split the data into chunks of 10,000 records and called collection.insert() on each chunk using PyMilvus. After the insertion was complete, I created an IVF_SQ8 index with nlist = 2048. While monitoring the indexing process, I observed:

{'total_rows': 8313118, 'indexed_rows': 8313118, 'pending_index_rows': 5063118 }

Since indexed_rows equals total_rows, I assumed I could start searching on Milvus. However, pending_index_rows decreased very slowly, and CPU usage stayed at 100% for many hours. After reading this GitHub issue, I suspect that Milvus might be reindexing for optimization. Is there any way to disable this feature, or to schedule when Milvus reindexes?

I would appreciate any insights or suggestions to enhance performance in this context.

xiaofan-luan · 2023-11-15T09:25:09Z

Maintainer

How many CPU cores are there in your cluster?
IVF_SQ8 shouldn't be that slow. Building an index on 1M vectors usually takes less than a minute.

xiaofan-luan · 2023-11-15T09:26:47Z

Maintainer

But if you flush frequently, it may trigger repeated rounds of compaction and index building.
DON'T call flush manually if you don't know what flush does. Even without a manual flush, the data is still persisted and searchable.
In most cases you don't need to flush at all.

ducanh997 Nov 15, 2023
Author

Thanks @xiaofan-luan. I don't call flush manually; I just insert the data chunk by chunk, as below:

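from pymilvus import Collection

collection = Collection("wiki_corpus_emb", schema)  # schema is defined elsewhere
CHUNK_SIZE = 10000
with open(file_name, 'rb') as f:
    vectors, ids = ...  # loading code elided in the original post
    # Insert 10,000 rows per call; flush is never called manually
    for i in range(0, len(ids), CHUNK_SIZE):
        chunk_vectors = vectors[i:i + CHUNK_SIZE].tolist()
        chunk_ids = ids[i:i + CHUNK_SIZE]

        # Column-based insert: one list per field, in schema order
        entities = [chunk_ids, chunk_vectors]
        collection.insert(entities)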

I'm now using Milvus on my personal desktop, which has an Intel(R) Core(TM) i7-10700K CPU with 8 cores/16 threads.

xiaofan-luan · 2023-11-16T01:47:34Z

Maintainer

If that's the code you are using, the use case looks fine. Could you provide detailed logs so we can investigate the cause?

To get the logs, you will need to run the export-log scripts.

ducanh997 Nov 16, 2023
Author

I also tried exporting the data to NumPy format, uploading the files to MinIO, inserting them with do_bulk_insert, and then creating an IVF_SQ8 index. But the index build still takes hours.
Here are the logs I exported from Milvus and Etcd.

milvus.log
etcd.log
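The NumPy export step described above can be sketched as follows. This is a minimal sketch, assuming Milvus 2.2's column-based bulk-insert layout of one `.npy` file per field, with each file named after its field; the helper `export_for_bulk_insert` and the field names `id` and `embedding` are hypothetical. The upload to MinIO and the `utility.do_bulk_insert(collection_name=..., files=[...])` call are not exercised here.

```python
import os

import numpy as np


def export_for_bulk_insert(ids, vectors, out_dir, id_field="id", vector_field="embedding"):
    """Write one .npy file per field (hypothetical field names) for a NumPy bulk insert."""
    ids_arr = np.asarray(ids, dtype=np.int64)        # INT64 primary key column
    vec_arr = np.asarray(vectors, dtype=np.float32)  # FLOAT_VECTOR data as float32
    if vec_arr.ndim != 2 or vec_arr.shape[0] != ids_arr.shape[0]:
        raise ValueError("expected a 2-D vector array with one row per id")
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for field, arr in ((id_field, ids_arr), (vector_field, vec_arr)):
        # Assumption: the file name must match the field name in the schema
        path = os.path.join(out_dir, f"{field}.npy")
        np.save(path, arr)
        paths.append(path)
    return paths


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        paths = export_for_bulk_insert(range(4), np.random.rand(4, 768), tmp)
        print([os.path.basename(p) for p in paths])  # ['id.npy', 'embedding.npy']
```

Casting to float32 up front keeps the files at 4 bytes per vector component, which matches what a FLOAT_VECTOR field stores.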

xiaofan-luan · 2023-11-16T06:00:15Z

Maintainer

This seems to be a standalone deployment.
From the logs, it looks like the index is still being built.

Do you still have enough memory to build the index?
Have you tried restarting the standalone instance to see whether it catches up?

xiaofan-luan Nov 16, 2023
Maintainer

From the current log, everything looks normal on the Go side.
If it is stuck somewhere inside the index build, a restart should help.

xiaofan-luan · 2023-11-16T06:03:40Z

Maintainer

[2023/11/16 04:45:37.808 +00:00] [INFO] [indexnode/indexnode_service.go:209] ["Get Index Job Stats"] [traceID=714b518fc69f9d63] [Unissued=0] [Active=1] [Slot=0]
[2023/11/16 04:45:37.808 +00:00] [INFO] [indexcoord/index_builder.go:178] ["there is no IndexNode available or etcd is not serviceable, wait a minute..."]

czs007 Nov 16, 2023
Maintainer

"there is no IndexNode available or etcd is not serviceable, wait a minute..."

This log message can be somewhat misleading. The intended meaning here is that there are no idle index nodes available, and it is highly likely unrelated to etcd.

czs007 · 2023-11-16T06:47:54Z

Maintainer

@ducanh997 Hi.
You mentioned that the decrease in "pending_index_rows" is slow. Could you provide a rough description of the speed at which it is decreasing? Has this number remained unchanged for several hours?

Based on your configuration, a rough estimate suggests that the "pending_index_rows" value should decrease at a rate of approximately 300,000 per minute.

I have noticed that the collection you mentioned earlier, which had 8,313,118 rows, now shows that the "pending_index_rows" value has reached 0.

[2023/11/16 04:44:31.511 +00:00] [INFO] [indexcoord/index_coord.go:952] ["IndexCoord DescribeIndex success"] [collectionID=445648711268376834] [indexID=445648711277193316] ["total rows"=8313118] ["indexed rows"=8313118] ["pending index rows"=0] ["index state"=Finished] ["check segment num"=52]

[2023/11/16 04:45:11.264 +00:00] [INFO] [indexcoord/index_coord.go:633] ["IndexCoord completeIndexInfo success"] [collID=445648711268376834] [totalRows=8313118] [indexRows=8313118] [state=Finished] [failReason=]

It seems that there are other collections still undergoing index construction.

[2023/11/16 04:44:31.789 +00:00] [INFO] [indexcoord/index_coord.go:952] ["IndexCoord DescribeIndex success"] [collectionID=445650142279534041] [indexID=445672886269949381] ["total rows"=2327471] ["indexed rows"=842270] ["pending index rows"=2327471] ["index state"=InProgress] ["check segment num"=14]
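The `DescribeIndex` lines quoted above can be pulled out of an exported `milvus.log` to track pending rows per collection over time. A minimal sketch, assuming every such line uses the bracketed `[key=value]` / `["quoted key"=value]` layout shown here; `parse_fields` and `describe_index_progress` are hypothetical helpers, not part of Milvus.

```python
import re
from typing import Dict, Iterable, Iterator

# Matches [key=value] and ["quoted key"=value]; bracketed fields without "=" are skipped.
FIELD_RE = re.compile(r'\["?([^"\]=]+)"?=([^\]]*)\]')


def parse_fields(line: str) -> Dict[str, str]:
    """Return all key=value fields found in one log line."""
    return {key: value for key, value in FIELD_RE.findall(line)}


def describe_index_progress(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield progress fields from 'IndexCoord DescribeIndex success' lines only."""
    for line in lines:
        if "IndexCoord DescribeIndex success" not in line:
            continue
        fields = parse_fields(line)
        yield {
            "collectionID": fields.get("collectionID", ""),
            "total rows": fields.get("total rows", ""),
            "indexed rows": fields.get("indexed rows", ""),
            "pending index rows": fields.get("pending index rows", ""),
            "index state": fields.get("index state", ""),
        }


# Usage (hypothetical file name):
# with open("milvus.log") as f:
#     for entry in describe_index_progress(f):
#         print(entry)
```

Keeping the values as strings avoids guessing types for fields such as `failReason`, which can be empty.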

ducanh997 Nov 16, 2023
Author

@czs007 Hi
Has this number remained unchanged for several hours?
=> No, this number changed, but at a very slow speed. Yesterday, when I started indexing 8,313,118 rows, it took about 10 hours to reach 0. During the index build process, the CPU usage was 100%.

It seems that there are other collections still undergoing index construction.
=> Yes, today I tried another way to insert 2,327,471 rows and then created an index on them. But it also took about 3 or 4 hours to complete. I ran docker-compose down and up several times, recreating the containers, but that did not solve the problem.

I really appreciate your help.
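The two rates in this exchange can be compared directly: czs007's rough estimate of about 300,000 rows per minute versus the 8,313,118 rows that took about 10 hours here. A minimal sketch, assuming the drain rate is linear between readings; `drain_rate` and `minutes_remaining` are hypothetical helpers, not PyMilvus functions.

```python
def drain_rate(pending_before: int, pending_after: int, elapsed_minutes: float) -> float:
    """Rows indexed per minute between two readings of pending_index_rows."""
    if elapsed_minutes <= 0:
        raise ValueError("elapsed_minutes must be positive")
    return (pending_before - pending_after) / elapsed_minutes


def minutes_remaining(pending: int, rate_per_minute: float) -> float:
    """Linear ETA; assumes the current rate holds until pending reaches 0."""
    if rate_per_minute <= 0:
        return float("inf")
    return pending / rate_per_minute


# Figures from this thread: 8,313,118 rows drained in about 10 hours (600 minutes).
observed = drain_rate(8_313_118, 0, 600)
print(round(observed))                                  # 13855
print(round(minutes_remaining(8_313_118, 300_000), 1))  # 27.7
```

At the observed pace the drain was over 20 times slower than the estimate, which is the gap the maintainers are trying to explain.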

lukezx3 Jul 19, 2024

Hi, how did you fix this? I appear to have the same very slow load-collection/reindex speed on Milvus 2.4.4, Ubuntu 22.04.

xiaofan-luan Jul 21, 2024
Maintainer

Load and reindex are two different processes, I would say. Please open a separate issue for that and provide detailed logs.

You need to check:

The CPU usage of your cluster. Ideally, profile it with pprof to track where the CPU time goes.
Which kind of CPU you use. We recommend Intel x86 or AWS ARM CPUs.
If you don't have enough resources, don't change the default segment size.
If you don't have enough resources, try an IVF_FLAT index rather than a graph index.

You also have to understand what counts as slow. On an 8-core machine, building a 1 GB segment might take more than half an hour (IVF could be much faster), and that is expected.

xiaofan-luan Jul 21, 2024
Maintainer

Index building is slow. On an 8-core, 32 GB machine, building a 512 MB segment may take 2-3 minutes, and the index node will always try to use all available CPU. 10M vectors of 1536 dimensions is roughly 50-60 GB of raw data, so taking a couple of hours is expected.
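The raw-data figure is rows × dimensions × 4 bytes per float32 component. The same arithmetic applied to this thread's 8,313,118 × 768 collection, with 512 MB segments at 2-3 minutes each, gives a rough build time. A back-of-envelope sketch: it assumes float32 vectors, ignores primary keys and scalar fields, and assumes full segments built one after another; the earlier log reports 52 segments for that collection, so real segments are not all full.

```python
import math

BYTES_PER_FLOAT32 = 4
GIB = 1024 ** 3
SEGMENT_BYTES = 512 * 1024 ** 2  # 512 MB segment size quoted in the thread


def raw_vector_bytes(rows: int, dim: int) -> int:
    """Raw size of float32 vectors only (primary keys and scalar fields ignored)."""
    return rows * dim * BYTES_PER_FLOAT32


def build_minutes(rows: int, dim: int, minutes_per_segment: float) -> float:
    """Serial build time, assuming full segments built one after another."""
    segments = math.ceil(raw_vector_bytes(rows, dim) / SEGMENT_BYTES)
    return segments * minutes_per_segment


print(round(raw_vector_bytes(10_000_000, 1536) / GIB, 1))  # 57.2
print(round(raw_vector_bytes(8_313_118, 768) / GIB, 1))    # 23.8
print(build_minutes(8_313_118, 768, 2), build_minutes(8_313_118, 768, 3))  # 96 144
```

Under these assumptions the 8.3M-row collection would need roughly 1.5-2.5 hours on such a machine, in the same "couple of hours" range as the maintainer's estimate.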

To solve this problem:

Use more resources (scale up).
Use distributed Milvus: scale up the index nodes, and scale them back down once the index build is done.
Use Zilliz Cloud (https://zilliz.com/). We have a large pool for index building, and it is always dynamically scaled.

KerolosAtef · 2024-11-02T16:10:53Z

Hello @ducanh997, could you share the code you use to monitor the indexing process in Milvus?
I tried the code below, but it shows:

{'total_rows': 0, 'indexed_rows': 0, 'pending_index_rows': 0, 'state': 'Finished'}

from pymilvus import (
    connections, 
    Collection, 
    CollectionSchema, 
    FieldSchema, 
    DataType, 
    utility,db,MilvusClient
)
import random
db_name="test_dv"
collection_name="dev_collection"
connections.connect(host="127.0.0.1", port=19530,alias="test")
if db_name not in db.list_database("test"):
    db.create_database(db_name,using="test")
milvus_vb = MilvusClient(uri='http://localhost:19530',db_name=db_name)
if milvus_vb.has_collection(collection_name):
    milvus_vb.drop_collection(collection_name)
    print("dropped")
milvus_vb.create_collection(
    collection_name=collection_name,
    schema=CollectionSchema([
        FieldSchema("id", DataType.INT64, is_primary=True),
        FieldSchema("vector", DataType.FLOAT_VECTOR, dim=128)
    ])
)
index_params = milvus_vb.prepare_index_params()
index_params.add_index(
    field_name="vector",
    index_type="IVF_FLAT",
    metric_type="IP",
    params={"nlist": 128}
)
milvus_vb.create_index(
    collection_name=collection_name,
    index_params=index_params, 
)
print("Loading...")
milvus_vb.load_collection(
            collection_name=collection_name,
            replica_number=1 # Number of replicas to create on query nodes. Max value is 1 for Milvus Standalone, and no greater than `queryNode.replicas` for Milvus Cluster.
        )
print("Loaded")
conn=milvus_vb._get_connection()
conn.get_index_build_progress(collection_name=collection_name, index_name="vector")
dummy_data =[{"id": i, "vector": [random.random() for _ in range(128)]} for i in range(10000)]
batch=1000
for i in range(0, len(dummy_data), batch):
    print("Inserting batch", i)
    milvus_vb.insert(collection_name=collection_name, data=dummy_data[i:i+batch])
    res=conn.get_index_build_progress(collection_name=collection_name, index_name="vector")
    print(res)
    
# Query
query_vector = [random.random() for _ in range(128)]
res=conn.get_index_build_progress(collection_name=collection_name, index_name="vector")
print(res)
results=milvus_vb.search(collection_name=collection_name, data=[query_vector], limit=5, output_fields=["id"])
res=conn.get_index_build_progress(collection_name=collection_name, index_name="vector")
print(res)
print(results)

yhmo Nov 3, 2024
Collaborator

Something is wrong with the index node of your Milvus. Could you provide the full log of the Milvus server?

KerolosAtef Nov 3, 2024

Hi @yhmo
I'm new to this field. Could you please tell me where I can find the logs so I can share them with you?

xiaofan-luan Nov 3, 2024
Maintainer

When you created the index, there were 0 entities in the collection, so the index build was already done.

That's why you get
{'total_rows': 0, 'indexed_rows': 0, 'pending_index_rows': 0, 'state': 'Finished'}

After you insert data, if you call flush, the new data will be indexed automatically. Usually you don't need to worry about this, since it all happens automatically.

If you do want to make sure all the data is indexed, you can run the index build a second time after inserting all the data, and your stats will be refreshed to something like
{'total_rows': 100000, 'indexed_rows': 100, 'pending_index_rows': 99900, 'state': 'Pending'}

yhmo Nov 4, 2024
Collaborator

Actually, my script calls collection.flush() before get_index_progress().
The "total_rows": 10000 already indicates that Milvus has accepted the data.
The output is:

dropped
Loading...
Loaded
Inserting batch 0
Inserting batch 1000
Inserting batch 2000
Inserting batch 3000
Inserting batch 4000
Inserting batch 5000
Inserting batch 6000
Inserting batch 7000
Inserting batch 8000
Inserting batch 9000
......
{'total_rows': 10000, 'indexed_rows': 0, 'pending_index_rows': 10000, 'state': 'Finished'}
{'total_rows': 10000, 'indexed_rows': 0, 'pending_index_rows': 10000, 'state': 'Finished'}
{'total_rows': 10000, 'indexed_rows': 0, 'pending_index_rows': 10000, 'state': 'Finished'}
{'total_rows': 10000, 'indexed_rows': 0, 'pending_index_rows': 10000, 'state': 'Finished'}
{'total_rows': 10000, 'indexed_rows': 0, 'pending_index_rows': 10000, 'state': 'Finished'}
{'total_rows': 10000, 'indexed_rows': 0, 'pending_index_rows': 10000, 'state': 'Finished'}
{'total_rows': 10000, 'indexed_rows': 0, 'pending_index_rows': 10000, 'state': 'Finished'}
{'total_rows': 10000, 'indexed_rows': 0, 'pending_index_rows': 10000, 'state': 'Finished'}
{'total_rows': 10000, 'indexed_rows': 0, 'pending_index_rows': 10000, 'state': 'Finished'}
{'total_rows': 10000, 'indexed_rows': 0, 'pending_index_rows': 10000, 'state': 'Finished'}
{'total_rows': 10000, 'indexed_rows': 0, 'pending_index_rows': 10000, 'state': 'Finished'}
{'total_rows': 10000, 'indexed_rows': 0, 'pending_index_rows': 10000, 'state': 'Finished'}
{'total_rows': 10000, 'indexed_rows': 0, 'pending_index_rows': 10000, 'state': 'Finished'}
{'total_rows': 10000, 'indexed_rows': 10000, 'pending_index_rows': 0, 'state': 'Finished'}
index done
{'total_rows': 10000, 'indexed_rows': 10000, 'pending_index_rows': 0, 'state': 'Finished'}
{'total_rows': 10000, 'indexed_rows': 10000, 'pending_index_rows': 0, 'state': 'Finished'}
data: ["[{'id': 7642, 'distance': 39.431365966796875, 'entity': {'id': 7642}}, {'id': 5153, 'distance': 39.3652229309082, 'entity': {'id': 5153}}, {'id': 9412, 'distance': 39.198822021484375, 'entity': {'id': 9412}}, {'id': 1331, 'distance': 39.10251998901367, 'entity': {'id': 1331}}, {'id': 1467, 'distance': 38.77155303955078, 'entity': {'id': 1467}}]"] 

Process finished with exit code 0
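In the output above, `state` reads `Finished` even while `pending_index_rows` is still 10000, and a freshly created index on an empty collection also reports `Finished` with `total_rows` 0. A polling loop is therefore safer if it waits on the row counts. A minimal sketch, assuming the progress dict shape shown in this thread; `wait_until_indexed` is a hypothetical helper, and `fetch_progress` is any callable that returns such a dict.

```python
import time
from typing import Callable, Dict


def wait_until_indexed(fetch_progress: Callable[[], Dict[str, object]],
                       poll_seconds: float = 5.0,
                       timeout_seconds: float = 3600.0) -> Dict[str, object]:
    """Poll until every row is indexed, judging by row counts rather than 'state'."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        progress = fetch_progress()
        total = int(progress.get("total_rows", 0))
        pending = int(progress.get("pending_index_rows", 0))
        # total == 0 means no data has been flushed yet, so keep waiting
        if total > 0 and pending == 0:
            return progress
        if time.monotonic() >= deadline:
            raise TimeoutError(f"index still incomplete: {progress}")
        time.sleep(poll_seconds)
```

Against a live server, `fetch_progress` could wrap something like `utility.index_building_progress(collection_name)` from the PyMilvus ORM API after connecting to the right database; that wiring is an assumption and is not exercised here. A collection that stays empty will hit the timeout rather than return.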

yhmo Nov 4, 2024
Collaborator

@KerolosAtef
If your Milvus is a standalone deployment, use docker commands to output its log.
First, list the containers to find the ID of the Milvus server container:
docker ps
Then use this command to write the Milvus log to a file:
docker logs [milvus container id] > 1.log 2>&1

If your Milvus is deployed on Kubernetes, you can follow this doc to export the log: https://github.com/milvus-io/milvus/tree/master/deployments/export-log
