Right now we convert from numpy types and back throughout the code. Besides being messy, this also creates additional memory overhead which can significantly impact performance when ingesting data, because it may cause a user's system to start hard swapping.
When adding 16,384 embeddings, the memory used at the end of Collection.add() is 594 MiB. When adding 32,768 embeddings, it is 996 MiB. Thus, the additional 16,384 embeddings cost (996 − 594) MiB ≈ 402 MiB, i.e. about 25 KiB per 384-dimension embedding during inserts. The minimum byte size of a 384-dimension f32 embedding is 4 bytes × 384 = 1,536 bytes ≈ 1.5 KiB, so about 16x more memory is used than the theoretical minimum.
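As a rough illustration of where per-element overhead like this comes from (a minimal sketch, independent of Chroma's actual code), converting a single 384-dimension float32 embedding to a list of Python floats inflates it roughly eightfold before any further copies are made:

```python
import sys

import numpy as np

emb = np.random.rand(384).astype(np.float32)
print(emb.nbytes)  # 1536 bytes: 4 bytes x 384, the theoretical minimum

as_list = emb.tolist()
# Each element is now a standalone Python float object (~24 bytes on 64-bit
# CPython) plus an 8-byte pointer slot in the list.
print(sys.getsizeof(as_list) + sum(sys.getsizeof(x) for x in as_list))  # ~12 KiB
```

Every additional live copy of such lists (for validation, batching, serialization, and so on) multiplies the overhead further, which would be consistent with the ~16x figure measured above.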
This also might be a good opportunity to improve HTTP performance by using a serialization format other than JSON, which we know to be quite inefficient for large arrays of floating point numbers.
If it's possible to use a compact serialization that can be converted efficiently to Numpy arrays, it would improve performance quite a bit across the board.
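For a sense of the gap (a sketch, not a proposal for any particular wire format), compare JSON text against raw float32 bytes that deserialize straight into a numpy array:

```python
import json

import numpy as np

emb = np.random.rand(384).astype(np.float32)

# JSON renders every float as decimal text, typically ~20 characters each.
json_payload = json.dumps(emb.tolist()).encode()
print(len(json_payload))  # roughly 7-8 KiB for a single 384-dim embedding

# Raw native-endian float32 bytes are minimum-size and round-trip into
# numpy without creating any per-element Python objects.
raw = emb.tobytes()
print(len(raw))  # exactly 4 * 384 = 1536 bytes
assert np.array_equal(np.frombuffer(raw, dtype=np.float32), emb)
```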
This is a good point, I think we will punt it out of this milestone, but we should create an issue for it. @levand could you please do that and link it here?
Numpy Everywhere
Right now we convert from numpy types and back throughout the code. Besides being messy, this also creates additional memory overhead which can significantly impact performance when ingesting data, because it may cause a user's system to start hard swapping.
From @codetheweb's profiling:
> When adding 16,384 embeddings, the memory used at the end of `Collection.add()` is 594 MiB. When adding 32,768 embeddings, the memory used at the end of `Collection.add()` is 996 MiB. Thus, an additional 25 KiB is used per 384-dimension embedding during inserts. The minimum byte size of a 384-dimension f32 embedding is 4 bytes × 384 = 1,536 bytes ≈ 1.5 KiB, so there's about 16x more memory used than the theoretical minimum.

Profiling (https://pypi.org/project/memory-profiler/) traced the overhead to two call sites:
- `chroma/chromadb/api/models/CollectionCommon.py`, line 559 (at commit `f66b47d`)
- `chroma/chromadb/api/segment.py`, line 358 (at commit `f66b47d`)
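For readers without the code in front of them, the pattern at issue looks roughly like this (a hypothetical illustration, not the actual Chroma code at those lines):

```python
import numpy as np

batch = np.random.rand(1_000, 384).astype(np.float32)

as_lists = batch.tolist()  # numpy -> 384,000 individual Python float objects
back = np.array(as_lists, dtype=np.float32)  # and back again before indexing

assert np.array_equal(batch, back)  # lossless, but with large transient overhead
```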
We stand to gain a lot and lose little, if anything, by sticking to a numpy representation of embeddings throughout.
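As a hedged sketch of what that could look like (a hypothetical helper, not Chroma's actual API), validation can operate on the array directly instead of round-tripping through Python lists:

```python
import numpy as np

def validate_embeddings(embeddings: np.ndarray) -> np.ndarray:
    """Hypothetical numpy-native validator: checks shape, dtype, and
    finiteness without materializing per-element Python objects."""
    arr = np.ascontiguousarray(embeddings, dtype=np.float32)
    if arr.ndim != 2:
        raise ValueError("expected a 2-D array of shape (n_embeddings, dim)")
    if not np.isfinite(arr).all():
        raise ValueError("embeddings contain NaN or Inf")
    return arr
```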