Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Enhancement]: Remove vector data when the corresponding cache data expires in cache storage #632

Open
Laarryliu opened this issue Jul 11, 2024 · 8 comments

Comments

@Laarryliu
Copy link

What would you like to be added?

In adapter.py line 108, when trying to search for data in cache store according to vector data:

for search_data in search_data_list:
            cache_data = time_cal(
                chat_cache.data_manager.get_scalar_data,
                func_name="get_data",
                report_func=chat_cache.report.data,
            )(
                search_data,
                extra_param=context.get("get_scalar_data", None),
                session=session,
            )
            if cache_data is None:
                continue

here, if cache_data is None means no data in cache store could be found with data from vector store.
Is it better to remove the corresponding data from vector store to avoid hit it again?

Why is this needed?

Since the data in cache store is no longer available (maybe due to a cache ttl strategy, etc.), removing vector data is necessary so that the expired cache data will no longer be accessed.

Anything else?

Please correct me if I missed some details

@SimFG
Copy link
Collaborator

SimFG commented Jul 11, 2024

I don't recommend cleaning data here, because when data becomes invalid using the evict strategy, all data will be deleted, including scalars and vectors.
detail: https://github.com/zilliztech/GPTCache/blob/main/gptcache/manager/eviction_manager.py

@Laarryliu
Copy link
Author

I don't recommend cleaning data here, because when data becomes invalid using the evict strategy, all data will be deleted, including scalars and vectors. detail: https://github.com/zilliztech/GPTCache/blob/main/gptcache/manager/eviction_manager.py

Thank you for the suggestion. After using the Redis evict strategy, I have another question: seems like there is a "deleted" field in eviction base, which eviction_manager.soft_evict() method will mark as -1, then eviction_manager.delete() method will permanently remove soft-evicted data from both scalar&vector store.
However, looks like only Memory evict storage will try to do the soft_evict operation. When Redis evict strategy is used, is there a way to mark deleted? Seems that using TTL in Redis results in permanent deletion and the corresponding vector data will never be cleaned up.

@SimFG
Copy link
Collaborator

SimFG commented Jul 12, 2024

@Laarryliu For the relationship between eviction manager and evict base, you can refer to this https://github.com/zilliztech/GPTCache/blob/main/gptcache/manager/data_manager.py#L236. In fact, we have abstracted the evict base here, and the eviction manager deletion does not actually care about the strategy.
image
But I looked at this part and found that there is indeed a bug. Here, the on_evict internal attribute of the passed evict base should be assigned a value. Thanks a lot!

If you can, please help me create the pr when you have free time.

@Laarryliu
Copy link
Author

@Laarryliu For the relationship between eviction manager and evict base, you can refer to this https://github.com/zilliztech/GPTCache/blob/main/gptcache/manager/data_manager.py#L236. In fact, we have abstracted the evict base here, and the eviction manager deletion does not actually care about the strategy. image But I looked at this part and found that there is indeed a bug. Here, the on_evict internal attribute of the passed evict base should be assigned a value. Thanks a lot!

If you can, please help me create the pr when you have free time.

Do you mean that if I init a SSDataManager with a RedisCacheEviction(or other implementations), an on_evict attribute should be init with _clear() method?
Also, in MemoryCacheEviction implementation, on_evict function is used as a callback function when a key in memory cache is evicted. So I think it's also necessary to somehow trigger the on_evict function when a key in RedisCacheEviction evicted? Now I can't figure out how on_evict is triggered in RedisCacheEviction.

@SimFG
Copy link
Collaborator

SimFG commented Jul 12, 2024

yes

@ykzeng
Copy link

ykzeng commented Nov 3, 2024

Hi @Laarryliu @SimFG Do we have any update on this issue? Any clarity on how Redis should interaction with EvictionBase and EvictionManager Instead would be appreciated and wondering if there's any PR for this soon?

@SimFG
Copy link
Collaborator

SimFG commented Nov 4, 2024

@ykzeng There is no update yet, if I have time this week I will fix it.

@SimFG
Copy link
Collaborator

SimFG commented Nov 4, 2024

@ykzeng What is the problem you are facing now? I looked at it and it seems that there are some doubts about the current implementation of redis. There seems to be no callback function after the data expires.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants