
How to implement phrase blocking #197

Open
chongqiWang opened this issue Jun 11, 2021 · 6 comments

Comments

@chongqiWang

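I want to block certain phrases from search results. For example, when searching for "吕布战天下", I want to block the phrase "战天下": results may contain "天下大乱,云长战吕布", but must not contain "吕布大战天下". Is there a good way to do this?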

@shi-yuan
Member

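Do you want results containing the phrase to be excluded entirely, or should they still be returned with the phrase replaced by something like ***?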

@chongqiWang
Author

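> Do you want results containing the phrase to be excluded entirely, or should they still be returned with the phrase replaced by something like ***?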

Excluded entirely; they should not show up in the results.

I would also like the blocked phrases to be loaded from a file.

@shi-yuan
Member

You can filter them out at search time by adding a must_not clause.
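
A minimal sketch of what such a query could look like, built as a plain JSON string in Java. The class and method names (`MustNotQuerySketch`, `buildQuery`) and the field name passed in by the caller are hypothetical and not taken from this project; only the `bool`, `must`, `must_not`, `match` and `match_phrase` keywords come from the Elasticsearch query DSL. Whether `match_phrase` actually hits a given document depends on how the field's analyzer tokenizes the text, so this should be checked against real data.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, for illustration only.
class MustNotQuerySketch {
    // Escapes backslashes and double quotes so a value can sit inside a JSON string.
    static String jsonEscape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Builds a bool query that matches the user's text but excludes any
    // document matching one of the blocked phrases via match_phrase.
    static String buildQuery(String field, String queryText, List<String> blockedPhrases) {
        String f = jsonEscape(field);
        String mustNot = blockedPhrases.stream()
                .map(p -> "{\"match_phrase\":{\"" + f + "\":\"" + jsonEscape(p) + "\"}}")
                .collect(Collectors.joining(","));
        return "{\"query\":{\"bool\":{"
                + "\"must\":[{\"match\":{\"" + f + "\":\"" + jsonEscape(queryText) + "\"}}],"
                + "\"must_not\":[" + mustNot + "]}}}";
    }
}
```

For the example in this issue, calling `buildQuery` with an assumed field name such as `content`, the query text "吕布战天下" and a one-element phrase list containing "战天下" yields a query with a single `match_phrase` clause under `must_not`. Each additional blocked phrase adds one more clause.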

@chongqiWang
Author

> You can filter them out at search time by adding a must_not clause.

Wouldn't that mean writing a very long filter condition? What I mainly want is to filter the results and block the phrases.

@shi-yuan
Member

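If there are many phrases:
If they change infrequently, consider writing them into a field at index time, so that searches can filter on that field directly.
If they change frequently, process the result set in your application after fetching it.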
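
A minimal sketch of the second option (filtering in the application after the results come back), with the phrases read from a file as requested earlier in the thread. Everything here is an assumption for illustration: the class and method names are hypothetical, hits are represented as plain strings, and the file is assumed to be UTF-8 with one phrase per line, where only the text before the first tab is used so that a tab-separated dictionary file can be reused as-is.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, for illustration only.
class BlockedPhraseFilter {
    // Reads one phrase per line; keeps the text before the first tab
    // and skips blank lines.
    static List<String> loadPhrases(Path file) throws IOException {
        return Files.readAllLines(file, StandardCharsets.UTF_8).stream()
                .map(line -> line.split("\t", 2)[0].trim())
                .filter(phrase -> !phrase.isEmpty())
                .collect(Collectors.toList());
    }

    // Keeps only the hits whose text contains none of the blocked phrases.
    static List<String> filterHits(List<String> hits, List<String> blockedPhrases) {
        return hits.stream()
                .filter(hit -> blockedPhrases.stream().noneMatch(hit::contains))
                .collect(Collectors.toList());
    }
}
```

With the phrase "战天下" loaded, `filterHits` keeps "天下大乱,云长战吕布" and drops "吕布大战天下". This is a plain substring scan over every phrase for every hit; with a large phrase list, a trie-based matcher is likely a better fit. Note also that dropping hits after the search can leave a page with fewer results than requested.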

@shi-yuan
Member

To extract the phrases from the content, for reference:

Contents of the dictionary dic_xxx:

战天下	a	1000
天下大乱	a	2000

Example:

import org.ansj.library.DicLibrary;
import org.nlpcn.commons.lang.tire.GetWord;
import org.nlpcn.commons.lang.tire.domain.Forest;
import java.util.Arrays;

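public class Test {
    public static void main(String[] args) {
        // Look up the dictionary registered under the key "dic_xxx"
        Forest forest = DicLibrary.get("dic_xxx");
        // Scan the text for every dictionary entry it contains
        GetWord gw = forest.getWord("如何实现短语屏蔽功能:天下大乱,云长战吕布,吕布大战天下");
        String word;
        // getAllWords() returns the next match, or null once the text is exhausted
        while ((word = gw.getAllWords()) != null) {
            // getParam() holds the remaining dictionary columns of the matched entry
            System.out.println(word + "============" + Arrays.toString(gw.getParam()));
        }
    }
}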

Output:

天下大乱============[a, 2000]
战天下============[a, 1000]
