Project Backrub: implement a document link graph and incorporate backlinks in score #147

Open
charlesreid1 opened this issue Nov 2, 2018 · 0 comments

Project Backrub is our implementation of a PageRank-like algorithm that scores a document by its number of back-links (that is, the pages that link to it), which serves as a measure of the document's popularity.

See this comment in dib-lab/copper#305:

The graph concept could also be extended and utilized by centillion, for example to enhance the ranking system (with a graph where nodes are documents and edges are interlinked documents, highly linked-to documents receive higher weighting in centillion)...

This helps centillion transition to a high-level view of documents in the Data Commons, and should improve its ability to retrieve the most relevant results by taking the number of back-links to a document into account. (PageRank was originally developed as part of a search engine called BackRub.)

However, linking this idea of the graph structure to centillion... would be centered on the documents indexed by centillion (i.e., it would be restricted to a particular folder hierarchy).

The idea is to assemble a graph of documents in the search index (node = document indexed by centillion, directed edge = link from document A to document B), compute the in-degree of each node in the graph (number of documents that link to a given document), and store this in the search index, for use in the scoring mechanism.
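The steps above could be sketched roughly as follows. This is a minimal illustration, not centillion's actual code: the function name `compute_in_degrees`, the `links` mapping, and the document ids are all hypothetical, and it assumes the outgoing links of each indexed document have already been extracted. It also assumes that links to documents outside the index, self-links, and repeated links from the same document should not count toward the in-degree; those choices are open for discussion.

```python
from collections import defaultdict


def compute_in_degrees(links):
    """Return {doc_id: in-degree} for every indexed document.

    `links` (hypothetical input) maps each indexed document id to the
    ids of the documents it links to. Assumptions made in this sketch:
    links to documents outside the index are ignored, self-links are
    ignored, and multiple links from A to B count only once.
    """
    indexed = set(links)
    backlinks = defaultdict(set)  # target id -> set of source ids
    for source, targets in links.items():
        for target in targets:
            if target in indexed and target != source:
                backlinks[target].add(source)
    # Documents with no incoming links get an in-degree of 0.
    return {doc: len(backlinks.get(doc, ())) for doc in links}


# Toy example: four indexed documents and one link to a non-indexed page.
example_links = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a", "c", "external"],
    "d": [],
}
in_degrees = compute_in_degrees(example_links)
```

The resulting per-document counts are what would be stored as an extra field in the search index. How that field is then combined with the existing relevance score is not decided in this issue.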
