Project Backrub: implement a document link graph and incorporate backlinks in score #147

charlesreid1 · 2018-11-02T05:55:09Z

Project Backrub is our implementation of a PageRank-like algorithm that scores a document by its number of back-links (that is, the pages that link to it), which serves as a measure of the document's popularity.

See this comment in dib-lab/copper#305:

The graph concept could also be extended and utilized by centillion, for example to enhance the ranking system (with a graph where nodes are documents and edges are interlinked documents, highly linked-to documents receive higher weighting in centillion)...

This helps centillion transition to a high-level view of documents in the Data Commons, and should improve its ability to retrieve the most relevant results by using the number of back-links to each document. (BackRub was the original name of the search engine that introduced the PageRank algorithm.)

However, linking this idea of the graph structure to centillion... would be centric to the documents indexed by centillion (i.e., it would be restricted to a particular folder hierarchy).

The idea is to assemble a graph of documents in the search index (node = document indexed by centillion, directed edge = link from document A to document B), compute the in-degree of each node in the graph (number of documents that link to a given document), and store this in the search index, for use in the scoring mechanism.
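A minimal sketch of the in-degree computation described above, in plain Python. The function names (`compute_in_degrees`, `boosted_score`), the input shape (a dict mapping each document ID to the IDs it links to), and the log-damped boost are all assumptions made for illustration. The issue does not specify how links are extracted or how the in-degree should enter the score.

```python
import math


def compute_in_degrees(links, indexed_ids):
    """Count, for each indexed document, how many other indexed documents link to it.

    links       -- hypothetical input: dict of source doc ID -> iterable of target doc IDs
    indexed_ids -- IDs of the documents present in the search index
    """
    indexed = set(indexed_ids)
    in_degree = {doc_id: 0 for doc_id in indexed}
    for source, targets in links.items():
        if source not in indexed:
            continue  # only documents in the index are nodes in the graph
        for target in set(targets):  # repeated links from one source count once
            if target in indexed and target != source:  # skip external and self links
                in_degree[target] += 1
    return in_degree


def boosted_score(base_score, in_degree, weight=0.1):
    """Assumed scoring rule: damp the in-degree with log1p so heavily
    linked documents do not swamp the text-relevance score."""
    return base_score * (1.0 + weight * math.log1p(in_degree))


if __name__ == "__main__":
    links = {
        "a": ["b", "c"],
        "b": ["c"],
        "c": ["a", "c", "external"],
        "d": ["c", "c"],
    }
    degrees = compute_in_degrees(links, ["a", "b", "c", "d"])
    for doc_id in sorted(degrees):
        print(doc_id, degrees[doc_id], round(boosted_score(1.0, degrees[doc_id]), 3))
```

The in-degree would be computed once at index time and stored as a field alongside each document, so that query-time scoring only needs a lookup. Counting each source document once per target keeps a single page with many repeated links from inflating another document's score.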

charlesreid1 mentioned this issue Nov 29, 2018

Provide suggestions for similar documents #106

Open
