Raw data level

Benjamin Ooghe-Tabanou edited this page Dec 21, 2012 · 1 revision

The raw data level is the low-level storage module.

This module stores all information harvested by the crawler, i.e. web page contents and extracted links.

This level of data storage has to be optimized for data insertion. Data retrieval will only be used for occasional needs, such as rebuilding Page_link information after a change in the Precision_limit settings.

No specification has been written yet, but something like flat files would probably do the trick.
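
Since nothing is specified yet, the following is only a rough sketch of what a flat-file approach could look like, assuming one JSON record per line in an append-only file. The function names `append_page` and `iter_pages` and the record fields (`url`, `content`, `links`) are hypothetical and not part of any existing code.

```python
import json
from typing import Iterator, List


def append_page(path: str, url: str, content: str, links: List[str]) -> None:
    """Append one crawled page as a single JSON line (hypothetical record layout)."""
    record = {"url": url, "content": content, "links": links}
    # Opening in append mode keeps insertion a cheap sequential write.
    # json.dumps escapes newlines, so each record stays on one line.
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def iter_pages(path: str) -> Iterator[dict]:
    """Re-read every stored record in insertion order, e.g. to rebuild link data."""
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)
```

Writes are simple appends, which matches the insertion-first requirement. Retrieval is a full sequential scan, which should be acceptable as long as reads stay occasional.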