First prototype basic user scenario
Benjamin Ooghe-Tabanou edited this page Dec 21, 2012
Known Limitations :
- only one corpus per server
- no security or authentication
- configuration of the corpus
- the admin sets the precision limit in core.settings.py
- the admin sets the default web entity creation rule in core.settings.py, which will be inserted into the memory structure
- creating a corpus
- the user adds pages to the corpus (the system applies the default creation rule)
- the user can change the web entities created from the inserted pages (including their aliases)
- crawling and filtering
- the user asks to crawl some of the web entities created :
- the core then retrieves the pages of that web entity from the memory structure and passes them as starting points to the crawler
- the crawler stores the crawled pages in a queue
- the core consumes the queue and asks the memory structure to store them (through the cache system), fires web entity creations, filters links depending on Precision Exception...
- after a while, the user asks for the content of a web entity to see the new pages discovered by the crawl
- after a while, the user refreshes the list of web entities
- the user creates new web entities based on the pages found by the crawler
- the user launches new crawl tasks
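
The crawl loop above can be sketched as a small producer/consumer pair. This is only an illustration under stated assumptions: `memory`, `FAKE_WEB`, `crawl`, `consume` and `path_depth` are hypothetical names, the memory structure is reduced to a dictionary, no real HTTP request is made, and the link filter (path depth compared to a limit of 4) is a placeholder, since the actual precision rule is not described on this page.

```python
from collections import deque
from urllib.parse import urlsplit

# Hypothetical stand-in for the memory structure: web entity name -> page URLs.
memory = {"EXAMPLE": {"http://example.org/"}}

# Hypothetical link table used instead of real HTTP fetching.
FAKE_WEB = {
    "http://example.org/": [
        "http://example.org/a/",
        "http://example.org/a/b/c/d/e/",
    ],
}


def crawl(start_pages, page_queue):
    """Crawler side: push each crawled page and its links onto the queue."""
    for url in start_pages:
        page_queue.append((url, FAKE_WEB.get(url, [])))


def consume(page_queue, web_entity, keep_link):
    """Core side: drain the queue, store each page, return the links kept by the filter."""
    kept = []
    while page_queue:
        url, links = page_queue.popleft()
        memory.setdefault(web_entity, set()).add(url)
        kept.extend(link for link in links if keep_link(link))
    return kept


def path_depth(url):
    """Number of non-empty path segments in a URL."""
    return len([part for part in urlsplit(url).path.split("/") if part])


# The pages of the web entity are the starting points of the crawl.
queue = deque()
crawl(sorted(memory["EXAMPLE"]), queue)
# Placeholder filter: drop links deeper than 4 path segments (assumption).
links = consume(queue, "EXAMPLE", lambda url: path_depth(url) <= 4)
```

Keeping the queue between the crawler and the core means the crawler never writes to the memory structure directly, which matches the division of roles described in the steps.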
- advanced use
- the user creates web entity creation rules
- the user sets precision limits on some specific pages
- export corpus
- the user asks for a GEXF export of the network of web entities
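
As a rough idea of what such an export could produce, here is a minimal sketch that writes a GEXF 1.2 document with the standard library only. The function name `to_gexf` and its input shapes are assumptions, not the prototype's API, and the single edge in the usage lines is a hypothetical link added for illustration.

```python
import xml.etree.ElementTree as ET

GEXF_NS = "http://www.gexf.net/1.2draft"


def to_gexf(nodes, edges):
    """Serialize nodes ({id: label}) and edges ([(source_id, target_id)]) as GEXF."""
    gexf = ET.Element("gexf", {"xmlns": GEXF_NS, "version": "1.2"})
    graph = ET.SubElement(gexf, "graph", {"defaultedgetype": "directed"})
    nodes_el = ET.SubElement(graph, "nodes")
    for node_id, label in nodes.items():
        ET.SubElement(nodes_el, "node", {"id": node_id, "label": label})
    edges_el = ET.SubElement(graph, "edges")
    for index, (source, target) in enumerate(edges):
        ET.SubElement(
            edges_el, "edge", {"id": str(index), "source": source, "target": target}
        )
    return ET.tostring(gexf, encoding="unicode")


# Web entities become nodes; the edge below is a hypothetical link.
xml_text = to_gexf({"0": "SCIENCES PO", "1": "MEDIALAB"}, [("1", "0")])
```

The resulting string can be saved with a `.gexf` extension and opened in a graph tool such as Gephi.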
- corpus configuration
- PRECISION_LIMIT : 4
- webentity default rule : at first subdomain
- starting points are :
- www.sciences-po.fr
- www.sciencespo.fr
- medialab.sciences-po.fr
- web entities created at step 1 :
- SCIENCES PO : alias of fr|sciences-po|www & fr|sciencespo|www
- MEDIALAB : fr|sciences-po|medialab
- crawl tasks at step 1 :
- display new web entities
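
The prefixes listed for the web entities at step 1 (for instance fr|sciences-po|www) look like the host name with its parts reversed and joined by "|". The sketch below reproduces that notation and the alias grouping; it assumes the default rule keeps the whole host name, ignores scheme, port and path, and uses hypothetical helper names (`host_prefix`, `web_entity_of`) that are not taken from the prototype.

```python
from urllib.parse import urlsplit

# Web entities and prefixes as listed for step 1.
WEB_ENTITIES = {
    "SCIENCES PO": ["fr|sciences-po|www", "fr|sciencespo|www"],
    "MEDIALAB": ["fr|sciences-po|medialab"],
}


def host_prefix(url):
    """Reverse the host name parts: 'www.sciences-po.fr' -> 'fr|sciences-po|www'."""
    # Accept bare host names by adding the '//' that urlsplit needs to see a netloc.
    host = urlsplit(url if "//" in url else "//" + url).hostname
    return "|".join(reversed(host.split(".")))


def web_entity_of(url):
    """Return the name of the web entity whose prefixes include the URL's host prefix."""
    prefix = host_prefix(url)
    for name, prefixes in WEB_ENTITIES.items():
        if prefix in prefixes:
            return name
    return None


starting_points = ["www.sciences-po.fr", "www.sciencespo.fr", "medialab.sciences-po.fr"]
assignment = {url: web_entity_of(url) for url in starting_points}
```

With the three starting points above, both www hosts resolve to SCIENCES PO because the second prefix is registered as an alias, while the medialab host resolves to MEDIALAB.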