User interface API

jrault edited this page Dec 21, 2012 · 1 revision

The protocol used for communication between the UI and the Core is JSON RPC.
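
As an illustration of what a call might look like on the wire, here is a minimal sketch that serializes a request for one of the methods listed on this page. The JSON RPC 2.0 envelope (`jsonrpc`, `id`) and the use of positional parameters are assumptions, since this page does not say which JSON RPC version the Core speaks; `build_request` is a hypothetical helper, not part of the API.

```python
import itertools
import json

# Request ids only need to be unique per connection; a counter is enough here.
_request_ids = itertools.count(1)


def build_request(method, *params):
    """Serialize one call as a JSON RPC request string.

    Assumption: JSON RPC 2.0 envelope with positional params.
    """
    return json.dumps({
        "jsonrpc": "2.0",
        "method": method,
        "params": list(params),
        "id": next(_request_ids),
    })


# getWebEntities() takes no parameters, so "params" is an empty list.
payload = build_request("getWebEntities")
```

The resulting string would then be sent to the Core over whatever transport it exposes, which this page does not specify.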

Here is the API of the 20101223 version :

getWebEntities()

get the list of web entities of the corpus

  • params : no params
  • returns : the corpus as list of web entities
  • example :

declareWebEntity(page_lru,depth)

Declares a new web entity from the UI. If the web entity already exists, the page is added to the existing web entity as a leaf.

No web entity is created if the Memory Structure does not contain any page whose LRU contains page_lru (an LRU prefix), because a web entity must always contain at least one web page. In other words, an LRU prefix can be a web entity only if it is present in the Memory Structure as the prefix of a longer LRU branch (which necessarily points to a page).

  • params :
    • page_lru : the reverse URL of a Web page
    • depth : the depth in the reverse URL where the web entity has to be set
  • returns : the web entity declared
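
The prefix rule described above can be sketched as follows. This is only an illustration: it assumes an LRU is represented as a list of stems and that `depth` counts stems from the start of the LRU. Neither the representation nor the helper names (`lru_prefix`, `can_be_web_entity`) come from the actual Memory Structure.

```python
def lru_prefix(page_lru, depth):
    """Return the first `depth` stems of an LRU (assumed to be a list of stems)."""
    return page_lru[:depth]


def can_be_web_entity(prefix, known_page_lrus):
    """A prefix qualifies only if at least one known page LRU starts with it.

    A page whose LRU equals the prefix also counts here, since its LRU
    contains the prefix.
    """
    n = len(prefix)
    return any(lru[:n] == prefix for lru in known_page_lrus)


# Hypothetical stems, for illustration only.
pages = [["com", "example", "blog", "post1"]]
candidate = lru_prefix(pages[0], 2)            # ["com", "example"]
allowed = can_be_web_entity(candidate, pages)  # a known page lies under it
```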

getWebEntity(page_lru)

Looks for the web entity to which the LRU belongs. If no web entity is found, one is created using the default depth. If the web entity already exists, the page is added to it as a leaf.

  • params :
    • page_lru : the reverse URL of a Web page
  • returns : the web entity
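
The lookup-or-create behaviour described above could be modelled like this. It is a sketch under several assumptions: LRUs are sequences of stems, web entities are keyed by their prefix, the longest matching prefix wins, and `DEFAULT_DEPTH` has a made-up value. None of these details are confirmed by this page.

```python
DEFAULT_DEPTH = 2  # hypothetical value; the real default depth is not given here


def get_web_entity(page_lru, entities, default_depth=DEFAULT_DEPTH):
    """Return the prefix of the web entity that owns `page_lru`.

    `entities` maps a prefix (tuple of stems) to the set of page LRUs
    attached to that web entity as leaves.
    """
    page = tuple(page_lru)
    # Every existing web entity whose prefix starts this page's LRU.
    matches = [prefix for prefix in entities if page[:len(prefix)] == prefix]
    if matches:
        prefix = max(matches, key=len)  # assumption: most specific entity wins
    else:
        prefix = page[:default_depth]   # create a new entity at the default depth
        entities[prefix] = set()
    entities[prefix].add(page)          # the page becomes a leaf of the entity
    return prefix
```

Calling it twice with two pages that share the same first two stems creates the web entity on the first call and only adds a leaf on the second.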

crawl(lrus)

Starts a crawl on the web entities listed as parameters. One crawler is started per web entity.

  • params :
    • lrus : list of web entities, each identified by its LRU
  • returns : the number of crawlers currently running (including previously started ones).

monitorCrawl()

Returns information about the crawling activities currently in progress.

  • params : no params
  • returns : the number of crawlers (this could easily be made more informative)

WebEntity.setStatus(lru,status)

set the status of a web entity

  • params :
    • lru: the reverse url of the web entity (= identifier)
    • status : the new status set as a string
  • returns : the modified web entity, or "not found" if the web entity isn't known by the corpus
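
Because the WebEntity methods signal a missing web entity with the string "not found" rather than with an error, a client probably wants to check for it explicitly. The sketch below assumes the string arrives in the `result` field of a standard JSON RPC response; `unwrap_result` and `WebEntityNotFound` are hypothetical names, and the sample entity fields in the usage note are invented.

```python
import json


class WebEntityNotFound(LookupError):
    """Raised when the Core answers "not found" for a web entity LRU."""


def unwrap_result(raw_response):
    """Extract the result of a JSON RPC response string.

    Assumption: "not found" is returned as the plain `result` value.
    """
    reply = json.loads(raw_response)
    if reply.get("error") is not None:
        raise RuntimeError(f"JSON RPC error: {reply['error']}")
    result = reply.get("result")
    if result == "not found":
        raise WebEntityNotFound("web entity is not known by the corpus")
    return result
```

For example, `unwrap_result('{"result": "not found", "error": null, "id": 2}')` raises `WebEntityNotFound`, while a response whose `result` is an object is returned as a dictionary.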

WebEntity.getLinksTo(lru)

Returns the links found by the crawler in the web entity that point to other web entities.

  • params :
    • lru : the reverse url of the web entity (= identifier)
  • returns : the links of the web entity, or "not found" if the web entity isn't known by the corpus

WebEntity.getPages(lru)

returns the pages seen by the system (crawler or UI) included inside the web entity

  • params :
    • lru : the reverse url of the web entity (= identifier)
  • returns : the pages included in the web entity, or "not found" if the web entity isn't known by the corpus