Scrapy outsourcing contract

jrault edited this page Dec 21, 2012 · 1 revision

We asked Scrapinghub to help us implement our plans using Scrapy. Here are the specifications of the work to be done.

Task 1 : implementation proposal

Work and deliverable

Design and prepare the specification for the HCI crawler implementation on Scrapy, including any Scrapy changes or improvements required. This work will be led by the questions gathered here: Core#scrapy_implementation

Execution period

Nov 15th - 30th (1 week)

Estimated amount of work

6 billable days

Results

The results of this task can be found in Scrapy implementation proposal

Task 2 : community management proposal

Work and deliverable

Discuss and agree on the interaction between the Scrapy and HCI open source projects. The objective is to find the best community management organisation so that both projects can benefit from each other's future developments (scrapy-developers list for discussion, GitHub pull requests, etc.).

Execution period

Nov 15th - 30th (2 weeks)

Estimated amount of work

2 billable days

Results

The results of this task can be found in Scrapy community management

Task 3 : training & development

Work and deliverable

  • Scrapy training for the HCI team
  • Development of a prototype based on the specifications written on the Core page and on the Task 1 results (the implementation proposal)
  • Task 3 will include a one-week visit to Paris at the Sciences Po médialab, from Dec 5th to 9th

Execution period

December 2011

Estimated amount of work

16 billable days

Results

Task closed on December 15th.

Task 4 : Developing the beta version

Work and deliverable

  • Code cleaning and refactoring if needed
  • Bug fixes and implementation of minor features
  • Writing the documentation
  • Unit testing and quick benchmark

Execution period

December 2011

Estimated amount of work

13 billable days

Results

Task closed on December 27th. See the documentation about the Crawler architecture.
