First attempt at making a web crawler. It uses Scrapy and MongoDB.

Web crawler - books


My first attempt at creating a web crawler in Python. It traverses an online book shop and grabs book titles and URLs.

I followed the tutorial at https://realpython.com/web-scraping-with-scrapy-and-mongodb/.

Lessons Learned

While working on this project, I learned how to:

  • Set up and configure a project using Scrapy
  • Build a working web scraper using Scrapy
  • Extract data from websites using CSS selectors
  • Store scraped data in a MongoDB database

In addition, the code accounts for the fact that duplicate data can sometimes be scraped; duplicates are detected and ignored.
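
The duplicate handling can be sketched as a small filter. In the real project this logic would live in a Scrapy item pipeline whose `process_item` raises `DropItem`; the stand-alone class and field names here are illustrative:

```python
class DuplicatesFilter:
    """Drops items whose URL has already been seen (illustrative sketch)."""

    def __init__(self):
        # URLs of every item seen so far in this crawl.
        self.seen_urls = set()

    def process_item(self, item):
        if item["url"] in self.seen_urls:
            return None  # duplicate: ignore it
        self.seen_urls.add(item["url"])
        return item  # first occurrence: keep it
```

Keying on the URL rather than the title means two different books that happen to share a title are both kept.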

Installation

Python: download it from https://www.python.org/downloads/ or install it via the Microsoft Store.

virtualenv (used to create the virtual environment):

pip install virtualenv

Scrapy:

python -m pip install scrapy

MongoDB: download the installer for your platform from https://www.mongodb.com/docs/manual/installation/#mongodb-community-edition-installation-tutorials. You will also need the pymongo driver, installed in cmd with:

python -m pip install pymongo

Deployment

To deploy this project, run the following commands in cmd:

  1. Activate the virtual environment (or add the activation script to your PATH):

venv\Scripts\activate.bat

  2. Create the Scrapy project:

scrapy startproject books

  3. Create the database and collection in the mongosh shell:

test> use books_db
switched to db books_db
books_db> db.createCollection("books")
{ ok: 1 }
books_db> show collections
books
books_db>

Roadmap

  • Scrape dynamically generated content
