In this project, I used Scrapy to scrape Amazon web pages containing laptop listings.
- A crawler first scrapes the index page, from which each laptop's description and price are saved.
- Each laptop's href is then followed to extract its extended description.
- All of the information is saved to a pandas DataFrame that is stored locally.
Similarly, in this part I used Scrapy's crawler to extract book information from the web-scraping practice website https://books.toscrape.com/index.html
- The book titles and prices are extracted from the main index page, along with each book's URL.
- The crawler follows each book's href to its detail page, where the description is extracted and appended to a list.
- The lists (titles, prices, URLs, descriptions) are added as columns to a pandas DataFrame, which is then stored locally as a CSV.
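The final step above, assembling the scraped lists into a DataFrame and writing it out as a CSV, can be sketched as follows. The sample values and the `books.csv` filename are illustrative, not taken from the project.

```python
import pandas as pd

# Illustrative sample of what the scraped lists might hold after the crawl.
titles = ["A Light in the Attic"]
prices = ["£51.77"]
urls = ["https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html"]
descriptions = ["A whimsical poetry collection."]

# Each list becomes one column of the DataFrame.
df = pd.DataFrame({
    "title": titles,
    "price": prices,
    "url": urls,
    "description": descriptions,
})

# Store the DataFrame locally as a CSV (hypothetical filename).
df.to_csv("books.csv", index=False)
```

Building the DataFrame from a dict of equal-length lists keeps the column order explicit, and `index=False` avoids writing pandas' row index as an extra CSV column.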