docs: compress blog pictures (webp) (#2554)
During a recent blog post review, I noticed huge image sizes in some of
the posts. This PR compresses the embedded images to save bandwidth and
reduce load times.


![image](https://github.com/apify/crawlee/assets/61918049/e0c07dc1-4768-47fe-b550-83fe0947ff4d)
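The reference updates in this commit follow two patterns: local `./img/*.png` paths become `./img/*.webp`, and GitHub avatar URLs gain an `s=48` size parameter. A minimal sketch of those two rewrites is below. Both helper names are hypothetical and are not part of the commit; the sketch does not convert the image files themselves, and it leaves remote image URLs untouched.

```typescript
// Hypothetical helper (not part of the commit): point local PNG references
// in a post's Markdown source at the WebP files instead.
function rewriteLocalPngReferences(markdown: string): string {
    // Only relative ./img/ paths are rewritten; remote URLs are left alone.
    return markdown.replace(/(\.\/img\/[\w.-]+)\.png\b/g, '$1.webp');
}

// Hypothetical helper: request a smaller avatar by appending a size parameter.
function withAvatarSize(url: string, size: number = 48): string {
    const separator = url.includes('?') ? '&' : '?';
    return `${url}${separator}s=${size}`;
}

const frontMatterLine = 'image: ./img/how-to-scrape-amazon.png';
const bodyLine = '![react](./img/react.png)';

console.log(rewriteLocalPngReferences(frontMatterLine));
console.log(rewriteLocalPngReferences(bodyLine));
console.log(withAvatarSize('https://avatars.githubusercontent.com/u/53312820?v=4'));
```

Restricting the pattern to `./img/` paths keeps the rewrite from touching external links that may not have a WebP counterpart.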
barjin authored Jun 24, 2024
1 parent 0d1f644 commit 4d718c1
Showing 24 changed files with 16 additions and 16 deletions.
4 changes: 2 additions & 2 deletions website/blog/2024/02-22-launching-crawlee-blog/index.md
@@ -2,11 +2,11 @@
slug: crawlee-blog-launch
title: 'Launching Crawlee Blog'
description: 'Your Node.js resource hub for web scraping and automation.'
-image: https://raw.githubusercontent.com/souravjain540/crawlee-first-blog/main/og-image.png
+image: https://raw.githubusercontent.com/souravjain540/crawlee-first-blog/main/og-image.webp
author: Saurav Jain
authorTitle: Developer Community Manager
authorURL: https://github.com/souravjain540
-authorImageURL: https://avatars.githubusercontent.com/u/53312820?v=4
+authorImageURL: https://avatars.githubusercontent.com/u/53312820?v=4&s=48
authorTwitter: sauain
---

Binary files not shown (8 files).
@@ -2,11 +2,11 @@
slug: how-to-scrape-amazon
title: 'How to scrape Amazon products'
description: 'A detailed step-by-step guide to scraping products on Amazon using TypeScript, Cheerio, and Crawlee.'
-image: ./img/how-to-scrape-amazon.png
+image: ./img/how-to-scrape-amazon.webp
author: Lukáš Průša
authorTitle: Junior Web Automation Engineer
authorURL: https://github.com/Patai5
-authorImageURL: ./img/lukasp.png
+authorImageURL: ./img/lukasp.webp
---

## Introduction
@@ -15,7 +15,7 @@ Amazon is one of the largest and most complex websites, which means scraping it

In this guide, we'll be extracting information from Amazon product pages using the power of [TypeScript](https://www.typescriptlang.org) in combination with the [Cheerio](https://cheerio.js.org) and [Crawlee](https://crawlee.dev) libraries. We'll explore how to retrieve and extract detailed product data such as titles, prices, image URLs, and more from Amazon's vast marketplace. We'll also discuss handling potential blocking issues that may arise during the scraping process.

-![How to scrape Amazon using Typescript, Cheerio, and Crawlee](./img/how-to-scrape-amazon.png)
+![How to scrape Amazon using Typescript, Cheerio, and Crawlee](./img/how-to-scrape-amazon.webp)

<!--truncate-->

@@ -37,7 +37,7 @@ To begin with, let's identify the product fields that we're interested in scrapi
- Image URLs
- Product Overview Attributes

-![Image highlighting the product fields to be scraped on Amazon](./img/fields-to-scrape.png)
+![Image highlighting the product fields to be scraped on Amazon](./img/fields-to-scrape.webp)

For now, our focus will be solely on the scraping part. In a later section, we'll shift our attention to Crawlee, our crawling tool. Let's begin!

@@ -46,7 +46,7 @@ For now, our focus will be solely on the scraping part. In a later section, we'l
Our first step will be to utilize [browser DevTools](https://developer.mozilla.org/en-US/docs/Learn/Common_questions/Tools_and_setup/What_are_browser_developer_tools) to inspect the layout and discover the [CSS selectors](https://developer.mozilla.org/en-US/docs/Learn/CSS/Building_blocks/Selectors) for the data points we aim to scrape. (by default on [Chrome](https://developer.chrome.com/docs/devtools), press `Ctrl + Shift + C`)

For example, let's take a look at how we find the selector for the product title:
-![Amazon product title selector in DevTools](./img/dev-tools-example.png)
+![Amazon product title selector in DevTools](./img/dev-tools-example.webp)

The product title selector we've deduced is `span#productTitle`. This selector targets all `span` elements with the id of `productTitle`. Luckily, there's only one such element on the page - exactly what we're after.
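
For illustration, the `span#productTitle` selector could be applied with Cheerio roughly as follows. This is a minimal sketch, not code from the post: the `extractTitle` helper and the HTML fragment are made up, and a real product page contains far more markup.

```typescript
import * as cheerio from 'cheerio';

// Hypothetical helper: returns the trimmed product title, or an empty
// string when no matching element is present.
function extractTitle(html: string): string {
    const $ = cheerio.load(html);
    return $('span#productTitle').text().trim();
}

// Made-up fragment standing in for the relevant part of a product page.
const sampleHtml = `
<div id="titleSection">
    <span id="productTitle">  Example Product Name  </span>
</div>`;

console.log(extractTitle(sampleHtml));
```

Trimming matters here because the text inside such elements is often padded with whitespace in the page source.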

Binary files not shown (2 files).
4 changes: 2 additions & 2 deletions website/blog/2024/04-23-scrapy-vs-crawlee/index.md
@@ -2,11 +2,11 @@
slug: scrapy-vs-crawlee
title: 'Scrapy vs. Crawlee'
description: 'Which web scraping library should you use in 2024? Learn how each handles headless mode, autoscaling, proxy rotation, errors, and anti-scraping techniques.'
-image: ./img/scrapy-vs-crawlee.png
+image: ./img/scrapy-vs-crawlee.webp
author: Saurav Jain
authorTitle: Developer Community Manager
authorURL: https://github.com/souravjain540
-authorImageURL: https://avatars.githubusercontent.com/u/53312820?v=4
+authorImageURL: https://avatars.githubusercontent.com/u/53312820?v=4&s=48
authorTwitter: sauain
---

Binary files not shown (10 files).
@@ -3,11 +3,11 @@ slug: netflix-show-recommender
title: 'Building a Netflix show recommender using Crawlee and React'
tags: [community]
description: 'Create a Netflix show recommendation system using Crawlee to scrape the data, JavaScript to code, and React to build the front end.'
-image: ./img/create-netflix-show-recommender.png
+image: ./img/create-netflix-show-recommender.webp
author: Ayush Thakur
authorTitle: Community Member of Crawlee
authorURL: https://github.com/ayush2390
-authorImageURL: https://avatars.githubusercontent.com/u/43995654?v=4
+authorImageURL: https://avatars.githubusercontent.com/u/43995654?v=4&s=48
authorTwitter: JSAyushThakur
---

@@ -19,7 +19,7 @@ In this blog, we'll guide you through the process of using Vite and Crawlee to b
One of our community members wrote this blog as a contribution to Crawlee Blog. If you want to contribute blogs like these to Crawlee Blog, please reach out to us on our [discord channel](https://apify.com/discord).
:::

-![How to scrape Netflix using Crawlee and React to build a show recommender](./img/create-netflix-show-recommender.png)
+![How to scrape Netflix using Crawlee and React to build a show recommender](./img/create-netflix-show-recommender.webp)

<!-- truncate -->

@@ -47,13 +47,13 @@ You can check out the [Vite Docs](https://vitejs.dev/guide/) for more details on

Once the React app is created, open it in VS Code.

-![react](./img/react.png)
+![react](./img/react.webp)

This will be the structure of your React app.

Run `npm run dev` command in the terminal to run the app.

-![viteandreact](./img/viteandreact.png)
+![viteandreact](./img/viteandreact.webp)

This will be the output displayed.

@@ -85,11 +85,11 @@ To scrape the genres and shows, we will utilize the [browser DevTools](https://d

We can capture the HTML structure and call `$(element)` to query the element's subtree.

-![genre](./img/genre.png)
+![genre](./img/genre.webp)

Here, we can observe that the name of the genre is captured by a `span` tag with `nm-collections-row-name` class. So we can use the `span.nm-collections-row-name` selector to capture this and similar elements.

-![title](./img/title.png)
+![title](./img/title.webp)

Similarly, we can observe that the title of the show is captured by the `span` tag having `nm-collections-title-name` class. So we can use the `span.nm-collections-title-name` selector to capture this and similar elements.
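
As a rough illustration, both class selectors named above could be used with Cheerio along these lines. This is a sketch under assumptions: `extractNames` is a hypothetical helper, and the HTML fragment is invented to resemble the structure described rather than copied from Netflix.

```typescript
import * as cheerio from 'cheerio';

// Hypothetical helper: collects the trimmed text of every element
// matching the given selector, in document order.
function extractNames(html: string, selector: string): string[] {
    const $ = cheerio.load(html);
    return $(selector)
        .map((_, element) => $(element).text().trim())
        .get();
}

// Invented fragment using the class names mentioned in the post.
const sampleHtml = `
<section>
    <span class="nm-collections-row-name">Dramas</span>
    <span class="nm-collections-title-name">Show A</span>
    <span class="nm-collections-title-name">Show B</span>
</section>`;

console.log(extractNames(sampleHtml, 'span.nm-collections-row-name'));
console.log(extractNames(sampleHtml, 'span.nm-collections-title-name'));
```

Passing the selector as a parameter lets one helper serve both the genre names and the show titles.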
