docs: add-blog #2539
Cc: @janbuchar / @barjin |
Nice article. I found several minor issues worth addressing. Also it'd be nice to
- link to a repo with the complete source code, ideally it should be hosted on Apify's github
- run all code snippets through a code formatter
website/blog/2024/06-10-creating-a-netflix-show-recommender-using-crawlee-and-react/index.md
|
||
## Prerequisites | ||
|
||
To use Crawlee, you need to have Node.js 16 or higher version. |
To use Crawlee, you need to have Node.js 16 or higher version. | |
To use Crawlee, you need to have Node.js 16 or newer. |
|
||
You can install the latest version of Node.js from the [official website](https://nodejs.org/en/). This great [Node.js installation guide](https://blog.apify.com/how-to-install-nodejs/) gives you tips to avoid issues later on. | ||
|
||
## Creating React app |
## Creating React app | |
## Creating a React app |
npx create-vite@latest | ||
``` | ||
|
||
You can check out the [Vite Docs](https://vitejs.dev/guide/) to create a React app. |
You can check out the [Vite Docs](https://vitejs.dev/guide/) to create a React app. | |
You can check out the [Vite Docs](https://vitejs.dev/guide/) for more details on how to create a React app. |
|
||
Additionally, Crawlee supports headless browser libraries like [Playwright](https://playwright.dev/) and [Puppeteer](https://pptr.dev/) for scraping of websites that are JavaScript-rendered. | ||
|
||
After installing the libraries, it’s time to create the scraper code. |
Wait, which ones do I need? Netflix is an SPA, so I'll need Playwright or Puppeteer, right? Which one do we want?
Netflix is an SPA, but in this use case it works well with CheerioCrawler.
Then we should just remove this or reword it so that it's clear that playwright will not be necessary.
const allShows = []; | ||
let genreShows = []; | ||
shows.forEach((show) => { | ||
genreShows.push(show); | ||
if (genreShows.length === 40) { | ||
allShows.push(genreShows); | ||
genreShows = []; | ||
} | ||
}); | ||
if (genreShows.length > 0) { | ||
allShows.push(genreShows); | ||
} |
Instead of relying on the `shows` array to be sorted by genre and having exactly 40 items, I'd make a `Map` with the genre as key and an array of show titles as value.
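A minimal sketch of the `Map`-based grouping suggested above. It assumes each scraped record carries its own genre; the `{ genre, title }` record shape, the `groupShowsByGenre` name, and the sample data are hypothetical and not taken from the article's code.

```javascript
// Hypothetical record shape: { genre: string, title: string }.
// Groups titles by genre without assuming any ordering or a fixed row size.
function groupShowsByGenre(shows) {
    const showsByGenre = new Map();
    for (const { genre, title } of shows) {
        // Create the bucket the first time a genre is seen.
        if (!showsByGenre.has(genre)) {
            showsByGenre.set(genre, []);
        }
        showsByGenre.get(genre).push(title);
    }
    return showsByGenre;
}

// Usage with made-up sample data; the input is deliberately not sorted by genre.
const sampleShows = [
    { genre: 'Drama', title: 'Show A' },
    { genre: 'Comedy', title: 'Show B' },
    { genre: 'Drama', title: 'Show C' },
];
const grouped = groupShowsByGenre(sampleShows);
```

Because the genre travels with each record, a row with more or fewer than 40 titles no longer shifts every following genre.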
Or - even better - capture the structure straight from the HTML. You can call `$(element)` to query the element's subtree.
const out = $('[data-uia="collections-row"]').map((_, el) => { // get a genre row
const genre = $(el).find('[data-uia="collections-row-title"]').text(); // pick its (genre) title
const items = $(el).find('[data-uia="collections-title"]').map((_, el) => $(el).text()).get(); // pick all items in the genre
return { genre, items };
}).get(); // unwrap the Cheerio result into a plain array of { genre, items }
npm start | ||
``` | ||
|
||
After running this command, you will see a `storage` folder with the `key_value_stores/default/results.json` file. The scrapped data will be stored in JSON format in this file. |
After running this command, you will see a `storage` folder with the `key_value_stores/default/results.json` file. The scrapped data will be stored in JSON format in this file. | |
After running this command, you will see a `storage` folder with the `key_value_stores/default/results.json` file. The scraped data will be stored in JSON format in this file. |
} | ||
|
||
function App() { | ||
const [count, setCount] = useState(null); |
I think this is not a count, it's an index into an array of shows, isn't it?
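One way to make the intent explicit is to name the state after what it holds. The sketch below is framework-free and assumes the component stores an index into the shows array; `pickShowIndex`, `showAt`, and the sample titles are hypothetical and not part of the article.

```javascript
// Hypothetical helper: choose a valid index into `shows`, or null when it is empty.
// `random` is injectable so the function can be exercised deterministically.
function pickShowIndex(shows, random = Math.random) {
    if (shows.length === 0) return null;
    return Math.floor(random() * shows.length);
}

// Hypothetical helper: resolve an index (possibly null) to the show it points at.
function showAt(shows, showIndex) {
    return showIndex === null ? null : shows[showIndex];
}

// In the component, this value would back something like
// `const [showIndex, setShowIndex] = useState(null);`
const titles = ['Show A', 'Show B', 'Show C'];
const showIndex = pickShowIndex(titles, () => 0.5); // Math.floor(0.5 * 3) === 1
const currentShow = showAt(titles, showIndex);
```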
Add the following code in `'scripts'` object: | ||
|
||
``` | ||
'start': 'node src/scraper.js' |
'start': 'node src/scraper.js' | |
"start": "node src/scraper.js" |
slug: netflix-show-recommender | ||
title: 'Building a Netflix show recommender using Crawlee and React' | ||
tags: [community] | ||
description: 'Create a Netflix show recommendation system using Crawlee to scrape the data, JavaScript to code, and React to build the front end.' |
Is this the wording we agreed upon? 😄 If I clicked on an article about recommender systems with scraping (sounds super cool) and got a simple React App, I would be a bit disappointed.
Netflix is a large player in the realm of recommender systems, with the Netflix Prize, their research papers, and stuff... This article is going to have a lot of very strong SEO competition with this name.
Yes, we agreed on it. I got the point, and I agree you are right, but this article is not from us. As said in the beginning, it is from one of the community members, and we made it clear that it is not supposed to be that perfect; it is just an app showing something like this can be created through a little work.
|
||
:::tip | ||
Before we start this tutorial, we recommend you [visit Crawlee's GitHub](https://github.com/apify/crawlee) and check out the codebase and installation guide. If you like Crawlee, do give us a star. | ||
If you are liking this blog so far, we request you to [give Crawlee a star on GitHub](https://github.com/apify/crawlee), it helps us to reach and help more developers. |
One paragraph in? That's a bit early to like it.
genre: genres, | ||
shows: shows, |
This looks suspicious - storing `genres` (plural) under `genre` (singular)
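A small sketch of one possible naming fix, assuming the value really is a list of genres: storing it under a plural key keeps the record self-describing. The `buildResult` name and the sample values are hypothetical.

```javascript
// Hypothetical helper: property shorthand keeps key names identical to the
// variables they come from, so a plural list is stored under a plural key.
function buildResult(genres, shows) {
    return { genres, shows };
}

// Made-up sample values for illustration only.
const result = buildResult(['Drama', 'Comedy'], [['Show A'], ['Show B']]);
```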
@janbuchar can we please make it live today? I need to give the link to marketing for the newsletter :)
To use Crawlee, you need to have Node.js 16 or newer. | ||
|
||
:::tip | ||
If you are Crawlee blog so far, we request you to [give Crawlee a star on GitHub](https://github.com/apify/crawlee), it helps us to reach and help more developers. |
If you are Crawlee blog so far, we request you to [give Crawlee a star on GitHub](https://github.com/apify/crawlee), it helps us to reach and help more developers. | |
If you like the posts on the Crawlee blog so far, please consider [giving Crawlee a star on GitHub](https://github.com/apify/crawlee), it helps us to reach and help more developers. |
I believe so, there is just a handful of comments to resolve. |
@janbuchar done :) |
Adding new blog to Crawlee Blog