
Commit

improve html crawler keyword
Saurav Jain authored and Saurav Jain committed May 17, 2024
1 parent 6cc2297 commit e5b8f31
Showing 2 changed files with 4 additions and 3 deletions.
2 changes: 1 addition & 1 deletion docs/examples/http_crawler.mdx
@@ -7,7 +7,7 @@ import RunnableCodeBlock from '@site/src/components/RunnableCodeBlock';
import ApiLink from '@site/src/components/ApiLink';
import HttpCrawlerSource from '!!raw-loader!roa-loader!./http_crawler.ts';

- This example demonstrates how to use <ApiLink to="http-crawler/class/HttpCrawler">`HttpCrawler`</ApiLink> to crawl a list of URLs from an external file, load each URL using a plain HTTP request, and save HTML.
+ This example demonstrates how to use <ApiLink to="http-crawler/class/HttpCrawler">`HttpCrawler`</ApiLink> to build a crawler that crawls a list of URLs from an external file, loads each URL using a plain HTTP request, and saves the HTML.

<RunnableCodeBlock className="language-js" type="cheerio">
{HttpCrawlerSource}
5 changes: 3 additions & 2 deletions docs/examples/http_crawler.ts
@@ -35,8 +35,8 @@ const crawler = new HttpCrawler({
// Store the results to the dataset. In local configuration,
// the data will be stored as JSON files in ./storage/datasets/default
await Dataset.pushData({
-        url: request.url,
-        body,
+        url: request.url, // URL of the page
+        body, // HTML code of the page
});
},

@@ -47,6 +47,7 @@ const crawler = new HttpCrawler({
});

// Run the crawler and wait for it to finish.
+ // It will crawl a list of URLs from an external file, load each URL using a plain HTTP request, and save HTML
await crawler.run([
'https://crawlee.dev',
]);
