An example use-case for Queues: a web crawler built on Browser Rendering and Puppeteer. The crawler counts the links to Cloudflare.com on each site it visits and archives a screenshot to Workers KV.
For this project, Queues batches the sites to be crawled, which limits the overhead of opening and closing new Puppeteer instances. Because loading pages and scraping links takes time, Queues makes it possible to respond to inbound crawl requests instantly, with confidence that the long-running crawl will still be triggered. Queues also helps absorb bursty traffic and recover from transient failures!
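The "respond instantly, crawl later" pattern can be sketched as a small producer Worker. This is a minimal sketch, not the project's actual code: the `CRAWLER_QUEUE` name matches the Queue Producer binding configured in the setup steps, but the JSON request shape (`{ url }`) is an assumption for illustration.

```typescript
// Hypothetical producer sketch: accept a crawl request, enqueue it, and
// return immediately instead of running the slow Puppeteer crawl inline.
// CRAWLER_QUEUE matches the Queue Producer binding from the setup steps;
// the { url } request body shape is an assumption for illustration.

export interface Env {
  CRAWLER_QUEUE: { send(message: unknown): Promise<void> };
}

const worker = {
  async fetch(request: Request, env: Env): Promise<Response> {
    const { url } = (await request.json()) as { url?: string };
    if (!url) {
      return new Response("missing url", { status: 400 });
    }
    // Hand the long-running work to the queue and respond at once.
    await env.CRAWLER_QUEUE.send({ url });
    return new Response("crawl queued", { status: 202 });
  },
};

export default worker;
```

The caller gets a `202 Accepted` right away; the crawl itself happens whenever the queue consumer picks the message up.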
This assumes you have access to the Browser Rendering feature - you can join the waitlist here.
First, fork this project. Install Node.js and Wrangler, then run `npm install`.
Then, to configure your project and deploy on Cloudflare Workers:
- Go to the Dash and click on Workers & Pages > Queues > Create queue. Enter a Queue name.
- In the `pages` directory, run `wrangler pages deploy .`, and enter a project name (`PROJECT_NAME`).
- Go to the Dash and click on Workers & Pages > Overview > `PROJECT_NAME` > Settings > Functions > Queue Producers bindings > Add binding. Set the variable name to `CRAWLER_QUEUE` and select your queue as the Queue name. Click "Save".
- In the Dash, click on Workers & Pages > KV > Create a namespace. Create one namespace called `crawler_screenshots` and one called `crawler_links`.
- Create two KV namespace bindings. Set `CRAWLER_LINKS_KV` as the first's variable name and `crawler_links` as the KV namespace. Then set `CRAWLER_SCREENSHOTS_KV` as the second's variable name and `crawler_screenshots` as the KV namespace.
- In the `consumer` directory, update the `wrangler.toml` file with your new KV namespace IDs. Also update the `[[queues.consumers]]` name to the Queue you created.
- In the `consumer` directory, run `wrangler deploy`.
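After those edits, the consumer's `wrangler.toml` might look roughly like this. This is a sketch: every ID and name here is a placeholder to replace with your own values, and fields like `name` and `main` are assumptions about the project layout.

```toml
name = "queues-web-crawler"
main = "src/worker.ts"

# Placeholder IDs - replace with the IDs of the namespaces you created.
kv_namespaces = [
  { binding = "CRAWLER_LINKS_KV", id = "<crawler_links namespace id>" },
  { binding = "CRAWLER_SCREENSHOTS_KV", id = "<crawler_screenshots namespace id>" },
]

[[queues.consumers]]
queue = "<your queue name>"
```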
Your Queues-powered web crawler will be live!
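The batching benefit described above can be sketched as the consumer's batch loop. The real project drives `@cloudflare/puppeteer`, so the `crawl` and `store` steps here are injected stand-ins (assumptions for illustration) that keep the sketch self-contained:

```typescript
// Hypothetical consumer sketch: one batch shares a single browser
// session, so Puppeteer start-up cost is paid once per batch rather than
// once per URL. crawl() and store() stand in for the real Puppeteer and
// Workers KV calls and are assumptions for illustration.

type QueueMessage<T> = { body: T; ack(): void };

export async function processBatch(
  messages: QueueMessage<{ url: string }>[],
  crawl: (url: string) => Promise<number>, // count links to Cloudflare.com
  store: (url: string, linkCount: number) => Promise<void>, // e.g. a KV put
): Promise<void> {
  // In the real worker, a browser would be launched once here and reused
  // for every message in the batch.
  for (const message of messages) {
    const links = await crawl(message.body.url);
    await store(message.body.url, links);
    message.ack(); // acknowledge so Queues won't redeliver this message
  }
}
```

Acknowledging each message only after its crawl result is stored is what gives the reliability mentioned earlier: if the consumer fails mid-batch, unacknowledged messages are redelivered.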