improve setting up
vdusek committed Aug 21, 2024
1 parent ca80ebd commit d565113
Showing 1 changed file with 39 additions and 15 deletions.
54 changes: 39 additions & 15 deletions docs/introduction/01-setting-up.mdx
@@ -20,52 +20,76 @@ pip --version
 
 ## Installation
 
-Crawlee is available as the [`crawlee`](https://pypi.org/project/crawlee/) PyPI package.
+Crawlee is available as the [`crawlee`](https://pypi.org/project/crawlee/) PyPI package. To install the core package, use:
 
 ```sh
 pip install crawlee
 ```
 
-Additional, optional dependencies unlocking more features are shipped as package extras.
+After installation, verify that Crawlee is installed correctly by checking its version:
 
-If you plan to parse HTML and use CSS selectors, install `crawlee` with either the `beautifulsoup` or `parsel` extra:
+```sh
+python -c 'import crawlee; print(crawlee.__version__)'
+```
+
+Crawlee offers several optional features through package extras. You can choose to install only the dependencies you need or install everything if you don't mind the package size.
+
+### Install all features
+
+If you do not care about the package size, install Crawlee with all features:
 
 ```sh
-pip install 'crawlee[beautifulsoup]'
+pip install 'crawlee[all]'
 ```
 
+### Installing only specific extras
+
+Depending on your use case, you may want to install specific extras to enable additional functionality:
+
+#### BeautifulSoup
+
+For using the [`BeautifulSoupCrawler`](https://crawlee.dev/python/api/class/BeautifulSoupCrawler), install the `beautifulsoup` extra:
+
 ```sh
-pip install 'crawlee[parsel]'
+pip install 'crawlee[beautifulsoup]'
 ```
 
-If you plan to use a (headless) browser, install `crawlee` with the `playwright` extra:
+#### Parsel
+
+For using the [`ParselCrawler`](https://crawlee.dev/python/api/class/ParselCrawler), install the `parsel` extra:
 
 ```sh
-pip install 'crawlee[playwright]'
+pip install 'crawlee[parsel]'
 ```
 
-Then, install the Playwright dependencies:
+#### Curl impersonate
+
+For using the [`CurlImpersonateHttpClient`](https://crawlee.dev/python/api/class/CurlImpersonateHttpClient), install the `curl-impersonate` extra:
 
 ```sh
-playwright install
+pip install 'crawlee[curl-impersonate]'
 ```
 
-You can install multiple extras at once by using a comma as a separator:
+#### Playwright
+
+If you plan to use a (headless) browser with [`PlaywrightCrawler`](https://crawlee.dev/python/api/class/PlaywrightCrawler), install Crawlee with the `playwright` extra:
 
 ```sh
-pip install 'crawlee[beautifulsoup,playwright]'
+pip install 'crawlee[playwright]'
 ```
 
-Or if you do not care about the package size, you can install everything:
+After installing the playwright extra, install the necessary Playwright dependencies:
 
 ```sh
-pip install 'crawlee[all]'
+playwright install
 ```
 
-Verify that Crawlee is successfully installed:
+### Installing multiple extras
+
+You can install multiple extras at once by using a comma as a separator:
 
 ```sh
-python -c 'import crawlee; print(crawlee.__version__)'
+pip install 'crawlee[beautifulsoup,curl-impersonate]'
 ```
 
 ## With Crawlee CLI
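
Note on the verification step in this change: `python -c 'import crawlee; ...'` confirms the core package imports, but says nothing about which extras ended up installed. A minimal sketch of a broader check follows, using only the standard library. The helper names (`installed_version`, `available_extras`) are hypothetical, and the extra-to-module mapping is an assumption about what each extra pulls in (for example, that `curl-impersonate` provides a `curl_cffi` module); it is not taken from the Crawlee docs.

```python
from __future__ import annotations

from importlib import metadata, util

# Assumption: the importable module each extra is expected to provide.
# These names are illustrative guesses, not confirmed by the Crawlee docs.
EXTRA_MODULES = {
    "beautifulsoup": "bs4",
    "parsel": "parsel",
    "curl-impersonate": "curl_cffi",
    "playwright": "playwright",
}


def installed_version(package: str = "crawlee") -> str | None:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def available_extras(modules: dict[str, str] = EXTRA_MODULES) -> dict[str, bool]:
    """Map each extra name to whether its assumed module can be located."""
    return {
        extra: util.find_spec(module) is not None
        for extra, module in modules.items()
    }


if __name__ == "__main__":
    print("crawlee version:", installed_version() or "not installed")
    for extra, present in available_extras().items():
        print(f"{extra}: {'available' if present else 'missing'}")
```

Reading the version from package metadata avoids importing `crawlee` itself, so the check still reports something useful when the package is absent. Note that a located `playwright` module does not imply the browsers from `playwright install` have been downloaded.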