Update docs for PLAYWRIGHT_PROCESS_REQUEST_HEADERS setting
elacuesta committed Jul 16, 2024
1 parent 35d48b7 commit d288003
Showing 1 changed file with 43 additions and 3 deletions.
46 changes: 43 additions & 3 deletions README.md
@@ -288,12 +288,17 @@ default headers could be sent as well). Coroutine functions (`async def`) are su
This will be called at least once for each Scrapy request, but it could be called additional times
if Playwright generates more requests (e.g. to retrieve assets like images or scripts).

-The function must return a `dict` object, and receives the following positional arguments:
+The function must return a `Dict[str, str]` object, and receives the following three **keyword** arguments:

```python
-- browser_type: str
+- browser_type_name: str
 - playwright_request: playwright.async_api.Request
-- scrapy_headers: scrapy.http.headers.Headers
+- scrapy_request_data: dict
+    * method: str
+    * url: str
+    * headers: scrapy.http.headers.Headers
+    * body: Optional[bytes]
+    * encoding: str
```

The default function (`scrapy_playwright.headers.use_scrapy_headers`) tries to
@@ -308,6 +313,41 @@ set by Playwright will be sent. Keep in mind that in this case, headers passed
via the `Request.headers` attribute or set by Scrapy components are ignored
(including cookies set via the `Request.cookies` attribute).

Example:
```python
from typing import Dict

import playwright.async_api


async def custom_headers(
    *,
    browser_type_name: str,
    playwright_request: playwright.async_api.Request,
    scrapy_request_data: dict,
) -> Dict[str, str]:
    # Start from the headers Playwright would send on its own.
    headers = await playwright_request.all_headers()
    if browser_type_name == "firefox":
        headers["User-Agent"] = "asdf"
    else:
        scrapy_headers = scrapy_request_data["headers"].to_unicode_dict()
        headers["Content-Type"] = scrapy_headers.get("Content-Type")
    return headers

PLAYWRIGHT_PROCESS_REQUEST_HEADERS = custom_headers
```
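
As a further illustration, here is a minimal sketch (not part of scrapy-playwright) of a function that overlays every Scrapy header on top of the headers Playwright would send. The name `merge_scrapy_headers` is hypothetical. The sketch assumes `scrapy_request_data["headers"]` is a `scrapy.http.headers.Headers` object, as listed above, and that `Request.all_headers()` returns lower-cased header names, as the Playwright documentation states.

```python
from typing import Any, Dict


async def merge_scrapy_headers(
    *,
    browser_type_name: str,
    playwright_request: Any,  # a playwright.async_api.Request at runtime
    scrapy_request_data: dict,
) -> Dict[str, str]:
    # Copy the headers Playwright would send; names are assumed lower-cased.
    headers = dict(await playwright_request.all_headers())
    scrapy_headers = scrapy_request_data["headers"].to_unicode_dict()
    for name, value in scrapy_headers.items():
        # Skip missing values so the result stays a Dict[str, str].
        if value is not None:
            # Lower-case the name to avoid sending the same header twice.
            headers[name.lower()] = value
    return headers
```

Lower-casing the Scrapy header names is a design choice of this sketch: it lets a Scrapy value replace the Playwright one instead of adding a second entry that differs only in case.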

#### Deprecated argument handling

In version 0.0.39 and earlier, arguments were passed to the function positionally,
and only the Scrapy headers were passed instead of a dictionary with data about the
Scrapy request.
This has been deprecated since version 0.0.40, and support for this way of handling arguments
will eventually be removed in accordance with the [Deprecation policy](#deprecation-policy).

Passed arguments:
```python
- browser_type: str
- playwright_request: playwright.async_api.Request
- scrapy_headers: scrapy.http.headers.Headers
```
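
One possible way to keep a function written for the deprecated positional style working is to wrap it in a keyword-only adapter. The sketch below assumes a synchronous legacy function. `adapt_legacy_headers_function` is a hypothetical helper, not part of scrapy-playwright.

```python
from typing import Any, Callable, Dict


def adapt_legacy_headers_function(
    legacy_func: Callable[[str, Any, Any], Dict[str, str]],
) -> Callable[..., Dict[str, str]]:
    """Wrap a positional-style function so it accepts the keyword arguments."""

    def wrapper(
        *,
        browser_type_name: str,
        playwright_request: Any,
        scrapy_request_data: dict,
    ) -> Dict[str, str]:
        # The legacy style received only the Scrapy headers as third argument.
        return legacy_func(
            browser_type_name,
            playwright_request,
            scrapy_request_data["headers"],
        )

    return wrapper
```

The wrapped function could then be assigned to `PLAYWRIGHT_PROCESS_REQUEST_HEADERS` in place of the original one.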

Example:
```python
def custom_headers(
    browser_type: str,