Change headers to a dict that parses comma-separated values #7679
Some questions to consider here:
I think all of these being done directly in the parsers is best for performance, and would make for easier and less error-prone usage.
HEXDIGIT = re.compile(rb"[0-9a-fA-F]+")


class HeadersDictProxy(Mapping[str, str]):
Have you compared the performance of this against inheriting from multidict and overriding how the data is stored and how the combined values are output?
Implementation details are not a concern yet, so I will come back to that. First, I want to get consensus that this is the correct approach and should be changed in v4, given that it is likely to cause backwards-compatibility breakages for at least a small proportion of users.
If you've got any information in the specs to answer those questions, that'd be great to have.
Just by their nature, I think it's certainly safe to deduplicate the content negotiation fields defined in Section 12.5 of RFC 9110. However, AFAICT, the RFC has no guidance on what quality value to assign if the duplicates happen to disagree. Seems like server's choice in that edge case would be conformant. Other list headers should not be deduplicated because duplicates can actually mean something. For example,
OK, now having had time to think this over, I think it's all a level above what we should be doing here. I think we're just providing a list based on the definition of a general HTTP field. So, I feel the answer is no to all 3 questions. Maintaining logic for all the different kinds of headers seems out of scope to me (and if we did, it'd be through dedicated attributes, like cookies).
After reading the spec a bit more, I guess I agree with you on the first two, but parameters are actually generically defined in section 5.6.6. The syntax and case-insensitivity of parameter names are defined there (the content negotiation headers just happen to use "q" as the name). I think the parsers should provide a way to access them as a dictionary (maybe by returning a list subclass?).
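The dictionary-style access to parameters suggested above could look roughly like the sketch below. It is not code from this PR: `split_parameters` and `_unquote` are hypothetical names, the function works on a single list element rather than a whole header, and it assumes the generic `;name=value` syntax of RFC 9110 section 5.6.6 with parameter names lowercased to reflect their case-insensitivity.

```python
import re
from typing import Dict, List, Tuple


def _unquote(value: str) -> str:
    """Strip surrounding double quotes and undo backslash escapes (hypothetical helper)."""
    if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
        return re.sub(r"\\(.)", r"\1", value[1:-1])
    return value


def split_parameters(item: str) -> Tuple[str, Dict[str, str]]:
    """Split one list element, e.g. 'text/html;Q=0.8', into (value, parameters)."""
    parts: List[str] = []
    current: List[str] = []
    in_quotes = False
    escaped = False
    for ch in item:
        if escaped:
            escaped = False  # this character was escaped by a backslash
        elif in_quotes and ch == "\\":
            escaped = True
        elif ch == '"':
            in_quotes = not in_quotes
        elif ch == ";" and not in_quotes:
            # A semicolon outside a quoted string ends the current part.
            parts.append("".join(current))
            current = []
            continue
        current.append(ch)
    parts.append("".join(current))

    params: Dict[str, str] = {}
    for raw in parts[1:]:
        name, sep, value = raw.strip(" \t").partition("=")
        if not sep:
            continue  # empty or malformed parameter: skipped in this sketch
        params[name.lower()] = _unquote(value)
    return parts[0].strip(" \t"), params
```

A list subclass, as floated in the comment, could call something like this lazily per element; whether that belongs in the parsers is exactly the open question here.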
(('"applebanna, this',), ('"applebanna', "this")),
(('fooo", "bar"',), ('fooo"', "bar")),
((" spam , eggs ",), ("spam", "eggs")),
((" spam ", " eggs "), ("spam", "eggs")),
TODO: Add tests for escaped quotes (e.g. "foo\"bar"), maybe also escaped backslash, if that's valid (e.g. "foo\\" or "foo\\\"").
This is a proposal to change the headers from a CIMultiDict to a more regular dict (in v4). The problem with the multidict approach is that list headers (i.e. headers that can have multiple values) can have values combined in single headers and/or split over multiple headers.
Basically, these 2 payloads should be considered equivalent:
But, currently aiohttp will produce in the first case:
And, in the second case:
The spec recommends concatenating duplicate headers together with ", ". This is also what the vast majority of existing software does (including requests).
The only problem with this is that if the user wants the values as a list, they are left to parse the value themselves, which, once quoted values are accounted for, becomes quite complex and easy to get wrong in edge cases. So, in this proposal I've concatenated the values as recommended, but added a .getall() method which parses the final value to get the list.
With the code in this PR, both of the previous payloads produce the same output:
From kenballus's testing:
I've also seen that requests combines into a regular dict. The only other library I've seen that uses a multidict for this is Starlette.
Also: https://www.rfc-editor.org/rfc/rfc9110.html#name-recipient-requirements
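To make the combine-then-split idea from the description concrete, here is a minimal sketch. It is not the implementation in this PR: `combine_field_lines` and `split_list_value` are hypothetical names, and the splitter is deliberately simplified. It keeps quoted strings intact (quotes included) and treats an unbalanced quote as running to the end of the value, which differs from what the PR's test cases expect for malformed input.

```python
from typing import Iterable, List


def combine_field_lines(values: Iterable[str]) -> str:
    """Join repeated field lines, in order, with ", " as the spec recommends."""
    return ", ".join(v.strip(" \t") for v in values)


def split_list_value(value: str) -> List[str]:
    """Split a combined value on commas that are outside quoted strings."""
    items: List[str] = []
    current: List[str] = []
    in_quotes = False
    escaped = False
    for ch in value:
        if escaped:
            current.append(ch)  # character escaped by a preceding backslash
            escaped = False
        elif in_quotes and ch == "\\":
            current.append(ch)
            escaped = True
        elif ch == '"':
            current.append(ch)
            in_quotes = not in_quotes
        elif ch == "," and not in_quotes:
            items.append("".join(current).strip(" \t"))
            current = []
        else:
            current.append(ch)
    items.append("".join(current).strip(" \t"))
    # Empty elements (e.g. from "a,,b") are dropped.
    return [item for item in items if item]
```

With this sketch, one field line carrying " spam , eggs " and two field lines carrying " spam " and " eggs " both yield the same list, which is the equivalence the proposal is after.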