record dumper assumes content type and content length #17

Open
jnioche opened this issue Jul 20, 2016 · 0 comments

jnioche commented Jul 20, 2016

As stated in the WARC 1.0 specification (http://iipc.github.io/warc-specifications/specifications/warc-format/warc-1.0/#content-type):

All records with a non-empty block (non-zero Content-Length), except ‘continuation’ records, should have a Content-Type field. Only if the media type is not given by a Content-Type field, a reader may attempt to guess the media type via inspection of its content and/or the name extension(s) of the URI used to identify the resource. If the media type remains unknown, the reader should treat it as type “application/octet-stream”.

This is a "should", not a "must". The record dumper should therefore not assume that a record has a content type or a content length. It currently crashes on records that lack these fields; it should handle them gracefully instead.
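
To illustrate the kind of defensive handling meant here, below is a minimal sketch in Python. It is not the record dumper's actual code: the function name `describe_block`, the dict-based header representation, and the choice to report a missing or unparsable length as `None` are all assumptions made for the example. The fallback to `application/octet-stream` follows the specification text quoted above.

```python
from typing import Mapping, Optional, Tuple

# Fallback media type named by the WARC specification for unknown content.
DEFAULT_CONTENT_TYPE = "application/octet-stream"


def describe_block(headers: Mapping[str, str]) -> Tuple[str, Optional[int]]:
    """Return (content_type, content_length) without assuming either header exists.

    Hypothetical helper: `headers` is assumed to be a plain mapping of
    WARC field names to their string values.
    """
    # Field names are matched case-insensitively; surrounding whitespace is dropped.
    normalized = {name.strip().lower(): value.strip() for name, value in headers.items()}

    # Missing or empty Content-Type: fall back instead of raising.
    content_type = normalized.get("content-type") or DEFAULT_CONTENT_TYPE

    # Missing, non-numeric, or negative Content-Length: report None instead of raising.
    raw_length = normalized.get("content-length")
    try:
        content_length = int(raw_length) if raw_length is not None else None
    except ValueError:
        content_length = None
    if content_length is not None and content_length < 0:
        content_length = None

    return content_type, content_length
```

A caller could then print a placeholder such as "unknown" when the length is `None`, rather than failing on the record.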
