
500 error on upload w/ missing col/row values #45

Closed
rebeccabilbro opened this issue Jan 18, 2016 · 8 comments


@rebeccabilbro
Member

I'm getting an error when I attempt to upload datasets that have missing values in some of the columns/rows. I noticed this because a lot of government datasets use the first few rows of a table for metadata.

@rebeccabilbro
Member Author

good (terrible) example: https://www.ssa.gov/foia/html/FY08CSV.csv

@rebeccabilbro
Member Author

another great one: http://www.planecrashinfo.com/1920/1920.htm

@bbengfort
Member

Great examples - I definitely noticed all the error messages that came up as you were experimenting! The underlying error seems to be a Unicode decoding error, which is potentially more serious. It raises the question of whether these files are actually Unicode encoded or use some other scheme (which would make things much more difficult).
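For reference, this is a minimal sketch (not from this project's code) of how that kind of decoding error surfaces in Python when a file's bytes aren't valid UTF-8:

```python
# Hypothetical illustration: bytes from a Latin-1 encoded file are
# not valid UTF-8, so decoding them as UTF-8 raises an error.
data = "café".encode("latin-1")  # b'caf\xe9'

try:
    data.decode("utf-8")
except UnicodeDecodeError as exc:
    print("decode failed:", exc)

# Decoding with the correct codec works fine.
assert data.decode("latin-1") == "café"
```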

@rebeccabilbro
Member Author

Hmm, sounds like it's potentially related to my #43 then?

@bbengfort
Member

Potentially, though encoding detection is an annoying task that's tough to get right. You could use the file command in your terminal to see if your computer can identify the encoding. It's definitely something I'll take a look at.
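As an alternative to the file command, a brute-force approach in Python is to try a list of candidate codecs in order. This is just a sketch; the helper name and the candidate list are made up, not anything from this project:

```python
def sniff_encoding(raw: bytes, candidates=("utf-8", "utf-16", "latin-1")):
    """Return the first candidate codec that decodes raw without error.

    Note: latin-1 can decode any byte sequence, so it acts as a
    catch-all fallback and should come last in the list.
    """
    for enc in candidates:
        try:
            raw.decode(enc)
            return enc
        except (UnicodeDecodeError, UnicodeError):
            continue
    return None

print(sniff_encoding(b"plain ascii"))           # utf-8 (ascii is a subset)
print(sniff_encoding("café".encode("utf-16")))  # utf-16 (BOM breaks utf-8)
```

The obvious caveat is that "decodes without error" is not the same as "decoded correctly" - a wrong codec can silently produce garbage, which is why real detectors use statistical heuristics instead.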

@rebeccabilbro
Member Author

Found some more test data that might help with this issue, see: https://github.com/okfn/messytables/tree/7e4f12abef257a4d70a8020e0d024df6fbb02976/horror

@lauralorenz

Ok, so specifically for the files @rebeccabilbro first linked, the unicode decode errors have been solved, presumably by the Python 3.x upgrade: the default encoding is now utf-8 instead of ascii, so these utf-8 (or utf-8-subset) encoded files no longer cause unicode errors. Today I tested the CSVs from https://www.ssa.gov/foia/html/FY08CSV.csv and https://catalog.data.gov/dataset/veterans-health-administration-2008-hospital-report-card-patient-satisfaction, and the HTML from http://www.planecrashinfo.com/1920/1920.htm, with both file storage and the S3 backend, and none of them caused an error.

How we want to deal with files that are not utf-8 encoded is a much broader question. For example, a utf-16le encoded file (e.g. https://github.com/okfn/messytables/blob/7e4f12abef257a4d70a8020e0d024df6fbb02976/horror/utf-16le_encoded.csv) won't work right now, since utf-16le is not a subset of utf-8, but I'm not sure yet whether we really care. IMHO, it was unreasonable to expect ascii encoding, but it is not unreasonable to expect utf-8 (or a utf-8 subset).
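To make the subset point concrete, here is a small sketch (illustrative only, not project code) showing why a utf-16le file fails under a utf-8 default while plain ascii content does not:

```python
text = "id,name\n1,café\n"
raw = text.encode("utf-16-le")

# UTF-16LE bytes are not valid UTF-8 (0xe9, the low byte of 'é',
# is a UTF-8 lead byte with no valid continuation), so a UTF-8
# reader raises an error.
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print("utf-8 reader fails:", exc.reason)

# ASCII bytes, by contrast, are a strict subset of UTF-8.
assert "id,name\n".encode("ascii").decode("utf-8") == "id,name\n"
```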

So, given all of that, I am going to close this issue with respect to the scope of the initial bug. However, I will make a note on the roadmap issue to consider more generally how we want to deal with non-utf-8-subset encodings in this project in the future. cc @rebeccabilbro @ojedatony1616 @bbengfort @looselycoupled
