That argument is exactly why ChunkedCSV.jl doesn't "jump and recover", even though I still think that is the most performant strategy. Ideally, the user would get to choose which strategy to use for their file (e.g. if the file contains no string fields, then what CSV.jl does is pretty much optimal and safe). Still, I think CSV.jl is safe enough in practice, and with some work it could be made entirely safe: it would just need to detect that it has reached an inconsistent state and use this information to retry with better chunking boundaries.
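To make the "detect and retry" idea concrete, here is a rough sketch in Python using only the standard library. It is not how CSV.jl or ChunkedCSV.jl are implemented; `speculative_split`, `parse_chunked`, and the column-count check are hypothetical names and heuristics chosen for illustration. It guesses a chunk boundary at the first newline after an arbitrary offset, parses both chunks independently, and falls back to a sequential parse if the rows look inconsistent.

```python
import csv
import io

# A quoted field containing a newline: the case that makes blind jumping risky.
SAMPLE = 'id,comment\n1,"line one\nline two"\n2,plain\n'

def parse_rows(text):
    """Parse CSV text sequentially with the standard library reader."""
    return list(csv.reader(io.StringIO(text, newline="")))

def speculative_split(text, offset):
    """Guess a chunk boundary: just past the first newline at or after `offset`."""
    cut = text.find("\n", offset)
    return len(text) if cut == -1 else cut + 1

def parse_chunked(text, offset, ncols):
    """Parse two chunks independently; fall back if the result looks inconsistent.

    The consistency check (every row has `ncols` fields) is only a heuristic
    assumed for this sketch. A bad split can still produce rows with the
    expected width, so this does not make the approach fully safe.
    """
    cut = speculative_split(text, offset)
    try:
        rows = parse_rows(text[:cut]) + parse_rows(text[cut:])
        consistent = all(len(row) == ncols for row in rows)
    except csv.Error:
        consistent = False
    if consistent:
        return rows, "chunked"
    # The guessed boundary was wrong: retry with a plain sequential parse.
    return parse_rows(text), "sequential fallback"

if __name__ == "__main__":
    for offset in (15, 25):
        rows, mode = parse_chunked(SAMPLE, offset, ncols=2)
        print(offset, mode, rows)
```

With an offset that lands inside the quoted field, the guessed boundary cuts a record in half, the second chunk yields a row of the wrong width, and the sketch falls back. An offset past the quoted field splits cleanly. A real implementation would presumably retry with an adjusted boundary rather than reparse everything, and would need a stronger check than row width.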
Andrew Gallant (aka burntsushi), author of ripgrep and xsv, wrote in 2020 that some CSVs won't work using CSV.jl's then-current strategy.
https://news.ycombinator.com/item?id=24747509
Just thought I'd bring it up in case there's something worth documenting here.