Read files into memory #25
This is currently the default behavior. qsv reads the file into memory and then starts to parse the CSV into an array of objects; after that the array gets processed according to the SQL statement. The problem occurs in one of three places: 1. parsing the CSV, 2. executing the SQL statement, or 3. rendering the result into a table. I'm not sure yet if we are exceeding the size limit of an array or if there is some kind of memory leak 🤔 Related: #22
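Roughly, the in-memory flow looks like this (a simplified sketch, not the actual qsv source; `parseCsv`, `runSql`, and `renderTable` are placeholder names for the respective stages):

```js
const fs = require('fs');

// 1. Read the whole file into memory at once
const raw = fs.readFileSync('data.csv', 'utf8');

// 2. Parse the CSV into an array of row objects (placeholder parser)
const rows = parseCsv(raw);

// 3. Run the SQL statement over the array (placeholder)
const result = runSql(rows, 'SELECT * FROM data');

// 4. Render everything into one big table string and print it
console.log(renderTable(result));
```

Any of steps 2-4 could be the point where memory blows up, since each one keeps a full copy of the data in a different representation.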
So I tested on the 28 MB StackOverflow results file, and have the following observations:
I have a feeling that your point "3. rendering the result to a table" might just be the bottleneck. I've come across such issues before where displaying to the terminal is slow (because the terminal is buffered, I guess?). Any idea how this can be examined/confirmed?
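One cheap way to confirm it, without a full profiler, would be to log `process.memoryUsage()` between the stages (a rough sketch; the stage boundaries and helper names are assumptions about where qsv does its work, not actual qsv code):

```js
function logMem(label) {
  const { heapUsed, rss } = process.memoryUsage();
  console.error(`${label}: heapUsed=${(heapUsed / 1e6).toFixed(1)} MB, rss=${(rss / 1e6).toFixed(1)} MB`);
}

logMem('before read');
const raw = require('fs').readFileSync('data.csv', 'utf8');
logMem('after read');
const rows = parseCsv(raw);      // placeholder for the CSV parsing stage
logMem('after parse');
const table = renderTable(rows); // placeholder for the table rendering stage
logMem('after render');
```

Whichever stage shows the biggest jump is the one worth attacking first.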
You can have a look at memory usage with the Chrome DevTools, for example in VS Code, or with ndb as a standalone tool. I'm not really experienced with the memory debugging tools, though.

What I've seen is that the file's content, in the state before it gets parsed, doesn't get garbage collected. I think that's not a huge issue as long as files are small, because the memory gets allocated only once, so it's not a classic memory leak that keeps growing, but nevertheless it should be fixed. The other thing I found is a huge collection of strings that are ready to be rendered (ANSI escape codes already applied). As far as I can tell, this is the point where we're running out of memory, because a lot of ANSI escape sequences get added to the data. Another good claim for point 3 is that currently the …

So to make it work, my idea would be to try streaming the results to the terminal instead of trying to dump the complete thing at once.
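To illustrate the streaming idea (not qsv code; `formatRow` stands in for whatever applies the ANSI styling), the renderer would write each row as soon as it is formatted instead of building one giant string:

```js
// Dump-at-once: every styled row string lives in memory until the final join
const output = rows.map(formatRow).join('\n');
console.log(output);

// Streaming: each row is written immediately and can be garbage collected right away
for (const row of rows) {
  process.stdout.write(formatRow(row) + '\n');
}
```

The trade-off is that column widths can no longer be computed from the whole result set up front, so the table layout would need to be fixed or estimated in advance.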
I've just created #29 so we can test a version that supports streams. If we get this working, we can support files of (theoretically) unlimited size.
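For the input side, a line-by-line pipeline built only on Node core modules could look like this (a minimal sketch under assumptions, not the #29 implementation; the naive comma split is just for illustration, a real version would use a proper streaming CSV parser that handles quoting):

```js
const fs = require('fs');
const readline = require('readline');

async function processCsv(path, onRow) {
  const rl = readline.createInterface({
    input: fs.createReadStream(path),
    crlfDelay: Infinity,
  });

  let header = null;
  for await (const line of rl) {
    const cells = line.split(','); // naive split, illustration only
    if (!header) {
      header = cells;
      continue;
    }
    // Hand each row to the SQL layer / renderer as it arrives,
    // so only one row needs to be in memory at a time.
    onRow(Object.fromEntries(header.map((h, i) => [h, cells[i]])));
  }
}
```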
For smaller files (say, less than 50 MB?), maybe we should have an option to read them entirely into memory and then run operations on them. Will it speed things up? If yes, maybe we should make this the first question asked when a file is loaded. We can caution the end user that importing into memory might take a long time.
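If that option gets added, the decision could look roughly like this (a sketch only; `chooseMode` and the 50 MB threshold are assumptions, not existing qsv behavior):

```js
const fs = require('fs');
const readline = require('readline');

const IN_MEMORY_LIMIT = 50 * 1024 * 1024; // 50 MB, the threshold suggested above

async function chooseMode(path) {
  const { size } = fs.statSync(path);
  if (size > IN_MEMORY_LIMIT) return 'stream'; // large files: always stream

  // Small file: ask whether to trade memory for (possibly) faster queries
  const rl = readline.createInterface({ input: process.stdin, output: process.stdout });
  const answer = await new Promise((resolve) =>
    rl.question('Load the whole file into memory? This can take a while. (y/N) ', resolve)
  );
  rl.close();
  return answer.trim().toLowerCase() === 'y' ? 'memory' : 'stream';
}
```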