Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Streaming CSV processing/plotting #27

Merged
merged 26 commits into from
Dec 14, 2015
Merged

Streaming CSV processing/plotting #27

merged 26 commits into from
Dec 14, 2015

Conversation

breznak
Copy link
Member

@breznak breznak commented Dec 9, 2015

Fixes #26

@jefffohl
Copy link
Member

jefffohl commented Dec 9, 2015

@breznak - I ran a test of this, and it did not seem to work. When I tried to load a file nothing happens, and I got a lot of stuff logged to the console.

@breznak
Copy link
Member Author

breznak commented Dec 9, 2015

I'm not sure what changed (broken merge, the newer libs) but I'm getting this error too. Will try to investigate.

Conflicts:
	build/app.js
	client/src/app/appCtrl.js
@breznak
Copy link
Member Author

breznak commented Dec 9, 2015

This needs some more love, but I got it working.

  • Not sure if really much faster, but should hog less the browser on large files.
    • we should validate with a 2M points file
  • a strange error!: opening a 2nd file keeps the CSV header settings from the former, thus those errors on "Mismatched count ...". A solution is refresh the browser. @jefffohl any idea how to cleanup properly?
  • after each step(s) when you see console "draw", I'd like to paint a new graph. Why does it not happen? Or is the rendering just that slow so a new batch arrives before it finished?

When ready, this could be a good way to #17

@breznak
Copy link
Member Author

breznak commented Dec 9, 2015

@jefffohl @rhyolight can you take a look at c2c14a7 please? I want to load a small(er) batch, render it quickly, repeate.
But I don't know how exactly to redraw the graph?
Also, do DyGraphs support something like updates? That I keep what is rendered (just shrink it) and append a few of new data?

@jefffohl
Copy link
Member

jefffohl commented Dec 9, 2015

@breznak You can update the graph data by calling the updateOptions function. You can see this being called here: https://github.com/nupic-community/nupic.visualizations/blob/master/client/src/app/appCtrl.js#L192

I can start working on this if you would like - but I don't want to work in parallel with you - so let me know.

@jefffohl
Copy link
Member

jefffohl commented Dec 9, 2015

FYI - here is the Dygraphs API reference: http://dygraphs.com/jsdoc/symbols/Dygraph.html

@breznak
Copy link
Member Author

breznak commented Dec 9, 2015

I can start working on this if you would like - but I don't want to work in parallel with you - so let me know.

That would be great @jefffohl ! I'll take a break and sleep now :) DyGraph.updateOption looks like what we need - called from Papa.step() with the small batch payload. Would be awesome if we can get the "streaming data" done! For now I think we could assume NuPIC runs slower than the data appears, so we don't have to bother with polling and sleeps.

@jefffohl
Copy link
Member

@breznak - I started on this today, but didn't get too far yet. I will work on it again tomorrow, unless you are eager to jump back in. If so - let me know.

@jefffohl
Copy link
Member

@breznak - due to requests from @rhyolight i am prioritizing #40. i still plan on working on this though. let me know your thoughts.

…s. Note that this does not yet allow for streamed files over HTTP.
@jefffohl
Copy link
Member

@breznak PR #54 could possibly replace this. Thoughts?

Fixes #26. Chunked streaming in PapaParse
@breznak
Copy link
Member Author

breznak commented Dec 14, 2015

@jefffohl thanks for #54 (you made a PR to merge into this PR, not master, which confused me..sorry.) Anyway, this is your work from #54.
I've added a new example file examples/CSV/huge_2M.csv and it uncovered a bug in the current implementation:

  • the first chunk seems to determine the x-axis - reading data > CHUNK_SIZE does not extend the axis, instead, the data is written over the existing one.

@jefffohl
Copy link
Member

@breznak - ok, taking a look.

@jefffohl
Copy link
Member

@breznak - this is happening only for files without a timestamp. Working on a fix.

One other thing I noticed with very large files is that there is a limit as to what DyGraphs can handle. My system locked up trying to render the 60MB file. For files this large, it seems we will need to figure out a way to enforce windowing or something.

@breznak
Copy link
Member Author

breznak commented Dec 14, 2015

this is happening only for files without a timestamp. Working on a fix.

Thanks @jefffohl ! I've noticed another thing, when the timestamp is not monotonic the graph is rendered over the exiting values (i think that is an OK behavior, as we expect time to be monotonically increasing only)- but this happens when you append 2 files, maybe we could raise a warning "Your timestamp is not monothonic." ?

My system locked up trying to render the 60MB file. For files this large, it seems we will need to figure out a way to enforce windowing or something.

@jefffohl
Copy link
Member

I made a new PR to the speedup_csv_parse branch from my version of that branch, which fixes the problem of series without timestamps overwriting the iteration value.

Addresses #26. Fixes problem of files that have no timestamp.
@jefffohl
Copy link
Member

@breznak - I am going to merge this into master, if that is OK with you. For your recent comments regarding enforced windowing, and monotonic alerts, lets make some new issues for those.

breznak added a commit that referenced this pull request Dec 14, 2015
@breznak breznak merged commit a6a684a into master Dec 14, 2015
@breznak breznak deleted the speedup_csv_parse branch December 14, 2015 18:47
@breznak
Copy link
Member Author

breznak commented Dec 14, 2015

Thank you Jeff! merged the branch and I'll make the issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants