display expression values in 2d bins #162

Open
slowkow opened this issue Mar 20, 2020 · 6 comments

Comments

slowkow commented Mar 20, 2020

I'd like to ask if it would be possible to change the way the data is displayed in the main window.

As far as I know, there are no scRNA-seq data browsers that use 2d bins to show expression data. I think it might be worth a try.

This article from the documentation for the Datashader Python package does a great job of showing why plotting colored dots is not optimal for large datasets.

https://datashader.org/user_guide/Plotting_Pitfalls.html

"Visualization is supposed to help you explore and understand your data, but if your visualizations are systematically misrepresenting your data because of overplotting, oversaturation, undersampling, undersaturation, underutilized range, and nonuniform colormapping, then you won't be able to discover the real qualities of your data and will be unable to make the right decisions."

I agree with the issues raised in the article, and in my own experience I've found that it's easier to see expression patterns when using 2d bins with the mean of all cells shown in each bin.

Here is the summary of the strategy recommended in the article:

  • bin the data in 2d (squares or hexagons)
  • show the summary (mean, median, percent > 0, etc.) for each bin with color
  • for the color scale, use the full range of colors by showing quantiles (if lazy, log-transform)
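The three steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not code from CellBrowser or Datashader: the function names `bin_means` and `quantile_ranks` are hypothetical, the coordinates and expression values are made up, and the color step uses a simple rank over distinct bin values as a stand-in for a quantile color scale.

```python
from collections import defaultdict
from statistics import mean


def bin_means(xs, ys, values, bin_size):
    """Group points into square bins and return {(col, row): mean value}."""
    groups = defaultdict(list)
    for x, y, v in zip(xs, ys, values):
        # Floor division assigns each point to the square that contains it.
        key = (int(x // bin_size), int(y // bin_size))
        groups[key].append(v)
    return {key: mean(vs) for key, vs in groups.items()}


def quantile_ranks(bin_values):
    """Map each bin's summary to a rank in [0, 1] for an even color scale."""
    ordered = sorted(set(bin_values.values()))
    if len(ordered) < 2:
        return {key: 0.0 for key in bin_values}
    rank = {v: i / (len(ordered) - 1) for i, v in enumerate(ordered)}
    return {key: rank[v] for key, v in bin_values.items()}


# Made-up coordinates and expression values, for illustration only.
xs = [0.1, 0.4, 1.2, 1.8, 2.5]
ys = [0.2, 0.3, 0.1, 0.9, 2.5]
expr = [0.0, 4.0, 1.0, 2.0, 100.0]

means = bin_means(xs, ys, expr, bin_size=1.0)
colors = quantile_ranks(means)
print(means)
print(colors)
```

With a linear color scale, the single bin with a mean of 100 would push the other two bins to nearly the same color. The rank-based scale spreads the three bins evenly across the color range.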

These examples show some of the issues with various strategies for representing large numbers of dots in two dimensions:

image

An example of a figure that uses the strategy recommended in the article looks like this:

image

I might try to work on this, and I'll try to share if I make progress.

My first idea is to try the density heatmap plot from vega, but there might be other approaches worth trying, too.

image

Author

slowkow commented Mar 22, 2020

At this point I've hacked together my own JavaScript that builds on top of your foundation. I'm thoroughly impressed with your code.

Here are my attempts with d3 and vega. Of course I'd be happy to share the code. Right now I'm still playing around — eventually I'll make a blog post or something.

CellBrowser

Here's a gene in CellBrowser:

image

d3

I got a prototype working with d3-hexbin, which shows the mean of the log2CPM expression value for cells in each bin:
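For readers who want to see the idea without d3, here is a rough Python sketch of hexagonal binning with a per-bin mean. It is not the d3-hexbin implementation and not the prototype's code. It assumes the common "two interleaved grids" approach, where each point goes to the nearest hexagon center. The names `hex_key` and `hex_means` and the sample values are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean


def hex_key(x, y, spacing):
    """Return a (col, row) key for the regular hexagon containing (x, y).

    Hexagon centers form two interleaved rectangular grids. The nearest
    center across both grids identifies the hexagon. The center of a key
    is (col * spacing / 2, row * spacing * sqrt(3) / 2).
    """
    sx = spacing
    sy = spacing * math.sqrt(3)
    # Candidate A: nearest center on the grid (i * sx, j * sy).
    ai, aj = round(x / sx), round(y / sy)
    # Candidate B: nearest center on the grid shifted by half a step.
    bi, bj = math.floor(x / sx), math.floor(y / sy)
    da = (x - ai * sx) ** 2 + (y - aj * sy) ** 2
    db = (x - (bi + 0.5) * sx) ** 2 + (y - (bj + 0.5) * sy) ** 2
    # Doubled coordinates keep keys from the two grids distinct.
    return (2 * ai, 2 * aj) if da <= db else (2 * bi + 1, 2 * bj + 1)


def hex_means(xs, ys, values, spacing):
    """Return {hex key: mean of the values of the points in that hexagon}."""
    groups = defaultdict(list)
    for x, y, v in zip(xs, ys, values):
        groups[hex_key(x, y, spacing)].append(v)
    return {key: mean(vs) for key, vs in groups.items()}


# Made-up embedding coordinates and log2CPM-like values.
result = hex_means([0.1, -0.1, 0.5], [0.1, -0.1, 0.8], [1.0, 3.0, 5.0], 1.0)
print(result)
```

The first two points fall in the hexagon centered at the origin and are averaged together. The third lands in the neighboring hexagon on the shifted grid.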

image

vega

I got a rough prototype working with the vega heatmap. However, I don't know how to show the mean of log2CPM values. I posted a new question on Stack Overflow, and I hope someone might be able to suggest a workaround.

So instead, this is actually showing the density of points in each square — weighted by the quantized expression values.

image
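To make the difference between the two summaries concrete, the sketch below compares a weighted count per square (the sum of expression values) with the mean per square (that sum divided by the number of cells). It is plain Python rather than a Vega specification, it ignores any smoothing a density estimator might apply, and the helper name `bin_sums_and_counts` and the data are made up for illustration.

```python
from collections import defaultdict


def bin_sums_and_counts(xs, ys, values, bin_size):
    """Return per-square sums of values and per-square point counts."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for x, y, v in zip(xs, ys, values):
        key = (int(x // bin_size), int(y // bin_size))
        sums[key] += v
        counts[key] += 1
    return dict(sums), dict(counts)


# Square (0, 0): four cells with low expression.
# Square (1, 0): one cell with high expression.
xs = [0.1, 0.2, 0.3, 0.4, 1.5]
ys = [0.1, 0.2, 0.3, 0.4, 0.5]
expr = [1.0, 1.0, 1.0, 1.0, 4.0]

sums, counts = bin_sums_and_counts(xs, ys, expr, bin_size=1.0)
# Dividing the weighted sum by the count recovers the mean per square.
means = {key: sums[key] / counts[key] for key in sums}
print(sums)
print(means)
```

Both squares have the same weighted sum, so a weighted density view colors them alike, while the mean separates the crowded low-expression square from the sparse high-expression one.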

@matthewspeir
Collaborator

Hey, @slowkow!

This is really cool and it seems like it would be a really great feature. We're currently focusing on bringing in new datasets, so I'm not sure we have any time to dedicate to this. If you have an idea for how to implement this on the Python backend, we can certainly draw hexagons in JavaScript. Basically, if you can implement this at least partially and then need help integrating it with the rest of our code, @maximilianh said he would be happy to help.

Thanks!

Owner

maximilianh commented Apr 13, 2020 via email

Author

slowkow commented Apr 13, 2020

Here's what I have been hacking on in the past few weeks. It seems to work pretty well. It is fun to build on top of the foundation that you built with the binary files and range requests 😀

Data from Smillie et al 2019

image (animated GIF)

Owner

maximilianh commented Apr 29, 2020 via email

Author

slowkow commented Nov 1, 2020

Hey Max, I apologize for the very long delay in response. Sometimes I forget to reply. Also, I was trying to decide for a long time whether I should contribute to your repo or create my own.

In the end, I made my own at https://github.com/slowkow/cellguide

I copied some of your files and then hacked new features until I had something that meets some of my needs. Of course, please feel free to copy anything — I kept the GPL-3 license. I like that I have the freedom to diverge in a different direction with my own repo.

I have more ideas for the future if you want to chat again sometime — let me know.
