Simplify jittering #3602

Open
joelostblom opened this issue Sep 20, 2024 · 0 comments

@joelostblom
Contributor

What is your suggestion?

I think jittering is an important and often used functionality, especially for statistical and scientific charts. Ideally, I believe it should be so simple to use that it can be introduced right after showing how to do a simple categorical scatter/dot plot as a way to deal with there being too many points. For example, from this simple example:

import altair as alt
from vega_datasets import data

source = data.cars.url

alt.Chart(source).mark_point().encode(
    x="Horsepower:Q",
    y="Cylinders:O",
    color='Cylinders:N'
)

image

I would like to reach the following chart with minimal modifications:

image

However, jittering is currently somewhat tricky to use correctly, and there is in fact no straightforward path to this chart. Instead, we have to either use a calculate transform and set the offset domain to avoid overlapping points (when using the offset channel encoding), or manually specify a multiplication constant and a centering offset (when using the mark config). I opened vega/vega-lite#9438, vega/vega-lite#9437, and vega/vega-lite#9436 to try to address these issues. If those are all fixed, we would be able to use a syntax like this to produce the chart above:

alt.Chart(source).mark_point(yOffset=alt.expr('random()')).encode(
    x="Horsepower:Q",
    y="Cylinders:O",
    color='Cylinders:N'
)

This is quite simple, although it involves the introduction of expr. I'm not entirely against that as it might be beneficial to see this early on and learn that it is a building block to work with later. However, part of me wonders if we should have a dedicated string function for jittering, something like:

alt.Chart(source).mark_point().encode(
    x="Horsepower:Q",
    y="Cylinders:O",
    color='Cylinders:N',
    yOffset='jitter()'
)

Then jitter() could take a numerical argument that specifies how much of the bandwidth should be used for the (random) jittering of the points. This would be quite different from all our other magical string functions, which reference aggregation methods, so I'm not sold on this idea, but I thought I would write it down here while we see what happens with the VL issues I have opened.

Have you considered any alternative solutions?

Arguably, the jitter() function could be implemented in VL instead, but it would still need to be passed as an expression, and the offset channels do not yet support expressions.
