Add automatic triage utility #793
A different test case:
Looks good overall, a few comments.
Algorithmic parts seem reasonable to me, but would nonetheless likely benefit from some tests :-)
FWIW, I think this is a good target for property-based testing: the idea is that we randomly sample (or enumerate) some small scenarios and designate an XLA or JAX commit from which the CI will fail. So we'd mock `container_exists`, `check_container` and `build_and_test`, and then test the logic using a model like:
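```python
import datetime
import random

# Fabricate interleaved XLA and JAX commits spread evenly over n_days
def fake_commits(n_commits, n_days, rng):
    dt = datetime.datetime(2024, 10, 1)
    delta = datetime.timedelta(days=n_days) / n_commits
    commits: list[tuple[bool, datetime.datetime, str]] = []
    for i in range(n_commits):
        dt += delta
        is_jax = rng.random() < 0.5  # coin flip: JAX or XLA commit
        commit = f"{i:09d}"  # not really hex, but sorts in commit order
        commits.append((is_jax, dt, commit))
    return commits

def test_one_scenario():
    commits = fake_commits(100, 10, random.Random(123))
    newest = commits[-1][2]
    for bad_jax, _bad_dt, bad_commit in commits:
        # Mocked build: True (test fails) iff the relevant repo includes bad_commit
        def build_and_test(xla_commit, jax_commit):
            return (jax_commit if bad_jax else xla_commit) >= bad_commit

        def check_container(date):
            # Earliest XLA commit at/after `date`, then the earliest JAX commit
            # at/after that XLA commit (falling back to the newest commit)
            xla = next((c for j, d, c in commits if not j and d >= date), newest)
            jax = next((c for j, d, c in commits if j and c >= xla), newest)
            return build_and_test(xla, jax)

        # triage() is the function under test from this PR
        assert triage(commits, check_container, build_and_test) == (bad_jax, bad_commit)
```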
This is a small Python program that implements a two-stage bisection, identifying a commit in JAX or XLA that caused a test case to start failing.
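The two-stage idea can be illustrated with a minimal sketch. This is not the tool's actual code; the names `first_bad`, `two_stage_triage`, `container_fails` and `build_fails` are hypothetical. It assumes that each nightly container holds the newest JAX and XLA commits on or before its date, that the first container passes and the last fails, that both repos have at least one commit on or before the last passing container, and that once the test breaks it stays broken. Stage 1 bisects over container dates to bracket the regression; stage 2 bisects over the interleaved JAX/XLA commits inside that window, rebuilding one (XLA, JAX) pair per step.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def first_bad(items: Sequence[T], is_bad: Callable[[T], bool]) -> int:
    """Index of the first bad item, assuming items[0] is good, items[-1] is bad,
    and badness is monotonic (once bad, always bad)."""
    lo, hi = 0, len(items) - 1  # invariant: items[lo] good, items[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(items[mid]):
            hi = mid
        else:
            lo = mid
    return hi


def two_stage_triage(dates, container_fails, commits, build_fails):
    """dates: sorted container dates; container_fails(date) -> bool.
    commits: chronological (is_jax, date, sha) tuples from both repos.
    build_fails(xla_sha, jax_sha) -> bool. Returns (is_jax, sha) of the culprit."""
    # Stage 1: bisect over containers to bracket the regression by date.
    i = first_bad(dates, container_fails)
    good_date, bad_date = dates[i - 1], dates[i]

    # Commits in the last good container, plus the candidate window after it.
    xla = jax = None
    window = []
    for is_jax, date, sha in commits:
        if date <= good_date:
            if is_jax:
                jax = sha
            else:
                xla = sha
        elif date <= bad_date:
            window.append((is_jax, sha))

    # Stage 2: states[k] is the (xla, jax) pair after applying k window commits.
    states = [(xla, jax)]
    for is_jax, sha in window:
        x, j = states[-1]
        states.append((x, sha) if is_jax else (sha, j))
    k = first_bad(states, lambda pair: build_fails(*pair))
    return window[k - 1]


# Toy scenario: even-numbered commits are JAX, odd ones are XLA,
# and JAX commit "012" breaks the test.
commits = [(i % 2 == 0, i, f"{i:03d}") for i in range(20)]
print(two_stage_triage([0, 5, 10, 15, 19], lambda d: d >= 12, commits,
                       lambda x, j: j >= "012"))  # (True, '012')
```

Bisecting containers first keeps the number of expensive source builds small: only the commits between two adjacent containers are ever built from source.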
My case here:
And this one, 28a4ebf1369ce34c50dfcbdf9ec5bf5acdfa4e22, is the culprit commit!
What nice, neat work!
Improve documentation of the triage tool added in #793.
An example use case was the `JetTest.test_dot` test, which had been failing in the JAX unit tests on A100 for an extended period. The tool pointed the finger at openxla/xla@a580700. The failure is worked around in jax-ml/jax#21035.