Have a bunch of benchmarks #177

Open
madig opened this issue Sep 9, 2021 · 12 comments

@madig
Collaborator

madig commented Sep 9, 2021

Every now and then I profile UFO loading (and writing). I think it would be nice to have a bunch of benchmarks (https://doc.rust-lang.org/cargo/commands/cargo-bench.html) ready to run. They could also serve as entry points for profilers.

I'd say we could import a copy of Noto Sans and maybe even a custom to-UFO translation of Noto Sans CJK to have norad work up a sweat. The CJK part still needs figuring out, at least until those sources are opened.

Scenarios:

  • Serial loading of UFOs, no parallel loading of glifs (without rayon feature)
  • Serial loading of UFOs, parallel loading of glifs (with rayon feature)
  • Parallel loading of UFOs, no parallel loading of glifs
  • Parallel loading of UFOs, parallel loading of glifs
  • Loading a single UFO without any font or glyph libs
  • Loading a single UFO where every glyph has a lib with lots of stuff in it and the font lib is big (to hammer the plist code)

Could also include the line-ending benches in #172 (comment). A first sketch of such a bench follows below.
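For instance, a minimal entry point with the built-in (nightly-only) bench harness might look like the sketch below; the fixture path and `norad::Font::load` as the loading entry point are assumptions here, not settled choices:

```rust
// benches/load.rs — built-in cargo-bench harness (requires nightly).
#![feature(test)]
extern crate test;

use test::Bencher;

#[bench]
fn load_small_ufo(b: &mut Bencher) {
    // One full UFO parse per iteration; also a handy profiler target.
    // The path is hypothetical test data.
    b.iter(|| norad::Font::load("testdata/MutatorSansLightWide.ufo").unwrap());
}
```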

@chrissimpkins
Collaborator

I really like this idea. Are the GHA runners reliable enough environments to run these benchmarks on? If not, how would you standardize the execution and reporting?

@madig
Collaborator Author

madig commented Sep 9, 2021

I wasn't actually thinking about GHA; I'm not sure how good CI infrastructure is for reliable benchmarking. This is more aimed at easily running benches on various machines to compare e.g. platform differences. Benchmarking on commits does sound enticing, though...

@chrissimpkins
Collaborator

https://fast.vlang.io/ appears to use a free instance on AWS; possibly related to #175 too.

@cmyr
Member

cmyr commented Sep 10, 2021

I wouldn't benchmark on CI infrastructure, and generally wouldn't want to benchmark on a virtual machine. I do think benchmarks are important, although I would prefer criterion to the built-in cargo bench.
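For comparison with the nightly sketch above, the same bench as a criterion target might look like this; a sketch assuming `criterion` as a dev-dependency, `harness = false` on the bench target in Cargo.toml, and the same hypothetical fixture path:

```rust
// benches/load.rs — criterion harness; needs `criterion` as a
// dev-dependency and `harness = false` for this [[bench]] target.
use criterion::{criterion_group, criterion_main, Criterion};
use norad::Font;

fn load_benches(c: &mut Criterion) {
    // One full UFO parse per iteration; the path is hypothetical test data.
    c.bench_function("load_mutator_sans_light_wide", |b| {
        b.iter(|| Font::load("testdata/MutatorSansLightWide.ufo").unwrap())
    });
}

criterion_group!(benches, load_benches);
criterion_main!(benches);
```

Criterion runs on stable and produces statistical reports, which is presumably the draw here.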

@madig
Collaborator Author

madig commented Sep 12, 2021

Another idea: look for quadratic runtime by having a massive CJK UFO, loading incrementally more of it, and seeing whether the timings form a line or an upward curve. The same goes for comparison and other things you can do to objects.

@chrissimpkins
Collaborator

having a massive CJK UFO

I looked into Noto CJK sources. They are not available and won't be in the near term.

@madig
Collaborator Author

madig commented Oct 5, 2021

I made a 60k-glyph amalgamation of Noto at https://github.com/madig/noto-amalgamated. It's just the Regular for now; maybe I should do an amalgamation for all Designspace extremes? I need to think about what and how I want to benchmark.

BTW: I profiled the amalgamation script and was amazed to find out that ~2 mins of the 9-10 mins runtime are spent in ufoLib.filenames.userNameToFileName. What the hell.

@madig
Collaborator Author

madig commented Oct 10, 2021

Looking at this 🤔 So, criterion is built such that if you want to compare rayon to no rayon, you run cargo criterion --features rayon instead of changing the benchmarks. This leaves the question of what to benchmark and how.

I currently have Mutator Sans as a small UFO collection, a recent Noto Sans as a medium-size UFO collection (but with 15 masters), and one huge Noto Amalgamated. I know that plist loading influences parsing time: 15-25% of Mutator Sans glyphs have a lib, almost all glyphs in Noto Sans do, and 75% in Noto Amalgamated do. I had the idea of measuring with and without plists and such, but maybe I should keep it real and take the three UFO families as they are for now, until I have a clearer idea of what I want to benchmark and why.

So, maybe I'll make a new data repo with Mutator Sans, Noto Sans and Noto Amalgamated (maybe with all points in the Designspace, amalgamated), hook that in as a git submodule, and test serial loading of each group plus parallel loading (launching one thread per UFO to load; see the sketch below). Then I can bench with --features rayon and without?

Edit: just saw that a Noto amalgamated by style name gives me a nice progression of glyph numbers. I can bench that.
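A sketch of the one-thread-per-UFO loading described above; `norad::Font::load` is assumed as the entry point, and the paths are whatever the data repo ends up containing:

```rust
use std::thread;

use norad::Font;

// Load each UFO of a family on its own thread. Whether the glifs inside
// each UFO are themselves loaded in parallel is controlled solely by
// compiling with or without `--features rayon`, so this code can stay
// identical across both bench configurations.
fn load_family_parallel(paths: Vec<String>) -> Vec<Font> {
    let handles: Vec<_> = paths
        .into_iter()
        .map(|path| thread::spawn(move || Font::load(&path).unwrap()))
        .collect();
    handles
        .into_iter()
        .map(|handle| handle.join().unwrap())
        .collect()
}
```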

@madig
Collaborator Author

madig commented Oct 10, 2021

Interestingly, there does seem to be some quadratic behavior going on without rayon? The X-axis is the number of glyphs (amalgamated Noto has a nice glyph-count progression), the Y-axis is load time in seconds. Not loading glyph libs halves loading time, but the graph keeps its slope. Or am I reading the graph wrong?

[figure: load time in seconds (Y) vs. number of glyphs (X)]
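The timings behind such a graph could come from a small harness along these lines; this is a sketch where the per-style paths and the `default_layer().len()` glyph-count accessor are assumptions:

```rust
use std::time::Instant;

use norad::Font;

fn main() {
    // Hypothetical per-style amalgamations, ordered by ascending glyph count.
    for style in ["Thin", "Light", "Regular", "Medium", "Bold", "Black"] {
        let path = format!("NotoAmalgamated-{}.ufo", style);
        let start = Instant::now();
        let font = Font::load(&path).unwrap();
        // Print one tab-separated row per UFO for plotting.
        println!(
            "{}\t{}\t{:.3}s",
            path,
            font.default_layer().len(),
            start.elapsed().as_secs_f64()
        );
    }
}
```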

@cmyr
Member

cmyr commented Oct 12, 2021

I don't think the graph is especially clear; it isn't far from being a straight line, and there's always the possibility of measurement noise.

@chrissimpkins
Collaborator

I came across this project, from the Criterion developer, which claims to support benchmark tests on CI infrastructure:

https://github.com/bheisler/iai

  • Precision: High-precision measurements allow you to reliably detect very small optimizations to your code
  • Consistency: Iai can take accurate measurements even in virtualized CI environments
  • Performance: Since Iai only executes a benchmark once, it is typically faster to run than statistical benchmarks
  • Profiling: Iai generates a Cachegrind profile of your code while benchmarking, so you can use Cachegrind-compatible tools to analyze the results in detail

Valgrind-based, Linux only IIUC.
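A minimal iai target might look like this sketch (iai as a dev-dependency, `harness = false` on the bench target, and valgrind available on the machine; the fixture path and `norad::Font::load` are assumptions):

```rust
// benches/iai_load.rs — iai harness; needs `iai` as a dev-dependency and
// `harness = false` for this [[bench]] target, plus valgrind installed.
use norad::Font;

fn load_mutator_sans() -> Font {
    // Hypothetical fixture path; iai runs this exactly once under Cachegrind
    // and reports instruction counts and estimated cycles.
    Font::load("testdata/MutatorSansLightWide.ufo").unwrap()
}

iai::main!(load_mutator_sans);
```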

@chrissimpkins
Collaborator

chrissimpkins commented Dec 22, 2021

Can confirm that iai functions on the GH Actions Ubuntu runner CI with an apt install of valgrind, and the data appear to be relatively stable across runs. Cannot confirm accuracy, nor whether the data are useful for performance improvement work (yet)... :)
