Improvements for transpose, and more. #1624
Merged
`Transpose` is improved by forcing the use of our multi-threaded code (preexisting) instead of `Eigen::`'s. Checked to be 4+ times faster. As `Transpose` is used in several GDL functions, this speeds them up as well. The previous choice of `Eigen::` seems strange in retrospect, but it was based on actual measurements at the time; optimizations have been added since. I've added a command-line switch (`--with-eigen-transpose`) to re-enable `Eigen::Transpose` in case it proves faster on some architectures, or after `Eigen::` makes progress.

Second, this version permits, via another switch (`--smart-tpool`), a thread-pool mode where, when threads are available, `TPOOL_MIN_ELTS` is also, more or less, the number of elements each thread will process, so that GDL may use fewer threads than the machine provides (some machines running GDL have 64 or more cores). Obviously it is not worth starting 128 threads if 10 would already do the job in time. To get more concurrent threads, decrease `TPOOL_MIN_ELTS`; conversely, increase it, and so find the optimum for a specific case. May be a cure for #1149?

On the other hand, it is not always the number of elements processed by one thread that governs the overall time spent. The time spent per element, whether it is a simple addition or a long procedure, is also a key factor. The `parallelize()` function (in `basegdl.cpp`) accepts modifiers to change this behaviour. I've tweaked a few, but this is not very 'adaptive'; introspection will be needed.

Running GDL with `--smart-tpool` on machines with a large number of threads and testing it would be invaluable.