Trivial improvements for xbin_min and xbin_max may lead to speedups in sample_get_x #969
Comments
This is 291bcf5
This might be it, but it seems too silly to have an effect; maybe it was elsewhere: 23a1358
See the difference between the default 079207d
And then the change 1, removing a few xbin calls
And then caching the xbin values
I think this could become a small standalone PR. To discuss with @oliviermattelaer
… gg_tt.mad), simplify the code for xbin_min and xbin_max (part1 of madgraph5#969)

There is indeed a small but clear improvement

CUDACPP_RUNTIME_DISABLEFPE=1 ./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_dy3j_x1_cudacpp
[COUNTERS] PROGRAM TOTAL : 4.5494s
[COUNTERS] Fortran Other ( 0 ) : 0.1688s
[COUNTERS] Fortran Initialise(I/O) ( 1 ) : 0.0669s
[COUNTERS] Fortran Random2Momenta ( 3 ) : 3.2830s for 1170103 events => throughput is 2.81E-06 events/s
[COUNTERS] Fortran PDFs ( 4 ) : 0.1061s for 49152 events => throughput is 2.16E-06 events/s
[COUNTERS] Fortran UpdateScaleCouplings ( 5 ) : 0.1361s for 16384 events => throughput is 8.31E-06 events/s
[COUNTERS] Fortran Reweight ( 6 ) : 0.0519s for 16384 events => throughput is 3.17E-06 events/s
[COUNTERS] Fortran Unweight(LHE-I/O) ( 7 ) : 0.0649s for 16384 events => throughput is 3.96E-06 events/s
[COUNTERS] Fortran SamplePutPoint ( 8 ) : 0.1366s for 1170103 events => throughput is 1.17E-07 events/s
[COUNTERS] CudaCpp Initialise ( 11 ) : 0.4745s
[COUNTERS] CudaCpp Finalise ( 12 ) : 0.0257s
[COUNTERS] CudaCpp MEs ( 19 ) : 0.0349s for 16384 events => throughput is 2.13E-06 events/s
[COUNTERS] OVERALL NON-MEs ( 21 ) : 4.5145s
[COUNTERS] OVERALL MEs ( 22 ) : 0.0349s for 16384 events => throughput is 2.13E-06 events/s
… gg_tt.mad), cache xbin_min for xmin=0 and xbin_max for xmax=1 (part2 of madgraph5#969)

There is indeed another clear and not too small improvement

CUDACPP_RUNTIME_DISABLEFPE=1 ./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_dy3j_x1_cudacpp
[COUNTERS] PROGRAM TOTAL : 4.2184s
[COUNTERS] Fortran Other ( 0 ) : 0.1695s
[COUNTERS] Fortran Initialise(I/O) ( 1 ) : 0.0672s
[COUNTERS] Fortran Random2Momenta ( 3 ) : 2.9293s for 1170103 events => throughput is 2.50E-06 events/s
[COUNTERS] Fortran PDFs ( 4 ) : 0.1094s for 49152 events => throughput is 2.23E-06 events/s
[COUNTERS] Fortran UpdateScaleCouplings ( 5 ) : 0.1379s for 16384 events => throughput is 8.42E-06 events/s
[COUNTERS] Fortran Reweight ( 6 ) : 0.0560s for 16384 events => throughput is 3.42E-06 events/s
[COUNTERS] Fortran Unweight(LHE-I/O) ( 7 ) : 0.0707s for 16384 events => throughput is 4.31E-06 events/s
[COUNTERS] Fortran SamplePutPoint ( 8 ) : 0.1447s for 1170103 events => throughput is 1.24E-07 events/s
[COUNTERS] CudaCpp Initialise ( 11 ) : 0.4719s
[COUNTERS] CudaCpp Finalise ( 12 ) : 0.0267s
[COUNTERS] CudaCpp MEs ( 19 ) : 0.0350s for 16384 events => throughput is 2.13E-06 events/s
[COUNTERS] OVERALL NON-MEs ( 21 ) : 4.1834s
[COUNTERS] OVERALL MEs ( 22 ) : 0.0350s for 16384 events => throughput is 2.13E-06 events/s
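A minimal sketch of the caching idea in part2, written in Python for illustration only (the actual change is in Fortran in dsample.f and is not shown in this thread). The `xbin` function, the `XbinCache` class and the grid layout below are hypothetical stand-ins. The only thing taken from the commit message is that xbin_min is cached for xmin=0 and xbin_max for xmax=1. The sketch also assumes that cached values must be dropped whenever the grid changes.

```python
from bisect import bisect_right


def xbin(x, edges):
    """Hypothetical stand-in: fractional bin position of x in a grid of bin edges."""
    # Find the bin containing x, clamped so that x == edges[-1] maps to the last bin.
    i = min(max(bisect_right(edges, x) - 1, 0), len(edges) - 2)
    # Interpolate linearly inside the bin.
    return i + (x - edges[i]) / (edges[i + 1] - edges[i])


class XbinCache:
    """Caches the two most common lookups: xbin(0) and xbin(1)."""

    def __init__(self, edges):
        self.edges = list(edges)
        self.calls = 0       # number of real xbin evaluations (for illustration)
        self._min0 = None    # cached xbin(0.0)
        self._max1 = None    # cached xbin(1.0)

    def _xbin(self, x):
        self.calls += 1
        return xbin(x, self.edges)

    def xbin_min(self, xmin):
        if xmin == 0.0:
            if self._min0 is None:
                self._min0 = self._xbin(0.0)
            return self._min0
        return self._xbin(xmin)  # uncommon case: compute every time

    def xbin_max(self, xmax):
        if xmax == 1.0:
            if self._max1 is None:
                self._max1 = self._xbin(1.0)
            return self._max1
        return self._xbin(xmax)  # uncommon case: compute every time

    def set_grid(self, edges):
        # Assumption: any grid update invalidates the cached values.
        self.edges = list(edges)
        self._min0 = None
        self._max1 = None


cache = XbinCache([0.0, 0.25, 0.5, 1.0])
for _ in range(1000):
    cache.xbin_min(0.0)
    cache.xbin_max(1.0)
print(cache.calls)  # real evaluations after 2000 lookups
```

The point of the sketch is only that repeated lookups with the default limits cost one evaluation each instead of one per sampled point. Whether the real code needs an explicit invalidation step depends on how dsample.f updates its grid, which this thread does not show.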
… gg_tt.mad), comment out dead if/then branches (for warnings that are commented out)

This is another minor component of madgraph5#969.
It gives almost insignificant performance improvements, but it simplifies the code.

CUDACPP_RUNTIME_DISABLEFPE=1 ./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_dy3j_x1_cudacpp
[COUNTERS] PROGRAM TOTAL : 4.1574s
[COUNTERS] Fortran Other ( 0 ) : 0.1706s
[COUNTERS] Fortran Initialise(I/O) ( 1 ) : 0.0670s
[COUNTERS] Fortran Random2Momenta ( 3 ) : 2.8950s for 1170103 events => throughput is 2.47E-06 events/s
[COUNTERS] Fortran PDFs ( 4 ) : 0.1021s for 49152 events => throughput is 2.08E-06 events/s
[COUNTERS] Fortran UpdateScaleCouplings ( 5 ) : 0.1360s for 16384 events => throughput is 8.30E-06 events/s
[COUNTERS] Fortran Reweight ( 6 ) : 0.0518s for 16384 events => throughput is 3.16E-06 events/s
[COUNTERS] Fortran Unweight(LHE-I/O) ( 7 ) : 0.0679s for 16384 events => throughput is 4.15E-06 events/s
[COUNTERS] Fortran SamplePutPoint ( 8 ) : 0.1401s for 1170103 events => throughput is 1.20E-07 events/s
[COUNTERS] CudaCpp Initialise ( 11 ) : 0.4658s
[COUNTERS] CudaCpp Finalise ( 12 ) : 0.0263s
[COUNTERS] CudaCpp MEs ( 19 ) : 0.0347s for 16384 events => throughput is 2.12E-06 events/s
[COUNTERS] OVERALL NON-MEs ( 21 ) : 4.1227s
[COUNTERS] OVERALL MEs ( 22 ) : 0.0347s for 16384 events => throughput is 2.12E-06 events/s
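A side note on reading these counters: the number printed as "throughput ... events/s" appears to be elapsed seconds divided by the number of events, i.e. a time per event rather than events per second. The Python snippet below is just a quick arithmetic check on the Random2Momenta line of the run above; the helper names are made up for illustration.

```python
def seconds_per_event(seconds, events):
    """Average time spent per event."""
    return seconds / events


def events_per_second(seconds, events):
    """Throughput in the usual sense."""
    return events / seconds


# Random2Momenta line from the run above: 2.8950s for 1170103 events
seconds, events = 2.8950, 1170103

print(f"{seconds_per_event(seconds, events):.2E} s/event")   # 2.47E-06, the value the log prints
print(f"{events_per_second(seconds, events):.2E} events/s")  # 4.04E+05
```

If this reading is right, a smaller printed value means a faster step, which is consistent with Random2Momenta going from 2.81E-06 to 2.47E-06 across the runs quoted in this thread.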
… gg_tt.mad), skip xbin checks if CUDACPP_RUNTIME_SKIPXBINCHECKS is set (part3 of madgraph5#969)

This is a very large improvement, but it may be more controversial, hence it is disabled by default...

CUDACPP_RUNTIME_DISABLEFPE=1 ./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_dy3j_x1_cudacpp
[COUNTERS] PROGRAM TOTAL : 4.1142s
[COUNTERS] Fortran Other ( 0 ) : 0.1610s
[COUNTERS] Fortran Initialise(I/O) ( 1 ) : 0.0670s
[COUNTERS] Fortran Random2Momenta ( 3 ) : 2.8821s for 1170103 events => throughput is 2.46E-06 events/s
[COUNTERS] Fortran PDFs ( 4 ) : 0.0962s for 49152 events => throughput is 1.96E-06 events/s
[COUNTERS] Fortran UpdateScaleCouplings ( 5 ) : 0.1278s for 16384 events => throughput is 7.80E-06 events/s
[COUNTERS] Fortran Reweight ( 6 ) : 0.0485s for 16384 events => throughput is 2.96E-06 events/s
[COUNTERS] Fortran Unweight(LHE-I/O) ( 7 ) : 0.0670s for 16384 events => throughput is 4.09E-06 events/s
[COUNTERS] Fortran SamplePutPoint ( 8 ) : 0.1355s for 1170103 events => throughput is 1.16E-07 events/s
[COUNTERS] CudaCpp Initialise ( 11 ) : 0.4683s
[COUNTERS] CudaCpp Finalise ( 12 ) : 0.0262s
[COUNTERS] CudaCpp MEs ( 19 ) : 0.0348s for 16384 events => throughput is 2.13E-06 events/s
[COUNTERS] OVERALL NON-MEs ( 21 ) : 4.0794s
[COUNTERS] OVERALL MEs ( 22 ) : 0.0348s for 16384 events => throughput is 2.13E-06 events/s

CUDACPP_RUNTIME_SKIPXBINCHECKS=1 CUDACPP_RUNTIME_DISABLEFPE=1 ./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_dy3j_x1_cudacpp
[COUNTERS] PROGRAM TOTAL : 3.2969s
[COUNTERS] Fortran Other ( 0 ) : 0.1726s
[COUNTERS] Fortran Initialise(I/O) ( 1 ) : 0.0674s
[COUNTERS] Fortran Random2Momenta ( 3 ) : 2.0464s for 1170103 events => throughput is 1.75E-06 events/s
[COUNTERS] Fortran PDFs ( 4 ) : 0.0958s for 49152 events => throughput is 1.95E-06 events/s
[COUNTERS] Fortran UpdateScaleCouplings ( 5 ) : 0.1298s for 16384 events => throughput is 7.92E-06 events/s
[COUNTERS] Fortran Reweight ( 6 ) : 0.0482s for 16384 events => throughput is 2.94E-06 events/s
[COUNTERS] Fortran Unweight(LHE-I/O) ( 7 ) : 0.0656s for 16384 events => throughput is 4.00E-06 events/s
[COUNTERS] Fortran SamplePutPoint ( 8 ) : 0.1412s for 1170103 events => throughput is 1.21E-07 events/s
[COUNTERS] CudaCpp Initialise ( 11 ) : 0.4685s
[COUNTERS] CudaCpp Finalise ( 12 ) : 0.0266s
[COUNTERS] CudaCpp MEs ( 19 ) : 0.0349s for 16384 events => throughput is 2.13E-06 events/s
[COUNTERS] OVERALL NON-MEs ( 21 ) : 3.2620s
[COUNTERS] OVERALL MEs ( 22 ) : 0.0349s for 16384 events => throughput is 2.13E-06 events/s
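The opt-in switch in part3 can be pictured roughly as follows. This is a Python sketch under stated assumptions, not the Fortran code: only the variable name CUDACPP_RUNTIME_SKIPXBINCHECKS comes from the commit message. The `make_sampler` helper, the simple range check standing in for the xbin checks, and the choice to treat any non-empty value as "set" are all illustrative.

```python
import os

ENV_NAME = "CUDACPP_RUNTIME_SKIPXBINCHECKS"


def skip_checks_requested(environ=None):
    """True if the opt-in variable is present with a non-empty value (assumed semantics)."""
    env = os.environ if environ is None else environ
    return bool(env.get(ENV_NAME, ""))


def make_sampler(environ=None):
    """Return a function for the hot loop; the environment is read only once, here."""
    skip = skip_checks_requested(environ)

    def accept(x, xmin, xmax):
        # Hypothetical stand-in for the xbin checks: a plain range check.
        if not skip and not (xmin <= x <= xmax):
            raise ValueError(f"x={x} outside [{xmin}, {xmax}]")
        return x

    return accept


checked = make_sampler({})                 # default: checks enabled
unchecked = make_sampler({ENV_NAME: "1"})  # opt-in: checks skipped

print(checked(0.5, 0.0, 1.0))    # passes the check
print(unchecked(1.5, 0.0, 1.0))  # would fail the check, but it is skipped
```

Reading the variable once outside the per-point loop keeps the default path unchanged and makes the faster path an explicit choice, which matches the "disabled by default" wording in the commit message.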
…5#969 performance improvements in sample_get_x in dsample.f

This includes
- simplify the code for xbin_min and xbin_max (remove dead code)
- cache xbin_min for xmin=0 and xbin_max for xmax=1
- comment out dead if/then branches (for warnings that were already commented out)
- optionally skip xbin checks if CUDACPP_RUNTIME_SKIPXBINCHECKS is set

The only files that still need to be patched are
- 4 in patch.common: Source/makefile, Source/genps.inc, Source/dsample.f, SubProcesses/makefile
- 4 in patch.P1: auto_dsig1.f, auto_dsig.f, driver.f, matrix1.f

./CODEGEN/generateAndCompare.sh gg_tt --mad --nopatch
git diff --no-ext-diff -R gg_tt.mad/Source/makefile gg_tt.mad/Source/genps.inc gg_tt.mad/SubProcesses/makefile gg_tt.mad/Source/dsample.f > CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.common
git diff --no-ext-diff -R gg_tt.mad/SubProcesses/P1_gg_ttx/auto_dsig1.f gg_tt.mad/SubProcesses/P1_gg_ttx/auto_dsig.f gg_tt.mad/SubProcesses/P1_gg_ttx/driver.f gg_tt.mad/SubProcesses/P1_gg_ttx/matrix1.f > CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.P1
git checkout gg_tt.mad

(Later checked that regenerating gg_tt.mad is ok)
…graph5#969 improvements in dsample.f) on itscrd90

Code generation completed in 245 seconds
Code generation and additional checks completed in 372 seconds
…5#969 performance improvements in sample_get_x in dsample.f

This includes
- simplify the code for xbin_min and xbin_max (remove dead code)
- cache xbin_min for xmin=0 and xbin_max for xmax=1
- comment out dead if/then branches (for warnings that were already commented out)
- [NOT YET INCLUDED! I forgot this...] optionally skip xbin checks if CUDACPP_RUNTIME_SKIPXBINCHECKS is set

The only files that still need to be patched are
- 4 in patch.common: Source/makefile, Source/genps.inc, Source/dsample.f, SubProcesses/makefile
- 4 in patch.P1: auto_dsig1.f, auto_dsig.f, driver.f, matrix1.f

./CODEGEN/generateAndCompare.sh gg_tt --mad --nopatch
git diff --no-ext-diff -R gg_tt.mad/Source/makefile gg_tt.mad/Source/genps.inc gg_tt.mad/SubProcesses/makefile gg_tt.mad/Source/dsample.f > CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.common
git diff --no-ext-diff -R gg_tt.mad/SubProcesses/P1_gg_ttx/auto_dsig1.f gg_tt.mad/SubProcesses/P1_gg_ttx/auto_dsig.f gg_tt.mad/SubProcesses/P1_gg_ttx/driver.f gg_tt.mad/SubProcesses/P1_gg_ttx/matrix1.f > CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.P1
git checkout gg_tt.mad

(Later checked that regenerating gg_tt.mad is ok)
…graph5#969 improvements in dsample.f) on itscrd90 [NB: CUDACPP_RUNTIME_SKIPXBINCHECKS is still missing here!]

Code generation completed in 245 seconds
Code generation and additional checks completed in 372 seconds
…cluding the latest timers/counters and madgraph5#969 sample_get_x speedups [NB: CUDACPP_RUNTIME_SKIPXBINCHECKS still missing!]

CUDACPP_RUNTIME_DISABLEFPE=1 ./tlau/lauX.sh -fortran pp_dy3j.mad -togridpack
… CUDACPP_RUNTIME_SKIPXBINCHECKS patch madgraph5#968 (on top of madgraph5#969)

This includes
- optionally skip xbin checks if CUDACPP_RUNTIME_SKIPXBINCHECKS is set

The only files that still need to be patched are
- 4 in patch.common: Source/makefile, Source/genps.inc, Source/dsample.f, SubProcesses/makefile
- 4 in patch.P1: auto_dsig1.f, auto_dsig.f, driver.f, matrix1.f

./CODEGEN/generateAndCompare.sh gg_tt --mad --nopatch
git diff --no-ext-diff -R gg_tt.mad/Source/makefile gg_tt.mad/Source/genps.inc gg_tt.mad/SubProcesses/makefile gg_tt.mad/Source/dsample.f > CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.common
git diff --no-ext-diff -R gg_tt.mad/SubProcesses/P1_gg_ttx/auto_dsig1.f gg_tt.mad/SubProcesses/P1_gg_ttx/auto_dsig.f gg_tt.mad/SubProcesses/P1_gg_ttx/driver.f gg_tt.mad/SubProcesses/P1_gg_ttx/matrix1.f > CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.P1
git checkout gg_tt.mad

(Later checked that regenerating gg_tt.mad is ok)
…UDACPP_RUNTIME_SKIPXBINCHECKS set madgraph5#968 : big improvement!)

The cuda backend is now as follows, skipping xbin checks (madgraph5#968).
Phase space sampling in dy+3j has decreased from 78s to 53s (down by 30%) thanks to the removal of xbin checks
> [GridPackCmd.launch] GRIDPCK TOTAL 135.1144
> [madevent COUNTERS] PROGRAM TOTAL 130.8140s
> [madevent COUNTERS] Fortran PhaseSpaceSampling 53.0338s for 44652395 events
> ...
> [madevent COUNTERS] CudaCpp MEs 35.4908s for 1769472 events
> [madevent COUNTERS] OVERALL NON-MEs 95.3232s
> [madevent COUNTERS] OVERALL MEs 35.4908s for 1769472 events

The cuda backend was as follows, including xbin checks but also including the trivial improvements (madgraph5#969).
Phase space sampling in dy+3j had decreased from 93s to 78s (down by 15%) thanks to the trivial improvements
< [GridPackCmd.launch] GRIDPCK TOTAL 160.1718
< [madevent COUNTERS] PROGRAM TOTAL 155.8605s
< [madevent COUNTERS] Fortran PhaseSpaceSampling 78.1023s for 44652395 events
< ...
< [madevent COUNTERS] CudaCpp MEs 35.4320s for 1769472 events
< [madevent COUNTERS] OVERALL NON-MEs 120.4290s
< [madevent COUNTERS] OVERALL MEs 35.4320s for 1769472 events

The cuda backend was as follows in 2e59eca, without the trivial improvements
< [GridPackCmd.launch] GRIDPCK TOTAL 176.8891
< [madevent COUNTERS] PROGRAM TOTAL 172.6370s
< [madevent COUNTERS] Fortran Random2Momenta 93.2907s for 44651014 events
< ...
< [madevent COUNTERS] CudaCpp MEs 35.4557s for 1769472 events
< [madevent COUNTERS] OVERALL NON-MEs 137.1806s
< [madevent COUNTERS] OVERALL MEs 35.4557s for 1769472 events
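For reference, the percentages quoted above can be recomputed from the phase space sampling counters in the three gridpack runs. The Python snippet below is plain arithmetic on the numbers in the logs; the variable names are only labels chosen for this check.

```python
def percent_reduction(before, after):
    """Relative decrease from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


# Phase space sampling times (seconds) from the three gridpack runs above
baseline = 93.2907      # 2e59eca, without the trivial improvements
trivial = 78.1023       # with the madgraph5#969 improvements, xbin checks still on
skip_checks = 53.0338   # with CUDACPP_RUNTIME_SKIPXBINCHECKS set (madgraph5#968)

print(f"baseline -> trivial:     {percent_reduction(baseline, trivial):.1f}%")
print(f"trivial  -> skip checks: {percent_reduction(trivial, skip_checks):.1f}%")
print(f"baseline -> skip checks: {percent_reduction(baseline, skip_checks):.1f}%")
```

The two steps come out at about 16% and 32%, close to the rounded 15% and 30% quoted in the commit message, and the combined reduction in phase space sampling time is about 43%.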
…ts - but not yet the latest upstream/master) into cmsdyps

Fix conflicts in patch.common (NB: the 968/969 improvements are now in the OLD sample_get_x)

…ts - but not yet the latest upstream/master) into cmsdyps

Fix conflicts in patch.P1 and patch.common (NB: the 968/969 improvements are now in the OLD sample_get_x)
I am doing a few tests with sample_get_x towards vectorising it, see #963
Apart from the issue reported in #968, I think I identified another two trivial but useful improvements in sample_get_x
This is WIP to be confirmed.