Parallel compilation? #205

Open
gaborcsardi opened this issue Jul 31, 2024 · 11 comments

@gaborcsardi

AFAIR it is allowed to use two processor cores on CRAN when installing a package. Would it be possible to leverage this and compile duckdb in parallel? That would cut the current ~24-minute compilation time in half.

Additionally, if NOT_CRAN (or some other env var) is set, then we could use even more processors, and would (ideally) cut the compilation time to ~6 minutes on 4 processors, or even ~3 minutes on 8 processors.

@melhindi

melhindi commented Aug 2, 2024

It would be even faster if pre-compiled binaries could be used: googlecolab/colabtools#4256
Related issues are #22 and #201.
I also think that the installation/compilation time is too long from the usability perspective of an R user.

@gaborcsardi
Author

It is technically much more challenging to use pre-compiled binaries, and they would have to be restricted to a small subset of platforms, maybe x86_64 Linux only.

OTOH parallel compilation is (probably?) easy to switch on and every platform benefits greatly from it. Btw. I just compiled duckdb-r on s390x Linux in qemu, and it took more than 24 hours (!).

@krlmlr
Collaborator

krlmlr commented Aug 16, 2024

Thanks. Shouldn't this be a setting in ~/.R/Makevars? I compile the R package on 8 cores with no problems, it's supported, but why would we hardcode this in the package?

@gaborcsardi
Author

> I compile the R package on 8 cores with no problems, it's supported

How?

> but why would we hardcode this in the package?

Because people don't know about this. If the user has a setting, then use that, sure. If there is no setting, then I would use two processors on CRAN, and maybe two, maybe more elsewhere.
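
The fallback policy described in this comment (respect a user setting if there is one, otherwise two jobs on CRAN and possibly more elsewhere) could be sketched roughly as follows. `pick_jobs` is a hypothetical helper, not code from duckdb-r, and both "use all cores when `NOT_CRAN` is `true`" and the way the core count is detected are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical sketch of the policy discussed above; not duckdb-r code.

# pick_jobs USER_MAKEFLAGS NOT_CRAN CORES
# Prints the -j value to use, or prints nothing if the user already chose one.
pick_jobs() {
  user_flags=$1
  not_cran=$2
  cores=$3
  case "$user_flags" in
    *-j*) return 0 ;;   # user already set a job count: leave it alone
  esac
  if [ "$not_cran" = "true" ] && [ "$cores" -gt 2 ]; then
    echo "$cores"       # assumption: use every core outside CRAN
  else
    echo 2              # the two-core CRAN allowance mentioned in the thread
  fi
}

# Core detection is an assumption; fall back to 2 if getconf is unavailable.
cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 2)
jobs=$(pick_jobs "${MAKEFLAGS:-}" "${NOT_CRAN:-}" "$cores")

if [ -n "$jobs" ]; then
  echo "would build with: make -j$jobs"
else
  echo "keeping user MAKEFLAGS: $MAKEFLAGS"
fi
```

Where such logic would live (for example in a `configure` script) is left open here; the thread does not settle that.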

@krlmlr
Collaborator

krlmlr commented Aug 17, 2024

I have MAKEFLAGS = -j8 in ~/.R/Makevars. Is there prior art of hardcoding the cores, perhaps in Arrow? Would PPM like that setting?
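
A minimal sketch of adding that setting from a shell, without clobbering anything already in the file. `add_makeflags` is a hypothetical helper written for illustration; the only detail taken from the thread is the `MAKEFLAGS = -j8` line in `~/.R/Makevars`.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only.

# add_makeflags FILE JOBS
# Appends "MAKEFLAGS = -jJOBS" to FILE unless FILE already sets MAKEFLAGS.
add_makeflags() {
  file=$1
  jobs=$2
  mkdir -p "$(dirname "$file")"
  if [ -f "$file" ] && grep -q '^[[:space:]]*MAKEFLAGS' "$file"; then
    return 0            # an existing setting wins; do nothing
  fi
  printf 'MAKEFLAGS = -j%s\n' "$jobs" >> "$file"
}

# Example call, commented out so that sourcing this file changes nothing:
# add_makeflags "$HOME/.R/Makevars" 8
```

Because the function skips files that already mention `MAKEFLAGS`, running it twice leaves a single line in place.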

@gaborcsardi
Author

Arrow does build in parallel by default for me, but IDK if it uses MAKEFLAGS.

@krlmlr
Collaborator

krlmlr commented Aug 17, 2024

@thisisnic: Does this ring a bell? How does the arrow R package achieve multicore compilation by default, regardless of user settings?

@thisisnic

I think this is where we make that happen, dependent on the NOT_CRAN env var: https://github.com/apache/arrow/blob/5ef7e01053c526389acefddd6f961bf1fd9d274b/r/tools/nixlibs.R#L513

@krlmlr
Collaborator

krlmlr commented Aug 21, 2024

But the nixlibs.R script would only be called by ./configure, no? Or do the environment variables set there affect what happens when building the package?

@thisisnic

Hmm, I'm unsure of the exact details, but @jonkeane will know

@jonkeane

> do the environment variables set there affect what happens when building the package?

Yeah, this is what happens
