Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add static build config. #510

Closed

Conversation

xuhancn
Copy link
Contributor

@xuhancn xuhancn commented Feb 7, 2024

  1. Add "SLEEF_BUILD_SHARED_LIBS" to instead of CMake reserved variable "BUILD_SHARED_LIBS".
  2. Add clear library type to add_library, it will remove the global control from "BUILD_SHARED_LIBS", Link: https://cmake.org/cmake/help/latest/guide/tutorial/Selecting%20Static%20or%20Shared%20Libraries.html
  3. remove install to dummy dir, it will make git dirty.

Why we avoid to use BUILD_SHARED_LIBS and add_library without STATIC/SHARED parameter. Please check the offical document Note part.
image

Additional, I write a test project to test this PR: https://github.com/xuhancn/sleef_cmake_example
Turn on/off this line to control the build type.
https://github.com/xuhancn/sleef_cmake_example/blob/main/CMakeLists.txt#L14

Static:
image
Shared:
image

@xuhancn xuhancn marked this pull request as ready for review February 7, 2024 11:49
@xuhancn
Copy link
Contributor Author

xuhancn commented Feb 8, 2024

@blapie Please comment and review.

@xuhancn
Copy link
Contributor Author

xuhancn commented Feb 8, 2024

It seems that, we need this PR to fix pytorch sleef build issue. pytorch/pytorch#118980

@blapie
Copy link
Collaborator

blapie commented Feb 8, 2024

Hello! I think I understand the issue/suggested solution, although it did not sound extremely clear.

I don't really understand how this works as is. While the change to CMakeLists.txt seems to make sense, in Configure.cmake and src/common/CMakeLists.txt occurrences of BUILD_SHARED_LBS are replaced by SLEEF_STATIC_LIBS, which I believe is the opposite of what you want. Btw SLEEF_BUILD_STATIC_LIBS is not defined as a user-facing option, which doesn't help understanding the solution.

I'm not very keen on that change anyway, even if it was fixed. The use case is too particular to justify making the build system more complex than it already is.

If pytorch needs to setup its submodules differently locally then the solution implementing Save/Restore (https://github.com/pytorch/pytorch/blob/cee16353db384dd33a4804f9f26829cacfcfbab7/aten/src/ATen/CMakeLists.txt#L424-L440) was probably the way to go, or something along those lines.

I don't believe it is a relevant feature to add to SLEEF, but happy to be convinced otherwise.

@xuhancn
Copy link
Contributor Author

xuhancn commented Feb 8, 2024

@blapie
Haha, I did a lot of experiment and get a final code and conclusion, please forget previous discussions.
Conclusion:
We can't use "BUILD_SHARED_LIBS" to control sleef static/shared build. As you mentioned to me it is CMake reserved variable: https://cmake.org/cmake/help/latest/variable/BUILD_SHARED_LIBS.html

Let me analysis the detailed issue:

Phenomenon:

pytorch Windows can only build it pass first time by save/restoe "BUILD_SHARED_LIBS", the second time(incremental build) will cause the runtime libraries conflict: #509 (comment)

Reason:

  1. BUILD_SHARED_LIBS is cmake reserved variable, and it is global scope shared in pytorch and all its submodules.
  2. When the first time build. CMake will config the whole project and its submodule in constant order. It is sequential processing. save/restore is works in sequential processing.
  3. The second time build is incremental build, only some modified code/cmake file/submodules will be built. And as we known, CMake is parallel processing build system. Pytorch has a lot of submodules. And modify(save/restore) "BUILD_SHARED_LIBS" will make a unexpectable status and impact on all submodules. It would make the build system chaos.

We need to drop the reserved variable "BUILD_SHARED_LIBS" and use sleef dedicate variable "SLEEF_BUILD_SHARED_LIBS".

@xuhancn
Copy link
Contributor Author

xuhancn commented Feb 8, 2024

Also, you can check the another static lib mimalloc: https://github.com/pytorch/pytorch/blob/main/CMakeLists.txt#L1060-L1068
It is very simple code:

Setup mimalloc dedicate options.
Add mimalloc as submodule.
Use setup include and link as depends.

Acturlly, save/restore is a workaround for sleef exist "BUILD_SHARED_LIBS" issue. Please check this comment: https://github.com/pytorch/pytorch/blob/cee16353db384dd33a4804f9f26829cacfcfbab7/aten/src/ATen/CMakeLists.txt#L483

@xuhancn xuhancn marked this pull request as draft February 8, 2024 18:31
@xuhancn xuhancn marked this pull request as ready for review February 8, 2024 19:36
@xuhancn
Copy link
Contributor Author

xuhancn commented Feb 10, 2024

Finally, it is continued to rename "BUILD_SHARED_LIBS" to "SLEEF_BUILD_SHARED_LIBS", please check my last PR: #509 (comment)
Also, this PR make a rule to enable "SLEEF_BUILD_SHARED_LIBS" to control sleef static/shared library.

@xuhancn xuhancn changed the title add optional static build config. add static build config. Feb 10, 2024
@xuhancn
Copy link
Contributor Author

xuhancn commented Feb 12, 2024

@blapie Please review it again.

@xuhancn
Copy link
Contributor Author

xuhancn commented Feb 13, 2024

@blapie could you please priority to review this PR? pytorch/pytorch#118980 this pytorch PR id depends on it.

@blapie
Copy link
Collaborator

blapie commented Mar 1, 2024

Hello @xuhancn! I'm looking into now. I would prefer a slightly different approach though but I will look at how feasible that is.

CMakeLists.txt Outdated
@@ -1,6 +1,6 @@

# Options

option(SLEEF_BUILD_SHARED_LIBS "Build sleef as static library" ON)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you try this change instead and switch back to old name everywhere? I believe this can work according to the documentation (https://cmake.org/cmake/help/latest/variable/BUILD_SHARED_LIBS.html)

-option(SLEEF_BUILD_SHARED_LIBS "Build sleef as static library" ON)
+# Declare BUILD_SHARED_LIBS option to make it accessible to top-level projects.
+option(BUILD_SHARED_LIBS "Build using shared libraries (CMake reserved)" ON)
+

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have tried this method, It not constant work, the reason as above: #510 (comment)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pytorch default want setup “BUILD_SHARED_LIBS” as shared, but it want to use sleef as static library. Current pytorch is used a save/restore this variable to WA it.
Save: https://github.com/pytorch/pytorch/blob/cee16353db384dd33a4804f9f26829cacfcfbab7/aten/src/ATen/CMakeLists.txt#L424-L440
Restore: https://github.com/pytorch/pytorch/blob/cee16353db384dd33a4804f9f26829cacfcfbab7/aten/src/ATen/CMakeLists.txt#L480-L485
And the original author think we should use SLEEF_ to pass the sleef build config also: https://github.com/pytorch/pytorch/blob/cee16353db384dd33a4804f9f26829cacfcfbab7/aten/src/ATen/CMakeLists.txt#L483

@blapie
Copy link
Collaborator

blapie commented Mar 1, 2024

Right, let me know if I'm missing something but I believe we don't have to rename the variable to get the behaviour you are expecting.

Could you try defining the BUILD_SHARED_LIBS option in SLEEF's CMakeLists.txt see comment in review?
According to the documentation of CMake it should work, when top-level project switches the option.

https://cmake.org/cmake/help/latest/variable/BUILD_SHARED_LIBS.html

Note that if bringing external dependencies directly into the build, such as with [FetchContent](https://cmake.org/cmake/help/latest/module/FetchContent.html#module:FetchContent) or a direct call to [add_subdirectory()](https://cmake.org/cmake/help/latest/command/add_subdirectory.html#command:add_subdirectory), and one of those dependencies has such a call to [option(BUILD_SHARED_LIBS ...)](https://cmake.org/cmake/help/latest/command/option.html#command:option), the top level project must also call [option(BUILD_SHARED_LIBS ...)](https://cmake.org/cmake/help/latest/command/option.html#command:option) before bringing in its dependencies. Failure to do so can lead to different behavior between the first and subsequent CMake runs.

https://discourse.cmake.org/t/build-shared-libs/4200

It’s fine to declare it as an option if you want to change the default. As long as the minimum required CMake version is 3.13, a parent project can override it with a normal variable.

But it might need to be done the right way. In order to fix the parallel build I think you need to put the save/restore stages in a function (since it will have a scope), see
https://stackoverflow.com/a/31140797

@xuhancn
Copy link
Contributor Author

xuhancn commented Mar 1, 2024

Right, let me know if I'm missing something but I believe we don't have to rename the variable to get the behaviour you are expecting.

Could you try defining the BUILD_SHARED_LIBS option in SLEEF's CMakeLists.txt see comment in review? According to the documentation of CMake it should work, when top-level project switches the option.

https://cmake.org/cmake/help/latest/variable/BUILD_SHARED_LIBS.html

Note that if bringing external dependencies directly into the build, such as with [FetchContent](https://cmake.org/cmake/help/latest/module/FetchContent.html#module:FetchContent) or a direct call to [add_subdirectory()](https://cmake.org/cmake/help/latest/command/add_subdirectory.html#command:add_subdirectory), and one of those dependencies has such a call to [option(BUILD_SHARED_LIBS ...)](https://cmake.org/cmake/help/latest/command/option.html#command:option), the top level project must also call [option(BUILD_SHARED_LIBS ...)](https://cmake.org/cmake/help/latest/command/option.html#command:option) before bringing in its dependencies. Failure to do so can lead to different behavior between the first and subsequent CMake runs.

https://discourse.cmake.org/t/build-shared-libs/4200

It’s fine to declare it as an option if you want to change the default. As long as the minimum required CMake version is 3.13, a parent project can override it with a normal variable.

But it might need to be done the right way. In order to fix the parallel build I think you need to put the save/restore stages in a function (since it will have a scope), see https://stackoverflow.com/a/31140797

The original pytorch is use this method, it works on Linux but not works on Windows. If It works, why I do a lot of work to fix it?
Linux static lib is a .a file and it is one file library.
Windows static is a big .lib file and contains everything. Windows dynamic has two files, a small .lib file only contains interface, and a .dll file.
It failed and report can't find symbol issue. The second build, I guess BUILD_SHARED_LIBS changed by restore and let the linker confused sleef lib type.
I can't explain the detail why it doesn't work. But I tried it and it not work. It is OS related.

@xuhancn xuhancn requested a review from blapie March 1, 2024 12:43
@blapie
Copy link
Collaborator

blapie commented Mar 1, 2024

First, what I was getting at is not simply moving back to using BUILD_SHARED_LIBS but also making sure save/add_subdirectory/restore occurs inside a function (or at least a scope) so it builds safely in parallel.
Have you tried that? This should be applied everywhere save/restore is used (not just for SLEEF).

I can't explain the detail why it doesn't work. But I tried it and it not work. It is OS related.

Second, if the above approach does not work, maybe it would help to dive a little bit more into why this actually fails.
We are not excluding that something else needs to be fixed in SLEEF to make pytorch build on Windows. Introducing SLEEF_BUILD_SHARED_LIBS is just not an appropriate change from our point of view.
This blog is a good reference for this type of issues and it mentions the function-trick https://alexreinking.com/blog/building-a-dual-shared-and-static-library-with-cmake.html

If PyTorch's use of BUILD_SHARED_LIBS diverges from its definition (by building both static and shared components) then it needs to be handled properly in PyTorch and not in the components.
Therefore, I would not refer to the save/restore trick as a "work around", or more precisely not refer to this as a SLEEF issue.

@xuhancn
Copy link
Contributor Author

xuhancn commented Mar 1, 2024

First, what I was getting at is not simply moving back to using BUILD_SHARED_LIBS but also making sure save/add_subdirectory/restore occurs inside a function (or at least a scope) so it builds safely in parallel. Have you tried that? This should be applied everywhere save/restore is used (not just for SLEEF).

I can't explain the detail why it doesn't work. But I tried it and it not work. It is OS related.

Second, if the above approach does not work, maybe it would help to dive a little bit more into why this actually fails. We are not excluding that something else needs to be fixed in SLEEF to make pytorch build on Windows. Introducing SLEEF_BUILD_SHARED_LIBS is just not an appropriate change from our point of view. This blog is a good reference for this type of issues and it mentions the function-trick https://alexreinking.com/blog/building-a-dual-shared-and-static-library-with-cmake.html

If PyTorch's use of BUILD_SHARED_LIBS diverges from its definition (by building both static and shared components) then it needs to be handled properly in PyTorch and not in the components. Therefore, I would not refer to the save/restore trick as a "work around", or more precisely not refer to this as a SLEEF issue.

set(sources ...)
add_library(SomeLib_static STATIC ${sources})
add_library(SomeLib_shared SHARED ${sources})

From the example, gen sleef_static and sleef(shared) at the same time? do you think it is acceptable solution.

@blapie
Copy link
Collaborator

blapie commented Mar 1, 2024

Not sure I understand. I don't think this will solve the issue, but feel fee to try.

I was picturing defining a function encapsulating everything at this link https://github.com/pytorch/pytorch/blob/cee16353db384dd33a4804f9f26829cacfcfbab7/aten/src/ATen/CMakeLists.txt#L442-L478. Then just calling this function.

function(add_sleef)
 // basically all the content at link above
endfunction()

add_sleef()

The benefit of the function is that you don't even have to save and restore BUILD_SHARED_LIBS and BUILD_TESTS, the scope of the function itself protects from overriding BUILD_SHARED_LIBS at the global scope.

@xuhancn
Copy link
Contributor Author

xuhancn commented Mar 1, 2024

Not sure I understand. I don't think this will solve the issue, but feel fee to try.

I was picturing defining a function encapsulating everything at this link https://github.com/pytorch/pytorch/blob/cee16353db384dd33a4804f9f26829cacfcfbab7/aten/src/ATen/CMakeLists.txt#L442-L478. Then just calling this function.

function(add_sleef)
 // basically all the content at link above
endfunction()

add_sleef()

The benefit of the function is that you don't even have to save and restore BUILD_SHARED_LIBS and BUILD_TESTS, the scope of the function itself protects from overriding BUILD_SHARED_LIBS at the global scope.

  1. I means that If we can consider to build shared and static lib at the same time, like mimalloc: https://github.com/microsoft/mimalloc/blob/master/CMakeLists.txt#L22-L23
    shared library name: sleef.
    static library name: sleef_static.

  2. I don't think cmake function can works. BUILD_SHARED_LIBS is global scope.

image FROM: https://cmake.org/cmake/help/git-stage/variable/BUILD_SHARED_LIBS.html

Please read the offical guide. The offical guide said that:
do so in the top level CMakeLists.txt file, before any add_library() calls. and Note that if bringing external dependencies directly into the build, such as with FetchContent or a direct call to add_subdirectory(), and one of those dependencies has such a call to option(BUILD_SHARED_LIBS ...), the top level project must also call option(BUILD_SHARED_LIBS ...)before bringing in its dependencies. Failure to do so can lead to different behavior between the first and subsequent CMake runs.

Use the BUILD_SHARED_LIBS will bring in a lot of limitation.

@xuhancn
Copy link
Contributor Author

xuhancn commented Mar 4, 2024

Hi @blapie
Acturaly, sleef library as a math library, it is usual been integated into another projects as a compontemt. And in git management system, it is usual as a git submodule.
From the CMake offical document of BUILD_SHARED_LIBS:

  1. do so in the top level CMakeLists.txt file, before any add_library() calls.
  2. Note that if bringing external dependencies directly into the build, such as with FetchContent or a direct call to add_subdirectory(), and one of those dependencies has such a call to option(BUILD_SHARED_LIBS ...), the top level project must also call option(BUILD_SHARED_LIBS ...)before bringing in its dependencies. Failure to do so can lead to different behavior between the first and subsequent CMake runs.

In git-cmake build system, we usual build dependency via add_subdirectory(), As a submodule, sleef request its parent project also define BUILD_SHARED_LIBS ? whether it too much limitation?
pytorch is not and can't set BUILD_SHARED_LIBS due to it is a CMake reserved variable. And the doc said: Failure to do so can lead to different behavior between the first and subsequent CMake runs. is similar to my conclusion: #510 (comment)

We maintains a open source project should consider easy to developer use. Use BUILD_SHARED_LIBS control sleef will bring in too much limitation, and impact to parent project and other sub-modules.
My PR is good to solve the issue:

  1. "SLEEF_" prefix of "SLEEF_BUILD_SHARED_LIBS" shows it only control and impact on sleef project.
  2. Not use CMake global varibal and not impact to other modules.
  3. No limitation and easy for developer use.

@blapie
Copy link
Collaborator

blapie commented Mar 4, 2024

We have no issue with people using SLEEF as a submodule, if anything that is probably the preferred approach and we provide examples for that.

As a submodule, sleef request its parent project also define BUILD_SHARED_LIBS ? whether it too much limitation?

No, you don't have to define it, but if you need to switch between static and shared libs then that is the flag you should use.

Sorry I am going to repeat myself but your PR "solves" the problem from your point of view (or rather hides it) by introducing flaws in our codebase. We cannot accept a change like that simply because it will give unexpected results to SLEEF users (who also use it as a submodule).

Instead I suggested a simple alternative, that you don't want to try, which is encapsulating the options and add_subdirectory(sleef) in a function. I did it for you and it works, as far as your simple example goes, it does not modify BUILD_SHARED_LIBS in the global scope : xuhancn/sleef_cmake_example#1

I don't see what's wrong with this approach

  1. it is actually safer in terms of parallel build and global scope for pytorch
  2. you don't need me to approve a SLEEF PR to fix your problem

@xuhancn
Copy link
Contributor Author

xuhancn commented Mar 4, 2024

We have no issue with people using SLEEF as a submodule, if anything that is probably the preferred approach and we provide examples for that.

As a submodule, sleef request its parent project also define BUILD_SHARED_LIBS ? whether it too much limitation?

No, you don't have to define it, but if you need to switch between static and shared libs then that is the flag you should use.

Sorry I am going to repeat myself but your PR "solves" the problem from your point of view (or rather hides it) by introducing flaws in our codebase. We cannot accept a change like that simply because it will give unexpected results to SLEEF users (who also use it as a submodule).

Instead I suggested a simple alternative, that you don't want to try, which is encapsulating the options and add_subdirectory(sleef) in a function. I did it for you and it works, as far as your simple example goes, it does not modify BUILD_SHARED_LIBS in the global scope : xuhancn/sleef_cmake_example#1

I don't see what's wrong with this approach

  1. it is actually safer in terms of parallel build and global scope for pytorch
  2. you don't need me to approve a SLEEF PR to fix your problem

I tried your code just now, snapshot as below:
image

  1. Output directory is not coordinate as my origin example.
  2. The output library is shared lib(.dll), but not static lib as your expected.Maybe is the reason pytorch original code not works on Windows.

blapie_example_output.txt
I attached the all log to you for analysis. Windows is not as same as linux, you can setup a Windows build environment and reproduce it for validate.
Maybe your referenced blog existly works on Linux, but Windows is different.

* default build sleef in shared mode.
* fix install files to dummy folder.
@xuhancn xuhancn force-pushed the xu_add_optional_static_build_config branch from 3050f5e to f98ce8f Compare March 8, 2024 03:36
@xuhancn
Copy link
Contributor Author

xuhancn commented Mar 12, 2024

@blapie some days passed, any update?

@blapie
Copy link
Collaborator

blapie commented Mar 13, 2024

Hi! Sorry, I did not find time to test before. But I just did it on MSYS2 (Unix-like Windows environment) and as expected I get static libs when setting BUILD_SHARED_LIBS=OFF in the function, and I get shared libs otherwise. This is a Windows on Arm environment and I'm using Clang (not cl) so not exactly your case, but the difference should be quite small.

It would be good to go to the bottom of that, as it could simply be that a minor change in SLEEF is needed to work on Windows on x86 with cl.

Please note that SLEEF support for MSVC/CL is limited for now, in particular reproducing and testing is not available on the repo.

@blapie
Copy link
Collaborator

blapie commented Mar 15, 2024

Sounds very strange. I don't see the connection between the diff and the error message.

@blapie
Copy link
Collaborator

blapie commented Mar 15, 2024

Should xuhancn/pytorch#8 be rebased first?

@xuhancn
Copy link
Contributor Author

xuhancn commented Mar 15, 2024

Should xuhancn/pytorch#8 be rebased first?

No, it is passed if use SLEEF_BUILD_SHARED_LIB.

@xuhancn
Copy link
Contributor Author

xuhancn commented Mar 15, 2024

@jgong5 I have update code to compatible to BUILD_SHARED_LIB, and also work for SLEEF_BUILD_SHARED_LIB.
@blapie please review and merge ASAP.

image
Local test passed on pytorch/pytorch#118980 code.

@blapie
Copy link
Collaborator

blapie commented Mar 15, 2024

I'm looking into your last commit and will leave comments in patch.

Meanwhile let me go back to the initial problem for a second.

I just cannot believe that CMake created a variable that is not usable when doing such trivial thing as adding components as submodules. In fact what CMake says is (sorry for quoting again but this is pretty relevant to our position)

Note that if bringing external dependencies directly into the build, such as with [FetchContent](https://cmake.org/cmake/help/latest/module/FetchContent.html#module:FetchContent) or a direct call to [add_subdirectory()](https://cmake.org/cmake/help/latest/command/add_subdirectory.html#command:add_subdirectory), and one of those dependencies has such a call to [option(BUILD_SHARED_LIBS ...)](https://cmake.org/cmake/help/latest/command/option.html#command:option), the top level project must also call [option(BUILD_SHARED_LIBS ...)](https://cmake.org/cmake/help/latest/command/option.html#command:option) before bringing in its dependencies. Failure to do so can lead to different behavior between the first and subsequent CMake runs.

The "Failure to do so" refers to the need to add option(BUILD_SHARED_LIBS ...) before add_subdirectory(sleef). I don't think at any point we tried to follow this indication and add option(BUILD_SHARED_LIBS ...) rather than set(BUILD_SHARED_LIBS ...) before add_subdirectory(sleef).

Could it be a simple solution to the issue? Could you please try that?

@xuhancn
Copy link
Contributor Author

xuhancn commented Mar 15, 2024

I'm looking into your last commit and will leave comments in patch.

Meanwhile let me go back to the initial problem for a second.

I just cannot believe that CMake created a variable that is not usable when doing such trivial thing as adding components as submodules. In fact what CMake says is (sorry for quoting again but this is pretty relevant to our position)

Note that if bringing external dependencies directly into the build, such as with [FetchContent](https://cmake.org/cmake/help/latest/module/FetchContent.html#module:FetchContent) or a direct call to [add_subdirectory()](https://cmake.org/cmake/help/latest/command/add_subdirectory.html#command:add_subdirectory), and one of those dependencies has such a call to [option(BUILD_SHARED_LIBS ...)](https://cmake.org/cmake/help/latest/command/option.html#command:option), the top level project must also call [option(BUILD_SHARED_LIBS ...)](https://cmake.org/cmake/help/latest/command/option.html#command:option) before bringing in its dependencies. Failure to do so can lead to different behavior between the first and subsequent CMake runs.

The "Failure to do so" refers to the need to add option(BUILD_SHARED_LIBS ...) before add_subdirectory(sleef). I don't think at any point we tried to follow this indication and add option(BUILD_SHARED_LIBS ...) rather than set(BUILD_SHARED_LIBS ...) before add_subdirectory(sleef).

Could it be a simple solution to the issue? Could you please try that?

My solution is merge this PR.
You can setup a Windows MSVC environment and try it by yourself. Thanks.

@blapie
Copy link
Collaborator

blapie commented Mar 15, 2024

@alexreinking Hello! Sorry to bother you, but we stumbled upon your blog (https://alexreinking.com/blog/building-a-dual-shared-and-static-library-with-cmake.html) and wondered if maybe you could shine some light on this situation?
There seems to be issues controlling BUILD_SHARED_LIBS at the component boundary in pytorch (https://github.com/pytorch/pytorch/blob/cee16353db384dd33a4804f9f26829cacfcfbab7/aten/src/ATen/CMakeLists.txt#L423-L485).
We would like to avoid merging a change like currently suggested in the PR, as we think it will confuse users since it will result in unexpected behaviour.
What do you recommend?

Comment on lines +31 to +37
set(_INTERNAL_SLEEF_BUILD_SHARED_LIBS)
if(NOT DEFINED ${SLEEF_BUILD_SHARED_LIBS})
set(_INTERNAL_SLEEF_BUILD_SHARED_LIBS ${BUILD_SHARED_LIBS})
else()
set(_INTERNAL_SLEEF_BUILD_SHARED_LIBS ${SLEEF_BUILD_SHARED_LIBS})
endif()

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be done with options instead? For instance,

option(BUILD_SHARED_LIBS "Build sleef as shared library" ON)
option(SLEEF_BUILD_SHARED_LIBS "Build sleef as shared library" ${BUILD_SHARED_LIBS})

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@blapie -- These are not equivalent because this defines SLEEF_BUILD_SHARED_LIBS as a cache variable when no variable by that name (normal or cache) exists. So then changing BUILD_SHARED_LIBS only works as expected on the first configuration.

My blog post suggests the following:

Suggested change
set(_INTERNAL_SLEEF_BUILD_SHARED_LIBS)
if(NOT DEFINED ${SLEEF_BUILD_SHARED_LIBS})
set(_INTERNAL_SLEEF_BUILD_SHARED_LIBS ${BUILD_SHARED_LIBS})
else()
set(_INTERNAL_SLEEF_BUILD_SHARED_LIBS ${SLEEF_BUILD_SHARED_LIBS})
endif()
if (DEFINED SLEEF_BUILD_SHARED_LIBS)
set(BUILD_SHARED_LIBS ${SLEEF_BUILD_SHARED_LIBS})
endif ()

Then you can just look at the truthiness of BUILD_SHARED_LIBS down the line. If you care about documenting this variable in the CMake CLI/GUI, then this approach works:

Suggested change
set(_INTERNAL_SLEEF_BUILD_SHARED_LIBS)
if(NOT DEFINED ${SLEEF_BUILD_SHARED_LIBS})
set(_INTERNAL_SLEEF_BUILD_SHARED_LIBS ${BUILD_SHARED_LIBS})
else()
set(_INTERNAL_SLEEF_BUILD_SHARED_LIBS ${SLEEF_BUILD_SHARED_LIBS})
endif()
set(SLEEF_BUILD_SHARED_LIBS "undefined"
CACHE BOOL "Override BUILD_SHARED_LIBS while configuring SLEEF")
if (NOT SLEEF_BUILD_SHARED_LIBS STREQUAL "undefined")
set(BUILD_SHARED_LIBS ${SLEEF_BUILD_SHARED_LIBS})
endif ()

Copy link
Contributor

@alexreinking alexreinking left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got a little carried away with code review outside the question at hand; sorry about this. It's just that CMakeLists.txt that don't start with cmake_minimum_required+project are liable to break without warning on any future version.

I don't understand how my blog post was used to justify splatting out _INTERNAL_SLEEF_BUILD_SHARED_LIBS everywhere. Just use SLEEF_BUILD_SHARED_LIBS as the value of BUILD_SHARED_LIBS if and only if it is defined by the user. I repeat the approach in my blog post and give an alternative approach.

Comment on lines +31 to +37
set(_INTERNAL_SLEEF_BUILD_SHARED_LIBS)
if(NOT DEFINED ${SLEEF_BUILD_SHARED_LIBS})
set(_INTERNAL_SLEEF_BUILD_SHARED_LIBS ${BUILD_SHARED_LIBS})
else()
set(_INTERNAL_SLEEF_BUILD_SHARED_LIBS ${SLEEF_BUILD_SHARED_LIBS})
endif()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@blapie -- These are not equivalent because this defines SLEEF_BUILD_SHARED_LIBS as a cache variable when no variable by that name (normal or cache) exists. So then changing BUILD_SHARED_LIBS only works as expected on the first configuration.

My blog post suggests the following:

Suggested change
set(_INTERNAL_SLEEF_BUILD_SHARED_LIBS)
if(NOT DEFINED ${SLEEF_BUILD_SHARED_LIBS})
set(_INTERNAL_SLEEF_BUILD_SHARED_LIBS ${BUILD_SHARED_LIBS})
else()
set(_INTERNAL_SLEEF_BUILD_SHARED_LIBS ${SLEEF_BUILD_SHARED_LIBS})
endif()
if (DEFINED SLEEF_BUILD_SHARED_LIBS)
set(BUILD_SHARED_LIBS ${SLEEF_BUILD_SHARED_LIBS})
endif ()

Then you can just look at the truthiness of BUILD_SHARED_LIBS down the line. If you care about documenting this variable in the CMake CLI/GUI, then this approach works:

Suggested change
set(_INTERNAL_SLEEF_BUILD_SHARED_LIBS)
if(NOT DEFINED ${SLEEF_BUILD_SHARED_LIBS})
set(_INTERNAL_SLEEF_BUILD_SHARED_LIBS ${BUILD_SHARED_LIBS})
else()
set(_INTERNAL_SLEEF_BUILD_SHARED_LIBS ${SLEEF_BUILD_SHARED_LIBS})
endif()
set(SLEEF_BUILD_SHARED_LIBS "undefined"
CACHE BOOL "Override BUILD_SHARED_LIBS while configuring SLEEF")
if (NOT SLEEF_BUILD_SHARED_LIBS STREQUAL "undefined")
set(BUILD_SHARED_LIBS ${SLEEF_BUILD_SHARED_LIBS})
endif ()

Comment on lines +166 to +168
if(SLEEF_ENABLE_LTO AND _INTERNAL_SLEEF_BUILD_SHARED_LIBS)
message(FATAL_ERROR "SLEEF_ENABLE_LTO and _INTERNAL_SLEEF_BUILD_SHARED_LIBS cannot be specified at the same time")
endif(SLEEF_ENABLE_LTO AND _INTERNAL_SLEEF_BUILD_SHARED_LIBS)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With my other suggestions:

Suggested change
if(SLEEF_ENABLE_LTO AND _INTERNAL_SLEEF_BUILD_SHARED_LIBS)
message(FATAL_ERROR "SLEEF_ENABLE_LTO and _INTERNAL_SLEEF_BUILD_SHARED_LIBS cannot be specified at the same time")
endif(SLEEF_ENABLE_LTO AND _INTERNAL_SLEEF_BUILD_SHARED_LIBS)
if (SLEEF_ENABLE_LTO AND BUILD_SHARED_LIBS)
message(FATAL_ERROR "SLEEF_ENABLE_LTO and (SLEEF_)BUILD_SHARED_LIBS cannot be enabled at the same time")
endif ()

Note that duplicating the condition in the endif hasn't been a requirement since at least 2.8.x but it might be as far back as 2.4.x. Definitely not needed in 3.18, which is your minimum.

@@ -305,7 +311,7 @@ if(SLEEF_SHOW_CONFIG)
message(" Native build dir: ${NATIVE_BUILD_DIR}")
endif(CMAKE_CROSSCOMPILING)
message(STATUS "Using option `${SLEEF_C_FLAGS}` to compile libsleef")
message(STATUS "Building shared libs : " ${BUILD_SHARED_LIBS})
message(STATUS "Building shared libs : " ${_INTERNAL_SLEEF_BUILD_SHARED_LIBS})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
message(STATUS "Building shared libs : " ${_INTERNAL_SLEEF_BUILD_SHARED_LIBS})
message(STATUS "Building shared libs : " ${BUILD_SHARED_LIBS})

@@ -7,7 +7,7 @@ include(CheckLanguage)

if (SLEEF_BUILD_STATIC_TEST_BINS)
set(CMAKE_FIND_LIBRARY_SUFFIXES ".a")
set(BUILD_SHARED_LIBS OFF)
set(_INTERNAL_SLEEF_BUILD_SHARED_LIBS OFF)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
set(_INTERNAL_SLEEF_BUILD_SHARED_LIBS OFF)
set(BUILD_SHARED_LIBS OFF)

@@ -834,7 +834,7 @@ endif()

# Set common definitions

if (NOT BUILD_SHARED_LIBS)
if (NOT _INTERNAL_SLEEF_BUILD_SHARED_LIBS)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (NOT _INTERNAL_SLEEF_BUILD_SHARED_LIBS)
if (NOT BUILD_SHARED_LIBS)

@@ -157,9 +163,9 @@ separate build directory. Note: Please remove autogenerated file \
`CMakeCache.txt` and directory `CMakeFiles` in the current directory.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everywhere you should just use SLEEF_SOURCE_DIR and SLEEF_BINARY_DIR since those are the standard and expected names. The project command defines them and the VERSION variables.

https://cmake.org/cmake/help/v3.18/command/project.html#project

endif(SLEEF_ENABLE_LTO AND BUILD_SHARED_LIBS)
if(SLEEF_ENABLE_LTO AND _INTERNAL_SLEEF_BUILD_SHARED_LIBS)
message(FATAL_ERROR "SLEEF_ENABLE_LTO and _INTERNAL_SLEEF_BUILD_SHARED_LIBS cannot be specified at the same time")
endif(SLEEF_ENABLE_LTO AND _INTERNAL_SLEEF_BUILD_SHARED_LIBS)

if(SLEEF_ENABLE_LTO)
cmake_policy(SET CMP0069 NEW)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
cmake_policy(SET CMP0069 NEW)

That policy is already enabled by cmake_minimum_required(VERSION 3.18). It's enabled for 3.9+.

@@ -133,7 +133,7 @@ optimized, or any other special set of flags.
- `SLEEF_SHOW_CONFIG` : Show relevant CMake variables upon configuring a build
- `SLEEF_SHOW_ERROR_LOG` : Show the content of CMakeError.log

- `BUILD_SHARED_LIBS` : Static libs are built if set to FALSE
- `SLEEF_BUILD_SHARED_LIBS` : Static libs are built if set to FALSE
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- `SLEEF_BUILD_SHARED_LIBS` : Static libs are built if set to FALSE
- `SLEEF_BUILD_SHARED_LIBS` : Static libs are built if set to FALSE. If not manually set, then `BUILD_SHARED_LIBS` is used.

Comment on lines +376 to +380
if(_INTERNAL_SLEEF_BUILD_SHARED_LIBS)
add_library(${TARGET_LIBDFT} SHARED $<TARGET_OBJECTS:dftcommon_obj> $<TARGET_OBJECTS:${TARGET_LIBARRAYMAP_OBJ}>)
else()
add_library(${TARGET_LIBDFT} STATIC $<TARGET_OBJECTS:dftcommon_obj> $<TARGET_OBJECTS:${TARGET_LIBARRAYMAP_OBJ}>)
endif()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if(_INTERNAL_SLEEF_BUILD_SHARED_LIBS)
add_library(${TARGET_LIBDFT} SHARED $<TARGET_OBJECTS:dftcommon_obj> $<TARGET_OBJECTS:${TARGET_LIBARRAYMAP_OBJ}>)
else()
add_library(${TARGET_LIBDFT} STATIC $<TARGET_OBJECTS:dftcommon_obj> $<TARGET_OBJECTS:${TARGET_LIBARRAYMAP_OBJ}>)
endif()
add_library(${TARGET_LIBDFT} $<TARGET_OBJECTS:dftcommon_obj> $<TARGET_OBJECTS:${TARGET_LIBARRAYMAP_OBJ}>)

Comment on lines +362 to +366
if(_INTERNAL_SLEEF_BUILD_SHARED_LIBS)
add_library(${TARGET_LIBSLEEF} SHARED ${STANDARD_SOURCES})
else()
add_library(${TARGET_LIBSLEEF} STATIC ${STANDARD_SOURCES})
endif()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if(_INTERNAL_SLEEF_BUILD_SHARED_LIBS)
add_library(${TARGET_LIBSLEEF} SHARED ${STANDARD_SOURCES})
else()
add_library(${TARGET_LIBSLEEF} STATIC ${STANDARD_SOURCES})
endif()
add_library(${TARGET_LIBSLEEF} ${STANDARD_SOURCES})

I'm going to stop highlighting all the places this needs to be reverted.

@alexreinking
Copy link
Contributor

alexreinking commented Mar 15, 2024

With the approach I outlined above, PyTorch should be able to build SLEEF by setting SLEEF_BUILD_SHARED_LIBS before calling add_subdirectory (barring other bugs I'm not aware of). I understand it is currently saving and restoring the value of BUILD_SHARED_LIBS. That will still work.

If you can show me with clear, reproducible steps in a self-contained post why my approach doesn't work, then I will be very happy to refine it further. I don't have time to read through 100+ comments across multiple PRs.

@jgong5
Copy link

jgong5 commented Mar 16, 2024

With the approach I outlined above, PyTorch should be able to build SLEEF by setting SLEEF_BUILD_SHARED_LIBS before calling add_subdirectory (barring other bugs I'm not aware of). I understand it is currently saving and restoring the value of BUILD_SHARED_LIBS. That will still work.

Thanks @alexreinking for the suggestions. @blapie Sounds like this is exactly what we discussed in the meeting. I believe it should work with PyTorch.

@xuhancn
Copy link
Contributor Author

xuhancn commented Mar 16, 2024

I got a little carried away with code review outside the question at hand; sorry about this. It's just that CMakeLists.txt that don't start with cmake_minimum_required+project are liable to break without warning on any future version.

I don't understand how my blog post was used to justify splatting out _INTERNAL_SLEEF_BUILD_SHARED_LIBS everywhere. Just use SLEEF_BUILD_SHARED_LIBS as the value of BUILD_SHARED_LIBS if and only if it is defined by the user. I repeat the approach in my blog post and give an alternative approach.

@alexreinking I use _INTERNAL_SLEEF_BUILD_SHARED_LIBS as an internal variable to control static/shared lib output. It is in order to avoid use BUILD_SHARED_LIB from the CMake offical document, snapshot as below:
image
BUILD_SHARED_LIB has too many limitation to use, I marked with highlight and red pen. I think my current PR is good enough and tested on pytorch.
If legacy user still use BUILD_SHARED_LIBS but not define SLEEF_BUILD_SHARED_LIBS, we can compatiable to them.
If new user defined SLEEF_BUILD_SHARED_LIBS, it will take over the control right.
Whatever, I think still use BUILD_SHARED_LIBS & add_library without STATIC\SHARED arg, which it not stronge enough and safe enough. So, I can't agree to use SLEEF_BUILD_SHARED_LIBS overwrite BUILD_SHARED_LIBS solution.
Please agree to my current code, thanks. @alexreinking.

@jgong5
Copy link

jgong5 commented Mar 16, 2024

With the approach I outlined above, PyTorch should be able to build SLEEF by setting SLEEF_BUILD_SHARED_LIBS before calling add_subdirectory (barring other bugs I'm not aware of). I understand it is currently saving and restoring the value of BUILD_SHARED_LIBS. That will still work.

Thanks @alexreinking for the suggestions. @blapie Sounds like this is exactly what we discussed in the meeting. I believe it should work with PyTorch.

Oh, it seems I misunderstood the approach proposed by @alexreinking . Thanks @xuhancn for the reminder. I guess the difference is whether to overwrite the global BUILD_SHARED_LIBS. I doubt overwriting it would work (since PyTorch does the save/restore too and it doesn't work on Windows) and it is also not necessary. We only need to pass the SHARED/STATIC flag to the add_library according to BUILD_SHARED_LIBS and SLEEF_BUILD_SHARED_LIBS just like what @xuhancn without bothering altering the value of BUILD_SHARED_LIBS.

@alexreinking
Copy link
Contributor

alexreinking commented Mar 16, 2024

I guess the difference is whether to overwrite the global BUILD_SHARED_LIBS

@jgong5 what I suggested does not alter the global BUILD_SHARED_LIBS option. When you call add_subdirectory it creates a new variable environment. Setting a normal variable in a subproject is not visible to the parent project.

We only need to pass the SHARED/STATIC flag to the add_library according to BUILD_SHARED_LIBS

This is exactly what BUILD_SHARED_LIBS does.

@alexreinking
Copy link
Contributor

@xuhancn - the documentation you cite outlines a pitfall of using option to do this. I do not use option. I use set.

option will create a cache variable when no variable at all exists. This will then be visible to the parent project on subsequent configure steps.

My solution does not suffer from this problem because it does not touch the cache.

@xuhancn
Copy link
Contributor Author

xuhancn commented Mar 16, 2024

@xuhancn - the documentation you cite outlines a pitfall of using option to do this. I do not use option. I use set.

option will create a cache variable when no variable at all exists. This will then be visible to the parent project on subsequent configure steps.

My solution does not suffer from this problem because it does not touch the cache.

Currently, we need to guarantee it works on pytorch's submodule. My PR code is validated on pytorch all PreCI.
image
If you persist in your solution and willing to contribute to sleef. Could you please submit your solution as a sleef PR? I will help on validate your PR on pytorch. @alexreinking

@alexreinking
Copy link
Contributor

Instead I suggested a simple alternative, that you don't want to try, which is encapsulating the options and add_subdirectory(sleef) in a function. I did it for you and it works, as far as your simple example goes, it does not modify BUILD_SHARED_LIBS in the global scope

@blapie -- I think you're 100% correct in your assessment that PyTorch could use a function to set up BUILD_SHARED_LIBS before calling add_subdirectory. I still think that providing SLEEF_BUILD_SHARED_LIBS is nicer for downstreams because doing that function dance is pretty ugly. The situation has gotten a little better since block() was introduced in CMake 3.25.

@xuhancn
Copy link
Contributor Author

xuhancn commented Mar 18, 2024

Currently,we have two PRs can fix BUILD_SHARED_LIB unexpected behavior on Windows MSVC.

  1. This PR: add static build config. #510
  2. alexreinking's PR: Add SLEEF_BUILD_SHARED_LIBS to override BUILD_SHARED_LIBS from including projects #531
    Both of them can pass pytorch's PreCI.
PR_#510 PR_#531
Fix build issue Yes Yes
Implement Avoid to use BUILD_SHARED_LIB Overwrite to BUILD_SHARED_LIB
Support SLEEF_BUILD_SHARED_LIB Yes Yes
compatible to BUILD_SHARED_LIB Yes Yes
Easy to maintain Basic CMake knowledge is ok Need deep dive to CMake

I made a comparsion table to summary the two PRs. My PR is simpler than alexreinking's. My suggestion is merge my PR into master branch.

Any comments? @blapie @jgong5 @alexreinking

@jgong5
Copy link

jgong5 commented Mar 18, 2024

I made a comparsion table to summary the two PRs. My PR is simpler than alexreinking's. My suggestion is merge my PR into master branch.

Any comments? @blapie @jgong5 @alexreinking

I don't have strong preference on the option. Both options LGTM.

@alexreinking
Copy link
Contributor

@xuhancn - I think your comparison is wildly unfair.

My PR is simpler than alexreinking's.

My PR consists of three lines that matter for the shared libraries issue. It also fixes a real misuse of CMake and improves some console output.

Your PR requires ongoing maintenance to ensure that any new add_library call is guarded by a check to the unnecessary and nonstandard variable you introduced.

With my PR, nothing special needs to be done going forward.

Need deep dive to CMake

This is also untrue. One just needs basic facts about how variable scope works in CMake.


Re: @jgong5
Cc: @blapie

I don't have strong preference on the option. Both options LGTM.

I encourage the maintainers here to look at both diffs and draw their own conclusions.

@blapie
Copy link
Collaborator

blapie commented Mar 18, 2024

@alexreinking - Hello! Thanks a lot for the insight, I learned a lot reading through your suggestion. First of all, I agree that SLEEF's cmake was in need for some proper housekeeping, which made it prone to bugs.
Thank you for making it clear where the issue most likely came from (set/option, overwriting to cache, ...). I agree that the SLEEF_-prefixed variable can be helpful for downstream projects, and I like that we stick to using the standard variable internally.
I am totally conscious that you did not have to respond to our enquiry, let alone provide so much insight as well as an entire PR/solution. I wish your contribution was welcomed with a bit more warmth...

@jgong5 - I believe this solution meets in the middle while make SLEEF build system way more robust. Thanks for contributing to the discussion and sorry I could not reply before.

I will review #531 but from the first couple of messages it sounded all good.

@blapie
Copy link
Collaborator

blapie commented Mar 18, 2024

Happy we finally got to a satisfying solution for everyone, thanks @alexreinking, @xuhancn and @jgong5.

@blapie blapie closed this Mar 18, 2024
@jgong5
Copy link

jgong5 commented Mar 18, 2024

@blapie @alexreinking Thanks a lot for your help and effort for landing it!

@xuhancn xuhancn deleted the xu_add_optional_static_build_config branch March 19, 2024 05:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants