Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[libc++] First attempt to regroup a few modules in the modulemap #98214

Merged
merged 1 commit into from
Aug 30, 2024

Conversation

ldionne
Copy link
Member

@ldionne ldionne commented Jul 9, 2024

We split up all the headers into top-level modules when we broke up cycles with the C compatibility headers. However, this resulted in a large number of small modules, which is awkward and clearly against the philosophy of Clang modules. This was necessary to make things work.

This patch regroups a few headers from two leaf modules: stop_token and pstl. It should be pretty uncontroversial that grouping these headers into a single module doesn't introduce any cyclic dependency, yet it's a first step towards reducing the number of top-level modules we have in our modulemap.

@ldionne ldionne requested a review from a team as a code owner July 9, 2024 20:25
@llvmbot llvmbot added the libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi. label Jul 9, 2024
@llvmbot
Copy link
Collaborator

llvmbot commented Jul 9, 2024

@llvm/pr-subscribers-libcxx

Author: Louis Dionne (ldionne)

Changes

We split up all the headers into top-level modules when we broke up cycles with the C compatibility headers. However, this resulted in a large number of small modules, which is awkward and clearly against the philosophy of Clang modules. This was necessary to make things work.

This patch regroups a few headers from two leaf modules: stop_token and pstl. It should be pretty uncontroversial that grouping these headers into a single module doesn't introduce any cyclic dependency, yet it's a first step towards reducing the number of top-level modules we have in our modulemap.


Full diff: https://github.com/llvm/llvm-project/pull/98214.diff

4 Files Affected:

  • (modified) libcxx/include/module.modulemap (+20-46)
  • (modified) libcxx/test/libcxx/thread/thread.stoptoken/atomic_unique_lock.pass.cpp (+1-1)
  • (modified) libcxx/test/libcxx/thread/thread.stoptoken/intrusive_list_view.pass.cpp (+1-1)
  • (modified) libcxx/test/libcxx/thread/thread.stoptoken/intrusive_shared_ptr.pass.cpp (+1-1)
diff --git a/libcxx/include/module.modulemap b/libcxx/include/module.modulemap
index 9ffccf66ff094..59ad740ac164a 100644
--- a/libcxx/include/module.modulemap
+++ b/libcxx/include/module.modulemap
@@ -245,8 +245,15 @@ module std_stdexcept [system] {
   header "stdexcept"
   export *
 }
-module std_stop_token {
+module std_stop_token [system] {
   header "stop_token"
+  private header "__stop_token/atomic_unique_lock.h"
+  private header "__stop_token/intrusive_list_view.h"
+  private header "__stop_token/intrusive_shared_ptr.h"
+  private header "__stop_token/stop_callback.h"
+  private header "__stop_token/stop_source.h"
+  private header "__stop_token/stop_state.h"
+  private header "__stop_token/stop_token.h"
   export *
 }
 module std_streambuf [system] {
@@ -1584,41 +1591,25 @@ module std_private_numeric_transform_exclusive_scan [system] { header "__numeric
 module std_private_numeric_transform_inclusive_scan [system] { header "__numeric/transform_inclusive_scan.h" }
 module std_private_numeric_transform_reduce         [system] { header "__numeric/transform_reduce.h" }
 
-module std_private_pstl_backend                    [system] {
+module std_private_pstl [system] {
   header "__pstl/backend.h"
-  export *
-}
-module std_private_pstl_backend_fwd                [system] {
   header "__pstl/backend_fwd.h"
-  export *
-}
-module std_private_pstl_backends_default           [system] {
   header "__pstl/backends/default.h"
-  export *
-}
-module std_private_pstl_backends_libdispatch       [system] {
   header "__pstl/backends/libdispatch.h"
-  export *
-}
-module std_private_pstl_backends_serial            [system] {
   header "__pstl/backends/serial.h"
-  export *
-}
-module std_private_pstl_backends_std_thread        [system] {
   header "__pstl/backends/std_thread.h"
-  export *
+  header "__pstl/cpu_algos/any_of.h"
+  header "__pstl/cpu_algos/cpu_traits.h"
+  header "__pstl/cpu_algos/fill.h"
+  header "__pstl/cpu_algos/find_if.h"
+  header "__pstl/cpu_algos/for_each.h"
+  header "__pstl/cpu_algos/merge.h"
+  header "__pstl/cpu_algos/stable_sort.h"
+  header "__pstl/cpu_algos/transform.h"
+  header "__pstl/cpu_algos/transform_reduce.h"
+  header "__pstl/dispatch.h"
+  header "__pstl/handle_exception.h"
 }
-module std_private_pstl_cpu_algos_any_of           [system] { header "__pstl/cpu_algos/any_of.h" }
-module std_private_pstl_cpu_algos_cpu_traits       [system] { header "__pstl/cpu_algos/cpu_traits.h" }
-module std_private_pstl_cpu_algos_fill             [system] { header "__pstl/cpu_algos/fill.h" }
-module std_private_pstl_cpu_algos_find_if          [system] { header "__pstl/cpu_algos/find_if.h" }
-module std_private_pstl_cpu_algos_for_each         [system] { header "__pstl/cpu_algos/for_each.h" }
-module std_private_pstl_cpu_algos_merge            [system] { header "__pstl/cpu_algos/merge.h" }
-module std_private_pstl_cpu_algos_stable_sort      [system] { header "__pstl/cpu_algos/stable_sort.h" }
-module std_private_pstl_cpu_algos_transform        [system] { header "__pstl/cpu_algos/transform.h" }
-module std_private_pstl_cpu_algos_transform_reduce [system] { header "__pstl/cpu_algos/transform_reduce.h" }
-module std_private_pstl_dispatch                   [system] { header "__pstl/dispatch.h" }
-module std_private_pstl_handle_exception           [system] { header "__pstl/handle_exception.h" }
 
 module std_private_queue_fwd [system] { header "__fwd/queue.h" }
 
@@ -1773,23 +1764,6 @@ module std_private_span_span_fwd [system] { header "__fwd/span.h" }
 
 module std_private_stack_fwd [system] { header "__fwd/stack.h" }
 
-module std_private_stop_token_atomic_unique_lock   [system] { header "__stop_token/atomic_unique_lock.h" }
-module std_private_stop_token_intrusive_list_view  [system] { header "__stop_token/intrusive_list_view.h" }
-module std_private_stop_token_intrusive_shared_ptr [system] { header "__stop_token/intrusive_shared_ptr.h" }
-module std_private_stop_token_stop_callback        [system] { header "__stop_token/stop_callback.h" }
-module std_private_stop_token_stop_source          [system] {
-  header "__stop_token/stop_source.h"
-  export *
-}
-module std_private_stop_token_stop_state           [system] {
-  header "__stop_token/stop_state.h"
-  export *
-}
-module std_private_stop_token_stop_token           [system] {
-  header "__stop_token/stop_token.h"
-  export *
-}
-
 module std_private_string_char_traits           [system] {
   header "__string/char_traits.h"
   export *
diff --git a/libcxx/test/libcxx/thread/thread.stoptoken/atomic_unique_lock.pass.cpp b/libcxx/test/libcxx/thread/thread.stoptoken/atomic_unique_lock.pass.cpp
index 2a9b828f4389c..bd08ac17e9964 100644
--- a/libcxx/test/libcxx/thread/thread.stoptoken/atomic_unique_lock.pass.cpp
+++ b/libcxx/test/libcxx/thread/thread.stoptoken/atomic_unique_lock.pass.cpp
@@ -12,7 +12,7 @@
 
 // UNSUPPORTED: c++03, c++11, c++14, c++17
 
-#include <__stop_token/atomic_unique_lock.h>
+#include <stop_token>
 #include <atomic>
 #include <cassert>
 #include <chrono>
diff --git a/libcxx/test/libcxx/thread/thread.stoptoken/intrusive_list_view.pass.cpp b/libcxx/test/libcxx/thread/thread.stoptoken/intrusive_list_view.pass.cpp
index 85cd978625895..35b96c0d84fea 100644
--- a/libcxx/test/libcxx/thread/thread.stoptoken/intrusive_list_view.pass.cpp
+++ b/libcxx/test/libcxx/thread/thread.stoptoken/intrusive_list_view.pass.cpp
@@ -9,7 +9,7 @@
 
 // UNSUPPORTED: c++03, c++11, c++14, c++17
 
-#include <__stop_token/intrusive_list_view.h>
+#include <stop_token>
 #include <cassert>
 
 #include "test_macros.h"
diff --git a/libcxx/test/libcxx/thread/thread.stoptoken/intrusive_shared_ptr.pass.cpp b/libcxx/test/libcxx/thread/thread.stoptoken/intrusive_shared_ptr.pass.cpp
index 47440015f2c50..3ba8f484d0c64 100644
--- a/libcxx/test/libcxx/thread/thread.stoptoken/intrusive_shared_ptr.pass.cpp
+++ b/libcxx/test/libcxx/thread/thread.stoptoken/intrusive_shared_ptr.pass.cpp
@@ -9,7 +9,7 @@
 
 // UNSUPPORTED: c++03, c++11, c++14, c++17
 
-#include <__stop_token/intrusive_shared_ptr.h>
+#include <stop_token>
 #include <atomic>
 #include <cassert>
 #include <utility>

header "stop_token"
private header "__stop_token/atomic_unique_lock.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't these still be submodules?

Copy link
Member Author

@ldionne ldionne Jul 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know? Is there a reason why we should prefer making it a submodule rather than listing private headers like this? Genuine question, I'm not an expert on Clang modules.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes generally every header should be in its own submodule. Otherwise including stop_token when building with modules will implicitly include all of these headers as well, even if stop_token itself doesn't include them.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But <stop_token> does need these headers.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well yes. The stop_token implementation headers are a bit unique in that stop_token itself directly includes all of them, and none of the other headers do. Something more typical is like __type_traits headers. You can't do like this.

module std_type_traits [system] {
  header "type_traits"
  private header "__type_traits/A.h"
  export *
}
  1. If say algorithm includes __type_traits/A.h it will also get type_traits and all of the other headers declared at the same level.
  2. algorithm can't include __type_traits/A.h because it's private.

So while this is fine for stop_token, it's going to unravel fast when you try to do some of the more significant ones.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a surprising best practice. Aren't we trying to stop building so many small modules in the first place? The way I understand it:

  • stop_token is a module, and it has both public APIs and private APIs, as denoted by the public/private headers declared in that module
  • type_traits is a module, but it has almost exclusively public headers because __type_traits/xxxxx.h are meant to be included directly from other parts of the code.

Either way -- we don't currently expect other parts of the code to include implementation details of <stop_token>. If they did, I would argue for moving these parts (e.g. intrusive_shared_ptr) to another module that contains generally-useful utilities, or to memory, or something. But obviously <stop_token> should stay a leaf module whatever happens, and in that sense I think it makes most sense to define it the way it is currently defined (with private headers).

Copy link
Contributor

@ian-twilightcoder ian-twilightcoder Jul 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Submodules only control visibility. If you make a submodule for all of the __stop_token headers, it'll still build all of them into a single pcm. It only controls what's visible when something includes __stop_token/atomic_unique_lock.h. i.e. if you have this.

module std_stop_token [system] {
  header "stop_token"
  module atomic_unique_lock { private header "__stop_token/atomic_unique_lock.h" }
  ...
}

Then std_stop_token builds the same as it does the way you have it right now and still cuts down on all the module builds/pcms. The difference is that if something includes __stop_token/atomic_unique_lock.h, then it behaves like non-modular C and you only get __stop_token/atomic_unique_lock.h. But the way you have it, if you build with modules then you also get stop_token and all of the other __stop_token headers, which is pretty unintuitive.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, thanks for the clarification. This is clearly relevant for how to setup e.g. type_traits. However, for stop_token, nobody outside of the stop_token module can include __stop_token/atomic_unique_lock.h, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As long as you keep it a private header, yes that's right.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I believe the original mega module had submodules for all of the implementation headers like I'm suggesting btw)

@@ -12,7 +12,7 @@

// UNSUPPORTED: c++03, c++11, c++14, c++17

#include <__stop_token/atomic_unique_lock.h>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we instead just add -Wno-private-header to all libcxx/test/libcxx tests? I think ideally we wouldn't export library details at all other than by including detail headers.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can, but I think I'd rather add it on a per test basis to avoid making it too easy to include detail headers unknowingly.

@ldionne ldionne force-pushed the review/modulemap-cleanup-gradual branch from 99edfb4 to 8451f06 Compare July 9, 2024 21:14
@vgvassilev
Copy link
Contributor

I am wondering what prevents us from wrapping all of the content of the modulemap into a module std {. We do this here and we ship it with practically all versions of osx. It mostly works across versions of libcxx/xcode. The problem is that we need to update it every time a private header is added or goes away. The other problem is that in theory we should have such a modulemap per release of libcxx, which does not scale very well. That was the motivation I had in mind when opening #86193.

Although I do not see a point in having separate pcm files for each header, I see two solutions that can restore piece in our software galaxy:

  • Preferably, wrap everything in a single module -- here we have two options. We can keep the current modulemap for the CI and only do the wrapping at release. That's somewhat safe because if the former works the latter should be guaranteed to work too.
  • Less preferably, we can have two versions of that modulemap. One called module.modulemap with the current design the other called std.modulemap in which cmake cuts and pastes the content of module.modulemap into a module std { export * construct. This way users who want a single module can instruct clang to use std.modulemap.

What do you think?

@ldionne
Copy link
Member Author

ldionne commented Jul 12, 2024

I am wondering what prevents us from wrapping all of the content of the modulemap into a module std {. We do this here and we ship it with practically all versions of osx. It mostly works across versions of libcxx/xcode.

The approach you link to has the same problem that caused us to split the modulemap in the first place. It creates a circular dependency if there's a module in the SDK that includes e.g. stdint.h since that's going to use the libc++ stdint.h, but the libc++ module already depends on the SDK.

I'm not convinced that there's any better approach than having a single, simplified modulemap that matches our actual library organization. I'd be happy to change my mind if someone can explain why that can't work or is not desirable, but so far I have not seen any evidence to support that, so I'm still pretty set on pursuing that.

We split up all the headers into top-level modules when we broke up
cycles with the C compatibility headers. However, this resulted in
a large number of small modules, which is awkward and clearly against
the philosophy of Clang modules. This was necessary to make things work.

This patch regroups a few headers from two leaf modules: stop_token and
pstl. It should be pretty uncontroversial that grouping these headers
into a single module doesn't introduce any cyclic dependency, yet it's
a first step towards reducing the number of top-level modules we have
in our modulemap.
@vgvassilev
Copy link
Contributor

The approach you link to has the same problem that caused us to split the modulemap in the first place. It creates a circular dependency if there's a module in the SDK that includes e.g. stdint.h since that's going to use the libc++ stdint.h, but the libc++ module already depends on the SDK.

Can't we add [no_undeclared_includes] to avoid that cycle as per ed84df0?

ldionne added a commit to ldionne/llvm-project that referenced this pull request Aug 19, 2024
These tests are only enabled on Apple platforms. The purpose of these
tests is to ensure that we don't break existing functionality powered
by Clang modules as we refactor the modulemap, since Swift interop
greatly exercises libc++'s support for Clang modules.

In particular, adding these tests to libc++ is not an official statement
of support from libc++ towards Swift interop (which would require a
community consensus that I am not pursuing at this time), but rather a
way for us to find out when we unexpectedly break things so we can decide
how / whether we want to address such breakage.

Once the refactoring of the modulemap started in llvm#98214 is mostly done,
it would be reasonable to remove these tests unless libc++ wants to
pursue support for Swift interop more officially.
@ldionne
Copy link
Member Author

ldionne commented Aug 30, 2024

I just did some experiments with regrouping modules in the exact same way and I managed to get an absolutely gigantic speedup in our clang modules tests (the .gen.py ones). I got from ~40 seconds to ~3 seconds for running these tests.

So I think this approach is fruitful -- I'll merge this and start opening PRs that work towards what I have locally.

@ldionne ldionne merged commit 57fe53c into llvm:main Aug 30, 2024
54 checks passed
@ldionne ldionne deleted the review/modulemap-cleanup-gradual branch August 30, 2024 20:54
@vgvassilev
Copy link
Contributor

I just did some experiments with regrouping modules in the exact same way and I managed to get an absolutely gigantic speedup in our clang modules tests (the .gen.py ones). I got from ~40 seconds to ~3 seconds for running these tests.

So I think this approach is fruitful -- I'll merge this and start opening PRs that work towards what I have locally.

Hi @ldionne. Do you have stats on how many standalone modules we get before and after this PR?

@ldionne
Copy link
Member Author

ldionne commented Sep 3, 2024

I've just started working my way through the modulemap. For now it only reduces the number of top-level modules by a few since I merged all the private type_traits modules. But eventually I expect that we should have roughly as many top-level modules as we have public headers (on the order of).

@vgvassilev
Copy link
Contributor

I've just started working my way through the modulemap. For now it only reduces the number of top-level modules by a few since I merged all the private type_traits modules. But eventually I expect that we should have roughly as many top-level modules as we have public headers (on the order of).

Thanks for clarifying. Unfortunately that's not a workable solution for us. We would like to have a way to build a single module as per other discussion threads. What would be a way forward to implement that upstream? I still think that we can use cmake to build a std.modulemap file which wraps the content of the default modulemap.

@ldionne
Copy link
Member Author

ldionne commented Sep 3, 2024

Thanks for clarifying. Unfortunately that's not a workable solution for us. We would like to have a way to build a single module as per other discussion threads. What would be a way forward to implement that upstream? I still think that we can use cmake to build a std.modulemap file which wraps the content of the default modulemap.

A single module can never work. For example libc++'s math.h includes the base system's math.h header, but other parts of the system module will include e.g. libc++'s stddef.h. So if libc++'s math.h and libc++'s stddef.h are in the same module, we automatically have a cycle. This doesn't show up unless you have a properly modularized SDK, which might be why this hasn't been a problem for you so far.

While we can't have the C compatibility headers in a single top-level module, I actually don't see why we couldn't have the other libc++ headers in a single top-level module. I think that would be doable.

@ian-twilightcoder
Copy link
Contributor

While we can't have the C compatibility headers in a single top-level module, I actually don't see why we couldn't have the other libc++ headers in a single top-level module. I think that would be doable.

I tried that, and the public c++ headers were too tangled with the c headers to make that practical. But maybe you'll have better luck.

I'm not sure what people are trying to do with modules outside of Swift/Apple environments. There are very few other environments that have any module support at the libc layer so it doesn't really make sense to try to use the libc++ module. I think most people just want pch...

@vgvassilev
Copy link
Contributor

vgvassilev commented Sep 3, 2024

Thanks for clarifying. Unfortunately that's not a workable solution for us. We would like to have a way to build a single module as per other discussion threads. What would be a way forward to implement that upstream? I still think that we can use cmake to build a std.modulemap file which wraps the content of the default modulemap.

A single module can never work. For example libc++'s math.h includes the base system's math.h header, but other parts of the system module will include e.g. libc++'s stddef.h. So if libc++'s math.h and libc++'s stddef.h are in the same module, we automatically have a cycle. This doesn't show up unless you have a properly modularized SDK, which might be why this hasn't been a problem for you so far.

I am stuck understanding this. We do have one of the best modularized framework infrastructure (claim is entirely mine here). I do not see the problem you are describing. The one I think you might refer to is/was solved by [no_undeclared_includes]. I cannot reproduce the setup you are referring to and this never was a problem with the old approach. Can you build a reproducer for me?

While we can't have the C compatibility headers in a single top-level module, I actually don't see why we couldn't have the other libc++ headers in a single top-level module. I think that would be doable.

I am not sure I follow but how single top-level module per header file does not fall into your description for a cycle above?

I'm not sure what people are trying to do with modules outside of Swift/Apple environments. There are very few other environments that have any module support at the libc layer so it doesn't really make sense to try to use the libc++ module. I think most people just want pch...

We do need modules because PCH does not scale beyond 512 Mb due to the offsets in the AST reader.

I'd like to extend that discussion by pinging @ChuanqiXu9, @zygoloid, @Bigcheese and @bcardosolopes. There must be something that I or you guys are missing...

@zygoloid
Copy link
Collaborator

zygoloid commented Sep 3, 2024

A single module can never work. For example libc++'s math.h includes the base system's math.h header, but other parts of the system module will include e.g. libc++'s stddef.h. So if libc++'s math.h and libc++'s stddef.h are in the same module, we automatically have a cycle.

This sounds like a problem in how the modules are set up for the base system. Stuff that libc++ depends on cannot be in the same module as stuff that depends on libc++. That would be broken layering, and splitting up the libc++ module would be working around that, not fixing it.

@zygoloid
Copy link
Collaborator

zygoloid commented Sep 3, 2024

This sounds like a problem in how the modules are set up for the base system. Stuff that libc++ depends on cannot be in the same module as stuff that depends on libc++. That would be broken layering, and splitting up the libc++ module would be working around that, not fixing it.

The libc++ wrapper headers are presumably going to make fixing this hard. Potentially one path forward here might be to set up the base system module so that its references to (for example) "stddef.h" include the "stddef.h" from the base system, not the wrapper header from libc++. I don't recall if there's a direct way to express that in the module map for the base system currently, but that seems like something we could add if not.

[Edit:] Maybe that'd cause problems too. If libc++'s wrapper header is (for example) undefining macros from stddef.h, then it'd be bad for the base system's math.h to include its own stddef.h directly because those macros would then leak, not from stddef.h, but from math.h. Maybe another option: split the base system up into separate top-level modules for parts that are below libc++ and parts that are above libc++, and do not define modules in C++ for the headers that are wrapped by libc++ (allowing the libc++ modules to fully handle those headers) -- or treat those headers as textual headers.

@ian-twilightcoder
Copy link
Contributor

There's just no good way to deal with #include_next in modules other than every such header has to be its own module. Which means all of the cstdlib headers whether they're in the base library, clang, or libc++. And once you've done that, you have to modularize all the way down or you get a plethora of issues from including non-modular headers from modular headers.

Modules aren't a good solution to a size deficiency in PCH, it sounds like that just needs to be fixed.

@ian-twilightcoder
Copy link
Contributor

But also, undef'ing doesn't work in modules. In other words, you can't import and module and then start undef'ing its macros. They're still going to be there for everyone else that imports the original module even if they import your module. At least I'm pretty sure the undef will only have an effect for your module...

@zygoloid
Copy link
Collaborator

zygoloid commented Sep 3, 2024

It's possible that #undefing only works with local submodule visibility enabled, but it does work -- if module A exports a macro, and module B imports A and undefs the macro, then importing both in either order results in the macro not being visible. See https://clang.llvm.org/docs/Modules.html#macros for the details (this is also the model that we merged into C++20 header units).

@ian-twilightcoder
Copy link
Contributor

Hmm, we had to make __undef_macros a textual header to get that to work.

@zygoloid
Copy link
Collaborator

zygoloid commented Sep 3, 2024

There's just no good way to deal with #include_next in modules other than every such header has to be its own module.

For what it's worth, what we do in Google's modules deployment is to have a layer of wrapper headers that sits between the application code and the base system (libc, libc++, etc), and we build a module for that layer of wrapper headers (and nothing below it). That works great for us. So there are other options here :)

Modules aren't a good solution to a size deficiency in PCH, it sounds like that just needs to be fixed.

This isn't just a size issue. Various things in Clang scale linearly in the number of PCM files loaded. The C++ standard library currently has over 120 public headers (including both <cFOO> and <FOO.h> headers) so having separate top-level modules (and therefore separate PCM files) for each is likely to have a significant impact on performance.

Perhaps something that would be interesting to explore would be putting multiple top-level modules in the same PCM file. While splitting a single top-level module across PCM files has some issues, I don't think there's any real problems (other than some hardcoded assumptions and the default behavior of various PCM search mechanisms) with putting multiple modules in the same PCM file.

If the base system's module map could declare that the std module should be built as part of building the base system, and included in the same PCM file, then that could address the issues here, and we could simply allow cycles in that case. (Though for that to really work well you also want to turn on local submodule visibility, which IIRC the Apple base system hasn't done yet.)

@vgvassilev
Copy link
Contributor

There's just no good way to deal with #include_next in modules other than every such header has to be its own module. Which means all of the cstdlib headers whether they're in the base library, clang, or libc++. And once you've done that, you have to modularize all the way down or you get a plethora of issues from including non-modular headers from modular headers.

There is a good way to deal with this: [no_undeclared_includes], however, you classified it as a hack which is subjective.

Modules aren't a good solution to a size deficiency in PCH, it sounds like that just needs to be fixed.

Yes, they are. We cannot fix the size deficiency in the PCH because it is a space constraint and will penalize people with smaller PCH sizes. Modules give us layering pretty much the same reason for why Apple uses it (along with other things). I will keep asking for a case of a cycle which is not fixed by [no_undeclared_includes].

@ian-twilightcoder
Copy link
Contributor

ian-twilightcoder commented Sep 3, 2024

There's just no good way to deal with #include_next in modules other than every such header has to be its own module.

For what it's worth, what we do in Google's modules deployment is to have a layer of wrapper headers that sits between the application code and the base system (libc, libc++, etc), and we build a module for that layer of wrapper headers (and nothing below it). That works great for us. So there are other options here :)

Modules aren't a good solution to a size deficiency in PCH, it sounds like that just needs to be fixed.

This isn't just a size issue. Various things in Clang scale linearly in the number of PCM files loaded. The C++ standard library currently has over 120 public headers (including both <cFOO> and <FOO.h> headers) so having separate top-level modules (and therefore separate PCM files) for each is likely to have a significant impact on performance.

Yes the current setup is pretty crummy and really slow too. It's not only the public headers that have their own top level modules, but all of the implementation headers too. We've explored paring that down and can get it under 200 algorithmically, but Louis is looking at doing it by hand and fixing some unnecessary dependencies along the way.

Perhaps something that would be interesting to explore would be putting multiple top-level modules in the same PCM file. While splitting a single top-level module across PCM files has some issues, I don't think there's any real problems (other than some hardcoded assumptions and the default behavior of various PCM search mechanisms) with putting multiple modules in the same PCM file.

If the base system's module map could declare that the std module should be built as part of building the base system, and included in the same PCM file, then that could address the issues here, and we could simply allow cycles in that case. (Though for that to really work well you also want to turn on local submodule visibility, which IIRC the Apple base system hasn't done yet.)

That is an interesting idea. Yeah Apple doesn't have LSV on because it doesn't support Objective-C (it doesn't see ObjC categories or protocols so pretty much nothing compiles).

@ian-twilightcoder
Copy link
Contributor

Generally speaking [no_undeclared_includes] breaks #include. e.g. it would make an include of <stdatomic.h> from an OS header do the wrong thing in C++ mode and not include <atomic>. There are plenty of other C++ correctness issues like that one, as I recall there are some other libc++ headers that don't include_next. I know we saw a different issue with <stdint.h> but I can't remember the details.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants