Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kernel: Unify and simplify constructor support #74700

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

keith-packard
Copy link
Collaborator

@keith-packard keith-packard commented Jun 21, 2024

There are two styles of constructor support in wide use. They are signified by which ELF section the function addresses are placed in; the '.ctors' style and the '.init_array' style. Each constructor definition places the address of the function into a separate section using one of those two names (optionally appending a priority value).

The linker gathers these together and prepares data for the runtime to execute them in the correct order. Traditionally, the resulting .ctors section starts with the total count of constructors and ends with a terminating NULL while the .init_array section is simply a list of function pointers.

Keeping these separate was required when the linker and runtime were distributed independently; the runtime would look for both styles and execute them in sequence using slightly different code. A typical toolchain would only end up using one of the two styles, resulting in one of the two sections containing no function pointers. Zephyr had adopted this method as a fairly straightforward port of existing practice.

However, as Zephyr delivers both linker scripts and runtime in the same package, we can eliminate the redundancy and merge both .ctors and .init_array output from the compiler into a single list of initialization functions. This saves memory in ROM by eliminating the unnecessary .ctors count/NULL values, and saves code space by eliminating the .ctors handling.

Finally, all of this mechanism also applies to the ARC environment which had used only the .ctors sections but in an .init_array style.

See #74682

@keith-packard
Copy link
Collaborator Author

Ok, I added a check in the linker script which will fail the build if the application wants to use constructors but the option to include the code to call them is disabled. With that, users can "safely" disable constructors, knowing that they won't accidentally end up needing this option.

@keith-packard
Copy link
Collaborator Author

I've recreated this PR as a simplification of the existing constructor implementation; this can wait to be merged after the release.

@keith-packard keith-packard removed the DNM This PR should not be merged (Do Not Merge) label Jul 5, 2024
@aescolar aescolar dismissed their stale review July 6, 2024 14:54

Content changed.

@cfriedt
Copy link
Member

cfriedt commented Jul 6, 2024

This looks good to me. Is it already covered in a different branch

@keith-packard
Copy link
Collaborator Author

This looks good to me. Is it already covered in a different branch

I'm not sure which branch that would be -- #74682 fixed the bugs with this stuff; this patch is just a cleanup making things simpler -- merging .ctors/.init_array into a single form and unifying ARC MDWT support. The result is a bunch of code savings in both source and binary, but the semantics should be equivalent.

@MaureenHelm
Copy link
Member

@keith-packard coming back to this?

@keith-packard
Copy link
Collaborator Author

Thanks for the reminder; I've been busy with picolibc recently and haven't been monitoring my zephyr PRs. I've rebased and pushed; it should 'just work' still; there weren't even any conflicts (!).

@keith-packard
Copy link
Collaborator Author

Ok, this is ready for review now.

Copy link
Member

@aescolar aescolar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @keith-packard
I'm still in favor of the clean up & optimization. But I think the if'ing needs to remain as is (see comments)

@@ -142,13 +142,6 @@ SECTIONS {
#include <snippets-rodata.ld>
#include <zephyr/linker/kobject-rom.ld>

#if defined(CONFIG_CPP) && !defined(CONFIG_STATIC_INIT_GNU) && defined(__MWDT_LINKER_CMD__)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would really like to have the arc guys (@evgeniy-paltsev @ruuddw ) confirm things keep working fine for this toolchain. I'm inclined to set a "dnm" until they do.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, my first intuition is that this should remain for MetaWare - but need to check with our toolchain or other people that know more about this. I expect the _fctors/_ectors are required for the MetaWare runtime library to handle the constructors/destructors. Not sure that can be replaced by other Zephyr initialization code without other consequences/side effects.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, I think that's the best plan -- I think this should work correctly even on MetaWare as we're completely replacing the whole constructor path with Zephyr-specific code, effectively bypassing the MetaWare runtime. But, as you say, we need to have the arc people review this before merging.

Kconfig.zephyr Outdated
@@ -434,6 +434,7 @@ config CODING_GUIDELINE_CHECK
config NATIVE_LIBC
bool
select FULL_LIBC_SUPPORTED
select STATIC_INIT_GNU
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note NATIVE_LIBC is always set for native_posix like targets (NATIVE_APPLICATION) , and sometimes for native_sim targets (NATIVE_LIBRARY).
(in main) STATIC_INIT_GNU defaults to y when building for native_sim targets, and we had disabled it on purpose for native_posix like targets (NATIVE_APPLICATION) due to #75407 (unless CPP=y).

The idea of the later, was to just keep having the constructors initialization be done in the linux executable init step for native_posix (As that works in general, and is the right time for any linux library linked with zephyr.exe, albeit being conceptually too early for possible embedded zephyr library contructors)

Note that with this PR as is, we'd be reintroducing the issue which caused #75407


I'm not sure about what you mentioned in the commit msg. If we are building for native_sim, the glibc init should be run irrespectively in the runner/linux exe init context.
If we are building for native_posix, we'd be disabling this feature by default (and therefore running it also during the linux exe init context). Or?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks -- that's a big help. It sounds like we need a new mode which explicitly constructs a glibc-compatible constructor array, and then only use that when NATIVE_LIBC is enabled.

I'm a bit concerned that we're mixing Zephyr-domain constructors with Linux-domain constructors. I wonder if we should find a way to ensure the Linux ones are always run first?

@@ -1,76 +1,49 @@
/* SPDX-License-Identifier: Apache-2.0 */

#ifdef CONFIG_STATIC_INIT_GNU
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Continuing with the previous comment)
I think we should leave the conditions to choose this this as they are based on CONFIG_STATIC_INIT_GNU (together with the check)

diff --git a/Kconfig.zephyr b/Kconfig.zephyr
index 9f3e937f849..0e5c665b21d 100644
--- a/Kconfig.zephyr
+++ b/Kconfig.zephyr
@@ -434,7 +434,6 @@ config CODING_GUIDELINE_CHECK
 config NATIVE_LIBC
        bool
        select FULL_LIBC_SUPPORTED
-       select STATIC_INIT_GNU
        help
          Zephyr will use the host system C library.
 
diff --git a/include/zephyr/linker/common-rom/common-rom-init.ld b/include/zephyr/linker/common-rom/common-rom-init.ld
index 5523882a109..80587af1d1b 100644
--- a/include/zephyr/linker/common-rom/common-rom-init.ld
+++ b/include/zephyr/linker/common-rom/common-rom-init.ld
@@ -1,6 +1,6 @@
 /* SPDX-License-Identifier: Apache-2.0 */
 
-#ifdef CONFIG_TOOLCHAIN_SUPPORTS_STATIC_INIT_GNU
+#ifdef CONFIG_STATIC_INIT_GNU
        SECTION_PROLOGUE(init_array,,)
        {
                /*
@@ -14,36 +14,19 @@
                        SORT_BY_INIT_PRIORITY(.ctors.*)))
                KEEP (*(.init_array .ctors))
                __zephyr_init_array_end = .;
-#ifdef CONFIG_NATIVE_LIBC
-               /*
-                 * The __CTOR_LIST__ and __CTOR_END__ symbols are always defined
-                 * to result in an empty list. This is necessary to fix an issue
-                 * where the glibc process initialization code on native_posix
-                 * platforms calls constructors before Zephyr loads (issue #39347).
-                */
-               __CTOR_LIST__ = .;
-#ifdef CONFIG_64BIT
-               QUAD(0)
-               QUAD(0)
-#else
-               LONG(0)
-               LONG(0)
-#endif
-               __CTOR_END__ = .;
-#endif
-               /*
-               * Similar to the schenanigans required for the __CTOR_LIST__ and
-               * __CTOR_END__ symbols we define __init_array_start and __init_array_end
-               * to the same address to define an empty list. This prevents the glibc
-               * startup code from calling any global constructors before Zephyr loads.
-               */
-               __init_array_start = .;
-               __init_array_end = .;
        } GROUP_ROM_LINK_IN(RAMABLE_REGION, ROMABLE_REGION)
 
-#ifndef CONFIG_STATIC_INIT_GNU
-       ASSERT(__zephyr_init_array_start == __zephyr_init_array_end,
-              "GNU-style constructors required but STATIC_INIT_GNU not enabled")
-#endif
+#elif defined(CONFIG_TOOLCHAIN_SUPPORTS_STATIC_INIT_GNU) && !defined(CONFIG_NATIVE_APPLICATION)
+       /*
+        * If the code to invoke constructors is not enabled,
+        * make sure there aren't any in the application
+        */
+       SECTION_PROLOGUE(init_array,,)
+       {
+               KEEP(*(SORT_BY_NAME(".ctors*")))
+               KEEP(*(SORT_BY_NAME(".init_array*")))
+       } GROUP_ROM_LINK_IN(RAMABLE_REGION, ROMABLE_REGION)
 
+       ASSERT (SIZEOF(init_array) == 0,
+               "GNU-style constructors required but STATIC_INIT_GNU not enabled")
 #endif

Note I don't think we need anymore the code that creates the empty lists for native_posix
as with the current code I don't think we have anymore those problems (#39347 #36858)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I need to switch from the Zephyr names to native C library names and then always include constructors when NATIVE_LIBC is enabled. That seems pretty simple.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It sounds like we need a new mode which explicitly constructs a glibc-compatible constructor array, and then only use that when NATIVE_LIBC is enabled.

I don't think we need to. Why do you say that?
AFAIR that was the reason why we have had that bug with the double run of the C++ constructors long ago. I think the original direction you had in this PR was fine, except for the condition when it was enabled.

I think I need to switch from the Zephyr names to native C library names and then always include constructors when NATIVE_LIBC is enabled.

If we do that, won't we be reintroducing #75407 or the counter bug?

A bit more background:

When we build for native simulator targets (NATIVE_LIBRARY), the Zephyr/embedded SW side of things goes first thru a partial linking on its own without the host C library. There we create Zephyr's constructor list which we will go thru at Zephyr init.
(Almost) All embedded symbols are then localized so later linker stages will not mistakenly grab them.
In a later pass, the native_simulator runner itself will be linked together with the host C library, and with whatever other host libraries. That will end up with its own separate Linux compliant constructor list which will be executed during the zephyr.exe init, and which is not a concern for Zephyr. i.e. We end up with one constructor list for the host, executed in the runner initialization context; and another for Zephyr, executed by Zephyr in Zephyr context.

For native_posix like targets (NATIVE_APPLICATION), we do the linking of Zephyr, with the "runner", with the host C library and any other host libraries, in one step. Here the Zephyr linker script expands the host linker script (so a Linux compliant constructor list will be built by the host linker script by default)
In this case, because we have a single link stage, and the linker sees all constructors at the same time, both of host libraries and embedded ones, we have 2 options
a) Do not create our own Zephyr constructor list, and do not have Zephyr run them (just let the host linker script build it and the executable initialization run thru it), or
b) Create a constructor list which Zephyr runs, and ensure the one for the host does not end up with a list of duplicates.
Neither is specially good, as in some cases you need to run the constructors at the executable init because a host library needs it, and in other cases you may need to run it at Zephyr init, because the constructor is from the embedded code and requires enough of Zephyr to be up.
Today in main, by default, we do a) unless CONFIG_CPP=y. So we ensure we don't break host C libraries constructors (so we don't have, say, this bug #75407). For CONFIG_CPP we kept b) just because it is what it was done in that case. In most cases, it will be fine to do a) even if one has embedded C++ constructors, and if not the user can set CONFIG_STATIC_INIT_GNU as they need.

That all being said as native_posix and NATIVE_APPLICATION is being deprecated, I think we can just keep that behavior for NATIVE_APPLICATION by now.

And for NATIVE_LIBRARY we can just behave exactly like for embedded targets. No need to treat it differently even if the embedded code will be eventually linked with the host C library.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, so the only place we need to worry about mixing native and Zephyr initializers is with NATIVE_APPLICATION? And that's because there's only a single linker stage? I assume Zephyr is still generating the linker script for that? If so, will that script need to generate a native-compatible initializer list? Which format should it use? Presumably "both", using the old style (count + inits + null) for .ctors and the new style (inits) for .init_array?

If all of my assumptions are correct, then the current patch is nearly correct; it just needs to have the selection between Zephyr and Native intializers corrected so that Native initializers are used with NATIVE_APPLICATION and otherwise we always get Zephyr initializers?

In any case, we'll still need help from someone who understands MetaWare to make sure we can use the same initializers as all other toolchains.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, so the only place we need to worry about mixing native and Zephyr initializers is with NATIVE_APPLICATION? And that's because there's only a single linker stage?

yep & yep

I assume Zephyr is still generating the linker script for that?

Partly. In this case Zephyr's linker script is an extension to the default host linker script (see below).

If so, will that script need to generate a native-compatible initializer list?

Not necessarily. If for that case we do nothing with .ctors & .init_array's, the host linker script will still generate a fully correct list for the host executable init.
In this NATIVE_APPLICATION case we only need to generate this constructors list if we want to have "our own" to be run in Zephyr's init, wrapped in our own __zephyr_init_array_start/end, instead of the host one.
When we generate our own .ctors list, the host one generated by the host linker script will be empty (to be precise, it will not have whatever we have already picked into our list).

Meaning, for NATIVE_APPLICATION, if we do nothing for .ctors in the Zephyr linked script, and do not build in the constructor initialization, the host linker script will still build a correct list, and during the executable init it will be run thru.


Just for reference, this is the relevant part of the host linker script in my case:

.preinit_array    :
  {
    PROVIDE_HIDDEN (__preinit_array_start = .);
    KEEP (*(.preinit_array))
    PROVIDE_HIDDEN (__preinit_array_end = .);
  }
  .init_array    :
  {
    PROVIDE_HIDDEN (__init_array_start = .);
    KEEP (*(SORT_BY_INIT_PRIORITY(.init_array.*) SORT_BY_INIT_PRIORITY(.ctors.*)))
    KEEP (*(.init_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .ctors))
    PROVIDE_HIDDEN (__init_array_end = .);
  }
  .fini_array    :
  {
    PROVIDE_HIDDEN (__fini_array_start = .);
    KEEP (*(SORT_BY_INIT_PRIORITY(.fini_array.*) SORT_BY_INIT_PRIORITY(.dtors.*)))
    KEEP (*(.fini_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .dtors))
    PROVIDE_HIDDEN (__fini_array_end = .);
  }
  .ctors          :
  {
    /* gcc uses crtbegin.o to find the start of
       the constructors, so we make sure it is
       first.  Because this is a wildcard, it
       doesn't matter if the user does not
       actually link against crtbegin.o; the
       linker won't look for a file to match a
       wildcard.  The wildcard also means that it
       doesn't matter which directory crtbegin.o
       is in.  */
    KEEP (*crtbegin.o(.ctors))
    KEEP (*crtbegin?.o(.ctors))
    /* We don't want to include the .ctor section from
       the crtend.o file until after the sorted ctors.
       The .ctor section from the crtend file contains the
       end of ctors marker and it must be last */
    KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .ctors))
    KEEP (*(SORT(.ctors.*)))
    KEEP (*(.ctors))
  }
  .dtors          :
  {
    KEEP (*crtbegin.o(.dtors))
    KEEP (*crtbegin?.o(.dtors))
    KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .dtors))
    KEEP (*(SORT(.dtors.*)))
    KEEP (*(.dtors))
  }

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In any case, we'll still need help from someone who understands MetaWare to make sure we can use the same initializers as all other toolchains.

@evgeniy-paltsev is doing some tests to see if the PR works or not. But even if it works 'today', it may break later if MetaWare changes internal structures and/or C/C++ runtime libs.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's unclear how MetaWare could even break things -- none of the Zephyr initializer code is something the toolchain has influence in, except for the ELF sections assigned to the initializer addresses (currently either .init_array or .ctors).

In any case, the tests included in this PR should catch any toolchain changes in how initializers are handled by the compiler, so if they work today (and the tests pass), we can be reasonably confident that future changes will not pass unnoticed and Zephyr can adapt as needed. Which isn't really any different from handling toolchain changes anywhere else in the system.

@keith-packard
Copy link
Collaborator Author

Ok, I re-jiggered the constructor selection so that the native C library will run all of them when CONFIG_NATIVE_APPLICATION is selected. Otherwise, Zephyr runs them all. I think that's right? You mentioned something about CPP being relevant, but I couldn't find where the old code had that condition; maybe it's elsewhere in Kconfig?

@aescolar
Copy link
Member

As mentioned here #74700 (comment), setting DNM until @ruuddw or @evgeniy-paltsev confirm it works with the arc MWDT. Either of you feel free to remove it when you do. Or somebody else if you strongly disagree about waiting for that.

@aescolar aescolar added the DNM This PR should not be merged (Do Not Merge) label Aug 25, 2024
@keith-packard
Copy link
Collaborator Author

As mentioned here #74700 (comment), setting DNM until @ruuddw or @evgeniy-paltsev confirm it works with the arc MWDT. Either of you feel free to remove it when you do. Or somebody else if you strongly disagree about waiting for that.

I definitely agree with this plan -- if we're going to clean up this code, we should clean it up everywhere instead of leaving one architecture behind.

Handle both of these sections in a single chunk of code instead of
separately. We don't need to use the legacy .ctors ABI as both
the constructors array and startup logic are managed within a single
link result.

This can now also be used with ARC MWDT which had been using the .ctors
sections but with .init_array semantics. For ARC MWDT, we now always
discard .dtors and .fini sections as Zephyr will never cause global
destructors to execute. Stop discarding .eh_frame sections so that
exception handling works as expected.

When building a NATIVE_APPLICATION, we ask the native C library to run all
of the constructors to ensure any non-Zephyr constructors are run before
main is invoked. It might be "nice" to split the constructors so that the
Zephyr constructors were executed by the Zephyr code while the non-Zephyr
ones were executed by the native C library. I think that could be done if
we knew the pathnames of either the Zephyr or non-Zephyr files. That might
make a good future enhancement.

Signed-off-by: Keith Packard <[email protected]>
Check the order of constructors, making sure those with a specified
priority are called in numeric order after which those without priority are
called.

Signed-off-by: Keith Packard <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.