Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix ClassCastException in LMAX Disruptor 3 initialization #2768

Merged
merged 5 commits into from
Jul 26, 2024

Conversation

jackshirazi
Copy link
Contributor

During startup with an agent, the resulting object can be in the wrong classloader leading to a ClassCastException. Catching that falls back to the old behaviour which works fine

The new behaviour was introduced in 2.23.0 with #2112

During startup with an agent, the resulting object can be in the wrong classloader leading to a ClassCastException. Catching that falls back to the old behaviour which works fine

The new behaviour was introduced in 2.23.0 with apache#2112
Copy link
Contributor

@ppkarwasz ppkarwasz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jackshirazi,

Thank You for bringing this problem to our attention.

Instead of ignoring the CCE, I would rather solve its cause.

While LoaderUtil might be useful to load classes specified by the user, in this case its usage is not appropriate: we exactly know were the class is and we don't need to look in the TCCL.

Can you replace the LoaderUtil invocation with a simple Class.forName(...).getConstructor()... chain of methods?

A test case would be useful, although in this case we can probably just add a comment not to use LoaderUtil with a link to this issue.

@jackshirazi
Copy link
Contributor Author

jackshirazi commented Jul 25, 2024

We don't use LoaderUtil. The linked issue is this. The full sequence is pretty convoluted

  1. Elastic agent opens a url connection, java.net.URL.openConnection which loads sun.net.www.protocol.http.HttpURLConnection
  2. HttpURLConnection inits it's logger, a jul logger
  3. The command line set -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager so apache logger is used
  4. Because this was loaded from the elastic agent, the current classloader is co.elastic.apm.agent.premain.ShadedClassLoader
  5. ShadedClassLoader loads the log4j classes up to org.apache.logging.log4j.core.async.RingBufferLogEventHandler
  6. The log4j2 class AsyncLoggerDisruptor casts the RingBufferLogEventHandler instance as a EventHandler, but EventHandler was loaded by log4j2 in the app classloader
  7. hence the CCE

Because this is a class loading race condition with the JVM classes using the log4j2 logger before the JVM is initialized (agents kick in at premain), this is complex to solve outside of log4j2, whereas this patch is pretty simple

@ppkarwasz
Copy link
Contributor

Did you try replacing LoaderUtil.newInstanceOf with:

return (EventHandler<RingBufferLogEvent>)
        Class.forName("org.apache.logging.log4j.core.async.RingBufferLogEventHandler")
                .getConstructor()
                .newInstance();

This version should also be more GraalVM-friendly and shouldn't cause a CCE.

@jackshirazi
Copy link
Contributor Author

Again, we're not using LoaderUtil. This is also not in Graal (Graal doesn't yet support agents, and when it does it won't have this kind of issue because it flattens classloading)

Copy link

github-actions bot commented Jul 25, 2024

Job Requested goals Build Tool Version Build Outcome Build Scan®
build-macos-latest clean install 3.9.8 Build Scan PUBLISHED
build-ubuntu-latest clean install 3.9.8 Build Scan PUBLISHED
build-windows-latest clean install 3.9.8 Build Scan PUBLISHED
Generated by gradle/develocity-actions

@ppkarwasz
Copy link
Contributor

Again, we're not using LoaderUtil. This is also not in Graal (Graal doesn't yet support agents, and when it does it won't have this kind of issue because it flattens classloading)

Sorry, I am not referring to the code of your application, but these lines in Log4j Core:

try {
return LoaderUtil.newInstanceOf("org.apache.logging.log4j.core.async.RingBufferLogEventHandler");
} catch (final ReflectiveOperationException | LinkageError e) {
LOGGER.warn("Failed to create event handler for LMAX Disruptor 3.x, trying version 4.x.", e);
}

I think that line 57 constitutes the problem, since it might load the RingBufferLogEventHandler class from a different classloader than the one that loaded AsyncLoggerDisruptor.

@jackshirazi
Copy link
Contributor Author

Thanks, I'll test that

@jackshirazi
Copy link
Contributor Author

that works, sorry took me a while to get what you were suggesting

@vy
Copy link
Member

vy commented Jul 26, 2024

@ppkarwasz, I cannot think of an easy way to test this. The rest LGTM. How shall we proceed?

@vy vy changed the title Add ClassCastException to lmax-4 failure Fix ClassCastException in LMAX Disruptor 3 initialization Jul 26, 2024
@ppkarwasz
Copy link
Contributor

@ppkarwasz, I cannot think of an easy way to test this. The rest LGTM. How shall we proceed?

We should at least add a code comment "Don't use LoaderUtil", otherwise we risk reverting the change. 😉

@vy vy merged commit 0896015 into apache:2.x Jul 26, 2024
9 checks passed
@ppkarwasz ppkarwasz added this to the 2.24.0 milestone Aug 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants