Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LambdaLoadTest hang? in Runtime.gc() #20559

Open
pshipton opened this issue Nov 11, 2024 · 4 comments
Open

LambdaLoadTest hang? in Runtime.gc() #20559

pshipton opened this issue Nov 11, 2024 · 4 comments

Comments

@pshipton
Copy link
Member

https://openj9-jenkins.osuosl.org/job/Test_openjdk8_j9_special.system_ppc64_aix_Personal_testList_2/134 - p8-java1-ibm07
LambdaLoadTest_special_J9_5m_8 -Xgcpolicy:gencon -Xshareclasses -Xjit -Xnocompressedrefs

https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Test_openjdk8_j9_special.system_ppc64_aix_Personal_testList_2/134/system_test_output.tar.gz
**FAILED** Process LT has timed out

There are multiple load-* threads waiting in Runtime.gc().

    Java stack frames:
     bp: 0x00000100266d4ed0  method: void java/lang/Runtime.gc()  (Native Method)
      objects: 0xa00000000035850
     bp: 0x00000100266d4f98  method: void java/lang/System.gc()  source: System.java:394
      objects: <no objects in this frame>
     bp: 0x00000100266d5000  method: void net/adoptopenjdk/test/lambda/TestLambdaCapture.testCaptureReturnedLambda()  source: TestLambdaCapture.java:135
      objects: 0xa000001d81d38d0 0xa000000004268d8 0xa000001d81d3858 0xa000001d81d37a8
@pshipton
Copy link
Member Author

@dmitripivkine

Copy link

Issue Number: 20559
Status: Open
Recommended Components: comp:gc, comp:vm, comp:test
Recommended Assignees: pshipton, jasonfengj9, llxia

@dmitripivkine
Copy link
Contributor

dmitripivkine commented Nov 11, 2024

This thread got request to execute SystemGC and trying to acquire Exclusive Access:

!stack 0x1002505b100	!j9vmthread 0x1002505b100	!j9thread 0x1002504b888	tid 0x980291 (9962129) // (load-4)
!j9vmthread 0x1002505b100 publicFlags=1020 privateFlags=0 inNative=0 // load-4

16:10:21.075646309 *0x1002505B100 j9vm.368             Event       Posted Halt request to 1 threads.
16:10:21.075647719  0x1002505B100 j9vm.369             Event       Waiting for 1 thread halt responses.
17:15:05.042548194 *0x100224E2A00 j9vm.243             Exit       <J9SignalAsyncEvent returning 0
17:15:05.042548489  0x100224E2A00 j9vm.242             Entry      >J9SignalAsyncEvent targetThread=0x10025099a00 handlerKey=2
17:15:05.042548731  0x100224E2A00 j9vm.243             Exit       <J9SignalAsyncEvent returning 0

However there is another thread has set Halt flag (0x01) but still keeping VM Access (0x20) and prevents taking Exclusive:

	!stack 0x10025099a00	!j9vmthread 0x10025099a00	!j9thread 0x100250898c8	tid 0x32402cf (52691663) // (load-10)
    !j9vmthread 0x10025099a00 publicFlags=21 privateFlags=0 inNative=0 // load-10

Looking to this threads is dbx:

$t47    run                          running  52691663   u   no   sys
(dbx) thread current 47
(dbx) where
---------
 $r0:0x090000000481148c  $stkp:0x00000100259009b0   $toc:0x08001000a01ca050
 $r3:0x0000010025099a00    $r4:0x000001001218ffcc    $r5:0x00000100121e2ea8
 $r6:0x0000000000000000    $r7:0x0000000000000000    $r8:0x0a000001d90a2778
 $r9:0x00000000004d2a10   $r10:0x0900000004461ca4   $r11:0x0000010022061aa8
$r12:0x000001001218ffbc   $r13:0x000001002590a800   $r14:0x0000010025aa87b8
$r15:0x0000010025099a00   $r16:0x0000010021fcb658   $r17:0xffffffffffffffff
$r18:0x0000010025900bf0   $r19:0x420000842509b678   $r20:0x09000000056b62bc
$r21:0x00000100224113e0   $r22:0x0000010025900c00   $r23:0x0000000000000001
$r24:0x09000000056c0168   $r25:0x09001000a0760b90   $r26:0x0a000000004d2a10
$r27:0x0a000001d90a2748   $r28:0x0a000001d90a2748   $r29:0x0a000000004d2a10
$r30:0x0a0000000026e860   $r31:0x0a000001d90a2778
$iar:0x090000000448d100   $msr:0x800000000000d032    $cr:0x0000000044000280
$link:0x0900000004461d60   $ctr:0x0900000004461ca4   $xer:0x0000000020000010
         Condition status = 0:g 1:g 5:e 6:l
       [unset $noflregs to view floating point registers]
       [unset $novregs to view vector registers]
       [unset $novsregs to view vector scalar registers]

0 old_slow_jitRetranslateMethod(0x10025099a00) at 0x90000000448d100

So, looks like thread in old_slow_jitRetranslateMethod() did not drop VM Access and caused the hang.
@hzongaro FYI

@gacholio
Copy link
Contributor

I have not looked at the code, but the usual cause of issues like this is a monitor - specifically that VM access should be treated as a monitor enter for ordering purposes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants