Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Boot delay of 60s caused by Magisk #7511

Open
danielm234 opened this issue Nov 14, 2023 · 29 comments
Open

Boot delay of 60s caused by Magisk #7511

danielm234 opened this issue Nov 14, 2023 · 29 comments
Labels
needs investigation Reason of this issue is yet unkown

Comments

@danielm234
Copy link

danielm234 commented Nov 14, 2023

[ 70.503041]I[6: swapper/6: 0] sd 0:0:0:3: [sdd] tag#2 timing out command, waited 60s
[ 70.503083]I[6: swapper/6: 0] sd 0:0:0:3: [sdd] tag#2 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=60s
[ 70.503098]I[6: swapper/6: 0] sd 0:0:0:3: [sdd] tag#2 Sense Key : 0x7 [current]
[ 70.503110]I[6: swapper/6: 0] sd 0:0:0:3: [sdd] tag#2 ASC=0x27 ASCQ=0x1
[ 70.503124]I[6: swapper/6: 0] sd 0:0:0:3: [sdd] tag#2 CDB: opcode=0x2a 2a 00 00 00 00 28 00 00 01 00
[ 70.503138]I[6: swapper/6: 0] blk_update_request: I/O error, dev sdd, sector 320 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
[ 70.503153]I[6: swapper/6: 0] Buffer I/O error on dev sdd1, logical block 34, lost async page write
[ 70.503395] [6: init: 1] JBD2: recovery failed
[ 70.503420] [6: init: 1] EXT4-fs (sdd1): error loading journal

last_kmsg_without_magisk.log
last_kmsg_magisk_patched.log

Device: a53x
Android version: Android 13
Magisk version name: Magisk v26.4
Magisk version code: 26401 (568c5187)

Update, new logs with the latest magisk debug version:
Device: a53x
Android version: Android 13
Magisk version name: Magisk debug latest
Magisk version code: 26401 (568c5187)

magisk_log_2023-11-15T20.15.34.log
proc_mounts.log
dmesg.log
last_kmsg.log
logcat_a53_android_13_latest_debug_magisk.log

@mark7717
Copy link

Good catch! I was experiencing the same issue, but couldn't find what it was.
Hope it gets fixed soon.

@yashikada
Copy link

Please try the latest build and see whether it'll solve your problem or not. https://github.com/topjohnwu/Magisk/actions/workflows/build.yml

it test already the latest action. Are you a bot? You write always the same message.

@Unknown78

This comment was marked as off-topic.

@mir4452
Copy link

mir4452 commented Nov 15, 2023

I can confirm same issue.
I've tested it on Android 13, Magisk v24.3, A53 device.

@danielm234
Copy link
Author

danielm234 commented Nov 15, 2023

It seems related to Android 13, as I've not been able to reproduce the issue on Android 12.
Same device

@yujincheng08
Copy link
Collaborator

yujincheng08 commented Nov 15, 2023

[   70.505086] [6:           init:    1] init: Service 'exec 18 (/system/bin/recovery-persist)' (pid 738) exited with status 0 oneshot service took 60.033001 seconds in background
[   70.505104] [6:           init:    1] init: Sending signal 9 to service 'exec 18 (/system/bin/recovery-persist)' (pid 738) process group...
[   70.505284] [6:           init:    1] libprocessgroup: Successfully killed process cgroup uid 1000 pid 738 in 0ms
[   70.505500] [6:           init:    1] init: Service 'exec 19 (/system/bin/sh -c profman \
[   70.505508] [6:           init:    1] init:         --generate-test-profile=/data/misc/speg/test.prof --generate-test-profile-num-dex=0 && \
[   70.505513] [6:           init:    1] init:         test `head -c 8 /data/misc/speg/test.prof | tr -d '\0'` == `/apex/com.samsung.android.spqr/bin/spqr --profile-version || \
[   70.505519] [6:           init:    1] init:         echo bad` || setprop com.samsung.speg.disable true; rm /data/misc/speg/test.prof)' (pid 739) exited with status 0 oneshot service took 60.030998 seconds in background
[   70.505530] [6:           init:    1] init: Sending signal 9 to service 'exec 19 (/system/bin/sh -c profman \
[   70.505535] [6:           init:    1] init:         --generate-test-profile=/data/misc/speg/test.prof --generate-test-profile-num-dex=0 && \
[   70.505540] [6:           init:    1] init:         test `head -c 8 /data/misc/speg/test.prof | tr -d '\0'` == `/apex/com.samsung.android.spqr/bin/spqr --profile-version || \
[   70.505545] [6:           init:    1] init:         echo bad` || setprop com.samsung.speg.disable true; rm /data/misc/speg/test.prof)' (pid 739) process group...

@yujincheng08
Copy link
Collaborator

  1. You are not using debug Magisk; I cannot see any debug log in your kmsg.
  2. Are you sure it won't block for 60s without Magisk? Looks like it's caused by a data corruption, not by Magisk.

@TotallyAvailable

This comment was marked as off-topic.

@Unknown78

This comment was marked as off-topic.

@danielm234
Copy link
Author

danielm234 commented Nov 15, 2023

@Unknown78 , @topjohnwu here are all the logs:
Device: a53x
Latest debug magisk version for kernel and app

last_kmsg.log
logcat_a53_android_13_latest_debug_magisk.log
magisk_log_2023-11-15T20.15.34.log
proc_mounts.log
dmesg.log

@danielm234
Copy link
Author

danielm234 commented Nov 15, 2023

  1. You are not using debug Magisk; I cannot see any debug log in your kmsg.
  2. Are you sure it won't block for 60s without Magisk? Looks like it's caused by a data corruption, not by Magisk.
  1. Sent new logs
  2. @yujincheng08 , yes, I'm sure 100%, I was flashing a boot image without magisk and it works fine, as soon as I flash magisk patched boot, it produces 60 sec delay, and again when I flash non-patched boot image, the delay dissappears.

If needed, we can arrange a TeamViewer session, so I can show you.

@TotallyAvailable

This comment was marked as off-topic.

@danielm234
Copy link
Author

Unfortunately, my issue was closed two times before this; I was quite close to giving up.
But Magisk is such an amazing tool, and I appreciate all your hard work.

@mir4452
Copy link

mir4452 commented Nov 15, 2023

Given that I'm reading v24.3 being affected as well, you might not even need detailed logs (besides a case specific work around?), that might just be spot on depending on when it started ? Might be interesting IF that's the case.

Just tested it on Magisk v24.0 / Android 13, delay of 60 seconds still reproduces.
But on Android 12 there is no issue, it happens only on Android 13

If someone can provide the Magisk v24 debug apk, I can provide correct logs

@TotallyAvailable

This comment was marked as off-topic.

@Unknown78

This comment was marked as off-topic.

@TotallyAvailable

This comment was marked as off-topic.

@danielm234
Copy link
Author

danielm234 commented Nov 16, 2023

Update @topjohnwu @yujincheng08 @TotallyAvailable @Unknown78

[ 70.257910] [6: init: 1] init: Command 'mount ext4 /dev/block/by-name/cpefs /mnt/vendor/cpefs nosuid nodev noatime noauto_da_alloc,discard,journal_checksum,data=ordered,errors=panic' action=post-fs-data (/vendor/etc/init/init.baseband.rc:13) took 60007ms and failed: mount() failed: I/O error

adb shell
su
mount -o rw,remount /vendor
rm /vendor/etc/init/init.baseband.rc

I've removed the above file, and there are no more delays on the boot.
Of course, this is not a fix, but at least it points to a possible problem.

init.baseband.rc is exactly same file on Android 12 and Android 13, but on android 13 with Magisk it causes 60 sec delay
init.baseband.rc.log

@danielm234
Copy link
Author

danielm234 commented Nov 16, 2023

Since in /vendor/etc/fstab.s5e8825 there is already a following line

/dev/block/by-name/cpefs /mnt/vendor/cpefs ext4 noatime,nosuid,nodev,noauto_da_alloc,discard,journal_checksum,data=ordered,errors=panic wait,check,nofail

Is it really necessary to have a similar line in init.baseband.rc (because the bottom line makes a 60s delay with magisk)

mkdir /mnt/vendor/cpefs 0771 radio system
mount ext4 /dev/block/by-name/cpefs /mnt/vendor/cpefs nosuid nodev noatime noauto_da_alloc,discard,journal_checksum,data=ordered,errors=panic

@TotallyAvailable

This comment was marked as off-topic.

@danielm234
Copy link
Author

danielm234 commented Nov 16, 2023

Device wasn't used before, its stock android without any changes,and with/without magisk patch

Output when trying to execute this command:
mount -t ext4 /dev/block/by-name/cpefs /mnt/vendor/cpefs -o nosuid,nodev,noatime,noauto_da_alloc,discard,journal_checksum,data=ordered,errors=panic

'/dev/block/by-name/cpefs' is read-only
60seconds delay happens
mount: '/dev/block/by-name/cpefs'->'/mnt/vendor/cpefs': I/O error

If I flash the boot image without Magisk, this error won't happen.

Update:
On android 12 /dev/block/by-name/cpefs is RW
on android 13 /dev/block/by-name/cpefs is RO

@TotallyAvailable
Copy link

TotallyAvailable commented Nov 16, 2023

If it wasn't for the fact that it appears on v24.3 as well, the narrowing down would've been somewhat easier. The Device isn't brand new either.
A long known (in Custom Recovery Circles) Samsung related issue popping into existence here.
Android 13 has apparently been out for the A53 line for a while as well, so

Output when trying to execute this command: mount -t ext4 /dev/block/by-name/cpefs /mnt/vendor/cpefs -o nosuid,nodev,noatime,noauto_da_alloc,discard,journal_checksum,data=ordered,errors=panic

Just a thought - what if you add " wait,check,nofail" to the end of that command like in the fstab entry, assuming that one of the 2 has to actually work.

Maybe even add it to the baseband.rc entry instead of deleting that line completely and reboot...what could possibly go wrong here.

And the RO only seems given: took 60007ms and failed: mount()
What changes, that shouldn't be there. And been in discussions since A8.
(I'm far beyond my expertise anyway but sometimes you'll fall over the answer, walking through the dark.)

You know the feeling missing the most obvious?
"Old" Magisk Version
"Old" Device
"Old" OS
Actual old issue
Why now
It's not going to be the November Security Patch is it ¯\_(ツ)_/¯

(Totally not this Issue popping up in Google while searching for "jbd2 recovery failed", I'm sure I'll help myself with that one)
Actual interesting read while waiting: https://android.googlesource.com/kernel/msm/+/android-msm-3.9-usb-and-mmc-hacks/fs/jbd2/journal.c
Quite old but with those lovely comments as guidance you'll make your way

Anyway probably time to just wait for someone with the required knowledge.

@danielm234
Copy link
Author

danielm234 commented Nov 16, 2023

Just a thought - what if you add " wait,check,nofail" to the end of that command like in the fstab entry, assuming that one of the 2 has to actually work.

Maybe even add it to the baseband.rc entry instead of deleting that line completely and reboot...what could possibly go wrong here.

This actually removes 60s delay, but the cpefs mount still doesn't work, and it is not accessible in /mnt/vendor/cpefs

It's not going to be the November Security Patch is it ¯_(ツ)_/¯

Just tried the older Android 13 version, same behaviour, so its not latest security patch

@TotallyAvailable
Copy link

TotallyAvailable commented Nov 17, 2023

While progress, there's obviously more going on. The wrong recent Security Patch guess doesn't help either.
While still left with the "Why now" question, my lack of insight into the Samsung specific rooting stuff certainly won't carry this discussion any further I feel.
While the errors in the logs are clearly visible, looking at the stuff my fully functional device spits out...pretty ugly as well.

I still want that one more step though, just piecing together what was already mentioned.
Ignoring the "same file in A12&13", going 1 step back

mkdir /mnt/vendor/cpefs 0771 radio system

(This bit right here.
As A13 returns -> read only, 60s stuck until timeout, returning mount() fail...
Can't confirm a successful mount if the expected writing check to it fails ?)

Still doesn't answer the initial question.
And while I highly doubt it looking at a certain init.rc file and first_stage_init/mount (never look at the most obvious stuff first), that whole /fs_mgr/ directory sure got me (personally) a bit deeper into it, driven by curiosity and the earlier mentioned "falling over the answer...", so maybe next time. (As actual final edit on this one - it's not even the lack of familiarity and actual understanding with the codebases involved, there are plenty of "trigger points", without a similar enough device on hand though, massive skill point decrease. Access at the most out of the blue moment when it hit me ? Instant solve, maybe...if that's your way of making it through life as well, you'll know the feeling. Also as off topic as it will get. Yes I know how discussions here usually progress towards a solved Issue.)

// If relative_timeout is std::chrono::milliseconds::max(), then the wait will block indefinitely.

Can apparently always get worse than "just" 60s
And we haven't even touched the whole "/apex/com.samsung.android.spqr/bin/spqr --profile-version" & calling for rm on /data/misc/speg/test.pro, hints.

And with no one else jumping in on possible causes or solutions they found... that's not really something I'd like to do either, knowing how much it sucks to see stuff potentially end without a (quick) solution.
Given the right Version to avoid bot closure + debug logs in the first post, there's obviously hope.

@danielm234
Copy link
Author

danielm234 commented Nov 17, 2023

mkdir /mnt/vendor/cpefs 0771 radio system
mount ext4 /dev/block/by-name/cpefs /mnt/vendor/cpefs nosuid nodev noatime noauto_da_alloc,discard,journal_checksum,data=ordered,errors=panic

Should this 'cpefs' mount be visible in /proc/mounts?
I cannot find it in /proc/mounts with a patched boot image, and not even with a non-patched boot image.

This could mean that the following line can be easily removed from init.baseband.rc ?

mount ext4 /dev/block/by-name/cpefs /mnt/vendor/cpefs nosuid nodev noatime noauto_da_alloc,discard,journal_checksum,data=ordered,errors=panic

cpefs doesn't ever get mounted on a stock OS, with stock boot. With magisk patched boot it makes the boot delay of 60s, it seems that it can deleted from init.baseband.rc to avoid 60s delay.

@TotallyAvailable
Copy link

So
It works on A12
Given that the file has apparently been the same since atleast A12
A13 and the device have been around for a while
Known OEM and environment
The involved Magisks versions not brand new either
Adding those arguments to the end of the command "works"

Where are all the other users with similar problems ?
Does the device actually work, including all that radio related stuff ?
Why the 60s delay and only once ?

There's the whole thought about "stuff no longer available when rooted" playing into it of course, given that you come across vbmeta, dm-verity and selinux pretty early on.
Which parts of the command might be removed to not end with a recovery or journal error, does it even matter... probably because someone left it there...

@Skizzels
Copy link

[ 70.503041]I[6: swapper/6: 0] sd 0:0:0:3: [sdd] tag#2 timing out command, waited 60s [ 70.503083]I[6: swapper/6: 0] sd 0:0:0:3: [sdd] tag#2 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=60s [ 70.503098]I[6: swapper/6: 0] sd 0:0:0:3: [sdd] tag#2 Sense Key : 0x7 [current] [ 70.503110]I[6: swapper/6: 0] sd 0:0:0:3: [sdd] tag#2 ASC=0x27 ASCQ=0x1 [ 70.503124]I[6: swapper/6: 0] sd 0:0:0:3: [sdd] tag#2 CDB: opcode=0x2a 2a 00 00 00 00 28 00 00 01 00 [ 70.503138]I[6: swapper/6: 0] blk_update_request: I/O error, dev sdd, sector 320 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0 [ 70.503153]I[6: swapper/6: 0] Buffer I/O error on dev sdd1, logical block 34, lost async page write [ 70.503395] [6: init: 1] JBD2: recovery failed [ 70.503420] [6: init: 1] EXT4-fs (sdd1): error loading journal

last_kmsg_without_magisk.log last_kmsg_magisk_patched.log

Device: a53x Android version: Android 13 Magisk version name: Magisk v26.4 Magisk version code: 26401 (568c5187)

Update, new logs with the latest magisk debug version: Device: a53x Android version: Android 13 Magisk version name: Magisk debug latest Magisk version code: 26401 (568c5187)

magisk_log_2023-11-15T20.15.34.log proc_mounts.log dmesg.log last_kmsg.log logcat_a53_android_13_latest_debug_magisk.log

Yes

@osm0sis osm0sis added the needs investigation Reason of this issue is yet unkown label Apr 30, 2024
Repository owner deleted a comment from tylvn-yizhe Apr 30, 2024
@BigKananga
Copy link

Any news about that?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs investigation Reason of this issue is yet unkown
Projects
None yet
Development

No branches or pull requests

10 participants