Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

i915/pmu: Wire GuC backend to per-client busyness #5

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

XuBing0
Copy link

@XuBing0 XuBing0 commented Apr 3, 2023

GuC provides engine_id and last_switch_in ticks for an active context in the pphwsp. The context image provides a 32 bit total ticks which is the accumulated by the context (a.k.a. context[CTX_TIMESTAMP]). This information is used to calculate the context busyness as follows:

If the engine_id is valid, then busyness is the sum of accumulated total ticks and active ticks. Active ticks is calculated with current gt time as reference.

If engine_id is invalid, busyness is equal to accumulated total ticks.

Since KMD (CPU) retrieves busyness data from 2 sources - GPU and GuC, a potential race was highlighted in an earlier review that can lead to double accounting of busyness. While the solution to this is a wip, busyness is still usable for platforms running GuC submission.

GuC provides engine_id and last_switch_in ticks for an active context in the
pphwsp. The context image provides a 32 bit total ticks which is the accumulated
by the context (a.k.a. context[CTX_TIMESTAMP]). This information is used to
calculate the context busyness as follows:

If the engine_id is valid, then busyness is the sum of accumulated total ticks
and active ticks. Active ticks is calculated with current gt time as reference.

If engine_id is invalid, busyness is equal to accumulated total ticks.

Since KMD (CPU) retrieves busyness data from 2 sources - GPU and GuC, a
potential race was highlighted in an earlier review that can lead to double
accounting of busyness. While the solution to this is a wip, busyness is still
usable for platforms running GuC submission.

Signed-off-by: John Harrison <[email protected]>
Signed-off-by: Umesh Nerlige Ramappa <[email protected]>
sysopenci pushed a commit that referenced this pull request May 19, 2023
commit 60eed1e3d45045623e46944ebc7c42c30a4350f0 upstream.

code path:

ocfs2_ioctl_move_extents
 ocfs2_move_extents
  ocfs2_defrag_extent
   __ocfs2_move_extent
    + ocfs2_journal_access_di
    + ocfs2_split_extent  //sub-paths call jbd2_journal_restart
    + ocfs2_journal_dirty //crash by jbs2 ASSERT

crash stacks:

PID: 11297  TASK: ffff974a676dcd00  CPU: 67  COMMAND: "defragfs.ocfs2"
 #0 [ffffb25d8dad3900] machine_kexec at ffffffff8386fe01
 #1 [ffffb25d8dad3958] __crash_kexec at ffffffff8395959d
 #2 [ffffb25d8dad3a20] crash_kexec at ffffffff8395a45d
 #3 [ffffb25d8dad3a38] oops_end at ffffffff83836d3f
 #4 [ffffb25d8dad3a58] do_trap at ffffffff83833205
 #5 [ffffb25d8dad3aa0] do_invalid_op at ffffffff83833aa6
 #6 [ffffb25d8dad3ac0] invalid_op at ffffffff84200d18
    [exception RIP: jbd2_journal_dirty_metadata+0x2ba]
    RIP: ffffffffc09ca54a  RSP: ffffb25d8dad3b70  RFLAGS: 00010207
    RAX: 0000000000000000  RBX: ffff9706eedc5248  RCX: 0000000000000000
    RDX: 0000000000000001  RSI: ffff97337029ea28  RDI: ffff9706eedc5250
    RBP: ffff9703c3520200   R8: 000000000f46b0b2   R9: 0000000000000000
    R10: 0000000000000001  R11: 00000001000000fe  R12: ffff97337029ea28
    R13: 0000000000000000  R14: ffff9703de59bf60  R15: ffff9706eedc5250
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #7 [ffffb25d8dad3ba8] ocfs2_journal_dirty at ffffffffc137fb95 [ocfs2]
 #8 [ffffb25d8dad3be8] __ocfs2_move_extent at ffffffffc139a950 [ocfs2]
 #9 [ffffb25d8dad3c80] ocfs2_defrag_extent at ffffffffc139b2d2 [ocfs2]

Analysis

This bug has the same root cause of 'commit 7f27ec9 ("ocfs2: call
ocfs2_journal_access_di() before ocfs2_journal_dirty() in
ocfs2_write_end_nolock()")'.  For this bug, jbd2_journal_restart() is
called by ocfs2_split_extent() during defragmenting.

How to fix

For ocfs2_split_extent() can handle journal operations totally by itself.
Caller doesn't need to call journal access/dirty pair, and caller only
needs to call journal start/stop pair.  The fix method is to remove
journal access/dirty from __ocfs2_move_extent().

The discussion for this patch:
https://oss.oracle.com/pipermail/ocfs2-devel/2023-February/000647.html

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Heming Zhao <[email protected]>
Reviewed-by: Joseph Qi <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Junxiao Bi <[email protected]>
Cc: Changwei Ge <[email protected]>
Cc: Gang He <[email protected]>
Cc: Jun Piao <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
sysopenci pushed a commit that referenced this pull request May 19, 2023
[ Upstream commit 4e264be98b88a6d6f476c11087fe865696e8bef5 ]

When a system with E810 with existing VFs gets rebooted the following
hang may be observed.

 Pid 1 is hung in iavf_remove(), part of a network driver:
 PID: 1        TASK: ffff965400e5a340  CPU: 24   COMMAND: "systemd-shutdow"
  #0 [ffffaad04005fa50] __schedule at ffffffff8b3239cb
  #1 [ffffaad04005fae8] schedule at ffffffff8b323e2d
  #2 [ffffaad04005fb00] schedule_hrtimeout_range_clock at ffffffff8b32cebc
  #3 [ffffaad04005fb80] usleep_range_state at ffffffff8b32c930
  #4 [ffffaad04005fbb0] iavf_remove at ffffffffc12b9b4c [iavf]
  #5 [ffffaad04005fbf0] pci_device_remove at ffffffff8add7513
  #6 [ffffaad04005fc10] device_release_driver_internal at ffffffff8af08baa
  #7 [ffffaad04005fc40] pci_stop_bus_device at ffffffff8adcc5fc
  #8 [ffffaad04005fc60] pci_stop_and_remove_bus_device at ffffffff8adcc81e
  #9 [ffffaad04005fc70] pci_iov_remove_virtfn at ffffffff8adf9429
 #10 [ffffaad04005fca8] sriov_disable at ffffffff8adf98e4
 #11 [ffffaad04005fcc8] ice_free_vfs at ffffffffc04bb2c8 [ice]
 #12 [ffffaad04005fd10] ice_remove at ffffffffc04778fe [ice]
 #13 [ffffaad04005fd38] ice_shutdown at ffffffffc0477946 [ice]
 #14 [ffffaad04005fd50] pci_device_shutdown at ffffffff8add58f1
 #15 [ffffaad04005fd70] device_shutdown at ffffffff8af05386
 #16 [ffffaad04005fd98] kernel_restart at ffffffff8a92a870
 #17 [ffffaad04005fda8] __do_sys_reboot at ffffffff8a92abd6
 #18 [ffffaad04005fee0] do_syscall_64 at ffffffff8b317159
 #19 [ffffaad04005ff08] __context_tracking_enter at ffffffff8b31b6fc
 #20 [ffffaad04005ff18] syscall_exit_to_user_mode at ffffffff8b31b50d
 #21 [ffffaad04005ff28] do_syscall_64 at ffffffff8b317169
 #22 [ffffaad04005ff50] entry_SYSCALL_64_after_hwframe at ffffffff8b40009b
     RIP: 00007f1baa5c13d7  RSP: 00007fffbcc55a98  RFLAGS: 00000202
     RAX: ffffffffffffffda  RBX: 0000000000000000  RCX: 00007f1baa5c13d7
     RDX: 0000000001234567  RSI: 0000000028121969  RDI: 00000000fee1dead
     RBP: 00007fffbcc55ca0   R8: 0000000000000000   R9: 00007fffbcc54e90
     R10: 00007fffbcc55050  R11: 0000000000000202  R12: 0000000000000005
     R13: 0000000000000000  R14: 00007fffbcc55af0  R15: 0000000000000000
     ORIG_RAX: 00000000000000a9  CS: 0033  SS: 002b

During reboot all drivers PM shutdown callbacks are invoked.
In iavf_shutdown() the adapter state is changed to __IAVF_REMOVE.
In ice_shutdown() the call chain above is executed, which at some point
calls iavf_remove(). However iavf_remove() expects the VF to be in one
of the states __IAVF_RUNNING, __IAVF_DOWN or __IAVF_INIT_FAILED. If
that's not the case it sleeps forever.
So if iavf_shutdown() gets invoked before iavf_remove() the system will
hang indefinitely because the adapter is already in state __IAVF_REMOVE.

Fix this by returning from iavf_remove() if the state is __IAVF_REMOVE,
as we already went through iavf_shutdown().

Fixes: 974578017fc1 ("iavf: Add waiting so the port is initialized in remove")
Fixes: a8417330f8a5 ("iavf: Fix race condition between iavf_shutdown and iavf_remove")
Reported-by: Marius Cornea <[email protected]>
Signed-off-by: Stefan Assmann <[email protected]>
Reviewed-by: Michal Kubiak <[email protected]>
Tested-by: Rafal Romanowski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Copy link

@feijiang1 feijiang1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Collaborator

@JeevakaPrabu JeevakaPrabu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merged here c01c587

JeevakaPrabu pushed a commit that referenced this pull request Jul 13, 2023
In binder_transaction_buffer_release() the 'failed_at' offset indicates
the number of objects to clean up. However, this function was changed by
commit 44d8047 ("binder: use standard functions to allocate fds"),
to release all the objects in the buffer when 'failed_at' is zero.

This introduced an issue when a transaction buffer is released without
any objects having been processed so far. In this case, 'failed_at' is
indeed zero yet it is misinterpreted as releasing the entire buffer.

This leads to use-after-free errors where nodes are incorrectly freed
and subsequently accessed. Such is the case in the following KASAN
report:

  ==================================================================
  BUG: KASAN: slab-use-after-free in binder_thread_read+0xc40/0x1f30
  Read of size 8 at addr ffff4faf037cfc58 by task poc/474

  CPU: 6 PID: 474 Comm: poc Not tainted 6.3.0-12570-g7df047b3f0aa #5
  Hardware name: linux,dummy-virt (DT)
  Call trace:
   dump_backtrace+0x94/0xec
   show_stack+0x18/0x24
   dump_stack_lvl+0x48/0x60
   print_report+0xf8/0x5b8
   kasan_report+0xb8/0xfc
   __asan_load8+0x9c/0xb8
   binder_thread_read+0xc40/0x1f30
   binder_ioctl+0xd9c/0x1768
   __arm64_sys_ioctl+0xd4/0x118
   invoke_syscall+0x60/0x188
  [...]

  Allocated by task 474:
   kasan_save_stack+0x3c/0x64
   kasan_set_track+0x2c/0x40
   kasan_save_alloc_info+0x24/0x34
   __kasan_kmalloc+0xb8/0xbc
   kmalloc_trace+0x48/0x5c
   binder_new_node+0x3c/0x3a4
   binder_transaction+0x2b58/0x36f0
   binder_thread_write+0x8e0/0x1b78
   binder_ioctl+0x14a0/0x1768
   __arm64_sys_ioctl+0xd4/0x118
   invoke_syscall+0x60/0x188
  [...]

  Freed by task 475:
   kasan_save_stack+0x3c/0x64
   kasan_set_track+0x2c/0x40
   kasan_save_free_info+0x38/0x5c
   __kasan_slab_free+0xe8/0x154
   __kmem_cache_free+0x128/0x2bc
   kfree+0x58/0x70
   binder_dec_node_tmpref+0x178/0x1fc
   binder_transaction_buffer_release+0x430/0x628
   binder_transaction+0x1954/0x36f0
   binder_thread_write+0x8e0/0x1b78
   binder_ioctl+0x14a0/0x1768
   __arm64_sys_ioctl+0xd4/0x118
   invoke_syscall+0x60/0x188
  [...]
  ==================================================================

In order to avoid these issues, let's always calculate the intended
'failed_at' offset beforehand. This is renamed and wrapped in a helper
function to make it clear and convenient.

Fixes: 32e9f56a96d8 ("binder: don't detect sender/target during buffer cleanup")
Reported-by: Zi Fan Tan <[email protected]>
Link: https://b.corp.google.com/issues/275041864
Cc: [email protected]
Signed-off-by: Carlos Llamas <[email protected]>

Bug: 275041864
Link: https://lore.kernel.org/all/[email protected]
Change-Id: I4bcc8bde77a8118872237d100cccb5caf95d99a1
Signed-off-by: Carlos Llamas <[email protected]>
JeevakaPrabu pushed a commit that referenced this pull request Jul 13, 2023
[ Upstream commit 91d6a468e335571f1e67e046050dea9af5fa4ebe ]

gpi_ch_init() doesn't lock the ctrl_lock mutex, so there is no need to
unlock it too. Instead the mutex is handled by the function
gpi_alloc_chan_resources(), which properly locks and unlocks the mutex.

=====================================
WARNING: bad unlock balance detected!
6.3.0-rc5-00253-g99792582ded1-dirty #15 Not tainted
-------------------------------------
kworker/u16:0/9 is trying to release lock (&gpii->ctrl_lock) at:
[<ffffb99d04e1284c>] gpi_alloc_chan_resources+0x108/0x5bc
but there are no more locks to release!

other info that might help us debug this:
6 locks held by kworker/u16:0/9:
 #0: ffff575740010938 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x220/0x594
 #1: ffff80000809bdd0 (deferred_probe_work){+.+.}-{0:0}, at: process_one_work+0x220/0x594
 #2: ffff575740f2a0f8 (&dev->mutex){....}-{3:3}, at: __device_attach+0x38/0x188
 #3: ffff57574b5570f8 (&dev->mutex){....}-{3:3}, at: __device_attach+0x38/0x188
 #4: ffffb99d06a2f180 (of_dma_lock){+.+.}-{3:3}, at: of_dma_request_slave_channel+0x138/0x280
 #5: ffffb99d06a2ee20 (dma_list_mutex){+.+.}-{3:3}, at: dma_get_slave_channel+0x28/0x10c

stack backtrace:
CPU: 7 PID: 9 Comm: kworker/u16:0 Not tainted 6.3.0-rc5-00253-g99792582ded1-dirty #15
Hardware name: Google Pixel 3 (DT)
Workqueue: events_unbound deferred_probe_work_func
Call trace:
 dump_backtrace+0xa0/0xfc
 show_stack+0x18/0x24
 dump_stack_lvl+0x60/0xac
 dump_stack+0x18/0x24
 print_unlock_imbalance_bug+0x130/0x148
 lock_release+0x270/0x300
 __mutex_unlock_slowpath+0x48/0x2cc
 mutex_unlock+0x20/0x2c
 gpi_alloc_chan_resources+0x108/0x5bc
 dma_chan_get+0x84/0x188
 dma_get_slave_channel+0x5c/0x10c
 gpi_of_dma_xlate+0x110/0x1a0
 of_dma_request_slave_channel+0x174/0x280
 dma_request_chan+0x3c/0x2d4
 geni_i2c_probe+0x544/0x63c
 platform_probe+0x68/0xc4
 really_probe+0x148/0x2ac
 __driver_probe_device+0x78/0xe0
 driver_probe_device+0x3c/0x160
 __device_attach_driver+0xb8/0x138
 bus_for_each_drv+0x84/0xe0
 __device_attach+0x9c/0x188
 device_initial_probe+0x14/0x20
 bus_probe_device+0xac/0xb0
 device_add+0x60c/0x7d8
 of_device_add+0x44/0x60
 of_platform_device_create_pdata+0x90/0x124
 of_platform_bus_create+0x15c/0x3c8
 of_platform_populate+0x58/0xf8
 devm_of_platform_populate+0x58/0xbc
 geni_se_probe+0xf0/0x164
 platform_probe+0x68/0xc4
 really_probe+0x148/0x2ac
 __driver_probe_device+0x78/0xe0
 driver_probe_device+0x3c/0x160
 __device_attach_driver+0xb8/0x138
 bus_for_each_drv+0x84/0xe0
 __device_attach+0x9c/0x188
 device_initial_probe+0x14/0x20
 bus_probe_device+0xac/0xb0
 deferred_probe_work_func+0x8c/0xc8
 process_one_work+0x2bc/0x594
 worker_thread+0x228/0x438
 kthread+0x108/0x10c
 ret_from_fork+0x10/0x20

Fixes: 5d0c353 ("dmaengine: qcom: Add GPI dma driver")
Signed-off-by: Dmitry Baryshkov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Vinod Koul <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
JeevakaPrabu pushed a commit that referenced this pull request Jul 13, 2023
[ Upstream commit 05bb0167c80b8f93c6a4e0451b7da9b96db990c2 ]

ACPICA commit 770653e3ba67c30a629ca7d12e352d83c2541b1e

Before this change we see the following UBSAN stack trace in Fuchsia:

  #0    0x000021e4213b3302 in acpi_ds_init_aml_walk(struct acpi_walk_state*, union acpi_parse_object*, struct acpi_namespace_node*, u8*, u32, struct acpi_evaluate_info*, u8) ../../third_party/acpica/source/components/dispatcher/dswstate.c:682 <platform-bus-x86.so>+0x233302
  #1.2  0x000020d0f660777f in ubsan_get_stack_trace() compiler-rt/lib/ubsan/ubsan_diag.cpp:41 <libclang_rt.asan.so>+0x3d77f
  #1.1  0x000020d0f660777f in maybe_print_stack_trace() compiler-rt/lib/ubsan/ubsan_diag.cpp:51 <libclang_rt.asan.so>+0x3d77f
  #1    0x000020d0f660777f in ~scoped_report() compiler-rt/lib/ubsan/ubsan_diag.cpp:387 <libclang_rt.asan.so>+0x3d77f
  #2    0x000020d0f660b96d in handlepointer_overflow_impl() compiler-rt/lib/ubsan/ubsan_handlers.cpp:809 <libclang_rt.asan.so>+0x4196d
  #3    0x000020d0f660b50d in compiler-rt/lib/ubsan/ubsan_handlers.cpp:815 <libclang_rt.asan.so>+0x4150d
  #4    0x000021e4213b3302 in acpi_ds_init_aml_walk(struct acpi_walk_state*, union acpi_parse_object*, struct acpi_namespace_node*, u8*, u32, struct acpi_evaluate_info*, u8) ../../third_party/acpica/source/components/dispatcher/dswstate.c:682 <platform-bus-x86.so>+0x233302
  #5    0x000021e4213e2369 in acpi_ds_call_control_method(struct acpi_thread_state*, struct acpi_walk_state*, union acpi_parse_object*) ../../third_party/acpica/source/components/dispatcher/dsmethod.c:605 <platform-bus-x86.so>+0x262369
  #6    0x000021e421437fac in acpi_ps_parse_aml(struct acpi_walk_state*) ../../third_party/acpica/source/components/parser/psparse.c:550 <platform-bus-x86.so>+0x2b7fac
  #7    0x000021e4214464d2 in acpi_ps_execute_method(struct acpi_evaluate_info*) ../../third_party/acpica/source/components/parser/psxface.c:244 <platform-bus-x86.so>+0x2c64d2
  #8    0x000021e4213aa052 in acpi_ns_evaluate(struct acpi_evaluate_info*) ../../third_party/acpica/source/components/namespace/nseval.c:250 <platform-bus-x86.so>+0x22a052
  #9    0x000021e421413dd8 in acpi_ns_init_one_device(acpi_handle, u32, void*, void**) ../../third_party/acpica/source/components/namespace/nsinit.c:735 <platform-bus-x86.so>+0x293dd8
  #10   0x000021e421429e98 in acpi_ns_walk_namespace(acpi_object_type, acpi_handle, u32, u32, acpi_walk_callback, acpi_walk_callback, void*, void**) ../../third_party/acpica/source/components/namespace/nswalk.c:298 <platform-bus-x86.so>+0x2a9e98
  #11   0x000021e4214131ac in acpi_ns_initialize_devices(u32) ../../third_party/acpica/source/components/namespace/nsinit.c:268 <platform-bus-x86.so>+0x2931ac
  #12   0x000021e42147c40d in acpi_initialize_objects(u32) ../../third_party/acpica/source/components/utilities/utxfinit.c:304 <platform-bus-x86.so>+0x2fc40d
  #13   0x000021e42126d603 in acpi::acpi_impl::initialize_acpi(acpi::acpi_impl*) ../../src/devices/board/lib/acpi/acpi-impl.cc:224 <platform-bus-x86.so>+0xed603

Add a simple check that avoids incrementing a pointer by zero, but
otherwise behaves as before. Note that our findings are against ACPICA
20221020, but the same code exists on master.

Link: acpica/acpica@770653e3
Signed-off-by: Bob Moore <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
JeevakaPrabu pushed a commit that referenced this pull request Jul 13, 2023
[ Upstream commit 63f8793ee60513a09f110ea460a6ff2c33811cdb ]

Make sure to check device queue mode in the null_validate_conf() and
return error for NULL_Q_RQ as we don't allow legacy I/O path, without
this patch we get OOPs when queue mode is set to 1 from configfs,
following are repro steps :-

modprobe null_blk nr_devices=0
mkdir config/nullb/nullb0
echo 1 > config/nullb/nullb0/memory_backed
echo 4096 > config/nullb/nullb0/blocksize
echo 20480 > config/nullb/nullb0/size
echo 1 > config/nullb/nullb0/queue_mode
echo 1 > config/nullb/nullb0/power

Entering kdb (current=0xffff88810acdd080, pid 2372) on processor 42 Oops: (null)
due to oops @ 0xffffffffc041c329
CPU: 42 PID: 2372 Comm: sh Tainted: G           O     N 6.3.0-rc5lblk+ #5
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
RIP: 0010:null_add_dev.part.0+0xd9/0x720 [null_blk]
Code: 01 00 00 85 d2 0f 85 a1 03 00 00 48 83 bb 08 01 00 00 00 0f 85 f7 03 00 00 80 bb 62 01 00 00 00 48 8b 75 20 0f 85 6d 02 00 00 <48> 89 6e 60 48 8b 75 20 bf 06 00 00 00 e8 f5 37 2c c1 48 8b 75 20
RSP: 0018:ffffc900052cbde0 EFLAGS: 00010246
RAX: 0000000000000001 RBX: ffff88811084d800 RCX: 0000000000000001
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff888100042e00
RBP: ffff8881053d8200 R08: ffffc900052cbd68 R09: ffff888105db2000
R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000002
R13: ffff888104765200 R14: ffff88810eec1748 R15: ffff88810eec1740
FS:  00007fd445fd1740(0000) GS:ffff8897dfc80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000060 CR3: 0000000166a00000 CR4: 0000000000350ee0
DR0: ffffffff8437a488 DR1: ffffffff8437a489 DR2: ffffffff8437a48a
DR3: ffffffff8437a48b DR6: 00000000ffff0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 nullb_device_power_store+0xd1/0x120 [null_blk]
 configfs_write_iter+0xb4/0x120
 vfs_write+0x2ba/0x3c0
 ksys_write+0x5f/0xe0
 do_syscall_64+0x3b/0x90
 entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7fd4460c57a7
Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
RSP: 002b:00007ffd3792a4a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fd4460c57a7
RDX: 0000000000000002 RSI: 000055b43c02e4c0 RDI: 0000000000000001
RBP: 000055b43c02e4c0 R08: 000000000000000a R09: 00007fd44615b4e0
R10: 00007fd44615b3e0 R11: 0000000000000246 R12: 0000000000000002
R13: 00007fd446198520 R14: 0000000000000002 R15: 00007fd446198700
 </TASK>

Signed-off-by: Chaitanya Kulkarni <[email protected]>
Reviewed-by: Damien Le Moal <[email protected]>
Reviewed-by: Ming Lei <[email protected]>
Reviewed-by: Nitesh Shetty <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
JeevakaPrabu pushed a commit that referenced this pull request Jul 13, 2023
[ Upstream commit 98bea253aa28ad8be2ce565a9ca21beb4a9419e5 ]

Log load and replay is part of the metadata handle flow during mount
operation. The $MFT record will be loaded and used while replaying logs.
However, a malformed $MFT record, say, has RECORD_FLAG_DIR flag set and
contains an ATTR_ROOT attribute will misguide kernel to treat it as a
directory, and try to free the allocated resources when the
corresponding inode is freed, which will cause an invalid kfree because
the memory hasn't actually been allocated.

[  101.368647] BUG: KASAN: invalid-free in kvfree+0x2c/0x40
[  101.369457]
[  101.369986] CPU: 0 PID: 198 Comm: mount Not tainted 6.0.0-rc7+ #5
[  101.370529] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[  101.371362] Call Trace:
[  101.371795]  <TASK>
[  101.372157]  dump_stack_lvl+0x49/0x63
[  101.372658]  print_report.cold+0xf5/0x689
[  101.373022]  ? ni_write_inode+0x754/0xd90
[  101.373378]  ? kvfree+0x2c/0x40
[  101.373698]  kasan_report_invalid_free+0x77/0xf0
[  101.374058]  ? kvfree+0x2c/0x40
[  101.374352]  ? kvfree+0x2c/0x40
[  101.374668]  __kasan_slab_free+0x189/0x1b0
[  101.374992]  ? kvfree+0x2c/0x40
[  101.375271]  kfree+0x168/0x3b0
[  101.375717]  kvfree+0x2c/0x40
[  101.376002]  indx_clear+0x26/0x60
[  101.376316]  ni_clear+0xc5/0x290
[  101.376661]  ntfs_evict_inode+0x45/0x70
[  101.377001]  evict+0x199/0x280
[  101.377432]  iput.part.0+0x286/0x320
[  101.377819]  iput+0x32/0x50
[  101.378166]  ntfs_loadlog_and_replay+0x143/0x320
[  101.378656]  ? ntfs_bio_fill_1+0x510/0x510
[  101.378968]  ? iput.part.0+0x286/0x320
[  101.379367]  ntfs_fill_super+0xecb/0x1ba0
[  101.379729]  ? put_ntfs+0x1d0/0x1d0
[  101.380046]  ? vsprintf+0x20/0x20
[  101.380542]  ? mutex_unlock+0x81/0xd0
[  101.380914]  ? set_blocksize+0x95/0x150
[  101.381597]  get_tree_bdev+0x232/0x370
[  101.382254]  ? put_ntfs+0x1d0/0x1d0
[  101.382699]  ntfs_fs_get_tree+0x15/0x20
[  101.383094]  vfs_get_tree+0x4c/0x130
[  101.383675]  path_mount+0x654/0xfe0
[  101.384203]  ? putname+0x80/0xa0
[  101.384540]  ? finish_automount+0x2e0/0x2e0
[  101.384943]  ? putname+0x80/0xa0
[  101.385362]  ? kmem_cache_free+0x1c4/0x440
[  101.385968]  ? putname+0x80/0xa0
[  101.386666]  do_mount+0xd6/0xf0
[  101.387228]  ? path_mount+0xfe0/0xfe0
[  101.387585]  ? __kasan_check_write+0x14/0x20
[  101.387979]  __x64_sys_mount+0xca/0x110
[  101.388436]  do_syscall_64+0x3b/0x90
[  101.388757]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[  101.389289] RIP: 0033:0x7fa0f70e948a
[  101.390048] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[  101.391297] RSP: 002b:00007ffc24fdecc8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[  101.391988] RAX: ffffffffffffffda RBX: 000055932c183060 RCX: 00007fa0f70e948a
[  101.392494] RDX: 000055932c183260 RSI: 000055932c1832e0 RDI: 000055932c18bce0
[  101.393053] RBP: 0000000000000000 R08: 000055932c183280 R09: 0000000000000020
[  101.393577] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 000055932c18bce0
[  101.394044] R13: 000055932c183260 R14: 0000000000000000 R15: 00000000ffffffff
[  101.394747]  </TASK>
[  101.395402]
[  101.396047] Allocated by task 198:
[  101.396724]  kasan_save_stack+0x26/0x50
[  101.397400]  __kasan_slab_alloc+0x6d/0x90
[  101.397974]  kmem_cache_alloc_lru+0x192/0x5a0
[  101.398524]  ntfs_alloc_inode+0x23/0x70
[  101.399137]  alloc_inode+0x3b/0xf0
[  101.399534]  iget5_locked+0x54/0xa0
[  101.400026]  ntfs_iget5+0xaf/0x1780
[  101.400414]  ntfs_loadlog_and_replay+0xe5/0x320
[  101.400883]  ntfs_fill_super+0xecb/0x1ba0
[  101.401313]  get_tree_bdev+0x232/0x370
[  101.401774]  ntfs_fs_get_tree+0x15/0x20
[  101.402224]  vfs_get_tree+0x4c/0x130
[  101.402673]  path_mount+0x654/0xfe0
[  101.403160]  do_mount+0xd6/0xf0
[  101.403537]  __x64_sys_mount+0xca/0x110
[  101.404058]  do_syscall_64+0x3b/0x90
[  101.404333]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[  101.404816]
[  101.405067] The buggy address belongs to the object at ffff888008cc9ea0
[  101.405067]  which belongs to the cache ntfs_inode_cache of size 992
[  101.406171] The buggy address is located 232 bytes inside of
[  101.406171]  992-byte region [ffff888008cc9ea0, ffff888008cca280)
[  101.406995]
[  101.408559] The buggy address belongs to the physical page:
[  101.409320] page:00000000dccf19dd refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x8cc8
[  101.410654] head:00000000dccf19dd order:2 compound_mapcount:0 compound_pincount:0
[  101.411533] flags: 0xfffffc0010200(slab|head|node=0|zone=1|lastcpupid=0x1fffff)
[  101.412665] raw: 000fffffc0010200 0000000000000000 dead000000000122 ffff888003695140
[  101.413209] raw: 0000000000000000 00000000800e000e 00000001ffffffff 0000000000000000
[  101.413799] page dumped because: kasan: bad access detected
[  101.414213]
[  101.414427] Memory state around the buggy address:
[  101.414991]  ffff888008cc9e80: fc fc fc fc 00 00 00 00 00 00 00 00 00 00 00 00
[  101.415785]  ffff888008cc9f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  101.416933] >ffff888008cc9f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  101.417857]                       ^
[  101.418566]  ffff888008cca000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  101.419704]  ffff888008cca080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Signed-off-by: Edward Lo <[email protected]>
Signed-off-by: Konstantin Komarov <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
JeevakaPrabu pushed a commit that referenced this pull request Jul 13, 2023
[ Upstream commit bdc1c5fac982845a58d28690cdb56db8c88a530d ]

In binder_transaction_buffer_release() the 'failed_at' offset indicates
the number of objects to clean up. However, this function was changed by
commit 44d8047 ("binder: use standard functions to allocate fds"),
to release all the objects in the buffer when 'failed_at' is zero.

This introduced an issue when a transaction buffer is released without
any objects having been processed so far. In this case, 'failed_at' is
indeed zero yet it is misinterpreted as releasing the entire buffer.

This leads to use-after-free errors where nodes are incorrectly freed
and subsequently accessed. Such is the case in the following KASAN
report:

  ==================================================================
  BUG: KASAN: slab-use-after-free in binder_thread_read+0xc40/0x1f30
  Read of size 8 at addr ffff4faf037cfc58 by task poc/474

  CPU: 6 PID: 474 Comm: poc Not tainted 6.3.0-12570-g7df047b3f0aa #5
  Hardware name: linux,dummy-virt (DT)
  Call trace:
   dump_backtrace+0x94/0xec
   show_stack+0x18/0x24
   dump_stack_lvl+0x48/0x60
   print_report+0xf8/0x5b8
   kasan_report+0xb8/0xfc
   __asan_load8+0x9c/0xb8
   binder_thread_read+0xc40/0x1f30
   binder_ioctl+0xd9c/0x1768
   __arm64_sys_ioctl+0xd4/0x118
   invoke_syscall+0x60/0x188
  [...]

  Allocated by task 474:
   kasan_save_stack+0x3c/0x64
   kasan_set_track+0x2c/0x40
   kasan_save_alloc_info+0x24/0x34
   __kasan_kmalloc+0xb8/0xbc
   kmalloc_trace+0x48/0x5c
   binder_new_node+0x3c/0x3a4
   binder_transaction+0x2b58/0x36f0
   binder_thread_write+0x8e0/0x1b78
   binder_ioctl+0x14a0/0x1768
   __arm64_sys_ioctl+0xd4/0x118
   invoke_syscall+0x60/0x188
  [...]

  Freed by task 475:
   kasan_save_stack+0x3c/0x64
   kasan_set_track+0x2c/0x40
   kasan_save_free_info+0x38/0x5c
   __kasan_slab_free+0xe8/0x154
   __kmem_cache_free+0x128/0x2bc
   kfree+0x58/0x70
   binder_dec_node_tmpref+0x178/0x1fc
   binder_transaction_buffer_release+0x430/0x628
   binder_transaction+0x1954/0x36f0
   binder_thread_write+0x8e0/0x1b78
   binder_ioctl+0x14a0/0x1768
   __arm64_sys_ioctl+0xd4/0x118
   invoke_syscall+0x60/0x188
  [...]
  ==================================================================

In order to avoid these issues, let's always calculate the intended
'failed_at' offset beforehand. This is renamed and wrapped in a helper
function to make it clear and convenient.

Fixes: 32e9f56a96d8 ("binder: don't detect sender/target during buffer cleanup")
Reported-by: Zi Fan Tan <[email protected]>
Cc: [email protected]
Signed-off-by: Carlos Llamas <[email protected]>
Acked-by: Todd Kjos <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
sysopenci pushed a commit that referenced this pull request Dec 6, 2023
[ Upstream commit 7962ef13651a9163f07b530607392ea123482e8a ]

In 3cb4d5e ("perf trace: Free syscall tp fields in
evsel->priv") it only was freeing if strcmp(evsel->tp_format->system,
"syscalls") returned zero, while the corresponding initialization of
evsel->priv was being performed if it was _not_ zero, i.e. if the tp
system wasn't 'syscalls'.

Just stop looking for that and free it if evsel->priv was set, which
should be equivalent.

Also use the pre-existing evsel_trace__delete() function.

This resolves these leaks, detected with:

  $ make EXTRA_CFLAGS="-fsanitize=address" BUILD_BPF_SKEL=1 CORESIGHT=1 O=/tmp/build/perf-tools-next -C tools/perf install-bin

  =================================================================
  ==481565==ERROR: LeakSanitizer: detected memory leaks

  Direct leak of 40 byte(s) in 1 object(s) allocated from:
      #0 0x7f7343cba097 in calloc (/lib64/libasan.so.8+0xba097)
      #1 0x987966 in zalloc (/home/acme/bin/perf+0x987966)
      #2 0x52f9b9 in evsel_trace__new /home/acme/git/perf-tools-next/tools/perf/builtin-trace.c:307
      #3 0x52f9b9 in evsel__syscall_tp /home/acme/git/perf-tools-next/tools/perf/builtin-trace.c:333
      #4 0x52f9b9 in evsel__init_raw_syscall_tp /home/acme/git/perf-tools-next/tools/perf/builtin-trace.c:458
      #5 0x52f9b9 in perf_evsel__raw_syscall_newtp /home/acme/git/perf-tools-next/tools/perf/builtin-trace.c:480
      #6 0x540e8b in trace__add_syscall_newtp /home/acme/git/perf-tools-next/tools/perf/builtin-trace.c:3212
      #7 0x540e8b in trace__run /home/acme/git/perf-tools-next/tools/perf/builtin-trace.c:3891
      #8 0x540e8b in cmd_trace /home/acme/git/perf-tools-next/tools/perf/builtin-trace.c:5156
      #9 0x5ef262 in run_builtin /home/acme/git/perf-tools-next/tools/perf/perf.c:323
      #10 0x4196da in handle_internal_command /home/acme/git/perf-tools-next/tools/perf/perf.c:377
      #11 0x4196da in run_argv /home/acme/git/perf-tools-next/tools/perf/perf.c:421
      #12 0x4196da in main /home/acme/git/perf-tools-next/tools/perf/perf.c:537
      #13 0x7f7342c4a50f in __libc_start_call_main (/lib64/libc.so.6+0x2750f)

  Direct leak of 40 byte(s) in 1 object(s) allocated from:
      #0 0x7f7343cba097 in calloc (/lib64/libasan.so.8+0xba097)
      #1 0x987966 in zalloc (/home/acme/bin/perf+0x987966)
      #2 0x52f9b9 in evsel_trace__new /home/acme/git/perf-tools-next/tools/perf/builtin-trace.c:307
      #3 0x52f9b9 in evsel__syscall_tp /home/acme/git/perf-tools-next/tools/perf/builtin-trace.c:333
      #4 0x52f9b9 in evsel__init_raw_syscall_tp /home/acme/git/perf-tools-next/tools/perf/builtin-trace.c:458
      #5 0x52f9b9 in perf_evsel__raw_syscall_newtp /home/acme/git/perf-tools-next/tools/perf/builtin-trace.c:480
      #6 0x540dd1 in trace__add_syscall_newtp /home/acme/git/perf-tools-next/tools/perf/builtin-trace.c:3205
      #7 0x540dd1 in trace__run /home/acme/git/perf-tools-next/tools/perf/builtin-trace.c:3891
      #8 0x540dd1 in cmd_trace /home/acme/git/perf-tools-next/tools/perf/builtin-trace.c:5156
      #9 0x5ef262 in run_builtin /home/acme/git/perf-tools-next/tools/perf/perf.c:323
      #10 0x4196da in handle_internal_command /home/acme/git/perf-tools-next/tools/perf/perf.c:377
      #11 0x4196da in run_argv /home/acme/git/perf-tools-next/tools/perf/perf.c:421
      #12 0x4196da in main /home/acme/git/perf-tools-next/tools/perf/perf.c:537
      #13 0x7f7342c4a50f in __libc_start_call_main (/lib64/libc.so.6+0x2750f)

  SUMMARY: AddressSanitizer: 80 byte(s) leaked in 2 allocation(s).
  [root@quaco ~]#

With this we plug all leaks with "perf trace sleep 1".

Fixes: 3cb4d5e ("perf trace: Free syscall tp fields in evsel->priv")
Acked-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Riccardo Mancini <[email protected]>
Link: https://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
sysopenci pushed a commit that referenced this pull request Dec 6, 2023
[ Upstream commit ef23cb593304bde0cc046fd4cc83ae7ea2e24f16 ]

While debugging a segfault on 'perf lock contention' without an
available perf.data file I noticed that it was basically calling:

	perf_session__delete(ERR_PTR(-1))

Resulting in:

  (gdb) run lock contention
  Starting program: /root/bin/perf lock contention
  [Thread debugging using libthread_db enabled]
  Using host libthread_db library "/lib64/libthread_db.so.1".
  failed to open perf.data: No such file or directory  (try 'perf record' first)
  Initializing perf session failed

  Program received signal SIGSEGV, Segmentation fault.
  0x00000000005e7515 in auxtrace__free (session=0xffffffffffffffff) at util/auxtrace.c:2858
  2858		if (!session->auxtrace)
  (gdb) p session
  $1 = (struct perf_session *) 0xffffffffffffffff
  (gdb) bt
  #0  0x00000000005e7515 in auxtrace__free (session=0xffffffffffffffff) at util/auxtrace.c:2858
  #1  0x000000000057bb4d in perf_session__delete (session=0xffffffffffffffff) at util/session.c:300
  #2  0x000000000047c421 in __cmd_contention (argc=0, argv=0x7fffffffe200) at builtin-lock.c:2161
  #3  0x000000000047dc95 in cmd_lock (argc=0, argv=0x7fffffffe200) at builtin-lock.c:2604
  #4  0x0000000000501466 in run_builtin (p=0xe597a8 <commands+552>, argc=2, argv=0x7fffffffe200) at perf.c:322
  #5  0x00000000005016d5 in handle_internal_command (argc=2, argv=0x7fffffffe200) at perf.c:375
  #6  0x0000000000501824 in run_argv (argcp=0x7fffffffe02c, argv=0x7fffffffe020) at perf.c:419
  #7  0x0000000000501b11 in main (argc=2, argv=0x7fffffffe200) at perf.c:535
  (gdb)

So just set it to NULL after using PTR_ERR(session) to decode the error
as perf_session__delete(NULL) is supported.

The same problem was found in 'perf top' after an audit of all
perf_session__new() failure handling.

Fixes: 6ef81c5 ("perf session: Return error code for perf_session__new() function on failure")
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Alexey Budankov <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Jeremie Galarneau <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kate Stewart <[email protected]>
Cc: Mamatha Inamdar <[email protected]>
Cc: Mukesh Ojha <[email protected]>
Cc: Nageswara R Sastry <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ravi Bangoria <[email protected]>
Cc: Shawn Landden <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tzvetomir Stoyanov <[email protected]>
Link: https://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
sysopenci pushed a commit that referenced this pull request Dec 6, 2023
[ Upstream commit ade32bd8a738d7497ffe9743c46728db26740f78 ]

unix_tot_inflight is changed under spin_lock(unix_gc_lock), but
unix_release_sock() reads it locklessly.

Let's use READ_ONCE() for unix_tot_inflight.

Note that the writer side was marked by commit 9d6d7f1cb67c ("af_unix:
annote lockless accesses to unix_tot_inflight & gc_in_progress")

BUG: KCSAN: data-race in unix_inflight / unix_release_sock

write (marked) to 0xffffffff871852b8 of 4 bytes by task 123 on cpu 1:
 unix_inflight+0x130/0x180 net/unix/scm.c:64
 unix_attach_fds+0x137/0x1b0 net/unix/scm.c:123
 unix_scm_to_skb net/unix/af_unix.c:1832 [inline]
 unix_dgram_sendmsg+0x46a/0x14f0 net/unix/af_unix.c:1955
 sock_sendmsg_nosec net/socket.c:724 [inline]
 sock_sendmsg+0x148/0x160 net/socket.c:747
 ____sys_sendmsg+0x4e4/0x610 net/socket.c:2493
 ___sys_sendmsg+0xc6/0x140 net/socket.c:2547
 __sys_sendmsg+0x94/0x140 net/socket.c:2576
 __do_sys_sendmsg net/socket.c:2585 [inline]
 __se_sys_sendmsg net/socket.c:2583 [inline]
 __x64_sys_sendmsg+0x45/0x50 net/socket.c:2583
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x3b/0x90 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x72/0xdc

read to 0xffffffff871852b8 of 4 bytes by task 4891 on cpu 0:
 unix_release_sock+0x608/0x910 net/unix/af_unix.c:671
 unix_release+0x59/0x80 net/unix/af_unix.c:1058
 __sock_release+0x7d/0x170 net/socket.c:653
 sock_close+0x19/0x30 net/socket.c:1385
 __fput+0x179/0x5e0 fs/file_table.c:321
 ____fput+0x15/0x20 fs/file_table.c:349
 task_work_run+0x116/0x1a0 kernel/task_work.c:179
 resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
 exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
 exit_to_user_mode_prepare+0x174/0x180 kernel/entry/common.c:204
 __syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
 syscall_exit_to_user_mode+0x1a/0x30 kernel/entry/common.c:297
 do_syscall_64+0x4b/0x90 arch/x86/entry/common.c:86
 entry_SYSCALL_64_after_hwframe+0x72/0xdc

value changed: 0x00000000 -> 0x00000001

Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 4891 Comm: systemd-coredum Not tainted 6.4.0-rc5-01219-gfa0e21fa4443 #5
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014

Fixes: 9305cfa ("[AF_UNIX]: Make unix_tot_inflight counter non-atomic")
Reported-by: syzkaller <[email protected]>
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
sysopenci pushed a commit that referenced this pull request Dec 6, 2023
[ Upstream commit a154f5f643c6ecddd44847217a7a3845b4350003 ]

The following call trace shows a deadlock issue due to recursive locking of
mutex "device_mutex". First lock acquire is in target_for_each_device() and
second in target_free_device().

 PID: 148266   TASK: ffff8be21ffb5d00  CPU: 10   COMMAND: "iscsi_ttx"
  #0 [ffffa2bfc9ec3b18] __schedule at ffffffffa8060e7f
  #1 [ffffa2bfc9ec3ba0] schedule at ffffffffa8061224
  #2 [ffffa2bfc9ec3bb8] schedule_preempt_disabled at ffffffffa80615ee
  #3 [ffffa2bfc9ec3bc8] __mutex_lock at ffffffffa8062fd7
  #4 [ffffa2bfc9ec3c40] __mutex_lock_slowpath at ffffffffa80631d3
  #5 [ffffa2bfc9ec3c50] mutex_lock at ffffffffa806320c
  #6 [ffffa2bfc9ec3c68] target_free_device at ffffffffc0935998 [target_core_mod]
  #7 [ffffa2bfc9ec3c90] target_core_dev_release at ffffffffc092f975 [target_core_mod]
  #8 [ffffa2bfc9ec3ca0] config_item_put at ffffffffa79d250f
  #9 [ffffa2bfc9ec3cd0] config_item_put at ffffffffa79d2583
 #10 [ffffa2bfc9ec3ce0] target_devices_idr_iter at ffffffffc0933f3a [target_core_mod]
 #11 [ffffa2bfc9ec3d00] idr_for_each at ffffffffa803f6fc
 #12 [ffffa2bfc9ec3d60] target_for_each_device at ffffffffc0935670 [target_core_mod]
 #13 [ffffa2bfc9ec3d98] transport_deregister_session at ffffffffc0946408 [target_core_mod]
 #14 [ffffa2bfc9ec3dc8] iscsit_close_session at ffffffffc09a44a6 [iscsi_target_mod]
 #15 [ffffa2bfc9ec3df0] iscsit_close_connection at ffffffffc09a4a88 [iscsi_target_mod]
 #16 [ffffa2bfc9ec3df8] finish_task_switch at ffffffffa76e5d07
 #17 [ffffa2bfc9ec3e78] iscsit_take_action_for_connection_exit at ffffffffc0991c23 [iscsi_target_mod]
 #18 [ffffa2bfc9ec3ea0] iscsi_target_tx_thread at ffffffffc09a403b [iscsi_target_mod]
 #19 [ffffa2bfc9ec3f08] kthread at ffffffffa76d8080
 #20 [ffffa2bfc9ec3f50] ret_from_fork at ffffffffa8200364

Fixes: 36d4cb4 ("scsi: target: Avoid that EXTENDED COPY commands trigger lock inversion")
Signed-off-by: Junxiao Bi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Mike Christie <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
sysopenci pushed a commit that referenced this pull request Dec 6, 2023
commit b502c87d2a26c349acbc231ff2acd6f17147926b upstream.

If an UNDEFINED exception is taken from EL1, and do_undefinstr() doesn't
find any suitable undef_hook, it will call:

	BUG_ON(!user_mode(regs))

... and the kernel will report a failure witin do_undefinstr() rather
than reporting the original context that the UNDEFINED exception was
taken from. The pt_regs and ESR value reported within the BUG() handler
will be from within do_undefinstr() and the code dump will be for the
BRK in BUG_ON(), which isn't sufficient to debug the cause of the
original exception.

This patch makes the reporting better by having do_undefinstr() call
die() directly in this case to report the original context from which
the UNDEFINED exception was taken.

Prior to this patch, an undefined instruction is reported as:

| kernel BUG at arch/arm64/kernel/traps.c:497!
| Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper Not tainted 5.19.0-rc3-00127-geff044f1b04e-dirty #3
| Hardware name: linux,dummy-virt (DT)
| pstate: 000000c5 (nzcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
| pc : do_undefinstr+0x28c/0x2ac
| lr : do_undefinstr+0x298/0x2ac
| sp : ffff800009f63bc0
| x29: ffff800009f63bc0 x28: ffff800009f73c00 x27: ffff800009644a70
| x26: ffff8000096778a8 x25: 0000000000000040 x24: 0000000000000000
| x23: 00000000800000c5 x22: ffff800009894060 x21: ffff800009f63d90
| x20: 0000000000000000 x19: ffff800009f63c40 x18: 0000000000000006
| x17: 0000000000403000 x16: 00000000bfbfd000 x15: ffff800009f63830
| x14: ffffffffffffffff x13: 0000000000000000 x12: 0000000000000019
| x11: 0101010101010101 x10: 0000000000161b98 x9 : 0000000000000000
| x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000
| x5 : ffff800009f761d0 x4 : 0000000000000000 x3 : ffff80000a2b80f8
| x2 : 0000000000000000 x1 : ffff800009f73c00 x0 : 00000000800000c5
| Call trace:
|  do_undefinstr+0x28c/0x2ac
|  el1_undef+0x2c/0x4c
|  el1h_64_sync_handler+0x84/0xd0
|  el1h_64_sync+0x64/0x68
|  setup_arch+0x550/0x598
|  start_kernel+0x88/0x6ac
|  __primary_switched+0xb8/0xc0
| Code: 17ffff95 a9425bf5 17ffffb8 a9025bf5 (d4210000)

With this patch applied, an undefined instruction is reported as:

| Internal error: Oops - Undefined instruction: 0 [#1] PREEMPT SMP
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper Not tainted 5.19.0-rc3-00128-gf27cfcc80e52-dirty #5
| Hardware name: linux,dummy-virt (DT)
| pstate: 800000c5 (Nzcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
| pc : setup_arch+0x550/0x598
| lr : setup_arch+0x50c/0x598
| sp : ffff800009f63d90
| x29: ffff800009f63d90 x28: 0000000081000200 x27: ffff800009644a70
| x26: ffff8000096778c8 x25: 0000000000000040 x24: 0000000000000000
| x23: 0000000000000100 x22: ffff800009f69a58 x21: ffff80000a2b80b8
| x20: 0000000000000000 x19: 0000000000000000 x18: 0000000000000006
| x17: 0000000000403000 x16: 00000000bfbfd000 x15: ffff800009f63830
| x14: ffffffffffffffff x13: 0000000000000000 x12: 0000000000000019
| x11: 0101010101010101 x10: 0000000000161b98 x9 : 0000000000000000
| x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000
| x5 : 0000000000000008 x4 : 0000000000000010 x3 : 0000000000000000
| x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000
| Call trace:
|  setup_arch+0x550/0x598
|  start_kernel+0x88/0x6ac
|  __primary_switched+0xb8/0xc0
| Code: b4000080 90ffed80 912ac000 97db745f (00000000)

Signed-off-by: Mark Rutland <[email protected]>
Reviewed-by: Mark Brown <[email protected]>
Cc: Alexandru Elisei <[email protected]>
Cc: Amit Daniel Kachhap <[email protected]>
Cc: James Morse <[email protected]>
Cc: Will Deacon <[email protected]>
Reviewed-by: Anshuman Khandual <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Catalin Marinas <[email protected]>
Signed-off-by: Jinjie Ruan <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
xyzhao2018 pushed a commit to xyzhao2018/linux-intel-lts2021 that referenced this pull request Jan 2, 2024
[ Upstream commit b684c09f09e7a6af3794d4233ef785819e72db79 ]

ppc_save_regs() skips one stack frame while saving the CPU register states.
Instead of saving current R1, it pulls the previous stack frame pointer.

When vmcores caused by direct panic call (such as `echo c >
/proc/sysrq-trigger`), are debugged with gdb, gdb fails to show the
backtrace correctly. On further analysis, it was found that it was because
of mismatch between r1 and NIP.

GDB uses NIP to get current function symbol and uses corresponding debug
info of that function to unwind previous frames, but due to the
mismatching r1 and NIP, the unwinding does not work, and it fails to
unwind to the 2nd frame and hence does not show the backtrace.

GDB backtrace with vmcore of kernel without this patch:

---------
(gdb) bt
 #0  0xc0000000002a53e8 in crash_setup_regs (oldregs=<optimized out>,
    newregs=0xc000000004f8f8d8) at ./arch/powerpc/include/asm/kexec.h:69
 projectceladon#1  __crash_kexec (regs=<optimized out>) at kernel/kexec_core.c:974
 projectceladon#2  0x0000000000000063 in ?? ()
 projectceladon#3  0xc000000003579320 in ?? ()
---------

Further analysis revealed that the mismatch occurred because
"ppc_save_regs" was saving the previous stack's SP instead of the current
r1. This patch fixes this by storing current r1 in the saved pt_regs.

GDB backtrace with vmcore of patched kernel:

--------
(gdb) bt
 #0  0xc0000000002a53e8 in crash_setup_regs (oldregs=0x0, newregs=0xc00000000670b8d8)
    at ./arch/powerpc/include/asm/kexec.h:69
 projectceladon#1  __crash_kexec (regs=regs@entry=0x0) at kernel/kexec_core.c:974
 projectceladon#2  0xc000000000168918 in panic (fmt=fmt@entry=0xc000000001654a60 "sysrq triggered crash\n")
    at kernel/panic.c:358
 projectceladon#3  0xc000000000b735f8 in sysrq_handle_crash (key=<optimized out>) at drivers/tty/sysrq.c:155
 projectceladon#4  0xc000000000b742cc in __handle_sysrq (key=key@entry=99, check_mask=check_mask@entry=false)
    at drivers/tty/sysrq.c:602
 projectceladon#5  0xc000000000b7506c in write_sysrq_trigger (file=<optimized out>, buf=<optimized out>,
    count=2, ppos=<optimized out>) at drivers/tty/sysrq.c:1163
 projectceladon#6  0xc00000000069a7bc in pde_write (ppos=<optimized out>, count=<optimized out>,
    buf=<optimized out>, file=<optimized out>, pde=0xc00000000362cb40) at fs/proc/inode.c:340
 projectceladon#7  proc_reg_write (file=<optimized out>, buf=<optimized out>, count=<optimized out>,
    ppos=<optimized out>) at fs/proc/inode.c:352
 projectceladon#8  0xc0000000005b3bbc in vfs_write (file=file@entry=0xc000000006aa6b00,
    buf=buf@entry=0x61f498b4f60 <error: Cannot access memory at address 0x61f498b4f60>,
    count=count@entry=2, pos=pos@entry=0xc00000000670bda0) at fs/read_write.c:582
 projectceladon#9  0xc0000000005b4264 in ksys_write (fd=<optimized out>,
    buf=0x61f498b4f60 <error: Cannot access memory at address 0x61f498b4f60>, count=2)
    at fs/read_write.c:637
 projectceladon#10 0xc00000000002ea2c in system_call_exception (regs=0xc00000000670be80, r0=<optimized out>)
    at arch/powerpc/kernel/syscall.c:171
 projectceladon#11 0xc00000000000c270 in system_call_vectored_common ()
    at arch/powerpc/kernel/interrupt_64.S:192
--------

Nick adds:
  So this now saves regs as though it was an interrupt taken in the
  caller, at the instruction after the call to ppc_save_regs, whereas
  previously the NIP was there, but R1 came from the caller's caller and
  that mismatch is what causes gdb's dwarf unwinder to go haywire.

Signed-off-by: Aditya Gupta <[email protected]>
Fixes: d16a58f ("powerpc: Improve ppc_save_regs()")
Reivewed-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/[email protected]
Signed-off-by: Sasha Levin <[email protected]>
JeevakaPrabu pushed a commit that referenced this pull request Jun 20, 2024
[ Upstream commit f8bbc07ac535593139c875ffa19af924b1084540 ]

vhost_worker will call tun call backs to receive packets. If too many
illegal packets arrives, tun_do_read will keep dumping packet contents.
When console is enabled, it will costs much more cpu time to dump
packet and soft lockup will be detected.

net_ratelimit mechanism can be used to limit the dumping rate.

PID: 33036    TASK: ffff949da6f20000  CPU: 23   COMMAND: "vhost-32980"
 #0 [fffffe00003fce50] crash_nmi_callback at ffffffff89249253
 #1 [fffffe00003fce58] nmi_handle at ffffffff89225fa3
 #2 [fffffe00003fceb0] default_do_nmi at ffffffff8922642e
 #3 [fffffe00003fced0] do_nmi at ffffffff8922660d
 #4 [fffffe00003fcef0] end_repeat_nmi at ffffffff89c01663
    [exception RIP: io_serial_in+20]
    RIP: ffffffff89792594  RSP: ffffa655314979e8  RFLAGS: 00000002
    RAX: ffffffff89792500  RBX: ffffffff8af428a0  RCX: 0000000000000000
    RDX: 00000000000003fd  RSI: 0000000000000005  RDI: ffffffff8af428a0
    RBP: 0000000000002710   R8: 0000000000000004   R9: 000000000000000f
    R10: 0000000000000000  R11: ffffffff8acbf64f  R12: 0000000000000020
    R13: ffffffff8acbf698  R14: 0000000000000058  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #5 [ffffa655314979e8] io_serial_in at ffffffff89792594
 #6 [ffffa655314979e8] wait_for_xmitr at ffffffff89793470
 #7 [ffffa65531497a08] serial8250_console_putchar at ffffffff897934f6
 #8 [ffffa65531497a20] uart_console_write at ffffffff8978b605
 #9 [ffffa65531497a48] serial8250_console_write at ffffffff89796558
 #10 [ffffa65531497ac8] console_unlock at ffffffff89316124
 #11 [ffffa65531497b10] vprintk_emit at ffffffff89317c07
 #12 [ffffa65531497b68] printk at ffffffff89318306
 #13 [ffffa65531497bc8] print_hex_dump at ffffffff89650765
 #14 [ffffa65531497ca8] tun_do_read at ffffffffc0b06c27 [tun]
 #15 [ffffa65531497d38] tun_recvmsg at ffffffffc0b06e34 [tun]
 #16 [ffffa65531497d68] handle_rx at ffffffffc0c5d682 [vhost_net]
 #17 [ffffa65531497ed0] vhost_worker at ffffffffc0c644dc [vhost]
 #18 [ffffa65531497f10] kthread at ffffffff892d2e72
 #19 [ffffa65531497f50] ret_from_fork at ffffffff89c0022f

Fixes: ef3db4a ("tun: avoid BUG, dump packet on GSO errors")
Signed-off-by: Lei Chen <[email protected]>
Reviewed-by: Willem de Bruijn <[email protected]>
Acked-by: Jason Wang <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
@sysopenci sysopenci added the Stale Stale label for inactive open prs label Sep 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Stale Stale label for inactive open prs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants