scsi: megaraid_sas: Fix deadlock on firmware crashdump

commit 0b0747d507bffb827e40fc0f9fb5883fffc23477 upstream.
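
The following processes ran into a deadlock. CPU 41 called vfree() while
holding the spinlock "crashdump_lock", and the resulting TLB flush
(flush_tlb_kernel_range() -> on_each_cpu()) waited for CPU 29 to handle a
CSD (call_single_data) request. CPU 29, however, was spinning on that same
lock with IRQs disabled, so it could never service the request.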

PID: 17360  TASK: ffff95c1090c5c40  CPU: 41  COMMAND: "mrdiagd"
 !# 0 [ffffb80edbf37b58] __read_once_size at ffffffff9b871a40 include/linux/compiler.h:185:0
 !# 1 [ffffb80edbf37b58] atomic_read at ffffffff9b871a40 arch/x86/include/asm/atomic.h:27:0
 !# 2 [ffffb80edbf37b58] dump_stack at ffffffff9b871a40 lib/dump_stack.c:54:0
  # 3 [ffffb80edbf37b78] csd_lock_wait_toolong at ffffffff9b131ad5 kernel/smp.c:364:0
  # 4 [ffffb80edbf37b78] __csd_lock_wait at ffffffff9b131ad5 kernel/smp.c:384:0
  # 5 [ffffb80edbf37bf8] csd_lock_wait at ffffffff9b13267a kernel/smp.c:394:0
  # 6 [ffffb80edbf37bf8] smp_call_function_many at ffffffff9b13267a kernel/smp.c:843:0
  # 7 [ffffb80edbf37c50] smp_call_function at ffffffff9b13279d kernel/smp.c:867:0
  # 8 [ffffb80edbf37c50] on_each_cpu at ffffffff9b13279d kernel/smp.c:976:0
  # 9 [ffffb80edbf37c78] flush_tlb_kernel_range at ffffffff9b085c4b arch/x86/mm/tlb.c:742:0
  #10 [ffffb80edbf37cb8] __purge_vmap_area_lazy at ffffffff9b23a1e0 mm/vmalloc.c:701:0
  #11 [ffffb80edbf37ce0] try_purge_vmap_area_lazy at ffffffff9b23a2cc mm/vmalloc.c:722:0
  #12 [ffffb80edbf37ce0] free_vmap_area_noflush at ffffffff9b23a2cc mm/vmalloc.c:754:0
  #13 [ffffb80edbf37cf8] free_unmap_vmap_area at ffffffff9b23bb3b mm/vmalloc.c:764:0
  #14 [ffffb80edbf37cf8] remove_vm_area at ffffffff9b23bb3b mm/vmalloc.c:1509:0
  #15 [ffffb80edbf37d18] __vunmap at ffffffff9b23bb8a mm/vmalloc.c:1537:0
  #16 [ffffb80edbf37d40] vfree at ffffffff9b23bc85 mm/vmalloc.c:1612:0
  #17 [ffffb80edbf37d58] megasas_free_host_crash_buffer [megaraid_sas] at ffffffffc020b7f2 drivers/scsi/megaraid/megaraid_sas_fusion.c:3932:0
  #18 [ffffb80edbf37d80] fw_crash_state_store [megaraid_sas] at ffffffffc01f804d drivers/scsi/megaraid/megaraid_sas_base.c:3291:0
  #19 [ffffb80edbf37dc0] dev_attr_store at ffffffff9b56dd7b drivers/base/core.c:758:0
  #20 [ffffb80edbf37dd0] sysfs_kf_write at ffffffff9b326acf fs/sysfs/file.c:144:0
  #21 [ffffb80edbf37de0] kernfs_fop_write at ffffffff9b325fd4 fs/kernfs/file.c:316:0
  #22 [ffffb80edbf37e20] __vfs_write at ffffffff9b29418a fs/read_write.c:480:0
  #23 [ffffb80edbf37ea8] vfs_write at ffffffff9b294462 fs/read_write.c:544:0
  #24 [ffffb80edbf37ee8] SYSC_write at ffffffff9b2946ec fs/read_write.c:590:0
  #25 [ffffb80edbf37ee8] SyS_write at ffffffff9b2946ec fs/read_write.c:582:0
  #26 [ffffb80edbf37f30] do_syscall_64 at ffffffff9b003ca9 arch/x86/entry/common.c:298:0
  #27 [ffffb80edbf37f58] entry_SYSCALL_64 at ffffffff9ba001b1 arch/x86/entry/entry_64.S:238:0

PID: 17355  TASK: ffff95c1090c3d80  CPU: 29  COMMAND: "mrdiagd"
 !# 0 [ffffb80f2d3c7d30] __read_once_size at ffffffff9b0f2ab0 include/linux/compiler.h:185:0
 !# 1 [ffffb80f2d3c7d30] native_queued_spin_lock_slowpath at ffffffff9b0f2ab0 kernel/locking/qspinlock.c:368:0
  # 2 [ffffb80f2d3c7d58] pv_queued_spin_lock_slowpath at ffffffff9b0f244b arch/x86/include/asm/paravirt.h:674:0
  # 3 [ffffb80f2d3c7d58] queued_spin_lock_slowpath at ffffffff9b0f244b arch/x86/include/asm/qspinlock.h:53:0
  # 4 [ffffb80f2d3c7d68] queued_spin_lock at ffffffff9b8961a6 include/asm-generic/qspinlock.h:90:0
  # 5 [ffffb80f2d3c7d68] do_raw_spin_lock_flags at ffffffff9b8961a6 include/linux/spinlock.h:173:0
  # 6 [ffffb80f2d3c7d68] __raw_spin_lock_irqsave at ffffffff9b8961a6 include/linux/spinlock_api_smp.h:122:0
  # 7 [ffffb80f2d3c7d68] _raw_spin_lock_irqsave at ffffffff9b8961a6 kernel/locking/spinlock.c:160:0
  # 8 [ffffb80f2d3c7d88] fw_crash_buffer_store [megaraid_sas] at ffffffffc01f8129 drivers/scsi/megaraid/megaraid_sas_base.c:3205:0
  # 9 [ffffb80f2d3c7dc0] dev_attr_store at ffffffff9b56dd7b drivers/base/core.c:758:0
  #10 [ffffb80f2d3c7dd0] sysfs_kf_write at ffffffff9b326acf fs/sysfs/file.c:144:0
  #11 [ffffb80f2d3c7de0] kernfs_fop_write at ffffffff9b325fd4 fs/kernfs/file.c:316:0
  #12 [ffffb80f2d3c7e20] __vfs_write at ffffffff9b29418a fs/read_write.c:480:0
  #13 [ffffb80f2d3c7ea8] vfs_write at ffffffff9b294462 fs/read_write.c:544:0
  #14 [ffffb80f2d3c7ee8] SYSC_write at ffffffff9b2946ec fs/read_write.c:590:0
  #15 [ffffb80f2d3c7ee8] SyS_write at ffffffff9b2946ec fs/read_write.c:582:0
  #16 [ffffb80f2d3c7f30] do_syscall_64 at ffffffff9b003ca9 arch/x86/entry/common.c:298:0
  #17 [ffffb80f2d3c7f58] entry_SYSCALL_64 at ffffffff9ba001b1 arch/x86/entry/entry_64.S:238:0
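
The lock only serializes the driver's sysfs handlers (such as
fw_crash_state_store() and fw_crash_buffer_store() above); it does not
protect any resource that is accessed from interrupt context. Consequently,
there is no need to disable IRQs. Replace the spinlock with a mutex to fix
the deadlock: a task waiting for a mutex keeps IRQs enabled, so its CPU can
still service the TLB-flush IPI.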
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Link: https://lore.kernel.org/r/20230828221018.19471-1-junxiao.bi@oracle.com
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>