Linus Torvalds [Sun, 30 Oct 2022 16:44:06 +0000 (09:44 -0700)]
Merge tag 'loongarch-fixes-6.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
Pull LoongArch fixes from Huacai Chen:
"Remove unused kernel stack padding, fix some build errors/warnings and
two bugs in laptop platform driver"
* tag 'loongarch-fixes-6.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
platform/loongarch: laptop: Fix possible UAF and simplify generic_acpi_laptop_init()
platform/loongarch: laptop: Adjust resume order for loongson_hotkey_resume()
LoongArch: BPF: Avoid declare variables in switch-case
LoongArch: Use flexible-array member instead of zero-length array
LoongArch: Remove unused kernel stack padding
Linus Torvalds [Sun, 30 Oct 2022 16:40:04 +0000 (09:40 -0700)]
Merge tag '6.1-rc2-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
- use after free fix for reconnect race
- two memory leak fixes
* tag '6.1-rc2-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: fix use-after-free caused by invalid pointer `hostname`
cifs: Fix pages leak when writedata alloc failed in cifs_write_from_iter()
cifs: Fix pages array leak when writedata alloc failed in cifs_writedata_alloc()
Linus Torvalds [Sun, 30 Oct 2022 01:33:03 +0000 (18:33 -0700)]
Merge tag 'random-6.1-rc3-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random
Pull random number generator fix from Jason Donenfeld:
"One fix from Jean-Philippe Brucker, addressing a regression in which
early boot code on ARM64 would use the non-_early variant of the
arch_get_random family of functions, resulting in the architectural
random number generator appearing unavailable during that early phase
of boot.
The fix simply changes arch_get_random*() to arch_get_random*_early().
This distinction between these two functions is a bit of an old wart
I'm not a fan of, and for 6.2 I'll see if I can make obsolete the
_early variant, so that one function does the right thing in all
contexts without overhead"
* tag 'random-6.1-rc3-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random:
random: use arch_get_random*_early() in random_init()
Linus Torvalds [Sun, 30 Oct 2022 01:12:45 +0000 (18:12 -0700)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Varions small fixes, all in drivers.
Some of these arrived during the merge window and got held over to
make sure of testing on the -rc tree.
The biggest change is for standards conformance in the target driver,
closely followed by a set of bug fixes in megaraid_sas"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (21 commits)
scsi: ufs: core: Fix typo in comment
scsi: mpi3mr: Select CONFIG_SCSI_SAS_ATTRS
scsi: ufs: core: Fix typo for register name in comments
scsi: pm80xx: Display proc_name in sysfs
scsi: ufs: core: Fix the error log in ufshcd_query_flag_retry()
scsi: ufs: core: Remove unneeded casts from void *
scsi: lpfc: Fix spelling mistake "unsolicted" -> "unsolicited"
scsi: qla2xxx: Use transport-defined speed mask for supported_speeds
scsi: target: iblock: Fold iblock_emulate_read_cap_with_block_size() into iblock_get_blocks()
scsi: qla2xxx: Fix serialization of DCBX TLV data request
scsi: ufs: qcom: Remove redundant dev_err() call
scsi: megaraid_sas: Move megasas_dbg_lvl init to megasas_init()
scsi: megaraid_sas: Remove unnecessary memset()
scsi: megaraid_sas: Simplify megasas_update_device_list
scsi: megaraid_sas: Correct an error message
scsi: megaraid_sas: Correct value passed to scsi_device_lookup()
scsi: target: core: UA on all LUNs after reset
scsi: target: core: New key must be used for moved PR
scsi: target: core: Abort all preempted regs if requested
scsi: target: core: Fix memory leak in preempt_and_abort
...
Linus Torvalds [Sun, 30 Oct 2022 01:06:52 +0000 (18:06 -0700)]
Merge tag 'block-6.1-2022-10-28' of git://git.kernel.dk/linux
Pull block fixes from Jens Axboe:
- NVMe pull request via Christoph:
- make the multipath dma alignment match the non-multipath one
(Keith Busch)
- fix a bogus use of sg_init_marker() (Nam Cao)
- fix circulr locking in nvme-tcp (Sagi Grimberg)
- Initialization fix for requests allocated via the special hw queue
allocator (John)
- Fix for a regression added in this release with the batched
completions of end_io backed requests (Ming)
- Error handling leak fix for rbd (Yang)
- Error handling leak fix for add_disk() failure (Yu)
* tag 'block-6.1-2022-10-28' of git://git.kernel.dk/linux:
blk-mq: Properly init requests from blk_mq_alloc_request_hctx()
blk-mq: don't add non-pt request with ->end_io to batch
rbd: fix possible memory leak in rbd_sysfs_init()
nvme-multipath: set queue dma alignment to 3
nvme-tcp: fix possible circular locking when deleting a controller under memory pressure
nvme-tcp: replace sg_init_marker() with sg_init_table()
block: fix memory leak for elevator on add_disk failure
Linus Torvalds [Sun, 30 Oct 2022 01:01:16 +0000 (18:01 -0700)]
Merge tag 'io_uring-6.1-2022-10-28' of git://git.kernel.dk/linux
Pull io_uring fix from Jens Axboe:
"Just a fix for a locking regression introduced with the deferred
task_work running from this merge window"
* tag 'io_uring-6.1-2022-10-28' of git://git.kernel.dk/linux:
io_uring: unlock if __io_run_local_work locked inside
io_uring: use io_run_local_work_locked helper
Linus Torvalds [Sun, 30 Oct 2022 00:49:33 +0000 (17:49 -0700)]
Merge tag 'mm-hotfixes-stable-2022-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc hotfixes from Andrew Morton:
"Eight fix pre-6.0 bugs and the remainder address issues which were
introduced in the 6.1-rc merge cycle, or address issues which aren't
considered sufficiently serious to warrant a -stable backport"
* tag 'mm-hotfixes-stable-2022-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (23 commits)
mm: multi-gen LRU: move lru_gen_add_mm() out of IRQ-off region
lib: maple_tree: remove unneeded initialization in mtree_range_walk()
mmap: fix remap_file_pages() regression
mm/shmem: ensure proper fallback if page faults
mm/userfaultfd: replace kmap/kmap_atomic() with kmap_local_page()
x86: fortify: kmsan: fix KMSAN fortify builds
x86: asm: make sure __put_user_size() evaluates pointer once
Kconfig.debug: disable CONFIG_FRAME_WARN for KMSAN by default
x86/purgatory: disable KMSAN instrumentation
mm: kmsan: export kmsan_copy_page_meta()
mm: migrate: fix return value if all subpages of THPs are migrated successfully
mm/uffd: fix vma check on userfault for wp
mm: prep_compound_tail() clear page->private
mm,madvise,hugetlb: fix unexpected data loss with MADV_DONTNEED on hugetlbfs
mm/page_isolation: fix clang deadcode warning
fs/ext4/super.c: remove unused `deprecated_msg'
ipc/msg.c: fix percpu_counter use after free
memory tier, sysfs: rename attribute "nodes" to "nodelist"
MAINTAINERS: git://github.com -> https://github.com for nilfs2
mm/kmemleak: prevent soft lockup in kmemleak_scan()'s object iteration loops
...
Linus Torvalds [Sat, 29 Oct 2022 17:35:17 +0000 (10:35 -0700)]
Merge tag 'powerpc-6.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Fix a case of rescheduling with user access unlocked, when preempt is
enabled.
- A follow-up fix for a recent fix, which could lead to IRQ state
assertions firing incorrectly.
- Two fixes for lockdep warnings seen when using kfence with the Hash
MMU.
- Two fixes for preempt warnings seen when using the Hash MMU.
- Two fixes for the VAS coprocessor mechanism used on pseries.
- Prevent building some of our older KVM backends when
CONTEXT_TRACKING_USER is enabled, as it's known to cause crashes.
- A couple of fixes for issues seen with PMU NMIs.
Thanks to Nicholas Piggin, Guenter Roeck, Frederic Barrat Haren Myneni,
Sachin Sant, and Samuel Holland.
* tag 'powerpc-6.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/64s/interrupt: Fix clear of PACA_IRQS_HARD_DIS when returning to soft-masked context
powerpc/64s/interrupt: Perf NMI should not take normal exit path
powerpc/64/interrupt: Prevent NMI PMI causing a dangerous warning
KVM: PPC: BookS PR-KVM and BookE do not support context tracking
powerpc: Fix reschedule bug in KUAP-unlocked user copy
powerpc/64s: Fix hash__change_memory_range preemption warning
powerpc/64s: Disable preemption in hash lazy mmu mode
powerpc/64s: make linear_map_hash_lock a raw spinlock
powerpc/64s: make HPTE lock and native_tlbie_lock irq-safe
powerpc/64s: Add lockdep for HPTE lock
powerpc/pseries: Use lparcfg to reconfig VAS windows for DLPAR CPU
powerpc/pseries/vas: Add VAS IRQ primary handler
Yang Yingliang [Sat, 29 Oct 2022 08:29:31 +0000 (16:29 +0800)]
platform/loongarch: laptop: Fix possible UAF and simplify generic_acpi_laptop_init()
Currently the return value of 'sub_driver->init' is not checked. If
sparse_keymap_setup() called in the init function fails, 'generic_
inputdev' is freed, then it will lead a UAF when using it in generic_
acpi_laptop_init(). Fix it by checking the return value and setting
generic_inputdev to NULL after free, so as to avoid double free it.
The error code in generic_subdriver_init() is always negative, so the
return of generic_subdriver_init() can be simplified.
Huacai Chen [Sat, 29 Oct 2022 08:29:31 +0000 (16:29 +0800)]
platform/loongarch: laptop: Adjust resume order for loongson_hotkey_resume()
Some laptops don't support SW_LID, but still have backlight control,
move backlight resuming before SW_LID event handling so as to avoid
backlight mistake due to early return.
Huacai Chen [Sat, 29 Oct 2022 08:29:31 +0000 (16:29 +0800)]
LoongArch: BPF: Avoid declare variables in switch-case
Not all compilers support declare variables in switch-case, so move
declarations to the beginning of a function. Otherwise we may get such
build errors:
arch/loongarch/net/bpf_jit.c: In function ‘emit_atomic’:
arch/loongarch/net/bpf_jit.c:362:3: error: a label can only be part of a statement and a declaration is not a statement
u8 r0 = regmap[BPF_REG_0];
^~
arch/loongarch/net/bpf_jit.c: In function ‘build_insn’:
arch/loongarch/net/bpf_jit.c:727:3: error: a label can only be part of a statement and a declaration is not a statement
u8 t7 = -1;
^~
arch/loongarch/net/bpf_jit.c:778:3: error: a label can only be part of a statement and a declaration is not a statement
int ret;
^~~
arch/loongarch/net/bpf_jit.c:779:3: error: expected expression before ‘u64’
u64 func_addr;
^~~
arch/loongarch/net/bpf_jit.c:780:3: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
bool func_addr_fixed;
^~~~
arch/loongarch/net/bpf_jit.c:784:11: error: ‘func_addr’ undeclared (first use in this function); did you mean ‘in_addr’?
&func_addr, &func_addr_fixed);
^~~~~~~~~
in_addr
arch/loongarch/net/bpf_jit.c:784:11: note: each undeclared identifier is reported only once for each function it appears in
arch/loongarch/net/bpf_jit.c:814:3: error: a label can only be part of a statement and a declaration is not a statement
u64 imm64 = (u64)(insn + 1)->imm << 32 | (u32)insn->imm;
^~~
Jinyang He [Sat, 29 Oct 2022 08:29:31 +0000 (16:29 +0800)]
LoongArch: Remove unused kernel stack padding
The current LoongArch kernel stack is padded as if obeying the MIPS o32
calling convention (32 bytes), signifying the port's MIPS lineage but no
longer making sense. Remove the padding for clarity.
Reviewed-by: WANG Xuerui <git@xen0n.name> Signed-off-by: Jinyang He <hejinyang@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Linus Torvalds [Sat, 29 Oct 2022 00:03:00 +0000 (17:03 -0700)]
Merge tag 'riscv-for-linus-6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
- A fix for a build warning in the jump_label code
- One of the git://github -> https://github cleanups, for the SiFive
drivers
- A fix for the kasan initialization code, this still likely warrants
some cleanups but that's a bigger problem and at least this fixes the
crashes in the short term
- A pair of fixes for extension support detection on mixed LLVM/GNU
toolchains
- A fix for a runtime warning in the /proc/cpuinfo code
* tag 'riscv-for-linus-6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
RISC-V: Fix /proc/cpuinfo cpumask warning
riscv: fix detection of toolchain Zihintpause support
riscv: fix detection of toolchain Zicbom support
riscv: mm: add missing memcpy in kasan_init
MAINTAINERS: git://github.com -> https://github.com for sifive
riscv: jump_label: mark arguments as const to satisfy asm constraints
Linus Torvalds [Fri, 28 Oct 2022 23:48:29 +0000 (16:48 -0700)]
Merge tag 'acpi-6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI and device properties fixes from Rafael Wysocki:
"These fix device properties documentation and the ACPI PCC code, add a
new IRQ override quirk for resource handling and add one more item to
the list of device IDs to be ignored when returned by _DEP.
Specifics:
- Fix the documentation of the *_match_string() family of functions
to properly cover the return value (Andy Shevchenko)
- Fix a possible integer overflow during multiplication in the ACPI
PCC code (Manank Patel)
- Make the ACPI device resources code skip IRQ override on Asus
Vivobook S5602ZA (Tamim Khan)
- Add LATT2021 to the list of device IDs that are ignored when
returned by _DEP, because there are no drivers for them in the
kernel and no plans to add such drivers (Hans de Goede)"
* tag 'acpi-6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: scan: Add LATT2021 to acpi_ignore_dep_ids[]
ACPI: resource: Skip IRQ override on Asus Vivobook S5602ZA
ACPI: PCC: Fix unintentional integer overflow
device property: Fix documentation for *_match_string() APIs
Linus Torvalds [Fri, 28 Oct 2022 23:44:12 +0000 (16:44 -0700)]
Merge tag 'pm-6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These make the intel_pstate driver work as expected on all hybrid
platforms to date (regardless of possible platform firmware issues),
fix hybrid sleep on systems using suspend-to-idle by default, make the
generic power domains code handle disabled idle states properly and
update pm-graph.
Specifics:
- Make intel_pstate use what is known about the hardware instead of
relying on information from the platform firmware (ACPI CPPC in
particular) to establish the relationship between the HWP CPU
performance levels and frequencies on all hybrid platforms
available to date (Rafael Wysocki)
- Allow hybrid sleep to use suspend-to-idle as a system suspend
method if it is the current suspend method of choice (Mario
Limonciello)
- Fix handling of unavailable/disabled idle states in the generic
power domains code (Sudeep Holla)
- Update the pm-graph suite of utilities to version 5.10 which is
fixes-mostly and does not add any new features (Todd Brandt)"
* tag 'pm-6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM: domains: Fix handling of unavailable/disabled idle states
pm-graph v5.10
cpufreq: intel_pstate: hybrid: Use known scaling factor for P-cores
cpufreq: intel_pstate: Read all MSRs on the target CPU
PM: hibernate: Allow hybrid sleep to work with s2idle
random: use arch_get_random*_early() in random_init()
While reworking the archrandom handling, commit d349ab99eec7 ("random:
handle archrandom with multiple longs") switched to the non-early
archrandom helpers in random_init(), which broke initialization of the
entropy pool from the arm64 random generator.
Indeed at that point the arm64 CPU features, which verify that all CPUs
have compatible capabilities, are not finalized so arch_get_random_seed_longs()
is unsuccessful. Instead random_init() should use the _early functions,
which check only the boot CPU on arm64. On other architectures the
_early functions directly call the normal ones.
Fixes: d349ab99eec7 ("random: handle archrandom with multiple longs") Cc: stable@vger.kernel.org Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
mm: multi-gen LRU: move lru_gen_add_mm() out of IRQ-off region
lru_gen_add_mm() has been added within an IRQ-off region in the commit
mentioned below. The other invocations of lru_gen_add_mm() are not within
an IRQ-off region.
The invocation within IRQ-off region is problematic on PREEMPT_RT because
the function is using a spin_lock_t which must not be used within
IRQ-disabled regions.
The other invocations of lru_gen_add_mm() occur while
task_struct::alloc_lock is acquired. Move lru_gen_add_mm() after
interrupts are enabled and before task_unlock().
Link: https://lkml.kernel.org/r/20221026134830.711887-1-bigeasy@linutronix.de Fixes: bd74fdaea1460 ("mm: multi-gen LRU: support page table walks") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Yu Zhao <yuzhao@google.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: "Eric W . Biederman" <ebiederm@xmission.com> Cc: Kees Cook <keescook@chromium.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Lukas Bulwahn [Wed, 26 Oct 2022 12:00:29 +0000 (14:00 +0200)]
lib: maple_tree: remove unneeded initialization in mtree_range_walk()
Before the do-while loop in mtree_range_walk(), the variables next, min,
max need to be initialized. The variables last, prev_min and prev_max are
set within the loop body before they are eventually used after exiting the
loop body.
As it is a do-while loop, the loop body is executed at least once, so the
variables last, prev_min and prev_max do not need to be initialized before
the loop body.
Remove unneeded initialization of last and prev_min.
The needless initialization was reported by clang-analyzer as Dead Stores.
As the compiler already identifies these assignments as unneeded, it
optimizes the assignments away. Hence:
Liam Howlett [Tue, 25 Oct 2022 16:12:49 +0000 (16:12 +0000)]
mmap: fix remap_file_pages() regression
When using the VMA iterator, the final execution will set the variable
'next' to NULL which causes the function to fail out. Restore the break
in the loop to exit the VMA iterator early without clearing NULL fixes the
issue.
Ira Weiny [Tue, 25 Oct 2022 22:01:08 +0000 (15:01 -0700)]
mm/shmem: ensure proper fallback if page faults
The kernel test robot flagged a recursive lock as a result of a conversion
from kmap_atomic() to kmap_local_folio()[Link]
The cause was due to the code depending on the kmap_atomic() side effect
of disabling page faults. In that case the code expects the fault to fail
and take the fallback case.
git archaeology implied that the recursion may not be an actual bug.[1]
However, depending on the implementation of the mmap_lock and the
condition of the call there may still be a deadlock.[2] So this is not
purely a lockdep issue. Considering a single threaded call stack there
are 3 options.
1) Different mm's are in play (no issue)
2) Readlock implementation is recursive and same mm is in play
(no issue)
3) Readlock implementation is _not_ recursive (issue)
The mmap_lock is recursive so with a single thread there is no issue.
However, Matthew pointed out a deadlock scenario when you consider
additional process' and threads thusly.
"The readlock implementation is only recursive if nobody else has taken a
write lock. If you have a multithreaded process, one of the other threads
can call mmap() and that will prevent recursion (due to fairness). Even
if it's a different process that you're trying to acquire the mmap read
lock on, you can still get into a deadly embrace. eg:
process A thread 1 takes read lock on own mmap_lock
process A thread 2 calls mmap, blocks taking write lock
process B thread 1 takes page fault, read lock on own mmap lock
process B thread 2 calls mmap, blocks taking write lock
process A thread 1 blocks taking read lock on process B
process B thread 1 blocks taking read lock on process A
Now all four threads are blocked waiting for each other."
Regardless using pagefault_disable() ensures that no matter what locking
implementation is used a deadlock will not occur. Add an explicit
pagefault_disable() and a big comment to explain this for future souls
looking at this code.
Link: https://lkml.kernel.org/r/20221025220108.2366043-1-ira.weiny@intel.com Link: https://lore.kernel.org/r/202210211215.9dc6efb5-yujie.liu@intel.com Fixes: 7a7256d5f512 ("shmem: convert shmem_mfill_atomic_pte() to use a folio") Signed-off-by: Ira Weiny <ira.weiny@intel.com> Reported-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reported-by: kernel test robot <yujie.liu@intel.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Peter Xu <peterx@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Ira Weiny [Mon, 24 Oct 2022 04:34:52 +0000 (21:34 -0700)]
mm/userfaultfd: replace kmap/kmap_atomic() with kmap_local_page()
kmap() and kmap_atomic() are being deprecated in favor of
kmap_local_page() which is appropriate for any thread local context.[1]
A recent locking bug report with userfaultfd showed that the conversion of
the kmap_atomic()'s in those code flows requires care with regard to the
prevention of deadlock.[2]
git archaeology implied that the recursion may not be an actual bug.[3]
However, depending on the implementation of the mmap_lock and the
condition of the call there may still be a deadlock.[4] So this is not
purely a lockdep issue. Considering a single threaded call stack there
are 3 options.
1) Different mm's are in play (no issue)
2) Readlock implementation is recursive and same mm is in play
(no issue)
3) Readlock implementation is _not_ recursive (issue)
The mmap_lock is recursive so with a single thread there is no issue.
However, Matthew pointed out a deadlock scenario when you consider
additional process' and threads thusly.
"The readlock implementation is only recursive if nobody else has taken a
write lock. If you have a multithreaded process, one of the other threads
can call mmap() and that will prevent recursion (due to fairness). Even
if it's a different process that you're trying to acquire the mmap read
lock on, you can still get into a deadly embrace. eg:
process A thread 1 takes read lock on own mmap_lock
process A thread 2 calls mmap, blocks taking write lock
process B thread 1 takes page fault, read lock on own mmap lock
process B thread 2 calls mmap, blocks taking write lock
process A thread 1 blocks taking read lock on process B
process B thread 1 blocks taking read lock on process A
Now all four threads are blocked waiting for each other."
Regardless using pagefault_disable() ensures that no matter what locking
implementation is used a deadlock will not occur.
Complete kmap conversion in userfaultfd by replacing the kmap() and
kmap_atomic() calls with kmap_local_page(). When replacing the
kmap_atomic() call ensure page faults continue to be disabled to support
the correct fall back behavior and add a comment to inform future souls of
the requirement.
Ensure that KMSAN builds replace memset/memcpy/memmove calls with the
respective __msan_XXX functions, and that none of the macros are redefined
twice. This should allow building kernel with both CONFIG_KMSAN and
CONFIG_FORTIFY_SOURCE.
x86: asm: make sure __put_user_size() evaluates pointer once
User access macros must ensure their arguments are evaluated only once if
they are used more than once in the macro body. Adding
instrument_put_user() to __put_user_size() resulted in double evaluation
of the `ptr` argument, which led to correctness issues when performing
e.g. unsafe_put_user(..., p++, ...).
To fix those issues, evaluate the `ptr` argument of __put_user_size() at
the beginning of the macro.
Link: https://lkml.kernel.org/r/20221024212144.2852069-4-glider@google.com Fixes: 888f84a6da4d ("x86: asm: instrument usercopy in get_user() and put_user()") Signed-off-by: Alexander Potapenko <glider@google.com> Reported-by: youling257 <youling257@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Kconfig.debug: disable CONFIG_FRAME_WARN for KMSAN by default
KMSAN adds a lot of instrumentation to the code, which results in
increased stack usage (up to 2048 bytes and more in some cases). It's
hard to predict how big the stack frames can be, so we disable the
warnings for KMSAN instead.
Baolin Wang [Mon, 24 Oct 2022 08:34:21 +0000 (16:34 +0800)]
mm: migrate: fix return value if all subpages of THPs are migrated successfully
During THP migration, if THPs are not migrated but they are split and all
subpages are migrated successfully, migrate_pages() will still return the
number of THP pages that were not migrated. This will confuse the callers
of migrate_pages(). For example, the longterm pinning will failed though
all pages are migrated successfully.
Thus we should return 0 to indicate that all pages are migrated in this
case
Link: https://lkml.kernel.org/r/de386aa864be9158d2f3b344091419ea7c38b2f7.1666599848.git.baolin.wang@linux.alibaba.com Fixes: b5bade978e9b ("mm: migrate: fix the return value of migrate_pages()") Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Alistair Popple <apopple@nvidia.com> Reviewed-by: Yang Shi <shy828301@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Zi Yan <ziy@nvidia.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
I just got time to revisit this and found that the root cause is we simply
messed up with the vma check, so that for !PTE_MARKER_UFFD_WP system, we
will allow UFFDIO_REGISTER of MINOR & WP upon shmem as the check was
wrong:
if (vm_flags & VM_UFFD_MINOR)
return is_vm_hugetlb_page(vma) || vma_is_shmem(vma);
Where we'll allow anything to pass on shmem as long as minor mode is
requested.
Axel did it right when introducing minor mode but I messed it up in b1f9e876862d when moving code around. Fix it.
Hugh Dickins [Sat, 22 Oct 2022 07:51:06 +0000 (00:51 -0700)]
mm: prep_compound_tail() clear page->private
Although page allocation always clears page->private in the first page or
head page of an allocation, it has never made a point of clearing
page->private in the tails (though 0 is often what is already there).
But now commit 71e2d666ef85 ("mm/huge_memory: do not clobber swp_entry_t
during THP split") issues a warning when page_tail->private is found to be
non-0 (unless it's swapcache).
We could just delete the warning, but today's consensus appears to want
page->private to be 0, unless there's a good reason for it to be set: so
now clear it in prep_compound_tail() (more general than just for THP; but
not for high order allocation, which makes no pass down the tails).
Link: https://lkml.kernel.org/r/1c4233bb-4e4d-5969-fbd4-96604268a285@google.com Fixes: 71e2d666ef85 ("mm/huge_memory: do not clobber swp_entry_t during THP split") Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Rik van Riel [Fri, 21 Oct 2022 23:28:05 +0000 (19:28 -0400)]
mm,madvise,hugetlb: fix unexpected data loss with MADV_DONTNEED on hugetlbfs
A common use case for hugetlbfs is for the application to create
memory pools backed by huge pages, which then get handed over to
some malloc library (eg. jemalloc) for further management.
That malloc library may be doing MADV_DONTNEED calls on memory
that is no longer needed, expecting those calls to happen on
PAGE_SIZE boundaries.
However, currently the MADV_DONTNEED code rounds up any such
requests to HPAGE_PMD_SIZE boundaries. This leads to undesired
outcomes when jemalloc expects a 4kB MADV_DONTNEED, but 2MB of
memory get zeroed out, instead.
Use of pre-built shared libraries means that user code does not
always know the page size of every memory arena in use.
Avoid unexpected data loss with MADV_DONTNEED by rounding up
only to PAGE_SIZE (in do_madvise), and rounding down to huge
page granularity.
That way programs will only get as much memory zeroed out as
they requested.
Link: https://lkml.kernel.org/r/20221021192805.366ad573@imladris.surriel.com Fixes: 90e7e7f5ef3f ("mm: enable MADV_DONTNEED for hugetlb mappings") Signed-off-by: Rik van Riel <riel@surriel.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: David Hildenbrand <david@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Maria Yu [Fri, 21 Oct 2022 10:15:55 +0000 (18:15 +0800)]
mm/page_isolation: fix clang deadcode warning
When !CONFIG_VM_BUG_ON, there is warning of
clang-analyzer-deadcode.DeadStores:
Value stored to 'mt' during its initialization is never read.
Link: https://lkml.kernel.org/r/20221021101555.7992-2-quic_aiquny@quicinc.com Signed-off-by: Maria Yu <quic_aiquny@quicinc.com> Cc: David Hildenbrand <david@redhat.com> Cc: Doug Berger <opendmb@gmail.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Matthew Wilcox <willy@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Andrew Morton [Fri, 21 Oct 2022 04:19:22 +0000 (21:19 -0700)]
ipc/msg.c: fix percpu_counter use after free
These percpu counters are referenced in free_ipcs->freeque, so destroy
them later.
Fixes: 72d1e611082e ("ipc/msg: mitigate the lock contention with percpu counter") Reported-by: syzbot+96e659d35b9d6b541152@syzkaller.appspotmail.com Tested-by: Mark Rutland <mark.rutland@arm.com> Cc: Jiebin Sun <jiebin.sun@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Waiman Long [Thu, 20 Oct 2022 17:56:19 +0000 (13:56 -0400)]
mm/kmemleak: prevent soft lockup in kmemleak_scan()'s object iteration loops
Commit 6edda04ccc7c ("mm/kmemleak: prevent soft lockup in first object
iteration loop of kmemleak_scan()") adds cond_resched() in the first
object iteration loop of kmemleak_scan(). However, it turns that the 2nd
objection iteration loop can still cause soft lockup to happen in some
cases. So add a cond_resched() call in the 2nd and 3rd loops as well to
prevent that and for completeness.
Link: https://lkml.kernel.org/r/20221020175619.366317-1-longman@redhat.com Fixes: 6edda04ccc7c ("mm/kmemleak: prevent soft lockup in first object iteration loop of kmemleak_scan()") Signed-off-by: Waiman Long <longman@redhat.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Phillip Lougher [Thu, 20 Oct 2022 22:36:15 +0000 (23:36 +0100)]
squashfs: fix extending readahead beyond end of file
The readahead code will try to extend readahead to the entire size of the
Squashfs data block.
But, it didn't take into account that the last block at the end of the
file may not be a whole block. In this case, the code would extend
readahead to beyond the end of the file, leaving trailing pages.
Fix this by only requesting the expected number of pages.
Phillip Lougher [Thu, 20 Oct 2022 22:36:14 +0000 (23:36 +0100)]
squashfs: fix read regression introduced in readahead code
Patch series "squashfs: fix some regressions introduced in the readahead
code".
This patchset fixes 3 regressions introduced by the recent readahead code
changes. The first regression is causing "snaps" to randomly fail after a
couple of hours or days, which how the regression came to light.
This patch (of 3):
If a file isn't a whole multiple of the page size, the last page will have
trailing bytes unfilled.
There was a mistake in the readahead code which did this. In particular
it incorrectly assumed that the last page in the readahead page array
(page[nr_pages - 1]) will always contain the last page in the block, which
if we're at file end, will be the page that needs to be zero filled.
But the readahead code may not return the last page in the block, which
means it is unmapped and will be skipped by the decompressors (a temporary
buffer used).
In this case the zero filling code will zero out the wrong page, leading
to data corruption.
Fix this by by extending the "page actor" to return the last page if
present, or NULL if a temporary buffer was used.
Linus Torvalds [Fri, 28 Oct 2022 20:24:57 +0000 (13:24 -0700)]
Merge tag 'rtc-6.1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux
Pull RTC fixes from Alexandre Belloni:
"Fix wakeup support that broke on multiple platforms"
* tag 'rtc-6.1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux:
rtc: cmos: fix build on non-ACPI platforms
rtc: cmos: Fix wake alarm breakage
Linus Torvalds [Fri, 28 Oct 2022 20:00:54 +0000 (13:00 -0700)]
Merge tag 'mmc-v6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Pull MMC fixes from Ulf Hansson:
"MMC core:
- Cancel recovery work on cleanup to avoid NULL pointer dereference
- Fix error path in the read/write error recovery path
- Fix kernel panic when remove non-standard SDIO card
- Fix WRITE_ZEROES handling for CQE
MMC host:
- sdhci_am654: Fixup Kconfig dependency for REGMAP_MMIO
- sdhci-esdhc-imx: Avoid warning of misconfigured bus-width
- sdhci-pci: Disable broken HS400 ES mode for ASUS BIOS on Jasper
Lake"
* tag 'mmc-v6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: sdhci_am654: 'select', not 'depends' REGMAP_MMIO
mmc: core: Fix WRITE_ZEROES CQE handling
mmc: core: Fix kernel panic when remove non-standard SDIO card
mmc: sdhci-pci-core: Disable ES for ASUS BIOS on Jasper Lake
mmc: sdhci-esdhc-imx: Propagate ESDHC_FLAG_HS400* only on 8bit bus
mmc: queue: Cancel recovery work on cleanup
mmc: block: Remove error check of hw_reset on reset
Linus Torvalds [Fri, 28 Oct 2022 19:20:31 +0000 (12:20 -0700)]
Merge tag 'sound-6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A collection of small fixes:
- fixes for regressions by the recent ALSA control hash usages
- fixes for UAF with del_timer() at removals in a few drivers
- char signedness fixes
- a few memory leak fixes in error paths
- device-specific fixes / quirks for Intel SOF, AMD, HD-audio,
USB-audio, and various ASoC codecs"
* tag 'sound-6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (50 commits)
ALSA: aoa: Fix I2S device accounting
ALSA: Use del_timer_sync() before freeing timer
ALSA: aoa: i2sbus: fix possible memory leak in i2sbus_add_dev()
ALSA: rme9652: use explicitly signed char
ALSA: au88x0: use explicitly signed char
ALSA: hda/realtek: Add another HP ZBook G9 model quirks
ALSA: usb-audio: Add quirks for M-Audio Fast Track C400/600
ASoC: SOF: Intel: hda-codec: fix possible memory leak in hda_codec_device_init()
ASoC: amd: yc: Add Lenovo Thinkbook 14+ 2022 21D0 to quirks table
ASoC: Intel: Skylake: fix possible memory leak in skl_codec_device_init()
ALSA: ac97: Use snd_ctl_rename() to rename a control
ALSA: ca0106: Use snd_ctl_rename() to rename a control
ALSA: emu10k1: Use snd_ctl_rename() to rename a control
ALSA: hda/realtek: Use snd_ctl_rename() to rename a control
ALSA: usb-audio: Use snd_ctl_rename() to rename a control
ALSA: control: add snd_ctl_rename()
ALSA: ac97: fix possible memory leak in snd_ac97_dev_register()
ASoC: SOF: Intel: pci-tgl: fix ADL-N descriptor
ASoC: qcom: lpass-cpu: Mark HDMI TX parity register as volatile
ASoC: amd: yc: Adding Lenovo ThinkBook 14 Gen 4+ ARA and Lenovo ThinkBook 16 Gen 4+ ARA to the Quirks List
...
Linus Torvalds [Fri, 28 Oct 2022 19:10:43 +0000 (12:10 -0700)]
Merge tag 'drm-fixes-2022-10-28' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Regularly scheduled fixes for drm, live from a Red Hat office for the
first time in a while.
The core has two fixes, one for scheduler leak and one for aperture
uninit read.
Otherwise a single bridge fix, and msm, amdgpu/kfd and i915 have a set
of fixes each.
sched:
- Stop leaking fences when killing a sched entity.
aperture:
- Avoid uninitialized read in aperture_remove_conflicting_pci_device()
bridge:
- Fix HPD on bridge/ps8640.
msm:
- Fix shrinker deadlock
- Fix crash during suspend after unbind
- Fix IRQ lifetime issues
- Fix potential memory corruption with too many bridges
- Fix memory corruption on GPU state capture
amdkfd:
- Fix possible memory leak
- Fix GC 10.x cache info reporting
i915:
- Extend Wa_1607297627 to Alderlake-P
- Keep PCI autosuspend control 'on' by default on all dGPU
- Reset frl trained flag before restarting FRL training"
* tag 'drm-fixes-2022-10-28' of git://anongit.freedesktop.org/drm/drm: (39 commits)
fbdev/core: Avoid uninitialized read in aperture_remove_conflicting_pci_device()
drm/amdgpu: disallow gfxoff until GC IP blocks complete s2idle resume
drm/scheduler: fix fence ref counting
drm/amd/display: Revert logic for plane modifiers
drm/amdkfd: correct the cache info for gfx1036
drm/amdkfd: update gfx1037 Lx cache setting
drm/amdgpu: skip mes self test for gc 11.0.3 in recover
drm/amd: Add IMU fw version to fw version queries
drm/amd/display: Don't return false if no stream
drm/amd/display: Remove wrong pipe control lock
drm/amd/pm: allow gfxoff on gc_11_0_3
drm/amdkfd: Fix memory leak in kfd_mem_dmamap_userptr()
drm/amdgpu: Remove ATC L2 access for MMHUB 2.1.x
drm/i915/dp: Reset frl trained flag before restarting FRL training
drm/i915/dgfx: Keep PCI autosuspend control 'on' by default on all dGPU
drm/i915: Extend Wa_1607297627 to Alderlake-P
drm/amdgpu: Adjust MES polling timeout for sriov
drm/amd/pm: update driver-if header for smu_v13_0_10
drm/amdgpu: fix pstate setting issue
drm/bridge: ps8640: Add back the 50 ms mystery delay after HPD
...
Linus Torvalds [Fri, 28 Oct 2022 16:53:30 +0000 (09:53 -0700)]
Merge tag 'v6.1-p3' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fix from Herbert Xu:
"Fix an alignment crash in x86/polyval"
* tag 'v6.1-p3' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: x86/polyval - Fix crashes when keys are not 16-byte aligned
John Garry [Wed, 26 Oct 2022 10:35:13 +0000 (18:35 +0800)]
blk-mq: Properly init requests from blk_mq_alloc_request_hctx()
Function blk_mq_alloc_request_hctx() is missing zeroing/init of rq->bio,
biotail, __sector, and __data_len members, which blk_mq_alloc_request()
has, so duplicate what we do in blk_mq_alloc_request().
Fixes: 1f5bd336b9150 ("blk-mq: add blk_mq_alloc_request_hctx") Signed-off-by: John Garry <john.garry@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/1666780513-121650-1-git-send-email-john.garry@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
Zeng Heng [Thu, 27 Oct 2022 12:45:28 +0000 (20:45 +0800)]
cifs: fix use-after-free caused by invalid pointer `hostname`
`hostname` needs to be set as null-pointer after free in
`cifs_put_tcp_session` function, or when `cifsd` thread attempts
to resolve hostname and reconnect the host, the thread would deref
the invalid pointer.
Here is one of practical backtrace examples as reference:
cifsd
---------------------------
kthread
cifs_demultiplex_thread
cifs_reconnect
reconn_set_ipaddr_from_hostname
--> if (!server->hostname)
--> if (server->hostname[0] == '\0') // !! UAF fault here
CIFS: VFS: cifs_mount failed w/return code = -112
mount error(112): Host is down
BUG: KASAN: use-after-free in reconn_set_ipaddr_from_hostname+0x2ba/0x310
Read of size 1 at addr ffff888108f35380 by task cifsd/480
CPU: 2 PID: 480 Comm: cifsd Not tainted 6.1.0-rc2-00106-gf705792f89dd-dirty #25
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x68/0x85
print_report+0x16c/0x4a3
kasan_report+0x95/0x190
reconn_set_ipaddr_from_hostname+0x2ba/0x310
__cifs_reconnect.part.0+0x241/0x800
cifs_reconnect+0x65f/0xb60
cifs_demultiplex_thread+0x1570/0x2570
kthread+0x2c5/0x380
ret_from_fork+0x22/0x30
</TASK>
Allocated by task 477:
kasan_save_stack+0x1e/0x40
kasan_set_track+0x21/0x30
__kasan_kmalloc+0x7e/0x90
__kmalloc_node_track_caller+0x52/0x1b0
kstrdup+0x3b/0x70
cifs_get_tcp_session+0xbc/0x19b0
mount_get_conns+0xa9/0x10c0
cifs_mount+0xdf/0x1970
cifs_smb3_do_mount+0x295/0x1660
smb3_get_tree+0x352/0x5e0
vfs_get_tree+0x8e/0x2e0
path_mount+0xf8c/0x1990
do_mount+0xee/0x110
__x64_sys_mount+0x14b/0x1f0
do_syscall_64+0x3b/0x90
entry_SYSCALL_64_after_hwframe+0x63/0xcd
Freed by task 477:
kasan_save_stack+0x1e/0x40
kasan_set_track+0x21/0x30
kasan_save_free_info+0x2a/0x50
__kasan_slab_free+0x10a/0x190
__kmem_cache_free+0xca/0x3f0
cifs_put_tcp_session+0x30c/0x450
cifs_mount+0xf95/0x1970
cifs_smb3_do_mount+0x295/0x1660
smb3_get_tree+0x352/0x5e0
vfs_get_tree+0x8e/0x2e0
path_mount+0xf8c/0x1990
do_mount+0xee/0x110
__x64_sys_mount+0x14b/0x1f0
do_syscall_64+0x3b/0x90
entry_SYSCALL_64_after_hwframe+0x63/0xcd
The buggy address belongs to the object at ffff888108f35380
which belongs to the cache kmalloc-16 of size 16
The buggy address is located 0 bytes inside of
16-byte region [ffff888108f35380, ffff888108f35390)
The buggy address belongs to the physical page:
page:00000000333f8e58 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888108f350e0 pfn:0x108f35
flags: 0x200000000000200(slab|node=0|zone=2)
raw: 02000000000002000000000000000000dead000000000122ffff8881000423c0
raw: ffff888108f350e0000000008080007a00000001ffffffff0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address: ffff888108f35280: fa fb fc fc fa fb fc fc fa fb fc fc fa fb fc fc ffff888108f35300: fa fb fc fc fa fb fc fc fa fb fc fc fa fb fc fc
>ffff888108f35380: fa fb fc fc fa fb fc fc fa fb fc fc fa fb fc fc
^ ffff888108f35400: fa fb fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff888108f35480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
Fixes: 7be3248f3139 ("cifs: To match file servers, make sure the server hostname matches") Signed-off-by: Zeng Heng <zengheng4@huawei.com> Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>
Dave Airlie [Fri, 28 Oct 2022 02:58:33 +0000 (12:58 +1000)]
Merge tag 'drm-misc-fixes-2022-10-27' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
drm-misc-fixes for v6.1-rc3:
- Fix HPD on bridge/ps8640.
- Stop leaking fences when killing a sched entity.
- Avoid uninitialized read in aperture_remove_conflicting_pci_device()
Dave Airlie [Fri, 28 Oct 2022 02:57:18 +0000 (12:57 +1000)]
Merge tag 'drm-intel-fixes-2022-10-27-1' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
- Extend Wa_1607297627 to Alderlake-P (José Roberto de Souza)
- Keep PCI autosuspend control 'on' by default on all dGPU (Anshuman Gupta)
- Reset frl trained flag before restarting FRL training (Ankit Nautiyal)
Andrew Jones [Fri, 14 Oct 2022 15:58:44 +0000 (17:58 +0200)]
RISC-V: Fix /proc/cpuinfo cpumask warning
Commit 78e5a3399421 ("cpumask: fix checking valid cpu range") has
started issuing warnings[*] when cpu indices equal to nr_cpu_ids - 1
are passed to cpumask_next* functions. seq_read_iter() and cpuinfo's
start and next seq operations implement a pattern like
n = cpumask_next(n - 1, mask);
show(n);
while (1) {
++n;
n = cpumask_next(n - 1, mask);
if (n >= nr_cpu_ids)
break;
show(n);
}
which will issue the warning when reading /proc/cpuinfo. Ensure no
warning is generated by validating the cpu index before calling
cpumask_next().
[*] Warnings will only appear with DEBUG_PER_CPU_MAPS enabled.
Palmer Dabbelt [Thu, 27 Oct 2022 22:09:50 +0000 (15:09 -0700)]
Merge patch series "Fix RISC-V toolchain extension support detection"
Conor Dooley <conor@kernel.org> says:
From: Conor Dooley <conor.dooley@microchip.com>
This came up due to a report from Kevin @ kernel-ci, who had been
running a mixed configuration of GNU binutils and clang. Their compiler
was relatively recent & supports Zicbom but binutils @ 2.35.2 did not.
Our current checks for extension support only cover the compiler, but it
appears to me that we need to check both the compiler & linker support
in case of "pot-luck" configurations that mix different versions of
LD,AS,CC etc.
Linker support does not seem possible to actually check, since the ISA
string is emitted into the object files - so I put in version checks for
that. The checks have gotten a bit ugly since 32 & 64 bit support need
to be checked independently but ahh well.
As I was going, I fell into the trap of there being duplicated checks
for CC support in both the Makefile and Kconfig, so as part of renaming
the Kconfig symbol to TOOLCHAIN_HAS_FOO, I dropped the extra checks in
the Makefile. This has the added advantage of the TOOLCHAIN_HAS_FOO
symbol for Zihintpause appearing in .config.
I pushed out a version of this that specificly checked for assember
support for LKP to test & it looked /okay/ - but I did some more testing
today and realised that this is redudant & have since dropped the as
check.
I tested locally with a fair few different combinations, to try and
cover each of AS, LD, CC missing support for the extension.
* b4-shazam-merge:
riscv: fix detection of toolchain Zihintpause support
riscv: fix detection of toolchain Zicbom support
Conor Dooley [Thu, 6 Oct 2022 17:35:21 +0000 (18:35 +0100)]
riscv: fix detection of toolchain Zihintpause support
It is not sufficient to check if a toolchain supports a particular
extension without checking if the linker supports that extension
too. For example, Clang 15 supports Zihintpause but GNU bintutils
2.35.2 does not, leading build errors like so:
riscv64-linux-gnu-ld: -march=rv64i2p0_m2p0_a2p0_c2p0_zihintpause2p0: Invalid or unknown z ISA extension: 'zihintpause'
Add a TOOLCHAIN_HAS_ZIHINTPAUSE which checks if each of the compiler,
assembler and linker support the extension. Replace the ifdef in the
vdso with one depending on this new symbol.
Conor Dooley [Thu, 6 Oct 2022 17:35:20 +0000 (18:35 +0100)]
riscv: fix detection of toolchain Zicbom support
It is not sufficient to check if a toolchain supports a particular
extension without checking if the linker supports that extension too.
For example, Clang 15 supports Zicbom but GNU bintutils 2.35.2 does
not, leading build errors like so:
riscv64-linux-gnu-ld: -march=rv64i2p0_m2p0_a2p0_c2p0_zicbom1p0_zihintpause2p0: Invalid or unknown z ISA extension: 'zicbom'
Convert CC_HAS_ZICBOM to TOOLCHAIN_HAS_ZICBOM & check if the linker
also supports Zicbom.
Qinglin Pan [Sun, 9 Oct 2022 08:30:50 +0000 (16:30 +0800)]
riscv: mm: add missing memcpy in kasan_init
Hi Atish,
It seems that the panic is due to the missing memcpy during kasan_init.
Could you please check whether this patch is helpful?
When doing kasan_populate, the new allocated base_pud/base_p4d should
contain kasan_early_shadow_{pud, p4d}'s content. Add the missing memcpy
to avoid page fault when read/write kasan shadow region.
Tested on:
- qemu with sv57 and CONFIG_KASAN on.
- qemu with sv48 and CONFIG_KASAN on.
Linus Torvalds [Thu, 27 Oct 2022 20:36:59 +0000 (13:36 -0700)]
Merge tag 'net-6.1-rc3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from 802.15.4 (Zigbee et al).
Current release - regressions:
- ipa: fix bugs in the register conversion for IPA v3.1 and v3.5.1
Current release - new code bugs:
- mptcp: fix abba deadlock on fastopen
- eth: stmmac: rk3588: allow multiple gmac controllers in one system
Previous releases - regressions:
- ip: rework the fix for dflt addr selection for connected nexthop
- net: couple more fixes for misinterpreting bits in struct page
after the signature was added
Previous releases - always broken:
- ipv6: ensure sane device mtu in tunnels
- openvswitch: switch from WARN to pr_warn on a user-triggerable path
- ethtool: eeprom: fix null-deref on genl_info in dump
- ieee802154: more return code fixes for corner cases in
dgram_sendmsg
- mac802154: fix link-quality-indicator recording
- eth: mlx5: fixes for IPsec, PTP timestamps, OvS and conntrack
offload
- eth: fec: limit register access on i.MX6UL
- eth: bcm4908_enet: update TX stats after actual transmission
- can: rcar_canfd: improve IRQ handling for RZ/G2L
Misc:
- genetlink: piggy back on the newly added resv_op_start to enforce
more sanity checks on new commands"
* tag 'net-6.1-rc3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (57 commits)
net: enetc: survive memory pressure without crashing
kcm: do not sense pfmemalloc status in kcm_sendpage()
net: do not sense pfmemalloc status in skb_append_pagefrags()
net/mlx5e: Fix macsec sci endianness at rx sa update
net/mlx5e: Fix wrong bitwise comparison usage in macsec_fs_rx_add_rule function
net/mlx5e: Fix macsec rx security association (SA) update/delete
net/mlx5e: Fix macsec coverity issue at rx sa update
net/mlx5: Fix crash during sync firmware reset
net/mlx5: Update fw fatal reporter state on PCI handlers successful recover
net/mlx5e: TC, Fix cloned flow attr instance dests are not zeroed
net/mlx5e: TC, Reject forwarding from internal port to internal port
net/mlx5: Fix possible use-after-free in async command interface
net/mlx5: ASO, Create the ASO SQ with the correct timestamp format
net/mlx5e: Update restore chain id for slow path packets
net/mlx5e: Extend SKB room check to include PTP-SQ
net/mlx5: DR, Fix matcher disconnect error flow
net/mlx5: Wait for firmware to enable CRS before pci_restore_state
net/mlx5e: Do not increment ESN when updating IPsec ESN state
netdevsim: remove dir in nsim_dev_debugfs_init() when creating ports dir failed
netdevsim: fix memory leak in nsim_drv_probe() when nsim_dev_resources_register() failed
...
Linus Torvalds [Thu, 27 Oct 2022 20:16:36 +0000 (13:16 -0700)]
Merge tag 'execve-v6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull execve fixes from Kees Cook:
- Fix an ancient signal action copy race (Bernd Edlinger)
- Fix a memory leak in ELF loader, when under memory pressure (Li
Zetao)
* tag 'execve-v6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
fs/binfmt_elf: Fix memory leak in load_elf_binary()
exec: Copy oldsighand->action under spin-lock
Linus Torvalds [Thu, 27 Oct 2022 19:31:57 +0000 (12:31 -0700)]
Merge tag 'hardening-v6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardening fixes from Kees Cook:
- Fix older Clang vs recent overflow KUnit test additions (Nick
Desaulniers, Kees Cook)
- Fix kern-doc visibility for overflow helpers (Kees Cook)
* tag 'hardening-v6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
overflow: Refactor test skips for Clang-specific issues
overflow: disable failing tests for older clang versions
overflow: Fix kern-doc markup for functions
Linus Torvalds [Thu, 27 Oct 2022 19:21:57 +0000 (12:21 -0700)]
Merge tag 'media/v6.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
"A bunch of patches addressing issues in the vivid driver and adding
new checks in V4L2 to validate the input parameters from some ioctls"
* tag 'media/v6.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
media: vivid.rst: loop_video is set on the capture devnode
media: vivid: set num_in/outputs to 0 if not supported
media: vivid: drop GFP_DMA32
media: vivid: fix control handler mutex deadlock
media: videodev2.h: V4L2_DV_BT_BLANKING_HEIGHT should check 'interlaced'
media: v4l2-dv-timings: add sanity checks for blanking values
media: vivid: dev->bitmap_cap wasn't freed in all cases
media: vivid: s_fbuf: add more sanity checks
Vladimir Oltean [Thu, 27 Oct 2022 18:29:25 +0000 (21:29 +0300)]
net: enetc: survive memory pressure without crashing
Under memory pressure, enetc_refill_rx_ring() may fail, and when called
during the enetc_open() -> enetc_setup_rxbdr() procedure, this is not
checked for.
An extreme case of memory pressure will result in exactly zero buffers
being allocated for the RX ring, and in such a case it is expected that
hardware drops all RX packets due to lack of buffers.
This does not happen, because the reset-default value of the consumer
and produces index is 0, and this makes the ENETC think that all buffers
have been initialized and that it owns them (when in reality none were).
The hardware guide explains this best:
| Configure the receive ring producer index register RBaPIR with a value
| of 0. The producer index is initially configured by software but owned
| by hardware after the ring has been enabled. Hardware increments the
| index when a frame is received which may consume one or more BDs.
| Hardware is not allowed to increment the producer index to match the
| consumer index since it is used to indicate an empty condition. The ring
| can hold at most RBLENR[LENGTH]-1 received BDs.
|
| Configure the receive ring consumer index register RBaCIR. The
| consumer index is owned by software and updated during operation of the
| of the BD ring by software, to indicate that any receive data occupied
| in the BD has been processed and it has been prepared for new data.
| - If consumer index and producer index are initialized to the same
| value, it indicates that all BDs in the ring have been prepared and
| hardware owns all of the entries.
| - If consumer index is initialized to producer index plus N, it would
| indicate N BDs have been prepared. Note that hardware cannot start if
| only a single buffer is prepared due to the restrictions described in
| (2).
| - Software may write consumer index to match producer index anytime
| while the ring is operational to indicate all received BDs prior have
| been processed and new BDs prepared for hardware.
Normally, the value of rx_ring->rcir (consumer index) is brought in sync
with the rx_ring->next_to_use software index, but this only happens if
page allocation ever succeeded.
When PI==CI==0, the hardware appears to receive frames and write them to
DMA address 0x0 (?!), then set the READY bit in the BD.
The enetc_clean_rx_ring() function (and its XDP derivative) is naturally
not prepared to handle such a condition. It will attempt to process
those frames using the rx_swbd structure associated with index i of the
RX ring, but that structure is not fully initialized (enetc_new_page()
does all of that). So what happens next is undefined behavior.
To operate using no buffer, we must initialize the CI to PI + 1, which
will block the hardware from advancing the CI any further, and drop
everything.
The issue was seen while adding support for zero-copy AF_XDP sockets,
where buffer memory comes from user space, which can even decide to
supply no buffers at all (example: "xdpsock --txonly"). However, the bug
is present also with the network stack code, even though it would take a
very determined person to trigger a page allocation failure at the
perfect time (a series of ifup/ifdown under memory pressure should
eventually reproduce it given enough retries).
Fixes: d4fd0404c1c9 ("enetc: Introduce basic PF and VF ENETC ethernet drivers") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com> Link: https://lore.kernel.org/r/20221027182925.3256653-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Thu, 27 Oct 2022 04:03:46 +0000 (04:03 +0000)]
net: do not sense pfmemalloc status in skb_append_pagefrags()
skb_append_pagefrags() is used by af_unix and udp sendpage()
implementation so far.
In commit 326140063946 ("tcp: TX zerocopy should not sense
pfmemalloc status") we explained why we should not sense
pfmemalloc status for pages owned by user space.
We should also use skb_fill_page_desc_noacc()
in skb_append_pagefrags() to avoid following KCSAN report:
BUG: KCSAN: data-race in lru_add_fn / skb_append_pagefrags
value changed: 0x0000000000000000 -> 0xffffea00058fc188
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 17325 Comm: syz-executor.0 Not tainted 6.1.0-rc1-syzkaller-00158-g440b7895c990-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/11/2022
Fixes: 326140063946 ("tcp: TX zerocopy should not sense pfmemalloc status") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20221027040346.1104204-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Raed Salem [Wed, 26 Oct 2022 13:51:53 +0000 (14:51 +0100)]
net/mlx5e: Fix macsec sci endianness at rx sa update
The cited commit at rx sa update operation passes the sci object
attribute, in the wrong endianness and not as expected by the HW
effectively create malformed hw sa context in case of update rx sa
consequently, HW produces unexpected MACsec packets which uses this
sa.
Fix by passing sci to create macsec object with the correct endianness,
while at it add __force u64 to prevent sparse check error of type
"sparse: error: incorrect type in assignment".
Raed Salem [Wed, 26 Oct 2022 13:51:52 +0000 (14:51 +0100)]
net/mlx5e: Fix wrong bitwise comparison usage in macsec_fs_rx_add_rule function
The cited commit produces a sparse check error of type
"sparse: error: restricted __be64 degrades to integer". The
offending line wrongly did a bitwise operation between two different
storage types one of 64 bit when the other smaller side is 16 bit
which caused the above sparse error, furthermore bitwise operation
usage here is wrong in the first place as the constant MACSEC_PORT_ES
is not a bitwise field.
Fix by using the right mask to get the lower 16 bit if the sci number,
and use comparison operator '==' instead of bitwise '&' operator.
Raed Salem [Wed, 26 Oct 2022 13:51:51 +0000 (14:51 +0100)]
net/mlx5e: Fix macsec rx security association (SA) update/delete
The cited commit adds the support for update/delete MACsec Rx SA,
naturally, these operations need to check if the SA in question exists
to update/delete the SA and return error code otherwise, however they
do just the opposite i.e. return with error if the SA exists
Fix by change the check to return error in case the SA in question does
not exist, adjust error message and code accordingly.
Raed Salem [Wed, 26 Oct 2022 13:51:50 +0000 (14:51 +0100)]
net/mlx5e: Fix macsec coverity issue at rx sa update
The cited commit at update rx sa operation passes object attributes
to MACsec object create function without initializing/setting all
attributes fields leaving some of them with garbage values, therefore
violating the implicit assumption at create object function, which
assumes that all input object attributes fields are set.
Fix by initializing the object attributes struct to zero, thus leaving
unset fields with the legal zero value.
When setting Bluefield to DPU NIC mode using mlxconfig tool + sync
firmware reset flow, we run into scenario where the host was not
eswitch manager at the time of mlx5 driver load but becomes eswitch manager
after the sync firmware reset flow. This results in null pointer
access of mpfs structure during mac filter add. This change prevents null
pointer access but mpfs table entries will not be added.
Fixes: 5ec697446f46 ("net/mlx5: Add support for devlink reload action fw activate") Signed-off-by: Suresh Devarakonda <ramad@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Bodong Wang <bodong@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20221026135153.154807-12-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Roy Novich [Wed, 26 Oct 2022 13:51:48 +0000 (14:51 +0100)]
net/mlx5: Update fw fatal reporter state on PCI handlers successful recover
Update devlink health fw fatal reporter state to "healthy" is needed by
strictly calling devlink_health_reporter_state_update() after recovery
was done by PCI error handler. This is needed when fw_fatal reporter was
triggered due to PCI error. Poll health is called and set reporter state
to error. Health recovery failed (since EEH didn't re-enable the PCI).
PCI handlers keep on recover flow and succeed later without devlink
acknowledgment. Fix this by adding devlink state update at the end of
the PCI handler recovery process.
Fixes: 6181e5cb752e ("devlink: add support for reporter recovery completion") Signed-off-by: Roy Novich <royno@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Aya Levin <ayal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20221026135153.154807-11-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Roi Dayan [Wed, 26 Oct 2022 13:51:47 +0000 (14:51 +0100)]
net/mlx5e: TC, Fix cloned flow attr instance dests are not zeroed
On multi table split the driver creates a new attr instance with
data being copied from prev attr instance zeroing action flags.
Also need to reset dests properties to avoid incorrect dests per attr.
Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions") Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20221026135153.154807-10-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ariel Levkovich [Wed, 26 Oct 2022 13:51:46 +0000 (14:51 +0100)]
net/mlx5e: TC, Reject forwarding from internal port to internal port
Reject TC rules that forward from internal port to internal port
as it is not supported.
This include rules that are explicitly have internal port as
the filter device as well as rules that apply on tunnel interfaces
as the route device for the tunnel interface can be an internal
port.
Fixes: 27484f7170ed ("net/mlx5e: Offload tc rules that redirect to ovs internal port") Signed-off-by: Ariel Levkovich <lariel@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20221026135153.154807-9-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tariq Toukan [Wed, 26 Oct 2022 13:51:45 +0000 (14:51 +0100)]
net/mlx5: Fix possible use-after-free in async command interface
mlx5_cmd_cleanup_async_ctx should return only after all its callback
handlers were completed. Before this patch, the below race between
mlx5_cmd_cleanup_async_ctx and mlx5_cmd_exec_cb_handler was possible and
lead to a use-after-free:
1. mlx5_cmd_cleanup_async_ctx is called while num_inflight is 2 (i.e.
elevated by 1, a single inflight callback).
2. mlx5_cmd_cleanup_async_ctx decreases num_inflight to 1.
3. mlx5_cmd_exec_cb_handler is called, decreases num_inflight to 0 and
is about to call wake_up().
4. mlx5_cmd_cleanup_async_ctx calls wait_event, which returns
immediately as the condition (num_inflight == 0) holds.
5. mlx5_cmd_cleanup_async_ctx returns.
6. The caller of mlx5_cmd_cleanup_async_ctx frees the mlx5_async_ctx
object.
7. mlx5_cmd_exec_cb_handler goes on and calls wake_up() on the freed
object.
Fix it by syncing using a completion object. Mark it completed when
num_inflight reaches 0.
Trace:
BUG: KASAN: use-after-free in do_raw_spin_lock+0x23d/0x270
Read of size 4 at addr ffff888139cd12f4 by task swapper/5/0
Saeed Mahameed [Wed, 26 Oct 2022 13:51:44 +0000 (14:51 +0100)]
net/mlx5: ASO, Create the ASO SQ with the correct timestamp format
mlx5 SQs must select the timestamp format explicitly according to the
active clock mode, select the current active timestamp mode so ASO SQ create
will succeed.
This fixes the following error prints when trying to create ipsec ASO SQ
while the timestamp format is real time mode.
mlx5_cmd_out_err:778:(pid 34874): CREATE_SQ(0x904) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0xd61c0b), err(-22)
mlx5_aso_create_sq:285:(pid 34874): Failed to open aso wq sq, err=-22
mlx5e_ipsec_init:436:(pid 34874): IPSec initialization failed, -22
Fixes: cdd04f4d4d71 ("net/mlx5: Add support to create SQ and CQ for ASO") Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reported-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20221026135153.154807-7-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paul Blakey [Wed, 26 Oct 2022 13:51:43 +0000 (14:51 +0100)]
net/mlx5e: Update restore chain id for slow path packets
Currently encap slow path rules just forward to software without
setting the chain id miss register, so driver doesn't restore
the chain, and packets hitting this rule will restart from tc chain
0 instead of continuing to the chain the encap rule was on.
Fix this by setting the chain id miss register to the chain id mapping.
Fixes: 8f1e0b97cc70 ("net/mlx5: E-Switch, Mark miss packets with new chain id mapping") Signed-off-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20221026135153.154807-6-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Aya Levin [Wed, 26 Oct 2022 13:51:42 +0000 (14:51 +0100)]
net/mlx5e: Extend SKB room check to include PTP-SQ
When tx_port_ts is set, the driver diverts all UPD traffic over PTP port
to a dedicated PTP-SQ. The SKBs are cached until the wire-CQE arrives.
When the packet size is greater then MTU, the firmware might drop it and
the packet won't be transmitted to the wire, hence the wire-CQE won't
reach the driver. In this case the SKBs are accumulated in the SKB fifo.
Add room check to consider the PTP-SQ SKB fifo, when the SKB fifo is
full, driver stops the queue resulting in a TX timeout. Devlink
TX-reporter can recover from it.
Fixes: 1880bc4e4a96 ("net/mlx5e: Add TX port timestamp support") Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20221026135153.154807-5-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rongwei Liu [Wed, 26 Oct 2022 13:51:41 +0000 (14:51 +0100)]
net/mlx5: DR, Fix matcher disconnect error flow
When 2nd flow rules arrives, it will merge together with the
1st one if matcher criteria is the same.
If merge fails, driver will rollback the merge contents, and
reject the 2nd rule. At rollback stage, matcher can't be
disconnected unconditionally, otherise the 1st rule can't be
hit anymore.
Add logic to check if the matcher should be disconnected or not.
Fixes: cc2295cd54e4 ("net/mlx5: DR, Improve steering for empty or RX/TX-only matchers") Signed-off-by: Rongwei Liu <rongweil@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20221026135153.154807-4-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Moshe Shemesh [Wed, 26 Oct 2022 13:51:40 +0000 (14:51 +0100)]
net/mlx5: Wait for firmware to enable CRS before pci_restore_state
After firmware reset driver should verify firmware already enabled CRS
and became responsive to pci config cycles before restoring pci state.
Fix that by waiting till device_id is readable through PCI again.
Hyong Youb Kim [Wed, 26 Oct 2022 13:51:39 +0000 (14:51 +0100)]
net/mlx5e: Do not increment ESN when updating IPsec ESN state
An offloaded SA stops receiving after about 2^32 + replay_window
packets. For example, when SA reaches <seq-hi 0x1, seq 0x2c>, all
subsequent packets get dropped with SA-icv-failure (integrity_failed).
To reproduce the bug:
- ConnectX-6 Dx with crypto enabled (FW 22.30.1004)
- ipsec.conf:
nic-offload = yes
replay-window = 32
esn = yes
salifetime=24h
- Run netperf for a long time to send more than 2^32 packets
netperf -H <device-under-test> -t TCP_STREAM -l 20000
When 2^32 + replay_window packets are received, the replay window
moves from the 2nd half of subspace (overlap=1) to the 1st half
(overlap=0). The driver then updates the 'esn' value in NIC
(i.e. seq_hi) as follows.
seq_hi = xfrm_replay_seqhi(seq_bottom)
new esn in NIC = seq_hi + 1
The +1 increment is wrong, as seq_hi already contains the correct
seq_hi. For example, when seq_hi=1, the driver actually tells NIC to
use seq_hi=2 (esn). This incorrect esn value causes all subsequent
packets to fail integrity checks (SA-icv-failure). So, do not
increment.
Fixes: cb01008390bb ("net/mlx5: IPSec, Add support for ESN") Signed-off-by: Hyong Youb Kim <hyonkim@cisco.com> Acked-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20221026135153.154807-2-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Zhengchao Shao [Wed, 26 Oct 2022 01:46:42 +0000 (09:46 +0800)]
netdevsim: remove dir in nsim_dev_debugfs_init() when creating ports dir failed
Remove dir in nsim_dev_debugfs_init() when creating ports dir failed.
Otherwise, the netdevsim device will not be created next time. Kernel
reports an error: debugfs: Directory 'netdevsim1' with parent 'netdevsim'
already present!
Fixes: ab1d0cc004d7 ("netdevsim: change debugfs tree topology") Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Zhengchao Shao [Wed, 26 Oct 2022 01:54:05 +0000 (09:54 +0800)]
netdevsim: fix memory leak in nsim_bus_dev_new()
If device_register() failed in nsim_bus_dev_new(), the value of reference
in nsim_bus_dev->dev is 1. obj->name in nsim_bus_dev->dev will not be
released.
Jakub Kicinski [Thu, 27 Oct 2022 17:30:41 +0000 (10:30 -0700)]
Merge tag 'linux-can-fixes-for-6.1-20221027' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2022-10-27
Anssi Hannula fixes the use of the completions in the kvaser_usb
driver.
Biju Das contributes 2 patches for the rcar_canfd driver. A IRQ storm
that can be triggered by high CAN bus load and channel specific IRQ
handlers are fixed.
Yang Yingliang fixes the j1939 transport protocol by moving a
kfree_skb() out of a spin_lock_irqsave protected section.
* tag 'linux-can-fixes-for-6.1-20221027' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
can: j1939: transport: j1939_session_skb_drop_old(): spin_unlock_irqrestore() before kfree_skb()
can: rcar_canfd: fix channel specific IRQ handling for RZ/G2L
can: rcar_canfd: rcar_canfd_handle_global_receive(): fix IRQ storm on global FIFO receive
can: kvaser_usb: Fix possible completions during init_completion
====================
Rafał Miłecki [Thu, 27 Oct 2022 11:24:30 +0000 (13:24 +0200)]
net: broadcom: bcm4908_enet: update TX stats after actual transmission
Queueing packets doesn't guarantee their transmission. Update TX stats
after hardware confirms consuming submitted data.
This also fixes a possible race and NULL dereference.
bcm4908_enet_start_xmit() could try to access skb after freeing it in
the bcm4908_enet_poll_tx().
====================
ip: rework the fix for dflt addr selection for connected nexthop"
This series reworks the fix that is reverted in the second commit.
As Julian explained, nhc_scope is related to nhc_gw, it's not the scope of
the route.
====================
Nicolas Dichtel [Thu, 20 Oct 2022 10:09:52 +0000 (12:09 +0200)]
nh: fix scope used to find saddr when adding non gw nh
As explained by Julian, fib_nh_scope is related to fib_nh_gw4, but
fib_info_update_nhc_saddr() needs the scope of the route, which is
the scope "before" fib_nh_scope, ie fib_nh_scope - 1.
This patch fixes the problem described in commit 747c14307214 ("ip: fix
dflt addr selection for connected nexthop").
As explained by Julian, nhc_scope is related to nhc_gw, not to the route.
Revert the original patch. The initial problem is fixed differently in the
next commit.
Dylan Yudaken [Thu, 27 Oct 2022 14:44:29 +0000 (07:44 -0700)]
io_uring: unlock if __io_run_local_work locked inside
It is possible for tw to lock the ring, and this was not propogated out to
io_run_local_work. This can cause an unlock to be missed.
Instead pass a pointer to locked into __io_run_local_work.
Fixes: 8ac5d85a89b4 ("io_uring: add local task_work run helper that is entered locked") Signed-off-by: Dylan Yudaken <dylany@meta.com> Link: https://lore.kernel.org/r/20221027144429.3971400-3-dylany@meta.com
[axboe: WARN_ON() -> WARN_ON_ONCE() and add a minor comment] Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jakub Kicinski [Wed, 26 Oct 2022 00:15:24 +0000 (17:15 -0700)]
genetlink: limit the use of validation workarounds to old ops
During review of previous change another thing came up - we should
limit the use of validation workarounds to old commands.
Don't list the workarounds one by one, as we're rejecting all existing
ones. We can deal with the masking in the unlikely event that new flag
is added.
Florian Fainelli [Tue, 25 Oct 2022 23:42:01 +0000 (16:42 -0700)]
net: bcmsysport: Indicate MAC is in charge of PHY PM
Avoid the PHY library call unnecessarily into the suspend/resume
functions by setting phydev->mac_managed_pm to true. The SYSTEMPORT
driver essentially does exactly what mdio_bus_phy_resume() does by
calling phy_resume().
Jens Axboe [Thu, 27 Oct 2022 13:49:21 +0000 (07:49 -0600)]
Merge tag 'nvme-6.1-2022-10-27' of git://git.infradead.org/nvme into block-6.1
Pull NVMe fixes from Christoph:
"nvme fixes for Linux 6.1
- make the multipath dma alignment to match the non-multipath one
(Keith Busch)
- fix a bogus use of sg_init_marker() (Nam Cao)
- fix circulr locking in nvme-tcp (Sagi Grimberg)"
* tag 'nvme-6.1-2022-10-27' of git://git.infradead.org/nvme:
nvme-multipath: set queue dma alignment to 3
nvme-tcp: fix possible circular locking when deleting a controller under memory pressure
nvme-tcp: replace sg_init_marker() with sg_init_table()
Ming Lei [Thu, 27 Oct 2022 08:57:09 +0000 (16:57 +0800)]
blk-mq: don't add non-pt request with ->end_io to batch
dm-rq implements ->end_io callback for request issued to underlying queue,
and it isn't passthrough request.
Commit ab3e1d3bbab9 ("block: allow end_io based requests in the completion
batch handling") doesn't clear rq->bio and rq->__data_len for request
with ->end_io in blk_mq_end_request_batch(), and this way is actually
dangerous, but so far it is only for nvme passthrough request.
dm-rq needs to clean up remained bios in case of partial completion,
and req->bio is required, then use-after-free is triggered, so the
underlying clone request can't be completed in blk_mq_end_request_batch.
Fix panic by not adding such request into batch list, and the issue
can be triggered simply by exposing nvme pci to dm-mpath simply.
Fixes: ab3e1d3bbab9 ("block: allow end_io based requests in the completion batch handling") Cc: dm-devel@redhat.com Cc: Mike Snitzer <snitzer@kernel.org> Reported-by: Changhui Zhong <czhong@redhat.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20221027085709.513175-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
Yang Yingliang [Thu, 27 Oct 2022 09:19:18 +0000 (17:19 +0800)]
rbd: fix possible memory leak in rbd_sysfs_init()
If device_register() returns error in rbd_sysfs_init(), name of kobject
which is allocated in dev_set_name() called in device_add() is leaked.
As comment of device_add() says, it should call put_device() to drop
the reference count that was set in device_initialize() when it fails,
so the name can be freed in kobject_cleanup().
Yang Yingliang [Thu, 27 Oct 2022 09:12:37 +0000 (17:12 +0800)]
can: j1939: transport: j1939_session_skb_drop_old(): spin_unlock_irqrestore() before kfree_skb()
It is not allowed to call kfree_skb() from hardware interrupt context
or with interrupts being disabled. The skb is unlinked from the queue,
so it can be freed after spin_unlock_irqrestore().
Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Acked-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://lore.kernel.org/all/20221027091237.2290111-1-yangyingliang@huawei.com Cc: stable@vger.kernel.org
[mkl: adjust subject] Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Yang Yingliang [Tue, 25 Oct 2022 13:00:11 +0000 (21:00 +0800)]
net: ehea: fix possible memory leak in ehea_register_port()
If of_device_register() returns error, the of node and the
name allocated in dev_set_name() is leaked, call put_device()
to give up the reference that was set in device_initialize(),
so that of node is put in logical_port_release() and the name
is freed in kobject_cleanup().