Linus Torvalds [Wed, 12 Oct 2022 21:46:48 +0000 (14:46 -0700)]
Merge tag 'vfio-v6.1-rc1' of https://github.com/awilliam/linux-vfio
Pull VFIO updates from Alex Williamson:
- Prune private items from vfio_pci_core.h to a new internal header,
fix missed function rename, and refactor vfio-pci interrupt defines
(Jason Gunthorpe)
- Create consistent naming and handling of ioctls with a function per
ioctl for vfio-pci and vfio group handling, use proper type args
where available (Jason Gunthorpe)
- Implement a set of low power device feature ioctls allowing userspace
to make use of power states such as D3cold where supported (Abhishek
Sahu)
- Remove device counter on vfio groups, which had restricted the page
pinning interface to singleton groups to account for limitations in
the type1 IOMMU backend. Document usage as limited to emulated IOMMU
devices, ie. traditional mdev devices where this restriction is
consistent (Jason Gunthorpe)
- Correct function prefix in hisi_acc driver incurred during previous
refactoring (Shameer Kolothum)
- Correct typo and remove redundant warning triggers in vfio-fsl driver
(Christophe JAILLET)
- Introduce device level DMA dirty tracking uAPI and implementation in
the mlx5 variant driver (Yishai Hadas & Joao Martins)
- Move much of the vfio_device life cycle management into vfio core,
simplifying and avoiding duplication across drivers. This also
facilitates adding a struct device to vfio_device which begins the
introduction of device rather than group level user support and fills
a gap allowing userspace identify devices as vfio capable without
implicit knowledge of the driver (Kevin Tian & Yi Liu)
- Split vfio container handling to a separate file, creating a more
well defined API between the core and container code, masking IOMMU
backend implementation from the core, allowing for an easier future
transition to an iommufd based implementation of the same (Jason
Gunthorpe)
- Attempt to resolve race accessing the iommu_group for a device
between vfio releasing DMA ownership and removal of the device from
the IOMMU driver. Follow-up with support to allow vfio_group to exist
with NULL iommu_group pointer to support existing userspace use cases
of holding the group file open (Jason Gunthorpe)
- Fix error code and hi/lo register manipulation issues in the hisi_acc
variant driver, along with various code cleanups (Longfang Liu)
- Fix a prior regression in GVT-g group teardown, resulting in
unreleased resources (Jason Gunthorpe)
- A significant cleanup and simplification of the mdev interface,
consolidating much of the open coded per driver sysfs interface
support into the mdev core (Christoph Hellwig)
- Simplification of tracking and locking around vfio_groups that fall
out from previous refactoring (Jason Gunthorpe)
- Replace trivial open coded f_ops tests with new helper (Alex
Williamson)
* tag 'vfio-v6.1-rc1' of https://github.com/awilliam/linux-vfio: (77 commits)
vfio: More vfio_file_is_group() use cases
vfio: Make the group FD disassociate from the iommu_group
vfio: Hold a reference to the iommu_group in kvm for SPAPR
vfio: Add vfio_file_is_group()
vfio: Change vfio_group->group_rwsem to a mutex
vfio: Remove the vfio_group->users and users_comp
vfio/mdev: add mdev available instance checking to the core
vfio/mdev: consolidate all the description sysfs into the core code
vfio/mdev: consolidate all the available_instance sysfs into the core code
vfio/mdev: consolidate all the name sysfs into the core code
vfio/mdev: consolidate all the device_api sysfs into the core code
vfio/mdev: remove mtype_get_parent_dev
vfio/mdev: remove mdev_parent_dev
vfio/mdev: unexport mdev_bus_type
vfio/mdev: remove mdev_from_dev
vfio/mdev: simplify mdev_type handling
vfio/mdev: embedd struct mdev_parent in the parent data structure
vfio/mdev: make mdev.h standalone includable
drm/i915/gvt: simplify vgpu configuration management
drm/i915/gvt: fix a memory leak in intel_gvt_init_vgpu_types
...
Linus Torvalds [Wed, 12 Oct 2022 18:16:58 +0000 (11:16 -0700)]
Merge tag 'mm-hotfixes-stable-2022-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc hotfixes from Andrew Morton:
"Five hotfixes - three for nilfs2, two for MM. For are cc:stable, one
is not"
* tag 'mm-hotfixes-stable-2022-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
nilfs2: fix leak of nilfs_root in case of writer thread creation failure
nilfs2: fix NULL pointer dereference at nilfs_bmap_lookup_at_level()
nilfs2: fix use-after-free bug of struct nilfs_root
mm/damon/core: initialize damon_target->list in damon_new_target()
mm/hugetlb: fix races when looking up a CONT-PTE/PMD size hugetlb page
Linus Torvalds [Wed, 12 Oct 2022 18:00:22 +0000 (11:00 -0700)]
Merge tag 'mm-nonmm-stable-2022-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
- hfs and hfsplus kmap API modernization (Fabio Francesco)
- make crash-kexec work properly when invoked from an NMI-time panic
(Valentin Schneider)
- ntfs bugfixes (Hawkins Jiawei)
- improve IPC msg scalability by replacing atomic_t's with percpu
counters (Jiebin Sun)
- nilfs2 cleanups (Minghao Chi)
- lots of other single patches all over the tree!
* tag 'mm-nonmm-stable-2022-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (71 commits)
include/linux/entry-common.h: remove has_signal comment of arch_do_signal_or_restart() prototype
proc: test how it holds up with mapping'less process
mailmap: update Frank Rowand email address
ia64: mca: use strscpy() is more robust and safer
init/Kconfig: fix unmet direct dependencies
ia64: update config files
nilfs2: replace WARN_ONs by nilfs_error for checkpoint acquisition failure
fork: remove duplicate included header files
init/main.c: remove unnecessary (void*) conversions
proc: mark more files as permanent
nilfs2: remove the unneeded result variable
nilfs2: delete unnecessary checks before brelse()
checkpatch: warn for non-standard fixes tag style
usr/gen_init_cpio.c: remove unnecessary -1 values from int file
ipc/msg: mitigate the lock contention with percpu counter
percpu: add percpu_counter_add_local and percpu_counter_sub_local
fs/ocfs2: fix repeated words in comments
relay: use kvcalloc to alloc page array in relay_alloc_page_array
proc: make config PROC_CHILDREN depend on PROC_FS
fs: uninline inode_maybe_inc_iversion()
...
Linus Torvalds [Wed, 12 Oct 2022 17:35:20 +0000 (10:35 -0700)]
Merge tag 'loongarch-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
Pull LoongArch updates from Huacai Chen:
- Use EXPLICIT_RELOCS (ABIv2.0)
- Use generic BUG() handler
- Refactor TLB/Cache operations
- Add qspinlock support
- Add perf events support
- Add kexec/kdump support
- Add BPF JIT support
- Add ACPI-based laptop driver
- Update the default config file
* tag 'loongarch-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson: (25 commits)
LoongArch: Update Loongson-3 default config file
LoongArch: Add ACPI-based generic laptop driver
LoongArch: Add BPF JIT support
LoongArch: Add some instruction opcodes and formats
LoongArch: Move {signed,unsigned}_imm_check() to inst.h
LoongArch: Add kdump support
LoongArch: Add kexec support
LoongArch: Use generic BUG() handler
LoongArch: Add SysRq-x (TLB Dump) support
LoongArch: Add perf events support
LoongArch: Add qspinlock support
LoongArch: Use TLB for ioremap()
LoongArch: Support access filter to /dev/mem interface
LoongArch: Refactor cache probe and flush methods
LoongArch: mm: Refactor TLB exception handlers
LoongArch: Support R_LARCH_GOT_PC_{LO12,HI20} in modules
LoongArch: Support PC-relative relocations in modules
LoongArch: Define ELF relocation types added in ABIv2.0
LoongArch: Adjust symbol addressing for AS_HAS_EXPLICIT_RELOCS
LoongArch: Add Kconfig option AS_HAS_EXPLICIT_RELOCS
...
Linus Torvalds [Wed, 12 Oct 2022 17:23:24 +0000 (10:23 -0700)]
Merge tag 'irq-core-2022-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull interrupt updates from Thomas Gleixner:
"Core code:
- Provide a generic wrapper which can be utilized in drivers to
handle the problem of force threaded demultiplex interrupts on RT
enabled kernels. This avoids conditionals and horrible quirks in
drivers all over the place
- Fix up affected pinctrl and GPIO drivers to make them cleanly RT
safe
Interrupt drivers:
- A new driver for the FSL MU platform specific MSI implementation
- Make irqchip_init() available for pure ACPI based systems
- Provide a functional DT binding for the Realtek RTL interrupt chip
- The usual DT updates and small code improvements all over the
place"
* tag 'irq-core-2022-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
irqchip: IMX_MU_MSI should depend on ARCH_MXC
irqchip/imx-mu-msi: Fix wrong register offset for 8ulp
irqchip/ls-extirq: Fix invalid wait context by avoiding to use regmap
dt-bindings: irqchip: Describe the IMX MU block as a MSI controller
irqchip: Add IMX MU MSI controller driver
dt-bindings: irqchip: renesas,irqc: Add r8a779g0 support
irqchip/gic-v3: Fix typo in comment
dt-bindings: interrupt-controller: ti,sci-intr: Fix missing reg property in the binding
dt-bindings: irqchip: ti,sci-inta: Fix warning for missing #interrupt-cells
irqchip: Allow extra fields to be passed to IRQCHIP_PLATFORM_DRIVER_END
platform-msi: Export symbol platform_msi_create_irq_domain()
irqchip/realtek-rtl: use parent interrupts
dt-bindings: interrupt-controller: realtek,rtl-intc: require parents
irqchip/realtek-rtl: use irq_domain_add_linear()
irqchip: Make irqchip_init() usable on pure ACPI systems
bcma: gpio: Use generic_handle_irq_safe()
gpio: mlxbf2: Use generic_handle_irq_safe()
platform/x86: intel_int0002_vgpio: Use generic_handle_irq_safe()
ssb: gpio: Use generic_handle_irq_safe()
pinctrl: amd: Use generic_handle_irq_safe()
...
Huacai Chen [Wed, 12 Oct 2022 08:36:23 +0000 (16:36 +0800)]
LoongArch: Update Loongson-3 default config file
1, Enable ZBOOT, KEXEC and BPF_JIT;
2, Add more patition types;
3, Add some USB Type-C options;
4, Add some common network options;
5, Add some Bluetooth device drivers;
6, Remove obsolete config options (for some detailed information, see
Link).
Tiezhu Yang [Wed, 12 Oct 2022 08:36:20 +0000 (16:36 +0800)]
LoongArch: Add BPF JIT support
BPF programs are normally handled by a BPF interpreter, add BPF JIT
support for LoongArch to allow the kernel to generate native code when
a program is loaded into the kernel. This will significantly speed-up
processing of BPF programs.
Tiezhu Yang [Wed, 12 Oct 2022 08:36:19 +0000 (16:36 +0800)]
LoongArch: Add some instruction opcodes and formats
According to the "Table of Instruction Encoding" in LoongArch Reference
Manual [1], add some instruction opcodes and formats which are used in
the BPF JIT for LoongArch.
Youling Tang [Wed, 12 Oct 2022 08:36:19 +0000 (16:36 +0800)]
LoongArch: Add kdump support
This patch adds support for kdump. In kdump case the normal kernel will
reserve a region for the crash kernel and jump there on panic.
Arch-specific functions are added to allow for implementing a crash dump
file interface, /proc/vmcore, which can be viewed as a ELF file.
A user-space tool, such as kexec-tools, is responsible for allocating a
separate region for the core's ELF header within the crash kdump kernel
memory and filling it in when executing kexec_load().
Then, its location will be advertised to the crash dump kernel via a
command line argument "elfcorehdr=", and the crash dump kernel will
preserve this region for later use with arch_reserve_vmcore() at boot
time.
At the same time, the crash kdump kernel is also limited within the
"crashkernel" area via a command line argument "mem=", so as not to
destroy the original kernel dump data.
In the crash dump kernel environment, /proc/vmcore is used to access the
primary kernel's memory with copy_oldmem_page().
I tested kdump on LoongArch machines (Loongson-3A5000) and it works as
expected (suggested crashkernel parameter is "crashkernel=512M@2560M"),
you may test it by triggering a crash through /proc/sysrq-trigger:
Youling Tang [Wed, 12 Oct 2022 08:36:19 +0000 (16:36 +0800)]
LoongArch: Add kexec support
Add three new files, kexec.h, machine_kexec.c and relocate_kernel.S to
the LoongArch architecture, so as to add support for the kexec re-boot
mechanism (CONFIG_KEXEC) on LoongArch platforms.
Kexec supports loading vmlinux.elf in ELF format and vmlinux.efi in PE
format.
I tested kexec on LoongArch machines (Loongson-3A5000) and it works as
expected:
Youling Tang [Wed, 12 Oct 2022 08:36:19 +0000 (16:36 +0800)]
LoongArch: Use generic BUG() handler
Inspired by commit ad73fc679a9("arm64/BUG: Use BRK instruction for
generic BUG traps"), do similar for LoongArch to use generic BUG()
handler.
This patch uses the BREAK software breakpoint instruction to generate
a trap instead, similarly to most other arches, with the generic BUG
code generating the dmesg boilerplate.
This allows bug metadata to be moved to a separate table and reduces
the amount of inline code at BUG() and WARN() sites. This also avoids
clobbering any registers before they can be dumped.
To mitigate the size of the bug table further, this patch makes use of
the existing infrastructure for encoding addresses within the bug table
as 32-bit relative pointers instead of absolute pointers.
(Note: this limits the max kernel size to 2GB.)
Before patch:
[ 3018.338013] lkdtm: Performing direct entry BUG
[ 3018.342445] Kernel bug detected[#5]:
[ 3018.345992] CPU: 2 PID: 865 Comm: cat Tainted: G D 6.0.0-rc6+ #35
After patch:
[ 125.585985] lkdtm: Performing direct entry BUG
[ 125.590433] ------------[ cut here ]------------
[ 125.595020] kernel BUG at drivers/misc/lkdtm/bugs.c:78!
[ 125.600211] Oops - BUG[#1]:
[ 125.602980] CPU: 3 PID: 410 Comm: cat Not tainted 6.0.0-rc6+ #36
Out-of-line file/line data information obtained compared to before.
Huacai Chen [Wed, 12 Oct 2022 08:36:14 +0000 (16:36 +0800)]
LoongArch: Add qspinlock support
On NUMA system, the performance of qspinlock is better than generic
spinlock. Below is the UnixBench test results on a 8 nodes (4 cores
per node, 32 cores in total) machine.
Huacai Chen [Wed, 12 Oct 2022 08:36:14 +0000 (16:36 +0800)]
LoongArch: Use TLB for ioremap()
We can support more cache attributes (e.g., CC, SUC and WUC) and page
protection when we use TLB for ioremap(). The implementation is based
on GENERIC_IOREMAP.
The existing simple ioremap() implementation has better performance so
we keep it and introduce ARCH_IOREMAP to control the selection.
We move pagetable_init() earlier to make early ioremap() works, and we
modify the PCI ecam mapping because the TLB-based version of ioremap()
will actually take the size into account.
Huacai Chen [Wed, 12 Oct 2022 08:36:14 +0000 (16:36 +0800)]
LoongArch: Support access filter to /dev/mem interface
Accidental access to /dev/mem is obviously disastrous, but specific
access can be used by people debugging the kernel. So select GENERIC_
LIB_DEVMEM_IS_ALLOWED, as well as define ARCH_HAS_VALID_PHYS_ADDR_RANGE
and related helpers, to support access filter to /dev/mem interface.
Signed-off-by: Weihao Li <liweihao@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Huacai Chen [Wed, 12 Oct 2022 08:36:14 +0000 (16:36 +0800)]
LoongArch: Refactor cache probe and flush methods
Current cache probe and flush methods have some drawbacks:
1, Assume there are 3 cache levels and only 3 levels;
2, Assume L1 = I + D, L2 = V, L3 = S, V is exclusive, S is inclusive.
However, the fact is I + D, I + D + V, I + D + S and I + D + V + S are
all valid. So, refactor the cache probe and flush methods to adapt more
types of cache hierarchy.
Rui Wang [Wed, 12 Oct 2022 08:36:14 +0000 (16:36 +0800)]
LoongArch: mm: Refactor TLB exception handlers
This patch simplifies TLB load, store and modify exception handlers:
1. Reduce instructions, such as alu/csr and memory access;
2. Execute tlb search instruction only in the fast path;
3. Return directly from the fast path for both normal and huge pages;
4. Re-tab the assembly for better vertical alignment.
And fixes the concurrent modification issue of fast path for huge pages.
This issue will occur in the following steps:
CPU-1 (In TLB exception) CPU-2 (In THP splitting)
1: Load PMD entry (HUGE=1)
2: Goto huge path
3: Store PMD entry (HUGE=0)
4: Reload PMD entry (HUGE=0)
5: Fill TLB entry (PA is incorrect)
This patch also slightly improves the TLB processing performance:
Xi Ruoyao [Wed, 12 Oct 2022 08:36:14 +0000 (16:36 +0800)]
LoongArch: Support R_LARCH_GOT_PC_{LO12,HI20} in modules
GCC >= 13 and GNU assembler >= 2.40 use these relocations to address
external symbols, so we need to add them.
Let the module loader emit GOT entries for data symbols so we would be
able to handle GOT relocations. The GOT entry is just the data's symbol
address.
In module.lds, emit a stub .got section for a section header entry. The
actual content of the section entry will be filled at runtime by module_
frob_arch_sections().
Tested-by: WANG Xuerui <git@xen0n.name> Signed-off-by: Xi Ruoyao <xry111@xry111.site> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Xi Ruoyao [Wed, 12 Oct 2022 08:36:08 +0000 (16:36 +0800)]
LoongArch: Adjust symbol addressing for AS_HAS_EXPLICIT_RELOCS
If explicit relocation hints are used by the toolchain, -Wa,-mla-*
options will be useless for the C code. So only use them for the
!CONFIG_AS_HAS_EXPLICIT_RELOCS case.
Replace "la" with "la.pcrel" in head.S to keep the semantic consistent
with new and old toolchains for the low level startup code.
For per-CPU variables, the "address" of the symbol is actually an offset
from $r21. The value is near the loading address of main kernel image,
but far from the loading address of modules. So we use model("extreme")
attibute to tell the compiler that a PC-relative addressing with 32-bit
offset is not sufficient for local per-CPU variables.
The behavior with different assemblers and compilers are summarized in
the following table:
AS has CC has
explicit relocs explicit relocs * Behavior
==============================================================
No No Use la.* macros.
No change from Linux 6.0.
--------------------------------------------------------------
No Yes Disable explicit relocs.
No change from Linux 6.0.
--------------------------------------------------------------
Yes No Not supported.
--------------------------------------------------------------
Yes Yes Enable explicit relocs.
No -Wa,-mla* options used.
==============================================================
*: We assume CC must have model attribute if it has explicit relocs.
Both features are added in GCC 13 development cycle, so any GCC
release >= 13 should be OK. Using early GCC 13 development snapshots
may produce modules with unsupported relocations.
GNU as >= 2.40 and GCC >= 13 will support using explicit relocation
hints in the assembly code, instead of la.* macros. The usage of
explicit relocation hints can improve code generation so it's enabled
by default by GCC >= 13.
Introduce a Kconfig option AS_HAS_EXPLICIT_RELOCS as the switch for
"use explicit relocation hints or not".
Tested-by: WANG Xuerui <git@xen0n.name> Signed-off-by: Xi Ruoyao <xry111@xry111.site> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Huacai Chen [Wed, 12 Oct 2022 08:36:08 +0000 (16:36 +0800)]
LoongArch: Mark __xchg() and __cmpxchg() as __always_inline
Commit bfab67995487 ("compiler: enable CONFIG_OPTIMIZE_INLINING
forcibly") allows compiler to uninline functions marked as 'inline'.
In case of __xchg()/__cmpxchg() this would cause to reference
BUILD_BUG(), which is an error case for catching bugs and will not
happen for correct code, if __xchg()/__cmpxchg() is inlined.
This bug can be produced with CONFIG_DEBUG_SECTION_MISMATCH enabled,
and the solution is similar to below commits: 9930c5515b0db01 ("MIPS: include: Mark __xchg as __always_inline"), e05a069803acbc4 ("MIPS: include: Mark __cmpxchg as __always_inline").
Huacai Chen [Wed, 12 Oct 2022 08:36:08 +0000 (16:36 +0800)]
LoongArch: Flush TLB earlier at initialization
Move local_flush_tlb_all() earlier (just after setup_ptwalker() and
before page allocation). This can avoid stale TLB entries misguiding
the later page allocation. Without this patch the second kernel of
kexec/kdump fails to boot SMP.
BTW, move output_pgtable_bits_defines() into tlb_init() since it has
nothing to do with tlb handler setup.
Tiezhu Yang [Wed, 12 Oct 2022 08:36:08 +0000 (16:36 +0800)]
LoongArch: Do not create sysfs control file for io master CPUs
Now io master CPUs are not hotpluggable on LoongArch, but in the current
code only /sys/devices/system/cpu/cpu0/online is not created. Let us set
the hotpluggable field of all the io master CPUs as 0, then prevent to
create sysfs control file for all the io master CPUs which confuses some
user space tools. This is similar with commit ea3db9d6660c ("MIPS: CPU#0
is not hotpluggable").
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
The Freescale/NXP i.MX Messaging Unit is only present on Freescale/NXP
i.MX SoCs. Hence add a dependency on ARCH_MXC, to prevent asking the
user about this driver when configuring a kernel without Freescale/NXP
i.MX SoC family support.
While at it, expand "MU" to "Messaging Unit" in the help text.
Linus Torvalds [Wed, 12 Oct 2022 03:48:55 +0000 (20:48 -0700)]
Merge tag 'memblock-v6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock
Pull memblock updates from Mike Rapoport:
"Test suite improvements:
- Added verification that memblock allocations zero the allocated
memory
- Added more test cases for memblock_add(), memblock_remove(),
memblock_reserve() and memblock_free()
- Added tests for memblock_*_raw() family
- Added tests for NUMA-aware allocations in memblock_alloc_try_nid()
and memblock_alloc_try_nid_raw()"
* tag 'memblock-v6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
memblock tests: add generic NUMA tests for memblock_alloc_try_nid*
memblock tests: add bottom-up NUMA tests for memblock_alloc_try_nid*
memblock tests: add top-down NUMA tests for memblock_alloc_try_nid*
memblock tests: add simulation of physical memory with multiple NUMA nodes
memblock_tests: move variable declarations to single block
memblock tests: remove 'cleared' from comment blocks
memblock tests: add tests for memblock_trim_memory
memblock tests: add tests for memblock_*bottom_up functions
memblock tests: update alloc_nid_api to test memblock_alloc_try_nid_raw
memblock tests: update alloc_api to test memblock_alloc_raw
memblock tests: add additional tests for basic api and memblock_alloc
memblock tests: add labels to verbose output for generic alloc tests
memblock tests: update zeroed memory check for memblock_alloc_* tests
memblock tests: update tests to check if memblock_alloc zeroed memory
memblock tests: update reference to obsolete build option in comments
memblock tests: add command line help option
Linus Torvalds [Wed, 12 Oct 2022 03:07:44 +0000 (20:07 -0700)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull more kvm updates from Paolo Bonzini:
"The main batch of ARM + RISC-V changes, and a few fixes and cleanups
for x86 (PMU virtualization and selftests).
ARM:
- Fixes for single-stepping in the presence of an async exception as
well as the preservation of PSTATE.SS
- Better handling of AArch32 ID registers on AArch64-only systems
- Fixes for the dirty-ring API, allowing it to work on architectures
with relaxed memory ordering
- Advertise the new kvmarm mailing list
- Various minor cleanups and spelling fixes
RISC-V:
- Improved instruction encoding infrastructure for instructions not
yet supported by binutils
- Svinval support for both KVM Host and KVM Guest
- Zihintpause support for KVM Guest
- Zicbom support for KVM Guest
- Record number of signal exits as a VCPU stat
- Use generic guest entry infrastructure
x86:
- Misc PMU fixes and cleanups.
- selftests: fixes for Hyper-V hypercall
- selftests: fix nx_huge_pages_test on TDP-disabled hosts
- selftests: cleanups for fix_hypercall_test"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (57 commits)
riscv: select HAVE_POSIX_CPU_TIMERS_TASK_WORK
RISC-V: KVM: Use generic guest entry infrastructure
RISC-V: KVM: Record number of signal exits as a vCPU stat
RISC-V: KVM: add __init annotation to riscv_kvm_init()
RISC-V: KVM: Expose Zicbom to the guest
RISC-V: KVM: Provide UAPI for Zicbom block size
RISC-V: KVM: Make ISA ext mappings explicit
RISC-V: KVM: Allow Guest use Zihintpause extension
RISC-V: KVM: Allow Guest use Svinval extension
RISC-V: KVM: Use Svinval for local TLB maintenance when available
RISC-V: Probe Svinval extension form ISA string
RISC-V: KVM: Change the SBI specification version to v1.0
riscv: KVM: Apply insn-def to hlv encodings
riscv: KVM: Apply insn-def to hfence encodings
riscv: Introduce support for defining instructions
riscv: Add X register names to gpr-nums
KVM: arm64: Advertise new kvmarm mailing list
kvm: vmx: keep constant definition format consistent
kvm: mmu: fix typos in struct kvm_arch
KVM: selftests: Fix nx_huge_pages_test on TDP-disabled hosts
...
Ryusuke Konishi [Fri, 7 Oct 2022 08:52:26 +0000 (17:52 +0900)]
nilfs2: fix leak of nilfs_root in case of writer thread creation failure
If nilfs_attach_log_writer() failed to create a log writer thread, it
frees a data structure of the log writer without any cleanup. After
commit 693b9b4ce61c ("nilfs2: use root object to get ifile"), this causes
a leak of struct nilfs_root, which started to leak an ifile metadata inode
and a kobject on that struct.
In addition, if the kernel is booted with panic_on_warn, the above
ifile metadata inode leak will cause the following panic when the
nilfs2 kernel module is removed:
Ryusuke Konishi [Sun, 2 Oct 2022 03:08:04 +0000 (12:08 +0900)]
nilfs2: fix NULL pointer dereference at nilfs_bmap_lookup_at_level()
If the i_mode field in inode of metadata files is corrupted on disk, it
can cause the initialization of bmap structure, which should have been
called from nilfs_read_inode_common(), not to be called. This causes a
lockdep warning followed by a NULL pointer dereference at
nilfs_bmap_lookup_at_level().
This patch fixes these issues by adding a missing sanitiy check for the
i_mode field of metadata file's inode.
Ryusuke Konishi [Mon, 3 Oct 2022 15:05:19 +0000 (00:05 +0900)]
nilfs2: fix use-after-free bug of struct nilfs_root
If the beginning of the inode bitmap area is corrupted on disk, an inode
with the same inode number as the root inode can be allocated and fail
soon after. In this case, the subsequent call to nilfs_clear_inode() on
that bogus root inode will wrongly decrement the reference counter of
struct nilfs_root, and this will erroneously free struct nilfs_root,
causing kernel oopses.
This fixes the problem by changing nilfs_new_inode() to skip reserved
inode numbers while repairing the inode bitmap.
SeongJae Park [Sun, 2 Oct 2022 19:31:30 +0000 (19:31 +0000)]
mm/damon/core: initialize damon_target->list in damon_new_target()
'struct damon_target' creation function, 'damon_new_target()' is not
initializing its '->list' field, unlike other DAMON structs creator
functions such as 'damon_new_region()'. Normal users of
'damon_new_target()' initializes the field by adding the target to DAMON
context's targets list, but some code could access the uninitialized
field.
This commit avoids the case by initializing the field in
'damon_new_target()'.
Baolin Wang [Thu, 1 Sep 2022 10:41:31 +0000 (18:41 +0800)]
mm/hugetlb: fix races when looking up a CONT-PTE/PMD size hugetlb page
On some architectures (like ARM64), it can support CONT-PTE/PMD size
hugetlb, which means it can support not only PMD/PUD size hugetlb (2M and
1G), but also CONT-PTE/PMD size(64K and 32M) if a 4K page size specified.
So when looking up a CONT-PTE size hugetlb page by follow_page(), it will
use pte_offset_map_lock() to get the pte entry lock for the CONT-PTE size
hugetlb in follow_page_pte(). However this pte entry lock is incorrect
for the CONT-PTE size hugetlb, since we should use huge_pte_lock() to get
the correct lock, which is mm->page_table_lock.
That means the pte entry of the CONT-PTE size hugetlb under current pte
lock is unstable in follow_page_pte(), we can continue to migrate or
poison the pte entry of the CONT-PTE size hugetlb, which can cause some
potential race issues, even though they are under the 'pte lock'.
For example, suppose thread A is trying to look up a CONT-PTE size hugetlb
page by move_pages() syscall under the lock, however antoher thread B can
migrate the CONT-PTE hugetlb page at the same time, which will cause
thread A to get an incorrect page, if thread A also wants to do page
migration, then data inconsistency error occurs.
Moreover we have the same issue for CONT-PMD size hugetlb in
follow_huge_pmd().
To fix above issues, rename the follow_huge_pmd() as follow_huge_pmd_pte()
to handle PMD and PTE level size hugetlb, which uses huge_pte_lock() to
get the correct pte entry lock to make the pte entry stable.
Mike said:
Support for CONT_PMD/_PTE was added with c320c29302d5 ("arm64: hugetlb:
refactor find_num_contig()"). Patch series "Support for contiguous pte
hugepages", v4. However, I do not believe these code paths were
executed until migration support was added with c41c435bd920 ("arm64/mm:
enable HugeTLB migration for contiguous bit HugeTLB pages") I would go
with c41c435bd920 for the Fixes: targe.
Link: https://lkml.kernel.org/r/635f43bdd85ac2615a58405da82b4d33c6e5eb05.1662017562.git.baolin.wang@linux.alibaba.com Fixes: c41c435bd920 ("arm64/mm: enable HugeTLB migration for contiguous bit HugeTLB pages") Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Suggested-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: David Hildenbrand <david@redhat.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tiezhu Yang [Fri, 2 Sep 2022 03:41:46 +0000 (11:41 +0800)]
include/linux/entry-common.h: remove has_signal comment of arch_do_signal_or_restart() prototype
The argument has_signal of arch_do_signal_or_restart() has been removed in
commit 7cac6ae2a373 ("task_work: Call tracehook_notify_signal from
get_signal on all architectures"), let us remove the related comment.
Link: https://lkml.kernel.org/r/1662090106-5545-1-git-send-email-yangtiezhu@loongson.cn Fixes: 7cac6ae2a373 ("task_work: Call tracehook_notify_signal from get_signal on all architectures") Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
They must be empty (excluding vsyscall page) or full of zeroes.
Retroactively this test should've caught embarassing /proc/*/smaps_rollup
oops:
[17752.703567] BUG: kernel NULL pointer dereference, address: 0000000000000000
[17752.703580] #PF: supervisor read access in kernel mode
[17752.703583] #PF: error_code(0x0000) - not-present page
[17752.703587] PGD 0 P4D 0
[17752.703593] Oops: 0000 [#1] PREEMPT SMP PTI
[17752.703598] CPU: 0 PID: 60649 Comm: cat Tainted: G W 5.19.9-100.fc35.x86_64 #1
[17752.703603] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X99 Extreme6/3.1, BIOS P3.30 08/05/2016
[17752.703607] RIP: 0010:show_smaps_rollup+0x159/0x2e0
Note 1:
ProtectionKey field in /proc/*/smaps is optional,
so check most of its contents, not everything.
Note 2:
due to the nature of this test, child process hardly can signal
its readiness (after unmapping everything!) to parent.
I feel like "sleep(1)" is justified.
If you know how to do it without sleep please tell me.
Clean up config files by:
- removing configs that were deleted in the past
- removing configs not in tree and without recently pending patches
- adding new configs that are replacements for old configs in the file
nilfs2: replace WARN_ONs by nilfs_error for checkpoint acquisition failure
If creation or finalization of a checkpoint fails due to anomalies in the
checkpoint metadata on disk, a kernel warning is generated.
This patch replaces the WARN_ONs by nilfs_error, so that a kernel, booted
with panic_on_warn, does not panic. A nilfs_error is appropriate here to
handle the abnormal filesystem condition.
This also replaces the detected error codes with an I/O error so that
neither of the internal error codes is returned to callers.
Linus Torvalds [Tue, 11 Oct 2022 22:02:25 +0000 (15:02 -0700)]
Merge tag 'perf-tools-for-v6.1-1-2022-10-07' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
Pull perf tools updates from Arnaldo Carvalho de Melo:
- Add support for AMD on 'perf mem' and 'perf c2c', the kernel
enablement patches went via tip.
Example:
$ sudo perf mem record -- -c 10000
^C[ perf record: Woken up 227 times to write data ]
[ perf record: Captured and wrote 58.760 MB perf.data (836978 samples) ]
$ sudo perf mem report -F mem,sample,snoop
Samples: 836K of event 'ibs_op//', Event count (approx.): 8418762
Memory access Samples Snoop
N/A 700620 N/A
L1 hit 126675 N/A
L2 hit 424 N/A
L3 hit 664 HitM
L3 hit 10 N/A
Local RAM hit 2 N/A
Remote RAM (1 hop) hit 8558 N/A
Remote Cache (1 hop) hit 3 N/A
Remote Cache (1 hop) hit 2 HitM
Remote Cache (2 hops) hit 10 HitM
Remote Cache (2 hops) hit 6 N/A
Uncached hit 4 N/A
$
- "perf lock" improvements:
- Add -E/--entries option to limit the number of entries to
display, say to ask for just the top 5 contended locks.
- Add -q/--quiet option to suppress header and debug messages.
- Add a 'perf test' kernel lock contention entry to test 'perf
lock'.
- "perf lock contention" improvements:
- Ask BPF's bpf_get_stackid() to skip some callchain entries.
The ones closer to the tooling are bpf related and not that
interesting, the ones calling the locking function are the ones
we're interested in, example of a full, unskipped callstack:
- Allow changing the callstack depth and number of entries to skip.
1 10.74 us 10.74 us 10.74 us spinlock __bpf_trace_contention_begin+0xb
0xffffffffc03b5c47 bpf_prog_bf07ae9e2cbd02c5_contention_begin+0x117
0xffffffffc03b5c47 bpf_prog_bf07ae9e2cbd02c5_contention_begin+0x117
0xffffffffbb8b8e75 bpf_trace_run2+0x35
0xffffffffbb7eab9b __bpf_trace_contention_begin+0xb
0xffffffffbb7ebe75 queued_spin_lock_slowpath+0x1f5
0xffffffffbc1c26ff _raw_spin_lock+0x1f
0xffffffffbb841015 tick_do_update_jiffies64+0x25
0xffffffffbb8409ee tick_irq_enter+0x9e
- Show full callstack in verbose mode (-v option), sometimes this
is desirable instead of showing just one callstack entry.
- Allow multiple time ranges in 'perf record --delay' to help in
reducing the amount of data collected from hardware tracing (Intel
PT, etc) when there is a rough idea of periods of time where events
of interest take time.
- Add Intel PT to record only decoder debug messages when error
happens.
- Improve layout of Intel PT man page.
- Add new branch types: alignment, data and inst faults and arch
specific ones, such as fiq, debug_halt, debug_exit, debug_inst and
debug_data on arm64.
Kernel enablement went thru the tip tree.
- Fix 'perf probe' error log check in 'perf test' when no debuginfo is
available.
- Fix 'perf stat' aggregation mode logic, it should be looking at the
CPU not at the core number.
- Fix flags parsing in 'perf trace' filters.
- Introduce compact encoding of CPU range encoding on perf.data, to
avoid having a bitmap with all the CPUs.
- Improvements to the 'perf stat' metrics, including adding
"core_wide", and computing "smt" from the CPU topology.
- Add support to the new PERF_FORMAT_LOST perf_event_attr.read_format,
that allows tooling to ask for the precise number of lost samples for
a given event.
- Add 'addr' sort key to see just the address of sampled instructions:
$ perf record -o- true | perf report -i- -s addr
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.000 MB - ]
# Samples: 12 of event 'cycles:u'
# Event count (approx.): 252512
#
# Overhead Address
# ........ ..................
42.96% 0x7f96f08443d7
29.55% 0x7f96f0859b50
14.76% 0x7f96f0852e02
8.30% 0x7f96f0855028
4.43% 0xffffffff8de01087
perf annotate: Toggle full address <-> offset display
- Add 'f' hotkey to the 'perf annotate' TUI interface when in
'disassembler output' mode ('o' hotkey) to toggle showing full
virtual address or just the offset.
- Cache DSO build-ids when synthesizing PERF_RECORD_MMAP records for
pre-existing threads, at the start of a 'perf record' session,
speeding up that record startup phase.
- Add a command line option to specify build ids in 'perf inject'.
- Update JSON event files for the Intel alderlake, broadwell,
broadwellde, broadwellx, cascadelakex, haswell, haswellx, icelake,
icelakex, ivybridge, ivytown, jaketown, sandybridge, sapphirerapids,
skylake, skylakex, and tigerlake processors.
- Update vendor JSON event files for the ARM Neoverse V1 and E1
platforms.
- Add a 'perf test' entry for 'perf mem' where a struct has false
sharing and this gets detected in the 'perf mem' output, tested with
Intel, AMD and ARM64 systems.
- Add a 'perf test' entry to test the resolution of java symbols, where
an output like this is expected:
- Add tests for the ARM64 CoreSight hardware tracing feature, with
specially crafted pureloop, memcpy, thread loop and unroll tread that
then gets traced and the output compared with expected output.
Documentation explaining it is also included.
- Add per thread Intel PT 'perf test' entry to check that
PERF_RECORD_TEXT_POKE events are recorded per CPU, resulting in a
mixture of per thread and per CPU events and mmaps, verify that this
gets all recorded correctly.
- Introduce pthread mutex wrappers to allow for building with clang's
-Wthread-safety, i.e. using the "guarded_by" "pt_guarded_by"
"lockable", "exclusive_lock_function", "exclusive_trylock_function",
"exclusive_locks_required", and "no_thread_safety_analysis" compiler
function attributes.
- Fix empty version number when building outside of a git repo.
- Improve feature detection display when multiple versions of a feature
are present, such as for binutils libbfd, that has a mix of possible
ways to detect according to the Linux distribution.
Previously in some cases we had:
Auto-detecting system features
<SNIP>
... libbfd: [ on ]
... libbfd-liberty: [ on ]
... libbfd-liberty-z: [ on ]
<SNIP>
Now for this case we show just the main feature:
Auto-detecting system features
<SNIP>
... libbfd: [ on ]
<SNIP>
- Remove some unused structs, variables, macros, function prototypes
and includes from various places.
* tag 'perf-tools-for-v6.1-1-2022-10-07' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (169 commits)
perf script: Add missing fields in usage hint
perf mem: Print "LFB/MAB" for PERF_MEM_LVLNUM_LFB
perf mem/c2c: Avoid printing empty lines for unsupported events
perf mem/c2c: Add load store event mappings for AMD
perf mem/c2c: Set PERF_SAMPLE_WEIGHT for LOAD_STORE events
perf mem: Add support for printing PERF_MEM_LVLNUM_{CXL|IO}
perf amd ibs: Sync arch/x86/include/asm/amd-ibs.h header with the kernel
tools headers UAPI: Sync include/uapi/linux/perf_event.h header with the kernel
perf stat: Fix cpu check to use id.cpu.cpu in aggr_printout()
perf test coresight: Add relevant documentation about ARM64 CoreSight testing
perf test: Add git ignore for tmp and output files of ARM CoreSight tests
perf test coresight: Add unroll thread test shell script
perf test coresight: Add unroll thread test tool
perf test coresight: Add thread loop test shell scripts
perf test coresight: Add thread loop test tool
perf test coresight: Add memcpy thread test shell script
perf test coresight: Add memcpy thread test tool
perf test: Add git ignore for perf data generated by the ARM CoreSight tests
perf test: Add arm64 asm pureloop test shell script
perf test: Add asm pureloop test tool
...
Linus Torvalds [Tue, 11 Oct 2022 18:08:18 +0000 (11:08 -0700)]
Merge tag 'pci-v6.1-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull pci updates from Bjorn Helgaas:
"Resource management:
- Distribute spare resources to unconfigured hotplug bridges at
boot-time (not just when hot-adding such a bridge), which makes
hot-adding devices to docks work better.
- Revert to a BAR assignment inherited from firmware only when the
address is actually reachable via any upstream bridges, which fixes
some cases where firmware doesn't configure all devices.
- Add a sysfs interface to resize BARs so this can be done before
assigning devices to a VM through VFIO.
Power management:
- Disable Precision Time Management for all devices on suspend to
enable lower-power PM state. We previously did this just for Root
Ports, which isn't enough because downstream devices can still
generate PTM messages, which cause errors if it's disabled in the
Root Port.
- Save and restore the ASPM L1 PM Substates configuration for
suspend/ resume. Previously this configuration was lost, so L1.x
states likely stopped working after resume.
- Check whether the L1 PM Substates Capability exists. If it didn't
exist, we previously read junk and tried to configure L1 Substates
based on that.
- Fix the LTR_L1.2_THRESHOLD computation, which previously set a
threshold for entering L1.2 that was too low in some cases.
- Reduce the delay after transitions to or from D3cold by using
usleep_range() rather than msleep(), which often slept for ~19ms
instead of the 10ms normally required. The spec says 10ms is
enough, but it's possible we could trip over devices that need a
little more.
Error handling:
- Work around a BIOS bug that caused Intel Root Ports to advertise a
Root Port Programmed I/O (RP PIO) log size of zero, which caused
annoying warnings and prevented the kernel from dumping log
registers for DPC errors.
Qualcomm PCIe controller driver:
- Add support for SC8280XP and SA8540P host controllers and SM8450
endpoint controller.
- Disable Master AXI clock on endpoint controllers to save power when
link is idle or in L1.x.
- Expose link state transition counts via debugfs to help debug
issues with low-power states.
- Add auto-loading module support.
Synopsys DesignWare PCIe controller driver:
- Remove a dependency on ZONE_DMA32 by allocating the MSI target page
differently. There's more work to do related to eDMA controllers,
so it's not completely settled"
* tag 'pci-v6.1-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (71 commits)
PCI: qcom-ep: Check platform_get_resource_byname() return value
PCI: qcom-ep: Add support for SM8450 SoC
dt-bindings: PCI: qcom-ep: Add support for SM8450 SoC
dt-bindings: PCI: qcom-ep: Define clocks per platform
PCI: qcom-ep: Make PERST separation optional
dt-bindings: PCI: qcom-ep: Make PERST separation optional
PCI: qcom-ep: Disable Master AXI Clock when there is no PCIe traffic
PCI: Expose PCIe Resizable BAR support via sysfs
PCI/ASPM: Correct LTR_L1.2_THRESHOLD computation
PCI/ASPM: Ignore L1 PM Substates if device lacks capability
PCI/ASPM: Factor out L1 PM Substates configuration
PCI: qcom-ep: Gate Master AXI clock to MHI bus during L1SS
PCI: qcom-ep: Expose link transition counts via debugfs
PCI: qcom-ep: Disable IRQs during driver remove
PCI/ASPM: Save L1 PM Substates Capability for suspend/resume
PCI/ASPM: Refactor L1 PM Substates Control Register programming
PCI: qcom-ep: Make use of the cached dev pointer
PCI: qcom-ep: Rely on the clocks supplied by devicetree
PCI: qcom-ep: Add kernel-doc for qcom_pcie_ep structure
phy: freescale: imx8m-pcie: Fix the wrong order of phy_init() and phy_power_on()
...
Linus Torvalds [Tue, 11 Oct 2022 18:03:42 +0000 (11:03 -0700)]
Merge tag 'i2c-for-6.1-rc1-batch2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull more i2c updates from Wolfram Sang:
- correct a variable type in the new pci1xxxx driver
- add a new SoC to the qcom-cci driver
- fix an issue with the designware driver which now got enough testing
- the aspeed driver now handles busy target backends better
* tag 'i2c-for-6.1-rc1-batch2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: aspeed: Assert NAK when slave is busy
i2c: designware: Fix handling of real but unexpected device interrupts
i2c: qcom-cci: Add MSM8226 compatible
dt-bindings: i2c: qcom,i2c-cci: Document clocks for MSM8974
dt-bindings: i2c: qcom,i2c-cci: Document MSM8226 compatible
i2c: microchip: pci1xxxx: Fix comparison of -EPERM against an unsigned variable
Linus Torvalds [Tue, 11 Oct 2022 17:53:25 +0000 (10:53 -0700)]
Merge tag 'input-for-v6.1-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input updates from Dmitry Torokhov:
- a new driver for IBM Operational Panel
- a new driver for PinePhone keyboards
- RT5120 PMIC power key support
- various enhancements and support for new models in xpad (Xbox) driver
- a new compatible ID for Elan touchscreen driver
- rework of adp5588-keys driver to support configuring via device
properties (OF, ACPI, etc) instead of platform data, and proper
support of optional gpiochip functionality (and removal of
gpio-adp5588 driver)
- improvements to firmware update handling in Synaptics RMI4 driver
- support for double key matrix in mt6779-keypad
- support for polled mode in adc-joystick driver
- other assorted driver fixes, cleanups and improvements
* tag 'input-for-v6.1-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (90 commits)
Input: i8042 - fix refount leak on sparc
Input: i8042 - add LoongArch support in i8042-acpipnpio.h
Input: i8042 - rename i8042-x86ia64io.h to i8042-acpipnpio.h
Input: pinephone-keyboard - support the proxied I2C bus
Input: pinephone-keyboard - add PinePhone keyboard driver
dt-bindings: input: Add the PinePhone keyboard binding
dt-bindings: input: Convert hid-over-i2c to DT schema
input: drop empty comment blocks
Input: xpad - add X-Box Adaptive Profile button
Input: add ABS_PROFILE to uapi and documentation
Input: xpad - add X-Box Adaptive XBox button
Input: xpad - add X-Box Adaptive support
Input: ims-pcu - fix spelling mistake "BOOLTLOADER" -> "BOOTLOADER"
Input: ibm-panel - add missing MODULE_DEVICE_TABLE
Input: icn8505 - utilize acpi_get_subsystem_id()
Input: xpad - decipher xpadone packages with GIP defines
Input: xpad - refactor using BIT() macro
Input: synaptics-rmi4 - convert to use sysfs_emit() APIs
Input: twl4030-pwrbutton - add missing of.h include
Input: applespi - replace zero-length array with DECLARE_FLEX_ARRAY() helper
...
Linus Torvalds [Tue, 11 Oct 2022 17:49:17 +0000 (10:49 -0700)]
Merge tag 'fbdev-for-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev
Pull fbdev updates from Helge Deller:
"Here's a fix for the smscufx USB graphics card to prevent a kernel
crash if it's plugged in/out too fast.
The other patches are mostly small cleanups, fixes in failure paths
and code removal:
- fix an use-after-free in smscufx USB graphics driver
- add missing pci_disable_device() in tridentfb failure paths
- correctly handle irq detection failure in mb862xx driver
- fix resume code in omapfb/dss
- drop unused code in controlfb, tridentfb, arkfb, imxfb and udlfb
- convert uvesafb to use scnprintf() instead of snprintf()
- convert gbefb to use dev_groups
- add MODULE_DEVICE_TABLE() entry to vga16fb"
* tag 'fbdev-for-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev:
fbdev: mb862xx: Fix check of return value from irq_of_parse_and_map()
fbdev: vga16fb: Add missing MODULE_DEVICE_TABLE() entry
fbdev: tridentfb: Fix missing pci_disable_device() in probe and remove
fbdev: smscufx: Fix use-after-free in ufx_ops_open()
fbdev: gbefb: Convert to use dev_groups
fbdev: imxfb: Remove redundant dev_err() call
fbdev: omapfb/dss: Use pm_runtime_resume_and_get() instead of pm_runtime_get_sync()
fbdev: uvesafb: Convert snprintf to scnprintf
fbdev: arkfb: Remove the unused function dac_read_reg()
fbdev: tridentfb: Remove the unused function shadowmode_off()
fbdev: controlfb: Remove the unused function VAR_MATCH()
fbdev: udlfb: Remove redundant initialization to variable identical
Linus Torvalds [Tue, 11 Oct 2022 17:42:25 +0000 (10:42 -0700)]
Merge tag 'for-linus-6.1-1' of https://github.com/cminyard/linux-ipmi
Pull IPMI updates from Corey Minyard:
"Fix a bunch of little problems in IPMI
This is mostly just doc, config, and little tweaks. Nothing big, which
is why there was nothing for 6.0. There is one crash fix, but it's not
something that I think anyone is using yet"
* tag 'for-linus-6.1-1' of https://github.com/cminyard/linux-ipmi:
ipmi: Remove unused struct watcher_entry
ipmi: kcs: aspeed: Update port address comments
ipmi: Add __init/__exit annotations to module init/exit funcs
ipmi:ipmb: Don't call ipmi_unregister_smi() on a register failure
ipmi:ipmb: Fix a vague comment and a typo
dt-binding: ipmi: add fallback to npcm845 compatible
ipmi: Fix comment typo
char: ipmi: modify NPCM KCS configuration
dt-bindings: ipmi: Add npcm845 compatible
Lukas Bulwahn [Tue, 4 Oct 2022 07:13:02 +0000 (09:13 +0200)]
alpha: remove the needless aliases osf_{readv,writev}
Commit 0e5f80046eba ("a.out: Remove the a.out implementation") removes
CONFIG_OSF4_COMPAT and its functionality. Hence, sys_osf_{readv,writev}
are now just aliases of sys_{readv,writev}.
Remove these needless aliases.
[ Identical patch also posted by Jason A. Donenfeld ]
Joel Stanley [Tue, 11 Oct 2022 03:59:10 +0000 (14:29 +1030)]
powerpc: Fix 85xx build
The merge of the kbuild tree dropped the renaming of the FSL_BOOKE
kconfig option.
Fixes: 109b6f70e02b ("Merge tag 'kbuild-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild") Signed-off-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
xen/pv: support selecting safe/unsafe msr accesses
Instead of always doing the safe variants for reading and writing MSRs
in Xen PV guests, make the behavior controllable via Kconfig option
and a boot parameter.
The default will be the current behavior, which is to always use the
safe variant.
xen/pv: refactor msr access functions to support safe and unsafe accesses
Refactor and rename xen_read_msr_safe() and xen_write_msr_safe() to
support both cases of MSR accesses, safe ones and potentially GP-fault
generating ones.
This will prepare to no longer swallow GPs silently in xen_read_msr()
and xen_write_msr().
Juergen Gross [Wed, 5 Oct 2022 07:42:33 +0000 (09:42 +0200)]
xen/pv: fix vendor checks for pmu emulation
The CPU vendor checks for pmu emulation are rather limited today, as
the assumption seems to be that only Intel and AMD are existing and/or
supported vendors.
Fix that by handling Centaur and Zhaoxin CPUs the same way as Intel,
and Hygon the same way as AMD.
While at it fix the return type of is_intel_pmu_msr().
Suggested-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
xen/pv: add fault recovery control to pmu msr accesses
Today pmu_msr_read() and pmu_msr_write() fall back to the safe variants
of read/write MSR in case the MSR access isn't emulated via Xen. Allow
the caller to select that faults should not be recovered from by passing
NULL for the error pointer.
Linus Torvalds [Tue, 11 Oct 2022 03:32:10 +0000 (20:32 -0700)]
Merge tag 'xfs-6.1-for-linus' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs updates from Dave Chinner:
"There are relatively few updates this cycle; half the cycle was eaten
by a grue, the other half was eaten by a tricky data corruption issue
that I still haven't entirely solved.
Hence there's no major changes in this cycle and it's largely just
minor cleanups and small bug fixes:
- fixes for filesystem shutdown procedure during a DAX memory failure
notification
- bug fixes
- logic cleanups
- log message cleanups
- updates to use vfs{g,u}id_t helpers where appropriate"
* tag 'xfs-6.1-for-linus' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: on memory failure, only shut down fs after scanning all mappings
xfs: rearrange the logic and remove the broken comment for xfs_dir2_isxx
xfs: trim the mapp array accordingly in xfs_da_grow_inode_int
xfs: do not need to check return value of xlog_kvmalloc()
xfs: port to vfs{g,u}id_t and associated helpers
xfs: remove xfs_setattr_time() declaration
xfs: Remove the unneeded result variable
xfs: missing space in xfs trace log
xfs: simplify if-else condition in xfs_reflink_trim_around_shared
xfs: simplify if-else condition in xfs_validate_new_dalign
xfs: replace unnecessary seq_printf with seq_puts
xfs: clean up "%Ld/%Lu" which doesn't meet C standard
xfs: remove redundant else for clean code
xfs: remove the redundant word in comment
Linus Torvalds [Tue, 11 Oct 2022 03:28:41 +0000 (20:28 -0700)]
Merge tag 'f2fs-for-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"This round looks fairly small comparing to the previous updates and
includes mostly minor bug fixes. Nevertheless, as we've still
interested in improving the stability, Chao added some debugging
methods to diagnoze subtle runtime inconsistency problem.
Enhancements:
- store all the corruption or failure reasons in superblock
- detect meta inode, summary info, and block address inconsistency
- increase the limit for reserve_root for low-end devices
- add the number of compressed IO in iostat
Bug fixes:
- DIO write fix for zoned devices
- do out-of-place writes for cold files
- fix some stat updates (FS_CP_DATA_IO, dirty page count)
- fix race condition on setting FI_NO_EXTENT flag
- fix data races when freezing super
- fix wrong continue condition check in GC
- do not allow ATGC for LFS mode
In addition, there're some code enhancement and clean-ups as usual"
* tag 'f2fs-for-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (32 commits)
f2fs: change to use atomic_t type form sbi.atomic_files
f2fs: account swapfile inodes
f2fs: allow direct read for zoned device
f2fs: support recording errors into superblock
f2fs: support recording stop_checkpoint reason into super_block
f2fs: remove the unnecessary check in f2fs_xattr_fiemap
f2fs: introduce cp_status sysfs entry
f2fs: fix to detect corrupted meta ino
f2fs: fix to account FS_CP_DATA_IO correctly
f2fs: code clean and fix a type error
f2fs: add "c_len" into trace_f2fs_update_extent_tree_range for compressed file
f2fs: fix to do sanity check on summary info
f2fs: port to vfs{g,u}id_t and associated helpers
f2fs: fix to do sanity check on destination blkaddr during recovery
f2fs: let FI_OPU_WRITE override FADVISE_COLD_BIT
f2fs: fix race condition on setting FI_NO_EXTENT flag
f2fs: remove redundant check in f2fs_sanity_check_cluster
f2fs: add static init_idisk_time function to reduce the code
f2fs: fix typo
f2fs: fix wrong dirty page count when race between mmap and fallocate.
...
Linus Torvalds [Tue, 11 Oct 2022 03:21:09 +0000 (20:21 -0700)]
Merge tag '9p-for-6.1' of https://github.com/martinetd/linux
Pull 9p updates from Dominique Martinet:
"Smaller buffers for small messages and fixes.
The highlight of this is Christian's patch to allocate smaller buffers
for most metadata requests: 9p with a big msize would try to allocate
large buffers when just 4 or 8k would be more than enough; this brings
in nice performance improvements.
There's also a few fixes for problems reported by syzkaller (thanks to
Schspa Shi, Tetsuo Handa for tests and feedback/patches) as well as
some minor cleanup"
* tag '9p-for-6.1' of https://github.com/martinetd/linux:
net/9p: clarify trans_fd parse_opt failure handling
net/9p: add __init/__exit annotations to module init/exit funcs
net/9p: use a dedicated spinlock for trans_fd
9p/trans_fd: always use O_NONBLOCK read/write
net/9p: allocate appropriate reduced message buffers
net/9p: add 'pooled_rbuffers' flag to struct p9_trans_module
net/9p: add p9_msg_buf_size()
9p: add P9_ERRMAX for 9p2000 and 9p2000.u
net/9p: split message size argument into 't_size' and 'r_size' pair
9p: trans_fd/p9_conn_cancel: drop client lock earlier
Linus Torvalds [Tue, 11 Oct 2022 03:13:22 +0000 (20:13 -0700)]
Merge tag 'gfs2-nopid-for-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 debugfs updates from Andreas Gruenbacher:
- Improve the way how the state of glocks is reported in debugfs for
glocks which are not held by processes, but rather by other resouces
like cached inodes or flocks.
* tag 'gfs2-nopid-for-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
gfs2: Mark the remaining process-independent glock holders as GL_NOPID
gfs2: Mark flock glock holders as GL_NOPID
gfs2: Add GL_NOPID flag for process-independent glock holders
gfs2: Add flocks to glockfd debugfs file
gfs2: Add glockfd debugfs file
Linus Torvalds [Tue, 11 Oct 2022 03:08:52 +0000 (20:08 -0700)]
Merge tag 'gfs2-v6.0-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 updates from Andreas Gruenbacher:
- Make sure to initialize the filesystem work queues before registering
the filesystem; this prevents them from being used uninitialized.
- On filesystem withdraw: prevent a a double iput() and immediately
reject pending locking requests that can no longer succeed.
- Use TRY lock in gfs2_inode_lookup() to prevent a rare glock hang
during evict.
- During filesystem mount, explicitly make sure that the sb_bsize and
sb_bsize_shift super block fields are consistent with each other.
This prevents messy error messages during fuzz testing.
- Switch from strlcpy to strscpy.
* tag 'gfs2-v6.0-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
gfs2: Register fs after creating workqueues
gfs2: Check sb_bsize_shift after reading superblock
gfs2: Switch from strlcpy to strscpy
gfs2: Clear flags when withdraw prevents xmote
gfs2: Dequeue waiters when withdrawn
gfs2: Prevent double iput for journal on error
gfs2: Use TRY lock in gfs2_inode_lookup for UNLINKED inodes
Linus Torvalds [Tue, 11 Oct 2022 03:04:22 +0000 (20:04 -0700)]
Merge tag '6.1-rc-smb3-client-fixes-part1' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs updates from Steve French:
- data corruption fix when cache disabled
- four RDMA (smbdirect) improvements, including enabling support for
SoftiWARP
- four signing improvements
- three directory lease improvements
- four cleanup fixes
- minor security fix
- two debugging improvements
* tag '6.1-rc-smb3-client-fixes-part1' of git://git.samba.org/sfrench/cifs-2.6: (21 commits)
smb3: fix oops in calculating shash_setkey
cifs: secmech: use shash_desc directly, remove sdesc
smb3: rename encryption/decryption TFMs
cifs: replace kfree() with kfree_sensitive() for sensitive data
cifs: remove initialization value
cifs: Replace a couple of one-element arrays with flexible-array members
smb3: do not log confusing message when server returns no network interfaces
smb3: define missing create contexts
cifs: store a pointer to a fid in the cfid structure instead of the struct
cifs: improve handlecaching
cifs: Make tcon contain a wrapper structure cached_fids instead of cached_fid
smb3: add dynamic trace points for tree disconnect
Fix formatting of client smbdirect RDMA logging
Handle variable number of SGEs in client smbdirect send.
Reduce client smbdirect max receive segment size
Decrease the number of SMB3 smbdirect client SGEs
cifs: Fix the error length of VALIDATE_NEGOTIATE_INFO message
cifs: destage dirty pages before re-reading them for cache=none
cifs: return correct error in ->calc_signature()
MAINTAINERS: Add Tom Talpey as cifs.ko reviewer
...
Linus Torvalds [Tue, 11 Oct 2022 02:58:04 +0000 (19:58 -0700)]
Merge tag 'nfsd-6.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull more nfsd updates from Chuck Lever:
- filecache code clean-ups
* tag 'nfsd-6.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
nfsd: rework hashtable handling in nfsd_do_file_acquire
nfsd: fix nfsd_file_unhash_and_dispose
Linus Torvalds [Tue, 11 Oct 2022 02:45:17 +0000 (19:45 -0700)]
Merge tag 'pull-tmpfile' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs tmpfile updates from Al Viro:
"Miklos' ->tmpfile() signature change; pass an unopened struct file to
it, let it open the damn thing. Allows to add tmpfile support to FUSE"
* tag 'pull-tmpfile' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fuse: implement ->tmpfile()
vfs: open inside ->tmpfile()
vfs: move open right after ->tmpfile()
vfs: make vfs_tmpfile() static
ovl: use vfs_tmpfile_open() helper
cachefiles: use vfs_tmpfile_open() helper
cachefiles: only pass inode to *mark_inode_inuse() helpers
cachefiles: tmpfile error handling cleanup
hugetlbfs: cleanup mknod and tmpfile
vfs: add vfs_tmpfile_open() helper
Linus Torvalds [Tue, 11 Oct 2022 00:53:04 +0000 (17:53 -0700)]
Merge tag 'mm-stable-2022-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- Yu Zhao's Multi-Gen LRU patches are here. They've been under test in
linux-next for a couple of months without, to my knowledge, any
negative reports (or any positive ones, come to that).
- Also the Maple Tree from Liam Howlett. An overlapping range-based
tree for vmas. It it apparently slightly more efficient in its own
right, but is mainly targeted at enabling work to reduce mmap_lock
contention.
Liam has identified a number of other tree users in the kernel which
could be beneficially onverted to mapletrees.
Yu Zhao has identified a hard-to-hit but "easy to fix" lockdep splat
at [1]. This has yet to be addressed due to Liam's unfortunately
timed vacation. He is now back and we'll get this fixed up.
- Dmitry Vyukov introduces KMSAN: the Kernel Memory Sanitizer. It uses
clang-generated instrumentation to detect used-unintialized bugs down
to the single bit level.
KMSAN keeps finding bugs. New ones, as well as the legacy ones.
- Yang Shi adds a userspace mechanism (madvise) to induce a collapse of
memory into THPs.
- Zach O'Keefe has expanded Yang Shi's madvise(MADV_COLLAPSE) to
support file/shmem-backed pages.
- userfaultfd updates from Axel Rasmussen
- zsmalloc cleanups from Alexey Romanov
- cleanups from Miaohe Lin: vmscan, hugetlb_cgroup, hugetlb and
memory-failure
- Huang Ying adds enhancements to NUMA balancing memory tiering mode's
page promotion, with a new way of detecting hot pages.
- memcg updates from Shakeel Butt: charging optimizations and reduced
memory consumption.
- memcg cleanups from Kairui Song.
- memcg fixes and cleanups from Johannes Weiner.
- Vishal Moola provides more folio conversions
- Zhang Yi removed ll_rw_block() :(
- migration enhancements from Peter Xu
- migration error-path bugfixes from Huang Ying
- Aneesh Kumar added ability for a device driver to alter the memory
tiering promotion paths. For optimizations by PMEM drivers, DRM
drivers, etc.
- Mike Kravetz fixes locking issues in hugetlbfs and in hugetlb core.
Link: https://lkml.kernel.org/r/CAOUHufZabH85CeUN-MEMgL8gJGzJEWUrkiM58JkTbBhh-jew0Q@mail.gmail.com
* tag 'mm-stable-2022-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (555 commits)
hugetlb: allocate vma lock for all sharable vmas
hugetlb: take hugetlb vma_lock when clearing vma_lock->vma pointer
hugetlb: fix vma lock handling during split vma and range unmapping
mglru: mm/vmscan.c: fix imprecise comments
mm/mglru: don't sync disk for each aging cycle
mm: memcontrol: drop dead CONFIG_MEMCG_SWAP config symbol
mm: memcontrol: use do_memsw_account() in a few more places
mm: memcontrol: deprecate swapaccounting=0 mode
mm: memcontrol: don't allocate cgroup swap arrays when memcg is disabled
mm/secretmem: remove reduntant return value
mm/hugetlb: add available_huge_pages() func
mm: remove unused inline functions from include/linux/mm_inline.h
selftests/vm: add selftest for MADV_COLLAPSE of uffd-minor memory
selftests/vm: add file/shmem MADV_COLLAPSE selftest for cleared pmd
selftests/vm: add thp collapse shmem testing
selftests/vm: add thp collapse file and tmpfs testing
selftests/vm: modularize thp collapse memory operations
selftests/vm: dedup THP helpers
mm/khugepaged: add tracepoint to hpage_collapse_scan_file()
mm/madvise: add file and shmem support to MADV_COLLAPSE
...
Linus Torvalds [Tue, 11 Oct 2022 00:13:07 +0000 (17:13 -0700)]
Merge tag 'x86_mm_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm updates from Dave Hansen:
"There are some small things here, plus one big one.
The big one detected and refused to create W+X kernel mappings. This
caused a bit of trouble and it is entirely disabled on 32-bit due to
known unfixable EFI issues. It also oopsed on some systemd eBPF use,
which kept some users from booting.
The eBPF issue is fixed, but those troubles were caught relatively
recently which made me nervous that there are more lurking. The final
commit in here retains the warnings, but doesn't actually refuse to
create W+X mappings.
Summary:
- Detect insecure W+X mappings and warn about them, including a few
bug fixes and relaxing the enforcement
- Do a long-overdue defconfig update and enabling W+X boot-time
detection
- Cleanup _PAGE_PSE handling (follow-up on an earlier bug)
- Rename a change_page_attr function"
* tag 'x86_mm_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: Ease W^X enforcement back to just a warning
x86/mm: Disable W^X detection and enforcement on 32-bit
x86/mm: Add prot_sethuge() helper to abstract out _PAGE_PSE handling
x86/mm/32: Fix W^X detection when page tables do not support NX
x86/defconfig: Enable CONFIG_DEBUG_WX=y
x86/defconfig: Refresh the defconfigs
x86/mm: Refuse W^X violations
x86/mm: Rename set_memory_present() to set_memory_p()
Linus Torvalds [Mon, 10 Oct 2022 21:21:11 +0000 (14:21 -0700)]
Merge tag 'xtensa-20221010' of https://github.com/jcmvbkbc/linux-xtensa
Pull xtensa updates from Max Filippov:
- add support for FDPIC and static PIE executable formats for noMMU
* tag 'xtensa-20221010' of https://github.com/jcmvbkbc/linux-xtensa:
xtensa: add FDPIC and static PIE support for noMMU
xtensa: clean up ELF_PLAT_INIT macro
Linus Torvalds [Mon, 10 Oct 2022 21:19:05 +0000 (14:19 -0700)]
Merge tag 'm68knommu-for-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu
Pull m68knommu updates from Greg Ungerer:
"Just a couple of changes. Fixes to compilation of the old/legacy
Freescale 68328 targets in some kernel configurations, and some
default configuration updates.
Linus Torvalds [Mon, 10 Oct 2022 21:02:53 +0000 (14:02 -0700)]
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio updates from Michael Tsirkin:
- 9k mtu perf improvements
- vdpa feature provisioning
- virtio blk SECURE ERASE support
- fixes and cleanups all over the place
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
virtio_pci: don't try to use intxif pin is zero
vDPA: conditionally read MTU and MAC in dev cfg space
vDPA: fix spars cast warning in vdpa_dev_net_mq_config_fill
vDPA: check virtio device features to detect MQ
vDPA: check VIRTIO_NET_F_RSS for max_virtqueue_paris's presence
vDPA: only report driver features if FEATURES_OK is set
vDPA: allow userspace to query features of a vDPA device
virtio_blk: add SECURE ERASE command support
vp_vdpa: support feature provisioning
vdpa_sim_net: support feature provisioning
vdpa: device feature provisioning
virtio-net: use mtu size as buffer length for big packets
virtio-net: introduce and use helper function for guest gso support checks
virtio: drop vp_legacy_set_queue_size
virtio_ring: make vring_alloc_queue_packed prettier
virtio_ring: split: Operators use unified style
vhost: add __init/__exit annotations to module init/exit funcs
Linus Torvalds [Mon, 10 Oct 2022 20:59:01 +0000 (13:59 -0700)]
Merge tag 'hyperv-next-signed-20221009' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull hyperv updates from Wei Liu:
- Remove unnecessary delay while probing for VMBus (Stanislav
Kinsburskiy)
- Optimize vmbus_on_event (Saurabh Sengar)
- Fix a race in Hyper-V DRM driver (Saurabh Sengar)
- Miscellaneous clean-up patches from various people
* tag 'hyperv-next-signed-20221009' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
x86/hyperv: Replace kmap() with kmap_local_page()
drm/hyperv: Add ratelimit on error message
hyperv: simplify and rename generate_guest_id
Drivers: hv: vmbus: Split memcpy of flex-array
scsi: storvsc: remove an extraneous "to" in a comment
Drivers: hv: vmbus: Don't wait for the ACPI device upon initialization
Drivers: hv: vmbus: Use PCI_VENDOR_ID_MICROSOFT for better discoverability
Drivers: hv: vmbus: Fix kernel-doc
drm/hyperv: Don't overwrite dirt_needed value set by host
Drivers: hv: vmbus: Optimize vmbus_on_event
Linus Torvalds [Mon, 10 Oct 2022 20:52:14 +0000 (13:52 -0700)]
Merge tag 'thermal-6.1-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull more thermal control updates from Rafael Wysocki:
"These fix assorted issues in the thermal core and ARM thermal drivers.
Specifics:
- Use platform data to get the sensor ID instead of parsing the
device in imx_sc thermal driver and remove the dedicated OF
function from the core code (Daniel Lezcano).
- Fix Kconfig dependency for the QCom tsens thermal driver (Jonathan
Cameron).
- Add missing const annotation to the RCar ops thermal driver (Lad
Prabhakar).
- Drop duplicate parameter check from
thermal_zone_device_register_with_trips() (Lad Prabhakar).
- Fix NULL pointer dereference in trip_point_temp_store() by making
it check if the ->set_trip_temp() operation is present (Lad
Prabhakar).
- Fix the MSM8939 fourth sensor hardware ID in the QCom tsens thermal
driver (Vincent Knecht)"
* tag 'thermal-6.1-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
thermal/drivers/qcom/tsens-v0_1: Fix MSM8939 fourth sensor hw_id
thermal/core: Add a check before calling set_trip_temp()
thermal/core: Drop valid pointer check for type
thermal/drivers/rcar_thermal: Constify static thermal_zone_device_ops
thermal/drivers/qcom: Drop false build dependency of all QCOM drivers on QCOM_TSENS
thermal/of: Remove the thermal_zone_of_get_sensor_id() function
thermal/drivers/imx_sc: Rely on the platform data to get the resource id
Linus Torvalds [Mon, 10 Oct 2022 20:39:03 +0000 (13:39 -0700)]
Merge tag 'pm-6.1-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull more power management updates from Rafael Wysocki:
"These update the turbostat utility, extend the macros used for
defining device power management callbacks and add a diagnostic
message to the generic power domains code.
Specifics:
- Add an error message to be printed when a power domain marked as
"always on" is not actually on during initialization (Johan
Hovold).
- Extend macros used for defining power management callbacks to allow
conditional exporting of noirq and late/early suspend/resume PM
callbacks (Paul Cercueil).
- Update the turbostat utility:
- Add support for two new platforms (Zhang Rui).
- Adjust energy unit for Sapphire Rapids (Zhang Rui).
- Do not dump TRL if turbo is not supported (Artem Bityutskiy)"
* tag 'pm-6.1-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
tools/power turbostat: version 2022.10.04
tools/power turbostat: Use standard Energy Unit for SPR Dram RAPL domain
tools/power turbostat: Do not dump TRL if turbo is not supported
tools/power turbostat: Add support for MeteorLake platforms
tools/power turbostat: Add support for RPL-S
PM: Improve EXPORT_*_DEV_PM_OPS macros
PM: domains: log failures to register always-on domains
Linus Torvalds [Mon, 10 Oct 2022 20:28:06 +0000 (13:28 -0700)]
Merge tag 'acpi-6.1-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull more ACPI updates from Rafael Wysocki:
"These fix two issues, in APEI and in the int3472 driver, clean up the
ACPI thermal driver, add ACPI support for non-GPE system wakeup events
and make the system reboot code use the S5 (system off) state by
default.
- Fix a memory leak in APEI by avoiding to add a task_work to kernel
threads running when an asynchronous error is detected (Shuai Xue).
- Add ACPI support for handling system wakeups via GPIO wake capable
IRQs in addition to GPEs (Raul E Rangel).
- Make the system reboot code put ACPI-enabled systems into the S5
(system off) state which is necessary for some platforms to work as
expected (Kai-Heng Feng).
- Make the white space usage in the ACPI thermal driver more
consistent and drop redundant code from it (Rafael Wysocki)"
* tag 'acpi-6.1-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: thermal: Drop some redundant code
ACPI: thermal: Drop redundant parens from expressions
ACPI: thermal: Use white space more consistently
platform/x86: int3472: Don't leak reference on error
ACPI: APEI: do not add task_work to kernel thread to avoid memory leak
PM: ACPI: reboot: Reinstate S5 for reboot
kernel/reboot: Add SYS_OFF_MODE_RESTART_PREPARE mode
ACPI: PM: Take wake IRQ into consideration when entering suspend-to-idle
i2c: acpi: Use ACPI wake capability bit to set wake_irq
ACPI: resources: Add wake_capable parameter to acpi_dev_irq_flags
gpiolib: acpi: Add wake_capable variants of acpi_dev_gpio_irq_get
Linus Torvalds [Mon, 10 Oct 2022 20:24:55 +0000 (13:24 -0700)]
Merge tag 'dma-mapping-6.1-2022-10-10' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping updates from Christoph Hellwig:
- fix a regression in the ARM dma-direct conversion (Christoph Hellwig)
- use memcpy_{from,to}_page (Fabio M. De Francesco)
- cleanup the swiotlb MAINTAINERS entry (Lukas Bulwahn)
- make SG table pool allocation less fragile (Masahiro Yamada)
- don't panic on swiotlb initialization failure (Robin Murphy)
* tag 'dma-mapping-6.1-2022-10-10' of git://git.infradead.org/users/hch/dma-mapping:
ARM/dma-mapping: remove the dma_coherent member of struct dev_archdata
ARM/dma-mappÑ–ng: don't override ->dma_coherent when set from a bus notifier
lib/sg_pool: change module_init(sg_pool_init) to subsys_initcall
MAINTAINERS: merge SWIOTLB SUBSYSTEM into DMA MAPPING HELPERS
swiotlb: don't panic!
swiotlb: replace kmap_atomic() with memcpy_{from,to}_page()
Linus Torvalds [Mon, 10 Oct 2022 20:20:53 +0000 (13:20 -0700)]
Merge tag 'iommu-updates-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu updates from Joerg Roedel:
- remove the bus_set_iommu() interface which became unnecesary because
of IOMMU per-device probing
- make the dma-iommu.h header private
- Intel VT-d changes from Lu Baolu:
- Decouple PASID and PRI from SVA
- Add ESRTPS & ESIRTPS capability check
- Cleanups
- Apple DART support for the M1 Pro/MAX SOCs
- support for AMD IOMMUv2 page-tables for the DMA-API layer.
The v2 page-tables are compatible with the x86 CPU page-tables. Using
them for DMA-API prepares support for hardware-assisted IOMMU
virtualization
- support for MT6795 Helio X10 M4Us in the Mediatek IOMMU driver
- some smaller fixes and cleanups
* tag 'iommu-updates-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (59 commits)
iommu/vt-d: Avoid unnecessary global DMA cache invalidation
iommu/vt-d: Avoid unnecessary global IRTE cache invalidation
iommu/vt-d: Rename cap_5lp_support to cap_fl5lp_support
iommu/vt-d: Remove pasid_set_eafe()
iommu/vt-d: Decouple PASID & PRI enabling from SVA
iommu/vt-d: Remove unnecessary SVA data accesses in page fault path
dt-bindings: iommu: arm,smmu-v3: Relax order of interrupt names
iommu: dart: Support t6000 variant
iommu/io-pgtable-dart: Add DART PTE support for t6000
iommu/io-pgtable: Add DART subpage protection support
iommu/io-pgtable: Move Apple DART support to its own file
iommu/mediatek: Add support for MT6795 Helio X10 M4Us
iommu/mediatek: Introduce new flag TF_PORT_TO_ADDR_MT8173
dt-bindings: mediatek: Add bindings for MT6795 M4U
iommu/iova: Fix module config properly
iommu/amd: Fix sparse warning
iommu/amd: Remove outdated comment
iommu/amd: Free domain ID after domain_flush_pages
iommu/amd: Free domain id in error path
iommu/virtio: Fix compile error with viommu_capable()
...
Linus Torvalds [Mon, 10 Oct 2022 20:09:33 +0000 (13:09 -0700)]
Merge tag 'tpmdd-next-v6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd
Pull tpm updates from Jarkko Sakkinen:
"Just a few bug fixes this time"
* tag 'tpmdd-next-v6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
selftest: tpm2: Add Client.__del__() to close /dev/tpm* handle
security/keys: Remove inconsistent __user annotation
char: move from strlcpy with unused retval to strscpy
Linus Torvalds [Mon, 10 Oct 2022 19:49:34 +0000 (12:49 -0700)]
Merge tag 'bitmap-6.1-rc1' of https://github.com/norov/linux
Pull bitmap updates from Yury Norov:
- Fix unsigned comparison to -1 in CPUMAP_FILE_MAX_BYTES (Phil Auld)
- cleanup nr_cpu_ids vs nr_cpumask_bits mess (me)
This series cleans that mess and adds new config FORCE_NR_CPUS that
allows to optimize cpumask subsystem if the number of CPUs is known
at compile-time.
- optimize find_bit() functions (me)
Reworks find_bit() functions based on new FIND_{FIRST,NEXT}_BIT()
macros.
- add find_nth_bit() (me)
Adds find_nth_bit(), which is ~70 times faster than bitcounting with
for_each() loop:
for_each_set_bit(bit, mask, size)
if (n-- == 0)
return bit;
Also adds bitmap_weight_and() to let people replace this pattern:
Linus Torvalds [Mon, 10 Oct 2022 19:20:55 +0000 (12:20 -0700)]
Merge tag 'trace-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing updates from Steven Rostedt:
"Major changes:
- Changed location of tracing repo from personal git repo to:
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git
- Added Masami Hiramatsu as co-maintainer
- Updated MAINTAINERS file to separate out FTRACE as it is more than
just TRACING.
Minor changes:
- Added Mark Rutland as FTRACE reviewer
- Updated user_events to make it on its way to remove the BROKEN tag.
The changes should now be acceptable but will run it through a
cycle and hopefully we can remove the BROKEN tag next release.
- Added filtering to eprobes
- Added a delta time to the benchmark trace event
- Have the histogram and filter callbacks called via a switch
statement instead of indirect functions. This speeds it up to avoid
retpolines.
- Add a way to wake up ring buffer waiters waiting for the ring
buffer to fill up to its watermark.
- New ioctl() on the trace_pipe_raw file to wake up ring buffer
waiters.
- Wake up waiters when the ring buffer is disabled. A reader may
block when the ring buffer is disabled, but if it was blocked when
the ring buffer is disabled it should then wake up.
Fixes:
- Allow splice to read partially read ring buffer pages. This fixes
splice never moving forward.
- Fix inverted compare that made the "shortest" ring buffer wait
queue actually the longest.
- Fix a race in the ring buffer between resetting a page when a
writer goes to another page, and the reader.
- Fix ftrace accounting bug when function hooks are added at boot up
before the weak functions are set to "disabled".
- Fix bug that freed a user allocated snapshot buffer when enabling a
tracer.
- Fix possible recursive locks in osnoise tracer
- Fix recursive locking direct functions
- Other minor clean ups and fixes"
* tag 'trace-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (44 commits)
ftrace: Create separate entry in MAINTAINERS for function hooks
tracing: Update MAINTAINERS to reflect new tracing git repo
tracing: Do not free snapshot if tracer is on cmdline
ftrace: Still disable enabled records marked as disabled
tracing/user_events: Move pages/locks into groups to prepare for namespaces
tracing: Add Masami Hiramatsu as co-maintainer
tracing: Remove unused variable 'dups'
MAINTAINERS: add myself as a tracing reviewer
ring-buffer: Fix race between reset page and reading page
tracing/user_events: Update ABI documentation to align to bits vs bytes
tracing/user_events: Use bits vs bytes for enabled status page data
tracing/user_events: Use refcount instead of atomic for ref tracking
tracing/user_events: Ensure user provided strings are safely formatted
tracing/user_events: Use WRITE instead of READ for io vector import
tracing/user_events: Use NULL for strstr checks
tracing: Fix spelling mistake "preapre" -> "prepare"
tracing: Wake up waiters when tracing is disabled
tracing: Add ioctl() to force ring buffer waiters to wake up
tracing: Wake up ring buffer waiters on closing of the file
ring-buffer: Add ring_buffer_wake_waiters()
...
Linus Torvalds [Mon, 10 Oct 2022 19:18:28 +0000 (12:18 -0700)]
Merge tag 'sysctl-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux
Pull sysctl updates from Luis Chamberlain:
"Just some boring cleanups on the sysctl front for this release"
* tag 'sysctl-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux:
kernel/sysctl-test: use SYSCTL_{ZERO/ONE_HUNDRED} instead of i_{zero/one_hundred}
kernel/sysctl.c: move sysctl_vals and sysctl_long_vals to sysctl.c
sysctl: remove max_extfrag_threshold
kernel/sysctl.c: remove unnecessary (void*) conversions
proc: remove initialization assignment
Linus Torvalds [Mon, 10 Oct 2022 19:13:36 +0000 (12:13 -0700)]
Merge tag 'modules-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux
Pull module updates from Luis Chamberlain:
- minor enhancement for sysfs compression string (David Disseldorp)
- debugfs interface to view unloaded tainted modules (Aaron Tomlin)
* tag 'modules-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux:
module/decompress: generate sysfs string at compile time
module: Add debugfs interface to view unloaded tainted modules
Linus Torvalds [Mon, 10 Oct 2022 19:00:45 +0000 (12:00 -0700)]
Merge tag 'kbuild-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:
- Remove potentially incomplete targets when Kbuid is interrupted by
SIGINT etc in case GNU Make may miss to do that when stderr is piped
to another program.
- Rewrite the single target build so it works more correctly.
- Fix rpm-pkg builds with V=1.
- List top-level subdirectories in ./Kbuild.
- Ignore auto-generated __kstrtab_* and __kstrtabns_* symbols in
kallsyms.
- Avoid two different modules in lib/zstd/ having shared code, which
potentially causes building the common code as build-in and modular
back-and-forth.
- Unify two modpost invocations to optimize the build process.
- Remove head-y syntax in favor of linker scripts for placing
particular sections in the head of vmlinux.
- Bump the minimal GNU Make version to 3.82.
- Clean up misc Makefiles and scripts.
* tag 'kbuild-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (41 commits)
docs: bump minimal GNU Make version to 3.82
ia64: simplify esi object addition in Makefile
Revert "kbuild: Check if linker supports the -X option"
kbuild: rebuild .vmlinux.export.o when its prerequisite is updated
kbuild: move modules.builtin(.modinfo) rules to Makefile.vmlinux_o
zstd: Fixing mixed module-builtin objects
kallsyms: ignore __kstrtab_* and __kstrtabns_* symbols
kallsyms: take the input file instead of reading stdin
kallsyms: drop duplicated ignore patterns from kallsyms.c
kbuild: reuse mksysmap output for kallsyms
mksysmap: update comment about __crc_*
kbuild: remove head-y syntax
kbuild: use obj-y instead extra-y for objects placed at the head
kbuild: hide error checker logs for V=1 builds
kbuild: re-run modpost when it is updated
kbuild: unify two modpost invocations
kbuild: move vmlinux.o rule to the top Makefile
kbuild: move .vmlinux.objs rule to Makefile.modpost
kbuild: list sub-directories in ./Kbuild
Makefile.compiler: replace cc-ifversion with compiler-specific macros
...
Linus Torvalds [Mon, 10 Oct 2022 18:36:19 +0000 (11:36 -0700)]
Merge tag 'livepatching-for-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching
Pull livepatching updates from Petr Mladek:
- Fix race between fork and livepatch transition revert
- Add sysfs entry that shows "patched" state for each object (module)
that can be livepatched by the given livepatch
- Some clean up
* tag 'livepatching-for-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching:
selftests/livepatch: add sysfs test
livepatch: add sysfs entry "patched" for each klp_object
selftests/livepatch: normalize sysctl error message
livepatch: Add a missing newline character in klp_module_coming()
livepatch: fix race between fork and KLP transition
Linus Torvalds [Mon, 10 Oct 2022 18:24:19 +0000 (11:24 -0700)]
Merge tag 'printk-for-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux
Pull printk updates from Petr Mladek:
- Initialize pointer hashing using the system workqueue. It avoids
taking locks in printk()/vsprintf() code path
- Misc code clean up
* tag 'printk-for-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
printk: Mark __printk percpu data ready __ro_after_init
printk: Remove bogus comment vs. boot consoles
printk: Remove write only variable nr_ext_console_drivers
printk: Declare log_wait properly
printk: Make pr_flush() static
lib/vsprintf: Initialize vsprintf's pointer hash once the random core is ready.
lib/vsprintf: Remove static_branch_likely() from __ptr_to_hashval().
lib/vnsprintf: add const modifier for param 'bitmap'
Linus Torvalds [Mon, 10 Oct 2022 18:12:25 +0000 (11:12 -0700)]
Merge tag 'cgroup-for-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
- cpuset now support isolated cpus.partition type, which will enable
dynamic CPU isolation
- pids.peak added to remember the max number of pids used
- holes in cgroup namespace plugged
- internal cleanups
* tag 'cgroup-for-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (25 commits)
cgroup: use strscpy() is more robust and safer
iocost_monitor: reorder BlkgIterator
cgroup: simplify code in cgroup_apply_control
cgroup: Make cgroup_get_from_id() prettier
cgroup/cpuset: remove unreachable code
cgroup: Remove CFTYPE_PRESSURE
cgroup: Improve cftype add/rm error handling
kselftest/cgroup: Add cpuset v2 partition root state test
cgroup/cpuset: Update description of cpuset.cpus.partition in cgroup-v2.rst
cgroup/cpuset: Make partition invalid if cpumask change violates exclusivity rule
cgroup/cpuset: Relocate a code block in validate_change()
cgroup/cpuset: Show invalid partition reason string
cgroup/cpuset: Add a new isolated cpus.partition type
cgroup/cpuset: Relax constraints to partition & cpus changes
cgroup/cpuset: Allow no-task partition to have empty cpuset.cpus.effective
cgroup/cpuset: Miscellaneous cleanups & add helper functions
cgroup/cpuset: Enable update_tasks_cpumask() on top_cpuset
cgroup: add pids.peak interface for pids controller
cgroup: Remove data-race around cgrp_dfl_visible
cgroup: Fix build failure when CONFIG_SHRINKER_DEBUG
...
Linus Torvalds [Mon, 10 Oct 2022 17:41:21 +0000 (10:41 -0700)]
Merge tag 'random-6.1-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random
Pull random number generator updates from Jason Donenfeld:
- Huawei reported that when they updated their kernel from 4.4 to
something much newer, some userspace code they had broke, the culprit
being the accidental removal of O_NONBLOCK from /dev/random way back
in 5.6. It's been gone for over 2 years now and this is the first
we've heard of it, but userspace breakage is userspace breakage, so
O_NONBLOCK is now back.
- Use randomness from hardware RNGs much more often during early boot,
at the same interval that crng reseeds are done, from Dominik.
- A semantic change in hardware RNG throttling, so that the hwrng
framework can properly feed random.c with randomness from hardware
RNGs that aren't specifically marked as creditable.
A related patch coming to you via Herbert's hwrng tree depends on
this one, not to compile, but just to function properly, so you may
want to merge this PULL before that one.
- A fix to clamp credited bits from the interrupts pool to the size of
the pool sample. This is mainly just a theoretical fix, as it'd be
pretty hard to exceed it in practice.
- Oracle reported that InfiniBand TCP latency regressed by around
10-15% after a change a few cycles ago made at the request of the RT
folks, in which we hoisted a somewhat rare operation (1 in 1024
times) out of the hard IRQ handler and into a workqueue, a pretty
common and boring pattern.
It turns out, though, that scheduling a worker from there has
overhead of its own, whereas scheduling a timer on that same CPU for
the next jiffy amortizes better and doesn't incur the same overhead.
I also eliminated a cache miss by moving the work_struct (and
subsequently, the timer_list) to below a critical cache line, so that
the more critical members that are accessed on every hard IRQ aren't
split between two cache lines.
- The boot-time initialization of the RNG has been split into two
approximate phases: what we can accomplish before timekeeping is
possible and what we can accomplish after.
This winds up being useful so that we can use RDRAND to seed the RNG
before CONFIG_SLAB_FREELIST_RANDOM=y systems initialize slabs, in
addition to other early uses of randomness. The effect is that
systems with RDRAND (or a bootloader seed) will never see any
warnings at all when setting CONFIG_WARN_ALL_UNSEEDED_RANDOM=y. And
kfence benefits from getting a better seed of its own.
- Small systems without much entropy sometimes wind up putting some
truncated serial number read from flash into hostname, so contribute
utsname changes to the RNG, without crediting.
- Add smaller batches to serve requests for smaller integers, and make
use of them when people ask for random numbers bounded by a given
compile-time constant. This has positive effects all over the tree,
most notably in networking and kfence.
- The original jitter algorithm intended (I believe) to schedule the
timer for the next jiffy, not the next-next jiffy, yet it used
mod_timer(jiffies + 1), which will fire on the next-next jiffy,
instead of what I believe was intended, mod_timer(jiffies), which
will fire on the next jiffy. So fix that.
- Fix a comment typo, from William.
* tag 'random-6.1-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random:
random: clear new batches when bringing new CPUs online
random: fix typos in get_random_bytes() comment
random: schedule jitter credit for next jiffy, not in two jiffies
prandom: make use of smaller types in prandom_u32_max
random: add 8-bit and 16-bit batches
utsname: contribute changes to RNG
random: use init_utsname() instead of utsname()
kfence: use better stack hash seed
random: split initialization into early step and later step
random: use expired timer rather than wq for mixing fast pool
random: avoid reading two cache lines on irq randomness
random: clamp credited irq bits to maximum mixed
random: throttle hwrng writes if no entropy is credited
random: use hwgenerator randomness more frequently at early boot
random: restore O_NONBLOCK support
Linus Torvalds [Mon, 10 Oct 2022 17:21:22 +0000 (10:21 -0700)]
Merge tag 'slab-for-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab
Pull slab fixes from Vlastimil Babka:
- The "common kmalloc v4" series [1] by Hyeonggon Yoo.
While the plan after LPC is to try again if it's possible to get rid
of SLOB and SLAB (and if any critical aspect of those is not possible
to achieve with SLUB today, modify it accordingly), it will take a
while even in case there are no objections.
Meanwhile this is a nice cleanup and some parts (e.g. to the
tracepoints) will be useful even if we end up with a single slab
implementation in the future:
- Improves the mm/slab_common.c wrappers to allow deleting
duplicated code between SLAB and SLUB.
- Large kmalloc() allocations in SLAB are passed to page allocator
like in SLUB, reducing number of kmalloc caches.
- Removes the {kmem_cache_alloc,kmalloc}_node variants of
tracepoints, node id parameter added to non-_node variants.
- Addition of kmalloc_size_roundup()
The first two patches from a series by Kees Cook [2] that introduce
kmalloc_size_roundup(). This will allow merging of per-subsystem
patches using the new function and ultimately stop (ab)using ksize()
in a way that causes ongoing trouble for debugging functionality and
static checkers.
- Wasted kmalloc() memory tracking in debugfs alloc_traces
A patch from Feng Tang that enhances the existing debugfs
alloc_traces file for kmalloc caches with information about how much
space is wasted by allocations that needs less space than the
particular kmalloc cache provides.
- My series [3] to fix validation races for caches with enabled
debugging:
- By decoupling the debug cache operation more from non-debug
fastpaths, extra locking simplifications were possible and thus
done afterwards.
- Additional cleanup of PREEMPT_RT specific code on top, by Thomas
Gleixner.
- A late fix for slab page leaks caused by the series, by Feng
Tang.
- Smaller fixes and cleanups:
- Unneeded variable removals, by ye xingchen
- A cleanup removing a BUG_ON() in create_unique_id(), by Chao Yu
Link: https://lore.kernel.org/all/20220817101826.236819-1-42.hyeyoo@gmail.com/ Link: https://lore.kernel.org/all/20220923202822.2667581-1-keescook@chromium.org/ Link: https://lore.kernel.org/all/20220823170400.26546-1-vbabka@suse.cz/
* tag 'slab-for-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab: (30 commits)
mm/slub: fix a slab missed to be freed problem
slab: Introduce kmalloc_size_roundup()
slab: Remove __malloc attribute from realloc functions
mm/slub: clean up create_unique_id()
mm/slub: enable debugging memory wasting of kmalloc
slub: Make PREEMPT_RT support less convoluted
mm/slub: simplify __cmpxchg_double_slab() and slab_[un]lock()
mm/slub: convert object_map_lock to non-raw spinlock
mm/slub: remove slab_lock() usage for debug operations
mm/slub: restrict sysfs validation to debug caches and make it safe
mm/sl[au]b: check if large object is valid in __ksize()
mm/slab_common: move declaration of __ksize() to mm/slab.h
mm/slab_common: drop kmem_alloc & avoid dereferencing fields when not using
mm/slab_common: unify NUMA and UMA version of tracepoints
mm/sl[au]b: cleanup kmem_cache_alloc[_node]_trace()
mm/sl[au]b: generalize kmalloc subsystem
mm/slub: move free_debug_processing() further
mm/sl[au]b: introduce common alloc/free functions without tracepoint
mm/slab: kmalloc: pass requests larger than order-1 page to page allocator
mm/slab_common: cleanup kmalloc_large()
...
Linus Torvalds [Mon, 10 Oct 2022 17:16:00 +0000 (10:16 -0700)]
Merge tag 'timers-core-2022-10-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
"A boring time, timekeeping, timers update:
- No core code changes
- No new clocksource/event driver
- Cleanup of the TI DM clocksource/event driver
- The usual set of device tree binding updates
- Small improvement, fixes and cleanups all over the place"
* tag 'timers-core-2022-10-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits)
clocksource/drivers/arm_arch_timer: Fix CNTPCT_LO and CNTVCT_LO value
clocksource/drivers/imx-sysctr: handle nxp,no-divider property
dt-bindings: timer: nxp,sysctr-timer: add nxp,no-divider property
clocksource/drivers/timer-ti-dm: Get clock in probe with devm_clk_get()
clocksource/drivers/timer-ti-dm: Add flag to detect omap1
clocksource/drivers/timer-ti-dm: Move struct omap_dm_timer fields to driver
clocksource/drivers/timer-ti-dm: Use runtime PM directly and check errors
clocksource/drivers/timer-ti-dm: Move private defines to the driver
clocksource/drivers/timer-ti-dm: Simplify register access further
clocksource/drivers/timer-ti-dm: Simplify register writes with dmtimer_write()
clocksource/drivers/timer-ti-dm: Simplify register reads with dmtimer_read()
clocksource/drivers/timer-ti-dm: Drop unused functions
clocksource/drivers/timer-gxp: Add missing error handling in gxp_timer_probe
clocksource/drivers/arm_arch_timer: Fix handling of ARM erratum 858921
clocksource/drivers/exynos_mct: Enable building on ARTPEC
clocksource/drivers/exynos_mct: Support local-timers property
clocksource/drivers/exynos_mct: Support frc-shared property
dt-bindings: timer: exynos4210-mct: Add ARTPEC-8 MCT support
clocksource/drivers/sun4i: Add definition of clear interrupt
clocksource/drivers/renesas-ostm: Add support for RZ/V2L SoC
...
Linus Torvalds [Mon, 10 Oct 2022 17:03:24 +0000 (10:03 -0700)]
Merge tag 'sched-rt-2022-10-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull preempt RT updates from Thomas Gleixner:
"Introduce preempt_[dis|enable_nested() and use it to clean up various
places which have open coded PREEMPT_RT conditionals.
On PREEMPT_RT enabled kernels, spinlocks and rwlocks are neither
disabling preemption nor interrupts. Though there are a few places
which depend on the implicit preemption/interrupt disable of those
locks, e.g. seqcount write sections, per CPU statistics updates etc.
PREEMPT_RT added open coded CONFIG_PREEMPT_RT conditionals to
disable/enable preemption in the related code parts all over the
place. That's hard to read and does not really explain why this is
necessary.
Linus suggested to use helper functions (preempt_disable_nested() and
preempt_enable_nested()) and use those in the affected places. On !RT
enabled kernels these functions are NOPs, but contain a lockdep assert
to validate that preemption is actually disabled to catch call sites
which do not have preemption disabled.
Clean up the affected code paths in mm, dentry and lib"
* tag 'sched-rt-2022-10-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
u64_stats: Streamline the implementation
flex_proportions: Disable preemption entering the write section.
mm/compaction: Get rid of RT ifdeffery
mm/memcontrol: Replace the PREEMPT_RT conditionals
mm/debug: Provide VM_WARN_ON_IRQS_ENABLED()
mm/vmstat: Use preempt_[dis|en]able_nested()
dentry: Use preempt_[dis|en]able_nested()
preempt: Provide preempt_[dis|en]able_nested()
Linus Torvalds [Mon, 10 Oct 2022 16:56:19 +0000 (09:56 -0700)]
Merge tag 'objtool-core-2022-10-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull objtool updates from Ingo Molnar:
- Remove the "ANNOTATE_NOENDBR on ENDBR" warning: it's not really
useful and only found a non-bug false positive so far.
- Properly decode LOOP/LOOPE/LOOPNE, which were missing from the x86
decoder. Because these instructions are rather ineffective, they
never showed up in compiler output, but they are simple enough to
support, so add them for completeness.
- A bit more cross-arch preparatory work.
* tag 'objtool-core-2022-10-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
objtool,x86: Teach decode about LOOP* instructions
objtool: Remove "ANNOTATE_NOENDBR on ENDBR" warning
objtool: Use arch_jump_destination() in read_intra_function_calls()
Linus Torvalds [Mon, 10 Oct 2022 16:44:12 +0000 (09:44 -0700)]
Merge tag 'locking-core-2022-10-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
- Disable preemption in rwsem_write_trylock()'s attempt to take the
rwsem, to avoid RT tasks hogging the CPU, which managed to preempt
this function after the owner has been cleared but before a new owner
is set. Also add debug checks to enforce this.
- Add __lockfunc to more slow path functions and add __sched to
semaphore functions.
- Mark spinlock APIs noinline when the respective CONFIG_INLINE_SPIN_*
toggles are disabled, to reduce LTO text size.
- Print more debug information when lockdep gets confused in
look_up_lock_class().
- Improve header file abuse checks.
- Misc cleanups
* tag 'locking-core-2022-10-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/lockdep: Print more debug information - report name and key when look_up_lock_class() got confused
locking: Add __sched to semaphore functions
locking/rwsem: Disable preemption while trying for rwsem lock
locking: Detect includes rwlock.h outside of spinlock.h
locking: Add __lockfunc to slow path functions
locking/spinlocks: Mark spinlocks noinline when inline spinlocks are disabled
selftests: futex: Fix 'the the' typo in comment