Linus Torvalds [Thu, 24 Feb 2022 19:08:15 +0000 (11:08 -0800)]
Merge tag 'io_uring-5.17-2022-02-23' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
- Add a conditional schedule point in io_add_buffers() (Eric)
- Fix for a quiesce speedup merged in this release (Dylan)
- Don't convert to jiffies for event timeout waiting, it's way too
coarse when we accept a timespec as input (me)
* tag 'io_uring-5.17-2022-02-23' of git://git.kernel.dk/linux-block:
io_uring: disallow modification of rsrc_data during quiesce
io_uring: don't convert to jiffies for waiting on timeouts
io_uring: add a schedule point in io_add_buffers()
Linus Torvalds [Thu, 24 Feb 2022 18:42:20 +0000 (10:42 -0800)]
Merge tag 'platform-drivers-x86-v5.17-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull more x86 platform driver fixes from Hans de Goede:
"Two more fixes:
- Fix suspend/resume regression on AMD Cezanne APUs in >= 5.16
- Fix Microsoft Surface 3 battery readings"
* tag 'platform-drivers-x86-v5.17-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
surface: surface3_power: Fix battery readings on batteries without a serial number
platform/x86: amd-pmc: Set QOS during suspend on CZN w/ timer wakeup
Hans de Goede [Thu, 24 Feb 2022 10:18:48 +0000 (11:18 +0100)]
surface: surface3_power: Fix battery readings on batteries without a serial number
The battery on the 2nd hand Surface 3 which I recently bought appears to
not have a serial number programmed in. This results in any I2C reads from
the registers containing the serial number failing with an I2C NACK.
This was causing mshw0011_bix() to fail causing the battery readings to
not work at all.
Ignore EREMOTEIO (I2C NACK) errors when retrieving the serial number and
continue with an empty serial number to fix this.
platform/x86: amd-pmc: Set QOS during suspend on CZN w/ timer wakeup
commit 5ce9f7468cc3 ("platform/x86: amd-pmc: Add special handling for
timer based S0i3 wakeup") adds support for using another platform timer
in lieu of the RTC which doesn't work properly on some systems. This path
was validated and worked well before submission. During the 5.16-rc1 merge
window other patches were merged that caused this to stop working properly.
When this feature was used with 5.16-rc1 or later some OEM laptops with the
matching firmware requirements from that commit would shutdown instead of
program a timer based wakeup.
This was bisected to commit f146da28dd2f ("PM: suspend: Do not pause
cpuidle in the suspend-to-idle path"). This wasn't supposed to cause any
negative impacts and also tested well on both Intel and ARM platforms.
However this changed the semantics of when CPUs are allowed to be in the
deepest state. For the AMD systems in question it appears this causes a
firmware crash for timer based wakeup.
It's hypothesized to be caused by the `amd-pmc` driver sending `OS_HINT`
and all the CPUs going into a deep state while the timer is still being
programmed. It's likely a firmware bug, but to avoid it don't allow setting
CPUs into the deepest state while using CZN timer wakeup path.
If later it's discovered that this also occurs from "regular" suspends
without a timer as well or on other silicon, this may be later expanded to
run in the suspend path for more scenarios.
Linus Torvalds [Thu, 24 Feb 2022 01:25:22 +0000 (17:25 -0800)]
Merge tag 'devicetree-fixes-for-5.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull devicetree fixes from Rob Herring:
- Update some maintainers email addresses
- Fix handling of elfcorehdr reservation for crash dump kernel
- Fix unittest expected warnings text
* tag 'devicetree-fixes-for-5.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
dt-bindings: update Roger Quadros email
MAINTAINERS: sifive: drop Yash Shah
of/fdt: move elfcorehdr reservation early for crash dump kernel
of: unittest: update text of expected warnings
Linus Torvalds [Wed, 23 Feb 2022 20:06:23 +0000 (12:06 -0800)]
Merge tag 'for-5.17/parisc-4' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc unaligned handler fixes from Helge Deller:
"Two patches which fix a few bugs in the unalignment handlers.
The fldd and fstd instructions weren't handled at all on 32-bit
kernels, the stw instruction didn't check for fault errors and the
fldw_l and ldw_m were handled wrongly as integer vs floating point
instructions.
Both patches are tagged for stable series"
* tag 'for-5.17/parisc-4' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc/unaligned: Fix ldw() and stw() unalignment handlers
parisc/unaligned: Fix fldd and fstd unaligned handlers on 32-bit kernel
Linus Torvalds [Wed, 23 Feb 2022 19:51:35 +0000 (11:51 -0800)]
Merge tag 'hwmon-for-v5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck:
"Fix two old bugs and one new bug in the hwmon subsystem:
- In pmbus core, clear pmbus fault/warning status bits after read to
follow PMBus standard
- In hwmon core, handle failure to register sensor with thermal zone
correctly
- In ntc_thermal driver, use valid thermistor names for Samsung
thermistors"
* tag 'hwmon-for-v5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (pmbus) Clear pmbus fault/warning bits after read
hwmon: Handle failure to register sensor with thermal zone correctly
hwmon: (ntc_thermistor) Underscore Samsung thermistor
Linus Torvalds [Wed, 23 Feb 2022 19:33:12 +0000 (11:33 -0800)]
Merge tag 'slab-for-5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab
Pull slab fixes from Vlastimil Babka:
- Build fix (workaround) for clang.
- Fix a /proc/kcore based slabinfo script broken by struct slab changes
in 5.17-rc1.
* tag 'slab-for-5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
tools/cgroup/slabinfo: update to work with struct slab
slab: remove __alloc_size attribute from __kmalloc_track_caller
Helge Deller [Fri, 18 Feb 2022 08:25:20 +0000 (09:25 +0100)]
parisc/unaligned: Fix fldd and fstd unaligned handlers on 32-bit kernel
Usually the kernel provides fixup routines to emulate the fldd and fstd
floating-point instructions if they load or store 8-byte from/to a not
natuarally aligned memory location.
On a 32-bit kernel I noticed that those unaligned handlers didn't worked and
instead the application got a SEGV.
While checking the code I found two problems:
First, the OPCODE_FLDD_L and OPCODE_FSTD_L cases were ifdef'ed out by the
CONFIG_PA20 option, and as such those weren't built on a pure 32-bit kernel.
This is now fixed by moving the CONFIG_PA20 #ifdef to prevent the compilation
of OPCODE_LDD_L and OPCODE_FSTD_L only, and handling the fldd and fstd
instructions.
The second problem are two bugs in the 32-bit inline assembly code, where the
wrong registers where used. The calculation of the natural alignment used %2
(vall) instead of %3 (ior), and the first word was stored back to address %1
(valh) instead of %3 (ior).
Linus Torvalds [Wed, 23 Feb 2022 00:14:35 +0000 (16:14 -0800)]
Merge branch 'for-5.17-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:
- Fix for a subtle bug in the recent release_agent permission check
update
- Fix for a long-standing race condition between cpuset and cpu hotplug
- Comment updates
* 'for-5.17-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cpuset: Fix kernel-doc
cgroup-v1: Correct privileges check in release_agent writes
cgroup: clarify cgroup_css_set_fork()
cgroup/cpuset: Fix a race between cpuset_attach() and cpu hotplug
Ondrej Mosnacek [Mon, 21 Feb 2022 14:06:49 +0000 (15:06 +0100)]
selinux: fix misuse of mutex_is_locked()
mutex_is_locked() tests whether the mutex is locked *by any task*, while
here we want to test if it is held *by the current task*. To avoid
false/missed WARNINGs, use lockdep_assert_is_held() and
lockdep_assert_is_not_held() instead, which do the right thing (though
they are a no-op if CONFIG_LOCKDEP=n).
Cc: stable@vger.kernel.org Fixes: 530fc4905dda ("selinux: measure state and policy capabilities") Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
Emails to Roger Quadros TI account bounce with:
550 Invalid recipient <rogerq@ti.com> (#5.1.1)
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Acked-by: Roger Quadros <rogerq@kernel.org> Acked-By: Vinod Koul <vkoul@kernel.org> Acked-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20220221100701.48593-1-krzysztof.kozlowski@canonical.com
Jiapeng Chong [Wed, 16 Feb 2022 03:17:53 +0000 (11:17 +0800)]
cpuset: Fix kernel-doc
Fix the following W=1 kernel warnings:
kernel/cgroup/cpuset.c:3718: warning: expecting prototype for
cpuset_memory_pressure_bump(). Prototype was for
__cpuset_memory_pressure_bump() instead.
kernel/cgroup/cpuset.c:3568: warning: expecting prototype for
cpuset_node_allowed(). Prototype was for __cpuset_node_allowed()
instead.
Michal Koutný [Thu, 17 Feb 2022 16:11:28 +0000 (17:11 +0100)]
cgroup-v1: Correct privileges check in release_agent writes
The idea is to check: a) the owning user_ns of cgroup_ns, b)
capabilities in init_user_ns.
The commit e6a88a3740c5 ("cgroup-v1: Require capabilities to set
release_agent") got this wrong in the write handler of release_agent
since it checked user_ns of the opener (may be different from the owning
user_ns of cgroup_ns).
Secondly, to avoid possibly confused deputy, the capability of the
opener must be checked.
Fixes: e6a88a3740c5 ("cgroup-v1: Require capabilities to set release_agent") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/stable/20220216121142.GB30035@blackbody.suse.cz/ Signed-off-by: Michal Koutný <mkoutny@suse.com> Reviewed-by: Masami Ichikawa(CIP) <masami.ichikawa@cybertrust.co.jp> Signed-off-by: Tejun Heo <tj@kernel.org>
With recent fixes for the permission checking when moving a task into a cgroup
using a file descriptor to a cgroup's cgroup.procs file and calling write() it
seems a good idea to clarify CLONE_INTO_CGROUP permission checking with a
comment.
Dylan Yudaken [Tue, 22 Feb 2022 16:17:51 +0000 (08:17 -0800)]
io_uring: disallow modification of rsrc_data during quiesce
io_rsrc_ref_quiesce will unlock the uring while it waits for references to
the io_rsrc_data to be killed.
There are other places to the data that might add references to data via
calls to io_rsrc_node_switch.
There is a race condition where this reference can be added after the
completion has been signalled. At this point the io_rsrc_ref_quiesce call
will wake up and relock the uring, assuming the data is unused and can be
freed - although it is actually being used.
To fix this check in io_rsrc_ref_quiesce if a resource has been revived.
Vikash Chandola [Tue, 22 Feb 2022 13:12:53 +0000 (13:12 +0000)]
hwmon: (pmbus) Clear pmbus fault/warning bits after read
Almost all fault/warning bits in pmbus status registers remain set even
after fault/warning condition are removed. As per pmbus specification
these faults must be cleared by user.
Modify hwmon behavior to clear fault/warning bit after fetching data if
fault/warning bit was set. This allows to get fresh data in next read.
Guenter Roeck [Mon, 21 Feb 2022 16:32:14 +0000 (08:32 -0800)]
hwmon: Handle failure to register sensor with thermal zone correctly
If an attempt is made to a sensor with a thermal zone and it fails,
the call to devm_thermal_zone_of_sensor_register() may return -ENODEV.
This may result in crashes similar to the following.
The hwmon core needs to handle all errors returned from calls
to devm_thermal_zone_of_sensor_register(). If the call fails
with -ENODEV, report that the sensor was not attached to a
thermal zone but continue to register the hwmon device.
Linus Torvalds [Mon, 21 Feb 2022 17:10:53 +0000 (09:10 -0800)]
Merge tag 'platform-drivers-x86-v5.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver fixes from Hans de Goede:
"Two small fixes and one hardware-id addition"
* tag 'platform-drivers-x86-v5.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86: int3472: Add terminator to gpiod_lookup_table
platform/x86: asus-wmi: Fix regression when probing for fan curve control
platform/x86: thinkpad_acpi: Add dual-fan quirk for T15g (2nd gen)
Max Kellermann [Mon, 21 Feb 2022 10:03:13 +0000 (11:03 +0100)]
lib/iov_iter: initialize "flags" in new pipe_buffer
The functions copy_page_to_iter_pipe() and push_pipe() can both
allocate a new pipe_buffer, but the "flags" member initializer is
missing.
Fixes: 96607715cf28 ("new iov_iter flavour: pipe-backed")
To: Alexander Viro <viro@zeniv.linux.org.uk>
To: linux-fsdevel@vger.kernel.org
To: linux-kernel@vger.kernel.org Cc: stable@vger.kernel.org Signed-off-by: Max Kellermann <max.kellermann@ionos.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Daniel Scally [Wed, 16 Feb 2022 22:53:02 +0000 (22:53 +0000)]
platform/x86: int3472: Add terminator to gpiod_lookup_table
Without the terminator, if a con_id is passed to gpio_find() that
does not exist in the lookup table the function will not stop looping
correctly, and eventually cause an oops.
Fixes: dd65ed821fe3 ("platform/x86: int3472: Pass tps68470_regulator_platform_data to the tps68470-regulator MFD-cell") Signed-off-by: Daniel Scally <djrscally@gmail.com> Link: https://lore.kernel.org/r/20220216225304.53911-5-djrscally@gmail.com Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Jens Axboe [Mon, 21 Feb 2022 12:49:30 +0000 (05:49 -0700)]
io_uring: don't convert to jiffies for waiting on timeouts
If an application calls io_uring_enter(2) with a timespec passed in,
convert that timespec to ktime_t rather than jiffies. The latter does
not provide the granularity the application may expect, and may in
fact provided different granularity on different systems, depending
on what the HZ value is configured at.
Turn the timespec into an absolute ktime_t, and use that with
schedule_hrtimeout() instead.
Roman Gushchin [Wed, 16 Feb 2022 20:43:30 +0000 (12:43 -0800)]
tools/cgroup/slabinfo: update to work with struct slab
After the introduction of the dedicated struct slab to describe slab
pages by commit 72c5556853bb ("mm: Split slab into its own type") and
the following removal of the corresponding struct page's fields by
commit b8cdde2f41ce ("mm: Remove slab from struct page") the
memcg_slabinfo tool broke. An attempt to run it produces a trace like
this:
Traceback (most recent call last):
File "/usr/bin/drgn", line 33, in <module>
sys.exit(load_entry_point('drgn==0.0.16', 'console_scripts', 'drgn')())
File "/usr/lib64/python3.9/site-packages/drgn/internal/cli.py", line 133, in main
runpy.run_path(args.script[0], init_globals=init_globals, run_name="__main__")
File "/usr/lib64/python3.9/runpy.py", line 268, in run_path
return _run_module_code(code, init_globals, run_name,
File "/usr/lib64/python3.9/runpy.py", line 97, in _run_module_code
_run_code(code, mod_globals, init_globals,
File "/usr/lib64/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "memcg_slabinfo.py", line 226, in <module>
main()
File "memcg_slabinfo.py", line 199, in main
cache = page.slab_cache
AttributeError: 'struct page' has no member 'slab_cache'
The problem can be fixed by explicitly casting struct page * to struct
slab * for slab pages. The tools works as expected with this fix, e.g.:
slab: remove __alloc_size attribute from __kmalloc_track_caller
Commit 7acd3f7944a5 ("slab: add __alloc_size attributes for better
bounds checking") added __alloc_size attributes to a bunch of kmalloc
function prototypes. Unfortunately the change to __kmalloc_track_caller
seems to cause clang to generate broken code and the first time this is
called when booting, the box will crash.
While the compiler problems are being reworked and attempted to be
solved [1], let's just drop the attribute to solve the issue now. Once
it is resolved it can be added back.
Linus Torvalds [Sun, 20 Feb 2022 20:04:14 +0000 (12:04 -0800)]
Merge tag 'edac_urgent_for_v5.17_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras
Pull EDAC fix from Borislav Petkov:
"Fix a long-standing struct alignment bug in the EDAC struct allocation
code"
* tag 'edac_urgent_for_v5.17_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
EDAC: Fix calculation of returned address and next offset in edac_align_ptr()
Linus Torvalds [Sun, 20 Feb 2022 19:51:49 +0000 (11:51 -0800)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Three fixes, all in drivers.
The ufs and qedi fixes are minor; the lpfc one is a bit bigger because
it involves adding a heuristic to detect and deal with common but not
standards compliant behaviour"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ufs: core: Fix divide by zero in ufshcd_map_queues()
scsi: lpfc: Fix pt2pt NVMe PRLI reject LOGO loop
scsi: qedi: Fix ABBA deadlock in qedi_process_tmf_resp() and qedi_process_cmd_cleanup_resp()
Linus Torvalds [Sun, 20 Feb 2022 19:30:18 +0000 (11:30 -0800)]
Merge tag 'dmaengine-fix-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine
Pull dmaengine fixes from Vinod Koul:
"A bunch of driver fixes for:
- ptdma error handling in init
- lock fix in at_hdmac
- error path and error num fix for sh dma
- pm balance fix for stm32"
* tag 'dmaengine-fix-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine:
dmaengine: shdma: Fix runtime PM imbalance on error
dmaengine: sh: rcar-dmac: Check for error num after dma_set_max_seg_size
dmaengine: stm32-dmamux: Fix PM disable depth imbalance in stm32_dmamux_probe
dmaengine: sh: rcar-dmac: Check for error num after setting mask
dmaengine: at_xdmac: Fix missing unlock in at_xdmac_tasklet()
dmaengine: ptdma: Fix the error handling path in pt_core_init()
Linus Torvalds [Sun, 20 Feb 2022 19:23:48 +0000 (11:23 -0800)]
Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"Some driver updates, a MAINTAINERS fix, and additions to COMPILE_TEST
(so we won't miss build problems again)"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
MAINTAINERS: remove duplicate entry for i2c-qcom-geni
i2c: brcmstb: fix support for DSL and CM variants
i2c: qup: allow COMPILE_TEST
i2c: imx: allow COMPILE_TEST
i2c: cadence: allow COMPILE_TEST
i2c: qcom-cci: don't put a device tree node before i2c_add_adapter()
i2c: qcom-cci: don't delete an unregistered adapter
i2c: bcm2835: Avoid clock stretching timeouts
Linus Torvalds [Sun, 20 Feb 2022 19:15:46 +0000 (11:15 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input fixes from Dmitry Torokhov:
- a fix for Synaptics touchpads in RMI4 mode failing to suspend/resume
properly because I2C client devices are now being suspended and
resumed asynchronously which changed the ordering
- a change to make sure we do not set right and middle buttons
capabilities on touchpads that are "buttonpads" (i.e. do not have
separate physical buttons)
- a change to zinitix touchscreen driver adding more compatible
strings/IDs
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: psmouse - set up dependency between PS/2 and SMBus companions
Input: zinitix - add new compatible strings
Input: clear BTN_RIGHT/MIDDLE on buttonpads
Linus Torvalds [Sun, 20 Feb 2022 19:07:46 +0000 (11:07 -0800)]
Merge tag 'for-v5.17-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply
Pull power supply fixes from Sebastian Reichel:
"Three regression fixes for the 5.17 cycle:
- build warning fix for power-supply documentation
- pointer size fix in cw2015 battery driver
- OOM handling in bq256xx charger driver"
* tag 'for-v5.17-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply:
power: supply: bq256xx: Handle OOM correctly
power: supply: core: fix application of sizeof to pointer
power: supply: fix table problem in sysfs-class-power
Linus Torvalds [Sun, 20 Feb 2022 19:01:47 +0000 (11:01 -0800)]
Merge tag 'fs.mount_setattr.v5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull mount_setattr test/doc fixes from Christian Brauner:
"This contains a fix for one of the selftests for the mount_setattr
syscall to create idmapped mounts, an entry for idmapped mounts for
maintainers, and missing kernel documentation for the helper we split
out some time ago to get and yield write access to a mount when
changing mount properties"
* tag 'fs.mount_setattr.v5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
fs: add kernel doc for mnt_{hold,unhold}_writers()
MAINTAINERS: add entry for idmapped mounts
tests: fix idmapped mount_setattr test
Linus Torvalds [Sun, 20 Feb 2022 18:55:05 +0000 (10:55 -0800)]
Merge tag 'pidfd.v5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull pidfd fix from Christian Brauner:
"This fixes a problem reported by lockdep when installing a pidfd via
fd_install() with siglock and the tasklisk write lock held in
copy_process() when calling clone()/clone3() with CLONE_PIDFD.
Originally a pidfd was created prior to holding any of these locks but
this required a call to ksys_close(). So quite some time ago in 9506a95c5a8d ("copy_process(): don't use ksys_close() on cleanups") we
switched to a get_unused_fd_flags() + fd_install() model.
As part of that we moved fd_install() as late as possible. This was
done for two main reasons. First, because we needed to ensure that we
call fd_install() past the point of no return as once that's called
the fd is live in the task's file table. Second, because we tried to
ensure that the fd is visible in /proc/<pid>/fd/<pidfd> right when the
task is visible.
This fix moves the fd_install() to an even later point which means
that a task will be visible in proc while the pidfd isn't yet under
/proc/<pid>/fd/<pidfd>.
While this is a user visible change it's very unlikely that this will
have any impact. Nobody should be relying on that and if they do we
need to come up with something better but again, it's doubtful this is
relevant"
* tag 'pidfd.v5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
copy_process(): Move fd_install() out of sighand->siglock critical section
Linus Torvalds [Sun, 20 Feb 2022 18:44:11 +0000 (10:44 -0800)]
Merge branch 'ucount-rlimit-fixes-for-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull ucounts fixes from Eric Biederman:
"Michal Koutný recently found some bugs in the enforcement of
RLIMIT_NPROC in the recent ucount rlimit implementation.
In this set of patches I have developed a very conservative approach
changing only what is necessary to fix the bugs that I can see
clearly. Cleanups and anything that is making the code more consistent
can follow after we have the code working as it has historically.
The problem is not so much inconsistencies (although those exist) but
that it is very difficult to figure out what the code should be doing
in the case of RLIMIT_NPROC.
All other rlimits are only enforced where the resource is acquired
(allocated). RLIMIT_NPROC by necessity needs to be enforced in an
additional location, and our current implementation stumbled it's way
into that implementation"
* 'ucount-rlimit-fixes-for-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
ucounts: Handle wrapping in is_ucounts_overlimit
ucounts: Move RLIMIT_NPROC handling after set_user
ucounts: Base set_cred_ucounts changes on the real user
ucounts: Enforce RLIMIT_NPROC not RLIMIT_NPROC+1
rlimit: Fix RLIMIT_NPROC enforcement failure caused by capability calls in set_user
Wolfram Sang [Fri, 18 Feb 2022 10:49:04 +0000 (11:49 +0100)]
MAINTAINERS: remove duplicate entry for i2c-qcom-geni
The driver is already covered in the ARM/QUALCOMM section. Also, Akash
Asthana's email bounces meanwhile and Mukesh Savaliya has never
responded to mails regarding this driver.
Signed-off-by: Wolfram Sang <wsa@kernel.org> Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: Wolfram Sang <wsa@kernel.org>
Peter Zijlstra [Mon, 14 Feb 2022 09:16:57 +0000 (10:16 +0100)]
sched: Fix yet more sched_fork() races
Where commit 6e9fcdf29782 ("kernel/sched: Fix sched_fork() access an
invalid sched_task_group") fixed a fork race vs cgroup, it opened up a
race vs syscalls by not placing the task on the runqueue before it
gets exposed through the pidhash.
Commit c4692466afa2 ("sched/fair: Fix fault in reweight_entity") is
trying to fix a single instance of this, instead fix the whole class
of issues, effectively reverting this commit.
Linus Torvalds [Sat, 19 Feb 2022 00:24:44 +0000 (16:24 -0800)]
Merge tag 'nfs-for-5.17-3' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client bugfixes from Anna Schumaker:
- Fix unnecessary changeattr revalidations
- Fix resolving symlinks during directory lookups
- Don't report writeback errors in nfs_getattr()
* tag 'nfs-for-5.17-3' of git://git.linux-nfs.org/projects/anna/linux-nfs:
NFS: Do not report writeback errors in nfs_getattr()
NFS: LOOKUP_DIRECTORY is also ok with symlinks
NFS: Remove an incorrect revalidation in nfs4_update_changeattr_locked()
Linus Torvalds [Sat, 19 Feb 2022 00:19:14 +0000 (16:19 -0800)]
Merge tag 'acpi-5.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"These make an excess warning message go away and fix a recently
introduced boot failure on a vintage machine.
Specifics:
- Change the log level of the "table not found" message in
acpi_table_parse_entries_array() to debug to prevent it from
showing up in the logs unnecessarily (Dan Williams)
- Add a C-state limit quirk for 32-bit ThinkPad T40 to prevent it
from crashing on boot after recent changes in the ACPI processor
driver (Woody Suwalski)"
* tag 'acpi-5.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: processor: idle: fix lockup regression on 32-bit ThinkPad T40
ACPI: tables: Quiet ACPI table not found warning
Linus Torvalds [Sat, 19 Feb 2022 00:14:13 +0000 (16:14 -0800)]
Merge tag 'riscv-for-linus-5.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
"A set of three fixes, all aimed at fixing some fallout from the recent
sparse hart ID support"
* tag 'riscv-for-linus-5.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
RISC-V: Fix IPI/RFENCE hmask on non-monotonic hartid ordering
RISC-V: Fix handling of empty cpu masks
RISC-V: Fix hartid mask handling for hartid 31 and up
Dmitry Torokhov [Tue, 15 Feb 2022 21:32:26 +0000 (13:32 -0800)]
Input: psmouse - set up dependency between PS/2 and SMBus companions
When we switch from emulated PS/2 to native (RMI4 or Elan) protocols, we
create SMBus companion devices that are attached to I2C/SMBus controllers.
However, when suspending and resuming, we also need to make sure that we
take into account the PS/2 device they are associated with, so that PS/2
device is suspended after the companion and resumed before it, otherwise
companions will not work properly. Before I2C devices were marked for
asynchronous suspend/resume, this ordering happened naturally, but now we
need to enforce it by establishing device links, with PS/2 devices being
suppliers and SMBus companions being consumers.
Linus Torvalds [Fri, 18 Feb 2022 17:27:10 +0000 (09:27 -0800)]
Merge tag 'block-5.17-2022-02-17' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
- Surprise removal fix (Christoph)
- Ensure that pages are zeroed before submitted for userspace IO
(Haimin)
- Fix blk-wbt accounting issue with BFQ (Laibin)
- Use bsize for discard granularity in loop (Ming)
- Fix missing zone handling in blk_complete_request() (Pankaj)
* tag 'block-5.17-2022-02-17' of git://git.kernel.dk/linux-block:
block/wbt: fix negative inflight counter when remove scsi device
block: fix surprise removal for drivers calling blk_set_queue_dying
block-map: add __GFP_ZERO flag for alloc_page in function bio_copy_kern
block: loop:use kstatfs.f_bsize of backing file to set discard granularity
block: Add handling for zone append command in blk_complete_request
- Regression fix for Realtek HD-audio mutex deadlock
- Regression fix for USB-audio PM resume error
- More coverage of ASoC core control API notification fixes
- Old regression fixes for HD-audio probe mask
- Fixes for ASoC Realtek codec work handling
- Other device-specific quirks / fixes"
* tag 'sound-5.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (24 commits)
ASoC: intel: skylake: Set max DMA segment size
ASoC: SOF: hda: Set max DMA segment size
ALSA: hda: Set max DMA segment size
ALSA: hda/realtek: Fix deadlock by COEF mutex
ALSA: usb-audio: Don't abort resume upon errors
ALSA: hda: Fix missing codec probe on Shenker Dock 15
ALSA: hda: Fix regression on forced probe mask option
ALSA: hda/realtek: Add quirk for Legion Y9000X 2019
ALSA: usb-audio: revert to IMPLICIT_FB_FIXED_DEV for M-Audio FastTrack Ultra
ASoC: wm_adsp: Correct control read size when parsing compressed buffer
ASoC: qcom: Actually clear DMA interrupt register for HDMI
ALSA: memalloc: invalidate SG pages before sync
ALSA: memalloc: Fix dma_need_sync() checks
MAINTAINERS: update cros_ec_codec maintainers
ASoC: rt5682: do not block workqueue if card is unbound
ASoC: rt5668: do not block workqueue if card is unbound
ASoC: rt5682s: do not block workqueue if card is unbound
ASoC: tas2770: Insert post reset delay
ASoC: Revert "ASoC: mediatek: Check for error clk pointer"
ASoC: amd: acp: Set gpio_spkr_en to None for max speaker amplifer in machine driver
...
Linus Torvalds [Fri, 18 Feb 2022 17:10:14 +0000 (09:10 -0800)]
Merge tag 'powerpc-5.17-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Fix boot failure on 603 with DEBUG_PAGEALLOC and KFENCE
- Fix 32-build with newer binutils that rejects 'ptesync' etc
Thanks to Anders Roxell, Christophe Leroy, and Maxime Bizon.
* tag 'powerpc-5.17-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/lib/sstep: fix 'ptesync' build error
powerpc/603: Fix boot failure with DEBUG_PAGEALLOC and KFENCE
Linus Torvalds [Fri, 18 Feb 2022 17:04:27 +0000 (09:04 -0800)]
Merge tag '5.17-rc5-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
"Six small smb3 client fixes, three for stable:
- fix for snapshot mount option
- two ACL related fixes
- use after free race fix
- fix for confusing warning message logged with older dialects"
* tag '5.17-rc5-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: fix confusing unneeded warning message on smb2.1 and earlier
cifs: modefromsids must add an ACE for authenticated users
cifs: fix double free race when mount fails in cifs_get_root()
cifs: do not use uninitialized data in the owner/group sid
cifs: fix set of group SID via NTSD xattrs
smb3: fix snapshot mount option
xfpregs_set() handles 32-bit REGSET_XFP and 64-bit REGSET_FP. The actual
code treats these regsets as modern FX state (i.e. the beginning part of
XSTATE). The declarations of the regsets thought they were the legacy
i387 format. The code thought they were the 32-bit (no xmm8..15) variant
of XSTATE and, for good measure, made the high bits disappear by zeroing
the wrong part of the buffer. The latter broke ptrace, and everything
else confused anyone trying to understand the code. In particular, the
nonsense definitions of the regsets confused me when I wrote this code.
Clean this all up. Change the declarations to match reality (which
shouldn't change the generated code, let alone the ABI) and fix
xfpregs_set() to clear the correct bits and to only do so for 32-bit
callers.
Rafał Miłecki [Tue, 15 Feb 2022 07:27:35 +0000 (08:27 +0100)]
i2c: brcmstb: fix support for DSL and CM variants
DSL and CM (Cable Modem) support 8 B max transfer size and have a custom
DT binding for that reason. This driver was checking for a wrong
"compatible" however which resulted in an incorrect setup.
Fixes: 3505762f0dfd ("i2c: brcmstb: Adding support for CM and DSL SoCs") Signed-off-by: Rafał Miłecki <rafal@milecki.pl> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Wolfram Sang <wsa@kernel.org>
Linus Torvalds [Thu, 17 Feb 2022 23:21:42 +0000 (15:21 -0800)]
Merge tag 'linux-kselftest-fixes-5.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull Kselftest fixes from Shuah Khan:
"Fixes to ftrace, exec, and seccomp tests build, run-time and install
bugs. These bugs are in the way of running the tests"
* tag 'linux-kselftest-fixes-5.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests/ftrace: Do not trace do_softirq because of PREEMPT_RT
selftests/seccomp: Fix seccomp failure by adding missing headers
selftests/exec: Add non-regular to TEST_GEN_PROGS
Nikhil Gupta [Fri, 28 Jan 2022 04:23:21 +0000 (09:53 +0530)]
of/fdt: move elfcorehdr reservation early for crash dump kernel
elfcorehdr_addr is fixed address passed to Second kernel which may be conflicted
with potential reserved memory in Second kernel,so fdt_reserve_elfcorehdr() ahead
of fdt_init_reserved_mem() can relieve this situation.
Linus Torvalds [Thu, 17 Feb 2022 21:11:46 +0000 (13:11 -0800)]
Merge tag 'drm-fixes-2022-02-18' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Regular fixes for rc5, nothing really stands out, mostly some amdgpu
and i915 fixes with mediatek, radeon and some misc fixes.
cma-helper:
- set VM_DONTEXPAND
atomic:
- error handling fix
mediatek:
- fix probe defer loop with external bridge
i915:
- GVT kerneldoc cleanup.
- GVT Kconfig should depend on X86
- Prevent out of range access in SWSCI display code
- Fix mbus join and dbuf slice config lookup
- Fix inverted priority selection in the TTM backend
- Fix FBC plane end Y offset check"
* tag 'drm-fixes-2022-02-18' of git://anongit.freedesktop.org/drm/drm:
drm/atomic: Don't pollute crtc_state->mode_blob with error pointers
drm/radeon: Fix backlight control on iMac 12,1
drm/amd/pm: correct the sequence of sending gpu reset msg
drm/amdgpu: skipping SDMA hw_init and hw_fini for S0ix.
drm/amd/pm: correct UMD pstate clocks for Dimgrey Cavefish and Beige Goby
drm/i915/fbc: Fix the plane end Y offset check
drm/i915/opregion: check port number bounds for SWSCI display power state
drm/i915/ttm: tweak priority hint selection
drm/i915: Fix mbus join config lookup
drm/i915: Fix dbuf slice config lookup
drm/cma-helper: Set VM_DONTEXPAND for mmap
drm/mediatek: mtk_dsi: Avoid EPROBE_DEFER loop with external bridge
drm/i915/gvt: Make DRM_I915_GVT depend on X86
drm/i915/gvt: clean up kernel-doc in gtt.c
Linus Torvalds [Thu, 17 Feb 2022 19:33:59 +0000 (11:33 -0800)]
Merge tag 'net-5.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from wireless and netfilter.
Current release - regressions:
- dsa: lantiq_gswip: fix use after free in gswip_remove()
- smc: avoid overwriting the copies of clcsock callback functions
Current release - new code bugs:
- iwlwifi:
- fix use-after-free when no FW is present
- mei: fix the pskb_may_pull check in ipv4
- mei: retry mapping the shared area
- mvm: don't feed the hardware RFKILL into iwlmei
Previous releases - regressions:
- ipv6: mcast: use rcu-safe version of ipv6_get_lladdr()
- tipc: fix wrong publisher node address in link publications
- iwlwifi: mvm: don't send SAR GEO command for 3160 devices, avoid FW
assertion
- bgmac: make idm and nicpm resource optional again
- atl1c: fix tx timeout after link flap
Previous releases - always broken:
- vsock: remove vsock from connected table when connect is
interrupted by a signal
- ping: change destination interface checks to match raw sockets
- crypto: af_alg - get rid of alg_memory_allocated to avoid confusing
semantics (and null-deref) after SO_RESERVE_MEM was added
- ipv6: make exclusive flowlabel checks per-netns
- bonding: force carrier update when releasing slave
- sched: limit TC_ACT_REPEAT loops
- bridge: multicast: notify switchdev driver whenever MC processing
gets disabled because of max entries reached
- wifi: brcmfmac: fix crash in brcm_alt_fw_path when WLAN not found
- iwlwifi: fix locking when "HW not ready"
- phy: mediatek: remove PHY mode check on MT7531
- dsa: mv88e6xxx: flush switchdev FDB workqueue before removing VLAN
- dsa: lan9303:
- fix polarity of reset during probe
- fix accelerated VLAN handling"
* tag 'net-5.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (65 commits)
bonding: force carrier update when releasing slave
nfp: flower: netdev offload check for ip6gretap
ipv6: fix data-race in fib6_info_hw_flags_set / fib6_purge_rt
ipv4: fix data races in fib_alias_hw_flags_set
net: dsa: lan9303: add VLAN IDs to master device
net: dsa: lan9303: handle hwaccel VLAN tags
vsock: remove vsock from connected table when connect is interrupted by a signal
Revert "net: ethernet: bgmac: Use devm_platform_ioremap_resource_byname"
ping: fix the dif and sdif check in ping_lookup
net: usb: cdc_mbim: avoid altsetting toggling for Telit FN990
net: sched: limit TC_ACT_REPEAT loops
tipc: fix wrong notification node addresses
net: dsa: lantiq_gswip: fix use after free in gswip_remove()
ipv6: per-netns exclusive flowlabel checks
net: bridge: multicast: notify switchdev driver whenever MC processing gets disabled
CDC-NCM: avoid overflow in sanity checking
mctp: fix use after free
net: mscc: ocelot: fix use-after-free in ocelot_vlan_del()
bonding: fix data-races around agg_select_timer
dpaa2-eth: Initialize mutex used in one step timestamping path
...
Zhang Changzhong [Wed, 16 Feb 2022 14:18:08 +0000 (22:18 +0800)]
bonding: force carrier update when releasing slave
In __bond_release_one(), bond_set_carrier() is only called when bond
device has no slave. Therefore, if we remove the up slave from a master
with two slaves and keep the down slave, the master will remain up.
Fix this by moving bond_set_carrier() out of if (!bond_has_slaves(bond))
statement.
Reinette Chatre [Wed, 2 Feb 2022 19:41:12 +0000 (11:41 -0800)]
x86/sgx: Fix missing poison handling in reclaimer
The SGX reclaimer code lacks page poison handling in its main
free path. This can lead to avoidable machine checks if a
poisoned page is freed and reallocated instead of being
isolated.
A troublesome scenario is:
1. Machine check (#MC) occurs (asynchronous, !MF_ACTION_REQUIRED)
2. arch_memory_failure() is eventually called
3. (SGX) page->poison set to 1
4. Page is reclaimed
5. Page added to normal free lists by sgx_reclaim_pages()
^ This is the bug (poison pages should be isolated on the
sgx_poison_page_list instead)
6. Page is reallocated by some innocent enclave, a second (synchronous)
in-kernel #MC is induced, probably during EADD instruction.
^ This is the fallout from the bug
(6) is unfortunate and can be avoided by replacing the open coded
enclave page freeing code in the reclaimer with sgx_free_epc_page()
to obtain support for poison page handling that includes placing the
poisoned page on the correct list.
Fixes: 6f798ed2d215 ("x86/sgx: Add new sgx_epc_page flag bit to mark free pages") Fixes: 06ff25541112 ("x86/sgx: Initial poison handling for dirty and free pages") Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Link: https://lkml.kernel.org/r/dcc95eb2aaefb042527ac50d0a50738c7c160dac.1643830353.git.reinette.chatre@intel.com
Commit bd6ea65287c7 ("Fix regression due to "fs: move binfmt_misc sysctl
to its own file") fixed a regression, however it failed to add a
kmemleak_not_leak().
Fixes: bd6ea65287c7 ("Fix regression due to "fs: move binfmt_misc sysctl to its own file") Reported-by: Tong Zhang <ztong0001@gmail.com> Cc: Tong Zhang <ztong0001@gmail.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Fix perf_cpu_map__for_each_cpu macro in libperf, providing access to
the CPU iterator
- Sync linux/perf_event.h UAPI with the kernel sources
- Update Jiri Olsa's email address in MAINTAINERS
* tag 'perf-tools-fixes-for-v5.17-2022-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
perf bpf: Defer freeing string after possible strlen() on it
perf test: Fix arm64 perf_event_attr tests wrt --call-graph initialization
libsubcmd: Fix use-after-free for realloc(..., 0)
libperf: Fix perf_cpu_map__for_each_cpu macro
perf cs-etm: Fix corrupt inject files when only last branch option is enabled
perf cs-etm: No-op refactor of synth opt usage
libperf: Fix 32-bit build for tests uint64_t printf
tools headers UAPI: Sync linux/perf_event.h with the kernel sources
perf trace: Avoid early exit due SIGCHLD from non-workload processes
MAINTAINERS: Update Jiri's email address
Danie du Toit [Thu, 17 Feb 2022 12:48:20 +0000 (14:48 +0200)]
nfp: flower: netdev offload check for ip6gretap
IPv6 GRE tunnels are not being offloaded, this is caused by a missing
netdev offload check. The functionality of IPv6 GRE tunnel offloading
was previously added but this check was not included. Adding the
ip6gretap check allows IPv6 GRE tunnels to be offloaded correctly.
Fixes: 57d5a5cb676b ("nfp: flower: Allow ipv6gretap interface for offloading") Signed-off-by: Danie du Toit <danie.dutoit@corigine.com> Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20220217124820.40436-1-louis.peens@corigine.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Wed, 16 Feb 2022 17:32:17 +0000 (09:32 -0800)]
ipv6: fix data-race in fib6_info_hw_flags_set / fib6_purge_rt
Because fib6_info_hw_flags_set() is called without any synchronization,
all accesses to gi6->offload, fi->trap and fi->offload_failed
need some basic protection like READ_ONCE()/WRITE_ONCE().
BUG: KCSAN: data-race in fib6_info_hw_flags_set / fib6_purge_rt
write to 0xffff8881087d5886 of 1 bytes by task 1912 on cpu 1:
fib6_info_hw_flags_set+0x155/0x3b0 net/ipv6/route.c:6230
nsim_fib6_rt_hw_flags_set drivers/net/netdevsim/fib.c:668 [inline]
nsim_fib6_rt_add drivers/net/netdevsim/fib.c:691 [inline]
nsim_fib6_rt_insert drivers/net/netdevsim/fib.c:756 [inline]
nsim_fib6_event drivers/net/netdevsim/fib.c:853 [inline]
nsim_fib_event drivers/net/netdevsim/fib.c:886 [inline]
nsim_fib_event_work+0x284f/0x2cf0 drivers/net/netdevsim/fib.c:1477
process_one_work+0x3f6/0x960 kernel/workqueue.c:2307
worker_thread+0x616/0xa70 kernel/workqueue.c:2454
kthread+0x2c7/0x2e0 kernel/kthread.c:327
ret_from_fork+0x1f/0x30
value changed: 0x22 -> 0x2a
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 1912 Comm: kworker/1:3 Not tainted 5.16.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: events nsim_fib_event_work
Fixes: 323fdfa3a82c ("IPv6: Add "offload failed" indication to routes") Fixes: 4fb4b908ee50 ("ipv6: Add "offload" and "trap" indications to routes") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Amit Cohen <amcohen@nvidia.com> Cc: Ido Schimmel <idosch@nvidia.com> Reported-by: syzbot <syzkaller@googlegroups.com> Link: https://lore.kernel.org/r/20220216173217.3792411-2-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Wed, 16 Feb 2022 17:32:16 +0000 (09:32 -0800)]
ipv4: fix data races in fib_alias_hw_flags_set
fib_alias_hw_flags_set() can be used by concurrent threads,
and is only RCU protected.
We need to annotate accesses to following fields of struct fib_alias:
offload, trap, offload_failed
Because of READ_ONCE()WRITE_ONCE() limitations, make these
field u8.
BUG: KCSAN: data-race in fib_alias_hw_flags_set / fib_alias_hw_flags_set
read to 0xffff888134224a6a of 1 bytes by task 2013 on cpu 1:
fib_alias_hw_flags_set+0x28a/0x470 net/ipv4/fib_trie.c:1050
nsim_fib4_rt_hw_flags_set drivers/net/netdevsim/fib.c:350 [inline]
nsim_fib4_rt_add drivers/net/netdevsim/fib.c:367 [inline]
nsim_fib4_rt_insert drivers/net/netdevsim/fib.c:429 [inline]
nsim_fib4_event drivers/net/netdevsim/fib.c:461 [inline]
nsim_fib_event drivers/net/netdevsim/fib.c:881 [inline]
nsim_fib_event_work+0x1852/0x2cf0 drivers/net/netdevsim/fib.c:1477
process_one_work+0x3f6/0x960 kernel/workqueue.c:2307
process_scheduled_works kernel/workqueue.c:2370 [inline]
worker_thread+0x7df/0xa70 kernel/workqueue.c:2456
kthread+0x1bf/0x1e0 kernel/kthread.c:377
ret_from_fork+0x1f/0x30
write to 0xffff888134224a6a of 1 bytes by task 4872 on cpu 0:
fib_alias_hw_flags_set+0x2d5/0x470 net/ipv4/fib_trie.c:1054
nsim_fib4_rt_hw_flags_set drivers/net/netdevsim/fib.c:350 [inline]
nsim_fib4_rt_add drivers/net/netdevsim/fib.c:367 [inline]
nsim_fib4_rt_insert drivers/net/netdevsim/fib.c:429 [inline]
nsim_fib4_event drivers/net/netdevsim/fib.c:461 [inline]
nsim_fib_event drivers/net/netdevsim/fib.c:881 [inline]
nsim_fib_event_work+0x1852/0x2cf0 drivers/net/netdevsim/fib.c:1477
process_one_work+0x3f6/0x960 kernel/workqueue.c:2307
process_scheduled_works kernel/workqueue.c:2370 [inline]
worker_thread+0x7df/0xa70 kernel/workqueue.c:2456
kthread+0x1bf/0x1e0 kernel/kthread.c:377
ret_from_fork+0x1f/0x30
value changed: 0x00 -> 0x02
Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 4872 Comm: kworker/0:0 Not tainted 5.17.0-rc3-syzkaller-00188-g1d41d2e82623-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: events nsim_fib_event_work
Fixes: 371b0fd7623c ("ipv4: Add "offload" and "trap" indications to routes") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/20220216173217.3792411-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Mans Rullgard [Wed, 16 Feb 2022 20:48:18 +0000 (20:48 +0000)]
net: dsa: lan9303: add VLAN IDs to master device
If the master device does VLAN filtering, the IDs used by the switch
must be added for any frames to be received. Do this in the
port_enable() function, and remove them in port_disable().
Fixes: 3bbb5663f4fc ("net: dsa: add new DSA switch driver for the SMSC-LAN9303") Signed-off-by: Mans Rullgard <mans@mansr.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Link: https://lore.kernel.org/r/20220216204818.28746-1-mans@mansr.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Mans Rullgard [Wed, 16 Feb 2022 12:46:34 +0000 (12:46 +0000)]
net: dsa: lan9303: handle hwaccel VLAN tags
Check for a hwaccel VLAN tag on rx and use it if present. Otherwise,
use __skb_vlan_pop() like the other tag parsers do. This fixes the case
where the VLAN tag has already been consumed by the master.
Fixes: 3bbb5663f4fc ("net: dsa: add new DSA switch driver for the SMSC-LAN9303") Signed-off-by: Mans Rullgard <mans@mansr.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Link: https://lore.kernel.org/r/20220216124634.23123-1-mans@mansr.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Thu, 17 Feb 2022 16:57:47 +0000 (08:57 -0800)]
mm: don't try to NUMA-migrate COW pages that have other uses
Oded Gabbay reports that enabling NUMA balancing causes corruption with
his Gaudi accelerator test load:
"All the details are in the bug, but the bottom line is that somehow,
this patch causes corruption when the numa balancing feature is
enabled AND we don't use process affinity AND we use GUP to pin pages
so our accelerator can DMA to/from system memory.
Either disabling numa balancing, using process affinity to bind to
specific numa-node or reverting this patch causes the bug to
disappear"
and Oded bisected the issue to commit a103ba5cc916 ("mm: do_wp_page()
simplification").
Now, the NUMA balancing shouldn't actually be changing the writability
of a page, and as such shouldn't matter for COW. But it appears it
does. Suspicious.
However, regardless of that, the condition for enabling NUMA faults in
change_pte_range() is nonsensical. It uses "page_mapcount(page)" to
decide if a COW page should be NUMA-protected or not, and that makes
absolutely no sense.
The number of mappings a page has is irrelevant: not only does GUP get a
reference to a page as in Oded's case, but the other mappings migth be
paged out and the only reference to them would be in the page count.
Since we should never try to NUMA-balance a page that we can't move
anyway due to other references, just fix the code to use 'page_count()'.
Oded confirms that that fixes his issue.
Now, this does imply that something in NUMA balancing ends up changing
page protections (other than the obvious one of making the page
inaccessible to get the NUMA faulting information). Otherwise the COW
simplification wouldn't matter - since doing the GUP on the page would
make sure it's writable.
The cause of that permission change would be good to figure out too,
since it clearly results in spurious COW events - but fixing the
nonsensical test that just happened to work before is obviously the
CorrectThing(tm) to do regardless.
Seth Forshee [Thu, 17 Feb 2022 14:13:12 +0000 (08:13 -0600)]
vsock: remove vsock from connected table when connect is interrupted by a signal
vsock_connect() expects that the socket could already be in the
TCP_ESTABLISHED state when the connecting task wakes up with a signal
pending. If this happens the socket will be in the connected table, and
it is not removed when the socket state is reset. In this situation it's
common for the process to retry connect(), and if the connection is
successful the socket will be added to the connected table a second
time, corrupting the list.
Prevent this by calling vsock_remove_connected() if a signal is received
while waiting for a connection. This is harmless if the socket is not in
the connected table, and if it is in the table then removing it will
prevent list corruption from a double add.
Note for backporting: this patch requires dee01a9904fd ("vsock: correct
removal of socket from the list"), which is in all current stable trees
except 4.9.y.
Since idm_base and nicpm_base are still optional resources not present
on all platforms, this breaks the driver for everything except Northstar
2 (which has both).
The same change was already reverted once with bb93c3c4fae3 ("net:
broadcom: fix a mistake about ioremap resource").
So let's do it again.
Fixes: 8b375ede3c51 ("net: ethernet: bgmac: Use devm_platform_ioremap_resource_byname") Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
[florian: Added comments to explain the resources are optional] Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20220216184634.2032460-1-f.fainelli@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
While examining is_ucounts_overlimit and reading the various messages
I realized that is_ucounts_overlimit fails to deal with counts that
may have wrapped.
Being wrapped should be a transitory state for counts and they should
never be wrapped for long, but it can happen so handle it.
ucounts: Move RLIMIT_NPROC handling after set_user
During set*id() which cred->ucounts to charge the the current process
to is not known until after set_cred_ucounts. So move the
RLIMIT_NPROC checking into a new helper flag_nproc_exceeded and call
flag_nproc_exceeded after set_cred_ucounts.
This is very much an arbitrary subset of the places where we currently
change the RLIMIT_NPROC accounting, designed to preserve the existing
logic.
Fixing the existing logic will be the subject of another series of
changes.
ucounts: Base set_cred_ucounts changes on the real user
Michal Koutný <mkoutny@suse.com> wrote:
> Tasks are associated to multiple users at once. Historically and as per
> setrlimit(2) RLIMIT_NPROC is enforce based on real user ID.
>
> The commit 96a6f980d813 ("Reimplement RLIMIT_NPROC on top of ucounts")
> made the accounting structure "indexed" by euid and hence potentially
> account tasks differently.
>
> The effective user ID may be different e.g. for setuid programs but
> those are exec'd into already existing task (i.e. below limit), so
> different accounting is moot.
>
> Some special setresuid(2) users may notice the difference, justifying
> this fix.
I looked at cred->ucount and it is only used for rlimit operations
that were previously stored in cred->user. Making the fact
cred->ucount can refer to a different user from cred->user a bug,
affecting all uses of cred->ulimit not just RLIMIT_NPROC.
Fix set_cred_ucounts to always use the real uid not the effective uid.
Further simplify set_cred_ucounts by noticing that set_cred_ucounts
somehow retained a draft version of the check to see if alloc_ucounts
was needed that checks the new->user and new->user_ns against the
current_real_cred(). Remove that draft version of the check.
All that matters for setting the cred->ucounts are the user_ns and uid
fields in the cred.
> It was reported that v5.14 behaves differently when enforcing
> RLIMIT_NPROC limit, namely, it allows one more task than previously.
> This is consequence of the commit 96a6f980d813 ("Reimplement
> RLIMIT_NPROC on top of ucounts") that missed the sharpness of
> equality in the forking path.
This can be fixed either by fixing the test or by moving the increment
to be before the test. Fix it my moving copy_creds which contains
the increment before is_ucounts_overlimit.
In the case of CLONE_NEWUSER the ucounts in the task_cred changes.
The function is_ucounts_overlimit needs to use the final version of
the ucounts for the new process. Which means moving the
is_ucounts_overlimit test after copy_creds is necessary.
Both the test in fork and the test in set_user were semantically
changed when the code moved to ucounts. The change of the test in
fork was bad because it was before the increment. The test in
set_user was wrong and the change to ucounts fixed it. So this
fix only restores the old behavior in one lcation not two.
rlimit: Fix RLIMIT_NPROC enforcement failure caused by capability calls in set_user
Solar Designer <solar@openwall.com> wrote:
> I'm not aware of anyone actually running into this issue and reporting
> it. The systems that I personally know use suexec along with rlimits
> still run older/distro kernels, so would not yet be affected.
>
> So my mention was based on my understanding of how suexec works, and
> code review. Specifically, Apache httpd has the setting RLimitNPROC,
> which makes it set RLIMIT_NPROC:
>
> https://httpd.apache.org/docs/2.4/mod/core.html#rlimitnproc
>
> The above documentation for it includes:
>
> "This applies to processes forked from Apache httpd children servicing
> requests, not the Apache httpd children themselves. This includes CGI
> scripts and SSI exec commands, but not any processes forked from the
> Apache httpd parent, such as piped logs."
>
> In code, there are:
>
> ./modules/generators/mod_cgid.c: ( (cgid_req.limits.limit_nproc_set) && ((rc = apr_procattr_limit_set(procattr, APR_LIMIT_NPROC,
> ./modules/generators/mod_cgi.c: ((rc = apr_procattr_limit_set(procattr, APR_LIMIT_NPROC,
> ./modules/filters/mod_ext_filter.c: rv = apr_procattr_limit_set(procattr, APR_LIMIT_NPROC, conf->limit_nproc);
>
> For example, in mod_cgi.c this is in run_cgi_child().
>
> I think this means an httpd child sets RLIMIT_NPROC shortly before it
> execs suexec, which is a SUID root program. suexec then switches to the
> target user and execs the CGI script.
>
> Before 2a27b4654e7e, the setuid() in suexec would set the flag, and the
> target user's process count would be checked against RLIMIT_NPROC on
> execve(). After 2a27b4654e7e, the setuid() in suexec wouldn't set the
> flag because setuid() is (naturally) called when the process is still
> running as root (thus, has those limits bypass capabilities), and
> accordingly execve() would not check the target user's process count
> against RLIMIT_NPROC.
In commit 2a27b4654e7e ("set_user: add capability check when
rlimit(RLIMIT_NPROC) exceeds") capable calls were added to set_user to
make it more consistent with fork. Unfortunately because of call site
differences those capable calls were checking the credentials of the
user before set*id() instead of after set*id().
This breaks enforcement of RLIMIT_NPROC for applications that set the
rlimit and then call set*id() while holding a full set of
capabilities. The capabilities are only changed in the new credential
in security_task_fix_setuid().
The code in apache suexec appears to follow this pattern.
Commit 909cc4ae86f3 ("[PATCH] Fix two bugs with process limits
(RLIMIT_NPROC)") where this check was added describes the targes of this
capability check as:
2/ When a root-owned process (e.g. cgiwrap) sets up process limits and then
calls setuid, the setuid should fail if the user would then be running
more than rlim_cur[RLIMIT_NPROC] processes, but it doesn't. This patch
adds an appropriate test. With this patch, and per-user process limit
imposed in cgiwrap really works.
So the original use case of this check also appears to match the broken
pattern.
Restore the enforcement of RLIMIT_NPROC by removing the bad capable
checks added in set_user. This unfortunately restores the
inconsistent state the code has been in for the last 11 years, but
dealing with the inconsistencies looks like a larger problem.
Xin Long [Wed, 16 Feb 2022 05:20:52 +0000 (00:20 -0500)]
ping: fix the dif and sdif check in ping_lookup
When 'ping' changes to use PING socket instead of RAW socket by:
# sysctl -w net.ipv4.ping_group_range="0 100"
There is another regression caused when matching sk_bound_dev_if
and dif, RAW socket is using inet_iif() while PING socket lookup
is using skb->dev->ifindex, the cmd below fails due to this:
# ip link add dummy0 type dummy
# ip link set dummy0 up
# ip addr add 192.168.111.1/24 dev dummy0
# ping -I dummy0 192.168.111.1 -c1
The issue was also reported on:
https://github.com/iputils/iputils/issues/104
But fixed in iputils in a wrong way by not binding to device when
destination IP is on device, and it will cause some of kselftests
to fail, as Jianlin noticed.
This patch is to use inet(6)_iif and inet(6)_sdif to get dif and
sdif for PING socket, and keep consistent with RAW socket.
Fixes: cfce3383c7be ("net: ipv4: add IPPROTO_ICMP socket kind") Reported-by: Jianlin Shi <jishi@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Laibin Qiu [Sat, 22 Jan 2022 11:10:45 +0000 (19:10 +0800)]
block/wbt: fix negative inflight counter when remove scsi device
Now that we disable wbt by set WBT_STATE_OFF_DEFAULT in
wbt_disable_default() when switch elevator to bfq. And when
we remove scsi device, wbt will be enabled by wbt_enable_default.
If it become false positive between wbt_wait() and wbt_track()
when submit write request.
The following is the scenario that triggered the problem.
T1 T2 T3
elevator_switch_mq
bfq_init_queue
wbt_disable_default <= Set
rwb->enable_state (OFF)
Submit_bio
blk_mq_make_request
rq_qos_throttle
<= rwb->enable_state (OFF)
scsi_remove_device
sd_remove
del_gendisk
blk_unregister_queue
elv_unregister_queue
wbt_enable_default
<= Set rwb->enable_state (ON)
q_qos_track
<= rwb->enable_state (ON)
^^^^^^ this request will mark WBT_TRACKED without inflight add and will
lead to drop rqw->inflight to -1 in wbt_done() which will trigger IO hung.
Fix this by move wbt_enable_default() from elv_unregister to
bfq_exit_queue(). Only re-enable wbt when bfq exit.
Fixes: 19be9d608b322 ("blk-wbt: make sure throttle is enabled properly")
Remove oneline stale comment, and kill one oneshot local variable.
block: fix surprise removal for drivers calling blk_set_queue_dying
Various block drivers call blk_set_queue_dying to mark a disk as dead due
to surprise removal events, but since commit b44b1c0ab7ff that doesn't
work given that the GD_DEAD flag needs to be set to stop I/O.
Replace the driver calls to blk_set_queue_dying with a new (and properly
documented) blk_mark_disk_dead API, and fold blk_set_queue_dying into the
only remaining caller.
Fixes: b44b1c0ab7ff ("block: drain file system I/O on del_gendisk") Reported-by: Markus Blöchl <markus.bloechl@ipetronik.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Link: https://lore.kernel.org/r/20220217075231.1140-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
perf bpf: Defer freeing string after possible strlen() on it
This was detected by the gcc in Fedora Rawhide's gcc:
50 11.01 fedora:rawhide : FAIL gcc version 12.0.1 20220205 (Red Hat 12.0.1-0) (GCC)
inlined from 'bpf__config_obj' at util/bpf-loader.c:1242:9:
util/bpf-loader.c:1225:34: error: pointer 'map_opt' may be used after 'free' [-Werror=use-after-free]
1225 | *key_scan_pos += strlen(map_opt);
| ^~~~~~~~~~~~~~~
util/bpf-loader.c:1223:9: note: call to 'free' here
1223 | free(map_name);
| ^~~~~~~~~~~~~~
cc1: all warnings being treated as errors
So do the calculations on the pointer before freeing it.
Fixes: 68eaf1530bf9810b ("perf bpf-loader: Add missing '*' for key_scan_pos") Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang ShaoBo <bobo.shaobowang@huawei.com> Link: https://lore.kernel.org/lkml/Yg1VtQxKrPpS3uNA@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Takashi Iwai [Tue, 15 Feb 2022 13:27:56 +0000 (14:27 +0100)]
ASoC: intel: skylake: Set max DMA segment size
The recent code refactoring to use the standard DMA helper requires
the max DMA segment size setup for SG list management. Without it,
the kernel may spew warnings when a large buffer is allocated.
This patch sets up dma_set_max_seg_size() for avoiding spurious
warnings.
Takashi Iwai [Tue, 15 Feb 2022 13:27:55 +0000 (14:27 +0100)]
ASoC: SOF: hda: Set max DMA segment size
The recent code refactoring to use the standard DMA helper requires
the max DMA segment size setup for SG list management. Without it,
the kernel may spew warnings when a large buffer is allocated.
This patch sets up dma_set_max_seg_size() for avoiding spurious
warnings.
Takashi Iwai [Tue, 15 Feb 2022 13:27:54 +0000 (14:27 +0100)]
ALSA: hda: Set max DMA segment size
The recent code refactoring to use the standard DMA helper requires
the max DMA segment size setup for SG list management. Without it,
the kernel may spew warnings when a large buffer is allocated.
This patch sets up dma_set_max_seg_size() for avoiding spurious
warnings.
Jon Maloy [Wed, 16 Feb 2022 02:00:09 +0000 (21:00 -0500)]
tipc: fix wrong notification node addresses
The previous bug fix had an unfortunate side effect that broke
distribution of binding table entries between nodes. The updated
tipc_sock_addr struct is also used further down in the same
function, and there the old value is still the correct one.
Willem de Bruijn [Tue, 15 Feb 2022 16:00:37 +0000 (11:00 -0500)]
ipv6: per-netns exclusive flowlabel checks
Ipv6 flowlabels historically require a reservation before use.
Optionally in exclusive mode (e.g., user-private).
Commit 0e86d7cdae9f ("ipv6: elide flowlabel check if no exclusive
leases exist") introduced a fastpath that avoids this check when no
exclusive leases exist in the system, and thus any flowlabel use
will be granted.
That allows skipping the control operation to reserve a flowlabel
entirely. Though with a warning if the fast path fails:
This is an optimization. Robust applications still have to revert to
requesting leases if the fast path fails due to an exclusive lease.
Still, this is subtle. Better isolate network namespaces from each
other. Flowlabels are per-netns. Also record per-netns whether
exclusive leases are in use. Then behavior does not change based on
activity in other netns.
Changes
v2
- wrap in IS_ENABLED(CONFIG_IPV6) to avoid breakage if disabled
Whenever bridge driver hits the max capacity of MDBs, it disables
the MC processing (by setting corresponding bridge option), but never
notifies switchdev about such change (the notifiers are called only upon
explicit setting of this option, through the registered netlink interface).
This could lead to situation when Software MDB processing gets disabled,
but this event never gets offloaded to the underlying Hardware.
Steve French [Wed, 16 Feb 2022 19:23:53 +0000 (13:23 -0600)]
cifs: fix confusing unneeded warning message on smb2.1 and earlier
When mounting with SMB2.1 or earlier, even with nomultichannel, we
log the confusing warning message:
"CIFS: VFS: multichannel is not supported on this protocol version, use 3.0 or above"
Fix this so that we don't log this unless they really are trying
to mount with multichannel.
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215608 Reported-by: Kim Scarborough <kim@scarborough.kim> Cc: stable@vger.kernel.org # 5.11+ Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>
Trond Myklebust [Tue, 15 Feb 2022 23:05:18 +0000 (18:05 -0500)]
NFS: Do not report writeback errors in nfs_getattr()
The result of the writeback, whether it is an ENOSPC or an EIO, or
anything else, does not inhibit the NFS client from reporting the
correct file timestamps.
Fixes: 08fa707c28fe ("NFS: Getattr doesn't require data sync semantics") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>