the docker program tries to add F_SEAL_WRITE through the following
command, but it fails unexpectedly with errno EBUSY:
fcntl(5, F_ADD_SEALS, F_SEAL_WRITE) = -1.
That is because memfd_tag_pins() and memfd_wait_for_pins() were never
updated for shmem huge pages: checking page_mapcount() against
page_count() is hopeless on THP subpages - they need to check
total_mapcount() against page_count() on THP heads only.
Make memfd_tag_pins() (compared > 1) as strict as memfd_wait_for_pins()
(compared != 1): either can be justified, but given the non-atomic
total_mapcount() calculation, it is better now to be strict. Bear in
mind that total_mapcount() itself scans all of the THP subpages, when
choosing to take an XA_CHECK_SCHED latency break.
Also fix the unlikely xa_is_value() case in memfd_wait_for_pins(): if a
page has been swapped out since memfd_tag_pins(), then its refcount must
have fallen, and so it can safely be untagged.
Link: https://lkml.kernel.org/r/a4f79248-df75-2c8c-3df-ba3317ccb5da@google.com Signed-off-by: Hugh Dickins <hughd@google.com> Reported-by: Zeal Robot <zealci@zte.com.cn> Reported-by: wangyong <wang.yong12@zte.com.cn> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: CGEL ZTE <cgel.zte@gmail.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Song Liu <songliubraving@fb.com> Cc: Yang Yang <yang.yang29@zte.com.cn> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
mm: fix use-after-free when anon vma name is used after vma is freed
When adjacent vmas are being merged it can result in the vma that was
originally passed to madvise_update_vma being destroyed. In the current
implementation, the name parameter passed to madvise_update_vma points
directly to vma->anon_name and it is used after the call to vma_merge.
In the cases when vma_merge merges the original vma and destroys it,
this might result in UAF. For that the original vma would have to hold
the anon_vma_name with the last reference. The following vma would need
to contain a different anon_vma_name object with the same string. Such
scenario is shown below:
madvise_vma_behavior(vma)
madvise_update_vma(vma, ..., anon_name == vma->anon_name)
vma_merge(vma)
__vma_adjust(vma) <-- merges vma with adjacent one
vm_area_free(vma) <-- frees the original vma
replace_vma_anon_name(anon_name) <-- UAF of vma->anon_name
Fix this by raising the name refcount and stabilizing it.
Link: https://lkml.kernel.org/r/20220224231834.1481408-3-surenb@google.com Link: https://lkml.kernel.org/r/20220223153613.835563-3-surenb@google.com Fixes: 57b4d3e68c72 ("mm: add a field to store names for private anonymous memory") Signed-off-by: Suren Baghdasaryan <surenb@google.com> Reported-by: syzbot+aa7b3d4b35f9dc46a366@syzkaller.appspotmail.com Acked-by: Michal Hocko <mhocko@suse.com> Cc: Alexey Gladkov <legion@kernel.org> Cc: Chris Hyser <chris.hyser@oracle.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Colin Cross <ccross@google.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kees Cook <keescook@chromium.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Sasha Levin <sashal@kernel.org> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xiaofeng Cao <caoxiaofeng@yulong.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A deep process chain with many vmas could grow really high. With
default sysctl_max_map_count (64k) and default pid_max (32k) the max
number of vmas in the system is 2147450880 and the refcounter has
headroom of 1073774592 before it reaches REFCOUNT_SATURATED
(3221225472).
Therefore it's unlikely that an anonymous name refcounter will overflow
with these defaults. Currently the max for pid_max is PID_MAX_LIMIT
(4194304) and for sysctl_max_map_count it's INT_MAX (2147483647). In
this configuration anon_vma_name refcount overflow becomes theoretically
possible (that still require heavy sharing of that anon_vma_name between
processes).
kref refcounting interface used in anon_vma_name structure will detect a
counter overflow when it reaches REFCOUNT_SATURATED value but will only
generate a warning and freeze the ref counter. This would lead to the
refcounted object never being freed. A determined attacker could leak
memory like that but it would be rather expensive and inefficient way to
do so.
To ensure anon_vma_name refcount does not overflow, stop anon_vma_name
sharing when the refcount reaches REFCOUNT_MAX (2147483647), which still
leaves INT_MAX/2 (1073741823) values before the counter reaches
REFCOUNT_SATURATED. This should provide enough headroom for raising the
refcounts temporarily.
Link: https://lkml.kernel.org/r/20220223153613.835563-2-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Suggested-by: Michal Hocko <mhocko@suse.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Alexey Gladkov <legion@kernel.org> Cc: Chris Hyser <chris.hyser@oracle.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Colin Cross <ccross@google.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kees Cook <keescook@chromium.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Collingbourne <pcc@google.com> Cc: Sasha Levin <sashal@kernel.org> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xiaofeng Cao <caoxiaofeng@yulong.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Avoid mixing strings and their anon_vma_name referenced pointers by
using struct anon_vma_name whenever possible. This simplifies the code
and allows easier sharing of anon_vma_name structures when they
represent the same name.
[surenb@google.com: fix comment]
Link: https://lkml.kernel.org/r/20220223153613.835563-1-surenb@google.com Link: https://lkml.kernel.org/r/20220224231834.1481408-1-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Suggested-by: Matthew Wilcox <willy@infradead.org> Suggested-by: Michal Hocko <mhocko@suse.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Colin Cross <ccross@google.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Kees Cook <keescook@chromium.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Alexey Gladkov <legion@kernel.org> Cc: Sasha Levin <sashal@kernel.org> Cc: Chris Hyser <chris.hyser@oracle.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Peter Collingbourne <pcc@google.com> Cc: Xiaofeng Cao <caoxiaofeng@yulong.com> Cc: David Hildenbrand <david@redhat.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mike Kravetz [Sat, 5 Mar 2022 04:28:48 +0000 (20:28 -0800)]
selftests/vm: cleanup hugetlb file after mremap test
The hugepage-mremap test will create a file in a hugetlb filesystem. In
a default 'run_vmtests' run, the file will contain all the hugetlb
pages. After the test, the file remains and there are no free hugetlb
pages for subsequent tests. This causes those hugetlb tests to fail.
Change hugepage-mremap to take the name of the hugetlb file as an
argument. Unlink the file within the test, and just to be sure remove
the file in the run_vmtests script.
Link: https://lkml.kernel.org/r/20220201033459.156944-1-mike.kravetz@oracle.com Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Shuah Khan <skhan@linuxfoundation.org> Acked-by: Yosry Ahmed <yosryahmed@google.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Reviewed-by: Mina Almasry <almasrymina@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 4 Mar 2022 19:15:00 +0000 (11:15 -0800)]
Merge tag 'sound-5.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Hopefully the last PR for 5.17, including just a few small changes:
an additional fix for ASoC ops boundary check and other minor
device-specific fixes"
* tag 'sound-5.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: intel_hdmi: Fix reference to PCM buffer address
ASoC: cs4265: Fix the duplicated control name
ASoC: ops: Shift tested values in snd_soc_put_volsw() by +min
Linus Torvalds [Fri, 4 Mar 2022 19:01:22 +0000 (11:01 -0800)]
Merge tag 'drm-fixes-2022-03-04' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Things are quieting down as expected, just a small set of fixes, i915,
exynos, amdgpu, vrr, bridge and hdlcd. Nothing scary at all.
i915:
- Fix GuC SLPC unset command
- Fix misidentification of some Apple MacBook Pro laptops as Jasper Lake
amdgpu:
- Suspend regression fix
exynos:
- irq handling fixes
- Fix two regressions to TE-gpio handling
arm/hdlcd:
- Select DRM_GEM_CMEA_HELPER for HDLCD
bridge:
- ti-sn65dsi86: Properly undo autosuspend
vrr:
- Fix potential NULL-pointer deref"
* tag 'drm-fixes-2022-03-04' of git://anongit.freedesktop.org/drm/drm:
drm/amdgpu: fix suspend/resume hang regression
drm/vrr: Set VRR capable prop only if it is attached to connector
drm/arm: arm hdlcd select DRM_GEM_CMA_HELPER
drm/bridge: ti-sn65dsi86: Properly undo autosuspend
drm/i915: s/JSP2/ICP2/ PCH
drm/i915/guc/slpc: Correct the param count for unset param
drm/exynos: Search for TE-gpio in DSI panel's node
drm/exynos: Don't fail if no TE-gpio is defined for DSI driver
drm/exynos: gsc: Use platform_get_irq() to get the interrupt
drm/exynos/fimc: Use platform_get_irq() to get the interrupt
drm/exynos/exynos_drm_fimd: Use platform_get_irq_byname() to get the interrupt
drm/exynos: mixer: Use platform_get_irq() to get the interrupt
drm/exynos/exynos7_drm_decon: Use platform_get_irq_byname() to get the interrupt
Linus Torvalds [Fri, 4 Mar 2022 18:56:00 +0000 (10:56 -0800)]
Merge tag 'pinctrl-v5.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Pull pin control fixes from Linus Walleij:
"These two fixes should fix the issues seen on the OrangePi, first we
needed the correct offset when calling pinctrl_gpio_direction(), and
fixing that made a lockdep issue explode in our face. Both now fixed"
* tag 'pinctrl-v5.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: sunxi: Use unique lockdep classes for IRQs
pinctrl-sunxi: sunxi_pinctrl_gpio_direction_in/output: use correct offset
Daniel Borkmann [Fri, 4 Mar 2022 14:26:32 +0000 (15:26 +0100)]
mm: Consider __GFP_NOWARN flag for oversized kvmalloc() calls
syzkaller was recently triggering an oversized kvmalloc() warning via
xdp_umem_create().
The triggered warning was added back in b21ae08a2ad1 ("mm: don't allow
oversized kvmalloc() calls"). The rationale for the warning for huge
kvmalloc sizes was as a reaction to a security bug where the size was
more than UINT_MAX but not everything was prepared to handle unsigned
long sizes.
Anyway, the AF_XDP related call trace from this syzkaller report was:
Björn mentioned that requests for >2GB allocation can still be valid:
The structure that is being allocated is the page-pinning accounting.
AF_XDP has an internal limit of U32_MAX pages, which is *a lot*, but
still fewer than what memcg allows (PAGE_COUNTER_MAX is a LONG_MAX/
PAGE_SIZE on 64 bit systems). [...]
I could just change from U32_MAX to INT_MAX, but as I stated earlier
that has a hacky feeling to it. [...] From my perspective, the code
isn't broken, with the memcg limits in consideration. [...]
Linus says:
[...] Pretty much every time this has come up, the kernel warning has
shown that yes, the code was broken and there really wasn't a reason
for doing allocations that big.
Of course, some people would be perfectly fine with the allocation
failing, they just don't want the warning. I didn't want __GFP_NOWARN
to shut it up originally because I wanted people to see all those
cases, but these days I think we can just say "yeah, people can shut
it up explicitly by saying 'go ahead and fail this allocation, don't
warn about it'".
So enough time has passed that by now I'd certainly be ok with [it].
Thus allow call-sites to silence such userspace triggered splats if the
allocation requests have __GFP_NOWARN. For xdp_umem_pin_pages()'s call
to kvcalloc() this is already the case, so nothing else needed there.
Fixes: b21ae08a2ad1 ("mm: don't allow oversized kvmalloc() calls") Reported-by: syzbot+11421fbbff99b989670e@syzkaller.appspotmail.com Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: syzbot+11421fbbff99b989670e@syzkaller.appspotmail.com Cc: Björn Töpel <bjorn@kernel.org> Cc: Magnus Karlsson <magnus.karlsson@intel.com> Cc: Willy Tarreau <w@1wt.eu> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Jakub Kicinski <kuba@kernel.org> Cc: David S. Miller <davem@davemloft.net> Link: https://lore.kernel.org/bpf/CAJ+HfNhyfsT5cS_U9EC213ducHs9k9zNxX9+abqC0kTrPbQ0gg@mail.gmail.com Link: https://lore.kernel.org/bpf/20211201202905.b9892171e3f5b9a60f9da251@linux-foundation.org Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Ackd-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Niklas Cassel [Tue, 1 Mar 2022 00:44:18 +0000 (00:44 +0000)]
riscv: dts: k210: fix broken IRQs on hart1
Commit 0fe601e2e76a ("riscv: Update Canaan Kendryte K210 device tree")
incorrectly removed two entries from the PLIC interrupt-controller node's
interrupts-extended property.
The PLIC driver cannot know the mapping between hart contexts and hart ids,
so this information has to be provided by device tree, as specified by the
PLIC device tree binding.
The PLIC driver uses the interrupts-extended property, and initializes the
hart context registers in the exact same order as provided by the
interrupts-extended property.
In other words, if we don't specify the S-mode interrupts, the PLIC driver
will simply initialize the hart0 S-mode hart context with the hart1 M-mode
configuration. It is therefore essential to specify the S-mode IRQs even
though the system itself will only ever be running in M-mode.
Re-add the S-mode interrupts, so that we get working IRQs on hart1 again.
Alexandre Ghiti [Fri, 25 Feb 2022 12:39:53 +0000 (13:39 +0100)]
riscv: Fix kasan pud population
In sv48, the kasan inner regions are not aligned on PGDIR_SIZE and then
when we populate the kasan linear mapping region, we clear the kasan
vmalloc region which is in the same PGD.
Fix this by copying the content of the kasan early pud after allocating a
new PGD for the first time.
Alexandre Ghiti [Fri, 25 Feb 2022 12:39:52 +0000 (13:39 +0100)]
riscv: Move high_memory initialization to setup_bootmem
high_memory used to be initialized in mem_init, way after setup_bootmem.
But a call to dma_contiguous_reserve in this function gives rise to the
below warning because high_memory is equal to 0 and is used at the very
beginning at cma_declare_contiguous_nid.
It went unnoticed since the move of the kasan region redefined
KERN_VIRT_SIZE so that it does not encompass -1 anymore.
Fix this by initializing high_memory in setup_bootmem.
Alexandre Ghiti [Fri, 25 Feb 2022 12:39:50 +0000 (13:39 +0100)]
riscv: Fix DEBUG_VIRTUAL false warnings
KERN_VIRT_SIZE used to encompass the kernel mapping before it was
redefined when moving the kasan mapping next to the kernel mapping to only
match the maximum amount of physical memory.
Then, kernel mapping addresses that go through __virt_to_phys are now
declared as wrong which is not true, one can use __virt_to_phys on such
addresses.
Fix this by redefining the condition that matches wrong addresses.
Fixes: afdada808285 ("riscv: Move KASAN mapping next to the kernel mapping") Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com> Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
In order to get the pfn of a struct page* when sparsemem is enabled
without vmemmap, the mem_section structures need to be initialized which
happens in sparse_init.
But kasan_early_init calls pfn_to_page way before sparse_init is called,
which then tries to dereference a null mem_section pointer.
Fix this by removing the usage of this function in kasan_early_init.
Alexandre Ghiti [Fri, 25 Feb 2022 12:39:48 +0000 (13:39 +0100)]
riscv: Fix is_linear_mapping with recent move of KASAN region
The KASAN region was recently moved between the linear mapping and the
kernel mapping, is_linear_mapping used to check the validity of an
address by using the start of the kernel mapping, which is now wrong.
Fix this by using the maximum size of the physical memory.
Fixes: afdada808285 ("riscv: Move KASAN mapping next to the kernel mapping") Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com> Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
David Howells [Thu, 3 Mar 2022 13:05:18 +0000 (13:05 +0000)]
cachefiles: Fix incorrect length to fallocate()
When cachefiles_shorten_object() calls fallocate() to shape the cache
file to match the DIO size, it passes the total file size it wants to
achieve, not the amount of zeros that should be inserted. Since this is
meant to preallocate that amount of storage for the file, it can cause
the cache to fill up the disk and hit ENOSPC.
Fix this by passing the length actually required to go from the current
EOF to the desired EOF.
Fixes: d32b2a7eb8ec ("cachefiles: Implement cookie resize for truncate") Reported-by: Jeffle Xu <jefflexu@linux.alibaba.com> Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/164630854858.3665356.17419701804248490708.stgit@warthog.procyon.org.uk Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Thu, 3 Mar 2022 19:10:56 +0000 (11:10 -0800)]
Merge tag 'net-5.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from can, xfrm, wifi, bluetooth, and netfilter.
Lots of various size fixes, the length of the tag speaks for itself.
Most of the 5.17-relevant stuff comes from xfrm, wifi and bt trees
which had been lagging as you pointed out previously. But there's also
a larger than we'd like portion of fixes for bugs from previous
releases.
Three more fixes still under discussion, including and xfrm revert for
uAPI error.
- eth: ixgbe: xsk: change !netif_carrier_ok() handling in
ixgbe_xmit_zc(), prevent live lock when link goes down
- eth: stmmac: only enable DMA interrupts when ready
- eth: sparx5: move vlan checks before any changes are made
- eth: iavf: fix races around init, removal, resets and vlan ops
- ibmvnic: more reset flow fixes
Misc:
- eth: fix return value of __setup handlers"
* tag 'net-5.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (92 commits)
ipv6: fix skb drops in igmp6_event_query() and igmp6_event_report()
net: dsa: make dsa_tree_change_tag_proto actually unwind the tag proto change
ixgbe: xsk: change !netif_carrier_ok() handling in ixgbe_xmit_zc()
selftests: mlxsw: resource_scale: Fix return value
selftests: mlxsw: tc_police_scale: Make test more robust
net: dcb: disable softirqs in dcbnl_flush_dev()
bnx2: Fix an error message
sfc: extend the locking on mcdi->seqno
net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error cause by server
net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error generated by client
net: arcnet: com20020: Fix null-ptr-deref in com20020pci_probe()
tcp: make tcp_read_sock() more robust
bpf, sockmap: Do not ignore orig_len parameter
net: ipa: add an interconnect dependency
net: fix up skbs delta_truesize in UDP GRO frag_list
iwlwifi: mvm: return value for request_ownership
nl80211: Update bss channel on channel switch for P2P_CLIENT
iwlwifi: fix build error for IWLMEI
ptp: ocp: Add ptp_ocp_adjtime_coarse for large adjustments
batman-adv: Don't expect inter-netns unique iflink indices
...
Linus Torvalds [Thu, 3 Mar 2022 18:38:28 +0000 (10:38 -0800)]
Merge tag 'mips-fixes-5.17_4' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux
Pull MIPS fixes from Thomas Bogendoerfer:
- Fix memory detection for MT7621 devices
- Fix setnocoherentio kernel option
- Fix warning when CONFIG_SCHED_CORE is enabled
* tag 'mips-fixes-5.17_4' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
MIPS: ralink: mt7621: use bitwise NOT instead of logical
mips: setup: fix setnocoherentio() boolean setting
MIPS: smp: fill in sibling and core maps earlier
MIPS: ralink: mt7621: do memory detection on KSEG1
Linus Torvalds [Thu, 3 Mar 2022 18:31:09 +0000 (10:31 -0800)]
Merge tag 'auxdisplay-for-linus-v5.17-rc7' of git://github.com/ojeda/linux
Pull auxdisplay fixes from Miguel Ojeda:
"A few lcd2s fixes from Andy Shevchenko"
* tag 'auxdisplay-for-linus-v5.17-rc7' of git://github.com/ojeda/linux:
auxdisplay: lcd2s: Use proper API to free the instance of charlcd object
auxdisplay: lcd2s: Fix memory leak in ->remove()
auxdisplay: lcd2s: Fix lcd2s_redefine_char() feature
Eric Dumazet [Thu, 3 Mar 2022 17:37:28 +0000 (09:37 -0800)]
ipv6: fix skb drops in igmp6_event_query() and igmp6_event_report()
While investigating on why a synchronize_net() has been added recently
in ipv6_mc_down(), I found that igmp6_event_query() and igmp6_event_report()
might drop skbs in some cases.
Discussion about removing synchronize_net() from ipv6_mc_down()
will happen in a different thread.
Fixes: 76fa0d5bd59a ("mld: add new workqueues for process mld events") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Taehee Yoo <ap420073@gmail.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20220303173728.937869-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Thu, 3 Mar 2022 15:42:49 +0000 (17:42 +0200)]
net: dsa: make dsa_tree_change_tag_proto actually unwind the tag proto change
The blamed commit said one thing but did another. It explains that we
should restore the "return err" to the original "goto out_unwind_tagger",
but instead it replaced it with "goto out_unlock".
When DSA_NOTIFIER_TAG_PROTO fails after the first switch of a
multi-switch tree, the switches would end up not using the same tagging
protocol.
ixgbe: xsk: change !netif_carrier_ok() handling in ixgbe_xmit_zc()
Commit 99ad695b2913 ("ixgbe: don't do any AF_XDP zero-copy transmit if
netif is not OK") addressed the ring transient state when
MEM_TYPE_XSK_BUFF_POOL was being configured which in turn caused the
interface to through down/up. Maurice reported that when carrier is not
ok and xsk_pool is present on ring pair, ksoftirqd will consume 100% CPU
cycles due to the constant NAPI rescheduling as ixgbe_poll() states that
there is still some work to be done.
To fix this, do not set work_done to false for a !netif_carrier_ok().
Fixes: 99ad695b2913 ("ixgbe: don't do any AF_XDP zero-copy transmit if netif is not OK") Reported-by: Maurice Baijens <maurice.baijens@ellips.com> Tested-by: Maurice Baijens <maurice.baijens@ellips.com> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 3 Mar 2022 16:14:04 +0000 (08:14 -0800)]
Merge branch 'selftests-mlxsw-a-couple-of-fixes'
Ido Schimmel says:
====================
selftests: mlxsw: A couple of fixes
Patch #1 fixes a breakage due to a change in iproute2 output. The real
problem is not iproute2, but the fact that the check was not strict
enough. Fixed by using JSON output instead. Targeting at net so that the
test will pass as part of old and new kernels regardless of iproute2
version.
Patch #2 fixes an issue uncovered by the first one.
====================
Amit Cohen [Wed, 2 Mar 2022 16:14:46 +0000 (18:14 +0200)]
selftests: mlxsw: tc_police_scale: Make test more robust
The test adds tc filters and checks how many of them were offloaded by
grepping for 'in_hw'.
iproute2 commit f4cd4f127047 ("tc: add skip_hw and skip_sw to control
action offload") added offload indication to tc actions, producing the
following output:
$ tc filter show dev swp2 ingress
...
filter protocol ipv6 pref 1000 flower chain 0 handle 0x7c0
eth_type ipv6
dst_ip 2001:db8:1::7bf
skip_sw
in_hw in_hw_count 1
action order 1: police 0x7c0 rate 10Mbit burst 100Kb mtu 2Kb action drop overhead 0b
ref 1 bind 1
not_in_hw
used_hw_stats immediate
The current grep expression matches on both 'in_hw' and 'not_in_hw',
resulting in incorrect results.
Fix that by using JSON output instead.
Fixes: b74dbba493e4 ("selftests: mlxsw: Add scale test for tc-police") Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Wed, 2 Mar 2022 19:39:39 +0000 (21:39 +0200)]
net: dcb: disable softirqs in dcbnl_flush_dev()
Ido Schimmel points out that since commit dffd712a9f84 ("dcbnl : Disable
software interrupts before taking dcb_lock"), the DCB API can be called
by drivers from softirq context.
One such in-tree example is the chelsio cxgb4 driver:
dcb_rpl
-> cxgb4_dcb_handle_fw_update
-> dcb_ieee_setapp
If the firmware for this driver happened to send an event which resulted
in a call to dcb_ieee_setapp() at the exact same time as another
DCB-enabled interface was unregistering on the same CPU, the softirq
would deadlock, because the interrupted process was already holding the
dcb_lock in dcbnl_flush_dev().
Fix this unlikely event by using spin_lock_bh() in dcbnl_flush_dev() as
in the rest of the dcbnl code.
Fixes: d4eea5e138b3 ("net: dcb: flush lingering app table entries for unregistered devices") Reported-by: Ido Schimmel <idosch@idosch.org> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20220302193939.1368823-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Niels Dossche [Tue, 1 Mar 2022 22:28:22 +0000 (23:28 +0100)]
sfc: extend the locking on mcdi->seqno
seqno could be read as a stale value outside of the lock. The lock is
already acquired to protect the modification of seqno against a possible
race condition. Place the reading of this value also inside this locking
to protect it against a possible race condition.
Signed-off-by: Niels Dossche <dossche.niels@gmail.com> Acked-by: Martin Habets <habetsm.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
And we can clearly see that this error is also divided into two types:
1. 0x09990003
2. 0x05000000/0x09990003
Which has the same root causes, but the immediate causes vary.
The root cause of this issues is that remove connections from link group
is not synchronous with add/delete rtoken entry, which means that even
the number of connections is less that SMC_RMBS_PER_LGR_MAX, it does not
mean that the connection can register rtoken successfully later. In
other words, the rtoken entry may released, This will cause an
unexpected SMC_CLC_DECL_ERR_REGRMB to be reported, and then this SMC
connections have to fallback to TCP.
This patch set handles two types of SMC_CLC_DECL_ERR_REGRMB exceptions
from different perspectives.
Patch 1: fix the 0x05000000/0x09990003 error.
Patch 2: fix the 0x09990003 error.
After those patches, there is no SMC_CLC_DECL_ERR_REGRMB exceptions in
my
test case any more.
v1 -> v2:
- add bugfix patch for SMC_CLC_DECL_ERR_REGRMB cause by server side
v2 -> v3:
- fix incorrect mail thread
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
D. Wythe [Wed, 2 Mar 2022 13:25:12 +0000 (21:25 +0800)]
net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error cause by server
The problem of SMC_CLC_DECL_ERR_REGRMB on the server is very clear.
Based on the fact that whether a new SMC connection can be accepted or
not depends on not only the limit of conn nums, but also the available
entries of rtoken. Since the rtoken release is trigger by peer, while
the conn nums is decrease by local, tons of thing can happen in this
time difference.
This only thing that needs to be mentioned is that now all connection
creations are completely protected by smc_server_lgr_pending lock, it's
enough to check only the available entries in rtokens_used_mask.
Fixes: 6d24c9bf5120 ("smc: remote memory buffers (RMBs)") Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
smc_lgr_unregister_conn() makes current link available to assigned to new
incoming connection, while smcr_buf_unuse() has not executed yet, which
means that smc_rtoken_add may fail because of insufficient rtoken_entry,
reversing their execution order will avoid this problem.
Fixes: 47f3c5a9bb7c ("net/smc: common functions for RMBs and send buffers") Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Zheyu Ma [Wed, 2 Mar 2022 12:24:23 +0000 (20:24 +0800)]
net: arcnet: com20020: Fix null-ptr-deref in com20020pci_probe()
During driver initialization, the pointer of card info, i.e. the
variable 'ci' is required. However, the definition of
'com20020pci_id_table' reveals that this field is empty for some
devices, which will cause null pointer dereference when initializing
these devices.
Fix this by checking whether the 'ci' is a null pointer first.
Fixes: 31e4bbb26b04 ("ARCNET: add com20020 PCI IDs with metadata") Signed-off-by: Zheyu Ma <zheyuma97@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 2 Mar 2022 16:17:23 +0000 (08:17 -0800)]
tcp: make tcp_read_sock() more robust
If recv_actor() returns an incorrect value, tcp_read_sock()
might loop forever.
Instead, issue a one time warning and make sure to make progress.
Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20220302161723.3910001-2-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alex Elder [Tue, 1 Mar 2022 11:34:40 +0000 (05:34 -0600)]
net: ipa: add an interconnect dependency
In order to function, the IPA driver very clearly requires the
interconnect framework to be enabled in the kernel configuration.
State that dependency in the Kconfig file.
This became a problem when CONFIG_COMPILE_TEST support was added.
Non-Qualcomm platforms won't necessarily enable CONFIG_INTERCONNECT.
Reported-by: kernel test robot <lkp@intel.com> Fixes: 3ce3d6ab0dc8f ("net: ipa: support COMPILE_TEST") Signed-off-by: Alex Elder <elder@linaro.org> Link: https://lore.kernel.org/r/20220301113440.257916-1-elder@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
lena wang [Tue, 1 Mar 2022 11:17:09 +0000 (19:17 +0800)]
net: fix up skbs delta_truesize in UDP GRO frag_list
The truesize for a UDP GRO packet is added by main skb and skbs in main
skb's frag_list:
skb_gro_receive_list
p->truesize += skb->truesize;
The commit 96ec7031f351 ("net: fix use-after-free when UDP GRO with
shared fraglist") introduced a truesize increase for frag_list skbs.
When uncloning skb, it will call pskb_expand_head and trusesize for
frag_list skbs may increase. This can occur when allocators uses
__netdev_alloc_skb and not jump into __alloc_skb. This flow does not
use ksize(len) to calculate truesize while pskb_expand_head uses.
skb_segment_list
err = skb_unclone(nskb, GFP_ATOMIC);
pskb_expand_head
if (!skb->sk || skb->destructor == sock_edemux)
skb->truesize += size - osize;
If we uses increased truesize adding as delta_truesize, it will be
larger than before and even larger than previous total truesize value
if skbs in frag_list are abundant. The main skb truesize will become
smaller and even a minus value or a huge value for an unsigned int
parameter. Then the following memory check will drop this abnormal skb.
To avoid this error we should use the original truesize to segment the
main skb.
Fixes: 96ec7031f351 ("net: fix use-after-free when UDP GRO with shared fraglist") Signed-off-by: lena wang <lena.wang@mediatek.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/1646133431-8948-1-git-send-email-lena.wang@mediatek.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 3 Mar 2022 05:53:34 +0000 (21:53 -0800)]
Merge tag 'batadv-net-pullrequest-20220302' of git://git.open-mesh.org/linux-merge
Simon Wunderlich says:
====================
Here are some batman-adv bugfixes:
- Remove redundant iflink requests, by Sven Eckelmann (2 patches)
- Don't expect inter-netns unique iflink indices, by Sven Eckelmann
* tag 'batadv-net-pullrequest-20220302' of git://git.open-mesh.org/linux-merge:
batman-adv: Don't expect inter-netns unique iflink indices
batman-adv: Request iflink once in batadv_get_real_netdevice
batman-adv: Request iflink once in batadv-on-batadv check
====================
Jakub Kicinski [Thu, 3 Mar 2022 05:49:57 +0000 (21:49 -0800)]
Merge tag 'wireless-for-net-2022-03-02' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless
Johannes Berg says:
====================
Three more fixes:
- fix build issue in iwlwifi, now that I understood
what's going on there
- propagate error in iwlwifi/mvm to userspace so it
can figure out what's happening
- fix channel switch related updates in P2P-client
in cfg80211
* tag 'wireless-for-net-2022-03-02' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
iwlwifi: mvm: return value for request_ownership
nl80211: Update bss channel on channel switch for P2P_CLIENT
iwlwifi: fix build error for IWLMEI
====================
Linus Torvalds [Thu, 3 Mar 2022 00:20:04 +0000 (16:20 -0800)]
Merge branch 'ucount-rlimit-fixes-for-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull ucounts fix from Eric Biederman:
"Etienne Dechamps recently found a regression caused by enforcing
RLIMIT_NPROC for root where the rlimit was not previously enforced.
Michal Koutný had previously pointed out the inconsistency in
enforcing the RLIMIT_NPROC that had been on the root owned process
after the root user creates a user namespace.
Which makes the fix for the regression simply removing the
inconsistency"
* 'ucount-rlimit-fixes-for-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
ucounts: Fix systemd LimitNPROC with private users regression
Linus Torvalds [Thu, 3 Mar 2022 00:11:56 +0000 (16:11 -0800)]
Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm
Pull ARM fixes from Russell King:
- Fix kgdb breakpoint for Thumb2
- Fix dependency for BITREVERSE kconfig
- Fix nommu early_params and __setup returns
* tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: 9182/1: mmu: fix returns from early_param() and __setup() functions
ARM: 9178/1: fix unmet dependency on BITREVERSE for HAVE_ARCH_BITREVERSE
ARM: Fix kgdb breakpoint for Thumb2
Qiang Yu [Tue, 1 Mar 2022 06:11:59 +0000 (14:11 +0800)]
drm/amdgpu: fix suspend/resume hang regression
Regression has been reported that suspend/resume may hang with
the previous vm ready check commit.
So bring back the evicted list check as a temp fix.
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1922 Fixes: 8dce28f235c2 ("drm/amdgpu: check vm ready by amdgpu_vm->evicting flag") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Qiang Yu <qiang.yu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Andy Shevchenko [Wed, 23 Feb 2022 15:47:18 +0000 (17:47 +0200)]
auxdisplay: lcd2s: Use proper API to free the instance of charlcd object
While it might work, the current approach is fragile in a few ways:
- whenever members in the structure are shuffled, the pointer will be wrong
- the resource freeing may include more than covered by kfree()
Fix this by using charlcd_free() call instead of kfree().
Fixes: d4bab42e94cd ("auxdisplay: add a driver for lcd2s character display") Cc: Lars Poeschel <poeschel@lemonage.de> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
Andy Shevchenko [Wed, 23 Feb 2022 15:47:17 +0000 (17:47 +0200)]
auxdisplay: lcd2s: Fix memory leak in ->remove()
Once allocated the struct lcd2s_data is never freed.
Fix the memory leak by switching to devm_kzalloc().
Fixes: d4bab42e94cd ("auxdisplay: add a driver for lcd2s character display") Cc: Lars Poeschel <poeschel@lemonage.de> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
It seems that the lcd2s_redefine_char() has never been properly
tested. The buffer is filled by DEF_CUSTOM_CHAR command followed
by the character number (from 0 to 7), but immediately after that
these bytes are rewritten by the decoded hex stream.
Fix the index to fill the buffer after the command and number.
Fixes: d4bab42e94cd ("auxdisplay: add a driver for lcd2s character display") Cc: Lars Poeschel <poeschel@lemonage.de> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
[fixed typo in commit message] Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
nl80211: Update bss channel on channel switch for P2P_CLIENT
The wdev channel information is updated post channel switch only for
the station mode and not for the other modes. Due to this, the P2P client
still points to the old value though it moved to the new channel
when the channel change is induced from the P2P GO.
Update the bss channel after CSA channel switch completion for P2P client
interface as well.
Randy Dunlap [Sun, 27 Feb 2022 20:00:51 +0000 (12:00 -0800)]
iwlwifi: fix build error for IWLMEI
When CONFIG_IWLWIFI=m and CONFIG_IWLMEI=y, the kernel build system
must be told to build the iwlwifi/ subdirectory for both IWLWIFI and
IWLMEI so that builds for both =y and =m are done.
Linus Torvalds [Wed, 2 Mar 2022 20:08:36 +0000 (12:08 -0800)]
Merge tag 'erofs-for-5.17-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs fix from Gao Xiang:
"A one-line patch to fix the new ztailpacking feature on > 4GiB
filesystems because z_idataoff can get trimmed improperly.
ztailpacking is still a brand new EXPERIMENTAL feature, but it'd be
better to fix the issue as soon as possible to avoid unnecessary
backporting.
Summary:
- Fix ztailpacking z_idataoff getting trimmed on > 4GiB filesystems"
* tag 'erofs-for-5.17-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: fix ztailpacking on > 4GiB filesystems
Linus Torvalds [Wed, 2 Mar 2022 19:58:27 +0000 (11:58 -0800)]
Merge tag 'ntb-5.17-bugfixes' of git://github.com/jonmason/ntb
Pull NTB fixes from Jon Mason:
"Bug fixes for sparse warning, intel port config offset, and a new
mailing list"
* tag 'ntb-5.17-bugfixes' of git://github.com/jonmason/ntb:
MAINTAINERS: update mailing list address for NTB subsystem
ntb: intel: fix port config status offset for SPR
NTB/msi: Use struct_size() helper in devm_kzalloc()
Jonathan Lemon [Mon, 28 Feb 2022 20:39:57 +0000 (12:39 -0800)]
ptp: ocp: Add ptp_ocp_adjtime_coarse for large adjustments
In ("ptp: ocp: Have FPGA fold in ns adjustment for adjtime."), the
ns adjustment was written to the FPGA register, so the clock could
accurately perform adjustments.
However, the adjtime() call passes in a s64, while the clock adjustment
registers use a s32. When trying to perform adjustments with a large
value (37 sec), things fail.
Examine the incoming delta, and if larger than 1 sec, use the original
(coarse) adjustment method. If smaller than 1 sec, then allow the
FPGA to fold in the changes over a 1 second window.
Fixes: 917de1d917ee ("ptp: ocp: Have FPGA fold in ns adjustment for adjtime.") Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Link: https://lore.kernel.org/r/20220228203957.367371-1-jonathan.lemon@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gao Xiang [Tue, 22 Feb 2022 03:31:18 +0000 (11:31 +0800)]
erofs: fix ztailpacking on > 4GiB filesystems
z_idataoff here is an absolute physical offset, so it should use
erofs_off_t (64 bits at least). Otherwise, it'll get trimmed and
cause the decompresion failure.
Zhen Ni [Wed, 2 Mar 2022 07:42:41 +0000 (15:42 +0800)]
ALSA: intel_hdmi: Fix reference to PCM buffer address
PCM buffers might be allocated dynamically when the buffer
preallocation failed or a larger buffer is requested, and it's not
guaranteed that substream->dma_buffer points to the actually used
buffer. The driver needs to refer to substream->runtime->dma_addr
instead for the buffer address.
Sven Eckelmann [Sun, 27 Feb 2022 22:23:49 +0000 (23:23 +0100)]
batman-adv: Don't expect inter-netns unique iflink indices
The ifindex doesn't have to be unique for multiple network namespaces on
the same machine.
$ ip netns add test1
$ ip -net test1 link add dummy1 type dummy
$ ip netns add test2
$ ip -net test2 link add dummy2 type dummy
$ ip -net test1 link show dev dummy1
6: dummy1: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 96:81:55:1e:dd:85 brd ff:ff:ff:ff:ff:ff
$ ip -net test2 link show dev dummy2
6: dummy2: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 5a:3c:af:35:07:c3 brd ff:ff:ff:ff:ff:ff
But the batman-adv code to walk through the various layers of virtual
interfaces uses this assumption because dev_get_iflink handles it
internally and doesn't return the actual netns of the iflink. And
dev_get_iflink only documents the situation where ifindex == iflink for
physical devices.
But only checking for dev->netdev_ops->ndo_get_iflink is also not an option
because ipoib_get_iflink implements it even when it sometimes returns an
iflink != ifindex and sometimes iflink == ifindex. The caller must
therefore make sure itself to check both netns and iflink + ifindex for
equality. Only when they are equal, a "physical" interface was detected
which should stop the traversal. On the other hand, vxcan_get_iflink can
also return 0 in case there was currently no valid peer. In this case, it
is still necessary to stop.
Fixes: 88c70b8cee47 ("batman-adv: prevent using any virtual device created on batman-adv as hard-interface") Fixes: 463a0e2a6e70 ("batman-adv: additional checks for virtual interfaces on top of WiFi") Reported-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Sven Eckelmann [Sun, 27 Feb 2022 23:01:24 +0000 (00:01 +0100)]
batman-adv: Request iflink once in batadv_get_real_netdevice
There is no need to call dev_get_iflink multiple times for the same
net_device in batadv_get_real_netdevice. And since some of the
ndo_get_iflink callbacks are dynamic (for example via RCUs like in
vxcan_get_iflink), it could easily happen that the returned values are not
stable. The pre-checks before __dev_get_by_index are then of course bogus.
Fixes: 463a0e2a6e70 ("batman-adv: additional checks for virtual interfaces on top of WiFi") Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Sven Eckelmann [Sun, 27 Feb 2022 23:01:24 +0000 (00:01 +0100)]
batman-adv: Request iflink once in batadv-on-batadv check
There is no need to call dev_get_iflink multiple times for the same
net_device in batadv_is_on_batman_iface. And since some of the
.ndo_get_iflink callbacks are dynamic (for example via RCUs like in
vxcan_get_iflink), it could easily happen that the returned values are not
stable. The pre-checks before __dev_get_by_index are then of course bogus.
Fixes: 88c70b8cee47 ("batman-adv: prevent using any virtual device created on batman-adv as hard-interface") Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Vladimir Oltean [Mon, 28 Feb 2022 14:17:15 +0000 (16:17 +0200)]
net: dsa: restore error path of dsa_tree_change_tag_proto
When the DSA_NOTIFIER_TAG_PROTO returns an error, the user space process
which initiated the protocol change exits the kernel processing while
still holding the rtnl_mutex. So any other process attempting to lock
the rtnl_mutex would deadlock after such event.
The error handling of DSA_NOTIFIER_TAG_PROTO was inadvertently changed
by the blamed commit, introducing this regression. We must still call
rtnl_unlock(), and we must still call DSA_NOTIFIER_TAG_PROTO for the old
protocol. The latter is due to the limiting design of notifier chains
for cross-chip operations, which don't have a built-in error recovery
mechanism - we should look into using notifier_call_chain_robust for that.
Jakub Kicinski [Wed, 2 Mar 2022 01:16:46 +0000 (17:16 -0800)]
Merge tag 'for-net-2022-03-01' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- Fix regression with scanning not working in some systems.
* tag 'for-net-2022-03-01' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: Fix not checking MGMT cmd pending queue
====================
Brian Gix [Tue, 1 Mar 2022 22:34:57 +0000 (14:34 -0800)]
Bluetooth: Fix not checking MGMT cmd pending queue
A number of places in the MGMT handlers we examine the command queue for
other commands (in progress but not yet complete) that will interact
with the process being performed. However, not all commands go into the
queue if one of:
1. There is no negative side effect of consecutive or redundent commands
2. The command is entirely perform "inline".
This change examines each "pending command" check, and if it is not
needed, deletes the check. Of the remaining pending command checks, we
make sure that the command is in the pending queue by using the
mgmt_pending_add/mgmt_pending_remove pair rather than the
mgmt_pending_new/mgmt_pending_free pair.
Paul Blakey [Mon, 28 Feb 2022 09:23:49 +0000 (11:23 +0200)]
net/sched: act_ct: Fix flow table lookup failure with no originating ifindex
After cited commit optimizted hw insertion, flow table entries are
populated with ifindex information which was intended to only be used
for HW offload. This tuple ifindex is hashed in the flow table key, so
it must be filled for lookup to be successful. But tuple ifindex is only
relevant for the netfilter flowtables (nft), so it's not filled in
act_ct flow table lookup, resulting in lookup failure, and no SW
offload and no offload teardown for TCP connection FIN/RST packets.
To fix this, add new tc ifindex field to tuple, which will
only be used for offloading, not for lookup, as it will not be
part of the tuple hash.
Fixes: 0b4775375e46 ("net/sched: act_ct: Fill offloading tuple iifidx") Signed-off-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Linus Torvalds [Tue, 1 Mar 2022 20:01:18 +0000 (12:01 -0800)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"The bigger part of the change is a revert for x86 hosts. Here the
second patch was supposed to fix the first, but in reality it was just
as broken, so both have to go.
x86 host:
- Revert incorrect assumption that cr3 changes come with preempt
notifier callbacks (they don't when static branches are changed,
for example)
ARM host:
- Correctly synchronise PMR and co on PSCI CPU_SUSPEND
- Skip tests that depend on GICv3 when the HW isn't available"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: selftests: aarch64: Skip tests if we can't create a vgic-v3
Revert "KVM: VMX: Save HOST_CR3 in vmx_prepare_switch_to_guest()"
Revert "KVM: VMX: Save HOST_CR3 in vmx_set_host_fs_gs()"
KVM: arm64: Don't miss pending interrupts for suspended vCPU
Manasi Navare [Fri, 25 Feb 2022 01:30:54 +0000 (17:30 -0800)]
drm/vrr: Set VRR capable prop only if it is attached to connector
VRR capable property is not attached by default to the connector
It is attached only if VRR is supported.
So if the driver tries to call drm core set prop function without
it being attached that causes NULL dereference.
Kees Cook [Mon, 28 Feb 2022 18:59:12 +0000 (10:59 -0800)]
binfmt_elf: Avoid total_mapping_size for ET_EXEC
Partially revert commit 8a174c881983 ("binfmt_elf: reintroduce using
MAP_FIXED_NOREPLACE"), which applied the ET_DYN "total_mapping_size"
logic also to ET_EXEC.
At least ia64 has ET_EXEC PT_LOAD segments that are not virtual-address
contiguous (but _are_ file-offset contiguous). This would result in a
giant mapping attempting to cover the entire span, including the virtual
address range hole, and well beyond the size of the ELF file itself,
causing the kernel to refuse to load it. For example:
File offset range : 0x000000-0x00bb4c
0x00bb4c bytes
Virtual address range : 0x4000000000000000-0x600000000000bcb0
0x200000000000bcb0 bytes
Remove the total_mapping_size logic for ET_EXEC, which reduces the
ET_EXEC MAP_FIXED_NOREPLACE coverage to only the first PT_LOAD (better
than nothing), and retains it for ET_DYN.
Ironically, this is the reverse of the problem that originally caused
problems with MAP_FIXED_NOREPLACE: overlapping PT_LOAD segments. Future
work could restore full coverage if load_elf_binary() were to perform
mappings in a separate phase from the loading (where it could resolve
both overlaps and holes).
Do not call get_trip_hyst() from thermal_genl_cmd_tz_get_trip() if
the thermal zone does not define one.
Fixes: 259044c38f1f ("thermal: core: genetlink support for events/cmd/sampling") Signed-off-by: Nicolas Cavallari <nicolas.cavallari@green-communications.fr> Cc: 5.10+ <stable@vger.kernel.org> # 5.10+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
David S. Miller [Tue, 1 Mar 2022 14:45:55 +0000 (14:45 +0000)]
Merge tag 'wireless-for-net-2022-03-01' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless
johannes Berg says:
====================
Some last-minute fixes:
* rfkill
- add missing rfill_soft_blocked() when disabled
* cfg80211
- handle a nla_memdup() failure correctly
- fix CONFIG_CFG80211_EXTRA_REGDB_KEYDIR typo in
Makefile
* mac80211
- fix EAPOL handling in 802.3 RX path
- reject setting up aggregation sessions before
connection is authorized to avoid timeouts or
similar
- handle some SAE authentication steps correctly
- fix AC selection in mesh forwarding
* iwlwifi
- remove TWT support as it causes firmware crashes
when the AP isn't behaving correctly
- check debugfs pointer before dereferncing it
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Westphal [Mon, 28 Feb 2022 23:46:19 +0000 (00:46 +0100)]
netfilter: nf_queue: handle socket prefetch
In case someone combines bpf socket assign and nf_queue, then we will
queue an skb who references a struct sock that did not have its
reference count incremented.
As we leave rcu protection, there is no guarantee that skb->sk is still
valid.
For refcount-less skb->sk case, try to increment the reference count
and then override the destructor.
In case of failure we have two choices: orphan the skb and 'delete'
preselect or let nf_queue() drop the packet.
Do the latter, it should not happen during normal operation.
Florian Westphal [Fri, 25 Feb 2022 11:01:23 +0000 (12:01 +0100)]
selftests: netfilter: add nfqueue TCP_NEW_SYN_RECV socket race test
causes:
BUG: KASAN: slab-out-of-bounds in sk_free+0x25/0x80
Write of size 4 at addr ffff888106df0284 by task nf-queue/1459
sk_free+0x25/0x80
nf_queue_entry_release_refs+0x143/0x1a0
nf_reinject+0x233/0x770
... without 'netfilter: nf_queue: don't assume sk is full socket'.
Johannes Berg [Thu, 24 Feb 2022 09:39:34 +0000 (10:39 +0100)]
mac80211: treat some SAE auth steps as final
When we get anti-clogging token required (added by the commit
mentioned below), or the other status codes added by the later
commit 2074365f8075 ("mac80211: Handle special status codes in
SAE commit") we currently just pretend (towards the internal
state machine of authentication) that we didn't receive anything.
This has the undesirable consequence of retransmitting the prior
frame, which is not expected, because the timer is still armed.
If we just disarm the timer at that point, it would result in
the undesirable side effect of being in this state indefinitely
if userspace crashes, or so.
So to fix this, reset the timer and set a new auth_data->waiting
in order to have no more retransmissions, but to have the data
destroyed when the timer actually fires, which will only happen
if userspace didn't continue (i.e. crashed or abandoned it.)
Golan Ben Ami [Tue, 1 Mar 2022 07:29:26 +0000 (09:29 +0200)]
iwlwifi: don't advertise TWT support
Some APs misbehave when TWT is used and cause our firmware to crash.
We don't know a reasonable way to detect and work around this problem
in the FW yet. To prevent these crashes, disable TWT in the driver by
stopping to advertise TWT support.
Ben Dooks [Fri, 18 Feb 2022 09:38:58 +0000 (09:38 +0000)]
rfkill: define rfill_soft_blocked() if !RFKILL
If CONFIG_RFKILL is not set, the Intel WiFi driver will not build
the iw_mvm driver part due to the missing rfill_soft_blocked()
call. Adding a inline declaration of rfill_soft_blocked() if
CONFIG_RFKILL=n fixes the following error:
drivers/net/wireless/intel/iwlwifi/mvm/mvm.h: In function 'iwl_mvm_mei_set_sw_rfkill_state':
drivers/net/wireless/intel/iwlwifi/mvm/mvm.h:2215:38: error: implicit declaration of function 'rfkill_soft_blocked'; did you mean 'rfkill_blocked'? [-Werror=implicit-function-declaration]
2215 | mvm->hw_registered ? rfkill_soft_blocked(mvm->hw->wiphy->rfkill) : false;
| ^~~~~~~~~~~~~~~~~~~
| rfkill_blocked
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk> Reported-by: Neill Whillans <neill.whillans@codethink.co.uk> Fixes: 7a36943340aa ("rfkill: allow to get the software rfkill state") Link: https://lore.kernel.org/r/20220218093858.1245677-1-ben.dooks@codethink.co.uk Signed-off-by: Johannes Berg <johannes.berg@intel.com>
David S. Miller [Tue, 1 Mar 2022 08:33:55 +0000 (08:33 +0000)]
Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2022-02-28
This series contains updates to igc and e1000e drivers.
Corinna Vinschen ensures release of hardware sempahore on failed
register read in igc_read_phy_reg_gpy().
Sasha does the same for the write variant, igc_write_phy_reg_gpy(). On
e1000e, he resolves an issue with hardware unit hang on s0ix exit
by disabling some bits and LAN connected device reset during power
management flows. Lastly, he allows for TGP platforms to correct its
NVM checksum.
v2: Fix Fixes tag on patch 3
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Samuel Holland [Wed, 16 Feb 2022 04:00:36 +0000 (22:00 -0600)]
pinctrl: sunxi: Use unique lockdep classes for IRQs
This driver, like several others, uses a chained IRQ for each GPIO bank,
and forwards .irq_set_wake to the GPIO bank's upstream IRQ. As a result,
a call to irq_set_irq_wake() needs to lock both the upstream and
downstream irq_desc's. Lockdep considers this to be a possible deadlock
when the irq_desc's share lockdep classes, which they do by default:
============================================
WARNING: possible recursive locking detected 5.17.0-rc3-00394-gc849047c2473 #1 Not tainted
--------------------------------------------
init/307 is trying to acquire lock: c2dfe27c (&irq_desc_lock_class){-.-.}-{2:2}, at: __irq_get_desc_lock+0x58/0xa0
but task is already holding lock: c3c0ac7c (&irq_desc_lock_class){-.-.}-{2:2}, at: __irq_get_desc_lock+0x58/0xa0
other info that might help us debug this:
Possible unsafe locking scenario:
stack backtrace:
CPU: 0 PID: 307 Comm: init Not tainted 5.17.0-rc3-00394-gc849047c2473 #1
Hardware name: Allwinner sun8i Family
unwind_backtrace from show_stack+0x10/0x14
show_stack from dump_stack_lvl+0x68/0x90
dump_stack_lvl from __lock_acquire+0x1680/0x31a0
__lock_acquire from lock_acquire+0x148/0x3dc
lock_acquire from _raw_spin_lock_irqsave+0x50/0x6c
_raw_spin_lock_irqsave from __irq_get_desc_lock+0x58/0xa0
__irq_get_desc_lock from irq_set_irq_wake+0x2c/0x19c
irq_set_irq_wake from irq_set_irq_wake+0x13c/0x19c
[tail call from sunxi_pinctrl_irq_set_wake]
irq_set_irq_wake from gpio_keys_suspend+0x80/0x1a4
gpio_keys_suspend from gpio_keys_shutdown+0x10/0x2c
gpio_keys_shutdown from device_shutdown+0x180/0x224
device_shutdown from __do_sys_reboot+0x134/0x23c
__do_sys_reboot from ret_fast_syscall+0x0/0x1c
However, this can never deadlock because the upstream and downstream
IRQs are never the same (nor do they even involve the same irqchip).
Silence this erroneous lockdep splat by applying what appears to be the
usual fix of moving the GPIO IRQs to separate lockdep classes.
Fixes: 8bdebd9d7a8f ("pinctrl: sunxi: Forward calls to irq_set_irq_wake") Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Samuel Holland <samuel@sholland.org> Reviewed-by: Jernej Skrabec <jernej.skrabec@gmail.com> Tested-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/20220216040037.22730-1-samuel@sholland.org Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Hans Verkuil [Wed, 26 Jan 2022 11:02:04 +0000 (12:02 +0100)]
pinctrl-sunxi: sunxi_pinctrl_gpio_direction_in/output: use correct offset
The commit that sets the direction directly without calling
pinctrl_gpio_direction(), forgot to add chip->base to the offset when
calling sunxi_pmx_gpio_set_direction().
This caused failures for various Allwinner boards which have two
GPIO blocks.
Sasha Neftin [Thu, 3 Feb 2022 12:21:49 +0000 (14:21 +0200)]
e1000e: Correct NVM checksum verification flow
Update MAC type check e1000_pch_tgp because for e1000_pch_cnp,
NVM checksum update is still possible.
Emit a more detailed warning message.
Bugzilla: https://bugzilla.opensuse.org/show_bug.cgi?id=1191663 Fixes: c3bd9db97fa7 ("e1000e: Do not take care about recovery NVM checksum") Reported-by: Thomas Bogendoerfer <tbogendoerfer@suse.de> Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Sasha Neftin [Tue, 25 Jan 2022 17:31:23 +0000 (19:31 +0200)]
e1000e: Fix possible HW unit hang after an s0ix exit
Disable the OEM bit/Gig Disable/restart AN impact and disable the PHY
LAN connected device (LCD) reset during power management flows. This
fixes possible HW unit hangs on the s0ix exit on some corporate ADL
platforms.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=214821 Fixes: 7c28b2aaf1e5 ("e1000e: Add handshake with the CSME to support S0ix") Suggested-by: Dima Ruinskiy <dima.ruinskiy@intel.com> Suggested-by: Nir Efrati <nir.efrati@intel.com> Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Netfilter assumes its called with rcu_read_lock held, but in egress
hook case it may be called with BH readlock.
This triggers lockdep splat.
In order to avoid to change all rcu_dereference() to
rcu_dereference_check(..., rcu_read_lock_bh_held()), wrap nf_hook_slow
with read lock/unlock pair.
Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Eric Dumazet [Sun, 27 Feb 2022 18:01:41 +0000 (10:01 -0800)]
netfilter: fix use-after-free in __nf_register_net_hook()
We must not dereference @new_hooks after nf_hook_mutex has been released,
because other threads might have freed our allocated hooks already.
BUG: KASAN: use-after-free in nf_hook_entries_get_hook_ops include/linux/netfilter.h:130 [inline]
BUG: KASAN: use-after-free in hooks_validate net/netfilter/core.c:171 [inline]
BUG: KASAN: use-after-free in __nf_register_net_hook+0x77a/0x820 net/netfilter/core.c:438
Read of size 2 at addr ffff88801c1a8000 by task syz-executor237/4430
Linus Torvalds [Mon, 28 Feb 2022 20:44:33 +0000 (12:44 -0800)]
Merge tag 'efi-urgent-for-v5.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
Pull EFI fixes from Ard Biesheuvel:
- don't treat valid hartid U32_MAX as a failure return code (RISC-V)
- avoid blocking query_variable_info() call when blocking is not
allowed
* tag 'efi-urgent-for-v5.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
efivars: Respect "block" flag in efivar_entry_set_safe()
riscv/efi_stub: Fix get_boot_hartid_from_fdt() return value
The PM Runtime docs say:
Drivers in ->remove() callback should undo the runtime PM changes done
in ->probe(). Usually this means calling pm_runtime_disable(),
pm_runtime_dont_use_autosuspend() etc.
We weren't doing that for autosuspend. Let's do it.
Sasha Neftin [Sun, 20 Feb 2022 07:29:15 +0000 (09:29 +0200)]
igc: igc_write_phy_reg_gpy: drop premature return
Similar to "igc_read_phy_reg_gpy: drop premature return" patch.
igc_write_phy_reg_gpy checks the return value from igc_write_phy_reg_mdic
and if it's not 0, returns immediately. By doing this, it leaves the HW
semaphore in the acquired state.
Drop this premature return statement, the function returns after
releasing the semaphore immediately anyway.
Fixes: f2c4d6854520 ("igc: Add code for PHY support") Suggested-by: Dima Ruinskiy <dima.ruinskiy@intel.com> Reported-by: Corinna Vinschen <vinschen@redhat.com> Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Corinna Vinschen [Wed, 16 Feb 2022 13:31:35 +0000 (14:31 +0100)]
igc: igc_read_phy_reg_gpy: drop premature return
igc_read_phy_reg_gpy checks the return value from igc_read_phy_reg_mdic
and if it's not 0, returns immediately. By doing this, it leaves the HW
semaphore in the acquired state.
Drop this premature return statement, the function returns after
releasing the semaphore immediately anyway.
Fixes: f2c4d6854520 ("igc: Add code for PHY support") Signed-off-by: Corinna Vinschen <vinschen@redhat.com> Acked-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Randy Dunlap [Wed, 23 Feb 2022 19:46:35 +0000 (20:46 +0100)]
ARM: 9182/1: mmu: fix returns from early_param() and __setup() functions
early_param() handlers should return 0 on success.
__setup() handlers should return 1 on success, i.e., the parameter
has been handled. A return of 0 would cause the "option=value" string
to be added to init's environment strings, polluting it.
../arch/arm/mm/mmu.c: In function 'test_early_cachepolicy':
../arch/arm/mm/mmu.c:215:1: error: no return statement in function returning non-void [-Werror=return-type]
../arch/arm/mm/mmu.c: In function 'test_noalign_setup':
../arch/arm/mm/mmu.c:221:1: error: no return statement in function returning non-void [-Werror=return-type]
Fixes: 6e34d4eb084a ("ARM: make cr_alignment read-only #ifndef CONFIG_CPU_CP15") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: Igor Zhbanov <i.zhbanov@omprussia.ru> Cc: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Cc: linux-arm-kernel@lists.infradead.org Cc: patches@armlinux.org.uk Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Miaoqian Lin [Fri, 7 Jan 2022 08:09:11 +0000 (08:09 +0000)]
iommu/tegra-smmu: Fix missing put_device() call in tegra_smmu_find
The reference taken by 'of_find_device_by_node()' must be released when
not needed anymore.
Add the corresponding 'put_device()' in the error handling path.
Fixes: 9d1145197604 ("iommu/tegra-smmu: Fix mc errors on tegra124-nyan") Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Acked-by: Thierry Reding <treding@nvidia.com> Link: https://lore.kernel.org/r/20220107080915.12686-1-linmq006@gmail.com Signed-off-by: Joerg Roedel <jroedel@suse.de>