Daniel Vetter [Thu, 4 Feb 2021 16:58:31 +0000 (17:58 +0100)]
PCI: Revoke mappings like devmem
Since 3234ac664a87 ("/dev/mem: Revoke mappings when a driver claims
the region") /dev/kmem zaps PTEs when the kernel requests exclusive
acccess to an iomem region. And with CONFIG_IO_STRICT_DEVMEM, this is
the default for all driver uses.
Except there are two more ways to access PCI BARs: sysfs and proc mmap
support. Let's plug that hole.
For revoke_devmem() to work we need to link our vma into the same
address_space, with consistent vma->vm_pgoff. ->pgoff is already
adjusted, because that's how (io_)remap_pfn_range works, but for the
mapping we need to adjust vma->vm_file->f_mapping. The cleanest way is
to adjust this at at ->open time:
- for sysfs this is easy, now that binary attributes support this. We
just set bin_attr->mapping when mmap is supported
- for procfs it's a bit more tricky, since procfs PCI access has only
one file per device, and access to a specific resource first needs
to be set up with some ioctl calls. But mmap is only supported for
the same resources as sysfs exposes with mmap support, and otherwise
rejected, so we can set the mapping unconditionally at open time
without harm.
A special consideration is for arch_can_pci_mmap_io() - we need to
make sure that the ->f_mapping doesn't alias between ioport and iomem
space. There are only 2 ways in-tree to support mmap of ioports: generic
PCI mmap (ARCH_GENERIC_PCI_MMAP_RESOURCE), and sparc as the single
architecture hand-rolling. Both approaches support ioport mmap through a
special PFN range and not through magic PTE attributes. Aliasing is
therefore not a problem.
The only difference in access checks left is that sysfs PCI mmap does
not check for CAP_RAWIO. I'm not really sure whether that should be
added or not.
Acked-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Kees Cook <keescook@chromium.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: linux-mm@kvack.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-samsung-soc@vger.kernel.org Cc: linux-media@vger.kernel.org Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: linux-pci@vger.kernel.org Link: https://patchwork.freedesktop.org/patch/msgid/20210204165831.2703772-3-daniel.vetter@ffwll.ch
Daniel Vetter [Fri, 5 Feb 2021 13:36:32 +0000 (14:36 +0100)]
PCI: Also set up legacy files only after sysfs init
We are already doing this for all the regular sysfs files on PCI
devices, but not yet on the legacy io files on the PCI buses. Thus far
no problem, but in the next patch I want to wire up iomem revoke
support. That needs the vfs up and running already to make sure that
iomem_get_mapping() works.
Wire it up exactly like the existing code in
pci_create_sysfs_dev_files(). Note that pci_remove_legacy_files()
doesn't need a check since the one for pci_bus->legacy_io is
sufficient.
An alternative solution would be to implement a callback in sysfs to
set up the address space from iomem_get_mapping() when userspace calls
mmap(). This also works, but Greg didn't really like that just to work
around an ordering issue when the kernel loads initially.
v2: Improve commit message (Bjorn)
Acked-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Kees Cook <keescook@chromium.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: linux-mm@kvack.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-samsung-soc@vger.kernel.org Cc: linux-media@vger.kernel.org Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: linux-pci@vger.kernel.org Link: https://patchwork.freedesktop.org/patch/msgid/20210205133632.2827730-1-daniel.vetter@ffwll.ch
Daniel Vetter [Fri, 27 Nov 2020 16:41:25 +0000 (17:41 +0100)]
sysfs: Support zapping of binary attr mmaps
We want to be able to revoke pci mmaps so that the same access rules
applies as for /dev/kmem. Revoke support for devmem was added in 3234ac664a87 ("/dev/mem: Revoke mappings when a driver claims the
region").
The simplest way to achieve this is by having the same filp->f_mapping
for all mappings, so that unmap_mapping_range can find them all, no
matter through which file they've been created. Since this must be set
at open time we need sysfs support for this.
Add an optional mapping parameter bin_attr, which is only consulted
when there's also an mmap callback, since without mmap support
allowing to adjust the ->f_mapping makes no sense.
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Kees Cook <keescook@chromium.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Dan Williams <dan.j.williams@intel.com> Cc: linux-mm@kvack.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-samsung-soc@vger.kernel.org Cc: linux-media@vger.kernel.org Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: linux-pci@vger.kernel.org Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Sourabh Jain <sourabhjain@linux.ibm.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Cc: Nayna Jain <nayna@linux.ibm.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20201127164131.2244124-12-daniel.vetter@ffwll.ch
Daniel Vetter [Fri, 27 Nov 2020 16:41:24 +0000 (17:41 +0100)]
resource: Move devmem revoke code to resource framework
We want all iomem mmaps to consistently revoke ptes when the kernel
takes over and CONFIG_IO_STRICT_DEVMEM is enabled. This includes the
pci bar mmaps available through procfs and sysfs, which currently do
not revoke mappings.
To prepare for this, move the code from the /dev/kmem driver to
kernel/resource.c.
During review Jason spotted that barriers are used somewhat
inconsistently. Fix that up while we shuffle this code, since it
doesn't have an actual impact at runtime. Otherwise no semantic and
behavioural changes intended, just code extraction and adjusting
comments and names.
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Kees Cook <keescook@chromium.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Dan Williams <dan.j.williams@intel.com> Cc: linux-mm@kvack.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-samsung-soc@vger.kernel.org Cc: linux-media@vger.kernel.org Cc: Arnd Bergmann <arnd@arndb.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Hildenbrand <david@redhat.com> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20201127164131.2244124-11-daniel.vetter@ffwll.ch
Daniel Vetter [Fri, 27 Nov 2020 16:41:23 +0000 (17:41 +0100)]
/dev/mem: Only set filp->f_mapping
When we care about pagecache maintenance, we need to make sure that
both f_mapping and i_mapping point at the right mapping.
But for iomem mappings we only care about the virtual/pte side of
things, so f_mapping is enough. Also setting inode->i_mapping was
confusing me as a driver maintainer, since in e.g. drivers/gpu we
don't do that. Per Dan this seems to be copypasta from places which do
care about pagecache consistency, but not needed. Hence remove it for
slightly less confusion.
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Kees Cook <keescook@chromium.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Dan Williams <dan.j.williams@intel.com> Cc: linux-mm@kvack.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-samsung-soc@vger.kernel.org Cc: linux-media@vger.kernel.org Reviewed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20201127164131.2244124-10-daniel.vetter@ffwll.ch
Daniel Vetter [Fri, 27 Nov 2020 16:41:22 +0000 (17:41 +0100)]
PCI: Obey iomem restrictions for procfs mmap
There's three ways to access PCI BARs from userspace: /dev/mem, sysfs
files, and the old proc interface. Two check against
iomem_is_exclusive, proc never did. And with CONFIG_IO_STRICT_DEVMEM,
this starts to matter, since we don't want random userspace having
access to PCI BARs while a driver is loaded and using it.
Fix this by adding the same iomem_is_exclusive() check we already have
on the sysfs side in pci_mmap_resource().
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
References: 90a545e98126 ("restrict /dev/mem to idle io memory ranges") Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Kees Cook <keescook@chromium.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Dan Williams <dan.j.williams@intel.com> Cc: linux-mm@kvack.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-samsung-soc@vger.kernel.org Cc: linux-media@vger.kernel.org Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: linux-pci@vger.kernel.org Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20201127164131.2244124-9-daniel.vetter@ffwll.ch
Daniel Vetter [Fri, 27 Nov 2020 16:41:21 +0000 (17:41 +0100)]
mm: Close race in generic_access_phys
Way back it was a reasonable assumptions that iomem mappings never
change the pfn range they point at. But this has changed:
- gpu drivers dynamically manage their memory nowadays, invalidating
ptes with unmap_mapping_range when buffers get moved
- contiguous dma allocations have moved from dedicated carvetouts to
cma regions. This means if we miss the unmap the pfn might contain
pagecache or anon memory (well anything allocated with GFP_MOVEABLE)
- even /dev/mem now invalidates mappings when the kernel requests that
iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87
("/dev/mem: Revoke mappings when a driver claims the region")
Accessing pfns obtained from ptes without holding all the locks is
therefore no longer a good idea. Fix this.
Since ioremap might need to manipulate pagetables too we need to drop
the pt lock and have a retry loop if we raced.
While at it, also add kerneldoc and improve the comment for the
vma_ops->access function. It's for accessing, not for moving the
memory from iomem to system memory, as the old comment seemed to
suggest.
References: 28b2ee20c7cb ("access_process_vm device memory infrastructure") Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Kees Cook <keescook@chromium.org> Cc: Benjamin Herrensmidt <benh@kernel.crashing.org> Cc: Dave Airlie <airlied@linux.ie> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Dan Williams <dan.j.williams@intel.com> Cc: linux-mm@kvack.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-samsung-soc@vger.kernel.org Cc: linux-media@vger.kernel.org Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20201127164131.2244124-8-daniel.vetter@ffwll.ch
Daniel Vetter [Fri, 27 Nov 2020 16:41:20 +0000 (17:41 +0100)]
media: videobuf2: Move frame_vector into media subsystem
It's the only user. This also garbage collects the CONFIG_FRAME_VECTOR
symbol from all over the tree (well just one place, somehow omap media
driver still had this in its Kconfig, despite not using it).
Reviewed-by: John Hubbard <jhubbard@nvidia.com> Acked-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Acked-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Acked-by: Tomasz Figa <tfiga@chromium.org> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Pawel Osciak <pawel@osciak.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Kyungmin Park <kyungmin.park@samsung.com> Cc: Tomasz Figa <tfiga@chromium.org> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Dan Williams <dan.j.williams@intel.com> Cc: linux-mm@kvack.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-samsung-soc@vger.kernel.org Cc: linux-media@vger.kernel.org Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20201127164131.2244124-7-daniel.vetter@ffwll.ch
Daniel Vetter [Fri, 27 Nov 2020 16:41:19 +0000 (17:41 +0100)]
mm/frame-vector: Use FOLL_LONGTERM
This is used by media/videbuf2 for persistent dma mappings, not just
for a single dma operation and then freed again, so needs
FOLL_LONGTERM.
Unfortunately current pup_locked doesn't support FOLL_LONGTERM due to
locking issues. Rework the code to pull the pup path out from the
mmap_sem critical section as suggested by Jason.
By relying entirely on the vma checks in pin_user_pages and follow_pfn
(for vm_flags and vma_is_fsdax) we can also streamline the code a lot.
Note that pin_user_pages_fast is a safe replacement despite the
seeming lack of checking for vma->vm_flasg & (VM_IO | VM_PFNMAP). Such
ptes are marked with pte_mkspecial (which pup_fast rejects in the
fastpath), and only architectures supporting that support the
pin_user_pages_fast fastpath.
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Pawel Osciak <pawel@osciak.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Kyungmin Park <kyungmin.park@samsung.com> Cc: Tomasz Figa <tfiga@chromium.org> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Dan Williams <dan.j.williams@intel.com> Cc: linux-mm@kvack.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-samsung-soc@vger.kernel.org Cc: linux-media@vger.kernel.org Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20201127164131.2244124-6-daniel.vetter@ffwll.ch
Daniel Vetter [Fri, 27 Nov 2020 16:41:17 +0000 (17:41 +0100)]
misc/habana: Stop using frame_vector helpers
All we need are a pages array, pin_user_pages_fast can give us that
directly. Plus this avoids the entire raw pfn side of get_vaddr_frames.
Note that pin_user_pages_fast is a safe replacement despite the
seeming lack of checking for vma->vm_flasg & (VM_IO | VM_PFNMAP). Such
ptes are marked with pte_mkspecial (which pup_fast rejects in the
fastpath), and only architectures supporting that support the
pin_user_pages_fast fastpath.
Reviewed-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Dan Williams <dan.j.williams@intel.com> Cc: linux-mm@kvack.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-samsung-soc@vger.kernel.org Cc: linux-media@vger.kernel.org Cc: Oded Gabbay <oded.gabbay@gmail.com> Cc: Omer Shpigelman <oshpigelman@habana.ai> Cc: Ofir Bitton <obitton@habana.ai> Cc: Tomer Tayar <ttayar@habana.ai> Cc: Moti Haimovski <mhaimovski@habana.ai> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Pawel Piskorski <ppiskorski@habana.ai> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20201127164131.2244124-4-daniel.vetter@ffwll.ch
Daniel Vetter [Fri, 27 Nov 2020 16:41:15 +0000 (17:41 +0100)]
drm/exynos: Stop using frame_vector helpers
All we need are a pages array, pin_user_pages_fast can give us that
directly. Plus this avoids the entire raw pfn side of get_vaddr_frames.
Note that pin_user_pages_fast is a safe replacement despite the
seeming lack of checking for vma->vm_flasg & (VM_IO | VM_PFNMAP). Such
ptes are marked with pte_mkspecial (which pup_fast rejects in the
fastpath), and only architectures supporting that support the
pin_user_pages_fast fastpath.
Reviewed-by: John Hubbard <jhubbard@nvidia.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Inki Dae <inki.dae@samsung.com> Cc: Joonyoung Shim <jy0922.shim@samsung.com> Cc: Seung-Woo Kim <sw0312.kim@samsung.com> Cc: Kyungmin Park <kyungmin.park@samsung.com> Cc: Kukjin Kim <kgene@kernel.org> Cc: Krzysztof Kozlowski <krzk@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Dan Williams <dan.j.williams@intel.com> Cc: linux-mm@kvack.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-samsung-soc@vger.kernel.org Cc: linux-media@vger.kernel.org Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20201127164131.2244124-2-daniel.vetter@ffwll.ch
Linus Torvalds [Sun, 10 Jan 2021 21:24:55 +0000 (13:24 -0800)]
Merge tag 'kbuild-fixes-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- Search for <ncurses.h> in the default header path of HOSTCC
- Tweak the option order to be kind to old BSD awk
- Remove 'kvmconfig' and 'xenconfig' shorthands
- Fix documentation
* tag 'kbuild-fixes-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
Documentation: kbuild: Fix section reference
kconfig: remove 'kvmconfig' and 'xenconfig' shorthands
lib/raid6: Let $(UNROLL) rules work with macOS userland
kconfig: Support building mconf with vendor sysroot ncurses
kconfig: config script: add a little user help
MAINTAINERS: adjust GCC PLUGINS after gcc-plugin.sh removal
Linus Torvalds [Sun, 10 Jan 2021 21:17:21 +0000 (13:17 -0800)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"This is two driver fixes (megaraid_sas and hisi_sas).
The megaraid one is a revert of a previous revert of a cpu hotplug fix
which exposed a bug in the block layer which has been fixed in this
merge window.
The hisi_sas performance enhancement comes from switching to interrupt
managed completion queues, which depended on the addition of
devm_platform_get_irqs_affinity() which is now upstream via the irq
tree in the last merge window"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: hisi_sas: Expose HW queues for v2 hw
Revert "Revert "scsi: megaraid_sas: Added support for shared host tagset for cpuhotplug""
Linus Torvalds [Sun, 10 Jan 2021 20:53:08 +0000 (12:53 -0800)]
Merge tag 'block-5.11-2021-01-10' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
- Missing CRC32 selections (Arnd)
- Fix for a merge window regression with bdev inode init (Christoph)
- bcache fixes
- rnbd fixes
- NVMe pull request from Christoph:
- fix a race in the nvme-tcp send code (Sagi Grimberg)
- fix a list corruption in an nvme-rdma error path (Israel Rukshin)
- avoid a possible double fetch in nvme-pci (Lalithambika Krishnakumar)
- add the susystem NQN quirk for a Samsung driver (Gopal Tiwari)
- fix two compiler warnings in nvme-fcloop (James Smart)
- don't call sleeping functions from irq context in nvme-fc (James Smart)
- remove an unused argument (Max Gurtovoy)
- remove unused exports (Minwoo Im)
- Use-after-free fix for partition iteration (Ming)
- Missing blk-mq debugfs flag annotation (John)
- Bdev freeze regression fix (Satya)
- blk-iocost NULL pointer deref fix (Tejun)
* tag 'block-5.11-2021-01-10' of git://git.kernel.dk/linux-block: (26 commits)
bcache: set bcache device into read-only mode for BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET
bcache: introduce BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE for large bucket
bcache: check unsupported feature sets for bcache register
bcache: fix typo from SUUP to SUPP in features.h
bcache: set pdev_set_uuid before scond loop iteration
blk-mq-debugfs: Add decode for BLK_MQ_F_TAG_HCTX_SHARED
block/rnbd-clt: avoid module unload race with close confirmation
block/rnbd: Adding name to the Contributors List
block/rnbd-clt: Fix sg table use after free
block/rnbd-srv: Fix use after free in rnbd_srv_sess_dev_force_close
block/rnbd: Select SG_POOL for RNBD_CLIENT
block: pre-initialize struct block_device in bdev_alloc_inode
fs: Fix freeze_bdev()/thaw_bdev() accounting of bd_fsfreeze_sb
nvme: remove the unused status argument from nvme_trace_bio_complete
nvmet-rdma: Fix list_del corruption on queue establishment failure
nvme: unexport functions with no external caller
nvme: avoid possible double fetch in handling CQE
nvme-tcp: Fix possible race of io_work and direct send
nvme-pci: mark Samsung PM1725a as IGNORE_DEV_SUBNQN
nvme-fcloop: Fix sscanf type and list_first_entry_or_null warnings
...
Linus Torvalds [Sun, 10 Jan 2021 20:39:38 +0000 (12:39 -0800)]
Merge tag 'io_uring-5.11-2021-01-10' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
"A bit larger than I had hoped at this point, but it's all changes that
will be directed towards stable anyway. In detail:
- Fix a merge window regression on error return (Matthew)
- Ensure SQPOLL is synchronized with creator life time (Pavel)"
* tag 'io_uring-5.11-2021-01-10' of git://git.kernel.dk/linux-block:
io_uring: stop SQPOLL submit on creator's death
io_uring: add warn_once for io_uring_flush()
io_uring: inline io_uring_attempt_task_drop()
io_uring: io_rw_reissue lockdep annotations
io_uring: synchronise ev_posted() with waitqueues
io_uring: dont kill fasync under completion_lock
io_uring: trigger eventfd for IOPOLL
io_uring: Fix return value from alloc_fixed_file_ref_node
io_uring: Delete useless variable ‘id’ in io_prep_async_work
io_uring: cancel more aggressively in exit_work
io_uring: drop file refs after task cancel
io_uring: patch up IOPOLL overflow_flush sync
io_uring: synchronise IOPOLL on task_submit fail
Linus Torvalds [Sun, 10 Jan 2021 20:28:07 +0000 (12:28 -0800)]
Merge tag 'staging-5.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging driver fixes from Greg KH:
"Here are some small staging driver fixes for 5.11-rc3. Nothing major,
just resolving some reported issues:
- cleanup some remaining mentions of the ION drivers that were
removed in 5.11-rc1
- comedi driver bugfix
- two error path memory leak fixes
All have been in linux-next for a while with no reported issues"
* tag 'staging-5.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: ION: remove some references to CONFIG_ION
staging: mt7621-dma: Fix a resource leak in an error handling path
Staging: comedi: Return -EFAULT if copy_to_user() fails
staging: spmi: hisi-spmi-controller: Fix some error handling paths
Linus Torvalds [Sun, 10 Jan 2021 20:24:33 +0000 (12:24 -0800)]
Merge tag 'char-misc-5.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg KH:
"Here are some small char and misc driver fixes for 5.11-rc3.
The majority here are fixes for the habanalabs drivers, but also in
here are:
- crypto driver fix
- pvpanic driver fix
- updated font file
- interconnect driver fixes
All of these have been in linux-next for a while with no reported
issues"
* tag 'char-misc-5.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (26 commits)
Fonts: font_ter16x32: Update font with new upstream Terminus release
misc: pvpanic: Check devm_ioport_map() for NULL
speakup: Add github repository URL and bug tracker
MAINTAINERS: Update Georgi's email address
crypto: asym_tpm: correct zero out potential secrets
habanalabs: Fix memleak in hl_device_reset
interconnect: imx8mq: Use icc_sync_state
interconnect: imx: Remove a useless test
interconnect: imx: Add a missing of_node_put after of_device_is_available
interconnect: qcom: fix rpmh link failures
habanalabs: fix order of status check
habanalabs: register to pci shutdown callback
habanalabs: add validation cs counter, fix misplaced counters
habanalabs/gaudi: retry loading TPC f/w on -EINTR
habanalabs: adjust pci controller init to new firmware
habanalabs: update comment in hl_boot_if.h
habanalabs/gaudi: enhance reset message
habanalabs: full FW hard reset support
habanalabs/gaudi: disable CGM at HW initialization
habanalabs: Revise comment to align with mirror list name
...
Linus Torvalds [Sun, 10 Jan 2021 20:00:26 +0000 (12:00 -0800)]
Merge tag 'arc-5.11-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc
Pull ARC fixes from Vineet Gupta:
- Address the 2nd boot failure due to snafu in signal handling code
(first was generic console ttynull issue)
- misc other fixes
* tag 'arc-5.11-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
ARC: [hsdk]: Enable FPU_SAVE_RESTORE
ARC: unbork 5.11 bootup: fix snafu in _TIF_NOTIFY_SIGNAL handling
include/soc: remove headers for EZChip NPS
arch/arc: add copy_user_page() to <asm/page.h> to fix build error on ARC
Linus Torvalds [Sun, 10 Jan 2021 19:34:33 +0000 (11:34 -0800)]
Merge tag 'powerpc-5.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- A fix for machine check handling with VMAP stack on 32-bit.
- A clang build fix.
Thanks to Christophe Leroy and Nathan Chancellor.
* tag 'powerpc-5.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc: Handle .text.{hot,unlikely}.* in linker script
powerpc/32s: Fix RTAS machine check with VMAP stack
Linus Torvalds [Sun, 10 Jan 2021 19:31:17 +0000 (11:31 -0800)]
Merge tag 'x86_urgent_for_v5.11_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
"As expected, fixes started trickling in after the holidays so here is
the accumulated pile of x86 fixes for 5.11:
- A fix for fanotify_mark() missing the conversion of x86_32 native
syscalls which take 64-bit arguments to the compat handlers due to
former having a general compat handler. (Brian Gerst)
- Add a forgotten pmd page destructor call to pud_free_pmd_page()
where a pmd page is freed. (Dan Williams)
- Make IN/OUT insns with an u8 immediate port operand handling for
SEV-ES guests more precise by using only the single port byte and
not the whole s32 value of the insn decoder. (Peter Gonda)
- Correct a straddling end range check before returning the proper
MTRR type, when the end address is the same as top of memory.
(Ying-Tsun Huang)
- Change PQR_ASSOC MSR update scheme when moving a task to a resctrl
resource group to avoid significant performance overhead with some
resctrl workloads. (Fenghua Yu)
- Avoid the actual task move overhead when the task is already in the
resource group. (Fenghua Yu)"
* tag 'x86_urgent_for_v5.11_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/resctrl: Don't move a task to the same resource group
x86/resctrl: Use an IPI instead of task_work_add() to update PQR_ASSOC MSR
x86/mtrr: Correct the range check before performing MTRR type lookups
x86/sev-es: Fix SEV-ES OUT/IN immediate opcode vc handling
x86/mm: Fix leak of pmd ptlock
fanotify: Fix sys_fanotify_mark() on native x86-32
Linus Torvalds [Sat, 9 Jan 2021 19:22:30 +0000 (11:22 -0800)]
Merge tag 'hwmon-for-v5.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck:
- Fix possible KASAN issue in amd_energy driver
- Avoid configuration problem in pwm-fan driver
- Fix kernel-doc warning in sbtsi_temp documentation
* tag 'hwmon-for-v5.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (amd_energy) fix allocation of hwmon_channel_info config
hwmon: (pwm-fan) Ensure that calculation doesn't discard big period values
hwmon: (sbtsi_temp) Fix Documenation kernel-doc warning
Linus Torvalds [Sat, 9 Jan 2021 19:18:02 +0000 (11:18 -0800)]
Merge tag 'dmaengine-fix-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine
Pull dmaengine fixes from Vinod Koul:
"A bunch of dmaengine driver fixes for:
- coverity discovered issues for xilinx driver
- qcom, gpi driver fix for undefined bhaviour and one off cleanup
- update Peter's email for TI DMA drivers
- one-off for idxd driver
- resource leak fix for mediatek and milbeaut drivers"
* tag 'dmaengine-fix-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine:
dmaengine: stm32-mdma: fix STM32_MDMA_VERY_HIGH_PRIORITY value
dmaengine: xilinx_dma: fix mixed_enum_type coverity warning
dmaengine: xilinx_dma: fix incompatible param warning in _child_probe()
dmaengine: xilinx_dma: check dma_async_device_register return value
dmaengine: qcom: fix gpi undefined behavior
dt-bindings: dma: ti: Update maintainer and author information
MAINTAINERS: Add entry for Texas Instruments DMA drivers
qcom: bam_dma: Delete useless kfree code
dmaengine: dw-edma: Fix use after free in dw_edma_alloc_chunk()
dmaengine: milbeaut-xdmac: Fix a resource leak in the error handling path of the probe function
dmaengine: mediatek: mtk-hsdma: Fix a resource leak in the error handling path of the probe function
dmaengine: qcom: gpi: Fixes a format mismatch
dmaengine: idxd: off by one in cleanup code
dmaengine: ti: k3-udma: Fix pktdma rchan TPL level setup
Linus Torvalds [Sat, 9 Jan 2021 19:04:48 +0000 (11:04 -0800)]
Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"Three driver bugfixes for I2C. Buisness as usual"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: mediatek: Fix apdma and i2c hand-shake timeout
i2c: i801: Fix the i2c-mux gpiod_lookup_table not being properly terminated
i2c: sprd: use a specific timeout to avoid system hang up issue
Darrick J. Wong [Sat, 9 Jan 2021 06:46:02 +0000 (22:46 -0800)]
maintainers: update my email address
Change my email contact ahead of a likely painful eleven-month migration
to a certain cobalt enteprisey groupware cloud product that will totally
break my workflow. Some day I may get used to having to email being
sequestered behind both claret and cerulean oath2+sms 2fa layers, but
for now I'll stick with keying in one password to receive an email vs.
the required four.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pavel Begunkov [Fri, 8 Jan 2021 20:57:25 +0000 (20:57 +0000)]
io_uring: stop SQPOLL submit on creator's death
When the creator of SQPOLL io_uring dies (i.e. sqo_task), we don't want
its internals like ->files and ->mm to be poked by the SQPOLL task, it
have never been nice and recently got racy. That can happen when the
owner undergoes destruction and SQPOLL tasks tries to submit new
requests in parallel, and so calls io_sq_thread_acquire*().
That patch halts SQPOLL submissions when sqo_task dies by introducing
sqo_dead flag. Once set, the SQPOLL task must not do any submission,
which is synchronised by uring_lock as well as the new flag.
The tricky part is to make sure that disabling always happens, that
means either the ring is discovered by creator's do_exit() -> cancel,
or if the final close() happens before it's done by the creator. The
last is guaranteed by the fact that for SQPOLL the creator task and only
it holds exactly one file note, so either it pins up to do_exit() or
removed by the creator on the final put in flush. (see comments in
uring_flush() around file->f_count == 2).
One more place that can trigger io_sq_thread_acquire_*() is
__io_req_task_submit(). Shoot off requests on sqo_dead there, even
though actually we don't need to. That's because cancellation of
sqo_task should wait for the request before going any further.
note 1: io_disable_sqo_submit() does io_ring_set_wakeup_flag() so the
caller would enter the ring to get an error, but it still doesn't
guarantee that the flag won't be cleared.
note 2: if final __userspace__ close happens not from the creator
task, the file note will pin the ring until the task dies.
Fixed: b1b6b5a30dce8 ("kernel/io_uring: cancel io_uring before task works") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pavel Begunkov [Fri, 8 Jan 2021 20:57:24 +0000 (20:57 +0000)]
io_uring: add warn_once for io_uring_flush()
files_cancel() should cancel all relevant requests and drop file notes,
so we should never have file notes after that, including on-exit fput
and flush. Add a WARN_ONCE to be sure.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Coly Li [Mon, 4 Jan 2021 07:41:22 +0000 (15:41 +0800)]
bcache: set bcache device into read-only mode for BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET
If BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET is set in incompat feature
set, it means the cache device is created with obsoleted layout with
obso_bucket_site_hi. Now bcache does not support this feature bit, a new
BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE incompat feature bit is added
for a better layout to support large bucket size.
For the legacy compatibility purpose, if a cache device created with
obsoleted BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET feature bit, all bcache
devices attached to this cache set should be set to read-only. Then the
dirty data can be written back to backing device before re-create the
cache device with BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE feature bit
by the latest bcache-tools.
This patch checks BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET feature bit
when running a cache set and attach a bcache device to the cache set. If
this bit is set,
- When run a cache set, print an error kernel message to indicate all
following attached bcache device will be read-only.
- When attach a bcache device, print an error kernel message to indicate
the attached bcache device will be read-only, and ask users to update
to latest bcache-tools.
Such change is only for cache device whose bucket size >= 32MB, this is
for the zoned SSD and almost nobody uses such large bucket size at this
moment. If you don't explicit set a large bucket size for a zoned SSD,
such change is totally transparent to your bcache device.
Fixes: ffa470327572 ("bcache: add bucket_size_hi into struct cache_sb_disk for large bucket") Signed-off-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Coly Li [Mon, 4 Jan 2021 07:41:21 +0000 (15:41 +0800)]
bcache: introduce BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE for large bucket
When large bucket feature was added, BCH_FEATURE_INCOMPAT_LARGE_BUCKET
was introduced into the incompat feature set. It used bucket_size_hi
(which was added at the tail of struct cache_sb_disk) to extend current
16bit bucket size to 32bit with existing bucket_size in struct
cache_sb_disk.
This is not a good idea, there are two obvious problems,
- Bucket size is always value power of 2, if store log2(bucket size) in
existing bucket_size of struct cache_sb_disk, it is unnecessary to add
bucket_size_hi.
- Macro csum_set() assumes d[SB_JOURNAL_BUCKETS] is the last member in
struct cache_sb_disk, bucket_size_hi was added after d[] which makes
csum_set calculate an unexpected super block checksum.
To fix the above problems, this patch introduces a new incompat feature
bit BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE, when this bit is set, it
means bucket_size in struct cache_sb_disk stores the order of power-of-2
bucket size value. When user specifies a bucket size larger than 32768
sectors, BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE will be set to
incompat feature set, and bucket_size stores log2(bucket size) more
than store the real bucket size value.
The obsoleted BCH_FEATURE_INCOMPAT_LARGE_BUCKET won't be used anymore,
it is renamed to BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET and still only
recognized by kernel driver for legacy compatible purpose. The previous
bucket_size_hi is renmaed to obso_bucket_size_hi in struct cache_sb_disk
and not used in bcache-tools anymore.
For cache device created with BCH_FEATURE_INCOMPAT_LARGE_BUCKET feature,
bcache-tools and kernel driver still recognize the feature string and
display it as "obso_large_bucket".
With this change, the unnecessary extra space extend of bcache on-disk
super block can be avoided, and csum_set() may generate expected check
sum as well.
Fixes: ffa470327572 ("bcache: add bucket_size_hi into struct cache_sb_disk for large bucket") Signed-off-by: Coly Li <colyli@suse.de> Cc: stable@vger.kernel.org # 5.9+ Signed-off-by: Jens Axboe <axboe@kernel.dk>
Coly Li [Mon, 4 Jan 2021 07:41:20 +0000 (15:41 +0800)]
bcache: check unsupported feature sets for bcache register
This patch adds the check for features which is incompatible for
current supported feature sets.
Now if the bcache device created by bcache-tools has features that
current kernel doesn't support, read_super() will fail with error
messoage. E.g. if an unsupported incompatible feature detected,
bcache register will fail with dmesg "bcache: register_bcache() error :
Unsupported incompatible feature found".
Fixes: d721a43ff69c ("bcache: increase super block version for cache device and backing device") Fixes: ffa470327572 ("bcache: add bucket_size_hi into struct cache_sb_disk for large bucket") Signed-off-by: Coly Li <colyli@suse.de> Cc: stable@vger.kernel.org # 5.9+ Signed-off-by: Jens Axboe <axboe@kernel.dk>
Coly Li [Mon, 4 Jan 2021 07:41:19 +0000 (15:41 +0800)]
bcache: fix typo from SUUP to SUPP in features.h
This patch fixes the following typos,
from BCH_FEATURE_COMPAT_SUUP to BCH_FEATURE_COMPAT_SUPP
from BCH_FEATURE_INCOMPAT_SUUP to BCH_FEATURE_INCOMPAT_SUPP
from BCH_FEATURE_INCOMPAT_SUUP to BCH_FEATURE_RO_COMPAT_SUPP
Fixes: d721a43ff69c ("bcache: increase super block version for cache device and backing device") Fixes: ffa470327572 ("bcache: add bucket_size_hi into struct cache_sb_disk for large bucket") Signed-off-by: Coly Li <colyli@suse.de> Cc: stable@vger.kernel.org # 5.9+ Signed-off-by: Jens Axboe <axboe@kernel.dk>
Linus Torvalds [Sat, 9 Jan 2021 01:18:50 +0000 (17:18 -0800)]
Merge tag 'linux-kselftest-kunit-fixes-5.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kunit fixes from Shuah Khan:
"One fix to force the use of the 'tty' console for UML.
Given that kunit tool requires the console output, explicitly stating
the dependency makes sense than relying on it being the default"
* tag 'linux-kselftest-kunit-fixes-5.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
kunit: tool: Force the use of the 'tty' console for UML
Linus Torvalds [Sat, 9 Jan 2021 01:13:52 +0000 (17:13 -0800)]
Merge tag 'linux-kselftest-next-5.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kselftest fixes from Shuah Khan:
"Two minor fixes to vDSO test changes in this merge window"
* tag 'linux-kselftest-next-5.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests/vDSO: fix -Wformat warning in vdso_test_correctness
selftests/vDSO: add additional binaries to .gitignore
Linus Torvalds [Sat, 9 Jan 2021 01:01:28 +0000 (17:01 -0800)]
Merge tag 'docs-5.11-3' of git://git.lwn.net/linux
Pull documentation fixes from Jonathan Corbet:
"A handful of relatively small documentation fixes"
* tag 'docs-5.11-3' of git://git.lwn.net/linux:
docs: admin-guide: bootconfig: Fix feils to fails
Documentation/admin-guide: kernel-parameters: hyphenate comma-separated
docs: binfmt-misc: Fix .rst formatting
docs: remove mention of ENABLE_MUST_CHECK
atomic: remove further references to atomic_ops
Documentation: doc-guide: fixes to sphinx.rst
docs/mm: concepts.rst: Correct the threshold to low watermark
Documentation: admin: early_param()s are also listed in kernel-parameters
docs: Fix reST markup when linking to sections
Linus Torvalds [Fri, 8 Jan 2021 23:45:47 +0000 (15:45 -0800)]
Merge tag 'devprop-5.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull device properties framework fixes from Rafael Wysocki:
"Revert a problematic commit that went in during the 5.10 cycle and
improve the kerneldoc description of the function affected by it (both
changes from Bard Liao)"
* tag 'devprop-5.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
device property: add description of fwnode cases
Revert "device property: Keep secondary firmware node secondary by type"
Linus Torvalds [Fri, 8 Jan 2021 23:42:56 +0000 (15:42 -0800)]
Merge tag 'acpi-5.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"These address two build issues and drop confusing text from a couple
of Kconfig entries.
Specifics:
- Drop two local variables that are never read and the code updating
their values from the x86 suspend-to-idle code (Rafael Wysocki)
- Add empty stub of an ACPI helper function to avoid build issues
when CONFIG_ACPI is not set (Shawn Guo)
- Remove confusing text regarding modules from Kconfig entries that
correspond to non-modular code (Peter Robinson)"
* tag 'acpi-5.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: Update Kconfig help text for items that are no longer modular
ACPI: scan: add stub acpi_create_platform_device() for !CONFIG_ACPI
ACPI: PM: s2idle: Drop unused local variables and related code
Linus Torvalds [Fri, 8 Jan 2021 23:39:25 +0000 (15:39 -0800)]
Merge tag 'pm-5.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These address two issues in the intel_pstate driver and one in the
powernow-k8 cpufreq driver.
Specifics:
- Make the powernow-k8 cpufreq driver avoid calling
cpufreq_cpu_get(), which theoretically may return NULL, to get a
policy pointer that is known to it already (Colin Ian King)
- Drop two functions that are not used any more from the intel_pstate
driver (Lukas Bulwahn)
- Make intel_pstate check the HWP capabilities to get the maximum
available P-state in the passive mode to avoid using a stale value
of it in case of out-of-band updates (Rafael Wysocki)"
* tag 'pm-5.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: intel_pstate: remove obsolete functions
cpufreq: powernow-k8: pass policy rather than use cpufreq_cpu_get()
cpufreq: intel_pstate: Use HWP capabilities in intel_cpufreq_adjust_perf()
Linus Torvalds [Fri, 8 Jan 2021 23:12:08 +0000 (15:12 -0800)]
Merge tag 'drm-fixes-2021-01-08' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Daniel Vetter:
"Looks like people are back from the break, usual small pile of fixes
all over. Next week Dave should be back.
The only thing pending I'm aware of is a "this shouldn't have become
uapi" reverts for amdgpu, but they're already on the list and not that
important really so can wait another week.
Summary:
- fix for ttm list corruption in radeon, reported by a few people
- fixes for amdgpu, i915, msm
- dma-buf use-after free fix"
* tag 'drm-fixes-2021-01-08' of git://anongit.freedesktop.org/drm/drm: (29 commits)
drm/msm: Only enable A6xx LLCC code on A6xx
drm/msm: Add modparam to allow vram carveout
drm/msm: Call msm_init_vram before binding the gpu
drm/msm/dp: postpone irq_hpd event during connection pending state
drm/ttm: unexport ttm_pool_init/fini
drm/radeon: stop re-init the TTM page pool
dmabuf: fix use-after-free of dmabuf's file->f_inode
Revert "drm/amd/display: Fix memory leaks in S3 resume"
drm/amdgpu/display: drop DCN support for aarch64
drm/amdgpu: enable ras eeprom support for sienna cichlid
drm/amdgpu: fix no bad_pages issue after umc ue injection
drm/amdgpu: fix potential memory leak during navi12 deinitialization
drm/amd/display: Fix unused variable warning
drm/amd/pm: improve the fine grain tuning function for RV/RV2/PCO
drm/amd/pm: fix the failure when change power profile for renoir
drm/amdgpu: fix a GPU hang issue when remove device
drm/amdgpu: fix a memory protection fault when remove amdgpu device
drm/amdgpu: switched to cached noretry setting for vangogh
drm/amd/display: fix sysfs amdgpu_current_backlight_pwm NULL pointer issue
drm/amd/pm: updated PM to I2C controller port on sienna cichlid
...
Linus Torvalds [Fri, 8 Jan 2021 23:06:02 +0000 (15:06 -0800)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"x86:
- Fixes for the new scalable MMU
- Fixes for migration of nested hypervisors on AMD
- Fix for clang integrated assembler
- Fix for left shift by 64 (UBSAN)
- Small cleanups
- Straggler SEV-ES patch
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (38 commits)
KVM: x86: __kvm_vcpu_halt can be static
KVM: SVM: Add support for booting APs in an SEV-ES guest
KVM: nSVM: cancel KVM_REQ_GET_NESTED_STATE_PAGES on nested vmexit
KVM: nSVM: mark vmcb as dirty when forcingly leaving the guest mode
KVM: nSVM: correctly restore nested_run_pending on migration
KVM: x86/mmu: Clarify TDP MMU page list invariants
KVM: x86/mmu: Ensure TDP MMU roots are freed after yield
kvm: check tlbs_dirty directly
KVM: x86: change in pv_eoi_get_pending() to make code more readable
MAINTAINERS: Really update email address for Sean Christopherson
KVM: x86: fix shift out of bounds reported by UBSAN
KVM: selftests: Implement perf_test_util more conventionally
KVM: selftests: Use vm_create_with_vcpus in create_vm
KVM: selftests: Factor out guest mode code
KVM/SVM: Remove leftover __svm_vcpu_run prototype from svm.c
KVM: SVM: Add register operand to vmsave call in sev_es_vcpu_load
KVM: x86/mmu: Optimize not-present/MMIO SPTE check in get_mmio_spte()
KVM: x86/mmu: Use raw level to index into MMIO walks' sptes array
KVM: x86/mmu: Get root level from walkers when retrieving MMIO SPTE
KVM: x86/mmu: Use -1 to flag an undefined spte in get_mmio_spte()
...
Linus Torvalds [Fri, 8 Jan 2021 22:55:41 +0000 (14:55 -0800)]
Merge tag 'iommu-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull iommu fixes from Will Deacon:
"This is mainly all Intel VT-D stuff, but there are some fixes for AMD
and ARM as well.
We've also got the revert I promised during the merge window, which
removes a temporary hack to accomodate i915 while we transitioned the
Intel IOMMU driver over to the common DMA-IOMMU API.
Finally, there are still a couple of other VT-D fixes floating around,
so I expect to send you another batch of fixes next week.
Summary:
- Fix VT-D TLB invalidation for subdevices
- Fix VT-D use-after-free on subdevice detach
- Fix VT-D locking so that IRQs are disabled during SVA bind/unbind
- Fix VT-D address alignment when flushing IOTLB
- Fix memory leak in VT-D IRQ remapping failure path
- Revert temporary i915 sglist hack now that it is no longer required
- Fix sporadic boot failure with Arm SMMU on Qualcomm SM8150
- Fix NULL dereference in AMD IRQ remapping code with remapping disabled
- Fix accidental enabling of irqs on AMD resume-from-suspend path
- Fix some typos in comments"
* tag 'iommu-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
iommu/vt-d: Fix ineffective devTLB invalidation for subdevices
iommu/vt-d: Fix general protection fault in aux_detach_device()
iommu/vt-d: Move intel_iommu info from struct intel_svm to struct intel_svm_dev
iommu/arm-smmu-qcom: Initialize SCTLR of the bypass context
iommu/vt-d: Fix lockdep splat in sva bind()/unbind()
Revert "iommu: Add quirk for Intel graphic devices in map_sg"
iommu/vt-d: Fix misuse of ALIGN in qi_flush_piotlb()
iommu/amd: Stop irq_remapping_select() matching when remapping is disabled
iommu/amd: Set iommu->int_enabled consistently when interrupts are set up
iommu/intel: Fix memleak in intel_irq_remapping_alloc
iommu/iova: fix 'domain' typos
Linus Torvalds [Fri, 8 Jan 2021 22:13:54 +0000 (14:13 -0800)]
Merge tag 'arm-fixes-5.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC fixes from Arnd Bergmann:
"These are a small number of bug fixes that all came in before or
during the merge window, most for the omap platform:
- One boot regression fix for Nokia N9 (OMAP3).
- Two small defconfig changes for omap2, to reflect changes in
drivers
- Warning fixes for DT issues on omap2, picoxcell and bitmap SoCs.
The picoxcell platform will be removed in v5.12, but fixing it
first makes it easier to backport to the fix to stable kernels and
get a clean build with new dtc versions"
* tag 'arm-fixes-5.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
ARM: picoxcell: fix missing interrupt-parent properties
ARM: dts: ux500/golden: Set display max brightness
arm64: dts: bitmain: Use generic "ngpios" rather than "snps,nr-gpios"
ARM: omap2: pmic-cpcap: fix maximum voltage to be consistent with defaults on xt875
ARM: omap2plus_defconfig: enable SPI GPIO
ARM: OMAP2+: omap_device: fix idling of devices during probe
ARM: dts: OMAP3: disable AES on N950/N9
ARM: omap2plus_defconfig: drop unused POWER_AVS option
Linus Torvalds [Fri, 8 Jan 2021 22:11:34 +0000 (14:11 -0800)]
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
- Clean-ups following the merging window: remove unused variable,
duplicate includes, superfluous barrier, move some inline asm to
separate functions.
- Disable top-byte-ignore on kernel code addresses with KASAN/MTE
enabled (already done when MTE is disabled).
- Fix ARCH_LOW_ADDRESS_LIMIT definition with CONFIG_ZONE_DMA disabled.
- Compiler/linker flags: link with "-z norelno", discard .eh_frame_hdr
instead of --no-eh-frame-hdr.
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: Move PSTATE.TCO setting to separate functions
arm64: kasan: Set TCR_EL1.TBID1 when KASAN_HW_TAGS is enabled
arm64: vdso: disable .eh_frame_hdr via /DISCARD/ instead of --no-eh-frame-hdr
arm64: traps: remove duplicate include statement
arm64: link with -z norelro for LLD or aarch64-elf
arm64: mm: Fix ARCH_LOW_ADDRESS_LIMIT when !CONFIG_ZONE_DMA
arm64: mte: remove an ISB on kernel exit
arm64/smp: Remove unused irq variable in arch_show_interrupts()
Linus Torvalds [Fri, 8 Jan 2021 20:12:30 +0000 (12:12 -0800)]
Merge tag 'net-5.11-rc3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull more networking fixes from Jakub Kicinski:
"Slightly lighter pull request to get back into the Thursday cadence.
Current release - always broken:
- can: mcp251xfd: fix Tx/Rx ring buffer driver race conditions
- dsa: hellcreek: fix led_classdev build errors
Previous releases - regressions:
- ipv6: fib: flush exceptions when purging route to avoid netdev
reference leak
- ip_tunnels: fix pmtu check in nopmtudisc mode
- ip: always refragment ip defragmented packets to avoid MTU issues
when forwarding through tunnels, correct "packet too big" message
is prohibitively tricky to generate
- s390/qeth: fix locking for discipline setup / removal and during
recovery to prevent both deadlocks and races
- mlx5: Use port_num 1 instead of 0 when delete a RoCE address
Previous releases - always broken:
- cdc_ncm: correct overhead calculation in delayed_ndp_size to
prevent out of bound accesses with Huawei 909s-120 LTE module
- fix stmmac dwmac-sun8i suspend/resume:
- PHY being left powered off
- MAC syscon configuration being reset
- reference to the reset controller being improperly dropped
- qrtr: fix null-ptr-deref in qrtr_ns_remove
- can: tcan4x5x: fix bittiming const, use common bittiming from m_can
driver
- mlx5e: CT: Use per flow counter when CT flow accounting is enabled
- mlx5e: Fix SWP offsets when vlan inserted by driver
Misc:
- bpf: Fix a task_iter bug caused by a bpf -> net merge conflict
resolution
And the usual many fixes to various error paths"
* tag 'net-5.11-rc3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (69 commits)
net: dsa: lantiq_gswip: Exclude RMII from modes that report 1 GbE
s390/qeth: fix L2 header access in qeth_l3_osa_features_check()
s390/qeth: fix locking for discipline setup / removal
s390/qeth: fix deadlock during recovery
selftests: fib_nexthops: Fix wrong mausezahn invocation
nexthop: Bounce NHA_GATEWAY in FDB nexthop groups
nexthop: Unlink nexthop group entry in error path
nexthop: Fix off-by-one error in error path
octeontx2-af: fix memory leak of lmac and lmac->name
chtls: Fix chtls resources release sequence
chtls: Added a check to avoid NULL pointer dereference
chtls: Replace skb_dequeue with skb_peek
chtls: Avoid unnecessary freeing of oreq pointer
chtls: Fix panic when route to peer not configured
chtls: Remove invalid set_tcb call
chtls: Fix hardware tid leak
net: ip: always refragment ip defragmented packets
net: fix pmtu check in nopmtudisc mode
selftests: netfilter: add selftest for ipip pmtu discovery with enabled connection tracking
docs: octeontx2: tune rst markup
...
Linus Torvalds [Thu, 7 Jan 2021 17:43:54 +0000 (09:43 -0800)]
poll: fix performance regression due to out-of-line __put_user()
The kernel test robot reported a -5.8% performance regression on the
"poll2" test of will-it-scale, and bisected it to commit d55564cfc222
("x86: Make __put_user() generate an out-of-line call").
I didn't expect an out-of-line __put_user() to matter, because no normal
core code should use that non-checking legacy version of user access any
more. But I had overlooked the very odd poll() usage, which does a
__put_user() to update the 'revents' values of the poll array.
Now, Al Viro correctly points out that instead of updating just the
'revents' field, it would be much simpler to just copy the _whole_
pollfd entry, and then we could just use "copy_to_user()" on the whole
array of entries, the same way we use "copy_from_user()" a few lines
earlier to get the original values.
But that is not what we've traditionally done, and I worry that threaded
applications might be concurrently modifying the other fields of the
pollfd array. So while Al's suggestion is simpler - and perhaps worth
trying in the future - this instead keeps the "just update revents"
model.
To fix the performance regression, use the modern "unsafe_put_user()"
instead of __put_user(), with the proper "user_write_access_begin()"
guarding in place. This improves code generation enormously.
Link: https://lore.kernel.org/lkml/20210107134723.GA28532@xsang-OptiPlex-9020/ Reported-by: kernel test robot <oliver.sang@intel.com> Tested-by: Oliver Sang <oliver.sang@intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: David Laight <David.Laight@aculab.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The commit caused that ttynull was used as the default console
on several systems[1][2][3]. As a result, the console was
blank even when a better alternative existed.
It happened when there was no console configured
on the command line and ttynull_init() was the first initcall
calling register_console().
Or it happened when /dev/ did not exist when console_on_rootfs()
was called. It was not able to open /dev/console even though
a console driver was registered. It tried to add ttynull console
but it obviously did not help. But ttynull became the preferred
console and was used by /dev/console when it was available later.
The commit tried to fix a historical problem that have been there
for ages. The primary motivation was the commit 3cffa06aeef7ece30f6
("printk/console: Allow to disable console output by using console=""
or console=null"). It provided a clean solution for a workaround
that was widely used and worked only by chance.
This revert causes that the console="" or console=null command line
options will again work only by chance. These options will cause that
a particular console will be preferred and the default (tty) ones
will not get enabled. There will be no console registered at
all. As a result there won't be stdin, stdout, and stderr for
the init process. But it worked exactly this way even before.
The proper solution has to fulfill many conditions:
+ Register ttynull only when explicitly required or as
the ultimate fallback.
+ ttynull should get associated with /dev/console but it must
not become preferred console when used as a fallback.
Especially, it must still be possible to replace it
by a better console later.
Such a change requires clean up of the register_console() code.
Otherwise, it would be even harder to follow. Especially, the use
of has_preferred_console and CON_CONSDEV flag is tricky. The clean
up is risky. The ordering of consoles is not well defined. And
any changes tend to break existing user settings.
Do the revert at the least risky solution for now.
David Arcari [Thu, 7 Jan 2021 14:47:07 +0000 (09:47 -0500)]
hwmon: (amd_energy) fix allocation of hwmon_channel_info config
hwmon, specifically hwmon_num_channel_attrs, expects the config
array in the hwmon_channel_info structure to be terminated by
a zero entry. amd_energy does not honor this convention. As
result, a KASAN warning is possible. Fix this by adding an
additional entry and setting it to zero.
Fixes: 8abee9566b7e ("hwmon: Add amd_energy driver to report energy counters") Signed-off-by: David Arcari <darcari@redhat.com> Cc: Naveen Krishna Chatradhi <nchatrad@amd.com> Cc: Jean Delvare <jdelvare@suse.com> Cc: Guenter Roeck <linux@roeck-us.net> Cc: linux-kernel@vger.kernel.org Cc: stable@vger.kernel.org Signed-off-by: David Arcari <darcari@redhat.com> Acked-by: Naveen Krishna Chatradhi <nchatrad@amd.com> Link: https://lore.kernel.org/r/20210107144707.6927-1-darcari@redhat.com Signed-off-by: Guenter Roeck <linux@roeck-us.net>
The problem is both code path try to close same session, which lead to
panic.
To fix it, just skip the sess if the refcount already drop to 0.
Fixes: f7a7a5c228d4 ("block/rnbd: client: main functionality") Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Reviewed-by: Gioh Kim <gi-oh.kim@cloud.ionos.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Swapnil Ingle [Fri, 8 Jan 2021 14:36:33 +0000 (15:36 +0100)]
block/rnbd: Adding name to the Contributors List
Adding name to the Contributors List
Signed-off-by: Swapnil Ingle <ingleswapnil@gmail.com> Acked-by: Jack Wang <jinpu.wang@cloud.ionos.com> Acked-by: Danil Kipnis <danil.kipnis@cloud.ionos.com> Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Guoqing Jiang [Fri, 8 Jan 2021 14:36:32 +0000 (15:36 +0100)]
block/rnbd-clt: Fix sg table use after free
Since dynamically allocate sglist is used for rnbd_iu, we can't free sg
table after send_usr_msg since the callback function (cqe.done) could
still access the sglist.
Otherwise KASAN reports UAF issue:
[ 4856.600257] BUG: KASAN: use-after-free in dma_direct_unmap_sg+0x53/0x290
[ 4856.600772] Read of size 4 at addr ffff888206af3a98 by task swapper/1/0
[ 778.226506] The buggy address belongs to the object at ffff88b1d6516c00
which belongs to the cache kmalloc-512 of size 512
[ 778.227464] The buggy address is located 40 bytes inside of
512-byte region [ffff88b1d6516c00, ffff88b1d6516e00)
The problem is in the sess_dev release function we call
rnbd_destroy_sess_dev, and could free the sess_dev already, but we still
set the keep_id in rnbd_srv_sess_dev_force_close, which lead to use
after free.
To fix it, move the keep_id before the sysfs removal, and cache the
rnbd_srv_session for lock accessing,
Fixes: 786998050cbc ("block/rnbd-srv: close a mapped device from server side.") Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Reviewed-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jack Wang [Fri, 8 Jan 2021 14:36:30 +0000 (15:36 +0100)]
block/rnbd: Select SG_POOL for RNBD_CLIENT
lkp reboot following build error:
drivers/block/rnbd/rnbd-clt.c: In function 'rnbd_softirq_done_fn':
>> drivers/block/rnbd/rnbd-clt.c:387:2: error: implicit declaration of function 'sg_free_table_chained' [-Werror=implicit-function-declaration]
387 | sg_free_table_chained(&iu->sgt, RNBD_INLINE_SG_CNT);
| ^~~~~~~~~~~~~~~~~~~~~
The reason is CONFIG_SG_POOL is not enabled in the config, to
avoid such failure, select SG_POOL in Kconfig for RNBD_CLIENT.
Fixes: 5a1328d0c3a7 ("block/rnbd-clt: Dynamically allocate sglist for rnbd_iu") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Fenghua Yu [Thu, 17 Dec 2020 22:31:19 +0000 (14:31 -0800)]
x86/resctrl: Don't move a task to the same resource group
Shakeel Butt reported in [1] that a user can request a task to be moved
to a resource group even if the task is already in the group. It just
wastes time to do the move operation which could be costly to send IPI
to a different CPU.
Add a sanity check to ensure that the move operation only happens when
the task is not already in the resource group.
Fenghua Yu [Thu, 17 Dec 2020 22:31:18 +0000 (14:31 -0800)]
x86/resctrl: Use an IPI instead of task_work_add() to update PQR_ASSOC MSR
Currently, when moving a task to a resource group the PQR_ASSOC MSR is
updated with the new closid and rmid in an added task callback. If the
task is running, the work is run as soon as possible. If the task is not
running, the work is executed later in the kernel exit path when the
kernel returns to the task again.
Updating the PQR_ASSOC MSR as soon as possible on the CPU a moved task
is running is the right thing to do. Queueing work for a task that is
not running is unnecessary (the PQR_ASSOC MSR is already updated when
the task is scheduled in) and causing system resource waste with the way
in which it is implemented: Work to update the PQR_ASSOC register is
queued every time the user writes a task id to the "tasks" file, even if
the task already belongs to the resource group.
This could result in multiple pending work items associated with a
single task even if they are all identical and even though only a single
update with most recent values is needed. Specifically, even if a task
is moved between different resource groups while it is sleeping then it
is only the last move that is relevant but yet a work item is queued
during each move.
This unnecessary queueing of work items could result in significant
system resource waste, especially on tasks sleeping for a long time.
For example, as demonstrated by Shakeel Butt in [1] writing the same
task id to the "tasks" file can quickly consume significant memory. The
same problem (wasted system resources) occurs when moving a task between
different resource groups.
As pointed out by Valentin Schneider in [2] there is an additional issue
with the way in which the queueing of work is done in that the task_struct
update is currently done after the work is queued, resulting in a race with
the register update possibly done before the data needed by the update is
available.
To solve these issues, update the PQR_ASSOC MSR in a synchronous way
right after the new closid and rmid are ready during the task movement,
only if the task is running. If a moved task is not running nothing
is done since the PQR_ASSOC MSR will be updated next time the task is
scheduled. This is the same way used to update the register when tasks
are moved as part of resource group removal.
block: pre-initialize struct block_device in bdev_alloc_inode
bdev_evict_inode and bdev_free_inode are also called for the root inode
of bdevfs, for which bdev_alloc is never called. Move the zeroing o
f struct block_device and the initialization of the bd_bdi field into
bdev_alloc_inode to make sure they are initialized for the root inode
as well.
Jakub Kicinski [Fri, 8 Jan 2021 03:13:29 +0000 (19:13 -0800)]
Merge tag 'mlx5-fixes-2021-01-07' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5 fixes 2021-01-07
* tag 'mlx5-fixes-2021-01-07' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5e: Fix memleak in mlx5e_create_l2_table_groups
net/mlx5e: Fix two double free cases
net/mlx5: Release devlink object if adev fails
net/mlx5e: ethtool, Fix restriction of autoneg with 56G
net/mlx5e: In skb build skip setting mark in switchdev mode
net/mlx5: E-Switch, fix changing vf VLANID
net/mlx5e: Fix SWP offsets when vlan inserted by driver
net/mlx5e: CT: Use per flow counter when CT flow accounting is enabled
net/mlx5: Use port_num 1 instead of 0 when delete a RoCE address
net/mlx5e: Add missing capability check for uplink follow
net/mlx5: Check if lag is supported before creating one
====================
Jakub Kicinski [Fri, 8 Jan 2021 02:54:08 +0000 (18:54 -0800)]
Merge branch 's390-qeth-fixes-2021-01-07'
Julian Wiedmann says:
====================
s390/qeth: fixes 2021-01-07
This brings two locking fixes for the device control path.
Also one fix for a path where our .ndo_features_check() attempts to
access a non-existent L2 header.
====================
Julian Wiedmann [Thu, 7 Jan 2021 17:24:42 +0000 (18:24 +0100)]
s390/qeth: fix L2 header access in qeth_l3_osa_features_check()
ip_finish_output_gso() may call .ndo_features_check() even before the
skb has a L2 header. This conflicts with qeth_get_ip_version()'s attempt
to inspect the L2 header via vlan_eth_hdr().
Switch to vlan_get_protocol(), as already used further down in the
common qeth_features_check() path.
Fixes: f13ade199391 ("s390/qeth: run non-offload L3 traffic over common xmit path") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Julian Wiedmann [Thu, 7 Jan 2021 17:24:41 +0000 (18:24 +0100)]
s390/qeth: fix locking for discipline setup / removal
Due to insufficient locking, qeth_core_set_online() and
qeth_dev_layer2_store() can run in parallel, both attempting to load &
setup the discipline (and stepping on each other toes along the way).
A similar race can also occur between qeth_core_remove_device() and
qeth_dev_layer2_store().
Access to .discipline is meant to be protected by the discipline_mutex,
so add/expand the locking in qeth_core_remove_device() and
qeth_core_set_online().
Adjust the locking in qeth_l*_remove_device() accordingly, as it's now
handled by the callers in a consistent manner.
Based on an initial patch by Ursula Braun.
Fixes: 9dc48ccc68b9 ("qeth: serialize sysfs-triggered device configurations") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Reviewed-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Julian Wiedmann [Thu, 7 Jan 2021 17:24:40 +0000 (18:24 +0100)]
s390/qeth: fix deadlock during recovery
When qeth_dev_layer2_store() - holding the discipline_mutex - waits
inside qeth_l*_remove_device() for a qeth_do_reset() thread to complete,
we can hit a deadlock if qeth_do_reset() concurrently calls
qeth_set_online() and thus tries to aquire the discipline_mutex.
Move the discipline_mutex locking outside of qeth_set_online() and
qeth_set_offline(), and turn the discipline into a parameter so that
callers understand the dependency.
To fix the deadlock, we can now relax the locking:
As already established, qeth_l*_remove_device() waits for
qeth_do_reset() to complete. So qeth_do_reset() itself is under no risk
of having card->discipline ripped out while it's running, and thus
doesn't need to take the discipline_mutex.
Fixes: 9dc48ccc68b9 ("qeth: serialize sysfs-triggered device configurations") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Reviewed-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Fri, 8 Jan 2021 02:47:21 +0000 (18:47 -0800)]
Merge branch 'nexthop-various-fixes'
Ido Schimmel says:
====================
nexthop: Various fixes
This series contains various fixes for the nexthop code. The bugs were
uncovered during the development of resilient nexthop groups.
Patches #1-#2 fix the error path of nexthop_create_group(). I was not
able to trigger these bugs with current code, but it is possible with
the upcoming resilient nexthop groups code which adds a user
controllable memory allocation further in the function.
Patch #3 fixes wrong validation of netlink attributes.
Patch #4 fixes wrong invocation of mausezahn in a selftest.
====================
For IPv6 traffic, mausezahn needs to be invoked with '-6'. Otherwise an
error is returned:
# ip netns exec me mausezahn veth1 -B 2001:db8:101::2 -A 2001:db8:91::1 -c 0 -t tcp "dp=1-1023, flags=syn"
Failed to set source IPv4 address. Please check if source is set to a valid IPv4 address.
Invalid command line parameters!
Fixes: 7c741868ceab ("selftests: Add torture tests to nexthop tests") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Petr Machata [Thu, 7 Jan 2021 14:48:23 +0000 (16:48 +0200)]
nexthop: Bounce NHA_GATEWAY in FDB nexthop groups
The function nh_check_attr_group() is called to validate nexthop groups.
The intention of that code seems to have been to bounce all attributes
above NHA_GROUP_TYPE except for NHA_FDB. However instead it bounces all
these attributes except when NHA_FDB attribute is present--then it accepts
them.
NHA_FDB validation that takes place before, in rtm_to_nh_config(), already
bounces NHA_OIF, NHA_BLACKHOLE, NHA_ENCAP and NHA_ENCAP_TYPE. Yet further
back, NHA_GROUPS and NHA_MASTER are bounced unconditionally.
But that still leaves NHA_GATEWAY as an attribute that would be accepted in
FDB nexthop groups (with no meaning), so long as it keeps the address
family as unspecified:
# ip nexthop add id 1 fdb via 127.0.0.1
# ip nexthop add id 10 fdb via default group 1
The nexthop code is still relatively new and likely not used very broadly,
and the FDB bits are newer still. Even though there is a reproducer out
there, it relies on an improbable gateway arguments "via default", "via
all" or "via any". Given all this, I believe it is OK to reformulate the
condition to do the right thing and bounce NHA_GATEWAY.
Fixes: 38428d68719c ("nexthop: support for fdb ecmp nexthops") Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ido Schimmel [Thu, 7 Jan 2021 14:48:22 +0000 (16:48 +0200)]
nexthop: Unlink nexthop group entry in error path
In case of error, remove the nexthop group entry from the list to which
it was previously added.
Fixes: 430a049190de ("nexthop: Add support for nexthop groups") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ido Schimmel [Thu, 7 Jan 2021 14:48:21 +0000 (16:48 +0200)]
nexthop: Fix off-by-one error in error path
A reference was not taken for the current nexthop entry, so do not try
to put it in the error path.
Fixes: 430a049190de ("nexthop: Add support for nexthop groups") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Colin Ian King [Thu, 7 Jan 2021 12:39:16 +0000 (12:39 +0000)]
octeontx2-af: fix memory leak of lmac and lmac->name
Currently the error return paths don't kfree lmac and lmac->name
leading to some memory leaks. Fix this by adding two error return
paths that kfree these objects
Addresses-Coverity: ("Resource leak") Fixes: 1463f382f58d ("octeontx2-af: Add support for CGX link management") Signed-off-by: Colin Ian King <colin.king@canonical.com> Link: https://lore.kernel.org/r/20210107123916.189748-1-colin.king@canonical.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ayush Sawal [Wed, 6 Jan 2021 04:29:12 +0000 (09:59 +0530)]
chtls: Fix chtls resources release sequence
CPL_ABORT_RPL is sent after releasing the resources by calling
chtls_release_resources(sk); and chtls_conn_done(sk);
eventually causing kernel panic. Fixing it by calling release
in appropriate order.
Ayush Sawal [Wed, 6 Jan 2021 04:29:10 +0000 (09:59 +0530)]
chtls: Replace skb_dequeue with skb_peek
The skb is unlinked twice, one in __skb_dequeue in function
chtls_reset_synq() and another in cleanup_syn_rcv_conn().
So in this patch using skb_peek() instead of __skb_dequeue(),
so that unlink will be handled only in cleanup_syn_rcv_conn().
Ayush Sawal [Wed, 6 Jan 2021 04:29:07 +0000 (09:59 +0530)]
chtls: Remove invalid set_tcb call
At the time of SYN_RECV, connection information is not
initialized at FW, updating tcb flag over uninitialized
connection causes adapter crash. We don't need to
update the flag during SYN_RECV state, so avoid this.
Ayush Sawal [Wed, 6 Jan 2021 04:29:06 +0000 (09:59 +0530)]
chtls: Fix hardware tid leak
send_abort_rpl() is not calculating cpl_abort_req_rss offset and
ends up sending wrong TID with abort_rpl WR causng tid leaks.
Replaced send_abort_rpl() with chtls_send_abort_rpl() as it is
redundant.
Tom Lendacky [Mon, 4 Jan 2021 20:20:01 +0000 (14:20 -0600)]
KVM: SVM: Add support for booting APs in an SEV-ES guest
Typically under KVM, an AP is booted using the INIT-SIPI-SIPI sequence,
where the guest vCPU register state is updated and then the vCPU is VMRUN
to begin execution of the AP. For an SEV-ES guest, this won't work because
the guest register state is encrypted.
Following the GHCB specification, the hypervisor must not alter the guest
register state, so KVM must track an AP/vCPU boot. Should the guest want
to park the AP, it must use the AP Reset Hold exit event in place of, for
example, a HLT loop.
First AP boot (first INIT-SIPI-SIPI sequence):
Execute the AP (vCPU) as it was initialized and measured by the SEV-ES
support. It is up to the guest to transfer control of the AP to the
proper location.
Subsequent AP boot:
KVM will expect to receive an AP Reset Hold exit event indicating that
the vCPU is being parked and will require an INIT-SIPI-SIPI sequence to
awaken it. When the AP Reset Hold exit event is received, KVM will place
the vCPU into a simulated HLT mode. Upon receiving the INIT-SIPI-SIPI
sequence, KVM will make the vCPU runnable. It is again up to the guest
to then transfer control of the AP to the proper location.
To differentiate between an actual HLT and an AP Reset Hold, a new MP
state is introduced, KVM_MP_STATE_AP_RESET_HOLD, which the vCPU is
placed in upon receiving the AP Reset Hold exit event. Additionally, to
communicate the AP Reset Hold exit event up to userspace (if needed), a
new exit reason is introduced, KVM_EXIT_AP_RESET_HOLD.
A new x86 ops function is introduced, vcpu_deliver_sipi_vector, in order
to accomplish AP booting. For VMX, vcpu_deliver_sipi_vector is set to the
original SIPI delivery function, kvm_vcpu_deliver_sipi_vector(). SVM adds
a new function that, for non SEV-ES guests, invokes the original SIPI
delivery function, kvm_vcpu_deliver_sipi_vector(), but for SEV-ES guests,
implements the logic above.
Maxim Levitsky [Thu, 7 Jan 2021 09:38:51 +0000 (11:38 +0200)]
KVM: nSVM: cancel KVM_REQ_GET_NESTED_STATE_PAGES on nested vmexit
It is possible to exit the nested guest mode, entered by
svm_set_nested_state prior to first vm entry to it (e.g due to pending event)
if the nested run was not pending during the migration.
In this case we must not switch to the nested msr permission bitmap.
Also add a warning to catch similar cases in the future.
Fixes: a7d5c7ce41ac1 ("KVM: nSVM: delay MSR permission processing to first nested VM run") Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210107093854.882483-2-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Ben Gardon [Thu, 7 Jan 2021 00:19:35 +0000 (16:19 -0800)]
KVM: x86/mmu: Clarify TDP MMU page list invariants
The tdp_mmu_roots and tdp_mmu_pages in struct kvm_arch should only contain
pages with tdp_mmu_page set to true. tdp_mmu_pages should not contain any
pages with a non-zero root_count and tdp_mmu_roots should only contain
pages with a positive root_count, unless a thread holds the MMU lock and
is in the process of modifying the list. Various functions expect these
invariants to be maintained, but they are not explictily documented. Add
to the comments on both fields to document the above invariants.
Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20210107001935.3732070-2-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Ben Gardon [Thu, 7 Jan 2021 00:19:34 +0000 (16:19 -0800)]
KVM: x86/mmu: Ensure TDP MMU roots are freed after yield
Many TDP MMU functions which need to perform some action on all TDP MMU
roots hold a reference on that root so that they can safely drop the MMU
lock in order to yield to other threads. However, when releasing the
reference on the root, there is a bug: the root will not be freed even
if its reference count (root_count) is reduced to 0.
To simplify acquiring and releasing references on TDP MMU root pages, and
to ensure that these roots are properly freed, move the get/put operations
into another TDP MMU root iterator macro.
Moving the get/put operations into an iterator macro also helps
simplify control flow when a root does need to be freed. Note that using
the list_for_each_entry_safe macro would not have been appropriate in
this situation because it could keep a pointer to the next root across
an MMU lock release + reacquire, during which time that root could be
freed.
Reported-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Fixes: faaf05b00aec ("kvm: x86/mmu: Support zapping SPTEs in the TDP MMU") Fixes: 063afacd8730 ("kvm: x86/mmu: Support invalidate range MMU notifier for TDP MMU") Fixes: a6a0b05da9f3 ("kvm: x86/mmu: Support dirty logging for the TDP MMU") Fixes: 14881998566d ("kvm: x86/mmu: Support disabling dirty logging for the tdp MMU") Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20210107001935.3732070-1-bgardon@google.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Lai Jiangshan [Thu, 17 Dec 2020 15:41:18 +0000 (23:41 +0800)]
kvm: check tlbs_dirty directly
In kvm_mmu_notifier_invalidate_range_start(), tlbs_dirty is used as:
need_tlb_flush |= kvm->tlbs_dirty;
with need_tlb_flush's type being int and tlbs_dirty's type being long.
It means that tlbs_dirty is always used as int and the higher 32 bits
is useless. We need to check tlbs_dirty in a correct way and this
change checks it directly without propagating it to need_tlb_flush.
Note: it's _extremely_ unlikely this neglecting of higher 32 bits can
cause problems in practice. It would require encountering tlbs_dirty
on a 4 billion count boundary, and KVM would need to be using shadow
paging or be running a nested guest.
Cc: stable@vger.kernel.org Fixes: a4ee1ca4a36e ("KVM: MMU: delay flush all tlbs on sync_page path") Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Message-Id: <20201217154118.16497-1-jiangshanlai@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Stephen Zhang [Fri, 18 Dec 2020 07:51:37 +0000 (15:51 +0800)]
KVM: x86: change in pv_eoi_get_pending() to make code more readable
Signed-off-by: Stephen Zhang <stephenzhangzsd@gmail.com>
Message-Id: <1608277897-1932-1-git-send-email-stephenzhangzsd@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>