1) Fix truncation of 32-bit right shift in bpf, from Jann Horn.
2) Fix memory leak in wireless wext compat, from Stefan Seyfried.
3) Use after free in cfg80211's reg_process_hint(), from Yu Zhao.
4) Need to cancel pending work when unbinding in smsc75xx otherwise
we oops, also from Yu Zhao.
5) Don't allow enslaving a team device to itself, from Ido Schimmel.
6) Fix backwards compat with older userspace for rtnetlink FDB dumps.
From Mauricio Faria.
7) Add validation of tc policy netlink attributes, from David Ahern.
8) Fix RCU locking in rawv6_send_hdrinc(), from Wei Wang."
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (26 commits)
net: mvpp2: Extract the correct ethtype from the skb for tx csum offload
ipv6: take rcu lock in rawv6_send_hdrinc()
net: sched: Add policy validation for tc attributes
rtnetlink: fix rtnl_fdb_dump() for ndmsg header
yam: fix a missing-check bug
net: bpfilter: Fix type cast and pointer warnings
net: cxgb3_main: fix a missing-check bug
bpf: 32-bit RSH verification must truncate input before the ALU op
net: phy: phylink: fix SFP interface autodetection
be2net: don't flip hw_features when VXLANs are added/deleted
net/packet: fix packet drop as of virtio gso
net: dsa: b53: Keep CPU port as tagged in all VLANs
openvswitch: load NAT helper
bnxt_en: get the reduced max_irqs by the ones used by RDMA
bnxt_en: free hwrm resources, if driver probe fails.
bnxt_en: Fix enables field in HWRM_QUEUE_COS2BW_CFG request
bnxt_en: Fix VNIC reservations on the PF.
team: Forbid enslaving team device to itself
net/usb: cancel pending work when unbinding smsc75xx
mlxsw: spectrum: Delete RIF when VLAN device is removed
...
* akpm:
mm: madvise(MADV_DODUMP): allow hugetlbfs pages
ocfs2: fix locking for res->tracking and dlm->tracking_list
mm/vmscan.c: fix int overflow in callers of do_shrink_slab()
mm/vmstat.c: skip NR_TLB_REMOTE_FLUSH* properly
mm/vmstat.c: fix outdated vmstat_text
proc: restrict kernel stack dumps to root
mm/hugetlb: add mmap() encodings for 32MB and 512MB page sizes
mm/migrate.c: split only transparent huge pages when allocation fails
ipc/shm.c: use ERR_CAST() for shm_lock() error return
mm/gup_benchmark: fix unsigned comparison to zero in __gup_benchmark_ioctl
mm, thp: fix mlocking THP page with migration enabled
ocfs2: fix crash in ocfs2_duplicate_clusters_by_page()
hugetlb: take PMD sharing into account when flushing tlb/caches
mm: migration: fix migration of huge PMD shared pages
hugetlbfs pages have VM_DONTEXPAND in the VmFlags driver pages based on
author testing with analysis from Florian Weimer[1].
The inclusion of VM_DONTEXPAND into the VM_SPECIAL defination was a
consequence of the large useage of VM_DONTEXPAND in device drivers.
A consequence of [2] is that VM_DONTEXPAND marked pages are unable to be
marked DODUMP.
A user could quite legitimately madvise(MADV_DONTDUMP) their hugetlbfs
memory for a while and later request that madvise(MADV_DODUMP) on the same
memory. We correct this omission by allowing madvice(MADV_DODUMP) on
hugetlbfs pages.
[1] https://stackoverflow.com/questions/52548260/madvisedodump-on-the-same-ptr-size-as-a-successful-madvisedontdump-fails-wit
[2] commit 6c709454325b ("mm: prepare VM_DONTDUMP for using in drivers")
Link: http://lkml.kernel.org/r/20180930054629.29150-1-daniel@linux.ibm.com Link: https://lists.launchpad.net/maria-discuss/msg05245.html Fixes: 6c709454325b ("mm: prepare VM_DONTDUMP for using in drivers") Reported-by: Kenneth Penza <kpenza@gmail.com> Signed-off-by: Daniel Black <daniel@linux.ibm.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Konstantin Khlebnikov <khlebnikov@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Ashish Samant [Fri, 5 Oct 2018 22:52:15 +0000 (15:52 -0700)]
ocfs2: fix locking for res->tracking and dlm->tracking_list
In dlm_init_lockres() we access and modify res->tracking and
dlm->tracking_list without holding dlm->track_lock. This can cause list
corruptions and can end up in kernel panic.
Fix this by locking res->tracking and dlm->tracking_list with
dlm->track_lock instead of dlm->spinlock.
Link: http://lkml.kernel.org/r/1529951192-4686-1-git-send-email-ashish.samant@oracle.com Signed-off-by: Ashish Samant <ashish.samant@oracle.com> Reviewed-by: Changwei Ge <ge.changwei@h3c.com> Acked-by: Joseph Qi <jiangqi903@gmail.com> Acked-by: Jun Piao <piaojun@huawei.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <ge.changwei@h3c.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Kirill Tkhai [Fri, 5 Oct 2018 22:52:10 +0000 (15:52 -0700)]
mm/vmscan.c: fix int overflow in callers of do_shrink_slab()
do_shrink_slab() returns unsigned long value, and the placing into int
variable cuts high bytes off. Then we compare ret and 0xfffffffe (since
SHRINK_EMPTY is converted to ret type).
Thus a large number of objects returned by do_shrink_slab() may be
interpreted as SHRINK_EMPTY, if low bytes of their value are equal to
0xfffffffe. Fix that by declaration ret as unsigned long in these
functions.
Jann Horn [Fri, 5 Oct 2018 22:52:07 +0000 (15:52 -0700)]
mm/vmstat.c: skip NR_TLB_REMOTE_FLUSH* properly
13b7ab412ab6 ("mm/vmstat: Make NR_TLB_REMOTE_FLUSH_RECEIVED available even
on UP") made the availability of the NR_TLB_REMOTE_FLUSH* counters inside
the kernel unconditional to reduce #ifdef soup, but (either to avoid
showing dummy zero counters to userspace, or because that code was missed)
didn't update the vmstat_array, meaning that all following counters would
be shown with incorrect values.
This only affects kernel builds with
CONFIG_VM_EVENT_COUNTERS=y && CONFIG_DEBUG_TLBFLUSH=y && CONFIG_SMP=n.
Link: http://lkml.kernel.org/r/20181001143138.95119-2-jannh@google.com Fixes: 13b7ab412ab6 ("mm/vmstat: Make NR_TLB_REMOTE_FLUSH_RECEIVED available even on UP") Signed-off-by: Jann Horn <jannh@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Roman Gushchin <guro@fb.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Christoph Lameter <clameter@sgi.com> Cc: Kemi Wang <kemi.wang@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jann Horn [Fri, 5 Oct 2018 22:52:03 +0000 (15:52 -0700)]
mm/vmstat.c: fix outdated vmstat_text
01072c93cb8c ("mm: get rid of vmacache_flush_all() entirely") removed the
VMACACHE_FULL_FLUSHES statistics, but didn't remove the corresponding
entry in vmstat_text. This causes an out-of-bounds access in
vmstat_show().
Luckily this only affects kernels with CONFIG_DEBUG_VM_VMACACHE=y, which
is probably very rare.
Link: http://lkml.kernel.org/r/20181001143138.95119-1-jannh@google.com Fixes: 01072c93cb8c ("mm: get rid of vmacache_flush_all() entirely") Signed-off-by: Jann Horn <jannh@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Roman Gushchin <guro@fb.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Christoph Lameter <clameter@sgi.com> Cc: Kemi Wang <kemi.wang@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jann Horn [Fri, 5 Oct 2018 22:51:58 +0000 (15:51 -0700)]
proc: restrict kernel stack dumps to root
Currently, you can use /proc/self/task/*/stack to cause a stack walk on
a task you control while it is running on another CPU. That means that
the stack can change under the stack walker. The stack walker does
have guards against going completely off the rails and into random
kernel memory, but it can interpret random data from your kernel stack
as instruction pointers and stack pointers. This can cause exposure of
kernel stack contents to userspace.
Restrict the ability to inspect kernel stacks of arbitrary tasks to root
in order to prevent a local attacker from exploiting racy stack unwinding
to leak kernel task stack contents. See the added comment for a longer
rationale.
There don't seem to be any users of this userspace API that can't
gracefully bail out if reading from the file fails. Therefore, I believe
that this change is unlikely to break things. In the case that this patch
does end up needing a revert, the next-best solution might be to fake a
single-entry stack based on wchan.
Link: http://lkml.kernel.org/r/20180927153316.200286-1-jannh@google.com Fixes: 16e98d640ab3 ("proc: add /proc/*/stack") Signed-off-by: Jann Horn <jannh@google.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Ken Chen <kenchen@google.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
mm/migrate.c: split only transparent huge pages when allocation fails
split_huge_page_to_list() fails on HugeTLB pages. I was experimenting
with moving 32MB contig HugeTLB pages on arm64 (with a debug patch
applied) and hit the following stack trace when the kernel crashed.
When unmap_and_move[_huge_page]() fails due to lack of memory, the
splitting should happen only for transparent huge pages not for HugeTLB
pages. PageTransHuge() returns true for both THP and HugeTLB pages.
Hence the conditonal check should test PagesHuge() flag to make sure that
given pages is not a HugeTLB one.
Link: http://lkml.kernel.org/r/1537798495-4996-1-git-send-email-anshuman.khandual@arm.com Fixes: 21bb047634 ("mm: unclutter THP migration") Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Zi Yan <zi.yan@cs.rutgers.edu> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Kees Cook [Fri, 5 Oct 2018 22:51:48 +0000 (15:51 -0700)]
ipc/shm.c: use ERR_CAST() for shm_lock() error return
This uses ERR_CAST() instead of an open-coded cast, as it is casting
across structure pointers, which upsets __randomize_layout:
ipc/shm.c: In function `shm_lock':
ipc/shm.c:209:9: note: randstruct: casting between randomized structure pointer types (ssa): `struct shmid_kernel' and `struct kern_ipc_perm'
YueHaibing [Fri, 5 Oct 2018 22:51:44 +0000 (15:51 -0700)]
mm/gup_benchmark: fix unsigned comparison to zero in __gup_benchmark_ioctl
get_user_pages_fast() will return negative value if no pages were pinned,
then be converted to a unsigned, which is compared to zero, giving the
wrong result.
Link: http://lkml.kernel.org/r/20180921095015.26088-1-yuehaibing@huawei.com Fixes: 2ff85d46710e ("mm/gup_benchmark: handle gup failures") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
mm, thp: fix mlocking THP page with migration enabled
A transparent huge page is represented by a single entry on an LRU list.
Therefore, we can only make unevictable an entire compound page, not
individual subpages.
If a user tries to mlock() part of a huge page, we want the rest of the
page to be reclaimable.
We handle this by keeping PTE-mapped huge pages on normal LRU lists: the
PMD on border of VM_LOCKED VMA will be split into PTE table.
Introduction of THP migration breaks[1] the rules around mlocking THP
pages. If we had a single PMD mapping of the page in mlocked VMA, the
page will get mlocked, regardless of PTE mappings of the page.
For tmpfs/shmem it's easy to fix by checking PageDoubleMap() in
remove_migration_pmd().
Anon THP pages can only be shared between processes via fork(). Mlocked
page can only be shared if parent mlocked it before forking, otherwise CoW
will be triggered on mlock().
For Anon-THP, we can fix the issue by munlocking the page on removing PTE
migration entry for the page. PTEs for the page will always come after
mlocked PMD: rmap walks VMAs from oldest to newest.
Larry Chen [Fri, 5 Oct 2018 22:51:37 +0000 (15:51 -0700)]
ocfs2: fix crash in ocfs2_duplicate_clusters_by_page()
ocfs2_duplicate_clusters_by_page() may crash if one of the extent's pages
is dirty. When a page has not been written back, it is still in dirty
state. If ocfs2_duplicate_clusters_by_page() is called against the dirty
page, the crash happens.
To fix this bug, we can just unlock the page and wait until the page until
its not dirty.
Mike Kravetz [Fri, 5 Oct 2018 22:51:33 +0000 (15:51 -0700)]
hugetlb: take PMD sharing into account when flushing tlb/caches
When fixing an issue with PMD sharing and migration, it was discovered via
code inspection that other callers of huge_pmd_unshare potentially have an
issue with cache and tlb flushing.
Use the routine adjust_range_if_pmd_sharing_possible() to calculate worst
case ranges for mmu notifiers. Ensure that this range is flushed if
huge_pmd_unshare succeeds and unmaps a PUD_SUZE area.
Link: http://lkml.kernel.org/r/20180823205917.16297-3-mike.kravetz@oracle.com Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Michal Hocko <mhocko@kernel.org> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Mike Kravetz [Fri, 5 Oct 2018 22:51:29 +0000 (15:51 -0700)]
mm: migration: fix migration of huge PMD shared pages
The page migration code employs try_to_unmap() to try and unmap the source
page. This is accomplished by using rmap_walk to find all vmas where the
page is mapped. This search stops when page mapcount is zero. For shared
PMD huge pages, the page map count is always 1 no matter the number of
mappings. Shared mappings are tracked via the reference count of the PMD
page. Therefore, try_to_unmap stops prematurely and does not completely
unmap all mappings of the source page.
This problem can result is data corruption as writes to the original
source page can happen after contents of the page are copied to the target
page. Hence, data is lost.
This problem was originally seen as DB corruption of shared global areas
after a huge page was soft offlined due to ECC memory errors. DB
developers noticed they could reproduce the issue by (hotplug) offlining
memory used to back huge pages. A simple testcase can reproduce the
problem by creating a shared PMD mapping (note that this must be at least
PUD_SIZE in size and PUD_SIZE aligned (1GB on x86)), and using
migrate_pages() to migrate process pages between nodes while continually
writing to the huge pages being migrated.
To fix, have the try_to_unmap_one routine check for huge PMD sharing by
calling huge_pmd_unshare for hugetlbfs huge pages. If it is a shared
mapping it will be 'unshared' which removes the page table entry and drops
the reference on the PMD page. After this, flush caches and TLB.
mmu notifiers are called before locking page tables, but we can not be
sure of PMD sharing until page tables are locked. Therefore, check for
the possibility of PMD sharing before locking so that notifiers can
prepare for the worst possible case.
Link: http://lkml.kernel.org/r/20180823205917.16297-2-mike.kravetz@oracle.com
[mike.kravetz@oracle.com: make _range_in_vma() a static inline] Link: http://lkml.kernel.org/r/6063f215-a5c8-2f0c-465a-2c515ddc952d@oracle.com Fixes: 51baf743f83e ("shared page table for hugetlb page") Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Merge tag 'for-4.19/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Mike writes:
"device mapper fixes
- Fix a DM thinp __udivdi3 undefined on 32-bit bug introduced during
4.19 merge window.
- Fix leak and dangling pointer in DM multipath's scsi_dh related code.
- A couple stable@ fixes for DM cache's resize support.
- A DM raid fix to remove "const" from decipher_sync_action()'s return
type."
* tag 'for-4.19/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm cache: fix resize crash if user doesn't reload cache table
dm cache metadata: ignore hints array being too small during resize
dm raid: remove bogus const from decipher_sync_action() return type
dm mpath: fix attached_handler_name leak and dangling hw_handler_name pointer
dm thin metadata: fix __udivdi3 undefined on 32-bit
Merge tag 'pm-4.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Rafael writes:
"Power management fix for 4.19-rc7
Fix a bug that may cause runtime PM to misbehave for some devices
after a failing or aborted system suspend which is nasty enough for
an -rc7 time frame fix."
* tag 'pm-4.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM / core: Clear the direct_complete flag on errors
Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Ingo writes:
"perf fixes:
- fix a CPU#0 hot unplug bug and a PCI enumeration bug in the x86 Intel uncore PMU driver
- fix a CPU event enumeration bug in the x86 AMD PMU driver
- fix a perf ring-buffer corruption bug when using tracepoints
- fix a PMU unregister locking bug"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/amd/uncore: Set ThreadMask and SliceMask for L3 Cache perf events
perf/x86/intel/uncore: Fix PCI BDF address of M3UPI on SKX
perf/ring_buffer: Prevent concurent ring buffer access
perf/x86/intel/uncore: Use boot_cpu_data.phys_proc_id instead of hardcorded physical package ID 0
perf/core: Fix perf_pmu_unregister() locking
Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Ingo writes:
"x86 fixes:
Misc fixes:
- fix various vDSO bugs: asm constraints and retpolines
- add vDSO test units to make sure they never re-appear
- fix UV platform TSC initialization bug
- fix build warning on Clang"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/vdso: Fix vDSO syscall fallback asm constraint regression
x86/cpu/amd: Remove unnecessary parentheses
x86/vdso: Only enable vDSO retpolines when enabled and supported
x86/tsc: Fix UV TSC initialization
x86/platform/uv: Provide is_early_uv_system()
selftests/x86: Add clock_gettime() tests to test_vdso
x86/vdso: Fix asm constraints on vDSO syscall fallbacks
Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Ingo writes:
"scheduler fixes:
These fixes address a rather involved performance regression between
v4.17->v4.19 in the sched/numa auto-balancing code. Since distros
really need this fix we accelerated it to sched/urgent for a faster
upstream merge.
NUMA scheduling and balancing performance is now largely back to
v4.17 levels, without reintroducing the NUMA placement bugs that
v4.18 and v4.19 fixed.
Many thanks to Srikar Dronamraju, Mel Gorman and Jirka Hladky, for
reporting, testing, re-testing and solving this rather complex set of
bugs."
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/numa: Migrate pages to local nodes quicker early in the lifetime of a task
mm, sched/numa: Remove rate-limiting of automatic NUMA balancing migration
sched/numa: Avoid task migration for small NUMA improvement
mm/migrate: Use spin_trylock() while resetting rate limit
sched/numa: Limit the conditions where scan period is reset
sched/numa: Reset scan rate whenever task moves across nodes
sched/numa: Pass destination CPU as a parameter to migrate_task_rq
sched/numa: Stop multiple tasks from moving to the CPU at the same time
Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Ingo writes:
"locking fixes:
A fix in the ww_mutex self-test that produces a scary splat, plus an
updates to the maintained-filed patters in MAINTAINER."
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/ww_mutex: Fix runtime warning in the WW mutex selftest
MAINTAINERS: Remove dead path from LOCKING PRIMITIVES entry
Vijay Khemka [Fri, 5 Oct 2018 17:46:01 +0000 (10:46 -0700)]
net/ncsi: Add NCSI OEM command support
This patch adds OEM commands and response handling. It also defines OEM
command and response structure as per NCSI specification along with its
handlers.
ncsi_cmd_handler_oem: This is a generic command request handler for OEM
commands
ncsi_rsp_handler_oem: This is a generic response handler for OEM commands
Signed-off-by: Vijay Khemka <vijaykhemka@fb.com> Reviewed-by: Justin Lee <justin.lee1@dell.com> Reviewed-by: Samuel Mendoza-Jonas <sam@mendozajonas.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: mvpp2: Extract the correct ethtype from the skb for tx csum offload
When offloading the L3 and L4 csum computation on TX, we need to extract
the l3_proto from the ethtype, independently of the presence of a vlan
tag.
The actual driver uses skb->protocol as-is, resulting in packets with
the wrong L4 checksum being sent when there's a vlan tag in the packet
header and checksum offloading is enabled.
This commit makes use of vlan_protocol_get() to get the correct ethtype
regardless the presence of a vlan tag.
Fixes: dc37d05f9885 ("ethernet: Add new driver for Marvell Armada 375 network unit") Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Wang [Thu, 4 Oct 2018 17:12:37 +0000 (10:12 -0700)]
ipv6: take rcu lock in rawv6_send_hdrinc()
In rawv6_send_hdrinc(), in order to avoid an extra dst_hold(), we
directly assign the dst to skb and set passed in dst to NULL to avoid
double free.
However, in error case, we free skb and then do stats update with the
dst pointer passed in. This causes use-after-free on the dst.
Fix it by taking rcu read lock right before dst could get released to
make sure dst does not get freed until the stats update is done.
Note: we don't have this issue in ipv4 cause dst is not used for stats
update in v4.
Syzkaller reported following crash:
BUG: KASAN: use-after-free in rawv6_send_hdrinc net/ipv6/raw.c:692 [inline]
BUG: KASAN: use-after-free in rawv6_sendmsg+0x4421/0x4630 net/ipv6/raw.c:921
Read of size 8 at addr ffff8801d95ba730 by task syz-executor0/32088
Fixes: 6e2628c9196a ("raw: avoid two atomics in xmit") Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Davide Caratti [Thu, 4 Oct 2018 16:34:39 +0000 (18:34 +0200)]
tc-testing: use a plugin to build eBPF program
use a TDC plugin, instead of building eBPF programs in the 'setup' stage.
'-B' argument can be used to build eBPF programs in $EBPFDIR directory,
in the 'pre-suite' stage. Binaries are then cleaned in 'post-suite' stage.
Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Davide Caratti [Thu, 4 Oct 2018 16:34:38 +0000 (18:34 +0200)]
tc-testing: fix build of eBPF programs
rely on uAPI headers in the current kernel tree, rather than requiring the
correct version installed on the test system. While at it, group all
sections in a single binary and test the 'section' parameter.
Reported-by: Lucas Bates <lucasb@mojatatu.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
mscc: ocelot: add support for SerDes muxing configuration
The Ocelot switch has currently an hardcoded SerDes muxing that suits only
a particular use case. Any other board setup will fail to work.
To prepare for upcoming boards' support that do not have the same muxing,
create a PHY driver that will handle all possible cases.
A SerDes can work in SGMII, QSGMII or PCIe and is also muxed to use a
given port depending on the selected mode or board design.
The SerDes configuration is in the middle of an address space (HSIO) that
is used to configure some parts in the MAC controller driver, that is why
we need to use a syscon so that we can write to the same address space from
different drivers safely using regmap.
This breaks backward compatibility but it's fine because there's only one
board at the moment that is using what's modified in this patch series.
This will break git bisect.
Even though this patch series is about SerDes __muxing__ configuration, the
DT node is named serdes for the simple reason that I couldn't find any
mention to SerDes anywhere else from the address space handled by this
driver.
v4:
- add reviewed-by,
- format the patch series with -M for identifying renamed files,
- add parent info in DT binding of the SerDes IP,
- move to macros SERDES[16]G(X) instead of multiple SERDES[16]G_[012345]
constants,
- move to SERDES[16]G_MAX being the last VALID macro of a type, so
migrate to <= conditions instead of < when iterating,
- create a SERDES_MUX_SGMII and SERDES_MUX_QSGMII macro so the muxing
configurations are a tad more readable,
- use a bunch of unsigned int instead of int,
- return -EOPNOTSUPP for SERDES6G/PCIe until it's supported,
- simplify condition when there is an error code returned by
devm_of_phy_get,
v3:
- add Paul Burton's Acked-By on MIPS patches so that the patch series can
be merged in the net tree in its entirety,
v2:
- use a switch case for setting the phy_mode in the SerDes driver as
suggested by Andrew,
- stop replacing the value of the error pointer in the SerDes driver,
- use a dev_dbg for the deferring of the probe in the SerDes driver,
- use constants in the Device Tree to select the SerDes macro in use with
a port,
- adapt the SerDes driver to use those constants,
- add a header file in include/dt-bindings for the constants,
- fix space/tab issue,
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Quentin Schulz [Thu, 4 Oct 2018 12:22:08 +0000 (14:22 +0200)]
net: mscc: ocelot: make use of SerDes PHYs for handling their configuration
Previously, the SerDes muxing was hardcoded to a given mode in the MAC
controller driver. Now, the SerDes muxing is configured within the
Device Tree and is enforced in the MAC controller driver so we can have
a lot of different SerDes configurations.
Make use of the SerDes PHYs in the MAC controller to set up the SerDes
according to the SerDes<->switch port mapping and the communication mode
with the Ethernet PHY.
Signed-off-by: Quentin Schulz <quentin.schulz@bootlin.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Quentin Schulz [Thu, 4 Oct 2018 12:22:06 +0000 (14:22 +0200)]
dt-bindings: add constants for Microsemi Ocelot SerDes driver
The Microsemi Ocelot has multiple SerDes and requires that the SerDes be
muxed accordingly to the hardware representation.
Let's add a constant for each SerDes available in the Microsemi Ocelot.
Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Quentin Schulz <quentin.schulz@bootlin.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Quentin Schulz [Thu, 4 Oct 2018 12:22:05 +0000 (14:22 +0200)]
MIPS: mscc: ocelot: add SerDes mux DT node
The Microsemi Ocelot has a set of register for SerDes/switch port muxing
as well as PCIe muxing for a specific SerDes, so let's add the device
and all SerDes in the Device Tree.
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Paul Burton <paul.burton@mips.com> Signed-off-by: Quentin Schulz <quentin.schulz@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Quentin Schulz [Thu, 4 Oct 2018 12:22:03 +0000 (14:22 +0200)]
phy: add QSGMII and PCIE modes
Prepare for upcoming phys that'll handle QSGMII or PCIe.
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Quentin Schulz <quentin.schulz@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Quentin Schulz [Thu, 4 Oct 2018 12:22:02 +0000 (14:22 +0200)]
net: mscc: ocelot: simplify register access for PLL5 configuration
Since HSIO address space can be accessed by different drivers, let's
simplify the register address definitions so that it can be easily used
by all drivers and put the register address definition in the
include/soc/mscc/ocelot_hsio.h header file.
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: Quentin Schulz <quentin.schulz@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Quentin Schulz [Thu, 4 Oct 2018 12:22:01 +0000 (14:22 +0200)]
net: mscc: ocelot: move the HSIO header to include/soc
Since HSIO address space can be used by different drivers (PLL, SerDes
muxing, temperature sensor), let's move it somewhere it can be included
by all drivers.
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: Quentin Schulz <quentin.schulz@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Quentin Schulz [Thu, 4 Oct 2018 12:21:59 +0000 (14:21 +0200)]
dt-bindings: net: ocelot: remove hsio from the list of register address spaces
HSIO register address space should be handled outside of the MAC
controller as there are some registers for PLL5 configuring,
SerDes/switch port muxing and a thermal sensor IP, so let's remove it.
Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: Quentin Schulz <quentin.schulz@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Quentin Schulz [Thu, 4 Oct 2018 12:21:58 +0000 (14:21 +0200)]
MIPS: mscc: ocelot: make HSIO registers address range a syscon
HSIO contains registers for PLL5 configuration, SerDes/switch port
muxing and a thermal sensor, hence we can't keep it in the switch DT
node.
Acked-by: Paul Burton <paul.burton@mips.com> Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: Quentin Schulz <quentin.schulz@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Sitnicki [Thu, 4 Oct 2018 09:09:40 +0000 (11:09 +0200)]
socket: Tighten no-error check in bind()
move_addr_to_kernel() returns only negative values on error, or zero on
success. Rewrite the error check to an idiomatic form to avoid confusing
the reader.
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Lance Roy [Thu, 4 Oct 2018 07:46:57 +0000 (00:46 -0700)]
atm: nicstar: Replace spin_is_locked() with spin_trylock()
ns_poll() used spin_is_locked() + spin_lock() to get achieve the same
thing as a spin_trylock(), so simplify it by using that instead. This is
also a step towards possibly removing spin_is_locked().
Signed-off-by: Lance Roy <ldr709@gmail.com> Cc: Chas Williams <3chas3@gmail.com> Cc: <linux-atm-general@lists.sourceforge.net> Cc: <netdev@vger.kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Wed, 3 Oct 2018 22:05:36 +0000 (15:05 -0700)]
net: sched: Add policy validation for tc attributes
A number of TC attributes are processed without proper validation
(e.g., length checks). Add a tca policy for all input attributes and use
when invoking nlmsg_parse.
The 2 Fixes tags below cover the latest additions. The other attributes
are a string (KIND), nested attribute (OPTIONS which does seem to have
validation in most cases), for dumps only or a flag.
Fixes: 78526a0e48a9a ("net: sched: introduce multichain support for filters") Fixes: 2b04c04f5c06d ("net: sched: introduce ingress/egress block index attributes for qdisc") Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, rtnl_fdb_dump() assumes the family header is 'struct ifinfomsg',
which is not always true -- 'struct ndmsg' is used by iproute2 ('ip neigh').
The problem is, the function bails out early if nlmsg_parse() fails, which
does occur for iproute2 usage of 'struct ndmsg' because the payload length
is shorter than the family header alone (as 'struct ifinfomsg' is assumed).
This breaks backward compatibility with userspace -- nothing is sent back.
Some examples with iproute2 and netlink library for go [1]:
1) $ bridge fdb show
33:33:00:00:00:01 dev ens3 self permanent
01:00:5e:00:00:01 dev ens3 self permanent
33:33:ff:15:98:30 dev ens3 self permanent
The actual breakage was introduced by commit e6340ec8b77e ("net: rtnetlink:
bail out from rtnl_fdb_dump() on parse error"), because nlmsg_parse() fails
if the payload length (with the _actual_ family header) is less than the
family header length alone (which is assumed, in parameter 'hdrlen').
This is true in the examples above with struct ndmsg, with size and payload
length shorter than struct ifinfomsg.
However, that commit just intends to fix something under the assumption the
family header is indeed an 'struct ifinfomsg' - by preventing access to the
payload as such (via 'ifm' pointer) if the payload length is not sufficient
to actually contain it.
The assumption was introduced by commit 5b74d9327a0b ("bridge: netlink dump
interface at par with brctl"), to support iproute2's 'bridge fdb' command
(not 'ip neigh') which indeed uses 'struct ifinfomsg', thus is not broken.
So, in order to unbreak the 'struct ndmsg' family headers and still allow
'struct ifinfomsg' to continue to work, check for the known message sizes
used with 'struct ndmsg' in iproute2 (with zero or one attribute which is
not used in this function anyway) then do not parse the data as ifinfomsg.
Same examples with this patch applied (or revert/before the original fix):
$ bridge fdb show
33:33:00:00:00:01 dev ens3 self permanent
01:00:5e:00:00:01 dev ens3 self permanent
33:33:ff:15:98:30 dev ens3 self permanent
$ ip --family bridge neigh
dev ens3 lladdr 33:33:00:00:00:01 PERMANENT
dev ens3 lladdr 01:00:5e:00:00:01 PERMANENT
dev ens3 lladdr 33:33:ff:15:98:30 PERMANENT
Tested on mainline (v4.19-rc6) and net-next (3bd09b05b068).
References:
[1] netlink library for go (test-case)
https://github.com/vishvananda/netlink
$ cat ~/go/src/neighlist/main.go
package main
import ("fmt"; "syscall"; "github.com/vishvananda/netlink")
func main() {
neighs, _ := netlink.NeighList(0, syscall.AF_BRIDGE)
for _, neigh := range neighs { fmt.Printf("%#v\n", neigh) }
}
$ export GOPATH=~/go
$ go get github.com/vishvananda/netlink
$ go build neighlist
$ ~/go/src/neighlist/neighlist
Thanks to David Ahern for suggestions to improve this patch.
Fixes: e6340ec8b77e ("net: rtnetlink: bail out from rtnl_fdb_dump() on parse error") Fixes: 5b74d9327a0b ("bridge: netlink dump interface at par with brctl") Reported-by: Aidan Obley <aobley@pivotal.io> Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 5 Oct 2018 19:01:55 +0000 (12:01 -0700)]
Merge branch 'hns3-Unicast-MAC-VLAN-table'
Salil Mehta says:
====================
Fixes, minor changes & cleanups for the Unicast MAC VLAN table
This patch-set presents necessary modifications, fixes & cleanups related
to Unicast MAC Vlan Table to support revision 0x21 hardware.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jian Shen [Fri, 5 Oct 2018 17:03:29 +0000 (18:03 +0100)]
net: hns3: Fix for rx vlan id handle to support Rev 0x21 hardware
In revision 0x20, we use vlan id != 0 to check whether a vlan tag
has been offloaded, so vlan id 0 is not supported.
In revision 0x21, rx buffer descriptor adds two bits to indicate
whether one or more vlan tags have been offloaded, so vlan id 0
is valid now.
This patch seperates the handle for vlan id 0, add vlan id 0 support
for revision 0x21.
Fixes: ac73085dd8df ("net: hns3: Add STRP_TAGP field support for hardware revision 0x21") Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Zhongzhu Liu [Fri, 5 Oct 2018 17:03:28 +0000 (18:03 +0100)]
net: hns3: Add egress/ingress vlan filter for revision 0x21
In revision 0x21, hw supports both ingress and egress vlan filter.
This patch adds support for it.
Signed-off-by: Zhongzhu Liu <liuzhongzhu@huawei.com> Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jian Shen [Fri, 5 Oct 2018 17:03:27 +0000 (18:03 +0100)]
net: hns3: Drop depricated mta table support
For mta table support has been dropped, remove the code for mta table.
Signed-off-by: Zhongzhu Liu <liuzhongzhu@huawei.com> Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jian Shen [Fri, 5 Oct 2018 17:03:26 +0000 (18:03 +0100)]
net: hns3: Optimize for unicast mac vlan table
In previously implement for unicast mac vlan table, the space is
shared by all the functions, driver does nothing when the space is
exhausted. This patch preallocates the space of unicast mac vlan
table for each function by software. Each function can only use its
private space and available shared space, avoiding single function
exhausts too much space, and other functions are unable to add
unicast mac address.
Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jian Shen [Fri, 5 Oct 2018 17:03:25 +0000 (18:03 +0100)]
net: hns3: Clear mac vlan table entries when unload driver or function reset
In original codes, the mac vlan table entries are not cleared when
unload hns3 driver. The dirty mac vlan table entries will make the
result of looking up mac vlan table being unexpected.
When doing core reset or global reset, the firmware will clear all
the tables for driver, and driver shouldn't send any commands to
firmware during reset. But when doing function reset, the driver
needs to clear the tables itself.
This patch clears the mac vlan table entries for each client when
unload driver or reset.
Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jian Shen [Fri, 5 Oct 2018 17:03:24 +0000 (18:03 +0100)]
net: hns3: Remove the default mask configuration for mac vlan table
The default mask configuration has been done by firmware, so the driver
doesn't need to do it any more.
Signed-off-by: Zhongzhu Liu <liuzhongzhu@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Fri, 5 Oct 2018 17:01:19 +0000 (10:01 -0700)]
fib_tests: Add tests for invalid metric on route
Add ipv4 and ipv6 test cases with an invalid metrics option causing
ip_metrics_convert to fail. Tests clean up path during route add.
Also, add nodad to to ipv6 address add. When running ipv6_route_metrics
directly seeing an occasional failure on the "Using route with mtu metric"
test case.
Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 5 Oct 2018 16:17:50 +0000 (09:17 -0700)]
ipv6: do not leave garbage in rt->fib6_metrics
In case ip_fib_metrics_init() returns an error, we better
rewrite rt->fib6_metrics with &dst_default_metrics so that
we do not crash later in ip_fib_metrics_put()
Fixes: f907f93192ff ("net: common metrics init helper for FIB entries") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Wenwen Wang [Fri, 5 Oct 2018 15:59:36 +0000 (10:59 -0500)]
yam: fix a missing-check bug
In yam_ioctl(), the concrete ioctl command is firstly copied from the
user-space buffer 'ifr->ifr_data' to 'ioctl_cmd' and checked through the
following switch statement. If the command is not as expected, an error
code EINVAL is returned. In the following execution the buffer
'ifr->ifr_data' is copied again in the cases of the switch statement to
specific data structures according to what kind of ioctl command is
requested. However, after the second copy, no re-check is enforced on the
newly-copied command. Given that the buffer 'ifr->ifr_data' is in the user
space, a malicious user can race to change the command between the two
copies. This way, the attacker can inject inconsistent data and cause
undefined behavior.
This patch adds a re-check in each case of the switch statement if there is
a second copy in that case, to re-check whether the command obtained in the
second copy is the same as the one in the first copy. If not, an error code
EINVAL will be returned.
Signed-off-by: Wenwen Wang <wang6495@umn.edu> Signed-off-by: David S. Miller <davem@davemloft.net>
Shanthosh RK [Fri, 5 Oct 2018 15:27:48 +0000 (20:57 +0530)]
net: bpfilter: Fix type cast and pointer warnings
Fixes the following Sparse warnings:
net/bpfilter/bpfilter_kern.c:62:21: warning: cast removes address space
of expression
net/bpfilter/bpfilter_kern.c:101:49: warning: Using plain integer as
NULL pointer
Signed-off-by: Shanthosh RK <shanthosh.rk@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Wenwen Wang [Fri, 5 Oct 2018 13:48:27 +0000 (08:48 -0500)]
net: cxgb3_main: fix a missing-check bug
In cxgb_extension_ioctl(), the command of the ioctl is firstly copied from
the user-space buffer 'useraddr' to 'cmd' and checked through the
switch statement. If the command is not as expected, an error code
EOPNOTSUPP is returned. In the following execution, i.e., the cases of the
switch statement, the whole buffer of 'useraddr' is copied again to a
specific data structure, according to what kind of command is requested.
However, after the second copy, there is no re-check on the newly-copied
command. Given that the buffer 'useraddr' is in the user space, a malicious
user can race to change the command between the two copies. By doing so,
the attacker can supply malicious data to the kernel and cause undefined
behavior.
This patch adds a re-check in each case of the switch statement if there is
a second copy in that case, to re-check whether the command obtained in the
second copy is the same as the one in the first copy. If not, an error code
EINVAL is returned.
Signed-off-by: Wenwen Wang <wang6495@umn.edu> Signed-off-by: David S. Miller <davem@davemloft.net>
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Fix to truncate input on ALU operations in 32 bit mode, from Jann.
2) Fixes for cgroup local storage to reject reserved flags on element
update and rejection of map allocation with zero-sized value, from Roman.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
gigaset: asyncdata: mark expected switch fall-throughs
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.
Notice that in this particular case, I replaced the
" --v-- fall through --v-- " comment with a proper
"fall through", which is what GCC is expecting to find.
Addresses-Coverity-ID: 1364476 ("Missing break in switch")
Addresses-Coverity-ID: 1364477 ("Missing break in switch") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ganesh Goudar [Fri, 5 Oct 2018 09:34:45 +0000 (15:04 +0530)]
cxgb4: use FW_PORT_ACTION_L1_CFG32 for 32 bit capability
when 32 bit port capability is in use, use FW_PORT_ACTION_L1_CFG32
rather than FW_PORT_ACTION_L1_CFG.
Signed-off-by: Casey Leedom <leedom@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jann Horn [Fri, 5 Oct 2018 16:17:59 +0000 (18:17 +0200)]
bpf: 32-bit RSH verification must truncate input before the ALU op
When I wrote commit a0e6193c2fad ("bpf: fix 32-bit ALU op verification"), I
assumed that, in order to emulate 64-bit arithmetic with 32-bit logic, it
is sufficient to just truncate the output to 32 bits; and so I just moved
the register size coercion that used to be at the start of the function to
the end of the function.
That assumption is true for almost every op, but not for 32-bit right
shifts, because those can propagate information towards the least
significant bit. Fix it by always truncating inputs for 32-bit ops to 32
bits.
Also get rid of the coerce_reg_to_size() after the ALU op, since that has
no effect.
Fixes: a0e6193c2fad ("bpf: fix 32-bit ALU op verification") Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Merge tag 'iommu-fixes-v4.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Joerg writes:
"IOMMU Fix for Linux v4.19-rc6
One important fix:
- Fix a memory leak with AMD IOMMU when SME is active and a VM
has assigned devices. In that case the complete guest memory
will be leaked without this fix."
* tag 'iommu-fixes-v4.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/amd: Clear memory encryption mask from physical address
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Paolo writes:
"KVM changes for 4.19-rc7
x86 and PPC bugfixes, mostly introduced in 4.19-rc1."
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
kvm: nVMX: fix entry with pending interrupt if APICv is enabled
KVM: VMX: hide flexpriority from guest when disabled at the module level
KVM: VMX: check for existence of secondary exec controls before accessing
KVM: PPC: Book3S HV: Avoid crash from THP collapse during radix page fault
KVM: x86: fix L1TF's MMIO GFN calculation
tools/kvm_stat: cut down decimal places in update interval dialog
KVM: nVMX: Fix emulation of VM_ENTRY_LOAD_BNDCFGS
KVM: x86: Do not use kvm_x86_ops->mpx_supported() directly
KVM: nVMX: Do not expose MPX VMX controls when guest MPX disabled
KVM: x86: never trap MSR_KERNEL_GS_BASE
Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Herbert writes:
"Crypto Fixes for 4.19
This push fixes the following issues:
- Out-of-bound stack access in qat.
- Illegal schedule in mxs-dcp.
- Memory corruption in chelsio.
- Incorrect pointer computation in caam."
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: qat - Fix KASAN stack-out-of-bounds bug in adf_probe()
crypto: mxs-dcp - Fix wait logic on chan threads
crypto: chelsio - Fix memory corruption in DMA Mapped buffers.
crypto: caam/jr - fix ablkcipher_edesc pointer arithmetic
Merge tag '4.19-rc6-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6
Steve writes:
"SMB3 fixes
four small SMB3 fixes: one for stable, the others to address a more
recent regression"
* tag '4.19-rc6-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb3: fix lease break problem introduced by compounding
cifs: only wake the thread for the very last PDU in a compound
cifs: add a warning if we try to to dequeue a deleted mid
smb2: fix missing files in root share directory listing
Singh, Brijesh [Thu, 4 Oct 2018 21:40:23 +0000 (21:40 +0000)]
iommu/amd: Clear memory encryption mask from physical address
Boris Ostrovsky reported a memory leak with device passthrough when SME
is active.
The VFIO driver uses iommu_iova_to_phys() to get the physical address for
an iova. This physical address is later passed into vfio_unmap_unpin() to
unpin the memory. The vfio_unmap_unpin() uses pfn_valid() before unpinning
the memory. The pfn_valid() check was failing because encryption mask was
part of the physical address returned. This resulted in the memory not
being unpinned and therefore leaked after the guest terminates.
The memory encryption mask must be cleared from the physical address in
iommu_iova_to_phys().
Fixes: be30f50e64df ("iommu/amd: Allow the AMD IOMMU to work with memory encryption") Reported-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: <iommu@lists.linux-foundation.org> Cc: Borislav Petkov <bp@suse.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: kvm@vger.kernel.org Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: <stable@vger.kernel.org> # 4.14+ Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
When connecting SFP PHY to phylink use the detected interface.
Otherwise, the link fails to come up when the configured 'phy-mode'
differs from the SFP detected mode.
Move most of phylink_connect_phy() into __phylink_connect_phy(), and
leave phylink_connect_phy() as a wrapper. phylink_sfp_connect_phy() can
now pass the SFP detected PHY interface to __phylink_connect_phy().
This fixes 1GB SFP module link up on eth3 of the Macchiatobin board that
is configured in the DT to "2500base-x" phy-mode.
Fixes: 81485403f4ecc ("phylink: add phylink infrastructure") Suggested-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Baruch Siach <baruch@tkos.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
Davide Caratti [Wed, 3 Oct 2018 13:20:58 +0000 (15:20 +0200)]
be2net: don't flip hw_features when VXLANs are added/deleted
the be2net implementation of .ndo_tunnel_{add,del}() changes the value of
NETIF_F_GSO_UDP_TUNNEL bit in 'features' and 'hw_features', but it forgets
to call netdev_features_change(). Moreover, ethtool setting for that bit
can potentially be reverted after a tunnel is added or removed.
GSO already does software segmentation when 'hw_enc_features' is 0, even
if VXLAN offload is turned on. In addition, commit a47ec8daf138 ("benet:
stricter vxlan offloading check in be_features_check") avoids hardware
segmentation of non-VXLAN tunneled packets, or VXLAN packets having wrong
destination port. So, it's safe to avoid flipping the above feature on
addition/deletion of VXLAN tunnels.
Fixes: 100180c8bb78 ("be2net: Export tunnel offloads only when a VxLAN tunnel is created") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Bonzini [Fri, 5 Oct 2018 07:39:53 +0000 (09:39 +0200)]
Merge tag 'kvm-ppc-fixes-4.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into kvm-master
Third set of PPC KVM fixes for 4.19
One patch here, fixing a potential host crash introduced (or at least
exacerbated) by a previous fix for corruption relating to radix guest
page faults and THP operations.
Cong Wang [Tue, 2 Oct 2018 19:50:19 +0000 (12:50 -0700)]
net_sched: convert idrinfo->lock from spinlock to a mutex
In commit 8eef071c358f ("net_sched: change tcf_del_walker() to take idrinfo->lock")
we move fl_hw_destroy_tmplt() to a workqueue to avoid blocking
with the spinlock held. Unfortunately, this causes a lot of
troubles here:
1. tcf_chain_destroy() could be called right after we queue the work
but before the work runs. This is a use-after-free.
2. The chain refcnt is already 0, we can't even just hold it again.
We can check refcnt==1 but it is ugly.
3. The chain with refcnt 0 is still visible in its block, which means
it could be still found and used!
4. The block has a refcnt too, we can't hold it without introducing a
proper API either.
We can make it working but the end result is ugly. Instead of wasting
time on reviewing it, let's just convert the troubling spinlock to
a mutex, which allows us to use non-atomic allocations too.
Fixes: 8eef071c358f ("net_sched: change tcf_del_walker() to take idrinfo->lock") Reported-by: Ido Schimmel <idosch@idosch.org> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Vlad Buslov <vladbu@mellanox.com> Cc: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Tested-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Wed, 3 Oct 2018 22:33:12 +0000 (15:33 -0700)]
net/neigh: Extend dump filter to proxy neighbor dumps
Move the attribute parsing from neigh_dump_table to neigh_dump_info, and
pass the filter arguments down to neigh_dump_table in a new struct. Add
the filter option to proxy neigh dumps as well to make them consistent.
Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jianfeng Tan [Sat, 29 Sep 2018 15:41:27 +0000 (15:41 +0000)]
net/packet: fix packet drop as of virtio gso
When we use raw socket as the vhost backend, a packet from virito with
gso offloading information, cannot be sent out in later validaton at
xmit path, as we did not set correct skb->protocol which is further used
for looking up the gso function.
To fix this, we set this field according to virito hdr information.
Fixes: a7a35bbe1312cc ("virtio_net: use common code for virtio_net_hdr and skb GSO conversion") Signed-off-by: Jianfeng Tan <jianfeng.tan@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 5 Oct 2018 04:54:45 +0000 (21:54 -0700)]
Merge branch 'net-metrics-consolidate'
David Ahern says:
====================
net: Consolidate metrics handling for ipv4 and ipv6
As part of the IPv6 fib info refactoring, the intent was to make metrics
handling for ipv6 identical to ipv4. One oversight in ip6_dst_destroy
led to confusion and a couple of incomplete attempts at finding and
fixing the resulting memory leak which was ultimately resolved by f7bca8368265 ("ipv6: fix memory leak on dst->_metrics").
Refactor metrics hanlding make the code really identical for v4 and v6,
and add a few test cases.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Fri, 5 Oct 2018 03:07:55 +0000 (20:07 -0700)]
fib_tests: Add tests for metrics on routes
Add ipv4 and ipv6 test cases for metrics (mtu) when fib entries are
created. Can be used with kmemleak to see leaks with both fib entries
and dst_entry.
Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Fri, 5 Oct 2018 03:07:51 +0000 (20:07 -0700)]
net: common metrics init helper for FIB entries
Consolidate initialization of ipv4 and ipv6 metrics when fib entries
are created into a single helper, ip_fib_metrics_init, that handles
the call to ip_metrics_convert.
If no metrics are defined for the fib entry, then the metrics is set
to dst_default_metrics.
Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: dsa: b53: Keep CPU port as tagged in all VLANs
Commit a49062fedac3 ("net: dsa: b53: Stop using dev->cpu_port
incorrectly") was a bit too trigger happy in removing the CPU port from
the VLAN membership because we rely on DSA to program the CPU port VLAN,
which it does, except it does not bother itself with tagged/untagged and
just usese untagged.
Having the CPU port "follow" the user ports tagged/untagged is not great
and does not allow for properly differentiating, so keep the CPU port
tagged in all VLANs.
Reported-by: Gerhard Wiesinger <lists@wiesinger.com> Fixes: a49062fedac3 ("net: dsa: b53: Stop using dev->cpu_port incorrectly") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 5 Oct 2018 04:41:16 +0000 (21:41 -0700)]
Merge branch 'bnxt_en-fixes'
Michael Chan says:
====================
bnxt_en: Misc. bug fixes.
4 small bug fixes related to setting firmware message enables bits, possible
memory leak when probe fails, and ring accouting when RDMA driver is loaded.
Please queue these for -stable as well. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
bnxt_en: get the reduced max_irqs by the ones used by RDMA
When getting the max rings supported, get the reduced max_irqs
by the ones used by RDMA.
If the number MSIX is the limiting factor, this bug may cause the
max ring count to be higher than it should be when RDMA driver is
loaded and may result in ring allocation failures.
Fixes: 5a38b3b35249 ("bnxt_en: Do not modify max IRQ count after RDMA driver requests/frees IRQs.") Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Venkat Duvvuru [Fri, 5 Oct 2018 04:26:02 +0000 (00:26 -0400)]
bnxt_en: free hwrm resources, if driver probe fails.
When the driver probe fails, all the resources that were allocated prior
to the failure must be freed. However, hwrm dma response memory is not
getting freed.
This patch fixes the problem described above.
Fixes: dfe047769e71 ("bnxt_en: New Broadcom ethernet driver.") Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
bnxt_en: Fix enables field in HWRM_QUEUE_COS2BW_CFG request
In HWRM_QUEUE_COS2BW_CFG request, enables field should have the bits
set only for the queue ids which are having the valid parameters.
This causes firmware to return error when the TC to hardware CoS queue
mapping is not 1:1 during DCBNL ETS setup.
Fixes: ffbe6db0c9b4 ("bnxt_en: Add TC to hardware QoS queue mapping logic.") Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Fri, 5 Oct 2018 04:26:00 +0000 (00:26 -0400)]
bnxt_en: Fix VNIC reservations on the PF.
The enables bit for VNIC was set wrong when calling the HWRM_FUNC_CFG
firmware call to reserve VNICs. This has the effect that the firmware
will keep a large number of VNICs for the PF, and having very few for
VFs. DPDK driver running on the VFs, which requires more VNICs, may not
work properly as a result.
Fixes: fa47877e6282 ("bnxt_en: Implement new method to reserve rings.") Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Merge tag 'drm-fixes-2018-10-05' of git://anongit.freedesktop.org/drm/drm
Dave writes:
"amdgpu and two core fixes
Two fixes for amdgpu:
one corrects a use of process->mm
one fix for display code race condition that can result in a crash
Two core fixes:
One for a use-after-free in the leasing code
One for a cma/fbdev crash."
* tag 'drm-fixes-2018-10-05' of git://anongit.freedesktop.org/drm/drm:
drm/amdkfd: Fix incorrect use of process->mm
drm/amd/display: Signal hw_done() after waiting for flip_done()
drm/cma-helper: Fix crash in fbdev error path
drm: fix use-after-free read in drm_mode_create_lease_ioctl()
Ido Schimmel [Mon, 1 Oct 2018 09:21:59 +0000 (12:21 +0300)]
team: Forbid enslaving team device to itself
team's ndo_add_slave() acquires 'team->lock' and later tries to open the
newly enslaved device via dev_open(). This emits a 'NETDEV_UP' event
that causes the VLAN driver to add VLAN 0 on the team device. team's
ndo_vlan_rx_add_vid() will also try to acquire 'team->lock' and
deadlock.
Fix this by checking early at the enslavement function that a team
device is not being enslaved to itself.
A similar check was added to the bond driver in commit a059e6578f9c
("bonding: disallow enslaving a bond to itself").
WARNING: possible recursive locking detected
4.18.0-rc7+ #176 Not tainted
--------------------------------------------
syz-executor4/6391 is trying to acquire lock:
(____ptrval____) (&team->lock){+.+.}, at: team_vlan_rx_add_vid+0x3b/0x1e0 drivers/net/team/team.c:1868
but task is already holding lock:
(____ptrval____) (&team->lock){+.+.}, at: team_add_slave+0xdb/0x1c30 drivers/net/team/team.c:1947
other info that might help us debug this:
Possible unsafe locking scenario:
Fixes: 5217d87602f3 ("net: introduce vlan_vid_[add/del] and use them instead of direct [add/kill]_vid ndo calls") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reported-and-tested-by: syzbot+bd051aba086537515cdb@syzkaller.appspotmail.com Signed-off-by: David S. Miller <davem@davemloft.net>
Hangbin Liu [Sat, 29 Sep 2018 15:06:29 +0000 (23:06 +0800)]
geneve: allow to clear ttl inherit
As Michal remaind, we should allow to clear ttl inherit. Then we will
have three states:
1. set the flag, and do ttl inherit.
2. do not set the flag, use configured ttl value, or default ttl (0) if
not set.
3. disable ttl inherit, use previous configured ttl value, or default ttl (0).
Fixes: 26fb58432a333 ("geneve: add ttl inherit support") CC: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
tc: Add support for configuring the taprio scheduler
This traffic scheduler allows traffic classes states (transmission
allowed/not allowed, in the simplest case) to be scheduled, according
to a pre-generated time sequence. This is the basis of the IEEE
802.1Qbv specification.
The configuration format is similar to mqprio. The main difference is
the presence of a schedule, built by multiple "sched-entry"
definitions, each entry has the following format:
sched-entry <CMD> <GATE MASK> <INTERVAL>
The only supported <CMD> is "S", which means "SetGateStates",
following the IEEE 802.1Qbv-2015 definition (Table 8-6). <GATE MASK>
is a bitmask where each bit is a associated with a traffic class, so
bit 0 (the least significant bit) being "on" means that traffic class
0 is "active" for that schedule entry. <INTERVAL> is a time duration
in nanoseconds that specifies for how long that state defined by <CMD>
and <GATE MASK> should be held before moving to the next entry.
This schedule is circular, that is, after the last entry is executed
it starts from the first one, indefinitely.
The other parameters can be defined as follows:
- base-time: specifies the instant when the schedule starts, if
'base-time' is a time in the past, the schedule will start at
base-time + (N * cycle-time)
where N is the smallest integer so the resulting time is greater
than "now", and "cycle-time" is the sum of all the intervals of the
entries in the schedule;
- clockid: specifies the reference clock to be used;
The parameters should be similar to what the IEEE 802.1Q family of
specification defines.
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This patchset adds support for 3 generic and 1 driver-specific devlink
parameters. Add documentation for these configuration parameters.
Also, this patchset adds support to return proper error code if
HWRM_NVM_GET/SET_VARIABLE commands return error code
HWRM_ERR_CODE_RESOURCE_ACCESS_DENIED.
v3->v4:
-Remove extra definition of NVM_OFF_HW_TC_OFFLOAD from bnxt_devlink.h
-Remove type information for generic parameters from
devlink-params-bnxt.txt
v2->v3:
-Remove description of generic parameters from devlink-params-bnxt.txt
v1->v2:
-Remove hw_tc_offload parameter.
-Update all patches with Cc of MAINTAINERS.
-Add more description in commit message for device specific parameter.
-Add a new Documentation/networking/devlink-params.txt with some
generic devlink parameters information.
-Add a new Documentation/networking/devlink-params-bnxt.txt with devlink
parameters information that are supported by bnxt_en driver.
====================
Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds a new file to add information about configuration
parameters that are supported by bnxt_en driver via devlink.
Cc: "David S. Miller" <davem@davemloft.net> Cc: Jonathan Corbet <corbet@lwn.net> Cc: linux-doc@vger.kernel.org Cc: Jiri Pirko <jiri@mellanox.com> Cc: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds a new file to add information about some of the
generic configuration parameters set via devlink.
Cc: "David S. Miller" <davem@davemloft.net> Cc: Jonathan Corbet <corbet@lwn.net> Cc: linux-doc@vger.kernel.org Cc: Jiri Pirko <jiri@mellanox.com> Cc: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>