Thomas Petazzoni [Sun, 13 Aug 2017 21:14:58 +0000 (23:14 +0200)]
sparc: kernel/pcic: silence gcc 7.x warning in pcibios_fixup_bus()
When building the kernel for Sparc using gcc 7.x, the build fails
with:
arch/sparc/kernel/pcic.c: In function ‘pcibios_fixup_bus’:
arch/sparc/kernel/pcic.c:647:8: error: ‘cmd’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
cmd |= PCI_COMMAND_IO;
^~
I.e, the code assumes that pcic_read_config() will always initialize
cmd. But it's not the case. Looking at pcic_read_config(), if
bus->number is != 0 or if the size is not one of 1, 2 or 4, *val will
not be initialized.
As a simple fix, we initialize cmd to zero at the beginning of
pcibios_fixup_bus.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Gao Feng [Fri, 18 Aug 2017 07:23:24 +0000 (15:23 +0800)]
net: sched: Add the invalid handle check in qdisc_class_find
Add the invalid handle "0" check to avoid unnecessary search, because
the qdisc uses the skb->priority as the handle value to look up, and
it is "0" usually.
Signed-off-by: Gao Feng <gfree.wind@vip.163.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Paul Maloy [Mon, 21 Aug 2017 15:59:30 +0000 (17:59 +0200)]
tipc: don't reset stale broadcast send link
When the broadcast send link after 100 attempts has failed to
transfer a packet to all peers, we consider it stale, and reset
it. Thereafter it needs to re-synchronize with the peers, something
currently done by just resetting and re-establishing all links to
all peers. This has turned out to be overkill, with potentially
unwanted consequences for the remaining cluster.
A closer analysis reveals that this can be done much simpler. When
this kind of failure happens, for reasons that may lie outside the
TIPC protocol, it is typically only one peer which is failing to
receive and acknowledge packets. It is hence sufficient to identify
and reset the links only to that peer to resolve the situation, without
having to reset the broadcast link at all. This solution entails a much
lower risk of negative consequences for the own node as well as for
the overall cluster.
We implement this change in this commit.
Reviewed-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Mon, 21 Aug 2017 20:30:36 +0000 (13:30 -0700)]
Merge tag 'arc-4.13-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc
Pull ARC fixes from Vineet Gupta:
- PAE40 related updates
- SLC errata for region ops
- intc line masking by default
* tag 'arc-4.13-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
arc: Mask individual IRQ lines during core INTC init
ARCv2: PAE40: set MSB even if !CONFIG_ARC_HAS_PAE40 but PAE exists in SoC
ARCv2: PAE40: Explicitly set MSB counterpart of SLC region ops addresses
ARC: dma: implement dma_unmap_page and sg variant
ARCv2: SLC: Make sure busy bit is set properly for region ops
ARC: [plat-sim] Include this platform unconditionally
ARC: [plat-axs10x]: prepare dts files for enabling PAE40 on axs103
ARC: defconfig: Cleanup from old Kconfig options
2) Fix timer access to freed object in dccp, from Eric Dumazet.
3) Use kmalloc_array() in ptr_ring to avoid overflow cases which are
triggerable by userspace. Also from Eric Dumazet.
4) Fix infinite loop in unmapping cleanup of nfp driver, from Colin Ian
King.
5) Correct datagram peek handling of empty SKBs, from Matthew Dawson.
6) Fix use after free in TIPC, from Eric Dumazet.
7) When replacing a route in ipv6 we need to reset the round robin
pointer, from Wei Wang.
8) Fix bug in pci_find_pcie_root_port() which was unearthed by the
relaxed ordering changes, from Thierry Redding. I made sure to get
an explicit ACK from Bjorn this time around :-)
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (27 commits)
ipv6: repair fib6 tree in failure case
net_sched: fix order of queue length updates in qdisc_replace()
tools lib bpf: improve warning
switchdev: documentation: minor typo fixes
bpf, doc: also add s390x as arch to sysctl description
net: sched: fix NULL pointer dereference when action calls some targets
rxrpc: Fix oops when discarding a preallocated service call
irda: do not leak initialized list.dev to userspace
net/mlx4_core: Enable 4K UAR if SRIOV module parameter is not enabled
PCI: Allow PCI express root ports to find themselves
tcp: when rearming RTO, if RTO time is in past then fire RTO ASAP
net: check and errout if res->fi is NULL when RTM_F_FIB_MATCH is set
ipv6: reset fn->rr_ptr when replacing route
sctp: fully initialize the IPv6 address in sctp_v6_to_addr()
tipc: fix use-after-free
tun: handle register_netdevice() failures properly
datagram: When peeking datagrams with offset < 0 don't skip empty skbs
bpf, doc: improve sysctl knob description
netxen: fix incorrect loop counter decrement
nfp: fix infinite loop on umapping cleanup
...
Oleg Nesterov [Mon, 21 Aug 2017 15:35:02 +0000 (17:35 +0200)]
pids: make task_tgid_nr_ns() safe
This was reported many times, and this was even mentioned in commit 05bd4ec0ab9f ("pids: refactor vnr/nr_ns helpers to make them safe") but
somehow nobody bothered to fix the obvious problem: task_tgid_nr_ns() is
not safe because task->group_leader points to nowhere after the exiting
task passes exit_notify(), rcu_read_lock() can not help.
We really need to change __unhash_process() to nullify group_leader,
parent, and real_parent, but this needs some cleanups. Until then we
can turn task_tgid_nr_ns() into another user of __task_pid_nr_ns() and
fix the problem.
David Lamparter [Fri, 18 Aug 2017 12:31:35 +0000 (14:31 +0200)]
net: check type when freeing metadata dst
Commit c0d65610445f ("net: store port/representator id in metadata_dst")
added a new type field to metadata_dst, but metadata_dst_free() wasn't
updated to check it before freeing the METADATA_IP_TUNNEL specific dst
cache entry.
This is not currently causing problems since it's far enough back in the
struct to be zeroed for the only other type currently in existance
(METADATA_HW_PORT_MUX), but nevertheless it's not correct.
Fixes: c0d65610445f ("net: store port/representator id in metadata_dst") Signed-off-by: David Lamparter <equinox@diac24.net> Cc: Jakub Kicinski <jakub.kicinski@netronome.com> Cc: Sridhar Samudrala <sridhar.samudrala@intel.com> Cc: Simon Horman <horms@verge.net.au> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 17 Aug 2017 19:17:20 +0000 (12:17 -0700)]
net: ipv6: put host and anycast routes on device with address
One nagging difference between ipv4 and ipv6 is host routes for ipv6
addresses are installed using the loopback device or VRF / L3 Master
device. e.g.,
2001:db8:1::/120 dev veth0 proto kernel metric 256 pref medium
local 2001:db8:1::1 dev lo table local proto kernel metric 0 pref medium
Using the loopback device is convenient -- necessary for local tx, but
has some nasty side effects, most notably setting the 'lo' device down
causes all host routes for all local IPv6 address to be removed from the
FIB and completely breaks IPv6 networking across all interfaces.
This patch puts FIB entries for IPv6 routes against the device. This
simplifies the routes in the FIB, for example by making dst->dev and
rt6i_idev->dev the same (a future patch can look at removing the device
reference taken for rt6i_idev for FIB entries).
When copies are made on FIB lookups, the cloned route has dst->dev
set to loopback (or the L3 master device). This is needed for the
local Tx of packets to local addresses.
With fib entries allocated against the real network device, the addrconf
code that reinserts host routes on admin up of 'lo' is no longer needed.
Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David Daney [Fri, 18 Aug 2017 23:40:33 +0000 (16:40 -0700)]
MIPS,bpf: Cache value of BPF_OP(insn->code) in eBPF JIT.
The code looks a little cleaner if we replace BPF_OP(insn->code) with
the local variable bpf_op. Caching the value this way also saves 300
bytes (about 1%) in the code size of the JIT.
Signed-off-by: David Daney <david.daney@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Bhumika Goyal [Mon, 21 Aug 2017 11:43:10 +0000 (17:13 +0530)]
qlogic: make device_attribute const
Make these const as they are only passed as an argument to the
function device_create_file and device_remove_file and the corresponding
arguments are of type const.
Done using Coccinelle
Signed-off-by: Bhumika Goyal <bhumirks@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
1) Support RX checksum with IPsec crypto offload for esp4/esp6.
From Ilan Tayari.
2) Fixup IPv6 checksums when doing IPsec crypto offload.
From Yossi Kuperman.
3) Auto load the xfrom offload modules if a user installs
a SA that requests IPsec offload. From Ilan Tayari.
4) Clear RX offload informations in xfrm_input to not
confuse the TX path with stale offload informations.
From Ilan Tayari.
5) Allow IPsec GSO for local sockets if the crypto operation
will be offloaded.
6) Support setting of an output mark to the xfrm_state.
This mark can be used to to do the tunnel route lookup.
From Lorenzo Colitti.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Current max_register setting breaks reading nvram on certain chips and
also reading the standard registers on RX8130 where register map starts
at 0x10.
Rick Farrington [Sat, 19 Aug 2017 01:21:49 +0000 (18:21 -0700)]
liquidio: fix use of pf in pass-through mode in a virtual machine
Fix problem when PF is used in pass-through mode in a VM (w/embedded f/w).
If host error reading PF num from CN23XX_PCIE_SRIOV_FDL reg,
try to retrieve PF num from SLI_PKT(0)_INPUT_CONTROL (initialized by f/w).
Signed-off-by: Rick Farrington <ricardo.farrington@cavium.com> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Wang [Sat, 19 Aug 2017 00:14:49 +0000 (17:14 -0700)]
ipv6: repair fib6 tree in failure case
In fib6_add(), it is possible that fib6_add_1() picks an intermediate
node and sets the node's fn->leaf to NULL in order to add this new
route. However, if fib6_add_rt2node() fails to add the new
route for some reason, fn->leaf will be left as NULL and could
potentially cause crash when fn->leaf is accessed in fib6_locate().
This patch makes sure fib6_repair_tree() is called to properly repair
fn->leaf in the above failure case.
net_sched: fix order of queue length updates in qdisc_replace()
This important to call qdisc_tree_reduce_backlog() after changing queue
length. Parent qdisc should deactivate class in ->qlen_notify() called from
qdisc_tree_reduce_backlog() but this happens only if qdisc->q.qlen in zero.
Missed class deactivations leads to crashes/warnings at picking packets
from empty qdisc and corrupting state at reactivating this class in future.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Fixes: a0f7d0a652b6 ("net_sched: introduce qdisc_replace() helper") Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: ibm: emac: Fix some error handling path in 'emac_probe()'
If 'irq_of_parse_and_map()' or 'of_address_to_resource()' fail, 'err' is
known to be 0 at this point.
So return -ENODEV instead in the first case and use 'of_iomap()' instead of
the equivalent 'of_address_to_resource()/ioremap()' combinaison in the 2nd
case.
Doing so, the 'rsrc_regs' field of the 'emac_instance struct' becomes
redundant and is removed.
While at it, turn a 'err != 0' test into an equivalent 'err' to be more
consistent.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
Ganesh Goudar [Sun, 20 Aug 2017 08:45:51 +0000 (14:15 +0530)]
cxgb4/cxgbvf: Handle 32-bit fw port capabilities
Implement new 32-bit Firmware Port Capabilities in order to
handle new speeds which couldn't be represented in the old 16-bit
Firmware Port Capabilities values.
Based on the original work of Casey Leedom <leedom@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Leblond [Sun, 20 Aug 2017 19:48:14 +0000 (21:48 +0200)]
tools lib bpf: improve warning
Signed-off-by: Eric Leblond <eric@regit.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Sun, 20 Aug 2017 23:48:12 +0000 (01:48 +0200)]
bpf: fix double free from dev_map_notification()
In the current code, dev_map_free() can still race with dev_map_notification().
In dev_map_free(), we remove dtab from the list of dtabs after we purged
all entries from it. However, we don't do xchg() with NULL or the like,
so the entry at that point is still pointing to the device. If a unregister
notification comes in at the same time, we therefore risk a double-free,
since the pointer is still present in the map, and then pushed again to
__dev_map_entry_free().
All this is completely unnecessary. Just remove the dtab from the list
right before the synchronize_rcu(), so all outstanding readers from the
notifier list have finished by then, thus we don't need to deal with this
corner case anymore and also wouldn't need to nullify dev entires. This is
fine because we iterate over the map releasing all entries and therefore
dev references anyway.
Fixes: e62b945d0e85 ("bpf: devmap fix mutex in rcu critical section") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Sun, 20 Aug 2017 22:26:03 +0000 (00:26 +0200)]
bpf, doc: also add s390x as arch to sysctl description
Looks like this was accidentally missed, so still add s390x
as supported eBPF JIT arch to bpf_jit_enable.
Fixes: c42299f11e48 ("bpf: Update sysctl documentation to list all supported architectures") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Sun, 20 Aug 2017 20:26:27 +0000 (13:26 -0700)]
Sanitize 'move_pages()' permission checks
The 'move_paghes()' system call was introduced long long ago with the
same permission checks as for sending a signal (except using
CAP_SYS_NICE instead of CAP_SYS_KILL for the overriding capability).
That turns out to not be a great choice - while the system call really
only moves physical page allocations around (and you need other
capabilities to do a lot of it), you can check the return value to map
out some the virtual address choices and defeat ASLR of a binary that
still shares your uid.
So change the access checks to the more common 'ptrace_may_access()'
model instead.
This tightens the access checks for the uid, and also effectively
changes the CAP_SYS_NICE check to CAP_SYS_PTRACE, but it's unlikely that
anybody really _uses_ this legacy system call any more (we hav ebetter
NUMA placement models these days), so I expect nobody to notice.
Famous last words.
Reported-by: Otto Ebeling <otto.ebeling@iki.fi> Acked-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Willy Tarreau <w@1wt.eu> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sun, 20 Aug 2017 16:36:52 +0000 (09:36 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"Another pile of small fixes and updates for x86:
- Plug a hole in the SMAP implementation which misses to clear AC on
NMI entry
- Fix the norandmaps/ADDR_NO_RANDOMIZE logic so the command line
parameter works correctly again
- Use the proper accessor in the startup64 code for next_early_pgt to
prevent accessing of invalid addresses and faulting in the early
boot code.
- Prevent CPU hotplug lock recursion in the MTRR code
- Unbreak CPU0 hotplugging
- Rename overly long CPUID bits which got introduced in this cycle
- Two commits which mark data 'const' and restrict the scope of data
and functions to file scope by making them 'static'"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: Constify attribute_group structures
x86/boot/64/clang: Use fixup_pointer() to access 'next_early_pgt'
x86/elf: Remove the unnecessary ADDR_NO_RANDOMIZE checks
x86: Fix norandmaps/ADDR_NO_RANDOMIZE
x86/mtrr: Prevent CPU hotplug lock recursion
x86: Mark various structures and functions as 'static'
x86/cpufeature, kvm/svm: Rename (shorten) the new "virtualized VMSAVE/VMLOAD" CPUID flag
x86/smpboot: Unbreak CPU0 hotplug
x86/asm/64: Clear AC on NMI entries
Linus Torvalds [Sun, 20 Aug 2017 16:20:57 +0000 (09:20 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Thomas Gleixner:
"Two fixes for the perf subsystem:
- Fix an inconsistency of RDPMC mm struct tagging across exec() which
causes RDPMC to fault.
- Correct the timestamp mechanics across IOC_DISABLE/ENABLE which
causes incorrect timestamps and total time calculations"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/core: Fix time on IOC_ENABLE
perf/x86: Fix RDPMC vs. mm_struct tracking
Linus Torvalds [Sun, 20 Aug 2017 16:07:56 +0000 (09:07 -0700)]
Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
"A pile of smallish changes all over the place:
- Add a missing ISB in the GIC V1 driver
- Remove an ACPI version check in the GIC V3 ITS driver
- Add the missing irq_pm_shutdown function for BRCMSTB-L2 to avoid
spurious wakeups
- Remove the artifical limitation of ITS instances to the number of
NUMA nodes which prevents utilizing the ITS hardware correctly
- Prevent a infinite parsing loop in the GIC-V3 ITS/MSI code
- Honour the force affinity argument in the GIC-V3 driver which is
required to make perf work correctly
- Correctly report allocation failures in GIC-V2/V3 to avoid using
half allocated and initialized interrupts.
- Fixup checks against nr_cpu_ids in the generic IPI code"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq/ipi: Fixup checks against nr_cpu_ids
genirq: Restore trigger settings in irq_modify_status()
MAINTAINERS: Remove Jason Cooper's irqchip git tree
irqchip/gic-v3-its-platform-msi: Fix msi-parent parsing loop
irqchip/gic-v3-its: Allow GIC ITS number more than MAX_NUMNODES
irqchip: brcmstb-l2: Define an irq_pm_shutdown function
irqchip/gic: Ensure we have an ISB between ack and ->handle_irq
irqchip/gic-v3-its: Remove ACPICA version check for ACPI NUMA
irqchip/gic-v3: Honor forced affinity setting
irqchip/gic-v3: Report failures in gic_irq_domain_alloc
irqchip/gic-v2: Report failures in gic_irq_domain_alloc
irqchip/atmel-aic: Remove root argument from ->fixup() prototype
irqchip/atmel-aic: Fix unbalanced refcount in aic_common_rtc_irq_fixup()
irqchip/atmel-aic: Fix unbalanced of_node_put() in aic_common_irq_fixup()
Linus Torvalds [Sun, 20 Aug 2017 15:54:30 +0000 (08:54 -0700)]
Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull watchdog fix from Thomas Gleixner:
"A fix for the hardlockup watchdog to prevent false positives with
extreme Turbo-Modes which make the perf/NMI watchdog fire faster than
the hrtimer which is used to verify.
Slightly larger than the minimal fix, which just would increase the
hrtimer frequency, but comes with extra overhead of more watchdog
timer interrupts and thread wakeups for all users.
With this change we restrict the overhead to the extreme Turbo-Mode
systems"
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
kernel/watchdog: Prevent false positives with turbo modes
Add outbound_pci_buffer_overflow to ethtool output for monitoring the
number of packets that were dropped due to lack of PCIe buffers on
receive path from NIC port toward the host(s).
This counter is valid only in case that tx_overflow_buffer_pkt is
supported in MCAM enhanced features.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Gal Pressman [Thu, 15 Jun 2017 15:29:32 +0000 (18:29 +0300)]
net/mlx5e: Add PCIe outbound stalls counters
outbound_pci_stalled_rd - The percentage of time within the last second
that the NIC had outbound non-posted read requests but could not perform
the operation due to insufficient non-posted credits.
outbound_pci_stalled_wr - The percentage of time within the
last second that the NIC had outbound posted writes requests but could
not perform the operation due to insufficient posted credits.
outbound_pci_stalled_rd_events - The number of events where
outbound_pci_stalled_rd was above the threshold.
outbound_pci_stalled_wr_events - The number of events where
outbound_pci_stalled_wr was above the threshold.
Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Shalom Lagziel [Sun, 28 May 2017 13:40:24 +0000 (16:40 +0300)]
net/mlx5e: IPoIB, Add support for get_link_ksettings in ethtool
Add support for "ethtool DEVNAME" over ipoib ports,
Display standard port information for IPoIB netdevices using ethtool
For example:
$ ethtool ib2
> Settings for ib2:
Supported ports: [ ]
Supported link modes: Not reported
Supported pause frame use: No
Supports auto-negotiation: No
Advertised link modes: Not reported
Advertised pause frame use: No
Advertised auto-negotiation: No
Speed: 100000Mb/s
Duplex: Full
Port: Other
PHYAD: 0
Transceiver: internal
Auto-negotiation: off
Link detected: yes
net/mlx5e: IPoIB, Fix driver name retrieved by ethtool
Printing an enhanced IPoIB device information using
"ethtool -i DEVNAME", prints the low level driver name: mlx5_core.
This commit changes the name to mlx5_core [ib_ipoib], to include the
ipoib device driver infromation.
Eran Ben Elisha [Sun, 5 Feb 2017 15:57:40 +0000 (17:57 +0200)]
net/mlx5e: Send PAOS command on interface up/down
Upon interface up/down, driver will send PAOS (Ports Administrative and
Operational Status Register) in order to inform the Firmware on the
desired status of the port by the driver.
Since now we might change physical link status on mlx5e_open/close,
logical VF representor should not use mlx5e_open/close ndos as is, and
should call the logical version mlx5e_open/closed_locked.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Alexey Dobriyan [Sat, 19 Aug 2017 09:57:51 +0000 (12:57 +0300)]
genirq/ipi: Fixup checks against nr_cpu_ids
Valid CPU ids are [0, nr_cpu_ids-1] inclusive.
Fixes: dcaca254b804 ("genirq: Implement ipi_send_mask/single()") Fixes: 696edd275734 ("genirq: Add a new function to get IPI reverse mapping") Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20170819095751.GB27864@avx2
Daniel Borkmann [Sat, 19 Aug 2017 01:12:46 +0000 (03:12 +0200)]
bpf: inline map in map lookup functions for array and htab
Avoid two successive functions calls for the map in map lookup, first
is the bpf_map_lookup_elem() helper call, and second the callback via
map->ops->map_lookup_elem() to get to the map in map implementation.
Implementation inlines array and htab flavor for map in map lookups.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Sat, 19 Aug 2017 01:12:45 +0000 (03:12 +0200)]
bpf: make htab inlining more robust wrt assumptions
Commit 39e28bbe219f ("bpf: inline htab_map_lookup_elem()") was
making the assumption that a direct call emission to the function
__htab_map_lookup_elem() will always work out for JITs.
This is currently true since all JITs we have are for 64 bit archs,
but in case of 32 bit JITs like upcoming arm32, we get a NULL pointer
dereference when executing the call to __htab_map_lookup_elem()
since passed arguments are of a different size (due to pointer args)
than what we do out of BPF. Guard and thus limit this for now for
the current 64 bit JITs only.
Reported-by: Shubham Bansal <illusionist.neo@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Martin KaFai Lau [Fri, 18 Aug 2017 18:28:01 +0000 (11:28 -0700)]
bpf: Allow numa selection in INNER_LRU_HASH_PREALLOC test of map_perf_test
This patch makes the needed changes to allow each process of
the INNER_LRU_HASH_PREALLOC test to provide its numa node id
when creating the lru map.
Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Martin KaFai Lau [Fri, 18 Aug 2017 18:28:00 +0000 (11:28 -0700)]
bpf: Allow selecting numa node during map creation
The current map creation API does not allow to provide the numa-node
preference. The memory usually comes from where the map-creation-process
is running. The performance is not ideal if the bpf_prog is known to
always run in a numa node different from the map-creation-process.
One of the use case is sharding on CPU to different LRU maps (i.e.
an array of LRU maps). Here is the test result of map_perf_test on
the INNER_LRU_HASH_PREALLOC test if we force the lru map used by
CPU0 to be allocated from a remote numa node:
[ The machine has 20 cores. CPU0-9 at node 0. CPU10-19 at node 1 ]
># taskset -c 10 ./map_perf_test 512 8 12600008000000
5:inner_lru_hash_map_perf pre-alloc 1628380 events per sec
4:inner_lru_hash_map_perf pre-alloc 1626396 events per sec
3:inner_lru_hash_map_perf pre-alloc 1626144 events per sec
6:inner_lru_hash_map_perf pre-alloc 1621657 events per sec
2:inner_lru_hash_map_perf pre-alloc 1621534 events per sec
1:inner_lru_hash_map_perf pre-alloc 1620292 events per sec
7:inner_lru_hash_map_perf pre-alloc 1613305 events per sec
0:inner_lru_hash_map_perf pre-alloc 1239150 events per sec #<<<
After specifying numa node:
># taskset -c 10 ./map_perf_test 512 8 12600008000000
5:inner_lru_hash_map_perf pre-alloc 1629627 events per sec
3:inner_lru_hash_map_perf pre-alloc 1628057 events per sec
1:inner_lru_hash_map_perf pre-alloc 1623054 events per sec
6:inner_lru_hash_map_perf pre-alloc 1616033 events per sec
2:inner_lru_hash_map_perf pre-alloc 1614630 events per sec
4:inner_lru_hash_map_perf pre-alloc 1612651 events per sec
7:inner_lru_hash_map_perf pre-alloc 1609337 events per sec
0:inner_lru_hash_map_perf pre-alloc 1619340 events per sec #<<<
This patch adds one field, numa_node, to the bpf_attr. Since numa node 0
is a valid node, a new flag BPF_F_NUMA_NODE is also added. The numa_node
field is honored if and only if the BPF_F_NUMA_NODE flag is set.
Numa node selection is not supported for percpu map.
This patch does not change all the kmalloc. F.e.
'htab = kzalloc()' is not changed since the object
is small enough to stay in the cache.
Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 20 Aug 2017 00:13:41 +0000 (17:13 -0700)]
Merge branch 'net-const-eisa_device_id'
Arvind Yadav says:
====================
constify net eisa_device_id
eisa_device_id are not supposed to change at runtime. All functions
working with eisa_device_id provided by <linux/eisa.h> work with
const eisa_device_id. So mark the non-const structs as const.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Arvind Yadav [Sat, 19 Aug 2017 06:53:34 +0000 (12:23 +0530)]
net: defxx: constify eisa_device_id
eisa_device_id are not supposed to change at runtime. All functions
working with eisa_device_id provided by <linux/eisa.h> work with
const eisa_device_id. So mark the non-const structs as const.
Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com> Reviewed-by: Maciej W. Rozycki <macro@linux-mips.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Arvind Yadav [Sat, 19 Aug 2017 06:53:14 +0000 (12:23 +0530)]
net: hp100: constify eisa_device_id
eisa_device_id are not supposed to change at runtime. All functions
working with eisa_device_id provided by <linux/eisa.h> work with
const eisa_device_id. So mark the non-const structs as const.
Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Arvind Yadav [Sat, 19 Aug 2017 06:52:44 +0000 (12:22 +0530)]
net: de4x5: constify eisa_device_id
eisa_device_id are not supposed to change at runtime. All functions
working with eisa_device_id provided by <linux/eisa.h> work with
const eisa_device_id. So mark the non-const structs as const.
Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Arvind Yadav [Sat, 19 Aug 2017 06:52:13 +0000 (12:22 +0530)]
net: 3c59x: constify eisa_device_id
eisa_device_id are not supposed to change at runtime. All functions
working with eisa_device_id provided by <linux/eisa.h> work with
const eisa_device_id. So mark the non-const structs as const.
Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Arvind Yadav [Sat, 19 Aug 2017 06:51:43 +0000 (12:21 +0530)]
net: 3c509: constify eisa_device_id
eisa_device_id are not supposed to change at runtime. All functions
working with eisa_device_id provided by <linux/eisa.h> work with
const eisa_device_id. So mark the non-const structs as const.
Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
nfp: add basic ethtool callbacks to representors
This set extends the basic ethtool functionality to representor
netdevs. I start with providing link state via ethtool and then
move on to functions such as driver information, statistics and
FW log dump. The series contains a number of clean ups to the
ethtool stats code too, some of the logic is simplified by making
better use of the nfp_port abstraction. The stats we expose on
representors are only the PCIe and MAC port statistics firmware
maintains for us.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Fri, 18 Aug 2017 22:48:22 +0000 (15:48 -0700)]
nfp: don't reuse pointers in ring dumping
We were reusing skb pointer when reading page frag, since ring
entries contain a union of a skb and frag pointer. This can
be confusing to people reading the code. Refactor the code
to read frag pointer directly.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Fri, 18 Aug 2017 22:48:21 +0000 (15:48 -0700)]
nfp: fix copy paste in names and messages regarding vNICs
Data and control vNICs currently use the same area name and
error message. This could lead to confusion. Make sure
the error message says "ctrl" in case of control and the
data area is called "nfp.bar0".
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Fri, 18 Aug 2017 22:48:20 +0000 (15:48 -0700)]
nfp: add ethtool statistics for representors
Representors may be associated with both VFs or more importantly
with physical ports. Allow vNIC and MAC statistics to be read
with ethtool -S on representors. In case of vNICs we reuse
the vNIC statistic helper, we just need to swap RX and TX to
give statistics the "switch perspective."
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Fri, 18 Aug 2017 22:48:19 +0000 (15:48 -0700)]
nfp: add pointer to vNIC config memory to nfp_port structure
Simplify the statistics handling code by keeping pointer to vNIC's
config memory in nfp_port. Note that this is referring to the
representor side of vNICs, vNIC side has the pointer in nfp_net.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Fri, 18 Aug 2017 22:48:18 +0000 (15:48 -0700)]
nfp: report MAC statistics in ethtool
Add reporting of MAC statistics in ethtool. MAC statistics
are read out from the MAC IP and accumulated by application
FW, therefore their presence depends on the application FW.
Add missing defines and string names for the statistics and
dump them in ethtool -S.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Fri, 18 Aug 2017 22:48:17 +0000 (15:48 -0700)]
nfp: store pointer to MAC statistics in nfp_port
Store pointer to device memory containing MAC statistics
in nfp_port. This simplifies representor code and will
be used to dump those statistics in ethtool as well.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Fri, 18 Aug 2017 22:48:16 +0000 (15:48 -0700)]
nfp: split software and hardware vNIC statistics
In preparation for reporting vNIC HW stats on representors
split handling of the SW and HW stats in ethtool -S.
Representors don't have SW stats (since vNIC is assigned
to the VM).
Remove the questionable defines which assume nn variable
exists in the scope.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Fri, 18 Aug 2017 22:48:15 +0000 (15:48 -0700)]
nfp: add helper for printing ethtool strings
Add a helper for printing ethtool strings and advancing the
pointer correctly.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Fri, 18 Aug 2017 22:48:14 +0000 (15:48 -0700)]
nfp: don't report standard netdev statistics in ethtool
We have been recently called out as a bad example for reporting
standard netdev statistics as part of ethtool. Fix that :)
Removing standard statistics allows us to simplify the structure
holding definitions since we no longer have to mux different types
of statistics.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Fri, 18 Aug 2017 22:48:13 +0000 (15:48 -0700)]
nfp: allow retreiving management FW logs on representors
Users should be able to dump the management FW logs on any
of the driver's netdevs. Make the code only depend on the
nfp_app and share it between vNICs and representors.
Storing the dump flag is simply dropped for now, since we
only support the argument being set to 0.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Fri, 18 Aug 2017 22:48:12 +0000 (15:48 -0700)]
nfp: provide ethtool_drvinfo on representors
Extend representors' ethtool ops to show basic info like firmware
version, driver version, and driver name.
While at it don't set drvinfo.n_stats and drvinfo.regdump_len,
core will invoke appropriate handlers to get those.
A helper is added to turn a netdev into nfp_app for convenience.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Fri, 18 Aug 2017 22:48:11 +0000 (15:48 -0700)]
nfp: link basic ethtool ops to representors
Start linking ethtool ops to representors. Begin by adding
a separate ops structure and providing link state. Next
patches will convert appropriate functions to only use nfp_port,
which will make them reusable on representors.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ideally, attribute groups would contain const attributes
but there are too many places that do modifications of list
during startup (in other code) to allow that.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: don't decrement kobj reference count on init failure
If kobject_init_and_add failed, then the failure path would
decrement the reference count of the queue kobject whose reference
count was already zero.
Fixes: c51c8a153113 ("bql: Byte queue limits") Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The following updates are included in this driver update series:
- Set the MDIO mode to clause 45 for the 10GBase-T configuration
- Set the MII control width to 8-bits for speeds less than 1Gbps
- Fix an issue to related to module removal when the devices are up
- Fix ethtool statistics related to packet counting of TSO packets
- Add support for device renaming
- Add additional dynamic debug output for the PCS window calculation
- Optimize reading of DMA channel interrupt enablement register
- Add additional dynamic debug output about the hardware features
- Add per queue Tx and Rx ethtool statistics
- Add a macro to clear ethtool_link_ksettings modes
- Convert the driver to use the ethtool_link_ksettings
- Add support for VXLAN offload capabilities
- Add additional ethtool statistics related to VXLAN
This patch series is based on net-next.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Lendacky, Thomas [Fri, 18 Aug 2017 14:04:04 +0000 (09:04 -0500)]
amd-xgbe: Add support for VXLAN offload capabilities
The hardware has the capability to perform checksum offload support
(both Tx and Rx) and TSO support for VXLAN packets. Add the support
required to enable this.
The hardware can only support a single VXLAN port for offload. If more
than one VXLAN port is added then the offload capabilities have to be
disabled and can no longer be advertised.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Lendacky, Thomas [Fri, 18 Aug 2017 14:03:55 +0000 (09:03 -0500)]
amd-xgbe: Convert to using the new link mode settings
Convert from using the old u32 supported, advertising, etc. link settings
to the new link mode settings that support bit positions / settings
greater than 32 bits.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Currently whenever the driver needs to enable or disable interrupts for
a DMA channel it reads the interrupt enable register (IER), updates the
value and then writes the new value back to the IER. Since the hardware
does not change the IER, software can track this value and elimiate the
need to read it each time.
Add the IER value to the channel related data structure and use that as
the base for enabling and disabling interrupts, thus removing the need
for the MMIO read.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>