Martin KaFai Lau [Fri, 11 Nov 2016 18:55:11 +0000 (10:55 -0800)]
bpf: Add tests for the LRU bpf_htab
This patch has some unit tests and a test_lru_dist.
The test_lru_dist reads in the numeric keys from a file.
The files used here are generated by a modified fio-genzipf tool
originated from the fio test suit. The sample data file can be
found here: https://github.com/iamkafai/bpf-lru
The zipf.* data files have 100k numeric keys and the key is also
ranged from 1 to 100k.
The test_lru_dist outputs the number of unique keys (nr_unique).
F.e. The following means, 61239 of them is unique out of 100k keys.
nr_misses means it cannot be found in the LRU map, so nr_misses
must be >= nr_unique. test_lru_dist also simulates a perfect LRU
map as a comparison:
Martin KaFai Lau [Fri, 11 Nov 2016 18:55:07 +0000 (10:55 -0800)]
bpf: Add percpu LRU list
Instead of having a common LRU list, this patch allows a
percpu LRU list which can be selected by specifying a map
attribute. The map attribute will be added in the later
patch.
While the common use case for LRU is #reads >> #updates,
percpu LRU list allows bpf prog to absorb unusual #updates
under pathological case (e.g. external traffic facing machine which
could be under attack).
Each percpu LRU is isolated from each other. The LRU nodes (including
free nodes) cannot be moved across different LRU Lists.
Here are the update performance comparison between
common LRU list and percpu LRU list (the test code is
at the last patch):
[root@kerneltest003.31.prn1 ~]# for i in 1 4 8; do echo -n "$i cpus: "; \
./map_perf_test 16 $i | awk '{r += $3}END{print r " updates"}'; done
1 cpus: 2934082 updates
4 cpus: 7391434 updates
8 cpus: 6500576 updates
[root@kerneltest003.31.prn1 ~]# for i in 1 4 8; do echo -n "$i cpus: "; \
./map_perf_test 32 $i | awk '{r += $3}END{printr " updates"}'; done
1 cpus: 2896553 updates
4 cpus: 9766395 updates
8 cpus: 17460553 updates
Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Martin KaFai Lau [Fri, 11 Nov 2016 18:55:06 +0000 (10:55 -0800)]
bpf: LRU List
Introduce bpf_lru_list which will provide LRU capability to
the bpf_htab in the later patch.
* General Thoughts:
1. Target use case. Read is more often than update.
(i.e. bpf_lookup_elem() is more often than bpf_update_elem()).
If bpf_prog does a bpf_lookup_elem() first and then an in-place
update, it still counts as a read operation to the LRU list concern.
2. It may be useful to think of it as a LRU cache
3. Optimize the read case
3.1 No lock in read case
3.2 The LRU maintenance is only done during bpf_update_elem()
4. If there is a percpu LRU list, it will lose the system-wise LRU
property. A completely isolated percpu LRU list has the best
performance but the memory utilization is not ideal considering
the work load may be imbalance.
5. Hence, this patch starts the LRU implementation with a global LRU
list with batched operations before accessing the global LRU list.
As a LRU cache, #read >> #update/#insert operations, it will work well.
6. There is a local list (for each cpu) which is named
'struct bpf_lru_locallist'. This local list is not used to sort
the LRU property. Instead, the local list is to batch enough
operations before acquiring the lock of the global LRU list. More
details on this later.
7. In the later patch, it allows a percpu LRU list by specifying a
map-attribute for scalability reason and for use cases that need to
prepare for the worst (and pathological) case like DoS attack.
The percpu LRU list is completely isolated from each other and the
LRU nodes (including free nodes) cannot be moved across the list. The
following description is for the global LRU list but mostly applicable
to the percpu LRU list also.
* Global LRU List:
1. It has three sub-lists: active-list, inactive-list and free-list.
2. The two list idea, active and inactive, is borrowed from the
page cache.
3. All nodes are pre-allocated and all sit at the free-list (of the
global LRU list) at the beginning. The pre-allocation reasoning
is similar to the existing BPF_MAP_TYPE_HASH. However,
opting-out prealloc (BPF_F_NO_PREALLOC) is not supported in
the LRU map.
* Active/Inactive List (of the global LRU list):
1. The active list, as its name says it, maintains the active set of
the nodes. We can think of it as the working set or more frequently
accessed nodes. The access frequency is approximated by a ref-bit.
The ref-bit is set during the bpf_lookup_elem().
2. The inactive list, as its name also says it, maintains a less
active set of nodes. They are the candidates to be removed
from the bpf_htab when we are running out of free nodes.
3. The ordering of these two lists is acting as a rough clock.
The tail of the inactive list is the older nodes and
should be released first if the bpf_htab needs free element.
* Rotating the Active/Inactive List (of the global LRU list):
1. It is the basic operation to maintain the LRU property of
the global list.
2. The active list is only rotated when the inactive list is running
low. This idea is similar to the current page cache.
Inactive running low is currently defined as
"# of inactive < # of active".
3. The active list rotation always starts from the tail. It moves
node without ref-bit set to the head of the inactive list.
It moves node with ref-bit set back to the head of the active
list and then clears its ref-bit.
4. The inactive rotation is pretty simply.
It walks the inactive list and moves the nodes back to the head of
active list if its ref-bit is set. The ref-bit is cleared after moving
to the active list.
If the node does not have ref-bit set, it just leave it as it is
because it is already in the inactive list.
* Shrinking the Inactive List (of the global LRU list):
1. Shrinking is the operation to get free nodes when the bpf_htab is
full.
2. It usually only shrinks the inactive list to get free nodes.
3. During shrinking, it will walk the inactive list from the tail,
delete the nodes without ref-bit set from bpf_htab.
4. If no free node found after step (3), it will forcefully get
one node from the tail of inactive or active list. Forcefully is
in the sense that it ignores the ref-bit.
* Local List:
1. Each CPU has a 'struct bpf_lru_locallist'. The purpose is to
batch enough operations before acquiring the lock of the
global LRU.
2. A local list has two sub-lists, free-list and pending-list.
3. During bpf_update_elem(), it will try to get from the free-list
of (the current CPU local list).
4. If the local free-list is empty, it will acquire from the
global LRU list. The global LRU list can either satisfy it
by its global free-list or by shrinking the global inactive
list. Since we have acquired the global LRU list lock,
it will try to get at most LOCAL_FREE_TARGET elements
to the local free list.
5. When a new element is added to the bpf_htab, it will
first sit at the pending-list (of the local list) first.
The pending-list will be flushed to the global LRU list
when it needs to acquire free nodes from the global list
next time.
* Lock Consideration:
The LRU list has a lock (lru_lock). Each bucket of htab has a
lock (buck_lock). If both locks need to be acquired together,
the lock order is always lru_lock -> buck_lock and this only
happens in the bpf_lru_list.c logic.
In hashtab.c, both locks are not acquired together (i.e. one
lock is always released first before acquiring another lock).
Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
1) Fix off by one wrt. indexing when dumping /proc/net/route entries,
from Alexander Duyck.
2) Fix lockdep splats in iwlwifi, from Johannes Berg.
3) Cure panic when inserting certain netfilter rules when NFT_SET_HASH
is disabled, from Liping Zhang.
4) Memory leak when nft_expr_clone() fails, also from Liping Zhang.
5) Disable UFO when path will apply IPSEC tranformations, from Jakub
Sitnicki.
6) Don't bogusly double cwnd in dctcp module, from Florian Westphal.
7) skb_checksum_help() should never actually use the value "0" for the
resulting checksum, that has a special meaning, use CSUM_MANGLED_0
instead. From Eric Dumazet.
8) Per-tx/rx queue statistic strings are wrong in qed driver, fix from
Yuval MIntz.
9) Fix SCTP reference counting of associations and transports in
sctp_diag. From Xin Long.
10) When we hit ip6tunnel_xmit() we could have come from an ipv4 path in
a previous layer or similar, so explicitly clear the ipv6 control
block in the skb. From Eli Cooper.
11) Fix bogus sleeping inside of inet_wait_for_connect(), from WANG
Cong.
12) Correct deivce ID of T6 adapter in cxgb4 driver, from Hariprasad
Shenai.
13) Fix potential access past the end of the skb page frag array in
tcp_sendmsg(). From Eric Dumazet.
14) 'skb' can legitimately be NULL in inet{,6}_exact_dif_match(). Fix
from David Ahern.
15) Don't return an error in tcp_sendmsg() if we wronte any bytes
successfully, from Eric Dumazet.
16) Extraneous unlocks in netlink_diag_dump(), we removed the locking
but forgot to purge these unlock calls. From Eric Dumazet.
17) Fix memory leak in error path of __genl_register_family(). We leak
the attrbuf, from WANG Cong.
18) cgroupstats netlink policy table is mis-sized, from WANG Cong.
19) Several XDP bug fixes in mlx5, from Saeed Mahameed.
20) Fix several device refcount leaks in network drivers, from Johan
Hovold.
21) icmp6_send() should use skb dst device not skb->dev to determine L3
routing domain. From David Ahern.
22) ip_vs_genl_family sets maxattr incorrectly, from WANG Cong.
23) We leak new macvlan port in some cases of maclan_common_netlink()
errors. Fix from Gao Feng.
24) Similar to the icmp6_send() fix, icmp_route_lookup() should
determine L3 routing domain using skb_dst(skb)->dev not skb->dev.
Also from David Ahern.
25) Several fixes for route offloading and FIB notification handling in
mlxsw driver, from Jiri Pirko.
26) Properly cap __skb_flow_dissect()'s return value, from Eric Dumazet.
27) Fix long standing regression in ipv4 redirect handling, wrt.
validating the new neighbour's reachability. From Stephen Suryaputra
Lin.
28) If sk_filter() trims the packet excessively, handle it reasonably in
tcp input instead of exploding. From Eric Dumazet.
29) Fix handling of napi hash state when copying channels in sfc driver,
from Bert Kenward.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (121 commits)
mlxsw: spectrum_router: Flush FIB tables during fini
net: stmmac: Fix lack of link transition for fixed PHYs
sctp: change sk state only when it has assocs in sctp_shutdown
bnx2: Wait for in-flight DMA to complete at probe stage
Revert "bnx2: Reset device during driver initialization"
ps3_gelic: fix spelling mistake in debug message
net: ethernet: ixp4xx_eth: fix spelling mistake in debug message
ibmvnic: Fix size of debugfs name buffer
ibmvnic: Unmap ibmvnic_statistics structure
sfc: clear napi_hash state when copying channels
mlxsw: spectrum_router: Correctly dump neighbour activity
mlxsw: spectrum: Fix refcount bug on span entries
bnxt_en: Fix VF virtual link state.
bnxt_en: Fix ring arithmetic in bnxt_setup_tc().
Revert "include/uapi/linux/atm_zatm.h: include linux/time.h"
tcp: take care of truncations done by sk_filter()
ipv4: use new_gw for redirect neigh lookup
r8152: Fix error path in open function
net: bpqether.h: remove if_ether.h guard
net: __skb_flow_dissect() must cap its return value
...
* tag 'rtc-4.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux:
rtc: omap: prevent disabling of clock/module during suspend
rtc: omap: Fix selecting external osc
rtc: cmos: Don't enable interrupts in the middle of the interrupt handler
rtc: cmos: remove all __exit_p annotations
rtc: asm9260: fix module autoload
Chris Metcalf [Mon, 7 Nov 2016 19:32:02 +0000 (14:32 -0500)]
tile: handle __ro_after_init like parisc does
The tile architecture already marks RO_DATA as read-only in
the kernel, so grouping RO_AFTER_INIT_DATA with RO_DATA, as is
done by default, means the kernel faults in init when it tries
to write to RO_AFTER_INIT_DATA. For now, just arrange that
__ro_after_init is handled like __write_once, i.e. __read_mostly.
Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Chris Metcalf <cmetcalf@mellanox.com>
Ido Schimmel [Mon, 14 Nov 2016 10:26:32 +0000 (11:26 +0100)]
mlxsw: spectrum_router: Flush FIB tables during fini
Since commit d3a93b4c07d9 ("mlxsw: spectrum_router: Use FIB notifications
instead of switchdev calls") we reflect to the device the entire FIB
table and not only FIBs that point to netdevs created by the driver.
During module removal, FIBs of the second type are removed following
NETDEV_UNREGISTER events sent. The other FIBs are still present in both
the driver's cache and the device's table.
Fix this by iterating over all the FIB tables in the device and flush
them. There's no need to take locks, as we're the only writer.
Fixes: d3a93b4c07d9 ("mlxsw: spectrum_router: Use FIB notifications instead of switchdev calls") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Mon, 14 Nov 2016 01:50:35 +0000 (17:50 -0800)]
net: stmmac: Fix lack of link transition for fixed PHYs
Commit 8554f8afbe65 ("stmmac: fix adjust link call in case of a switch
is attached") added some logic to avoid polling the fixed PHY and
therefore invoking the adjust_link callback more than once, since this
is a fixed PHY and link events won't be generated.
This works fine the first time, because we start with phydev->irq =
PHY_POLL, so we call adjust_link, then we set phydev->irq =
PHY_IGNORE_INTERRUPT and we stop polling the PHY.
Now, if we called ndo_close(), which calls both phy_stop() and does an
explicit netif_carrier_off(), we end up with a link down. Upon calling
ndo_open() again, despite starting the PHY state machine, we have
PHY_IGNORE_INTERRUPT set, and we generate no link event at all, so the
link is permanently down.
Fixes: 8554f8afbe65 ("stmmac: fix adjust link call in case of a switch is attached") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Sun, 13 Nov 2016 13:44:37 +0000 (21:44 +0800)]
sctp: change sk state only when it has assocs in sctp_shutdown
Now when users shutdown a sock with SEND_SHUTDOWN in sctp, even if
this sock has no connection (assoc), sk state would be changed to
SCTP_SS_CLOSING, which is not as we expect.
Besides, after that if users try to listen on this sock, kernel
could even panic when it dereference sctp_sk(sk)->bind_hash in
sctp_inet_listen, as bind_hash is null when sock has no assoc.
This patch is to move sk state change after checking sk assocs
is not empty, and also merge these two if() conditions and reduce
indent level.
Fixes: da9339344cda ("sctp: sctp should change socket state when shutdown is received") Reported-by: Andrey Konovalov <andreyknvl@google.com> Tested-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 14 Nov 2016 21:20:54 +0000 (16:20 -0500)]
Merge branch 'bnx2-kdump-fix'
Baoquan He says:
====================
bnx2: Wait for in-flight DMA to complete at probe stage
This is v2 post.
In commit 28190b3 ("bnx2: Reset device during driver initialization"),
firmware requesting code was moved from open stage to probe stage.
The reason is in kdump kernel hardware iommu need device be reset in
driver probe stage, otherwise those in-flight DMA from 1st kernel
will continue going and look up into the newly created io-page tables.
However bnx2 chip resetting involves firmware requesting issue, that
need be done in open stage.
Michale Chan suggested we can just wait for the old in-flight DMA to
complete at probe stage, then though without device resetting, we
don't need to worry the old in-flight DMA could continue looking up
the newly created io-page tables.
v1->v2:
Michael suggested to wait for the in-flight DMA to complete at probe
stage. So give up the old method of trying to reset chip at probe
stage, take the new way accordingly.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Baoquan He [Sun, 13 Nov 2016 05:01:33 +0000 (13:01 +0800)]
bnx2: Wait for in-flight DMA to complete at probe stage
In-flight DMA from 1st kernel could continue going in kdump kernel.
New io-page table has been created before bnx2 does reset at open stage.
We have to wait for the in-flight DMA to complete to avoid it look up
into the newly created io-page table at probe stage.
Suggested-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Baoquan He <bhe@redhat.com> Acked-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
When people build bnx2 driver into kernel, it will fail to detect
and load firmware because firmware is contained in initramfs and
initramfs has not been uncompressed yet during do_initcalls. So
revert commit 28190b3 and work out a new way in the later patch.
Signed-off-by: Baoquan He <bhe@redhat.com> Acked-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Philippe Reynes [Sun, 13 Nov 2016 17:53:28 +0000 (18:53 +0100)]
net: atheros: atl2: use new api ethtool_{get|set}_link_ksettings
The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.
The previous implementation of set_settings was modifying
the value of advertising, but with the new API, it's not
possible. The structure ethtool_link_ksettings is defined
as const.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Philippe Reynes [Sun, 13 Nov 2016 17:35:14 +0000 (18:35 +0100)]
net: atheros: atl1: use new api ethtool_{get|set}_link_ksettings
The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.
The previous implementation of set_settings was modifying
the value of advertising, but with the new API, it's not
possible. The structure ethtool_link_ksettings is defined
as const.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
WANG Cong [Fri, 11 Nov 2016 18:20:50 +0000 (10:20 -0800)]
net: fix sleeping for sk_wait_event()
Similar to commit 7d72068941d0 ("inet: fix sleeping inside inet_wait_for_connect()"),
sk_wait_event() needs to fix too, because release_sock() is blocking,
it changes the process state back to running after sleep, which breaks
the previous prepare_to_wait().
Switch to the new wait API.
Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Mon, 14 Nov 2016 17:46:08 +0000 (09:46 -0800)]
ASoC: lpass-platform: fix uninitialized variable
In commit 50980eeb4eef ("ASoC: lpass-platform: Fix broken pcm data
usage") the stream specific information initialization was broken, with
the dma channel information not being initialized if there was no
alloc_dma_channel() helper function.
Before that, the DMA channel number was implicitly initialized to zero
because the backing store was allocated with devm_kzalloc(). When the
init code was rewritten, that implicit initialization was lost, and gcc
rightfully complains about an uninitialized variable being used.
Cc: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Cc: Mark Brown <broonie@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It turns out that this flushes things much too aggressiverly, and causes
lines to break up when the system logger races with new continuation
lines being printed.
There's a pending patch to make printk() flushing much more
straightforward, but it's too invasive for 4.9, so in the meantime let's
just not make the system message logging flush continuation lines.
They'll be flushed by the final newline anyway.
Suggested-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This file was converted to a separate module at commit 21a4256907be
("gp8psk: Fix DVB frontend attach"), because the DVB attach routines
require it to work. However, I forgot to copy the MODULE_foo() macros
from the original module, causing this warning:
WARNING: modpost: missing MODULE_LICENSE() in drivers/media/dvb-frontends/gp8psk-fe.o
Linus Torvalds [Mon, 14 Nov 2016 16:34:56 +0000 (08:34 -0800)]
Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fix from Ingo Molnar:
"This fixes a genirq regression that resulted in the Intel/Broxton
pinctrl/GPIO driver (and possibly others) spewing warnings"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq: Use irq type from irqdata instead of irqdesc
Linus Torvalds [Mon, 14 Nov 2016 16:30:06 +0000 (08:30 -0800)]
Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"An uncore PMU driver hardware enablement change for Intel SkyLake
uncore PMUs (Skylake Y, U, H and S platforms), plus a number of
tooling fixes for the histogram handling/displaying code"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel/uncore: Add more Intel uncore IMC PCI IDs for SkyLake
perf hists: Fix column length on --hierarchy
perf hists browser: Fix column indentation on --hierarchy
perf hists browser: Show folded sign properly on --hierarchy
perf hists browser: Fix indentation of folded sign on --hierarchy
perf hist browser: Fix hierarchy column counts
====================
Netfilter updates for net-next
The following patchset contains a second batch of Netfilter updates for
your net-next tree. This includes a rework of the core hook
infrastructure that improves Netfilter performance by ~15% according to
synthetic benchmarks. Then, a large batch with ipset updates, including
a new hash:ipmac set type, via Jozsef Kadlecsik. This also includes a
couple of assorted updates.
Regarding the core hook infrastructure rework to improve performance,
using this simple drop-all packets ruleset from ingress:
nft add table netdev x
nft add chain netdev x y { type filter hook ingress device eth0 priority 0\; }
nft add rule netdev x y drop
And generating traffic through Jesper Brouer's
samples/pktgen/pktgen_bench_xmit_mode_netif_receive.sh script using -i
option. perf report shows nf_tables calls in its top 10:
I'm measuring here an improvement of ~15% in performance with this
patchset, so we got +2.5Mpps more. I have used my old laptop Intel(R)
Core(TM) i5-3320M CPU @ 2.60GHz 4-cores.
This rework contains more specifically, in strict order, these patches:
1) Remove compile-time debugging from core.
2) Remove obsolete comments that predate the rcu era. These days it is
well known that a Netfilter hook always runs under rcu_read_lock().
3) Remove threshold handling, this is only used by br_netfilter too.
We already have specific code to handle this from br_netfilter,
so remove this code from the core path.
4) Deprecate NF_STOP, as this is only used by br_netfilter.
5) Place nf_state_hook pointer into xt_action_param structure, so
this structure fits into one single cacheline according to pahole.
This also implicit affects nftables since it also relies on the
xt_action_param structure.
6) Move state->hook_entries into nf_queue entry. The hook_entries
pointer is only required by nf_queue(), so we can store this in the
queue entry instead.
7) use switch() statement to handle verdict cases.
8) Remove hook_entries field from nf_hook_state structure, this is only
required by nf_queue, so store it in nf_queue_entry structure.
9) Merge nf_iterate() into nf_hook_slow() that results in a much more
simple and readable function.
10) Handle NF_REPEAT away from the core, so far the only client is
nf_conntrack_in() and we can restart the packet processing using a
simple goto to jump back there when the TCP requires it.
This update required a second pass to fix fallout, fix from
Arnd Bergmann.
11) Set random seed from nft_hash when no seed is specified from
userspace.
12) Simplify nf_tables expression registration, in a much smarter way
to save lots of boiler plate code, by Liping Zhang.
14) Missing CONFIG_NF_SOCKET_IPV4 dependency for udp4_lib_lookup, due
to recent generalization of the socket infrastructure, from Arnd
Bergmann.
15) Then, the ipset batch from Jozsef, he describes it as it follows:
* Cleanup: Remove extra whitespaces in ip_set.h
* Cleanup: Mark some of the helpers arguments as const in ip_set.h
* Cleanup: Group counter helper functions together in ip_set.h
* struct ip_set_skbinfo is introduced instead of open coded fields
in skbinfo get/init helper funcions.
* Use kmalloc() in comment extension helper instead of kzalloc()
because it is unnecessary to zero out the area just before
explicit initialization.
* Cleanup: Split extensions into separate files.
* Cleanup: Separate memsize calculation code into dedicated function.
* Cleanup: group ip_set_put_extensions() and ip_set_get_extensions()
together.
* Add element count to hash headers by Eric B Munson.
* Add element count to all set types header for uniform output
across all set types.
* Count non-static extension memory into memsize calculation for
userspace.
* Cleanup: Remove redundant mtype_expire() arguments, because
they can be get from other parameters.
* Cleanup: Simplify mtype_expire() for hash types by removing
one level of intendation.
* Make NLEN compile time constant for hash types.
* Make sure element data size is a multiple of u32 for the hash set
types.
* Optimize hash creation routine, exit as early as possible.
* Make struct htype per ipset family so nets array becomes fixed size
and thus simplifies the struct htype allocation.
* Collapse same condition body into a single one.
* Fix reported memory size for hash:* types, base hash bucket structure
was not taken into account.
* hash:ipmac type support added to ipset by Tomasz Chilinski.
* Use setup_timer() and mod_timer() instead of init_timer()
by Muhammad Falak R Wani, individually for the set type families.
16) Remove useless connlabel field in struct netns_ct, patch from
Florian Westphal.
17) xt_find_table_lock() doesn't return ERR_PTR() anymore, so simplify
{ip,ip6,arp}tables code that uses this.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 14 Nov 2016 03:36:42 +0000 (22:36 -0500)]
Merge branch 'dsa-mv88e6xxx-post-refactor-fixes'
Andrew Lunn says:
====================
dsa: mv88e6xxx: Fixes for port refactoring
The patches which refactored setting up the switch MACs introduced a
couple of regressions. The RGMII delays for a port can be set using
other mechanism than just phy-mode. Don't overwrite the delays unless
explicitly asked to. This broke my Armada 370 RD. Also, the mv88e6351
family supports setting RGMII delays, but is missing the necessary
entries in the ops structures to allow this.
These fixes are to patches currently in net-next. No need for stable
etc.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Thu, 10 Nov 2016 14:44:01 +0000 (15:44 +0100)]
net: dsa: mv88e6xxx: 6351 family also has RGMII delays
The recent refactoring of setting the MAC configuration broke setting
of RGMII delays, via the phy-mode, on the 6351 family. Add the missing
ops to the structure.
Fixes: 7340e5ecdbb1 ("net: dsa: mv88e6xxx: setup port's MAC") Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Thu, 10 Nov 2016 14:44:00 +0000 (15:44 +0100)]
net: dsa: mv88e6xxx: Don't modify RGMII delays when not RGMII mode
The RGMII modes delays can be set via strapping pings or EEPROM.
Don't change them unless explicitly asked to change them. The recent
refactoring of setting the MAC configuration changed this behaviours,
in that CPU and DSA ports have any pre-configured RGMII delays
removed. This breaks the Armada 370RD board. Restore the previous
behaviour, in that RGMII delays are only applied/removed when
explicitly asked for via an phy-mode being PHY_INTERFACE_MODE_RGMII*
Fixes: 7340e5ecdbb1 ("net: dsa: mv88e6xxx: setup port's MAC") Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Fri, 14 Oct 2016 07:34:18 +0000 (10:34 +0300)]
ntb_perf: potential info leak in debugfs
This is a static checker warning, not something I'm desperately
concerned about. But snprintf() returns the number of bytes that
would have been copied if there were space. We really care about the
number of bytes that actually were copied so we should use scnprintf()
instead.
It probably won't overrun, and in that case we may as well just use
sprintf() but these sorts of things make static checkers and code
reviewers happier.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
Dave Jiang [Thu, 27 Oct 2016 18:06:44 +0000 (11:06 -0700)]
ntb: ntb_hw_intel: init peer_addr in struct intel_ntb_dev
The peer_addr member of intel_ntb_dev is not set, therefore when
acquiring ntb_peer_db and ntb_peer_spad we only get the offset rather
than the actual physical address. Adding fix to correct that.
Signed-off-by: Dave Jiang <dave.jiang@intel.com> Acked-by: Allen Hubbe <Allen.Hubbe@emc.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
schedule_timeout_* takes a timeout in jiffies but the code currently is
passing in a constant which makes this timeout HZ dependent, so pass it
through msecs_to_jiffies() to fix this up.
Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org> Acked-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
ntb_transport: make DMA_OUT_RESOURCE_TO HZ independent
schedule_timeout_* takes a timeout in jiffies but the code currently is
passing in a constant which makes this timeout HZ dependent, so pass it
through msecs_to_jiffies() to fix this up.
Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org> Signed-off-by: Jon Mason <jdmason@kudzu.us>
Julia Lawall [Fri, 11 Nov 2016 12:32:38 +0000 (13:32 +0100)]
netfilter: x_tables: simplify IS_ERR_OR_NULL to NULL test
Since commit 8f960f4e430e ("netfilter: don't use
mutex_lock_interruptible()"), the function xt_find_table_lock can only
return NULL on an error. Simplify the call sites and update the
comment before the function.
The semantic patch that change the code is as follows:
(http://coccinelle.lip6.fr/)
// <smpl>
@@
expression t,e;
@@
t = \(xt_find_table_lock(...)\|
try_then_request_module(xt_find_table_lock(...),...)\)
... when != t=e
- ! IS_ERR_OR_NULL(t)
+ t
@@
expression t,e;
@@
t = \(xt_find_table_lock(...)\|
try_then_request_module(xt_find_table_lock(...),...)\)
... when != t=e
- IS_ERR_OR_NULL(t)
+ !t
@@
expression t,e,e1;
@@
t = \(xt_find_table_lock(...)\|
try_then_request_module(xt_find_table_lock(...),...)\)
... when != t=e
?- t ? PTR_ERR(t) : e1
+ e1
... when any
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Philippe Reynes [Sat, 12 Nov 2016 22:16:51 +0000 (23:16 +0100)]
net: atheros: atl1e: use new api ethtool_{get|set}_link_ksettings
The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.
The previous implementation of set_settings was modifying
the value of advertising, but with the new API, it's not
possible. The structure ethtool_link_ksettings is defined
as const.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Bert Kenward [Fri, 11 Nov 2016 15:56:51 +0000 (15:56 +0000)]
sfc: clear napi_hash state when copying channels
efx_copy_channel() doesn't correctly clear the napi_hash related state.
This means that when napi_hash_add is called for that channel nothing is
done, and we are left with a copy of the napi_hash_node from the old
channel. When we later call napi_hash_del() on this channel we have a
stale napi_hash_node.
Corruption is only seen when there are multiple entries in one of the
napi_hash lists. This is made more likely by having a very large number
of channels. Testing was carried out with 512 channels - 32 channels on
each of 16 ports.
This failure typically appears as protection faults within napi_by_id()
or napi_hash_add(). efx_copy_channel() is only used when tx or rx ring
sizes are changed (ethtool -G).
Fixes: a0d97e85fac2 ("sfc: Add support for busy polling") Signed-off-by: Bert Kenward <bkenward@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ivan Khoronzhuk [Fri, 11 Nov 2016 14:10:47 +0000 (16:10 +0200)]
net: ethernet: ti: davinci_cpdma: don't stop ctlr if it was stopped
No need to stop ctlr if it was already stopped. It can cause timeout
warns. Steps:
- ifconfig eth0 down
- ethtool -l eth0 rx 8 tx 8
- ethtool -l eth0 rx 1 tx 1
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
The dma ctlr is reseted to 0 while cpdma soft reset, thus cpdma ctlr
cannot be configured after cpdma is stopped. So restoring content
of cpdma ctlr while off/on procedure is needed. The cpdma ctlr off/on
procedure is present while interface down/up and while changing number
of channels with ethtool. In order to not restore content in many
places, move it to cpdma_ctlr_start().
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Sun, 13 Nov 2016 18:28:53 +0000 (10:28 -0800)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
"ARM fixes. There are a couple pending x86 patches but they'll have to
wait for next week"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: arm/arm64: vgic: Kick VCPUs when queueing already pending IRQs
KVM: arm/arm64: vgic: Prevent access to invalid SPIs
arm/arm64: KVM: Perform local TLB invalidation when multiplexing vcpus on a single CPU
Linus Torvalds [Sun, 13 Nov 2016 18:26:05 +0000 (10:26 -0800)]
Merge branch 'media-fixes' (patches from Mauro)
Merge media fixes from Mauro Carvalho Chehab:
"This contains two patches fixing problems with my patch series meant
to make USB drivers to work again after the DMA on stack changes.
The last patch on this series is actually not related to DMA on stack.
It solves a longstanding bug affecting module unload, causing
module_put() to be called twice. It was reported by the user who
reported and tested the issues with the gp8psk driver with the DMA
fixup patches. As we're late at -rc cycle, maybe you prefer to not
apply it right now. If this is the case, I'll add to the pile of
patches for 4.10.
Exceptionally this time, I'm sending the patches via e-mail, because
I'm on another trip, and won't be able to use the usual procedure
until Monday. Also, it is only three patches, and you followed already
the discussions about the first one"
Linus Torvalds [Sun, 13 Nov 2016 18:24:08 +0000 (10:24 -0800)]
Merge tag 'char-misc-4.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc fixes from Greg KH:
"Here are three small driver fixes for some reported issues for
4.9-rc5.
One for the hyper-v subsystem, fixing up a naming issue that showed up
in 4.9-rc1, one mei driver fix, and one fix for parallel ports,
resolving a reported regression.
All have been in linux-next with no reported issues"
* tag 'char-misc-4.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
ppdev: fix double-free of pp->pdev->name
vmbus: make sysfs names consistent with PCI
mei: bus: fix received data size check in NFC fixup
Linus Torvalds [Sun, 13 Nov 2016 18:22:07 +0000 (10:22 -0800)]
Merge tag 'driver-core-4.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core fixes from Greg KH:
"Here are two driver core fixes for 4.9-rc5.
The first resolves an issue with some drivers not liking to be unbound
and bound again (if CONFIG_DEBUG_TEST_DRIVER_REMOVE is enabled), which
solves some reported problems with graphics and storage drivers. The
other resolves a smatch error with the 4.9-rc1 driver core changes
around this feature.
Both have been in linux-next with no reported issues"
* tag 'driver-core-4.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
driver core: fix smatch warning on dev->bus check
driver core: skip removal test for non-removable drivers
Linus Torvalds [Sun, 13 Nov 2016 18:13:33 +0000 (10:13 -0800)]
Merge tag 'staging-4.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging/IIO fixes from Grek KH:
"Here are a few small staging and iio driver fixes for reported issues.
The last one was cherry-picked from my -next branch to resolve a build
warning that Arnd fixed, in his quest to be able to turn
-Wmaybe-uninitialized back on again. That patch, and all of the
others, have been in linux-next for a while with no reported issues"
* tag 'staging-4.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
iio: maxim_thermocouple: detect invalid storage size in read()
staging: nvec: remove managed resource from PS2 driver
Revert "staging: nvec: ps2: change serio type to passthrough"
drivers: staging: nvec: remove bogus reset command for PS/2 interface
staging: greybus: arche-platform: fix device reference leak
staging: comedi: ni_tio: fix buggy ni_tio_clock_period_ps() return value
staging: sm750fb: Fix bugs introduced by early commits
iio: hid-sensors: Increase the precision of scale to fix wrong reading interpretation.
iio: orientation: hid-sensor-rotation: Add PM function (fix non working driver)
iio: st_sensors: fix scale configuration for h3lis331dl
staging: iio: ad5933: avoid uninitialized variable in error case
Linus Torvalds [Sun, 13 Nov 2016 18:09:04 +0000 (10:09 -0800)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull more block fixes from Jens Axboe:
"Since I mistakenly left out the lightnvm regression fix yesterday and
the aoeblk seems adequately tested at this point, might as well send
out another pull to make -rc5"
* 'for-linus' of git://git.kernel.dk/linux-block:
aoe: fix crash in page count manipulation
lightnvm: invalid offset calculation for lba_shift
Linus Torvalds [Sun, 13 Nov 2016 18:07:08 +0000 (10:07 -0800)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"The megaraid_sas patch in here fixes a major regression in the last
fix set that made all megaraid_sas cards unusable. It turns out no-one
had actually tested such an "obvious" fix, sigh. The fix for the fix
has been tested ...
The next most serious is the vmw_pvscsi abort problem which basically
means that aborts don't work on the vmware paravirt devices and error
handling always escalates to reset.
The rest are an assortment of missed reference counting in certain
paths and corner case bugs that show up on some architectures"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: megaraid_sas: fix macro MEGASAS_IS_LOGICAL to avoid regression
scsi: qla2xxx: fix invalid DMA access after command aborts in PCI device remove
scsi: qla2xxx: do not queue commands when unloading
scsi: libcxgbi: fix incorrect DDP resource cleanup
scsi: qla2xxx: Fix scsi scan hang triggered if adapter fails during init
scsi: scsi_dh_alua: Fix a reference counting bug
scsi: vmw_pvscsi: return SUCCESS for successful command aborts
scsi: mpt3sas: Fix for block device of raid exists even after deleting raid disk
scsi: scsi_dh_alua: fix missing kref_put() in alua_rtpg_work()
Linus Torvalds [Sun, 13 Nov 2016 18:04:55 +0000 (10:04 -0800)]
Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk fixes from Stephen Boyd:
"The typical collection of minor bug fixes in clk drivers. We don't
have anything in the core framework here, just driver fixes.
There's a boot fix for Samsung devices and a safety measure for qoriq
to prevent CPUs from running too fast. There's also a fix for i.MX6Q
to properly handle audio clock rates. We also have some "that's
obviously wrong" fixes like bad NULL pointer checks in the MPP driver
and a poor usage of __pa in the xgene clk driver that are fixed here"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: mmp: pxa910: fix return value check in pxa910_clk_init()
clk: mmp: pxa168: fix return value check in pxa168_clk_init()
clk: mmp: mmp2: fix return value check in mmp2_clk_init()
clk: qoriq: Don't allow CPU clocks higher than starting value
clk: imx: fix integer overflow in AV PLL round rate
clk: xgene: Don't call __pa on ioremaped address
clk/samsung: Use CLK_OF_DECLARE_DRIVER initialization method for CLKOUT
clk: rockchip: don't return NULL when failing to register ddrclk branch
The DVB binding schema at the DVB core assumes that the frontend is a
separate driver. Faling to do that causes OOPS when the module is
removed, as it tries to do a symbol_put_addr on an internal symbol,
causing craches like:
Commit bc29131ecb10 ("[media] gp8psk: don't do DMA on stack") fixed the
usage of DMA on stack, but the memcpy was wrong for gp8psk_usb_in_op().
Fix it.
From Derek's email:
"Fix confirmed using 2 different Skywalker models with
HD mpeg4, SD mpeg2."
Suggested-by: Johannes Stezenbach <js@linuxtv.org> Fixes: bc29131ecb10 ("[media] gp8psk: don't do DMA on stack") Tested-by: Derek <user.vdr@gmail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The device's neighbour table is periodically dumped in order to update
the kernel about active neighbours. A single dump session may span
multiple queries, until the response carries less records than requested
or when a record (can contain up to four neighbour entries) is not full.
Current code stops the session when the number of returned records is
zero, which can result in infinite loop in case of high packet rate.
Fix this by stopping the session according to the above logic.
Fixes: 842a6e37f8ac ("mlxsw: spectrum_router: Periodically update the kernel's neigh table") Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yotam Gigi [Fri, 11 Nov 2016 15:34:25 +0000 (16:34 +0100)]
mlxsw: spectrum: Fix refcount bug on span entries
When binding port to a newly created span entry, its refcount is
initialized to zero even though it has a bound port. That leads
to unexpected behaviour when the user tries to delete that port
from the span entry.
Fix this by initializing the reference count to 1.
Also add a warning to put function.
Fixes: a25b2c9e6d13 ("mlxsw: spectrum: Add support in matchall mirror TC offloading") Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Fri, 11 Nov 2016 05:11:43 +0000 (00:11 -0500)]
bnxt_en: Fix VF virtual link state.
If the physical link is down and the VF virtual link is set to "enable",
the current code does not always work. If the link is down but the
cable is attached, the firmware returns LINK_SIGNAL instead of
NO_LINK. The current code is treating LINK_SIGNAL as link up.
The fix is to treat link as down when the link_status != LINK.
Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Mike Frysinger [Fri, 11 Nov 2016 00:08:39 +0000 (19:08 -0500)]
Revert "include/uapi/linux/atm_zatm.h: include linux/time.h"
This reverts commit ca37d718f2b6 ("include/uapi/linux/atm_zatm.h: include
linux/time.h").
This attempted to fix userspace breakage that no longer existed when
the patch was merged. Almost one year earlier, commit f1bfe01ffcf0
("atm: remove 'struct zatm_t_hist'") deleted the struct in question.
After this patch was merged, we now have to deal with people being
unable to include this header in conjunction with standard C library
headers like stdlib.h (which linux-atm does). Example breakage:
x86_64-pc-linux-gnu-gcc -DHAVE_CONFIG_H -I. -I../.. -I./../q2931 -I./../saal \
-I. -DCPPFLAGS_TEST -I../../src/include -O2 -march=native -pipe -g \
-frecord-gcc-switches -freport-bug -Wimplicit-function-declaration \
-Wnonnull -Wstrict-aliasing -Wparentheses -Warray-bounds \
-Wfree-nonheap-object -Wreturn-local-addr -fno-strict-aliasing -Wall \
-Wshadow -Wpointer-arith -Wwrite-strings -Wstrict-prototypes -c zntune.c
In file included from /usr/include/linux/atm_zatm.h:17:0,
from zntune.c:17:
/usr/include/linux/time.h:9:8: error: redefinition of ‘struct timespec’
struct timespec {
^
In file included from /usr/include/sys/select.h:43:0,
from /usr/include/sys/types.h:219,
from /usr/include/stdlib.h:314,
from zntune.c:9:
/usr/include/time.h:120:8: note: originally defined here
struct timespec
^
Signed-off-by: Mike Frysinger <vapier@gentoo.org> Acked-by: Mikko Rapeli <mikko.rapeli@iki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 10 Nov 2016 21:12:35 +0000 (13:12 -0800)]
tcp: take care of truncations done by sk_filter()
With syzkaller help, Marco Grassi found a bug in TCP stack,
crashing in tcp_collapse()
Root cause is that sk_filter() can truncate the incoming skb,
but TCP stack was not really expecting this to happen.
It probably was expecting a simple DROP or ACCEPT behavior.
We first need to make sure no part of TCP header could be removed.
Then we need to adjust TCP_SKB_CB(skb)->end_seq
Many thanks to syzkaller team and Marco for giving us a reproducer.
Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Marco Grassi <marco.gra@gmail.com> Reported-by: Vladis Dronov <vdronov@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
In v2.6, ip_rt_redirect() calls arp_bind_neighbour() which returns 0
and then the state of the neigh for the new_gw is checked. If the state
isn't valid then the redirected route is deleted. This behavior is
maintained up to v3.5.7 by check_peer_redirect() because rt->rt_gateway
is assigned to peer->redirect_learned.a4 before calling
ipv4_neigh_lookup().
After commit a934d4a8fe23 ("ipv4: Maintain redirect and PMTU info in
struct rtable again."), ipv4_neigh_lookup() is performed without the
rt_gateway assigned to the new_gw. In the case when rt_gateway (old_gw)
isn't zero, the function uses it as the key. The neigh is most likely
valid since the old_gw is the one that sends the ICMP redirect message.
Then the new_gw is assigned to fib_nh_exception. The problem is: the
new_gw ARP may never gets resolved and the traffic is blackholed.
So, use the new_gw for neigh lookup.
Changes from v1:
- use __ipv4_neigh_lookup instead (per Eric Dumazet).
Fixes: a934d4a8fe23 ("ipv4: Maintain redirect and PMTU info in struct rtable again.") Signed-off-by: Stephen Suryaputra Lin <ssurya@ieee.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Uwe Kleine-König [Thu, 10 Nov 2016 14:03:01 +0000 (15:03 +0100)]
net: phy: marvell: optimize logic for page changing during init
Instead of remembering if the page was changed, just compare the current
page to the saved one. This is easier and has the advantage to save a
register write if the page was already restored.
Signed-off-by: Uwe Kleine-König <uwe@kleine-koenig.org> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Tue, 25 Oct 2016 15:55:04 +0000 (17:55 +0200)]
iio: maxim_thermocouple: detect invalid storage size in read()
As found by gcc -Wmaybe-uninitialized, having a storage_bytes value other
than 2 or 4 will result in undefined behavior:
drivers/iio/temperature/maxim_thermocouple.c: In function 'maxim_thermocouple_read':
drivers/iio/temperature/maxim_thermocouple.c:141:5: error: 'ret' may be used uninitialized in this function [-Werror=maybe-uninitialized]
This probably cannot happen, but returning -EINVAL here is appropriate
and makes gcc happy and the code more robust.
Fixes: a37d2838e7c1 ("iio: maxim_thermocouple: Align 16 bit big endian value of raw reads") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jonathan Cameron <jic23@kernel.org>
(cherry picked from commit 32cb7d27e65df9daa7cee8f1fdf7b259f214bee2) Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The user-visible effect in my test setup was the kernel being unable
to find the root file system ramdisk. This was likely caused by silent
memory or page table corruption.
Enabling CONFIG_DEBUG_VIRTUAL=y immediately flagged the thunking code as
abusing virt_to_phys() because it was passing addresses that were not
part of the kernel direct mapping.
Use the slow version instead, which correctly handles all memory
regions by performing a page table walk.
Suggested-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/20161112210424.5157-3-matt@codeblueprint.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
Borislav Petkov [Sat, 12 Nov 2016 21:04:23 +0000 (21:04 +0000)]
x86/efi: Fix EFI memmap pointer size warning
Fix this when building on 32-bit:
arch/x86/platform/efi/efi.c: In function ‘__efi_enter_virtual_mode’:
arch/x86/platform/efi/efi.c:911:5: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
(efi_memory_desc_t *)pa);
^
arch/x86/platform/efi/efi.c:918:5: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
(efi_memory_desc_t *)pa);
^
The @pa local variable is declared as phys_addr_t and that is a u64 when
CONFIG_PHYS_ADDR_T_64BIT=y. (The last is enabled on 32-bit on a PAE
build.)
However, its value comes from __pa() which is basically doing pointer
arithmetic and checking, and returns unsigned long as it is the native
pointer width.
So let's use an unsigned long too. It should be fine to do so because
the later users cast it to a pointer too.
Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/20161112210424.5157-2-matt@codeblueprint.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patch series is targeted at adding support for a new PCI version
of the hardware. As part of the new PCI device, there is a new PCS/PHY
interaction, ECC support, I2C sideband communication, SFP+ support and
more.
The following updates and fixes are included in this driver update series:
- Hardware workaround for possible incorrectly generated interrupts
during software reset
- Hardware workaround for Tx timestamp register access order
- Add support for a PCI version of the device
- Increase the Rx queue limit to take advantage of the increased number
of DMA channels that might be available
- Add support for a new DMA channel interrupt mode
- Add ECC support for the device memory
- Add support for using the integrated I2C controller for sideband
communication
- Expose the phylib phy_aneg_done() function so it can be called by the
driver
- Add support for SFP+ modules
- Add support for MDIO attached PHYs
- Add support for KR re-driver between the PCS/SerDes and an external
PHY
This patch series is based on net-next.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Lendacky, Thomas [Thu, 10 Nov 2016 23:11:41 +0000 (17:11 -0600)]
amd-xgbe: Add support for a KR redriver
This patch provides support for the presence of a KR redriver chip in
between the device PCS and an external PHY. When a redriver chip is
present the device must perform clause 73 auto-negotiation in order to
set the redriver chip for the downstream connection.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Lendacky, Thomas [Thu, 10 Nov 2016 23:11:14 +0000 (17:11 -0600)]
amd-xgbe: Add support for MDIO attached PHYs
Use the phylib support in the kernel to communicate with and control an
MDIO attached PHY. Use the hardware's MDIO communication mechanism to
communicate with the PHY.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Lendacky, Thomas [Thu, 10 Nov 2016 23:10:58 +0000 (17:10 -0600)]
amd-xgbe: Add support for SFP+ modules
Add support for recognizing and using SFP+ modules directly. This includes
using the I2C support to read and interpret the information returned from
an SFP+ module and configuring things properly.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Lendacky, Thomas [Thu, 10 Nov 2016 23:10:36 +0000 (17:10 -0600)]
amd-xgbe: Add I2C support for sideband communication
Add support to initialize and use the I2C controller within the hardware
in order to perform sideband communication, e.g. determine the SFP media
type that is installed.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Lendacky, Thomas [Thu, 10 Nov 2016 23:10:26 +0000 (17:10 -0600)]
amd-xgbe: Add ECC status support for the device memory
Some versions of the amd-xgbe device are capable of reporting ECC error
information back to the driver. Add support to process, track and report
on this information.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Lendacky, Thomas [Thu, 10 Nov 2016 23:10:17 +0000 (17:10 -0600)]
amd-xgbe: Add support for new DMA interrupt mode
The current per channel DMA interrupt support is based on an edge
triggered interrupt that is not maskable. This results in having to call
the disable_irq/enable_irq functions in order to prevent interrupts
during napi processing. The hardware now has a way to configure the per
channel DMA interrupt that will allow for masking the interrupt which
prevents calling disable_irq/enable_irq now. This patch makes use of
this support.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Lendacky, Thomas [Thu, 10 Nov 2016 23:10:05 +0000 (17:10 -0600)]
amd-xgbe: Allow for a greater number of Rx queues
Remove the call to netif_get_num_default_rss_queues() and replace it
with num_online_cpus() to allow for the possibility of using all of
the hardware DMA channels available.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Lendacky, Thomas [Thu, 10 Nov 2016 23:09:45 +0000 (17:09 -0600)]
amd-xgbe: Add a workaround for Tx timestamp issue
Update the reading of the Tx timestamp to account for a hardware issue
on how the fields and interrupt are cleared. The "seconds" portion of
the timestamp should be read first, followed by the "nanoseconds" portion.
Reading the "nanoseconds" portion should clear the timestamp data and the
interrupt. Because of an issue with the hardware this order is reversed
and reading the "seconds" portion actually clears the timestamp. The code
currently follows this workaround, but to guard against future versions
where this is fixed add a field to the version data to indicate if the
workaround is required or not.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Lendacky, Thomas [Thu, 10 Nov 2016 23:09:29 +0000 (17:09 -0600)]
amd-xgbe: Guard against incorrectly generated interrupts
Due to a hardware issue, it is possible for interrupt events to be
incorrectly generated when performing a soft reset. To guard against
this, perform the soft reset twice.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 13 Nov 2016 05:51:04 +0000 (00:51 -0500)]
Merge branch 'ovs-L3-encap'
Jiri Benc says:
====================
openvswitch: support for layer 3 encapsulated packets
At the core of this patch set is removing the assumption in Open vSwitch
datapath that all packets have Ethernet header.
The implementation relies on the presence of pop_eth and push_eth actions
in datapath flows to facilitate adding and removing Ethernet headers as
appropriate. The construction of such flows is left up to user-space.
This series is based on work by Simon Horman, Lorand Jakab, Thomas Morin and
others. I kept Lorand's and Simon's s-o-b in the patches that are derived
from v11 to record their authorship of parts of the code.
Changes from v12 to v13:
* Addressed Pravin's feedback.
* Removed the GRE vport conversion patch; L3 GRE ports should be created by
rtnetlink instead.
Main changes from v11 to v12:
* The patches were restructured and split differently for easier review.
* They were rebased and adjusted to the current net-next. Especially MPLS
handling is different (and easier) thanks to the recent MPLS GSO rework.
* Several bugs were discovered and fixed. The most notable is fragment
handling: header adjustment for ARPHRD_NONE devices on tx needs to be done
after refragmentation, not before it. This required significant changes in
the patchset. Another one is stricter checking of attributes (match on L2
vs. L3 packet) at the kernel level.
* Instead of is_layer3 bool, a mac_proto field is used.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Benc [Thu, 10 Nov 2016 15:28:24 +0000 (16:28 +0100)]
openvswitch: allow L3 netdev ports
Allow ARPHRD_NONE interfaces to be added to ovs bridge.
Based on previous versions by Lorand Jakab and Simon Horman.
Signed-off-by: Lorand Jakab <lojakab@cisco.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Benc [Thu, 10 Nov 2016 15:28:23 +0000 (16:28 +0100)]
openvswitch: add Ethernet push and pop actions
It's not allowed to push Ethernet header in front of another Ethernet
header.
It's not allowed to pop Ethernet header if there's a vlan tag. This
preserves the invariant that L3 packet never has a vlan tag.
Based on previous versions by Lorand Jakab and Simon Horman.
Signed-off-by: Lorand Jakab <lojakab@cisco.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>