net: core: use listified Rx for GRO_NORMAL in napi_gro_receive()
Commit 02b56483900a7 ("net: use listified RX for handling GRO_NORMAL
skbs") made use of listified skb processing for the users of
napi_gro_frags().
The same technique can be used in the far more common napi_gro_receive()
to speed up non-merged (GRO_NORMAL) skbs for a wide range of drivers
including gro_cells and mac80211 users.
This slightly changes the return value in cases where skb is being
dropped by the core stack, but it seems to have no impact on related
drivers' functionality.
gro_normal_batch is left untouched, as the best value is highly specific
to each system configuration and can be tuned manually to achieve
optimal performance.
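The batching behaviour can be sketched in userspace as follows. This is a minimal model under stated assumptions, not the kernel code: `struct rx_batch`, `rx_batch_add()` and `rx_batch_flush()` are hypothetical names, integer ids stand in for skbs, and the flush stands in for handing the whole list to the stack at once. Non-merged packets are queued and delivered as a list whenever `gro_normal_batch` of them have accumulated, with any remainder flushed at the end of the poll.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BATCH 64

/* Hypothetical stand-in for the per-NAPI list of GRO_NORMAL packets. */
struct rx_batch {
	int pkts[MAX_BATCH];  /* queued packets (ids stand in for skbs) */
	size_t count;         /* packets currently queued */
	size_t batch;         /* flush threshold, like gro_normal_batch */
	size_t flushes;       /* number of list deliveries so far */
	size_t delivered;     /* total packets handed to the "stack" */
};

static void rx_batch_init(struct rx_batch *b, size_t batch)
{
	assert(batch > 0 && batch <= MAX_BATCH);
	b->count = 0;
	b->batch = batch;
	b->flushes = 0;
	b->delivered = 0;
}

/* Deliver everything queued as one list (one call instead of one per packet). */
static void rx_batch_flush(struct rx_batch *b)
{
	if (b->count == 0)
		return;
	b->delivered += b->count;
	b->flushes++;
	b->count = 0;
}

/* Queue one non-merged packet; flush once the threshold is reached. */
static void rx_batch_add(struct rx_batch *b, int pkt)
{
	b->pkts[b->count++] = pkt;
	if (b->count >= b->batch)
		rx_batch_flush(b);
}
```

With a threshold of 8 and 20 incoming packets, the model delivers two full lists inside the loop and a final list of 4 when the poll ends and rx_batch_flush() is called.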
Signed-off-by: Alexander Lobakin <alobakin@dlink.ru> Acked-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Himadri Pandya [Sun, 13 Oct 2019 00:30:21 +0000 (00:30 +0000)]
hv_sock: use HV_HYP_PAGE_SIZE for Hyper-V communication
Current code assumes PAGE_SIZE (the guest page size) is equal
to the page size used to communicate with Hyper-V (which is
always 4K). While this assumption is true on x86, it may not
be true for Hyper-V on other architectures. For example,
Linux on ARM64 may have PAGE_SIZE of 16K or 64K. A new symbol,
HV_HYP_PAGE_SIZE, has been previously introduced to use when
the Hyper-V page size is intended instead of the guest page size.
Make this code work on non-x86 architectures by using the new
HV_HYP_PAGE_SIZE symbol instead of PAGE_SIZE, where appropriate.
Also replace the now redundant PAGE_SIZE_4K with HV_HYP_PAGE_SIZE.
The change has no effect on x86, but lays the groundwork to run
on ARM64 and others.
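A small sketch of why the two sizes must be kept apart, assuming a guest built with 64K pages. Only the 4K value of HV_HYP_PAGE_SIZE comes from the text; `hv_pages_for()` and `hv_pages_per_guest_page()` are hypothetical helpers, not hv_sock functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hyper-V always communicates in 4K pages, whatever the guest PAGE_SIZE is. */
#define HV_HYP_PAGE_SIZE 4096UL

/* Hypothetical: Hyper-V pages needed to hold `bytes`, rounded up. */
static size_t hv_pages_for(size_t bytes)
{
	return (bytes + HV_HYP_PAGE_SIZE - 1) / HV_HYP_PAGE_SIZE;
}

/* Hypothetical: how many Hyper-V pages a single guest page spans. */
static size_t hv_pages_per_guest_page(size_t guest_page_size)
{
	assert(guest_page_size % HV_HYP_PAGE_SIZE == 0);
	return guest_page_size / HV_HYP_PAGE_SIZE;
}
```

On such a guest one PAGE_SIZE page spans 16 Hyper-V pages, so anything sized in PAGE_SIZE units where Hyper-V expects 4K units would be off by that factor; on x86 the two values coincide.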
Signed-off-by: Himadri Pandya <himadrispandya@gmail.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Fri, 11 Oct 2019 22:31:15 +0000 (01:31 +0300)]
net: dsa: sja1105: Switch to scatter/gather API for SPI
This reworks the SPI transfer implementation to make use of more of the
SPI core features. The main benefit is to avoid the memcpy in
sja1105_xfer_buf().
The memcpy was only needed because the function was transferring a
single buffer at a time. So it needed to copy the caller-provided buffer
at buf + 4, to store the SPI message header in the "headroom" area.
But the SPI core supports scatter-gather messages, comprised of multiple
transfers. We can actually use those to break apart every SPI message
into 2 transfers: one for the header and one for the actual payload.
To keep the behavior the same regarding the chip select signal, it is
necessary to tell the SPI core to de-assert the chip select after each
chunk. This was not needed before, because each spi_message contained
only 1 single transfer.
The meaning of the per-transfer cs_change=1 is:
- If the transfer is the last one of the message, keep CS asserted
- Otherwise, deassert CS
We need to deassert CS in the "otherwise" case, which was implicit
before.
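The cs_change rule above can be modelled as a small truth table. This is a hypothetical sketch (`struct xfer`, `cs_asserted_after()` and `mark_chunk_ends()` are not SPI core API), assuming each chunk is sent as a header transfer followed by a payload transfer and that CS should be released after every chunk's payload.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified transfer descriptor (not struct spi_transfer). */
struct xfer {
	size_t len;
	bool cs_change;
};

/*
 * Per-transfer cs_change semantics as described above:
 * - last transfer: cs_change=1 keeps CS asserted after the message
 * - any other transfer: cs_change=1 deasserts CS after it
 * Returns true if CS is asserted once transfer i has completed.
 */
static bool cs_asserted_after(const struct xfer *x, size_t i, size_t n)
{
	bool is_last;

	assert(i < n);
	is_last = (i == n - 1);
	return is_last == x[i].cs_change;
}

/* Set cs_change on every payload transfer except the very last one. */
static void mark_chunk_ends(struct xfer *x, size_t n)
{
	assert(n % 2 == 0); /* header + payload per chunk */
	for (size_t i = 0; i < n; i++)
		x[i].cs_change = (i % 2 == 1) && (i != n - 1);
}
```

The last payload keeps cs_change=0 because, on the final transfer, setting it would leave CS asserted after the message instead of releasing it.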
Avoiding the memcpy creates yet another opportunity. The device can't
process more than 256 bytes of SPI payload at a time, so the
sja1105_xfer_long_buf() function existed to split larger caller buffers
into chunks.
But these chunks couldn't be used as scatter/gather buffers for
spi_message until now, because of that memcpy (we would have needed more
memory for each chunk). So we can now remove the sja1105_xfer_long_buf()
function and have a single implementation for long and short buffers.
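Without the memcpy, splitting a long buffer reduces to pointer arithmetic over the caller's memory. A minimal sketch, assuming the 256-byte limit stated above; `SJA_MAX_PAYLOAD`, `struct chunk` and `split_chunks()` are hypothetical names. Each resulting chunk would then become the payload transfer paired with its own header transfer.

```c
#include <assert.h>
#include <stddef.h>

#define SJA_MAX_PAYLOAD 256 /* device limit on SPI payload per chunk */

/* Hypothetical chunk descriptor: points into the caller's buffer, no copy. */
struct chunk {
	const unsigned char *buf;
	size_t len;
};

/* Split buf[0..len) into chunks of at most 256 bytes; returns the count. */
static size_t split_chunks(const unsigned char *buf, size_t len,
			   struct chunk *out, size_t max_out)
{
	size_t n = 0;

	for (size_t off = 0; off < len; off += SJA_MAX_PAYLOAD) {
		size_t left = len - off;

		assert(n < max_out);
		out[n].buf = buf + off; /* pointer into caller memory */
		out[n].len = left < SJA_MAX_PAYLOAD ? left : SJA_MAX_PAYLOAD;
		n++;
	}
	return n;
}
```

A 600-byte buffer yields chunks of 256, 256 and 88 bytes, all pointing into the original buffer, so no per-chunk copy needs to live on the stack.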
Another benefit is lower usage of stack memory. Previously we had to
store 2 SPI buffers for each chunk. Due to the elimination of the
memcpy, we can now send pointers to the actual chunks from the
caller-supplied buffer to the SPI core.
Since the patch merges two functions into a rewritten implementation,
the function prototype was also changed, mainly for cosmetic consistency
with the structures used within it.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Fri, 11 Oct 2019 17:22:32 +0000 (18:22 +0100)]
net: b44: remove redundant assignment to variable reg
The variable reg is being assigned a value that is never read
and is being re-assigned in the following for-loop. The
assignment is redundant and hence can be removed.
Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
PTP driver refactoring for SJA1105 DSA
This series creates a better separation between the driver core and the
PTP portion. Therefore, users who are not interested in PTP can get a
simpler and smaller driver by compiling it out.
This is in preparation for further patches: SPI transfer timestamping,
synchronizing the hardware clock (as opposed to keeping it
free-running), PPS input/output, etc.
====================
Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Fri, 11 Oct 2019 23:18:16 +0000 (02:18 +0300)]
net: dsa: sja1105: Change the PTP command access pattern
The PTP command register contains enable bits for:
- Putting the 64-bit PTPCLKVAL register in add/subtract or write mode
- Taking timestamps off of the corrected vs free-running clock
- Starting/stopping the TTEthernet scheduling
- Starting/stopping PPS output
- Resetting the switch
When a command needs to be issued (e.g. "change the PTPCLKVAL from write
mode to add/subtract mode"), one cannot simply write to the command
register setting the PTPCLKADD bit to 1, because that would zeroize the
other settings. One also cannot do a read-modify-write (that would be
too easy for this hardware) because not all bits of the command register
are readable over SPI.
So this leaves us with the only option of keeping the value of the PTP
command register in the driver, and operating on that.
Actually there are 2 types of PTP operations now:
- Operations that modify the cached PTP command. These operate on
ptp_data->cmd as a pointer.
- Operations that apply all previously cached PTP settings, but don't
otherwise cache what they did themselves. The sja1105_ptp_reset
function is such an example. It copies the ptp_data->cmd on stack
before modifying and writing it to SPI.
This practically means that struct sja1105_ptp_cmd is no longer an
implementation detail, since it needs to be stored in full into struct
sja1105_ptp_data, and hence in struct sja1105_private. So the (*ptp_cmd)
function prototype can change and take struct sja1105_ptp_cmd as second
argument now.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Fri, 11 Oct 2019 23:18:15 +0000 (02:18 +0300)]
net: dsa: sja1105: Move PTP data to its own private structure
This is a non-functional change with 2 goals (both for the case when
CONFIG_NET_DSA_SJA1105_PTP is not enabled):
- Reduce the size of the sja1105_private structure.
- Make the PTP code more self-contained.
Leaving priv->ptp_data.lock to be initialized in sja1105_main.c is not a
leftover: it will be used in a future patch "net: dsa: sja1105: Restore
PTP time after switch reset".
Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Fri, 11 Oct 2019 23:18:14 +0000 (02:18 +0300)]
net: dsa: sja1105: Make all public PTP functions take dsa_switch as argument
The new rule (as already started for sja1105_tas.h) is for functions of
optional driver components (ones which may be disabled via Kconfig - PTP
and TAS) to take struct dsa_switch *ds instead of struct sja1105_private
*priv as first argument.
This is so that forward-declarations of struct sja1105_private can be
avoided.
So make sja1105_ptp.h the second user of this rule.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 13 Oct 2019 18:29:07 +0000 (11:29 -0700)]
Merge tag 'mac80211-next-for-net-next-2019-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Johannes Berg says:
====================
A few more small things, nothing really stands out:
* minstrel improvements from Felix
* a TX aggregation simplification
* some additional capabilities for hwsim
* minor cleanups & docs updates
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Michal Kubecek [Fri, 11 Oct 2019 07:40:09 +0000 (09:40 +0200)]
genetlink: do not parse attributes for families with zero maxattr
Commit 61a1afebc302 ("net: genetlink: push attrbuf allocation and parsing
to a separate function") moved attribute buffer allocation and attribute
parsing from genl_family_rcv_msg_doit() into a separate function
genl_family_rcv_msg_attrs_parse() which, unlike the previous code, calls
__nlmsg_parse() even if family->maxattr is 0 (i.e. the family does its own
parsing). The parser error is ignored and does not propagate out of
genl_family_rcv_msg_attrs_parse() but an error message ("Unknown attribute
type") is set in extack and if further processing generates no error or
warning, it stays there and is interpreted as a warning by userspace.
Dumpit requests are not affected as genl_family_rcv_msg_dumpit() bypasses
the call of genl_family_rcv_msg_attrs_parse() if family->maxattr is zero.
Move this logic inside genl_family_rcv_msg_attrs_parse() so that we don't
have to handle it in each caller.
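A simplified sketch of the fix, using hypothetical stand-ins (`struct family`, `struct extack`, `attrs_parse()`) rather than the genetlink types: the maxattr == 0 check lives inside the parse helper itself, so no caller can reach the parser and leave an "Unknown attribute type" message behind in extack.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the genetlink structures. */
struct family {
	unsigned int maxattr; /* 0: the family parses attributes itself */
};

struct extack {
	const char *msg;
};

static int fake_attrbuf[16]; /* stand-in for an allocated attribute table */

/* Returns an attribute table, or NULL if nothing was (or could be) parsed. */
static int *attrs_parse(const struct family *f, unsigned int attr_type,
			struct extack *ext)
{
	assert(f != NULL && ext != NULL);

	/* The fix: families doing their own parsing are skipped right here. */
	if (!f->maxattr)
		return NULL;

	assert(f->maxattr < 16);
	if (attr_type > f->maxattr) {
		ext->msg = "Unknown attribute type";
		return NULL;
	}
	fake_attrbuf[attr_type] = 1;
	return fake_attrbuf;
}
```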
v3: put the check inside genl_family_rcv_msg_attrs_parse()
v2: adjust also argument of genl_family_rcv_msg_attrs_free()
Fixes: 61a1afebc302 ("net: genetlink: push attrbuf allocation and parsing to a separate function") Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Acked-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
tcp: improve recv_skip_hint for tcp_zerocopy_receive
tcp_zerocopy_receive() rounds down zc->length to a multiple of
PAGE_SIZE. This results in two issues:
- tcp_zerocopy_receive() sets recv_skip_hint to the length of the
receive queue if the zc->length input is smaller than
PAGE_SIZE, even though the data in the receive queue could be
zerocopied.
- tcp_zerocopy_receive() sets recv_skip_hint to 0 in cases
where there is a little bit of data after the perfectly-sized
packets.
To fix these issues, do not store the rounded-down value in
zc->length. Round down only the length passed to zap_page_range(),
and return min(inq, zc->length) when the zap_range is 0.
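The changed rule for the case where no full page can be mapped can be modelled as below. It is a hypothetical simplification (`skip_hint_if_unmapped()` and `round_down_pow2()` are not kernel functions) that covers only the zap_range == 0 path: the rounding is applied to a local value, zc->length is left intact, and the hint becomes min(inq, zc->length) instead of the whole receive queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Round len down to a multiple of page (page must be a power of two). */
static size_t round_down_pow2(size_t len, size_t page)
{
	assert(page != 0 && (page & (page - 1)) == 0);
	return len & ~(page - 1);
}

/*
 * If no full page fits in `length`, store the skip hint and return true.
 * The path where pages do get mapped is not modelled (returns false).
 */
static bool skip_hint_if_unmapped(size_t length, size_t inq, size_t page,
				  size_t *hint)
{
	assert(hint != NULL);
	if (round_down_pow2(length, page) != 0)
		return false;
	*hint = inq < length ? inq : length; /* min(inq, zc->length) */
	return true;
}
```

With a 4K page, a 1000-byte request against 3000 queued bytes now yields a hint of 1000 rather than 3000.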
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
Patch #1 enforces libbpf build to have bpf_helper_defs.h ready before test BPF
programs are built.
Patch #2 drops obsolete BTF/pahole detection logic from Makefile.
v1->v2:
- drop CPU and PROBE (Martin).
====================
Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Andrii Nakryiko [Fri, 11 Oct 2019 22:01:46 +0000 (15:01 -0700)]
selftests/bpf: Remove obsolete pahole/BTF support detection
Given lots of selftests won't work without recent enough Clang/LLVM that
fully supports BTF, there is no point in maintaining outdated BTF
support detection and fall-back to pahole logic. Just assume we have
everything we need.
Andrii Nakryiko [Fri, 11 Oct 2019 22:01:45 +0000 (15:01 -0700)]
selftests/bpf: Enforce libbpf build before BPF programs are built
Given BPF programs rely on libbpf's bpf_helper_defs.h, which is
auto-generated during libbpf build, libbpf build has to happen before
we attempt progs/*.c build. Enforce it as order-only dependency.
====================
This series mainly contains fixes and improvements for cross-compilation,
though not exclusively; it was tested for arm and arm64 and is intended
for any arch. It was also verified with native (non-cross) builds for
x86_64, arm and arm64.
Besides the patches given here, the RFC also contains a couple of patches
related to llvm clang:
arm: include: asm: swab: mask rev16 instruction for clang
arm: include: asm: unified: mask .syntax unified for clang
They are necessary to verify the arm 32 build.
A couple more fixes were also added that are not merged in bpf-next yet;
they may be needed for the verification/configuration steps. If they are
not in your tree, the fixes can be taken from here:
https://www.spinics.net/lists/netdev/msg601716.html
https://www.spinics.net/lists/netdev/msg601714.html
https://www.spinics.net/lists/linux-kbuild/msg23468.html
Now, to build samples, SAMPLE_BPF should be enabled in the config.
The change is not limited to cross-compilation and may have an impact on
other archs and build environments, so it might be a good idea to verify
it there and add appropriate changes; some warning options could also be
tuned.
Everything was tested on x86-64 with clang installed (it has to be built
with targets for arm, arm64, etc.; see llc --version, they are usually
already present).
Instructions to test native on x86_64
=================================================
Native build on x86_64 is done in the usual way and shouldn't differ,
except that HOSTCC is now printed as CC while building the samples.
Instructions to test cross compilation on arm64
=================================================
gcc version 8.3.0
(GNU Toolchain for the A-profile Architecture 8.3-2019.03 (arm-rel-8.36))
I've used the SDK for TI am65x obtained here:
http://downloads.ti.com/processor-sdk-linux/esd/AM65X/latest/exports/\
ti-processor-sdk-linux-am65xx-evm-06.00.00.07-Linux-x86-Install.bin
make ARCH=arm64 -C tools/ clean
make ARCH=arm64 -C samples/bpf clean
make ARCH=arm64 clean
make ARCH=arm64 defconfig
make ARCH=arm64 headers_install
make ARCH=arm64 INSTALL_HDR_PATH=/../sdk/\
ti-processor-sdk-linux-am65xx-evm-06.00.00.07/linux-devkit/sysroots/\
aarch64-linux/usr headers_install
make samples/bpf/ ARCH=arm64 CROSS_COMPILE="aarch64-linux-gnu-"\
SYSROOT="/../sdk/ti-processor-sdk-linux-am65xx-evm-06.00.00.07/\
linux-devkit/sysroots/aarch64-linux"
Instructions to test cross compilation on arm
=================================================
arm-linux-gnueabihf-gcc (Linaro GCC 7.2-2017.11) 7.2.1 20171011
or
arm-linux-gnueabihf-gcc
(GNU Toolchain for the A-profile Architecture 8.3-2019.03 \
(arm-rel-8.36)) 8.3.0
make ARCH=arm -C tools/ clean
make ARCH=arm -C samples/bpf clean
make ARCH=arm clean
make ARCH=arm omap2plus_defconfig
make ARCH=arm headers_install
make ARCH=arm INSTALL_HDR_PATH=/../sdk/\
ti-processor-sdk-linux-am57xx-evm-05.03.00.07/linux-devkit/sysroots/\
armv7ahf-neon-linux-gnueabi/usr headers_install
make samples/bpf/ ARCH=arm CROSS_COMPILE="arm-linux-gnueabihf-"\
SYSROOT="/../sdk/ti-processor-sdk-linux-am57xx-evm-05.03\
.00.07/linux-devkit/sysroots/armv7ahf-neon-linux-gnueabi"
Based on bpf-next/master
v5..v4:
- no changes, only missed SOBs are added
v4..v3:
- renamed CLANG_EXTRA_CFLAGS to BPF_EXTRA_CFLAGS
- used filter for ARCH_ARM_SELECTOR
- omit "-fomit-frame-pointer" and use same flags for native and "cross"
- used sample/bpf prefixes
- use C instead of C++ compiler for test_libbpf target
v3..v2:
- renamed Makefile.progs to Makefile.target, as more appropriate
- left only __LINUX_ARM_ARCH__ for D options for arm
- for host build - left options from KBUILD_HOST for compatibility reasons
- split patch adding c/cxx/ld flags to libbpf by modules
- moved readme change to separate patch
- added patch setting options for cross-compile
- fixed issue with option error for syscall_nrs.S,
avoiding overlap for ccflags-y.
v2..v1:
- restructured patches order
- split "samples: bpf: Makefile: base progs build on Makefile.progs"
to make the change more readable. This added a couple of nice extra patches.
- removed redundant patch:
"samples: bpf: Makefile: remove target for native build"
- added fix:
"samples: bpf: makefile: fix cookie_uid_helper_example obj build"
- limited -D option filter only for arm
- improved comments
- added a couple of instructions to verify cross compilation for arm and
arm64 arches based on TI am57xx and am65xx sdks.
- slightly corrected include order
====================
Ivan Khoronzhuk [Fri, 11 Oct 2019 00:28:07 +0000 (03:28 +0300)]
samples/bpf: Add sysroot support
Basically, this only enables what was added by the previous couple of
fixes. The sysroot contains the correct libraries and their headers.
Useful when working with NFC or a virtual machine.
Usage example:
clean (on demand)
make ARCH=arm -C samples/bpf clean
make ARCH=arm -C tools clean
make ARCH=arm clean
configure and install headers:
make ARCH=arm defconfig
make ARCH=arm headers_install
build samples/bpf:
make ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- samples/bpf/ \
SYSROOT="path/to/sysroot"
Ivan Khoronzhuk [Fri, 11 Oct 2019 00:28:05 +0000 (03:28 +0300)]
libbpf: Add C/LDFLAGS to libbpf.so and test_libbpf targets
Currently there is no way to pass C/LDFLAGS correctly to the build
command, for instance when --sysroot is used or when external libraries
such as -lelf are used, which can be absent in the toolchain. This can
be used for samples/bpf cross-compiling, allowing the elf lib to be
taken from the sysroot.
Ivan Khoronzhuk [Fri, 11 Oct 2019 00:28:04 +0000 (03:28 +0300)]
libbpf: Don't use cxx for test_libbpf target
There is no need to use C++ for the test_libbpf target when libbpf is
written in C and can be tested with C. After this change, CXXFLAGS in
makefiles can be avoided, at least in bpf samples, when sysroot is used,
by passing the same C/LDFLAGS as for the lib.
Add "return 0" in test_libbpf to avoid a warning, and also remove spaces
at the start of the lines to keep the same style and avoid warnings while
applying.
Ivan Khoronzhuk [Fri, 11 Oct 2019 00:28:02 +0000 (03:28 +0300)]
samples/bpf: Use own flags but not HOSTCFLAGS
While compiling natively, the host's cflags and ldflags are equal to the
ones from HOSTCFLAGS and HOSTLDFLAGS. When cross compiling, the samples
should have their own flags, used for the target arch. During
verification for arm, arm64 and x86_64, the following flags were always
used:
So, add them, as they were verified and used before adding
Makefile.target, and let's omit "-fomit-frame-pointer" as was proposed
during review, since such an optimization makes no sense for samples.
Ivan Khoronzhuk [Fri, 11 Oct 2019 00:28:01 +0000 (03:28 +0300)]
samples/bpf: Base target programs rules on Makefile.target
The main reason for this is that HOSTCC and CC have different aims.
HOSTCC is used to build programs running on the host, which can
cross-compile target programs with CC. It was tested for arm and arm64
cross compilation, based on the Linaro toolchain, but should work for
others.
So, in order to separate cross compilation (CC) from the host build
(HOSTCC), let's base samples on Makefile.target. This allows
cross-compiling samples/bpf programs with CC while the auxiliary tools
running on the host are built with HOSTCC.
Ivan Khoronzhuk [Fri, 11 Oct 2019 00:28:00 +0000 (03:28 +0300)]
samples/bpf: Add makefile.target for separate CC target build
Only Makefile.target is added here; it will be used in
samples/bpf/Makefile later in order to switch cross-compiling from the
HOSTCC environment to CC.
HOSTCC is supposed to build binaries and tools that run on the host
afterwards, such as "fixdep", to simplify the build. When cross
compiling, "fixdep" is executed on the host while the rest of the
samples should run on the target arch. In order to build binaries for
the target arch with CC and tools running on the host with HOSTCC, let's
add Makefile.target for simplicity, with definitions and routines similar
to the ones used in scripts/Makefile.host. This allows adding
cross-compilation to samples/bpf later with minimal changes.
The tprog stands for target programs built with CC.
Makefile.target contains only what is needed for samples/bpf. It could
potentially be reused later, but for now it is needed only to unblock
the tricky samples/bpf cross compilation.
Ivan Khoronzhuk [Fri, 11 Oct 2019 00:27:59 +0000 (03:27 +0300)]
samples/bpf: Drop unnecessary inclusion for bpf_load
Drop the -I$(objtree)/usr/include inclusion for bpf_load, as it is
already included for all objects by the line above:
KBUILD_HOSTCFLAGS += -I$(objtree)/usr/include
Ivan Khoronzhuk [Fri, 11 Oct 2019 00:27:58 +0000 (03:27 +0300)]
samples/bpf: Use __LINUX_ARM_ARCH__ selector for arm
For arm, -D__LINUX_ARM_ARCH__=X is the minimum version used as the
instruction set selector and is absolutely required while parsing some
parts of the headers. It's present in KBUILD_CFLAGS but not in
autoconf.h, so let's retrieve it from there and add it to the programs'
cflags. Otherwise, errors like "SMP is not supported" for armv7 and a
bunch of other errors are issued, resulting in an incorrect final object.
Ivan Khoronzhuk [Fri, 11 Oct 2019 00:27:56 +0000 (03:27 +0300)]
samples/bpf: Use --target from cross-compile
For cross compiling, the target triple can be inherited from the
cross-compile prefix, as is done in CLANG_FLAGS in the kernel Makefile.
So copy this approach from the kernel Makefile.
Don't list the userspace "cookie_uid_helper_example" object in the list
of bpf objects.
The 'always' target is used for listing bpf programs, but
'cookie_uid_helper_example.o' is a user space ELF file and is covered
by the rule `per_socket_stats_example`, so it shouldn't be in 'always'.
Let us remove `always += cookie_uid_helper_example.o`, which avoids
breaking cross compilation due to mismatched includes.
Jiri Pirko [Thu, 10 Oct 2019 13:18:50 +0000 (15:18 +0200)]
netdevsim: implement couple of testing devlink health reporters
Implement "empty" and "dummy" reporters. The first one is really simple
and does nothing. The other one has debugfs files to trigger breakage
and it is able to do recovery. The ops also implement dummy fmsg
content.
Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Peter Fink [Thu, 10 Oct 2019 13:00:22 +0000 (15:00 +0200)]
net: usb: ax88179_178a: write mac to hardware in get_mac_addr
When the MAC address is supplied via device tree or a random
MAC is generated it has to be written to the asix chip in
order to receive any data.
Previously in 1a7f4ce06175 ("net: usb: ax88179_178a: allow
optionally getting mac address from device tree") this line was
omitted because it seemed to work perfectly fine without it.
But the problem was simply not detected, because the chip keeps the MAC
stored even across a reset, and it was tested on hardware with an
integrated UPS where the asix chip was permanently powered on, even
throughout power cycles.
Fixes: 1a7f4ce06175 ("net: usb: ax88179_178a: allow optionally getting mac address from device tree") Signed-off-by: Peter Fink <pfink@christ-es.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Vito Caputo [Thu, 10 Oct 2019 03:43:47 +0000 (20:43 -0700)]
af_unix: __unix_find_socket_byname() cleanup
Remove pointless return variable dance.
Appears vestigial from when the function did locking as seen in
unix_find_socket_byinode(), but locking is handled in
unix_find_socket_byname() for __unix_find_socket_byname().
Signed-off-by: Vito Caputo <vcaputo@pengaru.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
net: ftgmac100: Ungate RCLK for RMII on ASPEED MACs
This series slightly extends the devicetree binding and driver for the
FTGMAC100 to describe an optional RMII RCLK gate in the clocks property.
Currently it's necessary for the kernel to ungate RCLK on the AST2600 in NCSI
configurations as u-boot does not yet support NCSI (which uses the
R(educed)MII).
v2:
* Clear up Reduced vs Reversed MII in the cover letter
* Mitigate anxiety in the commit message for 1/3
* Clarify that AST2500 is also affected in the clocks property description in
2/3
* Rework the error paths and update some comments in 3/3
v1 can be found here: https://lore.kernel.org/netdev/20191008115143.14149-1-andrew@aj.id.au/
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Critically, the AST2600 requires ungating the RMII RCLK if e.g. NCSI is
in use.
Signed-off-by: Andrew Jeffery <andrew@aj.id.au> Acked-by: Joel Stanley <joel@jms.id.au> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
The AST2600 contains an FTGMAC100-compatible MAC, although the MDIO
controller previously embedded in the MAC has been moved out to a
dedicated MDIO block.
Signed-off-by: Andrew Jeffery <andrew@aj.id.au> Acked-by: Joel Stanley <joel@jms.id.au> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Andrii Nakryiko [Fri, 11 Oct 2019 03:29:01 +0000 (20:29 -0700)]
libbpf: Handle invalid typedef emitted by old GCC
Old GCC versions produce an invalid typedef for __gnuc_va_list that
points to void. Special-case this and emit a valid one:
typedef __builtin_va_list __gnuc_va_list;
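The special case can be sketched as a string formatter. This is not libbpf's btf_dump code; `emit_typedef()` is a hypothetical helper in which a NULL target stands for void.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical sketch: format "typedef <target> <name>;", but replace the
 * invalid "__gnuc_va_list -> void" typedef from old GCC with a valid one.
 */
static int emit_typedef(char *out, size_t size, const char *name,
			const char *target)
{
	assert(out != NULL && name != NULL);
	if (!target && strcmp(name, "__gnuc_va_list") == 0)
		return snprintf(out, size, "typedef __builtin_va_list %s;", name);
	return snprintf(out, size, "typedef %s %s;",
			target ? target : "void", name);
}
```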
Reported-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20191011032901.452042-1-andriin@fb.com
Andrii Nakryiko [Fri, 11 Oct 2019 02:38:47 +0000 (19:38 -0700)]
libbpf: Generate more efficient BPF_CORE_READ code
Existing BPF_CORE_READ() macro generates slightly suboptimal code. If
there are intermediate pointers to be read, initial source pointer is
going to be assigned into a temporary variable and then temporary
variable is going to be uniformly used as a "source" pointer for all
intermediate pointer reads. Schematically (ignoring all the type casts),
BPF_CORE_READ(s, a, b, c) is expanded into:
({
	const void *__t = src;
	bpf_probe_read(&__t, sizeof(*__t), &__t->a);
	bpf_probe_read(&__t, sizeof(*__t), &__t->b);
	typeof(src->a->b->c) __r;
	bpf_probe_read(&__r, sizeof(__r), &__t->c);
	__r;
})
This initial `__t = src` makes calls more uniform, but sometimes causes
slightly less optimal register usage when compiled with Clang. This can
cascade into, e.g., more register spills.
This patch fixes this issue by generating more optimal sequence:
({
	const void *__t;
	bpf_probe_read(&__t, sizeof(*__t), &src->a); /* <-- src here */
	bpf_probe_read(&__t, sizeof(*__t), &__t->b);
	typeof(src->a->b->c) __r;
	bpf_probe_read(&__r, sizeof(__r), &__t->c);
	__r;
})
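A userspace analogue of the improved expansion, assuming a memcpy-based `probe_read()` in place of bpf_probe_read() and hypothetical `struct root`/`struct mid`/`struct leaf` types. `CORE_READ3` is not libbpf's macro: it uses typed temporaries instead of a single `const void *` and needs GCC or Clang (`__typeof__`, statement expressions), but it has the same shape, with the first read taking its address from src directly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Userspace stand-in for bpf_probe_read(): a plain memcpy. */
static int probe_read(void *dst, size_t size, const void *src)
{
	assert(dst != NULL && src != NULL);
	memcpy(dst, src, size);
	return 0;
}

/*
 * Hypothetical three-level read: the first read uses &(src)->f1 directly
 * instead of copying src into a temporary first.
 */
#define CORE_READ3(src, f1, f2, f3) ({				\
	__typeof__((src)->f1) __t1;				\
	__typeof__((src)->f1->f2) __t2;				\
	__typeof__((src)->f1->f2->f3) __r;			\
	probe_read(&__t1, sizeof(__t1), &(src)->f1);		\
	probe_read(&__t2, sizeof(__t2), &__t1->f2);		\
	probe_read(&__r, sizeof(__r), &__t2->f3);		\
	__r;							\
})

/* Hypothetical types for s->a->b->c. */
struct leaf { int c; };
struct mid  { struct leaf *b; };
struct root { struct mid *a; };
```

For `struct root *s`, `CORE_READ3(s, a, b, c)` reads `s->a`, then `->b`, then `->c`, one probe read per dereference.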
Andrii Nakryiko [Fri, 11 Oct 2019 17:20:53 +0000 (10:20 -0700)]
bpf: Fix cast to pointer from integer of different size warning
Fix "warning: cast to pointer from integer of different size" when
casting u64 addr to void *.
Fixes: 46a1145a9048 ("bpf: Track contents of read-only maps as scalars") Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20191011172053.2980619-1-andriin@fb.com
Jakub Sitnicki [Fri, 11 Oct 2019 08:29:46 +0000 (10:29 +0200)]
selftests/bpf: Check that flow dissector can be re-attached
Make sure a new flow dissector program can be attached to replace the old
one with a single syscall. Also check that attaching the same program twice
is prohibited.
Jakub Sitnicki [Fri, 11 Oct 2019 08:29:45 +0000 (10:29 +0200)]
flow_dissector: Allow updating the flow dissector program atomically
It is currently not possible to detach the flow dissector program and
attach a new one in an atomic fashion, that is with a single syscall.
Attempts to do so will be met with EEXIST error.
This makes updates to the flow dissector program hard. Traffic steering
that relies on BPF-powered flow dissection gets disrupted while the old
program has already been detached but the new one has not been attached yet.
There is also a window of opportunity to attach a flow dissector to a
non-root namespace while updating the root flow dissector, thus blocking
the update.
Lastly, the behavior is inconsistent with cgroup BPF programs, which can be
replaced with a single bpf(BPF_PROG_ATTACH, ...) syscall without any
restrictions.
Allow attaching a new flow dissector program when another one is already
present with a restriction that it can't be the same program.
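The new attach rule can be sketched with a single atomic slot. This is a hypothetical model (`struct prog`, `attached`, `dissector_attach()`), not the kernel implementation, which may serialize attaches differently, e.g. with a lock; the compare-and-swap loop just makes the two properties explicit: a replacement happens in one step, and re-attaching the same program fails with -EEXIST.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

struct prog { int id; };

/* Hypothetical per-namespace slot holding the attached program. */
static _Atomic(struct prog *) attached = NULL;

/*
 * Attach new_prog, replacing any current program in one atomic step.
 * Returns 0 and stores the replaced program (or NULL) in *old,
 * or -EEXIST if new_prog is already the attached program.
 */
static int dissector_attach(struct prog *new_prog, struct prog **old)
{
	struct prog *cur;

	assert(new_prog != NULL && old != NULL);
	cur = atomic_load(&attached);
	do {
		if (cur == new_prog)
			return -EEXIST;
		/* on failure, cur is refreshed with the current value */
	} while (!atomic_compare_exchange_weak(&attached, &cur, new_prog));
	*old = cur;
	return 0;
}
```

At no point is the slot empty between the old and the new program, which is the window the text describes.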
Felix Fietkau [Tue, 8 Oct 2019 17:11:38 +0000 (19:11 +0200)]
mac80211: minstrel_ht: replace rate stats ewma with a better moving average
Rate success probability usually fluctuates a lot under normal conditions.
With a simple EWMA, noise and fluctuation can be reduced by increasing the
window length, but that comes at the cost of introducing lag on sudden
changes.
This change replaces the EWMA implementation with a moving average that's
designed to significantly reduce lag while keeping a bigger window size
by being better at filtering out noise.
It is only slightly more expensive than the simple EWMA and still avoids
divisions in its calculation.
The algorithm is adapted from an implementation intended for a completely
different field (stock market trading), where the tradeoff of lag vs
noise filtering is equally important. It is based on the "smoothing filter"
from http://www.stockspotter.com/files/PredictiveIndicators.pdf.
I have adapted it to fixed-point math with some constants so that it uses
only addition, bit shifts and multiplication.
To better make use of the filtering and bigger window size, the update
interval time is cut in half.
For testing, the algorithm can be reverted to the older one via debugfs.
Mahesh Bandewar [Wed, 9 Oct 2019 23:20:11 +0000 (16:20 -0700)]
ipvlan: consolidate TSO flags using NETIF_F_ALL_TSO
This ensures that any new TSO-related flags added later become part of
the ALL_TSO mask, so the IPvlan driver doesn't need to be updated every
time a new flag gets added.
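The benefit of a single mask can be shown with hypothetical feature bits; `F_*` and `tso_features()` are illustrative stand-ins for NETIF_F_ALL_TSO and the driver's feature handling, with made-up bit values.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical feature bits; values are illustrative only. */
#define F_TSO     (1u << 0)
#define F_TSO6    (1u << 1)
#define F_TSO_ECN (1u << 2)
#define F_TSO_NEW (1u << 3) /* a TSO flag "added later" */
#define F_SG      (1u << 8) /* unrelated feature */

/* One mask to maintain instead of listing each flag in every driver. */
#define F_ALL_TSO (F_TSO | F_TSO6 | F_TSO_ECN | F_TSO_NEW)

/* Keep only the TSO-related features offered by the lower device. */
static uint32_t tso_features(uint32_t lower_dev_features)
{
	return lower_dev_features & F_ALL_TSO;
}
```

Adding F_TSO_NEW to the mask is enough for tso_features() to propagate it; the function itself does not change.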
Signed-off-by: Mahesh Bandewar <maheshb@google.com> Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Daniel Borkmann [Thu, 10 Oct 2019 23:49:16 +0000 (01:49 +0200)]
Merge branch 'bpf-romap-known-scalars'
Andrii Nakryiko says:
====================
With BPF maps supporting direct map access (currently, array_map w/ single
element, used for global data) that are read-only both from system call and
BPF side, it's possible for BPF verifier to track its contents as known
constants.
Now it's possible for user-space control app to pre-initialize read-only map
(e.g., for .rodata section) with user-provided flags and parameters and rely
on BPF verifier to detect and eliminate dead code resulting from specific
combination of input parameters.
v1->v2:
- BPF_F_RDONLY means nothing, stick to just map->frozen (Daniel);
- stick to passing just offset into map_direct_value_addr (Martin).
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Add tests checking that the verifier does proper constant propagation for
read-only maps. If constant propagation didn't work, the skip_loop and
part_loop BPF programs would be rejected, because the BPF verifier would
otherwise not be able to prove they ever complete. With constant
propagation, though, they are successfully validated as properly
terminating loops.
Andrii Nakryiko [Wed, 9 Oct 2019 20:14:57 +0000 (13:14 -0700)]
bpf: Track contents of read-only maps as scalars
Maps that are read-only both from BPF program side and user space side
have their contents constant, so verifier can track referenced values
precisely and use that knowledge for dead code elimination, branch
pruning, etc. This patch teaches BPF verifier how to do this.
Hangbin Liu [Wed, 9 Oct 2019 12:18:28 +0000 (20:18 +0800)]
team: call RCU read lock when walking the port_list
Before reading the team port list, we need to acquire the RCU read lock.
Also change list_for_each_entry() to list_for_each_entry_rcu().
v2:
repost the patch to net-next and remove fixes flag as this is a cosmetic
change.
Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Ursula Braun [Wed, 9 Oct 2019 08:07:47 +0000 (10:07 +0200)]
net/smc: improve close of terminated socket
Make sure a terminated SMC socket reaches the CLOSED state.
Even if sending of close flags fails, change the socket state to
the intended state to avoid dangling sockets not reaching the
CLOSED state.
Ursula Braun [Wed, 9 Oct 2019 08:07:45 +0000 (10:07 +0200)]
net/smc: increase device refcount for added link group
SMCD link groups belong to specific ISM devices, and SMCR link group
links belong to specific IB devices. Increase the refcount of
these devices for as long as the corresponding link groups exist.
Ursula Braun [Wed, 9 Oct 2019 08:07:43 +0000 (10:07 +0200)]
net/smc: separate SMCD and SMCR link group lists
Currently SMCD and SMCR link groups are maintained in one list.
To facilitate handling of abnormal termination, split them into
one list for SMCR link groups and a separate list of SMCD link
groups for each SMCD device.
Biao Huang [Wed, 9 Oct 2019 07:33:48 +0000 (15:33 +0800)]
net: stmmac: dwmac-mediatek: fix wrong delay value issue when resume back
The mac_delay value is divided by 550 or 170 in mt2712_delay_ps2stage(),
which is invoked at the beginning of mt2712_set_delay(), so the value
must be restored at the end of mt2712_set_delay().
Otherwise, mac_delay is divided again when mt2712_set_delay() is invoked
on resume.
So, call mt2712_delay_stage2ps() in mt2712_set_delay() to restore the
original mac_delay value.
Signed-off-by: Biao Huang <biao.huang@mediatek.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Xin Long says:
====================
There are 4 events defined in rfc5061 that are missing from Linux SCTP:
SCTP_ADDR_ADDED, SCTP_ADDR_REMOVED, SCTP_ADDR_MADE_PRIM and
SCTP_SEND_FAILED_EVENT.
This patchset adds them.
====================
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Xin Long [Tue, 8 Oct 2019 11:27:36 +0000 (19:27 +0800)]
sctp: add SCTP_SEND_FAILED_EVENT event
This patch adds a new event, SCTP_SEND_FAILED_EVENT, described in
rfc6458#section-6.1.11. It is an update of the SCTP_SEND_FAILED event:
struct sctp_sndrcvinfo ssf_info is replaced with
struct sctp_sndinfo ssfe_info in struct sctp_send_failed_event.
SCTP_SEND_FAILED is being deprecated, but we don't remove it in this
patch. Both are processed in sctp_datamsg_destroy() when the
corresponding event flag is set.
Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Xin Long [Tue, 8 Oct 2019 11:27:35 +0000 (19:27 +0800)]
sctp: add SCTP_ADDR_MADE_PRIM event
sctp_ulpevent_nofity_peer_addr_change() is called in
sctp_assoc_set_primary() to send the SCTP_ADDR_MADE_PRIM event
when this transport is set as the primary path of the asoc.
This event is described in rfc6458#section-6.1.2:
SCTP_ADDR_MADE_PRIM: This address has now been made the primary
destination address. This notification is provided whenever an
address is made primary.
Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Xin Long [Tue, 8 Oct 2019 11:27:34 +0000 (19:27 +0800)]
sctp: add SCTP_ADDR_REMOVED event
sctp_ulpevent_nofity_peer_addr_change() is called in
sctp_assoc_rm_peer() to send SCTP_ADDR_REMOVED event
when this transport is removed from the asoc.
This event is described in rfc6458#section-6.1.2:
SCTP_ADDR_REMOVED: The address is no longer part of the
association.
Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Xin Long [Tue, 8 Oct 2019 11:27:33 +0000 (19:27 +0800)]
sctp: add SCTP_ADDR_ADDED event
A helper, sctp_ulpevent_nofity_peer_addr_change(), is extracted
to build a peer_addr_change event and enqueue it; the helper is
called in sctp_assoc_add_peer() to send the SCTP_ADDR_ADDED event.
This event is described in rfc6458#section-6.1.2:
SCTP_ADDR_ADDED: The address is now part of the association.
Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Ilya Maximets [Wed, 9 Oct 2019 16:49:29 +0000 (18:49 +0200)]
libbpf: Fix passing uninitialized bytes to setsockopt
'struct xdp_umem_reg' has 4 bytes of padding at the end that make
valgrind complain about passing uninitialized stack memory to the
syscall:
  Syscall param socketcall.setsockopt() points to uninitialised byte(s)
     at 0x4E7AB7E: setsockopt (in /usr/lib64/libc-2.29.so)
     by 0x4BDE035: xsk_umem__create@@LIBBPF_0.0.4 (xsk.c:172)
   Uninitialised value was created by a stack allocation
     at 0x4BDDEBA: xsk_umem__create@@LIBBPF_0.0.4 (xsk.c:140)
The padding bytes appeared after the introduction of the new 'flags' field.
memset() is required to clear them.
====================
Fix BTF-to-C logic of handling padding at the end of a struct. Fix existing
test that should have captured this. Also move test_btf_dump into a test_progs
test to leverage common infrastructure.
====================
Andrii Nakryiko [Tue, 8 Oct 2019 23:10:08 +0000 (16:10 -0700)]
selftests/bpf: Fix btf_dump padding test case
The existing padding test case for btf_dump has a good test that was
supposed to test padding generation at the end of a struct, but its
expected output was specified incorrectly. Fix this.
Andrii Nakryiko [Tue, 8 Oct 2019 23:10:06 +0000 (16:10 -0700)]
libbpf: Fix struct end padding in btf_dump
Fix a case where explicit padding at the end of a struct is necessary
due to non-standard alignment requirements of fields (which BTF doesn't
capture explicitly).
Fixes: b653ca9bf78a ("libbpf: add btf_dump API for BTF-to-C conversion") Reported-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Tested-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20191008231009.2991130-2-andriin@fb.com
As noticed by Jakub, this is no longer needed after
commit be785e0ec91f ("tun: fix memory leak in error path").
Stop exporting dev_get_valid_name(), which was exported for the
exclusive use of the tun driver.
Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Jiri Pirko [Tue, 8 Oct 2019 11:01:51 +0000 (13:01 +0200)]
net: tipc: prepare attrs in __tipc_nl_compat_dumpit()
__tipc_nl_compat_dumpit() calls tipc_nl_publ_dump(), which expects
the attrs to be available through genl_dumpit_info(cb)->attrs. Add the
info struct and attr parsing to the compat dumpit function.
Reported-by: syzbot+8d37c50ffb0f52941a5e@syzkaller.appspotmail.com Fixes: 3a3cc4be2327 ("net: tipc: have genetlink code to parse the attrs during dumpit") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Jiri Pirko [Tue, 8 Oct 2019 10:31:43 +0000 (12:31 +0200)]
net: genetlink: always allocate separate attrs for dumpit ops
Individual dumpit ops (start, dumpit, done) are locked by genl_lock
if !family->parallel_ops. However, multiple
genl_family_rcv_msg_dumpit() calls may be in flight in parallel.
Each has a separate struct genl_dumpit_info allocated,
but they share the same family->attrbuf. Fix this by allocating separate
memory for the attrs of dumpit ops for non-parallel_ops families (for
parallel_ops this is already done).
Reported-by: syzbot+495688b736534bb6c6ad@syzkaller.appspotmail.com Reported-by: syzbot+ff59dc711f2cff879a05@syzkaller.appspotmail.com Reported-by: syzbot+dbe02e13bcce52bcf182@syzkaller.appspotmail.com Reported-by: syzbot+9cb7edb2906ea1e83006@syzkaller.appspotmail.com Fixes: aa97a7fc27d2 ("net: genetlink: parse attrs and store in contect info struct during dumpit") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Yunsheng Lin [Tue, 8 Oct 2019 01:20:09 +0000 (09:20 +0800)]
net: hns3: support tx-scatter-gather-fraglist feature
The hardware supports up to 8 TX BDs for a non-TSO skb and up to
63 TX BDs for a TSO skb. Currently, the hns3 driver supports RX skbs
with a fraglist when HW GRO is enabled; when the stack forwards such an
RX skb to another interface without TX fraglist support, it needs to
linearize the skb before sending it.
This patch adds support for TX fraglist. With this patch, throughput
for one iperf TCP stream in a forwarding test increases from 1 GByte
to 1.5 GByte. Also, the minimum number of BDs per ring is updated to
72 to support TX fraglist.
This patch also changes the error handling of some functions called
by hns3_fill_desc(), which now return the number of BDs when there is
no error, and renames some macros to more meaningful names.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Huazhong Tan [Tue, 8 Oct 2019 01:20:08 +0000 (09:20 +0800)]
net: hns3: add support for configuring VF MAC from the host
This patch adds support for configuring the VF MAC address from the
host in the HNS3 driver.
Also, the init parameter of hns3_init_mac_addr() is no longer
necessary, since the MAC address is not read from NCL_CONFIG
during reset, so it is removed; otherwise it would affect
the VF's MAC address initialization.
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>