git.baikalelectronics.ru Git

Merge branch 'inuse-cleanups'

Eric Dumazet says:

====================
net: prot_inuse and sock_inuse cleanups

Small series cleaning and optimizing sock_prot_inuse_add()
and sock_inuse_add().
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: drop nopreempt requirement on sock_prot_inuse_add()

This is distracting really, let's make this simpler,
because many callers had to take care of this
by themselves, even if on x86 this adds more
code than really needed.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: merge net->core.prot_inuse and net->core.sock_inuse

net->core.sock_inuse is a per cpu variable (int),
while net->core.prot_inuse is another per cpu variable
of 64 integers.

per cpu allocator tend to place them in very different places.

Grouping them together makes sense, since it makes
updates potentially faster, if hitting the same
cache line.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: make sock_inuse_add() available

MPTCP hard codes it, let us instead provide this helper.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: inline sock_prot_inuse_add()

sock_prot_inuse_add() is very small, we can inline it.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'gro-out-of-core-files'

Eric Dumazet says:

====================
gro: get out of core files

Move GRO related content into net/core/gro.c
and include/net/gro.h.

This reduces GRO scope to where it is really needed,
and shrinks too big files (include/linux/netdevice.h
and net/core/dev.c)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: gro: populate net/core/gro.c

Move gro code and data from net/core/dev.c to net/core/gro.c
to ease maintenance.

gro_normal_list() and gro_normal_one() are inlined
because they are called from both files.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: gro: move skb_gro_receive into net/core/gro.c

net/core/gro.c will contain all core gro functions,
to shrink net/core/skbuff.c and net/core/dev.c

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: gro: move skb_gro_receive_list to udp_offload.c

This helper is used once, no need to keep it in fat net/core/skbuff.c

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: move gro definitions to include/net/gro.h

include/linux/netdevice.h became too big, move gro stuff
into include/net/gro.h

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'tcp-optimizations'

Eric Dumazet says:

====================
tcp: optimizations for linux-5.17

Mostly small improvements in this series.

The notable change is in "defer skb freeing after
socket lock is released" in recvmsg() (and RX zerocopy)

The idea is to try to let skb freeing to BH handler,
whenever possible, or at least perform the freeing
outside of the socket lock section, for much improved
performance. This idea can probably be extended
to other protocols.

Tests on a 100Gbit NIC
Max throughput for one TCP_STREAM flow, over 10 runs.

MTU : 1500  (1428 bytes of TCP payload per MSS)
Before: 55 Gbit
After:  66 Gbit

MTU : 4096+ (4096 bytes of TCP payload, plus TCP/IPv6 headers)
Before: 82 Gbit
After:  95 Gbit
====================

Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: move early demux fields close to sk_refcnt

sk_rx_dst/sk_rx_dst_ifindex/sk_rx_dst_cookie are read in early demux,
and currently spans two cache lines.

Moving them close to sk_refcnt makes more sense, as only one cache
line is needed.

New layout for this hot cache line is :

struct sock {
struct sock_common         __sk_common;          /*     0  0x88 */
/* --- cacheline 2 boundary (128 bytes) was 8 bytes ago --- */
struct dst_entry *         sk_rx_dst;            /*  0x88   0x8 */
int                        sk_rx_dst_ifindex;    /*  0x90   0x4 */
u32                        sk_rx_dst_cookie;     /*  0x94   0x4 */
socket_lock_t              sk_lock;              /*  0x98  0x20 */
atomic_t                   sk_drops;             /*  0xb8   0x4 */
int                        sk_rcvlowat;          /*  0xbc   0x4 */
/* --- cacheline 3 boundary (192 bytes) --- */

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: do not call tcp_cleanup_rbuf() if we have a backlog

Under pressure, tcp recvmsg() has logic to process the socket backlog,
but calls tcp_cleanup_rbuf() right before.

Avoiding sending ACK right before processing new segments makes
a lot of sense, as this decrease the number of ACK packets,
with no impact on effective ACK clocking.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: check local var (timeo) before socket fields in one test

Testing timeo before sk_err/sk_state/sk_shutdown makes more sense.

Modern applications use non-blocking IO, while a socket is terminated
only once during its life time.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: defer skb freeing after socket lock is released

tcp recvmsg() (or rx zerocopy) spends a fair amount of time
freeing skbs after their payload has been consumed.

A typical ~64KB GRO packet has to release ~45 page
references, eventually going to page allocator
for each of them.

Currently, this freeing is performed while socket lock
is held, meaning that there is a high chance that
BH handler has to queue incoming packets to tcp socket backlog.

This can cause additional latencies, because the user
thread has to process the backlog at release_sock() time,
and while doing so, additional frames can be added
by BH handler.

This patch adds logic to defer these frees after socket
lock is released, or directly from BH handler if possible.

Being able to free these skbs from BH handler helps a lot,
because this avoids the usual alloc/free assymetry,
when BH handler and user thread do not run on same cpu or
NUMA node.

One cpu can now be fully utilized for the kernel->user copy,
and another cpu is handling BH processing and skb/page
allocs/frees (assuming RFS is not forcing use of a single CPU)

Tested:
100Gbit NIC
Max throughput for one TCP_STREAM flow, over 10 runs

MTU : 1500
Before: 55 Gbit
After: 66 Gbit

MTU : 4096+(headers)
Before: 82 Gbit
After: 95 Gbit

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: avoid indirect calls to sock_rfree

TCP uses sk_eat_skb() when skbs can be removed from receive queue.
However, the call to skb_orphan() from __kfree_skb() incurs
an indirect call so sock_rfee(), which is more expensive than
a direct call, especially for CONFIG_RETPOLINE=y.

Add tcp_eat_recv_skb() function to make the call before
__kfree_skb().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: tp->urg_data is unlikely to be set

Use some unlikely() hints in the fast path.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: annotate races around tp->urg_data

tcp_poll() and tcp_ioctl() are reading tp->urg_data without socket lock
owned.

Also, it is faster to first check tp->urg_data in tcp_poll(),
then tp->urg_seq == tp->copied_seq, because tp->urg_seq is
located in a different/cold cache line.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: annotate data-races on tp->segs_in and tp->data_segs_in

tcp_segs_in() can be called from BH, while socket spinlock
is held but socket owned by user, eventually reading these
fields from tcp_get_info()

Found by code inspection, no need to backport this patch
to older kernels.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: add RETPOLINE mitigation to sk_backlog_rcv

Use INDIRECT_CALL_INET() to avoid an indirect call
when/if CONFIG_RETPOLINE=y

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: small optimization in tcp recvmsg()

When reading large chunks of data, incoming packets might
be added to the backlog from BH.

tcp recvmsg() detects the backlog queue is not empty, and uses
a release_sock()/lock_sock() pair to process this backlog.

We now have __sk_flush_backlog() to perform this
a bit faster.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: cache align tcp_memory_allocated, tcp_sockets_allocated

tcp_memory_allocated and tcp_sockets_allocated often share
a common cache line, source of false sharing.

Also take care of udp_memory_allocated and mptcp_sockets_allocated.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: forward_alloc_get depends on CONFIG_MPTCP

(struct proto)->sk_forward_alloc is currently only used by MPTCP.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: shrink struct sock by 8 bytes

Move sk_bind_phc next to sk_peer_lock to fill a hole.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: shrink struct ipcm6_cookie

gso_size can be moved after tclass, to use an existing hole.
(8 bytes saved on 64bit arches)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: remove sk_route_nocaps

Instead of using a full netdev_features_t, we can use a single bit,
as sk_route_nocaps is only used to remove NETIF_F_GSO_MASK from
sk->sk_route_cap.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: remove sk_route_forced_caps

We were only using one bit, and we can replace it by sk_is_tcp()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: use sk_is_tcp() in more places

Move sk_is_tcp() to include/net/sock.h and use it where we can.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: small optimization in tcp_v6_send_check()

For TCP flows, inet6_sk(sk)->saddr has the same value
than sk->sk_v6_rcv_saddr.

Using sk->sk_v6_rcv_saddr increases data locality.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: remove dead code in __tcp_v6_send_check()

For some reason, I forgot to change __tcp_v6_send_check() at
the same time I removed (ip_summed == CHECKSUM_PARTIAL) check
in __tcp_v4_send_check()

Fixes: 72334e1f93f2 ("tcp: remove dead code after CHECKSUM_PARTIAL adoption")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: minor optimization in tcp_add_backlog()

If packet is going to be coalesced, sk_sndbuf/sk_rcvbuf values
are not used. Defer their access to the point we need them.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: macb: Fix several edge cases in validate

There were several cases where validate() would return bogus supported
modes with unusual combinations of interfaces and capabilities. For
example, if state->interface was 10GBASER and the macb had HIGH_SPEED
and PCS but not GIGABIT MODE, then 10/100 modes would be set anyway. In
another case, SGMII could be enabled even if the mac was not a GEM
(despite this being checked for later on in mac_config()). These
inconsistencies make it difficult to refactor this function cleanly.

There is still the open question of what exactly the requirements for
SGMII and 10GBASER are, and what SGMII actually supports. If someone
from Cadence (or anyone else with access to the GEM/MACB datasheet)
could comment on this, it would be greatly appreciated. In particular,
what is supported by Cadence vs. vendor extension/limitation?

To address this, the current logic is split into three parts. First, we
determine what we support, then we eliminate unsupported interfaces, and
finally we set the appropriate link modes. There is still some cruft
related to NA, but this can be removed in a future patch.

Signed-off-by: Sean Anderson <sean.anderson@seco.com>
Reviewed-by: Parshuram Thombare <pthombar@cadence.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/20211112190400.1937855-1-sean.anderson@seco.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2021-11-15

We've added 72 non-merge commits during the last 13 day(s) which contain
a total of 171 files changed, 2728 insertions(+), 1143 deletions(-).

The main changes are:

1) Add btf_type_tag attributes to bring kernel annotations like __user/__rcu to
   BTF such that BPF verifier will be able to detect misuse, from Yonghong Song.

2) Big batch of libbpf improvements including various fixes, future proofing APIs,
   and adding a unified, OPTS-based bpf_prog_load() low-level API, from Andrii Nakryiko.

3) Add ingress_ifindex to BPF_SK_LOOKUP program type for selectively applying the
   programmable socket lookup logic to packets from a given netdev, from Mark Pashmfouroush.

4) Remove the 128M upper JIT limit for BPF programs on arm64 and add selftest to
   ensure exception handling still works, from Russell King and Alan Maguire.

5) Add a new bpf_find_vma() helper for tracing to map an address to the backing
   file such as shared library, from Song Liu.

6) Batch of various misc fixes to bpftool, fixing a memory leak in BPF program dump,
   updating documentation and bash-completion among others, from Quentin Monnet.

7) Deprecate libbpf bpf_program__get_prog_info_linear() API and migrate its users as
   the API is heavily tailored around perf and is non-generic, from Dave Marchevsky.

8) Enable libbpf's strict mode by default in bpftool and add a --legacy option as an
   opt-out for more relaxed BPF program requirements, from Stanislav Fomichev.

9) Fix bpftool to use libbpf_get_error() to check for errors, from Hengqi Chen.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (72 commits)
  bpftool: Use libbpf_get_error() to check error
  bpftool: Fix mixed indentation in documentation
  bpftool: Update the lists of names for maps and prog-attach types
  bpftool: Fix indent in option lists in the documentation
  bpftool: Remove inclusion of utilities.mak from Makefiles
  bpftool: Fix memory leak in prog_dump()
  selftests/bpf: Fix a tautological-constant-out-of-range-compare compiler warning
  selftests/bpf: Fix an unused-but-set-variable compiler warning
  bpf: Introduce btf_tracing_ids
  bpf: Extend BTF_ID_LIST_GLOBAL with parameter for number of IDs
  bpftool: Enable libbpf's strict mode by default
  docs/bpf: Update documentation for BTF_KIND_TYPE_TAG support
  selftests/bpf: Clarify llvm dependency with btf_tag selftest
  selftests/bpf: Add a C test for btf_type_tag
  selftests/bpf: Rename progs/tag.c to progs/btf_decl_tag.c
  selftests/bpf: Test BTF_KIND_DECL_TAG for deduplication
  selftests/bpf: Add BTF_KIND_TYPE_TAG unit tests
  selftests/bpf: Test libbpf API function btf__add_type_tag()
  bpftool: Support BTF_KIND_TYPE_TAG
  libbpf: Support BTF_KIND_TYPE_TAG
  ...
====================

Link: https://lore.kernel.org/r/20211115162008.25916-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Revert "Merge branch 'mctp-i2c-driver'"

This reverts commit 6fa390a81f44a84575e558232afcbd2ceb1c2391, reversing
changes made to 47d95f289c07773b12e946055cac8942b31cb704.

Wolfram Sang says:

Please revert. Besides the driver in net, it modifies the I2C core
code. This has not been acked by the I2C maintainer (in this case me).
So, please don't pull this in via the net tree. The question raised here
(extending SMBus calls to 255 byte) is complicated because we need ABI
backwards compatibility.

Link: https://lore.kernel.org/all/YZJ9H4eM%2FM7OXVN0@shikoro/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'generic-phylink-validation'

Russell King says:

====================
introduce generic phylink validation

The various validate method implementations we have in phylink users
have been quite repetitive but also prone to bugs. These patches
introduce a generic implementation which relies solely on the
supported_interfaces bitmap introduced during last cycle, and in the
first patch, a bit array of MAC capabilities.

MAC drivers are free to continue to do their own thing if they have
special requirements - such as mvneta and mvpp2 which do not support
1000base-X without AN enabled. Most implementations currently in the
kernel can be converted to call phylink_generic_validate() directly
from the phylink MAC operations structure once they fill in the
supported_interfaces and mac_capabilities members of phylink_config.

This series introduces the generic implementation, and converts mvneta
and mvpp2 to use it.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvpp2: use phylink_generic_validate()

Convert mvpp2 to use phylink_generic_validate() for the bulk of its
validate() implementation. This network adapter has a restriction
that for 802.3z links, autonegotiation must be enabled.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvneta: use phylink_generic_validate()

Convert mvneta to use phylink_generic_validate() for the bulk of its
validate() implementation. This network adapter has a restriction
that for 802.3z links, autonegotiation must be enabled.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phylink: add generic validate implementation

Add a generic validate() implementation using the supported_interfaces
and a bitmask of MAC pause/speed/duplex capabilities. This allows us
to entirely eliminate many driver private validate() implementations.

We expose the underlying phylink_get_linkmodes() function so that
drivers which have special needs can still benefit from conversion.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/wan/fsl_ucc_hdlc: fix sparse warnings

CHECK   drivers/net/wan/fsl_ucc_hdlc.c
drivers/net/wan/fsl_ucc_hdlc.c:309:57: warning: incorrect type in argument 2 (different address spaces)
drivers/net/wan/fsl_ucc_hdlc.c:309:57:    expected void [noderef] __iomem *
drivers/net/wan/fsl_ucc_hdlc.c:309:57:    got restricted __be16 *
drivers/net/wan/fsl_ucc_hdlc.c:311:46: warning: incorrect type in argument 2 (different address spaces)
drivers/net/wan/fsl_ucc_hdlc.c:311:46:    expected void [noderef] __iomem *
drivers/net/wan/fsl_ucc_hdlc.c:311:46:    got restricted __be32 *
drivers/net/wan/fsl_ucc_hdlc.c:320:57: warning: incorrect type in argument 2 (different address spaces)
drivers/net/wan/fsl_ucc_hdlc.c:320:57:    expected void [noderef] __iomem *
drivers/net/wan/fsl_ucc_hdlc.c:320:57:    got restricted __be16 *
drivers/net/wan/fsl_ucc_hdlc.c:322:46: warning: incorrect type in argument 2 (different address spaces)
drivers/net/wan/fsl_ucc_hdlc.c:322:46:    expected void [noderef] __iomem *
drivers/net/wan/fsl_ucc_hdlc.c:322:46:    got restricted __be32 *
drivers/net/wan/fsl_ucc_hdlc.c:372:29: warning: incorrect type in assignment (different base types)
drivers/net/wan/fsl_ucc_hdlc.c:372:29:    expected unsigned short [usertype]
drivers/net/wan/fsl_ucc_hdlc.c:372:29:    got restricted __be16 [usertype]
drivers/net/wan/fsl_ucc_hdlc.c:379:36: warning: restricted __be16 degrades to integer
drivers/net/wan/fsl_ucc_hdlc.c:402:12: warning: incorrect type in assignment (different address spaces)
drivers/net/wan/fsl_ucc_hdlc.c:402:12:    expected struct qe_bd [noderef] __iomem *bd
drivers/net/wan/fsl_ucc_hdlc.c:402:12:    got struct qe_bd *curtx_bd
drivers/net/wan/fsl_ucc_hdlc.c:425:20: warning: incorrect type in assignment (different address spaces)
drivers/net/wan/fsl_ucc_hdlc.c:425:20:    expected struct qe_bd [noderef] __iomem *[assigned] bd
drivers/net/wan/fsl_ucc_hdlc.c:425:20:    got struct qe_bd *tx_bd_base
drivers/net/wan/fsl_ucc_hdlc.c:427:16: error: incompatible types in comparison expression (different address spaces):
drivers/net/wan/fsl_ucc_hdlc.c:427:16:    struct qe_bd [noderef] __iomem *
drivers/net/wan/fsl_ucc_hdlc.c:427:16:    struct qe_bd *
drivers/net/wan/fsl_ucc_hdlc.c:462:33: warning: incorrect type in argument 1 (different address spaces)
drivers/net/wan/fsl_ucc_hdlc.c:506:41: warning: incorrect type in argument 1 (different address spaces)
drivers/net/wan/fsl_ucc_hdlc.c:528:33: warning: incorrect type in argument 1 (different address spaces)
drivers/net/wan/fsl_ucc_hdlc.c:552:38: warning: incorrect type in argument 1 (different address spaces)
drivers/net/wan/fsl_ucc_hdlc.c:596:67: warning: incorrect type in argument 2 (different address spaces)
drivers/net/wan/fsl_ucc_hdlc.c:611:41: warning: incorrect type in argument 1 (different address spaces)
drivers/net/wan/fsl_ucc_hdlc.c:851:38: warning: incorrect type in initializer (different address spaces)
drivers/net/wan/fsl_ucc_hdlc.c:854:40: warning: incorrect type in argument 1 (different address spaces)
drivers/net/wan/fsl_ucc_hdlc.c:855:40: warning: incorrect type in argument 1 (different address spaces)
drivers/net/wan/fsl_ucc_hdlc.c:858:39: warning: incorrect type in argument 1 (different address spaces)
drivers/net/wan/fsl_ucc_hdlc.c:861:37: warning: incorrect type in argument 2 (different address spaces)
drivers/net/wan/fsl_ucc_hdlc.c:866:38: warning: incorrect type in initializer (different address spaces)
drivers/net/wan/fsl_ucc_hdlc.c:868:21: warning: incorrect type in argument 1 (different address spaces)
drivers/net/wan/fsl_ucc_hdlc.c:870:40: warning: incorrect type in argument 2 (different address spaces)
drivers/net/wan/fsl_ucc_hdlc.c:871:40: warning: incorrect type in argument 2 (different address spaces)
drivers/net/wan/fsl_ucc_hdlc.c:873:39: warning: incorrect type in argument 2 (different address spaces)
drivers/net/wan/fsl_ucc_hdlc.c:993:57: warning: incorrect type in argument 2 (different address spaces)
drivers/net/wan/fsl_ucc_hdlc.c:995:46: warning: incorrect type in argument 2 (different address spaces)
drivers/net/wan/fsl_ucc_hdlc.c:1004:57: warning: incorrect type in argument 2 (different address spaces)
drivers/net/wan/fsl_ucc_hdlc.c:1006:46: warning: incorrect type in argument 2 (different address spaces)
drivers/net/wan/fsl_ucc_hdlc.c:412:35: warning: dereference of noderef expression
drivers/net/wan/fsl_ucc_hdlc.c:412:35: warning: dereference of noderef expression
drivers/net/wan/fsl_ucc_hdlc.c:724:29: warning: dereference of noderef expression
drivers/net/wan/fsl_ucc_hdlc.c:815:21: warning: dereference of noderef expression
drivers/net/wan/fsl_ucc_hdlc.c:1021:29: warning: dereference of noderef expression

Most of the warnings are due to DMA memory being incorrectly handled as IO memory.
Fix it by doing direct read/write and doing proper dma_rmb() / dma_wmb().

Other problems are type mismatches or lack of use of IO accessors.

Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lkml.org/lkml/2021/11/12/647
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fddi: use swap() to make code cleaner

Use the macro 'swap()' defined in 'include/linux/minmax.h' to avoid
opencoding it.

Signed-off-by: Yihao Han <hanyihao@vivo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

hinic: use ARRAY_SIZE instead of ARRAY_LEN

ARRAY_SIZE defined in <linux/kernel.h> is safer than self-defined
macros to get size of an array such as ARRAY_LEN used here. Because
ARRAY_SIZE uses __must_be_array(arr) to ensure arr is really an array.

Reported-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Guo Zhengkui <guozhengkui@vivo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: usb: ax88179_178a: add TSO feature

On low-effciency embedded platforms, transmission performance is poor
due to on Bulk-out with single packet.
Adding TSO feature improves the transmission performance and reduces
the number of interrupt caused by Bulk-out complete.

Reference to module, net: usb: aqc111.

Signed-off-by: Jacky Chou <jackychou@asix.com.tw>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mctp-i2c-driver'

Matt Johnston says:

====================
MCTP I2C driver

This patch series adds a netdev driver providing MCTP transport over
I2C.

It applies against net-next using recent MCTP changes there, though also
has I2C core changes for review. I'll leave it to maintainers where it
should be applied - please let me know if it needs to be submitted
differently.

The I2C patches were previously sent as RFC though the only feedback
there was an ack to 255 bytes for aspeed.

The dt-bindings patch went through review on the list.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

mctp i2c: MCTP I2C binding driver

Provides MCTP network transport over an I2C bus, as specified in
DMTF DSP0237. All messages between nodes are sent as SMBus Block Writes.

Each I2C bus to be used for MCTP is flagged in devicetree by a
'mctp-controller' property on the bus node. Each flagged bus gets a
mctpi2cX net device created based on the bus number. A
'mctp-i2c-controller' I2C client needs to be added under the adapter. In
an I2C mux situation the mctp-i2c-controller node must be attached only
to the root I2C bus. The I2C client will handle incoming I2C slave block
write data for subordinate busses as well as its own bus.

In configurations without devicetree a driver instance can be attached
to a bus using the I2C slave new_device mechanism.

The MCTP core will hold/release the MCTP I2C device while responses
are pending (a 6 second timeout or once a socket is closed, response
received etc). While held the MCTP I2C driver will lock the I2C bus so
that the correct I2C mux remains selected while responses are received.

(Ideally we would just lock the mux to keep the current bus selected for
the response rather than a full I2C bus lock, but that isn't exposed in
the I2C mux API)

This driver requires I2C adapters that allow 255 byte transfers
(SMBus 3.0) as the specification requires a minimum MTU of 68 bytes.

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

dt-bindings: net: New binding mctp-i2c-controller

Used to define a local endpoint to communicate with MCTP peripherals
attached to an I2C bus. This I2C endpoint can communicate with remote
MCTP devices on the I2C bus.

In the example I2C topology below (matching the second yaml example) we
have MCTP devices on busses i2c1 and i2c6. MCTP-supporting busses are
indicated by the 'mctp-controller' DT property on an I2C bus node.

A mctp-i2c-controller I2C client DT node is placed at the top of the
mux topology, since only the root I2C adapter will support I2C slave
functionality.
                                               .-------.
                                               |eeprom |
    .------------.     .------.               /'-------'
    | adapter    |     | mux  --@0,i2c5------'
    | i2c1       ----.*|      --@1,i2c6--.--.
    |............|    \'------'           \  \  .........
    | mctp-i2c-  |     \                   \  \ .mctpB  .
    | controller |      \                   \  '.0x30   .
    |            |       \  .........        \  '.......'
    | 0x50       |        \ .mctpA  .         \ .........
    '------------'         '.0x1d   .          '.mctpC  .
                            '.......'          '.0x31   .
                                                '.......'
(mctpX boxes above are remote MCTP devices not included in the DT at
present, they can be hotplugged/probed at runtime. A DT binding for
specific fixed MCTP devices could be added later if required)

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

i2c: npcm7xx: Allow 255 byte block SMBus transfers

255 byte support has been tested on a npcm750 board

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Reviewed-by: Tali Perry <tali.perry1@gmail.com>
Reviewed-by: Patrick Venture <venture@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

i2c: aspeed: Allow 255 byte block transfers

255 byte transfers have been tested on an AST2500 board

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Reviewed-by: Brendan Higgins <brendanhiggins@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

i2c: dev: Handle 255 byte blocks for i2c ioctl

I2C_SMBUS is limited to 32 bytes due to compatibility with the
32 byte i2c_smbus_data.block

I2C_RDWR allows larger transfers if sufficient sized buffers are passed.

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

i2c: core: Allow 255 byte transfers for SMBus 3.x

SMBus 3.0 increased the maximum block transfer size from 32 bytes to
255 bytes. We increase the size of struct i2c_smbus_data's block[]
member.

i2c_smbus_xfer() and i2c_smbus_xfer_emulated() now support 255 byte
block operations, other block functions remain limited to 32 bytes for
compatibility with existing callers.

We allow adapters to indicate support for the larger size with
I2C_FUNC_SMBUS_V3_BLOCK. Most emulated drivers should be able to use 255
byte blocks by replacing I2C_SMBUS_BLOCK_MAX with I2C_SMBUS_V3_BLOCK_MAX
though some will have hardware limitations that need testing.

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bridge: Slightly optimize 'find_portno()'

The 'inuse' bitmap is local to this function. So we can use the
non-atomic '__set_bit()' to save a few cycles.

While at it, also remove some useless {}.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sched: sch_netem: Refactor code in 4-state loss generator

Fixed comments to match description with variable names and
refactored code to match the convention as per [1].

To match the convention mapping is done as follows:
State 3 - LOST_IN_BURST_PERIOD
State 4 - LOST_IN_GAP_PERIOD

[1] S. Salsano, F. Ludovici, A. Ordine, "Definition of a general
and intuitive loss model for packet networks and its implementation
in the Netem module in the Linux kernel"

Fixes: 2aa3f4f94298 ("sch_netem: replace magic numbers with enumerate")
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: vsc73xxx: Make vsc73xx_remove() return void

vsc73xx_remove() returns zero unconditionally and no caller checks the
returned value. So convert the function to return no value.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: stmmac: enhance XDP ZC driver level switching performance

The previous stmmac_xdp_set_prog() implementation uses stmmac_release()
and stmmac_open() which tear down the PHY device and causes undesirable
autonegotiation which causes a delay whenever AFXDP ZC is setup.

This patch introduces two new functions that just sufficiently tear
down DMA descriptors, buffer, NAPI process, and IRQs and reestablish
them accordingly in both stmmac_xdp_release() and stammac_xdp_open().

As the results of this enhancement, we get rid of transient state
introduced by the link auto-negotiation:

$ ./xdpsock -i eth0 -t -z

sock0@eth0:0 txonly xdp-drv
                   pps            pkts           1.00
rx                 0              0
tx                 634444         634560

sock0@eth0:0 txonly xdp-drv
                   pps            pkts           1.00
rx                 0              0
tx                 632330         1267072

sock0@eth0:0 txonly xdp-drv
                   pps            pkts           1.00
rx                 0              0
tx                 632438         1899584

sock0@eth0:0 txonly xdp-drv
                   pps            pkts           1.00
rx                 0              0
tx                 632502         2532160

Reported-by: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Tested-by: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpftool: Use libbpf_get_error() to check error

Currently, LIBBPF_STRICT_ALL mode is enabled by default for
bpftool which means on error cases, some libbpf APIs would
return NULL pointers. This makes IS_ERR check failed to detect
such cases and result in segfault error. Use libbpf_get_error()
instead like we do in libbpf itself.

Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211115012436.3143318-1-hengqi.chen@gmail.com

Merge branch 'bpftool: miscellaneous fixes'

Quentin Monnet says:

====================

This set contains several independent minor fixes for bpftool, its
Makefile, and its documentation. Please refer to individual commits for
details.
====================

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>

bpftool: Fix mixed indentation in documentation

Some paragraphs in bpftool's documentation have a mix of tabs and spaces
for indentation. Let's make it consistent.

This patch brings no change to the text content.

Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211110114632.24537-7-quentin@isovalent.com

bpftool: Update the lists of names for maps and prog-attach types

To support the different BPF map or attach types, bpftool must remain
up-to-date with the types supported by the kernel. Let's update the
lists, by adding the missing Bloom filter map type and the perf_event
attach type.

Both missing items were found with test_bpftool_synctypes.py.

Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211110114632.24537-6-quentin@isovalent.com

bpftool: Fix indent in option lists in the documentation

Mixed indentation levels in the lists of options in bpftool's
documentation produces some unexpected results. For the "bpftool" man
page, it prints a warning:

    $ make -C bpftool.8
      GEN     bpftool.8
    <stdin>:26: (ERROR/3) Unexpected indentation.

For other pages, there is no warning, but it results in a line break
appearing in the option lists in the generated man pages.

RST paragraphs should have a uniform indentation level. Let's fix it.

Fixes: 310c7809be34 ("tools: bpftool: Update and synchronise option list in doc and help msg")
Fixes: 89867d944072 ("tools: bpftool: Document and add bash completion for -L, -B options")
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211110114632.24537-5-quentin@isovalent.com

bpftool: Remove inclusion of utilities.mak from Makefiles

Bpftool's Makefile, and the Makefile for its documentation, both include
scripts/utilities.mak, but they use none of the items defined in this
file. Remove the includes.

Fixes: 23857202cde3 ("tools: bpf: add bpftool")
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211110114632.24537-3-quentin@isovalent.com

bpftool: Fix memory leak in prog_dump()

Following the extraction of prog_dump() from do_dump(), the struct btf
allocated in prog_dump() is no longer freed on error; the struct
bpf_prog_linfo is not freed at all. Make sure we release them before
exiting the function.

Fixes: aeeada59ce26 ("bpftool: Match several programs with same tag")
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211110114632.24537-2-quentin@isovalent.com

ipv6: Remove duplicate statements

This statement is repeated with the initialization statement

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: luo penghao <luo.penghao@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Remove duplicate assignments

there is a same action when the variable is initialized

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: luo penghao <luo.penghao@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: drop unused assignment

The assignment in the if statement will be overwritten by the
following statement

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: luo penghao <luo.penghao@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests/bpf: Fix a tautological-constant-out-of-range-compare compiler warning

When using clang to build selftests with LLVM=1 in make commandline,
I hit the following compiler warning:

  benchs/bench_bloom_filter_map.c:84:46: warning: result of comparison of constant 256
    with expression of type '__u8' (aka 'unsigned char') is always false
    [-Wtautological-constant-out-of-range-compare]
                if (args.value_size < 2 || args.value_size > 256) {
                                           ~~~~~~~~~~~~~~~ ^ ~~~

The reason is arg.vaue_size has type __u8, so comparison "args.value_size > 256"
is always false.

This patch fixed the issue by doing proper comparison before assigning the
value to args.value_size. The patch also fixed the same issue in two
other places.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211112204838.3579953-1-yhs@fb.com

selftests/bpf: Fix an unused-but-set-variable compiler warning

When using clang to build selftests with LLVM=1 in make commandline,
I hit the following compiler warning:
  xdpxceiver.c:747:6: warning: variable 'total' set but not used [-Wunused-but-set-variable]
          u32 total = 0;
              ^

This patch fixed the issue by removing that declaration and its
assocatied unused operation.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211112204833.3579457-1-yhs@fb.com

Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto fix from Herbert Xu:
"This fixes a boot crash regression"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: api - Fix boot-up crash when crypto manager is disabled

Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull more SCSI updates from James Bottomley:
"This series is all the stragglers that didn't quite make the first
  merge window pull. It's mostly minor updates and bug fixes of merge
  window code but it also has two driver updates: ufs and qla2xxx"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (46 commits)
  scsi: scsi_debug: Don't call kcalloc() if size arg is zero
  scsi: core: Remove command size deduction from scsi_setup_scsi_cmnd()
  scsi: scsi_ioctl: Validate command size
  scsi: ufs: ufshpb: Properly handle max-single-cmd
  scsi: core: Avoid leaving shost->last_reset with stale value if EH does not run
  scsi: bsg: Fix errno when scsi_bsg_register_queue() fails
  scsi: sr: Remove duplicate assignment
  scsi: ufs: ufs-exynos: Introduce ExynosAuto v9 virtual host
  scsi: ufs: ufs-exynos: Multi-host configuration for ExynosAuto v9
  scsi: ufs: ufs-exynos: Support ExynosAuto v9 UFS
  scsi: ufs: ufs-exynos: Add pre/post_hce_enable drv callbacks
  scsi: ufs: ufs-exynos: Factor out priv data init
  scsi: ufs: ufs-exynos: Add EXYNOS_UFS_OPT_SKIP_CONFIG_PHY_ATTR option
  scsi: ufs: ufs-exynos: Support custom version of ufs_hba_variant_ops
  scsi: ufs: ufs-exynos: Add setup_clocks callback
  scsi: ufs: ufs-exynos: Add refclkout_stop control
  scsi: ufs: ufs-exynos: Simplify drv_data retrieval
  scsi: ufs: ufs-exynos: Change pclk available max value
  scsi: ufs: Add quirk to enable host controller without PH configuration
  scsi: ufs: Add quirk to handle broken UIC command
  ...

Merge tag 'pwm/for-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm

Pull pwm updates from Thierry Reding:
"This set is mostly small fixes and cleanups, so more of a janitorial
  update for this cycle"

* tag 'pwm/for-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm:
  pwm: vt8500: Rename pwm_busy_wait() to make it obviously driver-specific
  dt-bindings: pwm: tpu: Add R-Car M3-W+ device tree bindings
  dt-bindings: pwm: tpu: Add R-Car V3U device tree bindings
  pwm: pwm-samsung: Trigger manual update when disabling PWM
  pwm: visconti: Simplify using devm_pwmchip_add()
  pwm: samsung: Describe driver in Kconfig
  pwm: Make it explicit that pwm_apply_state() might sleep
  pwm: Add might_sleep() annotations for !CONFIG_PWM API functions
  pwm: atmel: Drop unused header

Merge tag 'sound-fix-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"A collection of fixes for 5.16-rc1, notably for a few regressions that
  were found in 5.15 and pre-rc1:

   - revert of the unification of SG-buffer helper functions on x86 and
     the relevant fix

   - regression fixes for mmap after the recent code refactoring

   - two NULL dereference fixes in HD-audio controller driver

   - UAF fixes in ALSA timer core

   - a few usual HD-audio and FireWire quirks"

* tag 'sound-fix-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: fireworks: add support for Loud Onyx 1200f quirk
  ALSA: hda: fix general protection fault in azx_runtime_idle
  ALSA: hda: Free card instance properly at probe errors
  ALSA: hda/realtek: Add quirk for HP EliteBook 840 G7 mute LED
  ALSA: memalloc: Remove a stale comment
  ALSA: synth: missing check for possible NULL after the call to kstrdup
  ALSA: memalloc: Use proper SG helpers for noncontig allocations
  ALSA: pci: rme: Fix unaligned buffer addresses
  ALSA: firewire-motu: add support for MOTU Track 16
  ALSA: PCM: Fix NULL dereference at mmap checks
  ALSA: hda/realtek: Add quirk for ASUS UX550VE
  ALSA: timer: Unconditionally unlink slave instances, too
  ALSA: memalloc: Catch call with NULL snd_dma_buffer pointer
  Revert "ALSA: memalloc: Convert x86 SG-buffer handling with non-contiguous type"
  ALSA: hda/realtek: Add a quirk for Acer Spin SP513-54N
  ALSA: firewire-motu: add support for MOTU Traveler mk3
  ALSA: hda/realtek: Headset fixup for Clevo NH77HJQ
  ALSA: timer: Fix use-after-free problem

Merge tag 'drm-next-2021-11-12' of git://anongit.freedesktop.org/drm/drm

Pull more drm updates from Dave Airlie:
"I missed a drm-misc-next pull for the main pull last week. It wasn't
  that major and isn't the bulk of this at all. This has a bunch of
  fixes all over, a lot for amdgpu and i915.

  bridge:
   - HPD improvments for lt9611uxc
   - eDP aux-bus support for ps8640
   - LVDS data-mapping selection support

  ttm:
   - remove huge page functionality (needs reworking)
   - fix a race condition during BO eviction

  panels:
   - add some new panels

  fbdev:
   - fix double-free
   - remove unused scrolling acceleration
   - CONFIG_FB dep improvements

  locking:
   - improve contended locking logging
   - naming collision fix

  dma-buf:
   - add dma_resv_for_each_fence iterator
   - fix fence refcounting bug
   - name locking fixesA

  prime:
   - fix object references during mmap

  nouveau:
   - various code style changes
   - refcount fix
   - device removal fixes
   - protect client list with a mutex
   - fix CE0 address calculation

  i915:
   - DP rates related fixes
   - Revert disabling dual eDP that was causing state readout problems
   - put the cdclk vtables in const data
   - Fix DVO port type for older platforms
   - Fix blankscreen by turning DP++ TMDS output buffers on encoder->shutdown
   - CCS FBs related fixes
   - Fix recursive lock in GuC submission
   - Revert guc_id from i915_request tracepoint
   - Build fix around dmabuf

  amdgpu:
   - GPU reset fix
   - Aldebaran fix
   - Yellow Carp fixes
   - DCN2.1 DMCUB fix
   - IOMMU regression fix for Picasso
   - DSC display fixes
   - BPC display calculation fixes
   - Other misc display fixes
   - Don't allow partial copy from user for DC debugfs
   - SRIOV fixes
   - GFX9 CSB pin count fix
   - Various IP version check fixes
   - DP 2.0 fixes
   - Limit DCN1 MPO fix to DCN1

  amdkfd:
   - SVM fixes
   - Fix gfx version for renoir
   - Reset fixes

  udl:
   - timeout fix

  imx:
   - circular locking fix

  virtio:
   - NULL ptr deref fix"

* tag 'drm-next-2021-11-12' of git://anongit.freedesktop.org/drm/drm: (126 commits)
  drm/ttm: Double check mem_type of BO while eviction
  drm/amdgpu: add missed support for UVD IP_VERSION(3, 0, 64)
  drm/amdgpu: drop jpeg IP initialization in SRIOV case
  drm/amd/display: reject both non-zero src_x and src_y only for DCN1x
  drm/amd/display: Add callbacks for DMUB HPD IRQ notifications
  drm/amd/display: Don't lock connection_mutex for DMUB HPD
  drm/amd/display: Add comment where CONFIG_DRM_AMD_DC_DCN macro ends
  drm/amdkfd: Fix retry fault drain race conditions
  drm/amdkfd: lower the VAs base offset to 8KB
  drm/amd/display: fix exit from amdgpu_dm_atomic_check() abruptly
  drm/amd/amdgpu: fix the kfd pre_reset sequence in sriov
  drm/amdgpu: fix uvd crash on Polaris12 during driver unloading
  drm/i915/adlp/fb: Prevent the mapping of redundant trailing padding NULL pages
  drm/i915/fb: Fix rounding error in subsampled plane size calculation
  drm/i915/hdmi: Turn DP++ TMDS output buffers back on in encoder->shutdown()
  drm/locking: fix __stack_depot_* name conflict
  drm/virtio: Fix NULL dereference error in virtio_gpu_poll
  drm/amdgpu: fix SI handling in amdgpu_device_asic_has_dc_support()
  drm/amdgpu: Fix dangling kfd_bo pointer for shared BOs
  drm/amd/amdkfd: Don't sent command to HWS on kfd reset
  ...

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input

Pull input updates from Dmitry Torokhov:
"Just one new driver (Cypress StreetFighter touchkey), and no input
  core changes this time.

  Plus various fixes and enhancements to existing drivers"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (54 commits)
  Input: iforce - fix control-message timeout
  Input: wacom_i2c - use macros for the bit masks
  Input: ili210x - reduce sample period to 15ms
  Input: ili210x - improve polled sample spacing
  Input: ili210x - special case ili251x sample read out
  Input: elantench - fix misreporting trackpoint coordinates
  Input: synaptics-rmi4 - Fix device hierarchy
  Input: i8042 - Add quirk for Fujitsu Lifebook T725
  Input: cap11xx - add support for cap1206
  Input: remove unused header <linux/input/cy8ctmg110_pdata.h>
  Input: ili210x - add ili251x firmware update support
  Input: ili210x - export ili251x version details via sysfs
  Input: ili210x - use resolution from ili251x firmware
  Input: pm8941-pwrkey - respect reboot_mode for warm reset
  reboot: export symbol 'reboot_mode'
  Input: max77693-haptic - drop unneeded MODULE_ALIAS
  Input: cpcap-pwrbutton - do not set input parent explicitly
  Input: max8925_onkey - don't mark comment as kernel-doc
  Input: ads7846 - do not attempt IRQ workaround when deferring probe
  Input: ads7846 - use input_set_capability()
  ...

Merge tag 'rtc-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux

Pull RTC updates from Alexandre Belloni:
"This includes new ioctls to get and set parameters and in particular
  the backup switch mode that is needed for some RTCs to actually enable
  the backup voltage (and have a useful RTC).

  The same interface can also be used to get the actual features
  supported by the RTC so userspace has a better way than trying and
  failing.

  Summary:

  Subsystem:
   - Add new ioctl to get and set extra RTC parameters, this includes
     backup switch mode
   - Expose available features to userspace, in particular, when alarmas
     have a resolution of one minute instead of a second.
   - Let the core handle those alarms with a minute resolution

  New driver:
   - MSTAR MSC313 RTC

  Drivers:
   - Add SPI ID table where necessary
   - Add BSM support for rv3028, rv3032 and pcf8523
   - s3c: set RTC range
   - rx8025: set range, implement .set_offset and .read_offset"

* tag 'rtc-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (50 commits)
  rtc: rx8025: use .set_offset/.read_offset
  rtc: rx8025: use rtc_add_group
  rtc: rx8025: clear RTC_FEATURE_ALARM when alarm are not supported
  rtc: rx8025: set range
  rtc: rx8025: let the core handle the alarm resolution
  rtc: rx8025: switch to devm_rtc_allocate_device
  rtc: ab8500: let the core handle the alarm resolution
  rtc: ab-eoz9: support UIE when available
  rtc: ab-eoz9: use RTC_FEATURE_UPDATE_INTERRUPT
  rtc: rv3032: let the core handle the alarm resolution
  rtc: s35390a: let the core handle the alarm resolution
  rtc: handle alarms with a minute resolution
  rtc: pcf85063: silence cppcheck warning
  rtc: rv8803: fix writing back ctrl in flag register
  rtc: s3c: Add time range
  rtc: s3c: Extract read/write IO into separate functions
  rtc: s3c: Remove usage of devm_rtc_device_register()
  rtc: tps80031: Remove driver
  rtc: sun6i: Allow probing without an early clock provider
  rtc: pcf8523: add BSM support
  ...

Merge tag 'libata-5.16-rc1-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata

Pull more libata updates from Damien Le Moal:
"Second round of updates for libata for 5.16:

   - Fix READ LOG EXT and READ LOG DMA EXT command timeouts during disk
     revalidation after a resume or a modprobe of the LLDD (me)

   - Remove unnecessary error message in sata_highbank driver (Xu)

   - Better handling of accesses to the IDENTIFY DEVICE data log for
     drives that do not support this log page (me)

   - Fix ahci_shost_attr_group declaration in ahci driver (me)"

* tag 'libata-5.16-rc1-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
  libata: libahci: declare ahci_shost_attr_group as static
  libata: add horkage for missing Identify Device log
  ata: sata_highbank: Remove unnecessary print function dev_err()
  libata: fix read log timeout value

tools/lib/lockdep: drop liblockdep

TL;DR: While a tool like liblockdep is useful, it probably doesn't
belong within the kernel tree.

liblockdep attempts to reuse kernel code both directly (by directly
building the kernel's lockdep code) as well as indirectly (by using
sanitized headers). This makes liblockdep an integral part of the
kernel.

It also makes liblockdep quite unique: while other userspace code might
use sanitized headers, it generally doesn't attempt to use kernel code
directly which means that changes on the kernel side of things don't
affect (and break) it directly.

All our workflows and tooling around liblockdep don't support this
uniqueness. Changes that go into the kernel code aren't validated to not
break in-tree userspace code.

liblockdep ended up being very fragile, breaking over and over, to the
point that living in the same tree as the lockdep code lost most of it's
value.

liblockdep should continue living in an external tree, syncing with
the kernel often, in a controllable way.

Signed-off-by: Sasha Levin <sashal@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

thermal: int340x: fix build on 32-bit targets

Commit 9ebf1a5db1fc ("thermal/drivers/int340x: processor_thermal: Suppot
64 bit RFIM responses") started using 'readq()' to read 64-bit status
responses from the int340x hardware.

That's all fine and good, but on 32-bit targets a 64-bit 'readq()' is
ambiguous, since it's no longer an atomic access. Some hardware might
require 64-bit accesses, and other hardware might want low word first or
high word first.

It's quite likely that the driver isn't relevant in a 32-bit environment
any more, and there's a patch floating around to just make it depend on
X86_64, but let's make it buildable on x86-32 anyway.

The driver previously just read the low 32 bits, so the hardware
certainly is ok with 32-bit reads, and in a little-endian environment
the low word first model is the natural one.

So just add the include for the 'io-64-nonatomic-lo-hi.h' version.

Fixes: 9ebf1a5db1fc ("thermal/drivers/int340x: processor_thermal: Suppot 64 bit RFIM responses")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'introduce btf_tracing_ids'

Song Liu says:

====================

Changes v2 => v3:
1. Fix bug in task_iter.c. (Kernel test robot <lkp@intel.com>)

Changes v1 => v2:
1. Add patch 2/2. (Alexei)

1/2 fixes issue with btf_task_struct_ids w/o CONFIG_DEBUG_INFO_BTF.
2/2 replaces btf_task_struct_ids with easier to understand btf_tracing_ids.
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Introduce btf_tracing_ids

Similar to btf_sock_ids, btf_tracing_ids provides btf ID for task_struct,
file, and vm_area_struct via easy to understand format like
btf_tracing_ids[BTF_TRACING_TYPE_[TASK|file|VMA]].

Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20211112150243.1270987-3-songliubraving@fb.com

bpf: Extend BTF_ID_LIST_GLOBAL with parameter for number of IDs

syzbot reported the following BUG w/o CONFIG_DEBUG_INFO_BTF

BUG: KASAN: global-out-of-bounds in task_iter_init+0x212/0x2e7 kernel/bpf/task_iter.c:661
Read of size 4 at addr ffffffff90297404 by task swapper/0/1

CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.15.0-syzkaller #0
Hardware name: ... Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description.constprop.0.cold+0xf/0x309 mm/kasan/report.c:256
__kasan_report mm/kasan/report.c:442 [inline]
kasan_report.cold+0x83/0xdf mm/kasan/report.c:459
task_iter_init+0x212/0x2e7 kernel/bpf/task_iter.c:661
do_one_initcall+0x103/0x650 init/main.c:1295
do_initcall_level init/main.c:1368 [inline]
do_initcalls init/main.c:1384 [inline]
do_basic_setup init/main.c:1403 [inline]
kernel_init_freeable+0x6b1/0x73a init/main.c:1606
kernel_init+0x1a/0x1d0 init/main.c:1497
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
</TASK>

This is caused by hard-coded name[1] in BTF_ID_LIST_GLOBAL (w/o
CONFIG_DEBUG_INFO_BTF). Fix this by adding a parameter n to
BTF_ID_LIST_GLOBAL. This avoids ifdef CONFIG_DEBUG_INFO_BTF in btf.c and
filter.c.

Fixes: 223fe5821395 ("bpf: Introduce helper bpf_find_vma")
Reported-by: syzbot+e0d81ec552a21d9071aa@syzkaller.appspotmail.com
Reported-by: Eric Dumazet <edumazet@google.com>
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20211112150243.1270987-2-songliubraving@fb.com

bpftool: Enable libbpf's strict mode by default

Otherwise, attaching with bpftool doesn't work with strict section names.

Also:

  - Add --legacy option to switch back to pre-1.0 behavior
  - Print a warning when program fails to load in strict mode to
    point to --legacy flag
  - By default, don't append / to the section name; in strict
    mode it's relevant only for a small subset of prog types

+ bpftool --legacy prog loadall tools/testing/selftests/bpf/test_cgroup_link.o /sys/fs/bpf/kprobe type kprobe
libbpf: failed to pin program: File exists
Error: failed to pin all programs
+ bpftool prog loadall tools/testing/selftests/bpf/test_cgroup_link.o /sys/fs/bpf/kprobe type kprobe

v1 -> v2:
  - strict by default (Quentin Monnet)
  - add more info to --legacy description (Quentin Monnet)
  - add bash completion (Quentin Monnet)

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20211110192324.920934-1-sdf@google.com

Merge branch 'next' into for-linus

Prepare input updates for 5.16 merge window.

Merge tag 'drm-misc-fixes-2021-11-11' of git://anongit.freedesktop.org/drm/drm-misc into drm-next

* dma-buf: name_lock fixes
* prime: Keep object ref during mmap
* nouveau: Fix a refcount issue; Fix device removal; Protect client
list with dedicated mutex; Fix address CE0 address calculation
* ttm: Fix race condition during BO eviction

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/YYzY6jeox9EeI15i@linux-uq9g.fritz.box

Merge branch 'Support BTF_KIND_TYPE_TAG for btf_type_tag attributes'

Yonghong Song says:

====================

LLVM patches ([1] for clang, [2] and [3] for BPF backend)
added support for btf_type_tag attributes. This patch
added support for the kernel.

The main motivation for btf_type_tag is to bring kernel
annotations __user, __rcu etc. to btf. With such information
available in btf, bpf verifier can detect mis-usages
and reject the program. For example, for __user tagged pointer,
developers can then use proper helper like bpf_probe_read_kernel()
etc. to read the data.

BTF_KIND_TYPE_TAG may also useful for other tracing
facility where instead of to require user to specify
kernel/user address type, the kernel can detect it
by itself with btf.

Patch 1 added support in kernel, Patch 2 for libbpf and Patch 3
for bpftool. Patches 4-9 are for bpf selftests and Patch 10
updated docs/bpf/btf.rst file with new btf kind.

  [1] https://reviews.llvm.org/D111199
  [2] https://reviews.llvm.org/D113222
  [3] https://reviews.llvm.org/D113496

Changelogs:
  v2 -> v3:
    - rebase to resolve merge conflicts.
  v1 -> v2:
    - add more dedup tests.
    - remove build requirement for LLVM=1.
    - remove testing macro __has_attribute in bpf programs
      as it is always defined in recent clang compilers.
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>

docs/bpf: Update documentation for BTF_KIND_TYPE_TAG support

Add BTF_KIND_TYPE_TAG documentation in btf.rst.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211112012656.1509082-1-yhs@fb.com

selftests/bpf: Clarify llvm dependency with btf_tag selftest

btf_tag selftest needs certain llvm versions (>= llvm14).
Make it clear in the selftests README.rst file.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211112012651.1508549-1-yhs@fb.com

selftests/bpf: Add a C test for btf_type_tag

The following is the main btf_type_tag usage in the
C test:
  #define __tag1 __attribute__((btf_type_tag("tag1")))
  #define __tag2 __attribute__((btf_type_tag("tag2")))
  struct btf_type_tag_test {
       int __tag1 * __tag1 __tag2 *p;
  } g;

The bpftool raw dump with related types:
  [4] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
  [11] STRUCT 'btf_type_tag_test' size=8 vlen=1
          'p' type_id=14 bits_offset=0
  [12] TYPE_TAG 'tag1' type_id=16
  [13] TYPE_TAG 'tag2' type_id=12
  [14] PTR '(anon)' type_id=13
  [15] TYPE_TAG 'tag1' type_id=4
  [16] PTR '(anon)' type_id=15
  [17] VAR 'g' type_id=11, linkage=global

With format C dump, we have
  struct btf_type_tag_test {
        int __attribute__((btf_type_tag("tag1"))) * __attribute__((btf_type_tag("tag1"))) __attribute__((btf_type_tag("tag2"))) *p;
  };
The result C code is identical to the original definition except macro's are gone.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211112012646.1508231-1-yhs@fb.com

selftests/bpf: Rename progs/tag.c to progs/btf_decl_tag.c

Rename progs/tag.c to progs/btf_decl_tag.c so we can introduce
progs/btf_type_tag.c in the next patch.

Also create a subtest for btf_decl_tag in prog_tests/btf_tag.c
so we can introduce btf_type_tag subtest in the next patch.

I also took opportunity to remove the check whether __has_attribute
is defined or not in progs/btf_decl_tag.c since all recent
clangs should already support this macro.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211112012641.1507144-1-yhs@fb.com

selftests/bpf: Test BTF_KIND_DECL_TAG for deduplication

Add BTF_KIND_TYPE_TAG duplication unit tests.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211112012635.1506853-1-yhs@fb.com

selftests/bpf: Add BTF_KIND_TYPE_TAG unit tests

Add BTF_KIND_TYPE_TAG unit tests.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211112012630.1506095-1-yhs@fb.com

selftests/bpf: Test libbpf API function btf__add_type_tag()

Add unit tests for btf__add_type_tag().

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211112012625.1505748-1-yhs@fb.com

bpftool: Support BTF_KIND_TYPE_TAG

Add bpftool support for BTF_KIND_TYPE_TAG.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211112012620.1505506-1-yhs@fb.com

libbpf: Support BTF_KIND_TYPE_TAG

Add libbpf support for BTF_KIND_TYPE_TAG.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211112012614.1505315-1-yhs@fb.com

bpf: Support BTF_KIND_TYPE_TAG for btf_type_tag attributes

LLVM patches ([1] for clang, [2] and [3] for BPF backend)
added support for btf_type_tag attributes. This patch
added support for the kernel.

The main motivation for btf_type_tag is to bring kernel
annotations __user, __rcu etc. to btf. With such information
available in btf, bpf verifier can detect mis-usages
and reject the program. For example, for __user tagged pointer,
developers can then use proper helper like bpf_probe_read_user()
etc. to read the data.

BTF_KIND_TYPE_TAG may also useful for other tracing
facility where instead of to require user to specify
kernel/user address type, the kernel can detect it
by itself with btf.

  [1] https://reviews.llvm.org/D111199
  [2] https://reviews.llvm.org/D113222
  [3] https://reviews.llvm.org/D113496

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211112012609.1505032-1-yhs@fb.com

Merge branch 'Future-proof more tricky libbpf APIs'

Andrii Nakryiko says:

====================

This patch set continues the work of revamping libbpf APIs that are not
extensible, as they were added before we figured out all the intricacies of
building APIs that can preserve ABI compatibility (both backward and forward).

What makes them tricky is that (most of) these APIs are actively used by
multiple applications, so we need to be careful about refactoring them. See
individual patches for details, but the general approach is similar to
previous bpf_prog_load() API revamp. The biggest different and complexity is
in changing btf_dump__new(), because function overloading through macro magic
doesn't work based on number of arguments, as both new and old APIs have
4 arguments. Because of that, another overloading approach is taken; overload
happens based on argument types.

I've validated manually (by using local test_progs-shared flavor that is
compiling test_progs against libbpf as a shared library) that compiling "old
application" (selftests before being adapted to using new variants of revamped
APIs) are compiled and successfully run against newest libbpf version as well
as the older libbpf version (provided no new variants are used). All these
scenarios seem to be working as expected.

v1->v2:
  - add explicit printf_fn NULL check in btf_dump__new() (Alexei);
  - replaced + with || in __builtin_choose_expr() (Alexei);
  - dropped test_progs-shared flavor (Alexei).
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpftool: Update btf_dump__new() and perf_buffer__new_raw() calls

Use v1.0-compatible variants of btf_dump and perf_buffer "constructors".
This is also a demonstration of reusing struct perf_buffer_raw_opts as
OPTS-style option struct for new perf_buffer__new_raw() API.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211111053624.190580-10-andrii@kernel.org

tools/runqslower: Update perf_buffer__new() calls

Use v1.0+ compatible variant of perf_buffer__new() call to prepare for
deprecation.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211111053624.190580-9-andrii@kernel.org

selftests/bpf: Update btf_dump__new() uses to v1.0+ variant

Update to-be-deprecated forms of btf_dump__new().

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211111053624.190580-8-andrii@kernel.org

selftests/bpf: Migrate all deprecated perf_buffer uses

Migrate all old-style perf_buffer__new() and perf_buffer__new_raw()
calls to new v1.0+ variants.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211111053624.190580-7-andrii@kernel.org

libbpf: Make perf_buffer__new() use OPTS-based interface

Add new variants of perf_buffer__new() and perf_buffer__new_raw() that
use OPTS-based options for future extensibility ([0]). Given all the
currently used API names are best fits, re-use them and use
___libbpf_override() approach and symbol versioning to preserve ABI and
source code compatibility. struct perf_buffer_opts and struct
perf_buffer_raw_opts are kept as well, but they are restructured such
that they are OPTS-based when used with new APIs. For struct
perf_buffer_raw_opts we keep few fields intact, so we have to also
preserve the memory location of them both when used as OPTS and for
legacy API variants. This is achieved with anonymous padding for OPTS
"incarnation" of the struct. These pads can be eventually used for new
options.

[0] Closes: https://github.com/libbpf/libbpf/issues/311

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211111053624.190580-6-andrii@kernel.org

libbpf: Ensure btf_dump__new() and btf_dump_opts are future-proof

Change btf_dump__new() and corresponding struct btf_dump_ops structure
to be extensible by using OPTS "framework" ([0]). Given we don't change
the names, we use a similar approach as with bpf_prog_load(), but this
time we ended up with two APIs with the same name and same number of
arguments, so overloading based on number of arguments with
___libbpf_override() doesn't work.

Instead, use "overloading" based on types. In this particular case,
print callback has to be specified, so we detect which argument is
a callback. If it's 4th (last) argument, old implementation of API is
used by user code. If not, it must be 2nd, and thus new implementation
is selected. The rest is handled by the same symbol versioning approach.

btf_ext argument is dropped as it was never used and isn't necessary
either. If in the future we'll need btf_ext, that will be added into
OPTS-based struct btf_dump_opts.

struct btf_dump_opts is reused for both old API and new APIs. ctx field
is marked deprecated in v0.7+ and it's put at the same memory location
as OPTS's sz field. Any user of new-style btf_dump__new() will have to
set sz field and doesn't/shouldn't use ctx, as ctx is now passed along
the callback as mandatory input argument, following the other APIs in
libbpf that accept callbacks consistently.

Again, this is quite ugly in implementation, but is done in the name of
backwards compatibility and uniform and extensible future APIs (at the
same time, sigh). And it will be gone in libbpf 1.0.

[0] Closes: https://github.com/libbpf/libbpf/issues/283

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211111053624.190580-5-andrii@kernel.org

libbpf: Turn btf_dedup_opts into OPTS-based struct

btf__dedup() and struct btf_dedup_opts were added before we figured out
OPTS mechanism. As such, btf_dedup_opts is non-extensible without
breaking an ABI and potentially crashing user application.

Unfortunately, btf__dedup() and btf_dedup_opts are short and succinct
names that would be great to preserve and use going forward. So we use
___libbpf_override() macro approach, used previously for bpf_prog_load()
API, to define a new btf__dedup() variant that accepts only struct btf *
and struct btf_dedup_opts * arguments, and rename the old btf__dedup()
implementation into btf__dedup_deprecated(). This keeps both source and
binary compatibility with old and new applications.

The biggest problem was struct btf_dedup_opts, which wasn't OPTS-based,
and as such doesn't have `size_t sz;` as a first field. But btf__dedup()
is a pretty rarely used API and I believe that the only currently known
users (besides selftests) are libbpf's own bpf_linker and pahole.
Neither use case actually uses options and just passes NULL. So instead
of doing extra hacks, just rewrite struct btf_dedup_opts into OPTS-based
one, move btf_ext argument into those opts (only bpf_linker needs to
dedup btf_ext, so it's not a typical thing to specify), and drop never
used `dont_resolve_fwds` option (it was never used anywhere, AFAIK, it
makes BTF dedup much less useful and efficient).

Just in case, for old implementation, btf__dedup_deprecated(), detect
non-NULL options and error out with helpful message, to help users
migrate, if there are any user playing with btf__dedup().

The last remaining piece is dedup_table_size, which is another
anachronism from very early days of BTF dedup. Since then it has been
reduced to the only valid value, 1, to request forced hash collisions.
This is only used during testing. So instead introduce a bool flag to
force collisions explicitly.

This patch also adapts selftests to new btf__dedup() and btf_dedup_opts
use to avoid selftests breakage.

[0] Closes: https://github.com/libbpf/libbpf/issues/281

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211111053624.190580-4-andrii@kernel.org