IB: ucontext should be set properly for all cmd & ioctl paths
the Attempt to use the below commit to initialize the ucontext for the
uobject destroy path has shown that the below commit is incomplete.
Parts were reverted and the ucontext set up in the uverbs_attr_bundle was
moved to rdma_lookup_get_uobject which is called from the uobj_get_XXX
macros and rdma_alloc_begin_uobject which is called when uobject is
created.
Fixes: 1e0f0e2ae24b ("IB/uverbs: Add ib_ucontext to uverbs_attr_bundle sent from ioctl and cmd flows") Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Matthew Wilcox [Thu, 21 Feb 2019 00:21:00 +0000 (16:21 -0800)]
qib: Convert qib_unit_table to XArray
Also remove qib_devs_list.
Signed-off-by: Matthew Wilcox <willy@infradead.org> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Matthew Wilcox [Fri, 8 Feb 2019 20:41:29 +0000 (15:41 -0500)]
hfi1: Convert hfi1_unit_table to XArray
Also remove hfi1_devs_list.
Signed-off-by: Matthew Wilcox <willy@infradead.org> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Leon Romanovsky [Sun, 10 Mar 2019 15:27:46 +0000 (17:27 +0200)]
RDMA/core: Don't compare specific bit after boolean AND
There is no need to perform extra comparison after boolean AND.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Matthew Wilcox [Thu, 21 Feb 2019 00:20:39 +0000 (16:20 -0800)]
RDMA/hns: Convert cq_table to XArray
Change the locking to not disable interrupts as the lookup in interrupt
context will not see a freed CQ, thanks to the synchronize_irq() call
before freeing the CQ.
Signed-off-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Parav Pandit [Tue, 26 Feb 2019 12:01:48 +0000 (14:01 +0200)]
RDMA/core: Add interface to read device namespace sharing mode
Add an interface via netlink command to query whether rdma devices are
shared among multiple net namespaces or not. When using RDMAtool, it can
be queried as,
$rdma system show netns
netns shared
Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Parav Pandit [Tue, 26 Feb 2019 12:01:46 +0000 (14:01 +0200)]
RDMA: Check net namespace access for uverbs, umad, cma and nldev
Introduce an API rdma_dev_access_netns() to check whether a rdma device
can be accessed from the specified net namespace or not.
Use rdma_dev_access_netns() while opening character uverbs, umad network
device and also check while rdma cm_id binds to rdma device.
Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Parav Pandit [Tue, 26 Feb 2019 12:01:45 +0000 (14:01 +0200)]
RDMA/core: Add module param to disable device sharing among net ns
Add module parameter to change a sharing mode of ib_core early in the
boot process. This parameter helps to those systems where modern up
to date rdma tool (iproute2) package may not be available during
kernel upgrade cycle.
Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Parav Pandit [Tue, 26 Feb 2019 11:56:12 +0000 (13:56 +0200)]
RDMA/core: Restrict sysfs entries view to init_net
This is a preparation patch to provide isolation of rdma device in a
network namespace.
As first step, make rdma device visible only in init net namespace.
Subsequent patch will enable rdma device visibility back in multiple net
namespaces using compat ib_core_device device/sysfs tree.
Given that the IB subsystem depends on net stack, it needs to be
initialized after netdev and since it support devices, it needs to be
initialized before the device subsystem; therefore, change initcall
sequence to fs_initcall, so that when ib_core is compiled in the kernel
image, the right init sequence is followed.
Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Parav Pandit [Tue, 26 Feb 2019 11:56:11 +0000 (13:56 +0200)]
RDMA/core: Introduce ib_core_device to hold device
In order to support sysfs entries in multiple net namespaces for a rdma
device, introduce a ib_core_device whose scope is limited to hold core
device and per port sysfs related entries.
This is preparation patch so that multiple ib_core_devices in each net
namespace can be created in subsequent patch who all can share ib_device.
(a) Move sysfs specific fields to ib_core_device.
(b) Make sysfs and device life cycle related routines to work on
ib_core_device.
(c) Introduce and use rdma_init_coredev() helper to initialize
coredev fields.
Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Shiraz Saleem [Thu, 28 Mar 2019 16:49:47 +0000 (11:49 -0500)]
RDMA/rdmavt: Use correct sizing on buffers holding page DMA addresses
The buffer that holds the page DMA addresses is sized off umem->nmap.
This can potentially cause out of bound accesses on the PBL array when
iterating the umem DMA-mapped SGL. This is because if umem pages are
combined, umem->nmap can be much lower than the number of system pages
in umem.
Use ib_umem_num_pages() to size this buffer.
Cc: Dennis Dalessandro <dennis.dalessandro@intel.com> Cc: Mike Marciniszyn <mike.marciniszyn@intel.com> Cc: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Shiraz Saleem [Thu, 28 Mar 2019 16:49:46 +0000 (11:49 -0500)]
RDMA/rxe: Use correct sizing on buffers holding page DMA addresses
The buffer that holds the page DMA addresses is sized off umem->nmap.
This can potentially cause out of bound accesses on the PBL array when
iterating the umem DMA-mapped SGL. This is because if umem pages are
combined, umem->nmap can be much lower than the number of system pages
in umem.
Use ib_umem_num_pages() to size this buffer.
Cc: Moni Shoua <monis@mellanox.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Shiraz Saleem [Thu, 28 Mar 2019 16:49:45 +0000 (11:49 -0500)]
RDMA/mthca: Use correct sizing on buffers holding page DMA addresses
The buffer that holds the page DMA addresses is sized off umem->nmap.
This can potentially cause out of bound accesses on the PBL array when
iterating the umem DMA-mapped SGL. This is because if umem pages are
combined, umem->nmap can be much lower than the number of system pages
in umem.
Use ib_umem_num_pages() to size this buffer.
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Shiraz Saleem [Thu, 28 Mar 2019 16:49:44 +0000 (11:49 -0500)]
RDMA/cxbg: Use correct sizing on buffers holding page DMA addresses
The PBL array that hold the page DMA address is sized off umem->nmap.
This can potentially cause out of bound accesses on the PBL array when
iterating the umem DMA-mapped SGL. This is because if umem pages are
combined, umem->nmap can be much lower than the number of system pages
in umem.
Selvin Xavier [Thu, 28 Mar 2019 16:49:43 +0000 (11:49 -0500)]
RDMA/bnxt_re: Use correct sizing on buffers holding page DMA addresses
umem->nmap is used while allocating internal buffer for storing
page DMA addresses. This causes out of bounds array access while iterating
the umem DMA-mapped SGL with umem page combining as umem->nmap can be
less than number of system pages in umem.
Use ib_umem_num_pages() instead of umem->nmap to size the page array.
Add a new structure (bnxt_qplib_sg_info) to pass sglist, npages and nmap.
Bart Van Assche [Wed, 27 Mar 2019 23:50:51 +0000 (16:50 -0700)]
IB/qib: Remove a set-but-not-used variable
This patch avoids that a compiler warning is reported when building with
W=1.
Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Fixes: 5dc5bc02817e ("IB/qib: Change SDMA progression mode depending on single- or multi-rail") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Bart Van Assche [Wed, 27 Mar 2019 23:50:47 +0000 (16:50 -0700)]
RDMA/uverbs: Allow the compiler to verify declaration and definition consistency
This patch avoids that sparse reports the following warnings:
drivers/infiniband/core/uverbs_std_types_flow_action.c:442:30: warning: symbol 'uverbs_def_obj_flow_action' was not declared. Should it be static?
drivers/infiniband/core/uverbs_std_types_dm.c:112:30: warning: symbol 'uverbs_def_obj_dm' was not declared. Should it be static?
drivers/infiniband/core/uverbs_std_types_counters.c:153:30: warning: symbol 'uverbs_def_obj_counters' was not declared. Should it be static?
drivers/infiniband/core/uverbs_std_types_mr.c:213:30: warning: symbol 'uverbs_def_obj_mr' was not declared. Should it be static?
Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Fixes: c471142089a9 ("RDMA/uverbs: Require all objects to have a driver destroy function") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Bart Van Assche [Wed, 27 Mar 2019 23:50:46 +0000 (16:50 -0700)]
RDMA/uverbs: Annotate uverbs_request_next_ptr() return value as a __user pointer
This patch avoids that sparse complains about a mismatch between the
returned value and the function return type.
Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Fixes: 24d5ca5a8328 ("RDMA/uverbs: Use the iterator for ib_uverbs_unmarshall_recv()") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Bart Van Assche [Wed, 27 Mar 2019 23:50:45 +0000 (16:50 -0700)]
RDMA/uverbs: Add a __user annotation to a pointer
This patch avoids that sparse and smatch report the following:
warning: cast removes address space of expression
Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Fixes: 6a2d94b15928 ("RDMA/uverbs: Use uverbs_attr_bundle to pass udata for write") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Zhu Yanjun [Wed, 27 Mar 2019 09:50:47 +0000 (05:50 -0400)]
IB/rxe: Replace av->network_type with skb->protocol
In the function rxe_init_packet, based on av->network_type, skb->protocol
is set to ipv4 or ipv6. The functions rxe_prepare and rxe_send are called
after the functin rxe_init_packet. So in these functions,
av->network_type can be replaced with skb->protocol.
The functions are in the xmit fast path. So with skb->protocol, the
performance will be better.
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Ira Weiny [Tue, 19 Mar 2019 21:11:48 +0000 (14:11 -0700)]
IB/MAD: Add SMP details to MAD tracing
Decode more information from the packet and include it in the trace.
Reviewed-by: "Ruhl, Michael J" <michael.j.ruhl@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Ira Weiny [Tue, 19 Mar 2019 21:11:47 +0000 (14:11 -0700)]
IB/UMAD: Add umad trace points
Trace MADs going to/from user space.
Suggested-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Ira Weiny [Tue, 19 Mar 2019 21:11:46 +0000 (14:11 -0700)]
IB/MAD: Add agent trace points
Trace agent details when agents are [un]registered. In addition, report
agent details on send/recv.
Reviewed-by: "Ruhl, Michael J" <michael.j.ruhl@intel.com> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Ira Weiny [Tue, 19 Mar 2019 21:11:45 +0000 (14:11 -0700)]
IB/MAD: Add recv path trace point
Trace received MAD details.
Reviewed-by: "Ruhl, Michael J" <michael.j.ruhl@intel.com> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Ira Weiny [Tue, 19 Mar 2019 21:11:44 +0000 (14:11 -0700)]
IB/MAD: Add send path trace points
Use the standard Linux trace mechanism to trace MADs being sent. 4 trace
points are added, when the MAD is posted to the qp, when the MAD is
completed, if a MAD is resent, and when the MAD completes in error.
Reviewed-by: "Ruhl, Michael J" <michael.j.ruhl@intel.com> Suggested-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Artemy Kovalyov [Tue, 19 Mar 2019 09:24:39 +0000 (11:24 +0200)]
IB/mlx5: Compare only index part of a memory window rkey
The InfiniBand Architecture Specification section 10.6.7.2.4 TYPE 2 MEMORY
WINDOWS says that if the CI supports the Base Memory Management Extensions
defined in this specification, the R_Key format for a Type 2 Memory Window
must consist of:
* 24 bit index in the most significant bits of the R_Key, which is owned
by the CI, and
* 8 bit key in the least significant bits of the R_Key, which is owned by
the Consumer.
This means that the kernel should compare only the index part of a R_Key
to determine equality with another R_Key.
Fixes: 2ebc59d8fb63 ("IB/mlx5: Add ODP support to MW") Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Leon Romanovsky [Tue, 19 Mar 2019 09:10:09 +0000 (11:10 +0200)]
RDMA/hns: Limit scope of hns_roce_cmq_send()
The forgotten static keyword causes to the following error to appear while
building HNS driver. Declare hns_roce_cmq_send() to be static function to
fix this warning.
drivers/infiniband/hw/hns/hns_roce_hw_v2.c:1089:5: warning: no previous
prototype for _hns_roce_cmq_send_ [-Wmissing-prototypes]
int hns_roce_cmq_send(struct hns_roce_dev *hr_dev,
Fixes: cbfbd3e1c8ab ("RDMA/hns: Fix the chip hanging caused by sending mailbox&CMQ during reset") Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Colin Ian King [Sat, 16 Mar 2019 23:05:12 +0000 (23:05 +0000)]
IB/iser: remove uninitialized variable len
The variable len is not being inintialized and the uninitialized value is
being returned. However, this return path is never reached because the
default case in the switch statement returns -ENOSYS. Clean up the code
by replacing the return -ENOSYS with a break for the default case and
returning -ENOSYS at the end of the function. This allows len to be
removed. Also remove redundant break that follows a return statement.
Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
...which concluded this code was no longer necessary.
Acked-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Colin Ian King [Sat, 2 Mar 2019 23:06:36 +0000 (23:06 +0000)]
RDMA/nes: remove redundant check on udata
The non-null check on udata is redundant as this check was performed just
a few statements earlier and the check is always true as udata must be
non-null at this point. Remove redundant the check on udata and the
redundant else part that can never be executed.
Detected by CoverityScan, CID#1477317 ("Logically dead code")
Fixes: 608d3c337e22 ("IB/{hw,sw}: Remove 'uobject->context' dependency in object creation APIs") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Matthew Wilcox [Thu, 21 Feb 2019 00:20:42 +0000 (16:20 -0800)]
IB/mad: Convert ib_mad_clients to XArray
Pull the allocation function out into its own function to reduce the
length of ib_register_mad_agent() a little and keep all the allocation
logic together.
Signed-off-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Mike Marciniszyn [Tue, 26 Feb 2019 16:46:16 +0000 (08:46 -0800)]
IB/hfi1: Add running average for adaptive pio
The adaptive PIO implementation only considers the current packet size
when deciding between SDMA and pio for a packet.
This causes credit return forces if small and large packets are
interleaved.
Add a running average to avoid costly credit forces so that a large
sequence of small packets is required to go below the threshold that
chooses pio.
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Erez Alfasi [Mon, 25 Feb 2019 06:52:30 +0000 (08:52 +0200)]
RDMA: Use __packed annotation instead of __attribute__ ((packed))
"__attribute__" set of macros has been standardized, have became more
potentially portable and consistent code back in v2.6.21 by commit c24cd302f ("[PATCH] extend the set of "__attribute__" shortcut macros").
Moreover, nowadays checkpatch.pl warns about using __attribute__((packed))
instead of __packed.
This patch converts all the "__attribute__ ((packed))" annotations to
"__packed" within the RDMA subsystem.
Signed-off-by: Erez Alfasi <ereza@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Lijun Ou [Sat, 23 Feb 2019 12:01:26 +0000 (20:01 +0800)]
RDMA/hns: Bugfix for sending with invalidate
According to IB protocol, the send with invalidate operation will not
invalidate mr that was created through a register mr or reregister mr.
Fixes: e2f28bb571fa ("RDMA/hns: Support local invalidate for hip08 in kernel space") Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Lijun Ou [Sat, 23 Feb 2019 12:01:24 +0000 (20:01 +0800)]
RDMA/hns: Only assgin some fields if the relatived attr_mask is set
According to IB protocol, some fields of qp context are filled with
optional when the relatived attr_mask are set. The relatived attr_mask
include IB_QP_TIMEOUT, IB_QP_RETRY_CNT, IB_QP_RNR_RETRY and
IB_QP_MIN_RNR_TIMER. Besides, we move some assignments of the fields of
qp context into the outside of the specific qp state jump function.
Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Matthew Wilcox [Thu, 21 Feb 2019 00:20:55 +0000 (16:20 -0800)]
cxgb4: Convert stid_idr to XArray
Signed-off-by: Matthew Wilcox <willy@infradead.org> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Matthew Wilcox [Thu, 21 Feb 2019 00:20:53 +0000 (16:20 -0800)]
cxgb4: Convert hwtid_idr to XArray
Signed-off-by: Matthew Wilcox <willy@infradead.org> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Matthew Wilcox [Thu, 21 Feb 2019 00:20:52 +0000 (16:20 -0800)]
cxgb4: Convert mmidr to XArray
Signed-off-by: Matthew Wilcox <willy@infradead.org> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Matthew Wilcox [Thu, 21 Feb 2019 00:20:51 +0000 (16:20 -0800)]
cxgb4: Convert qpidr to XArray
Signed-off-by: Matthew Wilcox <willy@infradead.org> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Matthew Wilcox [Thu, 21 Feb 2019 00:20:50 +0000 (16:20 -0800)]
cxgb4: Convert cqidr to XArray
Signed-off-by: Matthew Wilcox <willy@infradead.org> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Matthew Wilcox [Thu, 21 Feb 2019 00:20:49 +0000 (16:20 -0800)]
cxgb3: Convert mmidr to XArray
Signed-off-by: Matthew Wilcox <willy@infradead.org> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Matthew Wilcox [Thu, 21 Feb 2019 00:20:48 +0000 (16:20 -0800)]
cxgb3: Convert qpidr to XArray
Signed-off-by: Matthew Wilcox <willy@infradead.org> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Matthew Wilcox [Thu, 21 Feb 2019 00:20:47 +0000 (16:20 -0800)]
cxgb3: Convert cqidr to XArray
It would make sense to convert this to an allocating XArray and remove the
kfifo that is currently used to allocate the CQID, but that work is better
done by someone who has the hardware to test with.
Signed-off-by: Matthew Wilcox <willy@infradead.org> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Linus Torvalds [Sun, 24 Mar 2019 20:41:37 +0000 (13:41 -0700)]
Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 fixes from Ted Ts'o:
"Miscellaneous ext4 bug fixes for 5.1"
* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: prohibit fstrim in norecovery mode
ext4: cleanup bh release code in ext4_ind_remove_space()
ext4: brelse all indirect buffer in ext4_ind_remove_space()
ext4: report real fs size after failed resize
ext4: add missing brelse() in add_new_gdb_meta_bg()
ext4: remove useless ext4_pin_inode()
ext4: avoid panic during forced reboot
ext4: fix data corruption caused by unaligned direct AIO
ext4: fix NULL pointer dereference while journal is aborted
Linus Torvalds [Sun, 24 Mar 2019 18:42:10 +0000 (11:42 -0700)]
Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Thomas Gleixner:
"Third more careful attempt for this set of fixes:
- Prevent a 32bit math overflow in the cpufreq code
- Fix a buffer overflow when scanning the cgroup2 cpu.max property
- A set of fixes for the NOHZ scheduler logic to prevent waking up
CPUs even if the capacity of the busy CPUs is sufficient along with
other tweaks optimizing the behaviour for asymmetric systems
(big/little)"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/fair: Skip LLC NOHZ logic for asymmetric systems
sched/fair: Tune down misfit NOHZ kicks
sched/fair: Comment some nohz_balancer_kick() kick conditions
sched/core: Fix buffer overflow in cgroup2 property cpu.max
sched/cpufreq: Fix 32-bit math overflow
Linus Torvalds [Sun, 24 Mar 2019 18:16:27 +0000 (11:16 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Thomas Gleixner:
"A larger set of perf updates.
Not all of them are strictly fixes, but that's solely the tip
maintainers fault as they let the timely -rc1 pull request fall
through the cracks for various reasons including travel. So I'm
sending this nevertheless because rebasing and distangling fixes and
updates would be a mess and risky as well. As of tomorrow, a strict
fixes separation is happening again. Sorry for the slip-up.
Kernel:
- Handle RECORD_MMAP vs. RECORD_MMAP2 correctly so different
consumers of the mmap event get what they requested.
Tools:
- A larger set of updates to perf record/report/scripts vs. time
stamp handling
- More Python3 fixups
- A pile of memory leak plumbing
- perf BPF improvements and fixes
- Finalize the perf.data directory storage"
[ Note: the kernel part is strictly a fix, the updates are purely to
tooling - Linus ]
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (75 commits)
perf bpf: Show more BPF program info in print_bpf_prog_info()
perf bpf: Extract logic to create program names from perf_event__synthesize_one_bpf_prog()
perf tools: Save bpf_prog_info and BTF of new BPF programs
perf evlist: Introduce side band thread
perf annotate: Enable annotation of BPF programs
perf build: Check what binutils's 'disassembler()' signature to use
perf bpf: Process PERF_BPF_EVENT_PROG_LOAD for annotation
perf symbols: Introduce DSO_BINARY_TYPE__BPF_PROG_INFO
perf feature detection: Add -lopcodes to feature-libbfd
perf top: Add option --no-bpf-event
perf bpf: Save BTF information as headers to perf.data
perf bpf: Save BTF in a rbtree in perf_env
perf bpf: Save bpf_prog_info information as headers to perf.data
perf bpf: Save bpf_prog_info in a rbtree in perf_env
perf bpf: Make synthesize_bpf_events() receive perf_session pointer instead of perf_tool
perf bpf: Synthesize bpf events with bpf_program__get_prog_info_linear()
bpftool: use bpf_program__get_prog_info_linear() in prog.c:do_dump()
tools lib bpf: Introduce bpf_program__get_prog_info_linear()
perf record: Replace option --bpf-event with --no-bpf-event
perf tests: Fix a memory leak in test__perf_evsel__tp_sched_test()
...
Linus Torvalds [Sun, 24 Mar 2019 18:12:27 +0000 (11:12 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"A set of x86 fixes:
- Prevent potential NULL pointer dereferences in the HPET and HyperV
code
- Exclude the GART aperture from /proc/kcore to prevent kernel
crashes on access
- Use the correct macros for Cyrix I/O on Geode processors
- Remove yet another kernel address printk leak
- Announce microcode reload completion as requested by quite some
people. Microcode loading has become popular recently.
- Some 'Make Clang' happy fixlets
- A few cleanups for recently added code"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/gart: Exclude GART aperture from kcore
x86/hw_breakpoints: Make default case in hw_breakpoint_arch_parse() return an error
x86/mm/pti: Make local symbols static
x86/cpu/cyrix: Remove {get,set}Cx86_old macros used for Cyrix processors
x86/cpu/cyrix: Use correct macros for Cyrix calls on Geode processors
x86/microcode: Announce reload operation's completion
x86/hyperv: Prevent potential NULL pointer dereference
x86/hpet: Prevent potential NULL pointer dereference
x86/lib: Fix indentation issue, remove extra tab
x86/boot: Restrict header scope to make Clang happy
x86/mm: Don't leak kernel addresses
x86/cpufeature: Fix various quality problems in the <asm/cpu_device_hd.h> header
Linus Torvalds [Sun, 24 Mar 2019 18:09:47 +0000 (11:09 -0700)]
Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Thomas Gleixner:
"A set of small fixes plus the removal of stale board support code:
- Remove the board support code from the clpx711x clocksource driver.
This change had fallen through the cracks and I'm sending it now
rather than dealing with people who want to improve that stale code
for 3 month.
- Use the proper clocksource mask on RICSV
- Make local scope functions and variables static"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
clocksource/drivers/clps711x: Remove board support
clocksource/drivers/riscv: Fix clocksource mask
clocksource/drivers/mips-gic-timer: Make gic_compare_irqaction static
clocksource/drivers/timer-ti-dm: Make omap_dm_timer_set_load_start() static
clocksource/drivers/tcb_clksrc: Make tc_clksrc_suspend/resume() static
clocksource/drivers/clps711x: Make clps711x_clksrc_init() static
time/jiffies: Make refined_jiffies static
Linus Torvalds [Sun, 24 Mar 2019 17:58:01 +0000 (10:58 -0700)]
Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fixes from Thomas Gleixner:
"Two small fixes:
- Cure a recently introduces error path hickup which tries to
unregister a not registered lockdep key in te workqueue code
- Prevent unaligned cmpxchg() crashes in the robust list handling
code by sanity checking the user space supplied futex pointer"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
futex: Ensure that futex address is aligned in handle_futex_death()
workqueue: Only unregister a registered lockdep key
Linus Torvalds [Sun, 24 Mar 2019 17:51:23 +0000 (10:51 -0700)]
Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
"A set of fixes for the interrupt subsystem:
- Remove secondary GIC support on systems w/o device-tree support
- A set of small fixlets in various irqchip drivers
- static and fall-through annotations
- Kernel doc and typo fixes"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq: Mark expected switch case fall-through
genirq/devres: Remove excess parameter from kernel doc
irqchip/irq-mvebu-sei: Make mvebu_sei_ap806_caps static
irqchip/mbigen: Don't clear eventid when freeing an MSI
irqchip/stm32: Don't set rising configuration registers at init
irqchip/stm32: Don't clear rising/falling config registers at init
dt-bindings: irqchip: renesas-irqc: Document r8a774c0 support
irqchip/mmp: Make mmp_irq_domain_ops static
irqchip/brcmstb-l2: Make two init functions static
genirq: Fix typo in comment of IRQD_MOVE_PCNTXT
irqchip/gic-v3-its: Fix comparison logic in lpi_range_cmp
irqchip/gic: Drop support for secondary GIC in non-DT systems
irqchip/imx-irqsteer: Fix of_property_read_u32() error handling
Linus Torvalds [Sun, 24 Mar 2019 17:17:33 +0000 (10:17 -0700)]
Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core fixes from Thomas Gleixner:
"Two small fixes:
- Move the large objtool_file struct off the stack so objtool works
in setups with a tight stack limit.
- Make a few variables static in the watchdog core code"
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
watchdog/core: Make variables static
objtool: Move objtool_file struct off the stack
Linus Torvalds [Sun, 24 Mar 2019 16:58:08 +0000 (09:58 -0700)]
Merge tag '5.1-rc1-cifs-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb3 fixes from Steve French:
- two fixes for stable for guest mount problems with smb3.1.1
- two fixes for crediting (SMB3 flow control) on resent requests
- a byte range lock leak fix
- two fixes for incorrect rc mappings
* tag '5.1-rc1-cifs-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: update internal module version number
SMB3: Fix SMB3.1.1 guest mounts to Samba
cifs: Fix slab-out-of-bounds when tracing SMB tcon
cifs: allow guest mounts to work for smb3.11
fix incorrect error code mapping for OBJECTID_NOT_FOUND
cifs: fix that return -EINVAL when do dedupe operation
CIFS: Fix an issue with re-sending rdata when transport returning -EAGAIN
CIFS: Fix an issue with re-sending wdata when transport returning -EAGAIN
Linus Torvalds [Sun, 24 Mar 2019 16:43:35 +0000 (09:43 -0700)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Six fixes to four drivers and two core fixes.
One core fix simply corrects a missed destroy_rcu_head() but the other
is hopefully the end of an ongoing effort to make suspend/resume play
nicely with scsi quiesce"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ibmvscsi: Fix empty event pool access during host removal
scsi: ibmvscsi: Protect ibmvscsi_head from concurrent modificaiton
scsi: hisi_sas: Add softreset in hisi_sas_I_T_nexus_reset()
scsi: qla2xxx: Fix NULL pointer crash due to stale CPUID
scsi: qla2xxx: Fix FC-AL connection target discovery
scsi: core: Avoid that a kernel warning appears during system resume
scsi: core: Also call destroy_rcu_head() for passthrough requests
scsi: iscsi: flush running unbind operations when removing a session
Linus Torvalds [Sat, 23 Mar 2019 17:25:12 +0000 (10:25 -0700)]
Merge tag 'io_uring-20190323' of git://git.kernel.dk/linux-block
Pull io_uring fixes and improvements from Jens Axboe:
"The first five in this series are heavily inspired by the work Al did
on the aio side to fix the races there.
The last two re-introduce a feature that was in io_uring before it got
merged, but which I pulled since we didn't have a good way to have
BVEC iters that already have a stable reference. These aren't
necessarily related to block, it's just how io_uring pins fixed
buffers"
* tag 'io_uring-20190323' of git://git.kernel.dk/linux-block:
block: add BIO_NO_PAGE_REF flag
iov_iter: add ITER_BVEC_FLAG_NO_REF flag
io_uring: mark me as the maintainer
io_uring: retry bulk slab allocs as single allocs
io_uring: fix poll races
io_uring: fix fget/fput handling
io_uring: add prepped flag
io_uring: make io_read/write return an integer
io_uring: use regular request ref counts
Linus Torvalds [Sat, 23 Mar 2019 17:14:42 +0000 (10:14 -0700)]
Merge tag 'for-linus-20190323' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"A set of fixes/changes that should go into this series. This contains:
- Kernel doc / comment updates (Bart, Shenghui)
- Un-export of core-only used function (Bart)
- Fix race on loop file access (Dongli)
- pf/pcd queue cleanup fixes (me)
- Use appropriate helper for RESTART bit set (Yufen)
- Use named identifier for classic poll (Yufen)"
* tag 'for-linus-20190323' of git://git.kernel.dk/linux-block:
sbitmap: trivial - update comment for sbitmap_deferred_clear_bit
blkcg: Fix kernel-doc warnings
blk-iolatency: #include "blk.h"
block: Unexport blk_mq_add_to_requeue_list()
block: add BLK_MQ_POLL_CLASSIC for hybrid poll and return EINVAL for unexpected value
blk-mq: remove unused 'nr_expired' from blk_mq_hw_ctx
loop: access lo_backing_file only when the loop device is Lo_bound
blk-mq: use blk_mq_sched_mark_restart_hctx to set RESTART
paride/pcd: cleanup queues when detection fails
paride/pf: cleanup queues when detection fails
Linus Torvalds [Sat, 23 Mar 2019 17:04:47 +0000 (10:04 -0700)]
Merge tag 'ceph-for-5.1-rc2' of git://github.com/ceph/ceph-client
Pull ceph fixes from Ilya Dryomov:
"A follow up for the new alloc_size logic and a blacklisting fix,
marked for stable"
* tag 'ceph-for-5.1-rc2' of git://github.com/ceph/ceph-client:
rbd: drop wait_for_latest_osdmap()
libceph: wait for latest osdmap in ceph_monc_blacklist_add()
rbd: set io_min, io_opt and discard_granularity to alloc_size
Darrick J. Wong [Sat, 23 Mar 2019 16:10:29 +0000 (12:10 -0400)]
ext4: prohibit fstrim in norecovery mode
The ext4 fstrim implementation uses the block bitmaps to find free space
that can be discarded. If we haven't replayed the journal, the bitmaps
will be stale and we absolutely *cannot* use stale metadata to zap the
underlying storage.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
zhangyi (F) [Sat, 23 Mar 2019 15:56:01 +0000 (11:56 -0400)]
ext4: cleanup bh release code in ext4_ind_remove_space()
Currently, we are releasing the indirect buffer where we are done with
it in ext4_ind_remove_space(), so we can see the brelse() and
BUFFER_TRACE() everywhere. It seems fragile and hard to read, and we
may probably forget to release the buffer some day. This patch cleans
up the code by putting of the code which releases the buffers to the
end of the function.
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
zhangyi (F) [Sat, 23 Mar 2019 15:43:05 +0000 (11:43 -0400)]
ext4: brelse all indirect buffer in ext4_ind_remove_space()
All indirect buffers get by ext4_find_shared() should be released no
mater the branch should be freed or not. But now, we forget to release
the lower depth indirect buffers when removing space from the same
higher depth indirect block. It will lead to buffer leak and futher
more, it may lead to quota information corruption when using old quota,
consider the following case.
- Create and mount an empty ext4 filesystem without extent and quota
features,
- quotacheck and enable the user & group quota,
- Create some files and write some data to them, and then punch hole
to some files of them, it may trigger the buffer leak problem
mentioned above.
- Disable quota and run quotacheck again, it will create two new
aquota files and write the checked quota information to them, which
probably may reuse the freed indirect block(the buffer and page
cache was not freed) as data block.
- Enable quota again, it will invoke
vfs_load_quota_inode()->invalidate_bdev() to try to clean unused
buffers and pagecache. Unfortunately, because of the buffer of quota
data block is still referenced, quota code cannot read the up to date
quota info from the device and lead to quota information corruption.
This problem can be reproduced by xfstests generic/231 on ext3 file
system or ext4 file system without extent and quota features.
This patch fix this problem by releasing the missing indirect buffers,
in ext4_ind_remove_space().
Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Cc: stable@kernel.org
In preparation to enabling -Wimplicit-fallthrough, mark switch
cases where we are expecting to fall through.
With -Wimplicit-fallthrough added to CFLAGS:
kernel/irq/manage.c: In function ‘irq_do_set_affinity’:
kernel/irq/manage.c:198:3: warning: this statement may fall through [-Wimplicit-fallthrough=]
cpumask_copy(desc->irq_common_data.affinity, mask);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
kernel/irq/manage.c:199:2: note: here
case IRQ_SET_MASK_OK_NOCOPY:
^~~~
Kairui Song [Fri, 8 Mar 2019 03:05:08 +0000 (11:05 +0800)]
x86/gart: Exclude GART aperture from kcore
On machines where the GART aperture is mapped over physical RAM,
/proc/kcore contains the GART aperture range. Accessing the GART range via
/proc/kcore results in a kernel crash.
vmcore used to have the same issue, until it was fixed with commit 625b86f75921 ("x86/gart: Exclude GART aperture from vmcore")', leveraging
existing hook infrastructure in vmcore to let /proc/vmcore return zeroes
when attempting to read the aperture region, and so it won't read from the
actual memory.
Apply the same workaround for kcore. First implement the same hook
infrastructure for kcore, then reuse the hook functions introduced in the
previous vmcore fix. Just with some minor adjustment, rename some functions
for more general usage, and simplify the hook infrastructure a bit as there
is no module usage yet.
Suggested-by: Baoquan He <bhe@redhat.com> Signed-off-by: Kairui Song <kasong@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Jiri Bohac <jbohac@suse.cz> Acked-by: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Omar Sandoval <osandov@fb.com> Cc: Dave Young <dyoung@redhat.com> Link: https://lkml.kernel.org/r/20190308030508.13548-1-kasong@redhat.com
Steve French [Sat, 23 Mar 2019 03:31:17 +0000 (22:31 -0500)]
SMB3: Fix SMB3.1.1 guest mounts to Samba
Workaround problem with Samba responses to SMB3.1.1
null user (guest) mounts. The server doesn't set the
expected flag in the session setup response so we have
to do a similar check to what is done in smb3_validate_negotiate
where we also check if the user is a null user (but not sec=krb5
since username might not be passed in on mount for Kerberos case).
Note that the commit below tightened the conditions and forced signing
for the SMB2-TreeConnect commands as per MS-SMB2.
However, this should only apply to normal user sessions and not for
cases where there is no user (even if server forgets to set the flag
in the response) since we don't have anything useful to sign with.
This is especially important now that the more secure SMB3.1.1 protocol
is in the default dialect list.
An earlier patch ("cifs: allow guest mounts to work for smb3.11") fixed
the guest mounts to Windows.
Fixes: 21529761fb77 ("Tree connect for SMB3.1.1 must be signed for non-encrypted shares") Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Reviewed-by: Paulo Alcantara <palcantara@suse.de> CC: Stable <stable@vger.kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>