Tetsuo Handa [Thu, 9 Jun 2022 13:26:48 +0000 (22:26 +0900)]
scsi: mpt3sas: Remove flush_scheduled_work() call
It seems to me that mpt3sas driver is using dedicated workqueues and is not
calling schedule{,_delayed}_work{,_on}(). Then, there will be no work to
flush using flush_scheduled_work().
Changyuan Lyu [Tue, 21 Jun 2022 18:11:25 +0000 (18:11 +0000)]
scsi: trace: Print driver_tag and scheduler_tag in SCSI trace
Trace events like scsi_dispatch_cmd_start and scsi_dispatch_cmd_done are
useful for tracking a command throughout its lifetime. But for some ATA
passthrough commands, the information printed in current logs is not enough
to identify and match them. For example, if two threads send SMART cmd to
the same disk at the same time, their trace logs may look the same, which
makes it hard to match scsi_dispatch_cmd_done and scsi_dispatch_cmd_start.
Printing tags can help us solve the problem. Further, if a command failed
for some reason and then is retried, its driver_tag will change. So
scheduler_tag is also included such that we can track the retries of a
command.
Link: https://lore.kernel.org/r/20220621181125.3211399-1-changyuanl@google.com Reviewed-by: Vishakha Channapattan <vishakhavc@google.com> Reviewed-by: Jolly Shah <jollys@google.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Changyuan Lyu <changyuanl@google.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Mike Christie [Thu, 16 Jun 2022 22:45:57 +0000 (17:45 -0500)]
scsi: libiscsi: Improve conn_send_pdu API
The conn_send_pdu API is evil in that it returns a pointer to an
iscsi_task, but that task might have been freed already so you can't touch
it. This patch splits the task allocation and transmission, so functions
like iscsi_send_nopout() can access the task before its sent and do
whatever bookkeeping is needed before it is sent.
Mike Christie [Thu, 16 Jun 2022 22:45:56 +0000 (17:45 -0500)]
scsi: iscsi: Try to avoid taking back_lock in xmit path
We need the back lock when freeing a task, so we hold it when calling
__iscsi_put_task() from the completion path to make it easier and to avoid
having to retake it in that path. For iscsi_put_task() we just grabbed it
while also doing the decrement on the refcount but it's only really needed
if the refcount is zero and we free the task. This modifies
iscsi_put_task() to just take the lock when needed then has the xmit path
use it. Normally we will then not take the back lock from the xmit path. It
will only be rare cases where the network is so fast that we get a response
right after we send the header/data.
We currently require that the back_lock is held when calling the functions
that manipulate the iscsi_task refcount. The only reason for this is to
handle races where we are handling SCSI-ml EH callbacks and the cmd is
completing at the same time the normal completion path is running, and we
can't return from the EH callback until the driver has stopped accessing
the cmd. Holding the back_lock while also accessing the task->state made it
simple to check that a cmd is completing and also get/put a refcount at the
same time, and at the time we were not as concerned about performance.
The problem is that we don't want to take the back_lock from the xmit path
for normal I/O since it causes contention with the completion path if the
user has chosen to try and split those paths on different CPUs (in this
case abusing the CPUs and ignoring caching improves perf for some uses).
Begins to remove the back_lock requirement for iscsi_get/put_task by
removing the requirement for the get path. Instead of always holding the
back_lock we detect if something has done the last put and is about to call
iscsi_free_task(). A subsequent commit will then allow iSCSI code to do the
last put on a task and only grab the back_lock if the refcount is now zero
and it's going to call iscsi_free_task().
Mike Christie [Thu, 16 Jun 2022 22:45:54 +0000 (17:45 -0500)]
scsi: iscsi: Remove unneeded task state check
Commit d0757df95b24 ("scsi: libiscsi: Drop taskqueuelock") added an extra
task->state because for commit 4a81c36e3a80 ("scsi: libiscsi: add lock
around task lists to fix list corruption regression") we didn't know why we
ended up with cmds on the list and thought it might have been a bad target
sending a response while we were still sending the cmd. We were never able
to get a target to send us a response early, because it turns out the bug
was just a race in libiscsi/libiscsi_tcp where:
1. iscsi_tcp_r2t_rsp() queues a r2t to tcp_task->r2tqueue.
2. iscsi_tcp_task_xmit() runs iscsi_tcp_get_curr_r2t() and sees we have a
r2t. It dequeues it and iscsi_tcp_task_xmit() starts to process it.
3. iscsi_tcp_r2t_rsp() runs iscsi_requeue_task() and puts the task on the
requeue list.
4. iscsi_tcp_task_xmit() sends the data for r2t. This is the final chunk
of data, so the cmd is done.
5. target sends the response.
6. On a different CPU from #3, iscsi_complete_task() processes the
response. Since there was no common lock for the list, the lists/tasks
pointers are not fully in sync, so could end up with list corruption.
Since it was just a race on our side, remove the extra check and fix up the
comments.
Mike Christie [Thu, 16 Jun 2022 22:45:53 +0000 (17:45 -0500)]
scsi: iscsi_tcp: Drop target_alloc use
For software iSCSI, we do a session per host so there is no need to set the
target's can_queue since it's the same as the host one. Setting it just
results in extra atomic checks in the main I/O path.
Mike Christie [Thu, 16 Jun 2022 22:45:51 +0000 (17:45 -0500)]
scsi: iscsi: Run recv path from workqueue
We don't always want to run the recv path from the network softirq because
when we have to have multiple sessions sharing the same CPUs, some sessions
can eat up the NAPI softirq budget and affect other sessions or users.
Allow us to queue the recv handling to the iscsi workqueue so we can have
the scheduler/wq code try to balance the work and CPU use across all
sessions' worker threads.
Note: It wasn't the original intent of the change but a nice side effect is
that for some workloads/configs we get a nice performance boost. For a
simple read heavy test:
where the iscsi threads, fio jobs, and rps_cpus share CPUs we see a 32%
throughput boost. We also see increases for small I/O IOPs tests but it's
not as high.
Mike Christie [Thu, 16 Jun 2022 22:27:38 +0000 (17:27 -0500)]
scsi: iscsi: Fix session removal on shutdown
When the system is shutting down, iscsid is not running so we will not get
a response to the ISCSI_ERR_INVALID_HOST error event. The system shutdown
will then hang waiting on userspace to remove the session.
This has libiscsi force the destruction of the session from the kernel when
iscsi_host_remove() is called from a driver's shutdown callout.
This fixes a regression added in qedi boot with commit c55cd560a09b ("scsi:
qedi: Fix host removal with running sessions") which made qedi use the
common session removal function that waits on userspace instead of rolling
its own kernel based removal.
Link: https://lore.kernel.org/r/20220616222738.5722-7-michael.christie@oracle.com Fixes: c55cd560a09b ("scsi: qedi: Fix host removal with running sessions") Tested-by: Nilesh Javali <njavali@marvell.com> Reviewed-by: Lee Duncan <lduncan@suse.com> Reviewed-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Mike Christie [Thu, 16 Jun 2022 22:27:37 +0000 (17:27 -0500)]
scsi: qedi: Use QEDI_MODE_NORMAL for error handling
When handling errors that lead to host removal use QEDI_MODE_NORMAL. There
is currently no difference in behavior between NORMAL and SHUTDOWN, but in
a subsequent commit we will want to know when we are called from the
pci_driver shutdown callout vs remove/err_handler so we know when userspace
is up.
Link: https://lore.kernel.org/r/20220616222738.5722-6-michael.christie@oracle.com Tested-by: Nilesh Javali <njavali@marvell.com> Reviewed-by: Lee Duncan <lduncan@suse.com> Reviewed-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Mike Christie [Thu, 16 Jun 2022 22:27:36 +0000 (17:27 -0500)]
scsi: iscsi: Add helper to remove a session from the kernel
During qedi shutdown we need to stop the iSCSI layer from sending new nops
as pings and from responding to target ones and make sure there is no
running connection cleanups. Commit c55cd560a09b ("scsi: qedi: Fix host
removal with running sessions") converted the driver to use the libicsi
helper to drive session removal, so the above issues could be handled. The
problem is that during system shutdown iscsid will not be running so when
we try to remove the root session we will hang waiting for userspace to
reply.
Add a helper that will drive the destruction of sessions like these during
system shutdown.
Mike Christie [Thu, 16 Jun 2022 22:27:35 +0000 (17:27 -0500)]
scsi: iscsi: Clean up bound endpoints during shutdown
In the next patch we allow drivers to drive session removal during
shutdown. In this case iscsid will not be running, so we need to detect
bound endpoints and disconnect them. This moves the bound ep check so we
now always check.
Mike Christie [Thu, 16 Jun 2022 22:27:34 +0000 (17:27 -0500)]
scsi: iscsi: Allow iscsi_if_stop_conn() to be called from kernel
iscsi_if_stop_conn() is only called from the userspace interface but in a
subsequent commit we will want to call it from the kernel interface to
allow drivers like qedi to remove sessions from inside the kernel during
shutdown. This removes the iscsi_uevent code from iscsi_if_stop_conn() so we
can call it in a new helper.
Mike Christie [Thu, 16 Jun 2022 22:27:33 +0000 (17:27 -0500)]
scsi: iscsi: Fix HW conn removal use after free
If qla4xxx doesn't remove the connection before the session, the iSCSI
class tries to remove the connection for it. We were doing a
iscsi_put_conn() in the iter function which is not needed and will result
in a use after free because iscsi_remove_conn() will free the connection.
Link: https://lore.kernel.org/r/20220616222738.5722-2-michael.christie@oracle.com Tested-by: Nilesh Javali <njavali@marvell.com> Reviewed-by: Lee Duncan <lduncan@suse.com> Reviewed-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Ren Zhijie [Sun, 19 Jun 2022 11:54:32 +0000 (19:54 +0800)]
scsi: ufs: ufs-mediatek: Fix build error and type mismatch
If CONFIG_PM_SLEEP is not set.
make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu-, will fail:
drivers/ufs/host/ufs-mediatek.c: In function ‘ufs_mtk_vreg_fix_vcc’:
drivers/ufs/host/ufs-mediatek.c:688:46: warning: format ‘%u’ expects argument of type ‘unsigned int’, but argument 4 has type ‘long unsigned int’ [-Wformat=]
snprintf(vcc_name, MAX_VCC_NAME, "vcc-opt%u", res.a1);
~^ ~~~~~~
%lu
drivers/ufs/host/ufs-mediatek.c: In function ‘ufs_mtk_system_suspend’:
drivers/ufs/host/ufs-mediatek.c:1371:8: error: implicit declaration of function ‘ufshcd_system_suspend’; did you mean ‘ufs_mtk_system_suspend’? [-Werror=implicit-function-declaration]
ret = ufshcd_system_suspend(dev);
^~~~~~~~~~~~~~~~~~~~~
ufs_mtk_system_suspend
drivers/ufs/host/ufs-mediatek.c: In function ‘ufs_mtk_system_resume’:
drivers/ufs/host/ufs-mediatek.c:1386:9: error: implicit declaration of function ‘ufshcd_system_resume’; did you mean ‘ufs_mtk_system_resume’? [-Werror=implicit-function-declaration]
return ufshcd_system_resume(dev);
^~~~~~~~~~~~~~~~~~~~
ufs_mtk_system_resume
cc1: some warnings being treated as errors
The declaration of func "ufshcd_system_suspend()" depends on
CONFIG_PM_SLEEP, so the function wrapper ufs_mtk_system_suspend() should
wrapped by CONFIG_PM_SLEEP too.
Link: https://lore.kernel.org/r/20220619115432.205504-1-renzhijie2@huawei.com Fixes: 225f5d4d2198 ("scsi: ufs: ufs-mediatek: Fix the timing of configuring device regulators") Reported-by: Hulk Robot <hulkci@huawei.com> Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Signed-off-by: Ren Zhijie <renzhijie2@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
[Option 2: By UFS version]
mediatek,ufs-vcc-by-ver;
vcc-ufs3-supply = <&mt6373_vbuck4_ufs>;
Link: https://lore.kernel.org/r/20220616053725.5681-11-stanley.chu@mediatek.com Signed-off-by: Alice Chao <alice.chao@mediatek.com> Signed-off-by: Peter Wang <peter.wang@mediatek.com> Signed-off-by: Stanley Chu <stanley.chu@mediatek.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Peter Wang [Thu, 16 Jun 2022 05:37:20 +0000 (13:37 +0800)]
scsi: ufs: ufs-mediatek: Support low-power mode for VCCQ
Allow VCCQ to enter low-power mode, and also remove the restriction of VCC
because VCCQ/VCCQ2 can be changed to low-power mode even if VCC stays on
while the device is not in active power mode.
Link: https://lore.kernel.org/r/20220616053725.5681-7-stanley.chu@mediatek.com Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Signed-off-by: Peter Wang <peter.wang@mediatek.com> Signed-off-by: Stanley Chu <stanley.chu@mediatek.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Device regulatrs are allowed to enter low-power mode if neither device is
not in active mode, nor VCC does not keep on.
Fix this by adding conditions before LPM decision.
Link: https://lore.kernel.org/r/20220616053725.5681-6-stanley.chu@mediatek.com Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Signed-off-by: Po-Wen Kao <powen.kao@mediatek.com> Signed-off-by: Stanley Chu <stanley.chu@mediatek.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Po-Wen Kao [Thu, 16 Jun 2022 05:37:18 +0000 (13:37 +0800)]
scsi: ufs: ufs-mediatek: Fix the timing of configuring device regulators
Currently the LPM configurations of device regulators may not work since
VCC is not disabled yet while ufs_mtk_vreg_set_lpm() is executed.
Fix this by changing the timing of invoking ufs_mtk_vreg_set_lpm().
Link: https://lore.kernel.org/r/20220616053725.5681-5-stanley.chu@mediatek.com Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Signed-off-by: Po-Wen Kao <powen.kao@mediatek.com> Signed-off-by: Stanley Chu <stanley.chu@mediatek.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
CC Chou [Thu, 16 Jun 2022 05:37:17 +0000 (13:37 +0800)]
scsi: ufs: ufs-mediatek: Introduce workaround for power mode change
Some MediaTek SoC chips need special flow for power mode change, especially
for chips supporting HS-G5.
Enable the workaround by setting the host-specific capability.
Link: https://lore.kernel.org/r/20220616053725.5681-4-stanley.chu@mediatek.com Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Signed-off-by: CC Chou <cc.chou@mediatek.com> Signed-off-by: Eddie Huang <eddie.huang@mediatek.com> Signed-off-by: Dennis Yu <tun-yu.yu@mediatek.com> Signed-off-by: Peter Wang <peter.want@medaitek.com> Signed-off-by: Stanley Chu <stanley.chu@mediatek.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Quinn Tran [Thu, 16 Jun 2022 05:35:07 +0000 (22:35 -0700)]
scsi: qla2xxx: Fix erroneous mailbox timeout after PCI error injection
Clear wait for mailbox interrupt flag to prevent stale mailbox:
Feb 22 05:22:56 ltcden4-lp7 kernel: qla2xxx [0135:90:00.1]-500a:4: LOOP UP detected (16 Gbps).
Feb 22 05:22:59 ltcden4-lp7 kernel: qla2xxx [0135:90:00.1]-d04c:4: MBX Command timeout for cmd 69, ...
To fix the issue, driver needs to clear the MBX_INTR_WAIT flag on purging
the mailbox. When the stale mailbox completion does arrive, it will be
dropped.
Arun Easi [Thu, 16 Jun 2022 05:35:06 +0000 (22:35 -0700)]
scsi: qla2xxx: Fix losing FCP-2 targets on long port disable with I/Os
FCP-2 devices were not coming back online once they were lost, login
retries exhausted, and then came back up. Fix this by accepting RSCN when
the device is not online.
Link: https://lore.kernel.org/r/20220616053508.27186-10-njavali@marvell.com Fixes: d5638e660741 ("scsi: qla2xxx: Changes to support FCP2 Target") Cc: stable@vger.kernel.org Signed-off-by: Arun Easi <aeasi@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Arun Easi [Thu, 16 Jun 2022 05:35:03 +0000 (22:35 -0700)]
scsi: qla2xxx: Fix losing FCP-2 targets during port perturbation tests
When a mix of FCP-2 (tape) and non-FCP-2 targets are present, FCP-2 target
state was incorrectly transitioned when both of the targets were gone. Fix
this by ignoring state transition for FCP-2 targets.
Link: https://lore.kernel.org/r/20220616053508.27186-7-njavali@marvell.com Fixes: d5638e660741 ("scsi: qla2xxx: Changes to support FCP2 Target") Cc: stable@vger.kernel.org Signed-off-by: Arun Easi <aeasi@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Bikash Hazarika [Thu, 16 Jun 2022 05:34:59 +0000 (22:34 -0700)]
scsi: qla2xxx: Add a new v2 dport diagnostic feature
FW requires minimum 72 bytes buffer size for D_port result. Buffer size
1024 is mentioned in the FW spec so buffer size is increased to 1024.
Rewrite the logic to handle START/RESTART command from SDMAPI.
Dmitry Bogdanov [Tue, 7 Jun 2022 13:19:53 +0000 (16:19 +0300)]
scsi: iscsi: Prefer xmit of DataOut over new commands
iscsi_data_xmit() (TX worker) is iterating over the queue of new SCSI
commands concurrently with the queue being replenished. Only after the
queue is emptied will we start sending pending DataOut PDUs. That leads to
DataOut timeout on the target side and to connection reinstatement.
Give priority to pending DataOut commands over new commands.
Link: https://lore.kernel.org/r/20220607131953.11584-1-d.bogdanov@yadro.com Reviewed-by: Konstantin Shelekhin <k.shelekhin@yadro.com> Reviewed-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Dmitry Bogdanov <d.bogdanov@yadro.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Alim Akhtar [Wed, 15 Jun 2022 12:12:03 +0000 (17:42 +0530)]
scsi: ufs: host: ufs-exynos: Use already existing definition
UFS core already uses RX_MIN_ACTIVATETIME_CAPABILITY macro, let's use the
same in driver as well instead of having a different macro name for the
same offset.
John Garry [Fri, 10 Jun 2022 16:46:42 +0000 (00:46 +0800)]
scsi: pm8001: Expose hardware queues for pm80xx
In commit 7266bb90ee0d ("scsi: pm80xx: Increase number of supported
queues"), support for 80xx chip was improved by enabling multiple HW
queues.
In this, like other SCSI MQ HBA drivers at the time, the HW queues were not
exposed to upper layer, and instead the driver managed the queues
internally.
However, this management duplicates blk-mq code. In addition, the HW queue
management is sub-optimal for a system where the number of CPUs exceeds the
HW queues - this is because queues are selected in a round-robin fashion,
when it would be better to make adjacent CPUs submit on the same queue. And
finally, the affinity of the completion queue interrupts is not set to
mirror the cpu<->HQ queue mapping, which is suboptimal.
As such, for when MSIX is supported, expose HW queues to upper layer. We
always use queue index #0 for "internal" commands, i.e. anything which does
not come from the block layer, so omit this from the affinity spreading.
John Garry [Fri, 10 Jun 2022 16:46:41 +0000 (00:46 +0800)]
scsi: pm8001: Use non-atomic bitmap ops for tag alloc + free
In pm8001_tag_alloc() we don't require atomic set_bit() as we are already
in atomic context. In pm8001_tag_free() we should use the same host
spinlock to protect clearing the tag (and then don't require the atomic
clear_bit()).
John Garry [Fri, 10 Jun 2022 16:46:40 +0000 (00:46 +0800)]
scsi: pm8001: Set up tags before using them
The current code is buggy in that the tags are set up after they are needed
in pm80xx_chip_init() -> pm80xx_set_sas_protocol_timer_config(). The tag
depth is earlier read in pm80xx_chip_init() -> read_main_config_table().
Add a post init callback to do the pm80xx work which needs to be done after
reading the tags. I don't see a better way to do this.
John Garry [Fri, 10 Jun 2022 16:46:39 +0000 (00:46 +0800)]
scsi: pm8001: Rework shost initial values
Some values in pm8001_prep_sas_ha_init() are set the same as they would be
set in scsi_host_alloc(), or could be in the sht (which would be better),
or later just overwritten, so rework the following:
- cmd_per_lun can be set in the sht
- max_lun and max_channel are as scsi_host_alloc() (so no need to set)
- can_queue is later overwritten (so don't set in
pm8001_prep_sas_ha_init())
Bit[3] of HCI_CLKSTOP_CTRL register is for enabling/disabling MPHY APB
clock. Lets add it to CLK_STOP_MASK, so that the same can be controlled
during clock masking/unmasking.
Link: https://lore.kernel.org/r/20220610104119.66401-6-alim.akhtar@samsung.com Tested-by: Chanho Park <chanho61.park@samsung.com> Reviewed-by: Chanho Park <chanho61.park@samsung.com> Signed-off-by: Alim Akhtar <alim.akhtar@samsung.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Damien Le Moal [Thu, 9 Jun 2022 02:24:56 +0000 (11:24 +0900)]
scsi: libsas: Introduce struct smp_rps_resp
Similarly to sas report general and discovery responses, define the
structure struct smp_rps_resp to handle SATA PHY report responses using a
structure with a size that is exactly equal to the sas defined response
size.
With this change, struct smp_resp becomes unused and is removed.
Damien Le Moal [Thu, 9 Jun 2022 02:24:55 +0000 (11:24 +0900)]
scsi: libsas: Introduce struct smp_rg_resp
When compiling with gcc 12, several warnings are thrown by gcc when
compiling drivers/scsi/libsas/sas_expander.c, e.g.:
In function ‘sas_get_ex_change_count’,
inlined from ‘sas_find_bcast_dev’ at
drivers/scsi/libsas/sas_expander.c:1816:8:
drivers/scsi/libsas/sas_expander.c:1781:20: warning: array subscript
‘struct smp_resp[0]’ is partly outside array bounds of ‘unsigned
char[32]’ [-Warray-bounds]
1781 | if (rg_resp->result != SMP_RESP_FUNC_ACC) {
| ~~~~~~~^~~~~~~~
This is due to the use of the struct smp_resp to aggregate all possible
response types using a union but allocating a response buffer with a size
exactly equal to the size of the response type needed. This leads to access
to fields of struct smp_resp from an allocated memory area that is smaller
than the size of struct smp_resp.
Fix this by defining struct smp_rg_resp for sas report general responses.
Damien Le Moal [Thu, 9 Jun 2022 02:24:54 +0000 (11:24 +0900)]
scsi: libsas: Introduce struct smp_disc_resp
When compiling with gcc 12, several warnings are thrown by gcc when
compiling drivers/scsi/libsas/sas_expander.c, e.g.:
In function ‘sas_get_phy_change_count’,
inlined from ‘sas_find_bcast_phy.constprop’ at
drivers/scsi/libsas/sas_expander.c:1737:9:
drivers/scsi/libsas/sas_expander.c:1697:39: warning: array subscript
‘struct smp_resp[0]’ is partly outside array bounds of ‘unsigned
char[56]’ [-Warray-bounds]
1697 | *pcc = disc_resp->disc.change_count;
| ~~~~~~~~~~~~~~~^~~~~~~~~~~~~
This is due to the use of the struct smp_resp to aggregate all possible
response types using a union but allocating a response buffer with a size
exactly equal to the size of the response type needed. This leads to access
to fields of struct smp_resp from an allocated memory area that is smaller
than the size of struct smp_resp.
Fix this by defining struct smp_disc_resp for sas discovery operations.
Since this structure and the generic struct smp_resp are identical for
the little endian and big endian archs, move the definition of these
structures at the end of include/scsi/sas.h to avoid repeating their
definition.
Quinn Tran [Wed, 8 Jun 2022 11:58:48 +0000 (04:58 -0700)]
scsi: qla2xxx: edif: Fix slow session teardown
User experience slow recovery when target device went through a stop/start
of the authentication application (app_stop/app_start).
Between the period of app_stop and app_start on the target device, target
device choose to send ELS Reject for any receive AUTH ELS command. At this
time, authentication application does not do ELS reject if it encounters
error.
Therefore, AUTH ELS reject signify authentication application is not
running. If driver passes up the AUTH ELS Reject to the authentication
application, then it would result in authentication application
retrying/resending the same AUTH ELS command again + delay.
As a work around, driver should trigger a session tear down where it tells
the local authentication application to also tear down. At the next
relogin, both sides are then synchronized.
Link: https://lore.kernel.org/r/20220608115849.16693-10-njavali@marvell.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Quinn Tran <qutran@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Quinn Tran [Wed, 8 Jun 2022 11:58:47 +0000 (04:58 -0700)]
scsi: qla2xxx: edif: Reduce N2N thrashing at app_start time
For N2N + remote WWPN is bigger than local adapter, remote adapter will
login to local adapter while authentication application is not running.
When authentication application starts, the current session in FW needs to
to be invalidated.
Make sure the old session is torn down before triggering a relogin.
Link: https://lore.kernel.org/r/20220608115849.16693-9-njavali@marvell.com Fixes: 03d1f5917d23 ("scsi: qla2xxx: edif: Add N2N support for EDIF") Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Quinn Tran <qutran@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Quinn Tran [Wed, 8 Jun 2022 11:58:43 +0000 (04:58 -0700)]
scsi: qla2xxx: edif: Fix no login after app start
The scenario is this: User loaded driver but has not started authentication
app. All sessions to secure device will exhaust all login attempts, fail,
and in stay in deleted state. Then some time later the app is started. The
driver will replenish the login retry count, trigger delete to prepare for
secure login. After deletion, relogin is triggered.
For the session that is already deleted, the delete trigger is a no-op. If
none of the sessions trigger a relogin, no progress is made.
Quinn Tran [Wed, 8 Jun 2022 11:58:41 +0000 (04:58 -0700)]
scsi: qla2xxx: edif: Send LOGO for unexpected IKE message
If the session is down and the local port continues to receive AUTH ELS
messages, the driver needs to send back LOGO so that the remote device
knows to tear down its session. Terminate and clean up the AUTH ELS
exchange followed by a passthrough LOGO.
Link: https://lore.kernel.org/r/20220608115849.16693-3-njavali@marvell.com Fixes: ce4d4dd912f3 ("scsi: qla2xxx: edif: Reject AUTH ELS on session down") Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Quinn Tran <qutran@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Quinn Tran [Wed, 8 Jun 2022 11:58:40 +0000 (04:58 -0700)]
scsi: qla2xxx: edif: Fix I/O timeout due to over-subscription
The current edif code does not keep track of FW IOCB resources. This led
to IOCB queue full on error recovery (I/O timeout). Make use of the
existing code that tracks IOCB resources to prevent over-subscription.
Link: https://lore.kernel.org/r/20220608115849.16693-2-njavali@marvell.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Quinn Tran <qutran@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
keliu [Fri, 27 May 2022 08:30:49 +0000 (08:30 +0000)]
scsi: core: iscsi: Directly use ida_alloc()/ida_free()
Use ida_alloc()/ida_free() instead of the deprecated
ida_simple_get()/ida_simple_remove() interface.
Link: https://lore.kernel.org/r/20220527083049.2552526-1-liuke94@huawei.com Reviewed-by: Mike Christie <michael.christie@oracle.com> Reviewed-by: Lee Duncan <lduncan@suse.com> Signed-off-by: keliu <liuke94@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Dmitry Bogdanov [Mon, 23 May 2022 09:59:05 +0000 (12:59 +0300)]
scsi: target: iscsi: Control authentication per ACL
Add acls/{ACL}/attrib/authentication attribute that controls authentication
for particular ACL. By default, this attribute inherits a value of the
authentication attribute of the target port group to keep backward
compatibility.
Authentication attribute has 3 states:
"0" - authentication is turned off for this ACL
"1" - authentication is required for this ACL
"-1" - authentication is inherited from TPG
Link: https://lore.kernel.org/r/20220523095905.26070-4-d.bogdanov@yadro.com Reviewed-by: Roman Bolshakov <r.bolshakov@yadro.com> Reviewed-by: Konstantin Shelekhin <k.shelekhin@yadro.com> Reviewed-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Dmitry Bogdanov <d.bogdanov@yadro.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Dmitry Bogdanov [Mon, 23 May 2022 09:59:04 +0000 (12:59 +0300)]
scsi: target: iscsi: Extract auth functions
Create functions that answers simple questions: Whether authentication is
required, what credentials, whether connection is autenticated.
Link: https://lore.kernel.org/r/20220523095905.26070-3-d.bogdanov@yadro.com Reviewed-by: Roman Bolshakov <r.bolshakov@yadro.com> Reviewed-by: Konstantin Shelekhin <k.shelekhin@yadro.com> Reviewed-by: Mike Christie <michael.christie@oracle.com> Reviewed-by: Lee Duncan <lduncan@suse.com> Signed-off-by: Dmitry Bogdanov <d.bogdanov@yadro.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Dmitry Bogdanov [Mon, 23 May 2022 09:59:03 +0000 (12:59 +0300)]
scsi: target: iscsi: Add upcast helpers
iSCSI target is cluttered with open-coded container_of() conversions from
se_nacl to iscsi_node_acl. The code could be cleaned by introducing a
helper - to_iscsi_nacl() (similar to other helpers in target core).
While at it, make another iSCSI conversion helper consistent and rename
iscsi_tpg() helper to to_iscsi_tpg().
Link: https://lore.kernel.org/r/20220523095905.26070-2-d.bogdanov@yadro.com Reviewed-by: Roman Bolshakov <r.bolshakov@yadro.com> Reviewed-by: Konstantin Shelekhin <k.shelekhin@yadro.com> Reviewed-by: Mike Christie <michael.christie@oracle.com> Reviewed-by: Lee Duncan <lduncan@suse.com> Signed-off-by: Dmitry Bogdanov <d.bogdanov@yadro.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Quinn Tran [Tue, 7 Jun 2022 04:46:26 +0000 (21:46 -0700)]
scsi: qla2xxx: edif: Fix n2n login retry for secure device
After initiator has burned up all login retries, target authentication
application begins to run. This triggers a link bounce on target side.
Initiator will attempt another login. Due to N2N, the PRLI [nvme | fcp] can
fail because of the mode mismatch with target. This patch add a few more
login retries to revive the connection.
Link: https://lore.kernel.org/r/20220607044627.19563-11-njavali@marvell.com Fixes: 03d1f5917d23 ("scsi: qla2xxx: edif: Add N2N support for EDIF") Signed-off-by: Quinn Tran <qutran@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Quinn Tran [Tue, 7 Jun 2022 04:46:25 +0000 (21:46 -0700)]
scsi: qla2xxx: edif: Fix n2n discovery issue with secure target
User failed to see disk via n2n topology. Driver used up all login retries
before authentication application started. When authentication application
started, driver did not have enough login retries to connect securely. On
app_start, driver will reset the login retry attempt count.
Link: https://lore.kernel.org/r/20220607044627.19563-10-njavali@marvell.com Fixes: 03d1f5917d23 ("scsi: qla2xxx: edif: Add N2N support for EDIF") Signed-off-by: Quinn Tran <qutran@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Quinn Tran [Tue, 7 Jun 2022 04:46:24 +0000 (21:46 -0700)]
scsi: qla2xxx: edif: Remove old doorbell interface
Recently driver has implemented a new doorbell mechanism via bsg. The new
doorbell tells driver the exact buffer size application has where driver
can fill it up with events. The old doorbell guestimated application buffer
size is 256.
Remove duplicate functionality, the application has moved on to the new
doorbell interface.
Quinn Tran [Tue, 7 Jun 2022 04:46:23 +0000 (21:46 -0700)]
scsi: qla2xxx: edif: Add retry for ELS passthrough
Relating to EDIF, when sending IKE message, updating key or deleting key,
driver can encounter IOCB queue full. Add additional retries to reduce
higher level recovery.
Quinn Tran [Tue, 7 Jun 2022 04:46:21 +0000 (21:46 -0700)]
scsi: qla2xxx: edif: Fix potential stuck session in sa update
When a thread is in the process of reestablish a session, a flag is set to
prevent multiple threads/triggers from doing the same task. This flag was
left on, and any attempt to relogin was locked out. Clear this flag if the
attempt has failed.
Quinn Tran [Tue, 7 Jun 2022 04:46:20 +0000 (21:46 -0700)]
scsi: qla2xxx: edif: Add bsg interface to read doorbell events
Add bsg interface for app to read doorbell events. This interface lets
driver know how much app can read based on return buffer size. When the
next event(s) occur, driver will return the bsg_job with the event(s) in
the return buffer.
If there is no event to read, driver will hold on to the bsg_job up to few
seconds as a way to control the polling interval.
Quinn Tran [Tue, 7 Jun 2022 04:46:19 +0000 (21:46 -0700)]
scsi: qla2xxx: edif: Wait for app to ack on sess down
On session deletion, wait for app to acknowledge before moving on. This
allows both app and driver to stay in sync. In addition, this gives a
chance for authentication app to do any type of cleanup before moving on.
This patch uses GFFID switch command to scan whether remote device is
Target or Initiator mode. Based on that info, driver will not pass up
Initiator info to authentication application. This helps reduce unnecessary
stress for authentication application to deal with unused connections.