]> git.baikalelectronics.ru Git - kernel.git/commit
nvme-rdma: remove timeout for getting RDMA-CM established event
authorIsrael Rukshin <israelr@nvidia.com>
Sun, 15 May 2022 15:04:40 +0000 (18:04 +0300)
committerJens Axboe <axboe@kernel.dk>
Tue, 2 Aug 2022 23:22:41 +0000 (17:22 -0600)
commitfbfb18718fa81633a0aabed9eaeff522295d7530
tree615b15709420f85d1161e5e959559eb50f97b289
parent94c0fbf3275b348b53b8165aba3c5eb141b473ed
nvme-rdma: remove timeout for getting RDMA-CM established event

In case many controllers start error recovery at the same time (i.e.,
when port is down and up), they may never succeed to reconnect again.
This is because the target can't handle all the connect requests at
three seconds (the arbitrary value set today). Even if some of the
connections are established, when a single queue fails to connect,
all the controller's queues are destroyed as well. So, on the
following reconnection attempts the number of connect requests may
remain the same. To fix this, remove the timeout and wait for RDMA-CM
event to abort/complete the connect request. RDMA-CM sends unreachable
event when a timeout of ~90 seconds is expired. This approach is used
at other RDMA-CM users like SRP and iSER at blocking mode. The commit
also renames NVME_RDMA_CONNECT_TIMEOUT_MS to NVME_RDMA_CM_TIMEOUT_MS.

Signed-off-by: Israel Rukshin <israelr@nvidia.com>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
drivers/nvme/host/rdma.c