drm/amdgpu: race issue when jobs on 2 ring timeout
author Horace Chen <horace.chen@amd.com>
Wed, 20 Jan 2021 14:03:28 +0000 (22:03 +0800)
committer Alex Deucher <alexander.deucher@amd.com>
Mon, 25 Jan 2021 22:45:16 +0000 (17:45 -0500)
commit d6d5dd80f118606ff76f75daf85c31ab935040f1
tree 74e197780fc7941ae409f12a61540c7bae762153
parent 3ed833d499175d0126aa6583c4331aa33bd74431

Fix a race that occurs when jobs on 2 rings time out simultaneously.

If 2 rings time out at the same time, amdgpu_device_gpu_recover
will be reentered. adev->gmc.xgmi.head will then be added to 2
local linked lists, which may cause wild pointer accesses when
the lists are iterated.
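
The failure mode can be illustrated in isolation. The sketch below is
not the driver code: struct list_node and the helpers are hypothetical
stand-ins for the kernel's list_head API, and demo_double_add() only
simulates two recoveries each appending the same node to its own local
list. After the second append, a walk of the first list never returns
to its head and instead reaches the other list's head, which a real
iterator would misinterpret as a device entry.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct list_head. */
struct list_node {
    struct list_node *prev, *next;
};

static void list_init(struct list_node *head)
{
    head->prev = head->next = head;
}

/* Append without checking whether the node already sits on another
 * list, which is effectively what two racing recoveries do. */
static void list_add_tail_unchecked(struct list_node *node,
                                    struct list_node *head)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/* Follow next pointers from head. Return 1 if the walk comes back to
 * head within max_steps, 0 if it has wandered onto another list. */
static int list_walk_returns_to_head(const struct list_node *head,
                                     int max_steps)
{
    assert(head != NULL);
    const struct list_node *pos = head->next;

    for (int i = 0; i < max_steps; i++) {
        if (pos == head)
            return 1;
        pos = pos->next;
    }
    return 0;
}

/* Simulate two recoveries adding the same device node to two local
 * lists, then check whether the first list is still walkable. */
static int demo_double_add(void)
{
    struct list_node list_a, list_b, dev;

    list_init(&list_a);
    list_init(&list_b);
    list_add_tail_unchecked(&dev, &list_a); /* first recovery */
    list_add_tail_unchecked(&dev, &list_b); /* second recovery */

    return list_walk_returns_to_head(&list_a, 16);
}
```

Here demo_double_add() returns 0: the node's next pointer now leads to
list_b, so iteration over list_a never terminates at its own head.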

Lock the device early to prevent the node from being added to 2
different lists.

Also increase karma for the skipped job, since that job has also
timed out and should be treated as guilty.

Signed-off-by: Horace Chen <horace.chen@amd.com>
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c