git.baikalelectronics.ru Git - kernel.git/commit

author	Horace Chen <horace.chen@amd.com>
	Wed, 20 Jan 2021 14:03:28 +0000 (22:03 +0800)
committer	Alex Deucher <alexander.deucher@amd.com>
	Mon, 25 Jan 2021 22:45:16 +0000 (17:45 -0500)
commit	d6d5dd80f118606ff76f75daf85c31ab935040f1
tree	74e197780fc7941ae409f12a61540c7bae762153
parent	3ed833d499175d0126aa6583c4331aa33bd74431

drm/amdgpu: race issue when jobs on 2 ring timeout

Fix a race that occurs when jobs on 2 rings time out simultaneously.

If 2 rings time out at the same time, amdgpu_device_gpu_recover
is reentered. The adev->gmc.xgmi.head node is then grabbed by 2
local linked lists, which may cause a wild pointer when iterating
over them.

Lock the device early to prevent the node from being added to 2
different lists.

Also increase karma for the skipped job, since that job also
timed out and should be treated as guilty.
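
For illustration only, the idea can be sketched in standalone C. This
is not the amdgpu code: struct device, struct job, recover() and
finish_recovery() are hypothetical stand-ins, and a C11 atomic
compare-and-swap is assumed as the lock. The point it shows is the
ordering: claim the device before linking it into a recovery-local
list, and bump the karma of the job whose recovery is skipped.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the amdgpu structures. */
struct device {
    atomic_int in_reset;   /* 0 = idle, 1 = a recovery owns the device */
    struct device *next;   /* link in a recovery-local list */
};

struct job {
    int karma;             /* guilt counter of a timed-out job */
};

/* Claim the device; exactly one concurrent caller can succeed. */
static bool device_trylock(struct device *dev)
{
    int expected = 0;

    return atomic_compare_exchange_strong(&dev->in_reset, &expected, 1);
}

/*
 * Handle a timed-out job. The lock is taken BEFORE the device is put
 * on the caller's local list, so a second, concurrent caller can never
 * link the same node into its own list.
 *
 * Returns true if this caller owns the recovery, false if it was
 * skipped because another recovery is already in progress.
 */
static bool recover(struct device *dev, struct job *job,
                    struct device **local_list)
{
    if (!device_trylock(dev)) {
        /* Skipped, but the job still timed out: count it as guilty. */
        job->karma++;
        return false;
    }

    dev->next = *local_list;
    *local_list = dev;
    /* ... the actual reset work would go here ... */
    return true;
}

/* Drop the device from the local list and release the lock. */
static void finish_recovery(struct device *dev, struct device **local_list)
{
    assert(atomic_load(&dev->in_reset) == 1);  /* caller must own it */
    *local_list = NULL;
    dev->next = NULL;
    atomic_store(&dev->in_reset, 0);
}
```

With this ordering, a second recover() call that arrives while the
first is still running returns false, leaves its own list empty, and
only raises the karma of its job.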

Signed-off-by: Horace Chen <horace.chen@amd.com>
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>