git.baikalelectronics.ru Git - kernel.git/commit
drm/amdgpu: fix kernel page fault issue by ras recovery on sGPU
author Guchun Chen <guchun.chen@amd.com>
Thu, 16 Apr 2020 15:41:07 +0000 (23:41 +0800)
committer Alex Deucher <alexander.deucher@amd.com>
Wed, 22 Apr 2020 22:11:46 +0000 (18:11 -0400)
commit 506617b176fa30b2d6e21b10b97364effc0c609f
tree a636bfd72445599432b2165a2d59c205606cacb5
parent a24be6fe60529dc4aeedcf0aae29b08909c81d1c

When running RAS uncorrectable error injection and triggering a GPU
reset on sGPU, the issue below is observed. It is caused by accessing
a list that has not been initialized.

[   80.047227] BUG: unable to handle page fault for address: ffffffffc0f4f750
[   80.047300] #PF: supervisor write access in kernel mode
[   80.047351] #PF: error_code(0x0003) - permissions violation
[   80.047404] PGD 12c20e067 P4D 12c20e067 PUD 12c210067 PMD 41c4ee067 PTE 404316061
[   80.047477] Oops: 0003 [#1] SMP PTI
[   80.047516] CPU: 7 PID: 377 Comm: kworker/7:2 Tainted: G           OE     5.4.0-rc7-guchchen #1
[   80.047594] Hardware name: System manufacturer System Product Name/TUF Z370-PLUS GAMING II, BIOS 0411 09/21/2018
[   80.047888] Workqueue: events amdgpu_ras_do_recovery [amdgpu]
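
The diff itself is not reproduced here, so the following is only a minimal
userspace sketch of the general bug class the message describes: a list head
that lives on the stack and is written through before it has been initialized.
The struct and helpers mirror the shape of the kernel's struct list_head,
INIT_LIST_HEAD() and list_add_tail(), but they are local stand-ins, and
build_single_device_list() is a hypothetical name, not a function from
amdgpu_ras.c.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal userspace stand-in for the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

/* Same idea as the kernel's INIT_LIST_HEAD(): an empty list points at itself. */
static void init_list_head(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Same idea as list_add_tail(): it writes through head->prev,
 * so head must already be initialized. */
static void list_add_tail(struct list_head *entry, struct list_head *head)
{
    struct list_head *last = head->prev;

    entry->next = head;
    entry->prev = last;
    last->next = entry;   /* a garbage head->prev turns this into a wild write */
    head->prev = entry;
}

/* Count the entries linked into the list (the head itself is not counted). */
static size_t list_count(const struct list_head *head)
{
    size_t n = 0;

    for (const struct list_head *p = head->next; p != head; p = p->next)
        n++;
    return n;
}

/* Hypothetical single-GPU path: build a one-entry device list on the stack. */
static size_t build_single_device_list(struct list_head *device_node)
{
    struct list_head device_list;   /* stack storage: contents are indeterminate */
    size_t count;

    assert(device_node != NULL);

    /* Without this call, list_add_tail() would dereference stack garbage. */
    init_list_head(&device_list);
    list_add_tail(device_node, &device_list);

    count = list_count(&device_list);

    /* Detach the node so it does not keep pointers into this stack frame. */
    init_list_head(device_node);
    return count;
}
```

Calling build_single_device_list() with the address of a caller-owned
struct list_head yields a list holding exactly that one node. Dropping the
init_list_head(&device_list) call leaves the function with undefined behaviour,
which in kernel context can surface as a page fault like the one logged above.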

Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: John Clements <John.Clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c