git.baikalelectronics.ru Git - kernel.git/commit

drm/amdgpu: Add autodump debugfs node for gpu reset v8

When the GPU times out, amdgpu notifies an interested party, giving it
an opportunity to dump information before the actual GPU reset.

A usermode app opens the 'autodump' node under debugfs and poll()s it
for readable/writable. When a GPU reset is due, amdgpu notifies the
usermode app through a wait_queue_head and gives it 10 minutes to dump
information.
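A listener on the usermode side might wrap the poll() step as in the sketch below. It assumes the caller has already opened the node, for example `/sys/kernel/debug/dri/0/amdgpu_autodump` (the `dri/0` minor is an assumption; the `amdgpu_autodump` file name comes from the v3 note). The helper `wait_for_reset_notice()` is hypothetical and is not part of amdgpu or umr.

```c
#include <errno.h>
#include <poll.h>

/*
 * Hypothetical helper: block until `fd` (the opened autodump node)
 * reports readable/writable, or until `timeout_ms` elapses
 * (-1 waits forever).
 * Returns 1 when an event is pending, 0 on timeout, -1 on error.
 */
int wait_for_reset_notice(int fd, int timeout_ms)
{
	struct pollfd pfd = {
		.fd = fd,
		/* The node signals a pending reset as readable/writable. */
		.events = POLLIN | POLLOUT,
	};

	for (;;) {
		int ret = poll(&pfd, 1, timeout_ms);

		if (ret < 0 && errno == EINTR)
			continue; /* interrupted by a signal: retry with the full timeout */
		if (ret < 0)
			return -1;
		if (ret == 0)
			return 0; /* timed out with no event */
		if (pfd.revents & (POLLERR | POLLNVAL))
			return -1;
		return 1;
	}
}
```

A listener would typically call `wait_for_reset_notice(fd, -1)`, collect dmesg and umr output once it returns 1, and then close(fd) so that amdgpu learns the dump is finished.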

Once the usermode app has finished its work, it closes the 'autodump'
node. On closure, amdgpu learns that the dump is done through the
completion that is signalled in release().
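The kernel-side handshake (wake the poller, then wait on a completion that release() signals, bounded by the 10-minute budget) can be modelled in userspace with a mutex and a condition variable. This is an illustrative analogue of the kernel's `struct completion`, not amdgpu code; `struct dump_completion`, its functions, and `DUMP_TIMEOUT_MS` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* 10 minutes, the dump budget stated in the commit message. */
#define DUMP_TIMEOUT_MS (10L * 60 * 1000)

/*
 * Hypothetical userspace stand-in for a kernel completion.
 * `done` stays set after complete_all() until it is reinitialized.
 */
struct dump_completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

void dump_completion_init(struct dump_completion *c, bool done)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = done;
}

/* Plays the role of release(): mark the dump finished and wake all waiters. */
void dump_completion_complete_all(struct dump_completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_broadcast(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

/*
 * Plays the role of the reset path: wait up to timeout_ms for the dump.
 * Returns true if the dump completed, false if the wait timed out.
 */
bool dump_completion_wait(struct dump_completion *c, long timeout_ms)
{
	struct timespec deadline;
	bool done;

	/* The default condvar clock is CLOCK_REALTIME, so build an absolute deadline on it. */
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec += 1;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&c->lock);
	while (!c->done) {
		/* Loop guards against spurious wakeups. */
		if (pthread_cond_timedwait(&c->cond, &c->lock, &deadline) == ETIMEDOUT)
			break;
	}
	done = c->done;
	pthread_mutex_unlock(&c->lock);
	return done;
}
```

Passing `DUMP_TIMEOUT_MS` reproduces the 10-minute budget; in this sketch a false return only means the caller stops waiting for the dump.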

There are no read or write callbacks, because the necessary information
can be obtained through dmesg and umr; no messages need to pass back and
forth between the usermode app and amdgpu.

v2: (1) changed 'registered' to 'app_listening'
    (2) added a mutex in open() to prevent a race condition

v3 (chk): grab the reset lock to avoid race in autodump_open,
          rename debugfs file to amdgpu_autodump,
          provide autodump_read as well,
          style and code cleanups

v4: add 'bool app_listening' to differentiate situations, so that
    the node can be reopened; also, there is no need to wait for
    completion when no app is waiting for a dump.

v5: change 'bool app_listening' to 'enum amdgpu_autodump_state'
    add 'app_state_mutex' for race conditions:
    (1) only one user can open this file node
    (2) wait_dump() can only take effect after poll() has executed
    (3) eliminated the race condition between release() and
        wait_dump()

v6: removed 'enum amdgpu_autodump_state' and 'app_state_mutex'
    removed state checking in amdgpu_debugfs_wait_dump
    improved on top of version 3 so that the node can be reopened.

v7: move reinit_completion into open() so that only one user
    can open it.

v8: remove complete_all() from amdgpu_debugfs_wait_dump().
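The open/release rules described in v4, v7 and v8 reduce to a small state machine, sketched single-threaded below. It assumes the completion starts out signalled, so that a reset with no listener does not wait (per v4); the reset lock that v3 takes in open() is omitted, and all names are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical model of the node's lifecycle. `dump_done` mirrors the
 * completion's done state: it starts true (assumed: nobody listening),
 * open() clears it and release() sets it again.
 */
struct autodump_model {
	bool dump_done;
};

void autodump_model_init(struct autodump_model *m)
{
	m->dump_done = true; /* signalled up front: no app is listening */
}

/* open(): only one opener at a time; the reinit happens here (v7). */
int autodump_model_open(struct autodump_model *m)
{
	if (!m->dump_done)
		return -EBUSY; /* another app still holds the node */
	m->dump_done = false; /* corresponds to reinit_completion() */
	return 0;
}

/* release(): signal that the dump is done; the node can be reopened. */
void autodump_model_release(struct autodump_model *m)
{
	m->dump_done = true; /* corresponds to complete_all() */
}

/* The reset path only has to wait while an app holds the node open. */
bool autodump_model_reset_must_wait(const struct autodump_model *m)
{
	return !m->dump_done;
}
```

Because release() is the only place that signals completion in this model, the wait path never signals it itself, which matches the v8 change.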

Signed-off-by: Jiange Zhao <Jiange.Zhao@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

author	Jiange Zhao <Jiange.Zhao@amd.com>
	Sun, 26 Apr 2020 09:57:00 +0000 (17:57 +0800)
committer	Alex Deucher <alexander.deucher@amd.com>
	Mon, 18 May 2020 15:23:37 +0000 (11:23 -0400)
commit	97e6f1a67eea7e49c17e6ff7d1f7a9dfa340ce4a
tree	f37d2f8d3060d473562da259577d4d4a891f8280
parent	38f99ba6a2bd381c9a5bb671a66f381fafd18d2c

drivers/gpu/drm/amd/amdgpu/amdgpu.h
drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.h
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c