Evan Quan [Thu, 13 Aug 2020 08:39:25 +0000 (16:39 +0800)]
drm/amd/pm: optimize the power related source code layout
The target is to provide a clear entry point(for power routines).
Also this can help to maintain a clear view about the frameworks
used on different ASICs. Hopefully all these can make power part
more friendly to play with.
Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Evan Quan [Thu, 13 Aug 2020 05:37:52 +0000 (13:37 +0800)]
drm/amd/powerplay: put those exposed power interfaces in amdgpu_dpm.c
As other power interfaces.
Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Nirmoy Das <nirmoy.das@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Evan Quan [Thu, 13 Aug 2020 03:51:11 +0000 (11:51 +0800)]
drm/amd/powerplay: optimize i2c bus access implementation
The caller needs not care about the internal details how the powerplay
API implemented.
Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Nirmoy Das <nirmoy.das@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Evan Quan [Wed, 12 Aug 2020 05:19:25 +0000 (13:19 +0800)]
drm/amd/powerplay: drop unnecessary pp_funcs checker
It's redundant. Also, the callers should not care about
the implementation details.
Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Nirmoy Das <nirmoy.das@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cover the implementation details from outside(of power part).
Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Nirmoy Das <nirmoy.das@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Evan Quan [Fri, 14 Aug 2020 04:15:44 +0000 (12:15 +0800)]
drm/amd/powerplay: suppress the kernel test robot warning
Suppress the warning below:
In file included from drivers/gpu/drm/amd/amdgpu/../powerplay/smu_cmn.c:
>> drivers/gpu/drm/amd/powerplay/smu_cmn.c:485:9: warning: Identical condition 'ret', second condition is always false [identicalConditionAfterEarlyExit]
return ret;
^
drivers/gpu/drm/amd/powerplay/smu_cmn.c:477:6: note: first condition
if (ret)
^
drivers/gpu/drm/amd/powerplay/smu_cmn.c:485:9: note: second condition
return ret;
^
Signed-off-by: Evan Quan <evan.quan@amd.com> Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Guchun Chen [Thu, 13 Aug 2020 07:00:56 +0000 (15:00 +0800)]
drm/amdgpu: guard ras debugfs creation/removal based on CONFIG_DEBUG_FS
It can avoid potential build warn/error when
CONFIG_DEBUG_FS is not set.
Signed-off-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Dennis Li <Dennis.Li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Guchun Chen [Thu, 13 Aug 2020 06:35:35 +0000 (14:35 +0800)]
drm/amdgpu: fix NULL pointer access issue when unloading driver
When unloading driver by "modprobe -r amdgpu", one NULL pointer
dereference bug occurs in ras debugfs releasing. The cause is the
duplicated debugfs_remove, as drm debugfs_root dir has been cleaned
up already by drm_minor_unregister.
Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Evan Quan [Wed, 12 Aug 2020 04:37:03 +0000 (12:37 +0800)]
drm/amd/powerplay: enable Sienna Cichlid mgpu fan boost feature
Support Sienna Cichlid mgpu fan boost enablement.
Signed-off-by: Evan Quan <evan.quan@amd.com> Acked-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Evan Quan [Wed, 12 Aug 2020 04:29:16 +0000 (12:29 +0800)]
drm/amd/powerplay: enable Navi1X mgpu fan boost feature(V2)
Support Navi1X mgpu fan boost enablement.
V2: rich the comment and correct the revision id check
Signed-off-by: Evan Quan <evan.quan@amd.com> Acked-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Evan Quan [Wed, 12 Aug 2020 04:08:56 +0000 (12:08 +0800)]
drm/amd/powerplay: enable swSMU mgpu fan boost support
Enable mgpu fan boost feature on swSMU routines.
Signed-off-by: Evan Quan <evan.quan@amd.com> Acked-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Evan Quan [Wed, 12 Aug 2020 03:53:47 +0000 (11:53 +0800)]
drm/amd/powerplay: optimize the interface for mgpu fan boost enablement
Cover the implementation details from outside(of power). Also preparing
for expanding this to swSMU.
Signed-off-by: Evan Quan <evan.quan@amd.com> Acked-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Kevin Wang [Thu, 6 Aug 2020 15:41:47 +0000 (23:41 +0800)]
drm/amdgpu: fix uninit-value in arcturus_log_thermal_throttling_event()
when function arcturus_get_smu_metrics_data() call failed,
it will cause the variable "throttler_status" isn't initialized before use.
warning:
powerplay/arcturus_ppt.c:2268:24: warning: ‘throttler_status’ may be used uninitialized in this function [-Wmaybe-uninitialized]
2268 | if (throttler_status & logging_label[throttler_idx].feature_mask) {
Signed-off-by: Kevin Wang <kevin1.wang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Evan Quan [Mon, 10 Aug 2020 05:27:56 +0000 (13:27 +0800)]
drm/amd/powerplay: bump NAVI12 driver if version
To fit the latest SMU firmware.
Signed-off-by: Evan Quan <evan.quan@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Evan Quan [Thu, 6 Aug 2020 08:49:19 +0000 (16:49 +0800)]
drm/amd/powerplay: maximum the code sharing around metrics table retrieving
Instead of having one copy in each ASIC.
Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Evan Quan [Thu, 6 Aug 2020 07:38:25 +0000 (15:38 +0800)]
drm/amd/powerplay: update the metrics table cache interval as 1ms
To make the setting same as Arcturus/Navi1x/Sienna_Cichlid.
Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Oak Zeng [Fri, 7 Aug 2020 03:17:35 +0000 (22:17 -0500)]
drm/amdgpu: Use function pointer for some mmhub functions
Add more function pointers to amdgpu_mmhub_funcs. ASIC specific
implementation of most mmhub functions are called from a general
function pointer, instead of calling different function for
different ASIC. Simplify the code by deleting duplicate functions
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Each adev has owned lock_class_key to avoid false positive
recursive locking.
v2:
1. register adev->lock_key into lockdep, otherwise lockdep will
report the below warning
[ 1216.705820] BUG: key ffff890183b647d0 has not been registered!
[ 1216.705924] ------------[ cut here ]------------
[ 1216.705972] DEBUG_LOCKS_WARN_ON(1)
[ 1216.705997] WARNING: CPU: 20 PID: 541 at kernel/locking/lockdep.c:3743 lockdep_init_map+0x150/0x210
v3:
change to use down_write_nest_lock to annotate the false dead-lock
warning.
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Dennis Li <Dennis.Li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The RAP TA contains tests used to verify if
RAP(Register Access Policy), or otherwise known
as Security Policy is applied correctly
by PSP BL&TOS.
The RAP test is a measure to ensure that we reduce
the avenue of complexity and mistakes when dealing
with RAP in post-si execution, where debugging failures
related to RAP is quite difficult and expensive.
Tianci.Yin [Fri, 19 Jun 2020 08:01:11 +0000 (16:01 +0800)]
drm/amdgpu: add interface amdgpu_gfx_init_spm_golden for Navi1x
On Navi1x, the SPM golden settings are lost after GFXOFF
enter/exit, so reconfiguration is needed. Make the
configuration code as an interface for future use.
Guchun Chen [Tue, 4 Aug 2020 07:05:01 +0000 (15:05 +0800)]
drm/amdgpu: add debugfs node to toggle ras error cnt harvest
Before ras recovery is issued, user could operate this debugfs
node to enable/disable the harvest of all RAS IPs' ras error
count registers, which will help keep hardware's registers'
status instead of cleaning up them.
Signed-off-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Dennis Li <Dennis.Li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Once ras recovery is issued by ras sync flood interrupt or
ras controller interrupt, add this guard to bypass or execute
ras error count register harvest of all IPs.
Signed-off-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Dennis Li <Dennis.Li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Arunpravin [Thu, 6 Aug 2020 09:04:33 +0000 (14:34 +0530)]
drm/amdgpu: Enable P2P dmabuf over XGMI
Access the exported P2P dmabuf over XGMI, if available.
Otherwise, fall back to the existing PCIe method.
Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Arunpravin <apaneers@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Daniel Kolesa [Sat, 8 Aug 2020 20:44:58 +0000 (22:44 +0200)]
drm/amd/display: add DCN support for aarch64
This adds ARM64 support into the DCN. This mainly enables support
for Navi graphics cards. The dcn10 changes haven't been tested,
since I don't have the relevant hardware available, but there
is no way to conditionally disable them, so I've done them anyway.
Signed-off-by: Daniel Kolesa <daniel@octaforge.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Daniel Kolesa [Sat, 8 Aug 2020 20:42:35 +0000 (22:42 +0200)]
drm/amdgpu/display: use GFP_ATOMIC in dcn20_validate_bandwidth_internal
GFP_KERNEL may and will sleep, and this is being executed in
a non-preemptible context; this will mess things up since it's
called inbetween DC_FP_START/END, and rescheduling will result
in the DC_FP_END later being called in a different context (or
just crashing if any floating point/vector registers/instructions
are used after the call is resumed in a different context).
Signed-off-by: Daniel Kolesa <daniel@octaforge.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amd/display: Blank stream before destroying HDCP session
[Why]
Stream disable sequence incorretly destroys HDCP session while stream is
not blanked and while audio is not muted. This sequence causes a flash
of corruption during mode change and an audio click.
[How]
Change sequence to blank stream before destroying HDCP session. Audio will
also be muted by blanking the stream.
Cc: stable@vger.kernel.org Signed-off-by: Jaehyun Chung <jaehyun.chung@amd.com> Reviewed-by: Alvin Lee <Alvin.Lee2@amd.com> Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Stylon Wang [Tue, 28 Jul 2020 07:10:35 +0000 (15:10 +0800)]
drm/amd/display: Fix EDID parsing after resume from suspend
[Why]
Resuming from suspend, CEA blocks from EDID are not parsed and no video
modes can support YUV420. When this happens, output bpc cannot go over
8-bit with 4K modes on HDMI.
[How]
In amdgpu_dm_update_connector_after_detect(), drm_add_edid_modes() is
called after drm_connector_update_edid_property() to fully parse EDID
and update display info.
Cc: stable@vger.kernel.org Signed-off-by: Stylon Wang <stylon.wang@amd.com> Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com> Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alvin Lee [Thu, 30 Jul 2020 03:08:59 +0000 (23:08 -0400)]
drm/amd/display: Disconnect pipe separetely when disable pipe split
[Why]
When changing pixel formats for HDR (e.g. ARGB -> FP16)
there are configurations that change from 2 pipes to 1 pipe.
In these cases, it seems that disconnecting MPCC and doing
a surface update at the same time(after unlocking) causes
some registers to be updated slightly faster than others
after unlocking (e.g. if the pixel format is updated to FP16
before the new surface address is programmed, we get
corruption on the screen because the pixel formats aren't
matching). We separate disconnecting MPCC from the rest
of the pipe programming sequence to prevent this.
[How]
Move MPCC disconnect into separate operation than the
rest of the pipe programming.
Signed-off-by: Alvin Lee <alvin.lee2@amd.com> Reviewed-by: Aric Cyr <Aric.Cyr@amd.com> Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Anthony Koo [Wed, 29 Jul 2020 21:43:10 +0000 (17:43 -0400)]
drm/amd/display: Switch to immediate mode for updating infopackets
[Why]
Using FRAME_UPDATE will result in infopacket to be potentially updated
one frame late.
In commit stream scenarios for previously active stream, some stale
infopacket data from previous config might be erroneously sent out on
initial frame after stream is re-enabled.
[How]
Switch to using IMMEDIATE_UPDATE mode
Signed-off-by: Anthony Koo <Anthony.Koo@amd.com> Reviewed-by: Ashley Thomas <Ashley.Thomas2@amd.com> Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
1. There is a calculation that is using frame_time_in_us instead of
last_render_time_in_us to calculate whether choosing an LFC multiplier
would cause the inserted frame duration to be outside of range.
2. We do not handle unsigned integer subtraction correctly and it underflows
to a really large value, which causes some logic errors.
[How]
1. Fix logic to calculate 'within range' using last_render_time_in_us
2. Split out delta_from_mid_point_delta_in_us calculation to ensure
we don't underflow and wrap around
Signed-off-by: Anthony Koo <Anthony.Koo@amd.com> Reviewed-by: Aric Cyr <Aric.Cyr@amd.com> Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Xiaodong Yan [Tue, 28 Jul 2020 10:12:45 +0000 (18:12 +0800)]
drm/amd/display: mpcc black color should not be impacted by pixel encoding format
[Why]
The format in MPCC should be 444
[How]
do not modify the mpcc black color according to pixel encoding format
Signed-off-by: Xiaodong Yan <Xiaodong.Yan@amd.com> Reviewed-by: Eric Yang <eric.yang2@amd.com> Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amd/display: Adjust static-ness of resource functions
[Why]
Register definitions are asic-specific, so functions that use registers of
a particular asic should be static, to be exposed in asic-specific function
pointer structures.
[How]
- make register-definition-using functions static
- make some functions non-static, for future use
- remove duplicate function definition
Signed-off-by: Joshua Aberback <joshua.aberback@amd.com> Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com> Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Monk Liu [Mon, 10 Aug 2020 06:12:06 +0000 (14:12 +0800)]
drm/amdgpu: fix reload KMD hang on GFX10 KIQ
GFX10 KIQ will hang if we try below steps:
modprobe amdgpu
rmmod amdgpu
modprobe amdgpu sched_hw_submission=4
Due to KIQ is always living there even after KMD unloaded
thus when doing the realod KIQ will crash upon its register
being programed by different values with the previous loading
(the config like HQD addr, ring size, is easily changed if we alter
the sched_hw_submission)
the fix is we must inactive KIQ first before touching any
of its registgers
Signed-off-by: Monk Liu <Monk.Liu@amd.com> Reviewed-by: Emily Deng <Emily.Deng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Evan Quan [Fri, 7 Aug 2020 09:01:47 +0000 (17:01 +0800)]
drm/amd/powerplay: correct UVD/VCE PG state on custom pptable uploading
The UVD/VCE PG state is managed by UVD and VCE IP. It's error-prone to
assume the bootup state in SMU based on the dpm status.
Signed-off-by: Evan Quan <evan.quan@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Evan Quan [Fri, 7 Aug 2020 07:03:40 +0000 (15:03 +0800)]
drm/amd/powerplay: correct Vega20 cached smu feature state
Correct the cached smu feature state on pp_features sysfs
setting.
Signed-off-by: Evan Quan <evan.quan@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jay Cornwall [Fri, 24 Jul 2020 23:58:48 +0000 (16:58 -0700)]
drm/amdkfd: Fix spurious debug exception on gfx10
s_barrier triggers a debug exception when issued with PRIV=1,
DEBUG_EN=1. This causes spurious notifications to rocm-gdb.
Clear MODE before issuing s_barrier and restore MODE afterwards
in the context restore handler.
Signed-off-by: Jay Cornwall <jay.cornwall@amd.com> Tested-by: Laurent Morichetti <laurent.morichetti@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm: amdgpu: Use the correct size when allocating memory
When '*sgt' is allocated, we must allocated 'sizeof(**sgt)' bytes instead
of 'sizeof(*sg)'.
The sizeof(*sg) is bigger than sizeof(**sgt) so this wastes memory but
it won't lead to corruption.
Fixes: 4ff30ea1f771 ("drm/amdgpu: add support for exporting VRAM using DMA-buf v3") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amdgpu: Fix bug where DPM is not enabled after hibernate and resume
Reproducing bug report here:
After hibernating and resuming, DPM is not enabled. This remains the case
even if you test hibernate using the steps here:
https://www.kernel.org/doc/html/latest/power/basic-pm-debugging.html
I debugged the problem, and figured out that in the file hardwaremanager.c,
in the function, phm_enable_dynamic_state_management(), the check
'if (!hwmgr->pp_one_vf && smum_is_dpm_running(hwmgr) && !amdgpu_passthrough(adev) && adev->in_suspend)'
returns true for the hibernate case, and false for the suspend case.
This means that for the hibernate case, the AMDGPU driver doesn't enable DPM
(even though it should) and simply returns from that function.
In the suspend case, it goes ahead and enables DPM, even though it doesn't need to.
I debugged further, and found out that in the case of suspend, for the
CIK/Hawaii GPUs, smum_is_dpm_running(hwmgr) returns false, while in the case of
hibernate, smum_is_dpm_running(hwmgr) returns true.
For CIK, the ci_is_dpm_running() function calls the ci_is_smc_ram_running() function,
which is ultimately used to determine if DPM is currently enabled or not,
and this seems to provide the wrong answer.
I've changed the ci_is_dpm_running() function to instead use the same method that
some other AMD GPU chips do (e.g Fiji), which seems to read the voltage controller.
I've tested on my R9 390 and it seems to work correctly for both suspend and
hibernate use cases, and has been stable so far.
Bug: https://bugzilla.kernel.org/show_bug.cgi?id=208839 Signed-off-by: Sandeep Raghuraman <sandy.8925@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Dennis Li [Tue, 4 Aug 2020 04:32:13 +0000 (12:32 +0800)]
drm/amdgpu: unlock mutex on error
Make sure to unlock the mutex when error happen
v2:
1. correct syntax error in the commit comments
2. remove change-Id
Acked-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Signed-off-by: Dennis Li <Dennis.Li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Evan Quan [Wed, 5 Aug 2020 09:24:41 +0000 (17:24 +0800)]
drm/amd/powerplay: put VCN/JPEG into PG ungate state before dpm table setup(V3)
As VCN related dpm table setup needs VCN be in PG ungate state. Same logics
applies to JPEG.
V2: fix paste typo
V3: code cosmetic
Signed-off-by: Evan Quan <evan.quan@amd.com> Tested-by: Matt Coffin <mcoffin13@gmail.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add lock protections and avoid unnecessary actions
if the PG state is already the same as required.
Signed-off-by: Evan Quan <evan.quan@amd.com> Tested-by: Matt Coffin <mcoffin13@gmail.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amd/display: Drop dm_determine_update_type_for_commit
[Why]
This was added in the past to solve the issue of not knowing when
to stall for medium and full updates in DM.
Since DC is ultimately decides what requires bandwidth changes we
wanted to make use of it directly to determine this.
The problem is that we can't actually pass any of the stream or surface
updates into DC global validation, so we don't actually check if the new
configuration is valid - we just validate the old existing config
instead and stall for outstanding commits to finish.
There's also the problem of grabbing the DRM private object for
pageflips which can lead to page faults in the case where commits
execute out of order and free a DRM private object state that was
still required for commit tail.
[How]
Now that we reset the plane in DM with the same conditions DC checks
we can have planes go through DC validation and we know when we need
to check and stall based on whether the stream or planes changed.
We mark lock_and_validation_needed whenever we've done this, so just
go back to using that instead of dm_determine_update_type_for_commit.
Since we'll skip resetting the plane for a pageflip we will no longer
grab the DRM private object for pageflips as well, avoiding the
page fault issued caused by pageflipping under load with commits
executing out of order.
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amd/display: Reset plane for anything that's not a FAST update
[Why]
MEDIUM or FULL updates can require global validation or affect
bandwidth. By treating these all simply as surface updates we aren't
actually passing this through DC global validation.
[How]
There's currently no way to pass surface updates through DC global
validation, nor do I think it's a good idea to change the interface
to accept these.
DC global validation itself is currently stateless, and we can move
our update type checking to be stateless as well by duplicating DC
surface checks in DM based on DRM properties.
We wanted to rely on DC automatically determining this since DC knows
best, but DM is ultimately what fills in everything into DC plane
state so it does need to know as well.
There are basically only three paths that we exercise in DM today:
Which means that anything that's more than a pageflip really needs to
go down path #3.
So this change duplicates all the surface update checks based on DRM
state instead inside of should_reset_plane().
Next step is dropping dm_determine_update_type_for_commit and we no
longer require the old DC state at all for global validation.
Optimization can come later so we don't reset DC planes at all for
MEDIUM udpates and avoid validation, but we might require some extra
checks in DM to achieve this.
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Reviewed-by: Hersen Wu <hersenxs.wu@amd.com> Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amd/display: Store tiling_flags and tmz_surface on dm_plane_state
[Why]
Store these in advance so we can reuse them later in commit_tail without
having to reserve the fbo again.
These will also be used for checking for tiling changes when deciding
to reset the plane or not.
[How]
This change should mostly be a refactor. Only commit check is affected
for now and I'll drop the get_fb_info calls in prepare_planes and
commit_tail after.
This runs a prepass loop once we think that all planes have been added
to the context and replaces the get_fb_info calls with accessing the
dm_plane_state instead.
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Allows UMD to know if TMZ is supported and enabled.
This commit also bumps KMS_DRIVER_MINOR because if we don't
UMD can't tell if "ids_flags & AMDGPU_IDS_FLAGS_TMZ == 0" means
"tmz is not enabled" or "tmz may be enabled but the kernel doesn't
report it".
v2: use amdgpu_is_tmz() and reworded commit message.
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Evan Quan [Thu, 30 Jul 2020 07:28:40 +0000 (15:28 +0800)]
drm/amd/powerplay: add control method to bypass metrics cache on Vega12
As for the gpu metric export, metrics cache makes no sense. It's up to
user to decide how often the metrics should be retrieved.
Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Evan Quan [Thu, 30 Jul 2020 07:24:08 +0000 (15:24 +0800)]
drm/amd/powerplay: add control method to bypass metrics cache on Vega20
As for the gpu metric export, metrics cache makes no sense. It's up to
user to decide how often the metrics should be retrieved.
Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Evan Quan [Thu, 30 Jul 2020 07:02:11 +0000 (15:02 +0800)]
drm/amd/powerplay: add control method to bypass metrics cache on Renoir
As for the gpu metric export, metrics cache makes no sense. It's up to
user to decide how often the metrics should be retrieved.
Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Evan Quan [Thu, 30 Jul 2020 07:09:57 +0000 (15:09 +0800)]
drm/amd/powerplay: add control method to bypass metrics cache on Sienna Cichlid
As for the gpu metric export, metrics cache makes no sense. It's up to
user to decide how often the metrics should be retrieved.
Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Evan Quan [Thu, 30 Jul 2020 06:55:32 +0000 (14:55 +0800)]
drm/amd/powerplay: add control method to bypass metrics cache on Navi10
As for the gpu metric export, metrics cache makes no sense. It's up to
user to decide how often the metrics should be retrieved.
Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Evan Quan [Thu, 30 Jul 2020 06:31:21 +0000 (14:31 +0800)]
drm/amd/powerplay: add control method to bypass metrics cache on Arcturus
As for the gpu metric export, metrics cache makes no sense. It's up to
user to decide how often the metrics should be retrieved.
Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Evan Quan [Thu, 30 Jul 2020 04:39:58 +0000 (12:39 +0800)]
drm/amd/powerplay: add Vega12 support for gpu metrics export
Add Vega12 gpu metrics export interface.
Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Evan Quan [Thu, 30 Jul 2020 04:23:42 +0000 (12:23 +0800)]
drm/amd/powerplay: add Vega20 support for gpu metrics export
Add Vega20 gpu metrics export interface.
Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Evan Quan [Thu, 30 Jul 2020 03:40:07 +0000 (11:40 +0800)]
drm/amd/powerplay: enable gpu_metrics export on legacy powerplay routines
Enable gpu_metrics support on legacy powerplay routines.
Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Evan Quan [Mon, 27 Jul 2020 08:24:46 +0000 (16:24 +0800)]
drm/amd/powerplay: add Renoir support for gpu metrics export(V2)
Add Renoir gpu metrics export interface.
V2: use memcpy to make code more compact
Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Nirmoy Das <nirmoy.das@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Evan Quan [Mon, 27 Jul 2020 02:00:47 +0000 (10:00 +0800)]
drm/amd/powerplay: add Sienna Cichlid support for gpu metrics export
Add Sienna Cichlid gpu metrics export interface.
Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Evan Quan [Fri, 24 Jul 2020 09:24:34 +0000 (17:24 +0800)]
drm/amd/powerplay: add Navi1x support for gpu metrics export
Add Navi1x gpu metrics export interface.
Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Evan Quan [Fri, 24 Jul 2020 09:47:03 +0000 (17:47 +0800)]
drm/amd/powerplay: update the data structure for NV12 SmuMetrics
Although it does not bring any problem for now, the coming gpu
metrics interface needs to handle them differently based on the
asic type.
Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Evan Quan [Fri, 24 Jul 2020 02:42:39 +0000 (10:42 +0800)]
drm/amd/powerplay: add Arcturus support for gpu metrics export
Add Arcturus gpu metrics export interface.
Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Evan Quan [Fri, 24 Jul 2020 10:39:33 +0000 (18:39 +0800)]
drm/amd/powerplay: implement SMU V11 common APIs for retrieving link speed/width
This will be shared around all SMU V11 asics.
Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Evan Quan [Thu, 23 Jul 2020 10:03:35 +0000 (18:03 +0800)]
drm/amd/powerplay: add new sysfs interface for retrieving gpu metrics(V2)
A new interface for UMD to retrieve gpu metrics data.
V2: rich the documentation
Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Evan Quan [Thu, 23 Jul 2020 08:07:01 +0000 (16:07 +0800)]
drm/amd/powerplay: define an universal data structure for gpu metrics (V4)
Thus we can provide an interface for UMD to retrieve gpu metrics data.
V2: better naming and comments
V3: two structures created for dGPU and APU separately
V4: add driver attached timestamp
Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Philip Yang [Mon, 27 Jul 2020 13:06:18 +0000 (09:06 -0400)]
drm/amdkfd: option to disable system mem limit
If multiple process share system memory through /dev/shm, KFD allocate
memory should not fail if it reaches the system memory limit because
one copy of physical system memory are shared by multiple process.
Add module parameter no_system_mem_limit to provide user option to
disable system memory limit check at runtime using sysfs or during
driver module init using kernel boot argument. By default the system
memory limit is on.
Print out debug message to warn user if KFD allocate memory failed
because system memory reaches limit.
Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Tianjia Zhang [Sun, 2 Aug 2020 11:15:36 +0000 (19:15 +0800)]
drm/amd/display: Fix wrong return value in dm_update_plane_state()
On an error exit path, a negative error code should be returned
instead of a positive return value.
Fixes: 4af6690ca0428 ("drm/amd/display: Move iteration out of dm_update_planes") Cc: Leo Li <sunpeng.li@amd.com> Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 28 Jul 2020 19:35:56 +0000 (15:35 -0400)]
drm/amdgpu/gmc: disable keep_stolen_vga_memory on arcturus
I suspect the only reason this was set was to avoid touching
the display related registers on arcturus. Someone should
double check this on arcturus with S3.
Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 28 Jul 2020 22:34:50 +0000 (18:34 -0400)]
drm/amdgpu: drop the CPU pointers for the stolen vga bos
We never use them.
Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 28 Jul 2020 22:30:14 +0000 (18:30 -0400)]
drm/amdgpu/gmc10: switch to using amdgpu_gmc_get_vbios_allocations
The new helper centralizes the logic in one place.
Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 28 Jul 2020 22:29:55 +0000 (18:29 -0400)]
drm/amdgpu/gmc9: switch to using amdgpu_gmc_get_vbios_allocations
The new helper centralizes the logic in one place.
Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 28 Jul 2020 22:29:39 +0000 (18:29 -0400)]
drm/amdgpu/gmc8: switch to using amdgpu_gmc_get_vbios_allocations
The new helper centralizes the logic in one place.
Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 28 Jul 2020 22:29:20 +0000 (18:29 -0400)]
drm/amdgpu/gmc7: switch to using amdgpu_gmc_get_vbios_allocations
The new helper centralizes the logic in one place.
Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 28 Jul 2020 22:27:46 +0000 (18:27 -0400)]
drm/amdgpu/gmc6: switch to using amdgpu_gmc_get_vbios_allocations
The new helper centralizes the logic in one place.
Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>