Imre Deak [Fri, 5 Jun 2020 09:48:01 +0000 (12:48 +0300)]
drm/i915/dp_mst: Fix disabling MST on a port
Currently MST on a port can get enabled/disabled from the hotplug work
and get disabled from the short pulse work in a racy way. Fix this by
relying on the MST state checking in the hotplug work and just schedule
a hotplug work from the short pulse handler if some problem happened
during the MST interrupt handling.
This removes the explicit MST disabling in case of an AUX failure, but
if AUX fails, then probably the detection will also fail during the
scheduled hotplug work and it's not guaranteed that we'll see
intermittent errors anyway.
While at it also simplify the error checking of the MST interrupt
handler.
v2:
- Convert intel_dp_check_mst_status() to return bool. (Ville)
- Change the intel_dp->is_mst check to an assert, since after this patch
the condition can't change after we checked it previously.
- Document the return value from intel_dp_check_mst_status().
v3:
- Remove the intel_dp->is_mst check from intel_dp_check_mst_status().
There is no point in checking the same condition twice, even though
there is a chance that the hotplug work running concurrently changes
it.
Imre Deak [Tue, 9 Jun 2020 22:06:16 +0000 (01:06 +0300)]
drm/i915/icl: Disable DIP on MST ports with the transcoder clock still on
According to BSpec the Data Island Packet should be disabled after
disabling the transcoder, but before the transcoder clock select is set
to none. On an ICL RVP, daisy-chained MST config not following this
leads to a hang with the following MCE when disabling the output:
Chris Wilson [Sun, 7 Jun 2020 22:20:43 +0000 (23:20 +0100)]
drm/i915/selftests: Teach hang-self to target only itself
We have a test case to exercise resetting an engine while the other
engines are busy, all the TEST_SELF adds on top is that the target
engine also has background activity. In this case it is useful to first
test resetting the engine while there is background activity, as a
separate flag from exercising all others.
Chris Wilson [Tue, 9 Jun 2020 15:17:23 +0000 (16:17 +0100)]
drm/i915/gt: Incrementally check for rewinding
In commit 5ba32c7be81e ("drm/i915/execlists: Always force a context
reload when rewinding RING_TAIL"), we placed the check for rewinding a
context on actually submitting the next request in that context. This
was so that we only had to check once, and could do so with precision
avoiding as many forced restores as possible. For example, to ensure
that we can resubmit the same request a couple of times, we include a
small wa_tail such that on the next submission, the ring->tail will
appear to move forwards when resubmitting the same request. This is very
common as it will happen for every lite-restore to fill the second port
after a context switch.
However, intel_ring_direction() is limited in precision to movements of
upto half the ring size. The consequence being that if we tried to
unwind many requests, we could exceed half the ring and flip the sense
of the direction, so missing a force restore. As no request can be
greater than half the ring (i.e. 2048 bytes in the smallest case), we
can check for rollback incrementally. As we check against the tail that
would be submitted, we do not lose any sensitivity and allow lite
restores for the simple case. We still need to double check upon
submitting the context, to allow for multiple preemptions and
resubmissions.
Fixes: 5ba32c7be81e ("drm/i915/execlists: Always force a context reload when rewinding RING_TAIL") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: <stable@vger.kernel.org> # v5.4+ Reviewed-by: Bruce Chang <yu.bruce.chang@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200609151723.12971-1-chris@chris-wilson.co.uk
Aditya Swarup [Sat, 6 Jun 2020 02:57:37 +0000 (19:57 -0700)]
drm/i915/rkl: Don't try to read out DSI transcoders
RKL doesn't have DSI outputs, so we shouldn't try to read out the DSI
transcoder registers.
v2(MattR):
- Just set the 'extra panel mask' to edp | dsi0 | dsi1 and then mask
against the platform's cpu_transcoder_mask to filter out the ones
that don't exist on a given platform. (Ville)
v3(MattR):
- Only include DSI transcoders on gen11+ again. (Ville)
- Use for_each_cpu_transcoder_masked() for loop. (Ville)
Matt Roper [Sat, 6 Jun 2020 02:57:36 +0000 (19:57 -0700)]
drm/i915/rkl: Update TGP's pin mapping when paired with RKL
HPD pin handling for RKL+TGP is a special case; we effectively select
the HPD pin based on the DDI (A,B,D,E) rather than the PHY (A,B,C,D).
This differs from the regular behavior of RKL+CMP (and also TGL+TGP).
v2:
- Rather than providing a custom hpd_pin mapping table, just assign
encoder->hpd_pin in a custom manner for this setup. (Ville)
Matt Roper [Sat, 6 Jun 2020 02:57:34 +0000 (19:57 -0700)]
drm/i915/rkl: RKL uses ABOX0 for pixel transfers
Rocket Lake uses the same 'abox0' mechanism to handle pixel data
transfers from memory that gen11 platforms used, rather than the
abox1/abox2 interfaces used by TGL/DG1. For the most part this is a
hardware implementation detail that's transparent to driver software,
but we do have to program a couple of tuning registers (MBUS_ABOX_CTL
and BW_BUDDY registers) according to which ABOX instances are used by a
platform. Let's track the platform's ABOX usage in the device info
structure and use that to determine which instances of these registers
to program.
As an exception to this rule is that even though TGL/DG1 use ABOX1+ABOX2
for data transfers, we're still directed to program the ABOX_CTL
register for ABOX0; so we'll handle that as a special case.
v2:
- Store the mask of platform-specific abox registers in the device
info structure.
- Add a TLB_REQ_TIMER() helper macro. (Aditya)
v3:
- Squash ABOX and BW_BUDDY patches together and use a single mask for
both of them, plus a special-case for programming the ABOX0 instance
on all gen12. (Ville)
Chris Wilson [Sun, 7 Jun 2020 22:20:42 +0000 (23:20 +0100)]
drm/i915/selftests: Make the hanging request non-preemptible
In some of our hangtests, we try to reset an active engine while it is
spinning inside the recursive spinner. However, we also try to flood the
engine with requests that preempt the hang, and so should disable the
preemption to be sure that we reset the right request.
Unfortunately according to our recent findings there is still some
unidentified factor, requiring CDCLK to be set higher - otherwise we
still get underruns on some multipipe configurations, despite CDCLK
being set according to BSpec formula. So getting again back into debug
mode to indentify the cause, meanwhile setting CDCLK=Pixel rate back in
order to remove regression in 10% of the cases due to FIFO underruns.
Gwan-gyeong Mun [Sun, 7 Jun 2020 14:36:14 +0000 (17:36 +0300)]
drm/i915/psr: Program default IO buffer Wake and Fast Wake
The IO buffer Wake and Fast Wake bit size and value have been changed from
Gen12+. It programs the default value of IO buffer Wake and Fast Wake on
Gen12+. It adds definitions of IO buffer Wake and Fast Wake for pre Gen12
and Gen12+. And it aligns PSR2 definition macros.
v2: Fix macro definitions. (José)
v3: Addressed review comments from José
- Add missing default values of IO_BUFFER_WAKE and FAST_WAKE for GEN9+
- Change a style of macro naming in order to use lines as input.
- Update Todo comments.
v4: Add parentheses to macros to avoid precedence issues.
Matt Roper [Sat, 6 Jun 2020 03:18:03 +0000 (20:18 -0700)]
drm/i915: Restore DP-E to VBT mapping table
We accidentally dropped matching for DVO_PORT_DPE from the VBT mapping
table when we refactored the function. Restore it.
Fixes: 4628142aeccc ("drm/i915/rkl: provide port/phy mapping for vbt") Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20200606031803.3309624-1-matthew.d.roper@intel.com
As a last minute addition, I added an assertion to make sure that the
new i915_vma view would be equal to the discard. However, the positive
encouragement from CI only goes to show that we rarely take this path,
and it wasn't until the post-merge run did we hit the assert -- because
it compared the wrong view. Fixup the copy'n'paste error and compare
against both the old view and the expected new view.
Chris Wilson [Fri, 5 Jun 2020 16:52:58 +0000 (17:52 +0100)]
drm/i915: Discard a misplaced GGTT vma
Across the many users of the GGTT vma (internal objects, mmapings,
display etc), we may end up with conflicting requirements for the
placement. Currently, we try to resolve the conflict by unbinding the
vma and rebinding it to match the new constraints; over time we will end
up with a GGTT that matches the most strict constraints over all
concurrent users. However, this causes a problem if the vma is currently
in use as we must wait until it is idle before moving it. But there is
no restriction on the number of views we may use (apart from the limited
size of the GGTT itself), and so if the active vma does not meet our
requirements, try and build a new one!
Chris Wilson [Fri, 5 Jun 2020 12:23:26 +0000 (13:23 +0100)]
drm/i915/gt: Always check to enable timeslicing if not submitting
We may choose not to submit for a number of reasons, yet not fill both
ELSP. In which case we must start timeslicing (there will be no ACK
event on which to hook the start) if the queue would benefit from the
currently active context being evicted.
Chris Wilson [Fri, 5 Jun 2020 12:23:25 +0000 (13:23 +0100)]
drm/i915/gt: Set timeslicing priority from queue
If we only submit the first port, leaving the second empty yet have
ready requests pending in the queue, use that to set the timeslicing
priority (i.e. the priority at which we will decided to enabling
timeslicing and evict the currently active context if the queue is of
equal priority after its quantum expired).
Kees Cook [Fri, 5 Jun 2020 14:19:53 +0000 (07:19 -0700)]
drm/i915: Fix comments mentioning typo in IS_ENABLED()
This has no code changes, but the typo is clearly getting copy/pasted,
so better to avoid this now and fix the typo. IS_ENABLED() takes full
names, and must have the "CONFIG_" prefix.
Chris Wilson [Thu, 4 Jun 2020 21:14:57 +0000 (22:14 +0100)]
drm/i915/gem: Async GPU relocations only
Reduce the 3 relocation paths down to the single path that accommodates
all. The primary motivation for this is to guard the relocations with a
natural fence (derived from the i915_request used to write the
relocation from the GPU).
The tradeoff in using async gpu relocations is that it increases latency
over using direct CPU relocations, for the cases where the target is
idle and accessible by the CPU. The benefit is greatly reduced lock
contention and improved concurrency by pipelining.
Note that forcing the async gpu relocations does reveal a few issues
they have. Firstly, is that they are visible as writes to gem_busy,
causing to mark some buffers are being to written to by the GPU even
though userspace only reads. Secondly is that, in combination with the
cmdparser, they can cause priority inversions. This should be the case
where the work is being put into a common workqueue losing our priority
information and so being executed in FIFO from the worker, denying us
the opportunity to reorder the requests afterwards.
This parameter is meant to be used when PSR issues are found as some
issues in the past was due wrong values set in VBT so this would be
a quick and easy way to ask users or for us to check if the issue is
due VBT values.
RKL doesn't have PSR2 HW tracking, it was replaced by software/manual
tracking. The driver is required to track the areas that needs update
and program hardware to send selective updates.
So until the software tracking is implemented, PSR2 needs to be disabled
for platforms without PSR2 HW tracking.
Matt Roper [Wed, 3 Jun 2020 21:15:23 +0000 (14:15 -0700)]
drm/i915/rkl: Don't try to access transcoder D
There are a couple places in our driver that loop over transcoders A..D
for gen11+; since RKL only has three pipes/transcoders, this can lead to
unclaimed register reads/writes. We should add checks for transcoder
existence where appropriate.
v2: Move one transcoder check that wound up in the wrong function after
conflict resolution. It belongs in bdw_get_trans_port_sync_config
rather than bxt_get_dsi_transcoder_state.
v3: Switch loops to use for_each_cpu_transcoder_masked() since this
iterator already checks the platform's transcoder mask for us.
(Ville)
drm/i915/tgl: Add HBR and HBR2+ voltage swing table
As latest update we have now 2 voltage swing tables for DP over DKL
PHY with only one difference in Level 0 pre-emphasis 3.
So with 2 tables for DP is time to have one single function to return
all DKL voltage swing tables.
Previous patch didn't take into account all pipes
but only those in state, which could cause wrong
CDCLK conclcusions and calculations.
Also there was a severe issue with min_cdclk being
assigned to 0 every compare cycle.
Too bad this was found by me only after merge.
This could be also causing the issues in test, however
not clear - anyway marking this as fixing the
"Adjust CDCLK accordingly to our DBuf bw needs".
v2: - s/pipe/crtc->pipe/
- save a bit of instructions by
skipping inactive pipes, without
getting 0 DBuf slice mask for it.
Matt Roper [Wed, 3 Jun 2020 21:15:25 +0000 (14:15 -0700)]
drm/i915/rkl: Handle comp master/slave relationships for PHYs
Certain combo PHYs act as a compensation master to other PHYs and need
to be initialized with a special irefgen bit in the PORT_COMP_DW8
register. Previously PHY A was the only compensation master (for PHYs
B & C), but RKL adds a fourth PHY which is slaved to PHY C instead.
Matt Roper [Wed, 3 Jun 2020 21:15:22 +0000 (14:15 -0700)]
drm/i915/rkl: Add DDC pin mapping
The pin mapping for the final two outputs varies according to which PCH
is present on the platform: with TGP the pins are remapped into the TC
range, whereas with CMP they stay in the traditional combo output range.
Lucas De Marchi [Wed, 3 Jun 2020 21:15:20 +0000 (14:15 -0700)]
drm/i915/rkl: provide port/phy mapping for vbt
RKL uses the DDI A, DDI B, DDI USBC1, DDI USBC2 from the DE point of
view, so all DDI/pipe/transcoder register use these indexes to refer to
them. Combo phy and IO functions follow another namespace that we keep
as "enum phy". The VBT in theory would use the DE point of view, but
that does not happen in practice.
Provide a table to convert the child devices to the "correct" port
numbering we use. Now this is the output we get while reading the VBT:
DDIA:
[drm:intel_bios_port_aux_ch [i915]] using AUX A for port A (VBT)
[drm:intel_dp_init_connector [i915]] Adding DP connector on [ENCODER:275:DDI A]
[drm:intel_hdmi_init_connector [i915]] Adding HDMI connector on [ENCODER:275:DDI A]
[drm:intel_hdmi_init_connector [i915]] Using DDC pin 0x1 for port A (VBT)
DDIB:
[drm:intel_bios_port_aux_ch [i915]] using AUX B for port B (platform default)
[drm:intel_hdmi_init_connector [i915]] Adding HDMI connector on [ENCODER:291:DDI B]
[drm:intel_hdmi_init_connector [i915]] Using DDC pin 0x2 for port B (VBT)
DDI USBC1:
[drm:intel_bios_port_aux_ch [i915]] using AUX D for port D (VBT)
[drm:intel_dp_init_connector [i915]] Adding DP connector on [ENCODER:295:DDI D]
[drm:intel_hdmi_init_connector [i915]] Adding HDMI connector on [ENCODER:295:DDI D]
[drm:intel_hdmi_init_connector [i915]] Using DDC pin 0x3 for port D (VBT)
DDI USBC2:
[drm:intel_bios_port_aux_ch [i915]] using AUX E for port E (VBT)
[drm:intel_dp_init_connector [i915]] Adding DP connector on [ENCODER:306:DDI E]
[drm:intel_hdmi_init_connector [i915]] Adding HDMI connector on [ENCODER:306:DDI E]
[drm:intel_hdmi_init_connector [i915]] Using DDC pin 0x9 for port E (VBT)
Matt Roper [Wed, 3 Jun 2020 21:15:15 +0000 (14:15 -0700)]
drm/i915/rkl: Set transcoder mask properly
Although we properly captured RKL's three pipes in the device info
structure, we forgot to make the corresponding update to the transcoder
mask. Set this field so that our transcoder loops will operate
properly.
Chris Wilson [Thu, 4 Jun 2020 15:31:45 +0000 (16:31 +0100)]
drm/i915/gt: Track if an engine requires forcewake w/a
Sometimes an engine might need to keep forcewake active while it is busy
submitting requests for a particular workaround. Track such nuisance
with engine->fw_domain.
Chris Wilson [Thu, 4 Jun 2020 13:59:38 +0000 (14:59 +0100)]
drm/i915: Trim set_timer_ms() intervals
Use the plain msec_to_jiffies() rather than the _timeout variant so we
round down and do not add an extra jiffy to our interval. For example,
with timeslicing we do not want to err on the longer side as any
fairness depends on catching hogging contexts on the GPU. Bring on
CFS.
Chris Wilson [Thu, 4 Jun 2020 10:37:30 +0000 (11:37 +0100)]
drm/i915/gem: Mark the buffer pool as active for the cmdparser
If the execbuf is interrupted after building the cmdparser pipeline, and
before we commit to submitting the request to HW, we would attempt to
clean up the cmdparser early. While we held active references to the vma
being parsed and constructed, we did not hold an active reference for
the buffer pool itself. The result was that an interrupted execbuf could
still have run the cmdparser pipeline, but since the buffer pool was
idle, its target vma could have been recycled.
Note this problem only occurs if the cmdparser is running async due to
pipelined waits on busy fences, and the execbuf is interrupted.
Clint Taylor [Wed, 3 Jun 2020 22:11:50 +0000 (15:11 -0700)]
drm/i915/tgl: Implement WA_16011163337
Set GS Timer to 224. Combine with Wa_1604555607 due to register FF_MODE2
not being able to be read.
V2: Math issue fixed
Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Caz Yokoyama <caz.yokoyama@intel.com> Cc: Matt Atwood <matthew.s.atwood@intel.com> Signed-off-by: Clint Taylor <clinton.a.taylor@intel.com> Acked-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20200603221150.14745-1-clinton.a.taylor@intel.com
Chris Wilson [Thu, 4 Jun 2020 12:36:41 +0000 (13:36 +0100)]
drm/i915/selftests: Exercise all copy engines with the blt routines
Just to remove an obnoxious HAS_ENGINES(), and in the process make the
code agnostic to the availabilty of any particular engine by making it
exercise any and all such engines declared on the system.
Ville Syrjälä [Tue, 12 May 2020 17:41:43 +0000 (20:41 +0300)]
drm/i915: Reverse preemph vs. voltage swing preference
The DP spec says:
"When the combination of the requested pre-emphasis level and
voltage swing exceeds the capability of a DPTX, the DPTX shall
set the pre-emphasis level according to the request and use the
highest voltage swing it can output with the given pre-emphasis level."
and
"When a DPTX reads a request beyond the limits of this Standard,
the DPTX shall set the pre-emphasis level according to the request
and set the highest voltage swing level it can output with the
given pre-emphasis level. If a DPTX is requested for 9.5dB of
pre-emphasis level (may be supported for a DPTX) and cannot support
that level, it shall set the pre-emphasis level to the next
highest level, 6dB."
Ie. we should first validate the pre-emphasis, and then select
the appropriate vswing for it.
Ville Syrjälä [Tue, 12 May 2020 17:41:41 +0000 (20:41 +0300)]
drm/i915: Fix ivb cpu edp vswing
According to the DP spec supporting vswing 1 + preemph 2 is
mandatory. We don't have the hw settings for that though. In
order to pretend to follow the DP spec let's just select
vswing 0 + preemph 2 in this case (the DP spec says to use
the requested preemph in preference to the vswing when the
requested values aren't supported).
Chris Wilson [Tue, 2 Jun 2020 22:09:53 +0000 (23:09 +0100)]
drm/i915: Drop i915_request.i915 backpointer
We infrequently use the direct i915 backpointer from the i915_request,
so do we really need to waste the space in the struct for it? 8 bytes
from the most frequently allocated struct vs an 3 bytes and pointer
chasing in using rq->engine->i915?
Chris Wilson [Wed, 3 Jun 2020 10:46:57 +0000 (11:46 +0100)]
drm/i915/gt: Suppress the error message for GT init failure on error injection
If we injected an error (such as pretending the GuC firmware was
broken), then suppress the error message as it is expected and our CI
complains if it sees any *ERROR*.
Chris Wilson [Tue, 2 Jun 2020 15:48:39 +0000 (16:48 +0100)]
drm/i915/gt: Make the CTX_TIMESTAMP readable on !rcs
For reasons that be, the HW only allows usersace to read its own
CTX_TIMESTAMP [context local HW runtime] on rcs. Make it available for
all by adding it to the whitelists.
Chris Wilson [Tue, 2 Jun 2020 15:48:39 +0000 (16:48 +0100)]
drm/i915/selftests: Ignore autoincrementing timestamp on verfifying whitelists
As a timestamp will automatically update itself, it will not hold only
contexts we write into it, and will change from the baseline value
making us suspect that our writes are landing. As this confuses us and
we would need more careful treatment to detect invalid stores into the
timestamp, skip it when verifying the whitelists.
Vivek Kasireddy [Fri, 22 May 2020 20:26:30 +0000 (13:26 -0700)]
drm/i915/dsi: Dont forget to clean up the connector on error (v2)
If an error is encountered during the DSI initialization setup, the
drm connector object also needs to be cleaned up along with the encoder.
The error can happen due to a missing mode in the VBT or for other
reasons.
v2: Rephrase the commit message to make it more clear.
Chris Wilson [Mon, 1 Jun 2020 07:24:12 +0000 (08:24 +0100)]
drm/i915/gt: Split low level gen2-7 CS emitters
Pull the routines for writing CS packets out of intel_ring_submission
into their own files. These are low level operations for building CS
instructions, rather than the logic for filling the global ring buffer
with requests, and we will want to reuse them outside of this context.
Chris Wilson [Mon, 1 Jun 2020 14:03:55 +0000 (15:03 +0100)]
drm/i915: Trim the ironlake+ irq handler
Ever noticed that our interrupt handlers are where we spend most of our
time on a busy system? In part this is unavoidable as each interrupt
requires to poll and reset several registers, but we can try and do so as
efficiently as possible.
Function old new delta
ilk_irq_handler 2317 2156 -161
v2: Restore the irqreturn_t ret
Function old new delta
ilk_irq_handler.cold 63 72 +9
ilk_irq_handler 2221 2080 -141
A slight improvement in the baseline overnight as well!
Ville Syrjälä [Wed, 27 May 2020 20:02:45 +0000 (23:02 +0300)]
drm/i915: Fix global state use-after-frees with a refcount
While the current locking/serialization of the global state
suffices for protecting the obj->state access and the actual
hardware reprogramming, we do have a problem with accessing
the old/new states during nonblocking commits.
The state computation and swap will be protected by the crtc
locks, but the commit_tails can finish out of order, thus also
causing the atomic states to be cleaned up out of order. This
would mean the commit that started first but finished last has
had its new state freed as the no-longer-needed old state by the
other commit.
To fix this let's just refcount the states. obj->state amounts
to one reference, and the intel_atomic_state holds extra references
to both its new and old global obj states.
Chris Wilson [Mon, 1 Jun 2020 07:24:23 +0000 (08:24 +0100)]
drm/i915: Relinquish forcewake immediately after manual grouping
Our forcewake utilisation is split into categories: automatic and
manual. Around bare register reads, we look up the right forcewake
domain and automatically acquire and release [upon a timer] the
forcewake domain. For other access, where we know we require the
forcewake across a group of register reads, we manually acquire the
forcewake domain and release it at the end. Again, this currently arms
the domain timer for a later release.
However, looking at some energy utilisation profiles, we have tried to
avoid using forcewake [and rely on the natural wake up to post register
updates] due to that even keep the fw active for a brief period
contributes to a significant power draw [i.e. when the gpu is sleeping
with rc6 at high clocks]. But as it turns out, not posting the writes
immediately also has unintended consequences, such as not reducing the
clocks and so conserving power while busy.
As a compromise, let us only arm the domain timer for automatic
forcewake usage around bare register access, but immediately release the
forcewake when manually acquired by intel_uncore_forcewake_get/_put.
The corollary to this is that we may instead have to take forcewake more
often, and so incur a latency penalty in doing so. For Sandybridge this
was significant, and even on the latest machines, taking forcewake at
interrupt frequency is a huge impact. [So we don't do that anymore!
Hopefully, this will spare us from still needing the mitigation of the
timer for steady state execution.]
Chris Wilson [Mon, 1 Jun 2020 07:24:11 +0000 (08:24 +0100)]
drm/i915: Handle very early engine initialisation failure
If we fail during engine setup, we may leave some engines not yet setup.
During the error cleanup, we have to be careful not to try and use the
uninitialise engines before discarding them.
drm/i915: Add Plane color encoding support for YCBCR_BT2020
Currently the plane property doesn't have support for YCBCR_BT2020,
which enables the corresponding color conversion mode on plane CSC.
Enabling the plane property for the planes for GLK & ICL+ platforms.
Also as per spec, update the Plane Color CSC from YUV601_TO_RGB709
to YUV601_TO_RGB601.
V2: Enabling support for YCBCT_BT2020 for HDR planes on
platforms GLK & ICL
V3: Refined the condition check to handle GLK & ICL+ HDR planes
Also added BT2020 handling in glk_plane_color_ctl.
V4: Combine If-else into single If
V5: Drop the checking for HDR planes and enable YCBCR_BT2020
for platforms GLK & ICL+.
V6: As per Spec, update PLANE_COLOR_CSC_MODE_YUV601_TO_RGB709
to PLANE_COLOR_CSC_MODE_YUV601_TO_RGB601 as per Ville's
feedback.
Chris Wilson [Fri, 29 May 2020 18:32:03 +0000 (19:32 +0100)]
drm/i915/gem: Taint all shrinkable object locks
If we declare that an object type is shrinkable (any that we can reclaim
to recover system pages), make sure we taint the object mutex so that
lockdep expects us to use it within fs_reclaim. lockdep will then
complain the first time we try to allocate while holding the plain
mutex, as doing so invites potential recursion.
Chris Wilson [Fri, 29 May 2020 14:39:26 +0000 (15:39 +0100)]
drm/i915: Check for awaits on still currently executing requests
With the advent of preempt-to-busy, a request may still be on the GPU as
we unwind. And in the case of a unpreemptible [due to HW] request, that
request will remain indefinitely on the GPU even though we have
returned it back to our submission queue, and cleared the active bit.
We only run the execution callbacks on transferring the request from our
submission queue to the execution queue, but if this is a bonded request
that the HW is waiting for, we will not submit it (as we wait for a
fresh execution) even though it is still being executed.
As we know that there are always preemption points between requests, we
know that only the currently executing request may be still active even
though we have cleared the flag. However, we do not precisely know which
request is in ELSP[0] due to a delay in processing events, and
furthermore we only store the last request in a context in our state
tracker.
Chris Wilson [Fri, 29 May 2020 08:58:08 +0000 (09:58 +0100)]
drm/i915: Add a few asserts around handling of i915_request_is_active()
Let's assert that we only call the execute callbacks on making the
request active, and that we do not execute the request without calling
the callbacks.
Ville Syrjälä [Wed, 29 Apr 2020 10:39:04 +0000 (13:39 +0300)]
drm/i915: Stop using mode->private_flags
Replace the use of mode->private_flags with a truly private bitmaks
in our own crtc state. We also need a copy in the crtc itself so the
vblank code can get at it. We already have scanline_offset in there
for a similar reason, as well as the vblank->hwmode which is assigned
via drm_calc_timestamping_constants(). Fortunately we now have a
nice place for doing the crtc_state->crtc copy in
intel_crtc_update_active_timings() which gets called both for
modesets and init/resume readout.
The one slightly iffy spot is the INHERITED flag which we want to
preserve until userspace/fb_helper does the first proper commit after
actually calling .detecti() on the connectors. Otherwise we don't have
the full sink capabilities (audio,infoframes,etc.) when .compute_config()
gets called and thus we will fail to enable those features when the
first userspace commit happens. The only internal commit we do prior to
that should be from intel_initial_commit() and there we can simply
preserve the INHERITED flag from the readout.
v2: Deal with INHERITED in sanitize_watermarks() as well
Chris Wilson [Thu, 28 May 2020 20:57:27 +0000 (21:57 +0100)]
drm/i915/gt: Start timeslice on partial submission
We may choose to only submit ELSP[0], even though we have sufficient
requests to fill the whole ELSP. Normally, we only start timeslicing if
we fill more than one port, but in this case we need to start
timeslicing for the queue that we choose not to submit.
Chris Wilson [Thu, 28 May 2020 07:41:00 +0000 (08:41 +0100)]
drm/i915/gt: Don't declare hangs if engine is stalled
If the ring submission is stalled on an external request, nothing can be
submitted, not even the heartbeat in the kernel context. Since nothing
is running, resetting the engine/device does not unblock the system and
is pointless. We can see if the heartbeat is supposed to be running
before declaring foul.
Chris Wilson [Thu, 28 May 2020 08:24:27 +0000 (09:24 +0100)]
drm/i915/gt: Remove local entries from GGTT on suspend
Across suspend/resume, we clear the entire GGTT and rebuild from
scratch. In particular, we want to only preserve the global entries for
use by the HW, and delay reinstating the local binds until required by
the user. This means that we can evict any local binds in the global GTT,
saving any time in preserving their state, as they will be rebound on
demand.
Chris Wilson [Thu, 28 May 2020 15:04:52 +0000 (16:04 +0100)]
drm/i915/gt: Restore both GGTT bindings on resume
We should be able to skip restoring LOCAL (user) binds within the GGTT on
resume and let them be restored upon demand. However, our consistency
checks demand that the bind flags match the node state, and we cannot
simply clear the flags, we need to evict as well. For now, make sure we
restore the bind flags exactly upon resume.
Fixes: 0109a16ef391 ("drm/i915/gt: Clear LOCAL_BIND from shared GGTT on resume") Fixes: bf0840cdb304 ("drm/i915/gt: Stop cross-polluting PIN_GLOBAL with PIN_USER with no-ppgtt") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200528150452.7880-1-chris@chris-wilson.co.uk
Chris Wilson [Wed, 27 May 2020 16:24:18 +0000 (17:24 +0100)]
drm/i915/gt: Prevent timeslicing into unpreemptable requests
We have a I915_REQUEST_NOPREEMPT flag that we set when we must prevent
the HW from preempting during the course of this request. We need to
honour this flag and protect the HW even if we have a heartbeat request,
or other maximum priority barrier, pending. As such, restrict the
timeslicing check to avoid preempting into the topmost priority band,
leaving the unpreemptable requests in blissful peace running
uninterrupted on the HW.
v2: Set the I915_PRIORITY_BARRIER to be less than
I915_PRIORITY_UNPREEMPTABLE so that we never submit a request
(heartbeat or barrier) that can legitimately preempt the current
non-premptable request.
Arnd Bergmann [Wed, 27 May 2020 14:05:09 +0000 (16:05 +0200)]
drm/i915: work around false-positive maybe-uninitialized warning
gcc-9 gets confused by the code flow in check_dirty_whitelist:
drivers/gpu/drm/i915/gt/selftest_workarounds.c: In function 'check_dirty_whitelist':
drivers/gpu/drm/i915/gt/selftest_workarounds.c:492:17: error: 'rsvd' may be used uninitialized in this function [-Werror=maybe-uninitialized]
I could not figure out a good way to do this in a way that gcc
understands better, so initialize the variable to zero, as last
resort.
Arnd Bergmann [Wed, 27 May 2020 14:05:08 +0000 (16:05 +0200)]
drm/i915/pmu: avoid an maybe-uninitialized warning
Conditional spinlocks make it hard for gcc and for lockdep to
follow the code flow. This one causes a warning with at least
gcc-9 and higher:
In file included from include/linux/irq.h:14,
from drivers/gpu/drm/i915/i915_pmu.c:7:
drivers/gpu/drm/i915/i915_pmu.c: In function 'i915_sample':
include/linux/spinlock.h:289:3: error: 'flags' may be used uninitialized in this function [-Werror=maybe-uninitialized]
289 | _raw_spin_unlock_irqrestore(lock, flags); \
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/i915/i915_pmu.c:288:17: note: 'flags' was declared here
288 | unsigned long flags;
| ^~~~~
Split out the part between the locks into a separate function
for readability and to let the compiler figure out what the
logic actually is.
Chris Wilson [Tue, 26 May 2020 15:07:39 +0000 (16:07 +0100)]
drm/i915/gt: Clear LOCAL_BIND from shared GGTT on resume
We only restore GLOBAL binds upon resume as we expect these to be pinned
for use by HW, whereas the LOCAL binds can be recreated on demand once
userspace is resumed. For the LOCAL bind to be recreated in the global
GTT (for old systems without ppgtt), we need to clear its presence flag
on deciding not to restore the mapping upon resume.
Chris Wilson [Tue, 26 May 2020 09:07:53 +0000 (10:07 +0100)]
drm/i915/gt: Do not schedule normal requests immediately along virtual
When we push a virtual request onto the HW, we update the rq->engine to
point to the physical engine. A request that is then submitted by the
user that waits upon the virtual engine, but along the physical engine
in use, will then see that it is due to be submitted to the same engine
and take a shortcut (and be queued without waiting for the completion
fence). However, the virtual request may be preempted (either by higher
priority users, or by timeslicing) and removed from the physical engine
to be migrated over to one of its siblings. The dependent normal request
however is oblivious to the removal of the virtual request and remains
queued to execute on HW, believing that once it reaches the head of its
queue all of its predecessors will have completed executing!
v2: Beware restriction of signal->execution_mask prior to submission.
Fixes: 6d06779e8672 ("drm/i915: Load balancing across a virtual engine")
Testcase: igt/gem_exec_balancer/sliced Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: <stable@vger.kernel.org> # v5.3+ Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200526090753.11329-2-chris@chris-wilson.co.uk
Chris Wilson [Mon, 25 May 2020 12:49:12 +0000 (13:49 +0100)]
drm/i915/display: Only query DP state of a DDI encoder
Avoid a NULL dereference for a mismatched encoder type, hit when
probing state for all encoders.
This is a band aid to prevent the OOPS as the right fix is "probably to
swap the psr vs infoframes.enable checks, or outright disappear from
this function" (Ville).
Chris Wilson [Mon, 25 May 2020 15:14:59 +0000 (16:14 +0100)]
drm/i915/gt: Force the GT reset on shutdown
Before we return control to the system, and letting it reuse all the
pages being accessed by HW, we must disable the HW. At the moment, we
dare not reset the GPU if it will clobber the display, but once we know
the display has been disabled, we can proceed with the reset as we
shutdown the module. We know the next user must reinitialise the HW for
their purpose.
Chris Wilson [Sun, 24 May 2020 23:39:00 +0000 (00:39 +0100)]
drm/i915/display: Fix early deref of 'dsb'
drivers/gpu/drm/i915/display/intel_dsb.c:177 intel_dsb_reg_write() warn: variable dereferenced before check 'dsb' (see line 175)
Fixes: afeda4f3b1c8 ("drm/i915/dsb: Pre allocate and late cleanup of cmd buffer") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Animesh Manna <animesh.manna@intel.com> Cc: Uma Shankar <uma.shankar@intel.com> Reviewed-by: Animesh Manna <animesh.manna@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200524233900.25598-1-chris@chris-wilson.co.uk
Chris Wilson [Mon, 25 May 2020 07:53:36 +0000 (08:53 +0100)]
drm/i915/gt: Stop cross-polluting PIN_GLOBAL with PIN_USER with no-ppgtt
In order to keep userptr distinct from ggtt mmaps in the eyes of
lockdep, we need to avoid marking those userptr vma as PIN_GLOBAL. (So
long as we comply with only using them as local PIN_USER!)
Chris Wilson [Mon, 25 May 2020 14:19:56 +0000 (15:19 +0100)]
drm/i915/gt: Cancel the flush worker more thoroughly
Since the worker may rearm, we currently are only guaranteed to flush
the work if we cancel the timer. If the work was running at the time we
try and cancel it, we will wait for it to complete, but it may leave
items in the pool and requeue the work. If we rearrange the immediate
discard of the pool then cancel the work, we know that the work cannot
rearm and so our flush will be final.
Animesh Manna [Wed, 20 May 2020 13:07:37 +0000 (18:37 +0530)]
drm/i915/dsb: Pre allocate and late cleanup of cmd buffer
Pre-allocate command buffer in atomic_commit using intel_dsb_prepare
function which also includes pinning and map in cpu domain.
No functional change is dsb write/commit functions.
Now dsb get/put function is removed and ref-count mechanism is
not needed. Below dsb api added to do respective job mentioned
below.
intel_dsb_prepare - Allocate, pin and map the buffer.
intel_dsb_cleanup - Unpin and release the gem object.
RFC: Initial patch for design review.
v2: included _init() part in _prepare(). [Daniel, Ville]
v3: dsb_cleanup called after cleanup_planes. [Daniel]
v4: dsb structure is moved to intel_crtc_state from intel_crtc. [Maarten]
v5: dsb get/put/ref-count mechanism removed. [Maarten]
v6: Based on review feedback following changes are added,
- replaced intel_dsb structure by pointer in intel_crtc_state. [Maarten]
- passing intel_crtc_state to dsp-api to simplify the code. [Maarten]
- few dsb functions prototype modified to simplify code.
v7: added few cosmetic changes suggested by Jani and null check for
crtc_state in dsb_cleanup removed as suggested by Maarten.
v8: changed the function parameter to intel_crtc_state* of
ivb_load_lut_ext_max() from intel_crtc. [Maarten]
v9: error handling improved in _write() and prepare(). [Maarten]
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@intel.com> Signed-off-by: Animesh Manna <animesh.manna@intel.com> Signed-off-by: Uma Shankar <uma.shankar@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200520130737.11240-1-animesh.manna@intel.com
Chris Wilson [Fri, 22 May 2020 13:27:06 +0000 (14:27 +0100)]
drm/i915/gem: Avoid iterating an empty list
Our __sgt_iter assumes that the scattergather list has at least one
element. But during construction we may fail in allocating the first
page, and so mark the first element as the terminator. This is
unexpected!
Qiushi Wu [Fri, 22 May 2020 08:34:51 +0000 (09:34 +0100)]
agp/intel: Fix a memory leak on module initialisation failure
In intel_gtt_setup_scratch_page(), pointer "page" is not released if
pci_dma_mapping_error() return an error, leading to a memory leak on
module initialisation failure. Simply fix this issue by freeing "page"
before return.
drm/i915: Adjust CDCLK accordingly to our DBuf bw needs
According to BSpec max BW per slice is calculated using formula
Max BW = CDCLK * 64. Currently when calculating min CDCLK we
account only per plane requirements, however in order to avoid
FIFO underruns we need to estimate accumulated BW consumed by
all planes(ddb entries basically) residing on that particular
DBuf slice. This will allow us to put CDCLK lower and save power
when we don't need that much bandwidth or gain additional
performance once plane consumption grows.
v2: - Fix long line warning
- Limited new DBuf bw checks to only gens >= 11
v3: - Lets track used Dbuf bw per slice and per crtc in bw state
(or may be in DBuf state in future), that way we don't need
to have all crtcs in state and those only if we detect if
are actually going to change cdclk, just same way as we
do with other stuff, i.e intel_atomic_serialize_global_state
and co. Just as per Ville's paradigm.
- Made dbuf bw calculation procedure look nicer by introducing
for_each_dbuf_slice_in_mask - we often will now need to iterate
slices using mask.
- According to experimental results CDCLK * 64 accounts for
overall bandwidth across all dbufs, not per dbuf.
v4: - Fixed missing const(Ville)
- Removed spurious whitespaces(Ville)
- Fixed local variable init(reduced scope where not needed)
- Added some comments about data rate for planar formats
- Changed struct intel_crtc_bw to intel_dbuf_bw
- Moved dbuf bw calculation to intel_compute_min_cdclk(Ville)
v5: - Removed unneeded macro
v6: - Prevent too frequent CDCLK switching back and forth:
Always switch to higher CDCLK when needed to prevent bandwidth
issues, however don't switch to lower CDCLK earlier than once
in 30 minutes in order to prevent constant modeset blinking.
We could of course not switch back at all, however this is
bad from power consumption point of view.
v7: - Fixed to track cdclk using bw_state, modeset will be now
triggered only when CDCLK change is really needed.
v8: - Lock global state if bw_state->min_cdclk is changed.
- Try getting bw_state only if there are crtcs in the commit
(need to have read-locked global state)
v9: - Do not do Dbuf bw check for gens < 9 - triggers WARN
as ddb_size is 0.
v10: - Lock global state for older gens as well.
v11: - Define new bw_calc_min_cdclk hook, instead of using
a condition(Manasi Navare)
v12: - Fixed rebase conflict
v13: - Added spaces after declarations to make checkpatch happy.
Checking with hweight8 if plane configuration had
changed seems to be wrong as different plane configs
can result in a same hamming weight.
So lets check the bitmask itself.
v2: Fixed "from" field which got corrupted for some weird reason
drm/i915: Extract cdclk requirements checking to separate function
In Gen11+ whenever we might exceed DBuf bandwidth we might need to
recalculate CDCLK which DBuf bandwidth is scaled with.
Total Dbuf bw used might change based on particular plane needs.
Thus to calculate if cdclk needs to be changed it is not enough
anymore to check plane configuration and plane min cdclk, per DBuf
bw can be calculated only after wm/ddb calculation is done and
all required planes are added into the state. In order to keep
all min_cdclk related checks in one place let's extract it into
separate function, checking and modifying any_ms.
drm/i915: Decouple cdclk calculation from modeset checks
We need to calculate cdclk after watermarks/ddb has been calculated
as with recent hw CDCLK needs to be adjusted accordingly to DBuf
requirements, which is not possible with current code organization.
Setting CDCLK according to DBuf BW requirements and not just rejecting
if it doesn't satisfy BW requirements, will allow us to save power when
it is possible and gain additional bandwidth when it's needed - i.e
boosting both our power management and perfomance capabilities.
This patch is preparation for that, first we now extract modeset
calculation from modeset checks, in order to call it after wm/ddb
has been calculated.
v2: - Extract only intel_modeset_calc_cdclk from intel_modeset_checks
(Ville Syrjälä)
v3: - Clear plls after intel_modeset_calc_cdclk
v4: - Added r-b from previous revision to commit message
Chris Wilson [Thu, 21 May 2020 14:49:49 +0000 (15:49 +0100)]
drm/i915: Remove PIN_UPDATE for i915_vma_pin
As we no longer use PIN_UPDATE (since commit 7d0aa0db4375 ("drm/i915/gem:
Unbind all current vma on changing cache-level")) we can remove
PIN_UPDATE itself. The benefit is just in simplifing the vma bind.