Chris Wilson [Fri, 29 Nov 2019 20:13:28 +0000 (20:13 +0000)]
drm/i915/gen7: Re-enable full-ppgtt for ivb & hsw
After much hair pulling, resort to preallocating the ppGTT entries on
init to circumvent the apparent lack of PD invalidate following the
write to PP_DCLV upon switching mm between contexts (and here the same
context after binding new objects). However, the details of that PP_DCLV
invalidate are still unknown, and it appears we need to reload the mm
twice to cover over a timing issue. Worrying.
Chris Wilson [Fri, 29 Nov 2019 17:25:42 +0000 (17:25 +0000)]
drm/i915/execlists: Ensure the tasklet is decoupled upon shutdown
As we only cancel the timers asynchronously, they may
still be running on another CPU as we shut down, raising one last
softirq. So be safe and make sure the tasklet is flushed before
destroying the engine's memory.
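A minimal sketch of the idea (the engine/tasklet field names are assumed for illustration, not the exact driver code):

	/* Flush any softirq raised by the just-cancelled timers before the
	 * engine's memory is released; tasklet_kill() waits for a running
	 * tasklet to finish and prevents it from being rescheduled.
	 */
	static void engine_release(struct intel_engine_cs *engine)
	{
		tasklet_kill(&engine->execlists.tasklet);
		/* ... now safe to free the engine ... */
	}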
Chris Wilson [Fri, 29 Nov 2019 15:18:45 +0000 (15:18 +0000)]
drm/i915/gem: Take timeline->mutex to walk list-of-requests
Though the context is closed and so no more requests can be added to the
timeline, retirement can still be removing requests. It can even remove
the very request we are inspecting and so cause us to wander into dead
links.
Serialise with the retirement by taking the timeline->mutex used for
guarding the timeline->requests list.
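A sketch of the locked walk (field names assumed for illustration):

	mutex_lock(&ce->timeline->mutex);
	list_for_each_entry(rq, &ce->timeline->requests, link) {
		/* retirement cannot unlink rq while we hold the mutex */
		if (!i915_request_completed(rq))
			break;	/* found a request still occupying the engine */
	}
	mutex_unlock(&ce->timeline->mutex);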
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=112404 Fixes: ee80fd16aa52 ("drm/i915/gem: Refine occupancy test in kill_context()") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191129151845.1092933-1-chris@chris-wilson.co.uk
Ville Syrjälä [Fri, 11 Oct 2019 20:09:45 +0000 (23:09 +0300)]
drm/i915: Don't set undefined bits in dirty_pipes
skl_commit_modeset_enables() straight up compares dirty_pipes
with a bitmask of already committed pipes. If we set bits in
dirty_pipes for non-existent pipes that comparison will never
work right. So let's limit ourselves to bits that exist.
And we'll do the same for the active_pipes_changed bitmask.
Chris Wilson [Fri, 29 Nov 2019 12:48:46 +0000 (12:48 +0000)]
Revert "drm/i915: use a separate context for gpu relocs"
Since commit 86fe8b597fbf ("drm/i915/tgl: Suspend pre-parser across GTT
invalidations") disables the advanced preparser on Tigerlake for the
invalidation phase at the start of the batch, we no longer need to emit
the GPU relocations from a second context as they are now flushed inline.
References: 47191cff0463 ("drm/i915: use a separate context for gpu relocs")
References: 86fe8b597fbf ("drm/i915/tgl: Suspend pre-parser across GTT invalidations") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191129124846.949100-1-chris@chris-wilson.co.uk
Chris Wilson [Fri, 29 Nov 2019 10:34:55 +0000 (10:34 +0000)]
drm/i915/selftests: Wait only on the expected barrier
Wait on only the last request on the kernel_context after emitting a
barrier, so that we do not wait on everything in general and, by doing
so, accidentally trigger the emission of another barrier!
Michel Thierry [Thu, 28 Nov 2019 02:10:05 +0000 (07:40 +0530)]
drm/i915/tgl: Implement Wa_1604555607
Implement Wa_1604555607 (set the DS pairing timer to 128 cycles).
FF_MODE2 is part of the register state context, that's why it is
implemented here.
On the TGL A0 stepping, FF_MODE2 register readback is broken, hence the
WA verification is disabled.
v2: Rebased on top of the WA refactoring (Oscar)
v3: Correctly add to ctx_workarounds_init (Michel)
v4:
uncore read is used [Tvrtko]
Macros as used for MASK definition [Chris]
v5:
Skip the Wa_1604555607 verification [Ram]
i915 ptr retrieved from engine. [Tvrtko]
v6:
Added wa_add as a wrapper for __wa_add [Chris]
wa_add is directly called instead of new wrapper [tvrtko]
Chris Wilson [Thu, 28 Nov 2019 11:34:24 +0000 (11:34 +0000)]
drm/i915/gem: Excise the per-batch whitelist from the context
One does not lightly add a new hidden struct_mutex dependency deep within
the execbuf bowels! The immediate suspicion in seeing the whitelist
cached on the context, is that it is intended to be preserved between
batches, as the kernel is quite adept at caching small allocations
itself. But no, its sole purpose is to serialise command submission in
order to save a kmalloc on a slow, slow path!
By removing the whitelist dependency from the context, our freedom to
chop the big struct_mutex is greatly augmented.
v2: s/set_bit/__set_bit/ as the whitelist shall never be accessed
concurrently.
Chris Wilson [Wed, 27 Nov 2019 11:58:13 +0000 (11:58 +0000)]
drm/i915/gt: Defer breadcrumb processing to after the irq handler
The design of our interrupt handlers is that we ack the receipt of the
interrupt first, inside the critical section where the master interrupt
control is off and other cpus cannot start processing the next
interrupt; and then process the interrupt events afterwards. However,
Icelake introduced a whole new set of banked GT_IIR registers that are
inherently serialised, slow to read, and must be processed within the
critical section. We can still push our breadcrumbs out of this critical
section by using our irq_worker. On bdw+, this should not make too much
of a difference as we only slightly defer the breadcrumbs, but on icl+
this should make a big difference to our throughput of interrupts from
concurrently executing engines.
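An illustrative sketch of deferring the breadcrumb work out of the masked section (assuming the breadcrumbs embed an irq_work; names here are not the exact driver code):

	/* inside the GT irq handler, with the master interrupt masked */
	if (iir & GT_RENDER_USER_INTERRUPT)
		irq_work_queue(&engine->breadcrumbs.irq_work);
	/* the irq_work runs as soon as the handler returns */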
Chris Wilson [Wed, 27 Nov 2019 13:45:27 +0000 (13:45 +0000)]
drm/i915: Serialise i915_active_fence_set() with itself
The expected downside to commit 211d4a06fdef ("drm/i915: Reduce nested
prepare_remote_context() to a trylock") was that it would need to return
-EAGAIN to userspace in order to resolve potential mutex inversion. Such
an unsightly round trip is unnecessary if we could atomically insert a
barrier into the i915_active_fence, so make it happen.
Currently, we use the timeline->mutex (or some other named outer lock)
to order insertion into the i915_active_fence (and so individual nodes
of i915_active). Inside __i915_active_fence_set, we then only need to
serialise with the interrupt handler in order to claim the timeline for
ourselves.
However, if we remove the outer lock, we need to ensure the order is
intact between not only multiple threads trying to insert themselves
into the timeline, but also with the interrupt handler completing the
previous occupant. We use xchg() on insert so that we have an ordered
sequence of insertions (and each caller knows the previous fence on
which to wait, preserving the chain of all fences in the timeline), but
we then have to cmpxchg() in the interrupt handler to avoid overwriting
the new occupant. The only nasty side-effect is having to temporarily
strip off the RCU-annotations to apply the atomic operations, otherwise
the rules are much more conventional!
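A simplified sketch of the scheme (illustrative names, not the exact __i915_active_fence_set() code):

	struct dma_fence *prev;

	/* writer: ordered insert, each caller learns the previous occupant */
	prev = xchg(&active->fence, fence);
	if (prev)
		; /* chain after prev so the timeline ordering is preserved */

	/* irq/completion side: only clear the slot if no newcomer replaced us */
	cmpxchg(&active->fence, fence, NULL);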
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=112402 Fixes: 211d4a06fdef ("drm/i915: Reduce nested prepare_remote_context() to a trylock") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191127134527.3438410-1-chris@chris-wilson.co.uk
Chris Wilson [Wed, 27 Nov 2019 09:56:57 +0000 (09:56 +0000)]
drm/i915/gt: Manual rc6 entry upon parking
Now that we rapidly park the GT when the GPU idles, we often find
ourselves idling faster than the RC6 promotion timer. Thus if we tell
the GPU to enter RC6 manually as we park, we can do so quicker (by
around 50ms, half an EI on average) and marginally increase our
powersaving across all execlists platforms.
v2: Now with a selftest to check we can enter RC6 manually
Clint Taylor [Thu, 21 Nov 2019 20:14:55 +0000 (12:14 -0800)]
drm/i915: Disable display interrupts during display IRQ handler
During the display interrupt service routine, the Display Interrupt
Enable bit must be cleared, the interrupts handled, and then the
Display Interrupt Enable bit set again to prevent possible missed
interrupts.
Bspec: 49212
V2: Change Title to remove SDE reference.
V3: Fix TAB spacing.
Kai Vehmanen [Mon, 25 Nov 2019 12:53:12 +0000 (14:53 +0200)]
drm/i915/dp: fix DP audio for PORT_A on gen12+
Starting with gen12, PORT_A can be connected to a transcoder
with audio support. Modify the existing logic that disabled
audio on PORT_A unconditionally.
Chris Wilson [Tue, 26 Nov 2019 06:55:21 +0000 (06:55 +0000)]
drm/i915: Reduce nested prepare_remote_context() to a trylock
On context retiring, we may invoke the kernel_context to unpin this
context. Elsewhere, we may use the kernel_context to modify this
context. This currently leads to an AB-BA lock inversion, so we need to
back-off from the contended lock, and repeat.
Chris Wilson [Mon, 25 Nov 2019 16:27:37 +0000 (16:27 +0000)]
drm/i915: Default to a more lenient forced preemption timeout
Based on a sampling of a number of benchmarks across platforms, by
default opt for a much more lenient timeout so that we should not
adversely affect existing "good" clients.
640ms ought to be enough for anyone.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=112169 Fixes: f8c83d5a79eb ("drm/i915/execlists: Force preemption") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Eero Tamminen <eero.t.tamminen@intel.com> Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191125162737.2161069-1-chris@chris-wilson.co.uk
Chris Wilson [Mon, 25 Nov 2019 10:58:58 +0000 (10:58 +0000)]
drm/i915/gt: Schedule request retirement when timeline idles
The major drawback of commit 99952ba44d8e ("drm/i915/gen8+: Add RC6 CTX
corruption WA") is that it disables RC6 while Skylake (and friends) is
active, and we do not consider the GPU idle until all outstanding
requests have been retired and the engine switched over to the kernel
context. If userspace is idle, this task falls onto our background idle
worker, which only runs roughly once a second, meaning that userspace has
to have been idle for a couple of seconds before we enable RC6 again.
Naturally, this causes us to consume considerably more energy than
before as powersaving is effectively disabled while a display server
(here's looking at you Xorg) is running.
As execlists will get a completion event as each context is completed,
we can use this interrupt to queue a retire worker bound to this engine
to cleanup idle timelines. We will then immediately notice the idle
engine (without userspace intervention or the aid of the background
retire worker) and start parking the GPU. Thus during light workloads,
we will do much more work to idle the GPU faster... Hopefully with
commensurate power saving!
v2: Watch context completions and only look at those local to the engine
when retiring to reduce the amount of excess work we perform.
Chris Wilson [Mon, 25 Nov 2019 10:58:57 +0000 (10:58 +0000)]
drm/i915/gt: Adapt engine_park synchronisation rules for engine_retire
In the next patch, we will introduce a new asynchronous retirement
worker, fed by execlists CS events. Here we may queue a retirement as
soon as a request is submitted to HW (and completes instantly), and we
also want to process that retirement as early as possible and cannot
afford to postpone (as there may not be another opportunity to retire it
for a few seconds). To allow the new async retirer to run in parallel
with our submission, pull the __i915_request_queue (that passes the
request to HW) inside the timelines spinlock so that the retirement
cannot release the timeline before we have completed the submission.
v2: Actually to play nicely with engine_retire, we have to raise the
timeline.active_lock before releasing the HW. intel_gt_retire_requests()
is still serialised by the outer lock so it cannot see this
intermediate state, and engine_retire is serialised by HW submission.
Chris Wilson [Mon, 25 Nov 2019 10:58:56 +0000 (10:58 +0000)]
drm/i915: Serialise with engine-pm around requests on the kernel_context
As the engine->kernel_context is used within the engine-pm barrier, we
have to be careful when emitting requests outside of the barrier, as the
strict timeline locking rules do not apply. Instead, we must ensure the
engine_park() cannot be entered as we build the request, which is
simplest by taking an explicit engine-pm wakeref around the request
construction.
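A sketch of the pattern (using the existing engine-pm helpers; error handling elided):

	intel_engine_pm_get(engine);	/* engine_park() cannot run past this point */

	rq = i915_request_create(engine->kernel_context);
	if (!IS_ERR(rq)) {
		/* ... emit the payload ... */
		i915_request_add(rq);
	}

	intel_engine_pm_put(engine);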
Chris Wilson [Mon, 25 Nov 2019 11:25:20 +0000 (11:25 +0000)]
drm/i915/execlists: Fixup cancel_port_requests()
I rushed a last minute correction to cancel_port_requests() to prevent
the snooping of *execlists->active as the inflight array was being
updated, without noticing we iterated the inflight array starting from
active! Oops.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=112387 Fixes: 9d1579db5343 ("drm/i915/gt: Mark the execlists->active as the primary volatile access") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Andi Shyti <andi.shyti@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191125112520.1760492-1-chris@chris-wilson.co.uk
Chris Wilson [Mon, 25 Nov 2019 09:43:18 +0000 (09:43 +0000)]
drm/i915/gt: Mark the execlists->active as the primary volatile access
Since we want to do a lockless read of the current active request, and
that request is written to by process_csb also without serialisation, we
need to instruct gcc to take care in reading the pointer itself.
Otherwise, we have observed execlists_active() to report 0x40.
The answer is that as we keep the existing execlists->active pointing
into the array as we copy over that array, the unserialised read may see
a partial pointer value.
Tvrtko Ursulin [Fri, 22 Nov 2019 10:41:15 +0000 (10:41 +0000)]
drm/i915/query: Align flavour of engine data lookup
Commit 58a70cd21027 ("drm/i915/gt: Move the [class][inst] lookup for
engines onto the GT") changed the engine query to iterate over uabi
engines but left the buffer size calculation looking at the physical
engine count. The difference has no practical consequence, but it is
nicer to align both queries.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Fixes: 58a70cd21027 ("drm/i915/gt: Move the [class][inst] lookup for engines onto the GT") Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191122104115.29610-1-tvrtko.ursulin@linux.intel.com
Chris Wilson [Fri, 22 Nov 2019 13:24:04 +0000 (13:24 +0000)]
drm/i915/selftests: Flush the active callbacks
Before checking the current i915_active state for the asynchronous work
we submitted, flush any ongoing callback. This ensures that our sampling
is robust and does not sporadically fail due to bad timing as the work
is running on another cpu.
v2: Drop the fence callback sync, retiring under the lock should be good
enough to synchronize with engine_retire() and the
intel_gt_retire_requests() background worker.
Chris Wilson [Fri, 22 Nov 2019 11:21:48 +0000 (11:21 +0000)]
drm/i915/selftests: Force bonded submission to overlap
Bonded request submission is designed to allow requests to execute in
parallel as laid out by the user. If the master request is already
finished before its bonded pair is submitted, the pair were not destined
to run in parallel and we lose the information about the master engine
to dictate selection of the secondary. If the second request was
required to be run on a particular engine in a virtual set, that should
have been specified, rather than left to the whims of a random
unconnected request!
In the selftest, I made the mistake of not ensuring the master would
overlap with its bonded pairs, meaning that it could indeed complete
before we submitted the bonds. Those bonds were then free to select any
available engine in their virtual set, and not the one expected by the
test.
Chris Wilson [Fri, 22 Nov 2019 09:49:24 +0000 (09:49 +0000)]
drm/i915: Use a ctor for TYPESAFE_BY_RCU i915_request
As we start peeking into requests for longer and longer, e.g.
incorporating use of spinlocks when only protected by an
rcu_read_lock(), we need to be careful in how we reset the request when
recycling and need to preserve any barriers that may still be in use as
the request is reset for reuse.
Quoting Linus Torvalds:
> If there is refcounting going on then why use SLAB_TYPESAFE_BY_RCU?
.. because the object can be accessed (by RCU) after the refcount has
gone down to zero, and the thing has been released.
That's the whole and only point of SLAB_TYPESAFE_BY_RCU.
That flag basically says:
"I may end up accessing this object *after* it has been free'd,
because there may be RCU lookups in flight"
This has nothing to do with constructors. It's ok if the object gets
reused as an object of the same type and does *not* get
re-initialized, because we're perfectly fine seeing old stale data.
What it guarantees is that the slab isn't shared with any other kind
of object, _and_ that the underlying pages are free'd after an RCU
quiescent period (so the pages aren't shared with another kind of
object either during an RCU walk).
And it doesn't necessarily have to have a constructor, because the
thing that a RCU walk will care about is
(a) guaranteed to be an object that *has* been on some RCU list (so
it's not a "new" object)
(b) the RCU walk needs to have logic to verify that it's still the
*same* object and hasn't been re-used as something else.
In contrast, a SLAB_TYPESAFE_BY_RCU memory gets free'd and re-used
immediately, but because it gets reused as the same kind of object,
the RCU walker can "know" what parts have meaning for re-use, in a way
it couldn't if the re-use was random.
That said, it *is* subtle, and people should be careful.
> So the re-use might initialize the fields lazily, not necessarily using a ctor.
If you have a well-defined refcount, and use "atomic_inc_not_zero()"
to guard the speculative RCU access section, and use
"atomic_dec_and_test()" in the freeing section, then you should be
safe wrt new allocations.
If you have a completely new allocation that has "random stale
content", you know that it cannot be on the RCU list, so there is no
speculative access that can ever see that random content.
So the only case you need to worry about is a re-use allocation, and
you know that the refcount will start out as zero even if you don't
have a constructor.
So you can think of the refcount itself as always having a zero
constructor, *BUT* you need to be careful with ordering.
In particular, whoever does the allocation needs to then set the
refcount to a non-zero value *after* it has initialized all the other
fields. And in particular, it needs to make sure that it uses the
proper memory ordering to do so.
NOTE! One thing to be very worried about is that re-initializing
whatever RCU lists means that now the RCU walker may be walking on the
wrong list so the walker may do the right thing for this particular
entry, but it may miss walking *other* entries. So then you can get
spurious lookup failures, because the RCU walker never walked all the
way to the end of the right list. That ends up being a much more
subtle bug.
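A generic sketch of the refcount-guarded RCU lookup described above (names are illustrative, not i915 code):

	rcu_read_lock();
	obj = lookup(key);			/* hypothetical RCU-safe lookup */
	if (obj && !atomic_inc_not_zero(&obj->refcount))
		obj = NULL;			/* lost the race with the final put */
	if (obj && obj->key != key) {		/* reused for a different entry? */
		put_object(obj);		/* hypothetical release */
		obj = NULL;
	}
	rcu_read_unlock();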
Chris Wilson [Thu, 21 Nov 2019 13:05:28 +0000 (13:05 +0000)]
drm/i915: Mark intel_wakeref_get() as a sleeper
Assume that intel_wakeref_get() may take the mutex, and perform other
sleeping actions in the course of its callbacks and so use might_sleep()
to ensure that all callers abide by that rule. Anything that cannot
sleep has to use
e.g. intel_wakeref_get_if_active() to guarantee its avoidance of the
non-atomic paths.
Chris Wilson [Thu, 21 Nov 2019 10:35:46 +0000 (10:35 +0000)]
drm/i915/execlists: Lock the request while validating it during promotion
Since the request is already on the HW as we perform its validation, it
and even its subsequent barrier may be concurrently retired before we
process the assertions. If it is retired already and so off the HW, our
assertions become void and we need to ignore them.
Chris Wilson [Thu, 21 Nov 2019 07:10:41 +0000 (07:10 +0000)]
drm/i915: Serialise with remote retirement
Since retirement may be running in a worker on another CPU, it may be
skipped in the local intel_gt_wait_for_idle(). To ensure the state is
consistent for our sanity checks upon load, serialise with the remote
retirer by waiting on the timeline->mutex.
Outside of this use case, e.g. on suspend or module unload, we expect the
slack to be picked up by intel_gt_pm_wait_for_idle() and so prefer to
put the special case serialisation with retirement in its single user,
for now at least.
Chris Wilson [Thu, 21 Nov 2019 07:10:40 +0000 (07:10 +0000)]
Revert "drm/i915/gt: Wait for new requests in intel_gt_retire_requests()"
From inside an active timeline in the execbuf ioctl, we may try to
reclaim some space in the GGTT. We need GGTT space for all objects on
!full-ppgtt platforms, and for context images everywhere. However, to
free up space in the GGTT we may need to remove some pinned objects
(e.g. context images) that require flushing the idle barriers to remove.
For this we use the big hammer of intel_gt_wait_for_idle()
However, commit ae7a507b4b61 ("drm/i915/gt: Wait for new requests in
intel_gt_retire_requests()") will continue spinning on the wait if a
timeline is active but lacks requests, as is the case during execbuf
reservation. Spinning forever is quite time consuming, so revert that
commit and start again.
In practice, the effect that commit ae7a507b4b61 was trying to achieve is
accomplished by commit 3f8d9c3c7da7 ("drm/i915/gt: Move new timelines
to the end of active_list"), so there is no immediate rush to replace
the looping.
Testcase: igt/gem_exec_reloc/basic-range Fixes: ae7a507b4b61 ("drm/i915/gt: Wait for new requests in intel_gt_retire_requests()")
References: 3f8d9c3c7da7 ("drm/i915/gt: Move new timelines to the end of active_list") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191121071044.97798-1-chris@chris-wilson.co.uk
Chris Wilson [Wed, 20 Nov 2019 18:22:09 +0000 (18:22 +0000)]
drm/i915/gt: Fixup config ifdeffery for pm_suspend_target_state
pm_suspend_target_state is declared under CONFIG_PM_SLEEP but only
defined under CONFIG_SUSPEND. Play safe and only use the symbol if it is
both declared and defined.
Reported-by: kbuild-all@lists.01.org Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Fixes: 840983c99683 ("drm/i915: Defer rc6 shutdown to suspend_late") Cc: Andi Shyti <andi.shyti@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191120182209.3967833-1-chris@chris-wilson.co.uk
Chris Wilson [Wed, 20 Nov 2019 16:55:14 +0000 (16:55 +0000)]
drm/i915/gt: Unlock engine-pm after queuing the kernel context switch
In commit 886808005124 ("drm/i915: Push the wakeref->count deferral to
the backend"), I erroneously concluded that we last modify the engine
inside __i915_request_commit() meaning that we could enable concurrent
submission for userspace as we enqueued this request. However, this
falls into a trap with other users of the engine->kernel_context waking
up and submitting their request before the idle-switch is queued, with
the result that the kernel_context is executed out-of-sequence most
likely upsetting the GPU and certainly ourselves when we try to retire
the out-of-sequence requests.
As such we need to hold onto the effective engine->kernel_context mutex
lock (via the engine pm mutex proxy) until we have finished queuing the
request to the engine.
v2: Serialise against concurrent intel_gt_retire_requests()
v3: Describe the hairy locking scheme with intel_gt_retire_requests()
for future reference.
v4: Combine timeline->lock and engine pm release; it's hairy.
Fixes: 886808005124 ("drm/i915: Push the wakeref->count deferral to the backend") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191120165514.3955081-2-chris@chris-wilson.co.uk
Chris Wilson [Wed, 20 Nov 2019 16:55:13 +0000 (16:55 +0000)]
drm/i915/gt: Close race between engine_park and intel_gt_retire_requests
The general concept was that intel_timeline.active_count was locked by
the intel_timeline.mutex. The exception was for power management, where
the engine->kernel_context->timeline could be manipulated under the
global wakeref.mutex.
This was quite solid, as we always manipulated the timeline only while
we held an engine wakeref.
And then we started retiring requests outside of struct_mutex, only
using the timelines.active_list and the timeline->mutex. There we
started manipulating intel_timeline.active_count outside of an engine
wakeref, and so introduced a race between __engine_park() and
intel_gt_retire_requests(), a race that could result in the
engine->kernel_context not being added to the active timelines and so
losing requests, which caused us to keep the system permanently powered
up [and unloadable].
The race would be easy to close if we could take the engine wakeref for
the timeline before we retire -- except timelines are not bound to any
engine and so we would need to keep all active engines awake. The
alternative is to guard intel_timeline_enter/intel_timeline_exit for use
outside of the timeline->mutex.
Fixes: eaac073f8c2e ("drm/i915: Protect request retirement with timeline->mutex") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191120165514.3955081-1-chris@chris-wilson.co.uk
Chris Wilson [Wed, 20 Nov 2019 12:54:33 +0000 (12:54 +0000)]
drm/i915: Mark up the calling context for intel_wakeref_put()
Previously, we assumed we could use mutex_trylock() within an atomic
context, falling back to a worker if contended. However, such trickery
is illegal inside interrupt context, and so we need to always use a
worker under such circumstances. As we normally are in process context,
we can typically use a plain mutex, and only defer to a work when we
know we are being called from an interrupt path.
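A sketch of the resulting policy (the type and helper names are illustrative, not the exact i915 code):

	static void wakeref_put(struct example_wakeref *wf)	/* illustrative */
	{
		if (!atomic_dec_and_test(&wf->count))
			return;

		if (in_irq() || irqs_disabled())
			queue_work(system_wq, &wf->work);	/* mutexes are illegal here */
		else
			wakeref_put_last(wf);	/* process context: may take wf->mutex */
	}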
Fixes: b7fdcb5a50d3 ("drm/i915/pmu: Atomically acquire the gt_pm wakeref")
References: ffa070fa9b938 ("locking/mutex: Complain upon mutex API misuse in IRQ contexts")
References: https://bugs.freedesktop.org/show_bug.cgi?id=111626 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191120125433.3767149-1-chris@chris-wilson.co.uk
Just pass the atomic state and the crtc to .crtc_enable() and
.crtc_disable(). Life is easier when you don't have to think about
whether to pass the old or the new crtc state.
Ville Syrjälä [Mon, 18 Nov 2019 16:44:23 +0000 (18:44 +0200)]
drm/i915: Move assert_vblank_disabled() into intel_crtc_vblank_on()
Move the assert_vblank_disabled() into intel_crtc_vblank_on()
so that we don't have to inline it all over.
This does mean we now assert_vblank_disabled() during readout as well
but that is totally fine as it happens after drm_crtc_vblank_reset().
One can even argue it's what we want to do anyway to make sure
the reset actually happened.
Just pass the atomic state and the crtc to intel_encoders_enable() & co.
Life is simpler when you don't have to think about which state (old vs. new)
you have to pass in. Also constify the states while at it.
Chris Wilson [Wed, 20 Nov 2019 10:27:41 +0000 (10:27 +0000)]
drm/i915/selftests: Take a ref to the request we wait upon
i915_request_add() consumes the passed in reference to the i915_request,
so if the selftest caller wishes to wait upon it afterwards, it needs to
take a reference for itself.
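A sketch of the corrected selftest pattern:

	rq = i915_request_create(ce);
	if (IS_ERR(rq))
		return PTR_ERR(rq);

	i915_request_get(rq);		/* our reference for the wait below */
	i915_request_add(rq);		/* consumes the creation reference */

	if (i915_request_wait(rq, 0, HZ) < 0)
		err = -ETIME;

	i915_request_put(rq);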
Chris Wilson [Tue, 19 Nov 2019 10:09:13 +0000 (10:09 +0000)]
drm/i915/gem: Manually dump the debug trace on GEM_BUG_ON
Since igt now defaults to not enabling ftrace-on-oops, we need to
manually invoke GEM_TRACE_DUMP() to see the debug log prior to a
GEM_BUG_ON panicking.
Chris Wilson [Tue, 19 Nov 2019 16:25:58 +0000 (16:25 +0000)]
drm/i915/gt: Move new timelines to the end of active_list
When adding a new active timeline, place it at the end of the list. This
allows for intel_gt_retire_requests() to pick up the newcomer more
quickly and hopefully complete the retirement sooner. A minuscule
optimisation.
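In essence, a sketch of the idea (list and member names assumed):

	/* oldest timelines first; the newcomer is visited last by retirement */
	list_add_tail(&tl->link, &timelines->active_list);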
Matthew Auld [Tue, 19 Nov 2019 15:01:54 +0000 (15:01 +0000)]
drm/i915: make pool objects read-only
For our current users we don't expect pool objects to be writable from
the gpu.
Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Fixes: 4e4b99951391 ("drm/i915: Support ro ppgtt mapped cmdparser shadow buffers") Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191119150154.18249-1-matthew.auld@intel.com
Matt Roper [Mon, 18 Nov 2019 18:02:19 +0000 (10:02 -0800)]
drm/i915/tgl: Add DKL PHY vswing table for HDMI
The bspec initially provided a single DKL PHY vswing table for both HDMI
and DP, but was recently updated to include an independent table for
HDMI.
Bspec: 49292 Fixes: 1b8756a62631 ("drm/i915/tgl: Add dkl phy programming sequences") Cc: Clinton A Taylor <clinton.a.taylor@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191118180219.9309-1-matthew.d.roper@intel.com Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Chris Wilson [Tue, 19 Nov 2019 11:25:15 +0000 (11:25 +0000)]
drm/i915/gem: Track ggtt writes from userspace on the bound vma
When userspace writes into the GTT itself, it is supposed to call
set-domain to let the kernel keep track and so manage the CPU/GPU
caches. As we track writes on the individual i915_vma, we should also be
sure to mark them as dirty.
Chris Wilson [Mon, 18 Nov 2019 23:02:40 +0000 (23:02 +0000)]
drm/i915/gt: Make intel_ring_unpin() safe for concurrent pin
In order to avoid some nasty mutex inversions, commit baf4b170accf
("drm/i915: Keep rings pinned while the context is active") allowed the
intel_ring unpinning to be run concurrently with the next context
pinning it. Thus each step in intel_ring_unpin() needed to be atomic and
ordered in a nice onion with intel_ring_pin() so that the lifetimes
overlapped and were always safe.
Sadly, a few steps in intel_ring_unpin() were overlooked, such as
closing the read/write pointers of the ring and discarding the
intel_ring.vaddr, as these steps were not serialised with
intel_ring_pin() and so could leave the ring in disarray.
Fixes: baf4b170accf ("drm/i915: Keep rings pinned while the context is active") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191118230254.2615942-6-chris@chris-wilson.co.uk
Chris Wilson [Mon, 18 Nov 2019 18:49:33 +0000 (18:49 +0000)]
drm/i915/gt: Only wait for register chipset flush if active
Only serialise with the chipset using an mmio if the chipset is
currently active. We expect that any writes into the chipset range will
simply be forgotten until it wakes up.
Lucas De Marchi [Mon, 11 Nov 2019 20:50:25 +0000 (12:50 -0800)]
drm/i915/dsb: fix extra warning on error path handling
When we call intel_dsb_get(), the dsb initialization may fail for
various reasons. We already log the error message in that path, making
it unnecessary to trigger a warning that refcount == 0 when calling
intel_dsb_put().
So here we simplify the logic and do lazy shutdown: leaving the extra
refcount alive so when we call intel_dsb_put() we end up calling
i915_vma_unpin_and_release().
Lucas De Marchi [Sat, 16 Nov 2019 01:15:39 +0000 (17:15 -0800)]
drm/i915/dsb: remove atomic operations
The current dsb API is not really prepared to handle multithreaded access.
I was debugging an issue that ended up fixed by commit 252b4509f5ab
("drm/i915/dsb: Remove PIN_MAPPABLE from the DSB object VMA") and was
puzzled as to how these atomic operations were guaranteeing atomicity.
if (atomic_add_return(1, &dsb->refcount) != 1)
return dsb;
Thread A could still be initializing the dsb struct (and even fail in
the middle) while thread B would take a reference and use it (even
dereferencing a NULL cmd_buf).
I don't think the atomic operations here will help much if this were
to support a multithreaded scenario in the future, so just remove them
to avoid confusion.
v2: Use refcount++ != 0 instead of ++refcount != 1 (from Ville)
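A sketch of the non-atomic form noted in v2 (serialisation assumed to come from the caller's modeset locking):

	if (dsb->refcount++ != 0)
		return dsb;	/* already initialised by an earlier get */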
drm/i915/mst: Check uapi enable not intel one during mst atomic check
When the connector has VCPI allocated and is being moved to another
pipe, it causes drm_dp_atomic_release_vcpi_slots() and
drm_dp_atomic_find_vcpi_slots() to be called in the same atomic check,
causing the error below.
This happens because at this point Intel's hw.enable (and all other
flags in the same struct) is not yet set, while checking the uapi one
gives the expected value.
Cc: Lyude Paul <lyude@redhat.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Reviewed-by: Lyude Paul <lyude@redhat.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191115200430.53146-1-jose.souza@intel.com
Matt Roper [Fri, 15 Nov 2019 16:51:32 +0000 (08:51 -0800)]
drm/i915/vbt: Handle generic DTD block
VBT revision 229 adds a new "Generic DTD" block 58 and deprecates the
old LFP panel mode data in block 42. Let's start parsing this block to
fill in the panel fixed mode on devices with a >=229 VBT.
v2:
* Update according to the recent updates:
- DTD size is now 16 bits instead of 24
- polarity is now just a single bit for hsync and vsync and is
properly documented
* Minor checkpatch fix
v3:
* Now that panel options are parsed separately from the previous patch,
move generic DTD parsing into a function parallel to
parse_lfp_panel_dtd. We'll still fall back to looking at the legacy
LVDS timing block if the generic DTD fails. (Jani)
* Don't forget to actually set lfp_lvds_vbt_mode! (Jani)
* Drop "bdb_" prefix from dtd entry structure. (Jani)
* Follow C99 standard for structure's flexible array member. (Jani)
v4:
* Add "positive" to polarity field names for clarity. (Jani)
* Move VBT version check and fallback to legacy DTD parsing logic to a
helper to keep top-level VBT parsing uncluttered. (Jani)
* Restructure reserved bit packing at end of generic_dtd_entry from
"u32 rsvd:24" to "u8 rsvd[3]" to prevent copy/paste mistakes in the
future. (Jani)
Matt Roper [Fri, 15 Nov 2019 16:51:31 +0000 (08:51 -0800)]
drm/i915/vbt: Parse panel options separately from timing data
Newer VBT versions will add an alternate way to read panel DTD
information, so let's split parsing of the general panel information
from the timing data in preparation.
Don Hiatt [Fri, 15 Nov 2019 23:15:38 +0000 (15:15 -0800)]
drm/i915/guc: Skip suspend/resume GuC action on platforms w/o GuC submission
On some platforms (e.g. KBL) that do not support GuC submission, but
where the user has enabled GuC communication (e.g. for HuC
authentication), calling the GuC EXIT_S_STATE action results in losing
the ability to enter RC6. We can remove the GuC suspend/resume entirely
as we do not need to save the GuC submission status.
Add intel_guc_submission_is_enabled() function to determine if
GuC submission is active.
v2: Do not suspend/resume the GuC on platforms that do not support
GuC submission.
v3: Fix typo, move suspend logic to remove goto.
v4: Use intel_guc_submission_is_enabled() to check GuC submission
status.
v5: No need to look at engine to determine if submission is enabled.
Squash fix + intel_guc_submission_is_enabled() patch into one.
v6: Move resume check into intel_guc_resume() for symmetry.
Fix commit Fixes tag.
Reported-by: KiteStramuort <kitestramuort@autistici.org> Reported-by: S. Zharkoff <s.zharkoff@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111594
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111623 Fixes: 9fa4411c3fe6 ("drm/i915/guc: Updates for GuC 32.0.3 firmware") Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Daniele Ceralo Spurio <daniele.ceraolospurio@intel.com> Cc: Stuart Summers <stuart.summers@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Tested-by: Tomas Janousek <tomi@nomi.cz> Signed-off-by: Don Hiatt <don.hiatt@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191115231538.1249-1-don.hiatt@intel.com
Ville Syrjälä [Thu, 24 Oct 2019 12:21:38 +0000 (15:21 +0300)]
drm/i915: Fix frame start delay programming
Currently we're blindly poking at the frame start delay bits
in PIPECONF when trying to sanitize the hardware state. Those
bits decided to move elsewhere on HSW, so on many platforms
we're not doing anything at all here. Also we're forgetting
about the PCH transcoder entirely.
Add all the bit definitions for the various homes these bits
have had throughout the years, and reset them all to zero.
However I'm not entirely sure this is a safe thing to do. If
not I guess we'd want full readout+statecheck for this stuff.
For now let's stick to the current logic and hope for the
best.
Chris Wilson [Fri, 15 Nov 2019 15:08:40 +0000 (15:08 +0000)]
drm/i915/selftests: Disable heartbeat around context barrier tests
As the heartbeat has the effect of flushing context barriers, this
interferes with the context barrier tests that are trying to observe
them directly. Disable the heartbeat so that the barriers are as
predictable as the test demands.
Chris Wilson [Fri, 15 Nov 2019 15:08:39 +0000 (15:08 +0000)]
drm/i915/gt: Flush retire.work timer object on unload
We need to wait until the timer object is marked as deactivated before
unloading, so follow up our gentle cancel_delayed_work() with the
synchronous variant to ensure it is flushed off a remote cpu before we
mark the memory as freed.
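A sketch of the teardown ordering (the delayed-work field name is assumed for illustration):

	cancel_delayed_work(&gt->requests.retire_work);	/* gentle: may still be running */
	/* ... other teardown ... */
	cancel_delayed_work_sync(&gt->requests.retire_work);	/* wait for it to finish */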
Chris Wilson [Thu, 14 Nov 2019 22:57:36 +0000 (22:57 +0000)]
drm/i915/gem: Silence sparse for RCU protection inside the constructor
Inside the constructor, while cloning, we need to replace the
dst->engines. Having forgotten that dst->engines is marked as RCU
protected, we need to add the appropriate annotations to make sparse
happy.
Chris Wilson [Thu, 14 Nov 2019 22:57:32 +0000 (22:57 +0000)]
drm/i915/gt: Wait for new requests in intel_gt_retire_requests()
Our callers fall into two categories, those passing timeout=0 who just
want to flush request retirements and those passing a timeout that need
to wait for submission completion (e.g. intel_gt_wait_for_idle()).
Currently, we only wait for a snapshot of timelines at the start of the
wait (but there was an expectation that new requests would cause timelines
to appear at the end). However, our callers, such as
intel_gt_wait_for_idle() before suspend, do require us to wait for the
power management requests emitted by retirement as well. If we don't,
then it takes an extra second or two for the background worker to flush
the queue and mark the GT as idle.
Jani Nikula [Fri, 15 Nov 2019 11:17:39 +0000 (13:17 +0200)]
Merge drm/drm-next into drm-intel-next-queued
Backmerge to get 03f40b8046c8 ("Backmerge i915 security patches from
commit '826035bbac8f' into drm-next") and thus 5a861c95b538 ("Merge
Intel Gen8/Gen9 graphics fixes from Jon Bloomfield.").
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Dave Airlie [Fri, 15 Nov 2019 02:16:43 +0000 (12:16 +1000)]
Merge tag 'drm-intel-next-fixes-2019-11-14' of git://anongit.freedesktop.org/drm/drm-intel into drm-next
- PMU "Frequency" is reported as accumulated cycles
- Avoid OOPS in dumb_create IOCTL when no CRTCs
- Mitigation for userptr put_pages deadlock with trylock_page
- Fix to avoid freeing heartbeat request too early
- Fix LRC coherency issue
- Fix Bugzilla #112212: Avoid screen corruption on MST
- Error path fix to unlock context on failed context VM SETPARAM
- Always consider holding preemption a privileged op in perf/OA
- Preload LUTs if the hw isn't currently using them to avoid color flash on VLV/CHV
- Protect context while grabbing its name for the request
- Don't resize aliasing ppGTT size
- Smaller fixes picked by tooling