Chris Wilson [Fri, 13 Oct 2017 20:26:21 +0000 (21:26 +0100)]
drm/i915: Trim struct_mutex hold duration for i915_gem_free_objects
We free objects in bulk after they wait for their RCU grace period.
Currently, we take struct_mutex and unbind all the objects. This can lead
to a long lock duration during which time those objects have their pages
unfreeable (i.e. the shrinker is prevented from reaping those pages). If
we only process a single object under the struct_mutex and then free the
pages, the number of objects locked away from the shrinker is minimal
and we allow regular clients better access to struct_mutex if they need
it.
Chris Wilson [Fri, 13 Oct 2017 20:26:20 +0000 (21:26 +0100)]
drm/i915: Only free the oldest stale object before a fresh allocation
Inspired by Tvrtko's critique of the reaping of the stale contexts
before allocating a new one, also limit the freed object reaping to the
oldest stale object before allocating a fresh object. Unlike contexts,
objects may have radically different sizes of backing storage, but
similar to contexts, while we want to prevent starvation due to
excessive freed lists, we also do not want to delay fresh allocations
for too long. Only freeing the oldest on the freed object list before
each allocation is a reasonable compromise.
v2: Only a single consumer of llist_del_first() is allowed (although
multiple llist_add are still allowed in parallel). Unlike
i915_gem_context, i915_gem_flush_free_objects() is itself not serialized
and so we need to add our own spinlock. Otherwise KASAN eventually spots
a use-after-free for the race on *first->next.
Chris Wilson [Fri, 13 Oct 2017 20:26:19 +0000 (21:26 +0100)]
drm/i915: Set our shrinker->batch to 4096 (~16MiB)
Prefer to defer activating our GEM shrinker until we have a few
megabytes to free; or we have accumulated sufficient mempressure by
deferring the reclaim to force a shrink. The intent is that because our
objects may typically be large, we are too effective at shrinking and
are not rewarded for freeing more pages than the batch. It will also
defer the initial shrinking to hopefully put it at a lower priority than
say the buffer cache (although it will balance out over a number of
reclaims, with GEM being more bursty).
v2: Give it a feedback system to try and tune the batch size towards
an effective size for the available objects.
v3: Start keeping track of shrinker stats in debugfs
v4: Protect against finding no shrinkable objects (div-by-zero)
Chris Wilson [Fri, 13 Oct 2017 20:26:18 +0000 (21:26 +0100)]
drm/i915: Wire up shrinkctl->nr_scanned
shrink_slab() allows us to report back the number of objects we
successfully scanned (out of the target shrinkctl->nr_to_scan). As
report the number of pages owned by each GEM object as a separate item
to the shrinker, we cannot precisely control the number of shrinker
objects we scan on each pass; and indeed may free more than requested.
If we fail to tell the shrinker about the number of objects we process,
it will continue to hold a grudge against us as any objects left
unscanned are added to the next reclaim -- and so we will keep on
"unfairly" shrinking our own slab in comparison to other slabs.
v2: fixup the misplaced addition, we want to count everything we scan
(to match the number we reported earlier) not just the objects we
successfully validated and freed.
Chris Wilson [Mon, 16 Oct 2017 11:40:37 +0000 (12:40 +0100)]
drm/i915: Move dev_priv->mm.[un]bound_list to its own lock
Remove the struct_mutex requirement around dev_priv->mm.bound_list and
dev_priv->mm.unbound_list by giving it its own spinlock. This reduces
one more requirement for struct_mutex and in the process gives us
slightly more accurate unbound_list tracking, which should improve the
shrinker - but the drawback is that we drop the retirement before
counting so i915_gem_object_is_active() may be stale and lead us to
underestimate the number of objects that may be shrunk (see commit bed50aea61df ("drm/i915/shrinker: Flush active on objects before
counting")).
v2: Crosslink the spinlock to the lists it protects, and btw this
changes s/obj->global_link/obj->mm.link/
v3: Fix decoupling of old links in i915_gem_object_attach_phys()
v3.1: Fix the fix, only unlink if it was linked
v3.2: Use a local for to_i915(obj->base.dev)->mm.obj_lock
Chris Wilson [Fri, 13 Oct 2017 20:26:16 +0000 (21:26 +0100)]
drm/i915: Remove walk over obj->vma_list for the shrinker
In the next patch, we want to reduce the lock coverage within the
shrinker, and one of the dangerous walks we have is over obj->vma_list.
We are only walking the obj->vma_list in order to check whether it has
been permanently pinned by HW access, typically via use on the scanout.
But we have a couple of other long term pins, the context objects for
which we currently have to check the individual vma pin_count. If we
instead mark these using obj->pin_display, we can forgo the dangerous
and sometimes slow list iteration.
v2: Rearrange code to try and avoid confusion from false associations
due to arrangement of whitespace along with rebasing on obj->pin_global.
Chris Wilson [Fri, 13 Oct 2017 20:26:15 +0000 (21:26 +0100)]
drm/i915: Drop debugfs/i915_gem_pin_display
It has now lost its meaning (it shows more than just pin_display), I do
not believe that we are using in preference to the complete listing from
i915_gem_gtt, or the listing from i915_gem_framebuffer, or the listing
of active display objects in i915_display_info.
Chris Wilson [Fri, 13 Oct 2017 20:26:14 +0000 (21:26 +0100)]
drm/i915: Rename obj->pin_display to obj->pin_global
In the next patch, we want to extend use of the global pin counter for
semi-permanent pinning of context/ring objects. Given that we plan to
extend the usage to encompass a disparate set of objects, we want a name
that reflects both and should entail less confusion.
Chris Wilson [Fri, 13 Oct 2017 20:26:13 +0000 (21:26 +0100)]
drm/i915: Refactor testing obj->mm.pages
Since we occasionally stuff an error pointer into obj->mm.pages for a
semi-permanent or even permanent failure, we have to be more careful and
not just test against NULL when deciding if the object has a complete
set of its concurrent pages.
Michal Wajdeczko [Mon, 16 Oct 2017 14:47:20 +0000 (14:47 +0000)]
drm/i915: Update DMC firmware load error messages
Some of the error messages from DMC load were too generic and
may be confusing for the user. Lets explicitly add DMC words there.
Also as homepage of DMC firmware is same as for the GuC and Huc,
lets reuse URL definition.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Imre Deak <imre.deak@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171016144724.17244-12-michal.wajdeczko@intel.com
Michal Wajdeczko [Mon, 16 Oct 2017 14:47:17 +0000 (14:47 +0000)]
drm/i915/guc: Pick better place for Guc final status message
GuC status message printed right after firmware upload may be too
optimistic, as we may fail on subsequent steps. Move that message
to the end of intel_uc_init_hw where we know the status for sure.
v2: use dev_info (Chris)
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Sagar Arun Kamble <sagar.a.kamble@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171016144724.17244-9-michal.wajdeczko@intel.com
Weinan Li [Sun, 15 Oct 2017 03:55:25 +0000 (11:55 +0800)]
drm/i915: enable to read CSB and CSB write pointer from HWSP in GVT-g VM
Let GVT-g VM read the CSB and CSB write pointer from virtual HWSP, not all
the host support this feature, need to check the BIT(3) of caps in PVINFO.
v3 : Remove unnecessary comments.
v4 : Separate VM enable patch with GVT-g implementation patch due to code
dependency.
v5 : Use inline for GVT virtual HWSP caps check function.
v6 : Comments refine.
Chris Wilson [Fri, 13 Oct 2017 15:47:35 +0000 (16:47 +0100)]
drm/i915: Use bdw_ddi_translations_fdi for Broadwell
The compiler warns:
drivers/gpu/drm/i915/intel_ddi.c:118:35: warning: ‘bdw_ddi_translations_fdi’ defined but not used
Lo and behold, if we look at intel_ddi_get_buf_trans_fdi(), it uses
hsw_ddi_translations_fdi[] for both Haswell and *Broadwell*
Fixes: 7d1c42e679f9 ("drm/i915: Refactor code to select the DDI buf translation table") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: David Weinehall <david.weinehall@linux.intel.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: <stable@vger.kernel.org> # v4.12+ Link: https://patchwork.freedesktop.org/patch/msgid/20171013154735.27163-1-chris@chris-wilson.co.uk Reviewed-by: Jani Nikula <jani.nikula@linux.intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Chris Wilson [Fri, 13 Oct 2017 13:12:18 +0000 (14:12 +0100)]
drm/i915: Always stop the rings before a missing GPU reset
Always try to stop the rings, even if the GPU reset itself has been
disabled (via modparam i915.reset). This should at least stop the hw
from spinning in the background consuming resources (e.g. power and
memory bandwidth) letting the system rest-in-peace.
Chris Wilson [Fri, 13 Oct 2017 13:12:17 +0000 (14:12 +0100)]
drm/i915: Keep the rings stopped until they have been re-initialized
Before modifying the ring register (RING_START, HEAD, TAIL, CTL) we
first make sure it is stopped (or else the hw may not resample the
registers). However, we do not need to let the hw restart until after we
have reprogrammed all the rings. This should help prevent situations
where pending operations on the ring may resume (because we are trying
to re-initialize following an unsuccessful GPU hang, i.e. from
i915_gem_unset_wedged).
Chris Wilson [Thu, 12 Oct 2017 20:40:19 +0000 (21:40 +0100)]
drm/i915: Stop asserting on set-wedged vs nop_submit_request ordering
Since the removal of the stop_machine(), it is allowed and expected for
the nop_submit_request() and nop_complete_submit_request() to run in
parallel to the i915_gem_set_wedged() processing. As such we can no
longer assert that i915_gem_set_wedged() has completed inside the
stop_machine prior to the individual nop_submit_request execution.
Fixes: af7a8ffad9c5 ("drm/i915: Use rcu instead of stop_machine in set_wedged") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171012204019.3557-1-chris@chris-wilson.co.uk Reviewed-by: Daniel Vetter <daniel.vetter@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Ville Syrjälä [Tue, 10 Oct 2017 12:12:06 +0000 (15:12 +0300)]
drm/i915: Plumb crtc_state etc. directly to intel_ddi_pre_enable_{dp,hdmi}()
Rather that plumb the link parameters separately to
intel_ddi_pre_enable_dp() let's just pass the entire crtc state.
intel_ddi_pre_enable_hdmi() already took the crtc state, but for some
reason intel_ddi_pre_enable() still wanted to extract has_infoframe
from therein and pass it in separately. Let's not do that since it's
pointless.
v2: Rebase due to more code getting pulled into the DDI hooks
Ville Syrjälä [Tue, 10 Oct 2017 12:12:04 +0000 (15:12 +0300)]
drm/i915: Remove useless eDP check from intel_ddi_pre_enable_dp()
intel_edp_panel_on() will itself do the is_edp() check, so the caller
doesn't have to bother. Pre-DDI code doesn't bother, so let's follow the
same approach for DDI.
Ville Syrjälä [Tue, 10 Oct 2017 12:12:02 +0000 (15:12 +0300)]
drm/i915: Inline the required bits of intel_ddi_post_disable() into intel_ddi_fdi_post_disable()
To untangle the mess that is intel_ddi_post_disable() move the the bits
needed by FDI into intel_ddi_fdi_post_disable(). This way we can stop
worrying about FDI in intel_ddi_post_disable().
Ville Syrjälä [Tue, 10 Oct 2017 12:11:59 +0000 (15:11 +0300)]
drm/i915: Dump 'output_types' in crtc state dump
To make it easier to debug things let's dump the output types bitmask in
the crtc state dump. And to make life that much better, let's pretty
print it as a a human reaadable string as well.
v2: Have the caller pass in the buffer (Chris)
#undef OUTPUT_TYPE (Jani)
Harsha Sharma [Mon, 9 Oct 2017 12:06:43 +0000 (17:36 +0530)]
drm/i915: Replace *_reference/unreference() or *_ref/unref with _get/put()
Replace instances of drm_framebuffer_reference/unreference() with
*_get/put() suffixes and drm_dev_unref with *_put() suffix
because get/put is shorter and consistent with the
kernel use of *_get/put suffixes.
Done with following coccinelle semantic patch
Shashank Sharma [Tue, 10 Oct 2017 10:07:44 +0000 (15:37 +0530)]
drm/i915: Add retries for LSPCON detection
We read the dp dual mode Adapter identifier to detect the
LSPCON device. It's been observed from the CI testing that in
few cases, this read can get delayed or fail. For such scenarios,
LSPCON vendors suggest to retry the read operation.
This patch adds retry in the probe function, while reading
LSPCON identifier.
V3: added this patch in the series
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102294
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102295
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102359
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103186 Cc: Ville Syrjala <ville.syrjala@linux.intel.com> Cc: Imre Deak <imre.deak@intel.com> Signed-off-by: Shashank Sharma <shashank.sharma@intel.com> Reviewed-by: Imre Deak <imre.deak@intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/1507630064-17908-4-git-send-email-shashank.sharma@intel.com
Shashank Sharma [Tue, 10 Oct 2017 10:07:43 +0000 (15:37 +0530)]
drm/i915: Don't give up waiting on INVALID_MODE
Our current logic to read LSPCON's current mode, stops retries and
breaks wait-loop, if it gets LSPCON_MODE_INVALID as return from the
core function. This doesn't allow us to try reading the mode again.
This patch removes this condition and allows retries reading
the currnt mode until timeout.
This also fixes/prevents some of the noise in form of debug messages
while running IGT CI test cases.
V2: rebase, added r-b
V2: changed some debug message levels from debug->error and
error->debug in lspcon_get_current_mode function.
V3: Rebase
Cc: Imre Deak <imre.deak@intel.com> Cc: Daniel Vetter <daniel.vetter@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102294
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102295
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102359
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103186 Reviewed-by: Imre Deak <imre.deak@intel.com> Signed-off-by: Shashank Sharma <shashank.sharma@intel.com> Signed-off-by: Mahesh Kumar <Mahesh1.kumar@intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/1507630064-17908-3-git-send-email-shashank.sharma@intel.com
Shashank Sharma [Thu, 12 Oct 2017 16:40:08 +0000 (22:10 +0530)]
drm: Add retries for lspcon mode detection
From the CI builds, its been observed that during a driver
reload/insert, dp dual mode read function sometimes fails to
read from LSPCON device over i2c-over-aux channel.
This patch:
- adds some delay and few retries, allowing a scope for these
devices to settle down and respond.
- changes one error log's level from ERROR->DEBUG as we want
to call it an error only after all the retries are exhausted.
V2: Addressed review comments from Jani (for loop for retry)
V3: Addressed review comments from Imre (break on partial read too)
V3: Addressed review comments from Ville/Imre (Add the retries
exclusively for LSPCON, not for all dp_dual_mode devices)
V4: Added r-b from Imre, sending it to dri-devel (Jani)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102294
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102295
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102359
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103186 Cc: Ville Syrjala <ville.syrjala@linux.intel.com> Cc: Imre Deak <imre.deak@intel.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Reviewed-by: Imre Deak <imre.deak@intel.com> Acked-by: Dave Airlie <airlied@gmail.com> Signed-off-by: Shashank Sharma <shashank.sharma@intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/1507826408-19322-1-git-send-email-shashank.sharma@intel.com
James Ausmus [Thu, 12 Oct 2017 21:30:36 +0000 (14:30 -0700)]
drm/i915: Fix DP_AUX_CH_CTL_TIME_OUT naming
Rename DP_AUX_CH_CTL_TIME_OUT_1600us to DP_AUX_CH_CTL_TIME_OUT_MAX, as
the meaning of the (3 << 26) value varies per platform, but it's always the
maximum timeout for that platform. Pre-CNL it means 1600us, and for CNL
it means 3200us.
Chris Wilson [Thu, 12 Oct 2017 12:57:26 +0000 (13:57 +0100)]
drm/i915/selftests: Exercise adding requests to a full GGTT
A bug recently encountered involved the issue where are we were
submitting requests to different ppGTT, each would pin a segment of the
GGTT for its logical context and ring. However, this is invisible to
eviction as we do not tie the context/ring VMA to a request and so do
not automatically wait upon it them (instead they are marked as pinned,
preventing eviction entirely). Instead the eviction code must flush those
contexts by switching to the kernel context. This selftest tries to
fill the GGTT with contexts to exercise a path where the
switch-to-kernel-context failed to make forward progress and we fail
with ENOSPC.
v2: Make the hole in the filled GGTT explicit.
v3: Swap out the arbitrary timeout for a private notification from
i915_gem_evict_something()
Chris Wilson [Thu, 12 Oct 2017 12:57:25 +0000 (13:57 +0100)]
drm/i915/selftests: Wrap a timer into a i915_sw_fence
For some selftests, we want to issue requests but delay them going to
hardware. Furthermore, we don't want those requests to block
indefinitely (or else we may hang the driver and block testing) so we
want to employ a timeout. So naturally we want a fence that is
automatically signaled by a timer.
v2: Add kselftests.
v3: Limit the API available to selftests; there isn't an overwhelming
reason to export it universally.
Chris Wilson [Thu, 12 Oct 2017 12:57:24 +0000 (13:57 +0100)]
drm/i915: Fix eviction when the GGTT is idle but full
In the full-ppgtt world, we can fill the GGTT full of context objects.
These context objects are currently implicitly tracked by the requests
that pin them i.e. they are only unpinned when the request is completed
and retired, but we do not have the link from the vma to the request
(anymore). In order to unpin those contexts, we have to issue another
request and wait upon the switch to the kernel context.
The bug during eviction was that we assumed that a full GGTT meant we
would have requests on the GGTT timeline, and so we missed situations
where those requests where merely in flight (and when even they have not
yet been submitted to hw yet). The fix employed here is to change the
already-is-idle test to no look at the execution timeline, but count the
outstanding requests and then check that we have switched to the kernel
context. Erring on the side of overkill here just means that we stall a
little longer than may be strictly required, but we only expect to hit
this path in extreme corner cases where returning an erroneous error is
worse than the delay.
v2: Logical inversion when swapping over branches.
Ville Syrjälä [Thu, 12 Oct 2017 13:02:01 +0000 (16:02 +0300)]
drm/i915: Start tracking PSR state in crtc state
Add the minimal amount of PSR tracking into the crtc state. This allows
precomputing the possibility of using PSR correctly, and it means we can
safely call the psr enable/disable functions for any DP endcoder.
As a nice bonus we get rid of some more crtc->config usage, which we
want to kill off eventually.
v2: Fix 'goto unlock' fail in intel_psr_enable() (Jani)
Check intel_dp_is_edp() in is_edp_psr() (Jani)
Chris Wilson [Wed, 11 Oct 2017 14:18:57 +0000 (15:18 +0100)]
drm/i915/userptr: Drop struct_mutex before cleanup
Purely to silence lockdep, as we know that no bo can exist at this time
and so the inversion is impossible. Nevertheless, lockdep currently
warns on unload:
[ 137.522565] WARNING: possible circular locking dependency detected
[ 137.522568] 4.14.0-rc4-CI-CI_DRM_3209+ #1 Tainted: G U
[ 137.522570] ------------------------------------------------------
[ 137.522572] drv_module_relo/1532 is trying to acquire lock:
[ 137.522574] ("i915-userptr-acquire"){+.+.}, at: [<ffffffff8109a831>] flush_workqueue+0x91/0x540
[ 137.522581]
but task is already holding lock:
[ 137.522583] (&dev->struct_mutex){+.+.}, at: [<ffffffffa014fb3f>] i915_gem_fini+0x3f/0xc0 [i915]
[ 137.522605]
which lock already depends on the new lock.
Jani Nikula [Mon, 9 Oct 2017 09:29:58 +0000 (12:29 +0300)]
drm/i915/dp: centralize max source rate conditions more
Turn intel_dp_source_supports_hbr2() into a simple helper to query the
pre-filled source rates array, and move the conditions about which
platforms support which rates to the single point of truth in
intel_dp_set_source_rates().
This also reduces the code paths you have to think about in the source
rates initialization in intel_dp_set_source_rates(), making it easier to
grasp.
Ville Syrjälä [Mon, 9 Oct 2017 16:19:51 +0000 (19:19 +0300)]
drm/i915: Allow PCH platforms fall back to BIOS LVDS mode
With intel_encoder_current_mode() using the normal state readout code it
actually works on PCH platforms as well. So let's nuke the PCH check from
intel_lvds_init(). I suppose there aren't any machines that actually
need this, but at least we get to eliminate a few lines of code, and one
FIXME.
Ville Syrjälä [Mon, 9 Oct 2017 16:19:50 +0000 (19:19 +0300)]
drm/i915: Reuse normal state readout for LVDS/DVO fixed mode
Reuse the normal state readout code to get the fixed mode for LVDS/DVO
encoders. This removes some partially duplicated state readout code
from LVDS/DVO encoders. The duplicated code wasn't actually even
populating the negative h/vsync flags, leading to possible state checker
complaints. The normal readout code populates that stuff fully.
Daniel Vetter [Wed, 11 Oct 2017 09:10:19 +0000 (11:10 +0200)]
drm/i915: Use rcu instead of stop_machine in set_wedged
stop_machine is not really a locking primitive we should use, except
when the hw folks tell us the hw is broken and that's the only way to
work around it.
This patch tries to address the locking abuse of stop_machine() from
drm/i915: Stop the machine as we install the wedged submit_request handler
Chris said parts of the reasons for going with stop_machine() was that
it's no overhead for the fast-path. But these callbacks use irqsave
spinlocks and do a bunch of MMIO, and rcu_read_lock is _real_ fast.
To stay as close as possible to the stop_machine semantics we first
update all the submit function pointers to the nop handler, then call
synchronize_rcu() to make sure no new requests can be submitted. This
should give us exactly the huge barrier we want.
I pondered whether we should annotate engine->submit_request as __rcu
and use rcu_assign_pointer and rcu_dereference on it. But the reason
behind those is to make sure the compiler/cpu barriers are there for
when you have an actual data structure you point at, to make sure all
the writes are seen correctly on the read side. But we just have a
function pointer, and .text isn't changed, so no need for these
barriers and hence no need for annotations.
Unfortunately there's a complication with the call to
intel_engine_init_global_seqno:
- Without stop_machine we must hold the corresponding spinlock.
- Without stop_machine we must ensure that all requests are marked as
having failed with dma_fence_set_error() before we call it. That
means we need to split the nop request submission into two phases,
both synchronized with rcu:
1. Only stop submitting the requests to hw and mark them as failed.
2. After all pending requests in the scheduler/ring are suitably
marked up as failed and we can force complete them all, also force
complete by calling intel_engine_init_global_seqno().
This should fix the followwing lockdep splat:
======================================================
WARNING: possible circular locking dependency detected
4.14.0-rc3-CI-CI_DRM_3179+ #1 Tainted: G U
------------------------------------------------------
kworker/3:4/562 is trying to acquire lock:
(cpu_hotplug_lock.rw_sem){++++}, at: [<ffffffff8113d4bc>] stop_machine+0x1c/0x40
but task is already holding lock:
(&dev->struct_mutex){+.+.}, at: [<ffffffffa0136588>] i915_reset_device+0x1e8/0x260 [i915]
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
v2: Have 1 global synchronize_rcu() barrier across all engines, and
improve commit message.
v3: We need to protect the seqno update with the timeline spinlock (in
set_wedged) to avoid racing with other updates of the seqno, like we
already do in nop_submit_request (Chris).
v4: Use two-phase sequence to plug the race Chris spotted where we can
complete requests before they're marked up with -EIO.
v5: Review from Chris:
- simplify nop_submit_request.
- Add comment to rcu_read_lock section.
- Align comments with the new style.
v6: Remove unused variable to appease CI.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102886
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103096 Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Marta Lofstedt <marta.lofstedt@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171011091019.1425-1-daniel.vetter@ffwll.ch
drm/i915: Create generic functions to control RC6, RPS
Prepared generic functions intel_enable_rc6, intel_disable_rc6,
intel_enable_rps and intel_disable_rps functions to setup RC6/RPS
based on platforms.
v2: Make intel_enable/disable_rc6/rps static. (Chris)
v3: Added lockdep_assert_held(dev_priv->pcu_lock) in new generic
functions. (Chris)
Removed WARN_ON(&dev_priv->pcu_lock) from lower level functions as generic
function now has lockdep_assert. Rebase.
drm/i915: Rename intel_enable_rc6 to intel_rc6_enabled
This function gives the status of RC6, whether disabled or if
enabled then which state. intel_enable_rc6 will be used for
enabling RC6 in the next patch.
drm/i915: Name structure in dev_priv that contains RPS/RC6 state as "gt_pm"
Prepared substructure rps for RPS related state. autoenable_work is
used for RC6 too hence it is defined outside rps structure. As we do
this lot many functions are refactored to use intel_rps *rps to access
rps related members. Hence renamed intel_rps_client pointer variables
to rps_client in various functions.
v2: Rebase.
v3: s/pm/gt_pm (Chris)
Refactored access to rps structure by declaring struct intel_rps * in
many functions.
drm/i915: Move rps.hw_lock to dev_priv and s/hw_lock/pcu_lock
In order to separate GT PM related functionality into new structure
we are updating rps structure. hw_lock in it is used for display
related PCU communication too hence move it to dev_priv.
This patch separates RC6 and RPS enabling for BDW.
RC6/RPS Disabling are handled through gen6 functions.
PM Programming guide recommends a sequence within forcewakes to
configure RC6, RPS and ring frequencies in sequence. With this
patch the order is still maintained.
v2: Update sequence numbers in RC6 programming and comment about
intent of reset_rps during gen8_enable_rps. (Radoslaw)
Matthew Auld [Tue, 10 Oct 2017 13:30:30 +0000 (14:30 +0100)]
drm/i915/selftests: ditch the kernel context
There's really no good reason to be using the kernel context for the
huge-page livetests. Also with the introduction of commit bef27bdb6cfb
("drm/i915: Assert we do not try to expand VMA for hugepage inside GGTT")
we start hitting the bug on in the selftests, since the kernel context
will always return true for i915_vma_is_ggtt(), so now seems like the
opportune time to instead create our own context.
Fixes: 4049866f0913 ("drm/i915/selftests: huge page tests") Fixes: bef27bdb6cfb ("drm/i915: Assert we do not try to expand VMA for hugepage inside GGTT") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20171010133030.12112-1-matthew.auld@intel.com Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Chris Wilson [Tue, 10 Oct 2017 11:10:05 +0000 (12:10 +0100)]
drm/i915: Silently fallback to 4k scratch
If we fail to allocate a 64k hugepage for scratch, we try again with a
normal 4k page (with some loss of efficiency at runtime). As we handle
this gracefully, we do not need a noisy allocation failure warning.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.william.auld@gmail.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171010111005.13625-1-chris@chris-wilson.co.uk Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Daniel Vetter [Tue, 10 Oct 2017 09:18:16 +0000 (11:18 +0200)]
drm/i915: Increase atomic update vblank evasion time with lockdep
All our mmio writes take forever with lockdep due to the constant
lock acquire&dropping we do. Ville has some patches to only acquire
the mmio spinlocks once instead for every single mmio, but those
aren't ready yet.
As an interim solution just extend our budget slightly when lockdep is
enabled, to avoid the rare and sporadic noise in CI.
v2: I forgot to add the FIXME comment ...
Cc: Ville Syrjala <ville.syrjala@linux.intel.com> Acked-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
References: https://bugs.freedesktop.org/show_bug.cgi?id=103169
References: https://bugs.freedesktop.org/show_bug.cgi?id=103124
References: https://bugs.freedesktop.org/show_bug.cgi?id=102403
References: https://bugs.freedesktop.org/show_bug.cgi?id=103020
References: https://bugs.freedesktop.org/show_bug.cgi?id=103019
References: https://bugs.freedesktop.org/show_bug.cgi?id=102723
References: https://bugs.freedesktop.org/show_bug.cgi?id=102544
References: https://bugs.freedesktop.org/show_bug.cgi?id=103180 Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20171010091816.26898-1-daniel.vetter@ffwll.ch
Mika Kuoppala [Tue, 10 Oct 2017 11:48:57 +0000 (14:48 +0300)]
drm/i915: Use execlists_num_ports instead of size of array
There is function to tell how many ports we have, so use it.
We still have direct relationship with array size and port count,
so no harm was done.
Fixes: 76e70087d360 ("drm/i915: Make execlist port count variable") Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171010114857.13108-1-mika.kuoppala@intel.com
Daniel Vetter [Mon, 9 Oct 2017 16:44:00 +0000 (18:44 +0200)]
drm/i915: Preallocate our mmu notifier workequeu to unbreak cpu hotplug deadlock
4.14-rc1 gained the fancy new cross-release support in lockdep, which
seems to have uncovered a few more rules about what is allowed and
isn't.
This one here seems to indicate that allocating a work-queue while
holding mmap_sem is a no-go, so let's try to preallocate it.
Of course another way to break this chain would be somewhere in the
cpu hotplug code, since this isn't the only trace we're finding now
which goes through msr_create_device.
Full lockdep splat:
======================================================
WARNING: possible circular locking dependency detected
4.14.0-rc1-CI-CI_DRM_3118+ #1 Tainted: G U
------------------------------------------------------
prime_mmap/1551 is trying to acquire lock:
(cpu_hotplug_lock.rw_sem){++++}, at: [<ffffffff8109dbb7>] apply_workqueue_attrs+0x17/0x50
but task is already holding lock:
(&dev_priv->mm_lock){+.+.}, at: [<ffffffffa01a7b2a>] i915_gem_userptr_init__mmu_notifier+0x14a/0x270 [i915]
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
Note that this also has the minor benefit of slightly reducing the
critical section where we hold mmap_sem.
v2: Set ret correctly when we raced with another thread.
v3: Use Chris' diff. Attach the right lockdep splat.
v4: Repaint in Tvrtko's colors (aka don't report ENOMEM if we race and
some other thread managed to not also get an ENOMEM and successfully
install the mmu notifier. Note that the kernel guarantees that small
allocations succeed, so this never actually happens).
Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Sasha Levin <alexander.levin@verizon.com> Cc: Marta Lofstedt <marta.lofstedt@intel.com> Cc: Tejun Heo <tj@kernel.org>
References: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3180/shard-hsw3/igt@prime_mmap@test_userptr.html
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102939 Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171009164401.16035-1-daniel.vetter@ffwll.ch
Jani Nikula [Thu, 28 Sep 2017 08:22:03 +0000 (11:22 +0300)]
drm/i915/bios: parse SDVO device mapping from pre-parsed child devices
We parse and store the child devices in
parse_general_definitions(). There is no need to parse the VBT block
again for SDVO device mapping. Do the same as we do in
parse_ddi_ports().
We no longer have access to child device size at this stage, but we also
don't need to worry about reading past the child device anymore. Instead
of a child device size check, do a mild optimization by limiting the
parsing to gens 3 through 7.
Jani Nikula [Thu, 28 Sep 2017 08:21:59 +0000 (11:21 +0300)]
drm/i915/bios: don't initialize fields based on vbt version
In theory, these might clobber information for older VBT versions.
We might have to store the BDB version for later parsing, but currently
all code accessing these fields will only use them on newer platforms
with new enough BDB versions.
Jani Nikula [Thu, 28 Sep 2017 08:21:57 +0000 (11:21 +0300)]
drm/i915/bios: parse DDI ports also for CHV for HDMI DDC pin and DP AUX channel
While technically CHV isn't DDI, we do look at the VBT based DDI port
info for HDMI DDC pin and DP AUX channel. (We call these "alternate",
but they're really just something that aren't platform defaults.)
In commit e4ab73a13291 ("drm/i915: Respect alternate_ddc_pin for all DDI
ports") Ville writes, "IIRC there may be CHV system that might actually
need this."
I'm not sure why there couldn't be even more platforms that need this,
but start conservative, and parse the info for CHV in addition to DDI.
Paulo Zanoni [Thu, 5 Oct 2017 21:38:42 +0000 (18:38 -0300)]
drm/i915: avoid division by zero on cnl_calc_wrpll_link
If for some unexpected reason the registers all read zero it's better
to WARN and return instead of dividing by zero and completely freezing
the machine.
I don't expect this to happen in the wild with the current code, but I
accidentally triggered the division by zero while doing some debugging
in an unusual environment.
Matthew Auld [Mon, 9 Oct 2017 11:00:24 +0000 (12:00 +0100)]
drm/i915: s/sg_mask/sg_page_sizes/
It's a little unclear what the sg_mask actually is, so prefer the more
meaningful name of sg_page_sizes.
Suggested-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20171009110024.29114-1-matthew.auld@intel.com Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Chris Wilson [Mon, 9 Oct 2017 08:44:01 +0000 (09:44 +0100)]
drm/i915: Early rejection of mappable GGTT pin attempts for large bo
Currently, we reject attempting to pin a large bo into the mappable
aperture, but only after trying to create the vma. Under debug kernels,
repeatedly creating and freeing that vma for an oversized bo consumes
one-third of the runtime for pwrite/pread tests as it is spent on
kmalloc/kfree tracking. If we move the rejection to before creating that
vma, we lose some accuracy of checking against the fence_size as opposed
to object size, though the fence can never be smaller than the object.
Note that the vma creation itself will reject an attempt to create a vma
larger than the GTT so we can remove one redundant test.
Chris Wilson [Mon, 9 Oct 2017 08:44:00 +0000 (09:44 +0100)]
drm/i915: Avoid evicting user fault mappable vma for pread/pwrite
Both pread/pwrite GTT paths provide a fast fallback in case we cannot
map the whole object at a time. Currently, we use the fallback for very
large objects and for active objects that would require remapping, but
we can also add active fault mappable objects to the list that we want
to avoid evicting. The rationale is that such fault mappable objects are
in active use and to evict requires tearing down the CPU PTE and forcing
a page fault on the next access; more costly, and intefers with other
processes, than our per-page GTT fallback.
Chris Wilson [Mon, 9 Oct 2017 08:43:59 +0000 (09:43 +0100)]
drm/i915: Try a minimal attempt to insert the whole object for relocations
As we have a lightweight fallback to insert a single page into the
aperture, try to avoid any heavier evictions when attempting to insert
the entire object.
Chris Wilson [Mon, 9 Oct 2017 08:43:58 +0000 (09:43 +0100)]
drm/i915: Check PIN_NONFAULT overlaps in evict_for_node
If the caller says that he doesn't want to evict any other faulting
vma, honour that flag. The logic was used in evict_something, but not
the more specific evict_for_node, now being used as a preliminary probe
since commit 606fec956c0e ("drm/i915: Prefer random replacement before
eviction search").
Fixes: 606fec956c0e ("drm/i915: Prefer random replacement before eviction search") Fixes: 821188778b9b ("drm/i915: Choose not to evict faultable objects from the GGTT")
References: https://bugs.freedesktop.org/show_bug.cgi?id=102490 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171009084401.29090-4-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Chris Wilson [Mon, 9 Oct 2017 08:43:57 +0000 (09:43 +0100)]
drm/i915: Track user GTT faulting per-vma
We don't wish to refault the entire object (other vma) when unbinding
one partial vma. To do this track which vma have been faulted into the
user's address space.
v2: Use a local vma_offset to tidy up a multiline unmap_mapping_range().
Chris Wilson [Mon, 9 Oct 2017 08:43:56 +0000 (09:43 +0100)]
drm/i915: Consolidate get_fence with pin_fence
Following the pattern now used for obj->mm.pages, use just pin_fence and
unpin_fence to control access to the fence registers. I.e. instead of
calling get_fence(); pin_fence(), we now just need to call pin_fence().
This will make it easier to reduce the locking requirements around
fence registers.