Chris Wilson [Wed, 13 Sep 2017 08:56:05 +0000 (09:56 +0100)]
drm/i915/execlists: Read the context-status HEAD from the HWSP
The engine also provides a mirror of the CSB write pointer in the HWSP,
but not of our read pointer. To take advantage of this we need to
remember where we read up to on the last interrupt and continue off from
there. This poses a problem following a reset, as we don't know where
the hw will start writing from, and due to the use of power contexts we
cannot perform that query during the reset itself. So we continue the
current modus operandi of delaying the first read of the context-status
read/write pointers until after the first interrupt. With this we should
now have eliminated all uncached mmio reads in handling the
context-status interrupt, though we still have the uncached mmio writes
for submitting new work, and many uncached mmio reads in the global
interrupt handler itself. Still a step in the right direction towards
reducing our resubmit latency, although it appears lost in the noise!
v2: Cannonlake moved the CSB write index
v3: Include the sw/hwsp state in debugfs/i915_engine_info
v4: Also revert to using CSB mmio for GVT-g
v5: Prevent the compiler reloading tail (Mika)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michel Thierry <michel.thierry@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Zhenyu Wang <zhenyuw@linux.intel.com> Cc: Zhi Wang <zhi.a.wang@intel.com> Acked-by: Michel Thierry <michel.thierry@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170913085605.18299-6-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Chris Wilson [Wed, 13 Sep 2017 13:35:34 +0000 (14:35 +0100)]
drm/i915/execlists: Read the context-status buffer from the HWSP
The engine provides a mirror of the CSB in the HWSP. If we use the
cacheable reads from the HWSP, we can shave off a few mmio reads per
context-switch interrupt (which are quite frequent!). Just removing a
couple of mmio is not enough to actually reduce any latency, but a small
reduction in overall cpu usage.
Much appreciation for Ben dropping the bombshell that the CSB was in the
HWSP and for Michel in digging out the details.
v2: Don't be lazy, add the defines for the indices.
v3: Include the HWSP in debugfs/i915_engine_info
v4: Check for GVT-g, it currently depends on intercepting CSB mmio
v5: Fixup GVT-g mmio path
v6: Disable HWSP if VT-d is active as the iommu adds unpredictable
memory latency. (Mika)
v7: Also markup the CSB read with READ_ONCE() as it may still be an mmio
read and we want to stop the compiler from issuing a later (v.slow) reload.
Suggested-by: Ben Widawsky <benjamin.widawsky@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michel Thierry <michel.thierry@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Zhenyu Wang <zhenyuw@linux.intel.com> Cc: Zhi Wang <zhi.a.wang@intel.com> Acked-by: Michel Thierry <michel.thierry@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170913133534.26927-1-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Chris Wilson [Wed, 13 Sep 2017 08:56:03 +0000 (09:56 +0100)]
drm/i915: Allow HW status page to be bound high
At the time of commit ff2673a36c4f ("drm/i915: HWS must be in the
mappable region for g33"), drm_mm insertion would often default to
placing a new object high in the zone forcing us to specify that certain
HWSP must be bound within the low mappable region. Since then, drm_mm
has gained more finesse over its placement and exposes that to the
caller, commit e615900281b8 ("drm: Improve drm_mm search (and fix
topdown allocation) with rbtrees"). As such where possible we want the
HWSP to be outside of the mappable aperture and so need to specify that
can be pinned high.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Michel Thierry <michel.thierry@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170913085605.18299-4-chris@chris-wilson.co.uk
On gen8+ we're currently using the PPHWSP of the kernel ctx as the
global HWSP. However, when the kernel ctx gets submitted (e.g. from
__intel_autoenable_gt_powersave) the HW will use that page as both
HWSP and PPHWSP. This causes a conflict in the register arena of the
HWSP, i.e. dword indices below 0x30. We don't current utilize this arena,
but in the following patches we will take advantage of the cached
register state for handling execlist's context status interrupt.
To avoid the conflict, instead of re-using the PPHWSP of the kernel
ctx we can allocate a separate page for the HWSP like what happens for
pre-execlists platform.
v2: Add a use-case for the register arena of the HWSP.
Michel Thierry [Wed, 13 Sep 2017 08:56:01 +0000 (09:56 +0100)]
drm/i915/guc: Don't make assumptions while getting the lrca offset
Using the HWSP ggtt_offset to get the lrca offset is only correct if the
HWSP happens to be before it (when we reuse the PPHWSP of the kernel
context as the engine HWSP). Instead of making this assumption, get the
lrca offset from the kernel_context engine state.
And while looking at this part of the GuC interaction, it was also
noticed that the firmware expects the size of only the engine context
(context minus the execlist part, i.e. don't include the first 80
dwords), so pass the right size.
v2: Use the new macros to prevent abusive overuse of the old ones (Chris).
Michel Thierry [Wed, 13 Sep 2017 08:56:00 +0000 (09:56 +0100)]
drm/i915/lrc: Clarify the format of the context image
Not only the context image consist of two parts (the PPHWSP, and the
logical context state), but we also allocate a header at the start of
for sharing data with GuC. Thus every lrc looks like this:
So far, we have oversimplified whenever we use each of these parts of the
context, just because the GuC header happens to be in page 0, and the
(PP)HWSP is in page 1. But this had led to using the same define for more
than one meaning (as a page index in the lrc and as 1 page).
This patch adds defines for the GuC shared page, the PPHWSP page and the
start of the logical state. It also updated the places where the old
define was being used. Since we are not changing the size (or format) of
the context, there are no functional changes.
Chris Wilson [Wed, 13 Sep 2017 10:51:54 +0000 (11:51 +0100)]
drm/i915/selftests: Use mul_u32_u32() for 32b x 32b -> 64b result
As realised by commit f98f17c0f33d ("math64, timers: Fix 32bit
mul_u64_u32_shr() and friends"), GCC does not always generate ideal code
for performing a 32b x 32b multiply returning a 64b result (i.e. where
we idiomatically use u64 result = (u64)x * (u32)x).
Chris Wilson [Wed, 13 Sep 2017 10:51:53 +0000 (11:51 +0100)]
drm/i915: Use mul_u32_u32() for 32b x 32b -> 64b result
As realised by commit f98f17c0f33d ("math64, timers: Fix 32bit
mul_u64_u32_shr() and friends"), GCC does not always generate ideal code
for performing a 32b x 32b multiply returning a 64b result (i.e. where
we idiomatically use u64 result = (u64)x * (u32)x). This catches a
couple of instances in the display code using (u64)x * (u32)y.
Chris Wilson [Tue, 12 Sep 2017 21:49:05 +0000 (22:49 +0100)]
drm/i915: Move the context descriptor to an inline helper
The context descriptor is stored inside the per-engine context state, as
we only need to compute it once and access it frequently. However,
currently only intel_lrc.c has easy access, but i915_guc_submission.c
would like to frequently read it as well, and more so only ever needs
the lower 32bits. Make it an inline as the compiler should be able to
retrieve the value in less instructions than it takes to do the function
call:
add/remove: 0/1 grow/shrink: 1/0 up/down: 8/-45 (-37)
function old new delta
i915_guc_submit 621 629 +8
intel_lr_context_descriptor 45 - -45
Min brightness value from vbt was missing for CNP platform.
This setting have to refer backlight ic spec to restrict
min backlight output. Without this restriction, driver would
allow to configure lower brightness value and violate
backlight ic requirement.
Uma Shankar [Tue, 5 Sep 2017 09:44:31 +0000 (15:14 +0530)]
Revert "drm/i915/bxt: Disable device ready before shutdown command"
This reverts commit 24bf7f011ddb ("drm/i915/bxt: Disable device ready
before shutdown command").
Disable device ready before shutdown command was added previously to
avoid a split screen issue seen on dual link DSI panels. As of now, dual
link is not supported and will need some rework in the upstream
code. For single link DSI panels, the change is not required. This will
cause failure in sending SHUTDOWN packet during disable. Hence reverting
the change. Will handle the change as part of dual link enabling in
upstream.
Chris Wilson [Tue, 12 Sep 2017 15:07:52 +0000 (16:07 +0100)]
drm/i915: Cleanup error paths through eb_lookup_vma()
Following the simplification to a single lookup loop in commit 20414d4dd5e7 ("drm/i915: Simplify eb_lookup_vmas()") and commit 1b5dde0153b2 ("drm/i915: Replace execbuf vma ht with an idr"), we can go
one step further and reorder the error paths so that the state of the
local variable obj is always known to the compiler and doesn't need the
uninitialized_var markup to squelch a compiler warning.
drm/i915/spt+: Don't reset invalid AUX channel interrupt bits in SDEIMR
The SDE interrupt bits 25, 26 and 27 are either reserved or meant for
DDI E hotplug in SPT+. These bits are meant for AUX channels only in LPT
and CPT, so add the appropriate checks.
Zhi Wang [Tue, 12 Sep 2017 07:42:24 +0000 (15:42 +0800)]
drm/i915: Factor out setup_private_pat()
Factor out setup_private_pat() for introducing the following patches.
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com> Cc: Ben Widawsky <benjamin.widawsky@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Ben Widawsky <benjamin.widawsky@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/1505202148-22959-1-git-send-email-zhi.a.wang@intel.com Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Min brightness value from vbt was missing for BXT platform.
This setting have to refer backlight ic spec to restrict
min backlight output. Without this restriction, driver would
allow to configure lower brightness value and violate
backlight ic requirement.
Marta Lofstedt [Fri, 8 Sep 2017 13:28:29 +0000 (16:28 +0300)]
drm/i915: Increase poll time for BDW FCLK_DONE
During IGT testing it has been shown that the specification
defined polling time of 1 us for FCLK_DONE, is sometimes not
enough. The issue is still reproducible while disabling
C-states through the PM QoS framework and also while disabling
preemtion. From this the most plausible explanation is that the
issue is due to a firmware flaw.
As a workaround, it is better to wait a little bit longer for
the FCLK_DONE to come around, than to leave with an DRM_ERROR
and having FCLK_DONE at a randome time after.
While spinning a list of igt tests prone to reproduce the issue
the FCLK_DONE poll failed at approximately 2% of the invocations
of the bdw_set_cdclk function. The longest poll time during this
testing was measured to ~7us. So, the suggested new poll time of
100us is on the safe side.
v2: Added more documentation about investigations done.
Let's move the activation calls together after enable is done.
No real functional change should be expected here. Just an attempt
to get it clear when we are really activating PSR after enabling it.
v2: Add braces on if/else because commit message there is too long
as suggested by Jani.
v3: Rebased on top of commit b5469f93f649 ("drm/i915: Plumb
crtc_state to PSR enable/disable")
drm/i915/psr: Move hsw_enable_source after enabling sink.
No functional change is expected here since at this point
PSR is not allowed to go to any active state. In other
words, not really enabled.
However let's do in a separated patch so it gets clear
on what is change and specially it can helps on bisect
case if we figure something has caused changes in behaviour.
But this needs to be done before we make the vfunc to
enable source to be in parity with VLV implementation.
This sequence is part of enable source anyways, but they
only need to be executed once and not on every activation,
So let's re-create hsw_enable_source.
v2: Avoid changing order here to avoid changing behaviour
as suggested by Jani.
v3: Rebased on top of commit b5469f93f649 ("drm/i915: Plumb
crtc_state to PSR enable/disable")
On HSW+ the real activate of PSR is decided by the source
after certain amount of configured idle frames.
However for the driver perspective where we track psr.active
variable this function here is the actual activate one. So
let's rename it before moving to vfunc with that.
Chris Wilson [Fri, 8 Sep 2017 18:16:22 +0000 (19:16 +0100)]
drm/i915: Only initialize partially filled pagetables
If we know that we will completely fill a pagetable (i.e. we are
inserting a complete set of 512 pages), we can skip prefilling that PT
with scratch entries. If we have to abort the insertion prior to writing
the real entries, we will teardown the pagetable and remove it from the
page directory (so that we will restart the allocation next time).
We could do similar tricks for the PD and PDP, but the likelihood of a
single insertion covering the entire 512 entries diminishes, as do the
cycle savings. The saving are even greater (relatively) when we are
preallocating page tables for huge pages, as then we never need to fill
the page table.
Changbin Du [Wed, 23 Aug 2017 06:08:10 +0000 (14:08 +0800)]
drm/i915/gvt: Add support for PCIe extended configuration space
IGD is PCIe device and has extended configuration space. Checking
the binary dump, we can see we have Caps located out of PCI compatible
Configuration Space range.
Currently, we only emulate the PCI compatible Configuration Space.
This is okay if we attach vGPU to PCI bus. But when we attach to
a PCI Express bus (when Qemu emulates a Intel Q35 chipset which has
PCIe slot), it will not work. Extended Configuration Space is required
for a PCIe device.
This patch extended the virtual configuration space from 256 bytes
to 4KB bytes. So we are to be a *real* PCIe device. And for the
Extended CapList we keep same to physical GPU.
Cc: Laszlo Ersek <lersek@redhat.com> Tested-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: Changbin Du <changbin.du@intel.com> Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Changbin Du [Fri, 18 Aug 2017 09:49:58 +0000 (17:49 +0800)]
drm/i915/gvt: Fix incorrect PCI BARs reporting
Looking at our virtual PCI device, we can see surprising Region 4 and Region 5.
00:10.0 VGA compatible controller: Intel Corporation Sky Lake Integrated Graphics (rev 06) (prog-if 00 [VGA controller])
....
Region 0: Memory at 140000000 (64-bit, non-prefetchable) [size=16M]
Region 2: Memory at 180000000 (64-bit, prefetchable) [size=1G]
Region 4: Memory at <ignored> (32-bit, non-prefetchable)
Region 5: Memory at <ignored> (32-bit, non-prefetchable)
Expansion ROM at febd6000 [disabled] [size=2K]
The fact is that we only implemented BAR0 and BAR2. Surprising Region 4 and
Region 5 are shown because we report their size as 0xffffffff. They should
report size 0 instead.
BTW, the physical GPU has a PIO BAR. GVTg hasn't implemented PIO access, so
we ignored this BAR for vGPU device.
fred gao [Fri, 18 Aug 2017 07:41:10 +0000 (15:41 +0800)]
drm/i915/gvt: Refine error handling in dispatch_workload
When an error occurs in dispatch_workload, this patch is to do the
proper cleanup and rollback to the original states before the workload
is abandoned.
v2:
- split the mixed several error paths for better review. (Zhenyu)
v3:
- original PTR_ERR(cs) is good and code cleanup. (Zhenyu)
v4:
- reuse the existing i915_add_request for error handling. (Zhenyu)
v5:
- remove the duplicate error handling release_shadow_wa_ctx and
move the engine->context_unpin upper. (Zhenyu)
v6:
- keep the old label "out". (Zhenyu)
Signed-off-by: fred gao <fred.gao@intel.com> Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
fred gao [Fri, 18 Aug 2017 07:41:07 +0000 (15:41 +0800)]
drm/i915/gvt: Add error handling for intel_gvt_scan_and_shadow_workload
When an error occurs after shadow_indirect_ctx, this patch is to do the
proper cleanup and rollback to the original states for shadowed indirect
context before the workload is abandoned.
v2:
- split the mixed several error paths for better review. (Zhenyu)
v3:
- no return check for clean up functions. (Changbin)
v4:
- expose and reuse the existing release_shadow_wa_ctx. (Zhenyu)
v5:
- move the release function to scheduler.c file. (Zhenyu)
v6:
- move error handling code of intel_gvt_scan_and_shadow_workload
to here. (Zhenyu)
Signed-off-by: fred gao <fred.gao@intel.com> Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
fred gao [Fri, 18 Aug 2017 07:41:06 +0000 (15:41 +0800)]
drm/i915/gvt: Separate cmd scan from request allocation
Currently i915 request structure and shadow ring buffer are allocated
before command scan, so it will have to restore to previous states once
any error happens afterwards in the long dispatch_workload path.
This patch is to introduce a reserved ring buffer created at the beginning
of vGPU initialization. Workload will be coped to this reserved buffer and
be scanned first, the i915 request and shadow ring buffer are only
allocated after the result of scan is successful.
To balance the memory usage and buffer alloc time, the coming bigger ring
buffer will be reallocated and kept until more bigger buffer is coming.
v2:
- use kmalloc for the smaller ring buffer, realloc if required. (Zhenyu)
v3:
- remove the dynamically allocated ring buffer. (Zhenyu)
Changbin Du [Tue, 15 Aug 2017 05:14:04 +0000 (13:14 +0800)]
drm/i915/gvt: Add emulation for BAR2 (aperture) with normal file RW approach
For vfio-pci, if the region support MMAP then it should support both
mmap and normal file access. The user-space is free to choose which is
being used. For qemu, we just need add 'x-no-mmap=on' for vfio-pci
option.
Currently GVTg only support MMAP for BAR2. So GVTg will not work when
user turn on x-no-mmap option.
This patch added file style access for BAR2, aka the GPU aperture. We
map the entire aperture partition of active vGPU to kernel space when
guest driver try to enable PCI Memory Space. Then we redirect the file
RW operation from kvmgt to this mapped area.
Changbin Du [Tue, 15 Aug 2017 05:20:51 +0000 (13:20 +0800)]
drm/i915/kvmgt: Sanitize PCI bar emulation
For PCI, 64bit bar consumes two BAR registers, but this doesn't mean
both of two BAR are valid. Actually the second BAR is regarded as
reserved in this case. So we shouldn't emulate the second BAR.
Signed-off-by: Changbin Du <changbin.du@intel.com> Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Oscar Mateo [Thu, 7 Sep 2017 15:40:09 +0000 (08:40 -0700)]
drm/i915: Transform WaDisablePooledEuLoadBalancingFix into a simple register write
FF_SLICE_CS_CHICKEN2 does not belong to the context image.
Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/1504798809-5653-6-git-send-email-oscar.mateo@intel.com Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Oscar Mateo [Thu, 7 Sep 2017 15:40:08 +0000 (08:40 -0700)]
drm/i915: Transform WaDisableDynamicCreditSharing into a simple register write
GAMT_CHKN_BIT_REG does not live in the context image.
Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/1504798809-5653-5-git-send-email-oscar.mateo@intel.com Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Oscar Mateo [Thu, 7 Sep 2017 15:40:07 +0000 (08:40 -0700)]
drm/i915: Transform WaDisableGafsUnitClkGating into a simple reg write
GEN7_UCGCTL4 does not live in the context.
v2: Missing parenthesis
Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/1504798809-5653-4-git-send-email-oscar.mateo@intel.com Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Oscar Mateo [Thu, 7 Sep 2017 15:40:06 +0000 (08:40 -0700)]
drm/i915: WaPushConstantDereferenceHoldDisable needs to modify a masked register
So do it correctly.
Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/1504798809-5653-3-git-send-email-oscar.mateo@intel.com Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Oscar Mateo [Thu, 7 Sep 2017 15:40:05 +0000 (08:40 -0700)]
drm/i915: Transform WaDisableI2mCycleOnWRPort into a simple reg write
GAMT_CHKN_BIT_REG does not live in the context.
Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/1504798809-5653-2-git-send-email-oscar.mateo@intel.com Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Oscar Mateo [Thu, 7 Sep 2017 15:40:04 +0000 (08:40 -0700)]
drm/i915: Transform WaInPlaceDecompressionHang into a simple reg write
Afaict, GEN9_GAMT_ECO_REG_RW_IA does not live in the context, so writing
it on every context creation is overkill (and wrong).
v2: Missing end parenthesis
Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/1504798809-5653-1-git-send-email-oscar.mateo@intel.com Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Chris Wilson [Thu, 7 Sep 2017 18:45:20 +0000 (19:45 +0100)]
drm/i915: Apply the GTT write flush for all !llc machines
We also see the delayed GTT write issue on i915g/i915gm, so let's
presume that it is a universal problem for all !llc machines, and that we
just haven't yet noticed on g33, gen4 and gen5 machines.
v2: Use a register that exists on all platforms
Testcase: igt/gem_mmap_gtt/coherency # i915gm
References: https://bugs.freedesktop.org/show_bug.cgi?id=102577 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170907184520.5032-1-chris@chris-wilson.co.uk Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Ville Syrjälä [Thu, 7 Sep 2017 14:32:03 +0000 (17:32 +0300)]
i915: Fix obj size vs. alignment for drm_pci_alloc()
drm_pci_alloc() refuses to cooperate if the passed alignment exceeds the
object size. So round up the obj size to the next power of two as well
to make this actually work.
Obviously things work just fine as long as the size was a power of two
to begin with. However kms_cursor_crc doesn't always use power of two
sizes so we hit a failure when we try to allocate the phys memory.
Chris Wilson [Thu, 7 Sep 2017 13:44:41 +0000 (14:44 +0100)]
drm/i915: Disable mmio debugging during user access
If the user bypasses i915 and accesses mmio directly, that easily
confuses our automatic mmio debugging (any error we then detect is
likely to be as a result of the user). Since we expect userspace to open
debugfs/i915_forcewake_user if i915.ko is loaded and they want mmio
access, that makes the opportune time to disable our debugging for
duration of the bypass.
v2: Move the fiddling of uncore internals to uncore.c
Kumar, Mahesh [Thu, 17 Aug 2017 13:45:28 +0000 (19:15 +0530)]
drm/i915/bxt+: Enable IPC support
This patch adds IPC support. This patch also enables IPC in all supported
platforms based on has_ipc flag.
IPC (Isochronous Priority Control) is the hardware feature, which
dynamically controls the memory read priority of Display.
When IPC is enabled, plane read requests are sent at high priority until
filling above the transition watermark, then the requests are sent at
lower priority until dropping below the level 0 watermark.
The lower priority requests allow other memory clients to have better
memory access. When IPC is disabled, all plane read requests are sent at
high priority.
Changes since V1:
- Remove commandline parameter to disable ipc
- Address Paulo's comments
Changes since V2:
- Address review comments
- Set ipc_enabled flag
Changes since V3:
- move ipc_enabled flag assignment inside intel_ipc_enable function
Changes since V4:
- Re-enable IPC after suspend/resume
Changes since V5:
- Enable IPC for all gen >=9 except SKL
Changes since V6:
- fix commit msg
- after resume program IPC based on SW state.
Changes since V7:
- Modify IPC support check based on HAS_IPC macro (suggested by Chris)
Mahesh Kumar [Thu, 17 Aug 2017 13:45:27 +0000 (19:15 +0530)]
drm/i915/gen9+: Add has_ipc flag in device info structure
New Isochronous Priority Control (IPC) capability is introduced in newer
GEN platforms. This patch adds a device info flag to indicate if platform
supports IPC. Patch also sets this flag in supported platforms.
Kumar, Mahesh [Thu, 17 Aug 2017 13:45:24 +0000 (19:15 +0530)]
drm/i915/gen10: Calculate and enable transition WM
GEN > 9 require transition WM to be programmed if IPC is enabled.
This patch calculates & enable transition WM for supported platforms.
If transition WM is enabled, Plane read requests are sent at high
priority until filling above the transition watermark, then the
requests are sent at lower priority until dropping below the level-0 WM.
The lower priority requests allow other memory clients to have better
memory access.
transition minimum is the minimum amount needed for trans_wm to work to
ensure the demote does not happen before enough data has been read to
meet the level 0 watermark requirements.
transition amount is configurable value. Higher values will
tend to cause longer periods of high priority reads followed by longer
periods of lower priority reads. Tuning to lower values will tend to
cause shorter periods of high and lower priority reads.
Keeping transition amount to 10 in this patch, as suggested by HW team.
Changes since V1:
- Address review comments from Maarten
Kumar, Mahesh [Thu, 17 Aug 2017 13:45:23 +0000 (19:15 +0530)]
drm/i915/skl+: Optimize WM calculation
Plane configuration parameters doesn't change for each WM-level
calculation. Currently we compute same parameters 8 times for each
wm-level.
This patch optimizes it by calculating these parameters in beginning
& reuse during each level-wm calculation.
Changes since V1:
- rebase on top of Rodrigo's series for CNL
Kumar, Mahesh [Thu, 17 Aug 2017 13:45:22 +0000 (19:15 +0530)]
drm/i915: Fixed point fixed16 wrapper cleanup
As per suggestion from Jani, cleanup the code. Cleanup includes
- Instead of left shifting & check, compare with U32/16_MAX
- Use typecast instead of clamp_t
tools/testing/scatterlist: Test new __sg_alloc_table_from_pages
Exercise the new __sg_alloc_table_from_pages API (and through
it also the old sg_alloc_table_from_pages), checking that the
created table has the expected number of segments depending on
the sequence of input pages and other conditions.
v2: Move to data driven for readability.
v3: Add some more testcases and -fsanitize=undefined. (Chris Wilson)
Tvrtko Ursulin [Thu, 3 Aug 2017 09:14:17 +0000 (10:14 +0100)]
drm/i915: Use __sg_alloc_table_from_pages for userptr allocations
With the addition of __sg_alloc_table_from_pages we can control
the maximum coalescing size and eliminate a separate path for
allocating backing store here.
Similar to 3da9aad80994 ("drm/i915: Allow compaction upto
SWIOTLB max segment size") this enables more compact sg lists to
be created and so has a beneficial effect on workloads with many
and/or large objects of this class.
v2:
* Rename helper to i915_sg_segment_size and fix swiotlb override.
* Commit message update.
v3:
* Actually include the swiotlb override fix.
v4:
* Regroup parameters a bit. (Chris Wilson)
v5:
* Rebase for swiotlb_max_segment.
* Add DMA map failure handling as in a662a0d50851
("drm/i915: Fallback to single PAGE_SIZE segments for DMA remapping").
Since the scatterlist length field is an unsigned int, make
sure that sg_alloc_table_from_pages does not overflow it while
coalescing pages to a single entry.
v2: Drop reference to future use. Use UINT_MAX.
v3: max_segment must be page aligned.
v4: Do not rely on compiler to optimise out the rounddown.
(Joonas Lahtinen)
v5: Simplified loops and use post-increments rather than
pre-increments. Use PAGE_MASK and fix comment typo.
(Andy Shevchenko)
v6: Commit spelling fix.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: linux-kernel@vger.kernel.org Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170803091312.22875-1-tvrtko.ursulin@linux.intel.com
Chris Wilson [Wed, 6 Sep 2017 19:24:24 +0000 (20:24 +0100)]
drm/i915: Disable snooping (userptr, set-cache-level) on gen4
The original gen4 has an issue where writes (both render and blt) into
snoopable pages are lost. We've previously worked around this in
userspace (ddx, igt) by simply not requesting snoopable buffers, but upon
rediscovering this problem for a third time, make the kernel reject such
requests with -ENODEV.
This disables snooping on userspace buffers for i965g and i965gm (original
gen4) machines.
Chris Wilson [Wed, 6 Sep 2017 13:52:20 +0000 (14:52 +0100)]
drm/i915: Lift has-pinned-pages assert to caller of ____i915_gem_object_get_pages
i915_gem_object_attach_phys() is trying to swap out its shmemfs pages
for a new set of physically contiguous pages, but unfortunately triggers
an assert inside get-pages.
drm/i915: Display WA #1133 WaFbcSkipSegments:cnl, glk
Skip compressing 1 segment at the end of the frame,
avoid a pixel count mismatch nuke event when last active
pixel and dummy pixel has same color for Odd Plane
Width / Height.
For both platforms Gemini Lake and Cannon Lake.
v2: Use function-like macro and also use mask to clean
to make sure bit 11 is 0. (Suggested by Paulo).
v3: Add Display WA notation and also apply for GLK.
Both Forgotten on v2.
Using "GLK_" prefix since GLK came before CNL.
v4: Forgot to "|=" when moving directly macro to masked
val. (Noticed by Paulo.)
v5: Rebased on top of 16f611684940 ("drm/i915/cnp: Wa 1181:
Fix Backlight issue")
Chris Wilson [Wed, 6 Sep 2017 10:56:53 +0000 (11:56 +0100)]
drm/i915: Move device_info.has_snoop into the static tables
Currently we define any !llc machine as using snoop instead. However,
some platforms run into trouble using snoop that we would like to
disable, and to do so easily we want to be able to use the static
device_info tables.
v2: Leave the old snoop = !llc as a warning for the time being to check
that all stanzas are filled as either llc or snoop.
Chris Wilson [Wed, 6 Sep 2017 15:28:59 +0000 (16:28 +0100)]
drm/i915: Disable MI_STORE_DATA_IMM for i915g/i915gm
The early gen3 machines (i915g/Grantsdale and i915gm/Alviso) share a lot
of characteristics in their MI/GTT blocks with gen2, and in particular
can only use physical addresses in MI_STORE_DATA_IMM. This makes it
incompatible with our usage, so include those two machines in the
blacklist to prevent usage.
v2: Make it easy for gcc and rewrite it as a switch to save some space.
Chris Wilson [Wed, 6 Sep 2017 11:14:05 +0000 (12:14 +0100)]
drm/i915: Re-enable GTT following a device reset
Ville Syrjälä spotted that PGETBL_CTL was losing its enable bit upon a
reset. That was causing the display to show garbage on his 945gm. On my
i915gm the effect was far more severe; re-enabling the display following
the reset without PGETBL_CTL being enabled lead to an immediate hard
hang.
We do have a routine to re-enable PGETBL_CTL which is applicable to
gen2-4, although on gen4 it is documented that a graphics reset doesn't
alter the register (no such wording is given for gen3) and should be safe
to call to punch back in the enable bit. However, that leaves the question
of whether we need to completely re-initialise the register and the
rest of the GSM. For g33/pnv/gen4+, where we do have a configurable
page table, its contents do seem to be kept, and so we should be able to
recover without having to reinitialise the GTT from scratch (as prior to
g33, that register is configured by the BIOS and we leave alone except
for the enable bit).
This appears to have been broken by commit 89763657f883 ("drm/i915:
Re-enable GGTT earlier during resume on pre-gen6 platforms"), which
moved the intel_enable_gtt() from i915_gem_init_hw() (also used by
reset) to add it earlier during hw init and resume, missing the reset
path.
v2: Find the culprit, rearrange ggtt_enable to be before gem_init_hw to
match init/resume
Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Fixes: 89763657f883 ("drm/i915: Re-enable GGTT earlier during resume on pre-gen6 platforms")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101852 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Daniel Vetter <daniel@ffwll.ch> Reviewed-by: Daniel Vetter <daniel@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20170906111405.27110-1-chris@chris-wilson.co.uk Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Ville Syrjälä [Fri, 1 Sep 2017 16:54:34 +0000 (19:54 +0300)]
drm/i915: Annotate user relocs with __user
Add the missing __user to the urelocs cast to fix the following sparse
warning:
i915_gem_execbuffer.c:1541:47: warning: cast removes address space of expression
i915_gem_execbuffer.c:1541:62: warning: incorrect type in argument 2 (different address spaces)
i915_gem_execbuffer.c:1541:62: expected void const [noderef] <asn:1>*from
i915_gem_execbuffer.c:1541:62: got char *
Ville Syrjälä [Fri, 1 Sep 2017 17:12:52 +0000 (20:12 +0300)]
drm/i915: io unmap functions want __iomem
Don't cast away the __iomem from the io_mapping functions so that
sparse won't be so unhappy when we pass the pointer to the unmap
functions. Instead let's move the cast to where we actually use the
pointer.
Fixes the following sparse warnings:
i915_gem.c:1022:33: warning: incorrect type in argument 1 (different address spaces)
i915_gem.c:1022:33: expected void [noderef] <asn:2>*vaddr
i915_gem.c:1022:33: got void *[assigned] vaddr
i915_gem.c:1027:34: warning: incorrect type in argument 1 (different address spaces)
i915_gem.c:1027:34: expected void [noderef] <asn:2>*vaddr
i915_gem.c:1027:34: got void *[assigned] vaddr
i915_gem.c:1199:33: warning: incorrect type in argument 1 (different address spaces)
i915_gem.c:1199:33: expected void [noderef] <asn:2>*vaddr
i915_gem.c:1199:33: got void *[assigned] vaddr
i915_gem.c:1204:34: warning: incorrect type in argument 1 (different address spaces)
i915_gem.c:1204:34: expected void [noderef] <asn:2>*vaddr
i915_gem.c:1204:34: got void *[assigned] vaddr
Ville Syrjälä [Fri, 1 Sep 2017 19:54:56 +0000 (22:54 +0300)]
drm/i915: Wake up the device for the fbdev setup
Our fbdev setup requires the device to be awake for access
through the GTT. If one boots without connected displays and
later plugs one in, we won't have any runtime PM references when
the fbdev setup runs. Explicitly grab a runtime PM reference during
the fbdev setup to avoid the following spew:
Changbin Du [Mon, 4 Sep 2017 08:01:01 +0000 (16:01 +0800)]
drm/i915: Add interface to reserve fence registers for vGPU
In the past, vGPU alloc fence registers by walking through mm.fence_list
to find fence which pin_count = 0 and vma is empty. vGPU may not find
enough fence registers this way. Because a fence can be bind to vma even
though it is not in using. We have found such failure many times these
days.
An option to resolve this issue is that we can force-remove fence from
vma in this case.
This patch added two new api to the fence management code:
- i915_reserve_fence() will try to find a free fence from fence_list
and force-remove vma if need.
- i915_unreserve_fence() reclaim a reserved fence after vGPU has
finished.
With this change, the fence management is more clear to work with vGPU.
GVTg do not need remove fence from fence_list in private.
v3: (Chris)
- Add struct_mutex lock assertion.
- Only count for unpinned fence.
v2: (Chris)
- Rename the new api for symmetry.
- Add safeguard to ensure at least 1 fence remained for host display.
The header comment in include/trace/define_trace.h specifies that the
TRACE_INCLUDE_PATH needs to be relative to the define_trace.h header
rather than the trace file including it. Most instances get that wrong
and work around it by adding the $(src) directory to the include path.
While this works, it is preferable to refer to the correct path to the
trace file in the first place and avoid any workaround.
Ville Syrjälä [Fri, 1 Sep 2017 14:31:23 +0000 (17:31 +0300)]
drm/i915: Fix enum pipe vs. enum transcoder for the PCH transcoder
Use enum pipe for PCH transcoders also in the FIFO underrun code.
Fixes the following new sparse warnings:
intel_fifo_underrun.c:340:49: warning: mixing different enum types
intel_fifo_underrun.c:340:49: int enum pipe versus
intel_fifo_underrun.c:340:49: int enum transcoder
intel_fifo_underrun.c:344:49: warning: mixing different enum types
intel_fifo_underrun.c:344:49: int enum pipe versus
intel_fifo_underrun.c:344:49: int enum transcoder
intel_fifo_underrun.c:397:57: warning: mixing different enum types
intel_fifo_underrun.c:397:57: int enum pipe versus
intel_fifo_underrun.c:397:57: int enum transcoder
intel_fifo_underrun.c:398:17: warning: mixing different enum types
intel_fifo_underrun.c:398:17: int enum pipe versus
intel_fifo_underrun.c:398:17: int enum transcoder
Ville Syrjälä [Fri, 1 Sep 2017 14:31:22 +0000 (17:31 +0300)]
drm/i915: Make i2c lock ops static
Make gmbus_lock_ops and proxy_lock_ops static to appease sparse
intel_i2c.c:652:34: warning: symbol 'gmbus_lock_ops' was not declared. Should it be static?
intel_sdvo.c:2981:34: warning: symbol 'proxy_lock_ops' was not declared. Should it be static?
Ville Syrjälä [Fri, 1 Sep 2017 14:31:21 +0000 (17:31 +0300)]
drm/i915: Make i9xx_load_ycbcr_conversion_matrix() static
Make i9xx_load_ycbcr_conversion_matrix() static to appease sparse:
intel_color.c:110:6: warning: symbol 'i9xx_load_ycbcr_conversion_matrix' was not declared. Should it be static?
Ville Syrjälä [Wed, 23 Aug 2017 15:22:23 +0000 (18:22 +0300)]
drm/i915: Pass proper old/new states to intel_plane_atomic_check_with_state()
Eliminate plane->state and crtc->state usage from
intel_plane_atomic_check_with_state() and its callers. Instead pass the
proper states in or dig them up from the top level atomic state.
Note that intel_plane_atomic_check_with_state() itself isn't allowed to
use the top level atomic state as there is none when it gets called from
the legacy cursor short circuit path.
v2: Rename some variables for easier comprehension (Maarten)