Sometimes Icelake forgets to reset the CSB pointers on a GPU reset,
leading to it carry on updating the old tail of the buffer.
<0>[ 618.138490] i915_sel-5636 3d..1 673425465us : trace_ports: vecs0: submit { 14de2:504, 0:0 }
<0>[ 618.138490] i915_sel-5636 3.... 673425493us : intel_engine_reset: vecs0 flags=100
<0>[ 618.138490] i915_sel-5636 3.... 673425493us : execlists_reset_prepare: vecs0: depth<-0
<0>[ 618.138490] i915_sel-5636 3.... 673425493us : intel_engine_stop_cs: vecs0
<0>[ 618.138490] i915_sel-5636 3.... 673425523us : __intel_gt_reset: engine_mask=40
<0>[ 618.138490] i915_sel-5636 3.... 673425568us : execlists_reset: vecs0
<0>[ 618.138490] i915_sel-5636 3d..1 673425568us : process_csb: vecs0 cs-irq head=1, tail=2
<0>[ 618.138490] i915_sel-5636 3d..1 673425568us : process_csb: vecs0 csb[2]: status=0x00000001:0x40000000
<0>[ 618.138490] i915_sel-5636 3d..1 673425569us : trace_ports: vecs0: promote { 14de2:504*, 0:0 }
<0>[ 618.138490] i915_sel-5636 3d..1 673425570us : __i915_request_reset: vecs0 rq=14de2:504, guilty? yes
<0>[ 618.138490] i915_sel-5636 3d..1 673425571us : __execlists_reset: vecs0 replay {head:2de0, tail:2e48}
<0>[ 618.138490] i915_sel-5636 3d..1 673425572us : __i915_request_unsubmit: vecs0 fence 14de2:504, current 503
<0>[ 618.138490] i915_sel-5636 3.... 673435544us : intel_engine_cancel_stop_cs: vecs0
<0>[ 618.138490] i915_sel-5636 3.... 673435544us : process_csb: vecs0 cs-irq head=11, tail=11
<0>[ 618.138490] i915_sel-5636 3d..1 673435545us : __i915_request_submit: vecs0 fence 14de2:504, current 503
<0>[ 618.138490] i915_sel-5636 3d..1 673435546us : __execlists_submission_tasklet: vecs0: queue_priority_hint:-
2147483648, submit:yes
<0>[ 618.138490] i915_sel-5636 3d..1 673435548us : trace_ports: vecs0: submit { 14de2:504*, 0:0 }
<0>[ 618.138490] i915_sel-5636 3.... 673435549us : execlists_reset_finish: vecs0: depth->0
<0>[ 618.138490] ksoftirq-21 2..s. 673435592us : process_csb: vecs0 cs-irq head=11, tail=3
<0>[ 618.138490] ksoftirq-21 2..s. 673435593us : process_csb: vecs0 csb[0]: status=0x00000001:0x40000000
<0>[ 618.138490] ksoftirq-21 2..s. 673435594us : trace_ports: vecs0: promote { 14de2:504*, 0:0 }
<0>[ 618.138490] ksoftirq-21 2..s. 673435596us : process_csb: vecs0 csb[1]: status=0x00000018:0x40000040
<0>[ 618.138490] ksoftirq-21 2..s. 673435597us : trace_ports: vecs0: completed { 14de2:504*, 0:0 }
<0>[ 618.138490] ksoftirq-21 2..s. 673435612us : process_csb: process_csb:2188 GEM_BUG_ON(!i915_request_completed(*execlists->active) && !reset_in_progress(execlists))
After the reset, we do another clflush before checking the CSB to be
sure we see whatever was left in the CSB prior to the reset. So it is
unlikely to be an incoherent view of the CSB, and more likely that
Icelake didn't reset its pointers.
References:
582a6f90aa0d ("drm/i915/execlists: Add a paranoid flush of the CSB pointers upon reset")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191104135307.21083-1-chris@chris-wilson.co.uk