git.baikalelectronics.ru Git

drm/i915: Emphasize that ctx->id is merely a user handle

This is an Execlists preparatory patch, since they make context ID become an
overloaded term:

- In the software, it was used to distinguish which context userspace was
  trying to use.
- In the BSpec, the term is used to describe the 20-bits long field the
  hardware uses to it to discriminate the contexts that are submitted to
  the ELSP and inform the driver about their current status (via Context
  Switch Interrupts and Context Status Buffers).

Initially, I tried to make the different meanings converge, but it proved
impossible:

- The software ctx->id is per-filp, while the hardware one needs to be
  globally unique.
- Also, we multiplex several backing states objects per intel_context,
  and all of them need unique HW IDs.
- I tried adding a per-filp ID and then composing the HW context ID as:
  ctx->id + file_priv->id + ring->id, but the fact that the hardware only
  uses 20-bits means we have to artificially limit the number of filps or
  contexts the userspace can create.

The ctx->user_handle renaming bits are done with this Cocci patch (plus
manual frobbing of the struct declaration):

    @@
    struct intel_context c;
    @@
    - (c).id
    + c.user_handle

    @@
    struct intel_context *c;
    @@
    - (c)->id
    + c->user_handle

Also, while we are at it, s/DEFAULT_CONTEXT_ID/DEFAULT_CONTEXT_HANDLE and
change the type to unsigned 32 bits.

v2: s/handle/user_handle and change the type to uint32_t as suggested by
Chris Wilson.

Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> (v1)
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Emphasize that ctx->obj & ctx->is_initialized refer to the legacy rcs ctx

We have already advanced that Logical Ring Contexts have their own kind
of backing objects, but everything will be better explained in the Execlists
series. For now, suffice it to say that the current backing object is only
ever used with the render ring, so we're making this fact more explicit
(which is a good reason on its own).

As for the is_initialized flag, we only use to signify that the render state
has been initialized (a.k.a. golden context, a.k.a. null context). It doesn't
mean anything for the other engines, so make that distinction obvious.

Done with the following Coccinelle patch (plus manual frobbing of the struct):

    @@
    struct intel_context c;
    @@
    - (c).obj
    + c.legacy_hw_ctx.rcs_state

    @@
    struct intel_context *c;
    @@
    - (c)->obj
    + c->legacy_hw_ctx.rcs_state

    @@
    struct intel_context c;
    @@
    - (c).is_initialized
    + c.legacy_hw_ctx.initialized

    @@
    struct intel_context *c;
    @@
    - (c)->is_initialized
    + c->legacy_hw_ctx.initialized

This Execlists prep-work patch has been suggested by Chris Wilson and Daniel
Vetter separately.

Initially, it was two separate patches:
drm/i915: Rename ctx->obj to ctx->rcs_state
drm/i915: Make it obvious that ctx->id is merely a user handle

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: s/id/is_initialized/ to fix the subject and resolve a
conflict in i915_gem_context_reset. Also introduce a new lctx local
variable to avoid overtly long lines.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Extract context backing object allocation

This is preparatory work for Execlists: we plan to use it later to
allocate our own context objects (since Logical Ring Contexts do
not have the same kind of backing objects).

No functional changes.

Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: make system freeze support depend on CONFIG_ACPI_SLEEP

To achieve further power savings during system freeze (aka connected
standby, or s0ix) we have to send a PCI_D1 opregion notification. As
the information about the state we're entering (system freeze,
suspend to ram or suspend to disk) is only available through the ACPI
subsystem, make this support depend on the relevant kconfig option.
Things will still work if this option isn't set, albeit with less than
optimial power saving.

This also fixes a compile breakage when the option is not set introduced
in

commit 324691690956f9df27a3b34fbf2ea4ca572f548f
Author: Jesse Barnes <jbarnes@virtuousgeek.org>
Date: Thu Jun 12 08:35:47 2014 -0700

drm/i915: send proper opregion notifications on suspend/resume

Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Restrict GPU boost to the RCS engine

Make the assumption that media workloads are not as latency sensitive
for __wait_seqno, and that upclocking the GPU does not affect the BLT
engine. Under that assumption, we only wait to forcibly upclock the GPU
when we are stalling for results from the render pipeline.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Deepak S<deepak.s@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: correct BLC vs PWM enable/disable ordering

With the new checks in place, we can see we're doing things backwards,
so fix them up per the spec.

Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Enable semaphores on BDW

Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915/bdw: poll semaphores

As Ville points out, it's possible/probable we don't actually need this.
Potentially, this validates the letter of the spec, and not the spirit.

Ville:
> I discussed this on irc w/ Ben, and I was suggesting we don't need to
> poll. Polling apparently can be used as a workaround for certain
> hardware issues, but it looks like those issues shouldn't affect us,
> for the momemnt at least. So my suggestion was to try w/o polling
> first (since there could be some power cost to polling) and add the
> poll bit if problems arise.

Rodrigo: Spec suggests this as an W/A for GT3. However semaphores didn't
worked in my BDW GT2 on Signal Mode. So pool mode is definitely needed.

Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Tested-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: semaphore debugfs

Simple debugfs file to display the current state of semaphores. This is
useful if you want to see the state without hanging the GPU.

NOTE: This patch is optional to the series.

NOTE2: Like the GPU error state collection, the reads are currently
incoherent.

v2 (Rodrigo): * Iterate only on active rings.
* s/ring_buffer/engine_cs.

Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915/bdw: collect semaphore error state

Since the semaphore information is in an object, just dump it, and let
the user parse it later.

NOTE: The page being used for the semaphores are incoherent with the
CPU. No matter what I do, I cannot figure out a way to read anything but
0s. Note that the semaphore waits are indeed working.

v2: Don't print signal, and wait (they should be the same). Instead,
print sync_seqno (Chris)

v3: Free the semaphore error object (Chris)

v4: Fix semaphore offset calculation during error state collection
(Ville)

v5: VCS2 rebase
Make semaphore object error capture coding style consistent (Ville)
Do the proper math for the signal offset (Ville)

v6: Fix small conflicts on rebase and s/ring_buffer/engine_cs (Rodrigo)

Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Extract semaphore error collection

v2: s/ring_buffer/engine_cs (Rodrigo)

Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Implement MI decode for gen8

Ipehr just carries Dword 0 and on Gen 8, offsets are located
on Dword 2 and 3 of MI_SEMAPHORE_WAIT.

This implementation was based on Ben's work and on Ville's suggestion for Ben

Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Fixup format string.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915/bdw: implement semaphore wait

Semaphore waits use a new instruction, MI_SEMAPHORE_WAIT. The seqno to
wait on is all well defined by the table in the previous patch. There is
nothing else different from previous GEN's semaphore synchronization
code.

v2: Update macros to not require the other ring's ring->id (Chris)

v3: Add missing VCS2 gen8_ring_wait init besides
s/ring_buffer/engine_cs (Rodrigo)

Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915/bdw: implement semaphore signal

Semaphore signalling works similarly to previous GENs with the exception
that the per ring mailboxes no longer exist. Instead you must define
your own space, somewhere in the GTT.

The comments in the code define the layout I've opted for, which should
be fairly future proof. Ie. I tried to define offsets in abstract terms
(NUM_RINGS, seqno size, etc).

NOTE: If one wanted to move this to the HWSP they could. I've decided
one 4k object would be easier to deal with, and provide potential wins
with cache locality, but that's all speculative.

v2: Update the macro to not need the other ring's ring->id (Chris)
Update the comment to use the correct formula (Chris)

v3: Move the macros the ringbuffer.h to prevent churn in next patch
(Ville)

v4: Fixed compilation rebase conflict
commit 82a12dc89e3b1a1151c0012158ab44d6cda2dfdf
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Fri Feb 14 14:01:11 2014 +0100

    drm/i915: Consolidate binding parameters into flags

v5: VCS2 rebase
Replace hweight_long with hweight32

v6 (Rodrigo): * Add missed VC2 gen8 ring signal init
          * fixing conflicst on rebase
           * minor fixes on address table
      * remove WARN_ON

Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
[danvet: s/BUG_ON/WARN_ON/]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Make semaphore updates more precise

With the ring mask we now have an easy way to know the number of rings
in the system, and therefore can accurately predict the number of dwords
to emit for semaphore signalling. This was not possible (easily)
previously.

There should be no functional impact, simply fewer instructions emitted.

While we're here, simply do the round up to 2 instead of the fancier
rounding we did before, which rounding up per mbox, ie 4. This also
allows us to drop the unnecessary MI_NOOP, so not really 4, 3.

v2: Use 3 dwords instead of 4 (Ville)
Do the proper calculation to get the number of dwords to emit (Ville)
Conditionally set .sync_to when semaphores are enabled (Ville)

v3: Rebased on VCS2
Replace hweight_long with hweight32 (Ville)

v4: Pull out the accidentally squashed hunk from the next patch after
rebase (Daniel).

v5: Fix conflict after rebase (Rodrigo)

Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> (v1)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: gen specific ring init

Gen8 has already had some differentiation with how it handles rings.
Semaphores bring yet more differences, and now is as good a time as any
to do the split.

Also, since gen8 doesn't actually use semaphores up until this point,
put the proper "NULL" values in for the mbox info.

v2: v1 had a stale commit message

v3: Move everything in the is_semaphore_enabled() check

v4: VCS2 rebase
Remove double assignment of signal in render ring (Ville)

v5: Adding missed VCS2 signal init on gen8+ (Rodrigo)

Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Updating comments.

ring index calculation table was out of date after other rings were added,
although the formula is flexible and scale when adding new rings.

So this patch just update the comments and add a brief explanation
why to use sync_seqno[ring index].

Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Fix VCS2's ring name.

It just fix a typo.

v2: removing underscore to let this like all other ring names (Oscar)

Cc: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by (v1): Ben Widawsky <benjamin.widawsky@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Corrected 'file_priv' to 'file' in 'i915_driver_preclose()'

The 'i915_driver_preclose()' function has a parameter called 'file_priv'.
However, this is misleading as the structure it points to is a 'drm_file' not a
'drm_i915_file_private'. It should be named just 'file' to avoid confusion.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Support pf CRC source on haswell transcoder edp

The always-on power well pixel path on haswell is routed such that it
bypasses the panel fitter when we use is. Which means the pfit CRC
source won't work in that configuration.

Add a new disallow-bypass flags to the pfit pipe config state and set
it when we want to use the pf CRC. Results in a bit of flicker, but
should get the job done. We'll also undo do it afterwards to make sure
other tests arent' negatively affected.

Totally untested due to lack of hsw laptops around here.

v2: s/disallow_bypass/force_power_well_on/ to avoid a double negative
(Damien).

v3: force_thru because roadsigns.

v4: Don't forget the power wells! Also note that until the runtime pm
for DPMS series is fully merged the simple disable/enable trick won't
work since the ->crtc_mode_set callback is still required to do nasty
things. This stuff is tricky, but I think by both fixing up
get_crtc_power_domains and the debugfs wa code we should always
grab/drop the additional power well correctly.

v5: Wrap in () as suggested by Damien to avoid setting reserved values
for the edp transcoder path on bdw+

References: https://bugs.freedesktop.org/show_bug.cgi?id=72864
Cc: Damien Lespiau <damien.lespiau@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Tested-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915/bdw: 3D_CHICKEN3 has write mask bits

The workaround to limit SDE poly depth FIFO to 2 is not applied because
3D Chicken-3 mask bit is not set.

WaLimitSizeOfSDEPolyFifo is only for BDW-A and could be removed.

Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

DRM/i915: Remove magic to prevent blank screen on gen4 chipsets

Since the root cause is understood now and with the fix

   commit 479ecefe4f5bd8db735ad87ba10eaff852d5952f
   Author: Imre Deak <imre.deak@intel.com>
   Date:   Fri Jun 13 14:54:21 2014 +0300

       drm/i915: gmch: fix stuck primary plane due to memory self-refresh mode

in place the magic for G4x chipsets introduced with commit

   commit 45af59c702d3bd10e08c0f0bfb6ef95d7db596b0
   Author: Egbert Eich <eich@suse.com>
   Date:   Mon Mar 4 09:24:38 2013 -0500

       DRM/i915: On G45 enable cursor plane briefly after enabling the display plane.

to avoided occasional screen blanking on mode changes can finally
be removed.
It's been verified that Imre's fix also resolves the said issue.

Signed-off-by: Egbert Eich <eich@suse.de>
Tested-by: Stefan Dirsch <sndirsch@suse.de>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Don't pretend ips is always enabled on BDW.

As pointed out before we don't have a reliable way to read back ips
status on BDW without the risk to disable it when reading.
However now we are pretending that IPS on BDW is always on and getting
people confused about it.

So this patch allows people to know if ips was ever attempted to be enabled.
Even if the current status is impossible to be ascertain.

v2: (spotted by Paulo):
     * A version that at least compiles
     * with more clear messages
     * let Cheryview on the safe side until we aren't sure that checking ips
       state on ips won't disable it.

Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Unpin last_context at reset

We're forgetting to unpin the last_context from the ggtt at GPU reset
time. This leads to the vma pin_count leaking at every reset if the
last context wasn't the ring default context. Further use of the same
context will trigger the pin_count check in i915_gem_object_pin() and
userspace will be faced with EBUSY as a result.

This plaques kms_flip rather badly since it performs lots of resets,
and every fd has its own default context these days.

Fix the problem by properly unpinning the last context at reset.

This regression seems to back to

commit 7727eaf9a70b86f64f68ccb97517b375b4b42155
Author: Ben Widawsky <ben@bwidawsk.net>
Date: Fri Dec 6 14:11:03 2013 -0800

drm/i915: Better reset handling for contexts

Testcase: igt/gem_ctx_exec/reset-pin-leak
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: rework digital port IRQ handling (v2)

The digital ports from Ironlake and up have the ability to distinguish
between long and short HPD pulses. Displayport 1.1 only uses the short
form to request link retraining usually, so we haven't really needed
support for it until now.

However with DP 1.2 MST we need to handle the short irqs on their
own outside the modesetting locking the long hpd's involve. This
patch adds the framework to distinguish between short/long to the
current code base, to lay the basis for future DP 1.2 MST work.

This should mean we get better bisectability in case of regression
due to the new irq handling.

v2: add GM45 support (untested, due to lack of hw)

Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Todd Previte <tprevite@gmail.com>
[danvet: Fix conflicts in i915_irq.c with Oscar Mateo's irq handling
race fixes and a trivial one in intel_drv.h with the psr code.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: gmch: fix stuck primary plane due to memory self-refresh mode

Blanking/unblanking the console in a loop on an Asus T100 sometimes
leaves the console blank. After some digging I found that applying

commit 45af59c702d3bd10e08c0f0bfb6ef95d7db596b0
Author: Egbert Eich <eich@suse.com>
Date:   Mon Mar 4 09:24:38 2013 -0500

    DRM/i915: On G45 enable cursor plane briefly after enabling the display plane.

fixed VLV too.

In my case the problem seemed to happen already during the previous crtc
disabling and went away if I disabled self-refresh mode before disabling
the primary plane.

The root cause for this is that updates from the shadow to live plane
control register are blocked at vblank time if the memory self-refresh
mode (aka max-fifo mode on VLV) is active at that moment. The controller
checks at frame start time if the CPU is in C0 and the self-refresh mode
enable bit is set and if so activates self-reresh mode, otherwise
deactivates it. So to make sure that the plane truly gets disabled before
pipe-off we have to:

1. disable memory self-refresh mode
2. disable plane
3. wait for vblank
4. disable pipe
5. wait for pipe-off

v2:
- add explanation for the root cause from HW team (Cesar Mancini et al)
- remove note about the CPU C7S state, in my latest tests disabling it
  alone didn't make a difference
- add vblank between disabling plane and pipe (Ville)
- apply the same workaround for all gmch platforms (Ville)

Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Vijay Purushothaman <vijay.a.purushothaman@intel.com>
Reviewed-by: Deepak S<deepak.s@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: gmch: set SR WMs to valid values before enabling them

Atm it's possible that we enable the memory self-refresh mode before the
watermark levels used by this mode are programmed with valid values. So
move the enabling after we programmed the WM levels.

Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Vijay Purushothaman <vijay.a.purushothaman@intel.com>
Reviewed-by: Deepak S<deepak.s@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: gmch: factor out intel_set_memory_cxsr

This functionality will be also needed by an upcoming patch, so factor
it out. As a bonus this also makes things a bit more uniform across
platforms. Note that this also changes the register read-modify-write
to a simple write during disabling. This is what we do during enabling
anyway and according to the spec all the relevant bits are reserved-MBZ
or reserved with a 0 default value.

v2:
- unchanged
v3:
- fix missing cxsr disabling on pineview (Deepak)

Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Deepak S <deepak.s@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Move VLV cmnlane workaround to intel_power_domains_init_hw()

Now that the CMNRESET deassert is part of the cmnlane power well,
intel_reset_dpio() is called too late to make any difference. We've
deasserted CMNRESET by that time, and so the off+on toggle w/a will
never kick in.

Move the workaround to intel_power_domains_init_hw() where it gets
called before we enable the init power domain.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Pull the cmnlane tricks into its own power well ops

Remove the clutter in __vlv_set_power_well() by moving the cmnlane
handling into custom enable/disable hooks for the cmnlane.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Kill duplicated cdclk readout code from i2c

We have a slightly different way of readoing out the cdclk in
gmbus_set_freq(). Kill that and just call .get_display_clock_speed().

Also need to remove the GMBUSFREQ update from intel_i2c_reset() since
that gets called way too early. Let's do it in intel_modeset_init_hw()
instead, and also pull the initial vlv_cdclk_freq update there from
init_clock gating.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Warn if there's a cdclk change in progess

If someone is interested in the current cdclk frquency it should
be stable and not in process of changing frquency. Warn if the current
and requested cdclk don't match in .get_display_clock_spee() on vlv.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Wait for cdclk change to occure when going for 400MHz

VLV Punit doesn't support the 400MHz cdclk option, so we bypass the
Punit and poke at CCK directly. However we forgot to wait for the
frequeency change to complete. Poll the CCK clock status to make sure
the clock has changed before we fire up any pipes.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Use 200MHz cdclk on vlv when all pipes are off

Drop the cdclk frequency to 200MHz on vlv when all pipes are off. In
theory we should be able to use 200MHz also when the pixel clock is at
most 90% of 200MHz. However in practice all we seem to get is a solid
color picture or an otherwise corrupted display.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Handle 320 vs. 333 MHz cdclk on vlv

Depending on the HPLL frequency one of the supported cdclk frquencies is
either 320MHz or 333MHz. Figure out which one it is to accurately pick
the minimal required cdclk. This would also avoid a warning from the
cdclk code where it compares the actual cdclk read out from the hardware
with a value that was calculated using valleyview_calc_cdclk().

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Move vlv cdclk code to .get_display_clock_speed()

We have a standard hook for reading out the current cdclk. Move the VLV
code from valleyview_cur_cdclk() to .get_display_clock_speed().

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Give names to the CCK_DISPLAY_CLOCK_CONTROL bits

Avoid using magic values for CCK frequency bits. Also the mask we were
using for the requested frequency was one bit too short. Fix it up.

Note: This also fixes the #define for a mask (spotted by Jesse in his
review).

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
[danvet: Add note about mask change.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Change vlv cdclk to use kHz units

Use kHz units in vlv cdclk code since that's more customary.

Also replace the precomputed 90% values with *9/10 computation
for extra clarity.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

Merge tag 'v3.16-rc4' into drm-intel-next-queued

Due to Dave's vacation drm-next hasn't opened yet for 3.17 so I
couldn't move my drm-intel-next queue forward yet like I usually do.
Just pull in the latest upstream -rc to unblock patch merging - I
don't want to needlessly rebase my current patch pile really and void
all the testing we've done already.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

Merge patches merged by Jani while I was on vacation.

Jani apparently didn't rebase onto latest drm-intel-next so a merge is
in order.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

Linux 3.16-rc4

Merge tag 'dt-for-linus' of git://git.secretlab.ca/git/linux

Pull devicetree bugfix from Grant Likely:
"Important bug fix for parsing 64-bit addresses on 32-bit platforms.
  Without this patch the kernel will try to use memory ranges that
  cannot be reached"

* tag 'dt-for-linus' of git://git.secretlab.ca/git/linux:
  of: Check for phys_addr_t overflows in early_init_dt_add_memory_arch

Merge tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
"This is a set of 13 fixes, a MAINTAINERS update and a sparse update.
  The fixes are mostly correct value initialisations, avoiding NULL
  derefs and some uninitialised pointer avoidance.

  All the patches have been incubated in -next for a few days.  The
  final patch (use the scsi data buffer length to extract transfer size)
  has been rebased to add a cc to stable, but only the commit message
  has changed"

* tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  [SCSI] use the scsi data buffer length to extract transfer size
  virtio-scsi: fix various bad behavior on aborted requests
  virtio-scsi: avoid cancelling uninitialized work items
  ibmvscsi: Add memory barriers for send / receive
  ibmvscsi: Abort init sequence during error recovery
  qla2xxx: Fix sparse warning in qla_target.c.
  bnx2fc: Improve stats update mechanism
  bnx2fc: do not scan uninitialized lists in case of error.
  fc: ensure scan_work isn't active when freeing fc_rport
  pm8001: Fix potential null pointer dereference and memory leak.
  MAINTAINERS: Update LSILOGIC MPT FUSION DRIVERS (FC/SAS/SPI) maintainers Email IDs
  be2iscsi: remove potential junk pointer free
  be2iscsi: add an missing goto in error path
  scsi_error: set DID_TIME_OUT correctly
  scsi_error: fix invalid setting of host byte

Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux

Pull drm fixes from Dave Airlie:
"i915, tda998x and vmwgfx fixes,

  The main one is i915 fix for missing VGA connectors, along with some
  fixes for the tda998x from Russell fixing some modesetting problems.

  (still on holidays, but got a spare moment to find these)"

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
  drm/vmwgfx: Fix incorrect write to read-only register v2:
  drm/i915: Drop early VLV WA to fix Voltage not getting dropped to Vmin
  drm/i915: only apply crt_present check on VLV
  drm/i915: Wait for vblank after enabling the primary plane on BDW
  drm/i2c: tda998x: add some basic mode validation
  drm/i2c: tda998x: faster polling for edid
  drm/i2c: tda998x: move drm_i2c_encoder_destroy call

Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull ARM SoC fixes from Olof Johansson:
"This week's arm-soc fixes:

   - A set of of OMAP patches that we had missed Tony's pull request of:
     * Reset fix for am43xx
     * Proper OPP table for omap5
     * Fix for SoC detection of one of the DRA7 SoCs
     * hwmod updates to get SATA and OCP to work on omap5 (drivers
       merged in 3.16)
     * ... plus a handful of smaller fixes
   - sunxi needed to re-add machine specific restart code that was
     removed in anticipation of a watchdog driver being merged for 3.16,
     and it didn't make it in.
   - Marvell fixes for PCIe on SMP and a big-endian fix.
   - A trivial defconfig update to make my capri test board boot with
     bcm_defconfig again.

  ... and a couple of MAINTAINERS updates, one to claim new Keystone
  drivers that have been merged, and one to merge MXS and i.MX (both
  Freescale platforms).

  The largest diffs come from the hwmod code for omap5 and the re-add of
  the restart code on sunxi.  The hwmod stuff is quite late at this
  point but it slipped through cracks repeatedly while coming up the
  maintainer chain and only affects the one SoC so risk is low"

* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  MAINTAINERS: Add few more Keystone drivers
  MAINTAINERS: merge MXS entry into IMX one
  ARM: sunxi: Reintroduce the restart code for A10/A20 SoCs
  ARM: mvebu: fix cpuidle implementation to work on big-endian systems
  ARM: mvebu: update L2/PCIe deadlock workaround after L2CC cleanup
  ARM: mvebu: move Armada 375 external abort logic as a quirk
  ARM: bcm: Fix bcm and multi_v7 defconfigs
  ARM: dts: dra7-evm: remove interrupt binding
  ARM: OMAP2+: Fix parser-bug in platform muxing code
  ARM: DTS: dra7/dra7xx-clocks: ATL related changes
  ARM: OMAP2+: drop unused function
  ARM: dts: am43x-epos-evm: Add Missing cpsw-phy-sel for am43x-epos-evm
  ARM: dts: omap5: Update CPU OPP table as per final production Manual
  ARM: DRA722: add detection of SoC information
  ARM: dts: Enable twl4030 off-idle configuration for selected omaps
  ARM: OMAP5: hwmod: Add ocp2scp3 and sata hwmods
  ARM: OMAP2+: hwmod: Change hardreset soc_ops for AM43XX

Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq fixes from Thomas Gleixner:
"A few minor fixlets in ARM SoC irq drivers and a fix for a memory leak
  which I introduced in the last round of cleanups :("

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genirq: Fix memory leak when calling irq_free_hwirqs()
  irqchip: spear_shirq: Fix interrupt offset
  irqchip: brcmstb-l2: Level-2 interrupts are edge sensitive
  irqchip: armada-370-xp: Mask all interrupts during initialization.

Merge tag 'drm-intel-fixes-2014-07-03' of git://anongit.freedesktop.org/drm-intel

Fixes for 3.16-rc3; most importantly Jesse brings back VGA he took away
on a bunch of machines. Also a vblank fix for BDW and a power workaround
fix for VLV.

* tag 'drm-intel-fixes-2014-07-03' of git://anongit.freedesktop.org/drm-intel:
  drm/i915: Drop early VLV WA to fix Voltage not getting dropped to Vmin
  drm/i915: only apply crt_present check on VLV
  drm/i915: Wait for vblank after enabling the primary plane on BDW

Merge branch 'vmwgfx-fixes-3.16' of git://people.freedesktop.org/~thomash/linux

fix to a 3.15 commit.

* 'vmwgfx-fixes-3.16' of git://people.freedesktop.org/~thomash/linux:
drm/vmwgfx: Fix incorrect write to read-only register v2:

Merge branch 'tda998x-fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-cubox

mode fixes for tda998x.

* 'tda998x-fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-cubox:
  drm/i2c: tda998x: add some basic mode validation
  drm/i2c: tda998x: faster polling for edid
  drm/i2c: tda998x: move drm_i2c_encoder_destroy call

genirq: Fix memory leak when calling irq_free_hwirqs()

irq_free_hwirqs() always calls irq_free_descs() with a cnt == 0
which makes it a no-op since the interrupt count to free is
decremented in itself.

Fixes: 0c727e15e8b5255fd5591bbf4f627bca661e09ab
Signed-off-by: Keith Busch <keith.busch@intel.com>
Acked-by: David Rientjes <rientjes@google.com>
Link: http://lkml.kernel.org/r/1404167084-8070-1-git-send-email-keith.busch@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull ARM64 fixes from Catalin Marinas:
- Exception level check at boot time (for completeness, not triggering
   any bug before)
- I/D-cache synchronisation logic for huge pages
- Config symbol typo

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: fix el2_setup check of CurrentEL
  arm64: mm: Make icache synchronisation logic huge page aware
  arm64: mm: Fix horrendous config typo

MAINTAINERS: Add few more Keystone drivers

Update MAINTAINERS file for recently added reset controller, AEMIF
and clocksource driver for Keystone SOCs.

The EMIF memory controller driver is also added along with AEMIF.

Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Olof Johansson <olof@lixom.net>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Mike Turquette <mturquette@linaro.org>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Signed-off-by: Olof Johansson <olof@lixom.net>

MAINTAINERS: merge MXS entry into IMX one

The mach-mxs platform is actually co-maintained by myself and
pengutronix folks. Also it's hosted in the same kernel tree as IMX.
So let's merge the entry into IMX one.

Signed-off-by: Shawn Guo <shawn.guo@freescale.com>
Acked-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Olof Johansson <olof@lixom.net>

Merge tag 'mvebu-fixes-3.16-2' of git://git.infradead.org/linux-mvebu into fixes

mvebu fixes for v3.16 (round #2)

- mvebu
    - Fix PCIe deadlock now that SMP is enabled
    - Fix cpuidle for big-endian systems

* tag 'mvebu-fixes-3.16-2' of git://git.infradead.org/linux-mvebu:
  ARM: mvebu: fix cpuidle implementation to work on big-endian systems
  ARM: mvebu: update L2/PCIe deadlock workaround after L2CC cleanup
  ARM: mvebu: move Armada 375 external abort logic as a quirk

Signed-off-by: Olof Johansson <olof@lixom.net>

ARM: sunxi: Reintroduce the restart code for A10/A20 SoCs

This partly reverts commits e309ad9470e2 (ARM: sunxi: Remove reset code from
the platform) and 439ab4aa3912 (ARM: sunxi: Remove init_machine callback) for
the sun4i, sun5i and sun7i families.

This is needed because the watchdog counterpart of these commits was dropped,
and didn't make it into 3.16. In order to still be able to reboot the board, we
need to reintroduce that code. Of course, the long term view is still to get
rid of that code in mach-sunxi.

Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Signed-off-by: Olof Johansson <olof@lixom.net>

Merge tag 'omap-for-v3.16/fixes-against-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into fixes

Merge OMAP fixes from Tony Lindgren:

Fixes for omaps for issues discovered during the merge window and
enabling of a few features that had to wait for the driver
dependencies to clear.

The fixes included are:

- Fix am43xx hard reset flags
- Fix SoC detection for DRA722
- Fix CPU OPP table for omap5
- Fix legacy mux parser bug if requested muxname is a prefix of
  multiple mux entries
- Fix qspi interrupt binding that relies on the irq crossbar
  that has not yet been enabled
- Add missing phy_sel for am43x-epos-evm
- Drop unused gic_init_irq() that is no longer needed

And the enabling of features that had driver dependencies are:

- Change dra7 to use Audio Tracking Logic clock instead of a fixed
  clock now that the clock driver for it has been merged

- Enable off idle configuration for selected omaps as all the kernel
  dependencies for device tree based booting are finally merged as
  this is needed to get the automated PM tests working finally with
  device tree based booting

- Add hwmod entry for ocp2scp3 for omap5 to get sata working as
  all the driver dependencies are now in the kernel and this patch
  fell through the cracks during the merge window

* tag 'omap-for-v3.16/fixes-against-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
  ARM: dts: dra7-evm: remove interrupt binding
  ARM: OMAP2+: Fix parser-bug in platform muxing code
  ARM: DTS: dra7/dra7xx-clocks: ATL related changes
  ARM: OMAP2+: drop unused function
  ARM: dts: am43x-epos-evm: Add Missing cpsw-phy-sel for am43x-epos-evm
  ARM: dts: omap5: Update CPU OPP table as per final production Manual
  ARM: DRA722: add detection of SoC information
  ARM: dts: Enable twl4030 off-idle configuration for selected omaps
  ARM: OMAP5: hwmod: Add ocp2scp3 and sata hwmods
  ARM: OMAP2+: hwmod: Change hardreset soc_ops for AM43XX

Merge tag 'md/3.16-fixes' of git://neil.brown.name/md

Pull md bugfixes from Neil Brown:
"Two minor bugfixes for md in 3.16"

* tag 'md/3.16-fixes' of git://neil.brown.name/md:
md: flush writes before starting a recovery.
md: make sure GET_ARRAY_INFO ioctl reports correct "clean" status

Merge tag 'sound-3.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"This contains a few fixes for HD-audio: yet another Dell headset pin
  quirk, a fixup for Thinkpad T540P, and an improved fix for
  Haswell/Broadwell HDMI clock setup"

* tag 'sound-3.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda - restore BCLK M/N value as per CDCLK for HSW/BDW display HDA controller
  drm/i915: provide interface for audio driver to query cdclk
  ALSA: hda - Add a fixup for Thinkpad T540p
  ALSA: hda - Add another headset pin quirk for some Dell machines

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs

Pull btrfs fixes from Chris Mason:
"We've queued up a few fixes in my for-linus branch"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: fix crash when starting transaction
  Btrfs: fix btrfs_print_leaf for skinny metadata
  Btrfs: fix race of using total_bytes_pinned
  btrfs: use E2BIG instead of EIO if compression does not help
  btrfs: remove stale comment from btrfs_flush_all_pending_stuffs
  Btrfs: fix use-after-free when cloning a trailing file hole
  btrfs: fix null pointer dereference in btrfs_show_devname when name is null
  btrfs: fix null pointer dereference in clone_fs_devices when name is null
  btrfs: fix nossd and ssd_spread mount option regression
  Btrfs: fix race between balance recovery and root deletion
  Btrfs: atomically set inode->i_flags in btrfs_update_iflags
  btrfs: only unlock block in verify_parent_transid if we locked it
  Btrfs: assert send doesn't attempt to start transactions
  btrfs compression: reuse recently used workspace
  Btrfs: fix crash when mounting raid5 btrfs with missing disks
  btrfs: create sprout should rename fsid on the sysfs as well
  btrfs: dev replace should replace the sysfs entry
  btrfs: dev add should add its sysfs entry
  btrfs: dev delete should remove sysfs entry
  btrfs: rename add_device_membership to btrfs_kobj_add_device

arm64: fix el2_setup check of CurrentEL

The CurrentEL system register reports the Current Exception Level
of the CPU. It doesn't say anything about the stack handling, and
yet we compare it to PSR_MODE_EL2t and PSR_MODE_EL2h.

It works by chance because PSR_MODE_EL2t happens to match the right
bits, but that's otherwise a very bad idea. Just check for the EL
value instead.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
[catalin.marinas@arm.com: fixed arch/arm64/kernel/efi-entry.S]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

arm64: mm: Make icache synchronisation logic huge page aware

The __sync_icache_dcache routine will only flush the dcache for the
first page of a compound page, potentially leading to stale icache
data residing further on in a hugetlb page.

This patch addresses this issue by taking into consideration the
order of the page when flushing the dcache.

Reported-by: Mark Brown <broonie@linaro.org>
Tested-by: Mark Brown <broonie@linaro.org>
Signed-off-by: Steve Capper <steve.capper@linaro.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: <stable@vger.kernel.org> # v3.11+

arm64: mm: Fix horrendous config typo

The define ARM64_64K_PAGES is tested for rather than
CONFIG_ARM64_64K_PAGES. Correct that typo here.

Signed-off-by: Steve Capper <steve.capper@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

drm/i915: Show cursor size in debugfs/i915_display_info

Inlcude the pipe-size and cursor-size in debugfs.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>

ALSA: hda - restore BCLK M/N value as per CDCLK for HSW/BDW display HDA controller

For HSW/BDW display HD-A controller, hda_set_bclk() is defined to set BCLK
by programming the M/N values as per the core display clock (CDCLK) queried from
i915 display driver.

And the audio driver will also set BCLK in azx_first_init() since the display
driver can turn off the shared power in boot phase if only eDP is connected
and M/N values will be lost and must be reprogrammed.

Signed-off-by: Mengdong Lin <mengdong.lin@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>

drm/i915: provide interface for audio driver to query cdclk

For Haswell and Broadwell, if the display power well has been disabled,
the display audio controller divider values EM4 M VALUE and EM5 N VALUE
will have been lost. The CDCLK frequency is required for reprogramming them
to generate 24MHz HD-A link BCLK. So provide a private interface for the
audio driver to query CDCLK.

This is a stopgap solution until a more generic interface between audio
and display drivers has been implemented.

Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Mengdong Lin <mengdong.lin@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>

Merge tag 'usb-3.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB bugfixes from Greg KH:
"Here's a round of USB bugfixes, quirk additions, and new device ids
  for 3.16-rc4.  Nothing major in here at all, just a bunch of tiny
  changes.  All have been in linux-next with no reported issues"

* tag 'usb-3.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (33 commits)
  usb: chipidea: udc: delete td from req's td list at ep_dequeue
  usb: Kconfig: make EHCI_MSM selectable for QCOM SOCs
  usb-storage/SCSI: Add broken_fua blacklist flag
  usb: musb: dsps: fix the base address for accessing the mode register
  tools: ffs-test: fix header values endianess
  usb: phy: msm: Do not do runtime pm if the phy is not idle
  usb: musb: Ensure that cppi41 timer gets armed on premature DMA TX irq
  usb: gadget: gr_udc: Fix check for invalid number of microframes
  usb: musb: Fix panic upon musb_am335x module removal
  usb: gadget: f_fs: resurect usb_functionfs_descs_head structure
  Revert "tools: ffs-test: convert to new descriptor format fixing compilation error"
  xhci: Fix runtime suspended xhci from blocking system suspend.
  xhci: clear root port wake on bits if controller isn't wake-up capable
  xhci: correct burst count field for isoc transfers on 1.0 xhci hosts
  xhci: Use correct SLOT ID when handling a reset device command
  MAINTAINERS: update e-mail address
  usb: option: add/modify Olivetti Olicard modems
  USB: ftdi_sio: fix null deref at port probe
  MAINTAINERS: drop two usb-serial subdriver entries
  USB: option: add device ID for SpeedUp SU9800 usb 3g modem
  ...

Merge tag 'staging-3.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging

Pull staging driver bugfixes from Greg KH:
"Nothing major here, just 4 small bugfixes that resolve some issues
  reported for the IIO (staging and non-staging) and the tidspbridge
  driver"

* tag 'staging-3.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  staging: tidspbridge: fix an erroneous removal of parentheses
  iio: of_iio_channel_get_by_name() returns non-null pointers for error legs
  staging: iio/ad7291: fix error code in ad7291_probe()
  iio:adc:ad799x: Fix reading and writing of event values, apply shift

Merge tag 'driver-core-3.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core fixes from Greg KH:
"Well, one drivercore fix for kernfs to resolve a reported issue with
  sysfs files being updated from atomic contexts, and another lz4 bugfix
  for testing potential buffer overflows"

* tag 'driver-core-3.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  lz4: add overrun checks to lz4_uncompress_unknownoutputsize()
  kernfs: kernfs_notify() must be useable from non-sleepable contexts

Merge tag 'trace-fixes-v3.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:
"Oleg Nesterov found and fixed a bug in the perf/ftrace/uprobes code
  where running:

    # perf probe -x /lib/libc.so.6 syscall
    # echo 1 >> /sys/kernel/debug/tracing/events/probe_libc/enable
    # perf record -e probe_libc:syscall whatever

  kills the uprobe.  Along the way he found some other minor bugs and
  clean ups that he fixed up making it a total of 4 patches.

  Doing unrelated work, I found that the reading of the ftrace trace
  file disables all function tracer callbacks.  This was fine when
  ftrace was the only user, but now that it's used by perf and kprobes,
  this is a bug where reading trace can disable kprobes and perf.  A
  very unexpected side effect and should be fixed"

* tag 'trace-fixes-v3.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Remove ftrace_stop/start() from reading the trace file
  tracing/uprobes: Fix the usage of uprobe_buffer_enable() in probe_event_enable()
  tracing/uprobes: Kill the bogus UPROBE_HANDLER_REMOVE code in uprobe_dispatcher()
  uprobes: Change unregister/apply to WARN() if uprobe/consumer is gone
  tracing/uprobes: Revert "Support mix of ftrace and perf"

Merge branch 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild

Pull kbuild fix from Michal Marek:
"There is one more fix for the relative paths series from -rc1: Print
  the path to the build directory at the start of the build, so that
  editors and IDEs can match the relative paths to source files"

* 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
  kbuild: Print the name of the build directory

Merge branch 'for-3.16' of git://linux-nfs.org/~bfields/linux

Pull nfsd bugfixes from Bruce Fields:
"By coincidence, two NFSv4 symlink bugs, one introduced in the 3.16 xdr
  encoding rewrite, the other a decoding bug that I think we've had
  since the start but that just doesn't trigger very often"

* 'for-3.16' of git://linux-nfs.org/~bfields/linux:
  nfs: fix nfs4d readlink truncated packet
  nfsd: fix rare symlink decoding bug

ptrace,x86: force IRET path after a ptrace_stop()

The 'sysret' fastpath does not correctly restore even all regular
registers, much less any segment registers or reflags values. That is
very much part of why it's faster than 'iret'.

Normally that isn't a problem, because the normal ptrace() interface
catches the process using the signal handler infrastructure, which
always returns with an iret.

However, some paths can get caught using ptrace_event() instead of the
signal path, and for those we need to make sure that we aren't going to
return to user space using 'sysret'. Otherwise the modifications that
may have been done to the register set by the tracer wouldn't
necessarily take effect.

Fix it by forcing IRET path by setting TIF_NOTIFY_RESUME from
arch_ptrace_stop_needed() which is invoked from ptrace_stop().

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

lz4: add overrun checks to lz4_uncompress_unknownoutputsize()

Jan points out that I forgot to make the needed fixes to the
lz4_uncompress_unknownoutputsize() function to mirror the changes done
in lz4_decompress() with regards to potential pointer overflows.

The only in-kernel user of this function is the zram code, which only
takes data from a valid compressed buffer that it made itself, so it's
not a big issue. But due to external kernel modules using this
function, it's better to be safe here.

Reported-by: Jan Beulich <JBeulich@suse.com>
Cc: "Don A. Bailey" <donb@securitymouse.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Merge remote-tracking branch 'scsi-queue/drivers-for-3.16' into for-linus

[SCSI] use the scsi data buffer length to extract transfer size

Commit 4e4db0e5926a introduced a helper that can be used to query the
wire transfer size for a SCSI command taking protection information into
account.

However, some commands do not have a 1:1 mapping between the block range
they work on and the payload size (discard, write same). After the
scatterlist has been set up these requests use __data_len to store the
number of bytes to report completion on. This means that callers of
scsi_transfer_length() would get the wrong byte count for these types of
requests.

To overcome this we make scsi_transfer_length() use the scatterlist
length in the scsi_data_buffer as basis for the wire transfer
calculation instead of __data_len.

Reported-by: Christoph Hellwig <hch@infradead.org>
Debugged-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Fixes: 7cee286b096f9e481102b77c3bde736df7b712b1
Cc: stable@vger.kernel.org
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

Merge branch 'akpm' (patches from Andrew Morton)

Merge fixes from Andrew Morton:
"14 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  shmem: fix init_page_accessed use to stop !PageLRU bug
  kernel/printk/printk.c: revert "printk: enable interrupts before calling console_trylock_for_printk()"
  tools/testing/selftests/ipc/msgque.c: improve error handling when not running as root
  fs/seq_file: fallback to vmalloc allocation
  /proc/stat: convert to single_open_size()
  hwpoison: fix the handling path of the victimized page frame that belong to non-LRU
  mm:vmscan: update the trace-vmscan-postprocess.pl for event vmscan/mm_vmscan_lru_isolate
  msync: fix incorrect fstart calculation
  zram: revalidate disk after capacity change
  tools: memory-hotplug fix unexpected operator error
  tools: cpu-hotplug fix unexpected operator error
  autofs4: fix false positive compile error
  slub: fix off by one in number of slab tests
  mm: page_alloc: fix CMA area initialisation when pageblock > MAX_ORDER

shmem: fix init_page_accessed use to stop !PageLRU bug

Under shmem swapping load, I sometimes hit the VM_BUG_ON_PAGE(!PageLRU)
in isolate_lru_pages() at mm/vmscan.c:1281!

Commit 8b04f2deac87 ("mm: non-atomically mark page accessed during page
cache allocation where possible") looks like interrupted work-in-progress.

mm/filemap.c's call to init_page_accessed() is fine, but not mm/shmem.c's
- shmem_write_begin() is clearly wrong to use it after shmem_getpage(),
when the page is always visible in radix_tree, and often already on LRU.

Revert change to shmem_write_begin(), and use init_page_accessed() or
mark_page_accessed() appropriately for SGP_WRITE in shmem_getpage_gfp().

SGP_WRITE also covers shmem_symlink(), which did not mark_page_accessed()
before; but since many other filesystems use [__]page_symlink(), which did
and does mark the page accessed, consider this as rectifying an oversight.

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Prabhakar Lad <prabhakar.csengg@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

kernel/printk/printk.c: revert "printk: enable interrupts before calling console_trylock_for_printk()"

Revert commit 2ab1a8e97f05 ("printk: enable interrupts before calling
console_trylock_for_printk()").

Andreas reported:

: None of the post 3.15 kernel boot for me. They all hang at the GRUB
: screen telling me it loaded and started the kernel, but the kernel
: itself stops before it prints anything (or even replaces the GRUB
: background graphics).

2ab1a8e97f05 is modest latency reduction. Revert it until we understand
the reason for these failures.

Reported-by: Andreas Bombe <aeb@debian.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

tools/testing/selftests/ipc/msgque.c: improve error handling when not running as root

The test fails in the middle when it is not run as root while accessing
/proc/sys/kernel/msg_next_id. Changed it to check for root at the
beginning of the test and exit if not root.

Signed-off-by: Shuah Khan <shuah.kh@samsung.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Davidlohr Bueso <davidlohr@hp.com>
Cc: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fs/seq_file: fallback to vmalloc allocation

There are a couple of seq_files which use the single_open() interface.
This interface requires that the whole output must fit into a single
buffer.

E.g.  for /proc/stat allocation failures have been observed because an
order-4 memory allocation failed due to memory fragmentation.  In such
situations reading /proc/stat is not possible anymore.

Therefore change the seq_file code to fallback to vmalloc allocations
which will usually result in a couple of order-0 allocations and hence
also work if memory is fragmented.

For reference a call trace where reading from /proc/stat failed:

  sadc: page allocation failure: order:4, mode:0x1040d0
  CPU: 1 PID: 192063 Comm: sadc Not tainted 3.10.0-123.el7.s390x #1
  [...]
  Call Trace:
    show_stack+0x6c/0xe8
    warn_alloc_failed+0xd6/0x138
    __alloc_pages_nodemask+0x9da/0xb68
    __get_free_pages+0x2e/0x58
    kmalloc_order_trace+0x44/0xc0
    stat_open+0x5a/0xd8
    proc_reg_open+0x8a/0x140
    do_dentry_open+0x1bc/0x2c8
    finish_open+0x46/0x60
    do_last+0x382/0x10d0
    path_openat+0xc8/0x4f8
    do_filp_open+0x46/0xa8
    do_sys_open+0x114/0x1f0
    sysc_tracego+0x14/0x1a

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Tested-by: David Rientjes <rientjes@google.com>
Cc: Ian Kent <raven@themaw.net>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Thorsten Diehl <thorsten.diehl@de.ibm.com>
Cc: Andrea Righi <andrea@betterlinux.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Stefan Bader <stefan.bader@canonical.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

/proc/stat: convert to single_open_size()

These two patches are supposed to "fix" failed order-4 memory
allocations which have been observed when reading /proc/stat.  The
problem has been observed on s390 as well as on x86.

To address the problem change the seq_file memory allocations to
fallback to use vmalloc, so that allocations also work if memory is
fragmented.

This approach seems to be simpler and less intrusive than changing
/proc/stat to use an interator.  Also it "fixes" other users as well,
which use seq_file's single_open() interface.

This patch (of 2):

Use seq_file's single_open_size() to preallocate a buffer that is large
enough to hold the whole output, instead of open coding it.  Also
calculate the requested size using the number of online cpus instead of
possible cpus, since the size of the output only depends on the number
of online cpus.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Ian Kent <raven@themaw.net>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Thorsten Diehl <thorsten.diehl@de.ibm.com>
Cc: Andrea Righi <andrea@betterlinux.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Stefan Bader <stefan.bader@canonical.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

hwpoison: fix the handling path of the victimized page frame that belong to non-LRU

Until now, the kernel has the same policy to handle victimized page
frames that belong to kernel-space(reserved/slab-subsystem) or
non-LRU(unknown page state).  In other word, the result of handling
either of these victimized page frames is (IGNORED | FAILED), and the
return value of memory_failure() is -EBUSY.

This patch is to avoid that memory_failure() returns very soon due to
the "true" value of (!PageLRU(p)), and it also ensures that
action_result() can report more precise information("reserved kernel",
"kernel slab", and "unknown page state") instead of "non LRU",
especially for memory errors which are detected by memory-scrubbing.

Andi said:

: While running the mcelog test suite on 3.14 I hit the following VM_BUG_ON:
:
: soft_offline: 0x56d4: unknown non LRU page type 3ffff800008000
: page:ffffea000015b400 count:3 mapcount:2097169 mapping:          (null) index:0xffff8800056d7000
: page flags: 0x3ffff800004081(locked|slab|head)
: ------------[ cut here ]------------
: kernel BUG at mm/rmap.c:1495!
:
: I think what happened is that a LRU page turned into a slab page in
: parallel with offlining.  memory_failure initially tests for this case,
: but doesn't retest later after the page has been locked.
:
: ...
:
: I ran this patch in a loop over night with some stress plus
: the mcelog test suite running in a loop. I cannot guarantee it hit it,
: but it should have given it a good beating.
:
: The kernel survived with no messages, although the mcelog test suite
: got killed at some point because it couldn't fork anymore. Probably
: some unrelated problem.
:
: So the patch is ok for me for .16.

Signed-off-by: Chen Yucong <slaoub@gmail.com>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reported-by: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm:vmscan: update the trace-vmscan-postprocess.pl for event vmscan/mm_vmscan_lru_isolate

When using trace-vmscan-postprocess.pl for checking the file/anon rate
of scanning, we can find that it can not be performed.  At the same
time, the following message will be reported:

  WARNING: Format not as expected for event vmscan/mm_vmscan_lru_isolate
  'file' != 'contig_taken' Fewer fields than expected in format at
  ./trace-vmscan-postprocess.pl line 171, <FORMAT> line 76.

In trace-vmscan-postprocess.pl, (contig_taken, contig_dirty, and
contig_failed) are be associated respectively to (nr_lumpy_taken,
nr_lumpy_dirty, and nr_lumpy_failed) for lumpy reclaim.  Via commit
ae9fa0f01f1b ("mm: vmscan: remove lumpy reclaim"), lumpy reclaim had
already been removed by Mel, but the update for
trace-vmscan-postprocess.pl was missed.

Signed-off-by: Chen Yucong <slaoub@gmail.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

msync: fix incorrect fstart calculation

Fix a regression caused by 6000acea89ab ("mm/msync.c: sync only the
requested range in msync()").

xfstests generic/075 fail occured on ext4 data=journal mode because the
intended range was not syncing due to wrong fstart calculation.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com>
Reported-by: Eric Whitney <enwlinux@gmail.com>
Tested-by: Eric Whitney <enwlinux@gmail.com>
Acked-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Tested-by: Lukas Czerner <lczerner@redhat.com>
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

zram: revalidate disk after capacity change

Alexander reported mkswap on /dev/zram0 is failed if other process is
opening the block device file.

Step is as follows,

0. Reset the unused zram device.
1. Use a program that opens /dev/zram0 with O_RDWR and sleeps
   until killed.
2. While that program sleeps, echo the correct value to
   /sys/block/zram0/disksize.
3. Verify (e.g. in /proc/partitions) that the disk size is applied
   correctly. It is.
4. While that program still sleeps, attempt to mkswap /dev/zram0.
   This fails: mkswap: error: swap area needs to be at least 40 KiB

When I investigated, the size get by ioctl(fd, BLKGETSIZE64, xxx) on
mkswap to get a size of blockdev was zero although zram0 has right size by
2.

The reason is zram didn't revalidate disk after changing capacity so that
size of blockdev's inode is not uptodate until all of file is close.

This patch should fix the BUG.

Signed-off-by: Minchan Kim <minchan@kernel.org>
Reported-by: Alexander E. Patrakov <patrakov@gmail.com>
Tested-by: Alexander E. Patrakov <patrakov@gmail.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Acked-by: Jerome Marchand <jmarchan@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

tools: memory-hotplug fix unexpected operator error

on-off-test uses "$UID != 0" to test for root, but $UID is a construct
specific to bash. Using /bin/sh that isn't bash results in the
following error (due to the "$UID" part expanding to nothing):

./on-off-test.sh: 9: [: !=: unexpected operator

Change Makefile to use bash instead.

Signed-off-by: Shuah Khan <shuah.kh@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

tools: cpu-hotplug fix unexpected operator error

on-off-test uses "$UID != 0" to test for root, but $UID is a construct
specific to bash. Using /bin/sh that isn't bash results in the
following error (due to the "$UID" part expanding to nothing):

./on-off-test.sh: 9: [: !=: unexpected operator

Change Makefile to use bash instead.

Signed-off-by: Shuah Khan <shuah.kh@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

autofs4: fix false positive compile error

On strict build environments we can see:

  fs/autofs4/inode.c: In function 'autofs4_fill_super':
  fs/autofs4/inode.c:312: error: 'pgrp' may be used uninitialized in this function
  make[2]: *** [fs/autofs4/inode.o] Error 1
  make[1]: *** [fs/autofs4] Error 2
  make: *** [fs] Error 2
  make: *** Waiting for unfinished jobs....

This is due to the use of pgrp_set being used to indicate pgrp has has
been set rather than initializing pgrp itself.

Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

slub: fix off by one in number of slab tests

min_partial means minimum number of slab cached in node partial list.
So, if nr_partial is less than it, we keep newly empty slab on node
partial list rather than freeing it.  But if nr_partial is equal or
greater than it, it means that we have enough partial slabs so should
free newly empty slab.  Current implementation missed the equal case so
if we set min_partial is 0, then, at least one slab could be cached.
This is critical problem to kmemcg destroying logic because it doesn't
works properly if some slabs is cached.  This patch fixes this problem.

Fixes 91cb69620284 ("slub: make dead memcg caches discard free slabs
immediately").

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: page_alloc: fix CMA area initialisation when pageblock > MAX_ORDER

With a kernel configured with ARM64_64K_PAGES && !TRANSPARENT_HUGEPAGE,
the following is triggered at early boot:

  SMP: Total of 8 processors activated.
  devtmpfs: initialized
  Unable to handle kernel NULL pointer dereference at virtual address 00000008
  pgd = fffffe0000050000
  [00000008] *pgd=00000043fba00003, *pmd=00000043fba00003, *pte=00e0000078010407
  Internal error: Oops: 96000006 [#1] SMP
  Modules linked in:
  CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.15.0-rc864k+ #44
  task: fffffe03bc040000 ti: fffffe03bc080000 task.ti: fffffe03bc080000
  PC is at __list_add+0x10/0xd4
  LR is at free_one_page+0x270/0x638
  ...
  Call trace:
    __list_add+0x10/0xd4
    free_one_page+0x26c/0x638
    __free_pages_ok.part.52+0x84/0xbc
    __free_pages+0x74/0xbc
    init_cma_reserved_pageblock+0xe8/0x104
    cma_init_reserved_areas+0x190/0x1e4
    do_one_initcall+0xc4/0x154
    kernel_init_freeable+0x204/0x2a8
    kernel_init+0xc/0xd4

This happens because init_cma_reserved_pageblock() calls
__free_one_page() with pageblock_order as page order but it is bigger
than MAX_ORDER.  This in turn causes accesses past zone->free_list[].

Fix the problem by changing init_cma_reserved_pageblock() such that it
splits pageblock into individual MAX_ORDER pages if pageblock is bigger
than a MAX_ORDER page.

In cases where !CONFIG_HUGETLB_PAGE_SIZE_VARIABLE, which is all
architectures expect for ia64, powerpc and tile at the moment, the
â\80\9cpageblock_order > MAX_ORDERâ\80\9d condition will be optimised out since both
sides of the operator are constants.  In cases where pageblock size is
variable, the performance degradation should not be significant anyway
since init_cma_reserved_pageblock() is called only at boot time at most
MAX_CMA_AREAS times which by default is eight.

Signed-off-by: Michal Nazarewicz <mina86@mina86.com>
Reported-by: Mark Salter <msalter@redhat.com>
Tested-by: Mark Salter <msalter@redhat.com>
Tested-by: Christopher Covington <cov@codeaurora.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: <stable@vger.kernel.org> [3.5+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Btrfs: fix crash when starting transaction

Often when starting a transaction we commit the currently running transaction,
which can end up writing block group caches when the current process has its
journal_info set to NULL (and not to a transaction). This makes our assertion
at btrfs_check_data_free_space() (current_journal != NULL) fail, resulting
in a crash/hang. Therefore fix it by setting journal_info.

Two different traces of this issue follow below.

1)

    [51502.241936] BTRFS: assertion failed: current->journal_info, file: fs/btrfs/extent-tree.c, line: 3670
    [51502.242213] ------------[ cut here ]------------
    [51502.242493] kernel BUG at fs/btrfs/ctree.h:3964!
    [51502.242669] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
    (...)
    [51502.244010] Call Trace:
    [51502.244010]  [<ffffffffa02bc025>] btrfs_check_data_free_space+0x395/0x3a0 [btrfs]
    [51502.244010]  [<ffffffffa02c3bdc>] btrfs_write_dirty_block_groups+0x4ac/0x640 [btrfs]
    [51502.244010]  [<ffffffffa0357a6a>] commit_cowonly_roots+0x164/0x226 [btrfs]
    [51502.244010]  [<ffffffffa02d53cd>] btrfs_commit_transaction+0x4ed/0xab0 [btrfs]
    [51502.244010]  [<ffffffff8168ec7b>] ? _raw_spin_unlock+0x2b/0x40
    [51502.244010]  [<ffffffffa02d6259>] start_transaction+0x459/0x620 [btrfs]
    [51502.244010]  [<ffffffffa02d67ab>] btrfs_start_transaction+0x1b/0x20 [btrfs]
    [51502.244010]  [<ffffffffa02d73e1>] __unlink_start_trans+0x31/0xe0 [btrfs]
    [51502.244010]  [<ffffffffa02dea67>] btrfs_unlink+0x37/0xc0 [btrfs]
    [51502.244010]  [<ffffffff811bb054>] ? do_unlinkat+0x114/0x2a0
    [51502.244010]  [<ffffffff811baebc>] vfs_unlink+0xcc/0x150
    [51502.244010]  [<ffffffff811bb1a0>] do_unlinkat+0x260/0x2a0
    [51502.244010]  [<ffffffff811a9ef4>] ? filp_close+0x64/0x90
    [51502.244010]  [<ffffffff810aaea6>] ? trace_hardirqs_on_caller+0x16/0x1e0
    [51502.244010]  [<ffffffff81349cab>] ? trace_hardirqs_on_thunk+0x3a/0x3f
    [51502.244010]  [<ffffffff811be9eb>] SyS_unlinkat+0x1b/0x40
    [51502.244010]  [<ffffffff81698452>] system_call_fastpath+0x16/0x1b
    [51502.244010] Code: 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55 89 f1 48 c7 c2 71 13 36 a0 48 89 fe 31 c0 48 c7 c7 b8 43 36 a0 48 89 e5 e8 5d b0 32 e1 <0f> 0b 0f 1f 44 00 00 55 b9 11 00 00 00 48 89 e5 41 55 49 89 f5
    [51502.244010] RIP  [<ffffffffa03575da>] assfail.constprop.88+0x1e/0x20 [btrfs]

2)

    [25405.097230] BTRFS: assertion failed: current->journal_info, file: fs/btrfs/extent-tree.c, line: 3670
    [25405.097488] ------------[ cut here ]------------
    [25405.097767] kernel BUG at fs/btrfs/ctree.h:3964!
    [25405.097940] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
    (...)
    [25405.100008] Call Trace:
    [25405.100008]  [<ffffffffa02bc025>] btrfs_check_data_free_space+0x395/0x3a0 [btrfs]
    [25405.100008]  [<ffffffffa02c3bdc>] btrfs_write_dirty_block_groups+0x4ac/0x640 [btrfs]
    [25405.100008]  [<ffffffffa035755a>] commit_cowonly_roots+0x164/0x226 [btrfs]
    [25405.100008]  [<ffffffffa02d53cd>] btrfs_commit_transaction+0x4ed/0xab0 [btrfs]
    [25405.100008]  [<ffffffff8109c170>] ? bit_waitqueue+0xc0/0xc0
    [25405.100008]  [<ffffffffa02d6259>] start_transaction+0x459/0x620 [btrfs]
    [25405.100008]  [<ffffffffa02d67ab>] btrfs_start_transaction+0x1b/0x20 [btrfs]
    [25405.100008]  [<ffffffffa02e3407>] btrfs_create+0x47/0x210 [btrfs]
    [25405.100008]  [<ffffffffa02d74cc>] ? btrfs_permission+0x3c/0x80 [btrfs]
    [25405.100008]  [<ffffffff811bc63b>] vfs_create+0x9b/0x130
    [25405.100008]  [<ffffffff811bcf19>] do_last+0x849/0xe20
    [25405.100008]  [<ffffffff811b9409>] ? link_path_walk+0x79/0x820
    [25405.100008]  [<ffffffff811bd5b5>] path_openat+0xc5/0x690
    [25405.100008]  [<ffffffff810ab07d>] ? trace_hardirqs_on+0xd/0x10
    [25405.100008]  [<ffffffff811cdcd2>] ? __alloc_fd+0x32/0x1d0
    [25405.100008]  [<ffffffff811be2a3>] do_filp_open+0x43/0xa0
    [25405.100008]  [<ffffffff811cddf1>] ? __alloc_fd+0x151/0x1d0
    [25405.100008]  [<ffffffff811abcfc>] do_sys_open+0x13c/0x230
    [25405.100008]  [<ffffffff810aaea6>] ? trace_hardirqs_on_caller+0x16/0x1e0
    [25405.100008]  [<ffffffff811abe12>] SyS_open+0x22/0x30
    [25405.100008]  [<ffffffff81698452>] system_call_fastpath+0x16/0x1b
    [25405.100008] Code: 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55 89 f1 48 c7 c2 51 13 36 a0 48 89 fe 31 c0 48 c7 c7 d0 43 36 a0 48 89 e5 e8 6d b5 32 e1 <0f> 0b 0f 1f 44 00 00 55 b9 11 00 00 00 48 89 e5 41 55 49 89 f5
    [25405.100008] RIP  [<ffffffffa03570ca>] assfail.constprop.88+0x1e/0x20 [btrfs]

Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Chris Mason <clm@fb.com>

Btrfs: fix btrfs_print_leaf for skinny metadata

We wouldn't actuall print the extent information if we had a skinny metadata
item, this fixes that. Thanks,

Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>

Btrfs: fix race of using total_bytes_pinned

This percpu counter @total_bytes_pinned is introduced to skip unnecessary
operations of 'commit transaction', it accounts for those space we may free
but are stuck in delayed refs.

And we zero out @space_info->total_bytes_pinned every transaction period so
we have a better idea of how much space we'll actually free up by committing
this transaction. However, we do the 'zero out' part a little earlier, before
we actually unpin space, so we end up returning ENOSPC when we actually have
free space that's just unpinned from committing transaction.

xfstests/generic/074 complained then.

This fixes it by actually accounting the percpu pinned number when 'unpin',
and since it's protected by space_info->lock, the race is gone now.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>

btrfs: use E2BIG instead of EIO if compression does not help

Return codes got updated in 4cafa37b17092862546a2e859094a09d2586751f
(btrfs: return errno instead of -1 from compression)
lzo wrapper returns E2BIG in this case, do the same for zlib.

Signed-off-by: David Sterba <dsterba@suse.cz>

btrfs: remove stale comment from btrfs_flush_all_pending_stuffs

Commit 2c0ef5780037660d08ca07810eae7e583e2ae315 (Btrfs: rework qgroup
accounting) removed the qgroup accounting after delayed refs.

Signed-off-by: David Sterba <dsterba@suse.cz>

Btrfs: fix use-after-free when cloning a trailing file hole

The transaction handle was being used after being freed.

Cc: Chris Mason <clm@fb.com>
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Chris Mason <clm@fb.com>

btrfs: fix null pointer dereference in btrfs_show_devname when name is null

dev->name is null but missing flag is not set.
Strictly speaking the missing flag should have been set, but there
are more places where code just checks if name is null. For now this
patch does the same.

stack:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000064
IP: [<ffffffffa0228908>] btrfs_show_devname+0x58/0xf0 [btrfs]

[<ffffffff81198879>] show_vfsmnt+0x39/0x130
[<ffffffff81178056>] m_show+0x16/0x20
[<ffffffff8117d706>] seq_read+0x296/0x390
[<ffffffff8115aa7d>] vfs_read+0x9d/0x160
[<ffffffff8115b549>] SyS_read+0x49/0x90
[<ffffffff817abe52>] system_call_fastpath+0x16/0x1b

reproducer:
mkfs.btrfs -draid1 -mraid1 /dev/sdg1 /dev/sdg2
btrfstune -S 1 /dev/sdg1
modprobe -r btrfs && modprobe btrfs
mount -o degraded /dev/sdg1 /btrfs
btrfs dev add /dev/sdg3 /btrfs

Signed-off-by: Anand Jain <Anand.Jain@oracle.com>
Signed-off-by: Chris Mason <clm@fb.com>

btrfs: fix null pointer dereference in clone_fs_devices when name is null

when one of the device path is missing btrfs_device name is null. So this
patch will check for that.

stack:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
IP: [<ffffffff812e18c0>] strlen+0x0/0x30
[<ffffffffa01cd92a>] ? clone_fs_devices+0xaa/0x160 [btrfs]
[<ffffffffa01cdcf7>] btrfs_init_new_device+0x317/0xca0 [btrfs]
[<ffffffff81155bca>] ? __kmalloc_track_caller+0x15a/0x1a0
[<ffffffffa01d6473>] btrfs_ioctl+0xaa3/0x2860 [btrfs]
[<ffffffff81132a6c>] ? handle_mm_fault+0x48c/0x9c0
[<ffffffff81192a61>] ? __blkdev_put+0x171/0x180
[<ffffffff817a784c>] ? __do_page_fault+0x4ac/0x590
[<ffffffff81193426>] ? blkdev_put+0x106/0x110
[<ffffffff81179175>] ? mntput+0x35/0x40
[<ffffffff8116d4b0>] do_vfs_ioctl+0x460/0x4a0
[<ffffffff8115c72e>] ? ____fput+0xe/0x10
[<ffffffff81068033>] ? task_work_run+0xb3/0xd0
[<ffffffff8116d547>] SyS_ioctl+0x57/0x90
[<ffffffff817a793e>] ? do_page_fault+0xe/0x10
[<ffffffff817abe52>] system_call_fastpath+0x16/0x1b

reproducer:
mkfs.btrfs -draid1 -mraid1 /dev/sdg1 /dev/sdg2
btrfstune -S 1 /dev/sdg1
modprobe -r btrfs && modprobe btrfs
mount -o degraded /dev/sdg1 /btrfs
btrfs dev add /dev/sdg3 /btrfs

Signed-off-by: Anand Jain <Anand.Jain@oracle.com>
Signed-off-by: Chris Mason <clm@fb.com>

btrfs: fix nossd and ssd_spread mount option regression

The commit

6be1e5c btrfs: Cleanup the btrfs_parse_options for remount.

broke ssd options quite badly; it stopped making ssd_spread
imply ssd, and it made "nossd" unsettable.

Put things back at least as well as they were before
(though ssd mount option handling is still pretty odd:
# mount -o "nossd,ssd_spread" works?)

Reported-by: Roman Mamedov <rm@romanrm.net>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Chris Mason <clm@fb.com>

Btrfs: fix race between balance recovery and root deletion

Balance recovery is called when RW mounting or remounting from
RO to RW, it is called to finish roots merging.

When doing balance recovery, relocation root's corresponding
fs root(whose root refs is 0) might be destroyed by cleaner
thread, this will make btrfs fail to mount.

Fix this problem by holding @cleaner_mutex when doing balance
recovery.

Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>