Merge tag 'spi-v3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi updates from Mark Brown:
"Business as usual for SPI - some new drivers, lots of fixes and
updates to existing drivers plus some new framework features. Notable
changes are:
- Support for dual and quad data lines, commonly used by flash chips
to improve performance, from Wang Yuhang.
- Factored out a common pattern for runtime PM implementation into
the core saving a bunch of code.
- A particularly nice set of updates to the ep93xx driver from
H Hartley Sweeten, modernising it and reducing the code size a lot.
- New drivers for Blackfin v3, EFM32, Freescale DSPI and TI QSPI"
* tag 'spi-v3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: (133 commits)
spi/qspi: fix missing unlock on error in ti_qspi_start_transfer_one()
spi: quad: fix the name of DT property
spi: core: Fix spi_register_master error handling
spi: efm32: Fix build error
spi: altera: Use DIV_ROUND_UP to calculate hw->bytes_per_word
spi: rspi: Add spi_master_get() call to prevent use after free
spi: quad: Make DT properties optional
spi: quad: Fix missing return
spi: Use dev_get_drvdata at appropriate places
spi: use dev_get_platdata()
spi: nuc900: Fix mode_bits setting
spi: simplify devm_request_mem_region/devm_ioremap
spi: altera: Simplify altera_spi_txrx implementation for noirq case
spi: spi-rspi: fix inconsistent spin_lock_irqsave
spi/qspi: Add compatible string for am4372.
spi/qspi: Fix device table entry
spi/sirf: fix the misunderstanding about len of spi_transfer
spi/qspi: Add dual/quad spi read support
spi: sirf: fix error return code in spi_sirfsoc_probe()
spi: bcm2835: Add spi_master_get() call to prevent use after free
...
Merge tag 'regmap-v3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap
Pull regmap updates from Mark Brown:
"A quiet release for regmap, some cleanups, fixes and:
- Improved node coalescing for rbtree, reducing memory usage and
improving performance during syncs.
- Support for registering multiple register patches.
- A quirk for handling interrupts that need to be clear when masked
in regmap-irq"
* tag 'regmap-v3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
regmap: rbtree: Make cache_present bitmap per node
regmap: rbtree: Reduce number of nodes, take 2
regmap: rbtree: Simplify adjacent node look-up
regmap: debugfs: Fix continued read from registers file
regcache-rbtree: Fix reg_stride != 1
regmap: Allow multiple patches to be registered
regmap: regcache: allow read-only regs to be cached
regmap: fix regcache_reg_present() for empty cache
regmap: core: allow a virtual range to cover its own data window
regmap: irq: document mask/wake_invert flags
regmap: irq: make flags bool and put them in a bitfield
regmap: irq: Allow to acknowledge masked interrupts during initialization
regmap: Provide __acquires/__releases annotations
Merge lockref infrastructure code by me and Waiman Long.
I already merged some of the preparatory patches that didn't actually do
any semantic changes earlier, but this merges the actual _reason_ for
those preparatory patches.
The "lockref" structure is a combination "spinlock and reference count"
that allows optimized reference count accesses. In particular, it
guarantees that the reference count will be updated AS IF the spinlock
was held, but using atomic accesses that cover both the reference count
and the spinlock words, we can often do the update without actually
having to take the lock.
This allows us to avoid the nastiest cases of spinlock contention on
large machines under heavy pathname lookup loads. When updating the
dentry reference counts on a large system, we'll still end up with the
cache line bouncing around, but that's much less noticeable than
actually having to spin waiting for the lock.
* lockref:
lockref: implement lockless reference count updates using cmpxchg()
lockref: uninline lockref helper functions
vfs: reimplement d_rcu_to_refcount() using lockref_get_or_lock()
vfs: use lockref_get_not_zero() for optimistic lockless dget_parent()
lockref: add 'lockref_get_or_lock() helper
lockref: implement lockless reference count updates using cmpxchg()
Instead of taking the spinlock, the lockless versions atomically check
that the lock is not taken, and do the reference count update using a
cmpxchg() loop. This is semantically identical to doing the reference
count update protected by the lock, but avoids the "wait for lock"
contention that you get when accesses to the reference count are
contended.
Note that a "lockref" is absolutely _not_ equivalent to an atomic_t.
Even when the lockref reference counts are updated atomically with
cmpxchg, the fact that they also verify the state of the spinlock means
that the lockless updates can never happen while somebody else holds the
spinlock.
So while "lockref_put_or_lock()" looks a lot like just another name for
"atomic_dec_and_lock()", and both optimize to lockless updates, they are
fundamentally different: the decrement done by atomic_dec_and_lock() is
truly independent of any lock (as long as it doesn't decrement to zero),
so a locked region can still see the count change.
The lockref structure, in contrast, really is a *locked* reference
count. If you hold the spinlock, the reference count will be stable and
you can modify the reference count without using atomics, because even
the lockless updates will see and respect the state of the lock.
In order to enable the cmpxchg lockless code, the architecture needs to
do three things:
(1) Make sure that the "arch_spinlock_t" and an "unsigned int" can fit
in an aligned u64, and have a "cmpxchg()" implementation that works
on such a u64 data type.
(2) define a helper function to test for a spinlock being unlocked
("arch_spin_value_unlocked()")
(3) select the "ARCH_USE_CMPXCHG_LOCKREF" config variable in its
Kconfig file.
This enables it for x86-64 (but not 32-bit, we'd need to make sure
cmpxchg() turns into the proper cmpxchg8b in order to enable it for
32-bit mode).
They aren't very good to inline, since they already call external
functions (the spinlock code), and we're going to create rather more
complicated versions of them that can do the reference count updates
locklessly.
vfs: reimplement d_rcu_to_refcount() using lockref_get_or_lock()
This moves __d_rcu_to_refcount() from <linux/dcache.h> into fs/namei.c
and re-implements it using the lockref infrastructure instead. It also
adds a lot of comments about what is actually going on, because turning
a dentry that was looked up using RCU into a long-lived reference
counted entry is one of the more subtle parts of the rcu walk.
We also used to be _particularly_ subtle in unlazy_walk() where we
re-validate both the dentry and its parent using the same sequence
count. We used to do it by nesting the locks and then verifying the
sequence count just once.
That was silly, because nested locking is expensive, but the sequence
count check is not. So this just re-validates the dentry and the parent
separately, avoiding the nested locking, and making the lockref lookup
possible.
Acked-by: Waiman Long <waiman.long@hp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Waiman Long [Mon, 2 Sep 2013 18:29:22 +0000 (11:29 -0700)]
vfs: use lockref_get_not_zero() for optimistic lockless dget_parent()
A valid parent pointer is always going to have a non-zero reference
count, but if we look up the parent optimistically without locking, we
have to protect against the (very unlikely) race against renaming
changing the parent from under us.
We do that by using lockref_get_not_zero(), and then re-checking the
parent pointer after getting a valid reference.
[ This is a re-implementation of a chunk from the original patch by
Waiman Long: "dcache: Enable lockless update of dentry's refcount".
I've completely rewritten the patch-series and split it up, but I'm
attributing this part to Waiman as it's close enough to his earlier
patch - Linus ]
Signed-off-by: Waiman Long <Waiman.Long@hp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This behaves like "lockref_get_not_zero()", but instead of doing nothing
if the count was zero, it returns with the lock held.
This allows callers to revalidate the lockref-protected data structure
if required even if the count was zero to begin with, and possibly
increment the count if it passes muster.
In particular, the dentry code wants this when it wants to turn an
RCU-protected dentry into a stable refcounted one: if the dentry count
it zero, but the sequence number still validates the dentry, we can take
a reference to it.
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fix from James Bottomley:
"This is a bug fix for the pm80xx driver. It turns out that when the
new hardware support was added in 3.10 the IO command size was kept at
the old hard coded value. This means that the driver attaches to some
new cards and then simply hangs the system"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
[SCSI] pm80xx: fix Adaptec 71605H hang
Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 boot fix from Peter Anvin:
"A single very small boot fix for very large memory systems (> 0.5T)"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: Fix boot crash with DEBUG_PAGE_ALLOC=y and more than 512G RAM
The previous property name spi-tx-nbits and spi-rx-nbits looks not
human-readable. To make it consistent with other devices, using property
name spi-tx-bus-width and spi-rx-bus-width instead of the previous one
specify the number of data wires that spi controller will work in.
Add the specification in spi-bus.txt.
Signed-off-by: wangyuhang <wangyuhang2014@gmail.com> Signed-off-by: Mark Brown <broonie@linaro.org>
Axel Lin [Sat, 31 Aug 2013 12:25:52 +0000 (20:25 +0800)]
spi: core: Fix spi_register_master error handling
In the case spi_master_initialize_queue() fails, current code calls
device_unregister() before return error from spi_register_master().
However, all the drivers call spi_master_put() in the error path if
spi_register_master() fails. Thus we should call device_del() rather than
device_unregister() before return error from spi_register_master().
This also makes all the spi_register_master() error handling consistent,
because all other error paths of spi_register_master() expect drivers to
call spi_master_put() if spi_register_master() fails.
Signed-off-by: Axel Lin <axel.lin@ingics.com> Signed-off-by: Mark Brown <broonie@linaro.org>
Axel Lin [Thu, 29 Aug 2013 15:41:20 +0000 (23:41 +0800)]
spi: altera: Use DIV_ROUND_UP to calculate hw->bytes_per_word
The Altera SPI hardware can be configured to support data width from 1 to 32
since Quartus II 8.1. To avoid truncation by integer division, use DIV_ROUND_UP
to calculate hw->bytes_per_word.
Signed-off-by: Axel Lin <axel.lin@ingics.com> Acked-by: Thomas Chou <thomas@wytron.com.tw> Signed-off-by: Mark Brown <broonie@linaro.org>
Axel Lin [Sat, 31 Aug 2013 11:42:56 +0000 (19:42 +0800)]
spi: rspi: Add spi_master_get() call to prevent use after free
In rspi_remove(), current code dereferences rspi after spi_unregister_master(),
thus add an extra spi_master_get() call is necessary to prevent use after free.
Current code already has an extra spi_master_put() call in rspi_remove(), so
this patch just adds a spi_master_get() call rather than a spi_master_get() with
spi_master_put() calls.
Signed-off-by: Axel Lin <axel.lin@ingics.com> Signed-off-by: Mark Brown <broonie@linaro.org>
1) There was a simplification in the ipv6 ndisc packet sending
attempted here, which avoided using memory accounting on the
per-netns ndisc socket for sending NDISC packets. It did fix some
important issues, but it causes regressions so it gets reverted here
too. Specifically, the problem with this change is that the IPV6
output path really depends upon there being a valid skb->sk
attached.
The reason we want to do this change in some form when we figure out
how to do it right, is that if a device goes down the ndisc_sk
socket send queue will fill up and block NDISC packets that we want
to send to other devices too. That's really bad behavior.
Hopefully Thomas can come up with a better version of this change.
2) Fix a severe TCP performance regression by reverting a change made
to dev_pick_tx() quite some time ago. From Eric Dumazet.
3) TIPC returns wrongly signed error codes, fix from Erik Hugne.
4) Fix OOPS when doing IPSEC over ipv4 tunnels due to orphaning the
skb->sk too early. Fix from Li Hongjun.
5) RAW ipv4 sockets can use the wrong routing key during lookup, from
Chris Clark.
6) Similar to #1 revert an older change that tried to use plain
alloc_skb() for SYN/ACK TCP packets, this broke the netfilter owner
mark which needs to see the skb->sk for such frames. From Phil
Oester.
7) BNX2x driver bug fixes from Ariel Elior and Yuval Mintz,
specifically in the handling of virtual functions.
8) IPSEC path error propagations to sockets is not done properly when
we have v4 in v6, and v6 in v4 type rules. Fix from Hannes Frederic
Sowa.
9) Fix missing channel context release in mac80211, from Johannes Berg.
10) Fix network namespace handing wrt. SCM_RIGHTS, from Andy
Lutomirski.
11) Fix usage of bogus NAPI weight in jme, netxen, and ps3_gelic
drivers. From Michal Schmidt.
12) Hopefully a complete and correct fix for the genetlink dump locking
and module reference counting. From Pravin B Shelar.
13) sk_busy_loop() must do a cpu_relax(), from Eliezer Tamir.
14) Fix handling of timestamp offset when restoring a snapshotted TCP
socket. From Andrew Vagin.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (44 commits)
net: fec: fix time stamping logic after napi conversion
net: bridge: convert MLDv2 Query MRC into msecs_to_jiffies for max_delay
mISDN: return -EINVAL on error in dsp_control_req()
net: revert 06087dd51ea ("net: dev_pick_tx() fix")
Revert "ipv6: Don't depend on per socket memory for neighbour discovery messages"
ipv4 tunnels: fix an oops when using ipip/sit with IPsec
tipc: set sk_err correctly when connection fails
tcp: tcp_make_synack() should use sock_wmalloc
bridge: separate querier and query timer into IGMP/IPv4 and MLD/IPv6 ones
ipv6: Don't depend on per socket memory for neighbour discovery messages
ipv4: sendto/hdrincl: don't use destination address found in header
tcp: don't apply tsoffset if rcv_tsecr is zero
tcp: initialize rcv_tstamp for restored sockets
net: xilinx: fix memleak
net: usb: Add HP hs2434 device to ZLP exception table
net: add cpu_relax to busy poll loop
net: stmmac: fixed the pbl setting with DT
genl: Hold reference on correct module while netlink-dump.
genl: Fix genl dumpit() locking.
xfrm: Fix potential null pointer dereference in xdst_queue_output
...
Mark Brown [Fri, 30 Aug 2013 22:19:40 +0000 (23:19 +0100)]
spi: quad: Make DT properties optional
The addition SPI quad support made the DT properties mandatory, breaking
compatibility with existing systems. Fix that by making them optional,
also improving the error messages while we're at it.
Signed-off-by: Mark Brown <broonie@linaro.org> Tested-by: Stephen Warren <swarren@nvidia.com>
Linus Torvalds [Sat, 31 Aug 2013 00:05:02 +0000 (17:05 -0700)]
Merge tag 'sound-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"This contains two Oops fixes (opti9xx and HD-audio) and a simple fixup
for an Acer laptop. All marked as stable patches"
* tag 'sound-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: opti9xx: Fix conflicting driver object name
ALSA: hda - Fix NULL dereference with CONFIG_SND_DYNAMIC_MINORS=n
ALSA: hda - Add inverted digital mic fixup for Acer Aspire One
Linus Torvalds [Fri, 30 Aug 2013 23:18:59 +0000 (16:18 -0700)]
Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Olof Johansson:
"Two straggling fixes that I had missed as they were posted a couple of
weeks ago, causing problems with interrupts (breaking them completely)
on the CSR SiRF platforms"
* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
arm: prima2: drop nr_irqs in mach as we moved to linear irqdomain
irqchip: sirf: move from legacy mode to linear irqdomain
Linus Torvalds [Fri, 30 Aug 2013 23:17:10 +0000 (16:17 -0700)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"Since we are getting to the pointy end, one i915 black screen on some
machines, and one vmwgfx stop userspace ability to nuke the VM,
There might be one or two ati or nouveau fixes trickle in before
final, but I think this should pretty much be it"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm/vmwgfx: Split GMR2_REMAP commands if they are to large
drm/i915: ivb: fix edp voltage swing reg val
Linus Torvalds [Fri, 30 Aug 2013 23:15:52 +0000 (16:15 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input layer updates from Dmitry Torokhov:
"Just a couple of new IDs in Wacom and xpad drivers, i8042 is now
disabled on ARC, and data checks in Elantech driver that were overly
relaxed by the previous patch are now tightened"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: i8042 - disable the driver on ARC platforms
Input: xpad - add signature for Razer Onza Classic Edition
Input: elantech - fix packet check for v3 and v4 hardware
Input: wacom - add support for 0x300 and 0x301
Richard Cochran [Fri, 30 Aug 2013 18:28:10 +0000 (20:28 +0200)]
net: fec: fix time stamping logic after napi conversion
Commit e1190862 "net: fec: add napi support to improve proformance"
converted the fec driver to the napi model. However, that commit
forgot to remove the call to skb_defer_rx_timestamp which is only
needed in non-napi drivers.
(The function napi_gro_receive eventually calls netif_receive_skb,
which in turn calls skb_defer_rx_timestamp.)
This patch should also be applied to the 3.9 and 3.10 kernels.
Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Thu, 29 Aug 2013 21:55:05 +0000 (23:55 +0200)]
net: bridge: convert MLDv2 Query MRC into msecs_to_jiffies for max_delay
While looking into MLDv1/v2 code, I noticed that bridging code does
not convert it's max delay into jiffies for MLDv2 messages as we do
in core IPv6' multicast code.
RFC3810, 5.1.3. Maximum Response Code says:
The Maximum Response Code field specifies the maximum time allowed
before sending a responding Report. The actual time allowed, called
the Maximum Response Delay, is represented in units of milliseconds,
and is derived from the Maximum Response Code as follows: [...]
As we update timers that work with jiffies, we need to convert it.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Linus Lüssing <linus.luessing@web.de> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
commit 06087dd51eaf8c ("net: dev_pick_tx() fix") and commit b1d3022e4097 ("bonding: refine IFF_XMIT_DST_RELEASE capability")
are quite incompatible : Queue selection is disabled because skb
dst was dropped before entering bonding device.
This causes major performance regression, mainly because TCP packets
for a given flow can be sent to multiple queues.
This is particularly visible when using the new FQ packet scheduler
with MQ + FQ setup on the slaves.
We can safely revert the first commit now that 6fa22e0c261dc
("net: Split core bits of netdev_pick_tx into __netdev_pick_tx")
properly caps the queue_index.
Reported-by: Xi Wang <xii@google.com> Diagnosed-by: Xi Wang <xii@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Alexander Duyck <alexander.h.duyck@intel.com> Cc: Denys Fedorysychenko <nuclearcat@nuclearcat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
It seems to cause regressions, and in particular the output path
really depends upon there being a socket attached to skb->sk for
checks such as sk_mc_loop(skb->sk) for example. See ip6_output_finish2().
Reported-by: Stephen Warren <swarren@wwwdotorg.org> Reported-by: Fabio Estevam <festevam@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Li Hongjun [Wed, 28 Aug 2013 09:54:50 +0000 (11:54 +0200)]
ipv4 tunnels: fix an oops when using ipip/sit with IPsec
Since commit af1b379eaf20 (ip_tunnel: push generic protocol handling to
ip_tunnel module.), an Oops is triggered when an xfrm policy is configured on
an IPv4 over IPv4 tunnel.
xfrm4_policy_check() calls __xfrm_policy_check2(), which uses skb_dst(skb). But
this field is NULL because iptunnel_pull_header() calls skb_dst_drop(skb).
Signed-off-by: Li Hongjun <hongjun.li@6wind.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Erik Hugne [Wed, 28 Aug 2013 07:29:58 +0000 (09:29 +0200)]
tipc: set sk_err correctly when connection fails
Should a connect fail, if the publication/server is unavailable or
due to some other error, a positive value will be returned and errno
is never set. If the application code checks for an explicit zero
return from connect (success) or a negative return (failure), it
will not catch the error and subsequent send() calls will fail as
shown from the strace snippet below.
Phil Oester [Tue, 27 Aug 2013 23:41:40 +0000 (16:41 -0700)]
tcp: tcp_make_synack() should use sock_wmalloc
In commit f31432a8 (tcp: tcp_make_synack() can use alloc_skb()), Eric changed
the call to sock_wmalloc in tcp_make_synack to alloc_skb. In doing so,
the netfilter owner match lost its ability to block the SYNACK packet on
outbound listening sockets. Revert the change, restoring the owner match
functionality.
This closes netfilter bugzilla #847.
Signed-off-by: Phil Oester <kernel@linuxace.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Lüssing [Fri, 30 Aug 2013 15:28:17 +0000 (17:28 +0200)]
bridge: separate querier and query timer into IGMP/IPv4 and MLD/IPv6 ones
Currently we would still potentially suffer multicast packet loss if there
is just either an IGMP or an MLD querier: For the former case, we would
possibly drop IPv6 multicast packets, for the latter IPv4 ones. This is
because we are currently assuming that if either an IGMP or MLD querier
is present that the other one is present, too.
This patch makes the behaviour and fix added in
"bridge: disable snooping if there is no querier" (adc2c3e4165b)
to also work if there is either just an IGMP or an MLD querier on the
link: It refines the deactivation of the snooping to be protocol
specific by using separate timers for the snooped IGMP and MLD queries
as well as separate timers for our internal IGMP and MLD queriers.
Signed-off-by: Linus Lüssing <linus.luessing@web.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 30 Aug 2013 00:03:48 +0000 (17:03 -0700)]
Merge branch 'for-3.11-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fix from Tejun Heo:
"During the percpu reference counting update which was merged during
v3.11-rc1, the cgroup destruction path was updated so that a cgroup in
the process of dying may linger on the children list, which was
necessary as the cgroup should still be included in child/descendant
iteration while percpu ref is being killed.
Unfortunately, I forgot to update cgroup destruction path accordingly
and cgroup destruction may fail spuriously with -EBUSY due to
lingering dying children even when there's no live child left - e.g.
"rmdir parent/child parent" will usually fail.
This can be easily fixed by iterating through the children list to
verify that there's no live child left. While this is very late in
the release cycle, this bug is very visible to userland and I believe
the fix is relatively safe.
Thanks Hugh for spotting and providing fix for the issue"
* 'for-3.11-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: fix rmdir EBUSY regression in 3.11
Linus Torvalds [Fri, 30 Aug 2013 00:02:48 +0000 (17:02 -0700)]
Merge branch 'for-3.11-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue fix from Tejun Heo:
"This contains one fix which could lead to system-wide lockup on
!PREEMPT kernels. It's very late in the cycle but this definitely is
a -stable material.
The problem is that workqueue worker tasks may process unlimited
number of work items back-to-back without every yielding inbetween.
This usually isn't noticeable but a work item which re-queues itself
waiting for someone else to do something can deadlock with
stop_machine. stop_machine will ensure nothing else happens on all
other cpus and the requeueing work item will reqeueue itself
indefinitely without ever yielding and thus preventing the CPU from
entering stop_machine.
Kudos to Jamie Liu for spotting and diagnosing the problem. This can
be trivially fixed by adding cond_resched() after processing each work
item"
* 'for-3.11-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: cond_resched() after processing each work item
Dave Airlie [Thu, 29 Aug 2013 23:02:57 +0000 (09:02 +1000)]
Merge tag 'drm-intel-fixes-2013-08-30' of git://people.freedesktop.org/~danvet/drm-intel into drm-fixes
Just a one-line patch to fix a black screen issue on rare ivb machines,
cc: stable. Normally I'd just shovel this into the -next pull request this
late in the -rc cycle, but Linus was making noises about not getting real
fixes which are cc: stable. So here we go ;-)
* tag 'drm-intel-fixes-2013-08-30' of git://people.freedesktop.org/~danvet/drm-intel:
drm/i915: ivb: fix edp voltage swing reg val
This fixes eDP link-training failures and cases where all voltage swing
/pre-emphasis levels were tried and failed during clock recovery and -
as a fallback - we go on to do channel equalization with the last voltage
swing/pre-emphasis level which will succeed. Both issues can lead to a
blank screen.
v2:
- improve commit message
CC: stable@vger.kernel.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=64880 Tested-by: Jeremy Moles <cubicool@gmail.com> Signed-off-by: Imre Deak <imre.deak@intel.com> Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
David S. Miller [Thu, 29 Aug 2013 20:05:30 +0000 (16:05 -0400)]
Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
Steffen Klassert says:
====================
This pull request fixes some issues that arise when 6in4 or 4in6 tunnels
are used in combination with IPsec, all from Hannes Frederic Sowa and a
null pointer dereference when queueing packets to the policy hold queue.
1) We might access the local error handler of the wrong address family if
6in4 or 4in6 tunnel is protected by ipsec. Fix this by addind a pointer
to the correct local_error to xfrm_state_afinet.
2) Add a helper function to always refer to the correct interpretation
of skb->sk.
3) Call skb_reset_inner_headers to record the position of the inner headers
when adding a new one in various ipv6 tunnels. This is needed to identify
the addresses where to send back errors in the xfrm layer.
4) Dereference inner ipv6 header if encapsulated to always call the
right error handler.
5) Choose protocol family by skb protocol to not call the wrong
xfrm{4,6}_local_error handler in case an ipv6 sockets is used
in ipv4 mode.
6) Partly revert "xfrm: introduce helper for safe determination of mtu"
because this introduced pmtu discovery problems.
7) Set skb->protocol on tcp, raw and ip6_append_data genereated skbs.
We need this to get the correct mtu informations in xfrm.
8) Fix null pointer dereference in xdst_queue_output.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Graf [Tue, 27 Aug 2013 23:07:25 +0000 (01:07 +0200)]
ipv6: Don't depend on per socket memory for neighbour discovery messages
Allocating skbs when sending out neighbour discovery messages
currently uses sock_alloc_send_skb() based on a per net namespace
socket and thus share a socket wmem buffer space.
If a netdevice is temporarily unable to transmit due to carrier
loss or for other reasons, the queued up ndisc messages will cosnume
all of the wmem space and will thus prevent from any more skbs to
be allocated even for netdevices that are able to transmit packets.
The number of neighbour discovery messages sent is very limited,
simply use alloc_skb() and don't depend on any socket wmem space any
longer.
This patch has orginally been posted by Eric Dumazet in a modified
form.
Signed-off-by: Thomas Graf <tgraf@suug.ch> Cc: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Reported-by: Chris Clark <chris.clark@alcatel-lucent.com> Bisected-by: Chris Clark <chris.clark@alcatel-lucent.com> Tested-by: Chris Clark <chris.clark@alcatel-lucent.com> Suggested-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Chris Clark <chris.clark@alcatel-lucent.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Vagin [Tue, 27 Aug 2013 08:21:55 +0000 (12:21 +0400)]
tcp: don't apply tsoffset if rcv_tsecr is zero
The zero value means that tsecr is not valid, so it's a special case.
tsoffset is used to customize tcp_time_stamp for one socket.
tsoffset is usually zero, it's used when a socket was moved from one
host to another host.
Currently this issue affects logic of tcp_rcv_rtt_measure_ts. Due to
incorrect value of rcv_tsecr, tcp_rcv_rtt_measure_ts sets rto to
TCP_RTO_MAX.
Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: James Morris <jmorris@namei.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Patrick McHardy <kaber@trash.net> Reported-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Barry Song [Tue, 6 Aug 2013 05:37:13 +0000 (13:37 +0800)]
irqchip: sirf: move from legacy mode to linear irqdomain
the series of patches for irqdomain core in 3.11 has broken sirf
irq which uses legacy mapping. all users fail in the new kernel
while setupping irq.
this patch moves to linear irqdomain and drop old legacy irqdomain
codes since we don't need it any more, and at the same time, it
also fixes the broken interrupts of sirfsoc in 3.11.
on the other hand, we actually only have 64 interrupt sources for
prima2 and atlas6, but there are 128 interrupt souces for marco
which uses GIC. in the legacy codes, sirf gpio also uses legacy
irqdomain, so to make gpio interrupt mapping not depend on the
prima2/atlas6/marco an use unified marco,we enlarge prima2/atlas6
interrupt number to 128. here we don't need this workaround any
more as sirf gpio also moved to linear mode before. so we move
SIRFSOC_NUM_IRQS back to 64 too.
Signed-off-by: Barry Song <Baohua.Song@csr.com> Signed-off-by: Olof Johansson <olof@lixom.net>
Hugh Dickins [Wed, 28 Aug 2013 23:31:23 +0000 (16:31 -0700)]
cgroup: fix rmdir EBUSY regression in 3.11
On 3.11-rc we are seeing cgroup directories left behind when they should
have been removed. Here's a trivial reproducer:
cd /sys/fs/cgroup/memory
mkdir parent parent/child; rmdir parent/child parent
rmdir: failed to remove `parent': Device or resource busy
It's because cgroup_destroy_locked() (step 1 of destruction) leaves
cgroup on parent's children list, letting cgroup_offline_fn() (step 2 of
destruction) remove it; but step 2 is run by work queue, which may not
yet have removed the children when parent destruction checks the list.
Fix that by checking through a non-empty list of children: if every one
of them has already been marked CGRP_DEAD, then it's safe to proceed:
those children are invisible to userspace, and should not obstruct rmdir.
(I didn't see any reason to keep the cgrp->children checks under the
unrelated css_set_lock, so moved them out.)
tj: Flattened nested ifs a bit and updated comment so that it's
correct on both for-3.11-fixes and for-3.12.
Tejun Heo [Wed, 28 Aug 2013 21:33:37 +0000 (17:33 -0400)]
workqueue: cond_resched() after processing each work item
If !PREEMPT, a kworker running work items back to back can hog CPU.
This becomes dangerous when a self-requeueing work item which is
waiting for something to happen races against stop_machine. Such
self-requeueing work item would requeue itself indefinitely hogging
the kworker and CPU it's running on while stop_machine would wait for
that CPU to enter stop_machine while preventing anything else from
happening on all other CPUs. The two would deadlock.
Jamie Liu reports that this deadlock scenario exists around
scsi_requeue_run_queue() and libata port multiplier support, where one
port may exclude command processing from other ports. With the right
timing, scsi_requeue_run_queue() can end up requeueing itself trying
to execute an IO which is asked to be retried while another device has
an exclusive access, which in turn can't make forward progress due to
stop_machine.
Fix it by invoking cond_resched() after executing each work item.
Axel Lin [Thu, 15 Aug 2013 06:11:21 +0000 (14:11 +0800)]
spi: nuc900: Fix mode_bits setting
The code in nuc900_slave_select() supports handling SPI_CS_HIGH.
Thus set SPI_CS_HIGH bit in master->mode_bits to make it work.
Otherwise, spi_setup() will return unsupported mode bits error message if
SPI_CS_HIGH is set in the mode field of struct spi_device.
Signed-off-by: Axel Lin <axel.lin@ingics.com> Signed-off-by: Mark Brown <broonie@linaro.org>