perf annotate: Fix off by one symbol hist size allocation and hit accounting
We were not noticing it because symbol__inc_addr_samples was erroneously
dropping samples that hit the last byte in a function.
Working on a fix for a problem reported by David Miller, Stephane
Eranian and Sorin Dumitru, where addresses < sym->start were causing
problems, I noticed this other problem.
Cc: David Ahern <dsahern@gmail.com> Cc: David Miller <davem@davemloft.net> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sorin Dumitru <dumitru.sorin87@gmail.com> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-pqjaq4cr1xs2xen73pjhbav4@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf probe: Finder fails to resolve function name to address
If DIE entries corresponding to declarations appear before definition
entry, probe finder returns error instead of continuing to look further
for a definition entry.
This patch ensures we reach to the DIE entry corresponding to the
definition and get the function address.
V2: A simpler solution based on Masami's suggestion.
Stephane Eranian [Sat, 17 Mar 2012 22:23:18 +0000 (23:23 +0100)]
perf tools: Fix bug in raw sample parsing
In perf_event__parse_sample(), the array variable was not incremented
by the amount of data used by the raw_data.
That was okay until we added PERF_SAMPLE_BRANCH_STACK which depends on
the array variable pointing to the beginning of the branch stack data.
But that was not the case if branch stack was combined with raw mode
sampling. That led to bogus branch stack addresses and count.
The bug would show up with:
$ perf record -R -b foo
This patch fixes the problem by correctly moving the array pointer
forward for RAW samples.
Signed-off-by: Stephane Eranian <eranian@google.com> Cc: David Ahern <dsahern@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20120317222317.GA8803@quad
[ committer note: Fix also later submitted by Jiri Olsa ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf tools: Fix display of first level of callchains
The callchain stdio mode display was written using a sorted by symbol
report. In this mode we have only one callchain root per hist so we
forgot to handle cases where we have multiple callchain root, as in per
dso sorting for example.
Fix this by handling these roots like any other branch, with the hist as
the parent.
Single roots keep their entry without percentage because they have
the same overhead than the hist they refer to. ie: 100% in fractal
mode and the percentage of the hist in graph mode:
Several places were broken:
- hists data need to be collected into opened sessions instead
of into events
- session's hists data need to be initialized properly when the
session is created
- hist_entry__pcnt_snprintf: the percentage and displacement
buffer preparation must not use 'ret' because it's used
as a pointer to the final buffer
Signed-off-by: Jiri Olsa <jolsa@redhat.com> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20120322133726.GB1601@m.brq.redhat.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Tue, 20 Mar 2012 18:15:40 +0000 (19:15 +0100)]
perf tools: Fix modifier to be applied on correct events
The event modifier needs to be applied only on the event definition it
is attached to.
The current state is that in case of multiple events definition (in
single '-e' option, separated by ',') all will get modifier of the last
one.
Fixing this by adding separated list for each event definition, so the
modifier is applied only to proper event(s). Added automated test to
catch this, plus some other modifier tests.
Signed-off-by: Jiri Olsa <jolsa@redhat.com> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1332267341-26338-3-git-send-email-jolsa@redhat.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Tue, 20 Mar 2012 18:15:39 +0000 (19:15 +0100)]
perf tools: Fix various casting issues for 32 bits
- util/parse-events.c(parse_events_add_breakpoint)
need to use unsigned long instead of u64, otherwise
we get following gcc error on 32 bits:
error: cast from pointer to integer of different size
- util/header.c(print_event_desc)
cannot retype to signed type, otherwise we get following
gcc error on 32 bits:
error: comparison between signed and unsigned integer expressions
Signed-off-by: Jiri Olsa <jolsa@redhat.com> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1332267341-26338-2-git-send-email-jolsa@redhat.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Dan Carpenter [Tue, 20 Mar 2012 16:58:06 +0000 (16:58 +0000)]
AFS: checking wrong bit in afs_readpages()
We should be testing "if (vnode->flags & (1 << 4))" instead of
"if (vnode->flags & 4) {". The current test checks if the data was
modified instead of deleted.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Tue, 20 Mar 2012 17:32:09 +0000 (10:32 -0700)]
Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer changes for v3.4 from Ingo Molnar
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (32 commits)
ntp: Fix integer overflow when setting time
math: Introduce div64_long
cs5535-clockevt: Allow the MFGPT IRQ to be shared
cs5535-clockevt: Don't ignore MFGPT on SMP-capable kernels
x86/time: Eliminate unused irq0_irqs counter
clocksource: scx200_hrt: Fix the build
x86/tsc: Reduce the TSC sync check time for core-siblings
timer: Fix bad idle check on irq entry
nohz: Remove ts->Einidle checks before restarting the tick
nohz: Remove update_ts_time_stat from tick_nohz_start_idle
clockevents: Leave the broadcast device in shutdown mode when not needed
clocksource: Load the ACPI PM clocksource asynchronously
clocksource: scx200_hrt: Convert scx200 to use clocksource_register_hz
clocksource: Get rid of clocksource_calc_mult_shift()
clocksource: dbx500: convert to clocksource_register_hz()
clocksource: scx200_hrt: use pr_<level> instead of printk
time: Move common updates to a function
time: Reorder so the hot data is together
time: Remove most of xtime_lock usage in timekeeping.c
ntp: Add ntp_lock to replace xtime_locking
...
Linus Torvalds [Tue, 20 Mar 2012 17:31:44 +0000 (10:31 -0700)]
Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler changes for v3.4 from Ingo Molnar
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits)
printk: Make it compile with !CONFIG_PRINTK
sched/x86: Fix overflow in cyc2ns_offset
sched: Fix nohz load accounting -- again!
sched: Update yield() docs
printk/sched: Introduce special printk_sched() for those awkward moments
sched/nohz: Correctly initialize 'next_balance' in 'nohz' idle balancer
sched: Cleanup cpu_active madness
sched: Fix load-balance wreckage
sched: Clean up parameter passing of proc_sched_autogroup_set_nice()
sched: Ditch per cgroup task lists for load-balancing
sched: Rename load-balancing fields
sched: Move load-balancing arguments into helper struct
sched/rt: Do not submit new work when PI-blocked
sched/rt: Prevent idle task boosting
sched/wait: Add __wake_up_all_locked() API
sched/rt: Document scheduler related skip-resched-check sites
sched/rt: Use schedule_preempt_disabled()
sched/rt: Add schedule_preempt_disabled()
sched/rt: Do not throttle when PI boosting
sched/rt: Keep period timer ticking when rt throttling is active
...
Linus Torvalds [Tue, 20 Mar 2012 17:29:15 +0000 (10:29 -0700)]
Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf events changes for v3.4 from Ingo Molnar:
- New "hardware based branch profiling" feature both on the kernel and
the tooling side, on CPUs that support it. (modern x86 Intel CPUs
with the 'LBR' hardware feature currently.)
This new feature is basically a sophisticated 'magnifying glass' for
branch execution - something that is pretty difficult to extract from
regular, function histogram centric profiles.
The simplest mode is activated via 'perf record -b', and the result
looks like this in perf report:
$ perf record -b any_call,u -e cycles:u branchy
$ perf report -b --sort=symbol
52.34% [.] main [.] f1
24.04% [.] f1 [.] f3
23.60% [.] f1 [.] f2
0.01% [k] _IO_new_file_xsputn [k] _IO_file_overflow
0.01% [k] _IO_vfprintf_internal [k] _IO_new_file_xsputn
0.01% [k] _IO_vfprintf_internal [k] strchrnul
0.01% [k] __printf [k] _IO_vfprintf_internal
0.01% [k] main [k] __printf
This output shows from/to branch columns and shows the highest
percentage (from,to) jump combinations - i.e. the most likely taken
branches in the system. "branches" can also include function calls
and any other synchronous and asynchronous transitions of the
instruction pointer that are not 'next instruction' - such as system
calls, traps, interrupts, etc.
This feature comes with (hopefully intuitive) flat ascii and TUI
support in perf report.
- Various 'perf annotate' visual improvements for us assembly junkies.
It will now recognize function calls in the TUI and by hitting enter
you can follow the call (recursively) and back, amongst other
improvements.
- Multiple threads/processes recording support in perf record, perf
stat, perf top - which is activated via a comma-list of PIDs:
perf top -p 21483,21485
perf stat -p 21483,21485 -ddd
perf record -p 21483,21485
- Support for per UID views, via the --uid paramter to perf top, perf
report, etc. For example 'perf top --uid mingo' will only show the
tasks that I am running, excluding other users, root, etc.
- Jump label restructurings and improvements - this includes the
factoring out of the (hopefully much clearer) include/linux/static_key.h
generic facility:
struct static_key key = STATIC_KEY_INIT_FALSE;
...
if (static_key_false(&key))
do unlikely code
else
do likely code
The static_key_false() branch will be generated into the code with as
little impact to the likely code path as possible. the
static_key_slow_*() APIs flip the branch via live kernel code patching.
This facility can now be used more widely within the kernel to
micro-optimize hot branches whose likelihood matches the static-key
usage and fast/slow cost patterns.
- SW function tracer improvements: perf support and filtering support.
- Various hardenings of the perf.data ABI, to make older perf.data's
smoother on newer tool versions, to make new features integrate more
smoothly, to support cross-endian recording/analyzing workflows
better, etc.
- Restructuring of the kprobes code, the splitting out of 'optprobes',
and a corner case bugfix.
- Allow the tracing of kernel console output (printk).
- Improvements/fixes to user-space RDPMC support, allowing user-space
self-profiling code to extract PMU counts without performing any
system calls, while playing nice with the kernel side.
- 'perf bench' improvements
- ... and lots of internal restructurings, cleanups and fixes that made
these features possible. And, as usual this list is incomplete as
there were also lots of other improvements
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (120 commits)
perf report: Fix annotate double quit issue in branch view mode
perf report: Remove duplicate annotate choice in branch view mode
perf/x86: Prettify pmu config literals
perf report: Enable TUI in branch view mode
perf report: Auto-detect branch stack sampling mode
perf record: Add HEADER_BRANCH_STACK tag
perf record: Provide default branch stack sampling mode option
perf tools: Make perf able to read files from older ABIs
perf tools: Fix ABI compatibility bug in print_event_desc()
perf tools: Enable reading of perf.data files from different ABI rev
perf: Add ABI reference sizes
perf report: Add support for taken branch sampling
perf record: Add support for sampling taken branch
perf tools: Add code to support PERF_SAMPLE_BRANCH_STACK
x86/kprobes: Split out optprobe related code to kprobes-opt.c
x86/kprobes: Fix a bug which can modify kernel code permanently
x86/kprobes: Fix instruction recovery on optimized path
perf: Add callback to flush branch_stack on context switch
perf: Disable PERF_SAMPLE_BRANCH_* when not supported
perf/x86: Add LBR software filter support for Intel CPUs
...
Linus Torvalds [Tue, 20 Mar 2012 17:28:56 +0000 (10:28 -0700)]
Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq/core changes for v3.4 from Ingo Molnar
* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq: Remove paranoid warnons and bogus fixups
genirq: Flush the irq thread on synchronization
genirq: Get rid of unnecessary IRQTF_DIED flag
genirq: No need to check IRQTF_DIED before stopping a thread handler
genirq: Get rid of unnecessary irqaction field in task_struct
genirq: Fix incorrect check for forced IRQ thread handler
softirq: Reduce invoke_softirq() code duplication
genirq: Fix long-term regression in genirq irq_set_irq_type() handling
x86-32/irq: Don't switch to irq stack for a user-mode irq
Linus Torvalds [Tue, 20 Mar 2012 00:12:34 +0000 (17:12 -0700)]
Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU changes for v3.4 from Ingo Molnar. The major features of this
series are:
- making RCU more aggressive about entering dyntick-idle mode in order
to improve energy efficiency
- converting a few more call_rcu()s to kfree_rcu()s
- applying a number of rcutree fixes and cleanups to rcutiny
- removing CONFIG_SMP #ifdefs from treercu
- allowing RCU CPU stall times to be set via sysfs
- adding CPU-stall capability to rcutorture
- adding more RCU-abuse diagnostics
- updating documentation
- fixing yet more issues located by the still-ongoing top-to-bottom
inspection of RCU, this time with a special focus on the CPU-hotplug
code path.
* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (48 commits)
rcu: Stop spurious warnings from synchronize_sched_expedited
rcu: Hold off RCU_FAST_NO_HZ after timer posted
rcu: Eliminate softirq-mediated RCU_FAST_NO_HZ idle-entry loop
rcu: Add RCU_NONIDLE() for idle-loop RCU read-side critical sections
rcu: Allow nesting of rcu_idle_enter() and rcu_idle_exit()
rcu: Remove redundant check for rcu_head misalignment
PTR_ERR should be called before its argument is cleared.
rcu: Convert WARN_ON_ONCE() in rcu_lock_acquire() to lockdep
rcu: Trace only after NULL-pointer check
rcu: Call out dangers of expedited RCU primitives
rcu: Rework detection of use of RCU by offline CPUs
lockdep: Add CPU-idle/offline warning to lockdep-RCU splat
rcu: No interrupt disabling for rcu_prepare_for_idle()
rcu: Move synchronize_sched_expedited() to rcutree.c
rcu: Check for illegal use of RCU from offlined CPUs
rcu: Update stall-warning documentation
rcu: Add CPU-stall capability to rcutorture
rcu: Make documentation give more realistic rcutorture duration
rcutorture: Permit holding off CPU-hotplug operations during boot
rcu: Print scheduling-clock information on RCU CPU stall-warning messages
...
Steven Rostedt [Tue, 20 Mar 2012 16:28:29 +0000 (12:28 -0400)]
tracing: Move the tracing_on/off() declarations into CONFIG_TRACING
The tracing_on/off() declarations were under CONFIG_RING_BUFFER, but
the functions are now only defined under CONFIG_TRACING as they are
specific to ftrace and not the ring buffer.
But the declarations were still defined under the ring buffer and
this caused the build to fail when CONFIG_RING_BUFFER was set but
CONFIG_TRACING was not.
Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Linus Torvalds [Tue, 20 Mar 2012 00:11:15 +0000 (17:11 -0700)]
Merge branch 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core/locking changes for v3.4 from Ingo Molnar
* 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
futex: Simplify return logic
futex: Cover all PI opcodes with cmpxchg enabled check
Linus Torvalds [Tue, 20 Mar 2012 00:10:38 +0000 (17:10 -0700)]
Merge branch 'core-iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core/iommu changes for v3.4 from Ingo Molnar
* 'core-iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/iommu/intel: Increase the number of iommus supported to MAX_IO_APICS
x86/iommu/intel: Fix identity mapping for sandy bridge
Linus Torvalds [Mon, 19 Mar 2012 23:37:28 +0000 (16:37 -0700)]
Merge branch 'dcache-word-accesses'
* branch 'dcache-word-accesses':
vfs: use 'unsigned long' accesses for dcache name comparison and hashing
This does the name hashing and lookup using word-sized accesses when
that is efficient, namely on x86 (although any little-endian machine
with good unaligned accesses would do).
It does very much depend on little-endian logic, but it's a very hot
couple of functions under some real loads, and this patch improves the
performance of __d_lookup_rcu() and link_path_walk() by up to about 30%.
Giving a 10% improvement on some very pathname-heavy benchmarks.
Because we do make unaligned accesses past the filename, the
optimization is disabled when CONFIG_DEBUG_PAGEALLOC is active, and we
effectively depend on the fact that on x86 we don't really ever have the
last page of usable RAM followed immediately by any IO memory (due to
ACPI tables, BIOS buffer areas etc).
Some of the bit operations we do are a bit "subtle". It's commented,
but you do need to really think about the code. Or just consider it
black magic.
Thanks to people on G+ for some of the optimized bit tricks.
Linus Torvalds [Mon, 19 Mar 2012 23:19:53 +0000 (16:19 -0700)]
vfs: get rid of batshit-insane pointless dentry hash calculations
For some odd historical reason, the final mixing round for the dentry
cache hash table lookup had an insane "xor with big constant" logic. In
two places.
The big constant that is being xor'ed is GOLDEN_RATIO_PRIME, which is a
fairly random-looking number that is designed to be *multiplied* with so
that the bits get spread out over a whole long-word.
But xor'ing with it is insane. It doesn't really even change the hash -
it really only shifts the hash around in the hash table. To make
matters worse, the insane big constant is different on 32-bit and 64-bit
builds, even though the name hash bits we use are always 32-bit (and the
bits from the pointer we mix in effectively are too).
It's all total voodoo programming, in other words.
Now, some testing and analysis of the hash chains shows that the rest of
the hash function seems to be fairly good. It does pick the right bits
of the parent dentry pointer, for example, and while it's generally a
bad idea to use an xor to mix down the upper bits (because if there is a
repeating pattern, the xor can cause "destructive interference"), it
seems to not have been a disaster.
For example, replacing the hash with the normal "hash_long()" code (that
uses the GOLDEN_RATIO_PRIME constant correctly, btw) actually just makes
the hash worse. The hand-picked hash knew which bits of the pointer had
the highest entropy, and hash_long() ends up mixing bits less optimally
at least in some trivial tests.
So the hash function overall seems fine, it just has that really odd
"shift result around by a constant xor".
So get rid of the silly xor, and replace the down-mixing of the bits
with an add instead of an xor that tends to not have the same kind of
destructive interference issues. Some stats on the resulting hash
chains shows that they look statistically identical before and after,
but the code is simpler and no longer makes you go "WTF?".
Also, the incoming hash really is just "unsigned int", not a long, and
there's no real point to worry about the high 26 bits of the dentry
pointer for the 64-bit case, because they are all going to be identical
anyway.
So also change the hashing to be done in the more natural 'unsigned int'
that is the real size of the actual hashed data anyway.
Pekka Enberg [Mon, 19 Mar 2012 18:13:29 +0000 (15:13 -0300)]
perf report: Add a simple GTK2-based 'perf report' browser
This patch adds a simple GTK2-based browser to 'perf report' that's
based on the TTY-based browser in builtin-report.c.
To launch "perf report" using the new GTK interface just type:
$ perf report --gtk
The interface is somewhat limited in features at the moment:
- No callgraph support
- No KVM guest profiling support
- No color coding for percentages
- No sorting from the UI
- ..and many, many more!
That said, I think this patch a reasonable start to build future features on.
Signed-off-by: Pekka Enberg <penberg@kernel.org> Cc: Colin Walters <walters@verbum.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Ingo Molnar <mingo@kernel.org> Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1202231952410.6689@tux.localdomain
[ committer note: Added #pragma to make gtk no strict prototype problem go
away as suggested by Colin Walters modulo avoiding push/pop ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jason Baron [Fri, 16 Mar 2012 20:34:03 +0000 (16:34 -0400)]
Don't limit non-nested epoll paths
Commit c4313053290c ("epoll: limit paths") that I did to limit the
number of possible wakeup paths in epoll is causing a few applications
to longer work (dovecot for one).
The original patch is really about limiting the amount of epoll nesting
(since epoll fds can be attached to other fds). Thus, we probably can
allow an unlimited number of paths of depth 1. My current patch limits
it at 1000. And enforce the limits on paths that have a greater depth.
This is captured in: https://bugzilla.redhat.com/show_bug.cgi?id=681578
Signed-off-by: Jason Baron <jbaron@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull networking changes from David Miller:
"1) icmp6_dst_alloc() returns NULL instead of ERR_PTR() leading to
crashes, particularly during shutdown. Reported by Dave Jones and
fixed by Eric Dumazet.
2) hyperv and wimax/i2400m return NETDEV_TX_BUSY when they have
already freed the SKB, which causes crashes as to the caller this
means requeue the packet. Fixes from Eric Dumazet.
3) usbnet driver doesn't allocate the right amount of headroom on
fresh RX SKBs, fix from Eric Dumazet.
4) Fix regression in ip6_mc_find_dev_rcu(), as an RCU lookup it
abolutely should not take a reference to 'dev', this leads to
leaks. Fix from RonQing Li.
5) Fix netfilter ctnetlink race between delete and timeout expiration.
From Pablo Neira Ayuso.
6) Revert SFQ change which causes regressions, specifically queueing
to tail can lead to unavoidable flow starvation. From Eric
Dumazet.
7) Fix a memory leak and a crash on corrupt firmware files in bnx2x,
from Michal Schmidt."
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
netfilter: ctnetlink: fix race between delete and timeout expiration
ipv6: Don't dev_hold(dev) in ip6_mc_find_dev_rcu.
wimax/i2400m: fix erroneous NETDEV_TX_BUSY use
net/hyperv: fix erroneous NETDEV_TX_BUSY use
net/usbnet: reserve headroom on rx skbs
bnx2x: fix memory leak in bnx2x_init_firmware()
bnx2x: fix a crash on corrupt firmware file
sch_sfq: revert dont put new flow at the end of flows
ipv6: fix icmp6_dst_alloc()
Linus Torvalds [Sat, 17 Mar 2012 16:54:16 +0000 (09:54 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar.
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf tools, x86: Build perf on older user-space as well
perf tools: Use scnprintf where applicable
perf tools: Incorrect use of snprintf results in SEGV
netfilter: ctnetlink: fix race between delete and timeout expiration
Kerin Millar reported hardlockups while running `conntrackd -c'
in a busy firewall. That system (with several processors) was
acting as backup in a primary-backup setup.
After several tries, I found a race condition between the deletion
operation of ctnetlink and timeout expiration. This patch fixes
this problem.
Tested-by: Kerin Millar <kerframil@gmail.com> Reported-by: Kerin Millar <kerframil@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
RongQing.Li [Thu, 15 Mar 2012 22:54:14 +0000 (22:54 +0000)]
ipv6: Don't dev_hold(dev) in ip6_mc_find_dev_rcu.
ip6_mc_find_dev_rcu() is called with rcu_read_lock(), so don't
need to dev_hold().
With dev_hold(), not corresponding dev_put(), will lead to leak.
[ bug introduced in 6d6255cb59c (ipv6: mcast: RCU conversions) ]
Signed-off-by: RongQing.Li <roy.qing.li@gmail.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Sat, 17 Mar 2012 00:14:55 +0000 (17:14 -0700)]
Merge branch 'akpm' (more patches from Andrew)
Merge some more email patches from Andrew Morton:
"A couple of nilfs fixes"
* emailed from Andrew Morton <akpm@linux-foundation.org>:
nilfs2: fix NULL pointer dereference in nilfs_load_super_block()
nilfs2: clamp ns_r_segments_percentage to [1, 99]
Ryusuke Konishi [Sat, 17 Mar 2012 00:08:39 +0000 (17:08 -0700)]
nilfs2: fix NULL pointer dereference in nilfs_load_super_block()
According to the report from Slicky Devil, nilfs caused kernel oops at
nilfs_load_super_block function during mount after he shrank the
partition without resizing the filesystem:
Haogang Chen [Sat, 17 Mar 2012 00:08:38 +0000 (17:08 -0700)]
nilfs2: clamp ns_r_segments_percentage to [1, 99]
ns_r_segments_percentage is read from the disk. Bogus or malicious
value could cause integer overflow and malfunction due to meaningless
disk usage calculation. This patch reports error when mounting such
bogus volumes.
Linus Torvalds [Sat, 17 Mar 2012 00:04:02 +0000 (17:04 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull maintainer update from James Morris:
"Please pull this patch which adds Serge as maintainer of the
capabilities code, as discussed on lwn and the lsm list.
New capabilities must be signed off by the maintainer, and new uses of
any capabilities should at be cc'd to the maintainer."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
MAINTAINERS: Add Serge as maintainer of capabilities
Linus Torvalds [Sat, 17 Mar 2012 00:03:15 +0000 (17:03 -0700)]
Merge tag 'for-linus' of git://linux-c6x.org/git/projects/linux-c6x-upstreaming
Pull c6x bugfix from Mark Salter:
"Remove dead code from entry.S which causes a build failure when using
a newer assembler (v2.22 complains about it, v2.20 ignores it)."
* tag 'for-linus' of git://linux-c6x.org/git/projects/linux-c6x-upstreaming:
C6X: remove dead code from entry.S
Namhyung Kim [Fri, 16 Mar 2012 08:50:55 +0000 (17:50 +0900)]
perf report: Treat an argument as a symbol filter
As Ingo requested, it'd be better off treating first (and the only)
argument as a symbol filter, so that user doesn't need to input the
symbol on the dialog window on TUI.
Cc: Ingo Molnar <mingo@elte.hu> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1331887855-874-5-git-send-email-namhyung.kim@lge.com Signed-off-by: Namhyung Kim <namhyung.kim@lge.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Fri, 16 Mar 2012 08:50:53 +0000 (17:50 +0900)]
perf ui browser: Add 's' key to filter by symbol name
Now user can enter symbol name interested via ui_browser__input_window,
and perf can process it using hists__filter_by_symbol(). Giving empty
symbol (by pressing 's' followed by ENTER) will disable the filtering.
Cc: Ingo Molnar <mingo@elte.hu> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1331887855-874-3-git-send-email-namhyung.kim@lge.com Signed-off-by: Namhyung Kim <namhyung.kim@lge.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Fri, 16 Mar 2012 08:42:20 +0000 (17:42 +0900)]
perf tools: Do not disable members of group event
When event group is enabled for forked task (i.e. no target task/cpu
was specified) all events were disabled and marked ->enable_on_exec.
However they wouldn't be counted at all since only group leader will
be enabled on exec actually.
In contrast to perf stat, perf record doesn't have a real problem
as it enables all the event before proceeding. But it needs to be
fixed anyway IMHO.
Cc: Ingo Molnar <mingo@elte.hu> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1331887340-32448-2-git-send-email-namhyung.kim@lge.com Signed-off-by: Namhyung Kim <namhyung.kim@lge.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Fri, 16 Mar 2012 08:42:19 +0000 (17:42 +0900)]
perf stat: Fix event grouping on forked task
When event group is enabled for forked task (i.e. no target task was
specified) all events were disabled and marked ->enable_on_exec.
However they are not counted at all since only group leader will be
enabled on exec actually. So the result looked like below:
Jiri Olsa [Thu, 15 Mar 2012 19:09:17 +0000 (20:09 +0100)]
perf tools: Add perf pmu object to access pmu format definition
Adding pmu object which provides interface to pmu's sysfs
event format definition located at:
${sysfs_mount}/bus/event_source/devices/${pmu}/format
Following interface is exported:
struct perf_pmu* perf_pmu__find(char *name);
- this function returns pmu object, which is then
passed as a handle to other interface functions
int perf_pmu__config(struct perf_pmu *pmu, struct perf_event_attr *attr,
struct list_head *head_terms);
- this function configures perf_event_attr struct based
on pmu's format definitions and config terms data,
containined in head_terms list.
Parser generator is used to retrive the pmu's format definition.
The generated parser is part of the patch. Added makefile rule
'pmu-parser' to generate the parser code out of the bison/flex
sources.
Added builtin test 'Test perf pmu format parsing', which could
be run like:
perf test pmu
At the moment the config options are hardcoded to be used for legacy
symbol events to define several perf_event_attr fields. It is:
'config' to define perf_event_attr::config
'config1' to define perf_event_attr::config1
'config2' to define perf_event_attr::config2
'period' to define perf_event_attr::sample_period
Legacy events could be now specified as:
cycles/period=100000/
If term is specified without the value assignment, then 1 is
assigned by default.
Added flex/bison files for event grammar parsing. The generated
parser is part of the patch. Added makefile rule 'event-parser'
to generate the parser code out of the bison/flex sources.
line: config ':' bits
config: 'config' | 'config1' | 'config2"
bits: bits ',' bit_term | bit_term
bit_term: VALUE '-' VALUE | VALUE
Adding format attribute definitions for x86 cpu pmus.
Acked-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/n/tip-vhdk5y2hyype9j63prymty36@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Mark Salter [Fri, 16 Mar 2012 13:27:57 +0000 (09:27 -0400)]
C6X: remove dead code from entry.S
The ENDPROC() on sys_fadvise64_c6x() in arch/c6x/kernel/entry.S is
outside of the conditional block with the matching ENTRY() macro. This
leads a newer (v2.22 vs. v2.20) assembler to complain:
/tmp/ccGZBaPT.s: Assembler messages:
/tmp/ccGZBaPT.s: Error: .size expression for sys_fadvise64_c6x does not evaluate to a constant
The conditional block became dead code when c6x switched to generic
unistd.h and should be removed along with the offending ENDPROC().
Signed-off-by: Mark Salter <msalter@redhat.com> Acked-by: David Howells <dhowells@redhat.com>
Thomas Gleixner [Thu, 15 Mar 2012 21:55:21 +0000 (22:55 +0100)]
genirq: Remove paranoid warnons and bogus fixups
Alexander pointed out that the warnons in the regular exit path are
bogus and the thread_mask one actually could be triggered when
__setup_irq() hands out that thread_mask again after __free_irq()
dropped irq_desc->lock.
Thinking more about it, neither IRQTF_RUNTHREAD nor the bit in
thread_mask can be set as this is the regular exit path. We come here
due to:
__free_irq()
remove action from desc
synchronize_irq()
kthread_stop()
So synchronize_irq() makes sure that the thread finished running and
cleaned up both the thread_active count and thread_mask. After that
point nothing can set IRQTF_RUNTHREAD on this action. So the warnons
and the cleanups are pointless.
Eric Dumazet [Wed, 14 Mar 2012 09:21:44 +0000 (09:21 +0000)]
wimax/i2400m: fix erroneous NETDEV_TX_BUSY use
A driver start_xmit() method cannot free skb and return NETDEV_TX_BUSY,
since caller is going to reuse freed skb.
In fact netif_tx_stop_queue() / netif_stop_queue() is needed before
returning NETDEV_TX_BUSY or you can trigger a ksoftirqd fatal loop.
In case of memory allocation error, only safe way is to drop the packet
and return NETDEV_TX_OK
Also increments tx_dropped counter
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Inaky Perez-Gonzalez <inaky.perez-gonzalez@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 14 Mar 2012 06:56:25 +0000 (06:56 +0000)]
net/usbnet: reserve headroom on rx skbs
network drivers should reserve some headroom on incoming skbs so that we
dont need expensive reallocations, eg forwarding packets in tunnels.
This NET_SKB_PAD padding is done in various helpers, like
__netdev_alloc_skb_ip_align() in this patch, combining NET_SKB_PAD and
NET_IP_ALIGN magic.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Oliver Neukum <oneukum@suse.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Oliver Neukum <oneukum@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Michal Schmidt [Thu, 15 Mar 2012 14:08:29 +0000 (14:08 +0000)]
bnx2x: fix memory leak in bnx2x_init_firmware()
When cycling the interface down and up, bnx2x_init_firmware() knows that
the firmware is already loaded, but nevertheless it allocates certain
arrays anew (init_data, init_ops, init_ops_offsets, iro_arr). The old
arrays are leaked.
Fix the leaks by returning early if the firmware was already loaded.
Because if the firmware is loaded, so are the arrays.
Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Acked-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michal Schmidt [Thu, 15 Mar 2012 14:08:28 +0000 (14:08 +0000)]
bnx2x: fix a crash on corrupt firmware file
If the requested firmware is deemed corrupt and then released, reset the
pointer to NULL in order to avoid double-freeing it in
bnx2x_release_firmware() or dereferencing it in bnx2x_init_firmware().
Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Acked-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 13 Mar 2012 18:04:25 +0000 (18:04 +0000)]
sch_sfq: revert dont put new flow at the end of flows
This reverts commit 6245a8d877 (sch_sfq: dont put new flow at the end of
flows)
As Jesper found out, patch sounded great but has bad side effects.
In stress situation, pushing new flows in front of the queue can prevent
old flows doing any progress. Packets can stay in SFQ queue for
unlimited amount of time.
It's possible to add heuristics to limit this problem, but this would
add complexity outside of SFQ scope.
A more sensible answer to Dave Taht concerns (who reported the issued I
tried to solve in original commit) is probably to use a qdisc hierarchy
so that high prio packets dont enter a potentially crowded SFQ qdisc.
Reported-by: Jesper Dangaard Brouer <jdb@comx.dk> Cc: Dave Taht <dave.taht@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 16 Mar 2012 00:16:22 +0000 (17:16 -0700)]
Merge branch 'akpm' (Andrew's patch-bomb)
Merge patches from Andrew Morton:
"Nine patches - some bug fixes and some MAINTAINERS fiddling."
* emailed from Andrew Morton <akpm@linux-foundation.org>:
drivers/video/backlight/s6e63m0.c: fix corruption storing gamma mode
MAINTAINERS: add entry for exynos mipi display drivers
MAINTAINERS: fix link to Gustavo Padovans tree
MAINTAINERS: add Johan to Bluetooth maintainers
MAINTAINERS: Gustavo has moved
prctl: use CAP_SYS_RESOURCE for PR_SET_MM option
rapidio/tsi721: fix bug in register offset definitions
MAINTAINERS: update ST's Mailing list for SPEAr
memcg: free mem_cgroup by RCU to fix oops
Linus Torvalds [Fri, 16 Mar 2012 00:14:35 +0000 (17:14 -0700)]
Merge branch 'i2c-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging
Pull i2c subsystem fixes from Jean Delvare.
* 'i2c-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging:
i2c-algo-bit: Fix spurious SCL timeouts under heavy load
i2c-core: Comment says "transmitted" but means "received"
Linus Torvalds [Fri, 16 Mar 2012 00:13:39 +0000 (17:13 -0700)]
Merge tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck.
* tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (zl6100) Enable interval between chip accesses for all chips
hwmon: (w83627ehf) Describe undocumented pwm attributes
hwmon: (w83627ehf) Fix temp2 source for W83627UHG
hwmon: (w83627ehf) Fix memory leak in probe function
hwmon: (w83627ehf) Fix writing into fan_stop_time for NCT6775F/NCT6776F
Linus Torvalds [Fri, 16 Mar 2012 00:07:25 +0000 (17:07 -0700)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm exynos/intel updates from Dave Airlie:
"Two minor updates from Jesse for Intel SNB fixes, and a few fixes from
Samsung for exynos. The pull req has Alan's commit in it since Intel
based their tree on my tree at that time, but it all seems fine wrt
merging."
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm exynos: use drm_fb_helper_set_par directly
drm/exynos: Fix fb_videomode <-> drm_mode_modeinfo conversion
drm/exynos: fix runtime_pm fimd device state on probe
drm/exynos: use correct 'exynos-drm' name for platform device
drm/i915: support 32 bit BGR formats in sprite planes
drm/i915: fix color order for BGR formats on SNB
drm/gma500: Fix Cedarview boot failures in 3.3-rc
Linus Torvalds [Fri, 16 Mar 2012 00:06:05 +0000 (17:06 -0700)]
Merge branch 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
"For 4 fixes for 3.3 (all trivial):
- uvc video driver: fixes a division by zero;
- davinci: add module.h to fix compilation;
- smsusb: fix the delivery system setting;
- smsdvb: the get_frontend implementation there is broken.
The smsdvb patch has 127 lines, but it is trivial: instead of
returning a cache of the set_frontend (with is wrong, as it doesn't
have the updated values for the data, and the implementation there is
buggy), it copies the information of the detected DVB parameters from
the smsdvb private structures into the corresponding DVBv5 struct
fields."
* 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
[media] smsdvb: fix get_frontend
[media] smsusb: fix the default delivery system setting
[media] media: davinci: added module.h to resolve unresolved macros
[media] [FOR,v3.3] uvcvideo: Avoid division by 0 in timestamp calculation
Linus Torvalds [Fri, 16 Mar 2012 00:04:56 +0000 (17:04 -0700)]
Merge branch '3.3-urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull target fixes from Nicholas Bellinger:
"This series addresses two recently reported regression bugs related to
legacy SCSI reservation usage in target core, and iscsi-target
reservation conflict handling.
The second patch in particular addresses possible data-corruption with
SCSI reservations that is specific to iscsi-target fabric LUNs with
multiple client writers. Both patches need to go into v3.2 stable
ASAP, and the branch based on the last target-pending/3.3-rc-fixes
HEAD.
Again, thanks to Martin Svec for his help to identify and address this
regression bug with iscsi-target."
strict_strtoul() writes a long but ->gamma_mode only has space to store an
int, so on 64 bit systems we end up scribbling over ->gamma_table_count as
well. I've changed it to use kstrtouint() instead.
Donghwa Lee [Thu, 15 Mar 2012 22:17:11 +0000 (15:17 -0700)]
MAINTAINERS: add entry for exynos mipi display drivers
I'd like to add Inki Dae, Donghwa Lee and Kyungmin Park as maintainers
who developers for exynos mipi display drivers for
video/driver/exynos/exynos_mipi* and include/video/exynos_mipi*.
Signed-off-by: Donghwa Lee <dh09.lee@samsung.com> Signed-off-by: Inki Dae <inki.dae@samsung.com> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> Cc: Florian Tobias Schandinat <FlorianSchandinat@gmx.de> Cc: Richard Purdie <rpurdie@rpsys.net> Cc: Kukjin Kim <kgene.kim@samsung.com> Cc: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Johan Hedberg [Thu, 15 Mar 2012 22:17:11 +0000 (15:17 -0700)]
MAINTAINERS: add Johan to Bluetooth maintainers
I've been coordinating Bluetooth patches in my tree for some time and
it's possible I'll do it in the future too, so add myself to the
Bluetooth sections as well as mention my tree there.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Cc: Marcel Holtmann <marcel@holtmann.org> Cc: "Gustavo F. Padovan" <padovan@profusion.mobi> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cyrill Gorcunov [Thu, 15 Mar 2012 22:17:10 +0000 (15:17 -0700)]
prctl: use CAP_SYS_RESOURCE for PR_SET_MM option
CAP_SYS_ADMIN is already overloaded left and right, so to have more
fine-grained access control use CAP_SYS_RESOURCE here.
The CAP_SYS_RESOUCE is chosen because this prctl option allows a current
process to adjust some fields of memory map descriptor which rather
represents what the process owns: pointers to code, data, stack
segments, command line, auxiliary vector data and etc.
Suggested-by: Michael Kerrisk <mtk.manpages@gmail.com> Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Tejun Heo <tj@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paul Bolle <pebolle@tiscali.nl> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
rapidio/tsi721: fix bug in register offset definitions
Fix indexed register offset definitions that use decimal (wrong) instead
of hexadecimal (correct) notation for indexing multipliers.
Incorrect definitions do not affect Tsi721 driver in its current default
configuration because it uses only IDB queue 0. Loss of inbound
doorbell functionality should be observed if queue other than 0 is used.
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Chul Kim <chul.kim@idt.com> Cc: <stable@vger.kernel.org> [3.2+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Viresh Kumar [Thu, 15 Mar 2012 22:17:09 +0000 (15:17 -0700)]
MAINTAINERS: update ST's Mailing list for SPEAr
We have created a ST's Mailing list for SPEAr. This can be accessed
from non-st email ids. I want people to cc this list, when they have
changes specific to SPEAr. So, its better to get this updated in
MAINTAINERS file.
linux-arm-kernel@lists.infradead.org is also added for SPEAr.
Signed-off-by: Viresh Kumar <viresh.kumar@st.com> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Thu, 15 Mar 2012 22:17:07 +0000 (15:17 -0700)]
memcg: free mem_cgroup by RCU to fix oops
After fixing the GPF in mem_cgroup_lru_del_list(), three times one
machine running a similar load (moving and removing memcgs while
swapping) has oopsed in mem_cgroup_zone_nr_lru_pages(), when retrieving
memcg zone numbers for get_scan_count() for shrink_mem_cgroup_zone():
this is where a struct mem_cgroup is first accessed after being chosen
by mem_cgroup_iter().
Just what protects a struct mem_cgroup from being freed, in between
mem_cgroup_iter()'s css_get_next() and its css_tryget()? css_tryget()
fails once css->refcnt is zero with CSS_REMOVED set in flags, yes: but
what if that memory is freed and reused for something else, which sets
"refcnt" non-zero? Hmm, and scope for an indefinite freeze if refcnt is
left at zero but flags are cleared.
It's tempting to move the css_tryget() into css_get_next(), to make it
really "get" the css, but I don't think that actually solves anything:
the same difficulty in moving from css_id found to stable css remains.
But we already have rcu_read_lock() around the two, so it's easily fixed
if __mem_cgroup_free() just uses kfree_rcu() to free mem_cgroup.
However, a big struct mem_cgroup is allocated with vzalloc() instead of
kzalloc(), and we're not allowed to vfree() at interrupt time: there
doesn't appear to be a general vfree_rcu() to help with this, so roll
our own using schedule_work(). The compiler decently removes
vfree_work() and vfree_rcu() when the config doesn't need them.
Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Tejun Heo <tj@kernel.org> Cc: Ying Han <yinghan@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sasha Levin [Thu, 15 Mar 2012 16:36:14 +0000 (12:36 -0400)]
ntp: Fix integer overflow when setting time
'long secs' is passed as divisor to div_s64, which accepts a 32bit
divisor. On 64bit machines that value is trimmed back from 8 bytes
back to 4, causing a divide by zero when the number is bigger than
(1 << 32) - 1 and all 32 lower bits are 0.
Jan Beulich [Thu, 8 Mar 2012 09:29:28 +0000 (09:29 +0000)]
perf tools: Adjust make rules
Add rules to generate pre-processed files (just like are available for
the normal kernel build), and adjust the rule to create assembly files
from C ones to produce its output in the output directory rather than
in the source tree.
Ville Syrjala [Thu, 15 Mar 2012 17:11:05 +0000 (18:11 +0100)]
i2c-algo-bit: Fix spurious SCL timeouts under heavy load
When the system is under heavy load, there can be a significant delay
between the getscl() and time_after() calls inside sclhi(). That delay
may cause the time_after() check to trigger after SCL has gone high,
causing sclhi() to return -ETIMEDOUT.
To fix the problem, double check that SCL is still low after the
timeout has been reached, before deciding to return -ETIMEDOUT.
Signed-off-by: Ville Syrjala <syrjala@sci.fi> Cc: stable@vger.kernel.org Signed-off-by: Jean Delvare <khali@linux-fr.org>
Peter Zijlstra [Thu, 15 Mar 2012 11:35:37 +0000 (12:35 +0100)]
printk: Make it compile with !CONFIG_PRINTK
Commit 04a5dcffe8b2 ("printk/sched: Introduce special
printk_sched() for those awkward moments") overlooked
an #ifdef, so move code around to respect these directives.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Randy Dunlap <rdunlap@xenotime.net> Link: http://lkml.kernel.org/r/1331811337.18960.179.camel@twins Signed-off-by: Ingo Molnar <mingo@elte.hu>
Dave Airlie [Thu, 15 Mar 2012 09:41:26 +0000 (09:41 +0000)]
Merge branch 'exynos-drm-fixes' of git://git.infradead.org/users/kmpark/linux-samsung into drm-fixes
* 'exynos-drm-fixes' of git://git.infradead.org/users/kmpark/linux-samsung:
drm exynos: use drm_fb_helper_set_par directly
drm/exynos: Fix fb_videomode <-> drm_mode_modeinfo conversion
drm/exynos: fix runtime_pm fimd device state on probe
drm/exynos: use correct 'exynos-drm' name for platform device
Sascha Hauer [Wed, 14 Mar 2012 10:44:54 +0000 (19:44 +0900)]
drm exynos: use drm_fb_helper_set_par directly
info->fix.visual already is correctly set from drm_fb_helper_fill_fix.
info->fix.line_length is also set from drm_fb_helper_fill_fix,
so drm_fb_helper_set_par directly instead of a custom
exynos_drm_fbdev_set_par.
The fb_videomode structure stores the front porch and back porch in the
right_margin and left_margin fields respectively. right_margin should
thus be computed with hsync_start - hdisplay, and left_margin with
htotal - hsync_end. The same holds for the vertical direction.
Active Front Sync Back
Region Porch Porch
<-------------------><----------------><-------------><---------------->
drm/exynos: fix runtime_pm fimd device state on probe
A call to pm_runtime_set_active() forces device to be at the active
state and skips calling its runtime suspend/resume callbacks. This
results in a freeze with a new power domain code based on gen_pd. Fimd
driver does all required runtime power management calls, so this
pm_runtime_set_active call is buggy. This patch removes it and corrects
clock management in probe function (clocks are now enabled by
pm_runtime_get_sync() call).
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Inki Dae <inki.dae@samsung.com> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
drm/exynos: use correct 'exynos-drm' name for platform device
Currently Exynos DRM driver uses DRIVER_NAME ('exynos') name for the
core platform device. This is confusing, because it doesn't refer to the
function the platform device is performing. This patch renames the
platform device to the 'exynos-drm', which matches the convention for
naming the platform devices. The name used inside DRM subsystem has not
been changed.
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Inki Dae <inki.dae@samsung.com> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Linus Torvalds [Thu, 15 Mar 2012 00:16:45 +0000 (17:16 -0700)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"Been sitting on this for a while, but lets get this out the door.
This fixes various important bugs for 3.3 final, along with a few more
trivial ones. Please pull!"
* 'for-linus' of git://git.kernel.dk/linux-block:
block: fix ioc leak in put_io_context
block, sx8: fix pointer math issue getting fw version
Block: use a freezable workqueue for disk-event polling
drivers/block/DAC960: fix -Wuninitialized warning
drivers/block/DAC960: fix DAC960_V2_IOCTL_Opcode_T -Wenum-compare warning
block: fix __blkdev_get and add_disk race condition
block: Fix setting bio flags in drivers (sd_dif/floppy)
block: Fix NULL pointer dereference in sd_revalidate_disk
block: exit_io_context() should call elevator_exit_icq_fn()
block: simplify ioc_release_fn()
block: replace icq->changed with icq->flags
Linus Torvalds [Thu, 15 Mar 2012 00:16:02 +0000 (17:16 -0700)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator fixes from Mark Brown:
"Another small batch of driver specific bug fixes, a couple more errors
in the da9052 driver and a bad return value in the tps6524x driver."
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: da9052: Ensure the selected voltage falls within the specified range
regulator: Set n_voltages for da9052 regulators
regulator: Fix setting selector in tps6524x set_voltage function
Linus Torvalds [Thu, 15 Mar 2012 00:13:49 +0000 (17:13 -0700)]
Merge branch 'stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile
Pull arch/tile update to run "make minconfig" on the tile defconfigs
from Chris Metcalf.
This removes almost three thousand lines of inane defconfig chatter.
* 'stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
arch/tile/configs: convert to minimal configs via "make savedefconfig"