Filipe Manana [Thu, 5 May 2016 00:41:57 +0000 (01:41 +0100)]
Btrfs: fix inode leak on failure to setup whiteout inode in rename
If we failed to fully setup the whiteout inode during a rename operation
with the whiteout flag, we ended up leaking the inode, not decrementing
its link count nor removing all its items from the fs/subvol tree.
Dan Fuhry [Thu, 17 Mar 2016 14:23:38 +0000 (15:23 +0100)]
btrfs: add support for RENAME_EXCHANGE and RENAME_WHITEOUT
Two new flags, RENAME_EXCHANGE and RENAME_WHITEOUT, provide for new
behavior in the renameat2() syscall. This behavior is primarily used by
overlayfs. This patch adds support for these flags to btrfs, enabling it to
be used as a fully functional upper layer for overlayfs.
RENAME_EXCHANGE support was written by Davide Italiano originally
submitted on 2 April 2015.
Signed-off-by: Davide Italiano <dccitaliano@gmail.com> Signed-off-by: Dan Fuhry <dfuhry@datto.com>
[ remove unlikely ] Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com>
Filipe Manana [Fri, 29 Apr 2016 12:14:42 +0000 (13:14 +0100)]
Btrfs: pin log earlier when renaming
We were pinning the log right after the first step in the rename operation
(inserting inode ref for the new name in the destination directory)
instead of doing it before. This behaviour was introduced in 2009 for some
reason that was not mentioned neither on the changelog nor any comment,
with the drawback of a small time window where concurrent log writers can
end up logging the new inode reference for the inode we are renaming while
the rename operation is in progress (so that we can end up with a log
containing both the new and old references). As of today there's no reason
to not pin the log before that first step anymore, so just fix this.
Filipe Manana [Fri, 29 Apr 2016 10:34:22 +0000 (11:34 +0100)]
Btrfs: unpin log if rename operation fails
If rename operations fail at some point after we pinned the log, we end
up aborting the current transaction but never unpin the log, which leaves
concurrent tasks that are trying to sync the log (as part of an fsync
request from user space) blocked forever and preventing the filesystem
from being unmountable.
Filipe Manana [Tue, 26 Apr 2016 14:39:32 +0000 (15:39 +0100)]
Btrfs: don't do unnecessary delalloc flushes when relocating
Before we start the actual relocation process of a block group, we do
calls to flush delalloc of all inodes and then wait for ordered extents
to complete. However we do these flush calls just to make sure we don't
race with concurrent tasks that have actually already started to run
delalloc and have allocated an extent from the block group we want to
relocate, right before we set it to readonly mode, but have not yet
created the respective ordered extents. The flush calls make us wait
for such concurrent tasks because they end up calling
filemap_fdatawrite_range() (through btrfs_start_delalloc_roots() ->
__start_delalloc_inodes() -> btrfs_alloc_delalloc_work() ->
btrfs_run_delalloc_work()) which ends up serializing us with those tasks
due to attempts to lock the same pages (and the delalloc flush procedure
calls the allocator and creates the ordered extents before unlocking the
pages).
These flushing calls not only make us waste time (cpu, IO) but also reduce
the chances of writing larger extents (applications might be writing to
contiguous ranges and we flush before they finish dirtying the whole
ranges).
So make sure we don't flush delalloc and just wait for concurrent tasks
that have already started flushing delalloc and have allocated an extent
from the block group we are about to relocate.
This change also ends up fixing a race with direct IO writes that makes
relocation not wait for direct IO ordered extents. This race is
illustrated by the following diagram:
CPU 1 CPU 2
btrfs_relocate_block_group(bg X)
starts direct IO write,
target inode currently has no
ordered extents ongoing nor
dirty pages (delalloc regions),
therefore the root for our inode
is not in the list
fs_info->ordered_roots
btrfs_direct_IO()
__blockdev_direct_IO()
btrfs_get_blocks_direct()
btrfs_lock_extent_direct()
locks range in the io tree
btrfs_new_extent_direct()
btrfs_reserve_extent()
--> extent allocated
from bg X
btrfs_inc_block_group_ro(bg X)
btrfs_start_delalloc_roots()
__start_delalloc_inodes()
--> does nothing, no dealloc ranges
in the inode's io tree so the
inode's root is not in the list
fs_info->delalloc_roots
btrfs_wait_ordered_roots()
--> does not find the inode's root in the
list fs_info->ordered_roots
--> ends up not waiting for the direct IO
write started by the task at CPU 2
iterates the extent tree, using its
commit root and moves extents into new
locations
btrfs_add_ordered_extent_dio()
--> now a ordered extent is
created and added to the
list root->ordered_extents
and the root added to the
list fs_info->ordered_roots
--> this is too late and the
task at CPU 1 already
started the relocation
btrfs_commit_transaction()
btrfs_finish_ordered_io()
btrfs_alloc_reserved_file_extent()
--> adds delayed data reference
for the extent allocated
from bg X
prepare_to_relocate()
btrfs_commit_transaction()
--> delayed refs are run, so an extent
item for the allocated extent from
bg X is added to extent tree
--> commit roots are switched, so the
next scan in the extent tree will
see the extent item
sees the extent in the extent tree
When this happens the relocation produces the following warning when it
finishes:
This is because at the end of the first stage, in relocate_block_group(),
we commit the current transaction, which makes delayed refs run, the
commit roots are switched and so the second stage will find the extent
item that the ordered extent added to the delayed refs. But this extent
was not moved (ordered extent completed after first stage finished), so
at the end of the relocation our block group item still has a positive
used bytes counter, triggering a warning at the end of
btrfs_relocate_block_group(). Later on when trying to read the extent
contents from disk we hit a BUG_ON() due to the inability to map a block
with a logical address that belongs to the block group we relocated and
is no longer valid, resulting in the following trace:
Filipe Manana [Tue, 26 Apr 2016 14:36:38 +0000 (15:36 +0100)]
Btrfs: don't wait for unrelated IO to finish before relocation
Before the relocation process of a block group starts, it sets the block
group to readonly mode, then flushes all delalloc writes and then finally
it waits for all ordered extents to complete. This last step includes
waiting for ordered extents destinated at extents allocated in other block
groups, making us waste unecessary time.
So improve this by waiting only for ordered extents that fall into the
block group's range.
Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Josef Bacik <jbacik@fb.com> Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Filipe Manana [Mon, 25 Apr 2016 03:45:02 +0000 (04:45 +0100)]
Btrfs: fix empty symlink after creating symlink and fsync parent dir
If we create a symlink, fsync its parent directory, crash/power fail and
mount the filesystem, we end up with an empty symlink, which not only is
useless it's also not allowed in linux (the man page symlink(2) is well
explicit about that). So we just need to make sure to fully log an inode
if it's a symlink, to ensure its inline extent gets logged, ensuring the
same behaviour as ext3, ext4, xfs, reiserfs, f2fs, nilfs2, etc.
Filipe Manana [Wed, 6 Apr 2016 16:11:56 +0000 (17:11 +0100)]
Btrfs: fix for incorrect directory entries after fsync log replay
If we move a directory to a new parent and later log that parent and don't
explicitly log the old parent, when we replay the log we can end up with
entries for the moved directory in both the old and new parent directories.
Besides being ilegal to have directories with multiple hard links in linux,
it also resulted in the leaving the inode item with a link count of 1.
A similar issue also happens if we move a regular file - after the log tree
is replayed the file has a link in both the old and new parent directories,
when it should be only at the new directory.
Sample reproducer:
$ mkfs.btrfs -f /dev/sdc
$ mount /dev/sdc /mnt
$ mkdir /mnt/x
$ mkdir /mnt/y
$ touch /mnt/x/foo
$ mkdir /mnt/y/z
$ sync
$ ln /mnt/x/foo /mnt/x/bar
$ mv /mnt/y/z /mnt/x/z
< power fail >
$ mount /dev/sdc /mnt
$ ls -1Ri /mnt
/mnt:
257 x
258 y
/mnt/x:
259 bar
259 foo
260 z
/mnt/x/z:
/mnt/y:
260 z
/mnt/y/z:
$ umount /dev/sdc
$ btrfs check /dev/sdc
Checking filesystem on /dev/sdc
UUID: a67e2c4a-a4b4-4fdc-b015-9d9af1e344be
checking extents
checking free space cache
checking fs roots
root 5 inode 260 errors 2000, link count wrong
unresolved ref dir 257 index 4 namelen 1 name z filetype 2 errors 0
unresolved ref dir 258 index 2 namelen 1 name z filetype 2 errors 0
(...)
Attempting to remove the directory becomes impossible:
$ mount /dev/sdc /mnt
$ rmdir /mnt/y/z
$ ls -lh /mnt/y
ls: cannot access /mnt/y/z: No such file or directory
total 0
d????????? ? ? ? ? ? z
$ rmdir /mnt/x/z
rmdir: failed to remove ‘/mnt/x/z’: Stale file handle
$ ls -lh /mnt/x
ls: cannot access /mnt/x/z: Stale file handle
total 0
-rw-r--r-- 2 root root 0 Apr 6 18:06 bar
-rw-r--r-- 2 root root 0 Apr 6 18:06 foo
d????????? ? ? ? ? ? z
So make sure that on rename we set the last_unlink_trans value for our
inode, even if it's a directory, to the value of the current transaction's
ID and that if the new parent directory is logged that we fallback to a
transaction commit.
A test case for fstests is being submitted as well.
Linus Torvalds [Sat, 7 May 2016 17:50:48 +0000 (10:50 -0700)]
Merge tag 'staging-4.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull IIO driver fixes from Grek KH:
"It's really just IIO drivers here, some small fixes that resolve some
'crash on boot' errors that have shown up in the -rc series, and other
bugfixes that are required.
All have been in linux-next with no reported problems"
* tag 'staging-4.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
iio: imu: mpu6050: Fix name/chip_id when using ACPI
iio: imu: mpu6050: fix possible NULL dereferences
iio:adc:at91-sama5d2: Repair crash on module removal
iio: ak8975: fix maybe-uninitialized warning
iio: ak8975: Fix NULL pointer exception on early interrupt
Linus Torvalds [Sat, 7 May 2016 17:47:03 +0000 (10:47 -0700)]
Merge tag 'usb-4.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are some last-remaining fixes for USB drivers to resolve issues
that have shown up in testing. And two new device ids as well.
All of these have been in linux-next with no reported issues"
* tag 'usb-4.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
Revert "USB / PM: Allow USB devices to remain runtime-suspended when sleeping"
usb: musb: jz4740: fix error check of usb_get_phy()
Revert "usb: musb: musb_host: Enable HCD_BH flag to handle urb return in bottom half"
usb: musb: gadget: nuke endpoint before setting its descriptor to NULL
USB: serial: cp210x: add Straizona Focusers device ids
USB: serial: cp210x: add ID for Link ECU
Linus Torvalds [Sat, 7 May 2016 15:27:35 +0000 (08:27 -0700)]
Merge branch 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm
Pull ARM fixes from Russell King:
"These are a number of updates to fix a few problems found in the ARM
nommu code over the last couple of years, caused mostly by changes on
the mmu side"
* 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: 8573/1: domain: move {set,get}_domain under config guard
ARM: 8572/1: nommu: change memory reserve for the vectors
ARM: 8571/1: nommu: fix PMSAv7 setup
Linus Torvalds [Sat, 7 May 2016 15:17:45 +0000 (08:17 -0700)]
Merge tag 'media/v4.6-5' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
- deadlock fixes on driver probe at exynos4-is and s43-camif drivers
- a build breakage if media controller is enabled and USB or PCI is
built as module.
* tag 'media/v4.6-5' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
[media] media-device: fix builds when USB or PCI is compiled as module
[media] media: s3c-camif: fix deadlock on driver probe()
[media] media: exynos4-is: fix deadlock on driver probe
Linus Torvalds [Fri, 6 May 2016 20:08:35 +0000 (13:08 -0700)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull writeback fix from Jens Axboe:
"Just a single fix for domain aware writeback, fixing a regression that
can cause balance_dirty_pages() to keep looping while not getting any
work done"
* 'for-linus' of git://git.kernel.dk/linux-block:
writeback: Fix performance regression in wb_over_bg_thresh()
Linus Torvalds [Fri, 6 May 2016 19:59:27 +0000 (12:59 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"This contains two fixes: a boot fix for older SGI/UV systems, and an
APIC calibration fix"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/tsc: Read all ratio bits from MSR_PLATFORM_INFO
x86/platform/UV: Bring back the call to map_low_mmrs in uv_system_init
Linus Torvalds [Fri, 6 May 2016 18:58:45 +0000 (11:58 -0700)]
Merge tag 'pm+acpi-4.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management and ACPI fixes from Rafael Wysocki:
"Fixes for problems introduced or discovered recently (intel_pstate,
sti-cpufreq, ARM64 cpuidle, Operating Performance Points framework,
generic device properties framework) and one fix for a hotplug-related
deadlock in ACPICA that's been there forever, but is nasty enough.
Specifics:
- Fix for a recent regression in the intel_pstate driver causing it
to fail to restore the HWP (HW-managed P-states) configuration of
the boot CPU after suspend-to-RAM (Rafael Wysocki).
- Fix for two recent regressions in the intel_pstate driver, one that
can trigger a divide by zero if the driver is accessed via sysfs
before it manages to take the first sample and one causing it to
fail to update a structure field used in a trace point, so the
information coming from it is less useful (Rafael Wysocki).
- Fix for a problem in the sti-cpufreq driver introduced during the
4.5 cycle that causes it to break CPU PM in multi-platform kernels
by registering cpufreq-dt (which subsequently doesn't work)
unconditionally and preventing the driver that would actually work
from registering (Sudeep Holla).
- Stable-candidate fix for an ARM64 cpuidle issue causing idle state
usage counters to be incorrectly updated for idle states that were
not entered due to errors (James Morse).
- Fix for a recently introduced issue in the OPP (Operating
Performance Points) framework causing it to print bogus error
messages for missing optional regulators (Viresh Kumar).
- Fix for a recently introduced issue in the generic device
properties framework that may cause it to attempt to dereferece and
invalid pointer in some cases (Heikki Krogerus).
- Fix for a deadlock in the ACPICA core that may be triggered by
device (eg Thunderbolt) hotplug (Prarit Bhargava)"
* tag 'pm+acpi-4.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM / OPP: Remove useless check
ACPICA: Dispatcher: Update thread ID for recursive method calls
intel_pstate: Fix intel_pstate_get()
cpufreq: intel_pstate: Fix HWP on boot CPU after system resume
cpufreq: st: enable selective initialization based on the platform
ARM: cpuidle: Pass on arm_cpuidle_suspend()'s return value
device property: Avoid potential dereferences of invalid pointers
Linus Torvalds [Fri, 6 May 2016 18:53:27 +0000 (11:53 -0700)]
Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fix from Ingo Molnar:
"This contains a single fix that fixes a nohz tick stopping bug when
mixed-poliocy SCHED_FIFO and SCHED_RR tasks are present on a runqueue"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
nohz/full, sched/rt: Fix missed tick-reenabling bug in sched_can_stop_tick()
Linus Torvalds [Fri, 6 May 2016 18:40:24 +0000 (11:40 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"This tree contains two fixes: new Intel CPU model numbers and an
AMD/iommu uncore PMU driver fix"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/amd/iommu: Do not register a task ctx for uncore like PMUs
perf/x86: Add model numbers for Kabylake CPUs
Linus Torvalds [Fri, 6 May 2016 18:33:02 +0000 (11:33 -0700)]
Merge branch 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI fixes from Ingo Molnar:
"This tree contains three fixes: a console spam fix, a file pattern fix
and a sysfb_efi fix for a bug that triggered on older ThinkPads"
* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/sysfb_efi: Fix valid BAR address range check
x86/efi-bgrt: Switch all pr_err() to pr_notice() for invalid BGRT
MAINTAINERS: Remove asterisk from EFI directory names
Linus Torvalds [Fri, 6 May 2016 18:27:05 +0000 (11:27 -0700)]
Merge branch 'parisc-4.6-5' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc fix from Helge Deller:
"Patch from Dmitry V Levin to fix a kernel crash when a straced process
calls the (invalid) syscall which is equal to value of __NR_Linux_syscalls"
* 'parisc-4.6-5' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: fix a bug when syscall number of tracee is __NR_Linux_syscalls
Linus Torvalds [Fri, 6 May 2016 18:14:38 +0000 (11:14 -0700)]
Merge tag 'arc-4.6-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc
Pull ARC fixes from Vineet Gupta:
"Late in the cycle, but this has fixes for couple of issues: a PAE40
boot crash and Arnd spotting lack of barriers in BE io-accessors.
The 3rd patch for enabling highmem in low physical mem ;-) honestly is
more than a "fix" but its been in works for some time, seems to be
stable in testing and enables 2 of our customers to go forward with
4.6 kernel.
- Fix for PTE truncation in PAE40 builds
- Fix for big endian IO accessors lacking IO barrier
- Allow HIGHMEM to work with low physical addresses"
* tag 'arc-4.6-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
ARC: support HIGHMEM even without PAE40
ARC: Fix PAE40 boot failures due to PTE truncation
ARC: Add missing io barriers to io{read,write}{16,32}be()
Linus Torvalds [Fri, 6 May 2016 17:59:53 +0000 (10:59 -0700)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"Fixes for i915, amdgpu/radeon and imx.
The IMX fix is for an autoloading regression found in Fedora. The
radeon fixes, are the same fix to amdgpu/radeon to avoid a hardware
lockup in some circumstances with a bad mode, and a double free bug I
took a few hours chasing down the other morning.
The i915 fixes are across the board, all stable material, and fixing
some hangs and suspend/resume issues, along with a live status
regressions"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
gpu: ipu-v3: Fix imx-ipuv3-crtc module autoloading
drm/amdgpu: make sure vertical front porch is at least 1
drm/radeon: make sure vertical front porch is at least 1
drm/amdgpu: set metadata pointer to NULL after freeing.
drm/i915: Make RPS EI/thresholds multiple of 25 on SNB-BDW
drm/i915: Fake HDMI live status
drm/i915: Fix eDP low vswing for Broadwell
drm/i915/ddi: Fix eDP VDD handling during booting and suspend/resume
drm/i915: Fix system resume if PCI device remained enabled
drm/i915: Avoid stalling on pending flips for legacy cursor updates
parisc: fix a bug when syscall number of tracee is __NR_Linux_syscalls
Do not load one entry beyond the end of the syscall table when the
syscall number of a traced process equals to __NR_Linux_syscalls.
Similar bug with regular processes was fixed by commit b6f8227c76e6
("[PARISC] Fix bug when syscall nr is __NR_Linux_syscalls").
Merge branches 'pm-opp-fixes', 'pm-cpufreq-fixes' and 'pm-cpuidle-fixes'
* pm-opp-fixes:
PM / OPP: Remove useless check
* pm-cpufreq-fixes:
intel_pstate: Fix intel_pstate_get()
cpufreq: intel_pstate: Fix HWP on boot CPU after system resume
cpufreq: st: enable selective initialization based on the platform
* pm-cpuidle-fixes:
ARM: cpuidle: Pass on arm_cpuidle_suspend()'s return value
Linus Torvalds [Fri, 6 May 2016 03:48:35 +0000 (20:48 -0700)]
Merge branch 'akpm' (patches from Andrew)
Merge fixes from Andrew Morton:
"14 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
byteswap: try to avoid __builtin_constant_p gcc bug
lib/stackdepot: avoid to return 0 handle
mm: fix kcompactd hang during memory offlining
modpost: fix module autoloading for OF devices with generic compatible property
proc: prevent accessing /proc/<PID>/environ until it's ready
mm/zswap: provide unique zpool name
mm: thp: kvm: fix memory corruption in KVM with THP enabled
MAINTAINERS: fix Rajendra Nayak's address
mm, cma: prevent nr_isolated_* counters from going negative
mm: update min_free_kbytes from khugepaged after core initialization
huge pagecache: mmap_sem is unlocked when truncation splits pmd
rapidio/mport_cdev: fix uapi type definitions
mm: memcontrol: let v2 cgroups follow changes in system swappiness
mm: thp: correct split_huge_pages file permission
Linus Torvalds [Fri, 6 May 2016 01:10:01 +0000 (18:10 -0700)]
Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm fixes from Dan Williams:
- a fix for the persistent memory 'struct page' driver. The
implementation overlooked the fact that pages are allocated in 2MB
units leading to -ENOMEM when establishing some configurations.
It's tagged for -stable as the problem was introduced with the
initial implementation in 4.5.
- The new "error status translation" routine, introduced with the 4.6
updates to the nfit driver, missed a necessary path in
acpi_nfit_ctl().
The end result is that we are falsely assuming commands complete
successfully when the embedded status says otherwise.
* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
nfit: fix translation of command status results
libnvdimm, pfn: fix memmap reservation sizing
Arnd Bergmann [Thu, 5 May 2016 23:22:39 +0000 (16:22 -0700)]
byteswap: try to avoid __builtin_constant_p gcc bug
This is another attempt to avoid a regression in wwn_to_u64() after that
started using get_unaligned_be64(), which in turn ran into a bug on
gcc-4.9 through 6.1.
The regression got introduced due to the combination of two separate
workarounds (commits 1b5c98bdc644: "include/linux/unaligned: force
inlining of byteswap operations" and 8c3597696740: "scsi: fc: use
get/put_unaligned64 for wwn access") that each try to sidestep distinct
problems with gcc behavior (code growth and increased stack usage).
Unfortunately after both have been applied, a more serious gcc bug has
been uncovered, leading to incorrect object code that discards part of a
function and causes undefined behavior.
As part of this problem is how __builtin_constant_p gets evaluated on an
argument passed by reference into an inline function, this avoids the
use of __builtin_constant_p() for all architectures that set
CONFIG_ARCH_USE_BUILTIN_BSWAP. Most architectures do not set
ARCH_SUPPORTS_OPTIMIZED_INLINING, which means they probably do not
suffer from the problem in the qla2xxx driver, but they might still run
into it elsewhere.
Both of the original workarounds were only merged in the 4.6 kernel, and
the bug that is fixed by this patch should only appear if both are
there, so we probably don't need to backport the fix. On the other
hand, it works by simplifying the code path and should not have any
negative effects.
[arnd@arndb.de: fix older gcc warnings]
(http://lkml.kernel.org/r/12243652.bxSxEgjgfk@wuerfel) Link: https://lkml.org/lkml/headers/2016/4/12/1103 Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66122 Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70232 Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70646 Fixes: 1b5c98bdc644 ("include/linux/unaligned: force inlining of byteswap operations") Fixes: 8c3597696740 ("scsi: fc: use get/put_unaligned64 for wwn access") Link: http://lkml.kernel.org/r/1780465.XdtPJpi8Tt@wuerfel Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Tested-by: Josh Poimboeuf <jpoimboe@redhat.com> # on gcc-5.3 Tested-by: Quinn Tran <quinn.tran@qlogic.com> Cc: Martin Jambor <mjambor@suse.cz> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: James Bottomley <James.Bottomley@hansenpartnership.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Thomas Graf <tgraf@suug.ch> Cc: Peter Zijlstra <peterz@infradead.org> Cc: David Rientjes <rientjes@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Himanshu Madhani <himanshu.madhani@qlogic.com> Cc: Jan Hubicka <hubicka@ucw.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joonsoo Kim [Thu, 5 May 2016 23:22:35 +0000 (16:22 -0700)]
lib/stackdepot: avoid to return 0 handle
Recently, we allow to save the stacktrace whose hashed value is 0. It
causes the problem that stackdepot could return 0 even if in success.
User of stackdepot cannot distinguish whether it is success or not so we
need to solve this problem. In this patch, 1 bit are added to handle
and make valid handle none 0 by setting this bit. After that, valid
handle will not be 0 and 0 handle will represent failure correctly.
Fixes: a65a231eda27 ("lib/stackdepot.c: allow the stack trace hash to be zero") Link: http://lkml.kernel.org/r/1462252403-1106-1-git-send-email-iamjoonsoo.kim@lge.com Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Philipp Zabel [Thu, 5 May 2016 23:22:29 +0000 (16:22 -0700)]
modpost: fix module autoloading for OF devices with generic compatible property
Since the wildcard at the end of OF module aliases is gone, autoloading
of modules that don't match a device's last (most generic) compatible
value fails.
For example the CODA960 VPU on i.MX6Q has the SoC specific compatible
"fsl,imx6q-vpu" and the generic compatible "cnm,coda960". Since the
driver currently only works with knowledge about the SoC specific
integration, it doesn't list "cnm,cod960" in the module device table.
This results in the device compatible
"of:NvpuT<NULL>Cfsl,imx6q-vpuCcnm,coda960" not matching the module alias
"of:N*T*Cfsl,imx6q-vpu" anymore, whereas before commit ef37cfddf89e
("modpost: don't add a trailing wildcard for OF module aliases") it
matched the module alias "of:N*T*Cfsl,imx6q-vpu*".
This patch adds two module aliases for each compatible, one without the
wildcard and one with "C*" appended.
Mathias Krause [Thu, 5 May 2016 23:22:26 +0000 (16:22 -0700)]
proc: prevent accessing /proc/<PID>/environ until it's ready
If /proc/<PID>/environ gets read before the envp[] array is fully set up
in create_{aout,elf,elf_fdpic,flat}_tables(), we might end up trying to
read more bytes than are actually written, as env_start will already be
set but env_end will still be zero, making the range calculation
underflow, allowing to read beyond the end of what has been written.
Fix this as it is done for /proc/<PID>/cmdline by testing env_end for
zero. It is, apparently, intentionally set last in create_*_tables().
This bug was found by the PaX size_overflow plugin that detected the
arithmetic underflow of 'this_len = env_end - (env_start + src)' when
env_end is still zero.
The expected consequence is that userland trying to access
/proc/<PID>/environ of a not yet fully set up process may get
inconsistent data as we're in the middle of copying in the environment
variables.
Dan Streetman [Thu, 5 May 2016 23:22:23 +0000 (16:22 -0700)]
mm/zswap: provide unique zpool name
Instead of using "zswap" as the name for all zpools created, add an
atomic counter and use "zswap%x" with the counter number for each zpool
created, to provide a unique name for each new zpool.
As zsmalloc, one of the zpool implementations, requires/expects a unique
name for each pool created, zswap should provide a unique name. The
zsmalloc pool creation does not fail if a new pool with a conflicting
name is created, unless CONFIG_ZSMALLOC_STAT is enabled; in that case,
zsmalloc pool creation fails with -ENOMEM. Then zswap will be unable to
change its compressor parameter if its zpool is zsmalloc; it also will
be unable to change its zpool parameter back to zsmalloc, if it has any
existing old zpool using zsmalloc with page(s) in it. Attempts to
change the parameters will result in failure to create the zpool. This
changes zswap to provide a unique name for each zpool creation.
Fixes: a8b09439e43f ("zswap: dynamic pool creation") Signed-off-by: Dan Streetman <ddstreet@ieee.org> Reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Dan Streetman <dan.streetman@canonical.com> Cc: Minchan Kim <minchan@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
mm: thp: kvm: fix memory corruption in KVM with THP enabled
After the THP refcounting change, obtaining a compound pages from
get_user_pages() no longer allows us to assume the entire compound page
is immediately mappable from a secondary MMU.
A secondary MMU doesn't want to call get_user_pages() more than once for
each compound page, in order to know if it can map the whole compound
page. So a secondary MMU needs to know from a single get_user_pages()
invocation when it can map immediately the entire compound page to avoid
a flood of unnecessary secondary MMU faults and spurious
atomic_inc()/atomic_dec() (pages don't have to be pinned by MMU notifier
users).
Ideally instead of the page->_mapcount < 1 check, get_user_pages()
should return the granularity of the "page" mapping in the "mm" passed
to get_user_pages(). However it's non trivial change to pass the "pmd"
status belonging to the "mm" walked by get_user_pages up the stack (up
to the caller of get_user_pages). So the fix just checks if there is
not a single pte mapping on the page returned by get_user_pages, and in
turn if the caller can assume that the whole compound page is mapped in
the current "mm" (in a pmd_trans_huge()). In such case the entire
compound page is safe to map into the secondary MMU without additional
get_user_pages() calls on the surrounding tail/head pages. In addition
of being faster, not having to run other get_user_pages() calls also
reduces the memory footprint of the secondary MMU fault in case the pmd
split happened as result of memory pressure.
Without this fix after a MADV_DONTNEED (like invoked by QEMU during
postcopy live migration or balloning) or after generic swapping (with a
failure in split_huge_page() that would only result in pmd splitting and
not a physical page split), KVM would map the whole compound page into
the shadow pagetables, despite regular faults or userfaults (like
UFFDIO_COPY) may map regular pages into the primary MMU as result of the
pte faults, leading to the guest mode and userland mode going out of
sync and not working on the same memory at all times.
Any other secondary MMU notifier manager (KVM is just one of the many
MMU notifier users) will need the same information if it doesn't want to
run a flood of get_user_pages_fast and it can support multiple
granularity in the secondary MMU mappings, so I think it is justified to
be exposed not just to KVM.
The other option would be to move transparent_hugepage_adjust to
mm/huge_memory.c but that currently has all kind of KVM data structures
in it, so it's definitely not a cut-and-paste work, so I couldn't do a
fix as cleaner as this one for 4.6.
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: "Li, Liang Z" <liang.z.li@intel.com> Cc: Amit Shah <amit.shah@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Thu, 5 May 2016 23:22:15 +0000 (16:22 -0700)]
mm, cma: prevent nr_isolated_* counters from going negative
/proc/sys/vm/stat_refresh warns nr_isolated_anon and nr_isolated_file go
increasingly negative under compaction: which would add delay when
should be none, or no delay when should delay. The bug in compaction
was due to a recent mmotm patch, but much older instance of the bug was
also noticed in isolate_migratepages_range() which is used for CMA and
gigantic hugepage allocations.
The bug is caused by putback_movable_pages() in an error path
decrementing the isolated counters without them being previously
incremented by acct_isolated(). Fix isolate_migratepages_range() by
removing the error-path putback, thus reaching acct_isolated() with
migratepages still isolated, and leaving putback to caller like most
other places do.
Fixes: f20ad70ca336 ("mm, compaction: move pageblock checks up from isolate_migratepages_range()")
[vbabka@suse.cz: expanded the changelog] Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jason Baron [Thu, 5 May 2016 23:22:12 +0000 (16:22 -0700)]
mm: update min_free_kbytes from khugepaged after core initialization
Khugepaged attempts to raise min_free_kbytes if its set too low.
However, on boot khugepaged sets min_free_kbytes first from
subsys_initcall(), and then the mm 'core' over-rides min_free_kbytes
after from init_per_zone_wmark_min(), via a module_init() call.
Khugepaged used to use a late_initcall() to set min_free_kbytes (such
that it occurred after the core initialization), however this was
removed when the initialization of min_free_kbytes was integrated into
the starting of the khugepaged thread.
The fix here is simply to invoke the core initialization using a
core_initcall() instead of module_init(), such that the previous
initialization ordering is restored. I didn't restore the
late_initcall() since start_stop_khugepaged() already sets
min_free_kbytes via set_recommended_min_free_kbytes().
This was noticed when we had a number of page allocation failures when
moving a workload to a kernel with this new initialization ordering. On
an 8GB system this restores min_free_kbytes back to 67584 from 11365
when CONFIG_TRANSPARENT_HUGEPAGE=y is set and either
CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y or
CONFIG_TRANSPARENT_HUGEPAGE_MADVISE=y.
Fixes: 11593e3588e7 ("thp: cleanup khugepaged startup") Signed-off-by: Jason Baron <jbaron@akamai.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Thu, 5 May 2016 23:22:09 +0000 (16:22 -0700)]
huge pagecache: mmap_sem is unlocked when truncation splits pmd
zap_pmd_range()'s CONFIG_DEBUG_VM !rwsem_is_locked(&mmap_sem) BUG() will
be invalid with huge pagecache, in whatever way it is implemented:
truncation of a hugely-mapped file to an unhugely-aligned size would
easily hit it.
(Although anon THP could in principle apply khugepaged to private file
mappings, which are not excluded by the MADV_HUGEPAGE restrictions, in
practice there's a vm_ops check which excludes them, so it never hits
this BUG() - there's no interface to "truncate" an anonymous mapping.)
We could complicate the test, to check i_mmap_rwsem also when there's a
vm_file; but my inclination was to make zap_pmd_range() more readable by
simply deleting this check. A search has shown no report of the issue
in the years since commit da25e8706cd3 ("mm, thp: print useful
information when mmap_sem is unlocked in zap_pmd_range") expanded it
from VM_BUG_ON() - though I cannot point to what commit I would say then
fixed the issue.
But there are a couple of other patches now floating around, neither yet
in the tree: let's agree to retain the check as a VM_BUG_ON_VMA(), as
Matthew Wilcox has done; but subject to a vma_is_anonymous() check, as
Kirill Shutemov has done. And let's get this in, without waiting for
any particular huge pagecache implementation to reach the tree.
Matthew said "We can reproduce this BUG() in the current Linus tree with
DAX PMDs".
Signed-off-by: Hugh Dickins <hughd@google.com> Tested-by: Matthew Wilcox <willy@linux.intel.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andres Lagar-Cavilla <andreslc@google.com> Cc: Yang Shi <yang.shi@linaro.org> Cc: Ning Qu <quning@gmail.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Andres Lagar-Cavilla <andreslc@google.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix problems in uapi definitions reported by Gabriel Laskar: (see
https://lkml.org/lkml/2016/4/5/205 for details)
- move public header file rio_mport_cdev.h to include/uapi/linux directory
- change types in data structures passed as IOCTL parameters
- improve parameter checking in some IOCTL service routines
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Reported-by: Gabriel Laskar <gabriel@lse.epita.fr> Tested-by: Barry Wood <barry.wood@idt.com> Cc: Gabriel Laskar <gabriel@lse.epita.fr> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Aurelien Jacquiot <a-jacquiot@ti.com> Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com> Cc: Barry Wood <barry.wood@idt.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Johannes Weiner [Thu, 5 May 2016 23:22:03 +0000 (16:22 -0700)]
mm: memcontrol: let v2 cgroups follow changes in system swappiness
Cgroup2 currently doesn't have a per-cgroup swappiness setting. We
might want to add one later - that's a different discussion - but until
we do, the cgroups should always follow the system setting. Otherwise
it will be unchangeably set to whatever the ancestor inherited from the
system setting at the time of cgroup creation.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Vladimir Davydov <vdavydov@virtuozzo.com> Cc: <stable@vger.kernel.org> [4.5] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Thu, 5 May 2016 22:40:38 +0000 (15:40 -0700)]
Merge tag 'asm-generic-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull asm-generic syscall fix from Arnd Bergmann:
"My last pull request for asm-generic had just one patch that added two
new system calls to asm/unistd.h, but unfortunately it turned out to
be wrong, pointing arch/tile compat mode at the native handlers rather
than the compat ones.
This was spotted by Yury Norov, who is working on ILP32 mode for
arch/arm64, which would have the same problem when merged. This fixes
the table to use the correct compat syscalls, like the other 64-bit
architectures do.
I'll try to find the time to come up with a solution that prevents
this problem from happening again, by allowing all future system calls
to just get added in a single file for use by all architectures"
* tag 'asm-generic-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
asm-generic: use compat version for preadv2 and pwritev2
Merge tag 'iio-fixes-for-4.6d' of git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into staging-linus
Jonathan writes:
Fourth set of IIO fixes for the 4.6 cycle.
This last minute set is concerned with a regression in the mpu6050 driver.
The regression causes a null pointer dereference on any ACPI device
that has one of these present such as the ASUS T100TA Baytrail/T.
The issue was known but thought (i.e. missunderstood by me)
to only be a possible with no reports, so was routed via the normal merge
window. Turns out this was wrong (thanks to Alan for reporting the crash).
The pull is just for the null dereference fix and a followup fix
that also stops the reported name of the device being NULL.
* mpu6050
- Fix a 'possible' NULL dereference introduced as part of splitting the
driver to allow both i2c and spi to be supported. The issue affects ACPI
systems with this device.
- Fix a follow up issue where the name and chip id both get set to null if
the device driver instance is instantiated from ACPI tables.
Linus Torvalds [Thu, 5 May 2016 22:31:35 +0000 (15:31 -0700)]
Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Arnd Bergmann:
"Here are a couple last-minute fixes for ARM SoCs. Most of them are
for the OMAP platforms, the rest are all for different platforms.
OMAP:
All dts fixes, mostly affecting voltages and pinctrl for various
device drivers:
- Regulator minimum voltage fixes for omap5
- ISP syscon register offset fix for omap3
- Fix regulator initial modes for n900
- Fix omap5 pinctrl wkup instance size
Allwinner:
Remove incorrect constraints from a dcdc1 regulator
Alltera SoCFPGA:
Fix compilation in thumb2 mode
Samsung exynos:
Fix a potential oops in the pm-domain error handling
Davinci:
Avoid a link error if NVMEM is disabled
Renesas:
Do not mark an external uart clock as disabled, to allow probing
the uarts"
* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
ARM: davinci: only use NVMEM when available
ARM: SoCFPGA: Fix secondary CPU startup in thumb2 kernel
ARM: dts: omap5: fix range of permitted wakeup pinmux registers
ARM: dts: omap3-n900: Specify peripherals LDO regulators initial mode
ARM: dts: omap3: Fix ISP syscon register offset
ARM: dts: omap5-cm-t54: fix ldo1_reg and ldo4_reg ranges
ARM: dts: omap5-board-common: fix ldo1_reg and ldo4_reg ranges
arm64: dts: r8a7795: Don't disable referenced optional scif clock
ARM: EXYNOS: Properly skip unitialized parent clock in power domain on
ARM: dts: sun8i-q8-common: Do not set constraints on dc1sw regulator
Howard Cochran [Thu, 10 Mar 2016 06:12:39 +0000 (01:12 -0500)]
writeback: Fix performance regression in wb_over_bg_thresh()
Commit 46484d1b117c ("writeback: update wb_over_bg_thresh() to use
wb_domain aware operations") unintentionally changed this function's
meaning from "are there more dirty pages than the background writeback
threshold" to "are there more dirty pages than the writeback threshold".
The background writeback threshold is typically half of the writeback
threshold, so this had the effect of raising the number of dirty pages
required to cause a writeback worker to perform background writeout.
This can cause a very severe performance regression when a BDI uses
BDI_CAP_STRICTLIMIT because balance_dirty_pages() and the writeback worker
can now disagree on whether writeback should be initiated.
For example, in a system having 1GB of RAM, a single spinning disk, and a
"pass-through" FUSE filesystem mounted over the disk, application code
mmapped a 128MB file on the disk and was randomly dirtying pages in that
mapping.
Because FUSE uses strictlimit and has a default max_ratio of only 1%, in
balance_dirty_pages, thresh is ~200, bg_thresh is ~100, and the
dirty_freerun_ceiling is the average of those, ~150. So, it pauses the
dirtying processes when we have 151 dirty pages and wakes up a background
writeback worker. But the worker tests the wrong threshold (200 instead of
100), so it does not initiate writeback and just returns.
Thus, balance_dirty_pages keeps looping, sleeping and then waking up the
worker who will do nothing. It remains stuck in this state until the few
dirty pages that we have finally expire and we write them back for that
reason. Then the whole process repeats, resulting in near-zero throughput
through the FUSE BDI.
The fix is to call the parameterized variant of wb_calc_thresh, so that the
worker will do writeback if the bg_thresh is exceeded which was the
behavior before the referenced commit.
Vladimir Murzin [Wed, 4 May 2016 09:39:02 +0000 (10:39 +0100)]
ARM: 8573/1: domain: move {set,get}_domain under config guard
Recursive undefined instrcution falut is seen with R-class taking an
exception. The reson for that is __show_regs() tries to get domain
information, but domains is not available on !MMU cores, like R/M
class.
Fix it by puting {set,get}_domain functions under CONFIG_CPU_CP15_MMU
guard and providing stubs for the case where domains is not supported.
Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
ARM: 8572/1: nommu: change memory reserve for the vectors
Commit 2c628e5b (ARM: move vector stubs) moved the vector stubs in an
additional page above the base vector one. This change wasn't taken into
account by the nommu memreserve.
This patch ensures that the kernel won't overwrite any vector stub on
nommu.
[changed the MPU side too]
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Commit b177076 (ARM: 8025/1: Get rid of meminfo) broke the support for
MPU on ARMv7-R. This patch adapts the code inside CONFIG_ARM_MPU to use
memblocks appropriately.
MPU initialisation only uses the first memory region, and removes all
subsequent ones. Because looping over all regions that need removal is
inefficient, and memblock_remove already handles memory ranges, we can
flatten the 'for_each_memblock' part.
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
iSER currently has a couple places that set max_sectors in either the host
template or SCSI host, and all of them get it wrong.
This patch instead uses a single assignment that (hopefully) gets it right:
the max_sectors value must be derived from the number of segments in the
FR or FMR structure, but actually be one lower than the page size multiplied
by the number of sectors, as it has to handle the case of non-aligned I/O.
Without this I get trivial to reproduce hangs when running xfstests
(on XFS) over iSER to Linux targets.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Acked-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
This oops happens with the namespace_sem held and can be triggered by
non-root users. An all around not pleasant experience.
To avoid this scenario when finding the appropriate source mount to
copy stop the walk up the mnt_master chain when the first source mount
is encountered.
Further rewrite the walk up the last_source mnt_master chain so that
it is clear what is going on.
The reason why the first source mount is special is that it it's
mnt_parent is not a mount in the dest_mnt propagation tree, and as
such termination conditions based up on the dest_mnt mount propgation
tree do not make sense.
To avoid other kinds of confusion last_dest is not changed when
computing last_source. last_dest is only used once in propagate_one
and that is above the point of the code being modified, so changing
the global variable is meaningless and confusing.
Wang YanQing [Thu, 5 May 2016 13:14:21 +0000 (14:14 +0100)]
x86/sysfb_efi: Fix valid BAR address range check
The code for checking whether a BAR address range is valid will break
out of the loop when a start address of 0x0 is encountered.
This behaviour is wrong since by breaking out of the loop we may miss
the BAR that describes the EFI frame buffer in a later iteration.
Because of this bug I can't use video=efifb: boot parameter to get
efifb on my new ThinkPad E550 for my old linux system hard disk with
3.10 kernel. In 3.10, efifb is the only choice due to DRM/I915 not
supporting the GPU.
This patch also add a trivial optimization to break out after we find
the frame buffer address range without testing later BARs.
Signed-off-by: Wang YanQing <udknight@gmail.com>
[ Rewrote changelog. ] Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk> Reviewed-by: Peter Jones <pjones@redhat.com> Cc: <stable@vger.kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: David Herrmann <dh.herrmann@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tomi Valkeinen <tomi.valkeinen@ti.com> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/1462454061-21561-2-git-send-email-matt@codeblueprint.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
Initial HIGHMEM support on ARC was introduced for PAE40 where the low
memory (0x8000_0000 based) and high memory (0x1_0000_0000) were
physically contiguous. So CONFIG_FLATMEM sufficed (despite a peipheral
hole in the middle, which wasted a bit of struct page memory, but things
worked).
However w/o PAE, highmem was not possible and we could only reach
~1.75GB of DDR. Now there is a use case to access ~4GB of DDR w/o PAE40
The idea is to have low memory at canonical 0x8000_0000 and highmem
at 0 so enire 4GB address space is available for physical addressing
This needs additional platform/interconnect mapping to convert
the non contiguous physical addresses into linear bus adresses.
From Linux point of view, non contiguous divide means FLATMEM no
longer works and DISCONTIGMEM is needed to track the pfns in the 2
regions.
This scheme would also work for PAE40, only better in that we don't
waste struct page memory for the peripheral hole.
Vineet Gupta [Thu, 5 May 2016 09:23:48 +0000 (14:53 +0530)]
ARC: Fix PAE40 boot failures due to PTE truncation
So a benign looking cleanup which macro'ized PAGE_SHIFT shifts turned
out to be bad (since it was done non-sensically across the board).
It caused boot failures with PAE40 as forced cast to (unsigned long)
from newly introduced virt_to_pfn() was causing truncatiion of the
(long long) pte/paddr values.
It is OK to use this in accessors dealing with kernel virtual address,
pointers etc, but not for PTE values themelves.
Fixes: cJ2ff5cf2735c ("ARC: mm: Use virt_to_pfn() for addr >> PAGE_SHIFT pattern) Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Vineet Gupta [Thu, 5 May 2016 08:02:34 +0000 (13:32 +0530)]
ARC: Add missing io barriers to io{read,write}{16,32}be()
While reviewing a different change to asm-generic/io.h Arnd spotted that
ARC ioread32 and ioread32be both of which come from asm-generic versions
are not symmetrical in terms of calling the io barriers.
exposed some issues with the fact that we were relying on the EFI memory
mapping mechanisms to map in our MMRs for us, after commit c79205501043.
Rather than revert the entire commit and go back to forcing
EFI_OLD_MEMMAP on all UVs, we're going to add the call to map_low_mmrs()
back into uv_system_init(), and then fix up our EFI runtime calls to use
the appropriate page table.
For now, UV2+ will still need efi=old_map to boot, but there will be
other changes soon that should eliminate the need for this.
Signed-off-by: Alex Thorlton <athorlton@sgi.com> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Adam Buchbinder <adam.buchbinder@gmail.com> Cc: Len Brown <len.brown@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Russ Anderson <rja@sgi.com> Cc: Dimitri Sivanich <sivanich@sgi.com> Link: http://lkml.kernel.org/r/1462401592-120735-1-git-send-email-athorlton@sgi.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
Dave Airlie [Thu, 5 May 2016 02:12:09 +0000 (12:12 +1000)]
Merge tag 'drm-intel-fixes-2016-05-02' of git://anongit.freedesktop.org/drm-intel into drm-fixes
i915 fixes for 4.6. A bit more than I'd like at this stage, but
OTOH they're all stable material.
* tag 'drm-intel-fixes-2016-05-02' of git://anongit.freedesktop.org/drm-intel:
drm/i915: Make RPS EI/thresholds multiple of 25 on SNB-BDW
drm/i915: Fake HDMI live status
drm/i915: Fix eDP low vswing for Broadwell
drm/i915/ddi: Fix eDP VDD handling during booting and suspend/resume
drm/i915: Fix system resume if PCI device remained enabled
drm/i915: Avoid stalling on pending flips for legacy cursor updates
Dave Airlie [Thu, 5 May 2016 00:37:25 +0000 (10:37 +1000)]
Merge branch 'drm-fixes-4.6' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
two fixes for hw lockups and one for a double free
* 'drm-fixes-4.6' of git://people.freedesktop.org/~agd5f/linux:
drm/amdgpu: make sure vertical front porch is at least 1
drm/radeon: make sure vertical front porch is at least 1
drm/amdgpu: set metadata pointer to NULL after freeing.
If of_node is set before calling platform_device_add, the driver core
will try to use of: modalias matching, which fails because the device
tree nodes don't have a compatible property set. This patch fixes
imx-ipuv3-crtc module autoloading by setting the of_node property only
after the platform modalias is set.
Fixes: bd2e83e06dbb ("gpu: ipu-v3: Assign of_node of child platform devices to corresponding ports") Reported-by: Dennis Gilmore <dennis@ausil.us> Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de> Tested-By: Dennis Gilmore <dennis@ausil.us> Cc: stable@vger.kernel.org # 4.4+ Signed-off-by: Dave Airlie <airlied@redhat.com>
Viresh Kumar [Wed, 4 May 2016 13:19:55 +0000 (18:49 +0530)]
PM / OPP: Remove useless check
Regulators are optional for devices using OPPs and the OPP core
shouldn't be printing any errors for such missing regulators.
It was fine before the commit 7e88ed669b6c, but that failed to update
this part of the code to remove an 'always true' check and an extra
unwanted print message.
Fix that now.
Fixes: 7e88ed669b6c (PM / OPP: Initialize regulator pointer to an error value) Reported-by: Marc Gonzalez <marc_gonzalez@sigmadesigns.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Yury Norov [Mon, 2 May 2016 16:12:47 +0000 (19:12 +0300)]
asm-generic: use compat version for preadv2 and pwritev2
Compat architectures that does not use generic unistd (mips, s390),
declare compat version in their syscall tables for preadv2 and
pwritev2. Generic unistd syscall table should do it as well.
[arnd: this initially slipped through the review and an
incorrect patch got merged. arch/tile/ is the only architecture
that could be affected for their 32-bit compat mode, every
other architecture we support today is fine.]
Set the mutex owner thread ID.
Original patch from: Prarit Bhargava <prarit@redhat.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=115121 Link: https://github.com/acpica/acpica/commit/7a3bd2d9 Signed-off-by: Prarit Bhargava <prarit@redhat.com> Tested-by: Andy Lutomirski <luto@kernel.org> # On a Dell XPS 13 9350 Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Lv Zheng <lv.zheng@intel.com> Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Linus Torvalds [Wed, 4 May 2016 18:14:00 +0000 (11:14 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull IMA fix from James Morris.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
ima: fix the string representation of the LSM/IMA hook enumeration ordering
Linus Torvalds [Wed, 4 May 2016 18:00:05 +0000 (11:00 -0700)]
Merge tag 'for-linus-4.6-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen regression fixes from David Vrabel:
- Fix two regressions causing crashes in 32-bit PV guests
- Fix a regression in the evtchn driver
* tag 'for-linus-4.6-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/evtchn: fix ring resize when binding new events
xen/balloon: Fix crash when ballooning on x86 32 bit PAE
xen: Fix page <-> pfn conversion on 32 bit systems
Jan Beulich [Wed, 4 May 2016 13:02:36 +0000 (07:02 -0600)]
xen/evtchn: fix ring resize when binding new events
The copying of ring data was wrong for two cases: For a full ring
nothing got copied at all (as in that case the canonicalized producer
and consumer indexes are identical). And in case one or both of the
canonicalized (after the resize) indexes would point into the second
half of the buffer, the copied data ended up in the wrong (free) part
of the new buffer. In both cases uninitialized data would get passed
back to the caller.
Fix this by simply copying the old ring contents twice: Once to the
low half of the new buffer, and a second time to the high half.
This addresses the inability to boot a HVM guest with 64 or more
vCPUs. This regression was caused by b50359880d1fa7b3 (xen/evtchn:
dynamically grow pending event channel ring).
Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Cc: <stable@vger.kernel.org> # 4.4+ Signed-off-by: David Vrabel <david.vrabel@citrix.com>
After commit 73542d67c1c8 "intel_pstate: Remove freq calculation from
intel_pstate_calc_busy()" intel_pstate_get() calls get_avg_frequency()
to compute the average frequency, which is problematic for two reasons.
First, intel_pstate_get() may be invoked before the driver reads the
CPU feedback registers for the first time and if that happens,
get_avg_frequency() will attempt to divide by zero.
Second, the get_avg_frequency() call in intel_pstate_get() is racy
with respect to intel_pstate_sample() and it may end up returning
completely meaningless values for this reason.
Moreover, after commit df42e9279b3e "intel_pstate: Move
intel_pstate_calc_busy() into get_target_pstate_use_performance()"
sample.core_pct_busy is never computed on Atom, but it is used in
intel_pstate_adjust_busy_pstate() in that case too.
To address those problems notice that if sample.core_pct_busy
was used in the average frequency computation carried out by
get_avg_frequency(), both the divide by zero problem and the
race with respect to intel_pstate_sample() would be avoided.
Accordingly, move the invocation of intel_pstate_calc_busy() from
get_target_pstate_use_performance() to intel_pstate_update_util(),
which also will take care of the uninitialized sample.core_pct_busy
on Atom, and modify get_avg_frequency() to use sample.core_pct_busy
as per the above.
Reported-by: kernel test robot <ying.huang@linux.intel.com> Link: http://marc.info/?l=linux-kernel&m=146226437623173&w=4 Fixes: 73542d67c1c8 "intel_pstate: Remove freq calculation from intel_pstate_calc_busy()" Fixes: df42e9279b3e "intel_pstate: Move intel_pstate_calc_busy() into get_target_pstate_use_performance()" Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
ima: fix the string representation of the LSM/IMA hook enumeration ordering
This patch fixes the string representation of the LSM/IMA hook enumeration
ordering used for displaying the IMA policy.
Fixes: b234434946a3 ("ima: support for kexec image and initramfs") Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Tested-by: Eric Richter <erichte@linux.vnet.ibm.com> Signed-off-by: James Morris <james.l.morris@oracle.com>
Daniel Baluta [Thu, 17 Mar 2016 16:32:44 +0000 (18:32 +0200)]
iio: imu: mpu6050: Fix name/chip_id when using ACPI
When using ACPI, id is NULL and the current code automatically
defaults name to NULL and chip id to 0. We should instead use
the data provided in the ACPI device table.
Fixes: c816d9e7a57b ("iio: imu: mpu6050: fix possible NULL dereferences") Signed-off-by: Daniel Baluta <daniel.baluta@intel.com> Reviewed-By: Matt Ranostay <matt.ranostay@intel.com> Signed-off-by: Jonathan Cameron <jic23@kernel.org>
Josh Boyer [Tue, 3 May 2016 19:29:41 +0000 (20:29 +0100)]
x86/efi-bgrt: Switch all pr_err() to pr_notice() for invalid BGRT
The promise of pretty boot splashes from firmware via BGRT was at
best only that; a promise. The kernel diligently checks to make
sure the BGRT data firmware gives it is valid, and dutifully warns
the user when it isn't. However, it does so via the pr_err log
level which seems unnecessary. The user cannot do anything about
this and there really isn't an error on the part of Linux to
correct.
This lowers the log level by using pr_notice instead. Users will
no longer have their boot process uglified by the kernel reminding
us that firmware can and often is broken when the 'quiet' kernel
parameter is specified. Ironic, considering BGRT is supposed to
make boot pretty to begin with.
Signed-off-by: Josh Boyer <jwboyer@fedoraproject.org> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Môshe van der Sterre <me@moshe.nl> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/1462303781-8686-4-git-send-email-matt@codeblueprint.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
Matt Fleming [Tue, 3 May 2016 19:29:39 +0000 (20:29 +0100)]
MAINTAINERS: Remove asterisk from EFI directory names
Mark reported that having asterisks on the end of directory names
confuses get_maintainer.pl when it encounters subdirectories, and that
my name does not appear when run on drivers/firmware/efi/libstub.
Reported-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk> Cc: <stable@vger.kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/1462303781-8686-2-git-send-email-matt@codeblueprint.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
Linus Torvalds [Wed, 4 May 2016 01:02:38 +0000 (18:02 -0700)]
Merge tag 'trace-fixes-v4.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fix from Steven Rostedt:
"Chunyu Hu noticed that if one writes into the trigger files within the
ftrace subsystem of events that it can cause an oops. This file is
only writable by root, but still is a bug that needs to be fixed"
* tag 'trace-fixes-v4.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Don't display trigger file for events that can't be enabled
Pull networking fixes from David Miller:
"Some straggler bug fixes:
1) Batman-adv DAT must consider VLAN IDs when choosing candidate
nodes, from Antonio Quartulli.
2) Fix botched reference counting of vlan objects and neigh nodes in
batman-adv, from Sven Eckelmann.
3) netem can crash when it sees GSO packets, the fix is to segment
then upon ->enqueue. Fix from Neil Horman with help from Eric
Dumazet.
4) Fix VXLAN dependencies in mlx5 driver Kconfig, from Matthew
Finlay.
5) Handle VXLAN ops outside of rcu lock, via a workqueue, in mlx5,
since it can sleep. Fix also from Matthew Finlay.
6) Check mdiobus_scan() return values properly in pxa168_eth and macb
drivers. From Sergei Shtylyov.
7) If the netdevice doesn't support checksumming, disable
segmentation. From Alexandery Duyck.
8) Fix races between RDS tcp accept and sending, from Sowmini
Varadhan.
9) In macb driver, probe MDIO bus before we register the netdev,
otherwise we can try to open the device before it is really ready
for that. Fix from Florian Fainelli.
10) Netlink attribute size for ILA "tunnels" not calculated properly,
fix from Nicolas Dichtel"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
ipv6/ila: fix nlsize calculation for lwtunnel
net: macb: Probe MDIO bus before registering netdev
RDS: TCP: Synchronize accept() and connect() paths on t_conn_lock.
RDS:TCP: Synchronize rds_tcp_accept_one with rds_send_xmit when resetting t_sock
vxlan: Add checksum check to the features check function
net: Disable segmentation if checksumming is not supported
net: mvneta: Remove superfluous SMP function call
macb: fix mdiobus_scan() error check
pxa168_eth: fix mdiobus_scan() error check
net/mlx5e: Use workqueue for vxlan ops
net/mlx5e: Implement a mlx5e workqueue
net/mlx5: Kconfig: Fix MLX5_EN/VXLAN build issue
net/mlx5: Unmap only the relevant IO memory mapping
netem: Segment GSO packets on enqueue
batman-adv: Fix reference counting of hardif_neigh_node object for neigh_node
batman-adv: Fix reference counting of vlan object for tt_local_entry
batman-adv: B.A.T.M.A.N V - make sure iface is reactivated upon NETDEV_UP event
batman-adv: fix DAT candidate selection (must use vid)
Linus Torvalds [Tue, 3 May 2016 21:23:58 +0000 (14:23 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse fixes from Miklos Szeredi:
"Fix a regression and update the MAINTAINERS entry for fuse"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: update mailing list in MAINTAINERS
fuse: Fix return value from fuse_get_user_pages()
Nicolas Dichtel [Tue, 3 May 2016 07:58:27 +0000 (09:58 +0200)]
ipv6/ila: fix nlsize calculation for lwtunnel
The handler 'ila_fill_encap_info' adds one attribute: ILA_ATTR_LOCATOR.
Fixes: df96e99b0e68 ("net: Identifier Locator Addressing module") CC: Tom Herbert <tom@herbertland.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: macb: Probe MDIO bus before registering netdev
The current sequence makes us register for a network device prior to
registering and probing the MDIO bus which could lead to some unwanted
consequences, like a thread of execution calling into ndo_open before
register_netdev() returns, while the MDIO bus is not ready yet.
Rework the sequence to register for the MDIO bus, and therefore attach
to a PHY prior to calling register_netdev(), which implies reworking the
error path a bit.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 3 May 2016 20:03:45 +0000 (16:03 -0400)]
Merge branch 'rds-fixes'
Sowmini Varadhan says:
====================
RDS: TCP: sychronization during connection startup
This patch series ensures that the passive (accept) side of the
TCP connection used for RDS-TCP is correctly synchronized with
any concurrent active (connect) attempts for a given pair of peers.
Patch 1 in the series makes sure that the t_sock in struct
rds_tcp_connection is only reset after any threads in rds_tcp_xmit
have completed (otherwise a null-ptr deref may be encountered).
Patch 2 synchronizes rds_tcp_accept_one() with the rds_tcp*connect()
path.
v2: review comments from Santosh Shilimkar, other spelling corrections
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
RDS: TCP: Synchronize accept() and connect() paths on t_conn_lock.
An arbitration scheme for duelling SYNs is implemented as part of
commit 1610b9ae2f38 ("RDS-TCP: Reset tcp callbacks if re-using an
outgoing socket in rds_tcp_accept_one()") which ensures that both nodes
involved will arrive at the same arbitration decision. However, this
needs to be synchronized with an outgoing SYN to be generated by
rds_tcp_conn_connect(). This commit achieves the synchronization
through the t_conn_lock mutex in struct rds_tcp_connection.
The rds_conn_state is checked in rds_tcp_conn_connect() after acquiring
the t_conn_lock mutex. A SYN is sent out only if the RDS connection is
not already UP (an UP would indicate that rds_tcp_accept_one() has
completed 3WH, so no SYN needs to be generated).
Similarly, the rds_conn_state is checked in rds_tcp_accept_one() after
acquiring the t_conn_lock mutex. The only acceptable states (to
allow continuation of the arbitration logic) are UP (i.e., outgoing SYN
was SYN-ACKed by peer after it sent us the SYN) or CONNECTING (we sent
outgoing SYN before we saw incoming SYN).
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
RDS:TCP: Synchronize rds_tcp_accept_one with rds_send_xmit when resetting t_sock
There is a race condition between rds_send_xmit -> rds_tcp_xmit
and the code that deals with resolution of duelling syns added
by commit 1610b9ae2f38 ("RDS-TCP: Reset tcp callbacks if re-using an
outgoing socket in rds_tcp_accept_one()").
Specifically, we may end up derefencing a null pointer in rds_send_xmit
if we have the interleaving sequence:
rds_tcp_accept_one rds_send_xmit
The race condition can be avoided without adding the overhead of
additional locking in the xmit path: have rds_tcp_accept_one wait
for rds_tcp_xmit threads to complete before resetting callbacks.
The synchronization can be done in the same manner as rds_conn_shutdown().
First set the rds_conn_state to something other than RDS_CONN_UP
(so that new threads cannot get into rds_tcp_xmit()), then wait for
RDS_IN_XMIT to be cleared in the conn->c_flags indicating that any
threads in rds_tcp_xmit are done.
Fixes: 1610b9ae2f38 ("RDS-TCP: Reset tcp callbacks if re-using an
outgoing socket in rds_tcp_accept_one()") Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 3 May 2016 20:00:55 +0000 (16:00 -0400)]
Merge branch 'tunnel-csum-and-sg-offloads'
Alexander Duyck says:
====================
Fixes for tunnel checksum and segmentation offloads
This patch series is a subset of patches I had submitted for net-next. I
plan to drop these two patches from the v3 of "Fix Tunnel features and
enable GSO partial for several drivers" and I am instead submitting them
for net since these are truly fixes and likely will need to be backported
to stable branches.
This series addresses 2 specific issues. The first is that we could
request TSO on a v4 inner header while not supporting checksum offload of
the outer IPv6 header. The second is that we could request an IPv6 inner
checksum offload without validating that we could actually support an inner
IPv6 checksum offload.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Mon, 2 May 2016 16:25:16 +0000 (09:25 -0700)]
vxlan: Add checksum check to the features check function
We need to perform an additional check on the inner headers to determine if
we can offload the checksum for them. Previously this check didn't occur
so we would generate an invalid frame in the case of an IPv6 header
encapsulated inside of an IPv4 tunnel. To fix this I added a secondary
check to vxlan_features_check so that we can verify that we can offload the
inner checksum.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Mon, 2 May 2016 16:25:10 +0000 (09:25 -0700)]
net: Disable segmentation if checksumming is not supported
In the case of the mlx4 and mlx5 driver they do not support IPv6 checksum
offload for tunnels. With this being the case we should disable GSO in
addition to the checksum offload features when we find that a device cannot
perform a checksum on a given packet type.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Since commit 354a9025185a ("cpu/hotplug: Fix rollback during error-out
in __cpu_disable()") it is ensured that callbacks of CPU_ONLINE and
CPU_DOWN_PREPARE are processed on the hotplugged CPU. Due to this SMP
function calls are no longer required.
Replace smp_call_function_single() with a direct call to
mvneta_percpu_enable() or mvneta_percpu_disable(). The functions do
not require to be called with interrupts disabled, therefore the
smp_call_function_single() calling convention is not preserved.
Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Cc: netdev@vger.kernel.org Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Sergei Shtylyov [Sat, 30 Apr 2016 22:47:36 +0000 (01:47 +0300)]
macb: fix mdiobus_scan() error check
Now mdiobus_scan() returns ERR_PTR(-ENODEV) instead of NULL if the PHY
device ID was read as all ones. As this was not an error before, this
value should be filtered out now in this driver.
Fixes: b74766a0a0fe ("phylib: don't return NULL from get_phy_device()") Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Sergei Shtylyov [Sat, 30 Apr 2016 20:35:11 +0000 (23:35 +0300)]
pxa168_eth: fix mdiobus_scan() error check
Since mdiobus_scan() returns either an error code or NULL on error, the
driver should check for both, not only for NULL, otherwise a crash is
imminent...
Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Tue, 3 May 2016 18:06:01 +0000 (11:06 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid
Pull HID fixes from Jiri Kosina:
"Fixes for the HID subsystem:
- regression fix for Wacom driver; commit introduced in 4.6-rc1
mistakenly removed line that should be kept. Fix by Ping Cheng
- two device-specific quirks, by Ping Cheng and Nazar Mokrynskyi"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
HID: wacom: add missed stylus_in_proximity line back
HID: Fix boot delay for Creative SB Omni Surround 5.1 with quirk
HID: wacom: Add support for DTK-1651