net:dccp: do not report ICMP redirects to user space

DCCP shouldn't be setting sk_err on redirects as it
isn't an error condition. It should be doing exactly
what TCP is doing and leaving the error handler without
touching the socket.

Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cnic: Fix crash in cnic_bnx2x_service_kcq()

commit f6072305af3520f940a69a36aebdfcdabafbfda4
    cnic: Use CHIP_NUM macros from bnx2x.h

changed the code to use the bnx2x macro NO_FCOE() to determine if FCoE
is supported or not.  There is another place in cnic that is still using
the old method to determine if FCoE is supported or not.  The 2 methods
may not yield the same result after the network interface is brought down
and up.  This will cause the crash as cnic_bnx2x_service_kcq() will access
the uninitialized cp->kcq2.

The fix is to consistently use the same macro CNIC_SUPPORTS_FCOE() which
uses the bnx2x NO_FCOE() macro.  As a follow-up, we can clean up the code
to remove the old method as it is no longer needed.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x, cnic, bnx2i, bnx2fc: Fix bnx2i and bnx2fc regressions.

commit 66f9b6ae1b7fb6e629e17236b16c220b6a0b8f2a
bnx2x: VF RSS support - PF side

changed the configuration of the doorbell HW and it broke iSCSI and FCoE.
We fix this by making compatible changes to the doorbell address in bnx2i
and bnx2fc. For the userspace driver, we need to pass a modified CID
so that the existing userspace driver will calculate the correct doorbell
address and continue to work.

Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: Eddie Wai <eddie.wai@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'batman-adv-fix-for-davem' of git://git.open-mesh.org/linux-merge

Included change:
- fix the Bridge Loop Avoidance component by marking the variables containing
the VLAN ID with the HAS_TAG flag when needed.

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf

Pablo Neira Ayuso says:

====================
The following patchset contains Netfilter fixes for your net tree,
mostly targeted to ipset, they are:

* Fix ICMPv6 NAT due to wrong comparison, code instead of type, from
  Phil Oester.

* Fix RCU race in conntrack extensions release path, from Michal Kubecek.

* Fix missing inversion in the userspace ipset test command match if
  the nomatch option is specified, from Jozsef Kadlecsik.

* Skip layer 4 protocol matching in ipset in case of IPv6 fragments,
  also from Jozsef Kadlecsik.

* Fix sequence adjustment in nfnetlink_queue due to using the netlink
  skb instead of the network skb, from Gao feng.

* Make sure we cannot swap sets with different layer 3 families in
  ipset, from Jozsef Kadlecsik.

* Fix possible bogus matching in ipset if hash sets with net elements
  are used, from Oliver Smith.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

vxlan: Avoid creating fdb entry with NULL destination

Commit a762d3d70d3eb5bbd6f1f09e31888bd5d9b3c55a
   vxlan: add implicit fdb entry for default destination
creates an implicit fdb entry for default destination. This results
in an invalid fdb entry if default destination is not specified.
For example:
  ip link add vxlan1 type vxlan id 100
creates the following fdb entry
  00:00:00:00:00:00 dev vxlan1 dst 0.0.0.0 self permanent

This patch fixes this issue by creating an fdb entry only if a
valid default destination is specified.

Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: fix RTO calculated from cached RTT

Commit 112530f77a12a ("tcp: do not use cached RTT for RTT estimation")
did not correctly account for the fact that crtt is the RTT shifted
left 3 bits. Fix the calculation to consistently reflect this fact.
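
To make the scaling concrete, here is a minimal user-space sketch, not the
kernel code. It assumes, as stated above, that the cached value is the RTT
shifted left by 3 bits. The RTO rule used here (rtt + max(2 * rtt, rto_min))
and the names rto_from_cached_rtt and rto_min are assumptions made for
illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: derive an initial RTO from a cached RTT that is
 * stored left-shifted by 3 bits (units of 1/8 of a tick). */
uint32_t rto_from_cached_rtt(uint32_t crtt_scaled, uint32_t rto_min)
{
    uint32_t rtt = crtt_scaled >> 3;   /* undo the <<3 scaling first */
    uint32_t var = 2 * rtt;            /* assumed variance term */

    if (var < rto_min)
        var = rto_min;                 /* never go below the minimum */
    return rtt + var;
}
```

The point of the sketch is that every term is computed from the unscaled
value; mixing scaled and unscaled quantities would inflate the result by up
to a factor of 8.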

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-By: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

drivers: net: phy: cicada.c: clears warning Use #include <linux/io.h> instead of <asm/io.h>

Clears the following warnings:
WARNING: Use include <linux/io.h> instead of <asm/io.h>
WARNING: Use include <linux/uaccess.h> instead of <asm/uaccess.h>

Signed-off-by: Avinash Kumar <avi.kp.137@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net loopback: Set loopback_dev to NULL when freed

It has recently turned up that we have a number of long-standing bugs
in the network stack cleanup code that use the loopback device after
it has been freed. They have gone unnoticed because in most cases the
storage allocated to the loopback device is not reused when those
accesses happen.

Set loopback_dev to NULL to trigger oopses instead of silent data corruption
when we hit this class of bug.
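
The idea can be shown with a small user-space sketch. The struct net_model
type and release_loopback() are hypothetical names for this example and are
not the kernel's data structures.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-namespace state that owns the device. */
struct net_model {
    void *loopback_dev;
};

void release_loopback(struct net_model *net)
{
    free(net->loopback_dev);
    /* Clear the pointer so a later use faults immediately instead of
     * silently reading or writing freed memory. */
    net->loopback_dev = NULL;
}
```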

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

batman-adv: set the TAG flag for the vid passed to BLA

When receiving or sending a packet on a VLAN, the
vid has to be marked with the TAG flag in order to make any
component in batman-adv understand that the packet is coming
from a really tagged network.

This fixes the Bridge Loop Avoidance behaviour which was not
able to send announces over VLAN interfaces.

Introduced by 0b1da1765fdb00ca5d53bc95c9abc70dfc9aae5b
("batman-adv: change VID semantic in the BLA code")

Signed-off-by: Antonio Quartulli <antonio@open-mesh.org>
Acked-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>

netfilter: nfnetlink_queue: use network skb for sequence adjustment

Instead of the netlink skb.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

Merge branch 'sfc-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/bwh/sfc

Ben Hutchings says:

====================
Some bug fixes and future-proofing for the recently added SFC9120
support:

1. Minimal support for the 40G configuration.
2. Disable the incomplete PTP/hardware timestamping support.
3. Reset MAC stats properly after a firmware upgrade.
4. Re-check the datapath firmware capabilities after the controller is
reset.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: sctp: rfc4443: do not report ICMP redirects to user space

Adapt the same behaviour for SCTP as present in TCP for ICMP redirect
messages. For IPv6, RFC4443, section 2.4. says:

  ...
  (e) An ICMPv6 error message MUST NOT be originated as a result of
      receiving the following:
  ...
       (e.2) An ICMPv6 redirect message [IPv6-DISC].
  ...

Therefore, do not report an error to user space, just invoke dst's redirect
callback and leave, same for IPv4 as done in TCP as well. The implication
w/o having this patch could be that the reception of such packets would
generate a poll notification and in worst case it could even tear down the
whole connection. Therefore, stop updating sk_err on redirects.
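
A minimal user-space model of the intended control flow follows. The types
and names (sock_model, icmp_kind, icmp_err_model) are hypothetical and only
illustrate "run the redirect callback and leave without touching sk_err";
they are not the SCTP error handler.

```c
#include <assert.h>

/* Hypothetical classification of an incoming ICMP message. */
enum icmp_kind { ICMP_KIND_UNREACH, ICMP_KIND_REDIRECT };

struct sock_model {
    int sk_err;          /* pending error that user space would see */
    int route_updates;   /* how often the redirect callback ran */
};

void icmp_err_model(struct sock_model *sk, enum icmp_kind kind, int err)
{
    if (kind == ICMP_KIND_REDIRECT) {
        sk->route_updates++;   /* stands in for the dst redirect callback */
        return;                /* leave: a redirect is not an error */
    }
    sk->sk_err = err;          /* real errors are still reported */
}
```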

Reported-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Suggested-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: usb: cdc_ether: use usb.h macros whenever possible

Use USB_DEVICE_AND_INTERFACE_INFO and USB_VENDOR_AND_INTERFACE_INFO
macros to reduce boilerplate.

Signed-off-by: Fabio Porcedda <fabio.porcedda@gmail.com>
Acked-by: Oliver Neukum <oliver@neukum.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: usb: cdc_ether: fix checkpatch errors and warnings

Signed-off-by: Fabio Porcedda <fabio.porcedda@gmail.com>
Acked-by: Oliver Neukum <oliver@neukum.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: usb: cdc_ether: Use wwan interface for Telit modules

Signed-off-by: Fabio Porcedda <fabio.porcedda@gmail.com>
Cc: <stable@vger.kernel.org> # 3.0+ as far back as it applies cleanly
Acked-by: Oliver Neukum <oliver@neukum.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ip6_tunnels: raddr and laddr are inverted in nl msg

IFLA_IPTUN_LOCAL and IFLA_IPTUN_REMOTE were inverted.

Introduced by 4e7d6f1a8b63 (ip6tnl: advertise tunnel param via rtnl).

Signed-off-by: Ding Zhi <zhi.ding@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bgmac: implement unaligned addressing for DMA rings that support it

This is an important patch for new devices that support unaligned
addressing. Those devices suffer from a backward-compatibility bug in the
DMA engine. In theory we should be able to use the old mechanism, but in
practice the DMA address seems to be randomly copied into the status register
when the hardware reaches the end of a ring. This breaks reading the slot
number from the status register and we can't use DMA anymore.

Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bgmac: allow bigger et_swtype nvram variable

Without this patch it is impossible to read et_swtype, because the 1
byte space is needed for the terminating null byte. The max expected
value is 0xF, so now it should be possible to read decimal form ("15")
and hex form ("0xF").
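
A user-space sketch of why the buffer size matters. The function name
parse_swtype and the 4-byte buffer are assumptions for this example; the
kernel reads the value through its own nvram and string-parsing helpers.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parser: accepts "15" or "0xF" style values up to 0xF. */
int parse_swtype(const char *value, unsigned char *out)
{
    char buf[4];                 /* room for "0xF" plus the terminating NUL */
    char *end;
    unsigned long v;

    if (strlen(value) >= sizeof(buf))
        return -1;               /* would not fit together with the NUL */
    strcpy(buf, value);

    v = strtoul(buf, &end, 0);   /* base 0: decimal or 0x-prefixed hex */
    if (end == buf || *end != '\0' || v > 0xF)
        return -1;
    *out = (unsigned char)v;
    return 0;
}
```

With a buffer that only leaves room for the NUL byte, not even a single digit
could be stored, which matches the failure described above.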

Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bgmac: fix internal switch initialization

Some devices (BCM4749, BCM5357, BCM53572) have an internal switch that
requires initialization. We already have code for this, but because
of a typo in the code it was never working. This resulted in the network
not working for some routers and the possibility of soft-bricking them.

Use correct bit for switch initialization and fix typo in the define.

Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>

drivers/net/ethernet/ibm/ehea/ehea_main.c: add alias entry for portN properties

Use a separate table for alias entries in the ehea module, otherwise the
probe() function will operate on the separate ports instead of the
lhea-"root" entry of the device-tree.

Addresses https://bugzilla.novell.com/show_bug.cgi?id=435215

[ Thadeu notes that: "... this issue might happen with the generation of
  initrd, when the scripts check for /sys/class/net/eth0/device/modalias,
  which links to the port device at
  /sys/devices/ibmebus/23c00400.lhea/port0/" ]

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Olaf Hering <ohering@suse.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netfilter: ipset: Fix serious failure in CIDR tracking

This fixes a serious bug affecting all hash types with a net element -
specifically, if a CIDR value is deleted such that none of the same size
exist any more, all larger (less-specific) values will then fail to
match. Adding back any prefix with a CIDR equal to or more specific than
the one deleted will fix it.

Steps to reproduce:
ipset -N test hash:net
ipset -A test 1.1.0.0/16
ipset -A test 2.2.2.0/24
ipset -T test 1.1.1.1 #1.1.1.1 IS in set
ipset -D test 2.2.2.0/24
ipset -T test 1.1.1.1 #1.1.1.1 IS NOT in set

This is due to the fact that the nets counter was unconditionally
decremented prior to the iteration that shifts up the entries. Now, we
first check if there is a proceeding entry and if not, decrement it and
return. Otherwise, we proceed to iterate and then zero the last element,
which, in most cases, will already be zero.
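
The bookkeeping described above can be modelled in user space as follows.
This is a simplified sketch: struct cidr_slot, del_cidr_model() and
active_slots() are hypothetical names, and the array is assumed to hold its
used slots contiguously from the front, most specific prefix first. The
caller is assumed to pass len >= 1.

```c
#include <assert.h>
#include <stddef.h>

struct cidr_slot {
    unsigned char cidr;   /* prefix length tracked by this slot */
    unsigned int nets;    /* elements with this prefix length; 0 = unused */
};

void del_cidr_model(struct cidr_slot *slots, size_t len, unsigned char cidr)
{
    size_t i, j, last = len - 1;

    for (i = 0; i < len && slots[i].nets != 0; i++) {
        if (slots[i].cidr != cidr)
            continue;
        /* Other elements remain, or nothing follows: just decrement. */
        if (slots[i].nets > 1 || i == last || slots[i + 1].nets == 0) {
            slots[i].nets--;
            return;
        }
        /* Otherwise shift the following slots up over the removed one. */
        for (j = i; j < last && slots[j].nets != 0; j++)
            slots[j] = slots[j + 1];
        slots[j].nets = 0;   /* usually zero already */
        return;
    }
}

/* Lookups only scan slots up to the first unused one. */
size_t active_slots(const struct cidr_slot *slots, size_t len)
{
    size_t n = 0;

    while (n < len && slots[n].nets != 0)
        n++;
    return n;
}
```

In the reproduction above, deleting the only /24 must leave the /16 in the
first slot; decrementing first and then skipping the shift would leave an
unused slot in front of it and hide it from lookups.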

Signed-off-by: Oliver Smith <oliver@8.c.9.b.0.7.4.0.1.0.0.2.ip6.arpa>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>

netfilter: ipset: Validate the set family and not the set type family at swapping

This closes netfilter bugzilla #843, reported by Quentin Armitage.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>

netfilter: ipset: Consistent userspace testing with nomatch flag

The "nomatch" commandline flag should invert the matching at testing,
similarly to the --return-nomatch flag of the "set" match of iptables.
Until now it worked with the elements with "nomatch" flag only. From
now on it works with elements without the flag too, i.e:

# ipset n test hash:net
# ipset a test 10.0.0.0/24 nomatch
# ipset t test 10.0.0.1
10.0.0.1 is NOT in set test.
# ipset t test 10.0.0.1 nomatch
10.0.0.1 is in set test.

# ipset a test 192.168.0.0/24
# ipset t test 192.168.0.1
192.168.0.1 is in set test.
# ipset t test 192.168.0.1 nomatch
192.168.0.1 is NOT in set test.

Before the patch the results were

...
# ipset t test 192.168.0.1
192.168.0.1 is in set test.
# ipset t test 192.168.0.1 nomatch
192.168.0.1 is in set test.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>

netfilter: ipset: Skip really non-first fragments for IPv6 when getting port/protocol

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>

cxgb4: remove workqueue when driver registration fails

When driver registration fails, we need to clean up the resources allocated
before. cxgb4 failed to destroy the workqueue allocated at the very beginning.

This patch destroys the workqueue when registration fails.

Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

isdn: hfcpci_softirq: get func return to suppress compiler warning

Signed-off-by: Antonio Alecrim Jr <antonio.alecrim@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: Make alb learning packet interval configurable

Running bonding in ALB mode requires that learning packets be sent periodically,
so that the switch knows where to send responding traffic. However, depending
on switch configuration, there may not be any need to send traffic at the
default rate of 3 packets per second, which represents little more than wasted
data. Allow the ALB learning packet interval to be made configurable via sysfs.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Veaceslav Falico <vfalico@redhat.com>
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

atm: nicstar: fix regression made by previous patch

The commit 42330003 "atm: nicstar: re-use native mac_pton() helper" did a
useful thing. However, mac_pton() returns 1 when the input is parsed
successfully. This patch fixes a typo.
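
The return convention matters to callers: a helper that returns 1 on success
has to be tested with a negation to detect failure. A minimal user-space
sketch; mac_parse_model() is a hypothetical stand-in written for this
example, not the kernel's mac_pton(), and only mirrors the 1-on-success
convention described above.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parser for "xx:xx:xx:xx:xx:xx"; returns 1 on success,
 * 0 on malformed input. */
int mac_parse_model(const char *s, unsigned char mac[6])
{
    int i;

    if (strlen(s) < 17)
        return 0;
    for (i = 0; i < 6; i++) {
        if (!isxdigit((unsigned char)s[3 * i]) ||
            !isxdigit((unsigned char)s[3 * i + 1]))
            return 0;
        if (i != 5 && s[3 * i + 2] != ':')
            return 0;
    }
    for (i = 0; i < 6; i++) {
        char pair[3] = { s[3 * i], s[3 * i + 1], '\0' };

        mac[i] = (unsigned char)strtoul(pair, NULL, 16);
    }
    return 1;
}
```

A caller that wants to fall back on failure therefore writes
"if (!mac_parse_model(str, mac))", not "if (mac_parse_model(str, mac))".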

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

vxlan: Fix sparse warnings

This patch fixes sparse warnings when incorrectly handling the port number
and using int instead of unsigned int iterating through &vn->sock_list[].
Keeping the port as __be16 also makes things clearer wrt endianness.
Also, it was pointed out that vxlan_get_rx_port() had unnecessary checks
which got removed.

Signed-off-by: Joseph Gasparakis <joseph.gasparakis@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qlcnic: Fix VF reset recovery

o At the time of a firmware hang the "adapter->need_fw_reset" variable gets
  set, but after re-initialization of the firmware or at the time of VF
  re-initialization that variable was not getting cleared, which
  led to failure in VF reset recovery. Fix it by clearing
  this variable before re-initializing the VF.

Signed-off-by: Manish Chopra <manish.chopra@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: fix NULL pointer deref of br_port_get_rcu

The NULL deref happens when br_handle_frame is called between these
2 lines of del_nbp:
dev->priv_flags &= ~IFF_BRIDGE_PORT;
/* --> br_handle_frame is called at this time */
netdev_rx_handler_unregister(dev);

In br_handle_frame the return of br_port_get_rcu(dev) is dereferenced
without check but br_port_get_rcu(dev) returns NULL if:
!(dev->priv_flags & IFF_BRIDGE_PORT)

Eric Dumazet pointed out the testing of IFF_BRIDGE_PORT is not necessary
here since we're in rcu_read_lock and we have synchronize_net() in
netdev_rx_handler_unregister. So remove the testing of IFF_BRIDGE_PORT
and by the previous patch, make sure br_port_get_rcu is called in
bridging code.

Signed-off-by: Hong Zhiguo <zhiguohong@tencent.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: use br_port_get_rtnl within rtnl lock

The current br_port_get_rcu is problematic in the bridging path
(NULL deref). Change these calls in the netlink path first.

Signed-off-by: Hong Zhiguo <zhiguohong@tencent.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ps3_gelic: remove deprecated IRQF_DISABLED

This patch proposes to remove the IRQF_DISABLED flag from
drivers/net/ethernet/toshiba/ps3_gelic_net.c

It's a NOOP since 2.6.35 and I will remove it one day ;)

Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: smsc: remove deprecated IRQF_DISABLED

This patch proposes to remove the IRQF_DISABLED flag from
code in drivers/net/ethernet/smsc/

It's a NOOP since 2.6.35 and it will be removed one day.

Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: pasemi: remove deprecated IRQF_DISABLED

This patch proposes to remove the IRQF_DISABLED flag from
drivers/net/ethernet/pasemi/pasemi_mac.c

It's a NOOP since 2.6.35 and it will be removed one day.

Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: natsemi: remove deprecated IRQF_DISABLED

This patch proposes to remove the IRQF_DISABLED flag from
code in drivers/net/ethernet/natsemi/

It's a NOOP since 2.6.35 and it will be removed one day.

Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ks8851-ml: remove deprecated IRQF_DISABLED

This patch proposes to remove the IRQF_DISABLED flag from
drivers/net/ethernet/micrel/ks8851_mll.c

It's a NOOP since 2.6.35 and it will be removed one day.

Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: pxa168_eth: remove deprecated IRQF_DISABLED

Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: lantiq_etop: remove deprecated IRQF_DISABLED

This patch proposes to remove the IRQF_DISABLED flag from
drivers/net/ethernet/lantiq_etop.c

It's a NOOP since 2.6.35 and it will be removed one day.

Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hp100: remove deprecated IRQF_DISABLED

This patch proposes to remove the IRQF_DISABLED flag from
drivers/net/ethernet/hp/hp100.c

It's a NOOP since 2.6.35 and it will be removed one day.

Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fec: remove deprecated IRQF_DISABLED

This patch proposes to remove the IRQF_DISABLED flag from
drivers/net/ethernet/freescale/fec_main.c

It's a NOOP since 2.6.35 and it will be removed one day.

Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tg3: Use pci_dev pm_cap

Use the already existing pm_cap variable in struct pci_dev for
determining the power management offset. This saves the driver from
having to keep track of an extra variable.

Signed-off-by: Jon Mason <jdmason@kudzu.us>
Cc: Nithin Nayak Sujir <nsujir@broadcom.com>
Cc: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x: Use pci_dev pm_cap

Use the already existing pm_cap variable in struct pci_dev for
determining the power management offset. This saves the driver from
having to keep track of an extra variable.

Signed-off-by: Jon Mason <jdmason@kudzu.us>
Cc: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

alx: remove redundant D0 power state set

pci_enable_device_mem() will set the device power state to D0,
so there is no need to do it again in alx_probe().
Also remove the redundant PM cap lookup code, because the PCI core
has already saved the PCI device's PM cap value.

Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

be2net: missing variable initialization

Signed-off-by: Antonio Alecrim Jr <antonio.alecrim@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

isdn: clean up debug format string usage

Avoid unneeded local string buffers for constructing debug output. Also
cleans up debug calls that contain a single parameter so that they cannot
be accidentally parsed as format strings.

Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Karsten Keil <isdn@linux-pingi.de>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

drivers/atm/he.c: convert to module_pci_driver

Signed-off-by: Libo Chen <libo.chen@huawei.com>
Cc: Chas Williams <chas@cmf.nrl.navy.mil>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net

Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates

This series contains updates to ixgbe and e1000e.

Jacob provides an ixgbe patch to fix the configure_rx path to properly
disable RSC hardware logic when a user disables it.  Previously we only
disabled RSC in the queue settings, but this does not fully disable
hardware RSC logic which can lead to unexpected performance issues.

Emil provides three fixes for ixgbe.  First fixes the ethtool loopback
test when DCB is enabled, where the frames may be modified on Tx
(by adding VLAN tag) which will fail the check on receive.  Then a fix
for QSFP+ modules, limit the speed setting to advertise only one speed
at a time since the QSFP+ modules do not support auto negotiation.
Lastly, resolve an issue where the driver will display incorrect info
for QSFP+ modules that were inserted after the driver has been loaded.

David Ertman provides two fixes for e1000e, one removes a comparison to
the boolean value true where evaluating the lvalue will produce the
same result.  The other fixes an error in the calculation of the
rar_entry_count, which causes a write of unknown/undefined register
space in the MAC to unknown/undefined register space in the PHY.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

e1000e: fix overrun of PHY RAR array

When copying the MAC RAR registers to PHY there is an error in the
calculation of the rar_entry_count, which causes a write of unknown/
undefined register space in the MAC to unknown/undefined register space in
the PHY.

This patch fixes the overrun with writing to the PHY RAR and also fixes the
ethtool offline register tests so that the correctly addressed registers
have the appropriate bitmasks for R/W and RO bits for affected parts.

Shawn Rader gets credit for finding and fixing the register overrun.

Signed-off-by: Dave Ertman <davidx.m.ertman@intel.com>
CC: Shawn Rader <shawn.t.rader@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

e1000e: cleanup boolean comparison to true

Removing a comparison to the boolean value true where simply interrogating
the lvalue will produce the same result.

Signed-off-by: David Ertman <davidx.m.ertman@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbe: fix ethtool reporting of supported links for SFP modules

This patch resolves an issue where the driver will display incorrect info
for Q/SFP+ modules that were inserted after the driver has been loaded.

This patch adds a call to identify_phy() in ixgbe_get_settings() prior to
calling get_link_capabilities() which needs the PHY data in order to
determine the correct settings.

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbe: limit setting speed to only one at a time for QSFP modules

QSFP+ modules do not support auto negotiation and should advertise only
one speed at a time.

This patch adds logic in ethtool to allow setting and reporting the
advertised speed at either 1Gbps or 10Gbps, but not both. Also limits
the speed set in ixgbe_sfp_link_config_subtask() to highest supported.
Previously the link was set to whatever the supported speeds were.

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbe: fix ethtool loopback diagnostic with DCB enabled

This patch disables DCB prior to running the loopback test.
When DCB is enabled the frames may be modified on Tx (by adding vlan tag)
which will fail the check on Rx.

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Tested-by: Jack Morgan <jack.morgan@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbe: fully disable hardware RSC logic when disabling RSC

This patch modifies the configure_rx path in order to properly disable RSC
hardware logic when the user disables it. Previously we only disabled RSC in the
queue settings, but this does not fully disable hardware RSC logic which can
lead to some unexpected performance issues.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

netfilter: nf_nat_proto_icmpv6:: fix wrong comparison in icmpv6_manip_pkt

In commit 162106cf (netfilter: ipv6: add IPv6 NAT support), icmpv6_manip_pkt
was added with an incorrect comparison of ICMP codes to types. This causes
problems when using NAT rules with the --random option. Correct the
comparison.

This closes netfilter bugzilla #851, reported by Alexander Neumann.

Signed-off-by: Phil Oester <kernel@linuxace.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_conntrack: use RCU safe kfree for conntrack extensions

Commit 7cd26af1 (netfilter: nf_nat: fix RCU races) introduced
RCU protection for freeing extension data when reallocation
moves them to a new location. We need the same protection when
freeing them in nf_ct_ext_free() in order to prevent a
use-after-free by other threads referencing a NAT extension data
via bysource list.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

net/irda/mcs7780: fix memory leaks in mcs_net_open()

If rx_urb allocation fails in mcs_setup_urbs(), tx_urb leaks.
If mcs_receive_start() fails in mcs_net_open(), neither urb is deallocated.

The patch fixes the issues and by the way fixes label indentation.

Found by Linux Driver Verification project (linuxtesting.org).

Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_en: Check device state when setting coalescing

When the device is down, CQs are freed. We must check the device state
to avoid issuing firmware commands on nonexistent CQs.

CC: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: Clamp forward_delay when enabling STP

At some point limits were added to forward_delay. However, the
limits are only enforced when STP is enabled. This created a
scenario where you could have a value outside the allowed range
while STP is disabled, which then stuck around even after STP
is enabled.

This patch fixes this by clamping the value when we enable STP.

I had to move the locking around a bit to ensure that there is
no window where someone could insert a value outside the range
while we're in the middle of enabling STP.
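
A small user-space sketch of clamping at enable time. The struct
bridge_model type, stp_enable_model() and the limits of 2 and 30 seconds are
assumptions for illustration; the kernel keeps its own limits and does this
under the bridge lock.

```c
#include <assert.h>

#define MODEL_MIN_FORWARD_DELAY 2    /* seconds; assumed lower bound */
#define MODEL_MAX_FORWARD_DELAY 30   /* seconds; assumed upper bound */

struct bridge_model {
    int stp_enabled;
    unsigned int forward_delay;      /* seconds */
};

/* Clamp a value that may have been set out of range while STP was off,
 * then enable STP. */
void stp_enable_model(struct bridge_model *br)
{
    if (br->forward_delay < MODEL_MIN_FORWARD_DELAY)
        br->forward_delay = MODEL_MIN_FORWARD_DELAY;
    else if (br->forward_delay > MODEL_MAX_FORWARD_DELAY)
        br->forward_delay = MODEL_MAX_FORWARD_DELAY;
    br->stp_enabled = 1;
}
```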

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

resubmit bridge: fix message_age_timer calculation

This changes the message_age_timer calculation to use the BPDU's max age as
opposed to the local bridge's max age. This is in accordance with section
8.6.2.3.2 Step 2 of the 802.1D-1998 specification.

With the current implementation, when running with very large bridge
diameters, convergence will not always occur even if a root bridge is
configured to have a longer max age.

Tested successfully on bridge diameters of ~200.
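
The effect of the change can be sketched as a pure function. The name
message_age_timeout() and the clamping to zero are assumptions for this
example; units are arbitrary timer ticks.

```c
#include <assert.h>

/* Hypothetical helper: remaining lifetime of received BPDU information,
 * computed from the max age carried in the BPDU itself rather than from
 * the local bridge's configured max age. */
unsigned long message_age_timeout(unsigned long bpdu_max_age,
                                  unsigned long bpdu_message_age)
{
    if (bpdu_message_age >= bpdu_max_age)
        return 0;                        /* information already expired */
    return bpdu_max_age - bpdu_message_age;
}
```

For example, with a root bridge advertising a max age of 40 and a BPDU that
has aged 25 on its way through a large topology, the timeout is 15. Using a
local max age of 20 instead would treat the same BPDU as already expired.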

Signed-off-by: Chris Healy <cphealy@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: tulip: remove deprecated IRQF_DISABLED

This patch proposes to remove the IRQF_DISABLED flag from
drivers/net/ethernet/dec/tulip/de4x5.c

It's a NOOP since 2.6.35 and it will be removed one day.

Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com>
Acked-by: Grant Grundler <grundler@parisc-linux.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethernet: amd: remove deprecated IRQF_DISABLED

This patch proposes to remove the IRQF_DISABLED flag from
drivers/net/ethernet/amd/sun3lance.c

It's a NOOP since 2.6.35 and it will be removed one day.

Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ehea: remove deprecated IRQF_DISABLED

This patch proposes to remove the IRQF_DISABLED flag from
drivers/net/ethernet/ibm/ehea/ehea_main.c

It's a NOOP since 2.6.35 and it will be removed one day.

Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com>
Acked-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bfin_mac: remove deprecated IRQF_DISABLED

This patch proposes to remove the IRQF_DISABLED flag from
drivers/net/ethernet/adi/bfin_mac.c.

It's a NOOP since 2.6.35 and it will be removed one day.

Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com>
Reviewed-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

xen-netback: count number required slots for an skb more carefully

When a VM is providing an iSCSI target and the LUN is used by the
backend domain, the generated skbs for direct I/O writes to the disk
have large, multi-page skb->data but no frags.

With some lengths and starting offsets, xen_netbk_count_skb_slots()
would be one short because the simple calculation of
DIV_ROUND_UP(skb_headlen(), PAGE_SIZE) was not accounting for the
decisions made by start_new_rx_buffer() which does not guarantee
responses are fully packed.

For example, a skb with length < 2 pages but which spans 3 pages would
be counted as requiring 2 slots but would actually use 3 slots.

skb->data:

    |        1111|222222222222|3333        |

Fully packed, this would need 2 slots:

    |111122222222|22223333    |

But because the 2nd page wholly fits into a slot it is not split across
slots and goes into a slot of its own:

    |1111        |222222222222|3333        |

Miscounting the number of slots means netback may push more responses
than the number of available requests.  This will cause the frontend
to get very confused and report "Too many frags/slots".  The frontend
never recovers and will eventually BUG.

Fix this by counting the number of required slots more carefully.  In
xen_netbk_count_skb_slots(), more closely follow the algorithm used by
xen_netbk_gop_skb() by introducing xen_netbk_count_frag_slots() which
is the dry-run equivalent of netbk_gop_frag_copy().
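
The miscount described above can be reproduced with a small user-space
model. This is only a sketch: the 12-byte toy page matches the diagram, the
names naive_slots() and careful_slots() are hypothetical, and the placement
rule is a simplification of start_new_rx_buffer() (it ignores the special
handling of the head buffer), not the kernel's code.

```c
#include <assert.h>

#define TOY_PAGE_SIZE 12u	/* toy page size, matches the 12-column diagram */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Old estimate: assumes the data is packed into slots with no gaps. */
unsigned int naive_slots(unsigned int len)
{
	return DIV_ROUND_UP(len, TOY_PAGE_SIZE);
}

/*
 * Dry run of a simplified placement rule (an assumption for illustration):
 * the data is walked one source page at a time, and a chunk that does not
 * fit in the partly used current slot is moved whole into a new slot.
 */
unsigned int careful_slots(unsigned int offset, unsigned int len)
{
	unsigned int slots = len ? 1 : 0;
	unsigned int copy_off = 0;	/* bytes already used in the current slot */

	assert(offset < TOY_PAGE_SIZE);

	while (len > 0) {
		/* Bytes remaining in the current source page. */
		unsigned int bytes = TOY_PAGE_SIZE - offset;

		if (bytes > len)
			bytes = len;

		/* Slot is full, or partly used and too small for this chunk. */
		if (copy_off == TOY_PAGE_SIZE ||
		    (copy_off != 0 && copy_off + bytes > TOY_PAGE_SIZE)) {
			slots++;
			copy_off = 0;
		}

		copy_off += bytes;
		len -= bytes;
		offset = 0;	/* later chunks start at the top of a page */
	}

	return slots;
}
```

For the skb in the diagram (starting offset 8, length 20 in toy units),
naive_slots(20) gives 2 while careful_slots(8, 20) gives 3.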

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tg3: Expand led off fix to include 5720

Commit 77add871ba34f3a6a97e4f6b1ba274dd0ff50566 ("tg3: Don't turn off
led on 5719 serdes port 0") added code to skip turning led off on port
0 of the 5719 since it powered down other ports. This workaround needs
to be enabled on the 5720 as well.

Cc: stable@vger.kernel.org
Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sctp: fix ipv6 ipsec encryption bug in sctp_v6_xmit

Alan Chester reported an issue with IPv6 on SCTP that IPsec traffic is not
being encrypted, whereas on IPv4 it is. Setting up an AH + ESP transport
does not seem to have the desired effect:

SCTP + IPv4:

  22:14:20.809645 IP (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto AH (51), length 116)
    192.168.0.2 > 192.168.0.5: AH(spi=0x00000042,sumlen=16,seq=0x1): ESP(spi=0x00000044,seq=0x1), length 72
  22:14:20.813270 IP (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto AH (51), length 340)
    192.168.0.5 > 192.168.0.2: AH(spi=0x00000043,sumlen=16,seq=0x1):

SCTP + IPv6:

  22:31:19.215029 IP6 (class 0x02, hlim 64, next-header SCTP (132) payload length: 364)
    fe80::222:15ff:fe87:7fc.3333 > fe80::92e6:baff:fe0d:5a54.36767: sctp
    1) [INIT ACK] [init tag: 747759530] [rwnd: 62464] [OS: 10] [MIS: 10]

Moreover, Alan says:

  This problem was seen with both Racoon and Racoon2. Other people have seen
  this with OpenSwan. When IPsec is configured to encrypt all upper layer
  protocols the SCTP connection does not initialize. After using Wireshark to
  follow packets, this is because the SCTP packet leaves Box A unencrypted and
  Box B believes all upper layer protocols are to be encrypted so it drops
  this packet, causing the SCTP connection to fail to initialize. When IPsec
  is configured to encrypt just SCTP, the SCTP packets are observed unencrypted.

In fact, using `socat sctp6-listen:3333 -` on one end and transferring "plaintext"
string on the other end, results in cleartext on the wire where SCTP eventually
does not report any errors, thus in the latter case that Alan reports, the
non-paranoid user might think he's communicating over an encrypted transport on
SCTP although he's not (tcpdump ... -X):

  ...
  0x0030: 5d70 8e1a 0003 001a 177d eb6c 0000 0000  ]p.......}.l....
  0x0040: 0000 0000 706c 6169 6e74 6578 740a 0000  ....plaintext...

Only in /proc/net/xfrm_stat we can see XfrmInTmplMismatch increasing on the
receiver side. Initial follow-up analysis from Alan's bug report was done by
Alexey Dobriyan. Also thanks to Vlad Yasevich for feedback on this.

SCTP has its own implementation of sctp_v6_xmit() not calling inet6_csk_xmit().
This has the implication that it probably never really got updated along with
changes in inet6_csk_xmit() and therefore does not seem to invoke xfrm handlers.

SCTP's IPv4 xmit however, properly calls ip_queue_xmit() to do the work. Since
a call to inet6_csk_xmit() would solve this problem, but result in unnecessary
route lookups, let us just use the cached flowi6 instead that we got through
sctp_v6_get_dst(). Since all SCTP packets are being sent through sctp_packet_transmit(),
we do the route lookup / flow caching in sctp_transport_route(), hold it in
tp->dst and skb_dst_set() right after that. If we would alter fl6->daddr in
sctp_v6_xmit() to np->opt->srcrt, we possibly could run into the same effect
of not having xfrm layer pick it up, hence, use fl6_update_dst() in sctp_v6_get_dst()
instead to get the correct source routed dst entry, which we assign to the skb.

Also source address routing example from 42f03327d ("sctp: fix sctp to work with
ipv6 source address routing") still works with this patch! Nevertheless, in RFC5095
it is actually 'recommended' to not use that anyway due to traffic amplification [1].
So it seems we're not supposed to do that anyway in sctp_v6_xmit(). Moreover, if
we overwrite the flow destination here, the lower IPv6 layer will be unable to
put the correct destination address into IP header, as routing header is added in
ipv6_push_nfrag_opts() but then probably with wrong final destination. Things aside,
result of this patch is that we do not have any XfrmInTmplMismatch increase plus on
the wire with this patch it now looks like:

SCTP + IPv6:

  08:17:47.074080 IP6 2620:52:0:102f:7a2b:cbff:fe27:1b0a > 2620:52:0:102f:213:72ff:fe32:7eba:
    AH(spi=0x00005fb4,seq=0x1): ESP(spi=0x00005fb5,seq=0x1), length 72
  08:17:47.074264 IP6 2620:52:0:102f:213:72ff:fe32:7eba > 2620:52:0:102f:7a2b:cbff:fe27:1b0a:
    AH(spi=0x00003d54,seq=0x1): ESP(spi=0x00003d55,seq=0x1), length 296

This fixes Kernel Bugzilla 24412. This security issue seems to be present since
2.6.18 kernels. Let's just hope some big passive adversary in the wild didn't have
its fun with that. lksctp-tools IPv6 regression test suite passes as well with
this patch.

[1] http://www.secdev.org/conf/IPv6_RH_security-csw07.pdf

Reported-by: Alan Chester <alan.chester@tekelec.com>
Reported-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tuntap: correctly handle error in tun_set_iff()

Commit ca420e6eb688e160642f883a32ba860eda46cd6f
(tuntap: multiqueue support) only calls free_netdev() on error in
tun_set_iff(). This causes several issues:

- the memory for the tun security context was leaked
- use after free, since the flow gc timer was not deleted and the tfile
was not detached

This patch solves the above issues.

Reported-by: Wannes Rombouts <wannes.rombouts@epitech.eu>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

xen-netback: fix possible format string flaw

This makes sure a format string cannot accidentally leak into the
kthread_run() call.
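
The general pattern can be shown with a user-space analogy. This is a
sketch, not the patch itself: snprintf() stands in for any printf-style
interface, and toy_set_thread_name() is a hypothetical helper. The point is
that externally supplied text is passed as an argument to a constant "%s"
format and never as the format string itself.

```c
#include <stdio.h>

/* Hypothetical helper: builds a thread name from caller-supplied text. */
int toy_set_thread_name(char *buf, size_t size, const char *name)
{
	/* "%s" is the format; name is only ever treated as data. */
	return snprintf(buf, size, "%s", name);
}
```

With this shape, a name such as "vif%d.%n" is copied literally instead of
being interpreted as conversion specifiers.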

Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netpoll: Should handle ETH_P_ARP other than ETH_P_IP in netpoll_neigh_reply

The protocol type of a received ARP request in the Ethernet header is ETH_P_ARP rather than ETH_P_IP.
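
As an illustration of the distinction, the following user-space sketch reads
the EtherType from a raw, untagged Ethernet header. The helper names and the
TOY_ prefixed constants are hypothetical; the values 0x0800 (IPv4) and
0x0806 (ARP) are the standard EtherTypes. It does not reproduce the netpoll
code.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_ETH_P_IP	0x0800	/* IPv4 EtherType */
#define TOY_ETH_P_ARP	0x0806	/* ARP EtherType */
#define TOY_ETH_HLEN	14	/* dst MAC + src MAC + EtherType */

/* Read the big-endian EtherType at bytes 12..13; -1 if the frame is short. */
int toy_ethertype(const uint8_t *frame, size_t len)
{
	if (len < TOY_ETH_HLEN)
		return -1;
	return (frame[12] << 8) | frame[13];
}

/* An ARP request is identified by the ARP EtherType, not the IPv4 one. */
int toy_is_arp(const uint8_t *frame, size_t len)
{
	return toy_ethertype(frame, len) == TOY_ETH_P_ARP;
}
```

A check against TOY_ETH_P_IP would never match an ARP frame, which is the
kind of mismatch the commit describes.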

[ Bug introduced by commit 81308195e0486cb5503095eb57eecdefeb81e012
("netpoll: prepare for ipv6") ]

Signed-off-by: Sonic Zhang <sonic.zhang@analog.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

igb: Read flow control for i350 from correct EEPROM section

Flow control is defined in the four EEPROM sections but the driver only reads
from section 0.

Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

igb: Add additional get_phy_id call for i354 devices

This patch fixes a problem where some ports can fail to initialize on a
cold boot. This patch adds an additional call to read the PHY id for i354
devices in order to work around the hardware problem.

Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'master-2013-09-09' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless

John W. Linville says:

====================
This is a pull request for a few early fixes for the 3.12 stream.

Alexey Khoroshilov corrects a use-after-free issue on rtl8187 found
by the Linux Driver Verification project.

Arend van Spriel provides a brcmfmac patch to fix a build issue
reported by Randy Dunlap.

Hauke Mehrtens offers a bcma fix to properly account for the storage
width of error code values before checking them.

Solomon Peachy brings a pair of cw1200 fixes to avoid hangs in that
driver with SPI devices. One avoids transfers in interrupt context,
the other fixes a locking issue.

Stanislaw Gruszka changes the initialization of the rt2800 driver to
avoid a freeze, addressing a bug in the Red Hat bugzilla.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

r8169: enforce RX_MULTI_EN for the 8168f.

Same narrative as ffa606e9c1235c28193ed234996e08877c901972 ("r8169: RxConfig
hack for the 8168evl.") regarding AMD IOMMU errors.

RTL_GIGA_MAC_VER_36 - 8168f as well - has not been reported to behave the
same.

Tested-by: David R <david@unsolicited.net>
Tested-by: Frédéric Leroy <fredo@starox.org>
Cc: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Bye, bye, WfW flag

This reverts the Linux for Workgroups thing.  And no, before somebody
asks, we're not doing Linux95.  Not for a few years, at least.

Sure, the flag added some color to the logo, and could have remained as
a testament to my leet gimp skills.  But no.  And I'll do this early, to
avoid the chance of forgetting when I'm doing the actual rc1 release on
the road.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'ecryptfs-3.12-rc1-crypt-ctx' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs

Pull eCryptfs fixes from Tyler Hicks:
"Two small fixes to the code that initializes the per-file crypto
  contexts"

* tag 'ecryptfs-3.12-rc1-crypt-ctx' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
  ecryptfs: avoid ctx initialization race
  ecryptfs: remove check for if an array is NULL

Merge branch 'for-v3.12-fix' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping

Pull DMA-mapping fix from Marek Szyprowski:
"A build bugfix for the device tree support for reserved memory
  regions.  Due to superfluous include the common code failed to build
  on ARM64 and MIPS architectures.

  The patch that caused the build break has lived at linux-next for
  about two weeks and no one noticed the issue, which convinced me that
  everything was ok"

* 'for-v3.12-fix' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping:
  drivers: of: fix build break if asm/dma-contiguous.h is missing

Merge tag 'for-3.12' of git://git.linaro.org/people/sumitsemwal/linux-dma-buf

Pull dma-buf updates from Sumit Semwal:
"Yet another small one - dma-buf framework now supports size discovery
  of the buffer via llseek"

* tag 'for-3.12' of git://git.linaro.org/people/sumitsemwal/linux-dma-buf:
  dma-buf: Expose buffer size to userspace (v2)
  dma-buf: Check return value of anon_inode_getfile

Merge branch 'akpm' (patches from Andrew Morton)

Merge first patch-bomb from Andrew Morton:
- Some pidns/fork/exec tweaks
- OCFS2 updates
- Most of MM - there remain quite a few memcg parts which depend on
   pending core cgroups changes.  Which might have been already merged -
   I'll check tomorrow...
- Various misc stuff all over the place
- A few block bits which I never got around to sending to Jens -
   relatively minor things.
- MAINTAINERS maintenance
- A small number of lib/ updates
- checkpatch updates
- epoll
- firmware/dmi-scan
- Some kprobes work for S390
- drivers/rtc updates
- hfsplus feature work
- vmcore feature work
- rbtree upgrades
- AOE updates
- pktcdvd cleanups
- PPS
- memstick
- w1
- New "initmpfs" feature, which does the obvious
- More IPC work from Davidlohr.

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (303 commits)
  lz4: fix compression/decompression signedness mismatch
  ipc: drop ipc_lock_check
  ipc, shm: drop shm_lock_check
  ipc: drop ipc_lock_by_ptr
  ipc, shm: guard against non-existant vma in shmdt(2)
  ipc: document general ipc locking scheme
  ipc,msg: drop msg_unlock
  ipc: rename ids->rw_mutex
  ipc,shm: shorten critical region for shmat
  ipc,shm: cleanup do_shmat pasta
  ipc,shm: shorten critical region for shmctl
  ipc,shm: make shmctl_nolock lockless
  ipc,shm: introduce shmctl_nolock
  ipc: drop ipcctl_pre_down
  ipc,shm: shorten critical region in shmctl_down
  ipc,shm: introduce lockless functions to obtain the ipc object
  initmpfs: use initramfs if rootfstype= or root= specified
  initmpfs: make rootfs use tmpfs when CONFIG_TMPFS enabled
  initmpfs: move rootfs code from fs/ramfs/ to init/
  initmpfs: move bdi setup from init_rootfs to init_ramfs
  ...

lz4: fix compression/decompression signedness mismatch

LZ4 compression and decompression functions require input/output
parameters of different signedness: unsigned char for compression and
signed char for decompression.

Change decompression API to require "(const) unsigned char *".
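
One reason raw byte buffers are usually typed as unsigned char can be seen
in the sketch below. It is an illustration only: the two helper names are
hypothetical and the code is unrelated to the LZ4 sources. The negative
result assumes a two's-complement target.

```c
/* Widen the first byte viewed as unsigned char: always in 0..255. */
int first_byte_unsigned(const unsigned char *buf)
{
	return buf[0];
}

/*
 * Widen the same byte viewed as signed char: negative when the top bit is
 * set (assuming two's complement), which breaks comparisons and arithmetic
 * that expect a value in 0..255.
 */
int first_byte_signed(const void *buf)
{
	return *(const signed char *)buf;
}
```

For a buffer whose first byte is 0xF0, the first helper yields 240 and the
second yields -16.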

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Kyungsik Lee <kyungsik.lee@lge.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Yann Collet <yann.collet.73@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ipc: drop ipc_lock_check

No remaining users, we now use ipc_obtain_object_check().

Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ipc, shm: drop shm_lock_check

This function was replaced by the lockless shm_obtain_object_check(),
and no longer has any users.

Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ipc: drop ipc_lock_by_ptr

After previous cleanups and optimizations, this function is no longer
heavily used and we don't have a good reason to keep it. Update the few
remaining callers and get rid of it.

Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ipc, shm: guard against non-existant vma in shmdt(2)

When !CONFIG_MMU there's a chance we can dereference a NULL pointer when the
VM area isn't found - check the return value of find_vma().

Also, remove the redundant -EINVAL return: retval is set to the proper
return code and *only* changed to 0, when we actually unmap the segments.

Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ipc: document general ipc locking scheme

As suggested by Andrew, add a generic initial locking scheme used
throughout all sysv ipc mechanisms. Documenting the ids rwsem, how rcu
can be enough to do the initial checks and when to actually acquire the
kern_ipc_perm.lock spinlock.

I found that adding it to util.c was generic enough.

Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ipc,msg: drop msg_unlock

There is only one user left, drop this function and just call
ipc_unlock_object() and rcu_read_unlock().

Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ipc: rename ids->rw_mutex

Since in some situations the lock can be shared for readers, we shouldn't
be calling it a mutex; rename it to rwsem.

Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ipc,shm: shorten critical region for shmat

Similar to other system calls, acquire the kern_ipc_perm lock after doing
the initial permission and security checks.

[sasha.levin@oracle.com: dont leave do_shmat with rcu lock held]
Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ipc,shm: cleanup do_shmat pasta

Clean up some of the messy do_shmat() spaghetti code, getting rid of
out_free and out_put_dentry labels. This makes shortening the critical
region of this function in the next patch a little easier to do and read.

Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ipc,shm: shorten critical region for shmctl

With the *_INFO, *_STAT, IPC_RMID and IPC_SET commands already optimized,
deal with the remaining SHM_LOCK and SHM_UNLOCK commands. Take the
shm_perm lock after doing the initial auditing and security checks. The
rest of the logic remains unchanged.

Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ipc,shm: make shmctl_nolock lockless

While the INFO cmd doesn't take the ipc lock, the STAT commands do acquire
it unnecessarily. We can do the permissions and security checks only
holding the rcu lock.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ipc,shm: introduce shmctl_nolock

Similar to semctl and msgctl, when calling shmctl, the *_INFO and *_STAT
commands can be performed without acquiring the ipc object.

Add a shmctl_nolock() function and move the logic of *_INFO and *_STAT out
of shmctl(). Since we are just moving functionality, this change still
takes the lock and it will be properly lockless in the next patch.

Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ipc: drop ipcctl_pre_down

Now that sem, msgque and shm, through *_down(), all use the lockless
variant of ipcctl_pre_down(), go ahead and delete it.

[akpm@linux-foundation.org: fix function name in kerneldoc, cleanups]
Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ipc,shm: shorten critical region in shmctl_down

Instead of holding the ipc lock for the entire function, use the
ipcctl_pre_down_nolock and only acquire the lock for specific commands:
RMID and SET.
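
The shape of this change can be sketched in user space. Everything here is
a toy model under stated assumptions: struct toy_shm, toy_lock() and
toy_shmctl_down() are hypothetical names, the "lock" only counts
acquisitions instead of providing mutual exclusion, and the permission rule
is invented for the example. It shows the checks running first without the
lock, and the lock being taken only for the two updating commands.

```c
#include <errno.h>

enum toy_cmd { TOY_RMID, TOY_SET, TOY_OTHER };

struct toy_shm {
	unsigned int lock_acquisitions;	/* how often the toy lock was taken */
	int owner_uid;
	int removed;
};

/* Toy lock: only counts acquisitions; real code would take a spinlock. */
void toy_lock(struct toy_shm *s)
{
	s->lock_acquisitions++;
}

void toy_unlock(struct toy_shm *s)
{
	(void)s;
}

int toy_shmctl_down(struct toy_shm *s, enum toy_cmd cmd,
		    int caller_uid, int new_uid)
{
	/* Permission check first, with no lock held. */
	if (caller_uid != 0 && caller_uid != s->owner_uid)
		return -EPERM;

	switch (cmd) {
	case TOY_RMID:
		toy_lock(s);		/* lock only around the update */
		s->removed = 1;
		toy_unlock(s);
		return 0;
	case TOY_SET:
		toy_lock(s);
		s->owner_uid = new_uid;
		toy_unlock(s);
		return 0;
	default:
		return -EINVAL;		/* rejected without touching the lock */
	}
}
```

A call that fails the permission check, or that passes an unknown command,
returns without incrementing lock_acquisitions.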

Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ipc,shm: introduce lockless functions to obtain the ipc object

This is the third and final patchset that deals with reducing the amount
of contention we impose on the ipc lock (kern_ipc_perm.lock).  These
changes mostly deal with shared memory, previous work has already been
done for semaphores and message queues:

  http://lkml.org/lkml/2013/3/20/546 (sems)
  http://lkml.org/lkml/2013/5/15/584 (mqueues)

With these patches applied, a custom shm microbenchmark stressing shmctl
doing IPC_STAT with 4 threads a million times, reduces the execution
time by 50%.  A similar run, this time with IPC_SET, reduces the
execution time from 3 mins and 35 secs to 27 seconds.

Patches 1-8: replaces blindly taking the ipc lock for a smarter
combination of rcu and ipc_obtain_object, only acquiring the spinlock
when updating.

Patch 9: renames the ids rw_mutex to rwsem, which is what it already was.

Patch 10: is a trivial mqueue leftover cleanup

Patch 11: adds a brief lock scheme description, requested by Andrew.

This patch:

Add shm_obtain_object() and shm_obtain_object_check(), which will allow us
to get the ipc object without acquiring the lock.  Just as with other
forms of ipc, these functions are basically wrappers around
ipc_obtain_object*().

Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

initmpfs: use initramfs if rootfstype= or root= specified

Add the command line option rootfstype=ramfs to obtain the old initramfs
behavior, and use ramfs instead of tmpfs for the stub when root= is defined
(for cosmetic reasons).
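
The selection rule described above can be modelled as a small predicate.
This is a sketch under assumptions: toy_rootfs_uses_tmpfs() is a
hypothetical name, tmpfs_enabled stands in for the CONFIG_TMPFS build
option, and treating any rootfstype= value that does not mention "tmpfs" as
a request for ramfs is an assumption made for the example.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Decide whether rootfs should be backed by tmpfs.
 * root and rootfstype model the root= and rootfstype= command line values;
 * NULL means the option was not given.
 */
bool toy_rootfs_uses_tmpfs(bool tmpfs_enabled, const char *root,
			   const char *rootfstype)
{
	if (!tmpfs_enabled)
		return false;	/* tmpfs not built in: only ramfs is available */
	if (root && root[0])
		return false;	/* root= given: rootfs is only a stub, use ramfs */
	if (rootfstype && !strstr(rootfstype, "tmpfs"))
		return false;	/* e.g. rootfstype=ramfs restores old behavior */
	return true;
}
```

With no options given and tmpfs available the predicate is true; passing
"ramfs" as rootfstype, or any non-empty root, makes it false.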

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Rob Landley <rob@landley.net>
Cc: Jeff Layton <jlayton@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Stephen Warren <swarren@nvidia.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jim Cromie <jim.cromie@gmail.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

initmpfs: make rootfs use tmpfs when CONFIG_TMPFS enabled

Conditionally call the appropriate fs_init function and fill_super
functions. Add a use-once guard to shmem_init() to simply succeed on a
second call.

(Note that IS_ENABLED() is a compile time constant so dead code
elimination removes unused function calls when CONFIG_TMPFS is disabled.)

Signed-off-by: Rob Landley <rob@landley.net>
Cc: Jeff Layton <jlayton@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Stephen Warren <swarren@nvidia.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jim Cromie <jim.cromie@gmail.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

initmpfs: move rootfs code from fs/ramfs/ to init/

When the rootfs code was a wrapper around ramfs, having them in the same
file made sense. Now that it can wrap another filesystem type, move it in
with the init code instead.

This also allows a subsequent patch to access rootfstype= command line
arg.

Signed-off-by: Rob Landley <rob@landley.net>
Cc: Jeff Layton <jlayton@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Stephen Warren <swarren@nvidia.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jim Cromie <jim.cromie@gmail.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

initmpfs: move bdi setup from init_rootfs to init_ramfs

Even though ramfs hasn't got a backing device, commit 955661646be4 ("mm:
bdi init hooks") added one anyway, and put the initialization in
init_rootfs() since that's the first user, leaving it out of init_ramfs()
to avoid duplication.

But initmpfs uses init_tmpfs() instead, so move the init into the
filesystem's init function, add a "once" guard to prevent duplicate
initialization, and call the filesystem init from rootfs init.
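
A "once" guard of this kind can be sketched in user space with a C11
atomic flag. This is an analogy, not the kernel code: toy_fs_init() and
toy_setup_count are hypothetical names, and atomic_flag_test_and_set() is
used here as a stand-in for whatever primitive the kernel patch relies on.

```c
#include <stdatomic.h>

static atomic_flag toy_once = ATOMIC_FLAG_INIT;
int toy_setup_count;	/* counts how often the real setup ran */

/* Safe to call from several init paths; the setup body runs only once. */
int toy_fs_init(void)
{
	/* Returns the previous value: true means someone already initialized. */
	if (atomic_flag_test_and_set(&toy_once))
		return 0;

	toy_setup_count++;	/* stand-in for the one-time setup work */
	return 0;
}
```

Calling toy_fs_init() from both a filesystem init path and a rootfs init
path leaves toy_setup_count at 1, and the second call still reports success.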

This goes part of the way to allowing ramfs to be built as a module.

[akpm@linux-foundation.org: using bit 1 was odd]
Signed-off-by: Rob Landley <rob@landley.net>
Cc: Jeff Layton <jlayton@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Stephen Warren <swarren@nvidia.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jim Cromie <jim.cromie@gmail.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>