]> git.baikalelectronics.ru Git - kernel.git/log
kernel.git
9 years agohv_netvsc: Implement batching in send buffer
Haiyang Zhang [Thu, 26 Mar 2015 16:03:37 +0000 (09:03 -0700)]
hv_netvsc: Implement batching in send buffer

With this patch, we can send out multiple RNDIS data packets in one send buffer
slot and one VMBus message. It reduces the overhead associated with VMBus messages.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
David S. Miller [Sun, 29 Mar 2015 19:43:43 +0000 (12:43 -0700)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next

Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for your net-next tree.
Basically, nf_tables updates to add the set extension infrastructure and finish
the transaction for sets from Patrick McHardy. More specifically, they are:

1) Move netns to basechain and use recently added possible_net_t, from
   Patrick McHardy.

2) Use LOGLEVEL_<FOO> from nf_log infrastructure, from Joe Perches.

3) Restore nf_log_trace that was accidentally removed during conflict
   resolution.

4) nft_queue does not depend on NETFILTER_XTABLES, starting from here
   all patches from Patrick McHardy.

5) Use raw_smp_processor_id() in nft_meta.

Then, several patches to prepare ground for the new set extension
infrastructure:

6) Pass object length to the hash callback in rhashtable as needed by
   the new set extension infrastructure.

7) Cleanup patch to restore struct nft_hash as wrapper for struct
   rhashtable

8) Another small source code readability cleanup for nft_hash.

9) Convert nft_hash to rhashtable callbacks.

And finally...

10) Add the new set extension infrastructure.

11) Convert the nft_hash and nft_rbtree sets to use it.

12) Batch set element release to avoid several RCU grace period in a row
    and add new function nft_set_elem_destroy() to consolidate set element
    release.

13) Return the set extension data area from nft_lookup.

14) Refactor existing transaction code to add some helper functions
    and document it.

15) Complete the set transaction support, using similar approach to what we
    already use, to activate/deactivate elements in an atomic fashion.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge branch 'tipc-next'
David S. Miller [Sun, 29 Mar 2015 19:40:37 +0000 (12:40 -0700)]
Merge branch 'tipc-next'

Ying Xue says:

====================
tipc: fix two corner issues

The patch set aims at resolving the following two critical issues:

Patch #1: Resolve a deadlock which happens while all links are reset
Patch #2: Correct a mistake usage of RCU lock which is used to protect
          node list
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotipc: involve reference counter for node structure
Ying Xue [Thu, 26 Mar 2015 10:10:24 +0000 (18:10 +0800)]
tipc: involve reference counter for node structure

TIPC node hash node table is protected with rcu lock on read side.
tipc_node_find() is used to look for a node object with node address
through iterating the hash node table. As the entire process of what
tipc_node_find() traverses the table is guarded with rcu read lock,
it's safe for us. However, when callers use the node object returned
by tipc_node_find(), there is no rcu read lock applied. Therefore,
this is absolutely unsafe for callers of tipc_node_find().

Now we introduce a reference counter for node structure. Before
tipc_node_find() returns node object to its caller, it first increases
the reference counter. Accordingly, after its caller used it up,
it decreases the counter again. This can prevent a node being used by
one thread from being freed by another thread.

Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericson.com>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotipc: fix potential deadlock when all links are reset
Ying Xue [Thu, 26 Mar 2015 10:10:23 +0000 (18:10 +0800)]
tipc: fix potential deadlock when all links are reset

[   60.988363] ======================================================
[   60.988754] [ INFO: possible circular locking dependency detected ]
[   60.989152] 3.19.0+ #194 Not tainted
[   60.989377] -------------------------------------------------------
[   60.989781] swapper/3/0 is trying to acquire lock:
[   60.990079]  (&(&n_ptr->lock)->rlock){+.-...}, at: [<ffffffffa0006dca>] tipc_link_retransmit+0x1aa/0x240 [tipc]
[   60.990743]
[   60.990743] but task is already holding lock:
[   60.991106]  (&(&bclink->lock)->rlock){+.-...}, at: [<ffffffffa00004be>] tipc_bclink_lock+0x8e/0xa0 [tipc]
[   60.991738]
[   60.991738] which lock already depends on the new lock.
[   60.991738]
[   60.992174]
[   60.992174] the existing dependency chain (in reverse order) is:
[   60.992174]
-> #1 (&(&bclink->lock)->rlock){+.-...}:
[   60.992174]        [<ffffffff810a9c0c>] lock_acquire+0x9c/0x140
[   60.992174]        [<ffffffff8179c41f>] _raw_spin_lock_bh+0x3f/0x50
[   60.992174]        [<ffffffffa00004be>] tipc_bclink_lock+0x8e/0xa0 [tipc]
[   60.992174]        [<ffffffffa0000f57>] tipc_bclink_add_node+0x97/0xf0 [tipc]
[   60.992174]        [<ffffffffa0011815>] tipc_node_link_up+0xf5/0x110 [tipc]
[   60.992174]        [<ffffffffa0007783>] link_state_event+0x2b3/0x4f0 [tipc]
[   60.992174]        [<ffffffffa00193c0>] tipc_link_proto_rcv+0x24c/0x418 [tipc]
[   60.992174]        [<ffffffffa0008857>] tipc_rcv+0x827/0xac0 [tipc]
[   60.992174]        [<ffffffffa0002ca3>] tipc_l2_rcv_msg+0x73/0xd0 [tipc]
[   60.992174]        [<ffffffff81646e66>] __netif_receive_skb_core+0x746/0x980
[   60.992174]        [<ffffffff816470c1>] __netif_receive_skb+0x21/0x70
[   60.992174]        [<ffffffff81647295>] netif_receive_skb_internal+0x35/0x130
[   60.992174]        [<ffffffff81648218>] napi_gro_receive+0x158/0x1d0
[   60.992174]        [<ffffffff81559e05>] e1000_clean_rx_irq+0x155/0x490
[   60.992174]        [<ffffffff8155c1b7>] e1000_clean+0x267/0x990
[   60.992174]        [<ffffffff81647b60>] net_rx_action+0x150/0x360
[   60.992174]        [<ffffffff8105ec43>] __do_softirq+0x123/0x360
[   60.992174]        [<ffffffff8105f12e>] irq_exit+0x8e/0xb0
[   60.992174]        [<ffffffff8179f9f5>] do_IRQ+0x65/0x110
[   60.992174]        [<ffffffff8179da6f>] ret_from_intr+0x0/0x13
[   60.992174]        [<ffffffff8100de9f>] arch_cpu_idle+0xf/0x20
[   60.992174]        [<ffffffff8109dfa6>] cpu_startup_entry+0x2f6/0x3f0
[   60.992174]        [<ffffffff81033cda>] start_secondary+0x13a/0x150
[   60.992174]
-> #0 (&(&n_ptr->lock)->rlock){+.-...}:
[   60.992174]        [<ffffffff810a8f7d>] __lock_acquire+0x163d/0x1ca0
[   60.992174]        [<ffffffff810a9c0c>] lock_acquire+0x9c/0x140
[   60.992174]        [<ffffffff8179c41f>] _raw_spin_lock_bh+0x3f/0x50
[   60.992174]        [<ffffffffa0006dca>] tipc_link_retransmit+0x1aa/0x240 [tipc]
[   60.992174]        [<ffffffffa0001e11>] tipc_bclink_rcv+0x611/0x640 [tipc]
[   60.992174]        [<ffffffffa0008646>] tipc_rcv+0x616/0xac0 [tipc]
[   60.992174]        [<ffffffffa0002ca3>] tipc_l2_rcv_msg+0x73/0xd0 [tipc]
[   60.992174]        [<ffffffff81646e66>] __netif_receive_skb_core+0x746/0x980
[   60.992174]        [<ffffffff816470c1>] __netif_receive_skb+0x21/0x70
[   60.992174]        [<ffffffff81647295>] netif_receive_skb_internal+0x35/0x130
[   60.992174]        [<ffffffff81648218>] napi_gro_receive+0x158/0x1d0
[   60.992174]        [<ffffffff81559e05>] e1000_clean_rx_irq+0x155/0x490
[   60.992174]        [<ffffffff8155c1b7>] e1000_clean+0x267/0x990
[   60.992174]        [<ffffffff81647b60>] net_rx_action+0x150/0x360
[   60.992174]        [<ffffffff8105ec43>] __do_softirq+0x123/0x360
[   60.992174]        [<ffffffff8105f12e>] irq_exit+0x8e/0xb0
[   60.992174]        [<ffffffff8179f9f5>] do_IRQ+0x65/0x110
[   60.992174]        [<ffffffff8179da6f>] ret_from_intr+0x0/0x13
[   60.992174]        [<ffffffff8100de9f>] arch_cpu_idle+0xf/0x20
[   60.992174]        [<ffffffff8109dfa6>] cpu_startup_entry+0x2f6/0x3f0
[   60.992174]        [<ffffffff81033cda>] start_secondary+0x13a/0x150
[   60.992174]
[   60.992174] other info that might help us debug this:
[   60.992174]
[   60.992174]  Possible unsafe locking scenario:
[   60.992174]
[   60.992174]        CPU0                    CPU1
[   60.992174]        ----                    ----
[   60.992174]   lock(&(&bclink->lock)->rlock);
[   60.992174]                                lock(&(&n_ptr->lock)->rlock);
[   60.992174]                                lock(&(&bclink->lock)->rlock);
[   60.992174]   lock(&(&n_ptr->lock)->rlock);
[   60.992174]
[   60.992174]  *** DEADLOCK ***
[   60.992174]
[   60.992174] 3 locks held by swapper/3/0:
[   60.992174]  #0:  (rcu_read_lock){......}, at: [<ffffffff81646791>] __netif_receive_skb_core+0x71/0x980
[   60.992174]  #1:  (rcu_read_lock){......}, at: [<ffffffffa0002c35>] tipc_l2_rcv_msg+0x5/0xd0 [tipc]
[   60.992174]  #2:  (&(&bclink->lock)->rlock){+.-...}, at: [<ffffffffa00004be>] tipc_bclink_lock+0x8e/0xa0 [tipc]
[   60.992174]

The correct the sequence of grabbing n_ptr->lock and bclink->lock
should be that the former is first held and the latter is then taken,
which exactly happened on CPU1. But especially when the retransmission
of broadcast link is failed, bclink->lock is first held in
tipc_bclink_rcv(), and n_ptr->lock is taken in link_retransmit_failure()
called by tipc_link_retransmit() subsequently, which is demonstrated on
CPU0. As a result, deadlock occurs.

If the order of holding the two locks happening on CPU0 is reversed, the
deadlock risk will be relieved. Therefore, the node lock taken in
link_retransmit_failure() originally is moved to tipc_bclink_rcv()
so that it's obtained before bclink lock. But the precondition of
the adjustment of node lock is that responding to bclink reset event
must be moved from tipc_bclink_unlock() to tipc_node_unlock().

Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agovirtio: simplify the using of received in virtnet_poll
Li RongQing [Thu, 26 Mar 2015 07:39:45 +0000 (15:39 +0800)]
virtio: simplify the using of received in virtnet_poll

received is 0, no need to minus it and use "+=" to reassign it

Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge branch 'be2net-next'
David S. Miller [Sun, 29 Mar 2015 19:34:05 +0000 (12:34 -0700)]
Merge branch 'be2net-next'

Sathya Perla says:

====================
be2net: patch set

Hi David, this patch set includes 2 feature additions to the be2net driver:

Patch 1 sets up cpu affinity hints for be2net irqs using the
cpumask_set_cpu_local_first() API that first picks the near numa cores
and when they are exhausted, selects the far numa cores.

Patch 2 setups up xps queue mapping for be2net's TXQs to avoid,
by default, TX lock contention.

Patch 3 just bumps up the driver version.

Pls consider applying this patch set to the net-next queue. Thanks!
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobe2net: bump up the driver version to 10.6.0.1
Sathya Perla [Thu, 26 Mar 2015 07:05:10 +0000 (03:05 -0400)]
be2net: bump up the driver version to 10.6.0.1

Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobe2net: setup xps queue mapping
Sathya Perla [Thu, 26 Mar 2015 07:05:09 +0000 (03:05 -0400)]
be2net: setup xps queue mapping

This patch sets up xps queue mapping on load, so that TX traffic is
steered to the queue whose irqs are being processed by the current cpu.
This helps in avoiding TX lock contention.

Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobe2net: assign CPU affinity hints to be2net IRQs
Padmanabh Ratnakar [Thu, 26 Mar 2015 07:05:08 +0000 (03:05 -0400)]
be2net: assign CPU affinity hints to be2net IRQs

This patch provides hints to irqbalance to map be2net IRQs to
specific CPU cores. cpumask_set_cpu_local_first() is used, which first
maps IRQs to near NUMA cores; when those cores are exhausted, IRQs are
mapped to far NUMA cores.

Signed-off-by: Padmanabh Ratnakar <padmanabh.ratnakar@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotcp: tcp_syn_flood_action() can be static
Eric Dumazet [Wed, 25 Mar 2015 22:08:47 +0000 (15:08 -0700)]
tcp: tcp_syn_flood_action() can be static

After commit b717cd6f223e ("tcp: add tcp_conn_request"),
tcp_syn_flood_action() is no longer used from IPv6.

We can make it static, by moving it above tcp_conn_request()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Octavian Purdila <octavian.purdila@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agocxgb4: fix boolreturn.cocci warnings
Wu Fengguang [Wed, 25 Mar 2015 21:55:25 +0000 (05:55 +0800)]
cxgb4: fix boolreturn.cocci warnings

drivers/net/ethernet/chelsio/cxgb4/cxgb4_fcoe.c:49:9-10: WARNING: return of 0/1 in function 'cxgb_fcoe_sof_eof_supported' with return type bool

 Return statements in functions returning bool should use
 true/false instead of 1/0.
Generated by: scripts/coccinelle/misc/boolreturn.cocci

CC: Varun Prakash <varun@chelsio.com>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agofib6: install fib6 ops in the last step
WANG Cong [Wed, 25 Mar 2015 21:45:02 +0000 (14:45 -0700)]
fib6: install fib6 ops in the last step

We should not commit the new ops until we finish
all the setup, otherwise we have to NULL it on failure.

Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge branch 'bcmgenet-next'
David S. Miller [Fri, 27 Mar 2015 21:26:21 +0000 (14:26 -0700)]
Merge branch 'bcmgenet-next'

Petri Gynther says:

====================
net: bcmgenet: multiple Rx queues support

Final patch set to add support for multiple Rx queues:
1. remove priv->int0_mask and priv->int1_mask
2. modify Tx ring int_enable and int_disable vectors
3. simplify bcmgenet_init_dma()
4. tweak init_umac()
5. rework Tx NAPI code
6. rework Rx NAPI code
7. add support for multiple Rx queues
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: bcmgenet: add support for multiple Rx queues
Petri Gynther [Wed, 25 Mar 2015 19:35:16 +0000 (12:35 -0700)]
net: bcmgenet: add support for multiple Rx queues

Add support for multiple Rx queues:
1. Add NAPI context per Rx queue
2. Modify Rx interrupt and Rx NAPI code to handle multiple Rx queues

Signed-off-by: Petri Gynther <pgynther@google.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: bcmgenet: rework Rx NAPI code
Petri Gynther [Wed, 25 Mar 2015 19:35:15 +0000 (12:35 -0700)]
net: bcmgenet: rework Rx NAPI code

Introduce new bcmgenet functions to handle the NAPI calls to:
netif_napi_add()
napi_enable()
napi_disable()
netif_napi_del()

Signed-off-by: Petri Gynther <pgynther@google.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: bcmgenet: rework Tx NAPI code
Petri Gynther [Wed, 25 Mar 2015 19:35:14 +0000 (12:35 -0700)]
net: bcmgenet: rework Tx NAPI code

Introduce new bcmgenet functions to handle the NAPI calls to:
netif_napi_add()
napi_enable()
napi_disable()
netif_napi_del()

Signed-off-by: Petri Gynther <pgynther@google.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: bcmgenet: tweak init_umac()
Petri Gynther [Wed, 25 Mar 2015 19:35:12 +0000 (12:35 -0700)]
net: bcmgenet: tweak init_umac()

Use more meaningful variable names int0_enable and int1_enable when
enabling bcmgenet interrupts.

For Rx default queue interrupts, use:
UMAC_IRQ_RXDMA_BDONE | UMAC_IRQ_RXDMA_PDONE

For Tx default queue interrupts, use:
UMAC_IRQ_TXDMA_BDONE | UMAC_IRQ_TXDMA_PDONE

Signed-off-by: Petri Gynther <pgynther@google.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: bcmgenet: simplify bcmgenet_init_dma()
Petri Gynther [Wed, 25 Mar 2015 19:35:11 +0000 (12:35 -0700)]
net: bcmgenet: simplify bcmgenet_init_dma()

Do the two kcalloc() calls first, before proceeding into Rx/Tx DMA init.
Makes the error case handling much simpler.

Signed-off-by: Petri Gynther <pgynther@google.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Jaedon Shin <jaedon.shin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: bcmgenet: modify Tx ring int_enable and int_disable vectors
Petri Gynther [Wed, 25 Mar 2015 19:35:10 +0000 (12:35 -0700)]
net: bcmgenet: modify Tx ring int_enable and int_disable vectors

Remove unnecessary function parameter priv. Use ring->priv instead.

Signed-off-by: Petri Gynther <pgynther@google.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: bcmgenet: remove priv->int0_mask and priv->int1_mask
Petri Gynther [Wed, 25 Mar 2015 19:35:09 +0000 (12:35 -0700)]
net: bcmgenet: remove priv->int0_mask and priv->int1_mask

Remove unused priv->int0_mask and priv->int1_mask.

Signed-off-by: Petri Gynther <pgynther@google.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge branch 'xgene-next'
David S. Miller [Fri, 27 Mar 2015 21:18:52 +0000 (14:18 -0700)]
Merge branch 'xgene-next'

Iyappan Subramanian says:

====================
drivers: net: xgene: Add separate tx completion ring

SGMII based 1GbE and 10GbE interfaces support multiple interrupts.
Adding separate tx completion descriptor ring and associating a dedicated irq for the TX completion.
====================

Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Signed-off-by: Keyur Chudgar <kchudgar@apm.com>
9 years agodrivers: net: xgene: Add separate tx completion ring
Iyappan Subramanian [Wed, 25 Mar 2015 19:19:12 +0000 (12:19 -0700)]
drivers: net: xgene: Add separate tx completion ring

- Added wrapper functions around napi_add, napi_del, napi_enable and napi_disable
- Moved platform_get_irq function call after reading phy_mode
- Associating the new irq to tx completion for the supported ethernet interfaces

Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Signed-off-by: Keyur Chudgar <kchudgar@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agodtb: xgene: Add interrupt for Tx completion
Iyappan Subramanian [Wed, 25 Mar 2015 19:19:11 +0000 (12:19 -0700)]
dtb: xgene: Add interrupt for Tx completion

Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Signed-off-by: Keyur Chudgar <kchudgar@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoDocumentation: dts: xgene: Update interrupt field description
Iyappan Subramanian [Wed, 25 Mar 2015 19:19:10 +0000 (12:19 -0700)]
Documentation: dts: xgene: Update interrupt field description

Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Signed-off-by: Keyur Chudgar <kchudgar@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonetfilter: nf_tables: implement set transaction support
Patrick McHardy [Wed, 25 Mar 2015 14:08:50 +0000 (14:08 +0000)]
netfilter: nf_tables: implement set transaction support

Set elements are the last object type not supporting transaction support.
Implement similar to the existing rule transactions:

The global transaction counter keeps track of two generations, current
and next. Each element contains a bitmask specifying in which generations
it is inactive.

New elements start out as inactive in the current generation and active
in the next. On commit, the previous next generation becomes the current
generation and the element becomes active. The bitmask is then cleared
to indicate that the element is active in all future generations. If the
transaction is aborted, the element is removed from the set before it
becomes active.

When removing an element, it gets marked as inactive in the next generation.
On commit the next generation becomes active and the therefor the element
inactive. It is then taken out of then set and released. On abort, the
element is marked as active for the next generation again.

Lookups ignore elements not active in the current generation.

The current set types (hash/rbtree) both use a field in the extension area
to store the generation mask. This (currently) does not require any
additional memory since we have some free space in there.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
9 years agonetfilter: nf_tables: add transaction helper functions
Patrick McHardy [Wed, 25 Mar 2015 14:08:49 +0000 (14:08 +0000)]
netfilter: nf_tables: add transaction helper functions

Add some helper functions for building the genmask as preparation for
set transactions.

Also add a little documentation how this stuff actually works.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
9 years agonetfilter: nf_tables: return set extensions from ->lookup()
Patrick McHardy [Wed, 25 Mar 2015 14:08:48 +0000 (14:08 +0000)]
netfilter: nf_tables: return set extensions from ->lookup()

Return the extension area from the ->lookup() function to allow to
consolidate common actions.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
9 years agonetfilter: nf_tables: consolide set element destruction
Patrick McHardy [Wed, 25 Mar 2015 14:08:47 +0000 (14:08 +0000)]
netfilter: nf_tables: consolide set element destruction

With the conversion to set extensions, it is now possible to consolidate
the different set element destruction functions.

The set implementations' ->remove() functions are changed to only take
the element out of their internal data structures. Elements will be freed
in a batched fashion after the global transaction's completion RCU grace
period.

This reduces the amount of grace periods required for nft_hash from N
to zero additional ones, additionally this guarantees that the set
elements' extensions of all implementations can be used under RCU
protection.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
9 years agoipv6: hash net ptr into fragmentation bucket selection
Hannes Frederic Sowa [Wed, 25 Mar 2015 16:07:45 +0000 (17:07 +0100)]
ipv6: hash net ptr into fragmentation bucket selection

As namespaces are sometimes used with overlapping ip address ranges,
we should also use the namespace as input to the hash to select the ip
fragmentation counter bucket.

Cc: Eric Dumazet <edumazet@google.com>
Cc: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoipv4: hash net ptr into fragmentation bucket selection
Hannes Frederic Sowa [Wed, 25 Mar 2015 16:07:44 +0000 (17:07 +0100)]
ipv4: hash net ptr into fragmentation bucket selection

As namespaces are sometimes used with overlapping ip address ranges,
we should also use the namespace as input to the hash to select the ip
fragmentation counter bucket.

Cc: Eric Dumazet <edumazet@google.com>
Cc: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge branch 'tipc-next'
David S. Miller [Wed, 25 Mar 2015 18:05:56 +0000 (14:05 -0400)]
Merge branch 'tipc-next'

Jon Maloy says:

====================
tipc: some improvements and fixes

We introduce a better algorithm for selecting when and which
users should be subject to link congestion control, plus clean
up some code for that mechanism.
Commit #3 fixes another rare race condition during packet reception.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotipc: eliminate race condition at dual link establishment
Jon Paul Maloy [Wed, 25 Mar 2015 16:07:26 +0000 (12:07 -0400)]
tipc: eliminate race condition at dual link establishment

Despite recent improvements, the establishment of dual parallel
links still has a small glitch where messages can bypass each
other. When the second link in a dual-link configuration is
established, part of the first link's traffic will be steered over
to the new link. Although we do have a mechanism to ensure that
packets sent before and after the establishment of the new link
arrive in sequence to the destination node, this is not enough.
The arriving messages will still be delivered upwards in different
threads, something entailing a risk of message disordering during
the transition phase.

To fix this, we introduce a synchronization mechanism between the
two parallel links, so that traffic arriving on the new link cannot
be added to its input queue until we are guaranteed that all
pre-establishment messages have been delivered on the old, parallel
link.

This problem seems to always have been around, but its occurrence is
so rare that it has not been noticed until recent intensive testing.

Reviewed-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotipc: clean up handling of link congestion
Jon Paul Maloy [Wed, 25 Mar 2015 16:07:25 +0000 (12:07 -0400)]
tipc: clean up handling of link congestion

After the recent changes in message importance handling it becomes
possible to simplify handling of messages and sockets when we
encounter link congestion.

We merge the function tipc_link_cong() into link_schedule_user(),
and simplify the code of the latter. The code should now be
easier to follow, especially regarding return codes and handling
of the message that caused the situation.

In case the scheduling function is unable to pre-allocate a wakeup
message buffer, it now returns -ENOBUFS, which is a more correct
code than the previously used -EHOSTUNREACH.

Reviewed-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotipc: introduce starvation free send algorithm
Jon Paul Maloy [Wed, 25 Mar 2015 16:07:24 +0000 (12:07 -0400)]
tipc: introduce starvation free send algorithm

Currently, we only use a single counter; the length of the backlog
queue, to determine whether a message should be accepted to the queue
or not. Each time a message is being sent, the queue length is compared
to a threshold value for the message's importance priority. If the queue
length is beyond this threshold, the message is rejected. This algorithm
implies a risk of starvation of low importance senders during very high
load, because it may take a long time before the backlog queue has
decreased enough to accept a lower level message.

We now eliminate this risk by introducing a counter for each importance
priority. When a message is sent, we check only the queue level for that
particular message's priority. If that is ok, the message can be added
to the backlog, irrespective of the queue level for other priorities.
This way, each level is guaranteed a certain portion of the total
bandwidth, and any risk of starvation is eliminated.

Reviewed-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: dsa: Handle non-bridge master change
Guenter Roeck [Wed, 25 Mar 2015 15:08:37 +0000 (08:08 -0700)]
net: dsa: Handle non-bridge master change

Master change notifications may occur other than when joining or
leaving a bridge, for example when being added to or removed from
a bond or Open vSwitch. In that case, do nothing instead of asking
the switch driver to remove a port from a bridge that it didn't join.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonetfilter: nf_tables: convert hash and rbtree to set extensions
Patrick McHardy [Wed, 25 Mar 2015 13:07:50 +0000 (13:07 +0000)]
netfilter: nf_tables: convert hash and rbtree to set extensions

The set implementations' private struct will only contain the elements
needed to maintain the search structure, all other elements are moved
to the set extensions.

Element allocation and initialization is performed centrally by
nf_tables_api instead of by the different set implementations'
->insert() functions. A new "elemsize" member in the set ops specifies
the amount of memory to reserve for internal usage. Destruction
will also be moved out of the set implementations by a following patch.

Except for element allocation, the patch is a simple conversion to
using data from the extension area.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
9 years agonetfilter: nf_tables: add set extensions
Patrick McHardy [Wed, 25 Mar 2015 13:07:49 +0000 (13:07 +0000)]
netfilter: nf_tables: add set extensions

Add simple set extension infrastructure for maintaining variable sized
and optional per element data.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
9 years agonetfilter: nft_hash: convert to use rhashtable callbacks
Patrick McHardy [Wed, 25 Mar 2015 13:07:48 +0000 (13:07 +0000)]
netfilter: nft_hash: convert to use rhashtable callbacks

A following patch will convert sets to use so called set extensions,
where the key is not located in a fixed position anymore. This will
require rhashtable hashing and comparison callbacks to be used.

As preparation, convert nft_hash to use these callbacks without any
functional changes.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
9 years agonetfilter: nft_hash: indent rhashtable parameters
Patrick McHardy [Wed, 25 Mar 2015 13:07:47 +0000 (13:07 +0000)]
netfilter: nft_hash: indent rhashtable parameters

Improve readability by indenting the parameter initialization.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
9 years agonetfilter: nft_hash: restore struct nft_hash
Patrick McHardy [Wed, 25 Mar 2015 13:07:46 +0000 (13:07 +0000)]
netfilter: nft_hash: restore struct nft_hash

Following patches will add new private members, restore struct nft_hash
as preparation.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
9 years agorhashtable: provide len to obj_hashfn
Patrick McHardy [Wed, 25 Mar 2015 13:07:45 +0000 (13:07 +0000)]
rhashtable: provide len to obj_hashfn

nftables sets will be converted to use so called setextensions, moving
the key to a non-fixed position. To hash it, the obj_hashfn must be used,
however it so far doesn't receive the length parameter.

Pass the key length to obj_hashfn() and convert existing users.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
9 years agocrypto: algif - fix warn: unsigned 'used' is never less than zero
tadeusz.struk@intel.com [Wed, 25 Mar 2015 14:29:19 +0000 (07:29 -0700)]
crypto: algif - fix warn: unsigned 'used' is never less than zero

Change type from unsigned long to int to fix an issue reported by kbuild robot:
crypto/algif_skcipher.c:596 skcipher_recvmsg_async() warn: unsigned 'used' is
never less than zero.

Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotipc: fix a link reset issue due to retransmission failures
Ying Xue [Wed, 25 Mar 2015 10:09:40 +0000 (18:09 +0800)]
tipc: fix a link reset issue due to retransmission failures

When a node joins a cluster while we are transmitting a fragment
stream over the broadcast link, it's missing the preceding fragments
needed to build a meaningful message. As a result, the node has to
drop it. However, as the fragment message is not acknowledged to
its sender before it's dropped, it accidentally causes link reset
of retransmission failure on the node.

Reported-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Tested-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agos390: fix /proc/interrupts output
Sebastian Ott [Wed, 25 Mar 2015 15:31:37 +0000 (16:31 +0100)]
s390: fix /proc/interrupts output

The irqclass_sub_desc array and enum interruption_class are out of sync
thus /proc/interrupts is broken. Remove IRQIO_CLW.

Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agosctp: avoid to repeatedly declare external variables
Ying Xue [Wed, 25 Mar 2015 06:13:01 +0000 (14:13 +0800)]
sctp: avoid to repeatedly declare external variables

Move the declaration for external variables to sctp.h file avoiding
to repeatedly declare them with extern keyword.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonetfilter: nft_meta: use raw_smp_processor_id()
Patrick McHardy [Wed, 25 Mar 2015 08:09:56 +0000 (08:09 +0000)]
netfilter: nft_meta: use raw_smp_processor_id()

Using smp_processor_id() triggers warnings with PREEMPT_RCU. There is no
point in disabling preemption since we only collect the numeric value,
so use raw_smp_processor_id() instead.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
9 years agonetfilter: nf_tables: nft_queue does not depend on x_tables
Patrick McHardy [Wed, 25 Mar 2015 08:09:55 +0000 (08:09 +0000)]
netfilter: nf_tables: nft_queue does not depend on x_tables

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
9 years agonetfilter: nf_tables: restore nf_log_trace() in nf_tables_core.c
Pablo Neira Ayuso [Tue, 24 Mar 2015 09:55:38 +0000 (10:55 +0100)]
netfilter: nf_tables: restore nf_log_trace() in nf_tables_core.c

As described by 8787a81 ("netfilter: restore rule tracing via
nfnetlink_log"), this accidentally slipped through during conflict
resolution in 0b4fe07.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
9 years agonetfilter: Use LOGLEVEL_<FOO> defines
Joe Perches [Mon, 23 Mar 2015 18:50:10 +0000 (11:50 -0700)]
netfilter: Use LOGLEVEL_<FOO> defines

Use the #defines where appropriate.

Miscellanea:

Add explicit #include <linux/kernel.h> where it was not
previously used so that these #defines are a bit more
explicitly defined instead of indirectly included via:
module.h->moduleparam.h->kernel.h

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
9 years agonetfilter: nf_tables: move struct net pointer to base chain
Patrick McHardy [Sat, 21 Mar 2015 15:19:15 +0000 (15:19 +0000)]
netfilter: nf_tables: move struct net pointer to base chain

The network namespace is only needed for base chains to get at the
gencursor. Also convert to possible_net_t.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
9 years agotcp: fix ipv4 mapped request socks
Eric Dumazet [Wed, 25 Mar 2015 04:45:56 +0000 (21:45 -0700)]
tcp: fix ipv4 mapped request socks

ss should display ipv4 mapped request sockets like this :

tcp    SYN-RECV   0      0  ::ffff:192.168.0.1:8080   ::ffff:192.0.2.1:35261

and not like this :

tcp    SYN-RECV   0      0  192.168.0.1:8080   192.0.2.1:35261

We should init ireq->ireq_family based on listener sk_family,
not the actual protocol carried by SYN packet.

This means we can set ireq_family in inet_reqsk_alloc()

Fixes: 72f329329ce1 ("inet: introduce ireq_family")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agovirtio: change comment in transmit
stephen hemminger [Tue, 24 Mar 2015 23:22:07 +0000 (16:22 -0700)]
virtio: change comment in transmit

The original comment was not really informative or funny
as well as sexist. Replace it with a better explanation of
why the driver does stop and what the impacts are.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotools: bpf_asm: cleanup vlan extension related token
Daniel Borkmann [Tue, 24 Mar 2015 22:19:24 +0000 (23:19 +0100)]
tools: bpf_asm: cleanup vlan extension related token

We now have K_VLANT, K_VLANP and K_VLANTPID. Clean them up into more
descriptive token, namely K_VLAN_TCI, K_VLAN_AVAIL and K_VLAN_TPID.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge branch 'listener_refactor_16'
David S. Miller [Wed, 25 Mar 2015 01:16:30 +0000 (21:16 -0400)]
Merge branch 'listener_refactor_16'

Eric Dumazet says:

====================
tcp: listener refactor part 16

A CONFIG_PROVE_RCU=y build revealed an RCU splat I had to fix.

I added const qualifiers to various md5 methods, as I expect
to call them on behalf of request sock traffic even if
the listener socket is not locked. This seems ok, but adding
const makes the contract clearer. Note a good reduction
of code size thanks to request/establish sockets convergence.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotcp: md5: get rid of tcp_v[46]_reqsk_md5_lookup()
Eric Dumazet [Tue, 24 Mar 2015 22:58:56 +0000 (15:58 -0700)]
tcp: md5: get rid of tcp_v[46]_reqsk_md5_lookup()

With request socks convergence, we no longer need
different lookup methods. A request socket can
use generic lookup function.

Add const qualifier to 2nd tcp_v[46]_md5_lookup() parameter.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotcp: md5: remove request sock argument of calc_md5_hash()
Eric Dumazet [Tue, 24 Mar 2015 22:58:55 +0000 (15:58 -0700)]
tcp: md5: remove request sock argument of calc_md5_hash()

Since request and established sockets now have same base,
there is no need to pass two pointers to tcp_v4_md5_hash_skb()
or tcp_v6_md5_hash_skb()

Also add a const qualifier to their struct tcp_md5sig_key argument.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotcp: md5: input path is run under rcu protected sections
Eric Dumazet [Tue, 24 Mar 2015 22:58:54 +0000 (15:58 -0700)]
tcp: md5: input path is run under rcu protected sections

It is guaranteed that both tcp_v4_rcv() and tcp_v6_rcv()
run from rcu read locked sections :

ip_local_deliver_finish() and ip6_input_finish() both
use rcu_read_lock()

Also align tcp_v6_inbound_md5_hash() on tcp_v4_inbound_md5_hash()
by returning a boolean.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotcp: use C99 initializers in new_state[]
Eric Dumazet [Tue, 24 Mar 2015 22:58:53 +0000 (15:58 -0700)]
tcp: use C99 initializers in new_state[]

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotcp: md5: fix rcu lockdep splat
Eric Dumazet [Tue, 24 Mar 2015 22:58:52 +0000 (15:58 -0700)]
tcp: md5: fix rcu lockdep splat

While timer handler effectively runs a rcu read locked section,
there is no explicit rcu_read_lock()/rcu_read_unlock() annotations
and lockdep can be confused here :

net/ipv4/tcp_ipv4.c-906-        /* caller either holds rcu_read_lock() or socket lock */
net/ipv4/tcp_ipv4.c:907:        md5sig = rcu_dereference_check(tp->md5sig_info,
net/ipv4/tcp_ipv4.c-908-                                       sock_owned_by_user(sk) ||
net/ipv4/tcp_ipv4.c-909-                                       lockdep_is_held(&sk->sk_lock.slock));

Let's explicitely acquire rcu_read_lock() in tcp_make_synack()

Before commit 95313fcb2b6 ("inet: get rid of central tcp/dccp listener
timer"), we were holding listener lock so lockdep was happy.

Fixes: 95313fcb2b6 ("inet: get rid of central tcp/dccp listener timer")
Signed-off-by: Eric DUmazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge branch 'rhashtable-next'
David S. Miller [Tue, 24 Mar 2015 21:53:06 +0000 (17:53 -0400)]
Merge branch 'rhashtable-next'

Thomas Graf says:

====================
rhashtable updates on top of Herbert's work

Patch 1 is a bugfix for an RCU splash I encountered while testing.
Patch 2 & 3 are pure cleanups. Patch 4 disables automatic shrinking
by default as discussed in previous thread. Patch 5 removes some
rhashtable internal knowledge from nft_hash and fixes another RCU
splash.

I've pushed various rhashtable tests (Netlink, nft) together with a
Makefile to a git tree [0] for easier stress testing.

[0] https://github.com/tgraf/rhashtable
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agorhashtable: Add rhashtable_free_and_destroy()
Thomas Graf [Tue, 24 Mar 2015 13:18:20 +0000 (14:18 +0100)]
rhashtable: Add rhashtable_free_and_destroy()

rhashtable_destroy() variant which stops rehashes, iterates over
the table and calls a callback to release resources.

Avoids need for nft_hash to embed rhashtable internals and allows to
get rid of the being_destroyed flag. It also saves a 2nd mutex
lock upon destruction.

Also fixes an RCU lockdep splash on nft set destruction due to
calling rht_for_each_entry_safe() without holding bucket locks.
Open code this loop as we need know that no mutations may occur in
parallel.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agorhashtable: Disable automatic shrinking by default
Thomas Graf [Tue, 24 Mar 2015 20:42:19 +0000 (20:42 +0000)]
rhashtable: Disable automatic shrinking by default

Introduce a new bool automatic_shrinking to require the
user to explicitly opt-in to automatic shrinking of tables.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agorhashtable: Mark internal/private inline functions as such
Thomas Graf [Tue, 24 Mar 2015 13:18:18 +0000 (14:18 +0100)]
rhashtable: Mark internal/private inline functions as such

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agorhashtable: Use 'unsigned int' consistently
Thomas Graf [Tue, 24 Mar 2015 13:18:17 +0000 (14:18 +0100)]
rhashtable: Use 'unsigned int' consistently

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agorhashtable: Extend RCU read lock into rhashtable_insert_rehash()
Thomas Graf [Tue, 24 Mar 2015 13:18:16 +0000 (14:18 +0100)]
rhashtable: Extend RCU read lock into rhashtable_insert_rehash()

rhashtable_insert_rehash() requires RCU locks to be held in order
to access ht->tbl and traverse to the last table.

Fixes: 43719bf4351e ("rhashtable: Add immediate rehash during insertion")
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agofilter: introduce SKF_AD_VLAN_TPID BPF extension
Michal Sekletar [Tue, 24 Mar 2015 13:48:41 +0000 (14:48 +0100)]
filter: introduce SKF_AD_VLAN_TPID BPF extension

If vlan offloading takes place then vlan header is removed from frame
and its contents, both vlan_tci and vlan_proto, is available to user
space via TPACKET interface. However, only vlan_tci can be used in BPF
filters.

This commit introduces a new BPF extension. It makes possible to load
the value of vlan_proto (vlan TPID) to register A. Support for classic
BPF and eBPF is being added, analogous to skb->protocol.

Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: Michal Sekletar <msekleta@redhat.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Reviewed-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge branch 'cxgb4-fcoe'
David S. Miller [Tue, 24 Mar 2015 19:24:39 +0000 (15:24 -0400)]
Merge branch 'cxgb4-fcoe'

Varun Prakash says:

====================
FCoE support in cxgb4 driver

This patch series enables FCoE support in cxgb4 driver, it enables
FCOE_CRC and FCOE_MTU net device features.

This series is created against net-next tree.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agocxgb4: update Kconfig and Makefile for FCoE support
Varun Prakash [Tue, 24 Mar 2015 13:44:47 +0000 (19:14 +0530)]
cxgb4: update Kconfig and Makefile for FCoE support

Signed-off-by: Varun Prakash <varun@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agocxgb4: add cxgb4_fcoe.c for FCoE
Varun Prakash [Tue, 24 Mar 2015 13:44:46 +0000 (19:14 +0530)]
cxgb4: add cxgb4_fcoe.c for FCoE

This patch adds cxgb4_fcoe.c and enables FCOE_CRC, FCOE_MTU
net device features.

Signed-off-by: Varun Prakash <varun@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agocxgb4: add cxgb4_fcoe.h and macro definitions for FCoE
Varun Prakash [Tue, 24 Mar 2015 13:44:45 +0000 (19:14 +0530)]
cxgb4: add cxgb4_fcoe.h and macro definitions for FCoE

This patch adds new header file cxgb4_fcoe.h and defines new
macros for FCoE support in cxgb4 driver.

Signed-off-by: Varun Prakash <varun@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoipv6: fix sparse warnings in privacy stable addresses generation
Hannes Frederic Sowa [Tue, 24 Mar 2015 10:05:28 +0000 (11:05 +0100)]
ipv6: fix sparse warnings in privacy stable addresses generation

Those warnings reported by sparse endianness check (via kbuild test robot)
are harmless, nevertheless fix them up and make the code a little bit
easier to read.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Fixes: 576441be3c915c0 ("ipv6: generation of stable privacy addresses for link-local and autoconf")
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotipc: fix compile error when IPV6=m and TIPC=y
Ying Xue [Tue, 24 Mar 2015 08:59:21 +0000 (16:59 +0800)]
tipc: fix compile error when IPV6=m and TIPC=y

When IPV6=m and TIPC=y, below error will appear during building kernel
image:

net/tipc/udp_media.c:196:
undefined reference to `ip6_dst_lookup'
make: *** [vmlinux] Error 1

As ip6_dst_lookup() is implemented in IPV6 and IPV6 is compiled as
module, ip6_dst_lookup() is not built-in core kernel image. As a
result, compiler cannot find 'ip6_dst_lookup' reference while
compiling TIPC code into core kernel image.

But with the method introduced by commit d22e5150283a ("ipv6: export a
stub for IPv6 symbols used by vxlan"), we can avoid the compile error
through "ipv6_stub" pointer to access ip6_dst_lookup().

Fixes: 38e17eaa07d7 ("tipc: add ip/udp media type")
Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: allow to delete a whole device group
WANG Cong [Tue, 24 Mar 2015 18:53:31 +0000 (11:53 -0700)]
net: allow to delete a whole device group

With dev group, we can change a batch of net devices,
so we should allow to delete them together too.

Group 0 is not allowed to be deleted since it is
the default group.

Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agorhashtable: Add comment on choice of elasticity value
Herbert Xu [Tue, 24 Mar 2015 02:37:30 +0000 (13:37 +1100)]
rhashtable: Add comment on choice of elasticity value

This patch adds a comment on the choice of the value 16 as the
maximum chain length before we force a rehash.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agocx82310_eth: fix semicolon.cocci warnings
Wu Fengguang [Tue, 24 Mar 2015 01:51:32 +0000 (09:51 +0800)]
cx82310_eth: fix semicolon.cocci warnings

drivers/net/usb/cx82310_eth.c:175:2-3: Unneeded semicolon

 Removes unneeded semicolon.

Generated by: scripts/coccinelle/misc/semicolon.cocci

CC: Ondrej Zary <linux@rainbow-software.org>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agotipc: validate length of sockaddr in connect() for dgram/rdm
Sasha Levin [Mon, 23 Mar 2015 19:30:00 +0000 (15:30 -0400)]
tipc: validate length of sockaddr in connect() for dgram/rdm

Commit 5f94e5c ("tipc: add support for connect() on dgram/rdm sockets")
hasn't validated user input length for the sockaddr structure which allows
a user to overwrite kernel memory with arbitrary input.

Fixes: 5f94e5c ("tipc: add support for connect() on dgram/rdm sockets")
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: remove never used forwarding_accel_ops pointer from net_device
Hannes Frederic Sowa [Mon, 23 Mar 2015 17:40:02 +0000 (18:40 +0100)]
net: remove never used forwarding_accel_ops pointer from net_device

Cc: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
David S. Miller [Tue, 24 Mar 2015 02:22:43 +0000 (22:22 -0400)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Conflicts:
net/netfilter/nf_tables_core.c

The nf_tables_core.c conflict was resolved using a conflict resolution
from Stephen Rothwell as a guide.

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agorhashtable: Fix sleeping inside RCU critical section in walk_stop
Herbert Xu [Mon, 23 Mar 2015 22:53:17 +0000 (09:53 +1100)]
rhashtable: Fix sleeping inside RCU critical section in walk_stop

The commit 0062d19dc7de879c3c1033f6ea970a6f8e8e8a1f ("rhashtable:
Fix use-after-free in rhashtable_walk_stop") fixed a real bug
but created another one because we may end up sleeping inside an
RCU critical section.

This patch fixes it properly by replacing the mutex with a spin
lock that specifically protects the walker lists.

Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge branch 'ipv6_stable_privacy_address'
David S. Miller [Tue, 24 Mar 2015 02:12:15 +0000 (22:12 -0400)]
Merge branch 'ipv6_stable_privacy_address'

Hannes Frederic Sowa says:

====================
ipv6: RFC7217 stable privacy addresses implementation

this is an implementation of basic support for RFC7217 stable privacy
addresses. Please review and consider for net-next.

v2:
* Correct references to RFC 7212 -> RFC 7217 in documentation patch (thanks, Eric!)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoipv6: add documentation for stable_secret, idgen_delay and idgen_retries knobs
Hannes Frederic Sowa [Mon, 23 Mar 2015 22:36:06 +0000 (23:36 +0100)]
ipv6: add documentation for stable_secret, idgen_delay and idgen_retries knobs

Cc: Erik Kline <ek@google.com>
Cc: Fernando Gont <fgont@si6networks.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: YOSHIFUJI Hideaki/吉藤英明 <hideaki.yoshifuji@miraclelinux.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoipv6: introduce idgen_delay and idgen_retries knobs
Hannes Frederic Sowa [Mon, 23 Mar 2015 22:36:05 +0000 (23:36 +0100)]
ipv6: introduce idgen_delay and idgen_retries knobs

This is specified by RFC 7217.

Cc: Erik Kline <ek@google.com>
Cc: Fernando Gont <fgont@si6networks.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: YOSHIFUJI Hideaki/吉藤英明 <hideaki.yoshifuji@miraclelinux.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoipv6: do retries on stable privacy addresses
Hannes Frederic Sowa [Mon, 23 Mar 2015 22:36:04 +0000 (23:36 +0100)]
ipv6: do retries on stable privacy addresses

If a DAD conflict is detected, we want to retry privacy stable address
generation up to idgen_retries (= 3) times with a delay of idgen_delay
(= 1 second). Add the logic to addrconf_dad_failure.

By design, we don't clean up dad failed permanent addresses.

Cc: Erik Kline <ek@google.com>
Cc: Fernando Gont <fgont@si6networks.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: YOSHIFUJI Hideaki/吉藤英明 <hideaki.yoshifuji@miraclelinux.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoipv6: collapse state_lock and lock
Hannes Frederic Sowa [Mon, 23 Mar 2015 22:36:03 +0000 (23:36 +0100)]
ipv6: collapse state_lock and lock

Cc: Erik Kline <ek@google.com>
Cc: Fernando Gont <fgont@si6networks.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: YOSHIFUJI Hideaki/吉藤英明 <hideaki.yoshifuji@miraclelinux.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoipv6: introduce IFA_F_STABLE_PRIVACY flag
Hannes Frederic Sowa [Mon, 23 Mar 2015 22:36:02 +0000 (23:36 +0100)]
ipv6: introduce IFA_F_STABLE_PRIVACY flag

We need to mark appropriate addresses so we can do retries in case their
DAD failed.

Cc: Erik Kline <ek@google.com>
Cc: Fernando Gont <fgont@si6networks.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: YOSHIFUJI Hideaki/吉藤英明 <hideaki.yoshifuji@miraclelinux.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoipv6: generation of stable privacy addresses for link-local and autoconf
Hannes Frederic Sowa [Mon, 23 Mar 2015 22:36:01 +0000 (23:36 +0100)]
ipv6: generation of stable privacy addresses for link-local and autoconf

This patch implements the stable privacy address generation for
link-local and autoconf addresses as specified in RFC7217.

  RID = F(Prefix, Net_Iface, Network_ID, DAD_Counter, secret_key)

is the RID (random identifier). As the hash function F we chose one
round of sha1. Prefix will be either the link-local prefix or the
router advertised one. As Net_Iface we use the MAC address of the
device. DAD_Counter and secret_key are implemented as specified.

We don't use Network_ID, as it couples the code too closely to other
subsystems. It is specified as optional in the RFC.

As Net_Iface we only use the MAC address: we simply have no stable
identifier in the kernel we could possibly use: because this code might
run very early, we cannot depend on names, as they might be changed by
user space early on during the boot process.

A new address generation mode is introduced,
IN6_ADDR_GEN_MODE_STABLE_PRIVACY. With iproute2 one can switch back to
none or eui64 address configuration mode although the stable_secret is
already set.

We refuse writes to ipv6/conf/all/stable_secret but only allow
ipv6/conf/default/stable_secret and the interface specific file to be
written to. The default stable_secret is used as the parameter for the
namespace, the interface specific can overwrite the secret, e.g. when
switching a network configuration from one system to another while
inheriting the secret.

Cc: Erik Kline <ek@google.com>
Cc: Fernando Gont <fgont@si6networks.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: YOSHIFUJI Hideaki/吉藤英明 <hideaki.yoshifuji@miraclelinux.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoipv6: introduce secret_stable to ipv6_devconf
Hannes Frederic Sowa [Mon, 23 Mar 2015 22:36:00 +0000 (23:36 +0100)]
ipv6: introduce secret_stable to ipv6_devconf

This patch implements the procfs logic for the stable_address knob:
The secret is formatted as an ipv6 address and will be stored per
interface and per namespace. We track initialized flag and return EIO
errors until the secret is set.

We don't inherit the secret to newly created namespaces.

Cc: Erik Kline <ek@google.com>
Cc: Fernando Gont <fgont@si6networks.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: YOSHIFUJI Hideaki/吉藤英明 <hideaki.yoshifuji@miraclelinux.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agolib: EXPORT_SYMBOL sha_init
Hannes Frederic Sowa [Mon, 23 Mar 2015 22:35:59 +0000 (23:35 +0100)]
lib: EXPORT_SYMBOL sha_init

We need this symbol later on in ipv6.ko, thus export it via EXPORT_SYMBOL
like sha_transform already is.

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge branch 'bcmgenet-next'
David S. Miller [Tue, 24 Mar 2015 02:10:50 +0000 (22:10 -0400)]
Merge branch 'bcmgenet-next'

Florian Fainelli says:

====================
net: bcmgenet: integrated GPHY power up/down

This patch series implements integrated Gigabit PHY power up/down, which allows
us to save close to 300mW on some designs when the Gigabit PHY is known to be
unused (e.g: during bcmgenet_close or bcmgenet_suspend not doing Wake-on-LAN).

Changes in v2:

- drop an extra bcmgenet_ext_readl in bcmgenet_phy_power_set
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: bcmgenet: power down and up GPHY during suspend/resume
Florian Fainelli [Mon, 23 Mar 2015 22:09:57 +0000 (15:09 -0700)]
net: bcmgenet: power down and up GPHY during suspend/resume

In case the interface is not used, power down the integrated GPHY during
suspend. Similarly to bcmgenet_open(), bcmgenet_resume() powers on the GPHY
prior to any UniMAC activity.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: bcmgenet: power up and down integrated GPHY when unused
Florian Fainelli [Mon, 23 Mar 2015 22:09:56 +0000 (15:09 -0700)]
net: bcmgenet: power up and down integrated GPHY when unused

Power up the GPHY while we are bringing-up the network interface, and
conversely, upon bring down, power the GPHY down. In order to avoid
creating hardware hazards, make sure that the GPHY gets powered on
during bcmgenet_open() prior to the UniMAC being reset as the UniMAC may
start creating activity towards the GPHY if we reverse the steps.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: bcmgenet: implement GPHY power down sequence
Florian Fainelli [Mon, 23 Mar 2015 22:09:55 +0000 (15:09 -0700)]
net: bcmgenet: implement GPHY power down sequence

Implement the GPHY power down sequence by setting all power down bits, putting
the GPHY in reset, and finally cutting the 25Mhz reference clock.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: bcmgenet: fix GPHY power-up sequence
Florian Fainelli [Mon, 23 Mar 2015 22:09:54 +0000 (15:09 -0700)]
net: bcmgenet: fix GPHY power-up sequence

We were missing a number of extra steps and delays to power-up the GPHY, update
the sequence to reflect the proper procedure here.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: bcmgenet: rename bcmgenet_ephy_power_up
Florian Fainelli [Mon, 23 Mar 2015 22:09:53 +0000 (15:09 -0700)]
net: bcmgenet: rename bcmgenet_ephy_power_up

In preparation for implementing the power down GPHY sequence, rename
bcmgenet_ephy_power_up to illustrate that it is not EPHY specific but
PHY agnostic, and add an "enable" argument.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: bcmgenet: update bcmgenet_ephy_power_up to clear CK25_DIS bit
Florian Fainelli [Mon, 23 Mar 2015 22:09:52 +0000 (15:09 -0700)]
net: bcmgenet: update bcmgenet_ephy_power_up to clear CK25_DIS bit

The CK25_DIS bit controls whether a 25Mhz clock is fed to the GPHY or
not, in preparation for powering down the integrated GPHY when relevant,
make sure we clear that bit.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: bcmgenet: propagate errors from bcmgenet_power_down
Florian Fainelli [Mon, 23 Mar 2015 22:09:51 +0000 (15:09 -0700)]
net: bcmgenet: propagate errors from bcmgenet_power_down

If bcmgenet_power_down() fails, we would want to propagate a return
value from bcmgenet_wol_power_down_cfg() to know about this.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge branch 'rhashtable-next'
David S. Miller [Tue, 24 Mar 2015 02:07:56 +0000 (22:07 -0400)]
Merge branch 'rhashtable-next'

Herbert Xu says:

====================
rhashtable: Multiple rehashing

This series introduces multiple rehashing.

Recall that the original implementation in br_multicast used
two list pointers per hash node and therefore is limited to at
most one rehash at a time since you need one list pointer for
the old table and one for the new table.

Thanks to Josh Triplett's suggestion of using a single list pointer
we're no longer limited by that.  So it is perfectly OK to have
an arbitrary number of tables in existence at any one time.

The reader and removal simply has to walk from the oldest table
to the newest table in order not to miss anything.  Insertion
without lookup are just as easy as we simply go to the last table
that we can find and add the entry there.

However, insertion with uniqueness lookup is more complicated
because we need to ensure that two simultaneous insertions of the
same key do not both succeed.  To achieve this, all insertions
including those without lookups are required to obtain the bucket
lock from the oldest hash table that is still alive.  This is
determined by having the rehasher (there is only one rehashing
thread in the system) keep a pointer of where it is up to.  If
a bucket has already been rehashed then it is dead, i.e., there
cannot be any more insertions to it, otherwise it is considered
alive.  This guarantees that the same key cannot be inserted
in two different tables in parallel.

Patch 1 is actually a bug fix for the walker.

Patch 2-5 eliminates unnecessary out-of-line copies of jhash.

Patch 6 makes rhashtable_shrink shrink to fit.

Patch 7 introduces multiple rehashing.  This means that if we
decide to grow then we will grow regardless of whether the previous
one has finished.  However, this is still asynchronous meaning
that if insertions come fast enough we may still end up with a
table that is overutilised.

Patch 8 adds support for GFP_ATOMIC allocations of struct bucket_table.

Finally patch 9 enables immediate rehashing.  This is done either
when the table reaches 100% utilisation, or when the chain length
exceeds 16 (the latter can be disabled on request, e.g., for
nft_hash.

With these patches the system should no longer have any trouble
dealing with fast insertions on a small table.  In the worst
case you end up with a list of tables that's log N in length
while the rehasher catches up.

v3 restores rhashtable_shrink and fixes a number of bugs in the
multiple rehashing patches (7 and 9).
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agorhashtable: Add immediate rehash during insertion
Herbert Xu [Mon, 23 Mar 2015 13:50:28 +0000 (00:50 +1100)]
rhashtable: Add immediate rehash during insertion

This patch reintroduces immediate rehash during insertion.  If
we find during insertion that the table is full or the chain
length exceeds a set limit (currently 16 but may be disabled
with insecure_elasticity) then we will force an immediate rehash.
The rehash will contain an expansion if the table utilisation
exceeds 75%.

If this rehash fails then the insertion will fail.  Otherwise the
insertion will be reattempted in the new hash table.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agorhashtable: Allow GFP_ATOMIC bucket table allocation
Herbert Xu [Mon, 23 Mar 2015 13:50:27 +0000 (00:50 +1100)]
rhashtable: Allow GFP_ATOMIC bucket table allocation

This patch adds the ability to allocate bucket table with GFP_ATOMIC
instead of GFP_KERNEL.  This is needed when we perform an immediate
rehash during insertion.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>