git.baikalelectronics.ru Git - kernel.git/commit

hugetlb: derive huge pages nodes allowed from task mempolicy

This patch derives a "nodes_allowed" node mask from the numa mempolicy of
the task modifying the number of persistent huge pages to control the
allocation, freeing and adjusting of surplus huge pages when the pool page
count is modified via the new sysctl or sysfs attribute
"nr_hugepages_mempolicy".  The nodes_allowed mask is derived as follows:

* For "default" [NULL] task mempolicy, a NULL nodemask_t pointer
  is produced.  This will cause the hugetlb subsystem to use
  node_online_map as the "nodes_allowed".  This preserves the
  behavior before this patch.
* For "preferred" mempolicy, including explicit local allocation,
  a nodemask with the single preferred node will be produced.
  "local" policy will NOT track any internode migrations of the
  task adjusting nr_hugepages.
* For "bind" and "interleave" policy, the mempolicy's nodemask
  will be used.
* Other than to inform the construction of the nodes_allowed node
  mask, the actual mempolicy mode is ignored.  That is, all modes
  behave like interleave over the resulting nodes_allowed mask
  with no "fallback".

See the updated documentation [next patch] for more information
about the implications of this patch.

Examples:

Starting with:

Node 0 HugePages_Total:     0
Node 1 HugePages_Total:     0
Node 2 HugePages_Total:     0
Node 3 HugePages_Total:     0

Default behavior [with or without this patch] balances persistent
hugepage allocation across nodes [with sufficient contiguous memory]:

sysctl vm.nr_hugepages[_mempolicy]=32

yields:

Node 0 HugePages_Total:     8
Node 1 HugePages_Total:     8
Node 2 HugePages_Total:     8
Node 3 HugePages_Total:     8

Of course, we only have nr_hugepages_mempolicy with the patch,
but with default mempolicy, nr_hugepages_mempolicy behaves the
same as nr_hugepages.

Applying mempolicy--e.g., with numactl [using '-m' a.k.a.
'--membind' because it allows multiple nodes to be specified
and it's easy to type]--we can allocate huge pages on
individual nodes or sets of nodes.  So, starting from the
condition above, with 8 huge pages per node, add 8 more to
node 2 using:

numactl -m 2 sysctl vm.nr_hugepages_mempolicy=40

This yields:

Node 0 HugePages_Total:     8
Node 1 HugePages_Total:     8
Node 2 HugePages_Total:    16
Node 3 HugePages_Total:     8

The incremental 8 huge pages were restricted to node 2 by the
specified mempolicy.

Similarly, we can use mempolicy to free persistent huge pages
from specified nodes:

numactl -m 0,1 sysctl vm.nr_hugepages_mempolicy=32

yields:

Node 0 HugePages_Total:     4
Node 1 HugePages_Total:     4
Node 2 HugePages_Total:    16
Node 3 HugePages_Total:     8

The 8 huge pages freed were balanced over nodes 0 and 1.

[rientjes@google.com: accomodate reworked NODEMASK_ALLOC]
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Andi Kleen <andi@firstfloor.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Nishanth Aravamudan <nacc@us.ibm.com>
Cc: Adam Litke <agl@us.ibm.com>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: Eric Whitney <eric.whitney@hp.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

author	Lee Schermerhorn <lee.schermerhorn@hp.com>
	Tue, 15 Dec 2009 01:58:21 +0000 (17:58 -0800)
committer	Linus Torvalds <torvalds@linux-foundation.org>
	Tue, 15 Dec 2009 16:53:12 +0000 (08:53 -0800)
commit	86b444b30f9b2cfbd37fb71db4f77b22ff575730
tree	8f7b52a4af1532ed414631f68b99a059e299d83f	tree \| snapshot
parent	d1aa5aa4cca6643fea7c1b84f6bdb3100dd9fdc6	commit \| diff

include/linux/hugetlb.h		diff \| blob \| history
include/linux/mempolicy.h		diff \| blob \| history
kernel/sysctl.c		diff \| blob \| history
mm/hugetlb.c		diff \| blob \| history
mm/mempolicy.c		diff \| blob \| history