mm: page_counter: re-layout structure to reduce false sharing
author     Feng Tang <feng.tang@intel.com>
           Wed, 24 Feb 2021 20:04:01 +0000 (12:04 -0800)
committer  Linus Torvalds <torvalds@linux-foundation.org>
           Wed, 24 Feb 2021 21:38:29 +0000 (13:38 -0800)
commit     c9372f335a281e6a4111db8174d811c208128978
tree       bf95df2e8b53d139ed521602b6dba9327594d699
parent     8cd2642922942e6a88d9d80dfde1b73f1d17fe4c

While investigating a memory cgroup related performance regression [1],
perf c2c profiling data showed heavy false sharing on accesses to 'usage'
and 'parent'.

On 64-bit systems, 'usage' and 'parent' sit close to each other and can
easily land in the same cache line (for cache line sizes of 64 bytes or
more).  'usage' is written frequently, while 'parent' is mostly read,
because cgroup charging is hierarchical and walks up through 'parent'.

So move 'parent' to the end of the structure, so that it and 'usage'
fall in different cache lines.

Below is some performance data for the patch, measured against
v5.11-rc1.  [In the data, A is a 2-socket platform with 48 cores/96
threads, B is a 4-socket platform with 72 cores/144 threads, %stddev is
shown only when it exceeds 2%, and P100/P50 means the number of test
tasks equals 100%/50% of nr_cpu.]

will-it-scale/malloc1
---------------------
        v5.11-rc1        %change  v5.11-rc1+patch

A-P100      15782 ±  2%    -0.1%      15765 ±  3%  will-it-scale.per_process_ops
A-P50       21511          +8.9%      23432        will-it-scale.per_process_ops
B-P100       9155          +2.2%       9357        will-it-scale.per_process_ops
B-P50       10967          +7.1%      11751 ±  2%  will-it-scale.per_process_ops

will-it-scale/pagefault2
------------------------
        v5.11-rc1        %change  v5.11-rc1+patch

A-P100      79028          +3.0%      81411        will-it-scale.per_process_ops
A-P50      183960 ±  2%    +4.4%     192078 ±  2%  will-it-scale.per_process_ops
B-P100      85966          +9.9%      94467 ±  3%  will-it-scale.per_process_ops
B-P50      198195          +9.8%     217526        will-it-scale.per_process_ops

fio (4k/1M is the block size)
-----------------------------
            v5.11-rc1        %change  v5.11-rc1+patch

A-P50-r-4k      16881 ±  2%    +1.2%      17081 ±  2%  fio.read_bw_MBps
A-P50-w-4k       3931          +4.5%       4111 ±  2%  fio.write_bw_MBps
A-P50-r-1M      15178          -0.2%      15154        fio.read_bw_MBps
A-P50-w-1M       3924          +0.1%       3929        fio.write_bw_MBps

[1] https://lore.kernel.org/lkml/20201102091543.GM31092@shao2-debian/

Link: https://lkml.kernel.org/r/1611040814-33449-1-git-send-email-feng.tang@intel.com
Signed-off-by: Feng Tang <feng.tang@intel.com>
Reviewed-by: Roman Gushchin <guro@fb.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
include/linux/page_counter.h