epoll: use rwlock in order to reduce ep_poll_callback() contention
The goal of this patch is to reduce contention of ep_poll_callback()
which can be called concurrently from different CPUs in case of high
events rates and many fds per epoll. Problem can be very well
reproduced by generating events (write to pipe or eventfd) from many
threads, while consumer thread does polling. In other words this patch
increases the bandwidth of events which can be delivered from sources to
the poller by adding poll items in a lockless way to the list.
The main change is in replacement of the spinlock with a rwlock, which
is taken on read in ep_poll_callback(), and then by adding poll items to
the tail of the list using xchg atomic instruction. Write lock is taken
everywhere else in order to stop list modifications and guarantee that
list updates are fully completed (I assume that write side of a rwlock
does not starve, it seems qrwlock implementation has these guarantees).
The following are some microbenchmark results based on the test [1]
which starts threads which generate N events each. The test ends when
all events are successfully fetched by the poller thread:
spinlock
========
threads events/ms run-time ms
8 6402 12495
16 7045 22709
32 7395 43268
rwlock + xchg
=============
threads events/ms run-time ms
8 10038 7969
16 12178 13138
32 13223 24199
According to the results bandwidth of delivered events is significantly
increased, thus execution time is reduced.
This patch was tested with different sort of microbenchmarks and
artificial delays (e.g. "udelay(get_random_int() & 0xff)") introduced
in kernel on paths where items are added to lists.
[1] https://github.com/rouming/test-tools/blob/master/stress-epoll.c
Link: http://lkml.kernel.org/r/20190103150104.17128-5-rpenyaev@suse.de
Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>