git.baikalelectronics.ru Git - kernel.git/commit

mm,page_alloc: PF_WQ_WORKER threads must sleep at should_reclaim_retry()

Tetsuo Handa has reported that it is possible to bypass the short sleep
for PF_WQ_WORKER threads which was introduced by commit 0a19bf3c9a041189
("mm, vmstat: allow WQ concurrency to discover memory reclaim doesn't make
any progress") and lock up the system if OOM.

The primary reason is that WQ_MEM_RECLAIM WQs are not guaranteed to run
even when they have a rescuer available.  Those workers might be essential
for reclaim to make a forward progress, however.  If we are too unlucky
all the allocations requests can get stuck waiting for a WQ_MEM_RECLAIM
work item and the system is essentially stuck in an OOM condition without
much hope to move on.  Tetsuo has seen the reclaim stuck on
drain_local_pages_wq or xlog_cil_push_work (xfs).  There might be others.

Since should_reclaim_retry() should be a natural reschedule point,
let's do the short sleep for PF_WQ_WORKER threads unconditionally in
order to guarantee that other pending work items are started.  This
will workaround this problem and it is less fragile than hunting down
when the sleep is missed.  Having a single sleeping point is more
robust.

[akpm@linux-foundation.org: reflow comment to 80 cols to save a couple of lines]
Link: http://lkml.kernel.org/r/20180827135101.15700-1-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Debugged-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

author	Michal Hocko <mhocko@suse.com>
	Fri, 26 Oct 2018 22:03:31 +0000 (15:03 -0700)
committer	Linus Torvalds <torvalds@linux-foundation.org>
	Fri, 26 Oct 2018 23:25:19 +0000 (16:25 -0700)
commit	5344a52861738c283fe98213800d9073270706be
tree	b0d42d4904ed7a8309eb357b0bf0441cccc59993	tree \| snapshot
parent	325b4c8ffa8f7d908c2d1d0daec6a593725d866a	commit \| diff