SeongJae Park [Tue, 13 Sep 2022 17:44:28 +0000 (17:44 +0000)]
mm/damon/paddr: make supported DAMOS actions of paddr clear
Patch series "mm/damon: cleanup code".
DAMON code was not so clean from the beginning, but it has been too much
nowadays, especially due to the duplicates in DAMON_RECLAIM and
DAMON_LRU_SORT. This patchset cleans some of the mess.
This patch (of 22):
The 'switch-case' statement in 'damon_va_apply_scheme()' function provides
a 'case' for every supported DAMOS action while all not-yet-supported
DAMOS actions fall through the 'default' case, and comment it so that
people can easily know which actions are supported. Its counterpart in
'paddr', 'damon_pa_apply_scheme()', however, doesn't. This commit makes
the 'paddr' side function follows the pattern of 'vaddr' for better
readability and consistency.
mm/damon: simplify scheme create in damon_lru_sort_apply_parameters
In damon_lru_sort_apply_parameters(), we can use damon_set_schemes() to
replace the way of creating the first 'scheme' in original code, this
makes the code look cleaner.
zram_table_entry::flags stores object size in the lower bits and zram
pageflags in the upper bits. However, for some reason, we use 24 lower
bits, while maximum zram object size is PAGE_SIZE, which requires
PAGE_SHIFT bits (up to 16 on arm64). This wastes 24 - PAGE_SHIFT bits
that we can use for additional zram pageflags instead.
Also add a BUILD_BUG_ON() to alert us should we run out of bits in
zram_table_entry::flags.
Link: https://lkml.kernel.org/r/20220912152744.527438-1-senozhatsky@chromium.org Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Reviewed-by: Brian Geffon <bgeffon@google.com> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Dawei Li [Mon, 12 Sep 2022 14:39:03 +0000 (22:39 +0800)]
mm/damon: improve damon_new_region strategy
Kdamond is implemented as a periodical split-merge pattern, which will
create and destroy regions possibly at high frequency (hundreds or even
thousands of per sec), depending on the number of regions and aggregation
period. In that case, kmalloc and kfree could bring speed and space
overheads, which can be improved by using a private kmem cache.
A user who reads THP_ZERO_PAGE_ALLOC may be more concerned about the huge
zero pages that are really allocated for thp. It is misleading to
increase THP_ZERO_PAGE_ALLOC twice if two threads call get_huge_zero_page
concurrently. Don't increase the value if the huge page is not really
used.
Update Documentation/admin-guide/mm/transhuge.rst to suit.
Link: https://lkml.kernel.org/r/20220909021653.3371879-1-liushixin2@huawei.com Signed-off-by: Liu Shixin <liushixin2@huawei.com> Cc: Alexander Potapenko <glider@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cheng Li [Fri, 9 Sep 2022 07:31:09 +0000 (07:31 +0000)]
mm: use nth_page instead of mem_map_offset mem_map_next
To handle the discontiguous case, mem_map_next() has a parameter named
`offset`. As a function caller, one would be confused why "get next
entry" needs a parameter named "offset". The other drawback of
mem_map_next() is that the callers must take care of the map between
parameter "iter" and "offset", otherwise we may get an hole or duplication
during iteration. So we use nth_page instead of mem_map_next.
And replace mem_map_offset with nth_page() per Matthew's comments.
Link: https://lkml.kernel.org/r/1662708669-9395-1-git-send-email-lic121@chinatelecom.cn Signed-off-by: Cheng Li <lic121@chinatelecom.cn> Fixes: 69d177c2fc70 ("hugetlbfs: handle pages higher order than MAX_ORDER") Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
In lru_sort.c and reclaim.c, they are all defining get_monitoring_region()
function, there is no need to define it separately.
As 'get_monitoring_region()' is not a 'static' function anymore, we try to
use a prefix to distinguish with other functions, so there rename it to
'damon_find_biggest_system_ram'.
Link: https://lkml.kernel.org/r/20220909213606.136221-1-sj@kernel.org Signed-off-by: Xin Hao <xhao@linux.alibaba.com> Signed-off-by: SeongJae Park <sj@kernel.org> Suggested-by: SeongJae Park <sj@kernel.org> Reviewed-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Liu Shixin [Fri, 9 Sep 2022 08:31:40 +0000 (16:31 +0800)]
mm: kfence: convert to DEFINE_SEQ_ATTRIBUTE
Use DEFINE_SEQ_ATTRIBUTE helper macro to simplify the code.
Link: https://lkml.kernel.org/r/20220909083140.3592919-1-liushixin2@huawei.com Signed-off-by: Liu Shixin <liushixin2@huawei.com> Reviewed-by: Marco Elver <elver@google.com> Tested-by: Marco Elver <elver@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
zsmalloc: use correct types in _first_obj_offset functions
Since commit ffedd09fa9b0 ("zsmalloc: Stop using slab fields in struct
page") we are using page->page_type (unsigned int) field instead of
page->units (int) as first object offset in a subpage of zspage. So
get_first_obj_offset() and set_first_obj_offset() functions should work
with unsigned int type.
Link: https://lkml.kernel.org/r/20220909083722.85024-1-avromanov@sberdevices.ru Fixes: ffedd09fa9b0 ("zsmalloc: Stop using slab fields in struct page") Signed-off-by: Alexey Romanov <avromanov@sberdevices.ru> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Alexey Romanov <avromanov@sberdevices.ru> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Liu Shixin [Fri, 9 Sep 2022 08:39:47 +0000 (16:39 +0800)]
mm/shuffle: convert module_param_call to module_param_cb
module_param_call is now completely consistent with module_param_cb, so
there is no need to keep two macros. Convert module_param_call to
module_param_cb since former is obsolete and latter is more kernel-ish.
Link: https://lkml.kernel.org/r/20220909083947.3595610-1-liushixin2@huawei.com Signed-off-by: Liu Shixin <liushixin2@huawei.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Liu Shixin <liushixin2@huawei.com> Cc: Paul Russel <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
SeongJae Park [Fri, 9 Sep 2022 20:29:01 +0000 (20:29 +0000)]
Docs/admin-guide/mm/damon/usage: note DAMON debugfs interface deprecation plan
Commit b18402726bd1 ("Docs/admin-guide/mm/damon/usage: document DAMON
sysfs interface") announced the DAMON debugfs interface deprecation plan,
but it is not so aggressively announced. As the deprecation time is
coming, this commit makes the announce more easy to be found by adding the
note at the beginning of the DAMON debugfs interface usage document.
Link: https://lkml.kernel.org/r/20220909202901.57977-8-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Brendan Higgins <brendanhiggins@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Shuah Khan <shuah@kernel.org> Cc: Yun Levi <ppbuk5246@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
SeongJae Park [Fri, 9 Sep 2022 20:29:00 +0000 (20:29 +0000)]
Docs/admin-guide/mm/damon/start: mention the dependency as sysfs instead of debugfs
'Getting Started' document of DAMON says DAMON user-space tool, damo[1],
is using DAMON debugfs interface, and therefore it needs to ensure debugfs
is mounted. However, the latest version of the tool is using DAMON sysfs
interface. Moreover, DAMON debugfs interface is going to be deprecated as
announced by commit b18402726bd1 ("Docs/admin-guide/mm/damon/usage:
document DAMON sysfs interface").
This commit therefore update the document to tell readers about DAMON
sysfs interface dependency instead and never mention about debugfs
interface, which will be deprecated.
[1] https://github.com/awslabs/damo
Link: https://lkml.kernel.org/r/20220909202901.57977-7-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Brendan Higgins <brendanhiggins@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Shuah Khan <shuah@kernel.org> Cc: Yun Levi <ppbuk5246@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
SeongJae Park [Fri, 9 Sep 2022 20:28:59 +0000 (20:28 +0000)]
mm/damon/Kconfig: notify debugfs deprecation plan
Commit b18402726bd1 ("Docs/admin-guide/mm/damon/usage: document DAMON
sysfs interface") announced the DAMON debugfs interface deprecation plan,
but it is not so aggressively announced. As the deprecation time is
coming, this commit makes the announce more easy to be found by adding the
note to the config menu of DAMON debugfs interface.
Link: https://lkml.kernel.org/r/20220909202901.57977-6-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Brendan Higgins <brendanhiggins@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Shuah Khan <shuah@kernel.org> Cc: Yun Levi <ppbuk5246@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
SeongJae Park [Fri, 9 Sep 2022 20:28:58 +0000 (20:28 +0000)]
Docs/admin-guide/mm/damon: rename the title of the document
The title of the DAMON document for admin-guide, 'Monitoring Data
Accesses', could confuse readers in some ways. First of all, DAMON is not
the only single way for data access monitoring. And the document is for
not only the data access monitoring but also data access pattern based
memory management optimizations (DAMOS). This commit updates the title to
'DAMON: Data Access MONitor', which more explicitly explains what the
document describes.
Link: https://lkml.kernel.org/r/20220909202901.57977-5-sj@kernel.org Fixes: c4ba6014aec3 ("Documentation: add documents for DAMON") Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Brendan Higgins <brendanhiggins@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Shuah Khan <shuah@kernel.org> Cc: Yun Levi <ppbuk5246@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
SeongJae Park [Fri, 9 Sep 2022 20:28:57 +0000 (20:28 +0000)]
mm/damon/core-test: test damon_set_regions
Preceding commit fixes a bug in 'damon_set_regions()', which allows holes
in the new monitoring target ranges. This commit adds a kunit test case
for the problem to avoid any regression.
Link: https://lkml.kernel.org/r/20220909202901.57977-4-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Brendan Higgins <brendanhiggins@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Shuah Khan <shuah@kernel.org> Cc: Yun Levi <ppbuk5246@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
SeongJae Park [Fri, 9 Sep 2022 20:28:56 +0000 (20:28 +0000)]
mm/damon/core: avoid holes in newly set monitoring target ranges
When there are two or more non-contiguous regions intersecting with given
new ranges, 'damon_set_regions()' does not fill the holes. This commit
makes the function to fill the holes with newly created regions.
[sj@kernel.org: handle error from 'damon_fill_regions_holes()'] Link: https://lkml.kernel.org/r/20220913215420.57761-1-sj@kernel.org Link: https://lkml.kernel.org/r/20220909202901.57977-3-sj@kernel.org Fixes: 3f49584b262c ("mm/damon: implement primitives for the virtual memory address spaces") Signed-off-by: SeongJae Park <sj@kernel.org> Reported-by: Yun Levi <ppbuk5246@gmail.com> Cc: Brendan Higgins <brendanhiggins@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
SeongJae Park [Fri, 9 Sep 2022 20:28:55 +0000 (20:28 +0000)]
selftest/damon: add a test for duplicate context dirs creation
Patch series "mm/damon: minor fixes and cleanups".
This patchset contains minor fixes and cleanups for DAMON including
- selftest for a bug we found before (Patch 1),
- fix of region holes in vaddr corner case and a kunit test for it
(Patches 2 and 3), and
- documents/Kconfig updates for title wordsmithing (Patch 4) and more
aggressive DAMON debugfs interface deprecation announcement
(Patches 5-7).
This patch (of 7):
Commit d26f60703606 ("mm/damon/dbgfs: avoid duplicate context directory
creation") fixes a bug which could result in memory leak and DAMON
disablement. This commit adds a selftest for verifying the fix and avoid
regression.
Jeff Layton [Fri, 9 Sep 2022 13:00:31 +0000 (09:00 -0400)]
tmpfs: add support for an i_version counter
NFSv4 mandates a change attribute to avoid problems with timestamp
granularity, which Linux implements using the i_version counter. This is
particularly important when the underlying filesystem is fast.
Give tmpfs an i_version counter. Since it doesn't have to be persistent,
we can just turn on SB_I_VERSION and sprinkle some inode_inc_iversion
calls in the right places.
Also, while there is no formal spec for xattrs, most implementations
update the ctime on setxattr. Fix shmem_xattr_handler_set to update the
ctime and bump the i_version appropriately.
Link: https://lkml.kernel.org/r/20220909130031.15477-1-jlayton@kernel.org Signed-off-by: Jeff Layton <jlayton@kernel.org> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
mm/damon/vaddr: add a comment for 'default' case in damon_va_apply_scheme()
The switch case 'DAMOS_STAT' and switch case 'default' have same return
value in damon_va_apply_scheme(), and the 'default' case is for DAMOS
actions that not supported by 'vaddr'. It might make sense to add a
comment here.
Yajun Deng [Thu, 8 Sep 2022 19:14:43 +0000 (19:14 +0000)]
mm/damon: introduce struct damos_access_pattern
damon_new_scheme() has too many parameters, so introduce struct
damos_access_pattern to simplify it.
In additon, we can't use a bpf trace kprobe that has more than 5
parameters.
Link: https://lkml.kernel.org/r/20220908191443.129534-1-sj@kernel.org Signed-off-by: Yajun Deng <yajun.deng@linux.dev> Signed-off-by: SeongJae Park <sj@kernel.org> Reviewed-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The struct memcg_vmstats and struct memcg_vmstats_percpu contains two
arrays each for events of size NR_VM_EVENT_ITEMS which can be as large as
110. However the memcg v1 only uses 4 of those while memcg v2 uses 15.
The union of both is 17. On a 64 bit system, we are wasting approximately
((110 - 17) * 8 * 2) * (nr_cpus + 1) bytes which is significant on large
machines.
This patch reduces the size of the given structures by adding one
indirection and only stores array of events which are actually used by the
memcg code. With this patch, the size of memcg_vmstats has reduced from
2544 bytes to 1056 bytes while the size of memcg_vmstats_percpu has
reduced from 2568 bytes to 1080 bytes.
memcg: extract memcg_vmstats from struct mem_cgroup
Patch series "memcg: reduce memory overhead of memory cgroups".
Currently a lot of memory is wasted to maintain the vmevents for memory
cgroups as we have multiple arrays of size NR_VM_EVENT_ITEMS which can be
as large as 110. However memcg code uses small portion of those entries.
This patch series eliminate this overhead by removing the unneeded vmevent
entries from memory cgroup data structures.
This patch (of 3):
This is a preparatory patch to reduce the memory overhead of memory
cgroup. The struct memcg_vmstats is the largest object embedded into the
struct mem_cgroup. This patch extracts struct memcg_vmstats from struct
mem_cgroup to ease the following patches in reducing the size of struct
memcg_vmstats.
Kefeng Wang [Wed, 7 Sep 2022 08:26:43 +0000 (16:26 +0800)]
memblock tests: add new pageblock related macro
Add new pageblock_start_pfn() and pageblock_align() macro which are needed
by memblock tests.
Link: https://lkml.kernel.org/r/20220907082643.186979-1-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: David Hildenbrand <david@redhat.com> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Kefeng Wang [Wed, 7 Sep 2022 06:08:44 +0000 (14:08 +0800)]
mm: add pageblock_aligned() macro
Add pageblock_aligned() and use it to simplify code.
Link: https://lkml.kernel.org/r/20220907060844.126891-3-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Cc: David Hildenbrand <david@redhat.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Kefeng Wang [Wed, 7 Sep 2022 06:08:43 +0000 (14:08 +0800)]
mm: add pageblock_align() macro
Add pageblock_align() macro and use it to simplify code.
Link: https://lkml.kernel.org/r/20220907060844.126891-2-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Kefeng Wang [Wed, 7 Sep 2022 06:08:42 +0000 (14:08 +0800)]
mm: reuse pageblock_start/end_pfn() macro
Move pageblock_start_pfn/pageblock_end_pfn() into pageblock-flags.h, then
they could be used somewhere else, not only in compaction, also use
ALIGN_DOWN() instead of round_down() to be pair with ALIGN(), which should
be same for pageblock usage.
Link: https://lkml.kernel.org/r/20220907060844.126891-1-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Remove an expensive and unnecessary operation as PCP pages are safely
skipped when reading page owner.PCP pages can be skipped because
PAGE_EXT_OWNER_ALLOCATED is cleared.
With draining PCP pages, these pages are moved to buddy list so they can
be identified as buddy pages and skipped quickly. Although it improved
efficiency of PFN walker, the drain is guaranteed expensive that is
unlikely to be offset by a slight increase in efficiency when skipping
free pages.
PAGE_EXT_OWNER_ALLOCATED is cleared in the page owner reset path below:
free_unref_page
-> free_unref_page_prepare
-> free_pcp_prepare
-> free_pages_prepare which do page owner
reset
-> free_unref_page_commit which add pages into pcp list
mm/damon: simplify damon_ctx check in damon_sysfs_before_terminate
In damon_sysfs_before_terminate(), it needs to check whether ctx->ops.id
supports 'DAMON_OPS_VADDR' or 'DAMON_OPS_FVADDR', there we can use
damon_target_has_pid() instead.
mm/damon/core: iterate the regions list from current point in damon_set_regions()
We iterate the whole regions list every time to get the first/last regions
intersecting with the specific range in damon_set_regions(), in order to
add new region or resize existing regions to fit in the specific range.
Actually, it is unnecessary to iterate the new added regions and the front
regions that have been checked. Just iterate the regions list from the
current point using list_for_each_entry_from() every time to improve
performance.
Mika Penttilä [Fri, 26 Aug 2022 05:06:31 +0000 (08:06 +0300)]
mm/hmm/test: use char dev with struct device to get device node
HMM selftests use an in-kernel pseudo device to emulate device memory.
The pseudo device registers a major device range for two or four pseudo
device instances. User space has a script that reads /proc/devices in
order to find the assigned major number, and sends that to mknod(1), once
for each node.
Change this to properly use cdev and struct device APIs.
Delete the /proc/devices parsing from the user-space test script, now that
it is unnecessary.
Also, delete an unused field in struct dmirror_device: devmem.
Link: https://lkml.kernel.org/r/20220826050631.25771-1-mpenttil@redhat.com Signed-off-by: Mika Penttilä <mpenttil@redhat.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
kasan: better identify bug types for tag-based modes
Identify the bug type for the tag-based modes based on the stack trace
entries found in the stack ring.
If a free entry is found first (meaning that it was added last), mark the
bug as use-after-free. If an alloc entry is found first, mark the bug as
slab-out-of-bounds. Otherwise, assign the common bug type.
This change returns the functionalify of the previously dropped
CONFIG_KASAN_TAGS_IDENTIFY.
Link: https://lkml.kernel.org/r/13ce7fa07d9d995caedd1439dfae4d51401842f2.1662411800.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Marco Elver <elver@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Peter Collingbourne <pcc@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Instead of using a large static array, allocate the stack ring dynamically
via memblock_alloc().
The size of the stack ring is controlled by a new kasan.stack_ring_size
command-line parameter. When kasan.stack_ring_size is not provided, the
default value of 32 << 10 is used.
When the stack trace collection is disabled via kasan.stacktrace=off, the
stack ring is not allocated.
Link: https://lkml.kernel.org/r/03b82ab60db53427e9818e0b0c1971baa10c3cbc.1662411800.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Acked-by: Marco Elver <elver@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Peter Collingbourne <pcc@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Add support for the kasan.stacktrace command-line argument for Software
Tag-Based KASAN.
The following patch adds a command-line argument for selecting the stack
ring size, and, as the stack ring is supported by both the Software and
the Hardware Tag-Based KASAN modes, it is natural that both of them have
support for kasan.stacktrace too.
Link: https://lkml.kernel.org/r/3b43059103faa7f8796017847b7d674b658f11b5.1662411799.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Marco Elver <elver@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Peter Collingbourne <pcc@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Implement storing stack depot handles for alloc/free stack traces for slab
objects for the tag-based KASAN modes in a ring buffer.
This ring buffer is referred to as the stack ring.
On each alloc/free of a slab object, the tagged address of the object and
the current stack trace are recorded in the stack ring.
On each bug report, if the accessed address belongs to a slab object, the
stack ring is scanned for matching entries. The newest entries are used
to print the alloc/free stack traces in the report: one entry for alloc
and one for free.
The number of entries in the stack ring is fixed in this patch, but one of
the following patches adds a command-line argument to control it.
Add bug_type and alloc/free_track fields to kasan_report_info and add a
kasan_complete_mode_report_info() function that fills in these fields.
This function is implemented differently for different KASAN mode.
Change the reporting code to use the filled in fields instead of invoking
kasan_get_bug_type() and kasan_get_alloc/free_track().
For the Generic mode, kasan_complete_mode_report_info() invokes these
functions instead. For the tag-based modes, only the bug_type field is
filled in; alloc/free_track are handled in the next patch.
Using a single function that fills in these fields is required for the
tag-based modes, as the values for all three fields are determined in a
single procedure implemented in the following patch.
Link: https://lkml.kernel.org/r/8432b861054fa8d0cee79a8877dedeaf3b677ca8.1662411799.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Marco Elver <elver@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Peter Collingbourne <pcc@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
kasan: fill in cache and object in complete_report_info
Add cache and object fields to kasan_report_info and fill them in in
complete_report_info() instead of fetching them in the middle of the
report printing code.
This allows the reporting code to get access to the object information
before starting printing the report. One of the following patches uses
this information to determine the bug type with the tag-based modes.
Link: https://lkml.kernel.org/r/23264572cb2cbb8f0efbb51509b6757eb3cc1fc9.1662411799.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Marco Elver <elver@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Peter Collingbourne <pcc@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
kasan: only define kasan_cache_create for Generic mode
Right now, kasan_cache_create() assigns SLAB_KASAN for all KASAN modes and
then sets up metadata-related cache parameters for the Generic mode.
SLAB_KASAN is used in two places:
1. In slab_ksize() to account for per-object metadata when
calculating the size of the accessible memory within the object.
2. In slab_common.c via kasan_never_merge() to prevent merging of
caches with per-object metadata.
Both cases are only relevant when per-object metadata is present, which is
only the case with the Generic mode.
Thus, assign SLAB_KASAN and define kasan_cache_create() only for the
Generic mode.
Also update the SLAB_KASAN-related comment.
Link: https://lkml.kernel.org/r/61faa2aa1906e2d02c97d00ddf99ce8911dda095.1662411799.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Marco Elver <elver@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Peter Collingbourne <pcc@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
kasan: only define metadata structs for Generic mode
Hide the definitions of kasan_alloc_meta and kasan_free_meta under an
ifdef CONFIG_KASAN_GENERIC check, as these structures are now only used
when the Generic mode is enabled.
Link: https://lkml.kernel.org/r/8d2aabff8c227c444a3f62edf87d5630beb77640.1662411799.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Marco Elver <elver@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Peter Collingbourne <pcc@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
kasan: only define metadata offsets for Generic mode
Hide the definitions of alloc_meta_offset and free_meta_offset under an
ifdef CONFIG_KASAN_GENERIC check, as these fields are now only used when
the Generic mode is enabled.
Link: https://lkml.kernel.org/r/d4bafa0534facafd1a23c465a94261e64f366493.1662411799.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Marco Elver <elver@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Peter Collingbourne <pcc@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Add a kasan_requires_meta() helper that indicates whether the enabled
KASAN mode requires per-object metadata and use this helper in the common
code.
Also hide kasan_init_object_meta() under CONFIG_KASAN_GENERIC ifdef check,
as Generic is the only mode that uses per-object metadata.
To allow for a potential future change that makes Generic KASAN support
the kasan.stacktrace command-line parameter, let kasan_requires_meta()
return kasan_stack_collection_enabled() instead of simply returning true.
Link: https://lkml.kernel.org/r/cf837e9996246aaaeebf704ccf8ec26a34fcf64f.1662411799.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Marco Elver <elver@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Peter Collingbourne <pcc@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Add a kasan_init_object_meta() helper that initializes metadata for a slab
object and use it in the common code.
For now, the implementations of this helper are the same for the Generic
and tag-based modes, but they will diverge later in the series.
This change hides references to alloc_meta from the common code. This is
desired as only the Generic mode will be using per-object metadata after
this series.
Link: https://lkml.kernel.org/r/47c12938fc7f8105e7aaa592527c0e9d3c81fc37.1662411799.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Marco Elver <elver@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Peter Collingbourne <pcc@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Add a kasan_get_alloc_track() helper that fetches alloc_track for a slab
object and use this helper in the common reporting code.
For now, the implementations of this helper are the same for the Generic
and tag-based modes, but they will diverge later in the series.
This change hides references to alloc_meta from the common reporting code.
This is desired as only the Generic mode will be using per-object
metadata after this series.
Link: https://lkml.kernel.org/r/0c365a35f4a833fff46f9d42c3212b32f7166556.1662411799.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Marco Elver <elver@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Peter Collingbourne <pcc@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Add a kasan_print_aux_stacks() helper that prints the auxiliary stack
traces for the Generic mode.
This change hides references to alloc_meta from the common reporting code.
This is desired as only the Generic mode will be using per-object
metadata after this series.
Link: https://lkml.kernel.org/r/67c7a9ea6615533762b1f8ccc267cd7f9bafb749.1662411799.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Marco Elver <elver@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Peter Collingbourne <pcc@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
kasan: check KASAN_NO_FREE_META in __kasan_metadata_size
Patch series "kasan: switch tag-based modes to stack ring from per-object
metadata", v3.
This series makes the tag-based KASAN modes use a ring buffer for storing
stack depot handles for alloc/free stack traces for slab objects instead
of per-object metadata. This ring buffer is referred to as the stack
ring.
On each alloc/free of a slab object, the tagged address of the object and
the current stack trace are recorded in the stack ring.
On each bug report, if the accessed address belongs to a slab object, the
stack ring is scanned for matching entries. The newest entries are used
to print the alloc/free stack traces in the report: one entry for alloc
and one for free.
The advantages of this approach over storing stack trace handles in
per-object metadata with the tag-based KASAN modes:
- Allows to find relevant stack traces for use-after-free bugs without
using quarantine for freed memory. (Currently, if the object was
reallocated multiple times, the report contains the latest alloc/free
stack traces, not necessarily the ones relevant to the buggy allocation.)
- Allows to better identify and mark use-after-free bugs, effectively
making the CONFIG_KASAN_TAGS_IDENTIFY functionality always-on.
- Has fixed memory overhead.
The disadvantage:
- If the affected object was allocated/freed long before the bug happened
and the stack trace events were purged from the stack ring, the report
will have no stack traces.
Discussion
==========
The proposed implementation of the stack ring uses a single ring buffer
for the whole kernel. This might lead to contention due to atomic
accesses to the ring buffer index on multicore systems.
At this point, it is unknown whether the performance impact from this
contention would be significant compared to the slowdown introduced by
collecting stack traces due to the planned changes to the latter part, see
the section below.
For now, the proposed implementation is deemed to be good enough, but this
might need to be revisited once the stack collection becomes faster.
A considered alternative is to keep a separate ring buffer for each CPU
and then iterate over all of them when printing a bug report. This
approach requires somehow figuring out which of the stack rings has the
freshest stack traces for an object if multiple stack rings have them.
Further plans
=============
This series is a part of an effort to make KASAN stack trace collection
suitable for production. This requires stack trace collection to be fast
and memory-bounded.
The planned steps are:
1. Speed up stack trace collection (potentially, by using SCS;
patches on-hold until steps #2 and #3 are completed).
2. Keep stack trace handles in the stack ring (this series).
3. Add a memory-bounded mode to stack depot or provide an alternative
memory-bounded stack storage.
4. Potentially, implement stack trace collection sampling to minimize
the performance impact.
This patch (of 34):
__kasan_metadata_size() calculates the size of the redzone for objects in
a slab cache.
When accounting for presence of kasan_free_meta in the redzone, this
function only compares free_meta_offset with 0. But free_meta_offset
could also be equal to KASAN_NO_FREE_META, which indicates that
kasan_free_meta is not present at all.
Add a comparison with KASAN_NO_FREE_META into __kasan_metadata_size().
Left-shifting past the size of your datatype is undefined behaviour in C.
The literal 34 gets the type `int`, and that one is not big enough to be
left shifted by 26 bits.
An `unsigned` is long enough (on any machine that has at least 32 bits for
their ints.)
For uniformity, we mark all the literals as unsigned. But it's only
really needed for HUGETLB_FLAG_ENCODE_16GB.
Thanks to Randy Dunlap for an initial review and suggestion.
Link: https://lkml.kernel.org/r/20220905031904.150925-1-matthias.goergens@gmail.com Signed-off-by: Matthias Goergens <matthias.goergens@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
mm/damon/sysfs: simplify the judgement whether kdamonds are busy
It is unnecessary to get the number of the running kdamond to judge
whether kdamonds are busy. Here we can use the
damon_sysfs_kdamond_running() helper and return -EBUSY directly when
finding a running kdamond. Meanwhile, merging with the judgement that a
kdamond has current sysfs command callback request to make the code more
clear.
mm: convert page_get_anon_vma() to folio_get_anon_vma()
With all callers now passing in a folio, rename the function and convert
all callers. Removes a couple of calls to compound_head() and a reference
to page->mapping.