mm/khugepaged: don't recycle vma pgtable if uffd-wp registered
author: Peter Xu <peterx@redhat.com>
Fri, 13 May 2022 03:22:55 +0000 (20:22 -0700)
committer: Andrew Morton <akpm@linux-foundation.org>
Fri, 13 May 2022 14:20:11 +0000 (07:20 -0700)
When collapsing a 2M huge shmem page, don't retract the pgtable pmd page
if the vma is registered with uffd-wp, because that pgtable could have
pte markers installed.  Recycling that pgtable means we'll lose the pte
markers, which could cause data loss for a uffd-wp enabled application on
shmem.

Instead of disabling khugepaged on these files, simply skip retracting
these special VMAs.  The page cache can then still be merged into a thp,
and other mm/vma can still map that range of the file with a thp when
appropriate.
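
As a rough illustration of the skip, here is a toy userspace model in C.
It is not kernel code: struct toy_mapping and toy_retract_all() are
hypothetical names made up for this sketch, and "retracting" is reduced to
setting a flag.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one VMA that maps the collapsed file range. */
struct toy_mapping {
    bool uffd_wp;           /* models a vma registered with uffd-wp */
    bool pgtable_retracted; /* set once its pte page table is given up */
};

/*
 * Retract the pte page table of every mapping except the uffd-wp ones.
 * Returns how many mappings were retracted.
 */
static size_t toy_retract_all(struct toy_mapping *maps, size_t n)
{
    size_t retracted = 0;

    for (size_t i = 0; i < n; i++) {
        if (maps[i].uffd_wp)
            continue; /* keep the pgtable: it may hold pte markers */
        maps[i].pgtable_retracted = true;
        retracted++;
    }
    return retracted;
}
```

With three mappings of which only the middle one is uffd-wp registered,
the model retracts two and leaves the middle one on small pages.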

Note that checking VM_UFFD_WP needs to be done with mmap_sem held for
write, which avoids a race like:

         khugepaged                             user thread
         ==========                             ===========
     check VM_UFFD_WP, not set
                                       UFFDIO_REGISTER with uffd-wp on shmem
                                       wr-protect some pages (install markers)
     take mmap_sem write lock
     erase pmd and free pmd page
      --> pte markers are dropped unnoticed!
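
The ordering problem above can be mimicked with a single-threaded toy
model in C that replays the two interleavings by hand.  This is only a
sketch under loud assumptions: every toy_* name is hypothetical, the lock
is a plain bool rather than a real rwsem, and the "user thread" is a
function call placed where the race window would be.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model; none of these names are kernel APIs. */
struct toy_vma {
    bool uffd_wp;      /* stands in for VM_UFFD_WP */
    int  pte_markers;  /* markers stored in the pte page table */
    bool write_locked; /* stands in for mmap_sem held for write */
};

/* User thread: register uffd-wp, then wr-protect npages (install markers). */
static void toy_register_and_wp(struct toy_vma *vma, int npages)
{
    assert(!vma->write_locked); /* cannot run while the collapser holds the lock */
    vma->uffd_wp = true;
    vma->pte_markers += npages;
}

/* Free the pte page table; returns the number of markers that were lost. */
static int toy_retract(struct toy_vma *vma)
{
    assert(vma->write_locked);
    int lost = vma->pte_markers;
    vma->pte_markers = 0;
    return lost;
}

/* Racy order: check the flag first, take the lock afterwards. */
static int toy_collapse_racy(struct toy_vma *vma)
{
    bool skip = vma->uffd_wp;    /* checked without the lock */
    toy_register_and_wp(vma, 3); /* user thread slips into the window */
    vma->write_locked = true;
    int lost = skip ? 0 : toy_retract(vma);
    vma->write_locked = false;
    return lost;
}

/* Fixed order: take the lock first, check the flag while holding it. */
static int toy_collapse_safe(struct toy_vma *vma)
{
    toy_register_and_wp(vma, 3); /* user thread finished before we locked */
    vma->write_locked = true;
    int lost = vma->uffd_wp ? 0 : toy_retract(vma);
    vma->write_locked = false;
    return lost;
}
```

Starting from a zeroed struct toy_vma, the racy variant reports three lost
markers while the locked check reports none and leaves all three in place.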

Link: https://lkml.kernel.org/r/20220405014921.14994-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
mm/khugepaged.c

index a2560f97088148ec0e577c10b0d03e1e0c23ee12..2243ed095f023b78a07bfe5612311039ce189b35 100644
@@ -1456,6 +1456,10 @@ void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr)
        if (!hugepage_vma_check(vma, vma->vm_flags | VM_HUGEPAGE))
                return;
 
+       /* Keep pmd pgtable for uffd-wp; see comment in retract_page_tables() */
+       if (userfaultfd_wp(vma))
+               return;
+
        hpage = find_lock_page(vma->vm_file->f_mapping,
                               linear_page_index(vma, haddr));
        if (!hpage)
@@ -1591,7 +1595,15 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
                 * reverse order. Trylock is a way to avoid deadlock.
                 */
                if (mmap_write_trylock(mm)) {
-                       if (!khugepaged_test_exit(mm))
+                       /*
+                        * When a vma is registered with uffd-wp, we can't
+                        * recycle the pmd pgtable because there can be pte
+                        * markers installed.  Skip it only, so the rest mm/vma
+                        * can still have the same file mapped hugely, however
+                        * it'll always mapped in small page size for uffd-wp
+                        * registered ranges.
+                        */
+                       if (!khugepaged_test_exit(mm) && !userfaultfd_wp(vma))
                                collapse_and_free_pmd(mm, vma, addr, pmd);
                        mmap_write_unlock(mm);
                } else {