mm: do page fault accounting in handle_mm_fault
Author:     Peter Xu <peterx@redhat.com>
AuthorDate: Wed, 12 Aug 2020 01:37:44 +0000 (18:37 -0700)
Commit:     Linus Torvalds <torvalds@linux-foundation.org>
CommitDate: Wed, 12 Aug 2020 17:58:02 +0000 (10:58 -0700)
Patch series "mm: Page fault accounting cleanups", v5.

This is v5 of the page fault accounting cleanup series.  It originates
from Gerald Schaefer's report a week ago of incorrect page fault
accounting for retried page faults after commit 85fa6c186149 ("mm: allow
VM_FAULT_RETRY for multiple times"):

  https://lore.kernel.org/lkml/20200610174811.44b94525@thinkpad/

What this series did:

  - Correct page fault accounting: we account a page fault (no matter
    whether it's from #PF handling, gup, or anything else) only on the
    attempt that completed the fault.  For example, page fault retries
    should not be counted in the page fault counters.  The same applies
    to the perf events (see the sketch after this list).

  - Unify the definition of PERF_COUNT_SW_PAGE_FAULTS: currently this perf
    event is used in an ad-hoc way across different archs.

    Case (1): for many archs it's done at the entry of the page fault
    handler, so it also covers e.g.  erroneous faults.

    Case (2): for some other archs, it is only accounted when the page
    fault is resolved successfully.

    Case (3): there are still quite a few archs that have not enabled
    this perf event at all.

    Since this series touches nearly all the archs anyway, we unify this
    perf event to always follow case (1), which is the one that makes the
    most sense.  And since we move the accounting into handle_mm_fault(),
    the other two MAJ/MIN perf events are naturally taken care of as well.

  - Unify definition of "major faults": the definition of "major
    fault" is slightly changed when used in accounting (not
    VM_FAULT_MAJOR).  More information in patch 1.

  - Always account the page fault to the task that triggered it.  This
    does not matter much for #PF handling, but it does for gup.  More
    information on this in patch 25.

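For illustration, the sketch below shows the shape an arch #PF handler
converges on once the whole series lands.  It is a hedged sketch, not any
arch's real handler: example_do_page_fault() is a made-up name, and the
mmap locking and error paths are omitted; only handle_mm_fault(),
find_vma() and perf_sw_event() are real APIs.

  /* Illustrative only -- see the caveats above. */
  static void example_do_page_fault(struct pt_regs *regs,
                                    unsigned long address,
                                    unsigned int flags)
  {
          struct vm_area_struct *vma;
          vm_fault_t fault;

          /* Case (1): counted once at entry, so erroneous faults are included. */
          perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);

          vma = find_vma(current->mm, address);   /* error handling omitted */
  retry:
          /*
           * Passing the real regs (done per arch in patches 2-23) lets the
           * core code account maj_flt/min_flt and the MAJ/MIN perf events;
           * a VM_FAULT_RETRY attempt is not counted until the retried fault
           * finally completes.
           */
          fault = handle_mm_fault(vma, address, flags, regs);
          if (fault & VM_FAULT_RETRY) {
                  flags |= FAULT_FLAG_TRIED;
                  goto retry;
          }
  }
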
Patchset layout:

Patch 1:     Introduce the accounting in handle_mm_fault(), not yet enabled.
Patch 2-23:  Enable the new accounting for the arch #PF handlers one by one.
Patch 24:    Enable the new accounting for the remaining outliers (gup, iommu, etc.)
Patch 25:    Clean up the GUP task_struct pointer since it is no longer needed.

This patch (of 25):

This is a preparation patch to move page fault accounting into the
generic code in handle_mm_fault().  This includes both the per-task
maj_flt/min_flt counters and the major/minor page fault perf events.  To
do this, the pt_regs pointer is passed into handle_mm_fault().

PERF_COUNT_SW_PAGE_FAULTS should still be kept in per-arch page fault
handlers.

So far, every pt_regs pointer passed into handle_mm_fault() is NULL,
which means this patch should have no intended functional change.
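
Concretely, the call-site change in this patch is mechanical.  A sketch
(where "regs" stands for whatever pt_regs pointer an arch handler has at
hand):

  /* Before this patch: */
  fault = handle_mm_fault(vma, address, flags);

  /*
   * After this patch: every caller passes NULL for now, so the new
   * mm_account_fault() helper returns early and nothing changes.
   */
  fault = handle_mm_fault(vma, address, flags, NULL);

  /* Later patches in the series pass the arch's real pt_regs instead. */
  fault = handle_mm_fault(vma, address, flags, regs);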

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vincent Chen <deanbo422@gmail.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/20200707225021.200906-1-peterx@redhat.com
Link: http://lkml.kernel.org/r/20200707225021.200906-2-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
31 files changed:
arch/alpha/mm/fault.c
arch/arc/mm/fault.c
arch/arm/mm/fault.c
arch/arm64/mm/fault.c
arch/csky/mm/fault.c
arch/hexagon/mm/vm_fault.c
arch/ia64/mm/fault.c
arch/m68k/mm/fault.c
arch/microblaze/mm/fault.c
arch/mips/mm/fault.c
arch/nds32/mm/fault.c
arch/nios2/mm/fault.c
arch/openrisc/mm/fault.c
arch/parisc/mm/fault.c
arch/powerpc/mm/copro_fault.c
arch/powerpc/mm/fault.c
arch/riscv/mm/fault.c
arch/s390/mm/fault.c
arch/sh/mm/fault.c
arch/sparc/mm/fault_32.c
arch/sparc/mm/fault_64.c
arch/um/kernel/trap.c
arch/x86/mm/fault.c
arch/xtensa/mm/fault.c
drivers/iommu/amd/iommu_v2.c
drivers/iommu/intel/svm.c
include/linux/mm.h
mm/gup.c
mm/hmm.c
mm/ksm.c
mm/memory.c

index c2303a8c2b9f7ca40cdb7d01e03fb0a7bde8a2a9..1983e43a5e2f45f91576abd7148c238f7c3f1a53 100644 (file)
@@ -148,7 +148,7 @@ retry:
        /* If for any reason at all we couldn't handle the fault,
           make sure we exit gracefully rather than endlessly redo
           the fault.  */
-       fault = handle_mm_fault(vma, address, flags);
+       fault = handle_mm_fault(vma, address, flags, NULL);
 
        if (fault_signal_pending(fault, regs))
                return;
index 7287c793d1c9de1c9bd3f887923fe923474421ea..587dea524e6b7a3327f45d64a833b336e0676921 100644 (file)
@@ -130,7 +130,7 @@ retry:
                goto bad_area;
        }
 
-       fault = handle_mm_fault(vma, address, flags);
+       fault = handle_mm_fault(vma, address, flags, NULL);
 
        /* Quick path to respond to signals */
        if (fault_signal_pending(fault, regs)) {
index c6550eddfce190680b78a240ea8ad0cda5db3c0f..01a8e0f8fef7f9d6b268e2768243ad06f7abf4bd 100644 (file)
@@ -224,7 +224,7 @@ good_area:
                goto out;
        }
 
-       return handle_mm_fault(vma, addr & PAGE_MASK, flags);
+       return handle_mm_fault(vma, addr & PAGE_MASK, flags, NULL);
 
 check_stack:
        /* Don't allow expansion below FIRST_USER_ADDRESS */
index 8afb238ff3358c0d785b58325f642973a4a77a24..be29f4076fe3c2b4c4622373361622a2977030f7 100644 (file)
@@ -428,7 +428,7 @@ static vm_fault_t __do_page_fault(struct mm_struct *mm, unsigned long addr,
         */
        if (!(vma->vm_flags & vm_flags))
                return VM_FAULT_BADACCESS;
-       return handle_mm_fault(vma, addr & PAGE_MASK, mm_flags);
+       return handle_mm_fault(vma, addr & PAGE_MASK, mm_flags, NULL);
 }
 
 static bool is_el0_instruction_abort(unsigned int esr)
index b1dce9f2f04dd815fc713936701b9bcca0a7a66d..b252e6e4d32f4fef207af89b2c3ddc67ad1089fa 100644 (file)
@@ -150,7 +150,8 @@ good_area:
         * make sure we exit gracefully rather than endlessly redo
         * the fault.
         */
-       fault = handle_mm_fault(vma, address, write ? FAULT_FLAG_WRITE : 0);
+       fault = handle_mm_fault(vma, address, write ? FAULT_FLAG_WRITE : 0,
+                               NULL);
        if (unlikely(fault & VM_FAULT_ERROR)) {
                if (fault & VM_FAULT_OOM)
                        goto out_of_memory;
index cd3808f96b930b5523e788b38afa4c4e11007511..f12f330e7946b9852fdfc49bcfc97fee170c34b5 100644 (file)
@@ -88,7 +88,7 @@ good_area:
                break;
        }
 
-       fault = handle_mm_fault(vma, address, flags);
+       fault = handle_mm_fault(vma, address, flags, NULL);
 
        if (fault_signal_pending(fault, regs))
                return;
index 3a4dec334cc58cb855ffc9558caa2d66a4d242a6..abf2808f9b4bafabf3eef9a398c2344d6c833a47 100644 (file)
@@ -143,7 +143,7 @@ retry:
         * sure we exit gracefully rather than endlessly redo the
         * fault.
         */
-       fault = handle_mm_fault(vma, address, flags);
+       fault = handle_mm_fault(vma, address, flags, NULL);
 
        if (fault_signal_pending(fault, regs))
                return;
index 508abb63da6785ba2d3bc735938a0a9a09117b1b..08b35a318ebeb3c305c22290195fa79614c9ebfd 100644 (file)
@@ -134,7 +134,7 @@ good_area:
         * the fault.
         */
 
-       fault = handle_mm_fault(vma, address, flags);
+       fault = handle_mm_fault(vma, address, flags, NULL);
        pr_debug("handle_mm_fault returns %x\n", fault);
 
        if (fault_signal_pending(fault, regs))
index a2bfe587b49126f4fa44a035cf02e2e7463345f2..1a3d4c4ca28be0ebd93d337d7fbe9520ef6d2a09 100644 (file)
@@ -214,7 +214,7 @@ good_area:
         * make sure we exit gracefully rather than endlessly redo
         * the fault.
         */
-       fault = handle_mm_fault(vma, address, flags);
+       fault = handle_mm_fault(vma, address, flags, NULL);
 
        if (fault_signal_pending(fault, regs))
                return;
index 01b168a90434aa564a20628a4586b5df9458864a..b1db39784db9dcfd3f8e5cd4d09b10aeaa7c7937 100644 (file)
@@ -152,7 +152,7 @@ good_area:
         * make sure we exit gracefully rather than endlessly redo
         * the fault.
         */
-       fault = handle_mm_fault(vma, address, flags);
+       fault = handle_mm_fault(vma, address, flags, NULL);
 
        if (fault_signal_pending(fault, regs))
                return;
index 8fb73f6401a03e05c50e036a3cdcaf5a33cc4da0..d0ecc8fb5b237bf76dbaabf5a23e0a1b38460064 100644 (file)
@@ -206,7 +206,7 @@ good_area:
         * the fault.
         */
 
-       fault = handle_mm_fault(vma, addr, flags);
+       fault = handle_mm_fault(vma, addr, flags, NULL);
 
        /*
         * If we need to retry but a fatal signal is pending, handle the
index 4112ef0e247ee438308968f45daaf9bbfbb679e6..86beb9a2698eaa3bfe6add02d9b91485eb25790e 100644 (file)
@@ -131,7 +131,7 @@ good_area:
         * make sure we exit gracefully rather than endlessly redo
         * the fault.
         */
-       fault = handle_mm_fault(vma, address, flags);
+       fault = handle_mm_fault(vma, address, flags, NULL);
 
        if (fault_signal_pending(fault, regs))
                return;
index d2224ccca2941f4777c3f4aee06eb02db39f1e98..3daa491d1edbcc46b06c746474819a912319fd52 100644 (file)
@@ -159,7 +159,7 @@ good_area:
         * the fault.
         */
 
-       fault = handle_mm_fault(vma, address, flags);
+       fault = handle_mm_fault(vma, address, flags, NULL);
 
        if (fault_signal_pending(fault, regs))
                return;
index 66ac0719bd4927eb8dca1288728ddc2dd25d5f6a..e32d06928c24f335c570394fcc8af4578b59918e 100644 (file)
@@ -302,7 +302,7 @@ good_area:
         * fault.
         */
 
-       fault = handle_mm_fault(vma, address, flags);
+       fault = handle_mm_fault(vma, address, flags, NULL);
 
        if (fault_signal_pending(fault, regs))
                return;
index b83abbead4a23237bab54f52f25520b16baf8e00..2d0276abe0a68a6077f167b558fc1d51dbec41db 100644 (file)
@@ -64,7 +64,7 @@ int copro_handle_mm_fault(struct mm_struct *mm, unsigned long ea,
        }
 
        ret = 0;
-       *flt = handle_mm_fault(vma, ea, is_write ? FAULT_FLAG_WRITE : 0);
+       *flt = handle_mm_fault(vma, ea, is_write ? FAULT_FLAG_WRITE : 0, NULL);
        if (unlikely(*flt & VM_FAULT_ERROR)) {
                if (*flt & VM_FAULT_OOM) {
                        ret = -ENOMEM;
index 925a7231abb334d96232b0d7fa55846f539a5553..c6a5225a35214bfcef3bd8118328f758fdec8118 100644 (file)
@@ -511,7 +511,7 @@ retry:
         * make sure we exit gracefully rather than endlessly redo
         * the fault.
         */
-       fault = handle_mm_fault(vma, address, flags);
+       fault = handle_mm_fault(vma, address, flags, NULL);
 
        major |= fault & VM_FAULT_MAJOR;
 
index 5873835a3e6b715af02217d74549668fcac12b46..30c1124d0fb6ea8bffc979a227d93cacd8618e25 100644 (file)
@@ -109,7 +109,7 @@ good_area:
         * make sure we exit gracefully rather than endlessly redo
         * the fault.
         */
-       fault = handle_mm_fault(vma, addr, flags);
+       fault = handle_mm_fault(vma, addr, flags, NULL);
 
        /*
         * If we need to retry but a fatal signal is pending, handle the
index aebf9183bedd15a36fce4c315b0eca1169cfbb9d..ad783aaaf6492ac65767078821ce93a79e54950f 100644 (file)
@@ -476,7 +476,7 @@ retry:
         * make sure we exit gracefully rather than endlessly redo
         * the fault.
         */
-       fault = handle_mm_fault(vma, address, flags);
+       fault = handle_mm_fault(vma, address, flags, NULL);
        if (fault_signal_pending(fault, regs)) {
                fault = VM_FAULT_SIGNAL;
                if (flags & FAULT_FLAG_RETRY_NOWAIT)
index fbe1f2fe9a8c8f558b75dd41f1bf028a83d9a623..3c0a11827f7ed180e5e52bece4f7be386df6eeb8 100644 (file)
@@ -482,7 +482,7 @@ good_area:
         * make sure we exit gracefully rather than endlessly redo
         * the fault.
         */
-       fault = handle_mm_fault(vma, address, flags);
+       fault = handle_mm_fault(vma, address, flags, NULL);
 
        if (unlikely(fault & (VM_FAULT_RETRY | VM_FAULT_ERROR)))
                if (mm_fault_error(regs, error_code, address, fault))
index cfef656eda0f948f814e943c313f4bf66c9b97be..06af03db4417a24f7d68a3fbb0b68b919fe2bcbd 100644 (file)
@@ -234,7 +234,7 @@ good_area:
         * make sure we exit gracefully rather than endlessly redo
         * the fault.
         */
-       fault = handle_mm_fault(vma, address, flags);
+       fault = handle_mm_fault(vma, address, flags, NULL);
 
        if (fault_signal_pending(fault, regs))
                return;
@@ -410,7 +410,7 @@ good_area:
                if (!(vma->vm_flags & (VM_READ | VM_EXEC)))
                        goto bad_area;
        }
-       switch (handle_mm_fault(vma, address, flags)) {
+       switch (handle_mm_fault(vma, address, flags, NULL)) {
        case VM_FAULT_SIGBUS:
        case VM_FAULT_OOM:
                goto do_sigbus;
index a3806614e4dc0e25a25fd6a03433b2ae292547d3..9ebee14ee893ae1191cc4011229a71f5332114db 100644 (file)
@@ -422,7 +422,7 @@ good_area:
                        goto bad_area;
        }
 
-       fault = handle_mm_fault(vma, address, flags);
+       fault = handle_mm_fault(vma, address, flags, NULL);
 
        if (fault_signal_pending(fault, regs))
                goto exit_exception;
index 2b3afa354a9049f7469c7da9185fb123dfc267e5..8d9870d76da12258e56132749ead061b03f018e5 100644 (file)
@@ -71,7 +71,7 @@ good_area:
        do {
                vm_fault_t fault;
 
-               fault = handle_mm_fault(vma, address, flags);
+               fault = handle_mm_fault(vma, address, flags, NULL);
 
                if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
                        goto out_nosemaphore;
index 0c7643d9f7cb30ec561c4f1ebeea6d6bed96f4cc..e1bf5555d80a44cf3924b141d9c47087b8ad910d 100644 (file)
@@ -1291,7 +1291,7 @@ good_area:
         * userland). The return to userland is identified whenever
         * FAULT_FLAG_USER|FAULT_FLAG_KILLABLE are both set in flags.
         */
-       fault = handle_mm_fault(vma, address, flags);
+       fault = handle_mm_fault(vma, address, flags, NULL);
        major |= fault & VM_FAULT_MAJOR;
 
        /* Quick path to respond to signals */
index c128dcc7c85b461f4b91cb2f37fedad6ebaf946d..e72c8c1359a6c1863c3b1b82058ac27a7b01ed7f 100644 (file)
@@ -107,7 +107,7 @@ good_area:
         * make sure we exit gracefully rather than endlessly redo
         * the fault.
         */
-       fault = handle_mm_fault(vma, address, flags);
+       fault = handle_mm_fault(vma, address, flags, NULL);
 
        if (fault_signal_pending(fault, regs))
                return;
index e4b025c5637c45bd909c6250fe02eb5f12b48790..c259108ab6dd78fcb0514fd8d6f03240dc21b7d2 100644 (file)
@@ -495,7 +495,7 @@ static void do_fault(struct work_struct *work)
        if (access_error(vma, fault))
                goto out;
 
-       ret = handle_mm_fault(vma, address, flags);
+       ret = handle_mm_fault(vma, address, flags, NULL);
 out:
        mmap_read_unlock(mm);
 
index 6c87c807a0abb8e3527a996d3d9ba88fd52456fe..5ae59a6ad681500956c4f71e73b2e99e9515eb8e 100644 (file)
@@ -872,7 +872,8 @@ static irqreturn_t prq_event_thread(int irq, void *d)
                        goto invalid;
 
                ret = handle_mm_fault(vma, address,
-                                     req->wr_req ? FAULT_FLAG_WRITE : 0);
+                                     req->wr_req ? FAULT_FLAG_WRITE : 0,
+                                     NULL);
                if (ret & VM_FAULT_ERROR)
                        goto invalid;
 
index f97b10117d44d1d1a561fc53231ef00a6f3662b3..ec0ffb423769244e34703bc27f869668775a993e 100644 (file)
@@ -38,6 +38,7 @@ struct file_ra_state;
 struct user_struct;
 struct writeback_control;
 struct bdi_writeback;
+struct pt_regs;
 
 void init_mm_internals(void);
 
@@ -1658,7 +1659,8 @@ int invalidate_inode_page(struct page *page);
 
 #ifdef CONFIG_MMU
 extern vm_fault_t handle_mm_fault(struct vm_area_struct *vma,
-                       unsigned long address, unsigned int flags);
+                                 unsigned long address, unsigned int flags,
+                                 struct pt_regs *regs);
 extern int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
                            unsigned long address, unsigned int fault_flags,
                            bool *unlocked);
@@ -1668,7 +1670,8 @@ void unmap_mapping_range(struct address_space *mapping,
                loff_t const holebegin, loff_t const holelen, int even_cows);
 #else
 static inline vm_fault_t handle_mm_fault(struct vm_area_struct *vma,
-               unsigned long address, unsigned int flags)
+                                        unsigned long address, unsigned int flags,
+                                        struct pt_regs *regs)
 {
        /* should never happen if there's no MMU */
        BUG();
index e9d1d0cc18f05726ef2681495b5734ab8eb2a826..ae7121d729fa784b3b88e2829d125c8217647d56 100644 (file)
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -884,7 +884,7 @@ static int faultin_page(struct task_struct *tsk, struct vm_area_struct *vma,
                fault_flags |= FAULT_FLAG_TRIED;
        }
 
-       ret = handle_mm_fault(vma, address, fault_flags);
+       ret = handle_mm_fault(vma, address, fault_flags, NULL);
        if (ret & VM_FAULT_ERROR) {
                int err = vm_fault_to_errno(ret, *flags);
 
@@ -1238,7 +1238,7 @@ retry:
            fatal_signal_pending(current))
                return -EINTR;
 
-       ret = handle_mm_fault(vma, address, fault_flags);
+       ret = handle_mm_fault(vma, address, fault_flags, NULL);
        major |= ret & VM_FAULT_MAJOR;
        if (ret & VM_FAULT_ERROR) {
                int err = vm_fault_to_errno(ret, 0);
index bb279319bf4057674a8a52203ff04617bbf49eab..943cb2ba444232565a696e5550ec9f22bd96f4fa 100644 (file)
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -75,7 +75,8 @@ static int hmm_vma_fault(unsigned long addr, unsigned long end,
        }
 
        for (; addr < end; addr += PAGE_SIZE)
-               if (handle_mm_fault(vma, addr, fault_flags) & VM_FAULT_ERROR)
+               if (handle_mm_fault(vma, addr, fault_flags, NULL) &
+                   VM_FAULT_ERROR)
                        return -EFAULT;
        return -EBUSY;
 }
index 217842a66912f2e88704c7ae33058b6163fbc475..0aa2247bddd76dbee9099f2df7f46542f7d6e7e4 100644 (file)
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -480,7 +480,8 @@ static int break_ksm(struct vm_area_struct *vma, unsigned long addr)
                        break;
                if (PageKsm(page))
                        ret = handle_mm_fault(vma, addr,
-                                       FAULT_FLAG_WRITE | FAULT_FLAG_REMOTE);
+                                             FAULT_FLAG_WRITE | FAULT_FLAG_REMOTE,
+                                             NULL);
                else
                        ret = VM_FAULT_WRITE;
                put_page(page);
index 325bb575e7ec3e531c56a1a6c1d61052d825265a..9b7d35734caaf262c27d6da39e4150225d19afd9 100644 (file)
@@ -71,6 +71,8 @@
 #include <linux/dax.h>
 #include <linux/oom.h>
 #include <linux/numa.h>
+#include <linux/perf_event.h>
+#include <linux/ptrace.h>
 
 #include <trace/events/kmem.h>
 
@@ -4356,6 +4358,64 @@ retry_pud:
        return handle_pte_fault(&vmf);
 }
 
+/**
+ * mm_account_fault - Do page fault accountings
+ *
+ * @regs: the pt_regs struct pointer.  When set to NULL (e.g. for gup), the
+ *        accounting is skipped entirely: neither the perf event counters nor
+ *        the per-task counters are updated.
+ * @address: the faulted address.
+ * @flags: the fault flags.
+ * @ret: the fault retcode.
+ *
+ * This will take care of most of the page fault accountings.  Meanwhile, it
+ * will also include the PERF_COUNT_SW_PAGE_FAULTS_[MAJ|MIN] perf counter
+ * updates.  However note that the handling of PERF_COUNT_SW_PAGE_FAULTS should
+ * still be in per-arch page fault handlers at the entry of page fault.
+ */
+static inline void mm_account_fault(struct pt_regs *regs,
+                                   unsigned long address, unsigned int flags,
+                                   vm_fault_t ret)
+{
+       bool major;
+
+       /*
+        * We don't do accounting for some specific faults:
+        *
+        * - Unsuccessful faults (e.g. when the address wasn't valid).  That
+        *   includes arch_vma_access_permitted() failing before reaching here.
+        *   So this is not a "this many hardware page faults" counter.  We
+        *   should use the hw profiling for that.
+        *
+        * - Incomplete faults (VM_FAULT_RETRY).  They will only be counted
+        *   once they're completed.
+        */
+       if (ret & (VM_FAULT_ERROR | VM_FAULT_RETRY))
+               return;
+
+       /*
+        * We define the fault as a major fault when the final successful fault
+        * is VM_FAULT_MAJOR, or if it retried (which implies that we couldn't
+        * handle it immediately previously).
+        */
+       major = (ret & VM_FAULT_MAJOR) || (flags & FAULT_FLAG_TRIED);
+
+       /*
+        * If the fault is done for GUP, regs will be NULL, and we will skip
+        * the fault accounting.
+        */
+       if (!regs)
+               return;
+
+       if (major) {
+               current->maj_flt++;
+               perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, 1, regs, address);
+       } else {
+               current->min_flt++;
+               perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN, 1, regs, address);
+       }
+}
+
 /*
  * By the time we get here, we already hold the mm semaphore
  *
@@ -4363,7 +4423,7 @@ retry_pud:
  * return value.  See filemap_fault() and __lock_page_or_retry().
  */
 vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
-               unsigned int flags)
+                          unsigned int flags, struct pt_regs *regs)
 {
        vm_fault_t ret;
 
@@ -4404,6 +4464,8 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
                        mem_cgroup_oom_synchronize(false);
        }
 
+       mm_account_fault(regs, address, flags, ret);
+
        return ret;
 }
 EXPORT_SYMBOL_GPL(handle_mm_fault);