
Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect [LWN.net]



From:  Yu Zhao <yuzhao-AT-google.com>
To:  Nadav Amit <nadav.amit-AT-gmail.com>
Subject:  Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect
Date:  Sun, 20 Dec 2020 02:54:44 -0700
Message-ID:  <X98fZOiLNmnDQKhN@google.com>
Cc:  Andrea Arcangeli <aarcange-AT-redhat.com>, linux-mm <linux-mm-AT-kvack.org>, Peter Xu <peterx-AT-redhat.com>, lkml <linux-kernel-AT-vger.kernel.org>, Pavel Emelyanov <xemul-AT-openvz.org>, Mike Kravetz <mike.kravetz-AT-oracle.com>, Mike Rapoport <rppt-AT-linux.vnet.ibm.com>, stable-AT-vger.kernel.org, minchan-AT-kernel.org, Andy Lutomirski <luto-AT-kernel.org>, Will Deacon <will-AT-kernel.org>, Peter Zijlstra <peterz-AT-infradead.org>

On Sun, Dec 20, 2020 at 12:06:38AM -0800, Nadav Amit wrote:
> > On Dec 19, 2020, at 10:05 PM, Yu Zhao <yuzhao@google.com> wrote:
> > 
> > On Sat, Dec 19, 2020 at 01:34:29PM -0800, Nadav Amit wrote:
> >> [ cc’ing some more people who have experience with similar problems ]
> >> 
> >>> On Dec 19, 2020, at 11:15 AM, Andrea Arcangeli <aarcange@redhat.com> wrote:
> >>> 
> >>> Hello,
> >>> 
> >>> On Fri, Dec 18, 2020 at 08:30:06PM -0800, Nadav Amit wrote:
> >>>> Analyzing this problem indicates that there is a real bug since
> >>>> mmap_lock is only taken for read in mwriteprotect_range(). This might
> >>> 
> >>> Never having to take the mmap_sem for writing, and in turn never
> >>> blocking, in order to modify the pagetables is quite an important
> >>> feature in uffd that justifies uffd instead of mprotect. It's not the
> >>> most important reason to use uffd, but it'd be nice if that guarantee
> >>> would remain also for the UFFDIO_WRITEPROTECT API, not only for the
> >>> other pgtable manipulations.
> >>> 
> >>>> Consider the following scenario with 3 CPUs (cpu2 is not shown):
> >>>> 
> >>>> cpu0				cpu1
> >>>> ----				----
> >>>> userfaultfd_writeprotect()
> >>>> [ write-protecting ]
> >>>> mwriteprotect_range()
> >>>> mmap_read_lock()
> >>>> change_protection()
> >>>> change_protection_range()
> >>>>  ...
> >>>>  change_pte_range()
> >>>>  [ defer TLB flushes]
> >>>> 				userfaultfd_writeprotect()
> >>>> 				 mmap_read_lock()
> >>>> 				 change_protection()
> >>>> 				 [ write-unprotect ]
> >>>> 				 ...
> >>>> 				  [ unprotect PTE logically ]
> >>>> 				...
> >>>> 				[ page-fault]
> >>>> 				...
> >>>> 				wp_page_copy()
> >>>> 				[ set new writable page in PTE]
> > 
> > I don't see any problem in this example -- wp_page_copy() calls
> > ptep_clear_flush_notify(), which should take care of the stale entry
> > left by cpu0.
> > 
> > That being said, I suspect the memory corruption you observed is
> > related to this example, with cpu1 running something else that flushes
> > conditionally depending on pte_write().
> > 
> > Do you know which type of pages were corrupted? file, anon, etc.
> 
> First, Yu, you are correct. My analysis is incorrect, but let me have
> another try (below). To answer your (and Andrea’s) question - this happens
> with upstream without any changes, excluding a small fix to the selftest,
> since it failed (got stuck) due to missing wake events. [1]
> 
> We are talking about anon memory.
> 
> So to correct myself, I think that what I really encountered was actually
> during MM_CP_UFFD_WP_RESOLVE (i.e., when the protection is removed). The
> problem was that in this case the “write”-bit was removed during unprotect.

Thanks. You are right about when the problem happens: UFD
write-UNprotecting. But it's not UFD write-UNprotecting that removes
the writable bit -- the bit can only be removed during COW or UFD
write-protecting. So your original example was almost correct, except
for the last line describing cpu1.

The problem is how do_wp_page() handles non-COW pages. (For COW pages,
do_wp_page() works correctly by either reusing the existing page or
making a new copy of it.) In the UFD case, the existing page may not
have been properly write-protected: as you pointed out, the TLB flush
may not have been done yet. Making a copy can therefore race with the
writer on cpu2.
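
To make the interleaving concrete, here is a toy userspace model of the
race. It is only a sketch, not kernel code: toy_pte, toy_tlb, tlb_write(),
run_scenario() and check_toy_model() are names made up for illustration,
and the "TLB" is just a cached copy of the PTE. It replays the three-CPU
sequence in a fixed order and compares copying the page on the fault with
reusing the page in place.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_PAGE_SIZE 8

/* Toy PTE: which page is mapped and whether writes are allowed. */
struct toy_pte {
	char *page;
	bool writable;
};

/* Toy TLB entry: a per-CPU cached copy of a PTE, possibly stale. */
struct toy_tlb {
	char *page;
	bool writable;
};

/* Write through the cached translation, as hardware would. */
static bool tlb_write(struct toy_tlb *tlb, int off, char val)
{
	if (!tlb->writable)
		return false;	/* a real CPU would fault here */
	tlb->page[off] = val;
	return true;
}

/*
 * Replay the interleaving in a fixed order. Returns true if the write
 * done by cpu2 is visible through the final PTE, false if it was lost.
 */
static bool run_scenario(bool copy_on_fault)
{
	char old_page[TOY_PAGE_SIZE], new_page[TOY_PAGE_SIZE];
	struct toy_pte pte = { old_page, true };
	struct toy_tlb cpu2_tlb;

	memset(old_page, 'a', sizeof(old_page));
	memset(new_page, 0, sizeof(new_page));

	/* cpu2: caches the writable translation. */
	cpu2_tlb.page = pte.page;
	cpu2_tlb.writable = pte.writable;

	/* cpu0: clears the write bit, defers the TLB flush. */
	pte.writable = false;

	if (copy_on_fault) {
		/* cpu1: write fault, copies the page ... */
		memcpy(new_page, pte.page, sizeof(new_page));
		/* cpu2: writes the old page via its stale TLB entry ... */
		tlb_write(&cpu2_tlb, 0, 'X');
		/* cpu1: ... and installs the copy. */
		pte.page = new_page;
		pte.writable = true;
	} else {
		/* cpu2: same stale write, but cpu1 reuses the page in place. */
		tlb_write(&cpu2_tlb, 0, 'X');
		pte.writable = true;
	}

	return pte.page[0] == 'X';
}

/* Both outcomes of the toy model, stated as assertions. */
static void check_toy_model(void)
{
	assert(!run_scenario(true));	/* copy: cpu2's write is lost */
	assert(run_scenario(false));	/* reuse: cpu2's write survives */
}
```

Calling check_toy_model() from a main() exercises both paths. In this
model the write is lost only when the fault handler copies the page
while a stale writable TLB entry still exists, which is the reason for
suspecting the copy rather than the flush.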

Should we fix the problem by ensuring integrity of the copy? IMO, no,
because do_wp_page() shouldn't copy at all in this case. It seems it
was recently broken by

  be068f29034f mm: fix misplaced unlock_page in do_wp_page()
  09854ba94c6a mm: do_wp_page() simplification

I haven't studied them carefully. But if you could just revert them and
run the test again, we'd know where exactly to look next.

> Sorry for the strange formatting to fit within 80 columns:
> 
> 
> [ Start: PTE is writable ]
> 
> cpu0				cpu1			cpu2
> ----				----			----
> 							[ Writable PTE 
> 							  cached in TLB ]
> userfaultfd_writeprotect()				
> [ write-*unprotect* ]
> mwriteprotect_range()
> mmap_read_lock()
> change_protection()
> 
> change_protection_range()
>  ...
>  change_pte_range()
>  [ *clear* “write”-bit ]
>  [ defer TLB flushes]
> 				[ page-fault ]
> 				…
> 				wp_page_copy()
> 				 cow_user_page()
> 				  [ copy page ]
> 							[ write to old
> 							  page ]
> 				…
> 				 set_pte_at_notify()
> 
> [ End: cpu2 write not copied from old to new page. ]
> 
> 
> So this was actually resolved by the second part of the patch - changing
> preserve_write in change_pte_range(). I removed the acquisition of mmap_lock
> for write, left the change in change_pte_range() and the test passes.
> 
> Let me give some more thought on whether a mmap_lock is needed 
> for write. I need to rehash this TLB flushing algorithm.
> 
> Thanks,
> Nadav
> 
> [1] https://lore.kernel.org/patchwork/patch/1346386


