Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect
Thread information
[Search the linux-mm archive]
Nadav Amit ` Andrea Arcangeli ` Nadav Amit ` Andrea Arcangeli ` Nadav Amit ` Yu Zhao ` Nadav Amit ` Nadav Amit ` Yu Zhao ` Andy Lutomirski ` Nadav Amit ` Andrea Arcangeli ` Andy Lutomirski ` Andrea Arcangeli ` Andy Lutomirski ` Andrea Arcangeli ` Andy Lutomirski ` Yu Zhao ` Nadav Amit ` Yu Zhao ` Nadav Amit ` Yu Zhao ` Peter Xu ` Nadav Amit [this message] ` Yu Zhao ` Linus Torvalds ` Yu Zhao ` Linus Torvalds ` Nadav Amit ` Linus Torvalds ` Yu Zhao ` Nadav Amit ` Peter Xu ` Nadav Amit ` Linus Torvalds ` Nadav Amit ` Andrea Arcangeli ` Nadav Amit ` Andrea Arcangeli ` Yu Zhao ` Linus Torvalds ` Yu Zhao ` Linus Torvalds ` Yu Zhao ` Linus Torvalds ` Andy Lutomirski ` Linus Torvalds ` Andy Lutomirski ` Peter Zijlstra ` Andrea Arcangeli ` Peter Zijlstra ` Vinayak Menon ` Laurent Dufour ` Peter Zijlstra ` Laurent Dufour ` Nadav Amit ` Yu Zhao ` Nadav Amit ` Yu Zhao ` Will Deacon ` Nadav Amit ` Will Deacon ` Andy Lutomirski ` Yu Zhao ` Nadav Amit ` Yu Zhao ` Nadav Amit ` Yu Zhao ` Nadav Amit ` Nadav Amit ` Andrea Arcangeli ` Matthew Wilcox ` Andrea Arcangeli ` Yu Zhao ` Andrea Arcangeli ` Yu Zhao ` Linus Torvalds ` Linus Torvalds ` Yu Zhao ` Andrea Arcangeli ` Linus Torvalds ` Yu Zhao ` Peter Xu ` Andrea Arcangeli ` Andrea Arcangeli ` Yu Zhao ` Peter Xu ` Linus Torvalds ` Andrea Arcangeli ` Yu Zhao ` Peter Xu ` Andrea Arcangeli ` Andrea Arcangeli ` Yu Zhao ` Andrea Arcangeli ` Andy Lutomirski ` Andrea Arcangeli ` Nadav Amit ` Nadav Amit ` Yu Zhao ` Andrea Arcangeli ` Nadav Amit ` Andrea Arcangeli ` Andrea Arcangeli ` Nadav Amit ` Andrea Arcangeli ` Linus Torvalds ` Andrea Arcangeli ` Nadav Amit ` Nadav Amit ` Yu Zhao ` Nadav Amit ` Will Deacon ` Nadav Amit ` Andrea Arcangeli ` Nadav Amit ` Andrea Arcangeli ` Peter Xu ` Linus Torvalds ` Peter Xu
From: | Nadav Amit <nadav.amit-AT-gmail.com> | |
To: | Peter Xu <peterx-AT-redhat.com> | |
Subject: | Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect | |
Date: | Mon, 21 Dec 2020 10:31:57 -0800 | |
Message-ID: | <76B4F49B-ED61-47EA-9BE4-7F17A26B610D@gmail.com> | |
Cc: | Yu Zhao <yuzhao-AT-google.com>, Andrea Arcangeli <aarcange-AT-redhat.com>, linux-mm <linux-mm-AT-kvack.org>, lkml <linux-kernel-AT-vger.kernel.org>, Pavel Emelyanov <xemul-AT-openvz.org>, Mike Kravetz <mike.kravetz-AT-oracle.com>, Mike Rapoport <rppt-AT-linux.vnet.ibm.com>, stable-AT-vger.kernel.org, minchan-AT-kernel.org, Andy Lutomirski <luto-AT-kernel.org>, Will Deacon <will-AT-kernel.org>, Peter Zijlstra <peterz-AT-infradead.org> |
> On Dec 21, 2020, at 9:27 AM, Peter Xu <peterx@redhat.com> wrote: > > Hi, Nadav, > > On Sun, Dec 20, 2020 at 12:06:38AM -0800, Nadav Amit wrote: > > [...] > >> So to correct myself, I think that what I really encountered was actually >> during MM_CP_UFFD_WP_RESOLVE (i.e., when the protection is removed). The >> problem was that in this case the “write”-bit was removed during unprotect. >> Sorry for the strange formatting to fit within 80 columns: > > I assume I can ignore the race mentioned in the commit message but only refer > to this one below. However I'm still confused. Please see below. > >> [ Start: PTE is writable ] >> >> cpu0 cpu1 cpu2 >> ---- ---- ---- >> [ Writable PTE >> cached in TLB ] > > Here cpu2 got writable pte in tlb. But why? > > If below is an unprotect, it means it must have been protected once by > userfaultfd, right? If so, the previous change_protection_range() which did > the wr-protect should have done a tlb flush already before it returns (since > pages>0 - we protected one pte at least). Then I can't see why cpu2 tlb has > stall data. Thanks, Peter. Just as you can munprotect() a region which was not protected before, you can ufff-unprotect a region that was not protected before. It might be that the user tried to unprotect a large region, which was partially protected and partially unprotected. The selftest obviously blindly unprotect some regions to check for bugs. So to your question - it was not write-protected (think about initial copy without write-protecting). > If I assume cpu2 doesn't have that cached tlb, then "write to old page" won't > happen either, because cpu1/cpu2 will all go through the cow path and pgtable > lock should serialize them. > >> userfaultfd_writeprotect() >> [ write-*unprotect* ] >> mwriteprotect_range() >> mmap_read_lock() >> change_protection() >> >> change_protection_range() >> ... >> change_pte_range() >> [ *clear* “write”-bit ] >> [ defer TLB flushes] >> [ page-fault ] >> … >> wp_page_copy() >> cow_user_page() >> [ copy page ] >> [ write to old >> page ] >> … >> set_pte_at_notify() >> >> [ End: cpu2 write not copied form old to new page. ] > > Could you share how to reproduce the problem? I would be glad to give it a > shot as well. You can run the selftests/userfaultfd with my small patch [1]. I ran it with the following parameters: “ ./userfaultfd anon 100 100 “. I think that it is more easily reproducible with “mitigations=off idle=poll” as kernel parameters. [1] https://lore.kernel.org/patchwork/patch/1346386/ > >> [1] https://lore.kernel.org/patchwork/patch/1346386 > > PS: Sorry to not have read the other series of yours. It seems to need some > chunk of time so I postponed it a bit due to other things; but I'll read at > least the fixes very soon. Thanks again, I will post RFCv2 with some numbers soon.