Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect
Thread information
[Search the linux-mm archive]
Nadav Amit ` Andrea Arcangeli ` Nadav Amit ` Andrea Arcangeli ` Nadav Amit ` Yu Zhao ` Nadav Amit ` Nadav Amit ` Yu Zhao ` Andy Lutomirski ` Nadav Amit ` Andrea Arcangeli ` Andy Lutomirski ` Andrea Arcangeli ` Andy Lutomirski ` Andrea Arcangeli ` Andy Lutomirski ` Yu Zhao ` Nadav Amit ` Yu Zhao ` Nadav Amit ` Yu Zhao ` Peter Xu ` Nadav Amit ` Yu Zhao ` Linus Torvalds [this message] ` Yu Zhao ` Linus Torvalds ` Nadav Amit ` Linus Torvalds ` Yu Zhao ` Nadav Amit ` Peter Xu ` Nadav Amit ` Linus Torvalds ` Nadav Amit ` Andrea Arcangeli ` Nadav Amit ` Andrea Arcangeli ` Yu Zhao ` Linus Torvalds ` Yu Zhao ` Linus Torvalds ` Yu Zhao ` Linus Torvalds ` Andy Lutomirski ` Linus Torvalds ` Andy Lutomirski ` Peter Zijlstra ` Andrea Arcangeli ` Peter Zijlstra ` Vinayak Menon ` Laurent Dufour ` Peter Zijlstra ` Laurent Dufour ` Nadav Amit ` Yu Zhao ` Nadav Amit ` Yu Zhao ` Will Deacon ` Nadav Amit ` Will Deacon ` Andy Lutomirski ` Yu Zhao ` Nadav Amit ` Yu Zhao ` Nadav Amit ` Yu Zhao ` Nadav Amit ` Nadav Amit ` Andrea Arcangeli ` Matthew Wilcox ` Andrea Arcangeli ` Yu Zhao ` Andrea Arcangeli ` Yu Zhao ` Linus Torvalds ` Linus Torvalds ` Yu Zhao ` Andrea Arcangeli ` Linus Torvalds ` Yu Zhao ` Peter Xu ` Andrea Arcangeli ` Andrea Arcangeli ` Yu Zhao ` Peter Xu ` Linus Torvalds ` Andrea Arcangeli ` Yu Zhao ` Peter Xu ` Andrea Arcangeli ` Andrea Arcangeli ` Yu Zhao ` Andrea Arcangeli ` Andy Lutomirski ` Andrea Arcangeli ` Nadav Amit ` Nadav Amit ` Yu Zhao ` Andrea Arcangeli ` Nadav Amit ` Andrea Arcangeli ` Andrea Arcangeli ` Nadav Amit ` Andrea Arcangeli ` Linus Torvalds ` Andrea Arcangeli ` Nadav Amit ` Nadav Amit ` Yu Zhao ` Nadav Amit ` Will Deacon ` Nadav Amit ` Andrea Arcangeli ` Nadav Amit ` Andrea Arcangeli ` Peter Xu ` Linus Torvalds ` Peter Xu
From: | Linus Torvalds <torvalds-AT-linux-foundation.org> | |
To: | Yu Zhao <yuzhao-AT-google.com> | |
Subject: | Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect | |
Date: | Mon, 21 Dec 2020 11:55:02 -0800 | |
Message-ID: | <CAHk-=wg_UBuo7ro1fpEGkMyFKA1+PxrE85f9J_AhUfr-nJPpLQ@mail.gmail.com> | |
Cc: | Peter Xu <peterx-AT-redhat.com>, Andrea Arcangeli <aarcange-AT-redhat.com>, linux-mm <linux-mm-AT-kvack.org>, lkml <linux-kernel-AT-vger.kernel.org>, Pavel Emelyanov <xemul-AT-openvz.org>, Mike Kravetz <mike.kravetz-AT-oracle.com>, Mike Rapoport <rppt-AT-linux.vnet.ibm.com>, stable <stable-AT-vger.kernel.org>, Minchan Kim <minchan-AT-kernel.org>, Andy Lutomirski <luto-AT-kernel.org>, Will Deacon <will-AT-kernel.org>, Peter Zijlstra <peterz-AT-infradead.org>, Nadav Amit <nadav.amit-AT-gmail.com> |
On Mon, Dec 21, 2020 at 11:16 AM Yu Zhao <yuzhao@google.com> wrote: > > Nadav Amit found memory corruptions when running userfaultfd test above. > It seems to me the problem is related to commit 09854ba94c6a ("mm: > do_wp_page() simplification"). Can you please take a look? Thanks. > > TL;DR: it may not safe to make copies of singly mapped (non-COW) pages > when it's locked or has additional ref count because concurrent > clear_soft_dirty or change_pte_range may have removed pte_write but yet > to flush tlb. Hmm. The TLB flush shouldn't actually matter, because anything that changes the writable bit had better be serialized by the page table lock. Yes, we often load the page table value without holding the page table lock (in order to know what we are going to do), but then before we finalize the operation, we then re-check - undet the page table lock - that the value we loaded still matches. But I think I see what *MAY* be going on. The userfaultfd mwriteprotect_range() code takes the mm lock for _reading_. Which means that you can have Thread A Thread B - fault starts. Sees write-protected pte, allocates memory, copies data - userfaultfd makes the regions writable - usefaultfd case writes to the region - userfaultfd makes region non-writable - fault continues, gets the page table lock, sees that the pte is the same, uses old copied data But if this is what's happening, I think it's a userfaultfd bug. I think the mmap_read_lock(dst_mm) in mwriteprotect_range() needs to be a mmap_write_lock(). mprotect() does this right, it looks like userfaultfd does not. You cannot just change the writability of a page willy-nilly without the correct locking. Maybe there are other causes, but this one stands out to me as one possible cause. Comments? Linus