LWN: Comments on "Converting NFSD to use iomap and folios"
https://lwn.net/Articles/936628/
This is a special feed containing comments posted
to the individual LWN article titled "Converting NFSD to use iomap and folios".
Converting NFSD to use iomap and folios
https://lwn.net/Articles/937529/
dgc
<div class="FormattedComment">
<span class="QuotedText">> Does the filesystem know about dirty pages in the page cache? Let's say</span><br>
<span class="QuotedText">> we did a read() from a hole, put a zeroed page in the page cache, then</span><br>
<span class="QuotedText">> stored to it, does page_mkwrite() mark the extent as containing data?</span><br>
<p>
Yes, page_mkwrite() will force filesystem block allocation when the folio over a hole (or shared extent needing COW) is first dirtied. That's the entire point of ->page_mkwrite existing - to allow filesystems to reserve/allocate space and return ENOSPC before the data in the folio is dirtied by userspace.<br>
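<p>
For context, here's a rough sketch of what such a handler looks like for an iomap-based filesystem, modelled on the XFS approach; "my_page_mkwrite" and "my_iomap_ops" are placeholders, not real kernel symbols:<br>
<pre>
/* Sketch of a ->page_mkwrite handler for an iomap-based filesystem. */
#include <linux/fs.h>
#include <linux/iomap.h>
#include <linux/mm.h>

extern const struct iomap_ops my_iomap_ops;	/* placeholder */

static vm_fault_t my_page_mkwrite(struct vm_fault *vmf)
{
	struct inode *inode = file_inode(vmf->vma->vm_file);
	vm_fault_t ret;

	sb_start_pagefault(inode->i_sb);
	file_update_time(vmf->vma->vm_file);

	/*
	 * iomap_page_mkwrite() calls back into the filesystem's
	 * ->iomap_begin method, which is where space is reserved or
	 * allocated and where ENOSPC is returned before the folio
	 * is dirtied.
	 */
	ret = iomap_page_mkwrite(vmf, &my_iomap_ops);

	sb_end_pagefault(inode->i_sb);
	return ret;
}

static const struct vm_operations_struct my_file_vm_ops = {
	.fault		= filemap_fault,
	.map_pages	= filemap_map_pages,
	.page_mkwrite	= my_page_mkwrite,
};
</pre>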
<p>
For XFS, this triggers delayed allocation reservation for the range of the hole in the file being dirtied (or the range of the write() being serviced), which the filesystem then tracks as a delayed allocation extent. This is returned to iomap as iomap->type = IOMAP_DELALLOC. IOWs, the range contains a hole on disk, but space has been reserved and it has dirty cached data on top of it.<br>
<p>
The IOMAP_UNWRITTEN case is different - it represents a specific on-disk extent state, and that may or may not have dirty cached data over it. There is no separate state for "dirty, unwritten" in iomap, hence the need for the page cache lookup. I suspect we could do something internal to filesystems and iomap to track dirty unwritten extent ranges like we do delalloc ranges, but largely that complexity has not been necessary because the dirty ranges are already in the page cache...<br>
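<p>
To make those states concrete, here's a hypothetical helper; the IOMAP_* constants are real, the function is illustrative only:<br>
<pre>
#include <linux/iomap.h>

/*
 * Illustrative only: does this extent type need a page cache lookup
 * to decide whether it holds data?
 */
static bool needs_page_cache_check(const struct iomap *iomap)
{
	switch (iomap->type) {
	case IOMAP_DELALLOC:
		/* hole on disk, space reserved, dirty data cached */
		return false;
	case IOMAP_UNWRITTEN:
		/* on-disk state only; cache may hold dirty data */
		return true;
	default:
		/* IOMAP_HOLE, IOMAP_MAPPED: disk state is the truth */
		return false;
	}
}
</pre>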
</div>
Thu, 06 Jul 2023 23:16:40 +0000
Converting NFSD to use iomap and folios
https://lwn.net/Articles/937437/
willy
<div class="FormattedComment">
<span class="QuotedText">> IMO, the right way to do "sparse reads" is with a sparse iov type, returning {buffer, len} for each data iovec, and {NULL, len} for each hole iovec in the read range. This can be done as a single "atomic" read from the filesystem POV (i.e. a single iomap "sparse read" operation under the i_rwsem) </span><br>
<p>
Does the filesystem know about dirty pages in the page cache? Let's say we did a read() from a hole, put a zeroed page in the page cache, then stored to it, does page_mkwrite() mark the extent as containing data? I'm just looking at the circumstances under which iomap calls mapping_seek_hole_data() and it seems that we do have to ask the page cache whether a page is present in the UNWRITTEN case.<br>
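<p>
For reference, a rough sketch of that lookup, loosely modelled on iomap's SEEK_DATA iteration (the helper name is made up):<br>
<pre>
#include <linux/iomap.h>
#include <linux/pagemap.h>

/*
 * Loosely modelled on iomap's SEEK_DATA handling: for an unwritten
 * extent the on-disk state alone cannot say whether data exists, so
 * the page cache is consulted.
 */
static loff_t seek_data_in_extent(struct inode *inode,
				  const struct iomap *iomap,
				  loff_t pos, loff_t end)
{
	switch (iomap->type) {
	case IOMAP_UNWRITTEN:
		/* returns -ENXIO if no cached folio covers the range */
		return mapping_seek_hole_data(inode->i_mapping, pos,
					      end, SEEK_DATA);
	case IOMAP_HOLE:
		return -ENXIO;	/* definitely no data here */
	default:
		return pos;	/* mapped or delalloc: data starts here */
	}
}
</pre>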
</div>
Thu, 06 Jul 2023 02:00:04 +0000
Converting NFSD to use iomap and folios
https://lwn.net/Articles/937436/
jake
<div class="FormattedComment">
<span class="QuotedText">> I suspect some wires have been crossed here. Reads into holes do not cause filesystems to allocate blocks </span><br>
<span class="QuotedText">> for the holes - only writes into holes will cause that to happen. Reads into holes cause filesystems </span><br>
<span class="QuotedText">> to allocate page cache folios full of zeroes over the ranges - they remain as holes on disk....</span><br>
<p>
Right, Chuck misunderstood you, which is what: <br>
<br>
<span class="QuotedText">> Jan Kara said that it was a misunderstanding; the system will not allocate blocks on disk,</span><br>
<span class="QuotedText">> but it will allocate zero pages in the page cache, which could be avoided using iomap. </span><br>
<p>
was meant to convey. Sorry for the confusion.<br>
<p>
jake<br>
</div>
Thu, 06 Jul 2023 01:20:38 +0000
Converting NFSD to use iomap and folios
https://lwn.net/Articles/937434/
dgc
<div class="FormattedComment">
<span class="QuotedText">> In the past, Dave Chinner had told him that reading an unallocated</span><br>
<span class="QuotedText">> extent (i.e. "hole") in a sparse file will cause the system to allocate</span><br>
<span class="QuotedText">> blocks on disk to hold the hole and fill it in with zeroes; that is not</span><br>
<span class="QuotedText">> something that he wants an NFS read to do, especially for large files. </span><br>
<p>
I suspect some wires have been crossed here. Reads into holes do not cause filesystems to allocate blocks for the holes - only writes into holes will cause that to happen. Reads into holes cause filesystems to allocate page cache folios full of zeroes over the ranges - they remain as holes on disk....<br>
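<p>
That's easy to check from userspace; a quick sketch (error handling elided, assumes the filesystem supports SEEK_DATA):<br>
<pre>
#define _GNU_SOURCE		/* for SEEK_DATA */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	char buf[4096];
	int fd = open("sparse.tmp", O_RDWR | O_CREAT | O_TRUNC, 0600);

	ftruncate(fd, 1 << 20);		/* 1MiB file that is all hole */
	pread(fd, buf, sizeof(buf), 0);	/* read zeroes from the hole */

	/* SEEK_DATA fails with ENXIO: the read allocated nothing */
	if (lseek(fd, 0, SEEK_DATA) < 0)
		perror("no data blocks, as expected");

	close(fd);
	unlink("sparse.tmp");
	return 0;
}
</pre>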
<p>
IMO, the right way to do "sparse reads" is with a sparse iov type, returning {buffer, len} for each data iovec, and {NULL, len} for each hole iovec in the read range. This can be done as a single "atomic" read from the filesystem POV (i.e. a single iomap "sparse read" operation under the i_rwsem) and the iomap extent map iteration would determine if the operation to be performed is "read data into page cache and copy" or "fill out a sparse iov entry" before moving to the next extent map. This will be atomic wrt. truncate, hole punching, buffered writes, etc and so provide the same data atomicity "guarantees" as a single data read operation that filled the page cache with zeroes...<br>
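<p>
Entirely hypothetical, but the iov type might look something like this; nothing of the sort exists in the kernel today:<br>
<pre>
#include <stddef.h>

/* Hypothetical sparse iov: a NULL buffer marks a hole. */
struct sparse_iovec {
	void	*buf;	/* data buffer, or NULL for a hole */
	size_t	len;	/* length of the data or of the hole */
};

/* e.g. 4KiB of data followed by a 1MiB hole in the read range: */
static char databuf[4096];
static struct sparse_iovec result[] = {
	{ databuf, sizeof(databuf) },	/* {buffer, len} */
	{ NULL, 1024 * 1024 },		/* {NULL, len} */
};
</pre>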
<p>
-Dave.<br>
</div>
Wed, 05 Jul 2023 23:36:40 +0000
Converting NFSD to use iomap and folios
https://lwn.net/Articles/937352/
bfields
<div class="FormattedComment">
Yeah, distributed filesystems are very different from network block devices. There's more that could be done to hide the latency. (E.g., it's total pie in the sky, but directory write delegations could allow creating entire directory trees locally, then writing back to the server in the background.) There are lots of hard problems.<br>
<p>
But none of that is what Chuck's talking about here. For something like a simple sequential read of a large file, there's nothing in theory stopping NFS from delivering whatever the network and disk hardware are capable of. So reports of regressions there are interesting.<br>
</div>
Wed, 05 Jul 2023 13:52:09 +0000
Converting NFSD to use iomap and folios
https://lwn.net/Articles/937345/
jlayton
<div class="FormattedComment">
NFS has just as much latency as is required for what it does.<br>
<p>
A disk-based filesystem on a network block device doesn't have to contend with cache coherency. Since there is only one "client", you don't need to worry about whether someone else wrote to the file while you're in the middle of reading it, for example.<br>
<p>
NFS on the other hand does have to deal with that sort of thing.<br>
</div>
Wed, 05 Jul 2023 10:48:55 +0000
Converting NFSD to use iomap and folios
https://lwn.net/Articles/937340/
josh
<div class="FormattedComment">
And yet, NFS has substantially more latency than a network file system necessarily needs to have. As a bound, consider the much lower latency of an ordinary file system stored on a network block device.<br>
</div>
Wed, 05 Jul 2023 10:00:02 +0000
Converting NFSD to use iomap and folios
https://lwn.net/Articles/937331/
jengelh
<div class="FormattedComment">
Not just NFS, *every* network protocol is subject to latency. The problem isn't even so much the protocol itself as the next layer above. wget, cat, tar, they all operate on their arguments in sequential fashion, so you can expect to incur N*RTT wait time. Had those programs read input files in parallel, that would hide the latency, though at the cost of making the programs more complex.<br>
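<p>
As a sketch of the parallel approach, using posix_fadvise() as the prefetch mechanism (whether a given NFS client honours the hint is a separate question):<br>
<pre>
#include <fcntl.h>
#include <unistd.h>

/*
 * Start readahead on every input file first so the round trips
 * overlap, then consume the files sequentially as wget/cat/tar do
 * today. Error handling elided.
 */
void read_all(char *const paths[], int n)
{
	int fds[n];

	for (int i = 0; i < n; i++) {
		fds[i] = open(paths[i], O_RDONLY);
		if (fds[i] >= 0)	/* hint: fetch in the background */
			posix_fadvise(fds[i], 0, 0, POSIX_FADV_WILLNEED);
	}

	for (int i = 0; i < n; i++) {
		char buf[65536];

		if (fds[i] < 0)
			continue;
		while (read(fds[i], buf, sizeof(buf)) > 0)
			;	/* consume sequentially */
		close(fds[i]);
	}
}
</pre>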
</div>
Wed, 05 Jul 2023 06:16:06 +0000
Converting NFSD to use iomap and folios
https://lwn.net/Articles/937313/
josh
<div class="FormattedComment">
<span class="QuotedText">> He has gotten some anecdotal reports that NFS reads from the server are slow</span><br>
<p>
As I understand it, the most common complaint about NFS performance is latency and round-trips.<br>
</div>
Tue, 04 Jul 2023 17:33:43 +0000
Converting NFSD to use iomap and folios
https://lwn.net/Articles/937308/
mrchuck
<div class="FormattedComment">
Thank you Jake, nice summary of a complicated topic.<br>
</div>
Tue, 04 Jul 2023 15:50:34 +0000