
support block layer write streams and FDP [LWN.net]

support block layer write streams and FDP

From:  Christoph Hellwig <hch-AT-lst.de>
To:  Jens Axboe <axboe-AT-kernel.dk>
Subject:  support block layer write streams and FDP
Date:  Tue, 19 Nov 2024 13:16:14 +0100
Message-ID:  <20241119121632.1225556-1-hch@lst.de>
Cc:  Christian Brauner <brauner-AT-kernel.org>, Keith Busch <kbusch-AT-kernel.org>, Sagi Grimberg <sagi-AT-grimberg.me>, Kanchan Joshi <joshi.k-AT-samsung.com>, Hui Qi <hui81.qi-AT-samsung.com>, Nitesh Shetty <nj.shetty-AT-samsung.com>, Jan Kara <jack-AT-suse.cz>, Pavel Begunkov <asml.silence-AT-gmail.com>, linux-block-AT-vger.kernel.org, linux-kernel-AT-vger.kernel.org, linux-nvme-AT-lists.infradead.org, linux-fsdevel-AT-vger.kernel.org, io-uring-AT-vger.kernel.org

Hi all,

as a small interruption to the regularly scheduled culture wars, this
series implements a properly layered approach to block layer write
streams.

This is based on Keith's "[PATCHv11 0/9] write hints with nvme fdp and
scsi streams" series, but doesn't bypass the file systems.

The rough idea is that block devices can expose a number of distinct
write streams, and bio submitters can pick one of them.  All bios that
do not pick an explicit write stream get the default one.  On the driver
layer this is wired up to NVMe FDP, but it should also work for SCSI
and NVMe streams if someone cares enough.  On the upper layer the only
consumer right now is the block device node file operations, which
either support an explicit stream selection through io_uring, or
map the old per-inode lifetime hints to streams.
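
To make the fallback path concrete, here is a minimal user-space sketch of
how lifetime hints could be mapped onto a limited number of streams.  It is
not the code from this series: the function name, the convention that
stream 0 means "default", and the policy of falling back to the default
stream when the device has too few streams are all assumptions made for
illustration.  Only the hint values are taken from the existing
RWH_WRITE_LIFE_* constants.

```c
#include <stdint.h>

/*
 * Lifetime hint values.  These mirror the RWH_WRITE_LIFE_* constants
 * from <linux/fcntl.h>; they are redefined here so the sketch builds
 * anywhere.
 */
enum life_hint {
	LIFE_NOT_SET = 0,
	LIFE_NONE    = 1,
	LIFE_SHORT   = 2,
	LIFE_MEDIUM  = 3,
	LIFE_LONG    = 4,
	LIFE_EXTREME = 5,
};

/* Assumption for this sketch: stream 0 is the default stream. */
#define DEFAULT_STREAM 0

/*
 * Hypothetical helper: map a per-inode lifetime hint to a stream
 * number in the range 1..nr_streams, or to the default stream if no
 * hint is set or the device does not expose enough streams.
 */
uint8_t hint_to_stream(enum life_hint hint, uint8_t nr_streams)
{
	uint8_t stream;

	if (hint <= LIFE_NONE)
		return DEFAULT_STREAM;

	/* SHORT -> 1, MEDIUM -> 2, LONG -> 3, EXTREME -> 4 */
	stream = (uint8_t)(hint - LIFE_NONE);
	if (stream > nr_streams)
		return DEFAULT_STREAM;

	return stream;
}
```

With four streams every hint gets its own stream; with two, only SHORT and
MEDIUM do, and the longer-lived hints share the default stream.  Clamping to
the highest available stream would be an equally plausible policy.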

The stream API is designed to also be implementable by other files,
so a statx extension to expose the number of handles and their
granularity is added as well.

This currently does not do the write hint mapping for file systems,
which needs to be done in the file system with careful consideration
of how many of these streams the file system wants to grant to
the application - if any.  It also doesn't support querying how much
has been written to a "granularity unit" aka reclaim unit in NVMe,
which is essential if you want a WAF=1 but apparently not needed for
the current urgent users.
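
To illustrate what tracking a "granularity unit" involves, the following
user-space sketch keeps a host-side estimate of how full the current unit of
one stream is.  All names are hypothetical, and the sketch assumes the unit
is filled strictly by the writes accounted here, starting from an empty
unit.  A real device may switch units for reasons the host cannot see, which
is why a query interface would be needed to do this reliably.

```c
#include <stdint.h>

/* Hypothetical host-side bookkeeping for one write stream. */
struct ru_tracker {
	uint64_t granularity;	/* bytes per unit, as reported by the device */
	uint64_t filled;	/* bytes assumed written to the current unit */
};

/*
 * Account a write of len bytes.  Returns the number of unit boundaries
 * the write is assumed to have crossed.
 */
uint64_t ru_account_write(struct ru_tracker *t, uint64_t len)
{
	uint64_t total, crossed;

	if (t->granularity == 0)
		return 0;	/* no granularity reported, nothing to track */

	total = t->filled + len;
	crossed = total / t->granularity;
	t->filled = total % t->granularity;
	return crossed;
}

/* Bytes that still fit into the current unit under the same assumption. */
uint64_t ru_remaining(const struct ru_tracker *t)
{
	return t->granularity - t->filled;
}
```

An application aiming for a WAF of 1 would want to write and later
invalidate data in whole units, so it needs ru_remaining() to match what
the device actually did rather than what the host guessed.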

The last patch to support write streams on partitions works, but feels
like a not very nice interface to me, and might allow only overly
restricted mappings for some.  It would be great if users that absolutely
require partition support would speak up and help improve it; otherwise
I'd suggest skipping it for the initial submission.

The series is based on Jens' for-next branch as of today, and is also
available as a git tree:

    git://git.infradead.org/users/hch/misc.git block-write-streams

Gitweb:

    http://git.infradead.org/?p=users/hch/misc.git;a=shortlog;...

Diffstat:
 Documentation/ABI/stable/sysfs-block |   15 +++
 block/bdev.c                         |   15 +++
 block/bio.c                          |    2 
 block/blk-core.c                     |    2 
 block/blk-crypto-fallback.c          |    1 
 block/blk-merge.c                    |   39 ++-------
 block/blk-sysfs.c                    |    6 +
 block/bounce.c                       |    1 
 block/fops.c                         |   23 +++++
 block/genhd.c                        |   52 ++++++++++++
 block/partitions/core.c              |    6 -
 drivers/nvme/host/core.c             |  151 ++++++++++++++++++++++++++++++++++-
 drivers/nvme/host/nvme.h             |   10 +-
 fs/stat.c                            |    2 
 include/linux/blk_types.h            |    8 +
 include/linux/blkdev.h               |   16 +++
 include/linux/fs.h                   |    1 
 include/linux/nvme.h                 |   77 +++++++++++++++++
 include/linux/stat.h                 |    2 
 include/uapi/linux/io_uring.h        |    4 
 include/uapi/linux/stat.h            |    7 +
 io_uring/io_uring.c                  |    2 
 io_uring/rw.c                        |    2 
 23 files changed, 405 insertions(+), 39 deletions(-)



Copyright © 2024, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds








