-
Notifications
You must be signed in to change notification settings - Fork 17.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
proposal: io: CopierTo and CopierFrom interfaces #67074
Comments
Is CopyTo guaranteed to have had no observable effect when it returns an errors.Is(,errors.ErrUnsupported) error? |
Yes. |
Maybe I'm missing the point but what precisely is the issue with solving this how you opt out of any unwanted type unerasure by forcing a barrier struct (as already mentioned in #16474)? If you are using io.CopyBuffer, chances are you specially prepared for the buffer, so we are talking about a deliberate, pin-pointed optimization, not implicit system-wide optimizations which io.WriterTo do help with. io.CopyBuffer is not alone in exhibiting behaviour not expressible in the type system which one needs to be wary about: compress/zlib.NewReader may discard data if the reader cannot unerase to an exact one and bufio.(*Reader).WriteTo can even call ReadFrom (although undocumented). If you want to prohibit a function from utilizing its fast paths, hide your own capabilities. |
The motivation for this proposal is to fix It's perhaps not immediately obvious that func (t *transferWriter) doBodyCopy(dst io.Writer, src io.Reader) (n int64, err error) {
buf := getCopyBuf()
defer putCopyBuf(buf)
n, err = io.CopyBuffer(dst, src, buf)
if err != nil && err != io.EOF {
t.bodyReadError = err
}
return
} This function copies from a user-provided Let's consider the case where we're on macOS, copying from an In this case:
Note that there are three We could change This is a mess. At a high level, I have two goals. When using common types from the standard library (files, network sockets):
Arguably, the recursive calls to The Go compatibility promise closes off most avenues for fixing this:
(Of these options, the last one seems like the most feasible: If could return The root of the problem is that
The proposed Adding yet more magic methods to dig us out of the hole caused by the existing magic methods does seem somewhat odd. It's quite possible that if we were designing |
We could constrain this by using an internal package, and by looking for methods that return types defined in that internal package. That would mean that only types in the standard library would implement the new interfaces. On the one hand that would prevent other packages from taking advantage of these new methods. On the other hand it would constrain the additional complexity. For example package iofd
type FD uintptr
type SysFDer interface {
SysFD() FD
CopyBetween(dst, src FD, buf []byte) (int64, bool, error)
} Then in package io srcfd, srcOK := src.(iofd.SysFDer)
dstfd, dstOK := dst.(iofd.SysFDer)
if srcOK && dstOK {
len, handled, err := srcfd.CopyBetween(dstfd, srcfd, buf)
if handled {
return len, err
}
} |
We could constrain this to be internal-only, but I don't think it's worth it. We'd still have the complexity of the new methods, we'd just not be allowing anyone else to take advantage of the benefits. |
I think the complexity of the new methods is significantly less if nobody else can implement them. In particular we can design the new methods so that they work on the pair of source and destination, which is what we need. One of the problems with the current methods is that they only work on the source or on the destination, which is how we've gotten into the position of needing to use both, and therefore needing to turn off both, and therefore increasing complexity. |
#69097 is another example of things going wrong in the current tangle of ReaderFroms/WriterTos. I'm pretty sure this also broke with the addition of *os.File.WriteTo in 1.22. |
I don't follow this. Any method we add will need to be called on one of the source or destination, with the other side of the copy as a parameter. I don't see the reason to pass srcfd as a parameter in the example; srcfd is already the receiver, so we're just passing it to the method twice:
Also, requiring that both source and destination implement SysFDer is limiting. Currently, the Linux File.ReadFrom checks to see if the source is an io.LimitedReader and extracts the inner reader when available ( https://go.googlesource.com/go/+/refs/tags/go1.23.0/src/os/zero_copy_linux.go#73). Either we'd have to make LimitedReader implement SysFDer or drop the ability to splice from a size-limited source. The problem with ReaderFrom/WriterTo isn't that they operate on only the source or destination, it's that they have no access to the copy buffer and no way to fall back if the fast path isn't available. I don't think fixing those problems adds complexity; instead it removes it. |
You're right, there is no reason to make I think there is a problem with |
It'd be nice to fix the fast paths that stopped working in 1.22. (net/http has a path which uses io.CopyBuffer with a pooled buffer where the buffer is ignored, #69097 is a sendfile path that no longer works.) I think there are two orthogonal points in the discussion above:
I don't think it's a solution to the first problem. I don't think the right fix here is to define a fast path that only works for types in the standard library. It should be possible for someone to write a type outside the standard library with a fast-path copy method that uses the buffer passed to |
While the individual steps are reasonable, in retrospect this looks like a mistake. The However, these methods are not what we want for accessing special system calls like But at least in the standard library I don't think there are any other cases of So why are we thinking about passing in a buffer? Because the magic system calls are not always available; they depend on the types being copied between. And in our current implementation, if the magic system calls are not available, they fall back on doing the copy the old-fashioned way. And when that happens, they should use the buffer passed to So we don't need // DataSender may be implemented by a data source.
// The SendData method sends data to w, sending up to n bytes.
// If n is negative, it sends all available bytes.
// It returns the number of bytes sent.
// This is expected to use an efficient mechanism that depends on both
// the data source and w, such as the sendfile system call.
// If no efficient mechanism is available for the pair of the data source and w,
// this will return 0, [errors.ErrUnsupported].
// For a partial send, this will return a positive value along with an error.
type DataSender interface {
SendData(w Writer, n int64) (int64, error)
} Note that we need not define a We could pass in an optional buffer if that seems useful. Today I think it is not useful, but perhaps someday it would be. Then we change
The goal of these steps is to use the Hopefully I haven't made a mistake somewhere. |
Passing the number of bytes to copy is a good change. That should avoid an allocation in I think we should pass in the optional buffer. Even if we don't have any immediate need for it, passing both the number of bytes to copy and an optional buffer means the call includes all the parameters passed to Thinking through a few possible scenarios:
I think that all works. The one place where it seems like a single I like the name With the optional buffer: // DataSender may be implemented by a data source.
// The SendData method sends up to b bytes of data to w.
// If n is negative, it sends all available bytes.
// It returns the number of bytes sent.
//
// This is expected to use an efficient mechanism that depends on both
// the data source and w, such as the sendfile system call.
// If no efficient mechanism is available for the pair of the data source and w,
// this will return 0, [errors.ErrUnsupported].
// For a partial send, this will return a positive value along with an error.
//
// When len(buf) > 0, the SendData method may use it as temporary space.
//
// The [Copy] function uses DataSender when available.
type DataSender interface {
SendData(w Writer, n int64, buf []byte) (int64, error)
} |
Proposal Details
The
io.Copy
andio.CopyBuffer
functions will useWriteTo
when the source supports it andReadFrom
when the destination supports it.There are cases when a type can efficiently support
WriteTo
orReadFrom
, but only some of the time. For example, CL 472475 added anWriteTo
method to*os.File
which performs a more efficient copy on Linux systems when the destination is a Unix or TCP socket. In all other cases, it falls back toio.Copy
.The
io.Copy
fallback means thatio.CopyBuffer
with an*os.File
source will no longer use the provided buffer, because the buffer is not plumbed through toWriteTo
. (It also resulted in https://go.dev/issue/66988, in which the fallback resulted in the fast path in *net.TCPConn.ReadFrom not being taken.)There have been previous proposals to fix this problem:
io.Copy
case.Those issues also contain some links to various other instances of this problem turning up, but I think the
*os.File.WriteTo
case is a sufficient example of a problem that needs solving, sinceio.CopyBuffer
no longer works as expected on files.I propose that we add two new interfaces to the
io
package:The
CopierTo
andCopierFrom
interfaces supersedeWriterTo
andReaderFrom
when available. They provide a way to plumb the copy buffer down through the copy operation, and permit an implementation to implement a fast path for some subset of sources or destinations while deferring toio.Copy
for other cases.We will update
io.Copy
andio.CopyBuffer
to preferCopierTo
andCopierFrom
when available.We will update
*os.File
and*net.TCPConn
(and possibly other types) to addCopyTo
andCopyFrom
methods.The text was updated successfully, but these errors were encountered: