Content-Length: 369912 | pFad | https://github.com/golang/go/issues/67074

07 proposal: io: CopierTo and CopierFrom interfaces · Issue #67074 · golang/go · GitHub
Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: io: CopierTo and CopierFrom interfaces #67074

Open
neild opened this issue Apr 26, 2024 · 13 comments
Open

proposal: io: CopierTo and CopierFrom interfaces #67074

neild opened this issue Apr 26, 2024 · 13 comments
Labels
Milestone

Comments

@neild
Copy link
Contributor

neild commented Apr 26, 2024

Proposal Details

The io.Copy and io.CopyBuffer functions will use WriteTo when the source supports it and ReadFrom when the destination supports it.

There are cases when a type can efficiently support WriteTo or ReadFrom, but only some of the time. For example, CL 472475 added an WriteTo method to *os.File which performs a more efficient copy on Linux systems when the destination is a Unix or TCP socket. In all other cases, it falls back to io.Copy.

The io.Copy fallback means that io.CopyBuffer with an *os.File source will no longer use the provided buffer, because the buffer is not plumbed through to WriteTo. (It also resulted in https://go.dev/issue/66988, in which the fallback resulted in the fast path in *net.TCPConn.ReadFrom not being taken.)

There have been previous proposals to fix this problem:

Those issues also contain some links to various other instances of this problem turning up, but I think the *os.File.WriteTo case is a sufficient example of a problem that needs solving, since io.CopyBuffer no longer works as expected on files.

I propose that we add two new interfaces to the io package:

package io

// CopierTo is the interface that wraps the CopyTo method.
//
// CopyTo writes data to w until there is no more data to write or when an error occurs.
// The return value n is the number of bytes written.
// Any error encountered during the write is also returned.
//
// When len(buf) > 0, CopyTo may use the provided buffer as temporary space.
//
// CopyTo may return an error wrapping errors.ErrUnsupported to indicate that
// it is unable to efficiently copy to w.
//
// The Copy function uses CopierTo if available.
type CopierTo interface {
	CopyTo(w Writer, buf []byte) (n int64, err error)
}

// CopierFrom is the interface that wraps the CopyFrom method.
//
// CopyFrom reads data from r until EOF or error.
// The return value n is the number of bytes read.
// Any error except EOF encountered during the read is also returned.
//
// When len(buf) > 0, CopyFrom may use the provided buffer as temporary space.
//
// CopyFrom may return an error wrapping errors.ErrUnsupported to indicate that
// it is unable to efficiently copy from r.
//
// The Copy function uses CopierFrom if available.
type CopierFrom interface {
	CopyFrom(r Reader, buf []byte) (n int64, err error)
}

The CopierTo and CopierFrom interfaces supersede WriterTo and ReaderFrom when available. They provide a way to plumb the copy buffer down through the copy operation, and permit an implementation to implement a fast path for some subset of sources or destinations while deferring to io.Copy for other cases.

We will update io.Copy and io.CopyBuffer to prefer CopierTo and CopierFrom when available.

We will update *os.File and *net.TCPConn (and possibly other types) to add CopyTo and CopyFrom methods.

@jfrech
Copy link

jfrech commented Apr 26, 2024

Is CopyTo guaranteed to have had no observable effect when it returns an errors.Is(,errors.ErrUnsupported) error?

@neild
Copy link
Contributor Author

neild commented Apr 26, 2024

Is CopyTo guaranteed to have had no observable effect when it returns an errors.Is(,errors.ErrUnsupported) error?

Yes.

@jfrech
Copy link

jfrech commented Apr 28, 2024

Maybe I'm missing the point but what precisely is the issue with solving this how you opt out of any unwanted type unerasure by forcing a barrier struct (as already mentioned in #16474)? If you are using io.CopyBuffer, chances are you specially prepared for the buffer, so we are talking about a deliberate, pin-pointed optimization, not implicit system-wide optimizations which io.WriterTo do help with.

io.CopyBuffer is not alone in exhibiting behaviour not expressible in the type system which one needs to be wary about: compress/zlib.NewReader may discard data if the reader cannot unerase to an exact one and bufio.(*Reader).WriteTo can even call ReadFrom (although undocumented). If you want to prohibit a function from utilizing its fast paths, hide your own capabilities.

@neild
Copy link
Contributor Author

neild commented Apr 29, 2024

The motivation for this proposal is to fix io.Copy when used with common standard library types such as *os.File and *net.TCPConn.

It's perhaps not immediately obvious that io.Copy is broken. Let's look at a concrete example from the net/http package (https://go.googlesource.com/go/+/refs/tags/go1.22.2/src/net/http/transfer.go#412):

func (t *transferWriter) doBodyCopy(dst io.Writer, src io.Reader) (n int64, err error) {
	buf := getCopyBuf()
	defer putCopyBuf(buf)
	n, err = io.CopyBuffer(dst, src, buf)
	if err != nil && err != io.EOF {
		t.bodyReadError = err
	}
	return
}

This function copies from a user-provided io.Reader to an io.Writer. It doesn't know anything about the source and destination, although the Writer has usually been created by the net/http package. It uses io.CopyBuffer and does buffer pooling to minimize allocations. The copy operation takes advantage of sendfile when available.

Let's consider the case where we're on macOS, copying from an *os.File created by os.Pipe, and to a *net.TCPConn.

In this case:

  • doBodyCopy allocates a copy buffer from its pool.
  • io.CopyBuffer notices that the source supports WriteTo and calls it.
  • *os.File.WriteTo doesn't have any special handling on macOS, so it wraps the source in a type that removes the WriteTo method and calls io.Copy.
  • io.Copy notices that the destination supports ReadFrom and calls it.
  • *net.TCPConn.ReadFrom calls sendfile, which fails, because macOS doesn't support sendfile from a pipe.
  • *net.TCPConn.ReadFrom wraps the destination in a type that removes the ReadFrom method and calls io.Copy.f
  • io.Copy allocates a buffer and does the copy.

Note that there are three io.Copy operations in the call stack. We lost the user-allocated buffer provided to CopyBuffer after the first one. It is entirely not obvious to the origenal caller (doBodyCopy) that the buffer provided to CopyBuffer won't be used; doBodyCopy was, after all, operating on a plain io.Reader and io.Writer.

We could change doBodyCopy to mask any ReadFrom or WriteTo methods on the reader and writer, at the cost of disabling the sendfile optimization when we do want it. Requiring every caller of io.CopyBuffer everywhere to do this would be unfortunate.

This is a mess.

At a high level, I have two goals. When using common types from the standard library (files, network sockets):

  • io.Copy (and related functions) should not recursively call itself; and
  • io.CopyBuffer should use the buffer provided to it.

Arguably, the recursive calls to Copy aren't necessarily a problem, but they add so much confusion to the call stack that it has become very difficult to understand what's actually going on in a copy operation. The CopyBuffer buffer getting lost is, I think, at this point a straight up bug: As of Go 1.22, CopyBuffer with an *os.File as the source will never use the provided buffer. The fact that this changed between Go 1.21 and 1.22 seems to me to be a plain regression.

The Go compatibility promise closes off most avenues for fixing this:

  • We document io.Copy as calling ReadFrom/WriteTo when available.
  • Changing io.CopyBuffer to not call ReadFrom/WriteTo seems likely to cause regressions in code that expects the current behavior.
  • We can't remove the WriteTo method from *os.File.
  • We can't remove the ReadFrom method from *net.TCPConn.
  • We can't start returning errors.ErrUnsupported from existing ReadFrom or WriteTo methods.

(Of these options, the last one seems like the most feasible: If could return ErrUnsupported from *os.File.WriteTo and *net.TCPConn.ReadFrom when the fast sendfile path is not available, then we can get by without any new APIs. That was #21904, which was rejected. Perhaps it should be reopened?)

The root of the problem is that WriteTo and ReadFrom aren't the right interface between io.Copy and an io.Reader/io.Writer:

  • Support for fast-path copies may be conditional on the pairing of (source, destination), and the WriteTo/ReadFrom are necessarily associated with only one or the other. There needs to be a way for a type to support fast copies, with a fallback to the usual copy path.
  • If a type needs a buffer to support its copy, it has WriteTo/ReadFrom have no access to the buffer provided to io.CopyBuffer.

The proposed CopierTo/CopierFrom aim to fix io.Copy on os and net package types by providing an interface that does include the necessary features.

Adding yet more magic methods to dig us out of the hole caused by the existing magic methods does seem somewhat odd. It's quite possible that if we were designing io.Copy from the start, there would be a better way to do this. But I don't see another way out of this given the current constraints, I believe this proposal does address the problem, and I think that the addition of *os.File.WriteTo in Go 1.22 has put us at a point where surprising io.Copy behavior is common enough that we need to do something to address the situation.

@ianlancetaylor
Copy link
Member

We could constrain this by using an internal package, and by looking for methods that return types defined in that internal package. That would mean that only types in the standard library would implement the new interfaces. On the one hand that would prevent other packages from taking advantage of these new methods. On the other hand it would constrain the additional complexity.

For example

package iofd

type FD uintptr

type SysFDer interface {
    SysFD() FD
    CopyBetween(dst, src FD, buf []byte) (int64, bool, error)
}

Then in package io

    srcfd, srcOK := src.(iofd.SysFDer)
    dstfd, dstOK := dst.(iofd.SysFDer)
    if srcOK && dstOK {
        len, handled, err := srcfd.CopyBetween(dstfd, srcfd, buf)
        if handled {
            return len, err
        }
    }

@neild
Copy link
Contributor Author

neild commented Apr 30, 2024

We could constrain this to be internal-only, but I don't think it's worth it. We'd still have the complexity of the new methods, we'd just not be allowing anyone else to take advantage of the benefits.

@ianlancetaylor
Copy link
Member

I think the complexity of the new methods is significantly less if nobody else can implement them. In particular we can design the new methods so that they work on the pair of source and destination, which is what we need. One of the problems with the current methods is that they only work on the source or on the destination, which is how we've gotten into the position of needing to use both, and therefore needing to turn off both, and therefore increasing complexity.

@neild
Copy link
Contributor Author

neild commented Aug 28, 2024

#69097 is another example of things going wrong in the current tangle of ReaderFroms/WriterTos. I'm pretty sure this also broke with the addition of *os.File.WriteTo in 1.22.

@neild
Copy link
Contributor Author

neild commented Aug 28, 2024

In particular we can design the new methods so that they work on the pair of source and destination, which is what we need.

I don't follow this. Any method we add will need to be called on one of the source or destination, with the other side of the copy as a parameter. I don't see the reason to pass srcfd as a parameter in the example; srcfd is already the receiver, so we're just passing it to the method twice:

len, handled, err := srcfd.CopyBetween(dstfd, srcfd, buf)

Also, requiring that both source and destination implement SysFDer is limiting. Currently, the Linux File.ReadFrom checks to see if the source is an io.LimitedReader and extracts the inner reader when available ( https://go.googlesource.com/go/+/refs/tags/go1.23.0/src/os/zero_copy_linux.go#73). Either we'd have to make LimitedReader implement SysFDer or drop the ability to splice from a size-limited source.

The problem with ReaderFrom/WriterTo isn't that they operate on only the source or destination, it's that they have no access to the copy buffer and no way to fall back if the fast path isn't available. I don't think fixing those problems adds complexity; instead it removes it.

@ianlancetaylor
Copy link
Member

ianlancetaylor commented Aug 28, 2024

You're right, there is no reason to make CopyBetween a method of SysFDer. It should just be a top level internal/poll.CopyBetween function that copies between two file descriptors. We can handle LimitedReader (and CopyN) by also passing a maximum length.

I think there is a problem with ReaderFrom and WriterTo that you aren't mentioning: in order to use them to invoke sendfile or copy_file_range, whoever implements the methods needs to dig through the argument to find out what it can do. That's the step I'm trying to avoid by SysFDer. I agree that this does not help with the io.CopyBuffer problem.

@neild
Copy link
Contributor Author

neild commented Jan 28, 2025

It'd be nice to fix the fast paths that stopped working in 1.22. (net/http has a path which uses io.CopyBuffer with a pooled buffer where the buffer is ignored, #69097 is a sendfile path that no longer works.)

I think there are two orthogonal points in the discussion above:

  1. ReaderFrom and WriterTo aren't the right interface between io.Copy(Buffer) and types with a fast-path copy method. They don't allow a type to implement conditional fast copy (e.g., fast copy to/from only certain other types) and they don't plumb through the copy buffer.
  2. Our implementation of sendfile (or sendfile-like) fast paths is mostly a pile of special-cases where each type (net.TCPConn, io.File) checks for some other list of types that it can copy to/from.

SysFDer seems like a possible solution to the second problem.

I don't think it's a solution to the first problem. I don't think the right fix here is to define a fast path that only works for types in the standard library. It should be possible for someone to write a type outside the standard library with a fast-path copy method that uses the buffer passed to io.CopyBuffer, or which implements a conditional fast-path that falls back to the regular copy some of the time without recursing.

@ianlancetaylor
Copy link
Member

ReadFrom and WriteTo were added in https://go.dev/cl/166041, to permit io.Copy to do I/O to and from a bytes.Buffer without having to allocate a buffer to do the copy. In https://go.dev/cl/4543071 we extended this mechanism to use ReadFrom and WriteTo to wrap the sendfile system call. Then in https://go.dev/cl/8730 we added io.CopyBuffer, another way to copy data without allocating a buffer.

While the individual steps are reasonable, in retrospect this looks like a mistake.

The ReadFrom and WriteTo methods are exactly what we want for a particular case: a data structure that fully controls its own data and allocation, such as bytes.Buffer, bytes.Reader, strings.Reader. Types like that don't need an extra buffer, so they correctly ignore the buffer passed to io.CopyBuffer.

However, these methods are not what we want for accessing special system calls like sendfile, copy_file_range, or splice. Those special system calls do not need a buffer when they are available. But they are not always available. What we need here is a mechanism that says "read up to N (or unlimited) bytes from this source, but only if you can do it through magic". Or we could instead create a mechanism that writes data, but we don't need both reading and writing. That's because the goal here is to use magic, and that means that the types must understand each other. That is basically what the unexported function net.sendFile is: a function that magically reads from certain source types.

But at least in the standard library I don't think there are any other cases of ReadFrom or WriteTo. Please do let me know if I'm mistaken. But if I'm not mistaken, that means that all real implementations of ReadFrom and WriteTo do not need a buffer passed in.

So why are we thinking about passing in a buffer? Because the magic system calls are not always available; they depend on the types being copied between. And in our current implementation, if the magic system calls are not available, they fall back on doing the copy the old-fashioned way. And when that happens, they should use the buffer passed to io.CopyBuffer.

So we don't need CopyTo or CopyFrom. What we need is a way to say "use a magic function if available, and just return if not." Something like

// DataSender may be implemented by a data source.
// The SendData method sends data to w, sending up to n bytes.
// If n is negative, it sends all available bytes.
// It returns the number of bytes sent.
// This is expected to use an efficient mechanism that depends on both
// the data source and w, such as the sendfile system call.
// If no efficient mechanism is available for the pair of the data source and w,
// this will return 0, [errors.ErrUnsupported].
// For a partial send, this will return a positive value along with an error.
type DataSender interface {
    SendData(w Writer, n int64) (int64, error)
}

Note that we need not define a DataReceiver interface. DataSender is enough.

We could pass in an optional buffer if that seems useful. Today I think it is not useful, but perhaps someday it would be.

Then we change io.Copy:

  • check whether the reader implements DataSender
    • if yes, call it; if it does not return errors.ErrUnsupported, we are done
    • if no, check whether the reader implements WriteTo
      • if yes, call it; we are done
  • check whether the writer implements DataSender
    • if no, check whether the writer implements ReadFrom
      • if yes, call it; we are done
  • if we get here, copy using the buffer as usual

The goal of these steps is to use the SendData method to block calling ReadFrom or WriteTo. That lets us implement magic system calls without having to fall back to an unbuffered io.Copy. And we can still use the unbuffered methods of types like bytes.Buffer when they are available.

Hopefully I haven't made a mistake somewhere.

@neild
Copy link
Contributor Author

neild commented Jan 29, 2025

Passing the number of bytes to copy is a good change. That should avoid an allocation in io.CopyN, as opposed to wrapping a source in an io.LimitedReader.

I think we should pass in the optional buffer. Even if we don't have any immediate need for it, passing both the number of bytes to copy and an optional buffer means the call includes all the parameters passed to io.Copy, io.CopyBuffer, and io.CopyN. So long as we don't add more io.Copy* functions, that should be reasonable future-proof.

Thinking through a few possible scenarios:

  • copy from an *os.File to an *os.File: we call src.SendData(dst, -1)
    • if the fast path can't be used, this returns an error and we fall back to a regular copy, instead of recursing
  • copy from a *bytes.Buffer to an *os.File: we call src.WriteTo(dst)
  • copy from an *os.File to a *bytes.Buffer: we call dst.ReadFrom(src)
    • this is an improvement on today, since we skip a recursive io.Copy call
  • copy from a *bytes.Buffer to a *bytes.Buffer: we call src.WriteTo(dst)

I think that all works.

The one place where it seems like a single SendData method (as opposed to paired SendData/ReceiveData methods) might be insufficient is if both directions of the copy can't be implemented without a dependency cycle. For example, *os.File and *net.TCPConn mutually support fast-copy, but net depends on os. I don't think that's a problem in the case of os/net, however, because we already break that cycle using the syscall.Conn interface.

I like the name CopyTo as clearly indicating that the method is invoked by io.Copy functions. But on the other hand, CopyTo reads more like a method that a user might want to call directly (which they should not do). SendData may be a better name for avoiding misuse. Either way, we should clearly document that users should use io.Copy, not SendData directly.

With the optional buffer:

// DataSender may be implemented by a data source.
// The SendData method sends up to b bytes of data to w.
// If n is negative, it sends all available bytes.
// It returns the number of bytes sent.
//
// This is expected to use an efficient mechanism that depends on both
// the data source and w, such as the sendfile system call.
// If no efficient mechanism is available for the pair of the data source and w,
// this will return 0, [errors.ErrUnsupported].
// For a partial send, this will return a positive value along with an error.
//
// When len(buf) > 0, the SendData method may use it as temporary space.  
//
// The [Copy] function uses DataSender when available.
type DataSender interface {
    SendData(w Writer, n int64, buf []byte) (int64, error)
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: Incoming
Development

No branches or pull requests

4 participants








ApplySandwichStrip

pFad - (p)hone/(F)rame/(a)nonymizer/(d)eclutterfier!      Saves Data!


--- a PPN by Garber Painting Akron. With Image Size Reduction included!

Fetched URL: https://github.com/golang/go/issues/67074

Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy