[Filesystem] Add readFileInChunks method to read files in fixed-size chunks #60916

Status: Open (1 commit, base branch: 7.4)

Conversation

@santysisi (Contributor) commented Jun 27, 2025

| Q | A |
| --- | --- |
| Branch? | 7.4 |
| Bug fix? | no |
| New feature? | yes |
| Deprecations? | no |
| Issues | no |
| License | MIT |

Description

This PR introduces a new readFileInChunks() method to the Filesystem component, which provides a memory-efficient way to read large files by yielding fixed-size chunks of data.

Motivation

Reading large files all at once can consume excessive memory and potentially kill the process, especially in constrained environments. This method avoids that by yielding smaller chunks, making it safer and more efficient for large file handling.

Example

foreach ($filesystem->readFileInChunks('/path/to/large-file.txt') as $chunk) {
    // Process $chunk (string of up to 8192 bytes by default)
}
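
For readers unfamiliar with the technique, here is a minimal standalone sketch of chunked reading with a generator. It is not the PR's implementation: the free function `readInChunks()` is a hypothetical name, and it throws plain SPL exceptions rather than the component's `IOException`.

```php
<?php

/**
 * Hypothetical helper (not the PR's code): lazily reads a file in chunks
 * of at most $size bytes, so only one chunk is held in memory at a time.
 *
 * @return \Generator<int, string>
 */
function readInChunks(string $filename, int $size = 8192): \Generator
{
    if ($size < 1) {
        throw new \InvalidArgumentException('Chunk size must be at least 1 byte.');
    }

    $handle = @fopen($filename, 'rb');
    if (false === $handle) {
        throw new \RuntimeException(sprintf('Cannot open "%s".', $filename));
    }

    try {
        while (!feof($handle)) {
            $chunk = fread($handle, $size);
            if (false === $chunk) {
                throw new \RuntimeException(sprintf('Cannot read "%s".', $filename));
            }
            // A read at the exact end of the file may return an empty string; skip it.
            if ('' !== $chunk) {
                yield $chunk;
            }
        }
    } finally {
        // Runs when iteration finishes, fails, or the generator is destroyed early.
        fclose($handle);
    }
}
```

Because the function body contains `yield`, nothing in it runs until the caller starts iterating, which is the behavior discussed in the review comments further down.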

@carsonbot carsonbot added this to the 7.4 milestone Jun 27, 2025
@santysisi santysisi force-pushed the feature/read-file-in-chunks branch 2 times, most recently from 33f31eb to 5516edd Compare June 27, 2025 02:34
$chunks .= $chunk;
}

$this->assertSame($expectedContent, $chunks);
Contributor

Should there also be assertions/test cases about the chunk size?
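
One way such an assertion could look is sketched below in plain PHP, without PHPUnit. It assumes the intended contract is that every chunk except the last is exactly `$size` bytes and the last is between 1 and `$size` bytes; the PR does not state this explicitly. The function name `chunkSizesAreValid()` is hypothetical, and `str_split()` stands in for the method under test.

```php
<?php

/**
 * Hypothetical check (not part of the PR): verifies the assumed chunking
 * contract for any iterable of strings.
 */
function chunkSizesAreValid(iterable $chunks, int $size, int $expectedTotal): bool
{
    $lengths = [];
    foreach ($chunks as $chunk) {
        $lengths[] = \strlen($chunk);
    }

    // The chunks together must account for every byte of the source.
    if (array_sum($lengths) !== $expectedTotal) {
        return false;
    }

    // The last chunk may be short, but never empty or oversized.
    $last = array_pop($lengths);
    if (null !== $last && ($last < 1 || $last > $size)) {
        return false;
    }

    // Every remaining chunk must be exactly $size bytes.
    foreach ($lengths as $length) {
        if ($length !== $size) {
            return false;
        }
    }

    return true;
}

$content = str_repeat('x', 20);
var_dump(chunkSizesAreValid(str_split($content, 8), 8, 20));         // bool(true): 8 + 8 + 4
var_dump(chunkSizesAreValid(['xxxxxxxxxxxx', 'xxxxxxxx'], 8, 20));   // bool(false): first chunk is 12 bytes
```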

*
* @param string $filename The full path to the file
*
* @return iterable<string> Yields file content as strings in chunks
Member

Using "yields" isn't accurate, given that the return type is iterable (which could be a plain array, for example). You may either change the return type to \Generator (I personally prefer iterable because it's more flexible) or avoid the word "yield" here.
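
To illustrate the point, the sketch below shows two functions that both satisfy an `iterable` return type, only one of which yields. The function names are hypothetical and unrelated to the PR's code.

```php
<?php

// Eager: returns a plain array, which is a valid `iterable`.
function chunksAsArray(string $data, int $size): iterable
{
    return str_split($data, $size);
}

// Lazy: the `yield` makes this a generator function, also a valid `iterable`.
function chunksAsGenerator(string $data, int $size): iterable
{
    foreach (str_split($data, $size) as $chunk) {
        yield $chunk;
    }
}

$array = chunksAsArray('abcdef', 4);
$generator = chunksAsGenerator('abcdef', 4);

var_dump(is_array($array));                         // bool(true)
var_dump($generator instanceof \Generator);         // bool(true)
var_dump(iterator_to_array($generator) === $array); // bool(true): both produce ['abcd', 'ef']
```

A caller that only uses `foreach` cannot tell the two apart, which is why a docblock on an `iterable` return type cannot promise that values are yielded.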

@santysisi santysisi force-pushed the feature/read-file-in-chunks branch from 5516edd to eb16ee7 Compare June 27, 2025 12:59
@santysisi
Contributor Author

Hi everyone, thanks for your suggestions!
I've made the changes accordingly, let me know if there's anything else you'd like me to adjust.

*
* @return \Generator<string> Yields file content as strings in chunks
*
* @throws IOException If the file cannot be opened or read, or if it's a directory
Member

This is wrong in the current implementation. The exception is not thrown when the method is called, but when iteration starts (your test hides that by using iterator_to_array).
To perform the validation synchronously (which makes error handling and documentation easier), you would need to move the iteration to a private method that defines the generator, while the validation runs in the public method before it calls that private method. We use that approach in symfony/cache, for instance, where we are required to validate keys synchronously to respect PSR-6.
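
The difference between the two approaches can be sketched as follows. This is an illustration under assumptions, not the PR's code or symfony/cache's: `ChunkReader`, `doRead()`, and `lazyValidation()` are hypothetical names, and a plain `\RuntimeException` stands in for `IOException`.

```php
<?php

final class ChunkReader // hypothetical class, not Symfony's Filesystem
{
    public function readFileInChunks(string $filename, int $size = 8192): iterable
    {
        // This method contains no `yield`, so it is an ordinary function
        // and the validation runs as soon as it is called.
        if (is_dir($filename) || !is_readable($filename)) {
            throw new \RuntimeException(sprintf('Cannot read "%s".', $filename));
        }

        return $this->doRead($filename, $size);
    }

    private function doRead(string $filename, int $size): \Generator
    {
        $handle = fopen($filename, 'rb');
        if (false === $handle) {
            throw new \RuntimeException(sprintf('Cannot open "%s".', $filename));
        }

        try {
            while (!feof($handle)) {
                $chunk = fread($handle, $size);
                if (false === $chunk || '' === $chunk) {
                    break;
                }
                yield $chunk;
            }
        } finally {
            fclose($handle);
        }
    }
}

// For contrast: validation inside a generator function is deferred.
function lazyValidation(string $filename): \Generator
{
    if (!is_readable($filename)) {
        throw new \RuntimeException('Not readable.');
    }
    yield 'unused';
}

$missing = sys_get_temp_dir().'/does-not-exist-'.uniqid();

$generator = lazyValidation($missing); // no exception here
try {
    $generator->current();             // the body only starts running now
} catch (\RuntimeException $e) {
    echo "lazy: thrown on first iteration\n";
}

try {
    (new ChunkReader())->readFileInChunks($missing);
} catch (\RuntimeException $e) {
    echo "eager: thrown on call\n";
}
```

With the split version, a caller can wrap the method call itself in `try`/`catch` and does not need to guard the `foreach` loop against validation errors.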

*
* @throws IOException If the file cannot be opened or read, or if it's a directory
*/
public function readFileInChunks(string $filename, int $size = 8192): \Generator
Member

I think the return type should be either iterable or \Traversable (if we want to guarantee that it is not returning an array) rather than \Generator, as it gives us more flexibility to refactor the implementation in the future (widening a return type is a BC break)
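
A small sketch of the flexibility being described, using hypothetical names (`ChunkSource`, `GeneratorSource`, `IteratorSource`) that do not appear in the PR: with `\Traversable` as the declared type, the implementation can switch between a generator and another iterator without affecting callers that only iterate.

```php
<?php

interface ChunkSource // hypothetical interface for illustration
{
    public function chunks(): \Traversable;
}

final class GeneratorSource implements ChunkSource
{
    public function chunks(): \Traversable
    {
        // Generator-based implementation.
        yield 'ab';
        yield 'cd';
    }
}

final class IteratorSource implements ChunkSource
{
    public function chunks(): \Traversable
    {
        // Non-generator implementation behind the same declared type.
        return new \ArrayIterator(['ab', 'cd']);
    }
}

function concat(ChunkSource $source): string
{
    $out = '';
    foreach ($source->chunks() as $chunk) {
        $out .= $chunk;
    }

    return $out;
}

var_dump(concat(new GeneratorSource()) === concat(new IteratorSource())); // bool(true): both give 'abcd'
```

Had the method been declared as returning `\Generator`, callers could legitimately rely on generator-only methods such as `send()`, and moving to `IteratorSource` later would break them.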

@santysisi santysisi force-pushed the feature/read-file-in-chunks branch from eb16ee7 to b06ce7a Compare June 28, 2025 16:34
@santysisi
Contributor Author

Hi @stof, thanks for your suggestion!
I've made the changes accordingly, let me know if there's anything else you'd like me to adjust.

@santysisi santysisi force-pushed the feature/read-file-in-chunks branch from b06ce7a to ea7570e Compare June 28, 2025 17:36
5 participants