Unnecessary dependence in fromfile on the ability to seek into a file #28840

eyalroz · 2025-04-27T12:31:39Z

If, in my script, my input file is a PIPE, e.g. I call my script like so:

cat nums.fp32.bin | ./myscript

and my script contains the line:

arr = numpy.fromfile(sys.stdin, dtype=numpy.dtype('float32'))

then NumPy fails. When I run the script, I get:

$ cat nums.fp32.bin | ./myscript
File "./myscript", line 123, in main
    arr = numpy.fromfile(sys.stdin, dtype=numpy.dtype('float32'))
OSError: obtaining file position failed

Why doesn't NumPy fall back on reading individual values, without knowing in advance how many there are?
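The root cause can be seen without NumPy at all: a pipe has no file position. A minimal sketch, assuming a POSIX system, where os.pipe stands in for the "cat nums.fp32.bin |" in the report; pipe_is_seekable is a hypothetical helper name used only for illustration.

```python
import os

def pipe_is_seekable():
    """Hypothetical helper: create an OS pipe and check whether its read end
    supports seek/tell, then read its contents sequentially."""
    r, w = os.pipe()
    os.write(w, b"\x00" * 8)   # 8 bytes = two float32 values
    os.close(w)                # close the write end so the reader sees EOF
    with os.fdopen(r, "rb") as f:
        seekable = f.seekable()
        try:
            f.tell()           # asking for a file position, as fromfile does
            tell_ok = True
        except OSError:        # io.UnsupportedOperation also subclasses OSError
            tell_ok = False
        payload = f.read()     # plain sequential reading still works
    return seekable, tell_ok, len(payload)

print(pipe_is_seekable())  # (False, False, 8) on Linux
```

Sequential reading succeeds even though the position query fails, which is the gap the question is pointing at.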


seberg · 2025-04-28T07:29:38Z

That would require memory resizing, etc., which fromfile doesn't do. I suppose it could do it, but I see no downside to reading via: data = raw_reading; arr = np.frombuffer(data, dtype='f') for binary data.
(For non-binary reads, np.loadtxt should be superior anyway.)
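A minimal sketch of that suggestion for the binary case in the report, assuming the stream holds float32 values in native byte order. read_float32_stream is a hypothetical name, and raw_reading is filled in here with a plain read() to EOF. In the original script it would be called as read_float32_stream(sys.stdin.buffer), the binary layer of stdin, because sys.stdin itself yields str rather than bytes.

```python
import io
import numpy as np

def read_float32_stream(stream):
    """Hypothetical helper: read all remaining bytes from a binary stream
    and view them as float32. Only sequential reads are used, so it works
    on pipes; seek() and tell() are never called."""
    data = stream.read()                          # bytes, read until EOF
    return np.frombuffer(data, dtype=np.float32)  # wraps `data` without copying

# Usage with an in-memory stream standing in for sys.stdin.buffer:
demo = io.BytesIO(np.array([1.0, 2.5, -3.0], dtype=np.float32).tobytes())
print(read_float32_stream(demo))
```

The returned array is read-only, since it views an immutable bytes object; call .copy() if it must be modified. np.frombuffer raises ValueError if the byte count is not a multiple of the item size.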

eyalroz · 2025-04-28T07:48:45Z

Well, it seems - naively - that reading raw, then calling frombuffer, would require twice the memory, at least temporarily... but regardless, the point is that it's not justified for NumPy to fail. That is, the fact that the user of NumPy could have gone to the effort of checking whether the file is seekable and writing something different is not reason enough for the library itself to refuse to perform the read.

seberg · 2025-04-28T08:06:05Z

> Well, it seems - naively - that reading raw, then calling frombuffer

Just for the record (not 100% sure you are aware): there is no additional copy involved.
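To make the "no additional copy" point concrete: np.frombuffer wraps the existing buffer instead of copying it. A small sketch; a bytearray is used here only because, unlike bytes, it can be mutated, which makes the shared memory visible.

```python
import numpy as np

# A mutable buffer holding three float32 values.
buf = bytearray(np.array([1.0, 2.0, 3.0], dtype=np.float32).tobytes())
arr = np.frombuffer(buf, dtype=np.float32)

print(arr.flags.owndata)   # False: the array does not own its memory

# Overwrite the first 4 bytes of the underlying buffer (same size, so allowed)...
buf[0:4] = np.float32(42.0).tobytes()
# ...and the change shows up through the array, so no copy was made.
print(arr[0])              # 42.0
```

So the peak memory is the single buffer returned by the read, not two copies of it.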

eyalroz · 2025-04-28T08:09:36Z

> Just for the record (not 100% sure you are aware): there is no additional copy involved.

Ok, that's a nice optimization, kudos... I'm not a Pythonista, so I wouldn't know.

However, in that case, why not just write the few lines of raw reading and a frombuffer invocation for when you've discovered that the file is unseekable? Based on what you've said, it seems like little effort on your part; it rounds out the API; and it doesn't fill your codebase with single-use functionality. Seems like a win.
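A sketch of the proposed fallback, written as a user-side wrapper. fromfile_or_stream is a hypothetical name and is not part of NumPy; the float32 default only mirrors the report. It calls np.fromfile when the object is backed by a seekable OS file, and otherwise reads sequentially and wraps the bytes with np.frombuffer.

```python
import io
import numpy as np

def fromfile_or_stream(f, dtype=np.float32, count=-1):
    """Hypothetical wrapper (not a NumPy API): np.fromfile for seekable,
    descriptor-backed files; sequential read + np.frombuffer otherwise."""
    dtype = np.dtype(dtype)
    try:
        f.fileno()                      # np.fromfile needs a real file descriptor
        seekable = f.seekable()         # pipes and sockets report False here
    except (AttributeError, OSError):   # e.g. io.BytesIO has no fileno
        seekable = False
    if seekable:
        return np.fromfile(f, dtype=dtype, count=count)
    # Fallback: read only what is needed (everything when count < 0).
    data = f.read(-1 if count < 0 else count * dtype.itemsize)
    # Ignore a trailing partial item rather than raising.
    return np.frombuffer(data, dtype=dtype, count=len(data) // dtype.itemsize)
```

For the script in the report, the call would be fromfile_or_stream(sys.stdin.buffer). The fallback result is read-only, for the same reason as any frombuffer view over bytes.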

aureliobarbosa · 2025-05-14T20:11:50Z

Why not just improve the documentation by clearly stating the file is a raw binary file and not a buffered file?

I think the same issue occurs with numpy.rec.fromfile, but there the documentation clearly states that "The file object must support random access (i.e. it must have tell and seek methods).", as shown below:

numpy/numpy/_core/records.py, lines 837 to 848 in d32cf93:

    @set_module("numpy.rec")
    def fromfile(fd, dtype=None, shape=None, offset=0, formats=None,
                 names=None, titles=None, aligned=False, byteorder=None):
        """Create an array from binary file data

        Parameters
        ----------
        fd : str or file type
            If file is a string or a path-like object then that file is opened,
            else it is assumed to be a file object. The file object must
            support random access (i.e. it must have tell and seek methods).
        dtype : data-type, optional

This would avoid confusion and close the issue. If a maintainer agrees, I can do the PR.

Otherwise, it would be better to tag this issue as an enhancement.

seberg · 2025-05-14T21:29:13Z

> Why not just improve the documentation by clearly stating the file is a raw binary file and not a buffered file?

I think that sounds great. It doesn't mean we can't ever extend it (although if we point to alternatives, I am not sure that is necessary).

aureliobarbosa added a commit to aureliobarbosa/numpy that referenced this issue May 15, 2025

DOC: improves np.fromfile file description (numpy#28840)

94039f3

aureliobarbosa mentioned this issue May 15, 2025

DOC: improves np.fromfile file description (#28840) #28979

Open
