Bug report
Bug description:
For a more detailed description, please see #136601.
I have found a bug that causes TAR file parsing to end prematurely for very large sparse files. The computed offset of the next TAR header is off by one 512 B block.
The problem is the recomputation of the next TAR offset when the PAX header contains a size key that overrides the overflowed (> 8 GiB) TAR size:
Lines 1562 to 1569 in 47b01da
The problem is that next.offset_data is used for this recomputation even though next.offset_data gets overwritten in _proc_gnusparse_10:
Line 1612 in 47b01da
This leads to the next TAR header offset being off by the number of blocks it takes to store the sparse map.
But maybe I am wrong and have overlooked something. I can say that this fixes it for my test case:
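The arithmetic can be checked against the values printed by the reproducer in this report. This is only a sketch: `block` mirrors the rounding done by `TarInfo._block`, the variable names are made up for illustration, and it assumes that the sparse map occupies exactly one block in this archive and that the reported size counts that block.

```python
BLOCKSIZE = 512  # tar block size, same value as in Lib/tarfile.py


def block(count):
    """Round count up to a multiple of BLOCKSIZE (mirrors TarInfo._block)."""
    blocks, remainder = divmod(count, BLOCKSIZE)
    if remainder:
        blocks += 1
    return blocks * BLOCKSIZE


# Values printed by the reproducer for the member "sparse".
segments = [(0, 1073741824), (1084227584, 8579448836), (9663676420, 0)]
offset_data_after_map = 2048  # tarInfo.offset_data, already moved past the sparse map
stored_size = 9653191172      # tarInfo.size

# Assumption: the sparse map takes exactly one block in this archive.
sparse_map_bytes = BLOCKSIZE
# The reported size equals the stored segment bytes plus that one map block.
assert sum(length for _, length in segments) + sparse_map_bytes == stored_size

# Where the stored data (map included) really starts.
data_start = offset_data_after_map - sparse_map_bytes

expected_next_header = data_start + block(stored_size)
current_next_header = offset_data_after_map + block(stored_size)
patched_next_header = offset_data_after_map + block(stored_size) - BLOCKSIZE

print(current_next_header - expected_next_header)   # 512
print(patched_next_header == expected_next_header)  # True
```

Under these assumptions the current computation overshoots by exactly the size of the sparse map, so a map spanning several blocks would presumably shift the result by more than one block.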
diff --git a/Lib/tarfile.py b/Lib/tarfile.py
index 068aa13ed7..7f3e62f5a2 100644
--- a/Lib/tarfile.py
+++ b/Lib/tarfile.py
@@ -1565,7 +1565,7 @@ def _proc_pax(self, tarfile):
             # header starts.
             offset = next.offset_data
             if next.isreg() or next.type not in SUPPORTED_TYPES:
-                offset += next._block(next.size)
+                offset += next._block(next.size) - BLOCKSIZE
             tarfile.offset = offset

         return next
Minimal reproducer (tested on EXT4 with GNU tar 1.35):
echo bar > foo
echo bar > sparse
fallocate -l 9G sparse
echo bar >> sparse
fallocate --punch-hole -o 1G -l 10M sparse
tar --numeric-owner --format=pax --sparse-version=1.0 -cSf sparse.tar sparse foo
ls -la sparse.tar
# -rw-rw-r-- 1 user user 9663682560 Jul 13 14:14 sparse.tar
tar tvlf sparse.tar
# -rw-rw-r-- 1000/1000 9663676420 2025-07-13 14:13 sparse
# -rw-rw-r-- 1000/1000 4 2025-07-13 14:11 foo
python3 -c 'import sys, tarfile;
[print(tarInfo.sparse, tarInfo.offset, tarInfo.offset_data, tarInfo.size, tarInfo.name)
for tarInfo in tarfile.open(sys.argv[1])]' sparse.tar
# [(0, 1073741824), (1084227584, 8579448836), (9663676420, 0)] 0 2048 9653191172 sparse
# -> foo is missing!
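The same check can be written as a small script that reports which expected members are missing. This is a sketch only: `member_names` and `missing_members` are hypothetical helpers, not part of `tarfile`.

```python
import tarfile


def member_names(path):
    """Return the member names tarfile yields when iterating the archive."""
    with tarfile.open(path) as archive:
        return [info.name for info in archive]


def missing_members(path, expected):
    """Return the expected names that tarfile did not find in the archive."""
    found = set(member_names(path))
    return [name for name in expected if name not in found]
```

For the archive above, `missing_members("sparse.tar", ["sparse", "foo"])` should return an empty list once the offset is computed correctly.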
cat sparse.tar | xz -9 | zstd -19 | base64
Reproducer file sparse-file-larger-than-8GiB-followed-by-normal-file.tar.xz.zst as base64:
cat <<EOF | base64 -d | zstd -d > sparse-file-larger-than-8GiB-followed-by-normal-file.tar.xz
KLUv/QRojBIA1CP9N3pYWgAABObWtEYCACEBHAAAABDPWMz//5wCcV0AFwvGh5JaO6ePxyUOuA/z
XtE/5U/vyT1WUwqPhMr1HTeZeJyWILwrrtDwH0eKx6KKGcU7D2aYidf/9bCtFMcWp8KxDA1FLF58
w9bO4J+eDKd9QfIZFPCutpNB91dMk9bSVazx9pUcWEWn2r0SWsv1BtSYmVDmdKaMdGC/Epx8bcRA
nm5Joy2Tgi3O7VouoCAqha+1YYNOQyyB4sG+tDbfLGdW6fyZMztJ/lRFQwtlFpDLHGFpia92kkke
+2a/mwMvPc58aiT5X56QuH2mw1OhsrBKnbYYnT89BJjyAh2GTOeDbtZ/lLDGwhvxkXlnCm/M8Qiq
fUGfqAjnBeikNY2nodSBFo8YQh+636fk9xfuTQ3kKQ8qEWa613HftzHJ/X/ha1bKD91T/SPTCgd/
rhyvFtn8FBBiUS7UayidinQBNmGebczIaRsKUQKoffUTC9EbCrRXDQjQMjfDyo7N/eDIxD7jBImH
Dv8Qk/hxeFn4C83/lShGD6n8fN77mjAuVsCPhfODgcBlxCVT+PWRNjEFpbDub8FwTUcM0ZERqq1g
HbrOsScYXFmG6WZSWL7pdqxZ5OVbBQj5x9qt/PtSK3TNHlsgQvndUz34KWQJO4DLKmzftTvwxL0u
X6oPPktmQpAT+5I61gCf/xABKwDsc1On/b6ufDEan7eNMW5wnqcjX+woy4XRlZiKfiqR8id19xnA
BphNmP3Yr9WQD1EPP7IEADz8NCsncIBOR5aC/hM+FaZUAAAAAQD9/778uQYCRAAAAAEA/f85AAJE
AAAAAQD9/zkAAkQAAAABAP3/OQACRAAAAAEA/f85AAJEAAAAAQD9/zkAAkQAAAABAP3/OQACRAAA
AAEA/f85AAJEAAAAAQD9/zkAAh0GAIQLkODIaNUNHQG+Ib0xB5201x9u5Typk+S1zSY18D/tc2o+
BXKM/RM9v6MTQoFntxwNm0So6CELgft8dinBPFJBg583tJn+q69PwBnThZQjYTzvNhv0fkxX4Gjm
SgwnOrb7GU5pc2qtcrNCcHrPaNkQicmkdyzESbMAA8S2zfCiJIzpnN25EroA08/3fFWQ44Jfrake
AIiPdXLNPTRNAAGY3VWAsID7IwAAdKr/2BQXOzADAAAAAARZWgIAG0DNWVsOgERj+N4=
EOF
CPython versions tested on:
CPython main branch
Operating systems tested on:
Linux