tarfile indeterminate TarInfo.size when PAX headers contain size and GNU.sparse.realsize keys at the same time #136601

@mxmlnkn

Bug report

Bug description:

Hello,

I am currently debugging this issue.

I have noticed that the bug can be reproduced when the problematic file is truncated to 9 GiB, but it does not happen when the file is truncated to 8 GiB.

The problem seems to be that the offset of the next member is computed incorrectly. It seems to point 512 B past the correct TAR header, which, in this case, lands in the data for the extended attributes, such as 30 mtime=1752348[...].

One of the differences seems to be this code part, which is not hit for the working case:

cpython/Lib/tarfile.py

Lines 1562 to 1569 in 47b01da

if "size" in pax_headers:
    # If the extended header replaces the size field,
    # we need to recalculate the offset where the next
    # header starts.
    offset = next.offset_data
    if next.isreg() or next.type not in SUPPORTED_TYPES:
        offset += next._block(next.size)
    tarfile.offset = offset
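
To make the dependency on the size value concrete, here is a minimal standalone sketch of the offset arithmetic in the excerpt above. It is not the tarfile code itself: `round_up_to_block` and `next_header_offset` are hypothetical helper names, the rounding is assumed to mirror what `next._block()` does with 512-byte blocks, and `offset_data` is a made-up value. The two sizes are the ones from the PAX headers quoted further down.

```python
BLOCKSIZE = 512  # tar block size in bytes (same value as tarfile.BLOCKSIZE)


def round_up_to_block(count):
    """Round a byte count up to the next multiple of BLOCKSIZE."""
    blocks, remainder = divmod(count, BLOCKSIZE)
    if remainder:
        blocks += 1
    return blocks * BLOCKSIZE


def next_header_offset(offset_data, size):
    """Offset of the next header, given where the member data starts."""
    return offset_data + round_up_to_block(size)


offset_data = 1536  # hypothetical data offset, for illustration only

# The two candidate values for TarInfo.size in the failing archive.
with_pax_size = next_header_offset(offset_data, 9602318848)
with_realsize = next_header_offset(offset_data, 9663676416)

# Whichever value ends up in TarInfo.size decides where the reader
# looks for the next header.
print(with_realsize - with_pax_size)
```

Under these assumptions, the two candidate sizes place the next header tens of megabytes apart, so the value that ends up in `size` matters for everything that follows the member.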

While looking into the line above, i.e., into _apply_pax_info, I noticed that there is no definite order for applying the size even though it can appear multiple times!

cpython/Lib/tarfile.py

Lines 1615 to 1634 in 47b01da

def _apply_pax_info(self, pax_headers, encoding, errors):
    """Replace fields with supplemental information from a previous
       pax extended or global header.
    """
    for keyword, value in pax_headers.items():
        if keyword == "GNU.sparse.name":
            setattr(self, "path", value)
        elif keyword == "GNU.sparse.size":
            setattr(self, "size", int(value))
        elif keyword == "GNU.sparse.realsize":
            setattr(self, "size", int(value))
        elif keyword in PAX_FIELDS:
            if keyword in PAX_NUMBER_FIELDS:
                try:
                    value = PAX_NUMBER_FIELDS[keyword](value)
                except ValueError:
                    value = 0
            if keyword == "path":
                value = value.rstrip("/")
            setattr(self, keyword, value)

In the non-working case, the PAX headers look like this:

{'GNU.sparse.major': '1',
 'GNU.sparse.minor': '0',
 'GNU.sparse.name': 'userdata',
 'GNU.sparse.realsize': '9663676416',
 'atime': '1752349406.975921575',
 'ctime': '1752349534.57652562',
 'mtime': '1752349534.57652562',
 'size': '9602318848'}

I.e., the size member first gets set to GNU.sparse.realsize and then to size. The debug output looks like this:

[_apply_pax_info] SET SIZE to: 9663676416 from key: GNU.sparse.realsize
[_apply_pax_info] SET SIZE to: 9602318848 from key: size
[_apply_pax_info] SET key to: 1752349534.5765257 from key: mtime

Is it specified that the order of the PAX headers must always be this way? Else, one might just as well encounter it like this:

{'atime': '1752349406.975921575',
 'ctime': '1752349534.57652562',
 'mtime': '1752349534.57652562',
 'size': '9602318848',
 'GNU.sparse.major': '1',
 'GNU.sparse.minor': '0',
 'GNU.sparse.name': 'userdata',
 'GNU.sparse.realsize': '9663676416'}

and either one of these orders would be a bug.
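
The order dependence can be shown without any archive at all. The following is a simplified sketch, not the real method: `apply_size_keys` is a hypothetical stand-in that keeps only the size-related branches of `_apply_pax_info` and returns the value that would end up in `TarInfo.size`.

```python
def apply_size_keys(pax_headers):
    """Mimic only the size-related branches of _apply_pax_info.

    Hypothetical helper: every matching key overwrites the previous
    value, so the last one in iteration order wins.
    """
    size = None
    for keyword, value in pax_headers.items():
        if keyword in ("GNU.sparse.size", "GNU.sparse.realsize", "size"):
            size = int(value)
    return size


# Same keys and values as in the failing archive, in both orders.
realsize_first = {"GNU.sparse.realsize": "9663676416", "size": "9602318848"}
size_first = {"size": "9602318848", "GNU.sparse.realsize": "9663676416"}

print(apply_size_keys(realsize_first))  # 9602318848
print(apply_size_keys(size_first))      # 9663676416
```

Because Python dicts preserve insertion order, the result depends solely on the order in which the keys were stored in the dict.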

The working case does not have this ambiguity:

{'GNU.sparse.major': '1',
 'GNU.sparse.minor': '0',
 'GNU.sparse.name': 'userdata',
 'GNU.sparse.realsize': '8589934592',
 'atime': '1752349538.445543898',
 'ctime': '1752351104.53673501',
 'mtime': '1752351104.53673501'}

The debug output looks like this:

[_apply_pax_info] SET SIZE to: 8589934592 from key: GNU.sparse.realsize
[_apply_pax_info] SET key to: 1752351104.536735 from key: mtime

I.e., even if there is no ordering problem, the TarInfo.size member already has different semantics: in one case it contains GNU.sparse.realsize and in the other it contains [PAXHeader.]size.
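
One way to picture the two meanings is to keep them in separate values. The sketch below is only an illustration of that distinction and not a proposed patch for tarfile: `split_sizes` is a hypothetical helper, and `header_size` stands for whatever size the plain TAR header carries (the value used in the example is made up).

```python
def split_sizes(pax_headers, header_size):
    """Return (stored_size, real_size) independent of key order.

    Hypothetical helper. stored_size is taken from the PAX "size" key,
    falling back to the size from the plain TAR header. real_size is
    taken from the GNU sparse keys, falling back to stored_size.
    """
    stored = int(pax_headers.get("size", header_size))
    real = pax_headers.get("GNU.sparse.realsize",
                           pax_headers.get("GNU.sparse.size"))
    real = int(real) if real is not None else stored
    return stored, real


failing = {"GNU.sparse.realsize": "9663676416", "size": "9602318848"}
failing_reordered = dict(reversed(list(failing.items())))
working = {"GNU.sparse.realsize": "8589934592"}

print(split_sizes(failing, header_size=0))
print(split_sizes(failing_reordered, header_size=0))
# header_size here is a made-up placeholder for the TAR header's size field.
print(split_sizes(working, header_size=4096))
```

With the two values separated, the result no longer depends on the iteration order of the PAX headers, and it stays explicit which number describes the data stored in the archive and which describes the expanded sparse file.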

CPython versions tested on:

CPython main branch

Operating systems tested on:

Linux


Labels: stdlib (Python modules in the Lib dir), type-bug (An unexpected behavior, bug, or error)
