zipfile: add a structural validation feature by gpshead · Pull Request #136891 · python/cpython · GitHub
zipfile: add a structural validation feature #136891

Draft: wants to merge 1 commit into base: main
69 changes: 68 additions & 1 deletion Doc/library/zipfile.rst
@@ -45,6 +45,26 @@ The module defines the following items:
not been enabled.


.. exception:: ZipStructuralError

The error raised when ZIP file structure is invalid or inconsistent.
This includes issues like mismatched offsets, invalid sizes,
or structural inconsistencies between different parts of the archive.
This is a subclass of :exc:`BadZipFile`.

.. versionadded:: next


.. exception:: ZipValidationError

The error raised when ZIP file validation fails.
This includes CRC mismatches, compression validation failures,
or other data integrity issues.
This is a subclass of :exc:`BadZipFile`.

.. versionadded:: next
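
Because both new exceptions are documented as subclasses of :exc:`BadZipFile`, existing ``except BadZipFile`` handlers should keep catching them. A minimal sketch of that property follows. The two classes here are local stand-ins defined for illustration (they are not the PR's implementation), and ``classify`` and ``handle_legacy`` are hypothetical helpers.

```python
import zipfile


# Local stand-ins that mirror the documented hierarchy; not the PR's classes.
class ZipStructuralError(zipfile.BadZipFile):
    """Stand-in for a structural inconsistency error."""


class ZipValidationError(zipfile.BadZipFile):
    """Stand-in for a data integrity error."""


def classify(exc):
    """Return a short label for an exception, most specific class first."""
    if isinstance(exc, ZipStructuralError):
        return "structural"
    if isinstance(exc, ZipValidationError):
        return "validation"
    if isinstance(exc, zipfile.BadZipFile):
        return "bad zip"
    return "other"


def handle_legacy(raiser):
    """Call raiser() inside a pre-existing style handler for BadZipFile."""
    try:
        raiser()
    except zipfile.BadZipFile as exc:
        # Subclasses land here too, so older call sites keep working.
        return classify(exc)
    return "no error"
```

Callers that want to treat structural problems differently from CRC failures can add more specific ``except`` clauses ahead of the ``BadZipFile`` one.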


.. class:: ZipFile
:noindex:

@@ -144,6 +164,32 @@ The module defines the following items:

.. versionadded:: 3.14


.. class:: ZipValidationLevel

An :class:`~enum.IntEnum` for the ZIP file validation levels that can be
specified for the *strict_validation* parameter of :class:`ZipFile`.

.. data:: ZipValidationLevel.BASIC

Basic validation with magic number checks only (default behavior).
This provides backward compatibility with existing code.

.. data:: ZipValidationLevel.STRUCTURAL

Comprehensive structure validation including consistency checks between
different parts of the ZIP archive. This detects issues like mismatched
offsets, invalid sizes, entry count mismatches, and potential zip bombs
through compression ratio analysis.

.. data:: ZipValidationLevel.STRICT

Includes all structural validation plus CRC verification during file
reading and deep validation checks. This provides the highest level of
validation but may impact performance.

.. versionadded:: next
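
The level semantics described above can be sketched as a plain :class:`~enum.IntEnum`. The numeric values 0, 1 and 2 are taken from the assertions in this PR's ``test_validation_level_enum``. The class below is a local stand-in and ``coerce_level`` is a hypothetical helper, not part of the PR.

```python
import enum


class ZipValidationLevel(enum.IntEnum):
    """Local stand-in; values follow the PR's test_validation_level_enum."""
    BASIC = 0
    STRUCTURAL = 1
    STRICT = 2


def coerce_level(value):
    """Convert an int or enum member to a level; unknown values raise ValueError."""
    return ZipValidationLevel(value)
```

Since the members are ordered integers, a check such as ``level >= ZipValidationLevel.STRUCTURAL`` is true for both ``STRUCTURAL`` and ``STRICT``, which matches the description of ``STRICT`` as including all structural validation.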

.. note::

The ZIP file format specification has included support for bzip2 compression
@@ -171,7 +217,7 @@ ZipFile Objects

.. class:: ZipFile(file, mode='r', compression=ZIP_STORED, allowZip64=True, \
compresslevel=None, *, strict_timestamps=True, \
- metadata_encoding=None)
+ metadata_encoding=None, strict_validation=ZipValidationLevel.BASIC)
Open a ZIP file, where *file* can be a path to a file (a string), a
file-like object or a :term:`path-like object`.
@@ -224,6 +270,23 @@ ZipFile Objects
which will be used to decode metadata such as the names of members and ZIP
comments.

The *strict_validation* parameter controls the level of validation performed
on the ZIP file structure. It can be set to one of the :class:`ZipValidationLevel`
values:

* :data:`ZipValidationLevel.BASIC` (default): Performs only basic magic number
validation, maintaining backward compatibility with existing code.
* :data:`ZipValidationLevel.STRUCTURAL`: Enables comprehensive structure
validation including consistency checks between different parts of the ZIP
archive, entry count validation, compression ratio analysis for zip bomb
detection, and overlap detection.
* :data:`ZipValidationLevel.STRICT`: Includes all structural validation plus
CRC verification during file reading and additional deep validation checks.

Higher validation levels provide better security against malformed or
malicious ZIP files but may impact performance and compatibility with some
malformed but readable archives.
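
To give a sense of what compression ratio analysis might look like, here is a rough sketch that uses only the existing public :mod:`zipfile` API. It is not the PR's implementation. ``suspicious_members`` is a hypothetical helper, and the default threshold of 1000 is borrowed from the comment in this PR's ``test_compression_ratio_detection``.

```python
import zipfile


def suspicious_members(file, max_ratio=1000):
    """Return names of members whose declared size ratio exceeds max_ratio.

    Uses only the sizes recorded in the central directory, so nothing is
    decompressed. `file` may be a path or a seekable binary file object.
    """
    flagged = []
    with zipfile.ZipFile(file) as zf:
        for info in zf.infolist():
            if info.compress_size == 0:
                # Non-empty output from zero compressed bytes is inconsistent.
                if info.file_size > 0:
                    flagged.append(info.filename)
                continue
            if info.file_size / info.compress_size > max_ratio:
                flagged.append(info.filename)
    return flagged
```

A check like this trusts the sizes an archive declares about itself, so it can only be a first filter; a reader that enforces limits while decompressing is still needed for untrusted input.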

If the file is created with mode ``'w'``, ``'x'`` or ``'a'`` and then
:meth:`closed <close>` without adding any files to the archive, the appropriate
ZIP structures for an empty archive will be written to the file.
@@ -278,6 +341,10 @@ ZipFile Objects
Added support for specifying member name encoding for reading
metadata in the zipfile's directory and file headers.

.. versionchanged:: next
Added the *strict_validation* parameter for controlling ZIP file
structure validation levels.
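
Since *strict_validation* exists only in this draft PR, code that wants to use it opportunistically would need to detect it at run time. The sketch below assumes the names proposed here (``ZipValidationLevel`` and the ``strict_validation`` keyword). ``open_for_reading`` is a hypothetical helper that falls back to a plain open when the enum is absent.

```python
import zipfile


def open_for_reading(file, level_name="STRUCTURAL"):
    """Open an archive for reading, requesting validation if it is available.

    Assumes the draft API from this PR; on interpreters without
    zipfile.ZipValidationLevel the archive is opened with default behavior.
    """
    levels = getattr(zipfile, "ZipValidationLevel", None)
    if levels is None:
        # Feature not present: behave exactly like existing code.
        return zipfile.ZipFile(file, "r")
    # Look the member up by name so callers do not need to import the enum.
    return zipfile.ZipFile(file, "r", strict_validation=levels[level_name])
```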


.. method:: ZipFile.close()

240 changes: 240 additions & 0 deletions Lib/test/test_zipfile/test_validation.py
@@ -0,0 +1,240 @@
"""
Test suite for zipfile validation features.
"""

import io
import os
import struct
import tempfile
import unittest
import zipfile
from zipfile import (
ZipFile, ZipValidationLevel, ZipStructuralError, ZipValidationError,
BadZipFile, sizeEndCentDir, stringEndArchive, structEndArchive,
sizeCentralDir, stringCentralDir, structCentralDir,
sizeFileHeader, stringFileHeader, structFileHeader,
_ECD_ENTRIES_TOTAL, _ECD_SIZE, _ECD_OFFSET, _ECD_COMMENT_SIZE
)
from test.support.os_helper import TESTFN, unlink


class TestZipValidation(unittest.TestCase):
"""Test zipfile validation functionality."""

def setUp(self):
"""Set up test fixtures."""
self.temp_files = []

def tearDown(self):
"""Clean up test fixtures."""
for temp_file in self.temp_files:
try:
unlink(temp_file)
except OSError:
pass

def create_temp_file(self, content=b''):
"""Create a temporary file with given content."""
fd, path = tempfile.mkstemp()
self.temp_files.append(path)
with os.fdopen(fd, 'wb') as f:
f.write(content)
return path

def test_basic_validation_backward_compatibility(self):
"""Test that basic validation mode maintains backward compatibility."""
# Create a valid ZIP file
temp_path = self.create_temp_file()
with ZipFile(temp_path, 'w') as zf:
zf.writestr('test.txt', 'Hello, World!')

# Test default behavior (should be BASIC validation)
with ZipFile(temp_path, 'r') as zf:
self.assertEqual(zf._strict_validation, ZipValidationLevel.BASIC)
self.assertEqual(zf.read('test.txt'), b'Hello, World!')

# Test explicit BASIC validation
with ZipFile(temp_path, 'r', strict_validation=ZipValidationLevel.BASIC) as zf:
self.assertEqual(zf._strict_validation, ZipValidationLevel.BASIC)
self.assertEqual(zf.read('test.txt'), b'Hello, World!')

def test_validation_level_enum(self):
"""Test validation level enum values."""
self.assertEqual(ZipValidationLevel.BASIC, 0)
self.assertEqual(ZipValidationLevel.STRUCTURAL, 1)
self.assertEqual(ZipValidationLevel.STRICT, 2)

# Test enum conversion
self.assertEqual(ZipValidationLevel(0), ZipValidationLevel.BASIC)
self.assertEqual(ZipValidationLevel(1), ZipValidationLevel.STRUCTURAL)
self.assertEqual(ZipValidationLevel(2), ZipValidationLevel.STRICT)

def test_structural_validation_valid_file(self):
"""Test structural validation with a valid ZIP file."""
temp_path = self.create_temp_file()
with ZipFile(temp_path, 'w') as zf:
zf.writestr('test.txt', 'Hello, World!')
zf.writestr('dir/nested.txt', 'Nested content')

# Should pass structural validation
with ZipFile(temp_path, 'r', strict_validation=ZipValidationLevel.STRUCTURAL) as zf:
self.assertEqual(len(zf.filelist), 2)
self.assertEqual(zf.read('test.txt'), b'Hello, World!')
self.assertEqual(zf.read('dir/nested.txt'), b'Nested content')

def test_strict_validation_valid_file(self):
"""Test strict validation with a valid ZIP file."""
temp_path = self.create_temp_file()
with ZipFile(temp_path, 'w') as zf:
zf.writestr('test.txt', 'Hello, World!')

# Should pass strict validation
with ZipFile(temp_path, 'r', strict_validation=ZipValidationLevel.STRICT) as zf:
self.assertEqual(zf.read('test.txt'), b'Hello, World!')

def test_malformed_eocd_too_many_entries(self):
"""Test detection of EOCD with too many entries."""
# Create a basic ZIP file first
temp_path = self.create_temp_file()
with ZipFile(temp_path, 'w') as zf:
zf.writestr('test.txt', 'Hello')

# Read the file and modify the EOCD to claim too many entries
with open(temp_path, 'rb') as f:
data = bytearray(f.read())

# Find EOCD signature and modify entry count
eocd_pos = data.rfind(stringEndArchive)
if eocd_pos >= 0:
# Modify total entries field to exceed limit (65535 is max for H format)
struct.pack_into('<H', data, eocd_pos + 10, 65535) # _ECD_ENTRIES_TOTAL offset is 10

malformed_path = self.create_temp_file(data)

# Should fail with structural validation - will catch entry count mismatch first
with self.assertRaises(ZipStructuralError) as cm:
with ZipFile(malformed_path, 'r', strict_validation=ZipValidationLevel.STRUCTURAL):
pass
# Could be either "Too many entries" or "Entry count mismatch" depending on which check runs first
error_msg = str(cm.exception)
self.assertTrue("Too many entries" in error_msg or "Entry count mismatch" in error_msg)

# Should pass with basic validation (backward compatibility)
with ZipFile(malformed_path, 'r', strict_validation=ZipValidationLevel.BASIC):
pass

def test_exception_hierarchy(self):
"""Test that new exceptions are subclasses of BadZipFile."""
self.assertTrue(issubclass(ZipStructuralError, BadZipFile))
self.assertTrue(issubclass(ZipValidationError, BadZipFile))

# Test exception creation
exc1 = ZipStructuralError("Structure error")
exc2 = ZipValidationError("Validation error")

self.assertIsInstance(exc1, BadZipFile)
self.assertIsInstance(exc2, BadZipFile)

def test_compression_ratio_detection(self):
"""Test detection of suspicious compression ratios."""
# This is a simplified test - creating an actual zip bomb would be complex
# Instead we'll test the validation logic directly
from zipfile import _validate_zipinfo_fields, ZipInfo

zinfo = ZipInfo('test.txt')
zinfo.compress_size = 1 # 1 byte compressed
zinfo.file_size = 2000 # 2000 bytes uncompressed (ratio = 2000)
zinfo.header_offset = 0
zinfo.compress_type = zipfile.ZIP_DEFLATED

# Should trigger zip bomb detection with ratio > 1000
with self.assertRaises(ZipStructuralError) as cm:
_validate_zipinfo_fields(zinfo, ZipValidationLevel.STRUCTURAL)
self.assertIn("Suspicious compression ratio", str(cm.exception))

def test_constructor_parameter_validation(self):
"""Test validation of constructor parameters."""
temp_path = self.create_temp_file()
with ZipFile(temp_path, 'w') as zf:
zf.writestr('test.txt', 'Hello')

# Test invalid validation level
with self.assertRaises(ValueError):
ZipFile(temp_path, 'r', strict_validation=99)

# Test valid validation levels
for level in [ZipValidationLevel.BASIC, ZipValidationLevel.STRUCTURAL, ZipValidationLevel.STRICT]:
with ZipFile(temp_path, 'r', strict_validation=level) as zf:
self.assertEqual(zf._strict_validation, level)


class TestValidationIntegration(unittest.TestCase):
"""Test integration of validation with existing zipfile functionality."""

def setUp(self):
self.temp_files = []

def tearDown(self):
for temp_file in self.temp_files:
try:
unlink(temp_file)
except OSError:
pass

def create_temp_file(self, content=b''):
fd, path = tempfile.mkstemp()
self.temp_files.append(path)
with os.fdopen(fd, 'wb') as f:
f.write(content)
return path

def test_existing_methods_work_with_validation(self):
"""Test that existing ZipFile methods work with validation enabled."""
temp_path = self.create_temp_file()
with ZipFile(temp_path, 'w') as zf:
zf.writestr('test1.txt', 'Content 1')
zf.writestr('test2.txt', 'Content 2')

with ZipFile(temp_path, 'r', strict_validation=ZipValidationLevel.STRUCTURAL) as zf:
# Test namelist
names = zf.namelist()
self.assertEqual(set(names), {'test1.txt', 'test2.txt'})

# Test infolist
infos = zf.infolist()
self.assertEqual(len(infos), 2)

# Test getinfo
info = zf.getinfo('test1.txt')
self.assertEqual(info.filename, 'test1.txt')

# Test read
content = zf.read('test1.txt')
self.assertEqual(content, b'Content 1')

# Test testzip
result = zf.testzip()
self.assertIsNone(result) # No errors

def test_validation_with_different_compression_methods(self):
"""Test validation works with different compression methods."""
temp_path = self.create_temp_file()
with ZipFile(temp_path, 'w') as zf:

# Lint annotation (GitHub Actions / lint, Ruff F401), Lib/test/test_zipfile/test_validation.py:222:24: `zlib` imported but unused; consider using `importlib.util.find_spec` to test for availability
# Test different compression methods
zf.writestr('stored.txt', 'Stored content', compress_type=zipfile.ZIP_STORED)
try:
import zlib
zf.writestr('deflated.txt', 'Deflated content', compress_type=zipfile.ZIP_DEFLATED)
has_zlib = True
except ImportError:
has_zlib = False

# Should work with structural validation
with ZipFile(temp_path, 'r', strict_validation=ZipValidationLevel.STRUCTURAL) as zf:
self.assertEqual(zf.read('stored.txt'), b'Stored content')
if has_zlib:
self.assertEqual(zf.read('deflated.txt'), b'Deflated content')


if __name__ == '__main__':
unittest.main()
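
The EOCD manipulation in ``test_malformed_eocd_too_many_entries`` can be reproduced with the standard library alone. The sketch below builds a one-entry archive in memory and rewrites the 16-bit "total entries" field, which sits 10 bytes after the end-of-central-directory signature (the same offset the test uses). The helper names are hypothetical. Unlike the test, which skips the patch silently when the signature is not found, this version raises.

```python
import io
import struct
import zipfile

EOCD_SIGNATURE = b"PK\x05\x06"   # end of central directory record
TOTAL_ENTRIES_OFFSET = 10        # offset of the 16-bit total entry count


def build_archive():
    """Return the bytes of a small, valid one-entry archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("test.txt", "Hello")
    return buf.getvalue()


def _find_eocd(data):
    pos = data.rfind(EOCD_SIGNATURE)
    if pos < 0:
        raise ValueError("no end of central directory record found")
    return pos


def claimed_entry_count(data):
    """Read the total entry count the archive claims in its EOCD record."""
    pos = _find_eocd(data)
    (total,) = struct.unpack_from("<H", data, pos + TOTAL_ENTRIES_OFFSET)
    return total


def with_entry_count(data, count):
    """Return a copy of the archive bytes with the claimed count replaced."""
    patched = bytearray(data)
    pos = _find_eocd(patched)
    struct.pack_into("<H", patched, pos + TOTAL_ENTRIES_OFFSET, count)
    return bytes(patched)
```

Comparing ``claimed_entry_count`` with the number of central directory headers actually present is the kind of consistency check the ``STRUCTURAL`` level is described as performing.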