
[tangentially related to CVE-2023-24329] urlparse does not correctly handle schemes that begin with ASCII digits, '+', '-', and '.' characters #99418

@kenballus

Description


Background

RFC 3986 defines a scheme like this:

  • scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )

RFC 2234, whose ABNF core rules RFC 3986 uses, defines ALPHA like this:

  • ALPHA = %x41-5A / %x61-7A
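To make the two grammar rules above concrete, here is a minimal sketch of a scheme validator written directly from the RFC 3986 production. The names `SCHEME_RE` and `is_valid_scheme` are hypothetical and not part of `urllib.parse`. The character classes are spelled out because ALPHA is ASCII-only.

```python
import re

# RFC 3986, section 3.1:  scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
# ALPHA is the ASCII-only range %x41-5A / %x61-7A, so the classes are spelled
# out explicitly rather than using \w or str.isalpha(), which accept non-ASCII letters.
SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")


def is_valid_scheme(scheme: str) -> bool:
    """Hypothetical helper: True if `scheme` matches the RFC 3986 production."""
    # fullmatch() anchors both ends, so trailing junk (even a newline) is rejected.
    return SCHEME_RE.fullmatch(scheme) is not None


for candidate in ["http", "git+ssh", "a.b-c", ".", "+x", "1abc", ""]:
    print(f"{candidate!r:10} {is_valid_scheme(candidate)}")
```

The first character is matched by its own class, so '+', '-', '.' and digits are only permitted from the second position on.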

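The WHATWG URL spec defines a scheme like this:

  • A URL-scheme string must be one ASCII alpha, followed by zero or more of ASCII alphanumeric, U+002B (+), U+002D (-), and U+002E (.).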

The bug

This is the scheme string parsing code from Lib/urllib/parse.py:462-468:

    i = url.find(':')
    if i > 0:
        for c in url[:i]:
            if c not in scheme_chars:
                break
        else:
            scheme, url = url[:i].lower(), url[i+1:]
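The excerpt above can be lifted into a standalone function to show its behavior in isolation. This is a sketch: `split_scheme_excerpt` is a hypothetical name, and `SCHEME_CHARS` is a local copy of `scheme_chars` so the example does not depend on module internals.

```python
from __future__ import annotations

# Local copy of urllib.parse.scheme_chars (Lib/urllib/parse.py:77-80).
SCHEME_CHARS = ('abcdefghijklmnopqrstuvwxyz'
                'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
                '0123456789'
                '+-.')


def split_scheme_excerpt(url: str) -> tuple[str, str]:
    """Hypothetical helper mirroring the quoted excerpt: return (scheme, rest)."""
    scheme = ''
    i = url.find(':')
    if i > 0:
        # Every character before the colon, including the first, is tested
        # against the same set, so a leading '.', '+', '-' or digit passes.
        for c in url[:i]:
            if c not in SCHEME_CHARS:
                break
        else:
            # for/else: this branch runs only if the loop never hit `break`.
            scheme, url = url[:i].lower(), url[i + 1:]
    return scheme, url


print(split_scheme_excerpt(".://"))
print(split_scheme_excerpt("HTTP://example.com"))
```

Calling it on ".://" returns a scheme of '.', which matches the `urlparse` output reported in this issue.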

This is the definition of scheme_chars from Lib/urllib/parse.py:77-80:

    scheme_chars = ('abcdefghijklmnopqrstuvwxyz'
                    'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
                    '0123456789'
                    '+-.')
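The set of characters that this definition wrongly admits in the first position can be computed rather than listed by hand. The sketch below subtracts the ASCII letters, the only characters both grammars allow at the start of a scheme, from a copy of `scheme_chars`.

```python
import string

# Copy of scheme_chars as quoted from Lib/urllib/parse.py.
scheme_chars = ('abcdefghijklmnopqrstuvwxyz'
                'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
                '0123456789'
                '+-.')

# Characters the loop accepts anywhere, including position 0, but which the
# scheme grammar only allows after the leading ASCII letter.
bad_leading = sorted(set(scheme_chars) - set(string.ascii_letters))
print(bad_leading)
print(len(bad_leading))
```

The result is the 13 characters '+', '-', '.' and '0' through '9', sorted by code point.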

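Because the loop tests every character before the colon, the first one included, against the same set, this code erroneously accepts schemes that begin with any of ('.', '-', '+', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9'). Both specifications require the first character of a scheme to be an ASCII letter, so this behavior violates both.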
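One way to close the gap is to check the first character separately before validating the rest. This is a minimal sketch under that assumption, not necessarily how CPython resolved the issue. `split_scheme_rfc3986` is a hypothetical name. `string.ascii_letters` is used instead of `str.isalpha()`, because the latter also accepts non-ASCII letters.

```python
from __future__ import annotations

import string

# Same character set as urllib.parse.scheme_chars: letters, digits and "+-.".
SCHEME_CHARS = string.ascii_letters + string.digits + "+-."


def split_scheme_rfc3986(url: str) -> tuple[str, str]:
    """Hypothetical fixed variant of the excerpt: return (scheme, rest)."""
    i = url.find(":")
    # i > 0 guarantees url[0] exists; it must be an ASCII letter (RFC 3986 ALPHA).
    if i > 0 and url[0] in string.ascii_letters:
        candidate = url[:i]
        # The remaining characters may be letters, digits, '+', '-' or '.'.
        if all(c in SCHEME_CHARS for c in candidate):
            return candidate.lower(), url[i + 1:]
    # No valid scheme: leave the URL untouched, as the original excerpt does.
    return "", url


print(split_scheme_rfc3986(".://"))
print(split_scheme_rfc3986("git+ssh://host/repo"))
```

Rejecting the leading character up front leaves the per-character check for the rest of the scheme unchanged.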

This bug is reproducible with the following snippet:

    >>> from urllib.parse import urlparse
    >>> urlparse(".://") # Should error, but doesn't
    ParseResult(scheme='.', netloc='', path='', params='', query='', fragment='')
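To check a given interpreter beyond the single ".://" case, a small harness can run `urlparse` over several inputs and flag any non-empty scheme that fails the RFC 3986 grammar. `scheme_violations` is a hypothetical helper. The output depends on the Python version under test: an interpreter with the behavior described here would be expected to report the first four samples, and one that rejects such schemes reports nothing.

```python
import re
from urllib.parse import urlparse

# RFC 3986 scheme production: ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")


def scheme_violations(urls):
    """Hypothetical helper: yield (url, scheme) for each URL whose parsed,
    non-empty scheme does not satisfy the RFC 3986 grammar."""
    for url in urls:
        scheme = urlparse(url).scheme
        # An empty scheme means urlparse did not split one off, which is fine.
        if scheme and SCHEME_RE.fullmatch(scheme) is None:
            yield url, scheme


samples = [".://", "+x://h", "-x://h", "1abc://h", "http://h"]
for url, scheme in scheme_violations(samples):
    print(f"{url!r} parsed with non-RFC scheme {scheme!r}")
```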

My environment

  • CPython versions tested on:
  • Operating system and architecture:
    • Arch Linux x86_64

Metadata


Labels

  • stdlib: Python modules in the Lib dir
  • type-bug: An unexpected behavior, bug, or error
