gh-127146: Skip test_open_undecodable_uri on Emscripten #136510

@hoodmane (Contributor) commented on Jul 10, 2025

PR #136326 removed the Emscripten skip for this file, but it is still broken.

@StanFromIreland (Member)

cc @serhiy-storchaka

@serhiy-storchaka (Member) left a comment

I am surprised. Does test_open_with_undecodable_path pass?

@freakboy3742 (Contributor)

@serhiy-storchaka Both test_open_undecodable_uri and test_open_with_undecodable_path are failing in CI at present (ref).

FWIW, in both cases the test path being used is b'@test_42_tmp\xe7w\xf0'. The path is defined and the file can be opened, so the get_undecodable_path() check now passes, whereas previously the test was explicitly skipped on Emscripten.

When I run this locally, I get a warning in the console:

    Warning -- files was modified by test.test_sqlite3.test_dbapi
    Warning --   Before: []
    Warning --   After:  ['@test_42_tmp�w�']
    test test.test_sqlite3.test_dbapi failed

which suggests to me that the undecodable path isn't round-tripping consistently through the filesystem.

I agree that this should be a "skip both or skip neither" situation; and I'd definitely prefer to understand better why it needs to be skipped explicitly before restoring the skip.

@serhiy-storchaka (Member)

I suppose that paths are Unicode strings on Emscripten. Python and SQLite may use different ways to decode a bytes path to Unicode, so os.path.exists() does not see the file created by SQLite, and unlink() cannot remove it.
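
To make that hypothesis concrete, here is a small sketch that uses only Python's own codecs. It is an illustration, not the decoders Python or SQLite actually use on Emscripten: the same undecodable bytes give two different names depending on the error handler, and only the surrogateescape spelling (the handler CPython's os.fsdecode() uses on POSIX) encodes back to the original bytes.

```python
# Same bytes as TESTFN_UNDECODABLE in this thread.
path = b"@test_42_tmp\xe7w\xf0"

# Decoder A: UTF-8 with surrogateescape (CPython's filesystem error handler
# on POSIX). Each undecodable byte becomes a lone surrogate U+DC80..U+DCFF.
escaped = path.decode("utf-8", "surrogateescape")

# Decoder B: UTF-8 with U+FFFD replacement, which discards the original bytes.
replaced = path.decode("utf-8", "replace")

print(ascii(escaped))   # '@test_42_tmp\udce7w\udcf0'
print(ascii(replaced))  # '@test_42_tmp\ufffdw\ufffd'
print(escaped == replaced)                                 # False
print(escaped.encode("utf-8", "surrogateescape") == path)  # True
print(replaced.encode("utf-8") == path)                    # False
```

Two components that pick different decoders for one bytes path will disagree about which file exists, which matches the symptom in the test.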

This is similar to Windows where we need to keep a separate skip.

@hoodmane (Contributor, author)

Well, '@test_42_tmp\xe7w\xf0' does round-trip correctly through JavaScript, MEMFS, and NODEFS, so I agree this test should be fixable. I will investigate more.

@hoodmane closed this on Jul 11, 2025

@hoodmane (Contributor, author)

> Both test_open_undecodable_uri and test_open_with_undecodable_path are failing in CI at present

Interesting. Locally, only test_open_with_undecodable_path fails for me.

@hoodmane (Contributor, author) commented on Jul 11, 2025

I changed the test to this and added syscall tracing:

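    import contextlib
    import os
    import sqlite3 as sqlite
    import sys
    import unittest

    from test.support.os_helper import TESTFN_UNDECODABLE, unlink


    class OpenTests(unittest.TestCase):  # enclosing test class (assumed)

        def get_undecodable_path(self):
            # Debug variant: each step is printed so it can be matched
            # against the syscall trace.
            path = TESTFN_UNDECODABLE
            print("open", path)
            f = open(path, 'wb')
            print("close", path)
            f.close()
            print("unlink", path)
            unlink(path)
            return path

        @unittest.skipIf(sys.platform == "win32", "skipped on Windows")
        def test_open_with_undecodable_path(self):
            path = self.get_undecodable_path()
            self.addCleanup(unlink, path)
            print("sqlite.connect", path)
            c = sqlite.connect(path)
            with contextlib.closing(c) as cx:
                print("exists", path)
                exists = os.path.exists(path)
                print(" .. ", exists)
                self.assertTrue(exists)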

The relevant part of the log looks like this:

    open b'@test_42_tmp\xe7w\xf0'
    ___syscall_openat @test_42_tmp緰
    close b'@test_42_tmp\xe7w\xf0'
    _fd_close 3 /home/.../test_python_worker_601748æ/@test_42_tmp緰
    unlink b'@test_42_tmp\xe7w\xf0'
    ___syscall_unlinkat @test_42_tmp緰
    sqlite.connect b'@test_42_tmp\xe7w\xf0'
    ___syscall_openat /home/.../test_python_worker_601748æ/@test_42_tmp�w�
    ___syscall_stat64 /home/.../test_python_worker_601748æ/@test_42_tmp�w�
    exists b'@test_42_tmp\xe7w\xf0'
    ___syscall_stat64 @test_42_tmp緰
     ..  False
    ___syscall_stat64 /home/.../test_python_worker_601748æ/@test_42_tmp�w�
    _fd_close 3 /home/.../test_python_worker_601748æ/@test_42_tmp�w�
    <unrelated teardown syscalls>
    FAIL
    ___syscall_unlinkat @test_42_tmp緰

@hoodmane (Contributor, author)

Okay, I've got it: the problem is that UTF8ArrayToString takes a different code path for strings longer than 16 bytes, and the two paths don't behave exactly the same:

    // When using conditional TextDecoder, skip it for short strings as the overhead of the native call is not worth it.
    if (endPtr - idx > 16 && heapOrArray.buffer && UTF8Decoder) {
      return UTF8Decoder.decode({{{ getUnsharedTextDecoderView('heapOrArray', 'idx', 'endPtr') }}});
    }

https://github.com/emscripten-core/emscripten/blob/main/src/lib/libstrings.js#L57-L59

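So when the file system decodes a path, it checks the length: short strings are decoded by the hand-written JS loop, long ones by the native TextDecoder. SQLite fully resolves the path before opening the file, and the absolute path is longer than 16 bytes, so it goes through TextDecoder and comes out as @test_42_tmp�w�. Python's open(), os.path.exists() and unlink() pass the unresolved relative path, which is only 15 bytes long and takes the slow JS path, producing @test_42_tmp緰. The two calls therefore refer to different files. Adding two extra bytes to TESTFN_UNDECODABLE (17 bytes, over the 16-byte threshold) makes the test pass.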
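
The length check can be modelled in a few lines of Python. This is a hypothetical sketch, not Emscripten code: js_fallback_decode imitates the hand-written loop (continuation bytes are masked with & 63 but never validated, and only 1- to 3-byte sequences are modelled), Python's errors="replace" decoding stands in for a non-fatal TextDecoder, /tmp/worker/ is a made-up substitute for the elided worker directory, and the two extra "xx" bytes are likewise an assumption.

```python
# Hypothetical Python model of the length check in Emscripten's
# UTF8ArrayToString. It imitates the behaviour described in the thread;
# it is not Emscripten's implementation.


def js_fallback_decode(data: bytes) -> str:
    """Lenient decoder: continuation bytes are masked with & 63 but never
    validated. Only 1- to 3-byte sequences are modelled here."""
    out = []
    i = 0
    n = len(data)
    while i < n:
        u0 = data[i]
        i += 1
        if u0 < 0x80:  # ASCII byte
            out.append(chr(u0))
            continue
        # Bytes past the end are read as 0 (an assumption of this model).
        u1 = (data[i] if i < n else 0) & 63
        i += 1
        if (u0 & 0xE0) == 0xC0:  # 2-byte lead
            out.append(chr(((u0 & 31) << 6) | u1))
            continue
        u2 = (data[i] if i < n else 0) & 63
        i += 1
        if (u0 & 0xF0) == 0xE0:  # 3-byte lead
            out.append(chr(((u0 & 15) << 12) | (u1 << 6) | u2))
            continue
        raise NotImplementedError("4-byte leads and stray bytes are not modelled")
    return "".join(out)


def text_decoder_decode(data: bytes) -> str:
    """Stand-in for a non-fatal TextDecoder: invalid sequences become U+FFFD."""
    return data.decode("utf-8", errors="replace")


def emscripten_decode(data: bytes) -> str:
    """Choose a decoder by byte length, mirroring `endPtr - idx > 16`."""
    if len(data) > 16:
        return text_decoder_decode(data)
    return js_fallback_decode(data)


def basename(name: str) -> str:
    return name.rsplit("/", 1)[-1]


# "/tmp/worker/" is a made-up stand-in for the elided worker directory.
rel = b"@test_42_tmp\xe7w\xf0"        # 15 bytes: JS fallback path
abs_path = b"/tmp/worker/" + rel      # 27 bytes: TextDecoder path
print(len(rel), ascii(emscripten_decode(rel)))         # 15 '@test_42_tmp\u7df0'
print(ascii(basename(emscripten_decode(abs_path))))    # '@test_42_tmp\ufffdw\ufffd'

# Two hypothetical extra ASCII bytes push the relative name past 16 bytes,
# so the relative and absolute spellings now use the same decoder.
longer = b"xx" + rel                  # 17 bytes
print(basename(emscripten_decode(b"/tmp/worker/" + longer))
      == emscripten_decode(longer))                    # True
```

With the 15-byte name the two spellings disagree, reproducing the 緰 (U+7DF0) vs. �w� split in the trace; once the name is longer than 16 bytes, both go through the same decoder and agree.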

@hoodmane (Contributor, author)

Upstream report:
emscripten-core/emscripten#24690

Labels: awaiting review, skip news, tests (Tests in the Lib/test dir)
Participants: 4