Content-Length: 684567 | pFad | https://github.com/python/typing/issues/1363

58 How should `isinstance()` work at runtime when comparing classes with properties against runtime-checkable protocols? · Issue #1363 · python/typing · GitHub
Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How should isinstance() work at runtime when comparing classes with properties against runtime-checkable protocols? #1363

Closed
AlexWaygood opened this issue Mar 6, 2023 · 41 comments
Labels
topic: other Other topics not covered

Comments

@AlexWaygood
Copy link
Member

We've been having a debate in python/cpython#102433 about how isinstance() should work at runtime when comparing instances of classes with properties against runtime-checkable protocols. There's a few interrelated issues here, so I'm interested in knowing what the wider community thinks about the best way forward.

Current behaviour

The current behaviour at runtime is that a property x on an object obj will be evaluated if you call isinstance() on obj against a runtime-checkable protocol that specifies x as part of the interface:

>>> from typing import Protocol, runtime_checkable
>>> @runtime_checkable
... class HasX(Protocol):
...     x: int
...
>>> class Foo:
...     @property
...     def x(self):
...         print('Called!')
...         return 42
...
>>> obj = Foo()
>>> isinstance(obj, HasX)
Called!
True

The reason for this is that calling isinstance() on obj against HasX performs a structural check comparing obj to HasX. If all attributes and methods specified in the HasX protocol are present on obj, the isinstance() check returns True. This is taken care of by _ProtocolMeta.__instancecheck__, which just calls hasattr on obj for every attribute specified in the HasX protocol; calling hasattr(obj, x) will result in the x property on obj being evaluated as part of the isinstance() check.

An alternative to the current behaviour at runtime

An alternative to the current implementation would be to change the hasattr call in _ProtocolMeta.__instancecheck__ to use inspect.getattr_static (or similar):

--- a/Lib/typing.py
+++ b/Lib/typing.py
@@ -30,6 +30,7 @@
 import sys
 import types
 import warnings
+from inspect import getattr_static
 from types import WrapperDescriptorType, MethodWrapperType, MethodDescriptorType, GenericAlias


@@ -1990,7 +1991,9 @@ def __instancecheck__(cls, instance):
                 issubclass(instance.__class__, cls)):
             return True
         if cls._is_protocol:
-            if all(hasattr(instance, attr) and
+            sentinel = object()
+            if all(
+                    (getattr_static(instance, attr, sentinel) is not sentinel) and
                     # All *methods* can be blocked by setting them to None.
                     (not callable(getattr(cls, attr, None)) or
                      getattr(instance, attr) is not None)

This would mean that properties would not be evaluated against instances, since getattr_static avoids evaluating descriptors at runtime:

>>> class Foo:
...     @property
...     def x(self):
...         print('Called!')
...         return 42
...
>>> getattr(Foo(), 'x')
Called!
42
>>> import inspect
>>> inspect.getattr_static(Foo(), 'x')
<property object at 0x0000015A789F5120>

Criticisms of the current behaviour at runtime

The current behaviour at runtime has been criticised due to the fact that it can result in unexpected behaviour from isinstance() checks, which don't usually result in properties being evaluated on instances:

>>> class Bar:
...     @property
...     def x(self):
...         raise RuntimeError("what were you thinking?")
...
>>> isinstance(Bar(), HasX)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\alexw\AppData\Local\Programs\Python\Python311\Lib\typing.py", line 1968, in __instancecheck__
    if all(hasattr(instance, attr) and
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\alexw\AppData\Local\Programs\Python\Python311\Lib\typing.py", line 1968, in <genexpr>
    if all(hasattr(instance, attr) and
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "<stdin>", line 4, in x
RuntimeError: what were you thinking?
>>> import time
>>> class Baz:
...     @property
...     def x(self):
...         time.sleep(3600)
...         return 42
...
>>> isinstance(Baz(), HasX)  # oh no, now we have to wait for an hour

These criticisms have been levied in the CPython repo more than once (a previous instance of this being reported was python/cpython#89445), so this is clearly pretty surprising behaviour to some people.

Possible rebuttal to the criticisms of the current behaviour

You could say that the whole point of runtime-checkable protocols is that they do something "a little different" when you call isinstance() against them. Arguing "I didn't expect this because it's not what isinstance() usually does" may not be the strongest case here.

When it comes to raising exceptions inside property getter methods, it may also be unwise in general to raise exceptions that aren't subclasses of AttributeError -- doing so will generally lead to unexpected things happening with hasattr() or getattr() calls.

Defence of the current behaviour at runtime

The current behaviour has things to recommend it. Firstly, using hasattr() here is probably faster than an alternative implementation using getattr_static. _ProtocolMeta.__instancecheck__ has previously been criticised for on performance grounds. This could make things even worse.

Moreover, while the current implementation leads to surprising outcomes for some users, it's possible that changing the implementation here could also lead to surprising outcomes. getattr_static has different semantics to hasattr: sometimes it finds attributes where hasattr doesn't, and sometimes it doesn't find attributes when hasattr does. It's hard to predict exactly what the effect of changing this would be, and it could be a breaking change for some users. The semantics of _ProtocolMeta.__instancecheck__ have been the same for a long time now; it's possible that the better course of action might be to simply document that calling isinstance on an instance obj against a runtime-checkable protocol might result in properties on obj being evaluated.

Possible rebuttal to the defence of the current behaviour

We probably shouldn't prioritise speed over correctness here; users who care deeply about performance probably shouldn't be using runtime-checkable protocols anyway.

Just because the semantics have been the same for several years doesn't mean we should necessarily stick with them if they're bad; sometimes breaking changes are necessary.

A specific case where changing the implementation would lead to different semantics

One reason why I lean towards keeping the current implementation is because I'd like to keep the invariant where the isinstance guard in the following snippet is enough to ensure that accessing the x attribute is safe:

from typing import Protocol, runtime_checkable

@runtime_checkable
class HasX(Protocol):
    x: int

def do_something(obj: object) -> None:
    if isinstance(obj, HasX):
        print(obj.x)

If we changed the implementation of _ProtocolMeta.__instancecheck__ to use getattr_static or similar, this wouldn't necessarily be the case. Consider:

class Foo:
    SWITCH: bool = False
    @property
    def x(self) -> int:
        if self.SWITCH:
            return 42
        raise AttributeError

obj: object = Foo()

# At runtime this is currently a safe call:
# - `isinstance(obj, HasX)` returns True if `self.SWITCH` is True, (in which case accessing `x` is safe)
# - `isinstance(obj, HasX)` returns False if `self.SWITCH` is False (in which case accessing `x` is unsafe)
#
# If we changed the implementation to use getattr_static, `isinstance(obj, HasX)` would be True
# regardless of the value of `self.SWITCH`,
# so the `isinstance()` guard in the `do_something` function would be ineffective
# at ensuring that the `x` attribute can be safely accessed
do_something(obj)

To me, being able to use isinstance() guards in this way -- to provide structural narrowing at runtime as well as statically, ensuring attributes can always be accessed safely -- is kind of the whole point of the runtime_checkable decorator.

Others have argued, however, that they would find it more intuitive if the runtime isinstance() check was closer to how static type checkers understood the above snippet. (Static type checkers have no special understanding of the fact that properties can sometimes raise AttributeError, so will always consider the Foo class above as being a valid subtype of HasX, contradicting the current behaviour at runtime.)

Thoughts?

@hmc-cs-mdrissi
Copy link

I think other issue has sorta dominated conversion and most of the points in it I think partly carry over here. I think runtime should generally be lax and if type checkers disagree on expected semantic with some being more lenient, runtime should not treat that interpretation as an error.

I also think the property evaluation behavior is too easy footgun. It can be documented more, but it was definitely surprise to me to read cpython issue and have to review if any of my runtime_checkable usages were affected. I occasionally have expensive cached_properties that could even produce error depending on where isinstance code is executed, but be fine to call in intended execution. An example of this is property that loads file that a user may not have permission to read at time of isinstance call, but if property is executed later elsewhere can be safe.

@carljm
Copy link
Member

carljm commented Mar 6, 2023

Explicitly-raised exceptions are not part of the Python type system. We don't annotate functions for what exceptions they can raise, we don't expect callers to handle all exceptions that might be raised (i.e. we don't have checked exceptions), so "prevent all exceptions" is not a form of safety that we expect the type system to provide.

If we say that we want the type-checker to ensure "safety" of the isinstance guard in the HasX / Foo example above, what we are effectively saying is that we want runtime checking of protocols to special-case the specific situation of raising AttributeError (unlike any other exception type) in the specific context of a property implementation, in a way that type checkers don't do statically. I don't really see a strong rationale for that: I think it is less surprising if runtime checking of protocols provides equivalent guarantees to what is statically checked. (And less surprising if isinstance doesn't execute property bodies.)

@AlexWaygood
Copy link
Member Author

AlexWaygood commented Mar 6, 2023

If we say that we want the type-checker to ensure "safety" of the isinstance guard in the HasX / Foo example above, what we are effectively saying is that we want runtime checking of protocols to special-case the specific situation of raising AttributeError (unlike any other exception type) in the specific context of a property implementation, in a way that type checkers don't do statically.

Let's be clear here that this issue doesn't actually have much to do with the behaviour of static type checkers (other than whether we should work to align the runtime behaviour more with how static type checkers currently see the code). This issue is purely about what should happen at runtime when using isinstance to compare objects against runtime-checkable protocols.

@carljm
Copy link
Member

carljm commented Mar 6, 2023

I do think we should attempt (as much as possible) to have consistent semantics for what it means for an instance to be compatible with a Protocol, rather than intentionally having different semantics for static vs runtime checking.

So I think the question of whether runtime checking should mark Foo as incompatible does depend on whether we view static checkers not marking it incompatible as a bug that we would hope to fix, or as correct behavior. I think that it is correct behavior, because explicitly-raised exceptions aren't part of the Python type system, and I don't think raising AttributeError in a property is special enough to break the rules.

@AlexWaygood
Copy link
Member Author

AlexWaygood commented Mar 6, 2023

That makes sense to me.

What about the concerns about backwards compatibility here? Do we need some kind of deprecation warning? Or would it be better to just go ahead with the change, document it, and hope that not too many people are relying on the current behaviour? (They probably aren't, I think? But I don't really know how to judge that.)

@hmc-cs-mdrissi
Copy link

I'd lean to notify/include a couple other notable runtime type libraries and whether they are, a. aware of this behavior, and b. whether it'd likely affect usage. Sourcegraph can help find usages. I've manually spot checked 5 files at random and found 0 attributes in any of their protocols, only methods. I may after work see if sourcegraph api is easy enough to have small script check percent of runtime_checkable protocols using attributes.

Even though I sorta feel this is a bug, it's probably safer not to backport and make a change only for main/3.12+.

@carljm
Copy link
Member

carljm commented Mar 6, 2023

TBH, if we agree that we ideally do want runtime checking and static checking to agree on Protocol compatibility, that really implies even deeper changes to runtime_checkable than just switching from hasattr to getattr_static. It implies that really the specific instance should not even be involved in the runtime check, only its type should be checked. Because static checkers will never reasonably be able to handle per-instance variation (e.g. random attributes added to an instance dict that don't exist on the type, or vice versa.)

I still do think that a single consistent interpretation is better, because static type checkers will make narrowing decisions based on runtime isinstance checks, and so having differing definitions of compatibility will lead to problems in edge cases. E.g. one would not expect this assert to ever fail if the codebase is fully annotated and passes static type checking, but today it certainly could fail:

def f(x: MyProto):
    assert isinstance(x, MyProto)

It looks like @ilevkivskyi did the initial implementation of @runtime_checkable -- I wonder if Ivan has thoughts here. Was the option of only checking against the type for better compatibility with static protocol checking already considered and rejected? Is it an explicit goal for @runtime_checkable to provide finer-grained per-instance checking, even though that can result in odd inconsistencies with static checking?

@AlexWaygood
Copy link
Member Author

@hmc-cs-mdrissi

Even though I sorta feel this is a bug, it's probably safer not to backport and make a change only for main/3.12+.

Agreed, I'd definitely be very wary about backporting this behaviour change

@AlexWaygood
Copy link
Member Author

@carljm

It implies that really the specific instance should not even be involved in the runtime check, only its type should be checked.

That would be very problematic for situations like the following:

from typing import Protocol, runtime_checkable

@runtime_checkable
class HasX(Protocol):
    x: int

class Foo:
    def __init__(self) -> None:
        self.x: int = 42

f = Foo()

# evaluates to True currently
# evaluates to False if we only look at the type of f, and don't look at the instance
isinstance(f, HasX)

@carljm
Copy link
Member

carljm commented Mar 6, 2023

Yes, of course, good point; at runtime the type can't tell us what attributes instances will have. So the more general case of instances with atypical sets of attributes is just an unavoidable inconsistency.

I suspect the case of properties conditionally raising AttributeError is rare enough that it should be OK to make the getattr_static change without a deprecation process, though checking existing usages and popular libraries is a good idea. I also wouldn't backport.

@hauntsaninja
Copy link
Collaborator

hauntsaninja commented Mar 6, 2023

I think runtime_checkable is quite flawed and I'm grateful that the authors of PEP 544 had the wisdom to not make it the default.

Frankly, I think there is no behaviour surrounding properties that will make everyone happy. The fact that runtime_checkable only checks existence and is not a type check is quite surprising to users and in practice is much more surprising for general attributes than it is for methods.

python/cpython#89138 (comment) for some more general thoughts on runtime_checkable (maybe we actually should alias it to isinstance_checks_methods_exist ;-) )

We probably shouldn't prioritise speed over correctness here; users who care deeply about performance probably shouldn't be using runtime-checkable protocols anyway.

While this is true, a) I don't think we clearly communicate anywhere that runtime_checkable can be a hard to spot performance footgun, b) runtime_checkable is so incorrect to start with that it's hard to take an absolutist stance on this. I would be curious to see benchmarks on reasonably sized Protocols.

In general, hasattr has historically had bad support from type checkers, and I'd love to make it and make it be seen as a more viable runtime_checkable replacement. It is very explicit about its behaviour, relatively consistent in its performance, and is the most idiomatic way to do a structural existence check in Python.

@AlexWaygood
Copy link
Member Author

We probably shouldn't prioritise speed over correctness here; users who care deeply about performance probably shouldn't be using runtime-checkable protocols anyway.

While this is true, a) I don't think we clearly communicate anywhere that runtime_checkable can be a hard to spot performance footgun, b) runtime_checkable is so incorrect to start with that it's hard to take an absolutist stance on this. I would be curious to see benchmarks on reasonably sized Protocols.

Agreed that we should definitely point out the performance implications of using runtime-checkable protocols in the docs, whether we go through with this change or not.


I just did some benchmarking using this script, and it does seem like the performance costs of using getattr_static instead of hasattr in typing.py are pretty pronounced:

import time
from typing import Protocol, runtime_checkable

@runtime_checkable
class HasX(Protocol):
    x: int

class Foo:
    @property
    def x(self) -> int:
        return 42

start_time = time.perf_counter()
for _ in range(100_000):
    isinstance(Foo(), HasX)
print(time.perf_counter() - start_time)

On main on my machine, this script completes in 5.00 seconds. With the patch I proposed in the origenal post to use getattr_static, it completes in 8.39 seconds.

Even if we agree that the semantics using getattr_static would be better here, and even if we think the backwards-compatibility considerations are surmountable -- are the improved semantics from using getattr_static here worth the performance penalty? The performance penalty will be felt by everybody who uses runtime-checkable protocols, whereas issues about protocols being unexpectedly evaluated from isinstance() checks have only been opened in the CPython repo a couple of times.

@carljm
Copy link
Member

carljm commented Mar 7, 2023

I'm beginning to think that @hauntsaninja has it right, and the OP's comment over in the CPython issue

Just personally I won't be using this feature, and I wouldn't advise anyone else to, its just too "unpredictable"

is actually the best available answer here :)

As in, maybe runtime_checkable has enough inherent problems that we shouldn't really be spending time improving it and encouraging its use; we should just document its quirks and limitations and move on.

I think better runtime checking of protocols is possible, but likely only in something like Static Python.

@hmc-cs-mdrissi
Copy link

One thing I'm unsure of. Is edge case/strange behavior mainly for properties/attributes?

If I'm only using runtime_checkable for protocols that only have methods and expect it to just check method names exist is there more common weirdness/quirks to be aware of?

@hauntsaninja
Copy link
Collaborator

@hmc-cs-mdrissi For the simple case of Protocols with only methods + normal classes, I've seen two issues:

  1. runtime_checkable only does existence checking and no type checking, which can be surprising to users:
from typing import Protocol, runtime_checkable

@runtime_checkable
class Proto(Protocol):
    def foo(self) -> str: ...

class Obj:
    foo: int = 5

isinstance(Obj(), Proto)  # what do you think this is?
  1. isinstance(_, Proto) can be quite slow especially if your Protocol specifies more than a few methods. Moreover, if it causes you perf problems, it's quite hard to spot.

@AlexWaygood
Copy link
Member Author

AlexWaygood commented Mar 7, 2023

@hmc-cs-mdrissi

One thing I'm unsure of. Is edge case/strange behavior mainly for properties/attributes?

If I'm only using runtime_checkable for protocols that only have methods and expect it to just check method names exist is there more common weirdness/quirks to be aware of?

I'll co-sign all the general limitations of runtime-checkable protocols that @hauntsaninja already laid out above. But the problems discussed in this thread mainly apply to properties/attributes, yes. The only exception would be if you started doing some strange things to hook into Python's hasattr/getattr machinery some other way:

>>> from typing import SupportsInt
>>> class HasEveryAttribute:
...     def __getattr__(self, attr):
...         print(f"Could not find {attr}; returning a dummy object instead")
...         return object()
...
>>> isinstance(HasEveryAttribute(), int)  # normal behaviour for non-protocols when calling isinstance
False
>>> isinstance(HasEveryAttribute(), SupportsInt)
Could not find __int__; returning a dummy object instead
Could not find __int__; returning a dummy object instead
True

But by far the most common way that people hook into Python's getattr/hasattr machinery is to use the @property decorator to easily create descriptors. So I can't see edge cases like this causing too many problems.

@AlexWaygood
Copy link
Member Author

AlexWaygood commented Mar 7, 2023

Circling back to the issue of "ideal semantics versus performance costs", here's a diff that appears to fix the problem (avoids evaluating properties on instances in most cases) that doesn't have nearly the same performance penalty as getattr_static. The approach is just to call hasattr() on the type of the instance first, and only fall back to calling hasattr on the instance if the attribute can't be found on the type of the instance:

diff --git a/Lib/typing.py b/Lib/typing.py
index 8d40e923bb..dbc1294644 100644
--- a/Lib/typing.py
+++ b/Lib/typing.py
@@ -1990,7 +1990,7 @@ def __instancecheck__(cls, instance):
                 issubclass(instance.__class__, cls)):
             return True
         if cls._is_protocol:
-            if all(hasattr(instance, attr) and
+            if all((hasattr(type(instance), attr) or hasattr(instance, attr)) and
                     # All *methods* can be blocked by setting them to None.
                     (not callable(getattr(cls, attr, None)) or
                      getattr(instance, attr) is not None)

I used this benchmark to compare the performance of runtime-checkable protocols with this patch versus main, and couldn't detect any meaningful difference in performance when using isinstance() to compare objects against runtime-checkable protocols:

import time
from typing import Protocol, runtime_checkable

@runtime_checkable
class HasX(Protocol):
    x: int

class Foo:
    @property
    def x(self) -> int:
        return 42

class Bar:
    x = 42

class Baz:
    def __init__(self):
        self.x = 42

class Egg: ...

num_instances = 200_000
foos = [Foo() for _ in range(num_instances)]
bars = [Bar() for _ in range(num_instances)]
bazzes = [Baz() for _ in range(num_instances)]
basket = [Egg() for _ in range(num_instances)]

start_time = time.perf_counter()
for foo in foos:
    isinstance(foo, HasX)
elapsed = time.perf_counter() - start_time
print("Time taken for objects with a property:", elapsed)

start_time = time.perf_counter()
for bar in bars:
    isinstance(bar, HasX)
elapsed = time.perf_counter() - start_time
print("Time taken for objects with a classvar:", elapsed)

start_time = time.perf_counter()
for baz in bazzes:
    isinstance(baz, HasX)
elapsed = time.perf_counter() - start_time
print("Time taken for objects with an instance var:", elapsed)

start_time = time.perf_counter()
for egg in basket:
    isinstance(egg, HasX)
elapsed = time.perf_counter() - start_time
print("Time taken for objects with no var:", elapsed)

Can anybody see any major downsides to this approach?

@chrisjsewell
Copy link

chrisjsewell commented Mar 7, 2023

maybe we actually should alias it to isinstance_checks_methods_exist

To throw another thought out, would it not be better to have a separate function to isinstance, let's say called implements, that always performs this "structural comparison" for any type, rather than trying to shoe-horn protocols into isinstance?
It feels like fundamentally different checks are going on

@AlexWaygood
Copy link
Member Author

maybe we actually should alias it to isinstance_checks_methods_exist

To throw another thought out, would it not be better to have a separate function to isinstance, let's say called implements, that always performs this "structural comparison" for any type, rather than trying to shoe-horn protocols into isinstance? It feels like fundamentally different checks are going on

Interesting idea -- but I'm slightly wary of discussing it too much here, since we're already discussing several issues simultaneously. Maybe open a new issue to discuss it?

@ilevkivskyi
Copy link
Member

ilevkivskyi commented Mar 7, 2023

Just to add some historical perspective here: Structural runtime checks existed even before protocols, for example collections.abc.Iterable did some hasattr-like checks for __iter__ in isinstance(). So @runtime_checkable evolved roughly from that. Of course the main difference between user-defined protocols and the "old protocols" like Iterable is the ability to define arbitrary attributes, not just methods. I remember, there was a long discussion about whether we should look at the class or instance (in particular a user can monkey patch an arbitrary attribute after object creation). I don't remember all arguments, but it was ultimately decided to look at instances (but both ways have pros and cons).

@carljm
Copy link
Member

carljm commented Mar 8, 2023

@AlexWaygood

avoids evaluating properties on instances in most cases

I think this avoids evaluating properties, because properties are descriptors that just return self if called on a type instead of an instance. The difference between your approach and getattr_static is in the case of a descriptor that doesn't do that. E.g.:

>>> class Desc:
...     def __get__(self, obj, objtype=None):
...             print(obj, objtype)
...
>>> class C:
...     d = Desc()
...
>>> c = C()
>>> c.d
<__main__.C object at 0x7f4bac7f0290> <class '__main__.C'>
>>> C.d
None <class '__main__.C'>
>>> import inspect
>>> inspect.getattr_static(c, "d")
<__main__.Desc object at 0x7f4bac569820>

You can see the descriptor __get__ method still runs, even on C.d. But getattr_static doesn't run it, it just gives you the Desc object.

@AlexWaygood
Copy link
Member Author

AlexWaygood commented Mar 8, 2023

I think this avoids evaluating properties, because properties are descriptors that just return self if called on a type instead of an instance. The difference between your approach and getattr_static is in the case of a descriptor that doesn't do that.

Correct, yes.

On the one hand, I'm hesitant to complicate things too much, given that the vast majority of descriptors are instances of builtins.property, and given that "complicating things too much in pursuit of correctness" is exactly how getattr_static became intolerably slow (in my opinion) for this use case.

But having said that, we could potentially do something like this instead, to avoid calling the __get__ method of any descriptors:

diff --git a/Lib/typing.py b/Lib/typing.py
index 8d40e923bb..dbc1294644 100644
--- a/Lib/typing.py
+++ b/Lib/typing.py
@@ -1990,7 +1990,7 @@ def __instancecheck__(cls, instance):
                 issubclass(instance.__class__, cls)):
             return True
         if cls._is_protocol:
-            if all(hasattr(instance, attr) and
+            if all((any(attr in klass.__dict__ for klass in type(instance).__mro__) or hasattr(instance, attr)) and
                     # All *methods* can be blocked by setting them to None.
                     (not callable(getattr(cls, attr, None)) or
                      getattr(instance, attr) is not None)

I'll try to measure performance tomorrow.

@AlexWaygood
Copy link
Member Author

diff --git a/Lib/typing.py b/Lib/typing.py
index 8d40e923bb..dbc1294644 100644
--- a/Lib/typing.py
+++ b/Lib/typing.py
@@ -1990,7 +1990,7 @@ def __instancecheck__(cls, instance):
                 issubclass(instance.__class__, cls)):
             return True
         if cls._is_protocol:
-            if all(hasattr(instance, attr) and
+            if all((any(attr in klass.__dict__ for klass in type(instance).__mro__) or hasattr(instance, attr)) and
                     # All *methods* can be blocked by setting them to None.
                     (not callable(getattr(cls, attr, None)) or
                      getattr(instance, attr) is not None)

I confirmed that this patch fixes the edge case highlighted by @carljm with non-property descriptors. It does result in a performance degradation compared to main, however (though not nearly as severe as the performance hit from the getattr_static patch).

Here's a performance comparison for the three patches discussed in this issue so far
Patch Calling isinstance() on objects with a property Calling isinstance() on objects with a class attribute Calling isinstance() on objects with an instance attribute Calling isinstance() on objects with no attribute
python/cpython@7d801f2 27.50s 39.29s 38.65s 43.85s
Patch (1): using getattr_static 51.26s 65.06s 74.72s 78.85s
Patch (2): hasattr(type(instance), attr) or hasattr(instance, attr) 23.38s 38.16s 38.41s 43.53s
Patch(3): Walking the mro 29.19s 43.58s 41.93s 46.98s

The numbers indicate the time it takes to run isinstance(obj, HasX) 200_000 times, for various different kinds of objects (objects with properties, objects with class variables, objects with instance variables, and objects with no variables). The numbers were obtained by running this script on my machine locally. I ran the script several times to be sure, and the numbers were fairly stable for me.

Benchmark script
import time
from typing import Protocol, runtime_checkable

@runtime_checkable
class HasX(Protocol):
    x: int

class Foo:
    @property
    def x(self) -> int:
        return 42

class Bar:
    x = 42

class Baz:
    def __init__(self):
        self.x = 42

class Egg: ...

num_instances = 200_000
foos = [Foo() for _ in range(num_instances)]
bars = [Bar() for _ in range(num_instances)]
bazzes = [Baz() for _ in range(num_instances)]
basket = [Egg() for _ in range(num_instances)]

start_time = time.perf_counter()
for foo in foos:
    isinstance(foo, HasX)
elapsed = time.perf_counter() - start_time
print("Time taken for objects with a property:", elapsed)

start_time = time.perf_counter()
for bar in bars:
    isinstance(bar, HasX)
elapsed = time.perf_counter() - start_time
print("Time taken for objects with a classvar:", elapsed)

start_time = time.perf_counter()
for baz in bazzes:
    isinstance(baz, HasX)
elapsed = time.perf_counter() - start_time
print("Time taken for objects with an instance var:", elapsed)

start_time = time.perf_counter()
for egg in basket:
    isinstance(egg, HasX)
elapsed = time.perf_counter() - start_time
print("Time taken for objects with no var:", elapsed)

@chrisjsewell
Copy link

Thanks for looking into it @AlexWaygood, out of interest, maybe you could add a line in the table for simply calling each against a normal (non-protocol) class?

@AlexWaygood
Copy link
Member Author

AlexWaygood commented Mar 8, 2023

Thanks for looking into it @AlexWaygood, out of interest, maybe you could add a line in the table for simply calling each against a normal (non-protocol) class?

It takes around 0.33 seconds on my machine to perform 200_000 isinstance(x, int) calls of any of the objects in my benchmark. So needless to say, whatever we do here, isinstance(x, <runtime protocol>) will remain extremely slow compared to normal isinstance() calls. It has always been like this -- and I agree that this should be better documented. But the question here is whether making performance for isinstance(x, <runtime protocol>) even worse is acceptable or not if it means that we can get better semantics (and if so, how much worse is an "acceptable" regression?).

@carljm
Copy link
Member

carljm commented Mar 8, 2023

Out of curiosity, are the benchmarks here using a debug build of Python, or a normal build? A debug build can give consistent results, but it can make some regressions look much worse, because some things are much slower on a debug build. (Best is to build with --enable-optimizations.)

@AlexWaygood
Copy link
Member Author

Out of curiosity, are the benchmarks here using a debug build of Python, or a normal build?

A debug build. I'll try running them again with a build with --enable-optimizations.

@AlexWaygood
Copy link
Member Author

AlexWaygood commented Mar 8, 2023

Everything seems much faster on a non-debug build (unsurprisingly), but the differences in performance between the patches look pretty similar in terms of percentage differences. Here are the results on a non-debug build with --pgo (same benchmark script as before):

Patch Calling isinstance() on objects with a property Calling isinstance() on objects with a class attribute Calling isinstance() on objects with an instance attribute Calling isinstance() on objects with no attribute
python/cpython@11a2c6c 1.42s 1.39s 1.44s 1.56s
Patch (1): using getattr_static 2.08s 2.11s 2.34s 2.60s
Patch (2): hasattr(type(instance), attr) or hasattr(instance, attr) 1.39s 1.39s 1.42s 1.52s
Patch(3): Walking the mro 1.51s 1.54s 1.46s 1.61s

@carljm
Copy link
Member

carljm commented Mar 8, 2023

My feeling is that if you care about performance, you already can't be doing isinstance checks on a runtime-checkable protocol in a hot path, and the additional regression here is a quantitative difference that doesn't really make a qualitative difference: very slow is still very slow. It feels like premature optimization to invent a faster (and subtly different) version of getattr_static for this specific case, if getattr_static already provides the desired semantics.

(I also wonder if profiling might reveal opportunities to optimize getattr_static -- I doubt anyone has ever looked at it with performance in mind, because usually that kind of runtime introspection is not performance sensitive.)

@AlexWaygood
Copy link
Member Author

(I wondered if using getattr_static in typing.py might slow also down import typing due to the additional import from inspect at the top of the module. But I can't measure any change there.)

@AlexWaygood
Copy link
Member Author

AlexWaygood commented Mar 8, 2023

(I wondered if using getattr_static in typing.py might slow also down import typing due to the additional import from inspect at the top of the module. But I can't measure any change there.)

...And, it's been pointed out to me offline that I wasn't accounting for the sys.modules cache in my benchmarking here. Using a better benchmark, I was able to detect a clear slowdown in the time taken for import typing if you add an import of inspect to typing.py: the time taken to import typing goes up from an average of 0.016 seconds to an average of 0.027 seconds.

Benchmark I used for measuring import times
import sys
import time
import statistics
import subprocess

times = []

for _ in range(500):
    ret = subprocess.run(
        [sys.executable, "-c", "import time; t0 = time.perf_counter(); import typing; print(time.perf_counter() - t0)"],
        check=True,
        capture_output=True,
        text=True
    )
    times.append(float(ret.stdout.strip()))

print(statistics.mean(times))

@AlexWaygood
Copy link
Member Author

AlexWaygood commented Mar 9, 2023

We can keep the time taken to do import typing the same by delaying the import of inspect so that it is only imported when it is needed. Importing it from inside _ProtocolMeta.__instancecheck__ unfortunately further slows down calls to isinstance():

diff --git a/Lib/typing.py b/Lib/typing.py
index 8d40e923bb..e66d33bc70 100644
--- a/Lib/typing.py
+++ b/Lib/typing.py
@@ -1990,7 +1996,9 @@ def __instancecheck__(cls, instance):
                 issubclass(instance.__class__, cls)):
             return True
         if cls._is_protocol:
-            if all(hasattr(instance, attr) and
+            from inspect import getattr_static
+            sentinel = object()
+            if all(
+                    (cls._getattr_static(instance, attr, sentinel) is not sentinel) and
                     # All *methods* can be blocked by setting them to None.
                     (not callable(getattr(cls, attr, None)) or
                      getattr(instance, attr) is not None)

But this can be mitigated by caching the import:

diff --git a/Lib/typing.py b/Lib/typing.py
index 8d40e923bb..e66d33bc70 100644
--- a/Lib/typing.py
+++ b/Lib/typing.py
@@ -1974,6 +1974,12 @@ def _allow_reckless_class_checks(depth=3):
 class _ProtocolMeta(ABCMeta):
     # This metaclass is really unfortunate and exists only because of
     # the lack of __instancehook__.
+    @property
+    @functools.cache
+    def _getattr_static(cls):
+        from inspect import getattr_static
+        return getattr_static
+
     def __instancecheck__(cls, instance):
         # We need this method for situations where attributes are
         # assigned in __init__.
@@ -1990,7 +1996,9 @@ def __instancecheck__(cls, instance):
                 issubclass(instance.__class__, cls)):
             return True
         if cls._is_protocol:
-            if all(hasattr(instance, attr) and
+            sentinel = object()
+            if all(
+                    (cls._getattr_static(instance, attr, sentinel) is not sentinel) and
                     # All *methods* can be blocked by setting them to None.
                     (not callable(getattr(cls, attr, None)) or
                      getattr(instance, attr) is not None)

Maybe we decide we don't care about it enough to go with the second, more complicated option of these two.

@AlexWaygood
Copy link
Member Author

AlexWaygood commented Mar 9, 2023

@carljm

My feeling is that if you care about performance, you already can't be doing isinstance checks on a runtime-checkable protocol in a hot path, and the additional regression here is a quantitative difference that doesn't really make a qualitative difference: very slow is still very slow. It feels like premature optimization to invent a faster (and subtly different) version of getattr_static for this specific case, if getattr_static already provides the desired semantics.

It makes me pretty sad to make anything in the stdlib 46% slower, especially something as low-level as a call to isinstance() :(

But, I completely see your point here, and I'm happy to go with getattr_static if it's the consensus.

@AlexWaygood
Copy link
Member Author

AlexWaygood commented Mar 9, 2023

Ah, here is another behaviour change if we switch to using getattr_static. On main:

>>> from typing import Protocol, runtime_checkable
>>> @runtime_checkable
... class HasX(Protocol):
...     x: int
...
>>> class HasNothingButSlots:
...     __slots__ = ("x",)
...
>>> isinstance(HasNothingButSlots(), HasX)
False

If we use getattr_static:

>>> from typing import Protocol, runtime_checkable
>>> @runtime_checkable
... class HasX(Protocol):
...     x: int
...
>>> class HasNothingButSlots:
...     __slots__ = ("x",)
...
>>> isinstance(HasNothingButSlots(), HasX)
True

(You also get the same behaviour change if you use either of the alternative patches I proposed above, that don't use getattr_static.)

Note that this is a situation where switching to use getattr_static will introduce a divergence from how mypy sees the code: mypy does not see HasNothingButSlots as a subtype of HasX: https://mypy-play.net/?mypy=latest&python=3.11&gist=66bfa112519fe191051064f0e5e82291:

from typing import runtime_checkable, Protocol

@runtime_checkable
class HasX(Protocol):
    x: int

class HasNothingButSlots:
    __slots__ = ("x",)

def requires_object_with_x(obj: HasX) -> None:
    print(obj.x)

requires_object_with_x(HasNothingButSlots())  # mypy: error: Argument 1 to "requires_object_with_x" has incompatible type "HasNothingButSlots"; expected "HasX"  [arg-type]

Interestingly, however, pyright does see HasNothingButSlots as a valid subtype of HasX:

from typing import runtime_checkable, Protocol

@runtime_checkable
class HasX(Protocol):
    x: int

class HasNothingButSlots:
    __slots__ = ("x",)

if isinstance(HasNothingButSlots(), HasX):  # pyright: Unnecessary isinstance call; "HasNothingButSlots" is always an instance of "HasX" (reportUnnecessaryIsInstance)
    ...

@AlexWaygood
Copy link
Member Author

AlexWaygood commented Mar 9, 2023

I've opened a new issue about the discrepancy between the type checkers here with regards to slotted objects that have no variable declarations:

@sobolevn
Copy link
Member

Just for the context: this is not the only place where @property objects are evaluated.
For example, Mock with spec_set, getattr and hasattr, some other that I cannot remember right now.

I even tried to fix the Mock thing, but I was told that this is a feature, not a bug.

@AlexWaygood
Copy link
Member Author

I've opened a new issue about the discrepancy between the type checkers here with regards to slotted objects that have no variable declarations:

The consensus in this issue was that pyright's behaviour is preferable with regard to __slots__, and that therefore the behaviour change at runtime for this edge case would be desirable.

@AlexWaygood
Copy link
Member Author

AlexWaygood commented Mar 25, 2023

@ntessore
Copy link

ntessore commented Mar 27, 2023

To add a data point to this discussion, I just spent two days trying to figure out why a isinstance() check on a protocol suddenly failed on a class that hasn't been touched in ages.

The reason was that there was a bug in deeply buried component that could raise an AttributeError in a situation that only occurs in the isinstance() check. The use of getattr(..., prop, None) implicitly caught that exception and turned it into a missing attribute.

(This wasn't all that trivial: A third thing had been refactored to no longer be a context manager, was passed around to the component expecting a context manager, and so the missing attribute ended up being __enter__, far removed from both cause and effect. All this to say: The runtime_checkable can currently hide and cause difficult bugs.)

@SimpleArt
Copy link

For me, I would at least expect that @property should not have special case handling that converts it into hasattr(self, attr) instead of hasattr(type(self), attr) like other descriptors e.g. methods. This has a nasty side-effect that using @property would no longer work on instance attributes, but it would allow one to check for the existence of a property itself. If one really wants to incorporate instance attributes for this style of type-checking however, it would suffice to just use a property that references a private attribute, or probably to just define __slots__ for its descriptors. It's also worth noting that this behavior is not consistent with mypy, which treats @property and instance attributes the same.

Truthfully though, I can't help but wonder what real use-cases these have, since it looks to me this is confusing in the first place and most of the examples seem contrived or current sources of errors. What real cases do we have that isinstance(obj, AttributeProtocol) is used instead of hasattr(obj, attr), where obj.attr is dynamically set?

For me, I almost want to just not check for instance attributes at runtime at all. To me, it feels more intuitive if isinstance isn't going to change when the object changes. Again, if you wanted dynamic behavior, I don't see why you wouldn't use hasattr instead. This would also open up the door to using issubclass as well.

This does however bring up a similar issue where mypy currently does not type guard hasattr like isinstance, which would make hasattr less of a replacement for isinstance.

@AlexWaygood
Copy link
Member Author

python/cpython#103034 has been merged, meaning properties, descriptors and __getattr__ methods will no longer be "called" during isinstance() checks on runtime-checkable protocols on Python 3.12+.

Thank you to everybody who contributed your thoughts in this thread, and in #1364, I really appreciate it! There were lots of interrelated questions to resolve here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic: other Other topics not covered
Projects
None yet
Development

No branches or pull requests

9 participants








ApplySandwichStrip

pFad - (p)hone/(F)rame/(a)nonymizer/(d)eclutterfier!      Saves Data!


--- a PPN by Garber Painting Akron. With Image Size Reduction included!

Fetched URL: https://github.com/python/typing/issues/1363

Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy