How should isinstance() work at runtime when comparing classes with properties against runtime-checkable protocols? #1363
Comments
I think the other issue has somewhat dominated the conversation, and most of the points in it partly carry over here. I think the runtime should generally be lax: if type checkers disagree on the expected semantics, with some being more lenient, the runtime should not treat that interpretation as an error. I also think the property evaluation behavior is too easy a footgun. It can be documented more, but it was definitely a surprise to me to read the CPython issue and have to review whether any of my runtime_checkable usages were affected. I occasionally have expensive cached_properties that could even produce an error depending on where the isinstance code is executed, but that are fine to call in their intended execution context. An example of this is a property that loads a file that a user may not have permission to read at the time of the isinstance call, but that is safe to evaluate later elsewhere.
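To illustrate the footgun described in this comment, here is a minimal sketch that uses plain hasattr() rather than a protocol check, so it does not depend on how a given Python version implements runtime-checkable protocols. The `Report` class and the `probe` helper are hypothetical names invented for the example.

```python
class Report:
    """Hypothetical class whose property fails with a non-AttributeError."""

    @property
    def contents(self) -> str:
        # Stand-in for reading a file the caller cannot access yet.
        raise PermissionError("not readable at this point in the program")


def probe(obj: object, name: str) -> str:
    """Describe what hasattr() does for `name` on `obj`."""
    try:
        return "present" if hasattr(obj, name) else "absent"
    except Exception as exc:  # hasattr() only swallows AttributeError
        return f"raised {type(exc).__name__}"


result = probe(Report(), "contents")
print(result)
```

Any structural check built on hasattr() inherits this behaviour: the property getter runs, and anything other than AttributeError escapes to the caller.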
Explicitly-raised exceptions are not part of the Python type system. We don't annotate functions for what exceptions they can raise, we don't expect callers to handle all exceptions that might be raised (i.e. we don't have checked exceptions), so "prevent all exceptions" is not a form of safety that we expect the type system to provide. If we say that we want the type-checker to ensure "safety" of the |
Let's be clear here that this issue doesn't actually have much to do with the behaviour of static type checkers (other than whether we should work to align the runtime behaviour more with how static type checkers currently see the code). This issue is purely about what should happen at runtime when using |
I do think we should attempt (as much as possible) to have consistent semantics for what it means for an instance to be compatible with a Protocol, rather than intentionally having different semantics for static vs runtime checking. So I think the question of whether runtime checking should mark |
That makes sense to me. What about the concerns about backwards compatibility here? Do we need some kind of deprecation warning? Or would it be better to just go ahead with the change, document it, and hope that not too many people are relying on the current behaviour? (They probably aren't, I think? But I don't really know how to judge that.) |
I'd lean towards notifying a couple of other notable runtime type-checking libraries and asking whether they are (a) aware of this behavior and (b) likely to be affected by a change. Sourcegraph can help find usages. I've manually spot-checked 5 files at random and found 0 attributes in any of their protocols, only methods. After work I may see whether the Sourcegraph API is easy enough to use for a small script that checks what percentage of runtime_checkable protocols use attributes. Even though I sort of feel this is a bug, it's probably safer not to backport and to make the change only for main/3.12+.
TBH, if we agree that we ideally do want runtime checking and static checking to agree on Protocol compatibility, that really implies even deeper changes to I still do think that a single consistent interpretation is better, because static type checkers will make narrowing decisions based on runtime
It looks like @ilevkivskyi did the initial implementation of |
Agreed, I'd definitely be very wary about backporting this behaviour change |
That would be very problematic for situations like the following:

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class HasX(Protocol):
    x: int

class Foo:
    def __init__(self) -> None:
        self.x: int = 42

f = Foo()
# evaluates to True currently
# evaluates to False if we only look at the type of f, and don't look at the instance
isinstance(f, HasX)
```
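The point about looking only at the type can be checked directly with hasattr(), without involving protocols at all. This is a small sketch reusing the `Foo` class from the example above; the variable names are illustrative.

```python
class Foo:
    def __init__(self) -> None:
        self.x: int = 42  # stored in the instance __dict__, not on the class


f = Foo()
on_type = hasattr(type(f), "x")   # the class itself has no attribute x
on_instance = hasattr(f, "x")     # the instance does
print(on_type, on_instance)
```

A check that consulted only `type(f)` would therefore miss every attribute assigned in `__init__`.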
Yes, of course, good point; at runtime the type can't tell us what attributes instances will have. So the more general case of instances with atypical sets of attributes is just an unavoidable inconsistency. I suspect the case of properties conditionally raising |
I think Frankly, I think there is no behaviour surrounding properties that will make everyone happy. The fact that python/cpython#89138 (comment) for some more general thoughts on
While this is true, a) I don't think we clearly communicate anywhere that In general, |
Agreed that we should definitely point out the performance implications of using runtime-checkable protocols in the docs, whether we go through with this change or not. I just did some benchmarking using this script, and it does seem like the performance costs of using

```python
import time
from typing import Protocol, runtime_checkable

@runtime_checkable
class HasX(Protocol):
    x: int

class Foo:
    @property
    def x(self) -> int:
        return 42

start_time = time.perf_counter()
for _ in range(100_000):
    isinstance(Foo(), HasX)
print(time.perf_counter() - start_time)
```

On Even if we agree that the semantics using
I'm beginning to think that @hauntsaninja has it right, and the OP's comment over in the CPython issue
is actually the best available answer here :) As in, maybe I think better runtime checking of protocols is possible, but likely only in something like Static Python. |
One thing I'm unsure of: is the edge-case/strange behavior mainly limited to properties/attributes? If I'm only using runtime_checkable for protocols that only have methods, and I expect it to just check that the method names exist, are there other common quirks to be aware of?
@hmc-cs-mdrissi For the simple case of Protocols with only methods + normal classes, I've seen two issues:
|
I'll co-sign all the general limitations of runtime-checkable protocols that @hauntsaninja already laid out above. But the problems discussed in this thread mainly apply to properties/attributes, yes. The only exception would be if you started doing some strange things to hook into Python's

```pycon
>>> from typing import SupportsInt
>>> class HasEveryAttribute:
...     def __getattr__(self, attr):
...         print(f"Could not find {attr}; returning a dummy object instead")
...         return object()
...
>>> isinstance(HasEveryAttribute(), int)  # normal behaviour for non-protocols when calling isinstance
False
>>> isinstance(HasEveryAttribute(), SupportsInt)
Could not find __int__; returning a dummy object instead
Could not find __int__; returning a dummy object instead
True
```

But by far the most common way that people hook into Python's
Circling back to the issue of "ideal semantics versus performance costs", here's a diff that appears to fix the problem (avoids evaluating properties on instances in most cases) that doesn't have nearly the same performance penalty as

```diff
diff --git a/Lib/typing.py b/Lib/typing.py
index 8d40e923bb..dbc1294644 100644
--- a/Lib/typing.py
+++ b/Lib/typing.py
@@ -1990,7 +1990,7 @@ def __instancecheck__(cls, instance):
                 issubclass(instance.__class__, cls)):
             return True
         if cls._is_protocol:
-            if all(hasattr(instance, attr) and
+            if all((hasattr(type(instance), attr) or hasattr(instance, attr)) and
                     # All *methods* can be blocked by setting them to None.
                     (not callable(getattr(cls, attr, None)) or
                      getattr(instance, attr) is not None)
```

I used this benchmark to compare the performance of runtime-checkable protocols with this patch versus

```python
import time
from typing import Protocol, runtime_checkable

@runtime_checkable
class HasX(Protocol):
    x: int

class Foo:
    @property
    def x(self) -> int:
        return 42

class Bar:
    x = 42

class Baz:
    def __init__(self):
        self.x = 42

class Egg: ...

num_instances = 200_000
foos = [Foo() for _ in range(num_instances)]
bars = [Bar() for _ in range(num_instances)]
bazzes = [Baz() for _ in range(num_instances)]
basket = [Egg() for _ in range(num_instances)]

start_time = time.perf_counter()
for foo in foos:
    isinstance(foo, HasX)
elapsed = time.perf_counter() - start_time
print("Time taken for objects with a property:", elapsed)

start_time = time.perf_counter()
for bar in bars:
    isinstance(bar, HasX)
elapsed = time.perf_counter() - start_time
print("Time taken for objects with a classvar:", elapsed)

start_time = time.perf_counter()
for baz in bazzes:
    isinstance(baz, HasX)
elapsed = time.perf_counter() - start_time
print("Time taken for objects with an instance var:", elapsed)

start_time = time.perf_counter()
for egg in basket:
    isinstance(egg, HasX)
elapsed = time.perf_counter() - start_time
print("Time taken for objects with no var:", elapsed)
```

Can anybody see any major downsides to this approach?
To throw another thought out, would it not be better to have a separate function to |
Interesting idea -- but I'm slightly wary of discussing it too much here, since we're already discussing several issues simultaneously. Maybe open a new issue to discuss it? |
Just to add some historical perspective here: Structural runtime checks existed even before protocols, for example |
I think this avoids evaluating properties, because properties are descriptors that just
You can see the descriptor |
Correct, yes. On the one hand, I'm hesitant to complicate things too much, given that the vast majority of descriptors are instances of But having said that, we could potentially do something like this instead, to avoid calling the

```diff
diff --git a/Lib/typing.py b/Lib/typing.py
index 8d40e923bb..dbc1294644 100644
--- a/Lib/typing.py
+++ b/Lib/typing.py
@@ -1990,7 +1990,7 @@ def __instancecheck__(cls, instance):
                 issubclass(instance.__class__, cls)):
             return True
         if cls._is_protocol:
-            if all(hasattr(instance, attr) and
+            if all((any(attr in klass.__dict__ for klass in type(instance).__mro__) or hasattr(instance, attr)) and
                     # All *methods* can be blocked by setting them to None.
                     (not callable(getattr(cls, attr, None)) or
                      getattr(instance, attr) is not None)
```

I'll try to measure performance tomorrow.
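As a rough illustration of why the class `__dict__` lookup matters for non-property descriptors, the sketch below pulls the check out into a standalone function. `LoudDescriptor` and `has_member` are hypothetical names made up for this example; they are not part of typing.py.

```python
class LoudDescriptor:
    """Hypothetical non-property descriptor that records every __get__ call."""

    def __init__(self) -> None:
        self.calls = 0

    def __get__(self, instance, owner=None):
        self.calls += 1
        return 42


class Foo:
    x = LoudDescriptor()


def has_member(obj: object, name: str) -> bool:
    """Illustrative helper mirroring the check in the diff above."""
    # Looking in each class __dict__ never triggers the descriptor protocol.
    in_class = any(name in klass.__dict__ for klass in type(obj).__mro__)
    # Fall back to hasattr() for attributes that live only on the instance.
    return in_class or hasattr(obj, name)


descriptor = Foo.__dict__["x"]  # fetch the descriptor without invoking __get__
found_statically = has_member(Foo(), "x")
calls_after_static_check = descriptor.calls

hasattr(type(Foo()), "x")  # class-level hasattr() does invoke __get__
calls_after_class_hasattr = descriptor.calls
print(found_statically, calls_after_static_check, calls_after_class_hasattr)
```

The membership test finds `x` without calling the descriptor, whereas `hasattr(type(instance), attr)` from the earlier patch still calls `__get__` on descriptors that do not simply return themselves when accessed on the class.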
I confirmed that this patch fixes the edge case highlighted by @carljm with non-property descriptors. It does result in a performance degradation compared to Here's a performance comparison for the three patches discussed in this issue so far
The numbers indicate the time it takes to run

Benchmark script

```python
import time
from typing import Protocol, runtime_checkable

@runtime_checkable
class HasX(Protocol):
    x: int

class Foo:
    @property
    def x(self) -> int:
        return 42

class Bar:
    x = 42

class Baz:
    def __init__(self):
        self.x = 42

class Egg: ...

num_instances = 200_000
foos = [Foo() for _ in range(num_instances)]
bars = [Bar() for _ in range(num_instances)]
bazzes = [Baz() for _ in range(num_instances)]
basket = [Egg() for _ in range(num_instances)]

start_time = time.perf_counter()
for foo in foos:
    isinstance(foo, HasX)
elapsed = time.perf_counter() - start_time
print("Time taken for objects with a property:", elapsed)

start_time = time.perf_counter()
for bar in bars:
    isinstance(bar, HasX)
elapsed = time.perf_counter() - start_time
print("Time taken for objects with a classvar:", elapsed)

start_time = time.perf_counter()
for baz in bazzes:
    isinstance(baz, HasX)
elapsed = time.perf_counter() - start_time
print("Time taken for objects with an instance var:", elapsed)

start_time = time.perf_counter()
for egg in basket:
    isinstance(egg, HasX)
elapsed = time.perf_counter() - start_time
print("Time taken for objects with no var:", elapsed)
```
Thanks for looking into it @AlexWaygood, out of interest, maybe you could add a line in the table for simply calling each against a normal (non-protocol) class? |
It takes around 0.33 seconds on my machine to perform 200_000 |
Out of curiosity, are the benchmarks here using a debug build of Python, or a normal build? A debug build can give consistent results, but it can make some regressions look much worse, because some things are much slower on a debug build. (Best is to build with |
A debug build. I'll try running them again with a build with |
Everything seems much faster on a non-debug build (unsurprisingly), but the differences in performance between the patches look pretty similar in terms of percentage differences. Here are the results on a non-debug build with
|
My feeling is that if you care about performance, you already can't be doing (I also wonder if profiling might reveal opportunities to optimize |
(I wondered if using |
...And, it's been pointed out to me offline that I wasn't accounting for the

Benchmark I used for measuring import times

```python
import sys
import time
import statistics
import subprocess

times = []

for _ in range(500):
    ret = subprocess.run(
        [sys.executable, "-c", "import time; t0 = time.perf_counter(); import typing; print(time.perf_counter() - t0)"],
        check=True,
        capture_output=True,
        text=True
    )
    times.append(float(ret.stdout.strip()))

print(statistics.mean(times))
```
We can keep the time taken to do

```diff
diff --git a/Lib/typing.py b/Lib/typing.py
index 8d40e923bb..e66d33bc70 100644
--- a/Lib/typing.py
+++ b/Lib/typing.py
@@ -1990,7 +1996,9 @@ def __instancecheck__(cls, instance):
                 issubclass(instance.__class__, cls)):
             return True
         if cls._is_protocol:
-            if all(hasattr(instance, attr) and
+            from inspect import getattr_static
+            sentinel = object()
+            if all(
+                    (cls._getattr_static(instance, attr, sentinel) is not sentinel) and
                     # All *methods* can be blocked by setting them to None.
                     (not callable(getattr(cls, attr, None)) or
                      getattr(instance, attr) is not None)
```

But this can be mitigated by caching the import:

```diff
diff --git a/Lib/typing.py b/Lib/typing.py
index 8d40e923bb..e66d33bc70 100644
--- a/Lib/typing.py
+++ b/Lib/typing.py
@@ -1974,6 +1974,12 @@ def _allow_reckless_class_checks(depth=3):
 class _ProtocolMeta(ABCMeta):
     # This metaclass is really unfortunate and exists only because of
     # the lack of __instancehook__.
+    @property
+    @functools.cache
+    def _getattr_static(cls):
+        from inspect import getattr_static
+        return getattr_static
+
     def __instancecheck__(cls, instance):
         # We need this method for situations where attributes are
         # assigned in __init__.
@@ -1990,7 +1996,9 @@ def __instancecheck__(cls, instance):
                 issubclass(instance.__class__, cls)):
             return True
         if cls._is_protocol:
-            if all(hasattr(instance, attr) and
+            sentinel = object()
+            if all(
+                    (cls._getattr_static(instance, attr, sentinel) is not sentinel) and
                     # All *methods* can be blocked by setting them to None.
                     (not callable(getattr(cls, attr, None)) or
                      getattr(instance, attr) is not None)
```

Maybe we decide we don't care about it enough to go with the second, more complicated option of these two.
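The caching idea can be shown in isolation. The sketch below is a simplified module-level variant, not the metaclass property from the diff: `lazy_getattr_static` is a hypothetical helper name, and `functools.cache` needs Python 3.9 or later.

```python
import functools


@functools.cache
def lazy_getattr_static():
    """Import inspect on first use only, then reuse the cached function."""
    # The import statement runs once; later calls hit the cache instead.
    from inspect import getattr_static
    return getattr_static


first = lazy_getattr_static()
second = lazy_getattr_static()
print(first is second)
```

Deferring the import this way keeps `inspect` out of the module's import-time cost, and the cache avoids repeating the import statement on every check.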
It makes me pretty sad to make anything in the stdlib 46% slower, especially something as low-level as a call to But, I completely see your point here, and I'm happy to go with |
Ah, here is another behaviour change if we switch to using

```pycon
>>> from typing import Protocol, runtime_checkable
>>> @runtime_checkable
... class HasX(Protocol):
...     x: int
...
>>> class HasNothingButSlots:
...     __slots__ = ("x",)
...
>>> isinstance(HasNothingButSlots(), HasX)
False
```

If we use

```pycon
>>> from typing import Protocol, runtime_checkable
>>> @runtime_checkable
... class HasX(Protocol):
...     x: int
...
>>> class HasNothingButSlots:
...     __slots__ = ("x",)
...
>>> isinstance(HasNothingButSlots(), HasX)
True
```

(You also get the same behaviour change if you use either of the alternative patches I proposed above, that don't use Note that this is a situation where switching to use

```python
from typing import runtime_checkable, Protocol

@runtime_checkable
class HasX(Protocol):
    x: int

class HasNothingButSlots:
    __slots__ = ("x",)

def requires_object_with_x(obj: HasX) -> None:
    print(obj.x)

requires_object_with_x(HasNothingButSlots())  # mypy: error: Argument 1 to "requires_object_with_x" has incompatible type "HasNothingButSlots"; expected "HasX" [arg-type]
```

Interestingly, however, pyright does see

```python
from typing import runtime_checkable, Protocol

@runtime_checkable
class HasX(Protocol):
    x: int

class HasNothingButSlots:
    __slots__ = ("x",)

if isinstance(HasNothingButSlots(), HasX):  # pyright: Unnecessary isinstance call; "HasNothingButSlots" is always an instance of "HasX" (reportUnnecessaryIsInstance)
    ...
```
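The slots difference comes down to how the two lookups treat an unassigned slot, which can be checked without any protocol. This is a small sketch using the `HasNothingButSlots` class from the examples above; the variable names are illustrative.

```python
from inspect import getattr_static


class HasNothingButSlots:
    __slots__ = ("x",)  # declares the slot but never assigns it


obj = HasNothingButSlots()
sentinel = object()

# hasattr() runs the slot descriptor, which raises AttributeError when unset.
dynamic_lookup = hasattr(obj, "x")
# getattr_static() finds the slot descriptor on the class without running it.
static_lookup = getattr_static(obj, "x", sentinel) is not sentinel
print(dynamic_lookup, static_lookup)
```

So a static lookup reports the member as present even though reading `obj.x` would fail until the slot is assigned.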
I've opened a new issue about the discrepancy between the type checkers here with regards to slotted objects that have no variable declarations: |
Just for the context: this is not the only place where I even tried to fix the |
The consensus in this issue was that pyright's behaviour is preferable with regard to |
To add a data point to this discussion, I just spent two days trying to figure out why a The reason was that there was a bug in deeply buried component that could raise an (This wasn't all that trivial: A third thing had been refactored to no longer be a context manager, was passed around to the component expecting a context manager, and so the missing attribute ended up being |
For me, I would at least expect that Truthfully though, I can't help but wonder what real use-cases these have, since it looks to me this is confusing in the first place and most of the examples seem contrived or current sources of errors. What real cases do we have that For me, I almost want to just not check for instance attributes at runtime at all. To me, it feels more intuitive if This does however bring up a similar issue where mypy currently does not type guard |
python/cpython#103034 has been merged, meaning properties, descriptors and Thank you to everybody who contributed your thoughts in this thread, and in #1364, I really appreciate it! There were lots of interrelated questions to resolve here. |
We've been having a debate in python/cpython#102433 about how isinstance() should work at runtime when comparing instances of classes with properties against runtime-checkable protocols. There's a few interrelated issues here, so I'm interested in knowing what the wider community thinks about the best way forward.

Current behaviour

The current behaviour at runtime is that a property x on an object obj will be evaluated if you call isinstance() on obj against a runtime-checkable protocol that specifies x as part of the interface:

The reason for this is that calling isinstance() on obj against HasX performs a structural check comparing obj to HasX. If all attributes and methods specified in the HasX protocol are present on obj, the isinstance() check returns True. This is taken care of by _ProtocolMeta.__instancecheck__, which just calls hasattr on obj for every attribute specified in the HasX protocol; calling hasattr(obj, "x") will result in the x property on obj being evaluated as part of the isinstance() check.

An alternative to the current behaviour at runtime

An alternative to the current implementation would be to change the hasattr call in _ProtocolMeta.__instancecheck__ to use inspect.getattr_static (or similar):

This would mean that properties would not be evaluated against instances, since getattr_static avoids evaluating descriptors at runtime:

Criticisms of the current behaviour at runtime

The current behaviour at runtime has been criticised due to the fact that it can result in unexpected behaviour from isinstance() checks, which don't usually result in properties being evaluated on instances:

These criticisms have been levelled in the CPython repo more than once (a previous instance of this being reported was python/cpython#89445), so this is clearly pretty surprising behaviour to some people.
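To make the contrast between the two lookups concrete, here is a minimal sketch that counts how often a property getter runs. It calls hasattr() and inspect.getattr_static directly instead of going through isinstance(), so the result does not depend on which implementation a particular Python version uses for protocols; the `evaluations` counter is an illustrative addition.

```python
from inspect import getattr_static


class Foo:
    def __init__(self) -> None:
        self.evaluations = 0

    @property
    def x(self) -> int:
        self.evaluations += 1  # side effect that reveals each evaluation
        return 42


foo = Foo()
sentinel = object()

# Static lookup: finds the property object on the class, never calls it.
found_static = getattr_static(foo, "x", sentinel) is not sentinel
evaluations_after_static = foo.evaluations

# Dynamic lookup: hasattr() performs a real attribute access.
found_dynamic = hasattr(foo, "x")
evaluations_after_hasattr = foo.evaluations
print(found_static, evaluations_after_static, found_dynamic, evaluations_after_hasattr)
```

Both lookups report that `x` exists, but only hasattr() runs the getter.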
Possible rebuttal to the criticisms of the current behaviour

You could say that the whole point of runtime-checkable protocols is that they do something "a little different" when you call isinstance() against them. Arguing "I didn't expect this because it's not what isinstance() usually does" may not be the strongest case here.

When it comes to raising exceptions inside property getter methods, it may also be unwise in general to raise exceptions that aren't subclasses of AttributeError -- doing so will generally lead to unexpected things happening with hasattr() or getattr() calls.

Defence of the current behaviour at runtime

The current behaviour has things to recommend it. Firstly, using hasattr() here is probably faster than an alternative implementation using getattr_static. _ProtocolMeta.__instancecheck__ has previously been criticised on performance grounds. This could make things even worse.

Moreover, while the current implementation leads to surprising outcomes for some users, it's possible that changing the implementation here could also lead to surprising outcomes. getattr_static has different semantics to hasattr: sometimes it finds attributes where hasattr doesn't, and sometimes it doesn't find attributes when hasattr does. It's hard to predict exactly what the effect of changing this would be, and it could be a breaking change for some users. The semantics of _ProtocolMeta.__instancecheck__ have been the same for a long time now; it's possible that the better course of action might be to simply document that calling isinstance on an instance obj against a runtime-checkable protocol might result in properties on obj being evaluated.

Possible rebuttal to the defence of the current behaviour

We probably shouldn't prioritise speed over correctness here; users who care deeply about performance probably shouldn't be using runtime-checkable protocols anyway.

Just because the semantics have been the same for several years doesn't mean we should necessarily stick with them if they're bad; sometimes breaking changes are necessary.
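One direction of the semantic difference mentioned above, where hasattr finds an attribute that getattr_static does not, can be sketched with a class that synthesises attributes in `__getattr__`. `Proxy` is a hypothetical class invented for this example.

```python
from inspect import getattr_static


class Proxy:
    """Hypothetical class that synthesises attributes on demand."""

    def __getattr__(self, name: str) -> int:
        if name == "x":
            return 42
        raise AttributeError(name)


proxy = Proxy()
sentinel = object()

# hasattr() falls back to __getattr__, so the attribute appears to exist.
dynamic_lookup = hasattr(proxy, "x")
# getattr_static() never calls __getattr__ and finds nothing to return.
static_lookup = getattr_static(proxy, "x", sentinel) is not sentinel
print(dynamic_lookup, static_lookup)
```

Code that relies on `__getattr__` to satisfy a protocol would therefore behave differently under a static lookup.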
A specific case where changing the implementation would lead to different semantics

One reason why I lean towards keeping the current implementation is because I'd like to keep the invariant where the isinstance guard in the following snippet is enough to ensure that accessing the x attribute is safe:

If we changed the implementation of _ProtocolMeta.__instancecheck__ to use getattr_static or similar, this wouldn't necessarily be the case. Consider:

To me, being able to use isinstance() guards in this way -- to provide structural narrowing at runtime as well as statically, ensuring attributes can always be accessed safely -- is kind of the whole point of the runtime_checkable decorator.

Others have argued, however, that they would find it more intuitive if the runtime isinstance() check was closer to how static type checkers understood the above snippet. (Static type checkers have no special understanding of the fact that properties can sometimes raise AttributeError, so will always consider the Foo class above as being a valid subtype of HasX, contradicting the current behaviour at runtime.)

Thoughts?
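The invariant discussed in this section can be sketched with a property that conditionally raises AttributeError. This is an illustrative reconstruction, not the snippet from the original post: the `ready` flag is a made-up detail, and hasattr() and getattr_static() stand in for the two candidate implementations of the isinstance() guard.

```python
from inspect import getattr_static


class Foo:
    """Hypothetical class whose property is unavailable until configured."""

    def __init__(self, ready: bool) -> None:
        self.ready = ready

    @property
    def x(self) -> int:
        if not self.ready:
            raise AttributeError("x is not available yet")
        return 42


foo = Foo(ready=False)
sentinel = object()

dynamic_guard = hasattr(foo, "x")  # evaluates the property, sees the error
static_guard = getattr_static(foo, "x", sentinel) is not sentinel  # sees only the descriptor

try:
    foo.x
    access_is_safe = True
except AttributeError:
    access_is_safe = False
print(dynamic_guard, static_guard, access_is_safe)
```

Here the hasattr()-style guard agrees with what actually happens on access, while the static guard reports the attribute as present even though reading it fails.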