
Validators *only* for assignment? - mimicking properties with setters #11787


Open
4 of 13 tasks
sneakers-the-rat opened this issue Apr 22, 2025 · 7 comments

Comments

@sneakers-the-rat

Initial Checks

  • I have searched Google & GitHub for similar requests and couldn't find anything
  • I have read and followed the docs and still think this feature is missing

Description

Currently:

  • it is possible to run validators on instantiation and assignment with ConfigDict(validate_assignment=True). This is very useful!
  • It is possible to have a @computed_field value with a setter to mutate fields other than the computed field (since it does not, by definition, have its "own" value) on assignment.

It is not possible, as far as I know, to have a validator that runs only on assignment. This seems like a missing feature that makes it tricky to make models that have assignment side effects that should not apply during model instantiation - or more generally, to have models with fields whose behavior mimics @property.setters.

Motivation

As a motivating example, say, hypothetically, we are trying to model a .torrent file. A torrent contains some list of files as well as a set of hashes that correspond to those files. When the file list is changed, we should invalidate any hashes that are set so that we know they need to be recomputed (or recompute on assignment, either/or, equivalent for this example). We should not do that when the model is instantiated, because those hashes are correct at that time!

Assignment-only validators might want to make use of the other fields on the model, so ideally they would not need to be defined as @classmethods as validators need to be.

Assignment validation using model validators leads to recursion for obvious reasons (e.g. #6597), and a specific validator that was aware of the context of assignment would be able to avoid it by only triggering once per actual assignment.

Syntax Example

I figure these would just be a wrapper around the other validator methods that mark them as not being called on model creation, and always generate the appropriate __setattr__ code even if validate_assignment=False (to allow for using them without running validation on every field).

The annotated validators might look like this:

from pathlib import Path
from typing import Annotated, Any
from pydantic import (
    BaseModel, 
    BeforeAssignmentValidator, 
    AfterAssignmentValidator, 
    WrapAssignmentValidator, 
    PydanticSelf
)

# where PydanticSelf is something like
# type PydanticSelf = Generic[TypeVar("T", bound=BaseModel)]

def _invalidate_hashes(self: PydanticSelf["Torrent"], value: Any) -> Any:
    self.hashes = None
    return value

class Torrent(BaseModel):
    files: Annotated[list[Path], AfterAssignmentValidator(_invalidate_hashes)]
    hashes: list[bytes] | None = None
  • we use a special placeholder "PydanticSelf" type, like ValidationInfo, to keep with the optional dependency injection style, but it could just as easily be mandatory or use a regular python type annotation.
  • we could also have self inside of ValidationInfo or similar

The functional form might look like this

class Torrent(BaseModel):
    files: list[Path]
    hashes: list[bytes] | None = None

    @field_validator("files", mode="after", assignment_only=True)
    def invalidate_hashes(self, value: list[Path]) -> list[Path]:
        self.hashes = None
        return value

I'm not sure if this is valid, but another obvious option would be doing this:

class Torrent(BaseModel):
    files: list[Path]
    hashes: list[bytes] | None = None

    @files.setter
    def invalidate_hashes(self, value: list[Path]) -> list[Path]:
        self.hashes = None
        return value

but that seems worse: we could rig it so multiple setters were allowed, unlike normal property setters, but them looking so similar would probably be confusing.

anyway let me know what ya think, as always, i would be more than happy to draft this.

Affected Components

@Viicos
Member

Viicos commented Apr 22, 2025

This seems pretty niche to me, so I'm afraid this will introduce attractive nuisances if implemented as is.

Could you expand on why you don't need validation to be performed on instantiation? The goal of Pydantic is to make sure data is validated so that you can safely make use of it.

You could also go a different way with a solution that might suit your use case better:

hashes is fully determined by files. This seems to be a great fit for defining it as a @property/@cached_property (with an invalidation mechanism to be implemented). You could also define files as a list[File], File being a separate model. This will scale better (you'll be limited with your current approach whenever you need to add extra info to the files), and you can define a hash cached property on File (no need to implement any invalidation mechanism this time), and have hashes defined as a simple property returning [f.hash for f in self.files].
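A stdlib-only sketch of this suggested restructuring (hypothetical File/Torrent names, with sha1 standing in for whatever hashing the real format uses):

```python
from functools import cached_property
from hashlib import sha1
from pathlib import Path

class File:
    def __init__(self, path: Path, data: bytes):
        self.path = path
        self.data = data

    @cached_property
    def hash(self) -> bytes:
        # computed once per File; no invalidation needed, since changing
        # the file list just adds/removes File objects
        return sha1(self.data).digest()

class Torrent:
    def __init__(self, files: list[File]):
        self.files = files

    @property
    def hashes(self) -> list[bytes]:
        # derived on access, so it stays consistent with files
        return [f.hash for f in self.files]

t = Torrent([File(Path("a.txt"), b"hello")])
assert t.hashes == [sha1(b"hello").digest()]
```

Note that in this shape even torrent.files.append(File(...)) keeps hashes consistent, since hashes is derived rather than stored.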

@Viicos Viicos added the awaiting author response awaiting response from issue opener label Apr 22, 2025
@sneakers-the-rat
Author

sneakers-the-rat commented Apr 22, 2025

I don't have any estimates on how niche it is, but my initial guess would be somewhere around how niche properties with setters are? I'm not sure why this would be an attractive nuisance, since this would directly address these issues:
#6597
#6429 (comment)
#8185
#8474
Among others where the behavior of validation on assignment is not very intuitive, it doesn't seem that niche to me, but you know better than i do I suppose.

You could also go a different way with a solution that might suit your use case better:

Those would be possible options if I were not trying to model an existing format, i.e. if the format were arbitrary. Part of the benefit of having the pydantic model be a direct reflection of the format is that serialization/deserialization is direct: I just dump the model and bencode it. This is directly in line with the spirit of making sure the data is validated so you can use it; it's just that validity is nonlocal and depends on the alignment of multiple fields.

Hashing can take quite a long time, and the result is stored in the torrent file when read, so cached property is not really a great fit. Plus, it would still need a trigger to invalidate it when another field changes, which circles back to this issue.

you'll be limited with your current approach whenever you need to add extra info to the files

I'm not sure how? Since the hash only depends on the files, any other fields won't affect this feature or the hash calculation.

@Viicos
Member

Viicos commented Apr 29, 2025

I'm not sure the issues you linked are showing exactly the same problem; for instance in #6597, I assume the author also wants validation to run on instantiation.

Hashing can take quite a long time and is stored in the torrent file when read, so cached property is not really a great fit.

But cached property is precisely meant for this, right? Only computed once, and a proper invalidation mechanism makes it so that it is only recomputed when needed. More broadly I think properties are still a better fit for your use case. For instance, what if someone does torrent.files.append(Path(...))? Then your hashes are inconsistent, but there is no way to catch this with validators on assignment.
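The append limitation is easy to demonstrate with plain Python: rebinding an attribute goes through __setattr__, but in-place mutation of the bound list never does (hypothetical Watched class for illustration):

```python
class Watched:
    def __init__(self):
        # write through __dict__ so __init__ itself isn't counted
        self.__dict__["assignments"] = 0
        self.__dict__["files"] = []

    def __setattr__(self, name, value):
        # count every attribute rebinding
        self.__dict__["assignments"] = self.assignments + 1
        self.__dict__[name] = value

w = Watched()
w.files.append("a.txt")   # in-place mutation: __setattr__ never runs
assert w.assignments == 0
w.files = ["b.txt"]       # rebinding: __setattr__ runs
assert w.assignments == 1
```

This is the same blind spot that validate_assignment has today, so assignment-only validators would inherit it.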

I'm not sure how? Since the hash only depends on the files, any other fields won't affect this feature or the hash calculation.

What I meant is that currently a file is represented as a pathlib.Path. If you ever need to expose some useful properties in the future, it won't be possible unless you do something like:

class TorrentFile(BaseModel):
    path: Path
    some_useful_attribute: <...>

    @property
    def hash(self):
        ...

class Torrent(BaseModel):
    files: list[TorrentFile]

@sneakers-the-rat
Author

Related Issues

Fair, not obvious how some of those issues are related, so here's a little more detailed breakdown of some prior issues that I think having this would help resolve.

Per-field assignment validation, post-init modification of validation behavior

E.g. #10563

Problem: Want to be able to validate some fields on assignment but not others. Or, want to conditionally change assignment validation behavior.

Relation: Straightforwardly, this would allow one to specify assignment validation behavior for specific fields, and would allow differentiating assignment validation from init validation.

Recursion errors with `validate_assignment` and `mode='after'`

E.g. #9576

Problem: having validate_assignment on with an after validator that assigns values necessarily causes recursion.

Relation: To mutate a value on assignment, using model_validator and validate_assignment=True is the most straightforward way, but it's too general: the model validator is triggered on assignment to every field, and every model validator runs. So in #9576, we shouldn't trigger the validation on assignment to self.other; we only care about assignment to self.name.

If instead of this:

from typing import Optional
from pydantic import BaseModel, ConfigDict, model_validator

class User(BaseModel):
    model_config = ConfigDict(validate_assignment=True)

    name: str
    other: Optional[str] = None

    @model_validator(mode='after')
    def validate_me(self):
        self.other = 'abc'
        return self

we were able to do this

T = TypeVar("T")

def _set_other(self, value: Any, info: ValidationInfo) -> Self:
    self.other = "abc"
    return self

SetOther = Annotated[T, AfterAssignmentValidator(_set_other)]

class User(BaseModel):
    name: SetOther[str]
    other: str | None = None

or, functional form, this

class User(BaseModel):

    name: str
    other: Optional[str] = None

    # this, or assignment_only=True, or whatever the syntax may be
    @field_validator("name", mode='after', when_used="assignment")
    def set_other(self, value: Any):
        self.other = "abc"
        return self

then there is no need for enabling validate_assignment for the whole model, the side-effect is contained and triggered only on the fields that are needed, and the infinite recursion is avoided.

The other direction is suggested there: having run_on_assignment=False to disable assignment validation. To me that's backwards: people are using after model_validators to do assignment-time side effects that depend on field values other than the one being mutated, and it seems odd to 1) have a workaround (after model validator + validate_assignment), 2) watch the workaround cause problems, and 3) disable part of the workaround, rather than just make a thing so the workaround isn't needed anymore.

The other suggestion of disabling assignment validation during execution of the after validator also doesn't exactly meet the need either and seems like it would deepen the potential for footguns - one can only imagine the "why don't my validators run on assignment when it seems like they should!" issues.

Even for intrinsically recursive validators that mutate the same value that's being assigned to: if the assignment-only behavior were separated from the general model validators, then it would be possible to not trigger recursive assignment validators when they are called within other assignment validators, without needing to disable them in a way that would be harder to communicate to users than providing an explicit AssignmentValidator.

So e.g. say I have a model like this, with one self-recursing validator that counts the number of assignments and another assignment side effect that emits an event based on another field's value (e.g. say we are implementing a rate limiter or a session limiter or something):

from typing import Callable, Self
from pydantic import BaseModel, ConfigDict, ValidationInfo, field_validator, model_validator

class MyModel(BaseModel):
    model_config = ConfigDict(validate_assignment=True)

    n_assignments: int = -1  # ideally this should be 0, but can't do assignment-only validators
    event_threshold: int = 10
    event: Callable
    value: str = "something we assign to a lot"

    @model_validator(mode="after")
    def increment(self) -> Self:
        self.n_assignments += 1
        return self

    @field_validator("n_assignments", mode="after")
    def emit_event(self, value: int, info: ValidationInfo) -> int:
        if value > info.data['event_threshold']:
            info.data['event']()
            return 0
        return value

If we were to simply disable assignment validation during execution of the validator, then our emit_event validator would fail to run even though it seems like it should. Having a specific assignment validator would give a clear way of saying "assignment validators are always called exactly once per user-initiated assignment" in the docs, separate assignment side effects from model correctness checks, and not have that behavior bleed into instantiation-time model/field validators.

Initialize `@computed_field`/`@cached_property`

I.e. #9131

Problem: This issue is almost identical to what i'm asking for here - want to be able to both initialize a model with a precomputed value, but then also recompute it if it is not present. The problem is "what do we do when we dump the model, do we recompute it or keep the assigned value?"

Relation: If it was possible to have assignment side effects on the other fields of the model that could invalidate the cached property, then the problems with that issue are resolved - One can explicitly annotate when the cached property should be invalidated. So cached_property is not a replacement for setter-like behavior, but a complement to it!

from functools import cached_property
from pydantic import BaseModel, field_validator

class Circle(BaseModel):
    radius: int
    unrelated_prop: str

    @cached_property
    def diameter(self) -> int:
        return self.radius * 2

    # hypothetical assignment_only syntax from this proposal
    @field_validator("radius", mode="after", assignment_only=True)
    def invalidate_diameter(self):
        del self.diameter

That handles the need to instantiate values (cached property can be assigned to), invalidate values (via assignment-only validators), and lazily compute values (if unset, compute using the method when dumped/accessed). That would be very awkward to do with @model_validator and validate_assignment=True on a whole-model basis, even if it could be done while avoiding the recursion problems.
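The invalidation half of this already works with plain functools.cached_property, since the cache lives in the instance __dict__ and del clears it. A stdlib-only sketch (no pydantic, plain class) of the invalidate-then-recompute cycle:

```python
from functools import cached_property

class Circle:
    def __init__(self, radius: int):
        self.radius = radius
        self.computations = 0  # instrumentation, to show when recompute happens

    @cached_property
    def diameter(self) -> int:
        self.computations += 1
        return self.radius * 2

c = Circle(radius=2)
assert c.diameter == 4
assert c.diameter == 4        # served from the cache
assert c.computations == 1
c.radius = 5
del c.diameter                # the manual invalidation step
assert c.diameter == 10       # recomputed from the new radius
assert c.computations == 2
```

The missing piece, and the subject of this issue, is triggering that `del` automatically when radius is assigned.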

Internals problems with assignment validation and state

There are a handful of these issues, e.g. #7105, also #8474

Problem: Validating with assignment causes values to be updated even if they fail validation. According to the issue this is because the object has already been mutated, and there isn't a clear way to rollback the object state

Relation: Taking a brief read of the impl, it seems like the difficulty here comes from the mixed behavior of validate_assignment in pydantic-core, where some impls mutate the value before vs. after the validation step. @model_validator with mode='after' (judging from the issues, the most common way of doing assignment side effects that affect non-target fields) is a tricky case that should have distinct behavior for assignment vs. instantiation. By removing the need to use model_validator for off-target assignment validation, we make it much easier to have safe/performant assignment-time behavior: we may not want to implement the proxy class mentioned in #7105 for all @model_validators, because rolling back state on validation failure would be expensive to do for every instantiation, when we only really care about rolling back object state on assignment. So this may not be unique to assignment-only validators; it's just an example of how providing an "official" way to do the things people are carving desire paths toward might ease maintenance burden.

distinction from @property and @cached_property

But cached property is precisely meant for this right? Only computed once, and a proper invalidation mechanism makes it so that it is only recomputed when needed.

Right, so another way of asking for what i'm asking for is a way of writing that invalidation mechanism - a way of having assign-time side effects on the model that don't recurse.

  • @property: either cannot be passed a value on instantiation, or needs a private attr or a differently-named attr to store the value. @property is not responsive to the need here, as we want to trigger invalidation from mutating other fields; we also want this to be dumped when the model is dumped.
  • @cached_property: same as Make @computed_field (optionally) initializable #9131 - we also want to be able to init the model with these values if they are already precomputed, and then only recompute them if we need to.

The reason these don't really work is the same reason that @property has @property.setter - sometimes we want something to happen to other fields on the model, or use the values of other fields to do something when the value is changed.

Other problems

For instance, what if someone does torrent.files.append(Path(...))

It is true that this would not solve all problems, but it would solve the problem i am having.

What I meant is that currently a file is represented as a pathlib.Path. If you ever need to expose some useful properties in the future, it won't be possible unless you do something like:

I am not quite sure how this is relevant to the need for assignment-only validation behavior, but again just to clarify i am modeling an already-existing specification, so the fields are already prescribed and we can't really remodel our way around problems like this (if we want the model to be a 1:1 representation of the spec, which we do).

(For the record, the actual model looks something like this. I'm modeling an existing format, so I don't get to make calls about what it looks like:)

class FileItem(BaseModel):
    length: int
    path: list[FilePart]
    attr: L[b"p"] | None = None

class InfoDictV1(InfoDictRoot):
    pieces: Pieces | None = None
    length: Annotated[int, Gt(0)] | None = None
    files: Annotated[list[FileItem], MinLen(1)] | None = None
    piece_length: V1PieceLength = Field(alias="piece length")

Omitting details like referenced models in annotations, but ya just to say there is a specific way that one is supposed to add metadata to the metainfo dict and it is not by adding additional fields to the file list.

Another alternative

if this is too complicated or sounds like a bad idea, i would settle for having the information about what initiated a validation call in the ValidationInfo object, so i could roll my own assignment-only validator, like:

class MyModel(BaseModel):
    something: int

    @field_validator("something", mode="after")
    def something_else(cls, value: int, info: ValidationInfo) -> int:
        if info.initiator.type == "assignment":
            # do something that's assignment-only
        return value
        

That seems like it would be generally good to have, as more information about the initiating context would have been useful in several other situations, but it would directly address my need here too.

again just to reiterate that i am not showing up here demanding y'all do work for me, if we arrive at something that sounds reasonable i am happy to draft it!

@Viicos
Member

Viicos commented May 2, 2025

For instance, what if someone does torrent.files.append(Path(...))

It is true that this would not solve all problems, but it would solve the problem i am having.

What I meant is that currently a file is represented as a pathlib.Path. If you ever need to expose some useful properties in the future, it won't be possible unless you do something like:

I am not quite sure how this is relevant to the need for assignment-only validation behavior, but again just to clarify i am modeling an already-existing specification, so the fields are already prescribed and we can't really remodel our way around problems like this (if we want the model to be a 1:1 representation of the spec, which we do).

What I am trying to express here is that Pydantic can't reasonably introduce solutions that will perfectly fit every user use case. I understand you need to have a 1:1 representation of some spec that might not be ideal, but Pydantic can't be held responsible for this.

Instead of providing validators only for assignment, which would have a really limited scope, we need to find more generic ways of achieving such behavior.

For instance, you could have your model defined as:

from pydantic import BaseModel, Field, SkipValidation

def update_hashes(instance, attribute, value):
    instance.files = value
    # Recompute hashes
    instance.hashes = ...

class Torrent(BaseModel):
    files: list[Path] = Field(on_setattr=update_hashes)
    hashes: SkipValidation[list[bytes]]

Inspired by attrs's on_setattr.

Your alternative with ValidationInfo also seems more reasonable to me.

@sneakers-the-rat
Author

What I am trying to express here is that Pydantic can't reasonably introduce solutions that will perfectly fit every user use case. I understand you need to have a 1:1 representation of some spec that might not be ideal, but Pydantic can't be held responsible for this.

Absolutely understood! Not trying to be ornery or saying "pydantic fix my special problem," only reason I raised the issue is because I think it is a more general need and would benefit other use cases, which is why I gathered the related issues and tried to describe how this would benefit them.

Inspired from attrs's on_setattr.

This is great! And would be exactly what I need. If I'm reading the docs right, what I pitched above is basically a pydantic flavor of on_setattr, with annotated and decorated validators instead of in the field.

Your alternative with ValidationInfo also seems more reasonable to me.

Also great! If this is preferable I can make a fuller sketch. I'll go through and find all the other kinds of contexts where a validator can be called so it's a more general solution than just detecting assignment; e.g. being able to differentiate between direct instantiation vs. instantiation nested within a model would also probably be helpful. Will need a little bit of scoping input on that - don't want to make it just an edge case feature that is incomplete, but also don't want to blow it up into a huge change.

@Viicos
Member

Viicos commented May 5, 2025

This is great! And would be exactly what I need. If I'm reading the docs right, what I pitched above is basically a pydantic flavor of on_setattr, with annotated and decorated validators instead of in the field.

Seems like it, and on_setattr has the benefit of fulfilling many other use cases not limited to validation.

I think we'll explore more on_setattr so I'll report here for any progress. We'll try to see if it can fit for 2.12.

@Viicos Viicos removed the awaiting author response awaiting response from issue opener label May 5, 2025