Allow setting fill_value on Zarr format 3 arrays #10161

Merged
merged 5 commits into from
Mar 26, 2025
Set use_zarr_fill_value_as_mask=False
dcherian committed Mar 24, 2025
commit 1fba99a56e93bdba369b8610f58f96728fb57533
3 changes: 2 additions & 1 deletion doc/user-guide/io.rst
@@ -1033,7 +1033,8 @@ For the Zarr version 3 format, ``_FillValue`` and ``fill_value`` are decoupled.
So you can set ``fill_value`` in ``encoding`` as usual.

Note that at read-time, you can control whether ``_FillValue`` is masked using the
-``mask_and_scale`` kwarg.
+``mask_and_scale`` kwarg; and whether Zarr's ``fill_value`` is treated as synonymous
+with ``_FillValue`` using the ``use_zarr_fill_value_as_mask`` kwarg to :py:func:`xarray.open_zarr`.


.. _io.kerchunk:
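
A minimal sketch of the two read-time switches described in the hunk above, assuming an xarray release that includes this change. `RAW_OPEN_KWARGS` and `open_without_masking` are hypothetical names used only for illustration; only the keyword arguments come from the documentation text. `xarray` is imported lazily so the snippet loads even where xarray is not installed.

```python
# Hypothetical names for illustration; only the kwargs come from the docs above.
RAW_OPEN_KWARGS = {
    # Do not mask values equal to the ``_FillValue`` attribute
    # (this also turns off scale_factor/add_offset decoding).
    "mask_and_scale": False,
    # Do not treat Zarr's array-level ``fill_value`` as if it were ``_FillValue``.
    "use_zarr_fill_value_as_mask": False,
}


def open_without_masking(store):
    """Open a Zarr store with both masking mechanisms switched off."""
    import xarray as xr  # imported here so the module loads without xarray

    return xr.open_zarr(store, **RAW_OPEN_KWARGS)
```

With both switches off, the values returned should be whatever Zarr itself hands back, which is the situation the test change in this commit sets up.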
8 changes: 7 additions & 1 deletion xarray/tests/test_backends.py
@@ -3369,7 +3369,13 @@ def test_zarr_fill_value_setting(self, dtype):
# for floats, Xarray inserts a default `np.nan`
expected.foo.attrs["_FillValue"] = np.nan
Contributor

Does this assume that np.nan is serializable to the attributes field using the same JSON encoding that we use for the fill_value field? I ask because the spec doesn't cover how scalars should be encoded in attributes.

Contributor Author

Yes. I think so:

xarray/xarray/backends/zarr.py

Lines 1150 to 1155 in cdebbf5

fill_value = None
if "_FillValue" in attrs:
    # replace with encoded fill value
    fv = attrs.pop("_FillValue")
    if fv is not None:
        attrs["_FillValue"] = FillValueCoder.encode(fv, dtype)

Contributor

Since the Zarr spec only defines a special JSON encoding for nan in the specific context of the fill_value field, I would recommend handling the JSON serialization explicitly here.

Also, in zarr-developers/zarr-python#2874 I remove the behavior where all numpy scalars get special serialization to JSON (instead, the dtype object is responsible for that). But it seems like we will break xarray if we move forward with that change.

Contributor

@d-v-b, Mar 24, 2025

I checked what zarr v3 does today -- although we apply special JSON encoding to any nan, +infinity, or -infinity when writing JSON (e.g., {"foo": np.nan} -> {"foo": "NaN"}), we only apply special decoding when reading the fill value. Special nan values in attributes will not get special decoding. Is xarray basically handling that special decoding here? If so, we don't have anything to worry about w.r.t. the new dtype stuff.
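
The asymmetry described in this comment can be illustrated with the standard library alone. This is a rough sketch, not zarr-python's implementation: `encode_float` and `decode_float` are hypothetical helpers, and the string spellings "NaN", "Infinity", and "-Infinity" are assumed to follow the Zarr v3 convention for float fill values.

```python
import json
import math

# Assumed string spellings for the special floats.
_SPECIAL = {"NaN": math.nan, "Infinity": math.inf, "-Infinity": -math.inf}


def encode_float(value):
    """Return a JSON-safe value: a string for nan/+inf/-inf, else the float."""
    if math.isnan(value):
        return "NaN"
    if math.isinf(value):
        return "Infinity" if value > 0 else "-Infinity"
    return value


def decode_float(obj):
    """Invert encode_float: map the special strings back to floats."""
    if isinstance(obj, str):
        return _SPECIAL[obj]
    return float(obj)


# Writing: the special float becomes a plain JSON string.
text = json.dumps({"foo": encode_float(math.nan)}, allow_nan=False)

# Reading: a generic JSON load leaves it as a string ...
raw = json.loads(text)["foo"]

# ... so whoever owns the attribute has to decode it explicitly.
decoded = decode_float(raw)
```

Here `raw` is still the string "NaN" after `json.loads`; only the explicit `decode_float` call turns it back into a float, which is the step the comment asks whether xarray performs itself.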

Contributor Author

Yes, it looks like Ryan added base64 encode/decode.
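
For readers unfamiliar with the idea, a base64 round trip for a float attribute might look roughly like the following. This is a sketch of the general technique only and is not taken from xarray's `FillValueCoder`; the helper names and the choice of a little-endian 8-byte double are assumptions.

```python
import base64
import math
import struct


def encode_float_b64(value):
    """Pack a float as a little-endian double and return base64 text."""
    packed = struct.pack("<d", float(value))
    return base64.standard_b64encode(packed).decode("ascii")


def decode_float_b64(text):
    """Invert encode_float_b64: base64 text back to a Python float."""
    packed = base64.standard_b64decode(text)
    return struct.unpack("<d", packed)[0]


# nan survives the round trip because only its bytes are stored,
# so no JSON spelling for nan is needed.
encoded_nan = encode_float_b64(math.nan)
roundtripped = decode_float_b64(encoded_nan)
```

Because the attribute is stored as an ordinary string, it does not depend on how the JSON layer spells special floats.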

Contributor

It would be great if we could take that custom attribute encoding/decoding stuff out of Xarray and replace it with @d-v-b's new ZarrDType machinery.

Contributor

I have stand-alone functions that handle this, but they are currently not public API. Would be happy to make them public though!

Contributor

("this" meaning "convert special floats to / from JSON")


-open_kwargs = {"mask_and_scale": False, "consolidated": False}
+# turn off all decoding so we see what Zarr returns to us.
+# Since chunks are not written, we should receive only `fill_value`
+open_kwargs = {
+    "mask_and_scale": False,
+    "consolidated": False,
+    "use_zarr_fill_value_as_mask": False,
+}
save_kwargs = dict(compute=False, consolidated=False)
with self.roundtrip(
ds,