Prevent pydap (dap4) from changing string arrays to unicode type (testing). Fixed upstream #10482

Merged
merged 8 commits into from
Jul 2, 2025
force string array elements to be dtype=S for testing, avoid unicode
Mikejmnez committed Jul 1, 2025
commit 6be02af6cd07ce9059a9b545592269107a55b473
22 changes: 12 additions & 10 deletions xarray/tests/test_backends.py
@@ -5425,7 +5425,9 @@ def convert_to_pydap_dataset(self, original):

         ds = DatasetType("bears", **original.attrs)
         for key, var in original.data_vars.items():
-            ds[key] = BaseType(key, var.values, dims=var.dims, **var.attrs)
+            ds[key] = BaseType(
+                key, var.values, dtype=var.values.dtype.kind, dims=var.dims, **var.attrs
+            )
         # check all dims are stored in ds
         for d in original.coords:
             ds[d] = BaseType(d, original[d].values, dims=(d,), **original[d].attrs)
@@ -5437,9 +5439,9 @@ def create_datasets(self, **kwargs):
             # print("QQ0:", expected["bears"].load())
             pydap_ds = self.convert_to_pydap_dataset(expected)
             actual = open_dataset(PydapDataStore(pydap_ds))
-            if Version(np.__version__) < Version("2.3.0"):
-                # netcdf converts string to byte not unicode
-                expected["bears"] = expected["bears"].astype(str)
+            # netcdf converts string to byte not unicode
+            # fixed in pydap 3.5.6. https://github.com/pydap/pydap/issues/510
+            actual["bears"].values = actual["bears"].values.astype("S")
             yield actual, expected

     def test_cmp_local_file(self) -> None:
@@ -5483,9 +5485,6 @@ def test_compatible_to_netcdf(self) -> None:
             with create_tmp_file() as tmp_file:
                 actual.to_netcdf(tmp_file)
                 with open_dataset(tmp_file) as actual2:
-                    if Version(np.__version__) < Version("2.3.0"):
-                        # netcdf converts string to byte not unicode
-                        actual2["bears"] = actual2["bears"].astype(str)
                     assert_equal(actual2, expected)

     @requires_dask
@@ -5503,9 +5502,11 @@ def create_dap2_datasets(self, **kwargs):
         # in pydap 3.5.0, urls defaults to dap2.
         url = "http://test.opendap.org/opendap/data/nc/bears.nc"
         actual = open_dataset(url, engine="pydap", **kwargs)
+        # pydap <3.5.6 converts to unicode dtype=|U. Not what
+        # xarray expects. Thus force to bytes dtype. pydap >=3.5.6
+        # does not convert to unicode. https://github.com/pydap/pydap/issues/510
+        actual["bears"].values = actual["bears"].values.astype("S")
Contributor

Just so I understand: pydap is changing behaviour and we are enforcing that new behaviour always?

Contributor

That's my understanding, too.

Contributor Author

@Mikejmnez Mikejmnez Jul 2, 2025

Yes, sort of... It was more like pydap was always enforcing a change from "S" --> "U", which was not jibing with xarray. Beginning with the new version it will no longer enforce the change to unicode. Once the minimal pydap version moves up from 3.5.0 --> 3.5.6 this line won't be necessary.
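
To make the "S" versus "U" distinction in this thread concrete, here is a minimal NumPy-only sketch. The array contents are made up for illustration and are not the values stored in bears.nc. It only shows that a byte-string array and its unicode counterpart carry different dtype kinds until one is cast, which is roughly what the `astype("S")` lines in this commit rely on.

```python
import numpy as np

# Hypothetical sample data; not the actual contents of bears.nc.
as_bytes = np.array([b"ind", b"ist", b"ing"])  # dtype kind "S" (byte strings)

# Simulate the conversion that, per the thread, pydap <3.5.6 applied.
as_unicode = as_bytes.astype("U")              # dtype kind "U" (unicode)

print(as_bytes.dtype.kind)    # S
print(as_unicode.dtype.kind)  # U

# Casting back to bytes restores the original dtype and values.
restored = as_unicode.astype("S")
print(restored.dtype == as_bytes.dtype)    # True
print(np.array_equal(restored, as_bytes))  # True
```

The cast is lossless here only because the sample strings are plain ASCII; that is an assumption of the sketch, not a claim about arbitrary data.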

         with open_example_dataset("bears.nc") as expected:
-            # workaround to restore string which is converted to byte
-            expected["bears"] = expected["bears"].astype(str)
             yield actual, expected

     def output_grid_deprecation_warning_dap2dataset(self):
@@ -5518,7 +5519,8 @@ def create_dap4_dataset(self, **kwargs):
         actual = open_dataset(url, engine="pydap", **kwargs)
         with open_example_dataset("bears.nc") as expected:
             # workaround to restore string which is converted to byte
-            expected["bears"] = expected["bears"].astype(str)
+            # only needed for pydap <3.5.6 https://github.com/pydap/pydap/issues/510
+            expected["bears"].values = expected["bears"].values.astype("S")
             yield actual, expected

     def test_session(self) -> None:
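
The review thread notes that the cast should become unnecessary once the minimum supported pydap version reaches 3.5.6. One way that condition could be expressed is a version-gated helper. The sketch below is an assumption for illustration and is not code from this PR: the helper name `coerce_to_bytes` and its `pydap_version` argument are hypothetical, and the version string is passed in explicitly so the example runs without pydap installed.

```python
import numpy as np
from packaging.version import Version

# Version from which, per the linked pydap issue, strings are no longer
# converted to unicode.
FIXED_IN = Version("3.5.6")


def coerce_to_bytes(values: np.ndarray, pydap_version: str) -> np.ndarray:
    """Hypothetical helper: cast unicode arrays to bytes for older pydap only."""
    if Version(pydap_version) < FIXED_IN and values.dtype.kind == "U":
        return values.astype("S")
    return values


sample = np.array(["ind", "ist"])  # made-up unicode data
print(coerce_to_bytes(sample, "3.5.0").dtype.kind)  # S
print(coerce_to_bytes(sample, "3.5.6").dtype.kind)  # U
```

The commit itself applies the cast unconditionally, which keeps the tests simple while both older and newer pydap releases are supported.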