-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Does gridded write compliant UGRID netcdfs? #65
Comments
The intent is to produce UGRID compliant files, yes. But that code hasn't had any attention in a long time -- since before UGRID hit 1.0. PRs accepted:-) As to what you've noticed, the names of the dimensions and variables are arbitrary. So the only issue here may be the conventions attribute. But if you notice other issues, please let us know. |
@ChrisBarker-NOAA I'm happy to put together a PR. I'll try adding the conventions attribute and see if RGFGRID can read that first. Since I'm not very familiar with the UGRID standard I might need some guidance on some parts of the spec if more indepth changes are required. |
thanks! |
@ChrisBarker-NOAA One my meshes challenges some of the assumptions made by the GridU class. The GridU class doesn't seem to like meshes with mixed faces (you end up with a masked array which breaks the edge and face handling logic in the class). In refactoring some of the methods, I think it would be better to store edges and faces as sets instead of numpy arrays. This would avoid a lot of the issues I'm having with ragged arrays. I can provide methods for converting edges and faces to numpy arrays. For ordering purposes, edge and face vertices would be stored in sorted order (though that may not necessarily mean CW or CCW). |
hmm. You are quite right, we were vaguely thinking of supporting mixed faces, but never actually tested anything. So we welcome help with that. However, I'm wary of going to a non- numpy solution -- numpy arrays buy you a lot. The UGRID netcdf standard specifies using a flag value (or mask) to handle mixed faces, so we just assumed we'd do that int he UGRid class as well. But you are right, numpy doesn't have a ragged array concept built in in any way, but that's what we really want here. My first choice would be to make a ragged array class -- I"ve done some work on that in the past, but never finished it, but that would buy us something that would act like a numpy array as much as possible. Thinking out loud, for this use case, maybe not a true ragged array, but rather, a kind of custom masked array for this use case. That is, the information would be stored as it is now, with a mask or flagged value for the "null" nodes. But when you accessed an element, you'd get a correct sized array back. Lots of details to be worked out of course. Another option would be to make an array-like object with another data structure underneath actually storing the information. But why a set? sets are optimized for fast membership testing, but they are unordered, which I think would make it a challenge for this use case. That is, no way to find out what the Nth face is. Anyway -- you've thought about this more than I have, so I may be thinking about it incorrectly. I think the way to proceed is to get a manageable sized test case, and some tests, and then you can prototype a solution and see how it goes. |
BTW: with regard to UGRID compliance, I think we're not far off. However, what we don't have is a way to maintain / specify dimension or variable names, which maybe we should. That is, if you load a grid from a NetCDF file, and save it out again, it would be nice if all the names weren't changed on you :-) I believe the one name it is currently supposed to preserve is the name of the mesh -- but that's it. |
@ChrisBarker-NOAA I realize it might be a big step to move away from numpy. I was kinda thinking aloud too. I'm still in the process of learning how meshes are stored and computed on. As I've poked around some more, I've realized order in very important. Reading the spec, I noticed this:
Perhaps, the fill value route is doable. It would be nice if the heavy looping could be ported to cython to minimize the performance hit. I don't know of any array libraries that allow ragged arrays off the top of my head. Maybe a list of arrays? |
Regarding UGRID compatibility, the following produces files that load in QGIS. def save_ugrid(filepath, grid):
mesh_name = grid.mesh_name
with netCDF4.Dataset(filepath, 'w') as ds:
ds.Conventions = "CF-1.6 UGRID-1.0"
# create dimensions
node_dim = ds.createDimension(f"{mesh_name}_num_node", len(grid.nodes))
ds.createDimension("two", 2)
mesh = ds.createVariable(mesh_name, IND_DT, dimensions=())
mesh.cf_role = "mesh_topology"
mesh.long_name = "Topology data of 2D unstructured mesh"
mesh.topology_dimension = 2
mesh.node_coordinates = "{0}_node_lon {0}_node_lat".format(mesh_name)
mesh.node_dimension = node_dim.name
if grid.edges is not None:
ds.createDimension(f"{mesh_name}_num_edge", grid.edges.shape[0])
mesh.edge_node_connectivity = f"{mesh_name}_edge_nodes"
#if grid.edge_coordinates is not None:
# mesh.edge_coordinates = " ".join((_edge_lon, _edge_lat))
_dims = (f"{mesh_name}_num_edge", "two")
edge_nodes = ds.createVariable(f"{mesh_name}_edge_nodes", IND_DT, dimensions=_dims, fill_value=999999)
edge_nodes[:] = grid.edges
edge_nodes.cf_role = "edge_node_connectivity"
edge_nodes.long_name = ("Maps every edge to the two "
"nodes that it connects.")
edge_nodes.start_index = IND_DT(0)
edge_nodes.mesh = mesh_name
edge_nodes.location = 'edge'
if grid.faces is not None:
faces_dim = ds.createDimension(f"{mesh_name}_num_face", grid.faces.shape[0])
faces_nodes_dim = ds.createDimension(f"{mesh_name}_num_vertices", grid.faces.shape[1])
mesh.max_face_nodes_dimension = faces_nodes_dim.name
#if grid.face_coordinates is not None:
# mesh.edge_coordinates = "{0}_face_lon {0}_face_lat".format(mesh_name)
_dims = (f"{mesh_name}_num_face", f"{mesh_name}_num_vertices")
face_nodes = ds.createVariable(f"{mesh_name}_face_nodes", IND_DT, dimensions=_dims, fill_value=999999)
mesh.face_node_connectivity = face_nodes.name
face_nodes[:] = grid.faces
face_nodes.cf_role = "face_node_connectivity"
face_nodes.long_name = ("Maps every triangular face to "
"its three corner nodes.")
face_nodes.start_index = IND_DT(0)
if grid.boundaries is not None:
ds.createDimension(f"{mesh_name}_num_boundary", grid.boundaries.shape[0])
mesh.boundary_node_connectivity = f"{mesh_name}_boundary_nodes"
_dims = (f"{mesh_name}_num_boundary", "two")
boundary_nodes = ds.createVariable(f"{mesh_name}_boundary_nodes", IND_DT, dimensions=_dims, fill_value=999999)
boundary_nodes[:] = grid.boundaries
boundary_nodes.cf_role = "boundary_node_connectivity"
boundary_nodes.long_name = ("Maps every boundary segment to "
"the two nodes that it connects.")
boundary_nodes.start_index = IND_DT(0)
if grid.face_edge_connectivity is not None:
mesh.face_edge_connectivity = f"{mesh_name}_face_edges"
if grid.face_face_connectivity is not None:
mesh.face_face_connectivity = f"{mesh_name}_face_links"
_lonlats = {'lon': {'long_name': 'longitude', 'axis': 0, 'units': 'degrees_east'},
'lat': {'long_name': 'latitude', 'axis': 1, 'units': 'degrees_north'}}
for location in ('face', 'edge', 'boundary'):
loc = f"{location}_coordinates"
if getattr(grid, loc, None) is not None:
coord_vars = []
for axis, _data in _lonlats.items():
name = f"{mesh_name}_{location}_{axis}"
coord_vars.append(name)
_dims = (f"{mesh_name}_num_{location}",)
var = ds.createVariable(name, NODE_DT, dimensions=_dims)
var[:] = getattr(grid, loc)[:, _data['axis']]
var.standard_name = _data['long_name']
var.units = _data['units']
var.long_name = f"Characteristics {var.standard_name} of 2D mesh {location}"
var.mesh = mesh_name
var.location = location
#if location in {'face', 'edge'}:
# mesh.setncattr(f"{location}_coordinates", " ".join(coord_vars))
_dims = (f"{mesh_name}_num_node",)
node_lon = ds.createVariable(f"{mesh_name}_node_lon", grid.nodes.dtype, dimensions=_dims)
node_lon[:] = grid.nodes[:, 0]
node_lon.standard_name = "longitude"
node_lon.long_name = "Longitude of 2D mesh nodes."
node_lon.units = "degrees_east"
node_lon.mesh = mesh.name
node_lon.location = "node"
node_lat = ds.createVariable(f"{mesh_name}_node_lat", grid.nodes.dtype, dimensions=_dims)
node_lat[:] = grid.nodes[:, 1]
node_lat.standard_name = "latitude"
node_lat.long_name = "Latitude of 2D mesh nodes."
node_lat.units = "degrees_north"
node_lat.mesh = mesh.name
node_lat.location = "node"
print("Saved to", filepath) For some reason adding the face and edge coordinates breaks things, and I'm not sure why. |
It looks like QGIS is using MDAL; (https://github.com/lutraconsulting/MDAL) which says it supports UGRID. But if it can't read a file with face and edge coordinates, then that's a bug in MDAL :-( |
At least I get QGIS to read and render the mesh. RGFGRID (from Deltares) doesn't load the file correctly either. Perhaps RGFGRID might be expecting specifically named dimensions/variables. |
@ChrisBarker-NOAA I think I cracked the case on UGRID compatibility. I'm working on a way to preserve names from a netcdf file. The mesh variable has defined keys like Example API: schema = Schema.from_nc(<filename>)
schema['edge_dimension'] # returns the name of the file edge dimension
schema['dimensions'] # a dictionary of objects representing the dimensions (not the netCDF4.Dimension objects). Stores name and size
schema['variables'] # a dictionary of variable info (except for the mesh variable which is stored in the top level dictionary
# Some helper methods to retrieve name of dimensions
schema['edge_dimension'] # results in something like mesh_num_edge
schema.get_dimension('mesh_num_edge') # results in 'edge_dimension'
# Optionally pass in grid to evaluate to a value
schema.get_dimension('mesh_num_edge', mesh=grid) # evaluates getattr(grid, schema.get_dimension('mesh_num_edge'))
# With a coordinate variable
schema.get_variable('mesh_node_x') # results in something like ('node_coordinates', 0)
schema.get_variable('mesh_node_x', mesh=grid) # evaluates getattr(grid, 'node_coordinates')[:,0] When writing to a file the anticipated use would be: for d in schema['dimensions']:
ds.createDimension(d, schema.get_dimension(d, mesh=grid))
for name, v in schema['variables'].items():
ds.createVariable(name, v['dtype'], dimensions=v['dimensions'])
... For creating a file from scratch, a function could be written to generate names for each entity. Schemas could be validated to ensure that dimensions and variables are defined and names match. |
I will work on mixed meshes in another issue/PR. |
hmm, I'm wondering about the amount of nesting here -- but you're probably right, keeping the names of everything in one object probably makes sense. But where to put it? I think an attribute of the UGRid object makes sense. When a UGRid is read from a netcd file, the "schema" object would get populated. When a UGrid object was created "from scratch" -- the names object would get populated with default names -- or they could be hand-added. Any other way to create a UGRid object would be responsible for populating the names object. Have you tried putting any of this code in the existing code yet? |
@ChrisBarker-NOAA Could elaborate on the nesting? Is it the amount of nesting in the schema itself? I basically tried to mirror the structure of the existing UGRID netcdf, but promoting the mesh variable attributes to the top level as I felt they were the most critical details. If you have ideas on a better layout, I'm happy to hear them. I have most of it coded already I'll put in a PR. I was finishing up my proof of concept when I discovered the missing dimension attributes. I'll omit the integration bit and hopefully that will become clearer with some actual code to review. |
By nesting, I mean do we have:
or
or
right now, I'm thinking:
which I think is more or less what you're doing. so mesh_name is the only top-level attribute, and all the other names are only one level deep. |
An example Schema object:
How this schema object should relate to the UGrid object is unclear to me. One possibility is to attach it via an attribute. |
Got it. my thoughts:
Maybe make it a python object at the top level, e.g.
That would be consistent with the rest of the UGrid object, but,now that I think about it, might make for more boilerplate code :-) |
What I'm trying to achieve is subtle, but in my opinion, very useful. Perhaps I have failed to adequately explain my objective here. |
Thanks for the clarifications. A few thoughts:
That may be been the way to go in the first place, but I really don't want to change those names in the UGrid object at this point -- and adding aliases simply confuses things, and I'm not sure we buy much. But if it's only one name, maybe ....
yes, that is very useful -- see my notes on the PR -- that's required to read a netCDF file anyway. Currently, the rading code does all the introspection while reading the file, which is less than ideal.
I'm not sure that it's very useful to ask the question in that direction -- the point of the What is most useful is A) "which variable in this netcdf file performs this particular function (i.e. is mapped to a UGrid attribute) And the second one is: And is is what could use some major refactoring, and B is a new feature. But once you have that, yes you could ask what role a particular variable name in a netcdf file plays -- but that shouldn't be the priority. BTW: I may be remembering wrong here, but I think the |
After manipulating the mesh in gridded, saving the resulting grid doesn't seem to produce a valid UGRID formatted netcdf. RGFGRID doesn't seem to know what to do with the file.
I noticed that a netcdf output by RGFGRID (in UGRID format) look like:
whereas the file output by gridded looks like:
Noted differences are:
Does gridded produce compliant UGRID netcdf files? Reading the documentation, it's never explicitly stated that the outputs conform to any standard. I simply expected the GridU save method to output something that was UGRID.
The text was updated successfully, but these errors were encountered: