This file documents NCO, a collection of utilities to manipulate and analyze netCDF files.
Copyright © 1995–2025 Charlie Zender
This is the first edition of the NCO User Guide,
and is consistent with version 2 of texinfo.tex.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. The license is available online at http://www.gnu.org/copyleft/fdl.html
The original author of this software, Charlie Zender, wants to improve it
with the help of your suggestions, improvements, bug-reports, and patches.
Charlie Zender <surname at uci dot edu> (yes, my surname is zender)
3200 Croul Hall
Department of Earth System Science
University of California, Irvine
Irvine, CA 92697-3100
Note to readers of the NCO User Guide in HTML format:
The NCO User Guide in PDF format
(also on SourceForge)
contains the complete NCO documentation.
The netCDF Operators, or NCO, are a suite of programs known as
operators.
The operators facilitate manipulation and analysis of data stored in the
self-describing netCDF format, available from
(http://www.unidata.ucar.edu/software/netcdf).
Each NCO operator (e.g., ncks) takes netCDF input
file(s), performs an operation (e.g., averaging, hyperslabbing, or
renaming), and outputs a processed netCDF file.
Although most users of netCDF data are involved in scientific research,
these data formats, and thus NCO, are generic and are equally
useful in fields from agriculture to zoology.
The NCO User Guide illustrates NCO use with
examples from the field of climate modeling and analysis.
The NCO homepage is http://nco.sf.net, and the
source code is maintained at http://github.com/nco/nco.
This documentation is for NCO version 5.3.1-alpha02. It was last updated 1 January 2025. Corrections, additions, and rewrites of this documentation are gratefully welcome.
Enjoy,
Charlie Zender
ncap2 netCDF Arithmetic Processor
ncatted netCDF Attribute Editor
ncbo netCDF Binary Operator
ncchecker netCDF Compliance Checker
ncclimo netCDF Climatology Generator
ncecat netCDF Ensemble Concatenator
nces netCDF Ensemble Statistics
ncflint netCDF File Interpolator
ncks netCDF Kitchen Sink
ncpdq netCDF Permute Dimensions Quickly
ncra netCDF Record Averager
ncrcat netCDF Record Concatenator
ncremap netCDF Remapper
ncrename netCDF Renamer
ncwa netCDF Weighted Averager
NCO is the result of software needs that arose while I worked
on projects funded by NCAR, NASA, and ARM.
Thinking they might prove useful as tools or templates to others,
it is my pleasure to provide them freely to the scientific community.
Many users (most of whom I have never met) have encouraged the
development of NCO.
Thanks especially to Jan Polcher, Keith Lindsay, Arlindo da Silva,
John Sheldon, and William Weibel for stimulating suggestions and
correspondence.
Your encouragement motivated me to complete the NCO User Guide.
So if you like NCO, send me a note!
I should mention that NCO is not connected to or
officially endorsed by Unidata, ACD, ASP,
CGD, or Nike.
Charlie Zender
May 1997
Boulder, Colorado
Major feature improvements entitle me to write another Foreword.
In the last five years a lot of work has been done to refine
NCO.
NCO is now an open source project and appears to be much
healthier for it.
The list of illustrious institutions that do not endorse NCO
continues to grow, and now includes UCI.
Charlie Zender
October 2000
Irvine, California
The most remarkable advances in NCO capabilities in the last
few years are due to contributions from the Open Source community.
Especially noteworthy are the contributions of Henry Butowsky and Rorik
Peterson.
Charlie Zender
January 2003
Irvine, California
NCO was generously supported from 2004–2008 by US
National Science Foundation (NSF) grant
IIS-0431203.
This support allowed me to maintain and extend core NCO code,
and others to advance NCO in new directions:
Gayathri Venkitachalam helped implement MPI;
Harry Mangalam improved regression testing and benchmarking;
Daniel Wang developed the server-side capability, SWAMP;
and Henry Butowsky, a long-time contributor, developed ncap2.
This support also led NCO to debut in professional journals
and meetings.
The personal and professional contacts made during this evolution have
been immensely rewarding.
Charlie Zender
March 2008
Grenoble, France
The end of the NSF SEI grant in August, 2008 curtailed
NCO development.
Fortunately we could justify supporting Henry Butowsky on other research
grants until May, 2010 while he developed the key ncap2
features used in our climate research.
And recently the NASA ACCESS program commenced
funding us to support netCDF4 group functionality.
Thus NCO will grow and evade bit-rot for the foreseeable
future.
I continue to receive with gratitude the thanks of NCO users at nearly every scientific meeting I attend. People introduce themselves, shake my hand and extol NCO, often effusively, while I grin in stupid embarrassment. These exchanges lighten me like anti-gravity. Sometimes I daydream how many hours NCO has turned from grunt work to productive research for researchers world-wide, or from research into early happy-hours. It’s a cool feeling.
Charlie Zender
April, 2012
Irvine, California
The NASA ACCESS 2011 program generously supported (Cooperative Agreement NNX12AF48A) NCO from 2012–2014. This allowed us to produce the first iteration of a Group-oriented Data Analysis and Distribution (GODAD) software ecosystem. Shifting more geoscience data analysis to GODAD is a long-term plan. Then the NASA ACCESS 2013 program agreed to support (Cooperative Agreement NNX14AH55A) NCO from 2014–2016. This support permits us to implement support for Swath-like Data (SLD). Most recently, the DOE has funded me to implement NCO re-gridding and parallelization in support of their ACME program. After many years of crafting NCO as an after-hours hobby, I finally have the cushion necessary to give it some real attention. And I’m looking forward to this next, and most intense yet, phase of NCO development.
Charlie Zender
June, 2015
Irvine, California
The DOE Energy Exascale Earth System Model (E3SM) project (formerly ACME) has generously supported NCO development for the past four years. Supporting NCO for a mission-driven, high-performance climate model development effort has brought unprecedented challenges and opportunities. After so many years of staid progress, the recent development speed has been both exhilarating and terrifying.
Charlie Zender
May, 2019
Laguna Beach, California
The DOE E3SM project has supported NCO development and maintenance since 2015. This is an eternity in the world of research funding! Their reliable support has enabled us to add cutting-edge features including quantization, vertical interpolation, and support for multiple regridding weight-generators. Recently NSF supported us to enable user-friendly support for modern compression algorithms that can make geoscience data analysis greener by reducing dataset size, and thereby storage, power, and associated greenhouse gas emissions. I am grateful for this agency support that inspires me to create new features that help my amazing colleagues pursue their scientific ideas.
Charlie Zender
July, 2022
Laguna Beach, California
This manual describes NCO, which stands for netCDF Operators.
NCO is a suite of programs known as operators.
Each operator is a standalone, command line program executed at the
shell-level like, e.g., ls or mkdir.
The operators take netCDF files (including HDF5 files
constructed using the netCDF API) as input, perform an
operation (e.g., averaging or hyperslabbing), and produce a netCDF file
as output.
The operators are primarily designed to aid manipulation and analysis of
data.
The examples in this documentation are typical applications of the
operators for processing climate model output.
This stems from their origin, though the operators are as general as
netCDF itself.
The complete NCO source distribution is currently distributed
as a compressed tarfile from
http://sf.net/projects/nco
and from
http://dust.ess.uci.edu/nco/nco.tar.gz.
The compressed tarfile must be uncompressed and untarred before building
NCO.
Uncompress the file with ‘gunzip nco.tar.gz’.
Extract the source files from the resulting tarfile with ‘tar -xvf
nco.tar’.
GNU tar
lets you perform both operations in one step
with ‘tar -xvzf nco.tar.gz’.
The documentation for NCO is called the
NCO User Guide.
The User Guide is available in PDF, Postscript,
HTML, DVI, TeXinfo, and Info formats.
These formats are included in the source distribution in the files
nco.pdf, nco.ps, nco.html, nco.dvi,
nco.texi, and nco.info*, respectively.
All the documentation descends from a single source file,
nco.texi [1].
Hence the documentation in every format is very similar.
However, some of the complex mathematical expressions needed to describe
ncwa
can only be displayed in DVI, Postscript, and
PDF formats.
A complete list of papers and publications on/about NCO is available on the NCO homepage. Most of these are freely available. The primary refereed publications are ZeM06 and Zen08. These contain copyright restrictions which limit their redistribution, but they are freely available in preprint form from the NCO homepage.
If you want to quickly see what the latest improvements in NCO are (without downloading the entire source distribution), visit the NCO homepage at http://nco.sf.net. The HTML version of the User Guide is also available online through the World Wide Web at URL http://nco.sf.net/nco.html. To build and use NCO, you must have netCDF installed. The netCDF homepage is http://www.unidata.ucar.edu/software/netcdf.
New NCO releases are announced on the netCDF list
and on the nco-announce
mailing list
http://lists.sf.net/mailman/listinfo/nco-announce.
Detailed instructions about how to download the newest version, and how to compile source code, as well as a FAQ and descriptions of Known Problems etc. are on our homepage (http://nco.sf.net/).
There are twelve operators in the current version (5.3.1-alpha02).
The function of each is explained in Reference Manual.
Many of the tasks that NCO can accomplish are described during
the explanation of common NCO Features (see Shared Features).
More specific use examples for each operator can be seen by visiting the
operator-specific examples in the Reference Manual.
These can be found directly by prepending the operator name with the
xmp_
tag, e.g., http://nco.sf.net/nco.html#xmp_ncks.
Also, users can type the operator name on the shell command line to
see all the available options, or type, e.g., ‘man ncks’ to see
a help man-page.
NCO is a command-line language. You may either use an operator after the prompt (e.g., ‘$’ here), like,
$ operator [options] input [output]
or write all command lines into a shell script, as in the CMIP5 Example (see CMIP5 Example).
If you are new to NCO, the Quick Start (see Quick Start) shows simple examples about how to use NCO on different kinds of data files. More detailed “real-world” examples are in the CMIP5 Example. The Index presents multiple keyword entries for the same subject. If these resources do not help enough, please see Help Requests and Bug Reports.
In its time on Earth, NCO has been successfully ported and tested on so many 32- and 64-bit platforms that if we did not write them down here we would forget their names: IBM AIX 4.x, 5.x, FreeBSD 4.x, GNU/Linux 2.x, LinuxPPC, LinuxAlpha, LinuxARM, LinuxSparc64, LinuxAMD64, SGI IRIX 5.x and 6.x, MacOS X 10.x, DEC OSF, NEC Super-UX 10.x, Sun SunOS 4.1.x, Solaris 2.x, Cray UNICOS 8.x–10.x, and Microsoft Windows (95, 98, NT, 2000, XP, Vista, 7, 8, 10). If you port the code to a new operating system, please send me a note and any patches you required.
The major prerequisite for installing NCO on a particular
platform is the successful, prior installation of the netCDF library
(and, as of 2003, the UDUnits library).
Unidata has shown a commitment to maintaining netCDF and UDUnits on all
popular UNIX platforms, and is moving towards full support for
the Microsoft Windows operating system (OS).
Given this, the only difficulty in implementing NCO on a
particular platform is standardization of various C-language API
system calls.
NCO code is tested for ANSI compliance by
compiling with C99 compilers including those from
GNU (‘gcc -std=c99 -pedantic -D_BSD_SOURCE -D_POSIX_SOURCE -Wall’) [2],
Comeau Computing (‘como --c99’),
Cray (‘cc’),
HP/Compaq/DEC (‘cc’),
IBM (‘xlc -c -qlanglvl=extc99’),
Intel (‘icc -std=c99’),
LLVM (‘clang’),
NEC (‘cc’),
PathScale (QLogic) (‘pathcc -std=c99’),
PGI (‘pgcc -c9x’),
SGI (‘cc -c99’),
and
Sun (‘cc’).
NCO (all commands and the libnco library) and
the C++ interface to netCDF (called libnco_c++) comply with
the ISO C++ standards as implemented by
Comeau Computing (‘como’),
Cray (‘CC’),
GNU (‘g++ -Wall’),
HP/Compaq/DEC (‘cxx’),
IBM (‘xlC’),
Intel (‘icc’),
Microsoft (‘MVS’),
NEC (‘c++’),
PathScale (Qlogic) (‘pathCC’),
PGI (‘pgCC’),
SGI (‘CC -LANG:std’),
and
Sun (‘CC -LANG:std’).
See nco/bld/Makefile and nco/src/nco_c++/Makefile.old for
more details and exact settings.
Until recently (and not even yet), ANSI-compliant has meant
compliance with the 1989 ISO C-standard, usually called C89 (with
minor revisions made in 1994 and 1995).
C89 lacks variable-size arrays, restricted pointers, some useful
printf
formats, and many mathematical special functions.
These are valuable features of C99, the 1999 ISO C-standard.
NCO is C99-compliant where possible and C89-compliant where
necessary.
Certain branches in the code are required to satisfy the native
SGI and SunOS C compilers, which are strictly ANSI
C89 compliant, and cannot benefit from C99 features.
However, C99 features are fully supported by modern AIX,
GNU, Intel, NEC, Solaris, and UNICOS
compilers.
NCO requires a C99-compliant compiler as of NCO
version 2.9.8, released in August, 2004.
The most time-intensive portion of NCO execution is spent in
arithmetic operations, e.g., multiplication, averaging, subtraction.
These operations were performed in Fortran by default until August,
1999.
This was a design decision based on the relative speed of Fortran-based
object code vs. C-based object code in late 1994.
C compiler vectorization capabilities have dramatically improved
since 1994.
We have accordingly replaced all Fortran subroutines with C functions.
This greatly simplifies the task of building NCO on nominally
unsupported platforms.
As of August 1999, NCO built entirely in C by default.
This allowed NCO to compile on any machine with an
ANSI C compiler.
In August 2004, the first C99 feature, the restrict
type
qualifier, entered NCO in version 2.9.8.
C compilers can obtain better performance with C99 restricted
pointers since they inform the compiler when it may make Fortran-like
assumptions regarding pointer contents alteration.
Subsequently, NCO requires a C99 compiler to build correctly [3].
In January 2009, NCO version 3.9.6 was the first to link
to the GNU Scientific Library (GSL).
GSL must be version 1.4 or later.
NCO, in particular ncap2, uses the GSL
special function library to evaluate geoscience-relevant mathematics
such as Bessel functions, Legendre polynomials, and incomplete gamma
functions (see GSL special functions).
In June 2005, NCO version 3.0.1 began to take advantage
of C99 mathematical special functions.
These include the standardized gamma function (called tgamma()
for “true gamma”).
NCO automagically takes advantage of some GNU
Compiler Collection (GCC) extensions to ANSI C.
As of July 2000 and NCO version 1.2, NCO no
longer performs arithmetic operations in Fortran.
We decided to sacrifice executable speed for code maintainability.
Since no objective statistics were ever performed to quantify
the difference in speed between the Fortran and C code,
the performance penalty incurred by this decision is unknown.
Supporting Fortran involves maintaining two sets of routines for every
arithmetic operation.
The USE_FORTRAN_ARITHMETIC
flag is still retained in the
Makefile.
The file containing the Fortran code, nco_fortran.F, has been
deprecated but a volunteer (Dr. Frankenstein?) could resurrect it.
If you would like to volunteer to maintain nco_fortran.F please
contact me.
NCO has been successfully ported and tested on most Microsoft Windows operating systems including: XP SP2/Vista/7/10. Support is provided for compiling either native Windows executables, using the Microsoft Visual Studio Compiler (MVSC), or with Cygwin, the UNIX-emulating compatibility layer with the GNU toolchain. The switches necessary to accomplish both are included in the standard distribution of NCO.
With Microsoft Visual Studio compiler, one must build NCO with C++ since MVSC does not support C99. Support for Qt, a convenient integrated development environment, was deprecated in 2017. As of NCO version 4.6.9 (September, 2017) please build native Windows executables with CMake:
cd ~/nco/cmake
cmake .. -DCMAKE_INSTALL_PREFIX=${HOME}
make install
The file nco/cmake/build.bat shows how to deal with various path issues.
As of NCO version 4.7.1 (December, 2017) the Conda package
for NCO is available from the conda-forge
channel on
all three smithies: Linux, MacOS, and Windows.
# Recommended install with Conda
conda config --add channels conda-forge # Permanently add conda-forge
conda install nco
# Or, specify conda-forge explicitly as a one-off:
conda install -c conda-forge nco
Using the freely available Cygwin (formerly gnu-win32) development
environment [4], the compilation process is very similar to
installing NCO on a UNIX system.
Set the PVM_ARCH preprocessor token to WIN32.
Note that defining WIN32
has the side effect of disabling
Internet features of NCO (see below).
NCO should now build like it does on UNIX.
The least portable section of the code is the use of standard
UNIX and Internet protocols (e.g., ftp, rcp, scp, sftp, getuid, gethostname, and header
files <arpa/nameser.h> and
<resolv.h>).
Fortunately, these UNIX-y calls are only invoked by the single
NCO subroutine which is responsible for retrieving files
stored on remote systems (see Accessing Remote Files).
In order to support NCO on the Microsoft Windows platforms,
this single feature was disabled (on Windows OS only).
This was required by Cygwin 18.x—newer versions of Cygwin may
support these protocols (let me know if this is the case).
The NCO operators should behave identically on Windows and
UNIX platforms in all other respects.
NCO relies on a common set of underlying algorithms.
To minimize duplication of source code, multiple operators sometimes
share the same underlying source.
This is accomplished by symbolic links from a single underlying
executable program to one or more invoked executable names.
For example, nces and ncrcat are symbolically linked
to the ncra executable.
The ncra executable behaves slightly differently based on its
invocation name (i.e., ‘argv[0]’), which can be
nces, ncra, or ncrcat.
Logically, these are three different operators that happen to share
the same executable.
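The invocation-name mechanism can be sketched in shell. This is only an illustration, not NCO source code: the stand-in script below is hypothetical, the printed strings are placeholders, and `$0` plays the role that ‘argv[0]’ plays in the C executable.

```shell
#!/bin/sh
# Illustrative only: a hypothetical multi-call script, not NCO source code.
# One "executable" chooses its behavior from the name it was invoked as,
# the shell analogue of inspecting argv[0] in C.
set -e

demo_dir=$(mktemp -d)   # scratch directory for the demonstration

# The single underlying program (hypothetical stand-in for ncra).
cat > "$demo_dir/ncra" <<'EOF'
#!/bin/sh
case "$(basename "$0")" in
  ncra)   echo "record averager" ;;
  nces)   echo "ensemble statistics" ;;
  ncrcat) echo "record concatenator" ;;
  *)      echo "unknown invocation name: $0" >&2; exit 1 ;;
esac
EOF
chmod +x "$demo_dir/ncra"

# Symbolic links give the same file two more invocation names.
ln -s -f ncra "$demo_dir/nces"
ln -s -f ncra "$demo_dir/ncrcat"

"$demo_dir/ncra"    # prints "record averager"
"$demo_dir/nces"    # prints "ensemble statistics"
"$demo_dir/ncrcat"  # prints "record concatenator"
```

Only one file holds the logic, so a fix to it reaches all three names at once. Remove the scratch directory afterwards with ‘rm -rf "$demo_dir"’.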
For historical reasons, and to be more user friendly, multiple synonyms
(or pseudonyms) may refer to the same operator invoked with different
switches.
For example, ncdiff is the same as ncbo and
ncpack is the same as ncpdq.
We implement the symbolic links and synonyms by executing the
following UNIX commands in the directory where the
NCO executables are installed.
ln -s -f ncbo ncdiff     # ncbo --op_typ='-'
ln -s -f ncra nces       # ncra --pseudonym='nces'
ln -s -f ncra ncrcat     # ncra --pseudonym='ncrcat'
ln -s -f ncbo ncadd      # ncbo --op_typ='+'
ln -s -f ncbo ncsubtract # ncbo --op_typ='-'
ln -s -f ncbo ncmultiply # ncbo --op_typ='*'
ln -s -f ncbo ncdivide   # ncbo --op_typ='/'
ln -s -f ncpdq ncpack    # ncpdq
ln -s -f ncpdq ncunpack  # ncpdq --unpack
# NB: Windows/Cygwin executable/link names have '.exe' suffix, e.g.,
ln -s -f ncbo.exe ncdiff.exe
...
The imputed command called by the link is given after the comment.
As can be seen, some of these links impute the passing of a command line
argument to further modify the behavior of the underlying executable.
For example, ncdivide is a pseudonym for ncbo --op_typ='/'.
Like all executables, the NCO operators can be built using dynamic
linking.
This reduces the size of the executable and can result in significant
performance enhancements on multiuser systems.
Unfortunately, if your library search path (usually the
LD_LIBRARY_PATH
environment variable) is not set correctly, or if
the system libraries have been moved, renamed, or deleted since
NCO was installed, it is possible NCO operators
will fail with a message that they cannot find a dynamically loaded (aka
shared object or ‘.so’) library.
This will produce a distinctive error message, such as
‘ld.so.1: /usr/local/bin/nces: fatal: libsunmath.so.1: can't
open file: errno=2’.
If you received an error message like this, ask your system
administrator to diagnose whether the library is truly missing [5], or whether you
simply need to alter your library search path.
As a final remedy, you may re-compile and install NCO with all
operators statically linked.
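A first diagnostic step can be sketched as follows. The helper name missing_libs is hypothetical, and the sketch assumes the line format printed by the glibc ‘ldd’ utility on GNU/Linux (‘libname => not found’); other platforms report unresolved libraries differently.

```shell
#!/bin/sh
# Illustrative helper (hypothetical name): list shared libraries that the
# dynamic loader cannot resolve, given ldd-style output on standard input.
# Assumes the glibc ldd line format "libname => not found".
missing_libs() {
  awk '/=> not found/ { print $1 }'
}

# Typical use on a GNU/Linux system where ncks is on the PATH:
#   ldd "$(command -v ncks)" | missing_libs
# If a listed library does exist in a non-standard directory, adding that
# directory to the search path may resolve it, e.g.,
#   export LD_LIBRARY_PATH="/usr/local/lib:${LD_LIBRARY_PATH}"

# Self-contained demonstration with canned ldd-style output:
printf '\tlibsunmath.so.1 => not found\n\tlibc.so.6 => /lib/libc.so.6 (0x0)\n' | missing_libs
# prints libsunmath.so.1
```

An empty result suggests the loader can find every library, in which case the problem lies elsewhere.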
netCDF version 2 was released in 1993.
NCO (specifically ncks) began soon after this in 1994.
netCDF 3.0 was released in 1996, and we were not exactly eager to
convert all code to the newer, less tested netCDF implementation.
One netCDF3 interface call (nc_inq_libvers) was added to
NCO in January, 1998, to aid in maintenance and debugging.
In March, 2001, the final NCO conversion to netCDF3
was completed (coincidentally on the same day netCDF 3.5 was
released).
NCO versions 2.0 and higher are built with the
-DNO_NETCDF_2
flag to ensure no netCDF2 interface calls
are used.
However, the ability to compile NCO with only netCDF2
calls is worth maintaining because HDF version 4,
aka HDF4 or simply HDF [6] (available from HDF),
supports only the netCDF2 library calls
(see http://hdfgroup.org/UG41r3_html/SDS_SD.fm12.html#47784).
There are two versions of HDF.
Currently HDF version 4.x supports the full netCDF2
API and thus NCO version 1.2.x.
If NCO version 1.2.x (or earlier) is built with only
netCDF2 calls then all NCO operators should work with
HDF4 files as well as netCDF files [7].
The preprocessor token NETCDF2_ONLY
exists
in NCO version 1.2.x to eliminate all netCDF3
calls.
Only versions of NCO numbered 1.2.x and earlier have this
capability.
HDF version 5 became available in 1999, but did not support netCDF (or, for that matter, Fortran) as of December 1999. By early 2001, HDF5 did support Fortran90. Thanks to an NSF-funded “harmonization” partnership, HDF began to fully support the netCDF3 read interface (which is employed by NCO 2.x and later). In 2004, Unidata and THG began a project to implement the HDF5 features necessary to support the netCDF API. NCO version 3.0.3 added support for reading/writing netCDF4-formatted HDF5 files in October, 2005. See File Formats and Conversion for more details.
HDF support for netCDF was completed with HDF5 version 1.8 in 2007. The netCDF front-end that uses this HDF5 back-end was completed and released soon after as netCDF version 4. Download it from the netCDF4 website.
NCO version 3.9.0, released in May, 2007, added support for
all netCDF4 atomic data types except NC_STRING.
Support for NC_STRING, including ragged arrays of strings,
was finally added in version 3.9.9, released in June, 2009.
Support for additional netCDF4 features has been incremental.
We add one netCDF4 feature at a time.
You must build NCO with netCDF4 to obtain this support.
NCO supports many netCDF4 features including atomic data
types, Lempel-Ziv compression (deflation), chunking, and groups.
The new atomic data types are NC_UBYTE, NC_USHORT,
NC_UINT, NC_INT64, and NC_UINT64.
Eight-byte integer support is an especially useful improvement from
netCDF3.
All NCO operators support these types, e.g., ncks
copies and prints them, ncra
averages them, and
ncap2
processes algebraic scripts with them.
ncks
prints compression information, if any, to screen.
NCO version 3.9.1 (June, 2007) added support for netCDF4 Lempel-Ziv deflation. Lempel-Ziv deflation is a lossless compression technique. See Deflation for more details.
NCO version 3.9.9 (June, 2009) added support for netCDF4
chunking in ncks and ncecat.
NCO version 4.0.4 (September, 2010) completed support for
netCDF4 chunking in the remaining operators.
See Chunking for more details.
NCO version 4.2.2 (October, 2012) added support for netCDF4
groups in ncks and ncecat.
Group support for these operators was complete (e.g., regular
expressions to select groups and Group Path Editing) as of
NCO version 4.2.6 (March, 2013).
See Group Path Editing for more details.
Group support for all other operators was finished in the
NCO version 4.3.x series completed in December, 2013.
Support for netCDF4 in the first arithmetic operator, ncbo,
was introduced in NCO version 4.3.0 (March, 2013).
NCO version 4.3.1 (May, 2013) completed this support and
introduced the first example of automatic group broadcasting.
See ncbo
netCDF Binary Operator for more details.
netCDF4-enabled NCO handles netCDF3 files without change. In addition, it automagically handles netCDF4 (HDF5) files: If you feed NCO netCDF3 files, it produces netCDF3 output. If you feed NCO netCDF4 files, it produces netCDF4 output. Use the handy-dandy ‘-4’ switch to request netCDF4 output from netCDF3 input, i.e., to convert netCDF3 to netCDF4. See File Formats and Conversion for more details.
When linked to a netCDF library that was built with HDF4 support [8], NCO automatically supports reading HDF4 files and writing them as netCDF3/netCDF4/HDF5 files. NCO can only write through the netCDF API, which can only write netCDF3/netCDF4/HDF5 files. So NCO can read HDF4 files, perform manipulations and calculations, and then it must write the results in netCDF format.
NCO support for HDF4 has been quite functional since December, 2013. For best results install NCO versions 4.4.0 or later on top of netCDF versions 4.3.1 or later. Getting to this point has been an iterative effort where Unidata improved netCDF library capabilities in response to our requests. NCO versions 4.3.6 and earlier do not explicitly support HDF4, yet should work with HDF4 if compiled with a version of netCDF (4.3.2 or later?) that does not unexpectedly die when probing HDF4 files with standard netCDF calls. NCO versions 4.3.7–4.3.9 (October–December, 2013) use a special flag to circumvent netCDF HDF4 issues. The user must tell these versions of NCO that an input file is HDF4 format by using the ‘--hdf4’ switch.
When compiled with netCDF version 4.3.1 (20140116) or later,
NCO versions 4.4.0 (January, 2014) and later more gracefully
handle HDF4 files.
In particular, the ‘--hdf4’ switch is obsolete.
Current versions of NCO use netCDF to determine automatically
whether the underlying file is HDF4, and then take appropriate
precautions to avoid netCDF4 API calls that fail when applied
to HDF4 files (e.g., nc_inq_var_chunking(), nc_inq_var_deflate()).
When compiled with netCDF version 4.3.2 (20140423) or earlier,
NCO reports the chunking and deflation properties of
HDF4 files as HDF4_UNKNOWN, because determining
those properties was impossible.
When compiled with netCDF version 4.3.3-rc2 (20140925) or later,
NCO versions 4.4.6 (October, 2014) and later fully support
chunking and deflation features of HDF4 files.
Unfortunately, netCDF version 4.7.4 (20200327) introduced a regression
that broke this functionality in all NCO versions.
We first noticed the regression a year later and implemented a
workaround that restored this functionality as of 4.9.9-alpha02
(20210327).
The ‘--hdf4’ switch is supported (for backwards compatibility) yet
redundant (i.e., does no harm) with current versions of NCO
and netCDF.
Converting HDF4 files to netCDF: Since NCO reads HDF4 files natively, it is now easy to convert HDF4 files to netCDF files directly, e.g.,
ncks fl.hdf fl.nc        # Convert HDF4->netCDF4 (NCO 4.4.0+, netCDF 4.3.1+)
ncks --hdf4 fl.hdf fl.nc # Convert HDF4->netCDF4 (NCO 4.3.7-4.3.9)
The most efficient and accurate way to convert HDF4 data to netCDF format is to convert to netCDF4 using NCO as above. Many HDF4 producers (NASA!) love to use netCDF4 types, e.g., unsigned bytes, so this procedure is the most typical. Conversion of HDF4 to netCDF4 as above suffices when the data will only be processed by NCO and other netCDF4-aware tools.
However, many tools are not fully netCDF4-aware, and so conversion to netCDF3 may be desirable. Obtaining any netCDF file from an HDF4 is easy:
ncks -3 fl.hdf fl.nc        # HDF4->netCDF3 (NCO 4.4.0+, netCDF 4.3.1+)
ncks -4 fl.hdf fl.nc        # HDF4->netCDF4 (NCO 4.4.0+, netCDF 4.3.1+)
ncks -6 fl.hdf fl.nc        # HDF4->netCDF3 64-bit (NCO 4.4.0+, ...)
ncks -7 -L 1 fl.hdf fl.nc   # HDF4->netCDF4 classic (NCO 4.4.0+, ...)
ncks --hdf4 -3 fl.hdf fl.nc # HDF4->netCDF3 (netCDF 4.3.0-)
ncks --hdf4 -4 fl.hdf fl.nc # HDF4->netCDF4 (netCDF 4.3.0-)
ncks --hdf4 -6 fl.hdf fl.nc # HDF4->netCDF3 64-bit (netCDF 4.3.0-)
ncks --hdf4 -7 fl.hdf fl.nc # HDF4->netCDF4 classic (netCDF 4.3.0-)
As of NCO version 4.4.0 (January, 2014), these commands work even when the HDF4 file contains netCDF4 atomic types (e.g., unsigned bytes, 64-bit integers) because NCO can autoconvert everything to atomic types supported by netCDF3 [9].
As of NCO version 4.4.4 (May, 2014) both
ncl_convert2nc and NCO have built-in, automatic
workarounds to handle element names that contain characters that are
legal in HDF though illegal in netCDF.
For example, slashes and leading special characters are legal in
HDF and illegal in netCDF element (i.e., group,
variable, dimension, and attribute) names.
NCO converts these forbidden characters to underscores, and
retains the original names of variables in automatically produced
attributes named hdf_name [10].
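The idea behind this renaming can be sketched in shell. This is not NCO's actual algorithm: the function name sanitize_name and the exact rule (replace every slash, and replace a leading character that is not a letter, digit, or underscore) are assumptions made only for illustration.

```shell
#!/bin/sh
# Illustrative only: this is NOT NCO's actual algorithm, merely a sketch of
# the idea. Assumed rule: slashes anywhere, and a leading character that is
# not a letter, digit, or underscore, are replaced by underscores.
sanitize_name() {
  printf '%s\n' "$1" | sed -e 's|/|_|g' -e 's/^[^A-Za-z0-9_]/_/'
}

sanitize_name 'Data/Cloud Fraction'  # prints Data_Cloud Fraction
sanitize_name '%valid_range'         # prints _valid_range
```

Because different original names can map to the same sanitized name, keeping the original in an attribute such as hdf_name preserves the information needed to tell them apart.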
Finally, in February 2014, we learned that the HDF group
has a project called H4CF
(described here)
whose goal is to make HDF4 files accessible to CF
tools and conventions.
Their project includes a tool named h4tonccf
that converts
HDF4 files to netCDF3 or netCDF4 files.
We are not yet sure what advantages or features h4tonccf
has
that are not in NCO, though we suspect both methods have their
own advantages. Corrections welcome.
As of 2012, netCDF4 is relatively stable software. Problems with netCDF4 and HDF libraries have mainly been fixed. Binary NCO distributions shipped as RPMs and as debs have used the netCDF4 library since 2010 and 2011, respectively.
One must often build NCO from source to obtain netCDF4
support.
Typically, one specifies the root of the netCDF4
installation directory. Do this with the NETCDF4_ROOT
variable.
Then use your preferred NCO build mechanism, e.g.,
export NETCDF4_ROOT=/usr/local/netcdf4  # Set netCDF4 location
cd ~/nco;./configure --enable-netcdf4   # Configure mechanism -or-
cd ~/nco/bld;./make NETCDF4=Y allinone  # Old Makefile mechanism
We carefully track the netCDF4 releases, and keep the netCDF4 atomic type support and other features working. Our long term goal is to utilize more of the extensive new netCDF4 feature set. The next major netCDF4 feature we are likely to utilize is parallel I/O. We will enable this in the MPI netCDF operators.
We generally receive three categories of mail from users: help requests, bug reports, and feature requests. Notes saying the equivalent of “Hey, NCO continues to work great and it saves me more time every day than it took to write this note” are a distant fourth.
There is a different protocol for each type of request. The preferred etiquette for all communications is via NCO Project Forums. Do not contact project members via personal e-mail unless your request comes with money or you have damaging information about our personal lives. Please use the Forums—they preserve a record of the questions and answers so that others can learn from our exchange. Also, since NCO is both volunteer-driven and government-funded, this record helps us provide program officers with information they need to evaluate our project.
Before posting to the NCO forums described below, you might first register your name and email address with SourceForge.net or else all of your postings will be attributed to nobody. Once registered you may choose to monitor any forum and to receive (or not) email when there are any postings including responses to your questions. We usually reply to the forum message, not to the original poster.
If you want us to include a new feature in NCO, please consider implementing the feature yourself and sending us the patch. If that is beyond your ken, then send a note to the NCO Discussion forum.
Read the manual before reporting a bug or posting a help request. Sending questions whose answers are not in the manual is the best way to motivate us to write more documentation. We would also like to accentuate the contrapositive of this statement. If you think you have found a real bug the most helpful thing you can do is simplify the problem to a manageable size and then report it. The first thing to do is to make sure you are running the latest publicly released version of NCO.
Once you have read the manual, if you are still unable to get NCO to perform a documented function, submit a help request. Follow the same procedure as described below for reporting bugs (after all, it might be a bug). That is, describe what you are trying to do, and include the complete commands (run with ‘-D 5’), error messages, and version of NCO (with ‘-r’). Some commands behave differently depending on the exact order and rank of dimensions in the pertinent variables. In such cases we need you to provide that metadata, e.g., the text results of ‘ncks -m’ on your input and/or output files. Post your help request to the NCO Help forum.
If you think you used the right command when NCO misbehaves, then you might have found a bug. Incorrect numerical answers are the highest priority. We usually fix those within one or two days. Core dumps and segmentation violations receive lower priority. They are always fixed, eventually.
How do you simplify a problem that reveals a bug? Cut out extraneous variables, dimensions, and metadata from the offending files and re-run the command until it no longer breaks. Then back up one step and report the problem. Usually the file(s) will be very small, i.e., one variable with one or two small dimensions ought to suffice. Run the operator with ‘-r’ and then run the command with ‘-D 5’ to increase the verbosity of the debugging output. It is very important that your report contain the exact error messages and compile-time environment. Include a copy of your sample input file(s), or place a copy in a publicly accessible location. If you are sure it is a bug, post the full report to the NCO Project buglist. Otherwise post all the information to the NCO Help forum.
Build failures count as bugs.
Our limited machine access means we cannot fix all build failures.
The information we need to diagnose, and often fix, build failures
is contained in the three files output by GNU build tools,
nco.config.log.${GNU_TRP}.foo,
nco.configure.${GNU_TRP}.foo,
and nco.make.${GNU_TRP}.foo.
The file configure.eg shows how to produce these files.
Here ${GNU_TRP}
is the “GNU architecture triplet”,
the chip-vendor-OS string returned by config.guess.
Please send us your improvements to the examples supplied in
configure.eg.
The regressions archive at http://dust.ess.uci.edu/nco/rgr
contains the build output from our standard test systems.
You may find you can solve the build problem yourself by examining the
differences between these files and your own.
The main design goal is command line operators which perform useful, scriptable operations on netCDF files. Many scientists work with models and observations which produce too much data to analyze in tabular format. Thus, it is often natural to reduce and massage this raw or primary level data into summary, or second level data, e.g., temporal or spatial averages. These second level data may become the inputs to graphical and statistical packages, and are often more suitable for archival and dissemination to the scientific community. NCO performs a suite of operations useful in manipulating data from the primary to the second level state. Higher level interpretive languages (e.g., IDL, Yorick, Matlab, NCL, Perl, Python), and lower level compiled languages (e.g., C, Fortran) can always perform any task performed by NCO, but often with more overhead. NCO, on the other hand, is limited to a much smaller set of arithmetic and metadata operations than these full blown languages.
Another goal has been to implement enough command line switches so that frequently used sequences of these operators can be executed from a shell script or batch file. Finally, NCO was written to consume the absolute minimum amount of system memory required to perform a given job. The arithmetic operators are extremely efficient; their exact memory usage is detailed in Memory Requirements.
NCO was developed at NCAR to aid analysis and manipulation of datasets produced by General Circulation Models (GCMs). GCM datasets share many features with other gridded scientific datasets and so provide a useful paradigm for the explication of the NCO operator set. Examples in this manual use a GCM paradigm because latitude, longitude, time, temperature and other fields related to our natural environment are as easy to visualize for the layman as the expert.
NCO operators are designed to be reasonably fault tolerant, so
that a system failure or user-abort of the operation (e.g., with
C-c) does not cause loss of data.
The user-specified output-file is only created upon successful
completion of the operation
11.
This is accomplished by performing all operations in a temporary copy
of output-file.
The name of the temporary output file is constructed by appending
.pid<process ID>.<operator name>.tmp
to the
user-specified output-file name.
When the operator completes its task with no fatal errors, the temporary
output file is moved to the user-specified output-file.
This imbues the process with fault-tolerance since fatal errors
(e.g., disk space fills up) affect only the temporary output file,
leaving the final output file not created if it did not already exist.
Note the construction of a temporary output file uses more disk space
than just overwriting existing files “in place” (because there may be
two copies of the same file on disk until the NCO operation
successfully concludes and the temporary output file overwrites the
existing output-file).
Also, note this feature increases the execution time of the operator
by approximately the time it takes to copy the output-file
12.
Finally, note this fault-tolerant feature allows the output-file
to be the same as the input-file without any danger of
“overlap”.
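The temporary-file strategy described above can be sketched in a few lines of shell. This is a minimal illustration of the pattern, not NCO source code: the function name `safe_write` is hypothetical, `cp` stands in for an operator, and deleting the temporary file after a failure is a choice made for this sketch rather than documented NCO behavior.

```shell
#!/bin/sh
# Sketch of the write-to-temporary-then-rename pattern (not NCO source code).
# Usage: safe_write OPERATOR_NAME OUTPUT_FILE COMMAND [ARGS...]
# COMMAND is run with the temporary filename appended as its last argument.
# The temporary file is renamed to OUTPUT_FILE only if COMMAND succeeds.
safe_write() {
  op_nm=$1; fl_out=$2; shift 2
  # Temporary name follows the pattern described in the text:
  # <output-file>.pid<process ID>.<operator name>.tmp
  fl_tmp="${fl_out}.pid$$.${op_nm}.tmp"
  if "$@" "${fl_tmp}"; then
    mv -f "${fl_tmp}" "${fl_out}"   # Success: move result into place
  else
    rm -f "${fl_tmp}"               # Failure: discard partial output (sketch's choice)
    return 1
  fi
}
```

For example, `safe_write cp out.txt cp in.txt` copies in.txt to a temporary file and renames it to out.txt only when the copy succeeds. A failed command leaves any pre-existing out.txt untouched, and because the final name is written only by the rename, the output name may safely equal the input name.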
Over time many “power users” have requested a way to turn-off the fault-tolerance safety feature that automatically creates a temporary file. Often these users build and execute production data analysis scripts that are repeated frequently on large datasets. Obviating an extra file write can then conserve significant disk space and time. For this purpose NCO has, since version 4.2.1 in August, 2012, made configurable the controls over temporary file creation. The ‘--wrt_tmp_fl’ and equivalent ‘--write_tmp_fl’ switches ensure NCO writes output to an intermediate temporary file. This is and has always been the default behavior so there is currently no need to specify these switches. However, the default may change some day, especially since writing to RAM disks (see RAM disks) may some day become the default. The ‘--no_tmp_fl’ switch causes NCO to write directly to the final output file instead of to an intermediate temporary file. “Power users” may wish to invoke this switch to increase performance (i.e., reduce wallclock time) when manipulating large files. When eschewing temporary files, users may forsake the ability to have the same name for both output-file and input-file since, as described above, the temporary file prevented overlap issues. However, if the user creates the output file in RAM (see RAM disks) then it is still possible to have the same name for both output-file and input-file.
ncks in.nc out.nc # Default: create out.pid.tmp.nc then move to out.nc
ncks --wrt_tmp_fl in.nc out.nc # Same as default
ncks --no_tmp_fl in.nc out.nc # Create out.nc directly on disk
ncks --no_tmp_fl in.nc in.nc # ERROR-prone! Overwrite in.nc with itself
ncks --create_ram --no_tmp_fl in.nc in.nc # Create in RAM, write to disk
ncks --open_ram --no_tmp_fl in.nc in.nc # Read into RAM, write to disk
There is no reason to expect the fourth example to work. The behavior of overwriting a file while reading from the same file is undefined, much as is the shell command ‘cat foo > foo’. Although it may “work” in some cases, it is unreliable. One way around this is to use ‘--create_ram’ so that the output file is not written to disk until the input file is closed (see RAM disks). However, as of 20130328, the behavior of the ‘--create_ram’ and ‘--open_ram’ examples has not been thoroughly tested.
The NCO authors have seen compelling use cases for utilizing the RAM switches, though not (yet) for combining them with ‘--no_tmp_fl’. NCO implements both options because they are largely independent of each other. It is up to “power users” to discover which best fit their needs. We welcome accounts of your experiences posted to the forums.
Other safeguards exist to protect the user from inadvertently overwriting data. If the output-file specified for a command is a pre-existing file, then the operator will prompt the user whether to overwrite (erase) the existing output-file, attempt to append to it, or abort the operation. However, in processing large amounts of data, too many interactive questions slow productivity. Therefore NCO also implements two ways to override its own safety features, the ‘-O’ and ‘-A’ switches. Specifying ‘-O’ tells the operator to overwrite any existing output-file without prompting the user interactively. Specifying ‘-A’ tells the operator to attempt to append to any existing output-file without prompting the user interactively. These switches are useful in batch environments because they suppress interactive keyboard input.
Adding variables from one file to another is often desirable. This is referred to as appending, although some prefer the terminology merging 13 or pasting. Appending is often confused with what NCO calls concatenation. In NCO, concatenation refers to splicing a variable along the record dimension. The length along the record dimension of the output is the sum of the lengths of the input files. Appending, on the other hand, refers to copying a variable from one file to another file which may or may not already contain the variable 14. NCO can append or concatenate just one variable, or all the variables in a file at the same time.
In this sense, ncks
can append variables from one file to
another file.
This capability is invoked by naming two files on the command line,
input-file and output-file.
When output-file already exists, the user is prompted whether to
overwrite, append/replace, or exit from the command.
Selecting overwrite tells the operator to erase the existing
output-file and replace it with the results of the operation.
Selecting exit causes the operator to exit—the output-file
will not be touched in this case.
Selecting append/replace causes the operator to attempt to place
the results of the operation in the existing output-file
(see ncks netCDF Kitchen Sink).
The simplest way to create the union of two files is
ncks -A fl_1.nc fl_2.nc
This puts the contents of fl_1.nc into fl_2.nc. The ‘-A’ is optional. On output, fl_2.nc is the union of the input files, regardless of whether they share dimensions and variables, or are completely disjoint. The append fails if the input files have differently named record dimensions (since netCDF supports only one), or have dimensions of the same name but different sizes.
Users comfortable with NCO semantics may find it easier to
perform some simple mathematical operations in NCO rather than
higher level languages.
ncbo
(see ncbo
netCDF Binary Operator) does file
addition, subtraction, multiplication, division, and broadcasting.
It even does group broadcasting.
ncflint
(see ncflint
netCDF File Interpolator) does
file addition, subtraction, multiplication and interpolation.
Sequences of these commands can accomplish simple yet powerful
operations from the command line.
The most frequently used operators of NCO are probably the
statisticians (i.e., tools that do statistics) and concatenators.
Because there are so many types of statistics like averaging (e.g.,
across files, within a file, over the record dimension, over other
dimensions, with or without weights and masks) and of concatenating
(across files, along the record dimension, along other dimensions),
there are currently no fewer than five operators which tackle these two
purposes: ncra
, nces
, ncwa
,
ncrcat
, and ncecat
.
These operators do share many capabilities 15, though each has its unique specialty.
Two of these operators, ncrcat
and ncecat
,
concatenate hyperslabs across files.
Two others, ncra and nces, compute
statistics across (and/or within) files
16.
First, let’s describe the concatenators, then the statistics tools.
ncrcat and ncecat
Joining together independent files along a common record dimension is
called concatenation.
ncrcat
is designed for concatenating record variables, while
ncecat
is designed for concatenating fixed length variables.
Consider five files, 85.nc, 86.nc,
… 89.nc each containing a year’s worth of data.
Say you wish to create from them a single file, 8589.nc
containing all the data, i.e., spanning all five years.
If the annual files make use of the same record variable, then
ncrcat will do the job nicely with, e.g., ncrcat 8?.nc 8589.nc.
The number of records in the input files is arbitrary and can vary from
file to file.
See ncrcat
netCDF Record Concatenator, for a complete description of
ncrcat
.
However, suppose the annual files have no record variable, and thus
their data are all fixed length.
For example, the files may not be conceptually sequential, but rather
members of the same group, or ensemble.
Members of an ensemble may have no reason to contain a record dimension.
ncecat
will create a new record dimension (named record
by default) with which to glue together the individual files into the
single ensemble file.
If ncecat
is used on files which contain an existing record
dimension, that record dimension is converted to a fixed-length
dimension of the same name and a new record dimension (named
record
) is created.
Consider five realizations, 85a.nc, 85b.nc,
… 85e.nc of 1985 predictions from the same climate
model.
Then ncecat 85?.nc 85_ens.nc
glues together the individual
realizations into the single file, 85_ens.nc.
If an input variable was dimensioned [lat
,lon
], it will
have dimensions [record
,lat
,lon
] in the output file.
A restriction of ncecat
is that the hyperslabs of the
processed variables must be the same from file to file.
Normally this means all the input files are the same size, and contain
data on different realizations of the same variables.
See ncecat
netCDF Ensemble Concatenator, for a complete description
of ncecat
.
ncpdq
makes it possible to concatenate files along any
dimension, not just the record dimension.
First, use ncpdq
to convert the dimension to be concatenated
(i.e., extended with data from other files) into the record dimension.
Second, use ncrcat
to concatenate these files.
Finally, if desirable, use ncpdq
to revert to the original
dimensionality.
As a concrete example, say that files x_01.nc, x_02.nc,
… x_10.nc contain time-evolving datasets from spatially
adjacent regions.
The time and spatial coordinates are time
and x
, respectively.
Initially the record dimension is time
.
Our goal is to create a single file that joins all the
spatially adjacent regions into one single time-evolving dataset.
for idx in 01 02 03 04 05 06 07 08 09 10; do # Bourne Shell
  ncpdq -a x,time x_${idx}.nc foo_${idx}.nc # Make x record dimension
done
ncrcat foo_??.nc out.nc # Concatenate along x
ncpdq -a time,x out.nc out.nc # Revert to time as record dimension
Note that ncrcat
will not concatenate fixed-length variables,
whereas ncecat concatenates both fixed-length and record
variables along a new record dimension.
To conserve system memory, use ncrcat
where possible.
nces, ncra, and ncwa
The differences between the averagers ncra
and nces
are analogous to the differences between the concatenators.
ncra
is designed for averaging record variables from at least
one file, while nces
is designed for averaging fixed length
variables from multiple files.
ncra
performs a simple arithmetic average over the record
dimension of all the input files, with each record having an equal
weight in the average.
nces
performs a simple arithmetic average of all the input
files, with each file having an equal weight in the average.
Note that ncra
cannot average fixed-length variables,
but nces
can average both fixed-length and record variables.
To conserve system memory, use ncra
rather than
nces
where possible (e.g., if each input-file is one
record long).
The file output from nces
will have the same dimensions
(meaning dimension names as well as sizes) as the input hyperslabs
(see nces
netCDF Ensemble Statistics, for a complete description of
nces
).
The file output from ncra
will have the same dimensions as
the input hyperslabs except for the record dimension, which will have a
size of 1 (see ncra
netCDF Record Averager, for a complete
description of ncra
).
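The distinction between record averaging and ensemble averaging can be seen with plain arithmetic. The sketch below does not call NCO. It uses two hypothetical "files", each holding two records of a single scalar variable, and the function names `rec_avg` and `ens_avg` are illustrative only.

```shell
#!/bin/sh
# Plain-arithmetic illustration (no NCO required). Each "file" is a
# space-separated list of the records of one scalar variable.
fl_a="1 3"   # hypothetical file A: two records
fl_b="5 7"   # hypothetical file B: two records

# Record average (ncra-like): every record from every file has equal
# weight, and the result collapses to a single record.
rec_avg() {
  printf '%s\n' "$@" | tr ' ' '\n' | awk '{sum+=$1; n++} END {print sum/n}'
}

# Ensemble average (nces-like): element-wise mean across files, so the
# result keeps the same number of records as each input file.
ens_avg() {
  printf '%s\n' "$@" | awk '
    {for (i=1; i<=NF; i++) sum[i]+=$i; nf=NF; n++}
    END {for (i=1; i<=nf; i++) printf "%s%s", sum[i]/n, (i<nf ? " " : "\n")}'
}

rec_avg "$fl_a" "$fl_b"   # prints 4   ((1+3+5+7)/4)
ens_avg "$fl_a" "$fl_b"   # prints 3 5 ((1+5)/2 and (3+7)/2)
```

The record average returns one value because the record dimension shrinks to size 1, while the ensemble average returns two values because the output keeps the dimensions of the input hyperslabs.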
ncflint
ncflint can interpolate data between two files.
Since no other operators have this ability, the description of
interpolation is given fully on the ncflint
reference page
(see ncflint
netCDF File Interpolator).
Note that this capability also allows ncflint
to linearly
rescale any data in a netCDF file, e.g., to convert between differing
units.
Occasionally one desires to digest (i.e., concatenate or average)
hundreds or thousands of input files.
Unfortunately, data archives (e.g., NASA EOSDIS) may not
name netCDF files in a format understood by the ‘-n loop’
switch (see Specifying Input Files) that automagically generates
arbitrary numbers of input filenames.
The ‘-n loop’ switch has the virtue of being concise,
and of minimizing the command line.
This helps keep output files small since the command line is stored
as metadata in the history
attribute
(see History Attribute).
However, the ‘-n loop’ switch is useless when there is no
simple, arithmetic pattern to the input filenames (e.g.,
h00001.nc, h00002.nc, … h90210.nc).
Moreover, filename globbing does not work when the input files are too
numerous or their names are too lengthy (when strung together as a
single argument) to be passed by the calling shell to the NCO
operator
17.
When this occurs, the ANSI C-standard argc
-argv
method of passing arguments from the calling shell to a C-program (i.e.,
an NCO operator) breaks down.
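On POSIX systems the size limit in question can be queried with `getconf ARG_MAX`. The following sketch estimates whether a list of filenames would fit. The function name `fits_on_cmd_line` is hypothetical, and the check is deliberately rough: the environment and the remainder of the command also count against the limit, so treat a narrow pass as a failure.

```shell
#!/bin/sh
# Sketch: compare the combined length of a filename list with the
# operating-system limit on argument size reported by getconf.
arg_max=$(getconf ARG_MAX)   # Limit (bytes) on arguments plus environment

# fits_on_cmd_line FILE...: succeed if the names, plus one separator
# each, total less than ARG_MAX. Rough estimate only.
fits_on_cmd_line() {
  chr_nbr=0
  for fl in "$@"; do
    chr_nbr=$((chr_nbr + ${#fl} + 1))
  done
  [ "${chr_nbr}" -lt "${arg_max}" ]
}
```

A call such as `fits_on_cmd_line *.nc || echo "pass filenames some other way"` works even for very long lists, because shell functions are invoked inside the shell and are not subject to the limit that applies when an external program is executed.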
There are (at least) three alternative methods of specifying the input
filenames to NCO in environment-limited situations.
The recommended method for sending very large numbers (hundreds or
more, typically) of input filenames to the multi-file operators is
to pass the filenames with the UNIX standard input
feature, aka stdin
:
# Pipe large numbers of filenames to stdin
/bin/ls | grep ${CASEID}_'......'.nc | ncecat -o foo.nc
This method avoids all constraints on command line size imposed by
the operating system.
A drawback to this method is that the history
attribute
(see History Attribute) does not record the name of any input
files since the names were not passed as positional arguments
on the command line.
This makes it difficult later to determine the data provenance.
To remedy this situation, operators store the number of input files in
the nco_input_file_number
global attribute and the input file
list itself in the nco_input_file_list
global attribute
(see File List Attributes).
Although this does not preserve the exact command used to generate the
file, it does retain all the information required to reconstruct the
command and determine the data provenance.
As of NCO version 5.1.1, released in November, 2022,
all operators support specifying input files via stdin
(see Specifying Input Files), and also store such input filenames
in the file list attributes (see File List Attributes).
A second option is to use the UNIX xargs
command.
This simple example selects as input to xargs
all the
filenames in the current directory that match a given pattern.
For illustration, consider a user trying to average millions of
files which each have a six character filename.
If the shell buffer cannot hold the results of the corresponding
globbing operator, ??????.nc, then the filename globbing
technique will fail.
Instead we express the filename pattern as an extended regular
expression, ......\.nc (see Subsetting Files).
We use grep
to filter the directory listing for this pattern
and to pipe the results to xargs
which, in turn, passes the
matching filenames to an NCO multi-file operator, e.g.,
ncecat
.
# Use xargs to transfer filenames on the command line
/bin/ls | grep ${CASEID}_'......'.nc | xargs -x ncecat -o foo.nc
The single quotes protect the only sensitive parts of the extended
regular expression (the grep
argument), and allow shell
interpolation (the ${CASEID}
variable substitution) to
proceed unhindered on the rest of the command.
xargs
uses the UNIX pipe feature to append the
suitably filtered input file list to the end of the ncecat
command options.
The -o foo.nc
switch ensures that the input files supplied by
xargs
are not confused with the output file name.
xargs
does, unfortunately, have its own limit (usually about
20,000 characters) on the size of command lines it can pass.
Give xargs
the ‘-x’ switch to ensure it dies if it
reaches this internal limit.
When this occurs, use either the stdin
method above, or the
symbolic link presented next.
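The grep filter used in the commands above can be tried on its own, without NCO. In those commands the dot before nc sits outside the single quotes and is unescaped, so it matches any character. The sketch below is a stricter variant that escapes the dot and anchors the pattern. The `CASEID` value and the filenames are hypothetical.

```shell
#!/bin/sh
# Sketch: filter a list of names with an anchored extended regular
# expression. CASEID and the filenames below are made up for illustration.
CASEID='run01'
fl_lst='run01_000001.nc
run01_000002.nc
run01_000002xnc
run02_000001.nc
run01_0001.nc'

# Double quotes let ${CASEID} expand. The backslash makes the dot before
# "nc" literal, and ^...$ requires the whole name to match.
printf '%s\n' "${fl_lst}" | grep -E "^${CASEID}_.{6}\.nc$"
```

Only run01_000001.nc and run01_000002.nc pass the filter. The name run01_000002xnc would slip through a pattern whose final dot is left unescaped.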
Even when its internal limits have not been reached, the
xargs
technique may not be flexible enough to handle
all situations.
A full scripting language like Perl or Python can handle any level of
complexity of filtering input filenames, and any number of filenames.
The technique of last resort is to write a script that creates symbolic
links between the irregular input filenames and a set of regular,
arithmetic filenames that the ‘-n loop’ switch understands.
For example, the following Perl script creates a monotonically
enumerated symbolic link to up to one million .nc files in a
directory. If there are 999,999 netCDF files present, the links are
named 000001.nc to 999999.nc:
# Create enumerated symbolic links
/bin/ls | grep \.nc | perl -e \
'$idx=1;while(<STDIN>){chop;symlink $_,sprintf("%06d.nc",$idx++);}'
ncecat -n 999999,6,1 000001.nc foo.nc
# Remove symbolic links when finished
/bin/rm ??????.nc
The ‘-n loop’ option tells the NCO operator to
automatically generate the filenames of the symbolic links.
This circumvents any OS and shell limits on command-line size.
The symbolic links are easily removed once NCO is finished.
One drawback to this method is that the history
attribute
(see History Attribute) retains the filename list of the symbolic
links, rather than the data files themselves.
This makes it difficult to determine the data provenance at a later
date.
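To see which names a loop specification such as ‘-n 999999,6,1’ covers, it can help to emulate the expansion in shell. The sketch below reads the three values as the number of files, the number of digits in the numeric part of the name, and the increment between successive numbers. That reading is an assumption inferred from the example above (Specifying Input Files gives the authoritative description), and the function `loop_names` is hypothetical, not part of NCO.

```shell
#!/bin/sh
# Sketch: print the filenames implied by a loop of FILE_NBR files whose
# numeric part has DGT_NBR digits, begins at START, and steps by INCREMENT.
# Emulation for illustration only; NCO generates these names internally.
# Usage: loop_names FILE_NBR DGT_NBR INCREMENT START SUFFIX
loop_names() {
  file_nbr=$1; dgt_nbr=$2; increment=$3; start=$4; sfx=$5
  idx=0
  while [ "${idx}" -lt "${file_nbr}" ]; do
    printf "%0${dgt_nbr}d%s\n" $((start + idx * increment)) "${sfx}"
    idx=$((idx + 1))
  done
}

loop_names 3 6 1 1 .nc   # prints 000001.nc, 000002.nc, 000003.nc (one per line)
```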
Large datasets are those files that are comparable in size to the
amount of random access memory (RAM) in your computer.
Many users of NCO work with files larger than 100 MB.
Files this large not only push the current edge of storage technology,
they present special problems for programs which attempt to access the
entire file at once, such as nces
and ncecat
.
If you work with 300 MB files on a machine with only 32 MB of
memory then you will need large amounts of swap space (virtual memory on
disk) and NCO will work slowly, or even fail.
There is no easy solution for this.
The best strategy is to work on a machine with sufficient amounts of
memory and swap space.
Since about 2004, many users have begun to produce or analyze files
exceeding 2 GB in size.
These users should familiarize themselves with NCO’s Large
File Support (LFS) capabilities (see Large File Support).
The next section will increase your familiarity with NCO’s
memory requirements.
With this knowledge you may re-design your data reduction approach to
divide the problem into pieces solvable in memory-limited situations.
If your local machine has problems working with large files, try running
NCO from a more powerful machine, such as a network server.
If you get a memory-related core dump
(e.g., ‘Error exit (core dumped)’) on a GNU/Linux system,
or the operation ends before the entire output file is written,
try raising the per-process resource limits with ulimit.
For example, this removes the limit on the size of files that the shell and its children may write:
ulimit -f unlimited
This may solve constraints on clusters where sufficient hardware
resources exist yet where system administrators felt it wise to prevent
any individual user from consuming too much of any one resource.
Certain machine architectures, e.g., Cray UNICOS, have special
commands which allow one to increase the amount of interactive memory.
On Cray systems, try to increase the available memory with the
ilimit
command.
The speed of the NCO operators also depends on file size.
When processing large files the operators may appear to hang, or do
nothing, for long periods of time.
In order to see what the operator is actually doing, it is useful to
activate a more verbose output mode.
This is accomplished by supplying a number greater than 0 to the
‘-D debug-level’ (or ‘--debug-level’, or
‘--dbg_lvl’) switch.
When the debug-level is non-zero, the operators report their
current status to the terminal through the stderr facility.
Using ‘-D’ does not slow the operators down.
Choose a debug-level between 1 and 3 for most situations,
e.g., nces -D 2 85.nc 86.nc 8586.nc
.
A full description of how to estimate the actual amount of memory the
multi-file NCO operators consume is given in
Memory Requirements.
Many people use NCO on gargantuan files which dwarf the memory available (free RAM plus swap space) even on today’s powerful machines. These users want NCO to consume the least memory possible so that their scripts do not have to tediously cut files into smaller pieces that fit into memory. We commend these greedy users for pushing NCO to its limits!
This section describes the memory NCO requires during
operation.
The required memory depends on the underlying algorithms, datatypes,
and compression, if any.
The description below is the memory usage per thread.
Users with shared memory machines may use the threaded NCO
operators (see OpenMP Threading).
The peak and sustained memory usage will scale accordingly,
i.e., by the number of threads.
In all cases the memory use refers to the uncompressed size of
the data.
The netCDF4 library automatically decompresses variables during reads.
The filesize can easily belie the true size of the uncompressed data.
In other words, the usage below can be taken at face value for netCDF3
datasets only.
Chunking will also affect memory usage on netCDF4 operations.
Memory consumption patterns of all operators are similar, with
the exception of ncap2
.
The multi-file operators currently comprise the record operators,
ncra
and ncrcat
, and the ensemble operators,
nces
and ncecat
.
The record operators require much less memory than the ensemble
operators.
This is because the record operators operate on one single record (i.e.,
time-slice) at a time, whereas the ensemble operators retrieve the
entire variable into memory.
Let MS be the peak sustained memory demand of an operator,
FT be the memory required to store the entire contents of all the
variables to be processed in an input file,
FR be the memory required to store the entire contents of a
single record of each of the variables to be processed in an input file,
VR be the memory required to store a single record of the
largest record variable to be processed in an input file,
VT be the memory required to store the largest variable
to be processed in an input file,
VI be the memory required to store the largest variable
which is not processed, but is copied from the initial file to the
output file.
All operators require MI = VI during the initial copying of
variables from the first input file to the output file.
This is the initial (and transient) memory demand.
The sustained memory demand is that memory required by the
operators during the processing (i.e., averaging, concatenation)
phase which lasts until all the input files have been processed.
The operators have the following memory requirements:
ncrcat requires MS <= VR.
ncecat requires MS <= VT.
ncra requires MS = 2FR + VR.
nces requires MS = 2FT + VT.
ncbo requires MS <= 3VT (both input variables and the output variable).
ncflint requires MS <= 3VT (both input variables and the output variable).
ncpdq requires MS <= 2VT (one input variable and the output variable).
ncwa requires MS <= 8VT (see below).
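As a worked example of the bounds listed above, the sketch below evaluates them for one hypothetical variable: an NC_DOUBLE field dimensioned [time=120, lat=180, lon=360], where time is the record dimension. It assumes this is the only variable processed, so that FR = VR and FT = VT. The function name `ms_bound` and the dimension sizes are illustrative, and the figures are uncompressed, single-thread estimates.

```shell
#!/bin/sh
# Worked example of the sustained-memory bounds for one hypothetical
# NC_DOUBLE variable dimensioned [time=120, lat=180, lon=360].
byte_per_val=8                       # NC_DOUBLE element size in bytes
rec_nbr=120; lat_nbr=180; lon_nbr=360

VR=$((lat_nbr * lon_nbr * byte_per_val))   # One record of the variable
VT=$((rec_nbr * VR))                       # The entire variable
FR=${VR}                                   # Assumption: only one variable processed
FT=${VT}                                   # Assumption: only one variable processed

# ms_bound OPERATOR: print the bound on MS, in bytes, from the list above
ms_bound() {
  case $1 in
    ncrcat)       echo $((VR)) ;;
    ncecat)       echo $((VT)) ;;
    ncra)         echo $((2 * FR + VR)) ;;
    nces)         echo $((2 * FT + VT)) ;;
    ncbo|ncflint) echo $((3 * VT)) ;;
    ncpdq)        echo $((2 * VT)) ;;
    ncwa)         echo $((8 * VT)) ;;
    *)            echo "unknown operator: $1" >&2; return 1 ;;
  esac
}

for op in ncrcat ncecat ncra nces ncbo ncflint ncpdq ncwa; do
  printf '%-8s %12d bytes\n' "${op}" "$(ms_bound "${op}")"
done
```

Here VR is 518400 bytes and VT is 62208000 bytes, so the record operators need roughly 0.5 to 1.6 MB while the ensemble operators need roughly 62 to 187 MB for the same data. This is the quantitative reason to prefer ncra and ncrcat where possible.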
Note that only variables that are processed, e.g., averaged,
concatenated, or differenced, contribute to MS.
Variables that do not appear in the output file
(see Subsetting Files) are never read and contribute nothing
to the memory requirements.
Further note that some operators perform internal type-promotion on some
variables prior to arithmetic (see Type Conversion).
For example, ncra
, nces
, and ncwa
all
promote integer types to double-precision floating-point prior to
arithmetic, then perform the arithmetic, then demote back to the
original integer type after arithmetic.
This preserves the on-disk storage type while obtaining the precision
advantages of double-precision floating-point arithmetic.
Since version 4.3.6 (released in September, 2013), NCO also
by default converts single-precision floating-point to double-precision
prior to arithmetic, which incurs the same RAM penalty.
Hence, the sustained memory required for integer variables and
single-precision floats is two or four times their on-disk,
uncompressed, unpacked sizes if they meet the rules for automatic
internal promotion.
Put another way, disabling auto-promotion of single-precision variables
(with ‘--flt’) considerably reduces the RAM footprint
of arithmetic operators.
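The factors of two and four follow from element sizes: promotion to eight-byte NC_DOUBLE doubles the footprint of four-byte types and quadruples that of two-byte types. The sketch below encodes only that arithmetic. The function name `promo_factor` is hypothetical and just a few netCDF types are covered.

```shell
#!/bin/sh
# Sketch: in-memory size multiplier when values are promoted to 8-byte
# NC_DOUBLE before arithmetic. Element sizes are the standard netCDF sizes.
promo_factor() {
  case $1 in
    NC_SHORT)        echo $((8 / 2)) ;;  # 2-byte integer: 4x
    NC_INT|NC_FLOAT) echo $((8 / 4)) ;;  # 4-byte types:   2x
    NC_DOUBLE)       echo 1 ;;           # Already 8 bytes
    *) echo "type not covered by this sketch: $1" >&2; return 1 ;;
  esac
}

promo_factor NC_FLOAT   # prints 2
promo_factor NC_SHORT   # prints 4
```

For instance, a 100 MB NC_FLOAT variable occupies about 200 MB once promoted, and stays at about 100 MB when promotion is disabled with ‘--flt’.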
The ‘--open_ram’ switch (and switches that invoke it like ‘--ram_all’ and ‘--diskless_all’) incurs a RAM penalty. These switches cause each input file to be copied to RAM upon opening. Hence any operator invoking these switches utilizes an additional FT of RAM (i.e., MS += FT). See RAM disks for further details.
ncwa
consumes between two and eight times the memory of an
NC_DOUBLE
variable in order to process it.
Peak consumption occurs when storing simultaneously in memory
one input variable, one tally array,
one input weight, one conformed/working weight, one weight tally,
one input mask, one conformed/working mask, and
one output variable.
NCO’s tally arrays are of C-type long, whose size
is eight bytes on all modern computers, the same as NC_DOUBLE
18.
When invoked, the weighting and masking features contribute up to
three-eighths and two-eighths of these requirements apiece.
If weights and masks are not specified
(i.e., no ‘-w’ or ‘-m’ options)
then ncwa
requirements drop to MS <= 3VT
(one input variable, one tally array, and the output variable).
The output variable is the same size as the input variable when
averaging only over a degenerate dimension.
However, normally the output variable is much smaller than the input,
and is often a simple scalar, in which case the memory requirements
drop by 1VT since the output array requires essentially no
memory.
All of this is subject to the type promotion rules mentioned above.
For example, ncwa
averaging a variable of type
NC_FLOAT
requires MS <= 16VT (rather than MS <= 8VT)
since all arrays are (at least temporarily) composed of eight-byte
elements, twice the size of the values on disk.
Without mask or weights, the requirements for NC_FLOAT
are
MS <= 6VT (rather than MS <= 3VT as for NC_DOUBLE
)
due to temporary internal promotion of both the input variable and the
output variable to type NC_DOUBLE
.
The ‘--flt’ option that suppresses promotion reduces this to
MS <= 4VT (the tally elements do not change size), and to
MS <= 3VT when the output array is a scalar.
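The ncwa memory multiples quoted above can be reproduced with a small Python model. This is only an accounting sketch, not NCO code: the function name `ncwa_peak_vt` and its parameters are hypothetical, and it assumes that weight and mask arrays hold eight-byte elements whenever the variable itself does. The text gives no figure for ‘--flt’ combined with weights or masks, so that case is rejected.

```python
def ncwa_peak_vt(disk_bytes, promote=False, weighted=False, masked=False,
                 scalar_output=False):
    """Peak ncwa memory as a multiple of VT, the on-disk size of the variable."""
    tally_bytes = 8  # tally arrays are C longs, eight bytes per element
    if disk_bytes == 4 and not promote and (weighted or masked):
        raise ValueError("no figure is given for --flt with weights or masks")
    # Element size in RAM: eight bytes for NC_DOUBLE or a promoted NC_FLOAT
    mem_bytes = 8 if (promote or disk_bytes == 8) else disk_bytes
    per_element = mem_bytes + tally_bytes   # input variable + tally array
    if not scalar_output:
        per_element += mem_bytes            # output variable
    if weighted:
        per_element += 3 * 8                # input weight, working weight, weight tally
    if masked:
        per_element += 2 * 8                # input mask, working mask
    return per_element / disk_bytes


print(ncwa_peak_vt(8, weighted=True, masked=True))                # NC_DOUBLE, all features
print(ncwa_peak_vt(8))                                            # NC_DOUBLE, no weights or masks
print(ncwa_peak_vt(4, promote=True, weighted=True, masked=True))  # promoted NC_FLOAT
print(ncwa_peak_vt(4, promote=True))                              # promoted NC_FLOAT, no weights or masks
print(ncwa_peak_vt(4))                                            # NC_FLOAT with --flt
print(ncwa_peak_vt(4, scalar_output=True))                        # --flt, scalar output
```

Multiply the result by thr_nbr to account for threading.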
The above memory requirements must be multiplied by the number of threads thr_nbr (see OpenMP Threading). If this causes problems then reduce (with ‘-t thr_nbr’) the number of threads.
ncap2 has unique memory requirements due to its ability to process
arbitrarily long scripts of any complexity.
All scripts acceptable to ncap2
are ultimately processed as a
sequence of binary or unary operations.
ncap2
requires MS <= 2VT under most conditions.
An exception to this is when left hand casting (see Left hand casting) is used to stretch the size of derived variables beyond the
size of any input variables.
Let VC be the memory required to store the largest variable
defined by left hand casting.
In this case, MS <= 2VC.
ncap2 scripts are completely dynamic and may be of arbitrary
length.
A script that contains many thousands of operations may uncover a
slow memory leak even though each single operation consumes little
additional memory.
Memory leaks are usually identifiable by their memory usage signature.
Leaks cause peak memory usage to increase monotonically with time
regardless of script complexity.
Slow leaks are very difficult to find.
Sometimes a malloc()
(or new[]
) failure is the
only noticeable clue to their existence.
If you have good reasons to believe that a memory allocation failure
is ultimately due to an NCO memory leak (rather than
inadequate RAM on your system), then we would be very
interested in receiving a detailed bug report.
An overview of NCO capabilities as of about 2006 is in Zender, C. S. (2008), “Analysis of Self-describing Gridded Geoscience Data with netCDF Operators (NCO)”, Environ. Modell. Softw., doi:10.1016/j.envsoft.2008.03.004. This paper is also available at http://dust.ess.uci.edu/ppr/ppr_Zen08.pdf.
NCO performance and scaling for arithmetic operations is described in Zender, C. S., and H. J. Mangalam (2007), “Scaling Properties of Common Statistical Operators for Gridded Datasets”, Int. J. High Perform. Comput. Appl., 21(4), 485-498, doi:10.1177/1094342007083802. This paper is also available at http://dust.ess.uci.edu/ppr/ppr_ZeM07.pdf.
It is helpful to be aware of the aspects of NCO design that can limit its performance:
nc_get_var
and
nc_put_var
operations.
Hyperslabs too large to hold in core memory will suffer substantial
performance penalties because of this.
ncks
when printing variables to screen.
This chapter presents reference pages for each of the operators individually. The operators are presented in alphabetical order. All valid command line switches are included in the syntax statement. Recall that descriptions of many of these command line switches are provided only in Shared Features, to avoid redundancy. Only options specific to, or most useful with, a particular operator are described in any detail in the sections below.
ncap2 netCDF Arithmetic Processor
ncatted netCDF Attribute Editor
ncbo netCDF Binary Operator
ncchecker netCDF Compliance Checker
ncclimo netCDF Climatology Generator
ncecat netCDF Ensemble Concatenator
nces netCDF Ensemble Statistics
ncflint netCDF File Interpolator
ncks netCDF Kitchen Sink
ncpdq netCDF Permute Dimensions Quickly
ncra netCDF Record Averager
ncrcat netCDF Record Concatenator
ncremap netCDF Remapper
ncrename netCDF Renamer
ncwa netCDF Weighted Averager

ncap2 netCDF Arithmetic Processor
SYNTAX
ncap2 [-3] [-4] [-5] [-6] [-7] [-A] [-C] [-c] [-D dbg] [-F] [-f] [--glb ...] [-H] [-h] [--hdf] [--hdr_pad nbr] [--hpss] [-L dfl_lvl] [-l path] [--no_tmp_fl] [-O] [-o output-file] [-p path] [-R] [-r] [--ram_all] [-s algebra] [-S fl.nco] [-t thr_nbr] [-v] [input-file] [output-file]
DESCRIPTION
ncap2
arithmetically processes netCDF files.
ncap2
is the successor to ncap
which was put into
maintenance mode in November, 2006, and completely removed from
NCO in March, 2018.
This documentation refers to ncap2, which implements its own
domain-specific language to produce a powerful superset of
ncap functionality.
ncap2
may be renamed ncap
one day!
The processing instructions are contained either in the NCO
script file fl.nco or in a sequence of command line arguments.
The options ‘-s’ (or long options ‘--spt’ or ‘--script’)
are used for in-line scripts and ‘-S’ (or long options
‘--fl_spt’, ‘--nco_script’, or ‘--script-file’) are used to provide the
filename where (usually multiple) scripting commands are pre-stored.
ncap2
was written to perform arbitrary algebraic
transformations of data and archive the results as easily as
possible.
See Missing values, for treatment of missing values.
The results of the algebraic manipulations are called
derived fields.
Unlike the other operators, ncap2
does not accept a list of
variables to be operated on as an argument to the ‘-v’ option
(see Subsetting Files).
In ncap2
, ‘-v’ is a switch that takes no arguments and
indicates that ncap2
should output only user-defined
variables (and coordinates associated with variables used in deriving
them).
ncap2
neither accepts nor understands the -x switch.
We recommend making this distinction clear by using
‘--usr_dfn_var’ (or its synonym,
‘--output_user_defined_variables’, both introduced in
NCO version 5.1.9 in October, 2023) instead of ‘-v’,
which may be deprecated.
NB: As of 20120515, ncap2
is unable to append to files that
already contain the appended dimensions.
Providing a name for output-file is optional if input-file
is a netCDF3 format, in which case ncap2
attempts to write
modifications directly to input-file (similar to the behavior of
ncrename
and ncatted
).
Format-constraints prevent this type of appending from working on a
netCDF4 format input-file.
In any case, reading and writing the same file can be risky and lead
to unexpected consequences (since the file is being both read and
written), so in normal usage we recommend providing output-file
(which can be the same as input-file since the changes are first
written to an intermediate file).
As of NCO version 4.8.0 (released May, 2019),
ncap2
does not require that input-file be specified
when output-file has no dependency on it.
Prior to this, ncap2
required users to specify a dummy
input-file even if it was not used to construct
output-file.
Input files are always read by ncap2
, and dummy input
files are read though not used for anything nor modified.
Now
ncap2 -s 'quark=1' ~/foo.nc # Create new foo.nc
ncap2 -s 'print(quark)' ~/foo.nc # Print existing foo.nc
ncap2 -O -s 'quark=1' ~/foo.nc # Overwrite old with new foo.nc
ncap2 -s 'quark=1' ~/foo.nc ~/foo.nc # Add to old foo.nc
Defining new variables in terms of existing variables is a powerful
feature of ncap2
.
Derived fields inherit the metadata (i.e., attributes) of their
ancestors, if any, in the script or input file.
When the derived field is completely new (no identically-named ancestors
exist), then it inherits the metadata (if any) of the left-most variable
on the right hand side of the defining expression.
This metadata inheritance is called attribute propagation.
Attribute propagation is intended to facilitate well-documented
data analysis, and we welcome suggestions to improve this feature.
The only exception to this rule of attribute propagation is in cases of left hand casting (see Left hand casting). The user must manually define the proper metadata for variables defined using left hand casting.
ncap2 statements
Mastering ncap2 is relatively simple.
Each valid statement consists of a standard forward
algebraic expression.
The fl.nco, if present, is simply a list of such statements,
whitespace, and comments.
The syntax of statements is most like the computer language C.
The following characteristics of C are preserved:
Array elements are placed within [] characters;
Arrays are 0-based;
Last dimension is most rapidly varying;
A semi-colon ‘;’ indicates the end of an assignment statement.
Multi-line comments are enclosed within /* */ characters.
Single line comments are preceded by // characters.
Files may be nested in scripts using #include script.
The #include command is not followed by a semi-colon because it
is a pre-processor directive, not an assignment statement.
The filename script is interpreted relative to the run directory.
The at-sign @ is used to delineate an attribute name from a
variable name.
Expressions are the fundamental building block of ncap2
.
Expressions are composed of variables, numbers, literals, and
attributes.
The following C operators are “overloaded” and work with scalars
and multi-dimensional arrays:
Arithmetic Operators: * / % + - ^
Binary Operators: > >= < <= == != || && >> <<
Unary Operators: + - ++ -- !
Conditional Operator: exp1 ? exp2 : exp3
Assign Operators: = += -= /= *=
In the following section a variable also refers to a number literal which is read in as a scalar variable:
Arithmetic and Binary Operators
Consider var1 ’op’ var2
Precision
NC_FLOAT
, the result is NC_FLOAT
.
When either type is NC_DOUBLE
, the result is also NC_DOUBLE
.
Rank
Even though the logical operators return True(1) or False(0)
they are treated in the same way as the arithmetic operators with regard
to precision and rank.
Examples:
dimensions: time=10, lat=2, lon=4
Suppose we have the two variables:
double P(time,lat,lon);
float PZ0(lon,lat); // PZ0=1,2,3,4,5,6,7,8;
Consider now the expression:
PZ=P-PZ0
PZ0 is made to conform to P and the result is
PZ0 =
1,3,5,7,2,4,6,8,
1,3,5,7,2,4,6,8,
1,3,5,7,2,4,6,8,
1,3,5,7,2,4,6,8,
1,3,5,7,2,4,6,8,
1,3,5,7,2,4,6,8,
1,3,5,7,2,4,6,8,
1,3,5,7,2,4,6,8,
1,3,5,7,2,4,6,8,
1,3,5,7,2,4,6,8,
Once the expression is evaluated then PZ will be of type double;
Consider now
start=four-att_var@double_att; // start =-69 and is of type integer;
four_pow=four^3.0f // four_pow=64 and is of type float
three_nw=three_dmn_var_sht*1.0f; // type is now float
start@n1=att_var@short_att*att_var@int_att;
// start@n1=5329 and is type int
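The way PZ0(lon,lat) is made to conform to P(time,lat,lon) in the example above can be traced in plain Python. This is an illustrative model rather than NCO's implementation: it transposes PZ0 to (lat,lon) order and then copies it along the time dimension.

```python
n_time, n_lat, n_lon = 10, 2, 4

# PZ0(lon,lat) as stored: the last dimension (lat) varies fastest
pz0_flat = [1, 2, 3, 4, 5, 6, 7, 8]
pz0 = [[pz0_flat[i_lon * n_lat + i_lat] for i_lat in range(n_lat)]
       for i_lon in range(n_lon)]

# Step 1: reorder (lon,lat) -> (lat,lon).
# Step 2: broadcast by copying the result into every time record.
pz0_conformed = [
    [[pz0[i_lon][i_lat] for i_lon in range(n_lon)] for i_lat in range(n_lat)]
    for _ in range(n_time)
]

first_record = [value for row in pz0_conformed[0] for value in row]
print(first_record)  # [1, 3, 5, 7, 2, 4, 6, 8]
```

Each of the ten time records holds the same eight values, which matches the ten identical rows listed for the conformed PZ0.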
Binary Operators
Unlike C the binary operators return an array of values.
There is no such thing as short circuiting with the AND/OR operators.
Missing values are carried into the result in the same way they are with
the arithmetic operators.
When an expression is evaluated in an if() the missing values are
treated as true.
The binary operators are, in order of precedence:
!    Logical Not
----------------------------
<<   Less Than Selection
>>   Greater Than Selection
----------------------------
>    Greater than
>=   Greater than or equal to
<    Less than
<=   Less than or equal to
----------------------------
==   Equal to
!=   Not equal to
----------------------------
&&   Logical AND
----------------------------
||   Logical OR
----------------------------
To see all operators, see Operator precedence and associativity.
Examples:
tm1=time>2 && time <7; // tm1=0, 0, 1, 1, 1, 1, 0, 0, 0, 0 double
tm2=time==3 || time>=6; // tm2=0, 0, 1, 0, 0, 1, 1, 1, 1, 1 double
tm3=int(!tm1); // tm3=1, 1, 0, 0, 0, 0, 1, 1, 1, 1 int
tm4=tm1 && tm2; // tm4=0, 0, 1, 0, 0, 1, 0, 0, 0, 0 double
tm5=!tm4; // tm5=1, 1, 0, 1, 1, 0, 1, 1, 1, 1 double
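The element-wise behaviour of the logical operators can be checked with a short Python model. It assumes that time holds the values 1 through 10, which is inferred from the results listed above rather than stated in the text, and the helper names are hypothetical. Both operands are fully built before they are combined, mirroring the absence of short-circuiting; missing values and result types are not modelled.

```python
time = list(range(1, 11))  # assumed coordinate values 1..10


def logical_and(a, b):
    # Every element pair is evaluated; nothing is skipped
    return [int(bool(x) and bool(y)) for x, y in zip(a, b)]


def logical_or(a, b):
    return [int(bool(x) or bool(y)) for x, y in zip(a, b)]


def logical_not(a):
    return [int(not x) for x in a]


tm1 = logical_and([t > 2 for t in time], [t < 7 for t in time])
tm2 = logical_or([t == 3 for t in time], [t >= 6 for t in time])
tm3 = logical_not(tm1)
tm4 = logical_and(tm1, tm2)
tm5 = logical_not(tm4)

for name, value in (("tm1", tm1), ("tm2", tm2), ("tm3", tm3),
                    ("tm4", tm4), ("tm5", tm5)):
    print(name, value)
```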
Regular Assign Operator
var1 ’=’ exp1
If var1 does not already exist in Input or Output then var1 is written to Output with the values, type and dimensions from exp1. If var1 is in Input only it is copied to Output first. Once the variable is in Output then the only requirement on exp1 is that the number of elements must match the number already on disk. The type of exp1 is converted as necessary to the disk type.
If you wish to change the type or shape of a variable in Input then you must cast the variable. See Left hand casting.
time[time]=time.int();
three_dmn_var_dbl[time,lon,lat]=666L;
Other Assign Operators +=, -=, *=, /=
var1 ’ass_op’ exp1
If exp1 is a variable and it doesn’t conform to var1 then an attempt is made to make it conform to var1. If exp1 is an attribute it must have unity size or else have the same number of elements as var1. If exp1 has a different type to var1 then it is converted to the var1 type.
z1=four+=one*=10 // z1=14 four=14 one=10;
time-=2 // time= -1,0,1,2,3,4,5,6,7,8
Increment/Decrement Operators
These work in a similar fashion to their regular C counterparts. If, say, the variable four is in Input only, then the statement ++four effectively means: read four from Input, increment each element by one, then write the new values to Output.
Example:
n2=++four; // n2=5, four=5
n3=one--+20; // n3=21, one=0
n4=--time; // n4=time=0.,1.,2.,3.,4.,5.,6.,7.,8.,9.
Conditional Operator ?:
exp1 ? exp2 : exp3
The conditional operator (or ternary operator) is a succinct way
of writing an if/then/else. If exp1 evaluates to true then exp2 is
returned else exp3 is returned.
Example:
weight_avg=weight.avg();
weight_avg@units= (weight_avg == 1 ? "kilo" : "kilos");
PS_nw=PS-(PS.min() > 100000 ? 100000 : 0);
For arrays, the less-than selection operator selects all values in the left operand that are less than the corresponding value in the right operand. If the value of the left side is greater than or equal to the corresponding value of the right side, then the right side value is placed in the result.
For arrays, the greater-than selection operator selects all values in the left operand that are greater than the corresponding value in the right operand. If the value of the left side is less than or equal to the corresponding value of the right side, then the right side value is placed in the result.
Example:
RDM2=RDM >> 100.0 // 100,100,100,100,126,126,100,100,100,100 double
RDM2=RDM << 90s // 1, 9, 36, 84, 90, 90, 84, 36, 9, 1 int
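The two selection operators amount to an element-wise maximum and minimum. The Python sketch below models them for a scalar right operand; the input values of RDM are not given in the text and are inferred here from the two results shown, and the function names are hypothetical.

```python
# RDM values inferred from the example results (an assumption)
rdm = [1, 9, 36, 84, 126, 126, 84, 36, 9, 1]


def select_greater(left, right):
    """Model of 'left >> right': keep left where it is greater, else use right."""
    return [value if value > right else right for value in left]


def select_less(left, right):
    """Model of 'left << right': keep left where it is less, else use right."""
    return [value if value < right else right for value in left]


print(select_greater(rdm, 100.0))
print(select_less(rdm, 90))
```

The type of the result (double and int in the examples above) follows the precision rules for binary operators and is not modelled here.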
Dimensions are defined in Output using the defdim() function.
defdim("cnt",10); // Dimension size is fixed by default
defdim("cnt",10,NC_UNLIMITED); // Dimension is unlimited (record dimension)
defdim("cnt",10,0); // Dimension is unlimited (record dimension)
defdim("cnt",10,1); // Dimension size is fixed
defdim("cnt",10,737); // All non-zero values indicate dimension size is fixed
This dimension name must then be prefixed with a dollar-sign ‘$’ when referred to in method arguments or left-hand-casting, e.g.,
new_var[$cnt]=time;
temperature[$time,$lat,$lon]=35.5;
temp_avg=temperature.avg($time);
The size method allows dimension sizes to be used in
arithmetic expressions:
time_avg=time.total()/$time.size;
Increase the size of a new variable by one and set new member to zero:
defdim("cnt_new",$cnt.size+1);
new_var[$cnt_new]=0.0;
new_var(0:($cnt_new.size-2))=old_var;
To define an unlimited dimension, simply set the size to zero:
defdim("time2",0);
Dimension Abbreviations
It is possible to use dimension abbreviations as method arguments:
$0 is the first dimension of a variable
$1 is the second dimension of a variable
$n is the (n+1)th dimension of a variable
float four_dmn_rec_var(time,lat,lev,lon);
double three_dmn_var_dbl(time,lat,lon);
four_nw=four_dmn_rec_var.reverse($time,$lon)
four_nw=four_dmn_rec_var.reverse($0,$3);
four_avg=four_dmn_rec_var.avg($lat,$lev);
four_avg=four_dmn_rec_var.avg($1,$2);
three_mw=three_dmn_var_dbl.permute($time,$lon,$lat);
three_mw=three_dmn_var_dbl.permute($0,$2,$1);
ID Quoting
If the dimension name contains non-regular characters use ID quoting (see ID Quoting):
defdim("a--list.A",10);
A1['$a--list.A']=30.0;
GOTCHA
It is not possible to manually define in Output any dimensions that exist in Input. When a variable from Input appears in an expression or statement its dimensions in Input are automagically copied to Output (if they are not already present).
The following examples demonstrate the utility of the
left hand casting ability of ncap2
.
Consider first this simple, artificial, example.
If lat and lon are one dimensional coordinates of
dimensions lat and lon, respectively, then addition
of these two one-dimensional arrays is intrinsically ill-defined because
whether lat_lon should be dimensioned lat by lon
or lon by lat is ambiguous (assuming that addition is to
remain a commutative procedure, i.e., one that does not depend on
the order of its arguments).
Differing dimensions are said to be orthogonal to one another,
and sets of dimensions which are mutually exclusive are orthogonal
as a set and any arithmetic operation between variables in orthogonal
dimensional spaces is ambiguous without further information.
The ambiguity may be resolved by enumerating the desired dimension ordering of the output expression inside square brackets on the left hand side (LHS) of the equals sign. This is called left hand casting because the user resolves the dimensional ordering of the RHS of the expression by specifying the desired ordering on the LHS.
ncap2 -s 'lat_lon[lat,lon]=lat+lon' in.nc out.nc
ncap2 -s 'lon_lat[lon,lat]=lat+lon' in.nc out.nc
The explicit list of dimensions on the LHS, [lat,lon]
resolves the otherwise ambiguous ordering of dimensions in
lat_lon.
In effect, the LHS casts its rank properties onto the
RHS.
Without LHS casting, the dimensional ordering of lat_lon
would be undefined and, hopefully, ncap2
would print an error
message.
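What the two casts produce can be illustrated in Python with nested lists. The coordinate values below are hypothetical; only the ordering of the dimensions matters. The same sum lat+lon yields a 2-by-4 array under one cast and a 4-by-2 array under the other.

```python
lat = [-90.0, 90.0]               # hypothetical latitude coordinate (size 2)
lon = [0.0, 90.0, 180.0, 270.0]   # hypothetical longitude coordinate (size 4)

# Model of lat_lon[lat,lon]=lat+lon : lat is the slowest-varying dimension
lat_lon = [[la + lo for lo in lon] for la in lat]

# Model of lon_lat[lon,lat]=lat+lon : lon is the slowest-varying dimension
lon_lat = [[la + lo for la in lat] for lo in lon]

print(len(lat_lon), len(lat_lon[0]))  # 2 4
print(len(lon_lat), len(lon_lat[0]))  # 4 2
```

The two results hold the same values and are transposes of one another, which is exactly the ambiguity that the LHS cast resolves.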
Consider now a slightly more complex example.
In geophysical models, a coordinate system based on
a blend of terrain-following and density-following surfaces is
called a hybrid coordinate system.
In this coordinate system, four variables must be manipulated to
obtain the pressure of the vertical coordinate:
P0 is the domain-mean surface pressure offset (a scalar),
PS is the local (time-varying) surface pressure (usually two
horizontal spatial dimensions, i.e. latitude by longitude), hyam
is the weight given to surfaces of constant density (one spatial
dimension, pressure, which is orthogonal to the horizontal
dimensions), and hybm is the weight given to surfaces of
constant elevation (also one spatial dimension).
This command constructs a four-dimensional pressure prs_mdp
from the four input variables of mixed rank and orthogonality:
ncap2 -s 'prs_mdp[time,lat,lon,lev]=P0*hyam+PS*hybm' in.nc out.nc
Manipulating the four fields which define the pressure in a hybrid coordinate system is easy with left hand casting.
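The broadcasting performed by that command can be written out explicitly. The Python sketch below uses small, hypothetical input values (one time step, a 2-by-2 horizontal grid, three levels) and builds prs_mdp in [time][lat][lon][lev] order from a scalar, two one-dimensional weights, and a three-dimensional surface pressure.

```python
# All numerical values are hypothetical and chosen only for illustration
P0 = 100000.0                 # scalar reference pressure [Pa]
hyam = [0.1, 0.05, 0.0]       # hybrid A coefficient, one value per level
hybm = [0.0, 0.5, 1.0]        # hybrid B coefficient, one value per level
PS = [[[101325.0, 100000.0],  # PS[time][lat][lon], here 1 x 2 x 2
       [98000.0, 95000.0]]]

# Model of prs_mdp[time,lat,lon,lev]=P0*hyam+PS*hybm
prs_mdp = [
    [[[P0 * a + ps * b for a, b in zip(hyam, hybm)] for ps in lat_row]
     for lat_row in record]
    for record in PS
]

print(prs_mdp[0][0][0])  # pressures of the three levels in the first column
```

With these inputs the lowest level, where hyam is 0 and hybm is 1, recovers the surface pressure in every column.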
Finally, we show how to use interface quantities to define midpoint quantities. In particular, we will define interface pressures using the standard CESM output hybrid coordinate parameters, and then difference those interface pressures to obtain the pressure difference between the interfaces. The pressure difference is necessary to obtain gridcell mass path and density (which are midpoint quantities). Definitions are as in the above example, with new variables hyai and hybi defined at grid cell vertical interfaces (rather than midpoints like hyam and hybm). The approach naturally fits into two lines:
cat > ~/pdel.nco << 'EOF'
*prs_ntf[time,lat,lon,ilev]=P0*hyai+PS*hybi;
// Requires NCO 4.5.4 and later:
prs_dlt[time,lat,lon,lev]=prs_ntf(:,:,:,1:$ilev.size-1)-prs_ntf(:,:,:,0:$ilev.size-2);
// Derived variables that require pressure thickness:
// Divide by gravity to obtain total mass path in layer aka mpl [kg m-2]
mpl=prs_dlt/grv_sfc;
// Multiply by mass mixing ratio to obtain mass path of constituent
mpl_CO2=mpl*mmr_CO2;
EOF
ncap2 -O -v -S ~/pdel.nco ~/nco/data/in.nc ~/foo.nc
ncks -O -C -v prs_dlt ~/foo.nc
The first line defines the four-dimensional interface pressures
prs_ntf
as a RAM variable because those are not desired
in the output file.
The second differences each pressure level from the pressure above it
to obtain the pressure difference.
This line employs both left-hand casting and array hyperslabbing.
However, this syntax only works with NCO version 4.5.4
(November, 2015) and later because earlier versions require that
LHS and RHS dimension names (not just sizes) match.
From the pressure differences, one can obtain the mass path in each
layer as shown.
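The index arithmetic in the differencing line is easy to get wrong because ncap2 hyperslab ends are inclusive. The following Python sketch reproduces it for a single column; the interface pressures and the value of grv_sfc are hypothetical.

```python
# Hypothetical interface pressures [Pa] in one column, top to bottom (ilev = 4)
prs_ntf = [0.0, 25000.0, 60000.0, 100000.0]
n_ilev = len(prs_ntf)

# ncap2 '1:$ilev.size-1' has an inclusive end; Python slice ends are exclusive
lower = prs_ntf[1:n_ilev]        # interfaces 1 .. ilev-1
upper = prs_ntf[0:n_ilev - 1]    # interfaces 0 .. ilev-2
prs_dlt = [lo - up for lo, up in zip(lower, upper)]

grv_sfc = 9.80665                # assumed surface gravity [m s-2]
mpl = [dp / grv_sfc for dp in prs_dlt]   # mass path in each layer [kg m-2]

print(prs_dlt)  # [25000.0, 35000.0, 40000.0]
```

There is one fewer layer than interface, and the layer thicknesses sum to the difference between the bottom and top interface pressures.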
Another reason to cast a variable is to modify the shape or type of a variable already in Input:
gds_var[gds_crd]=gds_var.double();
three_dmn_var_crd[lat,lon,lev]=10.0d;
four[]=four.int();
Generating a regularly spaced n-dimensional array with ncap2
is simple with the array() function.
The function comes in three (overloaded) forms:
(A) var_out=array(val_srt,val_inc,$dmn_nm); // One-dimensional output
(B) var_out=array(val_srt,val_inc,var_tpl); // Multi-dimensional output
(C) var_out=array(val_srt,val_inc,/$dmn1,$dmn2...,$dmnN/); // Multi-dimensional output
val_srt: Starting value of the array. The type of the array will be the type of this starting value.
val_inc: Spacing (or increment) between elements.
var_tpl: Variable from which the array can derive its shape, 1D or nD.
One-Dimensional Arrays
Use form (A) or (B) above for 1D arrays:
// var_out will be NC_DOUBLE:
var_out=array(10.5,2,$time) // 10.5,12.5,14.5,16.5,18.5,20.5,22.5,24.5,26.5,28.5
// var_out will be NC_UINT, and "shape" will duplicate "ilev"
var_out=array(0ul,2,ilev) // 0,2,4,6
// var_out will be NC_FLOAT
var_out=array(99.0f,2.5,$lon) // 99,101.5,104,106.5
// Create an array of zeros
var_out=array(0,0,$time) // 0,0,0,0,0,0,0,0,0,0
// Create array of ones
var_out=array(1.0,0.0,$lon) // 1.0,1.0,1.0,1.0
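The values that array() generates follow a simple rule: element i equals val_srt plus i times val_inc. A minimal Python model is shown below; the function name `array_1d` is hypothetical, and the rule that the output takes the type of val_srt is not modelled.

```python
def array_1d(val_srt, val_inc, size):
    """Return size regularly spaced values starting at val_srt."""
    return [val_srt + i * val_inc for i in range(size)]


print(array_1d(99.0, 2.5, 4))   # [99.0, 101.5, 104.0, 106.5]
print(array_1d(0, 2, 4))        # [0, 2, 4, 6]
print(array_1d(0, 0, 10))       # ten zeros
print(array_1d(20, -4, 8))      # [20, 16, 12, 8, 4, 0, -4, -8]
```

For a multi-dimensional template the same sequence fills the output in storage order, with the last dimension varying fastest.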
n-Dimensional Arrays
Use form (B) or (C) for creating n-D arrays.
NB: In (C) the final argument is a list of dimensions
// These are equivalent
var_out=array(1.0,2.0,three_dmn_var);
var_out=array(1.0,2.0,/$lat,$lev,$lon/);
// var_out is NC_BYTE
var_out=array(20b, -4, /$lat,$lon/); // 20,16,12,8,4,0,-4,-8
srt=3.14159f;
inc=srt/2.0f;
var_out=array(srt,inc,var_2D_rrg);
// 3.14159, 4.712385, 6.28318, 7.853975, 9.42477, 10.99557, 12.56636, 14.13716 ;
Hyperslabs in ncap2
are more limited than hyperslabs with the
other NCO operators.
ncap2
does not understand the shell command-line syntax
used to specify multi-slabs, wrapped co-ordinates, negative stride or
coordinate value limits.
However, with a bit of syntactic magic they are all possible.
ncap2
accepts (in fact, it requires) N-hyperslab
arguments for a variable of rank N:
var1(arg1,arg2 ... argN);
where each hyperslab argument is of the form
start:end:stride
and the arguments for different dimensions are separated by commas. If start is omitted, it defaults to zero. If end is omitted, it defaults to dimension size minus one. If stride is omitted, it defaults to one.
If a single value is present then it is assumed that that dimension collapses to a single value (i.e., a cross-section). The number of hyperslab arguments MUST equal the variable’s rank.
Hyperslabs on the Right Hand Side of an assign
A simple 1D example:
($time.size=10)
od[$time]={20,22,24,26,28,30,32,34,36,38};
od(7); // 34
od(7:); // 34,36,38
od(:7); // 20,22,24,26,28,30,32,34
od(::4); // 20,28,36
od(1:6:2) // 22,26,30
od(:) // 20,22,24,26,28,30,32,34,36,38
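The defaults and the inclusive end index of a start:end:stride argument can be modelled in a few lines of Python. The helper `hyperslab_1d` is hypothetical and handles only one dimension; the key difference from a Python slice is that end is included in the result.

```python
def hyperslab_1d(values, start=None, end=None, stride=None):
    """Model of an ncap2 start:end:stride argument (end is inclusive)."""
    start = 0 if start is None else start
    end = len(values) - 1 if end is None else end
    stride = 1 if stride is None else stride
    return values[start:end + 1:stride]


od = [20, 22, 24, 26, 28, 30, 32, 34, 36, 38]

print(od[7])                              # od(7)     -> 34
print(hyperslab_1d(od, start=7))          # od(7:)    -> [34, 36, 38]
print(hyperslab_1d(od, end=7))            # od(:7)    -> [20, ..., 34]
print(hyperslab_1d(od, stride=4))         # od(::4)   -> [20, 28, 36]
print(hyperslab_1d(od, 1, 6, 2))          # od(1:6:2) -> [22, 26, 30]
```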
A more complex three dimensional example:
($lat.size=2,$lon.size=4)
th[$time,$lat,$lon]=
{1, 2, 3, 4, 5, 6, 7, 8,
9,10,11,12,13,14,15,16,
17,18,19,20,21,22,23,24,
-99,-99,-99,-99,-99,-99,-99,-99,
33,34,35,36,37,38,39,40,
41,42,43,44,45,46,47,48,
49,50,51,52,53,54,55,56,
-99,58,59,60,61,62,63,64,
65,66,67,68,69,70,71,72,
-99,74,75,76,77,78,79,-99 };
th(1,1,3); // 16
th(2,0,:); // 17, 18, 19, 20
th(:,1,3); // 8, 16, 24, -99, 40, 48, 56, 64, 72, -99
th(::5,:,0:3:2); // 1, 3, 5, 7, 41, 43, 45, 47
If hyperslab arguments collapse to a single value (a cross-section has been specified), then that dimension is removed from the returned variable. If all the values collapse then a scalar variable is returned. So, for example, the following is valid:
th_nw=th(0,:,:)+th(9,:,:); // th_nw has dimensions $lat,$lon
// NB: the time dimension has become degenerate
The following is invalid:
th_nw=th(0,:,0:1)+th(9,:,0:1);
because the $lon dimension now only has two elements.
The above can be calculated by using a LHS cast with
$lon_nw as replacement dim for $lon:
defdim("lon_nw",2);
th_nw[$lat,$lon_nw]=th(0,:,0:1)+th(9,:,0:1);
Hyperslabs on the Left Hand Side of an assign
When hyperslabbing on the LHS, the expression on the RHS must
evaluate to a scalar or a variable/attribute with the same number of
elements as the LHS hyperslab.
Set all elements of the last record to zero:
th(9,:,:)=0.0;
Set first element of each lon element to 1.0:
th(:,:,0)=1.0;
One may hyperslab on both sides of an assign. For example, this sets the last record to the first record:
th(9,:,:)=th(0,:,:);
Say th0 represents pressure at height=0 and th1 represents pressure at height=1. Then it is possible to insert these hyperslabs into the records:
prs[$time,$height,$lat,$lon]=0.0;
prs(:,0,:,:)=th0;
prs(:,1,:,:)=th1;
Reverse method
Use the reverse() method to reverse a dimension’s elements in a
variable with at least one dimension.
This is equivalent to a negative stride, e.g.,
th_rv=th(1,:,:).reverse($lon); // {12,11,10,9 }, {16,15,14,13}
od_rv=od.reverse($time); // {38,36,34,32,30,28,26,24,22,20}
Permute method
Use the permute() method to swap the dimensions of a variable.
The number and names of dimension arguments must match the dimensions in
the variable.
If the first dimension in the variable is of record type then this must
remain the first dimension.
If you want to change the record dimension then consider using
ncpdq.
Consider the variable:
float three_dmn_var(lat,lev,lon);
three_dmn_var_prm=three_dmn_var.permute($lon,$lat,$lev);
// The permuted values are
three_dmn_var_prm=
0,4,8,
12,16,20,
1,5,9,
13,17,21,
2,6,10,
14,18,22,
3,7,11,
15,19,23;
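The permuted values can be derived by hand from the storage order. The Python sketch below assumes, as the listed result implies, that three_dmn_var holds the values 0 through 23 with lat=2, lev=3 and lon=4, and that the last dimension varies fastest.

```python
n_lat, n_lev, n_lon = 2, 3, 4
flat = list(range(24))  # three_dmn_var(lat,lev,lon) in storage order


def at(i_lat, i_lev, i_lon):
    """Value of three_dmn_var at (lat,lev,lon); lon varies fastest."""
    return flat[(i_lat * n_lev + i_lev) * n_lon + i_lon]


# Model of permute($lon,$lat,$lev): lon is now slowest, lev is fastest
permuted = [at(i_lat, i_lev, i_lon)
            for i_lon in range(n_lon)
            for i_lat in range(n_lat)
            for i_lev in range(n_lev)]

print(permuted)
```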
Refer to attributes with var_nm@att_nm. The following are all valid statements:
global@text="Test Attributes"; /* Assign a global variable attribute */
a1[$time]=time*20;
a1@long_name="Kelvin";
a1@min=a1.min();
a1@max=a1.max();
a1@min++;
--a1@max;
a1(0)=a1@min;
a1($time.size-1)=a1@max;
NetCDF allows all attribute types to have a size between one and
NC_MAX_ATTRS.
Here is the metadata for variable a1:
double a1(time) ;
  a1:long_name = "Kelvin" ;
  a1:max = 199. ;
  a1:min = 21. ;
  a1:trip1 = 1, 2, 3 ;
  a1:triplet = 21., 110., 199. ;
These basic methods can be used with attributes:
size(), type(), and exists().
For example, to save an attribute text string in a variable:
defdim("sng_len",a1@long_name.size());
sng_arr[$sng_len]=a1@long_name; // sng_arr now contains "Kelvin"
Attributes defined in a script are stored in memory and are written to
the output file after script completion.
To stop the attribute being written use the ram_delete()
method
or use a bogus variable name.
Attribute Propagation and Inheritance
// prs_mdp inherits attributes from P0:
prs_mdp[time,lat,lon,lev]=P0*hyam+hybm*PS;
// th_min inherits attributes from three_dmn_var_dbl:
th_min=1.0 + 2*three_dmn_var_dbl.min($time);
The push() function concatenates attributes, or appends an “expression” to a pre-existing attribute. It comes in two forms:
(A) att_new=push(att_exp, expr)
(B) att_size=push(&att_nm,expr)
In form (A) the first argument should be an attribute
identifier or an expression that evaluates to an attribute.
The second argument can evaluate to an attribute or a variable.
The second argument is then converted to the type of att_exp
and appended to att_exp, and the resulting attribute is returned.
In form (B) the first argument is a call-by-reference attribute identifier (which may not yet exist). The second argument is then evaluated (and type-converted as needed) and appended to the call-by-reference attribute. The final size of the attribute is then returned.
temp@range=-10.0;
push(&temp@range,12.0); // temp@range=-10.0,12.0
numbers@squares=push(1,4);
numbers@squares=push(numbers@squares,9);
push(&numbers@squares,16.0);
push(&numbers@squares,25ull);
// numbers@squares=1,4,9,16,25
Now some text examples.
Remember, an attribute identifier that begins with @ implies a global
attribute.
For example, ’@institution’ is short for ’global@institution’.
global@greetings=push("hello"," world !!");
global@greek={"alpha"s,"beta"s,"gamma"s};
// Append an NC_STRING
push(&@greek,"delta"s);
// Pushing an NC_CHAR to an NC_STRING attribute is allowed; it is converted to an NC_STRING
@e="epsilon";
push(&@greek,@e);
push(&@greek,"zeta");
// Pushing a single NC_STRING to an NC_CHAR is not allowed
@h="hello";
push(&@h," again"s); // BAD PUSH
If the attribute name contains non-regular characters use ID quoting:
'b..m1@c--lost'=23;
See ID Quoting.
A value list is a special type of attribute.
It can only be used on the RHS of the assign family of statements.
That is =, +=, -=, *=, /=
A value list CANNOT be involved in any logical, binary, or arithmetical operations (except those above).
A value list CANNOT be used as a function argument.
A value list CANNOT have nested value lists.
The type of a value list is the type of the member with the highest type.
a1@trip={1,2,3};
a1@trip+={3,2,1}; // 4,4,4
a1@triplet={a1@min,(a1@min+a1@max)/2,a1@max};
lon[lon]={0.0,90.0,180.0,270.0};
lon*={1.0,1.1,1.2,1.3};
dlon[lon]={1b,2s,3ull,4.0f}; // final type NC_FLOAT
a1@ind={1,2,3}+{4,4,4}; // BAD
a1@s=sin({1.0,16.0}); // BAD
One can also use a value_list to create an attribute of type NC_STRING. Remember, a literal string of type NC_STRING has a postfix ’s’. A value list of NC_CHAR has no semantic meaning and is plain wrong.
array[lon]={1.0,2.,4.0,7.0};
array@numbers={"one"s, "two"s, "four"s, "seven"s}; // GOOD
ar[lat]={0,20};
ar@numbers={"zero","twenty"}; // BAD
The table below lists the postfix character(s) to add to a number
literal (aka, a naked constant) for explicit type specification.
The same type-specification rules are used for variables and
attributes.
A floating-point number without a postfix defaults to NC_DOUBLE,
while an integer without a postfix defaults to type NC_INT:
var[$rlev]=0.1; // Variable will be type NC_DOUBLE
var[$lon_grd]=2.0; // Variable will be type NC_DOUBLE
var[$gds_crd]=2e3; // Variable will be type NC_DOUBLE
var[$gds_crd]=2.0f; // Variable will be type NC_FLOAT (note "f")
var[$gds_crd]=2e3f; // Variable will be type NC_FLOAT (note "f")
var[$gds_crd]=2; // Variable will be type NC_INT
var[$gds_crd]=-3; // Variable will be type NC_INT
var[$gds_crd]=2s; // Variable will be type NC_SHORT
var[$gds_crd]=-3s; // Variable will be type NC_SHORT
var@att=41.; // Attribute will be type NC_DOUBLE
var@att=41.f; // Attribute will be type NC_FLOAT
var@att=41; // Attribute will be type NC_INT
var@att=-21s; // Attribute will be type NC_SHORT
var@units="kelvin"; // Attribute will be type NC_CHAR
There is no postfix for characters, use a quoted string instead for
NC_CHAR.
ncap2 interprets a standard double-quoted string as a value
of type NC_CHAR.
In this case, any receiving variable must be dimensioned as an array
of NC_CHAR long enough to hold the value.
To use the newer netCDF4 types NCO must be compiled/linked to
the netCDF4 library and the output file must be of type NETCDF4:
var[$time]=1UL; // Variable will be type NC_UINT
var[$lon]=4b; // Variable will be type NC_BYTE
var[$lat]=5ull; // Variable will be type NC_UINT64
var[$lat]=5ll; // Variable will be type NC_INT64
var@att=6.0d; // Attribute will be type NC_DOUBLE
var@att=-666L; // Attribute will be type NC_INT
var@att="kelvin"s; // Attribute will be type NC_STRING (note the "s")
Use a post-quote ‘s’ for NC_STRING
.
Place the letter ‘s’ immediately following the double-quoted string
to indicate that the value is of type NC_STRING
.
In this case, the receiving variable need not have any memory allocated
to hold the string because netCDF4 handles that memory allocation.
Suppose one creates a file containing an ensemble of model results, and
wishes to label the record coordinate with the name of each model.
The NC_STRING
type is well-suited to this because it facilitates
storing arrays of strings of arbitrary length.
This is sophisticated, though easy with ncap2:
% ncecat -O -u model cesm.nc ecmwf.nc giss.nc out.nc
% ncap2 -4 -O -s 'model[$model]={"cesm"s,"ecmwf"s,"giss"s}' out.nc out.nc
The key here is to place an ‘s’ character after each double-quoted
string value to indicate an NC_STRING type.
The ‘-4’ ensures the output filetype is netCDF4 in case the input
filetype is not.
NC_BYTE, a signed 1-byte integer
NC_CHAR, an ISO/ASCII character
NC_SHORT, a signed 2-byte integer
NC_INT, a signed 4-byte integer
NC_FLOAT, a single-precision (4-byte) floating-point number
NC_DOUBLE, a double-precision (8-byte) floating-point number
NC_UBYTE, an unsigned 1-byte integer
NC_USHORT, an unsigned 2-byte integer
NC_UINT, an unsigned 4-byte integer
NC_INT64, a signed 8-byte integer
NC_UINT64, an unsigned 8-byte integer
NC_STRING, a string of arbitrary length
The syntax of the if statement is similar to its C counterpart. The Conditional Operator (ternary operator) has also been implemented.
if(exp1)
  stmt1;
else if(exp2)
  stmt2;
else
  stmt3;

# Can use code blocks as well:
if(exp1){
  stmt1;
  stmt1a;
  stmt1b;
}else if(exp2)
  stmt2;
else{
  stmt3;
  stmt3a;
  stmt3b;
}
For a variable or attribute expression to be logically true
all its non-missing value elements must be logically true, i.e.,
non-zero.
The expression can be of any type.
Unlike C, there is no short-circuiting of an expression with the OR (||) and AND (&&) operators.
The whole expression is evaluated regardless of whether one of the AND/OR operands already determines the result.
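The two rules above (an expression is true only if every non-missing element is non-zero, and both operands of || and && are always evaluated) can be modelled outside ncap2. The following is a minimal Python sketch, not NCO code; the names `is_true`, `operand`, and `logical_or` are hypothetical and exist only for this illustration.

```python
def is_true(values, fill_value=None):
    """Model of the rule: every non-missing element must be non-zero.

    Assumption: an all-missing input yields True here; the text above
    does not say what ncap2 does in that case.
    """
    valid = [v for v in values if v != fill_value]
    return all(v != 0 for v in valid)


calls = []  # records which operands were evaluated


def operand(name, value):
    calls.append(name)
    return value


def logical_or(lhs, rhs):
    # Both arguments are already evaluated when this function runs,
    # which mirrors the absence of short-circuiting described above.
    return bool(lhs) or bool(rhs)


result = logical_or(operand("lhs", 1), operand("rhs", 0))
print(is_true([1, 2, 3]), is_true([1, 0, 3]), calls)
```

In this model the right-hand operand is evaluated even though the left-hand operand alone decides the outcome, so any side effects on the right always occur.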
# Simple example
if(time > 0)
  print("All values of time are greater than zero\n");
else if(time < 0)
  print("All values of time are less than zero\n");
else {
  time_max=time.max();
  time_min=time.min();
  print("min value of time=");print(time_min,"%f");
  print("max value of time=");print(time_max,"%f");
}

# Example from ddra.nco
if(fl_typ == fl_typ_gcm){
  var_nbr_apx=32;
  lmn_nbr=1.0*var_nbr_apx*varsz_gcm_4D; /* [nbr] Variable size */
  if(nco_op_typ==nco_op_typ_avg){
    lmn_nbr_avg=1.0*var_nbr_apx*varsz_gcm_4D; // Block size
    lmn_nbr_wgt=dmnsz_gcm_lat; /* [nbr] Weight size */
  } // !nco_op_typ_avg
}else if(fl_typ == fl_typ_stl){
  var_nbr_apx=8;
  lmn_nbr=1.0*var_nbr_apx*varsz_stl_2D; /* [nbr] Variable size */
  if(nco_op_typ==nco_op_typ_avg){
    lmn_nbr_avg=1.0*var_nbr_apx*varsz_stl_2D; // Block size
    lmn_nbr_wgt=dmnsz_stl_lat; /* [nbr] Weight size */
  } // !nco_op_typ_avg
} // !fl_typ
Conditional Operator
// netCDF4 needed for this example
th_nw=(three_dmn_var_sht >= 0 ? three_dmn_var_sht.uint() : \
       three_dmn_var_sht.int());
The print statement comes in a variety of forms:
(A)  print(variable_name, format string?);
(A1) print(expression/string, format string?);
(B)  sprint(expression/string, format string?);
(B1) sprint4(expression/string, format string?);
print()
If the variable exists in I/O then it is printed in a similar fashion to ncks -H.
print(lon);
lon[0]=0
lon[1]=90
lon[2]=180
lon[3]=270

print(byt_2D)
lat[0]=-90 lon[0]=0 byt_2D[0]=0
lat[0]=-90 lon[1]=90 byt_2D[1]=1
lat[0]=-90 lon[2]=180 byt_2D[2]=2
lat[0]=-90 lon[3]=270 byt_2D[3]=3
lat[1]=90 lon[0]=0 byt_2D[4]=4
lat[1]=90 lon[1]=90 byt_2D[5]=5
lat[1]=90 lon[2]=180 byt_2D[6]=6
lat[1]=90 lon[3]=270 byt_2D[7]=7
If the first argument is NOT a variable then form (A1) is invoked.
print(mss_val_fst@_FillValue);
mss_val_fst@_FillValue, size = 1 NC_FLOAT, value = -999

print("This function \t is monotonic\n");
This function is monotonic

print(att_var@float_att)
att_var@float_att, size = 7 NC_FLOAT, value = 73, 72, 71, 70.01, 69.001, 68.01, 67.01

print(lon*10.0)
lon, size = 4 NC_DOUBLE, value = 0, 900, 1800, 2700
If the format string is specified then the results from forms (A) and (A1) are the same:
print(lon_2D_rrg,"%3.2f,");
0.00,0.00,180.00,0.00,180.00,0.00,180.00,0.00,

print(lon*10.0,"%g,")
0,900,1800,2700,

print(att_var@float_att,"%g,")
73,72,71,70.01,69.001,68.01,67.01,
sprint() & sprint4()
These functions work in an identical fashion to (A1), except that sprint() outputs a regular netCDF3 NC_CHAR attribute and sprint4() outputs a netCDF4 NC_STRING attribute.
time@units=sprint(yyyy,"days since %d-1-1")
bnd@num=sprint4(bnd_idx,"Band number=%d")
time@arr=sprint4(time,"%.2f,") // "1.00,2.00,3.00,4.00,5.00,6.00,7.00,8.00,9.00,10.00,"
You can also use sprint4() to convert an NC_CHAR string to an NC_STRING string, and sprint() to convert an NC_STRING to an NC_CHAR.
lat_1D_rct@long_name = "Latitude for 2D rectangular grid stored as 1D arrays";
// convert to NC_STRING
lat_1D_rct@long_name = sprint4(lat_1D_rct@long_name)
hyperslab a netCDF string
It is possible to index-into an NC_CHAR string, similar to a C-String.
Unlike a C-String, however, an NC_CHAR string has no null-character to
mark its termination.
One CANNOT index into an NC_STRING string.
One must convert it to an NC_CHAR first.
global@greeting="hello world!!!"
@h=@greeting(0:4);  // "hello"
@w=@greeting(6:10); // "world"
// can use negative indices
@x=@greeting(-3:-1); // "!!!"
// can use stride
@n=@greeting(::2); // "hlowrd!"
// concatenation
global@new_greeting=push(@h, " users !!!"); // "hello users !!!"

@institution="hotel california"s;
@h=@institution(0:4); // BAD
// convert NC_STRING to NC_CHAR
@is=sprint(@institution);
@h=@is(0:4); // "hotel"
// convert NC_CHAR to NC_STRING
@h=sprint4(@h);
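Hyperslab limits on an NC_CHAR string are inclusive at both ends, which differs from the half-open slices of many languages. The sketch below models that convention in Python; `hyperslab` is a hypothetical helper, not an NCO function, and it assumes negative indices count back from the end of the string as in the example above.

```python
def hyperslab(text, start=None, end=None, stride=1):
    """Return text[start..end] with an inclusive end index."""
    n = len(text)
    first = 0 if start is None else (start + n if start < 0 else start)
    last = n - 1 if end is None else (end + n if end < 0 else end)
    # Python slices exclude the end point, hence last + 1.
    return text[first:last + 1:stride]


greeting = "hello world!!!"
print(hyperslab(greeting, 0, 4))      # hello
print(hyperslab(greeting, 6, 10))     # world
print(hyperslab(greeting, -3, -1))    # !!!
print(hyperslab(greeting, stride=2))  # hlowrd!
```

With inclusive limits, the five characters of "world" occupy indices 6 through 10.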
get_vars_in() & get_vars_out()
att_lst=get_vars_in(att_regexp?)
att_lst=get_vars_out(att_regexp?)
These functions create a list of the variables in Input or Output. The optional argument att_regexp can be an NC_CHAR attribute or an NC_STRING attribute. If it is NC_CHAR, only a single regular expression can be specified; if it is NC_STRING, multiple regular expressions can be specified. The output is always an NC_STRING attribute. Matching works in the same way as the -v switch in ncks. If no argument is given then all variables are returned.
@slist=get_vars_in("^time");
// "time", "time_bnds", "time_lon", "time_udunits"

// Use NC_STRINGS
@regExp={".*_bnd"s,".*_grd"s};
@slist=get_vars_in(@regExp);
// "lat_bnd", "lat_grd", "lev_bnd", "lon_grd", "time_bnds", "cnv_CF_grd"
Missing values operate slightly differently in ncap2.
Consider the expression below, where op is any of the following operators (excluding ’=’):
Arithmetic operators ( * / % + - ^ )
Binary operators ( > >= < <= == != || && >> << )
Assign operators ( += -= /= *= )

var1 'op' var2
If var1 has a missing value then this is the value used in the
operation, otherwise the missing value for var2 is used.
If during the element-by-element operation an element from either
operand is equal to the missing value then the missing value is carried
through.
In this way missing values ’percolate’ or propagate through an
expression.
Missing values associated with Output variables are stored in memory and
are written to disk after the script finishes.
During script execution it is possible (and legal) for the missing value of a variable to take on several different values.
# Consider the variable:
int rec_var_int_mss_val_int(time); =-999,2,3,4,5,6,7,8,-999,-999;
rec_var_int_mss_val_int:_FillValue = -999;

n2=rec_var_int_mss_val_int + rec_var_int_mss_val_int.reverse($time);

n2=-999,-999,11,11,11,11,11,11,-999,-999;
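The propagation rule can be checked by hand with a short model. This Python sketch reproduces the addition of rec_var_int_mss_val_int to its own reverse; `add_with_missing` is a hypothetical helper written for this illustration and handles only the case where both operands share one missing value.

```python
def add_with_missing(lhs, rhs, fill_value):
    """Element-wise sum in which a missing operand yields a missing result."""
    return [fill_value if fill_value in (a, b) else a + b
            for a, b in zip(lhs, rhs)]


var = [-999, 2, 3, 4, 5, 6, 7, 8, -999, -999]
n2 = add_with_missing(var, var[::-1], -999)
print(n2)  # [-999, -999, 11, 11, 11, 11, 11, 11, -999, -999]
```

Elements 1 and 8 become missing because their partners in the reversed array are missing, even though they hold valid data themselves.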
The following methods query or manipulate missing value (aka _FillValue) information associated with a variable.
The methods that “manipulate” only succeed on variables in Output.
set_miss(expr)
The numeric argument expr becomes the new missing value, overwriting the old missing value, if any.
The argument given is converted if necessary to the variable’s type.
NB: This only changes the missing value attribute.
Missing values in the original variable remain unchanged, and thus are no longer considered missing values.
They are effectively “orphaned”.
Thus set_miss() is normally used only when creating new variables.
The intrinsic function change_miss() (see below) is typically used to edit values of existing variables.
change_miss(expr)
Sets or changes (any pre-existing) missing value attribute and missing data values to expr. NB: This is an expensive function since all values must be examined. Use this function when changing missing values for pre-existing variables.
get_miss()
Returns the missing value of a variable. If the variable exists in Input and Output then the missing value of the variable in Output is returned. If the variable has no missing value then an error is returned.
delete_miss()
Deletes the missing value associated with a variable.
number_miss()
Counts the number of missing values a variable contains.
has_miss()
Returns 1 (True) if the variable has a missing value associated with it; otherwise returns 0 (False).
missing()
Creates a True/False mask array of where the missing value is set.
It is syntactically equivalent to (var_in == var_in.get_miss()), except that the latter requires deleting the missing value beforehand.
th=three_dmn_var_dbl;
th.change_miss(-1e10d);
/* Set values less than 0 or greater than 50 to missing value */
where(th < 0.0 || th > 50.0) th=th.get_miss();

# Another example:
new[$time,$lat,$lon]=1.0;
new.set_miss(-997.0);

// Extract all elements evenly divisible by 3
where (three_dmn_var_dbl%3 == 0)
  new=three_dmn_var_dbl;
elsewhere
  new=new.get_miss();

// Print missing value and variable summary
mss_val_nbr=three_dmn_var_dbl.number_miss();
print(three_dmn_var_dbl@_FillValue);
print("Number of missing values in three_dmn_var_dbl: ");
print(mss_val_nbr,"%d");
print(three_dmn_var_dbl);

// Find total number of missing values along dims $lat and $lon
mss_ttl=three_dmn_var_dbl.missing().ttl($lat,$lon);
print(mss_ttl); // 0, 0, 0, 8, 0, 0, 0, 1, 0, 2 ;
simple_fill_miss(var)
This function takes a variable and attempts to fill missing values using an average of up to the 4 nearest-neighbour grid points. The method used is iterative (up to 1000 cycles). For very large areas of missing values, results can be unpredictable. The given variable must be at least 2D, and the algorithm assumes that the last two dimensions are lat/lon or y/x.
weighted_fill_miss(var)
weighted_fill_miss() is more sophisticated. Up to 8 nearest neighbours are used to calculate a weighted average. The weighting used is the inverse square of distance. Again the method is iterative (up to 1000 cycles). The area filled is defined by the final two dimensions of the variable. In addition, this function assumes the existence of coordinate variables with the same names as the last two dimensions. If it does not find these coordinate variables it exits gracefully with a warning.
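To make the idea behind simple_fill_miss() concrete, here is a minimal Python sketch of an iterative 4-neighbour fill on a 2-D grid. It is an illustration under stated assumptions, not the NCO implementation: the update order, the stopping rule, and the helper name `simple_fill` are all hypothetical, and only the "up to 4 neighbours" and "up to 1000 cycles" details come from the description above.

```python
def simple_fill(grid, fill_value, max_iter=1000):
    """Fill missing cells with the mean of their valid 4-neighbours."""
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for _ in range(max_iter):
        missing = [(r, c) for r in range(rows) for c in range(cols)
                   if out[r][c] == fill_value]
        if not missing:
            break
        updates = {}
        for r, c in missing:
            nbrs = [out[rr][cc]
                    for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                    if 0 <= rr < rows and 0 <= cc < cols
                    and out[rr][cc] != fill_value]
            if nbrs:
                updates[(r, c)] = sum(nbrs) / len(nbrs)
        if not updates:
            break  # nothing can be filled, e.g. the grid is entirely missing
        for (r, c), value in updates.items():
            out[r][c] = value
    return out


grid = [[1.0, 2.0, 3.0],
        [4.0, -999.0, 6.0],
        [7.0, 8.0, 9.0]]
filled = simple_fill(grid, -999.0)
print(filled[1][1])  # 5.0, the mean of 2, 4, 6 and 8
```

Large missing regions are filled from their edges inward over successive cycles, which is one reason results far from valid data can be unreliable.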
The convention within this document is that methods can be used as
functions.
However, functions are not and cannot be used as methods.
Methods can be daisy-chained and their syntax is cleaner than functions.
Method names are reserved words and CANNOT be used as variable names.
The command ncap2 -f
shows the complete list of methods available
on your build.
n2=sin(theta)
n2=theta.sin()
n2=sin(theta)^2 + cos(theta)^2
n2=theta.sin().pow(2) + theta.cos()^2
This statement chains together methods to convert three_dmn_var_sht to type double, average it, then convert this back to type short:
three_avg=three_dmn_var_sht.double().avg().short();
Aggregate Methods
These methods mirror the averaging types available in ncwa
. The arguments to the methods are the dimensions to average over. Specifying no dimensions is equivalent to specifying all dimensions i.e., averaging over all dimensions. A masking variable and a weighting variable can be manually created and applied as needed.
avg()
Mean value
sqravg()
Square of the mean
avgsqr()
Mean of sum of squares
max()
Maximum value
min()
Minimum value
mabs()
Maximum absolute value
mebs()
Mean absolute value
mibs()
Minimum absolute value
rms()
Root-mean-square (normalize by N)
rmssdn()
Root-mean-square (normalize by N-1)
tabs() or ttlabs()
Sum of absolute values
ttl() or total() or sum()
Sum of values
// Average a variable over time
four_time_avg=four_dmn_rec_var.avg($time);
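The distinctions among these statistics are easy to confuse, so the sketch below spells them out for a 1-D list in Python. The formulas are one reading of the short descriptions above (for example, avgsqr as the mean of the squared values); `aggregates` is a hypothetical helper and ignores missing values, masks, and weights.

```python
import math


def aggregates(x):
    """Plain-Python versions of the aggregate statistics for a 1-D list."""
    n = len(x)
    mean = sum(x) / n
    sum_sq = sum(v * v for v in x)
    return {
        "avg": mean,
        "sqravg": mean ** 2,                    # square of the mean
        "avgsqr": sum_sq / n,                   # mean of the squares
        "rms": math.sqrt(sum_sq / n),           # normalized by N
        "rmssdn": math.sqrt(sum_sq / (n - 1)),  # normalized by N-1
        "mabs": max(abs(v) for v in x),
        "mebs": sum(abs(v) for v in x) / n,
        "mibs": min(abs(v) for v in x),
        "tabs": sum(abs(v) for v in x),
        "ttl": sum(x),
    }


stats = aggregates([-3.0, 1.0, 2.0, 4.0])
print(stats["avg"], stats["avgsqr"], stats["mabs"])  # 1.0 7.5 4.0
```

Note that sqravg and avgsqr differ whenever the data are not all equal: here they are 1.0 and 7.5 respectively.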
Packing Methods
For more information see Packed data and ncpdq (netCDF Permute Dimensions Quickly).
pack() & pack_short()
The default packing algorithm is applied and the variable is packed to NC_SHORT
pack_byte()
Variable is packed to NC_BYTE
pack_short()
Variable is packed to NC_SHORT
pack_int()
Variable is packed to NC_INT
unpack()
The standard unpacking algorithm is applied.
NCO automatically unpacks packed data before arithmetically
modifying it.
After modification NCO stores the unpacked data.
To store it as packed data again, repack it with, e.g., the
pack()
function.
To ensure that temperature
is packed in the output file,
regardless of whether it is packed in the input file, one uses, e.g.,
ncap2 -s 'temperature=pack(temperature-273.15)' in.nc out.nc
All the above pack functions also take the additional two arguments
scale_factor, add_offset
.
Both arguments must be included:
ncap2 -v -O -s 'rec_pck=pack(three_dmn_rec_var,-0.001,40.0);' in.nc foo.nc
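Packing follows the usual netCDF convention that the unpacked value equals packed*scale_factor + add_offset. The Python sketch below shows that round trip. The choice of default parameters in `default_short_params` (range centred on add_offset and spread over 2^16 - 2 values) is an assumption modelled on the Packed data section, and all three helper names are hypothetical.

```python
def pack(values, scale_factor, add_offset):
    """Return integers p such that value ~= p*scale_factor + add_offset."""
    return [round((v - add_offset) / scale_factor) for v in values]


def unpack(packed, scale_factor, add_offset):
    return [p * scale_factor + add_offset for p in packed]


def default_short_params(values):
    # Assumed default: centre the data range on add_offset and spread it
    # over 2**16 - 2 representable NC_SHORT values.
    lo, hi = min(values), max(values)
    return (hi - lo) / (2 ** 16 - 2), (hi + lo) / 2.0


temps = [-12.5, 0.0, 17.25, 31.0]
scale, offset = default_short_params(temps)
packed = pack(temps, scale, offset)
restored = unpack(packed, scale, offset)
print(packed[0], packed[-1])  # extremes map to the ends of the short range
```

Packing is lossy: each restored value can differ from the original by up to half of scale_factor.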
Basic Methods
These methods work with variables and attributes. They have no arguments.
size()
Total number of elements
ndims()
Number of dimensions in variable
type()
Returns the netCDF type (see previous section)
exists()
Returns 1 (True) if the variable or attribute is present in I/O, otherwise returns 0 (False)
getdims()
Returns an NC_STRING attribute of all the dimension names of a variable
Utility Methods
These functions are used to manipulate missing values and RAM variables.
See Missing values ncap2.
set_miss(expr)
Takes one argument, the missing value. Sets or overwrites the existing missing value. The argument given is converted if necessary to the variable type. (NB: pre-existing missing values, if any, are not converted).
change_miss(expr)
Changes the missing value elements of the variable to the new missing value (NB: an expensive function).
get_miss()
Returns the missing value of a variable in Input or Output
delete_miss()
Deletes the missing value associated with a variable.
has_miss()
Returns 1 (True) if the variable has a missing value, otherwise returns 0 (False)
number_miss()
Returns the number of missing values a variable contains
ram_write()
Writes a RAM variable to disk, i.e., converts it to a regular disk-type variable
ram_delete()
Deletes a RAM variable or an attribute
PDQ Methods
See ncpdq (netCDF Permute Dimensions Quickly).
reverse(dim args)
Reverse the dimension ordering of elements in a variable.
permute(dim args)
Re-shape variables by re-ordering the dimensions.
All the dimensions of the variable must be specified in the
arguments.
A limitation of this permute (unlike ncpdq
) is that the
record dimension cannot be re-assigned.
// Swap dimensions about and reorder along lon
lat_2D_rrg_new=lat_2D_rrg.permute($lon,$lat).reverse($lon);
lat_2D_rrg_new=0,90,-30,30,-30,30,-90,0
Type Conversion Methods and Functions
These methods allow ncap2
to convert variables and
attributes to the different netCDF types.
For more details on automatic and manual type conversion see
(see Type Conversion).
netCDF4 types are only available if you have compiled/linked NCO with the netCDF4 library and the Output file is HDF5.
netCDF3/4 Types
byte()
convert to NC_BYTE, a signed 1-byte integer
char()
convert to NC_CHAR, an ISO/ASCII character
short()
convert to NC_SHORT, a signed 2-byte integer
int()
convert to NC_INT, a signed 4-byte integer
float()
convert to NC_FLOAT, a single-precision (4-byte) floating-point number
double()
convert to NC_DOUBLE, a double-precision (8-byte) floating-point number
netCDF4 Types
ubyte()
convert to NC_UBYTE, an unsigned 1-byte integer
ushort()
convert to NC_USHORT, an unsigned 2-byte integer
uint()
convert to NC_UINT, an unsigned 4-byte integer
int64()
convert to NC_INT64, a signed 8-byte integer
uint64()
convert to NC_UINT64, an unsigned 8-byte integer
You can also use the convert() method to do type conversion.
This takes an integer argument.
For convenience, ncap2 defines the netCDF pre-processor tokens as RAM variables.
For example, you may wish to convert a non-floating-point variable to the same type as another variable.
lon_type=lon.type();
if(time.type() != NC_DOUBLE && time.type() != NC_FLOAT)
  time=time.convert(lon_type);
Intrinsic Mathematical Methods
The list of mathematical methods is system dependent.
For the full list see Intrinsic mathematical methods.
All the mathematical methods take a single argument except atan2() and pow(), which take two.
If the operand type is less than float then the result will be of
type float.
Arguments of type double yield results of type double.
Like the other methods, you are free to use the mathematical methods as functions.
n1=pow(2,3.0f)    // n1 type float
n2=atan2(2,3.0)   // n2 type double
n3=1/(three_dmn_var_dbl.cos().pow(2))-tan(three_dmn_var_dbl)^2; // n3 type double
Unlike regular variables, RAM variables are never written to disk.
Hence using RAM variables in place of regular variables (especially
within loops) significantly increases execution speed.
Variables that are frequently accessed within for
or where
clauses provide the greatest opportunities for optimization.
To declare and define a RAM variable simply prefix the variable name
with an asterisk (*
) when the variable is declared/initialized.
To delete RAM variables (and recover their memory) use the
ram_delete()
method.
To write a RAM variable to disk (like a regular variable) use
ram_write()
.
*temp[$time,$lat,$lon]=10.0;  // Cast
*temp_avg=temp.avg($time);    // Regular assign
temp_avg.ram_write();         // Write Variable to output
temp.ram_delete();            // Delete RAM variable

// Create and increment a RAM variable from "one" in Input
*one++;
// Create RAM variables from the variables three and four in Input.
// Multiply three by 10 and add it to four.
*four+=*three*=10; // three=30, four=34
The where()
statement combines the definition and application of
a mask and can lead to succinct code.
The syntax of a where()
statement is:
// Single assign ('elsewhere' is optional)
where(mask)
  var1=expr1;
elsewhere
  var1=expr2;

// Multiple assigns
where(mask){
  var1=expr1;
  var2=expr2;
  ...
}elsewhere{
  var1=expr3;
  var2=expr4;
  var3=expr5;
  ...
}
An assign inside a where() statement differs from a regular ncap2 assign.
The LHS var must already exist in Input or Output.
The RHS expression must evaluate to a scalar or a variable/attribute of the same size as the LHS variable.
Consider the variables float lon_2D_rct(lat,lon);
and
float var_msk(lat,lon);
.
Suppose we wish to multiply by two the elements for which var_msk
equals 1:
where(var_msk == 1) lon_2D_rct=2*lon_2D_rct;
Suppose that we have the variable int RDM(time)
and that we want
to set its values less than 8 or greater than 80 to 0:
where(RDM < 8 || RDM > 80) RDM=0;
To use where
on a variable hyperslab, define and use a temporary
variable, e.g.,
*var_tmp=var2(:,0,:,:);
where (var1 < 0.5) var_tmp=1234;
var2(:,0,:,:)=var_tmp;
ram_delete(var_tmp);
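The element-wise behaviour of where() and elsewhere can be modelled in a few lines. The Python sketch below applies the RDM example (values below 8 or above 80 set to 0) to made-up data; `where` here is a hypothetical stand-in for the ncap2 statement and does not model missing values in the mask.

```python
def where(mask, lhs, rhs_true, rhs_false=None):
    """Element-wise masked assignment; scalars on the RHS are broadcast."""
    def pick(rhs, i):
        return rhs[i] if isinstance(rhs, list) else rhs

    out = list(lhs)
    for i, flag in enumerate(mask):
        if flag:
            out[i] = pick(rhs_true, i)
        elif rhs_false is not None:  # the optional 'elsewhere' branch
            out[i] = pick(rhs_false, i)
    return out


rdm = [1, 9, 36, 84, 81, 8]  # illustrative values, not from in.nc
rdm = where([v < 8 or v > 80 for v in rdm], rdm, 0)
print(rdm)  # [0, 9, 36, 0, 0, 8]
```

Elements for which the mask is false keep their original values unless an elsewhere branch supplies a replacement.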
Consider irregularly gridded data, described using rank 2 coordinates:
double lat(south_north,east_west)
,
double lon(south_north,east_west)
,
double temperature(south_north,east_west)
.
This type of structure is often found in regional weather/climate model
(such as WRF) output, and in satellite swath data.
For this reason we call it “Swath-like Data”, or SLD.
To find the average temperature in a region bounded by
[lat_min,lat_max] and [lon_min,lon_max]:
temperature_msk[$south_north,$east_west]=0.0;
where((lat >= lat_min && lat <= lat_max) && (lon >= lon_min && lon <= lon_max))
  temperature_msk=temperature;
elsewhere
  temperature_msk=temperature@_FillValue;

temp_avg=temperature_msk.avg();
temp_max=temperature.max();
For North American Regional Reanalysis (NARR) data (example dataset) the procedure looks like this:
ncap2 -O -v -S ~/narr.nco ${DATA}/hdf/narr_uwnd.199605.nc ~/foo.nc
where narr.nco is an ncap2
script like this:
/* North American Regional Reanalysis (NARR) Statistics
   NARR stores grids with 2-D latitude and longitude, aka Swath-like Data (SLD)
   Here we work with three variables:
   lat(y,x), lon(y,x), and uwnd(time,level,y,x);
   To study sub-regions of SLD, we use masking techniques:
   1. Define mask as zero times variable to be masked
      Then mask automatically inherits variable attributes
      And average below will inherit mask attributes
   2. Optionally, create mask as RAM variable (as below with asterisk *)
      NCO does not write RAM variable to output
      Masks are often unwanted, and can be big, so this speeds execution
   3. Example could be extended to preserve mean lat and lon of sub-region
      Follow uwnd example to do this: lat_msk=0.0*lat ... lat_avg=lat.avg($y,$x) */
*uwnd_msk=0.0*uwnd;
where((lat >= 35.6 && lat <= 37.0) && (lon >= -100.5 && lon <= -99.0))
  uwnd_msk=uwnd;
elsewhere
  uwnd_msk=uwnd@_FillValue;

// Average only over horizontal dimensions x and y (preserve level and time)
uwnd_avg=uwnd_msk.avg($y,$x);
Stripped of comments and formatting, this example is a three-statement
script executed by a one-line command.
NCO needs only this meagre input to unpack and copy the input
data and attributes, compute the statistics, and then define and write
the output file.
Had the comments not pointed out that the wind variable (uwnd) is four-dimensional and that the latitude/longitude grid variables are both two-dimensional, there would be no way to tell.
This shows how NCO hides from the user the complexity of
analyzing multi-dimensional SLD.
We plan to extend such SLD features to more operators soon.
ncap2
supplies for()
loops and while()
loops.
They are completely unoptimized, so use them only with RAM variables unless you want to thrash your disk to death.
To break out of a loop use the break
command.
To iterate to the next cycle use the continue
command.
// Set elements in variable double temp(time,lat)
// If element < 0 set to 0, if element > 100 set to 100
*sz_idx=$time.size;
*sz_jdx=$lat.size;

for(*idx=0;idx<sz_idx;idx++)
  for(*jdx=0;jdx<sz_jdx;jdx++)
    if(temp(idx,jdx) > 100)
      temp(idx,jdx)=100.0;
    else if(temp(idx,jdx) < 0)
      temp(idx,jdx)=0.0;

// Are values of co-ordinate variable double lat(lat) monotonic?
*sz=$lat.size;

for(*idx=1;idx<sz;idx++)
  if(lat(idx)-lat(idx-1) < 0.0)
    break;

if(idx == sz)
  print("lat co-ordinate is monotonic\n");
else
  print("lat co-ordinate is NOT monotonic\n");

// Sum odd elements
*idx=0;
*sz=$lat_nw.size;
*sum=0.0;

while(idx<sz){
  if(lat(idx)%2) sum+=lat(idx);
  idx++;
}

ram_write(sum);
print("Total of odd elements ");print(sum);print("\n");
The syntax of an include-file is:
#include "script.nco"
#include "/opt/SOURCES/nco/data/tst.nco"
If the filename is relative rather than absolute, then the directory searched is relative to the run-time directory. It is possible to nest include files to an arbitrary depth. A handy use of include files is to store often-used constants. Use RAM variables if you do not want these constants written to the output file.
// script.nco
// Sample file to #include in ncap2 script
*pi=3.1415926535; // RAM variable, not written to output
*h=6.62607095e-34; // RAM variable, not written to output
e=2.71828; // Regular (disk) variable, written to output
As of NCO version 4.6.3 (December, 2016), the user can specify the directories to be searched by listing them in the UNIX environment variable NCO_PATH. The format used is identical to that of the UNIX PATH. The directories are searched only if the include filename is relative.
export NCO_PATH=":/home/henryb/bin/:/usr/local/scripts:/opt/SOURCES/nco/data:"
sort methods
In ncap2 there are multiple ways to sort data. Beginning with NCO 4.1.0 (March, 2012), ncap2 supports six sorting functions:
var_out=sort(var_in,&srt_map);   // Ascending sort
var_out=asort(var_in,&srt_map);  // Ascending sort
var_out=dsort(var_in,&srt_map);  // Descending sort
var_out=remap(var_in,srt_map);   // Apply srt_map to var_in
var_out=unmap(var_in,srt_map);   // Reverse what srt_map did to var_in
dsr_map=invert_map(srt_map);     // Produce "de-sort" map that inverts srt_map
The first two functions, sort()
and asort()
sort, in ascending order, all the elements of var_in (which can be
a variable or attribute) without regard to any dimensions.
The third function, dsort()
does the same but sorts in
descending order.
Remember that ascending and descending sorts are specified by
asort()
and dsort()
, respectively.
These three functions are overloaded to take a second, optional argument called the sort map srt_map, which should be supplied as a call-by-reference variable, i.e., preceded with an ampersand. If the sort map does not yet exist, then it will be created and returned as an integer type the same shape as the input variable.
The output var_out of each sort function is a sorted version of
the input, var_in.
The output var_out of the two mapping functions is the result of applying (with remap()) or un-applying (with unmap()) the sort map srt_map to the input var_in.
To apply the sort map with remap()
the size of the variable
must be exactly divisible by the size of the sort map.
The final function invert_map()
returns the so-called
de-sorting map dsr_map which is the inverse of the input map
srt_map.
This gives the user access to both the forward and inverse sorting maps:
a1[$time]={10,2,3,4,6,5,7,3,4,1};
a1_sort=sort(a1);
print(a1_sort);
// 1, 2, 3, 3, 4, 4, 5, 6, 7, 10;

a2[$lon]={2,1,4,3};
a2_sort=sort(a2,&a2_map);
print(a2_sort);
// 1, 2, 3, 4
print(a2_map);
// 1, 0, 3, 2;
If the map variable does not exist prior to the sort()
call,
then it will be created with the same shape as the input variable and be
of type NC_INT
.
If the map variable already exists, then the only restriction is that it
be of at least the same size as the input variable.
To apply a map use remap(var_in,srt_map)
.
defdim("nlat",5);

a3[$lon]={2,5,3,7};
a4[$nlat,$lon]={
 1, 2, 3, 4,
 5, 6, 7, 8,
 9,10,11,12,
 13,14,15,16,
 17,18,19,20};

a3_sort=sort(a3,&a3_map);
print(a3_map);
// 0, 2, 1, 3;

a4_sort=remap(a4,a3_map);
print(a4_sort);
// 1, 3, 2, 4,
// 5, 7, 6, 8,
// 9,11,10,12,
// 13,15,14,16,
// 17,19,18,20;

a3_map2[$nlat]={4,3,0,2,1};

a4_sort2=remap(a4,a3_map2);
print(a4_sort2);
// 3, 5, 4, 2, 1
// 8, 10, 9,7, 6,
// 13,15,14,12,11,
// 18,20,19,17,16
As in the above example you may create your own sort map.
To sort in descending order, apply the reverse()
method after the
sort()
.
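The a4 and a3_map2 example above is consistent with remap() treating the variable as consecutive blocks the size of the map and moving element i of each block to position map[i]. The Python sketch below encodes that reading; it is an inference from the printed output rather than a statement about the NCO source, and `remap`, `unmap`, and `invert_map` are hypothetical re-implementations.

```python
def remap(values, srt_map):
    """Scatter each block of len(srt_map) elements: out[map[i]] = in[i]."""
    size = len(srt_map)
    if len(values) % size:
        raise ValueError("variable size must be divisible by map size")
    out = [None] * len(values)
    for base in range(0, len(values), size):
        for i, dst in enumerate(srt_map):
            out[base + dst] = values[base + i]
    return out


def invert_map(srt_map):
    """Return the map that undoes srt_map under the convention above."""
    inverse = [0] * len(srt_map)
    for i, dst in enumerate(srt_map):
        inverse[dst] = i
    return inverse


def unmap(values, srt_map):
    return remap(values, invert_map(srt_map))


a4 = list(range(1, 21))
a4_sort2 = remap(a4, [4, 3, 0, 2, 1])
print(a4_sort2[:10])  # [3, 5, 4, 2, 1, 8, 10, 9, 7, 6]
```

Under this convention, applying unmap() with the same map restores the original ordering.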
Here is an extended example of how to use ncap2
features to
hyperslab an irregular region based on the values of a variable not a
coordinate.
The distinction is crucial: hyperslabbing based on dimensional indices
or coordinate values is straightforward.
Using the values of a single- or multi-dimensional variable to define a hyperslab is quite different.
cat > ~/ncap2_foo.nco << 'EOF'
// Purpose: Save irregular 1-D regions based on variable values

// Included in NCO User Guide at http://nco.sf.net/nco.html#sort

/* NB: Single quotes around EOF above turn off shell parameter
   expansion in "here documents". This in turn prevents the
   need for protecting dollarsign characters in NCO scripts with
   backslashes when the script is cut-and-pasted (aka "moused")
   from an editor or e-mail into a shell console window */

/* Copy coordinates and variable(s) of interest into RAM variable(s)
   Benefits:
   1. ncap2 writes all variables on LHS of expression to disk
      Only exception is RAM variables, which are stored in RAM only
      Repeated operations on regular variables takes more time,
      because changes are written to disk copy after every change.
      RAM variables are only changed in RAM so script works faster
      RAM variables can be written to disk at end with ram_write()
   2. Script permutes variables of interest during processing
      Safer to work with copies that have different names
      This discourages accidental, mistaken use of permuted versions
   3. Makes this script a more generic template:
      var_in instead of specific variable names everywhere */
*var_in=one_dmn_rec_var;
*crd_in=time;
*dmn_in_sz=$time.size; // [nbr] Size of input arrays

/* Create all other "intermediate" variables as RAM variables
   to prevent them from cluttering the output file.
   Mask flag and sort map are same size as variable of interest */
*msk_flg=var_in;
*srt_map=var_in;

/* In this example we mask for all values evenly divisible by 3
   This is the key, problem-specific portion of the template
   Replace this where() condition by that for your problem
   Mask variable is Boolean: 1=Meets condition, 0=Fails condition */
where(var_in % 3 == 0) msk_flg=1; elsewhere msk_flg=0;

// print("msk_flg = ");print(msk_flg); // For debugging...

/* The sort() routine is overloaded, and takes one or two arguments
   The second argument (optional) is the "sort map" (srt_map below)
   Pass the sort map by reference, i.e., prefix with an ampersand
   If the sort map does not yet exist, then it will be created and
   returned as an integer type the same shape as the input variable.
   The output of sort(), on the LHS, is a sorted version of the input
   msk_flg is not needed in its original order after sort()
   Hence we use msk_flg as both input to and output from sort()
   Doing this prevents the need to define a new, unneeded variable */
msk_flg=sort(msk_flg,&srt_map);

// Count number of valid points in mask by summing the one's
*msk_nbr=msk_flg.total();

// Define output dimension equal in size to number of valid points
defdim("crd_out",msk_nbr);

/* Now sort the variable of interest using the sort map and remap()
   The output, on the LHS, is the input re-arranged so that all points
   meeting the mask condition are contiguous at the end of the array
   Use same srt_map to hyperslab multiple variables of the same shape
   Remember to apply srt_map to the coordinate variables */
crd_in=remap(crd_in,srt_map);
var_in=remap(var_in,srt_map);

/* Hyperslab last msk_nbr values of variable(s) of interest */
crd_out[crd_out]=crd_in((dmn_in_sz-msk_nbr):(dmn_in_sz-1));
var_out[crd_out]=var_in((dmn_in_sz-msk_nbr):(dmn_in_sz-1));

/* NB: Even though we created all variables possible as RAM variables,
   the original coordinate of interest, time, is written to the output.
   I'm not exactly sure why. For now, delete it from the output with:
   ncks -O -x -v time ~/foo.nc ~/foo.nc
   */
EOF
ncap2 -O -v -S ~/ncap2_foo.nco ~/nco/data/in.nc ~/foo.nc
ncks -O -x -v time ~/foo.nc ~/foo.nc
ncks ~/foo.nc
Here is an extended example of how to use ncap2
features to
sort multi-dimensional arrays based on the coordinate values along a
single dimension.
cat > ~/ncap2_foo.nco << 'EOF'
/* Purpose: Sort multi-dimensional array based on coordinate values
   This example sorts the variable three_dmn_rec_var(time,lat,lon)
   based on the values of the time coordinate. */

// Included in NCO User Guide at http://nco.sf.net/nco.html#sort

// Randomize the time coordinate
time=10.0*gsl_rng_uniform(time);
//print("original randomized time = \n");print(time);

/* The sort() routine is overloaded, and takes one or two arguments
   The first argument is a one dimensional array
   The second argument (optional) is the "sort map" (srt_map below)
   Pass the sort map by reference, i.e., prefix with an ampersand
   If the sort map does not yet exist, then it will be created and
   returned as an integer type the same shape as the input variable.
   The output of sort(), on the LHS, is a sorted version of the input */

time=sort(time,&srt_map);
//print("sorted time (ascending order) and associated sort map =\n");print(time);print(srt_map);

/* sort() always sorts in ascending order
   The associated sort map therefore re-arranges the original,
   randomized time array into ascending order.
   There are two methods to obtain the descending order the user wants
   1) We could solve the problem in ascending order (the default)
      and then apply the reverse() method to re-arrange the results.
   2) We could change the sort map to return things in descending
      order of time and solve the problem directly in descending order. */

// Following shows how to do method one:

/* Expand the sort map to srt_map_3d, the size of the data array
   1. Use data array to provide right shape for the expanded sort map
   2. Coerce data array into an integer so srt_map_3d is an integer
   3. Multiply data array by zero so 3-d map elements are all zero
   4. Add the 1-d sort map to the 3-d sort map (NCO automatically resizes)
   5. Add the spatial (lat,lon) offsets to each time index
   6. de-sort using the srt_map_3d
   7. Use reverse to obtain descending in time order
   Loops could accomplish the same thing (exercise left for reader)
   However, loops are slow for large datasets */

/* Following index manipulation requires understanding correspondence
   between 1-d (unrolled, memory order of storage) and access into that
   memory as a multidimensional (3-d, in this case) rectangular array.
   Key idea to understand is how dimensionality affects offsets */
// Copy 1-d sort map into 3-d sort map
srt_map_3d=(0*int(three_dmn_rec_var))+srt_map;
// Multiply base offset by product of sizes of lesser dimensions
srt_map_3d*=$lat.size*$lon.size;
lon_idx=array(0,1,$lon);
lat_idx=array(0,1,$lat)*$lon.size;
lat_lon_idx[$lat,$lon]=lat_idx+lon_idx;
srt_map_3d+=lat_lon_idx;

print("sort map 3d =\n");print(srt_map_3d);

// Use remap() to re-map the data
three_dmn_rec_var=remap(three_dmn_rec_var,srt_map_3d);

// Finally, reverse data so time coordinate is descending
time=time.reverse($time);
//print("sorted time (descending order) =\n");print(time);
three_dmn_rec_var=three_dmn_rec_var.reverse($time);

// Method two: Key difference is srt_map=$time.size-srt_map-1;
EOF
ncap2 -O -v -S ~/ncap2_foo.nco ~/nco/data/in.nc ~/foo.nc
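The index arithmetic in the script (time index times lat size times lon size, plus lat index times lon size, plus lon index) is the standard row-major offset for a (time,lat,lon) array. The Python sketch below builds the same expanded map with explicit loops; `expand_sort_map` is a hypothetical helper and the sizes are chosen only for illustration.

```python
def expand_sort_map(srt_map, nlat, nlon):
    """Row-major offsets for data(time,lat,lon) reordered in time by srt_map."""
    return [t * nlat * nlon + y * nlon + x
            for t in srt_map
            for y in range(nlat)
            for x in range(nlon)]


# Three time steps on a 2x3 grid; the 1-D sort map is (2, 0, 1).
srt_map_3d = expand_sort_map([2, 0, 1], nlat=2, nlon=3)
print(srt_map_3d[:6])  # [12, 13, 14, 15, 16, 17]
```

Every element within one time slice shares the same base offset, so the spatial layout within each slice is preserved while whole slices are reordered.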
As of NCO version 4.6.3 (December, 2016), ncap2 includes support for
UDUnits conversions.
The function is called udunits.
Its syntax is
varOut=udunits(varIn,"UnitsOutString")
The udunits() function looks for the attribute varIn@units and fails
if it is not found.
A quirk of this function is that, due to attribute propagation,
varOut@units will be overwritten by varIn@units.
It is best to re-initialize this attribute AFTER the call.
In addition, if varIn@units is of the form
"time_interval since basetime" then the calendar attribute
varIn@calendar is also read.
If it does not exist then the calendar used defaults to mixed
Gregorian/Julian as defined by UDUnits.
If varIn is not a floating-point type then it is promoted to NC_DOUBLE
for the system call in the UDUnits library, and then demoted back to
its original type afterwards.
T[lon]={0.0,100.0,150.0,200.0};
T@units="Celsius";
// Overwrite variable
T=udunits(T,"kelvin");
print(T);
// 273.15, 373.15, 423.15, 473.15 ;
T@units="kelvin";

// Rebase coordinate days to hours
timeOld=time;
print(timeOld);
// 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 ;
timeOld@units="days since 2012-01-30";
@units="hours since 2012-02-01 01:00";
timeNew=udunits(timeOld, @units);
timeNew@units=@units;
print(timeNew);
// -25, -1, 23, 47, 71, 95, 119, 143, 167, 191 ;

tOld=time;
// NB: Calendar=365_day has NO Leap year
tOld@calendar="365_day";
tOld@units="minutes since 2012-02-28 23:58:00.00";
@units="seconds since 2012-03-01 00:00";
tNew=udunits(tOld, @units);
tNew@units=@units;
print(tNew);
// -60, 0, 60, 120, 180, 240, 300, 360, 420, 480
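The arithmetic behind the "days since" to "hours since" rebasing can be reproduced independently. The following Python sketch is not how UDUnits is implemented; it only assumes that rebasing amounts to converting each value to an absolute instant and re-expressing it relative to the new base time in the new unit. The helper name rebase is hypothetical, and the standard (proleptic Gregorian) datetime calendar is used, which is adequate here because the dates do not cross a calendar irregularity.

```python
from datetime import datetime

def rebase(values, old_unit_s, old_base, new_unit_s, new_base):
    """Re-express 'values [old unit] since old_base' as '[new unit] since new_base'."""
    shift_s = (old_base - new_base).total_seconds()
    return [(v * old_unit_s + shift_s) / new_unit_s for v in values]

time_old = [float(v) for v in range(1, 11)]            # 1, 2, ..., 10
time_new = rebase(time_old,
                  86400.0, datetime(2012, 1, 30),      # days since 2012-01-30
                  3600.0, datetime(2012, 2, 1, 1, 0))  # hours since 2012-02-01 01:00
print(time_new)
```

The result matches the timeNew values listed in the ncap2 example above: the base times differ by 49 hours, so each output value is 24*v-49.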
strftime()
The var_str=strftime(var_time,fmt_sng) method takes a time-based variable and a format string and returns an NC_STRING variable (of the same shape as var_time) of time-stamps in the form specified by fmt_sng. To run this command the output type must be netCDF4.
ncap2 -4 -v -O -s 'time_str=strftime(time,"%Y-%m-%d");' in.nc foo.nc

time_str="1964-03-13", "1964-03-14", "1964-03-15", "1964-03-16", "1964-03-17",
         "1964-03-18", "1964-03-19", "1964-03-20", "1964-03-21", "1964-03-22" ;
Under the hood there are a few steps involved:
First the method reads var_time@units and var_time@calendar (if
present), then converts var_time to seconds since 1970-01-01.
It then converts these possibly UTC seconds to the standard structure
struct tm.
Finally strftime() is called with fmt_sng and a pointer to the
struct tm.
The C-standard strftime() is used as defined in time.h.
If the method is called without fmt_sng then the following default is
used: "%Y-%m-%d %H:%M:%S".
The method regular takes a single var argument and uses the
above default string.
ncap2 -4 -v -O -s 'time_str=regular(time);' in.nc foo.nc

time_str = "1964-03-13 21:09:00", "1964-03-14 21:09:00", "1964-03-15 21:09:00",
           "1964-03-16 21:09:00", "1964-03-17 21:09:00", "1964-03-18 21:09:00",
           "1964-03-19 21:09:00", "1964-03-20 21:09:00", "1964-03-21 21:09:00",
           "1964-03-22 21:09:00" ;
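The three steps described above (units to epoch seconds, epoch seconds to a broken-down time, broken-down time to text) can be imitated in a few lines. This Python sketch is only an analogy for the described pipeline, not the ncap2 implementation. The base time 1964-03-12 21:09:00 UTC and the unit of days are assumptions chosen so that the output resembles the listing above; the function names are hypothetical.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
DEFAULT_FMT = "%Y-%m-%d %H:%M:%S"  # default format quoted in the text

def to_epoch_seconds(value, unit_seconds, base):
    """Step 1: 'value [unit] since base' -> seconds since 1970-01-01."""
    return value * unit_seconds + (base - EPOCH).total_seconds()

def format_times(values, unit_seconds, base, fmt=DEFAULT_FMT):
    stamps = []
    for value in values:
        seconds = to_epoch_seconds(value, unit_seconds, base)
        broken_down = EPOCH + timedelta(seconds=seconds)  # Step 2: calendar fields
        stamps.append(broken_down.strftime(fmt))          # Step 3: format as text
    return stamps

base = datetime(1964, 3, 12, 21, 9, 0)  # assumed base time (UTC)
values = [1.0, 2.0, 3.0]                # assumed unit: days

print(format_times(values, 86400.0, base))              # default format
print(format_times(values, 86400.0, base, "%Y-%m-%d"))  # date only
```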
Another working example:
ncap2 -v -O -s 'ts=strftime(frametime(0),"%Y-%m-%d/envlog_netcdf_L1_ua-mac_%Y-%m-%d.nc");' in.nc out.nc

ts="2017-08-11/envlog_netcdf_L1_ua-mac_2017-08-11.nc"
A variable-pointer or vpointer is a pointer to a variable or attribute.
It is most useful when one needs to apply a set of operations on a list
of variables.
For example, after regular processing one may wish to set the
_FillValue of all NC_FLOAT variables to a particular value, or to
create min/max attributes for all 3D variables of type NC_DOUBLE.
A vpointer is not a ’pointer’ to a memory location in the C/C++ sense.
Rather the vpointer is a text attribute that contains the name of a
variable.
To use the pointer simply prefix the pointer with *.
Then, most places where you use VAR_ID you can use *vpointer_nm.
There are a variety of ways to maintain a list of strings in ncap2.
The easiest method is to use an NC_STRING attribute.
Below is a simple illustration that uses a vpointer of type NC_CHAR.
Remember an attribute starting with @ implies ’global’, e.g.,
@vpx is short for global@vpx.
idx=9;
idy=20;
t2=time;

global@vpx="idx";

// Increment idx by one
*global@vpx++;
print(idx);

// Multiply by 5
*@vpx*=5; // idx now 50
print(idx);

// Add 200 (long method)
*@vpx=*@vpx+200; // idx now 250
print(idx);

@vpy="idy";

// Add idx idy to get idz
idz=*@vpx+*@vpy; // idz == 270
print(idz);

// We can also reference variables in the input file
// Can use an existing attribute pointer since attributes are not written
// to the netCDF file until after the script has finished.
@vpx="three_dmn_var";

// We can convert this variable to type NC_DOUBLE and
// write it to output all at once
*@vpx=*@vpx.double();
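Because a vpointer is a text attribute holding a variable name, dereferencing it is a name lookup rather than a memory access. The Python sketch below is only an analogy, not ncap2 syntax: the dictionaries variables and attributes are hypothetical stand-ins for the ncap2 variable and global-attribute tables, and the helper deref plays the role of the * prefix.

```python
variables = {"idx": 9, "idy": 20}   # stand-in for ncap2 variables
attributes = {"vpx": "idx"}         # stand-in for global@vpx="idx"

def deref(att_name):
    """Return the variable name stored in an attribute (the '*' step)."""
    return attributes[att_name]

variables[deref("vpx")] += 1        # *global@vpx++      -> idx == 10
variables[deref("vpx")] *= 5        # *@vpx*=5           -> idx == 50
variables[deref("vpx")] += 200      # *@vpx=*@vpx+200    -> idx == 250

attributes["vpy"] = "idy"           # @vpy="idy"
variables["idz"] = variables[deref("vpx")] + variables[deref("vpy")]

print(variables["idx"], variables["idz"])
```

Re-pointing the attribute (attributes["vpx"] = "some_other_name") changes which variable later operations touch, which is the property the ncap2 loops over variable names rely on.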
The following script writes to the output file all variables that are
of type NC_DOUBLE and that have at least two dimensions.
It then changes their _FillValue to 1.0E-9.
The function get_vars_in() creates an NC_STRING attribute
that contains all of the variable names in the input file.
Note that a vpointer must be a plain attribute, NOT an attribute
expression.
Thus in the below script using *@all(idx) would be a fundamental
mistake.
In the below example the vpointer var_nm is of type NC_STRING.
@all=get_vars_in();

*sz=@all.size();
*idx=0;
for(idx=0;idx<sz;idx++){
  // @var_nm is of type NC_STRING
  @var_nm=@all(idx);

  if(*@var_nm.type() == NC_DOUBLE && *@var_nm.ndims() >= 2){
    *@var_nm=*@var_nm;
    *@var_nm.change_miss(1e-9d);
  }
}
The following script writes to the output file all 3D/4D
variables of type NC_FLOAT.
Then for each variable it calculates a range attribute that
contains the maximum and minimum values, and a total attribute
that is the sum of all the elements.
In this example vpointers are used to ’point’ to attributes.
@all=get_vars_in();
*sz=@all.size();

for(*idx=0;idx<sz;idx++){
  @var_nm=@all(idx);
  if(*@var_nm.ndims() >= 3){
    *@var_nm=*@var_nm.float();
    // The push function also takes a call-by-ref attribute: if it does not already exist then it will be created
    // The call below pushes a NC_STRING to an att so the end result is a list of NC_STRINGS
    push(&@prc,@var_nm);
  }
}

*sz=@prc.size();
for(*idx=0;idx<sz;idx++){
  @var_nm=@prc(idx);

  // We can work with attribute pointers as well
  // sprint() output is of type NC_CHAR
  @att_total=sprint(@var_nm,"%s@total");
  @att_range=sprint(@var_nm,"%s@range");

  // If you are still confused then print out the attributes
  print(@att_total);
  print(@att_range);

  *@att_total=*@var_nm.total();
  *@att_range={min(*@var_nm),max(*@var_nm)};
}
This is the CDL dump of a variable processed by the above script:
float three_dmn_var_int(time, lat, lon) ;
  three_dmn_var_int:_FillValue = -99.f ;
  three_dmn_var_int:long_name = "three dimensional record variable of type int" ;
  three_dmn_var_int:range = 1.f, 80.f ;
  three_dmn_var_int:total = 2701.f ;
  three_dmn_var_int:units = "watt meter-2" ;
NCO is capable of analyzing datasets for many different underlying coordinate grid types. netCDF was developed for and initially used with grids comprised of orthogonal dimensions forming a rectangular coordinate system. We call such grids standard grids. It is increasingly common for datasets to use metadata to describe much more complex grids. Let us first define three important coordinate grid properties: regularity, rectangularity, and structure.
Grids are regular if the spacing between adjacent points is constant. For example, a 4-by-5 degree latitude-longitude grid is regular because the spacings between adjacent latitudes (4 degrees) are constant as are the (5 degrees) spacings between adjacent longitudes. Spacing in irregular grids depends on the location along the coordinate. Grids such as Gaussian grids have uneven spacing in latitude (points cluster near the equator) and so are irregular.
Grids are rectangular if the number of elements in any dimension is not a function of any other dimension. For example, a T42 Gaussian latitude-longitude grid is rectangular because there are the same number of longitudes (128) for each of the (64) latitudes. Grids are non-rectangular if the elements in any dimension depend on another dimension. Non-rectangular grids present many special challenges to analysis software like NCO.
Grids are structured if they are represented as functions of two horizontal spatial dimensions. For example, grids with latitude and longitude dimensions are structured, and so are curvilinear grids with along-track and cross-track dimensions. A grid with a single dimension is unstructured. For example, icosahedral grids are usually unstructured, as are MPAS grids.
Wrapped coordinates (see Wrapped Coordinates), such as longitude, are independent of these grid properties (regularity, rectangularity, structure).
The preferred NCO technique to analyze data on non-standard
coordinate grids is to create a region mask with ncap2, and
then to use the mask within ncap2 for variable-specific
processing, and/or with other operators (e.g., ncwa, ncdiff)
for entire file processing.
Before describing the construction of masks, let us review how irregularly gridded geoscience data are described. Say that latitude and longitude are stored as R-dimensional arrays and the product of the dimension sizes is the total number of elements N in the other variables. Geoscience applications tend to use R=1, R=2, and R=3.
If the grid has no simple representation (e.g., discontinuous) then it makes sense to store all coordinates as 1D arrays with the same size as the number of grid points. These gridpoints can be completely independent of all the others (own weight, area, etc.).
R=1: lat(number_of_gridpoints) and lon(number_of_gridpoints)
If the horizontal grid is time-invariant then R=2 is common:
R=2: lat(south_north,east_west) and lon(south_north,east_west)
The Weather Research and Forecasting (WRF) model uses R=3:
R=3: lat(time,south_north,east_west), lon(time,south_north,east_west)
and so supports grids that change with time.
Grids with R > 1 often use missing values to indicate empty points. For example, so-called “staggered grids” will use fewer east_west points near the poles and more near the equator. netCDF only accepts rectangular arrays so space must be allocated for the maximum number of east_west points at all latitudes. Then the application writes missing values into the unused points near the poles.
We demonstrate the ncap2 analysis technique for irregular
regions by constructing a mask for an R=2 grid.
We wish to find, say, the mean temperature within
[lat_min,lat_max] and [lon_min,lon_max]:
ncap2 -s 'mask_var= (lat >= lat_min && lat <= lat_max) && \
                    (lon >= lon_min && lon <= lon_max);' in.nc out.nc
Arbitrarily shaped regions can be defined by more complex conditional statements. Once defined, masks can be applied to specific variables, and to entire files:
ncap2 -s 'temperature_avg=(temperature*mask_var).avg()' in.nc out.nc
ncwa -a lat,lon -m mask_var -w area in.nc out.nc
Crafting such commands on the command line is possible though unwieldy. In such cases, a script is often cleaner and allows you to document the procedure:
cat > ncap2.in << 'EOF'
mask_var = (lat >= lat_min && lat <= lat_max) && (lon >= lon_min && lon <= lon_max);
if(mask_var.total() > 0){ // Check that mask contains some valid values
  temperature_avg=(temperature*mask_var).avg(); // Average temperature
  temperature_max=(temperature*mask_var).max(); // Maximum temperature
}
EOF
ncap2 -S ncap2.in in.nc out.nc
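The mask logic is easy to verify on a small R=2 grid. The Python sketch below uses made-up 3x3 lat, lon, temperature, and area arrays (all values and names are hypothetical) and is not NCO code. It builds the same 0/1 mask and then computes a masked mean and an area-weighted masked mean. Note the normalization: in plain arithmetic, multiplying by a 0/1 mask and then dividing by the total number of grid points would count the masked-out zeros, so this sketch divides by the sum of the mask (or of mask times weight) instead.

```python
lat = [[10.0] * 3, [20.0] * 3, [30.0] * 3]   # lat(south_north,east_west)
lon = [[100.0, 110.0, 120.0]] * 3            # lon(south_north,east_west)
temperature = [[280.0, 281.0, 282.0],
               [290.0, 291.0, 292.0],
               [300.0, 301.0, 302.0]]
area = [[3.0] * 3, [2.0] * 3, [1.0] * 3]     # hypothetical cell weights

lat_min, lat_max = 15.0, 35.0
lon_min, lon_max = 105.0, 125.0

n_y, n_x = len(lat), len(lat[0])
cells = [(j, i) for j in range(n_y) for i in range(n_x)]

# 1 inside the box, 0 outside
mask = {(j, i): int(lat_min <= lat[j][i] <= lat_max and
                    lon_min <= lon[j][i] <= lon_max) for j, i in cells}

if sum(mask.values()) > 0:  # check that the mask selects some points
    masked_mean = (sum(temperature[j][i] * mask[j, i] for j, i in cells)
                   / sum(mask.values()))
    weighted_mean = (sum(temperature[j][i] * mask[j, i] * area[j][i] for j, i in cells)
                     / sum(mask[j, i] * area[j][i] for j, i in cells))
    masked_max = max(temperature[j][i] for j, i in cells if mask[j, i])
    print(masked_mean, weighted_mean, masked_max)
```

Arbitrarily shaped regions only change how mask is built; the normalization step stays the same.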
Grids like those produced by the WRF model are complex because
one must use global metadata to determine the grid staggering and
offsets to translate XLAT and XLONG into real latitudes,
longitudes, and missing points.
The WRF grid documentation should describe this.
For WRF files creating regional masks looks, in general, like
mask_var = (XLAT >= lat_min && XLAT <= lat_max) && (XLONG >= lon_min && XLONG <= lon_max);
A few notes: Irregular regions are the union of arrays of lat/lon min/max’s. The mask procedure is identical for all R.
As of version 4.0.0 NCO has internal routines to perform bilinear interpolation on gridded data sets. In mathematics, bilinear interpolation is an extension of linear interpolation for interpolating functions of two variables on a regular grid. The idea is to perform linear interpolation first in one direction, and then again in the other direction.
Suppose we have an irregular grid of data temperature[lat,lon],
with coordinate variables lat[lat], lon[lon].
We wish to find the temperature at an arbitrary point [X,Y]
within the grid.
If we can locate lat_min,lat_max and lon_min,lon_max such that
lat_min <= X <= lat_max and lon_min <= Y <= lon_max
then we can interpolate in two dimensions the temperature at
[X,Y].
The general form of the ncap2 interpolation function is
var_out=bilinear_interp(grid_in,grid_out,grid_out_x,grid_out_y,grid_in_x,grid_in_y)
where
grid_in
Input function data.
Usually a two dimensional variable.
It must be of size grid_in_x.size()*grid_in_y.size().
grid_out
This variable is the shape of var_out.
Usually a two dimensional variable.
It must be of size grid_out_x.size()*grid_out_y.size().
grid_out_x
X output values.
grid_out_y
Y output values.
grid_in_x
X input values. Must be monotonic (increasing or decreasing).
grid_in_y
Y input values. Must be monotonic (increasing or decreasing).
Prior to calculations all arguments are converted to type NC_DOUBLE.
After calculations var_out is converted to the input type of grid_in.
Suppose the first part of an ncap2 script is
defdim("X",4);
defdim("Y",5);

// Temperature
T_in[$X,$Y]=
 {100, 200, 300, 400, 500,
  101, 202, 303, 404, 505,
  102, 204, 306, 408, 510,
  103, 206, 309, 412, 515.0 };

// Coordinate variables
x_in[$X]={0.0,1.0,2.0,3.01};
y_in[$Y]={1.0,2.0,3.0,4.0,5};
Now we interpolate with the following variables:
defdim("Xn",3);
defdim("Yn",4);
T_out[$Xn,$Yn]=0.0;
x_out[$Xn]={0.0,0.02,3.01};
y_out[$Yn]={1.1,2.0,3,4};

var_out=bilinear_interp(T_in,T_out,x_out,y_out,x_in,y_in);
print(var_out);
// 110, 200, 300, 400,
// 110.022, 200.04, 300.06, 400.08,
// 113.3, 206, 309, 412 ;
It is possible to interpolate a single point:
var_out=bilinear_interp(T_in,0.0,3.0,4.99,x_in,y_in);
print(var_out);
// 513.920594059406
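The numbers in the two examples above can be reproduced with a textbook bilinear interpolation: interpolate along Y in the two bracketing rows, then along X between the two results. The Python sketch below is an independent illustration, not the NCO source. It assumes ascending coordinates only (ncap2 also accepts decreasing ones), and the function names bracket and bilinear are hypothetical.

```python
from bisect import bisect_right

def bracket(coord, value):
    """Index i with coord[i] <= value <= coord[i+1], for ascending coord."""
    if not coord[0] <= value <= coord[-1]:
        raise ValueError("point lies outside the input grid")
    return min(bisect_right(coord, value) - 1, len(coord) - 2)

def bilinear(grid, x_in, y_in, x, y):
    i, j = bracket(x_in, x), bracket(y_in, y)
    tx = (x - x_in[i]) / (x_in[i + 1] - x_in[i])  # fractional position in X
    ty = (y - y_in[j]) / (y_in[j + 1] - y_in[j])  # fractional position in Y
    lower = grid[i][j] + ty * (grid[i][j + 1] - grid[i][j])              # along Y, row i
    upper = grid[i + 1][j] + ty * (grid[i + 1][j + 1] - grid[i + 1][j])  # along Y, row i+1
    return lower + tx * (upper - lower)                                  # then along X

T_in = [[100, 200, 300, 400, 500],
        [101, 202, 303, 404, 505],
        [102, 204, 306, 408, 510],
        [103, 206, 309, 412, 515.0]]
x_in = [0.0, 1.0, 2.0, 3.01]
y_in = [1.0, 2.0, 3.0, 4.0, 5.0]

x_out = [0.0, 0.02, 3.01]
y_out = [1.1, 2.0, 3.0, 4.0]
var_out = [[bilinear(T_in, x_in, y_in, x, y) for y in y_out] for x in x_out]
single = bilinear(T_in, x_in, y_in, 3.0, 4.99)

for row in var_out:
    print(row)
print(single)
```

Up to floating-point rounding, the printed grid agrees with the 3-by-4 result shown above and the single point agrees with 513.920594059406.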
Wrapping and Extrapolation
The function bilinear_interp_wrap() takes the same arguments as
bilinear_interp() but performs wrapping (Y) and extrapolation (X) for
points off the edge of the grid.
If the given range of longitude is, say, (25-335) and we have a point at
20 degrees, then the endpoints of the range are used for the
interpolation.
This is what wrapping means.
For wrapping to occur Y must be longitude and must be in the range
(0,360) or (-180,180).
There are no restrictions on the latitude (X) values, though
typically these are in the range (-90,90).
This ncap2 script illustrates both wrapping and extrapolation
of end points:
defdim("lat_in",6);
defdim("lon_in",5);

// Coordinate input vars
lat_in[$lat_in]={-80,-40,0,30,60.0,85.0};
lon_in[$lon_in]={30, 110, 190, 270, 350.0};

T_in[$lat_in,$lon_in]=
  {10,40,50,30,15,
   12,43,52,31,16,
   14,46,54,32,17,
   16,49,56,33,18,
   18,52,58,34,19,
   20,55,60,35,20.0 };

defdim("lat_out",4);
defdim("lon_out",3);

// Coordinate variables
lat_out[$lat_out]={-90,0,70,88.0};
lon_out[$lon_out]={0,190,355.0};

T_out[$lat_out,$lon_out]=0.0;

T_out=bilinear_interp_wrap(T_in,T_out,lat_out,lon_out,lat_in,lon_in);
print(T_out);
// 13.4375, 49.5, 14.09375,
// 16.25, 54, 16.625,
// 19.25, 58.8, 19.325,
// 20.15, 60.24, 20.135 ;
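The table printed by the script above is consistent with two simple rules: in longitude, the last and first input columns are treated as neighbours across the 360-degree seam, and in latitude, points beyond the grid are linearly extrapolated from the nearest pair of rows. The Python sketch below encodes those two rules as an assumption about what the function does; it is not the NCO implementation, it handles only ascending coordinates in the (0,360) convention, and the names clamped_interval and interp_wrap are hypothetical.

```python
from bisect import bisect_right

def clamped_interval(coord, value):
    """Interval index for value, clamped to the first or last interval."""
    return max(0, min(bisect_right(coord, value) - 1, len(coord) - 2))

def interp_wrap(grid, lat_in, lon_in, lat, lon, period=360.0):
    # Latitude: fraction may fall outside [0,1], which gives extrapolation
    i = clamped_interval(lat_in, lat)
    t_lat = (lat - lat_in[i]) / (lat_in[i + 1] - lat_in[i])

    # Longitude: append the first column again, one period later
    lon_ext = list(lon_in) + [lon_in[0] + period]
    if lon < lon_in[0]:
        lon += period
    j = clamped_interval(lon_ext, lon)
    t_lon = (lon - lon_ext[j]) / (lon_ext[j + 1] - lon_ext[j])
    j_next = (j + 1) % len(lon_in)  # wraps back to column 0 at the seam

    south = grid[i][j] + t_lon * (grid[i][j_next] - grid[i][j])
    north = grid[i + 1][j] + t_lon * (grid[i + 1][j_next] - grid[i + 1][j])
    return south + t_lat * (north - south)

lat_in = [-80.0, -40.0, 0.0, 30.0, 60.0, 85.0]
lon_in = [30.0, 110.0, 190.0, 270.0, 350.0]
T_in = [[10, 40, 50, 30, 15],
        [12, 43, 52, 31, 16],
        [14, 46, 54, 32, 17],
        [16, 49, 56, 33, 18],
        [18, 52, 58, 34, 19],
        [20, 55, 60, 35, 20.0]]

lat_out = [-90.0, 0.0, 70.0, 88.0]
lon_out = [0.0, 190.0, 355.0]
T_out = [[interp_wrap(T_in, lat_in, lon_in, la, lo) for lo in lon_out]
         for la in lat_out]
for row in T_out:
    print(row)
```

For instance, the point at latitude 0 and longitude 0 lies a quarter of the way from the 350-degree column (17) to the wrapped 30-degree column (14), giving 16.25.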
As of version 3.9.6 (released January, 2009), NCO
can link to the GNU Scientific Library (GSL).
ncap2 can access most GSL special functions including
Airy, Bessel, error, gamma, beta, hypergeometric, and Legendre functions
and elliptic integrals.
GSL must be version 1.4 or later.
To list the GSL functions available with your NCO
build, use ncap2 -f | grep ^gsl.
The function names used by ncap2 mirror their GSL names. The NCO wrappers for GSL functions automatically call the error-handling version of the GSL function when available. This allows NCO to return a missing value when the GSL library encounters a domain error or a floating-point exception. The slow-down due to calling the error-handling version of the GSL numerical functions was found to be negligible (please let us know if you find otherwise).
Consider the gamma function.
The GSL function prototype is
int gsl_sf_gamma_e(const double x, gsl_sf_result * result)
The ncap2 script would be:
lon_in[lon]={-1,0.1,0.2,0.3};
lon_out=gsl_sf_gamma(lon_in);
lon_out= _, 9.5135, 4.5908, 2.9915
The first value is set to _FillValue since the gamma
function is undefined for negative integers.
If the input variable has a missing value then this value is used.
Otherwise, the default double fill value is used
(defined in the netCDF header netcdf.h as
NC_FILL_DOUBLE = 9.969e+36).
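The "return a fill value on a domain error" behaviour can be mimicked with the Python standard library, whose math.gamma raises ValueError at the poles of the gamma function. This sketch is an analogy for the wrapper behaviour described above, not the NCO code; gamma_or_fill is a hypothetical name, and the fill constant is the full-precision value of NC_FILL_DOUBLE from netcdf.h.

```python
import math

NC_FILL_DOUBLE = 9.9692099683868690e+36  # default double fill value in netcdf.h

def gamma_or_fill(x, fill=NC_FILL_DOUBLE):
    """Gamma function that returns a fill value instead of raising."""
    try:
        return math.gamma(x)
    except (ValueError, OverflowError):  # pole (domain error) or overflow
        return fill

lon_in = [-1.0, 0.1, 0.2, 0.3]
lon_out = [gamma_or_fill(x) for x in lon_in]
print(lon_out)
```

The first element comes back as the fill value, and the remaining three agree with the values listed above to the precision shown there.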
Consider a call to a Bessel function with GSL prototype
int gsl_sf_bessel_Jn_e(int n, double x, gsl_sf_result * result)
An ncap2 script would be
lon_out=gsl_sf_bessel_Jn(2,lon_in);
lon_out=0.11490, 0.0012, 0.00498, 0.011165
This computes the Bessel function of order n=2 for every value in
lon_in.
The Bessel order argument, an integer, can also be a non-scalar
variable, i.e., an array.
n_in[lon]={0,1,2,3};
lon_out=gsl_sf_bessel_Jn(n_in,0.5);
lon_out= 0.93846, 0.24226, 0.03060, 0.00256
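Both calling patterns (scalar order with an array of x values, and an array of orders with a scalar x) can be checked against the power series of the Bessel function of the first kind, J_n(x) = sum over k of (-1)^k (x/2)^(2k+n) / (k! (k+n)!). The Python sketch below uses that series rather than GSL, is adequate only for small |x| and non-negative integer n, and the name bessel_jn is hypothetical. The list comprehensions stand in for the element-by-element evaluation that ncap2 performs.

```python
import math

def bessel_jn(n, x, terms=30):
    """J_n(x) from its power series; fine for small |x|, integer n >= 0."""
    half = x / 2.0
    return sum((-1) ** k * half ** (2 * k + n)
               / (math.factorial(k) * math.factorial(k + n))
               for k in range(terms))

# Scalar order, array argument: order 2 for every x
lon_in = [-1.0, 0.1, 0.2, 0.3]
order_two = [bessel_jn(2, x) for x in lon_in]

# Array order, scalar argument: orders 0..3 at x = 0.5
n_in = [0, 1, 2, 3]
by_order = [bessel_jn(n, 0.5) for n in n_in]

print(order_two)
print(by_order)
```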
Arguments to GSL wrapper functions in ncap2
must conform to one another, i.e., they must share the same sub-set of
dimensions.
For example: three_out=gsl_sf_bessel_Jn(n_in,three_dmn_var_dbl)
is valid because the variable three_dmn_var_dbl has a lon
dimension, so n_in can be broadcast to conform to
three_dmn_var_dbl.
However time_out=gsl_sf_bessel_Jn(n_in,time) is invalid.
Consider the elliptic integral with prototype
int gsl_sf_ellint_RD_e(double x, double y, double z, gsl_mode_t mode, gsl_sf_result * result)
three_out=gsl_sf_ellint_RD(0.5,time,three_dmn_var_dbl);
The three arguments are all conformable so the above ncap2 call is valid. The mode argument in the function prototype controls the convergence of the algorithm. It also appears in the Airy Function prototypes. It can be set by defining the environment variable GSL_PREC_MODE. If unset it defaults to the value GSL_PREC_DOUBLE. See the GSL manual for more details.
export GSL_PREC_MODE=0 # GSL_PREC_DOUBLE
export GSL_PREC_MODE=1 # GSL_PREC_SINGLE
export GSL_PREC_MODE=2 # GSL_PREC_APPROX
The ncap2 wrappers to the array functions are
slightly different.
Consider the following GSL prototype
int gsl_sf_bessel_Jn_array(int nmin, int nmax, double x, double *result_array)
b1=lon.double();
x=0.5;
status=gsl_sf_bessel_Jn_array(1,4,x,&b1);
print(status);
b1=0.24226,0.0306,0.00256,0.00016;
This calculates the Bessel function of x=0.5 for
n=1 to 4.
The first three arguments are scalar values.
If a non-scalar variable is supplied as an argument then only the first
value is used.
The final argument is the variable where the results are stored (NB: the
& indicates this is a call by reference).
This final argument must be of type double and must be of at least
size nmax-nmin+1.
If either of these conditions is not met then the function
returns an error message.
The function/wrapper returns a status flag.
Zero indicates success.
Consider another array function
int gsl_sf_legendre_Pl_array(int lmax, double x, double *result_array);
a1=time.double();
x=0.3;
status=gsl_sf_legendre_Pl_array(a1.size()-1, x,&a1);
print(status);
This call calculates P_l(0.3) for l=0..9. Note that |x|<=1, otherwise there will be a domain error. See the GSL documentation for more details.
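The values P_0(x) through P_lmax(x) that such an array call fills in can be generated with the standard three-term recurrence (l+1) P_{l+1}(x) = (2l+1) x P_l(x) - l P_{l-1}(x). The Python sketch below uses that recurrence as an independent check; it is not the GSL algorithm, and the name legendre_pl_array is reused here only for readability. The domain check mirrors the |x|<=1 restriction mentioned above.

```python
def legendre_pl_array(lmax, x):
    """Return [P_0(x), ..., P_lmax(x)] using the three-term recurrence."""
    if abs(x) > 1.0:
        raise ValueError("domain error: require |x| <= 1")
    values = [1.0]
    if lmax >= 1:
        values.append(x)
    for l in range(1, lmax):
        values.append(((2 * l + 1) * x * values[l] - l * values[l - 1]) / (l + 1))
    return values

a1 = legendre_pl_array(9, 0.3)  # l = 0..9, ten values in total
print(a1)
```

The first few entries can be confirmed by hand: P_2(0.3) = (3*0.09-1)/2 = -0.365 and P_3(0.3) = (5*0.027-0.9)/2 = -0.3825.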
The GSL functions implemented in NCO are
listed in the table below.
This table is correct for GSL version 1.10.
To see what functions are available on your build run the command
ncap2 -f | grep ^gsl.
To see this table along with the GSL C-function
prototypes look at the spreadsheet doc/nco_gsl.ods.
GSL NAME | IMPLEMENTED (Y/N) | NCAP FUNCTION CALL |
gsl_sf_airy_Ai_e | Y | gsl_sf_airy_Ai(dbl_expr) |
gsl_sf_airy_Bi_e | Y | gsl_sf_airy_Bi(dbl_expr) |
gsl_sf_airy_Ai_scaled_e | Y | gsl_sf_airy_Ai_scaled(dbl_expr) |
gsl_sf_airy_Bi_scaled_e | Y | gsl_sf_airy_Bi_scaled(dbl_expr) |
gsl_sf_airy_Ai_deriv_e | Y | gsl_sf_airy_Ai_deriv(dbl_expr) |
gsl_sf_airy_Bi_deriv_e | Y | gsl_sf_airy_Bi_deriv(dbl_expr) |
gsl_sf_airy_Ai_deriv_scaled_e | Y | gsl_sf_airy_Ai_deriv_scaled(dbl_expr) |
gsl_sf_airy_Bi_deriv_scaled_e | Y | gsl_sf_airy_Bi_deriv_scaled(dbl_expr) |
gsl_sf_airy_zero_Ai_e | Y | gsl_sf_airy_zero_Ai(uint_expr) |
gsl_sf_airy_zero_Bi_e | Y | gsl_sf_airy_zero_Bi(uint_expr) |
gsl_sf_airy_zero_Ai_deriv_e | Y | gsl_sf_airy_zero_Ai_deriv(uint_expr) |
gsl_sf_airy_zero_Bi_deriv_e | Y | gsl_sf_airy_zero_Bi_deriv(uint_expr) |
gsl_sf_bessel_J0_e | Y | gsl_sf_bessel_J0(dbl_expr) |
gsl_sf_bessel_J1_e | Y | gsl_sf_bessel_J1(dbl_expr) |
gsl_sf_bessel_Jn_e | Y | gsl_sf_bessel_Jn(int_expr,dbl_expr) |
gsl_sf_bessel_Jn_array | Y | status=gsl_sf_bessel_Jn_array(int,int,double,&var_out) |
gsl_sf_bessel_Y0_e | Y | gsl_sf_bessel_Y0(dbl_expr) |
gsl_sf_bessel_Y1_e | Y | gsl_sf_bessel_Y1(dbl_expr) |
gsl_sf_bessel_Yn_e | Y | gsl_sf_bessel_Yn(int_expr,dbl_expr) |
gsl_sf_bessel_Yn_array | Y | gsl_sf_bessel_Yn_array |
gsl_sf_bessel_I0_e | Y | gsl_sf_bessel_I0(dbl_expr) |
gsl_sf_bessel_I1_e | Y | gsl_sf_bessel_I1(dbl_expr) |
gsl_sf_bessel_In_e | Y | gsl_sf_bessel_In(int_expr,dbl_expr) |
gsl_sf_bessel_In_array | Y | status=gsl_sf_bessel_In_array(int,int,double,&var_out) |
gsl_sf_bessel_I0_scaled_e | Y | gsl_sf_bessel_I0_scaled(dbl_expr) |
gsl_sf_bessel_I1_scaled_e | Y | gsl_sf_bessel_I1_scaled(dbl_expr) |
gsl_sf_bessel_In_scaled_e | Y | gsl_sf_bessel_In_scaled(int_expr,dbl_expr) |
gsl_sf_bessel_In_scaled_array | Y | status=gsl_sf_bessel_In_scaled_array(int,int,double,&var_out) |
gsl_sf_bessel_K0_e | Y | gsl_sf_bessel_K0(dbl_expr) |
gsl_sf_bessel_K1_e | Y | gsl_sf_bessel_K1(dbl_expr) |
gsl_sf_bessel_Kn_e | Y | gsl_sf_bessel_Kn(int_expr,dbl_expr) |
gsl_sf_bessel_Kn_array | Y | status=gsl_sf_bessel_Kn_array(int,int,double,&var_out) |
gsl_sf_bessel_K0_scaled_e | Y | gsl_sf_bessel_K0_scaled(dbl_expr) |
gsl_sf_bessel_K1_scaled_e | Y | gsl_sf_bessel_K1_scaled(dbl_expr) |
gsl_sf_bessel_Kn_scaled_e | Y | gsl_sf_bessel_Kn_scaled(int_expr,dbl_expr) |
gsl_sf_bessel_Kn_scaled_array | Y | status=gsl_sf_bessel_Kn_scaled_array(int,int,double,&var_out) |
gsl_sf_bessel_j0_e | Y | gsl_sf_bessel_J0(dbl_expr) |
gsl_sf_bessel_j1_e | Y | gsl_sf_bessel_J1(dbl_expr) |
gsl_sf_bessel_j2_e | Y | gsl_sf_bessel_j2(dbl_expr) |
gsl_sf_bessel_jl_e | Y | gsl_sf_bessel_jl(int_expr,dbl_expr) |
gsl_sf_bessel_jl_array | Y | status=gsl_sf_bessel_jl_array(int,double,&var_out) |
gsl_sf_bessel_jl_steed_array | Y | gsl_sf_bessel_jl_steed_array |
gsl_sf_bessel_y0_e | Y | gsl_sf_bessel_Y0(dbl_expr) |
gsl_sf_bessel_y1_e | Y | gsl_sf_bessel_Y1(dbl_expr) |
gsl_sf_bessel_y2_e | Y | gsl_sf_bessel_y2(dbl_expr) |
gsl_sf_bessel_yl_e | Y | gsl_sf_bessel_yl(int_expr,dbl_expr) |
gsl_sf_bessel_yl_array | Y | status=gsl_sf_bessel_yl_array(int,double,&var_out) |
gsl_sf_bessel_i0_scaled_e | Y | gsl_sf_bessel_I0_scaled(dbl_expr) |
gsl_sf_bessel_i1_scaled_e | Y | gsl_sf_bessel_I1_scaled(dbl_expr) |
gsl_sf_bessel_i2_scaled_e | Y | gsl_sf_bessel_i2_scaled(dbl_expr) |
gsl_sf_bessel_il_scaled_e | Y | gsl_sf_bessel_il_scaled(int_expr,dbl_expr) |
gsl_sf_bessel_il_scaled_array | Y | status=gsl_sf_bessel_il_scaled_array(int,double,&var_out) |
gsl_sf_bessel_k0_scaled_e | Y | gsl_sf_bessel_K0_scaled(dbl_expr) |
gsl_sf_bessel_k1_scaled_e | Y | gsl_sf_bessel_K1_scaled(dbl_expr) |
gsl_sf_bessel_k2_scaled_e | Y | gsl_sf_bessel_k2_scaled(dbl_expr) |
gsl_sf_bessel_kl_scaled_e | Y | gsl_sf_bessel_kl_scaled(int_expr,dbl_expr) |
gsl_sf_bessel_kl_scaled_array | Y | status=gsl_sf_bessel_kl_scaled_array(int,double,&var_out) |
gsl_sf_bessel_Jnu_e | Y | gsl_sf_bessel_Jnu(dbl_expr,dbl_expr) |
gsl_sf_bessel_Ynu_e | Y | gsl_sf_bessel_Ynu(dbl_expr,dbl_expr) |
gsl_sf_bessel_sequence_Jnu_e | N | gsl_sf_bessel_sequence_Jnu |
gsl_sf_bessel_Inu_scaled_e | Y | gsl_sf_bessel_Inu_scaled(dbl_expr,dbl_expr) |
gsl_sf_bessel_Inu_e | Y | gsl_sf_bessel_Inu(dbl_expr,dbl_expr) |
gsl_sf_bessel_Knu_scaled_e | Y | gsl_sf_bessel_Knu_scaled(dbl_expr,dbl_expr) |
gsl_sf_bessel_Knu_e | Y | gsl_sf_bessel_Knu(dbl_expr,dbl_expr) |
gsl_sf_bessel_lnKnu_e | Y | gsl_sf_bessel_lnKnu(dbl_expr,dbl_expr) |
gsl_sf_bessel_zero_J0_e | Y | gsl_sf_bessel_zero_J0(uint_expr) |
gsl_sf_bessel_zero_J1_e | Y | gsl_sf_bessel_zero_J1(uint_expr) |
gsl_sf_bessel_zero_Jnu_e | N | gsl_sf_bessel_zero_Jnu |
gsl_sf_clausen_e | Y | gsl_sf_clausen(dbl_expr) |
gsl_sf_hydrogenicR_1_e | N | gsl_sf_hydrogenicR_1 |
gsl_sf_hydrogenicR_e | N | gsl_sf_hydrogenicR |
gsl_sf_coulomb_wave_FG_e | N | gsl_sf_coulomb_wave_FG |
gsl_sf_coulomb_wave_F_array | N | gsl_sf_coulomb_wave_F_array |
gsl_sf_coulomb_wave_FG_array | N | gsl_sf_coulomb_wave_FG_array |
gsl_sf_coulomb_wave_FGp_array | N | gsl_sf_coulomb_wave_FGp_array |
gsl_sf_coulomb_wave_sphF_array | N | gsl_sf_coulomb_wave_sphF_array |
gsl_sf_coulomb_CL_e | N | gsl_sf_coulomb_CL |
gsl_sf_coulomb_CL_array | N | gsl_sf_coulomb_CL_array |
gsl_sf_coupling_3j_e | N | gsl_sf_coupling_3j |
gsl_sf_coupling_6j_e | N | gsl_sf_coupling_6j |
gsl_sf_coupling_RacahW_e | N | gsl_sf_coupling_RacahW |
gsl_sf_coupling_9j_e | N | gsl_sf_coupling_9j |
gsl_sf_coupling_6j_INCORRECT_e | N | gsl_sf_coupling_6j_INCORRECT |
gsl_sf_dawson_e | Y | gsl_sf_dawson(dbl_expr) |
gsl_sf_debye_1_e | Y | gsl_sf_debye_1(dbl_expr) |
gsl_sf_debye_2_e | Y | gsl_sf_debye_2(dbl_expr) |
gsl_sf_debye_3_e | Y | gsl_sf_debye_3(dbl_expr) |
gsl_sf_debye_4_e | Y | gsl_sf_debye_4(dbl_expr) |
gsl_sf_debye_5_e | Y | gsl_sf_debye_5(dbl_expr) |
gsl_sf_debye_6_e | Y | gsl_sf_debye_6(dbl_expr) |
gsl_sf_dilog_e | N | gsl_sf_dilog |
gsl_sf_complex_dilog_xy_e | N | gsl_sf_complex_dilog_xy_e |
gsl_sf_complex_dilog_e | N | gsl_sf_complex_dilog |
gsl_sf_complex_spence_xy_e | N | gsl_sf_complex_spence_xy_e |
gsl_sf_multiply_e | N | gsl_sf_multiply |
gsl_sf_multiply_err_e | N | gsl_sf_multiply_err |
gsl_sf_ellint_Kcomp_e | Y | gsl_sf_ellint_Kcomp(dbl_expr) |
gsl_sf_ellint_Ecomp_e | Y | gsl_sf_ellint_Ecomp(dbl_expr) |
gsl_sf_ellint_Pcomp_e | Y | gsl_sf_ellint_Pcomp(dbl_expr,dbl_expr) |
gsl_sf_ellint_Dcomp_e | Y | gsl_sf_ellint_Dcomp(dbl_expr) |
gsl_sf_ellint_F_e | Y | gsl_sf_ellint_F(dbl_expr,dbl_expr) |
gsl_sf_ellint_E_e | Y | gsl_sf_ellint_E(dbl_expr,dbl_expr) |
gsl_sf_ellint_P_e | Y | gsl_sf_ellint_P(dbl_expr,dbl_expr,dbl_expr) |
gsl_sf_ellint_D_e | Y | gsl_sf_ellint_D(dbl_expr,dbl_expr,dbl_expr) |
gsl_sf_ellint_RC_e | Y | gsl_sf_ellint_RC(dbl_expr,dbl_expr) |
gsl_sf_ellint_RD_e | Y | gsl_sf_ellint_RD(dbl_expr,dbl_expr,dbl_expr) |
gsl_sf_ellint_RF_e | Y | gsl_sf_ellint_RF(dbl_expr,dbl_expr,dbl_expr) |
gsl_sf_ellint_RJ_e | Y | gsl_sf_ellint_RJ(dbl_expr,dbl_expr,dbl_expr,dbl_expr) |
gsl_sf_elljac_e | N | gsl_sf_elljac |
gsl_sf_erfc_e | Y | gsl_sf_erfc(dbl_expr) |
gsl_sf_log_erfc_e | Y | gsl_sf_log_erfc(dbl_expr) |
gsl_sf_erf_e | Y | gsl_sf_erf(dbl_expr) |
gsl_sf_erf_Z_e | Y | gsl_sf_erf_Z(dbl_expr) |
gsl_sf_erf_Q_e | Y | gsl_sf_erf_Q(dbl_expr) |
gsl_sf_hazard_e | Y | gsl_sf_hazard(dbl_expr) |
gsl_sf_exp_e | Y | gsl_sf_exp(dbl_expr) |
gsl_sf_exp_e10_e | N | gsl_sf_exp_e10 |
gsl_sf_exp_mult_e | Y | gsl_sf_exp_mult(dbl_expr,dbl_expr) |
gsl_sf_exp_mult_e10_e | N | gsl_sf_exp_mult_e10 |
gsl_sf_expm1_e | Y | gsl_sf_expm1(dbl_expr) |
gsl_sf_exprel_e | Y | gsl_sf_exprel(dbl_expr) |
gsl_sf_exprel_2_e | Y | gsl_sf_exprel_2(dbl_expr) |
gsl_sf_exprel_n_e | Y | gsl_sf_exprel_n(int_expr,dbl_expr) |
gsl_sf_exp_err_e | Y | gsl_sf_exp_err(dbl_expr,dbl_expr) |
gsl_sf_exp_err_e10_e | N | gsl_sf_exp_err_e10 |
gsl_sf_exp_mult_err_e | N | gsl_sf_exp_mult_err |
gsl_sf_exp_mult_err_e10_e | N | gsl_sf_exp_mult_err_e10 |
gsl_sf_expint_E1_e | Y | gsl_sf_expint_E1(dbl_expr) |
gsl_sf_expint_E2_e | Y | gsl_sf_expint_E2(dbl_expr) |
gsl_sf_expint_En_e | Y | gsl_sf_expint_En(int_expr,dbl_expr) |
gsl_sf_expint_E1_scaled_e | Y | gsl_sf_expint_E1_scaled(dbl_expr) |
gsl_sf_expint_E2_scaled_e | Y | gsl_sf_expint_E2_scaled(dbl_expr) |
gsl_sf_expint_En_scaled_e | Y | gsl_sf_expint_En_scaled(int_expr,dbl_expr) |
gsl_sf_expint_Ei_e | Y | gsl_sf_expint_Ei(dbl_expr) |
gsl_sf_expint_Ei_scaled_e | Y | gsl_sf_expint_Ei_scaled(dbl_expr) |
gsl_sf_Shi_e | Y | gsl_sf_Shi(dbl_expr) |
gsl_sf_Chi_e | Y | gsl_sf_Chi(dbl_expr) |
gsl_sf_expint_3_e | Y | gsl_sf_expint_3(dbl_expr) |
gsl_sf_Si_e | Y | gsl_sf_Si(dbl_expr) |
gsl_sf_Ci_e | Y | gsl_sf_Ci(dbl_expr) |
gsl_sf_atanint_e | Y | gsl_sf_atanint(dbl_expr) |
gsl_sf_fermi_dirac_m1_e | Y | gsl_sf_fermi_dirac_m1(dbl_expr) |
gsl_sf_fermi_dirac_0_e | Y | gsl_sf_fermi_dirac_0(dbl_expr) |
gsl_sf_fermi_dirac_1_e | Y | gsl_sf_fermi_dirac_1(dbl_expr) |
gsl_sf_fermi_dirac_2_e | Y | gsl_sf_fermi_dirac_2(dbl_expr) |
gsl_sf_fermi_dirac_int_e | Y | gsl_sf_fermi_dirac_int(int_expr,dbl_expr) |
gsl_sf_fermi_dirac_mhalf_e | Y | gsl_sf_fermi_dirac_mhalf(dbl_expr) |
gsl_sf_fermi_dirac_half_e | Y | gsl_sf_fermi_dirac_half(dbl_expr) |
gsl_sf_fermi_dirac_3half_e | Y | gsl_sf_fermi_dirac_3half(dbl_expr) |
gsl_sf_fermi_dirac_inc_0_e | Y | gsl_sf_fermi_dirac_inc_0(dbl_expr,dbl_expr) |
gsl_sf_lngamma_e | Y | gsl_sf_lngamma(dbl_expr) |
gsl_sf_lngamma_sgn_e | N | gsl_sf_lngamma_sgn |
gsl_sf_gamma_e | Y | gsl_sf_gamma(dbl_expr) |
gsl_sf_gammastar_e | Y | gsl_sf_gammastar(dbl_expr) |
gsl_sf_gammainv_e | Y | gsl_sf_gammainv(dbl_expr) |
gsl_sf_lngamma_complex_e | N | gsl_sf_lngamma_complex |
gsl_sf_taylorcoeff_e | Y | gsl_sf_taylorcoeff(int_expr,dbl_expr) |
gsl_sf_fact_e | Y | gsl_sf_fact(uint_expr) |
gsl_sf_doublefact_e | Y | gsl_sf_doublefact(uint_expr) |
gsl_sf_lnfact_e | Y | gsl_sf_lnfact(uint_expr) |
gsl_sf_lndoublefact_e | Y | gsl_sf_lndoublefact(uint_expr) |
gsl_sf_lnchoose_e | N | gsl_sf_lnchoose |
gsl_sf_choose_e | N | gsl_sf_choose |
gsl_sf_lnpoch_e | Y | gsl_sf_lnpoch(dbl_expr,dbl_expr) |
gsl_sf_lnpoch_sgn_e | N | gsl_sf_lnpoch_sgn |
gsl_sf_poch_e | Y | gsl_sf_poch(dbl_expr,dbl_expr) |
gsl_sf_pochrel_e | Y | gsl_sf_pochrel(dbl_expr,dbl_expr) |
gsl_sf_gamma_inc_Q_e | Y | gsl_sf_gamma_inc_Q(dbl_expr,dbl_expr) |
gsl_sf_gamma_inc_P_e | Y | gsl_sf_gamma_inc_P(dbl_expr,dbl_expr) |
gsl_sf_gamma_inc_e | Y | gsl_sf_gamma_inc(dbl_expr,dbl_expr) |
gsl_sf_lnbeta_e | Y | gsl_sf_lnbeta(dbl_expr,dbl_expr) |
gsl_sf_lnbeta_sgn_e | N | gsl_sf_lnbeta_sgn |
gsl_sf_beta_e | Y | gsl_sf_beta(dbl_expr,dbl_expr) |
gsl_sf_beta_inc_e | N | gsl_sf_beta_inc |
gsl_sf_gegenpoly_1_e | Y | gsl_sf_gegenpoly_1(dbl_expr,dbl_expr) |
gsl_sf_gegenpoly_2_e | Y | gsl_sf_gegenpoly_2(dbl_expr,dbl_expr) |
gsl_sf_gegenpoly_3_e | Y | gsl_sf_gegenpoly_3(dbl_expr,dbl_expr) |
gsl_sf_gegenpoly_n_e | N | gsl_sf_gegenpoly_n |
gsl_sf_gegenpoly_array | Y | gsl_sf_gegenpoly_array |
gsl_sf_hyperg_0F1_e | Y | gsl_sf_hyperg_0F1(dbl_expr,dbl_expr) |
gsl_sf_hyperg_1F1_int_e | Y | gsl_sf_hyperg_1F1_int(int_expr,int_expr,dbl_expr) |
gsl_sf_hyperg_1F1_e | Y | gsl_sf_hyperg_1F1(dbl_expr,dbl_expr,dbl_expr) |
gsl_sf_hyperg_U_int_e | Y | gsl_sf_hyperg_U_int(int_expr,int_expr,dbl_expr) |
gsl_sf_hyperg_U_int_e10_e | N | gsl_sf_hyperg_U_int_e10 |
gsl_sf_hyperg_U_e | Y | gsl_sf_hyperg_U(dbl_expr,dbl_expr,dbl_expr) |
gsl_sf_hyperg_U_e10_e | N | gsl_sf_hyperg_U_e10 |
gsl_sf_hyperg_2F1_e | Y | gsl_sf_hyperg_2F1(dbl_expr,dbl_expr,dbl_expr,dbl_expr) |
gsl_sf_hyperg_2F1_conj_e | Y | gsl_sf_hyperg_2F1_conj(dbl_expr,dbl_expr,dbl_expr,dbl_expr) |
gsl_sf_hyperg_2F1_renorm_e | Y | gsl_sf_hyperg_2F1_renorm(dbl_expr,dbl_expr,dbl_expr,dbl_expr) |
gsl_sf_hyperg_2F1_conj_renorm_e | Y | gsl_sf_hyperg_2F1_conj_renorm(dbl_expr,dbl_expr,dbl_expr,dbl_expr) |
gsl_sf_hyperg_2F0_e | Y | gsl_sf_hyperg_2F0(dbl_expr,dbl_expr,dbl_expr) |
gsl_sf_laguerre_1_e | Y | gsl_sf_laguerre_1(dbl_expr,dbl_expr) |
gsl_sf_laguerre_2_e | Y | gsl_sf_laguerre_2(dbl_expr,dbl_expr) |
gsl_sf_laguerre_3_e | Y | gsl_sf_laguerre_3(dbl_expr,dbl_expr) |
gsl_sf_laguerre_n_e | Y | gsl_sf_laguerre_n(int_expr,dbl_expr,dbl_expr) |
gsl_sf_lambert_W0_e | Y | gsl_sf_lambert_W0(dbl_expr) |
gsl_sf_lambert_Wm1_e | Y | gsl_sf_lambert_Wm1(dbl_expr) |
gsl_sf_legendre_Pl_e | Y | gsl_sf_legendre_Pl(int_expr,dbl_expr) |
gsl_sf_legendre_Pl_array | Y | status=gsl_sf_legendre_Pl_array(int,double,&var_out) |
gsl_sf_legendre_Pl_deriv_array | N | gsl_sf_legendre_Pl_deriv_array |
gsl_sf_legendre_P1_e | Y | gsl_sf_legendre_P1(dbl_expr) |
gsl_sf_legendre_P2_e | Y | gsl_sf_legendre_P2(dbl_expr) |
gsl_sf_legendre_P3_e | Y | gsl_sf_legendre_P3(dbl_expr) |
gsl_sf_legendre_Q0_e | Y | gsl_sf_legendre_Q0(dbl_expr) |
gsl_sf_legendre_Q1_e | Y | gsl_sf_legendre_Q1(dbl_expr) |
gsl_sf_legendre_Ql_e | Y | gsl_sf_legendre_Ql(int_expr,dbl_expr) |
gsl_sf_legendre_Plm_e | Y | gsl_sf_legendre_Plm(int_expr,int_expr,dbl_expr) |
gsl_sf_legendre_Plm_array | Y | status=gsl_sf_legendre_Plm_array(int,int,double,&var_out) |
gsl_sf_legendre_Plm_deriv_array | N | gsl_sf_legendre_Plm_deriv_array |
gsl_sf_legendre_sphPlm_e | Y | gsl_sf_legendre_sphPlm(int_expr,int_expr,dbl_expr) |
gsl_sf_legendre_sphPlm_array | Y | status=gsl_sf_legendre_sphPlm_array(int,int,double,&var_out) |
gsl_sf_legendre_sphPlm_deriv_array | N | gsl_sf_legendre_sphPlm_deriv_array |
gsl_sf_legendre_array_size | N | gsl_sf_legendre_array_size |
gsl_sf_conicalP_half_e | Y | gsl_sf_conicalP_half(dbl_expr,dbl_expr) |
gsl_sf_conicalP_mhalf_e | Y | gsl_sf_conicalP_mhalf(dbl_expr,dbl_expr) |
gsl_sf_conicalP_0_e | Y | gsl_sf_conicalP_0(dbl_expr,dbl_expr) |
gsl_sf_conicalP_1_e | Y | gsl_sf_conicalP_1(dbl_expr,dbl_expr) |
gsl_sf_conicalP_sph_reg_e | Y | gsl_sf_conicalP_sph_reg(int_expr,dbl_expr,dbl_expr) |
gsl_sf_conicalP_cyl_reg_e | Y | gsl_sf_conicalP_cyl_reg(int_expr,dbl_expr,dbl_expr) |
gsl_sf_legendre_H3d_0_e | Y | gsl_sf_legendre_H3d_0(dbl_expr,dbl_expr) |
gsl_sf_legendre_H3d_1_e | Y | gsl_sf_legendre_H3d_1(dbl_expr,dbl_expr) |
gsl_sf_legendre_H3d_e | Y | gsl_sf_legendre_H3d(int_expr,dbl_expr,dbl_expr) |
gsl_sf_legendre_H3d_array | N | gsl_sf_legendre_H3d_array |
gsl_sf_legendre_array_size | N | gsl_sf_legendre_array_size |
gsl_sf_log_e | Y | gsl_sf_log(dbl_expr) |
gsl_sf_log_abs_e | Y | gsl_sf_log_abs(dbl_expr) |
gsl_sf_complex_log_e | N | gsl_sf_complex_log |
gsl_sf_log_1plusx_e | Y | gsl_sf_log_1plusx(dbl_expr) |
gsl_sf_log_1plusx_mx_e | Y | gsl_sf_log_1plusx_mx(dbl_expr) |
gsl_sf_mathieu_a_array | N | gsl_sf_mathieu_a_array |
gsl_sf_mathieu_b_array | N | gsl_sf_mathieu_b_array |
gsl_sf_mathieu_a | N | gsl_sf_mathieu_a |
gsl_sf_mathieu_b | N | gsl_sf_mathieu_b |
gsl_sf_mathieu_a_coeff | N | gsl_sf_mathieu_a_coeff |
gsl_sf_mathieu_b_coeff | N | gsl_sf_mathieu_b_coeff |
gsl_sf_mathieu_ce | N | gsl_sf_mathieu_ce |
gsl_sf_mathieu_se | N | gsl_sf_mathieu_se |
gsl_sf_mathieu_ce_array | N | gsl_sf_mathieu_ce_array |
gsl_sf_mathieu_se_array | N | gsl_sf_mathieu_se_array |
gsl_sf_mathieu_Mc | N | gsl_sf_mathieu_Mc |
gsl_sf_mathieu_Ms | N | gsl_sf_mathieu_Ms |
gsl_sf_mathieu_Mc_array | N | gsl_sf_mathieu_Mc_array |
gsl_sf_mathieu_Ms_array | N | gsl_sf_mathieu_Ms_array |
gsl_sf_pow_int_e | N | gsl_sf_pow_int |
gsl_sf_psi_int_e | Y | gsl_sf_psi_int(int_expr) |
gsl_sf_psi_e | Y | gsl_sf_psi(dbl_expr) |
gsl_sf_psi_1piy_e | Y | gsl_sf_psi_1piy(dbl_expr) |
gsl_sf_complex_psi_e | N | gsl_sf_complex_psi |
gsl_sf_psi_1_int_e | Y | gsl_sf_psi_1_int(int_expr) |
gsl_sf_psi_1_e | Y | gsl_sf_psi_1(dbl_expr) |
gsl_sf_psi_n_e | Y | gsl_sf_psi_n(int_expr,dbl_expr) |
gsl_sf_synchrotron_1_e | Y | gsl_sf_synchrotron_1(dbl_expr) |
gsl_sf_synchrotron_2_e | Y | gsl_sf_synchrotron_2(dbl_expr) |
gsl_sf_transport_2_e | Y | gsl_sf_transport_2(dbl_expr) |
gsl_sf_transport_3_e | Y | gsl_sf_transport_3(dbl_expr) |
gsl_sf_transport_4_e | Y | gsl_sf_transport_4(dbl_expr) |
gsl_sf_transport_5_e | Y | gsl_sf_transport_5(dbl_expr) |
gsl_sf_sin_e | N | gsl_sf_sin |
gsl_sf_cos_e | N | gsl_sf_cos |
gsl_sf_hypot_e | N | gsl_sf_hypot |
gsl_sf_complex_sin_e | N | gsl_sf_complex_sin |
gsl_sf_complex_cos_e | N | gsl_sf_complex_cos |
gsl_sf_complex_logsin_e | N | gsl_sf_complex_logsin |
gsl_sf_sinc_e | N | gsl_sf_sinc |
gsl_sf_lnsinh_e | N | gsl_sf_lnsinh |
gsl_sf_lncosh_e | N | gsl_sf_lncosh |
gsl_sf_polar_to_rect | N | gsl_sf_polar_to_rect |
gsl_sf_rect_to_polar | N | gsl_sf_rect_to_polar |
gsl_sf_sin_err_e | N | gsl_sf_sin_err |
gsl_sf_cos_err_e | N | gsl_sf_cos_err |
gsl_sf_angle_restrict_symm_e | N | gsl_sf_angle_restrict_symm |
gsl_sf_angle_restrict_pos_e | N | gsl_sf_angle_restrict_pos |
gsl_sf_angle_restrict_symm_err_e | N | gsl_sf_angle_restrict_symm_err |
gsl_sf_angle_restrict_pos_err_e | N | gsl_sf_angle_restrict_pos_err |
gsl_sf_zeta_int_e | Y | gsl_sf_zeta_int(int_expr) |
gsl_sf_zeta_e | Y | gsl_sf_zeta(dbl_expr) |
gsl_sf_zetam1_e | Y | gsl_sf_zetam1(dbl_expr) |
gsl_sf_zetam1_int_e | Y | gsl_sf_zetam1_int(int_expr) |
gsl_sf_hzeta_e | Y | gsl_sf_hzeta(dbl_expr,dbl_expr) |
gsl_sf_eta_int_e | Y | gsl_sf_eta_int(int_expr) |
gsl_sf_eta_e | Y | gsl_sf_eta(dbl_expr) |
As of version 3.9.9 (released July, 2009), NCO has wrappers to the GSL interpolation functions.
Given a set of data points (x1,y1)...(xn,yn), the GSL functions compute a continuous interpolating function Y(x) such that Y(xi) = yi. The interpolation is piecewise smooth, and its behavior at the end-points is determined by the type of interpolation used. For more information consult the GSL manual.
Interpolation with ncap2 is a two-stage process. In the first stage, a RAM variable is created from the chosen interpolating function and the data set. This RAM variable holds in memory a GSL interpolation object. In the second stage, points along the interpolating function are calculated. If you have a very large data set or are interpolating many sets, consider deleting the RAM variable once it is no longer needed with the command ram_delete(var_nm).
A simple example
x_in[$lon]={1.0,2.0,3.0,4.0};
y_in[$lon]={1.1,1.2,1.5,1.8};
// RAM variable is declared and defined here
gsl_interp_cspline(&ram_sp,x_in,y_in);
x_out[$lon_grd]={1.1,2.0,3.0,3.1,3.99};
y_out=gsl_spline_eval(ram_sp,x_out);
y2=gsl_spline_eval(ram_sp,1.3);
y3=gsl_spline_eval(ram_sp,0.0);
ram_delete(ram_sp);
print(y_out); // 1.10472, 1.2, 1.4, 1.42658, 1.69680002
print(y2); // 1.12454
print(y3); // '_'
Note in the above example y3 is set to ’missing value’ because 0.0 isn’t within the input X range.
GSL Interpolation Types
All the interpolation functions have been implemented. These are:
gsl_interp_linear()
gsl_interp_polynomial()
gsl_interp_cspline()
gsl_interp_cspline_periodic()
gsl_interp_akima()
gsl_interp_akima_periodic()
Evaluation of Interpolating Types
Implemented
gsl_spline_eval()
Not implemented
gsl_spline_deriv()
gsl_spline_deriv2()
gsl_spline_integ()
Least Squares fitting is a method of calculating a straight line through a set of experimental data points in the XY plane. Data may be weighted or unweighted. For more information please refer to the GSL manual.
These GSL functions fall into three categories:
A) Fitting data to Y=c0+c1*X
B) Fitting data (through the origin) Y=c1*X
C) Multi-parameter fitting (not yet implemented)
Section A
status=gsl_fit_linear(data_x,stride_x,data_y,stride_y,n,&c0,&c1,&cov00,&cov01,&cov11,&sumsq)
Input variables: data_x, stride_x, data_y, stride_y, n
From the above variables an X and Y vector both of length ’n’ are derived.
If data_x or data_y is of a type less precise than double then it is converted to type double.
It is up to you to do bounds checking on the input data.
For example, if stride_x=3 and n=8 then the size of data_x must be at least 24.
Output variables: c0, c1, cov00, cov01, cov11, sumsq
The ’&’ prefix indicates that these are call-by-reference variables.
If any of the output variables do not exist prior to the call then they are created on the fly as scalar variables of type double. If they already exist then their existing value is overwritten. If the function call is successful then status=0.
status=gsl_fit_wlinear(data_x,stride_x,data_w,stride_w,data_y,stride_y,n,&c0,&c1,&cov00,&cov01,&cov11,&chisq)
Similar to the above call except it creates an additional weighting vector from the variables data_w, stride_w, n.
data_y_out=gsl_fit_linear_est(data_x,c0,c1,cov00,cov01,cov11)
This function calculates y values along the line Y=c0+c1*X
Section B
status=gsl_fit_mul(data_x,stride_x,data_y,stride_y,n,&c1,&cov11,&sumsq)
Input variables: data_x, stride_x, data_y, stride_y, n
From the above variables an X and Y vector both of length ’n’ are derived.
If data_x or data_y is of a type less precise than double then it is converted to type double.
Output variables: c1,cov11,sumsq
status= gsl_fit_wmul(data_x,stride_x,data_w,stride_w,data_y,stride_y,n,&c1,&cov11,&sumsq)
Similar to the above call except it creates an additional weighting vector from the variables data_w, stride_w, n
data_y_out=gsl_fit_mul_est(data_x,c0,c1,cov11)
This function calculates y values along the line Y=c1*X
The below example shows gsl_fit_linear() in action
defdim("d1",10);
xin[d1]={1,2,3,4,5,6,7,8,9,10.0};
yin[d1]={3.1,6.2,9.1,12.2,15.1,18.2,21.3,24.0,27.0,30.0};
gsl_fit_linear(xin,1,yin,1,$d1.size,&c0,&c1,&cov00,&cov01,&cov11,&sumsq);
print(c0); // 0.2
print(c1); // 2.98545454545
defdim("e1",4);
xout[e1]={1.0,3.0,4.0,11};
yout[e1]=0.0;
yout=gsl_fit_linear_est(xout,c0,c1,cov00,cov01,cov11,sumsq);
print(yout); // 3.18545454545, 9.15636363636, 12.1418181818, 33.04
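The coefficients printed above can be cross-checked by hand, because the unweighted fit Y=c0+c1*X has a closed-form solution. The following is a minimal Python sketch of that arithmetic, not NCO or GSL code; the helper name fit_linear is hypothetical and covariances are not computed.

```python
def fit_linear(x, y):
    """Ordinary (unweighted) least-squares fit of y = c0 + c1*x."""
    n = len(x)
    x_avg = sum(x) / n
    y_avg = sum(y) / n
    # Slope: sum of x-y co-deviations divided by sum of squared x deviations
    sxy = sum((xi - x_avg) * (yi - y_avg) for xi, yi in zip(x, y))
    sxx = sum((xi - x_avg) ** 2 for xi in x)
    c1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means
    c0 = y_avg - c1 * x_avg
    return c0, c1


xin = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10.0]
yin = [3.1, 6.2, 9.1, 12.2, 15.1, 18.2, 21.3, 24.0, 27.0, 30.0]
c0, c1 = fit_linear(xin, yin)

# Evaluate the fitted line at the same points as the ncap2 example
yout = [c0 + c1 * x for x in [1.0, 3.0, 4.0, 11]]
print(c0, c1)
print(yout)
```

To within rounding, the printed intercept, slope, and estimated values should agree with the ncap2 output quoted in the example.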
The following code performs a linear regression of sst(time,lat,lon) against time at each grid point:
// Declare variables
c0[$lat, $lon]=0.; // Intercept
c1[$lat, $lon]=0.; // Slope
sdv[$lat, $lon]=0.; // Standard deviation
covxy[$lat, $lon]=0.; // Covariance
for (i=0;i<$lat.size;i++) // Loop over lat
{
  for (j=0;j<$lon.size;j++) // Loop over lon
  {
    // Linear regression function
    gsl_fit_linear(time,1,sst(:, i, j),1,$time.size,&tc0,&tc1,&cov00,&cov01,&cov11,&sumsq);
    c0(i,j)=tc0; // Output results
    c1(i,j)=tc1; // Output results
    // Covariance function
    covxy(i,j)=gsl_stats_covariance(time,1,$time.size,double(sst(:,i,j)),1,$time.size);
    // Standard deviation function
    sdv(i,j)=gsl_stats_sd(sst(:,i,j),1,$time.size);
  }
}
// slope (c1) missing values are set to '0', change to -999. (variable c0 intercept value)
where(c0 == -999) c1=-999;
Wrappers for most of the GSL Statistical functions have been implemented. The GSL function names include a type specifier (except for type double functions). To obtain the equivalent NCO name simply remove the type specifier; the appropriate GSL function is then called according to the data type. The weighted statistical functions, e.g., gsl_stats_wvariance(), are only defined in GSL for floating-point types, so your data must be of type float or double, otherwise ncap2 will emit an error message. To view the implemented functions use the shell command ncap2 -f|grep _stats
GSL Functions
short gsl_stats_max (short data[], size_t stride, size_t n);
double gsl_stats_int_mean (int data[], size_t stride, size_t n);
double gsl_stats_short_sd_with_fixed_mean (short data[], size_t stride, size_t n, double mean);
double gsl_stats_wmean (double w[], size_t wstride, double data[], size_t stride, size_t n);
double gsl_stats_quantile_from_sorted_data (double sorted_data[], size_t stride, size_t n, double f);
Equivalent ncap2 wrapper functions
short gsl_stats_max (var_data, data_stride, n);
double gsl_stats_mean (var_data, data_stride, n);
double gsl_stats_sd_with_fixed_mean (var_data, data_stride, n, var_mean);
double gsl_stats_wmean (var_weight, weight_stride, var_data, data_stride, n, var_mean);
double gsl_stats_quantile_from_sorted_data (var_sorted_data, data_stride, n, var_f);
GSL has no notion of missing values or dimensionality beyond one. If your data has missing values which you want ignored in the calculations then use the ncap2 built-in aggregate functions (see Methods and functions). The GSL functions operate on a vector of values created from the var_data/stride/n arguments. The ncap2 wrappers check that there is no bounding error with regard to the size of the data and the final value in the vector.
a1[time]={1,2,3,4,5,6,7,8,9,10};
a1_avg=gsl_stats_mean(a1,1,10);
print(a1_avg); // 5.5
a1_var=gsl_stats_variance(a1,4,3);
print(a1_var); // 16.0
// bounding error, vector attempts to access element a1(10)
a1_sd=gsl_stats_sd(a1,5,3);
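How the var_data/stride/n arguments select a vector, and why the last call above is a bounding error, can be sketched in Python. This is an illustration of the indexing only, not the wrapper implementation; the helper name strided_vector is hypothetical.

```python
import statistics


def strided_vector(data, stride, n):
    """Return the n elements data[0], data[stride], ..., data[(n-1)*stride]."""
    last = (n - 1) * stride  # index of the final element accessed
    if last >= len(data):
        raise IndexError(
            f"bounding error: element {last} requested, size is {len(data)}")
    return [data[i * stride] for i in range(n)]


a1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

a1_avg = statistics.mean(strided_vector(a1, 1, 10))     # all ten elements
a1_var = statistics.variance(strided_vector(a1, 4, 3))  # elements 1, 5, 9
print(a1_avg, a1_var)

try:
    strided_vector(a1, 5, 3)  # would need a1[10], one past the end
except IndexError as err:
    print(err)
```

statistics.variance uses the N-1 (sample) denominator, as does gsl_stats_variance, which is why the two agree for the elements 1, 5, 9.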
For functions with the signature func_nm(var_data,data_stride,n), one may omit the second or third arguments.
The default value for stride is 1.
The default value for n is 1+(data.size()-1)/stride.
// Following statements are equivalent
n2=gsl_stats_max(a1,1,10);
n2=gsl_stats_max(a1,1);
n2=gsl_stats_max(a1);
// Following statements are equivalent
n3=gsl_stats_median_from_sorted_data(a1,2,5);
n3=gsl_stats_median_from_sorted_data(a1,2);
// Following statements are NOT equivalent
n4=gsl_stats_kurtosis(a1,3,2);
n4=gsl_stats_kurtosis(a1,3); //default n=4
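The default element count can be worked out by hand for the ten-element variable a1 used above. A small Python sketch of the stated formula follows, assuming integer division as implied by the examples; the function name default_n is hypothetical.

```python
def default_n(size, stride=1):
    """Default element count: 1+(size-1)/stride, using integer division."""
    return 1 + (size - 1) // stride


size = 10  # number of elements in a1
for stride in (1, 2, 3):
    print(stride, default_n(size, stride))
```

For strides 1, 2, and 3 this gives 10, 5, and 4 elements respectively, matching the equivalences shown in the example.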
The following example illustrates some of the weighted functions. The data are randomly generated. In this case the value of the weight for each datum is either 0.0 or 1.0
defdim("r1",2000);
data[r1]=1.0;
// Fill with random numbers [0.0,10.0)
data=10.0*gsl_rng_uniform(data);
// Create a weighting variable
weight=(data>4.0);
wmean=gsl_stats_wmean(weight,1,data,1,$r1.size);
print(wmean);
wsd=gsl_stats_wsd(weight,1,data,1,$r1.size);
print(wsd);
// number of values in data that are greater than 4
weight_size=weight.total();
print(weight_size);
// print min/max of data
dmin=data.gsl_stats_min();
dmax=data.gsl_stats_max();
print(dmin);print(dmax);
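With weights restricted to 0.0 and 1.0, the weighted mean reduces to the ordinary mean of the values whose weight is 1.0. The Python sketch below illustrates this with the standard definition sum(w*x)/sum(w); it uses Python's own random generator with an arbitrary seed, so its numbers will not match any ncap2 run.

```python
import random

random.seed(0)  # arbitrary seed, chosen only for repeatability
data = [10.0 * random.random() for _ in range(2000)]  # values in [0.0,10.0)

# Weight is 1.0 where the datum exceeds 4.0, else 0.0
weight = [1.0 if d > 4.0 else 0.0 for d in data]

# Weighted mean: sum(w*x)/sum(w)
wmean = sum(w * d for w, d in zip(weight, data)) / sum(weight)

# Plain mean of the selected values, for comparison
selected = [d for d in data if d > 4.0]
plain_mean = sum(selected) / len(selected)

print(wmean, plain_mean)
print(sum(weight), len(selected))  # number of values greater than 4
```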
The GSL library has a large number of random number generators. In addition there is a large set of functions for turning uniform random numbers into discrete or continuous probability distributions. The random number generator algorithms vary in terms of the quality of the numbers output, speed of execution, and maximum number output. For more information see the GSL documentation. The algorithm and seed are set via environment variables, which are picked up by the ncap2 code.
Setup
The random number algorithm is set by the environment variable GSL_RNG_TYPE. If this variable isn’t set then the default rng algorithm is gsl_rng_mt19937. The seed is set with the environment variable GSL_RNG_SEED. The following wrapper functions in ncap2 provide information about the chosen algorithm.
gsl_rng_min()
the minimum value returned by the rng algorithm.
gsl_rng_max()
the maximum value returned by the rng algorithm.
Uniformly Distributed Random Numbers
gsl_rng_get(var_in)
This function returns var_in with integers from the chosen rng algorithm. The min and max values depend upon the chosen rng algorithm.
gsl_rng_uniform_int(var_in)
This function returns var_in with random integers from 0 to n-1, where n is the value held in var_in. The value n must be less than or equal to the maximum value of the chosen rng algorithm.
gsl_rng_uniform(var_in)
This function returns var_in with double-precision numbers in the range [0.0,1). The range includes 0.0 and excludes 1.0.
gsl_rng_uniform_pos(var_in)
This function returns var_in with double-precision numbers in the range (0.0,1), excluding both 0.0 and 1.0.
Below are examples of gsl_rng_get() and gsl_rng_uniform_int() in action.
export GSL_RNG_TYPE=ranlux
export GSL_RNG_SEED=10
ncap2 -v -O -s 'a1[time]=0;a2=gsl_rng_get(a1);' in.nc foo.nc
# 10 random numbers from the range 0 - 16777215
# a2=9056646, 12776696, 1011656, 13354708, 5139066, 1388751, 11163902, 7730127, 15531355, 10387694 ;
ncap2 -v -O -s 'a1[time]=21;a2=gsl_rng_uniform_int(a1).sort();' in.nc foo.nc
# 10 random numbers from the range 0 - 20
# a2 = 1, 1, 6, 9, 11, 13, 13, 15, 16, 19 ;
The following example produces an ncap2 runtime error. This is because the chosen rng algorithm has a maximum value greater than NC_MAX_INT=2147483647; the wrapper functions to gsl_rng_get() and gsl_rng_uniform_int() return variables of type NC_INT. Please be aware of this when using random number distribution functions from the GSL library which return unsigned int. Examples of these are gsl_ran_geometric() and gsl_ran_pascal().
export GSL_RNG_TYPE=mt19937
ncap2 -v -O -s 'a1[time]=0;a2=gsl_rng_get(a1);' in.nc foo.nc
To find the maximum value of the chosen rng algorithm use the following code snippet.
ncap2 -v -O -s 'rng_max=gsl_rng_max();print(rng_max)' in.nc foo.nc
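The range problem described above comes down to comparing each generator's maximum with the largest NC_INT. A short Python sketch of that comparison follows; the ranlux maximum is the one quoted in the earlier example and the mt19937 maximum is 2^32-1. It does not call GSL.

```python
NC_MAX_INT = 2147483647  # largest value of a signed 32-bit NC_INT

# Maximum values returned by two generators
rng_max = {
    "ranlux": 16777215,     # 2^24-1, the range shown in the ranlux example
    "mt19937": 4294967295,  # 2^32-1
}

for name, vmax in rng_max.items():
    fits = vmax <= NC_MAX_INT
    print(name, vmax, "fits in NC_INT" if fits else "exceeds NC_INT")
```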
Random Number Distributions
The GSL library has a rich set of random number distribution functions. The library also provides cumulative distribution functions and inverse cumulative distribution functions, sometimes referred to as quantile functions. To see what is available on your build use the shell command ncap2 -f|grep -e _ran -e _cdf.
The following examples all return variables of type NC_INT:
defdim("out",15);
a1[$out]=0.5;
a2=gsl_ran_binomial(a1,30).sort();
//a2 = 10, 11, 12, 12, 13, 14, 14, 15, 15, 16, 16, 16, 16, 17, 22 ;
a3=gsl_ran_geometric(a2).sort();
//a3 = 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 4, 5 ;
a4=gsl_ran_pascal(a2,50);
//a4 = 37, 40, 40, 42, 43, 45, 46, 49, 52, 58, 60, 62, 62, 65, 67 ;
The following all return variables of type NC_DOUBLE:
defdim("b1",1000);
b1[$b1]=0.8;
b2=gsl_ran_exponential(b1);
b2_avg=b2.avg();
print(b2_avg); // b2_avg = 0.756047976787
b3=gsl_ran_gaussian(b1);
b3_avg=b3.avg();
b3_rms=b3.rms();
print(b3_avg); // b3_avg = -0.00903446534258;
print(b3_rms); // b3_rms = 0.81162979889;
b4[$b1]=10.0;
b5[$b1]=20.0;
b6=gsl_ran_flat(b4,b5);
b6_avg=b6.avg();
print(b6_avg); // b6_avg=15.0588129413
See the ncap.in and ncap2.in scripts released with NCO
for more complete demonstrations of ncap2
functionality
(script available on-line at http://nco.sf.net/ncap2.in).
Define new attribute new for existing variable one as twice the existing attribute double_att of variable att_var:
ncap2 -s 'one@new=2*att_var@double_att' in.nc out.nc
Average variables of mixed types (result is of type double
):
ncap2 -s 'average=(var_float+var_double+var_int)/3' in.nc out.nc
Multiple commands may be given to ncap2 in three ways.
First, the commands may be placed in a script which is executed, e.g., tst.nco.
Second, the commands may be individually specified with multiple ‘-s’ arguments to the same ncap2 invocation.
Third, the commands may be chained into a single ‘-s’ argument to ncap2.
Assuming the file tst.nco contains the commands a=3;b=4;c=sqrt(a^2+b^2); then the following ncap2 invocations produce identical results:
ncap2 -v -S tst.nco in.nc out.nc
ncap2 -v -s 'a=3' -s 'b=4' -s 'c=sqrt(a^2+b^2)' in.nc out.nc
ncap2 -v -s 'a=3;b=4;c=sqrt(a^2+b^2)' in.nc out.nc
The second and third examples show that ncap2
does not require
that a trailing semi-colon ‘;’ be placed at the end of a ‘-s’
argument, although a trailing semi-colon ‘;’ is always allowed.
However, semi-colons are required to separate individual assignment
statements chained together as a single ‘-s’ argument.
ncap2
may be used to “grow” dimensions, i.e., to increase
dimension sizes without altering existing data.
Say in.nc has ORO(lat,lon)
and the user wishes a new
file with new_ORO(new_lat,new_lon)
that contains zeros in the
undefined portions of the new grid.
defdim("new_lat",$lat.size+1); // Define new dimension sizes
defdim("new_lon",$lon.size+1);
new_ORO[$new_lat,$new_lon]=0.0f; // Initialize to zero
new_ORO(0:$lat.size-1,0:$lon.size-1)=ORO; // Fill valid data
The commands to define new coordinate variables new_lat and new_lon in the output file follow a similar pattern.
One might store these commands in a script grow.nco and then execute the script with
ncap2 -v -S grow.nco in.nc out.nc
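The zero-initialize-then-fill pattern used to grow ORO can be pictured with plain Python lists. This is only a sketch of the indexing, with a small made-up 2x3 array standing in for ORO(lat,lon); it does not read or write netCDF.

```python
# Hypothetical stand-in for ORO(lat,lon) with 2 latitudes and 3 longitudes
oro = [[1.0, 2.0, 3.0],
       [4.0, 5.0, 6.0]]
n_lat, n_lon = len(oro), len(oro[0])

# Each new dimension is one element larger; initialize everything to zero
new_oro = [[0.0] * (n_lon + 1) for _ in range(n_lat + 1)]

# Copy the valid data into the hyperslab (0:n_lat-1, 0:n_lon-1)
for i in range(n_lat):
    new_oro[i][:n_lon] = oro[i]

for row in new_oro:
    print(row)
```

The extra row and column keep their initial zeros, while the original values are unchanged.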
Imagine you wish to create a binary flag based on the value of
an array.
The flag should have value 1.0 where the array exceeds 1.0,
and value 0.0 elsewhere.
This example creates the binary flag ORO_flg
in out.nc
from the continuous array named ORO
in in.nc.
ncap2 -s 'ORO_flg=(ORO > 1.0)' in.nc out.nc
Suppose your task is to change all values of ORO
which
equal 2.0 to the new value 3.0:
ncap2 -s 'ORO_msk=(ORO==2.0);ORO=ORO_msk*3.0+!ORO_msk*ORO' in.nc out.nc
This creates and uses ORO_msk
to mask the subsequent arithmetic
operation.
Values of ORO
are only changed where ORO_msk
is true,
i.e., where ORO
equals 2.0.
Using the where statement, the above code simplifies to:
ncap2 -s 'where(ORO == 2.0) ORO=3.0;' in.nc foo.nc
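The equivalence of the mask arithmetic and the where statement can be checked on a handful of values. The Python sketch below uses a made-up list in place of ORO and mirrors the two formulations element by element.

```python
oro = [0.0, 1.0, 2.0, 2.0, 0.5]  # made-up sample values

# Mask approach: msk is 1 where the value equals 2.0, else 0
msk = [1 if v == 2.0 else 0 for v in oro]
masked = [m * 3.0 + (1 - m) * v for m, v in zip(msk, oro)]

# "where" approach: replace only the matching elements
where = [3.0 if v == 2.0 else v for v in oro]

print(masked)
print(where)
```

Here (1 - m) plays the role of the logical negation !ORO_msk in the ncap2 expression.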
This example uses ncap2
to compute the covariance of two
variables.
Let the variables u and v be the horizontal
wind components.
The covariance of u and v is defined
as the time mean product of the deviations of u and
v from their respective time means.
Symbolically, the covariance
[u'v'] =
[uv]-[u][v]
where [x] denotes the time-average of
x and x'
denotes the deviation from the time-mean.
The covariance tells us how much of the correlation of two signals
arises from the signal fluctuations versus the mean signals.
Sometimes this is called the eddy covariance.
We will store the covariance in the variable uprmvprm.
ncwa -O -a time -v u,v in.nc foo.nc # Compute time mean of u,v
ncrename -O -v u,uavg -v v,vavg foo.nc # Rename to avoid conflict
ncks -A -v uavg,vavg foo.nc in.nc # Place time means with originals
ncap2 -O -s 'uprmvprm=u*v-uavg*vavg' in.nc in.nc # Covariance
ncra -O -v uprmvprm in.nc foo.nc # Time-mean covariance
The mathematically inclined will note that the same covariance would be
obtained by replacing the step involving ncap2
with
ncap2 -O -s 'uprmvprm=(u-uavg)*(v-vavg)' foo.nc foo.nc # Covariance
As of NCO version 3.1.8 (December, 2006), ncap2
can compute averages, and thus covariances, by itself:
ncap2 -s 'uavg=u.avg($time);vavg=v.avg($time);uprmvprm=u*v-uavg*vavg' \
 -s 'uprmvrpmavg=uprmvprm.avg($time)' in.nc foo.nc
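The identity [u'v'] = [uv]-[u][v] used throughout this example is easy to confirm numerically. The following Python sketch does so for two short made-up time series; it illustrates the arithmetic only and involves no NCO operators.

```python
def avg(x):
    """Time mean of a sequence."""
    return sum(x) / len(x)


# Made-up time series for the two wind components
u = [1.0, 2.0, 4.0, 7.0]
v = [2.0, 1.0, 5.0, 4.0]

uavg, vavg = avg(u), avg(v)

# Form 1: mean of the product minus the product of the means
cov_a = avg([a * b for a, b in zip(u, v)]) - uavg * vavg

# Form 2: mean of the product of the deviations from the means
cov_b = avg([(a - uavg) * (b - vavg) for a, b in zip(u, v)])

print(cov_a, cov_b)
```

Both forms give the same covariance once the time mean is taken, which is why either ncap2 expression may be used before the final averaging step.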
We have not seen a simpler method to script and execute powerful
arithmetic than ncap2
.
ncap2
utilizes many meta-characters
(e.g., ‘$’, ‘?’, ‘;’, ‘()’, ‘[]’)
that can confuse the command-line shell if not quoted properly.
The issues are the same as those which arise in utilizing extended
regular expressions to subset variables (see Subsetting Files).
The example above will fail with no quotes and with double quotes.
This is because the shell tries to expand the value of $time from the shell environment unless it is single-quoted:
ncap2 -s 'uavg=u.avg($time)' in.nc foo.nc # Correct (recommended)
ncap2 -s uavg=u.avg('$time') in.nc foo.nc # Correct (and dangerous)
ncap2 -s uavg=u.avg($time) in.nc foo.nc # Wrong ($time = '')
ncap2 -s "uavg=u.avg($time)" in.nc foo.nc # Wrong ($time = '')
Without the single quotes, the shell replaces $time
with an
empty string.
The command ncap2 receives from the shell is uavg=u.avg().
This causes ncap2 to average over all dimensions rather than just the time dimension, an unintended consequence.
We recommend using single quotes to protect ncap2
command-line scripts from the shell, even when such protection is not
strictly necessary.
Expert users may violate this rule to exploit the ability to use shell
variables in ncap2
command-line scripts
(see CCSM Example).
In such cases it may be necessary to use the shell backslash character
‘\’ to protect the ncap2
meta-character.
A dimension of size one is said to be degenerate.
Whether a degenerate record dimension is desirable or not
depends on the application.
Often a degenerate time dimension is useful, e.g., for
concatenating, though it may cause problems with arithmetic.
Such is the case in the above example, where the first step employs
ncwa
rather than ncra
for the time-averaging.
Of course the numerical results are the same with both operators.
The difference is that, unless ‘-b’ is specified, ncwa
writes no time dimension to the output file, while ncra
defaults to keeping time as a degenerate (size 1) dimension.
Appending u
and v
to the output file would cause
ncks
to try to expand the degenerate time axis of uavg
and vavg
to the size of the non-degenerate time dimension
in the input file.
Thus the append (ncks -A
) command would be undefined (and
should fail) in this case.
Equally important is the ‘-C’ argument
(see Subsetting Coordinate Variables) to ncwa
to prevent
any scalar time variable from being written to the output file.
Knowing when to use ncwa -a time
rather than the default
ncra
for time-averaging takes, well, time.
ncap2
supports the standard mathematical functions supplied with
most operating systems.
Standard calculator notation is used for addition +, subtraction
-, multiplication *, division /, exponentiation
^, and modulus %.
The available elementary mathematical functions are:
abs(x)
Absolute value Absolute value of x. Example: abs(-1) = 1
acos(x)
Arc-cosine Arc-cosine of x, returned in radians, where x is between -1 and 1. Example: acos(1.0) = 0.0
acosh(x)
Hyperbolic arc-cosine Hyperbolic arc-cosine of x, where x is greater than or equal to 1. Example: acosh(1.0) = 0.0
asin(x)
Arc-sine Arc-sine of x, returned in radians, where x is between -1 and 1. Example: asin(1.0) = 1.57079632679489661922
asinh(x)
Hyperbolic arc-sine Hyperbolic arc-sine of x. Example: asinh(1.0) = 0.88137358702
atan(x)
Arc-tangent Arc-tangent of x, returned in radians between -pi/2 and pi/2. Example: atan(1.0) = 0.78539816339744830961
atan2(y,x)
Arc-tangent2 Arc-tangent of y/x, returned in radians. Example: atan2(1,3) = 0.321750554
atanh(x)
Hyperbolic arc-tangent Hyperbolic arc-tangent of x, where x is between -1 and 1. Example: atanh(0.761594155956) = 1.0
ceil(x)
Ceil Ceiling of x. Smallest integral value not less than argument. Example: ceil(0.1) = 1.0
cos(x)
Cosine Cosine of x where x is specified in radians. Example: cos(0.0) = 1.0
cosh(x)
Hyperbolic cosine Hyperbolic cosine of x where x is specified in radians. Example: cosh(0.0) = 1.0
erf(x)
Error function Error function of x. The result lies between -1 and 1. Example: erf(1.0) = 0.842701
erfc(x)
Complementary error function Complementary error function of x. The result lies between 0 and 2. Example: erfc(1.0) = 0.15729920705
exp(x)
Exponential Exponential of x, e^x. Example: exp(1.0) = 2.71828182845904523536
floor(x)
Floor Floor of x. Largest integral value not greater than argument. Example: floor(1.9) = 1
gamma(x)
Gamma function Gamma function of x, Gamma(x). The well-known and loved continuous factorial function. Example: gamma(0.5) = sqrt(pi)
gamma_inc_P(a,x)
Incomplete Gamma function Incomplete Gamma function of parameter a and variable x, gamma_inc_P(a,x). One of the four incomplete gamma functions. Example: gamma_inc_P(1,1) = 1-1/e
ln(x)
Natural Logarithm Natural logarithm of x, ln(x). Example: ln(2.71828182845904523536) = 1.0
log(x)
Natural Logarithm Exact synonym for ln(x).
log10(x)
Base 10 Logarithm Base 10 logarithm of x, log10(x). Example: log10(10.0) = 1.0
nearbyint(x)
Round inexactly Nearest integer to x is returned in floating-point format. No exceptions are raised for inexact conversions. Example: nearbyint(0.1) = 0.0
pow(x,y)
Power
Value of x is raised to the power of y.
Exceptions are raised for domain errors.
Due to type-limitations in the C language pow
function,
integer arguments are promoted (see Type Conversion) to type
NC_FLOAT
before evaluation.
Example:
pow(2,3) = 8
rint(x)
Round exactly Nearest integer to x is returned in floating-point format. Exceptions are raised for inexact conversions. Example: rint(0.1) = 0
round(x)
Round Nearest integer to x is returned in floating-point format. Round halfway cases away from zero, regardless of current IEEE rounding direction. Example: round(0.5) = 1.0
sin(x)
Sine Sine of x where x is specified in radians. Example: sin(1.57079632679489661922) = 1.0
sinh(x)
Hyperbolic sine Hyperbolic sine of x where x is specified in radians. Example: sinh(1.0) = 1.1752
sqrt(x)
Square Root Square Root of x, sqrt(x). Example: sqrt(4.0) = 2.0
tan(x)
Tangent Tangent of x where x is specified in radians. Example: tan(0.78539816339744830961) = 1.0
tanh(x)
Hyperbolic tangent Hyperbolic tangent of x where x is specified in radians. Example: tanh(1.0) = 0.761594155956
trunc(x)
Truncate Nearest integer to x is returned in floating-point format. Round halfway cases toward zero, regardless of current IEEE rounding direction. Example: trunc(0.5) = 0.0
The complete list of mathematical functions supported is platform-specific. Functions mandated by ANSI C are guaranteed to be present and are indicated with an asterisk. Use the ‘-f’ (or ‘fnc_tbl’ or ‘prn_fnc_tbl’) switch to print a complete list of functions supported on your platform.
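Several of the example values in the function list can be reproduced with any standard math library. The Python sketch below compares a selection of them against Python's math module; it is a sanity check of the arithmetic, not a test of ncap2, and the tolerance is an arbitrary choice that allows for the rounded values quoted in the list.

```python
import math

# (label, value computed by Python, value quoted in the function list)
checks = [
    ("asin(1.0)", math.asin(1.0), 1.57079632679489661922),
    ("asinh(1.0)", math.asinh(1.0), 0.88137358702),
    ("atan(1.0)", math.atan(1.0), 0.78539816339744830961),
    ("erf(1.0)", math.erf(1.0), 0.842701),
    ("erfc(1.0)", math.erfc(1.0), 0.15729920705),
    ("exp(1.0)", math.exp(1.0), 2.71828182845904523536),
    ("gamma(0.5)", math.gamma(0.5), math.sqrt(math.pi)),
    ("sinh(1.0)", math.sinh(1.0), 1.1752),
    ("tanh(1.0)", math.tanh(1.0), 0.761594155956),
]

all_ok = True
for label, computed, quoted in checks:
    ok = math.isclose(computed, quoted, rel_tol=1e-5)
    all_ok = all_ok and ok
    print(label, computed, "ok" if ok else "MISMATCH")

# atan2(y,x) is the arc-tangent of y/x when x is positive
print("atan2(1,3)", math.atan2(1, 3), math.atan(1 / 3))
```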
This page lists the ncap2
operators in order of precedence (highest to lowest). Their associativity indicates in what order operators of equal precedence in an expression are applied.
Operator | Description | Associativity |
---|---|---|
++ -- | Postfix Increment/Decrement | Right to Left |
() | Parentheses (function call) | |
. | Method call | |
++ -- | Prefix Increment/Decrement | Right to Left |
+ - | Unary Plus/Minus | |
! | Logical Not | |
^ | Power of Operator | Right to Left |
* / % | Multiply/Divide/Modulus | Left to Right |
+ - | Addition/Subtraction | Left to Right |
>> << | Fortran style array clipping | Left to Right |
< <= | Less than/Less than or equal to | Left to Right |
> >= | Greater than/Greater than or equal to | |
== != | Equal to/Not equal to | Left to Right |
&& | Logical AND | Left to Right |
|| | Logical OR | Left to Right |
?: | Ternary Operator | Right to Left |
= | Assignment | Right to Left |
+= -= | Addition/subtraction assignment | |
*= /= | Multiplication/division assignment | |
In this section a name refers to a variable, attribute, or dimension name. The allowed characters in a valid netCDF name vary from release to release. (See end section). To use metacharacters in a name, or to use a method name as a variable name, the name must be quoted wherever it occurs.
The default NCO name is specified by the regular expressions:
DGT: ('0'..'9');
LPH: ( 'a'..'z' | 'A'..'Z' | '_' );
name: (LPH)(LPH|DGT)+
The first character of a valid name must be alphabetic or the underscore. Subsequent characters must be alphanumeric or underscore, e.g., a1, _23, hell_is_666.
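The default-name rule can be tried out with an ordinary regular expression. The Python sketch below is a literal transcription of the grammar as written above (a leading letter or underscore followed by one or more letters, digits, or underscores); it is an illustration, not the NCO lexer, and the function name is_default_name is hypothetical. Because the grammar as printed uses +, the transcription requires at least two characters, which may be stricter than what ncap2 accepts in practice.

```python
import re

# LPH = letter or underscore, DGT = digit; name = (LPH)(LPH|DGT)+
NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]+")


def is_default_name(name):
    """True if name matches the default (unquoted) name pattern as written."""
    return NAME_RE.fullmatch(name) is not None


for candidate in ["a1", "_23", "hell_is_666", "1a", "t-1", "set@miss"]:
    print(candidate, is_default_name(candidate))
```

Names that fail this test, such as those beginning with a digit or containing '-' or '@', are the ones that need quoting.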
The valid characters in a quoted name are specified by the regular expressions:
LPHDGT: ( 'a'..'z' | 'A'..'Z' | '_' | '0'..'9');
name: (LPHDGT|'-'|'+'|'.'|'('|')'|':' )+ ;
Quote a variable:
'avg' , '10_+10','set_miss' '+-90field' , '--test'=10.0d
Quote an attribute:
'three@10', 'set_mss@+10', '666@hell', 't1@+units'="kelvin"
Quote a dimension:
'$10', '$t1--', '$--odd', c1['$10','$t1--']=23.0d
The following comments are from the netCDF library definitions and
detail the naming conventions for each release.
netcdf-3.5.1
netcdf-3.6.0-p1
netcdf-3.6.1
netcdf-3.6.2
/*
 * ( [a-zA-Z]|[0-9]|'_'|'-'|'+'|'.'|'|':'|'@'|'('|')' )+
 * Verify that name string is valid CDL syntax, i.e., all characters are
 * alphanumeric, '-', '_', '+', or '.'.
 * Also permit ':', '@', '(', or ')' in names for chemists currently making
 * use of these characters, but don't document until ncgen and ncdump can
 * also handle these characters in names.
 */
netcdf-3.6.3
netcdf-4.0 Final 2008/08/28
/*
 * Verify that a name string is valid syntax. The allowed name
 * syntax (in RE form) is:
 *
 * ([a-zA-Z_]|{UTF8})([^\x00-\x1F\x7F/]|{UTF8})*
 *
 * where UTF8 represents a multibyte UTF-8 encoding. Also, no
 * trailing spaces are permitted in names. This definition
 * must be consistent with the one in ncgen.l. We do not allow '/'
 * because HDF5 does not permit slashes in names as slash is used as a
 * group separator. If UTF-8 is supported, then a multi-byte UTF-8
 * character can occur anywhere within an identifier. We later
 * normalize UTF-8 strings to NFC to facilitate matching and queries.
 */
The ncap2
custom function ’make_bounds()’ takes any monotonic 1D coordinate variable with regular or irregular (e.g., Gaussian) spacing and creates a bounds variable.
<bounds_var_out>=make_bounds( <coordinate_var_in>, <dim in>, <string>)
<coordinate_var_in>
The name of the input coordinate variable.
<dim in>
The second dimension of the output variable, referenced as a dimension (i.e., the name preceded by a dollar sign), not as a string name.
The size of this dimension should always be 2.
If the dimension does not yet exist, create it first using defdim().
<string>
This optional string argument will be placed in the "bounds" attribute that will be created in the input coordinate variable. Normally this is the name of the bounds variable.
Typical usage:
defdim("nv",2);
longitude_bounds=make_bounds(longitude,$nv,"longitude_bounds");
Another common CF convention:
defdim("nv",2);
climatology_bounds=make_bounds(time,$nv,"climatology_bounds");
<zenith_out>=solar_zenith_angle( <time_in>, <latitude in>)
This function takes two arguments, mean local solar time and latitude.
Calculation and output are done with type NC_DOUBLE.
The calendar attribute for <time_in> is NOT read and is assumed to be Gregorian (this is the calendar that UDUnits uses).
As part of the calculation, <time_in> is converted to days since the start of the year.
For some input units, e.g., seconds, this function may produce gobbledygook.
The output <zenith_out> is in degrees.
For more details of the algorithm used, please examine the function solar_geometry() in fmc_all_cls.cc.
Note that this routine does not account for the equation of time, and so can be in error by the angular equivalent of up to about fifteen minutes of time, depending on the day of year.
my_time[time]={10.50, 11.0, 11.50, 12.0, 12.5, 13.0, 13.5, 14.0, 14.50, 15.00};
my_time@units="hours since 2017-06-21";
// Assume we are at Equator
latitude=0.0;
// 32.05428, 27.61159, 24.55934, 23.45467, 24.55947, 27.61184, 32.05458, 37.39353, 43.29914, 49.55782 ;
zenith=solar_zenith_angle(my_time,latitude);
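The values in the comment above can be roughly cross-checked with the textbook zenith-angle relation. The awk sketch below is not the NCO implementation: it takes the solar declination as an explicit third argument (an assumption of this example, whereas the NCO function derives it from the date), and the helper name zenith_deg is made up.

```shell
#!/bin/sh
# Standard zenith-angle relation, ignoring the equation of time:
#   cos(z) = sin(lat)*sin(dec) + cos(lat)*cos(dec)*cos(h),  h = 15 deg * (t - 12)
# Arguments: local solar time (hours), latitude (deg), solar declination (deg).
zenith_deg() {
    awk -v t="$1" -v lat="$2" -v dec="$3" 'BEGIN {
        pi = atan2(0, -1); d2r = pi / 180
        h = 15 * (t - 12) * d2r                      # hour angle in radians
        c = sin(lat*d2r)*sin(dec*d2r) + cos(lat*d2r)*cos(dec*d2r)*cos(h)
        z = atan2(sqrt(1 - c*c), c)                  # acos(c) via atan2
        printf "%.3f\n", z / d2r
    }'
}

zenith_deg 12.0 0 23.45467   # noon at the Equator: zenith equals the declination
zenith_deg 10.5 0 23.45467   # about 32 degrees, close to the first listed value
```

Because the declination is held fixed here, the results are symmetric about noon, whereas the listed values drift slightly through the day.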
ncatted
netCDF Attribute Editor
SYNTAX
ncatted [-a att_dsc] [-a ...] [-D dbg] [-H] [-h] [--hdr_pad nbr] [--hpss] [-l path] [-O] [-o output-file] [-p path] [-R] [-r] [--ram_all] [-t] input-file [[output-file]]
DESCRIPTION
ncatted edits attributes in a netCDF file.
If you are editing attributes then you are spending too much time in the world of metadata, and ncatted was written to get you back out as quickly and painlessly as possible.
ncatted can append, create, delete, modify, and overwrite attributes (all explained below).
ncatted allows each editing operation to be applied to every variable in a file.
This saves time when changing attribute conventions throughout a file.
ncatted is for writing attributes.
To read attribute values in plain text, use ncks -m -M, or define something like ncattget as a shell command (see Filters for ncks).
Because repeated use of ncatted can considerably increase the size of the history global attribute (see History Attribute), the ‘-h’ switch is provided to override automatically appending the command to the history global attribute in the output-file.
According to the netCDF User Guide, altering metadata in netCDF files does not incur the penalty of recopying the entire file when the new metadata occupies less space than the old metadata.
Thus ncatted may run much faster (at least on netCDF3 files) if judicious use of header padding (see Metadata Optimization) was made when producing the input-file.
Similarly, using the ‘--hdr_pad’ option with ncatted helps ensure that future metadata changes to output-file occur as swiftly as possible.
When ncatted is used to change the _FillValue attribute, it changes the associated missing data self-consistently.
If the internal floating-point representation of a missing value, e.g., 1.0e36, differs between two machines then netCDF files produced on those machines will have incompatible missing values.
This allows ncatted to change the missing values in files from different machines to a single value so that the files may then be concatenated, e.g., by ncrcat, without losing information.
See Missing values, for more information.
To master ncatted one must understand the meaning of the structure that describes the attribute modification, att_dsc, specified by the required option ‘-a’ or ‘--attribute’.
This option is repeatable and may be used multiple times in a single ncatted invocation to increase the efficiency of altering multiple attributes.
Each att_dsc contains five elements.
This makes using ncatted somewhat complicated, though powerful.
The att_dsc fields are in the following order:
att_dsc = att_nm, var_nm, mode, att_type, att_val
att_nm
Attribute name. Example: units.
As of NCO 4.5.1 (July, 2015), ncatted accepts regular expressions (see Subsetting Files) for attribute names (it has “always” accepted regular expressions for variable names).
Regular expressions will select all matching attribute names.
var_nm
Variable name. Example: pressure, '^H2O'.
Regular expressions (see Subsetting Files) are accepted and will select all matching variable (and/or group) names.
The names global and group have special meaning.
mode
Edit mode abbreviation. Example: a.
See below for complete listing of valid values of mode.
att_type
Attribute type abbreviation. Example: c.
See below for complete listing of valid values of att_type.
att_val
Attribute value. Example: pascal.
There should be no empty space between these five consecutive arguments. The description of these arguments follows in their order of appearance.
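As a small illustration of the field order and the no-spaces rule, the hypothetical helper below (att_dsc is not an NCO command) joins five fields with commas. Fields that are left blank are passed as empty strings.

```shell
#!/bin/sh
# Hypothetical helper: join the five att_dsc fields (att_nm, var_nm, mode,
# att_type, att_val) with commas and no spaces around the separators.
att_dsc() {
    printf '%s,%s,%s,%s,%s\n' "$1" "$2" "$3" "$4" "$5"
}

att_dsc units pressure o c pascal   # prints units,pressure,o,c,pascal
att_dsc units '' d '' ''            # prints units,,d,,
```

The resulting string is what follows ‘-a’ on the ncatted command line; it needs shell quoting whenever att_val contains spaces or other characters special to the shell.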
The value of att_nm is the name of the attribute to edit.
The meaning of this should be clear to all ncatted users.
Both att_nm and var_nm may be specified as regular expressions.
If att_nm is omitted (i.e., left blank) and Delete mode is selected, then all attributes associated with the specified variable will be deleted.
The value of var_nm is the name of the variable containing the attribute (named att_nm) that you want to edit.
There are three very important and useful exceptions to this rule.
The value of var_nm can also be used to direct ncatted to edit global attributes, or to repeat the editing operation for every group or variable in a file.
A value of var_nm of global indicates that att_nm refers to a global (i.e., root-level) attribute, rather than to a particular variable’s attribute.
This is the method ncatted supports for editing global attributes.
A value of var_nm of group indicates that att_nm refers to all groups, rather than to a particular variable’s or group’s attribute.
The operation will proceed to edit group metadata for every group.
Finally, if var_nm is left blank, then ncatted attempts to perform the editing operation on every variable in the file.
This option may be convenient to use if you decide to change the conventions you use for describing the data.
As of NCO 4.6.0 (May, 2016), ncatted accepts the ‘-t’ (or long-option equivalent ‘--typ_mch’ or ‘--type_match’) option.
This causes ncatted to perform the editing operation only on variables that are the same type as the specified attribute.
The value of mode is a single character abbreviation (a, c, d, m, n, o, or p) standing for one of seven editing modes:
a
Append. Append value att_val to current var_nm attribute att_nm value att_val, if any. If var_nm does not already have an existing attribute att_nm, it is created with the value att_val.
c
Create. Create variable var_nm attribute att_nm with att_val if att_nm does not yet exist. If var_nm already has an attribute att_nm, there is no effect, so the existing attribute is preserved without change.
d
Delete. Delete current var_nm attribute att_nm. If var_nm does not have an attribute att_nm, there is no effect. If att_nm is omitted (left blank), then all attributes associated with the specified variable are automatically deleted. When Delete mode is selected, the att_type and att_val arguments are superfluous and may be left blank.
m
Modify. Change value of current var_nm attribute att_nm to value att_val. If var_nm does not have an attribute att_nm, there is no effect.
n
Nappend. Append value att_val to the current value of var_nm attribute att_nm, if att_nm already exists. If var_nm does not have an attribute att_nm, there is no effect. In other words, if att_nm already exists, Nappend behaves like Append; otherwise it does nothing. The mnemonic is “non-create append”. Nappend mode was added to ncatted in version 4.6.0 (May, 2016).
o
Overwrite. Write attribute att_nm with value att_val to variable var_nm, overwriting existing attribute att_nm, if any. This is the default mode.
p
Prepend. Prepend value att_val to the current value of var_nm attribute att_nm, if att_nm already exists. If var_nm does not have an attribute att_nm, there is no effect. Prepend mode was added to ncatted in version 5.0.5 (January, 2022).
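The seven modes differ mainly in what happens when the attribute does or does not already exist. The toy model below restates the rules listed above for a single NC_CHAR attribute. It is an illustration written for this guide, not NCO code, and the function name edit_mode and its argument layout are hypothetical.

```shell
#!/bin/sh
# Toy model of the seven edit modes for one NC_CHAR attribute.
# $1 = mode, $2 = yes/no (does the attribute already exist?),
# $3 = current value, $4 = att_val.
# Prints the resulting value, or <absent> if no attribute remains.
edit_mode() {
    mode=$1; exists=$2; old=$3; new=$4
    case "$mode,$exists" in
        a,yes|n,yes)         printf '%s\n' "$old$new" ;;   # append to existing
        p,yes)               printf '%s\n' "$new$old" ;;   # prepend to existing
        a,no|c,no|o,*|m,yes) printf '%s\n' "$new" ;;       # attribute set to att_val
        c,yes)               printf '%s\n' "$old" ;;       # create never overwrites
        d,*|m,no|n,no|p,no)  printf '%s\n' '<absent>' ;;   # deleted, or no effect
        *) echo "unknown mode: $mode" >&2; return 1 ;;
    esac
}

edit_mode a yes 'v1; ' 'v2'   # prints v1; v2
edit_mode n no  ''     'v2'   # prints <absent>
```

Comparing a with n, and o with m, shows the pattern: the first of each pair creates a missing attribute, the second leaves it missing.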
The value of att_type is a single character abbreviation (f, d, l, i, s, c, b, u) or a short string standing for one of the twelve primitive netCDF data types:
f
Float. Value(s) specified in att_val will be stored as netCDF intrinsic type NC_FLOAT.
d
Double. Value(s) specified in att_val will be stored as netCDF intrinsic type NC_DOUBLE.
i, l
Integer or (its now deprecated synonym) Long. Value(s) specified in att_val will be stored as netCDF intrinsic type NC_INT.
s
Short. Value(s) specified in att_val will be stored as netCDF intrinsic type NC_SHORT.
c
Char. Value(s) specified in att_val will be stored as netCDF intrinsic type NC_CHAR.
b
Byte. Value(s) specified in att_val will be stored as netCDF intrinsic type NC_BYTE.
ub
Unsigned Byte. Value(s) specified in att_val will be stored as netCDF intrinsic type NC_UBYTE.
us
Unsigned Short. Value(s) specified in att_val will be stored as netCDF intrinsic type NC_USHORT.
u, ui, ul
Unsigned Int. Value(s) specified in att_val will be stored as netCDF intrinsic type NC_UINT.
ll, int64
Int64. Value(s) specified in att_val will be stored as netCDF intrinsic type NC_INT64.
ull, uint64
Uint64. Value(s) specified in att_val will be stored as netCDF intrinsic type NC_UINT64.
sng, string
String. Value(s) specified in att_val will be stored as netCDF intrinsic type NC_STRING.
Note that ncatted handles type NC_STRING attributes correctly beginning with version 4.3.3 released in July, 2013.
Earlier versions fail when asked to handle NC_STRING attributes.
In Delete mode the specification of att_type is optional (and is ignored if supplied).
The value of att_val is what you want to change attribute att_nm to contain.
The specification of att_val is optional (and is ignored) in Delete mode.
Attribute values for all types besides NC_CHAR must have an attribute length of at least one.
Thus att_val may be a single value or one-dimensional array of elements of type att_type.
If the att_val is not set or is set to empty space, and the att_type is NC_CHAR, e.g., -a units,T,o,c,"" or -a units,T,o,c, then the corresponding attribute is set to have zero length.
When specifying an array of values, it is safest to enclose att_val in single or double quotes, e.g., -a levels,T,o,s,"1,2,3,4" or -a levels,T,o,s,'1,2,3,4'.
The quotes are strictly unnecessary around att_val except when att_val contains characters which would confuse the calling shell, such as spaces, commas, and wildcard characters.
NCO processing of NC_CHAR attributes is a bit like Perl in that it attempts to do what you want by default (but this sometimes causes unexpected results if you want unusual data storage).
If the att_type is NC_CHAR then the argument is interpreted as a string and it may contain C-language escape sequences, e.g., \n, which NCO will interpret before writing anything to disk.
NCO translates valid escape sequences and stores the appropriate ASCII code instead.
Since two-byte escape sequences, e.g., \n, represent one-byte ASCII codes, e.g., ASCII 10 (decimal), the stored string attribute is one byte shorter than the input string length for each embedded escape sequence.
The most frequently used C-language escape sequences are \n (for linefeed) and \t (for horizontal tab).
These sequences in particular allow convenient editing of formatted text attributes.
The other valid escape sequences are \a, \b, \f, \r, \v, and \\.
See ncks netCDF Kitchen Sink, for more examples of string formatting (with the ncks ‘-s’ option) with special characters.
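The one-byte difference per escape sequence can be seen with the shell's own printf, used here purely as an analogy for the translation NCO performs (it is not NCO code). The variable names are arbitrary.

```shell
#!/bin/sh
# Count bytes before and after translating the two-character sequence \n
# into a single linefeed, using printf %s (literal) and %b (escapes expanded).
raw='Data version 2.0\n'
len_raw=$(( $(printf '%s' "$raw" | wc -c) ))      # backslash and n counted separately
len_cooked=$(( $(printf '%b' "$raw" | wc -c) ))   # \n collapsed to one linefeed byte
echo "typed: $len_raw bytes, stored: $len_cooked bytes"
```

The script prints "typed: 18 bytes, stored: 17 bytes", i.e., one byte fewer for the single embedded escape sequence.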
Analogous to printf, other special characters are also allowed by ncatted if they are “protected” by a backslash.
The characters ", ', ?, and \ may be input to the shell as \", \', \?, and \\.
NCO simply strips away the leading backslash from these characters before editing the attribute.
No other characters require protection by a backslash.
Backslashes which precede any other character (e.g., 3, m, $, |, &, @, %, {, and }) will not be filtered and will be included in the attribute.
Note that the NUL character \0 which terminates C language strings is assumed and need not be explicitly specified.
If \0 is input, it is translated to the NUL character.
However, this will make the subsequent portion of the string, if any, invisible to C standard library string functions, which may cause unintended consequences.
Because of these context-sensitive rules, one must use ncatted with care in order to store data, rather than text strings, in an attribute of type NC_CHAR.
Note that ncatted interprets character attributes (i.e., attributes of type NC_CHAR) as strings.
EXAMPLES
Append the string Data version 2.0\n to the global attribute history:
ncatted -a history,global,a,c,'Data version 2.0\n' in.nc
Note the use of embedded C language printf()-style escape sequences.
Change the value of the long_name attribute for variable T from whatever it currently is to “temperature”:
ncatted -a long_name,T,o,c,temperature in.nc
Many model and observational datasets use missing values that are not annotated in the standard manner.
For example, at the time (2015–2018) of this writing, the MPAS ocean and ice models use -9.99999979021476795361e+33 as the missing value, yet do not store a _FillValue attribute with any variables.
To prevent arithmetic from treating these values as normal, designate this value as the _FillValue attribute:
ncatted -a _FillValue,,o,d,-9.99999979021476795361e+33 in.nc
ncatted -t -a _FillValue,,o,d,-9.99999979021476795361e+33 in.nc
ncatted -t -a _FillValue,,o,d,-9.99999979021476795361e+33 \
        -a _FillValue,,o,f,1.0e36 -a _FillValue,,o,i,-999 in.nc
The first example adds the attribute to all variables. The ‘-t’ switch causes the second example to add the attribute only to double precision variables. This is often more useful, and can be used to provide distinct missing value attributes to each numeric type, as in the third example.
NCO arithmetic operators may not work as expected on
IEEE NaN (short for Not-a-Number) and NaN-like numbers such
as positive infinity and negative infinity
67.
One way to work around this problem is to change IEEE NaNs to normal missing values.
As of NCO 4.1.0 (March, 2012), ncatted works with NaNs (though none of NCO’s arithmetic operators do).
This limited support enables users to change NaN to a normal number before performing arithmetic or propagating a NaN-tainted dataset.
First set the missing value (i.e., the value of the _FillValue attribute) for the variable(s) in question to the IEEE NaN value.
ncatted -a _FillValue,,o,f,NaN in.nc
Then change the missing value from the IEEE NaN value to a normal IEEE number, like 1.0e36 (or to whatever the original missing value was).
ncatted -a _FillValue,,m,f,1.0e36 in.nc
Some NASA MODIS datasets provide a real-world example.
ncatted -O -a _FillValue,,m,d,1.0e36 -a missing_value,,m,d,1.0e36 \
        MODIS_L2N_20140304T1120.nc MODIS_L2N_20140304T1120_noNaN.nc
Delete all existing units attributes:
ncatted -a units,,d,, in.nc
The value of var_nm was left blank in order to select all variables in the file. The values of att_type and att_val were left blank because they are superfluous in Delete mode.
Delete all attributes associated with the tpt variable, and delete all global attributes:
ncatted -a ,tpt,d,, -a ,global,d,, in.nc
The value of att_nm was left blank in order to select all attributes associated with the variable.
The second ‘-a’ option does the same for global attributes by using global in place of tpt.
Modify all existing units attributes to meter second-1:
ncatted -a units,,m,c,'meter second-1' in.nc
Add a units attribute of kilogram kilogram-1 to all variables whose first three characters are ‘H2O’:
ncatted -a units,'^H2O',c,c,'kilogram kilogram-1' in.nc
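Remove the _FillValue attribute from lat and lon variables:
ncatted -O -a _FillValue,'^(lat|lon)$',d,, in.nc
The anchored alternation selects exactly these two names; bracket expressions such as [lat] would instead match any name that contains one of the listed characters.
Overwrite the quanta attribute of variable energy to an array of four integers:
ncatted -a quanta,energy,o,s,'010,101,111,121' in.nc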
As of NCO 3.9.6 (January, 2009), ncatted accepts extended regular expressions as arguments for variable names, and, since NCO 4.5.1 (July, 2015), for attribute names.
ncatted -a isotope,'^H2O*',c,s,'18' in.nc
ncatted -a '.?_iso19115$','^H2O*',d,, in.nc
The first example creates isotope attributes for all variables whose names begin with ‘H2’ (the pattern ^H2O* matches ‘H2’ followed by zero or more ‘O’ characters at the start of a name).
The second deletes all attributes whose names end in _iso19115 from the same set of variables.
See Subsetting Files for more details on using regular expressions.
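Before handing a regular expression to ncatted, it can help to preview which names it would select. The sketch below uses grep -E as a stand-in for NCO's extended-regular-expression matching, assuming a name is selected when the pattern matches anywhere in it; the variable names and the match helper are made up for this illustration.

```shell
#!/bin/sh
# Preview which (hypothetical) variable names a pattern would select.
names='lat
lon
level
time
H2
H2O
H2OL
CO2_H2O'

# Print the matching names on one line, separated by spaces.
match() { printf '%s\n' "$names" | grep -E -- "$1" | tr '\n' ' '; echo; }

match '[lat]|[lon]'    # bracket expressions: any name containing l, a, t, o, or n
match '^(lat|lon)$'    # anchored alternation: exactly lat or lon
match '^H2O*'          # names starting with H2, followed by zero or more O
match 'H2O'            # names containing H2O anywhere
```

With these names, the first pattern also selects level and time, while the second selects only lat and lon. This is why anchors and grouping matter when the edit mode is destructive.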
As of NCO 4.3.8 (November, 2013), ncatted accepts full and partial group paths in names of attributes, variables, dimensions, and groups.
# Overwrite units attribute of specific 'lon' variable
ncatted -O -a units,/g1/lon,o,c,'degrees_west' in_grp.nc
# Overwrite units attribute of all 'lon' variables
ncatted -O -a units,lon,o,c,'degrees_west' in_grp.nc
# Delete units attribute of all 'lon' variables
ncatted -O -a units,lon,d,, in_grp.nc
# Overwrite units attribute with new type for specific 'lon' variable
ncatted -O -a units,/g1/lon,o,sng,'degrees_west' in_grp.nc
# Add new_att attribute to all variables
ncatted -O -a new_att,,c,sng,'new variable attribute' in_grp.nc
# Add new_grp_att group attribute to all groups
ncatted -O -a new_grp_att,group,c,sng,'new group attribute' in_grp.nc
# Add new_grp_att group attribute to single group
ncatted -O -a g1_grp_att,g1,c,sng,'new group attribute' in_grp.nc
# Add new_glb_att global attribute to root group
ncatted -O -a new_glb_att,global,c,sng,'new global attribute' in_grp.nc
Note that regular expressions work well in conjunction with group path support. In other words, the variable name (including group path component) and the attribute names may both be extended regular expressions.
Demonstrate input of C-language escape sequences (e.g., \n) and other special characters (e.g., \"):
ncatted -h -a special,global,o,c,'\nDouble quote: \"\nTwo consecutive double quotes: \"\"\n Single quote: Beyond my shell abilities!\nBackslash: \\\n Two consecutive backslashes: \\\\\nQuestion mark: \?\n' in.nc
Note that the entire attribute is protected from the shell by single quotes. These outer single quotes are necessary for interactive use, but may be omitted in batch scripts.
Although ncatted accepts multiple ‘-a att_dsc’ options simultaneously, modifying lengthy commands can become unwieldy.
To preserve simplicity in storing/modifying multiple attribute edits, consider storing the options separately in a text file and assembling them at run-time to generate and submit the correct command.
One such method uses the xargs command to intermediate between an on-disk list of attributes to change and the ncatted command.
For example, use an intermediate file named opt.txt to store one option per line thusly
cat > opt.txt << EOF
-a institution,global,m,c,\"Super Cool University\"
-a source,global,c,c,\"My Awesome Radar\"
-a contributors,global,c,c,\"Enrico Fermi, Galileo Galilei, Leonardo Da Vinci\"
...
EOF
The backslashes preserve the whitespace in the individual attributes for correct parsing by the shell.
Simply substituting the expansion of this file through xargs directly on the command line fails to work (why?).
However, a simple workaround is to use xargs to construct the command string, and execute that string with eval:
opt=$(cat opt.txt | xargs)
cmd="ncatted -O ${opt} in.nc out.nc"
eval $cmd
This procedure can be modified to employ more complex option pre-processing using other tools such as Awk, Perl, or Python.
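One possible variation, sketched below, avoids eval by reading one att_dsc per line (without the leading -a and without any shell quoting) and building the argument list directly. The function name ncatted_from_file and the NCATTED override variable are hypothetical conveniences introduced for this sketch, not part of NCO.

```shell
#!/bin/sh
# Hypothetical helper: apply every att_dsc stored in a text file (one per
# line) with a single ncatted call, without eval.
ncatted_from_file() {
    opt_fl=$1; fl_in=$2; fl_out=$3
    set --                                # start with an empty argument list
    while IFS= read -r att_dsc; do
        [ -n "$att_dsc" ] || continue     # skip blank lines
        set -- "$@" -a "$att_dsc"         # each line stays one argument
    done < "$opt_fl"
    # NCATTED is an assumed override for dry runs; the default is ncatted
    ${NCATTED:-ncatted} -O "$@" "$fl_in" "$fl_out"
}

# Usage (requires NCO and an existing in.nc):
#   printf '%s\n' 'institution,global,m,c,Super Cool University' > att.txt
#   ncatted_from_file att.txt in.nc out.nc
```

Because every line is passed as a single quoted argument, embedded spaces and commas survive without backslashes in the file.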
ncbo
netCDF Binary Operator
SYNTAX
ncbo [-3] [-4] [-5] [-6] [-7] [-A] [-C] [-c] [--cmp cmp_sng] [--cnk_byt sz_byt] [--cnk_csh sz_byt] [--cnk_dmn nm,sz_lmn] [--cnk_map map] [--cnk_min sz_byt] [--cnk_plc plc] [--cnk_scl sz_lmn] [-D dbg] [-d dim,[min][,[max][,[stride]]] [-F] [--fl_fmt fl_fmt] [-G gpe_dsc] [-g grp[,...]] [--glb ...] [-H] [-h] [--hdr_pad nbr] [--hpss] [-L dfl_lvl] [-l path] [--no_cll_msr] [--no_frm_trm] [--no_tmp_fl] [-O] [-o file_3] [-p path] [--qnt ...] [--qnt_alg alg_nm] [-R] [-r] [--ram_all] [-t thr_nbr] [--unn] [-v var[,...]] [-X ...] [-x] [-y op_typ] file_1 file_2 [file_3]
DESCRIPTION
ncbo performs binary operations on variables in file_1 and the corresponding variables (those with the same name) in file_2 and stores the results in file_3.
The binary operation operates on the entire files (modulo any excluded variables).
See Missing values, for treatment of missing values.
One of the four standard arithmetic binary operations currently supported must be selected with the ‘-y op_typ’ switch (or long options ‘--op_typ’ or ‘--operation’).
The valid binary operations for ncbo, their definitions, corresponding values of the op_typ key, and alternate invocations are:
Addition
Definition: file_3 = file_1 + file_2
Alternate invocation: ncadd
op_typ key values: ‘add’, ‘+’, ‘addition’
Examples: ‘ncbo --op_typ=add 1.nc 2.nc 3.nc’, ‘ncadd 1.nc 2.nc 3.nc’
Subtraction
Definition: file_3 = file_1 - file_2
Alternate invocations: ncdiff, ncsub, ncsubtract
op_typ key values: ‘sbt’, ‘-’, ‘dff’, ‘diff’, ‘sub’, ‘subtract’, ‘subtraction’
Examples: ‘ncbo --op_typ=- 1.nc 2.nc 3.nc’, ‘ncdiff 1.nc 2.nc 3.nc’
Multiplication
Definition: file_3 = file_1 * file_2
Alternate invocations: ncmult, ncmultiply
op_typ key values: ‘mlt’, ‘*’, ‘mult’, ‘multiply’, ‘multiplication’
Examples: ‘ncbo --op_typ=mlt 1.nc 2.nc 3.nc’, ‘ncmult 1.nc 2.nc 3.nc’
Division
Definition: file_3 = file_1 / file_2
Alternate invocation: ncdivide
op_typ key values: ‘dvd’, ‘/’, ‘divide’, ‘division’
Examples: ‘ncbo --op_typ=/ 1.nc 2.nc 3.nc’, ‘ncdivide 1.nc 2.nc 3.nc’
Care should be taken when using the shortest form of key values, i.e., ‘+’, ‘-’, ‘*’, and ‘/’. Some of these single characters may have special meanings to the shell 68. Place these characters inside quotes to keep them from being interpreted (globbed) by the shell 69. For example, the following commands are equivalent
ncbo --op_typ=* 1.nc 2.nc 3.nc # Dangerous (shell may try to glob)
ncbo --op_typ='*' 1.nc 2.nc 3.nc # Safe ('*' protected from shell)
ncbo --op_typ="*" 1.nc 2.nc 3.nc # Safe ('*' protected from shell)
ncbo --op_typ=mlt 1.nc 2.nc 3.nc
ncbo --op_typ=mult 1.nc 2.nc 3.nc
ncbo --op_typ=multiply 1.nc 2.nc 3.nc
ncbo --op_typ=multiplication 1.nc 2.nc 3.nc
ncmult 1.nc 2.nc 3.nc # First do 'ln -s ncbo ncmult'
ncmultiply 1.nc 2.nc 3.nc # First do 'ln -s ncbo ncmultiply'
No particular argument or invocation form is preferred. Users are encouraged to use the forms which are most intuitive to them.
Normally, ncbo
will fail unless an operation type is specified
with ‘-y’ (equivalent to ‘--op_typ’).
You may create exceptions to this rule to suit your particular tastes,
in conformance with your site’s policy on symbolic links to
executables (files of a different name point to the actual executable).
For many years, ncdiff
was the main binary file operator.
As a result, many users prefer to continue invoking ncdiff
rather than memorizing a new command (‘ncbo -y sbt’) which
behaves identically to the original ncdiff
command.
However, from a software maintenance standpoint, maintaining a distinct
executable for each binary operation (e.g., ncadd
) is untenable,
and a single executable, ncbo
, is desirable.
To maintain backward compatibility, therefore, NCO
automatically creates a symbolic link from ncbo
to
ncdiff
.
Thus ncdiff
is called an alternate invocation of
ncbo
.
ncbo
supports many additional alternate invocations which must
be manually activated.
Should users or system administrators decide to activate them, the
procedure is simple.
For example, to use ‘ncadd’ instead of ‘ncbo --op_typ=add’,
simply create a symbolic link from ncbo
to ncadd
70.
The alternate invocations supported for each operation type are listed
above.
Alternatively, users may always define ‘ncadd’ as an alias to
‘ncbo --op_typ=add’
71.
It is important to maintain portability in NCO scripts. Therefore we recommend that site-specific invocations (e.g., ‘ncadd’) be used only in interactive sessions from the command-line. For scripts, we recommend using the full invocation (e.g., ‘ncbo --op_typ=add’). This ensures portability of scripts between users and sites.
ncbo
operates (e.g., adds) variables in file_2 with the
corresponding variables (those with the same name) in file_1 and
stores the results in file_3.
Variables in file_1 or file_2 are broadcast to conform
to the corresponding variable in the other input file if
necessary72.
Now ncbo
is completely symmetric with respect to file_1
and file_2, i.e.,
file_1 - file_2 = - (file_2 - file_1).
Broadcasting a variable means creating data in non-existing dimensions
by copying data in existing dimensions.
For example, a two dimensional variable in file_2 can be
subtracted from a four, three, or two (not one or zero)
dimensional variable (of the same name) in file_1
.
This functionality allows the user to compute anomalies from the mean.
In the future, we will broadcast variables in file_1, if necessary
to conform to their counterparts in file_2.
Thus, presently, the number of dimensions, or rank, of any
processed variable in file_1 must be greater than or equal to the
rank of the same variable in file_2.
Of course, the size of all dimensions common to both file_1 and
file_2 must be equal.
When computing anomalies from the mean it is often the case that
file_2 was created by applying an averaging operator to a file
with initially the same dimensions as file_1 (often file_1
itself).
In these cases, creating file_2 with ncra
rather than
ncwa
will cause the ncbo
operation to fail.
For concreteness say the record dimension in file_1
is
time
.
If file_2 was created by averaging file_1 over the
time
dimension with the ncra
operator (rather than with
the ncwa
operator), then file_2 will have a time
dimension of size 1 rather than having no time
dimension at
all
73.
In this case the input files to ncbo
, file_1 and
file_2, will have unequally sized time
dimensions which
causes ncbo
to fail.
To prevent this from occurring, use ncwa
to remove the
time
dimension from file_2.
See the example below.
ncbo
never operates on coordinate variables or variables
of type NC_CHAR
or NC_STRING
.
This ensures that coordinates (e.g., latitude and longitude) are
physically meaningful in the output file, file_3.
This behavior is hardcoded.
ncbo
applies special rules to some
CF-defined (and/or NCAR CCSM or NCAR CCM
fields) such as ORO
.
See CF Conventions for a complete description.
Finally, we note that ncflint
(see ncflint
netCDF File Interpolator) is designed for file interpolation.
As such, it also performs file subtraction, addition, and multiplication,
albeit in a more convoluted way than ncbo
.
Beginning with NCO version 4.3.1 (May, 2013), ncbo
supports group broadcasting.
Group broadcasting means processing data based on group patterns in the
input file(s) and automatically transferring or transforming groups to
the output file.
Consider the case where file_1 contains multiple groups each with
the variable v1, while file_2 contains v1 only in its
top-level (i.e., root) group.
Then ncbo
will replicate the group structure of file_1
in the output file, file_3.
Each group in file_3 contains the output of the corresponding
group in file_1 operating on the data in the single group in
file_2.
An example is provided below.
EXAMPLES
Say files 85_0112.nc and 86_0112.nc each contain 12 months of data. Compute the change in the monthly averages from 1985 to 1986:
ncbo 86_0112.nc 85_0112.nc 86m85_0112.nc
ncdiff 86_0112.nc 85_0112.nc 86m85_0112.nc
ncbo --op_typ=sub 86_0112.nc 85_0112.nc 86m85_0112.nc
ncbo --op_typ='-' 86_0112.nc 85_0112.nc 86m85_0112.nc
These commands are all different ways of expressing the same thing.
The following examples demonstrate the broadcasting feature of
ncbo
.
Say we wish to compute the monthly anomalies of T
from the yearly
average of T
for the year 1985.
First we create the 1985 average from the monthly data, which is stored
with the record dimension time
.
ncra 85_0112.nc 85.nc
ncwa -O -a time 85.nc 85.nc
The second command, ncwa
, gets rid of the time
dimension
of size 1 that ncra
left in 85.nc.
Now none of the variables in 85.nc has a time
dimension.
A quicker way to accomplish this is to use ncwa
from the
beginning:
ncwa -a time 85_0112.nc 85.nc
We are now ready to use ncbo
to compute the anomalies for 1985:
ncdiff -v T 85_0112.nc 85.nc t_anm_85_0112.nc
Each of the 12 records in t_anm_85_0112.nc now contains the
monthly deviation of T
from the annual mean of T
for each
gridpoint.
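As a toy illustration of the arithmetic behind the anomaly computation just described, the awk sketch below averages over time at each gridpoint and then subtracts that mean from every record. No netCDF is involved, and the three-column text layout (time index, gridpoint index, value) is an assumption made for this example. Reusing the same mean for every time index is the broadcasting step.

```shell
#!/bin/sh
# Toy anomaly calculation in plain awk.
# Input columns: time index, gridpoint index, value.
anomalies() {
    awk '
        { t[NR] = $1; x[NR] = $2; v[NR] = $3; sum[$2] += $3; cnt[$2]++ }
        END {
            # The per-gridpoint mean has no time dimension; it is reused
            # (broadcast) for every record at that gridpoint.
            for (i = 1; i <= NR; i++)
                printf "%s %s %g\n", t[i], x[i], v[i] - sum[x[i]] / cnt[x[i]]
        }'
}

printf '%s\n' '1 1 280' '1 2 290' '2 1 284' '2 2 296' | anomalies
```

For this input the gridpoint means are 282 and 293, so the four anomalies are -2, -3, 2, and 3, in input order.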
Say we wish to compute the monthly gridpoint anomalies from the zonal
annual mean.
A zonal mean is a quantity that has been averaged over the
longitudinal (or x) direction.
First we use ncwa
to average over longitudinal direction
lon
, creating 85_x.nc, the zonal mean of 85.nc.
Then we use ncbo
to subtract the zonal annual means from the
monthly gridpoint data:
ncwa -a lon 85.nc 85_x.nc
ncdiff 85_0112.nc 85_x.nc tx_anm_85_0112.nc
This example works assuming 85_0112.nc has dimensions
time
and lon
, and that 85_x.nc has no time
or lon
dimension.
Group broadcasting simplifies evaluation of multiple models against
observations.
Consider the input file cmip5.nc which contains multiple
top-level groups cesm
, ecmwf
, and giss
, each of
which contains the surface air temperature field tas
.
We wish to compare these models to observations stored in obs.nc
which contains tas
only in its top-level (i.e., root) group.
It is often the case that many models and/or model simulations exist,
whereas only one observational dataset does.
We evaluate the models and obtain the bias (difference) between models
and observations by subtracting obs.nc from cmip5.nc.
Then ncbo
“broadcasts” (i.e., replicates) the observational
data to match the group structure of cmip5.nc, subtracts,
and then stores the results in the output file, bias.nc
which has the same group structure as cmip5.nc.
% ncbo -O cmip5.nc obs.nc bias.nc
% ncks -H -v tas -d time,3 bias.nc
/cesm/tas
time[3] tas[3]=-1
/ecmwf/tas
time[3] tas[3]=0
/giss/tas
time[3] tas[3]=1
As a final example, say we have five years of monthly data (i.e.,
60 months) stored in 8501_8912.nc and we wish to create a
file which contains the twelve month seasonal cycle of the average
monthly anomaly from the five-year mean of this data.
The following method is just one permutation of many which will
accomplish the same result.
First use ncwa
to create the five-year mean:
ncwa -a time 8501_8912.nc 8589.nc
Next use ncbo
to create a file containing the difference of
each month’s data from the five-year mean:
ncbo 8501_8912.nc 8589.nc t_anm_8501_8912.nc
Now use ncks
to group together the five January anomalies in
one file, and use ncra
to create the average anomaly for all
five Januarys.
These commands are embedded in a shell loop so they are repeated for all
twelve months:
for idx in {1..12}; do # Bash Shell (version 3.0+)
  idx=`printf "%02d" ${idx}` # Zero-pad to preserve order
  ncks -F -d time,${idx},,12 t_anm_8501_8912.nc foo.${idx}
  ncra foo.${idx} t_anm_8589_${idx}.nc
done
for idx in 01 02 03 04 05 06 07 08 09 10 11 12; do # Bourne Shell
  ncks -F -d time,${idx},,12 t_anm_8501_8912.nc foo.${idx}
  ncra foo.${idx} t_anm_8589_${idx}.nc
done
foreach idx (01 02 03 04 05 06 07 08 09 10 11 12) # C Shell
  ncks -F -d time,${idx},,12 t_anm_8501_8912.nc foo.${idx}
  ncra foo.${idx} t_anm_8589_${idx}.nc
end
Note that ncra
understands the stride
argument so the
two commands inside the loop may be combined into the single command
ncra -F -d time,${idx},,12 t_anm_8501_8912.nc t_anm_8589_${idx}.nc
Finally, use ncrcat
to concatenate the 12 average monthly
anomaly files into one twelve-record file which contains the entire
seasonal cycle of the monthly anomalies:
ncrcat t_anm_8589_??.nc t_anm_8589_0112.nc
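To see which records a strided hyperslab such as ‘-F -d time,${idx},,12’ picks out of the 60 monthly records, the indices can be listed with a few lines of shell. The helper name stride_records is hypothetical, and the sketch only reproduces the index arithmetic, not any NCO behavior beyond what is described above.

```shell
#!/bin/sh
# Print the 1-based (Fortran-style, as selected by -F) record indices chosen
# by a hyperslab that starts at record $1 and uses stride $2, out of $3
# records. Pass plain integers (1, not 01) to avoid octal interpretation.
stride_records() {
    idx=$1; stride=$2; nbr_rec=$3
    out=''
    while [ "$idx" -le "$nbr_rec" ]; do
        out="$out${out:+ }$idx"
        idx=$((idx + stride))
    done
    printf '%s\n' "$out"
}

stride_records 1 12 60    # January records: 1 13 25 37 49
stride_records 12 12 60   # December records: 12 24 36 48 60
```

Each call lists the five records that one month contributes across the five years, which is exactly what the averaging step inside the loop consumes.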
ncchecker
netCDF Compliance Checker
SYNTAX
ncchecker [-D dbg] [-i drc_in] [--tests=tst_lst] [-x] [-v var[,...]] [--version] [input-files]
DESCRIPTION
As of version 5.2.2 (March, 2024), NCO comes with the
ncchecker
script.
This command checks files for compliance with best practices rules and
recommendations from various data and metadata standards bodies.
These include the Climate & Forecast (CF) Metadata
Conventions and the NASA Dataset Interoperability Working Group
(DIWG) recommendations.
Only a small subset (six tests) of CF or
DIWG recommendations are currently supported.
The number of tests implemented, or, equivalently, of recommendations
checked, is expected to grow.
ncchecker
reads each data file in input-files, in
drc_in, or piped through standard input.
It performs the checks requested in the ‘--tests=tst_lst’
option, if any (otherwise it performs all tests), and writes the
results to stdout.
The command supports some standard NCO options, including
increasing the verbosity level with ‘-D dbg_lvl’,
excluding variables with ‘-x -v var_lst’, variable
subsetting with ‘-v var_lst’, and printing the
version with ‘--version’.
The output reports the location and number of failed tests,
or prints “SUCCESS” for tests with no failures.
EXAMPLES
ncchecker in1.nc in2.nc # Run all tests on two files
ncchecker -v var1,var2 in1.nc # Check only two variables
ncchecker *.nc # Glob input files via wildcard
ls *.nc | ncchecker # Input files via stdin
ncchecker --dbg=2 *.nc # Debug ncchecker
ncchecker --tests=nan,mss *.nc # Select only two tests
ncchecker --tests=xtn,tm,nan,mss,chr,bnd *.nc # Change test ordering
ncchecker file:///Users/zender/in_zarr4#mode=nczarr,file # Check Zarr object(s)
ncclimo
netCDF Climatology Generator ¶SYNTAX
ncclimo [-3] [-4] [-5] [-6] [-7] [-a wnt_md] [-C clm_md] [-c caseid] [--cmp cmp_sng] [-d dbg_lvl] [--d2f] [--dpf=dpf] [--dpt_fl=dpt_fl] [-E yr_prv] [-e yr_end] [-f fml_nm] [--fl_fmt=fl_fmt] [--glb_avg] [--glb_stt=glb_stt] [-h hst_nm] [-i drc_in] [-j job_nbr] [-L dfl_lvl] [-l lnk_flg] [-m mdl_nm] [--mth_end=mth_end] [--mth_srt=mth_srt] [-n nco_opt] [--no_cll_msr] [--no_frm_trm] [--no_ntv_tms] [--no_stg_grd] [--no_stdin] [-O drc_rgr] [-o drc_out] [-P prc_typ] [-p par_typ] [--qnt=qnt_prc] [-R rgr_opt] [-r rgr_map] [-S yr_prv] [-s yr_srt] [--seasons=csn_lst] [--sgs_frc=sgs_frc] [--split] [--sum_scl=sum_scl] [-t thr_nbr] [--tpd=tpd] [--uio] [-v var_lst] [--var_xtr=var_xtr] [--version] [--vrt_out=vrt_fl] [--vrt_xtr=vrt_xtr] [-X drc_xtn] [-x drc_prv] [--xcl_var] [-Y rgr_xtn] [-y rgr_prv] [--ypf=ypf_max] [input-files]
DESCRIPTION
In climatology generation mode, ncclimo
ingests “raw” data
consisting of interannual sets of files, each containing sub-daily
(diurnal), daily, monthly, or yearly averages, and from these
produces climatological daily, monthly, seasonal, and/or annual
means.
Alternatively, in timeseries reshaping (aka “splitter”) mode,
ncclimo
will subset and temporally split the input raw data
timeseries into per-variable files spanning the entire period.
ncclimo
can optionally (call ncremap
to) regrid
all output files in either mode.
Unlike the rest of NCO, ncclimo and
ncremap are shell scripts, not compiled binaries (footnote 74).
As of NCO 4.9.2 (February, 2020), the ncclimo
and ncremap
scripts export the environment variable
HDF5_USE_FILE_LOCKING
with a value of FALSE
.
This prevents failures of these operators that can occur with some
versions of the underlying HDF library that attempt to lock files
on file systems that cannot, or do not, support it.
There are five (usually) required options (‘-c’, ‘-s’,
‘-e’, ‘-i’, and ‘-o’) to generate climatologies, and
many more options are available to customize the processing.
Options are similar to ncremap
options.
Standard ncclimo
usage for climatology generation looks like
ncclimo -c caseid -s srt_yr -e end_yr -i drc_in -o drc_out
ncclimo -m mdl_nm -c caseid -s srt_yr -e end_yr -i drc_in -o drc_out
ncclimo -v var_lst -c caseid -s srt_yr -e end_yr -i drc_in -o drc_out
ncclimo --case=caseid --start=srt_yr --end=end_yr --input=drc_in --output=drc_out
In climatology generation mode, ncclimo
constructs the list
of input filenames from the arguments to the caseid, date, and
model-type options.
As of NCO version 4.9.4 (September, 2020), ncclimo
can produce climatologies of high-frequency input data supplied via
standard input, positional command-line options, or directory
contents, all input methods traditionally supported only in splitter
mode.
Instead of using the caseid
option to help generate the input
filenames as it does for normal (monthly) climos, ncclimo
uses the caseid
option, when provided, to rename the output
files for high-frequency climos.
# Generate diurnal climos from high-frequency CMIP6 timeseries
cd ${drc_in};ls ${caseid}*.h4.nc | ncclimo --clm_md=hfc \
  -c ${caseid} --yr_srt=2001 --yr_end=2002 --drc_out=${HOME}
ncclimo automatically switches to timeseries reshaping mode
if it receives a list of files from stdin or as positional
arguments (placed after the last command-line option).
It also switches to this mode if neither of these is done and no
caseid is specified, in which case it assumes all *.nc
files in drc_in constitute the input file list.
# Split monthly timeseries into CMIP-like timeseries
cd ${drc_in};ls ${caseid}*.h4.nc | ncclimo -v=T \
  --ypf=1 --yr_srt=56 --yr_end=76 --drc_out=${HOME}
# Split high-frequency timeseries into CMIP-like timeseries
cd ${drc_in};ls ${caseid}*.h4.nc | ncclimo --clm_md=hfs -v=T \
  --ypf=1 --yr_srt=56 --yr_end=76 --drc_out=${HOME}
Options for ncclimo
and ncremap
come in both short
(single-letter) and long forms.
The handful of long-option synonyms for each option allows the user
to imbue the commands with a level of verbosity and precision that suits
her taste.
A complete description of all options is given below, in alphabetical
order of the short option letter.
Long option synonyms are given just after the letter.
When invoked without options, ncclimo
and ncremap
print a succinct table of all options and some examples.
All valid options for both operators are listed in their command
syntax above but, for brevity, options that ncclimo
passes
straight through to ncremap
are only fully described in the
table of ncremap
options.
-a wnt_md (--dec_md, --dcm_md, --december_mode, --dec_mode)
Winter mode aka December mode determines the start and end months of
the climatology and the type of NH winter seasonal average.
Two valid arguments are jfd (default, or synonyms sdd and JFD)
and djf (or synonyms scd and DJF).
DJF-mode is the same as SCD-mode which
stands for “Seasonally Continuous December”.
The first month used is December of the year before the start year
specified with ‘-s’.
The last month is November of the end year specified with ‘-e’.
In DJF-mode the Northern Hemisphere winter seasonal
climatology will be computed with sets of the three consecutive
months December, January, and February (DJF) where the calendar
year of the December months is always one less than the calendar
year of January and February.
JFD-mode is the same as SDD-mode which stands for
“Seasonally Discontinuous December”.
The first month used is January of the specified start year.
The last month is December of the end year specified with ‘-e’.
In JFD-mode the Northern Hemisphere winter seasonal
climatology will be computed with sets of the three non-consecutive
months January, February, and December (JFD) from each calendar year.
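The two winter modes differ only in which months bound the climatology. As a minimal sketch of the rules just described, the hypothetical helper climo_span (not part of ncclimo) prints the first and last months used for a given mode, start year, and end year:

```shell
# Hypothetical helper (not part of ncclimo): print the first and last
# months (YYYY-MM) of a climatology for a given winter mode.
# $1 = winter mode (djf/scd or jfd/sdd), $2 = start year, $3 = end year
climo_span() {
  case "$1" in
    djf|scd) printf '%04d-12 %04d-11\n' "$(($2 - 1))" "$3" ;; # Continuous December
    jfd|sdd) printf '%04d-01 %04d-12\n' "$2" "$3" ;;          # Discontinuous December
    *) echo "unknown winter mode: $1" >&2; return 1 ;;
  esac
}

climo_span djf 1980 1999   # 1979-12 1999-11
climo_span jfd 1980 1999   # 1980-01 1999-12
```

Both spans cover the same number of months; DJF-mode simply shifts the window one month earlier so that each December sits next to the January and February that follow it.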
-C clm_md (--clm_md, --climatology_mode, --mode, --climatology)
Climatology mode.
Valid values for clm_md are
ann (or synonyms annual, yearly, or year)
for annual-mode climatologies,
dly (or synonyms daily, doy, or day)
for daily-mode climatologies,
hfc (or synonyms high_frequency_climo or hgh_frq_clm)
for high-frequency (diurnally resolved) climos,
hfs (or synonyms high_frequency_splitter or hgh_frq_spl)
for high-frequency splitting, and
mth (or synonyms month or monthly)
for monthly climatologies.
The value indicates the timespan of each input file for annual and
monthly climatologies.
The default mode is ‘mth’, which means input files are monthly
averages.
Use ‘ann’ when the input files are a series of annual means
(a common temporal resolution for ice-sheet simulations).
The value ‘dly’ is used only for input files whose temporal
resolution is daily or finer, and when the desired output is a
day-of-year climatology where the means are output for each
day of a 365 day year.
Day-of-year climatologies are uncommon, yet useful for showing
daily variability.
The value ‘hfc’ indicates a high-frequency climatology where
the output will be a traditional set of climatological monthly,
seasonal, or annual means similar to monthly climos, except that
each file will have the same number of timesteps-per-day as the
input data to resolve the diurnal cycle.
The value ‘hfs’ indicates a high-frequency splitting operation
where an interannual input timeseries will be split into regular
size segments of a given number of years, similar to CMIP
timeseries.
The climatology generator and splitter do not require that daily-mode input files begin or end on daily boundaries. These tools hyperslab the input files using the date information required to perform their analysis. This facilitates analyzing datasets with varying numbers of days per input file.
Explicitly specifying ‘--clm_md=mth’ serves a secondary purpose,
namely invoking the default setting on systems that control
stdin
.
When ncclimo
detects that stdin
is not attached to the
terminal (keyboard) it automatically expects a list of files on
stdin
.
Some environments, however, hijack stdin
for their purposes
and thereby confuse ncclimo
into expecting a list argument.
Users have encountered this issue when attempting to run
ncclimo
in Python parallel environments, via inclusion in
crontab
, and in nohup
-mode (whatever that is!).
In such cases, explicitly specify ‘--clm_md=mth’ (or ann
or
day
) to persuade ncclimo
to compute a normal
climatology.
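The behavior described above follows from the standard shell test for whether standard input is attached to a terminal. The sketch below shows that test in isolation; it is illustrative only, the function name stdin_kind is hypothetical, and the detection logic inside ncclimo may differ in detail.

```shell
# Illustrative only: ncclimo's actual detection logic may differ.
# Report whether standard input (file descriptor 0) is a terminal.
stdin_kind() {
  if [ -t 0 ]; then
    echo "terminal"
  else
    echo "not-terminal"
  fi
}

# With a pipe, stdin is never a terminal, so a script relying on this
# test would expect a list of files on stdin
printf 'in1.nc\nin2.nc\n' | stdin_kind   # not-terminal
```

Environments such as cron redirect stdin away from the terminal, so the same test reports "not-terminal" there even though no file list was piped in.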
-c caseid (--case, --caseid, --case_id)
Simulation name, or any input filename for non-CESM’ish files. The use of caseid is required in climatology generation mode (unless equivalent information is provided through other options), where caseid is used to construct both input and output filenames. For CESM’ish input files like famipc5_ne30_v0.3_00001.cam.h0.1980-01.nc, specify ‘-c famipc5_ne30_v0.3_00001’. The ‘.cam.’ and ‘.h0.’ bits are added internally to produce the input filenames. Modify these via the -m mdl_nm and -h hst_nm options if needed.
For input files named slightly differently than standard
CESM’ish names, supply the filename (excluding the path
component) as the caseid and then ncclimo
will attempt
to parse that by matching to a database of regular expressions known
to be favored by various other datasets.
These expressions are all various formats of dates at the end of the
filenames, and adhere to the general convention
prefix[.-]YYYY[-]MM[-]DD[-]SSSSS.suffix.
The particular formats currently supported, as of NCO version
5.1.6 (May, 2023) are:
prefix_YYYYMM.suffix,
prefix.YYYY-MM.suffix,
prefix.YYYY-MM-01.suffix, and
prefix.YYYY-MM-01-00000.suffix.
For example, for input files like merra2_198001.nc (i.e., the six
digits that precede the suffix are YYYYMM-format), specify
‘-c merra2_198001.nc’ and the prefix (merra2) will be
automatically extracted and used to template and generate all the
filenames based on the specified yr_srt and yr_end.
ncclimo -c merra2_198001.nc --start=1980 --end=1999 --drc_in=${drc}
ncclimo -c cesm_1980-01.nc --start=1980 --end=1999 --drc_in=${drc}
ncclimo -c eamxx_1980-01-00000.nc --start=1980 --end=1999 --drc_in=${drc}
Please tell us any common dataset filename regular expressions that you would
like added to ncclimo
’s internal database.
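As a rough illustration of the templating idea, the sketch below handles only the prefix_YYYYMM.suffix format, using the merra2_198001.nc filename from the text. It relies on plain shell parameter expansion rather than the regular expressions that ncclimo uses internally, and the helper name fl_for is hypothetical.

```shell
# Hypothetical sketch (not ncclimo's actual code): handle only the
# prefix_YYYYMM.suffix format, e.g., merra2_198001.nc
fl_nm='merra2_198001.nc'
# Strip the shortest trailing match of _DDDDDD.suffix to obtain the prefix
prefix="${fl_nm%_[0-9][0-9][0-9][0-9][0-9][0-9].*}"
# Everything after the last period is the suffix
suffix="${fl_nm##*.}"

# Template the filename for any year ($1) and month ($2)
fl_for() { printf '%s_%04d%02d.%s\n' "${prefix}" "$1" "$2" "${suffix}"; }

echo "${prefix}"   # merra2
fl_for 1981 3      # merra2_198103.nc
```

Looping fl_for over the years from yr_srt through yr_end and over months 1 through 12 would yield the full list of expected input filenames.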
The ‘--caseid=caseid’ option is not mandatory in
the High-Frequency-Splitter (clm_md=hfs
) and
High-Frequency-Climatology (clm_md=hfc
) modes.
Those modes expect all input filenames to be entered from the
command-line so there is no internal need to create filenames
from the caseid variable.
Instead, when caseid is specified in a high-frequency mode,
its value is used to name the output files in a similar manner
to the ‘-f fml_nm’ option.
-d dbg_lvl (--dbg_lvl, --dbg, --debug, --debug_level)
Specifies a debugging level similar to the rest of NCO.
If dbg_lvl = 1, ncclimo
prints more extensive
diagnostics of its behavior.
If dbg_lvl = 2, ncclimo
prints the commands
it would execute at any higher or lower debugging level, but does
not execute these commands.
If dbg_lvl > 2, ncclimo
prints the diagnostic
information, executes all commands, and passes-through the debugging
level to the regridder (ncks
) for additional diagnostics.
--d2f, --d2s, --dbl_flt, --dbl_sgl, --double_float
This switch (which takes no argument) causes ncclimo
to invoke ncremap
with the same switch, so that
ncremap
converts all double precision non-coordinate
variables to single precision in the regridded file.
This switch has no effect on files that are not regridded.
To demote the precision in such files, use ncpdq
to apply
the dbl_flt
packing map to the file directly.
--dpf, --days_per_file
The number of days-per-file in files ingested by ncclimo.
It can sometimes be difficult for ncclimo
to infer the
number of days-per-file in high-frequency input files, i.e., those
with 1 or more timesteps-per-day.
In such cases, users may override the inferred value by explicitly
specifying --dpf=dpf
.
--dpt_fl, --depth_file, --mpas_fl, --mpas_depth
The ‘--dpt_fl=dpt_fl’ option triggers the addition of a depth
coordinate to MPAS ocean datasets that will undergo
regridding.
ncclimo
passes this option through to ncremap
,
and this option has no effect when ncclimo
does not invoke
ncremap
.
The ncremap
documentation contains the full description of
this option.
-e yr_end (--end_yr, --yr_end, --end_year, --year_end, --end)
End year (example: 2000). By default, the last month is December of the specified end year. If ‘-a scd’ is specified, the last month used is November of the specified end year.
-f fml_nm (--fml_nm, --fml, --family, --family_name)
Family name (nickname) of output files.
In climatology generation mode, output climo file names are constructed by
default with the same caseid as the input files.
The fml_nm, if supplied, replaces caseid in output climo
names, which are of the form
fml_nm_XX_YYYYMM_YYYYMM.nc
where XX is the month or
seasonal abbreviation.
Use ‘-f fml_nm’ to simplify long names, avoid overlap, etc.
Example values of fml_nm are ‘control’, ‘experiment’,
and (for a single-variable climo) ‘FSNT’.
In timeseries reshaping mode, fml_nm will be used, if supplied,
as an additional string in the output filename.
For example, specifying ‘-f control’ would cause
T_000101_000912.nc to be instead named
T_control_000101_000912.nc.
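The naming pattern in the example above can be sketched as follows. The helper split_name is hypothetical (it is not part of ncclimo) and assumes, as in the example, a timeseries that runs from January of the start year through December of the end year.

```shell
# Hypothetical helper (not part of ncclimo): mimic the naming pattern of
# the example above for a timeseries spanning January of yr_srt through
# December of yr_end.
# $1 = variable name, $2 = family name (may be empty), $3 = yr_srt, $4 = yr_end
split_name() {
  # ${2:+_$2} inserts "_fml_nm" only when a family name is supplied
  printf '%s%s_%04d01_%04d12.nc\n' "$1" "${2:+_$2}" "$3" "$4"
}

split_name T '' 1 9        # T_000101_000912.nc
split_name T control 1 9   # T_control_000101_000912.nc
```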
-h hst_nm (--hst_nm, --history_name, --history)
History volume name of file used to generate climatologies.
This refers to the hst_nm character sequence used to construct
input file names: caseid.mdl_nm.hst_nm.YYYY-MM.nc.
By default input climo file names are constructed from the caseid
of the input files, together with the model name mdl_nm (specified
with ‘-m’) and the date range.
Use ‘-h hst_nm’ to specify alternative history volumes.
Examples include ‘h0’ (default, works for CAM,
CLM/CTSM/ELM), ‘h1’, and ‘h’ (for CISM).
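The following sketch spells out the caseid.mdl_nm.hst_nm.YYYY-MM.nc pattern for a two-year period, using the default model name and history volume and the example caseid from the ‘-c’ option. The loop is illustrative and is not the code ncclimo itself runs.

```shell
# Sketch of the caseid.mdl_nm.hst_nm.YYYY-MM.nc pattern described above.
# The loop is illustrative and is not ncclimo's actual code.
caseid='famipc5_ne30_v0.3_00001'
mdl_nm='cam'   # Default model name
hst_nm='h0'    # Default history volume
yr_srt=1980
yr_end=1981

fl_lst=''
yr=${yr_srt}
while [ "${yr}" -le "${yr_end}" ]; do
  mth=1
  while [ "${mth}" -le 12 ]; do
    fl=$(printf '%s.%s.%s.%04d-%02d.nc' "${caseid}" "${mdl_nm}" "${hst_nm}" "${yr}" "${mth}")
    fl_lst="${fl_lst}${fl_lst:+ }${fl}"   # Append, space-separated
    mth=$((mth + 1))
  done
  yr=$((yr + 1))
done

# Print one filename per line (24 files for two years)
printf '%s\n' ${fl_lst}
```

Changing mdl_nm or hst_nm in the sketch corresponds to supplying ‘-m mdl_nm’ or ‘-h hst_nm’ on the command line.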
-i drc_in (--drc_in, --in_drc, --dir_in, --input)
Directory containing all monthly mean files to read as input to the
climatology.
The use of drc_in is mandatory in climatology generation mode and is
optional in timeseries reshaping mode.
In timeseries reshaping mode, ncclimo uses all netCDF files
(meaning files with suffixes .nc, .nc3, .nc4, .nc5, .nc6, .nc7,
.cdf, .hdf, .he5, or .h5) in drc_in to
create the list of input files when no list is provided through
stdin or as positional arguments to the command-line.
-j job_nbr (--job_nbr, --job_number, --jobs)
The job_nbr parameter controls the parallelism granularity of both timeseries reshaping (aka splitting) and climatology generation. These modes parallelize over different types of tasks, so we describe the effects of job_nbr separately, first for climatologies, then for splitting. However, for both modes, job_nbr specifies the total number of simultaneous processes to run in parallel either on the local node for Background parallelism, or across all the nodes for MPI parallelism (i.e., job_nbr is the simultaneous total across all nodes, it is not the simultaneous number per node).
For climatology generation, job_nbr specifies the number of
averaging tasks to perform simultaneously on the local node for
Background parallelism, or spread across all nodes for
MPI-parallelism.
By default ncclimo
sets job_nbr = 12 for
background parallelism mode.
This number ensures that monthly averages for all individual
months complete more-or-less simultaneously, so that all seasonal
averages can then be computed.
However, many nodes are too small to simultaneously average multiple
distinct months (January, February, etc.).
Hence job_nbr may be set to any factor of 12, i.e., 1, 2, 3, 4,
6, or 12.
For Background parallelism, setting job_nbr = 4 causes
four-months to be averaged at one time.
After three batches of four-months complete, the climatology generator
then moves on to seasonal averaging and regridding.
In MPI-mode, ncclimo
defaults to
job_nbr = nd_nbr unless the user explicitly sets
job_nbr to a different value.
For the biggest jobs, when a single-month nearly exhausts the
RAM on a node, this default value
job_nbr = nd_nbr ensures that each node gets only
one job at a time.
To override this default for MPI-parallelism, set
job_nbr >= nd_nbr; otherwise some nodes will be idle
for the entire time.
If a node can handle averaging three distinct months simultaneously,
then try job_nbr = 3*nd_nbr.
Never set job_nbr > 12 in climatology mode, since there
are at most only twelve jobs that can be performed in parallel.
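The relation between job_nbr and the number of monthly-averaging batches can be checked with a few lines of shell arithmetic. The helper climo_batches below is hypothetical and only restates the rule that job_nbr should be a factor of 12; it is not part of ncclimo.

```shell
# Hypothetical helper (not part of ncclimo): number of batches of monthly
# averages needed for a given job_nbr, which should be a factor of 12.
climo_batches() {
  job_nbr=$1
  if [ "${job_nbr}" -lt 1 ] || [ "${job_nbr}" -gt 12 ] || [ $((12 % job_nbr)) -ne 0 ]; then
    echo "job_nbr must be 1, 2, 3, 4, 6, or 12" >&2
    return 1
  fi
  echo $((12 / job_nbr))
}

climo_batches 12   # 1 batch: all twelve months at once (the default)
climo_batches 4    # 3 batches of four months each
```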
For splitting, job_nbr specifies the number of simultaneous
subsetting processes to spawn during parallel execution for both
Background and MPI-parallelism.
In both parallelism modes ncclimo
spawns processes in
batches of job_nbr jobs, then waits for those processes to
complete.
Once a batch finishes, ncclimo
spawns the next batch.
For Background-parallelism, all jobs are spawned to the local node.
For MPI-parallelism, all jobs are spawned in round-robin
fashion to all available nodes until job_nbr jobs are running.
Rinse, lather, repeat until all variables have been split.
The splitter chooses its default value of job_nbr based on
the parallelism mode.
For Background parallelism, job_nbr defaults to the number of
variables to be split, so that not specifying job_nbr results
in launching var_nbr simultaneous splitter tasks.
This scales well to over a hundred variables in our tests (footnote 75).
In practice, splitting timeseries consumes minimal memory, since
ncrcat (which underlies the splitter) only holds one
record (timestep) of a variable in memory (see Memory Requirements).
However, if splitting consumes so much RAM (e.g., because variables are large and/or the number of jobs is large) that a single node can perform only one or a few subsetting jobs at a time, then it is reasonable to use MPI-mode to split the datasets. For MPI-parallelism, job_nbr defaults to the number of nodes requested. This helps prevent users from overloading nodes with too many jobs. Usually, however, nodes can subset (and then regrid, if requested) multiple variables simultaneously. In summary, the splitter defaults to job_nbr = var_nbr in Background mode, and to job_nbr = node_nbr in MPI mode. Subject to the availability of adequate RAM, expand the number of jobs-per-node by increasing job_nbr until overall throughput peaks.
The main throughput bottleneck in timeseries reshaping mode is I/O. Increasing job_nbr may reduce throughput once the maximum I/O bandwidth of the node is reached, due to contention for I/O resources. Regridding requires math that can relieve some I/O contention and allows for some throughput gains with increasing job_nbr. One strategy that seems sensible is to set job_nbr equal to the number of nodes times the number of cores per node, and increase or decrease as necessary until throughput peaks.
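The splitter defaults and the resulting number of batches can be restated in a short sketch. Both helpers below are hypothetical (they are not part of ncclimo) and simply encode the rules described above: job_nbr defaults to the number of variables in Background mode and to the number of nodes in MPI mode, and variables are processed in batches of job_nbr.

```shell
# Hypothetical helpers (not part of ncclimo) that restate the defaults
# described above.
# $1 = parallelism mode (bck or mpi), $2 = var_nbr, $3 = nd_nbr
split_job_nbr_dfl() {
  case "$1" in
    bck|background) echo "$2" ;;   # One job per variable
    mpi|MPI) echo "$3" ;;          # One job per node
    *) echo "unknown parallelism mode: $1" >&2; return 1 ;;
  esac
}
# Batches needed to split var_nbr ($1) variables, job_nbr ($2) at a time
# (integer division, rounded up)
split_batches() { echo $(( ($1 + $2 - 1) / $2 )); }

split_job_nbr_dfl bck 100 4   # 100
split_job_nbr_dfl mpi 100 4   # 4
split_batches 100 4           # 25
split_batches 100 16          # 7
```

Raising job_nbr from 4 to 16 in this example cuts the number of batches from 25 to 7, which only improves throughput while I/O bandwidth and RAM are not yet saturated.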
-L dfl_lvl (--dfl_lvl, --dfl, --deflate)
Activate deflation (i.e., lossless compression, see Deflation) with
the -L dfl_lvl short option (or with the same argument to
the ‘--dfl_lvl’ or ‘--deflate’ long options).
Specify deflation level dfl_lvl on a scale from no deflation
(dfl_lvl = 0, the default) to maximum deflation
(dfl_lvl = 9).
-l lnk_flg (--lnk_flg, --link_flag)
--no_amwg_link, --no_amwg_links, --no_amwg, --no_AMWG_link, --no_AMWG_links
--amwg_link, --amwg_links, --AMWG_link, --AMWG_links
These options turn-on or turn-off the linking of
E3SM/ACME-climo to AMWG-climo filenames.
AMWG omits the YYYYMM components of climo filenames,
resulting in shorter names.
By default ncclimo symbolically links the full
E3SM/ACME filename (which is always created) to a file with
the shorter (AMWG) name whose creation is optional.
AMWG diagnostics scripts can produce plots directly from
the linked AMWG filenames.
The ‘-l’ (and ‘--lnk_flg’ and ‘--link_flag’ long-option
synonyms) are true options that require an argument of either
‘Yes’ or ‘No’.
The remaining synonyms are switches that take no arguments.
The ‘--amwg_link’ switch and its synonyms cause the creation
of symbolic links with AMWG filenames.
The ‘--no_amwg_link’ switch and its synonyms prevent the creation
of symbolic links with AMWG filenames.
If you do not need AMWG filenames, turn-off linking to
reduce file proliferation in the output directories.
-m mdl_nm (--mdl_nm, --mdl, --model_name, --model)
Model name (as embedded in monthly input filenames). Default is ‘cam’. Other options are ‘clm2’, ‘ocn’, ‘ice’, ‘cism’, ‘cice’, ‘pop’.
-n nco_opt (--nco_opt, --nco, --nco_options)
Specifies a string of options to pass-through unaltered to
ncks.
nco_opt defaults to ‘--no_tmp_fl’.
Note that ncclimo
passes its nco_opt to
ncremap
.
This can cause unexpected results, so use the front-end options to
ncclimo
when possible, rather than attempting to subvert
them with nco_opt.
-O drc_rgr (--drc_rgr, --rgr_drc, --dir_rgr, --regrid)
Directory to hold regridded climo files. Regridded climos are placed in drc_out unless a separate directory for them is specified with ‘-O’ (NB: capital “O”).
--no_cll_msr, --no_cll, --no_cell_measures, --no_area
This switch (which takes no argument) controls whether ncclimo
and ncremap
add measures variables to the extraction list
along with the primary variable and other associated variables.
See CF Conventions for a detailed description.
--no_frm_trm, --no_frm, --no_formula_terms
This switch (which takes no argument) controls whether ncclimo
and ncremap
add formula variables to the extraction list along
with the primary variable and other associated variables.
See CF Conventions for a detailed description.
--glb_avg, --global_average (deprecated)
--rgn_avg, --region_average
When introduced in NCO version 4.9.1 (released December,
2019), this switch (which takes no argument) caused the splitter to
output global horizontally spatially averaged timeseries files instead of
raw, native-grid timeseries.
This switch changed behavior in NCO version 5.1.1
(released November, 2022).
It now causes the splitter to output three horizontally spatially
averaged timeseries.
First is the global average (as before), next is the northern
hemisphere average, followed by the southern hemisphere average.
The three timeseries are now saved in a two-dimensional (time
by region) array with a “region dimension” named rgn
.
The region names are stored in the variable named region_name
.
As of NCO version 5.2.3 (released March, 2024), this switch
works with sub-gridscale fractions, such as are common in surface
models like ELM and CLM.
The correct weights for (SGS) fraction will automatically be
applied so long as ncclimo
is invoked with
‘-P prc_typ’.
Otherwise the field containing the (SGS) fraction must be
supplied with ‘--sgs_frc=sgs_frc’.
This switch only has effect in timeseries splitting mode. This is useful, for example, to quickly diagnose the behavior of ongoing model simulations prior to a full-blown analysis. Thus the spatial mean files will be in the same location and have the same name as the native grid timeseries would have been and had, respectively. Note that this switch does not alter the capability of also outputting the full regridded timeseries, if requested, at the same time.
Because the switch now outputs global and regional averages, the best
practice is to invoke with ‘--rgn_avg’ instead of
‘--glb_avg’.
NCO version 5.2.8 (released September, 2024) superseded this
switch by introducing support for more general global/regional
statistics timeseries output via the --glb_stt
/--rgn_stt
options.
--glb_stt, --global_statistic
--rgn_stt, --region_statistic
NCO version 5.2.8 (released September, 2024) introduced the
--glb_stt/--rgn_stt options (or long-option
equivalents --hms_stt, --regional_statistic, and
--global_statistic) to support more general global/regional
statistics timeseries output.
These options allow the user to choose which statistic, sums or
averages, to output with global/regional timeseries for all
variables.
Set rgn_stt to avg, average, or mean
to output timeseries of the global/regional mean statistic.
Set rgn_stt to sum, total, ttl, or integral
to output timeseries of the global/regional sum.
The option ‘--rgn_stt=avg’ is equivalent to setting the
--rgn_avg switch (which may eventually be deprecated).
When invoked with ‘--rgn_stt=sum’ the averaged field is
multiplied by the sum of the area variable.
For area-intensive fields (e.g., fluxes per unit area) this results in
the total net flux over the area.
However, the field must employ the same areal units as the area
variable for this to be true.
For example, fields given in inverse square meters would need to
employ an area variable in square meters.
Unfortunately, many people love non-SI units so that is rarely
the case!
For example, ELM and CLM archive area in a field named
area whose units are square kilometers, so a scale factor of
one million is needed to correct the sum for many variables.
EAM and CAM also archive area in a field named area,
though in units of steradians, which would require a
different scale factor to match the sums of area-intensive fields.
That is why ncclimo
introduced a second new option
‘--sum_scl=sum_scl’, in NCO version 5.2.8
(released September, 2024).
The long option equivalents are --scl_fct
, --sum_scale
,
and --scale_factor
.
When rgn_stt is sum
, the sum_scl scale factor
multiplies the integrated field value, which allows the user to
generate timeseries in the desired units for any field.
Consider these prototypical examples to generate global timeseries
of common geophysical statistics from ESM output:
# Timeseries of global GPP in grams/s for ELM/CLM:
ncclimo -P elm --split --rgn_stt=sum --sum_scl=1.0e6 -v GPP ...
# Timeseries of global GPP in GT C/yr for ELM/CLM:
ncclimo -P elm --split --rgn_stt=sum --sum_scl=1.0e6*3600*24*365/1.0e12 -v GPP ...
# Timeseries of global column vapor in kg for EAM/CAM:
ncclimo -P eam --split --rgn_stt=sum --sum_scl=6.37122e6^2 -v TMQ ...
All three examples set rgn_stt to sum
in order to
activate the sum_scl factor.
The first example scales (multiplies) the mean of all global
timeseries (here, GPP
for concreteness) by one million.
This factor converts the ELM or CLM area
variable from square kilometers to square meters, appropriate
to integrating fields like GPP whose fluxes are per square
meter.
The output timeseries of GPP
would then be in gC s-1.
The second example sets the scale factor to convert the global
GPP
statistic to units of GT C yr-1.
The third example shows how to convert areal sums in steradians
(which EAM and CAM use for area) to
square meters.
This factor converts atmospheric variables from global mean mass per
square meter to global total mass.
The --glb_stt=sum
/--sum_scl
procedure is model- and
variable-specific and we are open to suggestions to make it more
useful.
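To get a feel for the magnitudes involved, the scale-factor expressions from the second and third examples can be evaluated separately. The sketch below uses awk purely for the arithmetic; it does not call ncclimo and makes no assumption about how ncclimo itself parses the sum_scl argument. The variable names scl_gpp and scl_sr are hypothetical.

```shell
# Evaluate the scale factors from the examples above with awk.
# This only illustrates the arithmetic; it does not call ncclimo.

# Second example: 1.0e6 (km2 to m2) times 3600*24*365 (seconds per
# 365-day year), divided by 1.0e12 (copied verbatim from the example)
scl_gpp=$(awk 'BEGIN { printf "%.3f", 1.0e6 * 3600 * 24 * 365 / 1.0e12 }')
# Third example: square of the sphere radius in meters (6.37122e6 is the
# value used in the example) converts steradians to square meters
scl_sr=$(awk 'BEGIN { r = 6.37122e6; printf "%.4e", r * r }')

echo "sum_scl, second example: ${scl_gpp}"   # 31.536
echo "sum_scl, third example:  ${scl_sr}"    # 4.0592e+13
```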
As of NCO version 5.3.0 (December, 2024) ncclimo
automatically outputs additional metrics with global statistics.
The output files containing the global timeseries also contain the
variable valid_area_per_gridcell
.
This field is equivalent to the product of the area variable
and the sgs_frc variable (if any).
Thus for ELM/CLM/CTSM, this field equals
area
times landfrac
, while for
EAM/CAM this variable simply equals area
.
The output files also contain the area and sgs_frc
variables separately.
The presence of these variables in output allows downstream processors
(e.g., zppy
) to generate additional masks and weights for
rescaling the statistics.
For example, these fields can be used to rescale global sums into any
units desired.
--no_ntv_tms, --no_ntv, --no_native, --remove_native
This switch (which takes no argument) controls whether the splitter
retains native grid split files, which it does by default, or deletes them.
ncclimo
can split model output from multi-variable native grid
files into per-variable timeseries files and regrid those onto a
so-called analysis grid.
That is the typical format in which Model Intercomparison Projects
(MIPs) request and disseminate contributions.
When the data producer has no use for the split timeseries on the native
grid, he/she can invoke this flag to cause ncclimo
to
delete the native grid timeseries (not the raw native grid datafiles).
This functionality is implemented by first creating the native grid
timeseries, regridding it, and then overwriting the native grid
timeseries with the regridded timeseries.
Thus the regridded files will be in the same location and have the same
name as the native grid timeseries would have been and had, respectively.
--no_stg_grd, --no_stg, --no_stagger, --no_staggered_grid
This switch (which takes no argument) controls whether
regridded output will contain the staggered grid coordinates
slat, slon, and w_stag (see Regridding).
By default the staggered grid is output for all files regridded from
a Cap (aka FV) grid, except when the regridding is performed
as part of splitting (reshaping) into timeseries.
-o drc_out (--drc_out, --out_drc, --dir_out, --output)
Directory to hold computed (output) native grid climo files. Regridded climos are also placed here unless a separate directory for them is specified with ‘-O’ (NB: capital “O”).
-p par_typ (--par_typ, --par_md, --parallel_type, --parallel_mode, --parallel)
Specifies the parallelism mode desired.
The options are serial mode (‘-p srl’, ‘-p serial’, or ‘-p nil’),
background mode parallelism (‘-p bck’ or ‘-p background’),
and MPI parallelism (‘-p mpi’ or ‘-p MPI’).
The default par_typ is ‘background’, which means
ncclimo spawns up to twelve (one for each month) parallel
processes at a time.
See discussion below under Memory Considerations.
--ppc, --ppc_prc, --precision, --qnt, --quantize
Specifies the precision of the Precision-Preserving Compression algorithm (see Precision-Preserving Compression). A positive integer is interpreted as the Number of Significant Digits for the Bit-Grooming algorithm, and is equivalent to specifying ‘--qnt default=qnt_prc’ to a binary operator. A positive or negative integer preceded by a period, e.g., ‘.-2’ is interpreted as the number of Decimal Significant Digits for the rounding algorithm and is equivalent to specifying ‘--qnt default=.qnt_prc’ to a binary operator. This option applies one precision algorithm and a uniform precision for the entire file. To specify variable-by-variable precision options, pass the desired options as a quoted string directly with ‘-n nco_opt’, e.g., ‘-n '--qnt FSNT,TREFHT=4 --qnt CLOUD=2'’.
-R rgr_opt (--rgr_opt, --regrid_options)
Specifies a string of options to pass-through unaltered to
ncks.
rgr_opt defaults to ‘-O --no_tmp_fl’.
-r rgr_map (--rgr_map, --regrid_map, --map)
Regridding map.
Unless ‘-r’ is specified ncclimo
produces only a
climatology on the native grid of the input datasets.
The rgr_map specifies how to (quickly) transform the native
grid into the desired analysis grid.
ncclimo
will (call ncremap
to) apply the given map
to the native grid climatology and produce a second climatology on the
analysis grid.
Options intended exclusively for the regridder may be passed as
arguments to the ‘-R’ switch.
See below the discussion on regridding.
--mth_srt, --srt_mth, --month_start, --start_month
--mth_end, --end_mth, --month_end, --end_month
Start month (example: 4), and end month (example: 11).
The starting month of monthly timeseries extracted by the splitter
defaults to January of the specified start year, and the ending
month defaults to December of the specified end year.
As of NCO version 4.9.8, released in March, 2021,
the splitter mode of ncclimo accepts user-specified
start and end months with the ‘--mth_srt’ and ‘--mth_end’
options, respectively.
Months are input as one-based integers so January is 1 and
December is 12.
To extract 14-month timeseries from individual monthly input files one
could use, e.g.,
ncclimo --yr_srt=1 --yr_end=2 --mth_srt=4 --mth_end=5 ...
Note that mth_srt and mth_end only affect the splitter,
and that they play no role in climatology generation.
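The month arithmetic behind the 14-month example can be checked with a short shell sketch. The function name tms_mth_nbr is hypothetical (it is not part of NCO), and the sketch assumes only that the timeseries runs from month mth_srt of year yr_srt through month mth_end of year yr_end, inclusive:

```shell
# Hypothetical helper, not part of NCO: count the months in a timeseries that
# runs from month mth_srt of year yr_srt through month mth_end of year yr_end.
tms_mth_nbr() {
    local yr_srt=$1 yr_end=$2 mth_srt=$3 mth_end=$4
    echo $(( (yr_end - yr_srt) * 12 + (mth_end - mth_srt) + 1 ))
}

tms_mth_nbr 1 2 4 5     # April of year 1 through May of year 2: prints 14
tms_mth_nbr 1 2 1 12    # default months, January through December: prints 24
```

The second call shows the default behavior described above, where two full years yield 24 months.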
--srt_yr, --yr_srt, --start_year, --year_start, --start
Start year (example: 1980). By default, the first month used is January of the specified start year. If ‘-a scd’ is specified, the first month used will be December of the year before the start year (to allow for contiguous DJF climos).
--seasons, --csn_lst, --csn
Seasons for ncclimo to compute in monthly climatology generation mode.
The list of seasons, csn_lst, is a comma-separated, case-insensitive,
unordered subset of the abbreviations for the eleven (so far) defined
seasons: jfm, amj, jas, ond, on, fm, djf, mam, jja, son, and ann.
By default csn_lst=mam,jja,son,djf.
Moreover, ncclimo automatically computes the climatological annual
mean, ANN, whenever MAM, JJA, SON, and DJF are all requested (which is
the default).
The ANN computed automatically is the time-weighted average of the four
seasons, rather than the time-weighted average of the twelve monthly
climatologies.
Users who need ANN but not DJF, MAM, JJA, and SON should instead
explicitly specify ANN as a season in csn_lst.
The ANN computed as a season is the time-weighted average of the twelve
monthly climatologies, rather than the time-weighted average of four
seasonal climatologies.
Specifying the four seasons and ANN in csn_lst (e.g.,
csn_lst=mam,jja,son,djf,ann) is legal though redundant and wasteful.
It causes ANN to be computed twice, first as the average of the twelve
monthly climatologies, then as the average of the four seasons.
The special value csn_lst=none turns off computation of seasonal (and
annual) climatologies.
ncclimo --seasons=none ...            # Produce only monthly climos
ncclimo --seasons=mam,jja,son,djf ... # Monthly + MAM,JJA,SON,DJF,ANN
ncclimo --seasons=jfm,jas,ann ...     # Monthly + JFM,JAS,ANN
ncclimo --seasons=fm,on ...           # Monthly + FM,ON
--split, --splitter, --tms_flg, --timeseries
This switch (which takes no argument) explicitly instructs ncclimo to
split the input multi-variable raw datasets into per-variable
timeseries spanning the entire period.
The --split switch, and its synonyms --splitter, --tms_flg, and
--timeseries, were introduced in NCO version 5.0.4 (released December,
2021).
Previously, the splitter was automatically invoked whenever the input
files were provided via stdin, globbing, or positional command-line
arguments, with some exceptions.
That older method became ambiguous and untenable once it was decided
to also allow climos to be generated from files provided via stdin,
globbing, or positional command-line arguments.
Now there are three methods to invoke the splitter:
1) Use the ‘--split’ flag:
this is the most explicit way to invoke the splitter.
2) Select ‘clm_md=hfs’:
the high-frequency splitter mode by definition invokes the splitter,
so a more explicit option than this is not necessary.
3) Set the years-per-file option, e.g., ‘--ypf=25’:
the ypf_max option is only useful to the splitter, and has thus been
used in many scripts.
Since this option still causes the splitter to be invoked, those
scripts will perform as they did before the API change.
These three splitter invocation methods are non-exclusive, i.e., more
than one can be used, and there is no harm in doing so.
While the API change in version 5.0.4 does proscribe the
former practice of passively invoking the splitter by simply piping
files to stdin
or similar, it enables much more flexibility
for future features, including the possibility of automatically
generating timeseries filenames for the splitter, and of piping
files to stdin
or similar for climo generation.
--no_stdin, --no_inp_std, --no_redirect, --no_standard_input
First introduced in NCO version 4.8.0 (released May, 2019), this switch
(which takes no argument) disables checking standard input (aka stdin)
for input files.
This is useful because ncclimo and ncremap may mistakenly expect input
to be provided on stdin in environments that use stdin for other
purposes.
Some non-interactive environments (e.g., crontab, nohup, Azure CI, CWL)
may use standard input for their own purposes, and thus confuse NCO
into thinking that you provided the input file names via the stdin
mechanism.
In such cases users may disable the automatic checks for standard
input by explicitly invoking the ‘--no_stdin’ flag.
This switch is usually not required for jobs in an interactive shell.
Interactive SLURM shells can also commandeer stdin, as is the case on
the DOE machine named Chrysalis.
This behavior appears to vary depending on the SLURM implementation.
ncclimo --no_stdin -v T -s 2000 -e 2001 --ypf=10 -i in -o out
--thr_nbr, --thr, --thread_number, --threads
Specifies the number of threads used per regridding process
(see OpenMP Threading).
The NCO regridder scales well to 8–16 threads.
However, regridding with the maximum number of threads can interfere
with climatology generation in parallel climatology mode (i.e., when
par_typ = mpi or bck).
Hence ncclimo defaults to thr_nbr=2.
--tpd_out, --tpd, --timesteps_per_day
Normally, this is the number of timesteps-per-day in the files ingested
by ncclimo.
It can sometimes be difficult for ncclimo to infer the number of
timesteps-per-day in high-frequency input files, i.e., those with 1 or
more timesteps-per-day.
In such cases, users may override the inferred value by explicitly
specifying --tpd=tpd.
The value of tpd_out in daily-average climatology mode clm_md=dly
(which is generally not used outside of ice-sheet models) is different,
and actually refers to the number of timesteps per day that ncclimo
will output, regardless of its value in the input files.
Hence in daily-average mode (only), we refer to this variable as
tpd_out.
The climatology output from input files at daily or sub-daily resolution is, by default, averaged to daily resolution, i.e., tpd_out=1. If the number of timesteps per day in each input file is tpd_in, then the user may select any value of tpd_out that is no larger than and integrally divides tpd_in. For example, an input timeseries with tpd_in=8 (i.e., 3-hourly resolution), can be used to produce climatological output at 3, 6, or 12-hourly resolution by setting tpd_out to 8, 4, or 2, respectively. This option only takes effect in daily-average climatology mode.
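The divisibility rule can be illustrated with a small shell sketch that lists the candidate tpd_out values for a given tpd_in. The function name tpd_out_valid is hypothetical and not part of NCO. The sketch assumes that tpd_out=tpd_in is itself permitted, as the 3-hourly example above implies:

```shell
# Hypothetical helper, not part of NCO: list every tpd_out value that
# integrally divides tpd_in (including 1 and tpd_in itself).
tpd_out_valid() {
    local tpd_in=$1 tpd_out=1 lst=""
    while [ "$tpd_out" -le "$tpd_in" ]; do
        if [ $(( tpd_in % tpd_out )) -eq 0 ]; then
            lst="${lst:+$lst }$tpd_out"   # append, space-separated
        fi
        tpd_out=$(( tpd_out + 1 ))
    done
    echo "$lst"
}

tpd_out_valid 8    # 3-hourly input: prints 1 2 4 8
```

For tpd_in=8 the candidates correspond to daily, 12-hourly, 6-hourly, and 3-hourly output, respectively.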
For full generality, the --tpd option should probably be split into
separate options --tpd_in and --tpd_out.
However, because it is unlikely that anyone will need to set these to
different values, we leave only one option.
If this hinders you, please let us know and we will split the options.
--var_lst, --var, --vars, --variables, --variable_list
Variables to subset or to split.
Same behavior as Subsetting Files.
The use of var_lst is optional in clim-generation mode.
We suggest using this feature to test whether an ncclimo
command, especially one that is lengthy and/or time-consuming, works as
intended on one or a few variables with, e.g., ‘-v T,FSNT’ before
generating the full climatology (by omitting this option).
Invoking this switch was required in the original splitter released in
version 4.6.5 (March, 2017), and became optional as of version 4.6.6
(May, 2017).
This option is recommended in timeseries reshaping mode to prevent
inadvertently copying the results of an entire model simulation.
Regular expressions are allowed so, e.g., ‘PREC.?’ extracts
the variables ‘PRECC,PRECL,PRECSC,PRECSL’ if present.
Currently in reshaping mode all matches to a regular expression are
placed in the same output file.
We hope to remove this limitation in the future.
--var_xtr, --var_extra, --variables_extra, --extra_variables
The ‘--var_xtr’ option causes ncclimo to include the extra variables
listed in var_xtr in every timeseries split from the raw data.
This is useful when extra variables are desired in timeseries.
There are no limits on the extra variables: they may be of any rank
and may be timeseries themselves.
One useful application of this option is to ensure that the area
variable is included with each timeseries, e.g.,
‘--var_xtr=area’.
--version, --vrs, --config, --configuration, --cnf
This switch (which takes no argument) causes the operator to print its version and configuration. This includes the copyright notice, URLs to the BSD and NCO license, directories from which the NCO scripts and binaries are running, and the locations of any separate executables that may be used by the script.
--xcl_var, --xcl, --exclude, --exclude_variables
This flag (which takes no argument) changes var_lst, as set by the
--var_lst option, from an extraction list to an exclusion list so that
variables in var_lst will not be processed, and variables not in
var_lst will be processed.
Thus the option ‘-v var_lst’ must also be present for this
flag to take effect.
Variables explicitly specified for exclusion by
‘--xcl --vars=var_lst[,…]’ need not be present in the
input file.
This switch has always worked in climo mode.
As of NCO version 5.2.5 (July, 2024), this switch also works
in timeseries mode.
--ypf, --years, --years_per_file
Specifies the maximum number of years-per-file output by ncclimo’s
splitting operation.
When ncclimo subsets and splits a collection of input files spanning a
timeseries, it places each subset variable in its own output file.
The maximum length, in years, of each output file is ypf_max,
which defaults to ypf_max=50.
If an input timeseries spans 237 years and ypf_max=50, then ncclimo
will generate four output files of length 50 years and one output file
of length 37 years.
Note that invoking this option causes ncclimo to enter timeseries
reshaping mode.
In fact, one must use ‘--ypf’ to turn on splitter mode when
the input files are specified by using the ‘-i drc_in’ method.
Otherwise it would be ambiguous whether to generate a climatology from
or to split the input files.
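The segmentation in the 237-year example can be reproduced with a few lines of shell. This is only a sketch of the arithmetic, not the code ncclimo itself uses. The function name ypf_sgm_lng is hypothetical, and full-length segments are assumed to come first with the remainder last, as in the example above:

```shell
# Hypothetical helper, not part of NCO: print the length in years of each
# output segment when yr_nbr years are split into files of at most ypf_max years.
ypf_sgm_lng() {
    local yr_nbr=$1 ypf_max=$2 sgm lst=""
    [ "$ypf_max" -gt 0 ] || return 1      # guard against an endless loop
    while [ "$yr_nbr" -gt 0 ]; do
        if [ "$yr_nbr" -gt "$ypf_max" ]; then sgm=$ypf_max; else sgm=$yr_nbr; fi
        lst="${lst:+$lst }$sgm"
        yr_nbr=$(( yr_nbr - sgm ))
    done
    echo "$lst"
}

ypf_sgm_lng 237 50    # prints 50 50 50 50 37
```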
This section of the ncclimo documentation applies only to
reshaping mode, whereas all subsequent sections apply to climatology
generation mode.
In splitter mode, ncclimo
reshapes the input so that
the outputs are continuous timeseries of each variable taken from all
input files.
As of NCO version 5.0.4 (released December, 2021), ncclimo enters
splitter mode when invoked with the --split switch (or its synonyms
--splitter, --tms_flg, or --timeseries) or with the --ypf_max option.
Then ncclimo will create per-variable timeseries from the list of files
supplied via stdin or, alternatively, placed as positional arguments
(after the last command-line option).
If neither of these is done and no caseid is specified, ncclimo assumes
that all *.nc files in drc_in constitute the input file list.
These examples invoke reshaping mode in the four possible ways:
# Pipe list to stdin
cd $drc_in;ls *mdl*000[1-9]*.nc | ncclimo --split -v T,Q,RH -s 1 -e 9 -o $drc_out
# Redirect list from file to stdin
cd $drc_in;ls *mdl*000[1-9]*.nc > foo;ncclimo --split -v T,Q,RH -s 1 -e 9 -o $drc_out < foo
# List as positional arguments
ncclimo --split -v T,Q,RH -s 1 -e 9 -o $drc_out $drc_in/*mdl*000[1-9]*.nc
# Glob directory
ncclimo --split -v T,Q,RH -s 1 -e 9 -i $drc_in -o $drc_out
Assuming each input file is a monthly average comprising the variables T, Q, and RH, then the output will be files T_000101_000912.nc, Q_000101_000912.nc, and RH_000101_000912.nc. When necessary, the output is split into segments each containing no more than ypf_max (default 50) years of input, i.e., T_000101_005012.nc, T_005101_009912.nc, T_010001_014912.nc, etc.
MPAS ocean and ice models currently have their own
(non-CESM’ish) naming convention that guarantees output files have the
same names for all simulations.
By default ncclimo
analyzes the “timeSeriesStatsMonthly”
analysis member output (tell us if you want options for other analysis
members).
ncclimo
and ncremap
recognize input files as being
MPAS-style when invoked with ‘-P mpas’ or with the
more expressive synonym ‘--prc_typ=mpas’.
The generic ‘-P mpas’ invocation works for generating
climatologies for any MPAS model.
However, some regridder options are model-specific and therefore it is
smarter to specify which MPAS model produced the input
data with
‘-P mpasatmosphere’, (or ‘-P mpasa’ for short),
‘-P mpasocean’, (or ‘-P mpaso’ for short),
‘-P mpasseaice’, (or ‘-P mpassi’ for short), or
‘-P mali’, like this:
ncclimo -P mpasa  -c $case -s 1980 -e 1983 -i $drc_in -o $drc_out # MPAS-A
ncclimo -P mpaso  -c $case -s 1980 -e 1983 -i $drc_in -o $drc_out # MPAS-O
ncclimo -P mpassi -c $case -s 1980 -e 1983 -i $drc_in -o $drc_out # MPAS-SI
ncclimo -P mali   -c $case -s 1980 -e 1983 -i $drc_in -o $drc_out # MPAS-LI
As of June 2024 and NCO version 5.2.5, ncclimo
updated its MPAS dataset filename construction option.
Previously it constructed MPAS monthly datasets names like this:
$mdl_nm.hist.am.timeSeriesStatsMonthly.$YYYY-$MM-01.nc.
where mdl_nm is the canonical MPAS component name,
e.g., mpaso
.
This yielded names consistent with E3SM v1 output like
mpaso.hist.am.timeSeriesStatsMonthly.0001-02-01.nc, and
mpascice.hist.am.timeSeriesStatsMonthly.0001-02-01.nc.
Now ncclimo
prepends the caseid, if present, to the
filename.
This yields names consistent with E3SM v2 and v3 output like
v2.LR.historical_0101.mpaso.hist.am.timeSeriesStatsMonthly.0001-02-01.nc, and
v2.LR.historical_0101.mpassi.hist.am.timeSeriesStatsMonthly.0001-02-01.nc.
To read MPAS filenames with other patterns, simply pipe the
filenames to ncclimo
: ‘ls *mpas*hist | ncclimo ...’.
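The two naming patterns described above can be sketched as a small shell function. The function name mpas_fl_nm is hypothetical and is not how ncclimo is implemented. It assumes only the pattern quoted in the text, with the caseid prepended when one is supplied:

```shell
# Hypothetical helper, not part of NCO: build an MPAS monthly filename.
# An empty caseid gives the older (E3SM v1 style) name; a non-empty caseid
# is prepended, giving the E3SM v2/v3 style name.
# Pass year and month without leading zeros (e.g., 8, not 08), because
# printf would try to parse "08" as an invalid octal number.
mpas_fl_nm() {
    local caseid=$1 mdl_nm=$2 yr=$3 mth=$4
    printf '%s%s.hist.am.timeSeriesStatsMonthly.%04d-%02d-01.nc\n' \
        "${caseid:+$caseid.}" "$mdl_nm" "$yr" "$mth"
}

mpas_fl_nm '' mpaso 1 2
mpas_fl_nm v2.LR.historical_0101 mpaso 1 2
```

The first call prints mpaso.hist.am.timeSeriesStatsMonthly.0001-02-01.nc and the second prints v2.LR.historical_0101.mpaso.hist.am.timeSeriesStatsMonthly.0001-02-01.nc, matching the examples in the text.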
Raw output data from all MPAS models does not contain
missing value attributes.
These attributes must be manually added before sending the data as
input to ncclimo or ncremap.
We recommend that simulation producers annotate all floating point
variables with the appropriate _FillValue
prior to invoking
ncclimo
.
Run something like this once in the history-file directory:
for fl in `ls hist.*` ; do
  ncatted -O -t -a _FillValue,,o,d,-9.99999979021476795361e+33 ${fl}
done
If/when MPAS-O/I generates the _FillValue
attributes
itself, this step can and should be skipped.
All other ncclimo
features like regridding (below) are invoked
identically for MPAS as for CAM/CLM users
although under-the-hood ncclimo
does do some special
pre-processing (dimension permutation, metadata annotation) for
MPAS.
A five-year oEC60to30 MPAS-O climo with regridding to T62
takes less than 10 minutes on the machine rhea.
Not all model or observed history files are created as monthly means.
To create a climatological annual mean from a series of annual mean
inputs, select ncclimo
’s annual climatology mode with
the ‘-C ann’ option:
ncclimo -C ann -m cism -h h -c caseid -s 1851 -e 1900 -i drc_in -o drc_out
The options ‘-m mdl_nm’ and ‘-h hst_nm’ (that default to
cam
and h0
, respectively) tell ncclimo
how to
construct the input filenames.
The above formula names the files
caseid.cism.h.1851-01-01-00000.nc
,
caseid.cism.h.1852-01-01-00000.nc
,
and so on.
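As an illustration of the naming formula, the following shell sketch prints the input filenames that the example command would look for in its first three years. The function name ann_fl_nm is hypothetical and not part of NCO. The pattern is inferred solely from the two filenames shown above:

```shell
# Hypothetical helper, not part of NCO: build the annual-mean input filename
# from caseid, model name (-m), history name (-h), and year.
ann_fl_nm() {
    local caseid=$1 mdl_nm=$2 hst_nm=$3 yr=$4
    printf '%s.%s.%s.%04d-01-01-00000.nc\n' "$caseid" "$mdl_nm" "$hst_nm" "$yr"
}

# First three years of the example above (-s 1851)
yr=1851
while [ "$yr" -le 1853 ]; do
    ann_fl_nm caseid cism h "$yr"
    yr=$(( yr + 1 ))
done
```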
Annual climatology mode produces a single output file (or two if
regridding is selected), and in all other respects behaves the same as
monthly climatology mode.
ncclimo
will (optionally) regrid during climatology generation
and produce climatology files on both native and analysis grids.
This regridding is virtually free, because it is performed on idle
nodes/cores after monthly climatologies have been computed and while
seasonal climatologies are being computed.
This load-balancing can save half-an-hour on ne120 datasets.
To regrid, simply pass the desired mapfile name with ‘-r map.nc’,
e.g., ‘-r maps/map_ne120np4_to_fv257x512_aave.20150901.nc’.
Although this should not be necessary for normal use, you may pass any
options specific to regridding with ‘-R opt1 opt2’.
Specifying ‘-O drc_rgr’ (NB: uppercase ‘O’) causes
ncclimo
to place the regridded files in the directory
drc_rgr.
These files have the same names as the native grid climos from which
they were derived.
There is no namespace conflict because they are in separate
directories.
These files also have symbolic links to their AMWG filenames.
If ‘-O drc_rgr’ is not specified, ncclimo
places
all regridded files in the native grid climo output directory,
drc_out, specified by ‘-o drc_out’ (NB: lowercase
‘o’).
To avoid namespace conflicts when both climos are stored in the same
directory, the names of regridded files are suffixed by the destination
geometry string obtained from the mapfile, e.g.,
*_climo_fv257x512_bilin.nc.
These files also have symbolic links to their AMWG filenames.
ncclimo -c amip_xpt -s 1980 -e 1983 -i drc_in -o drc_out
ncclimo -c amip_xpt -s 1980 -e 1983 -i drc_in -o drc_out -r map_fl
ncclimo -c amip_xpt -s 1980 -e 1983 -i drc_in -o drc_out -r map_fl -O drc_rgr
The above commands perform a climatology without regridding, then with
regridding (all climos stored in drc_out), then with regridding and
storing regridded files separately.
Paths specified by drc_in, drc_out, and drc_rgr may be
relative or absolute.
An alternative to regridding during climatology generation is to regrid
afterwards with ncremap
, which has more special features
built-in for regridding.
To use ncremap
to regrid a climatology in drc_out and
place the results in drc_rgr, use something like
ncremap -I drc_out -m map.nc -O drc_rgr
ls drc_out/*climo* | ncremap -m map.nc -O drc_rgr
See ncremap
netCDF Remapper for more details (including
MPAS!).
ncclimo
supports two methods for generating extended
climatologies: Binary and Incremental.
Both methods lengthen a climatology without requiring access to
all the raw monthly data spanning the time period.
The binary method combines, with appropriate weighting, two previously
computed climatologies into a single climatology.
No raw monthly data are employed.
The incremental method computes a climatology from raw monthly data
and (with appropriate weighting) combines that with a previously
computed climatology that ends the month before the raw data begin.
The incremental method was introduced in NCO version 4.6.1
(released August, 2016), and the binary method was introduced in
NCO version 4.6.3 (released December, 2016).
Both methods, binary and incremental, compute the so-called “extended climo” as a weighted mean of two shorter climatologies, called the “previous” and “current” climos. The incremental method uses the original monthly input to compute the current climo, which must immediately follow in time the previous climo, which has been pre-computed. The binary method uses pre-computed climos for both the previous and current climos, and these climos need not be sequential nor chronological. Both previous and current climos for both binary and incremental methods may be of any length (in years); their weights will be automatically adjusted in computing the extended climo.
The use of pre-computed climos permits ongoing simulations (or lengthy observations) to be analyzed in shorter segments combined piecemeal, instead of requiring all raw, native-grid data to be simultaneously accessible. Without extended climatology capability, generating a one-hundred year climatology requires that one-hundred years of monthly data be available on disk. Disk-space requirements for large datasets may make this untenable. Extended climo methods permit a one-hundred year climo to be generated as the weighted mean of, say, the current ten year climatology (weighted at 10%) combined with the pre-computed climatology of the previous 90-years (weighted at 90%). The 90-year climo could itself have been generated incrementally or binary-wise, and so on. Climatologies occupy at most 17/(12N) the amount of space of N years of monthly data, so the extended methods vastly reduce disk-space requirements.
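The weights and the disk-space ratio quoted above follow from simple arithmetic, sketched here in shell with awk. Both function names are hypothetical and not part of NCO. The weights are assumed to be proportional to the length in years of each climo, as in the 90%/10% example. The 17 in the ratio presumably counts twelve monthly, four seasonal, and one annual climatology file, which is an assumption not stated in the text:

```shell
# Hypothetical helpers, not part of NCO.

# Weights of the previous and current climos in the extended climo,
# assuming weights proportional to the number of years in each.
xtn_wgt() {
    awk -v prv="$1" -v crr="$2" 'BEGIN {
        tot = prv + crr
        printf "%.2f %.2f\n", prv / tot, crr / tot
    }'
}

# Upper bound on climatology size as a fraction of yr_nbr years of monthly
# data, using the 17/(12N) estimate from the text.
clm_dsk_frc() {
    awk -v yr_nbr="$1" 'BEGIN { printf "%.4f\n", 17 / (12 * yr_nbr) }'
}

xtn_wgt 90 10      # prints 0.90 0.10
clm_dsk_frc 100    # prints 0.0142
```

For a one-hundred year record the climatology thus occupies at most about 1.4% of the space of the raw monthly data.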
Incremental mode is selected by specifying ‘-S’, the start year
of the pre-computed, previous climo.
That start year, together with the current climo end year, determines
the extended climo range.
ncclimo
assumes that the previous climo ends the month before
the current climo begins.
In incremental mode, ncclimo
first generates the current
climatology from the current monthly input files then weights
that current climo with the previous climo to produce the extended
climo.
Binary mode is selected by specifying both ‘-S’ and ‘-E’, the end year of the pre-computed, previous climo. In binary mode, the previous and current climatologies can be of any length, and from any time-period, even overlapping. Most users will run extended climos the same way they run regular climos in terms of parallelism and regridding, although that is not required. Both climos must treat Decembers the same way (or else previous climo files will not be found), and if subsetting (i.e., ‘-v var_lst’) is performed, then the subset must remain the same, and if nicknames (i.e., ‘-f fml_nm’) are employed, then the nickname must remain the same.
As of 20161129, the climatology_bounds
attributes of extended
climos are incorrect.
This is a work in progress...
Options:
--yr_end_prv, --prv_yr_end, --previous_end
The ending year of the previous climo. This argument is required to trigger binary climatologies, and should not be used for incremental climatologies.
--yr_srt_prv, --prv_yr_srt, --previous_start
The starting year of the previous climo. This argument is required to trigger incremental climatologies, and is also mandatory for binary climatologies.
--drc_xtn, --xtn_drc, --extended
Directory in which the extended native grid climo files will be stored for an extended climatology. Default value is drc_prv. Unless a separate directory is specified (with ‘-Y’) for the extended climo on the analysis grid, it will be stored in drc_xtn, too.
--drc_prv, --prv_drc, --previous
Directory in which the previous native grid climo files reside for an incremental climatology. Default value is drc_out. Unless a separate directory is specified (with ‘-y’) for the previous climo on the analysis grid, it is assumed to reside in drc_prv, too.
--drc_rgr_xtn, --drc_xtn_rgr, --extended_regridded, --regridded_extended
Directory in which the extended analysis grid climo files will be stored in an incremental climatology. Default value is drc_xtn.
--drc_rgr_prv, --drc_prv_rgr, --regridded_previous, --previous_regridded
Directory in which the previous climo on the analysis grid resides in an incremental climatology. Default value is drc_prv.
Incremental method climatologies can be as simple as providing a start year for the previous climo, e.g.,
ncclimo -v FSNT,AODVIS -c caseid -s 1980 -e 1981 -i raw -o clm -r map.nc
ncclimo -v FSNT,AODVIS -c caseid -s 1982 -e 1983 -i raw -o clm -r map.nc -S 1980
By default ncclimo
stores all native and analysis grid climos
in one directory so the above “just works”.
There are no namespace clashes because all climos are for distinct
years, and regridded files have a suffix based on their grid resolution.
However, there can be only one set of AMWG filename links
due to AMWG filename convention.
Thus AMWG filename links, if any, point to the latest extended
climo in a given directory.
Many researchers segregate (with ‘-O drc_rgr’) native-grid from analysis-grid climos. Incrementally generated climos must be consistent in this regard. In other words, all climos contributing to an extended climo must have their native-grid and analysis-grid files in the same (per-climo) directory, or all climos must segregate their native from their analysis grid files. Do not segregate the grids in one climo, and combine them in another. Such climos cannot be incrementally aggregated. Thus incrementing climos can require from zero to four additional options that specify all the previous and extended climatologies for both native and analysis grids. The example below constructs the current climo in clm, then combines the weighted average of that with the previous climo in prv, and places the resulting extended climatology in xtn. Here the native and analysis climos are combined in one directory per climo:
ncclimo -v FSNT,AODVIS -c caseid -s 1980 -e 1981 -i raw -o prv -r map.nc
ncclimo -v FSNT,AODVIS -c caseid -s 1982 -e 1983 -i raw -o clm -r map.nc \
  -S 1980 -x prv -X xtn
If the native and analysis grid climo directories are segregated, then those directories must be specified, too:
ncclimo -v FSNT,AODVIS -c caseid -s 1980 -e 1981 -i raw -o prv -O rgr_prv -r map.nc
ncclimo -v FSNT,AODVIS -c caseid -s 1982 -e 1983 -i raw -o clm -O rgr -r map.nc \
  -S 1980 -x prv -X xtn -y rgr_prv -Y rgr_xtn
ncclimo
does not know whether a pre-computed climo is on a
native grid or an analysis grid, i.e., whether it has been regridded.
In binary mode, ncclimo
may be pointed to two pre-computed
native grid climatologies, or to two pre-computed analysis grid
climatologies.
In other words, it is not necessary to maintain native grid
climatologies for use in creating extended climatologies.
It is sufficient to generate climatologies on the analysis grid, and
feed them to ncclimo
in binary mode, without a mapping file:
ncclimo -c caseid -S 1980 -E 1981 -x prv -s 1980 -e 1981 -i crr -o clm
ncclimo
works on all E3SM/ACME and CESM models.
It can simultaneously generate climatologies for a coupled run, where
climatologies mean both native and regridded monthly, seasonal, and
annual averages as per E3SM/ACME specifications (which mandate
the inclusion of certain helpful metadata and provenance information).
Here are template commands for a recent simulation:
caseid=20160121.A_B2000ATMMOD.ne30_oEC.titan.a00
drc_in=/scratch/simulations/$caseid/run
drc_out=${DATA}/acme
map_atm=${DATA}/maps/map_ne30np4_to_fv129x256_aave.20150901.nc
map_lnd=$map_atm
map_ocn=${DATA}/maps/map_oEC60to30_to_t62_bilin.20160301.nc
map_ice=$map_ocn
ncclimo -p mpi -c $caseid -m cam    -s 2 -e 5 -i $drc_in -r $map_atm -o $drc_out/atm
ncclimo        -c $caseid -m clm2   -s 2 -e 5 -i $drc_in -r $map_lnd -o $drc_out/lnd
ncclimo -p mpi            -m mpaso  -s 2 -e 5 -i $drc_in -r $map_ocn -o $drc_out/ocn
ncclimo                   -m mpassi -s 2 -e 5 -i $drc_in -r $map_ice -o $drc_out/ice
Atmosphere and ocean model output is typically larger than land and ice
model output.
These commands recognize that by using different parallelization
strategies that may (rhea standard queue) or may not
(cooley, or rhea’s bigmem
queue) be required,
depending on the fatness of the analysis nodes, as explained below.
It is important to employ the optimal ncclimo
parallelization
strategy for your computer hardware resources.
Select from the three available choices with the
-p par_typ switch.
The options are serial mode (‘-p srl’, ‘-p serial’, or ‘-p nil’),
background mode parallelism (‘-p bck’, or ‘-p background’),
and MPI parallelism (‘-p mpi’ or ‘-p MPI’).
The default is background-mode parallelism.
This is appropriate for lower resolution (e.g., ne30L30) simulations on
most nodes at high-performance computer centers.
Use (or at least start with) serial mode on personal
laptops/workstations.
Serial mode requires twelve times less RAM than the parallel
modes, and is much less likely to deadlock or cause OOM
(out-of-memory) conditions on your personal computer.
If the available RAM (plus swap) is
< 12*4*sizeof(monthly input file), then try serial
mode first (12 is the optimal number of parallel processes for monthly
climos, and the computational overhead is a factor of four).
EAM-SE ne30L30 output is about 1 GB/month so each month
requires about 4 GB of RAM.
EAM-SE ne30L72 output (with LINOZ) is about
10 GB/month so each month requires about 40 GB RAM.
EAM-SE ne120 output is about 12 GB/month so each month
requires about 48 GB RAM.
The computer does not actually use all this memory at one time, and many
kernels compress RAM usage to below what top reports, so the
actual physical usage is hard to pin-down, but may be a factor of
2.5–3.0 (rather than a factor of four) times the size of the input
file.
For instance, my 16 GB 2014 MacBookPro successfully runs an ne30L30
climatology (that requests 48 GB RAM) in background mode.
However the laptop is slow and unresponsive for other uses until it
finishes (in 6–8 minutes) the climos.
Experiment and choose the parallelization option that performs best.
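The rule of thumb above can be turned into a quick pre-flight check. This sketch is not part of NCO and the function name par_typ_sgs is hypothetical. It takes the available RAM (plus swap) and the size of one monthly input file, both as whole numbers of GB, and applies the conservative 12*4 factor from the text:

```shell
# Hypothetical helper, not part of NCO: suggest a par_typ from the available
# RAM (plus swap) and the size of one monthly input file, both in integer GB.
par_typ_sgs() {
    local ram_gb=$1 fl_gb=$2
    local ram_rqr=$(( 12 * 4 * fl_gb ))   # twelve processes, fourfold overhead
    if [ "$ram_gb" -lt "$ram_rqr" ]; then
        echo srl    # try serial mode first
    else
        echo bck    # background mode should fit on one node
    fi
}

par_typ_sgs 16 1      # 16 GB laptop, ne30L30 (1 GB/month): prints srl
par_typ_sgs 128 1     # 128 GB node, ne30L30: prints bck
par_typ_sgs 128 12    # 128 GB node, ne120 (12 GB/month): prints srl
```

Because actual memory use is often lower than this estimate, as noted above, a suggestion of srl means only that serial mode is the safe starting point, not that background mode will necessarily fail.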
Serial mode, as its name implies, uses one core at a time for climos, and proceeds sequentially from months to seasons to annual climatologies. Serial mode means that climos are performed serially, while regridding still employs OpenMP threading (up to 16 cores) on platforms that support it.
By design each month and each season is independent of the others, so all months can be computed in parallel, then each season can be computed in parallel (using monthly climatologies), from which the annual average is computed. Background parallelization mode exploits this parallelism and executes the climos in parallel as background processes on a single node, so that twelve cores are simultaneously employed for monthly climatologies, four for seasonal, and one for annual. The optional regridding will employ, by default, up to two cores per process.
The MPI parallelism mode executes the climatologies on different nodes so that up to (optimally) twelve nodes compute monthly climos. The full memory of each node is available for each individual climo. The optional regridding employs, by default, up to eight cores per node in MPI-mode.
MPI-mode or serial-mode must be used to process ne30L72 and ne120L30 climos on all but the fattest DOE nodes. An ne120L30 climo in background mode on rhea (i.e., on one 128 GB compute node) fails due to OOM. (Unfortunately OOM errors do not produce useful return codes so if your climo processes die without printing useful information, the cause may be OOM). However the same climo in background-mode succeeds when executed on a single big-memory (1 TB) node on rhea (use ‘-lpartition=gpu’, as shown below). Alternatively, MPI-mode can be used for any climatology. The same ne120L30 climo will also finish blazingly fast in background mode on cooley (i.e., on one 384 GB compute node), so MPI-mode is unnecessary on cooley. In general, the fatter the memory, the better the performance.
The basic approach above (running the script from a standard terminal
window) that works well for small cases can be unpleasantly slow on
login nodes of LCFs and for longer or higher resolution (e.g.,
ne120) climatologies.
As a baseline, generating a climatology of 5 years of ne30 (~1x1
degree) EAM-SE output with ncclimo
takes 1–2
minutes on rhea (at a time with little contention), and 6–8
minutes on a 2014 MacBook Pro.
To make things a bit faster at LCFs, request a dedicated node
(this only makes sense on supercomputers or clusters with
job-schedulers).
On rhea or titan, which use the PBS scheduler,
do this with
# Standard node (128 GB), PBS scheduler
qsub -I -A CLI115 -V -l nodes=1 -l walltime=00:10:00 -N ncclimo
# Bigmem node (1 TB), PBS scheduler
qsub -I -A CLI115 -V -l nodes=1 -l walltime=00:10:00 -lpartition=gpu -N ncclimo
The equivalent requests on cooley or mira (Cobalt scheduler) and cori or edison (SLURM scheduler) are:
# Cooley node (384 GB) with Cobalt
qsub -I -A HiRes_EarthSys --nodecount=1 --time=00:10:00 --jobname=ncclimo
# Cori node (128 GB) with SLURM
salloc -A acme --nodes=1 --partition=debug --time=00:10:00 --job-name=ncclimo
Flags used and their meanings:
‘-I’
Submit in interactive mode. This returns a new terminal shell rather than running a program.
‘-l walltime’ (PBS), ‘--time’ (Cobalt and SLURM)
How long to keep this dedicated node for. Unless you kill the shell created by the qsub command, the shell will exist for this amount of time, then die suddenly. In the above examples, 10 minutes is requested.
‘-l nodes=1’
PBS syntax (e.g., on rhea) for nodes.
‘--nodecount=1’
Cobalt syntax (e.g., on cooley) for nodes.
‘--nodes=1’
SLURM syntax (e.g., on cori or edison) for nodes.
These scheduler-dependent variations request a quantity of nodes. Request 1 node for Serial or Background-mode, and up to 12 nodes for MPI-mode parallelism. In all cases ncclimo will use multiple cores per node if available.
‘-V’
Export existing environment variables into the new interactive shell. This may not actually be needed.
‘--partition’ (SLURM)
Queue name. This is needed for locations like edison that have multiple queues with no default queue.
‘-A’
Name of account to charge for time used.
Acquiring a dedicated node is useful for any workflow, not just creating climos.
This command returns a prompt once nodes are assigned (the prompt is
returned in your home directory so you may then have to cd
to
the location you meant to run from).
Then run your code with the basic ncclimo
invocation.
This is faster because the node is exclusively dedicated to ncclimo.
Again, ne30L30 climos only require < 2 minutes, so the
10 minutes requested in the example is excessive and conservative.
Tune it with experience.
The above parallel approaches will fail when a single node lacks enough
RAM (plus swap) to store all twelve monthly input files, plus
extra RAM for computations.
One should employ MPI multinode parallelism ‘-p mpi’
on nodes with less RAM than 12*3*sizeof(input).
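As a rough illustration of this threshold, the following sketch compares a node's RAM against 12*3*sizeof(input). It assumes that sizeof(input) means the size of one monthly input file in whole gigabytes; the function names ram_needed_gb and suggest_mode and the sample sizes are hypothetical, not part of ncclimo.

```shell
# Estimate RAM (in GB) needed to hold twelve months on one node,
# using the 12*3*sizeof(input) rule of thumb from the text.
# Assumption: $1 is the size of ONE monthly input file, in whole GB.
ram_needed_gb() {
  echo $(( 12 * 3 * $1 ))
}

# Suggest a parallelism mode given file size ($1) and node RAM ($2), both in GB
suggest_mode() {
  if [ "$2" -lt "$(ram_needed_gb "$1")" ]; then
    echo mpi          # node too small: spread months across nodes with '-p mpi'
  else
    echo background   # node is large enough for single-node parallelism
  fi
}

suggest_mode 5 128   # 5 GB monthly files on a 128 GB node
```

With these sample numbers the estimate is 180 GB, which exceeds 128 GB, so the sketch suggests MPI mode.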
The longest an ne120 climo will take is less than half an hour (~25
minutes on edison or rhea), so the simplest method to run
MPI jobs is to request 12 interactive nodes using the above
commands (though remember to add ‘-p mpi’), then execute the script
at the command line.
It is also possible, and sometimes preferable, to request
non-interactive compute nodes in a batch queue.
Executing an MPI-mode climo (on machines with job scheduling
and, optimally, 12 nodes) in a batch queue can be done in two
commands.
First, write an executable file which calls the ncclimo
script
with appropriate arguments.
We do this below by echoing to a file, ncclimo.pbs.
echo "ncclimo -p mpi -c $caseid -s 1 -e 20 -i $drc_in -o $drc_out" > ncclimo.pbs
The only new argument here is ‘-p mpi’ that tells ncclimo
to use MPI parallelism.
Then execute this command file with a 12 node non-interactive job:
qsub -A CLI115 -V -l nodes=12 -l walltime=00:30:00 -j oe -m e -N ncclimo \
  -o ncclimo.out ncclimo.pbs
This command adds new flags: ‘-j oe’ (combine output and error streams into standard output), ‘-m e’ (send email to the job submitter when the job ends), ‘-o ncclimo.out’ (write all output to ncclimo.out). The above commands are meant for PBS schedulers like on rhea. Equivalent commands for cooley/mira (Cobalt) and cori/edison (SLURM) are
# Cooley (Cobalt scheduler)
/bin/rm -f ncclimo.err ncclimo.out
echo '#!/bin/bash' > ncclimo.cobalt
echo "ncclimo -p mpi -c $caseid -s 1 -e 20 -i $drc_in -o $drc_out" >> ncclimo.cobalt
chmod a+x ncclimo.cobalt
qsub -A HiRes_EarthSys --nodecount=12 --time=00:30:00 --jobname ncclimo \
  --error ncclimo.err --output ncclimo.out --notify zender@uci.edu ncclimo.cobalt
# Cori/Edison (SLURM scheduler)
echo '#!/bin/bash' > ncclimo.slurm
echo "ncclimo -p mpi -c $caseid -s 1 -e 20 -i $drc_in -o $drc_out -r $map_fl" \
  >> ncclimo.slurm
chmod a+x ncclimo.slurm
sbatch -A acme --nodes=12 --time=03:00:00 --partition=regular --job-name=ncclimo \
  --mail-type=END --error=ncclimo.err --output=ncclimo.out ncclimo.slurm
Notice that Cobalt and SLURM require the introductory
shebang-interpreter line (#!/bin/bash) which PBS does not need.
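A here-document is one alternative to repeated echo commands for writing such a batch file, and it keeps the shebang visibly on the first line. The sketch below is illustrative only: the values assigned to caseid, drc_in, and drc_out are hypothetical placeholders for the user's own settings.

```shell
# Hypothetical values standing in for the user's settings
caseid=my_case
drc_in=/path/to/input
drc_out=/path/to/output

# Write the batch script with a here-document; the shebang is the first line.
# The EOF delimiter is unquoted, so $caseid etc. are expanded as the file is written.
cat > ncclimo.slurm << EOF
#!/bin/bash
ncclimo -p mpi -c $caseid -s 1 -e 20 -i $drc_in -o $drc_out
EOF
chmod a+x ncclimo.slurm

# Show the first line to confirm the interpreter line is in place
head -n 1 ncclimo.slurm
```

The resulting file can then be handed to the scheduler's submission command exactly as in the examples above.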
Set only the scheduler batch queue parameters mentioned above.
In MPI-mode, ncclimo
determines the appropriate
number of tasks-per-node based on the number of nodes available and
script internals (like load-balancing for regridding).
Hence do not set a tasks-per-node parameter with scheduler configuration
parameters as this could cause conflicts.
What does ncclimo do?
For monthly climatologies (e.g., JAN), ncclimo passes
the list of all relevant January monthly files to NCO’s
ncra command, which averages each variable in these monthly
files over their time-dimension (if it exists) or copies the value from
the first month unchanged (if no time-axis exists).
Seasonal climos are then created by taking the average of the monthly
climo files using ncra.
To account for differing numbers of days per month, the ncra
‘-w’ flag is used, followed by the number of days in the relevant
months.
For example, the MAM climo is computed with
‘ncra -w 31,30,31 MAR_climo.nc APR_climo.nc MAY_climo.nc MAM_climo.nc’
(details about file names and other optimization flags have been
stripped here to make the concept easier to follow).
The annual (ANN) climo is then computed as a weighted
average of the seasonal climos.
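To make the effect of the day weights concrete, the following sketch computes the same kind of weighted mean for a single gridpoint with awk. It is only an arithmetic illustration of the 31,30,31 weighting, not what ncra does internally; the function name mam_mean and the monthly values are hypothetical.

```shell
# Day-weighted mean of three monthly values, weighted as in 'ncra -w 31,30,31'.
# Arguments: MAR APR MAY values (hypothetical numbers for one gridpoint).
mam_mean() {
  awk -v mar="$1" -v apr="$2" -v may="$3" \
    'BEGIN { printf "%.4f\n", (31*mar + 30*apr + 31*may) / (31 + 30 + 31) }'
}

mam_mean 1 2 4   # prints 2.3370, whereas the unweighted mean is 2.3333
```

The difference between the weighted and unweighted results is small for a single season, but it is systematic, which is why the day counts are supplied.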
A climatology embodies many algorithmic choices, and regridding from the
native to the analysis grid involves still more choices.
A separate method should reproduce the ncclimo
and
NCO answers to round-off precision if it implements the same
algorithmic choices.
For example, ncclimo
agrees to round-off with AMWG
diagnostics when making the same (sometimes questionable) choices.
The most important choices have to do with converting single- to
double-precision (SP and DP, respectively),
treatment of missing values, and generation/application of regridding
weights.
For concreteness and clarity we describe the algorithmic choices made in
processing EAM-SE monthly output into a climatological
annual mean (ANN) and then regridding that.
Other climatologies (e.g., daily to monthly, or
annual-to-climatological) involve similar choices.
E3SM/ACME (and CESM) computes fields in DP and
outputs history (not restart) files as monthly means in SP.
The NCO climatology generator (ncclimo
) processes
these data in four stages.
Stage N accesses input only from stage N-1, never
from stage N-2 or earlier.
Thus the (on-disk) files from stage N determine the highest
precision achievable by stage N+1.
The general principle is to perform math (addition, weighting,
normalization) in DP and output results to disk in the same
precision in which they were input from disk (usually SP).
In Stage 1, NCO ingests Stage 0 monthly means (raw
EAM-SE output), converts SP input to DP,
performs the average across all years, then converts the answer from
DP to SP for storage on-disk as the climatological
monthly mean.
In Stage 2, NCO ingests Stage 1 climatological monthly
means, converts SP input to DP, performs the average
across all months in the season (e.g., DJF), then converts the
answer from DP to SP for storage on-disk as the
climatological seasonal mean.
In Stage 3, NCO ingests Stage 2 climatological
seasonal means, converts SP input to DP, performs
the average across all four seasons (DJF, MAM,
JJA, SON), then converts the answer from
DP to SP for storage on-disk as the climatological
annual mean.
Stage 2 weights each input month by its number of days (e.g., 31 for January), and Stage 3 weights each input season by its number of days (e.g., 92 for MAM). E3SM/ACME runs EAM-SE with a 365-day calendar, so these weights are independent of year and never change.
The treatment of missing values in Stages 1–3 is limited by the lack of missing value tallies provided by Stage 0 (model) output. Stage 0 records a value as missing if it is missing for the entire month, and present if the value is valid for one or more timesteps. Stage 0 does not record the missing value tally (number of valid timesteps) for each spatial point. Thus a point with a single valid timestep during a month is weighted the same in Stages 1–4 as a point with 100% valid timesteps during the month. The absence of tallies inexorably degrades the accuracy of subsequent statistics by an amount that varies in time and space. On the positive side, it reduces the output size (by a factor of two) and complexity of analyzing fields that contain missing values. Due to the ambiguous nature of missing values, it is debatable whether they merit efforts to treat them more exactly.
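The day-count weights used in Stages 2 and 3 follow directly from the 365-day calendar. The sketch below derives the seasonal weights from month lengths; the function name season_days and the DJF,MAM,JJA,SON ordering of the list are illustrative assumptions rather than ncclimo internals.

```shell
# Print the number of days in a season for a 365-day (no-leap) calendar.
season_days() {
  case "$1" in
    DJF) echo $(( 31 + 31 + 28 )) ;;  # Dec + Jan + Feb
    MAM) echo $(( 31 + 30 + 31 )) ;;  # Mar + Apr + May
    JJA) echo $(( 30 + 31 + 31 )) ;;  # Jun + Jul + Aug
    SON) echo $(( 30 + 31 + 30 )) ;;  # Sep + Oct + Nov
    *) echo "unknown season: $1" >&2; return 1 ;;
  esac
}

# Build a comma-separated weight list in the order DJF,MAM,JJA,SON
wgt="$(season_days DJF),$(season_days MAM),$(season_days JJA),$(season_days SON)"
echo "$wgt"   # prints 90,92,92,91
```

The four weights sum to 365, so a season-weighted annual mean gives each day of the year equal weight.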
The vast majority of fields undergo three promotion/demotion cycles
between EAM-SE and ANN.
No promotion/demotion cycles occur for history fields that
EAM-SE outputs in DP rather than SP, nor
for fields without a time dimension.
Typically these fields are grid coordinates (e.g., longitude, latitude)
or model constants (e.g., CO2 mixing ratio).
NCO never performs any arithmetic on grid coordinates or
non-time-varying input, regardless of whether they are SP or
DP.
Instead, NCO copies these fields directly from the first
input file.
Stage 4 uses a mapfile to regrid climos from the native to the
desired analysis grid.
E3SM/ACME currently uses mapfiles generated by
ESMF_RegridWeightGen
(ERWG) and by TempestRemap.
The algorithmic choices, approximations, and commands used to generate mapfiles from input gridfiles are separate issues. We mention only some of these issues here for brevity. Input gridfiles used by E3SM/ACME until ~20150901, and by CESM (then and currently, at least for Gaussian grids) contained flaws that effectively reduced their precision, especially at regional scales, and especially for Gaussian grids. E3SM/ACME (and CESM) mapfiles continue to approximate grids as connected by great circles, whereas most analysis grids (and some models) use great circles for longitude and small circles for latitude. The great circle assumption may be removed in the future. Constraints imposed by ERWG during weight-generation ensure that global integrals of fields undergoing conservative regridding are exactly conserved.
Application of weights from the mapfile to regrid the native data to the
analysis grid is straightforward.
Grid fields (e.g., latitude, longitude, area) are not regridded.
Instead they are copied (and area is reconstructed if absent) directly
from the mapfile.
NCO ingests all other native grid (source) fields, converts
SP to DP, and accumulates destination gridcell
values as the sum of the DP weight (from the sparse matrix in
the mapfile) times the (usually SP-promoted-to-DP)
source values.
Fields without missing values are then stored to disk in their
original precision.
Fields with missing values are treated (by default) with what
NCO calls the “conservative” algorithm.
This algorithm uses all valid data from the source grid on the
destination grid once and only once.
Destination cells receive the weighted valid values of the source
cells.
This is conservative because the global integrals of the source and
destination fields are equal.
See ncremap netCDF Remapper for more description of the
conservative algorithm and of the optional (“renormalized”) algorithm.
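The weight-application step described above amounts to a sparse matrix-vector product that skips missing source values. The following awk sketch illustrates that idea on a toy grid; it is a simplified stand-in, not NCO's implementation. The function name apply_map, the "dst src weight" triplet layout, and the sample numbers are all hypothetical, and the sketch performs no renormalization.

```shell
# apply_map SRC_VALUES MISSING_VALUE < triplets
#   SRC_VALUES: comma-separated source-cell values (1-based order)
#   stdin:      one "dst_idx src_idx weight" triplet per line (sparse matrix)
apply_map() {
  awk -v vals="$1" -v mss="$2" '
    BEGIN { split(vals, src, ",") }
    {
      if ($1 > n_dst) n_dst = $1
      # Accumulate weight*value only for valid (non-missing) source cells
      if (src[$2] + 0 != mss + 0) dst[$1] += $3 * src[$2]
    }
    END { for (i = 1; i <= n_dst; i++) printf "%.2f\n", dst[i] + 0 }
  '
}

# Three source cells (the third is missing) mapped onto two destination cells
printf '%s\n' '1 1 0.5' '1 2 0.5' '2 2 0.25' '2 3 0.75' | apply_map '10,20,-999' -999
```

Here the first destination cell receives 0.5*10 + 0.5*20 = 15.00, while the second receives only 0.25*20 = 5.00 because its other contributor is missing.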
EXAMPLES
How does one create a climo from a collection of monthly
non-CESM’ish files?
This is a two-step procedure:
First be sure the names are arranged with a YYYYMM-format date
preceding the suffix (usually ‘.nc’).
Then give any monthly input filename to ncclimo.
Consider the MERRA2 collection, for example.
As retrieved from NASA, MERRA2 files have names like
svc_MERRA2_300.tavgM_2d_aer_Nx.200903.nc4.
While the sub-string ‘200903’ is easy to recognize as a month in
YYYYMM format, other parts (specifically the ‘300’ code)
of the filename also change with date.
We can use Bash regular expressions to extract dates and create symbolic
links to simpler filenames with regularly patterned YYYYMM
strings like merra2_200903.nc4:
for fl in `ls *.nc4` ; do
  # Convert svc_MERRA2_300.tavgM_2d_aer_Nx.YYYYMM.nc4 to merra2_YYYYMM.nc4
  sfx_out=`expr match "${fl}" '.*_Nx.\(.*.nc4\)'`
  fl_out="merra2_${sfx_out}"
  ln -s ${fl} ${fl_out}
done
Then call ncclimo with any standard format filename, e.g.,
merra2_200903.nc4, as the caseid:
ncclimo -c merra2_200903.nc4 -s 1980 -e 2016 -i $drc_in -o $drc_out
In the default monthly climo generation mode, ncclimo
expects
each input file to contain one single record that is the monthly average
of all fields.
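The MERRA2 renaming above relies on ‘expr match’, which is not available in every expr implementation. As a hedged alternative, the same suffix can be extracted with POSIX parameter expansion; the helper name merra2_name below is hypothetical.

```shell
# Map a MERRA2 filename to merra2_YYYYMM.nc4 with POSIX parameter expansion.
# "${1##*_Nx.}" deletes the longest prefix ending in "_Nx.", leaving "YYYYMM.nc4".
merra2_name() {
  echo "merra2_${1##*_Nx.}"
}

merra2_name svc_MERRA2_300.tavgM_2d_aer_Nx.200903.nc4   # prints merra2_200903.nc4
```

Inside the renaming loop, the link could then be created with ln -s "${fl}" "$(merra2_name "${fl}")".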
Another example of wrangling observed datasets into a
CESMish format is ECMWF Integrated Forecasting
System (IFS) output that contains twelve months per file,
rather than the one month per file that ncclimo expects.
for yr in {1979..2016}; do
  # Convert ifs_YYYY01-YYYY12.nc to ifs_YYYYMM.nc
  yyyy=`printf "%04d" $yr`
  for mth in {1..12}; do
    mm=`printf "%02d" $mth`
    ncks -O -F -d time,${mth} ifs_${yyyy}01-${yyyy}12.nc ifs_${yyyy}${mm}.nc
  done
done
Then call ncclimo with ifs_197901.nc as caseid:
ncclimo -c ifs_197901.nc -s 1979 -e 2016 -i $drc_in -o $drc_out
ncclimo does not recognize all combinations imaginable of records per file and files per year. However, support can be added for the most prevalent combinations so that ncclimo, rather than the user, does any necessary data wrangling. Contact us if there is a common input data format you would like supported as a custom option.
Often one wishes to create a climatology of a single variable.
The ‘-f fml_nm’ option to ncclimo makes this easy.
Consider a series of single-variable climos for the fields FSNT and FLNT:
ncclimo -v FSNT -f FSNT -c amip_xpt -s 1980 -e 1983 -i drc_in -o drc_out
ncclimo -v FLNT -f FLNT -c amip_xpt -s 1980 -e 1983 -i drc_in -o drc_out
These climos use the ‘-f’ option and so their output files will have no namespace conflicts. Moreover, the climatologies can be generated in parallel.
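One way to launch such independent climos side by side is with shell background jobs, sketched below. The sketch is a dry run by default: the NCCLIMO variable, the run_climos function, and the placeholder directory paths are hypothetical conveniences, and only the ncclimo options already shown above are used.

```shell
# Dry run by default: NCCLIMO defaults to "echo ncclimo", which only prints
# each command. Set NCCLIMO=ncclimo to actually launch the jobs.
NCCLIMO="${NCCLIMO:-echo ncclimo}"
drc_in="${drc_in:-/path/to/input}"     # placeholder input directory
drc_out="${drc_out:-/path/to/output}"  # placeholder output directory

run_climos() {
  for var in "$@"; do
    # Each variable gets its own family name (-f), so output names do not collide
    $NCCLIMO -v "$var" -f "$var" -c amip_xpt -s 1980 -e 1983 -i "$drc_in" -o "$drc_out" &
  done
  wait   # block until every background climo has finished
}

run_climos FSNT FLNT
```

Because the jobs run concurrently, the order in which their output appears is not guaranteed.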
ncecat netCDF Ensemble Concatenator
SYNTAX
ncecat [-3] [-4] [-5] [-6] [-7] [-A] [-C] [-c] [--cmp cmp_sng] [--cnk_byt sz_byt] [--cnk_csh sz_byt] [--cnk_dmn nm,sz_lmn] [--cnk_map map] [--cnk_min sz_byt] [--cnk_plc plc] [--cnk_scl sz_lmn] [-D dbg] [-d dim,[min][,[max][,[stride]]] [-F] [--fl_fmt fl_fmt] [-G gpe_dsc] [-g grp[,...]] [--gag] [--glb ...] [-H] [-h] [--hdf] [--hdr_pad nbr] [--hpss] [-L dfl_lvl] [-l path] [-M] [--md5_digest] [--mrd] [-n loop] [--no_cll_msr] [--no_frm_trm] [--no_tmp_fl] [-O] [-o output-file] [-p path] [--qnt ...] [--qnt_alg alg_nm] [-R] [-r] [--ram_all] [-t thr_nbr] [-u ulm_nm] [--unn] [-v var[,...]] [-X ...] [-x] [input-files] [output-file]
DESCRIPTION
ncecat aggregates an arbitrary number of input files into a
single output file using one of two methods.
Record AGgregation (RAG), the traditional method employed on
(flat) netCDF3 files and still the default method, stores
input-files as consecutive records in the output-file.
Group AGgregation (GAG) stores input-files as top-level
groups in the netCDF4 output-file.
Record Aggregation (RAG) makes numerous assumptions about the
structure of input files whereas Group Aggregation (GAG) makes
none.
Both methods are described in detail below.
Since ncecat
aggregates all the contents of the input files,
it can easily produce large output files so it is often helpful to invoke
subsetting simultaneously (see Subsetting Files).
RAG makes each variable (except coordinate variables) in each input file into a single record of the same variable in the output file. Coordinate variables are not concatenated, they are instead simply copied from the first input file to the output-file. All input-files must contain all extracted variables (or else there would be “gaps” in the output file).
A new record dimension is the glue which binds together the input file data. The new record dimension is defined in the root group of the output file so it is visible to all sub-groups. Its name is, by default, “record”. This default name can be overridden with the ‘-u ulm_nm’ short option (or the ‘--ulm_nm’ or ‘--rcd_nm’ long options).
Each extracted variable must be constant in size and rank across all
input-files.
The only exception is that ncecat
allows files to differ in
the record dimension size if the requested record hyperslab
(see Hyperslabs) resolves to the same size for all files.
This allows easier gluing/averaging of unequal length timeseries from
simulation ensembles (e.g., the CMIP archive).
Classic (i.e., all netCDF3 and NETCDF4_CLASSIC) output files
can contain only one record dimension.
ncecat
makes room for the new glue record dimension by
changing the pre-existing record dimension, if any, in the input files
into a fixed dimension in the output file.
netCDF4 output files may contain any number of record dimensions, so
ncecat
need not and does not alter the record dimensions,
if any, of the input files as it copies them to the output file.
Group AGgregation (GAG) stores input-files as top-level groups in the output-file. No assumption is made about the size or shape or type of a given object (variable or dimension or group) in the input file. The entire contents of the extracted portion of each input file is placed in its own top-level group in output-file, which is automatically made as a netCDF4-format file.
GAG has two methods to specify group names for the
output-file.
The ‘-G’ option, or its long-option equivalent ‘--gpe’,
takes as argument a group path editing description gpe_dsc of
where to place the results.
Each input file needs a distinct output group name to avoid namespace
conflicts in the output-file.
Hence ncecat
automatically creates unique output group names
based on either the input filenames or the gpe_dsc arguments.
When the user provides gpe_dsc (i.e., with ‘-G’), then the
output groups are formed by enumerating sequential two-digit numeric
suffixes starting with zero, and appending them to the specified group
path (see Group Path Editing).
When gpe_dsc is not provided (i.e., user requests GAG with
‘--gag’ instead of ‘-G’), then ncecat
forms the
output groups by stripping the input file name of any type-suffix
(e.g., .nc
), and all but the final component of the full
filename.
ncecat --gag 85.nc 86.nc 87.nc 8587.nc # Output groups 85, 86, 87
ncecat -G 85_ a.nc b.nc c.nc 8589.nc # Output groups 85_00, 85_01, 85_02
ncecat -G 85/ a.nc b.nc c.nc 8589.nc # Output groups 85/00, 85/01, 85/02
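The two naming rules illustrated above can be mimicked in plain shell, which may help when predicting group names before running ncecat. This is only a sketch of the rules as the text states them: the function names gag_group and gpe_group are hypothetical, and the suffix is assumed to be ‘.nc’.

```shell
# Group name under --gag, per the text: drop the directory and the type-suffix.
# Assumption: the suffix is ".nc"; other suffixes would need their own handling.
gag_group() {
  basename "$1" .nc
}

# Group name under -G: the given path plus a two-digit index starting at zero
gpe_group() {
  printf '%s%02d\n' "$1" "$2"
}

gag_group /data/run/85.nc   # prints 85
gpe_group 85_ 0             # prints 85_00
gpe_group 85/ 2             # prints 85/02
```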
With both RAG and GAG the output-file size is
the sum of the sizes of the extracted variables in the input files.
See Statistics vs Concatenation, for a description of the
distinctions between the various statistics tools and concatenators.
As a multi-file operator, ncecat
will read the list of
input-files from stdin
if they are not specified
as positional arguments on the command line
(see Large Numbers of Files).
Suppress global metadata copying.
By default NCO’s multi-file operators copy the global metadata
from the first input file into output-file.
This helps to preserve the provenance of the output data.
However, the use of metadata is burgeoning and sometimes one
encounters files with excessive amounts of extraneous metadata.
Extracting small bits of data from such files leads to output files
which are much larger than necessary due to the automatically copied
metadata.
ncecat
supports turning off the default copying of global
metadata via the ‘-M’ switch (or its long option equivalents,
‘--no_glb_mtd’ and ‘--suppress_global_metadata’).
Consider five realizations, 85a.nc, 85b.nc,
… 85e.nc of 1985 predictions from the same climate
model.
Then ncecat 85?.nc 85_ens.nc
glues together the individual
realizations into the single file, 85_ens.nc.
If an input variable was dimensioned [lat
,lon
], it will
by default have dimensions [record
,lat
,lon
] in
the output file.
A restriction of ncecat
is that the hyperslabs of the
processed variables must be the same from file to file.
Normally this means all the input files are the same size, and contain
data on different realizations of the same variables.
Concatenating a variable packed with different scales across multiple
datasets is beyond the capabilities of ncecat (and ncrcat, the other
concatenator; see Concatenators ncrcat and ncecat).
ncecat does not unpack data; it simply copies the data
from the input-files, and the metadata from the first
input-file, to the output-file.
This means that data compressed with a packing convention must use
the identical packing parameters (e.g., scale_factor
and
add_offset
) for a given variable across all input files.
Otherwise the concatenated dataset will not unpack correctly.
The workaround for cases where the packing parameters differ across
input-files requires three steps:
First, unpack the data using ncpdq.
Second, concatenate the unpacked data using ncecat.
Third, re-pack the result with ncpdq.
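The three steps above might be scripted as follows. This is a dry-run sketch under stated assumptions: the function name repack_cat, the RUN variable, and the intermediate names upk_*.nc and upk_cat.nc are hypothetical, input files are assumed to sit in the current directory, and the packing policies ‘-P upk’ and ‘-P all_new’ are used for unpacking and re-packing.

```shell
# Dry run by default: RUN defaults to "echo", so commands are printed rather
# than executed. Invoke as 'RUN= repack_cat ...' (empty RUN) to really run NCO.
# Usage: repack_cat OUTPUT INPUT...
repack_cat() {
  run=${RUN-echo}
  out=$1; shift
  upk_lst=""
  for fl in "$@"; do
    $run ncpdq -O -P upk "$fl" "upk_$fl"      # 1. unpack each input file
    upk_lst="$upk_lst upk_$fl"
  done
  $run ncecat -O $upk_lst upk_cat.nc          # 2. concatenate the unpacked data
  $run ncpdq -O -P all_new upk_cat.nc "$out"  # 3. re-pack the result
}

repack_cat out.nc a.nc b.nc
```

Re-packing after concatenation means a single set of packing parameters is computed for each variable across all realizations.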
EXAMPLES
Consider a model experiment which generated five realizations of one year of data, say 1985. You can imagine that the experimenter slightly perturbs the initial conditions of the problem before generating each new solution. Assume each file contains all twelve months (a seasonal cycle) of data and we want to produce a single file containing all the seasonal cycles. Here the numeric filename suffix denotes the experiment number (not the month):
ncecat 85_01.nc 85_02.nc 85_03.nc 85_04.nc 85_05.nc 85.nc
ncecat 85_0[1-5].nc 85.nc
ncecat -n 5,2,1 85_01.nc 85.nc
These three commands produce identical answers. See Specifying Input Files, for an explanation of the distinctions between these methods. The output file, 85.nc, is five times the size of a single input-file. It contains 60 months of data.
One often prefers that the (new) record dimension have a more descriptive, context-based name than simply “record”. This is easily accomplished with the ‘-u ulm_nm’ switch. To add a new record dimension named “time” to all variables
ncecat -u time in.nc out.nc
To glue together multiple files with a new record dimension named “realization”
ncecat -u realization 85_0[1-5].nc 85.nc
Users are more likely to understand the data processing history when such descriptive coordinates are used.
Consider a file with an existing record dimension named time,
and suppose the user wishes to convert time from a record
dimension to a non-record dimension.
This may be useful, for example, when the user has another use for the
record variable.
The simplest method is to use ‘ncks --fix_rec_dmn’, and another
possibility is to use ncecat followed by ncwa:
ncecat in.nc out.nc # Convert time to non-record dimension
ncwa -O -a record out.nc out.nc # Remove new degenerate record dimension
The second step removes the degenerate record dimension.
See ncpdq netCDF Permute Dimensions Quickly and
ncks netCDF Kitchen Sink for other methods
of changing variable dimensionality, including the record dimension.
nces netCDF Ensemble Statistics
SYNTAX
nces [-3] [-4] [-5] [-6] [-7] [-A] [-C] [-c] [--cb y1,y2,m1,m2,tpd] [--cmp cmp_sng] [--cnk_byt sz_byt] [--cnk_csh sz_byt] [--cnk_dmn nm,sz_lmn] [--cnk_map map] [--cnk_min sz_byt] [--cnk_plc plc] [--cnk_scl sz_lmn] [-D dbg] [-d dim,[min][,[max][,[stride]]] [-F] [-G gpe_dsc] [-g grp[,...]] [--glb ...] [-H] [-h] [--hdf] [--hdr_pad nbr] [--hpss] [-L dfl_lvl] [-l path] [-n loop] [--no_cll_msr] [--no_frm_trm] [--no_tmp_fl] [--nsm_fl|grp] [--nsm_sfx sfx] [-O] [-o output-file] [-p path] [--qnt ...] [--qnt_alg alg_nm] [-R] [-r] [--ram_all] [--rth_dbl|flt] [-t thr_nbr] [--unn] [-v var[,...]] [-w wgt] [-X ...] [-x] [-y op_typ] [input-files] [output-file]
DESCRIPTION
nces
performs gridpoint statistics (including, but not limited
to, averages) on variables across an arbitrary number (an
ensemble) of input-files and/or of input groups within each
file.
Each file (or group) receives an equal weight by default.
nces was formerly (until NCO version 4.3.9,
released December, 2013) known as ncea (netCDF Ensemble
Averager).
For example, nces
will average a set of files or groups,
weighting each file or group evenly by default.
This is distinct from ncra
, which performs statistics only
over the record dimension(s) (e.g., time), and weights each record
in each record dimension evenly.
The file or group is the logical unit of organization for the results of
many scientific studies.
Often one wishes to generate a file or group which is the statistical
product (e.g., average) of many separate files or groups.
This may be to reduce statistical noise by combining the results of a
large number of experiments, or it may simply be a step in a procedure
whose goal is to compute anomalies from a mean state.
In any case, when one desires to generate a file whose statistical
properties are influenced by all the inputs, then use nces.
As of NCO version 4.9.4, released in July, 2020,
nces
accepts user-specified weights with the ‘-w’
(or long-option equivalent ‘--wgt’, ‘--wgt_var’,
or ‘--weight’) switch.
The user must specify one weight per input file on the command line,
or the name of a (scalar or degenerate 1-D array) variable in each
input file that contains a single value to weight that file.
When no weight is specified, nces weights each file
(e.g., ensemble member) in the input-files equally.
Variables in the output-file are the same size as the variable
hyperslab in each input file or group, and each input file or group
must be the same size after hyperslabbing.
nces does allow files to differ in the input record
dimension size if the requested record hyperslab (see Hyperslabs)
resolves to the same size for all files.
nces
recomputes the record dimension hyperslab limits for
each input file so that coordinate limits may be used to select equal
length timeseries from unequal length files.
This simplifies analysis of unequal length timeseries from simulation
ensembles (e.g., the CMIP3 IPCC AR4
archive).
nces
works in one of two modes, file ensembles
or group ensembles.
File ensembles are the default (equivalent to the old ncea
)
and may also be explicitly specified by the ‘--nsm_fl’ or
‘--ensemble_file’ switches.
To perform statistics on ensembles of groups, a newer feature, use
‘--nsm_grp’ or ‘--ensemble_group’.
Members of a group ensemble are groups that share the same structure,
parent group, and nesting level.
Members must be leaf groups, i.e., not contain any sub-groups.
Their contents usually have different values because they are
realizations of replicated experiments.
In group ensemble mode nces
computes the statistics across
the ensemble, which may span multiple input files.
Files may contain members of multiple, distinct ensembles.
However, all ensembles must have at least one member in the first input
file.
Group ensembles behave as an unlimited dimension of datasets:
they may contain an arbitrary and extensible number of realizations in
each file, and may be composed from multiple files.
Output statistics in group ensemble mode are stored in the parent group by default. If the ensemble members are /cesm/cesm_01 and /cesm/cesm_02, then the computed statistic will be in /cesm in the output file. The ‘--nsm_sfx’ option instructs nces to instead store output in a new child group of the parent created by attaching the suffix to the parent group’s name, e.g., ‘--nsm_sfx='_avg'’ would store results in the output group /cesm/cesm_avg:
nces --nsm_grp mdl1.nc mdl2.nc mdl3.nc out.nc
nces --nsm_grp --nsm_sfx='_avg' mdl1.nc mdl2.nc mdl3.nc out.nc
See Statistics vs Concatenation, for a description of the
distinctions between the statistics tools and concatenators.
As a multi-file operator, nces
will read the list of
input-files from stdin
if they are not specified
as positional arguments on the command line
(see Large Numbers of Files).
Like ncra
and ncwa
, nces
treats coordinate
variables as a special case.
Coordinate variables are assumed to be the same in all ensemble members,
so nces
simply copies the coordinate variables that appear in
ensemble members directly to the output file.
This has the same effect as averaging the coordinate variable across the
ensemble, yet does not incur the time- or precision- penalties of
actually averaging them.
ncra
and ncwa
allow coordinate variables to be
processed only by the linear average operation, regardless of the
arithmetic operation type performed on the non-coordinate variables
(see Operation Types).
Thus it can be said that the three operators (ncra
,
ncwa
, and nces
) all average coordinate variables
(even though nces
simply copies them).
All other requested arithmetic operations (e.g., maximization,
square-root, RMS) are applied only to non-coordinate variables.
In these cases the linear average of the coordinate variable will be
returned.
EXAMPLES
Consider a model experiment which generated five realizations of one year of data, say 1985. Imagine that the experimenter slightly perturbs the initial conditions of the problem before generating each new solution. Assume each file contains all twelve months (a seasonal cycle) of data and we want to produce a single file containing the ensemble average (mean) seasonal cycle. Here the numeric filename suffix denotes the realization number (not the month):
nces 85_01.nc 85_02.nc 85_03.nc 85_04.nc 85_05.nc 85.nc
nces 85_0[1-5].nc 85.nc
nces -n 5,2,1 85_01.nc 85.nc
These three commands produce identical answers. See Specifying Input Files, for an explanation of the distinctions between these methods. The output file, 85.nc, is the same size as each of the input files. It contains 12 months of data (which might or might not be stored in the record dimension, depending on the input files), but each value in the output file is the average of the five values in the input files.
In the previous example, the user could have obtained the ensemble average values in a particular spatio-temporal region by adding a hyperslab argument to the command, e.g.,
nces -d time,0,2 -d lat,-23.5,23.5 85_??.nc 85.nc
In this case the output file would contain only three slices of data in the time dimension. These three slices are the average of the first three slices from the input files. Additionally, only data inside the tropics is included.
As of NCO version 4.3.9 (released December, 2013)
nces
also works with groups (rather than files) as the
fundamental unit of the ensemble.
Consider two ensembles, /ecmwf
and /cesm
stored across
three input files mdl1.nc, mdl2.nc, and mdl3.nc.
Ensemble members would be leaf groups with names like /ecmwf/01
,
/ecmwf/02
etc. and /cesm/01
, /cesm/02
, etc.
These commands average both ensembles:
nces --nsm_grp mdl1.nc mdl2.nc mdl3.nc out.nc
nces --nsm_grp --nsm_sfx='_min' --op_typ=min -n 3,1,1 mdl1.nc out.nc
nces --nsm_grp -g cesm -v tas -d time,0,3 -n 3,1,1 mdl1.nc out.nc
The first command stores averages in the output groups /cesm and /ecmwf, while the second stores minima in the output groups /cesm/cesm_min and /ecmwf/ecmwf_min. The third command demonstrates that sub-setting and hyperslabbing work as expected. Note that each input file may contain different numbers of members of each ensemble, as long as all distinct ensembles contain at least one member in the first file.
As of NCO version 4.9.4, released in July, 2020,
nces
accepts user-specified weights with the ‘-w’
(or long-option equivalent ‘--wgt’, ‘--wgt_var’,
or ‘--weight’) switch:
# Construct input variables with values of 1 and 2
ncks -O -M -v one ~/nco/data/in.nc ~/1.nc
ncrename -O -v one,var ~/1.nc
ncap2 -O -s 'var=2' ~/1.nc ~/2.nc
# Three methods of weighting input files unevenly
# 1. Old-method: specify input files multiple times
# 2. New-method: specify one weight per input file
# 3. New-method: specify weight variable in each input file
nces -O ~/1.nc ~/2.nc ~/2.nc ~/out.nc # Clumsy, limited to integer weights
nces -O -w 1,2 ~/1.nc ~/2.nc ~/out.nc # Flexible, works for any weight
nces -O -w var ~/1.nc ~/2.nc ~/out.nc # Flexible, works for any weight
# All three methods produce same answer: var=(1*1+2*2)/3=5/3=1.67
ncks ~/out.nc
ncflint netCDF File Interpolator
SYNTAX
ncflint [-3] [-4] [-5] [-6] [-7] [-A] [-C] [-c] [--cmp cmp_sng] [--cnk_byt sz_byt] [--cnk_csh sz_byt] [--cnk_dmn nm,sz_lmn] [--cnk_map map] [--cnk_min sz_byt] [--cnk_plc plc] [--cnk_scl sz_lmn] [-D dbg] [-d dim,[min][,[max][,[stride]]] [--fl_fmt fl_fmt] [-F] [--fix_rec_crd] [-G gpe_dsc] [-g grp[,...]] [--glb ...] [-H] [-h] [--hdr_pad nbr] [--hpss] [-i var,val3] [-L dfl_lvl] [-l path] [-N] [--no_cll_msr] [--no_frm_trm] [--no_tmp_fl] [-O] [-o file_3] [-p path] [--qnt ...] [--qnt_alg alg_nm] [-R] [-r] [--ram_all] [-t thr_nbr] [--unn] [-v var[,...]] [-w wgt1[,wgt2]] [-X ...] [-x] file_1 file_2 [file_3]
DESCRIPTION
ncflint
creates an output file that is a linear combination of
the input files.
This linear combination is a weighted average, a normalized weighted
average, or an interpolation of the input files.
Coordinate variables are not acted upon in any case; they are simply
copied from file_1.
There are two conceptually distinct methods of using ncflint.
The first method is to specify the weight each input file contributes to
the output file.
In this method, the value val3 of a variable in the output file
file_3 is determined from its values val1 and val2 in
the two input files according to
val3 = wgt1*val1 + wgt2*val2
.
Here at least wgt1, and, optionally, wgt2, are specified on
the command line with the ‘-w’ (or ‘--weight’ or
‘--wgt_var’) switch.
If only wgt1 is specified then wgt2 is automatically
computed as wgt2 = 1 − wgt1.
Note that weights larger than 1 are allowed.
Thus it is possible to specify wgt1 = 2 and
wgt2 = -3.
One can use this functionality to multiply all values in a given
file by a constant.
As of NCO version 4.6.1 (July, 2016), the ‘-N’ switch
(or long-option equivalents ‘--nrm’ or ‘--normalize’)
implements a variation of this method.
This switch instructs ncflint to internally normalize the two
supplied (or one supplied and one inferred) weights so that
wgt1 = wgt1/(wgt1 + wgt2) and
wgt2 = wgt2/(wgt1 + wgt2).
This allows the user to input integral weights, say, and to delegate
the chore of normalizing them to ncflint.
Be careful that ‘-N’ means what you think, since the same
switch means something quite different in ncwa.
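The weight rules of the first method can be summarized in a few lines. This Python sketch is illustrative only and is not taken from the ncflint source; `resolve_weights` and `combine` are hypothetical names, and the `normalize` argument stands in for the ‘-N’ switch.

```python
def resolve_weights(wgt1, wgt2=None, normalize=False):
    """Return the (wgt1, wgt2) pair implied by the rules described above."""
    if wgt2 is None:
        wgt2 = 1.0 - wgt1  # only wgt1 supplied: wgt2 = 1 - wgt1
    if normalize:  # analogue of the '-N' switch
        total = wgt1 + wgt2
        if total == 0:
            raise ValueError("cannot normalize weights that sum to zero")
        wgt1, wgt2 = wgt1 / total, wgt2 / total
    return wgt1, wgt2

def combine(val1, val2, wgt1, wgt2):
    """val3 = wgt1*val1 + wgt2*val2"""
    return wgt1 * val1 + wgt2 * val2

print(resolve_weights(0.25))                  # wgt2 inferred as 0.75
print(resolve_weights(2, 1, normalize=True))  # roughly (0.667, 0.333)
print(combine(10.0, 20.0, *resolve_weights(2, -3)))  # 2*10 - 3*20 = -40.0
```

Integral weights such as 2 and 1 become fractions that sum to one only when normalization is requested; without it they are applied as given.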
The second method of using ncflint
is to specify the
interpolation option with ‘-i’ (or with the ‘--ntp’ or
‘--interpolate’ long options).
This is the inverse of the first method in the following sense:
When the user specifies the weights directly, ncflint
has no
work to do besides multiplying the input values by their respective
weights and adding together the results to produce the output values.
It makes sense to use this when the weights are known
a priori.
Another class of problems has the arrival value (i.e., val3)
of a particular variable var known a priori.
In this case, the implied weights can always be inferred by examining
the values of var in the input files.
This results in one equation in two unknowns, wgt1 and wgt2:
val3 = wgt1*val1 + wgt2*val2
.
Unique determination of the weights requires imposing the additional
constraint of normalization on the weights:
wgt1 + wgt2 = 1.
Thus, to use the interpolation option, the user specifies var
and val3 with the ‘-i’ option.
ncflint
then computes wgt1 and wgt2, and uses these
weights on all variables to generate the output file.
Although var may have any number of dimensions in the input
files, it must represent a single, scalar value.
Thus any dimensions associated with var must be degenerate,
i.e., of size one.
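Combining val3 = wgt1*val1 + wgt2*val2 with wgt1 + wgt2 = 1 gives wgt1 = (val2 − val3)/(val2 − val1) and wgt2 = 1 − wgt1. The Python sketch below works through that algebra; it is not the ncflint implementation, and `interpolation_weights` is a hypothetical name.

```python
def interpolation_weights(val1, val2, val3):
    """Solve val3 = wgt1*val1 + wgt2*val2 subject to wgt1 + wgt2 = 1."""
    if val1 == val2:
        raise ValueError("val1 and val2 must differ to determine the weights")
    wgt1 = (val2 - val3) / (val2 - val1)
    wgt2 = 1.0 - wgt1
    return wgt1, wgt2

# Scalar 'time' is 85 in the first file and 87 in the second; target is 85.5
wgt1, wgt2 = interpolation_weights(85.0, 87.0, 85.5)
print(wgt1, wgt2)  # 0.75 0.25

# The same weights are then applied to every other variable,
# e.g., a hypothetical temperature of 280 in file 1 and 284 in file 2
print(wgt1 * 280.0 + wgt2 * 284.0)  # 281.0
```

The closer the target value lies to val1, the larger wgt1 becomes, which is the expected behavior of linear interpolation.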
If neither ‘-i’ nor ‘-w’ is specified on the command line,
ncflint
defaults to weighting each input file equally in the
output file.
This is equivalent to specifying ‘-w 0.5’ or ‘-w 0.5,0.5’.
Attempting to specify both ‘-i’ and ‘-w’ methods in the same
command is an error.
ncflint
does not interpolate variables of type NC_CHAR
and NC_STRING
.
This behavior is hardcoded.
By default ncflint interpolates or multiplies record
coordinate variables (e.g., time is often stored as a record coordinate)
but not other coordinate variables (e.g., latitude and longitude).
This is because ncflint
is often used to time-interpolate
between existing files, but is rarely used to spatially interpolate.
Sometimes however, users wish to multiply entire files by a constant
that does not multiply any coordinate variables.
The ‘--fix_rec_crd’ switch was implemented for this purpose
in NCO version 4.2.6 (March, 2013).
It prevents ncflint
from multiplying or interpolating any
coordinate variables, including record coordinate variables.
Depending on your intuition, ncflint
may treat missing values
unexpectedly.
Consider a point where the value in one input file, say val1,
equals the missing value mss_val_1 and, at the same point,
the corresponding value in the other input file val2 is not
missing (i.e., does not equal mss_val_2).
There are three plausible answers, and this creates ambiguity.
Option one is to set val3 = mss_val_1.
The rationale is that ncflint
is, at heart, an interpolator
and interpolation involving a missing value is intrinsically undefined.
ncflint
currently implements this behavior since it is the
most conservative and least likely to lead to misinterpretation.
Option two is to output the weighted valid data point, i.e.,
val3 = wgt2*val2
.
The rationale for this behavior is that interpolation is really a
weighted average of known points, so ncflint
should weight the
valid point.
Option three is to return the unweighted valid point, i.e.,
val3 = val2.
This behavior would appeal to those who use ncflint
to
estimate data using the closest available data.
When a point is not bracketed by valid data on both sides, it is better
to return the known datum than no datum at all.
The current implementation uses the first approach, Option one. If you have strong opinions on this matter, let us know, since we are willing to implement the other approaches as options if there is enough interest.
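The three options can be compared side by side. In this Python sketch only `option=1` corresponds to the behavior the text attributes to ncflint; options 2 and 3 are the alternatives discussed above. For brevity the sketch assumes both files share a single, hypothetical missing value `MSS_VAL`.

```python
MSS_VAL = 1.0e36  # hypothetical missing value shared by both input files

def combine_with_missing(val1, val2, wgt1, wgt2, option=1, mss_val=MSS_VAL):
    """Combine two values, treating mss_val according to the chosen option."""
    miss1, miss2 = val1 == mss_val, val2 == mss_val
    if not (miss1 or miss2):
        return wgt1 * val1 + wgt2 * val2  # ordinary linear combination
    if option == 1 or (miss1 and miss2):
        return mss_val                    # Option one: result is missing
    valid, wgt = (val2, wgt2) if miss1 else (val1, wgt1)
    if option == 2:
        return wgt * valid                # Option two: weighted valid point
    if option == 3:
        return valid                      # Option three: unweighted valid point
    raise ValueError("option must be 1, 2, or 3")

for option in (1, 2, 3):
    print(option, combine_with_missing(MSS_VAL, 10.0, 0.5, 0.5, option))
```

With val1 missing, val2 = 10, and equal weights, the three options return the missing value, 5.0, and 10.0, respectively.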
EXAMPLES
Although it has other uses, the interpolation feature was designed
to interpolate file_3 to a time between existing files.
Consider input files 85.nc and 87.nc containing variables
describing the state of a physical system at times time = 85 and
time = 87.
Assume each file contains its timestamp in the scalar variable time.
Then, to linearly interpolate to a file 86.nc which describes
the state of the system at time = 86, we would use
ncflint -i time,86 85.nc 87.nc 86.nc
Say you have observational data covering January and April 1985 in two files named 85_01.nc and 85_04.nc, respectively. Then you can estimate the values for February and March by interpolating the existing data as follows. Combine 85_01.nc and 85_04.nc in a 2:1 ratio to make 85_02.nc:
ncflint -w 0.667 85_01.nc 85_04.nc 85_02.nc
ncflint -w 0.667,0.333 85_01.nc 85_04.nc 85_02.nc
Multiply 85.nc by 3 and by −2 and add them together to make tst.nc:
ncflint -w 3,-2 85.nc 85.nc tst.nc
This is an example of a null operation, so tst.nc should be identical (within machine precision) to 85.nc.
Multiply all the variables except the coordinate variables in the file emissions.nc by 0.8:
ncflint --fix_rec_crd -w 0.8,0.0 emissions.nc emissions.nc scaled_emissions.nc
The use of ‘--fix_rec_crd’ ensures, e.g., that the time
coordinate, if any, is not scaled (i.e., multiplied).
Add 85.nc to 86.nc to obtain 85p86.nc, then subtract 86.nc from 85.nc to obtain 85m86.nc:
ncflint -w 1,1 85.nc 86.nc 85p86.nc
ncflint -w 1,-1 85.nc 86.nc 85m86.nc
ncdiff 85.nc 86.nc 85m86.nc
Thus ncflint
can be used to mimic some ncbo
operations.
However this is not a good idea in practice because ncflint
does not broadcast (see ncbo
netCDF Binary Operator) conforming
variables during arithmetic.
Thus the final two commands would produce identical results except that
ncflint
would fail if any variables needed to be broadcast.
Rescale the dimensional units of the surface pressure prs_sfc
from Pascals to hectopascals (millibars):
ncflint -C -v prs_sfc -w 0.01,0.0 in.nc in.nc out.nc
ncatted -a units,prs_sfc,o,c,millibar out.nc
ncks
netCDF Kitchen Sink
SYNTAX
ncks [-3] [-4] [-5] [-6] [-7] [-A] [-a] [--area_wgt] [-b fl_bnr] [-C] [-c] [--cdl] [--chk_bnd] [--chk_chr] [--chk_map] [--chk_mss] [--chk_nan] [--chk_xtn] [--cmp cmp_sng] [--cnk_byt sz_byt] [--cnk_csh sz_byt] [--cnk_dmn nm,sz_lmn] [--cnk_map map] [--cnk_min sz_byt] [--cnk_plc plc] [--cnk_scl sz_lmn] [-D dbg] [-d dim,[min][,[max][,[stride]]] [-F] [--fix_rec_dmn dim] [--fl_fmt fl_fmt] [--fmt_val format] [-G gpe_dsc] [-g grp[,...]] [--glb ...] [--grp_xtr_var_xcl] [-H] [-h] [--hdn] [--hdr_pad nbr] [--hpss] [--hrz fl_hrz] [--jsn] [--jsn_fmt lvl] [-L dfl_lvl] [-l path] [-M] [-m] [--map map-file] [--md5] [--mk_rec_dmn dim] [--no_blank] [--no_cll_msr] [--no_frm_trm] [--no_tmp_fl] [-O] [-o output-file] [-P] [-p path] [--prn_fl print-file] [-Q] [-q] [--qnt ...] [--qnt_alg alg_nm] [-R] [-r] [--rad] [--ram_all] [--rgr ...] [--rnr=wgt] [-s format] [--s1d] [-u] [--unn] [-V] [-v var[,...]] [--vrt vrt-file] [-X ...] [-x] [--xml] input-file [[output-file]]
DESCRIPTION
The nickname “kitchen sink” is a catch-all because ncks
combines most features of ncdump
and nccopy
with
extra features to extract, hyperslab, multi-slab, sub-set, and translate
into one versatile utility.
ncks
extracts (a subset of the) data from input-file,
regrids it according to map-file if specified,
then writes in netCDF format to output-file, and
optionally writes it in flat binary format to fl_bnr, and
optionally prints it to screen.
ncks
prints netCDF input data in ASCII,
CDL, JSON, or NcML/XML text formats to
stdout
, like (an extended version of) ncdump
.
By default ncks prints CDL format.
Option ‘-s’ (or long options ‘--sng_fmt’ and ‘--string’)
permits the user to format data using C-style format strings, while
option ‘--cdl’ outputs CDL,
option ‘--jsn’ (or ‘--json’) outputs JSON,
option ‘--trd’ (or ‘--traditional’) outputs “traditional” format,
and option ‘--xml’ (or ‘--ncml’) outputs NcML.
The “traditional” tabular format is intended to be
easy to search for the data you want, one datum per screen line, with
all dimension subscripts and coordinate values (if any) preceding the
datum.
ncks
exposes many flexible controls over printed output,
including CDL, JSON, and NcML.
Options ‘-a’, ‘--cdl’, ‘-F’, ‘--fmt_val’, ‘-H’, ‘--hdn’, ‘--jsn’, ‘-M’, ‘-m’, ‘-P’, ‘--prn_fl’, ‘-Q’, ‘-q’, ‘-s’, ‘--trd’, ‘-u’, ‘-V’, and ‘--xml’ (and their long option counterparts) control the presence of data and metadata and their formatted location and appearance when printed.
ncks
extracts (and optionally creates a new netCDF file
comprised of) only selected variables from the input file
(similar to the old ncextr
specification).
Only variables and coordinates may be specifically included or
excluded—all global attributes and any attribute associated with an
extracted variable are copied to the screen and/or output netCDF file.
Options ‘-c’, ‘-C’, ‘-v’, and ‘-x’ (and their long
option synonyms) control which variables are extracted.
ncks
extracts hyperslabs from the specified variables
(ncks
implements the original nccut
specification).
Option ‘-d’ controls the hyperslab specification.
Input dimensions that are not associated with any output variable do
not appear in the output netCDF.
This feature removes superfluous dimensions from netCDF files.
ncks
will append variables and attributes from the
input-file to output-file if output-file is a
pre-existing netCDF file whose relevant dimensions conform to dimension
sizes of input-file.
The append features of ncks
are intended to provide a
rudimentary means of adding data from one netCDF file to another,
conforming, netCDF file.
If naming conflicts exist between the two files, data in
output-file is usually overwritten by the corresponding data from
input-file.
Thus, when appending, the user should back up output-file in case
valuable data are inadvertently overwritten.
If output-file exists, the user will be queried whether to
overwrite, append, or exit the ncks
call
completely.
Choosing overwrite destroys the existing output-file and
creates an entirely new one from the output of the ncks call.
Append has differing effects depending on the uniqueness of the
variables and attributes output by ncks
: If a variable or
attribute extracted from input-file does not have a name conflict
with the members of output-file then it will be added to
output-file without overwriting any of the existing contents of
output-file.
In this case the relevant dimensions must agree (conform) between the
two files; new dimensions are created in output-file as required.
When a name conflict occurs, a global attribute from input-file
will overwrite the corresponding global attribute from
output-file.
If the name conflict occurs for a non-record variable, then the
dimensions and type of the variable (and of its coordinate dimensions,
if any) must agree (conform) in both files.
Then the variable values (and any coordinate dimension values)
from input-file will overwrite the corresponding variable values
(and coordinate dimension values, if any) in output-file (see footnote 79).
Since there can only be one record dimension in a file, the record dimension must have the same name (though not necessarily the same size) in both files if a record dimension variable is to be appended. If the record dimensions are of differing sizes, the record dimension of output-file will become the greater of the two record dimension sizes, the record variable from input-file will overwrite any counterpart in output-file and fill values will be written to any gaps left in the rest of the record variables (I think). In all cases variable attributes in output-file are superseded by attributes of the same name from input-file, and left alone if there is no name conflict.
Some users may wish to avoid interactive ncks
queries about
whether to overwrite existing data.
For example, batch scripts will fail if ncks
does not receive
responses to its queries.
Options ‘-O’ and ‘-A’ are available to force overwriting
existing files, and appending existing variables, respectively.
ncks
The following summarizes features unique to ncks.
Features common to many operators are described in
Shared Features.
Switches ‘-a’, ‘--abc’, and ‘--alphabetize’
turn off the default alphabetization of extracted fields in
ncks only.
These switches are misleadingly named and were deprecated in
ncks
as of NCO version 4.7.1 (December, 2017).
This is the default behavior so these switches are no-ops included only
for completeness.
By default, NCO extracts, prints, and writes specified output
variables to disk in alphabetical order.
This tends to make long output lists easier to search for particular
variables.
Again, no option is necessary to write output in alphabetical order.
Until NCO version 4.7.1 (December, 2017), ncks used the -a, --abc, or --alphabetize switches to
turn off the default alphabetization.
These names were counter-intuitive and needlessly confusing.
As of NCO version 4.7.1, ncks uses the new switches
--no_abc, --no-abc, --no_alphabetize, or --no-alphabetize, all of which are equivalent.
The --abc and --alphabetize switches are now no-ops,
i.e., they leave the default alphabetical ordering unchanged.
The -a switch is now completely deprecated in favor of the
clearer long option switches.
Option ‘-b file’ activates native machine binary output writing to binary file file. Also ‘--fl_bnr’ and ‘--binary-file’. Writing packed variables in binary format is not supported. Metadata is never output to the binary file. Examine the netCDF output file to see the variables in the binary file. Use the ‘-C’ switch, if necessary, to avoid writing unwanted coordinates to the binary file:
% ncks -O -v one_dmn_rec_var -b bnr.dat -p ~/nco/data in.nc out.nc
% ls -l bnr.dat | cut -d ' ' -f 5 # 200 B contains time and one_dmn_rec_var
200
% ls -l bnr.dat
% ncks -C -O -v one_dmn_rec_var -b bnr.dat -p ~/nco/data in.nc out.nc
% ls -l bnr.dat | cut -d ' ' -f 5 # 40 B contains one_dmn_rec_var only
40
As of NCO version 4.6.5 (March, 2017), ncks
can
print human-legible calendar strings corresponding to time values with
UDUnits-compatible date units of the form time-since-basetime, e.g.,
‘days since 2000-01-01’ and a CF calendar attribute, if
any.
Enact this with the ‘--calendar’ (also ‘--cln’,
‘--prn_lgb’, and ‘--datestamp’) option when printing in any mode.
Invoking this option when dbg_lvl >= 1 in CDL
mode prints both the value and the calendar string (one in comments):
zender@aerosol:~$ ncks -D 1 --cal -v tm_365 ~/nco/data/in.nc
...
  variables:
    double tm_365 ;
      tm_365:units = "days since 2013-01-01" ; // char
      tm_365:calendar = "365_day" ; // char
  data:
    tm_365 = "2013-03-01"; // double value: 59
...
zender@aerosol:~$ ncks -D 1 -v tm_365 ~/nco/data/in.nc
...
    tm_365 = 59; // calendar format: "2013-03-01"
...
This option is similar to the ncdump
‘-t’ option.
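The correspondence between the value 59 and the string "2013-03-01" in the example above follows from the 365_day calendar, in which every year has 28 days in February. The Python sketch below reproduces that conversion for whole days counted from 1 January of a base year. It illustrates the calendar arithmetic only, not the UDUnits-based code that NCO uses, and `noleap_date` is a hypothetical name.

```python
DAYS_PER_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]  # no leap days

def noleap_date(days_since, base_year):
    """Date string for an integer number of days since base_year-01-01
    in a 365_day (no-leap) calendar."""
    year = base_year + days_since // 365
    day_of_year = days_since % 365  # 0-based day within the year
    for month, length in enumerate(DAYS_PER_MONTH, start=1):
        if day_of_year < length:
            return f"{year:04d}-{month:02d}-{day_of_year + 1:02d}"
        day_of_year -= length

print(noleap_date(59, 2013))  # 31 days of January + 28 of February -> 2013-03-01
```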
As of NCO version 4.6.8 (August, 2017), ncks
CDL printing supports finer-grained control of date formats
with the ‘--dt_fmt=dt_fmt’ (or ‘--date_format’) option.
The dt_fmt is an enumerated integer from 0–3.
Values dt_fmt=0 or 1 correspond to the short format for
dates that are the default.
The value dt_fmt=2 requests the “regular” format for
dates, and dt_fmt=3 requests the full ISO-8601 format
with the “T” separator and the comma:
ncks -H -m -v time_bnds -C --dt_fmt=value ~/nco/data/in.nc
# Value: Output:
# 0,1    1964-03-13 09:08:16         # Default, short format
# 2      1964-03-13 09:08:16.000000  # Regular format
# 3      1964-03-13T09:08:16.000000  # ISO8601 'T' format
Note that ‘--dt_fmt’ automatically implies ‘--cal’, which makes that option superfluous.
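For readers who post-process printed dates, the three layouts tabulated above map onto familiar strftime patterns. The Python sketch below is an approximation built from that table, not NCO code; the `FORMATS` dictionary and `format_date` are hypothetical names.

```python
from datetime import datetime

# strftime patterns that reproduce the layouts shown in the table above
FORMATS = {
    0: "%Y-%m-%d %H:%M:%S",     # default, short format
    1: "%Y-%m-%d %H:%M:%S",     # same as 0
    2: "%Y-%m-%d %H:%M:%S.%f",  # "regular" format with fractional seconds
    3: "%Y-%m-%dT%H:%M:%S.%f",  # ISO 8601 layout with the 'T' separator
}

def format_date(when, dt_fmt=0):
    """Format a datetime according to an enumerated dt_fmt value (0-3)."""
    return when.strftime(FORMATS[dt_fmt])

when = datetime(1964, 3, 13, 9, 8, 16)
for dt_fmt in range(4):
    print(dt_fmt, format_date(when, dt_fmt))
```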
As of NCO version 4.9.4 (September, 2020), invoking the ‘--dt_fmt’ option now applies equally well to JSON and XML output as to CDL output:
% ncks -d time,0 -v time --cdl --dt_fmt=3 ~/nco/data/in.nc
...
time = "1964-03-13T21:09:0.000000" ;
...
% ncks -d time,0 -v time --json --dt_fmt=3 ~/nco/data/in.nc
...
"data": ["1964-03-13T21:09:0.000000"]
...
% ncks -d time,0 -v time --xml --dt_fmt=3 ~/nco/data/in.nc
...
<ncml:values separator="*">1964-03-13T21:09:0.000000</ncml:values>
...
As of NCO version 4.9.0 (December, 2019), invoking
‘--chk_map’ causes ncks
to evaluate the quality of
regridding weights in the map-file provided as input-file.
This option works with map-files (not grid-files) in
ESMF/CMIP6-compliant format (i.e., a sparse matrix
variable named S and coordinates [xy][ab]_[cv]).
When invoked with the additional ‘--area_wgt’ option, the
evaluation statistics are area-weighted and thus exactly represent
the global-mean/min/max/mebs/rms/sdn biases expected when regridding a
globally uniform field.
This tool makes it easier to objectively assess weight-generation
algorithms, and will hopefully assist in their improvement.
Thanks to Mark Taylor of Sandia National Laboratories (SNL) and Paul
Ullrich of UC Davis for this suggestion and early prototypes.
$ ncks --chk_map map.nc            # Unweighted statistics
$ ncks --chk_map --dbg=2 map.nc    # Additional diagnostics
$ ncks --chk_map --area_wgt map.nc # Area-weighted statistics
The map-checker performs numerous checks and reports numerous
statistics, probably more than you care about.
Be assured that each piece of provided information has in the past
proved useful to developers of weight-generation and regridding
algorithms.
Most of the time, users can learn whether the examined map is of
sufficient quality for their purposes by examining only a few of these
statistics.
Before defining these primary statistics, it is helpful to understand
the meaning of the weight-array S (stored in a map-file as the
variable S
), and the terminology of rows and columns.
A remapping (aka regridding) transforms a field on an input grid to
an output grid while conserving to the extent possible or desired the
local and global properties of the field.
The map S is a matrix of M rows and N columns of
weights, where M is the number of gridcells (or degrees of
freedom, DOFs) in the destination grid, and N is the
number of gridcells (or DOFs) in the source grid.
An individual weight S(m,n) represents the
fractional contribution to destination gridcell m by source
gridcell n.
By convention the weights are normalized to sum to unity in each row
(destination gridcell) that completely overlaps the input grid.
Thus the weights in a single row are all equivalent to the fractional
destination areas that the same destination gridcell (we will drop the
DOF terminology hereafter for conciseness) receives from
each source gridcell.
Regardless of the values of the individual weights, it is intuitive
that their row-sum should never exceed unity because that would be
physically equivalent to an output gridcell receiving more than its
own area from the source grid.
Map-files typically store these row-sum statistics for each
destination gridcell in the frac_b
variable described further
below.
Likewise the weights in a single column represent the fractional
destination areas that a single source gridcell contributes to
every output gridcell.
Each output gridcell in a column may have a different area so
column-sums need not, and in general do not, sum to unity.
However, a source gridcell ought to contribute to the destination
grid a total area equal to its own area.
Thus a constraint on column-sums is that their weights, themselves
weighted by the destination gridcell area corresponding to each row,
should sum exactly to the source gridcell area.
In other words, the destination-area-weighted column-sum divided by
the source gridcell area would be unity (in a perfect first order
map) for every source gridcell that completely overlaps valid
destination gridcells.
Map-files typically store these area-weighted-column-sum-ratio
statistics for each gridcell in the frac_a
variable described
further below.
Storing the entire weight-matrix S is unnecessary because only a
relative handful of gridcells in the source grid contribute to a given
destination gridcell, and vice versa.
Instead, map-files store only the non-zero S(m,n),
and encode them as a sparse-matrix.
Storing S as a sparse matrix rather than a full matrix reduces
overall storage sizes by a factor on the order of the ratio of the
product of the grid sizes to their sum, or about 10,000 for grids with
horizontal resolution near one degree, and more for finer resolutions.
The sparse-matrix representation is a one-dimensional array of weights
S, together with two ancillary arrays, row and column, that contain
the one-dimensional row and column
indices, respectively, corresponding to the destination and source
gridcells of the associated weight.
By convention, map-files store the row and column indices using the
1-based convention in common use in the 1990s when regridding software
was all written in Fortran.
The map-checker prints cell locations with 1-based indices as well:
% ncks --chk_map map_ne30np4_to_cmip6_180x360_nco.20190601.nc
Characterization of map-file map_ne30np4_to_cmip6_180x360_nco.20190601.nc
Cell triplet elements : [Fortran (1-based) index, center latitude, center longitude]
Sparse matrix size n_s: 246659
Weight min S(190813): 5.1827201764857658e-25 from cell \
  [33796,-45.7998,+136.437] to [15975,-45.5,+134.5]
Weight max S( 67391): 1.0000000000000000e+00 from cell \
  [33671,-54.4442,+189.645] to [12790,-54.5,+189.5]
Ignored weights (S=0.0): 0
...
Here the map-file weights span twenty-five orders of magnitude. This may seem large, though in practice it is typical for high-resolution intersection grids. The Fortran-convention index of each weight extreme is followed by its geographic latitude and longitude. Reporting the locations of extrema, and of gridcells whose metrics miss their target values by more than a specified tolerance, are prime map-checker features.
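The sparse-matrix layout and the row-sum and column-sum constraints described earlier can be illustrated with a toy map. The Python sketch below is a hand-built, hypothetical one-dimensional example (two source cells of area 0.5, three destination cells of areas 0.25, 0.5, and 0.25). The function names are illustrative and are not part of NCO.

```python
def apply_map(S, row, col, src, n_dst):
    """Regrid src to the destination grid: dst[m] = sum over n of S(m,n)*src[n]."""
    dst = [0.0] * n_dst
    for wgt, r, c in zip(S, row, col):
        dst[r - 1] += wgt * src[c - 1]  # convert 1-based indices to 0-based
    return dst

def row_sums(S, row, n_dst):
    """Sum of weights received by each destination gridcell."""
    sums = [0.0] * n_dst
    for wgt, r in zip(S, row):
        sums[r - 1] += wgt
    return sums

def weighted_column_sums(S, row, col, area_a, area_b):
    """Destination-area-weighted column-sums divided by source gridcell area."""
    sums = [0.0] * len(area_a)
    for wgt, r, c in zip(S, row, col):
        sums[c - 1] += wgt * area_b[r - 1]
    return [total / area for total, area in zip(sums, area_a)]

# Non-zero weights only, with 1-based (Fortran-style) row and column indices
S = [1.0, 0.5, 0.5, 1.0]
row = [1, 2, 2, 3]  # destination gridcell of each weight
col = [1, 1, 2, 2]  # source gridcell of each weight
area_a = [0.5, 0.5]         # source gridcell areas
area_b = [0.25, 0.5, 0.25]  # destination gridcell areas

print(apply_map(S, row, col, [10.0, 20.0], 3))            # [10.0, 15.0, 20.0]
print(row_sums(S, row, 3))                                # [1.0, 1.0, 1.0]
print(weighted_column_sums(S, row, col, area_a, area_b))  # [1.0, 1.0]
```

In this idealized map every row-sum and every area-weighted column-sum ratio equals one. Real maps deviate slightly from unity, which is what the map-checker quantifies.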
As mentioned above, the two statistics most telling about map quality are the weighted column-sums frac_a and the row-sums frac_b. The short-hand names for what these metrics quantify are Conservation and Consistency, respectively. Conservation means the total fraction of an input gridcell that contributes to the output grid. For global input and output grids that completely tile the sphere, the entirety of each input gridcell should contribute (i.e., map to) the output grid. The same concept that applies locally to conservation of a gridcell value applies globally to the overall conservation of an input field. Thus a perfectly conservative mapping between global grids that tile the sphere would have frac_a = 1.0 for every input gridcell, and for the mean of all input gridcells.
The map-checker computes Conservation (frac_a) from the stored
variables S, row, column, area_a, and area_b
in the map-file, and then compares those values to the
frac_a values (if any) on-disk, and warns of any disagreements
(see footnote 80).
By definition, conservation is perfect to first order if the sum of
the destination-gridcell-area-weighted weights (which is an area)
equals the source gridcell area, and so their ratio (frac_a) is
unity.
Computing the area-weighted-column-sum-ratios and comparing those
frac_a to the stored frac_a
catches any discrepancies.
The analysis sounds an alarm when discrepancies exceed a tolerance
(currently 5.0e-16).
More importantly, the map-checker reports the summary statistics of
the computed frac_a metrics and their imputed errors, including
the grid mean, minimum, maximum, mean-absolute bias, root-mean-square
bias, and standard deviation.
% ncks --chk_map map_ne30np4_to_cmip6_180x360_nco.20190601.nc
...
Conservation metrics (column-sums of area_b-weighted weights normalized by area_a) and errors---
Perfect metrics for global Grid B are avg = min = max = 1.0, mbs = rms = sdn = 0.0:
frac_a avg: 1.0000000000000000 = 1.0-0.0e+00 // Mean
frac_a min: 0.9999999999991109 = 1.0-8.9e-13 // Minimum in grid A cell [45328,+77.3747,+225]
frac_a max: 1.0000000000002398 = 1.0+2.4e-13 // Maximum in grid A cell [47582,+49.8351,+135]
frac_a mbs: 0.0000000000000096 = 9.6e-15 // Mean absolute bias from 1.0
frac_a rms: 0.0000000000000167 = 1.7e-14 // RMS relative to 1.0
frac_a sdn: 0.0000000000000167 = 1.7e-14 // Standard deviation
...
The values of the frac_a
metric are generally imperfect (not
1.0) for global grids.
The bias is the deviation from the target metric shown in the second
floating-point column in each row above (e.g., 8.9e-13).
These biases should be vanishingly small with respect to unity.
Mean biases as large as 1.0e-08 may be considered acceptable for
off-line analyses (i.e., a single regridding of raw data) though the
acceptable tolerance should be more stringent for on-line use such as
in a coupler where forward and reverse mappings may be applied tens of
thousands of times.
The mean biases for such on-line regridding should be close to 1.0e-15
in order for tens-of-thousands of repetitions to still conserve
mass/energy to full double-precision.
The minimum and maximum gridcell biases indicate the worst performing locations of the mapping. These are generally much (a few orders of magnitude) greater than the mean biases. Observe that the minimum and maximum biases in the examples above and below occur at longitudes that are multiples of 45 degrees. This is characteristic of mappings to/from cubed-sphere grids whose faces have edges, and thus additional complexity, at multiples of 45 degrees. This illustrates how intersection grid geometry influences biases. More complex, finer-scale structures produce greater biases. The Root-Mean-Square (RMS) and standard deviation metrics characterize the distribution of biases throughout the entire intersection grid, and thus provide information complementary to the minimum and maximum biases.
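The six summary statistics can be reproduced from a list of frac_a (or frac_b) values. The Python sketch below uses the labels from the map-checker output (avg, min, max, mbs, rms, sdn). The exact formulas NCO applies are not given in the text, so the definitions here, in particular the population form of the standard deviation, are assumptions, and `summarize` is a hypothetical name.

```python
import math

def summarize(frac, target=1.0):
    """Unweighted summary statistics of a metric relative to its target value."""
    n = len(frac)
    avg = sum(frac) / n
    bias = [f - target for f in frac]
    return {
        "avg": avg,
        "min": min(frac),
        "max": max(frac),
        "mbs": sum(abs(b) for b in bias) / n,            # mean absolute bias
        "rms": math.sqrt(sum(b * b for b in bias) / n),  # RMS relative to target
        # Assumed population standard deviation about the mean
        "sdn": math.sqrt(sum((f - avg) ** 2 for f in frac) / n),
    }

# Deliberately exaggerated values so the arithmetic is easy to follow
stats = summarize([0.5, 1.0, 1.5])
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```

When the mean equals the target, as it does here, rms and sdn coincide. That is consistent with the near-identical rms and sdn values in the output above, where the mean bias is negligible.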
Consistency expresses the total fraction of an output gridcell that receives contributions from the input grid. Thus Consistency is directly analogous to Conservation, only applied to the output grid. Conservation is the extent to which the mapping preserves the local and grid-wide integrals of input fields, while Consistency is the extent to which the mapping correctly aligns the input and output grids so that each destination cell receives the appropriate proportion of the input integrals. The mapping will produce an acceptably faithful reproduction of the input on the output grid only if all local and global Conservation and Consistency metrics meet the acceptable error tolerances.
The map-checker computes the Consistency (frac_b) as row-sums of
the weights stored in S
and compares these to the stored values
of frac_b
.
(Note how the definition of weights S(m,n) as the
fractional contribution to destination gridcell m by source
gridcell n makes calculation of frac_b almost trivial in
comparison to frac_a).
Nevertheless, frac_b
in the file may differ from the computed
row-sum for example if the map-file generator artificially limits the
stored frac_b
value for any cell to 1.0 for those row-sums
that exceed 1.0.
The map-checker raises an alarm when discrepancies between computed
and stored frac_b
exceed a tolerance (currently 5.0e-16).
There are semi-valid reasons a map-generator might do this, so this
does not necessarily indicate an error.
The alarm simply informs the user that applying the weights will lead
to a slightly different Consistency than indicated by the stored
frac_b
.
As with frac_a
, the values of frac_b
are generally
imperfect (not 1.0) for global grids:
% ncks --chk_map map_ne30np4_to_cmip6_180x360_nco.20190601.nc
...
Consistency metrics (row-sums of weights) and errors---
Perfect metrics for global Grid A are avg = min = max = 1.0, mbs = rms = sdn = 0.0:
frac_b avg: 0.9999999999999999 = 1.0-1.1e-16 // Mean
frac_b min: 0.9999999999985523 = 1.0-1.4e-12 // Minimum in grid B cell [59446,+75.5,+45.5]
frac_b max: 1.0000000000004521 = 1.0+4.5e-13 // Maximum in grid B cell [63766,+87.5,+45.5]
frac_b mbs: 0.0000000000000065 = 6.5e-15 // Mean absolute bias from 1.0
frac_b rms: 0.0000000000000190 = 1.9e-14 // RMS relative to 1.0
frac_b sdn: 0.0000000000000190 = 1.9e-14 // Standard deviation
...
This example shows that frac_b has the greatest local errors at similar boundaries (multiples of 45 degrees longitude) as frac_a. It is typical for Conservation and Consistency to degrade in intricate areas of the intersection grid, and these areas occur at multiples of 45 degrees longitude for cubed-sphere mappings.
The map-checker will produce area-weighted metrics when invoked
with the --area_wgt
flag, e.g.,
‘ncks --area_wgt in.nc’.
Area-weighted statistics show the exact local and global results to
expect with real-world grids in which large consistency/conservation
errors in small gridcells may be less important than smaller errors in
larger gridcells.
Global-weighted mean statistics will of course differ from unweighted
statistics, although the minimum and maximum do not change:
% ncks --area_wgt map_ne30np4_to_cmip6_180x360_nco.20190601.nc
...
Conservation metrics (column-sums of area_b-weighted weights normalized by area_a) and errors---
Perfect metrics for global Grid B are avg = min = max = 1.0, mbs = rms = sdn = 0.0:
frac_a avg: 1.0000000000000009 = 1.0+8.9e-16 // Area-weighted mean
frac_a min: 0.9999999999999236 = 1.0-7.6e-14 // Minimum in grid A cell [12810,+3.44654,+293.25]
frac_a max: 1.0000000000001146 = 1.0+1.1e-13 // Maximum in grid A cell [16203,-45.7267,+272.31]
frac_a mbs: 0.0000000000000067 = 6.7e-15 // Area-weighted mean absolute bias from 1.0
frac_a rms: 0.0000000000000102 = 1.0e-14 // Area-weighted RMS relative to 1.0
frac_a sdn: 0.0000000000000103 = 1.0e-14 // Standard deviation
Consistency metrics (row-sums of weights) and errors---
Perfect metrics for global Grid A are avg = min = max = 1.0, mbs = rms = sdn = 0.0:
frac_b avg: 1.0000000000000047 = 1.0+4.7e-15 // Area-weighted mean
frac_b min: 0.9999999999998442 = 1.0-1.6e-13 // Minimum in grid B cell [48415,+44.5,+174.5]
frac_b max: 1.0000000000002611 = 1.0+2.6e-13 // Maximum in grid B cell [16558,-44.5,+357.5]
frac_b mbs: 0.0000000000000065 = 6.5e-15 // Area-weighted mean absolute bias from 1.0
frac_b rms: 0.0000000000000129 = 1.3e-14 // Area-weighted RMS relative to 1.0
frac_b sdn: 0.0000000000000133 = 1.3e-14 // Standard deviation
...
The examples above show no outstanding differences (besides rounding) between the unweighted and area-weighted statistics. The absence of degradation between the global unweighted statistics (further up the page) and the global weighted statistics (just above) demonstrates there are no important correlations between local weight biases and gridcell areas. The area-weighted mean frac_b statistic deserves special mention. Its value is the exact factor by which the mapping will shift the global mean of a spatially uniform input field. This metric is, therefore, first among equals when evaluating the quality of maps under consideration for use in time-stepping models where global conservation (e.g., of mass or energy) is crucial.
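The claim about the area-weighted mean of frac_b follows from the definition of the weights. Regridding a uniform field c yields c times the row-sum in each destination gridcell, so the area-weighted global mean on the output grid is c times the area-weighted mean of frac_b. The Python sketch below checks this with a hypothetical, deliberately flawed toy map whose middle row sums to 1.25. The names are illustrative and are not NCO code.

```python
def area_weighted_mean(values, areas):
    """Mean of values weighted by gridcell area."""
    return sum(v * a for v, a in zip(values, areas)) / sum(areas)

# Hypothetical sparse map with 1-based row indices; row 2 over-weights its sources
S = [1.0, 0.75, 0.5, 1.0]
row = [1, 2, 2, 3]
area_b = [0.25, 0.5, 0.25]  # destination gridcell areas

# Row-sums of the weights give frac_b for each destination gridcell
frac_b = [0.0] * len(area_b)
for wgt, r in zip(S, row):
    frac_b[r - 1] += wgt

uniform = 8.0                          # spatially uniform input field
dst = [uniform * f for f in frac_b]    # result of regridding the uniform field

factor = area_weighted_mean(frac_b, area_b)
print(factor)                                     # 1.125
print(area_weighted_mean(dst, area_b) / uniform)  # same factor, 1.125
```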
As of NCO version 4.9.2 (March, 2020), adding the ‘--frac_b_nrm’ flag changes the map-checker into a read-write algorithm that first diagnoses the map-file statistics described above and then re-writes the weights (and weight-derived statistics frac_a and frac_b) to compensate or “fix” issues that poor-quality input grids can cause. Input grids can and often do have regions that are not tiled by any portion of any input gridcell. For example, many FV ocean grids (such as MPAS) are empty (have no gridcells) in land regions beyond the coasts. Some FV ocean grids have gridcells everywhere and mask (i.e., screen-out) the non-ocean gridcells by setting the mask value to zero. Both these designs are perfectly legal. What is illegal, yet sometimes encountered in practice, is overlapping gridcells on the same input grid. Such an input grid is said to be self-overlapping.
The surface topography dataset grid SCRIPgrid_1km-merge-10min_HYDRO1K-merge-nomask_c130402.nc (hereafter the HYDRO1K grid for short) used by E3SM and CESM is self-overlapping. Weight-generators that receive the same input location twice might (if they do not take precautions to identify the issue, which no known weight-generators do) double-weight the self-overlapped region(s). In other words, self-overlapping input grids can lead weight-generators to produce values frac_b >> 1.0. Applying these weights would lead to exaggerated values on the destination grid.
The best solution to this issue is to adjust the input grid to
avoid self-overlap.
However, this solution may be difficult or impractical where the
original data, producer, or algorithm are unavailable or unclear.
In such cases, the --frac_b_nrm flag provides a workaround.
Please understand that ‘ncks --frac_b_nrm map.nc’ is designed to
alter map.nc in-place, so back up the original file first.
% ncks --frac_b_nrm map_hydro1k_to_ne1024np4_nco.20200301.nc ... ...
As of NCO version 5.2.0 (February, 2024), ncks
can report all coordinates that lack a corresponding bounds attribute.
This check complies with CF Conventions and with
NASA’s Dataset Interoperability Working Group (DIWG).
CF requires that coordinate variables that describe
a continuous (not discrete) axis contain a “bounds” attribute
that points to a variable marking the edges of each gridcell
(in time, space, or other dimensions).
This option reports which coordinates lack the required bounds
attribute, so that a file can be easily
checked for compliance with the convention:
$ ncks --chk_bnd in.nc
ncks: WARNING nco_chk_bnd() reports coordinate Lat does not contain "bounds" attribute
ncks: WARNING nco_chk_bnd() reports coordinate Lon does not contain "bounds" attribute
ncks: INFO nco_chk_bnd() reports total number of coordinates without "bounds" attribute is 2
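The logic of such a check can be sketched in a few lines. This is a minimal illustration that operates on a hypothetical, hand-built metadata dictionary rather than a real file. It treats any variable that shares its name with a dimension as a coordinate, and it does not attempt the continuous-versus-discrete distinction that CF draws.

```python
def coords_without_bounds(variables, dimensions):
    """Return names of coordinate variables that lack a "bounds" attribute.

    A coordinate is assumed to be a variable whose name matches a dimension.
    """
    missing = []
    for name, var in variables.items():
        if name in dimensions and "bounds" not in var.get("attributes", {}):
            missing.append(name)
    return missing

# Hypothetical metadata (names and attribute values are illustrative only)
dimensions = {"Lat": 2, "Lon": 4, "time": 10}
variables = {
    "Lat": {"attributes": {"units": "degrees_north"}},
    "Lon": {"attributes": {"units": "degrees_east"}},
    "time": {"attributes": {"units": "days", "bounds": "time_bnds"}},
    "T": {"attributes": {"units": "kelvin"}},  # not a coordinate, so ignored
}

print(coords_without_bounds(variables, dimensions))
```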
The identifiers in a netCDF file are the set of dimension,
group, variable, and attribute names it contains.
As of NCO version 5.1.8 (September, 2023), ncks
can report all identifiers that violate the CF Convention
that identifiers “should begin with a letter and be composed of
letters, digits, and underscores.”
System or library-defined identifiers (such as _FillValue)
are not subject to this (user-land) rule.
NASA’s Dataset Interoperability Working Group
(DIWG) supports this convention.
This option reports which identifiers do not comply with this
convention, so that a file can be easily checked for compliance with
the DIWG recommendation and the underlying CF
Convention:
$ ncks --chk_chr ~/nco/data/in.nc
...
ncks: WARNING nco_chk_chr() reports variable att_var_spc_chr attribute name "at_in_name@" is not CF-compliant
ncks: WARNING nco_chk_chr() reports variable name "var_nm-dash" is not CF-compliant
ncks: WARNING nco_chk_chr() reports variable var_nm-dash attribute name "att_nm-dash" is not CF-compliant
ncks: INFO nco_chk_chr() reports total number of identifiers with CF non-compliant names is 26
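The quoted rule translates directly into a regular expression. The sketch below is one reading of that rule, not NCO's implementation. Treating every name with a leading underscore as system-defined (and therefore exempt) is a simplifying assumption.

```python
import re

# One letter, followed by any number of letters, digits, or underscores
CF_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*")

def is_cf_compliant(name):
    """True if name satisfies the CF identifier rule quoted above."""
    if name.startswith("_"):
        return True  # assumed system/library-defined, e.g., _FillValue
    return CF_NAME.fullmatch(name) is not None

for name in ["three_dmn_var", "var_nm-dash", "at_in_name@", "_FillValue"]:
    print(name, is_cf_compliant(name))
```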
As of NCO version 5.1.8 (September, 2023), ncks
can report all variables and groups that contain a
missing_value
attribute.
NASA’s Dataset Interoperability Working Group
(DIWG) notes that the missing_value
attribute has been
semi-deprecated, and recommends that it should not be used in new
Earth Science data products.
This option reports which variables (and groups) contain a
missing_value
attribute, so that a file can be easily
checked for compliance with the DIWG recommendation:
$ ncks --chk_mss ~/nco/data/in.nc
ncks: WARNING nco_chk_mss() reports variable fll_val_mss_val contains "missing_value" attribute
ncks: WARNING nco_chk_mss() reports variable one_dmn_rec_var_missing_value contains "missing_value" attribute
...
ncks: WARNING nco_chk_mss() reports variable rec_var_int_mss_val_flt contains "missing_value" attribute
ncks: INFO nco_chk_mss() reports total number of variables and/or groups with "missing_value" attribute is 11
As of NCO version 4.8.0 (May, 2019), ncks
can
locate NaN
or NaNf
in double- and single-precision
floating-point variables, respectively.
NCO prints the location of the first NaN
(if any)
encountered in each variable.
NASA’s Dataset Interoperability Working Group
(DIWG) notes that the missing_value
attribute has been
semi-deprecated, and recommends that it should not be used in new
Earth Science data products.
This option allows users to easily check whether all the
floating-point variables in a file comply with the DIWG
recommendation:
$ ncks --chk_nan ~/nco/data/in_4.nc
ncks: WARNING nco_chk_nan() reports variable /nan_arr has first NaNf at hyperslab element 1
ncks: WARNING nco_chk_nan() reports variable /nan_scl has first NaNf at hyperslab element 0
ncks: INFO nco_chk_nan() reports total number of floating-point variables with NaN elements is 2
Thanks to Matthew Thompson of NASA for originally suggesting this feature.
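Locating the first NaN requires an explicit test, because a NaN never compares equal to anything, including itself. The following minimal sketch shows the idea on a plain list. The array is a hypothetical stand-in for a variable like /nan_arr, and the function is not part of NCO.

```python
import math

def first_nan_index(values):
    """Return the index of the first NaN in values, or None if there is none."""
    for idx, val in enumerate(values):
        if math.isnan(val):
            return idx
    return None

nan_arr = [0.0, float("nan"), 2.0]  # hypothetical data
print(first_nan_index(nan_arr))     # prints 1
```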
A filename extension is the suffix that follows the final
period ‘.’ in a filename.
For example, the suffix of ‘in.file.nc’ is ‘nc’.
NASA’s Dataset Interoperability Working Group
(DIWG) recommends that “files created with the HDF5, HDF-EOS5, or netCDF
APIs should have filename extensions \"h5\", \"he5\", or \"nc\",
respectively.”
As of NCO version 5.1.9 (November, 2023), ncks
can report all filenames that violate this DIWG
recommendation.
This option reports which filenames do not comply with this
convention.
If a file appears to be mis-labeled, e.g., the extension is ‘.h5’
but the file contents match HDF-EOS5 structure, that will
also be reported.
zender@spectral:~$ ncks --chk_xtn ~/nco/data/in.nc
zender@spectral:~$ ncks --chk_xtn ~/in.nc4
ncks: WARNING nco_chk_xtn() reports filename extension "nc4" is non-compliant
ncks: HINT rename file with "nc" rather than "nc4" extension
ncks: INFO nco_chk_xtn() reports total number of non-compliant filename extensions is 1
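The name-based part of this check is easy to sketch. The example below only compares the suffix against the three extensions in the DIWG recommendation quoted above. It does not inspect file contents, so it cannot detect the mis-labeled case. The function names are illustrative.

```python
import os

RECOMMENDED = {"h5", "he5", "nc"}  # extensions named in the DIWG recommendation

def extension(filename):
    """Return the suffix after the final period, or "" if there is none."""
    return os.path.splitext(filename)[1].lstrip(".")

def has_recommended_extension(filename):
    return extension(filename) in RECOMMENDED

print(extension("in.file.nc"))                # prints nc
print(has_recommended_extension("in.nc4"))    # prints False
```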
Change record dimension dim in the input file into a fixed
dimension in the output file.
Also ‘--no_rec_dmn’.
Before NCO version 4.2.5 (January, 2013), the syntax for
--fix_rec_dmn
did not permit or require the specification of
the dimension name dim.
This is because the feature only worked on netCDF3 files, which support
only one record dimension, so specifying its name was unnecessary.
netCDF4 files allow an arbitrary number of record dimensions, so the
user must specify which record dimension to fix.
The decision was made that starting with NCO version 4.2.5
(January, 2013), it is always required to specify the dimension name to
fix regardless of the netCDF file type.
This keeps the code simple, and is symmetric with the syntax for
--mk_rec_dmn
, described next.
As of NCO version 4.4.0 (January, 2014), the argument
all
may be given to ‘--fix_rec_dmn’ to convert all
record dimensions to fixed dimensions in the output file.
Previously, ‘--fix_rec_dmn’ only allowed one option, the name of a
single record dimension to be fixed.
Now it is simple to simultaneously fix all record dimensions.
This is useful (and nearly mandatory) when flattening netCDF4 files that
have multiple record dimensions per group into netCDF3 files (which are
limited to at most one record dimension) (see Group Path Editing).
As of NCO version 4.4.0 (January, 2014), the ‘--hdn’
or ‘--hidden’ options print hidden (aka special) attributes.
This is equivalent to ‘ncdump -s’.
Hidden attributes include: _Format, _DeflateLevel, _Shuffle,
_Storage, _ChunkSizes, _Endianness, _Fletcher32, and _NOFILL.
Previously ncks
ignored all these attributes in
CDL/XML modes.
Now it prints these attributes as appropriate in all modes.
As of NCO version 4.4.6 (September, 2014), ‘--hdn’
also prints the extended file format (i.e., the format of the file
or server supplying the data) as _SOURCE_FORMAT
.
As of NCO version 4.6.1 (August, 2016), ‘--hdn’
also prints the hidden attributes _NCProperties
,
_IsNetcdf4
, and _SuperblockVersion
for netCDF4 files so
long as NCO is linked against netCDF library version 4.4.1 or
later.
Users are referred to the
Unidata netCDF Documentation,
or the man pages for ncgen
or ncdump
, for
detailed descriptions of the meanings of these hidden attributes.
As of NCO version 4.3.3 (July, 2013), ncks
can
print extracted data and metadata to screen (i.e., stdout
) as
valid CDL (network Common data form Description Language).
CDL is the human-readable “lingua franca” of netCDF ingested by
ncgen
and excreted by ncdump
.
As of NCO version 4.6.9 (September, 2017), ncks
prints CDL by default, and the “traditional” mode must
be explicitly selected with ‘--trd’.
Compare ncks
“traditional” with CDL printing:
zender@roulee:~$ ncks --trd -v one ~/nco/data/in.nc
one: type NC_FLOAT, 0 dimensions, 1 attribute, chunked? no, compressed? no, packed? no
one size (RAM) = 1*sizeof(NC_FLOAT) = 1*4 = 4 bytes
one attribute 0: long_name, size = 3 NC_CHAR, value = one

one = 1

zender@roulee:~$ ncks --cdl -v one ~/nco/data/in.nc
netcdf in {

  variables:
    float one ;
      one:long_name = "one" ;

  data:
    one = 1 ;

} // group /
Users should note that NCO’s CDL mode outputs successively more verbose diagnostic information in CDL comments as the level of debugging increases from zero to two. For example, printing the above with ‘-D 2’ yields
zender@roulee:~$ ncks -D 2 --cdl -v one ~/nco/data/in.nc
netcdf in {
  // ncgen -k classic -b -o in.nc in.cdl

  variables:
    float one ; // RAM size = 1*sizeof(NC_FLOAT) = 1*4 = 4 bytes, ID = 147
      one:long_name = "one" ; // char

  data:
    one = 1 ;

} // group /
ncgen
converts CDL-mode output into a netCDF file:
ncks -v one ~/nco/data/in.nc > ~/in.cdl
ncgen -k netCDF-4 -b -o ~/in.nc ~/in.cdl
ncks -v one ~/in.nc
The HDF4 version of ncgen
, often named
hncgen
, h4_ncgen
, or ncgen-hdf
,
(usually) converts netCDF3 CDL into an HDF file:
cd ~/nco/data
ncgen -b -o hdf.hdf hdf.cdl # HDF ncgen is sometimes named...ncgen
ncgen -b -o in.hdf in.cdl # Fails: Some valid netCDF CDL breaks HDF ncgen
hncgen -b -o hdf.hdf hdf.cdl # HDF ncgen is hncgen in some RPM packages
h4_ncgen -b -o hdf.hdf hdf.cdl # HDF ncgen is h4_ncgen in Anaconda packages
ncgen-hdf -b -o hdf.hdf hdf.cdl # HDF ncgen is ncgen-hdf in some Debian packages
hdp dumpsds hdf.hdf # ncdump/h5dump-equivalent for HDF4
h4_ncdump dumpsds hdf.hdf # ncdump/h5dump-equivalent for HDF4
Note that HDF4 does not support netCDF-style groups, so the
above commands fail when the input file contains groups.
Only netCDF4 and HDF5 support groups.
In our experience the HDF ncgen
command, by
whatever name installed, is not robust and fails on many valid netCDF3
CDL constructs.
The HDF4 version of ncgen
will definitely fail on
the default NCO input file nco/data/in.cdl.
The NCO source code distribution provides
nco/data/hdf.cdl that works with the HDF4 version
of ncgen
, and can be used to test HDF files.
Change existing dimension dim to a record dimension in the output file.
This is the most straightforward way of changing a dimension to a/the
record dimension, and works fine in most cases.
See ncecat
netCDF Ensemble Concatenator and
ncpdq
netCDF Permute Dimensions Quickly for other methods of
changing variable dimensionality, including the record dimension.
Toggle (turn-on or turn-off) default behavior of printing data (not
metadata) to screen or copying data to disk.
Also activated using ‘--data’ or ‘--hieronymus’.
By default ncks
prints all metadata but no data to screen
when no netCDF output-file is specified.
And if output-file is specified, ncks
copies all
metadata and all data to it.
In other words, the printing/copying default is context-sensitive,
and ‘-H’ toggles the default behavior.
Hence, use ‘-H’ to turn-off copying data (not metadata) to an
output file.
(It is occasionally useful to write all metadata to a file, so that
the file has allocated the required disk space to hold the data, yet
to withhold writing the data itself).
And use ‘-H’ to turn-on printing data (not metadata) to screen.
Unless otherwise specified (with -s), each element of the data
hyperslab prints on a separate line containing the names, indices,
and values, if any, of all of the variable’s dimensions.
The dimension and variable indices refer to the location of the
corresponding data element with respect to the variable as stored on
disk (i.e., not the hyperslab).
% ncks --trd -C -v three_dmn_var in.nc
lat[0]=-90 lev[0]=100 lon[0]=0 three_dmn_var[0]=0
lat[0]=-90 lev[0]=100 lon[1]=90 three_dmn_var[1]=1
lat[0]=-90 lev[0]=100 lon[2]=180 three_dmn_var[2]=2
...
lat[1]=90 lev[2]=1000 lon[1]=90 three_dmn_var[21]=21
lat[1]=90 lev[2]=1000 lon[2]=180 three_dmn_var[22]=22
lat[1]=90 lev[2]=1000 lon[3]=270 three_dmn_var[23]=23
Printing the same variable with the ‘-F’ option shows it indexed with Fortran conventions, i.e., with indices that start at one and with the dimensions listed in reverse order:
% ncks -F -C -v three_dmn_var in.nc
lon(1)=0 lev(1)=100 lat(1)=-90 three_dmn_var(1)=0
lon(2)=90 lev(1)=100 lat(1)=-90 three_dmn_var(2)=1
lon(3)=180 lev(1)=100 lat(1)=-90 three_dmn_var(3)=2
...
Printing a hyperslab does not affect the variable or dimension indices since these indices are relative to the full variable (as stored in the input file), and the input file has not changed. However, if the hyperslab is saved to an output file and those values are printed, the indices will change:
% ncks --trd -H -d lat,90.0 -d lev,1000.0 -v three_dmn_var in.nc out.nc
...
lat[1]=90 lev[2]=1000 lon[0]=0 three_dmn_var[20]=20
lat[1]=90 lev[2]=1000 lon[1]=90 three_dmn_var[21]=21
lat[1]=90 lev[2]=1000 lon[2]=180 three_dmn_var[22]=22
lat[1]=90 lev[2]=1000 lon[3]=270 three_dmn_var[23]=23
% ncks --trd -C -v three_dmn_var out.nc
lat[0]=90 lev[0]=1000 lon[0]=0 three_dmn_var[0]=20
lat[0]=90 lev[0]=1000 lon[1]=90 three_dmn_var[1]=21
lat[0]=90 lev[0]=1000 lon[2]=180 three_dmn_var[2]=22
lat[0]=90 lev[0]=1000 lon[3]=270 three_dmn_var[3]=23
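The one-dimensional variable index printed in these listings is consistent with the row-major (C-order) offset of the dimension indices. The sketch below reproduces the arithmetic for the examples above. The dimension sizes (2, 3, 4) are read off the printed output, and the function name is illustrative.

```python
def flat_index(idx, shape):
    """Row-major (C-order) offset of a multi-dimensional index."""
    offset = 0
    for i, n in zip(idx, shape):
        offset = offset * n + i
    return offset

shape_in = (2, 3, 4)    # (lat, lev, lon) sizes of three_dmn_var in in.nc
shape_out = (1, 1, 4)   # sizes after the lat/lev hyperslab is written to out.nc

print(flat_index((1, 2, 1), shape_in))      # 21, as in lat[1] lev[2] lon[1]
print(flat_index((1, 2, 1), shape_in) + 1)  # 22, the one-based Fortran-style index
print(flat_index((0, 0, 1), shape_out))     # 1, the same element in out.nc
```

The same data value (21) therefore sits at index 21 in the input file and at index 1 in the output file, because the offset is always computed relative to the variable as stored in the file being printed.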
As of NCO version 4.6.2 (November, 2016), ncks
can
print extracted metadata and data to screen (i.e., stdout
) as
JSON (JavaScript Object Notation).
ncks
supports JSON output more completely, flexibly,
and robustly than any other tool to our knowledge.
With ncks
one can translate entire netCDF3 and netCDF4 files
into JSON, including metadata and data, using all
NCO’s subsetting and hyperslabbing capabilities.
NCO uses a JSON format we developed ourselves,
during a year of discussion among interested researchers.
Some refer to this format as NCO-JSON, to disambiguate it from
other JSON formats for netCDF data.
Other projects have since adopted, and some can generate,
NCO-JSON.
Projects that support NCO-JSON include ERDDAP
(https://coastwatch.pfeg.noaa.gov/erddap/index.html, choose output
filetype .ncoJson
from this
table)
and CF-JSON (http://cf-json.org).
Behold JSON output in default mode:
zender@aerosol:~$ ncks --jsn -v one ~/nco/data/in.nc
{
  "variables": {
    "one": {
      "type": "float",
      "attributes": {
        "long_name": "one"
      },
      "data": 1.0
    }
  }
}
NCO converts to (using commonsense rules) and prints all
NC_TYPEs as one of three atomic types distinguishable as
JSON values: float, string, and int.
Floating-point types (NC_FLOAT and NC_DOUBLE)
are printed with a decimal point and at least one significant digit
following the decimal point, e.g., ‘1.0’ rather than ‘1.’ or ‘1’.
Integer types (e.g., NC_INT, NC_UINT64) are printed
with no decimal point.
String types (NC_CHAR and NC_STRING) are enclosed
in double-quotes.
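These printing rules are what let a generic JSON parser recover the three atomic types without extra hints. As a minimal sketch, Python's standard json module applied to the default-mode output shown earlier distinguishes them as follows (the text is pasted in by hand here rather than read from ncks):

```python
import json

text = """
{ "variables": { "one": { "type": "float",
                          "attributes": { "long_name": "one" },
                          "data": 1.0 } } }
"""
doc = json.loads(text)
one = doc["variables"]["one"]

print(type(one["data"]).__name__)                     # float, because of the decimal point
print(type(one["attributes"]["long_name"]).__name__)  # str, because of the double-quotes
print(type(json.loads("73")).__name__)                # int, because there is no decimal point
```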
The JSON specification allows many possible output formats for netCDF files. NCO developers implemented a working prototype in October, 2016 and, after discussing options with interested parties, finalized the emitted JSON syntax a few weeks later. The resulting JSON backend supports three levels of pedanticness, ordered from more concise, flexible, and human-readable to more verbose, restrictive, and 1-to-1 reproducible. JSON-specific switches access these modes and other features. Each JSON configuration option automatically triggers JSON printing, so that specifying ‘--json’ in addition to a JSON configuration option is redundant and unnecessary.
Request a specific format level with the pedantic level argument to the ‘--jsn_fmt lvl’ option. As of NCO version 4.6.3 (December, 2016), the option formerly known as ‘--jsn_att_fmt’ was renamed simply ‘--jsn_fmt’. The more general name reflects the fact that the option controls all JSON formatting, not just attribute formatting. As of version 4.6.3, NCO defaults to demarcate inner dimensions of variable data with (nested) square brackets rather than printing data as an unrolled single dimensional array. An array with C-ordered dimensionality [2,3,4] prints as:
% ncks --jsn -v three_dmn_var ~/nco/data/in.nc
...
"data": [[[0.0, 1.0, 2.0, 3.0], [4.0, 5.0, 6.0, 7.0], [8.0, 9.0, 10.0, 11.0]],
         [[12.0, 13.0, 14.0, 15.0], [16.0, 17.0, 18.0, 19.0], [20.0, 21.0, 22.0, 23.0]]]
...
% ncks --jsn_fmt=4 -v three_dmn_var ~/nco/data/in.nc
...
"data": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0,
         12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0, 20.0, 21.0, 22.0, 23.0]
...
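The two layouts printed above carry the same values in the same C order. A downstream parser that prefers the unrolled form can flatten the bracketed form itself, as in this minimal sketch (the helper is illustrative and not part of NCO):

```python
def flatten(data):
    """Unroll arbitrarily nested lists into one flat list, preserving C order."""
    if not isinstance(data, list):
        return [data]
    flat = []
    for item in data:
        flat.extend(flatten(item))
    return flat

nested = [[[0.0, 1.0, 2.0, 3.0], [4.0, 5.0, 6.0, 7.0], [8.0, 9.0, 10.0, 11.0]],
          [[12.0, 13.0, 14.0, 15.0], [16.0, 17.0, 18.0, 19.0], [20.0, 21.0, 22.0, 23.0]]]

print(flatten(nested) == [float(i) for i in range(24)])  # prints True
```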
One can recover the former behavior (and omit the brackets) by adding four to the base pedantic level lvl (as shown above). Besides the potential offset of four, lvl may take one of three values between 0–2:
% ncks --jsn_fmt=0 -v att_var ~/nco/data/in_grp.nc
...
"att_var": {
  "shape": ["time"],
  "type": "float",
  "attributes": {
    "byte_att": [0, 1, 2, 127, -128, -127, -2, -1],
    "char_att": "Sentence one.\nSentence two.\n",
    "short_att": 37,
    "int_att": 73,
    "long_att": 73,
    "float_att": [73.0, 72.0, 71.0, 70.010, 69.0010, 68.010, 67.010],
    "double_att": [73.0, 72.0, 71.0, 70.010, 69.0010, 68.010, 67.0100010]
  },
  "data": [10.0, 10.10, 10.20, 10.30, 10.40101, 10.50, 10.60, 10.70, 10.80, 10.990]
...
This least pedantic mode produces the most easily read results, and
suffices for many (most?) purposes.
Any downstream parser is expected to assign an appropriate type as
indicated by JSON syntax rules.
Because the original attributes’ NC_TYPE
are not output,
a downstream parser may not exactly reproduce the input file datatypes.
For example, whether the original attribute string was stored as
NC_CHAR
or NC_STRING
will be unknown to a downstream
parser.
Distinctions between NC_FLOAT
and NC_DOUBLE
are similarly
lost, as are all distinctions among the integer types.
In our experience, these distinctions are immaterial for attributes,
which are intended for metadata not for large-scale storage.
Type-distinctions can, however, significantly impact the size of
variable data, responsible for nearly all the storage required by
datasets.
For instance, storing or transferring an NC_SHORT
field as
NC_DOUBLE
would waste a factor of four in space or bandwidth.
This is why NCO always prints the NC_TYPE
of variable data.
Downstream parsers can (but are not required to) take advantage of the
variable’s NC_TYPE
to choose the most efficient storage type.
The Shape member of the variable object is an ordered array of
dimension names such as "shape": ["lat","lon"]
, similar to its
use in NcML.
Each name corresponds to a previously defined Dimension object
that, taken together, define the rank, shape, and size of the
variable.
Variables are assumed to be scalar by default.
Hence the Shape member is mandatory for arrays, and is always omitted
for scalars (by contrast, NcML requires an empty shape string to
indicate scalars).
Attributes of type NC_DOUBLE, NC_SHORT, and NC_BYTE are printed as
objects with an explicit type so that parsers do not use the default type.
Attributes of type NC_FLOAT, NC_CHAR, NC_STRING, and NC_INT
are printed as plain JSON values or arrays, as in the lvl = 0
example above:
% ncks --jsn_fmt=1 -v att_var ~/nco/data/in.nc
...
"att_var": {
  "shape": ["time"],
  "type": "float",
  "attributes": {
    "byte_att": { "type": "byte", "data": [0, 1, 2, 127, -128, -127, -2, -1]},
    "char_att": "Sentence one.\nSentence two.\n",
    "short_att": { "type": "short", "data": 37},
    "int_att": 73,
    "long_att": 73,
    "float_att": [73.0, 72.0, 71.0, 70.010, 69.0010, 68.010, 67.010],
    "double_att": { "type": "double", "data": [73.0, 72.0, 71.0, 70.010, 69.0010, 68.010, 67.0100010]}
  },
  "data": [10.0, 10.10, 10.20, 10.30, 10.40101, 10.50, 10.60, 10.70, 10.80, 10.990]
...
The attributes of type NC_BYTE
, NC_SHORT
, and
NC_DOUBLE
are printed as JSON objects that comprise an
NC_TYPE
and a value list, because their values could conceivably
not be representable, or would waste space, if interpreted as
NC_INT
or NC_FLOAT
, respectively.
All other attributes may be naturally mapped to the type indicated by
the JSON syntax of the value, where numbers are assumed to
correspond to NC_FLOAT
for floating-point, NC_INT
for
integers, and NC_CHAR
or NC_STRING
for strings.
This minimal increase in verbosity allows a downstream parser to
re-construct the original dataset with nearly identical attributes types
to the original.
% ncks --jsn_fmt=2 -v att_var ~/nco/data/in.nc
...
"att_var": {
  "shape": ["time"],
  "type": "float",
  "attributes": {
    "byte_att": { "type": "byte", "data": [0, 1, 2, 127, -128, -127, -2, -1]},
    "char_att": { "type": "char", "data": "Sentence one.\nSentence two.\n"},
    "short_att": { "type": "short", "data": 37},
    "int_att": { "type": "int", "data": 73},
    "long_att": { "type": "int", "data": 73},
    "float_att": { "type": "float", "data": [73.0, 72.0, 71.0, 70.010, 69.0010, 68.010, 67.010]},
    "double_att": { "type": "double", "data": [73.0, 72.0, 71.0, 70.010, 69.0010, 68.010, 67.0100010]}
  },
  "data": [10.0, 10.10, 10.20, 10.30, 10.40101, 10.50, 10.60, 10.70, 10.80, 10.990]
...
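A downstream parser can handle all three pedantic levels with one routine: use the explicit type when an attribute is printed as an object, and otherwise fall back on the defaults implied by JSON syntax. The sketch below is a hypothetical helper, not part of NCO. Mapping every bare string to "char" is an assumption, since NC_CHAR and NC_STRING cannot be told apart in that form.

```python
def decode_attribute(value):
    """Return (nc_type, data) for an attribute printed at any pedantic level."""
    if isinstance(value, dict) and "type" in value and "data" in value:
        return value["type"], value["data"]   # explicit object form
    # Bare form: infer the default type from the first element (or the scalar)
    sample = value[0] if isinstance(value, list) and value else value
    if isinstance(sample, str):
        return "char", value                  # assumed; could also be NC_STRING
    if isinstance(sample, float):
        return "float", value
    return "int", value

print(decode_attribute({"type": "short", "data": 37}))  # ('short', 37)
print(decode_attribute(73))                             # ('int', 73)
print(decode_attribute([73.0, 72.0]))                   # ('float', [73.0, 72.0])
```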
That ncks produces correct translations of all supported
datatypes may be verified by a JSON syntax checker command
like jsonlint.
Please let us know how to improve JSON features for your
application.
Turn-on printing to screen or turn-off copying global and group metadata.
This includes file summary information and global and group attributes.
Also ‘--Mtd’ and ‘--Metadata’.
By default ncks
prints global metadata to screen if no netCDF
output file and no variable extraction list is specified (with ‘-v’).
Use ‘-M’ to print global metadata to screen if a netCDF output is
specified, or if a variable extraction list is specified (with ‘-v’).
Use ‘-M’ to turn-off copying of global and group metadata when
copying, subsetting, or appending to an output file.
The various combinations of printing switches can be confusing.
In an attempt to anticipate what most users want to do, ncks
uses context-sensitive defaults for printing.
Our goal is to minimize the use of switches required to accomplish the
common operations.
We assume that users creating a new file or overwriting (e.g., with
‘-O’) an existing file usually wish to copy all global and
variable-specific attributes to the new file.
In contrast, we assume that users appending (e.g., with ‘-A’) an
explicit variable list from one file to another usually wish to copy
only the variable-specific attributes to the output file.
The ‘-H’, ‘-M’, and ‘-m’ switches are
implemented as toggles which reverse the default behavior.
The most confusing aspect of this is that ‘-M’ inhibits copying
global metadata in overwrite mode and causes copying of global
metadata in append mode.
ncks in.nc                         # Print VAs and GAs
ncks -v one in.nc                  # Print VAs not GAs
ncks -M -v one in.nc               # Print GAs only
ncks -m -v one in.nc               # Print VAs only
ncks -M -m -v one in.nc            # Print VAs and GAs
ncks -O in.nc out.nc               # Copy VAs and GAs
ncks -O -v one in.nc out.nc        # Copy VAs and GAs
ncks -O -M -v one in.nc out.nc     # Copy VAs not GAs
ncks -O -m -v one in.nc out.nc     # Copy GAs not VAs
ncks -O -M -m -v one in.nc out.nc  # Copy only data (no atts)
ncks -A in.nc out.nc               # Append VAs and GAs
ncks -A -v one in.nc out.nc        # Append VAs not GAs
ncks -A -M -v one in.nc out.nc     # Append VAs and GAs
ncks -A -m -v one in.nc out.nc     # Append only data (no atts)
ncks -A -M -m -v one in.nc out.nc  # Append GAs not VAs
where VAs
and GAs
denote variable and group/global
attributes, respectively.
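The table above can be condensed into a toy model. The function below is a sketch written only to reproduce those fifteen rows. It is not derived from the ncks source, and its answers for combinations that the table does not list should not be read as claims about ncks.

```python
def attributes_handled(mode, var_list, M=False, m=False):
    """Return (variable_atts, global_atts) handled for one invocation.

    mode is "print", "copy" (-O), or "append" (-A);
    var_list is True when a -v extraction list is given.
    """
    if mode == "print":
        if M or m:
            return m, M               # print only what was explicitly requested
        return True, not var_list     # default drops GAs when -v is given
    va_default = True
    ga_default = True if mode == "copy" else not var_list
    # In copy and append modes each switch toggles its context-sensitive default
    return va_default != m, ga_default != M

print(attributes_handled("copy", True, M=True))    # (True, False): VAs not GAs
print(attributes_handled("append", True, M=True))  # (True, True): VAs and GAs
```

The two calls show the aspect described above as most confusing: the same ‘-M’ switch inhibits copying global metadata in overwrite mode and causes it in append mode.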
Turn-on printing to screen or turn-off copying variable metadata.
Using ‘-m’ will print variable metadata to screen (similar to
ncdump -h).
This displays all metadata pertaining to each variable, one variable
at a time.
This includes information on the storage properties of the variable,
such as whether it employs chunking, compression, or packing.
Also activated using ‘--mtd’ and ‘--metadata’.
The ncks
default behavior is to print variable metadata to
screen if no netCDF output file is specified.
Use ‘-m’ to print variable metadata to screen if a netCDF output is
specified.
Also use ‘-m’ to turn-off copying of variable metadata to an output
file.
Print numeric representation of missing values.
As of NCO version 4.2.2 (October, 2012), NCO prints
missing values as blanks (i.e., the underscore character ‘_’) by default.
To enable the old behavior of printing the numeric representation of
missing values (e.g., 1.0e36
), use the ‘--no_blank’ switch.
Also activated using ‘--noblank’ or ‘--no-blank’.
Print data, metadata, and units to screen. The ‘-P’ switch is a convenience abbreviation for ‘-C -H -M -m -u’. Also activated using ‘--print’ or ‘--prn’. This set of switches is useful for exploring file contents.
Activate printing formatted output to file print-file. Also ‘--print_file’, ‘--fl_prn’, and ‘--file_print’. One can achieve the same result by redirecting stdout to a named file. However, it is slightly faster to print formatted output directly to a file than to stdout:
ncks --fl_prn=foo.txt --jsn in.nc
Print quietly, meaning omit dimension names, indices, and coordinate values when printing arrays. Variable (not dimension) indices are printed. Variable names appear flush left in the output:
zender@roulee:~$ ncks --trd -Q -v three_dmn_rec_var -C -H ~/nco/data/in.nc three_dmn_rec_var[0]=1 ...
This helps locate specific variables in lists with many variables and different dimensions. See also the ‘-V’ option, which omits all names and indices and prints only variable values.
Quench (turn-off) all printing to screen.
This overrides the setting of all print-related switches, equivalent to
-H -M -m when in single-file printing mode.
When invoked with -R
(see Retaining Retrieved Files), ncks
automatically sets -q
.
This allows ncks
to retrieve remote files without
automatically trying to print them.
Also ‘--quench’.
Retain all dimensions.
When invoked with --rad
(Retain All Dimensions),
ncks
copies each dimension in the input file to the output
file, regardless of whether the dimension is utilized by any variables.
Normally ncks
discards “orphan dimensions”, i.e., dimensions
not referenced by any variables.
This switch allows users to keep non-referenced dimensions in the workflow.
When invoked in printing mode, causes orphaned dimensions to be printed
(they are not printed by default).
Also ‘--retain_all_dimensions’, ‘--orphan_dimensions’, and
‘--rph_dmn’.
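The notion of an orphan dimension can be made concrete with a small sketch. The dimension and variable names below are hypothetical, and the function only illustrates the definition given above (a dimension referenced by no variable); it is not NCO code.

```python
def orphan_dimensions(dimensions, variables):
    """Return dimensions not referenced by any variable, in definition order."""
    used = set()
    for shape in variables.values():
        used.update(shape)
    return [dim for dim in dimensions if dim not in used]

dimensions = ["lat", "lon", "time", "orphan_dmn"]     # hypothetical names
variables = {"one": [], "T": ["time", "lat", "lon"]}  # variable -> its dimensions

print(orphan_dimensions(dimensions, variables))  # prints ['orphan_dmn']
```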
String format for text output.
Accepts C language escape sequences and printf()
formats.
Also ‘--string’ and ‘--sng_fmt’.
This option is only intended for use with traditional (TRD)
printing, and thus automatically invokes the ‘--trd’ switch.
Supply a printf()
-style format for printed output, i.e., in
CDL, JSON, TRD, or XML modes.
Also ‘--val_fmt’ and ‘--value_format’.
One use for this option is to reduce the printed precision of floating
point values:
# Default printing of original double precision values
# 0.0,0.1,0.12,0.123,0.1234,0.12345,0.123456,0.1234567,0.12345678,0.123456789
% ncks -C -v ppc_dbl ~/nco/data/in.nc
...
ppc_dbl = 0, 0.1, 0.12, 0.123, 0.1234, 0.12345, 0.123456, 0.1234567, 0.12345678, 0.123456789 ;
...
# Restrict printing to three digits after the decimal
% ncks --fmt_val=%.3f -C -v ppc_dbl ~/nco/data/in.nc
...
ppc_dbl = 0., 0.1, 0.12, 0.123, 0.123, 0.123, 0.123, 0.123, 0.123, 0.123 ;
...
The supplied format only applies to floating point variable values
(NC_FLOAT
or NC_DOUBLE
), and not to other types or to
attributes.
For reference, the default printf() format for
CDL, JSON, TRD, and XML modes is
%#.7gf, %#.7g, %g, and %#.7g,
respectively, for single-precision data, and
%#.15g, %#.15g, %.12g, and %#.15g,
respectively, for double-precision data.
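For readers less familiar with printf() conversions, the sketch below applies several of the formats mentioned above using Python's % operator, which follows the same conversion rules. It shows raw formatting only. The ncks output shown earlier also has trailing zeros trimmed, which this sketch does not attempt to reproduce.

```python
value = 0.123456789

print("%.3f" % value)     # 0.123 (three digits after the decimal point)
print("%#.7g" % 1.0)      # 1.000000 (seven significant digits, zeros kept by '#')
print("%g" % 1.0)         # 1 (shortest form, no decimal point)
print("%.12g" % value)    # 0.123456789
print("%#.15g" % value)   # 0.123456789000000
```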
NCO introduced this feature in version 4.7.3 (March, 2018).
We would appreciate your feedback on whether and how to extend this
feature to make it more useful.
As of NCO version 5.2.0, released in February, 2024,
ncks
can help analyze initial condition and restart datasets
produced by the E3SM ELM and CESM CLM/CTSM
land-surface models.
Whereas gridded history datasets from these ESMs use a
standard gridded data format, land-surface "restart files" employ a
custom packing format that unwinds multi-dimensional data into
sparse, 1-D (S1D) arrays that are not easily
visualized.
ncks
can convert S1D files into gridded datasets
where all dimensions are explicitly declared, rather than unrolled or
"packed".
Invoke this conversion feature with the --s1d
option
(or long option equivalents, --sparse
or --unpacksparse
)
and, with ‘--hrz_crd fl_hrz’ (e.g.,
‘--hrz_crd hrz.nc’), point to the file that contains the
horizontal coordinates (that restart files usually omit).
The output file is the fully gridded input file, with no loss
of information:
ncks --s1d --hrz=elmv3_history.nc elmv3_restart.nc out.nc
The output file contains all input variables placed on a lat-lon or unstructured grid, with new dimensions for Plant Functional Type (PFT) and multiple elevation classes (MECs).
The S1D capabilities have steadily grown, culminating in major new features in NCO version 5.2.9 and version 5.3.0, released in October and November, 2024, respectively.
The ‘--rgr lut_out=$lut_out’ option specifies that only columns of specified landunit type(s) should appear in the output for column-variables. The value lut_out is the standard landunit type of the column. Two additional values specify to output area-weighted averages of multiple landunit types:
lut_out  Output will be value of column(s) in this landunit
0        Not Currently Used
1        Vegetated or bare soil
2        Crop
3        Landice (plain, no MEC)
4        Landice multiple elevation classes
5        Deep lake
6        Wetland
7        Urban tall building district
8        Urban high density
9        Urban medium density
10       Area-weighted average of all landunit types except MEC glaciers
13       Area-weighted average of soil+(non-MEC) glacier
This feature is necessarily restricted to restart datasets, e.g.,
ncks --s1d --lut_out=1 --hrz=hst.nc rst.nc s1d.nc  # Output Soil LUT
ncks --s1d --lut_out=13 --hrz=hst.nc rst.nc s1d.nc # Avg Soil+Glacier
S1D can now grid snow-related variables into a top-down
(ocean-like) vertical grid that many think is more intuitive.
By default the land system models ELM, CLM, and
CTSM store the negative of the number of active snow layers
in the variable SNLSNO
.
Restart files for these models store the active snow layer butted-up
against the lowest layers in the snow-level dimension (so that they
are contiguous with soil layers to simplify hydrologic
calculations).
This makes good modeling sense though also makes snow variables in
restart files difficult to visualize.
By default S1D now uses SNLSNO
, if present, to unpack
active layers of snow variables into a top-layer first, downwards
order, increasing with depth.
Inactive layers are underneath the bottom (i.e., where they
reside physically).
The resulting snow variables appear like ocean state variables over
uneven bathymetry, with missing values underneath.
We call this "snow-ocean" ordering to contrast it with the on-disk
storage order of snow variables.
ncks --s1d --rgr snw_ocn --hrz=hst.nc rst.nc s1d.nc # Snow-ocean order
ncks --s1d --rgr no_snw_ocn --hrz=hst.nc rst.nc s1d.nc # Input order
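The re-ordering described above can be pictured for a single column with a small sketch. This is only an illustration of the idea, not NCO's implementation: the helper name `snw_ocn` is hypothetical, the sample values are invented, and it assumes the active layers are already stored top-to-bottom within the last slots of the snow-level dimension.

```shell
# Hypothetical helper, not NCO's implementation.
# usage: snw_ocn SNLSNO val_1 ... val_n
#   SNLSNO : negative of the number of active snow layers, e.g., -2
#   val_i  : on-disk values for one column; active layers fill the last slots
# Prints the column in snow-ocean order with '_' marking inactive layers.
snw_ocn() {
  nbr_act=$(( 0 - $1 ))            # number of active layers
  shift
  nbr_ina=$(( $# - nbr_act ))      # number of inactive layers
  out=''
  idx=0
  for val in "$@"; do              # active layers first, top layer downward
    idx=$(( idx + 1 ))
    if [ "${idx}" -gt "${nbr_ina}" ]; then
      out="${out}${out:+ }${val}"
    fi
  done
  idx=0
  while [ "${idx}" -lt "${nbr_ina}" ]; do   # inactive layers underneath
    out="${out}${out:+ }_"
    idx=$(( idx + 1 ))
  done
  printf '%s\n' "${out}"
}

# Five snow levels, two of them active; 0 stands in for unused on-disk slots
snw_ocn -2 0 0 0 271.5 272.9
```

The two active values move to the front of the column and the three unused slots follow as missing values, which is the "ocean over uneven bathymetry" picture.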
Print summary of ncks
hidden features.
These hidden or secret features are used mainly by developers.
They are not supported for general use and may change at any time.
This demonstrates conclusively that I cannot keep a secret.
Also ‘--ssh’ and ‘--scr’.
From 1995–2017 ncks
dumped the ASCII text
representation of netCDF files in what we now call “traditional”
mode.
Much of this manual contains output printed in traditional mode,
which places one value per line, with complete dimensional
information.
Traditional-mode metadata output includes more lower-level information,
such as RAM usage and internal variable IDs, than
CDL.
While this is useful for some developers and users, CDL has,
over the years, become more useful than traditional mode for most
users.
As of NCO version 4.6.9 (September, 2017) CDL became
the default printing mode.
Traditional printing mode is accessed via the ‘--trd’ option.
Toggle the printing of a variable’s units
attribute, if any,
with its values.
Also ‘--units’.
Print variable values only. Do not print variable and dimension names, indices, and coordinate values when printing arrays.
zender@roulee:~$ ncks --trd -V -v three_dmn_rec_var -C -H ~/nco/data/in.nc
1
...
See also the ‘-Q’ option, which prints variable names and indices, but not dimension names, indices, or coordinate values when printing arrays. Using ‘-V’ is the same as specifying ‘-Q --no_nm_prn’.
As of NCO version 4.3.3 (July, 2013), ncks
can
print extracted data and metadata to screen (i.e., stdout
) as
XML in NcML, the netCDF Markup Language.
ncks
supports XML more completely than
‘ncdump -x’.
With ncks
one can translate entire netCDF3 and netCDF4 files
into NcML, including metadata and data, using all
NCO’s subsetting and hyperslabbing capabilities.
Compare ncks
“traditional” with XML printing:
zender@roulee:~$ ncks --trd -v one ~/nco/data/in.nc
one: type NC_FLOAT, 0 dimensions, 1 attribute, chunked? no, compressed? no, packed? no
one size (RAM) = 1*sizeof(NC_FLOAT) = 1*4 = 4 bytes
one attribute 0: long_name, size = 3 NC_CHAR, value = one
one = 1
zender@roulee:~$ ncks --xml -v one ~/nco/data/in.nc
<?xml version="1.0" encoding="UTF-8"?>
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2" location="/home/zender/nco/data/in.nc">
  <variable name="one" type="float" shape="">
    <attribute name="long_name" separator="*" value="one" />
    <values>1.</values>
  </variable>
</netcdf>
XML-mode prints variable metadata and, as of
NCO version 4.3.7 (October, 2013), variable data and, as of
NCO version 4.4.0 (January, 2014), hidden attributes.
That ncks produces correct NcML translations of
CDM files for all supported datatypes is verified by
comparison to output from Unidata’s toolsUI
Java program.
Please let us know how to improve XML/NcML
features.
ncks
provides additional options to configure NcML
output: ‘--xml_no_location’, ‘--xml_spr_chr’, and
‘--xml_spr_nmr’.
Every NcML configuration option automatically triggers
NcML printing, so that specifying ‘--xml’ in addition
to a configuration option is redundant and unnecessary.
The ‘--xml_no_location’ switch prevents output of the
NcML location
element.
By default the location element is printed with a value equal to the
location of the input dataset, e.g.,
location="/home/zender/in.nc"
.
The ‘--xml_spr_chr’ and ‘--xml_spr_nmr’ options customize
the strings used as NcML separators for attributes and
variables of character-type and numeric-type, respectively.
Their default separators are * and “ ” (a space):
zender@roulee:~$ ncks --xml -d time,0,3 -v two_dmn_rec_var_sng in.nc
...
<values separator="*">abc*bcd*cde*def</values>
...
zender@roulee:~$ ncks --xml_spr_chr=', ' -v two_dmn_rec_var_sng in.nc
...
<values separator=", ">abc, bcd, cde, def, efg, fgh, ghi, hij, jkl, klm</values>
...
zender@roulee:~$ ncks --xml -v one_dmn_rec_var in.nc
...
<values>1 2 3 4 5 6 7 8 9 10</values>
...
zender@roulee:~$ ncks --xml_spr_nmr=', ' -v one_dmn_rec_var in.nc
...
<values separator=", ">1, 2, 3, 4, 5, 6, 7, 8, 9, 10</values>
...
Separator elements for strings are a thorny issue.
One must be sure that the separator element is not mistaken as a portion
of the string.
NCO attempts to produce valid NcML and supplies the
‘--xml_spr_chr’ option to work around any difficulties.
NCO performs precautionary checks with
strstr(val,spr)
to identify presence of the separator
string (spr) in data (val) and, when it detects a match,
automatically switches to a backup separator string (*|*
).
However limitations of strstr()
may lead to false negatives
when the separator string occurs in data beyond the first string in
multi-dimensional NC_CHAR
arrays.
Hence, results may be ambiguous to NcML parsers.
If problems arise, use ‘--xml_spr_chr’ to specify a multi-character
separator that does not appear in the string array and that does not
include any NcML formatting characters (e.g., commas, angle brackets, quotes).
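Before settling on a value for ‘--xml_spr_chr’, it can help to check a candidate separator against every string in the array rather than only the first. The sketch below shows one way to do that in the shell. The helper name `pick_spr` and the candidate list are hypothetical, and the check is an illustration of the idea rather than a description of NCO's internal C code.

```shell
# Hypothetical helper, not part of NCO: print the first candidate separator
# that occurs nowhere in the strings read from stdin (one string per line).
# usage: pick_spr cand_1 cand_2 ... < strings
pick_spr() {
  dat=$(cat)                        # slurp every string, not just the first
  for cnd in "$@"; do
    case ${dat} in
      *"${cnd}"*) continue ;;       # quoted pattern: literal substring match
    esac
    printf '%s\n' "${cnd}"
    return 0
  done
  return 1                          # every candidate collides with the data
}

# '*' occurs in the second string, so the backup '*|*' is chosen instead
printf '%s\n' 'abc' 'b*d' 'cde' | pick_spr '*' '*|*' '::'
```

A non-zero exit status means that none of the candidates is safe, in which case a longer multi-character separator is needed.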
ncks
We encourage the use of standard UNIX pipes and filters to
narrow the verbose output of ncks
into more precise targets.
For example, to obtain an uncluttered listing of the variables in a file
try
ncks --trd -m in.nc | grep -E ': type' | cut -f 1 -d ' ' | sed 's/://' | sort
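To see what each stage of this pipeline contributes without needing a netCDF file, the same filter can be fed a few lines of traditional-mode metadata directly. In the sketch below the helper name `var_lst_flt` is hypothetical, the lines for variable `one` are copied from the `ncks --trd -v one` output shown earlier, and the line for `two` is an invented second variable in the same format.

```shell
# Hypothetical helper: the same filter, reading metadata from stdin
# instead of from 'ncks --trd -m in.nc'
var_lst_flt() {
  grep -E ': type' |   # keep only the "name: type ..." definition lines
  cut -f 1 -d ' ' |    # first space-delimited field is "name:"
  sed 's/://' |        # strip the colon
  sort                 # alphabetize
}

# Illustrative sample of traditional-mode metadata (not produced here by ncks)
var_lst_flt <<'EOF'
two: type NC_FLOAT, 0 dimensions, 1 attribute, chunked? no, compressed? no, packed? no
one: type NC_FLOAT, 0 dimensions, 1 attribute, chunked? no, compressed? no, packed? no
one size (RAM) = 1*sizeof(NC_FLOAT) = 1*4 = 4 bytes
one attribute 0: long_name, size = 3 NC_CHAR, value = one
EOF
```

Only the two definition lines survive the `grep`, so the result is the sorted list of variable names, one per line.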
A Bash user could alias the previous filter to the shell command
ncvarlst
as shown below.
More complex examples could involve command line arguments.
For example, a user may frequently be interested in obtaining the value
of an attribute, e.g., for textual file examination or for passing to
another shell command.
Say the attribute is purpose
, the variable is z
, and the
file is in.nc
.
In this example, ncks --trd -m -v z
is too verbose so a robust
grep
and cut
filter is desirable, such as
ncks --trd -M -m in.nc | grep -E -i "^z attribute [0-9]+: purpose" | cut -f 11- -d ' ' | sort
The filters are clearly too complex to remember on-the-fly so the entire
procedure could be implemented as a shell command or function called,
say, ncattget
function ncattget { ncks --trd -M -m ${3} | grep -E -i "^${2} attribute [0-9]+: ${1}" | cut -f 11- -d ' ' | sort ; }
The shell function ncattget is invoked with three arguments that are,
in order, the names of the attribute, variable, and file to examine.
Global attributes are indicated by using a variable name of global
.
This definition yields the following results
% ncattget purpose z in.nc
Height stored with a monotonically increasing coordinate
% ncattget Purpose Z in.nc
Height stored with a monotonically increasing coordinate
% ncattget history z in.nc
% ncattget history global in.nc
History global attribute.
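The choice of `cut -f 11-` follows from the layout of a traditional-mode attribute line, in which the attribute value starts at the eleventh space-delimited field. The sketch below applies the same filter to sample text so the field counting can be checked by hand. The helper name `att_val_flt` is hypothetical, and the line for `z` is an illustrative reconstruction that combines the attribute-line format shown earlier with the `purpose` value printed above.

```shell
# Hypothetical helper: the ncattget filter, reading metadata from stdin.
# usage: att_val_flt att_nm var_nm < metadata
att_val_flt() {
  grep -E -i "^${2} attribute [0-9]+: ${1}" |  # match "var attribute N: att"
  cut -f 11- -d ' ' |                          # fields 1-10 precede the value
  sort
}

# Fields: 1=var 2=attribute 3=N: 4=att, 5=size 6== 7=len 8=type, 9=value 10== 11...=text
smp='one attribute 0: long_name, size = 3 NC_CHAR, value = one
z attribute 1: purpose, size = 56 NC_CHAR, value = Height stored with a monotonically increasing coordinate'

printf '%s\n' "${smp}" | att_val_flt long_name one
printf '%s\n' "${smp}" | att_val_flt Purpose Z   # -i makes the match case-insensitive
```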
Note that case sensitivity has been turned off for the variable and
attribute names (and could be turned on by removing the ‘-i’ switch
to grep
).
Furthermore, extended regular expressions may be used for both the
variable and attribute names.
The next two commands illustrate this by searching for the values
of attribute purpose
in all variables, and then for all
attributes of the variable z
:
% ncattget purpose .+ in.nc
1-D latitude coordinate referred to by geodesic grid variables
1-D longitude coordinate referred to by geodesic grid variables
...
% ncattget .+ Z in.nc
Height
Height stored with a monotonically increasing coordinate
meter
Extended filters are best stored as shell commands if they are used frequently. Shell commands may be re-used when they are defined in shell configuration files. These files are usually named .bashrc, .cshrc, and .profile for the Bash, Csh, and Sh shells, respectively.
# NB: Untested on Csh, Ksh, Sh, Zsh! Send us feedback!
# Bash shell (/bin/bash), .bashrc examples
# ncattget $att_nm $var_nm $fl_nm : What attributes does variable have?
function ncattget { ncks --trd -M -m ${3} | grep -E -i "^${2} attribute [0-9]+: ${1}" | cut -f 11- -d ' ' | sort ; }
# ncunits $att_val $fl_nm : Which variables have given units?
function ncunits { ncks --trd -m ${2} | grep -E -i " attribute [0-9]+: units.+ ${1}" | cut -f 1 -d ' ' | sort ; }
# ncavg $var_nm $fl_nm : What is mean of variable?
function ncavg { ncwa -y avg -O -C -v ${1} ${2} ~/foo.nc ; ncks --trd -H -C -v ${1} ~/foo.nc | cut -f 3- -d ' ' ; }
# ncavg $var_nm $fl_nm : What is mean of variable? (ncap2 alternative; overrides the definition above)
function ncavg { ncap2 -O -C -v -s "foo=${1}.avg();print(foo)" ${2} ~/foo.nc | cut -f 3- -d ' ' ; }
# ncdmnlst $fl_nm : What dimensions are in file?
function ncdmnlst { ncks --cdl -m ${1} | cut -d ':' -f 1 | cut -d '=' -s -f 1 ; }
# ncvardmnlst $var_nm $fl_nm : What dimensions are in a variable?
function ncvardmnlst { ncks --trd -m -v ${1} ${2} | grep -E -i "^${1} dimension [0-9]+: " | cut -f 4 -d ' ' | sed 's/,//' ; }
# ncvardmnlatlon $var_nm $fl_nm : Does variable contain both lat and lon dimensions?
function ncvardmnlatlon { flg=`ncks -C -v ${1} -m ${2} | grep -E -i "${1}\(" | grep -E "lat.*lon|lon.*lat"` ; [[ ! -z "$flg" ]] && echo "Yes, ${1} has both lat and lon dimensions" || echo "No, ${1} does not have both lat and lon dimensions" ; }
# ncdmnsz $dmn_nm $fl_nm : What is dimension size?
function ncdmnsz { ncks --trd -m -M ${2} | grep -E -i ": ${1}, size =" | cut -f 7 -d ' ' | uniq ; }
# ncgrplst $fl_nm : What groups are in file?
function ncgrplst { ncks -m ${1} | grep 'group:' | cut -d ':' -f 2 | cut -d ' ' -f 2 | sort ; }
# ncvarlst $fl_nm : What variables are in file?
function ncvarlst { ncks --trd -m ${1} | grep -E ': type' | cut -f 1 -d ' ' | sed 's/://' | sort ; }
# ncmax $var_nm $fl_nm : What is maximum of variable?
function ncmax { ncwa -y max -O -C -v ${1} ${2} ~/foo.nc ; ncks --trd -H -C -v ${1} ~/foo.nc | cut -f 3- -d ' ' ; }
# ncmax $var_nm $fl_nm : What is maximum of variable? (ncap2 alternative; overrides the definition above)
function ncmax { ncap2 -O -C -v -s "foo=${1}.max();print(foo)" ${2} ~/foo.nc | cut -f 3- -d ' ' ; }
# ncmdn $var_nm $fl_nm : What is median of variable?
function ncmdn { ncap2 -O -C -v -s "foo=gsl_stats_median_from_sorted_data(${1}.sort());print(foo)" ${2} ~/foo.nc | cut -f 3- -d ' ' ; }
# ncmin $var_nm $fl_nm : What is minimum of variable?
function ncmin { ncap2 -O -C -v -s "foo=${1}.min();print(foo)" ${2} ~/foo.nc | cut -f 3- -d ' ' ; }
# ncrng $var_nm $fl_nm : What is range of variable?
function ncrng { ncap2 -O -C -v -s "foo_min=${1}.min();foo_max=${1}.max();print(foo_min,\"%f\");print(\" to \");print(foo_max,\"%f\")" ${2} ~/foo.nc ; }
# ncmode $var_nm $fl_nm : What is mode of variable?
# NB: as written this calls the median function, so it returns the median, not the mode
function ncmode { ncap2 -O -C -v -s "foo=gsl_stats_median_from_sorted_data(${1}.sort());print(foo)" ${2} ~/foo.nc | cut -f 3- -d ' ' ; }
# ncrecsz $fl_nm : What is record dimension size?
function ncrecsz { ncks --trd -M ${1} | grep -E -i "^Root record dimension 0:" | cut -f 10- -d ' ' ; }
# nctypget $var_nm $fl_nm : What type is variable?
function nctypget { ncks --trd -m -v ${1} ${2} | grep -E -i "^${1}: type" | cut -f 3 -d ' ' | cut -f 1 -d ',' ; }
# Csh shell (/bin/csh), .cshrc examples (derive others from Bash definitions):
# NB: csh itself has no shell functions, so these POSIX-style definitions need a Bourne-compatible shell
ncattget() { ncks --trd -M -m ${3} | grep -E -i "^${2} attribute [0-9]+: ${1}" | cut -f 11- -d ' ' | sort ; }
ncdmnsz() { ncks --trd -m -M ${2} | grep -E -i ": ${1}, size =" | cut -f 7 -d ' ' | uniq ; }
ncvarlst() { ncks --trd -m ${1} | grep -E ': type' | cut -f 1 -d ' ' | sed 's/://' | sort ; }
ncrecsz() { ncks --trd -M ${1} | grep -E -i "^Root record dimension 0:" | cut -f 10- -d ' ' ; }
# Sh shell (/bin/sh), .profile examples (derive others from Bash definitions):
ncattget() { ncks --trd -M -m ${3} | grep -E -i "^${2} attribute [0-9]+: ${1}" | cut -f 11- -d ' ' | sort ; }
ncdmnsz() { ncks --trd -m -M ${2} | grep -E -i ": ${1}, size =" | cut -f 7 -d ' ' | uniq ; }
ncvarlst() { ncks --trd -m ${1} | grep -E ': type' | cut -f 1 -d ' ' | sed 's/://' | sort ; }
ncrecsz() { ncks --trd -M ${1} | grep -E -i "^Root record dimension 0:" | cut -f 10- -d ' ' ; }
EXAMPLES
View all data in netCDF in.nc, printed with Fortran indexing conventions:
ncks -F in.nc
Copy the netCDF file in.nc to file out.nc.
ncks in.nc out.nc
Now the file out.nc contains all the data from in.nc.
There are, however, two differences between in.nc and
out.nc.
First, the history
global attribute (see History Attribute)
will contain the command used to create out.nc.
Second, the variables in out.nc will be defined in alphabetical
order.
Of course the internal storage of variables in a netCDF file should be
transparent to the user, but there are cases when alphabetizing a file
is useful (see description of -a
switch).
Copy all global attributes (and no variables) from in.nc to out.nc:
ncks -A -x ~/nco/data/in.nc ~/out.nc
The ‘-x’ switch tells NCO to use the complement of the extraction list (see Subsetting Files). Since no extraction list is explicitly specified (with ‘-v’), the default is to extract all variables. The complement of all variables is no variables. Without any variables to extract, the append (‘-A’) command (see Appending Variables) has only to extract and copy (i.e., append) global attributes to the output file.
Copy/append metadata (not data) from variables in one file to variables in a second file. When copying/subsetting/appending files (as opposed to printing them), the copying of data, variable metadata, and global/group metadata are now turned OFF by ‘-H’, ‘-m’, and ‘-M’, respectively. This is the opposite sense in which these switches work when printing a file. One can use these switches to easily replace data or metadata in one file with data or metadata from another:
# Extract naked (data-only) copies of two variables
ncks -h -M -m -O -C -v one,three_dmn_rec_var ~/nco/data/in.nc ~/out.nc
# Change values to be sure original values are not copied in following step
ncap2 -O -v -s 'one*=2;three_dmn_rec_var*=0' ~/nco/data/in.nc ~/in2.nc
# Append in2.nc metadata (not data!) to out.nc
ncks -A -C -H -v one,three_dmn_rec_var ~/in2.nc ~/out.nc
Variables in out.nc now contain data (not metadata) from in.nc and metadata (not data) from in2.nc.
Print variable three_dmn_var
from file in.nc with
default notations.
Next print three_dmn_var
as an un-annotated text column.
Then print three_dmn_var
signed with very high precision.
Finally, print three_dmn_var
as a comma-separated list:
% ncks --trd -C -v three_dmn_var in.nc
lat[0]=-90 lev[0]=100 lon[0]=0 three_dmn_var[0]=0
lat[0]=-90 lev[0]=100 lon[1]=90 three_dmn_var[1]=1
...
lat[1]=90 lev[2]=1000 lon[3]=270 three_dmn_var[23]=23
% ncks --trd -s '%f\n' -C -v three_dmn_var in.nc
0.000000
1.000000
...
23.000000
% ncks --trd -s '%+16.10f\n' -C -v three_dmn_var in.nc
+0.0000000000
+1.0000000000
...
+23.0000000000
% ncks --trd -s '%f, ' -C -v three_dmn_var in.nc
0.000000, 1.000000, ..., 23.000000,
Programmers will recognize these as the venerable C language
printf()
formatting strings.
The second and third options are useful when pasting data into text
files like reports or papers.
See ncatted
netCDF Attribute Editor, for more details on string
formatting and special characters.
As of NCO version 4.2.2 (October, 2012), NCO prints missing values as blanks (i.e., the underscore character ‘_’) by default:
% ncks --trd -C -H -v mss_val in.nc
lon[0]=0 mss_val[0]=73
lon[1]=90 mss_val[1]=_
lon[2]=180 mss_val[2]=73
lon[3]=270 mss_val[3]=_
% ncks -s '%+5.1f, ' -H -C -v mss_val in.nc
+73.0, _, +73.0, _,
To print the numeric value of the missing value instead of a blank, use the ‘--no_blank’ option.
ncks
prints in a verbose fashion by default and supplies a
number of switches to pare-down (or even spruce-up) the output.
The interplay of the ‘-Q’, ‘-V’, and (otherwise undocumented)
‘--no_nm_prn’ switches yields most desired verbosities:
% ncks -v three_dmn_rec_var -C -H ~/nco/data/in.nc
time[0]=1 lat[0]=-90 lon[0]=0 three_dmn_rec_var[0]=1
% ncks -Q -v three_dmn_rec_var -C -H ~/nco/data/in.nc
three_dmn_rec_var[0]=1
% ncks -V -v three_dmn_rec_var -C -H ~/nco/data/in.nc
1
% ncks -Q --no_nm_prn -v three_dmn_rec_var -C -H ~/nco/data/in.nc
1
% ncks --no_nm_prn -v three_dmn_rec_var -C -H ~/nco/data/in.nc
1 -90 0 1
One dimensional arrays of characters stored as netCDF variables are automatically printed as strings, whether or not they are NUL-terminated, e.g.,
ncks -v fl_nm in.nc
The %c
formatting code is useful for printing
multidimensional arrays of characters representing fixed length strings
ncks -s '%c' -v fl_nm_arr in.nc
Using the %s
format code on strings which are not NUL-terminated
(and thus not technically strings) is likely to result in a core dump.
Create netCDF out.nc containing all variables, and any associated
coordinates, except variable time
, from netCDF in.nc:
ncks -x -v time in.nc out.nc
As a special case of this, consider how to remove a
variable such as time_bounds
that is identified in a
CF Convention (see CF Conventions) compliant
ancillary_variables
, bounds
, climatology
,
coordinates
, or grid_mapping
attribute.
NCO subsetting assumes the user wants all ancillary variables,
axes, bounds and coordinates associated with all extracted variables
(see Subsetting Coordinate Variables).
Hence to exclude an ancillary_variables
, bounds
,
climatology
, coordinates
, or grid_mapping
variable
while retaining the “parent” variable (here time
), one must use
the ‘-C’ switch:
ncks -C -x -v time_bounds in.nc out.nc
The ‘-C’ switch tells the operator NOT to necessarily
include all the CF ancillary variables, axes, bounds, and
coordinates.
Hence the output file will contain time
and not
time_bounds
.
Extract variables time
and pressure
from netCDF
in.nc.
If out.nc does not exist it will be created.
Otherwise you will be prompted whether to append to or to
overwrite out.nc:
ncks -v time,pressure in.nc out.nc
ncks -C -v time,pressure in.nc out.nc
The first version of the command creates an out.nc which contains
time
, pressure
, and any coordinate variables associated
with pressure.
The out.nc from the second version is guaranteed to contain only
two variables time
and pressure
.
Create netCDF out.nc containing all variables from file
in.nc.
Restrict the dimensions of these variables to a hyperslab.
The specified hyperslab is: the sixth value in dimension time
(index 5 under the default zero-based C indexing; with ‘-F’ the same
index would select the fifth value);
the half-open range lat <= 0. in coordinate lat;
the half-open range lon >= 330. in coordinate lon;
the closed interval 0.3 <= band <= 0.5 in coordinate band;
and the cross-section closest to 1000. in coordinate lev.
Note that limits applied to coordinate values are specified with a
decimal point, and limits applied to dimension indices do not have a
decimal point (see Hyperslabs).
ncks -d time,5 -d lat,,0.0 -d lon,330.0, -d band,0.3,0.5 -d lev,1000.0 in.nc out.nc
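The decimal-point rule is easy to get wrong when hyperslab arguments are generated by a script, so a small check can be useful. The sketch below classifies a single limit the way the rule above describes. The helper name `lmt_typ` is hypothetical, and only the decimal-point test stated in the text is modeled; any other forms that NCO may accept are not covered.

```shell
# Hypothetical helper, not part of NCO: report how a single '-d' limit would
# be read under the decimal-point rule described above.
lmt_typ() {
  case $1 in
    '')  echo 'unspecified' ;;   # empty min or max leaves that side open
    *.*) echo 'coordinate' ;;    # decimal point: a coordinate value
    *)   echo 'index' ;;         # no decimal point: a dimension index
  esac
}

# Limits taken from the hyperslab example above
for lmt in 5 330.0 1000.0; do
  printf '%s -> %s\n' "${lmt}" "$(lmt_typ "${lmt}")"
done
```

Writing `-d lev,1000` instead of `-d lev,1000.0` would therefore request index 1000 rather than the level nearest the coordinate value 1000.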
Assume the domain of the monotonically increasing longitude coordinate
lon
is 0 < lon < 360.
Here, lon
is an example of a wrapped coordinate.
ncks
will extract a hyperslab which crosses the Greenwich
meridian simply by specifying the westernmost longitude as min and
the easternmost longitude as max, as follows:
ncks -d lon,260.0,45.0 in.nc out.nc
For more details see Wrapped Coordinates.
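A wrapped request is recognizable because its min exceeds its max, and it selects two pieces of the coordinate rather than one. The sketch below mimics that selection on a plain list of longitudes. The helper name `lon_slb` is hypothetical, the four-point grid matches the lon values printed in earlier examples, and the ordering of the two pieces (western piece first) is an assumption of this sketch.

```shell
# Hypothetical helper, not part of NCO: select longitudes from stdin (one per
# line, monotonically increasing) that fall in the range min..max, where
# min > max denotes a range that wraps past the end of the coordinate.
lon_slb() {
  awk -v min="$1" -v max="$2" '
    BEGIN { min += 0; max += 0 }              # force numeric comparison
    { lon[NR] = $1 + 0 }
    END {
      if (min <= max) {                       # ordinary range: one piece
        for (i = 1; i <= NR; i++) if (lon[i] >= min && lon[i] <= max) print lon[i]
      } else {                                # wrapped range: two pieces
        for (i = 1; i <= NR; i++) if (lon[i] >= min) print lon[i]
        for (i = 1; i <= NR; i++) if (lon[i] <= max) print lon[i]
      }
    }'
}

# Same limits as the ncks command above, applied to a 0, 90, 180, 270 grid
printf '%s\n' 0 90 180 270 | lon_slb 260.0 45.0
```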
ncpdq
netCDF Permute Dimensions Quickly
SYNTAX
ncpdq [-3] [-4] [-5] [-6] [-7] [-A] [-a [-]dim[,...]] [-C] [-c] [--cmp cmp_sng] [--cnk_byt sz_byt] [--cnk_csh sz_byt] [--cnk_dmn nm,sz_lmn] [--cnk_map map] [--cnk_min sz_byt] [--cnk_plc plc] [--cnk_scl sz_lmn] [-D dbg] [-d dim,[min][,[max][,[stride]]]] [-F] [--fl_fmt fl_fmt] [-G gpe_dsc] [-g grp[,...]] [--glb ...] [-H] [-h] [--hdf] [--hdr_pad nbr] [--hpss] [-L dfl_lvl] [-l path] [-M pck_map] [--mrd] [--no_cll_msr] [--no_frm_trm] [--no_tmp_fl] [-O] [-o output-file] [-P pck_plc] [-p path] [--qnt ...] [--qnt_alg alg_nm] [-R] [-r] [--ram_all] [-t thr_nbr] [-U] [--unn] [-v var[,...]] [-X ...] [-x] input-file [output-file]
DESCRIPTION
ncpdq
performs one (not both) of two distinct functions
per invocation: packing or dimension permutation.
Without any options, ncpdq
will pack data with default
parameters.
The ‘-a’ option tells ncpdq
to permute dimensions
accordingly, otherwise ncpdq
will pack data as
instructed/controlled by the ‘-M’ and ‘-P’ options.
ncpdq
is optimized to perform these actions in a parallel
fashion with a minimum of time and memory.
The pdq may stand for “Permute Dimensions Quickly”,
“Pack Data Quietly”, “Pillory Dan Quayle”, or other silly uses.
The ncpdq
packing (and unpacking) algorithms are described
in Methods and functions, and are also implemented in
ncap2
.
ncpdq
extends the functionality of these algorithms by
providing high level control of the packing policy so that
users can consistently pack (and unpack) entire files with one command.
The user specifies the desired packing policy with the ‘-P’ switch
(or its long option equivalents, ‘--pck_plc’ and
‘--pack_policy’) and its pck_plc argument.
Four packing policies are currently implemented:
Definition: Pack unpacked variables, re-pack packed variables
Alternate invocation: ncpack
pck_plc key values: ‘all_new’, ‘pck_all_new_att’
Definition: Pack unpacked variables, copy packed variables
Alternate invocation: none
pck_plc key values: ‘all_xst’, ‘pck_all_xst_att’
Definition: Re-pack packed variables, copy unpacked variables
Alternate invocation: none
pck_plc key values: ‘xst_new’, ‘pck_xst_new_att’
Definition: Unpack packed variables, copy unpacked variables
Alternate invocation: ncunpack
pck_plc key values: ‘upk’, ‘unpack’, ‘pck_upk’
Equivalent key values are fully interchangeable. Multiple equivalent options are provided to satisfy disparate needs and tastes of NCO users working with scripts and from the command line.
Regardless of the packing policy selected, ncpdq
no longer (as of NCO version 4.0.4 in October, 2010)
packs coordinate variables, or the special variables, weights,
and other grid properties described in CF Conventions.
Prior ncpdq
versions treated coordinate variables and
grid properties no differently from other variables.
However, coordinate variables are one-dimensional, so packing saves
little space on large files, and the resulting files are difficult for
humans to read.
ncpdq
will, of course, unpack coordinate variables and
weights, for example, in case some other, non-NCO software
packed them in the first place.
Concurrently, Gaussian and area weights and other grid properties are
often used to derive fields in re-inflated (unpacked) files, so packing
such grid properties causes a considerable loss of precision in
downstream data processing.
If users express strong wishes to pack grid properties, we will
implement new packing policies.
An immediate workaround for those needing to pack grid properties
now, is to use the ncap2
packing functions or to rename the
grid properties prior to calling ncpdq
.
We welcome your feedback.
To reduce required memorization of these complex policy switches,
ncpdq
may also be invoked via a synonym or with switches
that imply a particular policy.
ncpack
is a synonym for ncpdq
and behaves the same
in all respects.
Both ncpdq
and ncpack
assume a default packing
policy request of ‘all_new’.
Hence ncpack, like ncpdq, may be invoked without any ‘-P’ switch.
Similarly, ncunpack is a synonym for ncpdq
except that ncunpack implicitly assumes a request to unpack,
i.e., ‘-P pck_upk’.
Finally, the ncpdq ‘-U’ switch (or its long option
equivalent ‘--unpack’) requires no argument.
It simply requests unpacking.
Given the menagerie of synonyms, equivalent options, and implied
options, a short list of some equivalent commands is appropriate.
The following commands are equivalent for packing:
ncpdq -P all_new, ncpdq --pck_plc=all_new, and ncpack.
The following commands are equivalent for unpacking:
ncpdq -P upk, ncpdq -U, ncpdq --pck_plc=unpack, and ncunpack.
Equivalent commands for other packing policies, e.g., ‘all_xst’,
follow by analogy.
Note that ncpdq synonyms are subject to the same constraints
and recommendations discussed in the section on ncbo synonyms
(see ncbo netCDF Binary Operator).
That is, symbolic links must exist from the synonym to ncpdq,
or else the user must define an alias.
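The reason a symbolic link is enough to create a synonym is that a program can inspect the name it was invoked by and choose a default accordingly. The sketch below demonstrates that mechanism with a stand-in script, not with the real ncpdq. The names `mk_syn_demo`, `pdq_demo`, `pack_demo`, and `unpack_demo` are all hypothetical, and the mapping of names to policies simply mirrors the defaults described above.

```shell
# Sketch with a stand-in script, NOT the real ncpdq.
# mk_syn_demo DIR: create a stand-in program plus two symbolic-link synonyms
mk_syn_demo() {
  cat > "$1/pdq_demo" <<'EOF'
# Stand-in program: choose a default policy from the invocation name
case $(basename "$0") in
  unpack_demo) echo 'default policy: upk' ;;
  *)           echo 'default policy: all_new' ;;
esac
EOF
  ln -s pdq_demo "$1/pack_demo"      # synonyms are links to the one real file
  ln -s pdq_demo "$1/unpack_demo"
}

tmp_dir=$(mktemp -d)
mk_syn_demo "${tmp_dir}"
for nm in pdq_demo pack_demo unpack_demo; do
  sh "${tmp_dir}/${nm}"              # run via sh so no execute bit is needed
done
rm -rf "${tmp_dir}"
```

Where symbolic links are not available, a shell alias that supplies the switch explicitly (for unpacking, one that expands to `ncpdq -U`) achieves the same effect.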
The ncpdq
packing algorithms must know to which type
particular types of input variables are to be packed.
The correspondence between the input variable type and the output,
packed type, is called the packing map.
The user specifies the desired packing map with the ‘-M’ switch
(or its long option equivalents, ‘--pck_map’ and
‘--map’) and its pck_map argument.
Seven packing maps are currently implemented:
Definition: Pack floating precision types to NC_SHORT [default]
Map: Pack [NC_DOUBLE,NC_FLOAT] to NC_SHORT
Types copied instead of packed: [NC_INT64,NC_UINT64,NC_INT,NC_UINT,NC_SHORT,NC_USHORT,NC_CHAR,NC_BYTE,NC_UBYTE]
pck_map key values: ‘flt_sht’, ‘pck_map_flt_sht’
Definition: Pack floating precision types to NC_BYTE
Map: Pack [NC_DOUBLE,NC_FLOAT] to NC_BYTE
Types copied instead of packed: [NC_INT64,NC_UINT64,NC_INT,NC_UINT,NC_SHORT,NC_USHORT,NC_CHAR,NC_BYTE,NC_UBYTE]
pck_map key values: ‘flt_byt’, ‘pck_map_flt_byt’
Definition: Pack higher precision types to NC_SHORT
Map: Pack [NC_DOUBLE,NC_FLOAT,NC_INT64,NC_UINT64,NC_INT,NC_UINT] to NC_SHORT
Types copied instead of packed: [NC_SHORT,NC_USHORT,NC_CHAR,NC_BYTE,NC_UBYTE]
pck_map key values: ‘hgh_sht’, ‘pck_map_hgh_sht’
Definition: Pack higher precision types to NC_BYTE
Map: Pack [NC_DOUBLE,NC_FLOAT,NC_INT64,NC_UINT64,NC_INT,NC_UINT,NC_SHORT,NC_USHORT] to NC_BYTE
Types copied instead of packed: [NC_CHAR,NC_BYTE,NC_UBYTE]
pck_map key values: ‘hgh_byt’, ‘pck_map_hgh_byt’
Definition: Pack each type to type of next lesser size
Map: Pack [NC_DOUBLE,NC_INT64,NC_UINT64] to NC_INT. Pack [NC_FLOAT,NC_INT,NC_UINT] to NC_SHORT. Pack [NC_SHORT,NC_USHORT] to NC_BYTE.
Types copied instead of packed: [NC_CHAR,NC_BYTE,NC_UBYTE]
pck_map key values: ‘nxt_lsr’, ‘pck_map_nxt_lsr’
Definition: Demote (via type-conversion, not packing) double-precision variables to single-precision
Map: Demote NC_DOUBLE to NC_FLOAT.
Types copied instead of packed: All except NC_DOUBLE
pck_map key values: ‘dbl_flt’, ‘pck_map_dbl_flt’, ‘dbl_sgl’, ‘pck_map_dbl_sgl’
The dbl_flt map was introduced in NCO version 4.7.7 (September, 2018).
Definition: Promote (via type-conversion, not packing) single-precision variables to double-precision
Map: Promote NC_FLOAT to NC_DOUBLE.
Types copied instead of packed: All except NC_FLOAT
pck_map key values: ‘flt_dbl’, ‘pck_map_flt_dbl’, ‘sgl_dbl’, ‘pck_map_sgl_dbl’
The flt_dbl map was introduced in NCO version 4.9.1 (December, 2019).
The default ‘all_new’ packing policy with the default
‘flt_sht’ packing map reduces the typical NC_FLOAT
-dominated
file size by about 50%.
‘flt_byt’ packing reduces an NC_DOUBLE
-dominated file by
about 87%.
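These percentages follow directly from the storage size of each type: NC_DOUBLE, NC_FLOAT, NC_SHORT, and NC_BYTE occupy 8, 4, 2, and 1 bytes per value, respectively. The sketch below works through the arithmetic. The helper name `pck_rdc` is hypothetical, and the calculation ignores metadata and the small attributes that packing adds to each variable.

```shell
# Hypothetical helper: percent size reduction when every value shrinks from
# byt_in bytes to byt_out bytes (integer arithmetic, rounds down).
# usage: pck_rdc byt_in byt_out
pck_rdc() {
  echo $(( ($1 - $2) * 100 / $1 ))
}

echo "NC_FLOAT  to NC_SHORT: $(pck_rdc 4 2)%"
echo "NC_DOUBLE to NC_BYTE : $(pck_rdc 8 1)%"
echo "NC_FLOAT  to NC_BYTE : $(pck_rdc 4 1)%"
```

Real files fall somewhat short of these figures because coordinates, metadata, and any variables that are copied rather than packed keep their original size.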
The “packing map” ‘pck_map_dbl_flt’ does a pure type-conversion
(no packing is involved) from NC_DOUBLE
to NC_FLOAT
.
The resulting variables are not packed, they are just single-precision
floating point instead of double-precision floating point.
This operation is irreversible, and no attributes are created, modified,
or deleted for these variables.
Note that coordinate and coordinate-like variables will not be demoted
as best practices dictate maintaining coordinates in the highest
possible precision.
The “packing map” ‘pck_map_flt_dbl’ does a pure type-conversion
(no packing is involved) from NC_FLOAT
to NC_DOUBLE
.
The resulting variables are not packed, they are just double-precision
floating point instead of single-precision floating point.
This operation is irreversible, and no attributes are created, modified,
or deleted for these variables.
All single-precision variables, including coordinates, are promoted.
Note that this map can double the size of a dataset.
The netCDF packing algorithm (see Methods and functions) is
lossy—once packed, the exact original data cannot be recovered without
a full backup.
Hence users should be aware of some packing caveats:
First, the interaction of packing and data equal to the
_FillValue is complex.
Test the _FillValue
behavior by performing a pack/unpack cycle
to ensure data that are missing stay missing and data that are
not missing do not join the Air National Guard and go missing.
This may lead you to elect a new _FillValue.
Second, ncpdq
actually allows packing into NC_CHAR
(with,
e.g., ‘flt_chr’).
However, the intrinsic conversion of signed char
to higher
precision types is tricky for values equal to zero, i.e., for
NUL
.
Hence packing to NC_CHAR
is not documented or advertised.
Pack into NC_BYTE
(with, e.g., ‘flt_byt’) instead.
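The loss of precision can be seen by packing a single value into a 16-bit integer and unpacking it again. The sketch below uses the netCDF unpacking convention, unpacked = packed * scale_factor + add_offset. The particular choices of scale_factor (range divided by 65534) and add_offset (midpoint of the range) are assumptions made for this illustration, as is the helper name `pck_rtn`; see Methods and functions for the parameters NCO actually uses.

```shell
# Hypothetical helper: pack VAL into a 16-bit integer and unpack it again.
# usage: pck_rtn MIN MAX VAL   (MIN and MAX are the extrema of the variable)
pck_rtn() {
  awk -v min="$1" -v max="$2" -v val="$3" 'BEGIN {
    ndrv = 65534                         # assumed count of usable short values
    scl = (max - min) / ndrv             # scale_factor
    off = (max + min) / 2                # add_offset
    raw = (val - off) / scl
    pck = (raw >= 0) ? int(raw + 0.5) : -int(-raw + 0.5)   # round to nearest
    upk = pck * scl + off                # netCDF unpacking convention
    printf "packed=%d unpacked=%.6f error=%.6f\n", pck, upk, upk - val
  }'
}

# A value of 73.2 in a variable that spans 0 to 100
pck_rtn 0 100 73.2
```

The round-trip error is bounded by half of scale_factor, so it grows with the range of the variable. This is one reason to test a pack/unpack cycle on representative data before archiving.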
ncpdq
re-shapes variables in input-file by re-ordering
and/or reversing dimensions specified in the dimension list.
The dimension list is a whitespace-free, comma separated list of
dimension names, optionally prefixed by negative signs, that follows the
‘-a’ (or long options ‘--arrange’, ‘--permute’,
‘--re-order’, or ‘--rdr’) switch.
To re-order variables by a subset of their dimensions, specify
these dimensions in a comma-separated list following ‘-a’, e.g.,
‘-a lon,lat’.
To reverse a dimension, prefix its name with a negative sign in the
dimension list, e.g., ‘-a -lat’.
Re-ordering and reversal may be performed simultaneously, e.g.,
‘-a lon,-lat,time,-lev’.
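When dimension lists are assembled by scripts, it can help to print how each entry will be read before handing the list to ncpdq. The sketch below splits a list on commas and reports which dimensions carry the reversal prefix. The helper name `dmn_lst_prs` is hypothetical and the output wording is purely illustrative.

```shell
# Hypothetical helper, not part of NCO: describe each entry of a '-a' list
# usage: dmn_lst_prs lon,-lat,time,-lev
dmn_lst_prs() {
  printf '%s\n' "$1" | tr ',' '\n' | while IFS= read -r dmn; do
    case ${dmn} in
      -*) printf '%s: reversed\n' "${dmn#-}" ;;        # leading minus sign
      *)  printf '%s: original direction\n' "${dmn}" ;;
    esac
  done
}

dmn_lst_prs lon,-lat,time,-lev
```

The order in which the entries are printed is the order in which the dimensions appear in the re-ordered variables.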
Users may specify any permutation of dimensions, including
permutations which change the record dimension identity.
The record dimension is re-ordered like any other dimension.
This unique ncpdq
capability makes it possible to concatenate
files along any dimension.
See Concatenators ncrcat
and ncecat
for a detailed example.
The record dimension is always the most slowly varying dimension in a
record variable (see C and Fortran Index conventions).
The specified re-ordering fails if it requires creating more than
one record dimension amongst all the output variables.
Two special cases of dimension re-ordering and reversal deserve special mention. First, it may be desirable to completely reverse the storage order of a variable. To do this, include all the variable’s dimensions in the dimension re-order list in their original order, and prefix each dimension name with the negative sign. Second, it may be useful to transpose a variable’s storage order, e.g., from C to Fortran data storage order (see C and Fortran Index conventions). To do this, include all the variable’s dimensions in the dimension re-order list in reversed order. Explicit examples of these two techniques appear below.
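Both special cases amount to simple transformations of the variable's dimension list, which the following sketch builds as strings. The helper names `rdr_rvr` and `rdr_trn` are hypothetical, the dimension order time,lat,lon is an assumed example, and the commands are only printed, not run.

```shell
# Hypothetical helpers, not part of NCO. Each takes a variable's dimensions
# in their current storage order as a comma-separated list.

# rdr_rvr: same order, every dimension negated (complete storage reversal)
rdr_rvr() {
  printf '%s\n' "$1" | sed 's/^/-/; s/,/,-/g'
}

# rdr_trn: dimensions in reversed order (transpose, e.g., C to Fortran order)
rdr_trn() {
  printf '%s\n' "$1" |
  awk -F, '{ out = $NF; for (i = NF - 1; i >= 1; i--) out = out "," $i; print out }'
}

# Print (do not run) the corresponding ncpdq commands
printf 'ncpdq --rdr=%s in.nc out.nc\n' "$(rdr_rvr time,lat,lon)"
printf 'ncpdq --rdr=%s in.nc out.nc\n' "$(rdr_trn time,lat,lon)"
```

The ‘--rdr=’ form is used so that the leading negative sign of the reversal list cannot be mistaken for a new command line switch.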
EXAMPLES
Pack and unpack all variables in file in.nc and store the results in out.nc:
ncpdq in.nc out.nc # Same as ncpack in.nc out.nc
ncpdq -P all_new -M flt_sht in.nc out.nc # Defaults
ncpdq -P all_xst in.nc out.nc
ncpdq -P upk in.nc out.nc # Same as ncunpack in.nc out.nc
ncpdq -U in.nc out.nc # Same as ncunpack in.nc out.nc
The first two commands pack any unpacked variable in the input file. They also unpack and then re-pack every packed variable. The third command only packs unpacked variables in the input file. If a variable is already packed, the third command copies it unchanged to the output file. The fourth and fifth commands unpack any packed variables. If a variable is not packed, the fourth and fifth commands copy it unchanged.
The previous examples all utilized the default packing map. Suppose you wish to archive all data that are currently unpacked into a form which only preserves 256 distinct values. Then you could specify the packing map pck_map as ‘hgh_byt’ and the packing policy pck_plc as ‘all_xst’:
ncpdq -P all_xst -M hgh_byt in.nc out.nc
Many different packing maps may be used to construct a given file by performing the packing on subsets of variables (e.g., with ‘-v’) and using the append feature with ‘-A’ (see Appending Variables).
Users may wish to unpack data packed with the HDF convention, and then re-pack it with the netCDF convention so that all their datasets use the same packing convention prior to intercomparison.
# One-step procedure: For NCO 4.4.0+, netCDF 4.3.1+
# 1. Convert, unpack, and repack HDF file into netCDF file
ncpdq --hdf_upk -P xst_new modis.hdf modis.nc # HDF4 files
ncpdq --hdf_upk -P xst_new modis.h5 modis.nc # HDF5 files
# One-step procedure: For NCO 4.3.7--4.3.9
# 1. Convert, unpack, and repack HDF file into netCDF file
ncpdq --hdf4 --hdf_upk -P xst_new modis.hdf modis.nc # HDF4
ncpdq --hdf_upk -P xst_new modis.h5 modis.nc # HDF5
# Two-step procedure: For NCO 4.3.6 and earlier
# 1. Convert HDF file to netCDF file
ncl_convert2nc modis.hdf
# 2. Unpack using HDF convention and repack using netCDF convention
ncpdq --hdf_upk -P xst_new modis.nc modis.nc
NCO now
83
automatically detects HDF4 files.
In this case it produces an output file modis.nc which preserves
the HDF packing used in the input file.
The ncpdq
command first unpacks all packed variables using the
HDF unpacking algorithm (as specified by ‘--hdf_upk’),
and then repacks those same variables using the netCDF algorithm
(because that is the only algorithm NCO packs with).
As described above, the ‘-P xst_new’ packing policy only repacks
variables that are already packed.
Not-packed variables are copied directly without loss of precision
84.
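The two conventions differ in where the offset is applied. As a hedged illustration, the sketch below uses the commonly documented forms: the netCDF convention unpacks as scale_factor * packed + add_offset, whereas the HDF convention unpacks as scale_factor * (packed - add_offset). The function names are invented for this example, and the formulas should be checked against the metadata of any particular dataset.

```python
def unpack_netcdf(packed, scale_factor, add_offset):
    """netCDF convention: scale first, then add the offset."""
    return scale_factor * packed + add_offset


def unpack_hdf(packed, scale_factor, add_offset):
    """HDF convention (as commonly documented): subtract the offset, then scale."""
    return scale_factor * (packed - add_offset)


packed, scale_factor, add_offset = 100, 0.5, 10.0
as_netcdf = unpack_netcdf(packed, scale_factor, add_offset)
as_hdf = unpack_hdf(packed, scale_factor, add_offset)
print(as_netcdf, as_hdf)
```

The same stored integer and attributes yield different physical values under the two conventions, which is why data must be unpacked with the convention they were packed with before being re-packed.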
Re-order file in.nc so that the dimension lon
always
precedes the dimension lat
and store the results in
out.nc:
ncpdq -a lon,lat in.nc out.nc ncpdq -v three_dmn_var -a lon,lat in.nc out.nc
The first command re-orders every variable in the input file.
The second command extracts and re-orders only the variable
three_dmn_var
.
Suppose the dimension lat
represents latitude and monotonically
increases from south to north.
Reversing the lat
dimension means re-ordering the data so that
latitude values decrease monotonically from north to south.
Accomplish this with
% ncpdq -a -lat in.nc out.nc
% ncks --trd -C -v lat in.nc
lat[0]=-90
lat[1]=90
% ncks --trd -C -v lat out.nc
lat[0]=90
lat[1]=-90
This operation reversed the latitude dimension of all variables. Whitespace immediately preceding the negative sign that specifies dimension reversal may be dangerous. Quotes and long options can help protect negative signs that should indicate dimension reversal from being interpreted by the shell as dashes that indicate new command line switches.
ncpdq -a -lat in.nc out.nc # Dangerous? Whitespace before "-lat"
ncpdq -a '-lat' in.nc out.nc # OK. Quotes protect "-" in "-lat"
ncpdq -a lon,-lat in.nc out.nc # OK. No whitespace before "-"
ncpdq --rdr=-lat in.nc out.nc # Preferred. Uses "=" not whitespace
To create the mathematical transpose of a variable, place all its
dimensions in the dimension re-order list in reversed order.
This example creates the transpose of three_dmn_var
:
% ncpdq -a lon,lev,lat -v three_dmn_var in.nc out.nc
% ncks --trd -C -v three_dmn_var in.nc
lat[0]=-90 lev[0]=100 lon[0]=0 three_dmn_var[0]=0
lat[0]=-90 lev[0]=100 lon[1]=90 three_dmn_var[1]=1
lat[0]=-90 lev[0]=100 lon[2]=180 three_dmn_var[2]=2
...
lat[1]=90 lev[2]=1000 lon[1]=90 three_dmn_var[21]=21
lat[1]=90 lev[2]=1000 lon[2]=180 three_dmn_var[22]=22
lat[1]=90 lev[2]=1000 lon[3]=270 three_dmn_var[23]=23
% ncks --trd -C -v three_dmn_var out.nc
lon[0]=0 lev[0]=100 lat[0]=-90 three_dmn_var[0]=0
lon[0]=0 lev[0]=100 lat[1]=90 three_dmn_var[1]=12
lon[0]=0 lev[1]=500 lat[0]=-90 three_dmn_var[2]=4
...
lon[3]=270 lev[1]=500 lat[1]=90 three_dmn_var[21]=19
lon[3]=270 lev[2]=1000 lat[0]=-90 three_dmn_var[22]=11
lon[3]=270 lev[2]=1000 lat[1]=90 three_dmn_var[23]=23
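The index arithmetic behind this transpose can be checked with a short Python sketch. It assumes row-major (C-order) storage and the dimension sizes implied by the listing (lat=2, lev=3, lon=4); the helper names are hypothetical, and this is not how ncpdq itself is implemented.

```python
from itertools import product

SIZE = {"lat": 2, "lev": 3, "lon": 4}


def flat_index(idx, dims):
    """Row-major offset of the multi-index idx for the given dimension order."""
    offset = 0
    for d in dims:
        offset = offset * SIZE[d] + idx[d]
    return offset


def reorder(values, dims_in, dims_out):
    """Re-store values so that dimensions appear in the order dims_out."""
    out = [None] * len(values)
    for combo in product(*(range(SIZE[d]) for d in dims_out)):
        idx = dict(zip(dims_out, combo))
        out[flat_index(idx, dims_out)] = values[flat_index(idx, dims_in)]
    return out


three_dmn_var = list(range(24))  # value equals its original storage offset
transposed = reorder(three_dmn_var, ("lat", "lev", "lon"), ("lon", "lev", "lat"))
print(transposed[:3], transposed[-3:])
```

The first and last three elements of the re-ordered array are 0, 12, 4 and 19, 11, 23, matching the values printed for out.nc above.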
To completely reverse the storage order of a variable, include
all its dimensions in the re-order list, each prefixed by a negative
sign.
This example reverses the storage order of three_dmn_var
:
% ncpdq -a -lat,-lev,-lon -v three_dmn_var in.nc out.nc
% ncks --trd -C -v three_dmn_var in.nc
lat[0]=-90 lev[0]=100 lon[0]=0 three_dmn_var[0]=0
lat[0]=-90 lev[0]=100 lon[1]=90 three_dmn_var[1]=1
lat[0]=-90 lev[0]=100 lon[2]=180 three_dmn_var[2]=2
...
lat[1]=90 lev[2]=1000 lon[1]=90 three_dmn_var[21]=21
lat[1]=90 lev[2]=1000 lon[2]=180 three_dmn_var[22]=22
lat[1]=90 lev[2]=1000 lon[3]=270 three_dmn_var[23]=23
% ncks --trd -C -v three_dmn_var out.nc
lat[0]=90 lev[0]=1000 lon[0]=270 three_dmn_var[0]=23
lat[0]=90 lev[0]=1000 lon[1]=180 three_dmn_var[1]=22
lat[0]=90 lev[0]=1000 lon[2]=90 three_dmn_var[2]=21
...
lat[1]=-90 lev[2]=100 lon[1]=180 three_dmn_var[21]=2
lat[1]=-90 lev[2]=100 lon[2]=90 three_dmn_var[22]=1
lat[1]=-90 lev[2]=100 lon[3]=0 three_dmn_var[23]=0
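Dimension reversal can be sketched the same way. The Python fragment below is an illustration under the assumption of row-major storage with shape (lat, lev, lon) = (2, 3, 4); the function names are invented for the example. It shows that reversing every dimension is the same as reversing the flat storage order, while reversing only the first dimension swaps the two latitude slabs.

```python
from itertools import product


def offset(idx, shape):
    """Row-major offset of a multi-index."""
    off = 0
    for i, n in zip(idx, shape):
        off = off * n + i
    return off


def reverse_dims(values, shape, flags):
    """Reverse each dimension whose flag is True; dimension order is unchanged."""
    out = [None] * len(values)
    for idx in product(*(range(n) for n in shape)):
        src = tuple(n - 1 - i if f else i for i, n, f in zip(idx, shape, flags))
        out[offset(idx, shape)] = values[offset(src, shape)]
    return out


shape = (2, 3, 4)  # lat, lev, lon
data = list(range(24))
all_reversed = reverse_dims(data, shape, (True, True, True))
lat_reversed = reverse_dims(data, shape, (True, False, False))
print(all_reversed[:3], lat_reversed[:3])
```

Here all_reversed begins 23, 22, 21, as in the out.nc listing above, and lat_reversed begins with the values that were stored under lat[1] in the input.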
Creating a record dimension named, e.g., time
, in a file which
has no existing record dimension is simple with ncecat
:
ncecat -O -u time in.nc out.nc # Create degenerate record dimension named "time"
Now consider a file with all dimensions, including time
, fixed
(non-record).
Suppose the user wishes to convert time
from a fixed dimension to
a record dimension.
This may be useful, for example, when the user wishes to append
additional time slices to the data.
As of NCO version 4.0.1 (April, 2010) the preferred method for
doing this is with ncks
:
ncks -O --mk_rec_dmn time in.nc out.nc # Change "time" to record dimension
Prior to 4.0.1, the procedure to change an existing fixed dimension into
a record dimension required three separate commands,
ncecat
followed by ncpdq
, and then ncwa
.
The recommended method is now to use ‘ncks --mk_rec_dmn’, yet it
is still instructive to present the original procedure, as it shows how
multiple operators can achieve the same ends by different means:
ncecat -O in.nc out.nc # Add degenerate record dimension named "record"
ncpdq -O -a time,record out.nc out.nc # Switch "record" and "time"
ncwa -O -a record out.nc out.nc # Remove (degenerate) "record"
The first step creates a degenerate (size equals one) record dimension
named (by default) record
.
The second step swaps the ordering of the dimensions named time
and record
.
Since time
now occupies the position of the first (least rapidly
varying) dimension, it becomes the record dimension.
The dimension named record
is no longer a record dimension.
The third step averages over this degenerate record
dimension.
Averaging over a degenerate dimension does not alter the data.
The ordering of other dimensions in the file (lat
, lon
,
etc.) is immaterial to this procedure.
See ncecat
netCDF Ensemble Concatenator and
ncks
netCDF Kitchen Sink for other methods of
changing variable dimensionality, including the record dimension.
ncra netCDF Record Averager
SYNTAX
ncra [-3] [-4] [-5] [-6] [-7] [-A] [-C] [-c] [--cb y1,y2,m1,m2,tpd] [--cmp cmp_sng] [--cnk_byt sz_byt] [--cnk_csh sz_byt] [--cnk_dmn nm,sz_lmn] [--cnk_map map] [--cnk_min sz_byt] [--cnk_plc plc] [--cnk_scl sz_lmn] [-D dbg] [-d dim,[min][,[max][,[stride][,[subcycle][,[interleave]]]]] [-F] [--fl_fmt fl_fmt] [-G gpe_dsc] [-g grp[,...]] [--glb ...] [-H] [-h] [--hdf] [--hdr_pad nbr] [--hpss] [-L dfl_lvl] [-l path] [--mro] [-N] [-n loop] [--no_cll_msr] [--no_cll_mth] [--no_frm_trm] [--no_tmp_fl] [-O] [-o output-file] [-p path] [--qnt ...] [--qnt_alg alg_nm] [--prm_int] [--prw wgt_arr] [-R] [-r] [--ram_all] [--rec_apn] [--rth_dbl|flt] [-t thr_nbr] [--unn] [-v var[,...]] [-w wgt] [-X ...] [-x] [-y op_typ] [input-files] [output-file]
DESCRIPTION
ncra
computes statistics (including, though not limited to,
averages) of record variables across an arbitrary number of
input-files.
The record dimension is, by default, retained as a degenerate
(size 1) dimension in the output variables.
See Statistics vs Concatenation, for a description of the
distinctions between the various statistics tools and concatenators.
As a multi-file operator, ncra
will read the list of
input-files from stdin
if they are not specified
as positional arguments on the command line
(see Large Numbers of Files).
Input files may vary in size, but each must have a record dimension.
The record coordinate, if any, should be monotonic (or else non-fatal
warnings may be generated).
Hyperslabs of the record dimension which include more than one file
work correctly.
ncra supports the stride argument to the ‘-d’
hyperslab option (see Hyperslabs) for the record dimension only;
stride is not supported for non-record dimensions.
ncra
always averages coordinate variables (e.g.,
time
) regardless of the arithmetic operation type performed on
non-coordinate variables (see Operation Types).
As of NCO version 4.4.9, released in May, 2015,
ncra
accepts user-specified weights with the ‘-w’
(or long-option equivalent ‘--wgt’, ‘--wgt_var’,
or ‘--weight’) switch.
When no weight is specified, ncra
weights each record (e.g.,
time slice) in the input-files equally.
ncra
does not attempt to see if, say, the time
coordinate is irregularly spaced and thus would require a weighted
average in order to be a true time-average.
Specifying unequal weights is entirely the user’s responsibility.
Weights specified with ‘-w wgt’ may take one of two forms. In the first form, the ‘wgt’ argument is a comma-separated list of values by which to weight each file (recall that files may have multiple timesteps). In this form the number of weights specified must equal the number of files specified in the input file list, or else the program will exit. In the second form, the ‘wgt’ argument is the name of a weighting variable present in every input file. The variable may be a scalar or a one-dimensional record variable. Scalar weights are applied uniformly to the entire file (i.e., this produces the same arithmetic result as supplying the same value as a per-file weight option on the command-line). One-dimensional weights apply to each corresponding record (i.e., per-record weights), and are suitable for dynamically changing timesteps.
By default, any weights specified (whether by value or by variable name) are normalized to unity by dividing each specified weight by the sum of all the weights. This means, for example, that ‘-w 0.25,0.75’ is equivalent to ‘-w 2.0,6.0’ since both are equal when normalized. This behavior simplifies specifying weights based on countable items. For example, time-weighting monthly averages for March, April, and May to obtain a spring seasonal average can be done with ‘-w 31,30,31’ instead of ‘-w 0.33695652173913043478,0.32608695652173913043,0.33695652173913043478’.
However, sometimes one wishes to use weights in “dot-product mode”,
i.e., multiply by the (non-normalized) weights.
As of NCO version 4.5.2, released in July, 2015,
ncra
accepts the ‘-N’ (or long-option equivalent
‘--no_nrm_by_wgt’) switch that prevents automatic weight
normalization.
When this switch is used, the weights will not be normalized (unless the
user provides them as normalized), and the numerator of the weighted
average will not be divided by the sum of the weights (which is one for
normalized weights).
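The difference between the default normalized weighting and the non-normalized “dot-product mode” can be illustrated with plain arithmetic. The following Python sketch is not NCO code; the function name and the sample values are assumptions chosen for the example.

```python
def weighted_mean(values, weights, normalize=True):
    """Combine values with weights; normalize=False mimics dot-product mode."""
    if normalize:
        total = sum(weights)
        weights = [w / total for w in weights]
    return sum(w * v for w, v in zip(weights, values))


values = [10.0, 20.0]  # e.g., one mean value per input file
a = weighted_mean(values, [0.25, 0.75])
b = weighted_mean(values, [2.0, 6.0])
c = weighted_mean(values, [2.0, 6.0], normalize=False)

# Day counts for March, April, and May reduce to the long decimals quoted above
spring = [w / sum([31, 30, 31]) for w in [31, 30, 31]]
print(a, b, c, spring)
```

With normalization, the weights 0.25,0.75 and 2.0,6.0 give the same result (17.5 here), whereas without normalization the second set simply multiplies and sums the inputs (140.0 here).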
As of NCO version 4.9.4, released in September, 2020,
ncra
supports the ‘--per_record_weights’ (or
‘--prw’) flag to utilize the command-line weights separately
specified by ‘-w wgt_arr’ (or ‘--wgt wgt_arr’)
for per-record weights instead of per-file-weights, where
wgt_arr is a 1-D array of weights.
This is useful when computing weighted averages with cyclically
varying weights, since the weights given on the command line will be
repeated for the length of the timeseries.
Consider, for example, a CMIP6 timeseries of historical
monthly mean emissions that one wishes to convert to a timeseries of
annual-mean emissions.
One can now weight each month by its number of days via:
ncra --per_record_weights --mro -d time,,,12,12 --wgt \
  31,28,31,30,31,30,31,31,30,31,30,31 ~/monthly.nc ~/annual.nc
Note that the twelve weights will be implicitly repeated throughout the duration of the input file(s), which in this case may therefore specify an interannual monthly timeseries that is reduced to a timeseries of annual-means in the output.
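The cyclic repetition of per-record weights can be pictured with the following Python sketch. It is a simplified model, not NCO code: the function name is hypothetical, every year is assumed to have 365 days, and only complete 12-month blocks are averaged.

```python
from itertools import cycle, islice

DAYS = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def annual_means(monthly, weights=DAYS):
    """Weight each month by its day count and reduce each block to one mean."""
    n = len(weights)
    # Repeat the weights for the length of the timeseries
    per_record = list(islice(cycle(weights), len(monthly)))
    means = []
    for start in range(0, len(monthly) - n + 1, n):
        w = per_record[start:start + n]
        v = monthly[start:start + n]
        means.append(sum(wi * vi for wi, vi in zip(w, v)) / sum(w))
    return means


# Two years of monthly means: a constant year, then a year with emissions
# only in February
monthly = [1.0] * 12 + [0.0, 365.0] + [0.0] * 10
print(annual_means(monthly))
```

In the second year the February value contributes with weight 28/365, so the annual mean is 28.0 rather than the unweighted 365/12.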
Bear these exceptions in mind when weighting input:
First, ncra
only applies weights if the arithmetic operation
type is averaging (see Operation Types), i.e., for timeseries mean
and for timeseries mean absolute value.
Weights are never applied for minimization, square-roots, etc.
Second, ncra
never weights coordinate variables (e.g.,
time
) regardless of the weighting performed on non-coordinate
variables.
As of NCO version 4.9.4, released in September, 2020,
ncra
supports the ‘--promote_ints’ (or ‘--prm_ints’)
flags to output statistics of integer-valued input variables in
floating-point precision in the output file.
By default, arithmetic operators such as ncra
auto-promote
integers to double-precision prior to arithmetic, then conduct the
arithmetic, then demote the values back to integers for final output.
The final stage (demotion) of this default behavior quantizes the
mantissa of the values and prevents, e.g., retaining the statistical
means of Boolean (0 or 1-valued) input data as floating point data.
The ‘--promote_ints’ flag eliminates the demotion and causes the
statistical means of integer (NC_BYTE
, NC_SHORT
,
NC_INT
, NC_INT64
) inputs to be output as
single-precision floating point (NC_FLOAT
) variables.
This allows useful arithmetic to be performed on Boolean values stored
in the space-conserving NC_BYTE
(single-byte) format.
ncra --prm_ints in*.nc out.nc
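The effect of the final demotion step can be seen with a few Boolean values. This Python sketch is illustrative only; in particular, the use of truncation for the integer conversion is an assumption, since the text above does not state which rounding rule NCO applies.

```python
flags = [0, 1, 1, 0, 1]  # Boolean data, as might be stored in NC_BYTE

# Arithmetic is carried out in floating point.
mean = sum(float(f) for f in flags) / len(flags)

# Default behavior described above: convert the result back to the integer
# input type. int() truncates; the exact conversion rule is an assumption
# here, but any integer conversion loses the fractional part.
demoted = int(mean)

# With integer promotion the floating-point mean is retained.
promoted = mean

print(demoted, promoted)
```

The demoted result can only be 0 or 1, whereas the promoted result keeps the fraction of records (0.6) in which the flag was set.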
EXAMPLES
Average files 85.nc, 86.nc, … 89.nc along the record dimension, and store the results in 8589.nc:
ncra 85.nc 86.nc 87.nc 88.nc 89.nc 8589.nc
ncra 8[56789].nc 8589.nc
ncra -n 5,2,1 85.nc 8589.nc
These three methods produce identical answers. See Specifying Input Files, for an explanation of the distinctions between these methods.
Assume the files 85.nc, 86.nc, … 89.nc each contain a record coordinate time of length 12 defined such that the third record in 86.nc contains data from March 1986, etc. NCO knows how to hyperslab the record dimension across files. Thus, to average data from December, 1985 through February, 1986:
ncra -d time,11,13 85.nc 86.nc 87.nc 8512_8602.nc
ncra -F -d time,12,14 85.nc 86.nc 87.nc 8512_8602.nc
The file 87.nc is superfluous, but does not cause an error. The ‘-F’ turns on the Fortran (1-based) indexing convention. The following uses the stride option to average all the March temperature data from multiple input files into a single output file:
ncra -F -d time,3,,12 -v temperature 85.nc 86.nc 87.nc 858687_03.nc
See Stride, for a description of the stride argument.
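How a record hyperslab maps onto several input files can be sketched in Python. The helper names below are hypothetical and the logic is a simplified model (0-based indices, inclusive maximum, equal-length files of 12 records), not NCO's implementation.

```python
def hyperslab(start, end, stride, total):
    """0-based record indices selected by min, max (inclusive), and stride."""
    last = total - 1 if end is None else end
    return list(range(start, last + 1, stride))


def locate(global_indices, file_lengths):
    """Map global record indices to (file_number, local_index) pairs."""
    located = []
    for g in global_indices:
        first = 0
        for f, n in enumerate(file_lengths):
            if g < first + n:
                located.append((f, g - first))
                break
            first += n
    return located


lengths = [12, 12, 12]  # 85.nc, 86.nc, 87.nc
total = sum(lengths)

# -d time,11,13: December 1985 through February 1986
winter = locate(hyperslab(11, 13, 1, total), lengths)

# -F -d time,3,,12: 1-based record 3 is 0-based record 2, then every 12th
march = locate(hyperslab(3 - 1, None, 12, total), lengths)
print(winter, march)
```

The first selection spans the last record of the first file and the first two records of the second file, and never touches the third file, which is why 87.nc is superfluous there. The strided selection picks the third record of each file.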
Assume the time coordinate is incrementally numbered such that January, 1985 = 1 and December, 1989 = 60. Assuming ‘??’ only expands to the five desired files, the following averages June, 1985–June, 1989:
ncra -d time,6.,54. ??.nc 8506_8906.nc
ncra -y max -d time,6.,54. ??.nc 8506_8906.nc
The second example identifies the maximum instead of averaging. See Operation Types, for a description of all available statistical operations.
ncra
includes the powerful subcycle and multi-record output
features (see Subcycle).
This example uses these features to compute and output winter
(DJF) averages for all winter seasons beginning with year
1990 and continuing to the end of the input file:
ncra -O --mro -d time,"1990-12-01",,12,3 in.nc out.nc
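The grouping implied by a stride of 12 and a subcycle of 3 can be sketched as follows. This Python fragment is a simplified model with hypothetical names; it assumes monthly records beginning in January 1990 and keeps only complete groups, which is an assumption made for the example rather than a statement about how ncra treats a trailing partial group.

```python
def subcycle_groups(start, n_records, stride, subcycle):
    """Return lists of 0-based record indices, one list per output record."""
    groups = []
    for first in range(start, n_records, stride):
        group = list(range(first, first + subcycle))
        if group[-1] < n_records:  # assumption: keep only complete groups
            groups.append(group)
    return groups


# Monthly records for January 1990 through December 1992 (36 records);
# December 1990 is record 11 when counting from zero.
djf = subcycle_groups(start=11, n_records=36, stride=12, subcycle=3)
print(djf)
```

Each inner list holds the December, January, and February records that would be averaged into one output record under this model.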
The ‘-w wgt’ option weights input data per-file when explicit numeric weights are given on the command-line, or per-timestep when the argument is a record variable that resides in the file:
ncra -w 31,31,28 dec.nc jan.nc feb.nc out.nc # Per-file weights
ncra -w delta_t in1.nc in2.nc in3.nc out.nc # Per-timestep weights
The first example weights the input differently per-file to produce correctly weighted winter seasonal mean statistics. The second example weights the input per-timestep to produce correctly weighted mean statistics.
ncrcat netCDF Record Concatenator
SYNTAX
ncrcat [-3] [-4] [-5] [-6] [-7] [-A] [-C] [-c] [--cmp cmp_sng] [--cnk_byt sz_byt] [--cnk_csh sz_byt] [--cnk_dmn nm,sz_lmn] [--cnk_map map] [--cnk_min sz_byt] [--cnk_plc plc] [--cnk_scl sz_lmn] [-D dbg] [-d dim,[min][,[max][,[stride][,[subcycle][,[interleave]]]]] [-F] [--fl_fmt fl_fmt] [-G gpe_dsc] [-g grp[,...]] [--glb ...] [-H] [-h] [--hdr_pad nbr] [--hpss] [-L dfl_lvl] [-l path] [--md5_digest] [-n loop] [--no_cll_msr] [--no_frm_trm] [--no_tmp_fl] [-O] [-o output-file] [-p path] [--qnt ...] [--qnt_alg alg_nm] [-R] [-r] [--ram_all] [--rec_apn] [-t thr_nbr] [--unn] [-v var[,...]] [-X ...] [-x] [input-files] [output-file]
DESCRIPTION
ncrcat
concatenates record variables across an arbitrary
number of input-files.
The final record dimension length is, by default, the sum of the lengths of the
record dimensions in the input files.
See Statistics vs Concatenation, for a description of the
distinctions between the various statistics tools and concatenators.
As a multi-file operator, ncrcat
will read the list of
input-files from stdin
if they are not specified
as positional arguments on the command line
(see Large Numbers of Files).
Input files may vary in size, but each must have a record dimension.
The record coordinate, if any, should be monotonic (or else non-fatal
warnings may be generated).
Hyperslabs along the record dimension that span more than one file are
handled correctly.
ncrcat supports the stride argument to the ‘-d’
hyperslab option for the record dimension only; stride is not
supported for non-record dimensions.
Concatenating a variable packed with different scales across multiple
datasets is beyond the capabilities of ncrcat (and ncecat, the other
concatenator; see Concatenators ncrcat and ncecat).
ncrcat does not unpack data; it simply copies the data
from the input-files, and the metadata from the first
input-file, to the output-file.
This means that data compressed with a packing convention must use
the identical packing parameters (e.g., scale_factor
and
add_offset
) for a given variable across all input files.
Otherwise the concatenated dataset will not unpack correctly.
The workaround for cases where the packing parameters differ across
input-files requires three steps:
First, unpack the data using ncpdq.
Second, concatenate the unpacked data using ncrcat.
Third, re-pack the result with ncpdq.
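A small numerical example shows why mismatched packing parameters corrupt a concatenated dataset. The Python sketch below uses the netCDF unpacking convention with made-up scale_factor and add_offset values; the dictionaries and function names are hypothetical and stand in for the attributes of two input files.

```python
def pack(value, scale_factor, add_offset):
    """Store a physical value as an integer using a linear packing."""
    return round((value - add_offset) / scale_factor)


def unpack(packed, scale_factor, add_offset):
    """netCDF convention: unpacked = scale_factor * packed + add_offset."""
    return scale_factor * packed + add_offset


first_file = {"scale_factor": 0.5, "add_offset": 100.0}
second_file = {"scale_factor": 0.25, "add_offset": 50.0}

# A value of 75.0 packed with the second file's parameters
stored = pack(75.0, **second_file)

# After concatenation only the first file's metadata survive
correct = unpack(stored, **second_file)
corrupted = unpack(stored, **first_file)
print(stored, correct, corrupted)
```

The stored integers from the second file are copied verbatim, so interpreting them with the first file's attributes yields 150.0 instead of 75.0. Unpacking before concatenation avoids this.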
ncrcat
applies special rules to ARM convention time
fields (e.g., time_offset
).
See ARM Conventions for a complete description.
EXAMPLES
Concatenate files 85.nc, 86.nc, … 89.nc along the record dimension, and store the results in 8589.nc:
ncrcat 85.nc 86.nc 87.nc 88.nc 89.nc 8589.nc
ncrcat 8[56789].nc 8589.nc
ncrcat -n 5,2,1 85.nc 8589.nc
These three methods produce identical answers. See Specifying Input Files, for an explanation of the distinctions between these methods.
Assume the files 85.nc, 86.nc, … 89.nc each contain a record coordinate time of length 12 defined such that the third record in 86.nc contains data from March 1986, etc. NCO knows how to hyperslab the record dimension across files. Thus, to concatenate data from December, 1985–February, 1986:
ncrcat -d time,11,13 85.nc 86.nc 87.nc 8512_8602.nc
ncrcat -F -d time,12,14 85.nc 86.nc 87.nc 8512_8602.nc
The file 87.nc is superfluous, but does not cause an error.
When ncra and ncrcat encounter a file which does not
contain any records that meet the specified hyperslab criteria, they
disregard the file and proceed to the next file without failing.
The ‘-F’ turns on the Fortran (1-based) indexing convention.
The following uses the stride option to concatenate all the March temperature data from multiple input files into a single output file:
ncrcat -F -d time,3,,12 -v temperature 85.nc 86.nc 87.nc 858687_03.nc
See Stride, for a description of the stride argument.
Assume the time coordinate is incrementally numbered such that
January, 1985 = 1 and December, 1989 = 60.
Assuming ??
only expands to the five desired files, the following
concatenates June, 1985–June, 1989:
ncrcat -d time,6.,54. ??.nc 8506_8906.nc
ncremap netCDF Remapper
SYNTAX
ncremap [-3] [-4] [-5] [-6] [-7] [-a alg_typ] [--a2o] [--add_fll] [--alg_lst] [--area_dgn] [--cmp cmp_sng] [-D dbg_lvl] [-d dst_fl] [--d2f] [--dpt] [--dpt_fl=dpt_fl] [--dt_sng=dt_sng] [--esmf_typ=esmf_typ] [--fl_fmt=fl_fmt] [-G grd_sng] [-g grd_dst] [-I drc_in] [-i input-file] [-j job_nbr] [-L dfl_lvl] [-M] [-m map_fl] [--mpi_nbr=mpi_nbr] [--mpi_pfx=mpi_pfx] [--mpt_mss] [--msh_fl=msh_fl] [--msk_apl] [--msk_dst=msk_dst] [--msk_out=msk_out] [--msk_src=msk_src] [--mss_val=mss_val] [-n nco_opt] [--nm_dst=nm_dst] [--nm_src=nm_src] [--no_add_fll] [--no_cll_msr] [--no_frm_trm] [--no_permute] [--no_stdin] [--no_stg_grd] [-O drc_out] [-o output-file] [-P prc_typ] [-p par_typ] [--pdq=pdq_opt] [--qnt=qnt_opt] [--preserve=prs_stt] [--ps_nm=ps_nm] [-R rgr_opt] [--rgn_dst] [--rgn_src] [--rnr_thr=rnr_thr] [--rrg_bb_wesn=bb_wesn] [--rrg_dat_glb=dat_glb] [--rrg_grd_glb=grd_glb] [--rrg_grd_rgn=grd_rgn] [--rrg_rnm_sng=rnm_sng] [-s grd_src] [--sgs_frc=sgs_frc] [--sgs_msk=sgs_msk] [--sgs_nrm=sgs_nrm] [--skl=skl-file] [--stdin] [-T drc_tmp] [-t thr_nbr] [-U] [-u unq_sfx] [--ugrid=ugrid-file] [--uio] [-V rgr_var] [-v var_lst[,...]] [--version] [--vrb=vrb_lvl] [--vrt_in=vrt_fl] [--vrt_out=vrt_fl] [--vrt_nm=vrt_nm] [--vrt_ntp=vrt_ntp] [--vrt_xtr=vrt_xtr] [-W wgt_opt] [-w wgt_cmd] [-x xtn_lst[,...]] [--xcl_var] [--xtr_nsp=xtr_nsp] [--xtr_xpn=xtr_xpn] [input-files] [output-file]
DESCRIPTION
ncremap
remaps the data file(s) in input-file, in
drc_in, or piped through standard input, to the horizontal grid
specified by (in descending order of precedence) map_fl,
grd_dst, or dst_fl and stores the result in
output-file(s).
If a vertical grid vrt_fl is provided, ncremap
will
(also) vertically interpolate the input file(s) to that grid.
When no input-file is provided, ncremap
operates in
“map-only” mode where it exits after producing an annotated map-file.
ncremap
was introduced to NCO in version 4.5.4
(December, 2015).
ncremap
is a “super-operator” that orchestrates the
regridding features of several different programs including other
NCO operators.
Under the hood NCO applies pre-computed remapping weights or,
when necessary, generates and infers grids, generates remapping
weights itself or calls external programs to generate the weights,
and then applies the weights (i.e., regrids).
Unlike the rest of NCO, ncremap
and
ncclimo
are shell scripts, not compiled binaries
85.
As of NCO 4.9.2 (February, 2020), the ncclimo
and ncremap
scripts export the environment variable
HDF5_USE_FILE_LOCKING
with a value of FALSE
.
This prevents failures of these operators that can occur with some
versions of the underlying HDF library that attempt to lock files
on file systems that cannot or do not support it.
ncremap
wraps the underlying regridder (ncks
) and
external executables to produce a friendly interface to regridding.
Without any external dependencies, ncremap
applies weights
from a pre-existing map-file to a source data file to produce a
regridded dataset.
Source and destination datasets may be on any Swath, Curvilinear,
Rectangular, or Unstructured Data (SCRUD) grid.
ncremap will also use its own algorithms or, when requested,
external programs (ESMF’s ESMF_RegridWeightGen (ERWG),
MOAB’s mbconvert/mbpart/mbtempest, or
TempestRemap’s GenerateOverlapMesh/GenerateOfflineMap) to
generate weights and mapfiles.
In order to use the weight-generation options, either invoke an
internal NCO weight-generation algorithm (e.g.,
‘--alg_typ=nco’), or ensure that the desired external
weight-generation package is installed and on your $PATH
.
The recommended way to obtain ERWG is as distributed in
binary format.
Many NCO users already have NCL on their
system(s), and NCL usually comes with ERWG.
Since about June, 2016, the Conda NCO package will also
install ERWG
86.
Then be sure the directory containing the ERWG executable is
on your $PATH
before using ncremap
.
As a fallback, ERWG may also be installed from source:
https://earthsystemcog.org/projects/esmf/download_last_public.
ncremap
can also generate and utilize mapfiles created by
TempestRemap,
https://github.com/ClimateGlobalChange/tempestremap.
Until about April, 2019, TempestRemap had to be built from source
because there were no binary distributions of it.
As of NCO version 4.8.0, released in May, 2019, the
Conda NCO package automatically installs the new
TempestRemap Conda package so building from source is not necessary.
Please contact those projects for support on building and installing
their software, which makes ncremap
more functional and
user-friendly.
As of NCO version 5.0.2 from September, 2021,
ncremap
users can also use the MOAB regridding
toolchain.
MOAB and ERWG perform best in an MPI
environment.
One can easily obtain such an environment with Conda
87.
Please ensure you have the latest version of ERWG,
MOAB, and/or TempestRemap before reporting any related
problems to NCO.
As mentioned above, ncremap
orchestrates the regridding
features of several different programs.
ncremap
runs most quickly when it is supplied with a
pre-computed mapfile.
However, ncremap
will also (call other programs to) compute
mapfiles when necessary and when given sufficient grid information.
Thus it is helpful to understand when ncremap
will and will
not internally generate a mapfile.
Supplying input data files and a pre-computed mapfile without
any other grid information causes ncremap
to regrid the data
files without first pausing to internally generate a mapfile.
On the other hand, supplying any grid information (i.e., using any of
the ‘-d’, ‘-G’, ‘-g’, or ‘-s’ switches described
below), causes ncremap
to internally (re-)generate the
mapfile by combining the supplied and inferred grid information.
A generated mapfile is given a default name unless a user-specified name
is supplied with ‘-m map_fl’.
Most people ultimately use ncremap
to regrid data, yet not all
data can or should be regridded in the sense of applying a sparse-matrix
of weights to an input field to produce an output field.
Certain fields (e.g., the longitude coordinate) specify the grid.
These fields must be provided in order to compute the weights that are
used to regrid.
The regridder usually copies these fields “as is” directly into
regridded files, where they describe the destination grid, and replace
or supersede the source grid information.
Other fields are extensive grid properties (e.g., the number of cells
adjacent to a given cell) that may apply only to the source (not the
destination) grid, or be too difficult to re-compute for the destination
grid.
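For fields that are regridded, applying a sparse matrix of weights amounts to a weighted sum over source cells for each destination cell. The Python sketch below illustrates this with a triplet (row, column, weight) representation. The names and the 0-based indexing are assumptions made for the example; real map-files may index and name these arrays differently.

```python
def apply_weights(src, rows, cols, weights, n_dst):
    """Compute dst[r] += w * src[c] for every (r, c, w) triplet."""
    dst = [0.0] * n_dst
    for r, c, w in zip(rows, cols, weights):
        dst[r] += w * src[c]
    return dst


src = [1.0, 3.0]  # values on two source cells

# Destination cell 0 overlaps both source cells equally;
# destination cell 1 lies entirely within source cell 1.
rows = [0, 0, 1]
cols = [0, 1, 1]
weights = [0.5, 0.5, 1.0]

dst = apply_weights(src, rows, cols, weights, n_dst=2)
print(dst)
```

Grid-defining fields such as longitude are not passed through this operation; as described above, they are taken from the destination grid instead.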
ncremap
contains an internal database of fields that it will
not propagate or regrid.
First are variables with names identical to the coordinate names found
in an ever-growing collection of publicly available geoscience datasets
(CMIP, NASA, etc.):
area, gridcell_area, gw, LAT, lat, Latitude, latitude, nav_lat,
global_latitude0, latitude0, slat, TLAT, ULAT, XLAT, XLAT_M,
CO_Latitude, S1_Latitude, lat_bnds, lat_vertices, latt_bounds,
latu_bounds, latitude_bnds, LatitudeCornerpoints, bounds_lat,
LON, lon, Longitude, longitude, nav_lon, global_longitude0,
longitude0, slon, TLON, TLONG, ULON, ULONG, XLONG, XLONG_M,
CO_Longitude, S1_Longitude, lon_bnds, lon_vertices, lont_bounds,
lonu_bounds, longitude_bnds, LongitudeCornerpoints, bounds_lon,
and w_stag.
Files produced by MPAS models may contain these variables that
will not be regridded:
angleEdge, areaTriangle, cellsOnCell, cellsOnEdge, cellsOnVertex,
dcEdge, dvEdge, edgeMask, edgesOnCell, edgesOnEdge, edgesOnVertex,
indexToCellID, indexToEdgeID, indexToVertexID, kiteAreasOnVertex,
latCell, latEdge, latVertex, lonCell, lonEdge, lonVertex,
maxLevelEdgeTop, meshDensity, nEdgesOnCell, nEdgesOnEdge,
vertexMask, verticesOnCell, verticesOnEdge, weightsOnEdge,
xEdge, yEdge, zEdge, xVertex, yVertex, and zVertex.
Most of these fields that ncremap
will not regrid are also
fields that NCO size-and-rank-preserving operators will not
modify, as described in CF Conventions.
The following summarizes features unique to ncremap
.
Features common to many operators are described in
Shared Features.
‘-a alg_typ’ (‘--alg_typ’, ‘--algorithm’, ‘--regrid_algorithm’)
Specifies the interpolation algorithm for weight-generation for use by
ESMF_RegridWeightGen
(ERWG), MOAB,
NCO, and/or TempestRemap.
ncremap
unbundles this algorithm choice from the rest of
the weight-generator invocation syntax because users more frequently
change interpolation algorithms than other options
(that can be changed with ‘-W wgt_opt’).
ncremap
can invoke all seven ERWG weight
generation algorithms, one NCO algorithm, and eight
TempestRemap algorithms (with both TR and MOAB).
The seven ERWG weight generation algorithms are:
bilinear
(acceptable abbreviations are: esmfbilin
(preferred), bilin
, blin
, bln
),
conserve
(or esmfaave
(preferred), conservative
, cns
, c1
, or aave
),
conserve2nd
(or conservative2nd
, c2
, or c2nd
)
(NCO supports conserve2nd
as of version 4.7.4 (April, 2018)),
nearestdtos
(or nds
or dtos
or ndtos
),
neareststod
(or nsd
or stod
or nstod
),
and patch
(or pch
or patc
).
See the ERWG documentation
for detailed descriptions of ERWG algorithms.
ncremap
implements its own internal weight-generation
algorithm as of NCO version 4.8.0 (May, 2019).
The first NCO-native algorithm is a first-order conservative
algorithm ncoaave
that competes well in accuracy with similar
algorithms (e.g., ERWG’s conservative algorithm esmfaave
).
This algorithm is built-in to NCO and requires no external
software so it is NCO’s default weight generation algorithm.
It works well for everyday use.
The algorithm may also be explicitly invoked with nco_con
(or
nco_cns
, nco_conservative
, or simply nco
).
As of NCO version 4.9.4 (September, 2019) ncremap
supports a second internal weight-generation algorithm based on
inverse-distance-weighted (IDW) interpolation/extrapolation.
IDW is similar to the ERWG nearestidavg
extrapolation algorithm, and accepts the same two parameters as input:
‘--xtr_xpn xtr_xpn’ sets the (absolute value of) the
exponent used in inverse distance weighting (default is 2.0), and
‘--xtr_nsp xtr_nsp’ sets the number of source points used
in the extrapolation (default is 8).
ncremap applies NCO’s IDW to the entire
destination grid, not just to points with missing/masked values,
whereas ERWG uses distance-weighted-extrapolation
(DWE) solely for extrapolation to missing data points.
Thus NCO’s IDW is more often used as an
alternative to bilinear interpolation since it interpolates between
known regions and extrapolates to unknown regions.
ncremap --alg_typ=nco_idw -s src.nc -d dst.nc -m map.nc
ncremap -a nco_idw --xtr_xpn=1.0 -s src.nc -d dst.nc -m map.nc
ncremap -a nco_idw --xtr_nsp=1 -s src.nc -d dst.nc -m map.nc
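The roles of the exponent and the number of source points can be illustrated with a generic inverse-distance-weighting formula. This Python sketch is a textbook form of IDW written for this example, with the parameter names borrowed from the options above; it is not a transcription of NCO's algorithm, and the handling of zero distance is an assumption.

```python
def idw_weights(distances, xtr_xpn=2.0, xtr_nsp=8):
    """Normalized inverse-distance weights for the xtr_nsp nearest points."""
    order = sorted(range(len(distances)), key=lambda i: distances[i])
    nearest = order[:xtr_nsp]
    # Assumption: a coincident source point receives all of the weight
    for i in nearest:
        if distances[i] == 0.0:
            return {i: 1.0}
    raw = {i: 1.0 / distances[i] ** xtr_xpn for i in nearest}
    total = sum(raw.values())
    return {i: w / total for i, w in raw.items()}


distances = [1.0, 2.0]  # distances from one destination point to two sources
default = idw_weights(distances)                # exponent 2
linear = idw_weights(distances, xtr_xpn=1.0)    # exponent 1
single = idw_weights(distances, xtr_nsp=1)      # nearest point only
print(default, linear, single)
```

A larger exponent concentrates weight on the closest source points, and a single source point reduces the scheme to nearest-neighbor selection.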
ncremap
can invoke eight preconfigured TempestRemap
weight-generation algorithms, and one generic algorithm
(tempest
) for which users should provide their own
options.
As of NCO version 4.7.2 (January, 2018), ncremap
implemented the six E3SM-recommended TempestRemap mapping
algorithms between FV and SE flux, state, and
other variables.
ncremap
originated some (we hope) common-sense names
for these algorithms (se2fv_flx
, se2fv_stt
,
se2fv_alt
, fv2se_flx
, fv2se_stt
, and
fv2se_alt
), and also allows more mathematically precise
synonyms (shown below).
As of NCO version 4.9.0 (December, 2019), ncremap
added two further boutique mappings (fv2fv_flx
and
fv2fv_stt
).
As of NCO version 5.1.9 (November, 2023), ncremap
added support for two brand new TempestRemap bilinear interpolation
algorithms for FV grids.
These are trbilin, for traditional bilinear interpolation,
and trintbilin, for integrated bilinear or barycentric
interpolation.
As of NCO version 5.2.0 (February, 2024), ncremap
added support for trfv2
, a new second-order conservative
algorithm.
These newer TempestRemap algorithms are briefly described at
https://acme-climate.atlassian.net/wiki/spaces/DOC/pages/1217757434/Mapping+file+algorithms+and+naming+convention.
The ‘-a tempest’ algorithm can be specified with the precise
TempestRemap options as arguments to the ‘-W’ (or
‘--wgt_opt’) option.
Support for the named algorithms requires TempestRemap version 2.0.0
or later (some option combinations fail with earlier versions).
The MOAB algorithms are identical to TempestRemap algorithms
88.
Use the same algorithm names to select them.
Passing the --mpi_nbr
option to ncremap
causes it
to invoke the MOAB toolchain to compute weights for any
TempestRemap algorithm (otherwise the TR toolchain is
used).
Generate and use the recommended weights to remap fluxes from FV to FV grids, for example, with
ncremap -a traave --src_grd=src.g --dst_grd=dst.nc -m map.nc
ncremap -m map.nc in.nc out.nc
This causes ncremap
to automatically invoke TempestRemap with
the boutique options
‘--in_type fv --in_np 1 --out_type fv --out_np 1’
that are recommended by E3SM for conservative and monotone
remapping of fluxes.
Invoke MOAB to compute these weights by adding the
‘--mpi_nbr=mpi_nbr’ option:
ncremap --mpi_nbr=8 -a traave --src_grd=src.g --dst_grd=dst.nc -m map.nc
This causes ncremap
to automatically invoke multiple
components of the MOAB toolchain:
mbconvert -B -o PARALLEL=WRITE_PART -O PARALLEL=BCAST_DELETE \
  -O PARTITION=TRIVIAL -O PARALLEL_RESOLVE_SHARED_ENTS \
  "src.g" "src.h5m"
mbconvert -B -o PARALLEL=WRITE_PART -O PARALLEL=BCAST_DELETE \
  -O PARTITION=TRIVIAL -O PARALLEL_RESOLVE_SHARED_ENTS \
  "dst.nc" "dst.h5m"
mbpart 8 --zoltan RCB "src.h5m" "src_8p.h5m"
mbpart 8 --zoltan RCB --recompute_rcb_box --scale_sphere \
  --project_on_sphere 2 "dst.h5m" "dst_8p.h5m"
mpirun -n 8 mbtempest --type 5 --weights \
  --load "src_8p.h5m" --load "dst_8p.h5m" \
  --method fv --order 1 --method fv --order 1 \
  --file "map.nc"
ncatted -O --gaa <command lines> map.nc map.nc
The MOAB toolchain should produce a map-file identical, to
rounding precision, to one produced by TR.
When speed matters (i.e., large grids), and the algorithm is supported
(e.g., traave
), invoke MOAB; otherwise invoke
TR.
TempestRemap options have the following meanings:
mono
specifies a monotone remapping, i.e., one that does not
generate any new extrema in the field variables.
cgll
indicates the input or output are represented by a
continuous Galerkin method on Gauss-Lobatto-Legendre nodes.
This is appropriate for spectral element datasets.
(TempestRemap also supports, although NCO does not invoke,
the dgll
option for a discontinuous Galerkin method on
Gauss-Lobatto-Legendre nodes.)
For example, ‘-a se2fv_flx’ is equivalent to, yet simpler to remember and to invoke than
ncremap -a tempest --src_grd=se.g --dst_grd=fv.nc -m map.nc \
  -W '--in_type cgll --in_np 4 --out_type fv --mono'
Specifying ‘-a tempest’ without additional options in the ‘-W’ clause causes TempestRemap to employ defaults. The default configuration requires both input and output grids to be FV, and produces a conservative, non-monotonic mapping. The ‘-a traave’ option described below may produce more desirable results than this default for many users. Using ‘-a tempest’ alone without other options for spectral element grids will lead to undefined and likely unintentional results. In other words, ‘-a tempest’ is intended to be used in conjunction with a ‘-W’ option clause to supply your own combination of TempestRemap options that does not duplicate one of the boutique option collections that already has its own name.
The full list of supported canonical algorithm names, their synonyms,
and boutique options passed to GenerateOfflineMap
or to
mbtempest
follow.
Caveat lector:
As of September, 2021 MOAB-generated weights are only
trustworthy for the |
traave (synonyms fv2fv_flx, fv2fv_mono, conservative_monotone_fv2fv)
TR options: ‘--in_type fv --in_np 1 --out_type fv --out_np 1’
MOAB options: ‘--method fv --order 1 --method fv --order 1’
trbilin (no synonyms)
TR options: ‘--in_type fv --out_type fv --method bilin’
MOAB options: ‘--method fv --method fv --order 1 --order 1 --fvmethod bilin’
trintbilin (no synonyms)
TR options: ‘--in_type fv --out_type fv --method intbilin’
MOAB options: ‘--method fv --method fv --order 1 --order 1 --fvmethod intbilin’
trfv2 (synonyms trfvnp2)
TR options: ‘--in_type fv --in_np 2 --out_type fv --out_np 1 --method normalize’
MOAB options: ‘--method fv --order 2 --method fv --order 1 --fvmethod normalize’
se2fv_flx (synonyms mono_se2fv, conservative_monotone_se2fv)
TR options: ‘--in_type cgll --in_np 4 --out_type fv --mono’
MOAB options: ‘--method cgll --order 4 --global_id GLOBAL_DOFS --method fv --monotonic 1 --global_id GLOBAL_ID’
fv2se_flx (synonyms monotr_fv2se, conservative_monotone_fv2se)
TR options: ‘--in_type cgll --in_np 4 --out_type fv --mono’.
For fv2se_flx the weights are generated with options identical to se2fv_flx, and then the transpose of the resulting weight matrix is employed.
MOAB options: ‘--method cgll --order 4 --method fv --monotonic 1’
se2fv_stt (synonyms highorder_se2fv, accurate_conservative_nonmonotone_se2fv)
TR options: ‘--in_type cgll --in_np 4 --out_type fv’
MOAB options: ‘--method cgll --order 4 --method fv’
fv2se_stt (synonyms highorder_fv2se, accurate_conservative_nonmonotone_fv2se)
TR options: ‘--in_type fv --in_np 2 --out_type cgll --out_np 4’
MOAB options: ‘--method fv --order 2 --method cgll --order 4’
se2fv_alt (synonyms intbilin_se2fv, accurate_monotone_nonconservative_se2fv)
TR options: ‘--in_type cgll --in_np 4 --out_type fv --method mono3 --noconserve’
MOAB options: ‘--method cgll --order 4 --method fv --monotonic 3 --noconserve’
fv2se_alt (synonyms mono_fv2se, conservative_monotone_fv2se_alt)
TR options: ‘--in_type fv --in_np 1 --out_type cgll --out_np 4 --mono’
MOAB options: ‘--method fv --order 1 --method cgll --order 4 --monotonic 1’
se2se (synonyms cs2cs, conservative_monotone_se2se)
TR options: ‘--in_type cgll --in_np 4 --out_type cgll --out_np 4 --mono’
MOAB options: ‘--method cgll --order 4 --method cgll --order 4 --monotonic 1’
fv2fv (synonyms rll2rll)
TR options: ‘--in_type fv --in_np 2 --out_type fv’
MOAB options: ‘--method fv --order 2 --method fv’
fv2fv_flx (synonyms traave, fv2fv_mono, conservative_monotone_fv2fv)
TR options: ‘--in_type fv --in_np 1 --out_type fv --out_np 1’
MOAB options: ‘--method fv --order 1 --method fv --order 1’
fv2fv_stt (synonyms fv2fv_highorder, accurate_conservative_nonmonotone_fv2fv)
TR options: ‘--in_type fv --in_np 2 --out_type fv’
MOAB options: ‘--method fv --order 2 --method fv’
Thus these boutique options are specialized for SE grids with
fourth order resolution (np = 4).
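The correspondence between algorithm names and TempestRemap options listed above amounts to a lookup table. The following is a minimal sketch of such a lookup in POSIX shell, not the code that ncremap itself uses. The function name tr_opts is hypothetical, only four of the listed algorithms are included, and the final line merely prints (without running) the explicit ‘-a tempest’ form of a named algorithm.

```shell
# Hypothetical helper (not part of NCO): print the TempestRemap (TR)
# options that the list above associates with an algorithm name or synonym.
tr_opts() {
  case "$1" in
    traave|fv2fv_flx|fv2fv_mono|conservative_monotone_fv2fv)
      echo '--in_type fv --in_np 1 --out_type fv --out_np 1' ;;
    se2fv_flx|mono_se2fv|conservative_monotone_se2fv)
      echo '--in_type cgll --in_np 4 --out_type fv --mono' ;;
    se2fv_stt|highorder_se2fv|accurate_conservative_nonmonotone_se2fv)
      echo '--in_type cgll --in_np 4 --out_type fv' ;;
    trbilin)
      echo '--in_type fv --out_type fv --method bilin' ;;
    *)
      echo "tr_opts: algorithm '$1' is not in this sketch" >&2
      return 1 ;;
  esac
}

# Print (do not run) the explicit '-a tempest' equivalent of se2fv_flx
echo "ncremap -a tempest --src_grd=se.g --dst_grd=fv.nc -m map.nc -W '$(tr_opts se2fv_flx)'"
```

In practice the named algorithm (e.g., ‘-a se2fv_flx’) is the simpler choice; a lookup like this is only useful for checking which options a name stands for.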
Full documentation of the E3SM-recommended boutique options
for TempestRemap is
here
(may require E3SM-authorization to view).
Let us know if you would like other boutique TempestRemap option sets
added as canonical options for ncremap
.
--a2o, --atm2ocn, --b2l, --big2ltl, --l2s, --lrg2sml
Use one of these flags (that take no arguments) to cause TempestRemap to
generate mapping weights from a source grid that has more coverage than
the destination grid, i.e., the destination grid is a subset of the
source.
When computing the intersection of two meshes, TempestRemap uses an
algorithm (in an executable named GenerateOverlapMesh
) that
expects the mesh with less coverage to be the first grid, and the
grid with greater coverage to be the second, regardless of the mapping
direction.
By default, ncremap
supplies the source grid first and the
destination second, but this order causes GenerateOverlapMesh
(which is agnostic about ordering for grids of equal coverage)
to fail when the source grid covers regions not in the destination grid.
For example, a global atmosphere grid has more coverage than a global
ocean grid, so that remapping from atmosphere-to-ocean would require
invoking the ‘--atm2ocn’ switch:
# Use --a2o to generate weights for "big" to "little" remaps:
ncremap --a2o -a se2fv_flx --src_grd=atm_se_grd.nc \
  --dst_grd=ocn_fv_grd.nc -m atm2ocn.nc
# Otherwise, omit it:
ncremap -a fv2se_flx --src_grd=ocn_fv_grd.nc \
  --dst_grd=atm_se_grd.nc -m map.nc
ncremap -a se2fv_flx --src_grd=atm_se_grd.nc \
  --dst_grd=atm_fv_grd.nc -m map.nc
# Only necessary when generating, not applying, weights:
ncremap -m atm2ocn.nc in.nc out.nc
As shown in the second example above, remapping from global
ocean-to-atmosphere grids does not require (and should not invoke) this
switch.
The third example shows that the switch is only needed when
generating weights, not when applying them.
The switch is never needed (and is ignored) when generating weights
with ERWG (which constructs the intersection mesh with a
different algorithm than TempestRemap).
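A script that generates TempestRemap weights in both directions may want to add the switch conditionally. The sketch below assumes the caller already knows whether the source grid has more coverage than the destination grid; it does not attempt to detect coverage. The function name wgt_cmd and its first argument are hypothetical, and the function only prints the command it would run.

```shell
# Hypothetical wrapper (not part of NCO): print the weight-generation
# command, adding --a2o only when the caller states that the source grid
# has more coverage than the destination grid.
wgt_cmd() {
  src_bigger=$1; alg_typ=$2; grd_src=$3; grd_dst=$4; map_fl=$5
  a2o_flg=''
  if [ "$src_bigger" = yes ]; then a2o_flg='--a2o '; fi
  echo "ncremap ${a2o_flg}-a ${alg_typ} --src_grd=${grd_src} --dst_grd=${grd_dst} -m ${map_fl}"
}

wgt_cmd yes se2fv_flx atm_se_grd.nc ocn_fv_grd.nc atm2ocn.nc  # atmosphere-to-ocean
wgt_cmd no  fv2se_flx ocn_fv_grd.nc atm_se_grd.nc map.nc      # ocean-to-atmosphere
```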
Attempting to remap a larger source grid to a subset destination grid
without using ‘--a2o’ causes GenerateOverlapMesh
to emit
an error (and a potential workaround) like this:
....Nearest target face 130767
....ERROR: No overlapping face found
....This may be caused by mesh B being a subset of mesh A
....Try swapping order of mesh A and B, or override with \
    --allow_no_overlap
....EXCEPTION (../src/OverlapMesh.cpp, Line 1738) Exiting
The ‘--a2o’ switch and its synonyms are available in
version 4.7.3 (March, 2018) and later.
As of NCO version 4.9.9 (May, 2021), ncremap
automatically transmits the flag ‘--allow_no_overlap’ to
GenerateOverlapMesh
so that regional meshes that do
not completely overlap may be intersected.
This is thought to have no effect on global mappings.
Please let us know if these capabilities do not work for you.
--add_fll, --add_fill_value, --fll_mpt, --fill_empty
Introduced in NCO version 5.0.0 (released June, 2021),
this switch (which takes no argument) causes the regridder to add a
_FillValue
attribute to fields with empty destination cells.
The corresponding --no_add_fll
switches, introduced in
NCO version 5.1.1 (released November, 2022),
do the opposite and prevent the regridder from adding a
_FillValue
attribute to fields with empty destination cells.
Note that --add_fll
adds an explicit _FillValue
metadata
attribute to fields that lack one only if the field contains empty
destination cells (as described below).
This option, by itself, does not directly change the values contained
in empty gridcells.
There are two varieties of empty destination cells: First are those cells with no non-zero weights from the source grids. If all source grid contributions to a particular cell are zero, then no field will ever be mapped into that cell. For example, if an ocean model source grid only contains ocean gridcells (e.g., MPAS-Ocean), then all continental interior gridcells in a destination grid will be empty. The second type of empty gridcell can occur in conjunction with sub-gridscale (SGS) fractions. All destination gridcells with SGS fraction equal to zero will always be empty. For example, sea-ice models often employ time-varying SGS fractions that are zero everywhere except where/when sea ice is present. These gridcells are disjoint from continental interior gridcells whose locations can be determined by mapping weights alone.
When a contiguous geophysical field (e.g., air temperature) without a
_FillValue
is mapped to such a destination grid, the empty
destination values are normally set to zero (because no source grid
cells contribute).
However, zero is a valid value for many geophysical fields.
Use this switch to ensure that empty destination gridcells are always
set to _FillValue
.
The default _FillValue
will be used in the output file for
input fields that lack an explicit _FillValue
.
This flag has no effect on fields that have any input values equal
to an explicitly or implicitly defined _FillValue
.
The flag does affect fields that have valid input values everywhere
on the source grid, yet for some reason (e.g., disjoint grids or
zero sub-gridscale fractions) there are unmapped destination
gridcells.
ncremap ...                                   # No renormalization/masking
ncremap --sgs_frc=sgs --add_fll ...           # Mask cells missing 100%
ncremap --rnr=0.1 ...                         # Mask cells missing > 10%
ncremap --rnr=0.1 --sgs_frc=sgs ...           # Mask missing > 10%
ncremap --rnr=0.1 --sgs_frc=sgs --add_fll ... # Mask missing > 90% or sgs < 10%
ncremap -P mpas...                            # --add_fll implicit, mask where sgs=0.0
ncremap -P mpas... --no_add_fll               # --add_fll explicitly turned-off, no masking
ncremap -P mpas... --rnr=0.1                  # Mask missing > 90% or sgs < 10%
ncremap -P elm...                             # --add_fll not implicit, no masking
Note that --add_fll
is automatically triggered by
--msk_apl
to ensure that masked fields regridded with
TempestRemap-generated map-files have _FillValue
s consistent
with map-files generated by ESMF and NCO.
--alg_lst, --algorithm_list
As of NCO version 5.2.0 (February, 2024), ncremap
supports ‘--alg_lst=alg_lst’, a comma-separated list of the
algorithms that MWF-mode uses to create map-files.
The default list is
esmfaave,esmfbilin,ncoaave,ncoidw,traave,trbilin,trfv2,trintbilin
.
Each name in the list should be the primary name of an algorithm,
not a synonym.
For example, use esmfaave,traave
not
aave,fv2fv_flx
(the latter are backward-compatible synonyms
for the former).
The algorithm list must be consistent with grid-types supplied:
ESMF algorithms work with meshes in ESMF,
SCRIP, or UGRID formats.
NCO algorithms only work with meshes in SCRIP
format.
TempestRemap algorithms work with meshes in ESMF,
Exodus, SCRIP, or UGRID formats.
On output, ncremap
inserts each algorithm name into the
output map-file name in this format:
map_nm_src_to_nm_dst_alg_typ.dt_sng.nc
.
% ncremap -P mwf --alg_lst=esmfnstod,ncoaave,ncoidw,traave,trbilin \
    -s ocean.QU.240km.scrip.181106.nc -g ne11pg2.nc \
    --nm_src=QU240 --nm_dst=ne11pg2 --dt_sng=20240201
...
% ls map*
map_QU240_to_ne11pg2_esmfnstod.20240201.nc
map_QU240_to_ne11pg2_ncoaave.20240201.nc
map_QU240_to_ne11pg2_ncoidw.20240201.nc
map_QU240_to_ne11pg2_traave.20240201.nc
map_QU240_to_ne11pg2_trbilin.20240201.nc
map_ne11pg2_to_QU240_esmfnstod.20240201.nc
map_ne11pg2_to_QU240_ncoaave.20240201.nc
map_ne11pg2_to_QU240_ncoidw.20240201.nc
map_ne11pg2_to_QU240_traave.20240201.nc
map_ne11pg2_to_QU240_trbilin.20240201.nc
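To predict which map-files an MWF-mode run should leave behind (e.g., to check for missing output), the naming format can be reproduced in a few lines of shell. This is a sketch under the assumption that one file is produced per algorithm and per direction, as in the listing above. The function name mwf_map_names is hypothetical, and the date default mirrors the YYYYMMDD convention that MWF mode uses for dt_sng.

```shell
# Hypothetical helper (not part of NCO): list the names implied by the
# format map_nm_src_to_nm_dst_alg_typ.dt_sng.nc for both map directions.
mwf_map_names() {
  nm_src=$1; nm_dst=$2; alg_lst=$3
  dt_sng=${4:-$(date +%Y%m%d)}   # Default: current date as YYYYMMDD
  for alg_typ in $(echo "$alg_lst" | tr ',' ' '); do
    echo "map_${nm_src}_to_${nm_dst}_${alg_typ}.${dt_sng}.nc"
    echo "map_${nm_dst}_to_${nm_src}_${alg_typ}.${dt_sng}.nc"
  done
}

mwf_map_names QU240 ne11pg2 traave,trbilin 20240201
```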
--area_dgn, --area_diagnose, --dgn_area, --diagnose_area
Introduced in NCO version 5.0.4 (released December, 2021),
this switch (which takes no argument) causes the regridder to diagnose
(rather than copy) the area of each gridcell to an inferred grid-file.
By default, ncremap
simply copies the area variable (whose
name defaults to area
and can be explicitly specified with
‘-R '--rgr area_nm=name'’) into the grid_area
variable of
the inferred grid-file.
When --area_dgn
is invoked, ncremap
instead computes
the values of grid_area
based on the cell boundaries in the
input template file.
ncremap --area_dgn -d dst.nc -g grid.nc
Note that --area_dgn
has no effect on any mapping weights
subsequently generated from the grid-file because most
weight-generators base their weights on internally computed cell
areas (although ERWG has an option, --user_areas
, to
override this behavior).
--version, --vrs, --config, --configuration, --cnf
This switch (which takes no argument) causes the operator to print its version and configuration. This includes the copyright notice, URLs to the BSD and NCO license, directories from which the NCO scripts and binaries are running, and the locations of any separate executables that may be used by the script.
--d2f, --d2s, --dbl_flt, --dbl_sgl, --double_float
This switch (which takes no argument) demotes all double precision
non-coordinate variables to single precision.
Internally ncremap
invokes ncpdq
to apply the
dbl_flt
packing map to an intermediate version of the input
file before regridding it.
This switch has no effect on files that are not regridded.
To demote the precision in such files, use ncpdq
to apply
the dbl_flt
packing map to the file directly.
Files without any double precision fields will be unaltered.
--dbg_lvl, --dbg, --debug, --debug_level
Specifies a debugging level similar to the rest of NCO.
If dbg_lvl = 1, ncremap
prints more extensive
diagnostics of its behavior.
If dbg_lvl = 2, ncremap
prints the commands
it would execute at any higher or lower debugging level, but does
not execute these commands.
If dbg_lvl > 2, ncremap
prints the diagnostic
information, executes all commands, and passes-through the debugging
level to the regridder (ncks
) for additional diagnostics.
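The distinctive feature of these levels is that dbg_lvl = 2 acts as a dry run. The sketch below illustrates that pattern only; it is not ncremap's code. The function name run_cmd is hypothetical, and printing the command at every level of 1 or more is an assumption made for the illustration.

```shell
# Hypothetical sketch of the dry-run pattern described above:
# show the command when debugging, and execute it unless dbg_lvl is 2.
run_cmd() {
  dbg_lvl=$1; shift
  if [ "$dbg_lvl" -ge 1 ]; then
    echo "$*"     # Diagnostic: show the command (assumed for levels >= 1)
  fi
  if [ "$dbg_lvl" -ne 2 ]; then
    "$@"          # Execute at every level except 2
  fi
}

run_cmd 2 echo "regrid in.nc"   # Prints the command, does not run it
run_cmd 3 echo "regrid in.nc"   # Prints the command, then runs it
```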
--devnull=dvn_flg (--devnull, --dev_nll, --dvn_flg)
The dvn_flg controls whether ncremap
suppresses regridder output by sending it to /dev/null
.
The default value of dvn_flg is “Yes”, so that
ncremap
prints little output to the terminal.
Set dvn_flg to “No” to allow the internal regridder
executables (mainly ncks
) to send their output to the
terminal.
--dpt, --add_dpt, --depth, --add_depth
--dpt_fl, --depth_file, --mpas_fl, --mpas_depth
The ‘--dpt’ switch (which takes no argument) and the
‘--dpt_fl=dpt_fl’ option, which automatically sets the
switch and also takes a filename argument, both control the addition
of a depth coordinate to MPAS ocean datasets.
Depth is the vertical distance below sea surface and, like pressure
in the atmosphere, is an important vertical coordinate whose explicit
values are often omitted from datasets yet may be computed from other
variables (gridbox thickness, pressure difference) and grid information.
Moreover, users are often more interested in the approximate depth,
aka reference depth, of a given ocean layer independent of its
horizontal position.
To invoke either of these options first obtain and place the
add_depth.py
command on the executable path (i.e.,
$PATH
), and use ncremap --config
to verify that it is
found.
These options tell ncremap
to invoke add_depth.py
which uses the refBottomDepth
variable in the current data file
or, if specified, the dpt_fl, to create and add a depth
coordinate to the current file (before regridding).
As of NCO version 4.7.9 (February, 2019), the depth
coordinate is an approximate, one-dimensional, globally uniform
coordinate that neglects horizontal variations in depth that can occur
near strong bathymetry or under ice shelves.
Like its atmospheric counterpart in many models, the lev
pressure-coordinate, depth
is useful for plotting purposes and
global studies.
It would not be difficult to modify these options to add other depth
information based on the 3D cell-thickness field to ocean files
(please ask Charlie if interested in this).
--dst_fl, --destination_file, --tpl, --tpl_fl, --template_file, --template
Specifies a data file to serve as a template for inferring the destination grid. Currently dst_fl must be a data file (not a gridfile, SCRIP or otherwise) from which NCO can infer the destination grid. The more coordinate and boundary information and metadata the better NCO will do at inferring the grid. If dst_fl has cell boundaries then NCO will use those. If dst_fl has only cell-center coordinates (and no edges), then NCO will guess-at (for rectangular grids) or interpolate (for curvilinear grids) the edges. Unstructured grids must supply cell boundary information, as it cannot be interpolated or guessed-at. NCO only reads coordinate and grid data and metadata from dst_fl. dst_fl is not modified, and may have read-only permissions.
--dt_sng, --date_string
Specifies the date-string used in the full name of map-files created in
MWF mode.
Map-file names include, by convention, a string to indicate the
approximate date (and thus algorithm versions employed) of weight
generation.
ncremap
uses the dt_sng argument to encode the date
into output map-file names of this format:
map_nm_src_to_nm_dst_alg_typ.dt_sng.nc
.
MWF mode defaults dt_sng to the current date in
YYYYMMDD
-format.
--esmf_typ, --esmf_mth, --esmf_extrap_type, --esmf_extrap_method
Specifies the extrapolation method used to compute unmapped
destination point values with the ERWG weight generator.
Valid values, their synonyms, and their meanings are
neareststod
(synonyms stod
and nsd
) which
uses the nearest valid source value,
nearestidavg
(synonyms idavg
and id
) which
uses an inverse distance-weighted (with an exponent of xtr_xpn)
average of the nearest xtr_nsp valid source values, and
none
(synonyms nil
and nowaydude
) which forbids
extrapolation.
Default is esmf_typ = none
.
The arguments to options ‘--xtr_xpn=xtr_xpn’ (which
defaults to 2.0) and ‘--xtr_nsp=xtr_nsp’ (which defaults
to 8) set the parameters that control the extrapolation
nearestidavg
algorithm.
For more information on ERWG extrapolation, see documentation
here.
NCO supports this feature as of version 4.7.4 (April, 2018).
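For intuition about how xtr_xpn and xtr_nsp interact, the standard inverse distance-weighted average can be computed by hand: each of the nearest source values is weighted by the reciprocal of its distance raised to the exponent, and the weights are normalized. The sketch below is an illustration of that formula, not ERWG code. The function name idw_avg is hypothetical, the input is assumed to be one "distance value" pair per line, and distances are assumed to be non-zero.

```shell
# Hypothetical illustration (not ERWG code): inverse distance-weighted
# average of "distance value" pairs read from stdin, one pair per line.
# $1 is the exponent, playing the role of xtr_xpn (default 2.0).
# The number of input lines plays the role of xtr_nsp.
idw_avg() {
  awk -v p="${1:-2.0}" '
    { w = 1 / ($1 ^ p); wgt_sum += w; val_sum += w * $2 }
    END { if (NR > 0) printf "%.4f\n", val_sum / wgt_sum }'
}

# Two source points at distances 1 and 2 with values 10 and 20:
# weights are 1 and 1/4, so the average is (10 + 5) / 1.25 = 12
printf '1 10\n2 20\n' | idw_avg 2.0
```

A larger exponent makes the nearest point dominate, so the result approaches the neareststod behavior.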
--xtr_nsp, --esmf_pnt_src_nbr, --esmf_extrap_num_src_pnts
Specifies the number of source points to use in extrapolating unmapped
destination point values with the ERWG weight generator.
This option is only useful in conjunction with explicitly requested
extrapolation types esmf_typ = neareststod
and
esmf_typ = nearestidavg
.
Default is xtr_nsp = 8.
For more information on ERWG extrapolation, see documentation
here.
NCO supports this feature as of version 4.7.4 (April, 2018).
--xtr_xpn, --esmf_pnt_src_nbr, --esmf_extrap_num_src_pnts
Specifies the exponent of the inverse distance-weighting used to extrapolate unmapped
destination point values with the ERWG weight generator.
This option is only useful in conjunction with the explicitly requested
extrapolation type esmf_typ = nearestidavg
.
Default is xtr_xpn = 2.0.
For more information on ERWG extrapolation, see documentation
here.
NCO supports this feature as of version 4.7.4 (April, 2018).
--grd_dst, --grid_dest, --dest_grid, --destination_grid
Specifies the destination gridfile. An existing gridfile may be in any format accepted by the weight generator. NCO will use ERWG or TempestRemap to combine grd_dst with a source gridfile (either inferred from input-file, supplied with ‘-s grd_src’, or generated from ‘-G grd_sng’) to produce remapping weights. When grd_dst is used as input, it is not modified, and may have read-only permissions. When grd_dst is inferred from input-file or created from grd_sng, it will be generated in SCRIP format.
As of NCO version 4.6.8 (August, 2017), ncremap
supports most of the file format options that the rest of NCO
has long supported (see File Formats and Conversion).
This includes short flags (e.g., ‘-4’) and key-value options (e.g.,
‘--fl_fmt=netcdf4’) though not long-flags without values
(e.g., ‘--netcdf4’).
However, ncremap
can only apply the full suite of file format
options to files that it creates, i.e., regridded files.
The weight generators (ERWG and TempestRemap) are limited
in the file formats that they read and write.
Currently (August, 2017), ERWG supports CLASSIC
,
64BIT_OFFSET
, and NETCDF4
, while TempestRemap
supports only CLASSIC
.
These can of course be converted to other formats using ncks
(see File Formats and Conversion).
However, map-files produced in other non-CLASSIC
formats
can remap significantly larger grids than CLASSIC
-format
map-files.
--grd_sng, --grid_generation, --grid_gen, --grid_string
Specifies, together with other options, a source gridfile to
create.
ncremap
creates the gridfile in SCRIP format by
default, and then, should the requisite options for regridding
be present, combines that with the destination grid (either
inferred from input-file or supplied with
‘-g grd_dst’) and generates mapping weights.
Manual grid-file generation is not frequently used since
ncremap
can infer many grids directly from the
input-file, and few users wish to keep track of SCRIP
grids when they can be easily regenerated as intermediate files.
This option also allows one to visually tune a grid by rapidly
generating candidates and inspecting the results.
If a desired grid-file is unavailable, and no dataset on that grid is
available (so inferral cannot be used), then one must manually create
a new grid.
Users create new grids for many reasons including dataset
intercomparisons, regional studies, and fine-tuned graphics.
NCO and ncremap
support manual generation of the
most common rectangular grids as SCRIP-format grid-files.
Create a grid by supplying ncremap
with a grid-file name and
“grid-formula” (grd_sng) that contains, at a minimum, the
grid-resolution.
The grid-formula is a single hash-separated string of name-value pairs
each representing a grid parameter.
All parameters except grid resolution have reasonable defaults, so a
grid-formula can be as simple as ‘latlon=180,360’:
ncremap -g grd.nc -G latlon=180,360
The SCRIP-format grid-file grd.nc is a valid source
or destination grid for ncremap
and other regridders.
Grid-file generation documentation in the NCO Users Guide at
http://nco.sf.net/nco.html#grid describes all the grid
parameters and contains many examples.
Note that the examples in this section use grid generation
API for ncremap
version 4.7.6 (August, 2018) and
later.
Earlier versions can use the ncks
API explained
at Grid Generation in the Users Guide.
The most useful grid parameters (besides resolution) are latitude type
(lat_typ), longitude type (lon_typ), title (ttl),
and, for regional grids, the SNWE bounding box
(snwe).
The three supported varieties of global rectangular grids are
Uniform/equiangular (lat_typ=uni
), Cap/FV
(lat_typ=cap
), and Gaussian
(lat_typ=gss
).
The four supported varieties of longitude types are the first
(westernmost) gridcell centered at Greenwich
(lon_typ=grn_ctr
), western edge at Greenwich
(lon_typ=grn_wst
), or at the Dateline
(lon_typ=180_ctr
and
lon_typ=180_wst
, respectively).
Grids are global, uniform, and have their first longitude centered at
Greenwich by default.
The grid-formula for this is ‘lat_typ=uni#lon_typ=grn_ctr’.
Some examples (remember, this API requires NCO
4.7.6+):
ncremap -g grd.nc -G latlon=180,360 # 1x1 Uniform grid
ncremap -g grd.nc -G latlon=180,360#lat_drc=n2s # 1x1 Uniform grid, N->S not S->N
ncremap -g grd.nc -G latlon=180,360#lon_typ=grn_wst # 1x1 Uniform grid, Greenwich-west edge
ncremap -g grd.nc -G latlon=129,256#lat_typ=cap # 1.4x1.4 FV grid
ncremap -g grd.nc -G latlon=94,192#lat_typ=gss # T62 Gaussian grid
ncremap -g grd.nc -G latlon=361,576#lat_typ=cap#lon_typ=180_ctr # MERRA2 FV grid
ncremap -g grd.nc -G latlon=94,192#lat_typ=gss#lat_drc=n2s # NCEP2 T62 Gaussian grid
Regional grids are a powerful tool in regional process analyses, and can be much smaller in size than global datasets. Regional grids are always uniform. Specify the rectangular bounding box, i.e., the outside edges of the region, in SNWE order:
ncremap -g grd.nc -G ttl="Equi-Angular 1x1 Greenland grid"#latlon=30,90#snwe=55.0,85.0,-90.0,0.0
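Because regional grids are uniform, the cell size follows directly from the bounding box and the latlon counts, which makes it easy to check a grid-formula before generating the grid. The sketch below is only arithmetic and does not invoke ncremap. The function name rgn_res is hypothetical; for the Greenland example above it prints "1 1", i.e., a 1x1 degree grid.

```shell
# Hypothetical helper (not part of NCO): cell size in degrees
# (latitude then longitude) of a uniform regional grid, from
# latlon=lat_nbr,lon_nbr and snwe=S,N,W,E.
rgn_res() {
  awk -v lat_nbr="$1" -v lon_nbr="$2" -v s="$3" -v n="$4" -v w="$5" -v e="$6" \
    'BEGIN { printf "%g %g\n", (n - s) / lat_nbr, (e - w) / lon_nbr }'
}

rgn_res 30 90 55.0 85.0 -90.0 0.0   # Greenland example above
```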
The grd_sng argument to ‘-G’ or ‘--grd_sng’ must be
a single hash-separated string of name-value pairs, e.g.,
latlon=....#lat_typ=...#ttl="My title"
.
ncremap
will not correctly parse any other format, such
as multiple separate name-value pairs without hashes.
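When a script assembles the grid-formula from separate parameters, joining them with hashes in one place avoids the parsing problem just described. A minimal sketch follows; the function name grd_sng_join is hypothetical, and the last line prints the resulting command rather than running it.

```shell
# Hypothetical helper (not part of NCO): join name=value pairs into the
# single hash-separated grid-formula expected by -G (--grd_sng).
grd_sng_join() {
  grd_sng=''
  for pair in "$@"; do
    grd_sng="${grd_sng:+${grd_sng}#}${pair}"   # Prepend '#' except for first pair
  done
  printf '%s\n' "$grd_sng"
}

grd_sng=$(grd_sng_join latlon=94,192 lat_typ=gss lat_drc=n2s)
echo "ncremap -g grd.nc -G '${grd_sng}'"
```

Quoting the assembled string when passing it to ‘-G’ keeps the shell from splitting titles that contain spaces.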
--in_drc, --drc_in, --dir_in, --in_dir, --input
Specifies the input directory, i.e., the directory which contains
the input file(s).
If in_fl is also specified, then the input filepath is
constructed by appending a slash and the filename to the directory:
‘in_drc/in_fl’.
Specifying in_drc without in_fl causes ncremap
to attempt to remap every file in in_drc that ends with one of
these suffixes: .nc
, .nc3
, .nc4
, .nc5
,
.nc6
, .nc7
, .cdf
,
.hdf
, .he5
, or .h5
.
When multiple files are regridded, each output file takes the name
of the corresponding input file.
There is no namespace conflict because the input and output files are in
separate directories.
Note that ncremap
can instead accept a list of input files
through standard input (e.g., ‘ls *.nc | ncremap ...’) or as
positional command-line arguments (e.g., ‘ncremap in1.nc in2.nc ...’).
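To preview which files in a directory would be picked up when in_drc is given without in_fl, the suffix list above can be applied with a shell pattern match. This is a sketch of the selection rule only, not of ncremap's implementation. The function name has_remap_sfx is hypothetical, and in_drc defaults to the current directory here purely for illustration.

```shell
# Hypothetical predicate (not part of NCO): does a filename end with one
# of the suffixes that ncremap looks for in in_drc?
has_remap_sfx() {
  case "$1" in
    *.nc|*.nc3|*.nc4|*.nc5|*.nc6|*.nc7|*.cdf|*.hdf|*.he5|*.h5) return 0 ;;
    *) return 1 ;;
  esac
}

# List candidate input files in a directory (in_drc is a placeholder)
in_drc=${in_drc:-.}
for fl in "$in_drc"/*; do
  if has_remap_sfx "$fl"; then echo "$fl"; fi
done
```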
--in_fl, --in_file, --input_file
Specifies the file containing data on the source grid to be remapped
to the destination grid.
When provided with the optional map_fl, ncremap
only reads data from in_fl in order to regrid it.
Without the optional map_fl or src_grd, ncremap
will try to infer the source grid from in_fl, and so must read
coordinate and metadata information from in_fl.
In this case the more coordinate and boundary information and metadata,
the better NCO will do at inferring the source grid.
If in_fl has cell boundaries then NCO will use those.
If in_fl has only cell-center coordinates (and no edges),
then NCO will guess (for rectangular grids) or interpolate
(for curvilinear grids) the edges.
Unstructured grids must supply cell boundary information, as it cannot
be interpolated or guessed-at.
in_fl is not modified, and may have read-only permissions.
Note that ncremap
can instead accept input file name(s)
through standard input (e.g., ‘ls *.nc | ncremap ...’) or as
positional command-line arguments (e.g.,
‘ncremap in1.nc in2.nc ...’).
When one or three-or-more positional arguments are given, they are
all interpreted as input filename(s).
Two positional arguments are interpreted as a single input-file
and its corresponding output-file.
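The positional-argument rule is easy to misread, so here it is restated as a small function. This is a sketch of the rule as described above, not ncremap's parser, and the function name psn_role is hypothetical.

```shell
# Hypothetical sketch of the rule above: exactly two positional arguments
# mean "input output"; any other count means every argument is an input.
psn_role() {
  if [ "$#" -eq 2 ]; then
    echo "in_fl=$1 out_fl=$2"
  else
    for fl in "$@"; do echo "in_fl=$fl"; done
  fi
}

psn_role in.nc out.nc           # One input and its output
psn_role in1.nc in2.nc in3.nc   # Three inputs
```

The practical consequence is that exactly two input files cannot be supplied positionally; pipe them through standard input instead.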
--job_nbr, --job_number, --jobs
Specifies the number of simultaneous regridding processes to spawn
during parallel execution for both Background and MPI modes.
In both parallel modes ncremap
spawns processes in batches
of job_nbr jobs, then waits for those processes to complete.
Once a batch finishes, ncremap
spawns the next batch.
In Background mode, all jobs are spawned to the local node.
In MPI mode, all jobs are spawned in round-robin fashion
to all available nodes until job_nbr jobs are running.
If regridding consumes so much RAM (e.g., because variables are large and/or the number of threads is large) that a single node can perform only one regridding job at a time, then a reasonable value for job_nbr is the number of nodes, node_nbr. Often, however, nodes can regrid multiple files simultaneously. It can be more efficient to spawn multiple jobs per node than to increase the threading per job because I/O contention for write access to a single file prevents threading from scaling indefinitely.
By default job_nbr = 2 in Background mode, and job_nbr = node_nbr in MPI mode. This helps prevent users from overloading nodes with too many jobs. Subject to the availability of adequate RAM, expand the number of jobs per node by increasing job_nbr until, ideally, each core on the node is used. Remember that processes and threading are multiplicative in core use. Four jobs with four threads each consume sixteen cores.
As an example, consider regridding 100 files with a single map.
Say you have a five-node cluster, and each node has 16 cores
and can simultaneously regrid two files using eight threads each.
(One needs to test a bit to optimize these parameters.)
Then an optimal (in terms of wallclock time) invocation would
request five nodes with 10 simultaneous jobs of eight threads.
On PBS or SLURM batch systems this would involve a
scheduler command like ‘qsub -l nodes=5 ...’ or
‘sbatch --nodes=5 ...’, respectively, followed by
‘ncremap --par_typ=mpi --job_nbr=10 --thr_nbr=8 ...’.
This job will likely complete between five and ten-times faster than a
serial-mode invocation of ncremap
to regrid the same files.
The uncertainty range is due to unforeseeable, system-dependent
load and I/O characteristics.
Nodes that can simultaneously write to more than one file fare better
with multiple jobs per node.
Nodes with only one I/O channel to disk may be better exploited
by utilizing more threads per process.
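The arithmetic behind the example above can be checked before submitting a job. The sketch below uses the numbers from that example; the variable names other than job_nbr, thr_nbr, and node_nbr are hypothetical, and the batch count assumes that ncremap waits for each batch of job_nbr jobs to finish before starting the next.

```shell
# Values from the example above: 100 files, five 16-core nodes,
# 10 simultaneous jobs of eight threads each
fl_nbr=100; node_nbr=5; core_per_node=16; job_nbr=10; thr_nbr=8

core_use=$(( job_nbr * thr_nbr ))                  # Cores in use at once
core_avl=$(( node_nbr * core_per_node ))           # Cores available
btc_nbr=$(( (fl_nbr + job_nbr - 1) / job_nbr ))    # Batches, rounded up

echo "cores used/available: ${core_use}/${core_avl}, batches: ${btc_nbr}"
```

Here the 80 requested cores match the 80 available, and the 100 files are processed in 10 batches.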
--mlt_map, --multimap, --no_multimap, --nomultimap
ncremap
assumes that every input file is on a unique grid
unless a source gridfile is specified (with ‘-s grd_src’)
or multiple-mapfile generation is explicitly turned-off (with
‘-M’).
The ‘-M’ switch is a toggle; it neither requires nor accepts an argument.
Toggling ‘-M’ tells ncremap
to generate at most one
mapfile regardless of the number of input files.
If ‘-M’ is not toggled (and neither
‘-m map_fl’ nor ‘-s grd_src’ is invoked)
then ncremap
will generate a new mapfile for each input file.
Generating new mapfiles for each input file is necessary for processing
batches of data on different grids (e.g., swath-like data), and slow,
tedious, and unnecessary when batch processing data on the same grids.
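The rule for how many mapfiles a batch run can generate may be summarized as follows. This is a sketch of the rule as stated above, not ncremap's logic; the function name map_nbr_max and its yes/no arguments are hypothetical, and treating ‘-m’ or ‘-s’ as capping the count at one is an interpretation of the text.

```shell
# Hypothetical sketch of the rule above: upper bound on the number of
# mapfiles generated for in_nbr input files.
# Arguments: in_nbr, then yes/no for whether -m, -s, and -M were given.
map_nbr_max() {
  in_nbr=$1; use_m=$2; use_s=$3; use_M=$4
  if [ "$use_m" = yes ] || [ "$use_s" = yes ] || [ "$use_M" = yes ]; then
    echo 1            # At most one mapfile, reused for every input file
  else
    echo "$in_nbr"    # One new mapfile per input file
  fi
}

map_nbr_max 12 no no no    # e.g., swath-like data, each file on its own grid
map_nbr_max 12 no no yes   # -M toggled: same grid for all files
```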
--map_fl, --map, --map_file, --rgr_map, --regrid_map
Specifies a mapfile (i.e., weight-file) to remap the source to
destination grid.
If map_fl is specified in conjunction with any of the ‘-d’,
‘-G’, ‘-g’, or ‘-s’ switches, then ncremap
will name the internally generated mapfile map_fl.
Otherwise (i.e., if none of the source-grid switches are used),
ncremap
assumes that map_fl is a pre-computed mapfile.
In that case, the map_fl must be in SCRIP format,
although it may have been produced by any application (usually
ERWG or TempestRemap).
If map_fl has only cell-center coordinates (and no edges),
then NCO will guess-at or interpolate the edges.
If map_fl has cell boundaries then NCO will use those.
A pre-computed map_fl is not modified, and may have read-only
permissions.
The user will be prompted to confirm if a newly generated map-file
named map_fl would overwrite an existing file.
ncremap adds provenance information to any newly generated
map-file whose name was specified with ‘-m map_fl’.
This provenance includes a history attribute that contains
the command invoking ncremap, and the map-generating command
invoked by ncremap.
--mpi_pfx=mpi_pfx (--mpi_pfx, --mpi_prefix, --srun_cmd, --srun_command)
--mpi_nbr=mpi_nbr (--mpi_nbr, --mpi_number, --tsk_nbr, --task_number)
The ‘--mpi_pfx=mpi_pfx’ option specifies an appropriate job
scheduler prefix for MPI-enabled weight-generation
executables such as ESMF’s ESMF_RegridWeightGen and MOAB’s mbtempest.
Other weight generators (ncks, GenerateOfflineMap)
are unaffected by this option since they are not MPI-enabled.
mpi_pfx defaults to mpirun -n ${mpi_nbr} on all
machines except those whose $HOSTNAME matches an internal
database of DOE-operated supercomputers where
mpi_pfx usually defaults to srun -n ${mpi_nbr}.
When invoking ‘--mpi_pfx’, be sure to explicitly define the
number of MPI tasks-per-node, e.g.,
ncremap --mpi_pfx='srun -n 16' ...
ncremap --mpi_pfx='srun --mpi=pmi2 -n 4' ...
The separate ‘--mpi_nbr=mpi_nbr’ option specifies the
number of tasks-per-node that MPI-enabled weight generators
will request.
It preserves the default job scheduler prefix (srun or mpirun):
ncremap --mpi_nbr=4 ... # 4 MPI tasks-per-node for ERWG/mbtempest
ncremap --mpi_nbr=16 ... # 16 MPI tasks-per-node for ERWG/mbtempest
Thus ‘--mpi_nbr=mpi_nbr’ can be used to create
host-independent ncremap
commands to facilitate benchmarking
the scaling of weight-generators across hosts that work with the
default value of mpi_pfx.
The ‘--mpi_pfx’ option will prevail and ‘--mpi_nbr’ will be
ignored if both are used in the same ncremap invocation.
Note that ‘mpi_pfx’ is only used internally by ncremap
to exploit the MPI capabilities of select weight-generators.
It is not used to control and does not affect the distribution of
multiple ncremap commands among a cluster of nodes.
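The precedence between the two options can be sketched in shell. This is only an illustration: the helper name mpi_launch_prefix is hypothetical (it is not part of ncremap), and the default launcher is passed in as an argument rather than looked up from $HOSTNAME.

```shell
# Hypothetical helper, not ncremap source code: choose the launcher
# prefix for an MPI-enabled weight generator.
# $1 = mpi_pfx (may be empty), $2 = mpi_nbr, $3 = default launcher (mpirun or srun)
mpi_launch_prefix() {
  mpi_pfx="$1"
  mpi_nbr="$2"
  launcher="${3:-mpirun}"
  if [ -n "$mpi_pfx" ]; then
    # An explicit --mpi_pfx prevails and --mpi_nbr is ignored
    printf '%s\n' "$mpi_pfx"
  else
    # Otherwise keep the default prefix and substitute the task count
    printf '%s -n %s\n' "$launcher" "$mpi_nbr"
  fi
}

mpi_launch_prefix '' 4                       # default prefix with mpi_nbr=4
mpi_launch_prefix 'srun --mpi=pmi2 -n 4' 16  # explicit prefix wins over mpi_nbr
```

The second call shows why combining both options is pointless: the task count embedded in mpi_pfx is the one that takes effect.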
--msh_fl=msh_fl (--msh_fl, --msh, --mesh, --mesh_file)
Specifies a meshfile (aka intersection mesh, aka overlap mesh) that
stores the grid formed by the intersection of the source and
destination grids.
If not specified then ncremap
will name any internally
generated meshfile with a temporary name and delete the file prior
to exiting.
NCO and TempestRemap support archiving the meshfile,
and ERWG does not.
NCO stores the meshfile in SCRIP format, while
TempestRemap stores it in Exodus format (with a ‘.g’ suffix).
ncremap adds provenance information to any newly generated
mesh-file whose name was specified with ‘--msh_fl=msh_fl’.
This provenance includes a history attribute that contains
the command invoking ncremap, and the map-generating command
invoked by ncremap.
--mpt_mss, --sgs_zro_mss, --empty_missing
Introduced in NCO version 5.1.9 (released November, 2023),
this switch (which takes no argument) causes the regridder to set
empty SGS gridcells to the missing value.
Note that this switch works only in limited circumstances.
First, it only applies to fields for which a valid sub-gridscale
(SGS) distribution has been supplied.
Second, it only applies to fields which have no missing values.
The primary usage of this switch is for sea-ice model datasets.
These datasets tend to be archived with an SGS fraction
that is non-zero only when and where sea ice is present.
The datasets also tend to be written with valid data throughout the
ocean domain, regardless of whether sea-ice is present.
Most sea-ice fields are usually zero in open-ocean areas (where
sgs_frc = 0.0), and non-zero where sea-ice exists.
The --mpt_mss switch causes the regridder to set the open-ocean
regions to the missing value.
# Set open-ocean regions to missing values (not 0.0) in sea-ice output
ncremap --mpt_mss -P mpasseaice -m map.nc in.nc out.nc
--msk_apl, --mask_apply, --msk_app
Introduced in NCO version 5.0.0 (released June, 2021),
this switch (which takes no argument) causes the regridder to
apply msk_out (i.e., mask_b) to variables after
regridding.
Some weight generators (e.g., TempestRemap) ignore masks and thus
produce non-zero weights for masked destination cells, and/or from
masked source cells.
This flag causes regridded files produced with such map-files to
adhere to the destination mask rules (though source mask rules may
still be violated).
This feature is especially useful in placing missing values (aka,
_FillValue) in destination cells that should be empty, so that
regridded files have _FillValue distributions identical with
output from other weight-generators such as ESMF and
NCO.
ncremap --msk_apl -v FLNS -m map.nc in.nc out.nc
ncremap --msk_apl --add_fll -v FLNS -m map.nc in.nc out.nc # Equivalent
By itself, --msk_apl would only mask cells based on the
mask_b field in the map-file.
This is conceptually independent of the actual intersection mesh.
However, --msk_apl automatically triggers --add_fll,
which also masks fields based on the computed intersection mesh
(i.e., frac_b).
This combination ensures that masked fields regridded with
TempestRemap-generated map-files have _FillValues consistent
with map-files generated by ESMF and NCO.
--msk_dst, --dst_msk, --mask_destination, --mask_dst
Specifies a template variable to use for the integer mask of the
destination grid when inferring grid files and/or creating
map-files (i.e., generating weights).
Any variable on the same horizontal grid as a data file can serve as a
mask template for that grid.
The mask will be one (i.e., gridcells will participate in regridding)
where msk_dst has valid, non-zero values in the data file from which
NCO infers the destination grid.
The mask will be zero (i.e., gridcells will not participate in
regridding) where msk_dst has a missing value or is zero.
A typical example of this option would be to use Sea-surface Temperature
(SST) as a template variable for an ocean mask because
SST is often defined only over ocean, and missing values
might denote locations to which regridded quantities should never be
placed.
The special value msk_dst = none prevents the
regridder from inferring and treating any variable (even one named,
e.g., mask) in a source file as a mask variable.
This guarantees that all points in the inferred destination grid will be
unmasked.
msk_dst, msk_out, and msk_src are related yet distinct:
msk_dst is the mask template variable in the destination file (whose grid will be inferred),
msk_out is the name to give the destination mask (usually mask_b
in the map-file) in regridded data files, and
msk_src is the mask template variable in the source file (whose grid will be inferred).
msk_src and msk_dst only affect inferred grid files for
the source and destination grids, respectively, whereas msk_out
only affects regridded files.
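The rule that turns a template variable into a 0/1 mask can be illustrated with a few lines of shell and awk. This is a sketch of the rule as stated above, not the NCO implementation; the function name mask_from_template and the sample missing value -999 are assumptions made for the example.

```shell
# Illustrative only: derive an integer mask from template values read
# one per line on standard input.
# $1 = missing value of the template variable (assumed known here)
mask_from_template() {
  awk -v mss="$1" '{
    # Mask is 0 where the template is missing or zero, 1 elsewhere
    print (($1 == mss || $1 + 0 == 0) ? 0 : 1)
  }'
}

# Four sample SST-like values: valid, zero, missing, valid
printf '%s\n' 271.3 0 -999 288.1 | mask_from_template -999
```

Gridcells that receive a mask of one participate in regridding; those that receive zero do not.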
--msk_out, --out_msk, --mask_destination, --mask_out
Use of this option tells ncremap to include a variable named
msk_out in any regridded file.
The variable msk_out will contain the integer-valued
regridding mask on the destination grid.
The mask will be one (i.e., fields may have valid values in this
gridcell) or zero (i.e., fields will have missing values in this
gridcell).
By default, ncremap does not output the destination mask to
the regridded file.
This option changes that default behavior and causes ncremap
to ingest the default destination mask variable contained in the
map-file.
ERWG generates SCRIP-format map-files that contain
the destination mask in the variable named mask_b.
SCRIP generates map-files that contain the destination mask in
the variable named dst_grid_imask.
The msk_out option works with map-files that adhere to either of
these conventions.
Tempest generates map-files that do not typically contain the
destination mask, and so the msk_out option has no effect on
files that Tempest regrids.
msk_dst, msk_out, and msk_src are related yet distinct:
msk_dst is the mask template variable in the destination file (whose grid will be inferred),
msk_out is the name to give the destination mask (usually mask_b
in the map-file) in regridded data files, and
msk_src is the mask template variable in the source file (whose grid will be inferred).
msk_src and msk_dst only affect inferred grid files for
the source and destination grids, respectively, whereas msk_out
only affects regridded files.
--msk_src, --src_msk, --mask_source, --mask_src
Specifies a template variable to use for the integer mask of the
source grid when inferring grid files and/or creating
map-files (i.e., generating weights).
Any variable on the same horizontal grid as a data file can serve as a
mask template for that grid.
The mask will be one (i.e., gridcells will participate in regridding)
where msk_src has valid, non-zero values in the data file from which
NCO infers the source grid.
The mask will be zero (i.e., gridcells will not participate in
regridding) where msk_src has a missing value or is zero.
A typical example of this option would be to use Sea-surface Temperature
(SST) as a template variable for an ocean mask because SST is often
defined only over ocean, and missing values might denote locations from
which regridded quantities should never emanate.
The special value msk_src = none prevents the
regridder from inferring and treating any variable (even one named,
e.g., mask) in a source file as a mask variable.
This guarantees that all points in the inferred source grid will be
unmasked.
msk_dst, msk_out, and msk_src are related yet distinct:
msk_dst is the mask template variable in the destination file (whose grid will be inferred),
msk_out is the name to give the destination mask (usually mask_b
in the map-file) in regridded data files, and
msk_src is the mask template variable in the source file (whose grid will be inferred).
msk_src and msk_dst only affect inferred grid files for
the source and destination grids, respectively, whereas msk_out
only affects regridded files.
--mss_val, --fll_val, --missing_value, --fill_value
Specifies the numeric value that indicates missing data when processing
MPAS datasets, i.e., when ‘-P mpas’ is invoked.
The default missing value is -9.99999979021476795361e+33, which is
correct for the MPAS ocean and sea-ice models.
Currently (January, 2018) the MPAS land-ice model uses
-1.0e36 for missing values.
Hence this option is usually invoked as ‘--mss_val=-1.0e36’ to
facilitate processing of MPAS land-ice datasets.
--nco_opt, --nco_options, --nco
Specifies a string of options to pass-through unaltered to ncks.
nco_opt defaults to ‘-O --no_tmp_fl’.
--nm_dst, --name_dst, --name_short_destination, --nm_sht_dst
Specifies the short name for the destination grid to use in the full
name of map-files created in MWF mode.
Map-file names include, by convention, shortened versions of both the
source and destination grids.
ncremap uses the nm_dst argument to encode the
destination grid name into the output map-file name of this format:
map_nm_src_to_nm_dst_alg_typ.dt_sng.nc.
MWF mode requires this argument; there is no default.
--nm_src, --name_src, --name_short_source, --nm_sht_src
Specifies the short name for the source grid to use in the full
name of map-files created in MWF mode.
Map-file names include, by convention, shortened versions of both the
source and destination grids.
ncremap uses the nm_src argument to encode the
source grid name into the output map-file name of this format:
map_nm_src_to_nm_dst_alg_typ.dt_sng.nc.
MWF mode requires this argument; there is no default.
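The naming convention can be made concrete with a small shell function. This is a sketch under stated assumptions: mwf_map_name is a hypothetical helper, and the algorithm tag aave is used purely as an illustrative value of alg_typ.

```shell
# Hypothetical helper: assemble a map-file name from its four parts,
# following the format map_nm_src_to_nm_dst_alg_typ.dt_sng.nc
# $1 = nm_src, $2 = nm_dst, $3 = alg_typ, $4 = dt_sng
mwf_map_name() {
  printf 'map_%s_to_%s_%s.%s.nc\n' "$1" "$2" "$3" "$4"
}

# The alg_typ value "aave" is an assumption chosen for illustration
mwf_map_name oRRS30to10 T62 aave 20180901
```

Keeping the short names free of underscores and dots makes the resulting filenames easier to parse later.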
--no_cll_msr, --no_cll, --no_cell_measures, --no_area
This switch (which takes no argument) controls whether ncclimo
and ncremap add measures variables to the extraction list
along with the primary variable and other associated variables.
See CF Conventions for a detailed description.
--no_frm_trm, --no_frm, --no_formula_terms
This switch (which takes no argument) controls whether ncclimo
and ncremap add formula variables to the extraction list along
with the primary variable and other associated variables.
See CF Conventions for a detailed description.
--no_stg_grd, --no_stg, --no_stagger, --no_staggered_grid
This switch (which takes no argument) controls whether
regridded output will contain the staggered grid coordinates
slat, slon, and w_stag (see Regridding).
Originally, the staggered grid was output for all files regridded from
a Cap (aka FV) grid, except when the regridding was performed
as part of splitting (reshaping) into timeseries.
As of (roughly, I forget) NCO version 4.9.4, released in
July, 2020, outputting the staggered grid information is turned-off
for all workflows and must be proactively turned-on (with
--stg_grd).
Thus the --no_stg_grd switch is obsolete and is intended
only to preserve backward-compatibility of previous workflows.
--out_drc, --drc_out, --dir_out, --out_dir, --output
Specifies the output directory, i.e., the directory name to contain
the output file(s).
If out_fl is also specified, then the output filepath is
constructed by appending a slash and the filename to the directory:
‘out_drc/out_fl’.
Specifying out_drc without out_fl causes ncremap
to name each output file the same as the corresponding input file.
There is no namespace conflict because the input and output files will
be in separate directories.
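The two naming rules above amount to a simple path construction, sketched here in shell. The function out_path is hypothetical and only mirrors the documented behavior; it does not call ncremap.

```shell
# Hypothetical helper: compute the output filepath for one input file.
# $1 = input file, $2 = out_drc, $3 = out_fl (optional)
out_path() {
  in_fl="$1"
  out_drc="$2"
  out_fl="$3"
  if [ -n "$out_fl" ]; then
    # out_drc and out_fl both given: join them with a slash
    printf '%s/%s\n' "$out_drc" "$out_fl"
  else
    # out_drc alone: reuse the name of the input file
    printf '%s/%s\n' "$out_drc" "$(basename "$in_fl")"
  fi
}

out_path /data/raw/jan.nc /data/rgr          # /data/rgr/jan.nc
out_path /data/raw/jan.nc /data/rgr out.nc   # /data/rgr/out.nc
```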
--out_fl, --output_file, --out_file
Specifies the output filename, i.e., the name of the file to contain the data from in_fl remapped to the destination grid. If out_fl already exists it will be overwritten. Specifying out_fl when there are multiple input files (i.e., from using ‘-I in_drc’ or standard input) generates an error (output files will be named the same as input files). Two positional arguments are interpreted as a single input-file and its corresponding output-file.
--prc_typ, --prm_typ, --procedure
Specifies the procedure type (i.e., the special processing mode) desired.
As of NCO version 4.5.5 (February, 2016), one can tell
ncremap to invoke special processing procedures for different
types of input data.
For instance, ncremap can automatically permute the dimensions in the data file
prior to regridding for a limited (though growing) number of data-file
types that encounter the ncremap limitation concerning
dimension ordering.
Valid procedure types include ‘airs’ for NASA AIRS satellite data,
‘eam’ or ‘cam’ for DOE EAM and NCAR CAM model data,
‘eamxx’ for DOE EAMxx (aka, SCREAM) model data,
‘elm’ or ‘clm’ for DOE ELM and NCAR CLM model data,
‘cice’ for CICE ice model data (must be on 2D grids),
‘cism’ for NCAR CISM land ice model data,
‘mpasa’ or ‘mpasatmosphere’ for MPAS atmosphere model data,
‘mpascice’, ‘mpasseaice’, or ‘mpassi’ for MPAS sea-ice model data,
‘mpaso’ or ‘mpasocean’ for MPAS ocean model data,
‘mod04’ for Level 2 MODIS MOD04 product,
‘mwf’ for making all weight-files for a pair of grids,
‘sgs’ for datasets containing sub-gridscale (SGS) data
(such as CLM/CTSM/ELM land model data and
CICE/MPAS-Seaice sea-ice model data),
and ‘nil’ (for none).
The default prc_typ is ‘nil’, which means ncremap
does not perform any special procedures prior to regridding.
The AIRS procedure calls ncpdq to permute dimensions
from their order in the input file to this order:
StdPressureLev,GeoTrack,GeoXTrack.
The ELM, CLM, and CICE procedures set
idiosyncratic model values and then invoke the Sub-gridscale
(SGS) procedure (see below).
The MOD04 procedure unpacks input data.
The EAMxx procedures permute input data dimensions into this
order prior to horizontal regridding:
time,lwband,swband,ilev,lev,plev,cosp_tau,cosp_cth,cosp_prs,dim2,ncol,
and cause the vertical interpolation routine to look for surface
pressure under the name ps instead of PS.
The MPAS procedures permute input data dimensions into this
order:
Time,depth,nVertInterfaces,nVertLevels,nVertLevelsP1,nZBGCTracers,nBioLayersP1,nAlgaeIceLayers,nDisIronIceLayers,nIceLayers,maxEdges,MaxEdges2,nCategories,R3,ONE,TWO,FOUR,nEdges,nCells,
and invoke renormalization.
An MPAS dataset that contains any other dimensions will fail
to regrid until/unless those dimensions are added to the
ncremap dimension permutation option.
MWF-mode:
As mentioned above in other options, ncremap
includes an
MWF-mode (for “Make All Weight Files”) that generates and
names, with one command and in a self-consistent manner, all
combinations of (for instance, E3SM or CESM)
global atmosphere<->ocean maps with both ERWG and Tempest.
MWF-mode automates the laborious and error-prone process of
generating numerous map-files with various switches.
Its chief use occurs when developing and testing new global grid-pairs
for the E3SM atmosphere and ocean components.
Invoke MWF-mode with a number of specialized options to
control the naming of the output map-files:
ncremap -P mwf -s grd_ocn -g grd_atm --nm_src=ocn_nm \
  --nm_dst=atm_nm --dt_sng=date
where grd_ocn is the "global" ocean grid, grd_atm is the global atmosphere grid, nm_src sets the shortened name for the source (ocean) grid as it will appear in the output map-files, nm_dst sets, similarly, the shortened name for the destination (atmosphere) grid, and dt_sng sets the date-stamp in the output map-file name map_${nm_src}_to_${nm_dst}_${alg_typ}.${dt_sng}.nc. Setting nm_src, nm_dst, and dt_sng is optional though highly recommended. For example,
ncremap -P mwf -s ocean.RRS.30-10km_scrip_150722.nc \
  -g t62_SCRIP.20150901.nc --nm_src=oRRS30to10 --nm_dst=T62 \
  --dt_sng=20180901
produces the 10 ERWG map-files:
The ordering of source and destination grids is immaterial for
ERWG maps since MWF-mode produces all map
combinations.
However, as described above in the TempestRemap section, the Tempest
overlap-mesh generator must be called with the smaller grid preceding
the larger grid.
For this reason, always invoke MWF-mode with the smaller
grid (i.e., the ocean) as the source, otherwise some Tempest map-files
will fail to generate.
The six optimized SE<->FV Tempest maps described
above in the TempestRemap section will be generated when the
destination grid has a ‘.g’ suffix which ncremap
interprets as indicating an Exodus-format SE grid (NB: this
assumption is an implementation convenience that can be modified if
necessary).
For example,
ncremap -P mwf -s ocean.RRS.30-10km_scrip_150722.nc -g ne30.g \
  --nm_src=oRRS30to10 --nm_dst=ne30np4 --dt_sng=20180901
produces the 6 TempestRemap map-files:
MWF-mode takes significant time to complete (~20 minutes on
my MacBookPro) for the above grids.
To accelerate this, consider installing the MPI-enabled
instead of the serial version of ERWG.
Then use the ‘--wgt_cmd’ option to tell ncremap the
MPI configuration to invoke ERWG with, for
example:
ncremap -P mwf --wgt_cmd='mpirun -np 12 ESMF_RegridWeightGen' \
  -s ocean.RRS.30-10km_scrip_150722.nc -g t62_SCRIP.20150901.nc \
  --nm_src=oRRS30to10 --nm_dst=T62 --dt_sng=20180901
Background and distributed node parallelism (as described above in the
Parallelism section) of MWF-mode are possible though not
yet implemented.
Please let us know if this feature is desired.
RRG-mode:
EAM and CAM-SE will produce regional output if
requested with the finclNlonlat namelist parameter.
Output for a single region can be higher temporal resolution than the
host global simulation.
This facilitates detailed yet economical regional process studies.
Regional output files are in a special format that we call
RRG (for “regional regridding”).
An RRG file may contain any number of rectangular regions.
However, ncremap
can process only one region per invocation
(change the argument to the ‘--rnm_sng’ option, described below,
in each invocation).
The coordinates and variables for one region do not interfere with
other (possibly overlapping) regions because all variables and
dimensions are named with a per-region suffix string, e.g.,
lat_128e_to_134e_9s_to_16s.
ncremap can easily regrid RRG output from an
atmospheric FV-dycore because ncremap can infer
(as discussed above) the regional grid from any rectangular
FV data file.
Regridding regional SE data, however, is more complex
because SE gridcells are essentially weights without
vertices and SE weight-generators are not yet flexible
enough to output regional weights.
To summarize, regridding RRG data leads to three
SE-specific difficulties (#1–3 below) and two difficulties
(#4–5) shared with FV RRG files:
ncremap’s RRG mode resolves these issues to allow
trouble-free regridding of SE RRG files. The user
must provide two additional input arguments,
‘--dat_glb=dat_glb’ (or synonyms ‘--rrg_dat_glb’,
‘--data_global’, or ‘--global_data’) and
‘--grd_glb=grd_glb’ (or synonyms ‘--rrg_grd_glb’,
‘--grid_global’, or ‘--global_grid’) that point to a global
SE dataset and grid, respectively, of the same resolution as
the model that generated the RRG datasets.
Hence a typical RRG regridding invocation is:
ncremap --dat_glb=dat_glb.nc --grd_glb=grd_glb.nc -g grd_rgn.nc \
  dat_rgn.nc dat_rgr.nc
Here grd_rgn.nc is a regional destination grid-file,
dat_rgn.nc is the RRG file to regrid, and
dat_rgr.nc is the regridded output.
Typically grd_rgn.nc is a uniform rectangular grid covering the
same region as the RRG file.
Generate this as described in the last example in the section that
describes Manual Grid-file Generation with the ‘-G’ option.
grd_glb.nc is the standard dual-grid grid-file for the
SE resolution, e.g.,
ne30np4_pentagons.091226.nc.
ncremap regrids the
global data file dat_glb.nc to the global dual-grid in order to
produce an intermediate global file annotated with gridcell
vertices.
Then it hyperslabs the lat/lon coordinates (and vertices) from the
regional domain to use with regridding the RRG file.
A dat_glb.nc file with only one 2D field suffices (and is
fastest) for producing the information needed by the RRG
procedure.
One can prepare an optimal dat_glb.nc file by subsetting any 2D
variable from any full global SE output dataset with,
e.g., ‘ncks -v FSNT in.nc dat_glb.nc’.
ncremap RRG mode supports two additional options
to override internal parameters.
First, the per-region suffix string
may be set with ‘--rnm_sng=rnm_sng’ (or synonyms
‘--rrg_rnm_sng’ or ‘--rename_string’).
RRG mode will, by default, regrid the first region it finds
in an RRG file.
Explicitly set the desired region with rnm_sng for files with
multiple regions, e.g., ‘--rnm_sng=_128e_to_134e_9s_to_16s’.
Second, the regional bounding-box may be explicitly set with
‘--bb_wesn=lon_wst,lon_est,lat_sth,lat_nrt’.
The normal parsing of the bounding-box string from the suffix string
may fail in (as yet undiscovered) corner cases, and the
‘--bb_wesn’ option provides a workaround should that occur.
The bounding-box string must include the entire RRG region
(not a subset thereof), specified in WESN order.
The two override options may be used independently or together, as in:
ncremap --rnm_sng='_128e_to_134e_9s_to_16s' --bb_wesn='128,134,-16,-9' \
  --dat_glb=dat_glb.nc --grd_glb=grd_glb.nc -g grd_rgn.nc \
  dat_rgn.nc dat_rgr.nc
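The relationship between the suffix string and the WESN bounding-box in the example above can be sketched as follows. This is not the parser that ncremap uses; bb_from_suffix is a hypothetical helper, and treating western longitudes as negative values is an assumption made only for this illustration.

```shell
# Hypothetical helper: turn a per-region suffix such as
# _128e_to_134e_9s_to_16s into a WESN string lon_wst,lon_est,lat_sth,lat_nrt
bb_from_suffix() {
  printf '%s\n' "$1" | awk -F'_' '
    # Convert a token like 16s or 128e to a signed number
    # (assumption: s and w hemispheres map to negative values)
    function sgn(tok,    hms, val) {
      hms = substr(tok, length(tok))
      val = substr(tok, 1, length(tok) - 1) + 0
      return (hms == "w" || hms == "s") ? -val : val
    }
    {
      lon_wst = sgn($2); lon_est = sgn($4)
      lat_a = sgn($5); lat_b = sgn($7)
      # Order the two latitudes so that south precedes north
      lat_sth = (lat_a < lat_b) ? lat_a : lat_b
      lat_nrt = (lat_a < lat_b) ? lat_b : lat_a
      printf "%g,%g,%g,%g\n", lon_wst, lon_est, lat_sth, lat_nrt
    }'
}

bb_from_suffix _128e_to_134e_9s_to_16s
```

For the suffix used in the example above this reproduces the string passed to ‘--bb_wesn’, which is a convenient check when setting the override by hand.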
RRG-mode supports most normal ncremap options,
including input and output methods and regridding algorithms.
However, RRG-mode is not widely used and, as of 20240529,
has not been parallelized like the rest of ncremap.
SGS-mode:
ncremap has a sub-gridscale (SGS) mode that
performs the special pre-processing and weighting necessary
to conserve fields that represent fractional spatial portions of a
gridcell, and/or fractional temporal periods of the analysis.
Spatial fields output by most geophysical models are intensive,
and so by default the regridder attempts to conserve the integral of the
area times the field value such that the integral is equal on source and
destination grids.
However some models (like ELM, CLM,
CICE, and MPAS-Seaice) output gridcell values
intended to apply to only a fraction sgs_frc (for
“sub-gridscale fraction”) of the gridcell.
The sub-gridscale (SGS) fraction usually changes spatially
with the distribution of land and ocean, and spatiotemporally with
the distribution of sea ice and possibly vegetation.
For concreteness consider a sub-grid field that represents the land
fraction.
Land fraction is less than one in gridcells that resolve coastlines or
islands.
ELM and CLM happily output temperature values
valid only for a small (i.e., sgs_frc << 1) island within
the larger gridcell.
Model architecture dictates this behavior and savvy researchers expect
it.
The goal of the NCO weight-application algorithm is to treat
SGS fields as seamlessly as possible so that those less
familiar with sub-gridscale models can easily regrid them correctly.
Fortunately, models like ELM and CLM that run on the same horizontal grid as the overlying atmosphere can use the same mapping-file as the atmosphere, so long as the SGS weight-application procedure is invoked. Not invoking an SGS-aware weight application algorithm is equivalent to assuming sgs_frc = 1 everywhere. Regridding sub-grid values correctly versus incorrectly (e.g., with and without SGS-mode) alters global-mean answers for land-based quantities by about 1% for horizontal grid resolutions of about one degree. The resulting biases are in intricately shaped regions (coasts, lakes, sea-ice floes) and so are easy to overlook.
To invoke SGS mode and correctly regrid sub-gridscale data,
specify the names of the fractional area sgs_frc and, if
applicable, the mask variable sgs_msk (strictly, this is only
necessary if these names differ from their
respective defaults landfrac and landmask).
Trouble will ensue if sgs_frc is a percentage or an absolute
area rather than a fractional area (between zero and one).
ncremap must know the normalization factor sgs_nrm by
which sgs_frc must be divided (not multiplied) to obtain
a true, normalized fraction.
Datasets (such as those from CICE) that store sgs_frc
in percent should specify the option
‘--sgs_nrm=100’ to instruct ncremap to normalize the
sub-grid area appropriately before regridding.
ncremap will re-derive sgs_msk based on the regridded
values of sgs_frc: sgs_msk = 1 is assigned to
destination gridcells with sgs_frc > 0.0, and
sgs_msk = 0 to all others.
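The normalization and mask rules can be checked by hand with a short awk filter. This is a sketch of the arithmetic only: sgs_normalize is a hypothetical helper, and it applies the mask rule to whatever values it is given, whereas ncremap applies it to the regridded sgs_frc.

```shell
# Illustrative only: divide each sgs_frc value (one per line on standard
# input) by sgs_nrm, then derive the mask from the normalized fraction.
# $1 = sgs_nrm (e.g., 100 for fractions stored in percent)
sgs_normalize() {
  awk -v nrm="$1" '{
    frc = $1 / nrm                      # divide, do not multiply
    msk = (frc > 0.0) ? 1 : 0           # 1 where any sub-grid area exists
    printf "%g %d\n", frc, msk
  }'
}

# CICE-like ice concentrations in percent: none, partial, full
printf '%s\n' 0 25 100 | sgs_normalize 100
```

Each output line holds the normalized fraction followed by the derived mask.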
As of NCO version 4.6.8 (released June, 2017), invoking any
of the options ‘--sgs_frc’, ‘--sgs_msk’, or ‘--sgs_nrm’,
automatically triggers SGS-mode, so that also invoking
‘-P sgs’ is redundant though legal.
As of NCO version 4.9.0 (released December, 2019), the values
of the sgs_frc and sgs_msk variables should be explicitly
specified.
In previous versions they defaulted to landfrac and
landmask, respectively, when ‘-P sgs’ was selected.
This behavior still exists but will likely be deprecated in a future
version.
The area and sgs_frc fields in the regridded file will be
in units of steradians and fraction, respectively.
However, ncremap offers custom options to reproduce the
idiosyncratic data and metadata format of two particular models,
ELM and CICE.
When invoked with ‘-P elm’ (or ‘-P clm’), a final step
converts the output area from steradians to square kilometers.
When invoked with ‘-P cice’, the final step converts the output
area from steradians to square meters, and the output
sgs_frc from a fraction to a percent.
# ELM/CLM: output "area" in [sr]
ncremap --sgs_frc=landfrac --sgs_msk=landmask in.nc out.nc
ncremap -P sgs in.nc out.nc
# ELM/CLM pedantic format: output "area" in [km2]
ncremap -P elm in.nc out.nc # Same as -P clm, alm, ctsm
# CICE: output "area" in [sr]
ncremap --sgs_frc=aice --sgs_msk=tmask --sgs_nrm=100 in.nc out.nc
# CICE pedantic format: output "area" in [m2], "aice" in [%]
ncremap -P cice in.nc out.nc
# MPAS-Seaice: both commands are equivalent
ncremap -P mpasseaice in.nc out.nc
ncremap --sgs_frc=timeMonthly_avg_iceAreaCell in.nc out.nc
It is sometimes convenient to store the sgs_frc field in an external file from the field(s) to be regridded. For example, CMIP-style timeseries are often written with only one variable per file. NCO supports this organization by accepting sgs_frc arguments in the form of a filename followed by a slash and then a variable name:
ncremap --sgs_frc=sgs_landfrac_ne30.nc/landfrac -m map.nc in.nc out.nc
This feature is most useful for datasets whose sgs_frc field is
time-invariant, as is usually the case for land models.
This is because a single sgs_frc location (e.g.,
r05.nc/landfrac) can be used for all files of the same
resolution.
Time-varying sgs_frc fields (e.g., for sea-ice models) change
with the same frequency as the simulation output.
Thus fields associated with time-varying sgs_frc must be
regridded “timestep-by-timestep”, i.e., with a separate
ncremap invocation for each snapshot of sgs_frc.
Of course, ncrcat can later concatenate these separate
regriddings back into a single, regridded
timeseries.
Files regridded using explicitly specified SGS options will
differ slightly from those regridded using the ‘-P elm’ or
‘-P cice’ options.
The former will have an area field in steradians, the generic
units used internally by the regridder.
The latter produces model-specific area fields in square
kilometers (for ELM) or square meters (for CICE),
as expected in the raw output from these two models.
To convert from angular to areal values, NCO assumes a
spherical Earth with radius 6,371,220 m or 6,371,229 m,
for ELM and CICE, respectively.
The output sgs_frc field is expressed as a decimal fraction in all
cases except for ‘-P cice’ which stores the fraction in percent.
Thus the generic SGS and model-specific convenience options
produce equivalent results, and the latter is intended to be
indistinguishable (in terms of metadata and units) from raw model
output.
This makes it more interoperable with many existing analysis scripts.
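The angular-to-areal conversion is the product of the solid angle and the squared radius, which a short awk call can reproduce. Only the two radii come from the text above; the function name sr_to_area and its calling convention are assumptions made for illustration.

```shell
# Illustrative only: convert a gridcell area from steradians to areal units.
# $1 = area in steradians, $2 = Earth radius in meters,
# $3 = square meters per output unit (1000000 for km2, 1 for m2)
sr_to_area() {
  awk -v sr="$1" -v rds="$2" -v scl="$3" \
    'BEGIN { printf "%.6e\n", sr * rds * rds / scl }'
}

sr_to_area 0.0001 6371220 1000000   # ELM convention: result in km2
sr_to_area 0.0001 6371229 1         # CICE convention: result in m2
```

The nine-meter difference between the two radii changes areas by only a few parts per million, though it matters when comparing against raw model output bit-for-bit.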
--par_typ, --par_md, --parallel_type, --parallel_mode, --parallel
Specifies the desired file-level parallelism mode, either Background,
MPI, or Serial.
File-level parallelism accelerates throughput when regridding multiple
files in one ncremap
invocation, and has no effect when only
one file is to be regridded.
Note that the ncclimo and ncremap semantics for
selecting file-level parallelism are identical, though their defaults
differ (Background mode for ncclimo and Serial mode for
ncremap).
Select the desired mode with the argument to
‘--par_typ=par_typ’.
Explicitly select Background mode with par_typ values of
bck, background, or Background.
The values mpi or MPI select MPI mode, and
the values srl, serial, Serial, nil, or
none select Serial mode (which disables file-level
parallelism, though still allows intra-file OpenMP parallelism).
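The accepted spellings map onto the three modes as in this shell sketch. The function par_mode is hypothetical and simply restates the list above as a case statement; how ncremap treats unrecognized values is not covered here.

```shell
# Hypothetical helper: translate a par_typ value into its mode name
par_mode() {
  case "$1" in
    bck|background|Background) echo Background ;;
    mpi|MPI) echo MPI ;;
    srl|serial|Serial|nil|none) echo Serial ;;
    *) echo "unrecognized par_typ: $1" >&2; return 1 ;;
  esac
}

par_mode bck    # Background
par_mode MPI    # MPI
par_mode none   # Serial
```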
The default file-level parallelism for ncremap is Serial
mode (i.e., no file-level parallelism), in which ncremap
processes one input file at a time.
Background and MPI modes implement true file-level
parallelism.
Typically both these parallel modes scale well with sufficient
memory unless and until I/O contention becomes the bottleneck.
In Background mode ncremap issues all commands to regrid
the input file list as UNIX background processes on the
local node.
Nodes with multiple cores and sufficient RAM take advantage
of this to simultaneously regrid multiple files.
In MPI mode ncremap issues commands to regrid
the input file list in round-robin fashion to all available compute
nodes.
Prior to NCO version 4.9.0 (released December, 2019),
Background and MPI parallelism modes both regridded all
the input files at one time and there was no way to limit the number
of files being simultaneously regridded.
Subsequent versions allow finer grained parallelism by introducing
the ability to limit the number of discrete workflow elements or
“jobs” (i.e., file regriddings) to perform simultaneously within an
ncremap invocation or “workflow”.
As of NCO version 4.9.0 (released December, 2019),
the ‘--job_nbr=job_nbr’ option specifies the maximum number
of files to regrid simultaneously on all nodes being harnessed by the
workflow.
Thus job_nbr is an additional parameter to fine-tune file level
parallelism (it has no effect in Serial mode).
Please see the ncremap
job_nbr documentation for more
details.
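The relationship between the input file list, job_nbr, and round-robin
node assignment can be pictured with a short Python sketch.
It is an illustration only: the function names, the file names, and the
batch-then-wait scheduling are assumptions made for this example, not a
transcription of the ncremap source.

```python
def split_into_batches(fl_lst, job_nbr):
    """Group the file list into consecutive batches of at most job_nbr files.

    Assumption: each batch runs to completion before the next one starts.
    """
    if job_nbr < 1:
        raise ValueError("job_nbr must be a positive integer")
    return [fl_lst[idx:idx + job_nbr] for idx in range(0, len(fl_lst), job_nbr)]


def assign_round_robin(fl_lst, nd_nbr):
    """Map each file to a node index in round-robin order (hypothetical layout)."""
    if nd_nbr < 1:
        raise ValueError("nd_nbr must be a positive integer")
    return {fl: idx % nd_nbr for idx, fl in enumerate(fl_lst)}


# Hypothetical five-file workflow limited to two simultaneous jobs on three nodes
fl_lst = ["in_01.nc", "in_02.nc", "in_03.nc", "in_04.nc", "in_05.nc"]
batches = split_into_batches(fl_lst, job_nbr=2)
nodes = assign_round_robin(fl_lst, nd_nbr=3)
```

With these inputs the five files fall into three batches (two, two, and
one file), and the fourth file wraps around to the first node.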
--pdq, --prm_opt, --prm, --permute
Specifies the dimension permutation option used by ncpdq prior to
regridding.
Synonyms include ‘--pdq’, ‘--prm’, ‘--prm_opt’, and ‘--permute’.
Files to be regridded must have their horizontal spatial dimension(s)
in the last (most-rapidly-varying) position.
Most data files store variables with dimensions arranged in this
order, and ncremap internally sets the permutation option for datasets
known (via the --prc_typ option) to require permutation.
Use ‘--permute=pdq_opt’ to override the internally preset defaults.
This is useful when regridding files that contain new dimensions that
ncremap has not encountered before.
For example, if a development version of an MPAS model inserts a new
dimension new_dim after the horizontal spatial dimension nCells in
some variables, that would prevent the regridder from working because
the horizontal dimension(s) must be the last dimension(s).
The workaround is to tell ncremap which permutation option to pass to
ncpdq in order to place the horizontal spatial dimension(s) at the end
of all variables:
ncremap --permute=Time,new_dim,nCells --map=map.nc in.nc out.nc
ncremap --permute=time,new_dim,lat,lon --map=map.nc in.nc out.nc
The API for this option changed in NCO version 5.0.4 (released
December, 2021).
Prior to this, the option argument needed to include the entire option
string to be passed to ncpdq including the ‘-a’, e.g.,
‘--permute='-a time,new_dim,lat,lon'’.
Now ncremap supplies the implicit ‘-a’ internally so the user does
not need to know the ncpdq syntax.
--no_permute, --no_prm, --no_pdq, --no_ncpdq
Introduced in NCO version 5.0.0 (released June, 2021), this switch
(which takes no argument) causes the regridder to skip the default
permutation of dimensions before regridding datasets (notably
MPAS) known to store data with non-horizontal most-rapidly-varying
dimensions.
ncremap normally ensures that input fields are stored in the shape
expected by regridder weights (horizontal dimensions last) by
permuting the dimensions with ncpdq.
However, permutation consumes time and generates an extra intermediate
file.
Avoid this time penalty by using the ‘--no_permute’ flag if the
input fields are known to already have trailing horizontal
dimensions.
--preserve, --prs_stt, --preserve_statistic
This is a simple, intuitive option to specify how weight application should treat destination gridcells that are not completely overlapped by source gridcells with valid values. Destination gridcells that are completely overlapped by valid source values are unaffected. The two statistics that can be preserved for incompletely overlapped gridcells are the local mean and/or the global integral of the source values. Hence the valid values for this option are ‘integral’ (and, as of NCO version 5.0.2, released in September, 2021, its synonyms ‘global’ and ‘global_integral’) and ‘mean’ (or its synonyms ‘local’, ‘local_mean’, ‘gridcell’, ‘gridcell_mean’, ‘natural_values’). NCO version 5.1.5, released in March, 2023, fixed a longstanding problem with the implementation of this option, which had essentially been broken since its inception. The option finally works as documented.
Specifying --preserve=integral sets the destination gridcell equal to
the sum of the valid source values times their source weights.
This sum is not renormalized by the (valid) fractional area covered.
This is exactly equivalent to setting --rnr=off, i.e., no
renormalization (see Regridding).
If the weights were generated by a conservative algorithm then the
output will be conservative, and will conserve the global integral of
the input field in all cases.
This is often desired for regridding quantities that should be
conserved, e.g., fluxes, and is the default weight application method
in ncremap (except in MPAS-mode).
Specifying --preserve=mean sets the destination gridcell equal to the
mean of the valid source values times their source weights.
This is exactly equivalent to setting --rnr=0.0, i.e., renormalizing
the integral value by the (valid) fractional area covered (see
Regridding).
This is often desired for regridding state variables, e.g.,
temperature, though it is not the default behavior and must be
explicitly requested (except in MPAS-mode).
These two types of preserved statistics, integral and mean, produce
identical output in all gridcells where there are no missing data,
i.e., where valid data completely tile the gridcell.
By extension, these two statistics produce identical global means
if valid data completely tile the sphere.
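The difference between the two preserved statistics is easiest to see
for a single destination gridcell.
The following Python sketch is a simplified illustration, not the
regridder's code: the function name is hypothetical, missing source
values are marked with None, and the weights are assumed to be the
fractional overlaps of each source cell with the destination cell.

```python
def apply_weights(values, weights, preserve="integral"):
    """Combine source values into one destination gridcell value.

    values:  source values overlapping the destination cell (None = missing)
    weights: fractional overlap of each source cell (sum to 1 when fully tiled)
    """
    valid = [(val, wgt) for val, wgt in zip(values, weights) if val is not None]
    if not valid:
        return None  # No valid source data: destination is missing
    wgt_sum = sum(wgt for _, wgt in valid)          # Valid fractional area
    integral = sum(val * wgt for val, wgt in valid)  # Weighted sum of valid values
    if preserve == "integral":
        return integral            # Not renormalized (like --rnr=off)
    if preserve == "mean":
        return integral / wgt_sum  # Renormalized by valid area (like --rnr=0.0)
    raise ValueError("preserve must be 'integral' or 'mean'")


# Destination cell tiled by four equal source cells, one of them missing
values = [280.0, None, 290.0, 300.0]
weights = [0.25, 0.25, 0.25, 0.25]
integral = apply_weights(values, weights, "integral")
mean = apply_weights(values, weights, "mean")
```

Here three quarters of the destination cell is covered by valid data, so
the integral-preserving value is three quarters of the mean-preserving
value.
When no source value is missing the two results coincide, as stated
above.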
--rgr_opt, --regrid_options
ncremap passes rgr_opt directly through to the regridder.
This is useful to customize output grids and metadata.
One use is to rename output variables and dimensions from the defaults
provided by or derived from input data.
The default value is ‘--rgr lat_nm_out=lat --rgr lon_nm_out=lon’,
i.e., by default ncremap always names latitude and longitude
“lat” and “lon”, respectively, regardless of their input names.
Users might use this option to set different canonical axes names,
e.g., ‘--rgr lat_nm_out=y --rgr lon_nm_out=x’.
--rnr_thr, --thr_rnr, --rnr, --renormalize, --renormalization_threshold
Use this option to request renormalized (see Regridding) weight-application and to specify the weight threshold, if any. For example, ‘-r 0.9’ tells the regridder to renormalize with a weight threshold of 90%, so that all destination gridcells with at least 90% of their area contributed by valid source gridcells will contain valid (not missing) values that are the area-weighted mean of the valid source values. If the weights are conservative, then the output gridcells on the destination grid will preserve the mean of the input gridcells. Specifying ‘-r 0.9’ and ‘--rnr_thr=0.9’ are equivalent. Renormalization can be explicitly turned off by setting rnr_thr to either of the values ‘off’ or ‘none’. The ‘--preserve=prs_stt’ option performs the same task as this option except it does not allow setting an arbitrary threshold fraction.
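The effect of the threshold on one destination gridcell can be sketched
in Python.
This is a minimal illustration under stated assumptions: the function
name is hypothetical, None marks missing source values, and a gridcell
whose valid fractional area falls below rnr_thr is returned as missing.

```python
def renormalize(values, weights, rnr_thr):
    """Return the area-weighted mean of valid source values, or None.

    The destination cell is valid only if the fraction of its area
    covered by valid source cells is at least rnr_thr.
    """
    valid = [(val, wgt) for val, wgt in zip(values, weights) if val is not None]
    frc_vld = sum(wgt for _, wgt in valid)  # Valid fractional area
    if frc_vld <= 0.0 or frc_vld < rnr_thr:
        return None                         # Too little valid area: missing
    return sum(val * wgt for val, wgt in valid) / frc_vld


# 95% of the cell is valid: passes a 90% threshold
mostly_valid = renormalize([10.0, 20.0, None], [0.5, 0.45, 0.05], rnr_thr=0.9)
# 75% of the cell is valid: fails a 90% threshold
too_sparse = renormalize([10.0, 20.0, None], [0.5, 0.25, 0.25], rnr_thr=0.9)
# Threshold of 0.0 renormalizes whenever any valid data are present
any_valid = renormalize([10.0, 20.0, None], [0.5, 0.25, 0.25], rnr_thr=0.0)
```

The first call yields a valid mean, the second yields a missing value,
and the third shows that a threshold of 0.0 behaves like
‘--preserve=mean’.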
--rgn_dst, --dst_rgn, --regional_destination
--rgn_src, --src_rgn, --regional_source
Use these flags, which take no argument, to indicate that a
user-supplied (i.e., with ‘-s grd_src’ or ‘-g grd_dst’) grid is
regional.
The ERWG weight-generator (at least all versions before 8.0) needs to
be told whether the source, destination, or both grids are regional or
global in order to optimize weight production.
ncremap supplies this information to the regridder for grids it
automatically infers from data files.
However, the regridder needs to be explicitly told if user-supplied
(i.e., with either ‘-s grd_src’ or ‘-g grd_dst’) grids are regional
because the regridder does not examine supplied grids before calling
ERWG which assumes, unless told otherwise, that grids are global in
extent.
The sole effect of these flags is to add the arguments
‘--src_regional’ and/or ‘--dst_regional’ to ERWG calls.
Supplying regional grids without invoking these flags may dramatically
increase the map-file size and time to compute.
According to E3SM MPAS documentation, ERWG
“considers a mesh to be regional when the mesh is not a full sphere
(including if it is planar and does not cover the full sphere).
In other words, all MPAS-O and MPAS-LI grids are
regional” to ERWG.
--grd_src, --grid_source, --source_grid, --src_grd
Specifies the source gridfile. NCO will use the ERWG or TempestRemap weight-generator to combine this with a destination gridfile (either inferred from dst_fl, or generated by supplying a ‘-G grd_sng’ option) to generate remapping weights. grd_src is not modified, and may have read-only permissions. One appropriate circumstance to specify grd_src is when the input-file(s) do not contain sufficient information for NCO to infer an accurate or complete source grid. (Unfortunately many dataset producers do not record information like cell edges/vertices in their datasets. This is problematic for non-rectangular grids.) NCO assumes that grd_src, when supplied, applies to every input-file. Thus NCO will call the weight generator only once, and will use that map_fl to regrid every input-file.
Although ncremap usually uses the contents of a pre-existing
grd_src to create mapping weights, there are some situations
where ncremap creates the file specified by grd_src
(i.e., treats it as a location for storing output).
When a source grid is inferred or created from other user-specified
input, ncremap will store it in the location specified by grd_src.
This allows users to, for example, name the grid on which an input
dataset is stored when that grid is not known a priori.
This functionality is only available for SCRIP-format grids.
--skl_fl, --skl
Normally ncremap only creates a SCRIP-format gridfile named grd_dst
when it receives the grd_sng option.
The ‘--skl’ option instructs ncremap to also produce a “skeleton”
file based on the grd_sng argument.
A skeleton file is a bare-bones datafile on the specified grid.
It contains the complete latitude/longitude grid and an area field.
Skeleton files are useful for validating that the grid-creation
instructions in grd_sng perform as expected.
--stdin, --inp_std, --std_flg, --redirect, --standard_input
This switch (which takes no argument) explicitly indicates that
input file lists are provided via stdin, i.e., standard input.
In interactive environments, ncremap can automatically (i.e., without
any switch) detect whether input is provided via stdin.
This switch is never required for jobs run in an interactive shell.
However, non-interactive batch jobs (such as those submitted to the
SLURM and PBS schedulers) make it impossible to unambiguously
determine whether input has been provided via stdin.
Specifically, the ‘--stdin’ switch must be used with ncremap in
non-interactive batch jobs on PBS when the input files are piped to
stdin, and on SLURM when the input files are redirected from a file
to stdin.
Using ‘--stdin’ in any other context (e.g., interactive shells) is
optional.
In some other non-interactive environments (e.g., crontab, nohup,
Azure CI, CWL), ncremap may mistakenly expect input to be provided on
stdin simply because the environment is using stdin for other
purposes.
In such cases users may disable checking stdin by explicitly invoking
the ‘--no_stdin’ flag (described next), which works for both ncclimo
and ncremap.
--no_stdin, --no_inp_std, --no_redirect, --no_standard_input
First introduced in NCO version 4.8.0 (released May, 2019), this
switch (which takes no argument) disables checking standard input
(aka stdin) for input files.
This is useful because ncclimo and ncremap may mistakenly expect
input to be provided on stdin in environments that use stdin for
other purposes.
Some non-interactive environments (e.g., crontab, nohup, Azure CI,
CWL) may use standard input for their own purposes, and thus confuse
NCO into thinking that you provided the input file names via the
stdin mechanism.
In such cases users may disable the automatic checks for standard
input by explicitly invoking the ‘--no_stdin’ flag.
This switch is usually not required for jobs in an interactive shell.
Interactive SLURM shells can also commandeer stdin, as is the case on
the DOE machine named Chrysalis.
This behavior appears to vary depending on the SLURM implementation.
srun -N 1 -n 1 ncremap --no_stdin -m map.nc in.nc out.nc
--tmp_drc, --drc_tmp, --tmp_dir, --dir_tmp
Specifies the directory in which to place intermediate output files.
Depending on how it is invoked, ncremap may generate a few or many
intermediate files (grids and maps) that it will, by default, remove
upon successful completion.
These files can be large, so the option to set tmp_drc is offered
to ensure their location is convenient to the system.
If the user does not specify tmp_drc, then ncremap uses the value of
$TMPDIR, if any, or else /tmp if it exists, or else it uses the
current working directory ($PWD).
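The fallback order described above can be written as a small Python
function.
This is only a sketch of the documented precedence: the function name
and its keyword arguments are hypothetical, and the env and dir_exists
parameters exist solely so that the logic can be exercised without
touching the real environment.

```python
import os


def resolve_tmp_drc(tmp_drc=None, env=None, dir_exists=os.path.isdir):
    """Pick the intermediate-file directory using the documented precedence."""
    env = os.environ if env is None else env
    if tmp_drc:                 # 1. Explicit --tmp_drc argument
        return tmp_drc
    if env.get("TMPDIR"):       # 2. $TMPDIR, if set
        return env["TMPDIR"]
    if dir_exists("/tmp"):      # 3. /tmp, if it exists
        return "/tmp"
    return env.get("PWD") or os.getcwd()  # 4. Current working directory


# Example: no explicit option, $TMPDIR points at (hypothetical) node-local scratch
drc = resolve_tmp_drc(env={"TMPDIR": "/lscratch/job42", "PWD": "/home/user"})
```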
--thr_nbr, --thr, --thread_number, --threads
Specifies the number of threads used per regridding process
(see OpenMP Threading).
ncremap can use OpenMP shared-memory techniques to simultaneously
regrid multiple variables within a single file.
This shared memory parallelism is quite efficient because it
uses a single copy of the regridding weights in physical memory
to regrid multiple variables simultaneously.
Even so, simultaneously regridding multiple variables, especially at
high resolution, may be memory-limited, meaning that insufficient
RAM can often limit the number of variables that the system
can simultaneously regrid.
By convention all variables to be regridded share the same
regridding weights stored in a map-file, so that only one copy
of the weights needs to be in memory, just as in Serial mode.
However, the per-thread (i.e., per-variable) OpenMP memory demands
are considerable, with the memory required to regrid variables
amounting to no less than about 5–7 times (for type NC_FLOAT)
and 2.5–3.5 times (for type NC_DOUBLE) the size of the
uncompressed variable, respectively.
Memory requirements are so high because the regridder performs
all arithmetic in double precision to retain the highest accuracy,
and must allocate separate buffers to hold the input and output
(regridded) variable, a tally array to count the number of missing
values, and an array to sum the weights contributing to each
output gridcell (the last two arrays are only necessary for variables
with a _FillValue attribute).
The input, output, and weight-sum arrays are always double precision,
and the tally array is composed of four-byte integers.
Given the high memory demands, one strategy to optimize thr_nbr
for repetitious workflows is to keep doubling it (1, 2, 4, …) until
throughput stops improving.
With sufficient RAM, the NCO regridder scales well up to 8–16
threads.
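A rough per-thread memory estimate follows from the buffer list above.
The Python sketch below is a back-of-the-envelope accounting, not a
measurement: the function name is hypothetical, it assumes the tally and
weight-sum arrays have the size of the output variable, and it ignores
any transient copies, so it may undercount.

```python
DBL_SZ = 8  # Bytes per double-precision value
INT_SZ = 4  # Bytes per four-byte integer


def regrid_buffer_bytes(src_nbr, dst_nbr, has_fill_value=True):
    """Estimate bytes of working buffers needed to regrid one variable.

    src_nbr, dst_nbr: number of elements in the input and output variable.
    """
    byt = DBL_SZ * src_nbr + DBL_SZ * dst_nbr      # Input and output buffers
    if has_fill_value:
        byt += INT_SZ * dst_nbr + DBL_SZ * dst_nbr  # Tally and weight-sum arrays
    return byt


# Variable with one million elements on both source and destination grids
elm_nbr = 1_000_000
buf_byt = regrid_buffer_bytes(elm_nbr, elm_nbr)
ratio_float = buf_byt / (4 * elm_nbr)   # Relative to an NC_FLOAT variable
ratio_double = buf_byt / (8 * elm_nbr)  # Relative to an NC_DOUBLE variable
```

Under these assumptions the buffers amount to 7 times the size of an
NC_FLOAT variable and 3.5 times the size of an NC_DOUBLE variable, which
matches the upper end of the ranges quoted above.
Multiplying the estimate by thr_nbr gives a first guess at whether a
given thread count will fit in RAM.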
--unpack, --upk, --upk_inp
This switch (which takes no argument) causes ncremap to unpack (see
Packed data) input data before regridding it.
This switch causes unpacking at the regridding stage that occurs
after map generation.
Hence this switch does not benefit grid inferral.
Grid inferral examines only the coordinate variables in a dataset.
If coordinates are packed (a terrible practice) in a file from which a
grid will be inferred, users should first manually unpack the file (this
option will not help).
Fortunately, coordinate variables are usually not packed, even in files
with other packed data.
Many institutions (like NASA) pack datasets to conserve space before distributing them. This option allows one to regrid input data without having to manually unpack it first. Beware that NASA uses at least three different and incompatible versions of packing in its L2 datasets. The unpacking algorithm employed by this option is the default netCDF algorithm, which is appropriate for MOD04 and is inappropriate for MOD08 and MOD13. See Packed data for more details and workarounds.
--ugrid_fl, --ugrid
Normally ncremap only infers a gridfile named grd_dst in SCRIP-format.
The ‘--ugrid_fl’ option instructs ncremap to infer both a
SCRIP-format gridfile named grd_dst and a UGRID-format gridfile named
ugrid_fl.
This is an experimental feature and the UGRID file is
only expected to be valid for global rectangular grids.
--unq_sfx, --unique_suffix, --suffix
Specifies the suffix used to label intermediate (internal) files
generated by the regridding workflow.
Unique names are required to avoid interference among parallel
invocations of ncremap.
The default unq_sfx generated internally by ncremap is ‘.pidPID’
where PID is the process ID.
Applications can provide their own more or less informative suffixes
using the ‘--unq_sfx=unq_sfx’ option.
The suffix should be unique so that no two simultaneously executing
instances of ncremap can generate the same file.
For example, when the ncclimo climatology script issues a dozen
ncremap commands to regrid all twelve months simultaneously, it uses
‘--unq_sfx=mth_idx’ to encode the climatological month index in the
unique suffix.
Note that the controlling process PID is insufficient to
disambiguate all the similar temporary files when the input file list
is divided into multiple concurrent jobs (controlled by the
‘--job_nbr=job_nbr’ option).
Those files have their user-provided or internally generated
unq_sfx extended by fl_idx, their position in the input
file list, so that their full suffix is ‘.pidPID.fl_idx’.
Finally, a special value of unq_sfx is available to aid
developers: if unq_sfx is ‘noclean’ then ncremap retains (does not
remove) all intermediate files after completion.
--var_lst, --var, --vars, --variables, --variable_list
The ‘-v’ option causes ncremap to regrid only the variables in
var_lst.
It behaves like subsetting (see Subsetting Files) in the rest of NCO.
--var_rgr, --rgr_var, --var_cf, --cf_var, --cf_variable
The ‘-V’ option tells ncremap to use the same grid as var_rgr in the
input file.
If var_rgr adheres to the CF coordinates convention described here,
then ncremap will infer the grid as represented by those coordinate
variables.
This option simplifies inferring grids when the grid coordinate names
are unknown, since ncremap will follow the CF convention to learn the
identity of the grid coordinates.
Until NCO version 4.6.0 (May, 2016), ncremap would not follow CF
conventions to identify coordinate variables.
Instead, ncremap used an internal database of “usual suspects” to
identify latitude and longitude coordinate variables.
Now, if var_rgr is CF-compliant, then ncremap will automatically
identify the horizontal spatial dimensions.
If var_rgr is supplied but is not CF-compliant, then ncremap will
still attempt to identify horizontal spatial dimensions using its
internal database of “likely names”.
If both these automated methods fail, manually supply ncremap with
the names of the horizontal spatial dimensions:
# Method used to obtain horizontal spatial coordinates:
ncremap -V var_rgr -d dst.nc -O ~/rgr in.nc # CF coordinates convention
ncremap -d dst.nc -O ~/rgr in.nc # Internal database
ncremap -R "--rgr lat_nm=xq --rgr lon_nm=zj" -d dst.nc -O ~/rgr in.nc # Manual
--vrb_lvl, --vrb, --verbosity, --verbosity_level
Specifies a verbosity level similar to the rest of NCO.
If vrb_lvl = 0, ncremap prints nothing except potentially serious
warnings.
If vrb_lvl = 1, ncremap prints the basic filenames involved in the
remapping.
If vrb_lvl = 2, ncremap prints helpful comments about the code path
taken.
If vrb_lvl > 2, ncremap prints even more detailed information.
Note that vrb_lvl is distinct from dbg_lvl which is passed to the
regridder (ncks) for additional diagnostics.
--vrt_nm, --plev_nm, --vrt_crd, --vertical_coordinate_name
The ‘--vrt_nm=vrt_nm’ option instructs ncremap to use vrt_nm, instead
of the default plev, as the vertical coordinate name for pure pressure
grids.
This option first appeared in NCO version 4.8.0, released in May,
2019.
Note that the vertical coordinate may be specified in millibars
in some important reanalyses like ERA5, whereas many
models express the vertical coordinate in Pascals.
The user must ensure that the vertical coordinate in the template
vertical grid-file is in the same units (e.g., mb or Pa) as the
vertical coordinate in the file to be vertically interpolated.
--vrt_out, --vrt_fl, --vrt, --vrt_grd_out
The ‘--vrt_out=vrt_fl’ option instructs ncremap to vertically
interpolate the input file to the vertical coordinate grid contained
in the file vrt_fl.
This option first appeared in NCO version 4.8.0, released in May,
2019.
The vertical gridfile vrt_fl must specify one of the three vertical
gridtypes that ncremap understands: pure-pressure, hybrid
sigma-pressure, or geometric depth (e.g., for ocean data).
Note that pure-sigma coordinates are a special case of hybrid
sigma-pressure coordinates and can always be reformatted to work as
well.
Besides the vertical grid-type, the main assumptions, constraints, and priorities for future development of vertical regridding are:
ncremap
.
When this occurs, ncremap
internally performs the
vertical interpolation prior to the horizontal regridding.
Exploiting this feature can have some unintended consequences.
For example, horizontal regridding requires that the horizontal
spatial dimension vary most rapidly, whereas vertical
interpolation makes no such assumption.
When the regridder needs to permute the dimension ordering of
the input dataset in order to perform horizontal regridding,
this step actually precedes the vertical regridding.
This order of operations is problematic and we are working
to address the issues in future updates.
hyai, hyam, hybi, hybm, ilev, lev, P0, and PS (for E3SM/CESM hybrid
grids), lev, lev_2, and lnsp (for ECMWF hybrid grids only), depth,
timeMonthly_avg_zMid (for MPAS depth grids), and plev and level (for
pure-pressure grids with NCEP and ERA5 conventions, respectively).
The infrastructure to provide alternate names for any of these
input/output variable names is straightforward, and is heavily
used for horizontal spatial regridding.
Allowing this functionality will not be a priority until we are
presented with a compelling use-case.
The simplest vertical grid-type, a pure-pressure grid, contains
the horizontally uniform vertical pressure levels in a one-dimensional
coordinate array named (by default) plev.
The plev dimension may have any number of levels and the values
must monotonically increase or decrease.
A 17-level NCEP pressure grid, for example, is easy to create:
# Construct monotonically decreasing 17-level NCEP pressure grid
ncap2 -O -v -s 'defdim("plev",17);plev[$plev]={100000,92500,85000, \
70000,60000,50000,40000,30000,25000,20000,15000,10000,7000,5000, \
3000,2000,1000};' vrt_prs_ncep_L17.nc
ncremap will search the supplied vertical grid file for the coordinate
named plev, or for any coordinate name specified with the plev_nm_in
option to the regridder, e.g., ‘--plev_nm_in=z’.
Hybrid-coordinate grids are a hybrid between a sigma-coordinate
grid (where each pressure level is a fixed fraction of a
spatiotemporally varying surface pressure) and a pure-pressure grid
that is spatially invariant (as described above).
The so-called hybrid A and B coefficients specify the
fractional weight of the pure-pressure and sigma-grids, respectively,
at each level.
The hybrid gridfile must specify A and B coefficients for
both layer midpoints and interfaces with these standard
(as employed by CESM and E3SM) names and dimensions: hyai(ilev),
hybi(ilev), hyam(lev), and hybm(lev).
The reference pressure and surface pressure must be named P0 and PS,
respectively.
The pressures at all midpoints and interfaces are then defined as
prs_mdp[time,lev, lat,lon]=hyam*P0+hybm*PS # Midlayer
prs_ntf[time,ilev,lat,lon]=hyai*P0+hybi*PS # Interface
The scalar reference pressure P0 is typically 100000 Pa (or 1000 hPa)
while the surface pressure PS is a (possibly time-varying) array with
one or two spatial dimensions, and its values are in the same
dimensional units (e.g., Pa or hPa) as P0.
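The midlayer and interface definitions above reduce to one line of
arithmetic per level.
The following Python sketch evaluates them for a single column; the
function name and the three-layer coefficients are invented for
illustration and do not come from any real model grid.

```python
def hybrid_pressure(hya, hyb, p0, ps):
    """Return pressure [same units as p0, ps] at each level: hya*P0 + hyb*PS."""
    return [a * p0 + b * ps for a, b in zip(hya, hyb)]


# Hypothetical three-layer column, ordered from model top to surface
hyam = [0.05, 0.10, 0.00]   # A coefficients at layer midpoints
hybm = [0.00, 0.40, 0.95]   # B coefficients at layer midpoints
P0 = 100000.0               # Reference pressure [Pa]
PS = 98000.0                # Surface pressure of this column [Pa]

prs_mdp = hybrid_pressure(hyam, hybm, P0, PS)
```

The top layer (B = 0) sits at a fixed pressure regardless of PS, the
bottom layer (A = 0) is a fixed fraction of PS, and the middle layer
blends the two.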
It is often useful to create a vertical grid file from existing
model or reanalysis output.
We call vertical grid files “skinny” if they contain only the
vertical information.
Skinny grid-files are easy to create with ncks, e.g.,
ncks -C -v hyai,hyam,hybi,hybm,P0 in_L128.nc vrt_hyb_L128.nc
Such files are extremely small and portable, and represent
all the hybrid files created by the model because the vertical
grid parameters are time-invariant.
A “fat” vertical grid file would also include the time-varying
grid information, i.e., the surface pressure field.
Fat grid-files are also easy to create with ncks, e.g.,
ncks -C -v hyai,hyam,hybi,hybm,P0,PS in_L128.nc vrt_hyb_L128.nc
The full (layer-midpoint) and half (layer-interface) pressure fields
prs_mdp and prs_ntf, respectively, can be reconstructed from any fat
grid-file with an ncap2 command:
ncap2 -s 'prs_mdp[time,lat,lon,lev]=P0*hyam+PS*hybm' \
      -s 'prs_ntf[time,lat,lon,ilev]=P0*hyai+PS*hybi' in.nc out.nc
Hybrid-coordinate grids define a pure-sigma or pure-pressure grid when either their A or B coefficients are zero, respectively. For example, the following creates the hybrid-coordinate representation of a pure-pressure grid with midpoints every 100 hPa from 100 hPa to 1000 hPa:
ncap2 -O -v -s 'defdim("ilev",11);defdim("lev",10);P0=100000.0; \
hyai=array(0.05,0.1,$ilev);hyam=array(0.1,0.1,$lev); \
hybi=0.0*hyai;hybm=0.0*hyam;' vrt_hyb_L10.nc
NCO currently has no other means of representing pure sigma vertical grids (as opposed to pure pressure grids).
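It can be reassuring to check the numbers that the pure-pressure ncap2
command above encodes.
The Python sketch below mimics the ncap2 array(start,increment,dimension)
call with a small helper (the helper is written for this example and is
not part of NCO) and converts the A coefficients to pressures in hPa.
Because all B coefficients are zero, PS drops out and each pressure is
simply the A coefficient times P0.

```python
def array(srt, inc, nbr):
    """Mimic ncap2 array(): nbr values starting at srt with increment inc."""
    return [srt + inc * idx for idx in range(nbr)]


P0 = 100000.0                  # Reference pressure [Pa]
hyai = array(0.05, 0.1, 11)    # A coefficients at the 11 interfaces
hyam = array(0.1, 0.1, 10)     # A coefficients at the 10 midpoints

# Pressures in hPa, rounded to suppress floating-point noise
prs_ntf_hPa = [round(P0 * a / 100.0, 6) for a in hyai]
prs_mdp_hPa = [round(P0 * a / 100.0, 6) for a in hyam]
```

The midpoints come out every 100 hPa from 100 hPa to 1000 hPa, and the
interfaces fall halfway between them, from 50 hPa to 1050 hPa.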
As of July 2019 and NCO version 4.8.1, NCO supports regridding ECMWF datasets in IFS hybrid vertical coordinate format to CESM/E3SM-format hybrid vertical grids. Unfortunately there was a regression and this functionality was broken between about 2023 and 2024 (the workaround is to use older NCO versions like 4.9.0). NCO once again supports this functionality as of October 2024 (NCO version 5.2.9), though now the user must employ the ‘--ps_nm=lnsp’ option shown below.
The native IFS hybrid datasets that we have seen store
pressure coordinates in terms of a slightly different formula that
employs the log of surface pressure (lnsp) instead of surface
pressure PS, that redefines hyai and hyam to be pure-pressure offsets
(rather than coefficients), and that omits P0:
prs_mdp[time,lev, lat,lon]=hyam+hybm*exp(lnsp) # Midlayer
prs_ntf[time,lev_2,lat,lon]=hyai+hybi*exp(lnsp) # Interface
Note that ECMWF also alters the names of the vertical
half-layer coordinate and employs distinct dimensions (nhym and nhyi)
for the hybrid variables hyai(nhyi), hybi(nhyi), hyam(nhym), and
hybm(nhym).
ECMWF uses the vertical coordinates lev and lev_2 for full-layer
(i.e., midlayer) and half-layer (i.e., interface) levels,
respectively, for all other variables.
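The IFS formula differs from the CESM/E3SM one only in bookkeeping, as
the following Python sketch shows for a single column.
The function name and the coefficient values are hypothetical; the
final lines rely on the arithmetic identity that dividing an IFS
pressure offset by a reference pressure P0 yields a CESM/E3SM-style
dimensionless A coefficient, which is an observation about the two
formulas and not a description of what ncremap does internally.

```python
import math


def ifs_pressure(hya, hyb, lnsp):
    """Return pressure [Pa] at each level: hya + hyb*exp(lnsp)."""
    ps = math.exp(lnsp)  # Recover surface pressure from its logarithm
    return [a + b * ps for a, b in zip(hya, hyb)]


# Hypothetical three-layer column in IFS conventions
hyam_ifs = [2000.0, 5000.0, 0.0]   # Pressure offsets [Pa], not coefficients
hybm = [0.0, 0.5, 1.0]             # Dimensionless B coefficients
lnsp = math.log(101325.0)          # Log of surface pressure

prs_ifs = ifs_pressure(hyam_ifs, hybm, lnsp)

# Same pressures from the CESM/E3SM form hyam*P0 + hybm*PS
P0 = 100000.0
hyam_cesm = [a / P0 for a in hyam_ifs]
prs_cesm = [a * P0 + b * math.exp(lnsp) for a, b in zip(hyam_cesm, hybm)]
```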
To invoke ncremap on a hybrid coordinate dataset in IFS format, one
must specify that the surface pressure variable is named lnsp.
No modifications to the IFS dataset are necessary.
The vertical grid file should be in CESM/E3SM format.
zender@spectral:~$ ncks -m -C -v lnsp,hyai,hyam,hybi,hybm,lev,lev_2 ifs.nc
netcdf ecmwf_ifs_f640L137 {
  dimensions:
    lev = 137 ;
    lev_2 = 1 ;
    nhyi = 138 ;
    nhym = 137 ;

  variables:
    double hyai(nhyi) ;
      hyai:long_name = "hybrid A coefficient at layer interfaces" ;
      hyai:units = "Pa" ;

    double hyam(nhym) ;
      hyam:long_name = "hybrid A coefficient at layer midpoints" ;
      hyam:units = "Pa" ;

    double hybi(nhyi) ;
      hybi:long_name = "hybrid B coefficient at layer interfaces" ;
      hybi:units = "1" ;

    double hybm(nhym) ;
      hybm:long_name = "hybrid B coefficient at layer midpoints" ;
      hybm:units = "1" ;

    double lev(lev) ;
      lev:standard_name = "hybrid_sigma_pressure" ;
      lev:long_name = "hybrid level at layer midpoints" ;
      lev:formula = "hyam hybm (mlev=hyam+hybm*aps)" ;
      lev:formula_terms = "ap: hyam b: hybm ps: aps" ;
      lev:units = "level" ;
      lev:positive = "down" ;

    double lev_2(lev_2) ;
      lev_2:standard_name = "hybrid_sigma_pressure" ;
      lev_2:long_name = "hybrid level at layer midpoints" ;
      lev_2:formula = "hyam hybm (mlev=hyam+hybm*aps)" ;
      lev_2:formula_terms = "ap: hyam b: hybm ps: aps" ;
      lev_2:units = "level" ;
      lev_2:positive = "down" ;

    float lnsp(time,lev_2,lat,lon) ;
      lnsp:long_name = "Logarithm of surface pressure" ;
      lnsp:param = "25.3.0" ;
} // group /
zender@spectral:~$ ncks -m vrt_grd.nc
netcdf vrt_hyb_L72 {
  dimensions:
    ilev = 73 ;
    lev = 72 ;

  variables:
    double P0 ;
      P0:long_name = "reference pressure" ;
      P0:units = "Pa" ;

    double hyai(ilev) ;
      hyai:long_name = "hybrid A coefficient at layer interfaces" ;

    double hyam(lev) ;
      hyam:long_name = "hybrid A coefficient at layer midpoints" ;

    double hybi(ilev) ;
      hybi:long_name = "hybrid B coefficient at layer interfaces" ;

    double hybm(lev) ;
      hybm:long_name = "hybrid B coefficient at layer midpoints" ;
} // group /
zender@spectral:~$ ncremap --ps_nm=lnsp --vrt_grd=vrt_grd.nc ifs.nc out.nc
zender@spectral:~$
The IFS file can be horizontally regridded in the same invocation.
ncremap automagically handles all of the other details.
Currently ncremap can only interpolate data from (not to) an
IFS-format hybrid vertical grid data file.
To interpolate to an IFS-format hybrid vertical grid, one
must place the destination vertical grid into a
CESM/E3SM-format hybrid vertical grid file (see above)
that includes a PS surface pressure field (not lnsp log-surface
pressure) for the destination grid.
The lev and ilev coordinates of a hybrid grid are
defined by the hybrid coefficients and reference pressure, and are
by convention stored in millibars (not Pascals) as follows:
ilev[ilev]=P0*(hyai+hybi)/100.0;
lev[lev]=P0*(hyam+hybm)/100.0;
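These definitions amount to evaluating the hybrid formula with the
surface pressure set equal to the reference pressure and converting
from Pascals to millibars.
A minimal Python sketch follows; the function name and coefficient
values are invented for illustration, and P0 is assumed to be in
Pascals.

```python
def nominal_levels_mb(hya, hyb, p0):
    """Return nominal level coordinate [mb]: P0*(hya+hyb)/100."""
    return [p0 * (a + b) / 100.0 for a, b in zip(hya, hyb)]


# Hypothetical three-layer grid, ordered from model top to surface
hyam = [0.05, 0.10, 0.00]
hybm = [0.00, 0.40, 0.95]
P0 = 100000.0  # Reference pressure [Pa]

lev = nominal_levels_mb(hyam, hybm, P0)
```

For this grid the nominal midlayer coordinates are approximately 50,
500, and 950 mb.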
A vertical hybrid grid file vrt_fl must contain at least hyai, hybi,
hyam, hybm, and P0; PS, lev, and ilev are optional.
(Exceptions for ECMWF grids are noted above.)
All hybrid-coordinate data files must contain PS.
Interpolating a pure-pressure coordinate data file to hybrid
coordinates requires, therefore, that the hybrid-coordinate vrt_fl
and/or the input data file contain PS.
If both contain PS then the PS from the vrt_fl takes precedence and
will be used to construct the hybrid grid and then copied without
modification to the output file.
In all cases lev and ilev are optional in input hybrid-coordinate data
files and vertical grid-files.
They are diagnosed from the other parameters using the above
definitions.
The minimal requirements—a plev
coordinate for a
pure-pressure grid or five parameters for a hybrid grid—allow
vertical gridfiles to be much smaller than horizontal gridfiles
such as SCRIP files.
Moreover, data files from ESMs or analyses (NCEP,
MERRA2, ERA5) are also valid gridfiles.
The flexibility in gridfile structure makes it easy to intercompare
data from the same or different sources.
ncremap supports vertical interpolation between all combinations of
pure-pressure and hybrid-pressure grids.
The input and output (aka source and destination) pressure grids may
monotonically increase or decrease independently of each other (i.e.,
one may increase and the other may decrease).
When an output pressure level is outside the input pressure range
for that column, then all variables must be extrapolated (not
interpolated) to that/those level(s).
By default ncremap sets all extrapolated values to the nearest valid
value.
Temperature and geopotential height are exceptions to this rule.
Temperature variables (those named T or ta, anyway) are extrapolated
upwards towards space using the nearest neighbor assumption, and
downwards beneath the surface assuming a moist adiabatic lapse rate of
6.5 degrees centigrade per 100 millibars.
Geopotential variables (those named Z3 or zg, anyway) are extrapolated
upwards and downwards using the hypsometric equation with constant
global mean virtual temperature tpt = 288 K.
This assumption leads to unrealistic values where tpt differs
significantly from the global mean surface temperature.
Using the local tpt itself would be a much better approximation,
yet would require a time-consuming implementation.
Please let us know if accurate surface geopotential extrapolation in
cold regions is important to you.
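For readers unfamiliar with the hypsometric equation, the Python sketch
below shows the textbook form with a constant virtual temperature of
288 K.
It is not the regridder's code: the function name is hypothetical, and
the gas constant and gravitational acceleration are standard textbook
values that may differ slightly from the constants NCO uses.

```python
import math

R_DRY = 287.05   # Gas constant of dry air [J kg-1 K-1] (textbook value)
GRV = 9.80665    # Standard gravitational acceleration [m s-2]


def extrapolate_height(z_ref, p_ref, p, tpt=288.0):
    """Extrapolate geopotential height [m] from (p_ref, z_ref) to pressure p.

    Hypsometric equation: z - z_ref = (R*tpt/g) * ln(p_ref/p)
    """
    return z_ref + (R_DRY * tpt / GRV) * math.log(p_ref / p)


# Known height of 100 m at 1000 hPa, extrapolated upward and downward
z_up = extrapolate_height(z_ref=100.0, p_ref=100000.0, p=85000.0)
z_dn = extrapolate_height(z_ref=100.0, p_ref=100000.0, p=105000.0)
```

With tpt = 288 K the scale height R*tpt/g is roughly 8.4 km, so the
850 hPa surface lies about 1.4 km above the 1000 hPa surface.
A colder column would have a smaller scale height, which is why the
constant-temperature assumption degrades in cold regions.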
Interpolation to and from hybrid coordinate grids works on both
midpoint and interface fields (i.e., on variables with lev or ilev
dimensions), while interpolation to and from pure-pressure
grids applies to fields with, or places output of fields on, a plev
dimension.
All other fields pass through the interpolation procedure unscathed.
Input can be rectangular (aka RLL), curvilinear, or unstructured.
--ps_nm
, --ps_name
, --vrt_ps
, --ps
)’ ¶It is sometimes convenient to store the ps_nm field in an external file from the field(s) to be regridded. For example, CMIP-style timeseries are often written with only one variable per file. NCO supports this organization by accepting ps_nm arguments in the form of a filename followed by a slash and then a variable name:
ncremap --vrt_in=vrt.nc --ps_nm=ps in.nc out.nc # Use ps not PS
ncremap --vrt_in=vrt.nc --ps_nm=/external/file.nc/ps in.nc out.nc
This same functionality (of optionally embedding a filename into the variable name) is also implemented for the sgs_frc variable.
--ps_rtn, --rtn_sfc_prs, --retain_surface_pressure
As of NCO version 5.1.5 (March, 2023), ncremap
includes a --ps_rtn
switch (with long-option equivalents
--rtn_sfc_prs
and --retain_surface_pressure
) to
facilitate “round-trip” vertical interpolation such as
hybrid-to-pressure followed by pressure-to-hybrid interpolation.
By default ncremap
excludes the surface pressure field named
ps_nm from the output after hybrid-to-pressure interpolation.
The --ps_rtn
switch (which takes no argument) instructs the
regridder to retain the surface pressure field after
hybrid-to-pressure interpolation.
The surface pressure field is then available for subsequent
interpolation back to a hybrid vertical coordinate:
ncremap --ps_rtn --ps_nm=ps --vrt_out=ncep.nc in.nc out_ncep.nc
ncremap --ps_rtn -v T,Q,U,PS --vrt_out=ncep.nc in.nc out_ncep.nc
ncremap --vrt_out=hybrid.nc out_ncep.nc out_hybrid.nc
--vrt_ntp, --ntp_mth, --interpolation_type, --interpolation_method
Specifies the interpolation method for destination points within the
vertical range of the input data during vertical interpolation.
Valid values and their synonyms are
lin
(synonyms linear
and lnr
), and
log
(synonyms logarithmic
and lgr
).
Default is vrt_ntp = log
.
The vertical interpolation algorithm defaults to linear in
log(pressure).
Logarithmic interpolation is more natural for gases like the
atmosphere, because it is compressible, than for condensed media like
oceans or Earth’s interior, which are incompressible.
To instead interpolate linearly in the vertical coordinate, use
the ‘--ntp_mth=lin’ option.
NCO supports this feature as of version 4.9.0 (December,
2019).
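The difference between the two interpolation methods can be seen in a few lines of Python. This is a minimal sketch of the arithmetic, not the NCO source code; the function name interpolate and the sample temperatures are assumptions made for illustration:

```python
import math

def interpolate(prs, prs_a, val_a, prs_b, val_b, method="log"):
    """Interpolate a value to pressure prs between two bracketing levels.

    method="log" is linear in log(pressure); method="lin" is linear in pressure.
    """
    if method == "log":
        wgt = (math.log(prs) - math.log(prs_a)) / (math.log(prs_b) - math.log(prs_a))
    elif method == "lin":
        wgt = (prs - prs_a) / (prs_b - prs_a)
    else:
        raise ValueError(f"unknown method: {method}")
    return val_a + wgt * (val_b - val_a)

# Hypothetical temperatures at 1000 hPa and 250 hPa, interpolated to 500 hPa
t_log = interpolate(500.0, 1000.0, 288.0, 250.0, 220.0, method="log")
t_lin = interpolate(500.0, 1000.0, 288.0, 250.0, 220.0, method="lin")
print(t_log, t_lin)
```

Because 500 hPa is the geometric mean of 1000 hPa and 250 hPa, the logarithmic method weights both levels equally, whereas the linear method places the target two-thirds of the way toward the upper level.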
--vrt_xtr, --xtr_mth, --extrapolation_type, --extrapolation_method
Specifies the extrapolation method for destination points outside the
vertical range of the input data during vertical interpolation.
Valid values and their synonyms are
linear
(synonyms lnr
and lin
),
mss_val
(synonyms msv
and missing_value
),
nrs_ngh
(synonyms nn
and nearest_neighbor
), and
zero
(synonym nil
).
Default is vrt_xtr = nrs_ngh
.
NCO supports this feature as of version 4.8.1 (July, 2019).
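One plausible reading of the four extrapolation methods is sketched below in Python. This is an illustration under stated assumptions rather than the NCO algorithm: the function name extrapolate, the placeholder missing value 1.0e36, and the choice to extend the line through the two levels nearest the boundary for the linear method are all assumptions.

```python
def extrapolate(prs, prs_lvl, val_lvl, method="nrs_ngh", mss_val=1.0e36):
    """Value at pressure prs lying outside the range spanned by prs_lvl.

    prs_lvl must be monotonic and hold at least two levels.
    """
    # Identify the end of the column nearest to prs
    if abs(prs - prs_lvl[0]) <= abs(prs - prs_lvl[-1]):
        idx_end, idx_nbr = 0, 1
    else:
        idx_end, idx_nbr = -1, -2
    if method == "nrs_ngh":
        return val_lvl[idx_end]
    if method == "mss_val":
        return mss_val
    if method == "zero":
        return 0.0
    if method == "linear":
        # Extend the line through the two levels nearest the boundary
        slope = (val_lvl[idx_end] - val_lvl[idx_nbr]) / (prs_lvl[idx_end] - prs_lvl[idx_nbr])
        return val_lvl[idx_end] + slope * (prs - prs_lvl[idx_end])
    raise ValueError(f"unknown method: {method}")

prs_lvl = [1000.0, 850.0, 700.0]  # hypothetical input levels
val_lvl = [10.0, 7.0, 4.0]        # hypothetical values on those levels
for mth in ("linear", "mss_val", "nrs_ngh", "zero"):
    print(mth, extrapolate(600.0, prs_lvl, val_lvl, method=mth))
```

The sketch ignores the special treatment of temperature and geopotential height described earlier.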
--wgt_opt, --weight_options, --esmf_opt, --esmf_options, --tps_opt, --tempest_options
ncremap
passes wgt_opt directly through to the
weight-generator (currently ERWG or
TempestRemap’s GenerateOfflineMap
) (and not to
GenerateOverlapMesh
).
The user-specified contents of wgt_opt, if any, supersede the
default contents for the weight-generator.
The default option for ERWG is ‘--ignore_unmapped’.
ncremap
4.7.7 and later additionally set the ERWG
‘--ignore_degenerate’ option, though only if the run-time ERWG
reports its version is 7.0 (March, 2018) or later.
This is done to preserve backwards compatibility, since ERWG
7.1.0r and later require ‘--ignore_degenerate’ to successfully
regrid some datasets (e.g., CICE) that previous ERWG
versions handle fine.
Users of earlier versions of ncremap
that call ESMF
7.1.0r and later can explicitly pass the base ERWG options
with ncremap
’s ‘--esmf_opt’ option:
# Use when NCO <= 4.7.6 and ERWG >= 7.1.0r
ncremap --esmf_opt='--ignore_unmapped --ignore_degenerate' ...
The ERWG and TempestRemap documentation shows all available
options.
For example, to cause ERWG to output to a netCDF4 file,
pass ‘-W "--netcdf4"’ to ncremap
.
By default, ncremap
runs GenerateOfflineMap
without
any options.
To cause GenerateOfflineMap
to use a _FillValue
of
-1, pass ‘-W '--fillvalue -1.0'’ to ncremap
.
Other common options enforce monotonicity constraints (which are not the
default in TempestRemap).
To guarantee monotonicity in regridding from Finite Volume (FV)
to FV maps (e.g., MPAS-to-rectangular), pass
‘-W '-in_np 1'’ to ncremap
.
To guarantee monotonicity in regridding from Finite Element (FE)
to FV maps, pass ‘-W '--mono'’.
Common sets of specialized options recommended for TempestRemap are
collected into six boutique algorithms invokable with ‘--alg_typ’
as described above.
--wgt_cmd, --weight_command, --wgt_gnr, --weight_generator
Specifies a (possibly extended) command to use to run the
weight-generator when a map-file is not provided.
This command overrides the default executable for the
weight generator, which is ESMF_RegridWeightGen
for
ESMF and GenerateOfflineMap
for TempestRemap.
(There is currently no way to override GenerateOverlapMesh
for TempestRemap).
The wgt_cmd must accept the same arguments as the default
command.
Examples include ‘mpirun -np 24 ESMF_RegridWeightGen’,
‘mpirun-openmpi-mp -np 16 ESMF_RegridWeightGen’, and other
ways of exploiting parallelism that are system-dependent.
Specifying wgt_cmd and supplying (with ‘-m’) a map-file
is not permitted (since the weight-generator would not be used).
--xcl_var, --xcl, --exclude, --exclude_variables
This flag (which takes no argument) changes var_lst,
as set by the --var_lst
option, from an extraction list to an
exclusion list so that variables in var_lst will not be
processed, and variables not in var_lst will be processed.
Thus the option ‘-v var_lst’ must also be present for this
flag to take effect.
Variables explicitly specified for exclusion by
‘--xcl --vars=var_lst[,…]’ need not be present in the
input file.
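The effect of switching var_lst from an extraction list to an exclusion list can be sketched in Python as follows. The function name variables_to_process and the sample variable names are hypothetical, and the sketch does not model how ncremap treats names missing from the file in extraction mode:

```python
def variables_to_process(var_in_file, var_lst, exclude=False):
    """Return the variables that would be processed, preserving file order."""
    requested = set(var_lst)
    if exclude:
        # Exclusion list: names absent from the file are simply ignored
        return [var for var in var_in_file if var not in requested]
    # Extraction list: keep only the requested names
    return [var for var in var_in_file if var in requested]

var_in_file = ["T", "Q", "U", "PS"]  # hypothetical file contents
print(variables_to_process(var_in_file, ["T", "Q"]))
print(variables_to_process(var_in_file, ["T", "FSNT"], exclude=True))
```

In the second call FSNT is not in the file, which is harmless because excluded variables need not be present.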
--xtn_lst, --xtn_var, --var_xtn, --extensive, --extensive_variables
The ‘-x’ option causes ncremap
to treat the variables in
xtn_lst as extensive, meaning that their value depends on
the gridcell boundaries.
Support for extensive variables during regridding is nascent.
Currently variables marked as extensive are summed, not regridded.
We are interested in “real-world” situations that require regridding
extensive variables; please contact us if you have one.
ncremap
ncremap
has two significant limitations to be aware of.
First, for two-dimensional input grids the fields to be regridded must
have latitude and longitude, or, in the case of curvilinear data, the
two equivalent horizontal dimensions, as the final two dimensions in
in_fl.
Fields with other dimension orders (e.g., ‘lat,lev,lon’) will not
regrid properly.
To circumvent this limitation one can employ
ncpdq
(see ncpdq
netCDF Permute Dimensions Quickly)
to permute the dimensions before (and un-permute them after) regridding.
ncremap
utilizes this method internally for some common
input grids.
For example,
# AIRS Level2 vertical profiles
ncpdq -a StdPressureLev,GeoTrack,GeoXTrack AIRS_L2.hdf AIRS_L2_ncpdq.nc
ncremap -i AIRS_L2_ncpdq.nc -d dst_1x1.nc -O ~/rgr
# MPAS-O fields
ncpdq -a Time,nVertLevels,maxEdges,MaxEdges2,nEdges,nCells mpas.nc mpas_ncpdq.nc
ncremap -R "--rgr col_nm=nCells" -i mpas_ncpdq.nc -m mpas120_to_t62.nc -O ~/rgr
The previous two examples occur so frequently that ncremap
has
been specially equipped to handle AIRS and MPAS
files.
As of NCO version 4.5.5 (February, 2016), the following
ncremap
commands with the ‘-P prc_typ’ option
automagically perform all necessary permutation and renaming:
# AIRS Level2 vertical profiles
ncremap -P airs -i AIRS_L2.nc -d dst_1x1.nc -O ~/rgr
# MPAS-O/I fields
ncremap -P mpas -i mpas.nc -m mpas120_to_t62.nc -O ~/rgr
The machinery to handle permutations and special options for other datafiles is relatively easy to extend with new prc_typ options. If you work with common datasets that could benefit from their own pre-processing options, contact us and we will try to implement them.
The second limitation is that to perform regridding, ncremap
must read weights from an on-disk mapfile, and cannot yet compute
weights itself and use them directly from RAM.
This makes ncremap
an “offline regridder” and unnecessarily
slow compared to an “integrated regridder” that computes weights and
immediately applies them in RAM without any disk-access.
In practice, the difference is most noticeable when the weights are
easily computable “on the fly”, e.g., rectangular-to-rectangular
mappings.
Otherwise the weight-generation takes much more time than the
weight-application, at which ncremap
is quite fast.
As of NCO version 4.9.0, released in December, 2019, the
regridder supports generation of intersection grids and overlap
weights for all finite volume grid combinations.
However, these weights must first be stored in an offline mapfile, and are not
usable otherwise.
One side-effect of ncremap
being an offline regridder is
that, when necessary, it can generate files to store intermediate
versions of grids, maps, and data.
These files are named, by default,
ncremap_tmp_att.nc${unq_sfx},
ncremap_tmp_d2f.nc${unq_sfx},
ncremap_tmp_grd_dst.nc${unq_sfx},
ncremap_tmp_grd_src.nc${unq_sfx},
ncremap_tmp_gnr_out.nc${unq_sfx},
ncremap_tmp_map_*.nc${unq_sfx},
ncremap_tmp_msh_ovr_*.nc${unq_sfx}, and
ncremap_tmp_pdq.nc${unq_sfx}.
They are placed in drc_out with the output file(s).
In general, no intermediate grid or map files are generated when the
map-file is provided.
Intermediate files are always generated when the ‘-P prc_typ’
option is invoked.
By default these files are automatically removed upon successful
completion of the script, unless ncremap
was invoked with
‘--unq_sfx=noclean’ to explicitly override this “self-cleaning”
behavior.
Nevertheless, early or unexpected termination of ncremap
will almost always leave behind a collection of these intermediate
files.
Should intermediate files proliferate and/or annoy you, locate and/or
remove all such files under the current directory with
find . -name 'ncremap_tmp*'
rm `find . -name 'ncremap_tmp*'`
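Scripted workflows may prefer to perform the same cleanup from Python. The following is a minimal sketch that uses only the standard library; the function name remove_ncremap_tmp is hypothetical and is not part of NCO:

```python
from pathlib import Path

def remove_ncremap_tmp(drc_top="."):
    """Delete files named ncremap_tmp* under drc_top; return the removed paths."""
    removed = []
    # rglob() searches drc_top and all of its subdirectories
    for fl in sorted(Path(drc_top).rglob("ncremap_tmp*")):
        if fl.is_file():
            fl.unlink()
            removed.append(fl)
    return removed
```

Calling remove_ncremap_tmp(".") from the directory where ncremap was interrupted removes the leftover intermediate files and reports which ones were deleted.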
EXAMPLES
Regrid input file in.nc to the spatial grid in file dst.nc and write the output to out.nc:
ncremap -d dst.nc in.nc out.nc
ncremap -d dst.nc -i in.nc -o out.nc
ncremap -d dst.nc -O regrid in.nc out.nc
ncremap -d dst.nc in.nc regrid/out.nc
ncremap -d dst.nc -O regrid in.nc # output named in.nc
NCO infers the destination spatial grid from dst.nc by
reading its coordinate variables and CF attributes.
In the first two examples, ncremap
places the output in
out.nc.
In the third and fourth examples, the output file is
regrid/out.nc.
In the fifth example, ncremap
places the output in the
specified output directory.
Since no output filename is provided, the output file will be named
regrid/in.nc.
Generate a mapfile with ncremap
and store it for later re-use.
A pre-computed mapfile (supplied with ‘-m map_fl’) eliminates
time-consuming weight-generation, and thus considerably reduces
wallclock time:
ncremap -m map.nc in.nc out.nc
ncremap -m map.nc -I drc_in -O regrid
As of NCO version 4.7.2 (January, 2018), ncremap
supports “canonical” argument ordering of command line arguments most
frequently desired for one-off regridding, where a single input and
output filename are supplied as command-line positional arguments
without switches, pipes, or redirection:
ncremap -m map.nc in.nc out.nc # Requires 4.7.2+
ncremap -m map.nc -i in.nc -o out.nc
ncremap -m map.nc -o out.nc in.nc
ncremap -m map.nc -O out_dir in1.nc in2.nc
ncremap -m map.nc -o out.nc < in.nc
ls in.nc | ncremap -m map.nc -o out.nc
These are all equivalent methods, but the canonical ordering shown in the first example only works in NCO version 4.7.2 and later.
ncremap
annotates the gridfiles and mapfiles that it creates
with helpful metadata containing the full provenance of the command.
Consequently, ncremap
is a sensible tool for generating
mapfiles for later use.
To generate a mapfile with the specified (non-default) name
map.nc, and then regrid a single file,
ncremap -d dst.nc -m map.nc in.nc out.nc
To test the remapping workflow, regrid only one or a few variables instead of the entire file:
ncremap -v T,Q,FSNT -m map.nc in.nc out.nc
Regridding generally scales linearly with the size of data to be regridded, so eliminating unnecessary variables produces a snappier response.
Regrid multiple input files with a single mapfile map.nc and write the output to the regrid directory:
ncremap -m map.nc -I drc_in -O regrid
ls drc_in/*.nc | ncremap -m map.nc -O regrid
The three ways NCO obtains the destination spatial grid are, in decreasing order of precedence, from map_fl (specified with ‘-m’), from grd_dst (specified with ‘-g’), and (inferred) from dst_fl (specified with ‘-d’). In the first example all likely data files from drc_in are regridded using the same specified mapfile, map_fl = map.nc. Each output file is written to drc_out = regrid with the same name as the corresponding input file. The second example obtains the input file list from standard input, and uses the mapfile and output directory as before.
If multiple input files are on the same grid, yet the mapfile does not exist in advance, one can still regrid all input files without incurring the time-penalty of generating multiple mapfiles. To do so, provide the (known-in-advance) source gridfile or toggle the ‘-M’ switch:
ncremap -M -I drc_in -d dst.nc -O regrid
ls drc_in/*.nc | ncremap -M -d dst.nc -O regrid
ncremap -I drc_in -s grd_src.nc -d dst.nc -O regrid
ls drc_in/*.nc | ncremap -s grd_src.nc -d dst.nc -O regrid
ncremap -I drc_in -s grd_src.nc -g grd_dst.nc -O regrid
ls drc_in/*.nc | ncremap -s grd_src.nc -g grd_dst.nc -O regrid
The first two examples explicitly toggle the multi-map-generation switch
(with ‘-M’), so that ncremap
refrains from generating
multiple mapfiles.
In this case the source grid is inferred from the first input file,
the destination grid is inferred from dst.nc, and
ncremap
uses ERWG to generate a single mapfile and
uses that to regrid every input file.
The next four examples are variants on this theme.
In these cases, the user provides (with ‘-s grd_src.nc’) the source
gridfile, which will be used directly instead of being inferred.
Any of these styles works well when each input file is known in advance
to be on the same grid, e.g., model data for successive time periods in
a simulation.
The most powerful, time-consuming (yet simultaneously time-saving!)
feature of ncremap
is its ability to regrid multiple input
files on unique grids.
Both input and output can be on any CRUD grid.
ncremap -I drc_in -d dst.nc -O regrid
ls drc_in/*.nc | ncremap -d dst.nc -O regrid
ncremap -I drc_in -g grd_dst.nc -O regrid
ls drc_in/*.nc | ncremap -g grd_dst.nc -O regrid
There is no pre-supplied map_fl or grd_src in these
examples, so ncremap
first infers the output grid from
dst.nc (first two examples), or directly uses the supplied
gridfile grd_dst (second two examples), and calls ERWG
to generate a new mapfile for each input file, whose grid it infers.
This is necessary when each input file is on a unique grid, e.g.,
swath-like data from satellite observations or models with time-varying
grids.
These examples require remarkably little input, since ncremap
automates most of the work.
Finally, ncremap
uses the parallelization options
‘-p par_typ’ and ‘-j job_nbr’ to help manage
high-volume workflow.
On a single node such as a local workstation, use Background mode
to regrid multiple files in parallel:
ls drc_in/*.nc | ncremap -p bck -d dst.nc -O regrid
ls drc_in/*.nc | ncremap -p bck -j 4 -d dst.nc -O regrid
Both examples will eventually regrid all input files.
The first example regrids two at a time because two is the default
batch size ncremap
employs.
The second example regrids files in batches of four at a time.
Increasing job_nbr will increase throughput so long as the node
is not I/O-limited.
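The batching behavior described above amounts to splitting the input file list into consecutive groups of job_nbr files. A minimal Python sketch of that partitioning follows; the function name batches and the file names are hypothetical, and the sketch does not reproduce how ncremap launches or waits for jobs:

```python
def batches(fl_lst, job_nbr=2):
    """Split fl_lst into consecutive batches of at most job_nbr files."""
    if job_nbr < 1:
        raise ValueError("job_nbr must be at least 1")
    return [fl_lst[idx:idx + job_nbr] for idx in range(0, len(fl_lst), job_nbr)]

fl_lst = [f"in_{idx}.nc" for idx in range(5)]  # hypothetical input files
for btc in batches(fl_lst, job_nbr=4):
    print(btc)
```

With five input files, the default batch size of two yields three batches, while a batch size of four yields two.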
Multi-node clusters can exploit inter-node parallelism in MPI-mode:
qsub -I -A CLI115 -V -l nodes=4 -l walltime=03:00:00 -N ncremap
ls drc_in/*.nc | ncremap -p mpi -j 4 -d dst.nc -O regrid
This example shows a typical request for four compute nodes.
After receiving the login prompt from the interactive master node,
execute the ncremap
command with ‘-p mpi’.
ncremap
will send regridding jobs in round-robin fashion
to all available compute nodes until all jobs finish.
It does this by internally prepending an MPI execution
command, like ‘mpirun -H node_name -npernode 1 -n 1’,
to the usual regridding command.
MPI-mode typically has excellent scaling because most
nodes have independent access to hard storage.
This is the easiest way to speed your cumbersome job by factors
of ten or more.
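The round-robin dispatch can be pictured with a short Python sketch. It is a conceptual illustration only: the function names and node names are hypothetical, and the only detail taken from ncremap is the form of the MPI prefix quoted above.

```python
from itertools import cycle

def assign_round_robin(fl_lst, node_lst):
    """Pair each input file with a node, cycling through node_lst in order."""
    return list(zip(fl_lst, cycle(node_lst)))

def mpi_prefix(node_name):
    """MPI launch prefix of the form quoted in the text above."""
    return f"mpirun -H {node_name} -npernode 1 -n 1"

node_lst = ["node0", "node1", "node2", "node3"]  # hypothetical host names
fl_lst = [f"in_{idx}.nc" for idx in range(6)]    # hypothetical input files
for fl, node in assign_round_robin(fl_lst, node_lst):
    print(mpi_prefix(node), "ncremap ...", fl)
```

With six files and four nodes, the fifth and sixth files wrap around to the first and second nodes.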
As mentioned above under Limitations, parallelism is currently only
supported when all regridding uses the same map-file.
ncrename
netCDF Renamer
SYNTAX
ncrename [-a old_name,new_name] [-a ...] [-D dbg] [-d old_name,new_name] [-d ...] [-g old_name,new_name] [-g ...] [--glb ...] [-H] [-h] [--hdf] [--hdr_pad nbr] [--hpss] [-l path] [-O] [-o output-file] [-p path] [-R] [-r] [-v old_name,new_name] [-v ...] input-file [[output-file]]
DESCRIPTION
ncrename
renames netCDF dimensions, variables, attributes, and
groups.
Each object that has a name in the list of old names is renamed using
the corresponding name in the list of new names.
All the new names must be unique.
Every old name must exist in the input file, unless the old name is
preceded by the period (or “dot”) character ‘.’.
The validity of old_name is not checked prior to the renaming.
Thus, if old_name is specified without the ‘.’ prefix that
indicates the presence of old_name is optional, and old_name
is not present in input-file, then ncrename
will abort.
The new_name should never be prefixed by a ‘.’ (or else the
period will be included as part of the new name).
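The role of the ‘.’ prefix can be modeled in a few lines of Python. This sketch operates on a plain list of names rather than a netCDF file, and the function name rename_objects is hypothetical; it only mirrors the rule that un-prefixed old names must exist while prefixed ones are optional:

```python
def rename_objects(names, rename_lst):
    """Apply (old_name, new_name) pairs to a list of object names.

    An old_name with a leading '.' is optional: it is skipped when absent.
    An old_name without the prefix must exist, otherwise KeyError is raised.
    """
    result = list(names)
    for old_name, new_name in rename_lst:
        optional = old_name.startswith(".")
        key = old_name[1:] if optional else old_name
        if key in result:
            result[result.index(key)] = new_name
        elif not optional:
            raise KeyError(f"{key} not found")
    return result

# p must exist; t is optional and is silently skipped because it is absent
print(rename_objects(["p", "u"], [("p", "pressure"), (".t", "temperature")]))
```

Dropping the ‘.’ from the second pair would raise an error in the sketch, which corresponds to ncrename aborting.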
As of NCO version 4.4.6 (released October, 2014), the
old_name and new_name arguments may include (or be, for
groups) partial or full group paths.
The OPTIONS and EXAMPLES show how to select specific variables
whose attributes are to be renamed.
Caveat lector: Unfortunately from 2007–present (August, 2023) the
netCDF library (versions 4.0.0–4.9.3) contains bugs or limitations
that sometimes prevent NCO from correctly renaming coordinate
variables, dimensions, and groups in netCDF4 files.
(To our knowledge the netCDF library calls for renaming always work
well on netCDF3 files, so one workaround to many netCDF4 issues is
to convert to netCDF3, rename, then convert back).
To understand the renaming limitations associated with particular
netCDF versions, read the |
Although ncrename
supports full pathnames for both
old_name and new_name, this is really “window dressing”.
The full-path to new_name must be identical to the full-path to
old_name in all classes of objects (attributes, variables,
dimensions, or groups).
In other words, ncrename
can change only the local names
of objects; it cannot change the location of the object in the group
hierarchy within the file.
Hence using a full-path in new_name is redundant.
The object name is the terminal path component of new_name and
this object must already exist in the group specified by the
old_name path.
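The path constraint can be expressed as a simple check. The Python sketch below is an illustration of the rule, not code from ncrename; the function name is_valid_rename is hypothetical, and the sketch treats a new_name without any group path as always acceptable:

```python
import posixpath

def is_valid_rename(old_name, new_name):
    """True when new_name keeps the object in the group named by old_name."""
    grp_new = posixpath.dirname(new_name)
    # A bare new_name carries no group path and is always acceptable
    return grp_new == "" or grp_new == posixpath.dirname(old_name)

print(is_valid_rename("/g1/lon", "longitude"))      # local name only
print(is_valid_rename("/g1/lon", "/g1/longitude"))  # same group, redundant path
print(is_valid_rename("/g1/lon", "/g2/longitude"))  # would move the object
```

Only the third case is rejected, because it would relocate the variable to a different group.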
ncrename
is an exception to the normal NCO rule that
the user will be interactively prompted before an existing file is
changed, and that a temporary copy of an output file is constructed
during the operation.
If only input-file is specified, then ncrename
changes
object names in the input-file in place without prompting and
without creating a temporary copy of input-file
.
This is because the renaming operation is considered reversible if the
user makes a mistake.
The new_name can easily be changed back to old_name by using
ncrename
one more time.
Note that renaming a dimension to the name of a dependent variable can be used to invert the relationship between an independent coordinate variable and a dependent variable. In this case, the named dependent variable must be one-dimensional and should have no missing values. Such a variable will become a coordinate variable.
According to the netCDF User Guide, renaming objects in netCDF
files does not incur the penalty of recopying the entire file when the
new_name is shorter than the old_name.
Thus ncrename
may run much faster (at least on netCDF3 files)
if judicious use of header padding (see Metadata Optimization) was
made when producing the input-file.
Similarly, using the ‘--hdr_pad’ option with ncrename
helps ensure that future metadata changes to output-file occur
as swiftly as possible.
OPTIONS
Attribute renaming.
The old and new names of the attribute are specified with ‘-a’
(or ‘--attribute’) by the associated old_name and
new_name values.
Global attributes are treated no differently than variable attributes.
This option may be specified more than once.
As mentioned above, all occurrences of the attribute of a given name
will be renamed unless the ‘.’ form is used, with one exception.
To change the attribute name for a particular variable, specify
the old_name in the format old_var_name@old_att_name.
The ‘@’ symbol delimits the variable from the attribute name.
If the attribute is uniquely named (no other variables contain the
attribute) then the old_var_name@old_att_name syntax is
redundant.
The old_var_name variable names global
and group
have special significance.
They indicate that old_att_name should only be renamed where it
occurs as a global (i.e., root group) metadata attribute (for
global
), or (for group
) as any group attribute, and
not where it occurs as a variable attribute.
The var_name@att_name syntax is accepted, though not required,
for the new_name.
Dimension renaming. The old and new names of the dimension are specified with ‘-d’ (or ‘--dmn’, ‘--dimension’) by the associated old_name and new_name values. This option may be specified more than once.
Group renaming. The old and new names of the group are specified with ‘-g’ (or ‘--grp’, ‘--group’) by the associated old_name and new_name values. This option may be specified more than once. This functionality is only available in NCO version 4.3.7 (October, 2013) or later, and only when built on netCDF library version 4.3.1-rc1 (August, 2013) or later.
Variable renaming. The old and new names of the variable are specified with ‘-v’ (or ‘--variable’) by the associated old_name and new_name values. This option may be specified more than once.
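The old_var_name@old_att_name syntax described under attribute renaming, together with the optional ‘.’ prefixes, can be decomposed as shown in the following Python sketch. The function name parse_att_old_name is hypothetical, and the sketch only splits the string; it does not interpret the special names global and group:

```python
def parse_att_old_name(old_name):
    """Split an attribute old_name into (var_name, att_name, var_opt, att_opt).

    var_name is None when old_name has no '@' delimiter.
    var_opt and att_opt record whether each part carried the '.' prefix.
    """
    var_part, sep, att_part = old_name.rpartition("@")
    if not sep:
        var_name, var_opt = None, False
    else:
        var_opt = var_part.startswith(".")
        var_name = var_part[1:] if var_opt else var_part
    att_opt = att_part.startswith(".")
    att_name = att_part[1:] if att_opt else att_part
    return var_name, att_name, var_opt, att_opt

for spec in ("u@long_name", ".units", "prs_sfc@.hieght", ".global@.Convention"):
    print(spec, parse_att_old_name(spec))
```

The tuple makes explicit which parts of the specification must be present for the rename to succeed.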
EXAMPLES
Rename the variable p
to pressure
and t
to
temperature
in netCDF in.nc.
In this case p
must exist in the input file (or
ncrename
will abort), but the presence of t
is optional:
ncrename -v p,pressure -v .t,temperature in.nc
Rename the attribute long_name
to largo_nombre
in the
variable u
, and no other variables in netCDF in.nc.
ncrename -a u@long_name,largo_nombre in.nc
Rename the group g8
to g20
in netCDF4 file
in_grp.nc:
ncrename -g g8,g20 in_grp.nc
Rename the variable /g1/lon
to longitude
in netCDF4
in_grp.nc:
ncrename -v /g1/lon,longitude in_grp.nc
ncrename -v /g1/lon,/g1/longitude in_grp.nc # Alternate
ncrename
does not automatically attach dimensions to variables of
the same name.
This is done to make renaming an easy way to change whether a variable
is a coordinate.
If you want to rename a coordinate variable so that it remains a
coordinate variable, you must separately rename both the dimension and
the variable:
ncrename -d lon,longitude -v lon,longitude in.nc
Unfortunately, the netCDF4 library had a longstanding bug (all versions
until 4.3.1-rc5 released in December, 2013) that crashed NCO
when performing this operation.
Simultaneously renaming variables and dimensions in netCDF4 files with
earlier versions of netCDF is impossible; it must instead be done in two
separate ncrename
invocations (e.g., first rename the
variable, then rename the dimension) to avoid triggering the library
bug.
A related bug causes unintended side-effects with ncrename
when built with any version of the netCDF4 library until 4.3.1-rc5
(released in December, 2013).
This bug caused renaming either a dimension or its
associated coordinate variable (not both, which would fail as above) in
a netCDF4 file to inadvertently rename both:
# Demonstrate bug in netCDF4/HDF5 library prior to netCDF-4.3.1-rc5
ncks -O -h -m -M -4 -v lat_T42 ~/nco/data/in.nc ~/foo.nc
ncrename -O -v lat_T42,lat ~/foo.nc ~/foo2.nc # Also renames dimension
ncrename -O -d lat_T42,lat ~/foo.nc ~/foo2.nc # Also renames variable
To avoid this faulty behavior, either build NCO with netCDF version 4.3.1-rc5 or later, or convert the file to netCDF3 first, then rename as intended, then convert back. Unfortunately, while this bug and the related coordinate renaming bug were fixed in 4.3.1-rc5 (released in December, 2013), a new and related bug was discovered in October 2014.
Another netCDF4 bug that causes unintended side-effects with
ncrename
affects (at least) versions 4.3.1–4.3.2 and all
snapshots of the netCDF4 library until January, 2015.
This bug (fixed in 4.3.3 in February, 2015) corrupts values of renamed
netCDF4 coordinate variables (i.e., variables with underlying dimensions
of the same name) and other (non-coordinate) variables that include an
underlying dimension that was renamed.
In other words, renaming coordinate variables and dimensions
succeeds yet it corrupts the values contained by the affected array
variables.
This bug corrupts affected variables by replacing their values with the
default _FillValue
for that variable’s type:
# Demonstrate bug in netCDF4 libraries prior to version 4.3.3
ncks -O -4 -C -M -v lat ~/nco/data/in.nc ~/bug.nc
ncrename -O -v lat,tal ~/bug.nc ~/foo.nc # Broken until netCDF-4.3.3
ncrename -O -d lat,tal ~/bug.nc ~/foo.nc # Broken until netCDF-4.3.3
ncrename -O -d lat,tal -v lat,tal ~/bug.nc ~/foo.nc # Broken too
ncks ~/foo.nc
To avoid this faulty behavior, either build NCO with netCDF version 4.3.3 or later, or convert the file to netCDF3 first, then rename as intended, then convert back. This bug does not affect renaming of groups or of attributes.
Yet another netCDF4 bug that causes unintended side-effects with
ncrename
affects only snapshots from January–February, 2015,
and released version 4.3.3 (February, 2015).
It was fixed in (and was the reason for releasing) netCDF version
4.3.3.1 (March, 2015).
This bug causes renamed attributes of coordinate variables in netCDF4
files to disappear:
# Demonstrate bug in netCDF4 library version 4.3.3
ncrename -O -h -a /g1/lon@units,new_units ~/nco/data/in_grp.nc ~/foo.nc
ncks -v /g1/lon ~/foo.nc # Shows units and new_units are both gone
Clearly, renaming coordinates in netCDF4 files is non-trivial.
The penultimate chapter in this saga is a netCDF4 bug discovered in
September, 2015, and present in versions 4.3.3.1 (and possibly earlier
versions too) and later.
As of this writing (February, 2018), this bug is still present in netCDF4
version 4.6.0.1-development.
This bug causes ncrename
to create corrupted output files
when attempting to rename two or more dimensions simultaneously.
The workaround is to rename the dimensions sequentially, in separate
ncrename
calls.
# Demonstrate bug in netCDF4 library versions 4.3.3.1--4.6.1+
ncrename -O -d lev,z -d lat,y -d lon,x ~/nco/data/in_grp.nc ~/foo.nc # Completes but file is unreadable
ncks -v one ~/foo.nc # File is unreadable (multiple dimensions with same ID?)
A new netCDF4 renaming bug was discovered in March, 2017.
It is present in versions 4.4.1–4.6.0 (and possibly earlier versions).
This bug was fixed in netCDF4 version 4.6.1 (Yay Ed!).
This bug caused ncrename
to fail to rename a variable when the
result would become a coordinate.
# Demonstrate bug in netCDF4 library versions 4.4.1--4.6.0
ncrename -O -v non_coord,coord ~/nco/data/in_grp.nc ~/foo.nc # Fails (HDF error)
The fix is to upgrade to netCDF version 4.6.1. The workaround is to convert to netCDF3, then rename, then convert back to netCDF4.
A potentially new netCDF4 bug was discovered in November, 2017 and is
now fixed.
It is present in versions 4.4.1.1–4.6.0 (and possibly earlier versions too).
This bug causes ncrename
to fail to rename a variable when the
result would become a coordinate.
Oddly this issue shows that simultaneously renaming a dimension and
coordinate can succeed (in contrast to a bug described above), and that
separating that into two steps can fail.
# Demonstrate bug in netCDF4 library versions 4.4.1--4.6.0
# 20171107: https://github.com/Unidata/netcdf-c/issues/597
# Create test dataset
ncks -O -C -v lon ~/nco/data/in_grp.nc ~/in_grp.nc
ncks -O -x -g g1,g2 ~/in_grp.nc ~/in_grp.nc
# Rename dimension then variable
ncrename -d lon,longitude ~/in_grp.nc # works
ncrename -v lon,longitude ~/in_grp.nc # borken "HDF error"
# Rename variable then dimension
ncrename -v lon,longitude ~/in_grp.nc # works
ncrename -d lon,longitude ~/in_grp.nc # borken "nc4_reform_coord_var: Assertion `dim_datasetid > 0' failed."
# Oddly renaming both simultaneously works:
ncrename -d lon,longitude -v lon,longitude ~/in_grp.nc # works
The fix is to upgrade to netCDF version 4.6.1. The workaround is to convert to netCDF3, then rename, then convert back to netCDF4.
A new netCDF3 bug was discovered in April, 2018 and is now fixed.
It is present in netCDF versions 4.4.1–4.6.0 (and possibly earlier versions too).
This bug caused ncrename
to fail to rename many coordinates
and dimensions simultaneously.
This bug affects netCDF3 64BIT_OFFSET
files and possibly other
formats as well.
As such it is the first and so far only bug we have identified that
affects netCDF3 files.
cp /glade/scratch/gus/GFDL/exp/CM3_test/pp/0001/0001.land_month_crop.AllD.nc ~/correa_in.nc
ncrename -O -d grid_xt,lon -d grid_yt,lat -v grid_xt,lon -v grid_yt,lat \
  -v grid_xt_bnds,lon_bnds -v grid_yt_bnds,lat_bnds ~/correa_in.nc ~/correa_out.nc
The fix is to upgrade to netCDF version 4.6.1.
Create netCDF out.nc identical to in.nc except the
attribute _FillValue
is changed to missing_value
,
the attribute units
is changed to CGS_units
(but only in
those variables which possess it), the attribute hieght
is
changed to height
in the variable tpt
, and in the
variable prs_sfc
, if it exists.
ncrename -a _FillValue,missing_value -a .units,CGS_units \
  -a tpt@hieght,height -a prs_sfc@.hieght,height in.nc out.nc
The presence and absence of the ‘.’ and ‘@’ features
cause this command to execute successfully only if a number of
conditions are met.
All variables must have a _FillValue
attribute and
_FillValue
must also be a global attribute.
The units
attribute, on the other hand, will be renamed to
CGS_units
wherever it is found but need not be present in
the file at all (either as a global or a variable attribute).
The variable tpt
must contain the hieght
attribute.
The variable prs_sfc
need not exist, and need not contain the
hieght
attribute.
Rename the global or group attribute Convention
to
Conventions
ncrename -a Convention,Conventions in.nc # Variable and group atts.
ncrename -a .Convention,Conventions in.nc # Variable and group atts.
ncrename -a @Convention,Conventions in.nc # Group atts. only
ncrename -a @.Convention,Conventions in.nc # Group atts. only
ncrename -a group@Convention,Conventions in.nc # Group atts. only
ncrename -a .group@.Convention,Conventions in.nc # Group atts. only
ncrename -a global@Convention,Conventions in.nc # Global atts. only
ncrename -a .global@.Convention,Conventions in.nc # Global atts. only
The examples without the @
character attempt to change the
attribute name in both Global or Group and variable attributes.
The examples with the @
character attempt to change only
global and group Convention
attributes, and leave unchanged any
Convention
attributes attached directly to variables.
Attributes prefixed with a period (.Convention
) need not be
present.
Attributes not prefixed with a period (Convention
) must be
present.
Variables prefixed with a period (.
or .global
) need not
be present.
Variables not prefixed with a period (global
) must be present.
ncwa
netCDF Weighted Averager
SYNTAX
ncwa [-3] [-4] [-5] [-6] [-7] [-A] [-a dim[,...]] [-B mask_cond] [-b] [-C] [-c] [--cmp cmp_sng] [--cnk_byt sz_byt] [--cnk_csh sz_byt] [--cnk_dmn nm,sz_lmn] [--cnk_map map] [--cnk_min sz_byt] [--cnk_plc plc] [--cnk_scl sz_lmn] [-D dbg] [-d dim,[min][,[max][,[stride]]] [-F] [--fl_fmt fl_fmt] [-G gpe_dsc] [-g grp[,...]] [--glb ...] [-H] [-h] [--hdr_pad nbr] [--hpss] [-I] [-L dfl_lvl] [-l path] [-M mask_val] [-m mask_var] [-N] [--no_cll_msr] [--no_cll_mth] [--no_frm_trm] [--no_tmp_fl] [-O] [-o output-file] [-p path] [--qnt ...] [--qnt_alg alg_nm] [-R] [-r] [--ram_all] [--rth_dbl|flt] [-T mask_comp] [-t thr_nbr] [--unn] [-v var[,...]] [-w weight] [-X ...] [-x] [-y op_typ] input-file [output-file]
DESCRIPTION
ncwa
performs statistics (including, but not limited to,
averages) on variables in a single file over arbitrary dimensions, with
options to specify weights, masks, and normalization.
See Statistics vs Concatenation, for a description of the
distinctions between the various statistics tools and concatenators.
The default behavior of ncwa
is to arithmetically average
every numerical variable over all dimensions and to produce a scalar
result for each.
Averaged dimensions are, by default, eliminated as dimensions.
Their corresponding coordinates, if any, are output as scalar
variables.
The ‘-b’ switch (and its long option equivalents ‘--rdd’ and
‘--retain-degenerate-dimensions’) causes ncwa
to retain
averaged dimensions as degenerate (size 1) dimensions.
This maintains the association between a dimension (or coordinate) and
variables after averaging and simplifies, for instance, later
concatenation along the degenerate dimension.
To average variables over only a subset of their dimensions, specify
these dimensions in a comma-separated list following ‘-a’, e.g.,
‘-a time,lat,lon’.
As with all arithmetic operators, the operation may be restricted to
an arbitrary hyperslab by employing the ‘-d’ option
(see Hyperslabs).
ncwa
also handles values matching the variable’s
_FillValue
attribute correctly.
Moreover, ncwa
understands how to manipulate user-specified
weights, masks, and normalization options.
With these options, ncwa
can compute sophisticated averages
(and integrals) from the command line.
mask_var and weight, if specified, are broadcast to conform
to the variables being averaged.
The rank of variables is reduced by the number of dimensions which they
are averaged over.
Thus arrays which are one dimensional in the input-file and are
averaged by ncwa
appear in the output-file as scalars.
This allows the user to infer which dimensions may have been averaged.
Note that it is impossible for ncwa
to make a
weight or mask_var of rank W conform to a var of
rank V if W > V.
This situation often arises when coordinate variables (which, by
definition, are one dimensional) are weighted and averaged.
ncwa
assumes you know this is impossible and so ncwa
does not attempt to broadcast weight or mask_var to conform
to var in this case, nor does ncwa
print a warning
message telling you this, because it is so common.
Specifying dbg > 2 does cause ncwa
to emit warnings in
these situations, however.
Non-coordinate variables are always masked and weighted if specified.
Coordinate variables, however, may be treated specially.
By default, an averaged coordinate variable, e.g., latitude,
appears in output-file averaged the same way as any other variable
containing an averaged dimension.
In other words, by default ncwa
weights and masks
coordinate variables like all other variables.
This design decision was intended to be helpful but for some
applications it may be preferable not to weight or mask coordinate
variables just like all other variables.
Consider the following arguments to ncwa
:
-a latitude -w lat_wgt -d latitude,0.,90.
where lat_wgt is a weight in the latitude dimension.
Since, by default, ncwa weights coordinate variables, the
value of latitude in the output-file depends on the weights
in lat_wgt and is not likely to be 45.0, the midpoint latitude
of the hyperslab.
Option ‘-I’ overrides this default behavior and causes
ncwa not to weight or mask coordinate variables (footnote 92).
In the above case, this causes the value of latitude
in the
output-file to be 45.0, an appealing result.
Thus, ‘-I’ specifies simple arithmetic averages for the coordinate
variables.
In the case of latitude, ‘-I’ specifies that you prefer to archive
the arithmetic mean latitude of the averaged hyperslabs rather than the
area-weighted mean latitude (footnote 93).
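The difference between the default (weighted) coordinate average and the ‘-I’ (arithmetic) coordinate average can be reproduced without any netCDF file. The sketch below assumes latitudes from 0 to 90 in 10-degree steps and uses cos(latitude) as a stand-in for lat_wgt; both choices are illustrative assumptions, not values taken from in.nc.

```shell
# Sketch only: compare arithmetic and weighted mean latitude.
# Assumed grid: 0, 10, ..., 90 degrees; assumed weight: cos(latitude).
mean_latitudes() {
  awk 'BEGIN {
    pi = atan2(0, -1)
    for (lat = 0; lat <= 90; lat += 10) {
      w = cos(lat * pi / 180)      # stand-in for lat_wgt
      n++; sum += lat              # arithmetic mean, as with -I
      wsum += w; wlat += w * lat   # weighted mean, the default behavior
    }
    printf "arithmetic=%.1f weighted=%.1f\n", sum / n, wlat / wsum
  }'
}

mean_latitudes
```

The arithmetic mean is 45.0, the midpoint of the hyperslab, while the weighted mean is pulled toward the equator where the weights are largest.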
As explained in Operation Types, ncwa
always averages coordinate variables regardless of the arithmetic
operation type performed on the non-coordinate variables.
This is independent of the setting of the ‘-I’ option.
The mathematical definition of operations involving rank reduction
is given above (see Operation Types).
The mask condition has the syntax mask_var mask_comp mask_val. The preferred method to specify the mask condition is in one string with the ‘-B’ or ‘--mask_condition’ switches. The older method is to use the three switches ‘-m’, ‘-T’, and ‘-M’ to specify the mask_var, mask_comp, and mask_val, respectively (footnote 94). The mask_condition string is automatically parsed into its three constituents mask_var, mask_comp, and mask_val.
Here mask_var is the name of the masking variable (specified with ‘-m’, ‘--mask-variable’, ‘--mask_variable’, ‘--msk_nm’, or ‘--msk_var’). The comparator mask_comp (specified with ‘-T’, ‘--mask_comparator’, ‘--msk_cmp_typ’, or ‘--op_rlt’) may be any one of the six arithmetic comparators: eq, ne, gt, lt, ge, le. These are the Fortran-style character abbreviations for the logical comparisons ==, !=, >, <, >=, <=. The mask comparator defaults to eq (equality). The mask_val argument to ‘-M’ (or ‘--mask-value’, or ‘--msk_val’) is the right-hand side of the mask condition. Thus for the i’th element of the hyperslab to be averaged, the mask condition is mask(i) mask_comp mask_val.
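The element-wise test mask(i) mask_comp mask_val can be sketched in a few lines of shell. The helper below, mask_is_true, is hypothetical and only illustrates how the six Fortran-style comparators map onto numeric comparisons; it does not reproduce how ncwa evaluates masks internally.

```shell
# Hypothetical helper: succeed (exit 0) when "value mask_comp mask_val" holds.
# $1 = mask(i), $2 = comparator (eq ne gt lt ge le), $3 = mask_val
mask_is_true() {
  awk -v v="$1" -v cmp="$2" -v t="$3" 'BEGIN {
    v += 0; t += 0                       # force numeric comparison
    if      (cmp == "eq") r = (v == t)
    else if (cmp == "ne") r = (v != t)
    else if (cmp == "gt") r = (v >  t)
    else if (cmp == "lt") r = (v <  t)
    else if (cmp == "ge") r = (v >= t)
    else if (cmp == "le") r = (v <= t)
    else exit 2                          # unknown comparator
    exit !r
  }'
}

# Which sample ORO values would satisfy the condition ORO < 0.5?
for oro in 0.0 0.5 1.0; do
  if mask_is_true "$oro" lt 0.5; then
    echo "ORO=$oro contributes to the average"
  else
    echo "ORO=$oro is masked out"
  fi
done
```

Only the first sample value passes, because lt is a strict comparison.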
ncwa
has one switch which controls the normalization of the
averages appearing in the output-file.
Short option ‘-N’ (or long options ‘--nmr’ or
‘--numerator’) prevents ncwa
from dividing the weighted
sum of the variable (the numerator in the averaging expression) by the
weighted sum of the weights (the denominator in the averaging
expression).
Thus ‘-N’ tells ncwa
to return just the numerator of the
arithmetic expression defining the operation (see Operation Types).
With this normalization option, ncwa
can integrate variables.
Averages are first computed as sums, and then normalized to obtain the
average.
The original sum (i.e., the numerator of the expression in
Operation Types) is output if default normalization is turned off
(with ‘-N’).
This sum is the integral (not the average) over the specified
(with ‘-a’, or all, if none are specified) dimensions.
The weighting variable, if specified (with ‘-w’), plays the
role of the differential increment and thus permits more sophisticated
integrals (i.e., weighted sums) to be output.
For example, consider the variable lev where lev = [100,500,1000], weighted by
the weight lev_wgt where lev_wgt = [10,2,1].
The vertical integral of lev, weighted by lev_wgt,
is the dot product of lev and lev_wgt.
That this is 3000.0 can be seen by inspection and verified with
the integration command
ncwa -N -a lev -v lev -w lev_wgt in.nc foo.nc;ncks foo.nc
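The inspection step can also be written out explicitly. The following sketch computes the same dot product with awk; weighted_sum is a hypothetical helper used only to show the arithmetic that ‘-N’ leaves un-normalized.

```shell
# Sketch: the numerator that ncwa -N reports is a weighted sum (dot product).
# $1 = space-separated values, $2 = space-separated weights
weighted_sum() {
  awk -v vals="$1" -v wgts="$2" 'BEGIN {
    n = split(vals, v, " ")
    split(wgts, w, " ")
    for (i = 1; i <= n; i++) sum += v[i] * w[i]
    printf "%.1f\n", sum
  }'
}

weighted_sum "100 500 1000" "10 2 1"   # prints 3000.0
```

Each term contributes 1000 (100×10, 500×2, 1000×1), which is why the result is easy to check by eye.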
EXAMPLES
Given file 85_0112.nc:
netcdf 85_0112 {
dimensions:
        lat = 64 ;
        lev = 18 ;
        lon = 128 ;
        time = UNLIMITED ; // (12 currently)
variables:
        float lat(lat) ;
        float lev(lev) ;
        float lon(lon) ;
        float time(time) ;
        float scalar_var ;
        float three_dmn_var(lat, lev, lon) ;
        float two_dmn_var(lat, lev) ;
        float mask(lat, lon) ;
        float gw(lat) ;
}
Average all variables in in.nc over all dimensions and store results in out.nc:
ncwa in.nc out.nc
All variables in in.nc are reduced to scalars in out.nc
since ncwa
averages over all dimensions unless otherwise
specified (with ‘-a’).
Store the zonal (longitudinal) mean of in.nc in out.nc:
ncwa -a lon in.nc out1.nc
ncwa -a lon -b in.nc out2.nc
The first command turns lon
into a scalar and the second retains
lon
as a degenerate dimension in all variables.
% ncks --trd -C -H -v lon out1.nc
lon = 135
% ncks --trd -C -H -v lon out2.nc
lon[0] = 135
In either case the tally is simply the size of lon, i.e., 128
for the 85_0112.nc file described by the sample header above.
Compute the meridional (latitudinal) mean, with values weighted by the corresponding element of gw (footnote 95):
ncwa -w gw -a lat in.nc out.nc
Here the tally is simply the size of lat
, or 64.
The sum of the Gaussian weights is 2.0.
Compute the area mean over the tropical Pacific:
ncwa -w gw -a lat,lon -d lat,-20.,20. -d lon,120.,270. in.nc out.nc
Here the tally is 64 times 128 = 8192.
Compute the area-mean over the globe using only points for which ORO < 0.5 (footnote 96):
ncwa -B 'ORO < 0.5' -w gw -a lat,lon in.nc out.nc
ncwa -m ORO -M 0.5 -T lt -w gw -a lat,lon in.nc out.nc
It is considerably simpler to specify the complete mask_cond with the single string argument to ‘-B’ than with the three separate switches ‘-m’, ‘-T’, and ‘-M’ (footnote 97). If in doubt, enclose the mask_cond within quotes since some of the comparators have special meanings to the shell.
Assuming 70% of the gridpoints are maritime, then here the tally is 0.70 times 8192 = 5734.
Compute the global annual mean over the maritime tropical Pacific:
ncwa -B 'ORO < 0.5' -w gw -a lat,lon,time \
  -d lat,-20.0,20.0 -d lon,120.0,270.0 in.nc out.nc
ncwa -m ORO -M 0.5 -T lt -w gw -a lat,lon,time \
  -d lat,-20.0,20.0 -d lon,120.0,270.0 in.nc out.nc
Further examples will use the one-switch specification of mask_cond.
Determine the total area of the maritime tropical Pacific, assuming the variable area contains the area of each gridcell:
ncwa -N -v area -B 'ORO < 0.5' -a lat,lon \
  -d lat,-20.0,20.0 -d lon,120.0,270.0 in.nc out.nc
Weighting area (e.g., by gw) is not appropriate because area is already area-weighted by definition. Thus the ‘-N’ switch, or, equivalently, the ‘-y ttl’ switch, correctly integrates the cell areas into a total regional area.
Mask a file to contain _FillValue everywhere except where thr_min <= msk_var <= thr_max:
# Set masking variable and its scalar thresholds
export msk_var='three_dmn_var_dbl' # Masking variable
export thr_max='20' # Maximum allowed value
export thr_min='10' # Minimum allowed value
ncecat -O in.nc out.nc # Wrap out.nc in degenerate "record" dimension
ncwa -O -a record -B "${msk_var} <= ${thr_max}" out.nc out.nc
ncecat -O out.nc out.nc # Wrap out.nc in degenerate "record" dimension
ncwa -O -a record -B "${msk_var} >= ${thr_min}" out.nc out.nc
After the first use of ncwa, out.nc contains
_FillValue where ${msk_var} > ${thr_max}.
The process is then repeated on the remaining data to filter out
points where ${msk_var} < ${thr_min}.
The resulting out.nc contains valid data only
where thr_min <= msk_var <= thr_max.
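The effect of the two masking passes can be previewed on a handful of numbers. This is a sketch under stated assumptions: mask_range is a hypothetical helper, the sample values are invented, and an underscore stands in for _FillValue.

```shell
# Sketch: emulate the two-pass threshold mask on a list of sample values.
# $1 = space-separated values, $2 = thr_min, $3 = thr_max
# Prints "_" wherever the real workflow would leave _FillValue.
mask_range() {
  awk -v vals="$1" -v lo="$2" -v hi="$3" 'BEGIN {
    n = split(vals, v, " ")
    for (i = 1; i <= n; i++) {
      keep = (v[i] + 0 <= hi + 0)           # pass 1: msk_var <= thr_max
      keep = keep && (v[i] + 0 >= lo + 0)   # pass 2: msk_var >= thr_min
      printf "%s%s", (keep ? v[i] : "_"), (i < n ? " " : "\n")
    }
  }'
}

mask_range "5 10 15 20 25" 10 20   # prints: _ 10 15 20 _
```

Values equal to either threshold survive, since both comparisons are inclusive.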
We welcome contributions from anyone. The project homepage at https://sf.net/projects/nco contains more information on how to contribute.
Financial contributions to NCO development may be made through PayPal. NCO has been shared for over 10 years yet only two users have contributed any money to the developers (footnote 98). So you could be the third!
NCO would not exist without the dedicated efforts of the remarkable software engineers who conceive, develop, and maintain netCDF, UDUnits, and OPeNDAP. Since 1995 NCO has received support from nearly the entire staff of all these projects, including Russ Rew, John Caron, Glenn Davis, Steve Emmerson, Ward Fisher, James Gallagher, Ed Hartnett, and Dennis Heimbigner. In addition to their roles in maintaining the software stack on which NCO perches, Yertl-like, some of these gentlemen have advised or contributed to NCO specifically. That support is acknowledged separately below.
The primary contributors to NCO development have been:
All concept, design and implementation from 1995–2000.
Since then autotools, bug-squashing, CDL, chunking,
documentation, anchoring, recursion, GPE, packing,
regridding, CDL/XML backends, compression,
NCO library redesign, ncap2 features,
ncbo, ncpdq, SMP threading and MPI parallelization,
netCDF4 integration, external funding, project management, science
research, releases.
Non-linear operations and min(), max(), total()
support in ncra and ncwa.
Type conversion for arithmetic.
Migration to netCDF3 API.
ncap2
parser, lexer, GSL-support, and I/O.
Multislabbing algorithm.
Variable wildcarding.
JSON backend.
Numerous hacks.
ncap2
language.
Original autotools build support. Long command-line options. Original UDUnits support. Debianization. Numerous bug-fixes.
Script Workflow Analysis for MultiProcessing (SWAMP). RPM support.
Windows Visual Studio support. netCDF4 groups. CMake build-engine.
Acknowledgement via financial donations
Please let me know if your name was omitted!
The recommended citations for NCO software are
Zender, C. S. (2008), Analysis of Self-describing Gridded Geoscience Data with netCDF Operators (NCO), Environ. Modell. Softw., 23(10), 1338-1342, doi:10.1016/j.envsoft.2008.03.004.
Zender, C. S. and H. J. Mangalam (2007), Scaling Properties of Common Statistical Operators for Gridded Datasets, Int. J. High Perform. Comput. Appl., 21(4), 485-498, doi:10.1177/1094342007083802.
Zender, C. S. (2016), Bit Grooming: Statistically accurate precision-preserving quantization with compression, evaluated in the netCDF Operators (NCO, v4.4.8+), Geosci. Model Dev., 9, 3199-3211, doi:10.5194/gmd-9-3199-2016.
Zender, C. S. (Year), netCDF Operator (NCO) User Guide, http://nco.sf.net/nco.pdf.
Use the first when referring to the overall design, purpose, and optimization of NCO, the second for the speed and throughput of NCO, the third for compression, and the fourth for specific features and/or the User Guide itself, or in a non-academic setting. A complete list of NCO publications and presentations is at http://nco.sf.net#pub. This list links to the full papers and seminars themselves.
From 2004–2007, NSF funded a project to improve Distributed Data Reduction & Analysis (DDRA) by evolving NCO parallelism (OpenMP, MPI) and Server-Side DDRA (SSDDRA) implemented through extensions to OPeNDAP and netCDF4. The SSDDRA features were implemented in SWAMP, the PhD Thesis of Daniel Wang. SWAMP dramatically reduced bandwidth usage for NCO between client and server.
With this first NCO proposal funded, the content of the next NCO proposal became clear. We had long been interested in obtaining NASA support for HDF-specific enhancements to NCO. From 2012–2015 the NASA ACCESS program funded us to implement support for netCDF4 group functionality. Thus NCO will grow and evade bit-rot for the foreseeable future.
We are considering other interesting ideas for still more proposals. Please contact us if you wish to be involved with any future NCO-related proposals. Comments on the proposals and letters of support are also very welcome.
This section gives simple Bash shell scripts that show how to average data stored in different file structures. The examples compute monthly, seasonal, and annual averages from daily or monthly data held in either one file or multiple files.
Suppose we have daily data from Jan. 1, 1990 to Dec. 31, 2005 in the
file in.nc, with time as the record dimension.
for yyyy in {1990..2005}; do # Loop over years
  for moy in {1..12}; do # Loop over months
    mm=$( printf "%02d" ${moy} ) # Change to 2-digit format

    # Average specific month yyyy-mm
    ncra -O -d time,"${yyyy}-${mm}-01","${yyyy}-${mm}-31" \
      in.nc in_${yyyy}${mm}.nc
  done
done

# Concatenate monthly files together
ncrcat -O in_??????.nc out.nc
for yyyy in {1990..2005}; do # Loop over years
  ncra -O -d time,"${yyyy}-01-01","${yyyy}-12-31" in.nc in_${yyyy}.nc
done

# Concatenate annual files together
ncrcat -O in_????.nc out.nc
The -O switch overwrites any pre-existing output files (see Batch Mode).
The -d option specifies the range of the hyperslab (see Hyperslabs).
Detailed instructions are available for ncra
(see ncra netCDF Record Averager) and ncrcat
(see ncrcat netCDF Record Concatenator).
NCO supports UDUnits, so human-readable dates can be used as time-dimension limits (see UDUnits Support).
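The monthly loop above uses day 31 as the upper limit for every month. If exact month ends are preferred, the last day can be computed in the shell. The sketch below assumes a standard Gregorian calendar (model calendars such as noleap or 360_day would need different rules); days_in_month and month_hyperslab are hypothetical helper names.

```shell
# Sketch, assuming a Gregorian calendar: last day of a given month.
# $1 = four-digit year, $2 = month (1..12, optionally zero-padded)
days_in_month() {
  yyyy=$1
  mm=${2#0}   # accept "02" as well as "2"
  case $mm in
    4|6|9|11) echo 30 ;;
    2)
      if [ $((yyyy % 4)) -eq 0 ] && \
         { [ $((yyyy % 100)) -ne 0 ] || [ $((yyyy % 400)) -eq 0 ]; }
      then echo 29
      else echo 28
      fi ;;
    *) echo 31 ;;
  esac
}

# Build the text that would follow -d for one calendar month.
month_hyperslab() {
  yyyy=$1
  mm=${2#0}
  printf 'time,%04d-%02d-01,%04d-%02d-%02d\n' \
    "$yyyy" "$mm" "$yyyy" "$mm" "$(days_in_month "$yyyy" "$mm")"
}

month_hyperslab 1992 2   # prints time,1992-02-01,1992-02-29
```

The printed string can be passed to ncra after ‘-d’, with the dates quoted as in the loop above if desired.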
Suppose the input file in.nc holds monthly data, with the record dimension time
running from Jan. 1990 to Dec. 2005.
ncra -O --mro -d time,"1990-12-01",,12,3 in.nc out.nc
ncra -O --mro -d time,,,12,12 in.nc out.nc
Here we use the subcycle feature (i.e., the number after the fourth
comma: ‘3’ in the seasonal example and the second ‘12’ in
the annual example) to retrieve groups of records separated by regular
intervals (see Subcycle).
The option --mro switches ncra
to produce a
Multi-Record Output instead of a single-record output.
For example, assume snd is a 3D array with dimensions
time
* latitude
* longitude
and time
includes every month from Jan. 1990 to Dec. 2005, 192 months in total,
or 16 years.
Consider the following two command lines:
ncra --mro -v snd -d time,"1990-12-01",,12,3 in.nc out_mro.nc
ncra -v snd -d time,"1990-12-01",,12,3 in.nc out_sro.nc
In the first output file, out_mro.nc, snd is still a 3D
array with dimensions time
* latitude
* longitude
,
but the length of time
now is 16, meaning 16 winters.
In the second output file, out_sro.nc, the length of
time
is only 1, which contains the average of all 16 winters.
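The grouping implied by a stride of 12 and a subcycle of 3 can be enumerated directly. The sketch below uses 0-based record indices, so Dec. 1990 is index 11 and Dec. 2005 is index 191. The helper list_groups is hypothetical, and truncating the final group at the last record is an assumption made so that the count matches the 16 winters quoted above.

```shell
# Sketch: list the record ranges selected by "stride, subcycle" grouping.
# $1 = first index, $2 = last index, $3 = stride, $4 = subcycle length
list_groups() {
  awk -v first="$1" -v last="$2" -v stride="$3" -v ssc="$4" 'BEGIN {
    for (s = first; s <= last; s += stride) {
      e = s + ssc - 1
      if (e > last) e = last   # assumed: a trailing partial group is kept
      printf "%d-%d\n", s, e
    }
  }'
}

list_groups 11 191 12 3 | head -n 2   # prints 11-13 then 23-25
list_groups 11 191 12 3 | wc -l       # 16 groups, one per winter
```

With ‘--mro’ each group would yield one output record; without it all groups collapse into a single record.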
When using ‘-d dim,min[,max]’ to specify a hyperslab, either limit may be left blank to default to the first or last value in the data, as was done above.
When each file holds a single time point, 30 days of daily data occupy 30 files and 12 months of monthly data occupy 12 files. To process such collections, specify the file names in shell scripts and pass them to NCO operators. For example, daily data files may be named snd_19900101.nc, snd_19900102.nc, snd_19900103.nc ... The monthly average for Jan. 1990 is then obtained with
ncra -O snd_199001??.nc out.nc
You might want to use a loop if you need the average of each month.
for moy in {1..12}; do # Loop over months
  mm=$( printf "%02d" ${moy} ) # Change to 2-digit format
  ncra -O snd_????${mm}??.nc out_${mm}.nc
done
This case is similar to the previous one and is mostly a matter of shell scripting.
Suppose you have daily data, with each file holding one month.
The monthly average is obtained by applying ncra
to the corresponding file.
For seasonal averages, select the three monthly files with a shell script.
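Selecting the three monthly files for a season might look like the following sketch. The file-name pattern snd_YYYYMM.nc and the helper season_files are assumptions made for illustration; the ncra command is only echoed so the sketch runs without any data files.

```shell
# Sketch: build the file list for one December-January-February season.
# Assumed (hypothetical) naming: snd_YYYYMM.nc, one month of daily data each.
# $1 = year of the December that opens the season
season_files() {
  yyyy=$1
  printf 'snd_%04d12.nc snd_%04d01.nc snd_%04d02.nc\n' \
    "$yyyy" "$((yyyy + 1))" "$((yyyy + 1))"
}

# Dry run: show the command instead of executing it.
echo "ncra -O $(season_files 1990) out_djf_1990.nc"
```

Removing the echo would run ncra on the three files, averaging all of their records into one seasonal mean.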
The fifth phase of the Coupled Model Intercomparison Project (CMIP5) provides a multi-model framework for comparing the mechanisms and responses of climate models from around the world. However, it is a tremendous workload to retrieve a single climate statistic from all these models, each of which includes several ensemble members. Not only that, it is too often a tedious process that impedes new research and hypothesis testing. Our NASA ACCESS 2011 project simplified and accelerated this process.
Traditional geoscience data analysis requires users to work with numerous flat (data in one level or namespace) files. In that paradigm instruments or models produce, and then repositories archive and distribute, and then researchers request and analyze, collections of flat files. NCO works well with that paradigm, yet it also embodies the necessary algorithms to transition geoscience data analysis from relying solely on traditional (or “flat”) datasets to allowing newer hierarchical (or “nested”) datasets.
Hierarchical datasets support and enable combining all datastreams that meet user-specified criteria into a single or small number of files that hold all the science-relevant data. NCO (and no other software to our knowledge) exploits this capability now. Data and metadata may be aggregated into and analyzed in hierarchical structures. We call the resulting data storage, distribution, and analysis paradigm Group-Oriented Data Analysis and Distribution (GODAD). GODAD lets the scientific question organize the data, not the ad hoc granularity of all relevant datasets. This chapter illustrates GODAD techniques applied to analysis of the CMIP5 dataset.
To begin, we document below a prototypical example of CMIP5 analysis and evaluation using traditional NCO commands on netCDF3-format model and HDF-EOS format observational (NASA MODIS satellite instrument) datasets. These examples complement the NCO User Guide by detailing in-depth data analysis in a frequently encountered “real world” context. Graphical representations of the results (NCL scripts available upon request) are provided to illustrate physical meaning of the analysis. Since NCO can process hierarchical datasets, i.e., datasets stored with netCDF4 groups, we present sample scripts illustrating group-based processing as well.
Sometimes, the data of one ensemble member will be stored in several files to reduce single file size. It is more convenient to concatenate these files into a single timeseries, and the following script illustrates how. Key steps include:
ncrcat
(see ncrcat
netCDF Record Concatenator).
#!/bin/bash # shell type
shopt -s extglob # enable extended globbing

#===========================================================================
# Some of the models cut one ensemble member into several files,
#  which include data of different time periods.
# We'd better concatenate them into one at the beginning so that
#  we won't have to think about which files we need if we want
#  to retrieve a specific time period later.
#
# Method:
#  - Make sure 'time' is the record dimension (i.e., left-most)
#  - ncrcat
#
# Input files like:
# /data/cmip5/snc_LImon_bcc-csm1-1_historical_r1i1p1_185001-190012.nc
# /data/cmip5/snc_LImon_bcc-csm1-1_historical_r1i1p1_190101-200512.nc
#
# Output files like:
# /data/cmip5/snc_LImon_bcc-csm1-1_historical_r1i1p1_185001-200512.nc
#
# Online: http://nco.sourceforge.net/nco.html#Combine-Files
#
# Execute this script: bash cmb_fl.sh
#===========================================================================

drc_in='/home/wenshanw/data/cmip5/' # Directory of input files

var=( 'snc' 'snd' ) # Variables
rlm='LImon' # Realm
xpt=( 'historical' ) # Experiment ( could be more )

for var_id in {0..1}; do # Loop over two variables
  # Names of all the models (ls [get file names];
  #  cut [get model names];
  #  sort; uniq [remove duplicates]; awk [print])
  mdl_set=$( ls ${drc_in}${var[var_id]}_${rlm}_*_${xpt[0]}_*.nc | \
    cut -d '_' -f 3 | sort | uniq -c | awk '{print $2}' )
  # Number of models (echo [print contents]; wc [count])
  mdl_nbr=$( echo ${mdl_set} | wc -w )
  echo "=============================="
  echo "There are" ${mdl_nbr} "models for" ${var[var_id]}.

  for mdl in ${mdl_set}; do # Loop over models
    # Names of all the ensemble members
    nsm_set=$( ls ${drc_in}${var[var_id]}_${rlm}_${mdl}_${xpt[0]}_*.nc | \
      cut -d '_' -f 5 | sort | uniq -c | awk '{print $2}' )
    # Number of ensemble members in each model
    nsm_nbr=$( echo ${nsm_set} | wc -w )
    echo "------------------------------"
    echo "Model" ${mdl} "includes" ${nsm_nbr} "ensemble member(s):"
    echo ${nsm_set}"."

    for nsm in ${nsm_set}; do # Loop over ensemble members
      # Number of files in this ensemble member
      fl_nbr=$( ls ${drc_in}${var[var_id]}_${rlm}_${mdl}_${xpt[0]}_${nsm}_*.nc \
        | wc -w )

      # If there is only 1 file, continue to next loop
      if [ ${fl_nbr} -le 1 ]
      then
        echo "There is only 1 file in" ${nsm}.
        continue
      fi

      echo "There are" ${fl_nbr} "files in" ${nsm}.

      # Starting date of data
      #  (sed [the name of the first file includes the starting date])
      yyyymm_str=$( ls ${drc_in}${var[var_id]}_${rlm}_${mdl}_${xpt[0]}_${nsm}_*.nc\
        | sed -n '1p' | cut -d '_' -f 6 | cut -d '-' -f 1 )
      # Ending date of data
      #  (sed [the name of the last file includes the ending date])
      yyyymm_end=$( ls ${drc_in}${var[var_id]}_${rlm}_${mdl}_${xpt[0]}_${nsm}_*.nc\
        | sed -n "${fl_nbr}p" | cut -d '_' -f 6 | cut -d '-' -f 2 )

      # Concatenate one ensemble member files
      #  into one along the record dimension (now is time)
      ncrcat -O ${drc_in}${var[var_id]}_${rlm}_${mdl}_${xpt[0]}_${nsm}_*.nc \
        ${drc_in}${var[var_id]}_${rlm}_${mdl}_${xpt[0]}_\
${nsm}_${yyyymm_str}-${yyyymm_end}

      # Remove useless files
      rm ${drc_in}${var[var_id]}_${rlm}_${mdl}_${xpt[0]}_${nsm}_\
!(${yyyymm_str}-${yyyymm_end})
    done
  done
done
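The script relies on the CMIP5 file-naming convention, in which underscores separate the variable, realm, model, experiment, ensemble member, and date range. The sketch below shows that field extraction in isolation on one of the sample file names; parse_field is a hypothetical helper introduced only for this illustration.

```shell
# Sketch: pull individual fields out of a CMIP5-style file name with cut.
fl='snc_LImon_bcc-csm1-1_historical_r1i1p1_185001-190012.nc'

# Hypothetical helper: print underscore-separated field $2 of name $1.
parse_field() {
  echo "$1" | cut -d '_' -f "$2"
}

mdl=$( parse_field "$fl" 3 )                          # model name
nsm=$( parse_field "$fl" 5 )                          # ensemble member
yyyymm_str=$( parse_field "$fl" 6 | cut -d '-' -f 1 ) # starting date

echo "model=${mdl} member=${nsm} start=${yyyymm_str}"
# prints model=bcc-csm1-1 member=r1i1p1 start=185001
```

Note that the field numbers count underscores from the start of the string, so a directory path that itself contains underscores would shift them.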
CMIP5 model data downloaded from the Earth System Grid Federation (ESGF) does not contain group features yet. Therefore users must aggregate flat files into hierarchical ones themselves. The following script shows how. Each dataset becomes a group in the output file. There can be several levels of groups. In this example, we employ two experiments (“scenarios”) as the top level. The second level comprises different models (e.g., CCSM4, CESM1-BGC). Many models are run multiple times with slightly perturbed initial conditions to produce an ensemble of realizations. These ensemble members comprise the third level of the hierarchy. The script selects two variables, snc and snd (snow cover and snow depth).
#!/bin/bash
#
#============================================================
# Aggregate models to one group file
#
# Method:
# - Create files with groups by ncecat --gag
# - Append groups level by level using ncks
#
# Input files like:
# snc_LImon_CCSM4_historical_r1i1p1_199001-200512.nc
# snd_LImon_CESM1-BGC_esmHistorical_r1i1p1_199001-200512.nc
#
# Output files like:
# sn_LImon_199001-200512.nc
#
# Online: http://nco.sourceforge.net/nco.html#Combine-Files
#
# Execute this script: bash cmb_fl_grp.sh
#============================================================

# Directories
drc_in='../data/'
drc_out='../data/grp/'

# Constants
rlm='LImon' # Realm: LandIce; Time frequency: monthly
tms='200001-200512' # Timeseries
flt='nc' # File Type

# Geographical weights
# Can be skipped when ncap2 works on group data
# Loop over all snc files
for fn in $( ls ${drc_in}snc_${rlm}_*_${tms}.${flt} ); do
  ncap2 -O -s \
    'gw = float(cos(lat*3.1416/180.)); gw@long_name="geographical weight";'\
    ${fn} ${fn}
done

var=( 'snc' 'snd' )
xpt=( 'esmHistorical' 'historical' )
mdl=( 'CCSM4' 'CESM1-BGC' 'CESM1-CAM5' )

for i in {0..1}; do # Loop over variables
  for j in {0..1}; do # Loop over experiments
    for k in {0..2}; do # Loop over models
      ncecat -O --glb_mtd_spp -G ${xpt[j]}/${mdl[k]}/${mdl[k]}_ \
        ${drc_in}${var[i]}_${rlm}_${mdl[k]}_${xpt[j]}_*_${tms}.${flt} \
        ${drc_out}${var[i]}_${rlm}_${mdl[k]}_${xpt[j]}_all-nsm_${tms}.${flt}
      ncks -A \
        ${drc_out}${var[i]}_${rlm}_${mdl[k]}_${xpt[j]}_all-nsm_${tms}.${flt} \
        ${drc_out}${var[i]}_${rlm}_${mdl[0]}_${xpt[j]}_all-nsm_${tms}.${flt}
    done # Loop done: models
    ncks -A \
      ${drc_out}${var[i]}_${rlm}_${mdl[0]}_${xpt[j]}_all-nsm_${tms}.${flt} \
      ${drc_out}${var[i]}_${rlm}_${mdl[0]}_${xpt[0]}_all-nsm_${tms}.${flt}
  done # Loop done: experiments
  ncks -A \
    ${drc_out}${var[i]}_${rlm}_${mdl[0]}_${xpt[0]}_all-nsm_${tms}.${flt} \
    ${drc_out}${var[0]}_${rlm}_${mdl[0]}_${xpt[0]}_all-nsm_${tms}.${flt}
done # Loop done: variables

# Rename output file
mv ${drc_out}${var[0]}_${rlm}_${mdl[0]}_${xpt[0]}_all-nsm_${tms}.${flt} \
  ${drc_out}sn_${rlm}_all-mdl_all-xpt_all-nsm_${tms}.${flt}
# Remove temporary files
rm ${drc_out}sn?_${rlm}*.nc

#- Rename Group:
# E.g., file snc_LImon_CESM1-CAM5_historical_r1i1p1_199001-200512.nc
# is now group /historical/CESM1-CAM5/CESM1-CAM5_00.
# You can rename it to /historical/CESM1-CAM5/r1i1p1 to make more sense.
# Note: You don't need to write the full path of the new name.
ncrename -g ${xpt}/${mdl}/${mdl}_00,r1i1p1 \
  ${drc_out}${var}_${rlm}_${mdl}_all-nsm_${tms}.${flt}

#------------------------------------------------------------
# Output file structure
#------------------------------------------------------------
# esmHistorical
# {
#   CESM1-BGC
#   {
#     CESM1-BGC_00
#     {
#       snc(time, lat, lon)
#       snd(time, lat, lon)
#     }
#   }
# }
# historical
# {
#   CCSM4
#   {
#     CCSM4_00
#     {
#       snc(time, lat, lon)
#       snd(time, lat, lon)
#     }
#     CCSM4_01
#     {
#       snc(time, lat, lon)
#       snd(time, lat, lon)
#     }
#     CCSM4_02 { ... }
#     CCSM4_03 { ... }
#     CCSM4_04 { ... }
#   }
#   CESM1-BGC
#   {
#     CESM1-BGC_00 { ... }
#   }
#   CESM1-CAM5
#   {
#     r1i1p1 { ... }
#     CESM1-CAM5_01 { ... }
#     CESM1-CAM5_02 { ... }
#   }
# }
This section illustrates how to calculate the global distribution of the long-term average (see Figure 7.1) from either flat files or a group file. Key steps include:
nces
(see nces
netCDF Ensemble Statistics)
ncra
(see ncra
netCDF Record Averager)
ncecat
(see ncecat
netCDF Ensemble Concatenator) with the --gag option
The first example shows how to process flat files.
#!/bin/bash

#===========================================================================
# After cmb_fl.sh
# Example: Long-term average of each model globally
#
# Input files like:
# /data/cmip5/snc_LImon_bcc-csm1-1_historical_r1i1p1_185001-200512.nc
#
# Output files like:
# /data/cmip5/output/snc/snc_LImon_all-mdl_historical_all-nsm_clm.nc
#
# Online:
# http://nco.sourceforge.net/nco.html#Global-Distribution-of-Long_002dterm-Average
#
# Execute this script: bash glb_avg.sh
#===========================================================================

#---------------------------------------------------------------------------
# Parameters
drc_in='/home/wenshanw/data/cmip5/' # Directory of input files
drc_out='/home/wenshanw/data/cmip5/output/' # Directory of output files

var=( 'snc' 'snd' ) # Variables
rlm='LImon' # Realm
xpt=( 'historical' ) # Experiment ( could be more )

fld_out=( 'snc/' 'snd/' ) # Folders of output files
#---------------------------------------------------------------------------

for var_id in {0..1}; do # Loop over two variables
  # Names of all models
  #  (ls [get file names]; cut [get the part for model names];
  #  sort; uniq [remove duplicates]; awk [print])
  mdl_set=$( ls ${drc_in}${var[var_id]}_${rlm}_*_${xpt[0]}_*.nc | \
    cut -d '_' -f 3 | sort | uniq -c | awk '{print $2}' )
  # Number of models (echo [print contents]; wc [count])
  mdl_num=$( echo ${mdl_set} | wc -w )

  for mdl in ${mdl_set}; do # Loop over models
    # Average all the ensemble members of each model
    # Use nces file ensembles mode: --nsm_fl
    nces --nsm_fl -O -4 -d time,"1956-01-01 00:00:0.0","2005-12-31 23:59:9.9" \
      ${drc_in}${var[var_id]}_${rlm}_${mdl}_${xpt[0]}_*.nc \
      ${drc_out}${fld_out[var_id]}${var[var_id]}_${rlm}_${mdl}_${xpt[0]}\
_all-nsm_195601-200512.nc

    # Average along time
    ncra -O ${drc_out}${fld_out[var_id]}${var[var_id]}_${rlm}_${mdl}_${xpt[0]}\
_all-nsm_195601-200512.nc \
      ${drc_out}${fld_out[var_id]}${var[var_id]}_${mdl}.nc

    echo Model ${mdl} done!
  done

  # Remove temporary files
  rm ${drc_out}${fld_out[var_id]}${var[var_id]}*historical*.nc

  # Store models as groups in the output file
  ncecat -O --gag ${drc_out}${fld_out[var_id]}${var[var_id]}_*.nc \
    ${drc_out}${fld_out[var_id]}${var[var_id]}_${rlm}_\
all-mdl_${xpt[0]}_all-nsm_clm.nc

  echo Var ${var[var_id]} done!
done
With groups, the above script reduces to a single command.
# Data from cmb_fl_grp.sh
# ensemble averaging
nces -O --nsm_grp --nsm_sfx='_avg' \
  sn_LImon_all-mdl_all-xpt_all-nsm_200001-200512.nc \
  sn_LImon_all-mdl_all-xpt_nsm-avg.nc
The input file,
sn_LImon_all-mdl_all-xpt_all-nsm_200001-200512.nc, produced by
cmb_fl_grp.sh, includes all the ensemble members as groups.
The option ‘--nsm_grp’ denotes
that we are using group ensembles mode of nces
,
instead of file ensembles mode, ‘--nsm_fl’.
The option ‘--nsm_sfx='_avg'’ instructs nces
to store the output as a new child group /[model]/[model name]_avg/var;
otherwise, the output will be stored directly in the parent group /[model]/var.
In the final output file, sn_LImon_all-mdl_all-xpt_nsm-avg_tm-avg.nc,
sub-groups with a suffix of ‘avg’ are the long-term averages of each model.
One thing to notice is that for now,
ensembles with only one ensemble member will be left untouched.
This section illustrates how to calculate the annual average over specific regions (see Figure 7.2). Key steps include:
ncap2
(see ncap2
netCDF Arithmetic Processor) and ncwa
(see ncwa
netCDF Weighted Averager);
ncpdq
(see ncpdq
netCDF Permute Dimensions Quickly);
ncra
(see ncra
netCDF Record Averager);
ncbo
(see ncbo
netCDF Binary Operator);
ncbo
(see ncbo
netCDF Binary Operator) and nces
(see nces
netCDF Ensemble Statistics);
ncrename
(see ncrename
netCDF Renamer);
ncatted
(see ncatted
netCDF Attribute Editor);
ncap2
(see ncap2
netCDF Arithmetic Processor);
ncap2
(see ncap2
netCDF Arithmetic Processor) with nco script file (i.e., .nco file);
ncks
(see ncks
netCDF Kitchen Sink).
Flat files example
#!/bin/bash
# Includes gsl_rgr.nco

#===========================================================================
# After cmb_fl.sh
# Example: Annual trend of each model over Greenland and Tibet
#  ( time- and spatial-average, standard deviation,
#  anomaly and linear regression)
#
# Input files:
# /data/cmip5/snc_LImon_bcc-csm1-1_historical_r1i1p1_185001-200512.nc
#
# Output files:
# /data/cmip5/output/snc/snc_LImon_all-mdl_historical_all-nsm_annual.nc
#
# Online: http://nco.sourceforge.net/nco.html#Annual-Average-over-Regions
#
# Execute this script: bash ann_avg.sh
#===========================================================================

#---------------------------------------------------------------------------
# Parameters
drc_in='/home/wenshanw/data/cmip5/' # Directory of input files
drc_out='/home/wenshanw/data/cmip5/output/' # Directory of output files

var=( 'snc' 'snd' ) # Variables
rlm='LImon' # Realm
xpt=( 'historical' ) # Experiment ( could be more )

fld_out=( 'snc/' 'snd/' ) # Folders of output files
# ------------------------------------------------------------

for var_id in {0..1}; do # Loop over two variables
  # Names of all models
  #  (ls [get file names]; cut [get the part for model names];
  #  sort; uniq [remove duplicates]; awk [print])
  mdl_set=$( ls ${drc_in}${var[var_id]}_${rlm}_*_${xpt[0]}_*.nc | \
    cut -d '_' -f 3 | sort | uniq -c | awk '{print $2}' )

  for mdl in ${mdl_set}; do # Loop over models

    # Loop over ensemble members
    for fn in $( ls ${drc_in}${var[var_id]}_${rlm}_${mdl}_${xpt[0]}_*.nc ); do
      pfx=$( echo ${fn} | cut -d'/' -f6 | cut -d'_' -f1-5 )

      # Two regions
      # Geographical weight
      ncap2 -O -s 'gw = cos(lat*3.1415926/180.); gw@long_name="geographical weight"\
;gw@units="ratio"' ${fn} ${drc_out}${fld_out[var_id]}${pfx}_gw.nc
      # Greenland
      ncwa -O -w gw -d lat,60.0,75.0 -d lon,300.0,340.0 -a lat,lon \
        ${drc_out}${fld_out[var_id]}${pfx}_gw.nc \
        ${drc_out}${fld_out[var_id]}${pfx}_gw_1.nc
      # Tibet
      ncwa -O -w gw -d lat,30.0,40.0 -d lon,80.0,100.0 -a lat,lon \
        ${drc_out}${fld_out[var_id]}${pfx}_gw.nc \
        ${drc_out}${fld_out[var_id]}${pfx}_gw_2.nc

      # Aggregate 2 regions together
      ncecat -O -u rgn ${drc_out}${fld_out[var_id]}${pfx}_gw_?.nc \
        ${drc_out}${fld_out[var_id]}${pfx}_gw_rgn4.nc

      # Change dimensions order
      ncpdq -O -a time,rgn ${drc_out}${fld_out[var_id]}${pfx}_gw_rgn4.nc \
        ${drc_out}${fld_out[var_id]}${pfx}_gw_rgn4.nc

      # Remove temporary files (optional)
      rm ${drc_out}${fld_out[var_id]}${pfx}_gw_?.nc \
        ${drc_out}${fld_out[var_id]}${pfx}_gw.nc

      # Annual average (use the feature of 'Duration')
      ncra -O --mro -d time,"1956-01-01 00:00:0.0","2005-12-31 23:59:9.9",12,12 \
        ${drc_out}${fld_out[var_id]}${pfx}_gw_rgn4.nc \
        ${drc_out}${fld_out[var_id]}${pfx}_yrly.nc

      # Anomaly
      # Long-term average
      ncwa -O -a time ${drc_out}${fld_out[var_id]}${pfx}_yrly.nc \
        ${drc_out}${fld_out[var_id]}${pfx}_clm.nc
      # Subtract long-term average
      ncbo -O --op_typ=- ${drc_out}${fld_out[var_id]}${pfx}_yrly.nc \
        ${drc_out}${fld_out[var_id]}${pfx}_clm.nc \
        ${drc_out}${fld_out[var_id]}${pfx}_anm.nc
    done

    rm ${drc_out}${fld_out[var_id]}${var[var_id]}_${rlm}_${mdl}_${xpt[0]}_*_yrly.nc

    # Average over all the ensemble members
    ncea -O -4 ${drc_out}${fld_out[var_id]}${var[var_id]}_\
${rlm}_${mdl}_${xpt[0]}_*_anm.nc ${drc_out}${fld_out[var_id]}\
${var[var_id]}_${rlm}_${mdl}_${xpt[0]}_all-nsm_anm.nc

    # Standard deviation ------------------------------
    for fn in $( ls ${drc_out}${fld_out[var_id]}${var[var_id]}_${rlm}_${mdl}_\
${xpt[0]}_*_anm.nc ); do
      pfx=$( echo ${fn} | cut -d'/' -f8 | cut -d'_' -f1-5 )

      # Difference between each ensemble member and the average of all members
      ncbo -O --op_typ=- ${fn} \
        ${drc_out}${fld_out[var_id]}${var[var_id]}_\
${rlm}_${mdl}_${xpt[0]}_all-nsm_anm.nc \
        ${drc_out}${fld_out[var_id]}${pfx}_dlt.nc
    done

    # RMS
    ncea -O -y rmssdn ${drc_out}${fld_out[var_id]}${var[var_id]}_${rlm}_\
${mdl}_${xpt[0]}_*_dlt.nc \
      ${drc_out}${fld_out[var_id]}${var[var_id]}_${rlm}_\
${mdl}_${xpt[0]}_all-nsm_sdv.nc
    # Rename variables
    ncrename -v ${var[var_id]},sdv \
      ${drc_out}${fld_out[var_id]}${var[var_id]}_${rlm}_\
${mdl}_${xpt[0]}_all-nsm_sdv.nc
    # Edit attributions
    ncatted -a standard_name,sdv,a,c,"_standard_deviation_over_ensemble" \
      -a long_name,sdv,a,c," Standard Deviation over Ensemble" \
      -a original_name,sdv,a,c," sdv" \
      ${drc_out}${fld_out[var_id]}${var[var_id]}_${rlm}_\
${mdl}_${xpt[0]}_all-nsm_sdv.nc
    #------------------------------------------------------------

    # Linear regression -----------------------------------------
    #!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
    # Have to change the name of variable in the commands file
    #  of gsl_rgr.nco manually (gsl_rgr.nco is listed below)
    ncap2 -O -S gsl_rgr.nco \
      ${drc_out}${fld_out[var_id]}${var[var_id]}_${rlm}_\
${mdl}_${xpt[0]}_all-nsm_anm.nc ${drc_out}${fld_out[var_id]}${var[var_id]}\
_${rlm}_${mdl}_${xpt[0]}_all-nsm_anm_rgr.nc
    #!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

    # Get rid of temporary variables
    ncks -O -v c0,c1,pval,${var[var_id]},gw \
      ${drc_out}${fld_out[var_id]}${var[var_id]}_${rlm}_${mdl}_\
${xpt[0]}_all-nsm_anm_rgr.nc \
      ${drc_out}${fld_out[var_id]}${var[var_id]}_${mdl}.nc
    #------------------------------------------------------------

    # Move the variable 'sdv' into the anomaly files (i.e., *anm.nc files)
    ncks -A -v sdv \
      ${drc_out}${fld_out[var_id]}${var[var_id]}_${rlm}_\
${mdl}_${xpt[0]}_all-nsm_sdv.nc \
      ${drc_out}${fld_out[var_id]}${var[var_id]}_${mdl}.nc
    rm ${drc_out}${fld_out[var_id]}${var[var_id]}_*historical*

    echo Model ${mdl} done!
  done

  # Store models as groups in the output file
  ncecat -O --gag ${drc_out}${fld_out[var_id]}${var[var_id]}_*.nc \
    ${drc_out}${fld_out[var_id]}${var[var_id]}_\
${rlm}_all-mdl_${xpt[0]}_all-nsm_annual.nc

  echo Var ${var[var_id]} done!
done
gsl_rgr.nco
// Linear Regression // Called by ann_avg.sh // Caution: make sure the variable name is // in agreement with the main script (now is 'snd') // Online: http://nco.sourceforge.net/nco.html#Annual-Average-over-Regions // Declare variables *c0[$rgn]=0.; // Intercept *c1[$rgn]=0.; // Slope *sdv[$rgn]=0.; // Standard deviation *covxy[$rgn]=0.; // Covariance *x = double(time); for (*rgn_id=0;rgn_id<$rgn.size;rgn_id++) // Loop over regions { gsl_fit_linear(time,1,snd(:,rgn_id),1,$time.size, \ &tc0, &tc1, &cov00, &cov01,&cov11,&sumsq); // Linear regression function c0(rgn_id) = tc0; // Output results c1(rgn_id) = tc1; covxy(rgn_id) = gsl_stats_covariance(time,1,\ $time.size,double(snd(:,rgn_id)),1,$time.size); // Covariance function sdv(rgn_id) = gsl_stats_sd(snd(:,rgn_id), \ 1, $time.size); // Standard deviation function } // P value------------------------------------------------------------ *time_sdv = gsl_stats_sd(time, 1, $time.size); *r_value = covxy/(time_sdv*sdv); *t_value = r_value/sqrt((1-r_value^2)/($time.size-2)); pval = abs(gsl_cdf_tdist_P(t_value, $time.size-2) - \ gsl_cdf_tdist_P(-t_value, $time.size-2)); //---------------------------------------------------------------- // Write RAM variables to disk //------------------------------------------------------------ // Usually NCO writes the outputs directly to disk // Using RAM variables, declared by *, will shorten running time // Output the final results using ram_write() //------------------------------------------------------------ ram_write(c0); ram_write(c1);
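For reference, the statistics that gsl_rgr.nco computes for each region can be written out as follows. This is only a restatement of the script's own expressions; the symbols are ours, not NCO's: n stands for $time.size, t for the time coordinate, y for the regional series, and F_{n-2} for the cumulative distribution function of Student's t with n-2 degrees of freedom (gsl_cdf_tdist_P).

```latex
r = \frac{\operatorname{cov}(t, y)}{s_t \, s_y}, \qquad
T = \frac{r}{\sqrt{(1 - r^2)/(n - 2)}}, \qquad
\mathtt{pval} = \left| F_{n-2}(T) - F_{n-2}(-T) \right|
```

Here s_t and s_y are the sample standard deviations returned by gsl_stats_sd. Note that, as written, pval is the probability mass between -|T| and |T|, so it approaches 1 for strong trends. Readers who want the conventional two-sided p-value may prefer 1 - pval; this is an observation about the formula, not a statement about the script author's intent.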
With the group feature, all the loops over experiments, models and ensemble members can be omitted. Because the group feature is still being implemented across the NCO operators, some functions (e.g., regression and standard deviation over ensemble members) may have to wait for newer versions.
#!/bin/bash # #============================================================ # Group data output by cmb_fl_grp.sh # Annual trend of each model over Greenland and Tibet # Time- and spatial-average, standard deviation and anomaly # No regression yet (needs ncap2) # # Input files: # sn_LImon_all-mdl_all-xpt_all-nsm_200001-200512.nc # # Online: http://nco.sourceforge.net/nco.html#Annual-Average-over-Regions # # Execute this script: bash ann_avg_grp.sh #=========================================================================== # Input and output directory drc='../data/grp/' # Constants pfx='sn_LImon_all-mdl_all-xpt_all-nsm' tms='200001-200512' # Time series # Greenland ncwa -O -w gw -d lat,60.0,75.0 -d lon,300.0,340.0 -a lat,lon \ ${drc}${pfx}_${tms}.nc \ ${drc}${pfx}_${tms}_grl.nc # Tibet ncwa -O -w gw -d lat,30.0,40.0 -d lon,80.0,100.0 -a lat,lon \ ${drc}${pfx}_${tms}.nc \ ${drc}${pfx}_${tms}_tbt.nc # Aggregate 2 regions together ncecat -O -u rgn ${drc}${pfx}_${tms}_???.nc \ ${drc}${pfx}_${tms}_rgn2.nc # Change dimensions order ncpdq -O -a time,rgn ${drc}${pfx}_${tms}_rgn2.nc \ ${drc}${pfx}_${tms}_rgn2.nc # Remove temporary files (optional) rm ${drc}${pfx}_${tms}_???.nc #Annual average ncra -O --mro -d time,,,12,12 ${drc}${pfx}_${tms}_rgn2.nc \ ${drc}${pfx}_${tms}_rgn2_ann.nc # Anomaly #------------------------------------------------------------ # Long-term average ncwa -O -a time ${drc}${pfx}_${tms}_rgn2_ann.nc \ ${drc}${pfx}_${tms}_rgn2_clm.nc # Subtract ncbo -O --op_typ=- ${drc}${pfx}_${tms}_rgn2_ann.nc \ ${drc}${pfx}_${tms}_rgn2_clm.nc \ ${drc}${pfx}_${tms}_rgn2_anm.nc #------------------------------------------------------------ # Standard Deviation: inter-annual variability # RMS of the above anomaly ncra -O -y rmssdn ${drc}${pfx}_${tms}_rgn2_anm.nc \ ${drc}${pfx}_${tms}_rgn2_stddev.nc
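The last step of the script relies on the rmssdn operation type of ncra, a root-mean-square normalized by N-1. Applied to the annual anomalies (whose time mean is zero by construction), it coincides with the sample standard deviation of the annual means. A sketch of the arithmetic, where N is the number of years, x_i the annual mean of year i, and a_i its anomaly (symbols are ours):

```latex
\mathrm{rmssdn}(a) = \sqrt{\frac{1}{N-1} \sum_{i=1}^{N} a_i^2}
= \sqrt{\frac{1}{N-1} \sum_{i=1}^{N} \left( x_i - \bar{x} \right)^2},
\qquad a_i = x_i - \bar{x}
```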
This script illustrates how to calculate the monthly anomaly from the annual average (see Figure 7.3). To keep only the monthly cycle, we subtract each year's annual average from that year's monthly data, instead of subtracting the long-term average. This takes slightly more code because we need to loop over years.
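Before turning to the NCO scripts, the arithmetic itself can be checked on toy numbers. The sketch below needs no NCO; the function name monthly_cycle and the two made-up "years" of data are assumptions for illustration only. Each input line holds the 12 monthly values of one year; the output is the multi-year mean of (month minus that year's annual average).

```shell
#!/bin/bash
# Toy illustration of the monthly-cycle arithmetic (no NCO required).
# Assumption: monthly_cycle is a hypothetical helper, and the input data
# below are made-up values, not model output.
monthly_cycle() {
  # stdin : one line per year, 12 whitespace-separated monthly values
  # stdout: 12 lines "MM anomaly"
  awk '{
    sum = 0
    for (m = 1; m <= 12; m++) sum += $m
    ann = sum / 12                          # annual average of this year
    for (m = 1; m <= 12; m++) anm[m] += $m - ann
    nyr++
  }
  END {
    for (m = 1; m <= 12; m++) printf "%02d %.2f\n", m, anm[m] / nyr
  }'
}

cycle=$( printf '%s\n' \
  "1 2 3 4 5 6 7 8 9 10 11 12" \
  "11 12 13 14 15 16 17 18 19 20 21 22" | monthly_cycle )
echo "${cycle}"
```

Although the second year is 10 units higher than the first, both years contribute the same anomalies, which is exactly why the annual average is removed year by year.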
Flat files example
#!/bin/bash #============================================================ # After cmb_fl.sh # Example: Monthly cycle of each model in Greenland # # Input files: # /data/cmip5/snc_LImon_bcc-csm1-1_historical_r1i1p1_185001-200512.nc # # Output files: # /data/cmip5/snc/snc_LImon__all-mdl_historical_all-nsm_GN_mthly-anm.nc # # Online: http://nco.sourceforge.net/nco.html#Monthly-Cycle # # Execute this script: bash mcc.sh #============================================================ #------------------------------------------------------------ # Parameters drc_in='/home/wenshanw/data/cmip5/' # Directory of input files drc_out='/home/wenshanw/data/cmip5/output/' # Directory of output files var=( 'snc' 'snd' ) # Variables rlm='LImon' # Realm xpt=( 'historical' ) # Experiment ( could be more ) fld_out=( 'snc/' 'snd/' ) # Folders of output files #------------------------------------------------------------ for var_id in {0..1}; do # Loop over two variables # names of all models # (ls [get file names]; cut [get the part for model names]; # sort; uniq [remove duplicates]; awk [print]) mdl_set=$( ls ${drc_in}${var[var_id]}_${rlm}_*_${xpt[0]}_*.nc | \ cut -d '_' -f 3 | sort | uniq -c | awk '{print $2}' ) for mdl in ${mdl_set}; do ## Loop over models # Average all the ensemble members of each model ncea -O -4 -d time,"1956-01-01 00:00:0.0","2005-12-31 23:59:9.9" \ ${drc_in}${var[var_id]}_${rlm}_${mdl}_${xpt[0]}_*.nc \ ${drc_out}${fld_out[var_id]}${var[var_id]}_${rlm}_${mdl}_${xpt[0]}_all-nsm.nc # Greenland # Geographical weight ncap2 -O -s \ 'gw = cos(lat*3.1415926/180.); \ gw@long_name="geographical weight";gw@units="ratio"' \ ${drc_out}${fld_out[var_id]}${var[var_id]}_${rlm}_${mdl}_${xpt[0]}_all-nsm.nc \ ${drc_out}${fld_out[var_id]}${var[var_id]}_${rlm}_${mdl}_${xpt[0]}_all-nsm.nc ncwa -O -w gw -d lat,60.0,75.0 -d lon,300.0,340.0 -a lat,lon \ ${drc_out}${fld_out[var_id]}${var[var_id]}_${rlm}_${mdl}_${xpt[0]}_all-nsm.nc \ 
${drc_out}${fld_out[var_id]}${var[var_id]}_${rlm}_${mdl}_${xpt[0]}_all-nsm_GN.nc # Anomaly---------------------------------------- for moy in {1..12}; do # Loop over months mm=$( printf "%02d" ${moy} ) # Change to 2-digit format for yr in {1956..2005}; do # Loop over years # If January, calculate the annual average if [ ${moy} -eq 1 ]; then ncra -O -d time,"${yr}-01-01 00:00:0.0","${yr}-12-31 23:59:9.9" \ ${drc_out}${fld_out[var_id]}${var[var_id]}_${rlm}_${mdl}_\ ${xpt[0]}_all-nsm_GN.nc ${drc_out}${fld_out[var_id]}${var[var_id]}_\ ${rlm}_${mdl}_${xpt[0]}_all-nsm_GN_${yr}.nc fi # The specific month ncks -O -d time,"${yr}-${mm}-01 00:00:0.0","${yr}-${mm}-31 23:59:9.9" \ ${drc_out}${fld_out[var_id]}${var[var_id]}_\ ${rlm}_${mdl}_${xpt[0]}_all-nsm_GN.nc \ ${drc_out}${fld_out[var_id]}${var[var_id]}_${rlm}_${mdl}_${xpt[0]}_\ all-nsm_GN_${yr}${mm}.nc # Subtract the annual average from the monthly data ncbo -O --op_typ=- ${drc_out}${fld_out[var_id]}${var[var_id]}_\ ${rlm}_${mdl}_${xpt[0]}_all-nsm_GN_${yr}${mm}.nc \ ${drc_out}${fld_out[var_id]}${var[var_id]}_${rlm}_${mdl}_${xpt[0]}_\ all-nsm_GN_${yr}.nc ${drc_out}${fld_out[var_id]}${var[var_id]}_${rlm}_\ ${mdl}_${xpt[0]}_all-nsm_GN_${yr}${mm}_anm.nc done # Average over years ncra -O ${drc_out}${fld_out[var_id]}${var[var_id]}_${rlm}_${mdl}_\ ${xpt[0]}_all-nsm_GN_????${mm}_anm.nc \ ${drc_out}${fld_out[var_id]}${var[var_id]}_${rlm}_${mdl}_\ ${xpt[0]}_all-nsm_GN_${mm}_anm.nc done #-------------------------------------------------- # Concatenate months together ncrcat -O ${drc_out}${fld_out[var_id]}${var[var_id]}_${rlm}_${mdl}_\ ${xpt[0]}_all-nsm_GN_??_anm.nc \ ${drc_out}${fld_out[var_id]}${var[var_id]}_${mdl}.nc echo Model ${mdl} done! 
done rm -f ${drc_out}${fld_out[var_id]}${var[var_id]}*historical* # Store models as groups in the output file ncecat -O --gag -v ${var[var_id]} \ ${drc_out}${fld_out[var_id]}${var[var_id]}_*.nc \ ${drc_out}${fld_out[var_id]}${var[var_id]}_${rlm}_all-mdl_\ ${xpt[0]}_all-nsm_GN_mthly-anm.nc echo Var ${var[var_id]} done! done
Using the group feature and hyperslabs of ncbo, the script will be shortened.
#!/bin/bash #============================================================ # Monthly cycle of each ensemble member in Greenland # # Input file from cmb_fl_grp.sh # sn_LImon_all-mdl_all-xpt_all-nsm_199001-200512.nc # Online: http://nco.sourceforge.net/nco.html#Monthly-Cycle # # Execute this script in command line: bash mcc_grp.sh #============================================================ # Input and output directory drc='../data/grp/' # Constants pfx='sn_LImon_all-mdl_all-xpt_all-nsm_200001-200512' # Greenland ncwa -O -w gw -d lat,60.0,75.0 -d lon,300.0,340.0 -a lat,lon \ ${drc}${pfx}.nc ${drc}${pfx}_grl.nc # Anomaly from annual average of each year for yyyy in {2000..2005}; do # Annual average ncwa -O -d time,"${yyyy}-01-01","${yyyy}-12-31" \ ${drc}${pfx}_grl.nc ${drc}${pfx}_grl_${yyyy}.nc # Anomaly ncbo -O --op_typ=- -d time,"${yyyy}-01-01","${yyyy}-12-31" \ ${drc}${pfx}_grl.nc ${drc}${pfx}_grl_${yyyy}.nc \ ${drc}${pfx}_grl_${yyyy}_anm.nc done # Monthly cycle for moy in {1..12}; do mm=$( printf "%02d" ${moy} ) # Change to 2-digit format ncra -O -d time,"2000-${mm}-01",,12 \ ${drc}${pfx}_grl_????_anm.nc ${drc}${pfx}_grl_${mm}_anm.nc done # Concatenate 12 months together ncrcat -O ${drc}${pfx}_grl_??_anm.nc \ ${drc}${pfx}_grl_mth_anm.nc
In order to compare the results between MODIS and CMIP5 models, one usually regrids one or both datasets so that the spatial resolutions match. Here, the script illustrates how to regrid MODIS data. Key steps include:
ncrename (see ncrename netCDF Renamer).
Main Script
#!/bin/bash # include bi_interp.nco #=========================================================================== # Example for # - regrid (using bi_interp.nco): the spatial resolution of MODIS data # is much finer than those of CMIP5 models. In order to compare # the two, we can regrid MODIS data to conform to CMIP5. # # Input files (Note: the .hdf files downloaded have to be converted to .nc at # the present): # /modis/mcd43c3/MCD43C3.A2000049.005.2006271205532.nc # # Output files: # /modis/mcd43c3/cesm-grid/MCD43C3.2000049.regrid.nc # # Online: http://nco.sourceforge.net/nco.html#Regrid-MODIS-Data # # Execute this script: bash rgr.sh #=========================================================================== var=( 'MCD43C3' ) # Variable fld_in=( 'monthly/' ) # Folder of input files fld_out=( 'cesm-grid/' ) # Folder of output files drc_in='/media/grele_data/wenshan/modis/mcd43c3/' # Directory of input files for fn in $( ls ${drc_in}${fld_in}${var}.*.nc ); do # Loop over files sfx=$( echo $fn | cut -d '/' -f 8 | cut -d '.' -f 2 ) # Part of file names # Regrid ncap2 -O -S bi_interp.nco ${fn} ${drc_in}${fld_out}${var}.${sfx}.regrid.nc # Keep only the new variables ncks -O -v wsa_sw_less,bsa_sw_less ${drc_in}${fld_out}${var}.${sfx}.regrid.nc \ ${drc_in}${fld_out}${var}.${sfx}.regrid.nc # Rename the new variables, dimensions and attributes ncrename -O -d latn,lat -d lonn,lon -v latn,lat -v lonn,lon \ -v wsa_sw_less,wsa_sw -v bsa_sw_less,bsa_sw -a missing_value,_FillValue \ ${drc_in}${fld_out}${var}.${sfx}.regrid.nc echo $sfx done. done
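The main script extracts the date field of each file name with cut -d '/' -f 8, which silently depends on how deep the input directory is. The sketch below compares that approach with a depth-independent alternative built from basename and shell parameter expansion. The sample path is assembled from the script's drc_in and fld_in settings and the file name in its header comment; the variable names bn, rst, and sfx_cut are ours.

```shell
#!/bin/bash
# Sketch: extract the date field from a MODIS file name without counting
# directory levels. The sample path combines drc_in, fld_in, and the file
# name quoted in the header comment of rgr.sh; no file is actually read.
fn='/media/grele_data/wenshan/modis/mcd43c3/monthly/MCD43C3.A2000049.005.2006271205532.nc'

# Approach used in rgr.sh: field 8 of the path, then field 2 of the file name
sfx_cut=$( echo "${fn}" | cut -d '/' -f 8 | cut -d '.' -f 2 )

# Depth-independent alternative
bn=$( basename "${fn}" )   # File name without its directory
rst=${bn#*.}               # Strip everything through the first '.'
sfx=${rst%%.*}             # Keep the text before the next '.'

echo "cut: ${sfx_cut}  basename: ${sfx}"
```

Both approaches give the same field for this path, but only the second keeps working if drc_in is moved to a directory at a different depth.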
bi_interp.nco
// Bilinear interpolation // Included by rgr.sh // Online: http://nco.sourceforge.net/nco.html#Regrid-MODIS-Data defdim("latn",192); // Define new dimension: latitude defdim("lonn",288); // Define new dimension: longitude latn[$latn] = {90,89.0576 ,88.1152 ,87.1728 ,86.2304 ,85.288 ,\ 84.3456 ,83.4031 ,82.4607 ,81.5183 ,80.5759 ,79.6335 ,78.6911 ,\ 77.7487 ,76.8063 ,75.8639 ,74.9215 ,73.9791 ,73.0367 ,72.0942 ,\ 71.1518 ,70.2094 ,69.267 ,68.3246 ,67.3822 ,66.4398 ,65.4974 ,\ 64.555 ,63.6126 ,62.6702 ,61.7277 ,60.7853 ,59.8429 ,58.9005 ,\ 57.9581 ,57.0157 ,56.0733 ,55.1309 ,54.1885 ,53.2461 ,52.3037 ,\ 51.3613 ,50.4188 ,49.4764 ,48.534 ,47.5916 ,46.6492 ,45.7068 ,\ 44.7644 ,43.822 ,42.8796 ,41.9372 ,40.9948 ,40.0524 ,39.11 ,\ 38.1675 ,37.2251 ,36.2827 ,35.3403 ,34.3979 ,33.4555 ,32.5131 ,\ 31.5707 ,30.6283 ,29.6859 ,28.7435 ,27.8011 ,26.8586 ,25.9162 ,\ 24.9738 ,24.0314 ,23.089 ,22.1466 ,21.2042 ,20.2618 ,19.3194 ,\ 18.377 ,17.4346 ,16.4921 ,15.5497 ,14.6073 ,13.6649 ,12.7225 ,\ 11.7801 ,10.8377 ,9.89529 ,8.95288 ,8.01047 ,7.06806 ,6.12565 ,\ 5.18325 ,4.24084 ,3.29843 ,2.35602 ,1.41361 ,0.471204,-0.471204,\ -1.41361,-2.35602,-3.29843,-4.24084,-5.18325,-6.12565,-7.06806,\ -8.01047,-8.95288,-9.89529,-10.8377,-11.7801,-12.7225,-13.6649,\ -14.6073,-15.5497,-16.4921,-17.4346,-18.377 ,-19.3194,-20.2618,\ -21.2042,-22.1466,-23.089 ,-24.0314,-24.9738,-25.9162,-26.8586,\ -27.8011,-28.7435,-29.6859,-30.6283,-31.5707,-32.5131,-33.4555,\ -34.3979,-35.3403,-36.2827,-37.2251,-38.1675,-39.11 ,-40.0524,\ -40.9948,-41.9372,-42.8796,-43.822 ,-44.7644,-45.7068,-46.6492,\ -47.5916,-48.534 ,-49.4764,-50.4188,-51.3613,-52.3037,-53.2461,\ -54.1885,-55.1309,-56.0733,-57.0157,-57.9581,-58.9005,-59.8429,\ -60.7853,-61.7277,-62.6702,-63.6126,-64.555 ,-65.4974,-66.4398,\ -67.3822,-68.3246,-69.267 ,-70.2094,-71.1518,-72.0942,-73.0367,\ -73.9791,-74.9215,-75.8639,-76.8063,-77.7487,-78.6911,-79.6335,\ -80.5759,-81.5183,-82.4607,-83.4031,-84.3456,-85.288,-86.2304,\ -87.1728,-88.1152,-89.0576,-90}; 
// Copy of CCSM4 latitude lonn[$lonn] = {-178.75,-177.5,-176.25,-175,-173.75,-172.5,-171.25,\ -170,-168.75,-167.5,-166.25,-165,-163.75,-162.5,-161.25,-160,\ -158.75,-157.5,-156.25,-155,-153.75,-152.5,-151.25,-150,-148.75,\ -147.5,-146.25,-145,-143.75,-142.5,-141.25,-140,-138.75,-137.5,\ -136.25,-135,-133.75,-132.5,-131.25,-130,-128.75,-127.5,-126.25,\ -125,-123.75,-122.5,-121.25,-120,-118.75,-117.5,-116.25,-115,\ -113.75,-112.5,-111.25,-110,-108.75,-107.5,-106.25,-105,-103.75,\ -102.5,-101.25,-100,-98.75,-97.5,-96.25,-95,-93.75,-92.5,-91.25,\ -90,-88.75,-87.5,-86.25,-85,-83.75,-82.5,-81.25,-80,-78.75,-77.5,\ -76.25,-75,-73.75,-72.5,-71.25,-70,-68.75,-67.5,-66.25,-65,-63.75,\ -62.5,-61.25,-60,-58.75,-57.5,-56.25,-55,-53.75,-52.5,-51.25,-50,\ -48.75,-47.5,-46.25,-45,-43.75,-42.5,-41.25,-40,-38.75,-37.5,\ -36.25,-35,-33.75,-32.5,-31.25,-30,-28.75,-27.5,-26.25,-25,-23.75,\ -22.5,-21.25,-20,-18.75,-17.5,-16.25,-15,-13.75,-12.5,-11.25,-10,\ -8.75,-7.5,-6.25,-5,-3.75,-2.5,-1.25,0,1.25,2.5,3.75,5,6.25,7.5,\ 8.75,10,11.25,12.5,13.75,15,16.25,17.5,18.75,20,21.25,22.5,23.75,\ 25,26.25,27.5,28.75,30,31.25,32.5,33.75,35,36.25,37.5,38.75,40,\ 41.25,42.5,43.75,45,46.25,47.5,48.75,50,51.25,52.5,53.75,55,56.25,\ 57.5,58.75,60,61.25,62.5,63.75,65,66.25,67.5,68.75,70,71.25,72.5,\ 73.75,75,76.25,77.5,78.75,80,81.25,82.5,83.75,85,86.25,87.5,88.75,\ 90,91.25,92.5,93.75,95,96.25,97.5,98.75,100,101.25,102.5,103.75,\ 105,106.25,107.5,108.75,110,111.25,112.5,113.75,115,116.25,117.5,\ 118.75,120,121.25,122.5,123.75,125,126.25,127.5,128.75,130,131.25,\ 132.5,133.75,135,136.25,137.5,138.75,140,141.25,142.5,143.75,145,\ 146.25,147.5,148.75,150,151.25,152.5,153.75,155,156.25,157.5,\ 158.75,160,161.25,162.5,163.75,165,166.25,167.5,168.75,170,171.25,\ 172.5,173.75,175,176.25,177.5,178.75,180}; // Copy of CCSM4 longitude *out[$time,$latn,$lonn]=0.0; // Output structure // Bi-linear interpolation bsa_sw_less=bilinear_interp_wrap(bsa_sw,out,latn,lonn,lat,lon); 
wsa_sw_less=bilinear_interp_wrap(wsa_sw,out,latn,lonn,lat,lon); // Add attributions latn@units = "degree_north"; lonn@units = "degree_east"; latn@long_name = "latitude"; lonn@long_name = "longitude"; bsa_sw_less@hdf_name = "Albedo_BSA_shortwave"; bsa_sw_less@calibrated_nt = 5; bsa_sw_less@missing_value = 32767.0; bsa_sw_less@units = "albedo, no units"; bsa_sw_less@long_name = "Global_Albedo_BSA_shortwave"; wsa_sw_less@hdf_name = "Albedo_WSA_shortwave"; wsa_sw_less@calibrated_nt = 5; wsa_sw_less@missing_value = 32767.0; wsa_sw_less@units = "albedo, no units"; wsa_sw_less@long_name = "Global_Albedo_WSA_shortwave";
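bilinear_interp_wrap interpolates each target point from the surrounding source grid points. As a reminder of the underlying arithmetic, the textbook bilinear formula for a point (x, y) inside the source cell with corners (x_1, y_1) and (x_2, y_2) is shown below. Treat it as a sketch only: how ncap2 handles missing values and the longitude wrap is not described in this section, and this formula does not capture it.

```latex
f(x, y) \approx \frac{
    f(x_1, y_1)\,(x_2 - x)(y_2 - y)
  + f(x_2, y_1)\,(x - x_1)(y_2 - y)
  + f(x_1, y_2)\,(x_2 - x)(y - y_1)
  + f(x_2, y_2)\,(x - x_1)(y - y_1)
}{(x_2 - x_1)(y_2 - y_1)}
```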
Main Script
#!/bin/bash #============================================================ # Example for # - regrid (using bi_interp.nco): the spatial resolution of MODIS data # is much finer than those of CMIP5 models. In order to compare # the two, we can regrid MODIS data to conform to CMIP5. # - add coordinates (using crd.nco): there is no coordinate information # in MODIS data. We have to add it manually now. # # Input files: # /modis/mcd43c3/cesm-grid/MCD43C3.2000049.regrid.nc # # Output files: # /modis/mcd43c3/cesm-grid/MCD43C3.2000049.regrid.nc # # Online: http://nco.sourceforge.net/nco.html#Add-Coordinates-to-MODIS-Data # # Execute this script: bash add_crd.sh #============================================================ var=( 'MOD10CM' ) # Variable fld_in=( 'snc/nc/' ) # Folder of input files drc_in='/media/grele_data/wenshan/modis/' # directory of input files for fn in $( ls ${drc_in}${fld_in}${var}*.nc ); do # Loop over files sfx=$( echo ${fn} | cut -d '/' -f 8 | cut -d '.' -f 2-4 ) # Part of file names echo ${sfx} # Rename dimension names ncrename -d YDim_MOD_CMG_Snow_5km,lat -d XDim_MOD_CMG_Snow_5km,lon -O \ ${drc_in}${fld_in}${var}.${sfx}.nc ${drc_in}${fld_in}${var}.${sfx}.nc # Add coordinates ncap2 -O -S crd.nco ${drc_in}${fld_in}${var}.${sfx}.nc \ ${drc_in}${fld_in}${var}.${sfx}.nc done
crd.nco
// Add coordinates to MODIS HDF data // Included by add_crd.sh // Online: http://nco.sourceforge.net/nco.html#Add-Coordinates-to-MODIS-Data lon = array(0.f, 0.05, $lon) - 180; lat = 90.f- array(0.f, 0.05, $lat);
MODIS orders latitude data from 90°N to -90°N, and longitude from -180°E to 180°E. However, CMIP5 orders latitude from -90°N to 90°N, and longitude from 0°E to 360°E. This script changes the MODIS coordinates to follow the CMIP5 convention.
#!/bin/bash ##=========================================================================== ## Example for ## - permute coordinates: the grid of MODIS is ## from (-180 degE, 90 degN), the left-up corner, to ## (180 degE, -90 degN), the right-low corner. However, CMIP5 is ## from (0 degE, -90 degN) to (360 degE, 90 degN). The script ## here changes the MODIS grid to CMIP5 grid. ## ## Input files: ## /modis/mcd43c3/cesm-grid/MCD43C3.2000049.regrid.nc ## ## Output files: ## /modis/mcd43c3/cesm-grid/MCD43C3.2000049.regrid.nc ## ## Online: http://nco.sourceforge.net/nco.html#Permute-MODIS-Coordinates ## ## Execute this script: bash pmt_crd.sh ##=========================================================================== ##--------------------------------------------------------------------------- ## Permute coordinates ## - Inverse lat from (90,-90) to (-90,90) ## - Permute lon from (-180,180) to (0,360) for fn in $( ls MCD43C3.*.nc ); do # Loop over files sfx=$( echo ${fn} | cut -d '.' -f 1-3 ) # Part of file names echo ${sfx} ## Lat ncpdq -O -a -lat ${fn} ${fn} # Inverse latitude (NB: there is '-' before 'lat') ## Lon ncks -O --msa -d lon,0.0,180.0 -d lon,-180.0,-1.25 ${fn} ${fn} ## Add new longitude coordinates ncap2 -O -s 'lon=array(0.0,1.25,$lon)' ${fn} ${fn} done
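The longitude step works in two stages: ncks --msa moves the 0 to 180 block in front of the negative block, and ncap2 then relabels the points by position starting from 0. The sketch below checks, on the 1.25-degree grid defined in bi_interp.nco, that this positional relabeling is the same as adding 360 to every negative longitude. It uses only awk and does not call NCO; the variable names are ours.

```shell
#!/bin/bash
# Sketch of the longitude relabeling in pmt_crd.sh (no NCO required).
# Grid assumption: the 1.25-degree lonn grid from bi_interp.nco,
# i.e. -178.75, -177.5, ..., 180 (288 points).
out=$( awk 'BEGIN {
  n = 0
  # Stage 1 mimics "ncks --msa -d lon,0.0,180.0 -d lon,-180.0,-1.25":
  # the 0..180 block comes first, then the negative block
  for (i = 0; i <= 144; i++) lon_old[n++] = i * 1.25
  for (i = 0; i <= 142; i++) lon_old[n++] = -178.75 + i * 1.25
  # Stage 2 mimics the ncap2 relabeling "lon=array(0.0,1.25,$lon)"
  bad = 0
  for (k = 0; k < n; k++) {
    lon_new  = k * 1.25
    lon_want = (lon_old[k] < 0) ? lon_old[k] + 360 : lon_old[k]
    if (lon_new != lon_want) bad++
  }
  printf "points=%d mismatches=%d last=%.2f\n", n, bad, (n - 1) * 1.25
}' )
echo "${out}"
```

The check only holds because the grid is regular and the negative block is moved as a whole; on an irregular grid, relabeling by position would not be equivalent to adding 360.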
This section will describe NCO scripting strategies. Many techniques can be used to exploit script-level parallelism, including GNU Parallel and Swift.
ls *historical*.nc | parallel ncks -O -d time,"1950-01-01","2000-01-01" {} 50y/{}
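Where GNU Parallel is not installed, plain bash job control gives a rougher form of the same script-level parallelism. The following is a minimal sketch, not a replacement for GNU Parallel: sbs_one, run_all, job_max, and DRY_RUN are hypothetical names, the input file names are made up, and with DRY_RUN=1 (the default here) the ncks command is only printed so the sketch runs without NCO or data.

```shell
#!/bin/bash
# Sketch: script-level parallelism with bash background jobs.
# Assumptions: sbs_one and run_all are hypothetical helpers; DRY_RUN=1
# prints each ncks command instead of executing it.
DRY_RUN=${DRY_RUN:-1}
job_max=4   # Hypothetical cap on simultaneous jobs

sbs_one() {
  # Subset one file in time, writing to the 50y/ directory as in the text
  local fl_in=$1
  if [ "${DRY_RUN}" -eq 1 ]; then
    echo ncks -O -d time,"1950-01-01","2000-01-01" "${fl_in}" "50y/${fl_in}"
  else
    ncks -O -d time,"1950-01-01","2000-01-01" "${fl_in}" "50y/${fl_in}"
  fi
}

run_all() {
  local job_nbr=0
  for fl in "$@"; do
    sbs_one "${fl}" &
    job_nbr=$(( job_nbr + 1 ))
    # After every job_max launches, wait for the whole batch to finish
    if [ $(( job_nbr % job_max )) -eq 0 ]; then wait; fi
  done
  wait   # Wait for any remaining background jobs
}

cmd_lst=$( run_all a_historical_1.nc b_historical_2.nc c_historical_3.nc | sort )
echo "${cmd_lst}"
```

Batching with wait is simpler than GNU Parallel's scheduling but less efficient, since each batch runs only as fast as its slowest job.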
This chapter illustrates how to use NCO to process and analyze the results of a CCSM climate simulation.
************************************************************************ Task 0: Finding input files ************************************************************************ The CCSM model outputs files to a local directory like: /ptmp/zender/archive/T42x1_40 Each component model has its own subdirectory, e.g., /ptmp/zender/archive/T42x1_40/atm /ptmp/zender/archive/T42x1_40/cpl /ptmp/zender/archive/T42x1_40/ice /ptmp/zender/archive/T42x1_40/lnd /ptmp/zender/archive/T42x1_40/ocn within which model output is tagged with the particular model name /ptmp/zender/archive/T42x1_40/atm/T42x1_40.cam2.h0.0001-01.nc /ptmp/zender/archive/T42x1_40/atm/T42x1_40.cam2.h0.0001-02.nc /ptmp/zender/archive/T42x1_40/atm/T42x1_40.cam2.h0.0001-03.nc ... /ptmp/zender/archive/T42x1_40/atm/T42x1_40.cam2.h0.0001-12.nc /ptmp/zender/archive/T42x1_40/atm/T42x1_40.cam2.h0.0002-01.nc /ptmp/zender/archive/T42x1_40/atm/T42x1_40.cam2.h0.0002-02.nc ... or /ptmp/zender/archive/T42x1_40/lnd/T42x1_40.clm2.h0.0001-01.nc /ptmp/zender/archive/T42x1_40/lnd/T42x1_40.clm2.h0.0001-02.nc /ptmp/zender/archive/T42x1_40/lnd/T42x1_40.clm2.h0.0001-03.nc ... ************************************************************************ Task 1: Generating seasonal cycles ************************************************************************ A common task in data processing is creating seasonal cycles. Imagine a 100-year simulation with its 1200 monthly mean files. Our goal is to create a single file containing 12 months of data. Each month in the output file is the mean of 100 input files. Normally, we store the "reduced" data in a smaller, local directory.
caseid='T42x1_40' #drc_in="${DATA}/archive/${caseid}/atm" drc_in="${DATA}/${caseid}" drc_out="${DATA}/${caseid}" mkdir -p ${drc_out} cd ${drc_out} Method 1: Assume all data in directory applies for mth in {1..12}; do mm=`printf "%02d" $mth` ncra -O -D 1 -o ${drc_out}/${caseid}_clm${mm}.nc \ ${drc_in}/${caseid}.cam2.h0.*-${mm}.nc done # end loop over mth Method 2: Use shell 'globbing' to construct input filenames for mth in {1..12}; do mm=`printf "%02d" $mth` ncra -O -D 1 -o ${drc_out}/${caseid}_clm${mm}.nc \ ${drc_in}/${caseid}.cam2.h0.00??-${mm}.nc \ ${drc_in}/${caseid}.cam2.h0.0100-${mm}.nc done # end loop over mth Method 3: Construct input filename list explicitly for mth in {1..12}; do mm=`printf "%02d" $mth` fl_lst_in='' for yr in {1..100}; do yyyy=`printf "%04d" $yr` fl_in=${caseid}.cam2.h0.${yyyy}-${mm}.nc fl_lst_in="${fl_lst_in} ${caseid}.cam2.h0.${yyyy}-${mm}.nc" done # end loop over yr ncra -O -D 1 -o ${drc_out}/${caseid}_clm${mm}.nc -p ${drc_in} \ ${fl_lst_in} done # end loop over mth Make sure the output file averages correct input files! ncks --trd -M prints global metadata: ncks --trd -M ${drc_out}/${caseid}_clm01.nc The input files ncra used to create the climatological monthly mean will appear in the global attribute named 'history'. Use ncrcat to aggregate the climatological monthly means ncrcat -O -D 1 \ ${drc_out}/${caseid}_clm??.nc ${drc_out}/${caseid}_clm_0112.nc Finally, create climatological means for reference. The climatological time-mean: ncra -O -D 1 \ ${drc_out}/${caseid}_clm_0112.nc ${drc_out}/${caseid}_clm.nc The climatological zonal-mean: ncwa -O -D 1 -a lon \ ${drc_out}/${caseid}_clm.nc ${drc_out}/${caseid}_clm_x.nc The climatological time- and spatial-mean: ncwa -O -D 1 -a lon,lat,time -w gw \ ${drc_out}/${caseid}_clm.nc ${drc_out}/${caseid}_clm_xyt.nc This file contains only scalars, e.g., "global mean temperature", used for summarizing global results of a climate experiment. 
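Since the text warns to make sure the output file averages the correct input files, it can help to inspect the Method 3 file list before calling ncra. The sketch below builds the same list in a bash array and prints its size and end points; it reads nothing from disk and does not call NCO. Using an array instead of a space-separated string is our variation, not part of the original method.

```shell
#!/bin/bash
# Sketch: build the Method 3 file list in a bash array and sanity-check it
# before handing it to ncra. No NCO required; nothing is read from disk.
caseid='T42x1_40'
mm='01'                      # Month to process (January)
fl_lst_in=()                 # Array variant of the fl_lst_in string in Method 3
for yr in {1..100}; do
  yyyy=$( printf "%04d" ${yr} )
  fl_lst_in+=( "${caseid}.cam2.h0.${yyyy}-${mm}.nc" )
done

echo "files: ${#fl_lst_in[@]}"
echo "first: ${fl_lst_in[0]}"
echo "last:  ${fl_lst_in[99]}"
# The list would then be passed to ncra as: -p ${drc_in} "${fl_lst_in[@]}"
```

A count other than 100, or an unexpected first or last name, signals a problem before any averaging is done.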
Climatological monthly anomalies = Annual Cycle: Subtract climatological mean from climatological monthly means. Result is annual cycle, i.e., climate-mean has been removed. ncbo -O -D 1 -o ${drc_out}/${caseid}_clm_0112_anm.nc \ ${drc_out}/${caseid}_clm_0112.nc ${drc_out}/${caseid}_clm_xyt.nc ************************************************************************ Task 2: Correcting monthly averages ************************************************************************ The previous step approximates all months as being equal, so, e.g., February weighs slightly too much in the climatological mean. This approximation can be removed by weighting months appropriately. We must add the number of days per month to the monthly mean files. First, create a shell variable dpm: unset dpm # Days per month declare -a dpm dpm=(0 31 28.25 31 30 31 30 31 31 30 31 30 31) # Allows 1-based indexing Method 1: Create dpm directly in climatological monthly means for mth in {1..12}; do mm=`printf "%02d" ${mth}` ncap2 -O -s "dpm=0.0*date+${dpm[${mth}]}" \ ${drc_out}/${caseid}_clm${mm}.nc ${drc_out}/${caseid}_clm${mm}.nc done # end loop over mth Method 2: Create dpm by aggregating small files for mth in {1..12}; do mm=`printf "%02d" ${mth}` ncap2 -O -v -s "dpm=${dpm[${mth}]}" ~/nco/data/in.nc \ ${drc_out}/foo_${mm}.nc done # end loop over mth ncecat -O -D 1 -p ${drc_out} -n 12,2,2 foo_${mm}.nc foo.nc ncrename -O -D 1 -d record,time ${drc_out}/foo.nc ncatted -O -h \ -a long_name,dpm,o,c,"Days per month" \ -a units,dpm,o,c,"days" \ ${drc_out}/${caseid}_clm_0112.nc ncks -A -v dpm ${drc_out}/foo.nc ${drc_out}/${caseid}_clm_0112.nc Method 3: Create small netCDF file using ncgen cat > foo.cdl << 'EOF' netcdf foo { dimensions: time=unlimited; variables: float dpm(time); dpm:long_name="Days per month"; dpm:units="days"; data: dpm=31,28.25,31,30,31,30,31,31,30,31,30,31; } EOF ncgen -b -o foo.nc foo.cdl ncks -A -v dpm ${drc_out}/foo.nc ${drc_out}/${caseid}_clm_0112.nc Another way to get correct
monthly weighting is to average daily output files, if available. ************************************************************************ Task 3: Regional processing ************************************************************************ Let's say you are interested in examining the California region. Hyperslab your dataset to isolate the appropriate latitude/longitudes. ncks -O -D 1 -d lat,30.0,37.0 -d lon,240.0,270.0 \ ${drc_out}/${caseid}_clm_0112.nc \ ${drc_out}/${caseid}_clm_0112_Cal.nc The dataset is now much smaller! To examine particular metrics. ************************************************************************ Task 4: Accessing data stored remotely ************************************************************************ OPeNDAP server examples: UCI DAP servers: ncks --trd -M -p http://dust.ess.uci.edu/cgi-bin/dods/nph-dods/dodsdata in.nc ncrcat -O -C -D 3 \ -p http://dust.ess.uci.edu/cgi-bin/dods/nph-dods/dodsdata \ -l /tmp in.nc in.nc ~/foo.nc Unidata DAP servers: ncks --trd -M -p http://thredds-test.ucar.edu/thredds/dodsC/testdods in.nc ncrcat -O -C -D 3 \ -p http://thredds-test.ucar.edu/thredds/dodsC/testdods \ -l /tmp in.nc in.nc ~/foo.nc NOAA DAP servers: ncwa -O -C -a lat,lon,time -d lon,-10.,10. -d lat,-10.,10. 
-l /tmp -p \ http://www.esrl.noaa.gov/psd/thredds/dodsC/Datasets/ncep.reanalysis.dailyavgs/surface \ pres.sfc.1969.nc ~/foo.nc LLNL PCMDI IPCC OPeNDAP Data Portal: ncks --trd -M -p http://username:password@esgcet.llnl.gov/cgi-bin/dap-cgi.py/ipcc4/sresa1b/ncar_ccsm3_0 pcmdi.ipcc4.ncar_ccsm3_0.sresa1b.run1.atm.mo.xml Earth System Grid (ESG): http://www.earthsystemgrid.org caseid='b30.025.ES01' CCSM3.0 1% increasing CO2 run, T42_gx1v3, 200 years starting in year 400 Atmospheric post-processed data, monthly averages, e.g., /data/zender/tmp/b30.025.ES01.cam2.h0.TREFHT.0400-01_cat_0449-12.nc /data/zender/tmp/b30.025.ES01.cam2.h0.TREFHT.0400-01_cat_0599-12.nc ESG supports password-protected FTP access by registered users NCO uses the .netrc file, if present, for password-protected FTP access Syntax for accessing single file is, e.g., ncks -O -D 3 \ -p ftp://climate.llnl.gov/sresa1b/atm/mo/tas/ncar_ccsm3_0/run1 \ -l /tmp tas_A1.SRESA1B_1.CCSM.atmm.2000-01_cat_2099-12.nc ~/foo.nc # Average surface air temperature tas for SRESA1B scenario # This loop is illustrative and will not work until NCO correctly # translates '*' to FTP 'mget' all remote files for var in 'tas'; do for scn in 'sresa1b'; do for mdl in 'cccma_cgcm3_1 cccma_cgcm3_1_t63 cnrm_cm3 csiro_mk3_0 \ gfdl_cm2_0 gfdl_cm2_1 giss_aom giss_model_e_h giss_model_e_r \ iap_fgoals1_0_g inmcm3_0 ipsl_cm4 miroc3_2_hires miroc3_2_medres \ miub_echo_g mpi_echam5 mri_cgcm2_3_2a ncar_ccsm3_0 ncar_pcm1 \ ukmo_hadcm3 ukmo_hadgem1'; do for run in '1'; do ncks -R -O -D 3 -p ftp://climate.llnl.gov/${scn}/atm/mo/${var}/${mdl}/run${run} -l ${DATA}/${scn}/atm/mo/${var}/${mdl}/run${run} '*' ${scn}_${mdl}_${run}_${var}_${yyyymm}_${yyyymm}.nc done # end loop over run done # end loop over mdl done # end loop over scn done # end loop over var cd sresa1b/atm/mo/tas/ukmo_hadcm3/run1/ ncks -H -m -v lat,lon,lat_bnds,lon_bnds -M tas_A1.nc | m bds -x 096 -y 073 -m 33 -o ${DATA}/data/dst_3.75x2.5.nc # ukmo_hadcm3 ncview 
${DATA}/data/dst_3.75x2.5.nc # msk_rgn is California mask on ukmo_hadcm3 grid # area is correct area weight on ukmo_hadcm3 grid ncks -A -v area,msk_rgn ${DATA}/data/dst_3.75x2.5.nc \ ${DATA}/sresa1b/atm/mo/tas/ukmo_hadcm3/run1/area_msk_ukmo_hadcm3.nc Template for standardized data: ${scn}_${mdl}_${run}_${var}_${yyyymm}_${yyyymm}.nc e.g., raw data ${DATA}/sresa1b/atm/mo/tas/ukmo_hadcm3/run1/tas_A1.nc becomes standardized data Level 0: raw from IPCC site--no changes except for name Make symbolic link name match raw data Template: ${scn}_${mdl}_${run}_${var}_${yyyymm}_${yyyymm}.nc ln -s -f tas_A1.nc sresa1b_ukmo_hadcm3_run1_tas_200101_209911.nc area_msk_ukmo_hadcm3.nc Level I: Add all variables (not standardized in time) to file containing msk_rgn and area Template: ${scn}_${mdl}_${run}_${yyyymm}_${yyyymm}.nc /bin/cp area_msk_ukmo_hadcm3.nc sresa1b_ukmo_hadcm3_run1_200101_209911.nc ncks -A -v tas sresa1b_ukmo_hadcm3_run1_tas_200101_209911.nc \ sresa1b_ukmo_hadcm3_run1_200101_209911.nc ncks -A -v pr sresa1b_ukmo_hadcm3_run1_pr_200101_209911.nc \ sresa1b_ukmo_hadcm3_run1_200101_209911.nc If already have file then: mv sresa1b_ukmo_hadcm3_run1_200101_209911.nc foo.nc /bin/cp area_msk_ukmo_hadcm3.nc sresa1b_ukmo_hadcm3_run1_200101_209911.nc ncks -A -v tas,pr foo.nc sresa1b_ukmo_hadcm3_run1_200101_209911.nc Level II: Correct # years, months Template: ${scn}_${mdl}_${run}_${var}_${yyyymm}_${yyyymm}.nc ncks -d time,....... file1.nc file2.nc ncrcat file2.nc file3.nc sresa1b_ukmo_hadcm3_run1_200001_209912.nc Level III: Many derived products from level II, e.g., A. Global mean timeseries ncwa -w area -a lat,lon \ sresa1b_ukmo_hadcm3_run1_200001_209912.nc \ sresa1b_ukmo_hadcm3_run1_200001_209912_xy.nc B. California average timeseries ncwa -m msk_rgn -w area -a lat,lon \ sresa1b_ukmo_hadcm3_run1_200001_209912.nc \ sresa1b_ukmo_hadcm3_run1_200001_209912_xy_Cal.nc
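The month-length weighting described in Task 2 can be checked by hand. The sketch below compares an unweighted annual mean with a days-per-month weighted mean, using the dpm values given in Task 2 and a made-up set of monthly means (the val list is purely hypothetical). It uses only awk and does not call NCO.

```shell
#!/bin/bash
# Sketch of the Task 2 month-length weighting (no NCO required).
# dpm values are those given in Task 2 (February = 28.25 days);
# val holds hypothetical monthly-mean values, one per month.
dpm='31 28.25 31 30 31 30 31 31 30 31 30 31'
val='1 2 3 4 5 6 7 8 9 10 11 12'

rsl=$( awk -v d="${dpm}" -v v="${val}" 'BEGIN {
  split(d, dpm_arr, " "); split(v, val_arr, " ")
  for (m = 1; m <= 12; m++) {
    wgt_sum += dpm_arr[m]               # Total days in the mean year
    wgt_avg += dpm_arr[m] * val_arr[m]  # Day-weighted sum
    avg     += val_arr[m]               # Unweighted sum
  }
  printf "days=%.2f unweighted=%.4f weighted=%.4f\n", \
         wgt_sum, avg / 12, wgt_avg / wgt_sum
}' )
echo "${rsl}"
```

With these toy values the weighted mean comes out slightly above the unweighted one, because the short month of February carries a low value and is down-weighted.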
To produce these formats, nco.texi was simply run through the freely available programs texi2dvi, dvips, texi2html, and makeinfo. Due to a bug in TeX, the resulting Postscript file, nco.ps, contains the Table of Contents as the final pages. Thus if you print nco.ps, remember to insert the Table of Contents after the cover sheet before you staple the manual.
The ‘_BSD_SOURCE’ token is required on some Linux platforms where gcc dislikes the network header files (like netinet/in.h).
NCO may still build with an ANSI or ISO C89 or C94/95-compliant compiler if the C pre-processor undefines the restrict type qualifier, e.g., by invoking the compiler with ‘-Drestrict=''’.
The Cygwin package is available from http://sourceware.redhat.com/cygwin. Currently, Cygwin 20.x comes with the GNU C/C++ compilers (gcc, g++). These GNU compilers may be used to build the netCDF distribution itself.
The ldd command, if it is available on your system, will tell you where the executable is looking for each dynamically loaded library. Use, e.g., ldd `which nces`.
The Hierarchical Data Format, or HDF, is another self-describing data format similar to, but more elaborate than, netCDF. HDF comes in two flavors, HDF4 and HDF5. Often people use the shorthand HDF to refer to the older format HDF4. People almost always use HDF5 to refer to HDF5.
One must link the NCO code to the HDF4 MFHDF library instead of the usual netCDF library. Apparently ‘MF’ stands for Multi-file not for Mike Folk. In any case, until about 2007 the MFHDF library only supported netCDF2 calls. Most people will never again install NCO 1.2.x and so will never use NCO to write HDF4 files. It is simply too much trouble.
The procedure for doing this is documented at http://www.unidata.ucar.edu/software/netcdf/docs/build_hdf4.html.
Prior to NCO version 4.4.0 (January, 2014), we recommended the ncl_convert2nc tool to convert HDF to netCDF3 when both these are true: 1. You must have netCDF3 and 2. the HDF file contains netCDF4 atomic types. More recent versions of NCO handle this problem fine, and include other advantages so we no longer recommend ncl_convert2nc because ncks is faster and more space-efficient. Both automatically convert netCDF4 types to netCDF3 types, yet ncl_convert2nc cannot produce full netCDF4 files. In contrast, ncks will happily convert HDF straight to netCDF4 files with netCDF4 types. Hence ncks can and does preserve the variable types. Unsigned bytes stay unsigned bytes. 64-bit integers stay 64-bit integers. Strings stay strings. Hence, ncks conversions often result in smaller files than ncl_convert2nc conversions. Another tool useful for converting netCDF3 to netCDF4 files, and whose functionality is, we think, also matched or exceeded by ncks, is the Python script nc3tonc4 by Jeff Whitaker.
Two real-world examples: NCO translates the NASA CERES dimension ‘(FOV) Footprints’ to ‘_FOV_ Footprints’, and ‘Cloud & Aerosol, Cloud Only, Clear Sky w/Aerosol, and Clear Sky’ (yes, the dimension name includes whitespace and special characters) to ‘Cloud & Aerosol, Cloud Only, Clear Sky w_Aerosol, and Clear Sky’. ncl_convert2nc makes the element name netCDF-safe in a slightly different manner, and also stores the original name in the hdf_name attribute.
The ncrename and ncatted operators are exceptions to this rule. See ncrename netCDF Renamer.
The OS-specific system move command is used. This is mv for UNIX, and move for Windows.
The terminology merging is reserved for an (unwritten) operator which replaces hyperslabs of a variable in one file with hyperslabs of the same variable from another file.
Yes, the terminology is confusing. By all means mail me if you think of a better nomenclature. Should NCO use paste instead of append?
Currently nces and ncrcat are symbolically linked to the ncra executable, which behaves slightly differently based on its invocation name (i.e., ‘argv[0]’). These three operators share the same source code, and merely have different inner loops.
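The dispatch-on-invocation-name idea can be sketched as follows. This is a minimal Python illustration of the general technique, not NCO's C source; the function name and the lookup table are hypothetical.

```python
import os

# Hypothetical lookup table (an assumption for illustration); the three
# names are the operators that the text says share one executable.
OPERATORS = {
    "ncra": "record averager",
    "nces": "ensemble statistics",
    "ncrcat": "record concatenator",
}

def operator_from_argv0(argv0):
    """Return the operator name implied by the path a program was invoked as."""
    name = os.path.basename(argv0)  # strip any leading directories
    if name not in OPERATORS:
        raise ValueError(f"unknown invocation name: {name}")
    return name
```

A real program would pass its own argv[0] to such a function and then select the matching inner loop.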
The third averaging operator, ncwa, is the most sophisticated averager in NCO. However, ncwa is in a different class than ncra and nces because it operates on a single file per invocation (as opposed to multiple files). On that single file, however, ncwa provides a richer set of averaging options, including weighting, masking, and broadcasting.
The exact length which exceeds the operating system internal limit for command line lengths varies across OSs and shells. GNU bash may not have any arbitrary fixed limits to the size of command line arguments. Many OSs cannot handle command line arguments (including results of file globbing) exceeding 4096 characters.
By contrast NC_INT and its deprecated synonym NC_LONG are only four bytes. Perhaps this is one reason why the NC_LONG token is deprecated.
If a getopt_long function cannot be found on the system, NCO will use the getopt_long from the my_getopt package by Benjamin Sittler (bsittler@iname.com). This is BSD-licensed software available from http://www.geocities.com/ResearchTriangle/Node/9405/#my_getopt.
NCO supports decoding ENVI images in support of the DOE Terraref project. These options are indicated via the ncks ‘--trr’ switch, and are otherwise undocumented. Please contact us if more support and documentation of handling of ENVI BIL, BSQ, and BIP images would be helpful.
The ‘-n’ option is a backward-compatible superset of the NINTAP option from the NCAR CCM Processor. The CCM Processor was custom-written Fortran code maintained for many years by Lawrence Buja at NCAR, and phased out in the late 1990s. NCO copied some ideas, like NINTAP-functionality, from CCM Processor capabilities.
NCO does not implement command line options to specify FTP logins and passwords because copying those data into the history global attribute in the output file (done by default) poses an unacceptable security risk.
The hsi command must be in the user’s path in one of the following directories: /usr/local/bin, /opt/hpss/bin, or /ncar/opt/hpss/hsi. Tell us if the HPSS installation at your site places the hsi command in a different location, and we will add that location to the list of acceptable paths to search for hsi.
NCO supported the old NCAR Mass Storage System (MSS) until version 4.0.7 in April, 2011. NCO supported MSS-retrievals via a variety of mechanisms including the msread, msrcp, and nrnet commands invoked either automatically or with sentinels like ncks -p mss:/ZENDER/nco -l . in.nc. Once the MSS was decommissioned in March, 2011, support for these retrieval mechanisms was replaced by support for HPSS.
DODS is being deprecated because it is ambiguous, referring both to a protocol and to a collection of (oceanography) data. It is superseded by two terms. DAP is the discipline-neutral Data Access Protocol at the heart of DODS. The National Virtual Ocean Data System (NVODS) refers to the collection of oceanography data and oceanographic extensions to DAP. In other words, NVODS is implemented with OPeNDAP. OPeNDAP is also the open source project which maintains, develops, and promulgates the DAP standard. OPeNDAP and DAP really are interchangeable. Got it yet?
Automagic support for DODS version 3.2.x was deprecated in December, 2003 after NCO version 2.8.4. NCO support for OPeNDAP versions 3.4.x commenced in December, 2003, with NCO version 2.8.5. NCO support for OPeNDAP versions 3.5.x commenced in June, 2005, with NCO version 3.0.1. NCO support for OPeNDAP versions 3.6.x commenced in June, 2006, with NCO version 3.1.3. NCO support for OPeNDAP versions 3.7.x commenced in January, 2007, with NCO version 3.1.9.
The minimal set of libraries required to build NCO as OPeNDAP clients, where OPeNDAP is supplied as a separate library apart from libnetcdf.a, is, in link order, libnc-dap.a, libdap.a, libxml2, and libcurl.a.
We are most familiar with the OPeNDAP ability to enable network-transparent data access. OPeNDAP has many other features, including sophisticated hyperslabbing and server-side processing via constraint expressions. If you know more about this, please consider writing a section on “OPeNDAP Capabilities of Interest to NCO Users” for incorporation in the NCO User Guide.
For example, DAP servers do not like variables with periods (“.”) in their names even though this is perfectly legal with netCDF. Such names may cause the DAP service to fail because DAP interprets the period as a structure delimiter in an HTTP query string.
The reason (and mnemonic) for ‘-7’ is that NETCDF4_CLASSIC
files include great features of both netCDF3 (compatibility) and
netCDF4 (compression, chunking) and, well, 3+4=7.
The switches ‘-5’, ‘--5’, and ‘pnetcdf’ are reserved for PnetCDF files, i.e., NC_FORMAT_CDF5. Such files are similar to netCDF3 classic files, yet also support 64-bit offsets and the additional netCDF4 atomic types.
Linux and AIX do support LFS.
Intersection-mode can also be explicitly invoked with the ‘--nsx’ or ‘--intersection’ switches. These switches are supplied for clarity and consistency and do absolutely nothing since intersection-mode is the default.
Note that the -3 switch should appear after the -G and -g switches. This is due to an artifact of the GPE implementation which we wish to remove in the future.
CFchecker is developed by Michael Decker and Martin Schultz at Forschungszentrum Jülich and distributed at https://bitbucket.org/mde_/cfchecker.
When originally released in 2012 this was called the duration feature, and was abbreviated DRN.
The term FV is confusing because it is correct to call any Finite Volume grid (including arbitrary polygons) an FV grid. However, an FV grid has also been used for many years to describe the particular type of rectangular grid with caps at the poles used to discretize global model grids for use with the Lin-Rood dynamical core. To reduce confusion, we use “Cap grid” to refer to the latter and reserve FV as a straightforward acronym for Finite Volume.
A Uniform grid in latitude could be called “equi-angular” in latitude, but NCO reserves the term Equi-angular or “eqa” for grids that have the same uniform spacing in both latitude and longitude, e.g., 1°x1° or 2°x2°. NCO reserves the term Regular to refer to grids that are monotonic and rectangular. Confusingly, the angular spacing in a Regular grid need not be uniform; it could be irregular, such as in a Gaussian grid. The term Regular is not too useful in grid-generation, because so many other parameters (spacing, centering) are necessary to disambiguate it.
The old functionality, i.e., where the ignored values are indicated by missing_value not _FillValue, may still be selected at NCO build time by compiling NCO with the token definition CPPFLAGS='-UNCO_USE_FILL_VALUE'.
For example, the DOE ARM program often uses att_type = NC_CHAR and _FillValue = ‘-99999.’.
This behavior became the default in November 2014 with NCO version 4.4.7. Prior versions would always use netCDF default chunking in the output file when no NCO chunking switches were activated, regardless of the chunking in the input file.
R. Kouznetsov contributed the masking techniques used in BitRound, BitGroomRound, and HalfShave. Thanks Rostislav!
E. Hartnett of NOAA EMC is co-founder and co-maintainer of the CCR. Thanks Ed!
D. Heimbigner (Unidata) helped implement all these features into netCDF. Thanks Dennis!
Full disclosure: Documentation of the meaning of the Shuffle parameter is scarce. I think, though am not certain, that the Shuffle parameter refers to the number of contiguous byte-groups that the algorithm rearranges a chunk of data into. I call this the stride. Thus the default stride of 4 means that Shuffle rearranges a chunk of 4-byte integers into four consecutive sequences: the first comprises all the leading bytes, the second comprises all the second bytes, etc. A well-behaved stride should evenly divide the number of bytes in a data chunk.
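The rearrangement described above can be sketched in a few lines of Python. This is an illustration of the byte-shuffling idea under the stride interpretation given in the text, not the HDF5 or netCDF filter code; the function names are hypothetical.

```python
def shuffle(buf, stride=4):
    """Group byte 0 of every element, then byte 1 of every element, etc."""
    if len(buf) % stride:
        raise ValueError("stride must evenly divide the chunk length")
    # buf[i::stride] selects the i-th byte of each stride-sized element
    return b"".join(buf[i::stride] for i in range(stride))

def unshuffle(buf, stride=4):
    """Invert shuffle(): interleave the byte-groups back into elements."""
    if len(buf) % stride:
        raise ValueError("stride must evenly divide the chunk length")
    n = len(buf) // stride  # number of elements in the chunk
    out = bytearray(len(buf))
    for i in range(stride):
        out[i::stride] = buf[i * n:(i + 1) * n]
    return bytes(out)
```

For two 4-byte elements with bytes 1..4 and 5..8, shuffle() yields the byte order 1, 5, 2, 6, 3, 7, 4, 8. Grouping bytes of like significance tends to produce longer runs of similar values, which is why shuffling usually helps a subsequent lossless compressor.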
Quantization may never be implemented in netCDF for any CLASSIC or other netCDF3 formats since there is no compression advantage to doing so. Use the NCO implementation to quantize to netCDF3 output formats.
See, e.g., the procedure described in “Compressing atmospheric data into its real information content” by M. Klöwer et al., available at https://doi.org/10.1038/s43588-021-00156-2.
Rounding is performed by the internal math library rint() family of functions that were standardized in C99. The exact algorithm employed is val := rint(scale*val)/scale where scale is the nearest power of 2 that exceeds 10**prc, and the inverse of scale is used when prc < 0. For qnt = 3 or qnt = -2, for example, we have scale = 1024 and scale = 1/128.
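The rounding formula above can be checked with a short Python sketch. It is an illustration only, not NCO's C code: the function names are hypothetical, "exceeds" is read as strictly greater than, and Python's round() stands in for rint() (both round halfway cases to even in the default rounding mode).

```python
def dsd_scale(prc):
    """Smallest power of 2 that exceeds 10**|prc|; inverted when prc < 0."""
    scale = 1
    while scale <= 10 ** abs(prc):  # assumption: "exceeds" means strictly greater
        scale *= 2
    return float(scale) if prc >= 0 else 1.0 / scale

def dsd_round(val, prc):
    """Apply val := rint(scale*val)/scale."""
    scale = dsd_scale(prc)
    return round(scale * val) / scale
```

With these definitions dsd_scale(3) is 1024 and dsd_scale(-2) is 1/128, matching the values quoted in the text. Because the scale is a power of 2, the multiplication and division change only the exponent, so the rounding is what discards trailing mantissa bits.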
Prior to NCO version 5.0.3 (October, 2021), NCO stored the NSD attribute number_of_significant_digits. However, this was deemed too ambiguous, given the increasing number of supported quantization methods. The new attribute names better disambiguate which algorithm was used to quantize the data. They also harmonize better with the metadata produced by the upcoming netCDF library quantization features.
A suggestion by Rich Signell and the nc3tonc4 tool by Jeff Whitaker inspired NCO to implement PPC. Note that NCO implements a different DSD algorithm than nc3tonc4, and produces slightly different (not bit-for-bit) though self-consistent and equivalent results. nc3tonc4 records the precision of its DSD algorithm in the attribute least_significant_digit and NCO does the same for consistency.
The Unidata blog also shows how to compress IEEE floating-point data by zeroing insignificant bits. The author, John Caron, writes that the technique has been called “bit-shaving”. We call the algorithm of always rounding-up “bit-setting”. And we named the algorithm produced by alternately rounding up and down (with a few other bells and whistles) “bit-grooming”. Imagine orthogonally raking an already-groomed Japanese rock garden. The criss-crossing tracks increase the pattern’s entropy, and this entropy produces self-compensating instead of accumulating errors during statistical operations.
The terminology of significant bits (not to mention digits) can be confusing. The IEEE standard devotes 24 and 53 bits, respectively, to the mantissas that determine the precision of single and double precision floating-point numbers. However, the first (i.e., the most significant) of those bits is implicit, and is not explicitly stored. Its value is one unless all of the exponent bits are zero. The implicit bit is significant though it is not explicitly stored and so cannot be quantized. Therefore single and double precision floats have only 23 and 52 explicitly stored bits, respectively, that can be “kept” and therefore quantized. Each explicit bit kept is as significant as the implicit bit. Thus the number of “keepbits” is one less than the number of significant bits, i.e., the bits that contribute to the precision of an IEEE value. The BitRound quantization algorithm in NCO and in netCDF accepts as an input parameter the number of keepbits, i.e., the number of explicit significant bits NESB to retain (i.e., not mask to zero). Unfortunately the acronym NSB has been used instead of the more accurate acronym NESB, and at this point it is difficult to change. Therefore the NSB acronym and parameter as used by NCO and netCDF should be interpreted as “number of stored bits” (i.e., keepbits) not the “number of significant bits”.
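To make the keepbits count concrete, the following Python sketch masks the trailing explicitly stored mantissa bits of a single-precision value to zero. Note that this is plain truncation (bit-shaving), chosen here for simplicity; it is not the BitRound algorithm, which rounds rather than truncates. The function name is hypothetical.

```python
import struct

def shave_float32(val, keepbits):
    """Keep the first `keepbits` of the 23 stored mantissa bits; zero the rest."""
    if not 1 <= keepbits <= 23:
        raise ValueError("keepbits must be between 1 and 23 for float32")
    # Reinterpret the IEEE single-precision bit pattern as an unsigned integer
    (bits,) = struct.unpack("<I", struct.pack("<f", val))
    drop = 23 - keepbits                      # trailing mantissa bits to zero
    mask = (0xFFFFFFFF << drop) & 0xFFFFFFFF  # sign, exponent, kept mantissa bits
    (out,) = struct.unpack("<f", struct.pack("<I", bits & mask))
    return out
```

For example, 1.75 is 1.11 in binary: the leading 1 is the implicit bit and the two fraction bits are stored. Keeping one stored bit yields 1.1 in binary, i.e., 1.5, which carries two significant bits (one implicit plus one keepbit).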
The artificial dataset employed is one million evenly spaced values from 1.0–2.0. The analysis data are N=13934592 values of the temperature field from the NASA MERRA analysis of 20130601.
On modern Linux systems the block size defaults to 8192 B. The GLADE filesystem at NCAR has a block size of 512 kB.
Although not a part of the standard, NCO enforces the policy that the _FillValue attribute, if any, of a packed variable is also stored at the original precision.
32767 = 2^15-1
Operators began performing automatic type conversions before arithmetic in NCO version 1.2, August, 2000. Previous versions never performed unnecessary type conversion for arithmetic.
The actual type conversions with truncation were handled by intrinsic type conversion, so the trunc() function was never explicitly called, although the results would be the same if it were.
According to Wikipedia’s summary of IEEE standard 754, “If a decimal string with at most 6 significant digits is converted to IEEE 754 single-precision and then converted back to the same number of significant decimal, then the final string should match the original; and if an IEEE 754 single-precision is converted to a decimal string with at least 9 significant decimal and then converted back to single, then the final number must match the original”.
According to Wikipedia’s summary of IEEE standard 754, “If a decimal string with at most 15 significant digits is converted to IEEE 754 double-precision representation and then converted back to a string with the same number of significant digits, then the final string should match the original; and if an IEEE 754 double precision is converted to a decimal string with at least 17 significant digits and then converted back to double, then the final number must match the original”.
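The double-precision round-trip property quoted above is easy to check in Python, whose float type is an IEEE 754 double on essentially all platforms. The helper names below are hypothetical and the sample values are chosen only for illustration.

```python
def double_survives(x, digits):
    """True if x is recovered after printing it with `digits` significant digits."""
    return float(format(x, f".{digits}g")) == x

def string_survives(s, digits=15):
    """True if decimal string s is recovered after a trip through a double."""
    return format(float(s), f".{digits}g") == s

x = 0.1 + 0.2  # a double whose shortest exact representation needs 17 digits
```

Here double_survives(x, 17) holds while double_survives(x, 15) does not, because 15 digits print x as 0.3, which is a different double. Conversely, a 15-digit string such as "0.123456789012345" survives the trip in the other direction.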
See page 21 in Section 1.2 of the First edition for this gem:
One does not need much experience in scientific computing to recognize that the implicit conversion rules are, in fact, sheer madness! In effect, they make it impossible to write efficient numerical programs.
For example, the CMIP5 archive tends to distribute monthly average timeseries in 50-year chunks.
Thanks to Michael J. Prather for explaining this to me.
Note that before version 4.5.0, NCO could, in append (‘-A’) mode only, inadvertently overwrite the global metadata (including history) of the output file with that of the input file. This is opposite the behavior most would want.
These are the GSL standard function names postfixed with _e. NCO calls these functions automatically, without the NCO command having to specifically indicate the _e function suffix.
ANSI C compilers are guaranteed to support double-precision versions of these functions. These functions normally operate on netCDF variables of type NC_DOUBLE without having to perform intrinsic conversions. For example, ANSI compilers provide sin for the sine of C-type double variables. The ANSI standard does not require, but many compilers provide, an extended set of mathematical functions that apply to single (float) and quadruple (long double) precision variables. Using these functions (e.g., sinf for float, sinl for long double), when available, is (presumably) more efficient than casting variables to type double, performing the operation, and then re-casting. NCO uses the faster intrinsic functions when they are available, and uses the casting method when they are not.
Linux supports more of these intrinsic functions than other OSs.
NaN is a special floating point value (not a string). Arithmetic comparisons to NaN and NaN-like numbers always return False, contrary to the behavior of all other numbers. This behavior is difficult to intuit, yet IEEE 754 mandates it. To correctly handle NaNs during arithmetic, code must use special math library macros (e.g., isnormal()) to determine whether any operand requires special treatment. If so, additional logic must be added to correctly perform the arithmetic. This is in addition to the normal handling incurred to correctly handle missing values. Handling field and missing values (either or both of which may be NaN) in binary operators thus incurs four-to-eight extra code paths. Each code path slows down arithmetic relative to normal numbers. This makes supporting NaN arithmetic costly and inefficient. Hence NCO supports NaN only to the extent necessary to replace it with a normal number. Although using NaN for the missing value (or any value) in datasets is legal in netCDF, we discourage it. We recommend avoiding NaN entirely.
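The comparison behavior described above, and the kind of replacement it motivates, can be illustrated in Python. This is a generic sketch, not how NCO itself replaces NaN; the function name and the fill value are hypothetical.

```python
import math

nan = math.nan
# Equality and ordering comparisons involving NaN all evaluate to False,
# so a test like `value == missing_value` can never match a NaN.
nan_equals_itself = (nan == nan)
nan_less_than_one = (nan < 1.0)
nan_greater_than_one = (nan > 1.0)

def replace_nan(values, fill):
    """Return a copy of values with every NaN replaced by a normal number."""
    # math.isnan() is the explicit test that ordinary comparison cannot provide
    return [fill if math.isnan(v) else v for v in values]
```

All three comparison results above are False, which is why an explicit test such as math.isnan() is needed before a NaN can be swapped for an ordinary missing value (e.g., replace_nan(data, 1.0e36) for a list named data).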
A naked (i.e., unprotected or unquoted) ‘*’ is a wildcard character. A naked ‘-’ may confuse the command line parser. A naked ‘+’ and ‘/’ are relatively harmless.
The widely used shell Bash correctly interprets all these special characters even when they are not quoted. That is, Bash does not prevent NCO from correctly interpreting the intended arithmetic operation when the following arguments are given (without quotes) to ncbo: ‘--op_typ=+’, ‘--op_typ=-’, ‘--op_typ=*’, and ‘--op_typ=/’.
The command to do this is ‘ln -s -f ncbo ncadd’
The command to do this is ‘alias ncadd='ncbo --op_typ=add'’
Prior to NCO version 4.3.1 (May, 2013), ncbo would only broadcast variables in file_2 to conform to file_1. Variables in file_1 were never broadcast to conform to the dimensions in file_2.
This is because ncra collapses the record dimension to a size of 1 (making it a degenerate dimension), but does not remove it, while, unless ‘-b’ is given, ncwa removes all averaged dimensions. In other words, by default ncra changes variable size though not rank, while ncwa changes both variable size and rank.
This means that newer (including user-modified) versions of ncclimo work fine without re-compiling NCO. Re-compiling is only necessary to take advantage of new features or fixes in the NCO binaries, not to improve ncclimo. One may download and give executable permissions to the latest source at https://github.com/nco/nco/tree/master/data/ncclimo without re-installing the rest of NCO.
At least one known environment (the E3SM-Unified Anaconda environment at NERSC) prevents users from spawning scores of processes and may report OpenBLAS/pthread or RLIMIT_NPROC-related errors. A solution seems to be executing ‘ulimit -u unlimited’.
We submitted pull-requests to implement the _FillValue attribute in all MPAS-ocean output in July, 2020. The status of this PR may be tracked at https://github.com/MPAS-Dev/MPAS-Model/pull/677. Once this PR is merged to master, we will do the same for the MPAS-Seaice and MPAS-Landice models.
The old ncea command was deprecated in NCO version 4.3.9, released December, 2013. NCO will attempt to maintain back-compatibility and work as expected with invocations of ncea for as long as possible. Please replace ncea by nces in all future work.
As of NCO version 4.4.2 (released February, 2014) nces allows hyperslabs in all dimensions so long as the hyperslabs resolve to the same size. The fixed (i.e., non-record) dimensions should be the same size in all ensemble members both before and after hyperslabbing, although the hyperslabs may (and usually do) change the size of the dimensions from the input to the output files. Prior to this, nces was only guaranteed to work on hyperslabs in the record dimension that resolved to the same size.
Those familiar with netCDF mechanics might wish to know what is happening here: ncks does not attempt to redefine the variable in output-file to match its definition in input-file; ncks merely copies the values of the variable and its coordinate dimensions, if any, from input-file to output-file.
As of version 5.1.1 (November 2022), the map checker diagnoses from the global attributes map_method, no_conserve, or noconserve (in that order, if present) whether the mapping weights are intended to be conservative (as opposed to, e.g., bilinear). Weights deemed non-conservative by design are no longer flagged with dire WARNING messages.
The JSON boolean atomic type is not (yet) supported as there is no obvious netCDF-equivalent to this type.
This limitation, imposed by the netCDF storage layer, may be relaxed in the future with netCDF4.
Prior to NCO 4.4.0 and netCDF 4.3.1 (January, 2014), NCO required the ‘--hdf4’ switch to correctly read HDF4 input files. For example, ‘ncpdq --hdf4 --hdf_upk -P xst_new modis.hdf modis.nc’. That switch is now obsolete, though harmless for backwards compatibility.
Prior to version 4.3.7 (October, 2013), NCO lacked the software necessary to circumvent netCDF library flaws handling HDF4 files, and thus NCO failed to convert HDF4 files to netCDF files. In those cases, use the ncl_convert2nc command distributed with NCL to convert HDF4 files to netCDF.
ncpdq does not support packing data using the HDF convention. Although it is now straightforward to support this, we think it might sow more confusion than it reaps. Let us know if you disagree and would like NCO to support packing data with the HDF algorithm.
This means that newer (including user-modified) versions of ncremap work fine without re-compiling NCO. Re-compiling is only necessary to take advantage of new features or fixes in the NCO binaries, not to improve ncremap. One may download and give executable permissions to the latest source at https://github.com/nco/nco/tree/master/data/ncremap without re-installing the rest of NCO.
Install the Conda NCO package with ‘conda install -c conda-forge nco’.
Install the Conda MPI versions of the ERWG and MOAB packages with ‘conda install -c conda-forge moab=5.3.0=*mpich_tempest* esmf’.
Although MOAB and TempestRemap use the same numerical algorithms, they are likely to produce slightly different weights due to round-off differences. MOAB is heavily parallelized and computes and adds terms together in an unpredictable order compared to the serial TempestRemap.
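The order-dependence of floating-point sums is easy to demonstrate. The following Python sketch uses three arbitrary values chosen for illustration; it says nothing about the actual terms that MOAB or TempestRemap add.

```python
import math

terms = [0.1, 0.2, 0.3]

# The same three terms, summed in two different orders
left_to_right = (terms[0] + terms[1]) + terms[2]
right_to_left = terms[0] + (terms[1] + terms[2])

sums_differ = left_to_right != right_to_left
sums_agree_to_roundoff = math.isclose(left_to_right, right_to_left, rel_tol=1e-15)
```

Both flags are True: the two sums differ in their last bit yet agree to within round-off. Weights computed by summing many terms in a different order can differ in the same way.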
As of version 4.7.6 (August, 2018), NCO’s syntax for gridfile generation is much improved and streamlined, and is the syntax described here. This is also called “Manual Grid-file Generation”. An earlier syntax (see Grid Generation) accessed through ncks options still underlies the new syntax, though it is less user-friendly. Both old and new syntax work well and produce finer rectangular grids than any other software we know of.
Until version 5.0.4 (December, 2021) the ‘--stdin’ switch was also supported by ncclimo, and used for the same reasons as it still is for ncclimo. At that time, the ‘--split’ switch superseded the ‘--stdin’ switch in ncclimo, where it is now deprecated.
Z_2-Z_1=(R_d*T_v/g_0)*ln(p_1/p_2)=(R_d*T_v/g_0)*(ln(p_1)-ln(p_2))
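A worked instance of the relation above may help. In this Python sketch the constants R_D and G_0 are standard textbook values assumed for illustration, and the function names are hypothetical.

```python
import math

R_D = 287.05   # gas constant of dry air, J kg-1 K-1 (assumed value)
G_0 = 9.80665  # standard gravitational acceleration, m s-2 (assumed value)

def thickness(p_1, p_2, t_v):
    """Z_2 - Z_1 in meters between pressures p_1 > p_2 at mean virtual temperature t_v (K)."""
    return (R_D * t_v / G_0) * math.log(p_1 / p_2)

def thickness_log_difference(p_1, p_2, t_v):
    """Same thickness, written with the difference of logarithms."""
    return (R_D * t_v / G_0) * (math.log(p_1) - math.log(p_2))
```

With these assumed constants, the layer between 1000 and 500 (pressures in the same units, since only their ratio matters) at a mean virtual temperature of 255 K is roughly 5.2 km thick, and both forms of the equation agree to round-off.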
The default behavior of (‘-I’) changed on 19981201—before this date the default was not to weight or mask coordinate variables.
If lat_wgt contains Gaussian weights then the value of latitude in the output-file will be the area-weighted centroid of the hyperslab. For the example given, this is about 30 degrees.
The three switches ‘-m’, ‘-T’, and ‘-M’ are maintained for backward compatibility and may be deprecated in the future. It is safest to write scripts using ‘--mask_condition’.
gw stands for Gaussian weight in many climate models.
ORO stands for Orography in some climate models and in those models ORO < 0.5 selects ocean gridpoints.
Unfortunately the ‘-B’ and ‘--mask_condition’ options are unsupported on Windows (with the MVS compiler), which lacks a free, standard parser and lexer.
Happy users have sent me a few gifts, though. This includes a box of imported chocolate. Mmm. Appreciation and gifts are definitely better than money. Naturally, I’m too lazy to split and send gifts to the other developers. However, unlike some NCO developers, I have a steady "real job". My intent is to split monetary donations among the active developers and to send them their shares via PayPal.