QDataSet.new

QDataSet is the data representation used within Autoplot. Every system ends up having to represent its data in some format. For example, suppose you are building an analysis that needs to handle length measurements as a function of time. You could accept two arrays, one for the times and one for the lengths. What units are the lengths in? Your documentation would state that the lengths must be expressed in meters. How is time represented? The documentation could then state that the times need to be integer milliseconds since midnight of the day they were collected. You can see that with simple arrays, documentation (which only a human can read) is needed. Most systems end up developing a standard data model to handle data. Such a model is often more complex than it needs to be, because it grew incrementally as new types of data were introduced into the system. QDataSet was developed in response to many failed data models, learning from them in the hope of providing a general-purpose system for modeling data. All data in Autoplot is handled with this model, which demonstrates its flexibility and extensibility.

Introduction to QDataSet

QDataSet is designed to model anything from a scalar number to data from spacecraft operating in different sampling modes. QDataSets are essentially arrays with metadata properties attached to them. A QDataSet can also declare that it depends on another QDataSet. The lengths measured over time provide a nice starting point. The length measurements are stored in a QDataSet, which can have metadata describing its name and its units:

assert isinstance( ll, QDataSet )
print ll.property( QDataSet.UNITS )   # 'cm'
print ll.length()                     # 100
print ll[0]                           # '2.54 cm'

The times at which each measurement was collected are also in a QDataSet:

assert isinstance( tt, QDataSet )
print tt.property( QDataSet.UNITS )   # 'seconds since 2000-01-01T00:00Z'
print tt.length()                     # 100
print tt[0]                           # '2021-03-29T14:20:23.343Z'

These two are connected by the 'DEPEND_0' property. DEPEND_0 is itself a QDataSet; it holds the independent variable (here, the time tags) against which the data varies.

plot( ll.property( QDataSet.DEPEND_0 ), ll )    # tt is returned

So here you can see how ll "carries around" the time tags for each measurement. Not surprisingly, Autoplot will look to see if there is a DEPEND_0, and use it automatically:

plot( ll )
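
The idea of an array that carries its own metadata can be sketched in plain Python. The `SimpleDataSet` class and the `xy_for_plot` helper below are hypothetical stand-ins written for illustration, not Autoplot's QDataSet implementation; only the property names UNITS and DEPEND_0 come from the text above, and the sample values are made up.

```python
class SimpleDataSet(object):
    """Hypothetical stand-in: a list of values plus a dictionary of properties."""

    def __init__(self, values, **properties):
        self.values = list(values)
        self.properties = dict(properties)

    def property(self, name):
        # Return the named property, or None when it has not been set.
        return self.properties.get(name)

    def length(self):
        return len(self.values)


# Time tags, with the epoch named in the units string.
tt = SimpleDataSet([0.0, 1.0, 2.0], UNITS='seconds since 2000-01-01T00:00Z')

# Lengths in cm; DEPEND_0 attaches the time tags to the measurements.
ll = SimpleDataSet([2.54, 2.60, 2.71], UNITS='cm', DEPEND_0=tt)


def xy_for_plot(ds):
    """Return (x, y) lists, using DEPEND_0 for x when present, else the index."""
    dep0 = ds.property('DEPEND_0')
    if dep0 is None:
        x = list(range(ds.length()))
    else:
        x = dep0.values
    return x, ds.values


x, y = xy_for_plot(ll)
```

A plotting routine written this way needs only the one dataset as its argument, which is the behavior described for `plot( ll )`.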

Note that since ll can also carry around units, our routine which takes these length measurements doesn't need to describe to the human using it what the units should be. It can simply look at the data and use the units within.

if ( ll.property(QDataSet.UNITS)!=Units.lookup('cm') ):
    raise Exception('data must be in cm')
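
A routine can go one step further and use the units it finds rather than reject the data. The sketch below is plain Python and does not use Autoplot's Units class; the `TO_METERS` table and the `lengths_in_meters` function are hypothetical names introduced here, and units are represented as simple strings.

```python
# Hypothetical conversion factors from a few length units to meters.
TO_METERS = {'m': 1.0, 'cm': 0.01, 'mm': 0.001}


def lengths_in_meters(values, units):
    """Convert values to meters using the units carried with the data."""
    if units not in TO_METERS:
        raise ValueError('unsupported length units: %r' % (units,))
    factor = TO_METERS[units]
    return [v * factor for v in values]


meters = lengths_in_meters([254.0, 100.0], 'cm')
```

The caller never has to be told which units to supply; the function either converts what it is given or fails with a message naming the units it could not handle.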

Names, Labels, and Titles

Autoplot lets scientists discover data. QDataSet supports this with metadata that describes the data.

print ll.property( QDataSet.NAME )   # a short name useful for a variable name, like 'length'.  This should be a legal Python identifier.
print ll.property( QDataSet.LABEL )  # a short human label to display with the data, used for a Y-axis label, for example.
print ll.property( QDataSet.TITLE )  # a longer label, suitable for an axis title.
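
Since NAME should be a legal Python identifier, a candidate name can be checked before it is used. This is a minimal sketch using only the Python standard library; `is_legal_name` is a hypothetical helper, not part of Autoplot.

```python
import keyword
import re

# Letters, digits, and underscores, not starting with a digit.
_IDENTIFIER = re.compile(r'^[A-Za-z_][A-Za-z0-9_]*\Z')


def is_legal_name(name):
    """Return True when name is usable as a Python variable name."""
    return bool(_IDENTIFIER.match(name)) and not keyword.iskeyword(name)


ok = is_legal_name('length')
bad = is_legal_name('B-x')
```

A label such as 'B-x' fails the check because of the hyphen, so it belongs in LABEL rather than NAME.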

Nominal, Interval, and Ratiometric Data

QDataSet handles three types of data. Stevens identifies four types of data, of which we use three. Nominal data can be thought of as an array of strings, like ['Chicago', 'Paris', 'Beijing'] or [ 'Time', 'Density', 'B-x', 'B-y', 'B-z' ]. Comparisons for equality can be made, but no other operators are allowed. Interval data is defined relative to a basis. The Celsius temperature scale is an example, where the number of degrees above freezing is recorded. Comparisons can be made and differences are meaningful. Time locations like CDF_TT2000 and us2000 (the number of microseconds since 2000-01-01T00:00Z) are another example. Last, ratiometric data has a physical zero, such as heat energy or density. In addition to comparisons and differences, ratios of these values are meaningful.
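
The operations permitted for each type can be summarized as a small lookup table. This is an illustrative sketch in plain Python; the `ALLOWED_OPERATIONS` table, the operation names, and `is_meaningful` are hypothetical and are not how Autoplot enforces these rules.

```python
# Operations that are meaningful for each type of data described above.
ALLOWED_OPERATIONS = {
    'nominal': set(['equals']),
    'interval': set(['equals', 'compare', 'subtract']),
    'ratio': set(['equals', 'compare', 'subtract', 'divide']),
}


def is_meaningful(level, operation):
    """Return True when the operation makes sense for data of this type."""
    return operation in ALLOWED_OPERATIONS[level]


# City names can be tested for equality but not ordered.
cities_equal = is_meaningful('nominal', 'equals')
cities_ordered = is_meaningful('nominal', 'compare')

# Celsius temperatures can be differenced, but their ratio is not meaningful.
celsius_diff = is_meaningful('interval', 'subtract')
celsius_ratio = is_meaningful('interval', 'divide')

# Densities have a physical zero, so they can be divided.
density_ratio = is_meaningful('ratio', 'divide')
```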

A "simple bundle" is a rank 2 dataset having nominal data for DEPEND_1. Note there is also a "bundle dataset", which allows for bundling complex data, with metadata attached to each bundled dataset.
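
A simple bundle can be pictured as a table whose columns are named by the nominal DEPEND_1. The sketch below uses plain Python lists rather than QDataSets; `extract_column` is a hypothetical helper and the values are made up.

```python
# Column labels play the role of a nominal DEPEND_1.
depend_1 = ['Time', 'Density', 'B-x', 'B-y', 'B-z']

# Rank 2 data: one row per record, one column per label.
records = [
    [0.0, 5.1, 1.0, -2.0, 0.5],
    [1.0, 5.3, 1.1, -2.1, 0.4],
]


def extract_column(records, depend_1, name):
    """Return the column labeled `name` as a rank 1 list."""
    j = depend_1.index(name)
    return [row[j] for row in records]


density = extract_column(records, depend_1, 'Density')
```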

Schemes

QDataSet uses "Duck Typing", which is to say if it looks like a duck, it's a duck. There's no strong typing which asserts that a QDataSet can only be interpreted in some way. The page QDataSet.Schemes attempts to enumerate different types and the properties they have.
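
Under duck typing, code decides how to treat a dataset by inspecting its rank and properties rather than its declared type. The following sketch represents datasets as plain dictionaries; `looks_like_time_series` and the rule it applies are assumptions made for illustration, not a scheme definition taken from QDataSet.Schemes.

```python
def looks_like_time_series(ds):
    """Hypothetical scheme test. `ds` is a dict with 'rank' and 'properties'."""
    if ds.get('rank') != 1:
        return False
    dep0 = ds.get('properties', {}).get('DEPEND_0')
    if dep0 is None:
        return False
    units = dep0.get('properties', {}).get('UNITS', '')
    # In this sketch, time-location units are spelled like 'seconds since <epoch>'.
    return ' since ' in units


tt = {'rank': 1, 'properties': {'UNITS': 'seconds since 2000-01-01T00:00Z'}}
ll = {'rank': 1, 'properties': {'UNITS': 'cm', 'DEPEND_0': tt}}

is_series = looks_like_time_series(ll)
```

Nothing in `ll` says "I am a time series"; the function reaches that conclusion only from the properties it finds.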
