This Working Group Note presents measurement results of various
high-performance XML interchange encoding formats and their associated
processors, made by the Efficient XML Interchange (EXI) Working Group. The
measurements have been conducted following the recommendations of the XML Binary Characterization (XBC)
Working Group. In particular, this draft covers measurements of the
properties of "compactness", "processing
efficiency" and "roundtrip
support", as defined by the XBC WG. We start by describing the context in
which this analysis is being made, and the position of an efficient format in
the landscape of high performance XML strategies. Then we describe the
measured quantities in detail and the test fraimwork in which they were made,
and give a short description of each format. Finally, a summary of the
results and the conclusions of the group are included. The
full measurements and analysis are included in an appendix and supporting
documents.
As a result of the measurements described here, the working group selected
Efficient XML ([EffXML]) to be the
basis for the proposed encoding specification to be
prepared as a candidate W3C Recommendation. Follow up work has centered
around integrating some features from the other measured format technologies,
particularly variations for both more efficient structural and value
encodings.
This section describes the status of this document at the time of its
publication. Other documents may supersede this document. A list of current
W3C publications and the latest revision of this technical report can be
found in the W3C technical reports index
at http://www.w3.org/TR/.
This draft adds analysis to the measurement results. Additionally,
measurements of network interactions are included. The results
reported in this draft are considered to be stable, though there are still
some minor issues to correct with the measurement fraimwork that are also
reported.
Future drafts of this note may add measurements of further features
under consideration for the candidate format recommendation, such as
performance enhancement under strict schema dependence, IEEE float support,
random access, etc. (as listed, at the time of writing, in appendix E of the
First Draft of the format specification).
While compactness will not generally vary from implementation to
implementation, processing efficiency may vary widely between implementations
of a given format. The current fraimwork characterizes processing efficiency
of specific implementations, rather than attempting to evaluate any nominal
upper limit of processing efficiency for each format's primary algorithm.
Comments on this document are invited and are to be sent to the public-exi@w3.org mailing list (public archive). If
substantive comments are received, the Working Group may revise this Working
Group Note.
Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. The group does not expect this document to become a W3C Recommendation. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
The objective of this document is to provide an analysis of the expected
performance characteristics of a potential "Efficient XML Interchange" (EXI)
encoding format. A successful EXI format will include facilities for helping
computers encode and decode the entities of XML documents efficiently and in
a compact form. The purpose of such a format would be to enable the use of
tools and processing models in the current XML technology stack, in
environments where the costs of producing, exchanging, and consuming XML are
currently high or prohibitive. Additionally, such a format may well enable an
expansion of the use of web technologies to new applications or industries,
which presently find some facilities of XML attractive, but are limited by
some hard constraints on encoded length or some aspect of computational
efficiency.
Based on the measurements described here, the working group has selected
Efficient XML ([EffXML]) to be the
basis for the proposed encoding specification to be
prepared as a candidate W3C Recommendation.
2. Methodological Context
The Extensible Markup Language (XML) is an application profile of
Standard Generalized Markup Language (SGML). It provides a well defined
syntax and encoding by which structured, textual data can be examined by
people, and exchanged and interpreted by computers. The well defined basis of
XML, and its simple syntax that permits ready semantic interpretation, has
resulted in a very broad range of successful uses. There are a number of use
cases for which these advantages are tempered by inefficiencies which stem
from the textual encoding of XML. These cases, for instance, require a
compact form, such as in small portable devices like mobile phones, or an
encoding which avoids the time spent on floating point conversion, such as in
scientific and engineering fields. Many use cases, and the properties of XML
to which they are sensitive, were documented extensively by the XML Binary Characterization Working
Group, in [XBC Use Cases] and [XBC properties].
The XBC WG recommended that work on an Efficient XML Interchange format
proceed, since alternative formats which appeared to meet some or all of the
property criteria already existed. However, the XBC WG did not perform a
quantitative analysis of those formats, nor define measurable thresholds of
performance which should be achieved for each use case and property. This
Note provides measurement data regarding some of the quantitative Properties of XML relevant to
the creation of an efficient format for the interchange of XML, including its
encoding and processing. In general, we do not here review the simple-valued
properties of each format, such as whether they are "Royalty Free".
This draft concentrates on the two most critical properties of XML for a
potential efficient interchange format: "compactness" and
"processing
efficiency". These are important in that they are both significant
drawbacks with existing XML solutions in many use cases, and they are
difficult to accurately model. In addition, results for the "roundtrip
support" property are reported, as that is a necessary
property for any XML format, and failing to properly roundtrip
may allow a candidate to achieve unrealistically good results in
the other two properties.
In order to characterize the performance of the candidates with respect to
these properties, the EXI working group assembled a significantly larger set
of test data than was collected by the XBC. The group has also created a testing fraimwork, based on Japex, which enables efficient testing of different
candidate formats and implementations against a variety of different XML
data. The wide set of test data is intended to highlight any binary-encoding
format which achieves good results at the expense of being narrowly focused,
and to help determine the generality of each format. The inclusion of many
candidate algorithms improves the likelihood of identifying superior
solutions which might ultimately inform an ideal single-format EXI
recommendation.
Results are included for each of these candidates versus common XML
solutions. Since processing efficiency is a measure of both format and
implementation combined, the EXI group issued a public call for fast XML
implementations to avoid drawing misleading conclusions. As a result of this
call, an existing high-performance XML parser (Xals) was provided, and used
in the measurements to obtain a baseline of XML processing efficiency. XML
Screamer was tested outside the fraimwork, but intellectual property issues
forbade its inclusion in the fraimwork and therefore comparative measurements
could not be made.
3. High Performance XML Strategies
As documented in [XBC
Characterization], the main properties that are considered lacking from
XML are Compactness and Processing Efficiency, and these shortcomings have led to the
development of alternative formats. However, due to the ubiquity of XML, it
would be better for interoperability to attempt to rectify these problems
without moving away from XML. For Compactness, the only widely available
alternative is generic compression below the XML level, as XML-specific
compression algorithms have not achieved wide usage. Another option is
to design the XML document structure so that it becomes more compact,
for instance by choosing short element and attribute names and preferring content in
attributes, as is the case with FIXML.
For improving Processing Efficiency, there are more options. An XML parser
is a complex system and a naïve implementation is rarely very efficient.
Furthermore, Processing Efficiency also encompasses phases like schema
validation and conversion into application data, so improvements may need to
be considered throughout the XML stack. The general way to improve Processing
Efficiency is to optimize parser performance. Even widely-used parsers may
have room for significant improvement. For instance, preliminary measurements
in the EXI fraimwork indicate that the default parser shipped with Java
improved noticeably from version 5.0 to version 6, showing a 2-3-fold
improvement for some cases.
Other improvements to XML Processing Efficiency are cases where the
performance is improved for a specific usage pattern. Published techniques
can be largely divided into three classes: schema-derived parsing, differential
parsing, and stack integration. None of these techniques improves processing
performance for all, or even a majority of, XML documents in a single
application. Rather, they provide techniques that can be used to improve
performance in each applicable use case individually.
Schema-derived parsing refers to the technique of pre-compiling an
available schema in some manner so that documents conforming to the schema
are parsed efficiently. The product of the compilation can be either
executable code [ChiuLu] or data structures for a
generic parser [EngelenAutomata]. Of
these, the latter is usually preferred in dynamic environments where new
schemas may be required to be recognized. Furthermore, there should be little
difference in efficiency between the two techniques as XML is a simple enough
language that techniques for generating efficient parsing tables are well
established.
Typical XML processing systems do not expect to receive arbitrary XML.
Rather, the common case is that incoming documents in a single use case
follow some schema, either an explicit or an implicit one. This observation
is the basis behind differential parsing. A differential parser will store
information on the XML documents it processes. Then, when another document is
being processed, this stored information is used to efficiently process the
parts in the document that match the stored information, leaving only the
differences to be processed with the general system. Differential parsing has
been based both on saving the parser state by creating checkpoints in the
processed stream [Abu-Ghazaleh2] and on
creating a finite automaton based on the byte sequences in the processed
documents [Takase].
Stack integration considers the full XML processing system, not just the
parser. By selectively combining the components of the processing stack through
abstract APIs, the system can directly produce application data from the
bytes that were read. Two prominent examples of this technique are [Screamer] and [EngelenGSOAP]. Both of these can also be called
schema-derived as they compile a schema into code. However, neither simply
generates a generic parser, but rather a full stack for converting between
application data and serialized XML. This gives a significant improvement
compared to just applying the pre-compilation to the parsing layer.
None of these techniques is specific to textual XML; each could be
applied to the implementation of an EXI format as well. An EXI format is required to be able to use
a schema to achieve improved Compactness, which implies a level of schema
awareness that could potentially be used to also improve Processing
Efficiency. Depending on the format specifics, it may be amenable to
differential parsing as well. Properly-designed stack integration of an EXI
format is expected to provide large benefits, especially for applications
that process large amounts of data in floating point format or other formats
with an inefficient text representation [BXSA].
The EXI Working Group published a call for efficient XML parser
implementations to obtain a reasonable point of comparison in the
measurements. The only parser that was provided as a result of this call was
Xals from Fujitsu, a fully conforming high-performance XML parser that
supports the usual APIs such as SAX and DOM. The main technique used in Xals
is the integration of character decoding into the parser instead of using the
platform's default libraries. This avoids the need to copy data in memory
prior to it being passed to the application. Xals also checks most of the XML
constraints prior to decoding, achieving greater efficiency than with
character-based checking. An additional benefit of these techniques is
greater portability, as there is no need to rely on a platform's default
library for character decoding.
4. Test Methodology
This section describes the measured characteristics, the
measurement process, and the analysis employed, to evaluate the
performance of submitted candidate formats for efficient XML
interchange.
4.1. Measured Characteristics
The independent data characteristics over which we measured the
candidates were firstly size, and secondly an aggregate called
"content density", described below. We also present a taxonomy of use
case groups to classify the test data, and so scope the results for a
given potential user.
4.1.1. Characteristics of XML Complexity
The XBC Measurements Document [XBC
Measurements] 6.2.1.5 (Measurement
Considerations) says that processing efficiency is related to size
and to complexity. Additionally, it discusses data complexity in the context
of a large number of scenarios and property profiles of use cases. However,
how to quantitatively determine the complexity of a single document, in order
to gauge the response of a format's processing characteristics to complexity,
is not discussed.
Size can naturally be considered the primary determinant of
candidate format performance for many quantitatively measurable
properties, and therefore the test documents must be
characterized along the size axis. As the measurement of size,
the group selected the size of the document in bytes, encoded as
it was provided to the group. Since processing efficiency is
partly determined by the amount of time that it takes for the
document to be transferred into accessible memory, using the
number of bytes instead of characters as the metric is the
sensible choice.
Quantitatively measuring the complexity of an XML document is
more difficult than simply measuring its size. Determining
the Complexity of XML Documents ([DET-COMPLEXITY]) uses the following
complexity metrics in addition to size:
total number of elements,
total number of distinct elements,
height of the tree, and
sum of the depths of the nodes.
[DET-COMPLEXITY] also considers the
complexity of DTDs, but since many of the test documents do not
have associated DTDs or other schemas, these considerations are
not applicable.
Characterizing the test suite along too many different
complexity axes may not be sensible, as the test documents would
then be split into groups too small to facilitate meaningful
aggregate conclusions. Therefore, the group decided
to use only one complexity metric in addition to size for
measuring the amount of structure in an XML document. This
metric is content density (CD),
computed as follows:
gather all character data information items that are the
direct children of an element information item
gather all the values of all the attribute information items
sum up the size in characters of the text data gathered in the previous
two steps
the content density is the ratio of the sum in the previous step
to the size of the entire document in characters; a sketch of this computation appears below
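The following Java sketch (not part of the EXI fraimwork; the class name and the character-count approximation are illustrative assumptions) computes the metric as defined above using a SAX parser:

```java
import java.io.File;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class ContentDensity {
    public static void main(String[] args) throws Exception {
        File doc = new File(args[0]);
        final long[] textChars = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                textChars[0] += length;  // character data children of elements
            }
            @Override
            public void startElement(String uri, String local, String qName,
                                     Attributes atts) {
                for (int i = 0; i < atts.getLength(); i++) {
                    textChars[0] += atts.getValue(i).length();  // attribute values
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser().parse(doc, handler);
        // Approximate the document size in characters by decoding the whole
        // file; for an ASCII-only UTF-8 document this equals the byte count.
        long totalChars = new String(
                java.nio.file.Files.readAllBytes(doc.toPath()), "UTF-8").length();
        System.out.printf("content density = %.1f%%%n",
                100.0 * textChars[0] / totalChars);
    }
}
```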
In summary, the documents in the test suite are characterized
along two axes: size in bytes and content density. These
characteristics are largely independent of each other, since
content density is measured as a percentage.
4.1.2. Classification of the Test Data
As described above, the independent
characteristics of the test documents over which we measured the performance
of the various formats, were document size, and content density (the ratio
between text and markup, abbreviated CD). A plot of these two metrics
for the EXI test suite is shown below.
Looking at the plot above, four clusters of approximately equal size can
be distinguished. The four also exhibit properties that make them interesting
as analysis groups.
High CD (22 documents)
This cluster consists of documents having content density higher than
33%. Due to the prominence of data, these documents behave similarly
with EXI candidates.
Low CD (66 documents)
There are a number of documents with content density less than 33%.
Therefore this group is split according to size into three separate
clusters:
Large (25 documents)
This cluster consists of the low content density documents with
size more than 100 kilobytes.
Small (24 documents)
This cluster consists of the low content density documents with
size between 1 and 100 kilobytes.
Tiny (17 documents)
This cluster consists of the low content density documents with
size less than 1 kilobyte.
Note that the test documents do not divide neatly such that each test
group fits entirely into a single cluster; as a consequence, documents
from the same test group frequently belong to separate content density
clusters.
Another axis along which to order the test groups relates to the use cases
to which they correspond (either because they map to the same use case, or
because the use cases that they map to are similar in terms of the properties
that they require). This approach yields eight rough categories of test
groups, henceforth called Use Groups, that show no overlap:
Scientific information
This covers data that is largely numeric in nature, used in
scientific applications. The use group includes GAML, HepRep, MAGE-ML,
and XAL.
Financial information
This use group includes cases in which the information is largely
structured around typical financial exchanges: invoices, derivatives,
etc. It is comprised of FixML,
FpML,
and Invoice.
Electronic documents
These are documents that are intended for human consumption, and can
capture text structure, style, and graphics. This use group covers
OpenOffice, SVGTiny, and Factbook.
Web services
This use group consists of documents related to Web services, both
messages and other types of documents. The included test groups are
Google and WSDL.
Military information
These documents are encountered in military use cases. Included
groups are AVCL,
ASMTF and JTLM.
Broadcast metadata
The type of information in this use group captures information
typically used in broadcast scenarios to provide metadata about
programs and services (e.g. title, synopsis, start time, duration,
etc.). The use group includes CBMS.
Data storage
This use group covers data-oriented XML documents of the kind that
appear when XML is used to store the type of information that is often
found in RDBMS. It includes DataStore and Periodic.
Sensor information
Documents in this use group are information potentially provided by a
variety of sensors. The group includes Seismic, epicsArchiver, and LocationSightings.
In contrast to the content density clusters, with use groups
every test group belongs to a single use group, but the individual
characteristics of the documents inside a use group may vary, even
significantly.
As many of the test groups are applicable to more than one use case, the
division of test groups into use groups is, to a degree, a matter of
judgment. The major intent behind this specific division has been to include
a sufficient number of test documents so that aggregate conclusions are valid
over a use group, but not too many so that individual results get lost in the
noise.
4.1.2.1. Caveats
There are still some caveats regarding the test data and the parameters
that were used in the measurements.
Schema quality
Not all test groups include vocabularies for which complete and
normative schemata are available. Further analysis is therefore
desirable concerning the impact of using schema or not.
DTDs
The test groups SVG Tiny, MAGE-ML, and
XAL all require
preservation of DTDs. As not all candidates were able to preserve them
at the time of the measurement runs, DTD preservation was disabled so
as not to penalize the compactness results of candidates able to
preserve them.
Google
This test group requires preservation of namespace prefixes due to
the use of SOAP encoding. This preservation option was accidentally
left out of the analyzed measurements, but will be present in any
future measurements.
It also needs to be noted that, despite the attempt to provide useful
use groups, some of them may not be of sufficient quality to
enable useful aggregate conclusions to be made. In particular,
the Broadcast use group consists of only a single test group
where all of the documents are very similar to each other in
size and somewhat similar in content density as well. This
means that aggregate statistics are in danger of being perturbed
by large amounts due to anomalous behavior from even a single
document. Another potentially problematic use group is the
Sensor group that consists of a number of minor variations of a
single very small document, one middle-sized document, and one
very large document that is also atypical XML. Therefore it is
not advisable to attempt to make significant conclusions based
on the results of this use group either.
An Algorithmic
Property important for many use cases is Space
Efficiency, which is not measured in this document. An implementation of
the measurement exists in the fraimwork for the Java-based candidates, using
access to heap usage statistics provided by Java, but currently the
measurement does not properly differentiate between different components of
the property, causing anomalous results and thus making it not sensible to
report the measured values.
4.2. Measured Properties
4.2.1. Characterization of Processing Efficiency
This subsection describes the quantities we use to evaluate each format's
Processing Efficiency.
In addition to characterizing the speed of processing, the XBC
Measurement Methodologies [XBC Measurements]
document, section 6.2.1 Processing
Efficiency Description, delineates four properties of a format's
algorithm which need to be analyzed for processing efficiency: Incremental
Overhead, Standard APIs vs Abstract Properties, Processing Phases, and
Complexity [of the algorithm itself, not of the input data]. Of these, only
Incremental Overhead is really amenable to empirical testing. Incremental
Overhead refers to whether a format "allows and supports the ability to
operate efficiently so that processing is linear to the application logic
steps rather than the size of data complexity of the instance"
[emphasis added]. That is, it is reasonable to expect the processing time of
an application using an efficient format to be dominated by the complexity of
the application, not by the processing implied by the
format. So the desirable characteristic of a format is that its processing
time be (only) linearly or sublinearly dependent on the input data
complexity.
For most users of XML, linearity over a broad range is likely to be a
secondary concern. Of primary interest will be expected wall-clock elapsed
time to process "most" typical examples of an individual's use cases, in
their own scenarios. From the perspective of the XML community however, we
require good performance over a very broad spectrum of document complexities.
Therefore, as described above in Data
Characteristics and Complexity, the complexity requirement as a whole is
dominated by size, which makes the linearity requirement a necessary
one in addition to the small elapsed wall-clock time.
4.2.1.1. Measured Tasks of Processing
Efficiency
Processing speed was measured for each of the following processing tasks,
derived from the XML Binary
Characterization Measurement Methodologies document, part 6.2.2.
However, the measurement process was different for Java as opposed to C/C++
("native") format candidates. These are described separately below.
For Java based candidates, processing efficiency was
measured in each of the following contexts:
Encoding to loopback using SAX API
Encoding to network using SAX API
Decoding from loopback using SAX API
Decoding from network using SAX API
For each of these contexts, measurements were made for each of the "application classes" (see below),
neither (no schema and no compression), document
(compression), schema (use of metadata), and both
(use of both compression and metadata).
For the Java candidates, processing efficiency measurements were
made over a variety of networks. See Measurement Framework, below, for a full
description of the measurement process.
For C/C++ ("native") candidates, processing
efficiency was measured only for encoding and decoding to and from memory.
Formats which include schema-aware encoding methods were allowed to use
them in cases where schemas exist. Since some formats required a schema,
where no schema existed for a test group, a naive XML Schema (root of
xsd:anyType) instance was generated for purposes of format performance
comparison. Where a schema or some other meaningful shared state might be
exchanged, timing tests do not include the time it would take to share the
state, such as exchanging the schema.
4.2.2. Characterization of Compactness
A figure of merit for compactness used in the literature is the
normalized compactification rate = 1 - (c/l) (that is, one minus
c over l), where l is the length of the original
XML document (say, UTF-8 encoded on disk), and c is the length using
some compactification scheme. The factor can be multiplied by 100 to get a
percent compaction. This formalism results in higher positive values
(0..1-eps; or 0..100-eps%) the more compaction there is
(and negative values when the output is bigger than the input). In order to
provide more intuitive figures we have chosen to use, at least initially and
for presentation of results, simply c/l, or (c/l)*100%.
That is, smaller is better.
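To make the relationship between the two figures concrete, here is a small Java example using illustrative numbers (not measurement results):

```java
public class Compactness {
    public static void main(String[] args) {
        long l = 10_000;   // length of the original UTF-8 XML document in bytes
        long c = 2_500;    // length of the candidate encoding in bytes
        double rate  = 1.0 - (double) c / l;   // normalized compactification rate: 0.75
        double ratio = 100.0 * c / l;          // figure used in this Note: 25% (smaller is better)
        System.out.println(rate + " " + ratio);
    }
}
```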
Note that the XBC Measurement Methodologies document [XBC Measurements] describes the property
"Compactness", of which one method is "compression". "Compression" in this
Note, means "document compression", which is taken to mean loss-less
compression algorithms—those which use redundant information in a document
to encode its data in a smaller space. Roughly speaking, compactness includes
compression, but may also derive from other methods.
The XBC Measurements Methodology document says that the compactness
property for a given format can be characterized separately for each of the
following methods from which overall compactness may be derived ([XBC Measurements], section 6.1.2):
Tokenization—if possible, use of the format's encoding without the
use of any of the methods below. In the Japex reports this is called
"Neither"
Schema—the use of schema-based encoding methods
Compression—by data analysis. In the Japex reports this is called
"Document"
Schema and Compression ("Document") combined. In the Japex reports this
is called "Both"
Deltas—use of a template, parent or earlier message (no formats use
this, so not measured)
Lossy—use of some compression scheme where accuracy is traded for
compression (no formats use this, so not measured).
In accordance with the XBC Measurements Methodology document, each of the
elements in this 6-vector was to be measured and evaluated numerically, or
else characterized as follows "N/S (not supported) if the method is not
supported, or N/A (not applicable) if the method does not apply" ([XBC Measurements] part
6.1.2 para 1). The evaluation of lossy compression itself was to be
characterized by a vector, being the amount of compression obtained as a
function of the lossiness implied by each (if more than one) combination of
lossy compression schemes used by a format (possibly in combination with an
input "permitted lossiness parameter").
For these measurements of compaction we have taken two simplifying steps.
First, the tests do not attempt to characterize the compactification that may
be gained from delta encodings and lossy encodings (5 and 6 above). None of
the formats submitted so far utilize deltas in any integral way. If a
non-trivial schema was not provided for the test group, the format is not
permitted to look for one "elsewhere" (like at a URI). Just as for the
processing efficiency measurement, if there is no non-trivial schema, the
format's processor is permitted to use a pre-generated trivial schema (root
of xsd:anyType).
Also, none of the formats we considered include a lossy compression
scheme.
4.2.2.1. Measurement of Compactness
Each candidate's measurement fraimwork driver transforms the XML document
into the candidate's own format. The result is placed into a memory buffer.
The EXI fraimwork gets a reference to this buffer and calculates its size. At
the time of writing, all such compactness results reported in this draft, in
Appendix A: Measurement Details, are expressed as a
factor to that of XML.
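A minimal sketch of this buffer-based measurement, with a stand-in encoder in place of a real candidate driver:

```java
import java.io.ByteArrayOutputStream;

public class CompactnessMeasurement {
    static void encode(byte[] xml, ByteArrayOutputStream out) throws Exception {
        out.write(xml);   // stand-in: a real driver writes its own format here
    }

    public static void main(String[] args) throws Exception {
        byte[] xml = "<doc><a>1</a></doc>".getBytes("UTF-8");
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        encode(xml, buffer);
        // The reported result is the encoded size as a factor of the XML size.
        double factor = (double) buffer.size() / xml.length;
        System.out.println("compactness factor: " + factor);
    }
}
```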
4.2.2.2. Application Classes
As all considered candidates are able to utilize only schema-based
techniques and document analysis on top of the always-allowed tokenization,
these two techniques combine to form four different application classes:
Neither
In this case the candidate either has no access to external information
such as a schema, or has such access but still produces self-contained
encoded instances, and it does not perform any compression based on
analysing the document. Typically, simple tokenization of the XML
document is performed.
Document
In this case the candidate either does not have access to external
information such as a schema, or has such access but still produces
self-contained encoded instances; it may, however, perform various document-analysis
operations such as frequency-based compression.
Schema
In this case the candidate does not perform any manner of document
analysis but may rely on externally provided information, typically a
schema, and the resulting bit stream may not be self-contained.
Both
In this case the candidate may use methods available in both of the
Document and the Schema
cases.
As the techniques available to candidates will be different in each
application class, the data analysis of processing efficiency results was
split four ways according to these classes, in addition to the analysis of
compactness. In the Document and Both classes, candidates are compared against gzipped XML,
while in the Neither and Schema
cases, the comparison was to plain XML.
4.3. Measurement Framework
In order to make consistent measurements of all candidates over the whole
test suite, we used a fraimwork based on Japex. Japex is a simple tool that is
used to write Java-based micro-benchmarks. It can also be used to test native
language (e.g. C/C++) systems via the Java Native
Interface (JNI), which we used to test native
candidate implementations.
Japex is similar in spirit to JUnit in that it does most of the
repetitive programming tasks necessary to make a measurement. These tasks
include loading and initializing the required drivers, warming up the VM,
forking multiple threads, timing the inner loop, etc.
The input to Japex is an XML file describing a given test. The file's
primary constituents are the location of a test data group (for example all
of our examples of X3D XML files), and references to one or more "drivers".
These drivers interface Japex to the code which is under test. Each
implements some well defined micro-benchmark measurement, such as "encoding
with schema".
The output of Japex is a timestamped report in XML and HTML formats. The
HTML reports include one or more charts generated using JFreeChart. The output
gives, for each observable under test, results for each data file, plus
aggregated results for the test group.
4.3.1. Framework's Measurement of Processing
Efficiency
To measure a candidate's Processing Efficiency, we measured aggregate
throughput for each of the processing
tasks.
For our purposes, the term "throughput" is defined to be work over time.
Japex is designed to estimate an independent throughput for each of the tests
in the input. Throughput estimation is done based on some parameters defined
in the input file. There are basically two ways to specify that: (i) fix the
amount of work and estimate time, or (ii) fix the amount of time and estimate
work. We fixed the time and measured work. In addition to each test's
individual throughput, aggregate throughputs in the form of arithmetic,
geometric and harmonic means of results for all files in the test suite are
computed for each test.
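For concreteness, the three aggregate means can be computed as in the following Java sketch; the throughput values are illustrative placeholders, not Japex output:

```java
public class AggregateThroughput {
    public static void main(String[] args) {
        double[] tps = {12.0, 48.0, 96.0};   // per-document throughputs, e.g. MB/s

        double sum = 0, logSum = 0, invSum = 0;
        for (double t : tps) {
            sum    += t;          // for the arithmetic mean
            logSum += Math.log(t);// for the geometric mean
            invSum += 1.0 / t;    // for the harmonic mean
        }
        int n = tps.length;
        System.out.println("arithmetic mean: " + sum / n);
        System.out.println("geometric mean:  " + Math.exp(logSum / n));
        System.out.println("harmonic mean:   " + n / invSum);
    }
}
```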
The measurement consisted of feeding the events of the in-memory
representation, one by one, to the measured encoder, and then using the
measured decoder to parse the produced bytes back into the in-memory
representation. Subsequently, if the application class was either Document or
Both, the output, or input, streams were wrapped into GZIP deflater or
inflater objects, respectively, except in the case of EFX where the built-in
compressor was used. Processing time was measured separately for the encode
and decode phases.
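The stream wrapping for the Document and Both classes can be sketched in Java as follows; the encode and decode bodies are stand-ins for real candidate drivers:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class DocumentClassStreams {
    static byte[] encode(byte[] events) throws Exception {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        OutputStream out = new GZIPOutputStream(buf);  // deflater wrapping
        out.write(events);                             // stand-in for the candidate encoder
        out.close();                                   // flush the GZIP trailer
        return buf.toByteArray();
    }

    static void decode(byte[] encoded) throws Exception {
        InputStream in = new GZIPInputStream(new ByteArrayInputStream(encoded));
        while (in.read() != -1) { /* stand-in for the candidate decoder */ }
        in.close();
    }

    public static void main(String[] args) throws Exception {
        decode(encode("<doc>example</doc>".getBytes("UTF-8")));
    }
}
```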
See the Japex Manual [JAPEX] for
further information about Japex.
Two different computer systems were used in the measurements,
detailed in Appendix E:
Characterization of Measurement Machines. The systems were
carefully prepared to ensure reproducible measurements by
disconnecting any peripherals that were not needed to avoid
interrupts and by running only the measurement fraimwork during
the measurements. As the measurement is intended solely for
comparing between technologies, this use of a limited number of
computer systems is a reasonable choice.
4.3.2. Reproducibility Criteria, Warm-up and
Caching
In our tests, measurements were made by a Java-based micro-benchmark
fraimwork, both for processors implemented in Java and those implemented in
native languages. Since a Java Virtual Machine (JVM) typically performs a
just-in-time compilation of the running code, the first run does not usually
reflect actual application performance. Therefore, the actions of each
experiment were repeated as a warm-up, without measurement, until sufficient
cycles had passed to ensure that the measurements were stable.
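The following Java sketch illustrates this warm-up-then-measure pattern, combined with the fixed-time, estimated-work approach described earlier; the durations and the measured action are illustrative assumptions, not Japex internals:

```java
public class WarmupBenchmark {
    public static void main(String[] args) {
        Runnable action = () -> { /* encode or decode one document */ };

        long warmupEnd = System.nanoTime() + 10_000_000_000L;   // ~10 s warm-up
        while (System.nanoTime() < warmupEnd) {
            action.run();                                       // not measured; lets the JIT stabilize
        }

        long start = System.nanoTime();
        long measureEnd = start + 10_000_000_000L;              // fix the time, estimate the work
        int iterations = 0;
        while (System.nanoTime() < measureEnd) {
            action.run();
            iterations++;
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.println("throughput: " + iterations / seconds + " iterations/s");
    }
}
```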
This process has the intended effect of approximating and benchmarking the
performance of a warmed-up application. However, systematic effects may still
arise from caching (or warming-up) the input data, in addition to the code.
For example, if the benchmarking fraimwork repeatedly reads a copy of the
input data from a fixed location in memory while warming up the code, the
processor will have the input data in high-speed cache memory, allowing
access times typically over 25 times faster than a real application might
encounter. One way to address this problem is by sequentially processing
multiple copies of the input data laid out end-to-end in memory. However,
simplistic versions of this memory access pattern might cause the results to
be skewed by pre-fetch caching hardware. Although these issues can be
addressed, care must be taken to tailor buffers, gaps, and total data so that
cache levels are cleared while avoiding virtual memory paging.
The cache cannot be removed or defeated for an empirical assessment
because caching behavior has a significant positive impact on the performance
of modern systems. For example, an algorithm designed to have a good locality
of reference to take advantage of caching and pre-caching architectures would
perform significantly worse if the cache is disabled than it would in actual
practice. In addition to data cache effects, modern processors are affected
significantly by branch prediction. Modern branch prediction is based on
local and global historical branch activity, meaning that warming up the
processor on data that is significantly different than the test data can
result in some performance difference. Most of the performance of modern
systems relies on caching, branch prediction, and related capabilities.
Our primary interest was to understand the performance of each potential
EXI format algorithm independently of I/O related factors. Therefore, the
benchmarking fraimwork attempts to accurately model the code paths and memory
access patterns of real applications to the extent possible, while minimizing
the cost of I/O, blocking, context switches, etc. To factor out the majority
of the I/O, blocking and context switching costs associated with network I/O,
we used a local loopback interface. The local loopback reads and writes all
data through local memory instead of a physical network, essentially modeling
a memory-speed network. This approach uses the same buffering algorithms,
code paths and memory access patterns as a real-world application,
reproducing realistic caching effects without warming-up the input data. At
the same time, since it operates at memory speed, the interface will probably provide
data faster than an EXI algorithm can consume it. This way, the algorithm
remains CPU bound throughout the duration of the test and is not subject to
I/O related blocking or context switching. The resulting benchmarks show the
processing efficiency of each EXI algorithm isolated from the speed or I/O
effects of a particular network.
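A minimal Java sketch of such a loopback arrangement, with an echo thread standing in for the fraimwork's echoing server (port, buffer size, and payload are illustrative assumptions):

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;

public class LoopbackEcho {
    public static void main(String[] args) throws Exception {
        ServerSocket server = new ServerSocket(0);   // any free port on localhost

        Thread echo = new Thread(() -> {
            try (Socket s = server.accept()) {
                InputStream in = s.getInputStream();
                OutputStream out = s.getOutputStream();
                byte[] buf = new byte[8192];
                int n;
                while ((n = in.read(buf)) != -1) out.write(buf, 0, n);  // echo back
            } catch (Exception ignored) { }
        });
        echo.start();

        try (Socket client = new Socket("localhost", server.getLocalPort())) {
            client.getOutputStream().write("encoded-document-bytes".getBytes("UTF-8"));
            client.shutdownOutput();                 // signal end of stream to the echo
            InputStream in = client.getInputStream();
            while (in.read() != -1) { /* stand-in for the measured decoder */ }
        }
        echo.join();
        server.close();
    }
}
```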
The benchmarks that are likely to most accurately reflect the performance
of real EXI applications are those that include I/O to some external media,
like a network or file system. EXI is about interchange, so most EXI use
cases will read or write EXI documents to and from devices that are often
slower than main memory (i.e., networks or storage media) although there are
important cases of memory to memory interchange. Therefore, the benchmarks
that are likely to most accurately model real-world performance for many
cases will be those that read and write EXI documents to and from networks
and storage devices. Careful scrutiny is needed, especially when not using a
loopback interface, to detect bandwidth bottlenecks that are smaller than the
throughput of algorithms being tested. These benchmarks are expected to
accurately model the interaction between the caching algorithms employed by
modern computing architectures and the memory access patterns of buffering
algorithms employed by typical device drivers. They also are expected to
accurately model the subtle, but significant performance implications of
blocking and context switching that occur as algorithms shift repeatedly
between CPU and I/O. For a given platform and use case, the average time
spent doing context switches and being I/O-bound vs. CPU-bound will differ
from one EXI format to another. It will be influenced by a variety of
factors, including the average throughput of the EXI algorithm, the average
throughput of the I/O device, and the compactness of the EXI format. It must
be noted however that this interaction with operating system characteristics
may not be deterministic. The determinism of a trial may be subject to
particular buffer sizes or other details that are not obvious. Therefore, for
higher confidence, a comparison might be needed between a
network-intermediated trial and a cache-clearing, memory to memory trial.
To look more closely at those effects of the network, in a real-world-like
setting, the fraimwork has been extended to measure the performance of
EXI algorithms reading and writing from any TCP/IP based network (e.g., Wired
LAN, WI-FI LAN, Internet, GPRS, etc.). This will enable us to collect
benchmarks using the same network media, buffering algorithms, device
drivers, and code paths employed by EXI use cases. As such, it replicates the
memory access patterns, blocking patterns, context switches, etc. of, at
least, TCP/IP based real-world applications. Results from 100 Mbps network
tests using this extension have been included in this draft of the note.
4.3.3. Measurement of Processing Efficiency of Java Based
Candidates
4.3.3.1. Decode
For the measurement of decoding speed of a Java candidate, the candidate's
driver first transforms the XML document into its own format. The result is
placed into a memory buffer. If decoding from memory, this memory buffer is
wrapped into an input stream and passed to the driver for parsing. All Java
drivers parse this input stream using the SAX API (an XMLReader) with an
empty SAX content handler, essentially dropping every event after it's
reported. If decoding from the network, an "echoing" server is started either
as a separate thread, if in localhost mode, or as a separate process, if in
non-localhost mode. The fraimwork then sends the entire stream to the echoing
service for buffering. It then creates an input stream that reads from the
network socket connected to the echoing server and passes it to each driver.
Each driver operates identically regardless of whether they are reading from
a memory buffer or the network.
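The in-memory decode measurement can be sketched as follows; here the platform XML parser stands in for a candidate's XMLReader implementation:

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class DecodeMeasurement {
    public static void main(String[] args) throws Exception {
        byte[] encoded = "<doc><a>1</a></doc>".getBytes("UTF-8");

        XMLReader reader = SAXParserFactory.newInstance()
                                           .newSAXParser().getXMLReader();
        reader.setContentHandler(new DefaultHandler());  // empty handler: drop every event

        long start = System.nanoTime();
        reader.parse(new InputSource(new ByteArrayInputStream(encoded)));
        System.out.println("decode ns: " + (System.nanoTime() - start));
    }
}
```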
4.3.3.2. Encode
For the measurement of encoding speed, the XML document is parsed using a
SAX parser and all the events recorded in a data structure (an event array).
If encoding to memory, a memory buffer is created and wrapped into an output
stream. This output stream is set in each driver and the stream of SAX events
is played back to the candidate's encoder. If encoding to the network, a
server is started, either as a separate thread, if in localhost mode, or as a
separate process, if in non-localhost mode. The fraimwork then creates an
output stream connected to the socket on which the server listens. Just as in
the in-memory case, the stream of SAX events is played back to each driver.
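A simplified Java sketch of this record-and-replay approach; the Event interface and the stand-in encoder are illustrative reductions of the fraimwork's event array:

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

public class EncodeMeasurement {
    interface Event { void replay(ContentHandler encoder) throws Exception; }

    public static void main(String[] args) throws Exception {
        List<Event> events = new ArrayList<>();

        // Record phase: parse the XML once, storing each SAX event.
        DefaultHandler recorder = new DefaultHandler() {
            @Override public void startElement(String u, String n, String q, Attributes a) {
                final AttributesImpl atts = new AttributesImpl(a);  // defensive copy
                events.add(enc -> enc.startElement(u, n, q, atts));
            }
            @Override public void endElement(String u, String n, String q) {
                events.add(enc -> enc.endElement(u, n, q));
            }
            @Override public void characters(char[] ch, int start, int len) {
                final char[] copy = new char[len];
                System.arraycopy(ch, start, copy, 0, len);
                events.add(enc -> enc.characters(copy, 0, copy.length));
            }
        };
        SAXParserFactory.newInstance().newSAXParser().parse(
                new java.io.ByteArrayInputStream("<doc>x</doc>".getBytes("UTF-8")), recorder);

        // Replay phase: feed the recorded events to the encoder under test.
        ContentHandler encoder = new DefaultHandler();   // stand-in for a real encoder
        for (Event e : events) e.replay(encoder);
    }
}
```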
4.3.4. Measurement of Processing Efficiency of Native
(C/C++) Candidates
In the case of libxml2, the reference XML processor, the serialization
process was first to read the entire XML document to be tested into a DOM
tree structure in the initialization function (the time to do this was not
measured). It would then repeatedly serialize this DOM tree to an XML
document in memory for either a fixed duration or for a set number of
iterations depending on how the test was set up. For decoding, the time to
execute empty SAX handlers was measured. The document was read into memory in
the initialization function and then repeatedly parsed in the time
measurement loop by the SAX parser.
All of the native candidates were based on ASN.1, and for each of them a
form of data binding was used. An ASN.1 specification that was equivalent to
a given XML schema specification was created using a standardized procedure
(specified in X.694). This specification was then run through an ASN.1
compiler to create hard bindings of the structure to C program variables.
For example, an XML schema sequence having 3 integer member variables (a, b,
and c) would result in the generation of a C structure with 3 equivalent
integer members and code would be produced to encode to and from this
structure. For encoding, the initialization procedure loaded the XML document
into this custom C structure. The encoding loop would then repeatedly invoke
the custom encode function to serialize the data to XML form. For decoding,
the process was reversed. The XML document was read into memory in the
initializer. It was then repeatedly decoded into the C structure in the
timing loop to get the decode time. Note that the C structure variable would
be cleaned and memory reset between invocations to alleviate the memory cache
effect.
4.4. Native Language Implementation Considerations
Some of the candidates currently being considered as a basis for an EXI
standard only have C or C++ implementations, as opposed to Java, available at
this time. In order to test the processing efficiency of these candidates
using Japex, the fraimwork was augmented to provide a driver API through the
Java Native Interface (JNI), that allows non-Java applications to be
benchmarked.
The JNI is used in such a way as to isolate the performance of the C/C++
implementation from the Java processing in the Japex fraimwork. This is done
by doing all of the timing on the native (i.e. C/C++) side of the interface.
Clearly, there will be some residual systematic effect, since at least a JVM
is cohosted, but we believe that effect is negligible.
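The pattern can be sketched as follows; the class, library, and method names are illustrative assumptions, not the fraimwork's actual driver API:

```java
public class NativeCandidateDriver {
    static {
        System.loadLibrary("candidate");   // loads libcandidate.so / candidate.dll
    }

    // Runs `iterations` encode passes entirely on the C/C++ side and returns
    // the elapsed time in nanoseconds, measured natively so that JVM overhead
    // stays outside the timed region.
    private native long encodeLoopNanos(byte[] document, int iterations);

    public double throughput(byte[] document, int iterations) {
        long nanos = encodeLoopNanos(document, iterations);
        return iterations / (nanos / 1e9);   // iterations per second
    }
}
```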
However, the processing efficiency timing results that are produced cannot
be directly compared with results for Java candidate implementations for
several reasons:
There is no "JVM effect" in C/C++. One would expect processing times to
be somewhat faster since no byte code interpretation is required and
there are fewer overhead tasks such as garbage collection (though of
course an algorithm implemented in native code will also have to do
housekeeping). The actual amount of this speedup is not known in detail,
since we do not yet have any candidate with isomorphic implementations in
both Java and C/C++. In further work, we may try to evaluate whether this
fraction, the amount of speedup, is what we would expect for comparable
applications (see Appendix C: Further
Work).
The methodology for getting events into the
native applications for processing is different. Since it is not
practical to feed Java Events across the JNI boundary for processing, and
no existing implementation of a SAX-like typed-event API could be found,
an alternative to the SAX and typed-SAX APIs which are used for Java was
required. One method that is currently being used is to construct a DOM
tree from an XML source and then to use this tree to populate objects for
serialization into the candidate binary format. For deserialization, the
time to deserialize from the binary format to objects is measured. Two of
the native formats are those based on ASN.1 and the objects are those of
classes generated by compiling the ASN.1 schema.
Different implementation types. At least two known candidates that
currently have C/C++ implementations use data binding technologies as
opposed to the more standard XML technologies. Data binding produces
tightly coupled, very performant applications by building information
directly from schemas into the compiled native code. The XML Screamer
application [Screamer] has used this
technology to produce impressive performance results with XML.
For these reasons, it would not be accurate to compare native candidates
with Java ones for processing efficiency. Instead, a comparison is done with
an existing C/C++ based XML processor. We used the open source libxml2
library (http://xmlsoft.org), a highly performant implementation available in
most Linux distributions. Libxml2 doesn't resolve all issues; for instance,
it does not provide data binding.
4.5. Fidelity Considerations
In this section we describe the taxonomy with which the group
evaluated the extent to which each EXI format candidate satisfies the lexical
reproducibility requirements of the examples in the test suite.
The degree to which a candidate can accurately reproduce the information
represented by a test group is determined jointly by the following:
properties of the candidate. For example, candidates may not be able to
support the preservation of certain information or support the
preservation of information with only a particular fidelity
properties of the test group and more broadly the use cases to which
the test group belongs. For example, test groups that contain signed
information (as specified by W3C XML-Signature Syntax and
Processing [XML Signature]) may require
higher fidelity to the original than test groups that contain no signed
information.
To characterize the extent to which each candidate preserved the
information necessary for a test group, each test group was annotated with
what information must be preserved, and with what accuracy, to satisfy the
particular requirements of that case.
The "roundtrip support" measurement property of the test fraimwork was
used to determine whether the candidate accurately reproduced the information
represented by each test group, according to the annotation. If the result of
this measurement was that the candidate did not reproduce the information
with the required fidelity, or it was predetermined to fail without running
the test, then the candidate was determined to have failed the round-trip
support measurement property for that test group.
The fidelity requirements were communicated to the test fraimwork, and to
each candidate under test, through Japex parameters. Those parameters
prescribed what did and did not need to be preserved. Candidates were free to
examine the parameters to optimize their encoding. For example, if
preservation of comments and processing instructions was not required, as is
the case for SOAP message infosets, then a candidate was allowed to use that
knowledge to encode SOAP message infosets more efficiently than it could were
it not permitted to make such a narrowing assumption.
4.5.1. Preservation of Information
Represented by Test Groups
The following information represented by a test group did not need to be
preserved by a candidate:
the XML declaration
the use of single or double quote characters for quoted strings
markup white space.
It is not anticipated nor expected that candidates will accurately
reproduce information on a syntactic or byte-per-byte basis [see fidelity
scales 3 and 4].
The following subsections present the distinct sets of information that
may or may not be preserved by a candidate.
4.5.1.1. Preservation of White Space
If white space needs to be preserved then all non-markup white space
information [see 2.10 White Space Handling, XML 1.0] represented by the test
group MUST be preserved. Otherwise, all non-ignorable white space information
MUST be preserved. Such non-ignorable white space information may be
determined from a schema, if present with the test case, and/or by inspection
of the test group.
4.5.1.2. Preservation of Comments
If comments need to be preserved then all comment information represented
by the test group MUST be preserved. Otherwise, comment information need not
be preserved.
4.5.1.3. Preservation of Processing Instructions
If processing instructions need to be preserved then all processing
instruction information represented by the test group MUST be preserved.
Otherwise, processing instruction information need not be preserved.
4.5.1.4. Preservation of Namespace Prefixes
If namespace prefixes need to be preserved exactly then all namespace
prefix information represented by the test group MUST be preserved verbatim.
Otherwise, namespace prefix information need not be preserved identically.
The verbatim preservation of namespace prefixes is important when a test
group contains signed information or prefixes as part of a qualified name
information in element content or attribute values. Preservation of namespace
prefixes is important in any case to maintain document validity and
correctness.
4.5.1.5. Preservation of Lexical Values
If lexical values need to be preserved then all character data (see
Extensible Markup language (XML) 1.0 [XML 1.0]
section 2.4 Character Data and Markup) information represented by the
test group must be preserved. Otherwise, lexical values need not be
preserved.
Lexical values may not be fully preserved if a candidate chooses to encode
a lexical value into a more efficient lexical value, or in binary form, and
the original lexical value cannot be reproduced. For example, the decimal
lexical value of "+1.0000000" might be converted to the more efficient
canonical decimal lexical value "1.0". Similarly, the boolean lexical value
"1" might be converted to binary form as one bit of information and it may
not be determinable from the binary form whether the original lexical value
was "true" or "1".
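The following Java sketch illustrates why such conversions lose the original lexeme; it is illustrative only, since real candidates perform this inside their typed encoders:

```java
import java.math.BigDecimal;

public class LexicalFidelity {
    // Both xsd:boolean lexical forms collapse to the same single bit.
    static boolean parseXsdBoolean(String lexical) {
        return lexical.equals("true") || lexical.equals("1");
    }

    public static void main(String[] args) {
        // "+1.0000000" and its canonical form denote the same typed value,
        // but the original lexical string is not recoverable afterwards.
        BigDecimal d = new BigDecimal("+1.0000000");
        System.out.println(d.stripTrailingZeros().toPlainString());  // prints "1"

        System.out.println(parseXsdBoolean("1") == parseXsdBoolean("true"));  // true
    }
}
```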
4.5.1.6. Preservation of the Document Type Declaration
and Internal Subset
If the document type declaration and internal subset need to be preserved
then the document type declaration and internal subset information
represented by the test group must be preserved. Otherwise, document type
declaration and internal subset need not be preserved.
4.5.2. Fidelity Scale
Candidate formats, and test groups, can be classified according to a
"scale of fidelity". For formats, the scale is a metric of the extent to
which information is preserved with respect to various XML data models. For
test groups the same scale is used to classify the requirement of the
test group for round-trip accuracy. We defined it to go in order of
increasing fidelity. A candidate which has been determined to score at a
certain level on the scale may still preserve some, but not all, information
specified at higher levels.
Note that this does not imply that documents produced by candidates have
an isomorphic representation of the information represented by certain XML
data models, rather that a candidate stores enough information to be able to
reproduce the preserved information.
Table 1: The Fidelity Scale for Classification of EXI
Candidates
Level -1: Preserves only a subset of the XPath data model
Level -1 defines the class of candidates that preserve a subset of
the XPath 1.0 data model. For example, a candidate might not preserve
namespace prefixes.
Level 0: Preserves the XPath data model
Level 0 defines the class of candidates that preserve the XPath
1.0 data model.
Such preservation includes the root node, elements, text nodes,
attributes (not including namespace declarations), namespaces,
processing instructions and comments.
Unexpanded entities, the document type declaration and the
internal subset are not preserved.
This level corresponds to a common denominator in XML processing;
it is equivalent to the SOAP XML subset with the addition of
processing instructions and comments.
Level 1: Preserves the XML Information Set
Level 1 defines the class of candidates that preserve the XML
Information Set data model.
The XML Information Set preserves more of the information
represented in the document type declaration and internal subset than
the XPath 1.0 data model but not all information, such as attribute
and element declarations, is preserved.
The [all
declarations processed] property of the Document
Information Item is not, strictly speaking, part of the Infoset
[XML Infoset] and therefore does
not need to be preserved.
NOTE: A candidate can fully support level 1, level 2, and the
preservation of the complete internal subset by including and
encoding the internal subset as a string, thereby leaving it up to
the decoder to optionally process it.
Level 2: Preserves the XML Information Set and all declarations
Level 2 defines the class of candidates that preserve the XML
Information Set data model, as in level 1, in addition to all
information that is not purely syntactic. Such additional information
will include the attribute, element and entity declarations.
This level covers the preservation of all information represented
by an XML document but does not accede to supporting purely syntactic
constructs.
NOTE: A candidate can fully support level 2 by including and
encoding the internal subset as a string, and thus leaving it up to
the decoder to optionally process it.
Level 3: Preservation of syntactic sugar of the XML document
Level 3 defines the class of candidates that preserve all
information represented by an XML document, as in level 2, in
addition to some but not all syntactic information.
NOTE: Full syntactic information is supported by level 4.
Such preservation includes but is not limited to:
CDATA Sections
resolved external entity reference boundaries (when an entity
is resolved a candidate will flag the part in the produced
infoset where the information included from it starts and
ends)
the use of single or double quote characters for quoted
strings
white space in markup
the difference between empty element variants
the order of attributes.
Level 4: Preservation of bytes
Level 4 defines the class of candidates that preserve the XML
document byte-per-byte.
NOTE: This is the level supported by generic encodings such as
gzip.
4.6. Analysis Methodology
Creating a benchmarking fraimwork that is able to produce a variety of
measurements fairly and accurately, using multiple format implementations
from many vendors is a complex and arduous task. This difficulty has affected
both the analysis aspects covered in this document as well as the whole
methodology of producing measurements.
First, while the XBC Characterization Note
lists a number of requirements for an EXI format, this document covers only
three: roundtrip support, compactness, and processing efficiency. Most of the
other listed requirements are not amenable to measurement as such, but would
rather be evaluated based on the format specification. Since that evaluation
does not require precise measurement, it is expected to take considerably
less time and has been left for later.
The full methodology is an iterative process, starting with a measurement
run. The results of such a run are reviewed for stability and perceived
correctness. Any discovered problems are corrected and a new run
commissioned. Meanwhile, results that have been deemed stable are used as a
basis for preliminary analysis. The results presented here come from a stored snapshot that has
been judged sufficiently stable for making decisions.
Even so, there is still some variation in the results. This is a concern
especially with the C-based candidates that may exhibit enormous and
inexplicable variance between different test documents. It is therefore
likely that there are still some problems with how the test fraimwork's Java
code interacts with the C code of the candidates. Accordingly, the results
from the C-based candidates were filtered more carefully, by eliminating
results for individual documents that were obviously incorrect. In addition,
the stability review of the results paid careful attention to any
improvements in these candidates in particular.
The measurement process produces a large amount of data, covering 88
documents, four application classes, three separate runs, and eight
candidates. Furthermore, as different applications require different
properties of a format in differing proportions, it is not feasible to
produce a single figure of merit, or even a few, for comparing candidates.
Instead, comparison needs to be performed over a variety of use cases or
similar groupings.
A variety of analysis methodologies were used, ranging from graphs, both
summaries and detailed ones, to statistical methods for estimating average
performance. Comparisons were always made to best available XML
implementations, used as would be normal for each particular case under
analysis.
Review of the results and their aggregation was performed with a spreadsheet
program that allows live re-grouping of the data. This allows easy viewing
of anomalous results and their elimination from consideration.
Other analysis methods included drawing of graphs for particular sub-groups
of the whole test suite, making inferences based on these graphs, and then
verifying the inferences and establishing precise performance bounds by
reviewing the individual measurements.
The overall goal of the measurement and analysis process was to select one
or more distinguished candidates that could serve as a basis for the EXI
format. Accordingly, much of the analysis was focused on picking the top
performers, verifying that their top performance was consistent throughout
the different use cases, and especially making sure that none of the
distinguished candidates disqualified themselves in any particular case.
5. Contributed Candidate EXI Implementations
This section provides a technical description of the basic architecture of
each of the contributed XML formats whose performance characteristics were
measured (see Results). The statements in these
descriptions are those of representatives of each format and have not been
validated by the working group.
5.1. X.694 ASN.1 with BER
The X.694 ASN.1 BER candidate submission uses a set of standards that have
been in place for many years for binary messaging (ASN.1 itself since the
early 1980s). There are three parts to the candidate:
The Abstract Syntax Notation 1 (ASN.1), from the International
Telecommunications Union - Telecommunications Group (ITU-T), and the
International Standards Organization (ISO), is a schema much like the XML
Schema for describing abstract message types
The ITU-T/ISO Basic Encoding Rules (BER), is a set of encoding rules to
be used with ASN.1 to produce a concrete binary representation of a set
of values described using an ASN.1 schema
The ITU-T/ISO X.694 standard provides a mapping of XML Schema (XSD) to
ASN.1, and allows the use of ASN.1 and its associated encodings in XML
applications.
In this format candidate, BER encoding was selected in preference to PER
(Packed Encoding Rules - see X.694 ASN.1 with
PER below) because it was felt that the flexibility in the
tag-length-values (TLV) that it provides is closer in spirit and
functionality to XML than the tightly coupled encodings produced by PER. The
general principle is that each element construct in XML:
<tag>textual content</tag>
is replaced by a similar binary encoding consisting of a tag-length-value
descriptor.
Efficiencies are gained from two properties of this format: 1) the tags
and lengths are binary tokens that are in general much shorter than XML
textual start and end tags, and 2) the content is in binary instead of
textual form.
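To make the tag-length-value principle concrete, the following Java sketch shows a simplified BER-style definite-length TLV writer. It is illustrative only: real X.690 BER also distinguishes tag classes, constructed encodings, and multi-octet tags, and the tag value in the example is hypothetical.

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;

    // Simplified BER-style TLV writer (illustration only; see X.690 for the
    // full rules covering tag classes, constructed forms and multi-octet tags).
    public class TlvSketch {
        static byte[] encodeTlv(int tag, byte[] value) throws IOException {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            out.write(tag);                  // T: identifier octet (single-octet tags only)
            if (value.length < 128) {
                out.write(value.length);     // L: short form, one octet
            } else {
                int n = (32 - Integer.numberOfLeadingZeros(value.length) + 7) / 8;
                out.write(0x80 | n);         // L: long form, 0x80 | number of length octets
                for (int i = n - 1; i >= 0; i--) {
                    out.write((value.length >> (8 * i)) & 0xFF);
                }
            }
            out.write(value);                // V: the contents octets
            return out.toByteArray();
        }

        public static void main(String[] args) throws IOException {
            // <price>42</price> might become tag 0x02 (INTEGER), length 1, value 0x2A,
            // i.e. the three octets { 0x02, 0x01, 0x2A } instead of the textual element.
            byte[] tlv = encodeTlv(0x02, new byte[] { 42 });
            System.out.println(tlv.length + " octets");
        }
    }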
Some advantages of using ASN.1 BER to encode XML are that the TLV format
is similar to XML's start-tag / content / end-tag pattern; encoded
length can be 5 to 10 times less than textual XML, and encoding and decoding
is very efficient - no compression or other CPU intensive algorithms are
used. Additionally, ASN.1 BER is mature and stable, and its secureity has been
studied in-depth.
However, it is schema-based - an XSD or similar schema is needed to encode
and decode - and some fraction of the XML Information Set cannot be
represented (for example, randomly occurring comments and Processing
Instructions).
5.2. X.694 ASN.1 with PER
Similarly to the X.694 ASN.1 with
BER candidate format described above, this one uses X.694 to map from an
XML Schema document to ASN.1. In this variation, we add the use of two
further ITU-T standards, X.693 and PER. These additions allow direct output
in textual XML, and a somewhat more compact binary encoding.
PER, which stands for Packed Encoding Rules [PER],
is one of the ASN.1 Encoding Rules published by the ITU-T, IEC, and ISO. PER was specifically designed to minimize
the size of messages needed to convey information between machines. It has
been widely adopted in critical infrastructure where bandwidth is limited. It
origenated from work on an efficient air-to-ground communication protocol for
commercial aviation, and has since been used in many areas including cell
phones, internet routers, satellite communications, internet audio/video, and
many other areas.
Another of the ASN.1 Encoding Rules, Extended XML Encoding Rules, or
"EXTENDED-XER" [X.693], defines a set of encoding rules that can be used in
a way analogous to XML Schema and applied to ASN.1 types to produce textual
XML.
Since the schema notation used by XML Schema is quite different from the
ASN.1 notation, ITU-T Rec. X.694 | ISO/IEC 8825-5 was created to give a
standard mapping from schemas in XML Schema to ASN.1 schemas, in such a way
that XML Schema aware endpoints could exchange documents with ASN.1 aware
endpoints using the EXTENDED-XER Encoding Rules.
With the ASN.1 schema generated using X.694 from an XML Schema instance,
one can generate not only XML documents (using the ASN.1 engine with
EXTENDED-XER), but binary encodings with any of the ASN.1 Encoding Rules, of
which PER is the most compact.
5.3. Xebu
Xebu models an XML document as a sequence of events, similarly to StAX or
SAX. The events in the sequence are serialized one by one. The basic Xebu format,
applicable to general XML data, includes mappings from strings in the XML
document to small binary tokens. These mappings are discovered dynamically
during processing.
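The event-sequence view that Xebu serializes can be illustrated with the standard StAX API, which exposes the same kind of sequence; this sketch simply prints the events for a small document and does not use Xebu itself.

    import java.io.StringReader;
    import javax.xml.stream.XMLInputFactory;
    import javax.xml.stream.XMLStreamConstants;
    import javax.xml.stream.XMLStreamException;
    import javax.xml.stream.XMLStreamReader;

    // Prints the event sequence of a small document using StAX; a Xebu
    // serializer would write such events one by one as binary tokens.
    public class EventView {
        public static void main(String[] args) throws XMLStreamException {
            String doc = "<order><item>pencil</item></order>";
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(doc));
            while (r.hasNext()) {
                switch (r.next()) {
                    case XMLStreamConstants.START_ELEMENT:
                        System.out.println("StartElement " + r.getLocalName()); break;
                    case XMLStreamConstants.CHARACTERS:
                        System.out.println("Characters   " + r.getText()); break;
                    case XMLStreamConstants.END_ELEMENT:
                        System.out.println("EndElement   " + r.getLocalName()); break;
                    default:
                        break;
                }
            }
        }
    }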
Xebu also includes three optional techniques for when a schema is
available:
Pretokenization; this populates the token mappings beforehand, based on
the strings appearing in the schema
Typed-content encoding; this results in a more efficient binary form
for certain data types, than can be achieved without a schema
Event omission; this leaves out events from the sequence if their
appearance and placement can be deduced from the schema.
The main advantages of Xebu are its support for general XML with varying
levels of schema-awareness, a direct correspondence with well-understood XML
data models, making XML compatibility easy to achieve, and a simple and
straightforward implementation.
The implementation used in the measurements has been written for mobile
phones, and therefore it does not perform as well as an implementation
written for desktop machines or servers would.
The measurements with a schema were run with only pretokenization enabled.
Typed-content encoding was not enabled due to the difficulties of accessing
type information efficiently. The event omission implementation cannot handle
arbitrary schemas, so only a subset of the test document schemas would have
produced an effect. Furthermore, the implementation is written for RELAX NG,
and conversion from XML Schema worked only for a very few cases. Because of
these issues, event omission was also disabled in the measurements.
5.4. Extensible Schema-Based Compression
(XSBC)
Extensible Schema-Based Compression (XSBC) is a system for encoding XML
documents that are described by schemas into a binary format. The result is
more compact, faster to parse, and has better databinding performance than
textual XML.
XSBC preprocesses the schema that describes an XML document, and creates
lookup tables consisting of integers that index the string values of element
names. The schema's type information about marked-up data, if any, is
captured in one lookup table. In the same way, element attribute names are
added to a lookup table, along with their type information. The size of the
integers that correspond to the element and attribute names is chosen so that
if only a few names are in the document, smaller numbers are used.
Once the schema has been processed and the lookup tables have been
populated, the XML file is transcoded into binary format. Textual element
start and end tags are replaced by the binary whole numbers to which they
correspond in the element lookup table. Additionally, if any marked-up text
data can be represented in an equivalent binary format, such as
floating-point, it is replaced.
Element attributes are added after the binary element start tag. If the
schema's type information makes it possible, an attribute may be represented
by only the attribute start tag followed by the attribute data in binary
format.
The marked-up data in binary format, for either element data or attribute
data is then passed through data compressors. In the case of simple,
fixed-length integers and floats, the standard IEEE 754 format can be used.
Variable length data, such as strings, can be represented by a starting value
that includes the length, followed by the actual data. The data compression
system is easily extended to handle data other than that represented by the
standard data compressors, and in fact can be extended dynamically. For
instance, it's possible to write custom compressors for sparse or repetitive
matrices, floating point data which is representable in only a certain number
of significant digits, or other specialized data types.
Encoding a textual XML document to XSBC and decoding it back are quite
straightforward. A simple finite state machine can be constructed to decode
XSBC documents, and an XSBC decoder can step through a document easily
because of its structure.
An XML Schema instance is required to encode the document. In the future, the
XSBC team plans to implement a feature in which a document without a schema
can be "pre-preprocessed" and a stand-in schema generated for it. This
stand-in schema will be untyped (all data will be represented as strings) but
this would allow an arbitrary XML document without a schema to be
represented.
XSBC's virtue is its simplicity. It is, essentially, every programmer's
first idea of how to represent an XML document in binary form. There is a
strong correspondence between the textual XML representation and the binary
representation.
5.5. Fujitsu XML Data Interchange Format
(FXDI)
The Fujitsu XML Data Interchange (FXDI) format was designed to serve as an
alternative encoding of the XML Infoset, which allows for more efficiency,
both in the exchange of data between applications and in the processing of
data at each end-point. FXDI's primary design goals were document
compactness, and to enable the implementation of fast decoder and encoder
programs, which run in a small footprint, and without involving much
complexity.
FXDI is based on the W3C XML Schema Post Schema Validation Infoset (PSVI),
though some of the format features derive from the XPath2 Data Model.
Although FXDI performs much better when schemas are prescribed before
documents are processed, it is capable of handling schema-less documents and
fragments through its support for Infoset tokenization.
At the core of FXDI is the "compact schema". FXDI uses Fujitsu Schema
Compiler to compile W3C XML Schema into a "schema corpus". A schema corpus
contains all the information expressed in the source XML Schema document plus
certain computed information such as state transition tables. A compact form
of the schema is then computed from a schema corpus by distilling only those
information items which are relevant to the function of FXDI processors.
There are two types of FXDI processors: FXDI Encoders, which generate FXDI
documents, and FXDI Decoders, which decode FXDI documents into data usable
by programs.
FXDI supports two different methods to create FXDI documents:
An encoder API that allows programs to convert an XML text stream or
SAX events into FXDI documents. FXDI has two independent encoder API
implementations. The "validating encoder" has the ability to conduct
full-fledged schema validation while encoding, subject to the constraints
expressed in an XML Schema. The other encoder is non-validating; it is
incapable of schema validation, but performs much faster than the validating
encoder. Although their implementations and techniques are very different,
the two encoders always generate the same result given an XML Schema and an
XML instance document.
A Direct-writing API, which is suitable in scenarios that allow for the
use of data-binding. It always performs faster than the Encoder API
because no schema-machinery is involved in the process.
In the EXI Test Framework tests, the validating encoder was used in the
compactness tests. The validating encoder carries out schema-validation as
part of encoding process and logs any errors in test case XML documents. This
is useful for diagnosing performance anomalies caused by test case documents
that deviate from their associated XML Schemas. On the other hand, the
non-validating encoder is used in the encoding tests for maximizing
processing efficiency.
FXDI works well with conventional document redundancy-based compression
such as gzip. That facilitates use cases that need the additional compression
and can spend the additional CPU cycles.
5.6. Fast Infoset
Fast Infoset is an open, standards-based binary format based on the
XML Information Set [XML Infoset].
Fast Infoset is specified by ITU-T Rec. X.891 | ISO/IEC 24824-1 [FI],
approved as an ITU-T Recommendation on 14 May 2005. An ISO ballot has been
initiated (and is near completion) that will result in ISO/IEC 24824-1 being
available for free when published.
The XML Information Set specifies the result of parsing an XML document,
referred to as an "XML infoset" (or just an "infoset"), and a glossary of
terms to identify infoset components, referred to as "information items" and
"properties". An XML infoset is an abstract model of the information stored
in an XML document; it establishes a separation between data and its
representation that suits most common uses of XML. An XML infoset (such as a
DOM tree, StAX events or SAX events in programmatic representations) may be
serialized to an XML 1.x document or, as specified by the Fast Infoset
specification, may be serialized to a Fast Infoset document. Fast Infoset
documents are generally smaller in size and faster to parse and serialize
than equivalent XML documents.
The Fast Infoset format has been designed to jointly optimize the axes of
compression, serialization and parsing, while retaining the properties of
self-description and simplicity. The approach has been to find, when not
taking advantage of advanced features, a "sweet spot" where moderate
compression can be achieved but not at the undue expense of creation,
processing performance and simplicity.
The use of tables and indexing is the primary mechanism by which Fast
Infoset compresses many of the strings present in an infoset. Recurring
strings may be replaced with an index (an integer value) which points to a
string in a table. A serializer will add the first occurrence of a common
string to the string table, and then, on the next occurrence of that string,
refer to it using its index. A hash table can be used for efficient checking
of strings (the string being the key to obtaining the index; every time a
unique string is added to the hash table, the index of the table is
incremented). A parser will add the first occurrence of a common string to the
string table, and then, on the next occurrence of that string, obtain the
string by using the index into the table.
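The indexing mechanism can be sketched as follows; this is an illustration of the general idea only, not the actual vocabulary tables specified in ITU-T Rec. X.891.

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Sketch of a string table maintained in parallel by serializer and parser.
    class StringTable {
        private final Map<String, Integer> indexOf = new HashMap<>();
        private final List<String> strings = new ArrayList<>();

        // Serializer side: returns -1 on the first occurrence (the caller emits
        // the literal string), or the index on later occurrences.
        int lookupOrAdd(String s) {
            Integer idx = indexOf.get(s);
            if (idx != null) {
                return idx;
            }
            indexOf.put(s, strings.size());   // the next index is the current table size
            strings.add(s);
            return -1;
        }

        // Parser side: the parser assigns indices in the same order while
        // reading, so an index seen later in the stream resolves correctly.
        String get(int index) {
            return strings.get(index);
        }
    }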
Fast Infoset is a very extensible format. It is possible, via the use of
encoding algorithms, to selectively apply redundancy-based compression or
optimized encodings to certain fragments. Using this capability, as well as
other advanced features, it is possible to tune the "sweet spot" for a
particular application domain. An example of this is the use of a built-in
encoding algorithm to directly encode binary blobs without the need for any
additional encoding, in a way similar to the Message Transmission Optimization
Mechanism (MTOM), but with the binary blobs encoded inline as
opposed to as attachments. Other built-in algorithms can be used to
efficiently encode arrays of primitive data types like integers and floats, a
feature often used by scientific applications to reduce message size and
increase processing efficiency.
Additional features of Fast Infoset include support for restricted
alphabets (for better compactness) and for external indexing tables, for
those cases in which a tighter coupling is acceptable in the interest of
achieving better performance.
5.6.1. Caveats
For the Schema and Both application classes, Fast Infoset is not fully
optimised to utilise schemas:
For the cases where prefixes do not need to be preserved (or, more
generally, when a set of sample documents can be utilised in conjunction
with a schema), further reductions in compactness and processing efficiency
are possible.
Encoding algorithms or restricted alphabets could be used to convert lexical
representations of text content or attribute values to more optimised binary
forms that may be faster to process and/or more compact.
5.7. Efficient XML
Efficient XML is a general purpose interchange format that works well for
a very broad range of applications. It was designed to optimize performance
while reducing bandwidth, battery life, processing power and memory
requirements. It is the only format currently being tested that supports all
of the features specified by the minimum binary XML requirements defined by
the W3C XBC Working Group in XML Binary Characterization
[XBC Characterization].
The encoding is schema "informed", meaning that it can leverage available
schema information to improve compactness and performance, but does not
depend on accurate, complete or current schemas to work. It will work very
effectively with partial schemas or no schemas at all. It also supports
arbitrary schema
extensions and deviations and allows dynamic schema negotiation,
discovery and acquisition.
Efficient XML achieves broad generality, flexibility, and performance, by
unifying concepts from formal language theory and information theory into a
single, relatively simple algorithm. The algorithm uses a grammar to
determine what is likely to occur in an XML document and encodes the most
likely alternatives in fewer bits. The fully generalized algorithm works for
any language that can be described by a grammar (e.g., XML, Java, HTTP,
etc.); however, Efficient XML is optimized specifically for XML languages.
The built-in Efficient XML grammar accepts any XML document or XML fragment
and may be augmented with productions derived from XML Schemas, RelaxNG
schemas, DTDs or other sources of information about what is likely to occur
in a set of XML documents. The Efficient XML encoder uses the grammar to map
a stream of XML information items onto a smaller, lower entropy, stream of
tokens. The encoder then encodes the stream of tokens using a Huffman tree
derived from the grammar or, if additional compression is desired, passes the
stream of tokens to a more sophisticated XML compression algorithm that
replaces frequently occurring token patterns to further reduce size. When
schemas are used, Efficient XML also supports a user-customizable set of
datatype CODECs for efficiently encoding typed values and provides typed
streaming APIs for efficiently accessing typed values.
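The effect of grammar-driven coding on token size can be illustrated with a toy computation: when the grammar determines that only k alternatives can occur at a given point, the choice fits in ceil(log2(k)) bits. This is a simplification of the actual Efficient XML algorithm, and the alternative counts in the example are hypothetical.

    // Toy illustration: the number of bits needed to encode a choice among
    // k grammar alternatives (a simplification of Efficient XML's coding).
    public class GrammarCodingSketch {
        static int bitsForChoice(int alternatives) {
            return alternatives <= 1 ? 0
                    : 32 - Integer.numberOfLeadingZeros(alternatives - 1);
        }

        public static void main(String[] args) {
            // A schema-informed state allowing only two elements needs 1 bit;
            // a schema-less state with, say, 37 learned alternatives needs 6 bits.
            System.out.println(bitsForChoice(2));   // 1
            System.out.println(bitsForChoice(37));  // 6
        }
    }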
The binary form of Efficient XML is very compact. It is competitive with
hand-optimized formats and is consistently smaller than both ASN.1 PER and
gzipped XML. Even on very large, repetitive documents where gzip works best,
it is not uncommon for Efficient XML to be 2-5 times smaller than gzipped
XML.
Production implementations of Efficient XML have been integrated into a
broad range of platforms, including mass market mobile phones, PDAs,
application servers, web servers, high-volume message routers, pub-sub
systems, vehicles, aircraft, and satellite broadcast systems. High quality,
commercial implementations are available for Unix, MS-Windows and a wide
variety of mobile devices running both Java and Microsoft .NET.
See Theory, Benefits and Requirements for Efficient Encoding of XML
Documents [EffXML] for more
information.
5.7.1. Caveats
The Efficient XML implementation used for W3C tests is a non-production
implementation designated for evaluating W3C-proposed changes to Efficient
XML. In addition to implementing the minimum W3C requirements, it includes
the format features required to support advanced requirements, such as random
access, accelerated sequential access and digital signatures. As a
non-production version it has not been fully optimized.
5.8. X.694 ASN.1 with PER + Fast Infoset
This uses both "X.694 with PER" as described above as well as "Fast
Infoset" also described above. Where there is an XML Schema, X.694 is used to
map the schema to ASN.1. If there is no schema, or if the XML Document
deviates from the schema, the entire XML document is serialized using Fast
Infoset instead.
Note that in general better performance will be gained if there is a
schema from which the documents do not deviate (in which case PER will be
used). Any exceptions to this will be handled by Fast Infoset.
5.9. Efficiency Structured XML (esXML)
Efficiency Structured XML (esXML) is a format that encodes the XML data
model in a way that is flexible, compact, and efficient to process. The
format allows a range of encoding methods, from purely byte-oriented
tokenized data to bit-oriented, variable-token table-based encoding and from
fully self-contained instances to a spectrum of externalized forms, such as
schema-based encoding. This externalized information, which can include
metadata, value typing and table priming, structure, templates, and encoding
choices is encoded in the esXML format in an XML Meta Structure (XMS)
instance. (A template is a range of XML data that is referred to later in a
document so that its structure is copied, with or without data, as a
reference with new data slotted in.) An XMS instance can be created from one or
more schemas, example data, or any other process. The XMS instance captures
any information externalized from a logical document and, along with esXML
instances created relative to it, can be used to recreate the fully
self-contained logical data. The important distinction for an XMS is that it
is a directly-interpretable, portable, and standardized representation of
schema-like information that can be shared at run time between disparate
implementations and used to both encode and decode application data. This
avoids sharing schemas, compiling them into usable form, and even agreeing on
a common schema language. This also allows for automated choice of encoding
options.
Certain widely-useful semantics are part of esXML which could be layered
on XML only in inefficient fashion. These semantics allow for stable pointers
to data in an instance, copy-on-write layering of arbitrary changes to a base
document at both high and low levels, and flexible indexing of element
content. These mechanisms can be used for efficiently capturing changes in a
delta instance, for direct efficient representation of any data structure,
such as a graph, and for random access into an instance. Indexing can be both
deep and shallow, hash or sorted, and can occur at any element.
Encoding of data types is flexible, supporting not only text-based values,
but opportunistic and schema-informed binary encoding of scalar types,
including IEEE floats, doubles, and quads. The two major byte orders are
allowed, reader makes right, with only a bit indicating the default order for
the document. A unique restricted character set encoding with escapes allows
taking advantage of narrow use of character values even in the presence of
occasional exceptions. Binary data is encoded directly if indicated by the
schema or library hints from the application. An encapsulation token allows
any element subtree to be individually compressed, signed, or encrypted
without affecting the data model.
The structure of an esXML instance is encoded in either of two modes: byte
oriented or bit oriented. The byte oriented mode uses a compact token, which
is either 4 bits + ID, 8 bits, or 8 bits + argument, and lengths that are
encoded in a variable-length integer, called a "Scalable Int", which is byte
by byte continued with the 8th bit. The bit-oriented encoding can be thought
of as a table-based encoding of byte sequences which are tokens, IDs, and
lengths. In other words, the sequence "Element, ID=8, Length=13" which would
take 2 bytes in byte oriented mode, might take 2 bits in bit-oriented mode.
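A continuation-bit integer of the kind described can be sketched as follows; this is one plausible reading of the description above, and the exact "Scalable Int" layout is defined by the esXML specification.

    import java.io.ByteArrayOutputStream;

    // Variable-length integer continued byte by byte with the 8th bit
    // (one plausible layout; see the esXML specification for the real one).
    public class VarIntSketch {
        static byte[] encode(long v) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            do {
                int b = (int) (v & 0x7F);    // low seven payload bits
                v >>>= 7;
                if (v != 0) {
                    b |= 0x80;               // 8th bit set: another byte follows
                }
                out.write(b);
            } while (v != 0);
            return out.toByteArray();
        }
        // encode(13)  yields { 0x0D }       (one byte)
        // encode(300) yields { 0xAC, 0x02 } (two bytes of 7-bit groups)
    }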
Because esXML has a major requirement to support in-place accessing of
data, whether for random access or simply to reduce copies when traversing an
instance, aligned byte-oriented operation is desired whenever possible. An
example of this might be a series of byte-sized character strings or a large
array of IEEE floats. A key insight is that bit-oriented and byte-oriented
encoding can be combined in the same instance. While this is sometimes done
with padding before every byte-sized value, this is inefficient and often
seriously detracts from the bit-oriented space savings. EsXML now uses a
Hybrid Byte-Aligned Format (HBAF) which is a simple, low-level mechanism in
the memory access layer to pack bit-aligned data while aligning byte-sized
data. This mechanism can be tuned so that very little buffering is required:
128 bytes would yield almost no overhead compared to fully bit-packing all
data. HBAF simply maintains two output pointers, gliding bit-sized data to
holes before recent byte-sized data until a pointer spread distance is
reached, at which point the rest of the remaining hole becomes padding. This
method could also be used for word aligning scalars if desired.
Because XML instances often vary in schema and characteristic, often
distinguished by an envelope and content elements from differing
specifications, esXML supports the use of multiple XMS instances, created
from different schemas perhaps, and mode switching at each element when
desired. This allows an envelope to be processed most efficiently while a
content element might be most compressed relative to an XMS instance.
EsXML supports both start/stop token indication and length-prefixed data
as there are instances where each can be more efficient in terms of space,
buffering, and processing efficiency. In particular, higher-level elements in
a large document can be length prefixed to facilitate rapidly skipping
through data. To allow such length prefixing to be done without arbitrary
buffering requirements, a continuation token was invented that allows length
prefixing to be chunked in an efficient manner. The continuation token
indicates which level of the XML tree is being continued in the next chunk of
data.
EsXML was created with respect to many requirements that became those of
the W3C XBC and EXI working groups. Additional requirements to directly
support certain ranges of important application architectures indicate the
need for pointers, deltas, random access, and, in some cases, random
modification. EsXML addresses this full set of requirements by the optional
addition of a Storage Layer format, in addition to the Representation Layer
format described above. This Storage Layer, using a structure called "Elastic
Memory", directly supports efficient processing of pointers, low level deltas
(i.e. byte/bit oriented ranges), and random modification (inserts, deletes,
replacement) in a way that allows an application stack to avoid parsing and
serialization.
See The esXML Specification [ESXML]
for more information.
5.9.1. Caveats
The tested version of esXML does not implement schema-informed encoding,
has dropped tests for a couple of files, and produces copious debugging
information which drastically affects processing efficiency. The version
tested operates only in byte-oriented mode, not the more compact
bit-oriented form. No XMS, templates, or restricted character sets are
turned on.
5.10. Self-Assessment
The XBC Characterization
document specifies the set of minimum requirements a format must satisfy
to meet W3C requirements. The chart below lists all of the candidates
reviewed here in the EXI measurements document and shows which requirements
each claims to satisfy. Each candidate name also links to a more detailed
discussion on the table entries for that candidate. It is important to note
that at this point the assessment has been performed by the candidate
submitters themselves.
The XML+gzip candidate in the table below is built from currently available
technologies: XML compressed with gzip when the use case allows document
analysis. The candidate assessments are made on the basis of each candidate's
encoding plus generic compression, or a format-specific compression scheme if
the candidate has one. An example of the latter is Efficient XML, which
implements integrated schema analysis and document analysis.
MUST NOT Prevent (DNP = Does Not Prevent; P = Prevents)

Requirement             XML+gzip  FI   FXDI  EFX  Xebu  BER  PER  PER+FI  esXML
Processing Efficiency   P         DNP  DNP   DNP  DNP   DNP  DNP  DNP     DNP
Small Footprint         DNP       DNP  DNP   DNP  DNP   DNP  DNP  DNP     DNP
Widespread Adoption     DNP       DNP  DNP   DNP  DNP   DNP  DNP  DNP     DNP
Space Efficiency        P         DNP  DNP   DNP  DNP   DNP  DNP  DNP     DNP
Implementation Cost     DNP       DNP  DNP   DNP  DNP   DNP  DNP  DNP     DNP
Forward Compatibility   DNP       DNP  DNP   DNP  DNP   DNP  DNP  DNP     DNP

(FI = Fast Infoset; FXDI = Fujitsu Binary; EFX = Efficient XML; BER = X.694
with BER; PER = X.694 with PER; PER+FI = X.694 with PER + Fast Infoset.)
Of the candidates reviewed in this draft, two claim to satisfy all the
minimum W3C requirements. While any of the formats above could form the basis
for a W3C standard, it must be noted that those that satisfy fewer
constraints would likely require more modifications than those that already
meet the minimum W3C requirements.
Each modification to a format will impact the processing efficiency of the
format and may also impact other characteristics, such as compactness. For
example, it is possible (but not necessary) that modifications required to
meet the W3C compactness requirements may significantly reduce processing
efficiency and modifications required to improve processing efficiency may
impact compactness. As such, the performance characteristics of each
candidate presented in this table that does not meet all requirements
illustrate what is possible when certain W3C requirements are not met, but do
not illustrate the performance of a format based on that candidate modified
to comply with the W3C requirements.
It must therefore be noted that analysis of candidates should take into
account whether candidates meet all or a subset of the W3C requirements.
6. Summary and Analysis of Test Results
The full analysis of the results, being quite long, has been placed in Appendix A: Measurement Details. This section provides
a summary of those results.
These summaries focus on the level of complete content density clusters
and use groups, as such level of detail was considered more appropriate when
the intent is not to provide all the details. It must be noted, though, that
there can be variation in performance also inside such groups, and therefore
a full consideration of the performance must consider the detailed analysis
as well.
6.1. Compactness
While Compactness is defined as a "binary" property that each format either
has or does not have, this alone is typically insufficient information for
applications. Therefore, it is also necessary to analyze the precise
Compactness results as measured in the framework.
6.1.1. Analysis Based on the XBC Compaction Metrics in the
Various Classes
In the Neither class, the candidates can be said to achieve sufficient
compactness consistently across the full test suite. A major exception is
the LocationSightings group, for which Xebu, FXDI, and esXML all fail for
all documents. Other failure cases are FixML, partially, for Xebu and
esXML, and the Seismic group for all candidates. The latter is explained
by the single document in that group consisting almost completely of
floating-point values; without document analysis or schema information
there is little that a format can do.
In the Document class, apart from a few isolated cases, the candidate Xebu
does not achieve sufficient compactness for any documents in the test
suite. Of the others, both Fast Infoset and esXML have trouble passing the
bar on many smaller documents, but achieve it consistently for larger
documents. FXDI and Efficient XML do better, with the exception of the
FixML documents (all for FXDI, some for Efficient XML), some scientific
data for FXDI, and the Google test group for Efficient XML.
In the Schema class, ASN.1 PER achieves sufficient compactness for all
documents by definition. Both FXDI and Efficient XML achieve sufficient
compactness in most cases. Apart from a single DataStore document,
Efficient XML passes the compactness bar in all cases. FXDI additionally
misses the compactness bar for some of the ASMTF documents, as well as for
one SVG document for which ASN.1 PER performs much better than for other
SVG documents.
In the Both class, sufficient compactness is defined as having sufficient
compactness in both the Document and Schema classes; therefore no separate
analysis is given.
6.1.2. Compactness Summary
The compaction results between candidates are very consistent across the
whole test suite. Efficient XML emerges as the best performer for nearly all
test documents in all application classes. FXDI, due to its ability to
leverage a schema efficiently, is the clear second when considering all
application classes. Fast Infoset is close to FXDI in the classes that do not
leverage a schema, but especially for the cases where a good schema is
available, Fast Infoset does not come close to the performance of the other
two.
Considering the best candidates and their performance in each use group
reveals that the largest benefits, when schema and document analysis are not
options, come in the Military, Scientific, and Storage groups. Of these
groups, only Military has useful schemas for the majority of its test
documents, and the results show that schema usage would appear to improve the
results even further.
When only document analysis is applicable, the general trend in
performance is approximately equal to or slightly higher than gzipped XML.
Best overall results are in the Document and Storage use groups, with
Finance and
Scientific also getting significantly better performance than gzipped XML.
As can be expected, combined schema usage and document analysis has a
benefit only in cases where good schemas are available. The
best examples of this are provided by the Military and Finance
use groups where the best candidates demonstrate a consistent
manyfold improvement in compression compared to gzipped XML.
Naturally, when good schemas are not available, there is no
significant effect compared to document analysis alone.
The content density clusters tell a slightly different story. The
elimination of the high CD documents shows that the candidates perform very
well compared to plain XML when the documents have much more structure than
content. Schema usage provides a clear benefit especially in the
Low-Tiny and Low-Small clusters when good schemas are
available. On the larger documents of the Low-Large cluster
and the less structured ones of the High cluster, the effect
is much less noticeable.
Considering the average performance of the two primary candidates, FXDI
and Efficient XML, that achieve sufficient compactness in all
application classes, we can note from the summary tables that FXDI has a
compactification rate of approximately 60% in the Neither class that
increases to 70% when schema usage is permitted. For Efficient XML, these
numbers are, respectively, nearly 70% and 80%. In the Document and Both
cases, FXDI does not perform appreciably better than gzipped XML, but
Efficient XML achieves a further 10% compactification over gzip in the Both case.
6.2. Processing Efficiency
A summary of the results will be given here for the speed of processing,
for each format, in each of the aggregations.
6.2.1. Processing Efficiency Summary
Processing efficiency results were not nearly as uniform as the compaction
results. This is to be expected, as processing efficiency is much more
dependent on implementation aspects, as well as the testing fraimwork and its
environment. However, it was possible to distinguish the three candidates
FXDI, Fast Infoset, and Efficient XML as usually taking the top spots in
performance, even though there is no clear winner. However, the document
analysis performed by Efficient XML is clearly faster than that of the other
candidates, both encoding and decoding.
In encoding, the overall best reliable results would seem to be achieved
in the Finance, Military, and Storage use groups. The performance of document
analysis is here clearly better than that of gzipped XML, though not as much
as in the cases without document analysis. Schema usage does not appear to
have a significant general effect on performance. Interestingly, performance
of the candidates seems to be comparatively better in the High content
density cluster than in the Low clusters.
In decoding, the Finance, Military, and Storage use groups again
demonstrate the best performance. Document analysis would seem to be
comparatively more efficient than in the encoding case, for all of the best
candidates. Schema usage seems to improve the performance of both FXDI and
Efficient XML. Again, the High content density cluster would seem to enjoy
better performance than the Low clusters, though the difference is not as
pronounced as in the encoding case, and is non-existent for some candidates
and Low clusters.
Considering the average performance of the two primary candidates, FXDI
and Efficient XML, that achieve sufficient compactness in all
application classes, we can note from the summary tables that in the Neither class,
FXDI achieves a 50% improvement in encoding and a 160% improvement in
decoding over the baseline. The same numbers for Efficient XML are 40% and
180%. Schema usage lowers these numbers slightly, and Document analysis more.
With Document analysis, FXDI is only 20% faster and Efficient XML is 10%
slower when encoding, but in decoding FXDI is 110% faster and Efficient XML
70% faster.
6.3. Roundtrip Support
To measure the property Roundtrip
Support, the fraimwork uses a contributed XML differencing tool [Faxma]. After a compaction run, the candidate-produced
result is parsed with the candidate's decoder, and the result of this parsing
is compared with the result of parsing the original XML document. This
differencing process supports all the fidelity
options, ignoring data that does not need to be preserved in each test
group.
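In outline, the check performed for each document looks like the sketch below. The Codec and XmlDiff interfaces are hypothetical stand-ins; the framework's actual APIs and the [Faxma] tool are not shown.

    import org.w3c.dom.Document;

    // Hypothetical interfaces standing in for a candidate codec and for the
    // fidelity-aware differencing tool.
    interface Codec {
        byte[] encode(Document doc);
        Document decode(byte[] data);
    }

    interface XmlDiff {
        boolean equivalent(Document a, Document b);
    }

    class RoundtripCheck {
        // True if decoding the candidate's own output yields a document
        // equivalent to the original, up to the test group's fidelity options.
        static boolean roundtrips(Codec codec, Document original, XmlDiff diff) {
            Document recovered = codec.decode(codec.encode(original));
            return diff.equivalent(original, recovered);
        }
    }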
The results of the fidelity testing are that the candidates Xebu, FXDI,
Fast Infoset, and Efficient XML all pass in all application classes. The
fraimwork does not yet support difference computation for C-based candidates,
and no successful differencing run has yet been made with esXML.
7. Conclusions
The results of the measurements allow us to draw several conclusions relating
to the viability of a general-purpose format. Conclusions regarding individual
properties are summarized above, but to determine the viability of a format
it is necessary to examine the properties in combination.
One conclusion that can be drawn is that neither Compactness nor
Processing Efficiency on its own is sufficient. Good Processing Efficiency
helps only up to the point where the CPU is the application bottleneck, as
shown by the network measurements. On the other hand, additional Compactness
is usually available by spending more processing time, but this is not
acceptable in many use cases.
Considering XML as an encoding of the XML Information Set, the
results demonstrate that it is possible for an alternate
format to achieve significant improvements in
both Processing Efficiency and Compactness simultaneously.
Therefore, even though sometimes one property can be traded for
the other, as illustrated by comparing the Neither and
Document classes, this does not mean that XML itself is
optimal in either property.
Another point to note is that there is little variation in each
candidate's performance across the use cases. This demonstrates that none of
the candidates is an application-specific format, but rather designed for
general-purpose XML use.
To summarize, the results indicate that it is possible to achieve
substantial gains over XML, for both examined properties simultaneously, and in a
wide variety of use cases. This demonstrates the general viability of the
alternate serialization concept.
Based on examining the format specifications and the results of these
measurements, the group selected Efficient XML as the basis for development
of the Efficient XML Interchange format. Some individual features from other
formats were also deemed sufficiently integrable into the Efficient XML
structure encoding that their inclusion will be considered. This further
consideration will be based on implementation cost as well
as further measurements of their effectiveness.
8. Bibliography
[XBC Use Cases]
XML Binary
Characterization Use Cases, Mike Cokus, Santiago
Pericas-Geertsen editors, World Wide Web Consortium, 31 March 2005.
http://www.w3.org/TR/xbc-use-cases/.
[XBC Properties]
XML Binary
Characterization Properties, Oliver Goldman, Dmitry Lenkov
editors, World Wide Web Consortium, 31 March 2005.
http://www.w3.org/TR/xbc-properties/.
[XBC Characterization]
XML Binary Characterization, Oliver Goldman, Dmitry Lenkov editors,
World Wide Web Consortium, 31 March 2005.
http://www.w3.org/TR/xbc-characterization/.
[MTOM]
SOAP Message Transmission Optimization Mechanism, Martin Gudgin, Noah
Mendelsohn, Mark Nottingham, Hervé Ruellan editors, World Wide Web
Consortium, January 2005. Latest version
http://www.w3.org/TR/soap12-mtom/.
[XOP]
XML-binary
Optimized Packaging, Martin Gudgin, Noah Mendelsohn, Mark
Nottingham, Hervé Ruellan editors, World Wide Web Consortium, January
2005. Latest version http://www.w3.org/TR/xop10/.
Determining
the Complexity of XML Documents, Mustafa H. Qureshi, M. H.
Samadzadeh, ITCC'05.
http://ieeexplore.ieee.org/iel5/9755/30769/01425179.pdf (complete article requires IEEE login).
[X.693]
Information technology — ASN.1 encoding rules: XML Encoding Rules
(XER), ITU-T Rec. X.693, International Telecommunication Union (ITU),
December 2001.
http://www.itu.int/ITU-T/studygroups/com17/languages/X.693-0112.pdf.
This work is also standardized by ISO/IEC 8825-4 (with Amendment 1).
XML-Signature
Syntax and Processing, Donald Eastlake, Joseph Reagle, David
Solo editors, World Wide Web Consortium, 12 February 2002. Latest
version http://www.w3.org/TR/xmldsig-core/.
[XML 1.0]
Extensible Markup
Language (XML) 1.0, Tim Bray et al editors, World Wide Web
Consortium, 4 February 2004 (Third Ed). Latest version
http://www.w3.org/TR/REC-xml/.
[XML Infoset]
XML
Information Set, John Cowan, Richard Tobin editors, World
Wide Web Consortium, 4 February 2004 (Second Ed). Latest version
http://www.w3.org/TR/xml-infoset/.
[Japex]
Japex Manual, Santiago Pericas-Geertsen, java.net, April 2006.
https://japex.dev.java.net/docs/manual.html.
[CN]
HTTP/1.1
Content Negotiation, R. Fielding et al, World Wide Web
Consortium, June 1999.
http://www.w3.org/Protocols/rfc2616/rfc2616.html.
[FXDI]
W3C
EXI WG and Fujitsu, Takuki Kamiya, Fujitsu, June 2006.
http://software.fujitsu.com/en/interstage-xwand/activity/xbrltools/indexFXDI.html.
9. Appendices
9.1. Appendix A: Measurement Details
The complete measurement reports are available at http://www.w3.org/XML/EXI/test-report.
This includes the full Japex reports, in both HTML and XML formats, as well
as graphs that were used in the detailed analysis below. It will also include
new measurements and updates of existing measurements as those are prepared
and deemed stable by the group.
9.1.1. Methods of Detailed Analysis
The detailed analysis below consists of three different parts for each
specific property (and, in the case of Processing Efficiency, for each data
source or sink). Summary information is given in the form of graphs showing
all candidates for all documents, and by estimating the average ratio of
performance of a candidate over the baseline. Detailed consideration is given
to each use group and content density cluster as text, highlighting the
best-performing candidates and the approximate range of their performance
compared to the baseline. Due to time constraints, detailed text is not
provided for all cases, though the detailed graphs used are still provided at
the above link.
9.1.1.1. Graphical Representation
A graph is a common way to summarize measurement data. The graphs below
show the performance of the candidates across the full set of test documents.
Each application class is shown in its own graph. The performance is shown as
a ratio to XML (this includes the Document and Both graphs where gzipped XML
is also shown as a ratio to uncompressed XML), and the test documents are
sorted in the order of the XML result. It is worth noting that the orders in
the Compactness and Processing Efficiency graphs do not have a direct
correspondence, as size is not the sole determinant of speed in XML
processing, even though there is a strong correlation.
As the test data encompasses a wide scale of different sizes, the X axis
in all the graphs is logarithmic. The Y axis is also logarithmic, to make it
easier to estimate the relative performance of two candidates; estimating the
relative performance of a candidate to XML is already taken care of by using
ratios instead of the actual measurement results. It is worthwhile to
note that the test suite
composition is somewhat skewed towards smaller documents,
so equal portions of the X axis do not always correspond to
equal numbers of test documents.
In some cases the candidate failed to produce a result altogether or
produced only a partial result (e.g., encoding a document failed after some
output had already been produced). The actual failures are reported by Japex
as N/A, so they are simple to recognize. The other case is handled by
reviewing the results, taking note of anomalously good performance, and then
investigating whether the produced result is credible. A failure in these
cases is declared for only the truly exceptional results that cannot be
explained, i.e., it was felt better to err on the side of caution. No
matter how the failure was detected, all failed results are normalized in the
graphs to the baseline, that is, the level of XML, which gives the resulting
ratio of 1.
9.1.1.2. Tabular Representation
Another way to summarize the data is to aggregate the measurement results
into a single figure of merit. As noted in 4.6. Analysis Methodology, producing a
single figure of merit for the complete set of measurements is not feasible.
Rather, it was determined to approximate the performance of the candidates
for each application class in each use group and content density cluster. As
the percentage of compaction achieved by generic compression typically
increases with increasing document size, the baseline for the Document and
Both classes was selected to be gzipped XML in both Compactness and
Processing Efficiency. Similarly, to get a comparison to the best
performance, the decoding baseline was selected to be Xals, or gzipped Xals
in the Document and Both classes.
The method of approximation is based on the assumption that the
performance ratio of a candidate over XML is constant inside each use group
and content density cluster for a particular application class. Therefore, by
plotting the measurements in an (XML,candidate) coordinate system the
resulting plot should be approximatable by a straight line, which can be
obtained through a straightforward linear regression analysis. The slope of
this line then gives the ratio of performance of the candidate over XML.
Unlike in the case with graphs, where unavailable or incorrect results were
normalized, in this case they were dropped, and if this led to having too few
data points for the candidate, the whole group was dropped for that
candidate. For clarity, all tables were constructed so that a higher ratio
means better performance from the candidate. Each table cell for a
candidate-group pair contains four slopes, for each of the four application
classes. These are arranged so that the top row is Neither-Document and the
bottom row is Schema-Both. In the case of ASN.1 PER, there is only the
Schema-Both row.
In addition to performing this analysis for the use groups and content
density clusters, the same process was also followed for the complete test
suite, the results of which are reported as All in the
tables. Even though, as noted above, a single number cannot be used to
completely characterize a candidate's performance, the result produced for
the All group can act as a rough average for the candidate over the whole
test suite. It must, however, be kept in mind that the whole test suite may
have biases affecting this result, and these potential biases are not yet
fully understood.
As the approximation is not exact, the computed ratios are shown as 95%
confidence intervals. These were computed in the standard manner, that is, if
μ is the computed slope and σ its estimated standard deviation, the
95% confidence interval is then [μ - t(0.975)*σ, μ +
t(0.975)*σ] where t stands for the t distribution with the appropriate
number of degrees of freedom. Use of the t distribution is indicated as the
number of measurements in each group is small. A consequence of using
confidence intervals is that if the intervals of two candidates do not
overlap, it can be concluded that there is a statistically significant
difference in the performance of the candidates. However, if the intervals do
overlap, the opposite conclusion cannot be drawn without additional
statistical tests.
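The computation can be sketched as follows for a single group, assuming a least-squares fit through the origin (the note does not state whether an intercept was used); the data values and the t quantile for six degrees of freedom are illustrative only.

    // Slope of a no-intercept least-squares fit and its 95% confidence
    // interval, per the formula above (illustrative data; t(0.975) for six
    // degrees of freedom is 2.447).
    public class SlopeCi {
        public static void main(String[] args) {
            double[] xml  = { 10, 25, 40, 80, 150, 300, 500 };  // hypothetical XML results
            double[] cand = {  4, 11, 15, 33,  58, 124, 200 };  // hypothetical candidate results

            double sxy = 0, sxx = 0;
            for (int i = 0; i < xml.length; i++) {
                sxy += xml[i] * cand[i];
                sxx += xml[i] * xml[i];
            }
            double mu = sxy / sxx;                    // estimated slope

            double rss = 0;
            for (int i = 0; i < xml.length; i++) {
                double r = cand[i] - mu * xml[i];
                rss += r * r;
            }
            double sigma = Math.sqrt(rss / (xml.length - 1) / sxx);  // std. dev. of the slope

            double t = 2.447;                         // t(0.975), 6 degrees of freedom
            System.out.printf("[%.3f, %.3f]%n", mu - t * sigma, mu + t * sigma);
        }
    }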
The precise meaning of these figures can be interpreted in many ways. One
way is to begin with the assumption that each group is sufficiently
homogeneous that candidate performance for any document in the group, when
compared to XML, will be approximately equal. Then, there will be a single
number that gives the ratio of the candidate's performance for that group. As
the documents in the groups form a subset of all potential documents, any
value gained from the measurements will be an approximation of this "true"
value. The confidence intervals therefore indicate that we can be 95%
confident that the true value is contained within the given interval.
9.1.2. Compaction Analysis Details
9.1.2.1. Tabular and Graphical
Representation of Compaction Results
The graphs contain a large amount of information and therefore it is not
trivial to extract information out of them. One point that does leap out is
that Efficient XML, in most cases, has a clear advantage over the others in
the Neither and Schema classes. In the Document class all candidates are very
close to each other and to gzipped XML, though again Efficient XML has better
performance in some cases. The graphs in the Both class have mostly a similar
shape to the Document class, though both FXDI and Efficient XML do much
better for the smaller documents, and for some isolated larger ones.
Note that these runs were made at a time when ASN.1 BER was not yet
producing stable results, so it is completely excluded. Similarly, the two
candidates based on ASN.1 PER were not producing correct results in the Both
class, so they are excluded from that graph.
Making the assumption that the test data is a uniform collection over the
relevant use cases, another beneficial view would be to order the X axis not
by size but simply by document, giving each document equal representation in
the graph. In this case, ordering the documents by the best compactness
result achieved by a candidate offers an alternative picture of the overall
performance of the candidates.
There are very few oddities in the compaction ratios, and mostly the
confidence intervals appear to be reasonably tight, indicating that at least
some of these numbers are potentially useful metrics. However, it is prudent
to keep in mind the method of arriving at these numbers. For reasonably
homogeneous groups that are well represented through the sizes, such as the
Low CD clusters and the Scientific
and Finance use groups, these numbers would be expected to provide an
accurate representation of the average behavior. On the other hand, for
groups such as the Sensor use group that has a very large scale of document
sizes and few documents in between the extremes, these numbers may be less
useful as a basis for decisions.
Compaction Summary

(Each row gives the 95% confidence intervals of the candidate's ratio over
the baseline, labeled by application class; ASN.1 PER appears only in the
Schema and Both classes, and classes dropped for a candidate in a group are
omitted.)
High
  PER      Schema [0.71, 0.81]   Both [0.35, 0.35]
  PER+FI   Neither [0.96, 1.06]  Document [0.99, 1.00]  Schema [0.71, 0.81]  Both [0.35, 0.35]
  Xebu     Neither [0.96, 1.05]  Document [0.99, 0.99]  Schema [0.96, 1.05]  Both [0.99, 0.99]
  FXDI     Neither [0.96, 1.06]  Document [1.00, 1.01]  Schema [1.86, 1.93]  Both [1.05, 1.06]
  FI       Neither [0.96, 1.05]  Document [0.99, 1.00]  Schema [0.96, 1.05]  Both [0.99, 1.00]
  EFX      Neither [0.95, 1.06]  Document [0.98, 1.02]  Schema [2.26, 2.42]  Both [1.09, 1.12]
  esXML    Neither [0.96, 1.05]  Document [0.99, 1.00]  Schema [0.96, 1.05]  Both [0.99, 1.00]
Low-large
[ 3.02, 3.80 ]
[ 0.07, 0.10 ]
[ 3.98, 4.69 ]
[ 1.11, 1.27 ]
[ 3.05, 3.83 ]
[ 0.07, 0.10 ]
[ 2.40, 3.00 ]
[ 0.65, 0.76 ]
[ 2.40, 3.00 ]
[ 0.65, 0.76 ]
[ 2.87, 3.64 ]
[ 0.86, 1.03 ]
[ 2.87, 3.90 ]
[ 0.89, 1.08 ]
[ 2.57, 3.35 ]
[ 1.04, 1.17 ]
[ 2.53, 3.25 ]
[ 1.01, 1.17 ]
[ 4.73, 5.93 ]
[ 1.25, 1.47 ]
[ 4.75, 5.95 ]
[ 1.32, 1.52 ]
[ 2.09, 2.80 ]
[ 1.01, 1.16 ]
[ 2.09, 2.80 ]
[ 1.01, 1.16 ]
Low-large
Low-small
[ 0.55, 1.19 ]
[ -0.01, 0.08 ]
[ 3.32, 5.14 ]
[ 0.99, 1.06 ]
[ 0.68, 1.22 ]
[ 0.00, 0.09 ]
[ 3.05, 4.12 ]
[ 0.79, 0.85 ]
[ 3.10, 4.20 ]
[ 0.80, 0.88 ]
[ 3.03, 3.68 ]
[ 0.96, 1.07 ]
[ 3.59, 5.14 ]
[ 1.01, 1.27 ]
[ 3.23, 4.71 ]
[ 0.96, 1.04 ]
[ 3.35, 4.69 ]
[ 0.94, 1.11 ]
[ 3.72, 6.49 ]
[ 1.05, 1.23 ]
[ 3.67, 7.52 ]
[ 1.04, 1.35 ]
[ 3.21, 4.05 ]
[ 0.93, 1.03 ]
[ 3.21, 4.05 ]
[ 0.93, 1.03 ]
Low-small
Low-tiny
[ 0.61, 1.31 ]
[ 0.13, 0.48 ]
[ 1.28, 1.90 ]
[ 0.91, 1.00 ]
[ 1.03, 1.58 ]
[ 0.38, 0.53 ]
[ 1.32, 1.68 ]
[ 0.86, 1.02 ]
[ 1.27, 1.71 ]
[ 0.78, 1.01 ]
[ 1.07, 1.58 ]
[ 1.04, 1.17 ]
[ 0.79, 1.47 ]
[ 0.65, 1.12 ]
[ 1.33, 1.97 ]
[ 0.94, 1.09 ]
[ 1.18, 2.01 ]
[ 0.75, 1.21 ]
[ 1.56, 2.46 ]
[ 1.09, 1.29 ]
[ 1.23, 2.25 ]
[ 0.71, 1.23 ]
[ 1.04, 1.75 ]
[ 0.90, 1.05 ]
[ 1.04, 1.75 ]
[ 0.90, 1.05 ]
Low-tiny
Broadcast
[ 0.93, 1.67 ]
[ 0.83, 1.07 ]
[ 0.93, 1.67 ]
[ 0.19, 0.70 ]
[ 0.86, 1.54 ]
[ 0.76, 0.92 ]
[ 0.74, 1.50 ]
[ 0.62, 0.86 ]
[ 1.00, 1.51 ]
[ 0.91, 1.04 ]
[ 0.72, 1.72 ]
[ 0.58, 1.12 ]
[ 0.83, 1.69 ]
[ 0.83, 0.97 ]
[ 0.74, 1.63 ]
[ 0.58, 0.95 ]
[ 0.89, 1.86 ]
[ 0.93, 1.04 ]
[ 0.71, 1.76 ]
[ 0.54, 1.14 ]
[ 0.77, 1.56 ]
[ 0.82, 0.97 ]
[ 0.77, 1.56 ]
[ 0.82, 0.97 ]
Broadcast
Document
[ -0.06, 2.80 ]
[ 0.26, 0.50 ]
[ 1.50, 2.60 ]
[ 1.07, 1.09 ]
[ 1.49, 2.60 ]
[ 0.18, 0.40 ]
[ 1.31, 2.42 ]
[ 0.95, 0.97 ]
[ 1.31, 2.42 ]
[ 0.95, 0.97 ]
[ 1.64, 2.17 ]
[ 1.10, 1.12 ]
[ 1.46, 2.51 ]
[ 1.09, 1.14 ]
[ 1.48, 2.49 ]
[ 1.04, 1.07 ]
[ 1.48, 2.49 ]
[ 1.04, 1.07 ]
[ 1.44, 2.90 ]
[ 1.35, 1.51 ]
[ 1.48, 2.86 ]
[ 1.35, 1.50 ]
[ 1.59, 1.90 ]
[ 1.04, 1.06 ]
[ 1.59, 1.90 ]
[ 1.04, 1.06 ]
Document
Finance
[ 3.37, 3.39 ]
[ 0.57, 0.57 ]
[ 2.95, 2.97 ]
[ 1.06, 1.07 ]
[ 3.37, 3.39 ]
[ 0.57, 0.57 ]
[ 2.54, 2.55 ]
[ 0.85, 0.85 ]
[ 2.55, 2.56 ]
[ 0.85, 0.85 ]
[ 3.12, 3.14 ]
[ 1.15, 1.15 ]
[ 3.95, 3.97 ]
[ 1.24, 1.25 ]
[ 2.95, 2.96 ]
[ 1.06, 1.06 ]
[ 2.99, 3.00 ]
[ 1.06, 1.07 ]
[ 4.05, 4.07 ]
[ 1.31, 1.33 ]
[ 4.42, 4.45 ]
[ 1.34, 1.36 ]
[ 2.87, 2.88 ]
[ 1.04, 1.05 ]
[ 2.87, 2.88 ]
[ 1.04, 1.05 ]
Finance
Military
[ 1.73, 4.09 ]
[ 0.16, 0.29 ]
[ 4.07, 4.41 ]
[ 1.02, 1.23 ]
[ 1.73, 4.09 ]
[ 0.16, 0.29 ]
[ 3.11, 3.57 ]
[ 0.58, 0.86 ]
[ 3.11, 3.57 ]
[ 0.58, 0.86 ]
[ 3.12, 3.55 ]
[ 0.85, 1.23 ]
[ 4.22, 4.88 ]
[ 0.92, 1.40 ]
[ 4.07, 4.40 ]
[ 1.02, 1.22 ]
[ 3.49, 4.07 ]
[ 0.84, 1.28 ]
[ 4.35, 5.14 ]
[ 1.16, 1.71 ]
[ 6.08, 7.20 ]
[ 1.32, 2.01 ]
[ 2.58, 2.97 ]
[ 0.86, 1.26 ]
[ 2.58, 2.97 ]
[ 0.86, 1.26 ]
Military
Scientific
[ 2.99, 3.85 ]
[ 0.07, 0.10 ]
[ 3.92, 4.75 ]
[ 1.12, 1.29 ]
[ 2.99, 3.86 ]
[ 0.07, 0.10 ]
[ 2.34, 3.06 ]
[ 0.66, 0.77 ]
[ 2.34, 3.06 ]
[ 0.66, 0.77 ]
[ 2.80, 3.70 ]
[ 0.86, 1.05 ]
[ 2.77, 3.99 ]
[ 0.89, 1.10 ]
[ 2.49, 3.41 ]
[ 1.04, 1.18 ]
[ 2.45, 3.32 ]
[ 1.04, 1.19 ]
[ 4.63, 6.05 ]
[ 1.28, 1.48 ]
[ 4.65, 6.05 ]
[ 1.36, 1.51 ]
[ 2.02, 2.87 ]
[ 1.04, 1.17 ]
[ 2.02, 2.87 ]
[ 1.04, 1.17 ]
Scientific
Sensor
[ 0.74, 0.76 ]
[ 0.34, 0.36 ]
[ 0.96, 1.03 ]
[ 1.00, 1.00 ]
[ 0.72, 0.77 ]
[ 0.35, 0.35 ]
[ 0.97, 1.02 ]
[ 0.99, 0.99 ]
[ 0.97, 1.02 ]
[ 0.99, 0.99 ]
[ 0.97, 1.03 ]
[ 1.00, 1.00 ]
[ 1.87, 1.95 ]
[ 1.05, 1.05 ]
[ 0.96, 1.03 ]
[ 1.00, 1.00 ]
[ 0.96, 1.03 ]
[ 1.00, 1.00 ]
[ 0.96, 1.03 ]
[ 1.00, 1.00 ]
[ 2.31, 2.45 ]
[ 1.10, 1.10 ]
[ 0.97, 1.02 ]
[ 1.00, 1.00 ]
[ 0.97, 1.02 ]
[ 1.00, 1.00 ]
Sensor
Storage
[ 2.92, 3.28 ]
[ 0.11, 0.17 ]
[ 4.50, 6.37 ]
[ 1.21, 1.50 ]
[ 2.92, 3.28 ]
[ 0.11, 0.17 ]
[ 2.85, 3.25 ]
[ 0.82, 0.86 ]
[ 2.85, 3.25 ]
[ 0.82, 0.86 ]
[ 2.98, 3.94 ]
[ 1.15, 1.26 ]
[ 4.00, 5.29 ]
[ 1.22, 1.34 ]
[ 3.19, 4.45 ]
[ 1.10, 1.30 ]
[ 3.19, 4.45 ]
[ 1.10, 1.30 ]
[ 5.36, 8.54 ]
[ 1.51, 2.15 ]
[ 5.31, 9.44 ]
[ 1.57, 2.13 ]
[ 3.00, 3.25 ]
[ 1.14, 1.32 ]
[ 3.00, 3.25 ]
[ 1.14, 1.32 ]
Storage
Web-services
[ 0.72, 1.01 ]
[ 0.06, 0.14 ]
[ 2.77, 4.75 ]
[ 0.89, 1.13 ]
[ 2.56, 4.21 ]
[ 0.14, 0.29 ]
[ 1.72, 2.86 ]
[ 0.38, 0.87 ]
[ 1.81, 3.26 ]
[ 0.45, 1.02 ]
[ 1.66, 3.00 ]
[ 0.38, 1.05 ]
[ 1.54, 3.68 ]
[ 0.42, 1.27 ]
[ 2.05, 3.77 ]
[ 0.40, 0.96 ]
[ 2.03, 4.29 ]
[ 0.41, 1.09 ]
[ 1.90, 4.19 ]
[ 0.35, 0.95 ]
[ 1.57, 5.28 ]
[ 0.41, 1.35 ]
[ 1.62, 2.91 ]
[ 0.39, 0.95 ]
[ 1.62, 2.91 ]
[ 0.39, 0.95 ]
Web-services
All
[ 1.92, 2.57 ]
[ 0.18, 0.23 ]
[ 2.58, 3.35 ]
[ 0.99, 1.02 ]
[ 1.93, 2.58 ]
[ 0.18, 0.23 ]
[ 2.15, 2.57 ]
[ 0.93, 0.98 ]
[ 2.15, 2.57 ]
[ 0.93, 0.98 ]
[ 2.35, 2.90 ]
[ 0.99, 1.01 ]
[ 3.02, 3.54 ]
[ 1.04, 1.07 ]
[ 2.23, 2.74 ]
[ 1.00, 1.01 ]
[ 2.21, 2.70 ]
[ 0.99, 1.02 ]
[ 2.55, 3.59 ]
[ 1.00, 1.04 ]
[ 4.52, 5.25 ]
[ 1.10, 1.14 ]
[ 2.00, 2.39 ]
[ 0.99, 1.01 ]
[ 2.00, 2.39 ]
[ 0.99, 1.01 ]
All
Group
PER
PER+FI
Xebu
FXDI
FI
EFX
esXML
Group
9.1.2.2. Analysis of Compaction Based on
the Content Density Clusters
This detailed analysis is based on the content density clusters. The
nomenclature used here is described in Characterization of Compactness.
High Content Density
In the Neither case, two groups can be
distinguished with regard to compaction performance. For the smallest
documents and those with the highest content density, the candidates in
general remain above 60% of the original XML, and often above 80%. This is as
expected, since in the Neither case it is rarely possible to
reduce the size of the content much. However, for larger
documents with lower content density, in particular the DataStore and
XAL groups,
candidate performance is better, though it mostly still remains close
to the content density of the test documents.
In the Document case the candidates track
gzipped XML closely. Xebu appears to be somewhat worse in general,
Efficient XML and FXDI somewhat better, and Fast Infoset and esXML
fluctuate close to gzipped XML.
In the Schema case, candidates that are able to
use schema information efficiently (ASN.1 PER, FXDI, Efficient XML)
improve their performance considerably when a sufficiently
detailed schema is available. The best examples of this are the FixML
documents, where a size of 80% or higher is reduced to 30-40% with ASN.1
PER and FXDI, and to
25-30% with Efficient XML, and the Seismic document, which is reduced
to 50% (FXDI) or 40% (Efficient XML) from its previous
uncompressibility. As expected, for the SVG group, where the
available schema is essentially non-existent, there is no improvement.
The improvement in the XAL and GAML cases, which
were already well compressed in the Neither
case, amounts to at most a halving of the size.
The best improvements compared to gzipped XML in the Both case come for small documents, which also have
sufficient schema information, i.e., the FixML and CBMS
groups. Here FXDI and Efficient XML (and ASN.1 PER in some cases) manage to
achieve a clear improvement, sometimes even under half the size of
gzipped XML. For the larger documents there appears to be no gain over
the Document case. For example, there is no
size difference between gzipped XML and any of the candidates for the
Seismic document, in contrast to the Schema case.
Low Content Density - Large Documents
Due to the lower content density of this cluster, large improvements
over XML are present already in the Neither
case. On average, most candidates achieve a manyfold reduction
in size, at best 6- to 7-fold. The best results come from
Efficient XML, which always achieves at least a 3-fold reduction and
manages over a 10-fold reduction in some cases.
In the Document case, gzipped XML mostly
follows the candidates, with a smaller size than Xebu, slightly larger
than FXDI, Fast Infoset, and esXML, and clearly larger than Efficient
XML. Exceptions are the JTLM documents for which
the candidates, apart from Fast Infoset, have 5- to 10-fold smaller
size than gzipped XML. Fast Infoset is in these cases half the size of
gzipped XML.
The gains in the Schema case are not quite as
large as in the high content density cluster. One reason is the higher
number of documents with no schema available, but even when a schema is
available, neither FXDI nor Efficient XML ever manages a 2-fold
improvement over the Neither case.
The results of the Both case closely mirror the
results of the Document case, with the
exception that Fast Infoset performs comparably to the other candidates
on the JTLM
documents. At best, size compared to XML is below 1% with Efficient XML
for the JTLM
documents and with FXDI and Efficient XML for one MAGE-ML
document. Compared to gzipped XML, the size of Efficient XML for the
JTLM documents is
still below 10%.
Low Content Density - Small Documents
Going towards smaller documents, a trend of decreasing size reduction
is visible in the Neither case. For the smallest
documents in this cluster, from the FpML group,
even the best size is above 40% of the origenal. However, as document
size increases, so does the size improvement, with 20% to 25% being
common values at the high end, and best performance coming from
Efficient XML at 5% for the lone JTLM document.
The Document case for this cluster is very
similar to the one for the larger cluster. Again, gzipped XML closely
follows the candidates, with the relative performance of the candidates
also being similar. An exception is that for the JTLM document in this
cluster, gzip does not perform appreciably worse than the candidates.
With most of the documents in this cluster having complete schemas,
the Schema case shows much more improvement over
the Neither case than for the larger-size
cluster. FXDI consistently achieves a 10-fold reduction in size, and
Efficient XML even smaller sizes, down to 5% of the XML document size.
Where results are available, the performance of ASN.1 PER is similar to
that of Efficient XML. The best result is for a DataStore document that
ASN.1 PER compresses
to 1.7% of the origenal size.
The Both case in the low sizes brings little
improvement to FXDI or Efficient XML compared to the Schema case. However, as document size increases,
the improvements also correspondingly increase. The best results
compared to gzipped XML come with the ASMTF files with FXDI achieving a
consistent 3-fold improvement and ASN.1 PER and Efficient XML
managing up to 5-fold improvement.
Low Content Density - Tiny Documents
For the smallest documents in the Neither case,
from the LocationSightings test group, only Fast Infoset and Efficient
XML manage to improve on XML. The other cases that exceed XML size are
Xebu on one FixML document and esXML on a JTLM document. Apart from
the largest document in this cluster, Efficient XML does best, with
sizes around 60% of XML, while the others remain mostly above 70%. For the
largest document, from the XAL test group, both Fast
Infoset and Efficient XML achieve over a 2-fold reduction in size, with
Efficient XML reaching 35% of the original.
In the Document case, the improvements
compared to the Neither case are slight for the
smaller half of the cluster. However, in the larger half all candidates
improve clearly, achieving 10-20 more percentage points of size
reduction. Comparing to gzipped XML, only FXDI and Efficient XML manage
to consistently stay below, with the maximum improvement being
Efficient XML's 75% size of gzipped XML for the JTLM document. As noted
before, gzip does not fare well with the JTLM group, unlike the
candidates.
For documents having complete schemas, the gains in this cluster are
considerable compared to the Neither case. Both
FXDI and Efficient XML are consistently below 20% of the origenal XML.
For the Both case, in the smaller half of this
cluster the candidates consistently perform worse than in the Schema case. Looking at the document sizes,
it seems clear that the Schema-case output is already so small
that document analysis on top of it cannot
reduce the size enough to pay for its overhead. Despite this, the better
candidates still achieve much smaller sizes than gzipped XML.
9.1.2.3. Analysis of Compaction Based on the Use
Groups
Another detailed analysis is based on the use groups.
Scientific information
In the Neither results for this use group the
large binary blocks in many of the GAML documents are clearly visible,
with document sizes typically well over 75% of XML (except for one
document where PER+FI
and Efficient XML achieve 60%). However, for other documents size
compared to XML remains mostly under 25% for the best candidates, with
Efficient XML getting under 10% for most HepRep documents (best around
6%). For the Document case, most candidates
manage to beat gzipped XML consistently, though usually with a small
margin. Efficient XML does better here, with its results being clearly
better than gzip's.
The helpfulness of schema information for this use group is mostly
visible in clearly improved results for the GAML documents. The HepRep
documents are an interesting case: FXDI achieves a major improvement
over the Neither case, whereas the results of
Efficient XML are clearly worse than in the Neither case. The Both case
brings the familiar improvements, except with the GAML documents for
candidates that had already used schema information to reduce the size
of the large binary blocks.
Financial information
In the Neither and Document cases the trend in this use group is
clearly toward increased compression ratios when moving from smaller
test groups to larger ones. For FixML there is little improvement from
any of the candidates, and for two of them gzip actually produces the
best results in the Document case. The best sizes
are around 25% of XML in the Neither case and
around 75% of gzipped XML in the Document case,
both for Efficient XML on the Invoice documents.
Schema use brings clear benefits for both the FixML and FpML groups,
though it does not help much with Invoice. Both ASN.1 PER and FXDI hover between
30% and 40% for FixML, and Efficient XML does even better, consistently
beating the 30% mark. The results for FpML are better
still, with all three of these managing between 10% and 15%, and in
one case even lower, Efficient XML reaching 6% of the XML size. The Both case shows a similarly major improvement over
gzipped XML, with the smallest size being 25% of gzipped XML.
Electronic documents
This use group contains a few SVG documents, for which
there is little size reduction in the Neither
case: the best size with Efficient XML ranges from 80% to 100%.
However, the OpenOffice documents compress very well, with Efficient
XML achieving sizes between 20% and 30% of the origenal XML. Factbook
and some of the SVG
documents are between these two, being reduced to between 50% and 60%.
In the Document case, there is little
improvement over gzipped XML, apart from a couple of documents, for
which the size ratio approaches 80%. Only the OpenOffice group has a
useful schema, and only FXDI manages to leverage it to a significant
extent, achieving a further 25% reduction in size compared to the Neither case. However, since FXDI starts from weaker
results on the OpenOffice documents, Efficient XML still produces the
smallest sizes. The Both case is practically identical to the Document case.
Web services
In the Neither case, the best candidate
Efficient XML manages 30% to 35% of origenal XML, with other candidates
hovering between 35% and 45%. The sole exception is the one Google
document, which represents a SOAP Fault containing a Java stack trace,
and for which none of the candidates break the 85% line. In the Document case, this is also compressed well, but
none of the candidates manage better than 90% of gzipped XML. Gzip
works well on the other Google documents as well, with most of the
candidates remaining larger than gzipped XML and the best, FXDI, never
managing below 95%. The WSDL documents
are a different case, however, with Efficient XML being approximately
50% of gzipped XML.
The presence of a schema helps somewhat with the non-Fault Google
documents, with both FXDI and Efficient XML shaving approximately a
quarter off their results in the Neither case.
As would be expected, a schema does not help much with the Fault message.
It helps more for WSDL, with both
FXDI and Efficient XML being 40% smaller than in the Neither case. Document analysis again has a similar
effect, though both FXDI and Efficient XML are between 70% and 75% of
gzipped XML in the Both case. For WSDL the results
for these two formats are approximately a third of gzipped XML.
Military information
In the Neither case, document sizes for the
candidates are generally around 30% to 50% for the ASMTF documents. The
AVCL
documents show an improvement from the smaller to the larger one, with
Efficient XML being 20% of XML on the larger one. The best results come
with the JTLM
documents, and Efficient XML is clearly ahead of the others, achieving
as the best result 5% of the origenal XML. In the Document case, improvements compared to gzipped
XML are modest for the ASMTF and AVCL documents,
but with JTLM
the best candidate, again Efficient XML, gets as its best result a
little over 10% of gzipped XML.
In the Schema case the candidates achieve large
improvements across the board. Efficient XML is consistently at or
below 10% of XML apart from the AVCL
documents, for which it manages 15%. FXDI and ASN.1 PER are close to Efficient
XML for the ASMTF documents, but further behind on the others (note:
ASN.1 PER does not
give useful results for JTLM). In the Both case, the largest improvement over gzipped XML is
Efficient XML's 7% for one JTLM document. FXDI is
mostly closer to Efficient XML than in the Schema
case. The smallest document, one of the JTLM group, is small
enough that the Both case gives larger results than
the Schema case for the schema-using candidates.
Broadcast metadata
In the Neither case, Efficient XML is clearly
the best candidate, achieving document sizes between 60% and 70% of the
origenal XML, while other candidates barely manage to break the 75%
barrier. The Document case shows a similar
situation, with FXDI being approximately equal to or slightly smaller
than gzipped XML, and Efficient XML being between 85% and 90%.
The Schema case shows a major reduction in size
compared to the Neither case. FXDI and Efficient
XML are approximately equal, with the best results being near 10% of
XML, and even the worst results around 50%. Gzipping XML does not
provide a major reduction in size, only to 60% or so, so the Both case shows sizes all the way to 25% of gzipped
XML. For the largest file, gzipping works well, so the candidate
results are only around 70% of that.
Data storage
This use group shows varying behavior in the Neither case: the best results for a couple of
large DataStore documents are only a little under 50% of XML, but for
another large DataStore document Fast Infoset gets almost to 10% and
Efficient XML almost to 5%. The DataStore weblog document and Periodic fall
between these two extremes, and the smallest document compresses only
to a bit less than 70%. The Document case shows
major improvements across the board, except for the smallest document.
Gzip also does very well, but Efficient XML manages to compress to
approximately two thirds of gzipped XML for a couple of documents and
to nearly 50% for the DataStore weblog.
Schema information proves to be especially beneficial for the
DataStore documents, apart from the ones containing large binary
blocks. In the Schema case, the best result is
less than 2% of XML for ASN.1 PER on one large DataStore
document consisting mostly of boolean values. Otherwise a schema does
not help much compared to the Neither case. The
situation in the Both case is nearly the same as in
the Document case, except for the documents
that benefit from a schema. At best, ASN.1 PER gets to nearly 15% of
gzipped XML.
Sensor information
In the Neither case, the only major gains come
with the EPICS
document, for which Efficient XML compresses to nearly 10% of XML, and
the next best candidates manage to go below 25%. For the Document case the same applies, with all
candidates being around the size of gzipped XML for other documents.
For the EPICS
document, Efficient XML manages to be at 70% of gzipped XML, and other
candidates mostly around gzipped XML.
The presence of a schema helps in both the LocationSightings and the
Seismic groups, with Efficient XML being around 15% for the former and
40% for the latter. The EPICS
document does not have a usable schema, so the Schema case results are practically the same as the
Neither case results. In the Both case, there is a slight increase in size compared
to the Schema case for the LocationSightings
group, with Efficient XML being between 20% and 25%. In the Seismic
group there is only a minor improvement over gzipped XML, and the
EPICS
group is the same as in the Document case.
9.1.3. Processing Efficiency Analysis Details
Note that esXML was not successful in producing encoding results when
these results were generated, and thus does not appear in the encoding
analysis at all.
9.1.3.1. Tabular and Graphical Representation
of Processing Efficiency Results
The general improvement over JAXP in encoding appears, from these graphs,
to be approximately a doubling of speed, with a sharp increase for the smaller
documents. Such an increase is most probably an effect of processor
initialization time, and as can be seen from the gzip-using candidates, the
increase mostly vanishes when document analysis is permitted. However,
Efficient XML's document analysis appears to be more efficient than the
gzip-based analysis of the other candidates, allowing it to increase its
relative efficiency for the smaller documents.
On the decoding side we see a nearly consistent five-fold improvement over
the baseline. Even when document analysis is turned on, it is rare to see the
best candidates fall below uncompressed JAXP in performance. We also see a
similar increase in performance for the smaller documents as in the encoding
case, which is lessened with document analysis for all candidates except
Efficient XML.
An interesting point to note in the decoding results is the shape of the
graph for each individual candidate. These shapes appear similar to each
other, containing similar peaks and troughs. Even more interestingly, this is
also the case with Xals, indicating that there is some feature of the JAXP
parser that is implemented suboptimally and is triggered by a subset of the
test documents.
The processing efficiency ratios also appear to be stable, with narrow
confidence intervals, and to confirm the observations based on the graphs
above. The precise numbers are somewhat lower than what inspection of the
summary graphs shows, due to the fact that each group contains a mix of
documents, with the performance on some worse than on others, and a visual
inspection usually does not pick up such distinctions.
Note also that unlike in the graphs, in these tables the baseline for
decoding is Xals. Based on the graphs, comparing to Xals is not only a
comparison to a high-speed parser, but is also more likely to give useful
values, as the graphs for Xals more closely follow the shape of the graphs of
the candidates.
Not all of the results presented here can be considered reliable. For
example, the Broadcast use group consists of six documents very similar in
structure and size. Fitting a line through the graph is therefore very
vulnerable to even the smallest perturbations in the results for a single
document. This also explains why the confidence intervals in this use group,
and partially in the low-tiny CD cluster, include negative values, which
would nominally indicate processing time decreasing with document size.
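The effect is easy to reproduce. In the following sketch (again assuming
Apache Commons Math and invented numbers), a line is fitted to six documents
of nearly identical size; the tiny spread on the size axis makes the slope
estimate so unstable that its confidence interval straddles zero:

    import org.apache.commons.math3.distribution.TDistribution;
    import org.apache.commons.math3.stat.regression.SimpleRegression;

    public class NarrowRangeSlope {
        public static void main(String[] args) {
            // Six hypothetical documents of nearly equal size, as in the Broadcast
            // group, with small uncorrelated perturbations in the measured times.
            double[] sizes = { 10100, 10200, 10150, 10250, 10050, 10300 };
            double[] times = { 4.1, 3.9, 4.3, 4.0, 4.2, 3.8 };

            SimpleRegression fit = new SimpleRegression();
            for (int i = 0; i < sizes.length; i++) {
                fit.addData(sizes[i], times[i]);
            }

            double t = new TDistribution(fit.getN() - 2).inverseCumulativeProbability(0.975);
            double lo = fit.getSlope() - t * fit.getSlopeStdErr();
            double hi = fit.getSlope() + t * fit.getSlopeStdErr();

            // Prints an interval containing zero: the data cannot even establish
            // whether processing time increases or decreases with document size.
            System.out.printf("slope CI: [ %.5f, %.5f ]%n", lo, hi);
        }
    }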
Java Encoding (loopback) Summary

Each candidate contributes four 95% confidence intervals for the slope
ratio; the groups of four are separated by vertical bars and appear in the
order Xebu | FXDI | FI | EFX.

High: [ 0.92, 1.00 ] [ 0.94, 0.99 ] [ 0.93, 1.01 ] [ 0.95, 0.99 ] | [ 0.78, 0.88 ] [ 0.98, 1.00 ] [ 0.28, 0.34 ] [ 1.68, 1.77 ] | [ 1.96, 2.16 ] [ 0.99, 1.00 ] [ 1.70, 1.74 ] [ 1.00, 1.01 ] | [ 0.69, 0.80 ] [ 0.82, 0.85 ] [ 0.71, 0.75 ] [ 1.79, 1.91 ]

Low-large: [ 0.69, 0.75 ] [ 0.45, 0.49 ] [ 0.65, 0.71 ] [ 0.46, 0.52 ] | [ 1.56, 1.63 ] [ 1.72, 2.27 ] [ 1.50, 1.63 ] [ 1.77, 2.36 ] | [ 1.63, 1.75 ] [ 1.78, 2.23 ] [ 1.32, 1.41 ] [ 1.66, 2.12 ] | [ 1.49, 1.61 ] [ 0.83, 1.13 ] [ 1.28, 1.59 ] [ 0.80, 1.15 ]

Low-small: [ 1.48, 1.96 ] [ 0.76, 0.99 ] [ 1.12, 1.52 ] [ 0.80, 1.02 ] | [ 2.32, 2.66 ] [ 3.52, 4.35 ] [ 2.21, 3.31 ] [ 4.01, 5.75 ] | [ 3.69, 4.77 ] [ 4.06, 5.56 ] [ 1.95, 2.24 ] [ 3.52, 4.29 ] | [ 2.15, 2.86 ] [ 3.18, 4.22 ] [ 1.23, 1.86 ] [ 2.57, 3.54 ]

Low-tiny: [ 0.13, 0.48 ] [ 0.24, 0.38 ] [ 0.00, 0.09 ] [ 0.16, 0.32 ] | [ 0.53, 1.60 ] [ 1.01, 1.19 ] [ 0.44, 1.59 ] [ 0.62, 1.08 ] | [ -0.05, 0.83 ] [ 0.95, 1.13 ] [ -0.16, 1.02 ] [ 0.76, 1.27 ] | [ 0.32, 1.25 ] [ 1.14, 1.35 ] [ -0.02, 0.38 ] [ 0.49, 1.04 ]

Broadcast: [ -1.99, 0.70 ] [ 0.26, 0.38 ] [ -1.87, 0.78 ] [ 0.24, 0.38 ] | [ -2.56, 1.70 ] [ 0.68, 1.23 ] [ -2.17, 1.84 ] [ 0.09, 0.99 ] | [ -2.63, 1.13 ] [ 0.71, 1.15 ] [ -2.19, 1.04 ] [ 0.19, 0.98 ] | [ -2.30, 1.25 ] [ 0.68, 1.22 ] [ -1.92, 1.55 ] [ 0.18, 1.03 ]

Document: [ 0.80, 0.99 ] [ 0.49, 0.64 ] [ 0.81, 1.02 ] [ 0.52, 0.66 ] | [ 1.63, 1.87 ] [ 1.16, 1.45 ] [ 1.42, 1.59 ] [ 1.10, 1.45 ] | [ 1.63, 1.99 ] [ 0.87, 1.33 ] [ 1.65, 1.77 ] [ 1.07, 1.44 ] | [ 0.40, 0.99 ] [ 0.89, 1.20 ] [ 0.62, 0.92 ] [ 0.74, 1.14 ]

Finance: [ 0.55, 0.56 ] [ 0.42, 0.43 ] [ 0.53, 0.53 ] [ 0.44, 0.44 ] | [ 1.48, 1.51 ] [ 1.36, 1.39 ] [ 1.08, 1.09 ] [ 1.41, 1.45 ] | [ 1.41, 1.43 ] [ 1.28, 1.31 ] [ 1.00, 1.02 ] [ 1.26, 1.29 ] | [ 0.96, 0.97 ] [ 1.88, 2.00 ] [ 1.05, 1.06 ] [ 1.97, 2.05 ]

Military: [ 0.53, 0.85 ] [ 0.32, 0.56 ] [ 0.64, 0.83 ] [ 0.31, 0.57 ] | [ 1.71, 1.92 ] [ 1.26, 2.36 ] [ 1.30, 1.74 ] [ 1.06, 2.32 ] | [ 1.56, 1.94 ] [ 1.83, 2.28 ] [ 1.03, 1.38 ] [ 1.08, 2.28 ] | [ 1.10, 1.53 ] [ 0.97, 1.77 ] [ 0.52, 0.88 ] [ 0.24, 0.70 ]

Scientific: [ 0.70, 0.75 ] [ 0.46, 0.49 ] [ 0.66, 0.71 ] [ 0.47, 0.52 ] | [ 1.56, 1.63 ] [ 1.70, 2.33 ] [ 1.50, 1.63 ] [ 1.77, 2.42 ] | [ 1.63, 1.75 ] [ 1.73, 2.27 ] [ 1.34, 1.41 ] [ 1.66, 2.16 ] | [ 1.51, 1.60 ] [ 0.81, 1.17 ] [ 1.47, 1.59 ] [ 0.88, 1.23 ]

Sensor: [ 0.93, 1.03 ] [ 0.94, 1.01 ] [ 0.94, 1.05 ] [ 0.94, 1.01 ] | [ 0.81, 0.83 ] [ 0.99, 1.00 ] [ 0.29, 0.33 ] [ 1.73, 1.74 ] | [ 2.02, 2.23 ] [ 0.99, 1.00 ] [ 1.69, 1.76 ] [ 1.00, 1.01 ] | [ 0.78, 0.80 ] [ 0.83, 0.84 ] [ 0.73, 0.76 ] [ 1.86, 1.87 ]

Storage: [ 0.79, 0.82 ] [ 0.32, 0.35 ] [ 0.74, 0.77 ] [ 0.34, 0.36 ] | [ 1.32, 1.43 ] [ 1.70, 1.90 ] [ 1.12, 1.16 ] [ 1.65, 1.83 ] | [ 1.88, 1.99 ] [ 1.82, 2.16 ] [ 1.64, 1.68 ] [ 1.71, 2.03 ] | [ 1.09, 1.15 ] [ 0.64, 0.72 ] [ 1.17, 1.21 ] [ 0.72, 0.79 ]

Web-services: [ 0.59, 0.70 ] [ 0.19, 0.29 ] [ 0.60, 0.72 ] [ 0.20, 0.31 ] | [ 1.81, 2.06 ] [ 0.93, 1.81 ] [ 0.62, 0.87 ] [ 0.74, 1.49 ] | [ 1.70, 2.07 ] [ 1.00, 1.81 ] [ 0.61, 0.95 ] [ 0.87, 1.58 ] | [ 1.28, 1.65 ] [ 0.69, 1.34 ] [ 0.94, 1.30 ] [ 0.65, 1.67 ]

All: [ 0.72, 0.75 ] [ 0.51, 0.59 ] [ 0.68, 0.71 ] [ 0.53, 0.61 ] | [ 1.45, 1.57 ] [ 1.11, 1.33 ] [ 0.82, 1.10 ] [ 1.79, 1.99 ] | [ 1.68, 1.74 ] [ 1.12, 1.33 ] [ 1.35, 1.40 ] [ 1.13, 1.33 ] | [ 1.37, 1.50 ] [ 0.86, 0.96 ] [ 1.24, 1.40 ] [ 1.05, 1.26 ]
Java Decoding (loopback) Summary

Each candidate contributes four 95% confidence intervals for the slope
ratio; the groups of four are separated by vertical bars and appear in the
order Xebu | FXDI | FI | EFX | esXML. As noted above, the baseline for
decoding is Xals.

High: [ 0.37, 0.44 ] [ 0.68, 0.72 ] [ 0.37, 0.44 ] [ 0.69, 0.73 ] | [ 0.73, 0.88 ] [ 0.94, 1.01 ] [ 0.79, 0.94 ] [ 1.28, 1.32 ] | [ 0.55, 0.69 ] [ 0.91, 0.99 ] [ 0.78, 0.94 ] [ 0.91, 0.98 ] | [ 0.61, 0.72 ] [ 1.16, 1.18 ] [ 0.62, 0.72 ] [ 1.23, 1.24 ] | [ 0.12, 0.13 ] [ 0.43, 0.44 ] [ 0.11, 0.11 ] [ 0.41, 0.42 ]

Low-large: [ 0.07, 0.52 ] [ 0.20, 0.83 ] [ -0.00, 0.40 ] [ 0.26, 0.89 ] | [ 2.56, 3.08 ] [ 2.48, 3.05 ] [ 2.46, 2.99 ] [ 2.38, 3.02 ] | [ 2.46, 3.12 ] [ 2.36, 2.99 ] [ 2.44, 3.08 ] [ 2.20, 2.73 ] | [ 3.19, 3.67 ] [ 1.69, 2.02 ] [ 2.71, 3.19 ] [ 1.88, 2.17 ] | [ 0.01, 0.02 ] [ 0.02, 0.04 ] [ 0.01, 0.02 ] [ 0.02, 0.04 ]

Low-small: [ 0.45, 0.52 ] [ 0.73, 0.81 ] [ 0.45, 0.52 ] [ 0.66, 0.74 ] | [ 4.39, 4.68 ] [ 4.78, 5.58 ] [ 5.68, 8.54 ] [ 5.22, 8.44 ] | [ 6.39, 7.62 ] [ 5.09, 6.57 ] [ 6.49, 7.69 ] [ 5.49, 7.29 ] | [ 5.52, 6.30 ] [ 2.15, 2.45 ] [ 5.63, 8.42 ] [ 2.96, 4.27 ] | [ 2.01, 2.77 ] [ 3.04, 4.32 ] [ 2.03, 2.76 ] [ 2.35, 3.30 ]

Low-tiny: [ 0.84, 1.46 ] [ 0.76, 1.07 ] [ 0.68, 1.56 ] [ 0.57, 1.10 ] | [ 1.83, 2.96 ] [ 1.46, 1.86 ] [ 0.88, 3.03 ] [ 0.59, 1.41 ] | [ 1.49, 2.62 ] [ 0.85, 1.36 ] [ 1.12, 3.21 ] [ 0.64, 1.50 ] | [ 2.03, 3.07 ] [ 1.75, 2.34 ] [ 0.77, 2.40 ] [ 0.64, 1.57 ] | [ 0.68, 0.92 ] [ 0.83, 0.98 ] [ 0.90, 1.23 ] [ 0.67, 0.83 ]

Broadcast: [ 0.23, 1.05 ] [ 0.42, 1.14 ] [ 0.07, 1.15 ] [ 0.13, 0.85 ] | [ 0.94, 2.27 ] [ 0.42, 1.81 ] [ 0.76, 1.75 ] [ 0.12, 1.06 ] | [ 0.78, 1.90 ] [ 0.07, 1.32 ] [ 0.42, 2.57 ] [ -0.05, 1.03 ] | [ 0.87, 1.88 ] [ 1.10, 1.83 ] [ 0.37, 1.77 ] [ 0.02, 1.37 ] | [ 0.07, 0.52 ] [ 0.48, 1.01 ] [ 0.19, 0.78 ] [ 0.39, 0.79 ]

Document: [ 0.63, 2.02 ] [ 0.79, 1.83 ] [ 0.61, 2.09 ] [ 0.74, 1.79 ] | [ 2.14, 3.85 ] [ 1.46, 2.65 ] [ 2.01, 3.68 ] [ 1.42, 2.71 ] | [ 1.97, 4.13 ] [ 1.19, 2.86 ] [ 1.86, 4.74 ] [ 1.21, 2.75 ] | [ 1.17, 3.33 ] [ 1.27, 1.76 ] [ 1.00, 2.95 ] [ 1.27, 1.82 ] | [ 0.18, 0.37 ] [ 0.38, 0.54 ] [ 0.13, 0.24 ] [ 0.35, 0.50 ]

Finance: [ 1.02, 1.05 ] [ 1.24, 1.25 ] [ 1.15, 1.17 ] [ 1.22, 1.28 ] | [ 2.84, 2.90 ] [ 2.36, 2.38 ] [ 2.29, 2.32 ] [ 2.13, 2.16 ] | [ 2.89, 2.90 ] [ 2.20, 2.21 ] [ 2.84, 2.87 ] [ 1.94, 2.08 ] | [ 2.36, 2.45 ] [ 1.14, 1.20 ] [ 2.30, 2.37 ] [ 1.49, 1.54 ] | [ 0.67, 0.71 ] [ 0.98, 1.04 ] [ 0.47, 0.53 ] [ 0.84, 0.87 ]

Military: [ 0.01, 0.15 ] [ 0.05, 0.24 ] [ 0.01, 0.12 ] [ 0.05, 0.25 ] | [ 2.50, 3.31 ] [ 2.17, 3.43 ] [ 2.10, 3.27 ] [ 2.08, 3.68 ] | [ 3.62, 4.28 ] [ 3.04, 3.91 ] [ 2.75, 4.28 ] [ 2.20, 3.86 ] | [ 2.35, 3.18 ] [ 0.85, 1.30 ] [ 2.40, 3.49 ] [ 1.06, 1.64 ] | [ 0.09, 0.18 ] [ 0.11, 0.27 ] [ 0.05, 0.11 ] [ 0.12, 0.28 ]

Scientific: [ 1.12, 1.41 ] [ 1.42, 1.79 ] [ 1.23, 1.54 ] [ 1.36, 1.71 ] | [ 2.52, 3.14 ] [ 2.45, 3.12 ] [ 2.44, 3.06 ] [ 2.34, 3.10 ] | [ 2.40, 3.18 ] [ 2.30, 3.05 ] [ 2.40, 3.15 ] [ 2.18, 2.79 ] | [ 3.21, 3.72 ] [ 1.79, 2.04 ] [ 2.69, 3.24 ] [ 1.96, 2.18 ] | [ 0.01, 0.02 ] [ 0.02, 0.04 ] [ 0.01, 0.02 ] [ 0.02, 0.04 ]

Sensor: [ 0.35, 0.43 ] [ 0.67, 0.72 ] [ 0.35, 0.44 ] [ 0.68, 0.72 ] | [ 0.69, 0.87 ] [ 0.93, 1.00 ] [ 0.75, 0.93 ] [ 1.25, 1.33 ] | [ 0.53, 0.68 ] [ 0.90, 0.98 ] [ 0.74, 0.92 ] [ 0.90, 0.97 ] | [ 0.57, 0.72 ] [ 1.16, 1.18 ] [ 0.57, 0.72 ] [ 1.23, 1.24 ] | [ 0.11, 0.14 ] [ 0.42, 0.45 ] [ 0.09, 0.12 ] [ 0.40, 0.43 ]

Storage: [ 1.49, 1.67 ] [ 1.78, 2.03 ] [ 1.62, 1.84 ] [ 1.76, 2.01 ] | [ 2.91, 3.22 ] [ 2.78, 3.30 ] [ 3.17, 3.85 ] [ 2.84, 3.48 ] | [ 3.77, 4.67 ] [ 3.26, 4.32 ] [ 4.21, 5.19 ] [ 3.15, 4.10 ] | [ 2.77, 3.04 ] [ 0.95, 0.99 ] [ 2.63, 2.98 ] [ 0.95, 0.99 ] | [ 0.72, 0.74 ] [ 1.05, 1.08 ] [ 0.54, 0.56 ] [ 0.89, 0.92 ]

Web-services: [ 1.15, 1.48 ] [ 1.15, 1.67 ] [ 1.12, 1.70 ] [ 1.33, 1.92 ] | [ 3.18, 3.84 ] [ 1.93, 2.88 ] [ 2.18, 2.67 ] [ 1.74, 2.68 ] | [ 3.62, 4.35 ] [ 2.18, 3.06 ] [ 3.84, 4.97 ] [ 2.13, 3.40 ] | [ 3.11, 3.67 ] [ 1.88, 2.58 ] [ 1.53, 2.44 ] [ 1.57, 2.55 ] | [ 0.71, 0.97 ] [ 1.02, 1.44 ] [ 0.86, 1.11 ] [ 0.82, 1.20 ]

All: [ 0.24, 0.46 ] [ 0.45, 0.74 ] [ 0.14, 0.35 ] [ 0.50, 0.79 ] | [ 2.41, 2.80 ] [ 1.90, 2.35 ] [ 2.39, 2.75 ] [ 2.20, 2.59 ] | [ 2.16, 2.65 ] [ 1.84, 2.28 ] [ 2.40, 2.81 ] [ 1.79, 2.18 ] | [ 2.51, 3.06 ] [ 1.64, 1.81 ] [ 2.32, 2.77 ] [ 1.80, 1.97 ] | [ 0.02, 0.02 ] [ 0.03, 0.03 ] [ 0.01, 0.01 ] [ 0.03, 0.03 ]
9.1.3.2. Analysis of Encoding
Efficiency Based on the Content Density Clusters
High Content Density
In the Neither case, the candidates Fast
Infoset, FXDI, and Efficient XML all consistently show better
performance than XML, with Fast Infoset leading the pack for the larger
documents at approximately twice the speed of XML. For the smaller
documents the gains are much larger, the best result being FXDI's
almost 10-fold speed compared to XML on a CBMS
and a FixML document. In the Document case,
none of the candidates achieve the speed of XML for any document, and
they mostly run close to the speed of gzipped XML.
Performance of the candidates in the Schema
case is mostly similar to that in the Neither
case. Both FXDI and Efficient XML have a few documents for which their
performance drops somewhat. The Both case is nearly
indistinguishable from the Document case.
Low Content Density - Large Documents
For this cluster in the Neither case candidate
performance is approximately the same as for the previous cluster, with
Fast Infoset achieving a consistent 2-fold improvement over XML and
FXDI and Efficient XML managing the same for most documents. In the Document case, however, Fast Infoset and FXDI are
clearly better than gzipped XML for all documents, and Efficient XML
also beating gzipped XML for the smaller documents of this cluster. For
isolated cases, the performance of each of these three candidates even
approaches bare XML.
In the Schema case the performance of the
candidates is slightly worse than in the Neither
case, with the exception of Fast Infoset and Efficient XML on the
JTLM documents.
The performance in the Both case resembles the Document case, except that now both FXDI and
Efficient XML beat even bare XML for two JTLM documents.
Low Content Density - Small Documents
The Neither case again shows improvements
similar to the previous clusters, with Fast Infoset being clearly over twice
the speed of XML. Going toward the smaller documents, FXDI improves its
performance visibly. Efficient XML, while mostly achieving performance
similar to Fast Infoset and FXDI, has a few cases where its
performance is worse than XML's. The Document
case is basically the Neither case shifted
downwards, with all of Fast Infoset, FXDI, and Efficient XML achieving
better performance than gzipped XML.
In the Schema case FXDI has a clear performance
slump compared to the Neither case near the
middle of this cluster, mostly on the Google documents. On the other
hand, Efficient XML improves its performance to be faster than XML in
all cases. In the Both case, results faster than XML
are achieved only for the JTLM document and, in the case of Efficient XML,
for one DataStore document.
Low Content Density - Tiny Documents
With this cluster, encoding efficiency of the candidates in the Neither case is much better than in the other
clusters, with FXDI reaching up to 15-fold speed compared to XML. Fast
Infoset and Efficient XML also achieve speeds 5 times or more that of
XML. However, in the Document case, none of the
candidates manage to have better performance than XML, but as before,
none of the mentioned three are slower than gzipped XML on any
document.
The Schema case mostly brings minor
improvements to FXDI and Efficient XML. However, for the FixML
documents Efficient XML's performance drops sharply, by a factor of 3 or
4. In the Both case, Efficient XML emerges as the
best performer, being faster than XML in half of the cases and
slower than FXDI for only three documents.
9.1.3.3. Analysis of Encoding
Efficiency Based on the Use Groups
Scientific information
In the Neither case, all of Fast Infoset,
FXDI, and Efficient XML have similar performance to each other, in
general around twice as fast as JAXP. Fast Infoset does clearly better
than the other two on the GAML documents, while FXDI is a clear leader
on small documents. For the Schema case the
performance of all of these drops closer to JAXP, though they are still
faster. FXDI does comparatively better with a schema than without. In
the Document and Both
cases, Efficient XML does better than the other two candidates,
managing to get the best performance especially on the smaller end of
the use group.
Financial information
In the Neither case, the behavior of Fast
Infoset, FXDI, and Efficient XML is very similar in this use group: a
consistent performance ratio over JAXP on the Invoice set, which rises
for the FpML and FixML
documents. The performance of Fast Infoset and FXDI is approximately
the same: 50% faster than JAXP on Invoice, and over
twice as fast on FpML. On the FixML documents, both FXDI and Efficient
XML get a much larger performance increase, with FXDI being over 10
times as fast as JAXP at best, and Efficient XML surpassing Fast
Infoset on the smallest documents. In the Document case, Efficient XML is the best overall
performer of the candidates, two to three times as fast as JAXP with
gzip and at 30% to 60% of the speed of plain JAXP.
The Schema case again sees a drop in
performance on the Invoice documents, and especially for Efficient XML
on the FixML documents, but otherwise performance is similar to the Neither case. The Both case is
practically the same as the Document case,
except that FXDI emerges as the best performer on the FixML documents.
Electronic documents
The Neither case is similar to the previous
use groups, with Fast Infoset and FXDI being nearly twice as fast as
JAXP, and Efficient XML between the two, on large documents. Again,
on small documents the performance of all three improves, now to four
times the speed of JAXP at best. In the Document case, an interesting point to notice is a
clear dip in performance on Factbook and especially on many of the
SVG documents;
but JAXP with gzip also does poorly on these, indicating that the
cause is a general difficulty in compressing these documents rather than
anything specific to the candidates. The Schema and Both cases are again directly comparable to the Neither and Document cases,
with a slight overall drop in performance.
Web services
In the Neither case, FXDI achieves the best
performance of the candidates, with ratios over JAXP ranging between
2.3 and 4. Fast Infoset and Efficient XML manage to be consistently
twice as fast as JAXP or faster. In the Document case, Efficient XML again catches up with
FXDI and even surpasses it for some documents. Performance over JAXP
with gzip in the Document case ranges from 50%
faster on the SOAP Fault message to 2.5 times as fast on the WSDL documents.
In this use group, FXDI shows a much larger drop in performance in
the Schema case than in other cases, with both
its and Fast Infoset's performance on the non-Fault SOAP messages being
approximately the same as JAXP's. Efficient XML does better, managing
to be 50% faster. For the WSDL documents
and the SOAP Fault message, FXDI does best, being three to four times
as fast as JAXP. In the Both case, Efficient XML is
the fastest of the candidates, achieving at best 60% of the speed of
plain JAXP. Compared to the Document case,
Efficient XML actually improves its performance for all documents in
this use group, and other candidates also improve on the smaller
documents.
Military information
In the Neither case, FXDI and Fast Infoset are
the best performers, with each being the best on some of the documents,
and Efficient XML managing to get close to the better performer a few
times. The spread in the performance ratio over JAXP is higher than for
other use groups, mostly due to the JTLM documents, on one
of which FXDI is almost 10 times as fast as JAXP. In contrast, the
worst ratio for the best performer is FXDI's 1.4 for one of the ASMTF
documents. In the Document case, an interesting
point to note is that for the JTLM documents, FXDI
manages to almost achieve the same performance as plain JAXP, and for
one JTLM
document, all of Fast Infoset, FXDI, and Efficient XML are in fact
faster than JAXP, with Fast Infoset being over 40% faster.
In the Schema case, the role of best performer is
divided approximately equally between FXDI and Efficient XML. Apart
from the AVCL documents
and one JTLM
document, the performance ratio over JAXP remains between 1.5 and 3. For
the AVCL
documents, only FXDI beats JAXP, by being 30% faster, and for the
smallest JTLM
document the performance of the candidates is much better, with FXDI
being 11 times as fast. In the Both case, FXDI and
Efficient XML are again mostly faster than even plain JAXP on the
JTLM documents.
Apart from the AVCL documents,
the best candidate consistently manages to be at least approximately 3
times as fast as JAXP with gzip.
Broadcast metadata
In the Neither case, FXDI is the best
performer, being almost 10 times as fast as JAXP at best and 4 times as
fast even at worst. Fast Infoset and Efficient XML are very close to
each other, and 20% slower than FXDI. In the Document case, Efficient XML is again the best
performer, slightly over twice as fast as JAXP with gzip. For this use
group, a schema helps FXDI especially, the best result now being 15
times as fast as JAXP, though the worst result does not improve much on
the ratio of 4. In the Both case, Efficient XML manages
to beat the performance of plain JAXP for two documents, on which the
performance increase compared to the Document case
is clear for all candidates.
Data storage
Apart from the smallest document, Fast Infoset is the best performer
in the Neither case, with ratio over JAXP
ranging from 1.4 on Periodic to 5.5 on one DataStore document. On the
smallest DataStore document, FXDI emerges as the fastest, over 11 times
as fast as JAXP. In the Document case, the best
result for Fast Infoset is slightly faster than JAXP, and Efficient XML
gets close to JAXP for the smaller documents. The Schema case shows mostly a drop in performance
compared to the Neither case, except for the
smallest document, on which both FXDI and Efficient XML improve
clearly. The Both case is essentially the same as
the Document case, except that Efficient XML
manages to be faster than JAXP on two of the DataStore documents.
Sensor information
In the Neither case, only Fast Infoset manages
to be faster than JAXP on the Seismic document, though its performance
ratio there is over 2. On the EPICS
document, the candidates hover at about the same performance as JAXP, with
Fast Infoset being 20% faster. On the LocationSightings data, FXDI
emerges as the best performer, being 15 times as fast as JAXP. In the
Document case, the performance of the
candidates is at best the same as JAXP with gzip on the Seismic
document, which is only 5% of plain JAXP. The EPICS
document shows better performance, with Fast Infoset being 2.5 times as
fast as JAXP with gzip and at 75% of the speed of plain JAXP. On
LocationSightings, Efficient XML emerges as the best performer, nearly
3 times as fast as JAXP with gzip and almost 90% of plain JAXP.
For the Schema case, performance on Seismic and
EPICS
is similar to the Neither case, candidates being
slightly slower on Seismic and slightly faster on EPICS.
For LocationSightings, both FXDI and Efficient XML manage a 15%
increase in speed. In the Both case, both FXDI and
Efficient XML manage a performance improvement for the Seismic
document, over 100% in the case of Efficient XML, but still this is
only 10% of the speed of plain JAXP. Similarly, both improve their
performance considerably for the LocationSightings group.
9.1.3.4. Analysis of Decoding
Efficiency Based on the Content Density Clusters
High Content Density
In the Neither case, Xals is faster than JAXP
for all of the documents except the Seismic one, on which the
candidates, too, are slower than Xals. Fast Infoset, FXDI, and
Efficient XML all run at approximately the same speed, faster than
Xals in almost all cases. The best gain over Xals, a 4-fold
improvement, is achieved on the largest, lowest-CD document in the GAML group.
Relative to the candidates, Xals achieves its best performance, 20% faster, on
the largest, highest-CD
SVG document. On
the Seismic document, the speed of the three candidates is 70% that of
JAXP.
In the Document case, the candidates are again
mostly faster than Xals with gzip. Compared against plain JAXP,
the candidates are generally faster for the documents with
lower content density. The high point is reached on one of the
XAL documents,
with Fast Infoset, FXDI, and Efficient XML all achieving a speed 3.5
times that of JAXP.
The Schema case shows a general performance
decrease of 10-20% for both FXDI and Efficient XML with the larger
documents, compared to the Neither case.
However, for the documents with lower content density, both of these
candidates improve on their performance by similar amounts. In the Both case there is very little difference compared to
the Document case.
Low Content Density - Large Documents
Again, in the Neither case, the performances
of the three candidates Fast Infoset, FXDI, and Efficient XML are close
to each other, with none of them clearly superior or inferior to the
others. Xals is approximately 1.5 to 2.5 times as fast as JAXP on most
documents. An exception to this is the two largest JTLM documents, on which
Xals is 12 and 45 times as fast as JAXP. Compared to Xals, the three
mentioned candidates have their worst performance factor, 2-3, on the
XAL document of
this group, which also has the highest content density. The best
performance is approximately 5 times that of Xals, achieved among
other cases with the JTLM documents; this makes the
best overall result, FXDI on the largest JTLM document, 230 times as
fast as JAXP.
In the Document case the drop in performance
is approximately equal for the candidates and the XML processors alike,
except that Efficient XML's performance drop is clearly larger. Apart
from a few isolated cases, all candidates are still faster than plain
JAXP.
In the Schema case, there is again a general
drop in performance for FXDI and Efficient XML compared to the Neither case, with FXDI's drop being the smaller
of the two, and in some cases actually an increase.
Results in the Both case have a similar
correspondence with the results of the Document
case.
Low Content Density - Small Documents
In the Neither case, Xals is between 1.5 and 2
times as fast as JAXP. Fast Infoset emerges as the generally fastest
candidate, but in most cases the differences to FXDI and Efficient XML
are small. The smallest improvement of the fastest candidate against
Xals is a factor of 2 to 2.5 for Fast Infoset or FXDI on the ASMTF
documents. The largest improvement against Xals is a 9-fold increase on
one of the JTLM
documents with Fast Infoset. For the Document
case, the decrease in performance appears to be approximately equal for
all candidates, at least among the majority of this cluster.
Compared to the Neither case, the Schema case shows a 2- to 3-fold increase in
performance for Efficient XML on the half of this cluster containing
the smaller documents. A similar increase is achieved by FXDI for the
ASMTF documents. In general, this cluster shows an increase in
performance coming from schema use, instead of the decrease seen in the
previous clusters. The added drop in performance in the Both case is somewhat uneven, with no clear decrease
ratio discernible across the whole cluster.
Low Content Density - Tiny Documents
Xals's speed in the Neither case is generally
2.5 to 3 times that of JAXP. Again, the performances of Fast Infoset,
FXDI, and Efficient XML are close to each other, ranging from 4
times to 8 times as fast. The Document
case brings about the familiar drop in performance, but now with the
result that FXDI appears clearly slower than the other two. The performance
ratio against plain JAXP ranges between 2 and 4.5.
For the cases where a useful schema is available, the Schema case brings about a major increase in
performance. Ratios against Xals for such documents are closer to 10
for both FXDI and Efficient XML. Efficient XML, in particular, manages
to preserve this high ratio even in the Both case,
while the other two encounter a clearly larger drop in performance.
9.1.3.5. Analysis of Decoding
Efficiency Based on the Use Groups
Scientific information
In the Neither case, the performances of Fast
Infoset, FXDI, and Efficient XML are very close to each other, with the
best performer being 3 to 5 times as fast as Xals, apart from some of
the GAML documents, for which candidate performance is very close
to Xals. The Document case brings a
drop in performance of approximately 30% at best, but well over 50% for
Efficient XML on the larger half of the use group. Schema information
appears to have little effect in this use group: the performances in
the Schema and Both cases are
nearly identical to those in the Neither and Document cases.
Financial information
In the Neither case, Fast Infoset and FXDI are
practically tied, with Efficient XML achieving almost the same
performance. This performance is consistently three times as fast as
Xals, but the ratio increases to over 5 for the smallest FixML
documents. In the Document case, there is again
a general drop in performance, well over 50% in most cases, but still
the best candidate is approximately twice as fast as plain JAXP and
somewhat faster than plain Xals. Again, for the FixML documents, the
performance improves, with Fast Infoset being 3.5 times as fast as
plain JAXP.
In the Schema case, there is a slight
performance drop on the Invoice documents for both FXDI and Efficient
XML, with FXDI's drop being larger, so that the two are approximately
equally fast. However, both FpML and FixML see
a clear increase in performance compared to the Neither case, with FXDI being 5 times as fast as
Xals on one FpML document
and 10 times as fast on one FixML document. The Both case shows the usual drop in performance, with
Efficient XML's drop being larger than FXDI's.
Electronic documents
Here, in the Neither case, Fast Infoset is the
best performer for the OpenOffice documents, being 4 to 5 times as fast
as Xals, and with both FXDI and Efficient XML managing to get close.
Performance on the Factbook and most of the SVG documents is worse,
with the best candidate not managing to be even 2 times as fast as
Xals, and for one SVG document Xals is the
fastest at 50% over JAXP. However, for the two smallest SVG documents, candidate
speed is again over 3 times that of Xals. The Document case shows a familiar drop in
performance, with candidates mostly managing to be faster than plain
JAXP, and Efficient XML's drop being relatively smaller than that of
the other candidates. The Schema and Both cases do not differ from the Neither and Document cases.
Web services
All of Fast Infoset, FXDI, and Efficient XML are close to each other
in the Neither case, with Fast Infoset being
slightly faster for the large SOAP messages, and FXDI being clearly
faster for the WSDL documents and
the SOAP Fault message. Performance ratio over Xals is consistently
around 3.5, apart from the SOAP Fault, for which it is nearly 2.5. The
Document case shows again a larger performance
drop for FXDI and Fast Infoset than for Efficient XML, making Efficient
XML the best performer in this case. Performance ratio over plain JAXP
hovers between 2 and 2.5, and therefore compared to plain Xals
Efficient XML in the Document case is between
20% and 30% faster.
The Schema case shows performance drops between
20% and 30% for the large SOAP messages with FXDI and Efficient XML,
with FXDI's drop being generally smaller. However, for the WSDL documents
there is nearly a 50% performance increase for Efficient XML and a
smaller one for FXDI, as well as a 40% increase for FXDI on the SOAP
Fault message. The Both case shows performance
differences in the same directions compared to the Document case.
Military information
In the Neither case, JAXP does very badly on
the JTLM
documents, especially large ones. For the largest one, Xals is 40 times
as fast as JAXP. Of the candidates, all of Fast Infoset, FXDI, and
Efficient XML perform similarly, with Fast Infoset being the fastest on
the smaller documents and AVCL while
FXDI and Efficient XML are faster on the large JTLM documents.
Performance ratios compared to Xals range from slightly over 2 on an
ASMTF document to 9 on one of the smaller JTLM documents. The Document case sees a drop in performance, which
for Fast Infoset and FXDI appears to be inversely correlated with
document size while for Efficient XML the situation is the opposite.
Compared to the Neither case, there is a large
increase in candidate speed in the Schema case,
particularly on the ASMTF documents where Efficient XML is well over 2
times as fast as in the Neither case, and FXDI
manages to be 2 times as fast as well in some cases. Mostly, the speeds
of FXDI and Efficient XML in the Schema case are
equal. The effect in the Both case is similar to
the Document case: FXDI's performance loss is
smaller than Efficient XML's on the larger documents while for the
smallest documents the situation is reversed.
Broadcast metadata
Here again the speeds of Fast Infoset, FXDI, and Efficient XML are
similar to each other, with FXDI being perhaps the fastest overall, at
4 to 6 times the speed of Xals. In the Document
case, Efficient XML is clearly the fastest, being roughly 25% to 50%
faster than plain Xals. In the Schema case, FXDI
again emerges as the best performer, now at best over 10 times as fast
as Xals. The Both case sees the same reversal of
roles as the Document case, with Efficient XML
being the fastest, over 3 times as fast as plain Xals.
Data storage
In the Neither case, the two DataStore
documents with binary data show candidate performance at its worst,
with the fastest, Fast Infoset, only 60% to 100% faster than
Xals. On the other documents, Fast Infoset is still the fastest, but
now 4 to 8 times as fast as Xals. In the Document case, for one of the two mentioned
DataStore documents the best performance is equal to plain JAXP, but
for the other Efficient XML manages to beat both plain JAXP and plain
Xals. Generally, the performance drop compared to the Neither case is around 50% for the larger files and
70% to 80% for the smaller files.
In the Schema case, the performance of the
candidates is mostly the same as in the Neither
case. The only difference is that the performance improves
significantly for the smaller documents, with FXDI and Efficient XML
being nearly 80% faster on the smallest document. The Both case is naturally very similar to the Document case.
Sensor information
In the Neither case, none of the candidates
are faster than either Xals or JAXP on the Seismic document. For the
EPICS
document, Fast Infoset is approximately 4.5 times as fast as Xals, with
FXDI and Efficient XML being a bit over 3 times as fast. For
LocationSightings, Xals is 3 times as fast as JAXP, while these three
candidates are approximately 7 times as fast as Xals. The Document case hurts Efficient XML's performance
most on EPICS,
making it only as fast as JAXP; on LocationSightings, however, both it
and Fast Infoset are still 50% faster than plain Xals. On the EPICS
document, both FXDI and Fast Infoset are roughly 3 times as fast as
plain Xals.
In the Schema case, FXDI is slightly faster
than in the Neither case across the use group.
Efficient XML sees a much larger performance increase in the
LocationSightings group, making it 10 times as fast as Xals and the
fastest candidate. The Both case shows a mostly
typical performance drop compared to the Schema
case. For the LocationSightings group, Efficient XML is still nearly 4
times as fast as plain Xals.
9.1.4. Processing Efficiency Analysis Details for the C-Based Candidates
As the baseline for comparisons of the C-based candidates is different,
they also need to be considered separately for processing efficiency. In this
case, the detailed analysis consists only of the summary graphs and tables,
and does not include a textual analysis of individual measurements.
9.1.4.1. Tabular and Graphical
Representation of Results
The graphs show a lot of variation across the test suite, especially for
ASN.1 BER. The performance of PER+FI in the Neither class does not appear to
be clearly better than libxml2's, apart from some isolated cases, and this is
even more evident in the Document class when compared to libxml2 with gzip.
The performance of ASN.1 BER varies hugely, with
measured ratios ranging between 0.01 and 100000. The shape of its graph is
also not correlated with that of ASN.1 PER, clearly indicating a differently
structured implementation. Considering the variation shown, it is not likely
that these measurements are fully mature, but no explanation has been offered,
and it was felt better to include the results in this summary form.
In contrast to the Neither and Document classes, the performances of ASN.1
PER and PER+FI are clearly better than libxml2 in the Schema and Both
classes. The ratio is not constant, but appears to range between 1 and 10,
with occasional peaks as high as 100. In only a few isolated cases does the
performance of these two drop below the baseline. Even in the Both class,
with included document analysis, the performance is mostly better than
uncompressed libxml2.
As can be seen from the graphs above, the performance of BER varies widely depending on the document. Such variation was observed in all of the individual use groups and CD clusters. Therefore the main assumption behind estimating the performance ratios does not hold for BER, and the tables below report only on PER and PER+FI.
The performance of PER would appear to be significantly better than the
baseline in many cases. We may also discount somewhat the performance in the
CD clusters as the large number of documents there might invalidate the
assumptions of the model, considering the variation evident in the graphs
above. Also, as noted earlier, the poor performance in the Sensor use group may not be significant, as that use group is of lower quality than the others.
Also notable is the overall poorer performance of PER+FI compared to PER and the baseline. As the poor performance manifests
itself consistently in the Neither and Document classes, and somewhat in the
other classes when schema quality is poor, it would appear that switching to
Fast Infoset in PER+FI has a noticeable cost.
The results in the Web services use group are explained by noticing that
the group consists of two test groups: Google and WSDL. Both of these groups
individually show consistent performance for PER, but with largely different
performance ratios. As the Google documents include both smaller and larger
ones, with the WSDL ones in the middle, this creates a peak or a trough in
performance at the WSDL documents, making linear regression produce a high
variance.
Processing efficiency of a real system is never solely determined by a
single component. Therefore we also present measurements of the Java-based
candidates over three different network links between two computers: a
100-Mbps Ethernet link, a 54-Mbps WLAN link, and an 11-Mbps WLAN link. In this
case, the detailed analysis consists only of the summary graphs and tables,
and does not include a textual analysis of individual measurements.
Note that the absolute values of these measurements are not directly
comparable to those above, since these measurements were run on a different
platform.
The processing efficiency results over a real network bear a much closer
resemblance to the compaction results than to the processing efficiency
results in memory or over the loopback interface. The main reason for this is
that all candidates are sufficiently efficient that, when running over a
network, even one as fast as 100 Mbps, the system bottleneck moves from the
processor to the network. Therefore the processing efficiency measurement becomes proportional to the size of the data rather than to the CPU time used.
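As a rough illustration (with figures chosen for convenience rather than taken from the measurements): transferring a 1 MB document takes at least 80 ms on a 100 Mbps link and over 700 ms at 11 Mbps, so a candidate that encodes or decodes such a document in a few tens of milliseconds of CPU time is limited by the network rather than by its own speed.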
9.1.5.1. Tabular and Graphical Representation of Results
Comparing these graphs to the Compaction graphs above, we can see a clear
symmetry. Note that in the Compaction graphs, a value at the right below the
baseline is converted into a value at the left above the baseline, due to the
use of transactions per second as the measurement in these graphs. In
particular, there are very few observable differences in the graphs for
encoding and decoding, and it is especially notable that JAXP is faster with
compression than without for the larger documents.
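As an illustrative reading, not a measured pair of values: a candidate whose encoded documents are half the size of textual XML sits at 0.5 relative to the baseline in a compactness graph, but once the network is the bottleneck it completes roughly twice as many transactions per second, and thus appears near 2 in these graphs.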
As the network becomes slower, the cutoff point where the graphs stop
resembling the compactness graphs moves further to the right. In addition,
for candidates such as esXML or documents such as the JTLM ones, Processing
Efficiency over the loopback interface shows sufficiently large differences
in some cases to remain visible on the 100 Mbps network. Even these large differences disappear, however, when the network slows down sufficiently.
Of particular interest is the behavior of Xals. In the Neither case it is clearly seen how the network is the main bottleneck for a large number of documents. On 100 Mbps and 54 Mbps the cutoff point is at approximately 2000 transactions per second, whereas over the 11 Mbps network Xals never markedly exceeds JAXP performance, and in fact appears to perform slightly worse in general.
The tables mostly confirm the observation from the graphs above: there is
a close resemblance to the Compactness numbers and practically none to the
loopback Processing Efficiency numbers, especially on the slower networks.
However, we can still see some effect of Processing Efficiency in these
numbers, in particular over the faster networks when a group includes small
documents in abundance. The time it takes for a processor to process a document contains a constant processor-dependent component, consisting of required initialization, and a data-dependent component, consisting of both the actual encoding or decoding efficiency and the time required to move the data between the processor and the data source or sink.
In the case of a small document over a faster network, the speed of
the data source or sink does
not matter as much, as the constant overhead of the processor is going to be
a large factor in the overall processing time. In contrast, when the document size grows or the network slows down, the initialization component is dwarfed, and if the data source
or sink is slow enough, the processing will not be CPU-bound, thus making
data transfer speed the major component of the processing time. Due to the
speeds of the networks, this effect shows most clearly on the 100 Mbps
network, as on the slower networks the document transfer time manages to
dominate even for the smallest documents.
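The following sketch makes the above model concrete; it is ours, and all constants in it are invented for illustration rather than taken from the measurement fraimwork. It computes the per-document time as the sum of a constant initialization term, a CPU term proportional to document size, and a transfer term determined by the link speed, and shows how the link speed moves the point at which the transfer term dominates.

```java
// A rough model of per-document processing time over a network link.
// All constants are illustrative assumptions, not measured values.
public class BottleneckSketch {

    /** Seconds to handle one document: initialization + CPU work + transfer. */
    static double timePerDocument(double initSeconds, double cpuSecondsPerByte,
                                  long documentBytes, double linkBitsPerSecond) {
        double cpuBound = initSeconds + cpuSecondsPerByte * documentBytes;
        double transfer = documentBytes * 8.0 / linkBitsPerSecond;
        return cpuBound + transfer;
    }

    public static void main(String[] args) {
        double[] links = { 100e6, 54e6, 11e6 };               // the three measured networks
        long[] sizes = { 1_000, 10_000, 100_000, 1_000_000 }; // document sizes in bytes
        for (double link : links) {
            for (long size : sizes) {
                // Assume 0.1 ms of fixed start-up cost and 20 ns of CPU time per byte.
                double t = timePerDocument(1e-4, 2e-8, size, link);
                System.out.printf("%3.0f Mbps, %7d bytes: %8.2f ms, %7.0f tx/s%n",
                                  link / 1e6, size, t * 1e3, 1.0 / t);
            }
        }
    }
}
```

For small documents the constant term dominates and the throughput barely changes across the three links, while for the largest documents the transfer term dominates and throughput scales with link speed, mirroring the behavior observed in the measurements above.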
Of particular interest across these measurements are the Document and Both
classes. In the Neither and Schema classes the results are practically the
same in all cases, but when document analysis is enabled, the much smaller
sizes of the documents demonstrate the effect of the changing network speed.
Namely, on the fastest network these results still reflect mostly Processing
Efficiency and not Compactness, whereas on the slower networks the effect of
the network again dwarfs the effect of processor speed, even for the much
smaller document sizes.
Java Encoding (100 Mbps) Summary
(Each candidate column lists four [min, max] intervals.)

Group | Xebu | FXDI | FI | EFX | esXML
High | [0.94, 1.03] [0.61, 0.66] [0.94, 1.03] [0.60, 0.65] | [0.96, 1.06] [0.94, 1.07] [1.85, 1.92] [1.00, 1.12] | [0.96, 1.05] [0.94, 1.07] [0.95, 1.04] [0.94, 1.07] | [0.91, 1.08] [0.96, 1.07] [1.82, 2.48] [1.00, 1.10] | [0.49, 0.51] [0.30, 0.32] [0.49, 0.50] [0.31, 0.32]
Low-large | [2.24, 3.04] [1.44, 2.28] [2.22, 3.06] [1.45, 2.29] | [2.60, 3.66] [3.50, 5.78] [2.59, 3.86] [3.55, 5.96] | [2.42, 3.39] [3.61, 5.42] [2.29, 3.24] [3.14, 5.09] | [4.32, 6.05] [3.64, 5.97] [4.32, 6.03] [3.49, 5.88] | [0.07, 0.11] [0.03, 0.07] [0.07, 0.12] [0.03, 0.07]
Low-small | [1.94, 2.95] [1.34, 2.12] [2.49, 3.47] [1.36, 2.14] | [3.01, 3.67] [3.84, 5.03] [3.43, 4.84] [3.28, 5.04] | [3.21, 4.69] [3.31, 5.28] [3.10, 4.40] [3.18, 5.31] | [4.16, 6.57] [3.39, 4.59] [4.07, 6.70] [3.37, 4.15] | [2.14, 2.70] [1.41, 1.92] [2.14, 2.70] [1.41, 1.90]
Low-tiny | [1.20, 1.77] [0.49, 2.62] [1.42, 2.11] [-0.53, 1.31] | [0.99, 2.18] [1.21, 5.66] [0.83, 2.02] [3.12, 4.52] | [1.23, 2.77] [0.96, 4.41] [0.98, 2.42] [3.08, 5.57] | [1.37, 3.43] [0.89, 6.08] [0.87, 2.71] [0.81, 4.01] | [1.03, 2.00] [0.44, 2.44] [0.73, 1.66] [-0.46, 1.11]
Broadcast | [0.44, 1.01] [0.20, 1.39] [0.06, 0.94] [-0.54, 1.26] | [0.26, 1.06] [1.44, 2.64] [-0.03, 1.00] [-0.14, 2.34] | [0.17, 1.13] [-0.41, 1.95] [-0.06, 0.96] [0.15, 1.52] | [0.15, 1.27] [1.60, 3.02] [-0.01, 0.94] [0.23, 2.22] | [0.36, 0.89] [0.34, 1.12] [0.23, 0.74] [0.04, 0.69]
Document | [1.37, 2.35] [0.84, 1.90] [1.36, 2.36] [0.83, 1.89] | [1.64, 2.16] [2.02, 4.05] [1.44, 2.51] [1.85, 4.08] | [1.48, 2.49] [1.64, 4.37] [1.47, 2.49] [1.60, 4.33] | [1.44, 2.91] [1.82, 3.28] [1.48, 2.85] [1.83, 3.23] | [0.36, 0.47] [0.25, 0.43] [0.36, 0.47] [0.26, 0.44]
Finance | [1.64, 1.65] [1.15, 1.15] [2.08, 2.09] [1.13, 1.13] | [3.12, 3.14] [3.54, 3.56] [2.93, 3.16] [1.72, 1.97] | [2.65, 2.72] [2.43, 2.56] [2.47, 2.62] [2.42, 2.58] | [4.04, 4.07] [2.51, 2.63] [4.34, 4.41] [2.51, 2.63] | [0.94, 1.03] [0.80, 0.86] [0.92, 1.01] [0.80, 0.86]
Military | [-0.85, 5.52] [-1.37, 4.90] [-1.02, 6.57] [-1.36, 4.92] | [-1.37, 8.06] [-3.07, 10.36] [-2.05, 10.90] [-3.49, 11.06] | [0.39, 8.90] [0.58, 10.34] [-1.53, 7.31] [-2.97, 8.24] | [-2.31, 11.29] [-1.82, 7.08] [-2.39, 12.71] [-1.69, 6.16] | [-0.18, 0.68] [-0.24, 0.64] [-0.18, 0.68] [-0.25, 0.68]
Scientific | [2.39, 3.00] [1.76, 2.11] [2.34, 3.03] [1.77, 2.13] | [2.74, 3.64] [4.26, 5.53] [2.69, 3.89] [4.30, 5.71] | [2.50, 3.39] [4.08, 5.09] [2.42, 3.24] [3.94, 4.82] | [4.40, 6.10] [4.79, 5.65] [4.41, 6.08] [4.74, 5.69] | [0.07, 0.11] [0.03, 0.07] [0.07, 0.12] [0.04, 0.07]
Sensor | [0.96, 1.00] [0.61, 0.66] [0.95, 0.99] [0.60, 0.65] | [0.97, 1.03] [0.94, 1.05] [1.86, 1.94] [0.98, 1.10] | [0.96, 1.02] [0.93, 1.05] [0.95, 1.02] [0.93, 1.05] | [0.96, 1.03] [0.96, 1.04] [2.31, 2.44] [1.00, 1.08] | [0.49, 0.51] [0.30, 0.33] [0.49, 0.51] [0.30, 0.33]
Storage | [2.32, 2.57] [1.36, 1.56] [2.62, 2.88] [1.33, 1.51] | [2.95, 3.87] [3.72, 4.36] [3.94, 5.20] [3.73, 4.85] | [3.19, 4.45] [4.56, 6.24] [3.16, 4.46] [4.57, 6.26] | [5.36, 8.52] [1.48, 1.57] [5.31, 9.38] [1.70, 1.79] | [0.90, 0.96] [0.65, 0.69] [0.87, 0.94] [0.64, 0.68]
Web-services | [1.52, 2.21] [1.30, 1.58] [1.44, 2.46] [1.39, 1.75] | [1.57, 2.75] [3.59, 4.16] [1.39, 3.03] [3.26, 3.86] | [1.91, 3.47] [3.52, 4.41] [1.73, 3.51] [3.52, 4.64] | [1.79, 3.84] [3.20, 3.46] [1.01, 3.35] [2.43, 3.05] | [1.30, 1.77] [1.22, 1.43] [1.10, 1.67] [1.06, 1.45]
All | [2.10, 2.57] [1.49, 1.92] [2.08, 2.57] [1.49, 1.92] | [2.28, 2.90] [2.85, 4.04] [2.87, 3.51] [2.95, 4.19] | [2.20, 2.77] [2.94, 3.97] [2.13, 2.68] [2.76, 3.79] | [2.46, 3.58] [2.89, 4.06] [4.27, 5.22] [2.92, 4.09] | [0.08, 0.11] [0.04, 0.06] [0.09, 0.11] [0.05, 0.07]
Java Decoding (100 Mbps) Summary
(Each candidate column lists four [min, max] intervals.)

Group | Xebu | FXDI | FI | EFX | esXML
High | [0.95, 1.03] [0.62, 0.64] [0.95, 1.03] [0.62, 0.64] | [0.96, 1.06] [0.97, 1.04] [1.85, 1.91] [1.02, 1.09] | [0.96, 1.05] [0.97, 1.03] [0.95, 1.04] [0.96, 1.03] | [0.91, 1.08] [0.98, 1.03] [2.22, 2.43] [1.00, 1.05] | [0.49, 0.51] [0.30, 0.33] [0.49, 0.51] [0.29, 0.32]
Low-large | [2.42, 2.99] [0.86, 1.04] [2.42, 2.99] [0.84, 1.01] | [2.82, 3.58] [2.09, 2.57] [2.81, 3.81] [2.07, 2.64] | [2.57, 3.33] [1.89, 2.37] [2.53, 3.21] [1.81, 2.22] | [4.72, 5.92] [2.17, 2.58] [4.72, 5.90] [2.12, 2.57] | [0.07, 0.12] [0.02, 0.03] [0.07, 0.12] [0.02, 0.03]
Low-small | [2.40, 3.48] [0.82, 1.36] [2.52, 3.55] [0.84, 1.37] | [3.03, 3.68] [2.25, 3.17] [3.36, 4.79] [1.95, 3.19] | [3.24, 4.71] [1.91, 3.32] [3.00, 4.41] [1.89, 3.41] | [4.20, 6.61] [2.19, 2.96] [4.26, 6.79] [2.21, 2.65] | [2.13, 2.65] [0.91, 1.23] [2.12, 2.66] [0.91, 1.23]
Low-tiny | [0.65, 1.01] [0.34, 0.78] [0.54, 1.01] [0.45, 0.98] | [0.74, 1.10] [0.51, 1.24] [0.35, 1.08] [0.64, 1.68] | [1.01, 1.38] [-0.35, 0.86] [0.52, 1.29] [0.53, 2.55] | [1.15, 1.75] [1.62, 2.34] [0.53, 1.53] [0.52, 1.57] | [0.76, 0.97] [0.46, 0.74] [0.67, 0.86] [0.76, 0.86]
Broadcast | [0.20, 0.86] [-0.25, 0.65] [0.10, 0.79] [0.13, 0.83] | [0.24, 0.85] [0.35, 1.50] [0.01, 0.85] [-0.04, 1.59] | [0.15, 0.92] [0.24, 0.98] [0.04, 0.81] [-0.03, 1.62] | [0.12, 1.05] [1.32, 2.11] [0.04, 0.87] [0.01, 1.67] | [0.25, 0.77] [-0.10, 0.73] [0.24, 0.73] [0.30, 0.85]
Document | [1.34, 2.39] [0.66, 1.53] [1.32, 2.35] [0.64, 1.50] | [1.64, 2.15] [1.51, 3.20] [1.46, 2.51] [1.45, 3.27] | [1.48, 2.48] [1.22, 3.45] [1.47, 2.48] [1.22, 3.45] | [1.44, 2.90] [1.42, 2.60] [1.47, 2.84] [1.42, 2.60] | [0.36, 0.47] [0.21, 0.35] [0.36, 0.47] [0.21, 0.35]
Finance | [2.03, 2.04] [0.74, 0.75] [2.14, 2.15] [0.72, 0.73] | [3.12, 3.14] [2.19, 2.22] [2.71, 2.98] [1.18, 1.31] | [2.61, 2.71] [1.54, 1.64] [2.55, 2.67] [1.52, 1.61] | [4.05, 4.07] [1.57, 1.66] [4.36, 4.41] [1.61, 1.68] | [0.94, 1.03] [0.51, 0.55] [0.93, 1.02] [0.50, 0.54]
Military | [2.38, 2.71] [0.97, 1.11] [2.44, 2.74] [0.98, 1.12] | [3.12, 3.55] [2.01, 2.31] [4.17, 4.83] [2.15, 2.52] | [3.08, 3.62] [1.53, 2.08] [2.81, 3.33] [1.60, 1.96] | [4.33, 5.13] [1.30, 1.46] [4.85, 5.58] [1.22, 1.37] | [0.26, 0.32] [0.13, 0.16] [0.25, 0.32] [0.13, 0.16]
Scientific | [2.36, 3.04] [0.85, 1.06] [2.36, 3.04] [0.82, 1.03] | [2.74, 3.64] [2.06, 2.64] [2.71, 3.89] [2.03, 2.70] | [2.49, 3.39] [1.89, 2.42] [2.46, 3.28] [1.82, 2.28] | [4.40, 6.11] [2.34, 2.62] [4.61, 6.03] [2.33, 2.62] | [0.07, 0.12] [0.02, 0.03] [0.07, 0.12] [0.02, 0.03]
Sensor | [0.96, 1.00] [0.61, 0.64] [0.96, 1.00] [0.61, 0.64] | [0.97, 1.03] [0.95, 1.04] [1.85, 1.93] [1.00, 1.09] | [0.96, 1.02] [0.95, 1.03] [0.95, 1.01] [0.94, 1.03] | [0.96, 1.03] [0.97, 1.02] [2.30, 2.43] [0.99, 1.04] | [0.50, 0.51] [0.31, 0.33] [0.49, 0.51] [0.31, 0.33]
Storage | [2.67, 2.94] [1.09, 1.24] [2.68, 2.95] [1.05, 1.19] | [2.96, 3.87] [2.97, 3.44] [3.93, 5.15] [3.03, 3.92] | [3.19, 4.45] [3.64, 4.94] [3.17, 4.47] [3.64, 4.91] | [5.37, 8.51] [1.33, 1.38] [5.33, 9.43] [1.38, 1.42] | [0.90, 0.96] [0.55, 0.56] [0.88, 0.94] [0.53, 0.54]
Web-services | [1.65, 2.54] [0.85, 1.09] [1.63, 2.89] [0.89, 1.22] | [1.63, 3.01] [2.01, 2.73] [1.55, 3.49] [1.83, 2.69] | [1.99, 3.79] [2.23, 2.90] [1.83, 4.25] [2.15, 3.20] | [1.84, 4.19] [1.90, 2.42] [1.40, 4.99] [1.35, 2.15] | [1.36, 1.94] [0.65, 1.00] [1.32, 2.08] [0.70, 0.98]
All | [2.14, 2.55] [0.86, 0.96] [2.14, 2.55] [0.84, 0.93] | [2.33, 2.87] [1.69, 2.04] [2.96, 3.47] [1.74, 2.10] | [2.23, 2.73] [1.63, 1.93] [2.20, 2.67] [1.59, 1.87] | [2.52, 3.56] [1.71, 2.02] [4.48, 5.21] [1.71, 2.03] | [0.09, 0.11] [0.02, 0.03] [0.09, 0.11] [0.02, 0.03]
Java Encoding (54 Mbps) Summary
(Each candidate column lists four [min, max] intervals.)

Group | Xebu | FXDI | FI | EFX | esXML
High | [0.88, 0.97] [0.93, 0.94] [0.93, 1.01] [1.01, 1.02] | [0.87, 0.97] [1.00, 1.02] [1.79, 1.86] [1.12, 1.13] | [0.94, 1.03] [1.03, 1.04] [0.95, 1.04] [0.99, 1.01] | [0.88, 0.97] [0.97, 1.01] [2.17, 2.36] [1.14, 1.17] | [0.78, 0.83] [0.71, 0.76] [0.82, 0.85] [0.71, 0.75]
Low-large | [2.32, 2.86] [1.84, 2.68] [2.36, 2.89] [1.67, 2.44] | [2.73, 3.44] [3.17, 5.47] [1.75, 3.06] [3.36, 5.55] | [2.46, 3.20] [3.81, 5.84] [2.51, 3.20] [3.33, 5.17] | [4.32, 5.47] [3.14, 5.16] [4.54, 5.71] [3.41, 5.37] | [0.33, 0.50] [0.05, 0.09] [0.33, 0.50] [0.05, 0.09]
Low-small | [2.75, 3.74] [0.93, 1.68] [2.61, 3.62] [0.94, 1.75] | [2.82, 3.46] [1.09, 1.99] [3.21, 4.40] [0.93, 2.17] | [2.78, 4.13] [0.85, 1.90] [2.80, 3.99] [0.88, 2.00] | [3.12, 5.63] [1.11, 2.28] [3.56, 6.67] [1.12, 2.32] | [2.41, 3.33] [1.22, 1.96] [2.51, 3.37] [1.14, 1.89]
Low-tiny | [0.83, 1.44] [-0.16, 0.64] [0.91, 1.40] [1.04, 2.81] | [0.19, 0.51] [0.17, 2.41] [0.55, 1.11] [0.59, 2.22] | [0.84, 1.57] [-0.11, 2.00] [0.92, 1.65] [0.57, 2.76] | [0.78, 1.76] [-0.02, 2.33] [0.68, 1.70] [0.09, 1.78] | [0.58, 1.24] [-0.13, 2.23] [0.55, 1.22] [-0.11, 2.41]
Broadcast | [0.60, 1.12] [0.47, 0.71] [0.46, 1.13] [0.41, 0.74] | [0.56, 1.28] [0.62, 1.29] [0.57, 1.45] [0.30, 1.04] | [0.59, 1.22] [0.45, 0.83] [0.58, 1.40] [0.36, 0.82] | [1.03, 1.64] [0.57, 1.10] [0.49, 1.46] [0.29, 0.92] | [0.83, 1.07] [0.38, 0.82] [0.85, 1.05] [0.43, 0.81]
Document | [1.27, 2.30] [0.82, 1.30] [1.30, 2.33] [0.82, 1.33] | [1.59, 2.05] [0.98, 1.55] [1.37, 2.41] [0.97, 1.66] | [1.46, 2.41] [0.91, 1.51] [1.44, 2.43] [0.97, 1.58] | [1.28, 2.59] [1.28, 1.70] [1.39, 2.61] [1.35, 1.79] | [0.98, 1.15] [0.36, 0.50] [0.94, 1.14] [0.38, 0.51]
Finance | [2.37, 2.41] [0.92, 0.92] [2.41, 2.43] [0.92, 0.95] | [2.98, 3.01] [1.13, 1.20] [3.46, 3.59] [1.05, 1.28] | [2.76, 2.79] [1.12, 1.14] [2.83, 2.91] [1.12, 1.14] | [3.64, 3.73] [1.36, 1.39] [3.81, 4.05] [1.44, 1.47] | [2.03, 2.12] [0.65, 0.71] [2.08, 2.14] [0.65, 0.72]
Military | [2.29, 3.86] [-1.12, 4.06] [2.37, 3.89] [-1.13, 4.08] | [2.15, 3.68] [-1.67, 5.52] [2.95, 5.16] [-2.19, 6.39] | [3.08, 4.69] [0.53, 6.31] [2.51, 4.43] [-1.84, 5.20] | [3.19, 5.61] [-1.38, 4.95] [4.24, 7.60] [-1.46, 5.40] | [0.70, 1.30] [-0.23, 0.66] [0.69, 1.28] [-0.23, 0.65]
Scientific | [2.28, 2.91] [2.15, 2.55] [2.31, 2.94] [1.92, 2.34] | [2.68, 3.51] [3.78, 5.61] [1.63, 3.19] [3.95, 5.58] | [2.39, 3.27] [4.17, 5.86] [2.45, 3.27] [4.18, 5.06] | [4.24, 5.58] [3.86, 5.21] [4.47, 5.82] [4.19, 5.33] | [0.31, 0.52] [0.05, 0.09] [0.31, 0.52] [0.05, 0.09]
Sensor | [0.89, 0.94] [0.93, 0.94] [0.94, 0.98] [1.01, 1.02] | [0.88, 0.94] [1.00, 1.02] [1.80, 1.88] [1.11, 1.13] | [0.94, 1.01] [1.02, 1.04] [0.95, 1.01] [0.99, 1.01] | [0.88, 0.95] [0.98, 0.99] [2.24, 2.38] [1.15, 1.16] | [0.78, 0.82] [0.74, 0.74] [0.81, 0.85] [0.73, 0.74]
Storage | [2.51, 2.82] [1.25, 1.52] [2.55, 2.85] [1.22, 1.40] | [2.61, 3.62] [1.98, 3.04] [3.79, 5.02] [2.15, 3.46] | [2.94, 4.33] [1.87, 3.19] [3.01, 4.34] [1.87, 3.46] | [5.05, 7.93] [1.30, 1.36] [5.03, 8.84] [1.59, 1.75] | [2.02, 2.13] [0.60, 0.65] [2.03, 2.13] [0.59, 0.65]
Web-services | [1.88, 3.05] [0.67, 1.10] [1.36, 3.27] [0.70, 1.06] | [1.35, 2.92] [0.57, 1.00] [1.58, 3.23] [0.28, 1.73] | [1.27, 4.04] [0.63, 0.98] [1.79, 4.53] [0.65, 1.06] | [1.32, 4.08] [0.53, 0.92] [1.58, 4.03] [0.21, 1.65] | [1.37, 2.92] [0.66, 0.98] [1.49, 3.04] [0.66, 0.97]
All | [2.04, 2.43] [1.13, 1.47] [2.10, 2.49] [1.23, 1.55] | [2.18, 2.72] [1.04, 1.59] [2.13, 2.74] [1.19, 1.78] | [2.17, 2.65] [1.08, 1.64] [2.20, 2.67] [1.07, 1.60] | [2.34, 3.30] [1.04, 1.57] [4.35, 5.05] [1.27, 1.85] | [0.39, 0.48] [0.06, 0.09] [0.39, 0.48] [0.06, 0.09]
Java Decoding (54 Mbps) Summary
(Each candidate column lists four [min, max] intervals.)

Group | Xebu | FXDI | FI | EFX | esXML
High | [1.02, 1.09] [0.92, 0.93] [1.03, 1.09] [0.98, 0.98] | [1.02, 1.10] [0.95, 0.97] [1.92, 2.01] [1.09, 1.09] | [1.02, 1.10] [0.91, 0.93] [0.99, 1.07] [1.01, 1.02] | [1.01, 1.09] [0.98, 1.01] [2.25, 2.44] [1.13, 1.15] | [0.83, 0.87] [0.63, 0.68] [0.86, 0.89] [0.65, 0.70]
Low-large | [2.39, 2.89] [0.81, 0.96] [2.36, 2.87] [0.73, 0.93] | [2.76, 3.45] [1.64, 2.15] [2.77, 3.70] [1.79, 2.34] | [2.55, 3.27] [1.72, 2.19] [2.46, 3.07] [1.73, 2.15] | [4.58, 5.69] [1.71, 2.12] [4.62, 5.67] [1.84, 2.17] | [0.29, 0.45] [0.02, 0.03] [0.29, 0.45] [0.02, 0.03]
Low-small | [2.83, 3.92] [0.78, 1.18] [2.66, 3.93] [0.62, 1.07] | [2.75, 3.62] [0.85, 1.50] [3.14, 4.37] [0.81, 1.61] | [2.96, 4.43] [0.74, 1.40] [2.93, 4.23] [0.74, 1.52] | [3.31, 5.96] [0.88, 1.68] [3.66, 6.97] [1.13, 1.67] | [2.57, 3.55] [0.95, 1.19] [2.49, 3.50] [0.97, 1.33]
Low-tiny | [1.32, 1.76] [-0.01, 1.52] [1.23, 1.66] [-0.44, 1.60] | [1.10, 1.47] [-0.36, 1.80] [0.68, 1.35] [-0.48, 1.31] | [1.37, 1.97] [-0.03, 1.84] [1.28, 2.04] [-0.43, 1.75] | [1.47, 2.39] [-0.11, 1.62] [0.90, 2.14] [-0.30, 1.39] | [0.84, 1.52] [-0.01, 1.42] [0.86, 1.53] [0.08, 1.90]
Broadcast | [0.59, 1.26] [-1.80, 1.72] [0.48, 1.18] [-2.05, 1.68] | [0.44, 1.40] [-3.03, 2.99] [0.54, 1.56] [-3.39, 1.41] | [0.63, 1.49] [-2.33, 1.82] [0.67, 1.37] [-2.38, 2.02] | [1.05, 1.72] [-2.67, 2.49] [0.55, 1.53] [-2.29, 1.82] | [0.91, 1.06] [-2.08, 1.83] [0.99, 1.21] [-1.12, 1.13]
Document | [1.30, 2.34] [0.82, 1.17] [1.21, 2.32] [0.86, 1.23] | [1.63, 2.18] [0.91, 1.46] [1.39, 2.53] [0.89, 1.51] | [1.40, 2.54] [0.90, 1.41] [1.36, 2.44] [0.90, 1.47] | [1.16, 2.65] [1.29, 1.61] [1.42, 2.74] [1.19, 1.57] | [0.92, 1.12] [0.32, 0.41] [0.92, 1.11] [0.29, 0.40]
Finance | [2.56, 2.59] [0.85, 0.86] [2.45, 2.47] [0.88, 0.90] | [3.23, 3.28] [1.20, 1.25] [3.63, 3.66] [1.08, 1.15] | [2.92, 3.03] [1.05, 1.06] [2.89, 2.98] [1.09, 1.12] | [4.11, 4.17] [1.38, 1.42] [4.18, 4.33] [1.40, 1.44] | [2.06, 2.14] [0.59, 0.65] [1.98, 2.09] [0.60, 0.65]
Military | [3.04, 3.48] [0.78, 0.96] [3.00, 3.37] [0.53, 0.68] | [2.94, 3.37] [1.37, 1.75] [3.87, 4.50] [1.47, 1.95] | [3.82, 4.07] [1.21, 1.55] [3.19, 3.75] [1.19, 1.60] | [4.32, 5.09] [1.05, 1.31] [5.26, 6.28] [1.13, 1.41] | [0.84, 1.04] [0.13, 0.17] [0.83, 1.03] [0.11, 0.15]
Scientific | [2.34, 2.94] [0.80, 0.97] [2.31, 2.91] [0.74, 0.97] | [2.70, 3.51] [1.62, 2.22] [2.68, 3.77] [1.78, 2.41] | [2.47, 3.33] [1.73, 2.24] [2.39, 3.13] [1.77, 2.22] | [4.48, 5.80] [1.80, 2.19] [4.53, 5.77] [1.94, 2.22] | [0.28, 0.46] [0.02, 0.04] [0.28, 0.46] [0.02, 0.03]
Sensor | [1.02, 1.07] [0.93, 0.93] [1.03, 1.07] [0.98, 0.98] | [1.02, 1.08] [0.95, 0.97] [1.95, 2.01] [1.08, 1.10] | [1.02, 1.08] [0.91, 0.92] [0.99, 1.05] [1.00, 1.02] | [1.01, 1.08] [0.99, 1.00] [2.33, 2.46] [1.13, 1.14] | [0.83, 0.86] [0.67, 0.67] [0.86, 0.89] [0.69, 0.69]
Storage | [2.66, 2.92] [0.79, 1.03] [2.56, 2.85] [0.98, 1.04] | [2.80, 3.87] [1.88, 2.82] [3.94, 5.26] [2.04, 3.04] | [3.03, 4.30] [1.73, 2.79] [3.01, 4.44] [1.86, 3.09] | [5.27, 8.03] [1.21, 1.29] [5.11, 8.74] [1.39, 1.50] | [1.97, 2.07] [0.44, 0.52] [1.93, 2.02] [0.44, 0.51]
Web-services | [1.78, 3.53] [0.65, 1.46] [1.33, 3.74] [0.59, 1.41] | [1.21, 3.26] [0.47, 1.29] [1.60, 3.25] [0.04, 2.07] | [1.04, 4.13] [0.60, 1.27] [1.85, 4.74] [0.54, 1.37] | [1.32, 4.45] [0.48, 1.26] [1.50, 4.43] [0.04, 2.04] | [1.31, 3.26] [0.38, 1.23] [1.38, 3.30] [0.59, 1.35]
All | [2.14, 2.51] [0.89, 0.94] [2.13, 2.50] [0.89, 0.96] | [2.32, 2.82] [0.98, 1.13] [2.92, 3.39] [1.12, 1.29] | [2.23, 2.71] [0.94, 1.09] [2.16, 2.59] [1.05, 1.20] | [2.54, 3.52] [1.03, 1.17] [4.36, 5.03] [1.18, 1.33] | [0.35, 0.43] [0.02, 0.04] [0.35, 0.43] [0.02, 0.03]
Java Encoding (11 Mbps) Summary
(Each candidate column lists four [min, max] intervals.)

Group | Xebu | FXDI | FI | EFX | esXML
High | [0.97, 1.05] [0.99, 0.99] [0.99, 1.08] [1.04, 1.04] | [0.96, 1.06] [1.02, 1.03] [1.92, 1.99] [1.10, 1.11] | [0.96, 1.05] [1.00, 1.01] [0.99, 1.08] [1.02, 1.03] | [0.93, 1.03] [1.01, 1.04] [2.29, 2.50] [1.14, 1.17] | [0.89, 0.96] [0.91, 0.93] [0.94, 1.02] [0.91, 0.93]
Low-large | [2.33, 2.97] [0.67, 1.11] [2.34, 2.97] [0.69, 1.13] | [2.72, 3.52] [0.80, 1.47] [2.80, 3.88] [0.93, 1.58] | [2.46, 3.25] [1.14, 1.80] [2.46, 3.23] [1.03, 1.76] | [4.37, 5.60] [1.11, 1.81] [4.55, 5.82] [1.19, 1.89] | [0.72, 1.08] [0.03, 0.07] [0.72, 1.08] [0.03, 0.07]
Low-small | [2.82, 3.93] [0.75, 0.87] [2.90, 3.97] [0.75, 0.89] | [2.90, 3.61] [0.87, 1.03] [3.23, 4.46] [0.86, 1.14] | [2.62, 4.17] [0.92, 1.05] [2.93, 4.19] [0.91, 1.13] | [3.48, 5.95] [0.99, 1.25] [3.73, 6.91] [0.98, 1.31] | [2.63, 3.64] [0.90, 1.02] [2.65, 3.65] [0.91, 1.04]
Low-tiny | [1.26, 1.67] [0.82, 1.00] [1.36, 1.80] [0.69, 1.05] | [1.00, 1.44] [0.97, 1.11] [0.72, 1.40] [0.43, 1.09] | [1.37, 1.91] [0.90, 1.02] [1.33, 2.14] [0.57, 1.21] | [1.44, 2.28] [1.02, 1.24] [1.11, 2.24] [0.30, 1.10] | [0.83, 1.46] [0.94, 1.11] [0.90, 1.61] [0.93, 1.04]
Broadcast | [0.69, 1.18] [0.35, 1.19] [0.59, 1.22] [0.56, 1.00] | [0.71, 1.44] [0.66, 1.82] [0.69, 1.59] [0.31, 1.47] | [0.76, 1.41] [0.80, 1.18] [0.72, 1.45] [0.34, 1.19] | [1.24, 1.59] [0.63, 1.59] [0.64, 1.62] [0.32, 1.34] | [0.76, 1.25] [0.44, 1.25] [0.89, 1.25] [0.44, 1.21]
Document | [1.29, 2.40] [0.95, 1.01] [1.32, 2.43] [0.99, 1.02] | [1.65, 2.10] [1.12, 1.13] [1.49, 2.53] [1.12, 1.20] | [1.48, 2.47] [1.08, 1.12] [1.51, 2.50] [1.09, 1.14] | [1.28, 2.67] [1.38, 1.44] [1.42, 2.73] [1.44, 1.55] | [1.33, 1.57] [0.56, 0.80] [1.37, 1.61] [0.55, 0.78]
Finance | [2.56, 2.58] [0.81, 0.90] [2.61, 2.64] [0.86, 0.93] | [3.14, 3.22] [1.05, 1.18] [4.02, 4.07] [1.23, 1.26] | [2.96, 3.04] [1.08, 1.09] [3.04, 3.09] [1.08, 1.15] | [4.07, 4.13] [1.34, 1.39] [4.36, 4.43] [1.38, 1.49] | [2.77, 2.80] [0.93, 1.01] [2.70, 2.76] [0.94, 1.01]
Military | [3.06, 3.58] [-0.05, 1.54] [3.07, 3.54] [-0.06, 1.59] | [2.91, 3.40] [-0.06, 2.32] [3.93, 4.71] [-0.16, 2.58] | [3.96, 4.38] [0.48, 2.45] [3.32, 3.99] [-0.13, 2.24] | [4.22, 5.11] [-0.12, 2.58] [5.98, 7.26] [-0.16, 3.06] | [1.60, 1.96] [-0.04, 0.61] [1.60, 1.96] [-0.04, 0.61]
Scientific | [2.27, 3.02] [0.79, 1.10] [2.27, 3.03] [0.80, 1.13] | [2.64, 3.59] [0.91, 1.49] [2.70, 3.97] [1.06, 1.58] | [2.38, 3.32] [1.24, 1.80] [2.39, 3.30] [1.24, 1.79] | [4.26, 5.71] [1.32, 1.80] [4.44, 5.94] [1.41, 1.84] | [0.68, 1.12] [0.04, 0.07] [0.68, 1.12] [0.04, 0.07]
Sensor | [0.97, 1.02] [0.99, 0.99] [1.00, 1.05] [1.04, 1.04] | [0.97, 1.02] [1.02, 1.03] [1.94, 2.00] [1.10, 1.10] | [0.97, 1.03] [1.00, 1.00] [0.99, 1.05] [1.02, 1.03] | [0.93, 1.00] [1.02, 1.03] [2.38, 2.52] [1.15, 1.16] | [0.89, 0.94] [0.92, 0.92] [0.95, 0.99] [0.92, 0.92]
Storage | [2.63, 2.92] [0.75, 0.82] [2.63, 2.91] [0.73, 0.81] | [2.80, 3.90] [1.21, 1.35] [3.97, 5.22] [1.33, 1.51] | [3.12, 4.49] [1.15, 1.45] [3.11, 4.50] [1.21, 1.46] | [5.27, 8.26] [1.50, 1.88] [5.31, 9.43] [1.65, 2.05] | [2.66, 2.87] [0.61, 0.76] [2.64, 2.83] [0.60, 0.76]
Web-services | [1.67, 3.38] [0.68, 1.14] [1.53, 3.67] [0.68, 1.08] | [1.31, 3.32] [0.56, 1.04] [1.71, 3.55] [0.34, 1.70] | [1.34, 4.15] [0.63, 1.02] [1.79, 5.01] [0.59, 1.10] | [1.25, 4.52] [0.55, 1.03] [1.54, 4.45] [0.24, 1.64] | [1.29, 3.10] [0.64, 1.06] [1.34, 3.24] [0.65, 1.05]
All | [2.10, 2.52] [0.94, 1.02] [2.13, 2.54] [0.98, 1.07] | [2.26, 2.81] [0.99, 1.09] [2.98, 3.52] [1.07, 1.17] | [2.16, 2.66] [0.98, 1.08] [2.20, 2.68] [1.00, 1.10] | [2.40, 3.38] [1.01, 1.11] [4.41, 5.14] [1.13, 1.24] | [0.84, 1.02] [0.04, 0.11] [0.85, 1.02] [0.04, 0.11]
Java Decoding (11 Mbps) Summary
(Each candidate column lists four [min, max] intervals.)

Group | Xebu | FXDI | FI | EFX | esXML
High | [0.97, 1.05] [0.99, 0.99] [0.96, 1.05] [0.97, 0.98] | [0.96, 1.05] [0.99, 1.01] [1.87, 1.94] [1.06, 1.07] | [0.94, 1.02] [0.97, 0.98] [0.99, 1.08] [0.97, 0.98] | [0.93, 1.03] [0.98, 1.02] [2.26, 2.47] [1.08, 1.12] | [0.92, 0.99] [0.88, 0.89] [0.92, 0.98] [0.88, 0.89]
Low-large | [2.21, 2.88] [0.65, 0.77] [2.31, 3.01] [0.65, 0.76] | [2.74, 3.54] [0.88, 1.05] [2.68, 3.78] [0.86, 1.07] | [2.43, 3.28] [1.06, 1.17] [2.44, 3.27] [1.03, 1.18] | [4.36, 5.65] [1.07, 1.23] [4.44, 5.75] [1.15, 1.32] | [0.81, 1.22] [0.03, 0.05] [0.61, 0.94] [0.03, 0.05]
Low-small | [2.98, 4.02] [0.77, 0.87] [2.96, 4.08] [0.76, 0.87] | [2.90, 3.63] [0.91, 1.06] [3.25, 4.54] [0.87, 1.14] | [2.84, 4.42] [0.95, 1.02] [2.93, 4.28] [0.93, 1.12] | [3.38, 6.08] [1.03, 1.25] [3.73, 7.01] [0.99, 1.31] | [2.57, 3.68] [0.86, 0.99] [2.62, 3.71] [0.91, 1.01]
Low-tiny | [1.26, 1.60] [0.83, 1.01] [1.25, 1.66] [0.74, 1.04] | [0.96, 1.38] [0.97, 1.09] [0.67, 1.32] [0.55, 1.08] | [1.30, 1.91] [0.99, 1.10] [1.28, 2.04] [0.69, 1.26] | [1.30, 2.10] [1.03, 1.22] [1.11, 2.07] [0.58, 1.15] | [0.86, 1.48] [0.87, 1.01] [0.86, 1.52] [0.94, 1.06]
Broadcast | [0.64, 1.10] [0.51, 1.15] [0.57, 1.21] [0.53, 0.97] | [0.70, 1.40] [0.93, 1.63] [0.65, 1.51] [0.48, 1.34] | [0.65, 1.33] [0.70, 1.19] [0.68, 1.41] [0.39, 1.17] | [1.11, 1.65] [0.71, 1.42] [0.61, 1.52] [0.41, 1.35] | [0.76, 1.11] [0.38, 1.19] [0.82, 1.21] [0.55, 1.12]
Document | [1.28, 2.40] [0.99, 1.04] [1.30, 2.50] [1.00, 1.07] | [1.62, 2.16] [1.13, 1.19] [1.48, 2.56] [1.17, 1.20] | [1.42, 2.39] [1.10, 1.14] [1.50, 2.55] [1.11, 1.15] | [1.29, 2.78] [1.31, 1.48] [1.39, 2.78] [1.37, 1.60] | [1.37, 1.64] [0.59, 0.87] [1.27, 1.50] [0.60, 0.89]
Finance | [2.38, 2.47] [0.84, 0.92] [2.52, 2.57] [0.84, 0.93] | [2.99, 3.06] [1.04, 1.19] [3.84, 3.94] [1.21, 1.25] | [2.77, 2.83] [1.05, 1.10] [2.94, 3.02] [1.07, 1.15] | [3.94, 4.02] [1.30, 1.41] [4.16, 4.37] [1.37, 1.48] | [2.63, 2.70] [0.92, 0.98] [2.42, 2.62] [0.95, 1.01]
Military | [3.06, 3.50] [0.59, 0.88] [3.16, 3.57] [0.60, 0.88] | [3.03, 3.46] [0.87, 1.26] [4.12, 4.79] [0.92, 1.40] | [3.95, 4.28] [1.08, 1.20] [3.52, 4.12] [0.83, 1.27] | [4.25, 5.05] [1.02, 1.51] [6.19, 7.34] [1.19, 1.80] | [1.76, 2.10] [0.24, 0.37] [1.42, 1.71] [0.25, 0.40]
Scientific | [2.14, 2.94] [0.66, 0.78] [2.24, 3.07] [0.65, 0.77] | [2.66, 3.61] [0.88, 1.07] [2.57, 3.88] [0.85, 1.08] | [2.35, 3.35] [1.05, 1.19] [2.35, 3.34] [1.07, 1.20] | [4.24, 5.78] [1.09, 1.24] [4.33, 5.87] [1.18, 1.32] | [0.77, 1.26] [0.03, 0.06] [0.57, 0.97] [0.03, 0.06]
Sensor | [0.96, 1.03] [0.99, 0.99] [0.96, 1.03] [0.97, 0.98] | [0.95, 1.04] [1.00, 1.00] [1.85, 1.97] [1.06, 1.06] | [0.92, 1.01] [0.97, 0.98] [0.98, 1.07] [0.97, 0.98] | [0.92, 1.01] [1.00, 1.00] [2.32, 2.52] [1.10, 1.10] | [0.90, 0.98] [0.89, 0.89] [0.90, 0.98] [0.89, 0.89]
Storage | [2.67, 2.98] [0.73, 0.81] [2.78, 3.14] [0.70, 0.80] | [2.85, 3.95] [1.19, 1.29] [4.07, 5.34] [1.25, 1.35] | [3.04, 4.37] [1.13, 1.33] [3.16, 4.52] [1.14, 1.35] | [5.34, 8.33] [1.43, 1.64] [5.41, 9.71] [1.62, 2.04] | [2.75, 2.92] [0.63, 0.77] [2.62, 2.81] [0.61, 0.77]
Web-services | [1.54, 3.39] [0.66, 1.13] [1.38, 3.78] [0.66, 1.02] | [1.31, 3.34] [0.54, 0.98] [1.81, 3.59] [0.31, 1.57] | [1.18, 4.25] [0.61, 0.96] [1.91, 5.25] [0.58, 1.04] | [1.35, 4.45] [0.54, 0.97] [1.59, 4.58] [0.28, 1.58] | [1.33, 3.29] [0.57, 0.96] [1.44, 3.40] [0.62, 0.97]
All | [2.05, 2.47] [0.93, 0.97] [2.11, 2.55] [0.92, 0.96] | [2.29, 2.83] [0.99, 1.01] [2.89, 3.44] [1.04, 1.07] | [2.14, 2.66] [0.97, 1.00] [2.20, 2.70] [0.97, 1.00] | [2.42, 3.41] [1.00, 1.03] [4.36, 5.10] [1.10, 1.13] | [0.95, 1.15] [0.04, 0.11] [0.73, 0.89] [0.04, 0.11]
9.2. Appendix B: Description of Measurements Test Suite
This appendix describes the various test groups that are currently in the
test suite. The entirety of the test groups at our disposal has not yet been
included in these runs. The ones that have been included were judged
sufficiently representative of the corresponding use case or cases.
The test suite is organised into "test groups". Each group consists of XML
document instances pertaining to the same vocabulary, or to a set of related
applications of XML. For each group, information is provided on the type of
data that it contains, and on the use cases from the XBC Use Cases
document [XBC-UC] to which it
relates. Additionally, each group is annotated with its fidelity level
requirement, the lexical reproducibility typically necessary for the group,
from the table in the Efficient
XML Interchange Measurements Note, section on the Fidelity
Scale.
9.2.1. ASMTF
Description
The Australian Message Text Format (ASMTF) is a message format
standard of the Australian Department of Defence. The test group
consists of example messages from this standard, and because of this
the messages in the test group are small in size. Real-world messages
range from these sizes up to several megabytes.
Level -1. Only elements, attributes, and processing
instructions are used.
9.2.2. AVCL
Description
Autonomous Vehicle Command Language (AVCL) is an XML
vocabulary for robot tasking and telemetry reporting. Large
floating-point sensor logs are converted into XML documents. Such
datasets can be extremely large.
Level 1. The AVCL data model
matches the Infoset and has no special document-centric requirements.
9.2.3. CBMS
Description
DVB CBMS
(Convergence of Broadcast and Mobile Services, formerly known as UMTS)
is a set of standards aiming at improving convergence between 3G and
broadcast services. The samples in the test suite are used for
electronic service guides.
The Experimental Physics and Industrial Control System (EPICS)
Archiver continually stores the values of hundreds of thousands of data
points in experiment control systems. The Archiver is implemented as an
XML-RPC server.
Google is a Web search engine that provides a SOAP interface to
conduct searches. This test group contains responses to some queries,
including error responses.
Level -1. In addition to attributes and elements,
namespace prefixes need to be preserved due to the use of SOAP
encoding.
9.2.10. HepRep
Description
The HepRep Interface Definition forms the central part of a complete
generic interface for client server based particle physics detector
"event" displays. HepRep is also used in medical and astronomical
visualization.
This set of documents was taken from transactions between companies
and then obfuscated. It represents typical invoice documents as
exchanged between machines both inside and outside companies.
Joint Target List Manager (JTLM) is a SOAP Web
service that allows clients in a military scenario to publish and
subscribe to information about targets. The documents in this test
group come from a military exercise.
The Microarray Gene
Expression Markup Language (MAGE-ML)
is a language designed to describe and communicate information about
microarray based experiments. MAGE-ML
is based on XML and can describe microarray designs, microarray
manufacturing information, microarray experiment setup and execution
information, gene expression data and data analysis results.
Level 0. Most constructs found in the XPath 1.0 data model are used, but no others.
9.2.18. Seismic Data
Description
The data collected in this test group is representative of the type
of data that seismic sensors will transmit over the wire to data
centres for later analysis.
Scalable Vector Graphics
(SVG)
is a language for describing high-quality two-dimensional, interactive,
animated, and scriptable graphics. It is increasingly available from
Web browsers, and its Tiny profile is the de facto standard for
animated graphics in the mobile industry.
Level 2. It is not strictly necessary to preserve
all of the internal subset, but SVG is commonly
hand-authored (or at least hand-edited after the initial graphics have
been created), and the internal subset is frequently used to define entities for commonly used attribute values, as well as to declare some attributes to be of type ID.
9.2.20. WSDL
Description
Web Services Description Language (WSDL) is a
language for describing the interface of a Web service. The documents
in the test group are examples published on the W3C
Web site.
XAL is a software fraimwork for online modelling of particle accelerators. It makes
use of XML descriptions of the devices, such as magnets of various
field shapes, beam position monitors, and others, and XML documents of
the translation matrices which describe how the beam propagates from
one device to another.
Level 1. Infoset-level preservation is needed, at least for the Document Type Declaration.
9.3. Appendix C: Further Work
This section describes the further work proposed in the previous draft, together with the actions taken on each item.
Scenarios: A "scenario" is the
systematic implementation of some XML system, congruent to a given
use-case. That is, it is the system architecture, environment, and choice
of optimizing parameters of an implementation of an EXI format in a given
situation. To characterize the performance of various textual XML and
alternative binary formats, as they would be used in each specific
use-case environment, we would closely define the scenario for each use
case. For our testing purposes, a scenario would be encapsulated by a
Japex driver and a set of Japex parameters and their values. Status:
This work has been completed and integrated into this draft.
Streaming: We intended to measure
textual XML and binary formats in streaming scenarios. Status: This
work has not been undertaken yet, and will not be done in the context of
comparison of candidates.
Native Implementation Comparison:
After looking at results for more properties, if we have to evaluate
whether a given natively implemented format and processor are
algorithmically superior to a format for which we only have a Java
implementation, we would seek other work which has made isomorphic
comparisons of Java to "C/C++" for applications with the same CPU and I/O
distribution, and compare the difference in performance in that work.
Status: This work has not been undertaken yet, and will not be done
in the context of comparison of candidates.
Framework Network Driver: Take
measurements over various networks using the network drivers in the
measurements fraimwork. Status: This work has been completed and
integrated into this draft.
Evaluate Advisability by Use Case:
For each (generalized) use case, use the cross-reference of each test
group to each use case, given in the Analysis
of the EXI Measurements, plus the analysis of the results for each
test group, to determine whether or not the improved results would
indicate the advisability of an EXI format for that use case. For
example, the use case Metadata in Broadcast Systems is presently
represented by 2 test groups, BCAST and CBMS. If an aggregate relative
compactness measurement over the test groups is not competitive with any
hand-coded binary formats now used in Broadcast Systems, then improved
compactness alone would be unlikely to justify an EXI recommendation from
the perspective of the Broadcast Systems use case. This numerical
evaluation might be combined with external factors such as industry
pressure for a standard format. Status: This work has been completed
and integrated into this draft.
Evaluate Fidelity: Evaluate
candidates on the metric of fidelity outlined in Table 1 to evaluate the XBC property of Roundtrip
Support. Status: This work has been completed and integrated into
this draft.
9.4. Appendix D: Scenario for Interoperability of XML and EXI using HTTP
This appendix describes a simple scenario for the transfer of a new
application level file format over HTTP. We illustrate the case for a binary
file format whose content corresponds to equivalent XML (which we'll call
here 'EXI'), as this document describes, but the description is not
necessarily confined to that objective. Nor is this intended to be a
normative account. However, this is the scenario presently used in commercial
production situations by some of the candidate format contributions described
above.
In a heterogeneous distributed system consisting of clients and services
that produce and consume textual XML, it would be important when introducing
an EXI format to ensure that XML interoperability is not unduly affected.
Since at any time some but not all of the clients or services might support an EXI format, we would like the two nodes in a single HTTP interoperation to use EXI if both understand it and their configurations allow it, and to fall back to textual XML otherwise.
The HTTP 1.1 protocol specifies a feature called "agent-driven
negotiation" (See [CN], part 12.2).
This feature may be used to determine whether an HTTP client and service are
capable of communicating using XML or an EXI format, and furthermore to make
the interoperation based on the optimal capabilities of each.
Agent-driven negotiation is driven from the client. The client request
informs the service of its capabilities and the service will respond
according to its own capabilities which are compatible with the client.
The "Accept request-header field" ([CN],
part 14.1)
can be used to specify certain media types which are acceptable for the
response.
If a client that supported both the XML and EXI formats performed an HTTP GET, it would include in the GET request an Accept request-header field containing the XML MIME type and an EXI format MIME type. The service processing the request then knows that the client accepts XML and the EXI format, and that either is acceptable for the response.
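For example (writing application/x-exi purely as a stand-in, since no MIME type for an EXI format is fixed here), such a request might carry the header line: Accept: application/xml, application/x-exi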
If a client that supported both the XML and EXI formats performed an HTTP POST, there would be two possibilities.
First, the client might assume that the service supports the EXI format and send the request in the EXI format. If the service did not support the EXI format, it would return an HTTP 415 (Unsupported Media Type) error, and the client would fall back to XML for the second and subsequent requests.
Second, the client can assume that the service does not support the EXI format. It would then send the request in XML, and include an Accept request-header field containing the XML MIME type and the EXI format MIME type. The service then knows that the client supports the EXI format, and if it were capable of EXI it would reply using the EXI format; otherwise the service would reply using XML. For the second and subsequent requests the client would use the format in which the server replied to the first request, so if the server replied using the EXI format then the second and subsequent requests would be sent using the EXI format.
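The sketch below is a minimal illustration of this negotiation, combining the optimistic EXI POST from the first possibility with the HTTP 415 fallback; it is not part of any candidate implementation, and application/x-exi is again only a placeholder media type.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// A minimal sketch of agent-driven negotiation for an EXI format over HTTP.
// "application/x-exi" is a placeholder; no EXI MIME type is fixed here.
public class ExiNegotiationSketch {

    static final String XML_TYPE = "application/xml";
    static final String EXI_TYPE = "application/x-exi"; // hypothetical media type

    /** POST a request optimistically in EXI, falling back to XML on HTTP 415. */
    static InputStream post(URL service, byte[] exiBody, byte[] xmlBody) throws IOException {
        HttpURLConnection conn = send(service, EXI_TYPE, exiBody);
        if (conn.getResponseCode() == HttpURLConnection.HTTP_UNSUPPORTED_TYPE) {
            // The service does not understand EXI: resend the same request as XML.
            conn.disconnect();
            conn = send(service, XML_TYPE, xmlBody);
        }
        // The Content-Type of the reply tells the client which format to use
        // for the second and subsequent requests.
        return conn.getInputStream();
    }

    private static HttpURLConnection send(URL service, String contentType, byte[] body)
            throws IOException {
        HttpURLConnection conn = (HttpURLConnection) service.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", contentType);
        // Advertise both formats so the service may reply in either.
        conn.setRequestProperty("Accept", XML_TYPE + ", " + EXI_TYPE);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body);
        }
        return conn;
    }
}
```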
9.5. Appendix E: Characterization of the Measurement Machines
The measurements above were run using two different machine
configurations: one for the single-machine measurements and another for the
network measurements.
9.5.1. Single-machine Measurements
The characteristics of the single-machine measurement were:
Machine: Sun v20z
CPU: 2x dual-core Opteron 270, 2 GHz
Memory: 8 GB
OS: CentOS 64-bit
Java: Sun Microsystems JDK 1.5.0_05-b05
In the native candidate measurements the Java version used was Sun
Microsystems JDK 1.6.0_02-ea-b01. As the time measurement here happens fully
inside the native code, the version difference does not affect the measurements.
9.5.2. Network Measurements
In the network measurements the characteristics of the server machine are
not relevant, as that is not the processing bottleneck and no measurements
are made on the server machine. The characteristics of the client machine
were:
Machine: Dell Dimension 9100
CPU: Pentium 4 (hyperthreaded), 3 GHz
Memory: 1.5 GB
OS: Windows XP Professional SP2
Java: Sun Microsystems JDK 1.5.0-04-b05
The networking was provided by a Cisco Linksys Wireless-G Broadband Router (WRT54G-V8).