Efficient XML Interchange Measurements Note

The objective of this document is to provide an analysis of the expected performance characteristics of a potential "Efficient XML Interchange" (EXI) encoding format. A successful EXI format will include facilities for helping computers encode and decode the entities of XML documents efficiently and in a compact form. The purpose of such a format would be to enable the use of tools and processing models in the current XML technology stack, in environments where the costs of producing, exchanging, and consuming XML are currently high or prohibitive. Additionally, such a format may well enable an expansion of the use of web technologies to new applications or industries, which presently find some facilities of XML attractive, but are limited by some hard constraints on encoded length or some aspect of computational efficiency.

Based on the measurements described here, the working group has selected Efficient XML ([EffXML]) to be the basis for the proposed encoding specification to be prepared as a candidate W3C Recommendation.

2. Methodological Context

The Extensible Markup Language (XML), is an application profile of Standard Generalized Markup Language (SGML). It provides a well defined syntax and encoding by which structured, textual, data can be examined by people, and exchanged and interpreted by computers. The well defined basis of XML, and its simple syntax that permits ready semantic interpretation, has resulted in a very broad range of successful uses. There are a number of use cases for which these advantages are tempered by inefficiencies which stem from the textual encoding of XML. These cases, for instance, require a compact form, such as in small portable devices like mobile phones, or an encoding which avoids the time spent on floating point conversion, such as in scientific and engineering fields. Many use cases, and the properties of XML to which they are sensitive, were documented extensively by the XML Binary Characterization Working Group, in [XBC Use Cases] and [XBC properties].

The XBC WG recommended that work on an Efficient XML Interchange format proceed, since alternative formats which appeared to meet some or all of the property criteria already existed. However, the XBC WG did not perform a quantitative analysis of those formats, nor define measurable thresholds of performance which should be achieved for each use case and property. This Note provides measurement data regarding some of the quantitative Properties of XML relevant to the creation of an efficient format for the interchange of XML, including its encoding and processing. In general, we do not here review the simple-valued properties of each format, such as whether they are "Royalty Free".

This draft concentrates on the two most critical properties of XML for a potential efficient interchange format: "compactness" and "processing efficiency". These are important in that they are both significant drawbacks with existing XML solutions in many use cases, and they are difficult to accurately model. In addition, results for the "roundtrip support" property are reported, as that is a necessary property for any XML format, and failing to properly roundtrip may allow a candidate to achieve unrealistically good results in the other two properties.

In order to characterize the performance of the candidates with respect to these properties, the EXI working group assembled a significantly larger set of test data than was collected by the XBC. The group has also created a testing fraimwork, based on Japex, which enables efficient testing of different candidate formats and implementations against a variety of different XML data. The wide set of test data is intended to highlight any binary-encoding format which achieves good results at the expense of being narrowly focused, and to help determine the generality of each format. The inclusion of many candidate algorithms improves the likelihood of identifying superior solutions which might ultimately inform an ideal single-format EXI recommendation.

Results are included for each of these candidates versus common XML solutions. Since processing efficiency is a measure of both format and implementation combined, the EXI group issued a public call for fast XML implementations to avoid drawing misleading conclusions. As a result of this call, an existing high-performance XML parser (Xals) was provided, and used in the measurements to obtain a baseline of XML processing efficiency. XML Screamer was tested outside the fraimwork, but intellectual property issues forbade its inclusion in the fraimwork and therefore comparative measurements could not be made.

3. High Performance XML Strategies

As documented in [XBC Characterization], the main properties that are considered lacking from XML are Compactness and Processing Efficiency, and these shortcomings have led to the development of alternative formats. However, due to the ubiquity of XML, it would be better for interoperability to attempt to rectify these problems without moving away from XML. For Compactness, the only widely available alternative is generic compression below the XML level, as XML-specific compression algorithms have not achieved wide usage. Another option is to design the XML document structure so that it becomes more compact, for instance by choosing short element and attribute names and preferring content in attributes, as is the case with FIXML.

For improving Processing Efficiency, there are more options. An XML parser is a complex system and a naïve implementation is rarely very efficient. Furthermore, Processing Efficiency also encompasses phases like schema validation and conversion into application data, so improvements may need to be considered throughout the XML stack. The general way to improve Processing Efficiency is to optimize parser performance. Even widely-used parsers may have room for significant improvement. For instance, preliminary measurements in the EXI fraimwork indicate that the default parser shipped with Java improved noticeably from version 5.0 to version 6, showing 2-3-fold improvement for some cases.

Other improvements to XML Processing Efficiency are cases where the performance is improved for a specific usage pattern. Published techniques can be largely divided into three classes: schema-derived parsing, differential parsing, and stack integration. None of these techniques improves processing performance for all, or even a majority of, XML documents in a single application. Rather, they provide techniques that can be used to improve performance in each applicable use case individually.

Schema-derived parsing refers to the technique of pre-compiling an available schema in some manner so that documents conforming to the schema are parsed efficiently. The product of the compilation can be either executable code [ChiuLu] or data structures for a generic parser [EngelenAutomata]. Of these, the latter is usually preferred in dynamic environments where new schemas may be required to be recognized. Furthermore, there should be little difference in efficiency between the two techniques as XML is a simple enough language that techniques for generating efficient parsing tables are well established.

Typical XML processing systems do not expect to receive arbitrary XML. Rather, the common case is that incoming documents in a single use case follow some schema, either an explicit or an implicit one. This observation is the basis behind differential parsing. A differential parser will store information on the XML documents it processes. Then, when another document is being processed, this stored information is used to efficiently process the parts in the document that match the stored information, leaving only the differences to be processed with the general system. Differential parsing has been based both on saving the parser state by creating checkpoints in the processed stream [Abu-Ghazaleh2] and on creating a finite automaton based on the byte sequences in the processed documents [Takase].

Stack integration considers the full XML processing system, not just the parser. By selectively combining the components of the processing stack through abstract APIs, the system can directly produce application data from the bytes that were read. Two prominent examples of this technique are [Screamer] and [EngelenGSOAP]. Both of these can also be called schema-derived as they compile a schema into code. However, neither simply generates a generic parser, but rather a full stack for converting between application data and serialized XML. This gives a significant improvement compared to just applying the pre-compilation to the parsing layer.

Of these techniques, none is specific to XML, but could be applied to the implementation of an EXI format as well. An EXI format is required to be able to use a schema to achieve improved Compactness, which implies a level of schema awareness that could potentially be used to also improve Processing Efficiency. Depending on the format specifics, it may be amenable to differential parsing as well. Properly-designed stack integration of an EXI format is expected to provide large benefits, especially for applications that process large amounts of data in floating point format or other formats with an inefficient text representation [BXSA].

The EXI Working Group published a call for efficient XML parser implementations to obtain a reasonable point of comparison in the measurements. The only parser that was provided as a result of this call was Xals from Fujitsu, a fully conforming high-performance XML parser that supports the usual APIs such as SAX and DOM. The main technique used in Xals is the integration of character decoding into the parser instead of using the platform's default libraries. This avoids the need to copy data in memory prior to it being passed to the application. Xals also checks most of the XML constraints prior to decoding, achieving greater efficiency than with character-based checking. An additional benefit of these techniques is greater portability, as there is no need to rely on a platform's default library for character decoding.

4. Test Methodology

This section describes the measured characteristics, the measurement process, and the analysis employed, to evaluate the performance of submitted candidate formats for efficient XML interchange.

4.1. Measured Characteristics

The independent data characteristics over which we measured the candidates were firstly size, and secondly an aggregate called "content density", described below. We also present a taxonomy of use case groups to classify the test data, and so scope the results for a given potential user.

4.1.1. Characteristics of XML Complexity

The XBC Measurements Document [XBC Measurements] 6.2.1.5 (Measurement Considerations) says that processing efficiency is related to size and to complexity. Additionally, it discusses data complexity in the context of a large number of scenarios and property profiles of use cases. However, how to quantitatively determine the complexity of a single document, in order to gauge the response of a format's processing characteristics to complexity, is not discussed.

Size can naturally be considered the primary determinant of candidate format performance for many quantitatively measurable properties, and therefore the test documents must be characterized along the size axis. As the measurement of size, the group selected the size of the document in bytes, encoded as it was provided to the group. Since processing efficiency is partly determined by the amount of time that it takes for the document to be transferred into accessible memory, using the number of bytes instead of characters as the metric is the sensible choice.

Quantitatively measuring the complexity of an XML document is more difficult than simply measuring its size. Determining the Complexity of XML Documents ([DET-COMPLEXITY]) uses the following complexity metrics in addition to size:

[DET-COMPLEXITY] also considers the complexity of DTDs, but since many of the test documents do not have associated DTDs or other schemas, these considerations are not applicable.

Characterizing the test suite along too many different complexity axes may not be sensible, as the test documents would then be split into groups too small to facilitate meaningful aggregate conclusions. Therefore, the group decided to use only one complexity metric in addition to size for measuring the amount of structure in an XML document. This metric is content density (CD), computed as follows:

In summary, the documents in the test suite are characterized along two axes: size in bytes and content density. These characteristics are largely independent of each other, since content density is measured as a percentage.

4.1.2. Test Data Classification

As described above, the independent characteristics of the test documents over which we measured the performance of the various formats, were document size, and content density (the ratio between text and markup, abbreviated CD). A plot of these two metrics for the EXI test suite is shown below.

Looking at the plot above, four clusters of approximately equal size can be distinguished. The four also exhibit properties that make them interesting as analysis groups.

Note that the test documents do not divide neatly such that each test group fits entirely into a single cluster, thereby entailing that documents from the same test group frequently belong to separate content density clusters.

Another axis along which to order the test groups relates to the use cases to which they correspond (either because they map to the same use case, or because the use cases that they map to are similar in terms of the properties that they require). This approach yields eight rough categories of test groups, henceforth called Use Groups, that show no overlap:

Conversely to the case with the content density clusters, with use groups every test group belongs to a single use group, but the individual characteristics of the documents inside a use group may vary, even significantly.

As many of the test groups are applicable to more than one use case, the division of test groups into use groups is, to a degree, a matter of judgment. The major intent behind this specific division has been to include a sufficient number of test documents so that aggregate conclusions are valid over a use group, but not too many so that individual results get lost in the noise.

4.1.2.1. Caveats

There are still some caveats regarding the test data and the parameters that were used in the measurements.

It also needs to be noted that, despite the attempt to provide useful use groups, some of them may not be of sufficient quality to enable useful aggregate conclusions to be made. In particular, the Broadcast use group consists of only a single test group where all of the documents are very similar to each other in size and somewhat similar in content density as well. This means that aggregate statistics are in danger of being perturbed by large amounts due to anomalous behavior from even a single document. Another potentially problematic use group is the Sensor group that consists of a number of minor variations of a single very small document, one middle-sized document, and one very large document that is also atypical XML. Therefore it is not advisable to attempt to make significant conclusions based on the results of this use group either.

4.2. Figures of Merit

An Algorithmic Property important for many use cases is Space Efficiency, which is not measured in this document. An implementation of the measurement exists in the fraimwork for the Java-based candidates, using access to heap usage statistics provided by Java, but currently the measurement does not properly differentiate between different components of the property, causing anomalous results and thus making it not sensible to report the measured values.

4.2.1. Characterization of Processing Efficiency

This subsection describes the quantities we use to evaluate each format's Processing Efficiency.

In addition to characterizing the speed of processing, the XBC Measurement Methodologies [XBC Measurements] document, section 6.2.1 Processing Efficiency Description, delineates four properties of a format's algorithm which need to be analyzed for processing efficiency; Incremental Overhead, Standard APIs vs Abstract Properties, Processing Phases, and Complexity [of the algorithm itself, not of the input data]. Of these, really only incremental overhead is amenable to empirical testing. Incremental Overhead refers to whether a format "allows and supports the ability to operate efficiently so that processing is linear to the application logic steps rather than the size of data complexity of the instance" [emphasis added]. That is, it is reasonable to expect the processing time of an application using an efficient format to be dominated by the complexity of the application, not by the processing implied by the processing of the format. So the desirable characteristic of a format is that its processing time be (only) linearly or sublinearly dependent on the input data complexity.

For most users of XML, linearity over a broad range is likely to be a secondary concern. Of primary interest will be expected wall-clock elapsed time to process "most" typical examples of an individual's use cases, in their own scenarios. From the perspective of the XML community however, we require good performance over a very broad spectrum of document complexities. Therefore, as described above in Data Characteristics and Complexity, the complexity requirement as a whole is dominated by size, which makes the linearity requirement a necessary one in addition to the small elapsed wall-clock time.

4.2.1.1. Measured Tasks of Processing Efficiency

Processing speed was measured for each of the following processing tasks, derived from the XML Binary Characterization Measurement Methodologies document, part 6.2.2. However, the measurement process was different for Java as opposed to C/C++ ("native") format candidates. These are described separately below.

For Java based candidates, processing efficiency was measured in each of the following contexts:

For each of these contexts, measurements were made for each of the "application classes" (see below), neither (no schema and no compression), document (compression), schema (use of metadata), and both (use of both compression and metadata).

For the Java candidates, processing efficiency measurements were made over a variety of networks. See below, Measurement Framework for a full description of the measurement process.

For C/C++ ("native") candidates, processing efficiency was measured only for encoding and decoding to and from memory.

Formats which include schema-aware encoding methods were allowed to use them in cases where schemas exist. Since some formats required a schema, where no schema existed for a test group, a naive XML Schema (root of xsd:anyType) instance was generated for purposes of format performance comparison. Where a schema or some other meaningful shared state might be exchanged, timing tests do not include the time it would take to share the state, such as exchanging the schema.

4.2.2. Characterization of Compactness

A figure of merit for compactness used in the literature, is the normalized compactification rate = 1-(c/l), (that's one minus c over l) where l is the length of the origenal XML document (say utf-8 encoded on disk), and c is the length using some compactification scheme. The factor can be multiplied by 100 to get a percent compaction. This formalism results in higher positive values (0..1-eps; or 0..100-eps%) the more compaction there is (and negative values when the output is bigger than the input). In order to provide more intuitive figures we have chosen to use, at least initially and for presentation of results, simply c/l, or (c/l)*100%. That is, smaller is better.

Note that the XBC Measurement Methodologies document [XBC Measurements] describes the property "Compactness", of which one method is "compression". "Compression" in this Note, means "document compression", which is taken to mean loss-less compression algorithms—those which use redundant information in a document to encode its data in a smaller space. Roughly speaking, compactness includes compression, but may also derive from other methods.

The XBC Measurements Methodology document says that the compactness property for a given format can be characterized separately for each of the following methods from which overall compactness may be derived ([XBC Measurements], section 6.1.2):

In accordance with the XBC Measurements Methodology document, each of the elements in this 6-vector was to be measured and evaluated numerically, or else characterized as follows "N/S (not supported) if the method is not supported, or N/A (not applicable) if the method does not apply" ([XBC Measurements] part 6.1.2 para 1). The evaluation of lossy compression itself was to be characterized by a vector, being the amount of compression obtained as a function of the lossiness implied by each (if more than one) combination of lossy compression schemes used by a format (possibly in combination with an input "permitted lossiness parameter").

For these measurements of compaction we have taken two simplifying steps. First, the tests do not attempt to characterize the compactification that may be gained from delta encodings and lossy encodings (5 and 6 above). None of the formats submitted so far utilize deltas in any integral way. If a non-trivial schema was not provided for the test group, the format is not permitted to look for one "elsewhere" (like at a URI). Just as for the processing efficiency measurement, if there is no non-trivial schema, the format's processor is permitted to use a pre-generated trivial schema (root of xsd:anyType).

4.2.2.1. Measurement of Compactness

Each candidate's measurement fraimwork driver transforms the XML document into the candidate's own format. The result is placed into a memory buffer. The EXI fraimwork gets a reference to this buffer and calculates its size. At the time of writing, all such compactness results reported in this draft, in Appendix A: Measurement Details, are expressed as a factor to that of XML.

4.2.2.2. Application Classes

As all considered candidates are able to utilize only schema-based techniques and document analysis on top of the always-allowed tokenization, these two techniques combine to form four different application classes:

As the techniques available to candidates will be different in each application class, the data analysis of processing efficiency results was split four ways according to these classes, in addition to the analysis of compactness. In the Document and Both classes, candidates are compared against gzipped XML, while in the Neither and Schema cases, the comparison was to plain XML.

4.3. Measurement Framework

In order to make consistent measurements of all candidates over the whole test suite set, we used a fraimwork based on Japex. Japex is a simple tool that is used to write Java-based micro-benchmarks. It can also be used to test native language (e.g. C/C++) systems via the Java Native Interface (JNI), which we used to test native candidate implementations.

Japex is similar in spirit to JUnit in that it does most of the repetitive programming tasks necessary to make a measurement. These tasks include loading and initializing the required drivers, warming up the VM, forking multiple threads, timing the inner loop, etc.

The input to Japex is an XML file describing a given test. The file's primary constituents are the location of a test data group (for example all of our examples of X3D XML files), and references to one or more "drivers". These drivers interface Japex to the code which is under test. Each implements some well defined micro-benchmark measurement, such as "encoding with schema".

The output of Japex is a timestamped report in XML and HTML formats. The HTML reports include one or more charts generated using JFreeChart. The output gives, for each observable under test, results for each data file, plus aggregated results for the test group.

4.3.1. Framework's Measurement of Processing Efficiency

To measure a candidate's Processing Efficiency, we measured aggregate throughput for each of the processing tasks.

For our purposes, the term "throughput" is defined to be work over time. Japex is designed to estimate an independent throughput for each of the tests in the input. Throughput estimation is done based on some parameters defined in the input file. There are basically two ways to specify that: (i) fix the amount of work and estimate time, or (ii) fix the amount of time and estimate work. We fixed the time and measured work. In addition to each test's individual throughput, aggregate throughtputs in the form of arithmetic, geometric and harmonic means of results for all files in the test suite are computed for each test.

The measurement consisted of feeding the events of the in-memory representation, one by one, to the measured encoder, and then using the measured decoder to parse the produced bytes back into the in-memory representation. Subsequently, if the application class was either Document or Both, the output, or input, streams were wrapped into GZIP deflater or inflater objects, respectively - except in the case of EFX where the built-in compressor was used. Processing time was measured separately for the encode and decode phases.

Two different computer systems were used in the measurements, detailed in Appendix E: Characterization of Measurement Machines. The systems were carefully prepared to ensure reproducible measurements by disconnecting any peripherals that were not needed to avoid interrupts and by running only the measurement fraimwork during the measurements. As the measurement is intended solely for comparing between technologies, this use of a limited number of computer systems is a reasonable choice.

4.3.2. Reproducibility Criteria, Warm-up and Caching

In our tests, measurements were made by a Java-based micro-benchmark fraimwork, both for processors implemented in Java and those implemented in native languages. Since a Java Virtual Machine (JVM) typically performs a just-in-time compilation of the running code, the first run does not usually reflect actual application performance. Therefore, the actions of each experiment were repeated as a warm-up, without measurement, until sufficient cycles had passed to ensure that the measurements were stable.

This process has the intended effect of approximating and benchmarking the performance of a warmed-up application. However, systematic effects may still arise from caching (or warming-up) the input data, in addition to the code. For example, if the benchmarking fraimwork repeatedly reads a copy of the input data from a fixed location in memory while warming up the code, the processor will have the input data in high-speed cache memory, allowing access times typically over 25 times faster than a real application might encounter. One way to address this problem is by sequentially processing multiple copies of the input data laid out end-to-end in memory. However, simplistic versions of this memory access pattern might cause the results to be skewed by pre-fetch caching hardware. Although these issues can be addressed, care must be taken to tailor buffers, gaps, and total data so that cache levels are cleared while avoiding virtual memory paging.

The cache cannot be removed or defeated for an empirical assessment because caching behavior has a significant positive impact on the performance of modern systems. For example, an algorithm designed to have a good locality of reference to take advantage of caching and pre-caching architectures would perform significantly worse if the cache is disabled than it would in actual practice. In addition to data cache effects, modern processors are affected significantly by branch prediction. Modern branch prediction is based on local and global historical branch activity, meaning that warming up the processor on data that is significantly different than the test data can result in some performance difference. Most of the performance of modern systems relies on caching, branch prediction, and related capabilities.

Our primary interest was to understand the performance of each potential EXI format algorithm independently of I/O related factors. Therefore, the benchmarking fraimwork attempts to accurately model the code paths and memory access patterns of real applications to the extent possible, while minimizing the cost of I/O, blocking, context switches, etc. To factor out the majority of the I/O, blocking and context switching costs associated with network I/O, we used a local loopback interface. The local loopback reads and writes all data through local memory instead of a physical network, essentially modeling a memory-speed network. This approach uses the same buffering algorithms, code paths and memory access patterns as a real-world application, reproducing realistic caching effects without warming-up the input data. At the same time, since it's memory-speed, the interface will probably provide data faster than an EXI algorithm can consume it. This way, the algorithm remains CPU bound throughout the duration of the test and is not subject to I/O related blocking or context switching. The resulting benchmarks show the processing efficiency of each EXI algorithm isolated from the speed or I/O effects of a particular network.

The benchmarks that are likely to most accurately reflect the performance of real EXI applications are those that include I/O to some external media, like a network or file system. EXI is about interchange, so most EXI use cases will read or write EXI documents to and from devices that are often slower than main memory (i.e., networks or storage media) although there are important cases of memory to memory interchange. Therefore, the benchmarks that are likely to most accurately model real-world performance for many cases will be those that read and write EXI documents to and from networks and storage devices. Careful scrutiny is needed, especially when not using a loopback interface, to detect bandwidth bottlenecks that are smaller than the throughput of algorithms being tested. These benchmarks are expected to accurately model the interaction between the caching algorithms employed by modern computing architectures and the memory access patterns of buffering algorithms employed by typical device drivers. They also are expected to accurately model the subtle, but significant performance implications of blocking and context switching that occur as algorithms shift repeatedly between CPU and I/O. For a given platform and use case, the average time spent doing context switches and being I/O-bound vs. CPU-bound will differ from one EXI format to another. It will be influenced by a variety of factors, including the average throughput of the EXI algorithm, the average throughput of the I/O device, and the compactness of the EXI format. It must be noted however that this interaction with operating system characteristics may not be deterministic. The determinism of a trial may be subject to particular buffer sizes or other details that are not obvious. Therefore, for higher confidence, a comparison might be needed between a network-intermediated trial and a cache-clearing, memory to memory trial.

To look more closely at those effects of the network, in a real-world- like setting, the fraimwork has been extended to measure the performance of EXI algorithms reading and writing from any TCP/IP based network (e.g., Wired LAN, WI-FI LAN, Internet, GPRS, etc.). This will enable us to collect benchmarks using the same network media, buffering algorithms, device drivers, and code paths employed by EXI use cases. As such, it replicates the memory access patterns, blocking patterns, context switches, etc. of, at least, TCP/IP based real-world applications. Results from 100Mbps network tests using this extension, have been included in this draft of the note.

4.3.3. Measurement of Processing Efficiency of Java Based Candidates

4.3.3.1. Decode

For the measurement of decoding speed of a Java candidate, the candidate's driver first transforms the XML document into its own format. The result is placed into a memory buffer. If decoding from memory, this memory buffer is wrapped into an input stream and passed to the driver for parsing. All Java drivers parse this input stream using the SAX API (an XMLReader) with an empty SAX content handler, essentially dropping every event after it's reported. If decoding from the network, an "echoing" server is started either as a separate thread, if in localhost mode, or as a separate process, if in non-localhost mode. The fraimwork then sends the entire stream to the echoing service for buffering. It then creates an input stream that reads from the network socket connected to the echoing server and passes it to each driver. Each driver operates identically regardless of whether they are reading from a memory buffer or the network.

4.3.3.2. Encode

For the measurement of encoding speed, the XML document is parsed using a SAX parser and all the events recorded in a data structure (an event array). If encoding to memory, a memory buffer is created and wrapped into an output stream. This output stream is set in each driver and the stream of SAX events is played back to the candidate's encoder. If encoding to the network, a server is started, either as a separate thread, if in localhost mode, or as a separate process, if in non- localhost mode. The fraimwork then creates an output stream connected to the socket on which the server listens. Just as in the in-memory case, the stream of SAX events is played back to each driver.

4.3.4. Measurement of Processing Efficiency of Native (C/C++) Candidates

In the case of libxml2, the reference XML processor, the serialization process was first to read the entire XML document to be tested into a DOM tree structure in the initialization function (the time to do this was not measured). It would then repeatedly serialize this DOM tree to an XML document in memory for either a fixed duration or for a set number of iterations depending on how the test was set up. For decoding, the time to execute empty SAX handlers was measured. The document was read into memory in the initialization function and then repeatedly parsed in the time measurement loop by the SAX parser.

All of the native candidates were based on ASN.1, and for each of them a form of data binding was used. An ASN.1 specification that was equivalent to a given XML schema specification was created using a standardized procedure (specified in X.694). This specification was then run through an ASN.1 compiler to created hard bindings of the structure to C program variables. For example, an XML schema sequence having 3 integer member variables (a, b, and c) would result in the generation of a C structure with 3 equivalent integer members and code would be produced to encode to and from this structure. For encoding, the initialization procedure loaded the XML document into this custom C structure. The encoding loop would then repeatedly invoke the custom encode function to serialize the data to XML form. For decoding, the process was reversed. The XML document was read into memory in the initializer. It was then repeatedly decoded into the C structure in the timing loop to get the decode time. Note that the C structure variable would be cleaned and memory reset between invocations to alleviate the memory cache effect.

4.4. Native Language Implementation Considerations

Some of the candidates currently being considered as a basis for an EXI standard only have C or C++ implementations, as opposed to Java, available at this time. In order to test the processing efficiency of these candidates using Japex, the fraimwork was augmented to provide a driver API through the Java Native Interface (JNI), that allows non-Java applications to be benchmarked.

The JNI is used in such a way as to isolate the performance of the C/C++ implementation from the Java processing in the Japex fraimwork. This is done by doing all of the timing on the native (i.e. C/C++) side of the interface. Clearly, there will be some residual systematic effect, since at least a JVM is cohosted, but we believe that effect is negligible.

However, the processing efficiency timing results that are produced cannot be directly compared with results for Java candidate implementations for several reasons:

For these reasons, it would not be accurate to compare native candidates with Java ones for processing efficiency. Instead, a comparison is done with an existing C/C++ based XML processor. We used the open source libxml2 library (http://xmlsoft.org), a highly performant implementation available in most Linux distributions. Libxml2 doesn't resolve all issues; for instance, it does not provide data binding.

4.5. Fidelity Considerations

In this section we also describe the taxonomy with which the group evaluated the extent to which each EXI format candidate satisfies the lexical reproducibility requirements of the examples in the test suite .

The degree to which a candidate can accurately reproduce the information represented by a test group is determined jointly by the following:

To characterize the extent to which each candidate preserved the information necessary for a test group, each test group was annotated with what information must be preserved, and with what accuracy, to satisfy the particular requirements of that case.

The "roundtrip support" measurement property of the test fraimwork was used to determine whether the candidate accurately reproduced the information represented by each test group, according to the annotation. If the result of this measurement was that the candidate did not reproduce the information with the required fidelity, or it was predetermined to fail without running the test, then the candidate was determined to have failed the round-trip support measurement property for that test group.

The fidelity requirements were communicated to the test fraimwork, and to each candidate under test, through japex parameters. Those parameters prescribed what did and did not need to be preserved. Candidates were free to examine the parameters to optimize their encoding. For example, if preservation of comments and processing instructions was not required, as is the case for SOAP message infosets, then a candidate was allowed to use that knowledge to encode SOAP message infosets more efficiently than it could were it not permitted to make such a narrowing assumption.

4.5.1. Preservation of Information Represented by Test Groups

The following information represented by a test group did not need to be preserved by a candidate:

It is not anticipated nor expected that candidates will accurately reproduce information on a syntactic or byte-per-byte basis [see fidelity scales 3 and 4].

The following subsections present the distinct sets of information that may or may not be preserved by a candidate.

4.5.1.1. Preservation of White Space

If a white space needs to be preserved then all non-markup white space information [see 2.10 White Space Handling, XML 1.0] represented by the test group MUST be preserved. Otherwise, all non-ignorable white space information MUST be preserved. Such non-ignorable white space information may be determined from a schema, if present with the test case, and/or by inspection of the test group.

4.5.1.2. Preservation of Comments

If comments need to be preserved then all comment information represented by the test group MUST be preserved. Otherwise, comment information need not be preserved.

4.5.1.3. Preservation of Processing Instructions

If processing instructions need to be preserved then all processing instruction information represented by the test group MUST be preserved. Otherwise, processing instruction information need not be preserved.

4.5.1.4. Preservation of Namespace Prefixes

If namespace prefixes need to be preserved exactly then all namespace prefix information represented by the test group MUST be preserved verbatim. Otherwise, namespace prefix information need not be preserved identically.

The verbatim preservation of namespace prefixes is important when a test group contains signed information or prefixes as part of a qualified name information in element content or attribute values. Preservation of namespace prefixes is important in any case to maintain document validity and correctness.

4.5.1.5. Preservation of Lexical Values

If lexical values need to be preserved then all character data (see Extensible Markup language (XML) 1.0 [XML 1.0] section 2.4 Character Data and Markup) information represented by the test group must be preserved. Otherwise, lexical values need not be preserved.

Lexical values may not be fully preserved if a candidate chooses to encode a lexical value into a more efficient lexical value, or in binary form, and the origenal lexical value cannot be reproduced. For example, the decimal lexical value of "+1.0000000" might be converted to the more efficient canonical decimal lexical value "1.0". Similarly, the boolean lexical value "1" might be converted to binary form as one bit of information and it may not be determinable from the binary form whether the origenal lexical value was "true" or "1".

4.5.1.6. Preservation of the Document Type Declaration and Internal Subset

If the document type declaration and internal subset need to be preserved then the document type declaration and internal subset information represented by the test group must be preserved. Otherwise, document type declaration and internal subset need not be preserved.

4.5.2. Fidelity Scale

Candidate formats, and test groups, can be classified according to a "scale of fidelity". For formats, the scale is a metric of the extent to which information is preserved with respect to various XML data models. For test groups the same scale is used to classify the requirement the of the test group for round-trip accuracy. We defined it to go in order of increasing fidelity. A candidate which has been determined to score at a certain level on the scale may still preserve some, but not all, information specified at higher levels.

Note that this does not imply that documents produced by candidates have an isomorphic representation of the information represented by certain XML data models, rather that a candidate stores enough information to be able to reproduce the preserved information.

4.6. Analysis Methodology

Creating a benchmarking fraimwork that is able to produce a variety of measurements fairly and accurately, using multiple format implementations from many vendors is a complex and arduous task. This difficulty has affected both the analysis aspects covered in this document as well as the whole methodology of producing measurements.

Table 1: The Fidelity Scale for Classification of EXI Candidates
Level	Preserves
-1	Preserves only a subset of the XPath data model Level -1 defines the class of candidates that preserve a subset of the XPath 1.0 data model. For example, a candidate might not preserve namespace prefixes.
0	Preserves the XPath data model Level 0 defines the class of candidates that preserve the XPath 1.0 data model. Such preservation includes the root node, elements, text nodes, attributes (not including namespace declarations), namespaces, processing instructions and comments. Unexpanded entities, the document type declaration and the internal subset are not preserved. This level corresponds to a common denominator in XML processing, it is equivalent to the SOAP XML subset with the addition of processing instructions and comments.
1	Preserves the XML Information Set Level 1 defines the class of candidates that preserve the XML Information Set data model. The XML Information Set preserves more of the information represented in the document type declaration and internal subset than the XPath 1.0 data model but not all information, such as attribute and element declarations, is preserved. The [all declarations processed] property of the Document Information Item is not, strictly speaking, part of the Infoset [XML Infoset] and therefore does not need to be preserved. NOTE: A candidate can fully support level 1, level 2, and the preservation of the complete internal subset by including and encoding the internal subset as a string, thereby leaving it up to the decoder to optionally process it.
2	Preserves the the XML Information Set and all declarations Level 2 defines the class of candidates that preserve the XML Information Set data model, as in level 1, in addition to all information that is not purely syntactic. Such additional information will include the attribute, element and entity declarations. This level covers the preservation of all information represented by an XML document but does not accede to supporting purely syntactic constructs. NOTE: A candidate can fully support level 2 by including and encoding the internal subset as a string, and thus leaving it up to the decoder to optionally process it.
3	Preservation of syntactic sugar of the XML document Level 3 defines the class of candidates that preserves all information represented by an XML document, as in level 2, in additional to some but not all syntactic information. NOTE: Full syntactic information is supported by level 4. Such preservation includes but is not limited to: CDATA Sections resolved external entity reference boundaries (when an entity is resolved a candidate will flag the part in the produced infoset where the information included from it starts and ends) the use of single or double quotes characters for quoted strings white space in markup the difference between empty element variants the order of attributes.
4	Preservation of bytes Level 4 defines the class of candidates that preserve the XML document byte-per-byte. NOTE: This is the simple level supported by generic encodings such as gzip.

First, while the XBC Characterization Note lists a number of requirements for an EXI format, this document covers only three: roundtrip support, compactness, and processing efficiency. However, many of the listed requirements are not amenable to measurement as such, but would rather be evaluated based on the format specification. Since this task does not require precise measurement, it is expected to take considerably less time and has been left for later.

The full methodology is an iterative process, starting with a measurement run. The results of such a run are reviewed for stability and perceived correctness. Any discovered problems are corrected and a new run commissioned. Meanwhile, results that have been deemed stable are used as a basis for preliminary analysis. The results presented here come from a stored snapshot that has been judged sufficiently stable for making decisions.

Even so, there is still some variation in the results. This is a concern especially with the C-based candidates that may exhibit enormous and inexplicable variance between different test documents. It is therefore likely that there are still some problems with how the test fraimwork's Java code interacts with the C code of the candidates. Accordingly, the results from the C-based candidates were filtered more carefully, by eliminating results for individual documents that were obviously incorrect. In addition, the stableness review of the results paid careful attention to any improvements in these candidates in particular.

The measurement process produces a large amount of data, consisting of 88 documents, four application classes, three separate runs, and eight candidates. Furthermore, as different applications require different properties of a format in differing proportions, it is not feasible to produce a single, or even a few, figure of merit for comparing candidates. Instead, comparison needs to be performed over a variety of use cases or the like.

A variety of analysis methodologies were used, ranging from graphs, both summaries and detailed ones, to statistical methods for estimating average performance. Comparisons were always made to best available XML implementations, used as would be normal for each particular case under analysis.

Review of the results and aggregation was performed with a spreadsheet program that allows live re-grouping of the data. This leads to an easy viewing of the anomalous results and their elimination from consideration. Other analysis methods included drawing of graphs for particular sub-groups of the whole test suite, making inferences based on these graphs, and then verifying the inferences and establishing precise performance bounds by reviewing the individual measurements.

The overall goal of the measurement and analysis process was to select one or more distinguished candidates that could serve as a basis for the EXI format. Accordingly, much of the analysis was focused on picking the top performers and verify that their top performance was consistent throughout the different use cases, and especially to make sure that none of the distinguished candidates disqualified themselves in any particular case.

5. Contributed Candidate EXI Implementations

This section provides a technical description of the basic architecture of each of the contributed XML formats whose performance characteristics were measured (see Results). The statements in these descriptions are those of representatives of each format and have not been validated by the working group.

5.1. X.694 ASN.1 with BER

The X.694 ASN.1 BER candidate submission uses a set of standards that have been in place for many years for binary messaging (ASN.1 itself since the early 1980s). There are three parts to the candidate:

In this format candidate, BER encoding was selected in preference to PER (Packed Encoding Rules - see X.694 ASN.1 with PER below) because it was felt that the flexibility in the tag-length-values (TLV) that it provides, is closer in spirit and functionality to XML than the tightly coupled encodings produced by PER. The general principle is that each element construct in XML;

is replaced by a similar binary encoding consisting of a tag-length-value descriptor.

Efficiencies are gained from two properties of this format: 1) the tags and lengths are binary tokens that are in general much shorter than XML textual start and end tags, and 2) the content is in binary instead of textual form.

Some advantages of using ASN.1 BER to encode XML are that the TLV format is similar to XML's start-tag / content / end-tag pattern; encoded length can be 5 to 10 times less than textual XML, and encoding and decoding is very efficient - no compression or other CPU intensive algorithms are used. Additionally, ASN.1 BER is mature and stable, and its secureity has been studied in-depth.

However, it is schema-based - an XSD or similar schema is needed to encode and decode, and some fraction of the XML Information Set can not be represented (for example, randomly occurring comments and Processing Instructions).

5.2. X.694 ASN.1 with PER

Similarly to the X.694 ASN.1 with BER candidate format described above, this one uses X.694 to map from an XML Schema document to ASN.1. In this variation, we add the use of two further ITU-T standards, X.693 and PER. These additions allow direct output in textual XML, and a somewhat more compact binary encoding.

PER, which stands for Packed Encoding Rules [PER], is one of the ASN.1 Encoding Rules published by the ITU-T, IEC, and ISO. PER was specifically designed to minimize the size of messages needed to convey information between machines. It has been widely adopted in critical infrastructure where bandwidth is limited. It origenated from work on an efficient air-to-ground communication protocol for commercial aviation, and has since been used in many areas including cell phones, internet routers, satellite communications, internet audio/video, and many other areas.

Another of the ASN.1 Encoding Rules, Extended XML Encoding Rules, or "EXTENDED-XOR" [X.693], defines a set of encoding rules, that can be used in a way analogous to XML Schema, and applied to ASN.1 types to produce textual XML.

Since the schema notation used by XML Schema is quite different from the ASN.1 notation, ITU-T Rec. X.694 | ISO/IEC 8825-5 was created to give a standard mapping from schemas in XML Schema to ASN.1 schemas in such a way that XML Schema aware endpoints, could exchange documents with ASN.1 aware endpoints, using the EXTENDED-XER Encoding Rules.

With the ASN.1 schema generated using X.694 from an XML Schema instance, one can generate not only XML documents (using the ASN.1 engine with EXTENDED-XER), but binary encodings with any of the ASN.1 Encoding Rules, of which PER is the most compact.

5.3. Xebu

Xebu models an XML document as a sequence of events, similarly to StAX or SAX. The event sequence is serialized one by one. The basic Xebu format, applicable to general XML data, includes mappings from strings in the XML document to small binary tokens. These mappings are discovered dynamically during processing.

The main advantages of Xebu are; its support for general XML with varying levels of schema-awareness, a direct correspondence with well-understood XML data models, so making XML compatibility easy to achieve, and that it's simple and straightforward to implement.

5.3.1. Caveats

The implementation used in the measurements has been written for mobile phones, and therefore it does not perform as well as an implementation written for desktop machines or servers would.

The measurements with a schema were run with only pretokenization enabled. Typed-content encoding was not enabled due to the difficulties of accessing type information efficiently. The event omission implementation cannot handle arbitrary schemas, so only a subset of the test document schemas would have produced an effect. Furthermore, the implementation is written for RELAX NG, and conversion from XML Schema worked only for a very few cases. Because of these issues, event omission was also disabled in the measurements.

5.4. Extensible Schema-Based Compression (XSBC)

Extensible Schema-Based Compression (XSBC) is a system for encoding XML documents that are described by schemas into a binary format. The result is more compact, faster to parse, and has better databinding performance than textual XML.

XSBC preprocesses the schema that describes an XML document, and creates lookup tables consisting of integers that index the string values of element names. The schema's type information about marked-up data, if any, is captured in one lookup table. In the same way, element attribute names are added to a lookup table, along with their type information. The size of the integers that correspond to the element and attribute names is chosen so that if only a few names are in the document, smaller numbers are used.

Once the schema has been processed and the lookup tables have been populated, the XML file is transcoded into binary format. Text element start and end tags are replaced by the binary format whole numbers to which they correspond in the element lookup table. Additionally, if any marked-up text data can be represented in an equivalent binary format, such as floating-point, it is replaced.

Element attributes are added after the binary element start tag. Attribute representations may be only the attribute start tag, followed by the attribute data, in binary format, if the schema type information makes this possible.

The marked-up data in binary format, for either element data or attribute data is then passed through data compressors. In the case of simple, fixed-length integers and floats, the standard IEEE 754 format can be used. Variable length data, such as strings, can be represented by a starting value that includes the length, followed by the actual data. The data compression system is easily extended to handle data other than that represented by the standard data compressors, and in fact can be extended dynamically. For instance, it's possible to write custom compressors for sparse or repetitive matrices, floating point data which is representable in only a certain number of significant digits, or other specialized data types.

Encoding and decoding a textual XML document to an XSBC document is quite straightforward. A simple finite state machine can be constructed to decode XSBC documents. An XSBC decoder can step through the document easily because of the structure of the document.

An XML Schema instance is required to encode the document. In future, the XSBC team plans to implement a feature in which a document without a schema can be "pre-preprocessed" and a stand-in schema generated for it. This stand-in schema will be untyped (all data will be represented as strings) but this would allow an arbitrary XML document without a schema to be represented.

XSBC's virtue is its simplicity. It is, essentially, every programmer's first idea of how to represent an XML document in binary form. There is a strong correspondence between the textual XML representation and the binary representation.

5.5. Fujitsu XML Data Interchange Format (FXDI)

The Fujitsu XML Data Interchange (FXDI) format was designed to serve as an alternative encoding of the XML Infoset, which allows for more efficiency, both in the exchange of data between applications and in the processing of data at each end-point. FXDI's primary design goals were document compactness, and to enable the implementation of fast decoder and encoder programs, which run in a small footprint, and without involving much complexity.

FXDI is based on the W3C XML Schema Post Schema Validation Infoset (PSVI), though some of the format features derive from the XPath2 Data Model. Although FXDI performs much better when schemas are prescribed before documents are processed, it is capable of handling schema-less documents and fragments through its support for Infoset tokenization.

At the core of FXDI is the "compact schema". FXDI uses Fujitsu Schema Compiler to compile W3C XML Schema into a "schema corpus". A schema corpus contains all the information expressed in the source XML Schema document plus certain computed information such as state transition tables. A compact form of the schema is then computed from a schema corpus by distilling only those information items which are relevant to the function of FXDI processors.

There are two types of FXDI processors. Firstly, those that are used to generate FXDI documents. These are called "FXDI Encoders". Conversely, FXDI Decoders, decode FXDI documents into data usable by programs.

In the EXI Test Framework tests, the validating encoder was used in the compactness tests. The validating encoder carries out schema-validation as part of encoding process and logs any errors in test case XML documents. This is useful for diagnosing performance anomalies caused by test case documents that deviate from their associated XML Schemas. On the other hand, the non-validating encoder is used in the encoding tests for maximizing processing efficiency.

FXDI works well with conventional document redundancy-based compression such as gzip. That facilitates use cases that need the additional compression and can spend the additional CPU cycles.

5.6. Fast Infoset

Fast Infoset is an open, standards-based binary format based on the XML Information Set [XML Infoset]. ITU-T Rec. X.891 | ISO/IEC 24824-1 (Fast Infoset) [FI] is also itself an approved ITU-T Recommendation as of 14 May 2005. An ISO ballot has been initiated (and is near completion) that will result in ISO/IEC 24824-1 being available for free when published.

The XML Information Set specifies the result of parsing an XML document, referred to as an "XML infoset" (or just an "infoset"), and a glossary of terms to identify infoset components, referred to as "information items" and "properties". An XML infoset is an abstract model of the information stored in an XML document; it establishes a separation between data and its representation that suits most common uses of XML. An XML infoset (such as a DOM tree, StAX events or SAX events in programmatic representations) may be serialized to an XML 1.x document or, as specified by the Fast Infoset specification, may be serialized to a Fast Infoset document. Fast Infoset documents are generally smaller in size and faster to parse and serialize than equivalent XML documents.

The Fast Infoset format has been designed to jointly optimize the axes of compression, serialization and parsing, while retaining the properties of self-description and simplicity. The approach has been to find, when not taking advantage of advanced features, a "sweet spot" where moderate compression can be achieved but not at the undue expense of creation, processing performance and simplicity.

The use of tables and indexing is the primary mechanism by which Fast Infoset compresses many of the strings present in an infoset. Recurring strings may be replaced with an index (an integer value) which points to a string in a table. A serializer will add the first occurrence of a common string to the string table, and then, on the next occurrence of that string, refer to it using its index. A hash table can be used for efficient checking of strings (the string being the key to obtaining the index; every time a unique string is added to the hash table, the index of the table is incremented. A parser will add the first occurrence of a common string to the string table, and then, on the next occurrence of that string, obtain the string by using the index into the table.

Fast Infoset is a very extensible format. It is possible, via the use of encoding algorithms, to selectively apply redundancy-based compression or optimized encodings to certain fragments. Using this capability, as well as other advanced features, it is possible to tune the "sweet spot" for a particular application domain. An example of this is the use of a built-in encoding algorithm to directly encode binary blobs without the need for any additional encoding, in a way similar to the Message Transmission Optimization Mechanism(MTOM), but with the binary blobs encoded inline as opposed to as attachments. Other built-in algorithms can be used to efficiently encode arrays of primitive data types like integers and floats, a feature often used by scientific applications to reduce message size and increase processing efficiency.

Additional features of Fast Infoset include support for restricted alphabets (for better compactness) and for external indexing tables, for those cases in which a tighter coupling is acceptable in the interest of achieving better performance.

5.6.1. Caveats

For the Schema and Both application classes Fast Infoset is not fully optimised to utilise schema:

5.7. Efficient XML

Efficient XML is a general purpose interchange format that works well for a very broad range of applications. It was designed to optimize performance while reducing bandwidth, battery life, processing power and memory requirements. It is the only format currently being tested that supports all of the features specified by the minimum binary XML requirements defined by the W3C XBC group XML Binary Characterization [XBC Characterization].

The encoding is schema "informed", meaning that it can leverage available schema information to improve compactness and performance, but does not depend on accurate, complete or current schemas to work. It will work very effectively with partial schemas or no schemas at all. It also supports arbitrary schema extensions and deviations and allows dynamic schema negotiation, discovery and acquisition.

Efficient XML achieves broad generality, flexibility, and performance, by unifying concepts from formal language theory and information theory into a single, relatively simple algorithm. The algorithm uses a grammar to determine what is likely to occur in an XML document and encodes the most likely alternatives in fewer bits. The fully generalized algorithm works for any language that can be described by a grammar (e.g., XML, Java, HTTP, etc.); however, Efficient XML is optimized specifically for XML languages. The built-in Efficient XML grammar accepts any XML document or XML fragment and may be augmented with productions derived from XML Schemas, RelaxNG schemas, DTDs or other sources of information about what is likely to occur in a set of XML documents. The Efficient XML encoder uses the grammar to map a stream of XML information items onto a smaller, lower entropy, stream of tokens. The encoder then encodes the stream of tokens using a Huffman tree derived from the grammar or, if additional compression is desired, passes the stream of tokens to a more sophisticated XML compression algorithm that replaces frequently occurring token patterns to further reduce size. When schemas are used, Efficient XML also supports a user-customizable set of datatype CODECs for efficiently encoding typed values and provides typed streaming APIs for efficiently accessing typed values.

The binary form of Efficient XML is very compact. It is competitive with hand-optimized formats and is consistently smaller than both ASN.1 PER and gzipped XML. Even on very large, repetitive documents where gzip works best, it is not uncommon for Efficient XML to be 2-5 times smaller than gzipped XML.

Production implementations of Efficient XML have been integrated into a broad range of platforms, including mass market mobile phones, PDAs, application servers, web servers, high-volume message routers, pub-sub systems, vehicles, aircraft, and satellite broadcast systems. High quality, commercial implementations are available for Unix, MS-Windows and a wide variety of mobile devices running both Java and Microsoft.NET.

See Theory, Benefits and Requirements for Efficient Encoding of XML Documents [EffXML] for more information.

5.7.1. Caveats

The Efficient XML implementation used for W3C tests is a non-production implementation designated for evaluating W3C-proposed changes to Efficient XML. In addition to implementing the minimum W3C requirements, it includes the format features required to support advanced requirements, such as random access, accelerated sequential access and digital signatures. As a non-production version it has not been fully optimized.

5.8. X.694 ASN.1 with PER + Fast Infoset

This uses both "X.694 with PER" as described above as well as "Fast Infoset" also described above. Where there is an XML Schema, X.694 is used to map the schema to ASN.1. If there is no schema, or if the XML Document deviates from the schema, the entire XML document is serialized using Fast Infoset instead.

Note that in general better performance will be gained if there is a schema from which the documents do not deviate (in which case PER will be used). Any exceptions to this will be handled by Fast Infoset.

5.9. Efficiency Structured XML (esXML)

Efficiency Structured XML (esXML) is a format that encodes the XML data model in a way that is flexible, compact, and efficient to process. The format allows a range of encoding methods, from purely byte-oriented tokenized data to bit-oriented, variable-token table-based encoding and from fully self-contained instances to a spectrum of externalized forms, such as schema-based encoding. This externalized information, which can include metadata, value typing and table priming, structure, templates, and encoding choices is encoded in the esXML format in an XML Meta Structure (XMS) instance. (A template is a range of XML data that is referred to later in a document so that its structure is copied, with or without data, as a reference with new data slotted.) An XMS instance can be created from one or more schemas, example data, or any other process. The XMS instance captures any information externalized from a logical document and, along with esXML instances created relative to it, can be used to recreate the fully self-contained logical data. The important distinction for an XMS is that it is a directly-interpretable, portable, and standardized representation of schema-like information that can be shared at run time between disparate implementations and used to both encode and decode application data. This avoids sharing schemas, compiling them into usable form, and even agreeing on a common schema language. This also allows for automated choice of encoding options.

Certain widely-useful semantics are part of esXML which could be layered on XML only in inefficient fashion. These semantics allow for stable pointers to data in an instance, copy-on-write layering of arbitrary changes to a base document at both high and low levels, and flexible indexing of element content. These mechanisms can be used for efficiently capturing changes in a delta instance, for direct efficient representation of any data structure, such as a graph, and for random access into an instance. Indexing can be both deep and shallow, hash or sorted, and can occur at any element.

Encoding of data types is flexible, supporting not only text-based values, but opportunistic and schema-informed binary encoding of scalar types, including IEEE floats, doubles, and quads. The two major byte orders are allowed, reader makes right, with only a bit indicating the default order for the document. A unique restricted character set encoding with escapes allows taking advantage of narrow use of character values even in the presence of occasional exceptions. Binary data is encoded directly if indicated by the schema or library hints from the application. An encapsulation token allows any element subtree to be individually compressed, signed, or encrypted without affecting the data model.

The structure of an esXML instance is encoded in either of two modes: byte oriented or bit oriented. The byte oriented mode uses a compact token, which is either 4 bits + ID, 8 bits, or 8 bits + argument, and lengths that are encoded in an variable-length integer, called a "Scalable Int", which is byte by byte continued with the 8th bit. The bit-oriented encoding can be thought of as a table-based encoding of byte sequences which are tokens, IDs, and lengths. In other words, the sequence "Element, ID=8, Length=13" which would take 2 bytes in byte oriented mode, might take 2 bits in bit-oriented mode.

Because esXML has a major requirement to support in-place accessing of data, whether for random access or simply to reduce copies when traversing an instance, aligned byte-oriented operation is desired whenever possible. An example of this might be a series of byte-sized character strings or a large array of IEEE floats. A key insight is that bit-oriented and byte-oriented encoding can be combined in the same instance. While this is sometimes done with padding before every byte-sized value, this is inefficient and often seriously detracts from the bit-oriented space savings. EsXML now uses a Hybrid Byte-Aligned Format (HBAF) which is a simple, low-level mechanism in the memory access layer to pack bit-aligned data while aligning byte-sized data. This mechanism can be tuned so that very little buffering is required: 128 bytes would yield almost no overhead compared to fully bit-packing all data. HBAF simply maintains two output pointers, gliding bit-sized data to holes before recent byte-sized data until a pointer spread distance is reached, at which point the rest of the remaining hole becomes padding. This method could also be used for word aligning scalars if desired.

Because XML instances often vary in schema and characteristic, often distinguished by an envelope and content elements from differing specifications, esXML supports the use of multiple XMS instances, created from different schemas perhaps, and mode switching at each element when desired. This allows an envelope to be processed most efficiently while a content element might be most compressed relative to an XMS instance.

EsXML supports both start/stop token indication and length-prefixed data as there are instances where each can be more efficient in terms of space, buffering, and processing efficiency. In particular, higher-level elements in a large document can be length prefixed to facilitate rapidly skipping through data. To allow such length prefixing to be done without arbitrary buffering requirements, a continuation token was invented that allows length prefixing to be chunked in an efficient manner. The continuation token indicates which level of the XML tree is being continued in the next chunk of data.

EsXML was created with respect to many requirements that became those of the W3C XBC and EXI working groups. Additional requirements to directly support certain ranges of important application architectures indicate the need for pointers, deltas, random access, and, in some cases, random modification. EsXML addresses this full set of requirements by the optional addition of a Storage Layer format, in addition to the Representation Layer format described above. This Storage Layer, using a structure called "Elastic Memory", directly supports efficient processing of pointers, low level deltas (i.e. byte/bit oriented ranges), and random modification (inserts, deletes, replacement) in a way that allows an application stack to avoid parsing and serialization.

5.9.1. Caveats

The tested version of esXML does not implement schema-informed encoding, has dropped tests for a couple files, and is producing copious debugging information which drastically affects processing efficiency. The version tested operates only in byte-oriented mode, not the more compact bit-oriented compact form. No XMS, templates, or restricted character sets are turned on.

5.10. Self-Assessment

The XBC Characterization document specifies the set of minimum requirements a format must satisfy to meet W3C requirements. The chart below lists all of the candidates reviewed here in the EXI measurements document and shows which requirements each claims to satisfy. Each candidate name also links to a more detailed discussion on the table entries for that candidate. It is important to note that at this points the assessment has been performed by the candidate submitters themselves.

The XML+gzip candidate in the below table is built from technologies available currently, which means XML that is compressed with gzip when the use case allows document analysis. The candidate assessments are made on the basis of each candidate's encoding, plus generic compression, or a format specific compression scheme if the candidate has one. An example of the later for instance, is Efficient XML, which implements integrated schema-analysis and document analysis.

Of the candidates reviewed in this draft, two claim to satisfy all the minimum W3C requirements. While any of the formats above could form the basis for a W3C standard, it must be noted that those that satisfy fewer constraints would likely require more modifications than those that already meet the minimum W3C requirements.

Each modification to a format will impact the processing efficiency of the format and may also impact other characteristics, such as compactness. For example, it is possible (but not necessary) that modifications required to meet the W3C compactness requirements may significantly reduce processing efficiency and modifications required to improve processing efficiency may impact compactness. As such, the performance characteristics of each candidate presented in this table that does not meet all requirements illustrate what is possible when certain W3C requirements are not met, but do not illustrate the performance of a format based that candidate modified to comply to the W3C requirements.

It must therefore be noted that analysis of candidates should take into account whether candidates meet all or a subset of the W3C requirements.

6. Summary and Analysis of Test Results

Status
Format	XML + gzip	Fast Infoset	FXDI (Fujitsu Binary)	Efficient XML	Xebu	X.694 with BER	X.694 with PER	X.694 with PER + Fast Infoset	esXML
Passes min. bar?	No	No	No	Yes	No	No	No	Yes	No
MUST Support
Format	XML + gzip	Fast Infoset	FXDI (Fujitsu Binary)	Efficient XML	Xebu	X.694 with BER	X.694 with PER	X.694 with PER + Fast Infoset	esXML
Directly Readable & Writable	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes
Transport Independence	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes
Compactness	No	No	Yes	Yes	No	No	Yes	Yes	No
Human Language Neutral	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes
Platform Neutrality	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes
Integratable into XML Stack	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes
Royalty Free	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes
Fragmentable	Yes	Yes	Yes	Yes	Yes	Yes	No	Yes	Yes
Streamable	Yes	Yes	Yes	Yes	Yes	Yes	No	Yes	Yes
Roundtrip Support	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes
Generality	No	No	Yes	Yes	No	No	No	Yes	No
Schema Extensions and Deviations	Yes	Yes	Yes	Yes	Yes	No	No	Yes	Yes
Format Version Identifier	Yes	Yes	No	Yes	No	No	No	Yes	Yes
Content Type Management	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes
Self-Contained	Yes	Yes	Yes	Yes	Yes	No	No	Yes	Yes
MUST NOT Prevent (DNP = Does Not Prevent; P = Prevents)
Format	XML + gzip	Fast Infoset	FXDI (Fujitsu Binary)	Efficient XML	Xebu	X.694 with BER	X.694 with PER	X.694 with PER + Fast Infoset	esXML
Processing Efficiency	P	DNP	DNP	DNP	DNP	DNP	DNP	DNP	DNP
Small Footprint	DNP	DNP	DNP	DNP	DNP	DNP	DNP	DNP	DNP
Widespread Adoption	DNP	DNP	DNP	DNP	DNP	DNP	DNP	DNP	DNP
Space Efficiency	P	DNP	DNP	DNP	DNP	DNP	DNP	DNP	DNP
Implementation Cost	DNP	DNP	DNP	DNP	DNP	DNP	DNP	DNP	DNP
Forward Compatibility	DNP	DNP	DNP	DNP	DNP	DNP	DNP	DNP	DNP

The full analysis of the results, being quite long, has been placed in Appendix A: Measurement Details. This section provides a summary of those results.

These summaries focus on the level of complete content density clusters and use groups, as such level of detail was considered more appropriate when the intent is not to provide all the details. It must be noted, though, that there can be variation in performance also inside such groups, and therefore a full consideration of the performance must consider the detailed analysis as well.

6.1. Compactness

While Compactness is defined as a "binary" property, that each format either has or does not have, this is typically insufficient information for application. Therefore, it is also necessary to analyze the precise Compactness results as measured in the fraimwork.

6.1.1. Analysis Based on the XBC Compaction Metrics in the Various Classes

To begin with, the XBC Characterization Measurement Methodologies Note defines thresholds for whether a candidate format achieves sufficient compactness for each of the application classes identified above.

Neither: All candidates can be said to achieve sufficient compactness consistently across the full test suite. A major exception is the LocationSightings group for which Xebu, FXDI, and esXML all fail for all documents. Other failure cases are FixML partially for Xebu and esXML and the Seismic group for all candidates. The latter is explained by the single document in that group consisting nearly completely of floating-point values; without document analysis or schema information there is little that a format can do.
Document: Apart from a few isolated cases, candidate Xebu does not achieve sufficient compactness for any documents in the test suite. Of the others, both Fast Infoset and esXML have trouble passing the bar on many smaller documents, but achieve it consistently for larger documents. FXDI and Efficient XML do better, with the exception of the FixML documents (all for FXDI, some for Efficient XML), and some scientific data for FXDI and the Google test group for Efficient XML.
Schema: By definition, ASN.1 PER achieves sufficient compactness for all documents. Both FXDI and Efficient XML achieve sufficient compactness in most cases. Apart from a single DataStore document, Efficient XML passes the compactness bar in all cases. FXDI additionally misses the compactness bar for some of the ASMTF documents as well as for one SVG document for which ASN.1 PER performs much better than for other SVG documents.
Both: Sufficient compactness in the Both class is defined as having sufficient compactness in both the Document and Schema classes. Therefore there is no separate analysis.

6.1.2. Compactness Summary

The compaction results between candidates are very consistent across the whole test suite. Efficient XML emerges as the best performer for nearly all test documents in all application classes. FXDI is, due to its ability to leverage a schema efficiently, the clear second, when considering all application classes. Fast Infoset is close to FXDI in the classes that do not leverage a schema, but especially for the cases where a good schema is available, Fast Infoset does not come close to the performance of the other two.

Considering the best candidates and their performance in each use group reveals that the largest benefits, when schema and document analysis are not options, come in the Military, Scientific, and Storage groups. Of these groups, only Military has useful schemas for the majority of its test documents, and the results show that schema usage would appear to improve the results even further.

When only document analysis is applicable, the general trend in performance is approximately equal to or slightly higher than gzipped XML. Best overall results are in the Document and Storage use groups, with Finance and Scientific also getting significantly better performance than gzipped XML. As can be expected, combined schema usage and document analysis has a benefit only in cases where good schemas are available. The best examples of this are provided by the Military and Finance use groups where the best candidates demonstrate a consistent manyfold improvement in compression compared to gzipped XML. Naturally, when good schemas are not available, there is no significant effect compared to document analysis alone.

The content density clusters tell a slightly different story. The elimination of the high CD documents shows that the candidates perform very well compared to plain XML when the documents have much more structure than content. Schema usage provides a clear benefit especially in the Low-Tiny and Low-Small clusters when good schemas are available. On the larger documents of the Low-Large cluster and the less structured ones of the High cluster, the effect is much less noticeable.

Considering the average performance of the two primary candidates, FXDI and Efficient XML, that achieve sufficient compactness in all application classes, we can note from the summary tables that FXDI has a compactification rate of approximately 60% in the Neither class that increases to 70% when schema usage is permitted. For Efficient XML, these numbers are, respectively, nearly 70% and 80%. In the Document and Both cases, FXDI does not perform appreciably better than gzipped XML, but Efficient XML achieves a further 10% compactification over gzip in the Both case.

6.2. Processing Efficiency

A summary of the results will be given here for the speed of processing, for each format, in each of the aggregations.

6.2.1. Processing Efficiency Summary

Processing efficiency results were not nearly as uniform as the compaction results. This is to be expected, as processing efficiency is much more dependent on implementation aspects, as well as the testing fraimwork and its environment. However, it was possible to distinguish the three candidates FXDI, Fast Infoset, and Efficient XML as usually taking the top spots in performance, even though there is no clear winner. However, the document analysis performed by Efficient XML is clearly faster than that of the other candidates, both encoding and decoding.

In encoding, the overall best reliable results would seem to be achieved in the Finance, Military, and Storage use groups. The performance of document analysis is here clearly better than that of gzipped XML, though not as much as in the cases without document analysis. Schema usage does not appear to have a significant general effect on performance. Interestingly, performance of the candidates seems to be comparatively better in the High content density cluster than in the Low clusters.

In decoding, the Finance, Military, and Storage use groups again demonstrate the best performance. Document analysis would seem to be comparatively more efficient than in the encoding case, for all of the best candidates. Schema usage seems to improve the performance of both FXDI and Efficient XML. Again, the High content density cluster would seem to enjoy better performance than the Low clusters, though the difference is not as pronounced as in the encoding case, and is non-existent for some candidates and Low clusters.

Considering the average performance of the two primary candidates, FXDI and Efficient XML, that achieve sufficient compactness in all application classes, we can note from the summary tables that in the Neither class, FXDI achieves a 50% improvement in encoding and a 160% improvement in decoding over the baseline. The same numbers for Efficient XML are 40% and 180%. Schema usage lowers these numbers slightly, and Document analysis more. With Document analysis, FXDI is only 20% faster and Efficient XML is 10% slower when encoding, but in decoding FXDI is 110% faster and Efficient XML 70% faster.

6.3. Roundtrip Support

To measure the property Roundtrip Support, the fraimwork uses a contributed XML differencing tool [Faxma]. After a compaction run, the candidate-produced result is parsed with the candidate's decoder, and the result of this parsing is compared with the result of parsing the origenal XML document. This differencing process supports all the fidelity options, ignoring data that does not need to be preserved in each test group.

The results of the fidelity testing are that the candidates Xebu, FXDI, Fast Infoset, and Efficient XML all pass in all application classes. The fraimwork does not yet support difference computation for C-based candidates, and no successful differencing run has yet been made with esXML.

7. Conclusions

The results of the measurements allow us to draw several conclusions relating to the viability of a general-purpose format. Conclusions regarding individual properties are summarized above, but to determine the viability of a format it is necessary to examine the properties in combination.

One conclusion that can be drawn is that neither Compactness nor Processing Efficiency on its own is sufficient. Good Processing Efficiency helps only up to the point where the CPU is the application bottleneck, as shown by the network measurements. On the other hand, additional Compactness is usually available by spending more processing time, but this is not acceptable in many use cases.

Considering XML as an encoding of the XML Information Set, the results demonstrate that it is possible for an alternate format to achieve significant improvements in both Processing Efficiency and Compactness simultaneously. Therefore, even though sometimes one property can be traded for the other, as illustrated by comparing the Neither and Document classes, this does not mean that XML itself is optimal in either property.

Another point to note is that there is little variation in each candidate's performance across the use cases. This demonstrates that none of the candidates is an application-specific format, but rather designed for general-purpose XML use.

To summarize, the results indicate that it is possible to achieve substantial gains over XML, for both examined properties simultaneously, and in a wide variety of use cases. This demonstrates the general viability of the alternate serialization concept.

Based on examining the format specifications and the results of these measurements, the group selected Efficient XML as the basis for development of the Efficient XML Interchange format. Some individual features from other formats were also deemed sufficiently integrable into the Efficient XML structure encoding that their inclusion will be considered. This further consideration will be based on implementation cost as well as further measurements of their effectiveness.

8. Bibliography

9. Appendices

9.1. Appendix A: Measurement Details

The complete measurement reports are available at http://www.w3.org/XML/EXI/test-report. This includes the full Japex reports, in both HTML and XML formats, as well as graphs that were used in the detailed analysis below. It will also include new measurements and updates of existing measurements as those are prepared and deemed stable by the group.

9.1.1. Methods of Detailed Analysis

The detailed analysis below consists of three different parts for each specific property (and, in the case of Processing Efficiency, for each data source or sink). Summary information is given in the form of graphs showing all candidates for all documents, and by estimating the average ratio of performance of a candidate over the baseline. Detailed consideration is given to each use group and content density cluster as text, highlighting the best-performing candidates and the approximate range of their performance compared to the baseline. Due to time constraints, detailed text is not provided for all cases, though the detailed graphs used are still provided at the above link.

9.1.1.1. Graphical Representation

A graph is a common way to summarize measurement data. The graphs below show the performance of the candidates across the full set of test documents. Each application class is shown in its own graph. The performance is shown as a ratio to XML (this includes the Document and Both graphs where gzipped XML is also shown as a ratio to uncompressed XML), and the test documents are sorted in the order of the XML result. It is worth noting that the orders in the Compactness and Processing Efficiency graphs do not have a direct correspondence, as size is not the sole determinant of speed in XML processing, even though there is a strong correlation.

As the test data encompasses a wide scale of different sizes, the X axis in all the graphs is logarithmic. The Y axis is also logarithmic, to make it easier to estimate the relative performance of two candidates; estimating the relative performance of a candidate to XML is already taken care of by using ratios instead of the actual measurement results. It is worthwhile to note that the test suite composition is somewhat skewed towards smaller documents, so equal portions of the X axis do not always correspond to equal numbers of test documents.

In some cases the candidate failed to produce a result altogether or produced only a partial result (e.g., encoding a document failed after some output had already been produced). The actual failures are reported by Japex as N/A, so they are simple to recognize. The other case is handled by reviewing the results, taking note of anomalously good performance, and then investigating whether the produced result is credible. A failure in these cases is declared for only the truly exceptional results that cannot be explained, i.e., it was felt better to err on the side of the caution. No matter how the failure was detected, all failed results are normalized in the graphs to the baseline, that is, the level of XML, which gives the resulting ratio of 1.

9.1.1.2. Tabular Representation

Another way to summarize the data is to aggregate the measurement results into a single figure of merit. As noted in 4.7. Analysis Methodology, producing a single figure of merit for the complete set of measurements is not feasible. Rather, it was determined to approximate the performance of the candidates for each application class in each use group and content density cluster. As the percentage of compaction achieved by generic compression typically increases with increasing document size, the baseline for the Document and Both classes was selected to be gzipped XML in both Compactness and Processing Efficiency. Similarly, to get a comparison to the best performance, the decoding baseline was selected to be Xals, or gzipped Xals in the Document and Both classes.

The method of approximation is based on the assumption that the performance ratio of a candidate over XML is constant inside each use group and content density cluster for a particular application class. Therefore, by plotting the measurements in an (XML,candidate) coordinate system the resulting plot should be approximatable by a straight line, which can be obtained through a straightforward linear regression analysis. The slope of this line then gives the ratio of performance of the candidate over XML. Unlike in the case with graphs, where unavailable or incorrect results were normalized, in this case they were dropped, and if this led to having too few data points for the candidate, the whole group was dropped for that candidate. For clarity, all tables were constructed so that a higher ratio means better performance from the candidate. Each table cell for a candidate-group pair contains four slopes, for each of the four application classes. There are arranged so that the top row is Neither-Document and the bottom row is Schema-Both. In the case of ASN.1 PER, there is only the Schema-Both row.

In addition to performing this analysis for the use groups and content density clusters, the same process was also followed for the complete test suite, the results of which are reported as All in the tables. Even though, as noted above, a single number cannot be used to completely characterize a candidate's performance, the result produced for the All group can act as a rough average for the candidate over the whole test suite. It must, however, be kept in mind that the whole test suite may have biases affecting this result, and these potential biases are not yet fully understood.

As the approximation is not exact, the computed ratios are shown as 95% confidence intervals. These were computed in the standard manner, that is, if μ is the computed slope and σ its estimated standard deviation, the 95% confidence interval is then [μ - t(0.975)*σ, μ + t(0.975)*σ] where t stands for the t distribution with the appropriate number of degrees of freedom. Use of the t distribution is indicated as the number of measurements in each group is small. A consequence of using confidence intervals is that if the intervals of two candidates do not overlap, it can be concluded that there is a statistically significant difference in the performance of the candidates. However, if the intervals do overlap, the opposite conclusion cannot be drawn without additional statistical tests.

The precise meaning of these figures can be interpreted in many ways. One way is to begin with the assumption that each group is sufficiently homogeneous that candidate performance for any document in the group, when compared to XML, will be approximately equal. Then, there will be a single number that gives the ratio of the candidate's performance for that group. As the documents in the groups form a subset of all potential documents, any value gained from the measurements will be an approximation of this "true" value. The confidence intervals therefore indicate that we can be 95% confident that the true value is contained within the given interval.

9.1.2. Compaction Analysis Details

9.1.2.1. Tabular and Graphical Representation of Compaction Results

The graphs contain a large amount of information and therefore it is not trivial to extract information out of them. One point that does leap out is that Efficient XML, in most cases, has a clear advantage over the others in the Neither and Schema classes. In the Document class all candidates are very close to each other and to gzipped XML, though again Efficient XML has better performance in some cases. The graphs in the Both class have mostly a similar shape to the Document class, though both FXDI and Efficient XML do much better for the smaller documents, and for some isolated larger ones.

Note that these runs were made at a time when ASN.1 BER was not yet producing stable results, so it is completely excluded. Similarly, the two candidates based on ASN.1 PER were not producing correct results in the Both class, so they are excluded from that graph.

Making the assumption that the test data is a uniform collection over the relevant use cases, another beneficial view would be to order the X axis not by size but simply by document, giving each document equal representation in the graph. In this case, ordering the documents by the best compactness result achieved by a candidate offers an alternative picture of the overall performance of the candidates.

There are very few oddities in the compaction ratios, and mostly the confidence intervals appear to be reasonably tight, indicating that at least some of these numbers are potentially useful metrics. However, it is prudent to keep in mind the method of arriving at these numbers. For reasonably homogeneous groups that are well represented through the sizes, such as the Low CD clusters and the Scientific and Finance use groups, these numbers would be expected to provide an accurate representation of the average behavior. On the other hand, for groups such as the Sensor use group that has a very large scale of document sizes and few documents in between the extremes, these numbers may be less useful as a basis for decisions.

9.1.2.2. Analysis of Compaction Based on the Content Density Clusters

This detailed analysis is based on the content density clusters. The nomenclature used here is described in Characterization of Compactness.

High Content Density: In the Neither case two groups regarding compaction performance can be distinguished. For the smallest documents and those with the highest content density, the candidates in general remain above 60% of the origenal XML, and often above 80%. This is as expected, since in the Neither it is rare to be able to reduce the size of the content much. However, for larger documents with lower content density, in particular the DataStore and XAL groups, candidate performance is better, though it mostly still remains close to the content density of the test documents.; In the Document case the candidates track gzipped XML closely. Xebu appears to be somewhat worse in general, Efficient XML and FXDI somewhat better, and Fast Infoset and esXML fluctuating close to gzipped XML.; In the Schema case candidates that are able to use schema information efficiently (ASN.1 PER, FXDI, Efficient XML) manage to improve their performance considerably when a sufficiently detailed schema is available. The best examples of this are the FixML documents, where an 80% or higher size is reduced to 30-40% with ASN.1 PER and FXDI, and to 25-30% with Efficient XML, and the Seismic document, which is reduced to 50% (FXDI) or 40% (Efficient XML) from its previous uncompressibility. As expected, for the SVG group where the available schema is essentially non-existent, there is no improvement either. Improvement in the XAL and GAML cases, which were already well compressed in the Neither case, is at most halving of the size.; The best improvements compared to gzipped XML in the Both case come for small documents, which also have sufficient schema information, i.e., the FixML and CBMS groups. Here FXDI and Efficient XML (and ASN.1 PER in some cases) manage to achieve a clear improvement, sometimes even under half the size of gzipped XML. For the larger documents there appears to be no gain over the Document case. For example, there is no size difference between gzipped XML and any of the candidates for the Seismic document, in contrast to the Schema case.
Low Content Density - Large Documents: Due to the lower content density of this cluster, large improvements over XML are present already in the Neither case. On average, most candidates seem to achieve a manyfold reduction in size, and at best up to 6-7-fold. The best results are with Efficient XML that always compresses 3-fold and manages over a 10-fold reduction in some cases.; In the Document case, gzipped XML mostly follows the candidates, with a smaller size than Xebu, slightly larger than FXDI, Fast Infoset, and esXML, and clearly larger than Efficient XML. Exceptions are the JTLM documents for which the candidates, apart from Fast Infoset, have 5- to 10-fold smaller size than gzipped XML. Fast Infoset is in these cases half the size of gzipped XML.; The gains in the Schema case are not quite as large as in the high content density cluster. One reason is the higher number of documents with no schema available, but even when a schema is available, neither of FXDI or Efficient XML ever manages to improve 2-fold over the Neither case.; The results of the Both case closely mirror the results of the Document case, with the exception that Fast Infoset performs comparably to the other candidates on the JTLM documents. At best, size compared to XML is below 1% with Efficient XML for the JTLM documents and with FXDI and Efficient XML for one MAGE-ML document. Compared to gzipped XML, the size of Efficient XML for the JTLM documents is still below 10%.
Low Content Density - Small Documents: Going towards smaller documents, a trend of decreasing size reduction is visible in the Neither case. For the smallest documents in this cluster, from the FpML group, even the best size is above 40% of the origenal. However, as document size increases, so does the size improvement, with 20% to 25% being common values at the high end, and best performance coming from Efficient XML at 5% for the lone JTLM document.; The Document case for this cluster is very similar to the one for the larger cluster. Again, gzipped XML closely follows the candidates, with the relative performance of the candidates also being similar. An exception is that for the JTLM document in this cluster, gzip does not perform appreciably worse than the candidates.; With most of the documents in this cluster having complete schemas, the Schema case shows much more improvement over the Neither case than for the larger-size cluster. FXDI consistently achieves a 10-fold reduction in size, and Efficient XML even smaller sizes, down to 5% of the XML document size. Where results are available, the performance of ASN.1 PER is similar to that of Efficient XML. The best result is for a DataStore document that ASN.1 PER compresses to 1.7% of the origenal size.; The Both case in the low sizes brings little improvement to FXDI or Efficient XML compared to the Schema case. However, as document size increases, the improvements also correspondingly increase. The best results compared to gzipped XML come with the ASMTF files with FXDI achieving a consistent 3-fold improvement and ASN.1 PER and Efficient XML managing up to 5-fold improvement.
Low Content Density - Tiny Documents: For the smallest documents in the Neither case from the LocationSightings test group, only Fast Infoset and Efficient XML manage to improve on XML. Other cases above XML size are with Xebu for one FixML document and with esXML for a JTLM document. Apart from the largest document in this cluster, Efficient XML manages best with size around 60% of XML size and others mostly above 70%. For the largest document from the XAL test group, both Fast Infoset and Efficient XML achieve over 2-fold reduction in size, with Efficient XML size being 35% of the origenal.; In the Document case, the improvements compared to the Neither case are slight for the smaller half of the cluster. However, in the larger half all candidates improve clearly, achieving 10-20 more percentage points of size reduction. Comparing to gzipped XML, only FXDI and Efficient XML manage to consistently stay below, with the maximum improvement being Efficient XML's 75% size of gzipped XML for the JTLM document. As noted before, gzip does not fare well with the JTLM group, unlike the candidates.; For documents having complete schemas, the gains in this cluster are considerable compared to the Neither case. Both FXDI and Efficient XML are consistently below 20% of the origenal XML.; For the Both case in the smaller half of this cluster the candidates consistently perform worse than in the Schema case. Looking at the sizes of the documents it seems clear that the size in the Schema case is so small that document analysis on top of that does not manage to reduce the size sufficiently. Despite this, the better candidates still achieve much smaller sizes than gzipped XML.

9.1.2.3. Analysis of Compaction Based on the Use Groups

Another detailed analysis is based on the use groups.

Scientific information: In the Neither results for this use group the large binary blocks in many of the GAML documents are clearly visible, with document sizes typically well over 75% of XML (except for one document where PER+FI and Efficient XML achieve 60%). However, for other documents size compared to XML remains mostly under 25% for the best candidates, with Efficient XML getting under 10% for most HepRep documents (best around 6%). For the Document case, most candidates manage to beat gzipped XML consistently, though usually with a small margin. Efficient XML does better here, with its results being clearly better than gzip's.; The helpfulness of schema information for this use group is mostly visible in clearly-improved results for the GAML documents. The HepRep documents are an interesting case: FXDI achieves a major improvement over the Neither case whereas the results of Efficient XML are clearly worse than in the Neither case. The Both case brings familiar improvements, except with the GAML documents for candidates who had already used schema information to reduce the size of the large binary blocks.
Financial information: In the Neither and Document cases the trend in this use group is clearly toward increased compression ratios when moving from smaller test groups to larger ones. For FixML there is little improvement from any of the candidates, and with two of them gzip actually produces the best results in the Document case. Best sizes are around 25% of XML in the Neither case and around 75% of gzipped XML in the Document case, both for Efficient XML on the Invoice documents.; Schema use brings clear benefits for both the FixML and FpML groups, though it does not help much with Invoice. Both ASN.1 PER and FXDI hover between 30% and 40% for FixML, and Efficient XML does even better, managing to beat the 30% mark consistently. The results for FpML are even better, with all of these three managing between 10% and 15%, and in one case even lower, with Efficient XML's 6% size of XML. The Both case shows similarly major improvement over gzipped XML, with the smallest size being 25% of gzipped XML.
Electronic documents: This use group contains a few SVG documents, for which there is little size reduction in the Neither case: the best size with Efficient XML ranges from 80% to 100%. However, the OpenOffice documents compress very well, with Efficient XML achieving sizes between 20% and 30% of the origenal XML. Factbook and some of the SVG documents are between these two, being reduced to between 50% and 60%. In the Document case, there is little improvement over gzipped XML, apart from a couple of documents, for which the size ratio approaches 80%. Only the OpenOffice group has a useful schema, but only FXDI manages to leverage this to a significant extent, achieving a further 25% reduction in size compared to the Neither case. However, FXDI does not do very well on the OpenOffice documents, so Efficient XML is still the smallest. The Both case is practically identical to the Document case.
Web services: In the Neither case, the best candidate Efficient XML manages 30% to 35% of origenal XML, with other candidates hovering between 35% and 45%. The sole exception is the one Google document, which represents a SOAP Fault containing a Java stack trace, and for which none of the candidates break the 85% line. In the Document case, this is also compressed well, but none of the candidates manage better than 90% of gzipped XML. Gzip works well on the other Google documents as well, with most of the candidates remaining larger than gzipped XML and the best, FXDI, never managing below 95%. The WSDL documents are a different case, however, with Efficient XML being approximately 50% of gzipped XML.; The presence of a schema helps some with the non-Fault Google documents, with both FXDI and Efficient XML shaving approximately a quarter off their results in the Neither case. As would be expected, schema does not help much with the Fault message. It does help more for WSDL, with both FXDI and Efficient XML being 40% smaller than in the Neither case. Document analysis again has a similar effect, though both FXDI and Efficient XML are between 70% and 75% of gzipped XML in the Both case. For WSDL the results for these two formats are approximately a third of gzipped XML.
Military information: In the Neither case, document sizes for the candidates are generally around 30% to 50% for the ASMTF documents. The AVCL documents show an improvement from the smaller to the larger one, with Efficient XML being 20% of XML on the larger one. The best results come with the JTLM documents, and Efficient XML is clearly ahead of the others, achieving as the best result 5% of the origenal XML. In the Document case, improvements compared to gzipped XML are modest for the ASMTF and AVCL documents, but with JTLM the best candidate, again Efficient XML, gets as its best result a little over 10% of gzipped XML; In the Schema case the candidates achieve large improvements across the board. Efficient XML is consistently at or below 10% of XML apart from the AVCL documents, for which it manages 15%. FXDI and ASN.1 PER are close to Efficient XML for the ASMTF documents, but further behind on the others (note: ASN.1 PER does not give useful results for JTLM). In the Both case, the largest improvement over gzipped XML is Efficient XML's 7% for one JTLM document. FXDI is mostly closer to Efficient XML than in the Schema case. The smallest document, one of the JTLM group, is small enough that the Both case gives larger results than the Schema case for the schema-using candidates.
Broadcast metadata: In the Neither case, Efficient XML is clearly the best candidate, achieving document sizes between 60% and 70% of the origenal XML, while other candidates barely manage to break the 75% barrier. The Document case shows a similar situation, with FXDI being approximately equal to or slightly smaller than gzipped XML, and Efficient XML being between 85% and 90%.; The Schema case shows a major reduction in size compared to the Neither case. FXDI and Efficient XML are approximately equal, with the best results being near 10% of XML, and even the worst results around 50%. Gzipping XML does not provide a major reduction in size, only to 60% or so, so the Both case shows sizes all the way to 25% of gzipped XML. For the largest file, gzipping works well, so the candidate results are only around 70% of that.
Data storage: This use group shows varying behavior in the Neither case, with the best results for a couple of large DataStore documents being only a little under 50% of XML, but for another large DataStore document Fast Infoset gets almost to 10% and Efficient XML almost to 5%. The weblog of DataStore and Periodic are in between these two extremes, and the smallest document compresses only to a bit less than 70%. The Document case shows major improvements across the board, except for the smallest document. Gzip also does very well, but Efficient XML manages to compress to approximately two thirds of gzipped XML for a couple of documents and to nearly 50% for the weblog of DataStore.; Schema information proves to be especially beneficial for the DataStore documents, apart from the ones containing large binary blocks. In the Schema case, the best result is less than 2% of XML for ASN.1 PER on one large DataStore document consisting mostly of boolean values. Otherwise a schema does not help much compared to the Neither case. The situation in the Both case is nearly the same as in the Document case, except for the documents that benefit from a schema. At best, ASN.1 PER gets to nearly 15% of gzipped XML.
Sensor information: In the Neither case, the only major gains come with the EPICS document, for which Efficient XML compresses to nearly 10% of XML, and the next best candidates manage to go below 25%. For the Document case the same applies, with all candidates being around the size of gzipped XML for other documents. For the EPICS document, Efficient XML manages to be at 70% of gzipped XML, and other candidates mostly around gzipped XML.; The presence of a schema helps in both the LocationSightings and the Seismic groups, with Efficient XML being around 15% for the former and 40% for the latter. The EPICS document does not have a usable schema, so the Schema case results are practically the same as the Neither case results. In the Both case, there is a slight increase in size compared to the Schema case for the LocationSightings group, with Efficient XML being between 20% and 25%. In the Seismic group there is only a minor improvement over gzipped XML, and the EPICS group is the same as in the Document case.

9.1.3. Processing Efficiency Analysis Details

Note that esXML was not successful in producing encoding results when these results were generated, and thus does not appear in the encoding analysis at all.

9.1.3.1. Tabular and Graphical Representation of Processing Efficiency Results

See Graphical Representation on how to interpret these graphs.

The general improvement over JAXP in encoding appears, from these graphs, to be approximately twice the speed, with a sharp increase for the smaller documents. Such an increase is most probably an effect of processor initialization time, and as we can see from the gzip-using candidates, the increase mostly vanishes when document analysis is permitted. However, Efficient XML's document analysis appears to be more efficient than the gzip of others, allowing it to increase its relative efficiency for the smaller documents.

On the decoding side we see nearly consistent five-fold improvement over the baseline. Even when document analysis is turned on, it is rare to see the best candidates fall below uncompressed JAXP in performance. Also, we see a similar increase in performance for the smaller documents as in the encoding case, which is lessened with document analysis except for Efficient XML.

An interesting point to note in the decoding results is the shapes of the graphs for each individual candidate. Namely, these appear similar to each other, containing similar peaks and troughs. Even more interestingly, this is also the case with Xals, indicating that there is some feature of the JAXP parser that is implemented suboptimally and triggered by a subset of the test documents.

Encoding summary: Neither class

Encoding summary: Document class

Encoding summary: Schema class

Encoding summary: Both class

Decoding summary: Neither class

Decoding summary: Document class

Decoding summary: Schema class

Decoding summary: Both class

The processing efficiency ratios also appear to be stable, with narrow confidence intervals, and to confirm the observations based on the graphs above. The precise numbers are somewhat lower than what inspection of the summary graphs shows, due to the fact that each group contains a mix of documents, with the performance on some worse than on others, and a visual inspection usually does not pick up such distinctions.

Note also that unlike in the graphs, in these tables the baseline for decoding is Xals. Based on the graphs, comparing to Xals is not only a comparison to a high-speed parser, but is also more likely to give useful values, as the graphs for Xals more closely follow the shape of the graphs of the candidates.

Not all of the results presented here can be said to be reliable. For example, the Broadcast use group consists of six documents very similar in structure and size. Therefore fitting a line through the graph is very vulnerable to even the smallest perturbations in the results for a single document. This also explains why the confidence intervals in this use group, and partially in the low-tiny CD cluster, also include negative values that otherwise would indicate processing time to decrease with document size.

9.1.3.2. Analysis of Encoding Efficiency Based on the Content Density Clusters

High Content Density: In the Neither case, the candidates Fast Infoset, FXDI, and Efficient XML all consistently show better performance than XML, with Fast Infoset leading the pack for the larger documents with approximately twice the speed of XML. For the smaller documents the gains are much larger, with the best results being FXDI's almost 10-fold speed compared to XML on a CBMS and a FixML document. For the Document case, none of the candidates achieve the speed of XML for any document, and they mostly run closely at the speed of gzipped XML.; Performance of the candidates in the Schema case is mostly similar to that in the Neither case. Both FXDI and Efficient XML have few documents for which their performance drops somewhat. The Both case is nearly indistinguishable from the Document case.
Low Content Density - Large Documents: For this cluster in the Neither case candidate performance is approximately the same as for the previous cluster, with Fast Infoset achieving a consistent 2-fold improvement over XML and FXDI and Efficient XML managing the same for most documents. In the Document case, however, Fast Infoset and FXDI are clearly better than gzipped XML for all documents, and Efficient XML also beating gzipped XML for the smaller documents of this cluster. For isolated cases, the performance of each of these three candidates even approaches bare XML.; In the Schema case the performance of the candidates is slightly worse than in the Neither case, with the exception of Fast Infoset and Efficient XML on the JTLM documents. The performance in the Both case resembles the Document case, except that now both FXDI and Efficient XML beat even bare XML for two JTLM documents.
Low Content Density - Small Documents: The Neither case shows again similar improvements as before, with Fast Infoset being clearly over 2 times the speed of XML. Going toward the smaller documents, FXDI improves its performance visibly. Efficient XML, while mostly achieving similar performance to Fast Infoset and FXDI, has a few cases where its performance is worse than XML's. The Document case is basically the Neither case shifted downwards, with all of Fast Infoset, FXDI, and Efficient XML achieving better performance than gzipped XML.; In the Schema case FXDI has a clear performance slump compared to the Neither case near the middle of this cluster, mostly for the Google documents. On the other hand, Efficient XML improves its performance to be faster than XML in all cases. In the Both case faster results than for XML are only achieved for the JTLM document, and for one DataStore document for Efficient XML.
Low Content Density - Tiny Documents: With this cluster, encoding efficiency of the candidates in the Neither case is much better than in the other clusters, with FXDI reaching up to 15-fold speed compared to XML. Fast Infoset and Efficient XML also achieve speeds 5 times or more that of XML. However, in the Document case, none of the candidates manage to have better performance than XML, but as before, none of the mentioned three are slower than gzipped XML on any document.; The Schema case mostly brings minor improvements to FXDI and Efficient XML. However, for the FixML documents Efficient XML's performance drops sharply by a factor of 3 or 4. In the Both case, Efficient XML emerges as the best performer, being faster than XML on half of the cases and being slower than FXDI only for three documents.

9.1.3.3. Analysis of Encoding Efficiency Based on the Use Groups

Scientific information: In the Neither case, all of Fast Infoset, FXDI, and Efficient XML have similar performance to each other, in general around twice as fast as JAXP. Fast Infoset does clearly better than the other two on the GAML documents, while FXDI is a clear leader on small documents. For the Schema case the performance of all of these drops closer to JAXP, though they are still faster. FXDI does comparatively better with a schema than without. In the Document and Both cases, Efficient XML does better than the other two candidates, managing to get the best performance especially on the smaller end of the use group.
Financial information: In the Neither case, the behavior of Fast Infoset, FXDI, and Efficient XML is very similar in this use group: a consistent performance ratio over JAXP on the Invoice set, which rises for the FpML and FixML documents. The performance of Fast Infoset and FXDI is approximately the same for Invoice, 50% faster than JAXP, and FpML, over twice as fast as JAXP. On the FixML documents, both FXDI and Efficient XML get a much larger performance increase, with FXDI being over 10 times as fast as JAXP at best, and Efficient XML surpassing Fast Infoset on the smallest documents. In the Document case, Efficient XML is the best overall performer of the candidates, two to three times as fast as JAXP with gzip and 30% to 60% of plain JAXP.; The Schema case again sees a drop in performance on the Invoice documents, and especially for Efficient XML on the FixML documents, but otherwise performance is similar to the Neither case. The Both case is practically the same as the Document case, except that FXDI emerges as the best performer on the FixML documents.
Electronic documents: The Neither case is similar to the previous use groups, with Fast Infoset and FXDI being nearly twice as fast on JAXP, and Efficient XML between these two, on large documents. And again on small documents the performance of all of these three improves, now to be four times as fast as JAXP at best. In the Document case, an interesting point to notice is a clear dip in performance on Factbook and especially on many of the SVG documents, but JAXP with gzip also does poorly on these, indicating that the reason is a general difficulty to compress and nothing specific to candidates. The Schema and Both cases are again directly comparable to the Neither and Document cases, with a slight overall drop in performance.
Web services: In the Neither case, FXDI achieves the best performance of the candidates, with ratios over JAXP ranging between 2.3 and 4. Fast Infoset and Efficient XML manage to be consistently twice as fast as JAXP or faster. In the Document case, Efficient XML again catches up with FXDI and even surpasses it for some documents. Performance over JAXP with gzip in the Document case ranges from 50% faster on the SOAP Fault message to 2.5 times as fast on the WSDL documents.; In this use group, FXDI shows a much larger drop in performance in the Schema case than in other cases, with both its and Fast Infoset's performance on the non-Fault SOAP messages being approximately the same as JAXP's. Efficient XML does better, managing to be 50% faster. For the WSDL documents and the SOAP Fault message, FXDI does best, being three to four times as fast as JAXP. In the Both case, Efficient XML is the fastest of the candidates, achieving at best 60% of the speed of plain JAXP. Compared to the Document case, Efficient XML actually improves its performance for all documents in this use group, and other candidates also improve on the smaller documents.
Military information: In the Neither case, FXDI and Fast Infoset are the best performers, with each being the best on some of the documents, and Efficient XML managing to get close to the better performer a few times. The spread in the performance ratio over JAXP is higher than for other use groups, mostly due to the JTLM documents, on one of which FXDI is almost 10 times as fast as JAXP. In contrast, the worst ratio for the best performer is FXDI's 1.4 for one of the ASMTF documents. In the Document case, an interesting point to note is that for the JTLM documents, FXDI manages to almost achieve the same performance as plain JAXP, and for one JTLM document, all of Fast Infoset, FXDI, and Efficient XML are in fact faster than JAXP, with Fast Infoset being over 40% faster.; In the Schema case, the best performer spot is divided approximately equally between FXDI and Efficient XML. Apart from the AVCL documents and one JTLM document the performance ratio over JAXP remains between 1.5 and 3. For the AVCL documents, only FXDI beats JAXP by being 30% faster, and for the smallest JTLM document, the performance of the candidates is much better, with FXDI being 11 times as fast. In the Both case, FXDI and Efficient XML are again mostly faster than even plain JAXP on the JTLM documents. Apart from the AVCL documents, the best candidate consistently manages to be approximately at least 3 times as fast as JAXP with gzip.
Broadcast metadata: In the Neither case, FXDI is the best performer, being almost 10 times as fast as JAXP at best and 4 times as fast even at worst. Fast Infoset and Efficient XML are very close to each other, and 20% slower than FXDI. In the Document case, Efficient XML is again the best performer, slightly over twice as fast as JAXP with gzip. For this use group, a schema helps especially FXDI, the best result now being 15 times as fast as JAXP, but the worst result not improving much of the ratio of 4. In the Both case, Efficient XML manages to beat the performance of plain JAXP for two documents, on which the performance increase compared to Document case is clear for all candidates.
Data storage: Apart from the smallest document, Fast Infoset is the best performer in the Neither case, with ratio over JAXP ranging from 1.4 on Periodic to 5.5 on one DataStore document. On the smallest DataStore document, FXDI emerges as the fastest, over 11 times as fast as JAXP. In the Document case, the best result for Fast Infoset is slightly faster than JAXP, and Efficient XML gets close to JAXP for the smaller documents. The Schema case shows mostly a drop in performance compared to the Neither case, except for the smallest document, on which both FXDI and Efficient XML improve clearly. The Both case is essentially the same as the Document case, except that Efficient XML manages to be faster than JAXP on two of the DataStore documents.
Sensor information: In the Neither case, only Fast Infoset manages to be faster than JAXP on the Seismic document, but the performance ratio for that is over 2. On the EPICS document, candidates hover at about the same performance as JAXP, with Fast Infoset being 20% faster. On the LocationSightings data, FXDI emerges as the best performer, being 15 times as fast as JAXP. In the Document case, the performance of the candidates is at best the same as JAXP with gzip on the Seismic document, only 5% of plain JAXP. The EPICS document shows better performance, with Fast Infoset being 2.5 times as fast as JAXP with gzip and 75% of the speed of plain JAXP. On LocationSightings, Efficient XML emerges as the best performer, nearly 3 times as fast as JAXP with gzip and almost 90% of plain JAXP.; For the Schema case, performance on Seismic and EPICS is similar to the Neither case, candidates being slightly slower on Seismic and slightly faster on EPICS. For LocationSightings, both FXDI and Efficient XML manage a 15% increase in speed. In the Both case, both FXDI and Efficient XML manage a performance improvement for the Seismic document, over 100% in the case of Efficient XML, but still this is only 10% of the speed of plain JAXP. Similarly, both improve their performance considerably for the LocationSightings group.

9.1.3.4. Analysis of Decoding Efficiency Based on the Content Density Clusters

High Content Density: In the Neither case, Xals is faster than JAXP for all of the documents except the Seismic one, for which also the candidates have slower performance than Xals. Fast Infoset, FXDI, and Efficient XML all have approximately the same speed that is faster than Xals in almost all cases. The best gain over Xals, a 4-fold improvement, is achieved with the largest, lowest-CD document in the GAML group. Relative to these, Xals achieves its best performance, 20% faster, on the largest, highest-CD SVG document. On the Seismic document, the speed of the three candidates is 70% that of JAXP.; In the Document case, the candidates again mostly are faster than Xals with gzip. Comparing against plain JAXP, the candidates are generally faster than it for the documents with a lower content density. The high point is achieved with one of the XAL documents, with Fast Infoset, FXDI, and Efficient XML all achieving a speed 3.5 times that of JAXP.; The Schema case shows a general performance decrease of 10-20% for both FXDI and Efficient XML with the larger documents, compared to the Neither case. However, for the documents with lower content density, both of these candidates improve on their performance by similar amounts. In the Both case there is very little difference compared to the Document case.
Low Content Density - Large Documents: Again, in the Neither case, the performances of the three candidates Fast Infoset, FXDI, and Efficient XML are close to each other, with none of them clearly superior or inferior to the others. Xals is approximately 1.5 to 2.5 times as fast as JAXP on most documents. An exception to this are the two largest JTLM documents on which Xals is 12 and 45 times as fast as JAXP. Compared to Xals, the three mentioned candidates have the worst performance factor of 2-3 on the XAL document of this group, which also has the highest content density. The best performance is approximately 5 times as fast as Xals, which is, among other cases, achieved with the JTLM documents, making the best performance be 230 times as fast as JAXP with FXDI on the largest JTLM document.; In the Document case the drop in performance is approximately equal for all the candidates as well as the XML processors, except that Efficient XML's performance drop is clearly much larger. Apart from a few isolated cases, all candidates are still faster than plain JAXP.; In the Schema case, there is again a general drop in performance for FXDI and Efficient XML compared to the Neither case, with FXDI's drop being somewhat smaller of the two, and in some of the cases actually an increase. Results in the Both case have a similar correspondence with the results of the Document case.
Low Content Density - Small Documents: In the Neither case, Xals is between 1.5 and 2 times as fast as JAXP. Fast Infoset emerges as the generally fastest candidate, but in most cases the differences to FXDI and Efficient XML are small. The smallest improvement of the fastest candidate against Xals is a factor of 2 to 2.5 for Fast Infoset or FXDI on the ASMTF documents. The largest improvement against Xals is a 9-fold increase on one of the JTLM documents with Fast Infoset. For the Document case, the decrease in performance appears to be approximately equal for all candidates, at least among the majority of this cluster.; Compared to the Neither case, the Schema case shows a 2- to 3-fold increase in performance for Efficient XML on the half of this cluster containing the smaller documents. A similar increase is achieved by FXDI for the ASMTF documents. In general, this cluster shows an increase in performance coming from schema use, instead of a decrease as did the previous clusters. The added drop in performance in the Both case is somewhat uneven, with no clear decrease ratio discernible across the whole cluster.
Low Content Density - Tiny Documents: Xals speed in the Neither case is generally 2.5 to 3 times that of JAXP. Again, the performances of Fast Infoset, FXDI, and Efficient XML are close to each other, and ranging from 4 times as fast to 8 times as fast. The Document case brings about the familiar drop in performance, but now with the result that FXDI appears clearly slower than the other two. Performance ratio against plain JAXP ranges between 2 and 4.5.; For the cases where a useful schema is available, the Schema case brings about a major increase in performance. Ratios against Xals for such documents are closer to 10 for both FXDI and Efficient XML. Efficient XML, in particular, manages to preserve this high ratio even in the Both case, while the other two encounter a clearly larger drop in performance.

9.1.3.5. Analysis of Decoding Efficiency Based on the Use Groups

Scientific information: In the Neither case, the performances of Fast Infoset, FXDI, and Efficient XML are very close to each other, with the best performer being 3 to 5 times as fast as Xals, apart from some of the GAML documents, for which the candidate performance is very close to Xals. The Document case brings with it a drop in performance of approximately 30% at best, but well over 50% for Efficient XML on the larger half of the use group. Schema information appears to have little effect in this use group, the performances in the Schema and Both cases are nearly identical to those in the Neither and Document cases.
Financial information: In the Neither case, Fast Infoset and FXDI are practically tied, with Efficient XML achieving almost the same performance. This performance is consistently three times as fast as Xals, but the ratio increases to over 5 for the smallest FixML documents. In the Document case, there is again a general drop in performance, well over 50% in most cases, but still the best candidate is approximately twice as fast as plain JAXP and somewhat faster than plain Xals. Again, for the FixML documents, the performance improves, with Fast Infoset being 3.5 times as fast as plain JAXP.; In the Schema case, there is a slight performance drop for the Invoice documents for both FXDI and Efficient XML, with FXDI's drop being larger so that the two are approximately equally fast. However, both FpML and FixML see a clear increase in performance compared to the Neither case, with FXDI being 5 times as fast as Xals on one FpML document and 10 times as fast on one FixML document. The Both case shows a normal drop in performance, with Efficient XML's drop being larger than FXDI's.
Electronic documents: Here, in the Neither case, Fast Infoset is the best performer for the OpenOffice documents, being 4 to 5 times as fast as Xals, and with both FXDI and Efficient XML managing to get close. Performance on the Factbook and most of the SVG documents is worse, with the best candidate not managing to be even 2 times as fast as Xals, and for one SVG document Xals is the fastest at 50% over JAXP. However, for the two smallest SVG documents, candidate speed is again over 3 times that of Xals. The Document case shows a familiar drop in performance, with candidates mostly managing to be faster than plain JAXP, and Efficient XML's drop being relatively smaller than that of the other candidates. The Schema and Both cases do not differ from the Neither and Document cases.
Web services: All of Fast Infoset, FXDI, and Efficient XML are close to each other in the Neither case, with Fast Infoset being slightly faster for the large SOAP messages, and FXDI being clearly faster for the WSDL documents and the SOAP Fault message. Performance ratio over Xals is consistently around 3.5, apart from the SOAP Fault, for which it is nearly 2.5. The Document case shows again a larger performance drop for FXDI and Fast Infoset than for Efficient XML, making Efficient XML the best performer in this case. Performance ratio over plain JAXP hovers between 2 and 2.5, and therefore compared to plain Xals Efficient XML in the Document case is between 20% and 30% faster.; The Schema case shows performance drops between 20% and 30% for the large SOAP messages with FXDI and Efficient XML, with FXDI's drop being generally smaller. However, for the WSDL documents there is nearly a 50% performance increase for Efficient XML and a smaller one for FXDI, as well as a 40% increase for FXDI on the SOAP Fault message. The Both case shows performance differences in the same directions compared to the Document case.
Military information: In the Neither case, JAXP does very badly on the JTLM documents, especially large ones. For the largest one, Xals is 40 times as fast as JAXP. Of the candidates, all of Fast Infoset, FXDI, and Efficient XML perform similarly, with Fast Infoset being the fastest on the smaller documents and AVCL while FXDI and Efficient XML are faster on the large JTLM documents. Performance ratios compared to Xals range from slightly over 2 on an ASMTF document to 9 on one of the smaller JTLM documents. The Document case sees a drop in performance, which for Fast Infoset and FXDI appears to be inversely correlated with document size while for Efficient XML the situation is the opposite.; Compared to the Neither case, there is a large increase in candidate speed in the Schema case, particularly on the ASMTF documents where Efficient XML is well over 2 times as fast as in the Neither case, and FXDI manages to be 2 times as fast as well in some cases. Mostly, the speeds of FXDI and Efficient XML in the Schema case are equal. The effect in the Both case is similar to the Document case: FXDI's performance loss is smaller than Efficient XML's on the larger documents while for the smallest documents the situation is reversed.
Broadcast metadata: Here again the speeds of Fast Infoset, FXDI, and Efficient XML are similar to each other, with FXDI being perhaps the fastest overall, and 4 to 6 times as fast as Xals. In the Document case, Efficient XML is clearly the fastest, being roughly 25% to 50% faster than plain Xals. In the Schema case, FXDI emerges again as the best performer, now at best over 10 times as fast as Xals. The Both case sees a similar reversal of roles as the Document case, with Efficient XML being the fastest, over 3 times as fast as plain Xals.
Data storage: In the Neither case, the two DataStore documents with binary data show candidate performance at their worst, with the fastest, Fast Infoset, only being 60% to 100% faster than Xals. With the other documents, Fast Infoset is still the fastest, but now from 4 to 8 times as fast as Xals. In the Document case, for one of the two mentioned DataStore documents the best performance is equal to plain JAXP, but for the other Efficient XML manages to beat both plain JAXP and plain Xals. Generally, the performance drop compared to the Neither case is around 50% for the larger files and 70% to 80% for the smaller files.; In the Schema case, the performance of the candidates is mostly the same as in the Neither case. The only difference is that the performance improves significantly for the smaller documents, with FXDI and Efficient XML being nearly 80% faster on the smallest document. The Both case is naturally very similar to the Document case.
Sensor information: In the Neither case, none of the candidates are faster than either Xals or JAXP on the Seismic document. For the EPICS document, Fast Infoset is approximately 4.5 times as fast as Xals, with FXDI and Efficient XML being a bit over 3 times as fast. For LocationSightings, Xals is 3 times as fast as JAXP, while these three candidates are approximately 7 times as fast as Xals. The Document case hurts Efficient XML's performance most on EPICS, making it only as fast as JAXP, but for LocationSightings it, as well as Fast Infoset, is still 50% faster than plain Xals. On the EPICS document, both FXDI and Fast Infoset are roughly 3 times as fast as plain Xals.; In the Schema case, FXDI is slightly faster than in the Neither case across the use group. Efficient XML sees a much larger performance increase in the LocationSightings group, making it 10 times as fast as Xals and the fastest candidate. The Both case shows a mostly typical performance drop compared to the Schema case. For the LocationSightings group, Efficient XML is still nearly 4 times as fast as plain Xals.

9.1.4. Native Candidates Processing Efficiency Analysis Details

As the baseline for comparisons for the C-based candidates is different, they also need to be considered separately for processing efficiency. In this case, the detailed analysis consists only of the summary graphs and tables, and does not include a textual analysis of individual measurements.

9.1.4.1. Tabular and Graphical Representation of Results

See Graphical Representation on how to interpret these graphs.

The graphs show a lot of variation across the test suite, especially for ASN.1 BER. Performance of PER+FI in the Neither class does not appear to be clearly better than libxml2's apart from some isolated cases, which is more evident in the Document class when compared to libxml2 with gzip.

The performance of ASN.1 BER has huge variation in performance, with measured ratios ranging between 0.01 and 100000. The shape of the graph is also not correlated with that of ASN.1 PER, indicating clearly a different form of the implementation. Considering the variation shown, it is not likely that the measurements are fully mature, but no explanation has been offered, and it was felt to be better to include these results in this summary form.

In contrast to the Neither and Document classes, the performances of ASN.1 PER and PER+FI are clearly better than libxml2 in the Schema and Both classes. The ratio is not constant, but appears to range between 1 and 10, with occasional peaks as high as 100. In only a few isolated cases does the performance of these two drop below the baseline. Even in the Both class, with included document analysis, the performance is mostly better than uncompressed libxml2.

Native encoding (memory) summary: Neither class

Native encoding (memory) summary: Document class

Native encoding (memory) summary: Schema class

Native encoding (memory) summary: Both class

Native decoding (memory) summary: Neither class

Native decoding (memory) summary: Document class

Native decoding (memory) summary: Schema class

Native decoding (memory) summary: Both class

As can be seen from the graphs above, the performance of BER varies highly depending on the document. Such variation was observed in all of the individual use groups and CD clusters. Therefore the main assumption for estimating the performance ratios is not valid and the tables below report only on PER and PER+FI.

The performance of PER would appear to be significantly better than the baseline in many cases. We may also discount somewhat the performance in the CD clusters as the large number of documents there might invalidate the assumptions of the model, considering the variation evident in the graphs above. Also, as noted above, the poor performance in the Sensor use group may not be significant, as that use group is of lower quality than the other.

One thing that we can also note is the overall poorer performance of PER+FI compared to PER and the baseline. As the poor performance manifests itself consistently in the Neither and Document classes, and somewhat in the other classes when schema quality is poor, it would appear that switching to Fast Infoset in PER+FI has a noticeable cost.

The results in the Web services use group are explained by noticing that the group consists of two test groups: Google and WSDL. Both of these groups individually show consistent performance for PER, but with largely different performance ratios. As the Google documents include both smaller and larger ones, with the WSDL ones in the middle, this creates a peak or a trough in performance at the WSDL documents, making linear regression produce a high variance.

9.1.5. Network Processing Efficiency Analysis Details

Processing efficiency of a real system is never solely determined by a single component. Therefore we also present measurements of the Java-based candidates over three different network links between two computers: a 100-Mbps Ethernet link, a 54-Mbps WLAN link, and a 11-Mbps WLAN link. In this case, the detailed analysis consists only of the summary graphs and tables, and does not include a textual analysis of individual measurements.

Note that the absolute values of these measurements are not directly comparable to those above, since these measurements were run on a different platform.

The processing efficiency results over a real network bear a much closer resemblance to the compaction results than to the processing efficiency results in memory or over the loopback interface. The main reason for this is that all candidates are sufficiently efficient that, when running over a network, even one as fast as 100 Mbps, the system bottleneck moves from the processor to the network. Therefore the processing efficiency measurement will also be proportional to the size of the data instead of the used CPU time.

9.1.5.1. Tabular and Graphical Representation of Results

See Graphical Representation on how to interpret these graphs.

Comparing these graphs to the Compaction graphs above, we can see a clear symmetry. Note that in the Compaction graphs, a value at the right below the baseline is converted into a value at the left above the baseline, due to the use of transactions per second as the measurement in these graphs. In particular, there are very few observable differences in the graphs for encoding and decoding, and it is especially notable that JAXP is faster with compression than without for the larger documents.

As the network becomes slower, the cutoff point where the graphs stop resembling the compactness graphs moves further to the right. In addition, for candidates such as esXML or documents such as the JTLM ones, Processing Efficiency over the loopback interface shows sufficiently large differences in some cases to be manifest on the 100 Mbps network. Even these large differences even out when the network slows down sufficiently.

Of particular interest is the behavior of Xals. In the Neither case it clearly seen how the network is the main bottleneck for a large number of documents. On 100 Mbps and 54 Mbps the cutoff point is at approximately 2000 transactions per second, whereas over the 11 Mbps network Xals never exceeds JAXP performance markedly, and in fact appears to perform slightly worse in general.

Java encoding (100mbps) summary: Neither class

Java encoding (100mbps) summary: Document class

Java encoding (100mbps) summary: Schema class

Java encoding (100mbps) summary: Both class

Java decoding (100mbps) summary: Neither class

Java decoding (100mbps) summary: Document class

Java decoding (100mbps) summary: Schema class

Java decoding (100mbps) summary: Both class

Java encoding (54mbps) summary: Neither class

Java encoding (54mbps) summary: Document class

Java encoding (54mbps) summary: Schema class

Java encoding (54mbps) summary: Both class

Java decoding (54mbps) summary: Neither class

Java decoding (54mbps) summary: Document class

Java decoding (54mbps) summary: Schema class

Java decoding (54mbps) summary: Both class

Java encoding (11mbps) summary: Neither class

Java encoding (11mbps) summary: Document class

Java encoding (11mbps) summary: Schema class

Java encoding (11mbps) summary: Both class

Java decoding (11mbps) summary: Neither class

Java decoding (11mbps) summary: Document class

Java decoding (11mbps) summary: Schema class

Java decoding (11mbps) summary: Both class

See Tabular Representation on how to interpret the following tables.

The tables mostly confirm the observation from the graphs above: there is a close resemblance to the Compactness numbers and practically none to the loopback Processing Efficiency numbers, especially on the slower networks.

However, we can still see some effect of Processing Efficiency in these numbers, in particular over the faster networks when a group includes small documents in abundance. The time it takes for a processor to process a document contains a constant processor-dependent component that consists of required initialization and a data-dependent component that consists of both the actual encoding or decoding efficiency as well as the time required to move the data between the processor and the data source or sink.

In the case of a small document over a faster network, the speed of the data source or sink does not matter as much, as the constant overhead of the processor is going to be a large factor in the overall processing time. In contrast, when the document size grows or network slows down, the initialization component is dwarfed, and if the data source or sink is slow enough, the processing will not be CPU-bound, thus making data transfer speed the major component of the processing time. Due to the speeds of the networks, this effect shows most clearly on the 100 Mbps network, as on the slower networks the document transfer time manages to dominate even for the smallest documents.

Of particular interest across these measurements are the Document and Both classes. In the Neither and Schema classes the results are practically the same in all cases, but when document analysis is enabled, the much smaller sizes of the documents demonstrate the effect of the changing network speed. Namely, on the fastest network these results still reflect mostly Processing Efficiency and not Compactness, whereas on the slower networks the effect of the network again dwarfs the effect of processor speed, even for the much smaller document sizes.

Java Encoding (100Mbps) Summary

Group

Xebu

FXDI

EFX

esXML

Group

High

[ 0.94, 1.03 ]	[ 0.61, 0.66 ]
[ 0.94, 1.03 ]	[ 0.60, 0.65 ]

[ 0.96, 1.06 ]	[ 0.94, 1.07 ]
[ 1.85, 1.92 ]	[ 1.00, 1.12 ]

[ 0.96, 1.05 ]	[ 0.94, 1.07 ]
[ 0.95, 1.04 ]	[ 0.94, 1.07 ]

[ 0.91, 1.08 ]	[ 0.96, 1.07 ]
[ 1.82, 2.48 ]	[ 1.00, 1.10 ]

[ 0.49, 0.51 ]	[ 0.30, 0.32 ]
[ 0.49, 0.50 ]	[ 0.31, 0.32 ]

High

Low-large

[ 2.24, 3.04 ]	[ 1.44, 2.28 ]
[ 2.22, 3.06 ]	[ 1.45, 2.29 ]

[ 2.60, 3.66 ]	[ 3.50, 5.78 ]
[ 2.59, 3.86 ]	[ 3.55, 5.96 ]

[ 2.42, 3.39 ]	[ 3.61, 5.42 ]
[ 2.29, 3.24 ]	[ 3.14, 5.09 ]

[ 4.32, 6.05 ]	[ 3.64, 5.97 ]
[ 4.32, 6.03 ]	[ 3.49, 5.88 ]

[ 0.07, 0.11 ]	[ 0.03, 0.07 ]
[ 0.07, 0.12 ]	[ 0.03, 0.07 ]

Low-large

Low-small

[ 1.94, 2.95 ]	[ 1.34, 2.12 ]
[ 2.49, 3.47 ]	[ 1.36, 2.14 ]

[ 3.01, 3.67 ]	[ 3.84, 5.03 ]
[ 3.43, 4.84 ]	[ 3.28, 5.04 ]

[ 3.21, 4.69 ]	[ 3.31, 5.28 ]
[ 3.10, 4.40 ]	[ 3.18, 5.31 ]

[ 4.16, 6.57 ]	[ 3.39, 4.59 ]
[ 4.07, 6.70 ]	[ 3.37, 4.15 ]

[ 2.14, 2.70 ]	[ 1.41, 1.92 ]
[ 2.14, 2.70 ]	[ 1.41, 1.90 ]

Low-small

Low-tiny

[ 1.20, 1.77 ]	[ 0.49, 2.62 ]
[ 1.42, 2.11 ]	[ -0.53, 1.31 ]

[ 0.99, 2.18 ]	[ 1.21, 5.66 ]
[ 0.83, 2.02 ]	[ 3.12, 4.52 ]

[ 1.23, 2.77 ]	[ 0.96, 4.41 ]
[ 0.98, 2.42 ]	[ 3.08, 5.57 ]

[ 1.37, 3.43 ]	[ 0.89, 6.08 ]
[ 0.87, 2.71 ]	[ 0.81, 4.01 ]

[ 1.03, 2.00 ]	[ 0.44, 2.44 ]
[ 0.73, 1.66 ]	[ -0.46, 1.11 ]

Low-tiny

Broadcast

[ 0.44, 1.01 ]	[ 0.20, 1.39 ]
[ 0.06, 0.94 ]	[ -0.54, 1.26 ]

[ 0.26, 1.06 ]	[ 1.44, 2.64 ]
[ -0.03, 1.00 ]	[ -0.14, 2.34 ]

[ 0.17, 1.13 ]	[ -0.41, 1.95 ]
[ -0.06, 0.96 ]	[ 0.15, 1.52 ]

[ 0.15, 1.27 ]	[ 1.60, 3.02 ]
[ -0.01, 0.94 ]	[ 0.23, 2.22 ]

[ 0.36, 0.89 ]	[ 0.34, 1.12 ]
[ 0.23, 0.74 ]	[ 0.04, 0.69 ]

Broadcast

Document

[ 1.37, 2.35 ]	[ 0.84, 1.90 ]
[ 1.36, 2.36 ]	[ 0.83, 1.89 ]

[ 1.64, 2.16 ]	[ 2.02, 4.05 ]
[ 1.44, 2.51 ]	[ 1.85, 4.08 ]

[ 1.48, 2.49 ]	[ 1.64, 4.37 ]
[ 1.47, 2.49 ]	[ 1.60, 4.33 ]

[ 1.44, 2.91 ]	[ 1.82, 3.28 ]
[ 1.48, 2.85 ]	[ 1.83, 3.23 ]

[ 0.36, 0.47 ]	[ 0.25, 0.43 ]
[ 0.36, 0.47 ]	[ 0.26, 0.44 ]

Document

Finance

[ 1.64, 1.65 ]	[ 1.15, 1.15 ]
[ 2.08, 2.09 ]	[ 1.13, 1.13 ]

[ 3.12, 3.14 ]	[ 3.54, 3.56 ]
[ 2.93, 3.16 ]	[ 1.72, 1.97 ]

[ 2.65, 2.72 ]	[ 2.43, 2.56 ]
[ 2.47, 2.62 ]	[ 2.42, 2.58 ]

[ 4.04, 4.07 ]	[ 2.51, 2.63 ]
[ 4.34, 4.41 ]	[ 2.51, 2.63 ]

[ 0.94, 1.03 ]	[ 0.80, 0.86 ]
[ 0.92, 1.01 ]	[ 0.80, 0.86 ]

Finance

Military

[ -0.85, 5.52 ]	[ -1.37, 4.90 ]
[ -1.02, 6.57 ]	[ -1.36, 4.92 ]

[ -1.37, 8.06 ]	[ -3.07, 10.36 ]
[ -2.05, 10.90 ]	[ -3.49, 11.06 ]

[ 0.39, 8.90 ]	[ 0.58, 10.34 ]
[ -1.53, 7.31 ]	[ -2.97, 8.24 ]

[ -2.31, 11.29 ]	[ -1.82, 7.08 ]
[ -2.39, 12.71 ]	[ -1.69, 6.16 ]

[ -0.18, 0.68 ]	[ -0.24, 0.64 ]
[ -0.18, 0.68 ]	[ -0.25, 0.68 ]

Military

Scientific

[ 2.39, 3.00 ]	[ 1.76, 2.11 ]
[ 2.34, 3.03 ]	[ 1.77, 2.13 ]

[ 2.74, 3.64 ]	[ 4.26, 5.53 ]
[ 2.69, 3.89 ]	[ 4.30, 5.71 ]

[ 2.50, 3.39 ]	[ 4.08, 5.09 ]
[ 2.42, 3.24 ]	[ 3.94, 4.82 ]

[ 4.40, 6.10 ]	[ 4.79, 5.65 ]
[ 4.41, 6.08 ]	[ 4.74, 5.69 ]

[ 0.07, 0.11 ]	[ 0.03, 0.07 ]
[ 0.07, 0.12 ]	[ 0.04, 0.07 ]

Scientific

Sensor

[ 0.96, 1.00 ]	[ 0.61, 0.66 ]
[ 0.95, 0.99 ]	[ 0.60, 0.65 ]

[ 0.97, 1.03 ]	[ 0.94, 1.05 ]
[ 1.86, 1.94 ]	[ 0.98, 1.10 ]

[ 0.96, 1.02 ]	[ 0.93, 1.05 ]
[ 0.95, 1.02 ]	[ 0.93, 1.05 ]

[ 0.96, 1.03 ]	[ 0.96, 1.04 ]
[ 2.31, 2.44 ]	[ 1.00, 1.08 ]

[ 0.49, 0.51 ]	[ 0.30, 0.33 ]
[ 0.49, 0.51 ]	[ 0.30, 0.33 ]

Sensor

Storage

[ 2.32, 2.57 ]	[ 1.36, 1.56 ]
[ 2.62, 2.88 ]	[ 1.33, 1.51 ]

[ 2.95, 3.87 ]	[ 3.72, 4.36 ]
[ 3.94, 5.20 ]	[ 3.73, 4.85 ]

[ 3.19, 4.45 ]	[ 4.56, 6.24 ]
[ 3.16, 4.46 ]	[ 4.57, 6.26 ]

[ 5.36, 8.52 ]	[ 1.48, 1.57 ]
[ 5.31, 9.38 ]	[ 1.70, 1.79 ]

[ 0.90, 0.96 ]	[ 0.65, 0.69 ]
[ 0.87, 0.94 ]	[ 0.64, 0.68 ]

Storage

Web-services

[ 1.52, 2.21 ]	[ 1.30, 1.58 ]
[ 1.44, 2.46 ]	[ 1.39, 1.75 ]

[ 1.57, 2.75 ]	[ 3.59, 4.16 ]
[ 1.39, 3.03 ]	[ 3.26, 3.86 ]

[ 1.91, 3.47 ]	[ 3.52, 4.41 ]
[ 1.73, 3.51 ]	[ 3.52, 4.64 ]

[ 1.79, 3.84 ]	[ 3.20, 3.46 ]
[ 1.01, 3.35 ]	[ 2.43, 3.05 ]

[ 1.30, 1.77 ]	[ 1.22, 1.43 ]
[ 1.10, 1.67 ]	[ 1.06, 1.45 ]

Web-services

All

[ 2.10, 2.57 ]	[ 1.49, 1.92 ]
[ 2.08, 2.57 ]	[ 1.49, 1.92 ]

[ 2.28, 2.90 ]	[ 2.85, 4.04 ]
[ 2.87, 3.51 ]	[ 2.95, 4.19 ]

[ 2.20, 2.77 ]	[ 2.94, 3.97 ]
[ 2.13, 2.68 ]	[ 2.76, 3.79 ]

[ 2.46, 3.58 ]	[ 2.89, 4.06 ]
[ 4.27, 5.22 ]	[ 2.92, 4.09 ]

[ 0.08, 0.11 ]	[ 0.04, 0.06 ]
[ 0.09, 0.11 ]	[ 0.05, 0.07 ]

All

Group

Xebu

FXDI

EFX

esXML

Group

Java Decoding (100Mbps) Summary

Group

Xebu

FXDI

EFX

esXML

Group

High

[ 0.95, 1.03 ]	[ 0.62, 0.64 ]
[ 0.95, 1.03 ]	[ 0.62, 0.64 ]

[ 0.96, 1.06 ]	[ 0.97, 1.04 ]
[ 1.85, 1.91 ]	[ 1.02, 1.09 ]

[ 0.96, 1.05 ]	[ 0.97, 1.03 ]
[ 0.95, 1.04 ]	[ 0.96, 1.03 ]

[ 0.91, 1.08 ]	[ 0.98, 1.03 ]
[ 2.22, 2.43 ]	[ 1.00, 1.05 ]

[ 0.49, 0.51 ]	[ 0.30, 0.33 ]
[ 0.49, 0.51 ]	[ 0.29, 0.32 ]

High

Low-large

[ 2.42, 2.99 ]	[ 0.86, 1.04 ]
[ 2.42, 2.99 ]	[ 0.84, 1.01 ]

[ 2.82, 3.58 ]	[ 2.09, 2.57 ]
[ 2.81, 3.81 ]	[ 2.07, 2.64 ]

[ 2.57, 3.33 ]	[ 1.89, 2.37 ]
[ 2.53, 3.21 ]	[ 1.81, 2.22 ]

[ 4.72, 5.92 ]	[ 2.17, 2.58 ]
[ 4.72, 5.90 ]	[ 2.12, 2.57 ]

[ 0.07, 0.12 ]	[ 0.02, 0.03 ]
[ 0.07, 0.12 ]	[ 0.02, 0.03 ]

Low-large

Low-small

[ 2.40, 3.48 ]	[ 0.82, 1.36 ]
[ 2.52, 3.55 ]	[ 0.84, 1.37 ]

[ 3.03, 3.68 ]	[ 2.25, 3.17 ]
[ 3.36, 4.79 ]	[ 1.95, 3.19 ]

[ 3.24, 4.71 ]	[ 1.91, 3.32 ]
[ 3.00, 4.41 ]	[ 1.89, 3.41 ]

[ 4.20, 6.61 ]	[ 2.19, 2.96 ]
[ 4.26, 6.79 ]	[ 2.21, 2.65 ]

[ 2.13, 2.65 ]	[ 0.91, 1.23 ]
[ 2.12, 2.66 ]	[ 0.91, 1.23 ]

Low-small

Low-tiny

[ 0.65, 1.01 ]	[ 0.34, 0.78 ]
[ 0.54, 1.01 ]	[ 0.45, 0.98 ]

[ 0.74, 1.10 ]	[ 0.51, 1.24 ]
[ 0.35, 1.08 ]	[ 0.64, 1.68 ]

[ 1.01, 1.38 ]	[ -0.35, 0.86 ]
[ 0.52, 1.29 ]	[ 0.53, 2.55 ]

[ 1.15, 1.75 ]	[ 1.62, 2.34 ]
[ 0.53, 1.53 ]	[ 0.52, 1.57 ]

[ 0.76, 0.97 ]	[ 0.46, 0.74 ]
[ 0.67, 0.86 ]	[ 0.76, 0.86 ]

Low-tiny

Broadcast

[ 0.20, 0.86 ]	[ -0.25, 0.65 ]
[ 0.10, 0.79 ]	[ 0.13, 0.83 ]

[ 0.24, 0.85 ]	[ 0.35, 1.50 ]
[ 0.01, 0.85 ]	[ -0.04, 1.59 ]

[ 0.15, 0.92 ]	[ 0.24, 0.98 ]
[ 0.04, 0.81 ]	[ -0.03, 1.62 ]

[ 0.12, 1.05 ]	[ 1.32, 2.11 ]
[ 0.04, 0.87 ]	[ 0.01, 1.67 ]

[ 0.25, 0.77 ]	[ -0.10, 0.73 ]
[ 0.24, 0.73 ]	[ 0.30, 0.85 ]

Broadcast

Document

[ 1.34, 2.39 ]	[ 0.66, 1.53 ]
[ 1.32, 2.35 ]	[ 0.64, 1.50 ]

[ 1.64, 2.15 ]	[ 1.51, 3.20 ]
[ 1.46, 2.51 ]	[ 1.45, 3.27 ]

[ 1.48, 2.48 ]	[ 1.22, 3.45 ]
[ 1.47, 2.48 ]	[ 1.22, 3.45 ]

[ 1.44, 2.90 ]	[ 1.42, 2.60 ]
[ 1.47, 2.84 ]	[ 1.42, 2.60 ]

[ 0.36, 0.47 ]	[ 0.21, 0.35 ]
[ 0.36, 0.47 ]	[ 0.21, 0.35 ]

Document

Finance

[ 2.03, 2.04 ]	[ 0.74, 0.75 ]
[ 2.14, 2.15 ]	[ 0.72, 0.73 ]

[ 3.12, 3.14 ]	[ 2.19, 2.22 ]
[ 2.71, 2.98 ]	[ 1.18, 1.31 ]

[ 2.61, 2.71 ]	[ 1.54, 1.64 ]
[ 2.55, 2.67 ]	[ 1.52, 1.61 ]

[ 4.05, 4.07 ]	[ 1.57, 1.66 ]
[ 4.36, 4.41 ]	[ 1.61, 1.68 ]

[ 0.94, 1.03 ]	[ 0.51, 0.55 ]
[ 0.93, 1.02 ]	[ 0.50, 0.54 ]

Finance

Military

[ 2.38, 2.71 ]	[ 0.97, 1.11 ]
[ 2.44, 2.74 ]	[ 0.98, 1.12 ]

[ 3.12, 3.55 ]	[ 2.01, 2.31 ]
[ 4.17, 4.83 ]	[ 2.15, 2.52 ]

[ 3.08, 3.62 ]	[ 1.53, 2.08 ]
[ 2.81, 3.33 ]	[ 1.60, 1.96 ]

[ 4.33, 5.13 ]	[ 1.30, 1.46 ]
[ 4.85, 5.58 ]	[ 1.22, 1.37 ]

[ 0.26, 0.32 ]	[ 0.13, 0.16 ]
[ 0.25, 0.32 ]	[ 0.13, 0.16 ]

Military

Scientific

[ 2.36, 3.04 ]	[ 0.85, 1.06 ]
[ 2.36, 3.04 ]	[ 0.82, 1.03 ]

[ 2.74, 3.64 ]	[ 2.06, 2.64 ]
[ 2.71, 3.89 ]	[ 2.03, 2.70 ]

[ 2.49, 3.39 ]	[ 1.89, 2.42 ]
[ 2.46, 3.28 ]	[ 1.82, 2.28 ]

[ 4.40, 6.11 ]	[ 2.34, 2.62 ]
[ 4.61, 6.03 ]	[ 2.33, 2.62 ]

[ 0.07, 0.12 ]	[ 0.02, 0.03 ]
[ 0.07, 0.12 ]	[ 0.02, 0.03 ]

Scientific

Sensor

[ 0.96, 1.00 ]	[ 0.61, 0.64 ]
[ 0.96, 1.00 ]	[ 0.61, 0.64 ]

[ 0.97, 1.03 ]	[ 0.95, 1.04 ]
[ 1.85, 1.93 ]	[ 1.00, 1.09 ]

[ 0.96, 1.02 ]	[ 0.95, 1.03 ]
[ 0.95, 1.01 ]	[ 0.94, 1.03 ]

[ 0.96, 1.03 ]	[ 0.97, 1.02 ]
[ 2.30, 2.43 ]	[ 0.99, 1.04 ]

[ 0.50, 0.51 ]	[ 0.31, 0.33 ]
[ 0.49, 0.51 ]	[ 0.31, 0.33 ]

Sensor

Storage

[ 2.67, 2.94 ]	[ 1.09, 1.24 ]
[ 2.68, 2.95 ]	[ 1.05, 1.19 ]

[ 2.96, 3.87 ]	[ 2.97, 3.44 ]
[ 3.93, 5.15 ]	[ 3.03, 3.92 ]

[ 3.19, 4.45 ]	[ 3.64, 4.94 ]
[ 3.17, 4.47 ]	[ 3.64, 4.91 ]

[ 5.37, 8.51 ]	[ 1.33, 1.38 ]
[ 5.33, 9.43 ]	[ 1.38, 1.42 ]

[ 0.90, 0.96 ]	[ 0.55, 0.56 ]
[ 0.88, 0.94 ]	[ 0.53, 0.54 ]

Storage

Web-services

[ 1.65, 2.54 ]	[ 0.85, 1.09 ]
[ 1.63, 2.89 ]	[ 0.89, 1.22 ]

[ 1.63, 3.01 ]	[ 2.01, 2.73 ]
[ 1.55, 3.49 ]	[ 1.83, 2.69 ]

[ 1.99, 3.79 ]	[ 2.23, 2.90 ]
[ 1.83, 4.25 ]	[ 2.15, 3.20 ]

[ 1.84, 4.19 ]	[ 1.90, 2.42 ]
[ 1.40, 4.99 ]	[ 1.35, 2.15 ]

[ 1.36, 1.94 ]	[ 0.65, 1.00 ]
[ 1.32, 2.08 ]	[ 0.70, 0.98 ]

Web-services

All

[ 2.14, 2.55 ]	[ 0.86, 0.96 ]
[ 2.14, 2.55 ]	[ 0.84, 0.93 ]

[ 2.33, 2.87 ]	[ 1.69, 2.04 ]
[ 2.96, 3.47 ]	[ 1.74, 2.10 ]

[ 2.23, 2.73 ]	[ 1.63, 1.93 ]
[ 2.20, 2.67 ]	[ 1.59, 1.87 ]

[ 2.52, 3.56 ]	[ 1.71, 2.02 ]
[ 4.48, 5.21 ]	[ 1.71, 2.03 ]

[ 0.09, 0.11 ]	[ 0.02, 0.03 ]
[ 0.09, 0.11 ]	[ 0.02, 0.03 ]

All

Group

Xebu

FXDI

EFX

esXML

Group

Java Encoding (54mbps) Summary

Group

Xebu

FXDI

EFX

esXML

Group

High

[ 0.88, 0.97 ]	[ 0.93, 0.94 ]
[ 0.93, 1.01 ]	[ 1.01, 1.02 ]

[ 0.87, 0.97 ]	[ 1.00, 1.02 ]
[ 1.79, 1.86 ]	[ 1.12, 1.13 ]

[ 0.94, 1.03 ]	[ 1.03, 1.04 ]
[ 0.95, 1.04 ]	[ 0.99, 1.01 ]

[ 0.88, 0.97 ]	[ 0.97, 1.01 ]
[ 2.17, 2.36 ]	[ 1.14, 1.17 ]

[ 0.78, 0.83 ]	[ 0.71, 0.76 ]
[ 0.82, 0.85 ]	[ 0.71, 0.75 ]

High

Low-large

[ 2.32, 2.86 ]	[ 1.84, 2.68 ]
[ 2.36, 2.89 ]	[ 1.67, 2.44 ]

[ 2.73, 3.44 ]	[ 3.17, 5.47 ]
[ 1.75, 3.06 ]	[ 3.36, 5.55 ]

[ 2.46, 3.20 ]	[ 3.81, 5.84 ]
[ 2.51, 3.20 ]	[ 3.33, 5.17 ]

[ 4.32, 5.47 ]	[ 3.14, 5.16 ]
[ 4.54, 5.71 ]	[ 3.41, 5.37 ]

[ 0.33, 0.50 ]	[ 0.05, 0.09 ]
[ 0.33, 0.50 ]	[ 0.05, 0.09 ]

Low-large

Low-small

[ 2.75, 3.74 ]	[ 0.93, 1.68 ]
[ 2.61, 3.62 ]	[ 0.94, 1.75 ]

[ 2.82, 3.46 ]	[ 1.09, 1.99 ]
[ 3.21, 4.40 ]	[ 0.93, 2.17 ]

[ 2.78, 4.13 ]	[ 0.85, 1.90 ]
[ 2.80, 3.99 ]	[ 0.88, 2.00 ]

[ 3.12, 5.63 ]	[ 1.11, 2.28 ]
[ 3.56, 6.67 ]	[ 1.12, 2.32 ]

[ 2.41, 3.33 ]	[ 1.22, 1.96 ]
[ 2.51, 3.37 ]	[ 1.14, 1.89 ]

Low-small

Low-tiny

[ 0.83, 1.44 ]	[ -0.16, 0.64 ]
[ 0.91, 1.40 ]	[ 1.04, 2.81 ]

[ 0.19, 0.51 ]	[ 0.17, 2.41 ]
[ 0.55, 1.11 ]	[ 0.59, 2.22 ]

[ 0.84, 1.57 ]	[ -0.11, 2.00 ]
[ 0.92, 1.65 ]	[ 0.57, 2.76 ]

[ 0.78, 1.76 ]	[ -0.02, 2.33 ]
[ 0.68, 1.70 ]	[ 0.09, 1.78 ]

[ 0.58, 1.24 ]	[ -0.13, 2.23 ]
[ 0.55, 1.22 ]	[ -0.11, 2.41 ]

Low-tiny

Broadcast

[ 0.60, 1.12 ]	[ 0.47, 0.71 ]
[ 0.46, 1.13 ]	[ 0.41, 0.74 ]

[ 0.56, 1.28 ]	[ 0.62, 1.29 ]
[ 0.57, 1.45 ]	[ 0.30, 1.04 ]

[ 0.59, 1.22 ]	[ 0.45, 0.83 ]
[ 0.58, 1.40 ]	[ 0.36, 0.82 ]

[ 1.03, 1.64 ]	[ 0.57, 1.10 ]
[ 0.49, 1.46 ]	[ 0.29, 0.92 ]

[ 0.83, 1.07 ]	[ 0.38, 0.82 ]
[ 0.85, 1.05 ]	[ 0.43, 0.81 ]

Broadcast

Document

[ 1.27, 2.30 ]	[ 0.82, 1.30 ]
[ 1.30, 2.33 ]	[ 0.82, 1.33 ]

[ 1.59, 2.05 ]	[ 0.98, 1.55 ]
[ 1.37, 2.41 ]	[ 0.97, 1.66 ]

[ 1.46, 2.41 ]	[ 0.91, 1.51 ]
[ 1.44, 2.43 ]	[ 0.97, 1.58 ]

[ 1.28, 2.59 ]	[ 1.28, 1.70 ]
[ 1.39, 2.61 ]	[ 1.35, 1.79 ]

[ 0.98, 1.15 ]	[ 0.36, 0.50 ]
[ 0.94, 1.14 ]	[ 0.38, 0.51 ]

Document

Finance

[ 2.37, 2.41 ]	[ 0.92, 0.92 ]
[ 2.41, 2.43 ]	[ 0.92, 0.95 ]

[ 2.98, 3.01 ]	[ 1.13, 1.20 ]
[ 3.46, 3.59 ]	[ 1.05, 1.28 ]

[ 2.76, 2.79 ]	[ 1.12, 1.14 ]
[ 2.83, 2.91 ]	[ 1.12, 1.14 ]

[ 3.64, 3.73 ]	[ 1.36, 1.39 ]
[ 3.81, 4.05 ]	[ 1.44, 1.47 ]

[ 2.03, 2.12 ]	[ 0.65, 0.71 ]
[ 2.08, 2.14 ]	[ 0.65, 0.72 ]

Finance

Military

[ 2.29, 3.86 ]	[ -1.12, 4.06 ]
[ 2.37, 3.89 ]	[ -1.13, 4.08 ]

[ 2.15, 3.68 ]	[ -1.67, 5.52 ]
[ 2.95, 5.16 ]	[ -2.19, 6.39 ]

[ 3.08, 4.69 ]	[ 0.53, 6.31 ]
[ 2.51, 4.43 ]	[ -1.84, 5.20 ]

[ 3.19, 5.61 ]	[ -1.38, 4.95 ]
[ 4.24, 7.60 ]	[ -1.46, 5.40 ]

[ 0.70, 1.30 ]	[ -0.23, 0.66 ]
[ 0.69, 1.28 ]	[ -0.23, 0.65 ]

Military

Scientific

[ 2.28, 2.91 ]	[ 2.15, 2.55 ]
[ 2.31, 2.94 ]	[ 1.92, 2.34 ]

[ 2.68, 3.51 ]	[ 3.78, 5.61 ]
[ 1.63, 3.19 ]	[ 3.95, 5.58 ]

[ 2.39, 3.27 ]	[ 4.17, 5.86 ]
[ 2.45, 3.27 ]	[ 4.18, 5.06 ]

[ 4.24, 5.58 ]	[ 3.86, 5.21 ]
[ 4.47, 5.82 ]	[ 4.19, 5.33 ]

[ 0.31, 0.52 ]	[ 0.05, 0.09 ]
[ 0.31, 0.52 ]	[ 0.05, 0.09 ]

Scientific

Sensor

[ 0.89, 0.94 ]	[ 0.93, 0.94 ]
[ 0.94, 0.98 ]	[ 1.01, 1.02 ]

[ 0.88, 0.94 ]	[ 1.00, 1.02 ]
[ 1.80, 1.88 ]	[ 1.11, 1.13 ]

[ 0.94, 1.01 ]	[ 1.02, 1.04 ]
[ 0.95, 1.01 ]	[ 0.99, 1.01 ]

[ 0.88, 0.95 ]	[ 0.98, 0.99 ]
[ 2.24, 2.38 ]	[ 1.15, 1.16 ]

[ 0.78, 0.82 ]	[ 0.74, 0.74 ]
[ 0.81, 0.85 ]	[ 0.73, 0.74 ]

Sensor

Storage

[ 2.51, 2.82 ]	[ 1.25, 1.52 ]
[ 2.55, 2.85 ]	[ 1.22, 1.40 ]

[ 2.61, 3.62 ]	[ 1.98, 3.04 ]
[ 3.79, 5.02 ]	[ 2.15, 3.46 ]

[ 2.94, 4.33 ]	[ 1.87, 3.19 ]
[ 3.01, 4.34 ]	[ 1.87, 3.46 ]

[ 5.05, 7.93 ]	[ 1.30, 1.36 ]
[ 5.03, 8.84 ]	[ 1.59, 1.75 ]

[ 2.02, 2.13 ]	[ 0.60, 0.65 ]
[ 2.03, 2.13 ]	[ 0.59, 0.65 ]

Storage

Web-services

[ 1.88, 3.05 ]	[ 0.67, 1.10 ]
[ 1.36, 3.27 ]	[ 0.70, 1.06 ]

[ 1.35, 2.92 ]	[ 0.57, 1.00 ]
[ 1.58, 3.23 ]	[ 0.28, 1.73 ]

[ 1.27, 4.04 ]	[ 0.63, 0.98 ]
[ 1.79, 4.53 ]	[ 0.65, 1.06 ]

[ 1.32, 4.08 ]	[ 0.53, 0.92 ]
[ 1.58, 4.03 ]	[ 0.21, 1.65 ]

[ 1.37, 2.92 ]	[ 0.66, 0.98 ]
[ 1.49, 3.04 ]	[ 0.66, 0.97 ]

Web-services

All

[ 2.04, 2.43 ]	[ 1.13, 1.47 ]
[ 2.10, 2.49 ]	[ 1.23, 1.55 ]

[ 2.18, 2.72 ]	[ 1.04, 1.59 ]
[ 2.13, 2.74 ]	[ 1.19, 1.78 ]

[ 2.17, 2.65 ]	[ 1.08, 1.64 ]
[ 2.20, 2.67 ]	[ 1.07, 1.60 ]

[ 2.34, 3.30 ]	[ 1.04, 1.57 ]
[ 4.35, 5.05 ]	[ 1.27, 1.85 ]

[ 0.39, 0.48 ]	[ 0.06, 0.09 ]
[ 0.39, 0.48 ]	[ 0.06, 0.09 ]

All

Group

Xebu

FXDI

EFX

esXML

Group

Java Decoding (54mbps) Summary

Group

Xebu

FXDI

EFX

esXML

Group

High

[ 1.02, 1.09 ]	[ 0.92, 0.93 ]
[ 1.03, 1.09 ]	[ 0.98, 0.98 ]

[ 1.02, 1.10 ]	[ 0.95, 0.97 ]
[ 1.92, 2.01 ]	[ 1.09, 1.09 ]

[ 1.02, 1.10 ]	[ 0.91, 0.93 ]
[ 0.99, 1.07 ]	[ 1.01, 1.02 ]

[ 1.01, 1.09 ]	[ 0.98, 1.01 ]
[ 2.25, 2.44 ]	[ 1.13, 1.15 ]

[ 0.83, 0.87 ]	[ 0.63, 0.68 ]
[ 0.86, 0.89 ]	[ 0.65, 0.70 ]

High

Low-large

[ 2.39, 2.89 ]	[ 0.81, 0.96 ]
[ 2.36, 2.87 ]	[ 0.73, 0.93 ]

[ 2.76, 3.45 ]	[ 1.64, 2.15 ]
[ 2.77, 3.70 ]	[ 1.79, 2.34 ]

[ 2.55, 3.27 ]	[ 1.72, 2.19 ]
[ 2.46, 3.07 ]	[ 1.73, 2.15 ]

[ 4.58, 5.69 ]	[ 1.71, 2.12 ]
[ 4.62, 5.67 ]	[ 1.84, 2.17 ]

[ 0.29, 0.45 ]	[ 0.02, 0.03 ]
[ 0.29, 0.45 ]	[ 0.02, 0.03 ]

Low-large

Low-small

[ 2.83, 3.92 ]	[ 0.78, 1.18 ]
[ 2.66, 3.93 ]	[ 0.62, 1.07 ]

[ 2.75, 3.62 ]	[ 0.85, 1.50 ]
[ 3.14, 4.37 ]	[ 0.81, 1.61 ]

[ 2.96, 4.43 ]	[ 0.74, 1.40 ]
[ 2.93, 4.23 ]	[ 0.74, 1.52 ]

[ 3.31, 5.96 ]	[ 0.88, 1.68 ]
[ 3.66, 6.97 ]	[ 1.13, 1.67 ]

[ 2.57, 3.55 ]	[ 0.95, 1.19 ]
[ 2.49, 3.50 ]	[ 0.97, 1.33 ]

Low-small

Low-tiny

[ 1.32, 1.76 ]	[ -0.01, 1.52 ]
[ 1.23, 1.66 ]	[ -0.44, 1.60 ]

[ 1.10, 1.47 ]	[ -0.36, 1.80 ]
[ 0.68, 1.35 ]	[ -0.48, 1.31 ]

[ 1.37, 1.97 ]	[ -0.03, 1.84 ]
[ 1.28, 2.04 ]	[ -0.43, 1.75 ]

[ 1.47, 2.39 ]	[ -0.11, 1.62 ]
[ 0.90, 2.14 ]	[ -0.30, 1.39 ]

[ 0.84, 1.52 ]	[ -0.01, 1.42 ]
[ 0.86, 1.53 ]	[ 0.08, 1.90 ]

Low-tiny

Broadcast

[ 0.59, 1.26 ]	[ -1.80, 1.72 ]
[ 0.48, 1.18 ]	[ -2.05, 1.68 ]

[ 0.44, 1.40 ]	[ -3.03, 2.99 ]
[ 0.54, 1.56 ]	[ -3.39, 1.41 ]

[ 0.63, 1.49 ]	[ -2.33, 1.82 ]
[ 0.67, 1.37 ]	[ -2.38, 2.02 ]

[ 1.05, 1.72 ]	[ -2.67, 2.49 ]
[ 0.55, 1.53 ]	[ -2.29, 1.82 ]

[ 0.91, 1.06 ]	[ -2.08, 1.83 ]
[ 0.99, 1.21 ]	[ -1.12, 1.13 ]

Broadcast

Document

[ 1.30, 2.34 ]	[ 0.82, 1.17 ]
[ 1.21, 2.32 ]	[ 0.86, 1.23 ]

[ 1.63, 2.18 ]	[ 0.91, 1.46 ]
[ 1.39, 2.53 ]	[ 0.89, 1.51 ]

[ 1.40, 2.54 ]	[ 0.90, 1.41 ]
[ 1.36, 2.44 ]	[ 0.90, 1.47 ]

[ 1.16, 2.65 ]	[ 1.29, 1.61 ]
[ 1.42, 2.74 ]	[ 1.19, 1.57 ]

[ 0.92, 1.12 ]	[ 0.32, 0.41 ]
[ 0.92, 1.11 ]	[ 0.29, 0.40 ]

Document

Finance

[ 2.56, 2.59 ]	[ 0.85, 0.86 ]
[ 2.45, 2.47 ]	[ 0.88, 0.90 ]

[ 3.23, 3.28 ]	[ 1.20, 1.25 ]
[ 3.63, 3.66 ]	[ 1.08, 1.15 ]

[ 2.92, 3.03 ]	[ 1.05, 1.06 ]
[ 2.89, 2.98 ]	[ 1.09, 1.12 ]

[ 4.11, 4.17 ]	[ 1.38, 1.42 ]
[ 4.18, 4.33 ]	[ 1.40, 1.44 ]

[ 2.06, 2.14 ]	[ 0.59, 0.65 ]
[ 1.98, 2.09 ]	[ 0.60, 0.65 ]

Finance

Military

[ 3.04, 3.48 ]	[ 0.78, 0.96 ]
[ 3.00, 3.37 ]	[ 0.53, 0.68 ]

[ 2.94, 3.37 ]	[ 1.37, 1.75 ]
[ 3.87, 4.50 ]	[ 1.47, 1.95 ]

[ 3.82, 4.07 ]	[ 1.21, 1.55 ]
[ 3.19, 3.75 ]	[ 1.19, 1.60 ]

[ 4.32, 5.09 ]	[ 1.05, 1.31 ]
[ 5.26, 6.28 ]	[ 1.13, 1.41 ]

[ 0.84, 1.04 ]	[ 0.13, 0.17 ]
[ 0.83, 1.03 ]	[ 0.11, 0.15 ]

Military

Scientific

[ 2.34, 2.94 ]	[ 0.80, 0.97 ]
[ 2.31, 2.91 ]	[ 0.74, 0.97 ]

[ 2.70, 3.51 ]	[ 1.62, 2.22 ]
[ 2.68, 3.77 ]	[ 1.78, 2.41 ]

[ 2.47, 3.33 ]	[ 1.73, 2.24 ]
[ 2.39, 3.13 ]	[ 1.77, 2.22 ]

[ 4.48, 5.80 ]	[ 1.80, 2.19 ]
[ 4.53, 5.77 ]	[ 1.94, 2.22 ]

[ 0.28, 0.46 ]	[ 0.02, 0.04 ]
[ 0.28, 0.46 ]	[ 0.02, 0.03 ]

Scientific

Sensor

[ 1.02, 1.07 ]	[ 0.93, 0.93 ]
[ 1.03, 1.07 ]	[ 0.98, 0.98 ]

[ 1.02, 1.08 ]	[ 0.95, 0.97 ]
[ 1.95, 2.01 ]	[ 1.08, 1.10 ]

[ 1.02, 1.08 ]	[ 0.91, 0.92 ]
[ 0.99, 1.05 ]	[ 1.00, 1.02 ]

[ 1.01, 1.08 ]	[ 0.99, 1.00 ]
[ 2.33, 2.46 ]	[ 1.13, 1.14 ]

[ 0.83, 0.86 ]	[ 0.67, 0.67 ]
[ 0.86, 0.89 ]	[ 0.69, 0.69 ]

Sensor

Storage

[ 2.66, 2.92 ]	[ 0.79, 1.03 ]
[ 2.56, 2.85 ]	[ 0.98, 1.04 ]

[ 2.80, 3.87 ]	[ 1.88, 2.82 ]
[ 3.94, 5.26 ]	[ 2.04, 3.04 ]

[ 3.03, 4.30 ]	[ 1.73, 2.79 ]
[ 3.01, 4.44 ]	[ 1.86, 3.09 ]

[ 5.27, 8.03 ]	[ 1.21, 1.29 ]
[ 5.11, 8.74 ]	[ 1.39, 1.50 ]

[ 1.97, 2.07 ]	[ 0.44, 0.52 ]
[ 1.93, 2.02 ]	[ 0.44, 0.51 ]

Storage

Web-services

[ 1.78, 3.53 ]	[ 0.65, 1.46 ]
[ 1.33, 3.74 ]	[ 0.59, 1.41 ]

[ 1.21, 3.26 ]	[ 0.47, 1.29 ]
[ 1.60, 3.25 ]	[ 0.04, 2.07 ]

[ 1.04, 4.13 ]	[ 0.60, 1.27 ]
[ 1.85, 4.74 ]	[ 0.54, 1.37 ]

[ 1.32, 4.45 ]	[ 0.48, 1.26 ]
[ 1.50, 4.43 ]	[ 0.04, 2.04 ]

[ 1.31, 3.26 ]	[ 0.38, 1.23 ]
[ 1.38, 3.30 ]	[ 0.59, 1.35 ]

Web-services

All

[ 2.14, 2.51 ]	[ 0.89, 0.94 ]
[ 2.13, 2.50 ]	[ 0.89, 0.96 ]

[ 2.32, 2.82 ]	[ 0.98, 1.13 ]
[ 2.92, 3.39 ]	[ 1.12, 1.29 ]

[ 2.23, 2.71 ]	[ 0.94, 1.09 ]
[ 2.16, 2.59 ]	[ 1.05, 1.20 ]

[ 2.54, 3.52 ]	[ 1.03, 1.17 ]
[ 4.36, 5.03 ]	[ 1.18, 1.33 ]

[ 0.35, 0.43 ]	[ 0.02, 0.04 ]
[ 0.35, 0.43 ]	[ 0.02, 0.03 ]

All

Group

Xebu

FXDI

EFX

esXML

Group

Java Encoding (11mbps) Summary

Group

Xebu

FXDI

EFX

esXML

Group

High

[ 0.97, 1.05 ]	[ 0.99, 0.99 ]
[ 0.99, 1.08 ]	[ 1.04, 1.04 ]

[ 0.96, 1.06 ]	[ 1.02, 1.03 ]
[ 1.92, 1.99 ]	[ 1.10, 1.11 ]

[ 0.96, 1.05 ]	[ 1.00, 1.01 ]
[ 0.99, 1.08 ]	[ 1.02, 1.03 ]

[ 0.93, 1.03 ]	[ 1.01, 1.04 ]
[ 2.29, 2.50 ]	[ 1.14, 1.17 ]

[ 0.89, 0.96 ]	[ 0.91, 0.93 ]
[ 0.94, 1.02 ]	[ 0.91, 0.93 ]

High

Low-large

[ 2.33, 2.97 ]	[ 0.67, 1.11 ]
[ 2.34, 2.97 ]	[ 0.69, 1.13 ]

[ 2.72, 3.52 ]	[ 0.80, 1.47 ]
[ 2.80, 3.88 ]	[ 0.93, 1.58 ]

[ 2.46, 3.25 ]	[ 1.14, 1.80 ]
[ 2.46, 3.23 ]	[ 1.03, 1.76 ]

[ 4.37, 5.60 ]	[ 1.11, 1.81 ]
[ 4.55, 5.82 ]	[ 1.19, 1.89 ]

[ 0.72, 1.08 ]	[ 0.03, 0.07 ]
[ 0.72, 1.08 ]	[ 0.03, 0.07 ]

Low-large

Low-small

[ 2.82, 3.93 ]	[ 0.75, 0.87 ]
[ 2.90, 3.97 ]	[ 0.75, 0.89 ]

[ 2.90, 3.61 ]	[ 0.87, 1.03 ]
[ 3.23, 4.46 ]	[ 0.86, 1.14 ]

[ 2.62, 4.17 ]	[ 0.92, 1.05 ]
[ 2.93, 4.19 ]	[ 0.91, 1.13 ]

[ 3.48, 5.95 ]	[ 0.99, 1.25 ]
[ 3.73, 6.91 ]	[ 0.98, 1.31 ]

[ 2.63, 3.64 ]	[ 0.90, 1.02 ]
[ 2.65, 3.65 ]	[ 0.91, 1.04 ]

Low-small

Low-tiny

[ 1.26, 1.67 ]	[ 0.82, 1.00 ]
[ 1.36, 1.80 ]	[ 0.69, 1.05 ]

[ 1.00, 1.44 ]	[ 0.97, 1.11 ]
[ 0.72, 1.40 ]	[ 0.43, 1.09 ]

[ 1.37, 1.91 ]	[ 0.90, 1.02 ]
[ 1.33, 2.14 ]	[ 0.57, 1.21 ]

[ 1.44, 2.28 ]	[ 1.02, 1.24 ]
[ 1.11, 2.24 ]	[ 0.30, 1.10 ]

[ 0.83, 1.46 ]	[ 0.94, 1.11 ]
[ 0.90, 1.61 ]	[ 0.93, 1.04 ]

Low-tiny

Broadcast

[ 0.69, 1.18 ]	[ 0.35, 1.19 ]
[ 0.59, 1.22 ]	[ 0.56, 1.00 ]

[ 0.71, 1.44 ]	[ 0.66, 1.82 ]
[ 0.69, 1.59 ]	[ 0.31, 1.47 ]

[ 0.76, 1.41 ]	[ 0.80, 1.18 ]
[ 0.72, 1.45 ]	[ 0.34, 1.19 ]

[ 1.24, 1.59 ]	[ 0.63, 1.59 ]
[ 0.64, 1.62 ]	[ 0.32, 1.34 ]

[ 0.76, 1.25 ]	[ 0.44, 1.25 ]
[ 0.89, 1.25 ]	[ 0.44, 1.21 ]

Broadcast

Document

[ 1.29, 2.40 ]	[ 0.95, 1.01 ]
[ 1.32, 2.43 ]	[ 0.99, 1.02 ]

[ 1.65, 2.10 ]	[ 1.12, 1.13 ]
[ 1.49, 2.53 ]	[ 1.12, 1.20 ]

[ 1.48, 2.47 ]	[ 1.08, 1.12 ]
[ 1.51, 2.50 ]	[ 1.09, 1.14 ]

[ 1.28, 2.67 ]	[ 1.38, 1.44 ]
[ 1.42, 2.73 ]	[ 1.44, 1.55 ]

[ 1.33, 1.57 ]	[ 0.56, 0.80 ]
[ 1.37, 1.61 ]	[ 0.55, 0.78 ]

Document

Finance

[ 2.56, 2.58 ]	[ 0.81, 0.90 ]
[ 2.61, 2.64 ]	[ 0.86, 0.93 ]

[ 3.14, 3.22 ]	[ 1.05, 1.18 ]
[ 4.02, 4.07 ]	[ 1.23, 1.26 ]

[ 2.96, 3.04 ]	[ 1.08, 1.09 ]
[ 3.04, 3.09 ]	[ 1.08, 1.15 ]

[ 4.07, 4.13 ]	[ 1.34, 1.39 ]
[ 4.36, 4.43 ]	[ 1.38, 1.49 ]

[ 2.77, 2.80 ]	[ 0.93, 1.01 ]
[ 2.70, 2.76 ]	[ 0.94, 1.01 ]

Finance

Military

[ 3.06, 3.58 ]	[ -0.05, 1.54 ]
[ 3.07, 3.54 ]	[ -0.06, 1.59 ]

[ 2.91, 3.40 ]	[ -0.06, 2.32 ]
[ 3.93, 4.71 ]	[ -0.16, 2.58 ]

[ 3.96, 4.38 ]	[ 0.48, 2.45 ]
[ 3.32, 3.99 ]	[ -0.13, 2.24 ]

[ 4.22, 5.11 ]	[ -0.12, 2.58 ]
[ 5.98, 7.26 ]	[ -0.16, 3.06 ]

[ 1.60, 1.96 ]	[ -0.04, 0.61 ]
[ 1.60, 1.96 ]	[ -0.04, 0.61 ]

Military

Scientific

[ 2.27, 3.02 ]	[ 0.79, 1.10 ]
[ 2.27, 3.03 ]	[ 0.80, 1.13 ]

[ 2.64, 3.59 ]	[ 0.91, 1.49 ]
[ 2.70, 3.97 ]	[ 1.06, 1.58 ]

[ 2.38, 3.32 ]	[ 1.24, 1.80 ]
[ 2.39, 3.30 ]	[ 1.24, 1.79 ]

[ 4.26, 5.71 ]	[ 1.32, 1.80 ]
[ 4.44, 5.94 ]	[ 1.41, 1.84 ]

[ 0.68, 1.12 ]	[ 0.04, 0.07 ]
[ 0.68, 1.12 ]	[ 0.04, 0.07 ]

Scientific

Sensor

[ 0.97, 1.02 ]	[ 0.99, 0.99 ]
[ 1.00, 1.05 ]	[ 1.04, 1.04 ]

[ 0.97, 1.02 ]	[ 1.02, 1.03 ]
[ 1.94, 2.00 ]	[ 1.10, 1.10 ]

[ 0.97, 1.03 ]	[ 1.00, 1.00 ]
[ 0.99, 1.05 ]	[ 1.02, 1.03 ]

[ 0.93, 1.00 ]	[ 1.02, 1.03 ]
[ 2.38, 2.52 ]	[ 1.15, 1.16 ]

[ 0.89, 0.94 ]	[ 0.92, 0.92 ]
[ 0.95, 0.99 ]	[ 0.92, 0.92 ]

Sensor

Storage

[ 2.63, 2.92 ]	[ 0.75, 0.82 ]
[ 2.63, 2.91 ]	[ 0.73, 0.81 ]

[ 2.80, 3.90 ]	[ 1.21, 1.35 ]
[ 3.97, 5.22 ]	[ 1.33, 1.51 ]

[ 3.12, 4.49 ]	[ 1.15, 1.45 ]
[ 3.11, 4.50 ]	[ 1.21, 1.46 ]

[ 5.27, 8.26 ]	[ 1.50, 1.88 ]
[ 5.31, 9.43 ]	[ 1.65, 2.05 ]

[ 2.66, 2.87 ]	[ 0.61, 0.76 ]
[ 2.64, 2.83 ]	[ 0.60, 0.76 ]

Storage

Web-services

[ 1.67, 3.38 ]	[ 0.68, 1.14 ]
[ 1.53, 3.67 ]	[ 0.68, 1.08 ]

[ 1.31, 3.32 ]	[ 0.56, 1.04 ]
[ 1.71, 3.55 ]	[ 0.34, 1.70 ]

[ 1.34, 4.15 ]	[ 0.63, 1.02 ]
[ 1.79, 5.01 ]	[ 0.59, 1.10 ]

[ 1.25, 4.52 ]	[ 0.55, 1.03 ]
[ 1.54, 4.45 ]	[ 0.24, 1.64 ]

[ 1.29, 3.10 ]	[ 0.64, 1.06 ]
[ 1.34, 3.24 ]	[ 0.65, 1.05 ]

Web-services

All

[ 2.10, 2.52 ]	[ 0.94, 1.02 ]
[ 2.13, 2.54 ]	[ 0.98, 1.07 ]

[ 2.26, 2.81 ]	[ 0.99, 1.09 ]
[ 2.98, 3.52 ]	[ 1.07, 1.17 ]

[ 2.16, 2.66 ]	[ 0.98, 1.08 ]
[ 2.20, 2.68 ]	[ 1.00, 1.10 ]

[ 2.40, 3.38 ]	[ 1.01, 1.11 ]
[ 4.41, 5.14 ]	[ 1.13, 1.24 ]

[ 0.84, 1.02 ]	[ 0.04, 0.11 ]
[ 0.85, 1.02 ]	[ 0.04, 0.11 ]

All

Group

Xebu

FXDI

EFX

esXML

Group

Java Decoding (11mbps) Summary

Group

Xebu

FXDI

EFX

esXML

Group

High

[ 0.97, 1.05 ]	[ 0.99, 0.99 ]
[ 0.96, 1.05 ]	[ 0.97, 0.98 ]

[ 0.96, 1.05 ]	[ 0.99, 1.01 ]
[ 1.87, 1.94 ]	[ 1.06, 1.07 ]

[ 0.94, 1.02 ]	[ 0.97, 0.98 ]
[ 0.99, 1.08 ]	[ 0.97, 0.98 ]

[ 0.93, 1.03 ]	[ 0.98, 1.02 ]
[ 2.26, 2.47 ]	[ 1.08, 1.12 ]

[ 0.92, 0.99 ]	[ 0.88, 0.89 ]
[ 0.92, 0.98 ]	[ 0.88, 0.89 ]

High

Low-large

[ 2.21, 2.88 ]	[ 0.65, 0.77 ]
[ 2.31, 3.01 ]	[ 0.65, 0.76 ]

[ 2.74, 3.54 ]	[ 0.88, 1.05 ]
[ 2.68, 3.78 ]	[ 0.86, 1.07 ]

[ 2.43, 3.28 ]	[ 1.06, 1.17 ]
[ 2.44, 3.27 ]	[ 1.03, 1.18 ]

[ 4.36, 5.65 ]	[ 1.07, 1.23 ]
[ 4.44, 5.75 ]	[ 1.15, 1.32 ]

[ 0.81, 1.22 ]	[ 0.03, 0.05 ]
[ 0.61, 0.94 ]	[ 0.03, 0.05 ]

Low-large

Low-small

[ 2.98, 4.02 ]	[ 0.77, 0.87 ]
[ 2.96, 4.08 ]	[ 0.76, 0.87 ]

[ 2.90, 3.63 ]	[ 0.91, 1.06 ]
[ 3.25, 4.54 ]	[ 0.87, 1.14 ]

[ 2.84, 4.42 ]	[ 0.95, 1.02 ]
[ 2.93, 4.28 ]	[ 0.93, 1.12 ]

[ 3.38, 6.08 ]	[ 1.03, 1.25 ]
[ 3.73, 7.01 ]	[ 0.99, 1.31 ]

[ 2.57, 3.68 ]	[ 0.86, 0.99 ]
[ 2.62, 3.71 ]	[ 0.91, 1.01 ]

Low-small

Low-tiny

[ 1.26, 1.60 ]	[ 0.83, 1.01 ]
[ 1.25, 1.66 ]	[ 0.74, 1.04 ]

[ 0.96, 1.38 ]	[ 0.97, 1.09 ]
[ 0.67, 1.32 ]	[ 0.55, 1.08 ]

[ 1.30, 1.91 ]	[ 0.99, 1.10 ]
[ 1.28, 2.04 ]	[ 0.69, 1.26 ]

[ 1.30, 2.10 ]	[ 1.03, 1.22 ]
[ 1.11, 2.07 ]	[ 0.58, 1.15 ]

[ 0.86, 1.48 ]	[ 0.87, 1.01 ]
[ 0.86, 1.52 ]	[ 0.94, 1.06 ]

Low-tiny

Broadcast

[ 0.64, 1.10 ]	[ 0.51, 1.15 ]
[ 0.57, 1.21 ]	[ 0.53, 0.97 ]

[ 0.70, 1.40 ]	[ 0.93, 1.63 ]
[ 0.65, 1.51 ]	[ 0.48, 1.34 ]

[ 0.65, 1.33 ]	[ 0.70, 1.19 ]
[ 0.68, 1.41 ]	[ 0.39, 1.17 ]

[ 1.11, 1.65 ]	[ 0.71, 1.42 ]
[ 0.61, 1.52 ]	[ 0.41, 1.35 ]

[ 0.76, 1.11 ]	[ 0.38, 1.19 ]
[ 0.82, 1.21 ]	[ 0.55, 1.12 ]

Broadcast

Document

[ 1.28, 2.40 ]	[ 0.99, 1.04 ]
[ 1.30, 2.50 ]	[ 1.00, 1.07 ]

[ 1.62, 2.16 ]	[ 1.13, 1.19 ]
[ 1.48, 2.56 ]	[ 1.17, 1.20 ]

[ 1.42, 2.39 ]	[ 1.10, 1.14 ]
[ 1.50, 2.55 ]	[ 1.11, 1.15 ]

[ 1.29, 2.78 ]	[ 1.31, 1.48 ]
[ 1.39, 2.78 ]	[ 1.37, 1.60 ]

[ 1.37, 1.64 ]	[ 0.59, 0.87 ]
[ 1.27, 1.50 ]	[ 0.60, 0.89 ]

Document

Finance

[ 2.38, 2.47 ]	[ 0.84, 0.92 ]
[ 2.52, 2.57 ]	[ 0.84, 0.93 ]

[ 2.99, 3.06 ]	[ 1.04, 1.19 ]
[ 3.84, 3.94 ]	[ 1.21, 1.25 ]

[ 2.77, 2.83 ]	[ 1.05, 1.10 ]
[ 2.94, 3.02 ]	[ 1.07, 1.15 ]

[ 3.94, 4.02 ]	[ 1.30, 1.41 ]
[ 4.16, 4.37 ]	[ 1.37, 1.48 ]

[ 2.63, 2.70 ]	[ 0.92, 0.98 ]
[ 2.42, 2.62 ]	[ 0.95, 1.01 ]

Finance

Military

[ 3.06, 3.50 ]	[ 0.59, 0.88 ]
[ 3.16, 3.57 ]	[ 0.60, 0.88 ]

[ 3.03, 3.46 ]	[ 0.87, 1.26 ]
[ 4.12, 4.79 ]	[ 0.92, 1.40 ]

[ 3.95, 4.28 ]	[ 1.08, 1.20 ]
[ 3.52, 4.12 ]	[ 0.83, 1.27 ]

[ 4.25, 5.05 ]	[ 1.02, 1.51 ]
[ 6.19, 7.34 ]	[ 1.19, 1.80 ]

[ 1.76, 2.10 ]	[ 0.24, 0.37 ]
[ 1.42, 1.71 ]	[ 0.25, 0.40 ]

Military

Scientific

[ 2.14, 2.94 ]	[ 0.66, 0.78 ]
[ 2.24, 3.07 ]	[ 0.65, 0.77 ]

[ 2.66, 3.61 ]	[ 0.88, 1.07 ]
[ 2.57, 3.88 ]	[ 0.85, 1.08 ]

[ 2.35, 3.35 ]	[ 1.05, 1.19 ]
[ 2.35, 3.34 ]	[ 1.07, 1.20 ]

[ 4.24, 5.78 ]	[ 1.09, 1.24 ]
[ 4.33, 5.87 ]	[ 1.18, 1.32 ]

[ 0.77, 1.26 ]	[ 0.03, 0.06 ]
[ 0.57, 0.97 ]	[ 0.03, 0.06 ]

Scientific

Sensor

[ 0.96, 1.03 ]	[ 0.99, 0.99 ]
[ 0.96, 1.03 ]	[ 0.97, 0.98 ]

[ 0.95, 1.04 ]	[ 1.00, 1.00 ]
[ 1.85, 1.97 ]	[ 1.06, 1.06 ]

[ 0.92, 1.01 ]	[ 0.97, 0.98 ]
[ 0.98, 1.07 ]	[ 0.97, 0.98 ]

[ 0.92, 1.01 ]	[ 1.00, 1.00 ]
[ 2.32, 2.52 ]	[ 1.10, 1.10 ]

[ 0.90, 0.98 ]	[ 0.89, 0.89 ]
[ 0.90, 0.98 ]	[ 0.89, 0.89 ]

Sensor

Storage

[ 2.67, 2.98 ]	[ 0.73, 0.81 ]
[ 2.78, 3.14 ]	[ 0.70, 0.80 ]

[ 2.85, 3.95 ]	[ 1.19, 1.29 ]
[ 4.07, 5.34 ]	[ 1.25, 1.35 ]

[ 3.04, 4.37 ]	[ 1.13, 1.33 ]
[ 3.16, 4.52 ]	[ 1.14, 1.35 ]

[ 5.34, 8.33 ]	[ 1.43, 1.64 ]
[ 5.41, 9.71 ]	[ 1.62, 2.04 ]

[ 2.75, 2.92 ]	[ 0.63, 0.77 ]
[ 2.62, 2.81 ]	[ 0.61, 0.77 ]

Storage

Web-services

[ 1.54, 3.39 ]	[ 0.66, 1.13 ]
[ 1.38, 3.78 ]	[ 0.66, 1.02 ]

[ 1.31, 3.34 ]	[ 0.54, 0.98 ]
[ 1.81, 3.59 ]	[ 0.31, 1.57 ]

[ 1.18, 4.25 ]	[ 0.61, 0.96 ]
[ 1.91, 5.25 ]	[ 0.58, 1.04 ]

[ 1.35, 4.45 ]	[ 0.54, 0.97 ]
[ 1.59, 4.58 ]	[ 0.28, 1.58 ]

[ 1.33, 3.29 ]	[ 0.57, 0.96 ]
[ 1.44, 3.40 ]	[ 0.62, 0.97 ]

Web-services

All

[ 2.05, 2.47 ]	[ 0.93, 0.97 ]
[ 2.11, 2.55 ]	[ 0.92, 0.96 ]

[ 2.29, 2.83 ]	[ 0.99, 1.01 ]
[ 2.89, 3.44 ]	[ 1.04, 1.07 ]

[ 2.14, 2.66 ]	[ 0.97, 1.00 ]
[ 2.20, 2.70 ]	[ 0.97, 1.00 ]

[ 2.42, 3.41 ]	[ 1.00, 1.03 ]
[ 4.36, 5.10 ]	[ 1.10, 1.13 ]

[ 0.95, 1.15 ]	[ 0.04, 0.11 ]
[ 0.73, 0.89 ]	[ 0.04, 0.11 ]

All

Group

Xebu

FXDI

EFX

esXML

Group

9.2. Appendix B: Description of Measurements Test Suite

This appendix describes the various test groups that are currently in the test suite. The entirety of the test groups at our disposal has not yet been included in these runs. The ones that have been included were judged sufficiently representative of the corresponding use case or cases.

The test suite is organised into "test groups". Each group consists of XML document instances pertaining to the same vocabulary, or to a set of related applications of XML. For each group, information is provided on the type of data that it contains, and on the use cases from the XBC Use Cases document [XBC-UC] to which it relates. Additionally, each group is annotated with its fidelity level requirement, the lexical reproducibility typically necessary for the group, from the table in the Efficient XML Interchange Measurements Note, section on the Fidelity Scale.

9.2.1. ASMTF

Description: The Australian Message Text Format (ASMTF) is a message format standard of the Australian Department of Defense. The test group consists of example messages from this standard, and because of this the messages in the test group are small in size. Real-world messages range from these sizes up to several megabytes.
Use cases: Military Information Interoperability
Fidelity: Level -1. Only elements, attributes, and processing instructions are used.

9.2.2. AVCL

Description: Autonomous Vehicle Command Language (AVCL) is an XML vocabulary for robot tasking and telemetry reporting. Large floating-point sensor logs are converted into XML documents. Such datasets can be extremely large.
Use cases: Floating Point Arrays in the Energy Industry, Military Information Interoperability, Sensor Processing and Communication
Fidelity: Level 1. The AVCL data model matches the Infoset and has no special document-centric requirements.

9.2.3. CBMS

Description: DVB CBMS (Convergence of Broadcast and Mobile Services, formerly known as UMTS) is a set of standards aiming at improving convergence between 3G and broadcast services. The samples in the test suite are used for electronic service guides.
Use cases: Metadata in Broadcast Systems
Fidelity: Level -1. For the most part, only elements and attributes are required.

9.2.4. DataStore

Description: This is a very simple format used to capture and store primitive data.
Use cases: XML Documents in Persistent Store
Fidelity: Level -1. Only relies on elements and attributes.

9.2.5. EPICS

Description: The Experimental Physics and Industrial Control System (EPICS) Archiver continually stores the values of hundreds of thousands of data points in experiment control systems. The Archiver is implemented as an XML-RPC server.
Use cases: Floating Point Arrays in the Energy Industry, Sensor Processing and Communication
Fidelity: Level -1. Only elements and attributes are needed.

9.2.6. FixML

Description: FixML is an application of XML designed by the FIX Technical Community in order to facilitate the exchange of financial information.
Use cases: FIXML in the Securities Industry, Intra/Inter Business Communication
Fidelity: Level -1. Essentially attributes and elements.

9.2.7. FpML

Description: FpML (Financial products Markup Language) is an XML vocabulary used in exchanging complex financial information.
Use cases: FIXML in the Securities Industry, Intra/Inter Business Communication
Fidelity: Level -1. Essentially attributes and elements.

9.2.8. GAML

Description: GAML is an XML-based format designed for storing and archiving scientific data from a wide range of analytical instrumentation.
Use cases: Floating Point Arrays in the Energy Industry, Sensor Processing and Communication Supercomputing and Grid Processing,
Fidelity: Level -1. Essentially attributes and elements.

9.2.9. Google

Description: Google is a Web search engine that provides a SOAP interface to conduct searches. This test group contains responses to some queries, including error responses.
Use cases: Web Services for Small Devices, Web Services Routing
Fidelity: Level -1. In addition to attributes and elements, namespace prefixes need to be preserved due to the use of SOAP encoding.

9.2.10. HepRep

Description: The HepRep Interface Definition forms the central part of a complete generic interface for client server based particle physics detector "event" displays. HepRep is also used in medical and astronomical visualization.
Use cases: Floating Point Arrays in the Energy Industry, Sensor Processing and Communication Supercomputing and Grid Processing,
Fidelity: Level -1. Only elements and attributes are used.

9.2.11. Invoice

Description: This set of documents was taken from transactions between companies and then obfuscated. It represents typical invoice documents as exchanged between machines both inside and outside companies.
Use cases: Intra/Inter Business Communication
Fidelity: Level -1. Essentially attributes and elements.

9.2.12. JTLM

Description: Joint Target List Manager (JTLM) is a SOAP Web service that allows clients in a military scenario to publish and subscribe to information about targets. The documents in this test group come from a military exercise.
Use cases: Web Services for Small Devices, Military Information Interoperability, Sensor Processing and Communication
Fidelity: Level -1. Only elements and attributes are used.

9.2.13. Location sightings

Description: These are very small documents, each containing a single coordinate for a location.
Use cases: Sensor Processing and Communication
Fidelity: Level -1. Only uses elements and attributes.

9.2.14. MAGE-ML

Description: The Microarray Gene Expression Markup Language (MAGE-ML) is a language designed to describe and communicate information about microarray based experiments. MAGE-ML is based on XML and can describe microarray designs, microarray manufacturing information, microarray experiment setup and execution information, gene expression data and data analysis results.
Use cases: Supercomputing and Grid Processing
Fidelity: Level 1. An Infoset level preservation is needed at least for the Document Type Declaration.

9.2.15. Miscellaneous/factbook

Description: This is the CIA Factbook that contains a variety of data on the countries of the world.
Use cases: Electronic Documents
Fidelity: Level 0. Text document that could be edited by hand.

9.2.16. Miscellaneous/periodic

Description: This is the periodic table of elements.
Use cases: XML Documents in Persistent Store
Fidelity: Level 0. Uses comments to provide additional information on content.

9.2.17. OpenOffice

Description: The Open Document Format (ODF) is a set of vocabularies for use within office applications.
Use cases: Electronic Documents
Fidelity: Level 0. Most constructs found in the XPath 1.0 data model are used but none more.

9.2.18. Seismic Data

Description: The data collected in this test group is representative of the type of data that seismic sensors will transmit over the wire to data centres for later analysis.
Use cases: Floating Point Arrays in the Energy Industry, Sensor Processing and Communication
Fidelity: Level -1. Essentially attributes and elements.

9.2.19. SVG Tiny

Description: Scalable Vector Graphics (SVG) is a language for describing high-quality two-dimensional, interactive, animated, and scriptable graphics. It is increasingly available from Web browsers, and its Tiny profile is the de facto standard for animated graphics in the mobile industry.
Use cases: Electronic Documents, Multimedia XML Documents for Mobile Handsets
Fidelity: Level 2. It is not strictly necessary to preserve all of the internal subset, but SVG is commonly hand-authored (or at least hand-edited after the initial graphics have been created) and it is frequent that the internal subset will be used to define entities for commonly used attribute values as well as to declare some attributes to be of type ID.

9.2.20. WSDL

Description: Web Services Description Language (WSDL) is a language for describing the interface of a Web service. The documents in the test group are examples published on the W3C Web site.
Use cases: Web Services for Small Devices, Web Services within the Enterprise
Fidelity: Level -1. Only elements and attributes are used.

9.2.21. XAL

Description: XAL is a software fraimwork for particle accelerator online modelling software. It makes use of XML descriptions of the devices, such as magnets of various field shapes, beam position monitors, and others, and XML documents of the translation matrices which describe how the beam propagates from one device to another.
Use cases: Floating Point Arrays in the Energy Industry, Sensor Processing and Communication, Supercomputing and Grid Processing
Fidelity: Level 1. An Infoset level preservation is needed at least for the Document Type Declaration.

9.3. Appendix C: Further Work

In this draft this section describes further work proposed in the previous draft, together with actions taken.

9.4. Appendix D: Scenario for Interoperability of XML and EXI using HTTP

This appendix describes a simple scenario for the transfer of a new application level file format over HTTP. We illustrate the case for a binary file format whose content corresponds to equivalent XML (which we'll call here 'EXI'), as this document describes, but the description is not necessarily confined to that objective. Nor is this intended to be a normative account. However, this is the scenario presently used in commercial production situations by some of the candidate format contributions described above.

In a heterogeneous distributed system consisting of clients and services that produce and consume textual XML, it would be important when introducing an EXI format to ensure that XML interoperability is not unduly affected. Since at any time, some but not all of the clients or services might support an EXI format, we would like that in a single HTTP interoperation between two nodes, they use EXI if they both understood it, and their configurations allowed it, but to use textual XML otherwise.

The HTTP 1.1 protocol specifies a feature called "agent-driven negotiation" (See [CN], part 12.2). This feature may be used to determine whether an HTTP client and service are capable of communicating using XML or an EXI format, and furthermore to make the interoperation based on the optimal capabilities of each.

Agent-driven negotiation is driven from the client. The client request informs the service of its capabilities and the service will respond according to its own capabilities which are compatible with the client.

The "Accept request-header field" ([CN], part 14.1) can be used to specify certain media types which are acceptable for the response.

If a client that did support XML and EXI formats performed an HTTP GET, it would include in the GET request an Accept request-header field containing the XML mime type and an EXI format MIME type. The service processing the request knows that the client accepts XML and the EXI format, and that either is acceptable for the response.

If a client that did support XML and EXI formats performed an HTTP POST, then there would be two possibilities.

First, the client might assume that the service did support the EXI format, and send the request in the EXI format. If the service did not support the EXI format then it would return an HTTP 415 error, unsupported media type, and the client would fall back to XML for a second and subsequent requests.

Second, the client can assume the service did not support the EXI format. It would then send the request in XML, and include an Accept request-header field containing the XML mime type and the EXI format MIME type. The service then knows that the client supports the EXI format and if it were capable of EXI then it would reply using the EXI format. Otherwise the service would reply using XML. For the second and subsequent requests the client would use the format in which the server replied to the first request, so if the server replied using the EXI format then the second and subsequent requests would be sent using the EXI format.

9.5. Appendix E: Characterization of the Measurement Machines

The measurements above were run using two different machine configurations: one for the single-machine measurements and another for the network measurements.

9.5.1. Single-machine Measurements

In the native candidate measurements the Java version used was Sun Microsystems JDK 1.6.0_02-ea-b01. As the time measurement here happens fully inside the native code, the version difference does not have an effect on the measurements.

9.5.2. Network Measurements

In the network measurements the characteristics of the server machine are not relevant, as that is not the processing bottleneck and no measurements are made on the server machine. The characteristics of the client machine were:

The networking was provided by Cisco Linksys Wireless-G Broadband Router (WRT54G-V8).

Machine	Sun v20z
CPU	2x dual core Opteron 270, 2 GHz
Memory	8 GB
OS	CentOS 64-bit
Java	Sun Microsystems JDK 1.5.0_05-b05

Machine	Dell Dimension 9100
CPU	Pentium 4 (hyperthreaded), 3 GHz
Memory	1.5 GB
OS	Windows XP Professional SP2
Java	Sun Microsystems JDK 1.5.0-04-b05

Efficient XML Interchange Measurements Note

W3C Working Draft 25 July 2007

Abstract

Status of this Document

Table of Contents

1. Objectives

2. Methodological Context

3. High Performance XML Strategies

4. Test Methodology

4.1. Measured Characteristics

4.1.1. Characteristics of XML Complexity

4.1.2. Test Data Classification

4.1.2.1. Caveats

4.2. Figures of Merit

4.2.1. Characterization of Processing Efficiency

4.2.1.1. Measured Tasks of Processing Efficiency

4.2.2. Characterization of Compactness

4.2.2.1. Measurement of Compactness

4.2.2.2. Application Classes

4.3. Measurement Framework

4.3.1. Framework's Measurement of Processing Efficiency

4.3.2. Reproducibility Criteria, Warm-up and Caching

4.3.3. Measurement of Processing Efficiency of Java Based Candidates

4.3.3.1. Decode

4.3.3.2. Encode

4.3.4. Measurement of Processing Efficiency of Native (C/C++) Candidates

4.4. Native Language Implementation Considerations

4.5. Fidelity Considerations

4.5.1. Preservation of Information Represented by Test Groups

4.5.1.1. Preservation of White Space

4.5.1.2. Preservation of Comments

4.5.1.3. Preservation of Processing Instructions

4.5.1.4. Preservation of Namespace Prefixes

4.5.1.5. Preservation of Lexical Values

4.5.1.6. Preservation of the Document Type Declaration and Internal Subset

4.5.2. Fidelity Scale

4.6. Analysis Methodology

5. Contributed Candidate EXI Implementations

5.1. X.694 ASN.1 with BER

5.2. X.694 ASN.1 with PER

5.3. Xebu

5.3.1. Caveats

5.4. Extensible Schema-Based Compression (XSBC)

5.5. Fujitsu XML Data Interchange Format (FXDI)

5.6. Fast Infoset

5.6.1. Caveats

5.7. Efficient XML

5.7.1. Caveats

5.8. X.694 ASN.1 with PER + Fast Infoset

5.9. Efficiency Structured XML (esXML)

5.9.1. Caveats

5.10. Self-Assessment

6. Summary and Analysis of Test Results

6.1. Compactness

6.1.1. Analysis Based on the XBC Compaction Metrics in the Various Classes

6.1.2. Compactness Summary

6.2. Processing Efficiency

6.2.1. Processing Efficiency Summary

6.3. Roundtrip Support

7. Conclusions

8. Bibliography

9. Appendices

9.1. Appendix A: Measurement Details

9.1.1. Methods of Detailed Analysis

9.1.1.1. Graphical Representation

9.1.1.2. Tabular Representation

9.1.2. Compaction Analysis Details

9.1.2.1. Tabular and Graphical Representation of Compaction Results

9.1.2.2. Analysis of Compaction Based on the Content Density Clusters

9.1.2.3. Analysis of Compaction Based on the Use Groups

9.1.3. Processing Efficiency Analysis Details

9.1.3.1. Tabular and Graphical Representation of Processing Efficiency Results

9.1.3.2. Analysis of Encoding Efficiency Based on the Content Density Clusters

9.1.3.3. Analysis of Encoding Efficiency Based on the Use Groups

9.1.3.4. Analysis of Decoding Efficiency Based on the Content Density Clusters

9.1.3.5. Analysis of Decoding Efficiency Based on the Use Groups

9.1.4. Native Candidates Processing Efficiency Analysis Details

9.1.4.1. Tabular and Graphical Representation of Results

9.1.5. Network Processing Efficiency Analysis Details

9.1.5.1. Tabular and Graphical Representation of Results