Binar Sort: A Linear Generalized Sorting Algorithm

by

William F. Gilreath

(will@williamgilreath.com)

May 2011

Abstract
Sorting is a common and ubiquitous activity for computers, so it is not surprising that a plethora
of sorting algorithms exists. For general-purpose sorting algorithms, it is an accepted performance
limit that sorting is linearithmic, O(N lg N). The linearithmic lower bound in performance stems
from the fact that these sorting algorithms use the ordering property of the data: they use
comparison by the ordering property to arrange the data elements from an initial permutation into
a sorted permutation.

Linear O(N) sorting algorithms exist, but they use a priori knowledge of the data to exploit a
specific property, and thus have greater performance. In contrast, the linearithmic sorting
algorithms are general because they use a universal property of data, relative ordering by
comparison, but they have a linearithmic lower bound on performance. The tradeoff in sorting
algorithms is generality for performance, determined by the property chosen to sort the data
elements.

A general-purpose, linear sorting algorithm seems implausible in the context of this tradeoff of
performance for generality. There is an implicit assumption that only the property of ordering is
universal. But ordering is not the only universal property of data elements. The binar sort is a
general-purpose sorting algorithm that utilizes another universal property, the encoding of the
data, to sort with linear performance.

Keywords: comparison, comparison sorting, encoding, ordering, linear, linearithmic, sort, sorting,
universal property.

1. Introduction
A. Sorting

Sorting is a frequent, ubiquitous computational activity on computers, often in tandem with other
algorithms such as search. The interest in and ubiquity of sorting have led to the development of a
diverse range of sorting algorithms. But this raises the question: “What exactly is sorting?”

Sorting Formalism

A sorting formalism, independent of any specific sorting algorithm, defines sorting without
prescribing a method to sort a collection of data elements. Knuth defines sorting [Knuth 1998] as:

The records:

$$R_1, R_2, \ldots, R_{N-1}, R_N$$

are supposed to be sorted into non-decreasing order of their keys $K_1, K_2, \ldots, K_N$, essentially by
discovering a permutation $p(1)\,p(2)\ldots p(N)$ such that:

$$K_{p(1)} \le K_{p(2)} \le \cdots \le K_{p(N-1)} \le K_{p(N)}$$
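Knuth's condition can be read as a direct check on a candidate permutation. The following is a minimal sketch, assuming 0-based Python lists rather than Knuth's 1-based indices; the function name `is_sorting_permutation` is illustrative and does not come from the source.

```python
def is_sorting_permutation(keys, p):
    """Return True if permutation p puts keys into non-decreasing order.

    keys holds K_1..K_N at indices 0..N-1; p lists indices 0..N-1, so
    keys[p[0]] plays the role of K_p(1) in Knuth's notation.
    """
    n = len(keys)
    # p must use every index exactly once to be a permutation.
    if sorted(p) != list(range(n)):
        return False
    # Keys visited in the order given by p must never decrease.
    return all(keys[p[k]] <= keys[p[k + 1]] for k in range(n - 1))


keys = [3, 1, 2]
print(is_sorting_permutation(keys, [1, 2, 0]))  # True: 1 <= 2 <= 3
print(is_sorting_permutation(keys, [0, 1, 2]))  # False: 3 > 1
```

The check only inspects the result. It says nothing about how p is found, which is exactly the gap a sorting algorithm fills.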

Knuth’s definition defines sorting in terms of the end result: a specific permutation out of the N!
permutations possible for a collection of N records. The definition is mathematical rather than
algorithmic. It describes the resulting organization among the data elements, but not the mechanism
that produces it. The actual means is the specific sorting algorithm.

Linearithmic Sorting

Most sorting algorithms are comparison-based sorting algorithms, such as merge sort, quicksort, and
heap sort. These sorting algorithms use comparison to arrange the elements in the sorted permutation,
and are general-purpose in nature.

The comparison sorting algorithms have a well-known theoretical performance limit [Johnsonbaugh and
Schaefer 2004]: a lower bound that is linearithmic, O(N lg N), in complexity. This lower bound stems
from the basis of comparison sorting, using comparison to arrange the data elements. A comparison
sort can be modeled as a binary decision tree in which each internal node is a comparison and each
leaf is one of the N! possible permutations of the N elements. A binary tree with N! leaves has a
height of at least lg(N!), and lg(N!) grows as N lg N. Hence, in the worst case, any comparison sort
performs on the order of N lg N comparisons, which is linearithmic complexity.
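The decision-tree argument can be made concrete numerically. A comparison sort must distinguish all N! possible orderings, so its binary decision tree needs at least ⌈lg N!⌉ levels, which is the minimum worst-case number of comparisons. The sketch below computes that bound with exact integer arithmetic and sets it beside N lg N. The function name is illustrative.

```python
import math

def min_worst_case_comparisons(n: int) -> int:
    """ceil(lg(n!)): fewest worst-case comparisons any comparison sort can achieve."""
    leaves = math.factorial(n)          # one leaf per possible ordering
    # For an integer x >= 1, (x - 1).bit_length() == ceil(log2(x)) exactly.
    return (leaves - 1).bit_length()


for n in (8, 64, 1024):
    bound = min_worst_case_comparisons(n)
    print(f"N={n:5d}  ceil(lg N!)={bound:6d}  N lg N={n * math.log2(n):8.0f}")
```

The ratio of the two columns approaches 1 as N grows, because lg N! = N lg N - O(N) by Stirling's approximation.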

Linear Sorting

Most, but not all, sorting algorithms are linearithmic and use comparison as the operation to
organize the data elements. There are also linear sorting algorithms, of O(N) complexity, that do
not use comparison. Such linear algorithms include radix sort, hash sort, and counting sort.

Linear sorting algorithms use a priori knowledge of the data elements, a specific property that is
required to sort linearly. The linear sorting algorithms use a function that maps each data element
to its place using that specific property.
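Counting sort, one of the linear algorithms named above, shows what such a priori knowledge looks like in practice. In this minimal sketch, the assumed specific property is that every key is an integer in a known range 0..max_key. The key itself selects a counter, and no two elements are ever compared.

```python
def counting_sort(data, max_key):
    """Sort integers known in advance to lie in 0..max_key, in O(N + max_key) time."""
    counts = [0] * (max_key + 1)       # one counter per possible key value
    for x in data:
        counts[x] += 1                 # map each element to its counter by value
    result = []
    for key, count in enumerate(counts):
        result.extend([key] * count)   # emit each key as many times as it occurred
    return result


print(counting_sort([3, 0, 2, 3, 1], max_key=3))  # [0, 1, 2, 3, 3]
```

The function is only as general as its assumption. A key above `max_key` raises `IndexError`, and a negative key silently indexes from the end of `counts` in Python. This is the special-purpose nature of linear sorting algorithms described in the text.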

The linear sorting algorithms are not general-purpose, as they are tied to a specific property.
Compared to (no pun intended) linearithmic sorting algorithms, linear sorting algorithms are
special-purpose. In sorting algorithms, there is the classic engineering tradeoff of generality
against performance: linearithmic algorithms provide generality, and linear algorithms provide
performance.

Summary of Linear with Linearithmic

Sorting is algorithmic, but the algorithm has a key operation based upon a property of the data
elements. Linearithmic sorting algorithms use a comparison operation, which is based upon the
universal property of ordering. Linear sorting algorithms use a mapping operation based upon a
specific property of the data elements.

The tradeoff of generality for performance in sorting algorithms is treated as absolute. But this
rests on an implicit presumption that there is no other universal property of the data elements,
that is, that ordering is the only universal property of data elements. However, ordering is not
the only universal property of data elements.

B. Data

Long before computers were invented and constructed, data existed. Files, cards, and records were
kept as the industrial revolution led to a revolution in data: production figures, records on
employees, and government records for taxes, the population census, and so forth. Hence modern
civilization is characterized by the growth of information in the form of data, in a variety of
forms for different and varied uses. Now there is a vast, dizzying ocean of data, with more
information accumulated, analyzed, mined, and processed.

Data on a Computer

In order to use data on a computer, the data must be represented in a form the computer can work
with natively. Roth explains, “...computers require processing of data which contains numbers,
letters, and other symbols such as punctuation marks. In order to transmit such alphanumeric data
to or from a computer, or store it internally, in a computer, each symbol must be represented
by a binary code.” [Roth 1992, pp. 13-14]

Data comes in the form of integers, ordinals, characters, strings, reals, and Boolean values.
Computers are binary, and work only on binary numbers, that is, on the binary digits, or bits, of a
binary number. Hence an integer must be represented in binary, a Boolean value as binary, and all
other kinds of data as binary numbers.

Encoding Data

To represent data on a computer as a binary number, or more generally in binary form, the data must
be encoded into binary form. Each kind of data, whether integers, ordinals, reals, characters,
strings, or Booleans, must be encoded.

Floyd explains, “In order to communicate, we need not only numbers, but also letters and other
symbols. In the strictest sense, alphanumeric codes are codes that represent numbers and alphabetic
characters (letters). Most such codes however, also represent symbols and various instructions
necessary for conveying information.” [Floyd 1990 p. 81].

Encoding is the system for representing each data type in binary form in a consistent way. Many
varieties of encoding exist, some from well before computers were invented, such as Morse code or
the Baudot code. Other encodings serve specific engineering purposes, such as Gray code and Excess-3,
or error detection, such as Hamming code. [Floyd 1990 p. 80] Null and Lobur summarize encoding as
“...how these internal values can be converted to a form that is meaningful to humans. The manner
in which this is done depends on both the coding system used by computer and how the values are
stored and retrieved.” [Null and Lobur 2003 p. 68]

There are different encoding standards in use, such as ASCII [ANSI 1986], UTF-8 [Yergeau 2003], and
Unicode [Unicode 2006] for characters, IEEE-754-1985 [IEEE 1985] for floating-point numbers, and
two’s complement for integers. Encoding is universal because computers require a ubiquitous form of
representation to interchange data. But encoding is perceived as a hardware feature, which is why it
is not readily apparent as a universal property. From the software perspective, data simply exists
in the computer, and the details of its representation are a hardware focus, not a software one. An
integer or Boolean exists in the programming language, and thus in the computer, but the fact that it
is specifically encoded as a binary number is abstracted away.

A summary of encoding for data types in the various programming languages:

1. integer - two’s complement encoding
2. real (as float or double) - IEEE-754-1985
3. Boolean - binary word (each bit is a Boolean value of true or false)
4. character - ASCII, EBCDIC, Unicode, UTF-8 (a string is an array or linear structure of
characters)
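Python's standard `struct` module exposes these encodings directly, which shows that every data type reduces to bits. The sketch assumes a 32-bit integer and a 64-bit IEEE 754 double (the binary64 format, which is unchanged since IEEE 754-1985). Both are packed big-endian so the bits read from left to right. The helper name `bits` is illustrative.

```python
import struct

def bits(data: bytes) -> str:
    """Render bytes as space-separated groups of 8 binary digits."""
    return " ".join(f"{b:08b}" for b in data)

int_bits = bits(struct.pack(">i", -5))      # integer: 32-bit two's complement
real_bits = bits(struct.pack(">d", 1.5))    # real: IEEE 754 binary64 double
char_bits = bits("A".encode("utf-8"))       # character: UTF-8 (same byte as ASCII for 'A')

print(int_bits)   # 11111111 11111111 11111111 11111011
print(real_bits)  # 00111111 11111000 00000000 00000000 00000000 00000000 00000000 00000000
print(char_bits)  # 01000001
```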

Other Universal Property

The ordering property, in the form of a comparison function or operator, is not the only universal
property in data. Frequently in object-oriented software development, a class must define a
comparison method to determine the ordering of its instances as objects. From a mathematical
perspective, numbers such as real numbers and integers have an ordering, but from an algorithmic
perspective the ordering is not always given. Thus ordering is not the only universal property,
and the tradeoff in sorting algorithms of performance for generality is not immutable.

The question is then: “What is the alternate universal property of data?” The answer is seemingly
obvious, but is not. It is the encoding, that is, how a data element is represented in the computer.

The basic data elements of integers, reals, and characters are all encoded in a binary form on a
computer. Computers are binary in operation, and the representation, or encoding, is in bits (binary
digits). But for computers to exchange and share data, the representation, that is, the encoding,
must be agreed upon, particularly for characters and strings of characters.

Given that the encoding is universal in a binary form, the germane question is: “How does this
relate to a sorting algorithm?” Simply put, finding another universal property allows for
generality, and encoding is not ordering. Hence an algorithm that sorts using the encoding is not a
comparison sort, and the linearithmic O(N lg N) lower bound does not apply to it.

Utilizing the encoding, a sorting algorithm can map the data elements instead of comparing one data
element to another, which is the basis for a general-purpose linear sorting algorithm. A function or
operator is needed that uses the encoding to arrange the data elements from an initial random
permutation into a sorted permutation. That sorting algorithm is the binar sort algorithm.

Ubiquity of Encoding

An obvious realization is that while Unicode is widely used, there is no one, single, universal
encoding of data, particularly for character data. It would seem contradictory to use the encoding
as a universal property to sort upon when there is no universal encoding. No universal encoding
implies no universal property, and hence that such sorting is not general-purpose.

While there is no one universal encoding for data, all the various encodings can be translated or
converted to one another. Even a proprietary encoding can be translated, out of the need to work
with the bulk of existing software, unless the originator of the proprietary encoding intends to
implement all the various forms of software to work with that encoding. Translation and conversion
are a necessity for the use of existing software. No matter how elegant and efficient an encoding,
the bulk of software exists using standardized encodings of data.

The universality is that all encodings are translatable to one another, so encoding is universal in
that data in any encoding can be used on a different platform or system.

More formally, for a given encoding domain $E_x$ there exists a translation function $F_{xy}$ that
translates the encoding domain $E_x$ into the encoding range $E_y$. A character or glyph $c_{E_x}$ in
encoding $E_x$ is mapped by $F_{xy}$ to a character $c_{E_y}$ in encoding $E_y$.

When $c_{E_x} \in E_x$ and $c_{E_y} \in E_y$, then for $F_{xy}$ it follows that:

$$F_{xy} : E_x \to E_y \quad\Rightarrow\quad F_{xy}(c_{E_x}) = c_{E_y}$$

When an encoding is a subset of another, larger encoding, translation is still possible. The
characters the two encodings share are mapped directly (a partial mapping of $E_x$ by $F_{xy}$ into
$E_y$), and every other character, outside the shared subset, is mapped to a default character in
the range encoding.
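A translation function F_xy can be sketched with Python's codecs. Decoding bytes from E_x yields abstract characters, and encoding them into E_y completes the mapping. The function name `translate` and the default character '?' are assumptions for illustration. Characters with no counterpart in E_y fall back to the default, which is the partial-mapping case.

```python
def translate(data: bytes, src: str, dst: str, default: str = "?") -> bytes:
    """Sketch of F_xy: re-encode text from encoding src (E_x) into encoding dst (E_y)."""
    out = []
    for ch in data.decode(src):                 # bytes in E_x -> characters
        try:
            out.append(ch.encode(dst))          # character exists in E_y
        except UnicodeEncodeError:
            out.append(default.encode(dst))     # outside E_y: use the default character
    return b"".join(out)


print(translate(b"caf\xe9", "latin-1", "utf-8"))            # b'caf\xc3\xa9'
print(translate("café".encode("utf-8"), "utf-8", "ascii"))  # b'caf?'
```

Python's built-in `str.encode(dst, errors="replace")` performs the same fallback for most codecs. The explicit loop simply makes the per-character mapping visible.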

2. Algorithm Synopsis
The binar sort partitions, or divides, data elements from the initial array into lower and upper
sub-arrays. The algorithm extracts the nth bit from a given data element, and then uses that bit to
map the element into the lower or upper sub-array.

Diagram 1 of Two Sub-Arrays at Extreme Ends of the Original Array

Two sub-arrays are created within the original array starting at the extreme endpoints and
progressing towards each other at the middle of the initial array.

Diagram 2 of Two Sub-Arrays Approaching One Another

The process continues on each created sub-array using the next bit position in the data elements.

Diagram 3 of Two Sub-Arrays Encompassing Original Array Approaching One Another

This process continues until either the last bit in the data element is reached, or the size of a
sub-array is one element.

Diagram 4 Illustrating Two Sub-Arrays at Intersection or Crossover

At the base case for termination, the resulting array cannot be partitioned into more sub-arrays,
or there are no more bits to extract. The original array is then in a sorted permutation.

3. Operation of Algorithm
The binar sort algorithm operates both iteratively and recursively in place on the original array of
data elements.

The algorithm operates with four discrete steps that are:

1. Evaluate for recursive base case.
2. Initialize starting array bounds.
3. Partition the array into sub-arrays.
4. Determine recursive call on sub-arrays.

Evaluate for Recursive Base Case

The first step is to evaluate the passed parameters for termination of the algorithm, to determine
if a recursive base case has been reached. When one of the two possible base cases of the binar
sort is reached, the recursion terminates, and the call returns without any further operation on the
passed parameters.

The two criteria for the base case of the recursion are:

1. Reach the end of the bits in an element.
2. The size of a sub-array is one element.

The first case is to reach the end of the number of bits for a data element. In effect, there are no
more bits to extract to use for partitioning. The second case is the check of the bounds of the array
parameter for any elements to partition into a sub-array. For an array of one element, there is no
point partitioning, as the element is in its final position. For either case, the operation of the
algorithm terminates.

Initialize Starting Array Bounds

When the binar sort algorithm does not terminate, the operation proceeds to initialization. The
passed array bounds are used to initialize the working bounds for the original array, and the passed
bounds themselves are retained for use later in the operation of the algorithm. The working bounds
are variables that track the changing boundaries of the sub-arrays during partitioning. After
initializing the boundaries, the operation then proceeds to partition.

Partition Array into Sub-arrays

The heart of the binar sort algorithm is the continual partitioning of the original array. The
partition step of the algorithm divides the elements of the original array into lower and upper
sub-arrays.

The partition operation has three steps:

1. Extract the nth bit from the data element in selected position.
2. Map the data element in the correct sub-array.
3. Adjust sub-array boundary to encompass element.

The initial start position used in the operation is the lower index position of the array. The
starting position acts as the locus of partitioning the array. The data element at the locus position
is the working element mapped into a sub-array.

Bit Extraction

Before partitioning the working data element, the nth bit is extracted to determine the sub-array
into which to map the element. Bit extraction uses a shift operation to the left by n bits, and a bit
mask. The bit mask is a literal value that depends on the data element word size in bits. The result
is either zero in all bits or the value of the literal bit mask.

For a data element of a nybble, or 4 bits, the bit mask is hexadecimal 0x8, the binary number
$1000_2$, or the integer value 8.

For a data element, using the bit mask with a bitwise AND operation gives:

$$XXXX_2 \;\&\; 1000_2 = X000_2$$

The resulting value for a nybble is either hexadecimal 0x0 (the binary number $0000_2$, the integer 0)
or hexadecimal 0x8 (the binary number $1000_2$, the integer 8), which is non-zero. Thus for a bit
extraction, the extracted bit for a data element is essentially a zero or non-zero (the literal bit
mask) value.
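The shift-and-mask extraction for a nybble can be sketched as follows. The sketch assumes bit positions are counted from the most significant bit, so n = 0 selects the bit under the 0x8 mask. This matches the first pass of the illustration. The names are illustrative.

```python
NYBBLE_MASK = 0x8  # binary 1000: the most significant bit of a 4-bit word

def extract_bit(element: int, n: int, mask: int = NYBBLE_MASK) -> int:
    """Return 0 or the mask value, according to bit n of element (n = 0 is the MSB)."""
    # Shifting left by n moves bit n (counted from the MSB) under the mask;
    # the bitwise AND then clears every other bit.
    return (element << n) & mask


print([extract_bit(0xB, n) for n in range(4)])  # 0xB = 1011 -> [8, 0, 8, 8]
```

In Python, a left shift never discards high-order bits. The mask still isolates exactly one bit position, however, so the result is always 0 or 8.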

Placement of Element

The placement of the data element depends upon the value from the bit extraction. The selected data
element is in the lower sub-array. Depending on the extracted bit value, it is placed in one of two
ways:

1. Data element is in the correct position in the lower sub-array, extend sub-array boundary.
2. Data element is in the wrong position, exchange with the element in the upper sub-array.

With an extracted bit value of zero, the data element is in the correct position in the lower
sub-array, and nothing is exchanged. For a non-zero extracted bit value, the data element is
exchanged, or swapped, with the data element at the boundary of the upper sub-array. In both cases,
the array bounds are then adjusted after the data element is placed.

Adjust Array Bounds

Once the data element is placed in the correct sub-array, the sub-array boundaries are adjusted to
encompass the element. When the element remains in the lower sub-array, the lower boundary index is
incremented. When it is exchanged into the upper sub-array, the upper boundary index is decremented.
The lower and upper array boundaries approach each other as each data element is partitioned.

Partition Repetition

The partitioning process continues iteratively for each data element in the array. The iterative
process continues until the array bounds cross or intersect, when all the elements are partitioned
into the sub-arrays for a bit position. The remaining step is to determine the recursive
continuation of the operation of the algorithm.

Determine Recursive Continuation

The last step of the binar sort is recursive continuation of the algorithm on the result of
partitioning. The partitioning creates one or two sub-arrays. If there is one sub-array, no
partitioning occurred, which is called pass through. If there are two sub-arrays, the partitioning
process successfully divided the data elements of the original array.

The continuation of the algorithm must determine the results of partitioning. The original array
bounds are evaluated using the index of the sub-array bounds after partitioning, and partitioning
continues recursively on the sub-arrays with the next bit position.
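The four steps can be sketched as a single recursive Python function. This is an illustrative reading of the description, not the author's code. It assumes unsigned integers of a fixed `width` in bits, inclusive bounds `lo` and `hi`, and a bit position `n` counted from the most significant bit. Signed two's complement integers and IEEE 754 reals would need their sign bit handled differently, which this sketch does not attempt.

```python
def binar_sort(a, lo=0, hi=None, n=0, width=4):
    """Sort a[lo..hi] (inclusive) in place by partitioning on bit n, then n + 1, and so on."""
    if hi is None:
        hi = len(a) - 1
    # Step 1: base cases, either no bits remain or fewer than two elements remain.
    if n >= width or hi - lo < 1:
        return
    # Step 2: working bounds start from the passed bounds (lo and hi are kept for step 4).
    i, j = lo, hi
    mask = 1 << (width - 1)              # 0x8 when width is 4 (a nybble)
    # Step 3: partition on bit n until the lower and upper bounds cross.
    while i <= j:
        if ((a[i] << n) & mask) == 0:
            i += 1                       # zero bit: element stays in the lower sub-array
        else:
            a[i], a[j] = a[j], a[i]      # non-zero bit: exchange into the upper sub-array
            j -= 1                       # the swapped-in element is examined next
    # Step 4: continue on a[lo..i-1] and a[i..hi] with the next bit position.
    binar_sort(a, lo, i - 1, n + 1, width)
    binar_sort(a, i, hi, n + 1, width)


nybbles = [0xB, 0x4, 0x0, 0x0, 0x7, 0xA, 0xC, 0xE]
binar_sort(nybbles)
print([f"{x:X}" for x in nybbles])  # ['0', '0', '4', '7', 'A', 'B', 'C', 'E']
```

Both recursive calls are always made. The base-case test at the top absorbs empty and single-element sub-arrays, and this covers the pass-through case in which one sub-array receives every element.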

4. Illustration of the Algorithm


The binar sort algorithm works on different data types (such as character, integer, ordinal, real,
and string) of different data bit sizes. The word size of the data element is a constant c of the
algorithm. To illustrate the algorithm, a nybble, or a 4-bit word size (a single hexadecimal digit),
is used.

Consider an initial array of eight nybble values to sort into an ordered permutation. The bit mask
for the bit extraction is hexadecimal 0x8, the binary number 1000, or the integer value 8.

First Pass at Partitioning

The first pass at partitioning divides the original array (in the center) into a lower sub-array (on
the left) and an upper sub-array (on the right). Initially, before any partitioning, the configuration
of the array, lower sub-array, and upper sub-array is:

Figure 1 Initial Array of Elements before Partitioning

The selected element is 'B', and the bit extracted from the most significant position is non-zero.
The element is not in the correct position, so it is swapped with the element at the boundary of the
upper sub-array, the 'E' element. After the exchange and array boundary adjustment, the
configuration is:

Figure 2 Array of Elements after Partitioning First Element

The selected element is 'E', and the bit extracted is non-zero. Again, the element is not in the
correct position, so it is swapped with the 'C' element. The configuration is:

Figure 3 Array of Elements after Partitioning Second Element

The selected element is 'C', and the bit extracted is non-zero. The element is not in the correct
position, and so it is swapped with the 'A' element. The configuration is:

Figure 4 Array of Elements after Partitioning Third Element

The selected element is 'A', and the bit extracted is non-zero. The element is not in the correct
position, and so it is swapped with the '7' element. The configuration is:

Figure 5 Array of Elements after Partitioning Fourth Element

The selected element is '7', and the bit extracted is zero. The element is in the correct position,
and so only the array bounds are adjusted. The configuration is:

Figure 6 Array of Elements after Partitioning Fifth Element

The selected element is '4', and the bit extracted is zero. Again, the element is in the correct
position, and so only the array bounds are adjusted. The configuration is:

Figure 7 Array of Elements after Partitioning Sixth Element

The selected element is '0', and the bit extracted is zero. The element is in the correct position,
and so only the array bounds are adjusted. The configuration is:

Figure 8 Array of Elements after Partitioning Seventh Element

Again, the selected element is '0', and by the same process the final configuration is:

Figure 9 Array of Elements after Partitioning Eighth Element

The algorithm continues on each sub-array as an array, further partitioning them on the next bit
position.

Partition from Initial Lower Sub-Array

The partitioning of the lower sub-array is on the second bit position. As before, the initial
configuration is:

Figure 10 Lower Sub-array of Elements before Partitioning

The selected element is '7', and the extracted bit is non-zero. The element is placed with an
exchange, and the configuration is:

Figure 11 Lower Sub-array of Elements after Partitioning First Element

The selected element is '0', and the extracted bit is zero. The element is in the correct position,
and the configuration is:

Figure 12 Lower Sub-array of Elements after Partitioning Second Element

The selected element is '4', and the extracted bit is non-zero. The element is placed with an
exchange, and the configuration is:

Figure 13 Lower Sub-array of Elements after Partitioning Third Element

The selected element is '0', and the extracted bit is zero. The element is in the correct position
(it is the only element left to partition), and the configuration is:

Figure 14 Lower Sub-array of Elements after Partitioning Fourth Element

The resulting lower and upper sub-array elements are in the correct positions. However, partitioning
continues, in the case of pass through resulting in only one sub-array. The remaining third and
fourth bit positions are evaluated until the last bit position is reached.

Partition from Initial Upper Sub-Array

The partitioning of the upper sub-array is on the second bit position. As before, the initial
configuration is:

Figure 15 Upper Sub-array of Elements before Partitioning


The selected element is 'A', and the extracted bit is zero. The element is in the correct position,
and the configuration is:

Figure 16 Upper Sub-array of Elements after Partitioning First Element

The selected element is 'C', and the extracted bit is non-zero. The element is swapped into the
correct position, and the configuration is:

Figure 17 Upper Sub-array of Elements after Partitioning Second Element

The selected element is 'B', and the extracted bit is zero. The element is in the correct position,
and the configuration is:

Figure 18 Upper Sub-array of Elements after Partitioning Third Element

Lastly, the selected element is 'E', and the extracted bit is non-zero. The element is swapped
(with itself as there is only one element) and the array bounds adjusted. The final configuration
is:

Figure 19 Upper Sub-array of Elements after Partitioning Fourth Element

For the lower sub-array, the elements are already in the correct positions for the third and fourth
bits, the case of pass through. For the upper sub-array, the elements are positioned on the third
bit, but then the recursion terminates, as each sub-array size is one element.
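The walkthrough can be reproduced with a tracing version of one partition pass. The figures are not reproduced in this text, so the initial array below is an assumption reconstructed from the narration: the exchanges B with E, E with C, C with A, and A with 7, followed by 7, 4, 0, and 0 staying in place. The function name is illustrative.

```python
def trace_partition(a, lo, hi, n, mask=0x8):
    """Partition a[lo..hi] on bit n (0 = MSB) in place, logging each placement."""
    steps = []
    i, j = lo, hi
    while i <= j:
        x = a[i]
        if ((x << n) & mask) == 0:
            steps.append(f"{x:X}: zero, stays in lower sub-array")
            i += 1
        else:
            steps.append(f"{x:X}: non-zero, exchanged with {a[j]:X}")
            a[i], a[j] = a[j], a[i]
            j -= 1
    return i, steps  # i is the first index of the upper sub-array


# Assumed initial array, reconstructed from the narration (hypothetical).
array = [0xB, 0x4, 0x0, 0x0, 0x7, 0xA, 0xC, 0xE]
split, first = trace_partition(array, 0, 7, 0)        # first pass, most significant bit
after_first = list(array)                             # [7, 4, 0, 0, A, C, E, B]
_, lower = trace_partition(array, 0, split - 1, 1)    # lower sub-array, second bit
_, upper = trace_partition(array, split, 7, 1)        # upper sub-array, second bit
for line in first + lower + upper:
    print(line)
```

The logged steps follow the narration step by step, including the final 'E' being exchanged with itself in the upper sub-array.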

Summary of Illustration of the Algorithm

The partitioning process starts from an initial array of nybbles, or hexadecimal digits, and
continues until the last bit is reached (pass through), or until a sub-array is only one element in
size. Consider the configuration of the array into sub-arrays for each bit position, starting from
the initial array. The configuration for each pass is:

Figure 20 Partitioning Hierarchy from Initial Array into Sub-arrays

On partitioning with the second and third bits, the sub-array [E C] is partitioned into two
sub-arrays of a single element. Thus, for the remaining bit positions, no partitioning occurs, as the
recursion terminates.

The other sub-arrays, however, are the case of pass through: no data elements are partitioned. This
continues until the last bit position is reached and the recursion terminates. Both base cases of the
recursion are used in the overall partitioning process.

After the last bit position is evaluated for the three sub-arrays that are partitioned but result in
pass through, the overall array is in the final configuration of:

Figure 21 Resulting Array after Partitioning into Sorted Permutation

The configuration is, by Knuth's formal definition, a sorted permutation of the original array of
elements. The partitioning on the sub-arrays operates in place, within the index boundaries for
sub-arrays within the overall array.

5. Analysis of Algorithm
The binar sort is linear in both space and time complexity. The algorithm requires no extra storage
beyond the array and a constant number of variables per recursive call, and its performance time is
proportional to the number of data elements.

Space Analysis

The binar sort is an in-place sorting algorithm; hence the same initial array is used for each
recursive invocation of the algorithm for partitioning. The sub-arrays are determined by the passed
parameters as index boundaries within the array. The only change is of the variables for each
recursive call of the binar sort.

The array of N elements and a constant number of variables c for each recursive invocation give
the space complexity S:

$$S = N + c$$

As there are multiple invocations through recursion, the constant is the sum over the recursive
calls that are active at the same time, or for i nested recursive calls:

$$S = N + (c_0 + \cdots + c_{i-1})$$

The expression can be simplified using summation notation to the expression:

$$S = N + \sum_{j=0}^{i-1} c_j$$

Each call uses the same constant number of variables, $c_j = c$, so the summation of a constant is
the product of the constant and the number of calls:

$$\sum_{j=0}^{i-1} c_j = i\,c$$

Each nested call advances to the next bit position, so i is bounded by the word size in bits (plus
one for the final base-case call), which is itself a constant. Effectively, the sum of the constants
c for i recursive calls is simply another constant, written again as c in Big-Oh notation, so the
space complexity for the binar sort is:

$$S = N + c = O(N)$$

The space complexity of the binar sort is linear, or O(N), proportional to the number of data
elements.
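The bound on nested calls can be checked empirically. The sketch below is an instrumented variant of a binar sort that reports the deepest recursion level reached. It assumes unsigned `width`-bit integers and MSB-first bit positions, and it is a reading of the description rather than the author's code. Because each level advances one bit, the depth never exceeds `width + 1`, regardless of N.

```python
import random

def binar_sort_depth(a, lo, hi, n=0, width=16, depth=1):
    """Sort a[lo..hi] in place; return the deepest level of nested calls reached."""
    if n >= width or hi - lo < 1:            # base cases
        return depth
    i, j, mask = lo, hi, 1 << (width - 1)
    while i <= j:                             # partition on bit n
        if ((a[i] << n) & mask) == 0:
            i += 1
        else:
            a[i], a[j] = a[j], a[i]
            j -= 1
    return max(binar_sort_depth(a, lo, i - 1, n + 1, width, depth + 1),
               binar_sort_depth(a, i, hi, n + 1, width, depth + 1))


random.seed(0)
for size in (100, 10_000):
    data = [random.randrange(1 << 16) for _ in range(size)]
    depth = binar_sort_depth(data, 0, size - 1)
    print(size, depth, data == sorted(data))  # depth stays at most 17 for 16-bit keys
```

Only the calls on one root-to-leaf path are active at any moment, and each holds a fixed handful of variables. This is the constant term in the space expression.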

Time Analysis

The analysis of the performance time of sorting algorithms typically considers different cases,
from an optimal case to a worst case. Each case is simply a different permutation for which the
particular sorting algorithm has better or worse performance. Frequently, the permutations of the
data for the different cases are an ascending sorted permutation, a descending sorted permutation,
and a random permutation.

This approach to sorting algorithm analysis is used for the comparison-based sorting algorithms.
An unasked question is "Why use various permutations for performance time analysis?" The answer is
that comparison-based sorting algorithms do a relative comparison for positioning a data element.
Each comparison results in a relative positioning of the data element in the overall array of data
elements. Thus, the initial permutation of the data elements can impact sorting performance.

Why consider this particular point for time analysis of a sorting algorithm? Simply, the binar sort
does not use comparison to place an element in position, but rather the extracted bit values. For a
data element with a word size of c bits, a constant, each bit contributes 1/c of the process of
correctly positioning the data element within the array. Each positioning is absolute within the
overall array, not relative to the other data elements. Effectively, the initial permutation of the
data elements is irrelevant to the performance time of the binar sort. Comparison is not used to
place an element relatively; each bit is used to put an element in an absolute position.

In the analysis of the performance time, considering worst and optimal cases is irrelevant, as the
binar sort is not comparison-based and so is independent of any particular permutation of the data
elements.

Analysis of the Performance

Analysis of the binar sort algorithm performance time does not use specific cases. The analysis
approach is to consider each discrete step, and consider the overall total performance time O(T)
as the sum of each step. Thus the initial analysis of the performance time of the binar sort is:

O(T) = O(T0) + O(T1) + O(T2) + O(T3)

Each discrete step has its own performance time, and for each discrete step the performance time
is:

1. O(T0) - Evaluate for recursive base case.
2. O(T1) - Initialize starting array bounds.
3. O(T2) - Partition array into sub-arrays.
4. O(T3) - Determine recursive call on sub-arrays.

Except for the partition step of the algorithm, all the other steps take constant performance time.
For the last step, involving the logical decision of the recursive call, the evaluation of the
decision is constant in complexity, and the greater performance time for two recursive calls is used
as the constant. Substituting a constant for each step, the performance time of the binar sort
becomes:

O(T) = O(c0) + O(c1) + O(T2) + O(c3)

Combining the constants into a single constant c, the expression becomes:

$$O(T) = O(T_2) + c$$

Partitioning Analysis

The partitioning step is not constant in performance time, because it is iterative: it processes
all elements in the array or, in a recursive continuation, all elements in a sub-array, which is a
division of the overall array. For each data element, extracting the bit and determining the
sub-array in which to place the element takes a constant performance time c. This process is
repeated N times, once for each data element, so the performance time for the partitioning step is:

O(T2) = cN

Substituting this into the original performance time expression for the algorithm gives:

O(T) = cN + c

For the Big-Oh time complexity, this expression simplifies to:

$$O(T) = O(N)$$
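The partitioning cost can be checked directly. The following C sketch performs one pass with the same swap-to-the-end scheme as the appendix listings and counts loop iterations. Because every iteration either advances `lo` or retreats `hi`, a range of N elements always takes exactly N iterations. The helper name `partition_pass`, the step counter, and the use of unsigned 32-bit keys instead of the appendices' `int` arrays are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: one partitioning pass over list[low..high] on bit
   `pos` (0 = most significant), using the swap-to-the-end scheme of the
   appendix listings. Returns the first index of the "bit is 1" sub-array
   and stores the number of loop iterations in *steps. */
static int partition_pass(int low, int high, int pos, uint32_t list[], long *steps)
{
    assert(pos >= 0 && pos < 32);
    int lo = low, hi = high;
    *steps = 0;
    while (lo <= hi) {
        ++*steps;                                /* constant work per element */
        if ((list[lo] >> (31 - pos)) & 1u) {     /* bit is 1: move to upper end */
            uint32_t tmp = list[hi];
            list[hi] = list[lo];
            list[lo] = tmp;
            hi--;
        } else {                                 /* bit is 0: stays in lower part */
            lo++;
        }
    }
    return lo;
}
```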

Recursion Analysis

One important consideration in the performance time analysis is that the expression above, as the
sum of the performance times of the discrete steps, covers only one operation of the algorithm on a
single bit position. For a single bit position, the performance time expression is therefore:

O(T) = 1N

In each recursive pass, one bit of each data element is accessed once, and there is a constant
number c of bits per element (for data types with variable length, such as a string of characters,
there is an overall average length per data element that is independent of the number of data
elements). The recursion therefore occurs once for each bit position, so the performance time is:

O(T) = cN

Again, the performance time is linear, O(N).
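To see the bound over all bit positions, the following instrumented C sketch counts every bit extraction across the whole recursion. It assumes unsigned 32-bit keys (so c = 32) and adds a hypothetical counter parameter. Since the sub-arrays at any one recursion depth are disjoint, each depth examines at most N elements, so the total is at most 32N.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical instrumented variant of the appendix routine on unsigned
   32-bit keys; *accesses accumulates the number of bit extractions. */
static void binar_sort_counted(int low, int high, int pos, uint32_t list[], long *accesses)
{
    assert(pos >= 0 && pos <= 32);
    if (pos == 32 || high <= low) return;        /* base case */
    int lo = low, hi = high;
    while (lo <= hi) {
        ++*accesses;                             /* one bit of one element examined */
        if ((list[lo] >> (31 - pos)) & 1u) {
            uint32_t tmp = list[hi];
            list[hi] = list[lo];
            list[lo] = tmp;
            hi--;
        } else {
            lo++;
        }
    }
    /* Both calls are always made; in the pass-through case one range is
       empty and returns at once, matching the appendix behavior. */
    binar_sort_counted(low, lo - 1, pos + 1, list, accesses);
    binar_sort_counted(lo, high, pos + 1, list, accesses);
}
```

The count depends on how quickly sub-arrays shrink to a single element, but it never exceeds one access per bit per element.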

Summary of Performance Analysis

Regardless of the permutation of the data elements, the binar sort accesses each data element at
most once for each bit. Hence for N data elements with a constant number of bits c, the
performance time is O(N). Because the binar sort algorithm operates independently of the
permutation of the data elements, it avoids a distinct worst or best case. Thus, the binar sort
algorithm remains consistently linear in performance time.

6. Applied Performance
The binar sort was implemented in both managed code (Java and C#) and native code (C) to evaluate
the applied performance of the algorithm. The code was executed on different platforms: an AMD
64-bit PC running Windows 7, a PowerPC eMac running Mac OS X, an Intel Core Duo MacBook, and a
Linux Pentium 4 PC. The managed code was executed on the Common Language Runtime (CLR) of .NET and
on the Java Virtual Machine (JVM).

The same data sets were used for all implementations of the binar sort. The tests examined three
cases:

1. Unsorted integers
2. Sorted integers
3. Partially sorted integers

In each case, the test was executed, and the performance time and the size of the data were
written to a log file. The test sizes varied in increments of blocks of $2^{16}$ = 65,536 integers
and of blocks of $2^{20}$ = 1,048,576 integers.
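The paper does not list its test harness, so the following C sketch is only a hedged illustration of how the three cases could be generated and timed. The function names, the use of `rand()`, and especially the rule for "partially sorted" (every fourth slot replaced by a pseudo-random value) are hypothetical choices, not the author's. Timing uses the standard `clock()` function, which measures processor time.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Binar sort on unsigned 32-bit keys, following the appendix listings. */
static void bsort_u32(int low, int high, int pos, uint32_t list[])
{
    assert(pos >= 0 && pos <= 32);
    if (pos == 32 || high <= low) return;
    int lo = low, hi = high;
    while (lo <= hi) {
        if ((list[lo] >> (31 - pos)) & 1u) {
            uint32_t tmp = list[hi]; list[hi] = list[lo]; list[lo] = tmp;
            hi--;
        } else {
            lo++;
        }
    }
    bsort_u32(low, lo - 1, pos + 1, list);       /* empty ranges return at once */
    bsort_u32(lo, high, pos + 1, list);
}

/* The three test cases; the PARTIAL rule below is a hypothetical choice. */
enum data_case { UNSORTED, SORTED, PARTIAL };

static void fill_case(uint32_t a[], int n, enum data_case c)
{
    for (int i = 0; i < n; i++) {
        uint32_t r = ((uint32_t)rand() << 16) ^ (uint32_t)rand();
        if (c == UNSORTED)    a[i] = r;               /* pseudo-random values */
        else if (c == SORTED) a[i] = (uint32_t)i;     /* already ascending */
        else a[i] = (i % 4 == 0) ? r : (uint32_t)i;   /* every 4th slot random */
    }
}

/* Fill, sort, and return the elapsed processor time in seconds. */
static double time_case(uint32_t a[], int n, enum data_case c)
{
    fill_case(a, n, c);
    clock_t start = clock();
    bsort_u32(0, n - 1, 0, a);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

A driver could call `time_case` for n = k × 65,536 with k = 1, 2, 3, and so on, writing each (n, time) pair to a log file.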

The performance is linear, with a statistical covariance ranging from 0.9994 to 0.9999 across the
performance graphs. The applied implementation of the binar sort was consistent with the
theoretical expectation of linear O(N) performance.

Chart of Test for Binar Sort Performance at 0.9994 Covariance

Chart of Test for Binar Sort Performance at 0.9996 Covariance

Chart of Test for Binar Sort Performance at 0.9997 Covariance

Chart of Test for Binar Sort Performance at 0.9999 Covariance

Chart of Test for Binar Sort Performance at 0.9999 Covariance on Megabyte Block

Summary of Charts

The charts of data size N against time T each show a linear, straight line. As the size of the
data increases linearly, the time to sort the data increases linearly in proportion to a constant.
The covariance varies from 0.9994 to 0.9999, and is at least 0.999 for each equation. Hence the
equations are statistically accurate descriptions of the binar sort performance.
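The section reports a "covariance" for each fitted equation but does not give its formula. As a hedged illustration of how a straight-line fit and its closeness to the data are commonly quantified, the following C sketch computes an ordinary least-squares line T = aN + b and the coefficient of determination R². Whether the paper's figures were computed this way is an assumption; the names `fit_line` and `struct linear_fit` are hypothetical.

```c
#include <assert.h>

/* Result of an ordinary least-squares fit t = slope * n + intercept. */
struct linear_fit { double slope, intercept, r2; };

/* Fit k (n, t) samples; r2 is the coefficient of determination, which for
   simple linear regression equals the squared correlation coefficient. */
static struct linear_fit fit_line(const double n[], const double t[], int k)
{
    assert(k >= 2);
    double mean_n = 0.0, mean_t = 0.0;
    for (int i = 0; i < k; i++) { mean_n += n[i]; mean_t += t[i]; }
    mean_n /= k;
    mean_t /= k;

    double sxx = 0.0, sxy = 0.0, syy = 0.0;
    for (int i = 0; i < k; i++) {
        double dx = n[i] - mean_n, dy = t[i] - mean_t;
        sxx += dx * dx;
        sxy += dx * dy;
        syy += dy * dy;
    }
    assert(sxx > 0.0 && syy > 0.0);              /* data must not be constant */

    struct linear_fit f;
    f.slope = sxy / sxx;
    f.intercept = mean_t - f.slope * mean_n;
    f.r2 = (sxy * sxy) / (sxx * syy);
    return f;
}
```

Feeding in the (size, time) pairs from one platform gives that platform's equation constants; an R² near 1 indicates the points lie close to a straight line.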

Chart of Binar Sort Performance Equations High Scale Range Plot

When the equations relating data size to sort time are plotted, the lines are congruent, and some
are parallel to each other.

Chart of Binar Sort Performance Equations Lower Scale Range Plot

The equations change position along the axes, but the plotted lines are consistent. This indicates
that the binar sort performance is consistently linear, with only the equation constants changing
with the data sizes and the runtime platform.

7. Future Work

There are two major potential areas for further work with the binar sort:

1. Variations
2. Optimization

Algorithm Variants

The binar sort is recursive and executes on a serial processor. This leads to the opportunity for:

1. Purely iterative implementation.
2. Parallel version of the algorithm.

The purely iterative variant replaces recursion with iteration. Such an implementation is more an
investigation into a potential possibility, and so remains an open research question. An iterative
version would remove the time and space overhead of activation records on the runtime stack of the
environment. Comparing the performance of a recursive and an iterative implementation of the binar
sort would show any possible performance improvement.
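One way such an iterative version could look is sketched below in C, assuming unsigned 32-bit keys. It replaces the runtime call stack with a small explicit stack of pending (low, high, pos) tasks; this is one possible design, not the author's, and the names `struct task`, `STACK_MAX`, and `binar_sort_iterative` are hypothetical. Because each popped task pushes at most two tasks one bit level deeper, the stack holds at most about one pending task per bit level, so 64 entries are ample for 32-bit keys.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_MAX 64   /* ample for 32 bit levels (hypothetical bound) */

/* A pending sub-array list[low..high] to be partitioned on bit pos. */
struct task { int low, high, pos; };

static void binar_sort_iterative(uint32_t list[], int n)
{
    struct task stack[STACK_MAX];
    int top = 0;
    if (n > 1) stack[top++] = (struct task){0, n - 1, 0};

    while (top > 0) {
        struct task t = stack[--top];
        int lo = t.low, hi = t.high;
        while (lo <= hi) {                       /* same partitioning as the appendix */
            if ((list[lo] >> (31 - t.pos)) & 1u) {
                uint32_t tmp = list[hi]; list[hi] = list[lo]; list[lo] = tmp;
                hi--;
            } else {
                lo++;
            }
        }
        if (t.pos + 1 == 32) continue;           /* every bit position used */
        /* Push sub-arrays with two or more elements instead of recursing. */
        if (lo - 1 > t.low) {
            assert(top < STACK_MAX);
            stack[top++] = (struct task){t.low, lo - 1, t.pos + 1};
        }
        if (t.high > lo) {
            assert(top < STACK_MAX);
            stack[top++] = (struct task){lo, t.high, t.pos + 1};
        }
    }
}
```

Skipping sub-arrays with fewer than two elements keeps the stack small and mirrors the base case of the recursive listings.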

The binar sort executes on a serial processor, and thus has potential for parallelization. A
parallel version of the binar sort would let each processor work on the partition corresponding to
that processor's identity. An interesting feature is that, starting from the initial array of data
elements, each parallel operation is completely independent of the others. Theoretically, the
performance of the linear O(N) algorithm improves to O(N/P) with a total of P processors.
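The independence of the sub-arrays can be illustrated with a small C sketch for the simplest case, P = 2. It partitions once on the most significant bit and then sorts the two disjoint sub-arrays concurrently using OpenMP `sections`, an assumed parallel runtime (enabled in GCC and Clang with `-fopenmp`). Without OpenMP support the pragmas are ignored and the two calls simply run one after the other, which still sorts correctly. The names are hypothetical, and a general P-processor version would need a different work split.

```c
#include <assert.h>
#include <stdint.h>

/* Serial binar sort on unsigned 32-bit keys, following the appendix listings. */
static void bsort_u32(int low, int high, int pos, uint32_t list[])
{
    assert(pos >= 0 && pos <= 32);
    if (pos == 32 || high <= low) return;
    int lo = low, hi = high;
    while (lo <= hi) {
        if ((list[lo] >> (31 - pos)) & 1u) {
            uint32_t tmp = list[hi]; list[hi] = list[lo]; list[lo] = tmp;
            hi--;
        } else {
            lo++;
        }
    }
    bsort_u32(low, lo - 1, pos + 1, list);
    bsort_u32(lo, high, pos + 1, list);
}

/* Hypothetical two-way parallel driver: one partition on the MSB, then
   two independent sorts on disjoint ranges. */
static void binar_sort_parallel2(uint32_t list[], int n)
{
    if (n < 2) return;
    int lo = 0, hi = n - 1;
    while (lo <= hi) {                           /* partition on bit 0 (MSB) */
        if (list[lo] >> 31) {
            uint32_t tmp = list[hi]; list[hi] = list[lo]; list[lo] = tmp;
            hi--;
        } else {
            lo++;
        }
    }
    #pragma omp parallel sections
    {
        #pragma omp section
        bsort_u32(0, lo - 1, 1, list);           /* elements whose MSB is 0 */
        #pragma omp section
        bsort_u32(lo, n - 1, 1, list);           /* elements whose MSB is 1 */
    }
}
```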

Optimize the Algorithm

The binar sort is efficient as a linear O(N) algorithm, but optimizations are possible in its
operation. Two such possible optimizations are:

1. Improve the pass-through case of partitioning.
2. Determine when the elements are already in sorted order.

The first potential optimization concerns pass-through when partitioning, where essentially no data
elements are partitioned into separate sub-arrays. The cost of a recursive call to continue
partitioning within the same bounds at the next bit position can be avoided by iteration: instead
of a recursive call, partitioning would simply proceed on the next bit position. This also
simplifies the recursive continuation step, as recursion would then always be on two sub-arrays,
never just a single one.
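A minimal C sketch of this optimization, assuming unsigned 32-bit keys, is shown below. When a pass leaves all elements on one side (all bits 0 or all bits 1), the loop advances to the next bit position within the same bounds instead of recursing; otherwise it recurses on exactly two non-empty sub-arrays. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pass-through-optimized binar sort on unsigned 32-bit keys. */
static void binar_sort_passthrough(int low, int high, int pos, uint32_t list[])
{
    assert(pos >= 0 && pos <= 32);
    while (pos < 32 && high > low) {
        int lo = low, hi = high;
        while (lo <= hi) {
            if ((list[lo] >> (31 - pos)) & 1u) {
                uint32_t tmp = list[hi]; list[hi] = list[lo]; list[lo] = tmp;
                hi--;
            } else {
                lo++;
            }
        }
        if (lo > high || lo == low) {
            pos++;          /* pass through: same bounds, next bit, no call */
            continue;
        }
        /* Genuine split: recurse on exactly two non-empty sub-arrays. */
        binar_sort_passthrough(low, lo - 1, pos + 1, list);
        binar_sort_passthrough(lo, high, pos + 1, list);
        return;
    }
}
```

Small non-negative integers, whose leading bits are all 0, benefit most, since each leading bit is then a pass-through handled by the loop rather than by a new activation record.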

The illustration of the operation of the binar sort demonstrated that the algorithm continues to
invoke itself recursively even when the entire array is already in a sorted permutation. This
wastes processing time and adds recursion overhead without any further need. A viable optimization
is to check, within the original array bounds, whether the data elements are in sorted order and,
if so, to indicate that a sorted permutation exists.

The difficulty with this optimization is to avoid unnecessary checks when the array is unlikely to
be sorted, while still determining when the array is sorted so as to avoid unnecessary recursive
partitioning. For example, a sorted-order check could be performed after several occurrences of
pass-through while partitioning. If the array is in sorted order, the recursive calls on the other
sub-arrays would terminate.
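The following C sketch illustrates the example above under stated assumptions: unsigned 32-bit keys, the pass-through loop from the previous optimization, and a hypothetical threshold `PASS_LIMIT` of four consecutive pass-throughs before a linear sorted-order check of the current bounds. The threshold, the function names, and the choice to check only the current sub-array are illustrative, not the author's. Note that an all-1 pass-through swaps elements and can disturb an already sorted range, so the check is most effective for leading bits that are 0; it never returns early unless the range really is sorted.

```c
#include <assert.h>
#include <stdint.h>

#define PASS_LIMIT 4   /* hypothetical: pass-throughs before checking order */

/* Return 1 if list[low..high] is in ascending order, else 0. */
static int is_sorted_range(const uint32_t list[], int low, int high)
{
    for (int i = low; i < high; i++)
        if (list[i] > list[i + 1]) return 0;
    return 1;
}

static void binar_sort_checked(int low, int high, int pos, uint32_t list[])
{
    assert(pos >= 0 && pos <= 32);
    int passes = 0;
    while (pos < 32 && high > low) {
        int lo = low, hi = high;
        while (lo <= hi) {
            if ((list[lo] >> (31 - pos)) & 1u) {
                uint32_t tmp = list[hi]; list[hi] = list[lo]; list[lo] = tmp;
                hi--;
            } else {
                lo++;
            }
        }
        if (lo > high || lo == low) {            /* pass through */
            pos++;
            if (++passes == PASS_LIMIT) {
                if (is_sorted_range(list, low, high)) return;  /* stop early */
                passes = 0;
            }
            continue;
        }
        binar_sort_checked(low, lo - 1, pos + 1, list);
        binar_sort_checked(lo, high, pos + 1, list);
        return;
    }
}
```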

8. Conclusion
The binar sort is a linear, unstable, and recursive sorting algorithm. The binar sort algorithm is
independent of the initial data permutation. By using the bits of the encoded data elements, the
binar sort arranges the data elements into a sorted permutation. Utilizing the encoding makes the
binar sort universal, and also makes it a linear algorithm that is independent of the initial
permutation and so has no extreme performance cases.

Beyond the features of the binar sort algorithm, there are some intriguing realizations. One is
that generality and performance are not the hard-and-fast tradeoff that is frequently presumed. The
binar sort, by its nature, illustrates that such a tradeoff is not immutable. The binar sort
algorithm also illustrates that the linearithmic, comparison-based sorting algorithms carry an
implicit presumption that ordering is the only universal property feasible for generalized
sorting. This implicit presumption is flawed; the other universal property is the encoding of data,
which is common to all computers at the time of this writing.

In the analysis of the binar sort algorithm, the permutation of the data is irrelevant to its
performance. This highlights that comparison-based sorting algorithms are sensitive to the
permutation of the data set to be sorted. By using encoding instead of ordering, the binar sort
avoids this sensitivity to the permutation of the data. Such sensitivity is an accepted defect in
existing comparison-based sorting algorithms, but again it rests on an implicit presumption that it
is necessary.

The binar sort algorithm breaks new ground, but its development is far from complete or finished.
Further work is necessary to improve and optimize the binar sort. A purely iterative variant and a
parallel version of the algorithm are just two possibilities for future work. Future developments,
possibly including other algorithms inspired by the binar sort, are anticipated.

Appendix A. Binar Sort Java Source Code for Integer


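void BinarSort(int lower, int upper, int pos, int[] array)
{
    // Base case: all 32 bit positions (0..31) processed, or fewer than two elements.
    // (Java masks int shift distances to 5 bits, so pos == 32 would re-test the sign bit.)
    if (pos == 32 || upper < lower + 1) return;

    int lo = lower; // array lower bound position
    int hi = upper; // array upper bound position

    // Partition array[lower..upper] on bit pos, counted from the most significant bit.
    // Values are ordered as unsigned bit patterns, so negative ints sort after non-negative ones.
    while (lo < hi + 1)
    {
        int bit = (array[lo] << pos) & 0x80000000;

        if (bit == 0) // bit is 0: element stays in the lower sub-array
        {
            lo++;
        }
        else          // bit is 1: swap element into the upper sub-array
        {
            int temp = array[hi];
            array[hi] = array[lo];
            array[lo] = temp;
            hi--;
        }//end if
    }//end while

    if (lo == upper + 1)
    {
        // pass through: every bit was 0, so re-partition the same bounds on the next bit
        BinarSort(lower, upper, pos + 1, array);
    }
    else
    {
        BinarSort(lower, lo - 1, pos + 1, array);
        BinarSort(lo, upper, pos + 1, array);
    }//end if
}//end BinarSort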

Appendix B. Binar Sort C Source Code for Integer


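void bsort(const int low, const int high, const int pos, int list[])
{
    int lo = low;   /* array lower bound position */
    int hi = high;  /* array upper bound position */

    /* Base case: all 32 bit positions (0..31) processed, or fewer than two elements. */
    if( pos == 32 || high < low + 1 ) return;

    /* Partition list[low..high] on bit pos, counted from the most significant bit.
       The cast to unsigned avoids undefined behavior when shifting a signed value;
       a 32-bit int is assumed. Values are ordered as unsigned bit patterns, so
       negative ints sort after non-negative ones. */
    while (lo < hi + 1)
    {
        if( ((unsigned int)list[lo] << pos) & 0x80000000u )
        {
            /* bit is 1: swap element into the upper sub-array */
            int temp = list[hi];
            list[hi] = list[lo];
            list[lo] = temp;
            hi--;
        }
        else
        {
            /* bit is 0: element stays in the lower sub-array */
            lo++;
        }//end if
    }//end while

    if(lo == high + 1)
    {
        /* pass through: every bit was 0, so re-partition the same bounds on the next bit */
        bsort(low, high, pos+1, list);
    }
    else
    {
        bsort(low, lo-1, pos+1, list);
        bsort(lo, high, pos+1, list);
    }//end if
}//end bsort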
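As listed, both appendix routines order values by their raw bit patterns, so a negative integer (sign bit 1) is placed after every non-negative one. The following C sketch shows one way to obtain ordinary signed order, assuming 32-bit two's-complement integers: flip the sign bit, sort the resulting unsigned patterns with the same partitioning scheme, then flip it back. Flipping the sign bit maps INT32_MIN to 0, -1 to 0x7FFFFFFF, 0 to 0x80000000, and INT32_MAX to 0xFFFFFFFF, so unsigned order matches signed order. The wrapper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Binar sort on unsigned 32-bit keys, following the Appendix B listing. */
static void bsort_u32(int low, int high, int pos, uint32_t list[])
{
    assert(pos >= 0 && pos <= 32);
    if (pos == 32 || high <= low) return;
    int lo = low, hi = high;
    while (lo <= hi) {
        if ((list[lo] >> (31 - pos)) & 1u) {
            uint32_t tmp = list[hi]; list[hi] = list[lo]; list[lo] = tmp;
            hi--;
        } else {
            lo++;
        }
    }
    bsort_u32(low, lo - 1, pos + 1, list);
    bsort_u32(lo, high, pos + 1, list);
}

/* Hypothetical wrapper: sort n signed 32-bit ints into ascending signed order. */
static void binar_sort_signed(int32_t a[], int n)
{
    uint32_t *u = (uint32_t *)a;   /* C permits access via the unsigned counterpart */
    for (int i = 0; i < n; i++) u[i] ^= 0x80000000u;
    bsort_u32(0, n - 1, 0, u);
    for (int i = 0; i < n; i++) u[i] ^= 0x80000000u;
}
```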

References
[ANSI 1986] American National Standards Institute, "Coded Character Set - 7-bit American
Standard Code for Information Interchange", ANSI X3.4, 1986.

[Floyd 1990] Floyd, Thomas L. Digital Fundamentals, 4th edition. Merrill Publishing Company,
Columbus, OH. 1990.

[IEEE 1985] IEEE Computer Society. IEEE Standard for Binary Floating-Point Arithmetic,
IEEE Std 754-1985, 1985.

[Johnsonbaugh and Schaefer 2004] Johnsonbaugh, Richard and Schaefer, Marcus. Algorithms.
Prentice-Hall, Upper Saddle River, New Jersey, 2004.

[Knuth 1998] Knuth, Donald E. The Art of Computer Programming, 2nd edition. Addison-
Wesley, Reading, Massachusetts, 1998.

[Null and Lobur 2003] Null, Linda and Lobur, Julia. The Essentials of Computer Organization
and Architecture. Jones and Bartlett Publishers, Sudbury, MA. 2003.

[Roth 1992] Roth, Jr., Charles A. Fundamentals of Logic Design, 4th edition. West Publishing
Company, New York, NY. 1992.

[Unicode 2006] The Unicode Standard, Version 5.0, Fifth Edition, The Unicode Consortium,
Addison-Wesley Professional, 2006.

[Yergeau 2003] Yergeau, F., "UTF-8, a transformation format of ISO 10646", STD 63, RFC
3629, November 2003.
