Julia: A Fresh Approach to Numerical Computing
Abstract. Bridging cultures that have often been distant, Julia combines expertise from the diverse fields of computer science and computational science to create a new approach to numerical computing. Julia is designed to be easy and fast and questions notions generally held to be “laws of nature” by practitioners of numerical computing:
1. High-level dynamic programs have to be slow.
2. One must prototype in one language and then rewrite in another language for speed or deployment.
3. There are parts of a system appropriate for the programmer, and other parts that are best left untouched as they have been built by the experts.
We introduce the Julia programming language and its design—a dance between specialization and abstraction. Specialization allows for custom treatment. Multiple dispatch, a technique from computer science, picks the right algorithm for the right circumstance. Abstraction, which is what good computation is really about, recognizes what remains the same after differences are stripped away. Abstractions in mathematics are captured as code through another technique from computer science, generic programming.
Julia shows that one can achieve machine performance without sacrificing human convenience.
DOI. 10.1137/141000671
∗ Received by the editors December 18, 2014; accepted for publication (in revised form) December 16, 2015; published electronically February 7, 2017.
http://www.siam.org/journals/sirev/59-1/100067.html
Funding: This work received financial support from the MIT Deshpande Center for Technological Innovation, the Intel Science and Technology Center for Big Data, the DARPA XDATA program, the Singapore MIT Alliance, an Amazon Web Services grant for JuliaBox, NSF awards CCF-0832997, DMS-1016125, and DMS-1312831, VMware Research, a DOE grant with Dr. Andrew Gelman of Columbia University for petascale hierarchical modeling, grants from Saudi Aramco thanks to Ali Dogru and Shell Oil thanks to Alon Arad, and a Citibank grant for High Performance Banking Data Analysis, Chris Mentzel, and the Gordon and Betty Moore Foundation.
† Julia Computing, Inc. (jeff@juliacomputing.com, viral@juliacomputing.com).
‡ CSAIL and Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA 02139 (edelman@math.mit.edu).
§ New York University, New York, NY 10012, and Julia Computing, Inc. (stefan@juliacomputing.com).

Contents

1 Scientific Computing Languages: The Julia Innovation
  1.1 Julia Architecture and Language Design Philosophy
2 A Taste of Julia
  2.1 A Brief Tour
  2.2 An Invaluable Tool for Numerical Integrity
  2.3 The Julia Community
3 Writing Programs With and Without Types
  3.1 The Balance between Human and the Computer
  3.2 Julia’s Recognizable Types
  3.3 User’s Own Types Are First Class Too
  3.4 Vectorization: Key Strengths and Serious Weaknesses
  3.5 Type Inference Rescues “For Loops” and So Much More
4 Code Selection: Run the Right Code at the Right Time
  4.1 Multiple Dispatch
  4.2 Code Selection from Bits to Matrices
    4.2.1 Summing Numbers: Floats and Ints
    4.2.2 Summing Matrices: Dense and Sparse
  4.3 The Many Levels of Code Selection
  4.4 Is “Code Selection” Traditional Object Oriented Programming?
  4.5 Quantifying the Use of Multiple Dispatch
  4.6 Case Study for Numerical Computing
    4.6.1 Determinant: Simple Single Dispatch
    4.6.2 A Symmetric Arrow Matrix Type
5 Leveraging Design for High Performance Libraries
  5.1 Integer Arithmetic
  5.2 A Powerful Approach to Linear Algebra
    5.2.1 Matrix Factorizations
    5.2.2 User-Extensible Wrappers for BLAS and LAPACK
  5.3 High Performance Polynomials and Special Functions with Macros
  5.4 Easy and Flexible Parallelism
  5.5 Performance Recap
6 Conclusion
References
Many researchers today work in dynamic languages. Still, C and Fortran remain
the gold standard for performance for computationally intensive problems. As much
as the dynamic language programmer misses out on performance, though, the C and
Fortran programmer misses out on productivity. An unfortunate outcome is that
the most challenging areas of numerical computing have benefited the least from
the increased abstraction and productivity offered by higher-level languages. The
consequences have been more serious than many realize.
Julia’s innovation lies in its combination of productivity and performance. New
users want a quick explanation as to why Julia is fast and want to know whether
somehow the same “magic dust” could also be sprinkled on their favorite traditional
scientific computing language. Julia is fast because of careful language design and
the right combination of carefully chosen technologies that work very well with each
other. This article demonstrates some of these technologies using a number of exam-
ples. Celeste serves as an example for readers interested in a large-scale application
that leverages 8,192 cores on the Cori Supercomputer at Lawrence Berkeley National
Laboratory [28].
Users interact with Julia through a standard REPL (read-eval-print loop) environment such as those of Python, R, or MATLAB, by collecting commands in a .jl file, or by typing directly in a Jupyter (JUlia, PYThon, R) notebook [15, 30]. We invite the
reader to follow along at http://juliabox.com using Jupyter notebooks or by down-
loading Julia from http://julialang.org/downloads.
1.1. Julia Architecture and Language Design Philosophy. Many popular dy-
namic languages were not designed with the goal of high performance in mind. After
all, if you wanted really good performance you would use a static language, or so said
the popular wisdom. Only with the increasing need in the day-to-day life of scientific
programmers for simultaneous productivity and performance has the need for high
performance dynamic languages become pressing. Unfortunately, retrofitting an ex-
isting slow dynamic language for high performance is almost impossible, specifically
in numerical computing ecosystems. This is because numerical computing requires
performance-critical numerical libraries, which invariably depend on the details of the
internal implementation of the high-level language, thereby locking in those internal
implementation details. For example, you can run Python code much faster than the
standard CPython implementation using the PyPy just-in-time (JIT) compiler, but
PyPy is currently incompatible with NumPy and the rest of SciPy.
Another important point is that even when a program is available in C or Fortran, it may not run efficiently when called from a high-level language.
The best path to a fast, high-level system for scientific and numerical computing is
to make the system fast enough that all of its libraries can be written in the high-level
language in the first place. The JuMP.jl [20] package for mathematical programming
and the Convex.jl [33] package for convex optimization are great examples of the
success of this approach—in each case the entire library is written in Julia and uses
many Julia language features described in this article.
The Two Language Problem. As long as the developers’ language is harder to
grasp than the users’ language, numerical computing will always be hindered. This
is an essential part of the design philosophy of Julia: all basic functionality must be
possible to implement in Julia—never force the programmer to resort to using C or
Fortran. Julia solves the two language problem. Basic functionality must be fast:
integer arithmetic, for loops, recursion, floating-point operations, calling C functions,
and manipulating C-like structs. While these features are not exclusive to numerical programs, without them you certainly cannot write fast numerical code.
“Vectorization languages” like Python+NumPy, R, and MATLAB hide their for loops
and integer operations, but they are still there, inside the C and Fortran, lurking
beneath the thin veneer. Julia removes this separation entirely, allowing the high-level
code to “just write a for loop” if that happens to be the best way to solve a problem.
We believe that the Julia programming language fulfills much of the Fortran
dream: automatic translation of formulas into efficient executable code. It allows
programmers to write clear, high-level, generic and abstract code that closely re-
sembles mathematical formulas, yet produces fast, low-level machine code that has
traditionally only been generated by static languages.
Julia’s ability to combine these levels of performance and productivity in a single
language stems from the choice of a number of features that work well with each other:
1. An expressive type system, allowing optional type annotations (section 3).
2. Multiple dispatch using these types to select implementations (section 4).
3. Metaprogramming for code generation (section 5.3).
4. A dataflow type inference algorithm allowing types of most expressions to be
inferred [2, 4].
5. Aggressive code specialization against run-time types [2, 4].
6. JIT compilation [2, 4] using the LLVM compiler framework [18], which is also
used by a number of other compilers such as Clang [6] and Apple’s Swift [32].
7. Julia’s carefully written libraries that leverage the language design, i.e., points
1 through 6 above (section 5).
Points 1, 2, and 3 above are features especially for the human user, and they are
the focus of this paper. For details about the features related to language implemen-
tation and internals such as those in points 4, 5, and 6, we direct the reader to our
earlier work [2, 4]. The feature in point 7 brings everything together to enable the
building of high performance computational libraries in Julia.
Although a sophisticated type system is made available to the programmer, it
remains unobtrusive in the sense that one is never required to specify types, and
neither are type annotations necessary for performance. Type information flows nat-
urally through the program due to dataflow type inference.
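As a small illustration of this point (an example added here, not one from the paper), the following function carries no type annotations at all, yet once it is called with a Vector{Float64} the compiler infers a concrete type for every intermediate value:

    # No annotations anywhere; Julia specializes on the argument type at the call site.
    function sumsq(v)
        s = zero(eltype(v))
        for x in v
            s += x*x
        end
        return s
    end

    @code_warntype sumsq(rand(10))   # every variable is inferred as Float64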
In what follows, we describe the benefits of Julia’s language design for numeri-
cal computing, allowing programmers to more readily express themselves while also
obtaining performance.
2. A Taste of Julia.
2.1. A Brief Tour.
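The notebook cell that constructs the matrix A used below is not reproduced in this excerpt; for the indexing to make sense, an earlier cell must have built a square matrix of size at least 3, for example (an assumed reconstruction, not the original cell):

    A = rand(3,3)   # assumption: any square random matrix of size at least 3 would do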
In[2]: x = A[1,2]
y = (A+2I)[3,3] # The [3,3] entry of A+2I
Out[2]: 2.601952
In Julia, I is a built-in representation of the identity matrix, without any explicit
forming of the identity matrix as is commonly done using commands such as “eye.”
(“eye,” a homonym of “I,” is used in such languages as MATLAB, Octave, Go’s matrix
library, Python’s NumPy, and Scilab.)
Julia has symmetric tridiagonal matrices as a special type. For example, we may
define Gil Strang’s favorite matrix (the second-order difference matrix; see Figure 1)
in a way that uses only O(n) memory.
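The defining cell is not reproduced above. A minimal sketch of such a constructor, assuming the standard second-difference entries (2 on the diagonal, -1 on the off-diagonals), is:

    using LinearAlgebra   # SymTridiagonal lives in this standard library in current Julia

    # Only the diagonal and one off-diagonal are stored: O(n) memory for an n-by-n matrix
    strang(n) = SymTridiagonal(2*ones(n), -ones(n-1))
    strang(7)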
# Plot five random walks (assumes a plotting package, e.g., PyPlot, is loaded)
for i=1:5
y=cumsum(randn(500))
plot(y)
end
using Gadfly, Colors   # Gadfly supplies layer/Geom/Guide; the colorant"..." macro is from Colors
n = 500
l = ["First", "Second", "Third"]
c = [colorant"yellow",colorant"cyan",colorant"magenta"]
p = [layer(x=1:n, y=cumsum(randn(n)), Geom.line,
Theme(default_color=i)) for i in c ]
labels=(Guide.xlabel("Time"),Guide.ylabel("Value"),
Guide.title("Brownian Motion Trials"),
Guide.manual_color_key("Legend", l, c))
Gadfly.plot(p...,labels...)
(Figure: the resulting Gadfly plot, titled “Brownian Motion Trials,” showing Value versus Time for the three trials First, Second, and Third.)
The ellipses on the last line of the code above are known as the splat operator. The
elements of the vector p and the tuple labels are inserted individually as arguments
to the plot function.
2.2. An Invaluable Tool for Numerical Integrity. One popular feature of Julia
is that it gives the user the ability to “kick the tires” of a numerical computation. We
thank Velvel Kahan for the sage advice concerning the importance of this feature.
The idea is simple: a good engineer tests his or her code for numerical stability. In Julia this can be done by changing the IEEE rounding modes. There are five modes to choose from, yet most engineers silently use only the default RoundNearest mode, the one available in many numerical computing systems. If a difference is detected, one can
also run the computation in higher precision. Kahan [16] writes:
Can the effects of roundoff upon a floating-point computation be assessed
without submitting it to a mathematically rigorous and (if feasible at all)
time-consuming error-analysis? In general, No. . . .
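The cell defining the Hilbert-like matrix h used in the next cells is not included in this excerpt. A plausible definition, offered only as an assumption, is a comprehension such as

    h(n) = [1/(i+j) for i=1:n, j=1:n]   # assumed "Hilbert-like" entries

so that H = h(15) below is a badly conditioned 15-by-15 matrix, which is what makes the rounding-mode experiment interesting.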
In[10]: H=h(15);
setrounding(Float64,RoundNearest) do
inv(H)[1,1]
end
Out[10]: 154410.55589294434
In[11]: setrounding(Float64,RoundUp) do
inv(H)[1,1]
end
Out[11]: -49499.606132507324
In[12]: setrounding(Float64,RoundDown) do
inv(H)[1,1]
end
Out[12]: -841819.4371948242
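The cell that produces the 300-bit result quoted next is not shown; with BigFloat arithmetic it would be along these lines (a sketch; in the paper-era Julia the function was called with_bigfloat_precision rather than setprecision):

    setprecision(300) do          # 300 bits of BigFloat precision
        inv(map(big, H))[1,1]
    end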
With 300 bits of precision, we obtain
Out[13]: -2.0939717925074627012828017421448951616270885770371495976323268904715350765882491054998376252e+03
Note that this is the [1,1] entry of the inverse of the rounded Hilbert-like ma-
trix, not the inverse of the exact Hilbert-like matrix, whose entry would be exactly
1,387,200. Also, the Float64 results are sensitive to the BLAS [19] and LAPACK [1]
libraries, and may differ on different machines with different versions of Julia. For
extended precision, Julia uses the MPFR library [9].
2.3. The Julia Community. Julia has been under development since 2009, and a
public release was announced in February of 2012. It is an active open source project
with over 500 contributors and is available under the MIT License [23] for open source
software. Over 2 million unique visitors have visited the Julia website since then, and
Julia has now been adopted as a teaching tool in dozens of universities around the
world.5 The community has contributed over 1200 Julia packages. While it was nur-
tured at the Massachusetts Institute of Technology, it is really the contributions from
experts around the world that make it a joy to use for numerical computing. It is
also recognized as a general purpose computing language, unlike traditional numerical
computing systems, allowing it to be used not only to prototype numerical algorithms,
but also to deploy those algorithms and even serve results to the rest of the world. A
great example of this is Shashi Gowda’s Escher.jl package,6 which makes it possible
for Julia programmers to build beautiful interactive websites in Julia and serve up the
results of a Julia computation from the web server, without any knowledge of HTML
or JavaScript. Another such example is “Sudoku-as-a-Service,”7 by Iain Dunning,
where a Sudoku puzzle is solved using the optimization capabilities of the JuMP.jl Ju-
lia package [20] and made available as a web service. This is exactly why Julia is being
increasingly deployed in production environments in businesses, as is seen in various
talks at JuliaCon.8 These use cases utilize Julia’s capabilities not only for mathemat-
ical computation, but for building web APIs, database access, and much more.
3. Writing Programs With and Without Types.
3.1. The Balance between Human and the Computer. Graydon Hoare, author
of the Rust programming language [29], defined programming languages succinctly in
an essay on Interactive Scientific Computing [11]:
Programming languages are mediating devices, interfaces that try to strike
a balance between human needs and computer needs. Implicit in that is
the assumption that human and computer needs are equally important, or
need mediating.
A program consists of data and operations on data. Data is not just the input file,
but everything that is held—an array, a list, a graph, a constant—during the life of the
program. The more the computer knows about this data, the better it is at executing
operations on it. Types are exactly this metadata. Describing this metadata, the
types, takes real effort for the human. Statically typed languages such as C and
Fortran are at one extreme, where all types must be defined and are statically checked
during the compilation phase. The result is excellent performance. Dynamically typed
languages dispense with type definitions, which leads to greater productivity but lower
performance as the compiler and the runtime cannot benefit from the type information
that is essential to producing fast code. Can we strike a balance between the human’s
preference to avoid types and the computer’s need to know?
3.2. Julia’s Recognizable Types. Many users of Julia may never need to know
about types for performance. Julia’s type inference system often does all the work,
giving performance without type declarations.
5 http://julialang.org/community
6 https://github.com/shashi/Escher.jl
7 http://iaindunning.com/2013/sudoku-as-a-service.html
8 http://www.juliacon.org
Julia’s design allows for the gradual learning of concepts, where users start in a
manner that is familiar to them and, over time, learn to structure programs in the
“Julian way”—a term that implies well-structured readable high performance Julia
code. Julia users coming from other numerical computing environments have a notion
that data may be represented as matrices that may be dense, sparse, symmetric,
triangular, or of some other kind. They may also, though not always, know that
elements in these data structures may be single or double precision floating-point
numbers, or integers of a specific width. In more general cases, the elements within
data structures may be other data structures. We introduce Julia’s type system using
matrices and their number types:
In[14]: rand(1,2,1)
In[15]: [1 2; 3 4]
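The outputs of these two cells are omitted above. What they show is that both the element type and the number of dimensions are carried in the array's type itself, visible for instance via typeof (an added illustration, not the paper's own output):

    typeof(rand(1,2,1))   # Array{Float64,3}: a 1×2×1 array of double precision numbers
    typeof([1 2; 3 4])    # Array{Int64,2}: a 2×2 matrix of 64-bit integers (on a 64-bit machine)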
3.3. User’s Own Types Are First Class Too. Many dynamic languages for nu-
merical computing have traditionally contained an asymmetry, with built-in types
having much higher performance than any user-defined types. This is not the case
with Julia, where there is no meaningful distinction between user-defined and “built-
in” types.
We have mentioned so far a few number types and two matrix types: Array{T,2},
the dense array with element type T, and SymTridiagonal{T}, the symmetric tridi-
agonal with element type T. There are also other matrix types for other structures in-
cluding SparseMatrixCSC (compressed sparse columns), Hermitian, Triangular, Bidi-
agonal, and Diagonal. Julia’s sparse matrix type has an added flexibility, that it can
go beyond storing just numbers as nonzeros, and can instead store any other Julia
type as well. The indices in SparseMatrixCSC can also be represented as integers of
any width (16-bit, 32-bit, or 64-bit). All these different matrix types, available as
built-in types to a user downloading Julia, are implemented completely in Julia and
are in no way any more or less special than any other types a user may define in their
own program.
For demonstration purposes, we now create a symmetric arrow matrix type that
contains a diagonal and the first row A[1,2:n]. One could also throw an ArgumentError if the ev vector were not one element shorter than the dv vector.
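The cell containing the type definition is not reproduced here. A minimal sketch consistent with the constructor call shown in Out[18] (written with the modern struct keyword; the paper predates it, and its definition may also subtype AbstractMatrix) is:

    # A symmetric arrow matrix: dv holds the diagonal, ev holds the first row A[1,2:n]
    struct SymArrow{T}
        dv::Vector{T}
        ev::Vector{T}
    end

    S = SymArrow([1,2,3,4,5],[6,7,8,9])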
Out[18]: SymArrow{Int64}([1,2,3,4,5],[6,7,8,9])
The parameter in the array refers to the type of each element of the array. Code
can and should be written independently of the type of each element.
In section 4.6.2, we develop the symmetric arrow example much further. The
SymArrow matrix type contains two vectors, one each for the diagonal and the first row,
and these vectors contain elements of type T. In the type definition, the type SymArrow
is parametrized by the type of the storage element T. By doing so, we have created a
generic type, which refers to a universe of all arrow matrices containing elements of
all types. The matrix S is an example where T is Int64. When we write functions
in section 4.6.2 that operate on arrow matrices, those functions themselves will be
generic and applicable to the entire universe of arrow matrices we have defined here.
Julia’s type system allows for abstract types, concrete “bits” types, composite
types, and immutable composite types. All of these types can have parameters and
users may even write programs using unions of them. We refer the reader to full
details about Julia’s type system in the types chapter in the Julia manual.9
9 See http://docs.julialang.org/en/latest/manual/types/
3.4. Vectorization: Key Strengths and Serious Weaknesses. By vectorizing, the user has promised the computer that the type of an entire vector
of data matches the very first element. This is an example where users are willing
to provide type information to the computer without even knowing that is what they
are doing. Hence, it is an example of a strategy that balances the computer’s needs
with the human’s.
From the computer’s viewpoint, vectorization means that operations on data
happen largely in sections of the code where types are known to the runtime system.
The runtime has no idea about the data contained in an array until it encounters
the array. Once encountered, the type of the data within the array is known, and
this knowledge is used to execute an appropriate high performance kernel. Of course,
what really occurs at runtime is that the system figures out the type and then reuses
that information through the length of the array. As long as the array is not too
small, all the extra work incurred in gathering type information and acting upon it
at run time is amortized over the entire operation.
The downside of this approach is that the user can achieve high performance
only with built-in types. User-defined types end up being dramatically slower. The
restructuring for vectorization is often unnatural, and at times not possible. We
illustrate this with an example of a cumulative sum computation. Note that due to the size of the problem, the computation is memory bound, and the complex case is not observed to be twice as slow as the real case, even though it performs twice as many floating-point operations.
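The cumulative sum code itself is not reproduced above; the loop the text refers to is essentially the following sequential recurrence (the function name is ours):

    # In-place cumulative sum: each element becomes the sum of all elements up to it
    function cumsum_loop!(v)
        for i = 2:length(v)
            v[i] += v[i-1]
        end
        return v
    end

    cumsum_loop!(rand(1_000_000))                      # double precision real input
    cumsum_loop!(rand(Complex{Float64}, 1_000_000))    # double precision complex input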
We execute this code on a vector of double precision real numbers and double
precision complex numbers and observe something that may seem remarkable: similar
run times in each case.
This simple example is difficult to vectorize, and hence is often provided as a built-
in function in many numerical computing systems. In Julia, the implementation is
very similar to the snippet of code above and runs at speeds similar to C. While Julia
users can write vectorized programs as in any other dynamic language, vectorization
is not a prerequisite for performance. This is because Julia strikes a different balance
between the human and the computer when it comes to specifying types. Julia allows
optional type annotations, which are essential when writing libraries but not for end-
user programs that are exploring algorithms or a dataset.
Generally, in Julia, type annotations are not used for performance, but purely
for code selection (see section 4). If the programmer annotates their program with
types, the Julia compiler will use that information. However, in general, user code
often includes minimal or no type annotations, and the Julia compiler automatically
infers the types.
3.5. Type Inference Rescues “For Loops” and So Much More. A key compo-
nent of Julia’s ability to combine performance with productivity in a single language
is its implementation of dataflow type inference [24, 17, 4]. Unlike type inference al-
gorithms for static languages, this algorithm is tailored to the way dynamic languages
work: the typing of code is determined by the flow of data through it. The algorithm
works by walking through a program, starting with the types of its input values, and
“abstractly interpreting” it: instead of applying the code to values, it applies the
code to types, following all branches concurrently and tracking all possible states the
program could be in, including all the types each expression could assume.
The dataflow type inference algorithm allows programs to be automatically an-
notated with type bounds without forcing the programmer to explicitly specify types.
Yet, in dynamic languages it is possible to write programs which inherently cannot be
concretely typed. In such cases, dataflow type inference provides what bounds it can,
but these may be trivial and useless—i.e., they may not narrow down the set of possi-
ble types for an expression at all. However, the design of Julia’s programming model
and standard library are such that a majority of expressions in typical programs can
be concretely typed.
A lesson of traditional numerical computing languages is that one must learn to vectorize to get performance. The mantra is that “for loops” are bad and vectorization is good. Indeed, one can find the following advice on p. 72 of the 1998 Getting Started with MATLAB manual (and other editions):
Experienced MATLAB users like to say “Life is too short to spend writing
for loops.”
It is not that “for loops” are inherently slow in themselves. The slowness comes
from the fact that in the case of most dynamic languages, the system does not have
access to the types of the variables within a loop. Since programs often spend much of
their time doing repeated computations, the slowness of a particular operation due to
lack of type information is magnified inside a loop. This leads to users often talking
about “slow for loops” or “loop overhead.”
4. Code Selection: Run the Right Code at the Right Time. Code selection or
code specialization from one point of view is the opposite of the code reuse enabled
by abstraction. Ironically, viewed another way, it enables abstraction. Julia allows
users to overload function names and select code based on argument types. This can
happen at the highest and lowest levels of the software stack. Code specialization lets
us optimize for the details of the case at hand. Code abstraction lets calling codes,
even those not yet written or perhaps not even imagined, work on structures that
may not have been envisioned by the original programmer.
We see this as the ultimate realization of the famous 1908 quip that
Mathematics is the art of giving the same name to different things.10
10 A few versions of Poincaré’s quote are relevant to Julia’s power of abstraction and numerical computing.
(http://www.nieuwarchief.nl/serie5/pdf/naw5-2012-13-3-154.pdf)
One example has just shown us the importance of terms in mathematics; but I could quote many others. It is hardly possible to believe what economy of thought, as Mach used to say, can be effected by a well-chosen term. I think I have already said somewhere that mathematics is the art of giving the same name to different things. It is enough that these things, though differing in matter, should be similar in form, to permit of their being, so to speak, run in the same mould. When language has been well chosen, one is astonished to find that all demonstrations made for a known object apply immediately to many new objects: nothing requires to be changed, not even the terms, since the names have become the same.
(http://www-history.mcs.st-andrews.ac.uk/Extras/Poincare_Future.html)
Fig. 2 Gauss quote hanging from the ceiling of the longstanding Boston Museum of Science Math-
ematica Exhibit.
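The cell containing the definitions discussed in the next paragraph is not reproduced in this excerpt. A sketch of what they look like follows; only the Function/Function method is quoted verbatim later in the text, so the other two bodies are assumptions:

    import Base: *    # extend the built-in multiplication operator

    *(f::Function, g::Function) = x -> f(g(x))   # compose: (sin*sin)(x) == sin(sin(x))
    *(a::Number,   g::Function) = x -> a*g(x)    # scale a function's output (assumed)
    *(f::Function, a::Number)   = x -> f(x)*a    # scale a function's output (assumed)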
Here, multiplication is dispatched by the type of its first and second arguments.
It is implemented in the usual way if both are numbers, but there are three new ways
if one, the other, or both are functions.
These definitions exist as part of a larger system of generic definitions, which can
be reused by later definitions. Consider the case of the mathematician Gauss's preference for sin^2(φ) to refer to sin(sin(φ)) and not sin(φ)^2 (writing that "sin^2(φ) is odious to me, even though Laplace made use of it"; see Figure 2). By defining *(f::Function, g::Function) = x->f(g(x)), (f^2)(x) automatically computes f(f(x)), as Gauss wanted. This is a consequence of a generic definition that evaluates x^2 as x*x no matter how x*x is defined.
This paradigm is a natural fit for numerical computing, since so many important
operations involve interactions among multiple values or entities. Binary arithmetic
operators are obvious examples, but many other uses abound. The fact that the
compiler can pick the sharpest matching definition of a function based on its input
types helps achieve higher performance, by keeping the code execution paths tight
and minimal.
We have not seen this elsewhere in the literature but it seems worthwhile to point
out four dispatch possibilities:
1. Static single dispatch (not done).
2. Static multiple dispatch (frequent in static languages, e.g., C++ overloading).
3. Dynamic single dispatch (MATLAB’s object oriented system might fall into
this category, though it has its own special characteristics).
4. Dynamic multiple dispatch (usually just called multiple dispatch).
In section 4.4 we discuss the comparison with traditional object oriented ap-
proaches. Class-based object oriented programming could reasonably be called dy-
namic single dispatch, and overloading could reasonably be called static multiple
dispatch. Julia’s dynamic multiple dispatch approach is more flexible and adaptable
while still retaining powerful performance capabilities. Julia programmers often find
that dynamic multiple dispatch makes it easier to structure their programs in ways
that are closer to the underlying science.
4.2. Code Selection from Bits to Matrices. Julia uses the same mechanism for
code selection at all levels, from the top to the bottom.
                 f       Function            Operand Types
    Low-Level   “+”      Add Numbers         {Float, Int}
    High-Level  “+”      Add Matrices        {Dense Matrix, Sparse Matrix}
                “*”      Scale or Compose    {Function, Number}
4.2.1. Summing Numbers: Floats and Ints. We begin at the lowest level. Math-
ematically, integers are thought of as being special real numbers, but on a computer,
an Int and a Float have two very different representations. Ignoring for a moment
that there are even many choices of Int and Float representations, if we add two num-
bers, code selection based on numerical representation is taking place at a very low
level. Most users are blissfully unaware of this code selection, because it is hidden
somewhere that is usually off-limits. Nonetheless, one can follow the evolution of the
high-level code all the way down to the assembler level, which will ultimately reveal
an ADD instruction for integer addition and, for example, the AVX instruction VADDSD for floating-point addition in the language of x86 assembly level instruc-
tions. The point here is that ultimately two different algorithms are being called, one
for a pair of Ints and one for a pair of Floats.
Figure 3 takes a close look at what a computer must do to perform x+y depending
on whether (x,y) is (Int,Int), (Float,Float), or (Int,Float), respectively. In the first
case, an integer add is called, while in the second case a float add is called. In the
last case, a promotion of the int to float is implemented with the x86 instruction VCVTSI2SD, and then the float add follows.
It is instructive to build a Julia simulator in Julia itself.
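The cells that define the simulator are not reproduced here. A sketch of the four low-level methods (the operator name ⊕ comes from the surrounding text; the bodies are assumptions) is:

    # Simulate code selection for + with an explicit operator ⊕
    ⊕(a::Int,     b::Int)     = a + b          # integer add
    ⊕(a::Float64, b::Float64) = a + b          # floating-point add
    ⊕(a::Int,     b::Float64) = float(a) ⊕ b   # promote the Int, then float add
    ⊕(a::Float64, b::Int)     = a ⊕ float(b)   # promote the Int, then float add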
In[28]: methods(⊕)
In[22]: f(a,b) = a + b
In[24]: # Floats add, for example, with the x86 vaddsd instruction
@code_native f(1.0,3.0)
Fig. 3 While assembly code may seem intimidating, Julia disassembles readily. Armed with the @code_native command in Julia and perhaps a good list of assembler instructions such as may be found at http://docs.oracle.com/cd/E36784_01/pdf/E36859.pdf or http://en.wikipedia.org/wiki/X86_instruction_listings, one can really learn to see the details of code selection in action at the lowest levels. More importantly, one can begin to understand that Julia is fast because the assembly code produced is so tight.
4.2.2. Summing Matrices: Dense and Sparse. Dense matrices are stored with every entry explicitly attached, while sparse matrices (which may be stored in many ways) require storage
of index information one way or another. If we add two matrices, code selection must
take place depending on whether the summands are (dense,dense), (dense,sparse),
(sparse,dense), or (sparse,sparse).
While this is at a much higher level, the basic pattern is unmistakably the same
as that of section 4.2.1. We show how to use a dense algorithm in the implementation
of ⊕ when either A or B (or both) are dense. A sparse algorithm is used when both
A and B are sparse.
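The corresponding cells are again not shown; a sketch of the four high-level methods, with the dense algorithm used whenever at least one summand is dense and the sparse algorithm reserved for the sparse/sparse case (the helper name densadd is ours), is:

    using SparseArrays   # SparseMatrixCSC is in this standard library in current Julia

    densadd(A, B) = [A[i,j] + B[i,j] for i = 1:size(A,1), j = 1:size(A,2)]

    ⊕(A::Matrix,          B::Matrix)          = densadd(A, B)
    ⊕(A::Matrix,          B::SparseMatrixCSC) = densadd(A, B)
    ⊕(A::SparseMatrixCSC, B::Matrix)          = densadd(A, B)
    ⊕(A::SparseMatrixCSC, B::SparseMatrixCSC) = A + B   # defer to the sparse-sparse sum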
We have eight methods for the function ⊕, four for the low-level sum, and four
more for the high level:
In[30]: methods(⊕)
Any one of these definitions of ⊕ is a method. The collection of all of its methods is referred to as a generic function. The word “polymorphism” refers to the use of the same name (⊕, in this example) for functions with different types. Contemplating the Poincaré quote in footnote 10, it is handy to reason about everything to which you are giving the
same name. In actual coding, one tends to use the same name when the abstraction
makes a great deal of sense, so we use the same name “+” for ints, floats, dense, and
sparse matrices. Methods are grouped into generic functions.
While mathematics is the art of giving the same name to seemingly different
things, a computer eventually has to execute the right program in the right circum-
stance. Julia’s code selection operates at multiple levels in order to translate a user’s
abstract ideas into efficient execution. A generic function can operate on several ar-
guments, and the method with the most specific signature matching the arguments is
invoked. It is worth crystallizing some key aspects of this process:
1. The same name can be used for different functions in different circumstances.
For example, select may refer to the selection algorithm for finding the kth
smallest element in a list, or to select records in a database query, or simply
to a user-defined function in a user’s own program. Julia’s namespaces allow
the usage of the same vocabulary in different circumstances in a simple way
that makes programs easy to read.
2. A collection of functions that represent the same idea but operate on different
structures are naturally referred to by the same name. The particular method
called is based entirely on the types of all the arguments—this is multiple
dispatch. The function det may be defined for all matrices at an abstract
level. However, for reasons of efficiency, Julia defines different methods for
different types of matrices, depending on whether they are dense or sparse or
have a special structure such as diagonal or tridiagonal.
3. Within functions that operate on the same structure, there may be further
differences based on the different types of data contained within. For example,
whether the input is a vector of Float64 values or Int32 values, the norm
is computed in exactly the same way, with a common body of code, but the compiler is able to generate different executable code from the abstract specification.
4. Julia uses the same mechanism of code selection at the lowest and highest levels, whether it is performing operations on matrices or operations on bits. As a result, Julia is able to optimize the whole program, picking the right method at the right time, either at compile time or run time.

Fig. 4 Advantages of Julia: It is true that this Java code is polymorphic, based on the types of the two arguments. (“Polymorphism” means the use of the same name for a function that may have different type arguments.) However, in Java, if the method addthem is called, the types of the arguments must be known at compile time. This is static dispatch. Java is also encumbered by encapsulation: in this case addthem is encapsulated inside the OverloadedAddable class. While this is considered a safety feature in the Java culture, it becomes a burden for numerical computing.
Table 1 A comparison of Julia (1208 functions exported from the Base library) to other languages
with multiple dispatch. The “Julia operators” row describes 47 functions with special syntax
(binary operators, indexing, and concatenation). Data for other systems are from [26]. The
results indicate that Julia is using multiple dispatch far more heavily than previous systems.
Language DR CR DoS
Gwydion 1.74 18.27 2.14
OpenDylan 2.51 43.84 1.23
CMUCL 2.03 6.34 1.17
SBCL 2.37 26.57 1.11
McCLIM 2.32 15.43 1.17
Vortex 2.33 63.30 1.06
Whirlwind 2.07 31.65 0.71
NiceC 1.36 3.46 0.33
LocStack 1.50 8.92 1.02
Julia 5.86 51.44 1.54
Julia operators 28.13 78.06 2.01
4.5. Quantifying the Use of Multiple Dispatch. Multiple dispatch, an idea that came to scientific computing from computer languages, finds its killer application in scientific computing.
We wanted to answer for ourselves the question of whether there was really anything
different about how Julia uses multiple dispatch.
Table 1 gives an answer in terms of dispatch ratio (DR), choice ratio (CR), and
degree of specialization (DoS). While multiple dispatch is an idea that has been cir-
culating for some time, its application to numerical computing appears to have sig-
nificantly favorable characteristics compared to previous applications.
To quantify how heavily a language feature is used, we use the following met-
rics [26]:
1. Dispatch ratio: The average number of methods in a generic function.
2. Choice ratio: For each method, the total number of methods over all generic
functions it belongs to, averaged over all methods. This is essentially the sum
of the squares of the number of methods in each generic function, divided by
the total number of methods. The intent of this statistic is to give more
weight to functions with a large number of methods.
3. Degree of specialization: The average number of type-specialized arguments
per method.
Table 1 shows the mean of each metric over the entire Julia Base library, showing
a high degree of multiple dispatch compared with corpora in other languages [26].
Compared to most multiple dispatch systems, Julia functions tend to have a large
number of definitions. To see why this might be so, it helps to compare results
from a biased sample of common operators. These functions are the most obvious
candidates for multiple dispatch, and as a result their statistics climb dramatically.
Julia is focused on numerical computing, and so is likely to have a large proportion
of functions with this characteristic.
4.6. Case Study for Numerical Computing. The complexity of linear algebra
software has been nicely captured in the context of LAPACK and ScaLAPACK by
Demmel, Dongarra et al. [7] and is reproduced verbatim here:
(1) for all linear algebra problems
(linear systems, eigenproblems, ...)
(2) for all matrix types
(general, symmetric, banded, ...)
In[33]: # Simple determinants defined using the short form for functions
newdet(x::Number) = x
newdet(A::Diagonal ) = prod(diag(A))
newdet(A::Triangular) = prod(diag(A))
newdet(A::Matrix) = -prod(diag(qrfact(full(A))[:R]))*(-1)^size(A,1)
# Tridiagonal determinant defined using the long form for functions
function newdet(A::SymTridiagonal)
# Assign c and d as a pair
c,d = 1, A[1,1]
for i=2:size(A,1)
# temp=d, d=the expression, c=temp
c,d = d, d*A[i,i]-c*A[i,i-1]^2
end
d
end
14 LU is more efficient. We simply wanted to illustrate that other ways are possible.
4.6.2. A Symmetric Arrow Matrix Type. There exist matrix structures and
operations on those matrices. In Julia, these structures exist as Julia types. Julia has
a number of predefined matrix structure types: (dense) Matrix, (compressed sparse
column) SparseMatrixCSC, Symmetric, Hermitian, SymTridiagonal, Bidiagonal,
Tridiagonal, Diagonal, and Triangular are all examples of its matrix structures.
The operations on these matrices exist as Julia functions. Familiar examples
of operations are indexing, determinant, size, and matrix addition. Since matrix
addition takes two arguments, it may be necessary to reconcile two different types
when computing the sum.
In the following Julia example, we illustrate how the user can add symmetric
arrow matrices to the system, and then add a specialized det method to compute
the determinant of a symmetric arrow matrix efficiently. We build on the symmetric
arrow type introduced in section 3.3.
In[38]: # An example
S=SymArrow([1,2,3,4,5],[6,7,8,9])
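The cells that add the supporting methods are not reproduced in this excerpt. A sketch of what they accomplish, building on the dv and ev fields from section 3.3, is below; the O(n) determinant uses the Schur complement of the trailing diagonal block and assumes those diagonal entries are nonzero:

    import Base: size, getindex
    import LinearAlgebra: det    # det is owned by the LinearAlgebra standard library

    size(S::SymArrow) = (length(S.dv), length(S.dv))

    # Entries come from the diagonal, the first row, or the first column; all else is zero
    function getindex(S::SymArrow, i::Int, j::Int)
        i == j && return S.dv[i]
        i == 1 && return S.ev[j-1]
        j == 1 && return S.ev[i-1]
        return zero(eltype(S.dv))
    end

    # O(n) determinant: det([d1 e'; e D]) = det(D) * (d1 - e'*inv(D)*e) with D diagonal
    det(S::SymArrow) = prod(S.dv[2:end]) * (S.dv[1] - sum(S.ev.^2 ./ S.dv[2:end]))

    det(S)   # uses the arrow formula rather than falling back to a dense algorithm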
In[43]: A=[1 2 3
1 2 1
1 0 1
1 0 -1]
Aqr = qrfact(A);
Q = Aqr[:Q]
In[44]: Q*[1,0,0,0]
In[45]: Q*[1, 0, 0]
5.2.2. User-Extensible Wrappers for BLAS and LAPACK. Julia’s BLAS and LAPACK wrappers are written fully in Julia code, using ccall,15 which does not require a C compiler and can be called directly from the interactive Julia prompt.
Consider the Cholesky factorization, computed by calling LAPACK’s xPOTRF. The wrapper uses Julia’s metaprogramming facilities to generate four functions, corresponding to the xPOTRF routines for the Float32, Float64, Complex64, and Complex128 types. The call to the Fortran routine is wrapped in ccall.
# The surrounding loop and function header are restored here as a sketch in the
# paper-era Julia style; the LAPACK routine names follow the s/d/c/z convention.
for (potrf, elty) in ((:spotrf_, :Float32),   (:dpotrf_, :Float64),
                      (:cpotrf_, :Complex64), (:zpotrf_, :Complex128))
    @eval begin
        function potrf!(uplo::Char, A::Matrix{$elty})
            lda  = max(1, stride(A, 2))
            info = Array(Int, 1)
            # Call to LAPACK: ccall(LAPACKroutine, Void, PointerTypes, JuliaVariables)
            ccall(($(string(potrf)), :liblapack), Void,
                  (Ptr{Char}, Ptr{Int}, Ptr{$elty}, Ptr{Int}, Ptr{Int}),
                  &uplo, &size(A,1), A, &lda, info)
            return A, info[1]
        end
    end
end
chol(A::Matrix) = potrf!('U', copy(A))
5.3. High Performance Polynomials and Special Functions with Macros. Ju-
lia has a macro system that provides custom code generation, offering performance that is otherwise difficult to achieve. A macro is a function that runs at parse time,
takes symbolic expressions in, and returns transformed expressions out, which are
inserted into the code for later compilation. For example, a library developer has
implemented an @evalpoly macro that uses Horner’s rule to evaluate polynomials
efficiently. Consider
In[47]: @evalpoly(10,3,4,5,6)
15 http://docs.julialang.org/en/latest/manual/calling-c-and-fortran-code/
which returns 6543 (the polynomial 3 + 4x + 5x^2 + 6x^3, evaluated at x = 10 with Horner's rule). Julia allows us to see the inline generated code with the command
In[48]: macroexpand(:@evalpoly(10,3,4,5,6))
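The expanded output is not shown above. Conceptually (this is not the literal expansion, which uses generated symbols) it is just the nested Horner form, built from fused multiply-adds:

    # Horner's rule for 3 + 4x + 5x^2 + 6x^3 at x = 10
    let x = 10
        muladd(x, muladd(x, muladd(x, 6, 5), 4), 3)   # == 6543
    end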
5.4. Easy and Flexible Parallelism. Julia provides many facilities for parallelism, which are described in detail in the
Julia manual.16 Distributed memory programming in Julia is built on two primitives—
remote calls that execute a function on a remote processor and remote references that
are returned by the remote processor to the caller. These primitives are implemented
completely within Julia. On top of them, Julia provides a distributed array data
structure, a pmap implementation, and a way to parallelize independent iterations of
a loop with the @parallel macro, all of which can parallelize code in distributed
memory. These ideas are exploratory in nature, and we only discuss them here to em-
phasize that well-designed programming language abstractions and primitives allow
one to express and implement parallelism completely within the language.
We proceed with one example that demonstrates parallel computing at work and
shows how one can impulsively grab a large number of processors and explore their
problem space quickly.
println("Sequential version")
t = 10000
for β=[1,2,4,10,20]
z = fit(Histogram, [stochastic(β) for i=1:t], -4:0.01:1).weights
plot(midpoints(-4:0.01:1), z/sum(z)/0.01)
end
16 http://docs.julialang.org/en/latest/manual/parallel-computing/
In[51]: # Readily adding 1024 processors sharpens the Monte Carlo simulation,
# computing 1024 times as many samples in the same time
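The parallel cell itself is not reproduced here. In the paper-era Julia the idea looks roughly like the following sketch; addprocs and @parallel are the paper-era primitives, and stochastic(β), like StatsBase, must have been made available on every worker with @everywhere:

    addprocs(1024)                 # grab 1024 workers (cluster setup not shown)

    function paralleltrials(β, t)
        # each worker contributes a histogram of t samples; (+) reduces the counts
        @parallel (+) for p = 1:nworkers()
            fit(Histogram, [stochastic(β) for i = 1:t], -4:0.01:1).weights
        end
    end

    for β = [1, 2, 4, 10, 20]
        z = paralleltrials(β, 10000)
        plot(midpoints(-4:0.01:1), z/sum(z)/0.01)
    end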
Existing numerical computing languages would have us believe that vectorization is the only approach or, even if there are other approaches, that it is somehow the best one.
Vectorization at the software level can be elegant for some problems. There are
many matrix computation problems that look beautiful vectorized. These programs
should be vectorized. Other programs require heroics and skill to vectorize, sometimes
producing unreadable code all in the name of performance. These programs we object
to vectorizing. Still other programs cannot be vectorized very well, even with heroics.
The Julia message is to vectorize when it is natural, producing nice code. Do not
vectorize in the name of speed.
Some users believe that vectorization is required to make use of special hardware
capabilities such as SIMD instructions, multithreading, GPU units, and other forms
of parallelism. This is not strictly true, as compilers are increasingly able to apply
these performance features to explicit loops. The Julia message remains the same:
vectorize when natural, when you feel it is right.
6. Conclusion. We built Julia to meet our needs for numerical computing, and
it turns out that many others wanted exactly the same thing. At the time of writ-
ing, not a day goes by when we don’t learn that someone new has picked up Julia
at universities and companies around the world, in fields as diverse as engineering,
mathematics, physical and social sciences, finance, biotech, and many others. More
than just a language, Julia has become a place for programmers, physical scientists,
social scientists, computational scientists, mathematicians, and others to pool their
collective knowledge in the form of online discussions and code.
Acknowledgments. Julia would not have been possible without the enthusiasm
and contributions of the Julia community17 and of MIT during the early years of its de-
velopment. We thank Michael La Croix for his beautiful Julia display macros. We are
indebted at MIT to Jeremy Kepner, Chris Hill, Saman Amarasinghe, Charles Leiser-
son, Steven Johnson, and Gil Strang for their collegial support, which not only allowed
for the possibility of an academic research project to update technical computing, but
made it more fun, too.
REFERENCES
17 https://github.com/JuliaLang/julia/graphs/contributors
[8] A. Edelman and B. Sutton, From Random Matrices to Stochastic Operators, J. Statist.
Phys., 127 (2007), pp. 1121–1165. (Cited on p. 96)
[9] The GNU MPFR Library, http://www.mpfr.org/. (Cited on p. 74)
[10] C. Gomez, ed., Engineering and Scientific Computing with Scilab, Birkhäuser, Boston, 1999.
(Cited on p. 66)
[11] G. Hoare, Technicalities: Interactive Scientific Computing #1 of 2: Pythonic Parts, http:
//graydon2.dreamwidth.org/3186.html, 2014. (Cited on p. 74)
[12] R. Ihaka and R. Gentleman, R: A language for data analysis and graphics, J. Comput.
Graph. Statist., 5 (1996), pp. 299–314. (Cited on p. 66)
[13] Interactive Supercomputing, Star-P User Guide, http://www-math.mit.edu/∼edelman/publications/star-p-user.pdf. (Cited on p. 94)
[14] Interactive Supercomputing, Getting Started with Star-P: Taking Your First Test-Drive,
http://www-math.mit.edu/∼edelman/publications.php, 2006. (Cited on p. 94)
[15] The Jupyter Project, http://jupyter.org/. (Cited on p. 67)
[16] W. Kahan, How Futile Are Mindless Assessments of Roundoff in Floating-Point Computa-
tion?, http://www.cs.berkeley.edu/∼wkahan/Mindless.pdf, 2006. (Cited on p. 72)
[17] M. A. Kaplan and J. D. Ullman, A scheme for the automatic inference of variable types, J.
ACM, 27 (1980), pp. 128–145, https://doi.org/10.1145/322169.322181. (Cited on p. 78)
[18] C. Lattner and V. Adve, LLVM: A compilation framework for lifelong program analysis and
transformation, in Proceedings of the 2004 International Symposium on Code Generation
and Optimization (CGO’04), Palo Alto, CA, 2004, ACM, New York, 2004, pp. 75–86.
(Cited on p. 68)
[19] C. L. Lawson, R. J. Hanson, D. R. Kincaid, and F. T. Krogh, Basic linear alge-
bra subprograms for Fortran usage, ACM Trans. Math. Softw., 5 (1979), pp. 308–323,
https://doi.org/10.1145/355841.355847. (Cited on p. 73)
[20] M. Lubin and I. Dunning, Computing in Operations Research using Julia, INFORMS J. Comput., 27 (2015), pp. 238–248, https://doi.org/10.1287/ijoc.2014.0623. (Cited on pp. 67, 74)
[21] Mathematica, http://www.mathematica.com. (Cited on p. 66)
[22] MATLAB, http://www.mathworks.com. (Cited on p. 66)
[23] The MIT License, http://opensource.org/licenses/MIT. (Cited on p. 74)
[24] M. Mohnen, A graph-free approach to data-flow analysis, in Compiler Construction, R. Hor-
spool, ed., Lecture Notes in Comput. Sci. 2304, Springer, Berlin, Heidelberg, 2002, pp. 185–
213. (Cited on p. 78)
[25] M. Murphy, Octave: A free, high-level language for mathematics, Linux J., 1997 (1997),
326884, http://dl.acm.org/citation.cfm?id=326876.326884. (Cited on p. 66)
[26] R. Muschevici, A. Potanin, E. Tempero, and J. Noble, Multiple dispatch in practice, in
Proceedings of the 23rd ACM SIGPLAN Conference on Object-Oriented Programming
Systems Languages and Applications, OOPSLA ’08, ACM, New York, 2008, pp. 563–582,
https://doi.org/10.1145/1449764.1449808. (Cited on p. 87)
[27] A. Noack, Fast and Generic Linear Algebra in Julia, Tech. report, MIT, Cambridge, MA,
2015. (Cited on pp. 88, 91)
[28] J. Regier, K. Pamnany, R. Giordano, R. Thomas, D. Schlegel, J. McAuliffe, and Prab-
hat, Learning an Astronomical Catalog of the Visible Universe through Scalable Bayesian
Inference, preprint, arXiv:1611.03404 [cs.DC], 2016. (Cited on p. 67)
[29] Rust, http://www.rust-lang.org/. (Cited on p. 74)
[30] H. Shen, Interactive notebooks: Sharing the code, Nature Toolbox, 515 (2014), pp. 151–152,
http://www.nature.com/news/interactive-notebooks-sharing-the-code-1.16261. (Cited on
p. 67)
[31] G. Strang, Introduction to Linear Algebra, Wellesley-Cambridge Press, Wellesley, MA, 2003,
https://books.google.com/books?id=Gv4pCVyoUVYC. (Cited on p. 88)
[32] Swift, https://developer.apple.com/swift/. (Cited on p. 68)
[33] M. Udell, K. Mohan, D. Zeng, J. Hong, S. Diamond, and S. Boyd, Convex optimiza-
tion in Julia, in SC14 Workshop on High Performance Technical Computing in Dynamic
Languages, 2014; preprint, arXiv:1410.4821 [math.OC], 2014. (Cited on p. 67)
[34] S. van der Walt, S. C. Colbert, and G. Varoquaux, The NumPy Array: A Structure for
Efficient Numerical Computation, CoRR, abs/1102.1523, 2011. (Cited on p. 66)
[35] H. Wickham, ggplot2, http://ggplot2.org/. (Cited on p. 71)
[36] L. Wilkinson, The Grammar of Graphics (Statistics and Computing), Springer-Verlag, New
York, 2005. (Cited on p. 71)