Numerical-and-Scientific-Computing-in-Python-v0.1.2

The document provides an overview of numerical and scientific computing using Python, detailing the Research Computing Services team and their expertise. It includes instructions for running Python, accessing tutorials, and utilizing libraries like NumPy and SciPy for efficient numerical computations. The document emphasizes Python's strengths as a general-purpose language while also addressing its limitations in numeric computing and the need for specialized libraries.
Copyright © All Rights Reserved
Numerical and Scientific Computing in Python

v0.1.2

Research Computing Services


IS & T
RCS Team and Expertise

Our Team:
• Scientific Programmers
• Systems Administrators
• Graphics/Visualization Specialists
• Account/Project Managers
• Special Initiatives (Grants)
• Maintains and administers the Shared Computing Cluster
  • Located in Holyoke, MA
  • ~19,000 CPUs running Linux

Consulting Focus:
• Bioinformatics
• Data Analysis / Statistics
• Molecular modeling
• Geographic Information Systems
• Scientific / Engineering Simulation
• Visualization

CONTACT US: help@scc.bu.edu
Running Python for the Tutorial

• If you have an SCC account, log on and use Python there.
• Run:

    module load python/3.6.2
    spyder &
    source /net/scc2/scratch/Numpy_examples.sh

• Note that the spyder program takes a while to load!


SCC On-Demand

• Go to: scc-ondemand.bu.edu
• Go to Interactive Apps, choose Spyder.
• Once available, open a terminal window and run:

    source /net/scc2/scratch/Numpy_examples.sh
Run Spyder

• Click on the Start Menu in the bottom left corner and type: spyder
• After a second or two it will be found. Click to run it.
• Be patient… it takes a while to start.
Download from the RCS website

• Open a browser and go to:

    http://rcs.bu.edu/examples/numpy_scipy

• Click on the numerical_sci_python.zip file and download it.
• After it downloads, extract it to a folder:
  • This presentation is there.
  • Several example files can be found there as well.
SCC Python Tutorials

• Introduction to Python, Part one
• Introduction to Python, Part two
• Numerical and Scientific Computing in Python
• Python for Data Analysis
• Data Visualization in Python
• Introduction to Python Scikit-learn
• Python Optimization
Outline
• Python lists
• The numpy library
• Speeding up numpy: numba and numexpr
• Libraries: scipy and opencv
• Alternatives to Python
Python’s strengths

• Python is a general purpose language.
  • In contrast with R or Matlab, which started out as specialized languages.
• Python lends itself to implementing complex or specialized algorithms for solving computational problems.
• It is a highly productive language to work with that’s been applied to hundreds of subject areas.
Extending its Capabilities
• However… for number crunching, some aspects of the language are not optimal:
  • Runtime type checks
  • No compiler to analyze a whole program for optimizations
  • General purpose built-in data structures are not optimal for numeric calculations
• “Regular” Python code is not competitive with compiled languages (C, C++, Fortran) for numeric computing.
• The solution: specialized libraries that extend Python with data structures and algorithms for numeric computing.
  • Keep the good stuff, speed up the parts that are slow!
Outline
• The numpy library
• Libraries: scipy and opencv
• When numpy / scipy isn’t fast enough

NumPy
• NumPy provides optimized data structures and basic routines for manipulating multidimensional numerical data.
• Mostly implemented in compiled C code.
• NumPy underlies many other numeric and algorithm libraries available for Python, such as:
  • SciPy, matplotlib, pandas, OpenCV’s Python API, and more
Ndarray – the basic NumPy data type

• NumPy ndarrays are:
  • Typed
  • Fixed size
  • Fixed dimensionality
• An ndarray can be constructed from:
  • Conversion from a Python list, tuple, or similar sequence (a set is not converted element by element)
  • NumPy initialization routines
  • Copies or computations with other ndarrays
  • NumPy-based functions as a return value
ndarray vs list

List:
• General purpose
• Untyped
• 1 dimension
• Resizable
• Add/remove elements anywhere
• Accessed with [ ] notation and integer indices

Ndarray:
• Intended to store and process (mostly) numeric data
• Typed
• N-dimensions
  • Chosen at creation time
• Fixed size
  • Chosen at creation time
• Accessed with [ ] notation and integer indices
List Review: x = ['a','b',3.14]

Operation                   Syntax                                  Notes
Indexing, starting from 0   x[0] → 'a'
                            x[1] → 'b'
Indexing backwards from -1  x[-1] → 3.14
                            x[-3] → 'a'
Slicing                     x[start:end:incr]                       Slicing produces a COPY of the original list!
                            x[0:2] → ['a','b']
                            x[-1:-3:-1] → [3.14,'b']
                            x[:] → ['a','b',3.14]
Sorting                     x.sort() → in-place sort                Depending on list contents, a sorting (key)
                            sorted(x) → returns a new sorted list   function might be required.
Size of a list              len(x) → 3

Note: this x mixes str and float, so x.sort() and sorted(x) raise a TypeError unless a key function is given.
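A short sketch that runs the operations from the table on the same list x. Sorting the mixed list with key=str is only one illustrative choice of key function (an assumption, not necessarily what the tutorial files use).

```python
# Rebuild the list from the table and exercise each operation.
x = ['a', 'b', 3.14]

first, last = x[0], x[-1]     # 'a' and 3.14
part = x[0:2]                 # slicing builds a NEW list: ['a', 'b']
part[0] = 'z'                 # changing the copy leaves x untouched
rev = x[-1:-3:-1]             # negative step walks backwards: [3.14, 'b']

try:
    sorted(x)                 # str and float cannot be compared
except TypeError:
    print("x needs a key function to be sorted")

mixed_sorted = sorted(x, key=str)   # compare every element as a string
nums = [3, 1, 2]
new_nums = sorted(nums)       # returns a new sorted list; nums is not changed
nums.sort()                   # sorts nums in place and returns None
print(x, part, rev, mixed_sorted, nums, len(x))
```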


List Implementation: x = ['a','b',3.14]

• A Python list offers linked-list-like flexibility (elements can be added or removed anywhere), but it is implemented as a resizable array of pointers to Python objects for performance reasons.

[Figure: x is an array of three pointers, each pointing to a Python object ('a', 'b', 3.14) that can be allocated anywhere in memory.]

• x[1] → get the pointer (memory address) at index 1 → resolve the pointer to retrieve the Python object in memory → get the value from the object → return 'b'
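To make the "array of pointers" picture concrete, the sketch below (using a hypothetical nested list named inner) shows that copying a list with a slice copies only the pointers: both lists end up referring to the same element objects.

```python
# A list holds references (pointers) to objects, not the objects themselves.
inner = [1, 2]
x = ['a', inner, 3.14]

shallow = x[:]                  # slicing copies the array of pointers ...
same_list = shallow is x        # ... so this is a different list object (False)
same_item = shallow[1] is x[1]  # ... whose pointers refer to the same objects (True)

inner.append(3)                 # mutate the shared object
print(shallow[1])               # both lists now see [1, 2, 3]
```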
NumPy ndarray

    import numpy as np
    # Initialize a NumPy array from a Python list
    y = np.array([1,2,3])

• The basic data type is a class called ndarray.
• The object has:
  • metadata that describes the array (data type, number of dimensions, number of elements, memory format, etc.)
  • a contiguous array in memory containing the data.

[Figure: y holds a data description (integer, 3 elements, 1-D) plus the values 1 2 3, which are physically adjacent in memory.]

• y[1] → check the ndarray data type → retrieve the value at offset 1 in the data array → return 2
https://docs.scipy.org/doc/numpy/reference/arrays.html
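The metadata described above can be inspected directly through ndarray attributes. A minimal sketch (the exact integer dtype printed depends on the platform):

```python
import numpy as np

y = np.array([1, 2, 3])

# Metadata stored in the ndarray object
print(y.dtype)                  # element type (platform-dependent integer type)
print(y.ndim, y.shape, y.size)  # 1 (3,) 3
print(y.flags['C_CONTIGUOUS'])  # True: the values sit in one contiguous block

value = y[1]                    # read the element at offset 1 of the data buffer
```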
dtype
• Every ndarray has a dtype, the type of data that it holds.
• This is used to interpret the block of data stored in the ndarray.
• Can be assigned at creation time.
• Conversion from one type to another is done with the astype() method.

    a = np.array([1,2,3])
    a.dtype → dtype('int64')     # default integer type; platform dependent

    c = np.array([-1,4,124], dtype='int8')
    c.dtype → dtype('int8')

    b = a.astype('float')
    b.dtype → dtype('float64')
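A quick sketch of what astype() does to the values: it always returns a new array, and converting floats to integers truncates toward zero rather than rounding.

```python
import numpy as np

a = np.array([1, 2, 3])
b = a.astype('float')                      # new float64 array; a is unchanged
f = np.array([1.7, -2.3, 3.9])
i = f.astype('int32')                      # float -> int truncates toward zero: 1, -2, 3
c = np.array([-1, 4, 124], dtype='int8')   # 1 byte per element

print(b.dtype, i, c.itemsize)
```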
Ndarray memory notes
• The memory allocated by an ndarray:
  • Storage for the data: N elements * bytes-per-element
    • 4 bytes for 32-bit integers, 8 bytes for 64-bit floats (doubles), 1 byte for 8-bit characters, etc.
  • A small amount of memory is used to store info about the ndarray (~few dozen bytes)
• Data storage is compatible with external libraries
  • C, C++, Fortran, or other external libraries can use the data allocated in an ndarray directly without any conversion or copying.
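The "N elements * bytes-per-element" rule can be checked with the nbytes attribute, which counts only the data buffer (not the small descriptive header):

```python
import numpy as np

x32 = np.zeros(1000, dtype='int32')
x64 = np.zeros(1000, dtype='float64')

# nbytes = size * itemsize, i.e. the data buffer only
print(x32.nbytes, x64.nbytes)    # 4000 8000
```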
ndarray from numpy initialization

• There are a number of initialization routines. Most of them mirror similar routines in Matlab.
• These share a similar syntax:

    function([size of dimensions list], opt. dtype…)

  • zeros – everything initialized to zero.
  • ones – initialize elements to one.
  • empty – do not initialize elements.
  • identity – create a square 2D array with ones on the diagonal and zeros elsewhere (takes a single size n).
  • full – create an array and initialize all elements to a specified value (the value follows the size list).
• Read the docs for a complete list and descriptions.
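One call to each routine from the list above, as a minimal sketch. Note that identity() takes a single size n and full() takes the shape followed by the fill value.

```python
import numpy as np

z = np.zeros([2, 3])                # 2x3 of 0.0 (float64 by default)
o = np.ones([2, 3], dtype='int32')  # 2x3 of 1, stored as 32-bit ints
e = np.empty([2, 3])                # contents are whatever was already in memory
eye = np.identity(3)                # 3x3 with ones on the diagonal
f = np.full([2, 2], 7.5)            # every element set to 7.5
```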
ndarray from a list

    x = [1,2,3]
    y = np.array(x)

• The numpy function array creates a new array from any data structure with array-like behavior (other ndarrays, lists, tuples, etc.). A set is not array-like: np.array() wraps it as a single object instead of converting its elements.
  • Read the docs!
• Creating an ndarray from a list does not change the list.
• Often combined with a reshape() call to create a multi-dimensional array.
• Open the file ndarray_basics.py in Spyder so we can check out some examples.
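A sketch of the conversion rules above: the list is copied, reshape() of this contiguous array returns a view of the same data, tuples convert element by element, and a set must be turned into a list first.

```python
import numpy as np

x = [1, 2, 3, 4, 5, 6]
y = np.array(x)            # copies the values into a new ndarray
m = y.reshape([2, 3])      # 2 rows x 3 columns; here a view sharing y's data

y[0] = 100                 # modify the array ...
print(x[0])                # ... the original list still holds 1
print(m[0, 0])             # 100, because m views the same buffer as y

t = np.array((7, 8, 9))    # tuples convert element by element
s = np.array({1, 2, 3})    # a set does NOT: this is a 0-d object array
s_fixed = np.array(sorted({1, 2, 3}))   # convert the set to a list first
```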
ndarray indexing
• ndarray indexing is similar to Python lists, strings, tuples, etc.
• Index with integers, starting from zero.
• Indexing N-dimensional arrays: just use commas:

    array[i,j,k,l] = 42

    oneD = np.array([1,2,3,4])
    twoD = oneD.reshape([2,2])
    twoD → array([[1, 2],
                  [3, 4]])
    # index from 0
    oneD[0] → 1
    oneD[3] → 4
    # -index starts from the end
    oneD[-1] → 4
    oneD[-2] → 3
    # For multiple dimensions use a comma
    # matrix[row,column]
    twoD[0,0] → 1
    twoD[1,0] → 3
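The same indexing rules as a runnable sketch, including the four-dimensional case from the bullet list (the array name arr4 is just for illustration):

```python
import numpy as np

oneD = np.array([1, 2, 3, 4])
twoD = oneD.reshape([2, 2])

first = twoD[0, 0]         # row 0, column 0 -> 1
lower_left = twoD[1, 0]    # row 1, column 0 -> 3
last_row = twoD[-1]        # a single index selects a whole row -> [3, 4]

arr4 = np.zeros([2, 3, 4, 5])
arr4[1, 2, 3, 4] = 42      # one index per dimension, separated by commas
```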
ndarray slicing

• Syntax for each dimension (same rules as lists):
  • start:end:step
  • start: → from starting index to end
  • :end → start from 0 to end (exclusive of end)
  • : → all elements.
• Slicing an ndarray does not make a copy; it creates a view of the original data.
• Slicing a Python list creates a copy.

    y = np.arange(50,300,50)
    # y --> array([ 50, 100, 150, 200, 250])
    y[0:3] --> array([ 50, 100, 150])
    y[-1:-3:-1] --> array([250, 200])

    x = np.arange(10,130,10).reshape(4,3)
    # x --> array([[ 10, 20, 30],
    #              [ 40, 50, 60],
    #              [ 70, 80, 90],
    #              [100, 110, 120]])
    # 1-D returned!
    x[:,0] --> array([ 10, 40, 70, 100])
    # 2-D returned!
    x[2:4,1:3] --> array([[ 80, 90],
                          [110, 120]])

Look at the file slicing.py
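A minimal sketch contrasting the two behaviors: writing through an ndarray slice changes the original, .copy() breaks the link, and a list slice is always an independent copy.

```python
import numpy as np

x = np.arange(10, 130, 10).reshape(4, 3)
col = x[:, 0]               # a view into x
col[:] = 0                  # writes through: x[:, 0] is now all zeros

safe = x[2:4, 1:3].copy()   # explicit copy: independent data
safe[0, 0] = -1             # x[2, 1] is still 80

lst = [10, 20, 30]
lst_slice = lst[0:2]        # list slicing copies
lst_slice[0] = -1           # lst[0] is still 10
```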


ndarray slicing assignment
• Slice notation on the left-hand side of an = sign overwrites elements of an ndarray.

    y = np.arange(50,300,50)
    # y --> array([ 50, 100, 150, 200, 250])

    y[0:3] = -1
    # y --> array([ -1, -1, -1, 200, 250])

    y[0:8] = -1
    # NO ERROR! The slice is clipped to the length of the array.
    # y --> array([ -1, -1, -1, -1, -1])
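A sketch of what the right-hand side of a slice assignment may be: a scalar (broadcast over the slice) or a sequence of matching length. An out-of-range slice end is silently clipped, but a length mismatch does raise.

```python
import numpy as np

y = np.arange(50, 300, 50)    # [50, 100, 150, 200, 250]
y[0:3] = -1                   # a scalar is broadcast over the slice
y[3:] = [7, 8]                # or assign a sequence of matching length
after_assign = y.tolist()     # [-1, -1, -1, 7, 8]

y[0:8] = 0                    # end index past the array is clipped: no error

raised = False
try:
    y[0:2] = [1, 2, 3]        # 3 values into a 2-element slice
except ValueError:
    raised = True             # a length mismatch IS an error
```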
ndarray addressing with an ndarray

• Ndarrays can be used to address/index another ndarray.
• Use integer or Boolean values.
• Remember: unlike slicing, indexing with an integer or Boolean array returns a copy, not a view.

    a=np.linspace(-1,1,5)
    # a --> [-1. , -0.5, 0. , 0.5, 1. ]
    b=np.array([0,1,2])
    a[b] # --> array([-1. , -0.5, 0.])

    c = np.array([True, False, True, True, False])
    # Boolean indexing returns elements
    # where True
    a[c] # --> array([-1. , 0. , 0.5])
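A sketch showing that integer and Boolean indexing produce copies when reading, yet write into the original array when used on the left-hand side of an assignment:

```python
import numpy as np

a = np.linspace(-1, 1, 5)                       # [-1., -0.5, 0., 0.5, 1.]
b = np.array([0, 1, 2])
c = np.array([True, False, True, True, False])

picked = a[b]            # integer-array indexing returns a NEW array
picked[0] = 99.0         # so this does not touch a
masked = a[a < 0]        # Boolean mask built from a condition: [-1., -0.5]

a[c] = 0.0               # on the LEFT of '=', the same indexing writes into a
print(a)                 # elements 0, 2 and 3 are now 0.0
```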
numpy.where
• Similar to find in Matlab.
• Syntax:

    numpy.where(condition, [x,y])

• Condition: some Boolean condition applied to an ndarray.
• x, y: Optional variables to choose from. x is for condition == True, y is for condition == False.
• The condition, x, and y must be ndarrays (or scalars) that can be broadcast to a common shape.

    a=np.linspace(-1,1,5)
    # a --> [-1. , -0.5, 0. , 0.5, 1. ]
    # Returns a TUPLE containing the INDICES where
    # the condition is True!
    np.where(a <= 0)
    # --> (array([0, 1, 2], dtype=int64),)
    np.where(a <= 0, -a, 20*a)
    # --> array([ 1. , 0.5, -0. , 10. , 20. ])
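A few more np.where patterns, as a sketch: pulling the index array out of the returned tuple, the equivalence with np.nonzero for the one-argument form, and scalar or string choices for x and y.

```python
import numpy as np

a = np.linspace(-1, 1, 5)                    # [-1., -0.5, 0., 0.5, 1.]

idx = np.where(a <= 0)[0]                    # take the index array out of the tuple
same = np.nonzero(a <= 0)[0]                 # one-argument where() matches nonzero()
clipped = np.where(a < 0, 0.0, a)            # scalar x and array y are broadcast
labels = np.where(a > 0, 'pos', 'non-pos')   # x and y need not be numeric
```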
ndarray memory layout
• The memory layout (C or Fortran order) can be set:

    X = np.ones([3,5],order='F')
    # OR...
    # Y is C-ordered by default
    Y = np.ones([3,5])
    # Z is a F-ordered copy of Y
    Z = np.asfortranarray(Y)

• This can be important when dealing with external libraries written in R, Matlab, etc.
• Row-major order: C, C++, Java, C#, and others
• Column-major order: Fortran, R, Matlab, and others
• See here for more detail
  • Or read more about the concept in terms of Matlab, including speed measurements

https://en.wikipedia.org/wiki/Row-_and_column-major_order
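The layout of any array can be checked through its flags, and converted in either direction. A minimal sketch; np.ascontiguousarray is the C-order counterpart of np.asfortranarray:

```python
import numpy as np

Y = np.ones([3, 5])               # C (row-major) order by default
X = np.ones([3, 5], order='F')    # Fortran (column-major) order
Z = np.asfortranarray(Y)          # Fortran-ordered copy of Y

print(Y.flags['C_CONTIGUOUS'], Y.flags['F_CONTIGUOUS'])   # True False
print(Z.flags['F_CONTIGUOUS'])                           # True
back = np.ascontiguousarray(Z)    # back to C order, e.g. before passing to C code
```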
ndarray memory layout
• For row-major ordering the rightmost index accesses values in adjacent memory.
• The opposite is true for column-major ordering.
• If using for loops (or writing row or column operations like ‘sum’), order the indices so the innermost loop runs over the index that is adjacent in memory.

    # Y is C-ordered by default
    Y = np.ones([2,3,4])
    # For loop indexing:
    total=0.0
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            for k in range(Y.shape[2]):
                total += Y[i,j,k]

    # X is Fortran-ordered
    X = np.ones([2,3,4], order='F')
    # For loop indexing:
    total=0.0
    for i in range(X.shape[2]):
        for j in range(X.shape[1]):
            for k in range(X.shape[0]):
                total += X[k,j,i]

Look at the file row_vs_col_timing.py
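The "adjacent in memory" rule is visible in an array's strides: the number of bytes to step in memory to move one position along each axis. A sketch for 8-byte float64 arrays of the shape used above:

```python
import numpy as np

Y = np.ones([2, 3, 4])               # C order, float64 (8 bytes per element)
X = np.ones([2, 3, 4], order='F')    # Fortran order

print(Y.strides)   # (96, 32, 8): the LAST index moves 8 bytes, i.e. adjacent
print(X.strides)   # (8, 16, 48): the FIRST index moves 8 bytes, i.e. adjacent

# Whole-array reductions choose an efficient traversal without explicit loops
total = Y.sum()
```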


ndarray math
• By default operators work element-by-element.
• These are executed in compiled C code.

    a = np.array([1,2,3,4])
    b = np.array([4,5,6,7])
    c = a / b
    # c is an ndarray
    print(type(c)) → <class 'numpy.ndarray'>

    a * b → array([ 4, 10, 18, 28])
    a + b → array([ 5, 7, 9, 11])
    a - b → array([-3, -3, -3, -3])
    a / b → array([0.25, 0.4, 0.5, 0.57142857])
    -2 * a + b → array([ 2, 1, 0, -1])

• Vectors are applied row-by-row to matrices (NumPy calls this broadcasting).
• The length of the vector must match the width of the row.

    a = np.array([2,2,2,2])
    c = np.array([[1,2,3,4],
                  [4,5,6,7],
                  [1,1,1,1],
                  [2,2,2,2]])
    a + c → array([[3, 4, 5, 6],
                   [6, 7, 8, 9],
                   [3, 3, 3, 3],
                   [4, 4, 4, 4]])
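A sketch of broadcasting in both directions: a shape (4,) vector is added to every row, a shape (4, 1) column vector adds element i to all of row i, and a length that matches neither raises a ValueError.

```python
import numpy as np

c = np.array([[1, 2, 3, 4],
              [4, 5, 6, 7],
              [1, 1, 1, 1],
              [2, 2, 2, 2]])

row = np.array([10, 20, 30, 40])       # shape (4,): added to every row
col = np.array([[1], [2], [3], [4]])   # shape (4, 1): element i added to all of row i

by_row = c + row
by_col = c + col

mismatch = False
try:
    c + np.array([1, 2, 3])            # 3 values cannot match 4 columns
except ValueError:
    mismatch = True
```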
Linear algebra multiplication

• Vector/matrix multiplication can be done using the dot() and cross() functions, or the @ operator.
• There are many other linear algebra routines!

    a = np.array([[1, 0], [0, 1]])
    b = np.array([[4, 1], [2, 2]])
    np.dot(a, b)   # --> array([[4, 1],
                   #            [2, 2]])
    a @ b          # --> array([[4, 1],
                   #            [2, 2]])
    np.cross(a,b)  # --> array([ 1, -2])  (row-by-row cross products of 2-element vectors)

https://docs.scipy.org/doc/numpy/reference/routines.linalg.html
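As one example of the other routines in numpy.linalg, the sketch below solves a small linear system (the matrix and right-hand side are made up for illustration):

```python
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 2.0]])
rhs = np.array([9.0, 8.0])

x = np.linalg.solve(A, rhs)   # solves A @ x = rhs; here x = [2, 3]
check = A @ x                 # reproduces rhs
A_inv = np.linalg.inv(A)      # explicit inverse (solve() is usually preferred)
```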
NumPy I/O
• When reading files you can use standard Python: read the data into lists, then allocate ndarrays and fill them.
• Or use any of NumPy’s I/O routines that will directly generate ndarrays.
• The best way depends on the structure of your data.
• If dealing with structured numeric data (tables of numbers, etc.), NumPy is easier and faster.
• Docs: https://docs.scipy.org/doc/numpy/reference/routines.io.html
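A minimal sketch of reading a numeric table straight into an ndarray with np.loadtxt; an in-memory io.StringIO stands in for a file on disk.

```python
import io
import numpy as np

# Stand-in for a file: a small comma-separated table of numbers
text = io.StringIO("1.0,2.0,3.0\n4.0,5.0,6.0\n")
data = np.loadtxt(text, delimiter=',')   # returns a 2-D float ndarray directly
col_means = data.mean(axis=0)            # [2.5, 3.5, 4.5]
```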
Numpy docs
• As numpy is a large library, we can only cover the basic usage here.
• Let’s look at the official docs:

    https://docs.scipy.org/doc/numpy/reference/index.html

• As an example, computing an average:

    https://docs.scipy.org/doc/numpy/reference/generated/numpy.mean.html#numpy.mean
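The axis argument is the part of numpy.mean that usually needs a second look. The sketch also uses np.nanmean, a separate NumPy function that skips missing (nan) values, which is relevant for data with gaps.

```python
import numpy as np

m = np.array([[1.0, 2.0],
              [3.0, 4.0]])
overall = np.mean(m)            # 2.5, mean of all elements
per_col = np.mean(m, axis=0)    # [2., 3.], average down each column
per_row = np.mean(m, axis=1)    # [1.5, 3.5], average across each row

gappy = np.array([1.0, np.nan, 3.0])
plain = np.mean(gappy)          # nan: a single missing value poisons the mean
skip_nan = np.nanmean(gappy)    # 2.0: nanmean ignores the missing value
```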
Some numpy file reading options
• .npz and .npy file formats (cross-platform compatible):
  • .npy files store a single NumPy variable in a binary format.
  • .npz files store multiple NumPy variables in a file.
• h5py is a library that reads HDF5 files into ndarrays.
• The I/O routines allow for flexible reading from a variety of text file formats.

    numpy.save               # save .npy
    numpy.savez              # save .npz
    numpy.savez_compressed   # ditto, with compression
    numpy.load               # load .npy or .npz

Tutorial: https://docs.scipy.org/doc/numpy/user/basics.io.html
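A round trip through both binary formats, as a sketch. Files go to a temporary directory, and the keyword names first and second are arbitrary choices that become the variable names inside the .npz archive.

```python
import os
import tempfile
import numpy as np

a = np.arange(6).reshape(2, 3)
b = np.linspace(0, 1, 4)

with tempfile.TemporaryDirectory() as tmp:
    np.save(os.path.join(tmp, 'a.npy'), a)                  # one array per .npy
    np.savez_compressed(os.path.join(tmp, 'ab.npz'), first=a, second=b)

    a_back = np.load(os.path.join(tmp, 'a.npy'))
    with np.load(os.path.join(tmp, 'ab.npz')) as archive:   # np.load also reads .npz
        names = sorted(archive.files)
        b_back = archive['second']
```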
NumPy I/O example
• Read in a data file from a set of ocean weather buoys.
• File: buoy_data.csv
  • 18 columns. The 1st column contains dates; the rest are numeric data for different buoys.
  • Some rows have dates but are missing data points in some columns.
• Use the most flexible NumPy file reader, genfromtxt.
• Return a 2D matrix of floats.
• Convert the date string column to numbers.

Look at the file numpy_io.py
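A minimal sketch of how genfromtxt handles the missing data points. The three-line CSV below is a hypothetical stand-in for buoy_data.csv (made-up column names and values), and only the numeric columns are read; the date conversion is left to numpy_io.py.

```python
import io
import numpy as np

# Hypothetical miniature of a buoy file: a date column plus two numeric
# columns, with one value missing in the second data row.
text = io.StringIO(
    "date,buoy1,buoy2\n"
    "2018-01-01,1.5,2.0\n"
    "2018-01-02,,3.5\n"
)

# Skip the header line and keep only the numeric columns (1 and 2).
data = np.genfromtxt(text, delimiter=',', skip_header=1, usecols=(1, 2))
n_missing = np.isnan(data).sum()   # empty fields are filled with nan
```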


Outline
• The numpy library
• Libraries: scipy and opencv
• When numpy / scipy isn’t fast enough


SciPy
• SciPy builds on top of NumPy.
• Ndarrays are the basic data structure used.
• Libraries are provided for:
  • physical constants and conversion factors
  • hierarchical clustering, vector quantization, K-means
  • Discrete Fourier Transform algorithms
  • numerical integration routines
  • interpolation tools
  • data input and output
  • Python wrappers to external libraries
  • linear algebra routines
  • miscellaneous utilities (e.g. image reading/writing)
  • various functions for multi-dimensional image processing
  • optimization algorithms including linear programming
  • signal processing tools
  • sparse matrix and related algorithms
  • KD-trees, nearest neighbors, distance functions
  • special functions
  • statistical functions
• Comparable to Matlab toolboxes.
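As a taste of one item on that list, numerical integration: a sketch using scipy.integrate.quad to integrate sin(x) from 0 to π, whose exact value is 2.

```python
import numpy as np
from scipy import integrate

# quad returns the integral estimate and an estimate of its absolute error
value, abs_err = integrate.quad(np.sin, 0.0, np.pi)
```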
scipy.io
• I/O routines support a wide variety of file formats:

Software name                            Format      Read?   Write?
Matlab                                   .mat        Yes     Yes
IDL                                      .sav        Yes     No
Matrix Market                            .mm         Yes     Yes
Netcdf                                   .nc         Yes     Yes
Harwell-Boeing (sparse matrices)         .hb         Yes     Yes
Unformatted Fortran files                .anything   Yes     Yes
Wav (sound)                              .wav        Yes     Yes
Arff (Attribute-Relation File Format)    .arff       Yes     No
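A sketch of the Matlab row of the table: writing a .mat file with scipy.io.savemat and reading it back with scipy.io.loadmat. The file name and variable name A are arbitrary.

```python
import os
import tempfile
import numpy as np
from scipy import io as sio

A = np.arange(6.0).reshape(2, 3)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'example.mat')
    sio.savemat(path, {'A': A})      # dict keys become Matlab variable names
    contents = sio.loadmat(path)     # returns a dict of arrays (plus header entries)

A_back = contents['A']
```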
Using SciPy
• Think about your code and what sort of algorithms you’re using:
  • Integration, linear algebra, image processing, etc.
• See if an appropriate algorithm exists in SciPy before trying to write your own.
• Read the docs – many functions have large numbers of optional arguments.
• Understand the algorithms!


Example: Fit a line with SciPy
• There are many ways to fit equation parameters to data in NumPy and SciPy.
• scipy.stats.linregress: Calculate a regression line
• Open the example linregress.py
• This demonstrates calling the function and extracting all the info it returns.
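A small stand-alone sketch in the same spirit as linregress.py (not a copy of it): fit slightly noisy data that lie close to y = 2x + 1 and read the fields of the returned result. The data values are made up.

```python
import numpy as np
from scipy import stats

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
noise = np.array([0.1, -0.1, 0.05, -0.05, 0.0])   # small, made-up deviations
y = 2.0 * x + 1.0 + noise                          # close to y = 2x + 1

result = stats.linregress(x, y)
print(result.slope, result.intercept, result.rvalue**2, result.pvalue, result.stderr)
```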
Example: scipy.optimize.minimize, $y = 3x^2 + x - 1$

• Finds the minimum of a function (the x that minimizes it and the function value there).
• You provide the function as a variable to minimize.
  • This is a common pattern in scipy.
• Open scipy_minimize.py

https://docs.scipy.org/doc/scipy/reference/tutorial/optimize.html
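A sketch of the pattern for the slide's function. Setting the derivative $6x + 1$ to zero gives the exact minimum at $x = -1/6$ with $y = -13/12$, which the numerical result can be checked against. The starting guess x0 = 0 is an arbitrary choice.

```python
from scipy.optimize import minimize

def f(x):
    # minimize passes x as a 1-element array; y = 3x^2 + x - 1
    return 3.0 * x[0]**2 + x[0] - 1.0

res = minimize(f, x0=[0.0])   # pass the function itself plus a starting guess
print(res.x, res.fun)         # close to -1/6 and -13/12
```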
OpenCV
• The Open Source Computer Vision Library
• Highly optimized and mature C++ library usable from C++, Java, and Python.
• Cross platform: Windows, Linux, Mac OSX, iOS, Android
• Functionality includes:
  • Image Processing
  • Image file reading and writing
  • Video I/O
  • High-level GUI
  • Video Analysis
  • Camera Calibration and 3D Reconstruction
  • 2D Features Framework
  • Object Detection
  • Deep Neural Network module
  • Machine Learning
  • Clustering and Search in Multi-Dimensional Spaces
  • Computational Photography
  • Image stitching
OpenCV vs SciPy

• For imaging-related operations and many linear algebra functions there is a lot of overlap between these two libraries.
• OpenCV is frequently faster, sometimes significantly so.
• The OpenCV Python API uses NumPy ndarrays, making OpenCV algorithms compatible with SciPy and other libraries.
OpenCV vs SciPy
• A simple benchmark: Gaussian and median filtering a 1024x671 pixel image of the CAS building.
• Gaussian: radius 5, median: radius 9. See: image_bench.py
• Timing: 2.4 GHz Xeon E5-2680 (Sandybridge)

Operation   Function                        Time (msec)   OpenCV speedup
Gaussian    scipy.ndimage.gaussian_filter   85.7
            cv2.GaussianBlur                23.2          3.7x
Median      scipy.ndimage.median_filter     1,780
            cv2.medianBlur                  79.2          22.5x
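A sketch of how timings like these can be measured with the standard timeit module. It does not reproduce image_bench.py: the random 256x256 image, sigma=5 and size=9 are assumptions rather than the benchmark's settings, and only the SciPy side is timed so the example runs without OpenCV installed.

```python
import timeit
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)
image = rng.random((256, 256))      # synthetic stand-in for a grayscale photo

smooth = ndimage.gaussian_filter(image, sigma=5)   # Gaussian blur
denoised = ndimage.median_filter(image, size=9)    # 9x9 median window

# Average milliseconds per call; results depend entirely on the machine.
n = 3
gauss_ms = timeit.timeit(lambda: ndimage.gaussian_filter(image, sigma=5), number=n) / n * 1e3
median_ms = timeit.timeit(lambda: ndimage.median_filter(image, size=9), number=n) / n * 1e3
print(f"gaussian_filter: {gauss_ms:.1f} ms, median_filter: {median_ms:.1f} ms")
```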
When NumPy and SciPy aren’t fast enough

• Auto-compile your Python code with the numba and numexpr libraries
• Use the Intel Python distribution
• Re-code critical paths with Cython
• Combine your own C++ (with SWIG) or Fortran code (with f2py) and call it from Python
numba

• The numba library can translate portions of your Python code and compile them into machine code on demand.
• Achieves a significant speedup compared with regular Python.
• Compatible with numpy ndarrays.
• Can generate code to execute automatically on GPUs.

numba
• The @jit decorator is used to indicate which functions are compiled.
• Options:
  • GPU code generation
  • Parallelization
  • Caching of compiled code
• Can produce faster array code than pure NumPy statements.

    from numba import jit, float64

    # This will get compiled when it's first executed
    @jit
    def average(x, y, z):
        return (x + y + z) / 3.0

    # With type information this one gets
    # compiled when the file is read.
    @jit(float64(float64, float64, float64))
    def average_eager(x, y, z):
        return (x + y + z) / 3.0
numexpr
• Another acceleration library for Python.
• Useful for speeding up specific ndarray expressions.
  • Typically 2-4x faster than plain NumPy
• Code needs to be edited to move ndarray expressions into the numexpr.evaluate function:

    import numpy as np
    import numexpr as ne

    a = np.arange(10)
    b = np.arange(0, 20, 2)

    # Plain NumPy
    c = 2 * a + 3 * b
    # Numexpr
    d = ne.evaluate("2*a+3*b")
Intel Python

• Intel now releases a customized build of Python 2.7 and 3.6 based on their optimized libraries.
• Can be installed stand-alone or inside of Anaconda:

    https://software.intel.com/en-us/distribution-for-python

• Available on the SCC: module avail python2-intel (or python3-intel)


Intel Python

• In RCS testing on various projects, the Intel Python build has always been at least as fast as the regular Python and Anaconda modules on the SCC.
  • In one case involving processing several GBs of XML it was 20x faster!
• Easy to try: change environments in Anaconda or load the SCC module.
• Can use the Intel Threading Building Blocks library to improve multithreaded Python programs:

    python -m tbb parallel_script.py


Cython

• Cython is a superset of the Python language.
• The additional syntax allows for C code to be auto-generated and compiled from Python code.
• This can make mixing Python, Cython, and C code (or libraries) very straightforward.
• A mature library that is widely used.


You feel the need for speed…
• Auto-compilation systems like numba, numexpr, and Cython:
  • all provide access to higher speed code
  • require minimal to significant code changes
  • let you keep working in Python or Python-like code
  • are faster than NumPy, which is itself much faster than plain Python for numeric calculation
• For the fastest implementation of algorithms, optimized and well-written C, C++, and Fortran codes are very hard to beat
  • Connect C or C++ to Python with SWIG
  • Connect Fortran to Python with f2py (part of NumPy)
• Contact RCS for help!


End-of-course Evaluation Form
• Please visit this page and fill in the evaluation form for this course.
• Your feedback is highly valuable to the RCS team for the improvement and development of tutorials.
• If you visit this link later, please make sure to select the correct tutorial: name, time, and location.

http://scv.bu.edu/survey/tutorial_evaluation.html
