Numpy (Numerical Python)
Introduction
• NumPy
– Numeric Python
– Fast computation with n-dimensional arrays
– NumPy is a library for the Python programming language
that adds support for large, multi-dimensional arrays and
matrices, along with a large collection of high-level
mathematical functions that operate on these arrays.
– It provides a simple yet powerful data structure: the
n-dimensional array. Most of Python's data science toolkit
is built on this foundation, so learning NumPy is a natural
first step for a Python data scientist.
NumPy
1. Based around one data structure: the ndarray (n-dimensional array)
2. Import with: import numpy as np
3. Usage is np.command(xxx)
ndarrays
1d: 5,67,43,76,2,21
a=np.array([5,67,43,76,2,21])
2d: 4,5,8,4
6,3,2,1
8,6,4,3
a=np.array([[4,5,8,4],[6,3,2,1],[8,6,4,3]])
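A minimal sketch of both constructions, using Python 3 syntax (print is a function there, unlike the Python 2 statements on these slides). The names a1 and a2 are illustrative only. Note the outer pair of brackets that wraps the rows of the 2-D array:

```python
import numpy as np

# 1-D array from a flat list
a1 = np.array([5, 67, 43, 76, 2, 21])

# 2-D array: one inner list per row, all wrapped in one outer list
a2 = np.array([[4, 5, 8, 4],
               [6, 3, 2, 1],
               [8, 6, 4, 3]])

print(a1.shape)  # (6,)
print(a2.shape)  # (3, 4)
```

Passing the rows as separate arguments instead of one nested list does not build a 2-D array, because np.array treats its second positional argument as the dtype.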
*, +
import numpy as np
data = np.random.randn(2, 3)
print data
[[ 0.079 -0.8418 -0.0838]
 [-1.4497 0.6628 1.1006]]
• Output
• [ 6. 7.5 8. 0. 1. ]
Multidimensional arrays
data2 = [[1, 2, 3, 4], [5, 6, 7, 8]]
arr2 = np.array(data2)
print arr2
print arr2.ndim
print arr2.shape
OUTPUT
[[1 2 3 4]
 [5 6 7 8]]
2
(2L, 4L)
• print arr2
• print arr2.ndim
• print type(arr2.ndim)
• print arr2.shape
• print type(arr2.shape)
• print arr2.shape[0]
• print arr2.shape[1]
OUTPUT
[[1 2 3 4]
 [5 6 7 8]]
2
<type 'int'>
(2L, 4L)
<type 'tuple'>
2
4
3d array
• data2 = [[[1]]]
• arr2 = np.array(data2)
• print arr2
• print arr2.ndim
• print arr2.shape
OUTPUT
[[[1]]]
3
(1L, 1L, 1L)
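The number of axes follows the nesting depth of the input list, and the shape tuple has one entry per axis. A small sketch in Python 3 syntax (where the shape prints without the L suffix seen above); the variable names are illustrative:

```python
import numpy as np

nested = [[[1]]]             # three levels of nesting
arr = np.array(nested)

ndim = arr.ndim              # number of axes
shape = arr.shape            # length along each axis
print(ndim, shape)           # 3 (1, 1, 1)
print(len(shape) == ndim)    # True
```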
More making arrays
• np.zeros(10)
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
• np.zeros((3, 6))
[[ 0. 0. 0. 0. 0. 0.]
 [ 0. 0. 0. 0. 0. 0.]
 [ 0. 0. 0. 0. 0. 0.]]
• np.empty((2, 3, 2))
[[[ 0. 0.]
  [ 0. 0.]
  [ 0. 0.]]
 [[ 0. 0.]
  [ 0. 0.]
  [ 0. 0.]]]
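The zeros printed for np.empty above are incidental: np.empty only allocates memory and does not initialise it, so its contents are arbitrary until assigned. A short sketch (Python 3 syntax, illustrative names) that fills the array explicitly before reading it:

```python
import numpy as np

z = np.zeros((3, 6))        # every element is guaranteed to be 0.0
e = np.empty((2, 3, 2))     # contents are arbitrary until assigned
e[:] = 0.0                  # fill explicitly before reading

print(z.shape, z.dtype)     # (3, 6) float64
print(e.sum())              # 0.0
```

np.empty is mainly useful when every element is about to be overwritten anyway.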
Operations between arrays and scalars
• arr = np.array([1., 2., 3.])
• print arr
• print arr * arr
• print arr - arr
• print 1 / arr
• print arr ** 0.5
OUTPUT
[ 1. 2. 3.]
[ 1. 4. 9.]
[ 0. 0. 0.]
[ 1. 0.5 0.3333]
[ 1. 1.4142 1.7321]
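Arithmetic between two arrays of the same shape is applied element by element, and a scalar operand is applied to every element. A compact sketch in Python 3 syntax; the variable names are illustrative:

```python
import numpy as np

arr = np.array([1., 2., 3.])

product = arr * arr        # array with array, element by element
reciprocal = 1 / arr       # scalar with array
roots = arr ** 0.5         # same as np.sqrt(arr)

print(product)
print(reciprocal)
print(roots)
```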
Array creation functions
• a=np.array([True], dtype=np.int64)
• print a
• print a.dtype
• a=np.array([True], dtype=np.bool_)
• print a
• print a.dtype
OUTPUT
[1]
int64
[ True]
bool
NumPy data types 1
NumPy data types 2
astype
• arr = np.array([3.7, -1.2, -2.6])
• print arr
• print arr.astype(np.int32)
OUTPUT
[ 3.7 -1.2 -2.6]
[ 3 -1 -2]
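As the output shows, converting floats to integers with astype drops the fractional part (truncation toward zero) rather than rounding, and it returns a new array instead of changing the original. A sketch in Python 3 syntax; combining np.round with astype is one possible way to round first, not something the slides prescribe:

```python
import numpy as np

arr = np.array([3.7, -1.2, -2.6])

truncated = arr.astype(np.int32)            # fractional part dropped
rounded = np.round(arr).astype(np.int32)    # round first, then convert

print(truncated)    # values 3, -1, -2
print(rounded)      # values 4, -1, -3
print(arr.dtype)    # float64: the original array is unchanged
```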
• arr = np.arange(10)
• print arr[5]
• print arr[5:8]
• arr[5:8] = 12
• print arr
OUTPUT
5
[5 6 7]
[ 0 1 2 3 4 12 12 12 8 9]
The original array has changed
• arr_slice = arr[5:8]
• arr_slice[1] = 12345
• print arr
• arr_slice[:] = 64
• print arr
[ 0 1 2 3 4 12 12345 12 8 9]
[ 0 1 2 3 4 64 64 64 8 9]
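The original array changes because a basic slice is a view onto the same memory, not a copy. When an independent array is needed, copy() can be called on the slice. A sketch in Python 3 syntax; the names view and snapshot are illustrative:

```python
import numpy as np

arr = np.arange(10)

view = arr[5:8]              # basic slice: shares memory with arr
view[:] = 64                 # writes through to arr

snapshot = arr[5:8].copy()   # explicit copy: independent storage
snapshot[:] = 0              # arr is not affected

print(arr)                               # elements 5 to 7 are 64
print(np.shares_memory(arr, view))       # True
print(np.shares_memory(arr, snapshot))   # False
```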
2d array
• arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
• print arr2d[2]
• print arr2d[0][2]
• print arr2d[0, 2]
• (the last two are equivalent)
OUTPUT
[7 8 9]
3
3
3d array
• arr3d = np.array([[[1, 2, 3], [4, 5, 6]],
•                   [[7, 8, 9], [10, 11, 12]]])
• print arr3d
• print arr3d[0]
• print arr3d[1]
OUTPUT
[[[ 1 2 3]
  [ 4 5 6]]
 [[ 7 8 9]
  [10 11 12]]]
[[1 2 3]
 [4 5 6]]
[[ 7 8 9]
 [10 11 12]]
Indexing with slices – 1D
• arr = np.arange(10)
• print arr
• print arr[1:6]
• Output
• [0 1 2 3 4 5 6 7 8 9]
• [1 2 3 4 5]
Indexing with slices – 2D
• arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
• print arr2d
• print arr2d[:2]
• print arr2d[:2, 1:]
OUTPUT
[[1 2 3]
 [4 5 6]
 [7 8 9]]
[[1 2 3]
 [4 5 6]]
[[2 3]
 [5 6]]
Indexing with slices – 2D
• arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
• print arr2d[1, :2]
• print arr2d[2, :1]
• print arr2d[:, :1]
OUTPUT
[4 5]
[7]
[[1]
 [4]
 [7]]
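The shapes of these results differ because an integer index removes its axis while a slice keeps it, even when the slice selects a single column. A sketch in Python 3 syntax with illustrative variable names:

```python
import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

row_part = arr2d[1, :2]    # integer row index: result is 1-D
col_kept = arr2d[:, :1]    # slice on both axes: result stays 2-D
col_flat = arr2d[:, 0]     # integer column index: result is 1-D

print(row_part.shape)      # (2,)
print(col_kept.shape)      # (3, 1)
print(col_flat.shape)      # (3,)
```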
Fancy indexing
• indexing using integer arrays
• arr = np.empty((8, 4))
• for i in range(8):
•     arr[i] = i
• print arr
• print arr[[4, 3, 0, 6]]
• print arr[[-3, -5, -7]]
• Negative indices select from the end
OUTPUT
[[ 0. 0. 0. 0.]
 [ 1. 1. 1. 1.]
 [ 2. 2. 2. 2.]
 [ 3. 3. 3. 3.]
 [ 4. 4. 4. 4.]
 [ 5. 5. 5. 5.]
 [ 6. 6. 6. 6.]
 [ 7. 7. 7. 7.]]
[[ 4. 4. 4. 4.]
 [ 3. 3. 3. 3.]
 [ 0. 0. 0. 0.]
 [ 6. 6. 6. 6.]]
[[ 5. 5. 5. 5.]
 [ 3. 3. 3. 3.]
 [ 1. 1. 1. 1.]]
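Unlike a basic slice, indexing with a list of integers returns a copy of the selected rows, in the order requested. A sketch in Python 3 syntax; the names picked and from_end are illustrative:

```python
import numpy as np

arr = np.empty((8, 4))
for i in range(8):
    arr[i] = i                  # row i is filled with the value i

picked = arr[[4, 3, 0, 6]]      # rows in the order requested
from_end = arr[[-3, -5, -7]]    # -3 is row 5, -5 is row 3, -7 is row 1

picked[0] = -1                  # modifies the copy only
print(arr[4])                   # row 4 of arr still holds 4.0
print(from_end[:, 0])           # first column holds 5.0, 3.0, 1.0
```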
Transposing arrays and swapping axes
• arr = np.arange(15).reshape((3, 5))
• print arr
• print arr.shape
• print arr.T
• print arr.T.shape
OUTPUT
[[ 0 1 2 3 4]
 [ 5 6 7 8 9]
 [10 11 12 13 14]]
(3L, 5L)
[[ 0 5 10]
 [ 1 6 11]
 [ 2 7 12]
 [ 3 8 13]
 [ 4 9 14]]
(5L, 3L)
Inner Product (dot operator)
• arr = np.arange(4).reshape((2, 2))
• print arr
• print arr.T
• print np.dot(arr.T, arr)
• print np.dot(arr, arr.T)
OUTPUT
[[0 1]
 [2 3]]
[[0 2]
 [1 3]]
[[ 4 6]
 [ 6 10]]
[[ 1 3]
 [ 3 13]]
Inner Product (dot operator)
• arr = np.arange(9).reshape((3, 3))
• print arr
• print np.dot(arr.T, arr)
OUTPUT
[[0 1 2]
 [3 4 5]
 [6 7 8]]
[[45 54 63]
 [54 66 78]
 [63 78 93]]
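Element-wise multiplication and the matrix product are easy to confuse, since both are often called "multiplying" arrays. For the 3x3 array above, arr * arr squares each entry, while np.dot(arr.T, arr) sums products over rows. A sketch in Python 3 syntax (illustrative names; the @ operator is the Python 3 shorthand for the matrix product):

```python
import numpy as np

arr = np.arange(9).reshape((3, 3))

elementwise = arr * arr        # each entry squared
gram = np.dot(arr.T, arr)      # matrix product of arr.T and arr
same = arr.T @ arr             # operator form of the same product

print(elementwise)
print(gram)
print(np.array_equal(gram, same))   # True
```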
Fast element-wise array functions
• arr = np.arange(5)
• print arr
• print np.sqrt(arr)
• print np.exp(arr)
[0 1 2 3 4]
[ 0. 1. 1.4142 1.7321 2. ]
[ 1. 2.7183 7.3891 20.0855 54.5982]
element-wise maximum
• x = np.random.randn(4)
• y = np.random.randn(4)
• print x
• print y
• print np.maximum(x, y)
• [-0.9691 -1.4411 1.2614 -0.9615]
• [-0.0398 -0.0692 -1.6854 -0.3902]
• [-0.0398 -0.0692 1.2614 -0.3902]
element-wise add
• x = np.random.randn(4)
• y = np.random.randn(4)
• print x
• print y
• print np.add(x, y)
• [ 0.0987 -1.2579 -1.4827 -1.4299]
• [-0.2855 -0.7548 -1.0134 0.7546]
• [-0.1868 -2.0127 -2.4961 -0.6753]
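Because the slides draw random numbers, the outputs cannot be reproduced exactly. The sketch below (Python 3 syntax) instead fixes x and y to the values printed in the element-wise maximum example, so the result can be checked by eye; the names larger and total are illustrative:

```python
import numpy as np

# Fixed values copied from the printed x and y above
x = np.array([-0.9691, -1.4411, 1.2614, -0.9615])
y = np.array([-0.0398, -0.0692, -1.6854, -0.3902])

larger = np.maximum(x, y)    # per position, the larger of the two
total = np.add(x, y)         # identical to x + y

print(larger)
print(np.array_equal(total, x + y))   # True
```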
Zip two lists together
• a = [1,2,3]
• b = [10, 20, 30]
• zipAB = zip(a,b)
• print zipAB
• OUTPUT
• [(1, 10), (2, 20), (3, 30)]
Zip three lists together
• a = [1,2,3]
• b = [10, 20, 30]
• c = [True, False, True]
• zipABC = zip(a,b,c)
• print zipABC
• Output
• [(1, 10, True), (2, 20, False), (3, 30, True)]
And is the same as
• a = [1,2,3]
• b = [10, 20, 30]
• c = [True, False, True]
• result = [(x,y,z)
• for x, y, z in zip(a,b,c)]
• print result
• Output
• [(1, 10, True), (2, 20, False), (3, 30, True)]
conditionals
• result = [(x if z else y)
• for x, y, z in
zip(a,b,c)]
• print result
• OUTPUT
• [1, 20, 3]
• NOTE: the boolean in c decides which list each value is
taken from: a if True, b if False.
where
• an easier way to do this with np
• a = [1,2,3]
• b = [10, 20, 30]
• c = [True, False, True]
• np.where(c,a,b)
• Output is [ 1 20 3]
types
• result = [(x if z else y)
• for x, y, z in
zip(a,b,c)]
• print type(result)
• result = np.where(c,a,b)
• print type(result)
<type 'list'>
<type 'numpy.ndarray'>
where(arr > 0, 2, -2)
• arr = np.random.randn(4, 4)
• arr
• print np.where(arr > 0, 2, -2)
[[ 2 2 -2 -2]
[-2 2 -2 2]
[-2 -2 -2 -2]
[ 2 -2 2 2]]
where(arr > 0, 2, arr)
• arr = np.random.randn(4, 4)
• arr
• print np.where(arr > 0, 2, arr)
[[ 2. 2. -0.9611 -0.3916]
[-1.0966 2. -1.9922 2. ]
[-0.2241 -0.9337 -0.8178 -1.1036]
[ 2. -1.096 2. 2. ]]
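Both uses of np.where can be checked on small fixed inputs. In this Python 3 sketch the 2x2 array is a made-up example standing in for the random 4x4 array on the slides:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([10, 20, 30])
c = np.array([True, False, True])

picked = np.where(c, a, b)             # a where c is True, else b

# Hypothetical small array in place of randn(4, 4)
arr = np.array([[0.5, -1.5], [-0.25, 2.0]])
signs = np.where(arr > 0, 2, -2)       # scalar for both branches
capped = np.where(arr > 0, 2, arr)     # keep the original negatives

print(picked)
print(signs)
print(capped)
```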
Mathematical and statistical methods
• arr = np.random.randn(5, 4)
• print arr.mean()
• print np.mean(arr)
• print arr.sum()
Axis
• An array has one axis per dimension.
• The axes are labelled 0, 1, 2, …
• In a 2d array, axis 0 runs down the rows and axis 1 runs
across the columns.
Mean of rows/columns (axis)
• arr = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
• print arr
• print arr.mean(axis=0)
• print arr.mean(axis=1)
OUTPUT
[[0 1 2]
 [3 4 5]
 [6 7 8]]
[ 3. 4. 5.]
[ 1. 4. 7.]
Sum different axis
• arr = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
• print arr
• print arr.sum(0)
• print arr.sum(1)
OUTPUT
[[0 1 2]
 [3 4 5]
 [6 7 8]]
[ 9 12 15]
[ 3 12 21]
Cumulative sum
• arr = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
• print arr
• print arr.cumsum(0)
• print arr.cumsum(1)
• The two calls accumulate along different axes:
0 runs down each column, 1 runs along each row.
OUTPUT
[[0 1 2]
 [3 4 5]
 [6 7 8]]
[[ 0 1 2]
 [ 3 5 7]
 [ 9 12 15]]
[[ 0 1 3]
 [ 3 7 12]
 [ 6 13 21]]
Cumulative product
• arr = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
• print arr
• print arr.cumprod(0)
• print arr.cumprod(1)
• The two calls accumulate along different axes:
0 runs down each column, 1 runs along each row.
OUTPUT
[[0 1 2]
 [3 4 5]
 [6 7 8]]
[[ 0 1 2]
 [ 0 4 10]
 [ 0 28 80]]
[[ 0 0 0]
 [ 3 12 60]
 [ 6 42 336]]
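The axis argument works the same way for plain reductions and for cumulative ones: axis=0 combines values down each column, axis=1 along each row. A sketch in Python 3 syntax that gathers the cases from the last few slides; the variable names are illustrative:

```python
import numpy as np

arr = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])

col_means = arr.mean(axis=0)       # one mean per column
row_sums = arr.sum(axis=1)         # one sum per row
down_sum = arr.cumsum(axis=0)      # running total down each column
along_prod = arr.cumprod(axis=1)   # running product along each row

print(col_means)
print(row_sums)
print(down_sum)
print(along_prod)
```

A reduction such as mean or sum removes the chosen axis, while cumsum and cumprod keep the original shape.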
Methods for Boolean arrays
arr = np.random.randn(10)
print (arr > 0).sum()
output
2
array_ex.txt
1,2,3,4
3,4,5,6
output
[[ 1. 2. 3. 4.]
 [ 3. 4. 5. 6.]]
<type 'numpy.ndarray'>
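The slide lists the contents of array_ex.txt and the printed result, but the call that reads the file did not survive. One way to obtain that result is np.loadtxt with a comma delimiter; this is an assumption, not necessarily the call used on the slide. The Python 3 sketch below uses an in-memory io.StringIO in place of the file so that it runs on its own, then counts elements with a Boolean array as in the example above:

```python
import io
import numpy as np

# Stand-in for array_ex.txt (same two lines as on the slide)
text = io.StringIO("1,2,3,4\n3,4,5,6\n")

# Assumption: the file was read with np.loadtxt
data = np.loadtxt(text, delimiter=",")

mask = data > 3          # Boolean array, True where the test holds
count = mask.sum()       # True counts as 1, so this counts matches

print(data)
print(type(data))
print(count)             # 4
```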
Indexing elements in a NumPy array
Two-dimensional array slicing
3d 2x2x2
a=np.array([
[
[3, 1],[4, 3]
],
[
[2, 4],[3, 3]
]
])
Indexing
• a[0]
• a[1]
• a[0][0]
• a[0][0][0]
• a[0][1]
• a[1][0]
Slicing
• a[0,0,0]
• a[0,1,0]
• a[:,0]
• a[:,:,0] (both slices, both rows, column 0)
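The selections made by these index expressions can be confirmed by evaluating them on the same 2x2x2 array. A sketch in Python 3 syntax; the variable names on the left are illustrative:

```python
import numpy as np

a = np.array([[[3, 1], [4, 3]],
              [[2, 4], [3, 3]]])

first_block = a[0]          # [[3, 1], [4, 3]]
second_row = a[0][1]        # [4, 3]
single = a[0, 1, 0]         # 4, same element as a[0][1][0]
first_rows = a[:, 0]        # row 0 of each block: [[3, 1], [2, 4]]
first_cols = a[:, :, 0]     # column 0 of every row: [[3, 4], [2, 3]]

print(first_block)
print(second_row, single)
print(first_rows)
print(first_cols)
```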
Slicing
• Remember, the slicing range a:b means the elements
from a up to b-1
Data Types
• Every element in an ndarray has the same type
• Basic types:
– int
– float
– complex
– bool
– object
– string
– unicode
Data Types
1. Types also have a defined size, given in bits in the type name, e.g.
   1. int32 (4 bytes)
   2. float64 (8 bytes)
2. The size determines the storage used and the range or precision available
3. To set the type:
a=np.array([1,2], dtype=np.int32)
print a.dtype
OUTPUT IS int32
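The storage size of a type can be inspected directly on an array. A sketch in Python 3 syntax; itemsize, nbytes and np.iinfo are standard NumPy features, and the variable names are illustrative:

```python
import numpy as np

a = np.array([1, 2], dtype=np.int32)
wide = a.astype(np.float64)

print(a.dtype, a.itemsize, a.nbytes)   # int32 4 8
print(wide.itemsize)                   # 8
print(np.iinfo(np.int32).max)          # 2147483647
```

itemsize is the number of bytes per element and nbytes is the total for the whole array, so a smaller dtype saves memory at the cost of range or precision.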
Iterating and Processing
• You can iterate through an ndarray if you like:
for e in a:
    print e
or
for e in a[0]:
    print e
etc., but this is complicated and not advised
• There is a better way ...
Element-wise Operations
• a=a*2
• a=a+5
• a=a+b
• etc.
• Functions:
• a.sum()
• a.max()
• a.mean()
• a.round()
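To see why whole-array operations are preferred over explicit loops, the same calculation can be written both ways. A sketch in Python 3 syntax on a small made-up array; the names looped and vectorised are illustrative:

```python
import numpy as np

a = np.array([[1.0, 2.0], [3.0, 4.0]])

# Explicit loops: one element at a time
looped = np.empty_like(a)
for i in range(a.shape[0]):
    for j in range(a.shape[1]):
        looped[i, j] = a[i, j] * 2 + 5

# Element-wise operations: the whole array in one expression
vectorised = a * 2 + 5

print(np.array_equal(looped, vectorised))   # True
print(vectorised.sum(), vectorised.max(), vectorised.mean())   # 40.0 13.0 10.0
```

The vectorised form is shorter and does not depend on the number of dimensions, whereas the loop needs one level of nesting per axis.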
Slices and Indexes
• a[0]=a[0]/2
• a[0,0,0]+=1
• a[:,1,1].sum()
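These three statements can be traced on the 2x2x2 example array. The sketch below (Python 3 syntax) stores the values as floats, which is an assumption made here so that the division by 2 keeps its fractional part when written back:

```python
import numpy as np

# Same values as the 2x2x2 example, stored as floats (assumption)
a = np.array([[[3., 1.], [4., 3.]],
              [[2., 4.], [3., 3.]]])

a[0] = a[0] / 2              # halve the whole first block in place
a[0, 0, 0] += 1              # update a single element
corner_sum = a[:, 1, 1].sum()  # element [1, 1] of each block, summed

print(a[0])
print(corner_sum)            # 4.5
```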