PMI - Modules and Data Structures

Download as pdf or txt
Download as pdf or txt
You are on page 1of 23

Data structures in Python and

modules

Gianfranco Lombardo, Ph.D


gianfranco.lombardo@unipr.it
Dictionary

● Data structure for key-value items


○ my_dict = { “Alice” : “Boss”,
○ “George”: “Project manager”,
○ “Mario” : “Developer” }
● Also known as “ associative arrays” or map
● Unordered set of key / value pairs
○ Keys are unique as indices in lists and can be used to access the
associated value
○ Keys are usually string, but can be also int or any immutable type

2020 - Gianfranco Lombardo, MEng (gianfranco.lombardo@unipr.it)


Dictionary

Py

>>> tel = {} #Initialize an empy dict called tel


>>> tel["mario"] = "052-87653" #add “Mario” key with his value
>>> print(tel)
{'mario': '052-87653'}

>>> my_dict = {"George":"051-43235"} #initialize my_dict with one value


>>> my_dict["Mario"] = "052-87653" #add an item to my_dict
>>> print(my_dict)
{'George': '051-43235', 'Mario': '052-87653'}

2020 - Gianfranco Lombardo, MEng (gianfranco.lombardo@unipr.it)


Iterate over a dictionary

Py
my_dict = {"George":"051-43235", “Alice”: “053-74123”,”Bob”:”051-23619”}
for key in my_dict:
#remember that dictionaries are unordered!
if key==”George”:
print(“George number is “+str(my_dict[key]))

my_dict[“George”] = “3409823654” #change value

my_dict.pop(“Alice”) #remove Alice from the dictionary

2020 - Gianfranco Lombardo, MEng (gianfranco.lombardo@unipr.it)


Matrix

● A matrix is an ordered tables of elements


○ Example: Chessboard, Algebra matrices, datasets
● We can see as multidimensional lists!
● Basically a main list with inside other lists with the same number of elements
a = [ ['A', 'B', 'C', 'D'],
['E', 'F', 'G', 'H'],
['I', 'L', 'M', 'N'] ]

● To access an element we need two indices:


○ row y
○ column x
○ matrix[y][x]

2020 - Gianfranco Lombardo, MEng (gianfranco.lombardo@unipr.it)


Sum of columns in a matrix

Py
matrix = [[2, 4, 3, 8],
[9, 3, 2, 7],
[5, 6, 9, 1]]
rows = len(matrix) # get number of rows
cols = len(matrix[0]) # get number of columns

for x in range(cols):
total = 0
for y in range(rows):
val = matrix[y][x]
total += val
print("Column "+str(x)+ " sums to “+str(total))

2020 - Gianfranco Lombardo, MEng (gianfranco.lombardo@unipr.it)


Flatten matrix as a list

Py
matrix = [2, 4, 3, 8,
9, 3, 2, 7,
5, 6, 9, 1]
rows = 3 # Cannot be guessed from matrix alone
cols = len(matrix) // rows

for x in range(cols):
total = 0
for y in range(rows):
val = matrix[y * cols + x] # 2D -> 1D
total += val
print("Col #", x, "sums to", total)

2020 - Gianfranco Lombardo, MEng (gianfranco.lombardo@unipr.it)


Python external modules

● Sometimes we need functionalities and operations that are not directly


provided by Python
● Or we want to reuse some code already written by another developer
● In those cases we need to import a module in our code!
● Python is the most used language for Data science especially for the richness
of different modules
● Examples:
○ Math : for mathematical operations
○ Numpy for matrix and vectors
○ Pandas for data mining
○ Matplotlib for data visualization
○ Scikit-learn for Machine learning
○ Keras and Tensor-flow for Deep learning

2020 - Gianfranco Lombardo, MEng (gianfranco.lombardo@unipr.it)


Math module

import math
y = math.sin(math.pi / 4)
print(y) # √2 / 2

from math import sin, pi


print(sin(pi / 4))

from random import randint


die1 = randint(1, 6) # like rolling a die
die2 = randint(1, 6) # like rolling a die

2020 - Gianfranco Lombardo, MEng (gianfranco.lombardo@unipr.it)


How to install a module

● Most of the modules have to be installed


● PIP is the tool already installed with Python that permits to install new
modules!
● example: pip install “name of modules”
○ pip install numpy

2020 - Gianfranco Lombardo, MEng (gianfranco.lombardo@unipr.it)


NumPy

● Efficient library that provides multidimensional array and algorithms


● It is the base of most of the machine learning and data science libraries
● Implemented in Fortran, C, C++
● To use after installation:

○ import numpy as np

● Define a vector ( or array ) in numpy:

○ a = np.array([2, 3, 4])

2020 - Gianfranco Lombardo, MEng (gianfranco.lombardo@unipr.it)


Zeros, ones, ranges

>>> np.zeros(4)
array([0, 0, 0, 0])

>>> np.ones((2, 3), dtype=np.int16) # dtype can also be specified


array([[[1, 1, 1],
[1, 1, 1]], dtype=int16)

>>> np.empty((2, 3)) # uninitialized, output may vary


array([[3.73603959e-262, 6.02658058e-154, 6.55490914e-260],
[5.30498948e-313, 3.14673309e-307, 1.00000000e+000]])

>>> np.arange(10, 30, 5) # from 10, to 30, step 5


array([10, 15, 20, 25])

>>> np.linspace(0, 2, 5) # 5 numbers from 0 to 2


array([0, 0.5, 1, 1.5, 2])

2020 - Gianfranco Lombardo, MEng (gianfranco.lombardo@unipr.it)


Reshape and resize

>>> a = np.arange(12) # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]


>>> a = a.reshape(3, 4) # a new array
>>> a
array([[0, 1, 2, 3],
[4, 5, 6, 7],
[8, 9, 10, 11]])
>>> a.shape # Shape gets number of elements along each axes
(3, 4)

>>> a.resize(2, 6) # modifies the array itself


>>> a
array([[0, 1, 2, 3, 4, 5],
[6, 7, 8, 9, 10, 11]])

2020 - Gianfranco Lombardo, MEng (gianfranco.lombardo@unipr.it)


Shuffle and choice

>>> a = np.arange( 6)
>>> np.random.shuffle(a) # modifies the array itself
array([5, 3, 2, 4, 1, 0]])

>>> np.random.choice([ "A","B"]) #Choose randomly an element in a list


>>> “B”

>>>np.random.choice([ "one", "two"], (2, 3))


>>> array([[ 'two', 'one', 'one'],
['two', 'two', 'one']])

#Generate a matrix by choosing randomly elements in the list

2020 - Gianfranco Lombardo, MEng (gianfranco.lombardo@unipr.it)


Elementwise operations

● A new array holds the result

>>> a = np.array([20, 30, 40, 50])


>>> conditions= a < 35
array([True, True, False, False])

>>> b = np.arange(4)
>>> b_square = b ** 2
array([0, 1, 4, 9])
>>> c = a - b
array([20, 29, 38, 47])

2020 - Gianfranco Lombardo, MEng (gianfranco.lombardo@unipr.it)


Aggregate functions

>>> b = np.arange(12).reshape(3, 4)
>>> b
array([[0, 1, 2, 3],
[4, 5, 6, 7],
[8, 9, 10, 11]])
>>> b.sum() # guess also min and max
66
>>> b.sum(axis=1) # sum of each row
array([6, 22, 38])
>>> b / b.max(axis=0) # norm each column
array([[0., 0.11, 0.2, 0.27],
[0.5, 0.55, 0.6, 0.63],
[1., 1., 1., 1. ]])

2020 - Gianfranco Lombardo, MEng (gianfranco.lombardo@unipr.it)


Indexing and slicing

b = np.array([[0, 1, 2, 3],
[10, 11, 12, 13],
[20, 21, 22, 23]])
>>> b[2, 3]
23
>>> b[:, 1] # each row in the second column of b
array([1, 11, 21])
>>> b[1:3, 0:2]
array([[10, 11],
[20, 21]])

2020 - Gianfranco Lombardo, MEng (gianfranco.lombardo@unipr.it)


CSV files

● CSV files (Comma separated values) is one of the most common file format in
data science
○ It requires that the first line is an header where each field is separated
with a comma
○ The other lines are values separated by a comma
○ Basically, it represents always a table
○ Available also in Microsoft Excel

● It is possible read a CSV file with operators that with the Python open() by
splitting each line for commas or other ad-hoc modules are available like csv
or Pandas ( later in this lesson…)

2020 - Gianfranco Lombardo, MEng (gianfranco.lombardo@unipr.it)


CSV EXAMPLE

Name,Age,Job,City #Header
George,34,Waiter,Chicago
Alice,27,Developer,New York
Mario,57,Plumber, Rome
Lauren,42,Teacher,Detroit
Robert,29,Engineer,London

2020 - Gianfranco Lombardo, MEng (gianfranco.lombardo@unipr.it)


CSV module

import csv ###### without module


matrix = [] matrix = []
with open('people.csv', with open('people.csv', newline='')
newline='') as f: as f:
reader = csv.reader(f) reader = f.readlines()
for row in reader: for line in reader:
matrix.append(row) row =
print(matrix) line.replace("\r\n","").split(",")
matrix.append(row)
with open('people_2.csv', 'w', print(matrix)
newline='') as f:
writer = csv.writer(f)
for row in matrix: ………
writer.writerow(row)

2020 - Gianfranco Lombardo, MEng (gianfranco.lombardo@unipr.it)


Exercise:

● Ask to the user to add 10 people information:


○ Name
○ ID
○ Age
● Save these information in a dictionary where the key is always the ID and the
value is a list [name,age]
● Then ask again to add more information for these people by asking the ID and
add to each one:
○ Salary
○ Years of work
● Finally, export the entire dictionary into registry.csv file
○ header: Name,ID,Age,Salary,Years of work

2020 - Gianfranco Lombardo, MEng (gianfranco.lombardo@unipr.it)


Exercise

● Read the previous registry.csv


○ Skip the first row and save as a separate list called headers
○ Memorize the other rows in a matrix
○ Then ask to the user the name of a field:
■ Retrieve which column has to be considered from headers
■ Compute min and max for that column

2020 - Gianfranco Lombardo, MEng (gianfranco.lombardo@unipr.it)


Chessboard

● Generate a zero vector with dimension 64 as a pseudo matrix


○ Zero means black
○ One means white
● Iterate over the matrix with two “for” and put “1” if the column is pair ( c%2 ==
0)
● Transform the list as numpy array and then reshape as a 8x8 matrix
● Print the matrix

2020 - Gianfranco Lombardo, MEng (gianfranco.lombardo@unipr.it)

You might also like

pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy