
INTRO TO PYTHON - DATACAMP

The document provides an extensive overview of Python programming, covering topics such as variables, lists, dictionaries, and data manipulation with libraries like NumPy and Pandas. It includes practical examples for data visualization using Matplotlib, as well as control structures like loops and conditionals. Additionally, it discusses random number generation and statistical analysis, emphasizing the importance of data structures and their applications in Python.

Uploaded by

adilrchanna
Copyright © All Rights Reserved

A. DATACAMP: INTRODUCTION TO PYTHON

-------------------------[17/2/24]--------------------------

2. VARIABLES & TYPES:-


To check the type of a variable, use the type() function. E.g.,
type(variableName) > gives int, float, str or bool
- bool is for Boolean values (True or False)

= using + with strings joins (concatenates) them: 'ab' + 'cd' gives 'abcd'
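A minimal sketch of type() and string concatenation (the variable names and values are made up for illustration):

```python
# Check the type of a few values with the built-in type() function
height = 1.79
count = 3
name = "ab"
is_tall = True

print(type(height))   # <class 'float'>
print(type(count))    # <class 'int'>
print(type(name))     # <class 'str'>
print(type(is_tall))  # <class 'bool'>

# + joins (concatenates) strings instead of adding numbers
joined = "ab" + "cd"
print(joined)         # abcd
```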

3. PYTHON LISTS:-
Lists give name and convenience to a collection of values (elements)
> listName = [a, b, c]

4. SUBSETTING LISTS:-
Index = indexes start from 0 for elements in a list
> listName[0], for example, accesses the first element of a list
> negative indexes access the list from last to first
> listName[-1] and listName[6] are the same for a 7-element list

To select a range of items using indexes, use colons (slicing)
> listName[3:5] returns only 2 elements, at indexes 3 and 4
> listName[start:end] returns the elements from start up to end-1
> start is inclusive, end is exclusive
> listName[:5] starts from index 0; listName[5:] goes until the last element
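A small sketch of indexing and slicing on a hypothetical 7-element list, showing that index -1 and index 6 point at the same element:

```python
letters = ["a", "b", "c", "d", "e", "f", "g"]  # 7 elements, indexes 0 to 6

first = letters[0]     # 'a'
last = letters[-1]     # 'g', same as letters[6]
middle = letters[3:5]  # ['d', 'e']: index 3 included, index 5 excluded
head = letters[:5]     # ['a', 'b', 'c', 'd', 'e']
tail = letters[5:]     # ['f', 'g']
print(first, last, middle, head, tail)
```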

5. MANIPULATING LISTS:-
> listName[0] = "AA" # replaces the previous first element with "AA"
> listName[0:2] = ["Random", 111] # changes a range of elements
> Adding lists with + joins (concatenates) them, like strings

To delete an element from a list,
> del(listName[2]) # deletes the 3rd element of the list
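A sketch of the list operations above on a made-up family list (`fam` and its values are illustrative only):

```python
fam = ["liz", 1.73, "emma", 1.68, "mom", 1.71]

fam[0] = "AA"               # replace the first element
fam[0:2] = ["Random", 111]  # replace the elements at indexes 0 and 1
fam = fam + ["me", 1.79]    # + concatenates two lists
del fam[2]                  # delete the 3rd element ("emma"); del(fam[2]) is equivalent
print(fam)  # ['Random', 111, 1.68, 'mom', 1.71, 'me', 1.79]
```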

Behind the scenes of how lists work:
> x = ["a", "b", "c"]
here, x does not contain the list elements, it contains a reference to the list
Thus,
y = x
y[1] = "z" # This changes both x and y to ['a', 'z', 'c']
!!! x and y both refer to the same list stored at one address in memory,
so a change made through y is also seen through x

To copy a list without having it changed, use:
- y = list(x) #LIST FUNCTION
- y = x[:] #SLICING
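A short sketch contrasting a shared reference with a real copy:

```python
x = ["a", "b", "c"]
y = x            # y refers to the SAME list as x
y[1] = "z"
print(x)         # ['a', 'z', 'c']: the change is visible through x too

x = ["a", "b", "c"]
y = list(x)      # a new list with the same elements (x[:] works the same way)
y[1] = "z"
print(x)         # ['a', 'b', 'c']: x is unaffected
print(y)         # ['a', 'z', 'c']
```

Note that list(x) and x[:] make a shallow copy: if the list contains other lists, those inner lists are still shared.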

>>> INFO: To place several commands in a single line, use semicolons.
E.g. command1; command2; command3

!!!!!!!!!! LOST NOTES FOR 18/2/24 FROM TOPIC 4 TO 8

-------------------------[18/2/24]--------------------------
9. PACKAGES:-
> To add a package to a script, use: import PACKAGE_NAME
> E.g. numpy.array([1, 2, 3]) for array creation in NumPy

!! To avoid typing numpy.function every time > import numpy as np (now use np.function)

> To import a particular function/method from a package, use:
> from numpy import array
> Now you can use array without the numpy. part
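A quick sketch of the three import styles side by side; all three produce the same array:

```python
import numpy              # use as numpy.array(...)
import numpy as np        # alias: use as np.array(...)
from numpy import array   # import one function: use as array(...)

a = numpy.array([1, 2, 3])
b = np.array([1, 2, 3])
c = array([1, 2, 3])
print(a, b, c)            # [1 2 3] [1 2 3] [1 2 3]
```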

10. NUMPY [Numeric Python]:-
>>> Using numpy.array(list) turns that list into a NumPy array, allowing
you to do calculations over entire arrays at once
!! HOWEVER, NumPy arrays can contain values of a single type only

> NumPy arrays are a different type of object from lists, with their own methods
> Adding NumPy arrays with + does not join them like "string addition" for lists
> It adds the corresponding values (element-wise sum) instead

> EXAMPLE:
arrayVar = np.array(list)
arrayVar > 23 is a boolean condition that checks whether each element is
greater than 23 (returns an array of True/False values)

! BUT using arrayVar[arrayVar > 23] gives only the elements in the array
that are greater than 23

>> Type coercion - when elements of a NumPy array are converted into a single
type because array element types cannot be different
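A sketch of element-wise arithmetic, Boolean filtering and type coercion. The height/weight values are illustrative sample data, not from these notes:

```python
import numpy as np

height = np.array([1.73, 1.68, 1.71, 1.89, 1.79])
weight = np.array([65.4, 59.2, 63.6, 88.4, 68.7])

# Arithmetic is element-wise
bmi = weight / height ** 2

# + adds corresponding elements (lists would be concatenated instead)
print(np.array([1, 2, 3]) + np.array([4, 5, 6]))  # [5 7 9]
print([1, 2, 3] + [4, 5, 6])                      # [1, 2, 3, 4, 5, 6]

# A comparison gives a Boolean array; using it as an index keeps only the True positions
mask = bmi > 23
print(mask)
print(bmi[mask])

# Type coercion: mixing types converts every element to one type (here, strings)
mixed = np.array([1.0, "is", True])
print(mixed.dtype)   # a Unicode string dtype (e.g. <U32)
```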

11. 2D NUMPY ARRAYS:-
> numpyArrayName.shape (an attribute, not a method) gives the number of rows
and columns of an array as a tuple, e.g. (2, 5) for a 2D array with 2 rows and 5 columns

> To access elements in a 2D array, use arrayName[row][column]
OR you can use arrayName[row, column]
> To access several elements, use arrayName[rowStart:rowEnd, colStart:colEnd]
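A sketch of .shape and 2D indexing on a small example array (values are illustrative):

```python
import numpy as np

np_2d = np.array([[1.73, 1.68, 1.71, 1.89, 1.79],
                  [65.4, 59.2, 63.6, 88.4, 68.7]])

print(np_2d.shape)    # (2, 5): 2 rows, 5 columns
print(np_2d[0][2])    # 1.71: row 0, column 2
print(np_2d[0, 2])    # 1.71: same element, single pair of brackets
print(np_2d[:, 1:3])  # all rows, columns 1 and 2
print(np_2d[1, :])    # the whole second row
```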

12. BASIC STATISTICS
>> You can use NumPy to get statistics of your data

> For example, to find the average value of a 1D array, use:
numpy.mean(arrayName) - for a 2D array, add axis=0 (per column) or axis=1 (per row) if needed
> Use numpy.median(arrayName) for the median value
> numpy.corrcoef(x, y) gives the correlation coefficients, showing whether two variables are correlated
> numpy.std(name) gives the standard deviation
> numpy.sum(name) and numpy.sort(name) also exist
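A sketch of these statistics functions on sample height/weight data (the numbers are illustrative):

```python
import numpy as np

height = np.array([1.73, 1.68, 1.71, 1.89, 1.79])
weight = np.array([65.4, 59.2, 63.6, 88.4, 68.7])
data = np.array([height, weight])     # 2 rows, 5 columns

print(np.mean(height))                # average
print(np.median(height))              # middle value after sorting: 1.73
print(np.std(height))                 # standard deviation
print(np.mean(data, axis=1))          # one mean per row
print(np.corrcoef(height, weight))    # 2x2 matrix; the off-diagonal entry is the correlation
print(np.sum(height), np.sort(height))
```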

B. DATACAMP: INTERMEDIATE PYTHON


-------------------------[19/2/24]--------------------------

1. BASIC PLOTS WITH MATPLOTLIB:-


> Learning data visualization and storing data in new structures.
> And control structures, used to customize the flow of the script and algorithm.

- DATA VISUALIZATION:
> helps explore data better and extract insights

> MATPLOTLIB is a visualization package
> import matplotlib.pyplot as plt
> plt.plot(x, y) is used to plot data points (x-axis values, y-axis values) joined by a line
> To display the plot, use plt.show()

> plt.scatter(x, y) - plot points aren't connected by a line [instead of .plot]
> plt.xscale('log') - uses a logarithmic scale for the x-axis

2. HISTOGRAM:-
Histograms are useful for dataset exploration and getting an idea about the
distribution
> the equal-width intervals that the range of values is divided into are called bins
> the number of data points in each bin gives the height of its bar in a histogram

> Use plt.hist(x, bins=None, range=None, density=False, weights=None,
cumulative=False, bottom=None, histtype='bar', align='mid', orientation='vertical',
rwidth=None, log=False, color=None, label=None, stacked=False, *, data=None)
>>> x is the list of values
>>> bins is how many equal divisions (10 by default)

> plt.clf() - clears the current figure so you can start again
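To see what the bins actually hold without drawing anything, np.histogram returns the bin counts (the bar heights) and the bin edges. This swaps in NumPy's function as an illustration; plt.hist draws bars from this kind of count. The values are made up:

```python
import numpy as np

values = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 9.0]

# Split the range of values (1 to 9) into 4 equal-width bins and count the values in each
counts, edges = np.histogram(values, bins=4)
print(edges)   # [1. 3. 5. 7. 9.]: bin boundaries
print(counts)  # [4 3 0 1]: bar heights a histogram would draw
```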

3. CUSTOMIZATION:-
Data visualization has,
- many types of plots
- different colors, shapes, labels, axes, etc.

To label axes,
> plt.xlabel('x-axis label')
> plt.ylabel('y-axis label')
> For title, plt.title('Title of Graph')
> plt.grid(True) makes gridlines visible

Ticks function,
> plt.yticks([0,2,4,6,8,10]) - places the y-axis ticks at 0, 2, 4, 6, 8 and 10
To label the ticks, pass a second list with the tick names. E.g.
> plt.yticks([0,2,4], ['0', '2B', '4B']) - names the ticks 0, 2B and 4B
> plt.xticks() works the same way for the x-axis

-------------------------[20/2/24]--------------------------

4. DICTIONARIES PART-I:-
To create a dictionary in Python,
>> world = {"afghanistan":30.55, "albania":2.77, ...}
>> world["albania"] will give you its population directly
>> keys are the information used to access other info (left side of :)
>> values are the information accessed (right side of :)
>> you can use dictionaryName.keys() to check the available keys
5. DICTIONARIES PART-II:-
>> if keys are repeated in a dictionary with different values, it keeps the last
value
>> keys have to be immutable (non-changeable) objects (str, bool, int, float, etc.)
>> using mutable objects (e.g. lists) as keys gives an error

>> To add more keys to dictionaries, you can use:
dictionaryName["key"] = value
>> To check whether a key exists, use: "key" in dictionaryName (gives True or False)

>> To delete a key, use: del(dictionaryName["key"])

>> Lists should be used when order matters and you want to select subsets easily
>> Dictionaries are for fast lookup of data using specific keys
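A sketch of the dictionary operations above (the population figures and the "sealand" key are illustrative):

```python
world = {"afghanistan": 30.55, "albania": 2.77, "algeria": 39.21}

print(world["albania"])       # 2.77: look up a value by its key
print(world.keys())           # dict_keys(['afghanistan', 'albania', 'algeria'])

world["sealand"] = 0.000027   # add a new key-value pair
print("sealand" in world)     # True
del world["sealand"]          # delete a key (del(world["sealand"]) also works)
print("sealand" in world)     # False

# Repeated keys: the last value wins
dup = {"a": 1, "a": 2}
print(dup)                    # {'a': 2}

# Keys must be immutable; a list as a key raises TypeError
try:
    bad = {["just", "to", "test"]: "value"}
except TypeError as err:
    print("TypeError:", err)  # prints the unhashable-type message
```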

>> Dictionaries can contain subdictionaries (just like lists can contain lists). E.g.
europe = {'spain': {'capital':'madrid', 'population':46.77}, ...}
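A sketch of a nested dictionary; the France and Italy entries are illustrative additions:

```python
europe = {
    "spain": {"capital": "madrid", "population": 46.77},
    "france": {"capital": "paris", "population": 66.03},
}

print(europe["france"]["capital"])   # paris: chain the keys

# Add a new sub-dictionary
europe["italy"] = {"capital": "rome", "population": 59.83}
print(europe["italy"]["population"]) # 59.83
```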

6. PANDAS, PART-I:-
- We don't use NumPy arrays for collections of different data types, because an
array can hold only 1 type
- pandas is a high-level data manipulation tool built on the NumPy package

> import pandas as pd

> tabular data is stored in an object called DataFrame
> rows are observations, columns are variables
> you can create a DataFrame from a dictionary,
> keys are the column labels and the values are the corresponding columns in list form
> frameName = pd.DataFrame(dictionaryName)
> frameName.index = ["label 0", "label 1", "label 2", ...]

- Usually DataFrames are not created by hand; they are imported from an external file
> for example, CSV (comma-separated values) files
- use pd.read_csv("path/to/csvfile.csv") [this gives row labels 0, 1, etc. by default,
so if the CSV file already has row labels in its first column, use:
- pd.read_csv("filepath.csv", index_col = 0)]
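A sketch of both ways to get a DataFrame. To keep it self-contained, a small CSV held in memory (io.StringIO) stands in for a real file path; the BRICS data is illustrative:

```python
import io
import pandas as pd

dict_brics = {
    "country": ["Brazil", "Russia", "India"],
    "capital": ["Brasilia", "Moscow", "New Delhi"],
    "area": [8.516, 17.10, 3.286],
}
brics = pd.DataFrame(dict_brics)   # keys become the column labels
brics.index = ["BR", "RU", "IN"]   # custom row labels

# In-memory CSV whose first column already holds the row labels
csv_text = """,country,capital,area
BR,Brazil,Brasilia,8.516
RU,Russia,Moscow,17.10
IN,India,New Delhi,3.286
"""
from_csv = pd.read_csv(io.StringIO(csv_text), index_col=0)  # use column 0 as row labels
print(from_csv)
```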

-------------------------[21/2/24]--------------------------

7. PANDAS, PART-II:-
Several ways to index and select data from dataframes,
= square brackets
= advanced data access methods
- loc (label-based)
- iloc (integer position-based)

> Square Brackets:
- To access a particular column, use: frameName["column label"] (prints row labels
as well)
- if we check the type of the selected column, it says pandas Series
- a Series is like a 1D array that can be labelled

- to keep the type as a DataFrame, use double square brackets: frameName[["column label"]]
- add a comma then more column labels to select more than one column

- To access particular rows, use slicing: frameName[startRow:endRow] (endRow is excluded)
- HOWEVER, square brackets offer limited functionality
- ideally we want 2D selection like with NumPy arrays (arrayName[rows, columns]),
which loc and iloc provide
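A sketch of square-bracket selection on a small illustrative DataFrame:

```python
import pandas as pd

brics = pd.DataFrame(
    {"country": ["Brazil", "Russia", "India", "China"],
     "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing"],
     "area": [8.516, 17.10, 3.286, 9.597]},
    index=["BR", "RU", "IN", "CH"],
)

col_series = brics["country"]             # single brackets: a Series
col_frame = brics[["country"]]            # double brackets: a DataFrame
two_cols = brics[["country", "capital"]]  # several columns
rows = brics[1:3]                         # rows at positions 1 and 2 (RU, IN)

print(type(col_series).__name__, type(col_frame).__name__)  # Series DataFrame
print(rows)
```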

> Loc & Iloc:
- loc is used to select part of the data using labels
- to get a particular row, use: frameName.loc["row-label"]
- however, this gives a Series, which is displayed inconveniently (as one long column)
- thus we use double brackets to get a DataFrame: frameName.loc[["row-label"]]
- for more than one row selection: frameName.loc[["row1", "row2"]]

- you can do a mix of row & column selection with loc,
- frameName.loc[["row1", "row2"], ["col2", "col4"]]
- to select all rows: frameName.loc[:, ["col2", "col3"]]

- iloc is based on positions, subsetting frames by their integer location
- in iloc, for row access: frameName.iloc[[1]] for the second row
- frameName.iloc[[0,1,4]]

- for column access, use: frameName.iloc[[row-numbers], [0,1]]
- for all rows, use: frameName.iloc[:, [0,2]]
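A sketch comparing loc (labels) and iloc (positions) on the same illustrative DataFrame; the two selections in the middle pick out the same cells:

```python
import pandas as pd

brics = pd.DataFrame(
    {"country": ["Brazil", "Russia", "India", "China"],
     "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing"],
     "area": [8.516, 17.10, 3.286, 9.597]},
    index=["BR", "RU", "IN", "CH"],
)

print(brics.loc["RU"])                        # a Series
print(brics.loc[["RU"]])                      # a DataFrame with one row
by_label = brics.loc[["RU", "IN"], ["country", "capital"]]
by_position = brics.iloc[[1, 2], [0, 1]]      # same rows and columns, by position
print(brics.loc[:, ["country", "capital"]])   # all rows, two columns
print(brics.iloc[:, [0, 2]])                  # all rows, columns 0 and 2
```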

-------------------------[22/2/24]--------------------------

8. COMPARISON OPERATORS:-
Comparison operators tell how two values relate and result in a Boolean

> use == for equality and != for inequality; <, >, <= and >= for less/greater than (or equal to)
> comparing strings checks them in alphabetical order
> Python cannot compare different variable types with <, > etc. (except for floats and integers)
> you can also compare int/float values with NumPy arrays (element by element)
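A sketch of the comparison rules above (the bmi values are illustrative):

```python
import numpy as np

print(2 < 3)              # True
print(2 == 2.0)           # True: int and float can be compared
print("carl" < "chris")   # True: strings compare alphabetically
print(3 != 3)             # False

bmi = np.array([21.852, 20.975, 21.750, 24.747, 21.441])
print(bmi > 23)           # [False False False  True False]

try:
    3 < "chris"           # ordering an int against a str is not allowed
except TypeError as err:
    print("TypeError:", err)
```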

9. BOOLEAN OPERATORS:-
- and - or - not

> For NumPy arrays, the functions for boolean operators are:
- logical_and(): E.g. np.logical_and(arrayName > 21, arrayName < 22)
- logical_or()
- logical_not()
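A sketch of plain Boolean operators versus NumPy's element-wise versions (the bmi values are illustrative):

```python
import numpy as np

bmi = np.array([21.852, 20.975, 21.750, 24.747, 21.441])

# and / or / not work on single True/False values
print(True and False, True or False, not True)   # False True False

# For arrays, use NumPy's element-wise versions
between = np.logical_and(bmi > 21, bmi < 22)
print(between)           # [ True False  True False  True]
print(bmi[between])      # only the values between 21 and 22
print(np.logical_or(bmi < 21, bmi > 24))
print(np.logical_not(bmi > 23))
```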

10. IF, ELIF, ELSE:-

if condition:
    expression
elif condition:
    expression
else:          # else takes no condition
    expression

> Python leaves the control structure after executing the first True branch;
it does not execute later branches even if their conditions are also True
> Python can compare floats with integers and vice versa in conditions
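A sketch showing that only the first True branch runs (the function name `describe` is hypothetical):

```python
def describe(z):
    """Classify an integer; only the first True branch runs."""
    if z % 2 == 0:
        return "z is even"
    elif z % 3 == 0:
        return "z is divisible by 3"
    else:
        return "z is neither divisible by 2 nor by 3"

print(describe(6))  # z is even (the elif is never reached, although 6 % 3 == 0)
print(describe(9))  # z is divisible by 3
print(describe(7))  # z is neither divisible by 2 nor by 3
```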

11. FILTERING PANDAS DATAFRAMES:-
To use comparison operators to filter pandas DataFrames, there are three steps

i. Get column/row:
- Get a pandas Series; you can use square brackets, loc or iloc

ii. Compare:
- use frameName["col/row"] > value
- you get a Boolean Series, save it to a variable

iii. Subset the DataFrame:
- use the variable to subset the DataFrame, like:
frameName[boolVariable]

> Instead of this all, you can just do:
frameName[frameName["row/col"] > value]

> For boolean operators, use the NumPy logic functions:
frameName[np.logical_and(frameName["row/col"] > value, frameName["row/col"] < 10)]
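A sketch of the three filtering steps and the one-line shortcut, on an illustrative DataFrame (area in million km²):

```python
import numpy as np
import pandas as pd

brics = pd.DataFrame(
    {"country": ["Brazil", "Russia", "India", "China"],
     "area": [8.516, 17.10, 3.286, 9.597]},
    index=["BR", "RU", "IN", "CH"],
)

# Step by step
area_col = brics["area"]      # i. get the column as a Series
is_huge = area_col > 8        # ii. compare: Boolean Series
huge = brics[is_huge]         # iii. subset the DataFrame

# In one line
huge_again = brics[brics["area"] > 8]

# Combine conditions with NumPy's logical functions
medium = brics[np.logical_and(brics["area"] > 8, brics["area"] < 10)]
print(medium)
```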

-------------------------[17/2/24]--------------------------

12. WHILE LOOP:-
while condition:
    expression
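A sketch of a while loop that keeps running until its condition becomes False (the starting value and the divide-by-4 rule are made up for illustration):

```python
# Keep dividing an error value by 4 until it drops to 1 or below
error = 50.0
steps = 0
while error > 1:
    error = error / 4
    steps += 1
print(error, steps)   # 0.78125 3
```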

13. FOR LOOP:-
for var in seq:
    expression

> for each variable in sequence, execute an expression, e.g.:
for height in fam:
    print(height)
- height is assigned to each element in a list (for example), then printed

> for index, height in enumerate(fam):
    print("index " + str(index) + ": " + str(height))
- here enumerate gives two variables: the index and the value of each element

> another example, a for loop over a string:
for c in "family":
    print(c.capitalize())
- prints F A M I L Y, each on a new line

> for a list of lists (e.g. house = [["hallway", 11.25], ...]):
for x in house:
    print("the " + x[0] + " is " + str(x[1]) + " sqm")
- here x[0] is already a string, while x[1] is a float and needs str()
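A runnable version of the loops above; `fam` and `house` hold illustrative values:

```python
fam = [1.73, 1.68, 1.71, 1.89]

for index, height in enumerate(fam):
    print("index " + str(index) + ": " + str(height))

for c in "family":
    print(c.capitalize())

house = [["hallway", 11.25], ["kitchen", 18.0], ["living room", 20.0]]
lines = []
for x in house:
    line = "the " + x[0] + " is " + str(x[1]) + " sqm"  # x[1] is a float, so convert it
    lines.append(line)
    print(line)
```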

14. LOOP DATA STRUCTURES PART-I:-
> To use a for loop on a dictionary:
for key, value in dictName.items():
    print(key + ": " + str(value))
- here, each pair is printed as key: value (.items() is needed to get both)

> for 2D NumPy arrays, use np.nditer() to iterate over every element:
meas = np.array([np_height, np_weight])
for val in np.nditer(meas):
    print(val)
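A sketch of looping over a dictionary and over a 2D array, with and without np.nditer (values are illustrative):

```python
import numpy as np

world = {"afghanistan": 30.55, "albania": 2.77, "algeria": 39.21}
for key, value in world.items():
    print(key + ": " + str(value))

np_height = np.array([1.73, 1.68, 1.71])
np_weight = np.array([65.4, 59.2, 63.6])
meas = np.array([np_height, np_weight])  # 2D: 2 rows, 3 columns

for row in meas:             # a plain loop over a 2D array gives whole rows
    print(row)

flat = []
for val in np.nditer(meas):  # np.nditer visits every single element
    flat.append(float(val))
print(flat)  # [1.73, 1.68, 1.71, 65.4, 59.2, 63.6]
```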

14. LOOP DATA STRUCTURES PART-II (PANDAS DATAFRAMES):-
> Looping directly over a DataFrame (for x in brics) only gives the column labels
> To loop over rows, use .iterrows()
for lab, row in brics.iterrows():
    print(lab); print(row)

> For selective print:
for lab, row in brics.iterrows():
    print(lab + " : " + row["capital"])
- only prints the info in the capital column, with row labels

> You can also add a column to the pandas DataFrame:
for lab, row in brics.iterrows():
    brics.loc[lab, "name_length"] = len(row["country"])  # creates a Series on every iteration
- not ideal, because it creates a Series on every iteration

> You can also do this using the apply() function, with no loop:
brics["name_length"] = brics["country"].apply(len)

> apply() also works with methods:
cars["COUNTRY"] = cars["country"].apply(str.upper)
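A sketch comparing the iterrows() loop with apply() on an illustrative DataFrame. The loop version stores floats because the new column starts out filled with NaN:

```python
import pandas as pd

brics = pd.DataFrame(
    {"country": ["Brazil", "Russia", "India"],
     "capital": ["Brasilia", "Moscow", "New Delhi"]},
    index=["BR", "RU", "IN"],
)

for lab, row in brics.iterrows():
    print(lab + ": " + row["capital"])

# Loop version: works, but builds a Series for every row
for lab, row in brics.iterrows():
    brics.loc[lab, "name_length"] = len(row["country"])

# apply() version: no explicit loop
brics["name_length_apply"] = brics["country"].apply(len)
brics["COUNTRY"] = brics["country"].apply(str.upper)
print(brics)
```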

-------------------------[25/2/24]--------------------------

15. RANDOM NUMBERS:-
> Hacker statistics: simulating a process many times to estimate the chance of
something happening

> To generate random numbers, use the rand() function from NumPy's random package:
np.random.rand() - gives a random number between 0 and 1

> You can also set the starting seed (number) for the above using:
np.random.seed(value) - the same random numbers are generated for the same seed

> That is why it is called "pseudo-random": it looks random but is consistent b/w runs

> To simulate a coin toss:
import numpy as np
np.random.seed(123)
coin = np.random.randint(0,2)

> here, np.random.randint(low, high) gives a random integer from low up to high-1
(high is exclusive), so randint(0,2) gives 0 or 1
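A sketch showing that re-seeding reproduces the same numbers, plus a coin toss. The heads/tails mapping and the die roll are illustrative choices:

```python
import numpy as np

np.random.seed(123)
first_run = [np.random.rand() for _ in range(3)]

np.random.seed(123)               # same seed again...
second_run = [np.random.rand() for _ in range(3)]
print(first_run == second_run)    # True: ...so the same "random" numbers

np.random.seed(123)
coin = np.random.randint(0, 2)    # 0 or 1 (the upper bound 2 is excluded)
if coin == 0:
    print("heads")
else:
    print("tails")

die = np.random.randint(1, 7)     # a die roll: 1 to 6
```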

16. RANDOM WALK:-
A random walk is a series of random steps. To make one, you can build a list
using a for loop.
E.g.:
np.random.seed(123)
outcomes = []
for x in range(10):
    coin = np.random.randint(0, 2)
    if coin == 0:
        outcomes.append("heads")
    else:
        outcomes.append("tails")

> The above program produces a list of random outcomes, but it is not a random walk
> in a random walk, each step depends on the previous output. E.g.
np.random.seed(123)
tails = [0]
for x in range(10):
    coin = np.random.randint(0, 2)
    tails.append(tails[x] + coin)

> To ensure the value doesn't go below zero, use max() with 2 arguments:
= max(0, new value) - returns the new value, or 0 if the new value is below 0
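A sketch of a walk that is kept at or above zero with max(). The up/down rule (heads = step down, tails = step up) is an assumption chosen so that max() has something to do:

```python
import numpy as np

np.random.seed(123)
position = [0]
for x in range(10):
    coin = np.random.randint(0, 2)
    if coin == 0:
        step = max(0, position[x] - 1)  # step down, but never below 0
    else:
        step = position[x] + 1          # step up
    position.append(step)
print(position)
```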

17. DISTRIBUTION:-
To check the final value of something many times (distribution), we can use:

np.random.seed(123)
final_tails = []
for x in range(100):
    tails = [0]
    for x in range(10):
        coin = np.random.randint(0, 2)
        tails.append(tails[x] + coin)
    final_tails.append(tails[-1])
print(final_tails)

-> this lets you check the number of tails that occur in 10 tosses, 100 times

> To visualize a distribution, you should use a histogram:
import matplotlib.pyplot as plt
plt.hist(final_tails, bins = 10)
plt.show()

> np.transpose() transposes the given numpy array
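A sketch of where np.transpose() helps: storing several walks as rows, transposing so each row is one time step, and reading the final values from the last row (5 walks of 10 tosses is an arbitrary choice):

```python
import numpy as np

np.random.seed(123)
all_walks = []
for i in range(5):                # 5 simulated walks
    tails = [0]
    for x in range(10):           # 10 coin tosses each
        coin = np.random.randint(0, 2)
        tails.append(tails[x] + coin)
    all_walks.append(tails)

walks = np.array(all_walks)       # shape (5, 11): one row per walk
walks_t = np.transpose(walks)     # shape (11, 5): one row per time step
final_tails = walks_t[-1]         # last row = final value of every walk
print(walks.shape, walks_t.shape)
print(final_tails)
```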

C. DATACAMP: PYTHON DATA SCIENCE TOOLBOX (I)

-------------------------[13/3/24]--------------------------

1. USER-DEFINED FUNCTIONS:-
> We need functions with functionality specific to our needs (unlike built-in
ones)
> DEFINING A FUNCTION: (e.g. squaring function)
//
def square():           # Function header (no parameters)
    new_value = 4 ** 2  # Function body
    print(new_value)
square()                # prints 16
//

> FUNCTION PARAMETERS:
//
def square(value):                       # Assigned parameter
    """Return the square of a value"""   # Docstring
    new_value = value ** 2
    return new_value
square(4)                                # Passed argument
//

> To return values from functions instead of printing them, use:
return new_value instead of print

> DOCSTRINGS:
- docstrings describe what your function does (example above)
- what computation it performs and its return values
- serve as documentation for your functions
- placed in the line immediately after the function header, b/w triple quotation marks
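A sketch of a function that returns its result and carries a docstring, which Python stores in the function's `__doc__` attribute:

```python
def square(value):
    """Return the square of a value."""
    new_value = value ** 2
    return new_value

num = square(4)        # the returned value can be stored and reused
print(num)             # 16
print(square.__doc__)  # Return the square of a value.
```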

-------------------------[18/2/24]--------------------------

2. MULTIPLE PARAMETERS & RETURN VALUES:-
//
def power(value1, value2):
    """Raise value1 to the power of value2"""  # Docstring
    new_value = value1 ** value2
    return new_value

result = power(2, 3)
print(result)  # Outputs 8
//

> TUPLES:
- similar to a list, contains multiple values
- immutable (values cannot be modified), unlike lists
- constructed using parentheses
- used to make functions return multiple values

//
even_nums = (2, 4, 6)  # This is a tuple
a, b, c = even_nums    # Unpacking the tuple into variables, in order
//

- To access tuple elements (like lists):
print(even_nums[1])

> To return multiple values, use tuples. E.g. raise value1 to the power of value2 and vice
versa
//
def raise_both(value1, value2):
    """Raise value1 to the power of value2 and vice versa"""
    new_value1 = value1 ** value2
    new_value2 = value2 ** value1
    new_tuple = (new_value1, new_value2)
    return new_tuple

answer1, answer2 = raise_both(2, 3)
//
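A sketch of tuple unpacking, indexing and immutability, followed by a function that returns two values at once:

```python
even_nums = (2, 4, 6)
a, b, c = even_nums        # unpacking
print(a, b, c)             # 2 4 6
print(even_nums[1])        # 4

try:
    even_nums[0] = 8       # tuples are immutable
except TypeError as err:
    print("TypeError:", err)

def raise_both(value1, value2):
    """Raise value1 to the power of value2 and vice versa."""
    new_value1 = value1 ** value2
    new_value2 = value2 ** value1
    return (new_value1, new_value2)

answer1, answer2 = raise_both(2, 3)
print(answer1, answer2)    # 8 9
```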

3. SCOPE AND USER-DEFINED FUNCTIONS:-
Not all objects defined are accessible everywhere in a program

> Scope: tells you in which part of a program an object or name may be accessed.
There are 3 types of scopes:
1. Global scope - defined in the main body of a script/program
2. Local scope - defined within a function (ceases to exist outside the function)
3. Built-in scope - names in pre-defined built-in modules (e.g. print)

Order of search: Local, Global, Built-in

> To alter the value of a global name within a function call, use the global keyword.
//
new_val = 10

def square(value):
    global new_val
    new_val = new_val ** 2
    return new_val
//
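A sketch contrasting a local assignment with one made through the global keyword (the function names `square_local` and `square_global` are hypothetical):

```python
new_val = 10

def square_local(value):
    """Assign to a local name: the global new_val is untouched."""
    new_val = value ** 2
    return new_val

def square_global():
    """Use the global keyword to change the global new_val."""
    global new_val
    new_val = new_val ** 2
    return new_val

print(square_local(3), new_val)  # 9 10
print(square_global(), new_val)  # 100 100
```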
