INTRO TO PYTHON - DATACAMP
-------------------------[17/2/24]--------------------------
= using + with strings joins (concatenates) them: 'ab' + 'cd' gives 'abcd'
3. PYTHON LISTS:-
Lists give name and convenience to a collection of values (elements)
> listName = [a, b, c]
4. SUBSETTING LISTS:-
Index = indexes start from 0 for elements in a list
> listName[0] for example to access the first element of a list
> negative numbering accesses the list from last to first
> listName[-1] and listName[6] are the same for a 7-element list
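> A small sketch of the indexing rules above. The list name and its contents are made up for illustration:

```python
# Hypothetical 7-element list, used only to illustrate indexing
letters = ["a", "b", "c", "d", "e", "f", "g"]

first = letters[0]        # indexes start at 0
last = letters[-1]        # negative indexes count from the end
same_last = letters[6]    # index 6 is the 7th (last) element

print(first, last, same_last)
```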
5. MANIPULATING LISTS:-
> listName[0] = "AA" # replaces the previous first element with "AA"
> listName[0:2] = ["Random", 111] (changes a range of elements)
Adding lists with + joins (concatenates) them, like strings
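> The three operations above, tried on a made-up list (the name items and its values are only for illustration):

```python
# Hypothetical list, values chosen only for illustration
items = ["x", 1, "y", 2]

items[0] = "AA"                  # replace a single element
items[0:2] = ["Random", 111]     # replace a range of elements (index 0 and 1)

combined = items + ["z", 3]      # + concatenates lists, like strings
print(combined)
```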
-------------------------[18/2/24]--------------------------
9. PACKAGES:-
> To add a package to a script, use import PACKAGE_NAME (e.g. import numpy)
> E.g. numpy.array([1, 2, 3]) for array creation in NumPy
> Thus NumPy arrays are objects too, with their own methods
> Adding NumPy arrays does not result in "string addition" (concatenation) like it does for lists
> It adds the corresponding values (element-wise sum) only
> EXAMPLE:
arrayVar = np.array(listName) (np is the usual alias, from: import numpy as np)
arrayVar > 23 is a boolean condition that checks each element against 23
(returns an array of True/False values, one per element)
! BUT using arrayVar[arrayVar > 23] gives only the elements in the array
that are greater than 23
>> Type coercion - when elements of a NumPy array are converted into a single
type, because all elements of an array must share one type
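> A rough sketch of the NumPy points above: element-wise sum, boolean condition, boolean subsetting and type coercion. The numbers and variable names are made up for illustration:

```python
import numpy as np

# Made-up values for illustration
values = [21, 25, 19, 30]
array_var = np.array(values)

# Lists concatenate, arrays add element by element
list_sum = values + values          # a list with 8 elements
array_sum = array_var + array_var   # element-wise sum

mask = array_var > 23               # one True/False per element
selected = array_var[mask]          # only the elements greater than 23

# Type coercion: mixing numbers and strings gives an array of strings
mixed = np.array([1, "two", 3.0])
print(array_sum, mask, selected, mixed.dtype)
```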
- DATA VISUALIZATION:
> helps explore data better and extract insights
> plt.scatter() - plot points aren't connected using a line [instead of .plot]
> plt.xscale('log') - puts the x-axis on a logarithmic scale
2. HISTOGRAM:-
Histograms are useful for dataset exploration and getting an idea about the
distribution
> the intervals that the scale is divided into are called bins
> the number of data points in each bin gives the height of its bar
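> To see how bins work without drawing anything, here is a sketch that counts values per bin with np.histogram. That function is my choice for illustration (the notes above don't name one; for an actual plot, plt.hist() would be the usual call), and the data is made up:

```python
import numpy as np

# Made-up data for illustration
data = [0, 0.6, 1.4, 1.6, 1.9, 2.2, 3.1, 4.5, 6.0]

# Split the range 0..6 into 3 equal-width bins
counts, edges = np.histogram(data, bins=3)

print(edges)    # bin boundaries
print(counts)   # number of data points per bin = bar heights
```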
3. CUSTOMIZATION:-
Data visualization has,
- many types of plots
- different colors, shapes, labels, axes, etc.
To label axes,
> plt.xlabel('x-label')
> plt.ylabel('y-label')
> For title, plt.title('Title of Graph')
> plt.grid(True) makes gridlines visible
Ticks function,
> plt.yticks([0,2,4,6,8,10]) - places the y-axis ticks at 0, 2, 4, 6, 8 and 10
To label the ticks, pass a second list with the labels. E.g.
> plt.yticks([0,2,4], ['0', '2B', '4B']) - names the ticks 0, 2B and 4B
> can also be xticks
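> The customization calls above put together in one sketch. The data is made up, and matplotlib.use("Agg") is my addition so the script can run without opening a window (drop it for interactive use):

```python
import matplotlib
matplotlib.use("Agg")            # draw off-screen so no window is needed
import matplotlib.pyplot as plt

# Made-up data for illustration
x = [1, 10, 100, 1000]
y = [0, 2, 3, 4]

plt.scatter(x, y)                 # points only, no connecting line
plt.xscale("log")                 # logarithmic x-axis
plt.xlabel("x-label")
plt.ylabel("y-label")
plt.title("Title of Graph")
plt.grid(True)                    # show gridlines
plt.yticks([0, 2, 4], ["0", "2B", "4B"])   # tick positions, then tick labels

ax = plt.gca()                    # current axes, used here only to inspect the result
print(ax.get_xscale(), ax.get_title())
```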
-------------------------[20/2/24]--------------------------
4. DICTIONARIES PART-I:-
To create a dictionary in Python,
>> world = {"afghanistan":30.55, "albania":2.77} (and so on for more countries)
>> world["albania"] will give you its population directly
>> keys are the information used to access other info (left side of :)
>> values are information accessed (right side of :)
>> you can use dictName.keys() (e.g. world.keys()) to check the available keys
5. DICTIONARIES PART-II:-
>> if a key is repeated in a dictionary with different values, only the last value is kept
>> keys have to be immutable objects (str, bool, int, float, etc.)
>> using mutable objects (e.g. lists) as keys gives an error (TypeError: unhashable type)
>> Lists should be used when order and ease to select subset data matters
>> Dictionaries are for fast access to data and use of specific keys
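>> A short sketch of the dictionary rules from Part I and Part II. The first two values come from the example above; the repeated "albania" entry with 2.81 is made up to show which value survives:

```python
# The repeated key (value 2.81) is made up for illustration
world = {"afghanistan": 30.55, "albania": 2.77, "albania": 2.81}

print(world["albania"])     # the last value given for a repeated key is kept
print(list(world.keys()))   # available keys

# A list is mutable, so it cannot be used as a key
try:
    bad = {["a", "b"]: 1}
    key_error = None
except TypeError as err:
    key_error = err
print(key_error)
```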
6. PANDAS, PART-I:-
- We don't use NumPy for collections of different data types, because a NumPy array
holds only 1 type
- pandas is a high-level data manipulation tool built on the NumPy package
- Usually DataFrames are not created by hand, they are imported from an external file
> for example, csv (comma-separated values) files
- use pd.read_csv("path/to/csvfile.csv") [by default this adds row labels 0, 1, etc.,
so if the csv file already has row labels in its first column, use:
- pd.read_csv("filepath.csv", index_col = 0)]
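- A sketch of the difference index_col = 0 makes. To keep it runnable without a file on disk, the CSV content is a made-up string wrapped in io.StringIO (my assumption for illustration; with a real file you would pass the path as above):

```python
import io
import pandas as pd

# Made-up CSV text standing in for a file; the first column holds row labels
csv_text = """,country,population
AF,Afghanistan,30.55
AL,Albania,2.77
"""

default_df = pd.read_csv(io.StringIO(csv_text))               # row labels 0, 1, ...
labeled_df = pd.read_csv(io.StringIO(csv_text), index_col=0)  # first column becomes the row labels

print(default_df.index.tolist())
print(labeled_df.index.tolist())
```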
-------------------------[21/2/24]--------------------------
7. PANDAS, PART-II:-
Several ways to index and select data from dataframes,
= square brackets
= advanced data access methods
- loc (label-based)
- iloc (integer position-based)
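- A minimal sketch of the three access styles. The DataFrame contents and row labels are made up for illustration:

```python
import pandas as pd

# Made-up DataFrame for illustration
brics = pd.DataFrame(
    {"country": ["Brazil", "Russia", "India"], "population": [200.4, 143.5, 1252.0]},
    index=["BR", "RU", "IN"],
)

col_series = brics["country"]          # square brackets: one column as a Series
row_by_label = brics.loc["RU"]         # loc: select by row label
row_by_position = brics.iloc[1]        # iloc: select by integer position
cell = brics.loc["IN", "population"]   # loc with a row label and a column label

print(row_by_label["country"], row_by_position["country"], cell)
```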
-------------------------[22/2/24]--------------------------
8. COMPARISON OPERATORS:-
Comparison operators tell how two values relate and result in a Boolean
> use == for equality, < and > for less/greater than, <= and >= for less/greater than or equal to
> comparing strings checks them in alphabetical order
> Python cannot compare different variable types (except for float and integers)
> you can also compare int/float with NumPy arrays
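> The comparison rules above, tried out in a short sketch (the strings and heights are made-up values):

```python
import numpy as np

print(2 == 2.0)            # int and float can be compared
print("carl" < "chris")    # strings compare alphabetically
print(3 <= 3)

# Comparing unrelated types with < raises an error
try:
    3 < "chris"
    compare_error = None
except TypeError as err:
    compare_error = err

# A number compared with a NumPy array is checked element by element
heights = np.array([1.73, 1.68, 1.89])   # made-up values
tall = heights > 1.7
print(tall)
```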
9. BOOLEAN OPERATORS:-
- and - or - not
> In NumPy, the functions for boolean operators are given as:
- logical_and(): takes two conditions, e.g. np.logical_and(arrayName > 21, arrayName < 22)
- logical_or()
- logical_not()
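> A sketch of the three NumPy functions on a made-up array (the name ages and the thresholds are only for illustration):

```python
import numpy as np

ages = np.array([18, 21.5, 25, 30])     # made-up values

between = np.logical_and(ages > 21, ages < 26)   # both conditions True
outside = np.logical_or(ages < 21, ages > 26)    # at least one condition True
not_between = np.logical_not(between)            # flips True/False

print(ages[between])    # the boolean array can be used to subset
```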
if condition:
    expression
elif condition:
    expression
else:  # takes no condition; runs when nothing above was True
    expression
> Python leaves the control structure after executing the first branch whose condition is True,
it does not run any later branches even if their conditions are also True
> Python can compare floats with integers in conditions (and vice versa)
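> A small sketch of the "first True branch wins" rule. The function name describe and the divisibility checks are made up for illustration:

```python
def describe(z):
    """Return the label of the first branch whose condition is True."""
    if z % 2 == 0:
        return "divisible by 2"
    elif z % 3 == 0:
        return "divisible by 3"    # skipped for 6: the first branch already ran
    else:
        return "neither"

print(describe(6))      # divisible by 2 (6 is also divisible by 3)
print(describe(9))      # divisible by 3
print(describe(7.0))    # neither (a float works in the same conditions)
```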
i. Get column/row:
- Get pandas Series, you can use square brackets, loc or iloc
ii. Compare:
- use frameName["col/row"] > value
- you get a Boolean series, save it to a variable
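- A sketch of these steps on a made-up DataFrame. The last step, using the saved Boolean Series to select rows, is the usual follow-up and is my addition (it is not written in the notes above):

```python
import pandas as pd

# Made-up example values
brics = pd.DataFrame(
    {"country": ["Brazil", "Russia", "India"], "area": [8.516, 17.10, 3.286]},
    index=["BR", "RU", "IN"],
)

is_huge = brics["area"] > 8      # Boolean Series, one value per row
huge = brics[is_huge]            # keep only the rows where the Series is True

print(huge)
```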
-------------------------[17/2/24]--------------------------
> You can also do this using the apply() function, no loops
brics["name_length"] = brics["country"].apply(len)
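> The apply() line above in a runnable form. The DataFrame here is a made-up stand-in with only a country column:

```python
import pandas as pd

# Made-up stand-in for the brics DataFrame
brics = pd.DataFrame({"country": ["Brazil", "Russia", "India"]}, index=["BR", "RU", "IN"])

brics["name_length"] = brics["country"].apply(len)   # len is called on each country name
print(brics)
```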
-------------------------[25/2/24]--------------------------
> You can also set the starting seed (number) for the above using:
np.random.seed(value) - same random numbers are generated for the same seed
> That is why it is called "pseudo-random", it is random but consistent b/w runs
> here, np.random.randint(start, end) gives a random integer from start up to end-1 (end itself is excluded)
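> A quick check of the seed behaviour described above (the seed value 123 and the 10 draws are arbitrary choices):

```python
import numpy as np

np.random.seed(123)
first_run = [np.random.randint(0, 2) for _ in range(10)]

np.random.seed(123)                 # same seed again
second_run = [np.random.randint(0, 2) for _ in range(10)]

print(first_run == second_run)      # True: same seed, same sequence
```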
> The above program produces a random list, but it is not a random walk
> random walks depend on the previous output. E.g.
import numpy as np
np.random.seed(123)
tails = [0]                          # the walk starts at 0 tails
for x in range(10):
    coin = np.random.randint(0, 2)   # 0 or 1
    tails.append(tails[x] + coin)    # previous total + this toss
> To ensure a value doesn't go below zero, use max() with 2 arguments
= max(lowest value allowed, new value), e.g. max(0, value - 1)
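> A sketch of a walk that can move down as well as up, with max() keeping it at or above zero. The step rule (-1 or +1 per move) and the names are made up for illustration:

```python
import numpy as np

np.random.seed(123)
position = [0]
for x in range(20):
    move = np.random.randint(0, 2) * 2 - 1        # -1 or +1 (illustrative step rule)
    position.append(max(0, position[x] + move))   # never drops below 0

print(position)
```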
17. DISTRIBUTION:-
To check the final value of something many times (distribution), we can use:
np.random.seed(123)
final_tails = []
for run in range(100):                   # 100 repetitions
    tails = [0]
    for x in range(10):                  # 10 tosses each
        coin = np.random.randint(0, 2)
        tails.append(tails[x] + coin)
    final_tails.append(tails[-1])        # number of tails after the 10th toss
print(final_tails)
-> this lets you check the number of tails that occur in 10 tosses, 100 times
-------------------------[13/3/24]--------------------------
1. USER-DEFINED FUNCTIONS:-
> We need functions with functionality specific to our needs (unlike built-in ones)
> DEFINING A FUNCTION: (e.g. squaring function)
//
def square():  # Function header (no parameters)
    """Print the square of 4."""  # Docstring
    new_value = 4 ** 2  # Function body
    print(new_value)
square()  # prints 16
//
> DOCSTRINGS:
- docstrings describe what your function does (example above)
- what computation it performs and its return values
- serve as documentation for your functions
- placed on the line immediately after the function header, b/w triple quotation marks
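> A minimal sketch of a function with a parameter, a return value and a docstring. Reading the docstring back through __doc__ is my addition for illustration:

```python
def square(value):
    """Return the square of value."""
    return value ** 2

print(square(4))          # 16
print(square.__doc__)     # the docstring is stored on the function
```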
-------------------------[18/2/24]--------------------------
> TUPLES:
- similar to a list, contains multiple values
- immutable (values cannot be modified) - unlike lists
- constructed using parentheses
- used to make functions return multiple values
//
even_nums = (2, 4, 6)  # This is a tuple
a, b, c = even_nums  # Unpacking the tuple into variables in order
//
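> A sketch showing unpacking, indexing and immutability together. The try/except is only there to show the error without stopping the script:

```python
even_nums = (2, 4, 6)
a, b, c = even_nums           # unpacking in order

print(even_nums[1])           # indexing works like a list

try:
    even_nums[0] = 10         # tuples are immutable, so this fails
    tuple_error = None
except TypeError as err:
    tuple_error = err
print(tuple_error)
```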
> To return multiple values, use tuples. E.g. raise value1 to the power of value2 and
vice versa
//
def raise_both(value1, value2):
    """Raise value1 to the power of value2 and vice versa."""
    return (value1 ** value2, value2 ** value1)  # both results in one tuple
//
> Scope: tells you in which part of a program an object or name is accessible.
There are 3 types of scopes:
1. Global scope - defined in the main body of a script/program
2. Local scope - defined within a function (ceases to exist outside the function)
3. Built-in scope - names in pre-defined built-in modules (e.g. print)
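> A sketch of the three scopes in one place. Without the global keyword, the assignment inside the function creates a separate local name (the names and values are made up for illustration):

```python
new_val = 10                  # global scope

def square(value):
    new_val = value ** 2      # local scope: a separate name that exists only inside square
    return new_val

result = square(3)
print(result, new_val)        # 9 10: the global new_val is unchanged
print(len("abc"))             # len comes from the built-in scope
```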
> To alter the value of a global name within a function call, use global keyword.
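//
new_val = 10
def square(value):
    global new_val             # use the global name instead of creating a local one
    new_val = new_val ** 2
    return new_val
square(3)  # returns 100, and the global new_val is now 100
//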