Tung Wah College GEN3005 / GED3005 Big Data and Data Sciences
Overview
In this tutorial, you will learn how to set up the Integrated Development Environment (IDE)
for Python programming, and write simple Python programs. The IDE used is called
PyCharm.
Python is a programming language that can be used to build computer programs for data
processing, data analysis, and data visualization, which are the main steps in data science.
“C:\Users\...\AppData\Local\Programs\Python\Python311”
5. After creating a new project, right-click the project name to create a directory
“Week 1”, and then create a new Python file “Hello World.py” inside that directory.
6. In the “Hello World.py” file, you can type the following in line 1. This will print the
message “Hello World!” when you run the program.
print("Hello World!")
7. To run the program, right-click the file “Hello World.py” at the top and choose Run
“Hello World”. You will see the message printed out in the output screen at the bottom.
10. To capture the user’s input, you can use the input() function.
x = input("Please enter the x value:\n")
The value of x on the right of the “=” operator is of the “string” type, because input()
always returns a string. The function int() converts the value from the “string” type to
the “integer” type, and the resulting integer is stored in the variable x again.
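The conversion described above can be sketched as follows. To keep the sketch runnable without typing anything, a fixed string (an assumption for illustration) stands in for the value that input() would return.

```python
# A fixed string stands in for the value returned by input(); this is an
# illustrative assumption so the sketch runs without keyboard input.
x = "12"
print(type(x))   # <class 'str'>

# int() converts the string to an integer, and the result is stored in x again.
x = int(x)
print(type(x))   # <class 'int'>
print(x + 1)     # 13
```

Without the conversion, x + 1 would raise a TypeError, because a string and an integer cannot be added together.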
12. Similarly, you can write the following two lines of code to capture the y value from
the user.
addition_result = x + y
subtraction_result = x - y
multiplication_result = x * y
division_result = x / y
powering_result = x**y
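Putting the steps above together, the whole calculator might look like the sketch below. The helper names read_int and calculate are hypothetical, introduced here only for illustration, and fixed values are used in place of keyboard input so that the sketch runs as written.

```python
def read_int(prompt):
    """Ask the user for a value and convert the typed string to an integer."""
    return int(input(prompt))


def calculate(x, y):
    """Return the five results computed in this task as a dictionary."""
    return {
        "addition": x + y,
        "subtraction": x - y,
        "multiplication": x * y,
        "division": x / y,       # "/" always gives a float; y must not be 0
        "powering": x ** y,
    }


# Fixed example values (assumed for illustration). In the real program they
# could come from read_int("Please enter the x value:\n") and a matching
# call for y.
x, y = 7, 2
results = calculate(x, y)
for name, value in results.items():
    print(name, "result is", value)
```

With x = 7 and y = 2, the five results are 9, 5, 14, 3.5 and 49.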
Sample Outputs:
Programming Task (2) – Simple Statistics
14. Write a Python program that computes the mean, the population standard deviation,
and the sample standard deviation of the values stored in a list. To start with,
create a Python file named “StatCalculator.py” in the Week 1 directory.
15. Import the numpy package, which contains the functions required for computing these
statistical measures.
import numpy as np
The numpy package is not installed in the project initially. To install it, you have to
download it using the Python package installer.
16. You can use the following line of code to create a list of integers.
17. The mean, the population standard deviation, and the sample standard deviation have the
following formulas.
The mean is
\[ \mu = \frac{X_1 + X_2 + \cdots + X_N}{N} = \frac{\sum X}{N} \]
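For the two standard deviations, the usual textbook definitions in the same notation are given below for reference. They are stated as the standard definitions, with \bar{X} denoting the mean of the sample, computed in the same way as \mu.

```latex
\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}
\qquad
s = \sqrt{\frac{\sum (X - \bar{X})^2}{N - 1}}
```

The only difference is the divisor: N for the population standard deviation and N - 1 for the sample standard deviation.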
You can use the following lines of code to compute their values.
mean = np.mean(lst)            # lst is the parameter (the input of the function)
p_std = np.std(lst)            # population standard deviation (divides by N)
s_std = np.std(lst, ddof=1)    # ddof=1 divides by N - 1 (the degrees of freedom)
This demonstrates the use of “functions” to get the desired results. A function takes
inputs (known as “parameters”) and returns an output.
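To see what the two np.std() calls compute, the same quantities can be worked out by hand with only the standard library. This is a sketch for illustration: the values in lst are hypothetical, not the tutorial's own data, and the names n and squared_deviations are introduced here.

```python
import math

# Hypothetical data, chosen only so the arithmetic is easy to follow.
lst = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(lst)
mean = sum(lst) / n                      # same quantity as np.mean(lst)

# Sum of squared deviations from the mean.
squared_deviations = sum((value - mean) ** 2 for value in lst)

# Dividing by N matches np.std(lst); dividing by N - 1 matches
# np.std(lst, ddof=1).
p_std = math.sqrt(squared_deviations / n)
s_std = math.sqrt(squared_deviations / (n - 1))

print("The mean is", round(mean, 2))                              # 5.0
print("The population standard deviation is", round(p_std, 2))    # 2.0
print("The sample standard deviation is", round(s_std, 2))        # 2.14
```

The sample standard deviation is always slightly larger than the population one for the same data, because the same sum is divided by a smaller number.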
18. The results are printed using the following lines of code. The function round() is used
to perform the rounding. The first parameter is the value to be rounded, and the second
parameter is an integer that specifies the number of decimal places desired.
For example, a second parameter of 2 rounds to two decimal places, 0 rounds to the nearest
whole number, and -1 rounds to the nearest ten.
print("The mean is", round(mean, 2))
print("The population standard deviation is", round(p_std, 2))
print("The sample standard deviation is", round(s_std, 2))
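The effect of the second parameter of round() can be checked with a short sketch. The number 1234.5678 and the variable names are arbitrary choices made for illustration.

```python
value = 1234.5678  # arbitrary example value

two_places = round(value, 2)     # keep two decimal places
whole = round(value, 0)          # nearest whole number, still a float
nearest_ten = round(value, -1)   # nearest ten
as_int = round(value)            # no second parameter: returns an integer

print(two_places)    # 1234.57
print(whole)         # 1235.0
print(nearest_ten)   # 1230.0
print(as_int)        # 1235
```

Note that round() rounds to the nearest value rather than always rounding up.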
Sample Outputs: