STT - CSE lab 5,6,7,8

The February report for CS202 includes four lab reports documenting activities conducted on specific dates in February 2025. The primary focus is on Code Coverage Analysis and Test Generation, Python Test Parallelization, and Vulnerability Analysis on Open-Source Software Repositories, detailing methodologies, tools used, and results achieved. Notably, the report highlights an increase in code coverage from 76% to 94% through the use of automated test generation tools.

CS202: SOFTWARE TOOLS & TECHNIQUES FOR CSE


February Month Report

This PDF contains four lab reports documenting the lab activities conducted on
6, 13, 20, and 27 February 2025

ID : 24120036 NISHITBHAI PRAJAPATI


Table of Contents

Code Coverage Analysis and Test Generation
    Overview
    Introduction and Tools
    Setup
    Methodology and Execution
    Results and Analysis
    Discussion
    Conclusion

Python Test Parallelization
    Overview
    Introduction and Tools
    Setup
    Methodology and Execution
    Results and Analysis
    Discussion
    Conclusion

Vulnerability Analysis on Open-Source Software Repositories
    Overview
    Introduction and Tools
    Setup
    Methodology and Execution
    Results and Analysis
    Discussion
    Conclusion
Course: Software Tools & Techniques for CSE (CS202)

Date: 6-Feb-2025
Name: Nishit Prajapati
ID: 24120036

Lab 05 Report
Code Coverage Analysis and Test Generation

Overview
In this lab assignment, I performed Code Coverage Analysis and Test Generation to understand how
effective software testing is. The primary intent was to learn about various types of code coverage
metrics and to increase test coverage with automated test generation tools such as Pynguin.
A GitHub-hosted repository (keon/algorithms) was used to determine line coverage, branch
coverage, and function coverage and to generate reports for all the Python files it
contained. For this assignment, I did the following:

• Installed the required tools such as pytest, pytest-cov, pynguin, coverage, and genhtml.
• Ran and analysed the test suite already provided in Keon's repository (Test
Suite A) and measured its coverage.
• Generated a new test suite (Test Suite B) using Pynguin and measured its coverage
as well.
• Analysed the effectiveness of the two test suites by comparing coverage reports and
identifying any cases still not covered.

Introduction and Tools


Before discussing the steps I performed, it is important to introduce
the tools that we are going to use in this assignment.
Pytest: the tool we are going to use to write and execute test cases for the Python
programs in Keon's repository.
Pynguin: this tool will help us generate unit test cases automatically for the Python
programs.
Coverage: this tool will measure code coverage by tracking executed lines, branches and
functions.

Genhtml: this tool will generate the HTML report from coverage data.
pytest-cov & pytest-func-cov: these are extra pytest plugins which provide coverage analysis
and function-level coverage respectively.

Setup
I performed this lab activity in the SET-IITGN-VM, as it was "strongly advised" to use this
particular VM in the lab 5 document. The setup for this VM is already described in the lab 2
report.
Now, the first thing we are going to do is install an IDE. Why? Because it is easier to
write code (shell scripts) in it compared to the built-in text editor of the VM. To install Visual
Studio Code (the IDE I opted for), I followed the steps mentioned in this page. Below are the
screenshots which highlight the code snippets used to install Visual Studio Code.
After successfully installing the IDE, I created a virtual environment by following the steps
shown in this video. Below is the screenshot which highlights the code snippets used to create
the virtual environment.

After creating the virtual environment, we need to install all the tools mentioned above in the
Introduction part of this report. Below are the screenshots which highlight the code snippets
used to install all the above-mentioned tools.
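Since the screenshots are not reproduced here, a minimal sketch of the commands involved is given below (the environment name is an assumption; package versions may differ):

python3 -m venv venv            # create the virtual environment
source venv/bin/activate        # activate it on the Linux-based VM
pip install pytest pytest-cov pytest-func-cov coverage pynguin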

I installed pytest, it’s plugins and coverage tools easily(as highlighted in the above screenshots)
but, installing the genhtml and lcov tool was quite tricky. I had to refer to the genhtml and
lcov’s documentation to install both of these tools. In the end, I managed to install these two
tools using the following code snippets highlighted in the below screenshots.

3
After installing genhtml and lcov, we can verify their installation and check their versions using the
code snippets highlighted in the below screenshot.
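As a rough sketch (assuming an Ubuntu-based VM, where genhtml ships as part of the lcov package), the installation and verification look like this:

sudo apt-get install lcov    # installs lcov together with genhtml
lcov --version
genhtml --version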

Now, we have everything we need to complete this lab assignment, i.e., the setup is ready!

Methodology and Execution
The first task mentioned in the lab assignment is to clone Keon's algorithms repository
and report the current commit hash.
To clone the repository, we use the "git clone <Keon's algorithms repo URL>" command in
the VS Code terminal.

We also need to install the requirements of Keon's repository. In this repository, the file named
'requirements.txt' is empty, so we install the test requirements instead: the repository
contains a file named 'test_requirements.txt' which lists some tools.
To install the test requirements, use the command "pip install -r test_requirements.txt".

After cloning the repository, to get the current commit hash, we use the "git rev-parse HEAD"
command. It outputs the current commit hash in the terminal.

So, the current commit hash is cad4754bc71742c2d6fcbd3b92ae74834d359844.

Note: Before using the above command to get the current commit hash,
first move into the algorithms folder using “cd algorithms” command,
i.e., move the current directory to algorithms directory.
As we have already used the "cd algorithms" command, we are in the algorithms directory. If we
now use the "pytest" command, it will execute the test cases already present in Keon's repository (Test
Suite A). So, I used the "pytest" command in the terminal, but it gave me an error:
"ModuleNotFoundError".

The solution to this error was shared with us on Google Classroom by our course TA, who
told us to run the "pip install -e ." command before using pytest. So, I did exactly that.
This command installs the repository as an editable package in the virtual
environment we created, as mentioned in the Setup part of this report.
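Putting the steps so far together, the sequence of commands looks roughly like this (a sketch; the repository URL is the standard GitHub location for keon/algorithms):

git clone https://github.com/keon/algorithms.git
cd algorithms
pip install -r test_requirements.txt
pip install -e .
git rev-parse HEAD    # prints cad4754bc71742c2d6fcbd3b92ae74834d359844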

Even after doing this, when I used the "pytest" command, it gave me some "SyntaxError" messages.

After looking into it, I found some syntax errors in test files already present in Keon's
repository. I easily removed all these syntax errors, as their cause was clearly stated in the
error messages in the terminal. Some screenshots which highlight the errors as well as the
changes made to fix them are as follows:

Before After

Before

After

Note: For the syntax error shown in the below screenshot, I have
actually modified the original summarize_ranges.py file.
Before After

As you can see from the above screenshots, I removed all the syntax errors. Now, I again used
the "pytest" command and this time it ran successfully.

But wait! We also want the coverage report of Test Suite A, i.e., the already existing test files in
Keon's repository that contain test cases for the Python files of the repository.
So, after reading the pytest and coverage documentation, I configured the pytest command
to inspect this repository and generate HTML reports of line, branch and file coverage for each
Python file in the repository.
The command is "pytest --cov=. --cov-report term-missing --cov-report html". In this
command,

• “pytest”: Runs all the tests it finds in the current directory and subdirectories.
• “--cov=.”: Specifies the directory to measure coverage for (. means the current
directory).
• “--cov-report term-missing”: Shows a summary of the coverage in the terminal,
highlighting lines that are not covered.
• “--cov-report html”: Creates an HTML report that we can open in any browser for a
visual representation of the coverage.

The above command created an 'htmlcov' folder in the algorithms directory. This
folder contains many HTML files, one per file of the repository. The main file is
'class_index.html', as it contains the summary of the coverage
of all the executable files in the repository.
When I opened this 'class_index.html' file, it showed the following:

This means the already existing test cases in Keon's repository provide
76% coverage. In other words, Test Suite A does not cover ALL lines, branches and functions in
this repository. So, we are going to increase the coverage using the Pynguin tool.
Now, I read the pynguin documentation. It mentioned that to generate a test case for a
particular file, let’s say x.py, we have to use the following command.
$ pynguin --project-path <path of the folder which contains the x.py>
--output-path /tmp/pynguin-results
--module-name folder_name_of_x.x -v

Now I knew which command to use, but to fetch the module names from Keon's repository,
I did the following:

• Copied all the file names from the generated report and pasted them into an Excel sheet.

• Sorted the Excel sheet by coverage percentage (see the 5th column in the above
screenshot). As you can see, many files have 0% coverage, which means there is no test
file present in Keon's repository that covers those Python files. We will use Pynguin
to generate test files for these Python files.

• Then, using the 'Find & Replace' feature of Microsoft Excel, I replaced each "/" with
"." and each ".py" with a space (see the 1st column in the above screenshot).
• I created a shell script which reads the module names from the Excel file (1st column)
and uses each of them to run the following command:
pynguin --project-path "$project_path"
--output-path "$output_path"
--module-name "$module_name"
--maximum-iterations 2500
--maximum-search-time 500
--maximum-test-execution-timeout 700 -v

I will explain the reason for choosing the above options later. But for now,
here is the shell script:
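The screenshot of the script is not reproduced here, so below is a minimal sketch of what run_all.sh does, based on the description that follows (the module list is assumed to have been exported from the Excel sheet to a plain text file, one module name per line; file names are assumptions):

#!/bin/bash
# run_all.sh -- sketch: run Pynguin for every module listed in module_names.txt
export PYNGUIN_DANGER_AWARE=1          # required by Pynguin before it will run
project_path="./algorithms"
output_path="./pynguin-results"

while IFS= read -r module_name; do
    pynguin --project-path "$project_path" \
            --output-path "$output_path" \
            --module-name "$module_name" \
            --maximum-iterations 2500 \
            --maximum-search-time 500 \
            --maximum-test-execution-timeout 700 -v
done < module_names.txt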

As you can see, this shell script is named 'run_all.sh' and sits
outside the algorithms directory, which is why the 'project_path' variable is set to
'./algorithms' and the other variables are initialized accordingly. Also, PYNGUIN_DANGER_AWARE
is set; since Pynguin executes generated code and can potentially damage the system, we run
this script inside the virtual environment. As you may observe, it reads the module names from
the sheet named "module_names" and uses them to run the pynguin command shown above.
To run this shell script, open a bash terminal, use the command 'source ./bin/activate' to
activate the virtual environment we created before (see the Setup part of this report), and then
use the command "./run_all.sh".

Note: The ‘module_name.xlsx’ excel file should be in the algorithms


directory.
After running the shell script, test cases are generated for all the modules listed in the
Excel file.
It took a large amount of time to completely execute the above shell script. In my case, it took almost
a day.

By default, Pynguin runs until coverage for a particular file reaches 100% or 10 minutes have passed. But
there are certain files in the repository which contain loops, and a certain value generated by
Pynguin may cause such a loop to run infinitely. Thus, the shell script might get stuck on a particular file.
To prevent this from happening, I used "--maximum-test-execution-timeout 700". This flag makes
Pynguin abort the current module and jump to the next one if executing the generated test cases
takes more than 700 seconds, for example because a generated value makes a loop in that module
run forever. Also, I noticed that if the coverage of a certain file at the 2500th iteration is x%, then even at
the 9000th iteration the coverage remains the same. So, there is no need to run many iterations, and
thus I included "--maximum-iterations 2500" in the pynguin command. Now, if a certain Python
program has high running time complexity, it will take too much time to complete 2500 iterations.
So, I also included "--maximum-search-time 500" in the pynguin command, which makes Pynguin
search for test cases for a maximum of 500 seconds (by default Pynguin searches for 10 minutes/600
seconds).
See, I didn’t come to the above pynguin command in one go. It took many hits and trials.
Actually, I manually created test files for 37 python programs manually using the pynguin
commands in the terminal for each 37 python files before using the shell script discussed
above. There was total 370 modules in the keon’s repository which means I generated 10%
test file manually without using the above shell script. After making this many test files, I came
to the conclusion that 2500 iterations are enough and experienced how test generation took
so much time either due to some infinite loop or high running time complexity.
After running the above discussed shell script and waiting for almost 14+ hours, all the test
files were generated for each python program & stored in the pynguin-results folder. This
generated test file is referred as ‘Test Suite B’.

Now, we remove all the test files which were present before generating Test Suite B,
i.e., remove Test Suite A (here "remove" means move the test files to some other folder, since we
will need them later). After removing Test Suite A, I again used the "pytest --cov=. --cov-report
term-missing --cov-report html" command. However, many failures and errors occurred due
to the generated test cases.

All the errors were almost the same and occurred for the same reason: there was a
problem with the import statement used in the generated test files. I fixed all those test files
manually. Below is a screenshot which shows the import/attribute errors. In most cases, the fix
was to change the import statement.

Before After

Error:

Solution:

Before After

So, after removing all the errors, I again used the "pytest --cov=. --cov-report term-missing
--cov-report html" command. The coverage improved from 76% to 90%.

Now, I moved Test Suite A back into the algorithms folder and again used the above
pytest command.

As you can see, with both Test Suites A & B, the coverage increased to 94%. We will talk about
the uncovered scenarios revealed by Test Suite B in the Results and Analysis part of this
report.
I was able to improve the coverage to 94%, and with this we have completed all the tasks
mentioned in the lab assignment!

Results and Analysis


I have already included most of the results of this lab assignment in the Methodology part of
this report, but let's still summarize them.

• The original test files of Keon's repository (Test Suite A) had a coverage of 76%.
• The test files we generated using the Pynguin tool (Test Suite B) had a coverage of 90%.
• When both test suites A & B were considered jointly, the coverage came out to be
94%.
Now, let's look at some scenarios which were revealed by Test Suite B. First, Test Suite B
contained test files that covered Python programs whose test cases were not available in Test
Suite A.
As you can observe in the above screenshot, those Python files have 0% coverage. This means
there do not exist any test files containing test cases for them. Now, in Test Suite
B, we have:

As you can observe, Test Suite B contains test files with test cases generated
by Pynguin which cover those Python files that were uncovered earlier.
Also, Test Suite B increased the coverage of Python files that were only partially covered by
Test Suite A.
Coverage of merge_intervals.py file in the Test Suite A:

Coverage of merge_intervals.py file in the Test Suite B:

Now, let’s look at the function coverage.

As you can observe from the above image, not a single line in the 'min_cost' function is
covered; in other words, it has 0% function coverage. But with Test Suite B (see the
below screenshot), every line of the 'min_cost' function is covered, so it has 100% function
coverage. (The 67% figure is due to some other uncovered lines.)
Now, let’s look at the branch coverage.

As you can observe from the above image, line number 29 is not covered. We also know that an 'if
statement' creates a branch, as learned with control flow graphs. In other words, line number
28 created another branch, but that branch is not covered by Test Suite A. Now, if we look at the coverage
provided by Test Suite B (see the below screenshot), line number 29 is also covered. So, the generated
Test Suite B increased the branch coverage.
Now, let’s look at the line coverage.

As you can observe from the above image, line number 32 is not covered. It is a return statement;
the 'has_alternative_bit' function itself is covered (which is why it is not highlighted in
red), but this return statement is still not covered. Therefore, Test Suite A does not cover this line in
this particular scenario. But if we look at the coverage provided by Test Suite B (see the below
screenshot), we can clearly see it provides full line coverage.

We have already discussed many scenarios above and can easily conclude that Test Suite B
improved the code coverage greatly (from 76% to 90%). This shows the importance of automated
test generation in software testing!
Discussion
I faced several challenges during this lab work:

• Installation of tools like 'genhtml' and 'lcov' was quite tricky; I had to
read their documentation to install them properly.
• After cloning Keon's repository, I got a 'ModuleNotFoundError' when I tried to run
the pytest command. The fix was shared by the course TA on Google Classroom,
who suggested running the "pip install -e ." command.
• Following the ModuleNotFoundError, I faced syntax errors in the existing test files
within Keon's repository. The terminal output was quite clear in indicating where these
errors were, and it was simple for me to correct them.
• Running Pynguin to generate test files took significant time. The entire process took
over 14 hours to generate test files for every Python program included within
the repository.
• I needed to fix import and attribute errors in test files created by Pynguin. The import
statement problem occurred in many files.
Through the process of overcoming these issues, I learned many lessons:

• I realized the importance of creating environments and installing dependencies in the
correct manner.
• I got hands-on experience in fixing Python syntax errors and import errors in Python
programs.
• I learned how to use Pynguin for automated test creation and how to manually modify
generated test cases when needed.

Conclusion
This laboratory experiment shows how effective it is to apply automated test generation using
tools like Pynguin to improve code coverage. Despite installation and syntax problems, the
test files produced by Pynguin improved the overall coverage to 90% from 76%. By combining
the initial test files (Test Suite A) with the test files generated (Test Suite B), the overall
coverage further improved to 94%. The improved coverage shows the importance of
automated testing in identifying untested branches, lines, and functions. Future
developments can be directed towards improving the accuracy of generated test cases by
Pynguin.

-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x- End of Lab 05 Report -x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-

Course: Software Tools & Techniques for CSE (CS202)

Date: 13-Feb-2025
Name: Nishit Prajapati
ID: 24120036

Lab 06 Report
Python Test Parallelization

Overview
In the previous lab assignment, we used a tool named pytest, which executes test
cases sequentially by default. So, in this lab assignment, we are going to execute test cases
in parallel. To do this, we are going to use two extra pytest plugins which make pytest execute
tests in parallel. After this, we compare the performance of sequential and parallel
test execution through the average execution time and the speed-up ratio.

Introduction and Tools


I have already introduced the pytest tool in the lab 5 report. Now, let’s look at the 2 plugins
we are going to use for test parallelization.

• Pytest-xdist: It is a pytest plugin that allows test cases to run in parallel at the process
level across multiple CPUs.
• Pytest-run-parallel: It is also a pytest plugin that allows test cases to run in parallel but
at the thread level. This allows multiple test cases to run concurrently within the same
process.
We don’t have many tools to talk about in this lab assignment. So, let’s look at the setup
required.

Setup
We just need to create a virtual environment and install the plugins discussed above to
complete this lab assignment. I performed this lab activity on Windows, as it was not
explicitly mentioned to use any VM (virtual machine) like in the previous lab 5 assignment.
Since we are using Windows, to activate the virtual environment we use the "Scripts/activate"
command in place of the "bin/activate" command used on Linux-based systems. The rest of the
process, like creating the virtual environment, is the same as on the SET-IITGN-VM
system (Ubuntu).

After activating the virtual environment, install the tool and its plugins using the "pip install
<tool_name> <plugin_name>" command. See the below screenshot for the
implementation of this command.
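As a minimal sketch (environment name assumed), the setup on Windows looks like this:

python -m venv venv
venv\Scripts\activate
pip install pytest pytest-xdist pytest-run-parallel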

Now, we have everything we need to complete this lab assignment, i.e., our setup is complete!

Methodology and Execution


First, we clone Keon's algorithms repository using the "git clone <URL of Keon's
repository>" command.

Then move the current directory into the algorithms directory using “cd algorithms” command
in the terminal.
To get the current commit hash, we will use the “git rev-parse HEAD” command.

So, the current commit hash is cad4754bc71742c2d6fcbd3b92ae74834d359844.


Now from previous lab 5 assignment, we know that if you use the “pytest” command without
using the “pip install -e .” command it will give “ModuleNotFoundError”. So, run the command
“pip install -e .” in the terminal.

After this, I used the "pytest" command and the same errors as in the previous lab 5
assignment occurred. I resolved those errors in the same way as I did in the lab 5 activity, so I
am not mentioning those errors and their solutions here again. Below are the screenshots
which show the successful execution of the 'pytest' command.

As the lab assignment requires executing the existing full test suite
sequentially ten times, I created a shell script which runs the "pytest" command 10 times.
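A minimal sketch of such a script (the file name run_seq.sh is taken from the note below) is:

#!/bin/bash
# run_seq.sh -- run the full test suite sequentially ten times
for i in $(seq 1 10); do
    pytest
done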

Note: To run the above shell script, open a bash terminal, activate the
virtual environment and use “./run_seq.sh” command in the bash
terminal.
The above shell script ran the 'pytest' command 10 times, and in doing so some new errors and
failures occurred: these are failing and flaky tests. Some screenshots which
show the errors that occurred while running the 'pytest' command 10 times are as follows:
I solved the above error by the following modifications in the code.

Before After

And, I solved the above error by the following modifications in the code.

Before After

After resolving all these errors and failures, I again ran the shell script using “./run_seq.sh”
command and this time it ran successfully without any errors or failures. See below screenshot
for proof.

But I still felt that maybe I had not really identified all the flaky tests. When I looked on the internet, I found
a pytest plugin which identifies flaky tests. Here is the link to that plugin's documentation.
So, I installed that plugin using the command "pip install pytest-flakefinder". After installing it,
I used the plugin to find the flaky tests by executing the command "pytest
--flake-finder" in the terminal. Below are the screenshots which show the implementation of
these commands.
As you can observe from the above screenshot, there are no flaky tests in the full test suite
after removing the errors and failures discussed above. After doing this, I was assured that
there are no flaky tests.
Next, I used the command "pytest -v" three times and noted the execution time. The time
taken by pytest to execute the full test suite sequentially is as follows:

The execution times for sequentially running the tests 3 times are 8.77, 7.76 & 8.78 seconds.
The average comes out to be [(8.77 + 7.76 + 8.78) / 3] = 8.44 seconds = Tseq.
Now, we will do the parallel test execution. In the lab 6 assignment, it was instructed to jointly
consider the process level & thread level in the parallel configuration. So, if we look at all the
possible combinations, we have 8 configurations:

• pytest -n auto --dist load --parallel-threads auto


• pytest -n auto --dist load --parallel-threads 1
• pytest -n auto --dist no --parallel-threads auto
• pytest -n auto --dist no --parallel-threads 1
• pytest -n 1 --dist load --parallel-threads auto
• pytest -n 1 --dist load --parallel-threads 1
• pytest -n 1 --dist no --parallel-threads auto
• pytest -n 1 --dist no --parallel-threads 1
Let’s analyze the above command. In the above commands, ‘-n auto’ means the system will
assign the number of core (core here means the number of cores in the processor) to run the
tests. Generally, it assigns every core available in the system. In simple words, if a laptop has
12 core processor, then it will assign all cores to run the tests. So, if ‘-n 1’ is used in the
command then only 1 core will work on the parallel test execution. ‘load’ & ‘no’ are

23
parallelization mode of pytest-xdist plugin. ‘--parallel-threads auto’ means the system will
assign the number of threads to work in the parallel test execution. So, if ‘--parallel-threads
1’ is used in the command then only 1 thread will work on the parallel test execution.
Now, I made 8 shell scripts, one for each of the 8 commands discussed above, and calculated the average
test execution time. Let's look at some of them:

1. pytest -n auto --dist load --parallel-threads auto


Below screenshots show the shell script which runs this command 3 times.
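As a sketch (script name assumed), each of these shell scripts has the same shape and differs only in the pytest flags it passes, for example:

#!/bin/bash
# run_par_1.sh -- run configuration 1 three times and record the timings
for i in 1 2 3; do
    pytest -n auto --dist load --parallel-threads auto
done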

The above shell script ran the above command 3 times. As you can observe from the above
image, my laptop has a 12-core processor. I recorded the execution time as well as the
flaky tests caused by parallelization. With the above command, 4 flaky tests occurred; they are
shown in the below screenshot.

Time taken by each run:

So, the average execution time is [(71.99 + 57.15 + 51.10) / 3] = 60.08 seconds = Tpar

2. pytest -n auto --dist load --parallel-threads 1


Below screenshots show the shell script which runs this command 3 times.

The above shell script ran the above command 3 times. You may observe that in the above
command '--parallel-threads 1' is used, which means only one thread is working on the parallel
test execution. As only 1 thread is used, in other words, each worker executes its tests
sequentially. Therefore, no flaky tests occurred.
The average execution time is [(8.23 + 7.83 + 7.95) / 3] = 8.003 seconds = Tpar

3. pytest -n auto --dist no --parallel-threads auto


Below screenshots show the shell script which runs this command 3 times.

The above shell script ran the above command 3 times. In this case, the 'no' parallelization
mode is used, unlike command 1 (notice each command is given a number), in which the 'load'
parallelization mode was used.
As you can observe from the below images, 3 flaky tests occurred in the 1st run of the
above command, but in the 2nd & 3rd runs, 4 flaky tests were found. All the flaky tests are
highlighted in the below images.
Time taken by each run:

So, the average execution time is [(45.66 + 45.89 + 44.87)/3] = 45.47 seconds = Tpar

4. pytest -n auto --dist no --parallel-threads 1


Below screenshots show the shell script which runs this command 3 times.

The above shell script ran the above command 3 times. You may observe that the above
command is quite similar to command 2; only the parallelization mode is changed, i.e.,
from 'load' to 'no'.
The average execution time for this command is:
[(7.07 + 7.20 + 7.72) / 3] = 7.33 seconds = Tpar

So far I have described the steps I took to calculate the average
execution time for the above four commands. Similarly, I calculated the average execution
time for the other four commands. The only difference is that now the number of workers is
fixed to 1, i.e., only one worker process will run the parallel test
execution. So, unlike for the above four commands, I am not going to give every
calculation and explanation for the other four commands; I will directly report the
results along with the count of flaky tests found when using each particular command.

5. pytest -n 1 --dist load --parallel-threads auto


Below screenshots show the shell script which runs this command 3 times.

Flaky tests:

Time taken by each run:

The average execution time for this command is : Tpar = 58.36 seconds

6. pytest -n 1 --dist load --parallel-threads 1


Below screenshots show the shell script which runs this command 3 times.

Flaky tests: 0 (no flaky tests found)


Time taken by each run:

The average execution time for this command is : Tpar = 7.17 seconds

7. pytest -n 1 --dist no --parallel-threads auto
Below screenshots show the shell script which runs this command 3 times.

Flaky tests:

Time taken by each run:

The average execution time for this command is : Tpar = 59.72 seconds

8. pytest -n 1 --dist no --parallel-threads 1


Below screenshots show the shell script which runs this command 3 times.

Flaky tests: 0 (no flaky tests found)


Time taken by each run:

The average execution time for this command is : Tpar = 6.83 seconds

Now, we will calculate and compare the speed-up ratios for the different parallelization modes
and worker counts, as well as analyse the flaky tests, in the next section of this report.
Results and Analysis
We will now analyse the results found during the parallel test execution and inspect the
causes of the test failures in the parallel runs.
After running all the above commands, I found that the flaky tests identified by each
command were the same. Also, all 4 flaky tests only occurred when the thread count was
more than 1. So, I inspected these 4 flaky tests.
All four flaky tests, along with the reasons for their occurrence, are as follows:

1.
Causes of Failure:

• File System Contention: The test creates and deletes files during execution. So,
when it is executed in parallel, multiple tests might attempt to create or delete
these files simultaneously, leading to unexpected behaviour or file corruption.
• Shared State: The test uses class-level file names which are shared across all
instances of the test class. This can cause issues if multiple tests run in parallel and
attempt to write to or read from the same files.
• Timing Issues: The test involves file I/O operations, which can be slow and
unpredictable. In a parallel environment, tests might interfere with each other's
file operations, leading to unexpected results.

2.
3.
Causes of Failure for both the above flaky tests:

• Shared Heap Instance Across Threads: The test runner uses threading and reuses
the same 'TestBinaryHeap' instance across threads; the 'setUp' method may
initialize 'self.min_heap' once, and subsequent parallel test executions could
modify the same instance concurrently. The 'remove_min()' and 'insert()'
methods are called concurrently across threads and modify the same heap instance. In
simple words, threads may interfere with each other's heap state.
• Non-Atomic Operations: During heap updates two threads may insert elements
simultaneously, leading to corrupted heap structure.
• Non-Deterministic Heap State: The assertion ‘self.assertEqual’ assumes a specific
heap structure after ‘remove_min’ method execution. Parallel runs might reorder
operations (even in isolated instances), causing unexpected heap states.

4.
Causes of Failure:

• Shared Linked List Instances Across Threads: Class-level attributes (self.l or self.l1)
are shared between test instances, so parallel threads may modify the
same linked list simultaneously. In simple words, one thread may alter the linked
list (e.g., reversing nodes) while another is traversing it, causing inconsistent
results.
• Mutability of Linked List Nodes: Two functions (is_palindrome_stack &
is_palindrome) modify the linked list structure (splitting the list during
reversal). In parallel runs, one thread's modification (e.g., splitting the list) may
leave the linked list in a broken state for other threads.
Now that we have discussed the causes of the flaky tests, what about the speed-up ratio for all
the commands used for parallel test execution?
In the above section, we calculated the average execution time for both sequential and
parallel test execution. We define the speed-up ratio as Tseq/Tpar. I manually
calculated the speed-up ratio for each parallelization mode and worker count using a
calculator. Below is an execution matrix that contains every result obtained from the above
steps, along with the respective speed-up ratios.
Note: When "--parallel-threads auto" is used in a command, the number of threads involved in
the parallel test execution is unknown (the system chooses it). So, in the below matrix the label
"Unknown" is used in some cells.

Command used | Parallelization mode | Worker count | Thread count | Average execution time (seconds) | Speed-up ratio | Failure rate ((x failed out of 1248) = (x/1248)*100%)
pytest -n auto --dist load --parallel-threads auto | load | 12 | Unknown | 60.08 | 0.14 | 0.96%
pytest -n auto --dist load --parallel-threads 1 | load | 12 | 1 | 8.003 | 1.05 | 0%
pytest -n auto --dist no --parallel-threads auto | no | 12 | Unknown | 45.473 | 0.186 | 0.88%
pytest -n auto --dist no --parallel-threads 1 | no | 12 | 1 | 7.33 | 1.151 | 0%
pytest -n 1 --dist load --parallel-threads auto | load | 1 | Unknown | 58.36 | 0.145 | 0.96%
pytest -n 1 --dist load --parallel-threads 1 | load | 1 | 1 | 7.17 | 1.177 | 0%
pytest -n 1 --dist no --parallel-threads auto | no | 1 | Unknown | 59.72 | 0.141 | 0.96%
pytest -n 1 --dist no --parallel-threads 1 | no | 1 | 1 | 6.83 | 1.24 | 0%

In the above matrix, the failure rate is calculated on the basis of the number of tests failed. For
example, in the execution of the first command, 4 flaky tests were found in the first, second and
third runs, so the total number of flaky tests is 12 (4+4+4). Also, we know from the
sequential test execution that the total number of tests which should ideally pass is 416 per run.
And as we ran each command three times, the total number of tests which should
pass is 1248 (416+416+416). Therefore, the failure rate is calculated using the formula
(x / 1248) * 100 %, where x is the total number of tests failed.

I also wrote a Python script to visualize the speed-up ratios for each command. The Python script
as well as its output is as follows:
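Since the screenshot of the script is not reproduced here, below is a minimal sketch of such a script (the file name and hard-coded values are assumptions, taken from the matrix above):

# plot_speedup.py -- visualize the speed-up ratio of each parallel configuration
import matplotlib.pyplot as plt

t_seq = 8.44
t_par = [60.08, 8.003, 45.473, 7.33, 58.36, 7.17, 59.72, 6.83]
labels = [f"config {i}" for i in range(1, 9)]
speedup = [t_seq / t for t in t_par]

plt.bar(labels, speedup)
plt.axhline(1.0, color="red", linestyle="--", label="sequential baseline")
plt.ylabel("Speed-up ratio (Tseq / Tpar)")
plt.title("Speed-up ratio per parallel configuration")
plt.legend()
plt.tight_layout()
plt.savefig("speedup.png")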

Output of the above python script:

These speed-up ratios show how the speed of parallel test execution compares to
sequential test execution. When more than 1 thread was used, the speed-up ratio
came out to be less than 1. This shows that multiple threads actually increased the parallel
execution time. This is because Keon's repository contains some Python tests which are not
thread-safe, as discussed above.
Now, if we report our analysis of Parallelization Success/Failure Patterns, we can say,

Successes: Configurations using '--parallel-threads 1' (a single thread) resulted in a speed-up
of more than 1. This shows that this specific test suite still contains some tests which are not
yet ready to be executed using multiple threads.

Failures: Configurations using '-n auto' along with '--parallel-threads auto' always caused a
slowdown (speed-up ratios of 0.14 to 0.19). This is strong evidence that spawning multiple
processes and threads automatically results in huge overhead and contention within this
specific test suite. The overhead is presumably process creation and inter-process
communication. We can say the slowdown reflects the fact that the individual tasks are small and
the number of threads is large.
Based on these findings, the project at this point is not compatible with large-scale
parallel testing. Throwing more threads and processes at the test suite merely exacerbates
the issue.
Possible Enhancements:
Test Isolation: Every test should be executed in isolation from other tests, with no common
mutable state. Shared state is one of the main sources of contention and induces
unstable results in parallel executions. Fixtures should be employed only to set up fresh,
isolated environments per test.
Minimize Test Interdependencies: Tests that depend on an order of execution or on common
resources are hard to parallelize effectively.
Optimize Individual Tests: Identify the slowest tests and determine where optimization
opportunities are present. Quicker individual tests will reduce the total execution time and
enable better parallelization.
Manual Thread Management: Set the number of threads to use manually instead of auto.
Begin with a low number of threads and increase it incrementally, keeping an eye on performance
to find the sweet spot.
Some recommendations for the pytest developers to make pytest more thread-safe:

• Enhance the fixture system to make stronger promises of isolation between tests. For
example, introduce a feature that offers the capability to automatically deep copy or
clone fixture data in a way that every test gets a new and isolated copy.
• Enforce a tagging mechanism for declaring tests "thread-safe" or "thread-unsafe."
pytest would then be able to recognize thread-unsafe tests and then refuse to run
them in parallel.

• Provide tools or plug-ins to estimate the resource usage (CPU, memory, I/O) of
particular tests. This can help programmers identify tests that have the potential to
cause contention in parallel setups.
We have now completed every task mentioned in this lab assignment!

Discussion
I didn’t face many challenges during this lab work as it was quite simple compared to other
lab assignments. Just resolving all the failures and flaky tests after sequential test execution
was quite a hassle. Also, I wasn’t confident enough about whether I had identified all the flaky
tests or not, so I installed the flake-finder plugin of pytest.
But I learned many things, like

• Installing and understanding the plugins of pytest tools


• How to use the pytest-flakefinder plugin to detect and confirm flaky tests effectively.
• The impact of thread and process allocation on test execution efficiency and failure
rates.

Conclusion
This lab exercise increased my understanding of running tests in parallel with pytest and
its related plugins. Solving the issues faced during the sequential runs and confirming stability with
pytest-flakefinder increased my confidence in the test reliability. Testing with different
parallel settings provided useful insight into the influence of thread and process management
on performance and reliability at runtime.

-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x- End of Lab 06 Report -x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-

Course: Software Tools & Techniques for CSE (CS202)

Date: 20-Feb-2025 & 27-Feb-2025


Name: Nishit Prajapati
ID: 24120036

Lab 07 & 08 Report


Vulnerability Analysis on Open-Source Software Repositories

Overview
In this lab assignment, we have used a tool ‘bandit’, a static code analysis tool used to find the
security vulnerabilities. After executing this tool and obtaining the results, we are going to
answer three research question.

Introduction and Tools


In this lab assignment we are going to use ‘bandit’ tool. Let’s first understand what this tool
does.
Bandit is a static code analyser that finds security bugs in Python code. Bandit checks source
code for common security bugs, such as poor input validation, poor crypto usage, and unsafe
function usage. Bug scanning through the use of abstract syntax trees (ASTs), Bandit discovers
bugs without the need to run code(that's why it is static code analyser), and thus is a useful
tool in secure software development. It helps us to verify and ensure that applications meet
some defined security best practices before they are deployed.
Now that we know what Bandit is and why it is used, let’s go through the setup required to
complete this assignment.

Setup
For this assignment, I did not use the ‘SET-IITGN-VM’ as it was not explicitly mentioned in the
problem statement and instead completed it on a Windows system.
So, first I created three virtual environments, one for each of the three GitHub repositories. I referred to this
video to create the virtual environments. But there is a slight difference in activating the
created virtual environment compared to what is shown in the video: we activate the
virtual environment using the command "<virtual environment name>\Scripts\activate"
instead of the ".\bin\activate" command. The second command is used to activate the virtual
environment on a Linux system, while the first one is used on a Windows system. The below
screenshot shows how I created and activated one of the virtual environments.

To deactivate the virtual environment, use the "deactivate" command in the terminal.
Below screenshot shows its implementation.

The above screenshot also shows how I created the second virtual environment.
Similarly, I also created the third virtual environment.
Now install Bandit and PyDriller in each virtual environment using the "pip install bandit
pydriller" command. We need Bandit to find the vulnerabilities in the repository and
PyDriller to retrieve the hashes of the last 100 non-merge commits of that repository. Below
screenshots show its implementation.
Methodology and Execution
After completing the setup as mentioned above, I selected THREE large-scale open-source
repositories to analyze with Bandit. To find those repositories, I used the SEART GitHub Search
Engine. Below are the selection criteria I used to find the three large-scale open-source
repositories.

• Language should be Python (the search engine returned 99,281 results).
• Minimum stars: 30,000 and minimum forks: 18,000 (the search engine returned 34 results).
• From these, I randomly selected three repositories that I had never used before in any
previous lab assignment: transformers, scikit-learn and Python were selected!
The three repositories are:

I cloned each of these three repositories into a separate virtual environment, each created
earlier during the setup for this lab assignment.
To clone a repository, we use "git clone <URL of the repository>" in the terminal. Below
screenshots show the implementation of this command.

Also, only the 'Python' repository contained a requirements file. So, I installed all the
dependencies required for this repository using the command "pip install -r requirements.txt".

First, I had to retrieve the commit hashes of the last 100 non-merge commits to the main
branch. So, I used PyDriller and created a Python script which saves the commit hashes of the
last 100 non-merge commits in a CSV file. Below screenshot contains the Python script which I
used to do this task.
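As the screenshot is not reproduced here, a minimal sketch of such a script (file and column names are assumptions) is:

# get_commits.py -- save the hashes of the last 100 non-merge commits to a CSV file
import csv
from pydriller import Repository

commits = []
# iterate newest-to-oldest over non-merge commits only
for commit in Repository('.', order='reverse', only_no_merge=True).traverse_commits():
    commits.append(commit.hash)
    if len(commits) == 100:
        break

commits.reverse()   # restore oldest-to-newest order for these last 100 commits

with open('commit_hashes.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['commit_hash'])
    for h in commits:
        writer.writerow([h])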

As you can observe in the above script, I have used "order = 'reverse'" &
"only_no_merge=True" so that the commits are iterated in reverse order (newest to oldest)
while ignoring the merge commits. We store the commit hashes in a list and write
those commit hashes to a CSV file. You may also notice that I have reversed the list which stores
the hashes. Why? When we use "order = 'reverse'", the iteration starts with the 100th
commit and goes to the 1st commit of the last 100 commits of the repository. We want the last
100 commit hashes, but the order should be from the 1st commit to the 100th commit. Therefore, I
have reversed the list. Below screenshot contains the text from the creator of the PyDriller tool
which confirms the order of commits that PyDriller returns.

After running the above Python script for each of the three repositories, it created a CSV file
for each repository containing the commit hashes of the last 100 non-merge commits to
the main branch. Below screenshots show the output of the above Python script.
Now I knew that to find vulnerabilities in each commit of a repository, I needed to use the Bandit
tool, so I learned how to use it by referring to its documentation. As the lab assignment
mentioned that we have to scan only the Python files of the repository
at each commit, I created a '.bandit' configuration file which contained the extensions of
the files that should be excluded. Thus, when we run Bandit on the repository, it ignores the
files having the extensions listed in the '.bandit' configuration file. I inspected each
repository and, after identifying all files with extensions other than ".py", I included those
extensions in the configuration file so that Bandit ignores those files later while finding the
vulnerabilities in the repository.

I added the above configuration file to each of the three repositories. Then, I created a shell
script that reads commit hashes from the CSV file we just created, which contains the last 100
non-merge commit hashes, and runs Bandit across all the commits. The screenshot below
shows the shell script that performs this task.
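The screenshot is not reproduced here; a minimal sketch of such a script, based on the description that follows (file, folder and branch names are assumptions), is:

#!/bin/bash
# run_bandit.sh -- run Bandit on every commit listed in commit_hashes.csv
mkdir -p Commits
# skip the header line of the CSV produced by the PyDriller script
tail -n +2 commit_hashes.csv | while IFS=, read -r hash; do
    git checkout -q "$hash"
    # -r . scans the repository recursively (per the description below, the .bandit
    # file in the repository is then taken into account);
    # -f csv and -o write one CSV report per commit
    bandit -r . -f csv -o "Commits/${hash}.csv"
done
git checkout -q main    # return to the main branch afterwards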

This script first creates a folder named 'Commits'. After that, it reads the CSV file we
created using the Python script discussed earlier. As that CSV file contains the commit hashes,
the script checks out each commit hash and runs Bandit on it. The Bandit tool finds the
vulnerabilities in the repository at that particular commit and generates a report (a CSV file). Bandit
gives us the option to choose the output format, e.g. a JSON file, a CSV file, etc.; I opted for the CSV
format. The generated output CSV file contains "filename, test_name, test_id,
issue_severity, issue_confidence, issue_cwe, issue_text, line_number, col_offset, end_col_offset,
line_range and more_info". You may notice the "-r ." flag in the above shell script. Remember that I
created the ".bandit" configuration file earlier: when we use this flag, Bandit first looks for that
configuration file, and we know that file contains the extensions which Bandit should ignore.
So, when we run this shell script using the command "./run_bandit.sh", it creates a CSV
file for each of the last 100 non-merge commits of the repository and stores it in the 'Commits' folder.
Each file contains the labels High, Medium and Low for the severity and confidence of each issue. It
also contains a CWE (Common Weakness Enumeration) column. These collectively tell
us about the vulnerabilities in the repository at a particular commit. Below screenshots show
what the generated CSV files look like.

I performed the steps discussed above and generated the CSV files using the Bandit tool for each of the
three repositories. It took 3 to 4 hours for the 100 reports to be generated by Bandit for
the transformers repository, but for the other two repositories it took relatively less time.
Now, it was mentioned in the lab assignment that we need to report the number of HIGH,
MEDIUM and LOW confidence issues and severity issues. Also, we need to report the unique
CWEs (Common Weakness Enumeration) the Bandit tool identifies per commit. For this I
wrote a Python script which creates a CSV file and stores the number of HIGH, MEDIUM and
LOW confidence issues and severity issues for each commit in it. Below screenshots show the
Python script I used to do this task.
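Since the screenshots are not reproduced here, a minimal sketch of such a script (file and column names are assumptions) is:

# summarize_bandit.py -- count severity/confidence levels and unique CWEs per commit
import csv

with open('commit_hashes.csv', newline='') as f:
    hashes = [row['commit_hash'] for row in csv.DictReader(f)]

rows = []
for position, commit_hash in enumerate(hashes, start=1):
    severities, confidences, cwes = [], [], set()
    with open(f'Commits/{commit_hash}.csv', newline='') as f:
        for issue in csv.DictReader(f):
            severities.append(issue['issue_severity'].upper())
            confidences.append(issue['issue_confidence'].upper())
            cwes.add(issue['issue_cwe'])
    rows.append([position, commit_hash,
                 severities.count('HIGH'), severities.count('MEDIUM'), severities.count('LOW'),
                 confidences.count('HIGH'), confidences.count('MEDIUM'), confidences.count('LOW'),
                 len(cwes)])

with open('bandit_summary.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['commit_position', 'commit_hash',
                     'high_severity', 'medium_severity', 'low_severity',
                     'high_confidence', 'medium_confidence', 'low_confidence',
                     'unique_cwes'])
    writer.writerows(rows)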

This script created a CSV file named "bandit_summary" which contains everything we were asked
to report. Below screenshot shows what the bandit_summary.csv file looks like.

As you can see, the CSV file which our Python script created contains the commit number and
its corresponding counts of High, Medium and Low severity and confidence issues,
along with the number of unique CWEs (Common Weakness Enumeration). So, I have reported the
number of HIGH, MEDIUM and LOW confidence issues and severity issues along with the unique
CWEs by creating the 'bandit_summary.csv' file, which contains all these things. Here is the
link to the drive which contains the three bandit_summary.csv files for the three repositories.
Now, I will answer the three RQs (Research Questions) in the next section of this report.

Results and Analysis
I have already done the Individual Repository-level Analyses in the methodology section of
this report. In this section, I will do the Overall Dataset-level Analyses and try to answer the
three RQs asked in this lab assignment.
First, let’s define what are vulnerabilities and ho how a vulnerability is fixed.
So, Vulnerabilities in a repository are security weaknesses or flaws in the code that could
potentially be exploited by attackers. One of the main objectives of this lab assignment was
to teach us about the Bandit tool because this tool is a static code analysis tool which identifies
the vulnerabilities by scanning Python code for common security issues.
A vulnerability can be considered fixed when:

• The unique count of a specific severity (High, Medium, or Low) decreases from one
commit to the next in the development timeline.
• A particular CWE (Common Weakness Enumeration) that was present in a previous
commit is no longer reported in a present or future commit.
• The total number of unique vulnerabilities (across all severity and confidence levels)
decreases between consecutive commits.
Now, let’s answer the asked three RQs!
(a) RQ1 (high severity): When are vulnerabilities with high severity introduced and fixed
along the development timeline in OSS repositories?

Purpose:
The purpose of this research question is to examine the lifecycle of high-severity
vulnerabilities discovered in open-source software (OSS) repositories. In particular, it
seeks to determine trends in the timing of introducing such severe vulnerabilities into
the codebase and eventually resolving them. This research offers valuable information
regarding the security practices and the responsiveness of OSS projects in resolving
severe security vulnerabilities.

Approach:
In the methodology section of this report, I generated a csv file using a python script
which contained the commit number and its corresponding unique counts of High,
Medium and Low severity and confidence issues along with the unique
CWEs(Common Weakness Enumeration). I used this report to answer this question. I
created a python script which generated visualization of this csv file, thus making easy
for us to understand the aspects related to the high severity. Below is the screenshot
of that python script.
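As the screenshot is not reproduced here, a minimal sketch of such a script (file and column names are assumptions) is:

# plot_high_severity.py -- plot the high severity count across the last 100 commits
import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv('bandit_summary.csv')
plt.plot(range(1, len(df) + 1), df['high_severity'], marker='o')
plt.xlabel('Commit position (oldest to newest)')
plt.ylabel('High severity issue count')
plt.title('High severity issues across the last 100 non-merge commits')
plt.tight_layout()
plt.savefig('high_severity_timeline.png')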

This Python code first reads the 'bandit_summary.csv' file, which contains the High,
Medium and Low severity counts. Then it plots the high severity count across all 100
commits as a line chart, i.e., the y-axis shows the high severity count and the x-axis
has the corresponding commit number/position. The pattern in the line chart will
show how vulnerabilities with high severity are introduced and fixed along the
development timeline in these OSS repositories.

Results:
Let’s look at the results that above python script generated for each repository.

So, the first repository is Transformers by Hugging Face. The plot that above python
script generated for this repository is as follows:

As you can see from the above image, the result is quite surprising: it is a straight line,
without a single deflection (high severity count = 9). This suggests that the number of high
severity vulnerabilities never changed in the last 100 commits. At first, I thought maybe
I had generated a wrong report, but when I looked at the individual reports for each
commit generated by Bandit, the result shown in the image is actually true.

Those individual reports for each commit depicted the same thing. So, for this
repository we can say that the high severity vulnerabilities were never fixed in the last
100 commits, but we cannot say that high severity vulnerabilities are never fixed
along the development timeline in OSS repositories. Now, if we look at the
second (scikit-learn) and third (Python) repositories, the plots generated by the above
Python script for these repositories are as follows:

Scikit-learn Repository’s High Severity across all the 100 commits:

Python Repository’s High Severity across all the 100 commits:

As you can observe in the above two images, the same thing happened: the high-severity vulnerability count never changed in the last 100 commits! The count for the scikit-learn repository is 4 and for the Python repository it is 2. This again suggests that the high-severity vulnerabilities were never fixed in the last 100 commits, but we cannot claim that they were never fixed along the entire development timeline of these OSS repositories. Why?

To answer this, I counted the high-severity vulnerabilities in the Python repository (here ‘Python’ is the name of the repository) at the root (very first) commit. I found that in the 1st commit of the Python repository, the high severity count was 0 (see the image below). And we know that the high severity count in the Python repository over the last 100 commits is constant and equal to 2. This proves that high-severity vulnerabilities were introduced somewhere along the development timeline of this OSS repository. But, as we can see, the count never changes within the last 100 commits. Therefore, we cannot identify the introduction of high-severity vulnerabilities just by analyzing the last 100 commits. Also, as the plots suggest, those high-severity vulnerabilities are not getting fixed in the last 100 commits of these three OSS repositories either.

(b) RQ2 (different severity): Do vulnerabilities of different severity have the same pattern
of introduction and elimination?

Purpose:
The aim of this research question is to analyse whether vulnerabilities of different severity levels (High, Medium, and Low) exhibit common patterns in their introduction and elimination across the development timeline of open-source software (OSS) repositories. The analysis aims to show whether particular severity levels tend to be introduced or resolved more often at particular phases of development, providing insight into how security practices handle vulnerabilities of different severities.

Approach:
To answer this RQ, the approach was quite similar to the one used for the previous RQ. I created a Python script that reads the ‘bandit_summary.csv’ file, which contains the High, Medium and Low severity counts. It then plots the high, medium and low severity counts across all 100 commits as a line chart: the y-axis shows the severity counts and the x-axis shows the corresponding commit number/position. The patterns in the line chart show whether the different severities follow the same pattern of introduction and elimination. The screenshot below shows the Python script that implements this approach.
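As above, here is a minimal sketch of a script that produces this three-line plot; the column names (‘commit_position’, ‘high_count’, ‘medium_count’, ‘low_count’) are assumptions about the CSV layout rather than the exact headers from the screenshot.

# Minimal sketch (assumed column names as noted above)
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("bandit_summary.csv")

plt.figure(figsize=(10, 4))
plt.plot(df["commit_position"], df["high_count"], label="High", color="red")
plt.plot(df["commit_position"], df["medium_count"], label="Medium", color="orange")
plt.plot(df["commit_position"], df["low_count"], label="Low", color="green")
plt.xlabel("Commit position (oldest to newest of the last 100 commits)")
plt.ylabel("Unique issue count")
plt.title("Severity patterns across the last 100 commits")
plt.legend()
plt.tight_layout()
plt.savefig("severity_patterns.png")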

Results:
The output of the above Python script for each of the three repositories is as follows:
Transformers Repository’s severity patterns:

Scikit-learn Repository’s severity patterns:

Python Repository’s severity patterns:

I have added all the images that show the patterns exhibited by each of the three repositories for the three (High, Medium & Low) severity levels. Let’s interpret what these patterns mean. As discussed in the previous RQ, the high-severity vulnerabilities remain unfixed in the last 100 commits of all three repositories, but there are some differences in the patterns for the medium and low severities across the three repositories. For the Python repository, all (High, Medium & Low) severity levels remain the same, i.e., no new vulnerability is introduced and no older vulnerability is fixed in the last 100 commits (Medium severity count = 4 & Low severity count = 755).
If we look at the Transformers repository, the story remains the same for high-severity vulnerabilities, but there is fluctuation in the medium and low severity counts. From the above images, we can see that one medium-severity vulnerability is introduced, so the medium severity count increases from 641 to 642 near the beginning and then stays constant. If we look at the low severity for this repository, many new vulnerabilities are introduced as well as eliminated in the last 100 commits. Overall, at the end (100th commit) the number of low-severity issues is lower than it was at the beginning (1st commit of the last 100 commits), which means more low-severity vulnerabilities were eliminated than introduced.
If we look at the scikit-learn repository, the story remains the same for the high- and medium-severity vulnerabilities, but there are changes in the low severity count. The count changes continuously, which implies that many new low-severity vulnerabilities are introduced as well as eliminated in the last 100 commits. In this repository, however, the number of low-severity vulnerabilities at the end (100th commit) is higher than it was at the beginning (1st commit of the last 100 commits), which means more low-severity vulnerabilities were introduced than eliminated.
Still, from all the patterns discussed above, we cannot draw definitive conclusions about the introduction and elimination of vulnerabilities, as we are only looking at the last 100 commits. But we can say for sure that vulnerabilities of different severities do not have the same pattern of introduction and elimination in all cases. Overall, high-severity vulnerabilities are modified least frequently during development, possibly because their critical nature requires more extensive fixes; low-severity vulnerabilities are introduced and resolved most frequently, indicating they may be tied to minor code changes or less critical areas of the codebase; and medium-severity vulnerabilities are introduced and resolved less often than low-severity ones and more often than high-severity ones.
Although it was not asked, I also generated the ‘total’ severity distribution across all the commits. The Python script that shows this distribution as a pie chart is as follows:
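A minimal sketch of such a pie-chart script is shown below; again, the column names are assumptions about the summary CSV rather than the exact headers used in the screenshot.

# Minimal sketch (assumed columns: high_count, medium_count, low_count)
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("bandit_summary.csv")
totals = {
    "High": df["high_count"].sum(),
    "Medium": df["medium_count"].sum(),
    "Low": df["low_count"].sum(),
}

plt.figure(figsize=(5, 5))
plt.pie(list(totals.values()), labels=list(totals.keys()), autopct="%1.1f%%")
plt.title("Total severity distribution across all commits")
plt.savefig("severity_distribution.png")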

This script will generate the following output for each repository:

Transformers Repository’s severity distribution:

Scikit-learn Repository’s severity distribution:

Python Repository’s severity distribution:

(c) RQ3 (CWE coverage): Which CWEs are the most frequent across different OSS
repositories?

Purpose:
The objective of this research question is to examine which Common Weakness Enumerations (CWEs) are most prevalent across various open-source software (OSS) repositories. This assessment helps in understanding the most common security weaknesses in OSS so that developers can counter them more effectively.

Approach:
To answer this RQ, the approach was quite similar to the one used for the previous two RQs. I created a Python script that reads the ‘bandit_summary.csv’ file, which also contains the unique CWEs (Common Weakness Enumerations). It then plots the frequency of the unique CWEs across all 100 commits. The generated chart shows the most frequent CWEs across the different OSS repositories. The Python script below identifies and plots the unique CWEs.
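Here is a minimal sketch of how such a CWE-frequency plot could be produced. It assumes that the ‘unique_cwes’ column stores the CWE IDs observed in a commit as a semicolon-separated string, which is an assumption about the CSV layout rather than a description of the exact script in the screenshot.

# Minimal sketch (assumed column: unique_cwes, semicolon-separated CWE IDs)
from collections import Counter
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("bandit_summary.csv")

# Count in how many commits each CWE appears
counter = Counter()
for cwes in df["unique_cwes"].fillna(""):
    counter.update(filter(None, cwes.split(";")))

labels, counts = zip(*counter.most_common(10))

plt.figure(figsize=(10, 4))
plt.bar(labels, counts)
plt.xlabel("CWE ID")
plt.ylabel("Number of commits in which the CWE appears")
plt.title("Most frequent CWEs across the last 100 commits")
plt.xticks(rotation=45)
plt.tight_layout()
plt.savefig("cwe_frequency.png")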

Results:
The output of the above Python script for each of the three repositories is as follows:
Transformers Repository’s most frequent CWEs:

Scikit-learn Repository’s most frequent CWEs:

Python Repository’s most frequent CWEs:

The above three images show the most frequent CWEs across the 100 commits. Let’s look at each CWE and what it means. Before that, we will perform a union operation over the CWEs of the three repositories.

Transformers repository’s most frequent CWEs:
CWE-78, CWE-703, CWE-259, CWE-502, CWE-330, CWE-400, CWE-22, CWE-377, CWE-89, CWE-20

Scikit-learn repository’s most frequent CWEs:
CWE-502, CWE-703, CWE-377, CWE-22, CWE-78, CWE-400, CWE-94, CWE-327, CWE-330, CWE-259

Python repository’s most frequent CWEs:
CWE-703, CWE-330, CWE-327, CWE-22, CWE-78, CWE-502, CWE-259
So, to get the total most frequent CWEs across these three repositories, we perform a
union operation. Therefore, we get
Transformer Repository most frequent CWEs U Scikit-learn Repository most frequent CWEs U Python Repository most frequent CWE:
CWE-78, CWE-703, CWE-259, CWE-502, CWE-330, CWE-400, CWE-22, CWE-377,
CWE-89, CWE-20, CWE-94, CWE-327.
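This union can be reproduced directly with Python sets; the three lists below are taken from the charts shown above.

# Union of the most frequent CWEs observed in the three repositories
transformers_cwes = {"CWE-78", "CWE-703", "CWE-259", "CWE-502", "CWE-330",
                     "CWE-400", "CWE-22", "CWE-377", "CWE-89", "CWE-20"}
sklearn_cwes = {"CWE-502", "CWE-703", "CWE-377", "CWE-22", "CWE-78",
                "CWE-400", "CWE-94", "CWE-327", "CWE-330", "CWE-259"}
python_cwes = {"CWE-703", "CWE-330", "CWE-327", "CWE-22", "CWE-78",
               "CWE-502", "CWE-259"}

all_cwes = transformers_cwes | sklearn_cwes | python_cwes
print(sorted(all_cwes))   # 12 distinct CWEs in total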
Now, let’s talk about each CWE we found in the three repositories. I have made the above-mentioned CWEs hyperlinks, so each one will take you to the page from which I understood that particular CWE.
1. CWE-78 (OS Command Injection): It is a situation where untrusted input is
mistakenly interpreted as an operating system command, enabling attackers to run
any command they like.
2. CWE-703 (Improper Handling or Check of Exceptional Conditions): If errors or
exceptions are not handled appropriately, then unexpected problems can be
caused. This includes such security issues as information leakage or denial-of-
service.
3. CWE-259 (Hard-coded Passwords): Hard-coded passwords in code can allow
attackers to gain access via unauthorized pathways if the code is revealed.
4. CWE-502 (Untrusted Data Deserialization): The insecure deserialization process
makes it possible for attackers to run arbitrary code by tampering with serialized
objects before deserialization.
5. CWE-330 (Use of Insufficiently Random Values): Using predictable or insufficiently random values in security-sensitive contexts weakens security and makes attacks such as brute-force guessing more effective.
6. CWE-400 (Uncontrolled Resource Consumption): Uncontrolled use of resources (e.g., CPU, memory, network) can lead to denial-of-service (DoS) conditions, for example when an attacker triggers unbounded recursion or loops.

7. CWE-22 (Path Traversal): Malicious input manipulation may be employed to
enable the attackers to access files outside the target directory, thereby exposing
sensitive system files.
8. CWE-377 (Insecure Temporary File): Creating or handling temporary files in an insecure way (for example, in a predictable shared temporary directory) can allow attackers to read or tamper with sensitive data.
9. CWE-89 (SQL Injection): It is a situation where user input is included in SQL
statements without filtering. This attack can be used by malicious users to modify
the database or extract confidential data.
10. CWE-20 (Improper Input Validation): Failure to validate user input correctly can
lead to security issues such as buffer overflows, injection attacks, and system
crashes.
11. CWE-94 (Code Injection): Occurs when untrusted data is interpreted and executed as code, enabling an attacker to run arbitrary code within an application.
12. CWE-327 (Use of Broken or Risky Cryptographic Algorithm): Using weak or outdated cryptographic algorithms weakens data protection and exposes encrypted data to attack.

So, we have looked at each CWE and also answered each RQ asked in this lab assignment.

Discussion
I didn’t face many challenges during this lab work, but some of them are as follows:

• Creating the ‘.bandit’ configuration file took some time.


• The plots generated surprised me, so I initially thought I had done something wrong,
but in the end, everything was correct. It was just the consequence of looking at only
the last 100 commits.
I also learned some new things during this lab work, such as:

• Learned how to install, execute, and configure the Bandit tool for security vulnerability
analysis.
• Learned how to answer research questions effectively based on data analysis and
visualization.

Conclusion
This lab assignment aimed to examine security weaknesses in open-source Python projects
using Bandit, and in particular, Common Weakness Enumeration (CWE) coverage and severity
trends over commits. The findings revealed that some vulnerabilities, especially high-severity
ones, remained unfixed, while medium and low-severity vulnerabilities fluctuated more.
However, since this lab assignment only looked at the last 100 commits of each repository, the findings were somewhat limited in their ability to show long-term security trends. Analyzing a larger number of commits would allow a stronger observation of how vulnerabilities both emerge and get addressed over time. This work underscored the need for security inspection in open-source development and the value of static analysis tools such as Bandit for identifying likely vulnerabilities.

-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x- End of Lab 07 & 08 Report -x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-
