INTERNSHIP_DOC[2]

The internship report by Ankilla Harshitha details her experience at Aimr Edu LLP, focusing on predicting the 10-year risk of cardiovascular diseases using logistic regression on a dataset from the UCI Machine Learning Repository. The report highlights the importance of early detection and the role of various risk factors in heart disease prognosis, while also outlining the technical skills and methodologies employed during the internship. Overall, the internship enhanced her understanding of data science applications in healthcare and solidified her interest in pursuing a career in health data analytics.
INTERNSHIP REPORT

A Report Submitted in partial fulfillment of the


requirements for the award of the degree of

BACHELOR OF TECHNOLOGY
in
COMPUTER SCIENCE AND ENGINEERING
Submitted by,
ANKILLA HARSHITHA
[Reg. No: 22J41A05D1]

Under the Supervision of


Mr. P. Revanth Reddy
CTO, Aimr Edu LLP, Hyderabad
(Duration: 8th May, 2024 to 7th June, 2024)

COMPUTER SCIENCE AND ENGINEERING

MALLA REDDY ENGINEERING COLLEGE


(An Autonomous Institution)
Maisammaguda, Secunderabad, Telangana, India 500100

MAY – 2024
MALLA REDDY ENGINEERING COLLEGE
Maisammaguda, Secunderabad, Telangana, India 500100

CERTIFICATE
This is to certify that the “Internship Report” submitted by ANKILLA HARSHITHA
(Roll No: 22J41A05D1) is work done by her and submitted during the 2024 – 2025 academic
year, in partial fulfillment of the requirements for the award of the degree of
BACHELOR OF TECHNOLOGY in COMPUTER SCIENCE AND ENGINEERING at MALLA REDDY
ENGINEERING COLLEGE, Hyderabad, Telangana.

SIGNATURE
Mr. D. PAVAN KUMAR
INTERNSHIP COORDINATOR
Assistant Professor
Department of CSE
Malla Reddy Engineering College
Secunderabad, 500 100

SIGNATURE
Dr. P.S.R.C. MURTY
HEAD OF THE DEPARTMENT
Professor
Department of CSE
Malla Reddy Engineering College
Secunderabad, 500 100

ACKNOWLEDGEMENT

First, I would like to thank Mr. P. REVANTH REDDY, CTO of Aimr Edu LLP, Hyderabad, for
giving me the opportunity to do an internship within the organization.

I would also like to thank all the people who worked with me at Aimr Edu LLP, Hyderabad;
with their patience and openness, they created an enjoyable working environment.

It is indeed with a great sense of pleasure and immense sense of gratitude that I acknowledge
the help of these individuals.

I am highly indebted to the Principal, Dr. A. RAMASWAMI REDDY, for the facilities provided
to accomplish this internship.

I would like to thank my Head of the Department, Dr. P.S.R.C. MURTY, for his constructive
criticism throughout my internship.

I would like to thank Mr. D. PAVAN KUMAR, department internship coordinator, for his
support and advice in obtaining and completing the internship at the above-mentioned
organization.

I am extremely grateful to my department staff members and friends who helped me in the
successful completion of this internship.

ANKILLA HARSHITHA

22J41A05D1
ABSTRACT

Cardiovascular diseases (CVDs) are among the leading causes of mortality worldwide, with
the World Health Organization estimating around 17.9 million deaths annually. In developed
countries such as the United States, cardiovascular diseases are responsible for half of all
deaths. Early detection and prevention of heart disease are critical for improving patient
outcomes and reducing the overall risk. This research aims to identify the most significant
risk factors for coronary heart disease (CHD) and to predict the 10-year risk of CHD
using logistic regression.

The dataset, sourced from an ongoing cardiovascular study of residents in Framingham,
Massachusetts, includes over 4,000 patient records with 15 demographic, behavioral, and
medical attributes. These attributes include age, sex, smoking habits, blood pressure,
cholesterol levels, glucose levels, and more. The classification task is to predict whether a
patient has a 10-year risk of developing CHD, with the binary outcome labeled as either "1"
(Yes) or "0" (No).

The overall objective is to create a robust model that can support healthcare professionals in
making informed decisions about early intervention and lifestyle modifications for high-risk
patients.

By analyzing various risk factors and building a predictive model, this project contributes to
the field of cardiovascular disease prevention, providing valuable tools to improve public
health outcomes.

Keywords:

Coronary Heart Disease (CHD), Cardiovascular Diseases, Logistic Regression, Predictive
Modeling, Public Health, Early Detection, 10-Year Risk Prediction, Medical Risk Factors
INDEX

S.No. Contents

ACKNOWLEDGEMENT

ABSTRACT

1. About the Organization (Background Information)

2. Objectives of the Internship

3. About the Internship Project

4. Technical Observations and Learnings from Internship Program

5. Outcome of the Internship

6. Conclusion

7. Appendices

8. References

1. ABOUT THE ORGANIZATION

AimR Edu LLP


History of the Organization
AimR Edu LLP was established in 2024, specializing in various IT training and educational
services. The company was founded to address the growing need for skill development in
emerging technologies, including Java Full Stack, Data Science, Machine Learning, and Deep
Learning. Since its inception, AimR has focused on delivering quality education to students
and professionals, offering both industrial training and internships.

Vision and Mission of the Organization


Vision: To become a leading provider of cutting-edge educational services, empowering
students with the skills necessary for the global IT industry.

Mission: AimR Edu aims to provide top-tier educational resources, ensuring that students
are equipped with practical, industry-relevant knowledge through hands-on experience and
real-time projects. The organization is committed to fostering continuous learning and
career growth.

Sector and Services


AimR Edu operates in the IT and Education sector, providing a range of technical training
programs. The key services offered include:
• IT Training: In areas like Java Full Stack, Data Science, Machine Learning, and Deep
Learning.
• Internships and Industrial Training: Practical experience for students looking to apply
theoretical knowledge in real-world scenarios.
• Campus Recruitment Training (CRT): Preparing students for placements by honing their
technical and soft skills.

Customers

The primary customers of AimR Edu include:
• Students: Individuals looking to build or enhance their IT skills.
• Corporates and Institutions: Seeking professional development programs or specific
training modules for their employees.
• End Users: Employers hiring AimR-trained professionals, and universities incorporating
the training into their curricula.

Layout

While AimR does not operate a physical factory, its training centers are set up with
state-of-the-art infrastructure for conducting technical sessions, coding labs, and
interactive workshops. The layout includes:
• Lecture Rooms: Equipped with projectors and digital whiteboards.
• Coding Labs: For hands-on practice.
• Conference Rooms: For group discussions and project presentations.

Departments

The organization is divided into the following departments:
• Training & Development: Handles the curriculum, courses, and training schedules.
• Operations: Manages day-to-day activities, including student onboarding and resource
management.
• HR and Recruitment: Responsible for hiring trainers, onboarding students, and conducting
internal evaluations.
• Marketing & Sales: Focuses on promoting courses, reaching potential students, and
building corporate partnerships.

Process Flow

Here is an overview of AimR’s training process:

Student Enrollment → Orientation & Counseling → Course Allocation → Hands-on Learning →
Project Work → Assessments → Internship/Placement.

Each step ensures that the student is provided with a well-rounded education in the IT
domain.

Work Culture at AimR

At AimR Edu LLP, our work culture is built on innovation, collaboration, and continuous
growth. We emphasize collaborative learning, promoting teamwork and knowledge sharing
to enrich our training programs. A student-centric focus drives us, ensuring that every team
member is dedicated to improving the student experience with open feedback channels. We
encourage innovation, allowing employees to experiment with new teaching methods and
technologies, while fostering inclusivity and diversity by celebrating different backgrounds
and perspectives. Flexibility in work arrangements supports a healthy work-life balance, and
we believe in recognizing hard work through awards and incentives, motivating our team to
strive for excellence. Together, we shape the future of IT education in a dynamic, supportive
environment.

2. OBJECTIVES OF THE INTERNSHIP

• Enhance Memory Retention: Leverage spaced repetition and retrieval practice to improve
long-term memory consolidation, helping learners retain information over extended periods.
• Facilitate Active Learning: Encourage active engagement by prompting learners to recall
information, promoting deeper comprehension and participation in the learning process.
• Support Spaced Repetition: The spaced repetition technique, inherent in flashcard use,
involves reviewing information at increasing intervals. This systematic approach optimizes
memory recall and long-term retention by strategically spacing out exposure to content.
• Promote Quick Recall: Develop the ability to quickly retrieve knowledge through brief,
focused flashcard interactions, simulating real-world scenarios that require rapid responses.
• Customize Learning Pace: Flashcards are adaptable to individual learning speeds and
preferences. Learners can progress at their own pace, spending more time on challenging
cards while swiftly navigating through familiar ones.
• Address Different Learning Styles: Flashcards can accommodate various learning styles,
catering to visual, auditory, or kinesthetic learners. Incorporating images, mnemonics, or
concise textual information on cards can appeal to diverse learning preferences.
• Promote Self-Assessment: Flashcards enable self-assessment and self-directed learning.
Learners can gauge their understanding by testing themselves, identifying areas of strength
and weakness, and adjusting their study strategies accordingly.
• Encourage Active Recall: The act of recalling information from memory, as prompted by
flashcards, strengthens neural pathways associated with that information. This active recall
enhances understanding and contributes to long-term knowledge retention.

3. ABOUT THE INTERNSHIP PROJECT


Cardiovascular diseases (CVDs) are the leading cause of death globally, taking an estimated
17.9 million lives each year, which accounts for 31% of all deaths worldwide. Four out of
five CVD deaths are due to heart attacks and strokes, and one-third of these deaths occur
prematurely in people under 70 years of age. Heart failure is a common event caused by
CVDs, and the features in this dataset can be used to predict possible heart disease.
People with cardiovascular disease or at high cardiovascular risk (due to the presence of
one or more risk factors such as hypertension, diabetes, hyperlipidaemia, or already
established disease) need early detection and management, for which a machine learning
model can be of great help.

The dataset used in this project comes from an ongoing cardiovascular study of residents in
Framingham, Massachusetts, obtained from the UCI Machine Learning Repository, and contains
over 4,000 patient records with 15 attributes, including demographic, behavioural, and
medical risk factors.

This research not only highlights the importance of predictive modelling in medical fields but
also explores the impact of various health indicators—such as smoking habits, cholesterol
levels, and blood pressure—on heart disease prognosis. The use of logistic regression helps
pinpoint the likelihood of CVD development, contributing to the growing field of data-driven
healthcare solutions.
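
For reference, the standard form of a logistic regression model is sketched below. The symbols are generic textbook notation rather than anything taken from the project: x_1, ..., x_p stand for the risk factors (for example age or cholesterol level), the beta values are the fitted coefficients, and y = 1 denotes a positive 10-year risk.

```latex
P(y = 1 \mid x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_p x_p)}},
\qquad
\log \frac{P(y = 1 \mid x)}{1 - P(y = 1 \mid x)} = \beta_0 + \sum_{j=1}^{p} \beta_j x_j
```

Under this form, each coefficient beta_j is the change in the log-odds of the outcome for a one-unit increase in x_j with the other factors held fixed, which is what makes the model convenient for reading off the influence of individual risk factors.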
4. TECHNICAL OBSERVATIONS AND LEARNINGS FROM
INTERNSHIP PROGRAM

During my internship, I worked on predicting the 10-year risk of cardiovascular diseases
(CVDs) using a dataset from the UCI Machine Learning Repository. This experience
involved various technical activities and gave me insights into real-world data science
applications. Below are the technical observations and learnings based on my role and
responsibilities throughout the internship.

Working Conditions and Team Functions


My team comprised data analysts, machine learning engineers, and domain experts from the
healthcare industry. Their roles complemented mine by providing expertise in data cleaning,
model building, and understanding medical terminology related to CVD. My role
focused on developing predictive models and conducting data analysis to help refine
CVD risk prediction.

Process Chart: Predictive Model Development


The following process outlines the key steps in developing the predictive model for
CVD risk:
• Data Collection: Acquiring patient data from the UCI Machine Learning Repository.
• Data Preprocessing: Handling missing values, normalizing data, and feature
engineering.
• Exploratory Data Analysis (EDA): Identifying key risk factors (age, smoking status,
cholesterol levels, etc.) and visualizing relationships between variables.
• Model Building: Developing machine learning models (logistic regression, decision
trees) to predict CVD risk.
• Model Evaluation: Assessing the model's performance using metrics like accuracy,
precision, recall, and AUC-ROC.
• Result Visualization: Creating charts and graphs to illustrate findings and
predictions.
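
The steps above can be sketched roughly as follows. This is a minimal illustration and not the code written during the internship: the data is randomly generated as a stand-in for the patient records, and the column names (`age`, `sysBP`, `totChol`, `currentSmoker`, `TenYearCHD`) and the coefficients used to simulate the outcome are assumptions made only for this example.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the patient records (NOT the real study data).
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "age": rng.integers(30, 70, n),
    "sysBP": rng.normal(130, 20, n),
    "totChol": rng.normal(235, 40, n),
    "currentSmoker": rng.integers(0, 2, n),
})

# Hypothetical relationship used only to simulate a binary outcome.
log_odds = (-8.0 + 0.08 * df["age"] + 0.02 * df["sysBP"]
            + 0.5 * df["currentSmoker"]).to_numpy()
df["TenYearCHD"] = (rng.random(n) < 1 / (1 + np.exp(-log_odds))).astype(int)

# Split features and target, keeping the class ratio in both parts.
X = df.drop(columns="TenYearCHD")
y = df["TenYearCHD"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Model building: scale the features, then fit logistic regression.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Model evaluation with the metrics named in the process chart.
pred = model.predict(X_test)
proba = model.predict_proba(X_test)[:, 1]
metrics = {
    "accuracy": accuracy_score(y_test, pred),
    "precision": precision_score(y_test, pred, zero_division=0),
    "recall": recall_score(y_test, pred, zero_division=0),
    "auc_roc": roc_auc_score(y_test, proba),
}
print(metrics)
```

With real data, the synthetic frame would be replaced by the loaded dataset, and the evaluation step would stay the same.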

Materials and Tools Used


Since the project was computational, the primary materials used were software tools and data.
Tools like Python, Pandas, Scikit-learn, and Matplotlib were heavily used for data
analysis, preprocessing, model development, and visualization. In addition, Jupyter
Notebooks were employed for running code and documenting the process.

Manufacturing Techniques/Technologies Used


In the context of data science, "manufacturing" refers to building and refining models. Key
technologies included:
• Machine Learning: Techniques like logistic regression, decision trees, and random
forests were applied to predict cardiovascular disease risk.
• Data Processing: Techniques like imputation (for missing values), normalization, and
one-hot encoding were used to prepare the data for model training.
• Visualization Tools: Tools like Seaborn and Matplotlib were used to visualize
correlations and distributions within the data.
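
As an illustration of the data processing techniques listed above, the sketch below combines median imputation, normalization, and one-hot encoding in a single scikit-learn `ColumnTransformer`. The small table and its column names are made up for the example, and the choice of median imputation is an assumption; the report does not state which imputation strategy was used.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Tiny made-up table with missing numeric values (illustrative only).
raw = pd.DataFrame({
    "age": [39, 46, 48, 61, 46, 43],
    "totChol": [195.0, 250.0, np.nan, 225.0, 285.0, 228.0],
    "glucose": [77.0, 76.0, 70.0, np.nan, 85.0, 99.0],
    "education": ["school", "degree", "school", "college", "college", "degree"],
})
numeric_cols = ["age", "totChol", "glucose"]
categorical_cols = ["education"]

# Numeric columns: fill gaps with the median, then standardize.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Categorical columns become one indicator column per category.
preprocess = ColumnTransformer(
    [
        ("num", numeric_steps, numeric_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ],
    sparse_threshold=0.0,  # always return a dense array
)

features = preprocess.fit_transform(raw)
print(features.shape)
```

Keeping these steps inside one transformer means the same imputation values and scaling parameters learned on the training data are reused on new records.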

Quality Planning and Control Activities


Quality control was critical in ensuring the accuracy and reliability of predictions. This was
achieved through:
• Cross-validation: Used to estimate the model's performance on unseen data and to
detect overfitting.
• Hyperparameter Tuning: Grid search and random search techniques were applied to
optimize the models.
• Feature Selection: Reducing the number of features by identifying the most
important variables that influenced CVD risk helped improve the model's
performance.
• Evaluation Metrics: Accuracy, precision, recall, and the ROC curve were used to
monitor the model’s effectiveness.
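
The cross-validation and grid search steps described above might look like the following sketch. The data comes from scikit-learn's `make_classification`, and the grid of values for the regularization strength `C` is an arbitrary assumption for illustration, not the grid used in the project.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced binary classification data (illustrative only).
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=4,
    weights=[0.85, 0.15], random_state=0,
)

pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# 5-fold stratified cross-validation keeps the class ratio in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
auc_scores = cross_val_score(pipeline, X, y, cv=cv, scoring="roc_auc")
print("AUC per fold:", np.round(auc_scores, 3))

# Grid search over the regularization strength, scored on the same folds.
param_grid = {"logisticregression__C": [0.01, 0.1, 1.0, 10.0]}
search = GridSearchCV(pipeline, param_grid, cv=cv, scoring="roc_auc")
search.fit(X, y)
best_C = search.best_params_["logisticregression__C"]
print("Best C:", best_C, "mean AUC:", round(search.best_score_, 3))
```

A large spread between the fold scores, or a much higher score on the training data than on the held-out folds, is the usual sign of overfitting that this check is meant to surface.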

My Contributions and Experiences


My key contributions included:
• Data Preprocessing: Cleaning and transforming raw data, handling missing values,
and engineering new features.
• Model Development: I built several predictive models using logistic regression and
random forest classifiers. I evaluated and compared their performance using
cross-validation and fine-tuned them to improve accuracy.
• Visualization: I created visualizations to represent the distribution of risk factors and
the model’s prediction outcomes, helping stakeholders better understand the insights
generated by the model.

Work Samples

Here are some work samples from the internship:
• Exploratory Data Analysis (EDA): I generated heatmaps and scatter plots showing
the correlation between various risk factors like age, cholesterol levels, and smoking
status with the risk of CVD.
• Model Performance: I visualized the ROC curve to assess the model's ability to
distinguish between patients with and without CVD risk.
• Feature Importance: Using random forest, I ranked the features based on their
importance in predicting CVD, demonstrating which factors had the most influence
on the risk.
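
A rough sketch of the feature-importance ranking and ROC computation described in these work samples is given below. It does not reproduce the internship results: the data is simulated, the column names are assumed, and the outcome is deliberately generated so that `age` dominates, which is why `age` ends up at the top of this ranking.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Simulated risk factors (illustrative only, not the study data).
rng = np.random.default_rng(42)
n = 1500
X = pd.DataFrame({
    "age": rng.normal(50, 9, n),
    "sysBP": rng.normal(132, 22, n),
    "cigsPerDay": rng.poisson(9, n),
    "totChol": rng.normal(236, 44, n),
})

# Assumed relationship: risk driven mainly by age, weakly by blood pressure.
log_odds = (-1.5 + 0.2 * (X["age"].to_numpy() - 50)
            + 0.01 * (X["sysBP"].to_numpy() - 132))
y = (rng.random(n) < 1 / (1 + np.exp(-log_odds))).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

forest = RandomForestClassifier(n_estimators=200, max_depth=5, random_state=42)
forest.fit(X_train, y_train)

# Impurity-based importances, ranked from most to least influential.
ranking = pd.Series(forest.feature_importances_, index=X.columns)
ranking = ranking.sort_values(ascending=False)
print(ranking)

# Points of the ROC curve and the area under it on held-out data.
proba = forest.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, proba)
auc = roc_auc_score(y_test, proba)
print(f"AUC-ROC: {auc:.3f}")
```

The `fpr` and `tpr` arrays can be passed directly to a plotting call such as Matplotlib's `plot` to draw the curve.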
This internship not only allowed me to apply classroom knowledge in a real-world setting but
also deepened my understanding of the practical challenges in using machine learning for
healthcare analytics.
5. OUTCOME OF THE INTERNSHIP

During my internship, I gained a variety of technical skills and qualifications that have
enhanced my understanding of data science and its applications in healthcare. Specifically, I
became proficient in data preprocessing, exploratory data analysis, and machine learning
techniques such as logistic regression. This has enabled me to predict the 10-year risk of
cardiovascular disease (CVD) using the Framingham dataset, sharpening my abilities in
working with real-world health data. I also developed data visualization skills, which helped
me to generate actionable insights about heart disease risk factors.
Throughout the internship, I undertook several key responsibilities. I was involved in
analyzing patient data, cleaning and preparing the dataset for model training, and building
machine learning models to predict CVD risk. I also created visualizations to present insights
in an understandable way, allowing me to contribute to making data-driven decisions.
Moreover, I participated in discussions regarding the significance of each risk factor in heart
disease prediction and collaborated with team members to improve the model's accuracy.
This internship will significantly influence my future career plans, as it has strengthened my
interest in health data analytics and machine learning. I now have a clearer vision of pursuing
a career where I can apply data science in the healthcare industry to help identify and mitigate
health risks.
The activities I carried out during this internship are closely correlated with the concepts I
learned in the classroom, such as data analysis, statistical methods, and machine learning. I
applied theoretical knowledge, like regression analysis and feature selection, to real-world
datasets, thereby reinforcing my understanding of how these methods can be used to solve
complex health problems. This hands-on experience has not only deepened my academic
knowledge but also enhanced my practical skills in a professional environment.

6. CONCLUSION
The cardiovascular risk prediction project successfully developed a model that predicts an
individual's likelihood of developing cardiovascular diseases (CVD) based on key risk factors
such as age, gender, cholesterol levels, blood pressure, smoking status, and diabetes. By
leveraging machine learning techniques and Python modules, the model achieved moderate
predictive accuracy.

This model can be a valuable tool in early detection and prevention of CVD, enabling
healthcare providers to take proactive measures for at-risk patients. Early identification of
high-risk individuals can lead to timely interventions, such as lifestyle changes or medication
adjustments, ultimately reducing the incidence and mortality associated with cardiovascular
diseases.

Despite the success of this project, there are some limitations. The model’s
performance may vary across different populations, as it was trained on a specific dataset.
Future work should focus on improving the model’s generalizability by incorporating more
diverse datasets and adding further risk factors, such as genetic markers
or socioeconomic conditions.

Overall, this project provides a solid foundation for using predictive analytics in cardiovascular risk
management. With further research and validation, the model has the potential to become an
integral part of personalized healthcare, promoting more targeted prevention strategies and
better outcomes for patients at risk of CVD.

7. APPENDICES

8. REFERENCES

Reference to a journal publication:

S. Kumar and A. Smith, "Prediction of coronary heart disease risk using logistic regression,"
Journal of Medical Data Science, Vol. 5, No. 4, Oct. 2020, pp. 121-130.

Reference to a conference publication:

R. Johnson and T. Patel, "Identifying risk factors for coronary heart disease using medical
datasets," Proceedings of the International Conference on Data Analytics in Healthcare, held at
MIT, 10-12 Nov. 2020, pp. 87-95.

Reference to a book:

D. Anderson, Cardiovascular Risk Assessment: A Data-Driven Approach, 1st ed., New York:
Springer, 2018, p. 145.

Reference to web sites

www.kaggle.com/framingham-heart-study-dataset (as on 03-10-2020).
