Internship - Report Nithin
An Internship Report on
DIGIADD
Submitted in partial fulfilment of the requirement for the award of the
Degree of Bachelor of Engineering
in
Computer Science & Engineering
Submitted by
Nithin N
1AT20CS065
Internship carried out at Digiadd Technologies
Internal Guide
Dr. Farhana Kausar
Associate Professor
CERTIFICATE
This is to certify that this internship work presented by “Nithin N”, bearing USN
report deposited in the department library. The internship report has been approved as it
satisfies the academic requirements with respect to internship report as prescribed for the said
Degree.
Signature of Guide
Dr. Farhana Kausar
Declaration
I, Nithin N (1AT20CS097), hereby declare that this internship work was carried
out under the guidance of Dr. Farhana Kausar, Associate Professor, Dept. of
CS&E. This internship work is submitted to Visvesvaraya Technological
University in partial fulfilment of the requirement for the award of the degree of
Bachelor of Engineering in Computer Science & Engineering for the academic
year 2023-24.
Place: Bangalore
Date:
Signature of Student
Acknowledgement
I would like to express my sincere gratitude to the respected Principal, Dr. Y Vijay Kumar,
for providing a congenial environment to work in. I would also like to express my sincere
gratitude to Dr. Aishwarya P, Head of Department, Computer Science & Engineering, for her
continuous support and encouragement. I am indeed indebted to Dr. Farhana Kausar,
coordinator and guide, for her continued support, advice and valuable inputs during the
course of this internship work. Last but not least, I would like to thank my family, who
have acted as a beacon of light throughout my life. My sincere gratitude goes out to all my
comrades and well-wishers who have supported me through all my ventures.
Abstract of Internship
This study leverages the heart disease dataset from the UCI machine learning
repository to propose a predictive model capable of assessing the likelihood of
heart disease. Various data mining techniques, including Naive Bayes, Decision
Tree, Logistic Regression, Random Forest and K-Nearest Neighbours (KNN), are
implemented to classify patients into different risk levels. The experimental
findings underscore the effectiveness of the KNN algorithm, which attains the
highest accuracy, 90.16%, of all the machine learning algorithms employed.
In conclusion, this report examines the problem of heart disease prediction,
emphasizing the central role of data science in processing large volumes of
healthcare data. The amalgamation of diverse data mining
Table of Contents
Declaration
Acknowledgement
Summary of Internship
References
Chapter 1
1.1 History of Company
Digiadd Technologies, established in 2007, is a pioneer in providing high-quality,
competitively priced products and services to its global clientele in Embedded Solutions for
Packaged Products, Product Development, and Technology Solutions covering Embedded
Systems and System Software. Digiadd Technologies was started by Sridhar P with the
objective of providing an integrated system solution for any organization. Presently the
team consists of software professionals with expertise in the technology domain and
functional knowledge. With quality and timely delivery as our core deliverables, we have
retained our clients from the start and continue to do so with our new clients.
Over the years we have ventured into diverse domains, acquiring valuable insights into the
business of product engineering. With over 10 employees, the company has development
centres in different regions of Bangalore. As an organization, our goal is to contribute to
society through broad-ranging activities in the areas of software development, training and
technical projects.
1.2 Mission
1. To provide more value per dollar to our clients by delivering timely, high-quality
services and solutions, and to attain utmost client satisfaction through skill
building, innovation and best-practice processes.
2. To offer total, cost-effective, next generation embedded hardware and software
solutions in the shortest possible development time enabling our clients to launch
their product ideas early.
1.4 Vision
1. To bring out the best in our people by providing an environment for grooming, nurturing
and growing talent, fostering human growth, and to provide services and solutions to
IT companies globally that create value for our customers.
2. To lead in embedded hardware and software solutions and be known as an electronic
product development company of repute.
3. To build strategic partnerships globally with all stakeholders - clients, vendors, and
investors.
4. To stay abreast of technology and build our technical competence and domain
expertise.
5. To nurture a winning team that has a passion for excellence.
1.5 Management
Digiadd Technologies is led by highly experienced and successful professionals from the
technology industry. With a focus on the customer and a passion to create value, the
management at Digiadd Technologies is committed to creating an unmatched experience for its customers.
1.6 Process
At Digiadd Technologies, we utilize our resources and expertise to ensure that your product
development project flows smoothly. We maintain close communication with you
throughout the project to ensure that it proceeds in line with your needs.
To provide you with a complete, robust and cost-effective solution, we implement our projects
in the following manner:
Stage 1: Inquiry
This is an initial discussion between you and our team to share our general skills and
capabilities and see whether we are a good fit for your project needs.
This is a detailed discussion between you and our team. Its purpose is for us to gain an
in-depth understanding of your project needs and for you to gain a firm understanding of
our specific technical capabilities in the areas your project requires.
Typically, this discussion involves appropriate members of your team, our Business
Development Manager, our engineers and a Project Manager.
1. Customer First - We are a customer-focused company committed to creating the best value
for our customers. In every engagement, we strive to make our customers successful.
2. Integrity - Integrity is a way of life at Digiadd Technologies, and every single
employee understands its value.
3. Team Spirit - We firmly believe that none of us is as smart as all of us.
4. Respect for All - We value diversity and respect everyone associated with Digiadd
Technologies.
Chapter 2
Our department's expertise lies in creating reliable machine learning models and algorithms,
making use of Python's adaptability to ensure smooth deployment and integration. Because of
our professionals' extensive knowledge of several machine learning frameworks and libraries,
we are able to customize solutions to satisfy a wide range of business demands.
Our work spans a broad range of fields, from computer vision to natural language processing.
Our dedication to quality extends to the careful processing of data, using advanced feature
engineering and pre-processing methods. To promote trust and understanding among
stakeholders, we place a strong emphasis on interpretability and transparency in our models.
Our strategy is centred around collaboration, which ensures a thorough understanding of
customer needs and industry-specific challenges.
The Machine Learning in Python Department at our company stands as a beacon of expertise
and knowledge in the ever-evolving landscape of artificial intelligence. Our team is
composed of seasoned professionals well-versed in the intricacies of machine learning, with a
specific focus on Python as our primary programming language. We pride ourselves on
staying at the forefront of technological advancements, ensuring that our solutions are
cutting-edge and aligned with industry best practices.
Chapter 3
TASK PERFORMED
Python:
Python is an interpreted, high-level, general-purpose programming language. Created by
Guido van Rossum and first released in 1991, Python's design philosophy emphasizes code
readability with its notable use of significant whitespace. Its language constructs and object-
oriented approach aim to help programmers write clear, logical code for small and large-scale
projects. [27]
Python is dynamically typed and garbage-collected. It supports multiple programming
paradigms, including procedural, object-oriented, and functional programming. Python is
often described as a "batteries included" language due to its comprehensive standard library.
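As a small illustration of the paradigms mentioned above, the sketch below computes the same average in procedural, object-oriented and functional style, and then with the standard library's statistics module. The readings and the names mean_procedural, Series and mean_functional are hypothetical and chosen only for this example.

```python
import statistics
from functools import reduce

# Hypothetical resting blood pressure readings, for illustration only.
readings = [120, 130, 140, 110]


# Procedural style: an explicit loop with a running total.
def mean_procedural(values):
    total = 0
    for v in values:
        total += v
    return total / len(values)


# Object-oriented style: data and behaviour bundled in a class.
class Series:
    def __init__(self, values):
        self.values = list(values)

    def mean(self):
        return sum(self.values) / len(self.values)


# Functional style: fold the list with reduce and a lambda.
def mean_functional(values):
    return reduce(lambda acc, v: acc + v, values, 0) / len(values)


# "Batteries included": the standard library already provides the same result.
print(mean_procedural(readings), Series(readings).mean(),
      mean_functional(readings), statistics.mean(readings))
```

Note that no type declarations are needed anywhere: the same functions would accept a list of floats unchanged, which is what dynamic typing means in practice.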
Scikit-learn:
Scikit-learn (formerly scikits.learn and also known as sklearn) is a free software machine
learning library for the Python programming language. It features various classification,
regression and clustering algorithms including support vector machines, random forests,
gradient boosting, k-means and DBSCAN, and is designed to interoperate with the Python
numerical and scientific libraries NumPy and SciPy. Scikit-learn is largely written in Python,
and uses NumPy extensively for high-performance linear algebra and array operations.
Furthermore, some core algorithms are written in Cython to improve performance. Support
vector machines are implemented by a Cython wrapper around LIBSVM; logistic regression
and linear support vector machines by a similar wrapper around LIBLINEAR. In such cases,
extending these methods with Python may not be possible.
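The common estimator interface, and the LIBLINEAR- and LIBSVM-backed models mentioned above, can be sketched as follows. This is a minimal example on synthetic data from make_classification, not the heart disease dataset; the sample size, feature count and random seeds are arbitrary assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic binary data stands in for a real dataset (illustrative assumption).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# solver="liblinear" selects scikit-learn's LIBLINEAR-based solver.
log_reg = LogisticRegression(solver="liblinear")
# SVC is scikit-learn's wrapper around LIBSVM.
svm = SVC(kernel="rbf")

for name, model in [("logistic regression", log_reg), ("SVM", svm)]:
    # Every estimator exposes the same fit / predict / score methods.
    model.fit(X_train, y_train)
    print(f"{name}: test accuracy = {model.score(X_test, y_test):.3f}")
```

Because both models share fit and score, swapping one algorithm for another requires changing only the constructor line.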
TensorFlow:
TensorFlow is a free and open-source software library for dataflow and differentiable
programming across a range of tasks. It is a symbolic math library, and is also used for
machine learning applications such as neural networks. It is used for both research and
production at Google. TensorFlow is Google Brain's second-generation system. Version
1.0.0 was released on February 11, 2017. While the reference implementation runs on single
devices, TensorFlow can run on multiple CPUs and GPUs (with optional CUDA and SYCL
extensions for general-purpose computing on graphics processing units). TensorFlow is
available on 64-bit Linux, macOS, Windows, and mobile computing platforms including
Android and iOS.
Theano:
Theano is a Python library and optimizing compiler for manipulating and evaluating
mathematical expressions, especially matrix-valued ones. In Theano, computations are
expressed using a NumPy-esque syntax and compiled to run efficiently on either CPU or
GPU architectures. Theano is an open-source project primarily developed by the Montreal
Institute for Learning Algorithms (MILA) at the Université de Montréal.
NumPy:
NumPy (pronounced /ˈnʌmpaɪ/ NUM-py, or sometimes /ˈnʌmpi/ NUM-pee) is a library for the Python programming
language, adding support for large, multi-dimensional arrays and matrices, along with a large
collection of high-level mathematical functions to operate on these arrays. The ancestor of
NumPy, Numeric, was originally created by Jim Hugunin with contributions from several
other developers. In 2005, Travis Oliphant created NumPy by incorporating features of the
competing Numarray into Numeric, with extensive modifications. NumPy is open-source
software and has many contributors.
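To show the array operations described above, the following sketch standardizes a small feature matrix column by column using NumPy broadcasting, a common pre-processing step before distance-based models such as KNN. The patient values and column meanings are made up for illustration.

```python
import numpy as np

# Hypothetical 4x3 feature matrix: rows are patients, columns are
# (age, resting blood pressure, cholesterol). Values are made up.
X = np.array([
    [50, 120, 200],
    [60, 140, 240],
    [40, 130, 220],
    [70, 150, 260],
], dtype=float)

# Column-wise statistics: axis=0 reduces over the rows.
mean = X.mean(axis=0)
std = X.std(axis=0)

# Broadcasting stretches the shape-(3,) mean and std across every row,
# so the whole matrix is standardized without an explicit loop.
X_std = (X - mean) / std

print(X_std.round(2))
```

After this step every column has mean 0 and standard deviation 1, so no single feature dominates a distance calculation merely because of its units.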
Pandas:
In computer programming, pandas is a software library written for the Python programming
language for data manipulation and analysis. In particular, it offers data structures and
operations for manipulating numerical tables and time series. It is free software released
under the three-clause BSD license. [2] The name is derived from the term "panel data", an
econometrics term for data sets that include observations over multiple time periods for the
same individuals.
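A typical cleaning pass with pandas might look like the sketch below. It assumes, hypothetically, that missing values appear as the string "?" and uses a tiny hand-made table with UCI-style column names (age, chol, target); it is an illustration, not the actual project code.

```python
import pandas as pd

# Hypothetical sample with UCI-style column names. Rows 2 and 3 are exact
# duplicates, and "?" is assumed to mark a missing cholesterol value.
raw = pd.DataFrame({
    "age": [54, 61, 47, 47, 58],
    "chol": ["240", "265", "?", "?", "212"],
    "target": [1, 0, 1, 1, 0],
})

# Remove the repeated record; .copy() avoids modifying a view of raw.
df = raw.drop_duplicates().copy()

# errors="coerce" converts entries that cannot be parsed, such as "?", to NaN.
df["chol"] = pd.to_numeric(df["chol"], errors="coerce")

# Impute the remaining gap with the column median.
df["chol"] = df["chol"].fillna(df["chol"].median())

print(df)
print(df.groupby("target")["chol"].mean())
```

Median imputation is used here because it is less sensitive to extreme values than the mean; other strategies, such as dropping incomplete rows, are equally possible.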
OpenCV:
OpenCV (Open Source Computer Vision Library) is an open source computer vision and
machine learning software library. OpenCV was built to provide a common infrastructure for
computer vision applications and to accelerate the use of machine perception in
commercial products. Being a BSD-licensed product, OpenCV makes it easy for businesses
to utilize and modify the code.
Economic Feasibility
This study is carried out to check the economic impact that the system
will have on the organization. The amount of funding that the company can put
into the research and development of the system is limited, so the expenditure
must be justified. Since the project is based on machine learning and most of
the required tools are open source and free to use, no cost is incurred for
software or related products. Hence the project consumes minimal cost and is
economically feasible.
Technical Feasibility
This study is carried out to check the technical feasibility, that is, the
technical requirements of the system. Since the machine learning algorithms are
mathematical methods implemented in open-source libraries, there is little need
for commercial software. The system can run on any machine with a standard
Python installation and the required open-source libraries, which makes it highly
portable. Extensive documentation and tutorials also make the technology easy
to learn.
Social Feasibility
This aspect of the study checks the level of acceptance of the system by the
user. This includes the process of training the user to use the system efficiently.
The user must not feel threatened by the system, but must instead accept it as a
necessity. The main purpose of this project is to create an early prediction
system for heart disease. It is thus a noble cause for the sake of society, a
small step towards a secure and healthy future.
Chapter 4
REFLECTION NOTES
The internship commenced with an immersive introduction to the UCI machine learning
repository's heart disease dataset, setting the stage for a comprehensive analysis.
Collaborating within a supportive team, I implemented and fine-tuned various algorithms,
including Naive Bayes, Decision Tree, Logistic Regression, Random Forest and K-Nearest
Neighbours (KNN), to discern their efficacy in predicting heart disease.
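A comparison of Naive Bayes, Decision Tree, Logistic Regression, Random Forest and KNN, as described in this report, can be sketched with scikit-learn as below. Because the UCI data is not bundled here, the example uses synthetic data with 13 features as a stand-in, so its scores will not reproduce the 90.16% KNN accuracy reported for the real dataset; the hyperparameters shown are assumptions, not the values used during the internship.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a small tabular clinical dataset (assumption).
X, y = make_classification(n_samples=300, n_features=13,
                           n_informative=6, random_state=42)

models = {
    "Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Logistic Regression": make_pipeline(StandardScaler(),
                                         LogisticRegression(max_iter=1000)),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=42),
    # KNN is distance-based, so features are scaled before it.
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
}

scores = {}
for name, model in models.items():
    # 5-fold cross-validated accuracy; the scaler is refit inside each fold.
    cv = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    scores[name] = cv.mean()
    print(f"{name:20s} {cv.mean():.3f} ± {cv.std():.3f}")
```

Cross-validation gives a mean and spread for each model rather than a single split's accuracy, which makes the ranking between algorithms less dependent on one lucky or unlucky test set.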
Weekly meetings and mentorship sessions not only strengthened my technical skills but also
offered a glimpse into the collaborative and innovative culture at DigiAdd. The internship
culminated in a comprehensive presentation showcasing the impact of machine learning in
predicting heart disease risk levels, highlighting in particular the KNN algorithm's
outstanding accuracy of 90.16%.
Overall, my time at DigiAdd was an enriching journey, laying a solid foundation for my
future endeavours in the evolving field of healthcare data science.
2. Applied smart data cleaning techniques to maintain data quality, supporting effective
training of machine learning models.
4. Evaluated models thoroughly using metrics like precision, recall, and F1 score to
ensure reliable predictions.
8. Demonstrated expertise in handling healthcare data from the UCI machine learning
repository, showcasing adaptability in real-world scenarios.
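The metrics named in item 4 can be computed directly from prediction counts. The function below is a plain-Python sketch; the name classification_metrics, the sample labels and the convention that 1 means disease present are assumptions. scikit-learn's precision_score, recall_score and f1_score compute the same quantities.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    # True positives, false positives and false negatives for the positive class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when a denominator is empty.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Hypothetical labels: 1 = disease present, 0 = absent.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]
p, r, f = classification_metrics(y_true, y_pred)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

In this example there are four true positives, one false positive and one false negative, so precision = recall = 4/5 = 0.8 and F1 = 0.8. For heart disease screening, recall is often watched most closely, because a false negative means a missed patient.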
2. Improved prediction accuracy by ensuring that the model learned from all types of
heart-related cases in the data.
3. Cleaned the data so that the model was trained on good-quality information.
4. Tuned the model's settings (hyperparameters) so that it predicted heart disease as
accurately as possible.
5. Checked how well the model's predictions matched real outcomes, making sure it did
not miss important details.
6. Solved problems related to creating effective features for the model to use in
predicting heart health.
7. Worked closely with different groups of people, sharing ideas in regular meetings and
learning from experienced mentors.
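The tuning step in the list above (adjusting the model's settings to improve its predictions) is commonly done with a cross-validated grid search. The sketch below tunes a KNN pipeline on synthetic stand-in data; the parameter grid and the step names scale and knn are illustrative assumptions, not the settings used in the project.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption); the real project used the UCI dataset.
X, y = make_classification(n_samples=300, n_features=13,
                           n_informative=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

pipe = Pipeline([("scale", StandardScaler()), ("knn", KNeighborsClassifier())])

# The "knn__" prefix routes each parameter to the pipeline step named "knn".
param_grid = {
    "knn__n_neighbors": [3, 5, 7, 9, 11],
    "knn__weights": ["uniform", "distance"],
}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)  # tuning uses the training split only

print("best parameters:", search.best_params_)
print(f"held-out accuracy: {search.score(X_test, y_test):.3f}")
```

Fitting the search only on the training split, and scoring the refit best model once on the held-out split, keeps the reported accuracy an honest estimate rather than one inflated by the tuning itself.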
4.3.2 Adaptability Skills
4.3.4 Responsibility
1. Ownership: Demonstrated a strong sense of ownership by taking
responsibility for assigned tasks and projects from inception to
completion.
REFERENCES
[1] Avinash Golande, Pavan Kumar T, "Heart Disease Prediction Using Effective Machine
Learning Techniques", International Journal of Recent Technology and Engineering, Vol. 8,
pp. 944-950, 2019.
[2] T. Nagamani, S. Logeswari, B. Gomathy, "Heart Disease Prediction using Data Mining
with Mapreduce Algorithm", International Journal of Innovative Technology and Exploring
Engineering (IJITEE), ISSN: 2278-3075, Vol. 8, Issue 3, January 2019.
[3] Fahd Saleh Alotaibi, "Implementation of Machine Learning Model to Predict Heart
Failure Disease", International Journal of Advanced Computer Science and
Applications (IJACSA), Vol. 10, No. 6, 2019.
[4] Anjan Nikhil Repaka, Sai Deepak Ravikanti, Ramya G Franklin, "Design And
Implementation Heart Disease Prediction Using Naives Bayesian", International Conference
on Trends in Electronics and Informatics (ICOEI 2019), 2019.
[5] Theresa Princy R, J. Thomas, "Human Heart Disease Prediction System using Data Mining
Techniques", International Conference on Circuit, Power and Computing
Technologies, Bangalore, 2016.
[6] Nagaraj M Lutimath, Chethan C, Basavaraj S Pol, "Prediction of Heart Disease using
Machine Learning", International Journal of Recent Technology and Engineering, Vol. 8,
No. 2S10, pp. 474-477, 2019.
[7] UCI, "Heart Disease Data Set". [Online]. Available:
https://www.kaggle.com/ronitf/heart-disease-uci (accessed May 1, 2020).
[8] Sayali Ambekar, Rashmi Phalnikar, "Disease Risk Prediction by Using Convolutional
Neural Network", 2018 Fourth International Conference on Computing Communication
Control and Automation, 2018.
[9] C. B. Rjeily, G. Badr, E. Hassani, A. H., and E. Andres, "Medical Data Mining for
Heart Diseases and the Future of Sequential Mining in Medical Field", in Machine Learning
Paradigms, 2019, pp. 71-99.
[10] Jafar Alzubi, Anand Nayyar, Akshi Kumar, "Machine Learning from Theory to
Algorithms: An Overview", Journal of Physics: Conference Series, 2018.
[11] Fajr Ibrahem Alarsan, Mamoon Younes, "Analysis and classification of heart
diseases using heartbeat features and machine learning algorithms", Journal of Big
Data, 2019; 6:81.