Report file
Acknowledgement
I am truly thankful for the opportunity to complete my
internship in Business Analytics with YBI FOUNDATION.
This experience has greatly enhanced my understanding
of business analytics and the importance of data-driven
decision-making. I want to extend my heartfelt
appreciation to everyone who contributed to making
this experience possible.
Contents
1. Introduction
1.1 Executive Summary
1.2 Introduction to the Internship
1.3 Internship Duration
1.4 Objectives of the Internship
2. Internship Experience
2.1 Company Profile
2.2 Job Description and Responsibilities
2.3 Outline of the Internship
2.4 Introduction to Python
2.5 Introduction to Google Colab
3. Key Learnings
3.1 Understanding Business Analytics
3.2 Python Libraries for Business Analytics
3.3 Steps to Make a Colab File
4. Learning Outcomes
4.1 Skills Developed
5. Conclusion
5.1 Summary of the Internship Experience
6. References
6.1 Bibliography of Sources Used
Executive Summary
The internship at YBI Foundation provided a comprehensive
learning experience in the field of business analytics,
focusing on practical applications of analytical tools and
methodologies. The primary goal of the internship was to
enhance my understanding of data-driven decision-making
processes while contributing to ongoing projects within the
organization.
Introduction to the Internship
A summer internship is crucial for fostering both personal and
professional development, offering advantages that extend beyond
conventional classroom learning. One of the primary benefits is the
chance to gain hands-on experience. Interns are able to apply
theoretical concepts learned in their studies to practical projects,
effectively closing the gap between academic knowledge and the
realities of the workplace. This experiential learning aids in a smoother
transition into full-time employment.
This report reflects on my experiences and insights from my business
analytics internship at YBI Foundation. This valuable opportunity
greatly enhanced my comprehension of data-driven decision-making
and the real-world application of analytical techniques within a
corporate context. The internship was structured to provide practical
experience while allowing me to engage in projects that utilize business
analytics to generate strategic value for the organization.
Internship Duration
I completed my internship at the YBI Foundation from 17 August 2024 to 17 September 2024, spanning a total of one month.
Objectives of the Internship
Before beginning my internship at YBI Foundation, I
established specific objectives to boost my
productivity and enhance my experience in business
analytics. These goals were centered on improving
my technical abilities, broadening my knowledge
base, and ensuring that my internship experience
would align with my future career goals.
Furthermore, I aspired to expand my knowledge of the business environment, particularly how analytics can influence economic decision-making. As an economics student, understanding the intersection of data analytics and economics was crucial for me. I wanted to learn how organizations leverage data to drive operational efficiency and to understand market fluctuations and policy impacts, drawing on metrics like GDP growth, inflation rates, and unemployment data.
Company Profile
YBI Foundation, based in Delhi, is a not-for-profit edutech
organization dedicated to empowering youth to thrive in the
realm of emerging technologies.
YBI Foundation is a leading organization
dedicated to fostering entrepreneurship and
supporting emerging entrepreneurs,
particularly in developing regions. The
foundation focuses on empowering young
individuals by providing mentorship, training
programs, and access to essential resources for
starting and managing successful businesses.
Through its efforts, YBI Foundation aims to
drive economic growth and create job
opportunities by promoting innovation and
entrepreneurial initiatives, making a meaningful
impact on the communities it serves.
1,00,000+ students · 1500+ colleges
YBI Foundation is a non-profit organization
based in Delhi, India, established in
October 2020. Operating on a business-to-
consumer (B2C) model, the foundation has
a small team of fewer than 10 employees.
The core team includes Dr. Alok Yadav as
the founder and Arushi as the co-founder.
Hands-On Project Learning
Instant Doubt Resolution
Capstone Projects
Interview Preparation Support
Curriculum Aligned with Industry Needs
Verified Digital Certification
Beginner-Friendly Approach
Job Description and Responsibilities
The 30-day business analytics internship comprised 8 chapters divided into 22 lessons, along with a project submission and a quiz at the end to attain the certificate.
1. Importing Libraries:
Utilizing Scikit-learn for machine learning
predictions
Performing numerical operations with NumPy
Visualizing data insights using Matplotlib
Manipulating and processing data with Pandas
2. Acquiring Data:
Mastering web scraping methods
Cleaning and prepping datasets
Conducting exploratory data analysis (EDA)
Understanding when to apply pd.get_dummies()
for categorical variables
4. Splitting Data:
Splitting data into training and testing subsets for
model evaluation.
5. Selecting Models:
Choosing between supervised and unsupervised learning
based on the problem
Managing categorical vs. continuous data effectively
6. Training Models:
Using model.fit(X_train, y_train) to train your model
Addressing issues with maximum iterations during training
7. Making Predictions:
Applying your trained model to make predictions on new data
8. Evaluating Accuracy:
Checking the performance of your model to ensure its
accuracy and reliability
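Taken together, these chapters map onto a short end-to-end script. Below is a minimal sketch of that workflow; the dataset file, column names, and the choice of linear regression are hypothetical placeholders, not the internship's exact project:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

data = pd.read_csv('sales.csv')                               # hypothetical dataset
X = pd.get_dummies(data[['price', 'advertising', 'region']])  # one-hot encodes categorical columns such as region
y = data['sales']                                             # continuous target, so regression applies

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

model = LinearRegression()
model.fit(X_train, y_train)                # train the model on the training subset
y_pred = model.predict(X_test)             # make predictions on unseen data
print(mean_squared_error(y_test, y_pred))  # evaluate accuracy and reliability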
GitHub Profile:
Setting up a professional GitHub profile
Uploading your assignment projects to showcase your work
Outline of the Internship
Introduction to Internship:
This section provided an overview of the internship's objectives,
goals, and structure. It established clear expectations regarding
the skills to be developed throughout the program. Additionally,
guidelines were provided to help us meet the internship
requirements, ensuring we were aware of deadlines, submission
standards, and available learning resources.
Introduction to Python:
Python emerged as an essential programming language for data
analysis. During this part of the internship, we focused on
understanding the foundational concepts of Python
programming.
Project Submission:
The project submission process involved gathering the findings, insights, and recommendations from our analysis.
Being able to present data insights is an essential
skill in business analytics. We honed our ability to
summarize complex data analyses in a clear and
concise manner, making it accessible for non-
technical stakeholders.
Introduction to Python
Python is a widely used, high-level, interpreted programming language with dynamic semantics, renowned for its versatile, object-oriented, general-purpose design. It plays
a significant role in numerous applications,
often operating behind the scenes in tools and
technologies people use every day. Python
supports various programming paradigms, such
as structured, object-oriented, and functional
programming. Known as a "batteries included"
language, it features an extensive standard
library equipped with a comprehensive array of
built-in functionalities.
Several factors contribute to Python's appeal as a great language for learning, including its readable syntax, its gentle learning curve, and its extensive standard library.
Introduction to Google Colab
Google Colab is an online Jupyter notebook platform
offered by Google, enabling users to write and run Python
code directly in their web browsers without any setup. It
provides free GPU access and simplifies the process of
sharing notebooks.
How to Create a Google Colab Notebook
To create a Google Colab notebook, open colab.research.google.com in a web browser, sign in with a Google account, and select File > New notebook.
Understanding Business Analytics
Business analytics (BA) encompasses a variety of techniques and
technologies that leverage data to assist organizations in making
informed decisions. It is an integral part of the broader business
intelligence domain, which emphasizes the collection, storage,
and analysis of data.
Business analytics plays a crucial role in providing
actionable insights that enable organizations to make
informed, data-driven decisions. It is vital for identifying
and addressing ongoing challenges that disrupt smooth
operations, such as workflow inefficiencies that lead to
delays. Additionally, business analytics evaluates how
resources are allocated and utilized, revealing potential
areas for cost reduction.
Python Libraries for Business Analytics
A Python library consists of a set of modules that
contain reusable code, making it easier for
developers to implement various functionalities in
their applications. By utilizing these libraries,
programmers can avoid redundant coding and
dedicate their efforts to more intricate programming
challenges. This is particularly advantageous in areas
such as Machine Learning, Data Science, and Data
Visualization.
Pandas
Pandas is a highly effective tool for data manipulation and analysis, offering powerful structures like DataFrames and Series that simplify working with structured data, especially tabular data. It provides functionalities for cleaning,
transforming, and analyzing data, making it an essential
library for data science and machine learning tasks. Built on
Python, it is open-source, fast, flexible, and easy to use,
allowing users to perform complex data operations with
minimal code. With its ability to handle large datasets
efficiently and integrate seamlessly with other libraries,
Pandas has become one of the most widely used tools in the
field of data analysis.
Data Exploration:
data.head()      # Display the first five rows of the DataFrame
data.info()      # Summarize the DataFrame's structure: column names, data types, non-null counts
data.describe()  # Generate descriptive statistics for numerical columns
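A minimal, self-contained sketch of these commands, using made-up columns for illustration:

import pandas as pd

# A small DataFrame built from a dictionary of hypothetical values
data = pd.DataFrame({
    'month': ['Jan', 'Feb', 'Mar', 'Apr', 'May'],
    'sales': [120, 135, 150, 160, 170],
})

print(data.head())      # first five rows
data.info()             # column names, data types, non-null counts (prints directly)
print(data.describe())  # descriptive statistics for the numerical column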
Numpy
NumPy, short for Numerical Python, is considered "the primary library for
scientific computing with Python" and serves as the foundation for libraries
like Pandas, Matplotlib, and Scikit-learn.
import numpy as np
Creating a DataFrame from a NumPy Array
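A minimal sketch of what this can look like:

import numpy as np
import pandas as pd

arr = np.array([[1, 2], [3, 4], [5, 6]])    # a 3x2 NumPy array
df = pd.DataFrame(arr, columns=['x', 'y'])  # wrap the array in a labeled DataFrame
print(df)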
Matplotlib
Matplotlib is an extensive library designed for creating static, animated, and interactive visualizations in Python. It makes simple plots easy to produce while keeping complex ones achievable, offering fine-grained control over elements such as titles, axis labels, legends, and grids.
Customizing Plots:
import matplotlib.pyplot as plt

plt.plot(months, product_a)  # assumes the month and sales sequences are already defined
plt.plot(months, product_b)
plt.title('Sales Over Time')
plt.xlabel('Months')
plt.ylabel('Sales')
plt.legend(['Product A', 'Product B'])
plt.grid(True)
plt.show()
Seaborn
Seaborn is a powerful Python data visualization library built on Matplotlib
that enhances its capabilities by providing a simpler interface for creating
more aesthetically pleasing and complex statistical plots. It is designed to
make it easy to generate informative and attractive visualizations, especially
for statistical data, such as correlation heatmaps, regression plots, and
distribution charts. Seaborn integrates well with Pandas data structures,
making it ideal for visualizing data stored in DataFrames. Additionally, it
supports a variety of plot types, customization options, and color palettes,
which makes it a popular choice for data analysts and scientists aiming to
create insightful and visually appealing graphics with minimal effort.
To create a line plot that shows trends over time, you can use:
import seaborn as sns
sns.lineplot(x='month', y='sales', data=data)  # 'data' is a DataFrame with month and sales columns
Sklearn
Scikit-Learn, commonly referred to as sklearn, is a comprehensive
library tailored for machine learning tasks. It offers a suite of tools for
constructing models, preprocessing datasets, and assessing their
performance.
Fundamental Functions and Commands:
Data Preparation:
To prepare your data for modeling, you can use the following code snippet:

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
This function divides the dataset into training and testing sets, which
is crucial for training models effectively and validating their
performance.
Model Development:
When building a model, you might use:
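For example, a minimal sketch assuming a regression problem; linear regression here stands in for whichever model suits the task:

from sklearn.linear_model import LinearRegression

model = LinearRegression()   # instantiate the chosen model
model.fit(X_train, y_train)  # learn from the training split created above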
Performance Evaluation:
from sklearn.metrics import mean_squared_error, accuracy_score

y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)   # mean squared error for regression tasks
accuracy = accuracy_score(y_test, y_pred)  # accuracy for classification tasks
This part of the code helps to assess how well the model performs by
predicting outcomes on the test data and calculating relevant
performance metrics.
Steps to Make a Colab File
1. Open Google Colab to create a new notebook.
4. Then we use the describe, head, and info functions to get more details about the data we have with us.
The head function is utilized to showcase the initial rows of a
DataFrame, offering a swift glimpse into its contents and structure.
On the other hand, the info function delivers a brief overview of the
DataFrame, highlighting the number of non-null entries, the data
types for each column, and memory usage. This function is also
instrumental in identifying missing values and gaining insight into
data types.
5. Defining the target, i.e. the y variable, and the features, i.e. the X variable. If the y variable is continuous, we use regression in our model, and if the y variable is categorical, we use classification, as sketched below.
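A brief sketch with hypothetical column names:

y = data['sales']                   # continuous target -> regression
X = data[['price', 'advertising']]  # feature columns

# If the target were categorical (e.g., churned: yes/no), classification would apply instead:
# y = data['churned']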
6. When working with data, it's important to split it into two parts:
training samples and testing samples. The training samples are
used to teach the model, helping it learn the different features
and patterns present in the data. On the other hand, the testing
samples, which consist of the remaining data, are used to evaluate
how well the model performs.
8. Using the trained model to make predictions is all about
applying what it has learned from past data to new situations. It
looks for patterns and insights in the information it has already
processed and uses that knowledge to generate meaningful
outputs when faced with fresh data.
10. We can plot the model using Matplotlib simply by importing the Matplotlib library and using various commands. The plot function in Matplotlib creates line plots, one of the most commonly used ways of visualizing data in a 2D space.
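A minimal sketch of a line plot, assuming the y_test and y_pred values from the earlier steps:

import matplotlib.pyplot as plt

plt.plot(list(y_test), label='Actual')     # actual values from the test set
plt.plot(list(y_pred), label='Predicted')  # the model's predictions
plt.title('Actual vs. Predicted')
plt.xlabel('Test sample')
plt.ylabel('Value')
plt.legend()
plt.show()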
11. The Seaborn library allows us to create informative charts and graphs, including histograms, line plots, bar graphs, and scatter plots. Here we have taken a line plot and a histogram, as sketched below.
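A minimal sketch, assuming a DataFrame data with month and sales columns:

import seaborn as sns
import matplotlib.pyplot as plt

sns.lineplot(x='month', y='sales', data=data)  # line plot: trend over time
plt.show()

sns.histplot(data['sales'])                    # histogram: distribution of sales values
plt.show()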
12. Performing OLS (ordinary least squares) regression.
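A minimal sketch using the statsmodels library, assuming the training split from the earlier steps:

import statsmodels.api as sm

X_const = sm.add_constant(X_train)          # add an intercept term
ols_model = sm.OLS(y_train, X_const).fit()  # fit ordinary least squares
print(ols_model.summary())                  # coefficients, R-squared, p-values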
Skills Developed
Throughout my internship at YBI Foundation, I had the opportunity to work with various analytical tools and software that significantly enhanced my data analysis skills. Notably, I became proficient in Google Colab and Python, each of which played a crucial role in different aspects of my analytical tasks.
OLS regression is a popular tool in business analytics, utilized
for a variety of purposes, including predicting sales, analyzing
customer behavior, and evaluating the effectiveness of
marketing campaigns.
1. Technical Skills
Data Visualizations: Mastering data visualization
skills is vital for transforming complex information
into an easily understandable visual format.
Utilizing Python libraries like Matplotlib and
Seaborn enables the creation of charts, graphs, and
dashboards. These visual tools simplify the
presentation of findings, allowing for the
translation of intricate data insights into clear and
actionable conclusions.
Machine Learning: Developing machine learning skills involves using algorithms to analyze data trends and generate predictions. Leveraging
libraries such as Scikit-learn and Statsmodels in
Python allows me to work with models like
regression and OLS. This practical experience aids
in building predictive models and understanding
evaluation metrics such as accuracy, precision, and
recall, which are crucial for making informed, data-
driven predictions.
Big Data Tools: Big data tools are indispensable for
managing and processing extensive datasets,
particularly in scenarios where data volume
surpasses the limits of traditional databases.
Familiarity with analytical tools and cloud
platforms like Google Colab facilitates distributed
data processing, making it easier to manage,
analyze, and extract insights from large data sets,
ultimately preparing them for large-scale analytics
tasks.
2. Analytical Skills
Problem-Solving: Problem-solving skills are critical in business
analytics, as they enable us to approach data-related challenges
systematically. We learn to define business problems clearly, explore
potential solutions, and apply analytical methods to test hypotheses.
This skill is essential for tackling real-world issues, as it enables us to break down complex scenarios into manageable steps and find practical, data-driven solutions.
3. Economic Skills
Econometrics: Econometric models help interns examine relationships between variables, such as price and demand, or income and spending habits. For instance, we can analyze how interest rates affect consumer spending, employment, or investment decisions. Econometrics also supports model building and regression analysis, particularly linear and multiple regression models, to interpret data, identify trends, and understand economic behaviors.
Conclusion
My internship at the YBI Foundation was
invaluable for advancing my skills in data analysis
and business strategies. Through hands-on
projects and mentorship, I developed a stronger
command of analytics tools and learned to
transform raw data into actionable insights.
Pandas and NumPy are powerful libraries that enable efficient
handling of extensive datasets, making it easier to analyze
complex economic indicators such as GDP, inflation rates, and
employment statistics. For constructing econometric models,
which are vital for economic predictions and testing hypotheses,
scikit-learn provides essential tools for regression analysis and
machine learning applications. Additionally, visualization libraries
like Matplotlib and Seaborn allow economists to produce clear
and insightful graphs that illustrate trends and patterns in
economic data. These tools also facilitate the export of results
and the creation of interactive reports, thereby enhancing the
transparency and verifiability of economic research.
References
Alteryx. (2021). A Beginner's Guide to Information Analytics in Business.
www.pythoninstitute.org
github.blog
yourstory.com
geeksforgeeks.org
Shmueli, G., & Koppius, O. (2011). Predictive Analytics in Information Systems Research. MIS Quarterly, 35(3), 553-572. doi:10.2307/23042796
Waller, M. A., & Fawcett, S. E. (2013). Data Science, Predictive Analytics, and Big Data: A Revolution That Will Transform Supply Chain Design and Management. Journal of Business Logistics, 34(2), 77-84. doi:10.1111/jbl.12010
Witten, I. H., Frank, E., & Hall, M. A. (2016). Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann.