
Acknowledgement
I am truly thankful for the opportunity to complete my
internship in Business Analytics with YBI FOUNDATION.
This experience has greatly enhanced my understanding
of business analytics and the importance of data-driven
decision-making. I want to extend my heartfelt
appreciation to everyone who contributed to making
this experience possible.

I would like to particularly thank my internship
supervisor, Dr. Kakali Majumdar, for her guidance,
support, and valuable feedback throughout my time at
the organization. Her expertise and insights have
significantly enriched my learning experience and were
essential in helping me successfully navigate and
complete my project.

I am also grateful to my family and friends for their
unwavering support and encouragement throughout my
internship journey.

Contents

1. Introduction
1.1 Executive Summary
1.2 Introduction to Internship
1.3 Internship Duration
1.4 Objectives of the Internship

2. Internship Experience
2.1 Company Profile
2.2 Job Description and Responsibilities
2.3 Outline of the Internship
2.4 Introduction to Python
2.5 Introduction to Google Colab

3. Key Learnings
3.1 Understanding Business Analytics
3.2 Python Libraries for Business Analytics
3.3 Steps to Make a Colab File

4. Learning Outcomes
4.1 Skills Developed

5. Conclusion
5.1 Summary of the Internship Experience

6. References
6.1 Bibliography of Sources Used

The internship at YBI Foundation provided a comprehensive
learning experience in the field of business analytics,
focusing on practical applications of analytical tools and
methodologies. The primary goal of the internship was to
enhance my understanding of data-driven decision-making
processes while contributing to ongoing projects within the
organization.

During my tenure, I undertook several key projects, applying
various methodologies such as regression analysis, data
visualization techniques, and machine learning algorithms to
derive actionable insights from the data. These projects not
only allowed me to apply theoretical knowledge but also
enhanced my technical skills in software tools like Google Colab
and Python.

The outcomes of these projects were significant. Throughout
the internship, I also engaged in collaborative efforts that
enhanced my communication and teamwork skills. Regular
feedback sessions and mentorship from experienced
professionals at YBI Foundation played a crucial role in my
professional development.

In summary, the internship not only met my learning
objectives but also contributed to tangible improvements
within the organization. The insights gained and skills
developed during this period are invaluable as I pursue a
career in business analytics.

A summer internship is crucial for fostering both personal and
professional development, offering advantages that extend beyond
conventional classroom learning. One of the primary benefits is the
chance to gain hands-on experience. Interns are able to apply
theoretical concepts learned in their studies to practical projects,
effectively closing the gap between academic knowledge and the
realities of the workplace. This experiential learning aids in a smoother
transition into full-time employment.

Moreover, internships help individuals cultivate vital skills necessary
for success in the workplace, such as effective communication,
collaboration, and time management. These skills are best developed
through real-world experiences, where interns navigate workplace
dynamics, engage with colleagues, and learn to establish clear
communication when working with teams. Such experiences boost their
confidence and equip them to face future professional challenges.

Networking is another significant advantage of internships. Interns have
the opportunity to connect with industry professionals, supervisors,
and mentors who can provide valuable guidance, insights into the
industry, and potential job leads. These relationships can pave the way
for mentorship opportunities or job offers, making networking an
invaluable aspect of career development.

Additionally, summer internships allow students to explore their career
interests more deeply. By participating in daily tasks, they can assess
how well a particular field matches their skills and career aspirations.
This direct experience enables them to refine their career objectives
and identify roles that align with their strengths and interests.

In conclusion, a summer internship is a strategic step towards building a
successful future. It enhances resumes, broadens professional
connections, and promotes skill acquisition. The knowledge and
experiences gained during an internship lay a strong groundwork for a
fulfilling and prosperous career.

This report reflects on my experiences and insights from my business
analytics internship at YBI Foundation. This valuable opportunity
greatly enhanced my comprehension of data-driven decision-making
and the real-world application of analytical techniques within a
corporate context. The internship was structured to provide practical
experience while allowing me to engage in projects that utilize business
analytics to generate strategic value for the organization.

In the current competitive landscape, business analytics has become a
crucial component for organizations aiming to sustain their market
edge. The capability to effectively analyze data empowers businesses
to identify trends, streamline processes, and make informed decisions
grounded in evidence rather than mere intuition. Organizations that
embrace data analytics are better equipped to anticipate customer
preferences, enhance operational performance, and drive innovation.
As the shift towards data-driven methodologies intensifies, the need
for skilled professionals in the analytics field continues to rise.

YBI Foundation is a distinguished organization committed to
empowering young individuals through entrepreneurship and skills
development. It acts as a platform that nurtures innovation and
provides resources to support young people in their professional
journeys. By integrating business analytics into its operations, the
foundation seeks to improve service delivery, evaluate its impact, and
implement strategic initiatives that align with its mission of fostering
economic empowerment. The organization acknowledges the crucial
role of data in informing its programs and outreach strategies,
highlighting the significance of analytics in fulfilling its objectives.

Throughout this internship, I had the chance to work alongside
seasoned professionals and participate in projects that honed my
technical abilities. This experience also enriched my understanding of
how business analytics can facilitate substantial change within a
mission-driven organization like YBI Foundation.

I completed my internship at the YBI Foundation
from 17 August 2024 to 17 September 2024,
spanning a total of 1 month.

I worked in the business analytics department, where
I was engaged in analytical projects focused
on applying data analysis and visualization techniques
to support the organization's projects and initiatives.
The work also involved the use of tools like
Python for data analysis and libraries such as
pandas and NumPy to manipulate data and
conduct statistical analysis.
Additionally, I used Google Colab as my primary
development platform for running Python code.
This platform facilitated seamless collaboration
and made it easy to share notebooks with my
team. It also provided an efficient environment
for tasks such as data cleaning, analysis, and
model development.

Before beginning my internship at YBI Foundation, I
established specific objectives to boost my
productivity and enhance my experience in business
analytics. These goals were centered on improving
my technical abilities, broadening my knowledge
base, and ensuring that my internship experience
would align with my future career goals.

One of my primary aims was to gain proficiency in
data analysis tools and software. I understood the
significance of becoming adept with platforms like
Google Colab and Python, along with its various
libraries, as these are crucial for tasks such as data
visualization and statistical analysis. My goal was not
just to learn the functionalities of these tools but
also to apply them effectively in real-world
scenarios.

In addition to technical skills, I sought to deepen my
comprehension of analytical methodologies. This
encompassed acquiring knowledge in areas such as
regression analysis, predictive modeling, and
machine learning techniques. By delving into these
methodologies, I aimed to cultivate the skills
necessary to analyze complex datasets, extract
actionable insights, and support data-driven
decision-making processes.

Furthermore, I aspired to expand my knowledge of
the business environment, particularly how
analytics can influence economic decision-making.
As an economics student, understanding the intersection
of data analytics and economics was crucial for me.
I wanted to learn how organizations leverage data to
drive operational efficiency, and how metrics like GDP
growth, inflation rates, and unemployment data help in
understanding the economic implications of market
fluctuations and policy impacts.

Lastly, I aimed to enhance my soft skills,
particularly in communication and teamwork. Effective
communication is vital for successful project execution
in a professional setting. This internship was an
opportunity to engage with diverse perspectives
and learn how to convey complex analytical
concepts clearly and concisely.

YBI Foundation, based in Delhi, is a not-for-profit edutech
organization dedicated to empowering youth to thrive in the
realm of emerging technologies.

The YBI Foundation integrates both digital and traditional
methods to equip students, educators, and professionals
with vital skills, education, and technological knowledge.
Embracing a "learn anytime, anywhere" approach, the
platform offers free online courses led by instructors in
various domains such as data science, business analytics,
machine learning, cloud computing, and big data. Focused
on fostering innovation, creativity, and technology, the YBI
Foundation customizes its offerings to align with the
evolving demands of the industry, helping learners stay
competitive in the dynamic tech landscape.

Target Market: The target group primarily consists of students
pursuing degrees in engineering, management, and
computer applications. Additionally, it includes individuals
interested in careers within emerging technologies such as
machine learning, artificial intelligence, big data, cloud
computing, and data science.

User Age Range:
Individuals under 18 years old
Young adults aged 18 to 25
Adults aged 26 to 34

YBI Foundation is a leading organization
dedicated to fostering entrepreneurship and
supporting emerging entrepreneurs,
particularly in developing regions. The
foundation focuses on empowering young
individuals by providing mentorship, training
programs, and access to essential resources for
starting and managing successful businesses.
Through its efforts, YBI Foundation aims to
drive economic growth and create job
opportunities by promoting innovation and
entrepreneurial initiatives, making a meaningful
impact on the communities it serves.

(Infographic: number of students and colleges reached by YBI Foundation)
YBI Foundation is a non-profit organization
based in Delhi, India, established in
October 2020. Operating on a business-to-
consumer (B2C) model, the foundation has
a small team of fewer than 10 employees.
The core team includes Dr. Alok Yadav as
the founder and Arushi as the co-founder.

Hands-On Project Learning
Instant Doubt Resolution
Capstone Projects
Interview Preparation Support
Curriculum Aligned with Industry Needs
Verified Digital Certification
Beginner-Friendly Approach

The 30-day business analytics internship comprised 8 chapters
divided into 22 lessons, along with a project submission and a quiz at
the end to attain the certificate.

1. Importing Libraries:
Utilizing Scikit-learn for machine learning
predictions
Performing numerical operations with NumPy
Visualizing data insights using Matplotlib
Manipulating and processing data with Pandas

2. Acquiring Data:
Mastering web scraping methods
Cleaning and prepping datasets
Conducting exploratory data analysis (EDA)
Understanding when to apply pd.get_dummies()
for categorical variables (see the sketch below)
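
As a quick illustration of that last point, a minimal sketch with made-up data (the column name is hypothetical):

import pandas as pd
df = pd.DataFrame({'city': ['Delhi', 'Mumbai', 'Delhi']})
dummies = pd.get_dummies(df['city']) # One indicator column per category
print(dummies)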

3. Defining Targets and Features:
Identifying independent and dependent variables

4. Splitting Data:
Splitting data into training and testing subsets for
model evaluation.

5. Selecting Models:
Choosing between supervised and unsupervised learning
based on the problem
Managing categorical vs. continuous data effectively

6. Training Models:
Using model.fit(X_train, y_train) to train your model
Addressing issues with maximum iterations during training

7. Making Predictions:
Applying your trained model to make predictions on new data

8. Evaluating Accuracy:
Checking the performance of your model to ensure its
accuracy and reliability

GitHub Profile:
Setting up a professional GitHub profile
Uploading your assignment projects to showcase your work

Check my project here on GitHub:
(https://github.com/Gargi-1721)

The online internship focuses on exploring machine learning,
primarily using Python. It spans one month and emphasizes
the application of predictive analysis, particularly in business
contexts. As a self-paced program, it allows you to manage
your learning around your own schedule. By the end, you'll
gain a comprehensive understanding of machine learning and
its practical uses.

Introduction to Internship:
This section provided an overview of the internship's objectives,
goals, and structure. It established clear expectations regarding
the skills to be developed throughout the program. Additionally,
guidelines were provided to help us meet the internship
requirements, ensuring we were aware of deadlines, submission
standards, and available learning resources.

Introduction to Python:
Python emerged as an essential programming language for data
analysis. During this part of the internship, we focused on
understanding the foundational concepts of Python
programming.

Google Colab Usage:


Our next focus was on Google Colab, a user-friendly online
coding platform favored by data scientists. We learned how to
write and execute Python code within Colab, which significantly
improved our efficiency in conducting data analysis in a cloud
environment.

Python Libraries for Business Analytics:


We examined key Python libraries such as Pandas and NumPy.
We practiced using Pandas for various data manipulation tasks,
including data loading, cleaning, and transformation, while
utilizing NumPy for performing numerical computations.

Fundamentals of Business Analytics:


This segment provided valuable insights into how data can be
leveraged to solve business problems. By applying data
analytics, companies can effectively target specific customer
segments, predict sales, and optimize resources. Furthermore,
we utilized libraries like Matplotlib and Seaborn to create
visualizations, including bar charts, scatter plots, and line graphs.
These visual tools are crucial for conveying complex information
in a clear and accessible manner.

Project Submission:
The process of submitting your project involved
gathering the findings, insights, and
recommendations from our analysis.
Being able to present data insights is an essential
skill in business analytics. We honed our ability to
summarize complex data analyses in a clear and
concise manner, making it accessible for non-
technical stakeholders.

Feedback and Review Assignment


This part of the process encouraged us to reflect on
our internship experience, pinpointing the strengths
and areas for improvement. Receiving feedback is
crucial for understanding industry expectations and
enhancing our skills for future opportunities.

Final Quiz and Certification


The final quiz assessed our grasp of the course
material, ensuring we understood key concepts and
were prepared to apply them in real-world
situations.

Certification: Completing the quiz and earning a
certificate validated our new skills, enabling us to
showcase our knowledge to potential employers.

Python is a widely used, high-level programming
language that is interpreted and versatile,
renowned for its object-oriented and general-
purpose design with dynamic semantics. It plays
a significant role in numerous applications,
often operating behind the scenes in tools and
technologies people use every day. Python
supports various programming paradigms, such
as structured, object-oriented, and functional
programming. Known as a "batteries included"
language, it features an extensive standard
library equipped with a comprehensive array of
built-in functionalities.

Python, developed by Guido van Rossum, made
its debut on February 20, 1991. It has become a
pervasive technology, powering numerous
devices that people interact with daily, often
unknowingly. With billions of lines of Python
code written globally, it offers immense
potential for reusing code and gaining insights
from expertly designed examples.

Several factors contribute to Python's appeal as a great
language for learning:

1. Easy to Learn: Python has a relatively short learning
curve compared to many other languages, allowing you to
start programming quickly.

2. Efficient for Writing Software: Python enables faster
code development, making it easier to write new software
applications.

3. Open Source: Python is free, open-source, and
multiplatform, which makes it accessible and easy to install
and deploy—advantages not all languages offer.

4. Cross-Platform: Unlike some languages that require code
modifications for different platforms, Python allows you to
run the same code across any operating system with a
Python interpreter.

5. Rich Standard Library: Python provides an extensive
standard library that users can access, saving time by
offering pre-written modules for common tasks and
beyond.

Python is a versatile programming language that opens up
numerous opportunities across various industries. A solid
understanding of Python can lead to a wide range of
career possibilities.

Google Colab is an online Jupyter notebook platform
offered by Google, enabling users to write and run Python
code directly in their web browsers without any setup. It
provides free GPU access and simplifies the process of
sharing notebooks.

Advantages of Google Colab:


Accessibility: Google Colab is accessible via any web
browser, eliminating the need for installation on your
device. You can access your notebooks from anywhere with
an internet connection, offering exceptional convenience.

Computing Power: It provides access to advanced
computing resources such as GPUs and TPUs, allowing you
to efficiently train and execute complex machine learning
models.

Collaboration: Google Colab facilitates seamless teamwork
by enabling users to share notebooks and collaborate in
real-time, where others can view, edit, and execute the
code. This enhances productivity and collaborative efforts.

Educational Utility: As a valuable platform for learning data
science and machine learning, Google Colab supports
numerous tutorials and resources. Users can create, share,
and experiment with their own notebooks, making it an
effective educational tool.

How to create a Google
Colab notebook
To create a Google Colab notebook, we can follow
these steps:

Access Google Colab: Go to Google Colab or
search for "Google Colab" in your web browser.

Sign in to Google: Ensure you're logged into your
Google account, as Colab is part of Google Drive.

Create a New Notebook: Once on the Colab
homepage, click "File" in the top-left corner. Choose
"New notebook" from the dropdown menu.

Name Your Notebook: Click on the default
notebook name (usually "Untitled") in the top-left
corner, and rename it as desired.

Start Coding: By default, a code cell appears. Type
Python code directly into this cell. To add more
cells, click "+ Code" (for code cells) or "+ Text" (for
markdown text).

Save the Notebook: Colab automatically saves
your work in Google Drive. You can also go to "File"
> "Save a copy in Drive" to keep an additional copy.

Business analytics (BA) encompasses a variety of techniques and
technologies that leverage data to assist organizations in making
informed decisions. It is an integral part of the broader business
intelligence domain, which emphasizes the collection, storage,
and analysis of data.

Business analytics (BA) plays a vital role in economics by
enabling businesses to gain a deeper understanding of their
operations, customers, and market dynamics. This insight helps
businesses recognize opportunities, refine strategies, and reduce
risks. In essence, business analytics involves using quantitative
methods to derive meaningful insights from data that inform
decision-making.

There are four primary types of business analytics:

Descriptive: Analyzing past data to uncover trends and patterns.

Diagnostic: Examining historical data to understand the causes
of past events.

Predictive: Using statistical models to project future outcomes.

Prescriptive: Employing testing and analytical methods to
identify the most effective course of action in specific situations.

The overarching aim of business analytics is to extract significant
insights from data and represent them visually, thereby
facilitating informed and effective decision-making.

Business analytics plays a crucial role in providing
actionable insights that enable organizations to make
informed, data-driven decisions. It is vital for identifying
and addressing ongoing challenges that disrupt smooth
operations, such as workflow inefficiencies that lead to
delays. Additionally, business analytics evaluates how
resources are allocated and utilized, revealing potential
areas for cost reduction.

By examining market trends, businesses can discern
patterns that refine their strategies for better customer
targeting and enable quicker adaptations to shifts in
demand. Furthermore, business analytics offers essential
information about customer demographics and buying
behaviors. These insights can inform the creation of
tailored services and marketing initiatives, thereby
improving customer engagement and enhancing the
overall experience.

The main responsibility of business analytics professionals
is to collect and analyze data to support strategic decision-
making that influences the future direction of the
organization.

A Python library consists of a set of modules that
contain reusable code, making it easier for
developers to implement various functionalities in
their applications. By utilizing these libraries,
programmers can avoid redundant coding and
dedicate their efforts to more intricate programming
challenges. This is particularly advantageous in areas
such as Machine Learning, Data Science, and Data
Visualization.

How Python Libraries Work:

In environments like Google Colab, the use of Python
libraries is crucial. They provide robust tools for
conducting data analysis, executing machine learning
algorithms, and creating visual representations of
data. These libraries include optimized functions that
help save both time and resources, allowing
developers to tackle specific issues without the need
to build everything from scratch. This approach
enhances overall productivity and simplifies the
management of data-centric tasks.

Pandas
Pandas is a highly effective tool for data manipulation and
analysis, offering powerful structures like DataFrames and
Series that simplify working with structured data, especially
tabular data. It provides functionalities for cleaning,
transforming, and analyzing data, making it an essential
library for data science and machine learning tasks. Built on
Python, it is open-source, fast, flexible, and easy to use,
allowing users to perform complex data operations with
minimal code. With its ability to handle large datasets
efficiently and integrate seamlessly with other libraries,
Pandas has become one of the most widely used tools in the
field of data analysis.

Key Commands and Functions:


Importing Data:
import pandas as pd
data = pd.read_csv('data.csv') # Import data from a CSV
file
data = pd.read_excel('data.xlsx') # Import data from an
Excel file

Data Exploration:
data.head() # Display the first five entries of the
DataFrame
data.info() # Provide a summary of the DataFrame's
structure, including column names and data types
data.describe() # Generate descriptive statistics for
numerical columns

Business Analytics Applications:


Data Cleaning and Preparation to refine raw business
data for analysis.
Aggregation to create summary statistics by categories,
such as total sales by region (see the sketch below).
Time-series analysis to study trends over time, such as
monthly or annual sales performance.
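
For instance, the aggregation use case might look like this (a minimal sketch with a hypothetical sales DataFrame):

import pandas as pd
sales = pd.DataFrame({'region': ['North', 'South', 'North', 'South'],
                      'sales': [100, 80, 120, 90]})
totals = sales.groupby('region')['sales'].sum() # Total sales by region
print(totals)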

Numpy
NumPy, short for Numerical Python, is considered "the primary library for
scientific computing with Python" and serves as the foundation for libraries
like Pandas, Matplotlib, and Scikit-learn.

To begin using NumPy, it is typically imported as follows:

import numpy as np
Creating a DataFrame from a NumPy Array

You can easily create a DataFrame from a NumPy array
(this also requires importing pandas):

import pandas as pd
data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
df = pd.DataFrame(data, columns=['A', 'B', 'C'])
print(df)

Adding a New Column with a NumPy Array


To add a new column to the DataFrame, you can use another NumPy array:
df['D'] = np.array([10, 11, 12])
print(df)

Applying NumPy Functions to DataFrame Columns

NumPy functions can be applied directly to DataFrame columns for various
mathematical operations:

df['sqrt_A'] = np.sqrt(df['A']) # Calculating the square root of column 'A'
df['log_B'] = np.log(df['B']) # Calculating the natural logarithm of column 'B'
print(df)

Applications in Business Analytics


NumPy is widely used in business analytics for various purposes:

It enables efficient numerical calculations, making it suitable for handling
large datasets.
It supports matrix operations essential for linear algebra, which can be
crucial in predictive modeling, such as calculating regression coefficients.
Moreover, its seamless integration with Pandas enhances the capability for
advanced calculations within DataFrames.

Matplotlib
Matplotlib is an extensive library designed for creating static,
animated, and interactive visualizations in Python. It simplifies
simple tasks while making complex ones achievable. It offers
the ability to:

Produce publication-quality plots.
Generate interactive figures that allow zooming, panning,
and updating.
Tailor visual styles and layouts.
Export in various file formats.
Integrate with JupyterLab and Graphical User Interfaces.
Key Commands and Functions:
Creating Basic Plots:
import matplotlib.pyplot as plt
plt.plot([1, 2, 3], [4, 5, 6]) # Line plot
plt.show() # Display the plot

Customizing Plots:
plt.title('Sales Over Time')
plt.xlabel('Months')
plt.ylabel('Sales')
plt.legend(['Product A', 'Product B'])
plt.grid(True)

Bar, Scatter, and Histogram:


plt.bar(['A', 'B', 'C'], [10, 20, 15]) # Bar chart
plt.scatter([1, 2, 3], [4, 5, 6]) # Scatter plot
plt.hist(data['column_name'], bins=10) # Histogram

Applications in Business Analytics:

Data Visualization to showcase trends, distributions, and
comparisons.
Exploratory Data Analysis (EDA) for visually understanding
data patterns and outliers.
Time-Series Visualization to track changes over time, such
as revenue trends or seasonal patterns.

Seaborn
Seaborn is a powerful Python data visualization library built on Matplotlib
that enhances its capabilities by providing a simpler interface for creating
more aesthetically pleasing and complex statistical plots. It is designed to
make it easy to generate informative and attractive visualizations, especially
for statistical data, such as correlation heatmaps, regression plots, and
distribution charts. Seaborn integrates well with Pandas data structures,
making it ideal for visualizing data stored in DataFrames. Additionally, it
supports a variety of plot types, customization options, and color palettes,
which makes it a popular choice for data analysts and scientists aiming to
create insightful and visually appealing graphics with minimal effort.

Key Commands and Functions:

Basic Visualizations with Seaborn:
To start using Seaborn, you would typically import it:

`import seaborn as sns`

To create a line plot that shows trends over time, you can use:
`sns.lineplot(x='month', y='sales', data=data)`

For visualizing categorical data, a bar plot can be generated with:
`sns.barplot(x='category', y='sales', data=data)`

Advanced Visualization Techniques:


To explore data distributions across different categories, a box plot can
be utilized:
`sns.boxplot(x='category', y='value', data=data)`

To analyze relationships between multiple variables, a pair plot is useful:
`sns.pairplot(data)`

Applications in Business Analytics:

Statistical Data Visualization: Seaborn is instrumental in examining
distributions and relationships, which is crucial for understanding feature
correlations.
Categorical Data Analysis: It effectively compares metrics across
different groups or categories, aiding in decision-making processes.

Sklearn
Scikit-Learn, commonly referred to as sklearn, is a comprehensive
library tailored for machine learning tasks. It offers a suite of tools for
constructing models, preprocessing datasets, and assessing their
performance.
Fundamental Functions and Commands:

Data Preparation:
To prepare your data for modeling, you can use the following code
snippet:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
random_state=42)

This function divides the dataset into training and testing sets, which
is crucial for training models effectively and validating their
performance.

Model Development:
When building a model, you might use:

from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(X_train, y_train) # Train the model using the training data

In this example, a linear regression model is instantiated and trained
on the training dataset.

Performance Evaluation:
from sklearn.metrics import mean_squared_error, accuracy_score
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred) # Mean squared error for regression tasks
accuracy = accuracy_score(y_test, y_pred) # Accuracy for classification tasks (use with a classifier's predictions)

This part of the code helps to assess how well the model performs by
predicting outcomes on the test data and calculating relevant
performance metrics.
Steps to make a
Colab file
1. Open Google Colab and create a new notebook.

2. Import the library by writing the command: import pandas
as pd. (We can also create our own data.)

3. Import data - Here we have taken data on diabetes from
the GitHub profile of YBI Foundation.

4. Then we use the describe, head, and info functions to get more
details about the data we have with us.

The describe function gives a quick summary of the statistics for
numerical columns in a DataFrame.

The head function is utilized to showcase the initial rows of a
DataFrame, offering a swift glimpse into its contents and structure.

On the other hand, the info function delivers a brief overview of the
DataFrame, highlighting the number of non-null entries, the data
types for each column, and memory usage. This function is also
instrumental in identifying missing values and gaining insight into
data types.
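
A minimal sketch of steps 2-4 (the filename is hypothetical; substitute the actual raw CSV link from YBI Foundation's GitHub profile):

import pandas as pd
data = pd.read_csv('diabetes.csv') # Load the dataset
print(data.head()) # First five rows of the DataFrame
data.info() # Column names, data types, non-null counts, memory usage
print(data.describe()) # Summary statistics for numerical columns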

5. Defining the target (the y variable) and the features (the X variables).
If the y variable is continuous, we use regression in our model,
and if the y variable is categorical, we use classification in our
model.
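
A minimal sketch, assuming the dataset has a categorical 'diabetes' column as the target (the column name is an assumption):

y = data['diabetes'] # Target: categorical, so classification applies
X = data.drop(columns=['diabetes']) # Features: all remaining columns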

6. When working with data, it's important to split it into two parts:
training samples and testing samples. The training samples are
used to teach the model, helping it learn the different features
and patterns present in the data. On the other hand, the testing
samples, which consist of the remaining data, are used to evaluate
how well the model performs.

To make this division easier, you can use the train_test_split
function from the sklearn.model_selection library. This function
helps separate your dataset into training and testing sets,
ensuring that the model learns from one part of the data while its
performance is measured on another. This approach is crucial for
building a reliable model that can generalize well to new, unseen
data.
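
Continuing the sketch above:

from sklearn.model_selection import train_test_split
# Hold out 30% of the rows for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)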

7. When it comes to working with our categorical data, we need to
choose the right modeling approach. In this case, we’ll go with
classification modeling techniques. Specifically, we'll use logistic
regression, which is a great fit for our needs. We'll implement this
using the sklearn library, making it easier to handle our data and
build our model effectively.
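
A sketch of this step (raising max_iter is a common workaround for the maximum-iteration issue mentioned in the outline):

from sklearn.linear_model import LogisticRegression
model = LogisticRegression(max_iter=1000) # Higher max_iter helps the solver converge
model.fit(X_train, y_train) # Train on the training samples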

8. Using the trained model to make predictions is all about
applying what it has learned from past data to new situations. It
looks for patterns and insights in the information it has already
processed and uses that knowledge to generate meaningful
outputs when faced with fresh data.

9. Verify the model's accuracy or the level of error present.
The function yields a value ranging from 0 to 1, where 1
indicates that all predictions are accurate, and 0 signifies that
none are. To express this as a percentage, simply multiply by
100.
It is evident that the data utilized in this case shows an
accuracy of nearly 76%.
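
Steps 8 and 9 as code, continuing the same sketch:

from sklearn.metrics import accuracy_score
y_pred = model.predict(X_test) # Predictions on unseen test data
accuracy = accuracy_score(y_test, y_pred) # Fraction of correct predictions, between 0 and 1
print(round(accuracy * 100, 2), '%') # Express as a percentage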

10. We can plot the model's results using Matplotlib simply by
importing the Matplotlib library and using various
commands.
The plot function in Matplotlib is used to create line
plots. It is one of the most commonly used functions for
visualizing data in a 2D space.
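
For instance, a minimal sketch plotting actual versus predicted outcomes (variable names follow the sketch above):

import matplotlib.pyplot as plt
plt.plot(range(len(y_test)), list(y_test), label='Actual') # Actual outcomes
plt.plot(range(len(y_pred)), list(y_pred), label='Predicted') # Model predictions
plt.xlabel('Test sample')
plt.ylabel('Outcome')
plt.legend()
plt.show()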

11. The Seaborn library allows us to create informative
charts and graphs, including histograms, line plots,
bar graphs, scatter plots, etc.
Here we have taken a line plot and a histogram.
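
A minimal sketch (the column names are assumptions about the dataset):

import seaborn as sns
import matplotlib.pyplot as plt
sns.histplot(data['age']) # Distribution of a numeric feature
plt.show()
sns.lineplot(x='age', y='glucose', data=data) # Trend between two numeric features
plt.show()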

12. Performing OLS (ordinary least squares) regression.
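
A minimal sketch using statsmodels (which the report cites later); the choice of columns is an assumption:

import statsmodels.api as sm
X_ols = sm.add_constant(data['age']) # Add an intercept term
ols_model = sm.OLS(data['glucose'], X_ols).fit() # Fit glucose on age
print(ols_model.summary()) # Coefficients, R-squared, p-values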

Skills
Developed
Throughout my internship at YBI Foundation, I had the
opportunity to work with various analytical tools and
software that significantly enhanced my data analysis skills.
Notably, I became proficient in Google Colab and Python,
each of which played a crucial role in different aspects of my
analytical tasks.

Google Colab was my primary tool for data manipulation and
analysis. I utilized its functionalities to perform complex
calculations, create or import data, and generate charts that
visualized key metrics.
For instance, I analyzed survey data collected from young
entrepreneurs, posted on YBI's GitHub profile, to
identify trends in their resource utilization and verify data
accuracy. By employing Colab's data sorting and filtering
capabilities, I was able to get the required outcome. This
analysis not only informed YBI Foundation’s strategic
planning but also shaped new initiatives aimed at addressing
these challenges.

Additionally, I explored Python for more advanced data
analysis tasks, particularly in handling larger datasets and
performing statistical analyses. By using libraries such as
Pandas and Matplotlib, I conducted regression analyses to
forecast trends based on historical data. This capability
allowed me to present evidence-based recommendations.

OLS regression is a popular tool in business analytics, utilized
for a variety of purposes, including predicting sales, analyzing
customer behavior, and evaluating the effectiveness of
marketing campaigns.

For instance, consider a simple example where a business
wants to understand the relationship between the hours of
training employees receive and their subsequent sales
performance. Using OLS regression, the company can analyze
the data to see if increasing training hours correlates with
higher sales. If the analysis shows a positive relationship, the
business may decide to invest more in employee training,
ultimately aiming to boost sales performance.
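
A hedged illustration of that example with entirely made-up numbers:

import numpy as np
import statsmodels.api as sm
hours = np.array([5, 10, 15, 20, 25, 30]) # Hypothetical training hours
sales = np.array([52, 60, 71, 78, 90, 97]) # Hypothetical sales figures
X = sm.add_constant(hours) # Intercept plus slope
result = sm.OLS(sales, X).fit()
print(result.params) # A positive slope suggests more training goes with higher sales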

Skills developed through this internship include:

1. Technical skills
Data manipulation and management - I
learned to work with data in diverse formats, using Python
(Pandas, NumPy) for transforming and organizing data.
These skills are foundational for turning raw data into
structured formats, enabling efficient analysis. I gained
hands-on experience in handling data, merging
datasets, and preparing data for deeper analytical tasks.

Statistical analysis - Statistical analysis skills helped me
understand data patterns, test hypotheses, and validate
findings. Through tools like Python (Matplotlib,
statsmodels), I learned essential statistical techniques.
These skills enable interns to draw meaningful inferences
from data and support data-driven decision-making with
statistically valid insights.

Data Visualizations: Mastering data visualization
skills is vital for transforming complex information
into an easily understandable visual format.
Utilizing Python libraries like Matplotlib and
Seaborn enables the creation of charts, graphs, and
dashboards. These visual tools simplify the
presentation of findings, allowing for the
translation of intricate data insights into clear and
actionable conclusions.
Machine Learning: Developing machine learning
skills involves using algorithms to analyze data
trends and generate predictions. Leveraging
libraries such as Scikit-learn and Statsmodels in
Python allows me to work with models like
regression and OLS. This practical experience aids
in building predictive models and understanding
evaluation metrics such as accuracy, precision, and
recall, which are crucial for making informed, data-
driven predictions.
Big Data Tools: Big data tools are indispensable for
managing and processing extensive datasets,
particularly in scenarios where data volume
surpasses the limits of traditional databases.
Familiarity with analytical tools and cloud
platforms like Google Colab facilitates distributed
data processing, making it easier to manage,
analyze, and extract insights from large data sets,
ultimately preparing them for large-scale analytics
tasks.

2. Analytical Skills
Problem-Solving: Problem-solving skills are critical in business
analytics, as they enable us to approach data-related challenges
systematically. We learn to define business problems clearly, explore
potential solutions, and apply analytical methods to test hypotheses.
This skill is essential for tackling real-world issues, as it enables interns
to break down complex scenarios into manageable steps and find
practical, data-driven solutions.

Critical Thinking: Critical thinking helps us assess data objectively,
considering different perspectives and questioning assumptions.
We learn to evaluate data quality, recognize potential biases, and
weigh the validity of our findings. This skill is invaluable in analytics,
as it enables us to ensure our analyses are accurate, relevant, and
aligned with business objectives.

Statistical Analysis & Modeling: Beyond fundamental statistical


knowledge, we cultivate modeling skills that enable us to create
predictive and descriptive models to explain and forecast data
behavior. These skills are instrumental in uncovering patterns in data
and making predictions that can guide business strategy.

Data Interpretation: Data interpretation is the ability to make sense of
analysis outcomes and understand what they imply for the business.
We learn to interpret data patterns, correlations, and anomalies, and
present these findings in a way that aligns with business goals. This skill
ensures that our insights are meaningful and can directly inform
decision-making, helping stakeholders understand the implications of
data insights for the business.

3. Economic skills
Econometrics - Econometric models help interns examine
relationships between variables, such as price and demand,
or income and spending habits. For instance, we can analyze
how interest rates affect consumer spending, employment,
or investment decisions. Econometrics also enables model building and
regression analysis, particularly regression models (linear
and multiple regression), to interpret data, identify trends,
and understand economic behaviors.

Forecasting and Predictive Analysis - Forecasting and
predictive analysis skills are key for anticipating economic
trends, which can shape business strategy. Predictive analysis
allows us to apply machine learning and econometric models
to estimate future economic outcomes based on historical
data, to develop models that support business decision-
making by anticipating market trends, pricing strategies, and
potential risks.

Consumer Behavior Analysis - Understanding consumer
behavior is vital for developing targeted marketing and
pricing strategies. We can analyze consumer data to uncover
purchasing trends, preferences, and price sensitivity. This
analysis helps businesses tailor products, adjust pricing, and
design marketing campaigns that resonate with specific
customer segments. Also, consumer behavior analysis often
includes examining price elasticity, allowing us to evaluate
how changes in price affect consumer demand. These
insights help companies optimize pricing strategies and
maximize revenue.

Conclusion
My internship at the YBI Foundation was
invaluable for advancing my skills in data analysis
and business strategies. Through hands-on
projects and mentorship, I developed a stronger
command of analytics tools and learned to
transform raw data into actionable insights.

I am grateful for the opportunity to contribute to
YBI's mission and support decision-making
through data-driven solutions. The experience has
solidified my interest in business analytics, and I
look forward to applying what I have learnt to
make an impactful contribution in my future career.

With Colab, we as economists can write and
execute Python code in a cloud-based
notebook, which is accessible from anywhere and
requires no setup. This ease of access
supports collaboration, allowing multiple
researchers to work together on data and share
findings in real-time.

Overall, Python in Colab has become a powerful
platform for data-driven economic insights and
research. The use of Python libraries is
particularly suited to economics.

Pandas and NumPy are powerful libraries that enable efficient
handling of extensive datasets, making it easier to analyze
complex economic indicators such as GDP, inflation rates, and
employment statistics. For constructing econometric models,
which are vital for economic predictions and testing hypotheses,
scikit-learn provides essential tools for regression analysis and
machine learning applications. Additionally, visualization libraries
like Matplotlib and Seaborn allow economists to produce clear
and insightful graphs that illustrate trends and patterns in
economic data. These tools also facilitate the export of results
and the creation of interactive reports, thereby enhancing the
transparency and verifiability of economic research.

During this experience, I gained significant understanding of the
importance of data-driven decision-making and the critical role
that analytics play in achieving organizational objectives. The
projects I worked on not only challenged my analytical skills but
also enabled me to apply theoretical knowledge in a practical
environment, effectively bridging the gap between academic
learning and industry practice.

I am deeply thankful to my mentors at YBI Foundation, whose
support and guidance were instrumental throughout my
internship. Their constructive feedback and encouragement
greatly enhanced my learning experience and inspired me to
venture beyond my comfort zone. I also want to extend my
gratitude to the YBI Foundation for this opportunity, which has
been vital for my professional development. The organization
fosters an environment of growth, collaboration, and innovation,
and I am proud to have been a part of it.

References

Alteryx. (2021). A Beginner's Guide to Data Analytics in Business.

Davenport, T. H., & Harris, J. G. (2017). Competing on Analytics: The New Science of Winning.

Dua, A., & Dutta, A. (2020). Business Analytics: Principles, Strategies, and Applications.

Evans, J. R., & Lindner, C. (2021). Business Analytics: Methods, Models, and Decisions.

www.pythoninstitute.org

github.blog

yourstory.com

geeksforgeeks.org

Shmueli, G., & Koppius, O. (2011). Predictive Analytics in Information Systems Research. MIS Quarterly, 35(3), 553-572. doi:10.2307/23042796

Tableau Software. (2020). Data Visualization: The New Language of Business. Retrieved from Tableau.

Waller, M. A., & Fawcett, S. E. (2013). Data Science, Predictive Analytics, and Big Data: A Revolution that Will Transform Supply Chain Design and Management. Journal of Business Logistics, 34(2), 77-84. doi:10.1111/jbl.12010

Witten, I. H., Frank, E., & Hall, M. A. (2016). Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann.

48

You might also like

pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy