Internship Report

The document describes an internship report submitted by a student named Yashwanth R. for their Bachelors of Engineering in Computer Science. The report provides details of Yashwanth's internship at Macrolytics, an IT solutions company, where they learned about data science, Python programming, databases, machine learning, and business intelligence. The various sections of the report document Yashwanth's work and learnings during the internship period at Macrolytics.


VISVESVARAYA TECHNOLOGICAL UNIVERSITY

BELAGAVI – 590018, Karnataka

INTERNSHIP REPORT
ON

“DATA SCIENCE”
Submitted in partial fulfilment for the award of the degree (21CSI85)

BACHELOR OF ENGINEERING IN
Computer Science Engineering
Submitted by:
NAME: YASHWANTH.R

USN: 1AM21CS221

AMC ENGINEERING COLLEGE


Department of Computer Science Engineering
Accredited by NBA, New Delhi
Bannerghatta Road, Bangalore-83

Internship report 2022-2023 1


AMC ENGINEERING COLLEGE
Department of Computer Science Engineering
Accredited by NBA, New Delhi
Bannerghatta Road, Bangalore-83

CERTIFICATE

This is to certify that the Internship titled “Data Science” has been carried out by
YASHWANTH.R, a bonafide student of AMC Engineering College, in partial fulfillment
for the award of Bachelor of Engineering in CSE under Visvesvaraya Technological
University, Belagavi, during the year 2022-2023. It is certified that all
corrections/suggestions indicated have been incorporated in the report.

The internship report has been approved as it satisfies the academic requirements in
respect of the Internship prescribed for the course Internship / Professional Practice.

Signature of Guide Signature of HOD Signature of Principal

External Viva:

Name of the Examiner Signature with Date

1)

2)



DECLARATION

I, YASHWANTH.R, a 3rd-year student of Computer Science Engineering, AMC
ENGINEERING COLLEGE, declare that the Internship has been successfully
completed at Macrolytics. This report is submitted in partial fulfillment of the
requirements for the award of the Bachelor Degree in CSE, during the academic
year 2022-2023.

Date: 03-12-23
Place: Rajarajeshwarinagar, Bangalore
USN: 1AM21CS221
NAME: YASHWANTH.R



OFFER LETTER PROVIDED BY THE COMPANY



ACKNOWLEDGEMENT

This Internship is a result of accumulated guidance, direction and support of several important
persons. We take this opportunity to express our gratitude to all who have helped us to
complete the Internship.

We express our sincere thanks to our Principal for providing us adequate facilities to undertake
this Internship.

We would like to thank our Head of Dept – branch code, for providing us with an opportunity to
carry out the Internship and for his valuable guidance and support.

We would like to thank the Software Services team for guiding us during the period of the internship.

We express our deep and profound gratitude to our guide, Guide name, Assistant/Associate
Prof, for her keen interest and encouragement at every step in completing the Internship.

We would like to thank all the faculty members of our department for the support extended
during the course of Internship.

We would like to thank the non-teaching members of our dept, for helping us during the
Internship.

Last but not least, we would like to thank our parents and friends, without whose constant
help the completion of the Internship would not have been possible.



ABSTRACT
This report presents a comprehensive analysis of [Your Report Topic]. The study aims to [state
the main objectives or goals of the report]. Through [methodology/approach], we [briefly
describe how the research was conducted]. The findings reveal [key findings or results], shedding
light on [important insights or implications].

The report begins with an introduction to the [background/context] and a review of relevant
literature. Methodological details are outlined to provide transparency and context for the results.
The main body of the report discusses key findings, [subtopics], and their significance.

Key highlights of the report include [noteworthy findings or conclusions]. Recommendations for
[potential actions or future research] are provided based on the insights gained. The implications
of this research extend to [relevant stakeholders or fields].

Overall, this report contributes to the understanding of [Your Report Topic] and provides
valuable insights for [target audience or industry]. It serves as a foundation for further exploration
and advancements in [related areas].



Table of Contents
Sl.no  Description

1   Company Profile
2   About the Company
3   Introduction
4   System Analysis
5   Basics of Python Programming
6   Essential Libraries for Data Science
7   Database Design and Introduction to MySQL
8   Business Intelligence Process
9   Basic Visualization in Tableau
10  Introduction to Machine Learning
11  Introduction to Cloud and AWS
12  Introduction to Big Data


1. COMPANY PROFILE
A Brief History of Macrolytics
Macrolytics was incorporated with the goal “To provide high quality and optimal Technological
Solutions to the business requirements of our clients”. Every business is different, with a unique
business model, and so are its technological requirements. Macrolytics understands this, and
hence the solutions provided to these requirements differ as well. They focus on clients'
requirements and provide them with tailor-made technological solutions. They also understand
that the reach of a product to its targeted market, or the automation of an existing process into
an efficient and simple one, are the key features clients desire from the technological solutions
they are looking for, and these are the features they focus on while designing solutions for
their clients.

Macrolytics strives to be the front runner in creativity and innovation in software development
through their well-researched expertise, and to establish itself as an out-of-the-box software
development company in Bangalore, India. As a software development company, they translate
this software development expertise into value for their customers through their professional
solutions.

They understand that the best desired output can be achieved only by understanding the client's
demands better. Macrolytics works with clients and helps them define their exact solution
requirements. Sometimes clients even find that they have completely redefined their solution
or new application requirement during the brainstorming sessions, and here Macrolytics positions
itself as an IT solutions consulting group comprising high-calibre consultants.

They believe that technology, when used properly, can help any business to scale and achieve
new heights of success. It helps improve efficiency, profitability, and reliability; to put it in one
sentence, “Technology helps you to Delight your Customers”, and that is what they want to
achieve.



2. ABOUT THE COMPANY

Macrolytics Technology is a technology organization providing solutions for web design
and development, MySQL, Python programming, HTML, CSS, ASP, CAT, .NET, and LINQ.
Meeting ever-increasing automation requirements, Macrolytics Technology specializes in
ERP, connectivity, SEO services, conference management, effective web promotion, and
tailor-made software products, designing solutions that best suit clients' requirements. The
organization has the right mix of professionals as stakeholders to serve clients to the best of
their capability and at par with industry standards. They have young, enthusiastic, passionate
and creative professionals who develop technological innovations in the fields of mobile
technologies, web applications, and business and enterprise solutions. The motto of the
organization is to “Collaborate with our clients to provide them with the best Technological
solution, hence creating a Good Present and Better Future for our clients, which will bring a
cascading positive effect in their business shape as well”. Providing a complete suite of
technical solutions is not just their tag line; it is their vision for their clients and themselves,
and they strive hard to achieve it.



Departments and services offered

Macrolytics Technology plays an essential role as an institute; the level of education and the
development of students' skills depend on their trainers. If you do not have a good mentor,
you may lag behind others in many things, which is why Macrolytics Technology provides
skilled employees so that you do not feel insecure about academics. Personality development
and academic standing are among the things that lie in a mentor's hands. If you are trained
well, you can do well in your future, and knowing its importance, Macrolytics Technology
always tries to give you the best.

They have a great team of skilled mentors who are always ready to direct their trainees in the
best possible way. To ensure the skills of the mentors, many skill development programs are
held so that every mentor can develop their own skills in line with the demands of the
companies and prepare a complete, well-rounded trainee.

Services provided by Macrolytics Technology:


• Core Java and Advanced Java

• Web services and development

• Dot Net Framework

• Python

• Selenium Testing

• Conference / Event Management Service

• Academic Project Guidance

• On The Job Training

• Software Training

• CAD Automation



3. INTRODUCTION
In the era of rapid technological advancement, the convergence of data science and cloud
computing has revolutionized the way organizations harness and analyze vast amounts of data.
Data science, the interdisciplinary field that combines statistical analysis, machine learning, and
domain expertise, empowers businesses to extract valuable insights from their data.
Simultaneously, cloud computing platforms, such as Amazon Web Services (AWS), have
emerged as indispensable tools, providing scalable and flexible infrastructure to support the
ever-growing demands of data-intensive processes.

This report delves into the symbiotic relationship between data science and AWS, exploring how
the integration of cutting-edge data science techniques with AWS's robust cloud services
enhances the efficiency, scalability, and accessibility of data-driven solutions. As organizations
increasingly recognize the strategic importance of leveraging their data assets, the synergy
between data science and AWS becomes a pivotal driver for innovation, agility, and competitive
advantage in today's dynamic business landscape.

Throughout this report, we will navigate the key components of data science workflows on AWS,
examining the tools, services, and best practices that facilitate the seamless development,
deployment, and management of data-driven applications. From data ingestion and storage to
advanced analytics and machine learning model deployment, we will explore how AWS enables
organizations to build end-to-end data solutions that optimize decision-making processes and
uncover hidden patterns within their data.

As we embark on this exploration, it is essential to recognize the transformative impact of the
data science and AWS integration, not only on individual businesses but on industries as a whole.
The collaborative power of data science and AWS not only empowers organizations to make
data-driven decisions but also fosters innovation, accelerates time-to-market, and enhances the
overall agility of enterprises in the digital age.

4. SYSTEM ANALYSIS

System analysis is a critical phase in the development of information systems that involves
studying and understanding the current system, identifying problems or areas for improvement,
and defining the requirements for a new or enhanced system. This process is essential for
designing effective and efficient solutions that align with organizational goals. Here's an
overview of key aspects involved in system analysis:

Understanding the Current System:

Scope Definition: Clearly defining the boundaries of the system under consideration and
understanding its interfaces with external entities.
Stakeholder Identification: Identifying and involving all relevant stakeholders, including
end-users, managers, and IT professionals.
Problem Identification and Definition:



Gathering Requirements: Collecting and documenting the functional and non-functional
requirements of the system.
Feasibility Study: Assessing the technical, operational, and economic feasibility of implementing
a new system.

Data Analysis:

Data Collection: Identifying and documenting the data sources, formats, and structures.
Data Modeling: Creating data models (e.g., Entity-Relationship Diagrams) to represent the
relationships between different data entities.
Process Analysis:

Process Modeling: Creating process flow diagrams or flowcharts to represent the workflow and
interactions within the system.
Identifying Bottlenecks: Analyzing the current processes to identify inefficiencies or bottlenecks.

Object-Oriented Analysis (OOA):

Object Identification: Identifying and modeling objects in the system, along with their attributes
and behaviors.
Use Case Analysis: Identifying and documenting use cases that describe the interactions between
users and the system.
Prototyping and Mockups:

Prototyping: Developing prototypes or mockups to visualize the proposed system and gather
feedback from stakeholders.
User Interface Design: Designing the user interface based on user requirements and usability
principles.

System Requirements Specification:

Functional Requirements: Defining the functions and features the system must provide.
Non-functional Requirements: Specifying criteria related to performance, security, and other
quality attributes.

Documenting and Reporting:

System Analysis Report: Compiling the findings, requirements, and proposed solutions into a
comprehensive report.
Communication: Effectively communicating the analysis results to stakeholders for validation
and feedback.

System analysis is a dynamic and iterative process, often involving collaboration between
analysts, end-users, and other stakeholders. It lays the foundation for successful system design
and development, ensuring that the resulting information system meets the needs of the
organization and its users.



5. Basics of Python Programming
• History: Python was created by Guido van Rossum and first released in 1991.
• Philosophy: Python follows the principles of simplicity, readability, and versatility, often
summarized as the "Zen of Python."
• Installation: You can download and install Python from the official website
(https://www.python.org/).
Python comes in different versions; Python 2.x is end-of-life, so new projects should use Python 3.x.
• Python Syntax: Python uses indentation for code blocks (spaces or tabs) instead of
braces {}; this enforces code readability. Statements do not need to end with a semicolon.
• Data Types:
Numeric Types: Integers, Floats, Complex numbers.
Text Type: Strings.
Sequence Types: Lists, Tuples, and Range.
Mapping Type: Dictionary.
Set Types: Set, Frozenset.
Boolean Type: Bool.
• Variables and Assignments:
Variables are created by assigning a value to them.
Python is dynamically typed, meaning the type of a variable is inferred at runtime.
• Control Flow: Conditional Statements: if, elif, else.
Looping Statements: for, while.
• Functions:
Functions are defined using the def keyword.
Functions can have parameters and return values.
• Lists and Dictionaries:
Lists are ordered, mutable sequences.
Dictionaries are collections of key-value pairs (insertion-ordered since Python 3.7).
• File Handling:
Python provides easy-to-use tools for reading from and writing to files.
• Exception Handling:
Python has a try-except block for handling exceptions.
• Modules and Packages:
Modules are Python files that consist of Python code and definitions.
Packages are a way of organizing related modules into a single directory hierarchy.
• Object-Oriented Programming (OOP):
Python supports OOP principles, including classes and inheritance.
• Libraries and Frameworks:
Python has a rich ecosystem of libraries and frameworks, such as NumPy and Pandas for data
analysis, Flask and Django for web development, and more.
• Popular Python IDEs:
IDLE (comes with Python installation)
PyCharm
Jupyter Notebooks
VSCode
• Community and Resources:
Python has a large and active community.
Documentation: https://docs.python.org/
Online tutorials and forums are widely available.
• Testing and Debugging:
Python has built-in testing frameworks like unittest and pytest.
Debugging can be done using tools like pdb.
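The basics listed above can be illustrated in one short, self-contained script (a minimal sketch for illustration, not part of any internship assignment):

```python
# Core Python features: functions, control flow, collections, exceptions.

def describe_number(n):
    """Return a short label for an integer (functions use the def keyword)."""
    if n % 2 == 0:            # conditional statements: if / elif / else
        return "even"
    elif n < 0:
        return "negative odd"
    else:
        return "odd"

# Lists are ordered and mutable; dictionaries map keys to values.
numbers = [3, 4, -7, 10]
labels = {n: describe_number(n) for n in numbers}   # dict comprehension

# Looping with for; Python is dynamically typed, so no declared types.
summary = []
for n, label in labels.items():
    summary.append(f"{n} is {label}")

# Exception handling with try/except.
try:
    result = 10 / 0
except ZeroDivisionError:
    result = None

print(summary)   # ['3 is odd', '4 is even', '-7 is negative odd', '10 is even']
print(result)    # None
```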

Assignment
Complete four GeeksforGeeks website problems

SNAPSHOTS



6. Essential Libraries for Data Science

NumPy
Key Features:

• Provides the ndarray object for efficient array manipulation.
• Mathematical functions, broadcasting, and vectorization enhance performance.
• Linear algebra, a random module, and integration with other libraries contribute to its
versatility.
Use Cases:
• Widely used in scientific computing, data analysis, and machine learning.
• Essential for tasks like signal and image processing.
Community and Documentation:
• Active community support with comprehensive documentation.
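The features above can be sketched in a few lines (assuming NumPy is installed; the sample values are invented for illustration):

```python
import numpy as np

# ndarray creation and vectorized arithmetic (no explicit loops).
a = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([10.0, 20.0])

# Broadcasting: the 1-D array b is stretched across each row of a.
scaled = a * b                    # [[10, 40], [30, 80]]

# Built-in mathematical reductions.
col_means = scaled.mean(axis=0)   # [20., 60.]

# Linear algebra: solve the system a @ x = [1, 1].
x = np.linalg.solve(a, np.array([1.0, 1.0]))

print(scaled)
print(col_means)
print(x)   # [-1.  1.]
```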

Pandas
Key Features:
• DataFrame and Series are core data structures for structured data.
• Data cleaning, exploration, and manipulation functionalities.
• Time series data support, merging, joining, and I/O operations.
Use Cases:
• Data cleaning, preprocessing, and exploratory data analysis.
• Essential for statistical analysis, time series analysis, and data wrangling.
Community and Documentation:
• Large and active community support with comprehensive documentation.
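A minimal sketch of the Pandas workflow described above — structured data, cleaning, grouping, and merging (column names and sample values are made up for illustration):

```python
import pandas as pd

# A small DataFrame: structured, labelled, column-oriented data.
sales = pd.DataFrame({
    "region":  ["South", "South", "North", "North"],
    "product": ["A", "B", "A", "B"],
    "revenue": [100, 150, 80, 120],
})

# Data cleaning: drop missing values (none here; shown for the pattern).
sales = sales.dropna()

# Exploration and manipulation: group and aggregate.
by_region = sales.groupby("region")["revenue"].sum()

# Merging/joining with another table on a shared key.
regions = pd.DataFrame({"region": ["North", "South"],
                        "manager": ["Asha", "Ravi"]})  # hypothetical names
merged = sales.merge(regions, on="region")

print(by_region)
print(merged.head())
```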

Conclusion:
Python, NumPy, and Pandas together form a powerful trio for programming, scientific
computing, and data manipulation. Python's simplicity, combined with the array manipulation
capabilities of NumPy and the versatile data structures of Pandas, makes them indispensable tools
for a wide range of applications. Whether you're a programmer, data scientist, or analyst, these
tools empower you to efficiently work with data and solve complex problems. The active
communities and extensive documentation further contribute to their widespread adoption and
continuous improvement. As you delve deeper into these tools, you'll unlock even more
possibilities for creativity and innovation in your projects.



Assignment

SNAPSHOTS



7. Database Design and Introduction to MySQL
Database Design:

Database design is a crucial step in creating efficient and effective databases. It involves defining
the structure that will store and manage data.
Entity-Relationship (ER) Modeling:

ER modeling is a common technique for database design. It identifies entities, attributes, and
relationships to create a visual representation of the database structure.
Normalization:

Normalization is the process of organizing data to eliminate redundancy and dependency,
ensuring data integrity and minimizing anomalies.
Denormalization:

While normalization is important, denormalization may be applied for performance optimization
in certain scenarios, striking a balance between efficiency and data integrity.
Keys and Indexing:

Primary and foreign keys establish relationships between tables. Indexing enhances query
performance by allowing faster data retrieval.
Data Types and Constraints:

Choosing appropriate data types for columns and applying constraints (e.g., NOT NULL,
UNIQUE, CHECK) ensures data accuracy and consistency.

MySQL:
MySQL is an open-source relational database management system (RDBMS) widely used for
web applications and various software development projects.
Features:
• MySQL supports ACID properties (Atomicity, Consistency, Isolation, Durability) to
ensure reliability and data integrity.
• It provides a comprehensive set of SQL commands for database manipulation and
querying.
Data Definition Language (DDL):
• DDL statements in MySQL are used to define the database structure, including creating
and altering tables, specifying constraints, and defining indexes.
Data Manipulation Language (DML):
• DML statements are used for data manipulation operations such as SELECT, INSERT,
UPDATE, and DELETE.
Transactions:
• MySQL supports transactions, allowing multiple operations to be treated as a single,
atomic unit, ensuring consistency in the database.
User Management:
• MySQL allows the creation and management of multiple users with different levels of
access privileges, enhancing security.
Storage Engines:
• MySQL supports multiple storage engines, such as InnoDB and MyISAM, each with its
own advantages and use cases.
Community and Documentation:
• MySQL has a large and active community providing support and resources. The official
documentation is comprehensive, offering guides, tutorials, and a detailed reference.
Integration with Programming Languages:
• MySQL integrates seamlessly with various programming languages, making it a popular
choice for developers working on diverse projects.

Use Cases:

Web Development:
• MySQL is extensively used in web development for storing and retrieving data from
websites and web applications.
Enterprise Applications:
• MySQL is employed in enterprise-level applications where data integrity and reliability
are paramount.
Data Warehousing:
• It is suitable for data warehousing applications where large volumes of data need to be
efficiently stored and queried.
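The DDL, DML, and transaction concepts above can be sketched in Python. A MySQL server is needed to run MySQL itself, so this self-contained demo uses Python's built-in sqlite3 module as a stand-in; the SQL shown is essentially the same in MySQL (which would use a driver such as mysql-connector-python instead of sqlite3). The table and sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # in-memory database for the demo
cur = conn.cursor()

# DDL: define the schema, with a primary key and NOT NULL constraints.
cur.execute("""
    CREATE TABLE students (
        usn  TEXT PRIMARY KEY,
        name TEXT NOT NULL,
        dept TEXT NOT NULL
    )
""")

# DML: INSERT rows, then query them back with SELECT.
cur.executemany(
    "INSERT INTO students (usn, name, dept) VALUES (?, ?, ?)",
    [("1AM21CS221", "Yashwanth R", "CSE"),
     ("1AM21CS001", "Student Two", "CSE")],
)
conn.commit()

# Transactions: the UPDATE and INSERT below form one atomic unit;
# the duplicate key makes the INSERT fail, so both are rolled back.
try:
    cur.execute("UPDATE students SET dept = 'ISE' WHERE usn = '1AM21CS001'")
    cur.execute("INSERT INTO students VALUES ('1AM21CS221', 'Dup', 'CSE')")
    conn.commit()
except sqlite3.IntegrityError:
    conn.rollback()   # the UPDATE is undone along with the failed INSERT

rows = cur.execute("SELECT usn, dept FROM students ORDER BY usn").fetchall()
print(rows)   # dept of 1AM21CS001 is still 'CSE' after the rollback
conn.close()
```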

Assignment

Snapshots



8. Business Intelligence Process
Business Intelligence is a set of technologies, processes, and tools that help organizations collect,
analyze, and present business information to support decision-making. The BI process involves
various stages to transform raw data into actionable insights. Here is an overview of the typical
steps in a Business Intelligence process:

1. Data Collection:
Gather data from various sources, including databases, spreadsheets, cloud services, IoT devices,
and external data providers.
Extraction, Transformation, and Loading (ETL):
Use ETL processes to extract data from source systems, transform it into a consistent format, and
load it into a data warehouse or data mart.
2. Data Storage:
Data Warehouse:
Store structured, organized data in a centralized repository, such as a data warehouse, to facilitate
efficient querying and reporting.
Data Mart:
Create data marts for specific business units or departments to provide focused subsets of data
tailored to their needs.
3. Data Processing:
Data Modeling:
Design data models to represent the relationships between different data entities, ensuring data
integrity and supporting efficient queries.
Data Cubes:
Use multidimensional data cubes to organize data along multiple dimensions, facilitating
multidimensional analysis.
4. Data Analysis:
Query and Reporting:
Develop queries and reports to retrieve and present data in a readable format for analysis.
OLAP (Online Analytical Processing):
Utilize OLAP tools to interactively analyze multidimensional data, enabling users to explore and
drill down into information.
5. Data Visualization:
Dashboards:
Create interactive dashboards that visualize key performance indicators (KPIs) and critical metrics
to facilitate quick decision-making.
Charts and Graphs:
Use various charts, graphs, and visual elements to represent data trends and patterns.



6. Business Intelligence Analysis:
Ad-Hoc Analysis:
Allow users to perform ad-hoc analysis by providing tools that enable them to explore data on their
own.
Predictive Analytics:
Incorporate predictive analytics to forecast future trends and outcomes based on historical data.
7. Decision Making:
Reporting Tools:
Provide decision-makers with access to real-time or scheduled reports containing relevant insights.
Alerts and Notifications:
Implement alerts and notifications to notify stakeholders of significant changes or exceptions in
the data.
8. Continuous Improvement:
Feedback Loop:
Establish a feedback loop to collect user feedback and continuously refine the BI system based on
evolving business needs.
Performance Monitoring:
Monitor the performance of the BI system, identifying areas for improvement in data quality,
processing speed, and user experience.
9. Deployment and Maintenance:
User Training:
Provide training to end-users to ensure they can effectively use the BI tools and interpret the
information presented.
System Maintenance:
Regularly update and maintain the BI system, addressing issues, adding new features, and adapting
to changing business requirements.
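A toy end-to-end slice of this process — extract, transform, and aggregate into the kind of figure a dashboard KPI tile would show — can be sketched with pandas (sample records invented for illustration):

```python
import pandas as pd

# 1. Extract: raw records as they might arrive from a source system.
raw = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "region":   ["north", "NORTH", "south", "south"],
    "amount":   ["100", "250", "80", "120"],   # strings, as raw feeds often are
})

# 2. Transform: normalize case and cast types (a tiny ETL step).
clean = raw.assign(
    region=raw["region"].str.title(),
    amount=pd.to_numeric(raw["amount"]),
)

# 3. Load + analyze: aggregate into a small reporting table (a mini data mart).
kpi = clean.groupby("region")["amount"].agg(["sum", "count"])

# 4. Report: figures a dashboard would visualize as a KPI.
print(kpi)
```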



9. Basic Visualization in Tableau
Creating a basic visualization in Tableau involves importing data, selecting a chart type, and
customizing the visualization to convey meaningful insights. Below, I'll provide a step-by-step
guide to create a simple bar chart using Tableau.
Step 1: Install and Open Tableau
Ensure Tableau Desktop is installed on your machine.
Open Tableau Desktop.
Step 2: Connect to Data
In Tableau, click on "Connect" to import your data.
Choose the data source (Excel, CSV, database, etc.) and select the file or database table.
Step 3: Explore Data
Tableau will display a preview of your data. Review the columns and rows to understand the
structure of your dataset.
Step 4: Drag and Drop Dimensions and Measures
In the left sidebar, you will see Dimensions (categorical data) and Measures (numeric data).
Drag a categorical Dimension (e.g., "Category") to the Rows shelf.
Drag a numeric Measure (e.g., "Sales") to the Columns shelf.
Step 5: Choose Visualization Type
Tableau will automatically create a visualization based on the chosen dimensions and measures.
You can change the chart type by selecting from the "Show Me" menu on the right.



10. Introduction to Machine Learning

Machine Learning (ML) is a subfield of artificial intelligence (AI) that focuses on the
development of algorithms and statistical models that enable computer systems to perform tasks
without explicit programming. The core idea behind machine learning is to empower machines to
learn patterns and insights from data, improving their performance and decision-making
capabilities over time.
Machine learning can be categorized into four main types:
Supervised Learning:
In supervised learning, the algorithm is trained on a labeled dataset, where each input is
associated with a corresponding output. The goal is to learn a mapping from inputs to outputs,
allowing the algorithm to make predictions on new, unseen data.
Unsupervised Learning:
Unsupervised learning involves working with unlabeled data, where the algorithm aims to
discover inherent patterns or structures without explicit guidance. Clustering and dimensionality
reduction are common tasks in unsupervised learning.
Semi-supervised Learning:
Semi-supervised learning is a machine learning paradigm that falls between supervised learning
and unsupervised learning. In this approach, the algorithm is trained on a dataset that contains
both labeled and unlabeled examples. While a portion of the data is explicitly labeled with
corresponding outputs, the majority of the data remains unlabeled.
Reinforcement Learning:
Reinforcement learning involves an agent interacting with an environment and learning to make
decisions by receiving feedback in the form of rewards or penalties. The agent aims to maximize
cumulative rewards over time.
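As a concrete instance of supervised learning, here is a minimal 1-nearest-neighbour classifier in plain Python (toy data invented for illustration): it "learns" by storing labelled examples and predicts the label of the closest stored point.

```python
import math

# Labelled training data: (feature vector, label) pairs — a toy dataset.
train = [
    ((1.0, 1.0), "small"),
    ((1.2, 0.8), "small"),
    ((8.0, 9.0), "large"),
    ((9.0, 8.5), "large"),
]

def predict(point):
    """1-nearest-neighbour: return the label of the closest training point."""
    nearest = min(train, key=lambda ex: math.dist(ex[0], point))
    return nearest[1]

print(predict((1.1, 0.9)))   # "small" — closest to the first cluster
print(predict((8.5, 8.8)))   # "large"
```

Real projects would use a library such as scikit-learn, but the mapping from labelled inputs to predicted outputs is the same idea.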



11. Introduction to Cloud and AWS
Cloud Computing:
Cloud computing is a technology paradigm that enables access to a shared pool of computing
resources over the internet. Instead of relying on local servers or personal devices to handle
applications, users can leverage cloud services to store and process data, run applications, and
perform various computing tasks. Cloud computing offers flexibility, scalability,
cost-effectiveness, and accessibility, making it a popular choice for businesses and individuals.
Characteristics of Cloud Computing:
On-Demand Self-Service:
Users can provision and manage computing resources as needed, without requiring human
intervention from the service provider.
Broad Network Access:
Cloud services are accessible over the internet from a variety of devices, such as laptops, tablets,
and smartphones.
Resource Pooling:
Computing resources, such as processing power, storage, and memory, are pooled and shared
among multiple users, optimizing resource utilization.
Rapid Elasticity:
Cloud resources can be quickly scaled up or down to accommodate changing workloads,
providing flexibility and responsiveness.
Measured Service:
Cloud usage is metered and billed based on consumption, allowing users to pay for only the
resources they use.
Amazon Web Services (AWS):
Amazon Web Services (AWS) is one of the leading cloud service providers, offering a
comprehensive suite of on-demand computing services. AWS provides a vast array of
infrastructure services, platform services, and software services, allowing businesses to build and
scale applications without the need to invest in and manage physical hardware.
AWS Services:
Compute Services:
Amazon EC2 (Elastic Compute Cloud): Virtual servers in the cloud, allowing users to run
applications.
Storage Services:
Amazon S3 (Simple Storage Service): Scalable object storage for data backup, archiving, and
content distribution.
Amazon EBS (Elastic Block Store): Block-level storage volumes for EC2 instances.
Database Services:
Amazon RDS (Relational Database Service): Managed relational databases supporting various
database engines.
Amazon DynamoDB: Fully managed NoSQL database.

Networking Services:
Amazon VPC (Virtual Private Cloud): Isolated virtual networks for secure and customizable
cloud environments.
Amazon Route 53: Scalable and highly available Domain Name System (DNS) web service.
Analytics and Machine Learning:
Amazon Redshift: Fully managed data warehouse service for analytics.
Amazon SageMaker: Managed service for building, training, and deploying machine learning
models.
Security and Identity Services:
AWS Identity and Access Management (IAM): Access control and identity management
service.
AWS Key Management Service (KMS): Managed service for creating and controlling
encryption keys.
Management and Monitoring:
Amazon CloudWatch: Monitoring and management service for AWS resources.
AWS CloudTrail: Records AWS API calls for auditing.

Assignment:
Create an EFS and connect it to 3 different EC2 instances. Make sure all instances have
Different Operating Systems. For instance, Ubuntu, Red Hat Linux, and Amazon
Linux 2.

Snapshots:



12. Introduction to Big Data
Big Data refers to the massive volume, variety, and velocity of data that inundates businesses and
organizations on a day-to-day basis. This data is characterized by its size, complexity, and the
speed at which it is generated, making traditional data processing methods inadequate for
handling it. The advent of Big Data technologies has opened new avenues for extracting valuable
insights from these vast datasets, driving innovation and decision-making in various industries.
Characteristics of Big Data:
Volume:
Big Data involves large amounts of data, typically measured in petabytes, exabytes, or even
zettabytes. This includes data from various sources such as social media, sensors, transactions,
and more.
Variety:
Data comes in diverse formats, including structured data (like databases and spreadsheets),
semi-structured data (like JSON and XML), and unstructured data (such as text, images, and videos).
Velocity:
Big Data is generated at high speeds and requires real-time or near-real-time processing to derive
meaningful insights. Examples include streaming data from social media or Internet of Things
(IoT) devices.
Veracity:
Refers to the accuracy and trustworthiness of the data. Big Data may include uncertain,
incomplete, or inconsistent data, requiring advanced techniques to ensure data quality.
Value:
The ultimate goal of Big Data is to extract value and insights from the vast amounts of
information generated. This involves analyzing and interpreting data to make informed business
decisions.
Three Vs of Big Data:
Volume:
Refers to the sheer size of the data. Big Data technologies are designed to handle large datasets
that cannot be effectively processed with traditional databases.
Velocity:
Denotes the speed at which data is generated, collected, and processed. Real-time or
near-real-time processing is often crucial for certain applications, such as fraud detection or
monitoring social media.
Variety:
Encompasses the different types of data, including structured, semi-structured, and unstructured
data. Big Data solutions must be capable of handling this diverse range of data formats.
Challenges of Big Data:
Storage:
Managing and storing massive volumes of data requires scalable and cost-effective storage
solutions.



Processing:
Analyzing and processing large datasets in a timely manner is a significant challenge. Distributed
computing frameworks like Apache Hadoop and Apache Spark are commonly used.
Analysis:
Extracting meaningful insights from diverse and complex datasets requires advanced analytics
and machine learning techniques.
Privacy and Security:
With the increase in data volume and variety, ensuring the privacy and security of sensitive
information is a critical concern.
Integration:
Combining and integrating data from various sources, often with different formats, can be
complex and challenging.
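The processing challenge above is typically met by splitting work across many machines. The map/reduce pattern behind frameworks like Hadoop and Spark can be illustrated on a single machine in plain Python (a conceptual sketch, not a distributed system):

```python
from collections import Counter

# Pretend each string is a "split" of a huge file stored on a different node.
splits = [
    "big data needs big storage",
    "big data needs fast processing",
]

# Map: each node independently counts words in its own split.
def map_count(split):
    return Counter(split.split())

partials = [map_count(s) for s in splits]

# Reduce: merge the partial counts into one global result
# (in a real cluster, a "shuffle" routes each key to one reducer).
total = Counter()
for partial in partials:
    total.update(partial)

print(total.most_common(3))   # [('big', 3), ('data', 2), ('needs', 2)]
```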

