OJT-SS
Report on
On Job Training
During
I Year II Semester
Submitted to
The Department of Data Science and Information Technology
in partial fulfillment of the academic requirements
for the award of the degree of
Master of Science in Data Science
By
SAYYAD SAHIL JAMAL
CERTIFICATE
DECLARATION
It is declared, to the best of my knowledge, that the work reported here does not form part of any
dissertation submitted to any other university or institute for the award of any degree.
ACKNOWLEDGEMENT
I would like to express my gratitude to all the people behind the scenes who
helped me transform an idea into a real application.
I would like to thank Mr. Tejas Jadhav for his technical guidance,
constant encouragement and support in carrying out my project at college.
I would like to express my heartfelt gratitude to my parents, without whom
I would not have been privileged to achieve and fulfil my dreams. I am
grateful to our principal, who ably runs the institution and has had a
major hand in enabling me to complete this project.
Abstract
LIST OF FIGURES
1. Fig 3.1 Architectural Design
INDEX
Abstract
1. INTRODUCTION
1.1 Scope
1.2 Existing Systems
1.3 Proposed Systems
2. SYSTEM ANALYSIS
3. SYSTEM DESIGN
3.1 Architecture Design
3.2 Modules
3.3 UML Diagrams
4. SYSTEM IMPLEMENTATION
5. OUTPUT SCREENS
6. INTERNSHIP FEEDBACK
CONCLUSION
FUTURE SCOPE
8. BIBLIOGRAPHY
1) Appendix A: Abstract
2) Appendix C: Domain of Internship and Nature of Internship
1. INTRODUCTION
1.1 Scope
• Limited Scalability
• Complex Maintenance
• Reduced Flexibility
The proposed system for data storage in Amazon Simple Storage Service (Amazon
S3) revolves around creating a streamlined and secure infrastructure. Using S3
buckets as the organizational framework, the system ensures the systematic
categorization and storage of diverse datasets. Access controls are configured
carefully to provide granular control over who can access the data.
Data transfer methods, including direct uploads and integration with other AWS
services, allow a variety of data types to flow efficiently and securely into Amazon
S3, so the system can adapt to changing data requirements. Storage classes, such as
Standard, Intelligent-Tiering, Glacier, and Glacier Deep Archive, are chosen based
on the specific characteristics of the data, such as how often it is accessed.
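The choice of storage class can be illustrated with a small sketch. The thresholds below are hypothetical rules of thumb chosen for illustration, not AWS guidance; only the returned strings (`STANDARD`, `INTELLIGENT_TIERING`, `GLACIER`, `DEEP_ARCHIVE`) are the values the S3 API uses for the four classes named above.

```python
def choose_storage_class(reads_per_month: float, access_pattern_known: bool,
                         retrieval_hours_acceptable: float) -> str:
    """Pick an S3 storage class from simple access characteristics.

    Hypothetical heuristic: the thresholds are illustrative assumptions.
    """
    if not access_pattern_known:
        # Intelligent-Tiering moves objects between tiers automatically.
        return "INTELLIGENT_TIERING"
    if reads_per_month >= 1:
        # Data read at least monthly stays in the default class.
        return "STANDARD"
    if retrieval_hours_acceptable >= 12:
        # Deep Archive: lowest storage cost, retrievals can take hours.
        return "DEEP_ARCHIVE"
    # Rarely read data that may be needed sooner goes to Glacier.
    return "GLACIER"


print(choose_storage_class(reads_per_month=10, access_pattern_known=True,
                           retrieval_hours_acceptable=0))
```

In practice such a rule would be applied once per dataset (or prefix) rather than per object, and revisited as access patterns change.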
MERITS:
• Scalability and Flexibility
2. SYSTEM ANALYSIS
System analysis for storing data in Amazon Simple Storage Service (Amazon S3)
involves a comprehensive examination of the requirements, functionalities, and
constraints associated with utilizing this cloud storage solution. The analysis
encompasses several key aspects:
Bucket Management:
• Creation: Users should be able to create new S3 buckets to logically organize and store
data.
• Deletion: Authorized users should have the ability to delete buckets that are no longer
needed.
• Download and Retrieval: Authorized users should have the capability to retrieve and
download stored data from S3 buckets.
Access Controls:
• ACLs and Bucket Policies: Implement access control lists (ACLs) and bucket policies
to control who can access and perform operations on S3 buckets and objects.
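How access rules combine can be sketched with a simplified model of IAM policy evaluation, assuming statements with only `Effect`, `Action` and `Resource` keys: an explicit `Deny` always wins, an `Allow` grants access, and anything not allowed is implicitly denied. Real evaluation also considers conditions, principals, ACLs and other policy types, and matches actions case-insensitively; none of that is modelled here, and Python's `fnmatch` wildcards stand in for IAM's `*` and `?`.

```python
from fnmatch import fnmatchcase


def _matches(patterns, value: str) -> bool:
    # A statement may list one pattern as a string or several as a list.
    if isinstance(patterns, str):
        patterns = [patterns]
    return any(fnmatchcase(value, p) for p in patterns)


def is_allowed(statements, action: str, resource: str) -> bool:
    """Simplified IAM-style decision for one action on one resource."""
    allowed = False
    for st in statements:
        if _matches(st["Action"], action) and _matches(st["Resource"], resource):
            if st["Effect"] == "Deny":
                return False  # an explicit deny overrides any allow
            if st["Effect"] == "Allow":
                allowed = True
    return allowed  # no matching Allow means an implicit deny


# Hypothetical statements: full S3 access, except deleting objects in one bucket.
statements = [
    {"Effect": "Allow", "Action": "s3:*", "Resource": "*"},
    {"Effect": "Deny", "Action": ["s3:DeleteObject"],
     "Resource": "arn:aws:s3:::example-bucket/*"},
]
print(is_allowed(statements, "s3:DeleteObject", "arn:aws:s3:::example-bucket/lab1.csv"))
```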
Security Measures:
• Encryption: Implement encryption mechanisms for data in transit and at rest, ensuring
the security and confidentiality of stored information.
Usability:
• User Access Management: Implement user access management to control who can
perform specific
The performance requirements for storing data in Amazon Simple Storage Service
(Amazon S3) center on optimizing data transfer, retrieval, and system responsiveness.
The system must ensure high-speed data transfer between clients and S3 buckets, with
clearly defined minimum acceptable rates for uploads and downloads, accounting for
network latency. Minimizing latency in data access and retrieval operations is
paramount, and the system should support a specified number of concurrent requests
without compromising performance. Different storage classes, such as Standard and
Glacier, should exhibit defined performance characteristics, and the system must
scale horizontally to handle increasing data volumes while maintaining high
availability and reliability. Data redundancy measures should be in place to ensure
availability in the event of hardware failures, and the system should optimize data
retrieval speed, especially for frequently accessed data. Throughput requirements
must be specified for data transfer operations, and seamless integration with analytics
tools and other AWS services should be ensured. Frequently accessed data can be cached to
speed up retrieval, and monitoring and reporting mechanisms for performance metrics should
be implemented to evaluate and maintain the system's efficiency, responsiveness, and
scalability over time.
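Minimum acceptable transfer rates can be sanity-checked with a back-of-the-envelope estimate, t = 8·S / (n·r) + latency, for S bytes sent over n streams of r megabits per second each. This hypothetical helper assumes throughput scales linearly with the number of parallel streams, which real networks only approximate.

```python
def estimated_transfer_seconds(size_bytes: int, per_stream_mbps: float,
                               streams: int = 1, latency_ms: float = 0.0) -> float:
    """Idealized transfer time in seconds.

    Assumes the aggregate rate is streams * per_stream_mbps (linear scaling
    of parallel streams) plus one fixed latency term.
    """
    if per_stream_mbps <= 0 or streams < 1:
        raise ValueError("bandwidth must be positive and streams >= 1")
    bits = size_bytes * 8                                  # bytes -> bits
    aggregate_bps = per_stream_mbps * 1_000_000 * streams  # Mbps -> bits/s
    return bits / aggregate_bps + latency_ms / 1000        # ms -> s


print(estimated_transfer_seconds(1_000_000_000, 100))             # one stream
print(estimated_transfer_seconds(1_000_000_000, 100, streams=4))  # four streams
```

For example, a 1 GB file over a single 100 Mbps stream takes about 80 seconds; under the same linear-scaling assumption, four parallel streams would bring that to about 20 seconds.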
Technology: Amazon S3
3. SYSTEM DESIGN
All big data solutions begin with storing data. This is the first step in the big data
pipeline. You can store data with several different services from Amazon Web
Services (AWS). Amazon Simple Storage Service (Amazon S3) is one of the most
commonly used services for storing data. You will use the AWS Management Console to create
an S3 bucket. You will then add an AWS Identity and Access Management (IAM)
user to a group that has full access to Amazon S3. You will also upload files to
Amazon S3 and run simple queries on the data in Amazon S3.
You must have permissions to access Amazon S3. IAM is a web service for securely
controlling access to AWS services. One best practice for managing IAM permissions is to
create groups of users with a set of permissions. These permissions are controlled by IAM
policies.
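The group-based approach can be sketched as a small in-memory model: a user's effective statements are those attached directly to the user plus those inherited from every group the user belongs to. The classes and field names are hypothetical; only the names `awsuser` and `awsusers` come from the lab.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    name: str
    policies: list = field(default_factory=list)  # policy documents (dicts)


@dataclass
class User:
    name: str
    groups: list = field(default_factory=list)    # Group objects
    policies: list = field(default_factory=list)  # policies attached directly


def effective_statements(user: User) -> list:
    """Collect every policy statement that applies to a user."""
    docs = list(user.policies)
    for group in user.groups:
        docs.extend(group.policies)  # permissions inherited from the group
    return [st for doc in docs for st in doc["Statement"]]


s3_full = {"Version": "2012-10-17",
           "Statement": [{"Effect": "Allow", "Action": "s3:*", "Resource": "*"}]}
awsusers = Group("awsusers", [s3_full])
awsuser = User("awsuser", groups=[awsusers])
print(effective_statements(awsuser))
```

Granting access by adding `awsuser` to `awsusers`, rather than attaching the policy to the user, means the permission changes in one place when the group policy changes.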
3.2 Modules
Section 1: Bucket Management
Bucket management in Amazon S3 includes creating and configuring storage
containers with fine-grained access controls, versioning, and lifecycle policies for
efficient data governance. Users can optimize storage costs and enhance security,
tailoring configurations to specific organizational needs through features like cross-
region replication and bucket policies.
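A lifecycle policy of the kind mentioned above is expressed as a configuration document. This sketch builds one rule in the shape accepted by the S3 `PutBucketLifecycleConfiguration` API; the rule ID, the `raw/` prefix and the day counts are hypothetical values chosen for illustration.

```python
import json


def lifecycle_rule(rule_id, prefix, transitions, expire_after_days=None):
    """Build one lifecycle rule in the shape used by PutBucketLifecycleConfiguration.

    transitions: (days, storage_class) pairs, e.g. (30, "STANDARD_IA").
    """
    steps = sorted(transitions)  # order transitions by age in days
    rule = {
        "ID": rule_id,
        "Filter": {"Prefix": prefix},  # apply only to keys under this prefix
        "Status": "Enabled",
        "Transitions": [{"Days": days, "StorageClass": sc} for days, sc in steps],
    }
    if expire_after_days is not None:
        # Expiring objects before their last transition would make that
        # transition pointless, so treat it as a configuration error.
        if steps and expire_after_days <= steps[-1][0]:
            raise ValueError("expiration must come after the last transition")
        rule["Expiration"] = {"Days": expire_after_days}  # delete after N days
    return rule


# Hypothetical rule: Standard-IA after 30 days, Glacier after 90, delete after a year.
config = {
    "Rules": [
        lifecycle_rule("archive-raw-data", "raw/",
                       [(30, "STANDARD_IA"), (90, "GLACIER")],
                       expire_after_days=365)
    ]
}
print(json.dumps(config, indent=2))
```

The resulting dictionary could be passed as `LifecycleConfiguration=config`, together with the bucket name, to boto3's `put_bucket_lifecycle_configuration`.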
Section 5: Usability
Amazon S3's usability is reflected in its intuitive web interface, facilitating easy
navigation, bucket management, and access control configuration. User access
management ensures a secure and efficient experience, making it accessible for users
to manage and retrieve data seamlessly.
In UML, use-case diagrams model the behavior of a system and help to capture the
requirements of the system. Use-case diagrams describe the high-level functions and
scope of a system. These diagrams also identify the interactions between the system
and its actors.
4. SYSTEM IMPLEMENTATION
4.1 Procedure
In this task, you will review the permissions for the awsusers IAM group and add the awsuser
to that group.
In this task, you will create a new group of user accounts.
• On the AWS Management
The policy document is in JavaScript Object Notation (JSON) format. This policy states
that users in that group are allowed to take all actions for Amazon S3 on all resources.
• Choose Cancel.
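The policy described above can be written out as JSON. This sketch reproduces a policy of that shape; it is not necessarily identical to the managed policy attached in the lab, which may contain additional statements. The second function is a hypothetical least-privilege alternative scoped to a single bucket, and the bucket name in the test is a placeholder.

```python
import json


def s3_full_access_policy() -> dict:
    """Identity-based policy that allows every Amazon S3 action on every resource."""
    return {
        "Version": "2012-10-17",  # IAM policy language version
        "Statement": [
            {"Effect": "Allow", "Action": "s3:*", "Resource": "*"},
        ],
    }


def s3_read_only_bucket_policy(bucket: str) -> dict:
    """Narrower alternative: list one bucket and read the objects in it."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            # ListBucket applies to the bucket ARN itself...
            {"Effect": "Allow", "Action": "s3:ListBucket",
             "Resource": f"arn:aws:s3:::{bucket}"},
            # ...while GetObject applies to the objects inside it.
            {"Effect": "Allow", "Action": "s3:GetObject",
             "Resource": f"arn:aws:s3:::{bucket}/*"},
        ],
    }


print(json.dumps(s3_full_access_policy(), indent=2))
```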
The policy document is in JSON format. This policy states that users in the group are not
allowed to perform the following specified actions on S3 objects:
In this task, you will add the awsuser to the awsusers group. You will also log out of the
console and log back in to the console with the awsuser account and password. • In the
• From the navigation header, open the list of account actions and copy the account ID.
• To sign back in with the awsuser credentials, choose Sign in to the Console.
• Select IAM user and then use the following information to sign in:
Note: Remove the dashes from the account number before you enter it.
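The sign-in step relies on the copied account ID. A minimal sketch of turning it into the IAM user sign-in address, following the standard `https://<account-id>.signin.aws.amazon.com/console` pattern; the account ID used in the test is a placeholder.

```python
import re


def console_signin_url(account_id: str) -> str:
    """Build the IAM user sign-in URL from an account ID such as 1234-5678-9012."""
    digits = account_id.replace("-", "")  # the console expects the ID without dashes
    if not re.fullmatch(r"\d{12}", digits):
        raise ValueError("an AWS account ID has exactly 12 digits")
    return f"https://{digits}.signin.aws.amazon.com/console"
```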
• Enter a bucket name with three or more characters. Uppercase characters are not allowed.
Note: S3 bucket names must be unique across all buckets in Amazon S3. If you get a conflict
Note: Write down the bucket name because it will be used in future steps.
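The naming constraints can be checked before the form is submitted. This sketch covers only a subset of the documented rules for general purpose buckets (length, allowed characters, adjacent periods, names formatted like IP addresses, and two reserved affixes); global uniqueness can only be confirmed by Amazon S3 itself when the bucket is created.

```python
import re

# Lowercase letters, digits, dots and hyphens; must start and end with a letter or digit.
_ALLOWED = re.compile(r"[a-z0-9][a-z0-9.-]*[a-z0-9]")
# Four dot-separated groups of digits, e.g. 192.168.5.4.
_IP_LIKE = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def bucket_name_problems(name: str) -> list[str]:
    """Return the rule violations found; an empty list means none were found."""
    problems = []
    if not 3 <= len(name) <= 63:
        problems.append("length must be 3-63 characters")
    if not _ALLOWED.fullmatch(name):
        problems.append("only lowercase letters, digits, '.' and '-', "
                        "starting and ending with a letter or digit")
    if ".." in name:
        problems.append("no two adjacent periods")
    if _IP_LIKE.fullmatch(name):
        problems.append("must not be formatted as an IP address")
    if name.startswith("xn--") or name.endswith("-s3alias"):
        problems.append("uses a reserved prefix or suffix")
    return problems


for candidate in ["lab-data-bucket", "Lab_Data", "192.168.5.4"]:
    print(candidate, bucket_name_problems(candidate))
```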
In this task, you will upload an object to the S3 bucket that you created. First, you must get the
file.
• Choose Upload.
In this task, you will query the object that you uploaded to verify that it was uploaded
successfully.
• Review the file properties for the file that you uploaded.
Note: You should get a message stating that versioning is not enabled for the bucket. This
behavior is expected.
• You should see the first few records from the file.
• Replace the previous query: delete it, and then paste the query that you copied.
• In the Result pane, you should get the total number of records, which is 5.
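The queries in this task run through S3 Select. A sketch of the request as keyword arguments for boto3's `select_object_content`; the bucket name and both SQL statements are hypothetical stand-ins, since the exact queries copied in the lab are not reproduced here.

```python
def select_request(bucket: str, key: str, expression: str,
                   gzipped: bool = False, has_header: bool = True) -> dict:
    """Keyword arguments for an S3 Select query over a CSV object."""
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": expression,
        "InputSerialization": {
            # USE treats the first line as column names.
            "CSV": {"FileHeaderInfo": "USE" if has_header else "NONE"},
            "CompressionType": "GZIP" if gzipped else "NONE",
        },
        "OutputSerialization": {"CSV": {}},
    }


# Hypothetical queries: preview a few records, then count all records.
preview = select_request("example-bucket", "lab1.csv", "SELECT * FROM s3object s LIMIT 5")
count = select_request("example-bucket", "lab1.csv", "SELECT COUNT(*) FROM s3object s")
```

Each dictionary would be passed as `boto3.client("s3").select_object_content(**preview)`; the response's `Payload` is an event stream in which `Records` events carry the result bytes. AWS has since closed S3 Select to new customers, so newer accounts may not be able to run these queries.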
In this task, you will change the encryption setting and storage class for the lab1.csv file.
• In the Amazon S3 breadcrumbs, choose the bucket name for your bucket.
You receive a confirmation that you successfully edited the storage class.
In this task, you will upload a file that is compressed with gzip (a .gz file). First, you must get the
• In the Amazon S3 console, choose your bucket from the breadcrumbs again.
• Choose Upload.
• Choose Add files, and choose the lab1.csv.gz file that you downloaded previously.
• Choose Upload.
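A compressed file such as lab1.csv.gz can be produced and checked locally with Python's standard library before it is uploaded. A minimal sketch; the sample rows are hypothetical, not the contents of the lab file.

```python
import csv
import gzip
import io


def gzip_csv(rows) -> bytes:
    """Serialize rows to CSV text and compress the result with gzip."""
    text = io.StringIO()
    csv.writer(text).writerows(rows)
    return gzip.compress(text.getvalue().encode("utf-8"))


def read_gzip_csv(data: bytes) -> list:
    """Decompress a gzip payload and parse it back into rows."""
    text = gzip.decompress(data).decode("utf-8")
    return list(csv.reader(io.StringIO(text)))


# Hypothetical sample rows.
rows = [["id", "name"], ["1", "alpha"], ["2", "beta"]]
payload = gzip_csv(rows)
print(len(payload), read_gzip_csv(payload) == rows)
```

When querying such an object with S3 Select, the input serialization must declare `"CompressionType": "GZIP"` so that S3 decompresses it before applying the SQL.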
5. OUTPUT SCREENS
Output screens of the various functionalities used in this project are shown here, along
with their descriptions.
You must have permissions to access Amazon S3. IAM is a web service for securely
controlling access to AWS services. One best practice for managing IAM permissions
is to create groups of users with a set of permissions. These permissions are controlled
by IAM policies. An IAM policy is an entity that you attach to identities or resources
to define permissions.
Fig 5.1
Buckets and objects are the basic building blocks for Amazon S3. You create buckets
and add objects to the buckets. Objects in Amazon S3 can be up to 5 TB. You can set
individual object properties—such as encryption at rest and storage class type—in the
Amazon S3 console. Amazon S3 supports server-side encryption with Amazon S3 managed
keys, which use Advanced Encryption Standard (AES)-256, and with AWS Key Management
Service (AWS KMS) keys. With Amazon S3 managed keys, each object is encrypted with a
unique key, and that key is itself encrypted with a root key that AWS rotates regularly.
If you choose AWS KMS, your objects are also encrypted with unique keys, but you can
manage the KMS keys that protect them yourself.
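Because objects can be as large as 5 TB while a single PUT request is limited to 5 GB, large files are uploaded in parts. This sketch plans a multipart upload under S3's documented limits (at most 10,000 parts, each between 5 MiB and 5 GiB except the last); the preferred part size of 8 MiB is an assumption.

```python
MIB = 1024 ** 2
MIN_PART = 5 * MIB         # smallest part S3 accepts (the last part may be smaller)
MAX_PART = 5 * 1024 ** 3   # largest part S3 accepts: 5 GiB
MAX_PARTS = 10_000         # most parts one multipart upload can have


def ceil_div(a: int, b: int) -> int:
    return -(-a // b)  # integer ceiling division, exact even for huge sizes


def plan_parts(object_size: int, preferred_part: int = 8 * MIB) -> tuple[int, int]:
    """Return (part_size, part_count), growing the part size when the
    preferred size would need more than 10,000 parts."""
    part = max(preferred_part, MIN_PART, ceil_div(object_size, MAX_PARTS))
    if part > MAX_PART:
        raise ValueError("object is too large for one multipart upload")
    return part, max(1, ceil_div(object_size, part))


print(plan_parts(1024 ** 3))      # 1 GiB with the preferred 8 MiB parts
print(plan_parts(5 * 1024 ** 4))  # 5 TiB forces much larger parts
```

A 1 GiB file fits in 128 parts of 8 MiB, whereas a 5 TiB object needs parts of roughly 524 MiB to stay within the 10,000-part limit.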
When you uploaded the lab1.csv file, you accepted the default storage class, which is
Standard. Amazon S3 provides six different storage classes, each with different
properties and cost structures.
Fig 5.2
Fig 5.3
Fig 5.4
Fig 5.5
Fig 5.6
Fig 5.7
Fig 5.8
Fig 5.9
6. INTERNSHIP FEEDBACK
It was a good experience performing all the lab activities and referring to
the PowerPoint presentations provided. It was also a new experience
for us to enhance our skills by using all the applications provided in the
internship. We gained hands-on experience with each tool on the
AWS platform by performing various lab activities. The guided labs were
the building blocks needed to perform the more demanding labs,
which were challenging and compact.
CONCLUSION
In conclusion, employing AWS data analytics with data stored in Amazon S3, coupled
with Identity and Access Management (IAM), establishes a robust and secure foundation
for scalable and efficient data processing. Amazon S3 serves as a highly durable and
scalable storage solution, accommodating diverse data types and volumes. IAM ensures
secure access controls, allowing fine-grained permissions to regulate who can interact
with the data. This integrated approach facilitates seamless data analytics workflows,
from ingestion to transformation and analysis. The combination of these AWS services
enables organizations to harness the power of their data, ensuring reliability, scalability,
and stringent security measures throughout the entire data lifecycle.
FUTURE SCOPE
The future scope of AWS data analytics in storing data using Amazon S3 and IAM
(Identity and Access Management) is poised for continued growth and innovation. As
organizations increasingly prioritize data-driven decision-making, the demand for
scalable and secure data storage solutions coupled with robust analytics capabilities is set
to surge. AWS, with its comprehensive suite of services, including Amazon S3 for
durable and scalable storage, and IAM for fine-grained access control, positions itself at
the forefront of this evolution. Future developments may see enhanced integration with
machine learning and AI services, enabling more sophisticated analytics. Additionally,
advancements in real-time analytics, data governance, and compliance features within
the AWS ecosystem are likely, offering organizations powerful tools to derive actionable
insights from their data while ensuring security and compliance standards are met. The
collaborative nature of AWS services is expected to foster an ecosystem where seamless
interactions between storage, access control, and analytics components drive continuous
innovation in data analytics solutions.
8. BIBLIOGRAPHY
[1] https://awsacademy.instructure.com
[2] Grady Booch, James Rumbaugh, Ivar Jacobson. The Unified Modeling Language User Guide.
Addison-Wesley, Reading, Mass., 1999.
[3] https://docs.aws.amazon.com/s3/?id=docs_gateway#lang/en_us
[4] https://medium.com/aws-lambda-serverless-developer-guide-with-hands/amazon-s3-main-featuresbuckets-and-objects-use-cases-and-how-it-works-b2689024e1b6
[5] www.w3schools.com
[6] www.wikipedia.org
APPENDIX A: ABSTRACT
VPM Bandodkar College of Science and Tech.
Summer Industry Internship -I
Batch No:1
Title
Roll No Name
ABSTRACT
Storing data in Amazon Simple Storage Service (Amazon S3) is of paramount importance in the field of data analytics, serving as a
foundational solution for secure, scalable, and reliable data storage. Amazon S3's ability to handle
diverse datasets, from raw to processed, makes it an ideal choice for analytics workflows, ensuring
seamless scalability as data volumes grow. The durability and availability of Amazon S3 contribute to
the reliability of analytics processes, while robust security features such as access controls and
encryption safeguard sensitive data. The integration capabilities with various analytics tools streamline
workflows, allowing analysts to efficiently access and analyze data. Overall, Amazon S3 plays a central
role in empowering organizations to derive meaningful insights from their data while maintaining the
integrity, security, and scalability required for effective data analytics.
Table 2: Nature of the Project/Internship Work (Please tick √ as appropriate for your project)
Nature of project
1 VISUALIZATION AND ANALYSIS OF AMAZON REDSHIFT √ √
Table 1: Domain of the Project/Internship Work (Please tick √ as appropriate for your project)
VISUALIZATION AND ANALYSIS OF INDIA’S GDP USING AMAZON REDSHIFT B18 √