Susheel Kumar - Performance Lead
I am a certified AWS and GCP Cloud Solutions Architect with over 15 years of experience in Information
Technology, including over 10 years in IT Infrastructure Management, Cloud, Backup & Recovery, VMware
virtualization, and DevOps engineering. I have worked on AWS solutions and am familiar with Google Cloud,
VMware virtualization, and storage. I worked extensively on a 24x7 Production Support team, handling incident
and problem management and resolving issues related to performance, High Availability, Disaster Recovery,
and Fault Tolerance. I have excellent communication skills and am available to work onsite from day one.
EDUCATION
Master’s (M.S.) in Advanced Embedded Systems Design from International Institute of Information Technology,
Pune, India, 2006-2007.
Master’s in Business Administration (MBA) from Osmania University, Hyderabad, India, 2017-2019.
Bachelor’s in Engineering in Electronics and Communications Engineering from JNTU, Hyderabad, India,
2000-2004.
SKILLS
Programming Languages: Python, Terraform, YAML
Monitoring / Perf Tools: Dynatrace, CloudWatch, Datadog, JMeter, HP LoadRunner
Cloud: Amazon Web Services (AWS), Google Cloud Platform (GCP)
Cloud Services: Lambda, EC2, S3, VPC, API Gateway, SQS, CloudFormation, CloudWatch, IAM, CloudTrail,
CloudWatch Alarms, EventBridge, RDS, Fargate, ELB
Storage: NetApp, EMC, IBM, HP, Actifio (Backup & Recovery product)
Virtualization: VMware
Version Control / CI/CD Tools: Git, GitHub, Bitbucket, JFrog, Jenkins, Maven
DBMS: MySQL, Oracle, PostgreSQL, DynamoDB, Aurora
Bug, Issue and Change Tracking Tools: JIRA, Bugzilla, Kibana, Confluence, Xray, Rally
DevOps: Docker, Kubernetes, Ansible
Environment: Windows, Linux, Unix
EXPERIENCE
● Led a team of performance engineers in establishing a robust performance testing framework that
incorporates chaos engineering principles.
● Designed and executed chaos tests to proactively identify performance issues, stress test system
components, and validate the scalability and stability of critical applications.
● Used observability tools such as Datadog to monitor and analyze system behavior during chaos testing
and to identify potential weaknesses and bottlenecks.
● Mentored and coached junior performance engineers on performance testing methodologies, chaos
engineering practices, and performance tuning techniques to enhance team capabilities and expertise.
● Creation and maintenance of AWS EC2 instances in both production and non-production environments.
● Using Terraform to create resources on AWS.
● Integration of Lambda functions with other AWS services such as API Gateway, SQS, and SNS.
● Monitoring and troubleshooting using CloudWatch and Datadog, and configuring alarms and alerts for
critical events.
● Experience creating IAM, Route 53, Elastic Load Balancer, and Auto Scaling group resources.
● Creation of VPCs, RDS instances, and DynamoDB tables.
● Good exposure to version control and CI tools: Git, GitHub, and Jenkins.
● Deployed and managed AWS EKS clusters, ensuring high availability and scalability.
● All services of the chuck are hosted on the cloud as Kubernetes applications (microservices).
● Creation of resources using Terraform.
● Testing of complex warehouse workflows related to inbound and outbound operations.
● Supporting the team with workflow testing at the Atlanta warehouse.
● Well-versed in Python programming.
Environment: AWS, Terraform, Jenkins, Git, GitHub, Kibana, Python, Windows & Linux.
● 24x7 production support and rotational on-call support, including incident and problem management, as
part of the Cloud Engineering team.
● Built CI/CD pipelines with AWS CodeBuild for containerized Java applications running on Kubernetes,
using artifacts stored in JFrog, and deployed these applications onto the EKS cluster from Docker images
in AWS ECR repositories.
● Configuring databases for disaster recovery and high availability.
● Integrated Lambda functions into CI/CD pipelines for continuous deployment and testing.
● Collaborated with development teams to containerize applications and deploy them on Kubernetes.
● Used AWS CloudFormation to automate the creation of resources on AWS cloud.
● Cost optimization using AWS Lambda functions that alert specific team members when CPU or disk
utilization exceeds the configured threshold and that power off unused EC2 instances.
● Configured Lambda functions to trigger on events such as EC2 instance state changes and files added to
S3 buckets.
● Using Lambda functions to process messages from SQS queues.
● Monitoring and troubleshooting of Lambda functions, identifying performance issues, and optimizing their
resource allocation.
● Configuring Lambda triggers and managing multiple Lambda function versions and their aliases.
● Monitoring jobs run as Airflow DAGs, troubleshooting failures, and performing root-cause analysis.
● Creation and maintenance of SQL databases (Aurora, MySQL) and DynamoDB tables in both production and
non-production environments.
● Provisioning of instances, Airflow DAGs, EKS master and worker nodes in production and
non-production environments.
Environment: AWS, EKS, Kubernetes, Airflow DAGs, ServiceNow, Rally, Python, Windows & Linux.
Google, Hyderabad (Persistent Systems payroll), May 2020 to March 2022
Software Architect (QA)
● Supporting existing customers of Google Cloud’s Backup & DR product after Google's acquisition of
Actifio.
● Demonstrated advanced knowledge in Linux administration and troubleshooting, effectively diagnosing
and resolving issues related to system performance and availability.
● Protection of physical and virtual Windows and Linux machines using Google's Backup & DR product.
● Restoring file system (FS) and VM applications from the backup images created.
● Collaborated with cross-functional teams, including software development, operations, and quality
assurance, to define performance requirements, establish performance metrics, and drive performance
improvements.
● Conducted in-depth performance analysis using industry-standard tools and techniques, including load
testing, stress testing, and profiling, to identify and resolve performance bottlenecks, reducing response
times significantly. Used HP LoadRunner for the performance testing.
● Migrating VMs from on-premises setups to Google Cloud Platform (GCP) using Backup & DR.
● Led DevOps processes in the team, developing the CI/CD roadmap and implementing it for the
project.
● Built, maintained, and scaled infrastructure for Dev and QA environments.
● Led a team of six engineers, mentoring them and resolving their issues when needed.
● Installed, managed, and troubleshot Linux packages and services using the RPM, YUM, and APT package
managers.
● Installation and configuration of RHEL, CentOS, SUSE, and Windows operating systems on physical servers
and VMware ESX hosts; installation and upgrades of VMware vCenter and ESX servers.
● FC switch zoning and firmware upgrades on Brocade and Cisco switches.
● Configuration of RAID, creation of LUNs on storage arrays, and mapping them to client machines per Test
and Dev requirements.
● Using Python with Robot Framework, and shell scripting, to automate regular repetitive tasks.
● Worked on various DevOps components such as Docker containerization and Jenkins.
● Creation of AWS resources using Terraform.
● Configuration management using Ansible.
● Backing up VMware VMs and migrating them to Google Cloud.
● 24x7 production environment support, including incident and problem management.
● Monitoring of infrastructure including hosts, processes and network using tools like Dynatrace.
● Reduced costs significantly by eliminating unnecessary servers and consolidating databases.
● Set up Jenkins to automate the continuous integration, build, and deployment process.
● Automation using Python with Robot Framework.
Environment: AWS cloud services, GCP, Jenkins, Git, GitHub, Terraform, Kubernetes, Ansible, Python,
Jira, Windows & Linux.
Environment: Docker, Jenkins, Kubernetes, Git, Maven, Ansible, AWS, Python, Dynatrace, New Relic,
Sumo Logic, CloudWatch, Jira, Salesforce.
Project #2: NAS-D / Big Data Director for NetApp and EMC Isilon support
NAS-D/BDD is an add-on to Actifio appliances that provides NAS data protection for NFS and CIFS shares on
EMC Isilon and on NetApp 7-mode and C-mode filers.
● Participated in design discussions with the Project Management and Development teams.
● Prepared Test strategy document.
● Analyzing the feature functionality and deriving the test plan.
● Performance, load, integration, and stress testing with large data volumes and millions of files, using
both random and sequential files.
● Reviewing test cases with all stakeholders, including the PM, Dev, and QA teams.
● Updating finalized test cases in TestLink.
● Worked on configuring NetApp and Isilon simulators and doing the initial testing on them.
● Executing tests, tracking open bugs, and following up with developers for quick resolution.
● Configuration of NFS and CIFS shares on NetApp and EMC Isilon physical filers.
● Joined NetApp and Isilon filers to the Active Directory domain.
● Configured setups matching customer environments and tested the feature on them.
● Knowledge transfer to 5 other QA resources and onboarding them in this project.
● Helped the automation team to automate most of the test cases and run them as part of weekly
Automation regression suites on new builds.
● Handling customer cases and troubleshooting them.
Environment: Docker, Jenkins, Git, Maven, Ansible, AWS Management Console and CLI, Python and
Linux Shell Scripting, NetApp and EMC Storage and VMware.
Environment: NetApp, IBM, HP arrays, Linux and Windows, Cisco and Brocade switches, Perl and
Linux Shell Scripting.
Project: Microsoft Cluster Service (MSCS) qualification on various Active-Active and Active-Passive
arrays on ESX
Environment: VMware ESX/ESXi servers, HP, Dell and IBM servers, Brocade and Cisco FC switches,
EMC, HP and IBM storage arrays, Perl scripting.