
SUNIL ONGOLU
Data Engineer
(659) 223-4114 | songolu14@gmail.com

EDUCATION
University of Alabama, Birmingham, Alabama, USA
Masters / Computer Science (Aug 2023 - May 2025)

PROFILE SUMMARY

• 4+ years of IT experience in analysis, design, and development with Big Data technologies such as Spark, MapReduce, Hive, YARN, and HDFS, and programming languages including Java, Scala, and Python.
• Hands-on experience with Amazon EC2, Amazon S3, Amazon RDS, VPC, IAM, Elastic Load Balancing, Auto Scaling, CloudWatch, SNS, SES, SQS, Lambda, EMR, and other services in the AWS family. Designed, built, and managed ELT data pipelines leveraging Airflow, Python, and GCP solutions.
• Experience migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Databricks, and Azure SQL Data Warehouse, including controlling and granting database access, and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory. Experience with the Oozie workflow scheduler to manage Hadoop jobs as Directed Acyclic Graphs (DAGs) of actions with control flows.
• Used Kafka to load real-time data from multiple data sources into HDFS (a brief ingestion sketch follows the Key Skills list). Highly skilled in using visualization tools such as Tableau, Matplotlib, and ggplot2 for creating dashboards.
• Hands-on experience with Spark, Databricks, and Delta Lake.
• Expertise in developing production-ready Spark applications utilizing the Spark Core, DataFrame, Spark SQL, Spark ML, and Spark Streaming APIs. Experience in performance monitoring, security, troubleshooting, backup, disaster recovery, maintenance, and support of Linux systems.
• Experience implementing Azure data solutions: provisioning storage accounts, Azure Data Factory, SQL Server, SQL Databases, SQL Data Warehouse, Azure Databricks, and Azure Cosmos DB. Practical experience with Python and Apache Airflow to create, schedule, and monitor workflows.
• Hands-on experience implementing, building, and deploying CI/CD pipelines, and managing projects that often include tracking multiple deployments across pipeline stages (Dev, Test/QA, Staging, and Production).
• Extensive experience in IT data analytics projects, including hands-on migration of on-premises ETLs to Google Cloud Platform (GCP) using cloud-native tools such as BigQuery, Cloud Dataproc, Google Cloud Storage, and Composer.
• Used microservices and containerization technologies such as Docker and Kubernetes to build scalable and resilient SaaS applications.

CAREER OBJECTIVE

Seasoned Data Engineer with 4 years of experience in building and optimizing data infrastructure on cloud platforms, including Azure, AWS, and GCP. Skilled in designing robust ETL pipelines, managing large-scale data warehouses, and implementing data-driven solutions to support analytical and business intelligence needs.

KEY SKILLS

Cloud Technologies: AWS, GCP, Amazon S3, EMR, Redshift, Lambda, Athena, Composer, BigQuery.
Script Languages: Python, Shell Script (bash, shell).
Programming Languages: Java, Python, Hibernate, JDBC, JSON, HTML, CSS.
Databases: Oracle, MySQL, SQL Server, PostgreSQL, HBase, Snowflake, Cassandra, MongoDB.
Version Control and Tools: Git, Maven, SBT, CBT.
Web/Application Servers: Apache Tomcat, WebLogic, WebSphere.
Azure Ecosystem: Azure Data Lake, ADF, Databricks, Azure SQL.
Operating Systems: Windows, Unix, Linux.
IDEs: Eclipse, Dreamweaver.
Hadoop Components / Big Data: HDFS, Hue, MapReduce, Pig, Hive, HBase, Sqoop, Impala, Zookeeper, Flume, Kafka, YARN, Kerberos, PySpark, Airflow, Snowflake, Spark components.
Visualization & ETL Tools: Tableau, Power BI, Informatica, Talend.
Tools: TOAD, SQL Developer, Azure Data Studio, SoapUI, SSMS, GitHub, SharePoint, Visual Studio, Teradata SQL Assistant.
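A minimal PySpark Structured Streaming sketch of the Kafka-to-HDFS ingestion pattern mentioned in the profile summary. The broker address, topic name, and output paths are hypothetical placeholders, and the Spark Kafka connector package is assumed to be available on the cluster.

from pyspark.sql import SparkSession

# Minimal sketch: stream raw Kafka events into HDFS as Parquet.
# Broker, topic, and paths below are illustrative placeholders.
spark = (SparkSession.builder
         .appName("kafka-to-hdfs-ingest")
         .getOrCreate())

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092")
       .option("subscribe", "events_topic")
       .option("startingOffsets", "latest")
       .load())

# Kafka delivers key/value as binary; cast the payload to string before landing it.
events = raw.selectExpr("CAST(key AS STRING) AS key",
                        "CAST(value AS STRING) AS value",
                        "timestamp")

query = (events.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/raw/events")
         .option("checkpointLocation", "hdfs:///checkpoints/events")
         .outputMode("append")
         .start())

query.awaitTermination()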
EMPLOYMENT HISTORY
Kemper Corporation - Alabama, USA
Azure Data Engineer Apr 2024 - Present
Kemper Corporation is an insurance holding company that provides insurance to individuals and businesses.
I design, develop, and maintain scalable and reliable data integration and ETL/ELT pipelines using Microsoft
Azure data services, primarily Azure Data Factory, Azure Databricks, and Azure Synapse Pipelines, to ingest
and process data from diverse sources, including SaaS applications.
Key Responsibilities:
• Contributed to developing ETL jobs that extract data from multiple tables and load them into a data mart in Redshift. Involved in designing different system components, including Sqoop, Hadoop processing with MapReduce and Hive, Spark, and FTP integration with downstream systems.
• Integrated the T-SQL codebase into Azure DevOps for version control, continuous integration, and automated deployments. Wrote Spark applications in Scala, performed data transformation and analytics with PySpark, Scala, and Python on Azure platforms, and created advanced data visualizations with Power BI integrated with Azure data sources.
• Integrated data workflows with REST APIs and GraphQL for seamless data exchange between banking systems, used Terraform and Bicep for infrastructure-as-code deployments, and processed big data using HDInsight with open-source frameworks such as Hadoop, Hive, Kafka, and Spark on Azure for efficient data processing across financial systems.
• Designed and deployed graph databases with the Azure Cosmos DB Gremlin API to enable advanced connected-data analysis for customer and transaction insights, and used Apache NiFi to automate data flows between systems, ensuring smooth integration and transformation for better data processing in finance operations.
• Actively involved in designing and developing data ingestion, aggregation, and integration in the Hadoop environment. Using Azure cluster services and Azure Data Factory V2, ingested a large volume and variety of data from diverse source systems into Azure Data Lake Storage Gen2.
• Used Kafka capabilities such as distribution, partitioning, and the replicated commit log service for messaging by maintaining feeds. Built and configured Jenkins agents for parallel job execution, installed and configured Jenkins for continuous integration, and performed continuous deployments.
• Developed an end-to-end solution that ingested sales data from multiple sources, transformed and aggregated it using Azure Databricks, and visualized insights through Tableau dashboards.
• Created data tables with PyQt to display customer and policy information and to add, delete, and update customer records. Developed automated monitoring and alerting using Kubernetes and Docker, ensuring proactive identification and resolution of data pipeline issues.
• Implemented a reusable plug-and-play Python pattern (Synapse integration, aggregations, change data capture, deduplication, and high-watermark implementation), which accelerated development time and standardization across teams; a sketch of the high-watermark piece follows this role's Environment line.
• Involved in various phases of the Software Development Lifecycle (SDLC) of the application, including requirements gathering, design, development, deployment, and analysis.
• Created complex stored procedures, Slowly Changing Dimension (SCD) Type 2 logic, triggers, functions, tables, views, and other T-SQL code and SQL joins to ensure efficient data retrieval.
• Worked on data management disciplines including data integration, modeling, and other areas directly relevant to business intelligence/business analytics development.
• Designed and implemented Elasticsearch index schemas to support scalable, high-performance search and analytics over structured and unstructured data.
• Implemented performance tuning techniques in Azure Data Factory and Azure Synapse Analytics.
• Developed ETL pipelines between data warehouses using a combination of Python, Snowflake, and SnowSQL, writing SQL queries against Snowflake.
Environment: Azure, Azure Data Lake, Azure Synapse Analytics, Azure Data Factory, Docker, Elasticsearch, ETL, Java, Jenkins, Kafka, Kubernetes, Python, Redshift, Scala, Snowflake, Spark, SQL, Sqoop, Tableau.
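Below is a minimal PySpark sketch of the high-watermark / change-data-capture pattern referenced above. The table names, the watermark column (modified_ts), and the control-table layout are hypothetical placeholders, not the production implementation.

from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.appName("high-watermark-load").getOrCreate()

def load_increment(source_table, target_table, watermark_table, key_cols):
    # 1. Read the last successfully processed watermark for this source.
    last_wm = (spark.table(watermark_table)
               .filter(F.col("source") == source_table)
               .agg(F.max("watermark_value"))
               .collect()[0][0]) or "1900-01-01 00:00:00"

    # 2. Pull only rows changed since the last watermark (change data capture).
    increment = (spark.table(source_table)
                 .filter(F.col("modified_ts") > F.lit(last_wm)))

    # 3. Deduplicate on the business key, keeping the latest version of each row.
    w = Window.partitionBy(*key_cols).orderBy(F.col("modified_ts").desc())
    deduped = (increment.withColumn("rn", F.row_number().over(w))
               .filter("rn = 1").drop("rn"))

    # 4. Append the increment and record the new watermark for the next run.
    deduped.write.mode("append").saveAsTable(target_table)
    new_wm = deduped.agg(F.max("modified_ts")).collect()[0][0]
    if new_wm is not None:
        (spark.createDataFrame([(source_table, str(new_wm))],
                               ["source", "watermark_value"])
         .write.mode("append").saveAsTable(watermark_table))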

Drummond Company - Birmingham, Alabama, USA


AWS Data Engineer Dec 2023 - Mar 2024
Drummond Company, Inc. (DCI) is a privately owned company that mines and processes coal and coal products, and also deals in oil and real estate. Utilized big data processing frameworks such as Apache Spark and Hadoop on Amazon EMR to process and transform large datasets for advanced analytics and data science initiatives relevant to Drummond Company's industry.
Key Responsibilities:
• Strong knowledge of ETL best practices and experience designing and implementing ETL workflows using Talend. Automated ingestion and transformation of streaming and batch data with an AWS Lambda + Glue + S3 pipeline (a sketch follows this role's Environment line).
• Experience in Hive partitioning and bucketing, performing joins on Hive tables, and using Hive SerDes such as RegEx, JSON, and Avro. Involved in the entire project lifecycle, including design, development, deployment, testing, implementation, and support.
• Achieved 70% faster EMR cluster launch and configuration, optimized Hadoop job processing by 60%, improved system stability, and used Boto3 for seamless file writes to S3 buckets.
• Worked on partitioning Kafka messages and setting up replication factors in the Kafka cluster.
• Responsible for estimating cluster size and for monitoring and troubleshooting the Spark Databricks cluster; applied the Spark DataFrame API to perform data manipulation within a Spark session.
• Provisioned highly available AWS EC2 instances, migrated legacy systems to AWS, and developed Terraform plugins, modules, and templates for automating AWS infrastructure.
• Stored various configs in the NoSQL database MongoDB and manipulated them using PyMongo.
• Set up the CI/CD pipelines using Jenkins, Maven, GitHub, Chef, Terraform, and AWS.
• Successfully managed data migration projects, including importing and exporting data to and from MongoDB, ensuring data integrity and consistency throughout the process.
• Worked on container orchestration tools such as Docker swarm, Mesos, and Kubernetes.
• Optimized Elasticsearch cluster performance through shard tuning, heap memory management, refresh
interval adjustments, and query profiling.
• Developed metrics based on SAS scripts on the legacy system and migrated them to Snowflake (Google Cloud).
• Used Ansible for automatic application deployment and provisioning to the AWS environment.
Environment: API, AWS, CI/CD, Docker, EC2, Elasticsearch, EMR, ETL, Git, Glue, Jenkins, Kafka, Kubernetes, Lambda, S3, SAS, Spark, SQL
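A minimal sketch of the Lambda + Glue + S3 ingestion pattern referenced above, assuming an S3 ObjectCreated trigger on the Lambda function; the Glue job name, bucket names, and argument keys are hypothetical placeholders.

import json
import boto3

glue = boto3.client("glue")

# Illustrative handler: an S3 ObjectCreated event triggers this Lambda, which
# starts a Glue job and passes the new object location as job arguments.
def lambda_handler(event, context):
    runs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]

        response = glue.start_job_run(
            JobName="ingest-and-transform",  # hypothetical Glue job name
            Arguments={
                "--source_path": f"s3://{bucket}/{key}",
                "--target_path": "s3://curated-bucket/processed/",  # placeholder
            },
        )
        runs.append(response["JobRunId"])

    return {"statusCode": 200, "body": json.dumps({"job_runs": runs})}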

(Accenture) Pandora - Mumbai, India


GCP Data Engineer Aug 2021 - Aug 2023
Pandora is a Danish jewelry manufacturer and retailer. Developed and managed ETL/ELT pipelines to ingest data from various sources, such as market data feeds, transactional systems, and third-party APIs, and used GCP services such as Cloud Dataflow, Cloud Composer, and Pub/Sub for data integration and real-time data streaming.
Key Responsibilities:
• Performed data ingestion to Azure services, including Azure Data Lake, Azure Storage, Azure SQL, and Azure DW, as well as data processing in Azure Databricks.
• Created reusable views and data marts in BigQuery to power Data Studio reports with consistent metrics
and definitions. Involved in loading data from UNIX file system to HDFS.
• Designed and deployed scalable ETL pipelines using Google Cloud Dataproc, integrating PySpark and Hive to process over 5 TB of raw data daily and reducing data transformation time by 40%.
• Analyzed data using the Hadoop components Hive and Pig. Developed and optimized Spark jobs on Databricks clusters to process large-scale datasets (multiple TBs), improving runtime by 30%.
• Developed and executed BigQuery SQL, Dataflow jobs, and Dataproc scripts directly from Cloud Shell.
• Analyzed existing SQL scripts and redesigned them using Spark SQL for faster performance.
• Authored Python (PySpark) scripts with custom UDFs for row/column manipulations, merges, aggregations, stacking, data labelling, and all cleaning and conforming tasks.
• Used the Cloud Shell SDK in GCP to configure services such as Dataproc, Cloud Storage, and BigQuery.
• Developed triggers, stored procedures, functions, and packages using cursors associated with the project in PL/SQL. Deployed Azure Functions and other dependencies into Azure to automate Azure Data Factory pipelines for Data Lake jobs.
• Migrated an entire Oracle database to BigQuery and used Power BI for reporting.
• Designed Cassandra schemas for time-series IoT data (500K writes/sec). Used Sqoop import/export to ingest raw data into Google Cloud Storage by spinning up a Cloud Dataproc cluster.
• Used the Google Cloud Dataflow Python SDK to deploy streaming jobs in GCP, as well as batch jobs for custom cleaning of text and JSON files and writing the results to BigQuery. Built data pipelines in Airflow/Composer to orchestrate ETL jobs using different Airflow operators (a DAG sketch follows this role's Environment line).
Environment: Airflow, Azure, Azure Data Lake, BigQuery, Cassandra, Data Factory, ETL, GCP, HDFS, Hive, Pig, PL/SQL, Power BI, PySpark, Python, SDK, Spark, Spark SQL, SQL, Sqoop
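A minimal Airflow/Composer DAG sketch of the orchestration pattern referenced above, assuming Airflow 2.4+ with the Google provider installed; the cluster, bucket, dataset, and table names are hypothetical placeholders.

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

with DAG(
    dag_id="daily_sales_etl",
    start_date=datetime(2023, 1, 1),
    schedule="0 2 * * *",   # run once a day at 02:00
    catchup=False,
) as dag:

    # Submit the PySpark cleaning job to an existing Dataproc cluster via gcloud.
    clean_raw_files = BashOperator(
        task_id="clean_raw_files",
        bash_command=(
            "gcloud dataproc jobs submit pyspark "
            "gs://example-bucket/jobs/clean_raw.py "
            "--cluster=etl-cluster --region=us-central1"
        ),
    )

    # Refresh the reporting table in BigQuery once cleaning has finished.
    build_reporting_table = BigQueryInsertJobOperator(
        task_id="build_reporting_table",
        configuration={
            "query": {
                "query": (
                    "CREATE OR REPLACE TABLE analytics.daily_sales AS "
                    "SELECT order_date, SUM(amount) AS total_amount "
                    "FROM staging.cleaned_sales GROUP BY order_date"
                ),
                "useLegacySql": False,
            }
        },
    )

    clean_raw_files >> build_reporting_table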

Metropolis Healthcare Limited - Mumbai, India


Data Engineer Apr 2020 - Jul 2021
Metropolis Healthcare Limited is one of India's leading and most renowned diagnostics companies. Integrated
data from diverse sources, including APIs, databases, and third-party services, ensuring seamless data flow
between systems and platforms. Implemented the data validation and cleansing processes to ensure data
quality and consistency.
Key Responsibilities:
• Implemented a one-time migration of multi-state data from SQL Server to Snowflake using Python and SnowSQL, and developed ETL pipelines into and out of the data warehouse (a migration sketch follows this role's Environment line).
• Wrote AWS Lambda functions in Spark with cross-functional dependencies that generated custom libraries for delivering the Lambda functions in the cloud. Wrote Spark Streaming applications to consume data from Kafka topics and write the processed streams to HBase.
• Collaborated with data scientists by preparing clean, high-quality datasets on Databricks for machine learning pipelines. Proficient in Snowflake utilities such as SnowSQL and Snowpipe, and in applying big data modeling techniques using Python.
• Developed Spark applications for end-to-end batch processing using PySpark. Actively participated in all phases of the Software Development Life Cycle (SDLC), from implementation to deployment.
• Developed T-SQL scripts for managing instance-level objects and optimizing performance.
• Built a common SFTP download or upload framework using Azure Data Factory and Databricks.
• Implemented a continuous delivery (CI/CD) pipeline with Docker for custom application images in the
cloud using Jenkins. Containerized Airflow deployments with Docker and Kubernetes for consistent and
scalable execution environments.
• Ensured data integrity and consistency during migration, resolving compatibility issues with T-SQL scripting. Responsible for implementing monitoring solutions with Ansible, Terraform, Docker, and Jenkins.
• Skilled in monitoring servers using Nagios and CloudWatch, and in using the ELK stack (Elasticsearch and Kibana).
• Created data models and built a data lake on S3 queried through AWS Athena for use with Amazon QuickSight. Performed Hive query optimization for better performance.
Environment: Airflow, Athena, AWS, CI/CD, Data Factory, Docker, ETL, HBase, Hive, IaaS, Jenkins, Kafka, Kubernetes, PaaS, PySpark, Python, S3, Snowflake, Spark, SQL
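A minimal sketch of the SQL Server to Snowflake migration pattern referenced above, using pyodbc, pandas, and the Snowflake Python connector; connection details, table names, and the chunk size are hypothetical placeholders, and the Snowflake target table is assumed to already exist.

import pandas as pd
import pyodbc
import snowflake.connector
from snowflake.connector.pandas_tools import write_pandas

# Source: legacy SQL Server (placeholder DSN); credentials would normally
# come from a secrets manager rather than being inlined.
sqlserver = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=legacy-sql.example.com;DATABASE=claims;UID=etl_user;PWD=***"
)

# Target: Snowflake staging schema (placeholder account and warehouse).
snow = snowflake.connector.connect(
    account="example_account",
    user="etl_user",
    password="***",
    warehouse="LOAD_WH",
    database="ANALYTICS",
    schema="STAGING",
)

# Pull the source table in chunks so large multi-state datasets fit in memory,
# then append each chunk into the existing Snowflake staging table.
for chunk in pd.read_sql("SELECT * FROM dbo.PolicyClaims", sqlserver, chunksize=100_000):
    chunk.columns = [c.upper() for c in chunk.columns]  # Snowflake identifiers default to upper case
    write_pandas(snow, chunk, table_name="POLICY_CLAIMS")

snow.close()
sqlserver.close()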
