Sunil Ongolu is a Data Engineer with over 4 years of experience in Big Data technologies and cloud platforms such as AWS, Azure, and GCP. He has a strong background in designing and optimizing ETL pipelines, data integration, and analytics, utilizing tools like Spark, Kafka, and various database systems. Currently, he works at Kemper Corporation, focusing on developing scalable data integration solutions using Azure services.
SUNIL ONGOLU
Data Engineer
(659) 223-4114 | songolu14@gmail.com

EDUCATION
University of Alabama, Birmingham, Alabama, USA
Masters, Computer Science (Aug 2023 - May 2025)

CAREER OBJECTIVE
Seasoned Data Engineer with 4 years of experience in building and optimizing data infrastructure on cloud platforms, including Azure, AWS, and GCP. Skilled in designing robust ETL pipelines, managing large-scale data warehouses, and implementing data-driven solutions to support analytical and business intelligence needs.

PROFILE SUMMARY
• 4+ years of IT experience in analysis, design, and development with Big Data technologies like Spark, MapReduce, Hive, Yarn, and HDFS, including programming languages like Java, Scala, and Python.
• Hands-on experience with Amazon EC2, Amazon S3, Amazon RDS, VPC, IAM, Amazon Elastic Load Balancing, Auto Scaling, CloudWatch, SNS, SES, SQS, Lambda, EMR, and other services of the AWS family. Designed, built, and managed ELT data pipelines leveraging Airflow, Python, and GCP solutions.
• Experience migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Databricks, and Azure SQL Data Warehouse, controlling and granting database access, and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory. Experience with the Oozie workflow scheduler to manage Hadoop jobs as a Directed Acyclic Graph (DAG) of actions with control flows (a minimal Airflow DAG sketch follows the Key Skills list below).
• Used Kafka to load real-time data from multiple data sources into HDFS. Highly skilled in using visualization tools like Tableau, Matplotlib, and ggplot2 for creating dashboards.
• Hands-on experience with Spark, Databricks, and Delta Lake.
• Expertise in developing production-ready Spark applications utilizing the Spark Core, DataFrames, Spark SQL, Spark ML, and Spark Streaming APIs. Experience in performance monitoring, security, troubleshooting, backup, disaster recovery, maintenance, and support of Linux systems.
• Experience in implementing Azure data solutions: provisioning storage accounts, Azure Data Factory, SQL Server, SQL Databases, SQL Data Warehouse, Azure Databricks, and Azure Cosmos DB. Practical experience with Python and Apache Airflow to create, schedule, and monitor workflows.
• Hands-on experience in implementing, building, and deploying CI/CD pipelines and managing projects, often including tracking multiple deployments across multiple pipeline stages (Dev, Test/QA, Staging, and Production).
• Extensive experience in IT data analytics projects; hands-on experience migrating on-premises ETL jobs to Google Cloud Platform (GCP) using cloud-native tools such as BigQuery, Cloud Dataproc, Google Cloud Storage, and Composer.
• Used microservices and containerization technologies such as Docker and Kubernetes to build scalable and resilient SaaS applications.

KEY SKILLS
Cloud Technologies: AWS, GCP, Amazon S3, EMR, Redshift, Lambda, Athena, Composer, BigQuery
Script Languages: Python, Shell Script (bash, shell)
Programming Languages: Java, Python, Hibernate, JDBC, JSON, HTML, CSS
Databases: Oracle, MySQL, SQL Server, PostgreSQL, HBase, Snowflake, Cassandra, MongoDB
Version Control and Tools: Git, Maven, SBT, CBT
Web/Application Servers: Apache Tomcat, WebLogic, WebSphere
Azure Ecosystem: Azure Data Lake, ADF, Databricks, Azure SQL
Operating Systems: Windows, Unix, Linux
IDEs/Methodologies: Eclipse, Dreamweaver
Hadoop Components / Big Data: HDFS, Hue, MapReduce, Pig, Hive, HBase, Sqoop, Impala, Zookeeper, Flume, Kafka, Yarn, Kerberos, PySpark, Airflow, Snowflake, Spark components
Visualization & ETL Tools: Tableau, Power BI, Informatica, Talend
Tools: TOAD, SQL Developer, Azure Data Studio, SoapUI, SSMS, GitHub, SharePoint, Visual Studio, Teradata SQL Assistant
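The DAG-based orchestration mentioned in the profile summary (Airflow/Composer scheduling ETL tasks with explicit control flow) can be illustrated with a minimal sketch, assuming Airflow 2.x; the DAG id, task names, and extract/load callables are hypothetical placeholders, not taken from the resume.

```python
# Minimal sketch of an Airflow DAG of actions with control flow (assuming Airflow 2.x).
# The dag_id, task names, and callables below are hypothetical placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract(**context):
    # Placeholder: pull rows from a source system (e.g. an on-prem SQL database).
    print("extracting source rows")


def load(**context):
    # Placeholder: write the transformed rows to the target (e.g. a data lake zone).
    print("loading to target")


with DAG(
    dag_id="example_etl_dag",          # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)

    extract_task >> load_task          # control flow: extract runs before load
```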
EMPLOYMENT HISTORY

Kemper Corporation - Alabama, USA
Azure Data Engineer, Apr 2024 - Present
Kemper Corporation is an insurance holding company that provides insurance to individuals and businesses. I design, develop, and maintain scalable and reliable data integration and ETL/ELT pipelines using Microsoft Azure data services, primarily Azure Data Factory, Azure Databricks, and Azure Synapse Pipelines, to ingest and process data from diverse sources, including SaaS applications.
Key Responsibilities:
• Developed ETL jobs for extracting data from multiple tables and loading it into a data mart in Redshift. Involved in designing different components of the system, including Sqoop, Hadoop processing with MapReduce and Hive, Spark, and FTP integration to downstream systems.
• Integrated the T-SQL codebase into Azure DevOps for version control, continuous integration, and automated deployments. Involved in writing Spark applications using Scala. Performed data transformation and analytics with PySpark, Scala, and Python on Azure platforms, and created advanced data visualizations with Power BI integrated with Azure data sources.
• Integrated data workflows with REST APIs and GraphQL for seamless data exchange between banking systems, utilized Terraform and Bicep for infrastructure-as-code deployments, and processed big data using HDInsight with open-source frameworks like Hadoop, Hive, Kafka, and Spark on Azure for efficient data processing across financial systems.
• Designed and deployed graph databases with the Azure Cosmos DB Gremlin API to enable advanced connected-data analysis for customer and transaction insights, and utilized Apache NiFi to automate data flows between systems, ensuring smooth integration and transformation for better data processing in finance operations.
• Actively involved in designing and developing data ingestion, aggregation, and integration in the Hadoop environment. Using Azure cluster services and Azure Data Factory V2, ingested a large amount and diversity of data from diverse source systems into Azure Data Lake Gen2.
• Used Kafka functionalities like distribution, partitioning, and the replicated commit log service for messaging systems by maintaining feeds. Built and configured Jenkins slaves for parallel job execution. Installed and configured Jenkins for continuous integration and performed continuous deployments.
• Developed an end-to-end solution that involved ingesting sales data from multiple sources, transforming and aggregating it using Azure Databricks, and visualizing insights through Tableau dashboards.
• Created data tables utilizing PyQt to display customer and policy information and to add, delete, and update customer records. Developed automated monitoring and alerting systems using Kubernetes and Docker, ensuring proactive identification and resolution of data pipeline issues.
• Implemented a reusable plug-and-play Python pattern (Synapse integration, aggregations, change data capture, deduplication, and high-watermark implementation); see the PySpark sketch after this role's Environment list. This pattern accelerated development time and standardization across teams.
• Involved in various phases of the Software Development Lifecycle (SDLC) of the application, including gathering requirements, design, development, deployment, and analysis.
• Created complex stored procedures, Slowly Changing Dimension Type 2 logic, triggers, functions, tables, views, and other T-SQL code and SQL joins to ensure efficient data retrieval.
• Working on data management disciplines including data integration, modeling, and other areas directly relevant to business intelligence / business analytics development.
• Designed and implemented Elasticsearch index schemas to support scalable, high-performance search and analytics over structured and unstructured data.
• Implemented performance-tuning techniques in Azure Data Factory and Azure Synapse Analytics.
• Developed ETL pipelines between data warehouses using a combination of Python and Snowflake, including SnowSQL and writing SQL queries against Snowflake.
Environment: Analytics, Azure, Azure Data Lake, Azure Synapse Analytics, Azure Data Factory, Docker, Elasticsearch, ETL, Java, Jenkins, Kafka, Kubernetes, Python, Redshift, Scala, Snowflake, Spark, SQL, Sqoop, Tableau.
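The reusable incremental-load pattern mentioned above (change data capture, deduplication, high watermark) could look roughly like the sketch below; a minimal sketch only, assuming Databricks/Delta Lake, where the table paths, the modified_ts and business_key column names, and the stored watermark value are hypothetical placeholders.

```python
# Rough sketch of a high-watermark incremental load with deduplication in PySpark.
# Paths, column names, and the watermark value are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F, Window

spark = SparkSession.builder.appName("incremental_load_sketch").getOrCreate()

last_watermark = "2024-01-01 00:00:00"  # would normally come from a control table

# Change data capture by timestamp: read only rows changed since the last load.
source = (
    spark.read.format("delta").load("/mnt/raw/policies")      # hypothetical path
    .filter(F.col("modified_ts") > F.lit(last_watermark))
)

# Deduplication: keep the most recent record per business key.
w = Window.partitionBy("business_key").orderBy(F.col("modified_ts").desc())
deduped = (
    source.withColumn("rn", F.row_number().over(w))
    .filter(F.col("rn") == 1)
    .drop("rn")
)

# Append the clean increment and advance the watermark for the next run.
deduped.write.format("delta").mode("append").save("/mnt/curated/policies")
new_watermark = deduped.agg(F.max("modified_ts")).first()[0]
```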
Drummond Company - Birmingham, Alabama, USA
AWS Data Engineer, Dec 2023 - Mar 2024
Drummond Company, Inc. (DCI) is a privately owned company that mines and processes coal and coal products, and also deals in oil and real estate. Utilized big data processing frameworks like Apache Spark and Hadoop on Amazon EMR to process and transform large datasets for advanced analytics and data science initiatives relevant to Drummond Company's industry.
Key Responsibilities:
• Strong knowledge of ETL best practices and experience designing and implementing ETL workflows using Talend. Automated ingestion and transformation of streaming and batch data with an AWS Lambda + Glue + S3 pipeline (see the sketch after this role's Environment list).
• Experience in Hive partitioning and bucketing, performing joins on Hive tables, and utilizing Hive SerDes like REGEX, JSON, and AVRO. Involved in the entire lifecycle of the projects, including design, development, deployment, testing, implementation, and support.
• Achieved 70% faster EMR cluster launch and configuration, optimized Hadoop job processing by 60%, improved system stability, and utilized Boto3 for seamless file writing to S3 buckets.
• Worked on partitioning of Kafka messages and setting up replication factors in the Kafka cluster.
• Responsible for estimating cluster size and for monitoring and troubleshooting the Spark Databricks cluster; able to apply the Spark DataFrame API to complete data manipulation within a Spark session.
• Provisioned high availability of AWS EC2 instances, migrated legacy systems to AWS, and developed Terraform plugins, modules, and templates for automating AWS infrastructure.
• Stored different configs in the NoSQL database MongoDB and manipulated the configs using PyMongo.
• Set up CI/CD pipelines using Jenkins, Maven, GitHub, Chef, Terraform, and AWS.
• Successfully managed data migration projects, including importing and exporting data to and from MongoDB, ensuring data integrity and consistency throughout the process.
• Worked on container orchestration tools such as Docker Swarm, Mesos, and Kubernetes.
• Optimized Elasticsearch cluster performance through shard tuning, heap memory management, refresh interval adjustments, and query profiling.
• Developed metrics based on SAS scripts on a legacy system, migrating the metrics to Snowflake (on Google Cloud).
• Used Ansible for automatic application deployment and provisioning to the AWS environment.
Environment: API, AWS, CI/CD, Docker, EC2, Elasticsearch, EMR, ETL, Git, Glue, Jenkins, JSON, Kafka, Kubernetes, Data Lake, Lambda, S3, SAS, Spark, SQL
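The Lambda + Glue + S3 ingestion pipeline referenced above could be wired roughly as in the sketch below; a minimal sketch only, in which the Glue job name ("raw-to-curated-transform") and the job argument key are hypothetical placeholders, not details from the resume.

```python
# Rough sketch of the Lambda piece of an S3 -> Lambda -> Glue ingestion pipeline.
# The Glue job name and argument keys are hypothetical placeholders.
import boto3

glue = boto3.client("glue")


def lambda_handler(event, context):
    """Start a Glue job run for every object that lands in the raw S3 bucket."""
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]

        # Kick off the transformation job, passing the new object as a job argument.
        response = glue.start_job_run(
            JobName="raw-to-curated-transform",            # hypothetical job name
            Arguments={"--source_path": f"s3://{bucket}/{key}"},
        )
        print(f"Started Glue run {response['JobRunId']} for s3://{bucket}/{key}")

    return {"status": "ok"}
```

In this arrangement the S3 bucket's event notification invokes the Lambda, and the heavy transformation stays in the Glue job, keeping the Lambda itself short-lived and cheap.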
(Accenture) Pandora - Mumbai, India
GCP Data Engineer, Aug 2021 - Aug 2023
Pandora is a Danish jewelry manufacturer and retailer. Developed and managed ETL/ELT pipelines to ingest data from various sources, such as market data feeds, transactional systems, and third-party APIs. Used GCP services like Cloud Dataflow, Cloud Composer, and Pub/Sub for data integration and real-time data streaming.
Key Responsibilities:
• Data ingestion to Azure services, including Azure Data Lake, Azure Storage, Azure SQL, and Azure DW, as well as data processing in Azure Databricks.
• Created reusable views and data marts in BigQuery to power Data Studio reports with consistent metrics and definitions. Involved in loading data from the UNIX file system to HDFS.
• Designed and deployed scalable ETL pipelines using Google Cloud Dataproc, integrating PySpark and Hive to process over 5 TB of raw data daily, reducing data transformation time by 40%.
• Analyzed data using the Hadoop components Hive and Pig. Developed and optimized Spark jobs on Databricks clusters to process large-scale datasets (TBs+), improving runtime by 30%.
• Developed and executed BigQuery SQL, Dataflow jobs, and Dataproc scripts directly from Cloud Shell.
• Analyzed SQL scripts and redesigned them using Spark SQL for faster performance.
• Authored Python (PySpark) scripts with custom UDFs for row/column manipulations, merges, aggregations, stacking, data labelling, and all cleaning and conforming tasks.
• Used the Cloud Shell SDK in GCP to configure the Dataproc, Cloud Storage, and BigQuery services.
• Developed triggers, stored procedures, functions, and packages using cursors associated with the project in PL/SQL. Deployed Azure Functions and other dependencies into Azure to automate Azure Data Factory pipelines for Data Lake jobs.
• Migrated an entire Oracle database to BigQuery and used Power BI for reporting.
• Designed Cassandra schemas for time-series IoT data (500K writes/sec). Used Sqoop import/export to ingest raw data into Google Cloud Storage by spinning up a Cloud Dataproc cluster.
• Used Google Cloud Dataflow with the Python SDK for deploying streaming jobs in GCP, as well as batch jobs for custom cleaning of text and JSON files and writing them to BigQuery (a Beam sketch follows this role's Environment list). Built data pipelines in Airflow/Composer for orchestrating ETL-related jobs using different Airflow operators.
Environment: Airflow, Azure, Azure Data Lake, BigQuery, Cassandra, Azure Data Factory, ETL, GCP, HDFS, Hive, Pig, PL/SQL, Power BI, PySpark, Python, SDK, Spark, Spark SQL, SQL, Sqoop
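A Dataflow batch job of the kind described above (cleaning JSON files and writing them to BigQuery with the Python SDK) might look roughly like the sketch below; a minimal sketch only, in which the project id, bucket paths, table name, schema, and field names are hypothetical placeholders.

```python
# Rough sketch of a Dataflow batch job (Apache Beam Python SDK) that cleans
# newline-delimited JSON from Cloud Storage and writes it to BigQuery.
# Project id, bucket paths, table name, schema, and fields are hypothetical placeholders.
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def clean_record(line):
    """Parse one JSON line and keep only the fields the target table expects."""
    record = json.loads(line)
    return {
        "order_id": record.get("order_id"),
        "amount": float(record.get("amount", 0)),
        "country": (record.get("country") or "").strip().upper(),
    }


def run():
    options = PipelineOptions(
        runner="DataflowRunner",          # or DirectRunner for local testing
        project="my-gcp-project",         # hypothetical project id
        temp_location="gs://my-bucket/tmp",
        region="us-central1",
    )
    with beam.Pipeline(options=options) as p:
        (
            p
            | "Read" >> beam.io.ReadFromText("gs://my-bucket/raw/orders-*.json")
            | "Clean" >> beam.Map(clean_record)
            | "Write" >> beam.io.WriteToBigQuery(
                "my-gcp-project:sales.orders",
                schema="order_id:STRING,amount:FLOAT,country:STRING",
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
                create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
            )
        )


if __name__ == "__main__":
    run()
```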
Metropolis Healthcare Limited - Mumbai, India
Data Engineer, Apr 2020 - Jul 2021
Metropolis Healthcare Limited is one of India's leading and most renowned diagnostics companies. Integrated data from diverse sources, including APIs, databases, and third-party services, ensuring seamless data flow between systems and platforms. Implemented data validation and cleansing processes to ensure data quality and consistency.
Key Responsibilities:
• Implemented a one-time data migration of multi-state-level data from SQL Server to Snowflake using Python and SnowSQL, and developed ETL pipelines into and out of the data warehouse (see the sketch after this role's Environment list).
• Wrote AWS Lambda functions in Spark with cross-functional dependencies that generated custom libraries for delivering the Lambda functions in the cloud. Wrote Spark Streaming applications to consume data from Kafka topics and write the processed streams to HBase.
• Collaborated with data scientists by preparing clean, high-quality datasets on Databricks for machine learning pipelines. Proficient in using Snowflake utilities, SnowSQL, and Snowpipe, and in applying Big Data modeling techniques using Python.
• Developed Spark applications for the entire batch processing using PySpark. Actively participated in all phases of the Software Development Life Cycle (SDLC), from implementation to deployment.
• Developed T-SQL scripts for managing instance-level objects and optimizing performance.
• Built a common SFTP download/upload framework using Azure Data Factory and Databricks.
• Implemented a continuous delivery (CI/CD) pipeline with Docker for custom application images in the cloud using Jenkins. Containerized Airflow deployments with Docker and Kubernetes for consistent and scalable execution environments.
• Ensured data integrity and consistency during migration, resolving compatibility issues with T-SQL scripting. Responsible for implementing monitoring solutions in Ansible, Terraform, Docker, and Jenkins.
• Skilled in monitoring servers using Nagios and CloudWatch, and in using the ELK Stack (Elasticsearch and Kibana).
• Created data models and built a data lake on S3 queried through AWS Athena for use with Amazon QuickSight. Performed Hive query optimization for better performance.
Environment: Airflow, Athena, AWS, CI/CD, Azure Data Factory, Docker, ETL, HBase, Hive, IaaS, Jenkins, Kafka, Kubernetes, PaaS, PySpark, Python, S3, Snowflake, Spark, SQL
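One step of the SQL Server to Snowflake migration described above could be scripted roughly as in the sketch below; a minimal sketch only, in which the connection parameters, stage name, file name, and target table are hypothetical placeholders rather than details from the project.

```python
# Rough sketch of a one-time SQL Server -> Snowflake migration step using Python:
# stage a locally exported CSV, then COPY INTO the target table.
# Connection parameters, stage name, and table names are hypothetical placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",        # hypothetical account identifier
    user="etl_user",
    password="***",
    warehouse="ETL_WH",
    database="ANALYTICS",
    schema="STAGING",
)

try:
    cur = conn.cursor()
    # Upload the CSV exported from SQL Server (e.g. via bcp) to a named internal stage.
    cur.execute("PUT file:///tmp/claims_export.csv @claims_stage AUTO_COMPRESS=TRUE")

    # Bulk-load the staged file into the target table.
    cur.execute("""
        COPY INTO CLAIMS
        FROM @claims_stage/claims_export.csv.gz
        FILE_FORMAT = (TYPE = CSV FIELD_OPTIONALLY_ENCLOSED_BY = '"' SKIP_HEADER = 1)
    """)
finally:
    conn.close()
```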