Donald Ngandeu
Big Data Engineer (Senior Level – 10+ years in Big Data, Cloud,
Hadoop)
925-215-5452 | donaldngandeu54@gmail.com
Professional Summary
Created dashboards in Tableau using features such as Custom SQL, multiple tables, blending, extracts, parameters, filters, calculations, context filters, data source filters, hierarchies, filter actions, and maps.
Skilled in applying database performance tuning to data-heavy dashboards and reports, optimizing performance with extracts, context filters, efficient calculations, data source filters, and indexing and partitioning in the data source.
Designed custom reports using data extraction and reporting tools.
Developed algorithms based on business cases, and design new custom solutions to solve business issues and advance
goals.
Hands-on with project management tools such as Microsoft Team Foundation Server (TFS), Jira, and MS Project.
Used database technologies and frameworks involving structured, unstructured, and semi-structured data, as well as various storage platforms such as RDBMSs and data lakes.
Installed and configured Hadoop ecosystem components such as Hadoop MapReduce, HDFS, HBase, and Oozie.
Have used Scala for more than 5 years.
Wrote SQL queries for data validation of reports and dashboards.
Recommended and applied best practices to improve dashboard performance for Tableau server users.
Worked with Data Lakes and Big Data ecosystems (Hadoop, Spark, Hortonworks, Cloudera).
Skilled using BI tools such as Tableau and PowerBI, data interpretation, modeling, data analysis, and reporting with the
ability to assist in directing planning based on insights.
Served on Big Data development teams that applied Agile methodologies and data-driven analytics.
Hands-on managing migrations, installations, and development.
Worked on teams with on-site and remote members across multiple time zones in culturally diverse environments.
Experienced in Partitioning, Dynamic-Partitioning, and bucketing concepts in Hive to compute data metrics.
Collected log data from different sources (web servers and social media such as tweets) using Flume, storing it in HDFS to run MapReduce jobs and Hive queries.
Skilled using Hive, Zookeeper, Sqoop, Kafka-Storm, Spark, and Flume.
Applied pipeline development skills with Apache Airflow and Kafka; a minimal orchestration sketch appears after this list.
Worked on different clouds (AWS, GCP, and Azure), mostly on migration from AWS to GCP.
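For illustration, here is a minimal sketch of the kind of Airflow-plus-Kafka pipeline referenced above, assuming a kafka-python consumer; the DAG name, topic, and broker address are placeholder assumptions, not values from a specific project:

    # Minimal Airflow DAG sketch: a hypothetical daily task that pulls a batch of
    # messages from a Kafka topic; topic, broker, and DAG names are placeholders.
    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.python import PythonOperator
    from kafka import KafkaConsumer  # kafka-python client


    def consume_events(**_):
        """Read one bounded batch of messages from the 'events' topic."""
        consumer = KafkaConsumer(
            "events",
            bootstrap_servers="localhost:9092",
            auto_offset_reset="earliest",
            consumer_timeout_ms=10_000,  # stop iterating after 10s of silence
        )
        records = [msg.value for msg in consumer]
        # A real pipeline would write these records to HDFS or cloud storage.
        print(f"pulled {len(records)} records")


    with DAG(
        dag_id="kafka_batch_pull",
        start_date=datetime(2023, 1, 1),
        schedule_interval="@daily",
        catchup=False,
        default_args={"retries": 1, "retry_delay": timedelta(minutes=5)},
    ) as dag:
        PythonOperator(task_id="pull_events", python_callable=consume_events)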
Technical Skills
APACHE - Flume, Hadoop, YARN, Hive, Kafka, Oozie, Spark, Zookeeper, Cloudera, HDFS, Hortonworks,
MapR, MapReduce
SCRIPTING – Python, HiveQL, MapReduce, XML, UNIX, Shell scripting, LINUX
DATA PROCESSING (COMPUTE) ENGINES - Apache Spark, Spark Streaming, SparkSQL
PROGRAMMING LANGUAGES: Python, Java, Scala, PySpark, Spark, Spark Streaming, Spark SQL
DATABASES - Microsoft SQL Server database & data structures, Apache Cassandra, Amazon Redshift, DynamoDB, Apache HBase, Apache Hive, MongoDB
CLOUD - GCP, Azure, AWS
Professional Experience
Worked on 3 different projects in this company: Audience Targeting, Off Media Platform, and Audience Insights:
o Audience Targeting – Responsible for creating queries in BigQuery, pushing them to LiveRamp and the front end, and verifying that all tables and segments ran correctly.
o Off Media Platform – Ran campaigns and pinged specific customers to show them on-sale products based on their navigation and shopping in Safeway stores.
o Audience Insights – Made the queries, tables, and segments visible on BigQuery for LiveRamp and for Merkle, and reported all tables and segments to a SharePoint file.
Used Jira to track the tasks and the timeline.
Achieved campaign visualization with Power BI and Tableau.
Worked closely with the business team to ensure business requirements were clearly defined, understood, and met.
Sqooped the data from Netezza onto our HDFS cluster. Transferred the data from AWS S3 to AWS Redshift using Informatica.
Designed and implemented a test environment on AWS.
Migrated data between Hadoop system and multi cloud infrastructure on AWS and GCP.
Built and analyzed a regression model on GCP using PySpark.
Performed data migration from AWS S3 buckets to GCP Cloud Storage buckets using the data migration service.
Worked closely with the data science team to ensure their requirements were clearly defined, understood, and met.
Wrote Spark scripts using Scala shell.
Implemented Spark using Scala and SparkSQL for faster testing and processing of data.
Worked with Marketing to help to run their campaign.
Developed Spark UDFs using Scala for better performance.
Wrote queries on BigQuery; a representative query sketch appears at the end of this section.
Wrote queries on Snowflake.
Updated LiveRamp and made segments available for the front end.
Cleaned unused taxonomies.
Made sure all queries ran properly.
Advised the business team on naming conventions in LiveRamp.
Trained on-site employees on how to use LiveRamp and on how data flows from BigQuery to LiveRamp and from Snowflake to LiveRamp.
Unit tested the data between Redshift and Snowflake.
Worked with Merkle team to make sure all Albertsons’ queries worked properly.
Helped Albertsons’ employees configure Citrix and make sure their LiveRamp Analytic Environment worked properly.
Configured Axway to ensure data transport between Albertsons and our different partners.
Worked with the data warehousing team to join all the tables needed for the campaign.
Reviewed colleagues’ queries for validation.
Worked with JIRA for bug tracking.
Participated in project scoping/sizing for several projects.
Made all segments visible on LiveRamp.
Provided access to users in our Sharepoint file.
Trained teams about LiveRamp utilization and the naming convention we used on LiveRamp.
Collaborated with my team using GitHub.
Produced schema descriptions with Lucidchart.
Worked with Alation to see how to query different tables.
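A representative BigQuery query of the kind described above might look like the following sketch, using the google-cloud-bigquery client; the project, dataset, table, and column names are hypothetical stand-ins, not the actual audience schema:

    # Hedged sketch: counting households per audience segment in BigQuery.
    # Project, dataset, table, and column names are illustrative placeholders.
    from google.cloud import bigquery

    client = bigquery.Client(project="my-project")  # assumes default credentials

    query = """
        SELECT segment_id, COUNT(DISTINCT household_id) AS households
        FROM `my-project.audience.segments`
        GROUP BY segment_id
        ORDER BY households DESC
    """

    for row in client.query(query).result():
        print(row.segment_id, row.households)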
Merrill Edge – Big Data Engineer, New York, NY, March 2020 to March 2022
Merrill Edge is an electronic trading platform provided by BofA Securities, part of Bank of America's
retail banking division.
Created PySpark streaming job to receive real-time data from Kafka; a minimal streaming sketch appears at the end of this section.
Defined Spark data schema and set up development environment inside the cluster.
Processed data with natural language toolkit to count important words and generated word clouds.
Interacted with data residing in HDFS using PySpark to process data.
Configured Linux on multiple Hadoop environments setting up Dev, Test, and Prod clusters within the same
configuration.
Monitored HDFS job status and assessed the life of the DataNodes according to the specs.
Configured Spark Streaming to receive real-time data from the Apache Kafka and store the streamed data to HDFS
using Scala.
Designed AWS CloudFormation templates to create VPCs, subnets, and NAT gateways to ensure successful deployment of web and database tiers.
Involved in designing and developing enhancements of CSG using AWS APIs.
Installed Spark and PySpark library in terminal using CLI in bootstrapping steps.
Worked in virtual machines to run pipelines on a distributed system.
Started and configured master and slave nodes for Spark.
Produced scripts for doing transformations using Scala.
Designed Spark Python job to consume information from S3 Buckets using Boto3.
Worked on Snowflake Clone
Set up cloud compute engines in managed and unmanaged modes and handled SSH key management.
Utilized a cluster of multiple Kafka brokers to handle replication needs and allow for fault tolerance.
Created a pipeline to gather data using PySpark, Kafka, and HBase.
Used Spark Streaming to receive real-time data using Kafka.
Maintained Elasticsearch and Kibana (ELK) and wrote Spark scripts using the Scala shell.
Implemented Spark using Scala and utilized DataFrames and Spark SQL API for faster processing of data.
Developed Spark Streaming applications to consume data from Kafka topics and insert the processed streams to HBase.
Sent requests to a source REST-based API from a Scala script via a Kafka producer.
Worked with unstructured data and parsed out the information using Python built-in functions.
Configured a Python API Producer file to ingest data from the Slack API using Kafka for real-time processing with
Spark.
Managed Hive connection with tables, databases, and external tables.
Installed Hadoop using Terminal and set the configurations.
Designed and developed data pipelines in an Azure environment using ADL Gen2, Blob Storage, ADF, Azure
Databricks, Azure SQL, Azure Synapse for analytics and MS PowerBI for reporting.
Formatted responses from Spark jobs to data frames using a schema containing News Type, Article Type, Word Count,
and News Snippet to parse JSONs.
Worked on Azure Data Factory pipelines to schedule jobs in Azure Databricks in the Azure cloud.
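The PySpark streaming work noted above could be sketched roughly as follows, assuming Structured Streaming with the spark-sql-kafka connector available; the topic, broker, schema fields, and HDFS paths are assumed placeholders:

    # Minimal PySpark Structured Streaming sketch: read JSON events from a Kafka
    # topic and append them to HDFS as Parquet. Requires the spark-sql-kafka
    # connector on the classpath; all names and paths are illustrative.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import StringType, StructField, StructType, TimestampType

    spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

    schema = StructType([
        StructField("event_id", StringType()),
        StructField("event_ts", TimestampType()),
        StructField("payload", StringType()),
    ])

    events = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "broker1:9092")
        .option("subscribe", "trades")
        .load()
        .select(from_json(col("value").cast("string"), schema).alias("e"))
        .select("e.*")
    )

    query = (
        events.writeStream.format("parquet")
        .option("path", "hdfs:///data/trades")
        .option("checkpointLocation", "hdfs:///checkpoints/trades")
        .outputMode("append")
        .start()
    )
    query.awaitTermination()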
Nationwide Mutual Insurance Company is a group of large U.S. insurance and financial services
companies based in Columbus, OH.
Automated creation of AWS components like EC2 instances, security groups, ELB, RDS, Lambda, and IAM through AWS CloudFormation templates.
Provided proofs of concept converting JSON data into Parquet format to improve query processing by using Hive.
Worked on various real-time and batch processing applications using Spark/Scala, Kafka and Cassandra.
Created AWS CloudFormation templates used alongside Terraform with existing plugins.
Implemented AWS IAM user roles and policies to authenticate and control access
Used Spark DataFrame API over Cloudera platform to perform analytics on Hive data.
Developed flume agents to extract data from Kafka Logs and other web servers into HDFS.
Collaborated on requirement gathering for data warehouse.
Installed, configured, and managed monitoring tools such as the ELK stack and AWS CloudWatch for resource monitoring.
Implemented AWS Lambda functions to run scripts in response to events in Amazon Redshift tables or S3; a hedged handler sketch appears at the end of this section.
Created Hive tables, loaded them with data, and wrote Hive queries.
Performed streaming data ingestion process through PySpark.
Experienced in automation testing and the Software Development Life Cycle (SDLC) using the Waterfall model, with a good understanding of Agile methodology.
Installed Kafka and started ZooKeeper and the broker servers, and created partitions and topics.
Specified nodes and performed the data analysis queries on Amazon Redshift clusters on AWS.
Responsible for Designing Logical and Physical data modeling for various data sources on AWS Redshift.
Implemented security measures AWS provides, employing key concepts of AWS Identity and Access Management
(IAM).
Built and analyzed Regression model on Google cloud using PySpark.
Migrated data between Hadoop system and multi cloud infrastructure on AWS and GCP.
Worked on GCP with Dataproc clusters, Bigtable, BigQuery, Pub/Sub, Composer (managed Airflow service), and Cloud Storage; transferred data from on-premises systems to GCP.
Performed data migration from AWS S3 buckets to GCP Cloud Storage buckets using the data migration service, and VM migrations from AWS to GCP.
Worked with AWS EC2, S3, RDS, DynamoDB, EMR, and Lambda services initially in the project; later worked on the migration to GCP.
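The Lambda-in-response-to-S3-events pattern mentioned above could look roughly like this sketch, which uses the Redshift Data API via boto3; the cluster identifier, database, IAM role ARN, and table names are hypothetical:

    # Hedged sketch of an S3-triggered Lambda handler: on each ObjectCreated
    # event it issues a COPY into Redshift through the Redshift Data API.
    # Cluster, database, role ARN, and table identifiers are placeholders.
    import boto3

    redshift_data = boto3.client("redshift-data")


    def handler(event, context):
        for record in event.get("Records", []):
            bucket = record["s3"]["bucket"]["name"]
            key = record["s3"]["object"]["key"]
            redshift_data.execute_statement(
                ClusterIdentifier="analytics-cluster",  # placeholder
                Database="dev",
                DbUser="loader",
                Sql=(
                    f"COPY staging.events FROM 's3://{bucket}/{key}' "
                    "IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy' "
                    "FORMAT AS PARQUET;"
                ),
            )
        return {"status": "ok"}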
BNY Mellon is a global investments company dedicated to helping its clients manage and service their financial assets throughout the investment lifecycle, whether providing financial services for institutions, corporations, or individual investors.
Installed Spark and configured the Spark config files, environment path, Spark home and external libraries.
Wrote Python script using Spark to read and count word frequency from eighty-one doc files and generate tables to
compare frequency across the files.
Wrote Hive Queries for analyzing data in Hive warehouse using Hive Query Language.
Imported and exported data using Flume and Kafka.
Wrote shell scripts to automate workflows to pull data from various databases into Hadoop framework for users to
access the data through Hive-based views.
Loaded data from the LINUX file system to HDFS.
Developed and maintained Spark/Scala application which calculates interchange between different credit cards.
Created a pipeline to gather new product releases of a country for a given week using PySpark, Kafka, and Hive.
Used Spark SQL to convert DStreams into RDDs or DataFrames.
Worked with Hive on MapReduce, and various configuration options for improving query performance.
Optimized data ingestion in Kafka Brokers within the Kafka cluster by partitioning Kafka Topics.
Sent requests to a REST-based API from a Python script via a Kafka producer; a brief producer sketch appears at the end of this section.
Performed Data scrubbing and processing with Airflow and Spark.
Configured Flume agent batch size, capacity, transaction capacity, roll size, roll count, and roll intervals.
Provided connections from Business Intelligence tools such as Tableau and Power BI to the tables in the data warehouse.
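The REST-to-Kafka producer pattern referenced above could be sketched as follows with requests and kafka-python; the endpoint URL, topic name, broker address, and polling interval are illustrative assumptions:

    # Illustrative sketch: poll a REST endpoint and publish each JSON response
    # to a Kafka topic. Endpoint, topic, and broker address are placeholders.
    import json
    import time

    import requests
    from kafka import KafkaProducer

    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )

    while True:  # runs until interrupted
        resp = requests.get("https://api.example.com/rates")  # hypothetical endpoint
        if resp.ok:
            producer.send("card-rates", resp.json())  # hypothetical topic
            producer.flush()
        time.sleep(60)  # poll once a minute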
Forrester Research is one of the most influential research and advisory firms in the world. Forrester
helps business and technology leaders use customer obsession to accelerate growth by putting their
customers at the center of their leadership, strategy, and operations.
Imported and exported data into HDFS and Hive using Sqoop.
Developed Spark jobs using Spark Core API.
Connected various data centers and transferred data between them using Sqoop and various ETL tools.
Used Cron jobs to schedule the execution of data processing scripts.
Imported real-time logs to Hadoop Distributed File System (HDFS) using Flume.
Exported data from DB2 to HDFS using Sqoop.
Used Zookeeper and Oozie for coordinating the cluster and scheduling workflows.
Worked with Ambari to monitor workloads, job performance and capacity planning.
Managed Hadoop clusters via the command line and the Hortonworks Ambari agent.
Performed cluster and system performance tuning.
Collected real-time data from diverse sources like webservers and social media using Python and stored data in HDFS
for further analysis.
Implemented partitioning, dynamic partitions, and buckets in Hive for optimized data retrieval; a PySpark sketch of this pattern appears at the end of this section.
Created Hive tables, loaded with data and wrote Hive Queries.
Processed multiple terabytes of data stored in Cloud using Hadoop.
Loaded and transformed large sets of structured, semi-structured, and unstructured data, working with data on HDFS and Apache Cassandra in the Hadoop data lake.
Configured cluster coordination services through Zookeeper and Kafka.
Configured Yarn capacity scheduler to support various business SLAs.
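The Hive partitioning and bucketing pattern noted above can be expressed from PySpark roughly as in this sketch; it assumes a Hive-enabled session, and the database, table, and column names are placeholders:

    # Sketch of dynamic partitioning and bucketing from a Hive-enabled PySpark
    # session; source and target table names and columns are illustrative.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.appName("hive-partitioning-sketch")
        .enableHiveSupport()
        .getOrCreate()
    )

    # Let Hive derive partition values from the data itself.
    spark.conf.set("hive.exec.dynamic.partition", "true")
    spark.conf.set("hive.exec.dynamic.partition.mode", "nonstrict")

    logs = spark.table("raw.web_logs")  # hypothetical source table

    (
        logs.write.mode("overwrite")
        .partitionBy("event_date")   # one partition directory per day
        .bucketBy(32, "user_id")     # 32 buckets keyed on user_id
        .sortBy("user_id")
        .saveAsTable("analytics.web_logs_bucketed")
    )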
LexisNexis is a corporation that sells data mining platforms through online portals, computer-assisted
legal research and information about vast swaths of consumers around the world.
Worked directly with a large data team that created the foundation of this enterprise analytics initiative in a Hadoop-based data lake.
Programmed the Oozie workflow engine to run multiple Hive queries.
Developed Python scripts for data capture and delta-record processing between newly arrived data and existing data in HDFS; a simplified sketch appears at the end of this section.
Implemented workflows with Apache Oozie framework to automate tasks.
Imported data from disparate sources into Spark RDDs for processing.
Analyzed large sets of structured, semi-structured, and unstructured data by running Hive queries.
Partitioned and bucketed log file data.
Collected relational data using custom input adapters and Sqoop.
Consumed data from a Kafka queue using Storm.
Wrote Oozie configuration scripts to export log files to the Hadoop cluster through automated processes.
Accessed the Hadoop cluster (CDM) and reviewed log files of all daemons.
Migrated MapReduce jobs to Spark to boost processing performance.
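The delta-record idea described above (keeping only newly arrived rows not already present in HDFS) could be sketched in PySpark like this; the paths and key column are hypothetical:

    # Hedged sketch of delta-record capture: retain only incoming rows whose
    # keys are absent from the existing HDFS dataset. Paths and key column
    # are placeholders.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("delta-capture-sketch").getOrCreate()

    existing = spark.read.parquet("hdfs:///warehouse/records")       # current data
    incoming = spark.read.parquet("hdfs:///landing/records_today")   # fresh arrivals

    # left_anti keeps incoming rows with no matching record_id in the existing set.
    delta = incoming.join(existing, on="record_id", how="left_anti")

    delta.write.mode("append").parquet("hdfs:///warehouse/records")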
EY-Parthenon is a global strategy consulting company that helps CEOs and business leaders design and
deliver transformative strategies across the entire enterprise to help build long-term value to all
stakeholders.
Assigned to a development team that designed and programmed a variety of customized software programs.
Programmed a range of functions (e.g., automate logistical tracking and analysis, automate schematic measurements,
etc.).
Hands-on with technologies such as XML, Java, and JavaScript.
Conducted unit and systems integration tests to ensure system functioned to specification.
Established communications interfacing between the software program and the database backend.
Developed client-side testing/validation using JavaScript.
Installed MySQL Servers, configured tables, and supported database operations.
Implemented mail alert mechanism for alerting users when their selection criteria were met.
Education
Bachelors - Computer Science
https://play.google.com/store/apps/details?id=com.dutchbros.loyalty
Implemented customer rewards utilizing Kotlin + MVVM, using Dagger/Hilt for dependency injection, Retrofit/RESTful APIs, Room DB, and Firebase for maintaining customer login and rewards information.
https://github.com/oceasia1622/AlbertsonsChallenge.git
https://github.com/oceasia1622/JetpackComposeExample.git
https://github.com/oceasia1622/RoomDBExample.git
https://github.com/oceasia1622/ServiceExample.git