
Ankush Kaira

(Azure Data Engineer)

Ph. 469 905 0684 kairaankush17@gmail.com

Summary:
• 10+ years of overall experience across Big Data technologies, including analysis, design, and development using Hadoop, Azure, Python, Data Lake, Scala, and PySpark, as well as database and data warehousing development with MySQL, Oracle, and traditional data warehouses.
• Extensively used Power BI alongside other Azure cloud services while ingesting data in different formats and from different sources.
• 10+ years of data warehousing experience, ranging from traditional warehouses to Azure Synapse and Snowflake.
• 5 years of experience as an Azure Cloud Data Engineer working with Microsoft Azure technologies including Azure Data Factory (ADF), Azure Data Lake Storage (ADLS), Azure Synapse Analytics (SQL Data Warehouse), Azure SQL Database, Azure Analysis Services, PolyBase, Azure Cosmos DB (NoSQL), Azure Key Vault, Azure DevOps, and Azure HDInsight, along with Big Data technologies such as Hadoop, Apache Spark, and Azure Databricks.
• 4 years of experience as a Data Warehouse developer working with Microsoft Business Intelligence tools.
• Experience in developing pipelines in Spark using Scala and PySpark.
• Experience building ETL data pipelines in Azure Databricks leveraging PySpark and Spark SQL (a minimal sketch follows this summary).
• Extensively worked on Azure Databricks.
• Proficient in using Azure Data Factory to perform incremental loads from Azure SQL DB to Azure Synapse.
• Proficient in T-SQL with extensive experience in Microsoft SQL Server.
• Hands-on experience in Azure Cloud Services (PaaS & IaaS), Azure Synapse Analytics, SQL Azure,
Data Factory, Azure Analysis Services, Application Insights, Azure Monitoring, Key Vault, and Azure
Data Lake.
• Experience building orchestration in Azure Data Factory for scheduling purposes.
• Hands-on experience in the Azure cloud, working with App Services, Azure SQL Database, Azure Blob Storage, Azure Functions, Virtual Machines, Azure AD, Azure Data Factory, Event Hubs, and event queues.
• Experience working with systems processing event-based logs, user logs, etc.
• Experience working with the Azure Logic Apps integration tool.
• Experience with Azure Logic Apps using different triggers.
• Orchestrated data integration pipelines in ADF using activities such as Get Metadata, Lookup, ForEach, Wait, Execute Pipeline, Set Variable, Filter, and Until.
• Strong experience in migrating other databases to Snowflake.
• Experience with Snowflake Multi-Cluster Warehouses.
• Experience with MS Azure (Databricks, Data Factory, Data Lake, Azure SQL, Event Hub, etc.)
• Experience in using Snowflake Clone and Time Travel.
• Hands-on working experience developing large-scale data pipelines using Spark and Hive.
• Implemented Security in Web Applications using Azure and deployed Web Applications to Azure.
• Experience working with ARM templates to deploy to production using Azure DevOps.
• Experience in developing very complex mappings, reusable transformations, sessions, and workflows
using the Informatica ETL tool to extract data from various sources and load it into targets.
• Proficient in leveraging cloud-based data solutions, including Microsoft 365, to enable smooth collaboration, data-driven decision-making, and data sharing with cross-functional teams.
• Worked closely with SAS Institute representatives to enhance data analytics capabilities and develop dashboards, visualizations, and custom reports.
• Experience in Developing Spark applications using Spark - SQL in Databricks for data extraction,
transformation, and aggregation from multiple file formats.
• Adept in Azure Intune management to ensure secure and compliant data access and device management within cloud-based data environments.
• Implemented production scheduling jobs using Control-M, and Airflow.
• Used various file formats like Avro, Parquet, Sequence, JSON, ORC, and text for loading data,
parsing, gathering, and performing transformations.
• Hands-on experience with Kafka and Flume to load the log data from multiple sources directly into
HDFS.
• Good experience in Hortonworks and Cloudera for Apache Hadoop distributions.
• Hands-on experience with Confluent Kafka to load data from StreamSets directly into ADLS.
• Strong experience building data pipelines and performing large-scale data transformations.
• In-Depth knowledge in working with Distributed Computing Systems and parallel processing
techniques to efficiently deal with Big Data.
• Designed and implemented Hive external tables using a shared metastore with static and dynamic partitioning, bucketing, and indexing.
• Worked with Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, and pair RDDs.
• Extensive hands-on experience tuning Spark jobs and Spark performance.
• Experienced in working with structured data using HiveQL, and optimizing Hive queries.
• Experience with solid capabilities in exploratory data analysis, statistical analysis, and visualization
using R, Python, SQL, and Tableau.
• Ran and scheduled workflows using Oozie and ZooKeeper, identified failures, and handled job integration, coordination, and scheduling.
• Knowledge of Database Architecture for OLAP and OLTP Applications, Database designing, Data
Migration, and Data Warehousing Concepts, emphasizing ETL.
• Experience in Data Modeling & Analysis using Dimensional and Relational Data Modeling.
• Experience in using Star Schema and Snowflake schema for Modeling and using FACT & Dimensions
tables, Physical & Logical Data Modeling.
• Defined user stories and drove the Agile board in JIRA during project execution, participating in sprint demos and retrospectives.
• Maintained and administered GIT source code repository and GitHub Enterprise.
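
The sketch below illustrates the kind of Databricks ETL pipeline referenced above: reading raw files from ADLS Gen2, applying Spark SQL transformations, and writing a curated layer. It is a minimal sketch only; the storage paths, container names, and column names are hypothetical placeholders, not details from any specific engagement.

```python
# Minimal PySpark sketch of an ADLS Gen2 -> curated-layer ETL step.
# All paths and column names below are illustrative assumptions.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("daily_sales_etl").getOrCreate()

# Ingest raw data landed in ADLS Gen2 (placeholder container/path)
raw = spark.read.parquet("abfss://raw@examplelake.dfs.core.windows.net/sales/")

# Basic cleansing and enrichment with Spark SQL / DataFrame functions
raw.createOrReplaceTempView("sales_raw")
curated = spark.sql("""
    SELECT order_id,
           CAST(order_ts AS DATE) AS order_date,
           UPPER(TRIM(region))    AS region,
           amount
    FROM sales_raw
    WHERE amount IS NOT NULL
""").withColumn("load_ts", F.current_timestamp())

# Write the curated layer back to the lake for downstream Synapse / Power BI use
curated.write.mode("overwrite").partitionBy("order_date") \
    .parquet("abfss://curated@examplelake.dfs.core.windows.net/sales/")
```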
Education:
Bachelor of Engineering in Information Technology| Panjab University, India (2011)
MS in Information Systems| New York University, USA (2013)
Technical Skills:
Azure Services: Azure Data Factory, Airflow, Azure Databricks, Logic Apps, Function Apps, Snowflake, Azure DevOps, Blob Storage
Big Data Technologies: MapReduce, Hive, Python, PySpark, Scala, Kafka, Spark Streaming, Oozie, Sqoop, Zookeeper
Hadoop Distributions: Cloudera, Hortonworks
Languages: Java, SQL, PL/SQL, Python, HiveQL, Scala
Operating Systems: Windows (XP/7/8/10), UNIX, Linux, Ubuntu, CentOS
Build Automation Tools: Ant, Maven
Version Control: Git, GitHub
IDE & Design Tools: Eclipse, Visual Studio
Databases: MS SQL Server 2016/2014/2012, Azure SQL DB, Azure Synapse, MS Excel, MS Access, Oracle 11g/12c, Cosmos DB

Lead Azure Data Engineer / Databricks Developer
Agiliti Health, New York | June 2022 – Present
Responsibilities:
• Integrated on-premises (MySQL, Cassandra) and cloud-based (Blob Storage, Azure SQL DB) data using Azure Data Factory and made it available in Azure Data Lake Storage Gen2.
• Proficient in leveraging Azure Databricks for end-to-end data processing and analytics, utilizing Apache Spark
and Python to transform and analyze large-scale datasets, resulting in enhanced data-driven insights and
streamlined data workflows.
• Performed data ingestion to Azure services and processed data in Azure Databricks from different APIs.
• Developed ETL transformations and validations using Spark SQL / Spark DataFrames in Azure Databricks and Azure Data Factory (a minimal validation sketch follows this section).
• Collaborated with Azure Logic Apps administrators to monitor and resolve process automation and data
processing pipeline issues.
• Optimized code for Azure Functions to extract, transform, and load data from diverse sources.
• Designed and implemented scalable data ingestion pipelines using Apache Kafka, Apache Flume, and Apache
Nifi.
• Managed Azure resources through Terraform, ensuring consistency, and repeatability across multiple platforms.
• Implemented infrastructure as code using Terraform to automate the provisioning of Azure resources, reducing
deployment time by 40%
• Developed and maintained ETL/ELT workflows using technologies like Apache Spark, Apache Beam, or
Apache Airflow.
• Implemented data quality checks and data cleansing techniques using Azure Dataflow jobs.
• Built and optimized data models and schemas using technologies like Apache Hive, Apache HBase, or
Snowflake.
• Developed ELT/ETL pipelines using Python and Snowflake SnowSQL.
• Implemented a CI/CD framework for data pipelines using Jenkins.
• Collaborated with DevOps engineers to establish automated CI/CD, Selenium and test-driven development
pipelines using Azure
• Integrated curated data from Synapse into Power BI and worked collaboratively with data analysts on Power BI Service and Power BI Desktop.
• Worked in Power BI to create interactive dashboards, wrote complex DAX measures, and shaped data in the Power Query Editor.
• Created a multi-page Power BI report (6 pages) featuring map visuals, KPI cards, and multi-row cards.
• Managed end-to-end operations of ETL data pipelines, handled sensitive data using the Azure Key Vault service, and used Selenium testing on a React page.
• Utilized SQL queries, scripting languages such as Python and Scala, and executed Hive scripts through Hive on
Spark and SparkSQL.
• Skilled in implementing scalable and efficient data pipelines on Azure Databricks, utilizing PySpark and Spark
SQL to process diverse data sources, leading to improved data quality and faster data delivery for critical business
operations
• Worked on EDI data formats, specifically HL7/FHIR, to process medical records from different medical equipment.
• Leveraged Kafka, Spark Streaming, and Hive to process streaming data.
• Developed T-SQL scripts for ETL processes, including data cleansing and transformation, reducing data processing
time by 40%
• Utilized different ADF activities and Linked Services for data ingestion and data transformation.
• Utilized JIRA for project reporting and actively participated in Agile ceremonies.
• Worked on Milvus Vector DB, deploying code in virtual machines using Azure DevOps.
• Transferred and transformed data effectively using Azure Synapse and serverless SQL pools.
• Proficient in designing and implementing scalable and distributed data solutions using Apache Cassandra, a highly available NoSQL database.
• Collaborated with the data analyst team, wrote complex DAX measures, and performed data modeling in Power BI.
• Successfully migrated and integrated SSIS/ETL processes into Azure Data Factory, enhancing data orchestration
and optimizing workflow efficiency.
• Skilled in modeling data with Cassandra's column store and optimizing performance through efficient data partitioning in NoSQL databases.
• Leveraged the Microsoft Power Platform to design and implement data-driven solutions, enhancing data analytics and reporting capabilities within Azure-based environments.
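
A minimal sketch of the Spark DataFrame validation step mentioned above (see the ETL transformations and validations bullet), assuming a Delta table of healthcare claims in a Databricks workspace; the mount point, column names, and the specific quality rules are illustrative assumptions, not project details.

```python
# Hedged sketch of a Spark data-quality validation step in Databricks.
# Table path, columns, and rules are placeholders for illustration.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("claims_validation").getOrCreate()

claims = spark.read.format("delta").load("/mnt/curated/claims")  # assumed mount

# Simple data-quality rules: non-null keys, non-negative amounts, no future dates
checks = {
    "null_claim_id": claims.filter(F.col("claim_id").isNull()).count(),
    "negative_amount": claims.filter(F.col("billed_amount") < 0).count(),
    "future_service_date": claims.filter(F.col("service_date") > F.current_date()).count(),
}

failed = {rule: n for rule, n in checks.items() if n > 0}
if failed:
    # Surface the failure so the calling ADF pipeline / notebook job can alert
    raise ValueError(f"Data-quality checks failed: {failed}")
```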
Azure ETL Data Engineer
Simmons Bank, New York | Oct 2020 – May 2022
Responsibilities:
• Improved Spark performance by optimizing data processing algorithms on Azure Databricks.
• Implemented efficient data integration solutions using Apache Kafka, Apache NiFi, and Azure Data Factory.
• Leveraged a strong understanding of financial derivatives and instruments to design and implement data processing pipelines for pricing, risk analysis, and trade lifecycle management.
• Worked with Microsoft Azure services such as HDInsight Clusters, BLOB, Data Factory, Logic Apps, and Azure
Databricks.
• Handled near-real-time data streams using Azure Stream Analytics and Azure Event Hubs (a minimal streaming sketch follows this section).
• Conducted ETL using Azure Databricks and migrated on-premises Oracle ETL processes to Azure Synapse
Analytics.
• Migrated SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Data Bricks,
and Azure SQL Data Warehouse.
• Engineered robust ETL pipelines to seamlessly integrate data from various financial derivatives sources, ensuring
accuracy and completeness for risk management and regulatory compliance.
• Controlled and granted database access and migrated on-premises databases to Azure Data Lake Store.
• Deployed and optimized Python web applications using Azure DevOps CI/CD.
• Transformed the data using serverless SQL pools, stored procedures and transferred big data using Azure
Synapse Analytics pipelines.
• Delivered insightful reports using Power BI Desktop and Power BI Report Server while maintaining data security.
• Developed enterprise-level solutions using batch processing and streaming frameworks.
• Designed and implemented robust data models and schemas.
• Developed and maintained end-to-end data pipelines using Apache Spark, Apache Airflow, or Azure Data
Factory.
• Collaborated with cross-functional teams to gather requirements and design data integration workflows.
• Provided production support and troubleshooting for data pipelines using Azure Synapse.
• Utilized Apache Spark pools for data analysis and processing in Azure Synapse.
• Utilized Git, JIRA, and worked on Spark using Python (PySpark) and Spark SQL.
• Experienced in working with the Amazon DynamoDB (NoSQL database) service to build fast and flexible applications with seamless scalability.
• Developed data models, performed data migrations, and utilized DynamoDB’s indexing and query capabilities for
efficient data retrieval.
• Ingested data into Power BI from websites as well as from Azure storage containers in ADLS Gen2.
• Worked in Power BI creating interactive dashboards with KPIs, slicers, filters, and various visuals to surface patterns and insights for delivery to stakeholders.
• Utilized SSIS for data transfer between databases and scheduled jobs for package execution.
• Used Git as a version control tool to keep track of coding versions.
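
As an illustration of the near-real-time stream handling noted above, the hedged sketch below consumes Azure Event Hubs through its Kafka-compatible endpoint with Spark Structured Streaming (one alternative to Stream Analytics). The namespace, event hub name, payload schema, and output paths are placeholders rather than project specifics.

```python
# Hedged sketch: Azure Event Hubs (Kafka endpoint) -> Spark Structured Streaming -> lake.
# Namespace, hub/topic name, schema, and paths are illustrative assumptions.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("eventhub_stream").getOrCreate()

# In practice the connection string would be retrieved from Azure Key Vault
conn_str = "Endpoint=sb://<namespace>.servicebus.windows.net/;SharedAccessKeyName=...;SharedAccessKey=..."
jaas = (
    "org.apache.kafka.common.security.plain.PlainLoginModule required "
    f'username="$ConnectionString" password="{conn_str}";'
)

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "<namespace>.servicebus.windows.net:9093")
    .option("kafka.security.protocol", "SASL_SSL")
    .option("kafka.sasl.mechanism", "PLAIN")
    .option("kafka.sasl.jaas.config", jaas)
    .option("subscribe", "transactions")  # the event hub name acts as the Kafka topic
    .load()
)

# Parse the JSON payload into typed columns (illustrative schema)
schema = StructType([
    StructField("trade_id", StringType()),
    StructField("amount", DoubleType()),
])
parsed = events.select(
    F.from_json(F.col("value").cast("string"), schema).alias("e")
).select("e.*")

# Land the parsed stream in the lake for downstream Synapse processing
query = (
    parsed.writeStream.format("parquet")
    .option("path", "/mnt/streaming/trades")
    .option("checkpointLocation", "/mnt/checkpoints/trades")
    .start()
)
```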

Data Engineer
UPS, Louisville, KY | Jun 2019 – Sep 2020
Responsibilities:
• Designed and implemented an enterprise data lake on ADLS Gen2.
• Ensured data quality by performing data operations and integrity checks and worked on virtual machines.
• Created tables, views, and indexes in a relational database management system (RDBMS) for efficient data
retrieval and analysis.
• Integrated multiple data sources, including on-premises databases and cloud-based platforms, into PowerBI to
create a centralized reporting solution, resulting in improved data accessibility and accuracy.
• Created data models such as star and snowflake schemas in Power BI to establish better relationships in the data.
• Wrote DAX measures to extract the required information from datasets in CSV format.
• Designed and implemented ETL pipelines for supply chain management operations to extract, transform, and load supply chain data into a data warehouse using Apache Spark and Python.
• Developed ETL processes to extract data from various sources, transform it, and load it into the data lake.
• Collaborated with cross-functional teams to understand data requirements and design data models.
• Conducted data profiling and analysis to identify patterns and insights.
• Used Grafana for monitoring purposes.
• Implemented real-time data processing solutions for supply chain data using Apache Kafka and Stream Analytics (a minimal Kafka producer/consumer sketch follows this section).
• Implemented data governance policies and procedures to ensure data security and compliance.
• Utilized Git for version control of T-SQL scripts and database schema changes, enabling collaboration with development teams and ensuring code traceability.
• Performed data cleansing and data transformation using SQL and scripting languages.
• Developed and maintained ETL workflows using tools like Informatica or Talend.
• Monitored and optimized ETL performance and identified areas for improvement.
• Conducted data validation and reconciliation to ensure accuracy and consistency.
• Worked with stakeholders to define data integration requirements and design data pipelines.
• Participated in Agile development methodologies and scrum meetings.
• Utilized Power BI for creating reports, bookmarks, and actions for visualization and insights.
• Designed and maintained SSIS packages for seamless data integration between data sources and destinations.
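
A minimal sketch of the real-time Kafka feed described above, using the kafka-python client; the broker addresses, topic name, and shipment-event payload shape are illustrative assumptions rather than actual project values.

```python
# Hedged sketch of a Kafka producer/consumer pair for supply-chain events.
# Brokers, topic, and payload fields below are placeholders.
import json
from kafka import KafkaProducer, KafkaConsumer

BROKERS = ["broker1:9092"]      # placeholder broker list
TOPIC = "shipment-events"       # placeholder topic name

# Producer side: publish shipment scan events as JSON
producer = KafkaProducer(
    bootstrap_servers=BROKERS,
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send(TOPIC, {"tracking_id": "1Z999", "status": "IN_TRANSIT", "hub": "SDF"})
producer.flush()

# Consumer side: read events and hand them to the downstream processing step
consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers=BROKERS,
    group_id="supply-chain-etl",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    auto_offset_reset="earliest",
)
for message in consumer:
    event = message.value
    print(event["tracking_id"], event["status"])  # placeholder for the real transform
```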

Big Data Developer
Centene Corporation, St. Louis, Missouri | May 2018 – May 2019
Responsibilities:
• Designed and developed applications on the data lake to transform data according to business users' requirements.
• Developed a custom MapReduce framework to filter out bad and unnecessary records.
• Set up the CI/CD pipeline using Jenkins, Maven, Nexus, GitHub, and AWS.
• Built a data pipeline using Flume, Sqoop, Pig, and MapReduce to ingest customer behavioral data.
• Coded mapper and reducer functions in Python for MapReduce jobs (a minimal Hadoop Streaming sketch follows this section).
• Utilized Spark SQL to load JSON data, create schema RDDs, and load them into Hive tables.
• Leveraged Hive for data transformations, event joins, and pre-aggregations.
• Configured, designed, and developed dashboards in Power BI on gold-layer data stored in ADLS Gen2 for final report generation.
• Developed comprehensive design documents and wrote MapReduce code to parse log files.
• Automated tasks using the Apache Oozie framework.
• Worked with Azure Synapse to drive data-driven insights and business intelligence.
• Environment: Cloudera CDH, Hadoop, HDFS, MapReduce, Hive, Oozie, Pig, Shell Scripting, MySQL.
• Established robust error handling and logging mechanism in SSIS, reducing data-related incidents.
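
The following hedged sketch shows a Hadoop Streaming style mapper and reducer in Python, of the kind referenced in the MapReduce bullet above; the pipe-delimited record layout and the bad-record rule are assumptions chosen purely for illustration.

```python
# Hedged Hadoop Streaming sketch: filter bad records in the mapper, count per key
# in the reducer. Field layout and validity rules are illustrative assumptions.
# Example usage: -mapper "python mr_filter.py map" -reducer "python mr_filter.py"
import sys

def mapper():
    """Emit member_id<TAB>1 for well-formed records; drop malformed lines."""
    for line in sys.stdin:
        fields = line.rstrip("\n").split("|")
        if len(fields) < 3 or not fields[0]:
            continue                      # skip bad / unnecessary records
        member_id = fields[0]
        print(f"{member_id}\t1")

def reducer():
    """Sum counts per member_id (Hadoop delivers input sorted by key)."""
    current_key, count = None, 0
    for line in sys.stdin:
        key, value = line.rstrip("\n").split("\t")
        if key == current_key:
            count += int(value)
        else:
            if current_key is not None:
                print(f"{current_key}\t{count}")
            current_key, count = key, int(value)
    if current_key is not None:
        print(f"{current_key}\t{count}")

if __name__ == "__main__":
    mapper() if sys.argv[1:] == ["map"] else reducer()
```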

Data Analyst
UBS, Weehawken, NJ | Apr 2016 – Apr 2018
Responsibilities:
• Worked on Azure Synapse for building and optimizing end-to-end data analytics solutions with seamless
integration of data warehousing, big data, and data integration capabilities.
• Created and maintained databases for server inventory and performance inventory.
• Leveraged Azure Synapse Analytics while working with data warehousing.
• Developed data marts and a user access tool for ad-hoc reporting.
• Built cubes and dimensions for business intelligence and wrote MDX scripting.
• Developed SSIS jobs to automate report generation and cube refresh processes.
• Utilized SQL Server Reporting Services (SSRS) for report management and delivery.
• Developed stored procedures and triggers for data consistency.
• Leveraged a snowflake-schema data model, designed in Erwin, for external data sharing.
• Worked on PL/SQL databases for scripting and database modeling.
• Played a pivotal role in identifying and developing use cases for EDW/DAR applications that align with
organization goals and priorities.
• Utilized business intelligence tools, specifically Power BI, to create dashboards for project progress, resource allocation, task completion, and project budgets.
• Environment: Windows Server, MS SQL Server, SSIS, SSAS, SSRS, SQL Profiler, Power BI, C#, SharePoint.
Data Warehouse Developer
CenturyLink, Denver, CO | Feb 2014 – Mar 2016
Responsibilities:
• Developed stored procedures, triggers, and functions for performance enhancement.
• Designed ETL data flows using SSIS for data extraction and transformation.
• Built Cubes and Dimensions using various architectures for Business Intelligence.
• Collaborated with SAS Institute to implement business intelligence solutions such as Enterprise Data Warehouse (EDW) and Data Analytics and Reporting (DAR) applications.
• Developed dimensional data models using Erwin and implemented slowly changing dimensions (SCD).
• Developed SSAS Cubes, implemented aggregations, and deployed and processed SSAS objects.
• Created ad hoc reports and performed database queries for Business Intelligence purposes.
• Collaborated effectively in a project-oriented team with excellent communication skills.
• Environment: MS SQL Server, Visual Studio, SSIS, SharePoint, MS Access, Team Foundation Server, Git.
Certifications: Microsoft Certified: Azure Data Engineer Associate (H461-5113)
