
Name: Sai Prashanth

Mobile: +1 314-325-9432
Mail: prasanth.bodaballa@gmail.com

PROFESSIONAL SUMMARY:

 Over 10 years of experience as a Data Scientist with expertise in Machine Learning, Data Mining with
large sets of structured and unstructured data, Data Acquisition, Data Validation, Predictive Modelling,
Data Visualization, Web Crawling, Web Scraping, Statistical Modelling, and Natural Language
Processing (NLP).
 Experience with the complete data science project life cycle, actively involved in all phases including
data acquisition, data cleaning, feature engineering, feature scaling, statistical modeling (Decision Trees,
Regression Models, Neural Networks, Support Vector Machines (SVM), Clustering), dimensionality
reduction using Principal Component Analysis and Factor Analysis, and testing and validation using ROC
plots, K-fold cross-validation, and data visualization.
 Created pipelines, data flow, and complex data transformations and manipulations using PySpark with
Databricks.
 Solid experience in designing and operationalizing large-scale data and analytics solutions on the
Snowflake Cloud Data Warehouse.
 Strong knowledge of data mining, machine learning, and deep learning, including Computer Vision,
Recommender Systems, and Natural Language Processing; developed computer vision systems using
Faster R-CNN, with a focus on object detection and recognition.
 Strong experience using PyTorch, OpenCV, TensorFlow, and Computer Vision; used these libraries to
develop facial recognition, object tracking, neural networks, and image recognition systems.
 Skilled in SQL database management, with expertise in designing schemas, writing efficient queries, and
ensuring data accuracy for structured data storage.
 Proficient in leveraging NoSQL databases to handle diverse data types and build scalable systems, enabling
real-time data analysis for dynamic datasets.
 Proficient in Statistical Modeling, Multivariate Analysis, model testing, and validation, with a focus on
integrating AI techniques for enhanced predictive modeling and decision-making.
 Expert in translating business requirements into analytical models, designing algorithms, and developing
scalable Data Mining and reporting solutions for both structured and unstructured data using cutting-edge
AI methodologies.
 Skilled in Data Parsing, Manipulation, and Preparation, employing various techniques such as descriptive
statistics, regex, splitting, merging, and reshaping.
 Experienced in leveraging AI-driven approaches for automated and efficient data processing.
 Experience using various Python and R packages like ggplot2, dplyr, NLP, plyr, pandas, NumPy, seaborn,
SciPy, matplotlib, and scikit-learn.
 Proven ability to analyse complex datasets and extract valuable insights, making data-driven
recommendations to drive business growth.
 Strong background in machine learning algorithms and techniques, with a track record of developing
predictive models for various applications.
 Proficient in programming languages like Python and R, with experience in data manipulation, feature
engineering, and model development.
 Skilled in data visualization tools such as Tableau and Matplotlib, effectively communicating findings to
both technical and non-technical stakeholders.
 Familiarity with big data technologies like Hadoop and Spark, enabling the processing and analysis of large-
scale datasets.
 Experience in designing and conducting A/B tests to evaluate the impact of data-driven initiatives and make
informed decisions.
 Extensive experience in Text Analytics, generating data visualizations using Python and R, and creating
dashboards using tools like Tableau.
 Deep understanding of MapReduce with Hadoop and Spark, with good knowledge of the broader
Hadoop and Spark Big Data ecosystems.
 Effective team player with strong communication and interpersonal skills, and a strong ability to adapt
to and learn new technologies and business lines rapidly.
 Capable of leveraging AWS services to create scalable, cost-effective infrastructure for data analytics,
ensuring the efficient handling of large datasets and enabling seamless integration of machine learning
models into cloud environments.
 Good understanding of web design based on Python, Java, HTML, CSS, JavaScript, C++, and C.
 Extensive experience in Data Visualization, including producing tables, graphs, and listings using tools
such as Tableau.
 Strong industry knowledge, superb analytical & problem-solving skills, and ability to adapt.

TECHNICAL SKILLS:

Programming: Python, Java, HTML, CSS, JavaScript, C++, C, SQL, Spark


Database: PostgreSQL, MySQL, MongoDB, SQL Server
BI/Data Visualization: Tableau, Google Analytics, Advanced Microsoft Excel
Machine Learning: Linear and Logistic Regression, Regularization (Lasso & Ridge), Decision Trees, Support Vector Machines, Random Forest, Gradient Boosting, CNN, RNN, Principal Component Analysis (PCA), Hierarchical & K-means Clustering
Big Data Tools: Spark/PySpark, Hive, MapReduce, HDFS
Machine Learning Applications: Natural Language Processing & Understanding, Sentiment Analysis, Computer Vision, Time Series Analysis and Forecasting, Survival Analysis, Classification, Regression, Recommender Systems, Customer Segmentation
Deep Learning: Multi-Layer Perceptron, Artificial Neural Networks (ANNs), Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs, LSTMs), Gradient Descent Optimizers, TensorFlow, Keras, PyTorch
Cloud Platform: AWS (Amazon Web Services), Azure
Web Frameworks: Django, Flask, Tomcat
ETL / Warehousing: Snowflake, Databricks, Data Pipelines, Airflow
Data Automation: DataRobot
Python Libraries: TensorFlow, Scikit-learn, Pandas, NumPy, Matplotlib
Code Platform: Android Development
Version Control: Git, GitLab

PROFESSIONAL EXPERIENCE:

Client: Centene Corporation, St Louis, Missouri Mar 2023 – Present


Role: Sr. Data Scientist/AI/ML Engineer

Responsibilities:
 Identifying, gathering, and analysing complex, multi-dimensional datasets utilizing a variety of tools.
 Explored the ADNI website and built a data pipeline to ingest 1.5 TB of patient data.
 Pre-processed 3D and 2D image datasets and used TensorFlow to predict the onset of Alzheimer's disease.
 Performed quality checks on data, identifying outliers, checking for normality, and standardizing the data.
 Experienced in using AWS services like EMR and S3 to process and analyse large datasets efficiently,
enabling valuable insights for data-driven decision-making.
 Competent in deploying machine learning models using AWS SageMaker, leveraging its capabilities for
model development and deployment.
 Generated risk stratification reports using Tableau to manage higher cost treatment plans and improve the
quality of care by highlighting risk summary and impact of prenatal conditions.
 Applied computer vision techniques such as object detection to detect plaques and tangles in brain
scans, which are characteristic of Alzheimer's disease, implementing these methods with OpenCV and
PyTorch.
 Used computer vision and PyTorch to create a machine learning model, applying image segmentation
techniques to identify and analyse specific brain regions affected by Alzheimer's disease; for example,
trained a PyTorch deep learning model to segment the hippocampus, a region known to be affected by
the disease, and analysed its shape and size to predict Alzheimer's (see the segmentation sketch
following this list).
 Capable of extracting valuable insights from SQL databases to drive data-driven decision-making and
support business objectives.
 Incorporated ethical considerations into AI projects, ensuring fairness, transparency, and accountability in
model development and deployment.
 Implemented NumPy for efficient data manipulation and mathematical operations, improving the speed
and accuracy of data pre-processing and analysis.
 Created informative and visually appealing data visualizations using Seaborn, facilitating data exploration
and presentation to both technical and non-technical stakeholders.
 Developed and fine-tuned deep learning models using TensorFlow, allowing for the implementation of
state-of-the-art machine learning techniques in various applications.
 Employed Apache Spark for distributed data processing and analysis, enabling the handling of large-scale
datasets and complex data transformations.
 Maintained a version control system using Git to track code changes, collaborate with team members, and
ensure codebase stability in data science projects.
 Employed RStudio as a comprehensive environment for AI-driven data analysis, modeling, and
visualization, integrating advanced R-based algorithms and methodologies into the data science projects.
 Designed and developed interactive data dashboards and visualizations in Tableau, incorporating AI-
powered analytics for enhanced insights.
 Spearheaded the automation of the build and deployment process for applications, incorporating AI-driven
components to enhance the user experience.
 Merged data from MongoDB and PostgreSQL to train the ML model.
 Developed a linear regression predictive risk model in Python to identify members with high risk.
 Developed a logistic regression impact model that helped care providers quantify care-gap prioritization
for each case, using the Python libraries Pandas, NumPy, Scikit-Learn, and Matplotlib (see the risk-model
sketch following this list).
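
For illustration, here is a minimal PyTorch sketch of the image-segmentation approach described above. The toy encoder-decoder, tensor shapes, and random data are assumptions for demonstration only, not the model actually trained on the ADNI data.

# Minimal sketch of a PyTorch segmentation model for 2D brain-scan slices.
# Shapes, layers, and data are illustrative assumptions, not the production model.
import torch
import torch.nn as nn

class TinySegNet(nn.Module):
    """Toy encoder-decoder predicting a per-pixel hippocampus mask."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                       # 128x128 -> 64x64
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=2),           # 64x64 -> 128x128
            nn.Conv2d(32, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 1),                   # 1-channel mask logits
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = TinySegNet()
criterion = nn.BCEWithLogitsLoss()                 # binary mask loss on logits
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

scans = torch.randn(4, 1, 128, 128)                # fake batch of 2D slices
masks = torch.randint(0, 2, (4, 1, 128, 128)).float()
for _ in range(3):                                 # toy training loop
    optimizer.zero_grad()
    loss = criterion(model(scans), masks)
    loss.backward()
    optimizer.step()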
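
And a minimal sketch of the kind of regression-based risk modelling described above, using Pandas and Scikit-learn; the feature names and synthetic data are hypothetical stand-ins for actual member data.

# Hedged sketch of a logistic-regression risk model; columns are hypothetical.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
df = pd.DataFrame({
    "age": rng.integers(18, 90, 1000),             # assumed feature columns
    "num_conditions": rng.integers(0, 8, 1000),
    "prior_claims": rng.integers(0, 20, 1000),
})
df["high_risk"] = (df["num_conditions"] + rng.normal(0, 1, 1000) > 4).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    df.drop(columns="high_risk"), df["high_risk"],
    test_size=0.2, random_state=42, stratify=df["high_risk"])

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
scores = model.predict_proba(X_test)[:, 1]         # probability of high risk
print("AUC:", roc_auc_score(y_test, scores))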

Environment: TensorFlow, PyTorch, NumPy, pandas, scikit-learn, OpenCV, AI, Apache Spark, SQL, Python,
NoSQL, Git, Seaborn, Tableau, AWS (EMR, S3, SageMaker), Jupyter Notebook, RStudio, Databricks, ETL,
MongoDB, PostgreSQL.

Client: Focus Financial Services, New York, NY Dec 2021 – Feb 2023
Role: Data Scientist/AI/ ML Engineer

Responsibilities:
 Participated in all phases of research including data collection, data cleaning, data mining, and developing
models and visualizations.
 Designed and developed machine-learning models to improve marketing departments’ programmatic
strategies for optimal usage of impression opportunities.
 Utilized Azure Machine Learning to develop and deploy predictive models for business applications.
 Worked on unsupervised segmentation, targeting based on social network activities, and finding clusters of
user groups using the k-means method (see the clustering sketch following this list).
 Created multivariate regression-based attribution models using ad stock analysis from the digital
marketing data.
 Developed and implemented deep learning models using TensorFlow for tasks such as image classification,
object detection, or natural language processing.
 Designed and implemented Convolutional Neural Networks (CNN) for computer vision applications,
leveraging TensorFlow's high-level APIs.
 Deployed machine learning models as microservices on Azure Kubernetes Service (AKS) to enable scalable
and reliable inference in a production environment.
 Orchestrated model versioning and updates seamlessly using AKS, ensuring minimal downtime and
efficient resource utilization.
 Used PyTorch, Computer Vision, and OpenCV for a variety of techniques such as image classification, object
detection, and feature extraction.
 Collected data from a MongoDB database using SQL and kept the data updated.
 Worked with high dimensional data sets retrieved from users, media agencies, or third-party apps and used
methods such as PCA, LDA, and Kernel Approximations.
 Extracted, transformed, and loaded data sources to generate CSV data files using Python and SQL
queries.
 Managed end-to-end machine learning workflows, including data preparation, model training, and
deployment using Azure ML.
 Designed and maintained SQL databases to efficiently store and manage structured data, ensuring data
integrity and accessibility.
 Developed complex SQL queries and optimized database performance, enabling timely extraction of
insights for informed decision-making.
 Implemented version control using Git to track code changes, collaborate with team members, and
maintain codebase integrity throughout data science projects.
 Collaborated with data engineering teams using Azure Databricks for big data analytics and distributed
computing.
 Implemented cross-validation strategies to assess model performance and ensure robustness, contributing
to the development of more reliable AI models (see the cross-validation sketch following this list).
 Utilized Hadoop for distributed data storage and processing, handling and analysing large-scale datasets
effectively.
 Developed an ETL pipeline on Databricks, utilized black box and white box testing for software, and trained
a model on Kubernetes within a Docker image containing the entire runtime environment.
 Developed MapReduce jobs to extract valuable information from unstructured and semi-structured data
sources.
 Applied scikit-learn to develop machine learning models for predictive analytics, classification, regression,
and clustering, tailoring solutions to meet project objectives.
 Applied advanced hyperparameter tuning techniques to enhance the accuracy and generalization of
machine learning models, leveraging AI methodologies to fine-tune model parameters.
 Implemented and trained machine learning models using various frameworks (e.g., TensorFlow, scikit-
learn) and evaluated their performance using appropriate metrics.
 Collaborated with cross-functional teams, including data scientists, engineers, and domain experts, to
understand business requirements and integrate AI solutions into existing systems.
 Engineered custom data pipelines using Pandas and integrated AI-driven components to automate complex
data pre-processing tasks, contributing to the efficiency and repeatability of the AI-driven data science
workflow.
 Used different feature engineering methods in Python (Pandas, NumPy, and Matplotlib, Seaborn) to cleanse
high-dimensional datasets and prepare them for modelling.
 Developed supervised classification models to predict whether users will click on certain ads, using
algorithms such as Stochastic Gradient Descent, Logistic Regression, Random Forest, and SVM.
 Analysed and visualized different segments of users to understand their behaviours better with Tableau.
 Worked in an agile environment using Jira for ticketing and Confluence for documentation.
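
For illustration, a minimal sketch of the k-means user segmentation described above; the activity features are synthetic stand-ins, not client data.

# K-means clustering of user-activity features (synthetic illustration).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
activity = rng.random((500, 3))                    # e.g. posts, likes, shares

scaled = StandardScaler().fit_transform(activity)  # k-means is scale-sensitive
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(scaled)
print(np.bincount(kmeans.labels_))                 # users per segment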
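
And a sketch of the kind of k-fold cross-validation check described above; the dataset here is synthetic.

# 5-fold cross-validation to assess model robustness (synthetic data).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(clf, X, y, cv=5, scoring="roc_auc")
print(scores.mean(), scores.std())                 # stability across folds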

Environment: Apache Spark, EMR, Hadoop, Azure, ETL, PyTorch, TensorFlow, CNN, Machine learning, AI,
Python, SQL, Scikit-learn, Pandas, NumPy, Matplotlib, Seaborn, Tableau, MongoDB, NoSQL, Git, Jira, Confluence,
Docker, Kubernetes, OpenCV.
Client: Synchrony, Bridgewater, NJ Sep 2019 – Nov 2021
Role: Sr. Data Scientist/AI/ML Engineer

Responsibilities:
• Developed AI/ML solutions to improve efficiency.
• Implemented predictive models for decision-making.
• Led a team of engineers to deliver projects on time.
• Proficiently implemented model monitoring and logging strategies to ensure the ongoing performance of
deployed machine learning models.
• Conducted training sessions on Azure DevOps best practices, contributing to team skill enhancement.
• Contributed to continuous improvements in DevOps processes and workflows within Azure DevOps,
enhancing overall efficiency.
• Demonstrated familiarity with model registry systems, ensuring meticulous cataloging and management
of versions for traceability and reproducibility.
• Administered PostgreSQL databases, focusing on ensuring data integrity, performance, and security.
• Innovatively designed and optimized database schemas, indexes, and queries to boost application
efficiency.
• Implemented replication solutions, including streaming and logical replication, ensuring high availability
and disaster recovery.
• Expertise in handling Open Geospatial Consortium (OGC) formats, promoting interoperability with diverse
geospatial datasets.
• Applied Python as a versatile tool for geospatial data manipulation, analysis, and automation.
• Skillfully utilized Pandas to streamline geospatial data processing and analysis workflows.
• Demonstrated adeptness in navigation systems, applying geospatial technologies to enhance precision in
location-based applications.
• In-depth knowledge of raster formats, including ESRI Grid, GeoTIFF, JPEG 2000, and NITF, facilitating
effective handling and analysis of raster data.
• Developed and implemented innovative conversational AI solutions using the Kore.AI platform.
• Designed and deployed chatbots, leveraging Kore.AI's NLP capabilities to enhance customer interactions.
• Utilized natural language processing techniques.
• Worked on libraries such as NumPy, Pandas, Scikit-learn, Matplotlib, Seaborn, and psycopg2.
• Used supervised, unsupervised, and reinforcement learning techniques, and gathered requirements and
designed solutions for AI/ML use cases.
• Mentored team members to deliver high-quality software.
• Constantly assessed the current framework and operating model, bringing new ideas to the table to
improve processes and outcomes.
• Partnered with the broader engineering team to drive inner-source adoption and platform engineering.
• Built an internal job-matching system using Natural Language Processing (NLP) to help HR specialists find
the top N candidates suitable for certain open positions and, at the same time, recommend potential
candidates to the top N matching open positions in the company (see the matching sketch following this
list).
• Developed a Natural Language Processing/Understanding (NLP/NLU) front end for a Public Key
Infrastructure (PKI) proof of concept.
• Used machine learning modelling techniques such as GridSearchCV and SMOTE (see the tuning sketch
following this list).
• Worked on CentOS 7 and Linux to access the AWS EC2 instances.
• Created customized Tableau dashboards for daily/weekly/monthly reporting purposes.
• Utilized natural language processing to perform parts of speech tagging and named entity recognition.
• Performed unit testing, provided system test support, and validated and monitored deliverables in production.
• Automated the ML model-building process by building data pipelines and integrating them with the
data-cleaning process.
• Used Putty for accessing the AWS EC2 instances for training the models with Production phase data.
• Integrated the AWS server with GitLab to push the model into production and to monitor the performance.
• Worked closely with both onshore and offshore teams.
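
For illustration, a minimal sketch of top-N candidate/position matching with TF-IDF and cosine similarity; the documents below are toy stand-ins, and the production system's features are not reproduced here.

# Rank open positions for each candidate by text similarity (toy example).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

resumes = ["python machine learning nlp", "java spring microservices"]
jobs = ["senior data scientist with nlp experience", "backend java engineer"]

vec = TfidfVectorizer()
matrix = vec.fit_transform(resumes + jobs)         # shared vocabulary
sims = cosine_similarity(matrix[:len(resumes)], matrix[len(resumes):])

top_n = sims.argsort(axis=1)[:, ::-1]              # best-matching jobs first
print(top_n[0])                                    # ranked jobs for candidate 0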
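
And a sketch of GridSearchCV combined with SMOTE oversampling, assuming the imbalanced-learn pipeline so resampling happens only on training folds; the parameter grid and data are illustrative.

# GridSearchCV over a SMOTE + logistic-regression pipeline (synthetic data).
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=1000, weights=[0.9], random_state=0)
pipe = Pipeline([("smote", SMOTE(random_state=0)),
                 ("clf", LogisticRegression(max_iter=1000))])
grid = GridSearchCV(pipe, {"clf__C": [0.1, 1.0, 10.0]}, cv=5, scoring="f1")
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
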
Environment: TensorFlow, PyTorch, NumPy, pandas, scikit-learn, OpenCV, AI, Apache Spark, SQL, Python,
NoSQL, Git, Seaborn, Tableau, AWS (EMR, S3, SageMaker), Jupyter Notebook, RStudio, Databricks, ETL,
MongoDB, PostgreSQL

Client: Costco, Issaquah, WA March 2018 – Aug 2019


Role: Data Scientist

Responsibilities:
 Worked with a variety of ML algorithms and dataset problems to explore their use cases.
 Utilized SQL for data preprocessing and analysis, querying databases to retrieve relevant data for machine
learning tasks.
 Used regression algorithms on datasets to predict future sales values and identify growth areas.
 Identified fraudulent customer transactions by verifying their payment methods using a Random Forest
Classifier, obtaining an accuracy of 85 percent (see the classifier sketch following this list).
 Worked on Machine Learning (ML) algorithms, stochastic processes and modeling, data mining, model
development, validation, and scoring/projections in R, Python, and SAS environments.
 Performed Data Profiling to learn about user behaviour and merged data from multiple data sources.
 Performed exploratory data analysis (univariate, bivariate, and multivariate analysis) with R and Python.
 Extensively used data blending and embed functionalities in Tableau.
 Developed, organized, managed, and maintained graphs, tables, slides, and document templates for the
efficient creation of reports.
 Worked extensively with advanced analytics such as Reference Lines and Bands, Trend Lines, and Dual Axis.
 Worked in standardizing Tableau for shared service deployment.
 Created Tableau scorecards and dashboards using stacked bars, bar graphs, scatter plots, geographical maps,
and Gantt charts via the Show Me functionality.
 Experienced in building and publishing customized interactive reports and dashboards, and scheduling
reports using Tableau Server; created new schedules and checked tasks daily on the server.
 Involved in writing T-SQL queries using SQL Server Management Studio (SSMS); developed a new data
warehouse using SQL Server Analysis Services (SSAS) and built cubes following the Kimball methodology.
 Developed report automation processes using Tableau to improve turnaround time and reporting
capabilities.
 Developed Trend Lines, Statistics and Log Axes, Groups, Hierarchies, and Sets to create detail-level summary
reports and dashboards using KPIs.
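
For illustration, a minimal sketch of the Random Forest fraud-classification setup described above; the features and data are synthetic stand-ins, and the 85 percent figure comes from the actual project, not this toy example.

# Random Forest on an imbalanced, synthetic "fraud" dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=12,
                           weights=[0.95], random_state=1)  # rare fraud class
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1, stratify=y)

clf = RandomForestClassifier(n_estimators=200, random_state=1).fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, clf.predict(X_te)))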

Environment: Python, Pandas, PySpark, NLP, BERT, T5, GPT-3, ChatGPT, Prompt Engineering, MLflow, CI/CD
Pipelines, AWS, SageMaker, Docker, Kubernetes, ML

Client: Netzone Technologies Limited, India Apr 2015 – Oct 2017


Role: Data Scientist

Responsibilities:
 Developed a KNN model to choose which toys to sell, helping the business turn a profit.
 Used computer vision to inspect packaging before shipment: used OpenCV to analyse images of the
packaging and identify defects such as missing labels or damage, and used PyTorch to train a deep
learning model to classify images as defective or non-defective (see the defect-classifier sketch
following this list).
 Worked with the management team to create a prioritized list of needs for each business segment.
 Used PyTorch to train an ML model on a dataset of toy images and their corresponding labels.
 Compared sales and revenue impacts with the previous month's reports, which helped inform the right
decisions to improve the business.
 Designed and implemented scalable and cost-effective data storage solutions on AWS, utilizing services
such as Amazon S3 for efficient data storage and retrieval.
 Utilized AWS services like AWS Glue on Amazon EMR to pre-process and transform raw data into usable
formats, ensuring data quality and reliability.
 Developed and maintained end-to-end data pipelines on AWS, leveraging services like AWS Step Functions
for orchestrating complex workflows and automating data processing tasks.
 Gathered sales data for different commodities and regions from the database and prepared the data for
analysis based on requirements.
 Ran a diagnostic survey tool to measure and predict team performance.
 Extracted, compiled, and analysed data using Python, Pandas, NumPy, and Advanced Excel to build reports
and provide recommendations to clients to improve team performance.
 Generated ongoing reports of each active account as they are being consulted.
 Used advanced Excel functions to generate spreadsheets and pivot tables, leveraging cost data to assist
companies in controlling their expenses.
 Acquired data from both primary and secondary sources, maintained databases and data systems, and used
this information to analyze market product data, determining which products to sell and discontinue.
 Performed data cleaning by imputing missing values as needed, while also identifying and addressing data
quality issues such as duplicate removal and data standardization.
 Involved in client-facing activities where reports were presented to upper management and each team.
 Grouped the data based on requirements and performed summary statistics.
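
For illustration, a minimal PyTorch sketch of the defective/non-defective packaging classifier described above; the architecture, image shapes, and random data are assumptions for demonstration only.

# Toy CNN classifying packaging images as defective vs non-defective.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 64 -> 32
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 32 -> 16
    nn.Flatten(),
    nn.Linear(32 * 16 * 16, 2),                    # two classes
)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

images = torch.randn(8, 3, 64, 64)                 # fake packaging photos
labels = torch.randint(0, 2, (8,))
for _ in range(3):                                 # toy training loop
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()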

Environment: Machine Learning, R, Tableau, SQL, Python, MySQL, AWS, PyTorch, OpenCV, Computer Vision.

Client: Genex Technologies Pvt Ltd, Mumbai, India June 2013 – Mar 2015
Role: Data Analyst

Responsibilities:
 Responsible for providing data analysis that focuses on improving user experience and optimizing user
retention.
 Developed sophisticated ad hoc MySQL queries using correlated subqueries, window functions, and common
table expressions to track user activity metrics such as retention rate and daily active users (see the query
sketch following this list); enhanced existing queries to run faster on large datasets.
 Designed and maintained MySQL databases and created pipelines using user-defined functions and stored
procedures for reporting daily tasks.
 Observed and examined trends in user activity data, identified underperforming user segments, and
provided stakeholders with insights.
 Designed metrics for A/B testing, built Tableau dashboards to track test procedures, examined test findings,
explained them to stakeholders, and made recommendations.
 Performed statistical analyses such as hypothesis testing, causal inference, and Bayesian analysis.
 Built ETL pipelines to retrieve data from NoSQL databases and load aggregated data into the analytical
platform.
 Performed feature engineering and data preprocessing using Spark jobs.
 Managed data storage and processing using the big data tools Hadoop, HDFS, Hive, and Spark.
 Developed Python scripts to automate data validation and cleaning processes, such as deduplicating records
and checking data consistency, using Pandas and Apache Airflow (see the validation sketch following this list).
 Implemented scalable machine learning models (Random Forest, Linear Regression) using SparkML and
Python (see the SparkML sketch following this list).
 Applied advanced classification models such as XGBoost, SVM, and Neural Networks to training data using
Python packages such as Scikit-learn.
 Defined metrics to estimate the impact of new features and gave recommendations for business decisions
based on data analysis.
 Participated in data project planning, gathering business requirements, and translating them into technology
requirements.
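
For illustration, a sketch of the kind of retention query described above, using a CTE and a window function; it runs here against SQLite (3.25+) so it is self-contained, whereas the original work used MySQL, and all table and column names are hypothetical.

# CTE + LAG() window function to find each user's previous active day.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (user_id INT, event_date TEXT);
INSERT INTO events VALUES (1,'2024-01-01'),(1,'2024-01-02'),
                          (2,'2024-01-01'),(2,'2024-01-03');
""")
query = """
WITH daily AS (SELECT DISTINCT user_id, event_date FROM events)
SELECT user_id, event_date,
       LAG(event_date) OVER (PARTITION BY user_id ORDER BY event_date)
           AS prev_active_date
FROM daily ORDER BY user_id, event_date;
"""
for row in conn.execute(query):
    print(row)                                     # gaps indicate churn days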
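
A minimal Pandas sketch of the validation and cleaning steps described above (deduplication plus a simple consistency check); the data and column names are hypothetical, and the Apache Airflow scheduling wrapper is omitted.

# Deduplicate rows and flag inconsistent records before loading downstream.
import pandas as pd

df = pd.DataFrame({
    "user_id": [1, 1, 2, 3],
    "event_time": ["09:00", "09:00", "10:00", "11:00"],
    "amount": [10.0, 10.0, -5.0, 7.5],
})

df = df.drop_duplicates(subset=["user_id", "event_time"])  # remove exact repeats

bad = df[df["amount"].isna() | (df["amount"] < 0)]  # consistency check
if not bad.empty:
    print(f"flagged {len(bad)} inconsistent rows")  # would alert or quarantine
df = df.drop(bad.index)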
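
And a sketch of a SparkML Random Forest of the kind described above; the tiny DataFrame and column names are illustrative assumptions.

# Assemble features and fit a Random Forest with pyspark.ml.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import RandomForestClassifier

spark = SparkSession.builder.appName("rf-sketch").getOrCreate()
df = spark.createDataFrame(
    [(0.0, 1.2, 3.4), (1.0, 0.1, 0.5), (0.0, 2.2, 1.1)],
    ["label", "f1", "f2"])

assembler = VectorAssembler(inputCols=["f1", "f2"], outputCol="features")
rf = RandomForestClassifier(labelCol="label", featuresCol="features", numTrees=50)
model = rf.fit(assembler.transform(df))
model.transform(assembler.transform(df)).select("label", "prediction").show()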

Environment: Python, Facebook Prophet, SQL, Matplotlib, AWS, MS Office Suite.
