DWDM Final Lab Syllabus

Data Warehousing & Data Mining Lab

Course Objectives:
The main objectives of the course are to
1. Inculcate conceptual, logical, and physical design of data warehouses, OLAP applications, and OLAP deployment.
2. Emphasize hands-on experience working with real data sets.
3. Test real data sets using popular data mining Python libraries.
4. Develop the ability to design various algorithms based on data mining tools.
Course Outcomes:
By the end of the course, the student will be able to
● Pre-process the raw data provided by the client for best results.
● Design a data warehouse for any organization and apply operations on it.
● Extract knowledge using data mining techniques and list the various algorithms used in information analysis.
● Implement and analyse a knowledge flow application on data sets, and apply suitable visualization techniques to present analytical results.

Software Requirements: Python, Pentaho / Microsoft SSIS / Informatica

1. Demonstrate the following data preprocessing tasks using Python libraries.
a. Loading the dataset
b. Dealing with missing data
2. Demonstrate the following data preprocessing tasks using Python libraries.
a. Dealing with categorical data
b. Scaling the features
c. Splitting dataset into Training and Testing Sets
3. Demonstrate the following similarity and dissimilarity measures using Python.
a. Pearson’s Correlation
b. Cosine Similarity
c. Jaccard Similarity
d. Euclidean Distance
e. Manhattan Distance
4. Creation of a Data Warehouse
a. Build a Data Warehouse/Data Mart (using open-source tools like Pentaho Data
Integration Tool, Pentaho Business Analytics; or other data warehouse tools like
Microsoft SSIS, Informatica, Business Objects, etc.).
b. Design multi-dimensional data models, namely Star, Snowflake and Fact
Constellation schemas, for any one enterprise (e.g., banking, insurance, finance,
healthcare, manufacturing, automobiles, sales, etc.).
c. Write ETL scripts and implement them using data warehouse tools.
d. Perform various OLAP operations such as slice, dice, roll-up, drill-down and pivot.
5. Build a model using linear regression algorithm on any dataset.
6. Build a classification model using the Decision Tree algorithm on the Iris dataset.
7. Apply the Naïve Bayes classification algorithm on any dataset.
8. Generate frequent itemsets using the Apriori algorithm in Python and also generate association
rules for any market basket data.
9. Apply the K-Means clustering algorithm on any dataset.
10. Apply Hierarchical Clustering algorithm on any dataset.
11. Apply DBSCAN clustering algorithm on any dataset.
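
For experiment 3 (similarity and dissimilarity measures), the five measures can be sketched directly from their textbook definitions. This is a minimal, dependency-free illustration and not a prescribed solution; the function names are hypothetical, the vectors are assumed to be equal-length and non-constant, and the sets passed to the Jaccard function are assumed to be non-empty.

```python
import math

def pearson(x, y):
    """Pearson's correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (dev_x * dev_y)  # undefined if either vector is constant

def cosine_similarity(x, y):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

def jaccard_similarity(a, b):
    """Size of the intersection divided by size of the union of two sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def euclidean_distance(x, y):
    """Straight-line (L2) distance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan_distance(x, y):
    """City-block (L1) distance."""
    return sum(abs(a - b) for a, b in zip(x, y))

if __name__ == "__main__":
    u, v = [1, 2, 3], [2, 4, 6]
    print(pearson(u, v), cosine_similarity(u, v))
    print(jaccard_similarity({"a", "b", "c"}, {"b", "c", "d"}))
    print(euclidean_distance([0, 0], [3, 4]), manhattan_distance([0, 0], [3, 4]))
```

In the lab itself, library routines (for example from NumPy or SciPy) would normally replace these hand-written functions; writing them out once makes it easier to check what the library calls return.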
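
For experiment 8 (Apriori and association rules), the level-wise search can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions: the function names `apriori` and `association_rules` are hypothetical and are not any library's API, the five-transaction basket is made-up toy data, and the thresholds are arbitrary.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset (frozenset): support} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset.
        return sum(itemset <= t for t in transactions) / n

    def keep_frequent(candidates):
        scored = {c: support(c) for c in candidates}
        return {c: s for c, s in scored.items() if s >= min_support}

    # Level 1: frequent single items.
    current = keep_frequent({frozenset([i]) for t in transactions for i in t})
    frequent, k = {}, 1
    while current:
        frequent.update(current)
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        current = keep_frequent(candidates)
    return frequent

def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence) tuples."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                # Subsets of a frequent itemset are frequent, so the lookup succeeds.
                confidence = sup / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, sup, confidence))
    return rules

# Toy market-basket data (made up for illustration).
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
frequent_itemsets = apriori(baskets, min_support=0.6)
rules = association_rules(frequent_itemsets, min_confidence=0.9)

if __name__ == "__main__":
    for antecedent, consequent, sup, conf in rules:
        print(sorted(antecedent), "->", sorted(consequent), sup, conf)
```

The sketch recounts support by scanning all transactions for every candidate, which is fine for a toy basket but slow on real data; a library implementation would be the usual choice for the actual lab exercise.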
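
For experiment 11 (DBSCAN), the core idea of growing clusters from dense points can be sketched without any library. This is a simplified illustration, not a reference implementation: the function name `dbscan` and the parameters `eps` and `min_pts` are hypothetical names chosen here, neighbours are found by brute force, a point counts as its own neighbour, and the nine sample points are made up. It relies on `math.dist`, which needs Python 3.8 or later.

```python
import math

NOISE = -1

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, NOISE for outliers."""
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbours(i):
        # Brute-force region query; includes point i itself.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1  # point i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:  # j is also a core point: keep growing
                queue.extend(reachable)
    return labels

# Two small dense groups and one isolated point (made-up data).
sample = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
sample_labels = dbscan(sample, eps=1.5, min_pts=3)

if __name__ == "__main__":
    print(sample_labels)
```

Each region query scans every point, so the sketch is quadratic in the number of points; library implementations typically use a spatial index to speed this up.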

References:
1. https://analyticsindiamag.com/data-pre-processing-in-python/
2. https://towardsdatascience.com/decision-tree-in-python-b433ae57fb93
3. https://towardsdatascience.com/calculate-similarity-the-most-relevant-metrics-in-a-nutshell-9a43564f533e
4. https://www.springboard.com/blog/data-mining-python-tutorial/
5. https://medium.com/analytics-vidhya/association-analysis-in-python-2b955d0180c
6. https://www.datacamp.com/community/tutorials/naive-bayes-scikit-learn
7. https://www.analyticsvidhya.com/blog/2019/05/beginners-guide-hierarchical-clustering/
8. https://towardsdatascience.com/dbscan-algorithm-complete-guide-and-application-with-python-scikit-learn-d690cbae4c5d
