Business Intelligence Lab Report: Ms. A.Lalitha Registration No.: 15381033
Submitted By
Ms. A.LALITHA
Registration No. : 15381033
NAME: A. LALITHA
REG. NO.: 15381033
SUBJECT: BUSINESS INTELLIGENCE LAB
Project In-charge
Head of the Department
Submitted for the Viva-voce Examination
INTERNAL EXAMINER
EXTERNAL EXAMINER
1. Introduction:
1.2. Data Warehousing
A data warehouse (DW) is a database used for reporting. Data is offloaded from the operational systems and may pass through an operational data store for additional operations before it is used in the DW for reporting. A data warehouse maintains its functions in three layers: staging, integration, and access. The staging layer stores raw data for use by developers (analysis and support), the integration layer integrates the data and provides a level of abstraction from users, and the access layer gets data out for end users.
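As a minimal T-SQL sketch of these three layers (the table, view, and column names here are illustrative assumptions, not part of the lab schema), the structure could look like this:

-- Staging layer: raw rows exactly as extracted from the source file.
CREATE TABLE dbo.Staging_LoanRecords (
    RawLine NVARCHAR(4000)                  -- unparsed source row
);

-- Integration layer: cleaned, typed and conformed data.
CREATE TABLE dbo.Integration_LoanRecords (
    LoanId       INT NOT NULL,
    CustomerName NVARCHAR(100),
    Amount       DECIMAL(18, 2),
    LoadDate     DATETIME DEFAULT GETDATE()
);
GO

-- Access layer: a view that exposes only what report users need.
CREATE VIEW dbo.vw_LoanSummary AS
SELECT LoanId, CustomerName, Amount
FROM dbo.Integration_LoanRecords;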
This definition of the data warehouse focuses on data storage. The main source of the data is cleaned, transformed, catalogued and made available for use by managers and other business professionals for data mining, online analytical processing, market research and decision support (Marakas & O'Brien, 2009). However, the means to retrieve and analyze data, to extract, transform and load data, and to manage the data dictionary are also considered essential components of a data warehousing system. Many references to data warehousing use this broader context. Thus, an expanded definition for data warehousing includes business intelligence tools, tools to extract, transform and load data into the repository, and tools to manage and retrieve metadata.
2. Personal Loans
Personal loans are unsecured loans which people can use for a variety of
purposes, such as paying tax bills, covering school tuition, or making car repairs.
Many banks and other lenders offer personal loans to people with good credit
records who can demonstrate an ability to repay them. This type of loan is often
touted as a useful tool for consolidating debt, for people who have multiple
outstanding accounts which are difficult to manage. By using a single loan to
pay off debt, people can consolidate their debt into one monthly payment, and
they may also achieve a lower interest rate, which is a distinct benefit.
Consolidating debt also tends to increase one's credit rating.
There are two types of personal loans. A closed-end loan is a one-time loan of a set amount, with a fixed rate and repayment schedule. This type of loan often has a repayment period of one to two years, depending on the amount borrowed, and borrowers can choose to make additional payments to pay the loan off more quickly. For one-time expenses, a closed-end loan can be very useful. An open-end loan, by contrast, works as a revolving line of credit that can be drawn on and repaid repeatedly.
3. Problem Definition
The objective is to perform Extract, Transform and Load (ETL) operations on a set of input files containing the details of personal loans of a particular bank. Each input file is in a specific format, such as XML, txt or csv. The first part of an ETL process involves extracting the data from these sources and carrying out transformations on the data.
The load phase loads the data into the end target, usually the data
warehouse (DW). As the load phase interacts with a database, the constraints
defined in the database schema as well as in triggers activated upon data
load apply (for example, uniqueness, referential integrity, mandatory fields),
which also contribute to the overall data quality performance of the ETL process.
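As a hedged illustration of such constraints (the table name Allocations_Target and its columns are assumptions for this sketch; only dbo.Customers comes from the schema used in the later queries), the target table could declare them like this:

-- Constraints checked automatically when the load phase inserts rows.
CREATE TABLE dbo.Allocations_Target (
    AllocationId INT            NOT NULL PRIMARY KEY,   -- uniqueness
    CustomerId   INT            NOT NULL,               -- mandatory field
    Amount       DECIMAL(18, 2) NOT NULL CHECK (Amount > 0),
    CONSTRAINT FK_AllocTarget_Customer FOREIGN KEY (CustomerId)
        REFERENCES dbo.Customers (CustomerId)            -- referential integrity
);

Rows that violate any of these constraints are rejected during the load, which is one way the load phase contributes to overall data quality.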
3. Click on the Data Flow Task and switch to the Data Flow tab.
4. From the Toolbox, go to Data Flow Sources and add a Flat File Source, from the Data Flow Transformations add a Derived Column, and from the Data Flow Destinations add an OLE DB Destination (data flow diagram given below).
Note: use the green arrows to do the mapping and the red arrows for the reject rows.
5. Double click the Flat File Source and click the New button and browse
for the File name (figure shown below).
7. Switch to the Advanced tab and change the column names as desired. Check the Preview tab to see whether the data is in the expected format (figure shown below). Click OK.
8. Double click on the Derived Column and expand the Columns tree. Drag and drop all the items to the Derived Column Name column in the lower window. Make sure to choose Replace for the appropriate columns and click OK (figure shown below). This is the stage where any transformations can be carried out using the functions given in the right pane.
9. Double click the OLE DB Destination and press New to create a new
connection to the database.
10. Choose the Data access mode as Table or view - fast load and uncheck the Table lock constraint. If the table has not yet been created from SQL Server Management Studio, create a new table by clicking the New button next to Name of the table or the view (figure shown below); a sketch of such a statement is given after this list.
11. Go to the Mappings page and map the input columns to the appropriate columns in the table.
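Clicking New in the OLE DB Destination opens a Create Table dialog with a generated CREATE TABLE statement that can be edited before it runs. A sketch of what it might look like for this loan data is shown below; the column names are assumptions for illustration, not the actual generated statement.

CREATE TABLE [PersonalLoans] (
    [LoanId]       INT,
    [CustomerName] NVARCHAR(50),
    [Amount]       DECIMAL(18, 2),
    [Period]       INT
);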
When the project is run, all the flows are executed in one go and the details are loaded into the database (figure given below). When the stages are shown in green, the loading has completed successfully.
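Beyond the green status of each stage, the load can also be confirmed from SQL Server Management Studio with a simple row count; the table name below follows the hypothetical sketch above.

-- Quick sanity check after the SSIS package completes.
SELECT COUNT(*) AS RowsLoaded
FROM dbo.PersonalLoans;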
5. Next, right click on Cubes in the Solution Explorer and select New Cube. In the Cube Wizard, select the Use existing tables option and choose the tables that need to be imported to form the cube. Below is the figure depicting the cube structure.
6. Select the fields that need to be chosen for querying. Apply filters, sorting or group by if necessary. The query will be built automatically in the pane below. Click OK.
Here are the queries executed:
Query 1: BASED ON DATE CREATION AND THE NAME
SELECT Customers.DateCreated, Employees.CompanyName
FROM Allocations
INNER JOIN Customers ON Allocations.CustomerId = Customers.CustomerId
INNER JOIN Employees ON Allocations.EmployeeId = Employees.EmployeeId
INNER JOIN Payments ON Allocations.AllocationId = Payments.AllocationId
WHERE (Customers.DateCreated > '2010-12-01')
SELECT dbo.Employees.CompanyName,
       SUM(dbo.Allocations.Amount) AS [Total Loan Amount],
       SUM(dbo.Allocations.Emi) * dbo.Allocations.Period AS [Total Emi To Pay],
       SUM(DISTINCT dbo.Payments.PayAmount) AS [Amount Repaid],
       dbo.Allocations.InterestRate, dbo.Allocations.Period
FROM dbo.Allocations
INNER JOIN dbo.Employees ON dbo.Allocations.EmployeeId = dbo.Employees.EmployeeId
INNER JOIN dbo.Payments ON dbo.Allocations.AllocationId = dbo.Payments.AllocationId
GROUP BY dbo.Employees.CompanyName, dbo.Allocations.InterestRate, dbo.Allocations.Period
ORDER BY dbo.Employees.CompanyName, COUNT(DISTINCT dbo.Payments.PaymentDate)
SELECT dbo.Customers.Gender,
       SUM(dbo.Allocations.Amount) AS [Total Loan Amount Sanctioned],
       SUM(dbo.Allocations.EMI * dbo.Allocations.Period) AS [Total Amount To Be Paid],
       SUM(dbo.Payments.PayAmount) AS [Total Amount Repaid],
       dbo.Employees.CompanyName,
       SUM(dbo.Allocations.EMI * dbo.Allocations.Period) - SUM(dbo.Payments.PayAmount) AS [Total Payments Remaining]
FROM dbo.Allocations
INNER JOIN dbo.Customers ON dbo.Allocations.CustomerId = dbo.Customers.CustomerId
INNER JOIN dbo.Payments ON dbo.Allocations.AllocationId = dbo.Payments.AllocationId
INNER JOIN dbo.Employees ON dbo.Allocations.EmployeeId = dbo.Employees.EmployeeId
GROUP BY dbo.Customers.Gender, dbo.Employees.CompanyName
SELECT dbo.Customers.Address,
       SUM(dbo.Allocations.InterestRate) AS [Total Loan Allotted],
       SUM(dbo.Payments.PayAmount * dbo.Allocations.Period) AS [Total Payments Received],
       dbo.Employees.CompanyName,
       SUM(DISTINCT dbo.Payments.PayAmount * dbo.Allocations.Period)
           - SUM(DISTINCT dbo.Allocations.InterestRate) AS [Interest Paid]
FROM dbo.Customers
INNER JOIN dbo.Allocations ON dbo.Customers.CustomerId = dbo.Allocations.CustomerId
INNER JOIN dbo.Payments ON dbo.Customers.CustomerId = dbo.Payments.CustomerId
INNER JOIN dbo.Employees ON dbo.Customers.EmployeeId = dbo.Employees.EmployeeId
GROUP BY dbo.Customers.Address, dbo.Employees.CompanyName
CASE ANALYSIS
Case:
5. Usage of Banking Services
In day-to-day life we come across many banking services, such as customers using ATMs, debit cards, credit cards, current accounts, loan accounts, etc. By integrating on the account number we can find out whether a customer uses these services daily, weekly, monthly, twice a month, and so on. At the same time, by integrating the loan account with the customer details we learn how many customers use debit cards, how many use Internet banking, how many use mobile banking, how many of them have loans, and how many customers repaid correctly or did not repay. By integrating this data we can assess each customer's loyalty towards the bank and determine which banking services are most widely used; a sketch of such an integration query is given below.
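The sketch below is only an illustration; dbo.Transactions and its columns are hypothetical names for a service-usage table, not objects from the lab database.

-- How often each customer uses the bank's services, derived by
-- integrating usage events on the account/customer number.
SELECT CustomerId,
       COUNT(*)                              AS TotalTransactions,
       COUNT(DISTINCT CAST(TxnDate AS DATE)) AS ActiveDays,   -- daily/weekly/monthly usage
       MIN(TxnDate)                          AS FirstUse,
       MAX(TxnDate)                          AS LastUse
FROM dbo.Transactions        -- hypothetical table of service usage events
GROUP BY CustomerId;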
6. Stage I: Building the Warehouse
For the ETL process the inputs are the following files; the structure of each input file is given below (a sketch of corresponding staging tables follows the list):
1. BankDetails.csv (contains the bank name and loan id)
2. ProductDetails.txt (contains the details of loan amount and balance)
3. Region.txt (contains the details of the services offered by the bank)
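A sketch of staging tables for these three files is given below; the exact column names and types are assumptions based on the descriptions above.

-- One staging table per input file, loaded by the SSIS flat file sources.
CREATE TABLE dbo.Staging_BankDetails (
    BankName NVARCHAR(100),
    LoanId   INT
);

CREATE TABLE dbo.Staging_ProductDetails (
    LoanId     INT,
    LoanAmount DECIMAL(18, 2),
    Balance    DECIMAL(18, 2)
);

CREATE TABLE dbo.Staging_Region (
    Region  NVARCHAR(100),
    Service NVARCHAR(100)    -- services offered by the bank
);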
Data flow process for Banking
Extracting data from the database
Defining the columns
After the integration process, select New Project and choose an Analysis Services project.
Open the Data Source Wizard; when the Impersonation Information dialog box opens, choose Use the service account.
After creating the data source, create a cube using existing tables. The process of creating cubes is shown below.
After generating the cube, go to a new project and select Report Server Project.
In the data source, choose Microsoft SQL Server (SqlClient), set the server name to (local), and test the connection. If the test connection is successful, go to the next step.
In the Query Builder, add the existing tables and establish a logical relationship between them. Click OK, then choose the report type as Tabular and click Next.
Click Preview Report; when it finishes, the query will have executed successfully.
USE sample
SELECT Status, Product, State, LoanId
FROM ProductDetails
WHERE LoanId >= 25000
Data Mining
Introduction to Weka, the Data Mining Tool
It consists of the following:
1. Explorer
2. Experimenter
3. Knowledge Flow
4. Workbench
For the above case, we mine how many of the customers in each state use ATM cards, credit cards, debit cards, Internet banking, etc., using the Weka tool; a sketch of a query that could produce the input CSV is given after these steps.
1. To start preprocessing, convert the CSV file to an ARFF file and define the class attribute.
2. The Explorer can also read CSV files directly by selecting CSV as the file type in the file selection menu.
3. Select Preprocess, click Open file, and select the file type to open the file.
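The CSV file handed to Weka could, for example, be produced from the integrated warehouse with a query of the following shape; dbo.CustomerServiceUsage and its columns are hypothetical names used only for this sketch.

-- One row per customer: state plus the service-usage flags to be mined,
-- with the last column acting as the class attribute in Weka.
SELECT State,
       UsesAtmCard,
       UsesCreditCard,
       UsesDebitCard,
       UsesInternetBanking,
       LoanRepaidOnTime          -- class attribute
FROM dbo.CustomerServiceUsage;   -- hypothetical integrated table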
By clicking the various attributes we can also see a graphical representation of the data, and we can fill in missing values in the table fields.
4. We can mine or classify the database using various algorithms; here we have chosen the Naïve Bayes algorithm.
Select the Classifier tab.
Select the appropriate test options; here Cross-validation is chosen.
After choosing, click Start; the result will be displayed after a short delay.
We can use the result view to:
o Load / save models
o Save the results buffer
5. After classifying, we can cluster our database and visualize the clusters.
Scattered clustering
We can access the various clustering algorithms from the Cluster tab.
We can change the parameters to suit our needs.
Here the k-means clustering algorithm is chosen with k = 3.
We can visualize the data, easily change the attributes plotted against each axis, and adjust the jitter (noise) as well.
Here is a general cluster view, in which we can see the scattered clusters across various attributes.
Linear Clustering