nnml_ml3
Jan T Kim
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
$$x_s = \frac{1}{\sigma_x}\, x$$
◮ Should be applied to elements of same dimension.
◮ Typically applied to columns (features) of data matrices.
◮ Normally not applicable to rows.
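A minimal sketch of centering and scaling columns in R. The matrix `m` and its values are made up for illustration; `sweep()`, `apply()` and `scale()` are base R functions.

```r
# Hypothetical data: 5 observations (rows), 2 features (columns).
m <- cbind(height = c(150, 160, 170, 180, 190),
           weight = c(50, 65, 60, 80, 95))

# Manual version: subtract each column mean, then divide by each column sd.
centered <- sweep(m, 2, colMeans(m), "-")
standardized <- sweep(centered, 2, apply(m, 2, sd), "/")

# Base R equivalent: scale() operates column by column.
standardized2 <- scale(m, center = TRUE, scale = TRUE)

print(round(colMeans(standardized), 10))  # column means after centering
print(apply(standardized, 2, sd))         # column standard deviations after scaling
```

Both operations act per column, which is why they are appropriate for features measured in different units but usually not for rows.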
[Figure: series of scatter plots of y against x; first panel: "original data"]
    x1    x2     x3     x4
1    1  0.41   0.00   1.00
2    2  0.86   0.95   0.31
3    3  1.10   0.59  -0.81
4    4  1.55  -0.59  -0.81
5    5  1.90  -0.95   0.31
plot(dataFrame);
[Figure: scatter plot matrix of x1, x2, x3 and x4]
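A short sketch of how the data frame behind this plot could be set up. The values are typed in from the table; the names `dataFrame` and `matrix2D` follow the slides, but how the author actually built them is not shown, so this construction is an assumption.

```r
# Values copied from the table of x1..x4 (rows 1 to 5).
dataFrame <- data.frame(
  x1 = 1:5,
  x2 = c(0.41, 0.86, 1.10, 1.55, 1.90),
  x3 = c(0.00, 0.95, 0.59, -0.59, -0.95),
  x4 = c(1.00, 0.31, -0.81, -0.81, 0.31)
)

# For a data frame, plot() draws one scatter plot per pair of columns.
plot(dataFrame)

# The same values as a numeric matrix, e.g. for image() or dist().
matrix2D <- as.matrix(dataFrame)
print(dim(matrix2D))
```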
[Figure: scatter plots of a three-dimensional data set with axes x, y and z, shown pairwise and from several viewing angles]
◮ Covariance:
$$\operatorname{cov}(x, y) = \sigma_{xy} = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$
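The covariance formula can be checked directly against R's built-in `cov()`. This is a small sketch; the vectors reuse the x1 and x2 columns of the example table, and the correlation line is added only to connect covariance to its scaled form.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(0.41, 0.86, 1.10, 1.55, 1.90)
n <- length(x)

# Sample covariance written out term by term (note the n - 1 denominator).
cov_manual <- sum((x - mean(x)) * (y - mean(y))) / (n - 1)
cov_builtin <- cov(x, y)
print(c(manual = cov_manual, builtin = cov_builtin))

# Correlation is the covariance divided by both standard deviations.
r <- cov_manual / (sd(x) * sd(y))
print(r)
```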
[Figure: correlation examples, https://en.wikipedia.org/wiki/File:Correlation_examples2.svg]
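$$P = \begin{pmatrix} -1 & -1 \\ -0.5 & -1 \\ 0 & -1 \\ \vdots & \vdots \\ 0.5 & 0 \\ \vdots & \vdots \\ 0.5 & 1 \\ 1 & 1 \end{pmatrix}$$
[Figure: scatter plot of the points P, y against x]
$$S = \begin{pmatrix} 3 & 0 \\ 0 & 0.8 \end{pmatrix}$$
Compute and plot
$$PS = \begin{pmatrix} \vdots & \vdots \\ 1.5 & 0 \\ \vdots & \vdots \\ 3 & 0.8 \end{pmatrix}$$
[Figure: scatter plot of the points PS, y against x]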
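A sketch of the "compute and plot" step for PS. It assumes that P is the 5 x 5 grid of points with coordinates -1, -0.5, 0, 0.5, 1 in both directions; this matches the rows of P that are shown, but the full matrix is not given in the slides.

```r
# Assumed grid of points; expand.grid() lets x vary fastest, as in the rows of P shown.
P <- as.matrix(expand.grid(x = seq(-1, 1, by = 0.5),
                           y = seq(-1, 1, by = 0.5)))

# Scaling matrix from the slide: stretch x by 3, shrink y to 0.8.
S <- matrix(c(3, 0,
              0, 0.8), nrow = 2, byrow = TRUE)

# Points are rows, so the transformation is applied from the right.
PS <- P %*% S

plot(PS, xlim = c(-3, 3), ylim = c(-3, 3), asp = 1, xlab = "x", ylab = "y")
print(PS[c(14, 25), ])   # the points (0.5, 0) and (1, 1) after scaling
```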
Jan T Kim, PCA & Hierarchical Clustering
Eigenvalues and Eigenvectors
$$R = \begin{pmatrix} \cos(\pi/6) & \sin(\pi/6) \\ -\sin(\pi/6) & \cos(\pi/6) \end{pmatrix} \approx \begin{pmatrix} 0.87 & 0.5 \\ -0.5 & 0.87 \end{pmatrix}$$
Compute and plot
$$PSR = \begin{pmatrix} \vdots & \vdots \\ 1.3 & 0.7 \\ \vdots & \vdots \\ 2.2 & 2.2 \end{pmatrix}$$
[Figure: scatter plot of the points PSR, y against x]
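A sketch of the rotation step. As before, the 5 x 5 grid for P is an assumption based on the rows shown; S and R are taken from the slides.

```r
# Assumed grid of points (x varies fastest).
P <- as.matrix(expand.grid(x = seq(-1, 1, by = 0.5),
                           y = seq(-1, 1, by = 0.5)))
S <- diag(c(3, 0.8))

# Rotation by pi/6, written for points stored as rows.
theta <- pi / 6
R <- matrix(c( cos(theta), sin(theta),
              -sin(theta), cos(theta)), nrow = 2, byrow = TRUE)

PSR <- P %*% S %*% R

plot(PSR, xlim = c(-3, 3), ylim = c(-3, 3), asp = 1, xlab = "x", ylab = "y")
print(round(PSR[c(14, 25), ], 2))   # images of the points (0.5, 0) and (1, 1)
```

Because R is a rotation, its inverse is its transpose, which is used later when the data are rotated back onto the principal axes.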
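Symmetric matrix: $A = A^T$
◮ Let $\lambda_i, \lambda_j$ be eigenvalues, $\lambda_i \neq \lambda_j$,
◮ $e_i, e_j$ corresponding eigenvectors.
$$\begin{aligned}
\lambda_i (e_i \cdot e_j) &= (A e_i) \cdot e_j \\
&= (e_i^T A^T) \cdot e_j \\
&= (e_i^T A) e_j \\
&= e_i^T (A e_j) \\
&= e_i^T (\lambda_j e_j) \\
&= \lambda_j (e_i \cdot e_j)
\end{aligned}$$
◮ Since $\lambda_i \neq \lambda_j$, this is only possible if $e_i \cdot e_j = 0$: eigenvectors belonging to distinct eigenvalues are orthogonal.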
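The orthogonality result can be checked numerically with base R's `eigen()`. The matrix `A` below is a hypothetical symmetric matrix chosen only for illustration.

```r
# Hypothetical symmetric matrix with three distinct eigenvalues.
A <- matrix(c(2, 1, 0,
              1, 3, 1,
              0, 1, 4), nrow = 3, byrow = TRUE)
stopifnot(isSymmetric(A))

e <- eigen(A)
print(e$values)

# Entry (i, j) of G is the dot product e_i . e_j of two eigenvectors.
G <- t(e$vectors) %*% e$vectors
print(round(G, 10))   # off-diagonal entries are the dot products of distinct eigenvectors

# Eigenvalue equation for the first pair: A e_1 = lambda_1 e_1.
residual <- A %*% e$vectors[, 1] - e$values[1] * e$vectors[, 1]
print(max(abs(residual)))
```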
$$\lambda_1 \geq \lambda_2 \geq \ldots \geq \lambda_d$$
$$\sum_i \lambda_i = \operatorname{trace}(\boldsymbol{\Sigma})$$
◮ Covariances:
  ◮ generally nonzero in original data: $\operatorname{cov}(x_{*,i}, x_{*,j}) \neq 0$
  ◮ principal components are decorrelated: $\operatorname{cov}(p_{*,i}, p_{*,j}) = 0 \;\; \forall i \neq j$
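A sketch that checks both properties on simulated data: the eigenvalues of the covariance matrix sum to its trace, and the principal components have zero covariance with each other. The data and the mixing matrix are hypothetical.

```r
set.seed(1)
# Hypothetical correlated data: 200 observations, 3 columns.
z <- matrix(rnorm(600), ncol = 3)
X <- z %*% matrix(c(2, 1, 0,
                    0, 1, 1,
                    0, 0, 0.5), nrow = 3, byrow = TRUE)

Sigma <- cov(X)
e <- eigen(Sigma)        # eigen() returns eigenvalues in decreasing order
lambda <- e$values
U <- e$vectors

# Sum of eigenvalues versus trace of the covariance matrix.
print(c(sum_lambda = sum(lambda), trace = sum(diag(Sigma))))

# Principal components: centered data expressed in the eigenvector basis.
PC <- scale(X, center = TRUE, scale = FALSE) %*% U
print(round(cov(PC), 10))   # diagonal holds the eigenvalues
```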
$$\frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{d} \lambda_i}$$
◮ Maximum variance that can be preserved in k < d dimensions.
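A small sketch of how this ratio can guide the choice of k. The eigenvalues and the 90 % threshold are hypothetical values chosen for illustration.

```r
# Hypothetical eigenvalues, sorted in decreasing order.
lambda <- c(6.5, 2.0, 1.0, 0.4, 0.1)

# Fraction of total variance preserved by the first k principal components.
explained <- cumsum(lambda) / sum(lambda)
print(round(explained, 2))

# Smallest k that preserves at least 90 % of the variance (threshold is arbitrary).
k <- which(explained >= 0.9)[1]
print(k)
```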
◮ Pre-processing: Consider
◮ centering columns,
◮ scaling columns to unit variance.
Built into many PCA implementations, check documentation.
◮ Calculate the covariance matrix.
◮ Compute the eigenvectors and eigenvalues of the covariance
matrix.
◮ Select k principal components
◮ Reduce dimensions:
◮ Transform data to PC space.
◮ Project to selected PCs.
◮ Transform back to original space.
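The steps above can be sketched in base R as follows. The data are simulated and the choice k = 2 is arbitrary; `prcomp()` is used only as a cross-check and centers the columns by default.

```r
set.seed(42)
# Hypothetical data: 100 observations, 3 correlated features.
X <- matrix(rnorm(300), ncol = 3) %*% matrix(c(3, 1, 0,
                                               0, 1, 0.5,
                                               0, 0, 0.2), nrow = 3, byrow = TRUE)

# Pre-processing: center the columns (scaling to unit variance is optional).
mu <- colMeans(X)
Xc <- sweep(X, 2, mu, "-")

# Covariance matrix and its eigenvectors and eigenvalues.
e <- eigen(cov(Xc))
U <- e$vectors

# Select k principal components.
k <- 2
Uk <- U[, 1:k, drop = FALSE]

# Transform to PC space, keeping only the selected PCs.
scores <- Xc %*% Uk

# Transform back to the original space.
Xhat <- sweep(scores %*% t(Uk), 2, mu, "+")

# Cross-check the variances against prcomp().
p <- prcomp(X)
print(rbind(eigen = e$values, prcomp = p$sdev^2))
print(sum((X - Xhat)^2) / (nrow(X) - 1))   # variance lost by dropping PC3
```

The variance lost in the reconstruction equals the eigenvalue of the discarded component, which ties this procedure back to the ratio of eigenvalue sums.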
Data: X = PSR
[Figure: scatter plot of the data X, y against x]
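PCA results:
◮ Eigenvalues: $\lambda_1 = 4.7$, $\lambda_2 = 0.3$, scaling matrix
$$L = \begin{pmatrix} \frac{1}{\sqrt{\lambda_1}} & 0 \\ 0 & \frac{1}{\sqrt{\lambda_2}} \end{pmatrix} = \begin{pmatrix} 0.46 & 0 \\ 0 & 1.73 \end{pmatrix}$$
◮ Eigenvectors: $u_1 = \begin{pmatrix} 0.87 \\ 0.5 \end{pmatrix}$, $u_2 = \begin{pmatrix} -0.5 \\ 0.87 \end{pmatrix}$, combined to matrix
$$U = \begin{pmatrix} 0.87 & -0.5 \\ 0.5 & 0.87 \end{pmatrix}$$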
[Figure: scatter plot of X, multiplied by U, gives the scatter plot of XU]
[Figure: scatter plot of XU, multiplied by L, gives the scatter plot of XUL]
$$U = R^{-1}$$
$$L = \text{const} \cdot S^{-1}$$
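A sketch that recomputes U and L for X = PSR and compares them with R and S. It assumes P is the 5 x 5 grid from -1 to 1 in steps of 0.5, which is consistent with the rows of P shown in the slides but not stated there. Eigenvector signs are arbitrary, so the comparison with the inverse of R uses absolute values.

```r
# Assumed grid of points; S and R as given in the slides.
P <- as.matrix(expand.grid(x = seq(-1, 1, by = 0.5),
                           y = seq(-1, 1, by = 0.5)))
S <- diag(c(3, 0.8))
R <- matrix(c( cos(pi / 6), sin(pi / 6),
              -sin(pi / 6), cos(pi / 6)), nrow = 2, byrow = TRUE)
X <- P %*% S %*% R

e <- eigen(cov(X))
U <- e$vectors
L <- diag(1 / sqrt(e$values))

print(round(e$values, 1))   # compare with lambda_1 = 4.7 and lambda_2 = 0.3
print(round(diag(L), 2))    # compare with 0.46 and 1.73

# U undoes the rotation (up to the sign of each eigenvector).
print(round(abs(U) - abs(solve(R)), 10))

# L is proportional to the inverse of S: both ratios are the same constant.
print(diag(L) / diag(solve(S)))

# After rotating and rescaling, the data have unit variance and no covariance.
XUL <- X %*% U %*% L
print(round(cov(XUL), 10))
```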
$$S' = \begin{pmatrix} \sqrt{\lambda_1} & 0 \\ 0 & 0 \end{pmatrix}, \quad R = U^{-1} = \begin{pmatrix} 0.87 & 0.5 \\ -0.5 & 0.87 \end{pmatrix}$$
[Figure: scatter plot of $XUL$, y against x]
[Figure: scatter plot of $XULS'$, y against x]
[Figure: scatter plot of $XULS'U^{-1}$, y against x]
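A sketch of the full chain of multiplications ending in XULS′U⁻¹, again assuming the 5 x 5 grid for P. The variable names `Sprime` and `Xred` are introduced here for illustration only.

```r
# Assumed grid of points; S and R as given in the slides.
P <- as.matrix(expand.grid(x = seq(-1, 1, by = 0.5),
                           y = seq(-1, 1, by = 0.5)))
S <- diag(c(3, 0.8))
R <- matrix(c( cos(pi / 6), sin(pi / 6),
              -sin(pi / 6), cos(pi / 6)), nrow = 2, byrow = TRUE)
X <- P %*% S %*% R

e <- eigen(cov(X))
U <- e$vectors
L <- diag(1 / sqrt(e$values))

# S' restores the scale of PC1 and sets PC2 to zero.
Sprime <- diag(c(sqrt(e$values[1]), 0))

# Rotate to PC space, rescale, keep PC1 only, rotate back to original coordinates.
Xred <- X %*% U %*% L %*% Sprime %*% solve(U)

plot(X, asp = 1, xlab = "x", ylab = "y")
points(Xred, pch = 19)   # reduced data lie on a line along the first eigenvector
```

Since L and S′ multiply to a matrix that keeps the first coordinate and zeroes the second, the result is the orthogonal projection of X onto the first eigenvector.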
[Figure: scatter plot matrix of the raw data, variables x1 to x5]
[Figure: raw data, variables x1 to x5]
[Figure: scatter plot matrix of the principal components PC1 to PC4]
[Figure: numbered data points in the PC1/PC2 plane, with the variables x1 to x5 overlaid]
[Figure: variances per PC without scaling]
[Figure: scatter plot matrix of the principal components PC1 to PC5]
[Figure: scatter plot matrices of x1 to x4 and of PC1 to PC4, all axes from -20 to 20]
[Figure: PC2 against PC1]
◮ L2 norm (Euclidean norm): $\|x\|_2 = \sqrt{\sum_i x_i^2} = \sqrt{16 + 9} = 5$
◮ L1 norm (Manhattan norm): $\|x\|_1 = \sum_i |x_i| = 4 + 3 = 7$
◮ L∞ norm: $\|x\|_\infty = \max_i |x_i| = \max\{4, 3\} = 4$
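The three norms can be reproduced in a few lines of R. The vector x = (4, 3) is inferred from the arithmetic in the bullets; the `dist()` calls show the corresponding distance methods available in base R.

```r
x <- c(4, 3)

l2   <- sqrt(sum(x^2))   # Euclidean norm
l1   <- sum(abs(x))      # Manhattan norm
linf <- max(abs(x))      # maximum norm
print(c(L2 = l2, L1 = l1, Linf = linf))

# The same values as distances between the origin and x.
pts <- rbind(origin = c(0, 0), x = x)
print(dist(pts, method = "euclidean"))
print(dist(pts, method = "manhattan"))
print(dist(pts, method = "maximum"))
```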
plot(dataFrame);
[Figure: scatter plot matrix of x1, x2, x3 and x4]
image(matrix2D);
[Figure: image plot of the data matrix]
Distance matrix of the items a, b, c, d, e:

     a  b  c  d  e
a    0  1  3  3  5
b    1  0  3  3  5
c    3  3  0  2  5
d    3  3  2  0  4
e    5  5  5  4  0

After merging a and b (distance 1):

     ab  c  d  e
ab    0  3  3  5
c     3  0  2  5
d     3  2  0  4
e     5  5  4  0

After merging c and d (distance 2):

     ab  cd  e
ab    0   3  5
cd    3   0  4
e     5   4  0

After merging ab and cd (distance 3):

       abcd  e
abcd      0  4
e         4  0

After merging abcd and e (distance 4):

        abcde
abcde       0

[Figure: points a to e, with the clusters formed at each step]
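A sketch that reproduces these merges with base R's `hclust()`. The updated distances in the tables are the minimum over the merged items (for example, the distance from cd to e is min(5, 4) = 4), which corresponds to single linkage; this reading is inferred from the numbers and is not stated in the slides.

```r
# Distance matrix of the five items a..e from the example.
m <- matrix(c(0, 1, 3, 3, 5,
              1, 0, 3, 3, 5,
              3, 3, 0, 2, 5,
              3, 3, 2, 0, 4,
              5, 5, 5, 4, 0), nrow = 5, byrow = TRUE,
            dimnames = list(letters[1:5], letters[1:5]))

d <- as.dist(m)                      # convert the full matrix to a dist object
hc <- hclust(d, method = "single")   # single linkage: minimum distance between clusters

print(hc$height)          # distances at which the successive merges happen
plot(hc)                  # dendrogram
print(cutree(hc, k = 2))  # cluster membership when the tree is cut into two clusters
```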
Hierarchical Clustering: Example
[Figure: scatter plot of the example data d, y against x]
[Figure: dendrograms of the example data, computed from dist(d)]
[Figure: two square panels with x and y axes running from 10 to 70]
Complete, Average and Single Linkage
[Figure: helix data set dHelix plotted with axes y and z; legend (input order): points 1, ..., 300 and 301, ..., 600]
[Figure: dendrograms of dist(dHelix) for the different linkage methods]
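A sketch comparing the three linkage methods on helix-shaped data. The original `dHelix` data set is not available here, so the code builds a hypothetical stand-in of two intertwined strands with 300 points each; all names and parameters are assumptions made for illustration.

```r
# Hypothetical stand-in for dHelix: two strands winding around the z axis.
s <- seq(0, 4 * pi, length.out = 300)
strand1 <- cbind(x = cos(s),      y = sin(s),      z = s)
strand2 <- cbind(x = cos(s + pi), y = sin(s + pi), z = s)
helix <- rbind(strand1, strand2)
strand <- rep(1:2, each = 300)    # true strand of each point (input order)

d <- dist(helix)

# Cut each tree into two clusters and compare with the true strands.
groups <- list()
for (method in c("complete", "average", "single")) {
  hc <- hclust(d, method = method)
  groups[[method]] <- cutree(hc, k = 2)
  cat(method, "linkage\n")
  print(table(strand = strand, cluster = groups[[method]]))
}
```

In this construction neighbouring points on one strand are much closer to each other than to any point on the other strand, so single linkage can follow each strand from end to end. Complete and average linkage favour compact clusters and need not recover the strands.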
◮ R: heatmap function (package stats)
◮ Left dendrogram: clustering of rows (data items)
[Figure: heatmap with columns ordered x7, x4, x2, x6, x5, x1, x3]
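A minimal sketch of a call to `heatmap()` from package stats. The data are random and the row and column names are hypothetical; by default the function clusters rows and columns with `hclust()` on `dist()` and reorders both accordingly.

```r
set.seed(7)
# Hypothetical data: 20 data items (rows) and 7 variables x1..x7 (columns).
m <- matrix(rnorm(140), nrow = 20,
            dimnames = list(paste0("item", 1:20), paste0("x", 1:7)))

# Rows are drawn with a dendrogram on the left, columns with one on top.
hm <- heatmap(m, scale = "column")

# The return value records the row and column order used in the plot.
print(hm$rowInd)
print(colnames(m)[hm$colInd])
```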
Hierarchical Clustering: Summary