Country, Per Capita Income (US$), Literacy Rate (%), Infant Mortality Rate (%), Average Life Expectancy (Years)

The k-means clustering algorithm grouped the 19 countries into 3 clusters. The countries are described by 4 quantitative attributes: per capita income, literacy rate, infant mortality rate, and life expectancy. Cluster 2 consists of developed countries such as Germany, the UK, and Japan. Cluster 1 includes developing countries such as Brazil, Turkey, and Argentina. Cluster 3 contains underdeveloped nations such as Mozambique, India, and Pakistan. A self-organizing map also arranged the countries into 6 groups based on the attributes, with the first group characterized by high income, literacy, and life expectancy.

Uploaded by

HaMu


Country          Per capita income (US$)   Literacy rate (%)   Infant mortality rate (%)   Average life expectancy (years)
Brazil           10326                     90                  23.6                        75.4
Germany          39650                     99                  4.08                        79.4
Mozambique       830                       38.7                95.9                        42.1
Australia        43163                     99                  4.57                        81.2
China            5300                      90.9                23                          73
Argentina        13308                     97.2                13.4                        75.3
United Kingdom   34105                     99                  5.01                        79.4
South Africa     10600                     82.4                44.8                        49.3
Zambia           1000                      68                  92.7                        42.4
Namibia          5249                      85                  42.3                        52.9
Georgia          4200                      100                 17.36                       71
Pakistan         3320                      49.9                67.5                        65.5
India            2972                      61                  55                          64.7
Turkey           12888                     88.7                27.5                        71.8
Sweden           34735                     99                  3.2                         80.9
Lithuania        19730                     99.6                8.5                         73
Greece           36983                     96                  5.34                        79.5
Italy            26760                     98.5                5.94                        80
Japan            34099                     99                  3.2                         82.6

> fisier <- data.frame(
    Tara = c("Brazil", "Germany", "Mozambique", "Australia", "China",
             "Argentina", "UK", "South Africa", "Zambia", "Namibia",
             "Georgia", "Pakistan", "India", "Turkey", "Sweden",
             "Lithuania", "Greece", "Italy", "Japan"),
    Venit = c(10326, 39650, 830, 43163, 5300, 13308, 34105, 10600, 1000,
              5249, 4200, 3320, 2972, 12888, 34735, 19730, 36983, 26760, 34099),
    Alfabetizare = c(90, 99, 38.7, 99, 90.9, 97.2, 99, 82.4, 68, 85, 100,
                     49.9, 61, 88.7, 99, 99.6, 96, 98.5, 99),
    Mortalitate = c(23.6, 4.08, 95.9, 4.57, 23, 13.4, 5.01, 44.8, 92.7, 42.3,
                    17.36, 67.5, 55, 27.5, 3.2, 8.5, 5.34, 5.94, 3.2),
    Varsta = c(75.4, 79.4, 42.1, 81.2, 73, 75.3, 79.4, 49.3, 42.4, 52.9, 71,
               65.5, 64.7, 71.8, 80.9, 73, 79.5, 80, 82.6))

>fisier

>set.seed(5)

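> km <- kmeans(fisier[, 2:4], centers = 3, iter.max = 15)
The data are clustered with the k-means algorithm using 3 clusters and at most 15 iterations. Columns 2:4 pass Venit, Alfabetizare, and Mortalitate to kmeans.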
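Venit is measured in thousands of dollars while the other attributes never exceed 100, so unscaled Euclidean distances are driven almost entirely by income. The following is a minimal sketch of a variant, assuming one wants all four numeric attributes standardized before clustering. The names `x_scaled` and `km_scaled` and the `nstart = 10` setting are illustrative choices that do not appear in the original analysis, and the resulting clusters may differ from those reported in this document.

```r
# Same values as the `fisier` data frame defined in this document.
fisier <- data.frame(
  Tara = c("Brazil", "Germany", "Mozambique", "Australia", "China",
           "Argentina", "UK", "South Africa", "Zambia", "Namibia",
           "Georgia", "Pakistan", "India", "Turkey", "Sweden",
           "Lithuania", "Greece", "Italy", "Japan"),
  Venit = c(10326, 39650, 830, 43163, 5300, 13308, 34105, 10600, 1000,
            5249, 4200, 3320, 2972, 12888, 34735, 19730, 36983, 26760, 34099),
  Alfabetizare = c(90, 99, 38.7, 99, 90.9, 97.2, 99, 82.4, 68, 85, 100,
                   49.9, 61, 88.7, 99, 99.6, 96, 98.5, 99),
  Mortalitate = c(23.6, 4.08, 95.9, 4.57, 23, 13.4, 5.01, 44.8, 92.7, 42.3,
                  17.36, 67.5, 55, 27.5, 3.2, 8.5, 5.34, 5.94, 3.2),
  Varsta = c(75.4, 79.4, 42.1, 81.2, 73, 75.3, 79.4, 49.3, 42.4, 52.9, 71,
             65.5, 64.7, 71.8, 80.9, 73, 79.5, 80, 82.6)
)

# Standardize all four numeric columns (mean 0, standard deviation 1)
# so that no single attribute dominates the distance computation.
x_scaled <- scale(fisier[, 2:5])

set.seed(5)
# nstart = 10 is an assumption: it tries 10 random initializations
# and keeps the solution with the lowest within-cluster sum of squares.
km_scaled <- kmeans(x_scaled, centers = 3, iter.max = 15, nstart = 10)

# List the countries in each cluster; the labels 1..3 are arbitrary.
split(fisier$Tara, km_scaled$cluster)
```

Comparing `km_scaled$cluster` with the unscaled result shows how much the grouping depends on the scale of Venit.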

>print(km)
> plot(fisier, col = km$cluster)

[Figure: scatterplot matrix of Tara, Venit, Alfabetizare, Mortalitate, and Varsta, with points colored by cluster]

The observations, grouped into the 3 clusters, are plotted against the 4 numeric attributes.

> plot(fisier[,2], fisier[,3], col = km$cluster)
The observations, grouped into clusters, are plotted against the numeric attributes Venit (per capita income, column 2) and Alfabetizare (literacy rate, column 3).

> points(km$centers, col = 1:3, pch = 8)
The centroids of the 3 clusters are displayed.


[Figure: scatter plot of fisier[, 3] (literacy rate) against fisier[, 2] (per capita income), colored by cluster, with the cluster centroids marked]

>o<-order(km$cluster)

>data.frame(fisier$Tara[o], km$cluster[o])

The countries and their membership in the 3 clusters are displayed.

    fisier.Tara.o.  km.cluster.o.
1   Brazil          1
2   Argentina       1
3   South Africa    1
4   Turkey          1
5   Lithuania       1
6   Germany         2
7   Australia       2
8   UK              2
9   Sweden          2
10  Greece          2
11  Italy           2
12  Japan           2
13  Mozambique      3
14  China           3
15  Zambia          3
16  Namibia         3
17  Georgia         3
18  Pakistan        3
19  India           3

> text(x=fisier$Venit, y=fisier$Alfabetizare, labels=fisier$Tara, col=km$cluster)

The observations in the plot are labeled with the country names.



[Figure: scatter plot of fisier[, 3] (literacy rate) against fisier[, 2] (per capita income), with each point labeled by country name and colored by cluster]

INTERPRETATION:

The k-means run produced 3 relatively homogeneous clusters of 5, 7, and 7 countries. By examining each cluster's mean (its centroid), we can relate each group to its member countries as follows:

• The cluster formed by Germany, United Kingdom, Greece, Australia, Japan, Italy, and Sweden has the highest per capita income, the highest literacy rate, the highest average life expectancy, and the lowest infant mortality rate. This cluster therefore represents the developed countries.

• The cluster formed by Mozambique, China, Georgia, Pakistan, India, Zambia, and Namibia has the lowest income, literacy rate, and life expectancy, and the highest infant mortality rate. It therefore represents the underdeveloped countries.

• The cluster formed by the remaining countries, Brazil, South Africa, Turkey, Argentina, and Lithuania, represents the group of developing countries.
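The per-cluster means behind this reading can be checked directly. The sketch below assumes the cluster membership printed earlier in this document (hard-coded here as `cluster`, in the row order of the data) and averages every attribute within each cluster. The names `cluster` and `medii` are illustrative.

```r
# Same values as the `fisier` data frame defined in this document.
fisier <- data.frame(
  Tara = c("Brazil", "Germany", "Mozambique", "Australia", "China",
           "Argentina", "UK", "South Africa", "Zambia", "Namibia",
           "Georgia", "Pakistan", "India", "Turkey", "Sweden",
           "Lithuania", "Greece", "Italy", "Japan"),
  Venit = c(10326, 39650, 830, 43163, 5300, 13308, 34105, 10600, 1000,
            5249, 4200, 3320, 2972, 12888, 34735, 19730, 36983, 26760, 34099),
  Alfabetizare = c(90, 99, 38.7, 99, 90.9, 97.2, 99, 82.4, 68, 85, 100,
                   49.9, 61, 88.7, 99, 99.6, 96, 98.5, 99),
  Mortalitate = c(23.6, 4.08, 95.9, 4.57, 23, 13.4, 5.01, 44.8, 92.7, 42.3,
                  17.36, 67.5, 55, 27.5, 3.2, 8.5, 5.34, 5.94, 3.2),
  Varsta = c(75.4, 79.4, 42.1, 81.2, 73, 75.3, 79.4, 49.3, 42.4, 52.9, 71,
             65.5, 64.7, 71.8, 80.9, 73, 79.5, 80, 82.6)
)

# Cluster labels copied from the printed membership table, in row order
# (Brazil = 1, Germany = 2, Mozambique = 3, ...).
cluster <- c(1, 2, 3, 2, 3, 1, 2, 1, 3, 3, 3, 3, 3, 1, 2, 1, 2, 2, 2)
fisier$Cluster <- cluster

# Mean of each numeric attribute within each cluster.
medii <- aggregate(cbind(Venit, Alfabetizare, Mortalitate, Varsta) ~ Cluster,
                   data = fisier, FUN = mean)
medii
```

In the resulting table, cluster 2 has the highest mean income, literacy, and life expectancy, while cluster 3 has the highest mean infant mortality.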

The k-means grouping was compared with the classification of all countries based on the Human Development Index. This index (HDI) is a comparative measure of well-being that takes into account aspects such as average life expectancy, literacy rate, and education. Compared with the HDI-based grouping, only 4 countries were assigned to a different group: Namibia, Georgia, Pakistan, and India. These countries should have been placed in the cluster of developing countries.
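A comparison of this kind can be expressed as a cross-tabulation. The sketch below is illustrative only: no actual HDI values appear in this document, so the reference grouping `grup_hdi` is a hypothetical reconstruction built solely from the statement that Namibia, Georgia, Pakistan, and India belong with the developing countries. The names `tara`, `grup_kmeans`, and `grup_hdi` are assumptions.

```r
tara <- c("Brazil", "Germany", "Mozambique", "Australia", "China",
          "Argentina", "UK", "South Africa", "Zambia", "Namibia",
          "Georgia", "Pakistan", "India", "Turkey", "Sweden",
          "Lithuania", "Greece", "Italy", "Japan")

# k-means labels copied from the printed membership table, in row order.
km_cluster <- c(1, 2, 3, 2, 3, 1, 2, 1, 3, 3, 3, 3, 3, 1, 2, 1, 2, 2, 2)
grup_kmeans <- factor(km_cluster, levels = 1:3,
                      labels = c("developing", "developed", "underdeveloped"))

# Hypothetical reference grouping: identical to the k-means grouping except
# for the four countries the text says belong to the developing group.
grup_hdi <- grup_kmeans
grup_hdi[tara %in% c("Namibia", "Georgia", "Pakistan", "India")] <- "developing"

# Rows: k-means groups; columns: reference groups.
table(kmeans = grup_kmeans, HDI = grup_hdi)

# Countries on which the two groupings disagree, and how many there are.
tara[grup_kmeans != grup_hdi]
sum(grup_kmeans != grup_hdi)  # 4
```

Off-diagonal cells of the table count the countries on which the two groupings disagree.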

SOM (=SELF-ORGANIZING MAPS)

> library(kohonen)

>set.seed(100)

> fisier1 <- fisier[-1]
The first column (Tara) is removed.

>fisier1

> standard <- scale(fisier1)
The numeric data set is standardized: each column is centered on its mean and divided by its standard deviation.
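To make the standardization step concrete, the short sketch below applies it by hand to the Venit column and checks the result against `scale()`. The names `venit`, `z_manual`, and `z_scale` are illustrative.

```r
# Per capita income values from the data set in this document.
venit <- c(10326, 39650, 830, 43163, 5300, 13308, 34105, 10600, 1000,
           5249, 4200, 3320, 2972, 12888, 34735, 19730, 36983, 26760, 34099)

# Manual z-score: subtract the mean, divide by the sample standard deviation.
z_manual <- (venit - mean(venit)) / sd(venit)

# scale() does the same and returns a one-column matrix that stores the
# centering and scaling values as attributes.
z_scale <- scale(venit)

all.equal(z_manual, as.numeric(z_scale))  # TRUE
attr(z_scale, "scaled:center")            # the mean that was subtracted
attr(z_scale, "scaled:scale")             # the standard deviation used
```

The `"scaled:center"` and `"scaled:scale"` attributes are what one would pass back to `scale()` to apply the same transformation to another data set.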

>standard

>somexemplu <- som(standard, grid = somgrid(3, 2, "hexagonal"))

>plot(somexemplu)

[Figure: SOM codes plot for the 6 map units; legend: Venit, Alfabetizare, Mortalitate, Varsta]

Six clusters were obtained, arranged on a 3 x 2 grid (3 per row, 2 per column). The first cluster is characterized by high income, a high literacy rate, and a high average life expectancy.

The last cluster is characterized by countries with a high infant mortality rate.

The SOM can also be visualized with the following command:

>plot(somexemplu, type="mapping", labels=fisier$Tara, main="mapping plot")



[Figure: "mapping plot" showing the 19 countries placed on the 3 x 2 hexagonal SOM grid, labeled by country name]

The countries in the first cluster are Germany, Australia, UK, Japan, and Sweden, that is, the developed countries, characterized by high income, a high literacy rate, and a high average life expectancy.

The other clusters are analyzed similarly.
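One way to read off the remaining clusters without relying on the plot is to list the countries assigned to each map unit. The sketch below assumes the kohonen package is installed and that the fitted object exposes the winning unit of each observation as `unit.classif`. Which unit number corresponds to which position on the grid depends on the fitted map, so no specific output is shown.

```r
library(kohonen)

# Same values as the `fisier` data frame defined in this document.
fisier <- data.frame(
  Tara = c("Brazil", "Germany", "Mozambique", "Australia", "China",
           "Argentina", "UK", "South Africa", "Zambia", "Namibia",
           "Georgia", "Pakistan", "India", "Turkey", "Sweden",
           "Lithuania", "Greece", "Italy", "Japan"),
  Venit = c(10326, 39650, 830, 43163, 5300, 13308, 34105, 10600, 1000,
            5249, 4200, 3320, 2972, 12888, 34735, 19730, 36983, 26760, 34099),
  Alfabetizare = c(90, 99, 38.7, 99, 90.9, 97.2, 99, 82.4, 68, 85, 100,
                   49.9, 61, 88.7, 99, 99.6, 96, 98.5, 99),
  Mortalitate = c(23.6, 4.08, 95.9, 4.57, 23, 13.4, 5.01, 44.8, 92.7, 42.3,
                  17.36, 67.5, 55, 27.5, 3.2, 8.5, 5.34, 5.94, 3.2),
  Varsta = c(75.4, 79.4, 42.1, 81.2, 73, 75.3, 79.4, 49.3, 42.4, 52.9, 71,
             65.5, 64.7, 71.8, 80.9, 73, 79.5, 80, 82.6)
)

# Drop the country names and standardize the numeric attributes.
standard <- scale(fisier[-1])

set.seed(100)
somexemplu <- som(standard, grid = somgrid(3, 2, "hexagonal"))

# unit.classif gives, for each country, the index (1..6) of its map unit.
split(fisier$Tara, somexemplu$unit.classif)
```

Units that attract no country simply do not appear in the resulting list.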
