Link Prediction: Leonid E. Zhukov
Leonid E. Zhukov
2 Proximity measures
4 Other methods
Graph $G(V, E)$
Number of "missing edges": $|V|(|V| - 1)/2 - |E|$
In sparse graphs $|E| \ll |V|^2$, so the probability of a correct random guess is $O(1/|V|^2)$
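To make the size of the search space concrete, here is a minimal sketch that counts the candidate pairs for a hypothetical sparse graph. The function name and the graph size (1,000 nodes, 5,000 edges) are illustrative assumptions, and the probability assumes a single hidden link guessed uniformly at random.

```python
def missing_edge_count(num_nodes: int, num_edges: int) -> int:
    """Unconnected node pairs in a simple undirected graph: |V|(|V|-1)/2 - |E|."""
    return num_nodes * (num_nodes - 1) // 2 - num_edges


# Hypothetical sparse graph: 1,000 nodes and 5,000 edges.
candidates = missing_edge_count(1000, 5000)

# Chance that one uniformly random candidate pair is a single hidden link.
p_random = 1 / candidates

print(candidates)  # 494500
print(p_random)
```

Almost all candidate pairs are non-links, which is why a scoring function is needed to rank them.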
Common neighbors:
$$|N(v_i) \cap N(v_j)|$$
Jaccard’s coefficient:
$$\frac{|N(v_i) \cap N(v_j)|}{|N(v_i) \cup N(v_j)|}$$
Adamic/Adar:
$$\sum_{v \in N(v_i) \cap N(v_j)} \frac{1}{\log |N(v)|}$$
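The three neighborhood-based scores can be sketched on a toy graph as follows. The adjacency map, the function names, and the use of the natural logarithm in Adamic/Adar are assumptions made for illustration.

```python
import math

# Toy undirected graph as an adjacency map (hypothetical data).
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"a", "c", "e"},
    "e": {"d"},
}


def common_neighbors(g, u, v):
    """|N(u) ∩ N(v)|"""
    return len(g[u] & g[v])


def jaccard(g, u, v):
    """|N(u) ∩ N(v)| / |N(u) ∪ N(v)|, defined as 0 when both nodes are isolated."""
    union = g[u] | g[v]
    return len(g[u] & g[v]) / len(union) if union else 0.0


def adamic_adar(g, u, v):
    """Sum of 1 / log|N(w)| over common neighbors w (natural log assumed)."""
    # A common neighbor of two distinct nodes has degree >= 2, so log > 0;
    # the guard only protects against malformed input such as self-loops.
    return sum(1.0 / math.log(len(g[w])) for w in g[u] & g[v] if len(g[w]) > 1)


cn_bd = common_neighbors(graph, "b", "d")  # common neighbors are a and c
jc_bd = jaccard(graph, "b", "d")           # union is {a, c, e}
aa_bd = adamic_adar(graph, "b", "d")       # a and c both have degree 3
print(cn_bd, jc_bd, aa_bd)
```

Adamic/Adar weights a shared neighbor less when that neighbor has many connections, whereas common neighbors and Jaccard treat all shared neighbors equally.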
Katz score:
$$\sum_{l=1}^{\infty} \beta^l \, |\text{paths}^{(l)}(v_i, v_j)| = \sum_{l=1}^{\infty} (\beta A)^l_{ij} = \left[ (I - \beta A)^{-1} - I \right]_{ij}$$
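A minimal sketch of the Katz score, computed by truncating the power series rather than inverting the matrix. The helper names, the 3-node path graph, and the choice beta = 0.5 are assumptions for illustration; the series converges only when beta is smaller than the reciprocal of the largest eigenvalue of A (here 1/sqrt(2)).

```python
def mat_mul(x, y):
    """Product of two square matrices given as lists of rows."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def katz_scores(adj, beta, max_len=100):
    """Approximate sum_{l=1}^{max_len} (beta * A)^l for adjacency matrix adj."""
    n = len(adj)
    scaled = [[beta * adj[i][j] for j in range(n)] for i in range(n)]
    power = [row[:] for row in scaled]   # (beta * A)^1
    total = [row[:] for row in scaled]   # running sum of the series
    for _ in range(max_len - 1):
        power = mat_mul(power, scaled)   # next power of beta * A
        total = [[total[i][j] + power[i][j] for j in range(n)]
                 for i in range(n)]
    return total


# Hypothetical path graph 0 - 1 - 2.
A = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
katz = katz_scores(A, beta=0.5)
print(katz[0][1], katz[0][2])
```

For this graph the closed form $(I - \beta A)^{-1} - I$ gives 1.0 for the pair (0, 1) and 0.5 for the pair (0, 2), and the truncated series agrees to within floating-point error. The unlinked pair (0, 2) still receives a nonzero score because longer connections contribute with weight $\beta^l$.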
SimRank:
$$\text{SimRank}(v_i, v_j) = \frac{C}{|N(v_i)| \cdot |N(v_j)|} \sum_{m \in N(v_i)} \sum_{n \in N(v_j)} \text{SimRank}(m, n)$$
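Because SimRank is defined recursively, it is usually computed by fixed-point iteration. The sketch below assumes the standard base case SimRank(v, v) = 1, a score of 0 for any pair involving an isolated node, and illustrative values C = 0.8 and 10 iterations; none of these are stated in the formula itself.

```python
def simrank(graph, c=0.8, iterations=10):
    """Iterate the SimRank recurrence on an adjacency map {node: set(neighbors)}."""
    nodes = sorted(graph)
    # Assumed base case: a node is fully similar to itself, 0 otherwise.
    sim = {u: {v: 1.0 if u == v else 0.0 for v in nodes} for u in nodes}
    for _ in range(iterations):
        new = {u: {} for u in nodes}
        for u in nodes:
            for v in nodes:
                if u == v:
                    new[u][v] = 1.0
                elif not graph[u] or not graph[v]:
                    new[u][v] = 0.0  # no neighbors to compare
                else:
                    total = sum(sim[m][n] for m in graph[u] for n in graph[v])
                    new[u][v] = c * total / (len(graph[u]) * len(graph[v]))
        sim = new
    return sim


# Hypothetical star graph: hub h with leaves x and y.
star = {"h": {"x", "y"}, "x": {"h"}, "y": {"h"}}
sim = simrank(star)
print(sim["x"]["y"], sim["h"]["x"])
```

The two leaves share their only neighbor, so their similarity equals C, while the hub and a leaf have no similar neighbors and stay at 0.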
Preferential attachment:
$$k_i \cdot k_j = |N(v_i)| \cdot |N(v_j)|$$
or
$$k_i + k_j = |N(v_i)| + |N(v_j)|$$
Clustering coefficient:
$$CC(v_i) \cdot CC(v_j)$$
or
$$CC(v_i) + CC(v_j)$$
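The degree-based and clustering-based scores can be sketched as follows. The toy graph and function names are illustrative, and CC is assumed to be the usual local clustering coefficient (the fraction of a node's neighbor pairs that are themselves linked), which is not spelled out above.

```python
from itertools import combinations

# Toy undirected graph as an adjacency map (hypothetical data).
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"a", "c", "e"},
    "e": {"d"},
}


def preferential_attachment(g, u, v, combine="product"):
    """k_u * k_v (default) or k_u + k_v."""
    ku, kv = len(g[u]), len(g[v])
    return ku * kv if combine == "product" else ku + kv


def clustering_coefficient(g, v):
    """Fraction of pairs of v's neighbors that are linked; 0 if degree < 2."""
    k = len(g[v])
    if k < 2:
        return 0.0
    links = sum(1 for x, y in combinations(g[v], 2) if y in g[x])
    return 2.0 * links / (k * (k - 1))


def clustering_score(g, u, v, combine="product"):
    """CC(u) * CC(v) (default) or CC(u) + CC(v)."""
    cu, cv = clustering_coefficient(g, u), clustering_coefficient(g, v)
    return cu * cv if combine == "product" else cu + cv


pa_bd = preferential_attachment(graph, "b", "d")   # degrees 2 and 3
cc_b = clustering_coefficient(graph, "b")          # neighbors a, c are linked
cc_d = clustering_coefficient(graph, "d")          # only a-c among a, c, e
print(pa_bd, cc_b, cc_d, clustering_score(graph, "b", "d"))
```

Neither score looks at the neighborhood overlap of the pair; both depend only on per-node quantities.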
2 Model training
Features:
Topological proximity features
Aggregated features
Content based node proximity features
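One possible way to turn topological features into a training set is sketched below. The hold-out protocol (hide some edges, label unlinked pairs by whether they appear among the hidden edges), the three chosen features, and all names are assumptions for illustration, not a procedure given above. Aggregated and content-based features would be appended to the same rows.

```python
from itertools import combinations


def pair_features(g, u, v):
    """[common neighbors, Jaccard coefficient, preferential attachment]."""
    nu, nv = g[u], g[v]
    common = len(nu & nv)
    union = len(nu | nv)
    return [common, common / union if union else 0.0, len(nu) * len(nv)]


def build_dataset(train_graph, held_out_edges):
    """Feature rows and 0/1 labels for every pair not linked in train_graph."""
    held_out = {frozenset(e) for e in held_out_edges}
    rows, labels = [], []
    for u, v in combinations(sorted(train_graph), 2):
        if v in train_graph[u]:
            continue  # already linked, so not a prediction candidate
        rows.append(pair_features(train_graph, u, v))
        labels.append(1 if frozenset((u, v)) in held_out else 0)
    return rows, labels


# Hypothetical training graph with the edge c-d hidden.
train_graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a", "e"},
    "e": {"d"},
}
rows, labels = build_dataset(train_graph, held_out_edges=[("c", "d")])
for row, label in zip(rows, labels):
    print(row, label)
```

The resulting rows and labels can be passed to any binary classifier. The classes are typically very imbalanced, since most unlinked pairs remain unlinked.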
True positive rate (TPR), False positive rate (FPR), ROC curve, AUC
$$\text{TPR} = \frac{TP}{TP + FN}, \qquad \text{FPR} = \frac{FP}{FP + TN}$$
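The ROC curve is traced by sweeping a decision threshold over the predicted scores and recording (FPR, TPR) at each step; AUC is the area under that curve. The sketch below uses made-up scores and labels, illustrative function names, and the trapezoidal rule, and assumes both classes are present.

```python
def roc_points(scores, labels):
    """(FPR, TPR) points obtained by lowering the threshold over all scores."""
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Pairs with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))  # FPR = FP/(FP+TN), TPR = TP/(TP+FN)
    return points


def auc(points):
    """Area under a piecewise-linear curve via the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical predicted scores and true labels (1 = link appeared).
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]

points = roc_points(scores, labels)
area = auc(points)
print(points)
print(area)
```

Here 8 of the 9 (positive, negative) pairs are ranked correctly, so the AUC is 8/9. A random ranking would give about 0.5.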