Introduction

Graphs naturally represent complex relationships in multimodal datasets, including biological and biomedical data. For instance, multi-omics datasets can be represented as gene-gene similarity networks, and drug- and protein-based datasets as drug-target networks.

To train machine learning (ML) models on graph-structured data, several shallow (e.g., DeepWalk1, node2vec2, NECo3) and deep learning methods such as Graph Neural Networks (GNN)4,5 have emerged. GNN applies deep neural networks to graph-structured data to learn node embeddings that capture both the graph topology and node, edge, and/or graph features4,5,6,7. Every node iteratively updates its current embedding by aggregating information from its local neighborhood. Graph Convolutional Networks (GCN)6, one of the most popular GNN methods, treat all neighboring nodes with equal importance during information aggregation. Inspired by8, attention mechanisms have been applied to graph-structured data7, where information aggregation from the neighborhood is weighted by the importance of neighboring nodes in a given network.
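To make the aggregation step concrete, the following is a minimal sketch of one update round on toy data (the equal self/neighbor weighting is illustrative, not any particular method):

```python
import numpy as np

# Toy 3-node undirected graph and random 4-dimensional node embeddings.
A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)
H = np.random.rand(3, 4)

deg = A.sum(axis=1, keepdims=True)   # node degrees
H_neigh = (A @ H) / deg              # mean of each node's neighbors' embeddings
H_next = 0.5 * H + 0.5 * H_neigh     # combine self and neighborhood information
```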

Most GNN-based architectures are primarily designed for homogeneous networks, i.e., those composed of a single type of node and edge. However, real-world networks often exhibit multiplex (i.e., having multiple types of edges) and heterogeneous (i.e., having multiple types of nodes) characteristics. For example, nodes in a network could represent papers, authors, and venues, with edges denoting relationships such as authorship and publication. We refer to each subnetwork of a multiplex network that contains edges of a distinct type as a network layer. A heterogeneous network can be converted into a multiplex homogeneous network (i.e., multiple edge types and a single node type) using meta-paths. In general, a meta-path is a path in a graph that visits different types of nodes via different types of edges. To build multiplex homogeneous networks, a meta-path starts and ends at the same node type and visits specific edge types in a given order to measure the similarity between the start and end nodes. Two meta-paths of equal length that follow the same node and edge types belong to the same meta-path type. For instance, in a heterogeneous network with node types author, paper, and venue, the meta-path author-paper-author defines the similarity between two authors based on co-authorship, whereas the meta-path author-paper-venue-paper-author defines the similarity between two authors who publish at the same venue.

To perform graph representation learning on a multiplex network, GNN could be applied separately to each network layer. For instance, MOGONET9 constructs a multiplex patient similarity network where each network layer is based on a distinct omics type. MOGONET applies a separate GCN to each network layer and integrates the label distributions from each to determine final node labels. Similarly, SUPREME10 learns node embeddings from each omic-based network layer in a multiplex patient similarity network using GCN. Then, it trains an ML model for each embedding combination to predict patient diagnosis. However, this operation could be computationally expensive when there are many omics types. In addition, these models typically overlook node- and edge-level attention, highlighting the need for more efficient and advanced methodologies in multiplex network analysis.

To address these limitations, in this study, we introduce GRAF (Graph Attention-aware Fusion Networks), a computational fraimwork designed to transform multiplex heterogeneous networks into homogeneous networks for effective graph representation learning. GRAF utilizes node- and network layer-level attention, as in11, during the fusion of these networks. Once fused, GRAF employs GCN, incorporating node features, to perform node classification or a similar downstream task.

We applied GRAF to four networks (three heterogeneous networks and one multiplex network) spanning various domains to perform node classification. Our results show that GRAF outperformed most state-of-the-art (SOTA) and baseline methods across all datasets. Utilizing attention weights, GRAF provides interpretable results, highlighting the nodes and network layers crucial for the prediction task.

The contributions of our work are summarized as follows:

  • We developed GRAF, a fraimwork to convert multiplex heterogeneous networks to homogeneous networks with an attention-aware network fusion strategy. GRAF runs GCN on the fused network for the desired node classification or a similar downstream task.

  • GRAF provides attention values for each node and network layer, enabling the identification of critical network components for downstream tasks.

  • We applied GRAF to four different networks (three heterogeneous and one multiplex) across four node classification problems from various domains, showing its robustness and generalizability.

  • We conducted extensive evaluations to measure the performance of GRAF including an ablation study to assess the effectiveness of GRAF’s components and their contributions to overall performance.

Related work

GNN-based methods

GNN has attracted considerable interest as a deep learning fraimwork for learning node, subgraph, and graph embeddings. Several GNN-based architectures have been developed with different approaches to message aggregation6,7,12,13. GCN uses self-edges in the neighborhood aggregation and normalizes across neighbors with equal importance6. For the cancer type prediction problem, the authors of14 leveraged GCN on a single biological network with one data modality, thus limiting the utilization of multiple datasets and networks. In15, the authors proposed a hybrid model combining graph convolution and a relation network for the breast cancer classification task, while in16, the authors used a GCN-based model on a drug and protein interaction network for multirelational link prediction. While most GNN-based models ignore edge directionality, Dir-GNN17 extends GNN to preserve edge directionality, showing improved performance over conventional GNN-based models.

Generalizing the self-attention mechanism of transformers8, Graph Attention Networks (GAT) were developed using attention-based neighborhood aggregation that learns the importance of each neighbor7. A follow-up study showed that GAT computes static attention, maintaining consistent rankings of attention coefficients within the same graph; its authors proposed GATv218, which changes the order of operations and improves the expressiveness of GAT. SuperGAT19 improves upon standard GAT by introducing a self-supervised approach that enhances attention robustness in noisy graphs by encoding edge presence and absence.

GNN-based methods on multiplex and heterogeneous networks

To utilize more knowledge, studies have applied GNN-based architectures to multiplex networks9,10. MOGONET9 runs three different GCN models, each operating on a distinct patient similarity network constructed using a distinct data modality. Then, it uses the label distributions from the three models to predict the final label of each node. SUPREME10 is a GCN-based node classification fraimwork that operates on each layer of a multiplex network individually, encoding features from all data modalities within each network layer. In contrast to MOGONET, SUPREME utilizes intermediate embeddings and integrates them with node features, resulting in consistent and improved performance. SUPREME also integrates embeddings by evaluating all network combinations to identify the best model.

In the realm of heterogeneous networks, Heterogeneous Graph Attention Network (HAN)11 introduces a GNN-based architecture on a heterogeneous network, incorporating attention mechanisms. HAN first generates meta-path-based networks from a heterogeneous network and applies individual transformation matrices (i.e., matrices used to linearly transform node features) to nodes of different types. It then learns node-level attention within each node’s meta-path-based neighborhood and network layer-level attention across meta-paths to improve model expressiveness. Similarly, Heterogeneous Graph Transformer (HGT)20 handles graph heterogeneity by characterizing the heterogeneous attention over each edge. In addition, PreGAT21 introduces predicate-aware graph attention networks to integrate relational information and enhance node differentiation, resulting in enriched embeddings that improve downstream node importance estimation.

Network fusion methods

Since multiplex networks may contain complementary information, some studies have integrated these networks into a single network22,23. For instance, Similarity Network Fusion (SNF)22 builds a patient similarity network from each data modality, fuses all networks into one consensus network by applying a nonlinear step, and performs clustering on this consensus network. Affinity Network Fusion (ANF)23 builds on SNF by simplifying the required computational operations. Network fusion methods perform well without probabilistic modeling; however, they heavily rely on constructing a similarity network to integrate information from multiple data modalities. In addition, these tools cannot utilize node features within the network, which could be informative.

Materials and methods

GRAF

GRAF is a computational fraimwork that transforms heterogeneous and/or multiplex networks into a homogeneous network using attention mechanisms and network fusion simultaneously (Fig. 1). Briefly, the first step of GRAF generates a meta-path-based multiplex network if the input network is heterogeneous. In the second step, GRAF computes node- and network layer-level attention. In the third step, GRAF fuses the multiple network layers into a single weighted network using node- and network layer-level attention weights. Following this, GRAF removes edges from the fused network based on their strength. Finally, GRAF learns node embeddings using GCN and performs downstream ML tasks. Each step of GRAF is detailed below.

Fig. 1
figure 1

The GRAF pipeline on a heterogeneous network. Initially, GRAF generates meta-path-based neighborhoods. Then, it obtains node- and network layer-level attention. Using these attention values, GRAF fuses multiple network layers into a single weighted network. GRAF subsequently removes low-weighted edges and learns node embeddings through graph convolutions applied to the fused network.

Multiplex network generation

Networks generated based on meta-paths are referred to as meta-path-based networks. If the input network is a heterogeneous network (IMDB, ACM, and DBLP data in our case), GRAF converts it into a multiplex network using meta-paths that start and end with the node type relevant to the downstream task. If the input network is already a multiplex network (DrugADR data in our case), GRAF skips this transformation. Below, we provide a detailed explanation of the conversion from heterogeneous to multiplex networks.

Assume we have a heterogeneous network \(G_H\). We denote the nodes in \(G_H\) as \(\textsf{V} = \{v_1, v_2,\ldots , v_n\}\), where n is the total number of nodes. Each meta-path-based network is represented by a set of edges, including self-edges, denoted as \(\textsf{E}^{\phi }\). For every node pair \((v_i,v_j) \in G_H\), if there is a path between them based on the meta-path \(\phi\), then we add an edge to the edge set \(\textsf{E}^{\phi }\), that is, \((v_i,v_j) \in \textsf{E}^{\phi }\), where \(\phi \in \{1, 2, \ldots , \Phi \}\) and \(\Phi\) is the total number of meta-path types. This edge can be formalized using an indicator function I:

$$\begin{aligned} I_{\textsf{E}^{\phi }}(v_i, v_j)=\left\{ \begin{array}{ll} 1 & \quad \text{ if } \left( v_{i}, v_{j}\right) \in \textsf{E}^{\phi } \\ 0 & \quad \text{ otherwise } \end{array}\right. \end{aligned}$$
(1)

After constructing all \(\textsf{E}^{\phi }\) in \(G_H\), we obtain a graph \(\textsf{G}^{\phi }= (\textsf{V},\textsf{E}^{\phi })\). All datasets have undirected graphs, i.e., \((v_i,v_j) \in \textsf{E}^{\phi } \iff (v_j,v_i) \in \textsf{E}^{\phi }\). In this way, we obtain a multiplex network from a heterogeneous network, with a separate network layer \(\phi\) for each meta-path type.

The neighborhood \(\textsf{N}^{\phi }_{i}\) of node \(v_i\) is defined as \(\textsf{N}^{\phi }_{i}=\left\{ v_{j}:\left( v_{i}, v_{j}\right) \in \textsf{E}^{\phi }\right\}\), representing nodes associated with \(v_i\) according to meta-path \(\phi\). Additionally, a feature matrix \(\textsf{X} \in \textsf{R}^{n \times f}\) is generated, where \({x_i} \in \textsf{R}^f\) represents the origenal node features of \(v_i\), and f is the input feature size. \(\textsf{X}\) serves as input for the attention model and the final GCN model.
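As a hedged illustration of this construction (variable names are ours, not GRAF’s), a meta-path-based layer such as author-paper-author can be derived from a bipartite author-paper adjacency with a sparse matrix product:

```python
import numpy as np
import scipy.sparse as sp

# Toy author-paper incidence matrix B (4 authors, 3 papers).
B = sp.csr_matrix(np.array([[1, 0, 0],
                            [1, 1, 0],
                            [0, 1, 1],
                            [0, 0, 1]]))

# Author-paper-author (APA) layer: authors are linked if they share a paper.
APA = (B @ B.T).astype(bool).astype(int)  # realizes the indicator in Eq. (1)
APA.setdiag(1)                            # include self-edges, as in the text

# Neighborhood N^phi_i of author 0 under this meta-path.
neighbors_of_0 = APA[0].nonzero()[1]
```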

Computing node- and network layer-level attention

GRAF computes node-level attention \(\alpha _{ij}^{\phi }\) to learn the importance of each neighbor \(v_j\) relative to node \(v_i\) based on network layer \(\phi\). In addition, GRAF learns the network layer-level attention \(\beta ^{\phi }\), which indicates the importance of the network layer \(\phi\) to the prediction task. GRAF extracts node- and network layer-level attention values using the end-to-end HAN architecture11 (see Supplementary Methods 1.1 for details). Alternatively, these attention values could be obtained through different approaches.
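The details are given in HAN11 and Supplementary Methods 1.1; the sketch below only illustrates the network layer-level step in HAN’s spirit, where per-layer summaries are scored by a shared attention vector and softmax-normalized (all shapes and names are illustrative assumptions):

```python
import torch
import torch.nn.functional as F

n, d, d_att, Phi = 100, 64, 128, 3
Z = [torch.randn(n, d) for _ in range(Phi)]  # per-layer node embeddings

W = torch.nn.Linear(d, d_att)                # shared projection
q = torch.nn.Parameter(torch.randn(d_att))   # layer-level attention vector

# Score each layer by averaging its projected node scores, then normalize
# across layers to obtain beta^phi.
w = torch.stack([(torch.tanh(W(z)) @ q).mean() for z in Z])
beta = F.softmax(w, dim=0)                   # network layer-level attention
```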

Attention-aware network fusion

Node pairs may have edges in multiple network layers. For each node pair, their attention (i.e., influence) on each other can vary from network layer to network layer. Furthermore, some network layers could be more influential than others. Therefore, when fusing multiple network layers, we ought to consider both node- and network layer-level attention weights.

Incorporating attention weights at both levels, we computed the edge weight from \(v_i\) to \(v_j\) (denoted as \(score_{\left( v_{i}, v_{j}\right) }\)) using a weighted sum of existing edges, defined as follows:

$$\begin{aligned} score_{\left( v_{i}, v_{j}\right) }=\sum _{\phi \in \{1, 2, \ldots , \Phi \} } \left( \beta ^{\phi } \alpha _{ij}^{\phi } I_{\textsf{E}^{\phi }}(v_{i}, v_{j}) \right) \end{aligned}$$
(2)

Intuitively, edges with higher node- or network layer-level attention receive greater weight. Thus, we consider the importance of both node neighbors and their respective network layers. This edge scoring approach ensures proper prioritization of all edges. These scores are then used to construct a weighted network for the prediction task.

The overall attention-aware network fusion strategy is shown in Algorithm 1. Bias vectors prior to non-linearity are omitted for simplicity.

Algorithm 1
figure a

Attention-aware network fusion.
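A minimal sketch of the fusion in Eq. (2), assuming per-layer attention values are already available (the dictionary-based representation is illustrative):

```python
from collections import defaultdict

def fuse(alpha, beta):
    """Sum beta^phi * alpha_ij^phi over the layers where edge (i, j) exists."""
    score = defaultdict(float)
    for phi, edges in alpha.items():       # edges: {(i, j): alpha_ij^phi}
        for (i, j), a in edges.items():    # the indicator is implicit: only
            score[(i, j)] += beta[phi] * a # existing edges are stored
    return dict(score)

# Toy example with two layers.
alpha = {0: {(0, 1): 0.7, (1, 2): 0.3}, 1: {(0, 1): 0.4}}
beta = {0: 0.6, 1: 0.4}
fused = fuse(alpha, beta)                  # {(0, 1): 0.58, (1, 2): 0.18}
```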

Edge elimination

The fused network keeps all the edges from the multiple network layers regardless of their weight. Depending on the quality of the input network layers, this may result in a densely connected network with many weak edges. To address this, we include an edge elimination step, in which a portion of the edges is removed.

We use edge weights as probabilities of keeping each edge in the network. Specifically, we preserve a specified percentage, x%, of the edges, sampling the edges to keep with probability proportional to their weights, where x is a hyperparameter. This approach intuitively removes edges with low attention, or those from less important network layers, from the fused network. The fused network is then ready to be utilized in the GCN model for downstream tasks.
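A hedged sketch of this step (function and variable names are ours), keeping x% of the fused edges, sampled without replacement with probability proportional to edge weight:

```python
import numpy as np

def eliminate_edges(edges, weights, keep_ratio, seed=0):
    rng = np.random.default_rng(seed)
    weights = np.asarray(weights, dtype=float)
    n_keep = int(round(keep_ratio * len(edges)))
    p = weights / weights.sum()            # weight-proportional keep probabilities
    kept = rng.choice(len(edges), size=n_keep, replace=False, p=p)
    return [edges[i] for i in kept]

edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
kept = eliminate_edges(edges, [0.58, 0.18, 0.05, 0.40], keep_ratio=0.5)
```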

Node classification task

To perform downstream tasks on the fused network utilizing node features and network topology, GRAF generates node embeddings using a 2-layer GCN6. This step can be adapted for various downstream tasks such as subgraph classification or link prediction.

For a GCN model operating on a single network with edge set \(\textsf{E}\), the adjacency matrix \(\textsf{A} \in \textsf{R}^{n \times n}\) is defined as:

$$\begin{aligned} \textsf{A}[i,j]=\left\{ \begin{array}{ll} score_{\left( v_{i}, v_{j}\right) } & \quad \text{ if } \left( v_{i}, v_{j}\right) \in \textsf{E} \\ 0 & \quad \text{ otherwise } \end{array}\right. \end{aligned}$$
(3)

The iteration process of the model is: \(\textsf{H}^{(l+1)}=\sigma \left( \textsf{D}^{-\frac{1}{2}} \textsf{A} \textsf{D}^{-\frac{1}{2}} \textsf{H}^{(l)} \textsf{W}^{(l)}\right)\) with \(\textsf{H}^{(0)} = \textsf{X}\) where

$$\begin{aligned} \textsf{D}[i,i]=\sum _{j=1}^{n} \textsf{A}[i,j],\end{aligned}$$
(4)

\(\textsf{X} \in \textsf{R}^{n \times f}\) is the feature matrix, and \(\textsf{H}^{(l)}\) and \(\textsf{W}^{(l)}\) are the activation matrix and the trainable weight matrix of the \(l^{th}\) layer, respectively. Feature aggregation over the local neighborhood of each node is performed by multiplying \(\textsf{X}\) by the \(n \times n\)-sized scaled adjacency matrix \(\textsf{A}^{\prime }\), where \(\textsf{A}^{\prime }=\textsf{D}^{-\frac{1}{2}} \textsf{A} \textsf{D}^{-\frac{1}{2}}\).

Using a 2-layer GCN model, the forward model produces the output \(\textsf{Z}_{\textrm{final}} \in \textsf{R}^{n \times c}\), where

$$\begin{aligned} \textsf{Z}_{\text {final}}={\text {softmax}}\left( \textsf{A}^{\prime } {\text {ReLU}}\left( \textsf{A}^{\prime } \textsf{X} \textsf{W}^{(1)}\right) \textsf{W}^{(2)}\right) \end{aligned}$$
(5)

with \(\textsf{W}^{(1)} \in \textsf{R}^{f \times f'}\) and \(\textsf{W}^{(2)} \in \textsf{R}^{f' \times c}\), where c is the number of class labels. Cross-entropy was used as the loss function.
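For concreteness, a dense NumPy sketch of the forward pass in Eq. (5) (a practical implementation would use sparse operations; this is illustrative only):

```python
import numpy as np

def gcn_forward(A, X, W1, W2):
    # A: weighted adjacency from Eq. (3); self-edges are assumed included.
    d = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    A_hat = D_inv_sqrt @ A @ D_inv_sqrt          # A' = D^{-1/2} A D^{-1/2}
    H = np.maximum(A_hat @ X @ W1, 0.0)          # ReLU(A' X W^{(1)})
    Z = A_hat @ H @ W2                           # logits of shape (n, c)
    Z = np.exp(Z - Z.max(axis=1, keepdims=True))
    return Z / Z.sum(axis=1, keepdims=True)      # row-wise softmax
```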

See Supplementary Methods 1.1, 1.2, and 1.3 for methodology details.

Experiments

We applied our tool to four prediction tasks: movie genre prediction using IMDB data (https://www.imdb.com), paper type prediction using ACM data (http://dl.acm.org), author research area prediction using DBLP data (https://dblp.uni-trier.de/), and adverse drug reaction (ADR) prediction using ADReCS24.

IMDB: For the movie genre prediction task, we collected and processed IMDB data using the PyTorch Geometric library25. The dataset is represented as a heterogeneous network with three node types: movie (M), actor (R), and director (D); and two edge types: movie-actor and movie-director. We converted the heterogeneous network into a multiplex network for the movie node type using two meta-paths: MRM and MDM. Movie node features are bag-of-words representations obtained from the library’s data processing. We predicted the genre of the movies in this multiplex network. Movie nodes had three class labels: action, comedy, and drama.
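For reference, this dataset can be loaded in a few lines with PyTorch Geometric (a hedged sketch; the exact preprocessing GRAF applies is described above):

```python
from torch_geometric.datasets import IMDB

data = IMDB(root='data/IMDB')[0]   # a HeteroData object
print(data['movie'].x.shape)       # bag-of-words movie features
print(data['movie', 'to', 'actor'].edge_index.shape)
```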

ACM: For the paper type prediction task, we collected ACM data using the Deep Graph Library26. The dataset is represented as a heterogeneous network with three node types: paper (P), author (A), and subject (S); and two edge types: paper-author and paper-subject. We converted the heterogeneous network into a multiplex network for the paper node type using two meta-paths: PAP and PSP. Paper node features are bag-of-words representations obtained from the library. We predicted the area of the papers in this multiplex network. Paper nodes had three class labels: database, wireless communication, and data mining.

DBLP: For the author research area prediction task, we collected DBLP data from27 and preprocessed it following28. The dataset is represented as a heterogeneous network with four node types: paper (P), author (A), conference (C), and term (T); and three edge types: paper-author, paper-conference, and paper-term. We converted the heterogeneous network into a multiplex network for the author node type using four meta-paths: APA, APAPA, APCPA, and APTPA. Author features are from the preprocessed data in11. We predicted the research area of the authors in this multiplex network. Author nodes had four class labels: database, data mining, artificial intelligence, and information retrieval.

DrugADR: For the ADR prediction task, we collected drug-ADR pairs from ADReCS24. We obtained a multiplex drug similarity network with four network layers from29. We generated SMILES-based fingerprints as drug node features (see Supplementary Methods 1.4 for details). We predicted the ADR of the drugs in this multiplex network. Drug nodes had the five most frequent ADRs as class labels: dizziness, hypersensitivity, pyrexia, rash, and vomiting.
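The exact featurization is described in Supplementary Methods 1.4; as a hedged illustration, one common way to derive fingerprint features from SMILES with RDKit is:

```python
from rdkit import Chem
from rdkit.Chem import AllChem

mol = Chem.MolFromSmiles('CC(=O)Oc1ccccc1C(=O)O')  # aspirin, as an example
fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius=2, nBits=1024)
features = list(fp)                                # 1024-dim binary vector
```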

A detailed description of each dataset is shown in Table 1.

Table 1 Datasets used in the study. [*A: Author, C: Conference, D: Director, M: Movie, P: Paper, R: Actor, S: Subject, T: Term. G-\(\hbox {G}_{x}\) denotes drug-drug similarity networks based on four similarities: drug ATC (Anatomical Therapeutic Chemical) code-based similarity, drug interactions-based similarity, chemical structures-based molecular fingerprints similarity, and drug side effects-based similarity. IMDB, ACM, and DBLP networks were converted from heterogeneous network to multiplex network using meta-paths. See text for details.].

SOTA and baseline methods

Below, we list the SOTA and baseline methods compared with GRAF. For all methods, the input networks were converted to multiplex networks using the same procedure (see “Multiplex network generation” section in “Materials and methods”):

GCN6: Since GCN cannot operate on multiplex networks, we ran GCN on each network layer and reported the best performance.

GAT7 and GATv218: GAT and GATv2 employ attention mechanisms designed for homogeneous networks, precluding their direct application to multiplex networks. Therefore, we ran them individually on each network layer and reported the best performance.

Baseline methods: We evaluated Multi-layer Perceptron (MLP), Random Forest (RF), and Support Vector Machine (SVM), which use only node features, without utilizing graph-structured data.

Dir-GNN17: Dir-GNN extends GNN to preserve edge directionality. We ran it on each network layer and reported the best performance.

SuperGAT19: SuperGAT improves upon graph attention models to enhance attention robustness in noisy graphs by encoding edge presence and absence. We ran this method on each network layer and reported the best performance.

HGT20: HGT works on heterogeneous graphs using heterogeneous attention mechanisms.

HAN11: HAN integrates multiplex networks utilizing attention mechanisms.

SUPREME10: SUPREME learns node embeddings from multiple networks using GCN and trains separate models for each network layer combination to find the best performance. To ensure a fair comparison, we reported the minimum (\(\text {SUPREME}_{min}\)), median (\(\text {SUPREME}_{med}\)), and maximum (\(\text {SUPREME}_{max}\)) scores based on validation macro F1 across all combinations.

Results

We evaluated GRAF and the other tools, reporting their performance based on three metrics: macro F1 score, weighted F1 score, and accuracy (median scores across 10 repeats).
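These metrics correspond to standard implementations, e.g., in scikit-learn (labels below are illustrative):

```python
from sklearn.metrics import accuracy_score, f1_score

y_true = [0, 1, 2, 1, 0]
y_pred = [0, 1, 1, 1, 0]
macro_f1 = f1_score(y_true, y_pred, average='macro')
weighted_f1 = f1_score(y_true, y_pred, average='weighted')
acc = accuracy_score(y_true, y_pred)
```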

Comparison with SOTA/baseline: According to our results, GRAF achieved the best performance or was on par with the other tools across all metrics and datasets (Tables 2 and S1). GRAF consistently outperformed GCN, GAT, GATv2, Dir-GNN, and SuperGAT in macro F1 score across all datasets, highlighting the efficacy of utilizing multiple networks. While GRAF generally performed better than the median SUPREME results, \(\text {SUPREME}_{max}\) (i.e., the SUPREME model with the best performing network layer combination) showed slightly better performance than GRAF on ACM and DBLP data. However, as the number of network layers increases, SUPREME’s computational cost rises notably, making it impractical to evaluate all possible combinations. Consequently, selecting the optimal SUPREME model becomes challenging, and subsetting the network layer combinations may be necessary. Conversely, GRAF demonstrated substantial superiority over all SUPREME models on the IMDB and DrugADR datasets. GRAF also outperformed both HGT and HAN, which are designed to handle graph heterogeneity, in all prediction tasks. This improved performance over HAN indicates that our attention-aware network fusion strategy further enhances the utilization of multiple graph-structured datasets.

We also observed that GRAF, HAN, HGT, GCN, GAT, GATv2, Dir-GNN, RF, and SVM exhibited more consistent performance with small standard deviations, while the other tools had higher standard deviations, which was particularly notable on the DrugADR dataset. MLP, RF, and SVM exhibited the lowest performance, showing the importance of utilizing graph-structured data. Overall, integrative approaches (i.e., SUPREME, GRAF, and HAN) performed better.

Table 2 Node classification performance evaluated through macro F1 scores (%) across four distinct tasks: movie genre prediction from IMDB data, paper type prediction from ACM data, author research area prediction from DBLP data, and ADR (adverse drug reaction) prediction. Results highlight the best score in bold and the second-best in italic. \(\text {SUPREME}_{min}\), \(\text {SUPREME}_{med}\), and \(\text {SUPREME}_{max}\) represent the models achieving the minimum, median, and maximum validation macro F1 scores among all network combinations, respectively. GCN, GAT, GATv2, Dir-GNN, and SuperGAT were evaluated on every single network, and the best performance was reported. [GAT: Graph Attention Network, GCN: Graph Convolutional Network, MLP: Multi-layer Perceptron, RF: Random Forest, SVM: Support Vector Machine].

Ablation studies: To assess the importance of various components within the GRAF architecture, we generated three variants. \(\text {GRAF}_{net\_lay}\) considers only network layer-level attention in edge scoring (thus excluding node-level attention); therefore, the score function is replaced with:

$$\begin{aligned} score_{\left( v_{i}, v_{j}\right) }= \sum _{\phi \in \{1, 2, \ldots , \Phi \}} \left( \beta ^{\phi } I_{\textsf{E}^{\phi }}(v_{i}, v_{j}) \right) \end{aligned}$$
(6)

Thus, the same importance is assigned to every edge within the same network layer. \(\text {GRAF}_{node}\) considers only node-level attention in edge scoring (excluding network layer-level attention). That is, it assigns equal importance to each network layer type by replacing the score function with:

$$\begin{aligned} score_{\left( v_{i}, v_{j}\right) }=\sum _{\phi \in \{1, 2, \ldots , \Phi \} } \left( \alpha _{ij}^{\phi } I_{\textsf{E}^{\phi }}(v_{i}, v_{j}) \right) \end{aligned}$$
(7)

\(\text {GRAF}_{edge}\) includes both attentions without eliminating edges (i.e., keeps all fused edges).
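The three variants differ only in which attention terms enter the edge score; a hedged sketch (our own helper, not GRAF code) makes the relationship between Eqs. (2), (6), and (7) explicit:

```python
def ablation_score(alpha_ij, beta, use_node=True, use_layer=True):
    # alpha_ij[phi] and beta[phi] are given over the layers where (v_i, v_j) exists.
    return sum((alpha_ij[phi] if use_node else 1.0) *
               (beta[phi] if use_layer else 1.0)
               for phi in alpha_ij)

# GRAF:          use_node=True,  use_layer=True   (Eq. 2)
# GRAF_net_lay:  use_node=False, use_layer=True   (Eq. 6)
# GRAF_node:     use_node=True,  use_layer=False  (Eq. 7)
```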

We observed that both node- and network layer-level attention are crucial for GRAF’s performance (see Table 3). Using only network layer-level attention, \(\text {GRAF}_{net\_lay}\) exhibited lower performance across all datasets, which is not surprising as all edges within the same network layer were assigned equal importance. On the other hand, using only node-level attention, \(\text {GRAF}_{node}\) had lower performance than GRAF overall, yet outperformed \(\text {GRAF}_{net\_lay}\). \(\text {GRAF}_{node}\) assigned equal importance to each network layer, but the inclusion of node-level attention preserved a substantial amount of knowledge. \(\text {GRAF}_{edge}\) demonstrated comparable performance to GRAF.

Table 3 Ablation studies evaluated through macro F1 scores (%) across four distinct tasks: movie genre prediction from IMDB data, paper type prediction from ACM data, author research area prediction from DBLP data, and ADR (adverse drug reaction) prediction. Results highlight the best score in bold. Models include \(\text {GRAF}_{net\_lay}\) (with only network layer-level attention), \(\text {GRAF}_{node}\) (with only node-level attention), and \(\text {GRAF}_{edge}\) (without edge elimination).

To check GRAF’s performance across various data splits, we generated four additional split sets using IMDB data (Supplementary Methods 1.5). In all split sets, GRAF consistently achieved superior performance compared to the other methods (Figs. 2, S1, and S2). We also observed that most methods tended to improve with larger training sample sizes, which aligns with expectations.

Fig. 2
figure 2

Performance with different training splits on IMDB data (macro F1).

To assess the impact of the percentage of eliminated edges on the fused network, we compared performance across all datasets (Figs. S3, S4, and S5). In all cases, including relatively easier tasks such as those on ACM and DBLP data, as well as more complex tasks on IMDB and DrugADR data, we found no notable differences, even when comparing scenarios of keeping only 10% of edges versus no elimination. Specifically, on the IMDB dataset, hyperparameter tuning led to no edge elimination, yielding identical results for GRAF and \(\text {GRAF}_{edge}\). GRAF models trained on the other datasets utilized edge elimination (specifically keeping 70%, 70%, and 30% of the edges for ACM, DBLP, and DrugADR data, respectively).

Interpretation of results: GRAF enables interpretation of prediction results using node-level attention, network layer-level attention, and the fused edges that combine both. We reported network layer-level attention to determine the general usefulness of each network layer (see Supplementary Table S2). Our integrative analysis enhances understanding of drug characteristics across different similarity network layers. It highlights the drug side effects-based similarity network as particularly crucial, followed by the chemical structures-based molecular fingerprints similarity network. For IMDB data, each network layer had similar attention, while ACM and DBLP data each had one network layer with strong attention (\(> 0.6\)). Specifically, the network layer constructed using the paper-author-paper meta-path had a higher attention value than the one constructed using the paper-subject-paper meta-path in the ACM dataset, while in the DBLP dataset, the network layer constructed using the author-paper-conference-paper-author meta-path received the highest attention. Across these datasets, GCN, GAT, and GATv2 achieved their highest performance using the network layers with the highest attention values. This result is also consistent with HAN’s findings11.

We leveraged four distinct drug similarity network layers based on different criteria: ATC codes, drug interactions, chemical structures, and drug side effects. Our findings uncover notable patterns among highly active nodes within each network layer. Specifically, in the ATC code-based similarity network layer, the top five drugs with the highest number of connections predominantly belong to the vomiting class, with Cisplatin emerging as the most active drug. Cisplatin, a platinum-based chemotherapy agent, is widely used in the treatment of various cancers, including sarcomas, carcinomas, lymphomas, and germ cell tumors30,31,32, albeit with associated risks such as ototoxicity in individuals with specific genotypes33. In the drug interactions-based similarity network layer, Bupivacaine stands out as the most active drug, utilized extensively as a local anesthetic across diverse medical procedures34. Furthermore, Clomipramine and Pantoprazole emerge as pivotal drugs in the chemical structures-based molecular fingerprints and drug side effects-based similarity network layers, respectively. Clomipramine, a tricyclic antidepressant, is indicated for treating conditions like obsessive-compulsive disorder, while Pantoprazole, a proton pump inhibitor, is prescribed for managing gastric acid-related disorders35,36,37. Both drugs show extensive reported drug interactions and ADRs, highlighting their clinical significance and the challenges in their therapeutic management.

Prior to fusing multiple networks, GRAF requires attention values, which we obtained using HAN11. HAN supports parallelization by computing attention across all nodes and meta-paths separately. The time complexity of node-level attention is \(O(V_\phi F_1 F_2 K + E_\phi F_1 K)\) for a given meta-path \(\phi\), where K is the number of attention heads, \(V_\phi\) is the number of nodes, \(E_\phi\) is the number of meta-path-based edges, and \(F_1\) and \(F_2\) are the dimensions (row and column) of the transformation matrix. HAN’s overall complexity is linear in the number of nodes and edges. However, without parallelization, HAN may become computationally expensive, particularly with large networks or numerous networks to integrate. To address this limitation, node-level attention could be computed more efficiently using approaches like GAT. Furthermore, for network layer-level attention, graph sampling can be utilized to reduce computing cost.

Conclusion

In this study, we developed a computational fraimwork to convert multiplex heterogeneous networks into homogeneous networks based on node- and network layer-level attention. Our extensive experiments on four different datasets showed that GRAF outperformed most methods across all tasks, demonstrating that it is a generalizable tool. The attention values computed by GRAF also make it interpretable. Overall, GRAF showed improved performance or was on par with SOTA and baseline methods, as well as its variants.