MGKGR: Multimodal Semantic Fusion for Geographic Knowledge Graph Representation
Abstract
1. Introduction
- This paper highlights the importance of multimodal semantics in geographic knowledge graph representation and introduces them to enhance the model's representation capability for GeoKG.
- This paper proposes MGKGR, which mitigates the heterogeneity between features of different modalities through a two-stage fusion process, providing a methodological reference for multimodal feature fusion in GeoKG.
- We construct two multimodal geographic knowledge datasets and conduct extensive experiments to evaluate MGKGR. The experiments confirm that multimodal data effectively improve the feature quality of geographic knowledge graphs.
2. Related Work
2.1. Traditional Knowledge Graph
2.2. Multimodal Knowledge Graph
2.3. Geographic Knowledge Graph
3. Methods
3.1. Multimodal GeoKG Encoding
1. Structural Feature Encoding
2. Text Feature Encoding
3. Vision Feature Encoding
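The three encoders above map modality-specific inputs into entity features: a KRL model for structure, a pretrained language model for text, and a vision transformer for images. The sketch below shows only the shared-space projection step; all dimensions, weight initializations, and the random stand-ins for the raw encoder outputs are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 128  # shared embedding dimension (illustrative)

def project(x, w):
    """Linearly project a raw modality feature into the shared D-dim space."""
    return x @ w

# Placeholder raw features for one geographic entity: structural (from a
# KRL model), textual (e.g., a BERT sentence vector), and visual (e.g., a
# ViT pooled vector). The raw dimensions are assumptions.
s_raw = rng.normal(size=200)   # structural feature
t_raw = rng.normal(size=768)   # text feature
v_raw = rng.normal(size=1024)  # vision feature

# One projection matrix per modality, scaled for unit-variance outputs.
W_s = rng.normal(size=(200, D)) / np.sqrt(200)
W_t = rng.normal(size=(768, D)) / np.sqrt(768)
W_v = rng.normal(size=(1024, D)) / np.sqrt(1024)

e_s, e_t, e_v = project(s_raw, W_s), project(t_raw, W_t), project(v_raw, W_v)
assert e_s.shape == e_t.shape == e_v.shape == (D,)
```

Projecting every modality into one space is what makes the later fusion stage possible, since the raw encoder outputs have incompatible dimensionalities.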
3.2. Two-Stage Multimodal Feature Fusion
3.3. Model Optimization
Algorithm 1: The training procedure of MGKGR.
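Algorithm 1 itself is not reproduced in this extraction. The training step it describes — two-stage fusion followed by a contrastive objective — can be sketched as below; the averaging fusion operators and the InfoNCE-style loss are illustrative assumptions standing in for the paper's exact formulation.

```python
import numpy as np

def info_nce(a, b, tau=0.1):
    """InfoNCE-style contrastive loss aligning paired rows of a and b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / tau                       # pairwise cosine similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return -np.log(np.diag(p)).mean()            # matched pairs sit on the diagonal

rng = np.random.default_rng(1)
N, D = 16, 64
text, vision, struct = (rng.normal(size=(N, D)) for _ in range(3))

# Stage 1: spatial feature fusion (modality enhancement) -- here a plain
# average of text and vision features; the paper's operator may differ.
enhanced = 0.5 * (text + vision)

# Stage 2: fuse the enhanced multimodal feature with the structural
# feature produced by the KRL model.
fused = 0.5 * (enhanced + struct)

# Contrastive term pulling an entity's multimodal and structural views together.
loss = info_nce(enhanced, struct)
assert fused.shape == (N, D) and loss > 0
```

Staging the fusion this way keeps the heterogeneous text/vision features from being mixed with structural features in a single step, which is the motivation given for the two-stage design.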
4. Experiment
4.1. Dataset
4.2. Experimental Setup
- TransE is a translation-based embedding model that represents relationships between entities by interpreting them as translations in the embedding space.
- DistMult is a bilinear model that represents relationships as diagonal matrices, capturing interactions between entities through element-wise multiplication.
- ComplEx extends DistMult by using complex-valued embeddings, enabling it to capture asymmetric relations between entities.
- ConvE employs convolutional neural networks to learn interactions between entities and relations, enabling more expressive feature learning.
- CompGCN integrates graph convolutional networks with various composition operations to capture complex interactions between entities and relations.
- RGCN extends traditional GCNs to handle multi-relational data by incorporating relation-specific transformations.
- MKGFormer is a transformer-based model for multimodal knowledge graphs that integrates textual, visual, and structural information, enabling comprehensive feature fusion and enhanced representation learning.
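The first three baselines above have simple closed-form scoring functions, which makes their differences concrete. A minimal sketch (embedding dimension and the synthetic triple are illustrative):

```python
import numpy as np

def transe_score(h, r, t):
    """TransE: plausibility is -||h + r - t||; higher means more plausible."""
    return -np.linalg.norm(h + r - t)

def distmult_score(h, r, t):
    """DistMult: bilinear score with a diagonal relation matrix."""
    return np.sum(h * r * t)

def complex_score(h, r, t):
    """ComplEx: real part of the trilinear product over complex embeddings,
    which lets it score (h, r, t) and (t, r, h) differently."""
    return np.real(np.sum(h * r * np.conj(t)))

rng = np.random.default_rng(2)
h, r = rng.normal(size=50), rng.normal(size=50)
t_true = h + r                 # a tail TransE considers a perfect match
t_false = rng.normal(size=50)  # a random tail
assert transe_score(h, r, t_true) > transe_score(h, r, t_false)
```

Note that DistMult's element-wise product is symmetric in head and tail, which is exactly the limitation ComplEx's complex-valued embeddings address.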
4.3. Results and Analysis
5. Discussion
5.1. Ablation Study
- w/o Multimodal: MGKGR without multimodal data. The input is defined as in Section 3.1 but excludes the multimodal dataset M.
- w/o Structure: MGKGR without the fusion of structural features captured by the KRL model.
- w/o CL: MGKGR without contrastive learning.
- w/o SFF: MGKGR without the spatial feature fusion for modality enhancement.
- w/o CL and SFF: MGKGR without both contrastive learning and the spatial feature fusion for modality enhancement.
5.2. Performance Comparison Across Different Relation Scenarios
5.3. The Impact of Diversity in Relation Semantics
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Conflicts of Interest
References
- Zhang, X.; Huang, Y.; Zhang, C.; Ye, P. Geoscience knowledge graph (GeoKG): Development, construction and challenges. Trans. GIS 2022, 26, 2480–2494.
- Ijumulana, J.; Ligate, F.; Bhattacharya, P.; Mtalo, F.; Zhang, C. Spatial analysis and GIS mapping of regional hotspots and potential health risk of fluoride concentrations in groundwater of northern Tanzania. Sci. Total Environ. 2020, 735, 139584.
- Casali, Y.; Aydin, N.Y.; Comes, T. Machine learning for spatial analyses in urban areas: A scoping review. Sustain. Cities Soc. 2022, 85, 104050.
- Meng, M.; Dabrowski, M.; Stead, D. Enhancing flood resilience and climate adaptation: The state of the art and new directions for spatial planning. Sustainability 2020, 12, 7864.
- Werneck, H.; Silva, N.; Viana, M.C.; Mourão, F.; Pereira, A.C.; Rocha, L. A survey on point-of-interest recommendation in location-based social networks. In Proceedings of the Brazilian Symposium on Multimedia and the Web, São Luís, Brazil, 30 November–4 December 2020; pp. 185–192.
- Islam, M.A.; Mohammad, M.M.; Das, S.S.S.; Ali, M.E. A survey on deep learning based Point-of-Interest (POI) recommendations. Neurocomputing 2022, 472, 306–325.
- Zhao, S.; Zhao, T.; King, I.; Lyu, M.R. Geo-teaser: Geo-temporal sequential embedding rank for point-of-interest recommendation. In Proceedings of the 26th International Conference on World Wide Web Companion, Perth, Australia, 3–7 May 2017; pp. 153–162.
- Grbovic, M.; Cheng, H. Real-time personalization using embeddings for search ranking at Airbnb. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, London, UK, 19–23 August 2018; pp. 311–320.
- Liu, X.; Liu, Y.; Li, X. Exploring the context of locations for personalized location recommendations. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence (IJCAI), New York, NY, USA, 9–15 July 2016; pp. 1188–1194.
- Bijalwan, V.; Semwal, V.B.; Gupta, V. Wearable sensor-based pattern mining for human activity recognition: Deep learning approach. Ind. Robot. Int. J. Robot. Res. Appl. 2022, 49, 21–33.
- Rodrigues, R.; Bhargava, N.; Velmurugan, R.; Chaudhuri, S. Multi-timescale trajectory prediction for abnormal human activity detection. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Snowmass Village, CO, USA, 1–5 March 2020; pp. 2626–2634.
- Bordes, A.; Usunier, N.; Garcia-Duran, A.; Weston, J.; Yakhnenko, O. Translating embeddings for modeling multi-relational data. In Proceedings of the Advances in Neural Information Processing Systems 26 (NIPS 2013), Lake Tahoe, NV, USA, 5–8 December 2013; Volume 26.
- Yang, B.; Yih, W.t.; He, X.; Gao, J.; Deng, L. Embedding entities and relations for learning and inference in knowledge bases. In Proceedings of the International Conference on Learning Representations (ICLR), San Diego, CA, USA, 7–9 May 2015.
- Chen, J.; Hou, H.; Gao, J.; Ji, Y.; Bai, T. RGCN: Recurrent graph convolutional networks for target-dependent sentiment analysis. In Proceedings of the International Conference on Knowledge Science, Engineering and Management, Athens, Greece, 28–30 August 2019; Springer: Berlin/Heidelberg, Germany, 2019; pp. 667–675.
- Qiu, P.; Gao, J.; Yu, L.; Lu, F. Knowledge embedding with geospatial distance restriction for geographic knowledge graph completion. ISPRS Int. J. Geo-Inf. 2019, 8, 254.
- Mai, G.; Janowicz, K.; Cai, L.; Zhu, R.; Regalia, B.; Yan, B.; Shi, M.; Lao, N. SE-KGE: A location-aware knowledge graph embedding model for geographic question answering and spatial semantic lifting. Trans. GIS 2020, 24, 623–655.
- Le-Khac, P.H.; Healy, G.; Smeaton, A.F. Contrastive Representation Learning: A Framework and Review. IEEE Access 2020, 8, 193907–193934.
- Wang, Z.; Zhang, J.; Feng, J.; Chen, Z. Knowledge graph embedding by translating on hyperplanes. In Proceedings of the AAAI Conference on Artificial Intelligence, Québec City, QC, Canada, 27–31 July 2014; Volume 28.
- Lin, Y.; Liu, Z.; Sun, M.; Liu, Y.; Zhu, X. Learning entity and relation embeddings for knowledge graph completion. In Proceedings of the AAAI Conference on Artificial Intelligence, Austin, TX, USA, 25–30 January 2015; Volume 29.
- Ji, G.; He, S.; Xu, L.; Liu, K.; Zhao, J. Knowledge graph embedding via dynamic mapping matrix. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Beijing, China, 26–31 July 2015; pp. 687–696.
- Nickel, M.; Tresp, V.; Kriegel, H.P. A three-way model for collective learning on multi-relational data. In Proceedings of the 28th International Conference on Machine Learning (ICML), Bellevue, WA, USA, 28 June–2 July 2011.
- Trouillon, T.; Welbl, J.; Riedel, S.; Gaussier, É.; Bouchard, G. Complex embeddings for simple link prediction. In Proceedings of the International Conference on Machine Learning, PMLR, New York, NY, USA, 20–22 June 2016; pp. 2071–2080.
- Dettmers, T.; Minervini, P.; Stenetorp, P.; Riedel, S. Convolutional 2D knowledge graph embeddings. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; Volume 32.
- Vashishth, S.; Sanyal, S.; Nitin, V.; Talukdar, P. Composition-based multi-relational graph convolutional networks. In Proceedings of the International Conference on Learning Representations, Addis Ababa, Ethiopia, 26–30 April 2020.
- Schlichtkrull, M.; Kipf, T.N.; Bloem, P.; Van Den Berg, R.; Titov, I.; Welling, M. Modeling relational data with graph convolutional networks. In Proceedings of The Semantic Web: 15th International Conference, ESWC 2018, Heraklion, Crete, Greece, 3–7 June 2018; Springer: Berlin/Heidelberg, Germany, 2018; pp. 593–607.
- Kipf, T.N.; Welling, M. Semi-Supervised Classification with Graph Convolutional Networks. In Proceedings of the International Conference on Learning Representations, Toulon, France, 24–26 April 2017.
- Usmani, A.; Khan, M.J.; Breslin, J.G.; Curry, E. Towards Multimodal Knowledge Graphs for Data Spaces. In Companion Proceedings of the ACM Web Conference 2023, Austin, TX, USA, 30 April–4 May 2023; pp. 1494–1499.
- Kannan, A.V.; Fradkin, D.; Akrotirianakis, I.; Kulahcioglu, T.; Canedo, A.; Roy, A.; Yu, S.Y.; Arnav, M.; Al Faruque, M.A. Multimodal knowledge graph for deep learning papers and code. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, Galway, Ireland, 19–23 October 2020; pp. 3417–3420.
- Liu, Y.; Li, H.; Garcia-Duran, A.; Niepert, M.; Onoro-Rubio, D.; Rosenblum, D.S. MMKG: Multi-modal knowledge graphs. In Proceedings of The Semantic Web: 16th International Conference, ESWC 2019, Portorož, Slovenia, 2–6 June 2019; Springer: Berlin/Heidelberg, Germany, 2019; pp. 459–474.
- Auer, S.; Bizer, C.; Kobilarov, G.; Lehmann, J.; Cyganiak, R.; Ives, Z. DBpedia: A nucleus for a web of open data. In Proceedings of the International Semantic Web Conference, Busan, Republic of Korea, 11–15 November 2007; Springer: Berlin/Heidelberg, Germany, 2007; pp. 722–735.
- Fabian, M.; Gjergji, K.; Gerhard, W. Yago: A core of semantic knowledge unifying WordNet and Wikipedia. In Proceedings of the 16th International World Wide Web Conference (WWW), Banff, AB, Canada, 8–12 May 2007; pp. 697–706.
- Li, X.; Zhao, X.; Xu, J.; Zhang, Y.; Xing, C. IMF: Interactive multimodal fusion model for link prediction. In Proceedings of the ACM Web Conference 2023, Austin, TX, USA, 30 April–4 May 2023; pp. 2572–2580.
- Ben-Younes, H.; Cadene, R.; Cord, M.; Thome, N. Mutan: Multimodal tucker fusion for visual question answering. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2612–2620.
- Fang, Q.; Zhang, X.; Hu, J.; Wu, X.; Xu, C. Contrastive multi-modal knowledge graph representation learning. IEEE Trans. Knowl. Data Eng. 2022, 35, 8983–8996.
- Chen, X.; Zhang, N.; Li, L.; Deng, S.; Tan, C.; Xu, C.; Huang, F.; Si, L.; Chen, H. Hybrid transformer with multi-level fusion for multimodal knowledge graph completion. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, 11–15 July 2022; pp. 904–915.
- Devlin, J. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, MN, USA, 2–7 June 2019.
- Chen, J.; Deng, S.; Chen, H. CrowdGeoKG: Crowdsourced geo-knowledge graph. In Proceedings of Knowledge Graph and Semantic Computing. Language, Knowledge, and Intelligence: Second China Conference, CCKS 2017, Chengdu, China, 26–29 August 2017; Springer: Berlin/Heidelberg, Germany, 2017; pp. 165–172.
- Dsouza, A.; Tempelmeier, N.; Yu, R.; Gottschalk, S.; Demidova, E. WorldKG: A world-scale geographic knowledge graph. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management, Gold Coast, QLD, Australia, 1–5 November 2021; pp. 4475–4484.
- Ning, Y.; Liu, H.; Wang, H.; Zeng, Z.; Xiong, H. UUKG: Unified urban knowledge graph dataset for urban spatiotemporal prediction. In Proceedings of the Advances in Neural Information Processing Systems 36 (NeurIPS 2023), New Orleans, LA, USA, 10–16 December 2023; Volume 36.
- Dosovitskiy, A. An image is worth 16x16 words: Transformers for image recognition at scale. In Proceedings of the International Conference on Learning Representations, Vienna, Austria, 3–7 May 2021.
- Sohn, K. Improved deep metric learning with multi-class n-pair loss objective. In Proceedings of the Advances in Neural Information Processing Systems 29 (NIPS 2016), Barcelona, Spain, 5–10 December 2016; Volume 29.
Statistics of the two datasets. The #Entity, #Relation, and #Triple counts split into attribute and geographic/adjacency sub-counts:

| Dataset | #Attr_Entity | #Geo_Entity | #Attr_Relation | #Adj_Relation | #Attr_Triple | #Adj_Triple |
|---|---|---|---|---|---|---|
| PA-30K | 218 | 31,267 | 87 | 514 | 223,987 | 186,049 |
| FL-25K | 222 | 24,037 | 87 | 360 | 249,222 | 76,326 |
Link prediction results. The first four metric columns report PA-30K, the last four FL-25K:

| Model | Hit@1 | Hit@3 | Hit@10 | MRR | Hit@1 | Hit@3 | Hit@10 | MRR |
|---|---|---|---|---|---|---|---|---|
| TransE | 30.66 | 48.37 | 50.58 | 39.51 | 33.51 | 49.88 | 52.08 | 41.82 |
| DistMult | 23.81 | 32.48 | 34.67 | 28.52 | 33.06 | 43.26 | 45.17 | 38.46 |
| ComplEx | 32.04 | 43.75 | 45.33 | 38.12 | 31.83 | 44.12 | 45.62 | 38.13 |
| ConvE | 19.49 | 26.49 | 30.46 | 24.08 | 20.18 | 25.88 | 29.87 | 24.08 |
| CompGCN | 18.89 | 24.89 | 25.84 | 22.11 | 19.83 | 25.40 | 26.46 | 22.88 |
| RGCN | 32.22 | 46.99 | 49.89 | 39.93 | 35.36 | 47.72 | 49.82 | 41.83 |
| MKGFormer | 34.04 | 48.75 | 51.72 | 41.85 | 36.24 | 48.93 | 51.14 | 42.99 |
| MGKGR (ours) | 35.06 | 49.93 | 52.43 | 42.96 | 36.63 | 50.16 | 52.73 | 43.92 |
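Hit@k and MRR are the standard link-prediction metrics behind these figures: each test triple's correct entity receives a rank among all candidates, and the metrics aggregate those ranks. A minimal computation over synthetic ranks (not the paper's data):

```python
def hits_at_k(ranks, k):
    """Fraction of test triples whose correct entity ranks in the top k."""
    return sum(r <= k for r in ranks) / len(ranks)

def mrr(ranks):
    """Mean reciprocal rank of the correct entity across test triples."""
    return sum(1.0 / r for r in ranks) / len(ranks)

ranks = [1, 3, 2, 10, 1, 50, 4, 1]    # illustrative ranks for 8 test triples
print(round(hits_at_k(ranks, 1), 3))  # -> 0.375
print(round(mrr(ranks), 3))           # -> 0.525
```

MRR rewards high ranks smoothly, while Hit@k only counts whether the correct entity clears the top-k cutoff, which is why papers typically report both.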
Ablation results. The first four metric columns report PA-30K, the last four FL-25K:

| Model | Hit@1 | Hit@3 | Hit@10 | MRR | Hit@1 | Hit@3 | Hit@10 | MRR |
|---|---|---|---|---|---|---|---|---|
| MGKGR | 35.06 | 49.93 | 52.43 | 42.96 | 36.63 | 50.16 | 52.73 | 43.92 |
| w/o Multimodal | 32.22 | 46.99 | 49.89 | 39.93 | 35.36 | 47.72 | 49.82 | 41.83 |
| w/o Structure | 35.00 | 49.15 | 51.55 | 42.38 | 37.03 | 49.87 | 51.61 | 43.76 |
| w/o CL | 35.26 | 49.86 | 52.34 | 43.04 | 35.97 | 49.62 | 52.98 | 43.49 |
| w/o SFF | 35.72 | 49.53 | 52.21 | 43.15 | 36.91 | 49.95 | 52.21 | 43.91 |
| w/o CL & SFF | 35.41 | 49.88 | 52.38 | 43.16 | 34.41 | 48.71 | 52.71 | 43.32 |
Performance in relation-"Rich" and relation-"Sparse" scenarios. The first four metric columns report PA-30K, the last four FL-25K:

| Model | Scenario | Hit@1 | Hit@3 | Hit@10 | MRR | Hit@1 | Hit@3 | Hit@10 | MRR |
|---|---|---|---|---|---|---|---|---|---|
| TransE | "Rich" | 30.66 | 48.37 | 50.58 | 39.51 | 33.51 | 49.83 | 52.08 | 41.82 |
| | "Sparse" | 30.47 | 48.34 | 50.55 | 39.37 | 33.14 | 49.38 | 50.94 | 40.15 |
| DistMult | "Rich" | 23.81 | 32.48 | 34.67 | 28.52 | 33.06 | 42.36 | 45.17 | 38.46 |
| | "Sparse" | 28.41 | 35.25 | 38.56 | 33.64 | 44.05 | 45.98 | 49.38 | 38.58 |
| RGCN | "Rich" | 32.22 | 46.99 | 49.89 | 39.93 | 35.36 | 47.62 | 49.82 | 41.83 |
| | "Sparse" | 31.78 | 44.99 | 49.37 | 39.54 | 34.76 | 47.24 | 50.61 | 40.54 |
| MKGFormer | "Rich" | 34.04 | 48.75 | 51.72 | 41.85 | 36.24 | 48.93 | 51.14 | 42.99 |
| | "Sparse" | 31.51 | 47.22 | 51.43 | 40.56 | 36.39 | 48.94 | 50.88 | 40.44 |
| MGKGR (ours) | "Rich" | 35.06 | 49.93 | 52.43 | 42.96 | 36.63 | 50.16 | 52.73 | 43.92 |
| | "Sparse" | 34.98 | 49.89 | 52.22 | 42.89 | 36.26 | 50.11 | 52.83 | 43.56 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Zhang, J.; Chen, R.; Li, S.; Li, T.; Yao, H. MGKGR: Multimodal Semantic Fusion for Geographic Knowledge Graph Representation. Algorithms 2024, 17, 593. https://doi.org/10.3390/a17120593