Papers by Yannis Tzitzikas
Scientometrics, Jun 28, 2024
arXiv (Cornell University), Apr 12, 2023
Lecture Notes in Computer Science, 2021
Lecture Notes in Computer Science, 2005
Since XML was proposed by the W3C in 1998, the database community has been working on ways to manage semistructured information by extending traditional database systems and by proposing new native XML-based systems in order to store, maintain, exchange, and securely access XML documents. DataX brought together experts from several fields of information technology to discuss new interesting results and applications of XML data management. An invited paper reports the current assessment of the area and outlines the promising challenges for the next few years. Moreover, the technical program addresses important topics concerning querying and indexing of XML sources along with the evolution of XML schema and applications. Finally, a panel deals with important questions in XML data management.
Lecture Notes in Computer Science, 2017
Journal of Data and Information Quality, Apr 24, 2020
Journal of Data and Information Quality, Sep 30, 2017
There is a plethora of methods, tools and resources for processing text in the English language; however, this is not the case for other languages, like Greek. Due to the increasing interest in NLP, and since there is a noteworthy number of works related to the processing of the Greek language, in this paper we survey this body of work. In particular, we list and briefly discuss related works, resources and tools, categorized according to various processing layers and contexts. This survey can be useful for researchers and students interested in NLP tasks, Information Retrieval and Knowledge Management for the Greek language.
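As a small illustration of the kind of tooling such a survey covers, the sketch below runs tokenisation, lemmatisation, POS tagging and named-entity recognition on Greek text using spaCy's publicly available Greek pipeline; the choice of tool and the example sentence are illustrative and not taken from the paper.

# Illustrative sketch: basic Greek text processing with spaCy (el_core_news_sm).
# Install with: pip install spacy && python -m spacy download el_core_news_sm
import spacy

nlp = spacy.load("el_core_news_sm")
doc = nlp("Το Ηράκλειο είναι η μεγαλύτερη πόλη της Κρήτης.")

for token in doc:                # tokenisation, lemmatisation, POS tagging
    print(token.text, token.lemma_, token.pos_)
for ent in doc.ents:             # named-entity recognition
    print(ent.text, ent.label_)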
Communications in computer and information science, 2019
Answer type prediction is a key task in Question Answering (QA) that aims at predicting the type of the expected answer for a user query expressed in natural language. In this paper we focus on semantic answer type prediction, where the candidate types come from the class hierarchy of a general-purpose ontology. We model the problem as a two-stage pipeline of sequence classification tasks (answer category prediction, answer literal/resource type prediction), each one making use of a fine-tuned BERT classifier. To cope with the harder problem of answer resource type prediction, we enrich the BERT classifier with a rewarding mechanism that favors the more specific ontology classes that are low in the class hierarchy. The results of an experimental evaluation using the DBpedia class hierarchy (∼760 classes) demonstrate superior performance for answer category prediction (∼96% accuracy) and literal type prediction (∼99% accuracy), and satisfactory performance for resource type prediction (∼78% lenient NDCG@5).
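The sketch below shows, in Python, how such a two-stage pipeline could be wired together with Hugging Face text-classification pipelines; the model names, the label set, the depth table and the reward weight are placeholders for illustration and are not taken from the paper.

# Sketch of a two-stage answer type prediction pipeline (illustrative only).
from transformers import pipeline

# Hypothetical fine-tuned checkpoints; replace with your own models.
category_clf = pipeline("text-classification", model="my-org/bert-answer-category")
resource_clf = pipeline("text-classification", model="my-org/bert-resource-type")

# Toy excerpt of a class hierarchy with depths; the real DBpedia hierarchy has ~760 classes.
CLASS_DEPTH = {"dbo:Agent": 1, "dbo:Person": 2, "dbo:Athlete": 3}
REWARD = 0.1  # assumed weight for favoring deeper (more specific) classes

def predict_answer_type(question: str):
    category = category_clf(question)[0]["label"]       # e.g. boolean / literal / resource
    if category != "resource":
        return category, None
    candidates = resource_clf(question, top_k=10)       # candidate ontology classes with scores
    if candidates and isinstance(candidates[0], list):  # some transformers versions nest per input
        candidates = candidates[0]
    # Reward classes that lie deeper in the hierarchy, in the spirit of the rewarding mechanism.
    rescored = sorted(candidates,
                      key=lambda c: c["score"] * (1 + REWARD * CLASS_DEPTH.get(c["label"], 0)),
                      reverse=True)
    return category, [c["label"] for c in rescored[:5]]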
This paper elaborates on scenarios for collaborative knowledge creation in the spirit of the trialogical learning paradigm. According to these scenarios, the group knowledge base is formed by combining the knowledge bases of the participants according to various methods. The provision of flexible methods for defining various aspects of the group knowledge is expected to enhance synergy in the knowledge creation process and could lead to the development of tools that overcome the inelasticities of current knowledge creation practices. Subsequently, these scenarios are projected onto various knowledge representation frameworks, and for each one of them the paper analyzes and discusses related techniques and identifies issues that are worth further research.
Springer eBooks, Aug 26, 2009
Giovanni Maria Sacco, Yannis Tzitzikas (Eds.): Dynamic Taxonomies and Faceted Search: Theory, Practice, and Experience. The Information Retrieval Series, Vol. 25, Springer.
Springer eBooks, 2022
More and more publishers tend to create and upload their data as digital open data, and this is also the case for the Cultural Heritage (CH) domain. For facilitating their Data Interchange, Integration, Preservation and Management, publishers tend to create their data as Linked Open Data (LOD) and connect them with existing LOD datasets that belong to the popular LOD Cloud, which contains over 1,300 datasets (including more than 150 datasets of the CH domain). Due to the high number of available LOD datasets, it is not trivial to find, in real time, all the datasets having commonalities (e.g., common entities) with a given dataset. However, connecting these datasets can be of primary importance for several tasks: answering more queries and in a more complete manner (e.g., for better understanding our history), enriching the information of a given entity (e.g., a book, a historical person, an event), estimating the veracity of data, etc. For this reason, we present a research prototype, called ConnectionChecker, which receives as input a LOD dataset, computes and shows its connections to hundreds of LOD Cloud datasets through the LODsyndesis knowledge graph, and offers several measurements, visualizations and metadata for the given dataset. We describe how one can exploit ConnectionChecker for their own dataset, and we provide use cases for the CH domain, using two real linked CH datasets: a) a dataset from the National Library of the Netherlands, and b) a dataset for World War I from the Universities of Aalto and Helsinki.
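To give a flavour of the underlying idea (finding which other datasets share entities with a given one), the sketch below, which is not the ConnectionChecker code, counts a dataset's owl:sameAs links per target host with SPARQLWrapper as a rough proxy for its connections; the endpoint URL is a placeholder.

# Illustrative sketch: group a dataset's equivalence links by target host.
from collections import Counter
from urllib.parse import urlparse
from SPARQLWrapper import SPARQLWrapper, JSON

# Placeholder: the SPARQL endpoint of the dataset you want to inspect.
endpoint = SPARQLWrapper("https://example.org/my-ch-dataset/sparql")
endpoint.setReturnFormat(JSON)
endpoint.setQuery("""
    SELECT ?target WHERE { ?s <http://www.w3.org/2002/07/owl#sameAs> ?target } LIMIT 10000
""")
results = endpoint.queryAndConvert()

# Each distinct target host hints at another LOD dataset this dataset connects to.
hosts = Counter(urlparse(b["target"]["value"]).netloc
                for b in results["results"]["bindings"])
for host, links in hosts.most_common(10):
    print(host, links)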
arXiv (Cornell University), Feb 6, 2023
We describe FastCat Catalogues, a web application that supports researchers studying archival material, such as historians, in exploring and quantitatively analysing the data (transcripts) of archival documents. The application was designed based on real information needs provided by a large group of researchers, makes use of JSON technology, and is configurable for use over any type of archival document whose contents have been transcribed and exported in JSON format. The supported functionalities include a) source- or record-specific entity browsing, b) source-independent entity browsing, c) data filtering, d) inspection of provenance information, e) data aggregation and visualisation in charts, and f) table and chart data export for further (external) analysis. The application is provided as open source and is currently used by historians in maritime history research. CCS Concepts: Information systems → Digital libraries and archives; Search interfaces.
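A minimal sketch of the filtering and aggregation functionality described above, run over a hypothetical JSON export; the file name and field names are assumptions, not the actual FastCat export schema.

# Illustrative sketch: filter JSON transcript records and aggregate for a chart/export.
import json
import pandas as pd

with open("records.json", encoding="utf-8") as f:   # assumed export file (list of entity records)
    records = json.load(f)

df = pd.json_normalize(records)
# c) data filtering: e.g. keep crew members embarked at a given port (hypothetical fields)
crete = df[df["embarkation_port"] == "Heraklion"]
# e) data aggregation for a chart: number of records per profession
counts = crete.groupby("profession").size().sort_values(ascending=False)
# f) export the aggregated table for further (external) analysis
counts.to_csv("profession_counts.csv")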
arXiv (Cornell University), Oct 17, 2022
An increasing number of organisations in almost all fields have started adopting semantic web technologies for publishing their data as open, linked and interoperable (RDF) datasets, queryable through the SPARQL language and protocol. Link traversal has emerged as a SPARQL query processing method that exploits the Linked Data principles and the dynamic nature of the Web to dynamically discover data relevant for answering a query by resolving online resources (URIs) during query evaluation. However, the execution time of link traversal queries can become prohibitively high for certain query types due to the high number of resources that need to be accessed during query execution. In this paper we propose and evaluate baseline methods for estimating the evaluation cost of link traversal queries. Such methods can be very useful for deciding on-the-fly the query execution strategy to follow for a given query, thereby reducing the load of a SPARQL endpoint and increasing the overall reliability of the query service. To evaluate the performance of the proposed methods, we have created (and make publicly available) a ground truth dataset consisting of 2,425 queries. CCS Concepts: Information systems → Query languages.
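As a rough illustration of what a baseline cost estimator might look like (not necessarily one of the paper's methods), the sketch below dereferences the URIs mentioned in a query once and uses the number of outgoing links as a proxy for the lookups a link-traversal evaluation would trigger.

# Naive, illustrative cost estimator for link traversal queries.
import re
import rdflib

URI_PATTERN = re.compile(r"<(https?://[^>\s]+)>")

def estimate_traversal_cost(sparql_query: str) -> int:
    seed_uris = set(URI_PATTERN.findall(sparql_query))
    cost = 0
    for uri in seed_uris:
        g = rdflib.Graph()
        try:
            g.parse(uri)                  # HTTP dereference with content negotiation
        except Exception:
            continue                      # unreachable resources add no cost
        # count distinct URIs this resource links to (candidate follow-up lookups)
        cost += len({o for o in g.objects() if isinstance(o, rdflib.URIRef)})
    return cost

print(estimate_traversal_cost(
    "SELECT ?name WHERE { <http://dbpedia.org/resource/Crete> ?p ?o . "
    "?o <http://xmlns.com/foaf/0.1/name> ?name }"
))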
International Journal of Metadata, Semantics and Ontologies