AILC Abstract2

This paper presents work on automatic text simplification for Italian. It creates a new corpus for Italian text simplification by merging existing resources. It also fine-tunes a transformer model for sentence simplification, achieving state-of-the-art results for Italian. Additionally, it attempts to create an adaptive model that can simplify text according to specific target populations based on parameterized grammatical features. The baseline simplification model achieves a SARI score of 51.51 on test data from the new corpus, improving the state of the art. The adaptive model achieves the highest reported SARI score of 60.12 for a controllable Italian text simplification system.


See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/371156476

Controllable Sentence Simplification with a Unified Text-to-Text Transfer Transformer

Conference Paper · May 2023


All content following this page was uploaded by Martina Galletti on 30 May 2023.


Automatic Text Simplification for Italian poor readers and comprehenders: a new
corpus, a model and an adaptive component

Francesca Padovani [1][2], Martina Galletti* [1][3][5] & Daniele Nardi [3][4][5]
[1] Sony Computer Science Laboratories-Paris (Sony CSL - Paris), France
[2] University of Trento, Italy
[3] Sapienza University of Rome, Italy
[4] CINI-AIIS, Italy
[5] Centro di Studi e Ricerche Enrico Fermi, Italy

*martina.galletti@sony.com

Automatic Text Simplification (ATS) is the process of modifying a text to reduce its overall linguistic complexity.
Automating this process requires a number of non-trivial operations: assessing the complexity of the source text,
identifying its fundamental words and passages, and modifying these elements appropriately in the subsequent
simplification stages, at the level of vocabulary, syntax or discourse. Several studies have proposed methodologies
for tackling the task in English, but other languages, such as Italian, are less explored. This is due not only to
the limited amount of data available but also to the poor quality of the accessible data. For Italian there are only
two small manually curated datasets [1] and only one large corpus [2], PaCCSS-IT, created with a data-driven
approach. Moreover, most ATS systems produce the same output for every target group, whereas different categories
of people, such as those with cognitive and linguistic disabilities, may benefit from a text simplified according to
their vulnerabilities. The contribution of this work is threefold. First, we built a new enriched corpus of parallel
complex/simple sentences for Italian, robust in quality and large in size, by merging PaCCSS-IT with the existing
manually curated resources [3], a small dataset harvested semi-automatically from the Italian Wikipedia [4], and
sentences translated from an English dataset. Second, we fine-tuned a transformer-based encoder-decoder model
inspired by the state of the art available for English [5]. Finally, we attempted to parameterise grammatical text
features to control simplifications, with the goal of making them adaptive to a specific target population. In
evaluation, the baseline sentence simplification model achieved a SARI value of 51.51 on the test set of the corpus
we built, improving the state of the art for Italian by 1.51 points. The adaptive model, a first attempt, reached a
SARI value of 60.12, the highest score obtained for a controllable simplification system for Italian text.
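
The idea of controlling a simplification through parameterised features can be illustrated with a small sketch. It assumes the control-token style used with text-to-text transformers, where feature values are prepended to the source sentence as plain text. The token names `C_` and `W_`, the `simplify:` prefix, the two ratios, and the example sentences are all hypothetical choices made for illustration; the abstract does not state which features or token format the authors used.

```python
# Minimal sketch of control-token preprocessing for a text-to-text model.
# Everything here (token names, prefix, chosen features) is an assumption
# for illustration, not the authors' actual setup.


def char_ratio(source: str, target: str) -> float:
    """Character-length ratio of target to source, rounded to 2 decimals."""
    if not source:
        raise ValueError("source sentence must not be empty")
    return round(len(target) / len(source), 2)


def word_ratio(source: str, target: str) -> float:
    """Word-count ratio of target to source, rounded to 2 decimals."""
    source_words = source.split()
    if not source_words:
        raise ValueError("source sentence must contain at least one word")
    return round(len(target.split()) / len(source_words), 2)


def with_control_prefix(source: str, chars: float, words: float) -> str:
    """Prepend hypothetical control tokens to the source sentence."""
    return f"simplify: C_{chars:.2f} W_{words:.2f} {source}"


def training_input(source: str, target: str) -> str:
    """At training time the ratios are measured on the parallel pair."""
    return with_control_prefix(
        source, char_ratio(source, target), word_ratio(source, target)
    )


if __name__ == "__main__":
    # Invented complex/simple pair, used only to show the input format.
    complex_sentence = "Il provvedimento è stato adottato in ottemperanza alle norme vigenti."
    simple_sentence = "La decisione segue le regole attuali."
    print(training_input(complex_sentence, simple_sentence))

    # At inference time the ratios are chosen by hand, for example a
    # stronger compression for a reader who needs shorter sentences.
    print(with_control_prefix(complex_sentence, chars=0.60, words=0.60))
```

In a setup like this, the ratios are computed from each complex/simple pair during fine-tuning and then set manually at inference time, which is one way a single model could be steered towards the needs of a particular target population.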

[1] Brunato, D., Dell’Orletta, F., Venturi, G., & Montemagni, S. (2015, June). Design and annotation of the first
Italian corpus for text simplification. In Proceedings of The 9th Linguistic Annotation Workshop (pp. 31-41).
[2] Brunato, D., Cimino, A., Dell’Orletta, F., & Venturi, G. (2016, November). PaCCSS-IT: A parallel corpus of
complex-simple sentences for automatic text simplification. In Proceedings of the 2016 Conference on Empirical
Methods in Natural Language Processing (pp. 351-361).
[3] Brunato, D., Dell’Orletta, F., Venturi, G., & Montemagni, S. (2015, June). Design and annotation of the first
Italian corpus for text simplification. In Proceedings of The 9th Linguistic Annotation Workshop (pp. 31-41).
[4] Tonelli, S., Aprosio, A. P., & Saltori, F. (2016). SIMPITIKI: a Simplification corpus for Italian. In
CLiC-it/EVALITA (pp. 4333-4338).
[5] Sheang, K. C., & Saggion, H. (2021, August). Controllable Sentence Simplification with a Unified
Text-to-Text Transfer Transformer. In Proceedings of the 14th International Conference on Natural Language
Generation (pp. 341-352).
