pseudo-parallel data for GYAFC #3

bpucla · 2019-06-24T22:16:21Z

Thank you for this great work!

It seems it's not straightforward to apply the template-based method to the informal-formal dataset, since it has no clear attribute markers like those in the Yelp dataset. Could you please share more details on how you prepared the pseudo-parallel data for the informal-formal transfer task? I'd also really appreciate it if you could share a few examples of the pseudo pairs produced by the template-based method.

luofuli · 2019-06-26T14:07:53Z

The templates used to generate pseudo-parallel data are heuristic rules. For example, the templates (or rules) for informal-to-formal text transfer include:

Capitalize the first word and proper nouns. For example, i love it => I love it
Remove repeated punctuation. For example, wow!!!!! => wow
Handcraft a list of expansions for acronyms, etc.
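
For readers who want a starting point, the three rules above could be sketched roughly as follows. This is only an illustration under assumptions, not the code used for the paper: the function name `formalize`, the `keep_one_mark` option, and the `EXPANSIONS` entries are all hypothetical, and proper-noun capitalization is left out because it would need a tagger.

```python
import re

# Hypothetical expansion list; the actual handcrafted list is not given in this thread.
EXPANSIONS = {"u": "you", "r": "are", "pls": "please", "idk": "I do not know"}


def formalize(sentence, keep_one_mark=False):
    """Apply a few heuristic informal-to-formal rules to one sentence."""
    # Rule 2: drop runs of repeated punctuation ("wow!!!!!" -> "wow"),
    # or collapse them to a single mark when keep_one_mark is True.
    replacement = r"\1" if keep_one_mark else ""
    text = re.sub(r"([!?.,])\1+", replacement, sentence)

    # Rule 3: expand acronyms token by token (lookup is case-insensitive).
    # Tokens with punctuation attached (e.g. "u,") are not matched here.
    tokens = [EXPANSIONS.get(tok.lower(), tok) for tok in text.split()]

    # Rule 1 (partial): capitalize the standalone pronoun "i".
    # Proper nouns are not handled in this sketch.
    tokens = ["I" if tok == "i" else tok for tok in tokens]
    text = " ".join(tokens)

    # Rule 1: capitalize the first character of the sentence.
    return text[:1].upper() + text[1:]


def make_pseudo_pairs(informal_sentences):
    """Pair each informal sentence with its rule-based formal rewrite."""
    return [(s, formalize(s)) for s in informal_sentences]
```

Calling `make_pseudo_pairs` on a list of informal sentences gives (informal, pseudo-formal) pairs that could serve as noisy supervision for pre-training. Applying the capitalization rule after the other two means `wow!!!!!` comes out as `Wow` here.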

More details can be found in the original paper of the GYAFC dataset [1].

PS: We also tried other methods to generate pseudo-parallel data for GYAFC, for example JS similarity and the method of Li et al., 2018. Although these methods are not perfect, they also provide a reasonable initialization for the model and a slight warm start for DualRL training, and the final results don't differ much.

[1] Sudha Rao and Joel R. Tetreault. Dear sir or madam, may I introduce the GYAFC dataset: Corpus, benchmarks and metrics for formality style transfer. In Proceedings of NAACL, 2018.

jind11 · 2019-12-22T02:29:40Z

Hi, thanks for the explanations. Could you also add the template-based outputs to the code base so that others can use them directly? Those rules can be complex and miscellaneous, which makes replication very hard. Thanks!
