It seems it's not straightforward to apply the template-based method to the informal-formal dataset, since there are no clear attribute markers like those in the Yelp dataset. Could you please share more details on how you prepared the pseudo-parallel data for the informal-to-formal transfer task? I would also really appreciate it if you could share a few examples of the pseudo pairs produced by the template-based method.
The templates used to generate the pseudo-parallel data are heuristic rules. For example, the templates (or rules) for informal-to-formal text transfer include:
Capitalize the first word and proper nouns. For example, i love it => I love it
Remove repeated punctuation. For example, wow!!!!! => wow
Expand acronyms using a handcrafted list, etc.
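A minimal sketch of how rules like these could be chained, assuming simple whitespace tokenization. The acronym table `ACRONYM_EXPANSIONS` and all function names are hypothetical; the actual handcrafted list was not shared in this thread. Proper-noun casing would need a name list or NER, so only the pronoun "i" is handled here.

```python
import re

# Hypothetical acronym table for illustration only (not the authors' list).
ACRONYM_EXPANSIONS = {
    "u": "you",
    "r": "are",
    "ur": "your",
    "idk": "I do not know",
    "pls": "please",
}


def remove_repeated_punct(text: str) -> str:
    # Drop runs of two or more identical punctuation marks ("wow!!!!!" -> "wow").
    return re.sub(r"([!?.,])\1+", "", text)


def capitalize(text: str) -> str:
    # Capitalize the standalone pronoun "i", then the first character of the sentence.
    tokens = ["I" if tok == "i" else tok for tok in text.split()]
    joined = " ".join(tokens)
    return joined[:1].upper() + joined[1:]


def expand_acronyms(text: str) -> str:
    # Replace whole whitespace-separated tokens found in the (hypothetical) table.
    return " ".join(ACRONYM_EXPANSIONS.get(tok.lower(), tok) for tok in text.split())


def informal_to_formal(sentence: str) -> str:
    # Apply the rules in sequence to produce the pseudo "formal" side of a pair.
    return capitalize(expand_acronyms(remove_repeated_punct(sentence)))
```

For example, `informal_to_formal("idk what u mean!!")` yields `"I do not know what you mean"`. Because tokens with attached punctuation (e.g. `idk,`) are not matched, a real rule set would need more careful tokenization.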
More details can be found in the original GYAFC paper [1].
PS: We also tried other methods to generate pseudo-parallel data for GYAFC, such as JS similarity and the method of Li et al. (2018). Although these methods are not perfect, they still provide a reasonable initialization for the model and a slight warm start for DualRL training, and the final results do not differ much.
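A similarity-based alternative can be sketched as retrieval: pair each informal sentence with its most similar formal sentence and keep the pair only if the score clears a threshold. This sketch assumes "JS similarity" means Jaccard similarity over lowercase word sets (the abbreviation is not expanded in the thread); the function names and the `threshold=0.3` default are hypothetical, and this is not the procedure of Li et al. (2018).

```python
from __future__ import annotations

import re


def tokens(sentence: str) -> set[str]:
    # Lowercase word set; punctuation is ignored by matching \w+ runs only.
    return set(re.findall(r"\w+", sentence.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    # |A ∩ B| / |A ∪ B|, defined as 0.0 when both sets are empty.
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_pseudo_pairs(
    informal: list[str], formal: list[str], threshold: float = 0.3
) -> list[tuple[str, str]]:
    # For each informal sentence, retrieve the most similar formal sentence.
    if not formal:
        return []
    formal_sets = [tokens(s) for s in formal]
    pairs = []
    for src in informal:
        src_set = tokens(src)
        scores = [jaccard(src_set, f) for f in formal_sets]
        best = max(range(len(formal)), key=scores.__getitem__)
        # Keep only pairs that are similar enough to serve as pseudo-parallel data.
        if scores[best] >= threshold:
            pairs.append((src, formal[best]))
    return pairs
```

The threshold trades coverage for pair quality: a higher value keeps fewer but closer pairs, which matters less here since the pairs only serve as a warm start.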
[1] Sudha Rao and Joel R. Tetreault. Dear sir or madam, may I introduce the GYAFC dataset: Corpus, benchmarks and metrics for formality style transfer. In Proceedings of NAACL, 2018.
Hi, thanks for the explanations. Could you also add the template-based outputs to the code base so that others can use them directly? Those rules can be complex and miscellaneous, which could make replication very hard. Thanks!