
Statement of Purpose

Shuyan Zhou
shuyanzh@cs.cmu.edu

“Shuyan, this is the best gift you have ever given me!” my mother said excitedly after using a smart speaker for half a year. She told me that the device significantly simplifies her life: she interacts with it through natural language instead of clicking and typing. I am delighted that the field I work in is changing people’s lives. However, such technology still has substantial room for improvement. For example, the agent’s Chinese support lags far behind its English support, and it failed to retrieve relevant information when my mother asked questions related to her job. I want to contribute more to the NLP community so that people, no matter what language they speak or which domain they care about, have equal access to language technology. With this in mind, my research goal is to leverage rich knowledge to build robust and generalizable NLP tools, which has led me to two primary questions: 1) how can external knowledge be encoded into models across scenarios with different constraints? 2) how can learning algorithms be designed to use the available supervision efficiently?

1 Generalizable Knowledge Representation across Languages

Knowledge representation first caught my attention when I worked on entity linking (a.k.a. named entity disambiguation) for English short texts (e.g., tweets) with Chin-Yew Lin at Microsoft Research Asia [1]. The most challenging part of this project was encoding rich semantic information about entities into embeddings. We addressed this problem by representing an entity as a collection of word n-gram embeddings and interactively matching these embeddings against the n-gram embeddings of the queried named entity’s context. An ablation study confirmed the benefit of pretrained word embeddings, which initialize every word with rich semantic information.

However, the situation was less favorable when I worked on low-resource cross-lingual entity linking (XEL) as part of the DARPA LORELEI program with Graham Neubig and Jaime Carbonell at Carnegie Mellon University [2]. The resources required to encode entities are simply unavailable in low-resource settings; for example, the lack of monolingual text corpora makes it impossible to obtain high-quality (multilingual) word embeddings. Given these observations, I asked myself: can we design an entity representation that applies to all languages, regardless of resource availability? Instead of relying on target-language data, we leveraged ubiquitously available structured knowledge resources such as English Wikipedia to jointly disambiguate all named entities in a given document. Our new entity representation is fully language-agnostic and can therefore be applied to any language.

2 Robust Learning Algorithms with Limited Supervision

With language-independent knowledge representations in hand, an immediate question was how to train a model on these representations with little or no annotation in the low-resource language. I pursued our goal of building a language-independent XEL model that requires zero resources in the low-resource language. The idea is to design the model under the transfer learning paradigm [2]: train on a high-resource language and apply the model directly to the target low-resource language without further fine-tuning. We applied the same technique in [3], resulting in an end-to-end zero-shot XEL system. Our work significantly extends the capabilities of classic XEL systems towards a truly zero-resource, language-invariant pipeline.

Beyond exploring transfer learning to build NLP models that generalize across languages, I also delved into multi-task learning and data augmentation for robust NLP [4]. It is well known that perturbations that seem minor to humans (e.g., typographical and grammatical errors) can hurt a neural machine translation (NMT) model’s performance. However, supervision from the “noisy domain” is often too sparse to train a good NMT model, and it is hard to manually design generalizable denoising rules. To solve this problem, we augmented the large amount of clean out-of-domain data and designed a cascaded multi-task Transformer that first cleans the noisy source sentence and then performs translation. Our approach fully automates the denoising process by providing intermediate supervision.

3 Future Plans

My ambition is to push technological boundaries and eventually develop generalizable natural language understanding agents for everyday use. I believe that a truly intelligent agent requires 1) rich, largely environment-invariant external knowledge for understanding the laws of the world, and 2) models that can properly inject this knowledge into different scenarios alongside environment-specific information. I am fascinated by the research questions behind these requirements, and I am strongly motivated to apply for the Ph.D. program at Carnegie Mellon University. As a Ph.D. student, I hope to create large-scale machine-understandable knowledge bases by harvesting external knowledge from diverse resources (e.g., Stack Overflow, WikiHow, YouTube). More specifically, I hope to use information extraction techniques to extract entities and relations; semantic parsing to model procedural knowledge (e.g., code generation from natural language); and multimodal techniques to encode demonstrations (e.g., video-text embeddings). Further, I hope to design models that can customize this knowledge for different environments, which requires an interplay between language (e.g., housekeeping instructions) and the environment. I am interested in building models that exploit the properties of knowledge (e.g., compositionality, transitivity) to retrieve related knowledge. The agent could then jointly reason over this knowledge, the meaning of the language, and the changing state of the environment during execution. These techniques could broadly benefit tasks such as instruction following, machine reading comprehension, and question answering across different domains.

In light of this, I hope to work with Graham Neubig, Yonatan Bisk, Emma Strubell, and Eduard Hovy. I believe that I will add significant value to the CMU NLP community and receive, in return, the invaluable opportunity to achieve my research goals in a unique and diverse intellectual environment. I hope that one day I can proudly tell my mom, “Look, Mom, my ideas and efforts created your favorite product.”

References

[1] Feng Nie, Shuyan Zhou, Jing Liu, Jinpeng Wang, Chin-Yew Lin, and Rong Pan. Aggregated Semantic Matching for Short Text Entity Linking. In CoNLL, 2018.

[2] Shuyan Zhou, Shruti Rijhwani, and Graham Neubig. Towards Zero-resource Cross-lingual Entity Linking. In DeepLo, 2019.

[3] Shuyan Zhou, Shruti Rijhwani, John Wieting, Jaime Carbonell, and Graham Neubig. Improving Candidate Generation for Low-resource Cross-lingual Entity Linking. In TACL, 2020 (to appear).

[4] Shuyan Zhou, Xiangkai Zeng, Yingqi Zhou, Antonios Anastasopoulos, and Graham Neubig. Improving Robustness of Neural Machine Translation with Multi-task Learning. In WMT, 2019.
