26595-Article Text-30658-1-2-20230626

The Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI-23)
ConvNTM: Conversational Neural Topic Model

Hongda Sun,1, * Quan Tu,1, * Jinpeng Li,2 Rui Yan1,3,†
1
Gaoling School of Artificial Intelligence, Renmin University of China
2
Wangxuan Institute of Computer Technology, Peking University
3
Engineering Research Center of Next-Generation Intelligent Search and Recommendation, Ministry of Education
{sunhongda98, quantu}@ruc.edu.cn, lijinpeng@stu.pku.edu.cn, ruiyan@ruc.edu.cn
Abstract we assume that there are different characteristics to model

topics in different textual analysis tasks. For general topic
Topic models have been thoroughly investigated for multi-
modeling on long documents, each document is typically
ple years due to their great potential in analyzing and un-
derstanding texts. Recently, researchers combine the study assigned a topic distribution, and words in the document
of topic models with deep learning techniques, known as are iteratively generated based on the distribution (Blei, Ng,
Neural Topic Models (NTMs). However, existing NTMs are and Jordan 2003; Miao, Grefenstette, and Blunsom 2017;
mainly tested based on general document modeling with- Dieng, Ruiz, and Blei 2020). For short-text topic model-
out considering different textual analysis scenarios. We as- ing, since the word co-occurrence information is limited, the
sume that there are different characteristics to model topics sparsity problem should be considered during topic extrac-
in different textual analysis tasks. In this paper, we propose tion (Cheng et al. 2014; Zhu, Feng, and Li 2018; Lin, Hu,
a Conversational Neural Topic Model (ConvNTM) designed and Guo 2019). While in the conversational scenario, topic
in particular for the conversational scenario. Unlike the gen- modeling is even more complicated with the following two
eral document topic modeling, a conversation session lasts
unique properties to discover topics: 1) A conversation ses-
for multiple turns: each short-text utterance complies with a
single topic distribution and these topic distributions are de- sion generally consists of multiple turns of short-text utter-
pendent across turns. Moreover, there are roles in conversa- ances (Zhang et al. 2019; Adiwardana et al. 2020), which
tions, a.k.a., speakers and addressees. Topic distributions are usually follow different topic distributions (Sun, Loparo, and
partially determined by such roles in conversations. We take Kolacinski 2020). A simple operation of utterance concate-
these factors into account to model topics in conversations nation as a long document—which is the way of existing
via the multi-turn and multi-role formulation. We also lever- NTMs—leads to the omission of dialogue structural infor-
age the word co-occurrence relationship as a new training ob- mation in topic modeling. As a matter of fact, utterances
jective to further improve topic quality. Comprehensive ex- from different turns are connected and topic distributions are
perimental results based on the benchmark datasets demon- dependent across turns. 2) There are multiple roles within a
strate that our proposed ConvNTM achieves the best per-
conversation session, speakers and addressees (Holtgraves,
formance both in topic modeling and in typical downstream
tasks within conversational research (i.e., dialogue act classi- Srull, and Socall 1989). A series of studies indicate that such
fication and dialogue response generation). roles are essential in keeping the topic consistency and con-
tent coherence within a conversation (Kim and Vossen 2021;
Ma, Zhang, and Zhao 2021). Without the modeling of the
Introduction conversational structure with multiple roles, it is likely that
Topic models are used to discover abstract topics in a series the topic discovery will be compromised due to the missing
of documents to understand the latent semantics of a text consistency and coherence in dialogue understanding.
corpus (Hofmann 1999; Blei, Ng, and Jordan 2003). With To this end, we propose a Conversational Neural Topic
the recent development of neural networks and generative Model (ConvNTM) which is in particular designed for the
models, various neural topic models (NTMs) have been pro- conversational scenario with the mentioned characteristics
posed and applied in document classification, retrieval, se- formulated in topic modeling. Specifically, we develop a hi-
mantic analysis, etc (Larochelle and Lauly 2012; Dieng et al. erarchical conversation encoder to capture the multi-turn di-
2017; Zhao et al. 2021). alogue structure. A sequence encoder is utilized to model
Most existing NTMs are designed for document analysis. the conversation contexts and extract utterance-level repre-
Their main modeling scenarios lie in news articles or social sentations for the role modeling of speakers and addressees.
platform posts (Lang 1995; Li et al. 2016), with less consid- Then we construct a multi-role interaction graph to model
eration on various other textual analysis scenarios. However, speaker/addressee information from two perspectives. On
* These authors contributed equally. the one hand, different roles hold personalized topic distri-
†
Corresponding author: Rui Yan (ruiyan@ruc.edu.cn) butions and they need to integrate the intra-speaker informa-
Copyright © 2023, Association for the Advancement of Artificial tion in their utterances to determine the current topic. All
Intelligence (www.aaai.org). All rights reserved. utterances from a particular speaker should be consistent on
13609
the topic distribution to avoid contradictions. On the other ordinated and jointly trained with the neural variational
hand, a speaker can decide whether to keep or change the inference objective to further improve topic coherence.
topic for themselves based on the utterances of other speak-
ers. That is, topic maintenance and switching in a conversa-
tion continue under the inter-speaker interaction. We employ Related Work
a graph neural network to reason the speaker graph and in- Topic Model
tegrate intra-speaker and inter-speaker dependencies among
utterances. The graph encoder and the sequence encoder co- Topic modeling has always been a catalyst for other research
operate to adequately capture the hierarchical structure of areas in Natural Language Process (NLP) (Panwar et al.
the conversation. The learned representations of the graph 2020; Jin et al. 2021; Srivastava and Sutton 2016). A classic
encoder are incorporated into the topic modeling process. statistical topic model is Latent Dirichlet Allocation (LDA),
Considering the structural properties of the conversation, which is based on Gibbs sampling to extract topics from doc-
we make reasonable assumptions on the topic distribution. uments (Blei, Ng, and Jordan 2003). With the development
First, to prevent confusion from modeling the entire con- of deep generative models, it has led to the study of neu-
versation with a single topic, we perform fine-grained topic ral topic models (NTMs) (Miao, Grefenstette, and Blunsom
modeling by assuming that each utterance compiles with a 2017; Zhu, Feng, and Li 2018; Wang, Zhou, and He 2019).
specific topic distribution. These distributions are mutually Variational Autoencoder (VAE) (Kingma and Welling 2013)
influenced across multiple turns. Additionally, the topic dis- is the most widely used framework for NTMs. GSM (Miao,
tribution of each utterance is assumed to rely on both global Grefenstette, and Blunsom 2017) replaces the prior with a
and local topic information. We assign each speaker a global Gaussian softmax function. ProdLDA (Srivastava and Sut-
topic distribution as a specific role. Then the local topic in- ton 2017) constructs a Laplace approximation to the Dirich-
formation in each utterance will be extracted and interacted let prior. ETM (Dieng, Ruiz, and Blei 2020) shares the
with the global role information to produce final topic dis- embedding space between words and topics. GNTM (Shen
tribution. Based on the novel graphical model of ConvNTM, et al. 2021) adds the document graph into the generative pro-
corresponding neural variational inference methods are car- cess of topic modeling. With the progress of social platforms
ried out for model learning. Furthermore, to further improve (e.g. Microblog and Twitter), application-oriented NTMs
topic coherence, we leverage the word co-occurrence infor- keep pouring out. LeadLDA (Li et al. 2016) considers the
mation as a new training objective, which can be jointly tree structure based on the re-posts and replying relations.
trained with the original objective of neural variational in- ForumLDA (Chen and Ren 2017) cooperatively models the
ference. The ConvNTM that grasps the word co-occurrence evolution of a root post, as well as its relevant and irrelevant
relationship can make related words tend to be clustered into response posts to detect topics. In these posts, people always
the same topic, which helps to obtain higher quality topic- discuss a single hot topic. While in our target conversation
word distributions. scenario, speakers with different roles may switch topics in
We run experiments based on the public benchmark con- multiple turns.
versational datasets, DailyDialog and EmpatheticDialogues.
Our proposed ConvNTM achieves the best performance on Multi-Turn Dialogue
topic modeling in terms of topic coherence and quality met- The simple concatenation of multi-turn dialogue contexts
rics, which indicates that ConvNTM has better topic inter- performs poorly since it makes the latent dialogue struc-
pretability on the dialogue corpora compared against gen- ture ignored. Abundant works suggest that the multi-turn
eral NTMs. Furthermore, we also conduct experiments on dialogue requires specific modeling methods (Qiu et al.
typical downstream tasks for dialogues based on the dis- 2020a,b). Serban et al. devise the hierarchical LSTM to
covered topics, including dialogue act classification and re- encode the structure and generate responses. DialoFlow (Li
sponse generation. The experimental results indicate that et al. 2021) is another solution, which views the dialogue
with the help of the topics discovered by ConvNTM, the as a dynamic flow and designs three objectives to capture
performance is prominently boosted compared against the the information dynamics. Moreover, the speaker feature is
baselines without topic information and existing topic-aware also considered as a pivotal factor in the dialogue. He et al.
dialogue methods. incorporate the turn changes among speakers to capture the
Our overall contributions are summarized as follows: fine-grained semantics of dialogue. Gu et al. introduce a
• To the best of our knowledge, for the first time, we pro- speaker-aware disentanglement strategy to tackle the entan-
pose ConvNTM, the neural topic model in particular de- gled dialogues and improve the performance of multi-turn
signed for the conversational scenario to formulate the dialogue response selection. Topic-aware models take the
multi-turn structure in dialogues to discover topics. advantage of the related topics to make conversational mod-
eling more consistent. Liu et al. propose two topic-aware
• Considering the multi-role interactions (speakers and ad- contrastive learning objectives to handle information scat-
dressees) in conversations, we perform utterance-level tering challenges for the dialogue summarization task. Zhu
fine-grained topic modeling and fuse global and local et al. propose a topic-driven knowledge-aware Transformer
topic information to determine topic distributions. to deal with the emotion detection in dialogue. We hope that
• We also leverage the word co-occurrence relationship to our ConvNTM can better facilitate the development of topic-
constrain the topic-word distribution, which can be co- aware methods.
13610
(a) Conversation Sequence Encoder (b) Multi-Role Graph Encoder (c) Topic Modeling
Speaker 1
...
Attention Layer
MLP MLP
BiLSTM BiLSTM ... BiLSTM
Graph Convolution Layer Layer Layer

Transformer Encoder
Embedding Layer
... Speaker 2
Figure 1: The model overview of ConvNTM: a) The conversation sequence encoder for modeling the multi-turn conversation
contexts; b) The multi-role graph encoder for formulating the intra-speaker and inter-speaker dependencies; c) The topic mod-
eling module to reconstruct utterance-level BoWs based on the fusion of global and local topic information.
Conversational Neural Topic Model for the conversation to describe the multi-role interactions.
(j)
In this section, we describe the modules and training ob- We denote each utterance representation hk as a node,
jectives of ConvNTM in detail. The model overview of the and the two types of edges between nodes reflect the intra-
ConvNTM is illustrated in Figure 1. speaker and inter-speaker dependencies. First, the individ-
ual roles of each speaker in the dialogue have a signifi-
Hierarchical Conversation Encoder cant impact on the continuation of the conversation. The
speaker tends to organize what he/she has said in the pre-
To fully extract semantic information in the multi-turn con- vious utterances to determine the topic of the current utter-
versation to help topic modeling, we use a hierarchical ance. Therefore, we consider intra-speaker dependency to
framework in which both a sequence encoder and a graph keep the topic consistency and avoid contradictions. For the
encoder cooperatively encode the conversation contexts to (j) (j)
better handle cross-utterance dependencies. speaker j, we add a bidirectional edge between hk1 and hk2
only if |k1 − k2 | ≤ Ks , where Ks indicates the window
Conversation sequence encoder. To capture the multi- size for aggregating contextual utterances from the same
turn structure of the conversation, we employ a sequence speaker. Second, a speaker will give feedback on the utter-
encoder that models the conversation contexts from word ance contents from other speakers, and then decide whether
level to utterance level. Suppose that a conversation ses- to keep or shift the current topic. It is also necessary to con-
sion c has J speakers, and the speaker j has nj utterances: struct the inter-speaker dependency in the graph to simu-
(j) (j) (j) (j) late the dynamic interactions. For two speakers j1 and j2 ,
{u1 , u2 , · · · , unj }. The words in the k-th utterance uk
(j ) (j )
(j)
are first encoded as ek through an embedding layer fe . A we add a bidirectional edge between hkj1 and hkj2 only if
1 2
two-layer Transformer encoder ftrm is then used to further |kj1 − kj2 | ≤ Kc , where Kc indicates the absolute distance
(j) window size of two utterances in the conversation. Taking
process ek and obtain the utterance-level representation
(j) Figure 1 as an example, the second speaker has three utter-
sk from the [CLS] token. In order to enhance the contex-
ances interspersed with the first speaker’s four utterances.
tual relationship among the multi-turn utterances, we feed
In this graph, the intra-speaker edges are in grey while the
the Transformer outputs into a bidirectional LSTM frnn and
inter-speaker edges are in black. We utilize a graph convolu-
a standard self-attention layer fattn successively. Finally, we
tion network (GCN) fgcn to update the utterance represen-
denote the learned utterance representations for the speaker
(j) (j) (j) tations under the multi-role interaction relations. Therefore,
j as {h1 , h2 , · · · , hnj }. The encoding process for the se- the learned utterance representation e
(j)
hk is given by:
quence encoder can be formulated as:
(j) (j)
(j) (j)
hk = fgcn (hk ).
e (5)
ek = fe (uk ), (1)
(j) (j) Topic Modeling
sk = ftrm (ek )[CLS] , (2)
Based on the speaker-oriented utterance representations
(j) (j) (j)
sek = frnn (s1 , s2 , · · · , s(j)
nj ) k , (3) from the graph encoder, we then introduce our techniques
(j) (j) (j) for topic modeling.
s1 , se2 , · · · , se(j)
hk = fattn (e nj ) k . (4)
Topic distribution assumption. Given a general docu-
Multi-role graph encoder. Considering the impact of ment, the generative process of existing NTMs is mainly di-
speaker information in a conversation, we construct a graph vided into three steps: 1) sample a topic distribution θ for
13611
a document or each sentence; 2) sample a topic assignment Generative process. Based on the above definitions, we
zt for each word wt from the topic distribution θ; 3) gen- summarize the generative process of ConvNTM as follows.
erate each word wt independently from the corresponding 1. For each speaker j in the conversation session c:
topic-word distribution βzt . However, a conversation con- (j) (j) (j)
tains multiple turns of utterances, the topics in the utterances i) Sample the latent variable zc ∼ N (µc , σc );
follow their respective topic distributions and are related to (j) (j)
ii) Draw θc = softmax(gθ (zc )) as the global topic
each other. The roles of different speakers also influence distribution.
the topic determination. Thus, we need to adapt the origi- (j)
2. For each utterance uk of the speaker j:
nal assumptions on the topic distribution according to the
(j)
unique properties of the conversation. Specifically, we as- i) Draw θk as the local topic information;
sume that each speaker j in the conversation session c holds (j) (j) (j)
(j) ii) Draw θe by fusing θc and θ ;
k k
a global topic information θc , and each utterance k has lo- (j)
(j) iii) For each word w in the utterance uk : draw w ∼
cal topic information θk , which is fused with the corre- (j)
sponding global topic to determine the eventual topic distri- softmax(θek β).
(j)
bution θek .
The Joint Training Objective
NTM framework. We process the nj utterances of Neural variational inference objective. Under the gen-
each speaker j into bag-of-words (BoW) representations: erative process of ConvNTM, the marginal likelihood of the
(j) (j) (j) (j)
{x1 , x2 , · · · , xnj }, where xk is a |V |-dimensional conversation session c is decomposed as:
multi-hot encoded vector for the k-th utterance and V is the
J Z
BoW vocabulary. Note that each g∗ mentioned below repre- Y
sents a multilayer perceptron (MLP). We first normalize the p(c|µ, σ, β) = p(θc(j) |µ(j) (j)
c , σc )
(j)
(j) j=1 θc
BoW vector xk and then use gx to extract the representa- nj
!
(j)
tion x
ek :
YY
· p(w|β, θc(j) ) dθc(j) . (13)
(j)
(j) x k=1 w
ek = gx ( P|V | k (j) ).
x (6)
v=1 k(x ) v Inspired by the success of VAE-based NTMs (Miao, Grefen-
In order to introduce multi-role interactions into topic mod- stette, and Blunsom 2017; Dieng, Ruiz, and Blei 2020), we
(j) (j) also employ a VAE framework for the utterance-level BoW
eling, we concatenate x hk
ek with the node representation e
given by the graph encoder. Then, we obtain the local topic reconstruction process. The posterior global topic distribu-
(j)
(j)
information θk of the utterance through gs : tion p(θc ) for each speaker j can be approximated by the
(j) (j) (j)
(j) (j) (j) inference network q(θc |µc , σc ). We can formulate pa-
kθ = gs (e x ⊕e
k k h ). (7)
rameter updates from the variational evidence lower bound
Next, all the utterances of each speaker j are integrated to (ELBO). From the perspective of ELBO, the training ob-
(j)
derive the global speaker-aware representation hc , which jective for the log-likelihood of the conversation consists of
(j) (j) two terms. The first term is to minimize the cross entropy
can be used to estimate the prior variables µc and log σc
via two separate networks gµ and gσ : between the input normalized BoW and reconstructed BoW,
nj
! and the second Kullback–Leibler (KL) divergence term is to
(j) (j) (j) minimize the distance between the variational posterior and
X
(j)
hc = tanh x k ⊕ hk ) · θ k
gc (e e , (8)
true posterior of latent variables. This part of the training
k=1
loss can be formulated as:
µ(j)
c = gµ (hc ),
(j)
log σc(j) = gσ (h(j)
c ). (9) nj
!
With the reparameterisation trick (Kingma and Welling (j)
X X
(j)
(j) (j) (j) Lc = − Eq(θ(j) |µ(j) ,σ(j) ) log p(w|θc , β)
2013), we can sample a latent variable zc ∼ N (µc , σc ). c c c
k=1 w
(j)
Then we use gθ to generate the global topic distribution θc : +wkl · DKL (q(θc(j) |µ(j) (j) (j)
c , σc )||p(θc )), (14)
θc(j) = softmax(gθ (zc(j) )). (10)
where wkl is the hyper-parameter for the weight of the KL
Finally, we can use gf to fuse local and global topic infor-
(j)
term.
mation to derive the eventual topic distribution θek :
(j) (j)
Controllable word co-occurrence objective. In addition
θe = gf (θ ⊕ θ(j) ).
k k c (11) to the ELBO commonly used in general NTMs, we further
Assuming that the number of topics is K, all the above topic leverage the word co-occurrence information of the training
distributions are K-dimensional vectors. To reconstruct the corpus to improve the topic quality. For the topic-word dis-
BoWs for each utterance in the conversation, we leverage tribution matrix β ∈ RK×|V | , its i-th row represents a multi-
a weighted matrix β ∈ RK×|V | to represent K topic-word nomial distribution on the i-th topic over the vocabulary V .
distributions. The reconstructed utterance BoW can be de- We expect that the top words in each topic are highly cor-
rived as: related and tend to co-occur in the same real conversations.
(j) (j)
x̂k = softmax(θek β). (12) Thus, we count the co-occurrence frequencies of all word
13612
pairs in all conversations in the training corpus, and con- Both of these TC metrics can be obtained in the gensim li-
struct a co-occurrence matrix M ∈ R|V |×|V | . Next, we add brary (Rehurek and Sojka 2011). TD measures the diver-
such a constraint on β, which can be described as the fol- sity across different topics. It is defined as the percentage
lowing loss: of unique words among the top words. A higher TD metric
indicates more topic variability. Pursuing either a high TC
|V | |V |
X X value or a high TD value independently does not guarantee
Lco = − Mw1 ,w2 log(β T β)w1 ,w2 . (15) the topic quality. Inspired by (Dieng, Ruiz, and Blei 2020),
w1 =1 w2 =1 we regard CV as the TC score and measure the topic quality
score (TQ) as the product of TC and TD.
Intuitively, we make the β-derived matrix as close as possi-
ble to the reference co-occurrence matrix M . We set a target log
p(wi ,wj )+ϵ
P (wi )P (wj )
co-occurrence distance as dco , and then design a controllable NPMI(wi , wj ) = . (18)
weight wco for the trade-off between Lc and Lco . Suppose − log(p(wi , wj ) + ϵ)
that there are C conversations in the training set, the overall Baselines. We compare our model with the mainstream
training loss of ConvNTM is given by: and state-of-the-art topic models as baselines. The base-
C X
X J lines include: 1) LDA (Blei, Ng, and Jordan 2003), the most
L = (1 − wco ) L(j) representative statistical topic model using Gibbs sampling;
c + wco Lco . (16)
c=1 j=1
2) GSM (Miao, Grefenstette, and Blunsom 2017), a VAE-
based NTM introducing Gaussian softmax for generating la-
The controllable factor wco is dynamically adjusted as: tent variables; 3) ProdLDA (Srivastava and Sutton 2017), an
NTM constructing Laplace approximation to the Dirichlet
0, Lco ≤ dco ,

prior; 4) ETM (Dieng, Ruiz, and Blei 2020), an NTM pro-
wco = Lco − dco (17) jecting topics and words into the same embedding space; 5)
 min 1, , Lco > dco , GNTM (Shen et al. 2021), a recent NTM designing a docu-
Wco
ment graph and introducing it into the generative process of
where Wco is another hyper-parameter of the correcting fac- topic modeling. For all baselines, we employ their officially
tor for the proportional signal. reported parameter settings.
Implementation details. For the multi-role interaction
Experiments graph, we set the window sizes Ks and Kc to 2. The BoW
Experimental Setup dictionary size is set to 6,500 in DailyDialog and 7,533 in
Datasets. We conduct the experiments on two widely EmpatheticDialogues. The embedding size and hidden size
used multi-turn dialogue datasets, DailyDialog1 and Empa- of the Transformer, LSTM and GCN are all set to 64. For the
theticDialogues2 . DailyDialog (Li et al. 2017) totally con- loss function, wkl and Wco are set to 0.01 and 0.05, while the
tains 13,118 high-quality open-domain daily conversations, value of dco is determined by the number of topics and the
and covers various topics about daily life. It has 7.9 av- dataset. In our main results, dco is recommended to be set to
erage speaker turns per conversation, and each speaker 32 in DailyDialog and 31.375 in EmpatheticDialogues. The
has enough utterances for multi-turn modeling. We use training process has 100 epoches using the Adam optimizer
the official splits, i.e., 11,118/1,000/1,000. EmpatheticDia- with the base learning rate of 0.001. We implement the ex-
logues (Rashkin et al. 2019) contains about 25k personal periments on a Nvidia A40 GPU.3
conversations with rich emotional expressions and topic sit-
uations. Speakers discuss emotional topics and tend to inter- Main Results
act with empathy. We also employ the official splits data, i.e. For all baselines, one conversation is treated as one docu-
19,533/2,770/2,547 for train/val/test respectively. ment for topic modeling. Here we set the number of topics
to 20, and analyze the impact of the number of topics later.
Evaluation metrics. To evaluate the quality of topics gen- To properly evaluate the learned topics, we follow the previ-
erated by topic models, we adopt topic coherence (TC) ous works (Kim et al. 2012; Shen et al. 2021) and select the
and topic diversity (TD) metrics. TC measures the seman- top 10 words with the highest probability under each topic as
tic consistency of top words within each topic. A higher the representative word list to calculate topic quality metrics.
TC metric indicates more relevant keywords within each The comparison results are available in Table 1. Our Con-
topic and better topic interpretability. Following the pre- vNTM outperforms all baselines on two TC metrics (i.e. CV
vious work (Shen et al. 2021), we choose two TC mea- and NPMI) on two datasets, which indicates that with the
surements, CV and normalized pointwise mutual informa- help of formulating the specific multi-turn and multi-role in-
tion (NPMI), to provide a robust evaluation. The NPMI of formation in the conversation, the topics discovered by Con-
the word pair (wi , wj ) is calculated as equation (18). CV vNTM have the best topic interpretability. GNTM achieves
score stands for a widely used Content Vector-based coher- the highest on TD, while ConvNTM is slightly behind. This
ence metric, adopted by (Röder, Both, and Hinneburg 2015). reason may be that GNTM generates words and edges based
1 3
http://yanran.li/dailydialog Our code and data are available at https://github.com/ssshddd/
2
https://github.com/facebookresearch/EmpatheticDialogues ConvNTM.
13613
Dataset DailyDialog EmpatheticDialogues
Method TD CV NPMI TQ TD CV NPMI TQ
LDA 0.390 0.4308 -0.0083 0.1680 0.510 0.4230 0.0011 0.2158
GSM 0.445 0.4931 -0.0040 0.2194 0.530 0.4486 0.0055 0.2378
ProdLDA 0.720 0.5363 -0.0007 0.3861 0.736 0.4610 0.0173 0.3393
ETM 0.690 0.5688 0.0364 0.3925 0.713 0.4690 0.0130 0.3342
GNTM 0.810 0.5916 0.0588 0.4792 0.812 0.4809 0.0289 0.3905
ConvNTM 0.750 0.6542 0.0831 0.4907 0.790 0.5136 0.0495 0.4057
Table 1: Comparison results of topic quality on DailyDialog and EmpatheticDialogues.
on topics at the same time, which may indirectly increase the Method TD TC NPMI TQ
sparsity among topic proportions. ETM and ProdLDA also ConvNTM (w/o contexts) 0.715 0.6240 0.0619 0.4462
have moderate TC metrics, but their TD is relatively low, ConvNTM (w/o graph) 0.705 0.6282 0.0657 0.4429
which is prone to generate redundant topics on the conver- ConvNTM (w/o speaker) 0.650 0.6099 0.0548 0.3964
sation dataset. Comprehensively considering the impact of ConvNTM (w/o Lco ) 0.780 0.6237 0.0645 0.4865
TC and TD, our ConvNTM which integrates multiple turns
ConvNTM 0.750 0.6542 0.0831 0.4907
and speaker roles can achieve state-of-the-art performance
on the TQ score.
Table 2: Ablation results for ConvNTM on DailyDialog.
Ablation Study
In order to verify the effectiveness of key modules of our
model, we compare ConvNTM with the following four
model variants: 1) ConvNTM (w/o contexts) removes the
conversation sequence encoder used to model multi-turn di-
alogue contexts; 2) ConvNTM (w/o graph) removes the
multi-role graph encoder used to model interactions between
speakers; 3) ConvNTM (w/o speaker) sets the number of
speakers to 1 that completely ignores the effect of the roles;
4) ConvNTM (w/o Lco ) remove the loss term Lco for the
word co-occurrence objective. Figure 2: Visualization of an example for discovered topics
Table 2 shows the comparison results of these different (one topic per line). Repeated words are in bold.
ablation methods on DailyDialog. Compared with the full
model, both ConvNTM (w/o contexts) and ConvNTM (w/o
graph) decrease on TC and TD, indicating that both the words in each line have strong associations and focus on a
multi-turn context structure and multi-role interaction infor- certain topic. This means that each learned topic has good
mation of the conversation have a significant impact on the internal coherence. The selected 4 topics can be summarized
topic quality. The performance of ConvNTM (w/o speaker) as food, family & friends, work, and traffic accidents. Mean-
is further degraded when the speaker’s role is not modeled while, ConvNTM has fewer repeated words, indicating less
and the utterances in the conversation are treated as sen- redundancy in the learned topics. While for GNTM, these
tences in the general document. This reflects the superior- topic words are mixed together, and some non-topic words
ity of ConvNTM over general NTMs for topic modeling on are repeated in different topics. For instance, “people” are
the unique properties of the conversation. In addition, when shown in multiple topics, and “work” and “family” appear
removing the word co-occurrence training objective, Con- in the same topic in GNTM, which destroys the topic diver-
vNTM (w/o Lco ) improves slightly on TD, while it drops sity, coherence and interpretability.
more significantly on TC, making the overall topic quality
worse. It means that considering word-occurrence informa- Analysis on Number of Topics
tion can help improve the coherence and interpretability of Since the number of topics is an important factor of the
learned topics. topic model, we compare the topic quality performance of
ConvNTM and several strong baselines. We set the vary-
Analysis on Discovered Topic Examples ing number of topics from 10 to 100, and the comparison
We also perform a qualitative analysis on discovered topics, results are shown in Figure 3. Our ConvNTM achieves the
comparing ConvNTM and the strong baseline GNTM. Fig- highest TC and TQ under all number of topics, which in-
ure 2 shows several representative topics learned by Con- dicates the robustness of our method on topic quality. All
vNTM and GNTM. We display the top 10 words under each models have high topic quality when the number of topics
topic per line. For our ConvNTM, we can see that the top is between 20 and 50. When the number of topics exceeds
13614
Figure 3: Comparison results of the varying number of topics on DailyDialog.
Method Accuracy Method PPL ↓ BLEU-1 ↑ Distinct-1 ↑

JAS 75.9 HERD 41.38 6.40 4.42
DAH-CRF+MANUALconv 86.5 TA-Seq2Seq 38.98 15.84 6.79
DAH-CRF+LDAconv 86.4 DAWnet 39.36 16.90 7.78
DAH-CRF+LDAuttr 88.1 THERD+LDA 36.46 18.26 7.90
STM 87.1 THERD+GNTM 36.68 18.53 8.26
STM+GNTM 87.2 THERD+ConvNTM 34.14 20.14 8.79
STM+ConvNTM 88.9
Table 4: Comparison results of topic-aware models for the
Table 3: Comparison results of topic-aware models for the dialogue response generation task on DailyDialog.
dialogue act classification task on DailyDialog.
GNTM. For dialogue response generation, we borrow the
50, TC tends to be stable or slightly decreases, and TD de- framework of THERD (Dziri et al. 2019), which proposes
creases significantly. ConvNTM can achieve the highest TD a topical hierarchical recurrent framework for multi-turn re-
when the number of topics is large, and hold the best topic sponse generation. THERD utilizes LDA to extract the top
quality under any number of topics. 100 topic words for each conversation. Here LDA can be di-
rectly replaced by GNTM to extract topic words. While for
our ConvNTM, we first label all the utterances of a conver-
Downstream Tasks
sation, and then extract the top 100 words with the highest
The essence of topic modeling is an unsupervised learning probability under these topics. We also compare other topic-
process for latent semantic structure, and we expect that not aware models in this task including HERD (Serban et al.
only ConvNTM can achieve state-of-the-art topic quality, 2016b), TA-Seq2Seq (Xing et al. 2017) and DAWnet (Wang
but we can leverage the topic information learned by Con- et al. 2018). The comparison results on DailyDialog are
vNTM to help improve downstream dialogue tasks. Here, shown in Table 4. THERD+ConvNTM can achieve better
we take dialogue act classification and response generation performance than all topic-aware baselines and GNTM on
as examples to verify that ConvNTM is helpful for improv- multiple metrics.
ing both classification and generation tasks. Specifically, we
choose GNTM as a strong baseline and respectively add
topic information learned by GNTM and ConvNTM into
Conclusion
topic-aware models for comparison. In this work, we propose the first Conversational Neural
We use different topic extraction approaches for different Topic Model (ConvNTM) specifically for the conversation
tasks. For dialogue act classification, we borrow the frame- scenario. We develop a hierarchical conversation encoder
work of (He et al. 2021b) (named STM), which utilized topic to capture the multi-turn dialogue structure. Considering
labels for each utterance when modeling speaker turns. We the impact of roles of different speakers in a conversation,
extract the topic labels using our ConvNTM and GNTM for we construct a multi-role interaction graph to formulate the
comparison, and replace original topic labels with them. We intra-speaker and inter-speaker dependencies. We then per-
also compare other topic-aware models in this task includ- form utterance-level fine-grained topic modeling by fusing
ing JAS (Wallace et al. 2013) and DAH (Li et al. 2019). The global and local topic information. Furthermore, we lever-
comparison results on DailyDialog are shown in Table 3. age the word co-occurrence relationship as a new training
This indicates that ConvNTM can indeed help improve this objective, which can be jointly trained with the neural varia-
task and it performs better than all topic-aware baselines and tional inference objective and further improve topic quality.
13615
Acknowledgements Holtgraves, T.; Srull, T. K.; and Socall, D. 1989. Conversa-
This work was supported by National Natural Science tion memory: The effects of speaker status on memory for
Foundation of China (NSFC Grant No. 62122089 and No. the assertiveness of conversation remarks. Journal of Per-
61876196), Beijing Outstanding Young Scientist Program sonality and Social Psychology, 56(2): 149.
NO. BJJWZYJH012019100020098, and Intelligent Social Jin, Y.; Zhao, H.; Liu, M.; Du, L.; and Buntine, W. 2021.
Governance Platform, Major Innovation & Planning Inter- Neural Attention-Aware Hierarchical Topic Model. arXiv
disciplinary Platform for the ”Double-First Class” Initiative, preprint arXiv:2110.07161.
Renmin University of China. We also wish to acknowledge Kim, H.; Sun, Y.; Hockenmaier, J.; and Han, J. 2012. Etm:
the support provided and contribution made by Public Pol- Entity topic models for mining documents associated with
icy and Decision-making Research Lab of RUC. Rui Yan entities. In 2012 IEEE 12th International Conference on
is supported by Beijing Academy of Artificial Intelligence Data Mining, 349–358. IEEE.
(BAAI) and CCF-Tencent Rhino-Bird Open Research Fund.
Kim, T.; and Vossen, P. 2021. Emoberta: Speaker-aware
emotion recognition in conversation with roberta. arXiv
References preprint arXiv:2108.12009.
Adiwardana, D.; Luong, M.-T.; So, D. R.; Hall, J.; Fiedel,
N.; Thoppilan, R.; Yang, Z.; Kulshreshtha, A.; Nemade, G.; Kingma, D. P.; and Welling, M. 2013. Auto-encoding varia-
Lu, Y.; et al. 2020. Towards a human-like open-domain chat- tional bayes. arXiv preprint arXiv:1312.6114.
bot. arXiv preprint arXiv:2001.09977. Lang, K. 1995. Newsweeder: Learning to filter netnews. In
Blei, D. M.; Ng, A. Y.; and Jordan, M. I. 2003. Latent dirich- Machine Learning Proceedings 1995, 331–339. Elsevier.
let allocation. Journal of machine Learning research, 3(Jan): Larochelle, H.; and Lauly, S. 2012. A neural autoregressive
993–1022. topic model. Advances in Neural Information Processing
Chen, C.; and Ren, J. 2017. Forum latent Dirichlet allocation Systems, 25.
for user interest discovery. Knowledge-Based Systems, 126: Li, J.; Liao, M.; Gao, W.; He, Y.; and Wong, K.-F. 2016.
1–7. Topic Extraction from Microblog Posts Using Conversation
Cheng, X.; Yan, X.; Lan, Y.; and Guo, J. 2014. Btm: Topic Structures. In Proceedings of the 54th Annual Meeting of
modeling over short texts. IEEE Transactions on Knowledge the Association for Computational Linguistics (Volume 1:
and Data Engineering, 26(12): 2928–2941. Long Papers), 2114–2123. Berlin, Germany: Association for
Dieng, A. B.; Ruiz, F. J.; and Blei, D. M. 2020. Topic mod- Computational Linguistics.
eling in embedding spaces. Transactions of the Association Li, R.; Lin, C.; Collinson, M.; Li, X.; and Chen, G. 2019. A
for Computational Linguistics, 8: 439–453. Dual-Attention Hierarchical Recurrent Neural Network for
Dieng, A. B.; Wang, C.; Gao, J.; and Paisley, J. 2017. Top- Dialogue Act Classification. In CoNLL.
icRNN: A Recurrent Neural Network with Long-Range Se- Li, Y.; Su, H.; Shen, X.; Li, W.; Cao, Z.; and Niu, S.
mantic Dependency. In International Conference on Learn- 2017. DailyDialog: A Manually Labelled Multi-turn Di-
ing Representations. alogue Dataset. In Proceedings of The 8th International
Dziri, N.; Kamalloo, E.; Mathewson, K.; and Zaiane, O. Joint Conference on Natural Language Processing (IJCNLP
2019. Augmenting Neural Response Generation with 2017).
Context-Aware Topical Attention. In Proceedings of the Li, Z.; Zhang, J.; Fei, Z.; Feng, Y.; and Zhou, J. 2021. Con-
First Workshop on NLP for Conversational AI, 18–31. Flo- versations Are Not Flat: Modeling the Dynamic Information
rence, Italy: Association for Computational Linguistics. Flow across Dialogue Utterances. In Proceedings of the 59th
Gu, J.-C.; Li, T.; Liu, Q.; Ling, Z.-H.; Su, Z.; Wei, S.; Annual Meeting of the Association for Computational Lin-
and Zhu, X. 2020. Speaker-aware BERT for multi-turn re- guistics and the 11th International Joint Conference on Nat-
sponse selection in retrieval-based chatbots. In Proceedings ural Language Processing (Volume 1: Long Papers), 128–
of the 29th ACM International Conference on Information 138. Online: Association for Computational Linguistics.
& Knowledge Management, 2041–2044. Lin, T.; Hu, Z.; and Guo, X. 2019. Sparsemax and relaxed
He, Z.; Tavabi, L.; Lerman, K.; and Soleymani, M. 2021a. Wasserstein for topic sparsity. In Proceedings of the twelfth
Speaker Turn Modeling for Dialogue Act Classification. In ACM international conference on web search and data min-
Findings of the Association for Computational Linguistics: ing, 141–149.
EMNLP 2021, 2150–2157. Punta Cana, Dominican Repub- Liu, J.; Zou, Y.; Zhang, H.; Chen, H.; Ding, Z.; Yuan, C.;
lic: Association for Computational Linguistics. and Wang, X. 2021. Topic-Aware Contrastive Learning for
He, Z.; Tavabi, L.; Lerman, K.; and Soleymani, M. 2021b. Abstractive Dialogue Summarization. In Findings of the
Speaker Turn Modeling for Dialogue Act Classification. Association for Computational Linguistics: EMNLP 2021,
arXiv preprint arXiv:2109.05056. 1229–1243. Punta Cana, Dominican Republic: Association
Hofmann, T. 1999. Probabilistic latent semantic indexing. for Computational Linguistics.
In Proceedings of the 22nd annual international ACM SI- Ma, X.; Zhang, Z.; and Zhao, H. 2021. Enhanced Speaker-
GIR conference on Research and development in informa- aware Multi-party Multi-turn Dialogue Comprehension.
tion retrieval, 50–57. arXiv preprint arXiv:2109.04066.
13616
Miao, Y.; Grefenstette, E.; and Blunsom, P. 2017. Discov- communication. In Proceedings of the 2013 Conference on
ering discrete latent topics with neural variational inference. Empirical Methods in Natural Language Processing, 1765–
In International Conference on Machine Learning, 2410– 1775.
2419. PMLR. Wang, R.; Zhou, D.; and He, Y. 2019. Atm: Adversarial-
Panwar, M.; Shailabh, S.; Aggarwal, M.; and Krishna- neural topic model. Information Processing & Management,
murthy, B. 2020. TAN-NTM: Topic attention networks for 56(6): 102098.
neural topic modeling. arXiv preprint arXiv:2012.01524. Wang, W.; Huang, M.; Xu, X.-S.; Shen, F.; and Nie, L. 2018.
Qiu, L.; Zhao, Y.; Shi, W.; Liang, Y.; Shi, F.; Yuan, T.; Yu, Chat more: Deepening and widening the chatting topic via a
Z.; and Zhu, S.-C. 2020a. Structured Attention for Unsu- deep model. In The 41st international acm sigir conference
pervised Dialogue Structure Induction. In Proceedings of on research & development in information retrieval, 255–
the 2020 Conference on Empirical Methods in Natural Lan- 264.
guage Processing (EMNLP), 1889–1899. Online: Associa- Xing, C.; Wu, W.; Wu, Y.; Liu, J.; Huang, Y.; Zhou, M.; and
tion for Computational Linguistics. Ma, W.-Y. 2017. Topic Aware Neural Response Generation.
Qiu, L.; Zhao, Y.; Shi, W.; Liang, Y.; Shi, F.; Yuan, T.; In Proceedings of the Thirty-First AAAI Conference on Arti-
Yu, Z.; and Zhu, S.-C. 2020b. Structured attention for ficial Intelligence, AAAI’17, 3351–3357. AAAI Press.
unsupervised dialogue structure induction. arXiv preprint Zhang, Y.; Sun, S.; Galley, M.; Chen, Y.-C.; Brockett, C.;
arXiv:2009.08552. Gao, X.; Gao, J.; Liu, J.; and Dolan, B. 2019. Dialogpt:
Rashkin, H.; Smith, E. M.; Li, M.; and Boureau, Y.-L. 2019. Large-scale generative pre-training for conversational re-
Towards Empathetic Open-domain Conversation Models: a sponse generation. arXiv preprint arXiv:1911.00536.
New Benchmark and Dataset. In ACL. Zhao, H.; Phung, D.; Huynh, V.; Le, T.; and Buntine, W.
Rehurek, R.; and Sojka, P. 2011. Gensim–python framework 2021. Neural Topic Model via Optimal Transport. In Inter-
for vector space modelling. NLP Centre, Faculty of Infor- national Conference on Learning Representations.
matics, Masaryk University, Brno, Czech Republic, 3(2). Zhu, L.; Pergola, G.; Gui, L.; Zhou, D.; and He, Y. 2021.
Röder, M.; Both, A.; and Hinneburg, A. 2015. Exploring Topic-Driven and Knowledge-Aware Transformer for Dia-
the space of topic coherence measures. In Proceedings of logue Emotion Detection. In Proceedings of the 59th An-
the eighth ACM international conference on Web search and nual Meeting of the Association for Computational Linguis-
data mining, 399–408. tics and the 11th International Joint Conference on Natural
Serban, I. V.; Garcı́a-Durán, A.; Gulcehre, C.; Ahn, S.; Language Processing (Volume 1: Long Papers), 1571–1582.
Chandar, S.; Courville, A.; and Bengio, Y. 2016a. Gener- Online: Association for Computational Linguistics.
ating Factoid Questions With Recurrent Neural Networks: Zhu, Q.; Feng, Z.; and Li, X. 2018. GraphBTM: Graph en-
The 30M Factoid Question-Answer Corpus. In Proceed- hanced autoencoded variational inference for biterm topic
ings of the 54th Annual Meeting of the Association for Com- model. In Conference on Empirical Methods in Natural Lan-
putational Linguistics (Volume 1: Long Papers), 588–598. guage Processing (EMNLP 2018).
Berlin, Germany: Association for Computational Linguis-
tics.
Serban, I. V.; Sordoni, A.; Bengio, Y.; Courville, A.; and
Pineau, J. 2016b. Building End-to-End Dialogue Systems
Using Generative Hierarchical Neural Network Models. In
Proceedings of the Thirtieth AAAI Conference on Artificial
Intelligence, AAAI’16, 3776–3783. AAAI Press.
Shen, D.; Qin, C.; Wang, C.; Dong, Z.; Zhu, H.; and Xiong,
H. 2021. Topic Modeling Revisited: A Document Graph-
based Neural Network Perspective. Advances in Neural In-
formation Processing Systems, 34: 14681–14693.
Srivastava, A.; and Sutton, C. 2016. Neural variational in-
ference for topic models. ArXiv Preprint, 1(1): 1–12.
Srivastava, A.; and Sutton, C. 2017. Autoencoding Varia-
tional Inference For Topic Models. In International Confer-
ence on Learning Representations.
Sun, Y.; Loparo, K.; and Kolacinski, R. 2020. Conversa-
tional structure aware and context sensitive topic model for
online discussions. In 2020 IEEE 14th International Con-
ference on Semantic Computing (ICSC), 85–92. IEEE.
Wallace, B. C.; Trikalinos, T. A.; Laws, M. B.; Wilson, I. B.;
and Charniak, E. 2013. A generative joint, additive, se-
quential model of topics and speech acts in patient-doctor
13617

26595-Article Text-30658-1-2-20230626

Uploaded by

Copyright:

Available Formats

26595-Article Text-30658-1-2-20230626

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

26595-Article Text-30658-1-2-20230626

Uploaded by

Copyright:

Available Formats

The Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI-23)

ConvNTM: Conversational Neural Topic Model

Abstract we assume that there are different characteristics to model

Graph Convolution Layer Layer Layer

Table 1: Comparison results of topic quality on DailyDialog and EmpatheticDialogues.

Method Accuracy Method PPL ↓ BLEU-1 ↑ Distinct-1 ↑

You might also like

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.