Disentangled Representation for Diversified Recommendations

Xiaoying Zhang
AI Lab, Bytedance, China
zhangxiaoying.xy@bytedance.com

Hongning Wang
Department of Computer Science, University of Virginia, USA
hw5x@virginia.edu

Hang Li
AI Lab, Bytedance, China
lihang.lh@bytedance.com

ABSTRACT
Accuracy and diversity have long been considered to be two conflicting goals for recommendations. We point out, however, that as the diversity is typically measured by certain pre-selected item attributes, e.g., category as the most popularly employed one, improved diversity can be achieved without sacrificing recommendation accuracy, as long as the diversification respects the user’s preference about the pre-selected attributes. This calls for a fine-grained understanding of a user’s preferences over items, where one needs to recognize whether the user’s choice is driven by the quality of the item itself, or by the pre-selected attributes of the item.
In this work, we focus on diversity defined on item categories. We propose a general diversification framework agnostic to the choice of recommendation algorithm. Our solution disentangles the learnt user representation in the recommendation module into category-independent and category-dependent components to differentiate a user’s preference over items from two orthogonal perspectives. Experimental results on three benchmark datasets and an online A/B test demonstrate the effectiveness of our solution in improving both recommendation accuracy and diversity. In-depth analysis suggests that the improvement is due to our improved modeling of users’ categorical preferences and refined ranking within item categories.

CCS CONCEPTS
• Information systems → Information retrieval diversity; Collaborative filtering.

KEYWORDS
Recommender system, recommendation diversity, disentangled user representation

ACM Reference Format:
Xiaoying Zhang, Hongning Wang, and Hang Li. 2023. Disentangled Representation for Diversified Recommendations. In Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining (WSDM ’23), February 27-March 3, 2023, Singapore, Singapore. ACM, New York, NY, USA, 9 pages. https://doi.org/10.1145/3539597.3570389

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
WSDM ’23, February 27-March 3, 2023, Singapore, Singapore
© 2023 Association for Computing Machinery.
ACM ISBN 978-1-4503-9407-9/23/02...$15.00
https://doi.org/10.1145/3539597.3570389

1 INTRODUCTION
Recommender systems learn users’ interests from historical observations (e.g., their clicks, bookmarked or purchased items, etc.) so as to identify the items that best suit users’ preferences. The success of recommender systems in enhancing user experience and boosting platform utility has been witnessed in a number of scenarios including e-commerce [17, 42], online news recommendation [33] and streaming services [9].
Recommendation accuracy, which measures whether a recommendation model can recommend items that users will like, serves as the dominant target or even the only target in most previous work [9, 16, 17, 31, 42]. Various complicated models [9, 16, 42] have been proposed for higher accuracy. While recommendation accuracy has been shown to be closely related to user satisfaction, it is never the only rule of thumb. Recent work found that recommendation diversity, which measures the dissimilarity among recommended items regarding certain pre-selected item attributes (e.g., item category), also plays an important role in the overall user experience [18, 32, 43]. For example, even if a user is a fan of basketball, he/she can still get bored with recommendations only about basketball videos or news, which increases the risk of user attrition.

Figure 1: Illustration of recommendation accuracy and diversity optimization in different recommendation models.

Following previous work [30, 31, 40], we focus on diversity defined on item categories in this paper and aim to address the so-called accuracy-diversity dilemma [40]. On one hand, recommendation models with accuracy as their primary target often lose diversity to some extent, due to overly emphasizing items in the dominant categories in a user’s interaction history [30, 31]. Figure 1(a) illustrates this issue with an example in movie recommendation, where 70% of the movies watched by a user are action movies, which leads 90% of the system’s recommendations to fall in the action movie category. Worse still, because of the feedback loop [4], the emphasis on the dominant categories in the system’s recommendations will be further intensified when the user follows the recommendations, causing further decreased recommendation diversity and issues like filter bubbles [25] and echo chambers [14]. On the other hand, simply diversifying recommendations over all item categories without considering the user’s categorical preference

Disentangled Representation for Diversified Recommendations WSDM ’23, February 27-March 3, 2023, Singapore, Singapore

• Can the disentangled category-independent representation accurately distinguish a user’s preference within item categories?
A case study is also conducted to illustrate the effectiveness of the proposed DCRS more explicitly.

3.1 Experimental Settings
Dataset. We use three widely-used datasets under different recommendation scenarios for evaluation.
• ML-1M¹: This dataset contains 1 million ratings from 6040 users on 3883 movies from the online movie recommendation service MovieLens. It also contains rich user features (e.g., age, gender, etc.) and movie features (e.g., titles). We encode user and movie features following previous work [31, 42]. We set $y_{u,i} = 1$ if user $u$ gives item $i$ a rating greater than 3, and $y_{u,i} = 0$ otherwise.
• ML-10M²: This dataset is also from MovieLens. It contains 10 million ratings from 69878 users on 10680 movies. Similarly, we set $y_{u,i} = 1$ if user $u$ gives item $i$ a rating greater than 3, and $y_{u,i} = 0$ otherwise.
• Amazon-Books³: This dataset contains reviews and metadata of books from Amazon. To ensure data quality, we only keep categories that link to more than 20 books, which leaves 141 categories, and adopt the 20-core settings [31], i.e., discarding users and books with fewer than 20 interactions. To balance the number of positive and negative samples, we set $y_{u,i} = 1$ if user $u$ gives item $i$ a rating greater than 4, and $y_{u,i} = 0$ otherwise.
The statistics of the three datasets are summarized in Table 1. On each dataset, we also randomly sampled items that the user did not interact with as negative instances. We then sorted the user-item interactions by timestamps, and split them into training, validation, and testing datasets with the ratio of 80%, 10%, and 10%.

Table 1: Statistics of Three Datasets
Dataset        #Users   #Items   #Interactions   #Group
ML-1M            6040     3883         1000209       18
ML-10M          69878    10680        10000047       19
Amazon-Books    22929    33130         1178117      141

¹ https://grouplens.org/datasets/movielens/1m/
² https://grouplens.org/datasets/movielens/10m/
³ https://jmcauley.ucsd.edu/data/amazon/

Baselines. The proposed DCRS is a general and model-agnostic framework to disentangle category dependent and independent representations for accurate and diverse recommendations. In this paper, we instantiated it with Neural Factorization Machine (NFM) [17], one representative recommendation model that has been widely used. NFM was also taken as the backbone model in several closely related works for diversified recommendations [15, 31]. We compared DCRS with the following algorithms that have different focuses on recommendation diversity and accuracy.
• NFM [17]: The state-of-the-art recommendation model serving as the backbone model of DCRS.
• Unawareness [15]: It also takes NFM as the backbone model and tries to improve diversity by directly removing categorical features of items from model input.
• IPS [29]: It is a state-of-the-art technique of improving diversity by boosting item categories that a user interacted with less often, while suppressing the dominant categories in the user’s interaction history. Specifically, it takes the category distribution in a user’s historical interactions as propensity scores to reweigh items of this category during training. Propensity clipping [29] is also employed to reduce the variance, with the clipping threshold searched in {0.001, 0.005, 0.01, 0.05, 0.1}.
• MMR [3]: One of the state-of-the-art post-processing methods for diversified recommendations. It re-ranks the recommended items generated by NFM by a greedy strategy to reduce redundancy.
• DPP [5]: An effective post-processing method for diversified recommendations. It selects a diverse set of items from the recommended items generated by NFM by balancing the relevance of items and their similarities.
• PD_GAN [36]: A recent work that leverages the generative adversarial networks (GAN) framework to generate diverse and relevant recommendations. Its discriminator aims to distinguish the diverse set of items generated by its generator from the ground-truth sets randomly sampled from the observed data of the user.
• DGCN [40]: A recent work that leverages rebalanced neighbor discovering, category-boosted negative sampling and adversarial learning on top of Graph Convolutional Networks (GCN) for diversified recommendations.
• DecRS [31]: A recent work that takes a causal view to alleviate the bias whereby previous recommendation models over-recommend items of the dominant categories in a user’s interaction history. It aims at improving both recommendation accuracy and diversity.
• DCRS_CI: A variant of DCRS that leverages $\hat{p}^{\perp C}_{u,i}$ in Eq.(7) for item ranking without considering the user’s preference over categories. Its comparison with DCRS can reveal the importance of modeling users’ categorical preference.
Implementation Details. Following previous work [17, 31], we set the embedding size of user/item features to 64 (i.e., $d = 64$), and used AdaGrad [11] for optimization. We used grid search to select the hyperparameters based on the model’s performance on the validation dataset: the learning rate was searched in {0.005, 0.01, 0.05}; the normalization coefficient was searched in {0, 0.1, 0.2}; the dropout ratio was searched in {0.2, 0.3, ..., 0.5}; $\lambda$ for controlling the strength of the category independent and dependent constraints was searched in {0.01, 0.05, 0.1, 0.5, 1}. For baseline algorithms, when a dataset was also used for evaluation in an algorithm’s original paper, we adopted the hyperparameters recommended in that paper; otherwise we performed a similar grid search as above with the search range following the original paper.

3.2 Performance on Recommendation Accuracy & Diversity
We first evaluate all algorithms in terms of recommendation accuracy and diversity.
Evaluation Metrics. We evaluate the accuracy of a recommendation model from two perspectives: (1) whether the model can accurately rank a user’s positively interacted items before the negatively interacted ones in the testing dataset; (2) whether the model can accurately retrieve those positively interacted items in the testing dataset from the item pool, which includes all items that the user did not interact with in the training dataset. For MMR and DPP, because they only re-rank the recommended items generated by NFM, a specifically created item pool that contains the top-200 items of NFM is used. We adopted AUC [12] and UAUC [42] as metrics to evaluate the first perspective. Basically, UAUC is a micro-average
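As a rough illustration of the first accuracy perspective described in the Evaluation Metrics paragraph (ranking positively interacted items ahead of negatively interacted ones), the Python sketch below computes AUC as the fraction of (positive, negative) item pairs that a model orders correctly, counting ties as half a win. The function name `pairwise_auc` and the toy labels and scores are illustrative assumptions, not part of the paper's implementation, and the sketch does not attempt to reproduce UAUC.

```python
from typing import Sequence


def pairwise_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly.

    labels follow the binary convention y = 1 (positive) / y = 0 (negative);
    scores are the model's predicted preferences (hypothetical here).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half a correct pair
    return wins / (len(pos) * len(neg))


# Toy example with made-up scores: 3 of the 4 pairs are ordered correctly.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.1]
toy_auc = pairwise_auc(toy_labels, toy_scores)
```

The quadratic pair loop is chosen for readability; for datasets of the size listed in Table 1, a rank-based computation would be the more practical choice.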


