New Recommendation Techniques For Multi-Criteria Rating Systems
has been largely untouched. In order to take full advantage of multi-criteria ratings in various
applications, new recommendation techniques are required. In this paper we propose two new
discuss multiple variations of each proposed approach, and perform empirical analysis of these
approaches using a real-world dataset. Our experimental results show that multi-criteria ratings
rating estimation.
1. Introduction and Motivation
In order to make good decisions in any situation, it is typically necessary to possess a certain
especially on the Internet. For instance, if an individual wants to rent a movie online, there are
numerous choices available. However, too much information can make decision-making
systems help to overcome this problem by providing personalized suggestions regarding which
information is most relevant to users. Most online shopping sites and many other applications
now use recommender systems. The most popular examples include Netflix, which recommends
movies, and Amazon.com, which recommends books, CDs, and various other products. If users
offer their feedback on purchased or consumed items, the task of recommender systems is to
predict user preferences for the yet unseen items based on users' prior feedback and activities
and, subsequently, to recommend the item(s) with the highest estimated relevance to the user.
Recommender systems are usually classified into three categories based on their
& Shoham 1997). Content-based recommender systems recommend items similar to the ones the
user preferred in the past. Collaborative (or collaborative filtering) recommender systems
recommend items that users with similar preferences have liked in the past. Finally, hybrid
approaches combine content-based and collaborative methods, which can be done in many
different ways (Adomavicius & Tuzhilin 2005). Furthermore, recommender systems can also be
classified based on the nature of their algorithmic technique into memory-based and model-based
heuristics that calculate recommendations on the fly based directly on the previous user
activities. In contrast, model-based techniques use previous user activities to first learn a
predictive model (typically using some statistical or machine-learning methods), which is then
scope of this paper,1 it is important to note that the vast majority of current recommender systems
typically use a single criterion (i.e., a single numerical rating) to represent the utility of an item to
a user in the two-dimensional Users × Items space. The recommendation process starts with the
specification of the initial set of ratings that is either explicitly provided by the users or is
implicitly inferred by the system. For example, in the case of a movie recommender system, user
John Doe may assign a rating of 11 (out of 13) for movie Vertigo, i.e., set R(John_Doe,
Vertigo) = 11. Once these initial ratings are specified, a recommender system tries to estimate
for the (user, item) pairs that have not been rated yet. R0 is usually represented by a totally
ordered set (e.g., integers or real numbers within a certain range). Once function R is estimated, a
recommender system can recommend the highest-rated item (or a set of N highest-rated items)
for each user. In summary, one goal of a typical recommender system is to correctly estimate the
ratings of unrated items based on the given ratings; another goal is to find items that maximize
applications, multi-criteria rating systems are being more and more commonly employed in many
industries. Restaurant guides, such as Zagat's Guide, provide three criteria for restaurant ratings
1 A recent survey of recommender systems research literature can be found in (Adomavicius & Tuzhilin 2005).
(e.g., food, décor, and service). Online shopping malls, such as Circuitcity.com and Buy.com,
use multi-criteria ratings for consumer electronics (e.g., display, performance, battery life, and
cost). Note that the aforementioned multi-criteria rating systems are not used in the context of
personalization, i.e., the rating on each criterion is the same for all users (for example, the food
rating for a specific restaurant published by Zagat's Guide) and not personalized to each
recently Yahoo! Movies launched a movie recommendation service that uses multi-criteria
ratings for each movie, which indicates that multi-criteria data provides value both to online
content providers and consumers and may become an important component in different
personalization applications. Therefore, in this paper we propose several new approaches on how
information.
The rest of this paper is organized as follows. In Section 2, we briefly discuss some of
the research related to multi-criteria ratings, including from the recommender systems literature.
algorithm, which is used as an example throughout the paper. We then propose new
empirical analysis based on a real-world dataset. Finally, we conclude our paper in Section 6.
2. Related Work
Multi-criteria problems have been studied extensively in the operations research and decision science
fields. The majority of engineering problems are essentially multi-criteria optimization problems
(Statnikov & Matusov 1995). For example, when an airplane is being designed, its reliability,
longevity, efficiency, cost, and the combination of other utilization factors need to be considered.
Typical methods to solve the multi-criteria optimization problems include: finding Pareto
optimal solutions; optimizing the most important criterion and converting other criteria to
problem, i.e., it considers various points of view, such as financial, human resources-related, and
environmental aspects in making a decision (Figueria et al. 2005). The objective of multi-criteria
decision analysis is to assist a decision maker in choosing the best alternative when multiple
criteria conflict and compete with each other. The most commonly used decision-aiding methods,
such as outranking methods and the analytical hierarchy process, are based on multi-criteria
hierarchy process structures multi-criteria into a hierarchy and calculates the score of each
multi-criteria decision problem. For example, when we purchase a car, we consider its multiple
attributes, such as price, brand, and color. The conjoint model is the most commonly used technique
for solving multi-criteria problems in this field (Green et al. 2001). This model determines the
importance weights of product attributes and the values of the attributes. The customer's
preference for the product then can be calculated as a linear combination of weights and values.
procurement settings and enable auction participants to negotiate not only on price, but also on
other attributes of a deal (e.g., quality level, style, delivery date). It has been demonstrated that
multi-attribute auctions have several advantages over their single-attribute (i.e., price-only)
counterparts, including the improvements in the overall utility and suitability for various
not intended for personalization and recommendation settings. These problems find the
solutions or items that are optimal in general (i.e., optimal with respect to all users), and
differences in individual user preferences are not explicitly considered. Recently, multi-criteria
rating problems have started receiving attention in recommender systems research and are
regarded as one of the important issues for the next generation of recommender systems
(Adomavicius & Tuzhilin 2005). In recommender systems literature, the roots of multi-criteria
ratings could be traced to the approaches that started incorporating content-based features into
favorite content attributes (e.g., comedy movies) based on the content analysis of the
previously rated items, and then also to recommend items to a user based not only on the ratings
of similar users, but also based on these favorite content attributes (Balabanovic & Shoham
1997). However, the users were able to submit just a single rating for each item, and could not
specify their individualized feedback about a specific movie component/aspect (such as a movie's
visual effects).
In addition, Ricci et al. (2002) developed a recommender system for personalizing travel
using case-based reasoning techniques. The recommendations are performed by ranking and
aggregating elementary items (locations, activities, services) based on the user's preferences and
a repository of past travels. While these techniques do not use multi-criteria ratings per se, the
recommendation process does take into account multiple criteria, and the optimization is
capabilities based on item content information. For example, Schafer (2005) implements a meta-
recommendation system that allows users to indicate the preference for each content attribute
(e.g., movie genre, MPAA rating, or film length) and rate the importance of these attributes. For
example, users can indicate that they want only comedy movies, and that it is the most
important condition for recommendations; the users' requirements will filter the potential
recommendations towards what the users really want. Note, however, that this does not
represent a multi-criteria rating environment, since the users are specifying general filtering
requirements for all movies (such as specifying the preferred value and weight for movie genre
attribute). Similarly, Lee et al. (2002) also obtain the importance weights of content attributes
directly from the user. They use each attribute's rank to compare the items, but the value or rank
of each attribute is assumed to be the same for all users. In contrast to Schafer (2005) and Lee et
al. (2002), in a multi-criteria rating environment users would be able to specify subjective ratings
for various components of individual items (e.g., to rate the visual effects component for the Star
Wars movie), which could then be leveraged for prediction and personalization purposes.
personalization literature that are somewhat related to the issue of incorporating and leveraging
multi-criteria ratings in recommender systems, it would be fair to say that this issue is largely
unexplored. For this reason, in this paper we focus on new recommendation techniques for
3. Background: Traditional Single-Rating Similarity-Based Collaborative
Filtering Approach
Before proceeding with the discussion on new recommendation techniques for multi-criteria
rating settings, we briefly describe one of the traditional and commonly used single-rating
paper.
Section 1, let's consider the memory-based collaborative filtering technique that estimates R(u, i),
the rating that user u would give to item i, by computing the weighted average of all known
ratings R(u', i), where user u' is similar to u. Two popular ways to compute this weighted
$$R(u, i) = z \sum_{u' \in N(u)} \mathrm{sim}(u, u') \cdot R(u', i); \qquad (2)$$

$$R(u, i) = \bar{R}(u) + z \sum_{u' \in N(u)} \mathrm{sim}(u, u') \cdot \left( R(u', i) - \bar{R}(u') \right). \qquad (3)$$
Here the value of rating R(u', i) is weighted by the similarity of user u' to user u: the more
similar the two users are, the more weight R(u', i) will have in the computation of rating R(u, i).
$z = 1 \big/ \sum_{u' \in N(u)} \mathrm{sim}(u, u')$, $\bar{R}(u)$ represents the average rating of user u, and N(u) represents the
set of users that are similar to user u. The size of set N(u) can range anywhere from 1 to all users
in the dataset. Limiting the neighborhood size to some specific number (e.g., 3) will determine
how many similar users will be used in the computation of rating R(u, i).
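
As a rough illustration of Equations (2) and (3), the following Python sketch computes both predictions for a single (user, item) pair. The function names, the nested-dictionary layout of `ratings`, the toy similarity values, and the choice to skip neighbors who have not rated the item are assumptions made for this example rather than details given in the text; the normalizer also assumes non-negative similarities.

```python
def mean_rating(user, ratings):
    """Average of all ratings given by `user` (the mean-rating term in Eq. (3))."""
    values = ratings[user].values()
    return sum(values) / len(values)


def weighted_sum(user, item, ratings, sim, neighbors):
    """Equation (2): similarity-weighted average of the neighbors' ratings."""
    raters = [v for v in neighbors if item in ratings[v]]
    norm = sum(sim(user, v) for v in raters)  # this is 1 / z
    if norm == 0:
        return None  # no usable neighbor for this item
    return sum(sim(user, v) * ratings[v][item] for v in raters) / norm


def adjusted_weighted_sum(user, item, ratings, sim, neighbors):
    """Equation (3): the user's mean plus weighted deviations from neighbors' means."""
    raters = [v for v in neighbors if item in ratings[v]]
    norm = sum(sim(user, v) for v in raters)
    if norm == 0:
        return None
    deviation = sum(
        sim(user, v) * (ratings[v][item] - mean_rating(v, ratings)) for v in raters
    )
    return mean_rating(user, ratings) + deviation / norm


# Hypothetical toy data: ratings[user][item] on the 1-13 scale.
ratings = {
    "u1": {"i1": 8, "i2": 10},
    "u2": {"i1": 8, "i2": 10, "i3": 12},
    "u3": {"i1": 6, "i2": 8, "i3": 10},
}
toy_sim = {("u1", "u2"): 1.0, ("u1", "u3"): 0.5}  # assumed similarity values


def sim(a, b):
    return toy_sim[(a, b)]


ws = weighted_sum("u1", "i3", ratings, sim, ["u2", "u3"])
aws = adjusted_weighted_sum("u1", "i3", ratings, sim, ["u2", "u3"])
print(ws, aws)
```

The adjusted variant compensates for users who rate systematically higher or lower than others, which is why the two estimates can differ on the same neighborhood.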
Furthermore, there are several ways to compute similarity sim(u, u') between two users,
including cosine-based and correlation-based computations (Breese et al. 1998). We will use
the cosine-based similarity in this paper, since it is arguably the most commonly used technique
for determining how similar two users are in memory-based collaborative filtering algorithms.
Assuming I(u, u') represents the set of all items rated by both users u and u', the cosine-based
$$\mathrm{sim}(u, u') = \frac{\sum_{i \in I(u, u')} R(u, i) \, R(u', i)}{\sqrt{\sum_{i \in I(u, u')} R(u, i)^2} \; \sqrt{\sum_{i \in I(u, u')} R(u', i)^2}} \qquad (4)$$
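
The cosine-based similarity of Equation (4) can be sketched in a few lines of Python. The dictionary representation (item identifier mapped to rating) and the convention of returning 0.0 when two users share no items are assumptions of this sketch, not choices stated in the text.

```python
import math


def cosine_similarity(ratings_u, ratings_v):
    """Cosine similarity over the items rated by both users, as in Eq. (4)."""
    common = set(ratings_u) & set(ratings_v)  # I(u, u')
    if not common:
        return 0.0  # assumed convention when there is no overlap
    dot = sum(ratings_u[i] * ratings_v[i] for i in common)
    norm_u = math.sqrt(sum(ratings_u[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(ratings_v[i] ** 2 for i in common))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical users: only items "a" and "b" are co-rated.
u = {"a": 3, "b": 4}
v = {"a": 6, "b": 8, "c": 1}
print(cosine_similarity(u, v))
```

Note that only co-rated items enter both the numerator and the norms, so item "c" has no effect on the result.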
In addition, because of the inherent symmetry between users and items in the traditional
memory-based collaborative filtering setting, this approach can be either user-based or item-
based, depending on whether we want to calculate the similarity between users or items.
Equations (2) and (3) represent the user-based approach, but they can be straightforwardly
rewritten for the item-based approach. For example, the item-based adjusted weighted sum can
$$R(u, i) = \bar{R}(i) + z \sum_{i' \in N(i)} \mathrm{sim}(i, i') \left( R(u, i') - \bar{R}(i') \right) \qquad (5)$$

and z, $\bar{R}(i)$, sim(i, i'), and N(i) are analogous to their user-based counterparts.
In the rest of the paper, unless explicitly stated otherwise, by traditional collaborative
filtering approach we will refer to the user-based adjusted weighted sum approach (3) that uses
Finally, recommender systems typically recommend the items with the highest predicted
rating to the user. In other words, recommenders often are not concerned about predicting the
ratings of all items as accurately as possible, but rather about accurately predicting the highest-
rated items, since users in real-world personalization applications are usually interested in
looking at only a few highest-ranked item recommendations. Therefore, it is useful to evaluate the
recommender system performance based on items that get the top N highest scores for each user,
assuming, of course, that the values of the top N ratings are high enough to merit an actual
recommendation. This is the evaluation approach that we adopt in this paper, as will be
discussed in Section 5.
In addition to the overall rating, multi-criteria ratings provide additional information about user
increase the accuracy of the recommendations. Therefore, new techniques are needed in order to
effectively incorporate the multi-criteria rating information into the recommendation process.
The goal of multi-criteria recommender systems is to find items that maximize each
user's utility, just as in the single-rating recommender systems. Therefore, one of the important
goals of recommendation systems is to be able to predict the overall rating of each item for each
user, because the system ultimately needs to compare the items based on their overall ratings and
recommend the best items to the users. The difference between single-rating and multi-criteria
rating systems is that the latter have more information about the users and items, which can be
effectively used in the recommendation process. More formally, the general form of a rating
where R0 is the set of possible overall rating values, and Ri represents the possible rating values
for each individual criterion i (i = 1, ..., k), typically on some numeric scale (e.g., from 1 to 13).
In the remainder of this section we propose two new recommendation approaches and
present several different variations of each. The first approach is designed to extend the
approach is not restricted to any specific algorithm. In other words, any existing single-criteria
conjunction with this approach. And, as mentioned earlier, throughout the paper we will use one
Consider a movie recommendation application, where users provide the recommender system
with a single rating (between 1 and 13) for each movie they have seen. Moreover, suppose that
this recommender system is using a traditional user-based collaborative filtering approach for
rating prediction, as described in Section 3. In this case, according to Equation (3), any rating
that user u would give to yet unseen movie i would be estimated based on how users u' that are
similar to target user u rated movie i, i.e., unknown rating R(u, i) is calculated based on ratings
R(u', i). Therefore, the more accurately the system determines who the true peers (or nearest
neighbors) of u are, the more accurate the rating prediction should be. The traditional (two-
dimensional) collaborative filtering calculates the similarity between users u and u' based on how
similar their ratings are for the movies they both have seen.
Figure 1 illustrates this estimation process with a simple example. Assume that we have
five users u1, ..., u5 and five movies i1, ..., i5. Furthermore, let's suppose that the recommender
system needs to estimate how much the target user u1 would like movie i5 and, as indicated in
Figure 1, that all other ratings of different users to different movies are known. Then, the
traditional collaborative filtering approach finds the users that are closest to u1 and that have seen
movie i5. In this case, u2 and u3 seem to be perfect matches for user u1, since all of them rated
the common movies exactly the same (see Figure 1). Since both u2 and u3 rate movie i5 as 9, the
Now let's consider the same scenario as above, but in a multi-criteria setting.
Specifically, let's assume that we have the same five users u1, ..., u5 and five movies i1, ..., i5.
Also, rating R(u1, i5) is unknown and needs to be predicted, and, as indicated in Figure 2, all
other overall ratings of different users to different movies are known and are exactly the same as
before (in Figure 1). In addition, let's assume that each user is also asked to provide
feedback about the movie on four specific criteria: story, acting, direction, and visuals2, and that
the overall rating in this case is a simple average of the four individual criteria ratings.
Following the idea behind the standard collaborative filtering approach, in order to
predict R(u1, i5) the recommender system should find the users that are closest to u1 and that have
seen movie i5. However, because of all the additional information that is available in the form of
multi-criteria ratings, one can clearly see that users u2 and u3 are quite different in their tastes and
preferences from user u1, even though their overall ratings for each movie match perfectly. In
2 As is done on some movie review websites, such as Yahoo! Movies (http://movies.yahoo.com).
particular, the movie aspects that u1 hated (story and acting) were really liked by u2 and u3 and
vice versa. However, in recommender systems that are based on single-criteria ratings, this
information would be hidden within the aggregate overall rating, which may lead to inaccurate
insights about the true similarity between user preferences (as in this example). Users u4 and u5
seem to be much better matches for user u1 in this example, since not only are their overall ratings
similar, but their preferences for different movie aspects are very similar as well (see Figure
2). Since both u4 and u5 rate movie i5 as 5, the value of target rating R(u1, i5) would be predicted
as 5, which is a very different outcome from the one obtained in a single-criteria rating scenario.
In summary, while the overall rating that a user gives to an item provides the information
regarding how much the user liked the item, multi-criteria ratings provide some insights
regarding why the user liked the item as much as she did. Therefore, having multi-criteria ratings
provides the possibility to estimate the similarity between two users more accurately.
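
The point that identical overall ratings can hide opposite tastes can be checked with a tiny Python example. The numbers below are hypothetical and are not the values from Figure 2; the only element taken from the text is the assumption that the overall rating is the simple average of the four criteria ratings (story, acting, direction, visuals).

```python
# Hypothetical (story, acting, direction, visuals) ratings for one movie, 1-13 scale.
criteria = {
    "u1": (2, 2, 10, 10),
    "u2": (10, 10, 2, 2),   # opposite tastes, same average
    "u4": (3, 2, 9, 10),    # similar tastes, same average
}


def overall(ratings):
    """Overall rating as the simple average of the criteria ratings."""
    return sum(ratings) / len(ratings)


def criteria_gap(a, b):
    """Sum of absolute per-criterion differences between two users' ratings."""
    return sum(abs(x - y) for x, y in zip(a, b))


for other in ("u2", "u4"):
    same_overall = overall(criteria[other]) == overall(criteria["u1"])
    print(other, same_overall, criteria_gap(criteria["u1"], criteria[other]))
```

All three users have the same overall rating, so a single-rating system cannot tell u2 from u4, while the per-criterion gap separates them clearly.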
Based on this idea, we propose to extend the standard collaborative filtering algorithm to
include multi-criteria ratings. Specifically, we propose several different ways to include multi-
criteria rating information in the calculation of the similarity between two different users sim(u,
u') or two different items sim(i, i'). Then, given the newly calculated similarity, the rating
prediction can be done using the weighted sum or adjusted weighted sum in the same way as
with a standard collaborative filtering algorithm, i.e., using Equations (2), (3), or (5). Below we
describe two different approaches to leverage multi-criteria ratings in the similarity computation.
This approach can use any standard similarity metric, such as cosine-based (4), and calculates the
similarity between users (or items) based on each individual criterion. Let's assume that each
rating that user u gives to item i consists of an overall rating r0 and k multi-criteria ratings r1,
..., rk, i.e.,
Then, k+1 different similarity estimations can be obtained by using some standard metric to
measure similarity between users u and u': sim0(u, u') represents the similarity between u and u'
based on the overall rating; sim1(u, u') represents the similarity based on the first criterion rating;
sim2(u, u') represents the similarity based on the second criterion rating; and so on. The overall
similarity then can be
$$\mathrm{sim}_{avg}(u, u') = \frac{1}{k+1} \sum_{i=0}^{k} \mathrm{sim}_i(u, u'), \qquad (8)$$
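
A minimal Python sketch of the average similarity in Equation (8) follows. It assumes, for illustration only, that each user's ratings are stored as a dictionary mapping an item to a tuple (r0, r1, ..., rk) and that the per-criterion similarity is the cosine-based metric; the function names and data are hypothetical.

```python
import math


def cosine_on_component(ratings_u, ratings_v, c):
    """Cosine similarity computed on rating component c over co-rated items."""
    common = set(ratings_u) & set(ratings_v)
    dot = sum(ratings_u[i][c] * ratings_v[i][c] for i in common)
    norm_u = math.sqrt(sum(ratings_u[i][c] ** 2 for i in common))
    norm_v = math.sqrt(sum(ratings_v[i][c] ** 2 for i in common))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def average_similarity(ratings_u, ratings_v, k):
    """Equation (8): mean of the k+1 similarities (overall rating plus k criteria)."""
    sims = [cosine_on_component(ratings_u, ratings_v, c) for c in range(k + 1)]
    return sum(sims) / (k + 1)


# Hypothetical data with k = 2 criteria; component 0 is the overall rating.
u = {"m1": (6, 2, 10), "m2": (8, 8, 8)}
v = {"m1": (6, 10, 2), "m2": (8, 8, 8)}

print(cosine_on_component(u, v, 0))   # similarity on overall ratings only
print(average_similarity(u, v, 2))    # lower once the criteria are included
```

In this toy case the overall ratings agree exactly, yet the averaged similarity drops because the two users disagree on the individual criteria.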
In a multi-criteria rating scenario, each rating R(u, i) = (r0, r1, ..., rk) represents a point in the
(k+1)-dimensional space. Therefore, one natural approach to compute similarity between different
users is to use multidimensional distance metrics. Such metrics are easy to understand and
straightforward to implement. Note that the metrics of distance and similarity are inversely
related: the smaller the distance between two users, the higher the similarity. We calculate the
First, we have to be able to calculate the distance between two users' ratings for the same
item, i.e., $d_{rating}(R(u, i), R(u', i))$, where $R(u, i) = (r_0, r_1, \ldots, r_k)$ and $R(u', i) = (r'_0, r'_1, \ldots, r'_k)$. For
this purpose, any of the standard multidimensional distance metrics can be used:
Manhattan distance: $\sum_{i=0}^{k} |r_i - r'_i|$; (10)

Euclidean distance: $\sqrt{\sum_{i=0}^{k} |r_i - r'_i|^2}$; (11)
Second, the overall distance between two users u and u' is simply:
$$d_{user}(u, u') = \frac{1}{|I(u, u')|} \sum_{i \in I(u, u')} d_{rating}\left( R(u, i), R(u', i) \right) \qquad (13)$$
where I(u, u') denotes the set of items that both u and u' have rated. In other words, the overall
distance between two users u and u' is the average distance between their ratings for all their
common items.
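
The two-level distance computation, a rating-level metric followed by the average in Equation (13), can be sketched in Python as below. Ratings are assumed to be stored as tuples (r0, r1, ..., rk) keyed by item; returning `None` for users with no common items is a convention chosen for this sketch only.

```python
import math


def manhattan(r, s):
    """Manhattan distance between two rating vectors, as in Eq. (10)."""
    return sum(abs(a - b) for a, b in zip(r, s))


def euclidean(r, s):
    """Euclidean distance between two rating vectors, as in Eq. (11)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(r, s)))


def user_distance(ratings_u, ratings_v, d_rating=euclidean):
    """Equation (13): average rating distance over the items both users rated."""
    common = set(ratings_u) & set(ratings_v)  # I(u, u')
    if not common:
        return None  # assumed convention: distance undefined without overlap
    return sum(d_rating(ratings_u[i], ratings_v[i]) for i in common) / len(common)


# Hypothetical data: (overall, criterion 1, criterion 2) per movie.
u = {"m1": (6, 2, 10), "m2": (8, 8, 8)}
v = {"m1": (6, 10, 2), "m2": (8, 8, 8), "m3": (1, 1, 1)}

print(user_distance(u, v, manhattan))
print(user_distance(u, v, euclidean))
```

Passing the rating-level metric as a parameter keeps the user-level average independent of which distance is chosen.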
Finally, because the collaborative filtering techniques operate with the metric of user
similarity (and not user distance), and the distance and similarity are inversely related, we use the
$$\mathrm{sim}(u, u') = \frac{1}{1 + d_{user}(u, u')}. \qquad (14)$$
Note that this definition of similarity has desired range properties, i.e., the similarity will
approach 0 as the distance between two users becomes larger, and it will be 1 if the distance is
In summary, both of the approaches presented in this section change only the similarity
function in the traditional collaborative filtering technique in order to reflect multi-criteria rating
information, which should result in a more accurate identification of similar users and,
the previous section apply primarily to the similarity-based recommenders, such as traditional
collaborative filtering techniques. In contrast, in this section we present a different approach that
is not limited to any specific recommendation algorithm. The intuition behind this approach
comes from the assumption that multi-criteria ratings represent users' preferences for the
different important components of an item (e.g., story, acting, direction, and visuals aspects in the
case of movie recommender systems). Thus, the overall rating of an item is not just another
rating that is independent of others, but rather serves as some aggregation function f of the
$$r_0 = f(r_1, \ldots, r_k). \qquad (15)$$
In other words, this approach assumes that the overall rating has a certain relationship with
multi-criteria ratings. For instance, in a movie recommendation application, the story criterion rating
may have a very high priority, i.e., the movies with high story ratings are well liked overall by
some users, regardless of other criteria ratings. Therefore, if the story rating of the movie is
predicted to be high, the overall rating of the movie must also be predicted as high in order to be
accurate.
The proposed approach to rating estimation consists of the following three steps, as
single-rating recommendation problems and use any traditional single-criteria recommendation
technique to estimate ratings for each individual criterion. Second, we use statistical or machine
learning techniques to estimate aggregation function f based on the known ratings. And third,
using the multi-criteria ratings estimated in step 1 and function f estimated in step 2, we directly
calculate the predicted overall rating. Below we discuss each of these steps in more detail.
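
The three steps can be sketched as a small Python pipeline. Everything named here is hypothetical: the per-criterion predictors are constant stand-ins for whatever single-rating technique is used in step 1, and the simple average stands in for an aggregation function f that would be obtained in step 2.

```python
def predict_overall(user, item, criterion_predictors, aggregate):
    """Estimate the overall rating r0 from predicted criteria ratings."""
    # Step 1: predict each criterion rating with its own single-rating predictor.
    predicted_criteria = [predict(user, item) for predict in criterion_predictors]
    # Step 3: combine the predictions with the aggregation function f from step 2.
    return aggregate(predicted_criteria)


def simple_average(criteria):
    """Placeholder aggregation function f (assumed here, not learned)."""
    return sum(criteria) / len(criteria)


# Stand-in predictors for story, acting, direction, and visuals (constants for the demo).
criterion_predictors = [
    lambda user, item: 10.0,
    lambda user, item: 8.0,
    lambda user, item: 9.0,
    lambda user, item: 9.0,
]

estimate = predict_overall("u1", "i5", criterion_predictors, simple_average)
print(estimate)
```

Because the predictors and the aggregation function are passed in as arguments, either part can be replaced without touching the other, which mirrors the claim that this approach is not tied to a specific recommendation algorithm.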
[Figure: overview of the aggregation function approach. Known ratings: R(u, i) = (r0, r1, ..., rk). (1) Predict k multi-criteria ratings using any traditional recommendation technique (given: ri for each i = 1, ..., k; compute: predicted ri). (2) Learn aggregation function f using statistical or machine learning techniques (given: (r0, r1, ..., rk); estimate: f such that r0 = f(r1, ..., rk)).]
traditional Users × Items matrix (like the one in Figure 1) and addresses the rating prediction for
one of the individual criteria. In other words, instead of the multi-criteria recommendation
since (unlike with similarity-based approaches mentioned in previous section) any existing
single-criteria recommendation technique (e.g., collaborative, content-based, or hybrid) can be
The goal of this step is to estimate relationship f between the overall rating and the underlying
multi-criteria ratings of items, such that r0 = f(r1, ..., rk). We are already able to predict the
individual multi-criteria ratings (see Step 1 above), but one of the important goals of
recommendation systems is to be able to predict the overall rating of each item for each user, which
can be helpful in different situations. For example, having the overall rating for each item
enables the recommender system to rank all items for each user in terms of their predicted utility
(i.e., overall rating) and recommend only the most relevant items. In contrast, to determine the
most relevant items without the presence of the overall rating, the recommender system would
have to deal with a much more complex multi-criteria optimization problem (Statnikov &
Matusov 1995). Thus, finding the aggregation function is crucial for recommender systems, and
Domain expertise. Based on her prior experience and knowledge of the domain, the
domain expert may suggest the appropriate aggregation function. For example, it may
be the case that the overall rating is a simple average of the underlying multi-criteria
techniques. For example, in the case of linear regression, the aggregation function for
the overall rating would be a linear combination of the multi-criteria ratings, i.e.,
the importance of this criterion in determining the overall rating. The weights wi (i = 1,
can also be used for this purpose, e.g., artificial neural networks (Mitchell 1997).
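
For the linear regression case, a least-squares fit of the aggregation function can be sketched in plain Python as follows. The exact regression form used in the paper is not recoverable from this text, so the intercept term, the function names, and the training samples are all assumptions of this example; the solver is a small Gauss-Jordan routine on the normal equations, chosen only to keep the sketch dependency-free.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: criteria ratings are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_linear_aggregation(samples):
    """Fit r0 ~ c + w1*r1 + ... + wk*rk from known (r0, r1, ..., rk) tuples."""
    rows = [[1.0] + list(s[1:]) for s in samples]  # leading 1.0 is the assumed intercept
    targets = [s[0] for s in samples]
    n = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(n)]
    return solve_linear_system(xtx, xty)  # [c, w1, ..., wk]


def aggregate(coeffs, criteria):
    """Apply the fitted function to a vector of criteria ratings."""
    return coeffs[0] + sum(w * r for w, r in zip(coeffs[1:], criteria))


# Hypothetical samples generated from r0 = 1 + 0.5*r1 + 0.25*r2.
samples = [(3.0, 2, 4), (3.5, 4, 2), (6.0, 6, 8), (6.0, 8, 4), (9.0, 10, 12)]
coeffs = fit_linear_aggregation(samples)
print(coeffs, aggregate(coeffs, (6, 6)))
```

A user-based or item-based aggregation function would be obtained the same way by restricting `samples` to the known ratings of one user or one item.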
Besides the ability to use different learning techniques, the aggregation function can also be of
function if it is used to predict all unknown ratings, e.g., if the criteria weights wi in a regression-
based function mentioned above are consistent for all users and items. However, depending on
functions in some applications. For example, in a movie recommender system, user u may have
a much larger weight on the story component that is consistent for all movies, whereas user u'
may have a significant weight on the visuals component. In this case, it would be
advantageous for user u to have her own user-based aggregation function fu, which would be
learned exclusively from the known ratings of user u (as opposed to all known ratings) using the
assume that each item i will have its own aggregation function that is consistent for all the
Finally, note that a variety of different techniques are available for testing the fitness or
accuracy of the predicted aggregation function(s). For example, in the case of linear regression,
one can estimate the predictive power using its R² value. Or, more generally, one could use
standard n-fold cross validation techniques to estimate the predictive accuracy of the aggregation
function (Mitchell 1997). Therefore, we have the ability to restrict the use of user-based (or
item-based) aggregation functions only to the ones that exhibit sufficient predictive performance,
e.g., whose accuracy is greater than some pre-specified threshold. The remaining users (or
items) could use other techniques, e.g., the total aggregation function. As with every data-driven
computational learning technique, there will be application domains where this approach will
work well (i.e., domains where users/items exhibit consistent preferences on each criterion) and
Finally, as mentioned earlier, we compute each unknown overall rating r0 directly by using the
multi-criteria ratings estimated in step 1 and function f estimated in step 2: r0 = f(r1, ..., rk).
Up to this point, we have focused on how new techniques can potentially improve the estimation
enhancement, the usage of multi-criteria ratings in recommender systems can provide other
according to user-specific requests. In other words, recommendations typically are fixed for all
users (e.g., provide the 5 most relevant items to each user), and cannot be adjusted by the users on
the fly. There have been interesting attempts to provide recommendation filtering capabilities
based on some item content information (see, for example, Schafer 2005); however, while
undoubtedly useful, this filtering is typically done on the user-specified information that is fixed
to an item and, therefore, the same for all users. For example, in a movie recommender system
the users may be able to narrow their movie recommendations based on the movie genre, MPAA
rating, film length, etc. (Schafer 2005). However, in a multi-criteria recommender system
(similar to the one shown in Figure 2), a certain user may want to request only exceptionally
good story movies, where the story component of a movie is completely subjective to each
user and, as mentioned earlier, is estimated individually for each user. Multi-criteria rating
information would allow the recommender systems to respond to users' individual dynamic
5. Experimental Results
To evaluate the proposed approaches, we have collected a set of user-submitted movie ratings
from the Yahoo! Movies website (movies.yahoo.com) for several hundred randomly chosen movies
from the last decade. When a user submits movie ratings to Yahoo! Movies, in addition to the
overall rating, she is asked to rate each movie on four criteria: story, acting,
direction, and visuals. All ratings have 13 possible values and are based on a standard grading
scale from A+ to F; for the analysis purposes we changed them to numerical values from 13 to 1.
In the data preprocessing stage, we imposed two constraints on the dataset in order to ensure that
the dataset is not extremely sparse and has enough data for rating prediction: (a) there should be
at least 10 movie ratings per user and (b) at least 10 user ratings per movie.
As a result, we ended up with a dataset that includes 155 users, 50 movies, and has 2,216
known ratings in total (28.6% of ratings are known). Each user has rated 14.3 movies on
average, and the average number of common movies between two users is 5.2. Each movie has
been rated on average by 44.3 users, and the average number of common users between two
movies is 13.6. The average rating on each criterion is approximately 9 (or B).
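
The reported density and per-user and per-movie averages follow directly from the three dataset counts, as this short Python check shows. Only the counts are taken from the text; the variable names are illustrative.

```python
users, movies, known_ratings = 155, 50, 2216

density = known_ratings / (users * movies)   # fraction of the Users x Items matrix that is known
ratings_per_user = known_ratings / users     # average number of movies rated by a user
ratings_per_movie = known_ratings / movies   # average number of users rating a movie

print(f"{density:.1%} known, {ratings_per_user:.1f} per user, {ratings_per_movie:.1f} per movie")
```

The averages of common movies between two users (5.2) and common users between two movies (13.6) cannot be derived from these counts alone, since they depend on how the known ratings overlap.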
Furthermore, in order to obtain reliable results with a relatively small amount of data, we
use a standard 10-fold cross validation technique (Mitchell 1997), where we randomly divide the
dataset into 10 disjoint subsets. We use nine-tenths of the data for training, and the remaining
one-tenth for testing rating prediction, and then repeat this process 10 times (each time with a
different test dataset) and perform the evaluation on all predicted ratings.
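
The 10-fold procedure can be sketched in Python as below. The helper name, the fixed random seed, and the use of integer placeholders for rating records are assumptions made for illustration; the text specifies only that the known ratings are randomly divided into 10 disjoint subsets.

```python
import random


def k_fold_splits(records, k=10, seed=0):
    """Yield (train, test) pairs so that every record is tested exactly once."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)          # random division of the data
    folds = [shuffled[i::k] for i in range(k)]     # k disjoint, near-equal subsets
    for held_out in range(k):
        test = folds[held_out]
        train = [r for j, fold in enumerate(folds) if j != held_out for r in fold]
        yield train, test


# Placeholder records standing in for the 2,216 known ratings.
records = list(range(2216))
for train, test in k_fold_splits(records):
    pass  # train a recommender on `train`, then predict the ratings in `test`
print(len(train), len(test))
```

Since 2,216 is not divisible by 10, the folds differ in size by at most one record.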
Numerous metrics for evaluating the performance of recommender systems have been
proposed and used in the research literature (Herlocker et al. 2004), including the statistical
accuracy metrics (e.g., mean absolute error and root mean squared error) as well as decision-
support measures that determine how well the recommendation algorithm can predict high-
relevance items (i.e., items that would be rated highly by the user). Examples of decision-
support metrics include precision (the percentage of truly high ratings among those that were
predicted to be high by the recommender system), recall (the percentage of correctly predicted
high ratings among all the ratings known to be high), and F-measure, which is the harmonic
mean of precision and recall.
In this paper, we focus on a popular variation of the above-mentioned precision
metric, i.e., precision-in-top-N, which represents the percentage of truly high overall ratings
among the N items predicted to be most relevant for each user. This metric was
chosen because of its practicality, since many users in real-life personalization and
recommendation applications are typically interested in looking at only a few of the highest-ranked item
recommendations.
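One way to compute precision-in-top-N is sketched below. The nested-dictionary layout, the restriction to items whose actual rating is known, and the pooling of hits over all users (rather than averaging per-user precision) are assumptions of this example; the default threshold of 10.5 is the cut-off for highly-ranked ratings used in this paper.

```python
def precision_in_top_n(predicted, actual, n, threshold=10.5):
    """Share of top-n predicted items whose actual rating exceeds the threshold.

    predicted, actual: dict mapping user -> dict mapping item -> rating
    (a hypothetical layout). Only items with a known actual rating are ranked.
    """
    hits = 0
    total = 0
    for user, preds in predicted.items():
        known = [(item, score) for item, score in preds.items()
                 if item in actual.get(user, {})]
        # Rank the user's items by predicted rating, highest first.
        top = sorted(known, key=lambda pair: pair[1], reverse=True)[:n]
        hits += sum(actual[user][item] > threshold for item, _ in top)
        total += len(top)
    return hits / total if total else 0.0
```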
system makes correct decisions about whether an item that is predicted as highly-ranked is
other words, every rating had to be defined on a binary scale, i.e., as highly-ranked or non-
highly-ranked. Since the Yahoo! Movies rating scale (from A+ to F) was not binary, we translated
the overall movie ratings into a binary scale by treating the ratings greater than 10.5 (A+, A, A-)
as highly-ranked and ratings less than 10.5 as non-highly-ranked. The threshold of 10.5 was
chosen with the assumption that the users would really want to focus on the recommendations
about movies that are most relevant to them (i.e., movies they would rate as A+, A, A-), and
Also note that, in our dataset, the percentage of the highly-ranked ratings (i.e., overall
ratings above 10.5) was 35.6%, which means that it would be possible to obtain a precision of
35.6% simply by recommending items at random. Any recommender system that does not
achieve 35.6% precision would be worse than a random guess and, therefore, essentially useless.
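The binarization and the random-guess baseline can be illustrated with the following sketch. The function names are introduced for this example only, and the ratings in the usage note are toy values rather than the Yahoo! Movies data.

```python
THRESHOLD = 10.5  # ratings of 11, 12, and 13 (A-, A, A+) count as highly-ranked


def is_highly_ranked(rating, threshold=THRESHOLD):
    """Translate a numerical rating into the binary highly-ranked label."""
    return rating > threshold


def random_baseline_precision(ratings, threshold=THRESHOLD):
    """Expected precision of random recommendations: the share of highly-ranked ratings."""
    ratings = list(ratings)
    return sum(is_highly_ranked(r, threshold) for r in ratings) / len(ratings)
```

For the toy ratings [13, 12, 9, 5, 11, 8, 10, 7], three of the eight values exceed 10.5, so the random baseline is 0.375; the same calculation on our dataset gives the 35.6% reported above.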
techniques on real-life data, we performed the empirical analysis of the following five approaches
weighted sum and the cosine similarity metric, as described in (3) and (4). This approach
Two similarity-based techniques (as described in Section 4.1) implemented with the
distance metric.
approach:
regression.
separately for each movie and restricted only for the movies that have the best
Note that we use the standard user-based collaborative filtering approach as an integral
part of every technique in order to minimize the non-essential differences between the techniques
as much as possible and, thus, to maximize the possibility that any differences in performance
between the standard CF and multi-criteria recommender systems are due to the newly
neighborhood sizes (the neighborhood of all users vs. the neighborhood of the 3 most similar
users) and for different precision-in-top-N levels (N = 3, 5, and 7). The results are summarized in
Table 1. The shaded cells represent the performance of the baseline CF approach. Note that
nearly every multi-criteria technique performed either better or at least as well as the baseline
technique. The precision figures in regular font represent 0%-1% improvement over the baseline
approach. The boldface precision figures represent 1%-4% improvement over the baseline
approach, and the boldface precision figures marked with *** represent >4% improvement over
the baseline approach. For further comparison, we have also calculated the precision-in-top-N
recommended N movies (N = 3, 5, and 7) that are most liked by all other users, based on the
average rating for each movie. The precision-in-top-N for this simple
approach is 61.3% (top 3), 53.3% (top 5), and 46.4% (top 7), which is better than the
random-guess approach mentioned earlier but not as good as the collaborative filtering techniques.
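A non-personalized baseline of this kind can be sketched as follows. The triple layout, the tie-breaking rule, and the decision not to exclude movies the target user has already rated are assumptions of this example and may differ from the procedure used to obtain the figures above.

```python
from collections import defaultdict


def popularity_top_n(ratings, target_user, n):
    """Return the n movies with the highest average rating among all other users.

    ratings: iterable of (user, movie, rating) triples (a hypothetical layout).
    Ties are broken by movie identifier to keep the result deterministic.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for user, movie, rating in ratings:
        if user != target_user:  # leave out the target user's own ratings
            sums[movie] += rating
            counts[movie] += 1
    averages = {movie: sums[movie] / counts[movie] for movie in sums}
    return sorted(averages, key=lambda movie: (-averages[movie], movie))[:n]
```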
For user-based CF, precision-in-top-1 measures (as opposed to top-3, top-5, and top-7
techniques (such as Chebyshev and cos-min), which typically outperformed both the
We also tried movie-based CF (as opposed to user-based CF, as in Table 1), for which
total-reg performed the best of all the techniques for various neighborhood sizes and
generally consistent with similar findings in recommender systems literature about the
widely reported that combining content-based and collaborative systems may improve the
recommendation accuracy).
As with most recommender systems and, more generally, computational learning techniques, the
depends significantly on the characteristics of the underlying data. Thus, while we expect the
criterion techniques in all domains where multi-criteria information exists, especially in the ones
relationship between the overall rating and multi-criteria ratings for the users or items.
6. Conclusions
propose two new recommendation approaches, the similarity-based approach and the
information. Our experimental results on a real-world dataset confirm that, when available,
expect that the proposed approaches will be useful in other application domains as well, where
they will be able to predict overall ratings more accurately by utilizing the available multi-criteria
rating information.
The area of recommender systems has made significant progress over the last few years;
many techniques have been proposed and many systems have been developed. However,
modern recommender systems still require further significant improvements in order to provide
better recommendations and be viable in more complex personalization applications; the ability
to leverage multi-criteria rating information constitutes one such improvement. We believe that
this paper is just the first step in studying multi-criteria recommender systems and that
Acknowledgments
The research reported in this paper was supported in part by the National Science Foundation
grant IIS-0546443.
References
Knowledge and Data Engineering, vol. 17, no. 6, pp. 734-749, Jun. 2005.
1998.
J. Figueira, S. Greco, and M. Ehrgott, Multiple Criteria Decision Analysis: State of the Art
P. E. Green, A. M. Krieger, and Y. Wind, Thirty years of conjoint analysis: Reflections and
J. L. Herlocker, J. A. Konstan, L. G. Terveen, and J. T. Riedl, Evaluating Collaborative
W. Lee, C. Liu, and C. Lu, Intelligent agent-based systems for personalized recommendations
in Internet commerce, Expert Systems with Applications, vol. 22, no. 4, pp. 275-284,
May 2002.
Information Technology and Tourism, vol. 4, nos. 3-4, pp. 215-226, 2002.
R. B. Statnikov and J. B. Matusov, Multicriteria Optimization and Engineering, Chapman &
Hall, 1995.