DOI: 10.26599/BDMA.2018.9020012
Abstract: The Collaborative Filtering (CF) recommendation algorithm, one of the most popular algorithms in
Recommendation Systems (RS), mainly includes memory-based and model-based methods. When performing
rating prediction using a memory-based method, the approach used to measure the similarity between users or
items can significantly influence the recommendation performance. Traditional CFs suffer from data sparsity when
making recommendations based on a rating matrix, and cannot effectively capture changes in user interest. In
this paper, we propose an improved hybrid collaborative filtering algorithm based on tags and a time factor (TT-
HybridCF), which fully utilizes tag information that characterizes users and items. This algorithm utilizes both tag
and rating information to calculate the similarity between users or items. In addition, we introduce a time weighting
factor to measure user interest, which changes over time. Our experimental results show that our method alleviates
the sparsity problem and demonstrates promising prediction accuracy.
similar interests with a target user as neighbor users. Therefore, to make a recommendation to a user, UserCF first identifies neighbor users by mining the user’s historical behavior. Then, it makes recommendations to the target user based on the behavior of these neighbor users. The main steps in UserCF are as follows:

Step one: Calculate the similarity between users.
In UserCF, the measurement of user similarity is a critical step that directly impacts the accuracy of recommendations. There are several commonly used similarity measures in RS, including the Pearson Correlation Coefficient (PCC), Euclidean distance, cosine similarity, and modified cosine similarity methods, the specific formulas of which are listed in Table 2. Of these, the PCC is the most frequently used and most accurate similarity measure. PCC calculates the similarity between users based on an item set that is co-rated by users. The PCC formula is shown in Eq. (1) below.

$$\mathrm{sim}(u,v) = \frac{\sum_{j \in I_{uv}} (R_{uj} - \bar{R}_u)(R_{vj} - \bar{R}_v)}{\sqrt{\sum_{j \in I_{uv}} (R_{uj} - \bar{R}_u)^2}\,\sqrt{\sum_{j \in I_{uv}} (R_{vj} - \bar{R}_v)^2}} \tag{1}$$

We use the average score for users u and v to eliminate any difference in different users’ scoring scales and to ensure similarity accuracy. For example, some users tend to give high ratings for items, whereas other users are more demanding and give lower scores, despite the fact that these users may have the same interest.

Step two: Find the target user’s neighborhood set.
After Step one, we obtain the user similarity matrix and sort this matrix according to the degree of similarity. Then, we can determine the top k-neighbor users who are most similar to the target user, which we label as N(u).

Step three: Predict the unknown rating of the target user u for item i.
According to the known score of a neighbor set, we can predict the unknown rating of target user u for item i. Using UserCF, we can predict the rating of user u for item i, which can be expressed as shown in Eq. (2).

$$P_{\mathrm{UserCF}}(R_{ui}) = \bar{R}_u + \frac{\sum_{v \in N(u)} \mathrm{sim}(u,v)\,(R_{vi} - \bar{R}_v)}{\sum_{v \in N(u)} \mathrm{sim}(u,v)} \tag{2}$$

The above equation indicates that we can determine the unknown rating of the target user u for item i based on the weighted sum of the known rating of the user’s neighbor set for item i. We calculate sim(u, v) using Eq. (1).

Table 2 Traditional methods for measuring similarities.

Cosine similarity:
$$\mathrm{sim}(u,v) = \frac{\sum_{j=1}^{n} R_{uj} R_{vj}}{\sqrt{\sum_{j=1}^{n} R_{uj}^2}\,\sqrt{\sum_{j=1}^{n} R_{vj}^2}}$$

Modified cosine similarity:
$$\mathrm{sim}(u,v) = \frac{\sum_{j \in I_{uv}} (R_{uj} - \bar{R}_u)(R_{vj} - \bar{R}_v)}{\sqrt{\sum_{j \in I_u} (R_{uj} - \bar{R}_u)^2}\,\sqrt{\sum_{j \in I_v} (R_{vj} - \bar{R}_v)^2}}$$

Pearson correlation coefficient:
$$\mathrm{sim}(u,v) = \frac{\sum_{j \in I_{uv}} (R_{uj} - \bar{R}_u)(R_{vj} - \bar{R}_v)}{\sqrt{\sum_{j \in I_{uv}} (R_{uj} - \bar{R}_u)^2}\,\sqrt{\sum_{j \in I_{uv}} (R_{vj} - \bar{R}_v)^2}}$$

2.2 Item-based collaborative filtering

Linden et al.[3] first proposed ItemCF, which assumes that a user may like items that are similar to items they have previously liked. The main steps of ItemCF are as follows:

Step one: Calculate the similarity between items.
The calculation of the similarity between items i and j is based on a user set in which users rate items i and j together. We again use PCC to calculate the similarity of items, as shown in Eq. (3).

$$\mathrm{sim}(i,j) = \frac{\sum_{u \in U_{ij}} (R_{ui} - \bar{R}_i)(R_{uj} - \bar{R}_j)}{\sqrt{\sum_{u \in U_{ij}} (R_{ui} - \bar{R}_i)^2}\,\sqrt{\sum_{u \in U_{ij}} (R_{uj} - \bar{R}_j)^2}} \tag{3}$$

Step two: Find the target item’s neighborhood set.
We then sort the obtained similarity matrix based on the degree of similarity. Then, we can obtain the top k-neighbor items that are most similar to the target item, which we label as N(i).

Step three: Predict the unknown rating of the target item.
Equation (4) shows the use of ItemCF to predict the rating of user u for item i.

$$P_{\mathrm{ItemCF}}(R_{ui}) = \bar{R}_i + \frac{\sum_{j \in N(i)} \mathrm{sim}(i,j)\,(R_{uj} - \bar{R}_j)}{\sum_{j \in N(i)} \mathrm{sim}(i,j)} \tag{4}$$

This equation indicates that the unknown rating of user u for target item i can be represented by the known rating of the target item i’s neighbor set. sim(i, j) represents the similarity of items i and j, which is
Chunxia Zhang et al.: An Improved Hybrid Collaborative Filtering Algorithm Based On Tags and Time Factor 131
calculated using Eq. (3).

From the above, we can see that UserCF recommends hotspots in a group in which members have the same interest as those of a target user, i.e., UserCF emphasizes socialization. In contrast, ItemCF emphasizes individualization, in that it recommends similar items based on a user’s historical behavior. However, both methods suffer from problems of sparsity and cold start.

2.3 Hybrid CF

When using just one recommendation algorithm, the resulting recommendation accuracy is not very high, since UserCF and ItemCF both have advantages and disadvantages. Therefore, to make up for the shortcomings of individual algorithms, some scholars have proposed the integration of different recommendation algorithms when making recommendations. The authors of Ref. [24] integrated UserCF and ItemCF in making recommendations, as shown in Eq. (5) below.

$$R_{ui} = \lambda \left( \bar{R}_u + \frac{\sum_{v \in N(u)} \mathrm{sim}(u,v)\,(R_{vi} - \bar{R}_v)}{\sum_{v \in N(u)} \mathrm{sim}(u,v)} \right) + (1 - \lambda) \left( \bar{R}_i + \frac{\sum_{j \in N(i)} \mathrm{sim}(i,j)\,(R_{uj} - \bar{R}_j)}{\sum_{j \in N(i)} \mathrm{sim}(i,j)} \right) \tag{5}$$

The first part of Eq. (5) comprises the prediction based on UserCF and the second part on ItemCF. The parameter $\lambda$ is an adjustment parameter, which controls the degree to which this method relies on UserCF and ItemCF.

3 Our Proposed Model

In this paper, we propose an improved collaborative filtering method based on tags and a time factor (TT-Hybrid CF) for RS. In calculating similarity, the TT-Hybrid CF algorithm utilizes tag and rating information to calculate the similarity of users (or items). In addition, it employs a hot-item penalty when calculating users’ similarity to penalize the influence of a hot item in their co-rated items. In the prediction phase, TT-Hybrid CF takes the time factor into account to measure user interest, which changes over time.

3.1 Calculating similarity with tags

Existing collaborative filtering algorithms only use a rating matrix to calculate similarity, but this rating matrix is very sparse, so they experience sparsity problems. The authors in Ref. [19] proposed an improved collaborative filtering algorithm based on the combination of tags and ratings, known as UTR-CF. In this paper, we use this UTR-CF method to calculate the tag similarity between users or items. In the MovieLens dataset, the movie tag information is the genre, e.g., action, adventure, animation, and the user tag information comprises demographic characteristics, e.g., {Man, 28, ‘educator’}. Before calculating the similarity of a tag set, we first transform the tag set and other text information into numerical form to facilitate the modeling process. Assume that two users (or two movies) are converted to numerical information and are represented as two vectors in m-dimensional space: $t = (t_1, t_2, \ldots, t_m)$, $s = (s_1, s_2, \ldots, s_m)$. Then, we use the cosine similarity to calculate the similarity of the tag vectors, as follows:

$$\mathrm{sim}(t,s) = \frac{\sum_{k=1}^{m} t_k s_k}{\sqrt{\left(\sum_{k=1}^{m} t_k^2\right)\left(\sum_{k=1}^{m} s_k^2\right)}} \tag{6}$$

3.2 Hot item punishment

In UserCF, the traditional method for calculating user similarity is to consider item ratings co-rated by two users without considering the influence of the hot items. For example, if two users buy a Xinhua Dictionary, this does not mean that they have the same interest because most Chinese people have bought this book. However, if both users buy a book with the title Machine Learning, we can consider that they have the same interest because only those who study this field of research would buy this book. In summary, if two users buy hot items, this does not indicate that they have the same interests. As such, we introduce a weight $w_i$ to reduce the influence of hot items on user similarity, as shown in Eq. (7).

$$w_i = \frac{1}{\lg(1 + N_i)} \tag{7}$$

In this equation, $i \in I_{uv}$ and $N_i$ represents the number of users who have rated the item i.

3.3 Temporal weight

In practice, user interest in items fluctuates, but traditional collaborative filtering algorithms do not take this into consideration. The recent behavior of users is more influential than their earlier behavior. If a user liked an item last month, this does not mean that he (or she) still likes that item this month. Recent behavior is more likely to indicate a user’s current interest. So,
132 Big Data Mining and Analytics, June 2018, 1(2): 128-136
Fig. 1 Effect of parameter β on MAE of TT-UserCF algorithm.
Fig. 2 Effect of parameter α on MAE values of TT-ItemCF algorithm.
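As an illustration of the two similarity ingredients defined in Sections 3.1 and 3.2, the following is a minimal sketch of the tag cosine similarity of Eq. (6) and the hot-item weight of Eq. (7). It is not the authors' implementation. The function names, the one-hot genre encoding, and the handling of all-zero vectors are assumptions made for this example, and lg is read as the base-10 logarithm.

```python
import math


def tag_cosine(t, s):
    """Cosine similarity of two equal-length numeric tag vectors, as in Eq. (6)."""
    if len(t) != len(s):
        raise ValueError("tag vectors must have the same dimension m")
    dot = sum(tk * sk for tk, sk in zip(t, s))
    norm = math.sqrt(sum(tk * tk for tk in t) * sum(sk * sk for sk in s))
    # Assumption: an all-zero vector (no tags) yields similarity 0 instead of an error.
    return dot / norm if norm > 0 else 0.0


def hot_item_weight(n_raters):
    """Weight w_i = 1 / lg(1 + N_i) from Eq. (7), with lg taken as log base 10 (assumption)."""
    if n_raters < 1:
        raise ValueError("N_i must be at least 1 for a co-rated item")
    return 1.0 / math.log10(1 + n_raters)


# Hypothetical one-hot genre encoding over (action, adventure, animation).
movie_a = [1, 1, 0]
movie_b = [1, 0, 0]
print(tag_cosine(movie_a, movie_b))

# A widely rated item receives a smaller weight than a rarely rated one.
print(hot_item_weight(9), hot_item_weight(999))
```

How the weight enters the user similarity and how tag and rating similarities are combined are not covered by this sketch.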
Table 3 MAE values of UserCF, UTR-UserCF, and TT-UserCF algorithms.
Table 5 MAE values of the UserCF, ItemCF, HybridCF, UTR-Hybrid, and TT-HybridCF algorithms.
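The MAE values in these tables compare predicted ratings with held-out ratings. To make the UserCF baseline concrete, the sketch below implements the PCC of Eq. (1), the prediction of Eq. (2), and the standard mean absolute error. It is a toy illustration under stated assumptions rather than the experimental code: the rating dictionary is invented, each user's mean is taken over all items that user rated, neighbors are restricted to users who rated the target item, and the prediction falls back to the user's mean when the similarity sum is not positive.

```python
import math


def pearson(ratings_u, ratings_v):
    """PCC of Eq. (1) over co-rated items; each argument maps item id -> rating."""
    common = set(ratings_u) & set(ratings_v)
    if not common:
        return 0.0
    # Assumption: each user's mean is computed over all items that user rated.
    mean_u = sum(ratings_u.values()) / len(ratings_u)
    mean_v = sum(ratings_v.values()) / len(ratings_v)
    num = sum((ratings_u[j] - mean_u) * (ratings_v[j] - mean_v) for j in common)
    den_u = math.sqrt(sum((ratings_u[j] - mean_u) ** 2 for j in common))
    den_v = math.sqrt(sum((ratings_v[j] - mean_v) ** 2 for j in common))
    return num / (den_u * den_v) if den_u > 0 and den_v > 0 else 0.0


def predict_usercf(ratings, u, i, k=1):
    """Eq. (2): mean of u plus similarity-weighted rating deviations of top-k neighbors."""
    mean_u = sum(ratings[u].values()) / len(ratings[u])
    # Assumption: only users who rated item i are considered as neighbors.
    candidates = [(pearson(ratings[u], ratings[v]), v)
                  for v in ratings if v != u and i in ratings[v]]
    neighbors = sorted(candidates, reverse=True)[:k]
    num = sum(s * (ratings[v][i] - sum(ratings[v].values()) / len(ratings[v]))
              for s, v in neighbors)
    den = sum(s for s, _ in neighbors)
    # Assumption: fall back to the user's mean when the similarity sum is not positive.
    return mean_u + num / den if den > 0 else mean_u


def mae(predicted, actual):
    """Mean absolute error between paired predictions and true ratings."""
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(actual)


# Invented toy ratings: user -> {item -> rating}.
ratings = {
    "u1": {"a": 5, "b": 3, "c": 4},
    "u2": {"a": 4, "b": 2, "c": 3, "d": 4},
    "u3": {"a": 1, "b": 5, "d": 2},
}
pred = predict_usercf(ratings, "u1", "d", k=1)
print(pred, mae([pred], [5]))
```

With a single neighbor the similarity cancels in Eq. (2), so the prediction reduces to the target user's mean plus that neighbor's deviation from their own mean. Because the denominator of Eq. (2) sums signed similarities, mixing positively and negatively correlated neighbors can drive it toward zero, which is why the sketch guards that case.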
collaborative filtering method based on tags and the time factor (TT-Hybrid CF). In the process of calculating similarity, we used both tag and rating information. In addition, the TT-Hybrid CF introduces a hot-item penalty to the calculation of users’ similarity to penalize the influence of a hot item among co-rated items. In the process of rating prediction, TT-Hybrid CF takes into consideration the users’ interest by introducing a temporal weight to measure the changing user interest over time. Compared with four other collaborative filtering algorithms (UserCF, HybridCF, ItemCF, and UTR-CF), our proposed TT-Hybrid CF achieves a substantial improvement in recommendation performance. In future work, we will continue to research the problems of sparsity and cold start in traditional collaborative filtering.

Acknowledgment

This work was supported by the National Natural Science Foundation of China (Nos. 61432008 and 61272222).

References

[1] G. Adomavicius and A. Tuzhilin, Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions, IEEE Trans. Knowl. Data Eng., vol. 17, no. 6, pp. 734–749, 2005.
[2] L. Y. Lü, M. Medo, C. H. Yeung, Y. C. Zhang, Z. K. Zhang, and T. Zhou, Recommender systems, Phys. Rep., vol. 519, no. 1, pp. 1–49, 2012.
[3] G. Linden, B. Smith, and J. York, Amazon.com recommendations: Item-to-item collaborative filtering, IEEE Int. Comput., vol. 7, no. 1, pp. 76–80, 2003.
[4] O. Celma and X. Serra, FOAFing the music: Bridging the semantic gap in music recommendation, Web Semant., vol. 6, no. 4, pp. 250–256, 2008.
[5] F. Hopfgartner, T. Brodt, J. Seiler, B. Kille, A. Lommatzsch, M. Larson, R. Turrin, and A. Serény, Benchmarking news recommendations: The CLEF NewsREEL use case, ACM SIGIR Forum, vol. 49, no. 2, pp. 129–136, 2015.
[6] M. Balabanović and Y. Shoham, Combining content-based and collaborative recommendation, Commun. ACM, vol. 40, no. 3, pp. 66–72, 1997.
[7] J. B. Shu, X. X. Shen, H. Liu, B. L. Yi, and Z. L. Zhang, A content-based recommendation algorithm for learning resources, Multimed. Syst., doi: 10.1007/s00530-017-0539-8.
[8] Y. Koren, Factorization meets the neighborhood: A multifaceted collaborative filtering model, in Proc. 14th ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, Las Vegas, NV, USA, 2008, pp. 426–434.
[9] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl, Item-based collaborative filtering recommendation algorithms, in Proc. 10th Int. Conf. World Wide Web, Hong Kong, China, 2001, pp. 285–295.
[10] A. Felfernig, Koba4MS: Selling complex products and services using knowledge-based recommender technologies, in Proc. 7th IEEE Int. Conf. E-Commerce Technology, Munich, Germany, 2005, pp. 92–100.
[11] A. Felfernig and K. Shchekotykhin, Debugging user interface descriptions of knowledge-based recommender applications, in Proc. 11th Int. Conf. Intelligent User Interfaces, Sydney, Australia, 2006, pp. 234–241.
[12] J. Wang, A. P. De Vries, and M. J. T. Reinders, Unifying user-based and item-based collaborative filtering approaches by similarity fusion, in Proc. 29th Annual Int. ACM SIGIR Conf. Research and Development in Information Retrieval, Seattle, WA, USA, 2006, pp. 501–508.
[13] B. L. Wang, J. H. Huang, L. B. Ou, and R. Wang, A collaborative filtering algorithm fusing user-based, item-based and social networks, in Proc. 2015 IEEE Int. Conf. Big Data, Santa Clara, CA, USA, 2015, pp. 2337–2343.
[14] H. Koohi and K. Kiani, User based collaborative filtering using fuzzy c-means, Measurement, vol. 91, pp. 134–139, 2016.
[15] X. Y. Liu, C. Aggarwal, Y. F. Li, X. N. Kong, X. Y. Sun, and S. Sathe, Kernelized matrix factorization for collaborative filtering, in Proc. 2016 SIAM Int. Conf. Data Mining, Miami, FL, USA, 2016, pp. 378–386.
[16] T. Hofmann, Latent semantic models for collaborative filtering, ACM Trans. Inf. Syst., vol. 22, no. 1, pp. 89–115, 2004.
[17] R. Salakhutdinov, A. Mnih, and G. Hinton, Restricted Boltzmann machines for collaborative filtering, in Proc. 24th Int. Conf. Machine Learning, Corvallis, OR, USA, 2007, pp. 791–798.
[18] H. L. Xu, X. Wu, X. D. Li, and B. P. Yan, Comparison study of internet recommendation system, (in Chinese), J. Softw., vol. 20, no. 2, pp. 350–362, 2009.
[19] M. K. Najafabadi, M. N. Mahrin, S. Chuprat, and H. M. Sarkan, Improving the accuracy of collaborative filtering recommendations using clustering and association rules mining on implicit data, Comput. Hum. Behav., vol. 67, pp. 113–128, 2017.
[20] N. Gao and M. Yang, An improved unifying tags and ratings collaborative filtering for recommendation system, (in Chinese), J. Nanjing Nor. Univ. (Nat. Sci. Ed.), vol. 38, no. 1, pp. 98–103, 2015.
[21] H. Koohi and K. Kiani, A new method to find neighbor users that improves the performance of collaborative filtering, Expert Syst. Appl., vol. 83, no. C, pp. 30–39, 2017.
[22] D. Goldberg, D. Nichols, B. M. Oki, and D. Terry, Using collaborative filtering to weave an information tapestry, Commun. ACM, vol. 35, no. 12, pp. 61–70, 1992.
[23] Z. D. Zhao and M. S. Shang, User-based collaborative-filtering recommendation algorithms on hadoop, in Proc. 3rd Int. Conf. Knowledge Discovery and Data Mining, Phuket, Thailand, 2010, pp. 478–481.
[24] H. Ji, J. F. Li, C. R. Ren, and M. He, Hybrid collaborative filtering model for improved recommendation, in Proc. 2013 IEEE Int. Conf. Service Operations and Logistics, and Informatics, Dongguan, China, 2013.
[25] S. Y. Wei, N. Ye, S. Zhang, X. Huang, and J. Zhu, Item-based collaborative filtering recommendation algorithm combining item category with interestingness measure, in Proc. 2012 Int. Conf. Computer Science and Service System, Nanjing, China, 2012, pp. 2038–2041.
[26] G. Karypis, Evaluation of item-based top-N recommendation algorithms, in Proc. 10th Int. Conf. Information and Knowledge Management, Atlanta, GA, USA, 2001, pp. 247–254.
[27] P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, and J. Riedl, GroupLens: An open architecture for collaborative filtering of Netnews, in Proc. 1994 ACM Conf. Computer Supported Cooperative Work, Chapel Hill, NC, USA, 1994, pp. 175–186.
Ming Yang received the PhD degree from Southeast University at Nanjing in 2004. He received the MS degree from University of Science & Technology of China, and BS degree from Anhui Normal University, in 1990 and 1987, respectively. He is currently a professor in the school of computer science and technology at Nanjing Normal University. His research interests include data mining and knowledge discovery, machine learning, pattern recognition, and their applications.

Wanqi Yang received the PhD degree from Nanjing University in 2015. Her research interests include multiview learning, feature selection, multimodal fusion, abnormal event detection and activity recognition. She has published several papers in top conferences and journals, e.g., IEEE TNNLS, CVIU. Her work currently focuses on multiview dimension reduction, crossview correlation analysis and their applications to real-world problems in image/video analysis.