Lin Lei Addo ML
Solution
xiaoyuuestcg@uestc.edu.cn
1 Introduction
In the wake of the 2008 global financial crisis, large financial institutions collapsed. Knowledge workers, even in the wealthiest countries, now worry about losing their well-paid, full-time jobs and cannot easily find similar ones elsewhere. An effective e-recruiting engine helps job seekers access recruitment opportunities and reduces recruitment labor by providing suitable openings that match their personal interests and qualifications. It also frees companies from information overload and advertisement costs. The key module of a dynamic e-recruiting engine is the job matching system, which strives to connect unemployed candidates with the vacancies they are well suited to fill.
2 Literature Review
like they did. In addition, our resume data are collected from a wide range of fields, which makes our solution more universal and robust.
3 Dataset description
The dataset was taken from a job recommendation competition1 and can be freely downloaded2. The original dataset contains 70,000 resumes spanning 34,090 distinct positions. After cleaning and filtering, we kept 47,346 resumes whose last jobs belong to a prediction list of the 32 most frequent positions (e.g. software engineer, cashier, and project manager). Even so, 18,736 distinct positions remain in the dataset. The most frequent positions are shown in Figure 1.
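The filtering step above can be sketched as follows; the label list and K = 2 are made-up stand-ins for the real data and the paper's 32-position list:

```python
from collections import Counter

# Hypothetical last-job labels; the real dataset has 34,090 distinct positions.
last_jobs = ["software engineer", "cashier", "software engineer",
             "project manager", "florist", "cashier", "software engineer"]

# Keep only resumes whose last job falls among the K most frequent positions
# (the paper uses K = 32; K = 2 here for illustration).
K = 2
top_k = {pos for pos, _ in Counter(last_jobs).most_common(K)}
filtered = [job for job in last_jobs if job in top_k]
print(filtered)  # only "software engineer" and "cashier" survive
```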
The remaining cluster features are document features. We use LDA to classify resumes into 32 and 64 topics, respectively. In all, there are 72 cluster features.
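A minimal sketch of extracting LDA topic distributions as document features, using scikit-learn; the toy resume snippets and the 4-topic setting are illustrative assumptions (the paper fits LDA with 32 and 64 topics on real resume text):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy resume snippets standing in for the real resume corpus.
resumes = [
    "python machine learning model training",
    "cash register customer service retail",
    "project schedule budget stakeholder meeting",
    "deep learning neural network python",
]

counts = CountVectorizer().fit_transform(resumes)

# Each resume's topic distribution becomes a dense feature vector.
lda = LatentDirichletAllocation(n_components=4, random_state=0)
topic_features = lda.fit_transform(counts)
print(topic_features.shape)  # one 4-dimensional topic vector per resume
```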
Random Forests (RF) is an ensemble learning method based on randomized decision trees. Each tree in the ensemble is built from a sample drawn from the training set, and the best split is picked from a random subset of the features. XGBoost (XGB), short for "Extreme Gradient Boosting", is an optimized distributed gradient boosting library designed to be highly efficient, flexible, and portable. The grid search results of RF and XGB are shown in Figure 2. After analysis, we find that 473 features are used by XGB, while only 163 features are selected by RF.
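A hedged sketch of the grid-search step using scikit-learn; the synthetic data and the parameter grid are assumptions, as the paper does not list its grids:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data for the resume feature matrix.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Cross-validated grid search over a small, illustrative RF grid.
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [4, 8]},
    cv=3,
)
grid.fit(X, y)
print(grid.best_params_)
```

xgboost's `XGBClassifier` follows the same scikit-learn estimator API, so an analogous grid (e.g. over `learning_rate` and `max_depth`) can be tuned with the same `GridSearchCV` call.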
[Figure: CNN architecture combining manual and cluster features with semantic features from an embedding layer, followed by convolution, pooling, flatten, fully-connected, and softmax layers.]
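The two-branch architecture suggested by the figure can be sketched in Keras; every size below (sequence length, vocabulary, filter counts) is an illustrative assumption, not the paper's configuration:

```python
from tensorflow.keras import layers, Model

# Semantic branch: embedding -> convolution -> pooling -> flatten;
# feature branch: the 72 manual/cluster features, fed in directly.
text_in = layers.Input(shape=(50,))            # token ids of a resume
feat_in = layers.Input(shape=(72,))            # manual + cluster features
x = layers.Embedding(input_dim=5000, output_dim=32)(text_in)
x = layers.Conv1D(64, kernel_size=3, activation="relu")(x)
x = layers.MaxPooling1D(pool_size=2)(x)
x = layers.Flatten()(x)

# Merge both branches into a fully-connected softmax head.
merged = layers.Concatenate()([x, feat_in])
out = layers.Dense(32, activation="softmax")(merged)   # 32 target positions
model = Model([text_in, feat_in], out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```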
Bagging is one of the earliest and simplest ensemble-based algorithms. Usually, individual classifiers are combined by taking a simple majority vote of their decisions. Assume there are three classifiers making a positive or negative prediction. We improve the bagging method (named IBagging) by voting according to the sum of decision probabilities, which can easily be extended to multi-class ensembles. Without any information retrieval techniques or machine learning methods, the baseline manual rule simply recommends the most frequent label. We then measure our resume-job matching solution in two ways. One is precision, whose goal is to cover as many correct positions as possible. The results are shown in Table 4. Analyzing the experiments, we find that XGB performs best among the four base estimators but has the longest training time, while the CNN model converges in the shortest time with acceptable precision. Meanwhile, our solution benefits from both the bagging methods and our unsupervised semantic feature extraction method.
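The probability-sum vote behind IBagging can be sketched in a few lines; the three probability matrices below are made-up stand-ins for real classifier outputs:

```python
import numpy as np

# IBagging-style vote: sum each base classifier's predicted class
# probabilities and pick the class with the largest total, instead of
# taking a majority vote over hard labels.
def ibagging_predict(prob_list):
    """prob_list: list of (n_samples, n_classes) probability arrays."""
    total = np.sum(prob_list, axis=0)          # element-wise sum over models
    return np.argmax(total, axis=1)            # class with highest summed prob

# Three classifiers, two samples, three classes.
p1 = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]])
p2 = np.array([[0.1, 0.8, 0.1], [0.3, 0.4, 0.3]])
p3 = np.array([[0.2, 0.7, 0.1], [0.5, 0.3, 0.2]])
print(ibagging_predict([p1, p2, p3]))  # [1 1]: class 1 wins both samples
```

Because the vote operates on probability vectors rather than binary decisions, the same function works unchanged for any number of classes, which is what makes the extension to the multi-class setting straightforward.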
Many resume-position pairs may not appear in the testing data, yet they are reasonable and often quite similar to correct pairs in the training dataset. Thus, the other evaluation method, recall for Top-N recommendations, is used to compare the matching solutions. Here, recall is the proportion of test resumes whose correct position appears among the recommended positions. There are 32 possible positions for a resume; given their probabilities from the classifiers, the solution recommends the top N positions for each resume and reports recall for various values of N. The Top-N results are shown in Table 5. They show a significant improvement in recall for Top-N with the IBagging method compared to the baseline method.
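The recall@N metric described above can be computed as follows; the probability matrix and labels are illustrative, not taken from the paper:

```python
import numpy as np

# recall@N: a sample counts as a hit if its true position is among the
# N positions with the highest predicted probability.
def recall_at_n(probs, y_true, n):
    top_n = np.argsort(probs, axis=1)[:, -n:]          # indices of n largest
    hits = [y in row for y, row in zip(y_true, top_n)]
    return sum(hits) / len(y_true)

probs = np.array([[0.5, 0.3, 0.2],     # true class 0 -> hit at N=1
                  [0.1, 0.3, 0.6],     # true class 1 -> hit only at N=2
                  [0.4, 0.4, 0.2]])    # true class 2 -> hit only at N=3
y_true = [0, 1, 2]
print(recall_at_n(probs, y_true, 2))   # 2 of 3 correct within the top 2
```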
8 Acknowledgements
This work is supported by the National Natural Science Foundation of China (Grant No.
61502082) and the Fundamental Research Funds for the Central Universities
(ZYGX2014J065).
3 https://github.com/lyoshiwo/resume_job_matching