Tourism Management 58 (2017) 51-65

A comparative analysis of major online review platforms: Implications for social media analytics in hospitality and tourism

Z. Xiang et al.
Highlights

• We applied text analytics to compare three major online review platforms, namely, TripAdvisor, Expedia, and Yelp.
• Findings show discrepancies in the representation of the hotel product on these platforms.
• Information quality, measured by linguistic and semantic features, sentiment, rating, and usefulness, varies considerably.
• This study is the first to comparatively explore data quality in social media studies in hospitality and tourism.
• This study highlights methodological challenges and contributes to the theoretical development of social media analytics.
Article history: Received 1 April 2016; Received in revised form 1 September 2016; Accepted 6 October 2016

Keywords: Online reviews; Hotel industry; Information quality; Social media analytics; Text analytics; Machine learning

Abstract

Online consumer reviews have been studied for various research problems in hospitality and tourism. However, existing studies using review data tend to rely on a single data source and data quality is largely anecdotal. This greatly limits the generalizability and contribution of social media analytics research. Through text analytics this study comparatively examines three major online review platforms, namely TripAdvisor, Expedia, and Yelp, in terms of information quality related to online reviews about the entire hotel population in Manhattan, New York City. The findings show that there are huge discrepancies in the representation of the hotel industry on these platforms. Particularly, online reviews vary considerably in terms of their linguistic characteristics, semantic features, sentiment, rating, and usefulness, as well as the relationships between these features. This study offers a basis for understanding the methodological challenges and identifies several research directions for social media analytics in hospitality and tourism.

© 2016 Elsevier Ltd. All rights reserved.
http://dx.doi.org/10.1016/j.tourman.2016.10.001
largely anecdotal and often based upon the popularity of the websites from which the data were collected, which substantially limits their generalizability and contribution to knowledge.

With this in mind, this study comparatively examines three major online review platforms, namely TripAdvisor, Expedia, and Yelp, in terms of information quality related to online reviews on these websites, with the goal of providing a basis for understanding the methodological challenges and identifying opportunities for the development of social media analytics in hospitality and tourism. The rest of the paper is organized as follows: the next section, Research Background, reviews related literature to provide the motivations for the present study. In Research Design and Analytical Framework, we outline our methodological approaches and describe key measures and methods used to assess the three platforms with specific research questions. In Data Collection and Analysis, we describe the data collection process and explain, in detail, the text analytics procedures used to develop key metrics describing review characteristics, as well as the statistical analyses conducted to compare and contrast the three platforms. Then, research findings are presented, followed by a discussion of implications for both research and practice. Finally, conclusions are drawn, and limitations and directions for future research are discussed.

2. Research background

Big data analytics has been touted as a new research paradigm that utilizes diverse sources of data and analytical tools to make inferences and predictions about reality (Boyd & Crawford, 2012; Mayer-Schönberger & Cukier, 2013). Particularly, with increasingly powerful natural language processing and machine learning capabilities, textual contents from the Web provide a huge shared cognitive and cultural context and, thus, have been analyzed in many application domains (Halevy, Norvig, & Pereira, 2009). However, in recent years there has been growing criticism of, and concern about, the data-driven approach, especially studies using online user-generated contents as research data. For example, Ekbia et al. (2015) discuss some of the epistemological dilemmas in existing big data analytics, including the validity of claims about causal relationships, as opposed to mere statistical correlations, within the data. Others (e.g., Fricke, 2015) challenge the nature of inductive reasoning in big data analytics and suggest that there are potential hazards in making generalizable claims. Particularly, Ruths and Pfeffer (2014) argue that studies using social media data should be aware of a number of validity problems such as platform biases (e.g., platform design, user base, and platform-specific behavior), data availability biases, and data authenticity issues. Tufekci (2014) specifically highlights the conceptual and methodological challenges in social media studies, particularly sampling biases arising from using a single platform as the data source, due to the sociocultural complexity of user behavior and an unrepresentative sampling frame, which may complicate the interpretation of research findings. Importantly, she argues that social media platforms are comparable to specimens in biological research wherein they are selected for certain characteristics suitable for laboratory examinations at the expense of illuminating other potentially important features. As such, Fricke (2015) suggests that correlations found in many existing big data studies might only be considered "candidate" solutions to the problems at hand. Ruths and Pfeffer (2014), among others, call for the use of a variety of triangulation approaches for big data analytics, such as applying the same methods to examine the performance on two or more distinct data sets when studying a new social phenomenon.

The online eco-system in hospitality and tourism is vast, complex, and diverse; so are online review platforms, which range from community-based sites such as LonelyPlanet, TripAdvisor and Yelp to transaction-based online travel agencies (aka OTAs) like Expedia and Booking.com, where reviews are incorporated as electronic word-of-mouth (Gligorijevic, 2016). Although they may all be considered part of social media with the common goal of assisting consumer decision making by providing trusted, shared social knowledge, these platforms are complex sociocultural and economic systems that reflect different business models, different technological affordances, and different user segments/bases and power distribution in the online eco-system (Jeacle & Carter, 2011; Scott & Orlikowski, 2012). For example, TripAdvisor, by incorporating a variety of user data and information tools, represents various actors, resources and, importantly, business models through its website (Yoo, Sigala, & Gretzel, 2016). Recent market dynamics such as Expedia's takeover of Travelocity and Orbitz have created a new power structure within the online eco-system, with the emergence of a potentially dominant social knowledge base (see http://time.com/money/3707551/expedia-orbitz-impact-travelers). From the business viewpoint, online reviews, including their peripheral cues such as user-supplied photos and the reviewer's personal information, are intended as means of persuasive communication in order to build credibility and influence user behavior (Sparks, Perkins, & Buckley, 2013; Zhang, Zhang, & Yang, 2016). Therefore, the selection, ranking, and display of online reviews reflect the platform's strategy to maximize these effects on its targeted audience. Also, the contribution of online reviews is a self-selection process, which contributes to quality differentiation over the life cycle of product reviews (Li & Hitt, 2008). Mkono and Tribe (2016), in a recent study, show that users on travel-related social media sites are not only product evaluators but may also play additional, important roles such as online activist, troll, social critic, information seeker and socialite. Furthermore, these issues are confounded with the long-standing concerns about the authenticity of online reviews (Luca & Zervas, 2015). Therefore, social media research using online review data must be cognizant of the nuances in these data sources in order to make conscious, appropriate methodological decisions when considering the representativeness and quality of the data.

Hospitality and tourism appears to be an ideal application field for social media analytics, with tremendous growth and potential. For example, a recent study (Lu & Stepchenkova, 2015) cited over 100 papers primarily focusing on user-generated contents published in hospitality and tourism journals in the previous 10 years. Schuckert et al. (2015b) cited 50 articles related to online reviews published within and outside the field of hospitality and tourism, indicating the growing interest in understanding the impact of online reviews. For the present study, we examined the literature that specifically used online reviews as data in hospitality and tourism. Table 1 lists a sample of recent publications in six leading tourism and hospitality journals, namely Tourism Management, Annals of Tourism Research, Journal of Travel Research, International Journal of Hospitality Management, Cornell Hospitality Quarterly, and Journal of Tourism and Hospitality Research. While there is a considerable amount of publications elsewhere within and outside hospitality and tourism and, therefore, this compilation by no means represents a full picture of the literature, these journals were selected as the sampling frame because of their high influence in the field (McKercher, Law, & Lam, 2006). As shown in the table, these studies all collected and analyzed online reviews (and associated information) to address a variety of research problems such as travel motivation (e.g., Pearce & Wu, 2015), opinions and sentiments related to hospitality products (e.g., Crotts, Mason, & Davis, 2009; Levy, Duan, & Boo, 2013; Xiang et al., 2015), the impact of online reviews on hotel business performance (e.g., Melián-González, Bulchand-Gidumal, & López-Valcárcel, 2013; Xie, Zhang, & Zhang, 2014; Ye, Law, & Gu, 2009), and the nature and utilities of online reviews as data (e.g., Fang, Ye, Kucukusta, & Law, 2016; Liu & Park, 2015; Park & Nicolau, 2015; Zhang et al., 2016).
Table 1
Sample of recent literature on analytics using online reviews in hospitality and tourism.
No. | Data Source | Data and Sampling | Study Variables & Analysis | Product | Reference
1 | Booking.com | All available 1440 Spanish coastal hotels that have 186 K reviews. | Examines the distribution of the scoring scale in Booking.com. | Hotels | Mellinas, María-Dolores, and García (2015)
2 | Ctrip | All available 3.6 K reviews (between 02/2007 and 01/2008) for 248 hotels in three cities in China. | Uses a log-linear regression model to examine the influence of online reviews (volume and rating) on the number of hotel bookings. | Hotels | Ye et al. (2009)
3 | Daodao.com | 44 K online reviews covering 774 star-rated hotels in Beijing, China (using avail. price info as filter). | Uses linear regression to examine the influence of price on customers' perceptions of service quality and value. | Hotels | Ye, Li, Wang, and Law (2014)
4 | Dianping | Reviews (number not specified) of 1242 restaurants containing overall ratings in Beijing, China. | Uses an econometric model with thematic analysis, sentiment, and volume to examine differences between consumer-generated reviews and expert reviews. | Restaurants | Zhang, Ye, Law, and Li (2010)
5 | Expedia | 61 K consumer ratings and comments of 11 K hotels in the U.S. (approx. 1/3 of the population). | Uses factor analysis based upon lexicon and linear regression to identify guest experience-related factors that influence hotel satisfaction. | Hotels | Xiang et al. (2015)
6 | Flyertalk | 1.5 K comments from members of five major hotel programs using theoretical sampling. | Uses content analysis to identify communication-based core categories that influence hotel communication programs. | Hotels | Berezan, Raab, Tanford, and Kim (2015)
7 | London-eating.co.uk | 2.5 K customer comments for 300 restaurants in London, UK. | Uses content analysis to identify salient factors that influence restaurant evaluation. | Restaurants | Pantelidis (2010)
Note: List is ordered alphabetically by name of data source. If a source was used in multiple studies, the studies are chronologically ordered. Three studies using multiple data sources (No. 20-22) are listed at the end of the table.
Although they are different from each other in terms of study purpose and methodology, these studies fit into the general definition of social media analytics, which is basically concerned with using analytical tools and frameworks to collect, analyze, summarize, and interpret social media data to extract useful patterns and insights (Fan & Gordon, 2014).

In these studies, data were obtained from a variety of sources, including the dominant social media sites such as TripAdvisor, Daodao (TripAdvisor's Chinese outlet) and Yelp, OTA sites such as Expedia and Ctrip, and specialty sites such as Flyertalk and Toprural. Overall, this sample of literature reflects the growing breadth and diversity of social media analytics research in hospitality and tourism; also, there are several observations specifically relevant to the present study. First, in terms of sample size these studies range from a few hundred to a few hundred thousand reviews or other features and, thus, they may not all be considered big data research. Nonetheless, these studies are primarily data-driven, with the aim of identifying novel patterns in the data in order to develop generalizable understandings about the phenomenon at hand. Second, there are a variety of measures collected and derived from online reviews, including length of review, lexicon (words), topics (themes), sentiment (valence), rating, readability, usefulness, and peripheral features such as reviewer identity and characteristics. These measures cover linguistic characteristics and content of online reviews as well as variables that are associated with (e.g., rating), or impacted by (e.g., helpfulness), online reviews. Third, in terms of analysis, a variety of analytical methods were used, including content analysis, text mining, machine learning (e.g., artificial neural network analysis), multivariate regression, econometric modeling, or various combinations of these techniques. Finally, and perhaps most importantly, most of these studies utilized only a single source of data, usually chosen based upon the popularity of the website. For example, several studies used TripAdvisor, which appears to be the "premier" sampling field, by citing the fact that it is the largest travel-related review site in the world (e.g., Banerjee & Chua, 2016; Pearce & Wu, 2015; Xie et al., 2014). Other sources were often adopted for similar reasons or based upon the popularity of the website. For example, in the Xiang et al. (2015) study Expedia data were used because the company requires reviewers to make at least one transaction through its website before being allowed to contribute a review, which, presumably, prevents hospitality businesses or marketers from posting inauthentic reviews. In terms of sampling methods, some studies (e.g., Crotts et al., 2009; Fang et al., 2016; Ye et al., 2014) adopted certain rules of thumb (e.g., using a minimum length of reviews or a threshold number of reviews per case), while others (e.g., Mellinas et al., 2015; Phillips et al., 2015; Ye et al., 2009) used all available data. In a few rare cases involving multiple sources (e.g., Levy et al., 2013; Ye et al., 2014; Zhang & Cole, 2016), data were aggregated and then analyzed without assessment of the potentially unique contributions from each of these sources. It is also noticeable that there seems to be a growing interest in understanding review data in terms of its representativeness. For example, in a research note Mellinas et al. (2015) suggested that the way the scoring scale in Booking.com is displayed could be misleading and researchers must treat this kind of data with caution. Using a Yelp data set, Park and Nicolau (2015) showed that there exists an asymmetric relationship between sentiment and rating. However, in general there has been little discussion and evaluation of the quality of review data, especially how these data truly reflect the hospitality and tourism industry as well as consumer experience, which we believe could substantially limit the generalizability of social media analytics research in hospitality and tourism.

3. Research design and analytical framework

In order to understand the methodological challenges related to data quality in social media analytics, we devised a study to assess information quality related to online reviews on three major platforms, namely TripAdvisor, Expedia, and Yelp. The rationale for selecting these platforms was three-fold: 1) they are widely used by online consumers (Gligorijevic, 2016; Yoo et al., 2016); 2) each of them represents a fairly unique "species" of review platform (i.e., TripAdvisor as the largest virtual travel community, Expedia the largest OTA, and Yelp the largest online community for small, local businesses); and, 3) they have been frequently used as primary data sources in academic literature both within and outside the hospitality and tourism field. We chose to compare all existing customer reviews and other peripheral information extracted from these platforms for the entire population of hotel properties in Manhattan, New York City in the US. The hotel sector was chosen as the study context because the product is fairly standardized at different service levels, which may allow us to observe discrepancies and nuances between these platforms. Manhattan was chosen because of its high number of hotel properties located in a relatively small geographical region with a wide variation in service levels (from budget to luxury) and service types (e.g., leisure vs. business).

To compare and contrast the information quality of these platforms, we applied an analytical framework with a focus on a set of review-related measures (see Fig. 1). The concept of information quality has been defined in different ways in the information systems literature (e.g., DeLone & McLean, 1992). Recently, within the eWOM context it has been operationalized as the quality of review content, linguistic characteristics, and peripheral cues that represent relevancy, sufficiency, currency, consistency, credibility and usefulness (see Filieri, Alguezaui, & McLeay, 2015 for a comprehensive review). We followed this general schema to include a set of key measures widely used in recent social media studies, particularly the text analytics literature (e.g., Abrahams, Fan, Wang, Zhang, & Jiao, 2015; Korfiatis, Garcia-Bariocanal, & Sanchez-Alonso, 2012; Mudambi & Schuff, 2010; Wang, Liu, & Fan, 2011). As can be seen in Fig. 1, a review can be seen as consisting of four basic components: linguistic features, semantic features, sentiment, and its source (the reviewer information).

Fig. 1. Analytical framework to assess information quality of online reviews. (Note: the purpose of this framework was not to validate the theoretical structure; rather, it was intended primarily for understanding the commonalities and differences among the three platforms.)
Linguistic features refer to characteristics of the review textual content, such as appropriate amount of data, ease of understanding, timeliness, relevancy, and completeness, which are used to measure argument quality in the information science and communications literature (e.g., Bailey & Pearson, 1983). Within the context of the present study, we considered review length (i.e., word count) and readability as key measures of linguistic features of a review (e.g., Fang et al., 2016). Another measure of information quality is semantic features, i.e., words, topics and semantic relationships between linguistic entities, which in the present study were operationalized as linguistic entities (tokens) and their latent dimensions (topics). Sentiment, which measures the valence (positive/negative) of an opinion, is an important feature widely used in text analytics (Pang & Lee, 2008). Review source represents the credibility of the information provider (Brinol & Petty, 2009) and has been adopted in social media studies to measure its impact on review user perception and behavior (e.g., Filieri et al., 2015; Sparks et al., 2013). Finally, rating and helpfulness have been extensively studied in the social media literature. Rating is the review provider's overall, numeric evaluation of the product and actual experience, which reflects the level of satisfaction with the product (Park & Nicolau, 2015; Xiang et al., 2015). Review helpfulness seems to be the "ultimate" measure of review quality since it represents a direct response by users who have read the review (e.g., Fang et al., 2016; Liu & Park, 2015; Mudambi & Schuff, 2010; Wang et al., 2011).

Within the study context, it was only possible to compare features that are commonly shared by all three sites because, due to design differences, not all of the above-mentioned features were available on all of these platforms. For example, peripheral cues such as source credibility and reviewer-provided photos could be critical contributors to perceived review helpfulness; however, there are discrepancies in the provision and presentation of these types of information. For instance, Expedia does not provide detailed, trackable information about reviewers. In terms of users' response to reviews, Yelp offers two more options, namely "funny" and "cool", besides "useful". In this case, helpfulness ("useful" in Yelp's case) was used as the measure to compare these three platforms. With this in mind, we formulated the following research question to guide the analytical process:

Q1. Are there differences between TripAdvisor, Expedia, and Yelp in terms of review linguistic features, semantic features, sentiment, rating, and review helpfulness?

As suggested by Ruths and Pfeffer (2014), the performance of theoretical relationships should be examined using multiple data sets in big data analytics. Therefore, in addition to individually comparing these measures of information quality, we also wanted to assess the relationships between some of these key measures and rating and helpfulness in order to understand the intricacies between these variables and, particularly, whether and how the three platforms could be structurally similar or different. In the recent text analytics literature, sentiment has been found to be highly associated with cues such as product rating (e.g., Park & Nicolau, 2015) and also a strong predictor of information quality (e.g., Jeong, Mankad, Gavirneni, & Verma, 2016). Rating (i.e., as an indication of satisfaction) has been found to be a function of various characteristics of a review, including latent dimensions revealed in the textual content of the reviews (Chua & Banerjee, 2015; Xiang et al., 2015). Schuckert et al. (2015a) showed that the consistency between the overall rating and specific ratings on different product attributes can be used as a means to detect inauthentic reviews. As such, we believed the relationships between rating and reviews' linguistic features, semantic features and sentiment are indications of internal consistency for information quality. For example, if a reviewer gave a hotel a rating of five while the review content was negative, it implies that either the review content or the rating was not a truthful reflection of his/her evaluation of the product. In addition, perceived helpfulness of reviews has also been found to be influenced by factors such as reviewer information (source), review content, linguistic features (e.g., length and readability), as well as sentiment (e.g., Chua & Banerjee, 2015; Fang et al., 2016; Korfiatis et al., 2012; Liu & Park, 2015; Mudambi & Schuff, 2010). Following this, a second set of research questions was formulated:

Q2a. Are there differences between TripAdvisor, Expedia, and Yelp in terms of relationships between rating and review linguistic features, semantic features, and sentiment?

Q2b. Are there differences between TripAdvisor, Expedia, and Yelp in terms of relationships between review helpfulness and review linguistic features, semantic features and sentiment?

4. Data collection and analysis

We applied the social media analytics procedure (e.g., Abrahams et al., 2015; Fan & Gordon, 2014) to answer the above research questions. We first collected relevant data from the three platforms. Then, the unstructured data were pre-processed, and key metrics including online reviews' linguistic features, sentiment, semantic features, and perceived helpfulness were developed and compared among the platforms. Finally, a set of regression analyses was conducted to examine the relationships between these measures.

4.1. Data collection

Data collection took place in late 2015 on all searchable hotel properties in Manhattan, NYC on TripAdvisor, Expedia, and Yelp. Web crawlers written in the Python and Java programming languages were used to mimic a user's access to the system by specifying the travel destination and following all the links of hotel properties displayed as search results to download relevant information. Several types of data were collected, including the name of the hotel property, its address, hotel class, all of its customer reviews, user responses (usefulness or helpfulness), and the overall rating. As shown in Table 2, in total, along with other types of information, we collected approx. 439 k reviews from TripAdvisor, 481 k from Expedia, and 31 k from Yelp for a total of approx. 500 hotel properties (the number of properties is platform specific). We used the language detection package (langdetect) in Python (see https://www.python.org) to detect the languages that reviews were written in. After removing all non-English reviews, there were on average 991 reviews per property in TripAdvisor, 752 in Expedia, and 53 in Yelp. Although Expedia had the largest number of reviews in total, TripAdvisor had the highest number of reviews per hotel property. Yelp yielded a substantially smaller number of reviews compared to the other two. Reviews written in English served as the basis for the subsequent analysis of information quality.

4.2. Data analysis

4.2.1. Data pre-processing

All English-language reviews collected from the three platforms were pre-processed using two basic procedures: tokenization and stop word removal. Tokenization is a form of lexical analysis whereby a stream of text is broken up into words, phrases, or other meaningful elements called tokens. In this study, each review was broken up into a vector of unigram-based tokens using the RegexpTokenizer function in Python's nltk.tokenize package. Stop words are words that do not contribute to the meaning of the text and are usually filtered out before the processing of natural language data. For this particular study, we applied an existing stop word list consisting of 429 English words (see http://www.lextek.com/manuals/onix/stopwords1.html), which has been widely applied in text mining and analytics. These two processes resulted in a considerable reduction of the text corpus; that is, in terms of total token frequency, approx. 40% remained in both the TripAdvisor and Yelp data sets and 43% in Expedia.
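The pre-processing pipeline just described can be expressed compactly in Python. The sketch below uses the langdetect and nltk packages named above (the nltk class is RegexpTokenizer); the sample reviews and the abbreviated stop-word set are placeholders for the full corpus and the 429-word Onix list cited above.

```python
# Sketch of the Section 4.2.1 pipeline: keep English reviews, split them into
# unigram tokens, and drop stop words. Reviews and the short stop-word set are
# placeholders; the study used the 429-word Onix stop-word list cited above.
from langdetect import detect
from nltk.tokenize import RegexpTokenizer

STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "was", "is"}
tokenizer = RegexpTokenizer(r"[a-z]+")          # unigram tokens, letters only

def preprocess(review_text):
    """Return lower-cased tokens with stop words removed, or None for non-English text."""
    try:
        if detect(review_text) != "en":
            return None
    except Exception:                           # langdetect raises on empty/undetectable text
        return None
    tokens = tokenizer.tokenize(review_text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

reviews = ["The staff was friendly and the room was clean.",
           "Sehr gutes Hotel im Zentrum."]
clean = [preprocess(r) for r in reviews]
print(clean)                                    # e.g. [['staff', 'friendly', 'room', 'clean'], None]
print(len(clean[0]))                            # review length measured in tokens
```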
Table 2
Summary of the main data set.
Review Platform N of Hotels N of Reviews N of English Reviews (percentage) N of English Reviews per Hotel
Then, basic linguistic features such as review length were computed for each review. We used the Python package textstat 0.2, which is based on the Flesch Reading Ease formula (Flesch, 1948) using word length and sentence length as core measures plus other weighting factors, to compute a readability score for each review. A higher score represents a higher degree of readability and vice versa.

4.2.2. Development of key metrics

4.2.2.1. Review topics identification. Since the vector space represented by all the linguistic tokens was huge and difficult to describe and interpret, we used topic modeling to reduce the space to a manageable number of potentially meaningful dimensions (Griffiths & Steyvers, 2004). We applied Latent Dirichlet Allocation (Blei, Ng, & Jordan, 2003) to the review text to discover the main topics related to consumers' experience and evaluation of the hotel product. In a nutshell, the LDA model assumes that there exists a hidden structure consisting of a set of topics in the whole text corpus. The LDA algorithm uses the co-occurrence of observed words in different documents to infer this hidden structure. Mathematically, the model calculates the posterior distribution of the unobserved variables in the collection of documents. Given a set of training documents, LDA returns two main outputs. The first is the list of topics, each associated with a set of words that presumably contribute to the topic through their weights. The second output is a list of documents, each with a vector of weight values showing the probability of the document containing a specific topic. With this we identified the most salient topics within the review text and also computed and assigned topic scores to each review (i.e., representing the likelihood of a topic occurring in a specific review). Since the review corpus contained potentially a large number of possible topics, we used the Elbow Method (Ketchen & Shook, 1996) to examine the perplexity values in order to determine the appropriate number of topics. Perplexity is commonly used in language modeling to test the fitness of a text model given a training corpus. A lower perplexity score indicates better generalization performance on new documents. As in cluster analysis, the Elbow Method looks at the percentage of variance explained as a function of the number of topics. With the outputs of topic modeling, we assigned each review a number of topic scores representing the review's likelihood of containing a specific topic.

4.2.2.2. Review sentiment identification. Sentiment analysis is a text mining procedure to discover emotive content in texts (Pang & Lee, 2008). To develop a sentiment score for each review, we started by applying two existing lexicons of positive and negative word senses, developed in other product domains, to the review corpus. However, the results were inconsistent and unreliable. Therefore, we decided to develop our own lexicon of positive and negative word senses that is presumably more suitable to the hospitality product. Following the sentiment detection procedure outlined in Abrahams et al. (2015), we first randomly selected 10,000 reviews as the training data set, out of which we further generated 2000 reviews based on polarized ratings. We then used domain experts to manually label these 2000 reviews into a positive and a negative set. Based upon this, we were able to generate a lexicon of "smoke" words, which were then used as the sentiment classifiers applied to the entire review corpus. After training the classifiers using the Naïve Bayes method and running 10-fold cross-validation (a typical performance test in sentiment analysis), the classification results seemed quite satisfactory, with precision and recall rates for predicting both negative and positive reviews higher than 95%. Then, each review was assigned a sentiment score between 0 and 1, with 0 and 1 representing the two extremes of sentiment, i.e., negativity and positivity, respectively.

4.2.2.3. Review helpfulness score development. Review helpfulness is a key indicator of review quality; however, only a small portion of reviews had been rated as helpful on the three platforms. In order to compare the three platforms based on this key metric, we devised a machine learning procedure to "simulate" a helpfulness score for each review using the centroid-based summarization approach developed by Radev, Jing, Stys, and Tam (2004). First, all English-based reviews from the three platforms were merged together. Then, the top 3000 tokens with the highest frequencies were selected to form the shared dictionary and build the bag-of-words corpus. For each review, the 3000-token vector was transformed into the TF-IDF representation. TF-IDF, an abbreviation for term frequency-inverse document frequency, is a numerical value used in text mining to reflect how important a word is to a document in a collection or corpus. Based upon this token-document TF-IDF matrix, reviews marked "helpful" were extracted to form the helpfulness corpus, and its mean vector was calculated to represent the semantic centroid of review helpfulness. With this, the cosine similarity between the TF-IDF representation of each review and the centroid was computed and its value was assigned as the helpfulness score of each review (0-1).

4.2.3. Examining the relationships between key measures

Using the metrics extracted and developed with the methods described above, we ran two linear regression models to compare the three platforms. First, we assessed the relationships between rating and review characteristics including review topics and sentiment. The main rationale was that the correlations between these features should indicate the internal consistency of a review. The second regression analysis assessed the relationships between the "simulated" review helpfulness score and review characteristics including linguistic features, review topics, as well as sentiment. Both analyses were intended to observe the models' performances and also to detect any structural differences and inconsistencies across the three platforms.

Both regression analyses were run in the JMP statistical software by SAS (see http://www.jmp.com). Since some of the independent variables were constructed on different scales, we applied scale transformations to normalize their values. For example, review topic scores were log-transformed due to their low probability values. Also, multicollinearity was found among some of the variables after the initial runs, which made the models unstable and the results difficult to interpret. We then conducted a series of centering operations (i.e., obtaining a new score as the difference between the original score and the mean). By doing so, we ensured the variance inflation factor (VIF) scores for all predictors in the regression models were below 10, which effectively removed the multicollinearity problem.
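Although the regression analyses themselves were run in JMP, the preparation steps described in Section 4.2.3 (log-transforming topic scores, mean-centering predictors, and screening variance inflation factors) can be approximated in Python. The following is a rough sketch on synthetic data using statsmodels; the variable names are illustrative, not the study's actual variables.

```python
# Sketch of the Section 4.2.3 preparation: log-transform low-probability topic
# scores, mean-center predictors, check that VIFs are below 10, then fit an OLS
# model for rating. All data here are synthetic and names are illustrative.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "topic_basic_service": rng.beta(1, 20, n),   # low-probability topic scores
    "topic_value": rng.beta(1, 20, n),
    "sentiment": rng.uniform(0, 1, n),
})
df["rating"] = 1 + 4 * df["sentiment"] + rng.normal(0, 0.3, n)

for col in ("topic_basic_service", "topic_value"):
    df[col] = np.log(df[col])                    # log-transform topic scores

X = df[["topic_basic_service", "topic_value", "sentiment"]]
X = X - X.mean()                                 # mean-centering ("centering operations")
X["senti_x_basic_service"] = X["sentiment"] * X["topic_basic_service"]  # interaction term

X_const = sm.add_constant(X)
vifs = [variance_inflation_factor(X_const.values, i) for i in range(1, X_const.shape[1])]
print(dict(zip(X.columns, vifs)))                # all predictors should stay below 10

print(sm.OLS(df["rating"], X_const).fit().rsquared_adj)
```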
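Similarly, the sentiment scoring described in Section 4.2.2.2 (a Naïve Bayes classifier trained on polarity-labeled reviews and checked with 10-fold cross-validation) can be sketched with scikit-learn, which stands in here for the authors' own smoke-word pipeline; the tiny labeled sample is a placeholder for the 2000 expert-labeled reviews.

```python
# Sketch of the Section 4.2.2.2 idea: train a Naive Bayes sentiment classifier on
# polarity-labeled reviews, evaluate it with 10-fold cross-validation, and use the
# positive-class probability as a 0-1 sentiment score. Labeled data are placeholders.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

texts = ["great location and friendly staff", "dirty room and rude front desk",
         "excellent value and a wonderful view", "noisy, smelly, would not return"] * 50
labels = [1, 0, 1, 0] * 50                       # 1 = positive, 0 = negative (from polarized ratings)

model = make_pipeline(CountVectorizer(), MultinomialNB())
print(cross_val_score(model, texts, labels, cv=10, scoring="precision").mean())
print(cross_val_score(model, texts, labels, cv=10, scoring="recall").mean())

model.fit(texts, labels)
sentiment_scores = model.predict_proba(["spacious room but slow check in"])[:, 1]
print(sentiment_scores)                          # 0 = negative extreme, 1 = positive extreme
```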
5. Findings

In this section, we first present the diagnostic analysis in terms of the extent to which the three platforms represent the hotel product from the supply-side perspective. Then, we describe the characteristics of online reviews using the metrics we developed to measure information quality on these platforms. Finally, we present the results of regression analyses assessing the relationships between review characteristics and both rating and helpfulness.

5.1. Representation of the hotel industry

We examined the number of hotel properties on these websites in terms of hotel class and brand. Among them only Expedia has a star designation system (from one-star to five-star) for its listed properties; therefore, we applied Expedia's star rating system to TripAdvisor and Yelp by matching hotel names listed in Expedia. As can be seen in Table 3, there were huge discrepancies between the three platforms in terms of the number and proportion of hotel properties in different service classes. Compared to Expedia, there were 54 fewer hotels in TripAdvisor and 182 fewer in Yelp with a star rating, suggesting that there are substantial differences in listed properties on these platforms. Table 4 lists hotel properties under major brands on these three platforms. As can be seen, TripAdvisor and Expedia were very similar, with Expedia only one property short for the InterContinental Hotels Group (IHG) brands. Yelp showed the most discrepancies in that its total number of brand properties was substantially higher than the number on either Expedia or TripAdvisor. This might be caused by inaccurate hotel names in the system, which led to duplicates in the downloaded data set. From the consumers' standpoint, this could be a big obstacle when they are searching for hotel brands on the website. It is also noteworthy that Hilton brands were under-represented (as shown in percentage) in Yelp, while there was a considerably higher percentage of other brands, compared to the other two. This might be attributable to the Yelp business model as a platform for local, small businesses.

Table 5 lists a number of basic attributes of reviews on the three platforms. The data set consisted of reviews dating back as early as 15 years (the oldest reviews in the data set were created in late 2001 on TripAdvisor, late 2004 on Yelp, and 2005 on Expedia, respectively). On average, reviews in TripAdvisor and Yelp were considerably older than those in Expedia, which can likely be attributed to Expedia's relatively late incorporation of social media contents into its transaction-based business focus. The average length of reviews (measured by number of tokens after data pre-processing) was similar between TripAdvisor and Yelp and much higher than that of Expedia. In fact, data extracted from Expedia contained a considerable number (N = 603) of "empty" reviews (i.e., length = 0). The three platforms were similar in terms of review readability. Average rating was similar between TripAdvisor and Expedia while much lower in Yelp. Finally, the average number of helpfulness responses per review was much higher on TripAdvisor and Yelp than on Expedia. Overall, TripAdvisor and Yelp appeared to have richer information in relation to online reviews than Expedia, while Yelp seemed to be unique in that it is likely to attract consumers to voice their dissatisfaction or complaints.

5.2.1. Review length

We plotted the distributions of review length on the three platforms alongside each other in Fig. 2, with the X axis representing a specific length and the Y axis its percentage of reviews on a specific platform. As can be seen, review length in Expedia was substantially skewed toward the shorter end. In fact, the vast majority (87.5%) of reviews in Expedia had a length of no greater than 50 words, while only 57.4% in TripAdvisor and 54.8% in Yelp were equal or shorter. The contrast was even more drastic in that approx. 61% of reviews in Expedia contained no more than 25 words, while only 21.9% in TripAdvisor and 24.1% in Yelp were equal or shorter. In general, TripAdvisor and Yelp appeared to be similar, albeit the distribution was "narrower" around the mode in TripAdvisor while more spread out in Yelp.

5.2.2. Review topics

As listed in Table 6, five topics were identified in the review text: Basic Service, Value, Landmarks & Attractions, Dining & Experience, and Core Product. Some of these topics are likely destination specific in that they may not be the same if our data had been collected from a different city, particularly for Landmarks & Attractions (Topic 3) and Dining & Experience (Topic 4). Other topics appeared to be more generic, relevant to the hotel product.

To understand potential differences between the three platforms, we then examined the manifestation of these five topics on each of these websites. As shown in Fig. 3 (the X axis represents the topics and the Y axis the distribution of reviews on each platform as a percentage), overall these five topics were fairly evenly distributed on the three platforms. Topic 5, i.e., Core Product, manifested almost equally across the three websites, suggesting that the hotel guest room and bathroom shared the same level of concern or were mentioned equally in the reviews. However, the other four topics manifested quite differently among these websites. Topic 1, i.e., Basic Service, was much more prominent in Yelp compared to TripAdvisor and Expedia. Topic 2, i.e., Value, appeared to be more prominent in Expedia than the other two, which seemed to fit with the transaction-based nature of Expedia. Topic 3, i.e., Landmarks & Attractions, was particularly less prominent on Yelp, which may be a reflection of its status as a platform for local businesses. Compared to TripAdvisor and Yelp, Expedia was lower on Topic 4, i.e., Dining & Experience. These different levels of manifestation of the common topics seemed to reflect these platforms' business orientations and user bases, and it seems that online reviews on these websites had different "flavors".

In order to understand whether, and the extent to which, these topics were related to how reviewers evaluate the product and their experiences, we plotted their distributions against satisfaction rating. As can be seen in Fig. 4, in reviews with a rating of one (1 on the X-axis) the topic space was dominated by Basic Service (53.3%) followed by Core Product (21.6%), regardless of platform, with a combined percentage of nearly 75%. In reviews with a rating of two (2) there was almost the same pattern in that these two topics dominate the reviews with a combined percentage of 69.2%. This means that topics related to basic hotel services (i.e., front desk, staff, etc.) and the core product (the guest room and bathroom) were salient in reviews associated with lower ratings. On the other hand, topics related to value, landmarks and attractions, and dining and experience increased their share of the semantic space from reviews with lower ratings to those with higher ratings. Particularly noteworthy was that Topic 3 (Landmarks & Attractions) drastically increased from 5.8% (rating of 1) to 18.2% (rating of 5) and Topic 4 (Dining & Experience) from 11.0% (rating of 1) to 26.5% (rating of 5).
Table 6
Review Topics Identified using Latent Dirichlet Allocation (LDA).
Topic 1. Basic Service | Topic 2. Value | Topic 3. Landmarks & Attractions | Topic 4. Dining & Experience | Topic 5. Core Product
desk 0.044 | great 0.107 | square 0.072 | bar 0.030 | room 0.081
front 0.040 | location 0.080 | times 0.060 | view 0.018 | free 0.027
room 0.040 | staff 0.076 | central 0.042 | trip 0.018 | bed 0.026
service 0.035 | good 0.062 | park 0.039 | restaurant 0.017 | small 0.022
air 0.011 | breakfast 0.029 | station 0.023 | service 0.013 | size 0.020
check 0.010 | nice 0.021 | building 0.019 | experience 0.012 | area 0.020
business 0.009 | place 0.021 | subway 0.018 | visit 0.010 | coffee 0.019
rate 0.008 | excellent 0.018 | empire 0.017 | wonderful 0.010 | nice 0.016
door 0.007 | price 0.016 | state 0.016 | lovely 0.009 | bathroom 0.015
customer 0.007 | friendly 0.015 | broadway 0.014 | top 0.009 | shower 0.014
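A minimal sketch of the topic-modeling step described in Section 4.2.2.1, using scikit-learn's LDA implementation in place of the authors' exact tooling: candidate topic counts are compared by perplexity (the elbow method), and the fitted model yields both per-review topic scores and the top words per topic, analogous to Table 6. The reviews below are placeholders.

```python
# Sketch of the Section 4.2.2.1 step: fit LDA for several candidate topic counts,
# inspect perplexity to pick the number of topics (elbow method), then keep the
# per-review topic probabilities and the top words per topic (cf. Table 6).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

reviews = ["front desk staff were slow at check in",
           "great location near times square and central park",
           "small room but clean bathroom and a comfortable bed",
           "good value for the price and an excellent breakfast"] * 100

vectorizer = CountVectorizer(stop_words="english", max_features=3000)
X = vectorizer.fit_transform(reviews)

for k in (2, 5, 10, 20):                               # candidate topic counts
    lda = LatentDirichletAllocation(n_components=k, random_state=0).fit(X)
    print(k, lda.perplexity(X))                        # look for the "elbow" across k

lda = LatentDirichletAllocation(n_components=5, random_state=0).fit(X)
topic_scores = lda.transform(X)                        # likelihood of each topic per review
terms = vectorizer.get_feature_names_out()
for t, weights in enumerate(lda.components_):
    top = weights.argsort()[-5:][::-1]                 # five highest-weighted words per topic
    print(t, [terms[i] for i in top])
```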
This suggests that reviewers may have distinct mental models when writing reviews with either positive or negative sentiment. This also means that topics themselves are not neutral; that is, they implicitly reflect reviewers' evaluation of the product.

5.2.3. Review sentiment

Based upon the sentiment analysis, we computed and assigned a sentiment score to each review. Fig. 5 shows the distribution of reviews with their sentiment scores on the three platforms (the X-axis represents the sentiment score, while the Y-axis indicates the percentage of reviews within a specific platform). As can be seen, toward the right side of the X-axis are reviews with positive sentiment scores (up to 1), while the negative reviews are aligned toward the left side (with 0 the lowest). On average, the three platforms had an overall sentiment score of 0.68; however, Yelp was considerably lower (with a mean of 0.52) than the other two (TripAdvisor had a mean of 0.70 and Expedia 0.66). More interestingly, this graph shows an almost identical pattern in Expedia and TripAdvisor in that both of their distributions are to a great extent skewed toward the positive side. This indicates that reviews on these two websites overall tend to have positive sentiments. Among these three, Yelp was unique with a near "saddle"-shaped distribution, which suggests that its reviewers were more polarized than the other two.

5.2.4. Review helpfulness

Based upon the semantic centroid identified using existing reviews that were rated helpful, or useful in the case of Yelp, we calculated a helpfulness score for each review by measuring the cosine similarity between the review's semantic space and the centroid. Fig. 6 shows the distribution of helpfulness scores of all English reviews on the three platforms. The X-axis represents the value of cosine similarity: the higher the value, the more helpful a review would be rated. The Y-axis represents the percentage of reviews with a certain helpfulness score on a specific platform. As can be seen, the distribution exhibited similar normality on the three platforms. However, their medians are different, with Expedia the lowest at 0.14 (represented by approx. 1% of all of its reviews). Compared to the Expedia median, nearly two-thirds (65.6%) of reviews on Yelp and three quarters (74.5%) on TripAdvisor had a higher cosine similarity score, which suggests that a considerably larger share of reviews on these two platforms would be rated helpful than on Expedia.

5.3. Results of regression analyses

5.3.1. Rating and review characteristics

Table 7 shows the results of the multivariate linear regression analysis examining the relationships between rating and review characteristics including review topics and sentiment. The goal was to compare and contrast the three platforms in terms of the extent to which these characteristics contribute to rating as a measure of internal consistency. To do so, we added review features, one set at a time, to see the differences in their effects in explaining the variance in rating. To see the differences between the three platforms, we tested an overall model with a combined data set as well as an individual model based upon the data set for each of the platforms. We first introduced a couple of control variables, i.e., hotel class and brand, to see if the industry structure can be used to explain rating (Model 1). As can be seen, both hotel class and brand were significant in a positive way, suggesting that 1) the higher a hotel's class is, the more likely it has a higher rating; and, 2) whether a hotel is branded or not makes a difference. However, their overall contribution to rating was small, with a combined Adjusted R Square of 0.036 (in the overall model).

Model 2 examined the contribution of review topics (measured as the likelihood of a specific topic being contained in a review) to rating. The Adjusted R Square increased considerably (to 0.24 in the overall model). All topics were significant, while T1 (Basic Service) and T5 (Core Product) were negative and T2 (Value), T3 (Landmarks & Attractions), and T4 (Dining & Experience) all positive. While, given the large sample size, the level of significance was not surprising, these signs were quite revealing because they seemed to reflect the negative connotations of T1 and T5 (as also noted in Fig. 4). In Model 3, review sentiment was shown to be a strong predictor of rating in that its introduction more than doubled the Adjusted R Square in the overall model. We also examined the interaction effect between sentiment and topics on rating, which moderately improved the explanatory power. Among the interaction terms, Sentiment * Basic Service appeared to be the strongest predictor (as shown in the size of its coefficient). It should be noted that some of the signs of the coefficients for the topics flipped due to the introduction of dominant variables such as sentiment into the model. It is noteworthy that, besides review sentiment, review topics are also strong predictors of rating. This is consistent with the findings shown in Fig. 4; that is, words related to basic service and the core product are not neutral.

While it is not surprising to see these review characteristics explain a large amount of variance in rating (Adjusted R Square = 0.56 in the overall model), interestingly, these three platforms yielded different performances. In terms of explanatory power Yelp appeared to be the strongest, followed by TripAdvisor and then Expedia. The relatively large coefficients of variables such as Sentiment (3.44), Sentiment * Basic Service (4.04), and Sentiment * Dining & Experience (1.29) in the Yelp model in Model 4 seemed to contribute to the explanatory power of its model. Most of the coefficients in the Expedia model were smaller than in the other two except for the variable Sentiment * Value, which seemed to be consistent with the findings shown in Fig. 3 (i.e., Expedia has the flavor of "value"). Compared to the TripAdvisor model, the Expedia model had considerably lower explanatory power, which suggests that the topics contained in its reviews were not as consistent with the rating scores as in TripAdvisor, although both platforms had almost identical patterns in review sentiment. In Model 2, TripAdvisor and Yelp were almost equal in Adjusted R Square, and Yelp's performance improved considerably once sentiment was introduced (Model 3). This suggests that review topics were likely the variables that differentiated Expedia from the other two platforms, and review sentiment the variable that made Yelp unique compared to the other two.
Fig. 6. Distribution of helpfulness score among all reviews on three review platforms.
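The helpfulness scores summarized in Fig. 6 were derived with the centroid-based procedure of Section 4.2.2.3. A minimal sketch of that idea, using scikit-learn (assumed here in place of the authors' own code) with placeholder reviews and helpfulness flags:

```python
# Sketch of the Section 4.2.2.3 procedure: TF-IDF vectors over the shared
# dictionary, the mean vector of reviews voted helpful as the semantic centroid,
# and cosine similarity to that centroid as a 0-1 helpfulness score per review.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

reviews = ["detailed review of the room, staff, location and breakfast",
           "ok",
           "helpful tips about subway access and nearby restaurants",
           "nice"] * 100
voted_helpful = np.array([True, False, True, False] * 100)   # "helpful"/"useful" flags

vectorizer = TfidfVectorizer(max_features=3000)              # top tokens form the shared dictionary
X = vectorizer.fit_transform(reviews)

helpful_rows = X[np.flatnonzero(voted_helpful)]              # reviews flagged as helpful
centroid = np.asarray(helpful_rows.mean(axis=0))             # semantic centroid of helpfulness
helpfulness = cosine_similarity(X, centroid).ravel()         # one score per review, in [0, 1]
print(helpfulness[:4])
```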
5.3.2. Review helpfulness and review characteristics

Table 8 shows the results of the regression analysis examining the relationships between review helpfulness and review characteristics. Different from the analysis on rating, two new variables related to linguistic characteristics, i.e., review length and readability, were added (Models 6 and 7). As can be seen, although not substantial, both hotel class and brand contributed to review helpfulness. Among the linguistic features, review length seemed to be a strong indicator of review helpfulness, with an Adjusted R Square of 0.45 in the overall model (Model 6). By introducing topics into the regression model (Model 8), the explanatory power increased considerably, to 0.54 in the overall model. Also noteworthy was the sign of the coefficients of these variables, in that T1 (Basic Service), T4 (Dining & Experience), and T5 (Core Product) all seemed to be negatively correlated with helpfulness. Sentiment also seemed to be a strong predictor of helpfulness (Model 9), which explained over 10% more variance compared to Model 8, indicating that the more positive the tone in the review, the more likely it will be perceived as helpful. The final model (Model 10), which included the interaction terms between sentiment and topics, reached an Adjusted R Square of 0.65 in the overall model. In general, the coefficients in these models were substantially smaller than those in the rating models because the dependent variable, helpfulness, was much smaller in scale (i.e., 0-1).
Table 7
Regression analysis examining relationships between rating and review characteristics.
Variable | Model 1: Overall TPA EXP YLP | Model 2: Overall TPA EXP YLP | Model 3: Overall TPA EXP YLP | Model 4: Overall TPA EXP YLP
Class | 0.22 0.23 0.22 0.04 | 0.17 0.18 0.20 0.02 | 0.09 0.09 0.13 0.003 | 0.09 0.09 0.14 0.004
Brand | 0.05 0.03 0.09 0.05 | 0.12 0.11 0.10 0.16 | 0.06 0.06 0.07 0.02** | 0.05 0.04 0.07 0.003
T1 | | 1.95 1.65 1.96 3.41 | 0.34 0.25 0.50 0.33 | 0.41 0.5 0.31 0.15**
T2 | | 0.57 0.6 0.84 0.15 | 0.007 0.16 0.25 0.11 | 0.83 0.9 0.66 0.96
T3 | | 0.34 0.47 0.50 0.49 | 0.1 0.23 0.05* 0.29 | 0.96 1.01 0.89 1.01
T4 | | 0.85 1.03 0.83 -0.12* | 0.78 0.65 0.89 0.71 | 0.25 0.31 0.13 -0.18**
T5 | | 0.46 0.21 0.64 0.98 | 0.38 0.23 0.48 0.62 | 0.69 0.78 0.59 0.45
Sentiment | | | 3.1 3.22 2.78 3.77 | 2.89 2.95 2.69 3.44
Senti * T1 | | | | 2.6 3.08 1.63 4.04
Senti * T2 | | | | 0.92 0.42 1.61 0.45
Senti * T3 | | | | 0.32 0.004 0.52 0.39**
Senti * T4 | | | | 0.39 0.54 0.02 1.29
Senti * T5 | | | | 0.52 0.88 0.43 0.64
R2 | 0.036 0.034 0.029 0.004 | 0.242 0.245 0.215 0.245 | 0.539 0.548 0.491 0.584 | 0.555 0.566 0.501 0.604
Adjusted R2 | 0.036 0.034 0.029 0.004 | 0.242 0.245 0.215 0.245 | 0.539 0.548 0.491 0.584 | 0.555 0.566 0.501 0.604
(Columns show Models 1-4; within each model: Overall, TPA, EXP, YLP. Empty cells indicate the variable was not included in that model.)
Table 8
Regression analysis examining relationships between helpfulness and review characteristics.
Variable | Model 5: Overall TPA EXP YLP | Model 6: Overall TPA EXP YLP | Model 7: Overall TPA EXP YLP | Model 8: Overall
Class | 0.000 0.003 0.001 0.000 | 0.000 0.002 0.001 0.000 | -0.000* 0.002 0.001 0.000 | 0.001
Brand | 0.003 0.004 0.003 0.006 | 0.004 0.004 0.004 0.006 | 0.004 0.004 0.004 0.006 | 0.003
Length | | 0.050 0.050 0.050 0.050 | 0.050 0.05 0.050 0.050 | 0.060
Readability | | | 0.000 0.000 0.001 0.000** | 0.000
T1 | | | | 0.060
T2 | | | | 0.007
T3 | | | | 0.020
T4 | | | | 0.040
T5 | | | | 0.010
Sentiment | | | |
Senti * T1 | | | |
Senti * T2 | | | |
Senti * T3 | | | |
Senti * T4 | | | |
Senti * T5 | | | |
R2 | 0.001 0.003 0.001 0.002 | 0.453 0.466 0.291 0.499 | 0.453 0.466 0.293 0.499 | 0.537
Adjusted R2 | 0.001 0.003 0.001 0.002 | 0.4526 0.466 0.291 0.499 | 0.453 0.466 0.293 0.499 | 0.537
(Columns show Models 5-8; within each model: Overall, TPA, EXP, YLP. Table continues below with the remaining columns of Model 8 and Models 9-10.)
Similar to the rating models, among the three platforms the Yelp found elsewhere (e.g., Chua & Banerjee, 2013; Jeacle & Carter, 2011).
data appeared to explain the highest amount of variance in the Furthermore, in terms of review helpfulness TripAdvisor and Yelp
helpfulness score (with an Adjusted R Square of 0.69 in Model 10) appear to have more reviews that would potentially be seen as more
followed by TripAdvisor (0.65) and Expedia (0.54). By examining helpful than Expedia. Besides, our analyses also revealed some
the stepwise models, Model 6 seemed to suggest that review length important nuances that may reflect structural differences among
was the strongest predictor in Yelp (0.50) compared to the other these platforms. Specifically, the connections between rating and
two. Review length was particularly weak in the Expedia model helpfulness and other review characteristics are not as strong in
(0.29), which suggests that its variance in Expedia was not as Expedia as the other two. In particular, review topics are not strong
strongly associated with helpfulness as in other websites. However, predictors for rating in Expedia data, which may suggest likely in-
sentiment seemed to be a stronger predictor in Expedia because, by consistencies between what a reviewer writes and what satisfaction
comparing Model 9 (which included sentiment) with Model 8, the score he/she assigns to the product. Also, review length is not as
increase of Adjusted R Squared in the Expedia model was almost strong a predictor of review helpfulness in Expedia than the other
0.14, as opposed to roughly 0.10 in both TripAdvisor and Yelp. This two, indicating the lack of quality due to the smaller variance in the
seemed to suggest that, if review length was not perceived as in- amount of information contained in its reviews. Overall, Yelp seems
dicator of helpfulness, a user may shift his/her attention to other to have the strongest performance in term of both rating and
review characteristics when evaluating the utilities of a review. helpfulness, which may be attributed to its high variance in its re-
6. Discussion and implications

Motivated by the lack of understanding of data quality in social media-related studies, we applied a series of text analytics techniques to “dissect” three major review platforms in hospitality and tourism. This study shows that TripAdvisor, Expedia, and Yelp, while all incorporating consumer reviews as primary social knowledge, are indeed distinct from each other in a variety of respects. In terms of representing the supply of the hotel product, there appear to be huge discrepancies between these platforms. In terms of the sheer amount of review data, TripAdvisor and Expedia are comparable to each other, while Yelp is substantially smaller. In terms of the topics contained in the review texts, each platform manifests certain “flavors”, which may reflect different trending themes and, in turn, potentially different user bases on these platforms. In terms of overall sentiment, TripAdvisor and Expedia are similar, while Yelp is quite distinct with a more polarized distribution. Between TripAdvisor and Expedia, which are comparable in review volume, the former seems to have higher overall quality. This seems to explain why TripAdvisor has been widely perceived as a premier data source, whether based upon anecdotes or empirical evidence. Furthermore, in terms of review helpfulness, TripAdvisor and Yelp appear to have more reviews that would potentially be seen as helpful than Expedia does. Our analyses also revealed some important nuances that may reflect structural differences among these platforms. Specifically, the connections between rating, helpfulness, and other review characteristics are not as strong in Expedia as in the other two. In particular, review topics are not strong predictors of rating in the Expedia data, which may suggest inconsistencies between what a reviewer writes and the satisfaction score he/she assigns to the product. Also, review length is not as strong a predictor of review helpfulness in Expedia as in the other two, indicating a lack of quality due to the smaller variance in the amount of information contained in its reviews. Overall, Yelp seems to have the strongest performance in terms of both rating and helpfulness, which may be attributed to the high variance in its review sentiment. Through these text analytics exercises and the above findings, this study offers several important implications for both research and practice.

6.1. Implications for research

First and foremost, by showing the differences among major online review platforms, this study contributes to the epistemology of social media analytics by suggesting that studies directly drawing data from online websites must consider the inherent traits and potential biases in social media data (Ruths & Pfeffer, 2014; Tufekci, 2014). There are huge discrepancies in the representation of the hotel industry on these platforms. Online reviews can vary across platforms in terms of linguistic characteristics, semantic features, and sentiment, as well as their impact on users of the websites. When sampling online review data, careful consideration must be given to the source of the data as well as the representativeness of the data within a specific source. Our findings suggest that one platform alone may not be a sufficient source of quality data because different platforms may possess fairly unique characteristics. For example, when the research relies upon a representative sample of review sentiment, the distribution of sentiment must be closely examined due to possible variations across platforms. Also, review contents may vary considerably by service level and other factors. As such, if the research focus is on identifying topics and opinions that reflect consumer experience and evaluation of the product, findings based upon single-source data should only be considered candidate solutions (Ruths & Pfeffer, 2014). In this regard, this study offers insights into developing useful heuristics, such as review length and rating, to construct sampling rules in social media analytics research. While a few recent studies used review volume (e.g., Fang et al., 2016; Melián-González et al., 2013) and review length (e.g., Crotts et al., 2009) as “filters” for sampling purposes, the selection of data must be done in a more systematic way.
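As one illustration of such a heuristic (a sketch only; the thresholds and the column names text and rating are assumptions rather than recommendations from this study), a sampling rule could retain reviews above a minimum length and then draw a balanced sample across rating levels:

import pandas as pd

def sample_reviews(reviews: pd.DataFrame,
                   min_words: int = 30,
                   per_rating: int = 1000,
                   seed: int = 42) -> pd.DataFrame:
    """Drop very short reviews, then draw an equal-sized sample per rating level."""
    long_enough = reviews[reviews["text"].str.split().str.len() >= min_words]
    return (long_enough
            .groupby("rating", group_keys=False)
            .apply(lambda g: g.sample(min(len(g), per_rating), random_state=seed)))

Applying the same rule to each platform before pooling the data would at least make the length and rating composition of the samples comparable across sources.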
Second, there is a growing interest in understanding the communicative effects of online reviews within and outside hospitality and tourism (e.g., Filieri et al., 2015; Mudambi & Schuff, 2010; Racherla & Friske, 2012; Sparks et al., 2013). As an exploratory study, this paper sheds light on the theoretical development needed for assessing the utility and impact of eWOM in hospitality and tourism. On the one hand, the analytical framework used in our study shows considerably different explanatory power across platforms, which supports the claim by Ruths and Pfeffer (2014) and others that empirical models developed using big data may need to be confirmed with multiple data sources. On the other hand, compared with a number of recent analytics-based studies specifically measuring the consumer value of online reviews (e.g., Fang et al., 2016; Jeong et al., 2016; Liu & Park, 2015), this framework seems more effective in explaining why a review could be perceived as helpful (as indicated by the high overall R-squares across the different platforms). Obviously, this could be attributed to a number of reasons, such as the comprehensiveness of the data set in terms of both size and variable inclusion, as well as the specific algorithms used in the text analytics exercises. Nonetheless, the analytical framework showed good explanatory power and thus can be expanded and further refined into a general theoretical framework for social media analytics based upon textual data. Furthermore, this framework seems able to capture inconsistencies between review content and the reviewer's satisfaction rating, suggesting possible directions for detecting inauthentic consumer evaluations of products, as demonstrated in Schuckert et al. (2015a) and others.
power across platforms, which supports the claim by Ruths and cleanliness often take place in a negative context and act to prevent
Pfeffer (2014) and others that empirical models developed us- the guest from sharing any positive experience. In the present
ing big data may need to be confirmed with multiple data study, the association between topics related to basic service and
sources. On the other hand, compared to a number of recent core product (room and bathroom) and low ratings seems to
analytics-based studies specifically measuring the consumer confirm their findings across different platforms. This suggests that
value of online reviews (e.g., Fang et al., 2016; Jeong et al., 2016; it is possible to identify, through user-generated content such as
Liu & Park, 2015), this framework seems to be more effective in online reviews, meaningful structures among various aspects and
explaining why a review could be perceived as helpful (as indi- attributes related to hospitality and tourism products. This is
cated by the high overall R squares across different platforms). perhaps one of the promising research areas in social media ana-
Obviously, this could be attributed to a number of reasons such as lytics in hospitality and tourism.
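The association between review topics and rating described here can be probed with a straightforward correlation between per-review topic proportions and the star rating. The sketch below assumes that topic proportions have already been estimated (for example, with an LDA model) and stored alongside ratings; the file and column names are hypothetical.

import pandas as pd
from scipy.stats import pearsonr

# Hypothetical frame: one row per review, topic_0 ... topic_k proportions plus rating.
doc_topics = pd.read_csv("review_topic_proportions.csv")

for topic in [c for c in doc_topics.columns if c.startswith("topic_")]:
    r, p = pearsonr(doc_topics[topic], doc_topics["rating"])
    print(f"{topic}: r = {r:+.2f} (p = {p:.3f})")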
Finally, this study also sheds light on the structure of guest experience as revealed in online reviews and its connection with satisfaction. There is a growing literature on using online reviews to understand how various product and service aspects contribute to hotel guests' satisfaction (e.g., Crotts et al., 2009; Zhang & Cole, 2016). For example, in an exploratory study using Expedia review data, Xiang et al. (2015) found that, in online hotel reviews, mentions of “hygiene” factors such as room maintenance and cleanliness often take place in a negative context and act to prevent the guest from sharing any positive experience. In the present study, the association between low ratings and topics related to basic service and the core product (room and bathroom) seems to confirm their findings across different platforms. This suggests that it is possible to identify, through user-generated content such as online reviews, meaningful structures among the various aspects and attributes of hospitality and tourism products. This is perhaps one of the more promising research areas in social media analytics in hospitality and tourism.

6.2. Implications for practice

Although it was not the primary goal of this study to generate managerial insights, the study does offer a number of implications for businesses. It clearly shows that not all review websites are created equal: they vary considerably in quality and focus, they may represent different consumer segments, and, generally speaking, they come in different sizes and shapes. Therefore, these nuances in data quality must be taken into consideration when developing product and market intelligence for the firm using these data platforms. Hospitality and tourism businesses must make wise decisions when choosing these channels to engage with their existing and prospective customers, e.g., when providing
Li, X., & Hitt, L. M. (2008). Self-selection and information role of online product sales. International Journal of Hospitality Management, 28(1), 180e182.
reviews. Information Systems Research, 19(4), 456e474. Ye, Q., Li, H., Wang, Z., & Law, R. (2014). The influence of hotel price on perceived
Liu, Z., & Park, S. (2015). What makes a useful online review? Implication for travel service quality and value in e-tourism an empirical investigation based on
product websites. Tourism Management, 47, 140e151. online traveler reviews. Journal of Hospitality & Tourism Research, 38(1), 23e39.
Luca, M., & Zervas, G. (2015). Fake it till you make it: Reputation, competition, and Yoo, K. H., Sigala, M., & Gretzel, U. (2016). Exploring TripAdvisor. In R. Egger, I. Gula,
Yelp review fraud. Harvard Business School NOM Unit Working Paper, (14e006). & D. Walcher (Eds.), Open tourism (pp. 239e255). Springer Berlin Heidelberg.
Lu, W., & Stepchenkova, S. (2012). Ecotourism experiences reported online: Clas- Zhang, Y., & Cole, S. T. (2016). Dimensions of lodging guest satisfaction among
sification of satisfaction attributes. Tourism Management, 33(3), 702e712. guests with mobility challenges: A mixed-method analysis of web-based texts.
Lu, W., & Stepchenkova, S. (2015). User-generated content as a research mode in Tourism Management, 53, 13e27.
tourism and hospitality applications: Topics, methods, and software. Journal of Zhang, Z., Ye, Q., Law, R., & Li, Y. (2010). The impact of e-word-of-mouth on the
Hospitality Marketing & Management, 24(2), 119e154. online popularity of restaurants: A comparison of consumer reviews and editor
Mayer-Scho € nberger, V., & Cukier, K. (2013). Big data: A revolution that will transform reviews. International Journal of Hospitality Management, 29(4), 694e700.
how we live, work, and think. Houghton Mifflin Harcourt. Zhang, Z., Zhang, Z., & Yang, Y. (2016). The power of expert identity: How website-
McKercher, B., Law, R., & Lam, T. (2006). Rating tourism and hospitality journals. recognized expert reviews influence travelers' online rating behavior. Tourism
Tourism Management, 27(6), 1235e1252. Management, 55, 15e24.
Melia n-Gonza lez, S., Bulchand-Gidumal, J., & Lo pez-Valc arcel, B. G. (2013). Online
customer reviews of hotels as participation increases, better evaluation is ob-
tained. Cornell Hospitality Quarterly, 54(3), 274e283.
Zheng Xiang, Ph.D., is Associate Professor in the
Mellinas, J. P., María-Dolores, S. M. M., & García, J. J. B. (2015). Booking. com: The
Department of Hospitality and Tourism Management at
unexpected scoring system. Tourism Management, 49, 72e74.
Virginia Tech, USA. His research interests include travel
Mkono, M., & Tribe, J. (2016). Beyond reviewing: Uncovering the multiple roles of
information search, social media marketing, and business
tourism social media users. Journal of Travel Research. http://dx.doi.org/
analytics for the tourism and hospitality industries. He is
0047287516636236. Article number: 0047287516636236.
a recipient of Emerging Scholar of Distinction award by
Mudambi, S. M., & Schuff, D. (2010). What makes a helpful online review? A study
the International Academy for the Study of Tourism. He is
of customer reviews on Amazon.com. MIS quarterly, 34(1), 185e200.
a board member of International Federation for IT and
Nieto, J., Hern andez-Maestro, R. M., & Mun ~ oz-Gallego, P. A. (2014). Marketing de-
Travel & Tourism (IFITT) and editorial board member of
cisions, customer reviews, and business performance: The use of the Toprural
several international journals including Journal of Busi-
website by Spanish rural lodging establishments. Tourism Management, 45,
ness Research, Journal of Travel Research, and Journal of
115e123.
Hospitality and Tourism Research.
Pang, B., & Lee, L. (2008). Opinion mining and sentiment analysis. Foundations and
Trends in Information Retrieval, 2(1e2), 1e135.
Pantelidis, I. S. (2010). Electronic meal experience: A content analysis of online
restaurant comments. Cornell Hospitality Quarterly, 51(4), 483e491.
Park, S., & Nicolau, J. L. (2015). Asymmetric effects of online consumer reviews. Qianzhou Du is a Ph.D. student in the Department of
Annals of Tourism Research, 50, 67e83. Business Information Technology at Virginia Tech, USA.
Pearce, P. L., & Wu, M. Y. (2015). Entertaining international tourists: An empirical His research interests include business intelligence, text
study of an iconic site in China. Journal of Hospitality & Tourism Research, analytics, social media analytics, crowd sourcing, and
1096348015598202. business analytics for the tourism and hospitality in-
Phillips, P., Zigan, K., Silva, M. M. S., & Schegg, R. (2015). The interactive effects of dustries. He is a recipient of ICTAS doctoral scholarship of
online reviews on the determinants of swiss hotel performance: A neural Virginia Tech.
network analysis. Tourism Management, 50, 130e141.
Racherla, P., & Friske, W. (2012). Perceived ‘usefulness’ of online consumer reviews:
An exploratory investigation across three services categories. Electronic Com-
merce Research and Applications, 11(6), 548e559.
Radev, D. R., Jing, H., Stys, M., & Tam, D. (2004). Centroid-based summarization of
multiple documents. Information Processing & Management, 40(6), 919e938.
Ruths, D., & Pfeffer, J. (2014). Social media for large studies of behavior. Science,
346(6213), 1063e1064.
Schuckert, M., Liu, X., & Law, R. (2015a). A segmentation of online reviews by lan-
Yufeng Ma received the BE degree in Computer Science
guage groups: How English and non-English speakers rate hotels differently.
from Wuhan University, China in 2012. Currently he is a
International Journal of Hospitality Management, 48, 143e149.
Ph.D. student in the Department of Computer Science at
Schuckert, M., Liu, X., & Law, R. (2015b). Hospitality and tourism online reviews:
Virginia Tech, USA, where he is co-advised by Dr. Patrick
Recent trends and future directions. Journal of Travel & Tourism Marketing, 32(5),
Fan and Dr. Edward. A. Fox. His research interests include
608e621.
data mining, topic modeling in text mining, computer
Scott, S. V., & Orlikowski, W. J. (2012). Reconfiguring relations of accountability:
vision, deep learning and artificial intelligence in general.
Materialization of social media in the travel sector. Accounting, Organizations
and Society, 37(1), 26e40.
Sparks, B. A., Perkins, H. E., & Buckley, R. (2013). Online travel reviews as persuasive
communication: The effects of content type, source, and certification logos on
consumer behavior. Tourism Management, 39, 1e9.
Tufekci, Z. (2014). Big questions for social media big data: Representativeness, validity
and other methodological pitfalls. arXiv preprint arXiv:1403.7400.
Wang, G., Liu, X., & Fan, W. (2011). A knowledge adoption model based framework
for finding helpful user-generated contents in online communities. ICIS 2011
Proceedings. Paper 15. Available at: http://aisel.aisnet.org/icis2011/proceedings/ Weiguo Fan is a full professor of the Department of Ac-
knowledge/15. counting and Information Systems and Director of the
Werthner, H., & Klein, S. (1999). Information technology and tourism: A challenging Center for Business Intelligence and Analytics in the
relationship. Springer-Verlag Wien. Pamplin College of Business. He specializes in business
Xiang, Z., Schwartz, Z., Gerdes, J., & Uysal, M. (2015). What can big data and text intelligence, data and text analytics, social network
analytics tell us about hotel guest experience and satisfaction? International analysis, social media analytics, business data mining
Journal of Hospitality Management, 44(1), 120e130. using both internal and external data, complex system
Xiang, Z., Wo €ber, K., & Fesenmaier, D. R. (2008). Representation of the online analysis and modeling, and consumer behavior modeling
tourism domain in search engines. Journal of Travel Research, 47(2), 137e150. and analysis. He has applied analytics in financial state-
Xie, K. L., Zhang, Z., & Zhang, Z. (2014). The business value of online consumer ment risk analysis, search engine optimization, targeted
reviews and management response to hotel performance. International Journal and precision marketing, sentiment and opinion mining
of Hospitality Management, 43, 1e12. for product quality control, competitive intelligence, and
Ye, Q., Law, R., & Gu, B. (2009). The impact of online user reviews on hotel room deception detection in large scale business transactions.