The more the merrier? Number of reviews versus score on TripAdvisor and Booking.com

ABSTRACT The aim of this research is to confirm whether there is a relationship between the number of reviews and the hotel’s score on Booking.com and TripAdvisor and whether the relationship differs depending on the geographical area. Moreover, the study endeavors to confirm whether the number of reviews influences the score on each website. Based on an analysis of 13,899 hotels in 146 cities, our findings suggest that there is some linear relationship between the number of reviews and the score on TripAdvisor but not on Booking.com. Moreover, by region, hotels on TripAdvisor from the Middle East and Africa and from Asia and Pacific show a stronger relationship between reviews and score than those from Europe and America.


Introduction
Online user reviews of goods and services have become more important because they influence other consumers (Boyd, Clarke, & Spekman, 2014) and are an important information source for decision support (Dellarocas, Zhang, & Awad, 2007). User control can provide an experience of empowerment and an enriched sense of satisfaction with the outcome of a choice (Wathieu et al., 2002).
The existing literature confirms that there is a positive relationship between the number of reviews (also called volume) and purchase intention or increased sales of different products or services (Dellarocas et al., 2007;Duan, Gu, & Whinston, 2008;Godes & Mayzlin, 2004;Liu, 2006).
In the hospitality industry, a large number of reviews gives a hotel more visibility because it is exposed more frequently; could better reflect the reality of hotel quality; and can lead to the idea that more reviews mean more guests and therefore more popularity (Xie, Zhang, & Zhang, 2014). Online consumer reviews could be more influential for hotels with a larger volume of reviews because they are more trustworthy (Zhu & Zhang, 2010).
Electronic word of mouth (eWOM) has been widely studied in the hospitality industry. Researchers have shown that a large number of reviews affects bookings (Sparks & Browning, 2011;Vermeulen & Seegers, 2009) and that positive eWOM generates positive attitudes and increases sales opportunities (Hong, 2006;Karakaya & Barnes, 2010;Lee, Park, & Han, 2008;Steffes & Burgee, 2013) but, to the best of our knowledge, only a few studies have examined whether the number of reviews influences the ratings awarded by past users. In this sense, the research conducted by Melian-Gonzalez, Bulchand-Gidumal, and Gonzalez Lopez-Valcarcel (2013, 274) confirmed "the relationship between valence (positive negative or neutral reviews) and volume (the amount of eWOM disseminated), as the number of reviews increases, the valence becomes more balanced, and the negative effect is mitigated."
The aim of this research is to confirm whether there is a relationship between the number of reviews and the score on two of the main websites used by the hospitality industry (Booking.com and TripAdvisor) and whether the relationship differs depending on the geographical area and on the website chosen.
Moreover, this study endeavors to confirm whether the number of reviews influences the score on each website. This introduction is followed by a review of the existing literature on the subject. Then the study objectives and the methodology are set out, including information about data collection. The results are then put forward, leading finally to a section for the discussion and conclusions of this study.

Theoretical background
In what follows, we introduce eWOM and user-generated content (UGC) in the hospitality industry, continue with the existing studies on the significance of the number of reviews, and finally present two of the most popular online sources of hotel information, TripAdvisor and Booking.com.

User-generated content (UGC) and electronic word of mouth (eWOM)
The use of Web 2.0 applications for the sharing of UGC and the creation of new value-added services is enormous (Sigala, 2008). UGC may serve as a new form of word-of-mouth for products and services (Ye, Law, Gu, & Chen, 2011).
The WOM phenomenon has been studied in marketing (Arndt, 1967) and refers to client communications relating to a consumer experience (Anderson, 1998).
With the advent of the Internet, the ways in which WOM reviews are made have been extended thanks to consumer-opinion portals (COPs) (Burton & Khammash, 2010), which allow consumers to review products and services, and other people to view these online reviews.
WOM, propagated via Web 2.0, is known as eWOM (Hennig-Thurau, Gwinner, Walsh, & Gremler, 2004) and according to the most cited definition, eWOM is "all informal communications directed at consumers through Internet-based technology related to the usage or characteristics of particular goods and services, or their sellers" (Litvin, Goldsmith, & Pan, 2008, 461).
The importance of personal recommendations to the tourism industry is considerable (Butler, 1980;Cohen, 1972;Morgan, Pritchard, & Piggott, 2003) because of the intangible nature of tourism products. In tourism, eWOM has drawn the attention of some researchers from the viewpoint of the independent traveler who uses personal recommendations offered in COPs by other users on sites like TripAdvisor, as the independent traveler seems to rely more and more on them (Jeacle & Carter, 2011;Ye et al., 2011). Online reviews have a dual role, functioning both as informant and as recommender (Park, Lee, & Han, 2007), are an important source of information to travelers (Pan, MacLaurin, & Crotts, 2007), and "are like narrative stories that enable prospective travelers to relive others' past experience" (Chen & Law, 2016, 364).
Both eWOM and traditional WOM have been widely analyzed in many studies. They conclude that positive eWOM generates positive attitudes and increases sales opportunities, while negative eWOM generates the opposite effect (Hong, 2006;Karakaya & Barnes, 2010;Lee et al., 2008;Steffes & Burgee, 2013). This effect is particularly noticeable in the hospitality sector, as has been shown by numerous studies (Pantelidis, 2010;Susskind, 2002;Vermeulen & Seegers, 2009;Ye, Law, & Gu, 2009). Research on eWOM in the hospitality industry can be grouped into two general lines: the factors related to the generation of comments, and the impacts these comments have on consumers and from the company perspective (Cantallops & Salvi, 2014).
Such is the significance of UGC that it has forced hoteliers to design organizational strategies for continually monitoring it (Baka, 2016).
A study analyzing business tourists indicated that they read both positive and negative e-comments, but that they make decisions based on positive e-comments (Memarzadeh, Blum, & Adams, 2015). Currently, reviews are potentially affecting traveler decision-making in terms of forming opinions and narrowing choices (Barreda & Bilgihan, 2013).
Consumers' reviews generate more confidence than information from a company itself (Gretzel & Yoo, 2008;Vermeulen & Seegers, 2009), and every extra point in the ratings on TripAdvisor is associated with an increase in the selling prices of rooms (Anderson, 2012).

Significance of the number of reviews
Some authors argue that a large number of reviews may encourage potential consumers to decide to buy a product that many other people have also bought (Dellarocas et al., 2007;Godes & Mayzlin, 2004;Park et al., 2007) as it may be seen as a sign of popularity (Zhang, Zhang, Wang, Law, & Li, 2013;Zhu & Zhang, 2010). Viglia, Furlan, and Ladrón-de-Guevara (2014) concluded that whether a review is good or bad is not the only relevant factor; the number of reviews also matters, which lends credibility to the theory that volume counts more than valence (Liu, 2006).
Moreover, a study of 16,000 European hotels on TripAdvisor concluded that as the number of a hotel's reviews increases, the ratings in the reviews are more positive (Melian-Gonzalez et al., 2013).

TripAdvisor
TripAdvisor is one of the most influential eWOM sources in the hospitality and tourism context (Yen & Tang, 2015). Because of the significance that TripAdvisor has acquired for the reputation of any accommodation facility, it is often the hotel managers' first port of call (Xie et al., 2014).
Numerous studies have been based on data provided by TripAdvisor (Ayeh, Au, & Law, 2013;Mayzlin, Dover, & Chevalier, 2014;O'Connor, 2008, 2010;Vermeulen & Seegers, 2009;Wilson, 2012). Some point out that the percentage of consumers who consult TripAdvisor before booking a room in a hotel has been increasing (Anderson, 2012), while others suggest that online rating lists are more useful and credible when published by well-known online travel communities like TripAdvisor (Casaló, Flavián, Guinalíu, & Ekinci, 2015).
In TripAdvisor's own words, it "is the world's largest travel site, reaching 340 million unique monthly visitors, and more than 350 million reviews and opinions". To calculate the overall score, TripAdvisor takes into account all the reviews awarded by past users, even the oldest ones (TripAdvisor, 2016).

Booking.com
Travel intermediary websites such as Booking.com are a popular online source of hotel information, as are social media websites like TripAdvisor and Facebook (Sun, Fong, Law, & Luk, 2015).
Booking.com B.V. is part of the Priceline Group, the world leader in booking accommodation online. Each day, over 1,000,000 room nights are reserved on Booking.com. Established in 1996, Booking.com is available in more than 40 languages, and offers 940,759 active properties in 223 countries and territories (Booking.com, 2016). After booking a room through Booking.com and staying in the accommodation, the customer receives an invitation via e-mail to write a comment about the experience. Booking.com therefore only publishes reviews from users who have booked at least one night in a lodging property through its website and have stayed there. Reviews older than 24 months on Booking.com are not taken into account to calculate the property's overall score, but are still shown on the website because they "may still be helpful when choosing the perfect place to stay" (Booking.com, 2016).
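The difference between the two scoring policies can be illustrated with a minimal sketch. It assumes, purely for illustration, that the overall score is the arithmetic mean of the individual ratings; neither website's actual formula is given in the sources cited here. The function name, the review data, and the approximation of 24 months as 730 days are all hypothetical.

```python
from datetime import date, timedelta
from statistics import mean


def overall_score(reviews, today, max_age_days=None):
    """Average the ratings, optionally ignoring reviews older than max_age_days.

    `reviews` is a list of (review_date, rating) pairs. The plain mean is an
    assumption made for illustration, not either website's documented formula.
    """
    if max_age_days is not None:
        cutoff = today - timedelta(days=max_age_days)
        reviews = [(d, r) for d, r in reviews if d >= cutoff]
    # statistics.mean raises StatisticsError if no review is left.
    return mean([rating for _, rating in reviews])


# Hypothetical reviews for a single hotel.
reviews = [
    (date(2012, 5, 1), 9.0),
    (date(2013, 8, 15), 8.0),
    (date(2015, 6, 10), 6.0),
    (date(2016, 2, 20), 7.0),
]
today = date(2016, 4, 30)

all_time = overall_score(reviews, today)                    # every review counts
recent = overall_score(reviews, today, max_age_days=730)    # roughly 24 months

print(all_time, recent)
```

With these made-up ratings the two policies return different scores for the same hotel, because the two oldest reviews only enter the all-time average.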

Research aim and methodology
The main goal of this study is to determine whether there is a relationship between the number of reviews and the hotel's score. This aim seeks to fill a research gap pointed out by Melian-Gonzalez et al. (2013) concerning the comparison of hotel reviews on different websites. The websites chosen for the research, TripAdvisor and Booking.com, are two of the main ones in the hospitality industry. If there is some relationship between the number of reviews and the score, we would also like to know whether there are differences depending on the website chosen and on the geographical area, as the research conducted by Melian-Gonzalez et al. (2013) was done only with European hotels.
Moreover, the study endeavors to confirm whether the number of reviews influences the score on each website. According to the literature review, the hypothesis is stated as follows: the larger (or smaller) the number of online travel reviews, the better (or worse) the score is.
In this study, we analyze the hotels of the top destinations in the world according to the TripAdvisor Ranking 2015, dividing them into four regions, as proposed by Banerjee and Chua (2016): America (AME), Asia and Pacific (ASP), Europe (EUR), and Middle East and Africa (MEA). We then split these regions into countries and cities.
In April 2016, we automatically gathered the rankings of the hotels on Booking.com and TripAdvisor: the number of reviews on both websites, the ranking and scoring, hotel name, city and country, and the hotel category (the latter of these variables was the hotel star category according to Booking.com).
The data were collected using an automatically controlled web browser that simulated user navigation (clicks and selections) on TripAdvisor and Booking.com. Once the data were available, a new data set was created by joining together the corresponding data for a given hotel from both websites. The join criteria applied, for every city, were:
• If the hotel name was exactly the same on both sites, the pair was matched.
• Else, if the hotel name from one site was entirely contained in the name from the other site, the pair was matched (which name acted as container depended on name length: the longest name available was chosen as the container).
• If no match was found, the Ratcliff/Obershelp similarity (Ratcliff & Metzener, 1988) was computed between each possible pair of names (one from Booking.com and one from TripAdvisor). The list of similarities was then sorted and the greatest one (best match) was taken; if that similarity was higher than 0.85 (i.e., 85% of letters match considering position), the pair was chosen and the names were removed from both lists.
The data collection for each destination was conducted on the same day on both sites in order to have minimum variation, since both websites are active and the data can be modified over time. On Booking.com, we filtered the property type by selecting "Hotels", and to obtain the ranking we chose the option "Review score", with scores rated by "All reviewers". Once gathered, all the hotels in each city were compared with those on TripAdvisor, taking into account only "Hotels" on TripAdvisor, sorted by "Ranking". Having obtained the two lists (69,997 hotels on TripAdvisor and 40,580 on Booking.com), we automatically compared the hotels listed on both websites. The result was 20,880 hotels that matched on both websites. Missing values were then eliminated from all variables, leaving 19,660 hotels. In order to avoid possible bias, only cities with at least 30 hotels and hotels with at least 30 reviews on Booking.com and on TripAdvisor were selected, so a total of 13,899 hotels were analyzed, as shown in Table 1.
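The three name-matching steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes Python's standard `difflib.SequenceMatcher`, whose `ratio()` is based on a Ratcliff/Obershelp-style algorithm, and it assumes hotel names are unique within each city list. The function names and the sample hotel names are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Ratcliff/Obershelp-style similarity in [0, 1] from the standard library."""
    return SequenceMatcher(None, a, b).ratio()


def match_hotels(booking_names, tripadvisor_names, threshold=0.85):
    """Pair hotel names from two sites for one city (names assumed unique)."""
    booking = list(booking_names)
    trip = list(tripadvisor_names)
    matches = []

    # Step 1: names that are exactly the same on both sites.
    for b in list(booking):
        if b in trip:
            matches.append((b, b))
            booking.remove(b)
            trip.remove(b)

    # Step 2: the shorter name is entirely contained in the longer one.
    for b in list(booking):
        for t in list(trip):
            shorter, longer = sorted((b, t), key=len)
            if shorter in longer:
                matches.append((b, t))
                booking.remove(b)
                trip.remove(t)
                break

    # Step 3: score every remaining pair, take best matches first, and keep
    # a pair only if its similarity is strictly higher than the threshold.
    scored = sorted(
        ((similarity(b, t), b, t) for b in booking for t in trip),
        reverse=True,
    )
    used_b, used_t = set(), set()
    for score, b, t in scored:
        if score <= threshold:
            break
        if b in used_b or t in used_t:
            continue
        matches.append((b, t))
        used_b.add(b)
        used_t.add(t)
    return matches


# Hypothetical names for one city.
booking = ["Hotel Arts", "Grand Plaza", "Sea View Resort", "Hostal Luna"]
tripadvisor = ["Hotel Arts", "Grand Plaza Hotel & Spa", "Seaview Resort", "Casa Verde"]

matches = match_hotels(booking, tripadvisor)
for pair in matches:
    print(pair)
```

In this made-up example, one pair is matched at each step and "Hostal Luna" and "Casa Verde" remain unmatched. The comparison is case sensitive, as in the "exactly the same" criterion; any normalization of names before matching would be an additional design choice.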
The statistical calculations were performed using R version 3.2.1. In order to check the hypothesis, we defined a linear model for the score on TripAdvisor (Ar) and Booking.com (Br) versus the number of reviews on TripAdvisor (Aw) and Booking.com (Bw) as follows:
$$Ar = Aw\,\beta_1 + \varepsilon_1 \quad \text{for TripAdvisor} \tag{1}$$
$$Br = Bw\,\beta_2 + \varepsilon_2 \quad \text{for Booking.com} \tag{2}$$
In order to check the hypothesis, it was necessary to estimate whether the linear regression model allowed the score to be inferred and whether it was statistically significant. In other words, for TripAdvisor the null hypothesis and alternative hypothesis could be stated as follows:
$$H_0: \beta_1 = 0 \qquad H_1: \beta_1 \neq 0$$
Under the null hypothesis, the F test statistic described by Faraway (2014), which is computed from the residual sum of squares (rss) and the total sum of squares (tss), follows a Fisher distribution whose degrees of freedom depend on the number of variables and on the number of samples (13,899 for the whole data set), where
$$rss = \sum_{i=1}^{n} \left(ar_i - aw_i\,\beta_1\right)^2 \qquad tss = \sum_{i=1}^{n} \left(ar_i - \overline{ar}\right)^2$$
Here we denote $ar_i$ and $aw_i$ as the sample values of Ar and Aw, respectively, $n$ as the number of samples, and $\overline{ar}$ as the mean value of Ar. Additionally, we also show the Pearson correlation coefficient in order to determine the strength of the correlation. Missing values were eliminated from both variables to obtain identical pairs.
The test statistic and the Pearson correlation coefficient between Booking.com ranking and reviews adopted expressions analogous to those of TripAdvisor.
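As a rough illustration of these quantities, the sketch below fits a simple least-squares line and computes rss, tss, the F statistic, R², and the Pearson correlation using only the Python standard library. It is not the authors' R code. It assumes a model with an intercept (two parameters), which is the default in R's `lm` but is not written out in Equations (1) and (2), and it uses the common form F = ((tss − rss)/(p − 1)) / (rss/(n − p)) with n observations and p parameters. The data are hypothetical, and the p-value is omitted because the standard library has no F distribution.

```python
import math


def simple_regression(x, y):
    """Least-squares fit of y on x with an intercept (an assumption here)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    tss = sum((yi - mean_y) ** 2 for yi in y)       # total sum of squares

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - (intercept + slope * xi)) ** 2  # residual sum of squares
              for xi, yi in zip(x, y))

    p = 2                                           # intercept + slope
    f_stat = ((tss - rss) / (p - 1)) / (rss / (n - p))
    return {
        "slope": slope,
        "intercept": intercept,
        "rss": rss,
        "tss": tss,
        "r": sxy / math.sqrt(sxx * tss),            # Pearson correlation
        "r2": 1 - rss / tss,                        # explained variance
        "f": f_stat,
        "df": (p - 1, n - p),
    }


# Hypothetical hotels: number of reviews and score.
n_reviews = [30, 120, 450, 900, 1500]
scores = [3.5, 4.0, 3.9, 4.4, 4.6]

fit = simple_regression(n_reviews, scores)
print(fit["r"], fit["r2"], fit["f"], fit["df"])
```

For a single predictor, R² equals the squared Pearson correlation, so the two measures reported in the study carry the same information about the strength of the linear relationship.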

Results
To check the relationship between the number of reviews posted on Booking.com and TripAdvisor and the scores, the Pearson correlation coefficients were calculated by region.
By region, the p-value for the test statistic (Equations (1) and (2)) is p < .001, as can be seen in Table 2; therefore, the null hypothesis is rejected, confirming the relationship between the number of reviews and the score on TripAdvisor and on Booking.com (except in EUR), as also happens in some cities.
The results show that the Pearson correlation between scores and reviews was higher in MEA, especially on TripAdvisor. On Booking.com, there was a very weak relationship between score and reviews in ASP and in MEA. In AME, the correlation was in the opposite direction, with a larger number of reviews associated with a worse position in the ranking, and in EUR it was not statistically significant.
At the bottom of the ranking are some major cities, such as New York City or Chicago, which had a weak inverse correlation between online reviews and scores on Booking.com, indicating that the higher the number of reviews, the worse the score was. On TripAdvisor, Chicago and New York City did not show any statistical significance.
To check whether the quantity of reviews influences the score, a simple linear model was calculated. By region, the results show a very weak explanatory power on TripAdvisor (Adj. R² between .056 and .210) and on Booking.com (Adj. R² between .005 and .021), statistically significant at p < .001.
Again, by city, the same simple linear regression was calculated to predict the score on TripAdvisor and on Booking.com from the number of reviews on TripAdvisor and on Booking.com, respectively.
The results show that the number of reviews on TripAdvisor and on Booking.com predicted the score on these websites, but only in certain cases.
On TripAdvisor, the cities with a higher R² were Vancouver, Yogyakarta, Sorrento, Kochi, Dublin, and Zermatt, as shown in Table 3. Cohen (1988) suggested interpreting R² values as follows: 0.26 (substantial), 0.13 (moderate), and 0.02 (weak). The R² of only 12 cities (1 in AME, 4 in ASP, 4 in EUR, and 3 in MEA) is considered substantial and statistically significant, which indicates the power of the number of reviews on TripAdvisor to explain the score in those cities. On Booking.com, only two cities from ASP had an R² ≥ .26. As shown in Table 4, the cities with a higher R² on Booking.com were Nha Trang, Hue, and Seoul.
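The thresholds attributed to Cohen (1988) can be applied mechanically, as in the short sketch below. The function name is hypothetical, the thresholds are treated as inclusive lower bounds (consistent with "R² ≥ .26" above), and the label for values under 0.02 is our own placeholder, since the text does not name that range.

```python
def classify_r2(r2):
    """Label an R-squared value using the 0.26 / 0.13 / 0.02 cut-offs."""
    if r2 >= 0.26:
        return "substantial"
    if r2 >= 0.13:
        return "moderate"
    if r2 >= 0.02:
        return "weak"
    return "below weak"  # placeholder label, not taken from the text


# Values quoted in the text: regional adjusted R-squared bounds and two cities.
for value in (0.005, 0.021, 0.056, 0.210, 0.33, 0.46):
    print(value, classify_r2(value))
```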
As an example, for hotels in Vancouver, the results of the regression indicate that the predictor (number of reviews on TripAdvisor) explains 39.7% of the variance (F (1,48) = 30.35, p < .001).
Therefore, the results partially confirm the hypothesis: the larger the number of reviews, the better the score is on TripAdvisor, but not on Booking.com.

Discussion
Referring to the hypothesis, we concluded that it was partially confirmed. From the entire data set, a stronger correlation was observed on TripAdvisor than on Booking.com and, from the data split by regions, the correlation coefficient was higher in MEA than in ASP, EUR, and AME on TripAdvisor, and on Booking.com the coefficient was higher in ASP but weak.
Depending on the cities analyzed, the behavior was different. Hence, destinations such as Vancouver, Abu Dhabi, or Yogyakarta had a statistically significant correlation coefficient above 0.6 on TripAdvisor. In these cases, the results therefore indicate a relationship between the quantity of reviews and the score.
More than 100 of the cities analyzed show a statistically significant correlation coefficient above 0.26 on TripAdvisor, compared with only 47 cities on Booking.com. By city, the results show that the explanatory power of the model is lower on Booking.com than on TripAdvisor. In very few cities does the number of reviews predict the score on Booking.com: only in Nha Trang and Hue did the number of reviews explain a substantial share of the variance (46% and 33%, respectively), with statistically significant results.
The results confirm some relationship between the number of reviews and the score on TripAdvisor, as pointed out by Melian-Gonzalez et al. (2013), who suggested that the more reviews there are, the higher the score is. However, in this study we point out that this trend does not hold worldwide and that scores do not behave in the same way on both websites, as the score on TripAdvisor has a stronger relationship with the reviews than the score on Booking.com. By region, the correlation in MEA is higher on both websites than in the rest of the regions.
Given the theory of eWOM that volume counts more than valence (Liu, 2006) and that positive eWOM generates positive attitudes and increases sales opportunities (Hong, 2006;Karakaya & Barnes, 2010;Lee et al., 2008;Pantelidis, 2010;Steffes & Burgee, 2013;Susskind, 2002;Vermeulen & Seegers, 2009;Ye et al., 2009), with this research we close the triangle, confirming that on TripAdvisor there is also a relationship between volume and score and rejecting that, in general, there is such a relationship on Booking.com. This could be explained by the fact that on Booking.com reviews older than 24 months are not taken into account to calculate the hotel's score. Given the conclusion of Melian-Gonzalez et al. (2013, 279) that "as the number of reviews of a hotel increases, the ratings are more positive", the elimination of the oldest reviews on Booking.com makes the possible balancing effect of the valence disappear.
Moreover, there are other factors that influence the overall score on the websites, such as whether the hotel is part of a chain or is independently managed (Banerjee & Chua, 2016), the room price (Martin-Fuentes, 2016;Öğüt & Onur Taş, 2012), or the hotel category (Martin-Fuentes, 2016).

Conclusions
This study contributes to the hospitality literature by showing that the relationship between the number of reviews and the score differs from one website to another and from one city to another. Encouraging customers to write reviews about their experience on COPs or on other websites is not guaranteed to improve scores, and it is not possible to be sure that the relationship between online travel reviews and score is one of cause and effect (Mellinas Cánovas, 2015).
Keeping old reviews when calculating the score of a hotel tends to make scores more positive but, on the other hand, does not show the current reality of hotels. Booking.com, in contrast, excludes old reviews from the calculation, which yields an overall score that is closer to the recent situation.