Online travel review rating scales and effects on hotel scoring and competitiveness

Purpose – The purpose of this paper is to determine whether different scales and ways to collect reviews and ratings found on online travel agencies (OTAs) can affect hotels, and whether hotels obtain the same or different evaluations.
Design/methodology/approach – Hotel ratings from five OTAs in four European markets were collected and compared in pairs. An initial comparison was made with the hotel scores of each OTA to show what a typical user would see. Then, a rescaled score (0-10) was used to compare all the OTA scales appropriately and to distinguish between what customers observe and what the reality is.
Findings – The results reveal that Booking.com, which uses a 2.5-10 scale, and Agoda, with a 2-10 scale, seem to give higher rating scores than Atrapalo (1-10), Travel Republic (0-10) and Hotel Reservation Service (HRS, 1-10). However, when the scores are rescaled (0-10), the worst ratings are found on Booking.com, followed by Agoda.
Practical implications – OTAs should include, next to the scores, the scale used to rate hotels so as to provide users with better and clearer information. Moreover, rating questionnaires should match the verbal denominations with their numerical values to avoid biased ratings.
Social implications – OTAs and hotel managers are losing information provided by customers because customers are not aware of the scale when rating hotels. Moreover, hotel ratings are used by potential customers to obtain a clearer image of an establishment. However, if some hotels are being overrated by some scales, customers might have higher expectations, which may not be met.
Originality/value – The unique rating scales of Booking.com and Agoda provide additional insights into their hotel evaluations, which appear to be higher when in fact they are not.


Introduction
Online travel reviews (OTRs) have grown exponentially in recent years, transforming the tourism industry (Buhalis and Law, 2008); in particular, the exchange of information and social media have changed consumer behaviour (Femenia-Serra et al., 2019). OTRs are written by tourists who provide opinions and evaluations about their travel experiences on platforms belonging to community-based sites or transaction-based online travel agencies (OTAs) (Xiang et al., 2017). These OTRs consist not only of text space for users to describe their travel experiences but also of a numeric questionnaire that allows customers to rate the services offered or the overall experience. In this sense, recent research shows that more priority is given to rating symbols than to textual material (Aicher et al., 2016) because of an excess of information.
Nowadays, hotels rely on online distribution channels, especially OTAs (Leung, 2019). Hotels that have higher scores on OTA websites are therefore better positioned in the ranking when customers sort their search by best-reviewed hotels. Consequently, a better score contributes to increasing reservations (Vermeulen and Seegers, 2009) and online hotel room sales (Cezar and Ögüt, 2016), and leads to increased occupancy rates.
OTAs and consumer opinion platforms use different systems for collecting numerical ratings: some, such as TripAdvisor, Expedia or Hotels.com, use a 1-5 rating scale, whereas others, such as Booking.com, Agoda or HRS, use an apparent 1-10 rating scale.
A study revealed that Booking.com uses a scale from 2.5 to 10, inducing apparent distortions in scores given to hotels (Mellinas et al., 2015), but the effects of this unique scale have not been studied. The same authors tested a sample of US hotels by comparing their scores with those on the Priceline website, and concluded that hotels get better scores on Booking.com (Mellinas et al., 2016). Research comparing the scale of Booking.com with that of TripAdvisor (1-5), using 20 million reviews of more than 20,000 hotels worldwide, concluded that the Booking.com scale benefits one- to three-star hotels in Europe and America and is detrimental to five-star hotels worldwide. Moreover, important differences have been identified in reviews registered on TripAdvisor, Expedia and Yelp in several aspects, such as ratings (Xiang et al., 2017), and on different OTAs offering hotels from Hong Kong (Leung et al., 2018).
Thus, it seems that there is still an important gap in research into differences in hotel rating scales and scores on various websites. This is of some concern, as the increasingly frequent use of these information sources requires reliable and precise rating scales to avoid distortions and errors in the results obtained, as has been detected in some cases. This research therefore aims to analyse in depth the hotel rating scales of five OTAs (Booking.com, Agoda, Atrapalo, HRS and Travel Republic [TR]), focusing on those whose systems show an apparent 0-10 or 1-10 scale, to determine whether different scales can lead to significant score variations, and to determine which OTA rating scale provides better or worse hotel scores.

Online travel agencies and hotel reviews
OTAs have made great efforts in terms of usability, security and quality of service (Bernardo et al., 2012; Chen and Kao, 2010; Chiou et al., 2011; Cho and Agrusa, 2006; Fu Tsang et al., 2010; Kaynama and Black, 2000; Park et al., 2007). Moreover, one of the most important attributes valued by OTA users is the reviews (Kim et al., 2007), which allow users to have a better idea of the services being offered before purchasing.
However, the huge number of reviews published on OTAs and on other platforms about products and services leads to an information overload that makes decision-making difficult (Lamest and Brady, 2019). Nevertheless, there are different ways to reduce the options when choosing hotel accommodation, such as other users' ratings or rankings, which serve to reduce the time and effort spent searching for information about products or services (Filieri and McLeay, 2014). Thus, customers use them to make quicker and more efficient decisions (Browning et al., 2013), because ratings not only help customers' decision-making but also provide visibility for hotels (Nieto-Garcia et al., 2019).
In the academic field, the relevance of hotel reviews in the sector from the point of view of OTAs, users and hotels has become an increasingly popular subject, generating a large number of publications (Linchi et al., 2017; Schuckert et al., 2015; Serra Cantallops and Salvi, 2014). Large databases of thousands or even millions of hotel reviews are being used (Martin-Fuentes and Mellinas, 2018), usually supported by automatically controlled systems (Radojevic et al., 2015) in a quick, cheap and convenient way. However, the use of this information can imply important errors if the process of capturing this information is not known in sufficient depth.
All OTAs selected for this study use an apparent 0-10 or 1-10 rating scale, but the reality is that two of them, Booking.com and Agoda, start the scale at 2.5 and 2, respectively. This can lead to confusion among users, who could think that the minimum score is 0 or 1 in a typical scale. This confusion has been identified in several types of research erroneously analysing the distinct measurement scales of some OTAs. Mellinas et al. (2015) identified 13 articles that made the mistake of considering the Booking.com scale to be 0-10 or 1-10. Since then, most authors have taken this scale into account when they have used data from Booking.com (24 citations in 3 years). However, many authors are repeating the mistake of considering that all OTAs use a scale of 0-10 or 1-10 (Abrate and Viglia, 2016; Castro and Ferreira, 2018; Ert et al., 2016; Kim and Park, 2017; Leung et al., 2018; Pokryshevskaya and Antipov, 2017, among others). Even a UNWTO report indicated that Booking.com uses a 1-10 scale (Blomberg-Nygard and Anderson, 2016).
Often, this error does not affect the results of the studies carried out, because they are qualitative studies based on the content of the reviews. The most important inaccuracies occur when using scores from different websites and trying to homogenize scales, as stated by Leung et al. (2018): "By contrast, Booking.com, Agoda, Priceline, and Kayak used a 10-point scale [...] To standardize the baseline scores for comparison in this study, the 10-point scores were divided by two to achieve a 5-point scale score."
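To illustrate why dividing by two can be inaccurate, the following minimal sketch contrasts that conversion with an affine mapping from the actual Booking.com range (2.5-10) onto a 1-5 scale. The function names and the choice of an affine mapping are assumptions made here for illustration; they are not taken from Leung et al. (2018) or from any OTA.

```python
def halve(score_10: float) -> float:
    """Naive conversion: divide a 10-point score by two."""
    return score_10 / 2.0


def to_five_point(score: float, lo: float, hi: float = 10.0) -> float:
    """Affine map from the true range [lo, hi] onto the 1-5 range (illustrative)."""
    return 1.0 + (score - lo) / (hi - lo) * 4.0


BOOKING_MIN = 2.5  # scale minimum reported by Mellinas et al. (2015)

# Compare both conversions at the minimum, the midpoint and the maximum.
for displayed in (2.5, 6.25, 10.0):
    print(displayed, halve(displayed), to_five_point(displayed, BOOKING_MIN))
```

Under these assumptions, the two conversions agree only at the top of the scale: halving maps the lowest possible Booking.com score (2.5) to 1.25 rather than to the five-point minimum of 1.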

Scales
OTAs encourage consumers to write reviews about services once they have used them by sending an e-mail to the person who bought and consumed the service. They use different types of questionnaires to collect consumers' opinions, as summarized in Table I. Booking.com asks guests to rate six categories or attributes, and the hotel's final score is the arithmetic average of them. The effects of these ratings have been studied by Nieto-Garcia et al. (2019), who conclude that not all the attributes play the same role for revenue maximization. Agoda uses a similar system, also with six categories, and Atrapalo asks customers to rate eight categories.
Although categories to rate hotels are quite similar on OTAs, the questionnaires are different when it comes to the number of answers in each question. On Booking.com, before 2015, there were four-point answers in each category in which a designation of "poor" assigned a 2.5 rating to the hotel, "fair" was rated with a 5, "good" with a 7.5 and "excellent" with a 10 (Mellinas et al., 2015). Since 2015, Booking.com has continued to use the same system, but now uses smiley faces instead of the mentioned designations.
All OTAs seem to use a Likert scale in their questionnaires. This was first used in research to measure the five major "attitude areas" in psychology (Likert, 1932). However, there is no consensus as to the number of points for the answers to surveys using a Likert scale (Bisquerra-Alzina and Pérez-Escoda, 2015; Boone and Boone, 2012), nor in the number of points for the answers in an OTA survey.
Most research that uses a Likert scale applies between three- and seven-point responses, but there is a wide range of points, between 2 and 20 (Bisquerra-Alzina and Pérez-Escoda, 2015), although using more than a five-point scale complicates the denomination of each point because tags normally accompany the Likert scale. Bisquerra-Alzina and Pérez-Escoda (2015) recommend the use of an 11-point Likert scale (from 0 to 10) because it increases the sensitivity of the results. Dawes (2008) analysed the results of the same questions using five-, seven- and ten-point scales, concluding that the latter produces slightly lower scores compared to the former ones. Leung et al. (2018) also concluded that the results of OTA ratings that use a five-point scale were higher than those from a ten-point one, although, as already mentioned, this study did not take into account that the Booking.com scale was not from 0 to 10.
Furthermore, Worcester and Burns (1975) detected that a four-point scale without a midpoint seemed to get more answers towards the most positive part of the scale, whereas Adelson and McCoach (2010) confirmed that there were no differences in the results with a four-point and a five-point scale, so a neutral point was not so important.
Although Booking.com uses a four-point Likert scale, each point is multiplied by 2.5 to reach 10 as a maximum score, so the minimum is not 0 but 2.5 (Mellinas et al., 2015). Agoda uses a five-point Likert scale in which each point is multiplied by 2, so the minimum score is 2 and the maximum is 10. TR uses an eleven-point Likert scale from 0 to 10, whereas HRS and Atrapalo each use a ten-point Likert scale from 1 to 10.
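The conversion from Likert points to displayed scores described above can be sketched as follows. This is only an illustration under stated assumptions: the dictionary, the function names and the sample answers are hypothetical, and averaging the six category scores follows the description given for Booking.com (Agoda is assumed here to work the same way).

```python
from statistics import mean

# Number of Likert points and multiplier per point, as described in the text.
SCALES = {
    "Booking.com": {"points": 4, "step": 2.5},  # 2.5, 5, 7.5, 10
    "Agoda": {"points": 5, "step": 2.0},        # 2, 4, 6, 8, 10
}


def point_to_score(ota: str, point: int) -> float:
    """Convert a Likert point (1 = lowest answer) to the displayed score."""
    scale = SCALES[ota]
    if not 1 <= point <= scale["points"]:
        raise ValueError(f"{ota} accepts points 1..{scale['points']}")
    return point * scale["step"]


def review_score(ota: str, answers: list[int]) -> float:
    """Arithmetic average of the category scores of one review (assumed)."""
    return mean(point_to_score(ota, p) for p in answers)


# Hypothetical review: the lowest answer in all six categories.
print(review_score("Booking.com", [1] * 6))
print(review_score("Agoda", [1] * 6))
```

Even when a guest selects the lowest answer in every category, the resulting score is 2.5 on Booking.com and 2 on Agoda, which is why neither scale can reach 0 or 1.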
Regarding the tags that accompany the scale, Worcester and Burns (1975) confirmed that the interpretation sometimes cannot be adjusted, not because "different words mean different things, but that the same word can be made to mean different things as the context changes" (Worcester and Burns, 1975: p. 182).
It is worth mentioning that the descriptions of the answers on some OTAs are more positive than negative. Thus, in the four-point Likert scale on Booking.com, the second point used to be described as "fair," which is a neutral point. The same happens today with the use of smiley faces: the second point is a neutral smiley.

Research aim and hypothesis
To provide additional insights about OTA rating scales, this study aims to analyse the rating scales that apparently use a 0-10 or a 1-10 scale by answering the following research questions: Do OTA rating scale systems provide the same score results for the same hotels? Conversely, do the rating scales lead to significant rating variations? Which OTA rating scale produces better/worse scores for the same hotels?
Moreover, the specific objective of this research is to compare the rating scales of Booking.com (2.5-10), Agoda (2-10) and other OTAs that use a 0-10 or 1-10 measurement scale. This will add to the scarce literature on the effects of these "singular" rating scales.
H1. Booking.com (2.5-10) and Agoda (2-10) rating scales provide higher hotel scores than the OTAs using 0-10 or 1-10 scales.
H2. Booking.com and Agoda rating scales rescaled to 0-10 provide lower hotel scores than the OTAs using the original 0-10 scales.
H3. Ten- and eleven-point Likert scales show lower scores than four- and five-point scales.

Methodology
To determine whether OTA rating scales provide the same rating results for the same hotels, a search was performed among a wide range of OTAs operating in Europe that use an apparent 0-10 or 1-10 scoring system, identifying those that implement "verified review" systems.
The same hotels were selected in each comparison and, to minimize possible biases, we required each selected hotel to have a minimum of 40 reviews. This condition made it difficult to identify valid websites for our study because, although there are OTAs that use this scale, some do not have significant activity in Europe (Bookit, Despegar, Malapronta, Ctrip, etc.) or do not have a significant number of hotels with the minimum of 40 reviews (Hoteliers, Splendia, etc.).
As not all the hotels operate with all OTAs, it was not possible to compare the same hotels at the same time on all OTAs. This is the reason why the comparison of the hotels was performed with OTAs in pairs, so that we could compare exactly the same hotels from the same destinations in each comparison.
Finally, we selected five platforms that met the above conditions but, despite having apparently identical scoring systems, they showed relevant differences, as can be observed in Table I. TR uses a 0-10 scale; HRS and Atrapalo a 1-10 scale, whereas Agoda uses a 2-10 scale and Booking.com a 2.5-10 scale. Moreover, HRS and Booking.com delete reviews after a certain period, but the three other websites do not seem to delete old reviews.
The three selected websites that use a conventional system (scale 0-10 or 1-10) have a limited geographical scope. This made finding a significant number of hotels with 40 reviews on several of these websites unattainable. However, Agoda and Booking.com have worldwide implementation, allowing us to find the same hotels with more than 40 reviews in the websites identified.
Booking.com has been used in numerous studies as a source of information as already mentioned. Agoda has also been used in various investigations, especially focusing on the Asian market, in some cases assuming wrongly that it uses a scale that ranges from 0 to 10 (Zhou et al., 2014) or 1 to 10 (Muangon et al., 2014). In other cases, studies have focused on semantic analysis, so the scoring system does not affect the results obtained (Haruechaiyasak et al., 2010;Patel et al., 2015). HRS has been used in very specific cases focused on its geographical sphere of influence (Jannach et al., 2014;Schütze, 2008), and Atrapalo has been used in studies on the Spanish market (María-Dolores et al., 2012;Poggi et al., 2007). The TR database has not been used for academic research, as far as the authors are aware.
Data were taken manually on different hotel samples in Europe with different locations during May 2015 starting with the largest cities of each sample, analysing all the hotels in each city and randomly selecting hotels with at least 40 reviews. This is the reason most of the hotels analysed in this research were located in large cities, as can be seen in Table II. Moreover, the hotels were of all categories and the largest operate with different OTAs. The shortage of hotels that fulfilled the conditions prevented larger samples from being used and from being able to realize a specific sample design, which would have allowed the sample to be segmented by hotel or client type.
When the number of hotels indicated for each sample was reached, the selection of new cities and hotels was stopped. In all cases, there was a reference website that uses a conventional assessment system, whose scores were compared with those of Agoda and Booking.com in three of the cases and exclusively with Booking.com in the fourth case. In addition, we compared scores obtained from Agoda and Booking.com for the three cases in which this was possible. These were paired samples, i.e. the scores of the different websites in each market were compared on the same hotels.
As the aim of this study is to determine whether the rating scales of OTAs provide the same results for the same hotels, a paired-samples Student's t-test for the comparison of means was performed, divided by markets (Table III) and OTAs (Table IV), with the mean ratings as announced on each OTA (Row: Rating OTA) and with the ratings rescaled from 0 to 10 (Row: Rescaled rating OTA).
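For readers who want to reproduce this type of comparison, the paired test statistic can be sketched with the Python standard library alone. This is a minimal illustration, not the SPSS procedure used in the study: the function name `paired_t` and the sample scores are hypothetical, and the p-value (which requires the t distribution) is deliberately left out.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(scores_a: list[float], scores_b: list[float]) -> tuple[float, int]:
    """Return (t statistic, degrees of freedom) for a paired-samples t-test.

    Both lists must hold scores of the same hotels, in the same order.
    Raises ZeroDivisionError if all pairwise differences are identical.
    """
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equally long samples with at least 2 hotels")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    standard_error = stdev(diffs) / sqrt(n)  # sample SD of the differences
    return mean(diffs) / standard_error, n - 1


# Hypothetical scores of the same five hotels on two OTAs.
ota_a = [8.1, 7.4, 8.8, 6.9, 7.7]
ota_b = [7.8, 7.0, 8.6, 6.3, 7.5]
t_stat, dof = paired_t(ota_a, ota_b)
print(f"t = {t_stat:.2f}, df = {dof}")
```

Pairing matters here because each hotel acts as its own control: the test looks at the per-hotel differences rather than at the two sets of scores as independent groups.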
Statistical calculations were performed with SPSS V.24 and the analyses were carried out with normal ratings (ratings assigned to each hotel by each OTA) to compare the results a typical user would see when entering each OTA to look for a hotel. To compare ratings with the same scales, the scores of all hotels in each OTA were rescaled to 0-10 with the minmax normalization method because of its simplicity.
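The min-max rescaling mentioned above can be sketched as follows, assuming the scale limits reported in this paper for each OTA. The dictionary and function names are illustrative and do not come from the authors' analysis files.

```python
# Scale limits (minimum, maximum) of each OTA as described in the text.
OTA_RANGES = {
    "Booking.com": (2.5, 10.0),
    "Agoda": (2.0, 10.0),
    "Atrapalo": (1.0, 10.0),
    "HRS": (1.0, 10.0),
    "Travel Republic": (0.0, 10.0),
}


def rescale(score: float, ota: str) -> float:
    """Min-max normalize a displayed score to the common 0-10 range."""
    lo, hi = OTA_RANGES[ota]
    if not lo <= score <= hi:
        raise ValueError(f"{score} is outside the {ota} range {lo}-{hi}")
    return (score - lo) / (hi - lo) * 10.0


# The scale minimum maps to 0 and the maximum to 10 on every OTA.
print(rescale(2.5, "Booking.com"), rescale(10.0, "Booking.com"))
print(rescale(2.0, "Agoda"), rescale(10.0, "Agoda"))
```

Because the maximum is 10 on every platform, the transformation only stretches the lower end of each scale, so rescaled scores are never higher than displayed ones.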

Results
The results of the paired-samples Student's t-test are shown by markets in Table III and by OTAs in Table IV.

Results from scores announced by online travel agencies
The results of the paired-samples Student's t-test (Table III) show that Booking.com scores are significantly higher than those of the other platforms in all markets when the rating taken is the one announced by Booking.com for each hotel, as already confirmed (Parra et al., 2018). The differences are smaller when comparing Booking.com scores with those of Agoda. Worth noting is the case of hotels in Germany, where the [...] The results comparing Agoda with HRS in Germany confirm that the Agoda scale system also gives higher scores. However, when comparing Agoda with Atrapalo or TR, the results are not statistically significant. The largest statistically significant mean score difference is in Germany: on Booking.com the hotels obtain 7.947, whereas on HRS they obtain 7.609 (t = 10.78; p < 0.001).
The results in Table IV show that, when comparing Booking.com original scores with any of the OTAs analysed, the mean hotel score is higher and statistically significant in all cases. With Agoda, the comparison is made with only three of the four samples, because this site does not have enough hotels with the required minimum of reviews for the Spanish coast hotels sample.
Thus, the results confirm the first hypothesis that the Booking.com rating scale (2.5-10) and Agoda rating scale (2-10) provide higher hotel scores compared to the OTAs that use 0-10 or 1-10 scales.

Results from rescaled ratings from 0 to 10
This section shows the results obtained with rescaled ratings (0-10). Here the results are different, because the worst scores are obtained on Booking.com, followed by Agoda.
First, Booking.com shows the lowest ratings when compared with any other OTA, even with Agoda over the whole data set, as observed in Table IV, although the two use similar measurement scale systems. However, when the results are separated by market, there are no mean differences between Booking.com and Agoda in Europe or the UK, as can be seen in Table III. The largest statistically significant mean score difference appears when comparing Booking.com with TR: hotels on TR get 7.636, but on Booking.com the same hotels obtain 7.020 (t = 21.06; p < 0.001).
Second, Agoda obtains the lowest ratings when compared with Atrapalo and TR. When compared with HRS in Germany, there is a slightly higher rating in favour of HRS, but the difference is not statistically significant. In this sense, the largest statistically significant mean score difference is between TR and Agoda, with 7.815 and 7.233 (t = 15.71; p < 0.001), respectively.
Moreover, a Levene's test was performed to assess the homogeneity of variance of the ratings by market. The results show that there is homogeneity of variance in all cases because the p-value is higher than 0.05, except for the German market between Agoda and Booking.com (p < 0.001) and between Agoda and HRS (p < 0.001), where there are statistically significant differences in variances.
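As an illustration of what Levene's test measures, the sketch below computes the test statistic W in its mean-centred form using only the standard library. It is offered under assumptions: the function name `levene_w` and the sample data are hypothetical, the mean-centred variant is assumed rather than confirmed by the paper, and the p-value (obtained from an F distribution with k - 1 and N - k degrees of freedom) is not computed.

```python
from statistics import mean


def levene_w(*groups: list[float]) -> float:
    """Levene's W statistic (mean-centred variant) for k groups.

    Raises ZeroDivisionError if the absolute deviations do not vary
    within any group.
    """
    k = len(groups)
    if k < 2 or any(len(g) < 2 for g in groups):
        raise ValueError("need at least two groups with two values each")
    # Absolute deviation of every rating from its own group mean.
    z = [[abs(x - mean(g)) for x in g] for g in groups]
    n_total = sum(len(g) for g in z)
    z_group_means = [mean(g) for g in z]
    z_grand_mean = mean(v for g in z for v in g)
    between = sum(
        len(g) * (zg - z_grand_mean) ** 2 for g, zg in zip(z, z_group_means)
    )
    within = sum((v - zg) ** 2 for g, zg in zip(z, z_group_means) for v in g)
    return (n_total - k) / (k - 1) * between / within


# Hypothetical ratings of hotels on two OTAs in one market.
print(levene_w([7.1, 7.9, 8.4, 6.8], [6.0, 8.9, 9.4, 5.2]))
```

A larger W indicates that the spread of ratings differs more between the groups; W is then compared with the F distribution to obtain the p-values reported above.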
When the ratings are rescaled from 0 to 10 to compare scores fairly, that is, using the same scale, the results confirm the second hypothesis that Booking.com obtains the lowest hotel scores followed by Agoda. As Booking.com and Agoda use a four- and five-point Likert scale, respectively, and the other OTAs use a ten- or eleven-point scale, the third hypothesis is rejected because the highest results are for the OTAs that use 10- or 11-point scales.

Discussion
An analysis of several OTAs that use a 0-10 or 1-10 scale for hotel ratings demonstrates that there is a disparity in scoring systems. In addition to the already known Booking.com 2.5-10 scale, Agoda uses a 2-10 scale; TR a 0-10 scale, whereas HRS and Atrapalo use a 1-10 scale. In the last two cases, although they would appear to be fully equivalent systems, we observe that HRS calculates a global score for each hotel as an arithmetic average of up to 12 items, whereas Atrapalo uses 8. The OTAs also use different delete criteria for old reviews. Even how to describe the customer experience is different with some OTAs asking customers about the most positive and negative aspects of the hotel.
This study shows that the OTA rating scale systems provide different score results for the same hotels when comparing the scores calculated by the OTA and when comparing the rescaled scores from 0 to 10, but the differences in each case are in the opposite direction.
The singular scales of Booking.com (2.5-10) and Agoda (2-10) provide apparently better scores for the hotels when compared with the other OTAs analysed. This is also confirmed by Leung et al. (2018) in their study comparing several OTAs, although that study did not take into account that the Booking.com scale starts at 2.5 instead of 1.
This effect is what any user sees when consulting these websites to compare hotel scores. Therefore, these OTAs use a scale that is not the usual one and leads users to believe that a hotel with a score of 5 or 6 on average in the assessment on Agoda and on Booking.com is a hotel with a "pass mark" when it is actually a failure. This perception is reinforced when words such as "passable", "pleasant", "acceptable" or "above average" are included next to these scores (Mellinas and Reino, 2019).
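The "pass mark" effect can be made explicit with a short worked calculation based on the min-max rescaling used in this study. The symbols are introduced here only for illustration and are not the authors' notation: $s$ is the displayed score, $m$ the minimum of the OTA scale and $r$ the score rescaled to 0-10.

```latex
r = 10\,\frac{s - m}{10 - m}
\quad\Longleftrightarrow\quad
s = m + \frac{r}{10}\,(10 - m)

% Displayed score that corresponds to a rescaled midpoint r = 5:
\text{Booking.com } (m = 2.5):\; s = 2.5 + 0.5 \times 7.5 = 6.25
\qquad
\text{Agoda } (m = 2):\; s = 2 + 0.5 \times 8 = 6

% Rescaled value of a displayed score s = 5:
\text{Booking.com: } r = 10 \times \frac{2.5}{7.5} \approx 3.33
\qquad
\text{Agoda: } r = 10 \times \frac{3}{8} = 3.75
```

Under this rescaling, a displayed 5 therefore lies well below the midpoint on both platforms, and the midpoint itself is displayed as 6.25 on Booking.com and 6 on Agoda.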
Not only can users of these websites be confused by these peculiar scales, but so can researchers who have based studies on the assumption that the Booking.com scale is 0-10 or 1-10 (Abrate and Viglia, 2016; Ert et al., 2016; Kim and Park, 2017; Leung et al., 2018; Pokryshevskaya and Antipov, 2017), even though Mellinas et al. (2015) reported that the scale was from 2.5 to 10. These publications have passed the filter of reviewers and editors, disseminating erroneous data among the scientific community. It has happened with Booking.com, and may happen with the rest of the OTA websites in the future, if prior studies of them are not carried out.
When analysing the results with a normalized scale (0-10), comparable among OTAs, the results are different because the worst scores are obtained by hotels on Booking.com, followed by those on Agoda.
Booking.com uses a four-point Likert scale and Agoda a five-point one; once the scales are normalized, both produce the worst results compared with the scales of 10 or 11 points. This result goes in the opposite direction to Dawes' study (2008), in which ten-point scales produced slightly lower scores compared with five-point and seven-point ones. It is worth mentioning that, although the Booking.com and Agoda scales resemble four- and five-point Likert scales, their results are multiplied by 2.5 and 2, respectively; therefore, the results obtained do not have to coincide with the aforementioned study in relation to the highest score on a ten-point Likert scale.
Moreover, if we take into account that a four-point scale without a midpoint seems to attract more answers to the positive side of the scale (Worcester and Burns, 1975), the results could be better for OTAs using this scale, although there is no consensus on the effects of using an even or odd number in the response scale (Adelson and McCoach, 2010), nor on the number of points used in the answers of surveys using Likert scales (Bisquerra-Alzina and Pérez-Escoda, 2015; Boone and Boone, 2012).
It is also important to note that in collecting text responses and reviews, Booking.com and Agoda ask users to evaluate the most positive and negative aspects of the accommodation, which could mean respondents have to perform a memory exercise of negative situations experienced in the hotel. As a result, numerical ratings may be lower than on the other OTAs once the scale is normalized. This could explain the fact that Booking.com and Agoda have the lowest results.
Regarding the tags that accompany the numeric Likert scales, the literature confirms that instead of providing more information, sometimes they can confuse respondents because they can have different meanings (Worcester and Burns, 1975). In the case of OTAs that use verbal denominations or tags in the form of smiley faces instead of numbers to rate the hotel, the authors have confirmed that some OTAs use tags or textual descriptions for each item to be assessed on a Likert scale that do not correspond to the numerical value. For example, the authors observed that some OTAs use adjectives or elements such as smileys that tend to be more positive than negative. For instance, Booking.com used to describe the second point in a four-point scale as "fair". At present, it describes this point with a neutral face, but the response is not at the midpoint of the scale because there are only four points, which could induce respondents to rate better than they really would.
OTAs seek to obtain the best ratings for hotels because this implies greater satisfaction of hotel managers because of the online reputation they obtain through these websites, and it is linked to the success of the hotel and the quality of service (Jalilvand et al., 2017). Also, it has been shown that better online reviews and hotel ratings increase users' willingness to book rooms there (Vermeulen and Seegers, 2009), resulting in better hotel room sales (Cezar and Ögüt, 2016). So being the OTA with the best scores for hotels makes it more desirable both for potential customers and for accommodation establishments to distribute their rooms with it. Booking.com and Agoda's scales lead customers to believe that hotels are better valued than they really are. Thus, when a user compares hotels on different OTA websites, at equal prices, the user perceives a hotel with an apparently higher score as better than the same hotel on an OTA with a scale from 0 to 10. For example, a real hotel of our dataset in Calella (Spain) is more likely to make sales through Booking.com (6.3) or Agoda (6.4) than through TR (5.6) or Atrapalo (4.8).
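The Calella example can be checked with a short sketch that applies the min-max rescaling to the four displayed scores quoted above. The variable and function names are illustrative; only the scores and the scale minimums come from the text.

```python
# Scale minimum of each OTA (the maximum is 10 in all cases).
SCALE_MIN = {
    "Booking.com": 2.5,
    "Agoda": 2.0,
    "Travel Republic": 0.0,
    "Atrapalo": 1.0,
}

# Displayed scores of the Calella hotel quoted in the text.
DISPLAYED = {
    "Booking.com": 6.3,
    "Agoda": 6.4,
    "Travel Republic": 5.6,
    "Atrapalo": 4.8,
}


def rescale(score: float, lo: float, hi: float = 10.0) -> float:
    """Min-max normalize a score from [lo, hi] to 0-10."""
    return (score - lo) / (hi - lo) * 10.0


rescaled = {ota: rescale(s, SCALE_MIN[ota]) for ota, s in DISPLAYED.items()}

# Rank the OTAs from best to worst score, before and after rescaling.
by_displayed = sorted(DISPLAYED, key=DISPLAYED.get, reverse=True)
by_rescaled = sorted(rescaled, key=rescaled.get, reverse=True)

for ota in by_rescaled:
    print(f"{ota}: displayed {DISPLAYED[ota]}, rescaled {rescaled[ota]:.2f}")
```

With these figures, the displayed ranking places Agoda and Booking.com first, whereas after rescaling TR comes first (5.6), followed by Agoda (5.5) and Booking.com (about 5.07).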
Furthermore, OTAs establish mechanisms for collecting reviews and ratings that are as beneficial as possible to obtain the best results for hotels. Proof of this is that since the collection of the data herein, we have seen how the collection format has changed in several OTAs (e.g. Booking.com has changed the description of the four-point response and changed the policy of eliminating old reviews from 14 to 24 months).
The two websites that use "non-conventional" scoring systems (Booking.com and Agoda) belong to Booking Holdings. These websites also use very positive words together with the scores of the lowest rated hotels (Mellinas and Reino, 2018). We wonder whether this is all part of a strategy to improve the perception of hotel quality that is close to being considered fake or deceptive advertising. The way of calculating the final score seems to be more honest on OTAs with a real 0-10 or 1-10 scale, such as Atrapalo, HRS or TR.

Theoretical implications
Through an analysis of hotels that operate with several OTAs that use different measurement scales, this study confirms that there are differences in the final scores obtained by hotels, as suggested by various authors comparing only two websites (Mellinas et al., 2016). This research provides an additional insight with an analysis of the same hotels that operate in five OTAs in different markets.
This finding should be taken into consideration when comparing ratings from these OTAs because comparing scales from 0 to 10, from 1 to 10, from 2 to 10 or from 2.5 to 10, without normalizing them, is like comparing "apples with oranges".
Despite having analysed the number of points on a Likert scale through OTAs, the recurrent question in numerous investigations about how many points to use in the responses on a Likert scale remains unanswered.

Managerial implications
This research has several practical implications for OTAs using scales other than 0-10 or 1-10. First, OTAs and hotel managers are losing information provided by customers because guests are not aware of the scale system when rating them. Second, online reviews are a valuable source of information for potential customers that allows them to have a closer understanding of the services and the facilities they will find in the accommodation establishment. With some scales, hotels can be overrated, and customers will create high expectations that are not consistent with reality. This will be detrimental to hotels because they might not be able to satisfy these expectations; therefore, customers will be dissatisfied with their hotel experience. There are better ways to improve hotel ratings, such as training the staff in emotional intelligence (Koc and Boz, 2019), answering the online reviews of previous guests (Wei et al., 2013) and encouraging guests to handwrite their opinions, as it has been demonstrated that subsequent ratings are better if the opinion has previously been handwritten (Tassiello et al., 2018).
A third group of implications could come from the competitors of Booking Holdings. These should consider whether it is appropriate to maintain a system of conventional scales and allow hotels to be better valued on other websites, or to change the scale of their systems. In the latter case, competitors could invent new scales that inflate scores even more than Agoda or Booking.com, which would lead to a very controversial situation.
In conclusion, these variations in the scales and the confusion that they entail are having negative effects for consumers, hoteliers, researchers and competitors. For this reason, we suggest OTAs include, next to the score, the type of scale used to rate the hotel to provide users with more information. In this sense, we applaud the initiative of Booking.com, which has recently created a website section titled "How is my review score calculated?" (Booking.com, 2018) explaining its review score system. However, this information is located in a section aimed at Booking.com partners. Therefore, we have doubts about the number of users who know about it and take it into account when choosing a hotel.
Furthermore, OTAs that use textual descriptions or tags in the form of smiley faces to rate the hotel should be honest and match the verbal denomination with its numerical value and not use elements that tend to be more positive than negative because they can cause confusion and biased answers from users.

Conclusions
This study confirms the existence of differences between scores when using different rating scales. Hoteliers and researchers should not continue to make the mistake of considering that the scores provided by different websites are equivalent simply because they seem to use identical information and score collection systems. As has been demonstrated, the scales vary, as do the items used, which leads to significant differences in scores. These findings should be considered in future investigations with quantitative analyses using these sources of information, especially if they are designed to combine different sources while assuming that the data are equivalent.

Limitations and future directions for research
Evidence shows that scores for the same hotels vary depending on each OTA. With the present study, we cannot point to a single reason such as the measurement scale, the method of collecting reviews, the textual description of positive and negative aspects, the old-review policies of OTAs or even the fact that different platforms might have different users, which may introduce sample self-selection issues. There are more factors to be considered, such as the user's nationality, as some nationalities can be more or less demanding of hotel services (Au et al., 2014; Leung et al., 2018), and cultural differences in responses to a Likert scale (Lee et al., 2002). Another factor could be the percentage of reviews collected on each OTA through either mobile phones or personal computers, since the analysis carried out by Mariani et al. (2019) determined that the ratings on Booking.com were higher for responses collected with smartphones.
Finally, it would be interesting to carry out similar research with websites that only use rating scales from 1 to 5. This would make it possible to verify whether there are significant differences in scoring systems, as seen in this research.