Navigating Measurement Challenges in Product Ratings and Reviews

Huseyin Baytar
4 min read · Nov 8, 2023

Hello data science enthusiasts! In this article, I will discuss measurement problems: how a data scientist can use ratings and reviews to guide a company’s customers toward accurate and reliable product choices.

Measurement Problems

When a user makes a purchase decision, it is influenced by social proof, often referred to as “the wisdom of crowds.” Let’s say we are about to buy a product and we are stuck between two options. The first product has a 5-star rating, while the second product has a 4-star rating. We would prefer to purchase the product with a 5-star rating.

Let’s modify our scenario a bit. The product with a 5-star rating has been rated by 7 people, while the 4-star product has been rated by 256 people. In this case, most of us would choose the 4-star product over the one with 5 stars, which shows how strongly social proof can outweigh other signals. This matters because it shows that we will accept a product whose imperfections are visible in its lower average when many more people stand behind it. We tend to prefer products with more votes and comments. Accurate ranking therefore matters both to the user, who wants the best product in terms of price and performance, and to the seller, who wants products presented to users as accurately as possible.

Rating Product

When calculating the ratings of products, taking the simple average might not be reliable. For instance, a user might have voted immediately after visiting the site, while another user who spent a longer time on the site may have a different opinion. To address this issue, we need to consider user quality, which includes factors such as the number of comments a user has made and the time they have spent on the site.
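As a minimal sketch of the user-quality idea, the snippet below weights each vote by how many comments the voter has written. The data, the function name, and the choice of `1 + comment count` as the weight are all assumptions made for illustration; time spent on the site could be used in the same way.

```python
# Hypothetical ratings: (stars, number of comments the user has written).
ratings = [(5, 0), (5, 1), (3, 40), (4, 25)]


def user_quality_weighted_average(ratings):
    """Average star rating where each vote is weighted by 1 + the user's comment count."""
    # Adding 1 keeps users with zero comments from being ignored entirely.
    weights = [1 + n_comments for _, n_comments in ratings]
    total = sum(stars * w for (stars, _), w in zip(ratings, weights))
    return total / sum(weights)


simple_average = sum(stars for stars, _ in ratings) / len(ratings)
print(simple_average)
print(user_quality_weighted_average(ratings))
```

With this sample data the simple average is pulled up by two users with almost no history, while the weighted average leans toward the opinions of the more active users.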

Similarly, if we compare the simple average of a newly released product with that of an older product, the older product may appear superior because of its higher number of reviews and the longer time it has been available. To tackle this, we can use a time-based average that gives recent ratings more weight than old ones.
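One possible way to compute a time-based average is to group ratings by age and weight the average of each group. The sketch below assumes four age buckets and the weights 0.28, 0.26, 0.24, and 0.22; these boundaries and weights are illustrative choices, not values from the text.

```python
# Hypothetical buckets: (maximum age in days, weight). Newer ratings weigh more.
BUCKETS = [(30, 0.28), (90, 0.26), (180, 0.24), (float("inf"), 0.22)]


def time_based_weighted_average(reviews):
    """reviews: non-empty list of (stars, days_since_review) tuples."""
    groups = [[] for _ in BUCKETS]
    for stars, days in reviews:
        # Put the rating into the first bucket whose age limit it fits.
        for i, (max_days, _) in enumerate(BUCKETS):
            if days <= max_days:
                groups[i].append(stars)
                break
    total = 0.0
    weight_sum = 0.0
    for (_, weight), group in zip(BUCKETS, groups):
        if group:  # skip empty buckets and renormalize over the rest
            total += weight * sum(group) / len(group)
            weight_sum += weight
    return total / weight_sum


reviews = [(5, 10), (4, 20), (3, 200), (3, 400)]
print(time_based_weighted_average(reviews))
```

Here the two recent ratings average 4.5 and the two old ones average 3, so the result lands slightly above the simple average of 3.75.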

Finally, we can combine the two averaging methods (user quality and time-based) into a single weighted average by assigning a weight to each.
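The combination step can be sketched as a convex blend of the two averages. The function name, the default weight of 0.5, and the input values below are assumptions for illustration only.

```python
def combined_rating(user_quality_avg, time_based_avg, user_weight=0.5):
    """Blend two averages; user_weight is the share given to the user-quality average."""
    if not 0 <= user_weight <= 1:
        raise ValueError("user_weight must be between 0 and 1")
    return user_weight * user_quality_avg + (1 - user_weight) * time_based_avg


# Hypothetical inputs: a user-quality average of 3.46 and a time-based average of 3.84.
print(combined_rating(3.46, 3.84, user_weight=0.4))
```

Keeping the two weights summing to 1 means the result stays on the same 1 to 5 scale as the inputs.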

Sorting Product

The topic of ranking is not limited to products; it is a situation that can arise in various processes. For example, in the context of recruitment, various factors such as interview score, English proficiency score, and technical language proficiency score can be weighted differently to create a ranking based on their averages.

To provide another example, when making a purchase on a website, the system can prioritize showing the most relevant items based on the keyword entered in the search bar. Product ranking can be done in various ways, such as sorting by rating, sorting by the number of comments and purchases, and sorting by a combination of rating, comments, and purchase counts. However, sponsored advertisements often disrupt this ranking.
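Sorting by a combination of rating, comments, and purchases could look like the sketch below. Because comment and purchase counts live on a very different scale from a 1 to 5 rating, the counts are first rescaled to the 1 to 5 range. The product data, the weights, and the function names are hypothetical.

```python
def min_max_scale(values, low=1.0, high=5.0):
    """Rescale values linearly so the smallest maps to low and the largest to high."""
    lo, hi = min(values), max(values)
    if hi == lo:  # all values equal: place them in the middle of the range
        return [(low + high) / 2 for _ in values]
    return [low + (v - lo) * (high - low) / (hi - lo) for v in values]


products = [
    {"name": "A", "rating": 4.8, "comments": 12, "purchases": 40},
    {"name": "B", "rating": 4.4, "comments": 850, "purchases": 3000},
    {"name": "C", "rating": 4.6, "comments": 300, "purchases": 900},
]


def weighted_sorting_scores(products, w_rating=0.42, w_comments=0.32, w_purchases=0.26):
    """Return {name: score}, combining rating with rescaled comment and purchase counts."""
    comments = min_max_scale([p["comments"] for p in products])
    purchases = min_max_scale([p["purchases"] for p in products])
    return {
        p["name"]: w_rating * p["rating"] + w_comments * c + w_purchases * b
        for p, c, b in zip(products, comments, purchases)
    }


scores = weighted_sorting_scores(products)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In this sample, product A has the highest rating but very few comments and purchases, so it drops below B and C once social proof is taken into account.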

Sorting Reviews

In comment sorting, it is crucial to prioritize the comments that other users find useful, regardless of whether they are positive or negative.

There are three simple methods for comment sorting. The first method is the up_down score, formulated as follows:

up_down score = up votes - down votes

However, this method can be misleading. A comment with 600 up and 400 down votes is 60% positive, while a comment with 5500 up and 4500 down votes is 55% positive. Yet the second comment, with a score of 1000, would be prioritized over the first comment, whose score is only 200. To address this issue, we can employ the second method, which takes the ratio of up votes to total votes.
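The two comments from this example can be checked with a few lines of Python. The function name is an assumption; the calculation is simply up votes minus down votes.

```python
def score_up_down_diff(up, down):
    """Difference score: number of up votes minus number of down votes."""
    return up - down


first = score_up_down_diff(600, 400)     # 60% positive
second = score_up_down_diff(5500, 4500)  # 55% positive
print(first, second)
```

The comment with the lower share of positive votes gets the higher score, simply because it has more votes overall.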

The formula is:

ratio = up votes / (up votes + down votes)

In the previous scenario, the first comment has a ratio of 0.6, while the second comment has a ratio of 0.55. However, this method also has a problem. For instance, when comparing a comment with 2 up and 0 down votes to a comment with 100 up and 1 down vote, the first comment is assigned a ratio of 1 since it has no down votes, while the second is assigned a ratio of 0.99. In this case, the first comment would be recommended, even though it is backed by only two votes.
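A minimal sketch of the ratio method follows. The function name is an assumption, and returning 0 for a comment with no votes is one possible convention rather than a rule from the text.

```python
def score_average_rating(up, down):
    """Share of up votes among all votes; 0.0 when there are no votes (a convention)."""
    total = up + down
    if total == 0:
        return 0.0
    return up / total


print(score_average_rating(600, 400))    # first comment from the earlier example
print(score_average_rating(5500, 4500))  # second comment from the earlier example
print(score_average_rating(2, 0))        # very few votes, all positive
print(score_average_rating(100, 1))      # many votes, almost all positive
```

The ratio fixes the earlier ordering, but it ranks the comment with 2 votes above the one with 101 votes because it ignores how much evidence stands behind each ratio.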

To solve this problem, the Wilson Lower Bound Score method should be used.

It treats each vote as a binary outcome (up or down) and estimates the parameter p of the Bernoulli distribution, that is, the true proportion of up votes, as a confidence interval rather than as a single number. The lower bound of that interval is used as the score. Because the interval depends on the sample size and the chosen confidence level, a comment with few votes gets a wide interval and therefore a low lower bound, while a comment with many votes gets a lower bound close to its observed ratio. This gives a conservative estimate: at the chosen confidence level, the true population ratio is unlikely to be below the score.
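The sketch below computes the lower bound of the Wilson score interval using only the Python standard library. The function name, the default 95% confidence level, and the choice to return 0 for a comment with no votes are assumptions made for this example.

```python
from math import sqrt
from statistics import NormalDist


def wilson_lower_bound(up, down, confidence=0.95):
    """Lower bound of the Wilson score interval for the proportion of up votes."""
    n = up + down
    if n == 0:
        return 0.0  # no votes: no evidence, so the lowest possible score
    # z is the two-sided normal quantile, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    phat = up / n  # observed proportion of up votes
    numerator = phat + z * z / (2 * n) - z * sqrt((phat * (1 - phat) + z * z / (4 * n)) / n)
    return numerator / (1 + z * z / n)


print(wilson_lower_bound(2, 0))
print(wilson_lower_bound(100, 1))
print(wilson_lower_bound(600, 400))
print(wilson_lower_bound(5500, 4500))
```

With this score, the comment with 100 up and 1 down votes ranks above the comment with 2 up and 0 down votes, and the 600/400 comment still ranks above the 5500/4500 one.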

I explain this in more detail, including the code, in my Kaggle notebook, linked below:

To Be Continued…