An Analysis of 220k+ Reviews from the Wine Enthusiast

Alessandro D'Ippolito
Feb 11, 2021

Wine is a drink that has played an important role throughout human history. It has been produced for thousands of years by countless civilizations where it was used for everything from religious ceremonies to medicine. Today, wine comes in an enormous number of different styles and flavors. Although traditionally wine is made from fermented grapes, today it is also created from additional crops such as fermented rice or cherries.

For this analysis, I am going to examine 220k+ reviews and statistics of all types of wine from Wine Enthusiast (winemag.com). I will investigate some interesting questions and see what trends emerge.

The Data

The original data was downloaded from Kaggle, where it had been scraped and posted by user zackthoutt (https://www.kaggle.com/zynicide/wine-reviews). The latest version of this dataset contains all reviews from Wine Enthusiast up to November 22, 2017. An updated dataset with duplicates removed, containing all reviews up to March 14, 2019, was made available by user tbraam and can be found at https://www.kaggle.com/zynicide/wine-reviews/discussion/83970. Lastly, user Valeriy Mukhtarulin provided a dataset containing all reviews from 2017–2020, located at https://www.kaggle.com/manyregression/updated-wine-enthusiast-review.

I merged the datasets from tbraam and Valeriy and removed all duplicates. This provided a final working dataset of over 220k reviews to begin my analysis.
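The merge can be sketched roughly as follows. This is a minimal illustration using pandas, not the exact code behind the article: the two small DataFrames stand in for the downloaded files, their rows are invented, and the choice of `title` and `description` as the columns that identify a duplicate review is an assumption.

```python
import pandas as pd

# Toy stand-ins for the two source datasets; the rows are invented for illustration.
reviews_to_2019 = pd.DataFrame({
    "title": ["Wine A 2015", "Wine B 2016"],
    "description": ["Ripe and round.", "Crisp and dry."],
    "points": [90, 87],
})
reviews_2017_2020 = pd.DataFrame({
    "title": ["Wine B 2016", "Wine C 2018"],
    "description": ["Crisp and dry.", "Smoky finish."],
    "points": [87, 92],
})

# Stack the two datasets, then drop rows that repeat the same review.
# Using title + description as the duplicate key is an assumption.
merged = pd.concat([reviews_to_2019, reviews_2017_2020], ignore_index=True)
merged = merged.drop_duplicates(subset=["title", "description"]).reset_index(drop=True)

print(len(merged))
```

Because the two sources overlap in the years they cover, dropping duplicates after concatenating matters; without it, the overlapping reviews would be counted twice.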

Top 5 rows of the final merged dataset

One final step of data preparation was to simplify the ‘variety’ column. For example, there are wines listed as Chardonnay-Arinto and Chardonnay-Macabeo. These were reduced to Chardonnay in a new column to simplify the analysis of different types of wine.
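One way to do this kind of simplification is sketched below. The list of base varieties, the helper name `simplify_variety`, and the new column name `variety_simple` are all hypothetical; the idea is only to collapse a label when it starts with a known grape name followed by a hyphen.

```python
import pandas as pd

# Hypothetical list of base grape names to collapse onto.
BASE_VARIETIES = ["Chardonnay", "Riesling", "Pinot Noir"]

def simplify_variety(variety: str) -> str:
    """Return the base grape when the label looks like '<base>-<something>'."""
    for base in BASE_VARIETIES:
        if variety.startswith(base + "-"):
            return base
    return variety  # leave every other label untouched

# Invented example labels.
df = pd.DataFrame({
    "variety": [
        "Chardonnay-Arinto",
        "Chardonnay-Macabeo",
        "Chardonnay",
        "Bordeaux-style Red Blend",
    ]
})
df["variety_simple"] = df["variety"].map(simplify_variety)

print(df)
```

Matching against a known list, rather than splitting every label at the first hyphen, keeps hyphenated names that are not grape pairings (the last row above) from being shortened by accident.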

Each row represents a different review of a bottle of wine. Let’s take a look at the breakdown of review count by country of production.
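A breakdown like this takes only a couple of lines in pandas. The sketch below assumes a `country` column and uses a handful of invented rows in place of the real data.

```python
import pandas as pd

# Invented rows; in the real dataset each row is one review.
df = pd.DataFrame({"country": ["US", "US", "France", "Italy", "US", "France"]})

# Count reviews per country and keep the 12 most-reviewed countries.
top_countries = df["country"].value_counts().head(12)

print(top_countries)
```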

Since Wine Enthusiast is an American magazine, it is not surprising that most of the reviewed bottles of wine come from the United States. The other countries in the top 12 are places well known for their wine.

Next, let’s dig in deeper and ask some interesting questions of this dataset.

Is a More Expensive Bottle of Wine More Likely to have a Higher Review Score?

The cost of a bottle of wine in this dataset ranges from $4 to $5000 (all prices USD). That is a massive price range, and presumably the more expensive the bottle, the better the quality.

Review scores in the dataset range from 80 to 100. The scores break down as follows:

Source: https://www.winemag.com/our-buying-guide-and-blind-tasting-process/

Wine Enthusiast also notes that “products deemed unacceptable (receiving a rating below 80 points) are not reviewed.”

Next let’s look at a distribution of prices for each of the review ranges listed above.

As we can see above, the mean price per bottle increases as the review score increases. However, the price range within each review score category is still quite large. At the extremes, a $39 bottle of wine scored in the highest possible category (98–100 Classic), while the most expensive bottle ($5000) landed two categories down (90–93 Excellent). So an expensive wine is not guaranteed to be of the highest quality.
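A price summary per score category could be computed along these lines. This is a sketch under assumptions: the column names `points` and `price` are assumed, the rows are invented, and only the "98–100 Classic" and "90–93 Excellent" labels appear in the text above; the other four band names are filled in from my recollection of Wine Enthusiast's published scale and should be checked against the source linked earlier.

```python
import pandas as pd

# Invented rows for illustration only.
df = pd.DataFrame({
    "points": [81, 85, 88, 91, 92, 95, 99],
    "price": [10.0, 15.0, 20.0, 5000.0, 30.0, 80.0, 39.0],
})

# Bin edges are right-inclusive: (79, 82], (82, 86], ... (97, 100].
bins = [79, 82, 86, 89, 93, 97, 100]
labels = [
    "80-82 Acceptable",   # assumed label
    "83-86 Good",         # assumed label
    "87-89 Very Good",    # assumed label
    "90-93 Excellent",
    "94-97 Superb",       # assumed label
    "98-100 Classic",
]
df["score_band"] = pd.cut(df["points"], bins=bins, labels=labels)

# Count, mean, and range of prices within each score band.
summary = df.groupby("score_band", observed=True)["price"].agg(
    ["count", "mean", "min", "max"]
)

print(summary)
```

Looking at the minimum and maximum alongside the mean is what exposes the wide spread inside each category, which the mean alone would hide.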

What is the Best Vintage for Popular Types of Wine?

Vintage refers to the year that the grapes (or associated crop) are primarily grown and harvested for a particular batch of wine. Often certain vintages will be considered higher quality for particular types of wine. First let’s look at the top 12 wine types in the dataset.

The 6 wine types with the most reviews in the dataset are Pinot Noir, Chardonnay, Red Blend, Cabernet Sauvignon, Bordeaux Red Blend, and Riesling. Next let’s look at the distribution of review counts by vintage.

It’s easy to see that the bulk of the reviews involve wines with a vintage in the 2000s or later. To ensure a high minimum number of reviews for the analysis, we will focus on vintages from 2005 to 2018.

Based on the previous score table, we will consider any wine that has a score of at least 90 to be of high quality. Therefore, let’s look at the percentage of reviews with a score of 90 or greater by vintage and wine type.
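This percentage can be sketched as a grouped mean of a boolean flag. The example assumes the data already has `vintage`, `variety`, and `points` columns (how the vintage was obtained is not covered here), and the rows are invented.

```python
import pandas as pd

# Invented rows for illustration only.
df = pd.DataFrame({
    "vintage": [2010, 2010, 2010, 2018, 2018, 2018, 2003],
    "variety": ["Riesling", "Riesling", "Chardonnay",
                "Riesling", "Chardonnay", "Chardonnay", "Riesling"],
    "points": [92, 88, 90, 85, 91, 94, 95],
})

# Keep only the vintages under study (2005 to 2018, inclusive).
in_range = df[df["vintage"].between(2005, 2018)].copy()

# Flag reviews that count as high quality (score of at least 90).
in_range["high_quality"] = in_range["points"] >= 90

# The mean of a boolean column is the share of True values; scale to percent
# and pivot so each wine type becomes its own column.
pct_high = (
    in_range.groupby(["vintage", "variety"])["high_quality"]
    .mean()
    .mul(100)
    .unstack("variety")
)

print(pct_high)
```

Each cell of `pct_high` is the percentage of reviews at 90 or above for one vintage and wine type, which is the quantity plotted per line in a chart of this kind.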

From the graph above there are a number of interesting observations. The earlier vintages had a smaller difference in percentage of high quality scores between wine types. The later years see a greater spread in percentage of high quality scores. Cabernet Sauvignon in particular has a low of 16.13% for a 2018 vintage. However, Cabernet Sauvignon is considered one of the main wines that gets better with age. Therefore, this percentage will likely increase over time as reviewers test more 2018 vintage bottles that have had more time to age.

Bordeaux Style Red Blends peak at 70.42% in 2018, yet they are also considered to get better with age. This leaves a large discrepancy between the maximum and minimum values for 2018. Since both wines are expected to improve over time, the difference is more likely caused by a low number of reviews for that vintage than by a real gap in quality.

Additionally, this analysis does not take country of production into consideration, which can also have a significant impact on the quality of a particular vintage. Ultimately, due to factors such as these, I do not believe this chart can be used to tell which vintage is best for individual wine types.

Which Country is the Best at Producing Some of the Most Popular Wine Types?

When going to a wine store you will often find bottles organized by country of production. If you are looking for a nice Chardonnay to pair with your seafood dinner, which aisle should you go to? Let’s see if we can answer this question by looking at the data.

The analysis of Chardonnay scores shows Austria and Canada in first and second but with a relatively small number of reviews compared to France and the US.

Once again, Austria is in first for Pinot Noir, but with a much smaller number of reviews than the US in second place.

For Red Blends, we once again have Austria in first followed by Israel and South Africa. All three of these countries have a relatively similar number of reviews but France in fourth place has a much larger number.

The top three for Cabernet Sauvignon (Israel, Australia, and the US) all have a similar percentage of reviews with high scores (around 50%). However, the US has a much larger review count.

The Riesling analysis shows Austria in first once again, but this time with a much larger number of reviews than in its previous appearances.

Lastly, the Bordeaux Style Red Blends have South Africa in first followed by Israel and Argentina. All of the top 5 have a relatively small number of reviews compared to sixth place US.

Overall, Austria scored the highest in 4 out of the 6 categories. However, the number of reviews for Austria was relatively small in every category except Riesling. Countries such as the US and France are at an inherent disadvantage in this analysis: with so many reviews covering a wide range of bottles, their percentages are harder to push to an extreme than those of a country with only a few reviews.

Therefore, it seems that you can’t simply look at these graphs and be certain that the country with the highest percentage of reviews with high scores is the best.
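One way to keep this caveat visible is to report the review count next to each percentage. The sketch below does that for a single wine type; the column names and all rows are invented for illustration.

```python
import pandas as pd

# Invented rows for illustration only.
df = pd.DataFrame({
    "country": ["Austria", "Austria", "US", "US", "US", "US", "France", "France"],
    "variety": ["Chardonnay"] * 8,
    "points": [92, 91, 93, 88, 86, 90, 89, 94],
})

chardonnay = df[df["variety"] == "Chardonnay"]

# Share of reviews scoring 90+ per country, alongside how many reviews back it up.
by_country = (
    chardonnay.assign(high_quality=chardonnay["points"] >= 90)
    .groupby("country")["high_quality"]
    .agg(["mean", "count"])
    .rename(columns={"mean": "pct_high", "count": "reviews"})
)
by_country["pct_high"] *= 100
by_country = by_country.sort_values("pct_high", ascending=False)

print(by_country)
```

In this toy data the top-ranked country has only two reviews, which mirrors the situation described above: a first-place percentage built on a small sample deserves less trust than a lower one built on many reviews.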

What Are Some of the Characteristics of the Most Popular Types of Wine?

To answer this question, the review column was analyzed to see if it contained particular words used to describe wine. The first comparison is with regard to the fruit level. Most wines can be described as savory or fruity.
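A keyword search of this kind can be sketched with pandas string matching. The column name `review` follows the wording above, the rows are invented, and measuring "how often" as the percentage of a wine type's reviews that mention the term is my assumption; the charts may use a different normalization.

```python
import pandas as pd

# Invented rows for illustration only.
df = pd.DataFrame({
    "variety": ["Chardonnay", "Chardonnay", "Riesling", "Riesling"],
    "review": [
        "Fruity and bright with apple notes.",
        "A savory, mineral style.",
        "Fruity, with a tart finish.",
        "Clean and crisp.",
    ],
})

terms = ["fruity", "savory"]
for term in terms:
    # Whole-word, case-insensitive match of the term in the review text.
    df[term] = df["review"].str.contains(rf"\b{term}\b", case=False, regex=True)

# Percentage of each wine type's reviews that mention each term.
term_share = df.groupby("variety")[terms].mean().mul(100)

print(term_share)
```

The same loop extends to other descriptor pairs (dry and sweet, light and full bodied, and so on) by changing the `terms` list, though multi-word or hyphenated descriptors would need their own patterns.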

It looks like fruity is used much more often to describe wine than savory. Chardonnay and Bordeaux Style Red Blend both have a larger relative magnitude towards fruity. Both of these wine types are often considered to have strong fruit aromas.

Another spectrum that wine is often compared along is dry to sweet. Let’s see what the review analysis shows.

Dry and sweet both show up in Riesling reviews a relatively large amount of the time. Rieslings can come in both dry and sweet styles so this is not surprising. The relative magnitudes of Chardonnay and Red Blend both show more along the sweet side of the spectrum.

Another common term used to describe wine is the body profile. A light bodied wine typically has a lower alcohol content and tastes leaner. A full bodied wine is more intense and typically has a higher alcohol content. Medium bodied wines fall somewhere in between.

The analysis shows Rieslings with a relatively high magnitude in the light bodied category, while Cabernet Sauvignon and Red Blends show a higher magnitude in the full bodied category. Chardonnay is often considered a medium to full bodied white wine, and this is also reflected in the analysis.

Next we’ll look at some of the typical finishes of wine.

The chart shows that Red Blends are often considered spicy and Rieslings are often considered tart relative to the other wine types. Few reviews consider wine to be bitter, but Bordeaux Style Red Blends lead among them. Lastly, Cabernet Sauvignon leads the smoky finish category.

Which Winery Produces Canada’s Highest Rated Bottle?

Since I was born and raised in Canada, I wanted to take a deeper look specifically at Canadian wines in the dataset. Below is a plot of review points scored vs winery for all Canadian wines.

The top Canadian wines are both Icewines from the Niagara region in Ontario. The wineries involved are Inniskillin and Reif Estate, each scoring a 96 on their Icewines.

Conclusion

Overall, this dataset supported a variety of analyses and provided some interesting insights. One thing that stood out for me is the consistently high scores given to Austrian wine. Austria was not a country that I typically associated with wine production, but it looks like I’ll have to give some a try!
