Which Country Produced the Best Male Tennis Players in 2017: A Data Science Project

Are the Swiss or the French producing the best tennis players in the world? 

The Swiss! The French! The Spaniards!

The eternal topic of which country produces the best tennis players needs to be settled. With data, not passion.

Tennis is close to my heart: in high school I played on the varsity tennis team, not to mention my long career as a ball kid at the W&S Cincinnati ATP tournament each August. So it is only natural that my winter data science project is tennis related.

Any data science project has a few steps:

  1. Find the data. Preferably, a lot of data.
  2. Extract the data.
  3. Visualize and draw conclusions.

Step 1: Find the Data

For the first step, I went to the ATP (Association of Tennis Professionals) website and looked up the top male tennis player rankings based on the 2017 data. On the website, the data was separated into player name, country, ranking and total tournament points. Pretty easy to extract and parse, so we were ready for step 2.

Step 2: Extract the data with Python and Beautiful Soup

While looking at the website, we used “inspect element” to see how the data was organized inside the page. After confirming that it followed a repetitive pattern, we decided to use Python to extract it.

Then, using Beautiful Soup, a Python library for parsing HTML, I could pull the data out of the page.

Working with Python to extract and format the data

In case you are not familiar with Python, here is the “cooking recipe” to extract and parse the data:

1) Import the libraries at the top
2) Assign the website link to a variable and use two separate functions to parse through the HTML
3) Use the “find_all” function to find all of the data on the players, points and countries. Following an implementation we found online, pass in the tags identified with “inspect element” and extract the data into a list
4) Create an empty list
5) Implement a for-loop to go through each category list, strip all of the excess, and append what is left to the newly created list
6) Create a CSV file and add the data to it using a for-loop. We ran our script from the terminal and, after checking the file, saved it as an Excel file ready to be uploaded to Tableau
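
The recipe above can be sketched roughly as follows. Treat it as a minimal sketch under stated assumptions rather than the exact script: the URL, the sample HTML, and the class names (player-cell, country-cell, points-cell) are hypothetical stand-ins for whatever “inspect element” shows on the real page, and the player names and points are made up. The fetch_html helper is defined but not called, so the example runs offline against the sample HTML.

```python
import csv
import io
from urllib.request import urlopen

from bs4 import BeautifulSoup

# Hypothetical placeholder; substitute the real rankings page URL.
RANKINGS_URL = "https://example.com/rankings"

# Tiny stand-in for the rankings page so the sketch runs offline.
# Tag and class names are assumptions, not the real site markup.
SAMPLE_HTML = """
<table>
  <tr>
    <td class="player-cell"> Player A </td>
    <td class="country-cell"> AAA </td>
    <td class="points-cell"> 3,000 </td>
  </tr>
  <tr>
    <td class="player-cell"> Player B </td>
    <td class="country-cell"> BBB </td>
    <td class="points-cell"> 1,500 </td>
  </tr>
</table>
"""


def fetch_html(url):
    """Download a page and return its HTML as text (not called below)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


def parse_rankings(html):
    """Return a list of (player, country, points) tuples."""
    soup = BeautifulSoup(html, "html.parser")
    # Step 3: one find_all call per category.
    players = soup.find_all("td", class_="player-cell")
    countries = soup.find_all("td", class_="country-cell")
    points = soup.find_all("td", class_="points-cell")

    rows = []  # Step 4: the empty list.
    # Step 5: strip the excess whitespace and commas, then append.
    for player, country, point in zip(players, countries, points):
        rows.append((
            player.get_text(strip=True),
            country.get_text(strip=True),
            int(point.get_text(strip=True).replace(",", "")),
        ))
    return rows


def write_csv(rows, handle):
    """Step 6: write a header line, then one CSV line per player."""
    writer = csv.writer(handle)
    writer.writerow(["player", "country", "points"])
    for row in rows:
        writer.writerow(row)


rows = parse_rankings(SAMPLE_HTML)
buffer = io.StringIO()  # stands in for a file opened with open(..., "w", newline="")
write_csv(rows, buffer)
print(buffer.getvalue())
```

To run it against a live page, pass fetch_html(RANKINGS_URL) to parse_rankings instead of SAMPLE_HTML, and swap the in-memory buffer for a real file.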

Step 3: Visualize and draw conclusions

Using Tableau, we uploaded our data and built a world map showing the total accumulated points by country, the total number of tennis players per country, and the average points per player by country.
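
Tableau did this aggregation for us, but the same three numbers per country can be computed in a few lines of Python. The sketch below is only an illustration: the rows are made up (they are not the real 2017 rankings), and the function name summarize_by_country is a hypothetical helper, not part of our script.

```python
from collections import defaultdict

# Made-up (player, country, points) rows for illustration only.
rows = [
    ("Player A", "AAA", 3000),
    ("Player B", "AAA", 1000),
    ("Player C", "BBB", 3500),
]


def summarize_by_country(rows):
    """Return total points, player count and average points per country."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for _player, country, points in rows:
        totals[country] += points
        counts[country] += 1
    return {
        country: {
            "total": totals[country],
            "players": counts[country],
            "average": totals[country] / counts[country],
        }
        for country in totals
    }


summary = summarize_by_country(rows)

# The "best" country depends on which measure is used.
best_by_total = max(summary, key=lambda c: summary[c]["total"])
best_by_average = max(summary, key=lambda c: summary[c]["average"])

for country, stats in summary.items():
    print(country, stats)
print("Most total points:", best_by_total)
print("Highest average points:", best_by_average)
```

With these toy rows, the country with two players wins on total points while the country with a single strong player wins on average, which is why the two measures are worth looking at separately.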

This presentation revealed that Spain had the most points, 24,536 in total, among its 26 players but held an average of 1,534 points. Bulgaria, on the other hand, held a total of 5,150 points but had only one player, Grigor Dimitrov.

We created two separate presentations, one for total points and one for average points. Together they show that Spain had the best players based on total cumulative points, while Bulgaria had the highest average points.

Way to go, Grigor Dimitrov! You single-handedly managed to put Bulgaria on the tennis map.

And Rafa is also at the top, representing Spain! Together, they represent the best countries in the world for producing top tennis players.
