Probably one of the most interesting EDAs I did

4 min read · Feb 21, 2024

I last edited the Colab file (where the EDA code is hosted) on Oct 10, 2020, but it is as fresh in my head as it was that day. This was right after I completed my post-graduation in AI/ML. EDA was my bread and butter for that entire year of building different types of models: classification, regression, NLP, you name it.

One fine day, while doing an alumni session for my engineering college, I had to present something on the concepts and practices of EDA. I chose Roger Federer’s career on a whim. Oh boy, I am an ardent fan. I don’t think I have missed any of his important matches since I started watching tennis. And if you have already started wondering which of his matches is your favorite: is it the AO 2017 final against Nadal, by any chance? If yes, we eat the same rice, mate :p
I couldn’t get his match data after 2012, though. So, here’s all the EDA from his 1998–2012 career.

Let me share some pieces of that EDA :-

OBJECTIVE OF THIS EDA

Take a close look at Roger Federer’s extraordinary career statistics from his prime years of tennis, 1998–2012, and perform a full-stack EDA. Later, also perform predictive analytics: predict whether he would win any Grand Slams in the years that followed, and check whether those predictions turned out right.

Why Roger Federer’s Career data?

I wanted to work on data that is consistent and can be explained richly with visualizations. In the Open Era of lawn tennis, we do not have a better player than Roger Federer when it comes to consistently extraordinary career statistics.

Why Federer’s Career data only from 1998 to 2012?

Roger Federer did get back to his prime in later years (2017 / 2018), adding three more Grand Slams to his tally of 17 (then) to reach 20. Still, this project sticks to his career span of 1998 to 2012, because the data available for this period is cleaner and more consistent than the data for his overall career.

Here’s the link to the Colab file :-

Google Colaboratory (colab.research.google.com)

You will need permission to access it, so please drop a comment if you’d like to look at the code.

I am sharing a few snippets below, nevertheless :-

— — — — IMPORTING LIBRARIES — — — —

Why am I using Pandas? I don’t think I should be answering that :p
Let me quickly tell you about pylab. It is a module that provides a MATLAB-like namespace by importing functions from NumPy and Matplotlib.
SimpleImputer is a scikit-learn class used to fill in missing values in a dataset.
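To make the SimpleImputer part concrete, here is a minimal sketch rather than the exact cell from the Colab file. The toy data-frame and its column names (`aces`, `double_faults`) are made-up placeholders, and the mean strategy is an assumption; pylab is left out to keep the sketch small.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy data; column names and values are placeholders,
# not taken from the actual Federer dataset.
toy = pd.DataFrame({
    "aces": [10.0, np.nan, 14.0],
    "double_faults": [2.0, 3.0, np.nan],
})

# Replace each missing value with the mean of its column
# (the strategy is an assumption made for this sketch).
imputer = SimpleImputer(strategy="mean")
filled = pd.DataFrame(imputer.fit_transform(toy), columns=toy.columns)

print(filled)
```

Other strategies such as `"median"` or `"most_frequent"` may suit skewed or categorical columns better.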

— — — — EDA BEGINS — — — —

Looking at the top 2 rows of the data-frame
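A quick sketch of what that step might look like, assuming the match data sits in a Pandas data-frame. The name `matches` and the rows below are hypothetical stand-ins, not the real data.

```python
import pandas as pd

# Hypothetical stand-in for the match data; names and values are placeholders.
matches = pd.DataFrame({
    "year": [1998, 1998, 1999],
    "tournament": ["Tournament A", "Tournament B", "Tournament C"],
    "surface": ["Clay", "Hard", "Hard"],
})

# head(n) returns the first n rows; head(2) gives the top 2.
top_two = matches.head(2)
print(top_two)
```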

Written by Shivam Dutt Sharma
