
Analyze your cycling data with Python

Athlete Cycling on hometrainer

I want to show you how to analyze your cycling workout with Python, dig into the data and calculate scientific metrics based on your performance. The analysis is based on a FIT file containing the speed, cadence, heart rate, power (obtained by a power meter) and distance of the workout, plus a couple more fields. These parameters are required to calculate meaningful results and plots. I used my Wahoo ELEMNT BOLT to generate these FIT files and transferred them via Dropbox. In case you don't have any workout files like that, just download one of my workouts here.


To get started we need to install the pandas, fitparse and matplotlib libraries for Python (I use Python 3.6). You can do it via pip like this:

$ pip install pandas fitparse matplotlib

Data import

Now we are ready to import the workout file and transform the data into a pandas dataframe. Unfortunately we have to use an ugly hack with a "while" loop to avoid timing issues. Then we loop through the file, append each record to a list and convert the list to a pandas dataframe.

from fitparse import FitFile
import pandas as pd
import matplotlib.pyplot as plt

fitfile = FitFile('2019-03-11-185427-ELEMNT BOLT')

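# Ugly hack: retry until the file parses without raising a KeyError
while True:
    try:
        fitfile.messages
        break
    except KeyError:
        continue
# Collect every 'record' message as one dict (field name -> value)
workout = []
for record in fitfile.get_messages('record'):
    r = {}
    for record_data in record:
        r[record_data.name] = record_data.value
    workout.append(r)
df = pd.DataFrame(workout)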
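To see what this conversion does without a FIT file at hand, here is a minimal sketch with hand-made records. The field names and values are made up for illustration; a real file contains whatever your device recorded. Note how a field that is missing in one record ends up as NaN in the dataframe.

```python
import pandas as pd

# Hypothetical records, shaped like the dicts built in the loop above
sample_records = [
    {'timestamp': '2019-03-11 18:54:27', 'power': 180, 'heart_rate': 120, 'cadence': 85},
    {'timestamp': '2019-03-11 18:54:28', 'power': 185, 'heart_rate': 121, 'cadence': 86},
    # the power value is missing in this record
    {'timestamp': '2019-03-11 18:54:29', 'heart_rate': 122, 'cadence': 86},
]

sample_df = pd.DataFrame(sample_records)
sample_df['timestamp'] = pd.to_datetime(sample_df['timestamp'])

print(sample_df)
print(sample_df['power'].isna().sum(), 'record(s) without a power value')
```

Missing values like this are skipped by the pandas statistics used further down, so they do not break the analysis.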

Pandas Dataframe

Afterwards the pandas dataframe looks like this and we can start working with it.

Pandas provides very good documentation. If you want to learn more about dataframes and their conversions or functions, check it out here.

Dataframe Workout small

In the next step we want to check what is in the workout data. Pandas provides a couple of handy functions; two of them are "head" and "describe". Head shows the first rows of the dataframe:

print(df.head())

The describe method is defined as follows in the documentation: "Generate descriptive statistics that summarize the central tendency, dispersion and shape of a dataset’s distribution, excluding NaN values."

This gives us a nice overview of our key measurements. I have filtered the dataframe so that only power, heart rate, speed and cadence are displayed. The call looks like this:

print(df[['power', 'heart_rate', 'speed', 'cadence']].describe())

The result contains the count, the min, max and mean values, the standard deviation, and the 25th, 50th and 75th percentiles. It already gives you a little summary of your workout and how much effort you have put into it.
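If you want to double-check what the rows of describe mean, you can reproduce them with the individual pandas methods. A small sketch with made-up power values (not taken from my workout):

```python
import pandas as pd

# Made-up power values in watts, only for illustration
power = pd.Series([100, 150, 200, 250, 300], name='power')

summary = power.describe()
print(summary)

# The same numbers, computed one by one
print('mean:', power.mean())            # arithmetic mean
print('std:', power.std())              # sample standard deviation (ddof=1)
print('median:', power.quantile(0.5))   # the 50% row of describe
```

The 50% row is the median, so comparing it with the mean gives a quick hint of how skewed a measurement is.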

Describe function workout

Plots 1/3 - The Means and SD

Now we start plotting the data. The first plot visualizes the mean of every measurement in the workout file (except position, distance and timestamp, because a mean does not make sense for them). The standard deviation of each measurement is displayed as well. Some adjustments in matplotlib are made to get a nicer look and a more useful grid (I don't want to go into much detail, but check out the docs here).

df_dropped = df.drop(['position_lat', 'position_long', 'distance', 'timestamp'], axis=1)
means = df_dropped.mean()
errors = df_dropped.std()
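fig, ax = plt.subplots()
# Bar chart of the means with the standard deviations as error bars
means.plot.bar(yerr=errors, ax=ax)
ax.minorticks_on()  # the minor grid is only drawn when minor ticks are on
ax.grid(which='major', linestyle='-', linewidth=0.5, color='red')
ax.grid(which='minor', linestyle=':', linewidth=0.5, color='black')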

Mean Standard Deviation

Plots 2/3 - Histogram

Another interesting analysis is the frequency of each value. For simplicity I have done this only for the power, heart rate and cadence measurements. To show more detail in the graph I have limited the value range to 0 to 400 (you can adjust this if needed).

The frequency is useful to check how targeted your workout was. Is your power scattered all over the chart? Or were you training on specific power/heart rate levels? How was your cadence during the workout?

fig, ax = plt.subplots()
df[['power', 'heart_rate', 'cadence']].plot.hist(bins=100, alpha=0.5, range=(0, 400), ax=ax)
ax.minorticks_on()  # the minor grid is only drawn when minor ticks are on
ax.grid(which='major', linestyle='-', linewidth=0.5, color='red')
ax.grid(which='minor', linestyle=':', linewidth=0.5, color='black')

One possible improvement is to show your heart rate and power zones in the background. These zones are based on your maximum heart rate and your functional threshold power (FTP). That way you can compare your workout with your maximum abilities. You can find multiple guides on the internet on how to set your zones and when to use them, for example this one.
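As a starting point for such zones, the sketch below counts how many seconds fall into each power zone. The zone edges are an assumption: they follow the commonly used seven Coggan-style power zones as fractions of FTP, so adjust them to whatever model your guide recommends. The power values are made up, and one record per second is assumed.

```python
import pandas as pd

ftp = 275  # functional threshold power in watts

# Zone edges as a fraction of FTP (assumed: Coggan-style power zones)
zone_edges = [0.0, 0.55, 0.75, 0.90, 1.05, 1.20, 1.50, float('inf')]
zone_names = ['Z1 recovery', 'Z2 endurance', 'Z3 tempo', 'Z4 threshold',
              'Z5 VO2max', 'Z6 anaerobic', 'Z7 neuromuscular']

# Made-up power samples, one per second
power = pd.Series([120, 160, 200, 210, 270, 280, 300, 340, 450, 150])

# Convert the fractions to watts and assign every sample to a zone
bins = [edge * ftp for edge in zone_edges]
zones = pd.cut(power, bins=bins, labels=zone_names, include_lowest=True)

# Seconds spent in each zone, listed in zone order
time_in_zone = zones.value_counts().sort_index()
print(time_in_zone)
```

With your own data you would pass df['power'] instead of the made-up series. The same idea works for heart rate zones with limits based on your maximum heart rate.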

Histogram Cycling

Plots 3/3 - Time series

Last but not least I want to plot the progression of heart rate, power and cadence over time. In my case you can nicely see the different intervals I did during my workout.

Since the number of data points corresponds to the elapsed time in seconds (one record per second), I don't use the timestamp values explicitly.
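This shortcut only holds if the device really wrote one record per second without gaps. A quick way to check that assumption is to look at the differences between consecutive timestamps. The timestamps below are made up, with one deliberate gap:

```python
import pandas as pd

# Made-up timestamps; the jump from :29 to :32 simulates a paused recording
timestamps = pd.Series(pd.to_datetime([
    '2019-03-11 18:54:27',
    '2019-03-11 18:54:28',
    '2019-03-11 18:54:29',
    '2019-03-11 18:54:32',
    '2019-03-11 18:54:33',
]))

# Seconds between consecutive records (the first entry has no predecessor)
step = timestamps.diff().dt.total_seconds()
gaps = step[step > 1]
elapsed = (timestamps.iloc[-1] - timestamps.iloc[0]).total_seconds()

print('records:', len(timestamps))
print('elapsed seconds:', elapsed)
print('gaps longer than 1 s:', len(gaps))
```

With your own data you would use df['timestamp'] instead. If gaps show up, plotting against the timestamp column is the safer choice.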

fig, ax = plt.subplots()
df[['power', 'heart_rate', 'cadence']].plot(ax=ax)
ax.minorticks_on()  # the minor grid is only drawn when minor ticks are on
ax.grid(which='major', linestyle='-', linewidth=0.5, color='red')
ax.grid(which='minor', linestyle=':', linewidth=0.5, color='black')

A nice improvement of this graph would be to identify and mark the specific intervals and show the mean values during each interval. I could also add my expected power output based on my training plan and cross-check it against the actual values.
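A first rough version of such an interval detection could look like the sketch below. It is only one possible approach and not part of my script: a second counts as "work" when the centered 5-second rolling mean of the power is above a threshold, and consecutive work seconds are grouped into one interval. The threshold of 90 % of FTP, the window length and the power values are assumptions chosen for illustration.

```python
import pandas as pd

ftp = 275
threshold = 0.9 * ftp  # assumed limit between recovery and work

# Made-up workout: two hard blocks of 20 s with easy riding in between
power = pd.Series([150] * 20 + [300] * 20 + [140] * 20 + [310] * 20 + [150] * 20)

# Smooth the signal a little so single spikes do not start an interval
smooth = power.rolling(5, center=True, min_periods=1).mean()
is_work = smooth > threshold

# Every switch between work and recovery starts a new block
block_id = (is_work != is_work.shift()).cumsum()

# Keep only the work seconds and summarize each block
work = pd.DataFrame({'second': power.index, 'power': power, 'block': block_id})[is_work]
intervals = work.groupby('block').agg(
    start_s=('second', 'min'),
    duration_s=('power', 'count'),
    mean_power=('power', 'mean'),
)
print(intervals)
```

Because of the smoothing, the detected intervals come out slightly shorter than the real ones. For real data you will probably need a longer window and a minimum interval duration.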

Cycling Training

TSS, Normalized Power & Intensity Factor

Now I want to show you how to calculate some important key figures of your workout. In particular I want to highlight the Training Stress Score (TSS). It shows the amount of stress your body was exposed to during the workout. A TSS of 100 means you have been riding at your FTP level for one hour. To calculate the TSS we need the normalized power (NP) and the intensity factor (IF). To get the normalized power, take the 30-second rolling mean of the power, raise it to the fourth power, average it over the workout and take the fourth root of the result. The intensity factor is the normalized power divided by the FTP (functional threshold power). Update the FTP value in the first line of this snippet (if you have already done an FTP test).

ftp = 275  # your functional threshold power in watts
moving_time = len(df)  # seconds, assuming one record per second
# NP: fourth root of the mean of the 30 s rolling average raised to the 4th power
norm_power = (df['power'].rolling(30).mean() ** 4).mean() ** 0.25
intensity = norm_power / ftp
tss = (moving_time * norm_power * intensity) / (ftp * 3600.0) * 100.0
print('NP: {} W \n'
      'TSS: {} \n'
      'IF: {}'.format(str(round(norm_power, 1)), str(round(tss, 1)), str(round(intensity, 2))))

This script will give you the following output:

NP: 200.4 W
TSS: 56.8
IF: 0.73
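A quick sanity check of the formula: one hour at exactly FTP should give a TSS of 100 and an IF of 1.0. The sketch below wraps the calculation into a function and runs it on two synthetic rides. The function name training_stress and both rides are hypothetical and only meant for illustration, and one power value per second is assumed again.

```python
import pandas as pd

def training_stress(power, ftp):
    """Return (NP, IF, TSS) for a power series sampled once per second."""
    rolling_mean = power.rolling(30).mean()           # 30 s rolling average
    # Fourth root of the mean fourth power; the NaN warm-up rows are skipped
    norm_power = (rolling_mean ** 4).mean() ** 0.25
    intensity = norm_power / ftp
    seconds = len(power)
    tss = (seconds * norm_power * intensity) / (ftp * 3600.0) * 100.0
    return norm_power, intensity, tss

ftp = 275

# Synthetic ride 1: one hour at exactly FTP
steady = pd.Series([ftp] * 3600, dtype=float)
np_steady, if_steady, tss_steady = training_stress(steady, ftp)
print('steady:   NP {:.1f} W, IF {:.2f}, TSS {:.1f}'.format(np_steady, if_steady, tss_steady))

# Synthetic ride 2: same average power, alternating 5 min hard and 5 min easy
varied = pd.Series(([ftp + 75] * 300 + [ftp - 75] * 300) * 6, dtype=float)
np_varied, if_varied, tss_varied = training_stress(varied, ftp)
print('variable: NP {:.1f} W, IF {:.2f}, TSS {:.1f}'.format(np_varied, if_varied, tss_varied))
```

The second ride has the same average power as the first one, but its normalized power and TSS come out higher. That is the point of the fourth power in the formula: hard efforts count more than easy riding.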

Check out TrainingPeaks for more information about TSS and how it's calculated:

Friel, Joe. (2009, Sept 21). Estimating Training Stress Score (TSS).

Download the code

Here you can download the full analysis script written in Python and with a couple of comments.

Please let me know if you have any questions or if you have made any modifications or improvements.

The next step on my agenda is to add bulk imports and analyze a series of workouts. I also want to create an AI model which estimates my current FTP based on my current performance (I don't want to do a stressful FTP test on a frequent basis).
