How to build and visualise a Monte Carlo simulation with Python and Plotly

January 6, 2023 12 minute read

Contents

Some of the links in this article are affiliate links. This means if you click on the link and purchase or subscribe, I will receive a commission. All opinions and recommendations remain objective. You can also read the affiliates disclosure for more information.

This article does not constitute financial advice and is for educational purposes only.

What are Monte Carlo simulations?

Monte Carlo simulations are used to model the probabilities of different outcomes where those outcomes are hard to predict due to random variables. The Law of large numbers states that as a sample size grows, its mean gets closer to the average of the whole population. This is due to the sample being more representative of the population as the sample become larger. In other words, with a Monte Carlo simulation the goal is to simulate the collection of all or many possible paths (using random sampling) in order to find the possibilities and the most likely or theoretical solution.

In summary:

A Monte Carlo simulation is a model used to predict the probability of a variety of outcomes when the potential for random variables is present.
Monte Carlo simulations help to explain the impact of risk and uncertainty in prediction and forecasting models.
A Monte Carlo simulation requires assigning multiple values to an uncertain variable to achieve multiple results and then averaging the results to obtain an estimate.
A Monte Carlo model is a stochastic model, meaning that due to randomness the results may differ each time, as opposed to a deterministic model where given the same inputs you'll get the same result every time.

A quick example to illustrate

Monte Carlo simulations are named after the Monte Carlo casino in Monaco, so let's ask a casino based question.

"If we always pick red at roulette, how often would we win?"

The roulette wheel has 18 red slots, 18 black slots, and 1 green slot for a total of 37 slots.

roulette.py

import random

def play_roulette():
    total_slots = 37
    red_probability   = (18 / total_slots) * 100
    black_probability = (18 / total_slots) * 100
    green_probability = (1 / total_slots) * 100
    
    possible_outcomes = ["red", "black", "green"]
    probabilities = [red_probability, black_probability, green_probability]
    
    outcome = random.choices(
        possible_outcomes,
        weights=probabilities,
        k=1
    )[0]
    
    return outcome
    
    
def perform_simulation(n_times=1000, choice="red"):   
    results = { "red": 0, "black": 0, "green": 0 }
    
    for i in range(n_times):
        outcome = play_roulette()
        results[outcome] += 1
           
    win_percentage = results[choice] / n_times
        
    return results, win_percentage

        
if __name__ == "__main__":
    results, win_percentage = perform_simulation(n_times=1000000, 
                                                 choice="red")
    
    print(results)
    print(win_percentage)

TERMINAL

Python

user@ShedloadOfCode:~$ roulette.py

So after 1 million simulations, we can say we win just less than half of the time with a 48.71% probability. We've proven that the extra green pocket gives the house an edge over the long run.

Building the Monte Carlo model with Python

Now we have an idea of what a Monte Carlo simulation is and have seen a short example, we can build a more complex model. The challenge I have set here is to recreate an awesome Monte Carlo retirement simulation from engaging-data.com using Python and Plotly. After playing around with this calculator I wondered how this could be re-created in Python with a few individual touches. I got quite close and there's lots to learn from the code.

The question this time is "If I invest a set amount for a number of years, how much might I have?"

All of the code for this model can be found on GitHub.

model.py

"""
Monte Carlo model to simulate the growth 
of an investment portfolio over time.
"""
import numpy as np
from helpers import (
    get_random_returns, 
    get_confidence_levels,
    get_yearly_percentiles)

from plots import (
    plot_histogram,
    plot_yearly_percentiles)


def perform_simulation(inputs: dict):
    """
    Performs a simulation to find out how much
    the pot is worth in £ after years of growth.
    
    Returns:
        pot (float)     - the final amount at the end 
        history (list)  - the yearly history of results 
                          [10000, 11000, 12000, ...]
        
    """
    years = inputs['end_age'] - inputs['start_age']
    pot = inputs['starting_pot']
    returns = get_random_returns(years=years) 
    mean_return = (np.mean(returns) - 1) * 100
    
    history = []
    
    for i in range(years):
        annual_return = returns[i]
        pot *= annual_return
        pot += inputs['annual_contributions']
        history.append(int(pot))
        
    return pot, history, mean_return
    
    
def perform_monte_carlo(inputs: dict, n: int = 1000):
    pot_sizes = []
    results = []
    mean_returns = []
    
    for i in range(n):
        final_amount, history, mean_return = perform_simulation(inputs)
        pot_sizes.append(final_amount)
        results.append(history)
        mean_returns.append(mean_return)
        
    lower_confidence, upper_confidence = get_confidence_levels(pot_sizes)
    
    print('Monte carlo model done :)', end='\n')
    print('Plots saved to /outputs folder')
    print('Mean return across all simulations: ', end='')
    print(f'{round(np.mean(mean_returns), 1)}%')
    
    return {
        'pot_sizes': pot_sizes,
        'results': results, 
        'yearly_percentiles': get_yearly_percentiles(results, inputs),
        'lower_confidence': lower_confidence,
        'upper_confidence': upper_confidence,
        'mean_returns': mean_returns
    }
    
    
if __name__ == "__main__":
    inputs = {
        'start_age': 20,
        'end_age': 65,
        'starting_pot': 5000,
        'annual_contributions': 500 * 12, 
        'target_amount': 300000,
        'n_simulations': 10000
    }
    
    mc = perform_monte_carlo(inputs, 
                             n=inputs['n_simulations'])
    
    plot_histogram(mc['pot_sizes'], 
                   mc['upper_confidence'], 
                   mc['lower_confidence'])
    
    plot_yearly_percentiles(inputs=inputs,
                            df=mc['yearly_percentiles'])

This model takes a dictionary 'inputs' which you can change to adapt the simulation. The 'perform_monte_carlo' function carries out a given number of simulations and returns the final 'pot_sizes' with other useful information like the history and mean returns of each simulation, the yearly percentiles, alongside upper and lower confidence intervals.

For this example our starting age is 20 and end age is 65. We start with £5,000 and our annual contributions are £6,000 (or £500 per month) and we're aiming for a £300,000 pot! We will run this simulation 10,000 times. You might be thinking, how do we simulate the randomness of what our returns might be each year?

A quick Google search tells us that the historic annual average return of the S&P500 is 10% per year. Sorry but I'm much more pessimistic and expect lower. I have modelled a range of returns and assigned them probability weights in the file helpers.py below. This means I've assumed low returns are more likely, but there is also a chance of higher returns, or negative returns. Nobody knows what the markets will do, and that's why randomness will help us with this uncertainty and view the outcomes of many simulations.

helpers.py

import random
import numpy as np
import pandas as pd


def get_random_returns(years: int):
    """
    Generates a list of random return percentages
    for the length of years required.
    """
    random_returns = []
    
    for i in range(years):
        high_negative_returns = (random.randint(-20, -8) / 1000) + 1
        low_negative_returns = (random.randint(-7, -1) / 1000) + 1
        low_returns = (random.randint(0, 4) / 100) + 1
        medium_returns = (random.randint(5, 9) / 100) + 1
        high_returns = (random.randint(10, 20) / 100) + 1
        
        possible_returns = [        # Weights
            high_negative_returns,  # 5  % chance
            low_negative_returns,   # 25 % chance
            low_returns,            # 40 % chance
            medium_returns,         # 25 % chance
            high_returns            # 5  % chance
        ]
        
        random_return = random.choices(
            possible_returns,
            weights=(5, 25, 40, 25, 5),
            k=1
        )[0]
        
        random_returns.append(
            random_return
        )
    
    return random_returns


def get_confidence_levels(pot_sizes):    
    upper_confidence = round(np.quantile(pot_sizes, 0.975), 2)
    lower_confidence = round(np.quantile(pot_sizes, 0.025), 2)
    
    return lower_confidence, upper_confidence


def get_yearly_percentiles(results, inputs) -> pd.DataFrame:
    """
    Finds the percentiles for each year.
    """
    results_rotated = list(zip(*results[::-1]))

    year = []
    age = []
    ninetieth_percentile = []
    seventy_fifth_percentile = []
    median = []
    twenty_fifth_percentile = []
    tenth_percentile = []
    
    for i, year_results in enumerate(results_rotated):
        new_age = (inputs['start_age'] + 1) + i
        ninetieth_percentile_value = np.percentile(year_results, 90)
        seventy_fifth_percentile_value = np.percentile(year_results, 75)
        median_value = np.median(year_results)
        twenty_fifth_percentile_value = np.percentile(year_results, 25)
        tenth_percentile_value = np.percentile(year_results, 10)
        
        year.append(i + 1)
        age.append(new_age)
        ninetieth_percentile.append(ninetieth_percentile_value)
        seventy_fifth_percentile.append(seventy_fifth_percentile_value)
        median.append(median_value)
        twenty_fifth_percentile.append(twenty_fifth_percentile_value)
        tenth_percentile.append(tenth_percentile_value)
        

    return pd.DataFrame(
        list(
            zip(year,
                age,
                ninetieth_percentile, 
                seventy_fifth_percentile,
                median, 
                twenty_fifth_percentile,
                tenth_percentile)
        ),
        columns=[
            'year',
            'age',
            '90th_percentile',
            '75th_percentile',
            'median', 
            '25th_percentile',
            '10th_percentile']
    )

The randomness we've introduced here is for every year in each of the 10,000 or more simulations a:

5% chance of negative returns between -20% and -8%
25% chance of negative returns between -7% and -1%
40% chance of low returns between 0% and 4%
25% chance of medium returns between 5% and 9%
5% chance of high returns between 10% and 20%

If you think these are too pessimistic or optimistic please go ahead change the values or weights 👍

The 'get_yearly_percentiles' function takes the 2D list 'results' (all of the histories for all simulations year by year), rotates it to line up year 1, year 2, year 3 and so on, and then finds the percentiles (10th, 25th, median, 75th, 90th) for each year. This effectively shows the range of results from all simulations for each year in a DataFrame:

year	age	90th_percentile	75th_percentile	median	25th_percentile	10th_percentile
1	21	11450	11300	11100	10990	10970
2	22	18153	17804	17399	17110	16957
3	23	25296.1	24570.25	23919	23395.75	23051
4	24	32631	31605	30632	29832	29279.8
5	25	40342.1	38841.25	37513.5	36407.75	35624.9
...	...	...	...	...	...	...
41	61	618714.6	553043.8	492832	442674.5	403963.4
42	62	645462.7	578133	514355	461004.3	420718
43	63	673703	602295	535547	478970	437741.3
44	64	703538.1	629788.8	557324.5	498292.3	453481.4
45	65	739303.7	656680.5	580414.5	517842	470615.8

This can then be plotted using Plotly along with the final pot sizes.

Plotting the Monte Carlo results with Plotly

I was using Spyder to carry out this analysis, and saved the plots as html files in the /output directory. You'll need to install Plotly with python -m pip install plotly

plots.py

import numpy as np
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio

pio.renderers.default='svg'


def plot_histogram(pot_sizes: list, 
                   upper_confidence:float, 
                   lower_confidence: float):
    """
    Plots the frequencies of the final pot sizes.
    """
    fig = px.histogram(pot_sizes, 
                       title=f"The final pot size after {len(pot_sizes)} simulations.")
    
    fig.add_vline(x=lower_confidence, 
                  line_width=3, 
                  line_dash="dash", 
                  line_color="green")
    
    fig.add_vline(x=upper_confidence, 
                  line_width=3, 
                  line_dash="dash", 
                  line_color="green")
    
    fig.add_vline(x=np.median(pot_sizes), 
                  line_width=3, 
                  line_dash="dash", 
                  line_color="black",
                  annotation_text="median",
                  annotation_font_size=15)
    
    fig.add_vrect(x0=lower_confidence, 
                  x1=upper_confidence, 
                  line_width=0, 
                  fillcolor="green",
                  opacity=0.2,
                  annotation_text="95% confidence interval",
                  annotation_font_size=15)
    
    fig.update_layout(
        xaxis_title="Amount (£)",
        yaxis_title="Count",
        showlegend=False,
        font=dict(
            family="Arial",
            size=14
        ),
        paper_bgcolor='rgba(0,0,0,0)',
        plot_bgcolor='rgba(0,0,0,0)',
    )
    
    fig.write_html('outputs/mc-histogram.html', auto_open=False)


def plot_yearly_percentiles(inputs, df):
    """
    Plots the year by year percentile graph.
    """
    exact_np = df[df['90th_percentile'] > inputs['target_amount']].iloc[0]
    exact_sfp = df[df['75th_percentile'] > inputs['target_amount']].iloc[0]
    exact_median = df[df['median'] > inputs['target_amount']].iloc[0]
    exact_tfp = df[df['25th_percentile'] > inputs['target_amount']].iloc[0]
    exact_tp = df[df['10th_percentile'] > inputs['target_amount']].iloc[0]
    
    fig = go.Figure()
    
    fig.add_traces(go.Scatter(x=df['age'], 
                              y=df['10th_percentile'],
                              line = dict(color='#FFA502'),
                              mode='lines',
                              name='10th %tile',
                              fill='none', 
                              fillcolor = '#F7CA77'))
    
    fig.add_traces(go.Scatter(x=df['age'], 
                              y=df['25th_percentile'],
                              line = dict(color='#7BE56E'),
                              mode='lines',
                              name='25th %tile',
                              fill='tonexty', 
                              fillcolor = '#F7CA77'))

    
    fig.add_traces(go.Scatter(x=df['age'], 
                              y=df['median'],
                              line=dict(color='black'),
                              line_width=3,
                              mode='lines',
                              name="median", 
                              fill='tonexty',
                              fillcolor='#00FF66'))
    
    fig.add_traces(go.Scatter(x=df['age'], 
                              y=df['75th_percentile'],
                              line = dict(color='#7BE56E'),
                              mode='lines',
                              name="75th %tile",
                              fill='tonexty', 
                              fillcolor = '#00FF66'))
    
    fig.add_traces(go.Scatter(x=df['age'], 
                              y=df['90th_percentile'],
                              line = dict(color='#FFA502'),
                              mode='lines',
                              name="90th %tile",
                              fill='tonexty', 
                              fillcolor = '#F7CA77'))
    
    fig.update_layout(hovermode="x")
    
    fig.update_xaxes(tickangle=0, 
                     dtick=1,
                     showticklabels=True, 
                     gridcolor='lightgray',
                     type='category')
    
    fig.update_yaxes(gridcolor='lightgray',
                     rangemode="tozero")
    
    fig.add_hline(y=inputs['target_amount'], 
                  line_width=2, 
                  line_dash='dash', 
                  line_color='red',
                  annotation_text='Target amount',
                  annotation_font=dict(
                    family="Arial",
                    size=15,
                    color="red"
                  ),
                  annotation_font_size=15,
                  annotation_position='bottom left',
                  fillcolor='red')
    
    fig.add_shape(type="line",
                  x0=int(exact_median['year'] - 1), 
                  y0=0, 
                  x1=int(exact_median['year'] - 1), 
                  y1=inputs['target_amount'],
                  line_width=2,
                  line_color='gray',
                  line_dash='dash')
    
    fig.add_shape(type="line",
                  x0=int(exact_tp['year'] - 1), 
                  y0=0, 
                  x1=int(exact_tp['year'] - 1), 
                  y1=inputs['target_amount'],
                  line_width=2,
                  line_color='orange',
                  line_dash='dash')
    
    fig.add_shape(type="line",
                  x0=int(exact_np['year'] - 1), 
                  y0=0, 
                  x1=int(exact_np['year'] - 1), 
                  y1=inputs['target_amount'],
                  line_width=2,
                  line_color='orange',
                  line_dash='dash')
    
    fig.add_shape(type="line",
                  x0=int(exact_tfp['year'] - 1), 
                  y0=0, 
                  x1=int(exact_tfp['year'] - 1), 
                  y1=inputs['target_amount'],
                  line_width=2,
                  line_color='green',
                  line_dash='dash')
    
    fig.add_shape(type="line",
                  x0=int(exact_sfp['year'] - 1), 
                  y0=0, 
                  x1=int(exact_sfp['year'] - 1), 
                  y1=inputs['target_amount'],
                  line_width=2,
                  line_color='green',
                  line_dash='dash')
      
    fig.add_annotation(x=int(exact_median['year'] - 1), 
                       y=inputs['target_amount'] * 1.45,
                       text=f"<b>{int(exact_median['year'])} years</b>",
                       font=dict(
                            color="black",
                            size=21
                       ),
                       showarrow=False,
                       yshift=10)
    
    fig.add_annotation(x=int(exact_median['year'] - 1), 
                       y=inputs['target_amount'] * 1.3,
                       text=f"<b>(Age {int(exact_median['age'])})</b>",
                       font=dict(
                            color="black",
                            size=21
                       ),
                       showarrow=False,
                       yshift=10)
    
    fig.add_annotation(x=inputs['end_age'] - inputs['start_age'] - 1.2, 
                       y=df['10th_percentile'].max() - 8000,
                       text="<b>10%</b>",
                       font=dict(
                            color="black",
                            size=12
                       ),
                       showarrow=False,
                       yshift=10)
    
    fig.add_annotation(x=inputs['end_age'] - inputs['start_age'] - 1.2, 
                       y=df['25th_percentile'].max() - 8000,
                       text="<b>25%</b>",
                       font=dict(
                            color="black",
                            size=12
                       ),
                       showarrow=False,
                       yshift=10)
        
    fig.add_annotation(x=inputs['end_age'] - inputs['start_age'] - 1.25, 
                       y=df['median'].max() - 5000,
                       text="<b>median</b>",
                       font=dict(
                            color="black",
                            size=12
                       ),
                       showarrow=False,
                       yshift=10)
    
    fig.add_annotation(x=inputs['end_age'] - inputs['start_age'] - 1.2, 
                       y=df['75th_percentile'].max() - 4000,
                       text="<b>75%</b>",
                       font=dict(
                            color="black",
                            size=12
                       ),
                       showarrow=False,
                       yshift=10)
    
    fig.add_annotation(x=inputs['end_age'] - inputs['start_age'] - 1.2, 
                       y=df['90th_percentile'].max() - 5000,
                       text="<b>90%</b>",
                       font=dict(
                            color="black",
                            size=12
                       ),
                       showarrow=False,
                       yshift=10)
    
    fig.add_annotation(x=.99,
                       xref='paper',
                       xanchor='right',
                       y=0,
                       yanchor='bottom',
                       text="<b>shedloadofcode.com</b>",
                       font=dict(
                            color="gray",
                            size=14
                       ),
                       showarrow=False)
    
    fig.add_annotation(x=0.01,
                       xref='paper',
                       yref='paper',
                       xanchor='left',
                       y=0.99,
                       yanchor='top',
                       text=f"In <b>{inputs['n_simulations']}</b> simulations " +
                            f"<b>{int(exact_median['age'])}</b> " +
                            f"is the median age ({int(exact_median['year'])} years)<br>",
                       font=dict(
                            color="black",
                            size=15
                       ),
                       showarrow=False)
    
    fig.add_annotation(x=0.01,
                       xref='paper',
                       yref='paper',
                       xanchor='left',
                       y=0.96,
                       yanchor='top',
                       text="<span style=\"color:orange\">10th to 90th %ile: " +
                            f"<b>{int(exact_np['year'])} to {int(exact_tp['year'])} " + 
                             "years</b> to target</span>",  
                       font=dict(
                            color="black",
                            size=15
                       ),
                       showarrow=False)
    
    fig.add_annotation(x=0.01,
                       xref='paper',
                       yref='paper',
                       xanchor='left',
                       y=0.93,
                       yanchor='top',
                       text="<span style=\"color:green\">25th to 75th %ile: " +
                            f"<b>{int(exact_sfp['year'])} to {int(exact_tfp['year'])} " + 
                             "years</b> to target</span>",
                       font=dict(
                            color="black",
                            size=15
                       ),
                       showarrow=False)
     
    
    fig.update_layout(
        title=f"Percentiles by year after {inputs['n_simulations']} simulations.",
        xaxis_title="Age",
        yaxis_title="Amount (£)",
        legend_title="Legend Title",
        showlegend=False,
        font=dict(
            family="Arial",
            size=14
        ),
        paper_bgcolor='rgba(0,0,0,0)',
        plot_bgcolor='rgba(0,0,0,0)',
    )
    
    fig.write_html('outputs/mc-percentiles.html', auto_open=False)

This outputs the year by year percentiles to 'outputs/mc-percentiles.html'. The good part about the Plotly HTML output is that after uploading to GitHub it can be viewed via htmlpreview.github.io

Go ahead and take a look at the Monte Carlo percentile graph.

You can also use this to embed the interactive plot in a web page using an iframe like the one below!

<iframe height="800" width="100%" loading="lazy" src="https://htmlpreview.github.io/?https://raw.githubusercontent.com/shedloadofcode/monte-carlo-simulation/main/outputs/mc-percentiles.html"></iframe>

We can see that the median amount crosses the target at age 51 given our inputs. However, given a better or worse outcome it could cross the target amount between ages 47 and 54.

Monte Carlo simulations are a great way to deal with uncertainty when we simply don't know what the expected values (in this case investment returns) will be.

We can also take a look at the frequencies of the final pot sizes at the end age 65 in a histogram.

We can see the 95% confidence interval is between £420k - £850k with the median pot size being £581k. This demonstrates the power of compounding and starting investing from an early age.

Comparing the results

I have used numerous scenarios as inputs to test this model against the calculator from engaging-data.com (ED) to see how the results align, which has been pretty fun. I set the average return on the ED calculator to 4% as my model is a bit more pessimistic. As mentioned earlier, you can change the probability weights for a given set of returns in the 'get_random_returns' function if you feel more optimistic.

Here are my findings in three scenarios:

Scenario 1 inputs

Input	Value
Start age	20
End age	65
Starting pot	5,000
Annual contributions	6,000
Target amount	300,000
Simulations	10,000

Scenario 1 results View calculator results

Model	Years	Age
This model	30	50
ED calculator	30.3	50

Scenario 2 inputs

Input	Value
Start age	30
End age	65
Starting pot	10,000
Annual contributions	10,000
Target amount	400,000
Simulations	10,000

Scenario 2 results View calculator results

Model	Years	Age
This model	26	56
ED calculator	25.2	55

Scenario 3 inputs

Input	Value
Start age	25
End age	65
Starting pot	20,000
Annual contributions	20,000
Target amount	1,000,000
Simulations	10,000

Scenario 3 results View calculator results

Model	Years	Age
This model	30	55
ED calculator	30.2	55

As you can see the results are very closely aligned, so I'm very pleased with how well this model is performing. Of course, as George Box said All models are wrong, but some are useful. We should not forget that models and simulations can only give us an indication of possible outcomes, we should never blindly trust them but use them as tools. I think it's also important to keep them realistic and not introduce too much bias or ego into our assumptions. It would be great to get 10% returns every year, but is that realistically going to happen?

Lowering our model's assumptions ensures we are closer to a 'worst case scenario' and any over-performance is a bonus!

Conclusion

We've learned lots on both Monte Carlo methods and creating / embedding Plotly visualisations with Python.

Some of the techniques used in this article with Plotly can also be used for variations of fan charts typically used for forecasting and acknowledging uncertainty.

I didn't quite get around to incrementing the results by 0.1 and plotting the circles which you can achieve with Plotly shapes. Maybe this is something you can try to replicate if you want to. I actually preferred seeing the vertical lines show which age the amount goes above the target rather than the exact intersection - this also is the foundation of statistical process control charts to make variation in the results explicit.

Thanks to engaging-data.com for giving me the inspiration to try and re-create this awesome model and visualisation with Python and Plotly.

Finally, it's worth mentioning that DataCamp has an interactive course Monte Carlo Simulations in Python and many other great courses on data science and machine learning. Read the full review Developing your data science and analytical coding skills - a review of DataCamp.

If you enjoyed this article you may also be interested in: