Appendix C: Statistical Model

1. Introduction

Accurate heatwave prediction requires not only robust data ingestion and feature engineering (covered in Appendices A and B) but also sophisticated statistical and machine learning models. This appendix outlines the key mathematical formulations and statistical model equations integrated into our forecasting framework. It describes:

  • Time-Series Models: For capturing temporal dependencies.

  • Regression Models and Generalized Additive Models (GAMs): For modeling nonlinear relationships.

  • Deep Learning Architectures: Including loss functions and training objectives for neural networks.

  • Ensemble Methods: To quantify uncertainty and improve prediction reliability.

  • Model Evaluation Metrics: For assessing model performance.

The following sections present each of these components in detail, with both mathematical equations and corresponding Python code snippets.


2. Time-Series Modeling

2.1 ARIMA and SARIMA Models

Autoregressive Integrated Moving Average (ARIMA) models are widely used in time-series forecasting.

2.1.1 Python Code Example for ARIMA

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.arima.model import ARIMA

# Sample time-series data: daily maximum temperature (for example purposes)
# In practice, replace with actual historical temperature data for Toronto.
np.random.seed(0)
dates = pd.date_range(start='2017-01-01', periods=1000, freq='D')
temperature = 30 + np.random.normal(0, 5, size=len(dates))  # Synthetic temperature data

data = pd.Series(temperature, index=dates)

# Fit an ARIMA model (example: ARIMA(2,1,2))
model = ARIMA(data, order=(2, 1, 2))
model_fit = model.fit()
print(model_fit.summary())

# Forecast the next 10 days
forecast = model_fit.forecast(steps=10)
plt.plot(data.index, data, label='Historical Data')
plt.plot(forecast.index, forecast, label='Forecast')
plt.legend()
plt.title("ARIMA Forecast for Temperature")
plt.show()

3. Regression Models and Generalized Additive Models (GAMs)

3.1 Linear Regression Model

3.2 Generalized Additive Models (GAMs)

3.2.1 Python Code Example for a GAM

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pygam import LinearGAM, s

# Synthetic dataset: heatwave intensity as a function of temperature and humidity
np.random.seed(0)
n = 200
temp = np.random.uniform(30, 45, n)  # temperature in Celsius
rh = np.random.uniform(40, 80, n)     # relative humidity in %
# Simulated heatwave intensity with non-linear relationships
intensity = 0.5 * temp**2 - 3 * temp + 0.1 * rh**2 - 0.5 * rh + np.random.normal(0, 5, n)

X = np.column_stack((temp, rh))
y = intensity

# Fit a GAM model with smooth functions for both predictors
gam = LinearGAM(s(0) + s(1)).fit(X, y)
XX = gam.generate_X_grid(term=0)
plt.figure()
for i, term in enumerate(gam.terms):
    if term.isintercept:
        continue
    plt.plot(XX[:, i], gam.partial_dependence(term=i, X=XX))
    plt.xlabel("Feature " + str(i))
    plt.ylabel("Partial Dependence")
plt.title("GAM Partial Dependence Plots")
plt.show()

4. Neural Network Models and Loss Functions

4.1 Mean Squared Error (MSE)

4.2 Cross-Entropy Loss (for Classification)

4.3 Neural Network Cost Function Example in Code

import torch
import torch.nn as nn

# Example: A simple feed-forward neural network for regression
class HeatwavePredictor(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super(HeatwavePredictor, self).__init__()
        self.fc1 = nn.Linear(input_dim, hidden_dim)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_dim, output_dim)
        
    def forward(self, x):
        out = self.fc1(x)
        out = self.relu(out)
        out = self.fc2(out)
        return out

# Instantiate the model, define MSE loss and an optimizer
model = HeatwavePredictor(input_dim=10, hidden_dim=64, output_dim=1)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# Example forward pass
sample_input = torch.randn(32, 10)  # Batch size 32, 10 features
sample_output = model(sample_input)
sample_target = torch.randn(32, 1)  # Target values
loss = criterion(sample_output, sample_target)
print(f"Sample Loss: {loss.item():.4f}")

5. Ensemble Methods and Uncertainty Quantification

5.1 Ensemble Prediction Aggregation

5.2 Uncertainty Metrics

5.2.1 Brier Score

5.2.2 Continuous Ranked Probability Score (CRPS)

A metric for probabilistic forecasts that measures the difference between the forecast cumulative distribution function (CDF) and the empirical CDF of the observed value.

5.3 Python Code Example for Ensemble Aggregation

def ensemble_prediction(predictions, weights):
    """
    Compute the weighted ensemble prediction.
    
    Parameters:
        predictions (list or np.ndarray): List/array of predictions from different models.
        weights (list or np.ndarray): Corresponding weights for each prediction. 
                                     Sum of weights must be 1.
    
    Returns:
        float: Ensemble prediction.
    """
    predictions = np.array(predictions)
    weights = np.array(weights)
    return np.sum(predictions * weights)

# Example usage:
model_preds = [34.5, 35.0, 33.8, 34.9]  # Hypothetical temperature forecasts (Celsius)
model_weights = [0.25, 0.25, 0.25, 0.25]
ensemble_pred = ensemble_prediction(model_preds, model_weights)
print(f"Ensemble Prediction: {ensemble_pred:.2f} °C")

6. Model Evaluation Metrics

6.1 Regression Metrics

  • Mean Absolute Error (MAE)

  • Root Mean Squared Error (RMSE)

6.2 Probabilistic Metrics

  • Brier Score

  • CRPS

6.3 Code Example for RMSE Calculation

import numpy as np

def rmse(y_true, y_pred):
    """
    Calculate the Root Mean Squared Error (RMSE).
    
    Parameters:
        y_true (array-like): True values.
        y_pred (array-like): Predicted values.
        
    Returns:
        float: RMSE value.
    """
    y_true = np.array(y_true)
    y_pred = np.array(y_pred)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

# Example usage:
true_vals = [35.0, 36.0, 34.5, 35.5]
predicted_vals = [34.8, 36.2, 34.3, 35.7]
print(f"RMSE: {rmse(true_vals, predicted_vals):.2f}")

7. Summary

This appendix has provided a comprehensive overview of the key statistical and machine learning equations, ensemble methods, and evaluation metrics integral to the heatwave prediction systems. The formulas and code snippets included serve as practical implementations for:

  • Time-Series Modeling (ARIMA/SARIMA)

  • Regression and Generalized Additive Models (GAMs)

  • Deep Learning Loss Functions and Neural Network Cost Functions

  • Ensemble Methods and Uncertainty Quantification

  • Model Evaluation Metrics (MAE, RMSE, Brier Score, CRPS)

Last updated

Was this helpful?