CAPM Analysis of Size, Value, and Investment Portfolios

Introduction

The Capital Asset Pricing Model (CAPM) predicts that expected excess returns are proportional to market beta, so the regression intercept (Jensen’s alpha) should be zero in equilibrium. This notebook tests that prediction on portfolios sorted by:

  1. Size and Book-to-Market: 6 portfolios formed on market capitalization and book-to-market ratio (from Fama-French 1993)

  2. Investment: 3 portfolios formed on asset growth (conservative, medium, aggressive investment)

Learning Objectives

  1. Load and prepare portfolio return data from parquet files

  2. Run CAPM regressions using the finm package

  3. Interpret alpha, beta, t-statistics, and R-squared

  4. Create publication-quality regression results tables

  5. Understand anomalies that CAPM cannot explain

The CAPM Model

The CAPM regression equation is:

\[R_{p,t} - R_{f,t} = \alpha_p + \beta_p (R_{m,t} - R_{f,t}) + \epsilon_{p,t}\]

Where:

  • \(R_{p,t}\) is portfolio return at time \(t\)

  • \(R_{f,t}\) is the risk-free rate

  • \(R_{m,t}\) is the market return

  • \(\alpha_p\) is Jensen’s alpha (abnormal return not explained by market risk)

  • \(\beta_p\) is market beta (sensitivity to market movements)

  • \(\epsilon_{p,t}\) is the regression residual (return variation not explained by the market)
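For reference, the OLS estimates of the intercept and slope in a single-regressor model like this one have standard closed forms. The shorthand \(y_{p,t} = R_{p,t} - R_{f,t}\) and \(x_t = R_{m,t} - R_{f,t}\), the bars (sample means over the \(T\) months), and the hats (estimates) are notation introduced here for illustration:

\[\hat{\beta}_p = \frac{\sum_{t=1}^{T} (x_t - \bar{x})(y_{p,t} - \bar{y}_p)}{\sum_{t=1}^{T} (x_t - \bar{x})^2}, \qquad \hat{\alpha}_p = \bar{y}_p - \hat{\beta}_p \, \bar{x}\]

So \(\hat{\alpha}_p\) is the average excess return left over after subtracting the part attributed to market exposure.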

import pandas as pd
import numpy as np

import finm
from settings import config

DATA_DIR = config("DATA_DIR")

Step 1: Load Data

We load three data sources:

  • FF_1993_vwret.parquet: 6 size/book-to-market portfolio returns (monthly)

  • univ_3_inv_vwret.parquet: 3 investment portfolio returns (monthly)

  • FF_FACTORS.parquet: Market factor (Mkt-RF) and risk-free rate (RF)

# Load the 6 Fama-French size/BM portfolios
ff_portfolios = pd.read_parquet(f"{DATA_DIR}/FF_1993_vwret.parquet")
print(f"FF Size/BM portfolios shape: {ff_portfolios.shape}")
print(f"Columns: {ff_portfolios.columns.tolist()}")
print(f"Portfolio codes: {ff_portfolios['sbport'].unique()}")
ff_portfolios.head()
FF Size/BM portfolios shape: (4284, 5)
Columns: ['jdate', 'szport', 'bmport', 'vwret', 'sbport']
Portfolio codes: ['BH' 'BL' 'BME' 'SH' 'SL' 'SME']
       jdate szport bmport     vwret sbport
0 1965-07-31      B      H  0.033876     BH
1 1965-07-31      B      L  0.016636     BL
2 1965-07-31      B     ME  0.003068    BME
3 1965-07-31      S      H  0.038735     SH
4 1965-07-31      S      L  0.033604     SL
# Pivot to wide format (portfolios as columns)
ff_wide = ff_portfolios.pivot(index="jdate", columns="sbport", values="vwret")
ff_wide.columns.name = None
print(f"FF portfolios wide format: {ff_wide.shape}")
ff_wide.head()
FF portfolios wide format: (714, 6)
                  BH        BL       BME        SH        SL       SME
jdate
1965-07-31  0.033876  0.016636  0.003068  0.038735  0.033604  0.025474
1965-08-31  0.023274  0.029329  0.015826  0.037690  0.050752  0.046083
1965-09-30  0.027948  0.048687  0.020349  0.044086  0.034213  0.034153
1965-10-31  0.056545  0.031947  0.010186  0.060084  0.048750  0.028552
1965-11-30 -0.016404 -0.012471 -0.011560  0.041411  0.041210  0.026719
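The long-to-wide reshape can be tried on a small toy frame. The sketch below uses made-up returns and only reuses the column names from the data above. It also checks for duplicate (date, portfolio) pairs first, since `DataFrame.pivot` raises a `ValueError` when the index/column combination is not unique.

```python
import pandas as pd

# Toy long-format data: one row per (date, portfolio) pair. Values are made up.
long_df = pd.DataFrame({
    "jdate": pd.to_datetime(["2020-01-31", "2020-01-31", "2020-02-29", "2020-02-29"]),
    "sbport": ["BH", "SL", "BH", "SL"],
    "vwret": [0.01, 0.02, -0.03, 0.04],
})

# pivot() fails on duplicated (jdate, sbport) pairs, so check up front
has_duplicates = long_df.duplicated(subset=["jdate", "sbport"]).any()
assert not has_duplicates, "duplicate (jdate, sbport) rows would break pivot()"

# One row per date, one column per portfolio
wide_df = long_df.pivot(index="jdate", columns="sbport", values="vwret")
wide_df.columns.name = None  # drop the leftover 'sbport' label on the column axis
print(wide_df.shape)  # (2, 2)
```

If duplicates were expected, `pivot_table` with an explicit aggregation would be the usual alternative, but for portfolio returns a duplicate more likely signals a data problem.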
# Load the 3 investment portfolios
inv_portfolios = pd.read_parquet(f"{DATA_DIR}/univ_3_inv_vwret.parquet")
print(f"\nInvestment portfolios shape: {inv_portfolios.shape}")
print(f"Columns: {inv_portfolios.columns.tolist()}")
print(f"Portfolio codes: {inv_portfolios['inv_port'].unique()}")
inv_portfolios.head()
Investment portfolios shape: (2070, 3)
Columns: ['jdate', 'inv_port', 'vwret_inv']
Portfolio codes: ['High' 'Low' 'Medium']
       jdate inv_port  vwret_inv
0 1967-07-31     High   0.060813
1 1967-07-31      Low   0.058490
2 1967-07-31   Medium   0.034118
3 1967-08-31     High  -0.004922
4 1967-08-31      Low  -0.018276
# Pivot to wide format
inv_wide = inv_portfolios.pivot(index="jdate", columns="inv_port", values="vwret_inv")
inv_wide.columns.name = None
print(f"Investment portfolios wide format: {inv_wide.shape}")
inv_wide.head()
Investment portfolios wide format: (690, 3)
                High       Low    Medium
jdate
1967-07-31  0.060813  0.058490  0.034118
1967-08-31 -0.004922 -0.018276 -0.007149
1967-09-30  0.023649  0.017793  0.038591
1967-10-31 -0.019926 -0.036670 -0.027705
1967-11-30  0.006494 -0.005037  0.011197
# Load Fama-French factors (including market factor and risk-free rate)
factors = pd.read_parquet(f"{DATA_DIR}/FF_FACTORS.parquet")
# Rename columns to standard names used by finm package
factors = factors.rename(columns={
    "date": "jdate",
    "mktrf": "Mkt-RF",
    "smb": "SMB",
    "hml": "HML",
    "rf": "RF",
})
# Convert nullable Float64 to numpy float64 for compatibility
for col in ["Mkt-RF", "SMB", "HML", "RF"]:
    factors[col] = factors[col].astype("float64")
factors = factors.set_index("jdate")
print(f"\nFactors shape: {factors.shape}")
print(f"Columns: {factors.columns.tolist()}")
factors.head()
Factors shape: (1193, 8)
Columns: ['Mkt-RF', 'SMB', 'HML', 'RF', 'year', 'month', 'umd', 'dateff']
            Mkt-RF     SMB     HML      RF    year  month   umd      dateff
jdate
1926-07-31  0.0289 -0.0255 -0.0239  0.0022  1926.0    7.0  <NA>  1926-07-31
1926-08-31  0.0264 -0.0114  0.0381  0.0025  1926.0    8.0  <NA>  1926-08-31
1926-09-30  0.0038 -0.0136  0.0005  0.0023  1926.0    9.0  <NA>  1926-09-30
1926-10-31 -0.0327 -0.0014  0.0082  0.0032  1926.0   10.0  <NA>  1926-10-30
1926-11-30  0.0254 -0.0011 -0.0061  0.0031  1926.0   11.0  <NA>  1926-11-30
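The dtype conversion in the cell above can be illustrated on a toy Series. This is a minimal sketch with made-up values: pandas' nullable `Float64` dtype stores missing values as `<NA>`, and casting to NumPy `float64` turns them into `NaN`, which is the representation NumPy-based numerical code generally expects.

```python
import numpy as np
import pandas as pd

# Nullable extension dtype: the missing entry is stored as <NA>
nullable = pd.Series([0.0289, None, 0.0038], dtype="Float64")

# Cast to the plain NumPy dtype: <NA> becomes NaN
converted = nullable.astype("float64")

print(nullable.dtype, converted.dtype)  # Float64 float64
print(converted.isna().sum())           # 1
```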

Step 2: Align Data and Compute Excess Returns

We need to:

  1. Find the common date range across all datasets

  2. Compute excess returns (portfolio return - risk-free rate)
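These two steps can be sketched on toy data with made-up numbers and hypothetical portfolio names `P1` and `P2`. `Index.intersection` keeps only dates present in both objects, and `DataFrame.sub(..., axis=0)` subtracts the risk-free Series row by row, matching on the date index.

```python
import pandas as pd

# Two toy datasets with partially overlapping month-end dates
dates_ret = pd.to_datetime(["2020-01-31", "2020-02-29", "2020-03-31"])
dates_rf = pd.to_datetime(["2020-02-29", "2020-03-31", "2020-04-30"])

returns = pd.DataFrame(
    {"P1": [0.02, 0.01, -0.01], "P2": [0.03, 0.00, 0.02]}, index=dates_ret
)
rf = pd.Series([0.001, 0.002, 0.003], index=dates_rf, name="RF")

# Step 1: keep only dates that appear in both datasets
common = returns.index.intersection(rf.index)

# Step 2: subtract the risk-free rate from every portfolio column, row by row
excess = returns.loc[common].sub(rf.loc[common], axis=0)
print(excess)
```

Restricting both sides to `common` before subtracting avoids silently introducing `NaN` rows for dates that exist in only one dataset.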

# Find common dates across all datasets
common_dates = ff_wide.index.intersection(inv_wide.index).intersection(factors.index)
print(f"Common dates: {len(common_dates)}")
print(f"Date range: {common_dates.min()} to {common_dates.max()}")

# Align all data to common dates
ff_aligned = ff_wide.loc[common_dates]
inv_aligned = inv_wide.loc[common_dates]
factors_aligned = factors.loc[common_dates]

# Combine all portfolios
all_portfolios = pd.concat([ff_aligned, inv_aligned], axis=1)
print(f"\nAll portfolios shape: {all_portfolios.shape}")
print(f"Portfolio names: {all_portfolios.columns.tolist()}")
Common dates: 690
Date range: 1967-07-31 00:00:00 to 2024-12-31 00:00:00

All portfolios shape: (690, 9)
Portfolio names: ['BH', 'BL', 'BME', 'SH', 'SL', 'SME', 'High', 'Low', 'Medium']
# Compute excess returns for each portfolio
rf = factors_aligned["RF"]
excess_returns = all_portfolios.sub(rf, axis=0)
print("Computed excess returns for all portfolios")
excess_returns.head()
Computed excess returns for all portfolios
                  BH        BL       BME        SH        SL       SME      High       Low    Medium
jdate
1967-07-31  0.069242  0.046555  0.028994  0.096480  0.059197  0.066041  0.057613  0.055290  0.030918
1967-08-31  0.009736 -0.011248 -0.015236  0.001396 -0.005925 -0.008742 -0.008022 -0.021376 -0.010249
1967-09-30  0.017920  0.036723  0.019407  0.046046  0.054902  0.036981  0.020449  0.014593  0.035391
1967-10-31 -0.052199 -0.016809 -0.044634 -0.039934 -0.020694 -0.029065 -0.023826 -0.040570 -0.031605
1967-11-30 -0.005694  0.007192  0.000413 -0.015107  0.010628  0.005021  0.002894 -0.008637  0.007597

Step 3: Run CAPM Regressions

For each portfolio, we regress excess returns on the market excess return. The results will tell us:

  • Alpha: Is there abnormal return after accounting for market risk?

  • Beta: How sensitive is the portfolio to market movements?

  • R-squared: How much of the variance is explained by the market?
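To make these three quantities concrete, here is a minimal NumPy sketch of a single-factor OLS regression on synthetic data. The helper `capm_ols` is hypothetical and is not the `finm` implementation; in particular it uses classical (homoskedastic) standard errors, which may differ from whatever `finm.run_capm_regression` reports.

```python
import numpy as np

def capm_ols(y, x):
    """OLS of y on a constant and x (hypothetical helper, classical standard errors)."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    X = np.column_stack([np.ones_like(x), x])      # design matrix: [1, market excess return]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)   # coef[0] = alpha, coef[1] = beta
    resid = y - X @ coef
    n, k = X.shape
    sigma2 = resid @ resid / (n - k)               # residual variance estimate
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    r_squared = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return {
        "alpha": coef[0],
        "beta": coef[1],
        "alpha_tstat": coef[0] / se[0],
        "beta_tstat": coef[1] / se[1],
        "r_squared": r_squared,
    }

# Synthetic monthly data with a known alpha (0.002) and beta (1.2)
rng = np.random.default_rng(0)
mkt = rng.normal(0.005, 0.045, size=690)
port = 0.002 + 1.2 * mkt + rng.normal(0.0, 0.02, size=690)

res = capm_ols(port, mkt)
print({name: round(float(value), 4) for name, value in res.items()})
```

With a true beta of 1.2 and modest noise, the estimated beta should land close to 1.2, while the alpha estimate is much noisier relative to its size. That is the same pattern as in real data, where beta t-statistics are typically far larger than alpha t-statistics.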

# Get market excess return
mkt_rf = factors_aligned["Mkt-RF"]

# Run CAPM regression for each portfolio
capm_results = {}
for portfolio in all_portfolios.columns:
    result = finm.run_capm_regression(
        excess_returns=excess_returns[portfolio],
        market_excess_returns=mkt_rf,
        annualization_factor=12,  # Monthly data
    )
    capm_results[portfolio] = result

print(f"Completed CAPM regressions for {len(capm_results)} portfolios")
Completed CAPM regressions for 9 portfolios

Step 4: Create Results Table

We compile all regression results into a summary table with:

  • Alpha (monthly and annualized)

  • Alpha t-statistic and p-value

  • Market beta

  • Beta t-statistic

  • R-squared

  • Number of observations (N)

# Create summary DataFrame
summary_data = []
for portfolio, result in capm_results.items():
    summary_data.append({
        "Portfolio": portfolio,
        "Alpha (monthly)": result.alpha,
        "Alpha (annual)": result.alpha_annualized,
        "Alpha t-stat": result.alpha_tstat,
        "Alpha p-value": result.alpha_pvalue,
        "Beta": result.betas["Mkt-RF"],
        "Beta t-stat": result.beta_tstats["Mkt-RF"],
        "R-squared": result.r_squared,
        "N": result.n_observations,
    })

summary_df = pd.DataFrame(summary_data)
summary_df = summary_df.set_index("Portfolio")

# Format for display
display_df = summary_df.copy()
display_df["Alpha (monthly)"] = display_df["Alpha (monthly)"].map("{:.4f}".format)
display_df["Alpha (annual)"] = display_df["Alpha (annual)"].map("{:.2%}".format)
display_df["Alpha t-stat"] = display_df["Alpha t-stat"].map("{:.2f}".format)
display_df["Alpha p-value"] = display_df["Alpha p-value"].map("{:.4f}".format)
display_df["Beta"] = display_df["Beta"].map("{:.2f}".format)
display_df["Beta t-stat"] = display_df["Beta t-stat"].map("{:.2f}".format)
display_df["R-squared"] = display_df["R-squared"].map("{:.2%}".format)

print("CAPM Regression Results")
print("=" * 80)
print(display_df.to_string())
CAPM Regression Results
================================================================================
          Alpha (monthly) Alpha (annual) Alpha t-stat Alpha p-value  Beta Beta t-stat R-squared    N
Portfolio                                                                                           
BH                 0.0014          1.62%         1.39        0.1664  0.93       43.80    73.60%  690
BL                 0.0001          0.09%         0.16        0.8713  1.01      105.46    94.17%  690
BME                0.0004          0.42%         0.58        0.5654  0.92       69.20    87.44%  690
SH                 0.0033          3.94%         2.66        0.0080  1.06       39.64    69.55%  690
SL                -0.0026         -3.07%        -1.98        0.0486  1.33       47.19    76.40%  690
SME                0.0016          1.97%         1.61        0.1077  1.08       48.80    77.59%  690
High              -0.0006         -0.74%        -1.41        0.1588  1.09      114.80    95.04%  690
Low                0.0012          1.39%         2.08        0.0376  0.94       78.11    89.87%  690
Medium             0.0007          0.88%         1.84        0.0665  0.89      102.55    93.86%  690
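The annualized alphas in the table look consistent with multiplying the monthly alpha by 12 (for SH, roughly 0.0033 × 12 ≈ 3.9%). That is an inference from the printed, rounded numbers and not a documented property of `finm`. The sketch below compares simple scaling with monthly compounding for that one value.

```python
# Monthly alpha for SH, rounded as printed in the table above
monthly_alpha = 0.0033

# Simple scaling: 12 months times the monthly alpha
simple = monthly_alpha * 12

# Compounded alternative, shown only for comparison
compounded = (1 + monthly_alpha) ** 12 - 1

print(f"Simple:     {simple:.2%}")  # 3.96%
print(f"Compounded: {compounded:.2%}")
```

For alphas of this size the two conventions differ by only a few basis points, so the choice does not change any conclusion here.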

Step 5: Analyze Size/BM Portfolios

Each of the 6 size/book-to-market portfolio codes combines a size letter with a book-to-market label (for example, SH = small, high book-to-market):

  • S = Small (below NYSE median market cap)

  • B = Big (above NYSE median market cap)

  • L = Low book-to-market (growth stocks, below 30th percentile)

  • ME = Medium book-to-market (between 30th and 70th percentile)

  • H = High book-to-market (value stocks, above 70th percentile)

According to CAPM, all portfolios should have alpha = 0.

# Extract size/BM results
size_bm_portfolios = ["BH", "BL", "BME", "SH", "SL", "SME"]
size_bm_results = summary_df.loc[size_bm_portfolios].copy()

print("Size/Book-to-Market Portfolios - CAPM Results")
print("=" * 60)
print("\nPortfolio Key:")
print("  B = Big (large cap), S = Small (small cap)")
print("  H = High B/M (value), L = Low B/M (growth), ME = Medium B/M")
print()

# Check for significant alphas
for port in size_bm_portfolios:
    result = capm_results[port]
    sig_marker = "*" if abs(result.alpha_tstat) > 1.96 else ""
    print(f"{port}: Alpha = {result.alpha_annualized:>7.2%} (t={result.alpha_tstat:>5.2f}){sig_marker}, "
          f"Beta = {result.betas['Mkt-RF']:.2f}, R² = {result.r_squared:.2%}")

print("\n* indicates significance at 5% level (|t| > 1.96)")
Size/Book-to-Market Portfolios - CAPM Results
============================================================

Portfolio Key:
  B = Big (large cap), S = Small (small cap)
  H = High B/M (value), L = Low B/M (growth), ME = Medium B/M

BH: Alpha =   1.62% (t= 1.39), Beta = 0.93, R² = 73.60%
BL: Alpha =   0.09% (t= 0.16), Beta = 1.01, R² = 94.17%
BME: Alpha =   0.42% (t= 0.58), Beta = 0.92, R² = 87.44%
SH: Alpha =   3.94% (t= 2.66)*, Beta = 1.06, R² = 69.55%
SL: Alpha =  -3.07% (t=-1.98)*, Beta = 1.33, R² = 76.40%
SME: Alpha =   1.97% (t= 1.61), Beta = 1.08, R² = 77.59%

* indicates significance at 5% level (|t| > 1.96)
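The 1.96 cutoff is the two-sided 5% critical value of the standard normal distribution, which is a close approximation to the t-distribution with 690 observations. A minimal sketch of the corresponding p-value calculation, using only the standard library, is below. The function name is hypothetical, and the p-values reported by `finm` may be computed differently (for example, from a t-distribution), so small differences in the third decimal are expected.

```python
import math

def two_sided_p_normal(t_stat):
    """Two-sided p-value P(|Z| > |t|) under a standard normal approximation."""
    return math.erfc(abs(t_stat) / math.sqrt(2.0))

# t-statistics for BH, the 5% cutoff, and SH
for t in (1.39, 1.96, 2.66):
    print(f"t = {t:.2f} -> p approx {two_sided_p_normal(t):.4f}")
```

A t-statistic just above 1.96 (such as SL's -1.98) gives a p-value just below 0.05, so that result is borderline and not a strong rejection.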

Step 6: Analyze Investment Portfolios

The 3 investment portfolios are formed on asset growth:

  • Low: Conservative investment (below 30th percentile of asset growth)

  • Medium: Moderate investment (between 30th and 70th percentile)

  • High: Aggressive investment (above 70th percentile)

Firms with low investment (conservative) tend to have higher returns than firms with high investment (aggressive), which CAPM cannot explain.

# Extract investment portfolio results
inv_portfolios_list = ["Low", "Medium", "High"]
inv_results = summary_df.loc[inv_portfolios_list].copy()

print("Investment Portfolios - CAPM Results")
print("=" * 60)
print("\nPortfolio Key:")
print("  Low = Conservative investment (low asset growth)")
print("  Medium = Moderate investment")
print("  High = Aggressive investment (high asset growth)")
print()

for port in inv_portfolios_list:
    result = capm_results[port]
    sig_marker = "*" if abs(result.alpha_tstat) > 1.96 else ""
    print(f"{port:8s}: Alpha = {result.alpha_annualized:>7.2%} (t={result.alpha_tstat:>5.2f}){sig_marker}, "
          f"Beta = {result.betas['Mkt-RF']:.2f}, R² = {result.r_squared:.2%}")

print("\n* indicates significance at 5% level (|t| > 1.96)")
Investment Portfolios - CAPM Results
============================================================

Portfolio Key:
  Low = Conservative investment (low asset growth)
  Medium = Moderate investment
  High = Aggressive investment (high asset growth)

Low     : Alpha =   1.39% (t= 2.08)*, Beta = 0.94, R² = 89.87%
Medium  : Alpha =   0.88% (t= 1.84), Beta = 0.89, R² = 93.86%
High    : Alpha =  -0.74% (t=-1.41), Beta = 1.09, R² = 95.04%

* indicates significance at 5% level (|t| > 1.96)
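One way to look at the investment effect directly is to regress the Low-minus-High return spread on the market. Because OLS is linear in the dependent variable, the spread's alpha equals the difference of the two alphas: using the annualized figures above, about 1.39% - (-0.74%) = 2.13%. The spread's t-statistic cannot be read off the table and would need its own regression. The sketch below checks the linearity property on synthetic data. The helper name and the simulated parameters are assumptions chosen to resemble the estimates above.

```python
import numpy as np

def ols_alpha_beta(y, x):
    """Return (alpha, beta) from an OLS regression of y on a constant and x."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[0], coef[1]

# Synthetic excess returns loosely resembling the Low and High portfolios
rng = np.random.default_rng(1)
mkt = rng.normal(0.005, 0.045, size=690)
low = 0.0012 + 0.94 * mkt + rng.normal(0.0, 0.014, size=690)
high = -0.0006 + 1.09 * mkt + rng.normal(0.0, 0.011, size=690)

a_low, b_low = ols_alpha_beta(low, mkt)
a_high, b_high = ols_alpha_beta(high, mkt)

# Long Low, short High: the risk-free rate cancels in the difference
a_spread, b_spread = ols_alpha_beta(low - high, mkt)

print(np.isclose(a_spread, a_low - a_high))  # True
print(np.isclose(b_spread, b_low - b_high))  # True
```

The spread's beta is likewise the difference of the two betas, so a Low-minus-High position built from the portfolios above would have a negative market beta of roughly 0.94 - 1.09 = -0.15.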

Step 7: Interpretation and Conclusions

What We Observe

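  1. Size Effect: Small portfolios carry higher betas (1.06 to 1.33) than big portfolios (0.92 to 1.01), but their alphas are mixed. SH earns a significant 3.94% per year (t = 2.66), SME an insignificant 1.97% (t = 1.61), and SL a significant -3.07% (t = -1.98). Size alone does not produce a uniformly positive alpha.

  2. Value Effect: Within each size group, alpha rises with book-to-market: 0.09% (BL) versus 1.62% (BH) among big stocks, and -3.07% (SL) versus 3.94% (SH) among small stocks. Only the small-stock alphas are significant at the 5% level, so the value effect is concentrated in small stocks.

  3. Investment Effect: Alpha falls as investment rises: 1.39% for Low (t = 2.08), 0.88% for Medium (t = 1.84), and -0.74% for High (t = -1.41). Only the Low (conservative) portfolio has an alpha that is significant at the 5% level.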

CAPM’s Limitations

If CAPM were a complete description of returns, all alphas would be zero. Here, 3 of the 9 portfolios (SH, SL, and Low) have alphas that are significant at the 5% level, which suggests that:

  • Market beta alone does not fully explain the cross-section of returns

  • Additional risk factors (size, value, investment) may be priced

  • This motivates multi-factor models like Fama-French 3-factor and 5-factor

The next notebook (07_Fama_French_3_factor) will analyze whether adding SMB and HML factors can explain these anomalies.

# Summary statistics
print("Summary Statistics")
print("=" * 60)
print("\nNumber of portfolios with significant alpha (|t| > 1.96):")
n_significant = sum(1 for r in capm_results.values() if abs(r.alpha_tstat) > 1.96)
print(f"  {n_significant} out of {len(capm_results)} portfolios")

print(f"\nAverage R-squared across all portfolios: {summary_df['R-squared'].mean():.2%}")
print(f"Range of betas: {summary_df['Beta'].min():.2f} to {summary_df['Beta'].max():.2f}")
Summary Statistics
============================================================

Number of portfolios with significant alpha (|t| > 1.96):
  3 out of 9 portfolios

Average R-squared across all portfolios: 84.17%
Range of betas: 0.89 to 1.33