CAPM Analysis of Size, Value, and Investment Portfolios
Introduction
The Capital Asset Pricing Model (CAPM) predicts that expected excess returns are proportional to market beta, so Jensen’s alpha (the regression intercept) should equal zero in equilibrium. This notebook tests that prediction on portfolios sorted by:
Size and Book-to-Market: 6 portfolios formed on market capitalization and book-to-market ratio (from Fama-French 1993)
Investment: 3 portfolios formed on asset growth (conservative, medium, aggressive investment)
Learning Objectives
Load and prepare portfolio return data from parquet files
Run CAPM regressions using the finm package
Interpret alpha, beta, t-statistics, and R-squared
Create publication-quality regression results tables
Understand anomalies that CAPM cannot explain
The CAPM Model
The CAPM regression equation is:
\[R_{p,t} - R_{f,t} = \alpha_p + \beta_p \left(R_{m,t} - R_{f,t}\right) + \varepsilon_{p,t}\]
Where:
\(R_{p,t}\) is portfolio return at time \(t\)
\(R_{f,t}\) is the risk-free rate
\(R_{m,t}\) is the market return
\(\alpha_p\) is Jensen’s alpha (abnormal return not explained by market risk)
\(\beta_p\) is market beta (sensitivity to market movements)
\(\varepsilon_{p,t}\) is the regression residual
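To make the regression concrete, here is a minimal sketch on synthetic data. The parameter values (`true_alpha`, `true_beta`, the volatilities) are assumptions chosen only for illustration and are not taken from the portfolio data used later. The sketch generates excess returns that satisfy the CAPM equation and recovers alpha and beta with an ordinary least-squares line fit.

```python
import numpy as np

rng = np.random.default_rng(0)
n_months = 600

# Hypothetical "true" parameters, used only to generate synthetic data
true_alpha, true_beta = 0.0, 1.2

mkt_excess = rng.normal(0.005, 0.045, n_months)  # R_m - R_f
noise = rng.normal(0.0, 0.02, n_months)          # epsilon
port_excess = true_alpha + true_beta * mkt_excess + noise  # R_p - R_f

# OLS fit of portfolio excess return on market excess return.
# np.polyfit returns coefficients from highest degree down: [slope, intercept].
beta_hat, alpha_hat = np.polyfit(mkt_excess, port_excess, deg=1)
print(f"alpha_hat = {alpha_hat:.4f}, beta_hat = {beta_hat:.2f}")
```

Because the data are generated with zero alpha, the estimated intercept should be close to zero and the slope close to the assumed beta, up to sampling noise.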
import pandas as pd
import numpy as np
import finm
from settings import config
DATA_DIR = config("DATA_DIR")
Step 1: Load Data
We load three data sources:
FF_1993_vwret.parquet: 6 size/book-to-market portfolio returns (monthly)
univ_3_inv_vwret.parquet: 3 investment portfolio returns (monthly)
FF_FACTORS.parquet: Market factor (Mkt-RF) and risk-free rate (RF)
# Load the 6 Fama-French size/BM portfolios
ff_portfolios = pd.read_parquet(f"{DATA_DIR}/FF_1993_vwret.parquet")
print(f"FF Size/BM portfolios shape: {ff_portfolios.shape}")
print(f"Columns: {ff_portfolios.columns.tolist()}")
print(f"Portfolio codes: {ff_portfolios['sbport'].unique()}")
ff_portfolios.head()
FF Size/BM portfolios shape: (4284, 5)
Columns: ['jdate', 'szport', 'bmport', 'vwret', 'sbport']
Portfolio codes: ['BH' 'BL' 'BME' 'SH' 'SL' 'SME']
| | jdate | szport | bmport | vwret | sbport |
|---|---|---|---|---|---|
| 0 | 1965-07-31 | B | H | 0.033876 | BH |
| 1 | 1965-07-31 | B | L | 0.016636 | BL |
| 2 | 1965-07-31 | B | ME | 0.003068 | BME |
| 3 | 1965-07-31 | S | H | 0.038735 | SH |
| 4 | 1965-07-31 | S | L | 0.033604 | SL |
# Pivot to wide format (portfolios as columns)
ff_wide = ff_portfolios.pivot(index="jdate", columns="sbport", values="vwret")
ff_wide.columns.name = None
print(f"FF portfolios wide format: {ff_wide.shape}")
ff_wide.head()
FF portfolios wide format: (714, 6)
| jdate | BH | BL | BME | SH | SL | SME |
|---|---|---|---|---|---|---|
| 1965-07-31 | 0.033876 | 0.016636 | 0.003068 | 0.038735 | 0.033604 | 0.025474 |
| 1965-08-31 | 0.023274 | 0.029329 | 0.015826 | 0.037690 | 0.050752 | 0.046083 |
| 1965-09-30 | 0.027948 | 0.048687 | 0.020349 | 0.044086 | 0.034213 | 0.034153 |
| 1965-10-31 | 0.056545 | 0.031947 | 0.010186 | 0.060084 | 0.048750 | 0.028552 |
| 1965-11-30 | -0.016404 | -0.012471 | -0.011560 | 0.041411 | 0.041210 | 0.026719 |
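The long-to-wide reshape above can be illustrated on a toy frame. The data below are made up; only the column names `jdate`, `sbport`, and `vwret` mirror the real file. The sketch also shows that `pivot` raises a `ValueError` when a (date, portfolio) pair appears twice, which is a useful sanity check on the input.

```python
import pandas as pd

# Toy long-format data: one row per (date, portfolio)
long_df = pd.DataFrame({
    "jdate": pd.to_datetime(["2020-01-31", "2020-01-31", "2020-02-29", "2020-02-29"]),
    "sbport": ["BH", "SL", "BH", "SL"],
    "vwret": [0.01, 0.02, -0.03, 0.04],
})

# Wide format: one row per date, one column per portfolio
wide_df = long_df.pivot(index="jdate", columns="sbport", values="vwret")
wide_df.columns.name = None
print(wide_df)

# A duplicated (jdate, sbport) pair makes the reshape ambiguous, so pivot raises
dup = pd.concat([long_df, long_df.iloc[[0]]])
try:
    dup.pivot(index="jdate", columns="sbport", values="vwret")
    pivot_failed = False
except ValueError as err:
    pivot_failed = True
    print("duplicate rows detected:", err)
```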
# Load the 3 investment portfolios
inv_portfolios = pd.read_parquet(f"{DATA_DIR}/univ_3_inv_vwret.parquet")
print(f"\nInvestment portfolios shape: {inv_portfolios.shape}")
print(f"Columns: {inv_portfolios.columns.tolist()}")
print(f"Portfolio codes: {inv_portfolios['inv_port'].unique()}")
inv_portfolios.head()
Investment portfolios shape: (2070, 3)
Columns: ['jdate', 'inv_port', 'vwret_inv']
Portfolio codes: ['High' 'Low' 'Medium']
| | jdate | inv_port | vwret_inv |
|---|---|---|---|
| 0 | 1967-07-31 | High | 0.060813 |
| 1 | 1967-07-31 | Low | 0.058490 |
| 2 | 1967-07-31 | Medium | 0.034118 |
| 3 | 1967-08-31 | High | -0.004922 |
| 4 | 1967-08-31 | Low | -0.018276 |
# Pivot to wide format
inv_wide = inv_portfolios.pivot(index="jdate", columns="inv_port", values="vwret_inv")
inv_wide.columns.name = None
print(f"Investment portfolios wide format: {inv_wide.shape}")
inv_wide.head()
Investment portfolios wide format: (690, 3)
| jdate | High | Low | Medium |
|---|---|---|---|
| 1967-07-31 | 0.060813 | 0.058490 | 0.034118 |
| 1967-08-31 | -0.004922 | -0.018276 | -0.007149 |
| 1967-09-30 | 0.023649 | 0.017793 | 0.038591 |
| 1967-10-31 | -0.019926 | -0.036670 | -0.027705 |
| 1967-11-30 | 0.006494 | -0.005037 | 0.011197 |
# Load Fama-French factors (including market factor and risk-free rate)
factors = pd.read_parquet(f"{DATA_DIR}/FF_FACTORS.parquet")
# Rename columns to standard names used by finm package
factors = factors.rename(columns={
    "date": "jdate",
    "mktrf": "Mkt-RF",
    "smb": "SMB",
    "hml": "HML",
    "rf": "RF",
})
# Convert nullable Float64 to numpy float64 for compatibility
for col in ["Mkt-RF", "SMB", "HML", "RF"]:
    factors[col] = factors[col].astype("float64")
factors = factors.set_index("jdate")
print(f"\nFactors shape: {factors.shape}")
print(f"Columns: {factors.columns.tolist()}")
factors.head()
Factors shape: (1193, 8)
Columns: ['Mkt-RF', 'SMB', 'HML', 'RF', 'year', 'month', 'umd', 'dateff']
| jdate | Mkt-RF | SMB | HML | RF | year | month | umd | dateff |
|---|---|---|---|---|---|---|---|---|
| 1926-07-31 | 0.0289 | -0.0255 | -0.0239 | 0.0022 | 1926.0 | 7.0 | <NA> | 1926-07-31 |
| 1926-08-31 | 0.0264 | -0.0114 | 0.0381 | 0.0025 | 1926.0 | 8.0 | <NA> | 1926-08-31 |
| 1926-09-30 | 0.0038 | -0.0136 | 0.0005 | 0.0023 | 1926.0 | 9.0 | <NA> | 1926-09-30 |
| 1926-10-31 | -0.0327 | -0.0014 | 0.0082 | 0.0032 | 1926.0 | 10.0 | <NA> | 1926-10-30 |
| 1926-11-30 | 0.0254 | -0.0011 | -0.0061 | 0.0031 | 1926.0 | 11.0 | <NA> | 1926-11-30 |
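The dtype conversion in the cell above matters because pandas' nullable `Float64` extension type stores missing values as `<NA>`, while NumPy-based numerical code generally expects `float64` with `NaN`. A small sketch with made-up values shows what the cast does:

```python
import numpy as np
import pandas as pd

# Nullable extension dtype: the missing entry is stored as pd.NA
s = pd.Series([0.0289, None, 0.0038], dtype="Float64")
print(s.dtype, "missing:", int(s.isna().sum()))

# Casting to NumPy float64 turns pd.NA into NaN
converted = s.astype("float64")
print(converted.dtype, "second value is NaN:", bool(np.isnan(converted.iloc[1])))
```

After the cast the column behaves like an ordinary NumPy array, so any remaining `NaN` values still need to be handled (for example, dropped) before a regression.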
Step 2: Align Data and Compute Excess Returns
We need to:
Find the common date range across all datasets
Compute excess returns (portfolio return - risk-free rate)
# Find common dates across all datasets
common_dates = ff_wide.index.intersection(inv_wide.index).intersection(factors.index)
print(f"Common dates: {len(common_dates)}")
print(f"Date range: {common_dates.min()} to {common_dates.max()}")
# Align all data to common dates
ff_aligned = ff_wide.loc[common_dates]
inv_aligned = inv_wide.loc[common_dates]
factors_aligned = factors.loc[common_dates]
# Combine all portfolios
all_portfolios = pd.concat([ff_aligned, inv_aligned], axis=1)
print(f"\nAll portfolios shape: {all_portfolios.shape}")
print(f"Portfolio names: {all_portfolios.columns.tolist()}")
Common dates: 690
Date range: 1967-07-31 00:00:00 to 2024-12-31 00:00:00
All portfolios shape: (690, 9)
Portfolio names: ['BH', 'BL', 'BME', 'SH', 'SL', 'SME', 'High', 'Low', 'Medium']
# Compute excess returns for each portfolio
rf = factors_aligned["RF"]
excess_returns = all_portfolios.sub(rf, axis=0)
print("Computed excess returns for all portfolios")
excess_returns.head()
Computed excess returns for all portfolios
| jdate | BH | BL | BME | SH | SL | SME | High | Low | Medium |
|---|---|---|---|---|---|---|---|---|---|
| 1967-07-31 | 0.069242 | 0.046555 | 0.028994 | 0.096480 | 0.059197 | 0.066041 | 0.057613 | 0.055290 | 0.030918 |
| 1967-08-31 | 0.009736 | -0.011248 | -0.015236 | 0.001396 | -0.005925 | -0.008742 | -0.008022 | -0.021376 | -0.010249 |
| 1967-09-30 | 0.017920 | 0.036723 | 0.019407 | 0.046046 | 0.054902 | 0.036981 | 0.020449 | 0.014593 | 0.035391 |
| 1967-10-31 | -0.052199 | -0.016809 | -0.044634 | -0.039934 | -0.020694 | -0.029065 | -0.023826 | -0.040570 | -0.031605 |
| 1967-11-30 | -0.005694 | 0.007192 | 0.000413 | -0.015107 | 0.010628 | 0.005021 | 0.002894 | -0.008637 | 0.007597 |
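The two operations in this step, intersecting the date indexes and subtracting the risk-free rate row by row, can be checked on a toy example. All numbers below are invented for illustration.

```python
import pandas as pd

dates_a = pd.to_datetime(["2020-01-31", "2020-02-29", "2020-03-31"])
dates_b = pd.to_datetime(["2020-02-29", "2020-03-31", "2020-04-30"])

returns = pd.DataFrame(
    {"P1": [0.010, 0.020, 0.030], "P2": [0.000, -0.010, 0.015]},
    index=dates_a,
)
rf = pd.Series([0.001, 0.002, 0.003], index=dates_b, name="RF")

# Keep only the dates present in both objects
common = returns.index.intersection(rf.index)

# axis=0 matches the Series to the rows (dates); a plain `returns - rf`
# would instead try to match the Series index against the column labels
excess = returns.loc[common].sub(rf.loc[common], axis=0)
print(excess)
```

Only the two overlapping months survive, and each portfolio column has that month's risk-free rate subtracted.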
Step 3: Run CAPM Regressions
For each portfolio, we regress excess returns on the market excess return. The results will tell us:
Alpha: Is there abnormal return after accounting for market risk?
Beta: How sensitive is the portfolio to market movements?
R-squared: How much of the variance is explained by the market?
# Get market excess return
mkt_rf = factors_aligned["Mkt-RF"]
# Run CAPM regression for each portfolio
capm_results = {}
for portfolio in all_portfolios.columns:
    result = finm.run_capm_regression(
        excess_returns=excess_returns[portfolio],
        market_excess_returns=mkt_rf,
        annualization_factor=12,  # Monthly data
    )
    capm_results[portfolio] = result
print(f"Completed CAPM regressions for {len(capm_results)} portfolios")
Completed CAPM regressions for 9 portfolios
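For readers who want to see what a regression of this kind computes, the following is a rough NumPy sketch. It is not the `finm` implementation: `capm_ols` is a hypothetical helper, it assumes classical (homoskedastic) OLS standard errors, and the package may well use a different estimator for the t-statistics. The data are synthetic.

```python
import numpy as np

def capm_ols(port_excess, mkt_excess):
    """Hypothetical helper: OLS of portfolio excess returns on market excess returns."""
    y = np.asarray(port_excess, dtype=float)
    x = np.asarray(mkt_excess, dtype=float)
    n = y.size
    X = np.column_stack([np.ones(n), x])          # intercept + market factor
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)  # [alpha, beta]
    resid = y - X @ coef
    sigma2 = resid @ resid / (n - 2)              # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)         # classical coefficient covariance
    se = np.sqrt(np.diag(cov))
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return {
        "alpha": coef[0],
        "beta": coef[1],
        "alpha_tstat": coef[0] / se[0],
        "beta_tstat": coef[1] / se[1],
        "r_squared": r2,
        "n_observations": n,
    }

# Synthetic monthly data with assumed alpha = 0.001 and beta = 0.9
rng = np.random.default_rng(1)
mkt = rng.normal(0.005, 0.045, 690)
port = 0.001 + 0.9 * mkt + rng.normal(0.0, 0.015, 690)

out = capm_ols(port, mkt)
print({k: round(float(v), 4) for k, v in out.items()})
```

The t-statistic is simply the coefficient divided by its standard error, and R-squared is one minus the ratio of residual to total sum of squares.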
Step 4: Create Results Table
We compile all regression results into a summary table with:
Alpha (monthly and annualized)
Alpha t-statistic
Market beta
Beta t-statistic
R-squared
# Create summary DataFrame
summary_data = []
for portfolio, result in capm_results.items():
    summary_data.append({
        "Portfolio": portfolio,
        "Alpha (monthly)": result.alpha,
        "Alpha (annual)": result.alpha_annualized,
        "Alpha t-stat": result.alpha_tstat,
        "Alpha p-value": result.alpha_pvalue,
        "Beta": result.betas["Mkt-RF"],
        "Beta t-stat": result.beta_tstats["Mkt-RF"],
        "R-squared": result.r_squared,
        "N": result.n_observations,
    })
summary_df = pd.DataFrame(summary_data)
summary_df = summary_df.set_index("Portfolio")
# Format for display
display_df = summary_df.copy()
display_df["Alpha (monthly)"] = display_df["Alpha (monthly)"].map("{:.4f}".format)
display_df["Alpha (annual)"] = display_df["Alpha (annual)"].map("{:.2%}".format)
display_df["Alpha t-stat"] = display_df["Alpha t-stat"].map("{:.2f}".format)
display_df["Alpha p-value"] = display_df["Alpha p-value"].map("{:.4f}".format)
display_df["Beta"] = display_df["Beta"].map("{:.2f}".format)
display_df["Beta t-stat"] = display_df["Beta t-stat"].map("{:.2f}".format)
display_df["R-squared"] = display_df["R-squared"].map("{:.2%}".format)
print("CAPM Regression Results")
print("=" * 80)
print(display_df.to_string())
CAPM Regression Results
================================================================================
Alpha (monthly) Alpha (annual) Alpha t-stat Alpha p-value Beta Beta t-stat R-squared N
Portfolio
BH 0.0014 1.62% 1.39 0.1664 0.93 43.80 73.60% 690
BL 0.0001 0.09% 0.16 0.8713 1.01 105.46 94.17% 690
BME 0.0004 0.42% 0.58 0.5654 0.92 69.20 87.44% 690
SH 0.0033 3.94% 2.66 0.0080 1.06 39.64 69.55% 690
SL -0.0026 -3.07% -1.98 0.0486 1.33 47.19 76.40% 690
SME 0.0016 1.97% 1.61 0.1077 1.08 48.80 77.59% 690
High -0.0006 -0.74% -1.41 0.1588 1.09 114.80 95.04% 690
Low 0.0012 1.39% 2.08 0.0376 0.94 78.11 89.87% 690
Medium 0.0007 0.88% 1.84 0.0665 0.89 102.55 93.86% 690
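The link between the t-statistic and p-value columns can be sketched with a standard normal approximation, which is reasonable with 690 monthly observations. This is an assumption made for illustration: the table's p-values come from unrounded t-statistics and possibly a Student-t distribution, so the numbers printed here will differ slightly in the last digits.

```python
import math

def two_sided_pvalue_normal(t_stat):
    """Two-sided p-value for a t-statistic under a standard normal approximation."""
    return math.erfc(abs(t_stat) / math.sqrt(2.0))

# Rounded alpha t-statistics taken from the results table above
alpha_tstats = {"SH": 2.66, "SL": -1.98, "Low": 2.08, "Medium": 1.84}

for name, t in alpha_tstats.items():
    p = two_sided_pvalue_normal(t)
    print(f"{name:6s} t = {t:5.2f}  p ~ {p:.4f}  significant at 5%: {p < 0.05}")
```

The 5% threshold on the p-value corresponds to the |t| > 1.96 rule used in the following steps.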
Step 5: Analyze Size/BM Portfolios
The 6 size/book-to-market portfolios are:
S = Small (below NYSE median market cap)
B = Big (above NYSE median market cap)
L = Low book-to-market (growth stocks, below 30th percentile)
ME = Medium book-to-market (between 30th and 70th percentile)
H = High book-to-market (value stocks, above 70th percentile)
According to CAPM, all portfolios should have alpha = 0.
# Extract size/BM results
size_bm_portfolios = ["BH", "BL", "BME", "SH", "SL", "SME"]
size_bm_results = summary_df.loc[size_bm_portfolios].copy()
print("Size/Book-to-Market Portfolios - CAPM Results")
print("=" * 60)
print("\nPortfolio Key:")
print(" B = Big (large cap), S = Small (small cap)")
print(" H = High B/M (value), L = Low B/M (growth), ME = Medium B/M")
print()
# Check for significant alphas
for port in size_bm_portfolios:
    result = capm_results[port]
    sig_marker = "*" if abs(result.alpha_tstat) > 1.96 else ""
    print(f"{port}: Alpha = {result.alpha_annualized:>7.2%} (t={result.alpha_tstat:>5.2f}){sig_marker}, "
          f"Beta = {result.betas['Mkt-RF']:.2f}, R² = {result.r_squared:.2%}")
print("\n* indicates significance at 5% level (|t| > 1.96)")
Size/Book-to-Market Portfolios - CAPM Results
============================================================
Portfolio Key:
B = Big (large cap), S = Small (small cap)
H = High B/M (value), L = Low B/M (growth), ME = Medium B/M
BH: Alpha = 1.62% (t= 1.39), Beta = 0.93, R² = 73.60%
BL: Alpha = 0.09% (t= 0.16), Beta = 1.01, R² = 94.17%
BME: Alpha = 0.42% (t= 0.58), Beta = 0.92, R² = 87.44%
SH: Alpha = 3.94% (t= 2.66)*, Beta = 1.06, R² = 69.55%
SL: Alpha = -3.07% (t=-1.98)*, Beta = 1.33, R² = 76.40%
SME: Alpha = 1.97% (t= 1.61), Beta = 1.08, R² = 77.59%
* indicates significance at 5% level (|t| > 1.96)
Step 6: Analyze Investment Portfolios
The 3 investment portfolios are formed on asset growth:
Low: Conservative investment (below 30th percentile of asset growth)
Medium: Moderate investment (between 30th and 70th percentile)
High: Aggressive investment (above 70th percentile)
Firms with low investment (conservative) tend to have higher returns than firms with high investment (aggressive), which CAPM cannot explain.
# Extract investment portfolio results
inv_portfolios_list = ["Low", "Medium", "High"]
inv_results = summary_df.loc[inv_portfolios_list].copy()
print("Investment Portfolios - CAPM Results")
print("=" * 60)
print("\nPortfolio Key:")
print(" Low = Conservative investment (low asset growth)")
print(" Medium = Moderate investment")
print(" High = Aggressive investment (high asset growth)")
print()
for port in inv_portfolios_list:
    result = capm_results[port]
    sig_marker = "*" if abs(result.alpha_tstat) > 1.96 else ""
    print(f"{port:8s}: Alpha = {result.alpha_annualized:>7.2%} (t={result.alpha_tstat:>5.2f}){sig_marker}, "
          f"Beta = {result.betas['Mkt-RF']:.2f}, R² = {result.r_squared:.2%}")
print("\n* indicates significance at 5% level (|t| > 1.96)")
Investment Portfolios - CAPM Results
============================================================
Portfolio Key:
Low = Conservative investment (low asset growth)
Medium = Moderate investment
High = Aggressive investment (high asset growth)
Low : Alpha = 1.39% (t= 2.08)*, Beta = 0.94, R² = 89.87%
Medium : Alpha = 0.88% (t= 1.84), Beta = 0.89, R² = 93.86%
High : Alpha = -0.74% (t=-1.41), Beta = 1.09, R² = 95.04%
* indicates significance at 5% level (|t| > 1.96)
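One way to read the Low and High results together is through a Low-minus-High spread. Because OLS is linear in the dependent variable, the alpha of the spread regression equals the difference of the two individual alphas; with the point estimates above that is roughly 1.39% - (-0.74%) = 2.13% per year. The sketch below checks this identity on synthetic data. The generating parameters are assumptions loosely based on the table, and `alpha_beta` is a hypothetical helper.

```python
import numpy as np

def alpha_beta(y, x):
    """Hypothetical helper: OLS intercept and slope of y on x."""
    beta, alpha = np.polyfit(x, y, deg=1)
    return alpha, beta

rng = np.random.default_rng(2)
mkt = rng.normal(0.005, 0.045, 690)

# Synthetic excess returns for a low- and a high-investment portfolio
low = 0.0012 + 0.94 * mkt + rng.normal(0.0, 0.014, 690)
high = -0.0006 + 1.09 * mkt + rng.normal(0.0, 0.011, 690)

a_low, b_low = alpha_beta(low, mkt)
a_high, b_high = alpha_beta(high, mkt)
a_spread, b_spread = alpha_beta(low - high, mkt)

print(f"alpha(Low) - alpha(High) = {a_low - a_high:.6f}")
print(f"alpha(Low - High)        = {a_spread:.6f}")
```

The identity holds for the point estimates only. The t-statistic of the spread alpha cannot be derived from the two individual t-statistics, since the residuals of the two portfolios are correlated; it requires running the regression on the spread itself.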
Step 7: Interpretation and Conclusions
What We Observe
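Size Effect: Small stocks (S portfolios) have higher betas (1.06 to 1.33) than big stocks (0.92 to 1.01), but their alphas are not uniformly positive. SH earns a significant positive alpha (3.94% per year, t = 2.66), while SL earns a significant negative alpha (-3.07%, t = -1.98).
Value Effect: Within each size group, alpha rises from low to high book-to-market. The pattern is strongest among small stocks (SL -3.07% versus SH 3.94%); among big stocks the BH alpha is positive (1.62%) but not significant (t = 1.39).
Investment Effect: Low investment (conservative) firms show a significant positive alpha (1.39%, t = 2.08), while high investment (aggressive) firms show a negative but insignificant alpha (-0.74%, t = -1.41).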
CAPM’s Limitations
If CAPM were a complete description of returns, all alphas would be zero. Here, 3 of the 9 portfolios (SH, SL, and Low) have alphas that are significant at the 5% level, which suggests that:
Market beta alone does not fully explain the cross-section of returns
Additional risk factors (size, value, investment) may be priced
This motivates multi-factor models like Fama-French 3-factor and 5-factor
The next notebook (07_Fama_French_3_factor) will analyze whether adding SMB and HML factors can explain these anomalies.
# Summary statistics
print("Summary Statistics")
print("=" * 60)
print(f"\nNumber of portfolios with significant alpha (|t| > 1.96):")
n_significant = sum(1 for r in capm_results.values() if abs(r.alpha_tstat) > 1.96)
print(f" {n_significant} out of {len(capm_results)} portfolios")
print(f"\nAverage R-squared across all portfolios: {summary_df['R-squared'].mean():.2%}")
print(f"Range of betas: {summary_df['Beta'].min():.2f} to {summary_df['Beta'].max():.2f}")
Summary Statistics
============================================================
Number of portfolios with significant alpha (|t| > 1.96):
3 out of 9 portfolios
Average R-squared across all portfolios: 84.17%
Range of betas: 0.89 to 1.33