HW Guide Part A: CRSP Market Returns Indices#

The CRSP (Center for Research in Security Prices) dataset provides two indices for market returns: an equal-weighted index and a value-weighted index (both provided in terms of returns with and without dividends). The equal-weighted index computes the simple average of returns across stocks. This series is available as EWRETD and EWRETX, (with and without dividends, respectively). The value-Weighted Returns index represents a stock market index that calculates the return on investment by considering both the price changes and dividends of each component security, weighted by its market capitalization. This means that larger companies have a greater impact on the index’s performance compared to smaller companies. The value-weighting approach aims to reflect the actual investment returns that an investor would achieve by holding a market portfolio, mirroring the performance of the overall market or specific market segments more accurately than equal-weighted indices. The CRSP indices are widely used in academic research and financial analysis to study market trends, evaluate investment strategies, and benchmark the performance of portfolios against the broader market. This series is available in the CRSP tables under the mnemonic VWRETD and VWRETX (with and without dividends, respectively).

In this guide, we’ll discuss the construction of the equal- and value-weighted market return indices. To construct these indices, we’ll follow the suggestions here: https://wrds-www.wharton.upenn.edu/pages/support/support-articles/crsp/index-and-deciles/constructing-value-weighted-return-series-matches-vwretd-crsp-monthly-value-weighted-returns-includes-distributions/

These suggestions boil down to the most important part: we must select the correct universe of stocks that comprise “the market”.

import pandas as pd

import calc_CRSP_indices
import misc_tools
import pull_CRSP_stock
from settings import config

DATA_DIR = config("DATA_DIR")
df_msf = pull_CRSP_stock.load_CRSP_monthly_file(data_dir=DATA_DIR)
df_msix = pull_CRSP_stock.load_CRSP_index_files(data_dir=DATA_DIR)
df_msix.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 720 entries, 0 to 719
Data columns (total 35 columns):
 #   Column    Non-Null Count  Dtype         
---  ------    --------------  -----         
 0   caldt     720 non-null    datetime64[ns]
 1   vwretd    720 non-null    Float64       
 2   vwindd    720 non-null    Float64       
 3   vwretx    720 non-null    Float64       
 4   vwindx    720 non-null    Float64       
 5   ewretd    720 non-null    Float64       
 6   ewindd    720 non-null    Float64       
 7   ewretx    720 non-null    Float64       
 8   ewindx    720 non-null    Float64       
 9   sprtrn    720 non-null    Float64       
 10  spindx    720 non-null    Float64       
 11  decret1   720 non-null    Float64       
 12  decind1   720 non-null    Float64       
 13  decret2   720 non-null    Float64       
 14  decind2   720 non-null    Float64       
 15  decret3   720 non-null    Float64       
 16  decind3   720 non-null    Float64       
 17  decret4   720 non-null    Float64       
 18  decind4   720 non-null    Float64       
 19  decret5   720 non-null    Float64       
 20  decind5   720 non-null    Float64       
 21  decret6   720 non-null    Float64       
 22  decind6   720 non-null    Float64       
 23  decret7   720 non-null    Float64       
 24  decind7   720 non-null    Float64       
 25  decret8   720 non-null    Float64       
 26  decind8   720 non-null    Float64       
 27  decret9   720 non-null    Float64       
 28  decind9   720 non-null    Float64       
 29  decret10  720 non-null    Float64       
 30  decind10  720 non-null    Float64       
 31  totval    720 non-null    Float64       
 32  totcnt    720 non-null    Int64         
 33  usdval    720 non-null    Float64       
 34  usdcnt    720 non-null    Int64         
dtypes: Float64(32), Int64(2), datetime64[ns](1)
memory usage: 220.9 KB
df_msix[
    [
        "caldt",
        "vwretd",
        "vwretx",
        "vwindx",
        "ewretd",
        "ewretx",
    ]
].tail()
caldt vwretd vwretx vwindx ewretd ewretx
715 2024-08-30 0.022022 0.02056 4277.321 -0.017553 -0.019342
716 2024-09-30 0.02104 0.019764 4361.857 0.014494 0.012487
717 2024-10-31 -0.007766 -0.00864 4324.172 -0.002514 -0.00355
718 2024-11-29 0.06656 0.065075 4605.566 0.072639 0.07098
719 2024-12-31 -0.031325 -0.032778 4454.604 -0.018721 -0.021791

Inclusion into the CRSP Market Index:#

From https://wrds-www.wharton.upenn.edu/pages/support/support-articles/crsp/index-and-deciles/constructing-value-weighted-return-series-matches-vwretd-crsp-monthly-value-weighted-returns-includes-distributions/ ,

Our experiments with different VWRETD replication methods show that it is relatively easy to come close to this data series using PERMNO-based returns in the CRSP datasets, but exact matches to every data month is not possible because we do not know the exact sample set of PERMNOs used by CRSP. Their criteria is listed in the CRSP manual and is roughly:

CRSP CAP-BASED PORTFOLIOS – The following types of securities, listed on NYSE, AMEX, and Nasdaq National Market, are eligible for inclusion in the Cap-Based Indices:

  • Common Stocks

  • Certificates

  • Shares of Beneficial Interest

  • Units (Depository Units, Units of Beneficial Interest, Units of Limited Partnership Interest, Depository Receipts, etc.)

The following types of securities are NOT eligible for inclusion in the Cap-Based Indices:

  • ADRs

  • Closed-End Mutual Funds, WEBS Index Funds, Unit Investment Trusts

  • All Common Stocks with non-US Incorporation

  • Americus Trust Components

  • HOLDRs Trusts

  • REITs (Real Estate Investment Trusts)

  • Rights and Warrants

  • Preferred stock

  • “Packaged” Units (Common Stocks Bundled with Rights or Warrants)

  • Over-the-Counter Bulletin Board Issues

  • N.B. The Cap-Based Indices do include returns from time ranges during which eligible securities trade on “leading prices” or “reorganization” when-issued status. The Cap-Based Indices do NOT include returns from time ranges during which eligible securities trade on “ex-distribution” or “additional” when-issued status.

Note that VWRETD is not computed by WRDS but provided directly by CRSP along with the PERMNO based returns. For general SAS coding help for this problem see the WRDS Research Application: Portfolios by Size and Book-to-Market. This WRDS Support document provides examples of cap-based decile breakdowns, but the same general principles apply to the total market index.

I’ve provided code for you that will take care of this subsetting in the function pull_CRSP_monthly_file:

    SELECT
        date,
        msf.permno, msf.permco, shrcd, exchcd, comnam, shrcls,
        ret, retx, dlret, dlretx, dlstcd,
        prc, altprc, vol, shrout, cfacshr, cfacpr,
        naics, siccd
    FROM crspm.msf AS msf
    LEFT JOIN
        crspm.msenames as msenames
    ON
        msf.permno = msenames.permno AND
        msenames.namedt <= msf.date AND
        msf.date <= msenames.nameendt
    LEFT JOIN
        crspm.msedelist as msedelist
    ON
        msf.permno = msedelist.permno AND
        date_trunc('month', msf.date)::date =
        date_trunc('month', msedelist.dlstdt)::date
    WHERE
        msf.date BETWEEN '{start_date}' AND '{end_date}' AND
        msenames.shrcd IN (10, 11, 20, 21, 40, 41, 70, 71, 73)

To best understand this, please look up shrcd in the Data Manual here: https://wrds-www.wharton.upenn.edu/documents/396/CRSP_US_Stock_Indices_Data_Descriptions.pdf . You’ll find the information on p. 81.

Calculation of Equal-Weighted Returns and Value-Weighted Returns#

With the proper universe of stocks in hand, all that is left is to group the returns by permno (the identifier of choice here) and average. However, the equal weighted average is a mere simple average. To calculate the value-weighted average, we need to calculate the lagged market cap of each stock \(i\) at time \(t\).

That is, the value-weighted return is given by the following formula:

\[ r_t = \frac{\sum_{i=1}^{N_t} w_{i,t-1} \, r_{i,t}}{\sum_{i=1}^{N_t} w_{i,t-1}} \]

where \(w_{i,t-1}\) is the market capitalization of stock \(i\) at time \(t-1\) and \(r_t\) can be the returns with dividends ret or the returns without dividends retx. The market capitalization of a stock is its price times the shares outstanding, $\( w_{it} = \text{SHROUT}_{it} \times \text{PRC}_{it}. \)$

df_eq_idx = calc_CRSP_indices.calc_equal_weighted_index(df_msf)
df_vw_idx = calc_CRSP_indices.calc_CRSP_value_weighted_index(df_msf)
df_idxs = calc_CRSP_indices.calc_CRSP_indices_merge(df_msf, df_msix)
df_idxs[
    [
        "vwretd",
        "vwretx",
        "ewretd",
        "ewretx",
        "vwretd_manual",
        "vwretx_manual",
        "ewretd_manual",
        "ewretx_manual",
    ]
].head()
vwretd vwretx ewretd ewretx vwretd_manual vwretx_manual ewretd_manual ewretx_manual
date
1965-03-31 -0.009715 -0.011428 0.015393 0.012949 -0.010013 -0.011732 0.015417 0.01297
1965-04-30 0.033652 0.032737 0.04156 0.040315 0.033658 0.032725 0.04163 0.040378
1965-06-30 -0.051868 -0.053592 -0.083718 -0.086084 -0.051935 -0.053614 -0.083916 -0.086195
1965-08-31 0.030883 0.026133 0.041844 0.039063 0.030763 0.02599 0.041557 0.038771
1965-09-30 0.031952 0.030357 0.02742 0.025166 0.032055 0.030437 0.027271 0.025007
df_idxs[
    [
        "vwretd",
        "vwretx",
        "ewretd",
        "ewretx",
        "vwretd_manual",
        "vwretx_manual",
        "ewretd_manual",
        "ewretx_manual",
    ]
].corr()
vwretd vwretx ewretd ewretx vwretd_manual vwretx_manual ewretd_manual ewretx_manual
vwretd 1.000000 0.999479 0.859245 0.859267 0.999565 0.999044 0.857187 0.857211
vwretx 0.999479 1.000000 0.859486 0.859744 0.999058 0.999561 0.857472 0.857772
ewretd 0.859245 0.859486 1.000000 0.999933 0.860192 0.860531 0.999121 0.999037
ewretx 0.859267 0.859744 0.999933 1.000000 0.860217 0.860792 0.999051 0.999095
vwretd_manual 0.999565 0.999058 0.860192 0.860217 1.000000 0.999494 0.858262 0.858286
vwretx_manual 0.999044 0.999561 0.860531 0.860792 0.999494 1.000000 0.858655 0.858957
ewretd_manual 0.857187 0.857472 0.999121 0.999051 0.858262 0.858655 1.000000 0.999930
ewretx_manual 0.857211 0.857772 0.999037 0.999095 0.858286 0.858957 0.999930 1.000000

As you can see above, our manually-created return index doesn’t match the CRSP index perfectly but is still very close. In this HW, you’ll be required to construct this index only approximately. A loose match, as seen here, will be fine.

Note, a helpful tool to create the lagged time series for market capitalization is provided in misc_tools. Use the function with_lagged_column, which will create a lagged column that accounts for the fact that multiple stocks show up in a flat file. See the following example:

a = [
    [1, "1990/1/1", 1],
    [1, "1990/2/1", 2],
    [1, "1990/3/1", 3],
    [2, "1989/12/1", 3],
    [2, "1990/1/1", 3],
    [2, "1990/2/1", 4],
    [2, "1990/3/1", 5.5],
    [2, "1990/4/1", 5],
    [2, "1990/6/1", 6],
]
data = pd.DataFrame(a, columns=["id", "date", "value"])
data["date"] = pd.to_datetime(data["date"])
data
id date value
0 1 1990-01-01 1.0
1 1 1990-02-01 2.0
2 1 1990-03-01 3.0
3 2 1989-12-01 3.0
4 2 1990-01-01 3.0
5 2 1990-02-01 4.0
6 2 1990-03-01 5.5
7 2 1990-04-01 5.0
8 2 1990-06-01 6.0
data_lag = misc_tools.with_lagged_columns(
    df=data, column_to_lag="value", id_column="id", lags=1, freq="MS"
)
data_lag
id date value L1_value
2 1 1990-01-01 1.0 NaN
4 1 1990-02-01 2.0 1.0
6 1 1990-03-01 3.0 2.0
8 1 1990-04-01 NaN 3.0
1 2 1989-12-01 3.0 NaN
3 2 1990-01-01 3.0 3.0
5 2 1990-02-01 4.0 3.0
7 2 1990-03-01 5.5 4.0
9 2 1990-04-01 5.0 5.5
11 2 1990-05-01 NaN 5.0
13 2 1990-06-01 6.0 NaN

As you can see, naively using shift to create our lag would miss the fact that observation 1989-12-01 for stock id=2 should have a missing lagged value. For example, the following would be incorrect:

data["value"].shift(1)
0    NaN
1    1.0
2    2.0
3    3.0
4    3.0
5    3.0
6    4.0
7    5.5
8    5.0
Name: value, dtype: float64