CRSP Treasury Data Oveview#
The CRSP US Treasury Database is a comprehensive source of historical Treasury security data maintained by the Center for Research in Security Prices at the University of Chicago’s Booth School of Business. This section provides an essential overview of this critical dataset, which will serve as the foundation for our subsequent lecture on replicating the Gürkaynak, Sack, and Wright (2006) yield curve estimation methodology.
Database Description#
The CRSP US Treasury Database provides complete historical descriptive information and market data for US Treasury securities, including:
Prices
Returns
Accrued interest
Yields
Durations
The database begins in 1961 and is updated monthly. The data is available in ASCII, Excel, and SAS formats.
Let’s examine a sample of the data using Python. First, we’ll import the necessary modules:
import pull_CRSP_treasury
import pandas as pd
from settings import config
DATA_DIR = config("DATA_DIR")
# Load the consolidated CRSP Treasury data
df = pull_CRSP_treasury.load_CRSP_treasury_consolidated(data_dir=DATA_DIR)
# Display the first few rows and data info
df[['caldt', 'tcusip', 'price', 'tcouprt', 'tdyld']].head()
caldt | tcusip | price | tcouprt | tdyld | |
---|---|---|---|---|---|
0 | 1970-01-02 | 912810AE | 101.052989 | 4.0 | 0.000212 |
1 | 1970-01-05 | 912810AE | 101.148098 | 4.0 | 0.000205 |
2 | 1970-01-06 | 912810AE | 101.190217 | 4.0 | 0.000199 |
3 | 1970-01-07 | 912810AE | 101.201087 | 4.0 | 0.000202 |
4 | 1970-01-08 | 912810AE | 101.211957 | 4.0 | 0.000204 |
# Data structure information:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2444137 entries, 0 to 2444136
Data columns (total 21 columns):
# Column Dtype
--- ------ -----
0 kytreasno float64
1 kycrspid object
2 tcusip object
3 caldt datetime64[ns]
4 tdatdt datetime64[ns]
5 tmatdt datetime64[ns]
6 tfcaldt int64
7 tdbid float64
8 tdask float64
9 tdaccint float64
10 tdyld float64
11 price float64
12 tcouprt float64
13 itype float64
14 original_maturity float64
15 years_to_maturity float64
16 tdduratn float64
17 tdretnua float64
18 days_to_maturity int64
19 callable bool
20 run int64
dtypes: bool(1), datetime64[ns](3), float64(12), int64(3), object(2)
memory usage: 375.3+ MB
In the consolidated dataframe that I have constructed from the data
in pull_CRSP_treasury.py
, the following variables are present:
Identification:
kytreasno
: Treasury record identifier (unique numeric ID)kycrspid
: CRSP-assigned unique identifiertcusip
: Treasury CUSIP number (unique security identifier)
Dates:
caldt
: Quote date (date of price observation)tdatdt
: Date dated (original issue date when interest starts accruing)tmatdt
: Maturity date (when principal is repaid)tfcaldt
: First call date (0 if not callable)
Prices and Yields:
tdbid
: Daily bid price (clean price)tdask
: Daily ask price (clean price)tdaccint
: Daily accrued interest since last coupontdyld
: Daily yield (bond equivalent yield)price
: Dirty price (clean price + accrued interest)
Issue Characteristics:
tcouprt
: Coupon rate (annual)itype
: Type of issue (1: bonds, 2: notes)callable
: Boolean flag indicating if security is callable
Maturity Measures:
original_maturity
: Years to maturity at issuanceyears_to_maturity
: Remaining years to maturitydays_to_maturity
: Remaining days to maturity
Trading Information:
tdduratn
: Duration (price sensitivity to yield changes)tdretnua
: Unadjusted return (price change + accrued interest)
Market Structure:
run
: Security’s “runness” (0: off-the-run, 1: first off-the-run, 2: on-the-run)
Data Sources#
The price data in CRSP comes from several sources over time:
Note
Historical Data Sources:
1961-1989: Manual input with double-entry verification
1990-1996 (Sept): Department of Commerce electronic bulletin board
1996 (Oct)-2009 (Jan): GovPX, Inc.
2009 (Feb)-Present: ICAP
Supplementary data comes from various official sources:
Amount outstanding: Monthly Statement of the Public Debt (Treasury Department)
Publicly held debt: US Treasury Bulletin (quarterly)
Money rates: Federal Reserve
Issue characteristics (dates, coupon payments, eligibility, tax status): US Treasury Department
Database Structure#
The database consists of three primary files:
Calendar File
Contains daily quote dates and delivery dates
Includes derived date information
Master File
Contains security identification and characteristics
Includes header information for each security
Cross-Sectional Files
Contains time series of prices, yields, and returns
Organized by date
Reference:
CRSP US TREASURY DATABASE GUIDE
https://www.crsp.org/wp-content/uploads/guides/CRSP_US_Treasury_Database_Guide_for_SAS_ASCII_EXCEL_R.pdf
Data Description:
TFZ_DLY ( DAILY TIME SERIES ITEMS)
kytreasno: TREASURY RECORD IDENTIFIER
kycrspid: CRSP-ASSIGNED UNIQUE ID
caldt: QUOTATION DATE
tdbid: DAILY BID
tdask: DAILY ASK
tdaccint: DAILY SERIES OF TOTAL ACCRUED INTEREST
tdyld: DAILY SERIES OF PROMISED DAILY YIELD
TFZ_ISS (ISSUE DESCRIPTIONS)
tcusip: TREASURY CUSIP
tdatdt: DATE DATED BY TREASURY
tmatdt: MATURITY DATE AT TIME OF ISSUE
tcouprt: COUPON RATE
itype: TYPE OF ISSUE (1: NONCALLABLE BONDS, 2: NONCALLABLE NOTES)
See for example, this function in pull_CRSP_treasury.py
:
def pull_CRSP_treasury_consolidated(
start_date="1970-01-01",
end_date=datetime.today().strftime("%Y-%m-%d"),
wrds_username=WRDS_USERNAME,
):
"""Pull consolidated CRSP Treasury data with all relevant fields.
Includes fields from these tables:
- tfz_dly (daily quotes): bid/ask prices, accrued interest, yields
- tfz_iss (issue info): CUSIP, dates, coupon rates, issue types
Price Terminology:
- Clean Price = (bid + ask)/2 = quoted price without accrued interest
- Dirty Price = Clean Price + Accrued Interest = actual transaction price
Date Fields:
- Quote Date (caldt): Date of the price observation
- Date Dated (tdatdt): Original issue date when interest starts accruing
- Maturity Date (tmatdt): Date when the security matures
Returns:
- Unadjusted Return (tdretnua): Simple price change plus accrued interest,
not accounting for tax effects or reinvestment
Fields are based on CRSP Daily US Treasury Database Guide specifications.
"""
query = f"""
SELECT
-- Identification
tfz.kytreasno, tfz.kycrspid, iss.tcusip,
-- Dates
tfz.caldt, -- Quote date: Date of price observation
iss.tdatdt, -- Date dated: Original issue date when interest starts accruing
iss.tmatdt, -- Maturity date: When principal is repaid
iss.tfcaldt, -- First call date (0 if not callable)
-- Prices and Yields
tfz.tdbid, -- Bid price (clean)
tfz.tdask, -- Ask price (clean)
tfz.tdaccint, -- Accrued interest since last coupon
tfz.tdyld, -- Bond equivalent yield
((tfz.tdbid + tfz.tdask) / 2.0 + tfz.tdaccint) AS price,
-- Dirty price (clean price + accrued interest)
-- Issue Characteristics
iss.tcouprt, -- Coupon rate (annual)
iss.itype, -- Type of issue (1: bonds, 2: notes)
ROUND((iss.tmatdt - iss.tdatdt) / 365.0) AS original_maturity,
-- Original maturity at issuance
-- Additional Derived Fields
ROUND((iss.tmatdt - tfz.caldt) / 365.0) AS years_to_maturity,
-- Remaining time to maturity
-- Trading info (if available)
tfz.tdduratn, -- Duration: Price sensitivity to yield changes
tfz.tdretnua -- Return (unadjusted): Simple price change + accrued interest
FROM
crspm.tfz_dly AS tfz
LEFT JOIN
crspm.tfz_iss AS iss
ON
tfz.kytreasno = iss.kytreasno AND
tfz.kycrspid = iss.kycrspid
WHERE
tfz.caldt BETWEEN '{start_date}' AND '{end_date}' AND
iss.itype IN (1, 2) -- Only include Treasury bonds (1) and notes (2)
"""
db = wrds.Connection(wrds_username=wrds_username)
df = db.raw_sql(query, date_cols=["caldt", "tdatdt", "tmatdt", "tfcaldt"])
df["days_to_maturity"] = (df["tmatdt"] - df["caldt"]).dt.days
df["tfcaldt"] = df["tfcaldt"].fillna(0)
df["callable"] = df["tfcaldt"] != 0 # Add boolean callable flag
db.close()
df = df.reset_index(drop=True)
return df
Data Quality and Filtering#
CRSP implements several quality control measures:
Price verification:
Historical manual prices were double-entered and compared
Multiple iteration verification process
Logical filters for data cleaning
Spread calculation:
Pre-2009: Imputed from representative quotes
Post-2009: Actual bid-ask spreads from ICAP
Secondary verification:
Cross-checking against sources like Wall Street Journal
Internal consistency checks with each release
Let’s examine the bid-ask spreads in our data:
df['spread'] = df['tdask'] - df['tdbid']
print("\nBid-Ask Spread Statistics:")
df.groupby(df['caldt'].dt.year)['spread'].describe().tail()
Bid-Ask Spread Statistics:
count | mean | std | min | 25% | 50% | 75% | max | |
---|---|---|---|---|---|---|---|---|
caldt | ||||||||
2020 | 78765.0 | 0.048491 | 0.017829 | 0.007812 | 0.039062 | 0.046875 | 0.062500 | 2.070312 |
2021 | 80632.0 | 0.047548 | 0.013401 | 0.007812 | 0.039062 | 0.046875 | 0.062500 | 0.218750 |
2022 | 81005.0 | 0.041079 | 0.016491 | 0.007812 | 0.031250 | 0.039062 | 0.046875 | 0.328125 |
2023 | 83652.0 | 0.039486 | 0.017600 | 0.007812 | 0.031250 | 0.031250 | 0.046875 | 0.546875 |
2024 | 85415.0 | 0.039168 | 0.019176 | 0.000028 | 0.031250 | 0.031250 | 0.046875 | 0.500000 |
df.groupby(df['caldt'].dt.year)['spread'].describe().head()
count | mean | std | min | 25% | 50% | 75% | max | |
---|---|---|---|---|---|---|---|---|
caldt | ||||||||
1970 | 9461.0 | 0.293911 | 0.234581 | 0.03125 | 0.125 | 0.25 | 0.25 | 1.0 |
1971 | 9924.0 | 0.321119 | 0.279893 | 0.03125 | 0.125 | 0.25 | 0.25 | 1.0 |
1972 | 10442.0 | 0.322744 | 0.244351 | 0.03125 | 0.250 | 0.25 | 0.25 | 1.0 |
1973 | 10733.0 | 0.327052 | 0.261802 | 0.03125 | 0.250 | 0.25 | 0.25 | 1.0 |
1974 | 11051.0 | 0.387957 | 0.328981 | 0.06250 | 0.250 | 0.25 | 0.25 | 1.0 |
Looking Ahead#
This overview provides the foundation for our next lecture, where we will examine how GĂĽrkaynak, Sack, and Wright (2006) used CRSP Treasury data to construct their influential yield curve estimates. We will implement their methodology using Python, focusing on their specific data filtering approach and estimation techniques.
Note: The CRSP Treasury Database serves as the primary source for many influential academic studies in fixed income research. Understanding its structure and evolution is crucial for empirical work in this area.
References#
Center for Research in Security Prices. (2010). Daily US Treasury Database Guide. See https://wrds-www.wharton.upenn.edu/pages/support/manuals-and-overviews/crsp/treasuries/