{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Example: Connecting to the WRDS Platform With Python\n",
"\n",
"\n",
"The `wrds` Python package is a tool designed to facilitate data retrieval from the Wharton Research Data Services (WRDS) database. \n",
" \n",
"- It provides direct access to the WRDS database, allowing users to programmatically query and retrieve data. \n",
"- The package supports a simple Python API as well as the ability to send raw SQL queries.\n",
"- Retrieved data can be easily converted into Pandas DataFrames.\n",
"- Secure access to the WRDS database is managed through user authentication, ensuring data security and compliance with usage policies.\n",
"\n",
"## Installation\n",
"\n",
"The package can be installed via pip:\n",
"\n",
"```\n",
"pip install wrds\n",
"```\n",
"\n",
"To use the package, one would typically import it in Python, authenticate with WRDS credentials, and then proceed with data queries and analysis. This notebook walks through some of these basic features.\n",
"\n",
"## Usage\n",
"\n",
"Here are some helpful links to learn how to use the `wrds` Python package:\n",
"\n",
" - [Video: Using Python on the WRDS Platform](https://wrds-www.wharton.upenn.edu/pages/grid-items/using-python-wrds-platform/)\n",
" - [PyPI Homepage for WRDS Python Package](https://pypi.org/project/wrds/)\n",
" - [WRDS Guide: Querying WRDS Data using Python](https://wrds-www.wharton.upenn.edu/pages/support/programming-wrds/programming-python/querying-wrds-data-python/)\n",
" - [WRDS Python Package Demo Walkthrough](https://wrds-www.wharton.upenn.edu/documents/1443/wrds_connection.html)\n",
"\n",
"\n",
"### Import and Establish Connection\n",
"\n",
"Establish connection with WRDS server. \n",
"Log in using your WRDS username and password.\n",
"Set up a `pgpass` file to store the info. This will allow you to access WRDS without supplying your password each time, as long as you supply your username (which we are doing below via environment variables). We want to be able to access WRDS without a password so that we can automate the query."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import config\n",
"from pathlib import Path\n",
"OUTPUT_DIR = Path(config.OUTPUT_DIR)\n",
"DATA_DIR = Path(config.DATA_DIR)\n",
"WRDS_USERNAME = config.WRDS_USERNAME\n",
"\n",
"import wrds\n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Loading library list...\n",
"Done\n"
]
}
],
"source": [
"db = wrds.Connection(wrds_username=WRDS_USERNAME)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" cusip | \n",
" permno | \n",
" date | \n",
" shrout | \n",
" prc | \n",
" ret | \n",
" retx | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 68391610 | \n",
" 10000.0 | \n",
" 1985-12-31 | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 1 | \n",
" 68391610 | \n",
" 10000.0 | \n",
" 1986-01-31 | \n",
" 3680.0 | \n",
" -4.375000 | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 2 | \n",
" 68391610 | \n",
" 10000.0 | \n",
" 1986-02-28 | \n",
" 3680.0 | \n",
" -3.250000 | \n",
" -0.257143 | \n",
" -0.257143 | \n",
"
\n",
" \n",
" 3 | \n",
" 68391610 | \n",
" 10000.0 | \n",
" 1986-03-31 | \n",
" 3680.0 | \n",
" -4.437500 | \n",
" 0.365385 | \n",
" 0.365385 | \n",
"
\n",
" \n",
" 4 | \n",
" 68391610 | \n",
" 10000.0 | \n",
" 1986-04-30 | \n",
" 3793.0 | \n",
" -4.000000 | \n",
" -0.098592 | \n",
" -0.098592 | \n",
"
\n",
" \n",
" 5 | \n",
" 68391610 | \n",
" 10000.0 | \n",
" 1986-05-30 | \n",
" 3793.0 | \n",
" -3.109375 | \n",
" -0.222656 | \n",
" -0.222656 | \n",
"
\n",
" \n",
" 6 | \n",
" 68391610 | \n",
" 10000.0 | \n",
" 1986-06-30 | \n",
" 3793.0 | \n",
" -3.093750 | \n",
" -0.005025 | \n",
" -0.005025 | \n",
"
\n",
" \n",
" 7 | \n",
" 68391610 | \n",
" 10000.0 | \n",
" 1986-07-31 | \n",
" 3793.0 | \n",
" -2.843750 | \n",
" -0.080808 | \n",
" -0.080808 | \n",
"
\n",
" \n",
" 8 | \n",
" 68391610 | \n",
" 10000.0 | \n",
" 1986-08-29 | \n",
" 3793.0 | \n",
" -1.093750 | \n",
" -0.615385 | \n",
" -0.615385 | \n",
"
\n",
" \n",
" 9 | \n",
" 68391610 | \n",
" 10000.0 | \n",
" 1986-09-30 | \n",
" 3793.0 | \n",
" -1.031250 | \n",
" -0.057143 | \n",
" -0.057143 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" cusip permno date shrout prc ret retx\n",
"0 68391610 10000.0 1985-12-31 NaN NaN NaN NaN\n",
"1 68391610 10000.0 1986-01-31 3680.0 -4.375000 NaN NaN\n",
"2 68391610 10000.0 1986-02-28 3680.0 -3.250000 -0.257143 -0.257143\n",
"3 68391610 10000.0 1986-03-31 3680.0 -4.437500 0.365385 0.365385\n",
"4 68391610 10000.0 1986-04-30 3793.0 -4.000000 -0.098592 -0.098592\n",
"5 68391610 10000.0 1986-05-30 3793.0 -3.109375 -0.222656 -0.222656\n",
"6 68391610 10000.0 1986-06-30 3793.0 -3.093750 -0.005025 -0.005025\n",
"7 68391610 10000.0 1986-07-31 3793.0 -2.843750 -0.080808 -0.080808\n",
"8 68391610 10000.0 1986-08-29 3793.0 -1.093750 -0.615385 -0.615385\n",
"9 68391610 10000.0 1986-09-30 3793.0 -1.031250 -0.057143 -0.057143"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# The `obs` keyword argument limits the number of rows returned.\n",
"# You can omit it to get all rows.\n",
"df = db.get_table(library='crsp', table='msf', columns=['cusip', 'permno', 'date', 'shrout', 'prc', 'ret', 'retx'], obs=10)\n",
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exploring the Data\n",
"\n",
"As described [here](https://wrds-www.wharton.upenn.edu/pages/support/programming-wrds/programming-python/querying-wrds-data-python/), \n",
"\n",
"> Data at WRDS is organized in a hierarchical manner by vendor (e.g. crsp), referred to at the top-level as libraries. Each library contains a number of component tables or datasets (e.g. dsf) which contain the actual data in tabular format, with column headers called variables (such as date, askhi, bidlo, etc).\n",
"> \n",
"> You can analyze the structure of the data through its metadata using the wrds module, as outlined in the following steps:\n",
">\n",
"> - List all available libraries at WRDS using list_libraries()\n",
"> - Select a library to work with, and list all available datasets within that library using list_tables()\n",
"> - Select a dataset, and list all available variables (column headers) within that dataset using describe_table()\n",
">\n",
"> **NOTE:** When referencing library and dataset names, you must use all lowercase.\n",
">\n",
"> Alternatively, a comprehensive list of all WRDS libraries is available at the Dataset List. This resource provides a listing of each library, their component datasets and variables, as well as a tabular database preview feature, and is helpful in establishing the structure of the data you're looking for in an easy manner from a Web browser.\n",
">\n",
"> 1. Determine the libraries available at WRDS:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['aha_sample',\n",
" 'ahasamp',\n",
" 'audit',\n",
" 'audit_acct_os',\n",
" 'audit_audit_comp',\n",
" 'audit_common',\n",
" 'audit_corp_legal',\n",
" 'auditsmp',\n",
" 'auditsmp_all',\n",
" 'bank']"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sorted(db.list_libraries())[0:10]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> This will list all libraries available at WRDS in alphabetical order. Though all libraries will be shown, you must have a valid, current subscription for a library in order to access it via Python, just as with SAS or any other supported programming language at WRDS. You will receive an error message indicating this if you attempt to query a table to which your institution does not have access.\n",
">\n",
"> 2. To determine the datasets within a given library:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['acti',\n",
" 'asia',\n",
" 'asib',\n",
" 'asic',\n",
" 'asio',\n",
" 'asix',\n",
" 'bmdebt',\n",
" 'bmheader',\n",
" 'bmpaymts',\n",
" 'bmquotes']"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"db.list_tables(library=\"crsp\")[0:10]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> Where 'library; is a dataset, such as crsp or comp, as returned from step 1 above.\n",
">\n",
"> 3. To determine the column headers (variables) within a given dataset:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Approximately 4922867 rows in crsp.msf.\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" name | \n",
" nullable | \n",
" type | \n",
" comment | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" cusip | \n",
" True | \n",
" VARCHAR(8) | \n",
" None | \n",
"
\n",
" \n",
" 1 | \n",
" permno | \n",
" True | \n",
" DOUBLE_PRECISION | \n",
" None | \n",
"
\n",
" \n",
" 2 | \n",
" permco | \n",
" True | \n",
" DOUBLE_PRECISION | \n",
" None | \n",
"
\n",
" \n",
" 3 | \n",
" issuno | \n",
" True | \n",
" DOUBLE_PRECISION | \n",
" None | \n",
"
\n",
" \n",
" 4 | \n",
" hexcd | \n",
" True | \n",
" DOUBLE_PRECISION | \n",
" None | \n",
"
\n",
" \n",
" 5 | \n",
" hsiccd | \n",
" True | \n",
" DOUBLE_PRECISION | \n",
" None | \n",
"
\n",
" \n",
" 6 | \n",
" date | \n",
" True | \n",
" DATE | \n",
" None | \n",
"
\n",
" \n",
" 7 | \n",
" bidlo | \n",
" True | \n",
" DOUBLE_PRECISION | \n",
" None | \n",
"
\n",
" \n",
" 8 | \n",
" askhi | \n",
" True | \n",
" DOUBLE_PRECISION | \n",
" None | \n",
"
\n",
" \n",
" 9 | \n",
" prc | \n",
" True | \n",
" DOUBLE_PRECISION | \n",
" None | \n",
"
\n",
" \n",
" 10 | \n",
" vol | \n",
" True | \n",
" DOUBLE_PRECISION | \n",
" None | \n",
"
\n",
" \n",
" 11 | \n",
" ret | \n",
" True | \n",
" DOUBLE_PRECISION | \n",
" None | \n",
"
\n",
" \n",
" 12 | \n",
" bid | \n",
" True | \n",
" DOUBLE_PRECISION | \n",
" None | \n",
"
\n",
" \n",
" 13 | \n",
" ask | \n",
" True | \n",
" DOUBLE_PRECISION | \n",
" None | \n",
"
\n",
" \n",
" 14 | \n",
" shrout | \n",
" True | \n",
" DOUBLE_PRECISION | \n",
" None | \n",
"
\n",
" \n",
" 15 | \n",
" cfacpr | \n",
" True | \n",
" DOUBLE_PRECISION | \n",
" None | \n",
"
\n",
" \n",
" 16 | \n",
" cfacshr | \n",
" True | \n",
" DOUBLE_PRECISION | \n",
" None | \n",
"
\n",
" \n",
" 17 | \n",
" altprc | \n",
" True | \n",
" DOUBLE_PRECISION | \n",
" None | \n",
"
\n",
" \n",
" 18 | \n",
" spread | \n",
" True | \n",
" DOUBLE_PRECISION | \n",
" None | \n",
"
\n",
" \n",
" 19 | \n",
" altprcdt | \n",
" True | \n",
" DATE | \n",
" None | \n",
"
\n",
" \n",
" 20 | \n",
" retx | \n",
" True | \n",
" DOUBLE_PRECISION | \n",
" None | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" name nullable type comment\n",
"0 cusip True VARCHAR(8) None\n",
"1 permno True DOUBLE_PRECISION None\n",
"2 permco True DOUBLE_PRECISION None\n",
"3 issuno True DOUBLE_PRECISION None\n",
"4 hexcd True DOUBLE_PRECISION None\n",
"5 hsiccd True DOUBLE_PRECISION None\n",
"6 date True DATE None\n",
"7 bidlo True DOUBLE_PRECISION None\n",
"8 askhi True DOUBLE_PRECISION None\n",
"9 prc True DOUBLE_PRECISION None\n",
"10 vol True DOUBLE_PRECISION None\n",
"11 ret True DOUBLE_PRECISION None\n",
"12 bid True DOUBLE_PRECISION None\n",
"13 ask True DOUBLE_PRECISION None\n",
"14 shrout True DOUBLE_PRECISION None\n",
"15 cfacpr True DOUBLE_PRECISION None\n",
"16 cfacshr True DOUBLE_PRECISION None\n",
"17 altprc True DOUBLE_PRECISION None\n",
"18 spread True DOUBLE_PRECISION None\n",
"19 altprcdt True DATE None\n",
"20 retx True DOUBLE_PRECISION None"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"db.describe_table(library=\"crsp\", table=\"msf\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> Where 'library' is a dataset such as crsp as returned from #1 above and 'table' is a component database within that library, such as msf, as returned from query #2 above. Remember that both the library and the dataset are case-sensitive, and must be all-lowercase.\n",
">\n",
"> Alternatively, a comprehensive list of all WRDS libraries is available via the WRDS Dataset List. This online resource provides a listing of each library, their component datasets and variables, as well as a tabular database preview feature, and is helpful in establishing the structure of the data you're looking for in an easy, web-friendly manner.\n",
">\n",
"> By examining the metadata available to us -- the structure of the data -- we've determined how to reference the data we're researching, and what variables are available within that data. We can now perform our actual research, creating data queries, which are explored in depth in the next section."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Querying WRDS Data\n",
"\n",
"Continue the walkthrough provided here: https://wrds-www.wharton.upenn.edu/pages/support/programming-wrds/programming-python/querying-wrds-data-python/"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Using `get_table`"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" cusip | \n",
" permno | \n",
" date | \n",
" shrout | \n",
" prc | \n",
" ret | \n",
" retx | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 68391610 | \n",
" 10000.0 | \n",
" 1985-12-31 | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 1 | \n",
" 68391610 | \n",
" 10000.0 | \n",
" 1986-01-31 | \n",
" 3680.0 | \n",
" -4.375000 | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 2 | \n",
" 68391610 | \n",
" 10000.0 | \n",
" 1986-02-28 | \n",
" 3680.0 | \n",
" -3.250000 | \n",
" -0.257143 | \n",
" -0.257143 | \n",
"
\n",
" \n",
" 3 | \n",
" 68391610 | \n",
" 10000.0 | \n",
" 1986-03-31 | \n",
" 3680.0 | \n",
" -4.437500 | \n",
" 0.365385 | \n",
" 0.365385 | \n",
"
\n",
" \n",
" 4 | \n",
" 68391610 | \n",
" 10000.0 | \n",
" 1986-04-30 | \n",
" 3793.0 | \n",
" -4.000000 | \n",
" -0.098592 | \n",
" -0.098592 | \n",
"
\n",
" \n",
" 5 | \n",
" 68391610 | \n",
" 10000.0 | \n",
" 1986-05-30 | \n",
" 3793.0 | \n",
" -3.109375 | \n",
" -0.222656 | \n",
" -0.222656 | \n",
"
\n",
" \n",
" 6 | \n",
" 68391610 | \n",
" 10000.0 | \n",
" 1986-06-30 | \n",
" 3793.0 | \n",
" -3.093750 | \n",
" -0.005025 | \n",
" -0.005025 | \n",
"
\n",
" \n",
" 7 | \n",
" 68391610 | \n",
" 10000.0 | \n",
" 1986-07-31 | \n",
" 3793.0 | \n",
" -2.843750 | \n",
" -0.080808 | \n",
" -0.080808 | \n",
"
\n",
" \n",
" 8 | \n",
" 68391610 | \n",
" 10000.0 | \n",
" 1986-08-29 | \n",
" 3793.0 | \n",
" -1.093750 | \n",
" -0.615385 | \n",
" -0.615385 | \n",
"
\n",
" \n",
" 9 | \n",
" 68391610 | \n",
" 10000.0 | \n",
" 1986-09-30 | \n",
" 3793.0 | \n",
" -1.031250 | \n",
" -0.057143 | \n",
" -0.057143 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" cusip permno date shrout prc ret retx\n",
"0 68391610 10000.0 1985-12-31 NaN NaN NaN NaN\n",
"1 68391610 10000.0 1986-01-31 3680.0 -4.375000 NaN NaN\n",
"2 68391610 10000.0 1986-02-28 3680.0 -3.250000 -0.257143 -0.257143\n",
"3 68391610 10000.0 1986-03-31 3680.0 -4.437500 0.365385 0.365385\n",
"4 68391610 10000.0 1986-04-30 3793.0 -4.000000 -0.098592 -0.098592\n",
"5 68391610 10000.0 1986-05-30 3793.0 -3.109375 -0.222656 -0.222656\n",
"6 68391610 10000.0 1986-06-30 3793.0 -3.093750 -0.005025 -0.005025\n",
"7 68391610 10000.0 1986-07-31 3793.0 -2.843750 -0.080808 -0.080808\n",
"8 68391610 10000.0 1986-08-29 3793.0 -1.093750 -0.615385 -0.615385\n",
"9 68391610 10000.0 1986-09-30 3793.0 -1.031250 -0.057143 -0.057143"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# The `obs` keyword argument limits the number of rows returned.\n",
"# You can omit it to get all rows.\n",
"df = db.get_table(library='crsp', table='msf', columns=['cusip', 'permno', 'date', 'shrout', 'prc', 'ret', 'retx'], obs=10)\n",
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Using `raw_sql`"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" cusip | \n",
" permno | \n",
" date | \n",
" shrout | \n",
" prc | \n",
" ret | \n",
" retx | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 68391610 | \n",
" 10000.0 | \n",
" 1985-12-31 | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 1 | \n",
" 68391610 | \n",
" 10000.0 | \n",
" 1986-01-31 | \n",
" 3680.0 | \n",
" -4.375000 | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 2 | \n",
" 68391610 | \n",
" 10000.0 | \n",
" 1986-02-28 | \n",
" 3680.0 | \n",
" -3.250000 | \n",
" -0.257143 | \n",
" -0.257143 | \n",
"
\n",
" \n",
" 3 | \n",
" 68391610 | \n",
" 10000.0 | \n",
" 1986-03-31 | \n",
" 3680.0 | \n",
" -4.437500 | \n",
" 0.365385 | \n",
" 0.365385 | \n",
"
\n",
" \n",
" 4 | \n",
" 68391610 | \n",
" 10000.0 | \n",
" 1986-04-30 | \n",
" 3793.0 | \n",
" -4.000000 | \n",
" -0.098592 | \n",
" -0.098592 | \n",
"
\n",
" \n",
" 5 | \n",
" 68391610 | \n",
" 10000.0 | \n",
" 1986-05-30 | \n",
" 3793.0 | \n",
" -3.109375 | \n",
" -0.222656 | \n",
" -0.222656 | \n",
"
\n",
" \n",
" 6 | \n",
" 68391610 | \n",
" 10000.0 | \n",
" 1986-06-30 | \n",
" 3793.0 | \n",
" -3.093750 | \n",
" -0.005025 | \n",
" -0.005025 | \n",
"
\n",
" \n",
" 7 | \n",
" 68391610 | \n",
" 10000.0 | \n",
" 1986-07-31 | \n",
" 3793.0 | \n",
" -2.843750 | \n",
" -0.080808 | \n",
" -0.080808 | \n",
"
\n",
" \n",
" 8 | \n",
" 68391610 | \n",
" 10000.0 | \n",
" 1986-08-29 | \n",
" 3793.0 | \n",
" -1.093750 | \n",
" -0.615385 | \n",
" -0.615385 | \n",
"
\n",
" \n",
" 9 | \n",
" 68391610 | \n",
" 10000.0 | \n",
" 1986-09-30 | \n",
" 3793.0 | \n",
" -1.031250 | \n",
" -0.057143 | \n",
" -0.057143 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" cusip permno date shrout prc ret retx\n",
"0 68391610 10000.0 1985-12-31 NaN NaN NaN NaN\n",
"1 68391610 10000.0 1986-01-31 3680.0 -4.375000 NaN NaN\n",
"2 68391610 10000.0 1986-02-28 3680.0 -3.250000 -0.257143 -0.257143\n",
"3 68391610 10000.0 1986-03-31 3680.0 -4.437500 0.365385 0.365385\n",
"4 68391610 10000.0 1986-04-30 3793.0 -4.000000 -0.098592 -0.098592\n",
"5 68391610 10000.0 1986-05-30 3793.0 -3.109375 -0.222656 -0.222656\n",
"6 68391610 10000.0 1986-06-30 3793.0 -3.093750 -0.005025 -0.005025\n",
"7 68391610 10000.0 1986-07-31 3793.0 -2.843750 -0.080808 -0.080808\n",
"8 68391610 10000.0 1986-08-29 3793.0 -1.093750 -0.615385 -0.615385\n",
"9 68391610 10000.0 1986-09-30 3793.0 -1.031250 -0.057143 -0.057143"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = db.raw_sql(\n",
" \"\"\"\n",
" SELECT\n",
" cusip, permno, date, shrout, prc, ret, retx\n",
" FROM \n",
" crsp.msf\n",
" LIMIT 10\n",
" \"\"\",\n",
" date_cols=['date']\n",
" )\n",
"df"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" gvkey | \n",
" datadate | \n",
" tic | \n",
" conm | \n",
" at | \n",
" lt | \n",
" prccm | \n",
" cshoq | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 006066 | \n",
" 1962-12-31 | \n",
" IBM | \n",
" INTL BUSINESS MACHINES CORP | \n",
" 2112.301 | \n",
" 731.700 | \n",
" 389.9996 | \n",
" NaN | \n",
"
\n",
" \n",
" 1 | \n",
" 006066 | \n",
" 1963-12-31 | \n",
" IBM | \n",
" INTL BUSINESS MACHINES CORP | \n",
" 2373.857 | \n",
" 782.119 | \n",
" 506.9994 | \n",
" NaN | \n",
"
\n",
" \n",
" 2 | \n",
" 006066 | \n",
" 1964-12-31 | \n",
" IBM | \n",
" INTL BUSINESS MACHINES CORP | \n",
" 3309.152 | \n",
" 1055.072 | \n",
" 409.4995 | \n",
" NaN | \n",
"
\n",
" \n",
" 3 | \n",
" 006066 | \n",
" 1965-12-31 | \n",
" IBM | \n",
" INTL BUSINESS MACHINES CORP | \n",
" 3744.917 | \n",
" 1166.771 | \n",
" 498.9991 | \n",
" NaN | \n",
"
\n",
" \n",
" 4 | \n",
" 006066 | \n",
" 1966-12-31 | \n",
" IBM | \n",
" INTL BUSINESS MACHINES CORP | \n",
" 4660.777 | \n",
" 1338.149 | \n",
" 371.4997 | \n",
" NaN | \n",
"
\n",
" \n",
" 5 | \n",
" 006066 | \n",
" 1967-12-31 | \n",
" IBM | \n",
" INTL BUSINESS MACHINES CORP | \n",
" 5598.668 | \n",
" 1767.067 | \n",
" 626.9995 | \n",
" NaN | \n",
"
\n",
" \n",
" 6 | \n",
" 006066 | \n",
" 1968-12-31 | \n",
" IBM | \n",
" INTL BUSINESS MACHINES CORP | \n",
" 6743.430 | \n",
" 2174.291 | \n",
" 314.9998 | \n",
" NaN | \n",
"
\n",
" \n",
" 7 | \n",
" 006066 | \n",
" 1969-12-31 | \n",
" IBM | \n",
" INTL BUSINESS MACHINES CORP | \n",
" 7389.957 | \n",
" 2112.967 | \n",
" 364.4994 | \n",
" 113.717 | \n",
"
\n",
" \n",
" 8 | \n",
" 006066 | \n",
" 1970-12-31 | \n",
" IBM | \n",
" INTL BUSINESS MACHINES CORP | \n",
" 8539.047 | \n",
" 2591.909 | \n",
" 317.7496 | \n",
" 114.587 | \n",
"
\n",
" \n",
" 9 | \n",
" 006066 | \n",
" 1971-12-31 | \n",
" IBM | \n",
" INTL BUSINESS MACHINES CORP | \n",
" 9576.219 | \n",
" 2933.837 | \n",
" 336.4996 | \n",
" 115.534 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" gvkey datadate tic conm at lt \\\n",
"0 006066 1962-12-31 IBM INTL BUSINESS MACHINES CORP 2112.301 731.700 \n",
"1 006066 1963-12-31 IBM INTL BUSINESS MACHINES CORP 2373.857 782.119 \n",
"2 006066 1964-12-31 IBM INTL BUSINESS MACHINES CORP 3309.152 1055.072 \n",
"3 006066 1965-12-31 IBM INTL BUSINESS MACHINES CORP 3744.917 1166.771 \n",
"4 006066 1966-12-31 IBM INTL BUSINESS MACHINES CORP 4660.777 1338.149 \n",
"5 006066 1967-12-31 IBM INTL BUSINESS MACHINES CORP 5598.668 1767.067 \n",
"6 006066 1968-12-31 IBM INTL BUSINESS MACHINES CORP 6743.430 2174.291 \n",
"7 006066 1969-12-31 IBM INTL BUSINESS MACHINES CORP 7389.957 2112.967 \n",
"8 006066 1970-12-31 IBM INTL BUSINESS MACHINES CORP 8539.047 2591.909 \n",
"9 006066 1971-12-31 IBM INTL BUSINESS MACHINES CORP 9576.219 2933.837 \n",
"\n",
" prccm cshoq \n",
"0 389.9996 NaN \n",
"1 506.9994 NaN \n",
"2 409.4995 NaN \n",
"3 498.9991 NaN \n",
"4 371.4997 NaN \n",
"5 626.9995 NaN \n",
"6 314.9998 NaN \n",
"7 364.4994 113.717 \n",
"8 317.7496 114.587 \n",
"9 336.4996 115.534 "
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = db.raw_sql(\n",
" \"\"\"\n",
" SELECT \n",
" a.gvkey, a.datadate, a.tic, a.conm, a.at, a.lt, b.prccm, b.cshoq\n",
" FROM \n",
" comp.funda a\n",
" JOIN \n",
" comp.secm b ON a.gvkey = b.gvkey AND a.datadate = b.datadate\n",
" WHERE \n",
" a.tic = 'IBM' AND \n",
" a.datafmt = 'STD' AND \n",
" a.consol = 'C' AND \n",
" a.indfmt = 'INDL'\n",
" LIMIT 10\n",
" \"\"\"\n",
")\n",
"df"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" datadate | \n",
" gvkey | \n",
" cusip | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 1982-10-31 | \n",
" 002484 | \n",
" 121579932 | \n",
"
\n",
" \n",
" 1 | \n",
" 1983-10-31 | \n",
" 002484 | \n",
" 121579932 | \n",
"
\n",
" \n",
" 2 | \n",
" 1984-10-31 | \n",
" 002484 | \n",
" 121579932 | \n",
"
\n",
" \n",
" 3 | \n",
" 1985-10-31 | \n",
" 002484 | \n",
" 121579932 | \n",
"
\n",
" \n",
" 4 | \n",
" 1986-10-31 | \n",
" 002484 | \n",
" 121579932 | \n",
"
\n",
" \n",
" 5 | \n",
" 1987-10-31 | \n",
" 002484 | \n",
" 121579932 | \n",
"
\n",
" \n",
" 6 | \n",
" 1988-10-31 | \n",
" 002484 | \n",
" 121579932 | \n",
"
\n",
" \n",
" 7 | \n",
" 1989-06-30 | \n",
" 002484 | \n",
" 121579932 | \n",
"
\n",
" \n",
" 8 | \n",
" 1990-06-30 | \n",
" 002484 | \n",
" 121579932 | \n",
"
\n",
" \n",
" 9 | \n",
" 1991-06-30 | \n",
" 002484 | \n",
" 121579932 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" datadate gvkey cusip\n",
"0 1982-10-31 002484 121579932\n",
"1 1983-10-31 002484 121579932\n",
"2 1984-10-31 002484 121579932\n",
"3 1985-10-31 002484 121579932\n",
"4 1986-10-31 002484 121579932\n",
"5 1987-10-31 002484 121579932\n",
"6 1988-10-31 002484 121579932\n",
"7 1989-06-30 002484 121579932\n",
"8 1990-06-30 002484 121579932\n",
"9 1991-06-30 002484 121579932"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"params = {\"tickers\": (\"0015B\", \"0030B\", \"0032A\", \"0033A\", \"0038A\")}\n",
"df = db.raw_sql(\n",
" \"\"\"\n",
" SELECT \n",
" datadate, gvkey, cusip \n",
" FROM comp.funda \n",
" WHERE \n",
" tic IN %(tickers)s\n",
" LIMIT 10\n",
" \"\"\",\n",
" params=params,\n",
")\n",
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Misc"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"28073"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data = db.get_row_count('djones', 'djdaily')\n",
"data"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"db.close()"
]
}
],
"metadata": {
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 2
}