{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Example: Connecting to the WRDS Platform With Python\n", "\n", "\n", "The `wrds` Python package is a tool designed to facilitate data retrieval from the Wharton Research Data Services (WRDS) database. \n", " \n", "- It provides direct access to the WRDS database, allowing users to programmatically query and retrieve data. \n", "- The package supports a simple Python API as well as the ability to send raw SQL queries.\n", "- Retrieved data can be easily converted into Pandas DataFrames.\n", "- Secure access to the WRDS database is managed through user authentication, ensuring data security and compliance with usage policies.\n", "\n", "## Installation\n", "\n", "The package can be installed via pip:\n", "\n", "```\n", "pip install wrds\n", "```\n", "\n", "To use the package, one would typically import it in Python, authenticate with WRDS credentials, and then proceed with data queries and analysis. This notebook walks through some of these basic features.\n", "\n", "## Usage\n", "\n", "Here are some helpful links to learn how to use the `wrds` Python package:\n", "\n", " - [Video: Using Python on the WRDS Platform](https://wrds-www.wharton.upenn.edu/pages/grid-items/using-python-wrds-platform/)\n", " - [PyPI Homepage for WRDS Python Package](https://pypi.org/project/wrds/)\n", " - [WRDS Guide: Querying WRDS Data using Python](https://wrds-www.wharton.upenn.edu/pages/support/programming-wrds/programming-python/querying-wrds-data-python/)\n", " - [WRDS Python Package Demo Walkthrough](https://wrds-www.wharton.upenn.edu/documents/1443/wrds_connection.html)\n", "\n", "\n", "### Import and Establish Connection\n", "\n", "Establish connection with WRDS server. \n", "Log in using your WRDS username and password.\n", "Set up a `pgpass` file to store the info. This will allow you to access WRDS without supplying your password each time, as long as you supply your username (which we are doing below via environment variables). We want to be able to access WRDS without a password so that we can automate the query." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import config\n", "from pathlib import Path\n", "OUTPUT_DIR = Path(config.OUTPUT_DIR)\n", "DATA_DIR = Path(config.DATA_DIR)\n", "WRDS_USERNAME = config.WRDS_USERNAME\n", "\n", "import wrds\n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Loading library list...\n", "Done\n" ] } ], "source": [ "db = wrds.Connection(wrds_username=WRDS_USERNAME)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
cusippermnodateshroutprcretretx
06839161010000.01985-12-31NaNNaNNaNNaN
16839161010000.01986-01-313680.0-4.375000NaNNaN
26839161010000.01986-02-283680.0-3.250000-0.257143-0.257143
36839161010000.01986-03-313680.0-4.4375000.3653850.365385
46839161010000.01986-04-303793.0-4.000000-0.098592-0.098592
56839161010000.01986-05-303793.0-3.109375-0.222656-0.222656
66839161010000.01986-06-303793.0-3.093750-0.005025-0.005025
76839161010000.01986-07-313793.0-2.843750-0.080808-0.080808
86839161010000.01986-08-293793.0-1.093750-0.615385-0.615385
96839161010000.01986-09-303793.0-1.031250-0.057143-0.057143
\n", "
" ], "text/plain": [ " cusip permno date shrout prc ret retx\n", "0 68391610 10000.0 1985-12-31 NaN NaN NaN NaN\n", "1 68391610 10000.0 1986-01-31 3680.0 -4.375000 NaN NaN\n", "2 68391610 10000.0 1986-02-28 3680.0 -3.250000 -0.257143 -0.257143\n", "3 68391610 10000.0 1986-03-31 3680.0 -4.437500 0.365385 0.365385\n", "4 68391610 10000.0 1986-04-30 3793.0 -4.000000 -0.098592 -0.098592\n", "5 68391610 10000.0 1986-05-30 3793.0 -3.109375 -0.222656 -0.222656\n", "6 68391610 10000.0 1986-06-30 3793.0 -3.093750 -0.005025 -0.005025\n", "7 68391610 10000.0 1986-07-31 3793.0 -2.843750 -0.080808 -0.080808\n", "8 68391610 10000.0 1986-08-29 3793.0 -1.093750 -0.615385 -0.615385\n", "9 68391610 10000.0 1986-09-30 3793.0 -1.031250 -0.057143 -0.057143" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# The `obs` keyword argument limits the number of rows returned.\n", "# You can omit it to get all rows.\n", "df = db.get_table(library='crsp', table='msf', columns=['cusip', 'permno', 'date', 'shrout', 'prc', 'ret', 'retx'], obs=10)\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exploring the Data\n", "\n", "As described [here](https://wrds-www.wharton.upenn.edu/pages/support/programming-wrds/programming-python/querying-wrds-data-python/), \n", "\n", "> Data at WRDS is organized in a hierarchical manner by vendor (e.g. crsp), referred to at the top-level as libraries. Each library contains a number of component tables or datasets (e.g. dsf) which contain the actual data in tabular format, with column headers called variables (such as date, askhi, bidlo, etc).\n", "> \n", "> You can analyze the structure of the data through its metadata using the wrds module, as outlined in the following steps:\n", ">\n", "> - List all available libraries at WRDS using list_libraries()\n", "> - Select a library to work with, and list all available datasets within that library using list_tables()\n", "> - Select a dataset, and list all available variables (column headers) within that dataset using describe_table()\n", ">\n", "> **NOTE:** When referencing library and dataset names, you must use all lowercase.\n", ">\n", "> Alternatively, a comprehensive list of all WRDS libraries is available at the Dataset List. This resource provides a listing of each library, their component datasets and variables, as well as a tabular database preview feature, and is helpful in establishing the structure of the data you're looking for in an easy manner from a Web browser.\n", ">\n", "> 1. Determine the libraries available at WRDS:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['aha_sample',\n", " 'ahasamp',\n", " 'audit',\n", " 'audit_acct_os',\n", " 'audit_audit_comp',\n", " 'audit_common',\n", " 'audit_corp_legal',\n", " 'auditsmp',\n", " 'auditsmp_all',\n", " 'bank']" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sorted(db.list_libraries())[0:10]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "> This will list all libraries available at WRDS in alphabetical order. Though all libraries will be shown, you must have a valid, current subscription for a library in order to access it via Python, just as with SAS or any other supported programming language at WRDS. You will receive an error message indicating this if you attempt to query a table to which your institution does not have access.\n", ">\n", "> 2. To determine the datasets within a given library:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['acti',\n", " 'asia',\n", " 'asib',\n", " 'asic',\n", " 'asio',\n", " 'asix',\n", " 'bmdebt',\n", " 'bmheader',\n", " 'bmpaymts',\n", " 'bmquotes']" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "db.list_tables(library=\"crsp\")[0:10]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "> Where 'library; is a dataset, such as crsp or comp, as returned from step 1 above.\n", ">\n", "> 3. To determine the column headers (variables) within a given dataset:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Approximately 4922867 rows in crsp.msf.\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
namenullabletypecomment
0cusipTrueVARCHAR(8)None
1permnoTrueDOUBLE_PRECISIONNone
2permcoTrueDOUBLE_PRECISIONNone
3issunoTrueDOUBLE_PRECISIONNone
4hexcdTrueDOUBLE_PRECISIONNone
5hsiccdTrueDOUBLE_PRECISIONNone
6dateTrueDATENone
7bidloTrueDOUBLE_PRECISIONNone
8askhiTrueDOUBLE_PRECISIONNone
9prcTrueDOUBLE_PRECISIONNone
10volTrueDOUBLE_PRECISIONNone
11retTrueDOUBLE_PRECISIONNone
12bidTrueDOUBLE_PRECISIONNone
13askTrueDOUBLE_PRECISIONNone
14shroutTrueDOUBLE_PRECISIONNone
15cfacprTrueDOUBLE_PRECISIONNone
16cfacshrTrueDOUBLE_PRECISIONNone
17altprcTrueDOUBLE_PRECISIONNone
18spreadTrueDOUBLE_PRECISIONNone
19altprcdtTrueDATENone
20retxTrueDOUBLE_PRECISIONNone
\n", "
" ], "text/plain": [ " name nullable type comment\n", "0 cusip True VARCHAR(8) None\n", "1 permno True DOUBLE_PRECISION None\n", "2 permco True DOUBLE_PRECISION None\n", "3 issuno True DOUBLE_PRECISION None\n", "4 hexcd True DOUBLE_PRECISION None\n", "5 hsiccd True DOUBLE_PRECISION None\n", "6 date True DATE None\n", "7 bidlo True DOUBLE_PRECISION None\n", "8 askhi True DOUBLE_PRECISION None\n", "9 prc True DOUBLE_PRECISION None\n", "10 vol True DOUBLE_PRECISION None\n", "11 ret True DOUBLE_PRECISION None\n", "12 bid True DOUBLE_PRECISION None\n", "13 ask True DOUBLE_PRECISION None\n", "14 shrout True DOUBLE_PRECISION None\n", "15 cfacpr True DOUBLE_PRECISION None\n", "16 cfacshr True DOUBLE_PRECISION None\n", "17 altprc True DOUBLE_PRECISION None\n", "18 spread True DOUBLE_PRECISION None\n", "19 altprcdt True DATE None\n", "20 retx True DOUBLE_PRECISION None" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "db.describe_table(library=\"crsp\", table=\"msf\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "> Where 'library' is a dataset such as crsp as returned from #1 above and 'table' is a component database within that library, such as msf, as returned from query #2 above. Remember that both the library and the dataset are case-sensitive, and must be all-lowercase.\n", ">\n", "> Alternatively, a comprehensive list of all WRDS libraries is available via the WRDS Dataset List. This online resource provides a listing of each library, their component datasets and variables, as well as a tabular database preview feature, and is helpful in establishing the structure of the data you're looking for in an easy, web-friendly manner.\n", ">\n", "> By examining the metadata available to us -- the structure of the data -- we've determined how to reference the data we're researching, and what variables are available within that data. We can now perform our actual research, creating data queries, which are explored in depth in the next section." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Querying WRDS Data\n", "\n", "Continue the walkthrough provided here: https://wrds-www.wharton.upenn.edu/pages/support/programming-wrds/programming-python/querying-wrds-data-python/" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Using `get_table`" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
cusippermnodateshroutprcretretx
06839161010000.01985-12-31NaNNaNNaNNaN
16839161010000.01986-01-313680.0-4.375000NaNNaN
26839161010000.01986-02-283680.0-3.250000-0.257143-0.257143
36839161010000.01986-03-313680.0-4.4375000.3653850.365385
46839161010000.01986-04-303793.0-4.000000-0.098592-0.098592
56839161010000.01986-05-303793.0-3.109375-0.222656-0.222656
66839161010000.01986-06-303793.0-3.093750-0.005025-0.005025
76839161010000.01986-07-313793.0-2.843750-0.080808-0.080808
86839161010000.01986-08-293793.0-1.093750-0.615385-0.615385
96839161010000.01986-09-303793.0-1.031250-0.057143-0.057143
\n", "
" ], "text/plain": [ " cusip permno date shrout prc ret retx\n", "0 68391610 10000.0 1985-12-31 NaN NaN NaN NaN\n", "1 68391610 10000.0 1986-01-31 3680.0 -4.375000 NaN NaN\n", "2 68391610 10000.0 1986-02-28 3680.0 -3.250000 -0.257143 -0.257143\n", "3 68391610 10000.0 1986-03-31 3680.0 -4.437500 0.365385 0.365385\n", "4 68391610 10000.0 1986-04-30 3793.0 -4.000000 -0.098592 -0.098592\n", "5 68391610 10000.0 1986-05-30 3793.0 -3.109375 -0.222656 -0.222656\n", "6 68391610 10000.0 1986-06-30 3793.0 -3.093750 -0.005025 -0.005025\n", "7 68391610 10000.0 1986-07-31 3793.0 -2.843750 -0.080808 -0.080808\n", "8 68391610 10000.0 1986-08-29 3793.0 -1.093750 -0.615385 -0.615385\n", "9 68391610 10000.0 1986-09-30 3793.0 -1.031250 -0.057143 -0.057143" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# The `obs` keyword argument limits the number of rows returned.\n", "# You can omit it to get all rows.\n", "df = db.get_table(library='crsp', table='msf', columns=['cusip', 'permno', 'date', 'shrout', 'prc', 'ret', 'retx'], obs=10)\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Using `raw_sql`" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
cusippermnodateshroutprcretretx
06839161010000.01985-12-31NaNNaNNaNNaN
16839161010000.01986-01-313680.0-4.375000NaNNaN
26839161010000.01986-02-283680.0-3.250000-0.257143-0.257143
36839161010000.01986-03-313680.0-4.4375000.3653850.365385
46839161010000.01986-04-303793.0-4.000000-0.098592-0.098592
56839161010000.01986-05-303793.0-3.109375-0.222656-0.222656
66839161010000.01986-06-303793.0-3.093750-0.005025-0.005025
76839161010000.01986-07-313793.0-2.843750-0.080808-0.080808
86839161010000.01986-08-293793.0-1.093750-0.615385-0.615385
96839161010000.01986-09-303793.0-1.031250-0.057143-0.057143
\n", "
" ], "text/plain": [ " cusip permno date shrout prc ret retx\n", "0 68391610 10000.0 1985-12-31 NaN NaN NaN NaN\n", "1 68391610 10000.0 1986-01-31 3680.0 -4.375000 NaN NaN\n", "2 68391610 10000.0 1986-02-28 3680.0 -3.250000 -0.257143 -0.257143\n", "3 68391610 10000.0 1986-03-31 3680.0 -4.437500 0.365385 0.365385\n", "4 68391610 10000.0 1986-04-30 3793.0 -4.000000 -0.098592 -0.098592\n", "5 68391610 10000.0 1986-05-30 3793.0 -3.109375 -0.222656 -0.222656\n", "6 68391610 10000.0 1986-06-30 3793.0 -3.093750 -0.005025 -0.005025\n", "7 68391610 10000.0 1986-07-31 3793.0 -2.843750 -0.080808 -0.080808\n", "8 68391610 10000.0 1986-08-29 3793.0 -1.093750 -0.615385 -0.615385\n", "9 68391610 10000.0 1986-09-30 3793.0 -1.031250 -0.057143 -0.057143" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = db.raw_sql(\n", " \"\"\"\n", " SELECT\n", " cusip, permno, date, shrout, prc, ret, retx\n", " FROM \n", " crsp.msf\n", " LIMIT 10\n", " \"\"\",\n", " date_cols=['date']\n", " )\n", "df" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
gvkeydatadateticconmatltprccmcshoq
00060661962-12-31IBMINTL BUSINESS MACHINES CORP2112.301731.700389.9996NaN
10060661963-12-31IBMINTL BUSINESS MACHINES CORP2373.857782.119506.9994NaN
20060661964-12-31IBMINTL BUSINESS MACHINES CORP3309.1521055.072409.4995NaN
30060661965-12-31IBMINTL BUSINESS MACHINES CORP3744.9171166.771498.9991NaN
40060661966-12-31IBMINTL BUSINESS MACHINES CORP4660.7771338.149371.4997NaN
50060661967-12-31IBMINTL BUSINESS MACHINES CORP5598.6681767.067626.9995NaN
60060661968-12-31IBMINTL BUSINESS MACHINES CORP6743.4302174.291314.9998NaN
70060661969-12-31IBMINTL BUSINESS MACHINES CORP7389.9572112.967364.4994113.717
80060661970-12-31IBMINTL BUSINESS MACHINES CORP8539.0472591.909317.7496114.587
90060661971-12-31IBMINTL BUSINESS MACHINES CORP9576.2192933.837336.4996115.534
\n", "
" ], "text/plain": [ " gvkey datadate tic conm at lt \\\n", "0 006066 1962-12-31 IBM INTL BUSINESS MACHINES CORP 2112.301 731.700 \n", "1 006066 1963-12-31 IBM INTL BUSINESS MACHINES CORP 2373.857 782.119 \n", "2 006066 1964-12-31 IBM INTL BUSINESS MACHINES CORP 3309.152 1055.072 \n", "3 006066 1965-12-31 IBM INTL BUSINESS MACHINES CORP 3744.917 1166.771 \n", "4 006066 1966-12-31 IBM INTL BUSINESS MACHINES CORP 4660.777 1338.149 \n", "5 006066 1967-12-31 IBM INTL BUSINESS MACHINES CORP 5598.668 1767.067 \n", "6 006066 1968-12-31 IBM INTL BUSINESS MACHINES CORP 6743.430 2174.291 \n", "7 006066 1969-12-31 IBM INTL BUSINESS MACHINES CORP 7389.957 2112.967 \n", "8 006066 1970-12-31 IBM INTL BUSINESS MACHINES CORP 8539.047 2591.909 \n", "9 006066 1971-12-31 IBM INTL BUSINESS MACHINES CORP 9576.219 2933.837 \n", "\n", " prccm cshoq \n", "0 389.9996 NaN \n", "1 506.9994 NaN \n", "2 409.4995 NaN \n", "3 498.9991 NaN \n", "4 371.4997 NaN \n", "5 626.9995 NaN \n", "6 314.9998 NaN \n", "7 364.4994 113.717 \n", "8 317.7496 114.587 \n", "9 336.4996 115.534 " ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = db.raw_sql(\n", " \"\"\"\n", " SELECT \n", " a.gvkey, a.datadate, a.tic, a.conm, a.at, a.lt, b.prccm, b.cshoq\n", " FROM \n", " comp.funda a\n", " JOIN \n", " comp.secm b ON a.gvkey = b.gvkey AND a.datadate = b.datadate\n", " WHERE \n", " a.tic = 'IBM' AND \n", " a.datafmt = 'STD' AND \n", " a.consol = 'C' AND \n", " a.indfmt = 'INDL'\n", " LIMIT 10\n", " \"\"\"\n", ")\n", "df" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
datadategvkeycusip
01982-10-31002484121579932
11983-10-31002484121579932
21984-10-31002484121579932
31985-10-31002484121579932
41986-10-31002484121579932
51987-10-31002484121579932
61988-10-31002484121579932
71989-06-30002484121579932
81990-06-30002484121579932
91991-06-30002484121579932
\n", "
" ], "text/plain": [ " datadate gvkey cusip\n", "0 1982-10-31 002484 121579932\n", "1 1983-10-31 002484 121579932\n", "2 1984-10-31 002484 121579932\n", "3 1985-10-31 002484 121579932\n", "4 1986-10-31 002484 121579932\n", "5 1987-10-31 002484 121579932\n", "6 1988-10-31 002484 121579932\n", "7 1989-06-30 002484 121579932\n", "8 1990-06-30 002484 121579932\n", "9 1991-06-30 002484 121579932" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "params = {\"tickers\": (\"0015B\", \"0030B\", \"0032A\", \"0033A\", \"0038A\")}\n", "df = db.raw_sql(\n", " \"\"\"\n", " SELECT \n", " datadate, gvkey, cusip \n", " FROM comp.funda \n", " WHERE \n", " tic IN %(tickers)s\n", " LIMIT 10\n", " \"\"\",\n", " params=params,\n", ")\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Misc" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "28073" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data = db.get_row_count('djones', 'djdaily')\n", "data" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "db.close()" ] } ], "metadata": { "language_info": { "name": "python" } }, "nbformat": 4, "nbformat_minor": 2 }