{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Example: Connecting to the WRDS Platform With Python\n",
    "\n",
    "\n",
    "The `wrds` Python package is a tool designed to facilitate data retrieval from the Wharton Research Data Services (WRDS) database. \n",
    " \n",
    "- It provides direct access to the WRDS database, allowing users to programmatically query and retrieve data. \n",
    "- The package supports a simple Python API as well as the ability to send raw SQL queries.\n",
    "- Retrieved data can be easily converted into Pandas DataFrames.\n",
    "- Secure access to the WRDS database is managed through user authentication, ensuring data security and compliance with usage policies.\n",
    "\n",
    "## Installation\n",
    "\n",
    "The package can be installed via pip:\n",
    "\n",
    "```\n",
    "pip install wrds\n",
    "```\n",
    "\n",
    "To use the package, one would typically import it in Python, authenticate with WRDS credentials, and then proceed with data queries and analysis. This notebook walks through some of these basic features.\n",
    "\n",
    "## Usage\n",
    "\n",
    "Here are some helpful links to learn how to use the `wrds` Python package:\n",
    "\n",
    " - [Video: Using Python on the WRDS Platform](https://wrds-www.wharton.upenn.edu/pages/grid-items/using-python-wrds-platform/)\n",
    " - [PyPI Homepage for WRDS Python Package](https://pypi.org/project/wrds/)\n",
    " - [WRDS Guide: Querying WRDS Data using Python](https://wrds-www.wharton.upenn.edu/pages/support/programming-wrds/programming-python/querying-wrds-data-python/)\n",
    " - [WRDS Python Package Demo Walkthrough](https://wrds-www.wharton.upenn.edu/documents/1443/wrds_connection.html)\n",
    "\n",
    "\n",
    "### Import and Establish Connection\n",
    "\n",
    "Establish connection with WRDS server. \n",
    "Log in using your WRDS username and password.\n",
    "Set up a `pgpass` file to store the info. This will allow you to access WRDS without supplying your password each time, as long as you supply your username (which we are doing below via environment variables). We want to be able to access WRDS without a password so that we can automate the query."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import config\n",
    "from pathlib import Path\n",
    "OUTPUT_DIR = Path(config.OUTPUT_DIR)\n",
    "DATA_DIR = Path(config.DATA_DIR)\n",
    "WRDS_USERNAME = config.WRDS_USERNAME\n",
    "\n",
    "import wrds\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Loading library list...\n",
      "Done\n"
     ]
    }
   ],
   "source": [
    "db = wrds.Connection(wrds_username=WRDS_USERNAME)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>cusip</th>\n",
       "      <th>permno</th>\n",
       "      <th>date</th>\n",
       "      <th>shrout</th>\n",
       "      <th>prc</th>\n",
       "      <th>ret</th>\n",
       "      <th>retx</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>68391610</td>\n",
       "      <td>10000.0</td>\n",
       "      <td>1985-12-31</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>68391610</td>\n",
       "      <td>10000.0</td>\n",
       "      <td>1986-01-31</td>\n",
       "      <td>3680.0</td>\n",
       "      <td>-4.375000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>68391610</td>\n",
       "      <td>10000.0</td>\n",
       "      <td>1986-02-28</td>\n",
       "      <td>3680.0</td>\n",
       "      <td>-3.250000</td>\n",
       "      <td>-0.257143</td>\n",
       "      <td>-0.257143</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>68391610</td>\n",
       "      <td>10000.0</td>\n",
       "      <td>1986-03-31</td>\n",
       "      <td>3680.0</td>\n",
       "      <td>-4.437500</td>\n",
       "      <td>0.365385</td>\n",
       "      <td>0.365385</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>68391610</td>\n",
       "      <td>10000.0</td>\n",
       "      <td>1986-04-30</td>\n",
       "      <td>3793.0</td>\n",
       "      <td>-4.000000</td>\n",
       "      <td>-0.098592</td>\n",
       "      <td>-0.098592</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>68391610</td>\n",
       "      <td>10000.0</td>\n",
       "      <td>1986-05-30</td>\n",
       "      <td>3793.0</td>\n",
       "      <td>-3.109375</td>\n",
       "      <td>-0.222656</td>\n",
       "      <td>-0.222656</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>68391610</td>\n",
       "      <td>10000.0</td>\n",
       "      <td>1986-06-30</td>\n",
       "      <td>3793.0</td>\n",
       "      <td>-3.093750</td>\n",
       "      <td>-0.005025</td>\n",
       "      <td>-0.005025</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>68391610</td>\n",
       "      <td>10000.0</td>\n",
       "      <td>1986-07-31</td>\n",
       "      <td>3793.0</td>\n",
       "      <td>-2.843750</td>\n",
       "      <td>-0.080808</td>\n",
       "      <td>-0.080808</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>68391610</td>\n",
       "      <td>10000.0</td>\n",
       "      <td>1986-08-29</td>\n",
       "      <td>3793.0</td>\n",
       "      <td>-1.093750</td>\n",
       "      <td>-0.615385</td>\n",
       "      <td>-0.615385</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>68391610</td>\n",
       "      <td>10000.0</td>\n",
       "      <td>1986-09-30</td>\n",
       "      <td>3793.0</td>\n",
       "      <td>-1.031250</td>\n",
       "      <td>-0.057143</td>\n",
       "      <td>-0.057143</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      cusip   permno        date  shrout       prc       ret      retx\n",
       "0  68391610  10000.0  1985-12-31     NaN       NaN       NaN       NaN\n",
       "1  68391610  10000.0  1986-01-31  3680.0 -4.375000       NaN       NaN\n",
       "2  68391610  10000.0  1986-02-28  3680.0 -3.250000 -0.257143 -0.257143\n",
       "3  68391610  10000.0  1986-03-31  3680.0 -4.437500  0.365385  0.365385\n",
       "4  68391610  10000.0  1986-04-30  3793.0 -4.000000 -0.098592 -0.098592\n",
       "5  68391610  10000.0  1986-05-30  3793.0 -3.109375 -0.222656 -0.222656\n",
       "6  68391610  10000.0  1986-06-30  3793.0 -3.093750 -0.005025 -0.005025\n",
       "7  68391610  10000.0  1986-07-31  3793.0 -2.843750 -0.080808 -0.080808\n",
       "8  68391610  10000.0  1986-08-29  3793.0 -1.093750 -0.615385 -0.615385\n",
       "9  68391610  10000.0  1986-09-30  3793.0 -1.031250 -0.057143 -0.057143"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# The `obs` keyword argument limits the number of rows returned.\n",
    "# You can omit it to get all rows.\n",
    "df = db.get_table(library='crsp', table='msf', columns=['cusip', 'permno', 'date', 'shrout', 'prc', 'ret', 'retx'], obs=10)\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Exploring the Data\n",
    "\n",
    "As described [here](https://wrds-www.wharton.upenn.edu/pages/support/programming-wrds/programming-python/querying-wrds-data-python/), \n",
    "\n",
    "> Data at WRDS is organized in a hierarchical manner by vendor (e.g. crsp), referred to at the top-level as libraries. Each library contains a number of component tables or datasets (e.g. dsf) which contain the actual data in tabular format, with column headers called variables (such as date, askhi, bidlo, etc).\n",
    "> \n",
    "> You can analyze the structure of the data through its metadata using the wrds module, as outlined in the following steps:\n",
    ">\n",
    "> - List all available libraries at WRDS using list_libraries()\n",
    "> - Select a library to work with, and list all available datasets within that library using list_tables()\n",
    "> - Select a dataset, and list all available variables (column headers) within that dataset using describe_table()\n",
    ">\n",
    "> **NOTE:** When referencing library and dataset names, you must use all lowercase.\n",
    ">\n",
    "> Alternatively, a comprehensive list of all WRDS libraries is available at the Dataset List. This resource provides a listing of each library, their component datasets and variables, as well as a tabular database preview feature, and is helpful in establishing the structure of the data you're looking for in an easy manner from a Web browser.\n",
    ">\n",
    "> 1. Determine the libraries available at WRDS:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['aha_sample',\n",
       " 'ahasamp',\n",
       " 'audit',\n",
       " 'audit_acct_os',\n",
       " 'audit_audit_comp',\n",
       " 'audit_common',\n",
       " 'audit_corp_legal',\n",
       " 'auditsmp',\n",
       " 'auditsmp_all',\n",
       " 'bank']"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "sorted(db.list_libraries())[0:10]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> This will list all libraries available at WRDS in alphabetical order. Though all libraries will be shown, you must have a valid, current subscription for a library in order to access it via Python, just as with SAS or any other supported programming language at WRDS. You will receive an error message indicating this if you attempt to query a table to which your institution does not have access.\n",
    ">\n",
    "> 2. To determine the datasets within a given library:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['acti',\n",
       " 'asia',\n",
       " 'asib',\n",
       " 'asic',\n",
       " 'asio',\n",
       " 'asix',\n",
       " 'bmdebt',\n",
       " 'bmheader',\n",
       " 'bmpaymts',\n",
       " 'bmquotes']"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "db.list_tables(library=\"crsp\")[0:10]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> Where 'library; is a dataset, such as crsp or comp, as returned from step 1 above.\n",
    ">\n",
    "> 3. To determine the column headers (variables) within a given dataset:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Approximately 4922867 rows in crsp.msf.\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>name</th>\n",
       "      <th>nullable</th>\n",
       "      <th>type</th>\n",
       "      <th>comment</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>cusip</td>\n",
       "      <td>True</td>\n",
       "      <td>VARCHAR(8)</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>permno</td>\n",
       "      <td>True</td>\n",
       "      <td>DOUBLE_PRECISION</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>permco</td>\n",
       "      <td>True</td>\n",
       "      <td>DOUBLE_PRECISION</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>issuno</td>\n",
       "      <td>True</td>\n",
       "      <td>DOUBLE_PRECISION</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>hexcd</td>\n",
       "      <td>True</td>\n",
       "      <td>DOUBLE_PRECISION</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>hsiccd</td>\n",
       "      <td>True</td>\n",
       "      <td>DOUBLE_PRECISION</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>date</td>\n",
       "      <td>True</td>\n",
       "      <td>DATE</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>bidlo</td>\n",
       "      <td>True</td>\n",
       "      <td>DOUBLE_PRECISION</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>askhi</td>\n",
       "      <td>True</td>\n",
       "      <td>DOUBLE_PRECISION</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>prc</td>\n",
       "      <td>True</td>\n",
       "      <td>DOUBLE_PRECISION</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>vol</td>\n",
       "      <td>True</td>\n",
       "      <td>DOUBLE_PRECISION</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>ret</td>\n",
       "      <td>True</td>\n",
       "      <td>DOUBLE_PRECISION</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>bid</td>\n",
       "      <td>True</td>\n",
       "      <td>DOUBLE_PRECISION</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>ask</td>\n",
       "      <td>True</td>\n",
       "      <td>DOUBLE_PRECISION</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>shrout</td>\n",
       "      <td>True</td>\n",
       "      <td>DOUBLE_PRECISION</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>cfacpr</td>\n",
       "      <td>True</td>\n",
       "      <td>DOUBLE_PRECISION</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>cfacshr</td>\n",
       "      <td>True</td>\n",
       "      <td>DOUBLE_PRECISION</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>altprc</td>\n",
       "      <td>True</td>\n",
       "      <td>DOUBLE_PRECISION</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>spread</td>\n",
       "      <td>True</td>\n",
       "      <td>DOUBLE_PRECISION</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>altprcdt</td>\n",
       "      <td>True</td>\n",
       "      <td>DATE</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>retx</td>\n",
       "      <td>True</td>\n",
       "      <td>DOUBLE_PRECISION</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        name  nullable              type comment\n",
       "0      cusip      True        VARCHAR(8)    None\n",
       "1     permno      True  DOUBLE_PRECISION    None\n",
       "2     permco      True  DOUBLE_PRECISION    None\n",
       "3     issuno      True  DOUBLE_PRECISION    None\n",
       "4      hexcd      True  DOUBLE_PRECISION    None\n",
       "5     hsiccd      True  DOUBLE_PRECISION    None\n",
       "6       date      True              DATE    None\n",
       "7      bidlo      True  DOUBLE_PRECISION    None\n",
       "8      askhi      True  DOUBLE_PRECISION    None\n",
       "9        prc      True  DOUBLE_PRECISION    None\n",
       "10       vol      True  DOUBLE_PRECISION    None\n",
       "11       ret      True  DOUBLE_PRECISION    None\n",
       "12       bid      True  DOUBLE_PRECISION    None\n",
       "13       ask      True  DOUBLE_PRECISION    None\n",
       "14    shrout      True  DOUBLE_PRECISION    None\n",
       "15    cfacpr      True  DOUBLE_PRECISION    None\n",
       "16   cfacshr      True  DOUBLE_PRECISION    None\n",
       "17    altprc      True  DOUBLE_PRECISION    None\n",
       "18    spread      True  DOUBLE_PRECISION    None\n",
       "19  altprcdt      True              DATE    None\n",
       "20      retx      True  DOUBLE_PRECISION    None"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "db.describe_table(library=\"crsp\", table=\"msf\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> Where 'library' is a dataset such as crsp as returned from #1 above and 'table' is a component database within that library, such as msf, as returned from query #2 above. Remember that both the library and the dataset are case-sensitive, and must be all-lowercase.\n",
    ">\n",
    "> Alternatively, a comprehensive list of all WRDS libraries is available via the WRDS Dataset List. This online resource provides a listing of each library, their component datasets and variables, as well as a tabular database preview feature, and is helpful in establishing the structure of the data you're looking for in an easy, web-friendly manner.\n",
    ">\n",
    "> By examining the metadata available to us -- the structure of the data -- we've determined how to reference the data we're researching, and what variables are available within that data. We can now perform our actual research, creating data queries, which are explored in depth in the next section."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Querying WRDS Data\n",
    "\n",
    "Continue the walkthrough provided here: https://wrds-www.wharton.upenn.edu/pages/support/programming-wrds/programming-python/querying-wrds-data-python/"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Using `get_table`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>cusip</th>\n",
       "      <th>permno</th>\n",
       "      <th>date</th>\n",
       "      <th>shrout</th>\n",
       "      <th>prc</th>\n",
       "      <th>ret</th>\n",
       "      <th>retx</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>68391610</td>\n",
       "      <td>10000.0</td>\n",
       "      <td>1985-12-31</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>68391610</td>\n",
       "      <td>10000.0</td>\n",
       "      <td>1986-01-31</td>\n",
       "      <td>3680.0</td>\n",
       "      <td>-4.375000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>68391610</td>\n",
       "      <td>10000.0</td>\n",
       "      <td>1986-02-28</td>\n",
       "      <td>3680.0</td>\n",
       "      <td>-3.250000</td>\n",
       "      <td>-0.257143</td>\n",
       "      <td>-0.257143</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>68391610</td>\n",
       "      <td>10000.0</td>\n",
       "      <td>1986-03-31</td>\n",
       "      <td>3680.0</td>\n",
       "      <td>-4.437500</td>\n",
       "      <td>0.365385</td>\n",
       "      <td>0.365385</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>68391610</td>\n",
       "      <td>10000.0</td>\n",
       "      <td>1986-04-30</td>\n",
       "      <td>3793.0</td>\n",
       "      <td>-4.000000</td>\n",
       "      <td>-0.098592</td>\n",
       "      <td>-0.098592</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>68391610</td>\n",
       "      <td>10000.0</td>\n",
       "      <td>1986-05-30</td>\n",
       "      <td>3793.0</td>\n",
       "      <td>-3.109375</td>\n",
       "      <td>-0.222656</td>\n",
       "      <td>-0.222656</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>68391610</td>\n",
       "      <td>10000.0</td>\n",
       "      <td>1986-06-30</td>\n",
       "      <td>3793.0</td>\n",
       "      <td>-3.093750</td>\n",
       "      <td>-0.005025</td>\n",
       "      <td>-0.005025</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>68391610</td>\n",
       "      <td>10000.0</td>\n",
       "      <td>1986-07-31</td>\n",
       "      <td>3793.0</td>\n",
       "      <td>-2.843750</td>\n",
       "      <td>-0.080808</td>\n",
       "      <td>-0.080808</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>68391610</td>\n",
       "      <td>10000.0</td>\n",
       "      <td>1986-08-29</td>\n",
       "      <td>3793.0</td>\n",
       "      <td>-1.093750</td>\n",
       "      <td>-0.615385</td>\n",
       "      <td>-0.615385</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>68391610</td>\n",
       "      <td>10000.0</td>\n",
       "      <td>1986-09-30</td>\n",
       "      <td>3793.0</td>\n",
       "      <td>-1.031250</td>\n",
       "      <td>-0.057143</td>\n",
       "      <td>-0.057143</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      cusip   permno        date  shrout       prc       ret      retx\n",
       "0  68391610  10000.0  1985-12-31     NaN       NaN       NaN       NaN\n",
       "1  68391610  10000.0  1986-01-31  3680.0 -4.375000       NaN       NaN\n",
       "2  68391610  10000.0  1986-02-28  3680.0 -3.250000 -0.257143 -0.257143\n",
       "3  68391610  10000.0  1986-03-31  3680.0 -4.437500  0.365385  0.365385\n",
       "4  68391610  10000.0  1986-04-30  3793.0 -4.000000 -0.098592 -0.098592\n",
       "5  68391610  10000.0  1986-05-30  3793.0 -3.109375 -0.222656 -0.222656\n",
       "6  68391610  10000.0  1986-06-30  3793.0 -3.093750 -0.005025 -0.005025\n",
       "7  68391610  10000.0  1986-07-31  3793.0 -2.843750 -0.080808 -0.080808\n",
       "8  68391610  10000.0  1986-08-29  3793.0 -1.093750 -0.615385 -0.615385\n",
       "9  68391610  10000.0  1986-09-30  3793.0 -1.031250 -0.057143 -0.057143"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# The `obs` keyword argument limits the number of rows returned.\n",
    "# You can omit it to get all rows.\n",
    "df = db.get_table(library='crsp', table='msf', columns=['cusip', 'permno', 'date', 'shrout', 'prc', 'ret', 'retx'], obs=10)\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Using `raw_sql`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>cusip</th>\n",
       "      <th>permno</th>\n",
       "      <th>date</th>\n",
       "      <th>shrout</th>\n",
       "      <th>prc</th>\n",
       "      <th>ret</th>\n",
       "      <th>retx</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>68391610</td>\n",
       "      <td>10000.0</td>\n",
       "      <td>1985-12-31</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>68391610</td>\n",
       "      <td>10000.0</td>\n",
       "      <td>1986-01-31</td>\n",
       "      <td>3680.0</td>\n",
       "      <td>-4.375000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>68391610</td>\n",
       "      <td>10000.0</td>\n",
       "      <td>1986-02-28</td>\n",
       "      <td>3680.0</td>\n",
       "      <td>-3.250000</td>\n",
       "      <td>-0.257143</td>\n",
       "      <td>-0.257143</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>68391610</td>\n",
       "      <td>10000.0</td>\n",
       "      <td>1986-03-31</td>\n",
       "      <td>3680.0</td>\n",
       "      <td>-4.437500</td>\n",
       "      <td>0.365385</td>\n",
       "      <td>0.365385</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>68391610</td>\n",
       "      <td>10000.0</td>\n",
       "      <td>1986-04-30</td>\n",
       "      <td>3793.0</td>\n",
       "      <td>-4.000000</td>\n",
       "      <td>-0.098592</td>\n",
       "      <td>-0.098592</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>68391610</td>\n",
       "      <td>10000.0</td>\n",
       "      <td>1986-05-30</td>\n",
       "      <td>3793.0</td>\n",
       "      <td>-3.109375</td>\n",
       "      <td>-0.222656</td>\n",
       "      <td>-0.222656</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>68391610</td>\n",
       "      <td>10000.0</td>\n",
       "      <td>1986-06-30</td>\n",
       "      <td>3793.0</td>\n",
       "      <td>-3.093750</td>\n",
       "      <td>-0.005025</td>\n",
       "      <td>-0.005025</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>68391610</td>\n",
       "      <td>10000.0</td>\n",
       "      <td>1986-07-31</td>\n",
       "      <td>3793.0</td>\n",
       "      <td>-2.843750</td>\n",
       "      <td>-0.080808</td>\n",
       "      <td>-0.080808</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>68391610</td>\n",
       "      <td>10000.0</td>\n",
       "      <td>1986-08-29</td>\n",
       "      <td>3793.0</td>\n",
       "      <td>-1.093750</td>\n",
       "      <td>-0.615385</td>\n",
       "      <td>-0.615385</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>68391610</td>\n",
       "      <td>10000.0</td>\n",
       "      <td>1986-09-30</td>\n",
       "      <td>3793.0</td>\n",
       "      <td>-1.031250</td>\n",
       "      <td>-0.057143</td>\n",
       "      <td>-0.057143</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      cusip   permno       date  shrout       prc       ret      retx\n",
       "0  68391610  10000.0 1985-12-31     NaN       NaN       NaN       NaN\n",
       "1  68391610  10000.0 1986-01-31  3680.0 -4.375000       NaN       NaN\n",
       "2  68391610  10000.0 1986-02-28  3680.0 -3.250000 -0.257143 -0.257143\n",
       "3  68391610  10000.0 1986-03-31  3680.0 -4.437500  0.365385  0.365385\n",
       "4  68391610  10000.0 1986-04-30  3793.0 -4.000000 -0.098592 -0.098592\n",
       "5  68391610  10000.0 1986-05-30  3793.0 -3.109375 -0.222656 -0.222656\n",
       "6  68391610  10000.0 1986-06-30  3793.0 -3.093750 -0.005025 -0.005025\n",
       "7  68391610  10000.0 1986-07-31  3793.0 -2.843750 -0.080808 -0.080808\n",
       "8  68391610  10000.0 1986-08-29  3793.0 -1.093750 -0.615385 -0.615385\n",
       "9  68391610  10000.0 1986-09-30  3793.0 -1.031250 -0.057143 -0.057143"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = db.raw_sql(\n",
    "    \"\"\"\n",
    "    SELECT\n",
    "        cusip, permno, date, shrout, prc, ret, retx\n",
    "    FROM \n",
    "        crsp.msf\n",
    "    LIMIT 10\n",
    "    \"\"\",\n",
    "    date_cols=['date']\n",
    "    )\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>gvkey</th>\n",
       "      <th>datadate</th>\n",
       "      <th>tic</th>\n",
       "      <th>conm</th>\n",
       "      <th>at</th>\n",
       "      <th>lt</th>\n",
       "      <th>prccm</th>\n",
       "      <th>cshoq</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>006066</td>\n",
       "      <td>1962-12-31</td>\n",
       "      <td>IBM</td>\n",
       "      <td>INTL BUSINESS MACHINES CORP</td>\n",
       "      <td>2112.301</td>\n",
       "      <td>731.700</td>\n",
       "      <td>389.9996</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>006066</td>\n",
       "      <td>1963-12-31</td>\n",
       "      <td>IBM</td>\n",
       "      <td>INTL BUSINESS MACHINES CORP</td>\n",
       "      <td>2373.857</td>\n",
       "      <td>782.119</td>\n",
       "      <td>506.9994</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>006066</td>\n",
       "      <td>1964-12-31</td>\n",
       "      <td>IBM</td>\n",
       "      <td>INTL BUSINESS MACHINES CORP</td>\n",
       "      <td>3309.152</td>\n",
       "      <td>1055.072</td>\n",
       "      <td>409.4995</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>006066</td>\n",
       "      <td>1965-12-31</td>\n",
       "      <td>IBM</td>\n",
       "      <td>INTL BUSINESS MACHINES CORP</td>\n",
       "      <td>3744.917</td>\n",
       "      <td>1166.771</td>\n",
       "      <td>498.9991</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>006066</td>\n",
       "      <td>1966-12-31</td>\n",
       "      <td>IBM</td>\n",
       "      <td>INTL BUSINESS MACHINES CORP</td>\n",
       "      <td>4660.777</td>\n",
       "      <td>1338.149</td>\n",
       "      <td>371.4997</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>006066</td>\n",
       "      <td>1967-12-31</td>\n",
       "      <td>IBM</td>\n",
       "      <td>INTL BUSINESS MACHINES CORP</td>\n",
       "      <td>5598.668</td>\n",
       "      <td>1767.067</td>\n",
       "      <td>626.9995</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>006066</td>\n",
       "      <td>1968-12-31</td>\n",
       "      <td>IBM</td>\n",
       "      <td>INTL BUSINESS MACHINES CORP</td>\n",
       "      <td>6743.430</td>\n",
       "      <td>2174.291</td>\n",
       "      <td>314.9998</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>006066</td>\n",
       "      <td>1969-12-31</td>\n",
       "      <td>IBM</td>\n",
       "      <td>INTL BUSINESS MACHINES CORP</td>\n",
       "      <td>7389.957</td>\n",
       "      <td>2112.967</td>\n",
       "      <td>364.4994</td>\n",
       "      <td>113.717</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>006066</td>\n",
       "      <td>1970-12-31</td>\n",
       "      <td>IBM</td>\n",
       "      <td>INTL BUSINESS MACHINES CORP</td>\n",
       "      <td>8539.047</td>\n",
       "      <td>2591.909</td>\n",
       "      <td>317.7496</td>\n",
       "      <td>114.587</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>006066</td>\n",
       "      <td>1971-12-31</td>\n",
       "      <td>IBM</td>\n",
       "      <td>INTL BUSINESS MACHINES CORP</td>\n",
       "      <td>9576.219</td>\n",
       "      <td>2933.837</td>\n",
       "      <td>336.4996</td>\n",
       "      <td>115.534</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    gvkey    datadate  tic                         conm        at        lt  \\\n",
       "0  006066  1962-12-31  IBM  INTL BUSINESS MACHINES CORP  2112.301   731.700   \n",
       "1  006066  1963-12-31  IBM  INTL BUSINESS MACHINES CORP  2373.857   782.119   \n",
       "2  006066  1964-12-31  IBM  INTL BUSINESS MACHINES CORP  3309.152  1055.072   \n",
       "3  006066  1965-12-31  IBM  INTL BUSINESS MACHINES CORP  3744.917  1166.771   \n",
       "4  006066  1966-12-31  IBM  INTL BUSINESS MACHINES CORP  4660.777  1338.149   \n",
       "5  006066  1967-12-31  IBM  INTL BUSINESS MACHINES CORP  5598.668  1767.067   \n",
       "6  006066  1968-12-31  IBM  INTL BUSINESS MACHINES CORP  6743.430  2174.291   \n",
       "7  006066  1969-12-31  IBM  INTL BUSINESS MACHINES CORP  7389.957  2112.967   \n",
       "8  006066  1970-12-31  IBM  INTL BUSINESS MACHINES CORP  8539.047  2591.909   \n",
       "9  006066  1971-12-31  IBM  INTL BUSINESS MACHINES CORP  9576.219  2933.837   \n",
       "\n",
       "      prccm    cshoq  \n",
       "0  389.9996      NaN  \n",
       "1  506.9994      NaN  \n",
       "2  409.4995      NaN  \n",
       "3  498.9991      NaN  \n",
       "4  371.4997      NaN  \n",
       "5  626.9995      NaN  \n",
       "6  314.9998      NaN  \n",
       "7  364.4994  113.717  \n",
       "8  317.7496  114.587  \n",
       "9  336.4996  115.534  "
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = db.raw_sql(\n",
    "    \"\"\"\n",
    "    SELECT \n",
    "        a.gvkey, a.datadate, a.tic, a.conm, a.at, a.lt, b.prccm, b.cshoq\n",
    "    FROM \n",
    "        comp.funda a\n",
    "    JOIN \n",
    "        comp.secm b ON a.gvkey = b.gvkey AND a.datadate = b.datadate\n",
    "    WHERE \n",
    "        a.tic = 'IBM' AND \n",
    "        a.datafmt = 'STD' AND \n",
    "        a.consol = 'C' AND \n",
    "        a.indfmt = 'INDL'\n",
    "    LIMIT 10\n",
    "    \"\"\"\n",
    ")\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>datadate</th>\n",
       "      <th>gvkey</th>\n",
       "      <th>cusip</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1982-10-31</td>\n",
       "      <td>002484</td>\n",
       "      <td>121579932</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1983-10-31</td>\n",
       "      <td>002484</td>\n",
       "      <td>121579932</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1984-10-31</td>\n",
       "      <td>002484</td>\n",
       "      <td>121579932</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1985-10-31</td>\n",
       "      <td>002484</td>\n",
       "      <td>121579932</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1986-10-31</td>\n",
       "      <td>002484</td>\n",
       "      <td>121579932</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>1987-10-31</td>\n",
       "      <td>002484</td>\n",
       "      <td>121579932</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>1988-10-31</td>\n",
       "      <td>002484</td>\n",
       "      <td>121579932</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>1989-06-30</td>\n",
       "      <td>002484</td>\n",
       "      <td>121579932</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>1990-06-30</td>\n",
       "      <td>002484</td>\n",
       "      <td>121579932</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>1991-06-30</td>\n",
       "      <td>002484</td>\n",
       "      <td>121579932</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     datadate   gvkey      cusip\n",
       "0  1982-10-31  002484  121579932\n",
       "1  1983-10-31  002484  121579932\n",
       "2  1984-10-31  002484  121579932\n",
       "3  1985-10-31  002484  121579932\n",
       "4  1986-10-31  002484  121579932\n",
       "5  1987-10-31  002484  121579932\n",
       "6  1988-10-31  002484  121579932\n",
       "7  1989-06-30  002484  121579932\n",
       "8  1990-06-30  002484  121579932\n",
       "9  1991-06-30  002484  121579932"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "params = {\"tickers\": (\"0015B\", \"0030B\", \"0032A\", \"0033A\", \"0038A\")}\n",
    "df = db.raw_sql(\n",
    "    \"\"\"\n",
    "    SELECT \n",
    "        datadate, gvkey, cusip \n",
    "    FROM comp.funda \n",
    "    WHERE \n",
    "        tic IN %(tickers)s\n",
    "    LIMIT 10\n",
    "    \"\"\",\n",
    "    params=params,\n",
    ")\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Misc"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "28073"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = db.get_row_count('djones', 'djdaily')\n",
    "data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "db.close()"
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}