Last updated: Sunday, November 25, 2018, 5:08 PM.
These pages contain documents, programs, and program outputs related to my paper "Price Discovery in High Resolution".
These are contained in the zip archive HRVAR_V01.zip. The table at the end of this page summarizes the programs and maps them to tables and figures in the paper.
There are three program subdirectories, organized by source language:
- Mathematica is used only for the computational appendix.
- SAS programs are used mainly to extract the TAQ data from WRDS.
- Matlab is used for most of the estimation code.
Notes on the SAS programs
The first extraction program (taqExtract01.sas) runs on WRDS. It extracts the consolidated trade and quote (TAQ) data for the sample and builds the various BBO (best bid and offer) series used in the paper. The second program (taqExtract02.sas) runs locally and converts the SAS datasets produced by taqExtract01 to csv files, which can be read directly into Matlab. The output from the extraction programs is placed in the wrdsSASdatasets directory. Because the data are proprietary, I cannot supply these files.
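The BBO construction itself is done in SAS. Purely to illustrate the idea, and not as a translation of taqExtract01.sas, the Python sketch below keeps the latest quote from each exchange and recomputes the best bid (highest bid) and best offer (lowest ask) after every update. The record layout, the exchange codes, and the function name `build_bbo` are hypothetical; actual TAQ quote records carry additional fields, such as sizes and quote conditions, that a real extraction must handle.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical quote record: (timestamp, exchange code, bid, ask).
Quote = Tuple[int, str, float, float]


def build_bbo(quotes: Iterable[Quote]) -> List[Tuple[int, float, float]]:
    """Return (timestamp, best_bid, best_ask) after each quote update.

    Minimal sketch: an exchange's newest quote replaces its previous one,
    and the BBO is the highest bid and lowest ask across all exchanges.
    Quotes are assumed to arrive sorted by timestamp.
    """
    latest: Dict[str, Tuple[float, float]] = {}
    bbo: List[Tuple[int, float, float]] = []
    for ts, exchange, bid, ask in quotes:
        latest[exchange] = (bid, ask)
        best_bid = max(b for b, _ in latest.values())
        best_ask = min(a for _, a in latest.values())
        bbo.append((ts, best_bid, best_ask))
    return bbo


if __name__ == "__main__":
    # Illustrative data only; exchange codes are placeholders.
    sample = [
        (1, "N", 100.00, 100.05),
        (2, "Q", 100.01, 100.06),
        (3, "N", 99.99, 100.03),
    ]
    for row in build_bbo(sample):
        print(row)
```

Rescanning every exchange on each update is simple and adequate when there are only a dozen or so quoting venues; a fuller version would also need to handle one-sided and withdrawn quotes.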
Notes on the Matlab programs
The main programs and scripts are in /MatlabNovember2018. The /mFiles subdirectory contains most of the functions, and the /mClasses subdirectory contains the class definitions. Programs whose names start with "MVARi" perform the computations. An MVARi object (Microstructure Vector Autoregression, version i) describes a VAR/VECM and serves as a container for the estimation results. MVARi methods set up the VAR/VECM, build the crossproduct matrix, compute impulse response functions, and so forth.
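As a rough illustration of two of those steps (building a crossproduct matrix and computing impulse responses), the Python/NumPy sketch below handles a plain dense VAR(p) with no intercept, so the data are assumed to be demeaned. It stacks current and lagged observations into Z, forms Z'Z, solves the normal equations from submatrices of Z'Z, and iterates the VAR recursion for a unit shock. This is not the MVARi implementation, which uses sparse structures and also handles VECMs; all function names are hypothetical.

```python
import numpy as np


def var_crossproduct(y: np.ndarray, p: int) -> np.ndarray:
    """Crossproduct Z'Z, where row t of Z is [y_t, y_{t-1}, ..., y_{t-p}].

    y is a T x n array; the first p rows (incomplete lags) are dropped.
    """
    T = y.shape[0]
    # Column block k holds y lagged k periods (k = 0 is the dependent variable).
    z = np.hstack([y[p - k:T - k] for k in range(p + 1)])
    return z.T @ z


def var_coefficients(zz: np.ndarray, n: int) -> np.ndarray:
    """OLS coefficients [A_1 ... A_p] (n x n*p), read off the crossproduct matrix."""
    xx = zz[n:, n:]  # X'X: lagged regressors
    xy = zz[n:, :n]  # X'Y: lagged regressors against current values
    return np.linalg.solve(xx, xy).T


def impulse_responses(a: np.ndarray, shock: np.ndarray, horizon: int) -> np.ndarray:
    """Responses y_0..y_horizon to an initial shock, via y_t = sum_k A_k y_{t-k}."""
    n = shock.shape[0]
    p = a.shape[1] // n
    lags = [a[:, k * n:(k + 1) * n] for k in range(p)]
    path = [shock.astype(float)]
    for t in range(1, horizon + 1):
        y_t = np.zeros(n)
        for k, a_k in enumerate(lags, start=1):
            if t - k >= 0:
                y_t += a_k @ path[t - k]
        path.append(y_t)
    return np.array(path)
```

One design point this illustrates: once Z'Z has been formed, the coefficients can be solved for without revisiting the (possibly very long) data series.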
There are three sorts of analyses: "PartSIP" (participant vs. securities information processor time stamps); "LexExl" (bids and asks from listing and non-listing exchanges); and "DarkLit" (quotes, lit trades, and dark trades).
Each analysis comes in three versions, identified by a numerical suffix: for example, MVARiPartSIPxx, where xx is 01, 02, or 03. The 01 analyses cover IBM and NVDA for October 3, 2016, at resolutions ranging from one second down to 10 microseconds; the 02 analyses cover October 3 through November 11, 2016, at a 10-microsecond resolution; the 03 analyses are low-order estimations at multiple resolutions that are subsequently combined into bridged IRF analyses.
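To make "resolution" concrete, the Python sketch below maps event timestamps onto a clock with a chosen interval length and keeps the last observed value in each interval, carrying values forward through empty intervals. This is a hypothetical illustration, not the paper's code: the integer-nanosecond time stamps, the last-value-plus-forward-fill convention, and the function name `sample_on_clock` are all assumptions.

```python
import math
from typing import List, Sequence, Tuple

NS_PER_MICROSECOND = 1_000  # assumption: time stamps are integer nanoseconds


def sample_on_clock(
    events: Sequence[Tuple[int, float]],
    start_ns: int,
    end_ns: int,
    resolution_ns: int,
) -> List[float]:
    """Last value observed in each interval of length resolution_ns.

    events are (timestamp_ns, value) pairs sorted by time. Interval k covers
    [start_ns + k*resolution_ns, start_ns + (k+1)*resolution_ns). Intervals
    with no event carry the previous value forward; intervals before the
    first event are NaN. Events outside the full intervals are ignored.
    """
    n_bins = (end_ns - start_ns) // resolution_ns
    out = [math.nan] * n_bins
    for ts, value in events:
        k = (ts - start_ns) // resolution_ns
        if 0 <= k < n_bins:
            out[k] = value  # a later event in the same interval overwrites an earlier one
    for k in range(1, n_bins):
        if math.isnan(out[k]):
            out[k] = out[k - 1]  # forward-fill; stays NaN until the first event
    return out


if __name__ == "__main__":
    events = [(0, 1.0), (5_000, 2.0), (25_000, 3.0)]
    for micros in (10, 20):  # two illustrative resolutions
        res = micros * NS_PER_MICROSECOND
        print(micros, sample_on_clock(events, 0, 40_000, res))
```

Coarser resolutions put more events into the same interval, so more of the within-interval ordering is lost; that is the information the finer resolutions are meant to preserve.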
The production runs are executed as batch array jobs (one job for each symbol/date) on NYU's High Performance Computing (HPC) system. Each job produces three files: a listing file containing the Matlab diary (a copy of the screen output) for the run; an "out" file containing the log; and a "mat" file containing the saved results of the run. Most of these files are downloaded to the /HPC directory. Programs whose names start with "summary" summarize the HPC runs and produce figures.
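Each array task handles one symbol/date. The Python sketch below shows one way a task could recover its symbol and date from its array index. It assumes a Slurm scheduler, which exposes the index in the SLURM_ARRAY_TASK_ID environment variable; the scheduler, the symbol-major ordering, the symbol and date lists, and the function name `task_for_index` are assumptions for illustration (the actual production jobs are Matlab runs).

```python
import os
from typing import Sequence, Tuple


def task_for_index(index: int, symbols: Sequence[str], dates: Sequence[str]) -> Tuple[str, str]:
    """Map a zero-based array index to a (symbol, date) pair.

    Tasks are ordered symbol-major: all dates for the first symbol,
    then all dates for the second symbol, and so on.
    """
    if not 0 <= index < len(symbols) * len(dates):
        raise IndexError(f"array index {index} out of range")
    symbol_pos, date_pos = divmod(index, len(dates))
    return symbols[symbol_pos], dates[date_pos]


if __name__ == "__main__":
    symbols = ["IBM", "NVDA"]         # hypothetical job list
    dates = ["20161003", "20161004"]  # hypothetical job list
    # Assumption: Slurm sets SLURM_ARRAY_TASK_ID for each task of an array job.
    # Default to 0 when run outside the scheduler.
    index = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))
    symbol, date = task_for_index(index, symbols, dates)
    print(f"task {index}: symbol={symbol} date={date}")
```

If the array is declared with 1-based task IDs, subtract one before calling the mapping function.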
Also note:
- Matlab does not reinitialize diary files; it appends new output to an existing file. Some of the listing files may therefore include output from earlier, incomplete runs.
- The computational routines are structured for parallel processing: they use "parfor" loops instead of "for" loops. Some runs generated warning messages, which are visible in the "out" files.
Notes on the live script files
In addition to the production runs, there are two Matlab live script (.mlx) files. They document the essential classes and computations in the paper and also contain executable Matlab code:
- VECM_demo.mlx (and the nonexecutable VECM_demo.pdf) demonstrates the setup and estimation of VECMs.
- MVARClassesDemo.mlx (and the nonexecutable MVARClassesDemo.pdf) demonstrates the basic sparse vector, polynomial, and crossproduct classes.
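As a rough Python/NumPy counterpart to the VECM demonstration, and not a transcription of VECM_demo.mlx, the sketch below estimates a VECM Δp_t = α(β'p_{t-1}) + Σ_k Γ_k Δp_{t-k} + ε_t for several prices of the same security. It assumes the cointegrating vector β is known (for two prices, β = (1, -1)', so the error-correction term is the price difference); with β fixed, the model can be estimated equation by equation with OLS. The function name `estimate_vecm` is hypothetical.

```python
import numpy as np


def estimate_vecm(p: np.ndarray, lags: int, beta: np.ndarray):
    """OLS estimates for dp_t = alpha*(beta' p_{t-1}) + sum_k Gamma_k dp_{t-k} + e_t.

    p is a T x n array of prices; beta (n,) is treated as known.
    Returns alpha (n,), [Gamma_1 ... Gamma_lags] (n x n*lags), and the
    residual covariance matrix (divided by the number of observations).
    """
    dp = np.diff(p, axis=0)             # dp[j] = p[j+1] - p[j]
    t1 = dp.shape[0]
    y = dp[lags:]                       # dependent variable dp_t
    ec = (p[lags:-1] @ beta)[:, None]   # error-correction term beta' p_{t-1}
    lagged = [dp[lags - k:t1 - k] for k in range(1, lags + 1)]  # dp_{t-k}
    x = np.hstack([ec] + lagged)
    coef, *_ = np.linalg.lstsq(x, y, rcond=None)
    resid = y - x @ coef
    omega = resid.T @ resid / resid.shape[0]
    alpha = coef[0]
    gammas = coef[1:].T
    return alpha, gammas, omega
```

Treating β as known is what keeps the estimation linear; if β had to be estimated, a reduced-rank procedure would be needed instead of plain OLS.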
Program cross-reference table:
| Analysis | Description | Language | Program 1 | Program 2 | Result directory | Output files | Output figures | Paper tables | Paper figures |
|---|---|---|---|---|---|---|---|---|---|
| PartSIP | Resolutions 1 s to 10 µs | Matlab | MVARiPartSIP01.m | summaryPartSIP01.m | summary01 | summaryPartSIP01.xlsx | summaryPartSIP01 IBM/NVDA.fig/jpg | 5A | 1 |
| LexExl | Resolutions 1 s to 10 µs | Matlab | MVARiLexExl01.m | summaryLexExl01.m | summary01 | summaryLexExl01.xlsx | summaryLexExl01 IBM/NVDA.fig/jpg | 6A | 2 |
| DarkLit | Resolutions 1 s to 10 µs | Matlab | MVARiDarkLit01.m | summaryDarkLit01.m | summary01 | summaryDarkLit01.xlsx | summaryDarkLit01 IBM/NVDA.fig/jpg | 7A | 3 |
| PartSIP | Event time | Matlab | MVARiPartSIP01ET.m | MVARi01ETStats.m | summary01 | allStatsET.txt | | 5A, 5B | |
| LexExl | Event time | Matlab | MVARiLexExl01ET.m | MVARi01ETStats.m | summary01 | allStatsET.txt | | 6A, 6B | |
| DarkLit | Event time | Matlab | MVARiDarkLit01ET.m | MVARi01ETStats.m | summary01 | allStatsET.txt | | 7A, 7B | |
| PartSIP | 10 µs resolution; 30 days | Matlab | MVARiPartSIP02.m | MVARi02Stats.m | summary02 | allStats.txt | | 5B | |
| LexExl | 10 µs resolution; 30 days | Matlab | MVARiLexExl02.m | MVARi02Stats.m | summary02 | allStats.txt | | 6B | |
| DarkLit | 10 µs resolution; 30 days | Matlab | MVARiDarkLit02.m | MVARi02Stats.m | summary02 | allStats.txt | | 7B | |
| PartSIP | Bridged info shares | Matlab | MVARiPartSIP03.m | summaryPartSIP03.m | summary03 | summaryPartSIP03.txt | summaryPartSIP03 IBM/NVDA.fig/jpg | 9A | 5 |
| LexExl | Bridged info shares | Matlab | MVARiLexExl03.m | summaryLexExl03.m | summary03 | summaryLexExl03.txt | summaryLexExl03 IBM/NVDA.fig/jpg | 9B | |
| DarkLit | Bridged info shares | Matlab | MVARiDarkLit03.m | summaryDarkLit03.m | summary03 | summaryDarkLit03.txt | summaryDarkLit03 IBM/NVDA.fig/jpg | 9C | |
| | WRDS TAQ extraction | SAS | taqExtract01.sas | | | | | | |
| | SAS datasets to csv | SAS | taqExtract02.sas | | | | | | |
| | Descriptive statistics | SAS | DescStats01.sas | | odsDescStats | tables234.rtf/docx | | 2, 3, 4 | |
| | Bridged models | SAS | varsim04.sas | | odsVarsim04 | varsim04_1.rtf/docx | varsim04_1.rtf/docx | 8 | 4 |
| | VECM examples | MLX | VECM_demo.mlx | VECM_demo.pdf | | | | | |
| | Sparse vector examples | MLX | MVARClassesDemo.mlx | MVARClassesDemo.pdf | | | | | |
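Several rows of the table refer to bridged information shares. For orientation only, the Python/NumPy sketch below computes standard (Hasbrouck 1995) information shares for a single variable ordering from two inputs: ψ, the common row of the VECM's long-run impact matrix, and Ω, the innovation covariance matrix. The share of variable j is ([ψF]_j)^2 / (ψΩψ'), where F is the lower-triangular Cholesky factor of Ω. It does not reproduce the paper's bridging across resolutions; the function name and the inputs in the example are illustrative.

```python
import numpy as np


def information_shares(psi: np.ndarray, omega: np.ndarray) -> np.ndarray:
    """Information shares for the variable ordering implied by omega.

    psi: (n,) common row of the long-run impact matrix.
    omega: (n, n) innovation covariance matrix.
    The shares are nonnegative and sum to one. They depend on the ordering
    of the variables whenever omega has nonzero off-diagonal elements.
    """
    f = np.linalg.cholesky(omega)  # lower triangular, omega = f @ f.T
    contrib = (psi @ f) ** 2        # squared long-run contribution of each orthogonalized shock
    return contrib / (psi @ omega @ psi)


if __name__ == "__main__":
    psi = np.array([0.6, 0.4])                  # illustrative values
    omega = np.array([[1.0, 0.5], [0.5, 1.0]])  # illustrative values
    first = information_shares(psi, omega)
    # Reverse the ordering, recompute, and map back to the original order.
    last = information_shares(psi[::-1], omega[::-1, ::-1])[::-1]
    print("variable 1 ordered first:", first)
    print("variable 1 ordered last: ", last)
```

Because the Cholesky factorization assigns correlated innovations to whichever variable is ordered first, recomputing under different orderings gives the range of shares attributable to each variable.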