More free advice, and worth what you paid for it. This is one of a
collection of advice pages, all with the same theme: get skills,
especially quant skills, because they're fun and they pay off. The
pages include advice for undergrads, advice for MBAs (soon!), and this
one. None of
these thoughts are the official view of the Department, the School, or
the University, but we (meaning my colleagues and I) believe them to be
accurate, maybe even useful. Comments, of course, are welcome.

These thoughts are aimed at both undergrad and MBA students. The
main difference between them is time: undergrads have four years to
take courses, MBAs only have two. Even so, we think a few courses along
these lines will prove useful to you in your life and career.

[Work in progress, please send comments for improvement]

**Q2. How do I get started?**

We like the idea of leaping into the deep end of the pool to see if
you know how to swim, but you will find it easier if you have some
background:

- First, **programming experience** is really helpful,
but if you don't have it, you can teach yourself. See Q4 below. The
languages of choice in the business world right now are Python, R,
Matlab, and C++. They serve different purposes, but we find that
once you know one, learning another isn't that hard. If you're
looking for a place to start, we recommend Python. More below.
- Second, it's helpful to know some **basic math**.
Calculus and linear algebra are incredibly useful, and you would
benefit from familiarity with each.

Even without this background, there's a lot you can do, but you can
do more if you have at least the first of these.

**Q3. Yeah, fine, but what about Data Science?**

If you're not ready to swim yet, return to Q2. How will you know?
Look at the prereqs of the courses you want to take. Or give them a try
and see if they cause you more stress than you're ready to deal with.

If you're ready to go, you can put these skills to work in a number
of ways. Data Science is a portfolio of skills: you can get them one at
a time or focus on those that interest you most. A relatively standard
set of courses would include some or all of the following:

- Data science: an overview. Here's an example
with lots of hands-on practice with data and (mostly) Python
programming.
- Probability theory: mathematical models of randomness.
- Multivariate statistics or econometrics: estimating linear models
-- picture a scatterplot with a straight line drawn through it.
- Data mining: a variety of methods for finding patterns in large
datasets.
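
If "a scatterplot with a straight line drawn through it" sounds abstract, here's a minimal sketch of what estimating that line involves. It's our own illustration, not taken from any of the courses: the data (hours studied and exam scores) are made up, and the function name `fit_line` is hypothetical. It uses ordinary least squares and nothing beyond basic Python.

```python
# Fit a straight line y = intercept + slope * x to a small, made-up
# dataset by ordinary least squares, using only built-in Python.

def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = (co-movement of x and y) / (spread of x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line passes through the point of means
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data: hours studied (x) and exam score (y)
hours = [1, 2, 3, 4, 5]
scores = [52, 57, 61, 68, 72]

intercept, slope = fit_line(hours, scores)
print(f"score = {intercept:.1f} + {slope:.1f} * hours")
# prints: score = 46.7 + 5.1 * hours
```

In practice you would let a statistics package do this for you, but the arithmetic underneath is no more than what you see here.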

In most cases, there are applied versions of these courses at Stern
and more theoretical versions at Courant. You can find descriptions of
courses and programs at NYU's Center for Data Science, at
Stern's IOMS
group, and at the Courant Institute of
Mathematical Sciences. This is way too much information, but you
might want to page through it anyway. For current purposes, you should
probably ignore the programs and focus on the courses and their
content. If there's a course at NYU for one program, you can probably
count it toward another program.

One thing we'd add: get some practice with graphics, which now go
by the name "visualization." Graphs give you a good first cut at data and
can be an effective way to describe it to others.

**Q4. Can you tell me more about Python?**

We thought you'd never ask! We think Python
is the obvious entry point if you want to find out what programming is
about. It's a flexible, general-purpose, high-level language. High-level
means you don't need to spell out every step yourself: the language
handles most of the low-level details for you. It's quickly building a
large community of users, including
many at investment and consulting firms. It has packages that allow you
to do scientific programming, graphics, and data analysis. It's one of
the languages supported by Google App Engine. It's a skill employers ask for.
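
To make "high-level" a little more concrete, here's a small sketch of our own (the sentence is made up). It counts how often each word appears and reports the most common ones. Notice that we never say how to loop, store, or sort anything; the built-in `Counter` tool handles that.

```python
# Count how often each word appears in a (made-up) sentence.
from collections import Counter

text = "the quick brown fox jumps over the lazy dog and the quick cat"

counts = Counter(text.split())       # split into words and tally them
top_three = counts.most_common(3)    # the three most frequent words

print(top_three[0])   # ('the', 3)
print(top_three[1])   # ('quick', 2)
```

In a lower-level language, the same task would typically mean managing memory and writing the counting and sorting logic yourself.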

If you want a taste, you can teach yourself Python as a summer or
winter break project. It'll take some discipline, but Professor Okun
did it, and says if he can, then you can, too.

[What follows is a work in progress, but stop by or email if you
have questions.]

**Distributions.** If you do this, you will need the
program (the "code distribution"), a user interface (a "GUI" or "IDE"),
and a collection of packages that do whatever specialized tasks you're
interested in.

**Packages.** There are lots of packages, but these are essential.
NumPy does vector arithmetic, making it a good substitute for Matlab.
SciPy does math and statistics. Pandas does data analysis. Matplotlib
does graphics.
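
Here's a minimal sketch of what "vector arithmetic" means in NumPy. It's our own example and assumes NumPy is installed (it comes with Anaconda); the prices and quantities are made up. The point is that you operate on whole arrays at once, much as you would in Matlab, rather than writing loops.

```python
import numpy as np

# Hypothetical data: prices and quantities sold for three products
prices = np.array([10.0, 20.0, 30.0])
quantities = np.array([3, 2, 1])

revenue = prices * quantities    # element-by-element product, no loop needed
total = revenue.sum()            # add up the whole array
discounted = prices * 0.9        # a single number applies to every element

print(revenue)      # revenue for each product
print(total)        # total revenue across products
print(discounted)   # every price cut by 10 percent
```

Pandas builds on the same idea, adding labeled rows and columns so that data looks more like a spreadsheet.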

**Installing Python.** There are lots of ways to do this
-- unfortunately. Here are two we like a lot. Wakari lets you edit and run IPython
notebooks online. The beauty of this is that you get a controlled
operating environment with no setup. Anaconda gives
you a standard set of quantitative packages and comes with Spyder, a
user interface that makes it easy to edit and run programs. In both
cases, I prefer to use recent versions of Python (some version of
Python 3). That leads to occasional conflicts with older versions, but
I think you're better off looking to the future.

**Getting started.** There are lots of resources out
there, but we think the two best are Learn Python the Hard Way
and Codecademy.
The latter is particularly user-friendly because you do all your coding
online. Both give you the basics, but only the basics: you won't see
Pandas or Matplotlib, the standard tools for data management and
graphics, respectively. You can get a quick sense of what they do from
Sargent and Stachurski's Quantitative
Economics. It's very good, but a little on the terse side. We're
working on our own Data Bootcamp, which will cover that material a
little more systematically, but it's not ready for prime time yet.
Here's a link to a very
rough draft. When we fix it up, we'll try to update the link.

Comments welcome on all of this.

**Q5. Does economics help here?**

Our goal here isn't to sell you on economics, although we think
economics is helpful in giving you a framework for thinking about data.
See the comments by Susan
Athey, the chief economist at Microsoft, and Michael
Bailey, who works at Facebook.

More advice along the same lines: Undergrad advice | Graduate programs | MBA advice (soon!)