Potential DBMS Final Exam Topics
Spring, 2008
Johnson

You're responsible for all material covered in lecture (not just what appears in the PPTs) and for all assigned reading. (Conversely, something that appears in a PPT, such as a "for further info" item, but that wasn't covered in class does not count as covered.) The greatest emphasis will be on material from lecture, and the least on material from the assigned "skimming" (although there may be a question or two to check that you did it). Generally speaking, I'd recommend studying your notes from class, the slides, and the reading, in that order.

As is often the case, the final will be cumulative in theory, and somewhat cumulative in practice. That is, you remain responsible for all material covered throughout the semester, but the greater emphasis will naturally be placed on material covered since the midterm, with one proviso:

SQL remains the single most important skill you should have gained in this class, so there will be a significant amount of SQL on the final. The SQL questions will involve queries of hw2-level difficulty, as opposed to midterm-level difficulty. sqlzoo.net (which has solutions for many of the questions at the bottom of the front page) would again be a very good place to study.

In addition to short answer, SQL, and problem-solving, some questions might ask for short essay responses, e.g., compare and contrast these two things/concepts/techniques, with pros and cons, or explain what this has to do with that. The exam is intended to be moderately challenging. Good luck!

Questions will address SQL and a number of other topics, such as:

SQL:
-cross product / joins (including new-style)
-self-joins, outer-joins
-group by, having
-subqueries
-set operators
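
For practice, here is a minimal sketch of old-style vs. new-style joins, an outer join, and group by/having, run through Python's built-in sqlite3 module. The dept/emp tables and their rows are hypothetical, invented only for this illustration, and SQLite syntax may differ slightly from what we used in class.

```python
import sqlite3

# Hypothetical toy schema and data, invented only for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE emp  (id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER);
    INSERT INTO dept VALUES (1, 'Sales'), (2, 'Research'), (3, 'Legal');
    INSERT INTO emp  VALUES (1, 'Ann', 1), (2, 'Bob', 1), (3, 'Cy', 2), (4, 'Di', NULL);
""")

# Old-style join: a cross product filtered by the WHERE clause.
old_style = conn.execute(
    "SELECT e.name, d.name FROM emp e, dept d "
    "WHERE e.dept_id = d.id ORDER BY e.id"
).fetchall()

# New-style join: the same query written with JOIN ... ON.
new_style = conn.execute(
    "SELECT e.name, d.name FROM emp e JOIN dept d "
    "ON e.dept_id = d.id ORDER BY e.id"
).fetchall()

# Outer join: departments with no employees still appear, with a count of 0.
dept_counts = conn.execute(
    "SELECT d.name, COUNT(e.id) FROM dept d "
    "LEFT OUTER JOIN emp e ON e.dept_id = d.id "
    "GROUP BY d.id, d.name ORDER BY d.id"
).fetchall()

# GROUP BY with HAVING: only departments with more than one employee.
busy = conn.execute(
    "SELECT d.name FROM dept d JOIN emp e ON e.dept_id = d.id "
    "GROUP BY d.id, d.name HAVING COUNT(*) > 1"
).fetchall()
```

The two join styles return the same rows; Di (no department) drops out of both, while Legal (no employees) survives only in the outer join.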

basics of programming for SQL (regardless of language):
-opening connection/logging in
-resultsets
-cursors
-fetching

prepared statements
parameterized statements

on-the-fly content
-with CGI and external programs
-with scripting languages like PHP

simple PHP/HTML/forms coding
simple DB connection/lookup/fetch programming in PHP

security:
-hybrid protocols (ssh/https, etc.)
--authentication
--encryption
-challenge/response (car keys, etc.)
--secure hashing
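
As a rough sketch of challenge/response with secure hashing: the verifier sends a fresh random challenge, and the other side proves it knows a shared secret by returning a keyed hash of that challenge, so the secret itself never crosses the wire. The function names and the secret below are hypothetical, and HMAC-SHA256 is my choice for the example, not necessarily the scheme from lecture.

```python
import hashlib
import hmac
import os

# Hypothetical shared secret, known to both the "key" and the "car".
shared_secret = b"hypothetical-shared-secret"

def make_challenge():
    # A fresh random challenge each time, so old responses cannot be replayed.
    return os.urandom(16)

def respond(secret, challenge):
    # Keyed hash of the challenge; reveals nothing useful about the secret.
    return hmac.new(secret, challenge, hashlib.sha256).digest()

def verify(secret, challenge, response):
    # Recompute the expected response and compare in constant time.
    return hmac.compare_digest(respond(secret, challenge), response)

challenge = make_challenge()
good_response = respond(shared_secret, challenge)
bad_response = respond(b"wrong-secret", challenge)
```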

injection attacks
-what, conceptually, allows them to occur?
-how to prevent them
-writing explicit injection attacks
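
The sketch below shows, under simplifying assumptions, what lets an injection attack happen (user input pasted into the SQL text) and how a parameterized statement prevents it. The users table, the login functions, and the plaintext password are hypothetical and kept deliberately simple for the example.

```python
import sqlite3

# Hypothetical table; passwords are stored in plaintext only to keep the demo short.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'secret')")

def login_unsafe(name, password):
    # User input becomes part of the SQL text, so it can change the query itself.
    sql = ("SELECT COUNT(*) FROM users WHERE name = '" + name +
           "' AND password = '" + password + "'")
    return conn.execute(sql).fetchone()[0] > 0

def login_safe(name, password):
    # Parameterized statement: input is passed as data, never parsed as SQL.
    sql = "SELECT COUNT(*) FROM users WHERE name = ? AND password = ?"
    return conn.execute(sql, (name, password)).fetchone()[0] > 0

# The classic attack string closes the quote and adds an always-true condition.
attack = "' OR '1'='1"
```

With the attack string as the password, the unsafe query becomes `... AND password = '' OR '1'='1'`, which matches every row; the parameterized version simply looks for a password equal to that odd string and finds none.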

file organization
-why not just sorted/unsorted files?

binary search trees
-explicitly inserting into and searching
-why better than hash tables?
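
A minimal sketch of binary search tree insertion and search, for practice tracing by hand. The class and function names are my own, and the tree is unbalanced (no rotations). The in-order walk hints at one commonly cited advantage over hash tables: the keys come back in sorted order, which helps with range queries.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    # Walk down: smaller keys go left, larger keys go right; duplicates are ignored.
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root

def search(root, key):
    # Returns (found, number of nodes visited).
    steps = 0
    node = root
    while node is not None:
        steps += 1
        if key == node.key:
            return True, steps
        node = node.left if key < node.key else node.right
    return False, steps

def in_order(root):
    # Left subtree, then this node, then right subtree: yields keys in sorted order.
    if root is None:
        return []
    return in_order(root.left) + [root.key] + in_order(root.right)

root = None
for k in [50, 30, 70, 20, 40, 60, 80]:
    root = insert(root, k)
```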

hash tables:
-as indices
-what sorts of searches are they good for (firstname/lastname, etc.)?
-what kinds of problems are they used for besides indices?
-why would we want to hash the passwords stored for a system or website?
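
On the password question, one way to see the point: if only hashes are stored, someone who steals the table does not directly learn the passwords, yet the system can still check a login by hashing the attempt and comparing. The sketch below assumes a salted PBKDF2 hash from Python's hashlib; the salt, the iteration count, and the function names are my additions and go beyond what we covered.

```python
import hashlib
import hmac
import os

def hash_password(password, salt=None):
    # A random per-user salt makes identical passwords hash differently.
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)
    return salt, digest

def check_password(password, salt, stored_digest):
    # Re-hash the attempt with the stored salt and compare in constant time.
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, stored_digest)

# Only the salt and the digest would be stored, never the password itself.
salt, stored = hash_password("hunter2")
```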

trees in general:
-what (special) kinds of trees have we talked about?
-what kinds of problems are they used for?

logs
-be completely comfortable with the log operation
-we've seen LOTS of things that take log time (or have log many levels...)
-where?
-what accounts for this?
-what do they have in common?
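
One way to get comfortable with logs: count how many times you can halve n before reaching 1. The hypothetical helper below does just that, and its result matches the floor of log base 2. Repeated halving of this kind is one thing that binary search and the number of levels in a balanced tree have in common.

```python
import math

def halvings(n):
    # Number of times n can be halved (integer division) before reaching 1.
    count = 0
    while n > 1:
        n //= 2
        count += 1
    return count

# Compare the loop count against floor(log2(n)) for a few sizes.
table = [(n, halvings(n), math.floor(math.log2(n))) for n in (1, 8, 1024, 1_000_000)]
```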

PL/SQL / stored procedures / triggers
-what they are, their significance
-be able to read/understand simple programs

Mining:
-supervised vs. unsupervised learning
-frequent set problem & association rules
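
For the frequent set problem, a small sketch of computing support and confidence by direct counting. The baskets are made-up data, and the function names are my own; a real frequent-set algorithm would avoid testing every candidate against every basket.

```python
# Hypothetical market baskets, invented only for this illustration.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]

def count(itemset):
    # Number of baskets containing every item in the itemset.
    itemset = set(itemset)
    return sum(1 for basket in baskets if itemset <= basket)

def support(itemset):
    # Fraction of all baskets that contain the itemset.
    return count(itemset) / len(baskets)

def confidence(lhs, rhs):
    # For the rule lhs -> rhs: of the baskets containing lhs, the fraction also containing rhs.
    return count(set(lhs) | set(rhs)) / count(lhs)
```

Here {bread, milk} appears in 3 of 5 baskets, and bread appears in 4, so the rule bread -> milk has confidence 3/4.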

algorithms/methods to understand:
-k-NN
-Bayes learning - Bayes' Theorem
-k-means
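
As a refresher on k-NN, here is a minimal sketch: classify a query point by majority vote among its k nearest labeled neighbors. The training points and labels are made up, Euclidean distance is assumed, and math.dist needs Python 3.8 or later.

```python
import math
from collections import Counter

def knn_classify(train, query, k):
    # train is a list of ((x, y), label) pairs.
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    # Majority vote among the k closest points.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical labeled points forming two well-separated clusters.
train = [
    ((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
    ((6, 6), "B"), ((7, 6), "B"), ((6, 7), "B"),
]
```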

websearch
-inverted index
-be able to perform/calculate PageRank
-understand the analogies, like the random surfer model
-why do Google Bombs work?
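
To practice the PageRank calculation, a sketch of the iterative version under the random surfer model: with probability d the surfer follows a random outgoing link, otherwise jumps to a random page. The three-page graph is made up, d = 0.85 is an assumed damping value, every page is assumed to have at least one outgoing link, and the normalization may differ from the formula used in lecture.

```python
def pagerank(links, damping=0.85, iterations=50):
    # links maps each page to the list of pages it links to (no dangling pages assumed).
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Random-jump share, spread evenly over all pages.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            # Each page splits its damped rank evenly among its outgoing links.
            share = damping * rank[page] / len(outlinks)
            for target in outlinks:
                new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical web: A links to B and C, B links to C, C links to A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(links)
```

C collects links from both A and B, so it ends up ranked highest; A, which gets all of C's rank, is close behind, and B trails.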

we've seen several iterative algorithms
-where?
-why are they used?

how RAID works
-given the data on n disks for level 4, show the RAID (parity) data
-given the RAID disk and n-1 data disks, with one missing (destroyed) disk, compute the data for the missing disk
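
Both RAID level 4 exercises come down to XOR: the parity disk is the bitwise XOR of the data disks, and a lost disk is the XOR of everything that survives. A minimal sketch, with made-up two-byte "disks" and a hypothetical helper name:

```python
from functools import reduce

def xor_blocks(blocks):
    # XOR the blocks together position by position.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Hypothetical data disks, two bytes each.
data_disks = [
    bytes([0b1010, 0b1111]),
    bytes([0b0110, 0b0001]),
    bytes([0b0011, 0b1000]),
]

# The parity (RAID) disk is the XOR of all data disks.
parity = xor_blocks(data_disks)

# Suppose disk 1 is destroyed: XOR the surviving data disks with the parity disk.
survivors = [data_disks[0], data_disks[2], parity]
rebuilt = xor_blocks(survivors)
```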

big recurring themes:
-SQL is really just relational algebra, i.e., tables are relations (i.e., subsets of cross products of sets)
-programming for SQL turns out to be basically the same, regardless of language
-security being asymmetric, and therefore hard
-distributed intelligence: many users + backend DB - programming = web2.0
-iteration making problems more tractable
-things that take log-time/#levels
-organization (i.e., data structures) influencing how easy it is to find data
-lots of things in life turn out to be trees (or hashes, or sets...)

And everything from the midterm, everything on hw3, and anything else we've done that I've forgotten about...