Potential DBMS Final Exam Topics
Spring, 2004
Johnson

It goes without saying, but you're responsible for all material covered in lecture (and not just in PPTs) and for all assigned reading. (Incidentally, something that appears in a PPT, say under "for further info", but that wasn't covered in class was, well, not covered.) The greatest emphasis will be on material from lecture, and the least emphasis will be on material from the assigned "skimming" (although there may be some to check that you did it). Generally speaking, I'd recommend studying your notes from class, the slides, and the reading, in that order.

As is often the case, the final will be cumulative in theory, and somewhat cumulative in practice. That is, you remain responsible for all material covered throughout the semester, but the greater emphasis will naturally be placed on material covered since the midterm, with one proviso:

SQL remains the single most important skill you should have gained in this class, so there will be a significant amount of SQL on the final. The SQL questions will involve queries of hw2/hw3-level difficulty, as opposed to midterm-level difficulty. sqlzoo.net (which has solutions for many of the questions at the bottom of the front page) would again be a very good place to study. Also, here are some practice SQL questions.

In addition to short answer, SQL, and problem-solving, some questions might ask for short essay responses: i.e., compare and contrast these two things/concepts/techniques, or what does this have to do with that?

The exam is intended to be moderately challenging (more difficult than the midterm, and with no optional questions). It won't be the sort of exam on which you have to get 95% to get a good grade. As you know, final grades will be curved. This is a very small class, so, to the extent that scores are high, I'll be hoping to be persuaded to stray somewhat from the Stern Curve. Help me out!

Don't forget to do the anonymous final eval by Tuesday, May 3, in order not to lose 50% of your final grade.

Apart from SQL, questions will address a number of other topics, such as:

hash tables:
-what sorts of searches are they good for (firstname/lastname, etc.)?
-what kinds of problems are they used for besides indices?
-why would we want to hash the passwords stored for a system or website?
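
For the password question, here is a minimal Python sketch of storing a salted hash instead of the password itself. The function names (hash_password, verify_password) and the iteration count are assumptions made for illustration, not anything from lecture; the point is only that the stored value can be checked against a login attempt without the plaintext ever being kept.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest) for a password, using PBKDF2-HMAC-SHA256."""
    if salt is None:
        # A fresh random salt means two users with the same password
        # still end up with different stored digests.
        salt = os.urandom(16)
    # 100_000 iterations is an illustrative choice, not a recommendation.
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-hash the attempt with the stored salt and compare digests."""
    _, digest = hash_password(password, salt)
    # compare_digest avoids leaking information through timing.
    return hmac.compare_digest(digest, expected)
```

If the table of (salt, digest) pairs leaks, an attacker gets hashes to guess against rather than ready-to-use passwords.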

trees in general:
-what (special) kinds of trees have we talked about?
-what kinds of problems are they used for?

Granting and revoking permissions

Perl/CGI/PHP/HTML/HTTP
-how does Perl/CGI produce dynamic content in response to HTTP requests?
-how does PHP do it?
-write a very simple dynamic page, using whichever language you used for the project
-what are injection attacks?
--given a particular query and get/post vars, find inputs that would result in the user being (wrongly) authenticated or, say, allow the user to do something Bad
--what, conceptually, allows them to occur?
--what are the ways we can prevent them?
--does hashing our stored passwords help?
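
To make the injection questions concrete, here is a small sketch in Python with an in-memory SQLite database (standing in for whichever language and DBMS you used for the project). The users table, its contents, and the function names are hypothetical, and the passwords are stored in plaintext only to keep the example short. The unsafe version pastes user input into the SQL text; the safe version passes it as bound parameters.

```python
import sqlite3


def make_db():
    """Build a throwaway database with one hypothetical user."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
    conn.execute("INSERT INTO users VALUES ('alice', 'secret')")
    return conn


def login_unsafe(conn, name, password):
    # Vulnerable: the inputs become part of the SQL statement itself,
    # so a quote character in the input can change the query's meaning.
    sql = ("SELECT COUNT(*) FROM users "
           "WHERE name = '%s' AND password = '%s'" % (name, password))
    return conn.execute(sql).fetchone()[0] > 0


def login_safe(conn, name, password):
    # Parameterized: the inputs are sent as data, never parsed as SQL.
    sql = "SELECT COUNT(*) FROM users WHERE name = ? AND password = ?"
    return conn.execute(sql, (name, password)).fetchone()[0] > 0


conn = make_db()
attack = "' OR '1'='1"
print(login_unsafe(conn, "alice", attack))  # the WHERE clause is now always true
print(login_safe(conn, "alice", attack))    # treated as a (wrong) literal password
```

With the attack string, the unsafe query becomes ... AND password = '' OR '1'='1', and since AND binds tighter than OR, the condition holds for every row.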

SSD/XML:
-how does XML differ from HTML?
-what makes XML "extensible"?
-what are DTDs/Schemas for?
-what are Well-formed and Valid XML?
-convert a non-tree SSD graph into XML, using ids and idrefs
-why, in terms of trees or hard disk directory structure, is XML in which the tags are closed in the wrong order not allowed?
-discuss some important uses of XML
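
As a sketch of the id/idref idea and of well-formedness, the Python below uses the standard-library XML parser. The document, element names, and attribute names (id, taughtBy) are made up for illustration. Note the assumption: without a DTD declaring them as ID/IDREF, these are just ordinary attributes, and this parser checks well-formedness only, not validity.

```python
import xml.etree.ElementTree as ET

# Hypothetical graph: two courses point at the same instructor, so the
# data is not a tree. The shared node is written once and referenced by id.
DOC = """<school>
  <instructor id="i1"><name>A. Instructor</name></instructor>
  <course id="c1" taughtBy="i1"><title>Databases</title></course>
  <course id="c2" taughtBy="i1"><title>Data Mining</title></course>
</school>"""

root = ET.fromstring(DOC)
# Index every element that carries an id attribute.
by_id = {el.get("id"): el for el in root.iter() if el.get("id")}


def instructor_of(course_id):
    """Follow the course's reference attribute to the instructor element."""
    course = by_id[course_id]
    return by_id[course.get("taughtBy")].findtext("name")


def is_well_formed(text):
    """True if the parser accepts the text, i.e. tags nest properly."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(instructor_of("c1") == instructor_of("c2"))  # both resolve to one node
print(is_well_formed("<a><b></b></a>"))            # properly nested
print(is_well_formed("<a><b></a></b>"))            # closed in the wrong order
```

The last case fails because an element is a subtree: b would have to be both inside a and extend past the end of a, much like a file that is inside a directory and outside it at once.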

What is data warehousing?
-why is it done?
-what does it have to do with GROUP BY queries?
-What are OLAP and OLTP?
-what are data cubes?
-be able to write ROLLUP and CUBE queries
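
One way to check your understanding of ROLLUP and CUBE is to spell out the plain GROUP BY queries they stand for. The sketch below does that in Python against SQLite, which (as an assumption of this example) is used only because it ships with Python; it has no ROLLUP or CUBE, so the equivalents are written by hand with UNION ALL. The sales table and its rows are invented. In a DBMS that supports them, such as Oracle, the first query would be written GROUP BY ROLLUP(region, product) and the second GROUP BY CUBE(region, product).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("East", "apples", 10), ("East", "pears", 5),
    ("West", "apples", 7), ("West", "pears", 3),
])

# ROLLUP(region, product): detail rows, per-region subtotals, grand total.
ROLLUP_EQUIVALENT = """
SELECT region, product, SUM(amount) FROM sales GROUP BY region, product
UNION ALL
SELECT region, NULL, SUM(amount) FROM sales GROUP BY region
UNION ALL
SELECT NULL, NULL, SUM(amount) FROM sales
"""

# CUBE(region, product): everything in the rollup, plus per-product subtotals.
CUBE_EQUIVALENT = ROLLUP_EQUIVALENT + """
UNION ALL
SELECT NULL, product, SUM(amount) FROM sales GROUP BY product
"""

rollup_rows = conn.execute(ROLLUP_EQUIVALENT).fetchall()
cube_rows = conn.execute(CUBE_EQUIVALENT).fetchall()

for row in cube_rows:
    print(row)  # NULL (None) in a column means "all values of this column"
```

ROLLUP over k columns yields k+1 groupings, while CUBE yields all 2^k of them, which is why CUBE output corresponds to every face of a data cube.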

What are some technologies used for data mining?
-are any of them trees?
-how can you find frequent itemsets or association rules?
-why would you want to?
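
For frequent itemsets and association rules, the following Python sketch counts itemsets by brute force and computes the confidence of a rule. It is deliberately the naive approach (no pruning of candidates), and the baskets, function names, and thresholds are all invented for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market baskets, one list of items per transaction.
BASKETS = [
    ["bread", "milk"],
    ["bread", "diapers", "beer"],
    ["milk", "diapers", "beer"],
    ["bread", "milk", "diapers"],
    ["bread", "milk", "beer"],
]


def frequent_itemsets(baskets, min_support, max_size=2):
    """Count every itemset up to max_size; keep those in >= min_support baskets."""
    counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # sorted so each itemset has one canonical tuple
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                counts[combo] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


def confidence(baskets, lhs, rhs):
    """Confidence of the rule lhs -> rhs: support(lhs and rhs) / support(lhs)."""
    lhs_set = set(lhs)
    both = lhs_set | set(rhs)
    n_lhs = sum(1 for b in baskets if lhs_set <= set(b))
    n_both = sum(1 for b in baskets if both <= set(b))
    return n_both / n_lhs if n_lhs else 0.0


print(frequent_itemsets(BASKETS, min_support=3))
print(confidence(BASKETS, ["bread"], ["milk"]))
```

Here bread and milk appear together in 3 of the 4 baskets containing bread, so the rule bread -> milk has confidence 0.75.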

How does Google work?
-more narrowly, how does PageRank work?
-what has Google got to do with RAID?
-why do "Google bombs" work?
-what is an inverted index?
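
Two of these ideas are short enough to sketch in Python. The first function is the textbook iterative form of PageRank with a damping factor; the second builds a toy inverted index. The link graph, the damping value of 0.85, the iteration count, and the handling of pages with no outgoing links are assumptions of this sketch, not a description of what Google actually runs.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Iteratively compute ranks for a dict mapping page -> list of linked pages."""
    pages = sorted(set(links) | {q for targets in links.values() for q in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share (the "random jump").
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[p] / len(targets)
                for q in targets:
                    new_rank[q] += share
            else:
                # Assumption: a page with no out-links spreads its rank everywhere.
                share = damping * rank[p] / n
                for q in pages:
                    new_rank[q] += share
        rank = new_rank
    return rank


def build_inverted_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(doc_id)
    return index


ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
print(sorted(ranks, key=ranks.get, reverse=True))  # pages from highest rank to lowest

index = build_inverted_index({1: "raid disks", 2: "hash disks"})
print(index["disks"])  # ids of the documents containing the word
```

In the toy graph, c is linked to by both a and b and so ends up ranked highest, while b, which nobody links to, keeps only the baseline share.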

Programming with SQL:
-understanding the strategies for connecting SQL with another programming language (and why?)
--preprocessing with Pro*C
--using an abstract API like Java JDBC
--writing stored procedures and functions in a language like PL/SQL
--writing triggers, again in a language like PL/SQL
-how do you, in practice, write one of these (what are the compilers, etc.)?
-when (for what kinds of problems/under what circumstances) would you use these choices?
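
To illustrate two of these strategies side by side, here is a sketch in Python rather than Java or PL/SQL. Python's sqlite3 module plays the role of an abstract API (in the spirit of JDBC: connect, execute a statement, fetch rows), and the trigger is written in SQLite's trigger syntax, which differs from PL/SQL. The emp and audit tables and the trigger name are hypothetical.

```python
import sqlite3

# API-style access: the host language sends SQL strings through a connection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, salary INTEGER)")
conn.execute("CREATE TABLE audit (name TEXT, old_salary INTEGER, new_salary INTEGER)")

# Trigger: code stored in the database that runs on its own whenever
# a salary is updated, no matter which program issued the UPDATE.
conn.execute("""
CREATE TRIGGER log_salary_change
AFTER UPDATE OF salary ON emp
BEGIN
    INSERT INTO audit VALUES (OLD.name, OLD.salary, NEW.salary);
END
""")

conn.execute("INSERT INTO emp VALUES (?, ?)", ("alice", 100))
conn.execute("UPDATE emp SET salary = ? WHERE name = ?", (120, "alice"))

audit_rows = conn.execute("SELECT name, old_salary, new_salary FROM audit").fetchall()
print(audit_rows)  # the row written by the trigger, not by this program
```

The design difference to notice: the API calls live in the application and run when the application chooses, whereas the trigger lives in the DBMS and fires for every client.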

Vendors: in picking a DBMS for your organization, how would you choose between (say) Oracle and MySQL?

RAID
-what it's for
-what the important "levels" are
-given the n data disks for level 4, show the parity (RAID) disk; given the parity disk and n-1 of the data disks, with one disk missing (destroyed), compute the data for the missing disk
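
The level 4 computation is just XOR, and a short Python sketch may help when practicing it by hand. The one-byte "disks" below are made-up values and the function names are hypothetical; real arrays do the same thing block by block.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_block(data_blocks):
    """The parity disk holds the XOR of all the data disks."""
    return xor_blocks(data_blocks)


def recover_missing(surviving_blocks, parity):
    """XOR the surviving data disks with the parity disk to rebuild the lost one."""
    return xor_blocks(list(surviving_blocks) + [parity])


# Three hypothetical data disks, one byte each.
disks = [bytes([0b1011]), bytes([0b0110]), bytes([0b1100])]
parity = parity_block(disks)
print(format(parity[0], "04b"))  # 1011 ^ 0110 ^ 1100

# Pretend the middle disk was destroyed and rebuild it from the rest.
rebuilt = recover_missing([disks[0], disks[2]], parity)
print(rebuilt == disks[1])
```

Recovery works because XOR is its own inverse: XORing the parity with the surviving disks cancels their contributions and leaves exactly the missing disk's bits.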

Everything from the midterm, and any other topics I've forgotten...