C20.0046 - Spring, 2008 - Johnson
Homework 3
Due: before the final, presumably

Practice questions

(do, but do not turn in!)
  1. Any and all subparts of exercises 5.1, 5.2, 5.3, 5.4.
  2. Any and all query exercises in SQLzoo's tutorials, 1-7. These start out very easy and get progressively more challenging.
  3. Why would you prepare queries rather than just execute-immediate? Why would you use parameterized queries?
  4. Given a PHP code snippet, explain what it's doing, or predict its behavior on a particular input (CGI param values), or modify to provide a different desired bahvior.
  5. Write a complete html/php script (or complete a partial script) to perform a simple task, e.g. take a param from the user, search for rows containing it, and print the results.
  6. Suppose you send me the encryption (using my public key) of a random number and that I decrypt it and send it back to you. What does this accomplish? If what we want to do is be able to encrypt a large amount of communication between us, what should we do instead?
  7. Why would you hash your users' paswords? Why would you salt them?
  8. Suppose you learn that a website logs a user in by checking for a non-zero result set for the following query: SELECT * FROM users WHERE user=$user AND password=$pass;. How might you learn this? Given that you have, give a user/password input that might allow you to successfully log into user bigshot's account. How might you protect the website against this? Is, say, using JavaScript in the browser to remove quotes from the input a reliable method?
  9. Compare/contrast/explain/give examples of (as applicable) the following: hash table, hash function, secure hash function, family of hash functions.
  10. Compare/contrast/etc. the following: public-key encryption, private-key encryption, hashing.
  11. Given a (relatively simple) stored procedure or trigger, explain what it's doing, or predict its behavior on a particular input (CGI param values).
  12. Why would you want to use stored procedures instead of programs in conventional languages?
  13. Give the exact definition (including the inherent quality componenet) of the PageRank of a given page. Given an example webgraph and the current estimated PR values of all pages, perform a single iteration, recomputing everyone's PR; or, more quickly, just update a single page's PR based on its linkers.
  14. For the Adwords problem (bidders to keywords--assuming for simplcity that click-through rates are always 1), show that choosing the highest bid is not an optimal algorithm (will not always maximize profits). What about if all bidders have infinite daily budgets?
  15. Identify these as supervised or unsupervised: classification, clustering, finding frequent sets, kNN, Bayes learning, k-means.
  16. Suppose half of emails are spam and half are not. Suppose I receive an email containing words W1,W2,W3 (say, "Nigeria","Minister","account", but for simplicity pretend they form one long word), and that statistically I observe that 5% of spam messages contain these words but only .001% of non-spam messages do. What is the probability (according to Naive Bayes) that this message is spam? (Hint: what is the probability of these words appearing in any message?)
  17. logs:
    • What are log_2(2), log_2(256), log_2(1024), log_2(1million), log_2(1billion),..., log_10(10), log_10(1000), log_10(1milion),... Don't (just) memorize.
    • If there are one million row keys in a (nicely balanced) BST (and if comparing the key to a given node's key and recursing to one of its children if necessary counts as one operation), then how many operations do you expect a single search to take?
    • If there are one million rows in the dictionary, how many compare-and-divide-in-half operations to find a word? What if you do compare-and-divide-into-k-parts operations?
    • If there are 12 rows in the table, how many cross product operations required to produce a table with 123,456 rows?
    • How much more security do you have if secret keys (which are bitstrings) are of length 128 instead of 64?
    • Suppose that in my pyramid multi-level marketing scheme, once you get six of your friends to sign up and pay the start-up fee, you no longer have to pay your monthly fee (= you suceed). In the scheme, call these six people your children. (Each child has only one parent.) In this way, we obtain a series of generations, the first of which comprises just one person, who began the scheme. Suppose we're optimistic and assume that at least at first, everyone who joins succeeds. That is, at least at first, everyone in each generation succeeds. How long can this continue? That is, give an upper bound on the number of generations that can possibly succeed.

mjohnson-at-stern.nyu.edu