Rolling Readings - Spring 2000 
Intelligent Information Systems

Professor Foster Provost
Phone: 212-998-0806 
Office: KMC 9-71 
Email: fprovost@stern.nyu.edu


Weekly Reading Assignments
Regular meeting time: Fridays 10am-1pm

Week 1 - Friday, January 28, 2000
Week 2 - Friday, February 4, 2000
Week 3 - Friday, February 11, 2000
Week 4 - Friday, February 18, 2000
Week 5 - Friday, February 25, 2000
Week 6 - Friday, March 3, 2000
Week 7 - Friday, March 10, 2000
Week 8 - Friday, March 24, 2000
Week 9 - Friday, March 31, 2000
Week 10 - Friday, April 7, 2000
Week 11 - Friday, April 14, 2000
Week 12 - Friday, April 21, 2000
Week 13 - Friday, April 28, 2000
Supplemental Reading


Week & Topic Papers


Week 1
 
Information Agents I & Text Processing Concepts
(Students)
(Prof. H. Hirsh, Rutgers - guest moderator)
T. Joachims, D. Freitag, and T. Mitchell, "WebWatcher: A Tour Guide for the World Wide Web," Fifteenth International Joint Conference on Artificial Intelligence (IJCAI-97).
Billsus, D. and Pazzani, M. (1999). "A Personal News Agent that Talks, Learns and Explains", Proceedings of the Third International Conference on Autonomous Agents (Agents '99).
Pazzani M., & Billsus, D. (1997). Learning and Revising User Profiles: The identification of interesting web sites. Machine Learning 27, 313-331.
R. Baeza-Yates and B. Ribeiro-Neto, Modern Information Retrieval, Addison-Wesley (1999). 
Section 2.5: Classic Information Retrieval
Chapter 7: Text Operations
Excerpt from Chapter 6 of  Machine Learning, Tom Mitchell, McGraw Hill, 1997.  (On "naive" Bayesian classification)
Supplemental reading: Michael W. Berry, Susan T. Dumais, and Todd A. Letsche. Computational Methods for Intelligent Information Access. Proc. of Supercomputing '95. (See also the Telcordia Technologies Latent Semantic Indexing Web site.)

Week 2
Information Extraction I
(Prof. R. Grishman, NYU CSD)
R. Grishman, "Information Extraction: Techniques and Challenges." Information Extraction (International Summer School SCIE-97), ed. Maria Teresa Pazienza, Springer-Verlag, 1997.
R. Grishman, "Information Extraction," Chapter 27, Handbook of Computational Linguistics (to appear).
incl. demo on extracting information from WSJ news stories
 


Week 3
Information Integration and Web Info Processing I
(William Cohen, ATT Research)
W. Cohen, "WHIRL: A Word-based Information Representation Language," Draft.
William W. Cohen. Recognizing Structure in Web Pages using Similarity Queries. In AAAI-99.
William W. Cohen, Haym Hirsh. Joins that Generalize: Text Classification Using WHIRL. In KDD-98.
WHIRL Site (including demos)
Supplemental reading: William W. Cohen. The WHIRL Approach to Information Integration. In the Trends & Controversies section of IEEE Intelligent Systems.
William W. Cohen. Integration of Heterogeneous Databases Without Common Domains Using Queries Based on Textual Similarity. In SIGMOD-98.

Week 4
Information Integration and Web Info Processing II
(Vassalos)
Chapter 3, Chapter 14 and Sections 4.6-4.8 of Principles of Database and Knowledge-Base Systems, by Ullman.
Database Techniques for the World-Wide Web: A Survey, by Florescu et al.
Microsoft's vision for XML, by Adam Bosworth
XML and Electronic Commerce: Enabling the Network Economy, by Meltzer and Glushko
Supplemental reading: Information Integration Using Logical Views, by Ullman
The TSIMMIS approach to mediation: Data models and Languages

Week 5
Data Mining 101
(Dhar & Provost)
T. Mitchell. Machine Learning and Data Mining. Communications of the ACM Vol. 42 No. 11, November 1999, pp. 31-36.
Usama Fayyad and Ramasamy Uthurusamy. Data mining and knowledge discovery in databases. Communications of the ACM Vol. 39 No. 11, November 1996, pp. 24 - 26.
Usama Fayyad, Gregory Piatetsky-Shapiro and Padhraic Smyth. The KDD process for extracting useful knowledge from volumes of data. Communications of the ACM Vol. 39 No. 11, November 1996, pp. 27 - 34 
Chapter 3 of  Machine Learning, Tom Mitchell, McGraw Hill, 1997.  (On Decision Tree Learning.)
Johannes Fürnkranz. Separate-and-Conquer Rule Learning. Artificial Intelligence Review 13(1):3-54, 1999.  Draft.
Supplemental reading: R. Brachman, T. Khabaza, W. Kloesgen, G. Piatetsky-Shapiro, and E. Simoudis. Mining Business Databases. Communications of the ACM Vol. 39 No. 11, November 1996, pp. 42-48.


Week 6
Statistical AI and Finance
(Roger Stein, Moody's)
Stein, working paper IS 99-15
Artificial Neural Networks.  Chapter 4 of  Machine Learning, Tom Mitchell, McGraw Hill, 1997.
Bernard Widrow, David E. Rumelhart and Michael A. Lehr.  Neural networks: applications in industry, business and science.  CACM 37(3) pages 93 - 105 (March 1994).
Supplemental: Provost, F., D. Jensen and T. Oates, "Efficient Progressive Sampling." Proceedings of the Fifth International Conference on Knowledge Discovery and Data Mining (KDD-99).

Week 7 -- In-class assignment - No readings


Week 8
 
Information Extraction II
(Students)
Robin Burke, Kristian Hammond, Vladimir Kulyukin, Steven Lytinen, Noriko Tomuro, and Scott Schoenberg, Question Answering from Frequently-Asked Question Files: Experiences with the FAQ Finder System. University of Chicago, Department of Computer Science Technical Report TR-97-05. (See also the FAQIndex web site.)
P.D. Turney, Learning to Extract Keyphrases from Text. NRC Technical Report ERB-1057, National Research Council Canada, 1999.
Eibe Frank, Gordon W. Paynter, Ian H. Witten, Carl Gutwin and Craig G. Nevill-Manning (1999). Domain-Specific Keyphrase Extraction. Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence, Stockholm, Sweden. Morgan Kaufmann Publishers, San Francisco, CA, pp. 668-673.
Mary Elaine Califf and Raymond J. Mooney. Relational Learning of Pattern-Match Rules for Information Extraction.  Proceedings of the Sixteenth National Conference on Artificial Intelligence (AAAI-99), Orlando, FL, pp. 328-334, July, 1999. 
To read more: Bibliography on text summarization/keyphrase extraction


Week 9
Knowledge-based Systems
(Provost & Dhar)
Frederick Hayes-Roth and Neil Jacobstein. The state of knowledge-based systems. CACM 37 (3), pages 26 - 39 (March 1994).
Richard Duda and Edward Shortliffe (1983). Expert Systems Research. Science 220 (4594), pages 261-268.
Extracts from:
Bruce G. Buchanan and Edward H. Shortliffe.  Rule-based Expert Systems.  Addison-Wesley (1984).
D. Lenat (1995). Cyc: A Large-Scale Investment in Knowledge Infrastructure. Communications of the ACM 38(11) (November). For more recent information, you may want to browse CYC's web site.
V. Dhar (1987). On the Plausibility and Scope of Expert Systems in Management.  Journal of Management Information Systems 4(1). [FIRST PART (FRAMEWORK) ONLY.]
Supplemental reading: B.G.Buchanan and E.A.Feigenbaum. DENDRAL and MetaDENDRAL: Their applications dimension. Artificial Intelligence 11 (1978), 5-24.
George A. Miller (1995). WordNet: a lexical database for English. Communications of the ACM 38(11), pages 39-41.

Week 10
Information Mining
(Students)
M. Craven, D. DiPasquo, D. Freitag, A. McCallum, T. Mitchell, K. Nigam, S. Slattery, "Learning to Extract Symbolic Knowledge from the World Wide Web," Fifteenth National Conference on Artificial Intelligence (AAAI-98).
M. Craven, D. DiPasquo, D. Freitag, A. McCallum, T. Mitchell, K. Nigam and S. Slattery. Learning to Construct Knowledge Bases from the World Wide Web, to appear in Artificial Intelligence.
L. Giles, K. Bollacker, S. Lawrence (1998). "CiteSeer: An Automatic Citation Indexing System," Proceedings of the 3rd ACM Conference on Digital Libraries, pp. 89-98. Also see "ResearchIndex", an implementation of CiteSeer.

Week 11
Wireless Information Access

Prof. H. Hirsh, Rutgers
Three submitted papers distributed


Week 12
Issues in the Evaluation of "Discovered" Knowledge and Machine Learning (continued)
(Provost)
(Tuzhilin)
Parts II & IV. F. Provost and D. Jensen. Evaluating Machine Learning and Knowledge Discovery. Tutorial (Given at KDD-98, AAAI-99, IJCAI-99).
Provost, F. and T. Fawcett, "Robust Classification for Imprecise Environments." To appear in Machine Learning (submitted version).
Foster Provost, Tom Fawcett, and Ron Kohavi (1998). The case against accuracy estimation for comparing induction algorithms. Proceedings of the Fifteenth International Conference on Machine Learning.
J. Friedman (1997). On bias, variance, 0/1 loss, and the curse-of-dimensionality. Data Mining and Knowledge Discovery, 1:1, pp. 55-77. (also appears as a technical report).
need refs for Alex's half
Supplemental material: Provost & Jensen. Bibliography on evaluation issues for discovery systems.

Week 13a&b
a) Knowledge Discovery from Relational Information Webs
(Prof. D. Jensen, UMass)
Demo: Understanding corporate ownership/management webs
M. Sparrow (1991). Network vulnerabilities and strategic intelligence in law enforcement. International Journal of Intelligence and Counterintelligence 5(3): 255-274.
Members of the Clever Project (1999). Hypersearching the Web. Scientific American. June. pp. 54-60.
J.R. Quinlan (1990). Learning logical definitions from relations. Machine Learning. 5: 239-266.
b) Issues in the Evaluation of "Discovered" Knowledge and Machine Learning
(Prof. D. Jensen, UMass)
Parts I & III. F. Provost and D. Jensen. Evaluating Machine Learning and Knowledge Discovery. Tutorial (Given at KDD-98, AAAI-99, IJCAI-99).
D. Jensen and P.R. Cohen (2000). Multiple comparisons in induction algorithms. Machine Learning 38(3).
Supplemental reading: T. Oates and Jensen, D. (1997), The effects of training set size on decision tree complexity.  Machine Learning: Proceedings of the Fourteenth International Conference. Morgan Kaufmann. 254-262.

 




Supplemental Reading
Search and Knowledge Discovery
Agents that Plan. Chapter 7, N. Nilsson, Artificial Intelligence: A New Synthesis, Morgan Kaufmann (1998).
Uninformed Search. Chapter 8, N. Nilsson, Artificial Intelligence: A New Synthesis, Morgan Kaufmann (1998).
Rakesh Agrawal, Tomasz Imielinski, and Arun Swami. Mining association rules between sets of items in large databases. In Proc. of the ACM SIGMOD Conference on Management of Data, pages 207-216, Washington, D.C., May 1993.
Rakesh Agrawal and Ramakrishnan Srikant. Fast Algorithms for Mining Association Rules. In Proc. of the 20th Int'l Conference on Very Large Databases, Santiago, Chile, September 1994.
Ramakrishnan Srikant and Rakesh Agrawal. Mining Generalized Association Rules. In Proc. of the 21st Int'l Conference on Very Large Databases, Zurich, Switzerland, September 1995. 
Webb, G.I. (1995) "OPUS: An Efficient Admissible Algorithm for Unordered Search", Journal of AI Research, Volume 3, pages 431-465.
Provost, F., J. Aronis, and B. Buchanan. "Rule-space search for knowledge-based discovery."  Report #IS 99-012, Information Systems Department, Stern School of Business, New York University.
Heuristic Search. Chapter 9, N. Nilsson, Artificial Intelligence: A New Synthesis, Morgan Kaufmann (1998).