| Rolling Readings - Spring 2000
Intelligent Information Systems |
| Professor Foster Provost
Phone: 212-998-0806 Office: KMC 9-71 Email: fprovost@stern.nyu.edu |
Week 1 - Friday, January 28, 2000
Week 2 - Friday, February 4, 2000
Week 3 - Friday, February 11, 2000
Week 4 - Friday, February 18, 2000
Week 5 - Friday, February 25, 2000
Week 6 - Friday, March 3, 2000
Week 7 - Friday, March 11, 2000
Week 8 - Friday, March 24, 2000
Week 9 - Friday, March 31, 2000
Week 10 - Friday, April 7, 2000
Week 11 - Friday, April 14, 2000
Week 12 - Friday, April 21, 2000
Week 13 - Friday, April 28, 2000
Supplemental Reading
| Week & Topic | Papers |
| Information Agents I & Text Processing Concepts | T. Joachims, D. Freitag, and T. Mitchell, "WebWatcher: A Tour Guide for the World Wide Web," Fifteenth International Joint Conference on Artificial Intelligence (IJCAI-97). |
| (Students) | Billsus, D. and Pazzani, M. (1999). "A Personal News Agent that Talks, Learns and Explains", Proceedings of the Third International Conference on Autonomous Agents (Agents '99). PDF, Postscript |
| (Prof. H. Hirsh, Rutgers - guest moderator) | Pazzani M., & Billsus, D. (1997). Learning and Revising User Profiles: The identification of interesting web sites. Machine Learning 27, 313-331. PDF |
| R. Baeza-Yates and B. Ribeiro-Neto, Modern Information Retrieval, Addison-Wesley (1999). Section 2.5: Classic Information Retrieval; Chapter 7: Text Operations | |
| Excerpt from Chapter 6 of Machine Learning, Tom Mitchell, McGraw Hill, 1997. (On "naive" Bayesian classification) | |
| Supplemental reading: | Michael W. Berry, Susan T. Dumais, and Todd A. Letsche. Computational Methods for Intelligent Information Access. Proc. of Supercomputing '95. (See also the Telcordia Technologies Latent Semantic Indexing Web site.) |
| Information Extraction I | R. Grishman, "Information Extraction: Techniques and Challenges." Information Extraction (International Summer School SCIE-97), ed. Maria Teresa Pazienza, Springer-Verlag, 1997. |
| (Prof. R. Grishman, NYU CSD) | R. Grishman, "Information Extraction," Chapter 27, Handbook of Computational Linguistics (to appear). |
| incl. demo on extracting information from WSJ news stories | |
| Information Integration and Web Info Processing I | W. Cohen, "WHIRL: A Word-based Information Representation Language," Draft. |
| (William Cohen, AT&T Research) | William W. Cohen. Recognizing Structure in Web Pages using Similarity Queries. In AAAI-99. |
| | William W. Cohen and Haym Hirsh. Joins that Generalize: Text Classification Using WHIRL. In KDD-98. |
| WHIRL Site (including demos) | |
| Supplemental reading: | William W. Cohen. The WHIRL Approach to Information Integration. In the Trends & Controversies section of IEEE Intelligent Systems. |
| William W. Cohen. Integration of Heterogeneous Databases Without Common Domains Using Queries Based on Textual Similarity. In SIGMOD-98. |
| Information Integration and Web Info Processing II | Chapter 3, Chapter 14 and Sections 4.6-4.8 of Principles of Database and Knowledge-Base Systems, by Ullman. |
| (Vassalos) | Database Techniques for the World-Wide Web: A Survey, by Florescu et al. |
| Microsoft's vision for XML, by Adam Bosworth | |
| XML and Electronic Commerce: Enabling the Network Economy, by Meltzer and Glushko | |
| Supplemental reading: | Information Integration Using Logical Views, by Ullman |
| The TSIMMIS approach to mediation: Data models and Languages |
| Data Mining 101 | T. Mitchell. Machine Learning and Data Mining. Communications of the ACM Vol. 42 No. 11, November 1999, pp. 31-36. |
| (Dhar & Provost) | Usama Fayyad and Ramasamy Uthurusamy. Data mining and knowledge discovery in databases. Communications of the ACM Vol. 39 No. 11, November 1996, pp. 24 - 26. |
| Usama Fayyad, Gregory Piatetsky-Shapiro and Padhraic Smyth. The KDD process for extracting useful knowledge from volumes of data. Communications of the ACM Vol. 39 No. 11, November 1996, pp. 27 - 34 | |
| Chapter 3 of Machine Learning, Tom Mitchell, McGraw Hill, 1997. (On Decision Tree Learning.) | |
| Johannes Fürnkranz. Separate-and-Conquer Rule Learning. Artificial Intelligence Review 13(1):3-54, 1999. Draft. | |
| Supplemental reading: | R. Brachman, T. Khabaza, W. Kloesgen, G. Piatetsky-Shapiro, and E. Simoudis. Mining Business Databases. Communications of the ACM Vol. 39 No. 11, November 1996, pp. 42-48. |
| Statistical AI and Finance | Stein, working paper IS 99-15 |
| (Roger Stein, Moody's) | |
| Artificial Neural Networks. Chapter 4 of Machine Learning, Tom Mitchell, McGraw Hill, 1997. | |
| Bernard Widrow, David E. Rumelhart and Michael A. Lehr. Neural networks: applications in industry, business and science. CACM 37(3) pages 93 - 105 (March 1994). | |
| Supplemental: | Provost, F., D. Jensen and T. Oates, "Efficient Progressive Sampling." Proceedings of the Fifth International Conference on Knowledge Discovery and Data Mining (KDD-99). |
| Information Extraction II | Robin Burke, Kristian Hammond, Vladimir Kulyukin, Steven Lytinen, Noriko Tomuro, and Scott Schoenberg, Question Answering from Frequently-Asked Question Files: Experiences with the FAQ Finder System. University of Chicago, Department of Computer Science Technical Report TR-97-05. (See also the FAQIndex web site.) |
| (Students) | P.D. Turney, Learning to Extract Keyphrases from Text. NRC Technical Report ERB-1057, National Research Council Canada, 1999. |
| Eibe Frank, Gordon W. Paynter, Ian H. Witten, Carl Gutwin and Craig G. Nevill-Manning (1999). Domain-Specific Keyphrase Extraction. Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence, Stockholm, Sweden. Morgan Kaufmann Publishers, San Francisco, CA, pp. 668-673. | |
| Mary Elaine Califf and Raymond J. Mooney. Relational Learning of Pattern-Match Rules for Information Extraction. Proceedings of the Sixteenth National Conference on Artificial Intelligence (AAAI-99), Orlando, FL, pp. 328-334, July, 1999. | |
| To read more: | Bibliography on text summarization/keyphrase extraction |
| Knowledge-based Systems | Frederick Hayes-Roth and Neil Jacobstein. The state of knowledge-based systems. CACM 37 (3), pages 26 - 39 (March 1994). |
| (Provost & Dhar) | Richard Duda and Edward Shortliffe (1983). Expert Systems Research. Science 220 (4594), pages 261-268. |
| Extracts from: Bruce G. Buchanan and Edward H. Shortliffe. Rule-based Expert Systems. Addison-Wesley (1984). | |
| D. Lenat (1995). Cyc: A Large-Scale Investment in Knowledge Infrastructure. Communications of the ACM 38(11) (November). For more recent information, you may want to browse CYC's web site. | |
| V. Dhar (1987). On the Plausibility and Scope of Expert Systems in Management. Journal of Management Information Systems 4(1). [FIRST PART (FRAMEWORK) ONLY.] | |
| Supplemental reading: | B. G. Buchanan and E. A. Feigenbaum. DENDRAL and MetaDENDRAL: Their applications dimension. Artificial Intelligence 11 (1978), 5-24. |
| George A. Miller (1995). WordNet: a lexical database for English. Communications of the ACM 38(11), pages 39-41. |
| Information Mining | M. Craven, D. DiPasquo, D. Freitag, A. McCallum, T. Mitchell, K. Nigam, S. Slattery, "Learning to Extract Symbolic Knowledge from the World Wide Web," Fifteenth National Conference on Artificial Intelligence (AAAI-98). |
| (Students) | M. Craven, D. DiPasquo, D. Freitag, A. McCallum, T. Mitchell, K. Nigam and S. Slattery. Learning to Construct Knowledge Bases from the World Wide Web, to appear in Artificial Intelligence. |
| L. Giles, K. Bollacker, S. Lawrence (1998). "CiteSeer: An Automatic Citation Indexing System," Proceedings of the 3rd ACM Conference on Digital Libraries, pp. 89-98. Also see "ResearchIndex", an implementation of CiteSeer. | |
| Wireless Information Access | |
| Prof. H. Hirsh, Rutgers | Three submitted papers distributed |
| Issues in the Evaluation of "Discovered" Knowledge and Machine Learning (continued) | Parts II & IV. F. Provost and D. Jensen. Evaluating Machine Learning and Knowledge Discovery. Tutorial (Given at KDD-98, AAAI-99, IJCAI-99). |
| (Provost) (Tuzhilin) | Provost, F. and T. Fawcett, "Robust Classification for Imprecise Environments." To appear in Machine Learning (submitted version). |
| Foster Provost, Tom Fawcett, and Ron Kohavi (1998). The case against accuracy estimation for comparing induction algorithms. Proceedings of the Fifteenth International Conference on Machine Learning. | |
| J. Friedman (1996). On bias, variance, 0/1 loss, and the curse-of-dimensionality. Data Mining and Knowledge Discovery, 1:1, pp. 55-77, 1997. (also appears as a technical report). | |
| need refs for Alex's half | |
| Supplemental material: | Provost & Jensen. Bibliography on evaluation issues for discovery systems. |
| a) Knowledge Discovery from Relational Information Webs | M. Sparrow (1991). Network vulnerabilities and strategic intelligence in law enforcement. International Journal of Intelligence and Counterintelligence 5(3): 255-274. |
| (Prof. D. Jensen, UMass) | Members of the Clever Project (1999). Hypersearching the Web. Scientific American. June. pp. 54-60. |
| Demo: Understanding corporate ownership/management webs | J.R. Quinlan (1990). Learning logical definitions from relations. Machine Learning. 5: 239-266. |
| b) Issues in the Evaluation of "Discovered" Knowledge and Machine Learning | Parts I & III. F. Provost and D. Jensen. Evaluating Machine Learning and Knowledge Discovery. Tutorial (Given at KDD-98, AAAI-99, IJCAI-99). |
| (Prof. D. Jensen, UMass) | D. Jensen and P.R. Cohen (2000). Multiple comparisons in induction algorithms. Machine Learning 38(3). |
| Supplemental reading: | T. Oates and D. Jensen (1997). The effects of training set size on decision tree complexity. Machine Learning: Proceedings of the Fourteenth International Conference. Morgan Kaufmann. 254-262. |
| Search and Knowledge Discovery | Agents that Plan. Chapter 7, N. Nilsson, Artificial Intelligence: A New Synthesis, Morgan Kaufmann (1998). |
| Uninformed Search. Chapter 8, N. Nilsson, Artificial Intelligence: A New Synthesis, Morgan Kaufmann (1998). | |
| | Rakesh Agrawal, Tomasz Imielinski, and Arun Swami. Mining association rules between sets of items in large databases. In Proc. of the ACM SIGMOD Conference on Management of Data, pages 207-216, Washington, D.C., May 1993. |
| Rakesh Agrawal and Ramakrishnan Srikant. Fast Algorithms for Mining Association Rules. In Proc. of the 20th Int'l Conference on Very Large Databases, Santiago, Chile, September 1994. | |
| Ramakrishnan Srikant and Rakesh Agrawal. Mining Generalized Association Rules. In Proc. of the 21st Int'l Conference on Very Large Databases, Zurich, Switzerland, September 1995. | |
| Webb, G.I. (1995) "OPUS: An Efficient Admissible Algorithm for Unordered Search", Journal of AI Research, Volume 3, pages 431-465. PDF | |
| Provost, F., J. Aronis, and B. Buchanan. "Rule-space search for knowledge-based discovery." Report #IS 99-012, Information Systems Department, Stern School of Business, New York University. | |
| Heuristic Search. Chapter 9, N. Nilsson, Artificial Intelligence: A New Synthesis, Morgan Kaufmann (1998). |