DIMACS Summer School Tutorial on New Frontiers in Data Mining

Date: August 13 - 17, 2001
Location: DIMACS Center, Rutgers University, Piscataway, NJ

Organizers:
  Dimitrios Gunopulos, University of California at Riverside
  Nikolaos Koudas, AT&T Labs - Research

Related to: Special Focus on Computational Molecular Biology and Special Focus on Data Analysis and Mining.

Focus:

This "summer school" tutorial program aims to provide background, vocabulary, and theoretical methodology to non-specialists in data mining and to others who wish to explore the field, and to bring together students, postdocs, and researchers working on algorithms for data mining with those working in various application areas. More specifically, we aim to introduce attendees to the fundamental theoretical and algorithmic issues that arise in data mining and its applications.

Data mining is an exciting new field of computer science research, encompassing several diverse techniques for analyzing large datasets. The goal of data mining is to obtain new, interesting, and actionable pieces of information. Vast amounts of data are accumulated in diverse application domains, including bioinformatics, epidemiology, business, physical sciences, web applications, and networking. Data mining research is stimulated by hard real-life problems in analyzing data in all of those areas. Data mining is fundamentally an interdisciplinary field, borrowing and combining techniques from theory, statistics, databases, and machine learning, and ultimately producing new approaches.

A goal of this tutorial is to bring together students, postdocs, and researchers from the fields of data mining, bioinformatics, networking, and the web, and to facilitate collaboration between these fields. A further goal is to introduce data mining to those who are not yet working in it, or who are not yet approaching it from an algorithmic point of view.

In the tutorial we concentrate on new research directions that are currently emerging in the field: data mining applications in bioinformatics, networking, and the web. We will explore new problems that come up in these areas, identify common threads among the various applications, and consider new paradigms, methods, and techniques that are being developed to address these problems.

In the tutorial we will emphasize the algorithmic aspects of analyzing large datasets. There are several general ways to approach this problem, such as approximate algorithms and data summarization techniques. We will look at new techniques for stream processing and online algorithms, and at their applications to specific problems.

Biological research is undergoing a major revolution as new technologies, such as high-throughput DNA sequencing and DNA microarrays, create large amounts of data. New techniques for analyzing such data are important for understanding biological processes. Many bioinformatics problems can be formulated as generalized search problems in a large space. We will look at general lattice search techniques with different constraints, as well as new string algorithms. We will also look at applications of classification techniques in this area.

Networking and telecommunications applications produce large amounts of data that can be mined for various properties of interest. Time series data prevail in such domains, and algorithms for time series matching and sequential pattern identification are of great interest. We will concentrate on incremental and one-pass algorithms for networking problems and explore the connection between these problems and similar incremental and one-pass problems arising in the biological sciences.

The web has emerged as a vast datastore containing diverse pieces of information. We will examine recent approaches to mining information on the World Wide Web, including efficient web searching and web site personalization efforts.
We will also look at data and resource management issues in the web environment, with emphasis on bioinformatics and telecommunications applications.

Registration Fee and Procedure:

The registration fee is $200 for the week. Graduate students, postdocs, and DIMACS Members pay $95 for the week. The fee, less the registration deposit, will be collected on site by cash, check, Visa, or Mastercard. Registration fees cover two meals a day, breaks, and all workshop materials. Registration is first come, first served and is limited to 60 people. A non-refundable $50 registration deposit will hold your place. See http://dimacs.rutgers.edu/Workshops/MiningTutorial/registnew.html for information on how to register.

Financial Support:

Limited financial support for travel, local expenses, or registration fees may be available, depending upon support from funding agencies. Applications for financial support can be found at http://dimacs.rutgers.edu/Workshops/MiningTutorial/support.html.

WWW Information: http://dimacs.rutgers.edu/Workshops/MiningTutorial

DIMACS Center
Rutgers, The State University of New Jersey
CoRE Bldg., 96 Frelinghuysen Road
Piscataway, NJ 08854-8018, USA
TEL: 732-445-5928
FAX: 732-445-5932
Web: http://dimacs.rutgers.edu/

DIMACS is a partnership of Rutgers University, Princeton University, AT&T Labs - Research, Bell Laboratories, the NEC Research Institute, and Telcordia Technologies.

*******************************************************************
Christine Spassione              Tel: (732) 445-4304
Visitor Coordinator              Fax: (732) 445-5932
DCI Program Administrator
DIMACS Center
Rutgers University
96 Frelinghuysen Road
Piscataway, NJ 08854-8018
*******************************************************************