DIMACS Summer School Tutorial on New Frontiers in Data Mining
Date: August 13 - 17, 2001
Location: DIMACS Center, Rutgers University, Piscataway, NJ
Dimitrios Gunopulos, University of California at Riverside
Nikolaos Koudas, AT&T Labs - Research
Related to: Special Focus on Computational Molecular Biology
and Special Focus on Data Analysis and Mining.
Focus: This "summer school" tutorial program aims to provide
background, vocabulary, and theoretical methodology to non-specialists
in data mining and to others who wish to explore the field, and to bring
together students, postdocs, and researchers working on algorithms for
data mining with those working in various application areas. More
specifically, we aim to introduce attendees to the fundamental
theoretical and algorithmic issues that arise in data mining and its
applications.
Data mining is an exciting new field of computer science research,
encompassing several diverse techniques for analyzing large datasets.
The goal of data mining is to obtain new, interesting, and actionable
pieces of information. Vast amounts of data are accumulated in diverse
application domains, including bioinformatics, epidemiology, business,
the physical sciences, web applications, and networking. Data mining
research is driven by hard real-life problems in analyzing data
in all these areas. Data mining is fundamentally an interdisciplinary
field, borrowing and combining techniques from theory, statistics,
databases, and machine learning, and ultimately producing new approaches.
A goal of this tutorial is to bring together students, postdocs, and
researchers from the fields of data mining, bioinformatics,
networking, and the web, to facilitate collaboration between these
fields, and to introduce data mining to those who are not yet working
in it, or who are not yet approaching it from an algorithmic point of
view.
In the tutorial we concentrate on new research directions that are
currently emerging in the field: data mining applications in
bioinformatics, networking, and the web. We will explore new problems
that come up in these areas, identify common threads among the various
applications, and consider new paradigms, methods and techniques that
are being developed to address these problems. Throughout, we will
emphasize the algorithmic aspects of analyzing large datasets. There are
several general approaches to this problem, such as approximation
algorithms and data summarization techniques. We will look at new
techniques for stream processing and online algorithms, and at their
applications to specific problems.
Biological research is undergoing a major revolution as new technologies,
such as high-throughput DNA sequencing and DNA microarrays, are creating
large amounts of data. New techniques for analyzing such data are important
for understanding biological processes. Many bioinformatics problems
can be formulated as generalized searching problems in a large space. We
will look at general lattice search techniques with different constraints,
as well as new string algorithms. We will also look at applications of
classification techniques in the area.
Networking and telecommunications applications produce large amounts of
data that can be mined for various properties of interest. Time series data
prevail in such domains, and algorithms for time series matching and sequential
pattern identification are of great interest. We will concentrate on
incremental and one-pass algorithms for networking problems and explore
the connection between these problems and similar incremental and one-pass
problems arising in the biological sciences.
The web has emerged as a vast data store, containing diverse pieces of
information. We will examine recent approaches to mining information on the
World Wide Web, including efficient web searching and web site
personalization efforts. We will also look at data and resource management
issues in the web environment, with emphasis on bioinformatics and
Registration Fee and Procedure: The registration fee is $200 for the week.
Graduate students, postdocs, and DIMACS members pay $95 for the week. The
fee, less the registration deposit, will be collected on site by cash, check,
Visa, or Mastercard. Registration fees cover two meals a day, breaks, and all
workshop materials. Registration is first-come, first-served and is limited
to 60 people. A non-refundable $50 registration deposit will hold your
for information on how to register.
Financial Support: Limited financial support for travel, local
expenses, or registration fees may be available depending upon support
from funding agencies. Applications for financial support can be found at
WWW Information: http://dimacs.rutgers.edu/Workshops/MiningTutorial
Rutgers, The State University of New Jersey
CoRE Bldg., 96 Frelinghuysen Road
Piscataway, NJ 08854-8018, USA
DIMACS is a partnership of Rutgers University, Princeton University,
AT&T Labs - Research, Bell Laboratories, the NEC Research Institute
and Telcordia Technologies.
Christine Spassione
Visitor Coordinator
DCI Program Administrator
96 Frelinghuysen Road
Piscataway, NJ 08854-8018
Tel: (732) 445-4304
Fax: (732) 445-5932