
summary - information-sifting front ends to data bases

Hi MCLers,

A few weeks ago I posted this message:

>has anyone out there written applications for information retrieval
>or information filtering? I am interested in extensible,
>programmable high-level DB interfaces which 
>- implement spreadsheet functionality or
>- can be used for personal bibliographical databases or
>- perform fuzzy matching or
>- carry out statistical classification or
>- serve as user-friendly database client to an SQL (Sybase)
>RDBMS on a Unix box accessible via TCP/IP.
>Any pointers, also to the published literature, are welcome.

The response was not overwhelming. A few people asked for a summary. Here it is:

1. from Ben Moreland <bjm@antares.res.utc.com>:

"... a lot of work has gone on at GE Research Labs in this area, as well as at
the University of Massachusetts (Amherst)."

2. from Robert J. Kuhns <kuhns@world.std.com>:

Bob Kuhns and Tom Martin have developed a system hooked up to a 
Knight-Ridder satellite dish that was able to scan, parse and filter incoming 
stories at 9600 baud. There is a Mac version of this program, and Robert Kuhns
is currently working on a database-filtering version.

The system "is capable of automatic indexing of text against large indexing 
vocabularies. It differs from other systems in that it employs a 
(Government-Binding-based) parser [whatever that is] at its core. 
Parses that are actually predicate-argument structures are mapped 
onto concept filters and successful matches result in a set of indexes being 
assigned to a piece of text. In this way, the system has the capability of
determining who is doing what to whom."
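
As a rough illustration of the matching step described in the quote (not
Kuhns' actual implementation, and in Python rather than MCL), the sketch
below assumes the parser has already produced predicate-argument structures.
The names PredArg, ConceptFilter and assign_indexes, the filter semantics
(an empty slot matches anything) and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PredArg:
    """One predicate-argument structure: who (agent) does what to whom (patient)."""
    predicate: str
    agent: str
    patient: str


@dataclass(frozen=True)
class ConceptFilter:
    """Hypothetical concept filter; a slot left as None matches anything."""
    index_terms: tuple
    predicate: Optional[str] = None
    agent: Optional[str] = None
    patient: Optional[str] = None

    def matches(self, pa: PredArg) -> bool:
        pairs = (
            (self.predicate, pa.predicate),
            (self.agent, pa.agent),
            (self.patient, pa.patient),
        )
        return all(want is None or want == got for want, got in pairs)


def assign_indexes(parses, filters):
    """Collect the index terms of every filter that matches at least one parse."""
    found = set()
    for pa in parses:
        for f in filters:
            if f.matches(pa):
                found.update(f.index_terms)
    return sorted(found)


# Hand-written stand-ins for parser output; the parser itself is not sketched.
parses = [
    PredArg("acquire", "AcmeCorp", "WidgetCo"),
    PredArg("sue", "WidgetCo", "AcmeCorp"),
]
filters = [
    ConceptFilter(("mergers-and-acquisitions",), predicate="acquire"),
    ConceptFilter(("litigation",), predicate="sue", patient="AcmeCorp"),
    ConceptFilter(("bankruptcy",), predicate="bankrupt"),
]
print(assign_indexes(parses, filters))
```

Because the filters look at the agent and patient slots rather than at bare
keywords, a story in which AcmeCorp sues WidgetCo would not trigger the
second filter, which is the "who is doing what to whom" distinction.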

Bob Kuhns' current affiliation is with SigniCorp which is just forming to 
develop and market text processing software and database generators.

Bob Kuhns has kindly sent a paper (presented at the AAAI Spring Symposium 
Series on Text-Based Intelligent Systems, Stanford University, Palo Alto, 
California, March, 1990) of which I include the abstract here:

"News Analysis: A Natural Language Application to Text Processing"
Robert J. Kuhns
This paper describes the major barriers of developing large-scale 
natural language processing systems. In particular, development 
issues and approaches for parser, lexicon, and semantic 
interpretation are presented. Design strategies for each component 
are described using illustrations from an existing system that indexes 
and routes news reports from electronic input (newswires). This 
system relies heavily on a parser, thereby demonstrating the value 
of natural language techniques for extracting information from 
textual sources.

3. from Ashok Khosla, VCA, Khosla Consulting <KHOSLA@AppleLink.Apple.COM>:

"I've done statistical classification using standard Bayesian techniques as 
pattern classifiers. The code is in Think C, and is based on an article on 
gesture recognizers in the 1991 Siggraph proceedings."
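
For readers unfamiliar with the approach, here is a minimal sketch of a
Bayesian pattern classifier in Python. It is not Khosla's Think C code: it
assumes independent Gaussian features per class (a "naive Bayes" model),
and the function names and the toy feature vectors are made up for
illustration.

```python
import math
from collections import defaultdict


def train(samples):
    """samples: list of (feature_vector, label) pairs.

    Returns {label: (prior, means, variances)} with one mean and one
    variance per feature dimension.
    """
    by_class = defaultdict(list)
    for x, label in samples:
        by_class[label].append(x)
    model = {}
    for label, rows in by_class.items():
        dims = list(zip(*rows))
        means = [sum(d) / len(d) for d in dims]
        # A small floor keeps every variance strictly positive.
        variances = [
            max(sum((v - m) ** 2 for v in d) / len(d), 1e-6)
            for d, m in zip(dims, means)
        ]
        model[label] = (len(rows) / len(samples), means, variances)
    return model


def log_posterior(x, prior, means, variances):
    """Unnormalized log posterior: log prior plus Gaussian log likelihoods."""
    lp = math.log(prior)
    for v, m, s2 in zip(x, means, variances):
        lp += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
    return lp


def classify(model, x):
    """Pick the class with the highest posterior for feature vector x."""
    return max(model, key=lambda label: log_posterior(x, *model[label]))


# Toy two-dimensional feature vectors for two hypothetical classes.
samples = [
    ([1.0, 1.2], "a"), ([0.8, 1.0], "a"), ([1.1, 0.9], "a"),
    ([3.0, 3.2], "b"), ([3.2, 2.9], "b"), ([2.9, 3.1], "b"),
]
model = train(samples)
print(classify(model, [1.0, 1.0]), classify(model, [3.0, 3.0]))
```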

He recommends the books "Information Retrieval: Data Structures and
Algorithms" by William Frakes and Ricardo Baeza-Yates (Prentice-Hall,
c1992, ISBN 0-13-463837-9) and "Pattern Classification and Scene Analysis"
by Duda and Hart (Wiley-Interscience, c1973).

4. from Lawrence Au, DARPA Subcontractor <lau@darpa.mil>:

"I've written a high-level interface to Sybase communicating via MacTCP 
using the MacTCP code shipped with MCL 2.0 final. I am about to turn it 
into a low cost source code product, to allow MCL to quickly access all 
the common SQL database products over Telnet connections.  

It would consist of a query object containing slots for SQL, SQL-rows, 
SQL-columns, login, password, host, stream.  There would be methods for 
login, exec-query,  and accessor methods for accessing specific cells in 
the query results by row and column names.

If you are interested,  give me a call at 202-508-4746(b) or 703-276-2825(h)
or send me mail at lau@a.darpa.mil"
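
The query object described above might look roughly like the following
Python sketch. This is not Lawrence Au's code or interface: the slot names
follow his description, but the method is called log_in because Python does
not allow a field and a method to share the name login, and the FakeStream
class, the tab-separated reply format, the host name and the cell() accessor
(row number plus column name) are assumptions made so that the example runs
without a network or a Sybase server.

```python
from dataclasses import dataclass, field


class FakeStream:
    """Stand-in for the network stream; it records input and returns canned text."""

    def __init__(self):
        self.sent = []

    def send(self, text):
        self.sent.append(text)

    def receive(self):
        # Assumed format: first line holds column names, then one row per line.
        return "name\tmag\nVega\t0.03\nSirius\t-1.46"


@dataclass
class Query:
    host: str
    login: str
    password: str
    sql: str = ""
    stream: object = None
    sql_columns: list = field(default_factory=list)
    sql_rows: list = field(default_factory=list)

    def log_in(self, stream):
        """Attach a stream and send the credentials over it."""
        self.stream = stream
        stream.send(f"{self.login}\n{self.password}\n")

    def exec_query(self, sql):
        """Send the SQL text, parse the reply, and return the number of rows."""
        if self.stream is None:
            raise RuntimeError("call log_in() before exec_query()")
        self.sql = sql
        self.stream.send(sql + "\n")
        header, *lines = self.stream.receive().split("\n")
        self.sql_columns = header.split("\t")
        self.sql_rows = [line.split("\t") for line in lines]
        return len(self.sql_rows)

    def cell(self, row, column):
        """Return one cell of the result, by row number and column name."""
        return self.sql_rows[row][self.sql_columns.index(column)]


q = Query(host="dbhost.example", login="guest", password="secret")
q.log_in(FakeStream())
q.exec_query("select name, mag from stars")
print(q.cell(1, "mag"))
```

Keeping the stream as a separate slot means the same query object could be
pointed at a real connection or at a canned one for testing.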

5. from myself <adorf@eso.org>:

I have written a chapter "Information-sifting front ends to databases" 
for a book on "Adding Intelligence to Information Retrieval: The Case of
Astronomy and Related Space Sciences", edited by Andre Heck & Fionn Murtagh,
Kluwer Academic, Dordrecht (to appear).

"Information-sifting front ends to databases"
Hans-Martin Adorf
Abstract: Several astronomical database management systems are 
reviewed and compared with advanced commercial and public domain 
database front-ends. A number of important user interface features are 
identified that facilitate the information retrieval process from the user's 
point of view. It is envisaged that these features will become a standard 
part of existing or next-generation database systems in astronomy.

Hans-Martin Adorf				      adorf@eso.org (Internet)
Space Telescope - European Coordinating Facility	 adorf@eso.uucp (UUCP)
European Southern Observatory			  adorf@dgaeso51.bitnet (EARN)
Karl-Schwarzschild-Str. 2				     ESO::ADORF (SPAN)
D-8046 Garching bei Muenchen
F.R. Germany
Phone: +49-89-320 06-261 -- Fax: +49-89-320 06-480