
Re: Persistant objects



   Date: Tue, 23 Jan 90 07:18:28 EST
   From: pab@starbase.mitre.org (Paul Birkel)


   In their AAAI88 tutorial, Bobrow and Kiczales refer to an example they
   used which "[would] sketch how CLOS can be connected to a persistent
   object system".

   Alas, I was unable to attend, and the course notes do not mention such
   a topic.

   Although I have a couple of ideas on the subject, I'm new to CLOS and
   would prefer to benefit from the experiences of those who have gone
   before. Does anyone have any ideas, papers, ongoing research, and/or
   code implementing any such mechanisms? If so, would they be so kind as
   to share them?

   I know of only one commercial implementation of such an idea (Mercury, AI
   Technology), although various ES shells (KeeConnection, Nexpert, ART "real
   soon now") have similar (albeit crude) mechanisms for mirroring "objects"
   in their proprietary representation language into SQL DBMS tables.

   Thank you.

   Paul A. Birkel
   MITRE, Mailstop W418
   7525 Colshire Drive
   McLean, VA  22102-3481

   (703) 883-6399


Symbolics has a commercial object-oriented database product called Statice which
implements a persistent CLOS object system.  Strictly speaking it is a
persistent version of New Flavors, but for practical purposes it is CLOS; in
fact it uses CLOS syntax for the class definitions in anticipation of Symbolics
switching over to CLOS.

Unlike most "persistent object systems", Statice has a real distributed access
database underlying it providing full transactional consistency and recovery, as
well as a relational calculus for doing queries.

There hasn't been much published about it, but you can look at a paper entitled
"An Object-Oriented Database System to Support an Integrated Programming
Environment" in IEEE Data Engineering, June 88, Vol. 11 No. 2.

Making objects persistent is actually very difficult.  The basic problem occurs
when it isn't sufficient to simply make objects persist, but they also need to
be shared.  Adding sharing opens up the whole problem of transactional
consistency, which is often ignored by "persistent object systems". 

Sharing objects means that access must be coordinated.  This problem has been
explored in depth in the database world, and a transaction model has emerged
which ensures that all access to the database is synchronized, preventing data
from changing out from under a process and keeping dirty data from being visible
to other processes.  Statice does provide full transactional consistency,
meaning that all access to persistent data is synchronized with other processes.

However, the real problem has still not been solved.  Transactions, by their
very nature, can be aborted and potentially restarted.  This means that you
cannot perform any side effect inside a transaction which cannot be undone.
However, most applications need to perform one very important irreversible side
effect, namely communicating with a user.  This introduces the whole problem of
entering a transaction to get a snapshot of the state of the database, leaving
the transaction to present the snapshot to the user and let the user modify some
parts of it, and then entering a transaction again to store the changes.
However, once you are outside of the transaction, all the problems of
synchronizing access reappear.
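
To make the snapshot/edit/store cycle concrete, here is a minimal sketch in
Python.  It is not Statice code; the Store class and its snapshot and store
methods are names invented for this illustration, and each method call simply
stands in for one short transaction.  It shows how two individually consistent
transactions can still lose an update when the editing happens in between.

```python
import copy


class Store:
    """Toy shared database; each method call stands in for one short transaction."""

    def __init__(self, data):
        self._data = dict(data)

    def snapshot(self):
        # Transaction 1: read a consistent copy, then leave the transaction.
        return copy.deepcopy(self._data)

    def store(self, changes):
        # Transaction 2: write back whatever the caller edited.
        self._data.update(changes)


db = Store({"title": "Draft", "body": "v0"})

# Two users each take a snapshot, then edit it outside of any transaction.
alice = db.snapshot()
bob = db.snapshot()
alice["body"] = "Alice's edit"
bob["body"] = "Bob's edit"

# Each write-back is atomic on its own, yet Alice's update is silently lost.
db.store({"body": alice["body"]})
db.store({"body": bob["body"]})
final = db.snapshot()
print(final["body"])
```

Nothing in either transaction is wrong by itself; the conflict lives entirely
in the time the users spent outside of them.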

This problem is known in the database world as "long-lived transactions".
Database researchers have been unable to provide a general solution to the
problem, since any solution depends on the actual application.  Simple locks are
insufficient; I'm researching ways of addressing this problem by creating
synchronization objects to reify the "long-lived transactions", providing a set
of abstract protocols ranging from simple exclusive locks up to version
management.  The basic point is that given an object-oriented system, you can
capture "application specific knowledge" by providing abstract classes with
known protocols, which applications can specialize with concrete implementations
of those protocols.
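
As a rough illustration of the abstract-class idea, here is a sketch in Python.
All of the names (Synchronizer, ExclusiveLock, VersionCheck, Conflict,
edit_session) are hypothetical and chosen for this example; they are not taken
from Statice or from any existing system, and the version check shown is only
the simplest optimistic scheme, not full version management.

```python
from abc import ABC, abstractmethod


class Conflict(Exception):
    """Raised when a long-lived transaction cannot begin or commit."""


class Synchronizer(ABC):
    """Abstract protocol: begin a long-lived transaction, later try to commit it."""

    @abstractmethod
    def begin(self, user): ...

    @abstractmethod
    def commit(self, user): ...


class ExclusiveLock(Synchronizer):
    """Pessimistic policy: only one user may hold the object at a time."""

    def __init__(self):
        self.holder = None

    def begin(self, user):
        if self.holder is not None and self.holder != user:
            raise Conflict(f"held by {self.holder}")
        self.holder = user

    def commit(self, user):
        if self.holder != user:
            raise Conflict("not the lock holder")
        self.holder = None


class VersionCheck(Synchronizer):
    """Optimistic policy: anyone may begin; commit fails if the object moved on."""

    def __init__(self):
        self.version = 0
        self.seen = {}

    def begin(self, user):
        self.seen[user] = self.version

    def commit(self, user):
        if self.seen.pop(user) != self.version:
            raise Conflict("object changed since snapshot")
        self.version += 1


def edit_session(sync, first, second):
    """Two users overlap on one object; report what happened to each."""
    outcome = {}
    for user in (first, second):
        try:
            sync.begin(user)
            outcome[user] = "began"
        except Conflict:
            outcome[user] = "refused"
    for user in (first, second):
        if outcome[user] == "began":
            try:
                sync.commit(user)
                outcome[user] = "committed"
            except Conflict:
                outcome[user] = "rejected"
    return outcome


lock_result = edit_session(ExclusiveLock(), "alice", "bob")
version_result = edit_session(VersionCheck(), "alice", "bob")
print(lock_result)
print(version_result)
```

The calling code in edit_session knows only the abstract protocol, so an
application can pick (or write) whichever concrete policy suits it without
changing its callers.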

The second problem is that just making an object-oriented language persistent
generally retains the notion that objects are accessed navigationally by
following pointers.  However, once you have a significant amount of data, which
taking objects out of virtual memory makes possible, navigational access breaks
down, and people need to be able to make queries.  Basically, object-oriented
languages provide abstract behaviors for objects, but they don't provide any
data-independent query facility.  I tend to characterize this by saying that
they don't provide abstract relationships between objects.  Providing
data-independent queries is the whole point of relational databases, which
provide abstract relationships, but don't provide abstract datatypes and thus
behaviors.

As an example of what I mean by navigational access being insufficient, say you
have a hypertext system with a whole lot of information stored as nodes with
"links" to other nodes, and that each node has an author.  To find all nodes
written by "North" navigationally, you have to walk the whole structure.
Query languages solve this by providing a language in which to pose this
question, where the calling program does not need to know how the objects are
actually organized.  The query optimizer can know whether there is an index or
not, or whether it in effect has to do a full associative scan, and do the
appropriate type of lookup.  While you may say that you could simply add a hash
table from author to node and make it persistent, without a query language you
have simply added another navigational structure which the calling programs must
all be changed to take advantage of (and to keep up to date).  Eliminating the
need to change calling programs is exactly what object-oriented languages are
all about when they provide data abstraction.
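
Here is a small Python sketch of the hypertext example.  Node, walk, NodeStore
and by_author are names made up for this illustration, and the "optimizer" is
reduced to a single if statement, so take it as a picture of the idea rather
than of any real query language.  The index is built once and not maintained,
which a real system would have to handle.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    title: str
    author: str
    links: list = field(default_factory=list)


def walk(root):
    """Navigational access: follow links from root, visiting each node once."""
    seen, stack = set(), [root]
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue
        seen.add(id(node))
        yield node
        stack.extend(node.links)


class NodeStore:
    """Callers ask by author; how the answer is found is hidden from them."""

    def __init__(self, root, use_index=False):
        self.root = root
        self.index = None
        if use_index:
            # Build an author -> nodes table once, up front.
            self.index = {}
            for node in walk(root):
                self.index.setdefault(node.author, []).append(node)

    def by_author(self, author):
        if self.index is not None:
            # Index available: direct lookup.
            return list(self.index.get(author, []))
        # No index: fall back to scanning the whole structure.
        return [node for node in walk(self.root) if node.author == author]


c = Node("C", "North")
b = Node("B", "South", [c])
a = Node("A", "North", [b, c])
c.links.append(a)  # hypertext links may form cycles

scan = NodeStore(a)
indexed = NodeStore(a, use_index=True)
titles_scan = sorted(n.title for n in scan.by_author("North"))
titles_indexed = sorted(n.title for n in indexed.by_author("North"))
print(titles_scan, titles_indexed)
```

The caller writes by_author("North") in both cases; adding or removing the
index changes only the store, not the programs that use it.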

Statice does add a relational calculus to CLOS, thereby addressing this issue
for at least persistent objects.  However, once you have copies of objects in
VM which you are manipulating, the problem is not solved by Statice.

This stems from a problem related both to querying and to the first problem of
long-lived transactions.  Once you have made objects persistent, you end up with
a distinction between the state of an object in the persistent (and shared)
database, and its state in a particular virtual memory.  While it is inviting to
try to gloss over this difference and make it transparent, you really can't
succeed in doing so.  You end up having to deal with the fact that you need to
"snapshot" the state of the persistent data (probably using some synchronizer
object to implement the long-lived transaction), and then deal with the snapshot
in memory.  You really want to use the same query language to access the
snapshot, and in fact the snapshot should more or less look like a database
itself.  Statice does not do this, since it didn't go the full route of really
adding abstract relationships to VM objects; it just did it for persistent
objects.  There is some work being done in Sweden on adding query languages to
VM objects.

And a final problem is that existing relational algebras cannot query on
behaviors, since they were designed for relational databases where there was no
concept of objects or behaviors.

Well, this response is already way too long.  There is some work being done here
to address some of these issues, especially dealing with the synchronization
problems.