[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

issue CONSTANT-COMPILABLE-TYPES, version 7



Thanks to Cris Perdue, here is a brand new version of this issue.
There have been a few major changes in content since the version that
was distributed prior to the January meeting:

- All mention of cross-compilation has gone away.

- The behavior of readtables, methods, and generic functions is now
  unspecified.  Functions are still in, but see the note I added to the
  discussion section.

- Error terminology has changed.  The proposal used to say "is an error"
  but now requires implementations either to do something useful or
  signal an error.  If anybody has a problem with this, speak up.


Forum:		Compiler
Issue:		CONSTANT-COMPILABLE-TYPES
References:	CLtL pp. 56, 77-80, 324
		Issue CONSTANT-MODIFICATION
		Issue CONSTANT-CIRCULAR-COMPILATION
		Issue CONSTANT-ARRAY-ATTRIBUTES
		Issue QUOTE-SEMANTICS
		Issue LOAD-OBJECTS
Category:	CLARIFICATION, ADDITION
Edit history:	11/07/88, V1 by Cris Perdue
		11/14/88, V2 by Cris Perdue
		11/22/88, V3 by Cris Perdue
		12/20/88, V4 by Cris Perdue
		01/06/89, V5 by Sandra Loosemore (minor editorial
			clarifications, expand discussion)
		03/05/89, V6 by Cris Perdue (more response to comments,
			especially from Moon and and from Loosemore)
                03/05/89, V7 by Loosemore (more editorial tweaks)
Status:		Ready for release

Problem description:

CLtL does not specify what objects can be in compiled constants and it
does not say what relationship there is to be between the
constant passed to the compiler and the one that is established by
compiling and then loading its file.  Relevant remarks in CLtL
concerning file compilation and the definition of QUOTE suggest that
the compiler handles constants in ways that are not actually possible.

Introduction to the proposal:

The proposal CONSTANT-COMPILABLE-TYPES:SPECIFY attempts to spell out
what types can appear in compiled constants, and what it means when
they appear.  Unless stated otherwise, in this proposal where the term
"constant" is used, it means a quoted or self-evaluating constant, not
a named (defconstant) constant.

The key is a definition of a form of equivalence between Lisp objects,
"similarity as constants".  Code passed through the file compiler and
then loaded must behave as though quoted constants in it are "similar"
to quoted constants in the corresponding interpreted "source" code.

Because it is legitimate to compile in one address space and load into
a different one, it is necessary for the constraints to be defined
across address spaces.  This proposal only concerns quoted constants
to be processed by COMPILE-FILE.  Some other issues related to file
compilation are CONSTANT-COLLAPSING, CONSTANT-CIRCULAR-COMPILATION,
and QUOTE-SEMANTICS.

Some implementations "lose information" about some constants during
compilation.  Typically all constant arrays become simple arrays
during the process of compiling and loading.  We try to balance the
desire for more functionality against the effort required from
implementors.

Comments within the text of the proposal are enclosed in double angle
brackets, <<like this>>.

Proposal:  CONSTANT-COMPILABLE-TYPES:SPECIFY

An object may be used as a quoted constant processed by COMPILE-FILE
if the compiler can guarantee that the resulting constant established
by loading the compiled file is "similar as a constant" to the
original.  Treatment of uninterned symbols must be consistent across
the entire file as described below.  There also is a constraint on
arrays which is not symmetrical; compilation can make arrays
"simpler", but not "less simple".  (See below for the definition.)

We refer below to "quoted constants" or just "constants".  In this
section these terms refer to objects appearing in expressions of the
form (QUOTE <object>), to objects used as self-evaluating forms, and
to objects appearing in code at locations described as "not
evaluated".

Some types such as streams are not supported in constants.  Put
another way, an object containing one of these is not considered
similar as a constant to any other object.  Some implementations may
support them and define how they are treated.  For any object that
appears in a constant, but is not supported by the language as part of
a constant, the behavior of the compiler is unspecified; either the
the compiler and/or loader will handle that constant (in an
implementation-dependent manner) or the compiler will detect the
situation and signal an error.

Of the types supported in constants, some are treated as aggregate
objects.  For these types, being similar as constants is defined
recursively.  We say that an object of these types has certain
attributes, and to be similar as a constant to another object, the
values of the corresponding attributes of the two objects must also be
similar as constants.

This kind of definition has problems with any circular or "infinitely
recursive" object such as a list that is an element of itself.  We
use the idea of depth-limited comparison, and say that two
objects are similar as constants if they are similar at all finite
levels.  This idea is implicit in the definitions below, and applies
in all the places where attributes of two objects are required to be
similar as constants.

Here we define the notion of two objects being "similar as constants",
organizing the definition by type, and note additional constraints
that the compiler and loader working together must meet:

Number

  If either of the two objects is a number, both must be of the same
  type and must represent the same mathematical value.
  
Character

  If either of the two objects is a character, both must be character
  objects that represent the same character.  <<Note that this
  definition has to depend on the results of the Character Set
  proposals.>>

Random-state
  
  Let us say that two random-states are functionally equivalent if 
  applying RANDOM to them repeatedly always produces the same 
  pseudo-random numbers in the same order.  
  
  Two random-states are similar as constants exactly if copies of them
  made via MAKE-RANDOM-STATE are functionally equivalent.

  Note that a constant random-state object cannot be used as the "state"
  argument to the function RANDOM (because RANDOM side-effects this
  data structure).

Symbol

  A symbol can only be similar to a symbol.  References to interned
  symbols are "by name".  <<See issue COMPILE-FILE-SYMBOL-HANDLING for
  details.>>

  If a symbol is not interned, i.e. its home package is NIL, it is
  treated in a rather special way.  To be similar as a constant to
  another symbol, both symbols must be uninterned and have the same
  name.
  
  Constants that contain uninterned symbols have to satisfy an extra
  constraint.  Consider the set of places in a constant that refer to
  the same (EQ) uninterned symbol.  In any similar constant, the
  corresponding places must also all be EQ -- no more places and no
  fewer.  Moreover, COMPILE-FILE must arrange for the EQness of all 
  constant uninterned symbols that appear in the file to be preserved,
  even if they are referenced in separate constants.

  Because hash keys can be aggregate objects and because we treat hash
  tables as unordered sets of <key, value> pairs, similarity of hash
  tables is more complex.  See under "Hash Tables", below, for the
  definition.

Package

  A package can only be similar as a constant to a package.  References
  to packages are permitted in any constant.  References to packages are
  "by name": two packages are similar as constants when their names are
  similar as constants.  Within a Lisp "address space", packages with
  the same name are EQ.
  
  At load time, the package becomes the same as returned by
  FIND-PACKAGE, given the package name.  An error is signalled if no
  package of that name exists at load time.

  
AGGREGATE TYPES
---------------
  
For each of the types listed below, if either object is of the given
type, the two objects are similar as constants exactly if the other is
of that type and the values of all of the specified attributes are
similar as constants.

The attributes listed here can be called the "Basic Attributes" of
objects of each of these types.

Cons	     CAR, CDR.

Array	     For 1-dimensional arrays:
	     LENGTH, ARRAY-ELEMENT-TYPE, and ELT for all legal indices.

	     For arrays of other dimensions:
	     ARRAY-DIMENSIONS, ARRAY-ELEMENT-TYPE, AREF for all legal
	     indices.

	     An array of type SIMPLE-ARRAY can only be similar as a
	     constant to an array of type SIMPLE-ARRAY.  However, we
	     allow the file-compiler a bit of latitude here.  Where
	     constants in source code are displaced, have fill
	     pointers, or are adjustable, constants in the code
	     resulting from compilation and loading are permitted to
	     lack any or all of these qualities.

Hash Table   Keys and value pairs.  The table's test is unchanged
	     also.  If the file compiler is given a constant containing a
	     a hash table that has keys that are similar as
	     constants, the consequences are undefined.

	     Consider a hash table as an unordered set of key and
	     value pairs.  Two hash tables are similar as constants
	     exactly if there is a one-to-one correspondence between
	     the key and value pairs of each and a one-to-one
	     correspondence between the uninterned symbols of each
	     such that the two keys of each corresponding pair are
	     similar as constants and the two values are also similar
	     as constants.  The correspondence of uninterned symbols
	     must be consistent with the correspondence defined for
	     the entire set of constants in the file.

Pathname     Each pathname component.

OTHER TYPES
-----------

Stream, Compiled-Function, Readtable, Generic-function, Method
             Objects of these types are not supported in compiled
             constants.

Function     Only function constants that are not compiled-functions
	     and do not close over any (lexical) variables are
	     supported in compiled constants.

	     Two such functions are similar as constants if their
	     SOURCE-LAMBDA-EXPRESSIONs are similar as constants.

Structure, Standard-object
             <<There is a cl-cleanup issue, LOAD-OBJECTS, pending
             which proposes a mechanism for dealing with objects.>>
             For structure instances with no method defined at compile
             time for MAKE-LOAD-FORM, the slot values and the name of
             structure type (a symbol reference) are recorded by the
             compiler and reconstructed by the loader.


Examples:

If source code contains a constant that could PRINT as (#1=#:FOO
#2=#:FOO #1# #2#), the constant resulting from compiling and loading
that code would have to be PRINTable as (#1=#:FOO #2=#:FOO #1# #2#).

If we make a hash table H, set three variables A, B, and C to
different uninterned symbols named FOO, and enter keys and values as
follows:

(setf (gethash a h) b)
(setf (gethash b h) a)
(setf (gethash c h) c)

If H appears in a compiled constant, after compiling and loading it,

(let ((value (list)))
  (maphash #'(lambda (x y) (push (list x y) value)) h)
  value)

could print as

((#1=#:FOO #2=#:FOO) (#2# #1#) (#3=#:FOO #3#))

but not as

((#1=#:FOO #2=#:FOO) (#2# #3=#:FOO) (#3# #1#))


Rationale:

For the benefit of users, this proposal tries to define a fairly large
set of types that all Common Lisp implementations are to handle.  It
also attempts to leave room for implementations to differ.  Some
implementations have made opposing choices because the language
doesn't specify one over the other.  Some implementations already
handle constants that this proposal defines as not legal in Common
Lisp programs, and that is useful to users of those systems.
Different implementors have different amounts of resources to apply to
their system, and implementations differ in their whole approach in
some cases.

This proposal appears to reflect user demand and appears not to exceed
the capabilities of most implementations of the language.

The proposal ensures that all references to the same uninterned symbol
within a file will all map to references to just one uninterned symbol
after compiling and loading.  This is needed to support PCL.


Current practice:

>From Gail Zacharias (Nov 14): "Coral pretty much implements this
proposal (I think we currently coalesce hash table keys, but that's
just a bug that will be fixed).  We also fasdump packages (using the
package name) and compiled functions (but not foreign functions).  For
symbols, we dump the name, and if (roughly speaking) the symbol would
get printed with a package prefix, we also dump the package name and
load the symbol into that package (otherwise it gets loaded into the
current load-time package)."

>From David Gray (Nov 9): "The Explorer can compile constant functions,
read tables, and hash tables; an error is signalled for a stream.  A
package object used to break the compiler but in release 5 it has been
fixed to generate instructions to call FIND-PACKAGE on the package
name at load time."  (Nov 15): [The Explorer does not guarantee
retention of displaced-to and displaced-index-offset attributes.]
"The Explorer also does not currently support dumping closures (either
compiled or evaluated), although non-closure compiled functions can be
dumped."

>From David Moon (Jan 24): "Symbolics Genera current practice: aside
 from some current bugs we have with circular structures of certain
types and with preserving the identity of CONSes under EQ, this is
more or less consistent with our current practice, if you made the
changes implied by my earlier comments.  We preserve the :displaced-to
and :fill-pointer array attributes.  I doubt that we do what the
proposal says for hash-tables, readtables, and random-states.  We
support dumping compiled and interpreted functions, but not closures,
which in effect means we don't support dumping functions."

>From Sandra Loosemore (Mar 3): "UCL currently can handle only
constants that are of type number, character, symbol, cons,
simple-vector, or string (which it turns into simple-string).  It
signals an error if an attempt is made to compile any other kind of
object as a constant."


Adoption cost:

Not known.  Probably moderate or low -- for most implementations.  The
cost would be to implementors rather than users since this part of the
language is currently underspecified.  The author believes the cost
will be reasonable for KCL, an implementation where there is some
concern about this issue.

This proposal is close to compatible with the Franz, Lucid, Coral,
Texas Instruments, and Symbolics implementations.  It is probably
compatible or nearly compatible with other "Lisp Machine"
implementations.


Benefits:

Users would be able to use aggregate objects in constants with
confidence about the behavior of their code.


Conversion cost:

Where this proposal *requires* different behavior than an existing
implementation, there is a conversion cost for users of that
implementation.  It appears that this cost will be small, less than
the cost of leaving things unspecified.


Esthetics:

Since there is no adequate definition at present, a fuller definition
would be more esthetic.


Discussion:

This proposal does leave some user-visible attributes of objects
unspecified across the compile-and-load process, except that they must
be consistent with the attributes that must be retained.  This
situation is a compromise between the desire for full specification on
the one hand, and on the other hand the desire to leave freedom for
different implementations to remain different and to support some
optimizations such as compacting hash tables and "simplifying" arrays.

Proposals will be entertained for tighter specification of datatypes
such as arrays.

The full extension of the concept of coalescing of constants is to say
that they can be coalesced exactly when they are similar as constants.

Comparing functions semantically is intertwined with the specification
of what conforming programs and implementations are allowed to do.
This proposal does not attempt to do that since compiled functions are
not supported by this proposal in compiled constants.

The definition of similarity for random-states supports the
possibility of random states that are immutable because of being in
compiled constants.

Readtables need not be supported by an implementation.  If a readtable
contains only symbols to represent functions, here is Cris Perdue's
suggested spec for similarity of readtables:

Character syntax type for each character in the table;
function for each readmacro character, mappings for
dispatch macros; whether terminating or nonterminating
for each readmacro.

Interest has been expressed by a number of people including users, in
support for user-definable "dumping" of CLOS objects and structure
instances.  The cleanup issue LOAD-OBJECTS deals with this.

This subsumes the issue CONSTANT-ARRAY-ATTRIBUTES.

Sandra Loosemore says:

  I plan to submit an amendment to this proposal which would remove the
  requirement that the compiler be able to dump non-compiled, non-closed
  functions.  The reason for removing this requirement is that there is
  no way to portably construct an object which is guaranteed to be a
  non-compiled, non-closed function.  Note that implementations are
  permitted to make all functions COMPILED-FUNCTIONs.

JonL White notes:

  The analogy between FIND-PACKAGE and FIND-CLASS suggests that class 
  objects are in the same "database" category as packages.  Shouldn't
  they be referenced "by name" in compiled file?

Moon responds:

  In Flavors we generate metaobjects at compile time, but we never put
  them (to speak loosely) into the compiled-code file; instead macros
  like DEFFLAVOR and DEFMETHOD expand into Lisp code that obtains new
  metaobjects at load time, based on the class name and generic function
  name.  I don't see how any other way could work, actually, since two
  distinct compiled files that refer to a class by the same name must
  end up referring to the same metaobject after loading.  In Flavors we
  don't have anonymous classes nor anonymous generic functions, so we
  don't have to solve those issues.

  [Issue LOAD-OBJECTS proposes] that the way to load an instance of a
  standard-class from a compiled file is for a method of the instance
  to return a form which is then evaluated at load time.  The semantics
  of loading an instance of a standard-class from a compiled file can be
  entirely understood in terms of MAKE-INSTANCE or whatever other
  function is called by the form returned by MAKE-LOAD-FORM; no new
  concepts need be introduced.  If the programmer of a given class wants
  to use the class redefinition protocol, that class's MAKE-LOAD-FORM
  method can output something that uses that protocol, and if he
  doesn't, it can output something that doesn't.
-------