
Compiler section (4.2)



This section was written by Sandra and reviewed by RPG. Any other comments
before I begin working on it?
kathy

 
[rpg: Comments look like this.]
 
Introduction
============
 
The compiler is a utility that translates programs into an
implementation-dependent form that can be represented and/or executed
more efficiently.  The nature of the processing performed during
compilation is discussed in the "Compilation Semantics" section below.
This is followed by a discussion of the behavior of COMPILE-FILE and
the interface between COMPILE-FILE and LOAD.
 
[rpg: 
 
The compiler is a utility that may translate programs into an
implementation-dependent form that might be represented or executed
more efficiently.  The nature of the processing performed during
compilation is discussed in the "Compilation Semantics" section below.
This is followed by a discussion of the behavior of COMPILE-FILE and
the interface between COMPILE-FILE and LOAD.
 
]
 
% References:
%    CLtL page 143 (next to last paragraph)
%    CLtL page 321 (second paragraph)
[rpg: CLtL page 438]
 
The functions COMPILE and COMPILE-FILE are used to explicitly force
[rpg: yuck] compilation to take place.  It is permissible for
conforming implementations to also perform implicit compilation during
ordinary evaluation.  While the evaluator is typically implemented as
an interpreter that traverses the given form recursively, performing
each step of the computation as it goes, a permissible alternate
approach is for the evaluator first to completely compile the form
into machine-executable code and then invoke the resulting code.
Various mixed strategies are also possible.  All of these approaches
should produce the same results when executing a correct program, but
may produce different results for incorrect programs.
 
[rpg: This should say that execution of programs can be accomplished
by a variety of means ranging from direct interpretation of the list
structure representing a program through compilation to machine code,
and that the designer of an interpreter (evaluator?) can select any
of these strategies, and the designer of the compiler should select
any strategy that generally results in code that is no slower or no
bigger and which satisfies the constraints just below.]
 
% This paragraph should really conclude with a stronger statement that
% conforming programs must be structured so they will work if implicit
% compilation does take place, but CLtL doesn't come right out and say
% that, and we have never voted on any issue to say that either.
 
 
Compilation Semantics
=====================
 
% References:
%    Issue COMPILE-ENVIRONMENT-CONSISTENCY [pending]
%    Issue COMPILED-FUNCTION-REQUIREMENTS [pending]
% The material in this section will have to be updated to reflect further
% changes to these issues.
 
Conceptually, compilation can be viewed as a process which traverses a
program, performs certain kinds of syntactic and semantic analysis
using information (such as proclamations and macro definitions)
present in the compile time environment, and produces a modified
program.  As a minimum, the compiler must perform the following
actions:
 
- All macro calls appearing lexically within the code being compiled
  must be expanded at compile time and will not be expanded again at
  run time.  The process of compilation effectively turns MACROLET
  and SYMBOL-MACROLET constructs into PROGNs, with all calls to the local
  macros in the body replaced by their expansions.
 
  The compiler must treat any form that is a list beginning with a
  symbol that does not name a macro or special form as a function call.
  (This implies that SETF methods must also be available at compile time.)
 
- The compiler must capture declarations to determine whether
  variable bindings and references appearing lexically within the
  code being compiled are to be treated as lexical or special.  The
  compiler must treat any binding of a variable that has not been
  declared or proclaimed to be SPECIAL as a lexical binding.
 
- The compiler must process EVAL-WHEN forms that appear lexically within
  the program being compiled.  Effectively, the compiler must replace
  the EVAL-WHEN form with either a PROGN containing the body forms, or
  a constant NIL.
 
- The compiler must process LOAD-TIME-VALUE forms that appear lexically
  within the program being compiled.  In the case of COMPILE, evaluation
  of the LOAD-TIME-VALUE form happens at compile time and the resulting
  value is treated as a literal constant at run time.  In the case of
  COMPILE-FILE, the compiler must arrange for evaluation of the form
  to take place at load time.
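
As an illustration only (not proposed standard text), two of the
required actions above can be sketched using COMPILE.  The names
AREA-OF-SQUARE, *STAMP-COUNTER*, and *STAMP* are hypothetical.

```lisp
;; MACROLET: the local macro SQUARE is expanded during compilation, so
;; the compiled body contains only the expansion (* SIDE SIDE).
(defun area-of-square (side)
  (macrolet ((square (x) `(* ,x ,x)))
    (square side)))
(compile 'area-of-square)

;; LOAD-TIME-VALUE under COMPILE: the form is evaluated when the lambda
;; expression is compiled, and the value is then treated as a literal
;; constant, so every call of the compiled function returns that value.
(defvar *stamp-counter* 0)

(defparameter *stamp*
  (compile nil '(lambda () (load-time-value (incf *stamp-counter*)))))
```

Calling (FUNCALL *STAMP*) repeatedly should not increment the counter
again, since the INCF form is not re-evaluated at run time.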
 
In addition, the compiler is permitted to incorporate the following
kinds of information into the code it produces, if the information is
present in the compile time environment and is referenced within the
code being compiled.  Except where some other behavior is explicitly
stated, when the compile time and run time definitions are different,
it is unspecified which will prevail within the compiled code.  It is
also permissible for implementations to signal an error at run time to
complain about the discrepancy.  [rpg: Diction.] In all cases, the
absence of the information at compile time is not an error, [rpg:
terminology] but its presence may enable the compiler to generate more
efficient code.
 
[rpg: There is a complicated issue: Can the compiler assume that the
resulting code in a compile-file situation will be run in the same
Lisp? The same implementation? The same computer?  The same type of
computer?
 
I suggest that we say that the semantics we discuss presumes that the
compiler can assume that when doing compile-file that the resulting
code will be loaded into a fresh copy of the same Lisp.]
 
- The compiler may assume that functions that are defined and
  declared or proclaimed INLINE in the compile time environment will
  retain the same definitions at run time.
 
- The compiler may assume that, within a named function, a
  recursive call to a function of the same name refers to the
  same function, unless that function has been declared NOTINLINE.
 
[rpg: the interpreter can assume the same thing, right? That is, a
valid Common Lisp could be one in which all code is compile-filed by a
separate program and loaded and executed in the apparent Common Lisp
image.]
  
- COMPILE-FILE may assume that, in the absence of NOTINLINE
  declarations, a call within the file being compiled to a named
  function which is defined in that file refers to that function.
  (This permits "block compilation" of files.)  The behavior of
  the program is unspecified if functions are redefined individually 
  at run time.
  
[rpg: the interpreter can assume the same thing, right?]
 
- The compiler may assume that the argument syntax and number of return
  values for all built-in Common Lisp functions will not change.  In
  addition, the compiler may treat all built-in Common Lisp functions
  as if they had been proclaimed INLINE.
  
[rpg: the interpreter can assume the same thing, right? This follows from
LISP-SYMBOL-REDEFINITION.]
 
- The compiler may assume that the argument syntax and number of return
  values for all functions with FTYPE information available at
  compile time will remain the same at run time.
 
% Reference:  CLtL page 69
- The compiler may assume that symbolic constants that have been
  defined with DEFCONSTANT in the compile time environment will retain
  the same value at run time as at compile time.  The compiler may replace
  references to the name of the constant with the value of the constant,
  provided that such "copies" are EQL to the object that is the
  actual value of the constant.
 
% The following paragraph from issue COMPILE-ENVIRONMENT-CONSISTENCY
%    seems likely to change:
 
- The compiler can assume that type definitions made with DEFTYPE 
  or DEFSTRUCT in the compile time environment will retain the same 
  definition in the run time environment.  It may also assume that
  a class defined by DEFCLASS in the compile time environment will
  be defined in the run time environment in such a way as to have
  the same superclasses and metaclass.  [rpg: compatible metaclass?]
 
[rpg: This is a little curious. Is this talking only about this sort of case:
 
(defclass c ...)
 
(compile-file <something using c>)
 
or is it trying to cover the case of 
 
(compile-file <...(defclass c ...) ... something using c>)
]
 
  This implies that subtype/supertype relationships of type specifiers
  will not change between compile time and run time.  (Note that it is
  not an error [rpg: terminology?] for an unknown type to appear in a
  declaration at compile time, although it is reasonable for the
  compiler to emit a warning in such a case.)
 
% Ref:  CLtL page 153
- The compiler may assume that if type declarations are present
  in the compile time environment, the corresponding variables and 
  functions present in the run time environment will actually be of
  those types; otherwise, the run time behavior of the program is 
  undefined.
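
The following sketch shows how a program might supply some of the
information listed above.  All of the names (HALF, +MAX-RETRIES+,
NEXT-ID, LOG-EVENT, DEMO) are hypothetical, and whether a given
compiler actually exploits any of this information is
implementation-dependent.

```lisp
;; Proclaimed INLINE: the compiler may assume this definition is
;; retained at run time and may open-code calls to HALF.
(declaim (inline half))
(defun half (x) (/ x 2))

;; DEFCONSTANT: references may be replaced by the value itself.
(defconstant +max-retries+ 3)

;; FTYPE information: the compiler may assume the argument syntax and
;; the number of return values stay the same at run time.
(declaim (ftype (function (integer) integer) next-id))
(defun next-id (n) (1+ n))

;; NOTINLINE: asks the compiler not to build assumptions about
;; LOG-EVENT into its callers, so it can be redefined individually.
(declaim (notinline log-event))
(defun log-event (message) (list :logged message))

(defun demo ()
  (list (half 10) +max-retries+ (next-id 41) (log-event "start")))
```

Under these assumptions, redefining HALF or NEXT-ID incompatibly after
DEMO has been compiled leaves it unspecified which definition DEMO
uses, whereas LOG-EVENT can be redefined safely.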
 
The compiler *must not* make any additional assumptions about
consistency between the compile time and run time environments.  In 
particular:
 
- The compiler may not assume that functions that are defined
  in the compile time environment will retain either the
  same definition or the same signature at run time, except in the
  situations explicitly listed above.
 
- The compiler may not signal an error if it sees a call to a
  function that is not defined at compile time, since that function
  may be provided at run time.
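
A small sketch of the last point, using hypothetical names *CALLER*
and LATER-DEFINED.  An implementation may well issue a warning when the
lambda expression is compiled, but compilation is expected to succeed.

```lisp
;; LATER-DEFINED has no definition when this lambda is compiled.
(defparameter *caller*
  (compile nil '(lambda (x) (later-defined x))))

;; The definition is supplied afterwards, before the first call.
(defun later-defined (x) (* x 10))
```

Calling (FUNCALL *CALLER* 4) after both forms have been evaluated uses
the definition that exists at run time.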
 
 
 
File Compilation
================
 
The function COMPILE-FILE performs compilation processing (described
in the previous section) on forms appearing in a file, producing an
output file which may then be loaded with LOAD.
 
Normally, the top-level forms appearing in a file compiled with
COMPILE-FILE are executed only when the resulting compiled file is
loaded, and not when the file is compiled.  However, it often happens
that some forms in the file must be evaluated at compile time in order
for the remainder of the file to be read and compiled correctly; for
example, forms that change the values of *PACKAGE* or *READTABLE* and
macro definitions.  In such cases, the distinction between processing
that is performed at compile time and processing that is performed at
load time becomes important.
 
The special form EVAL-WHEN can be used to give explicit control over
the time at which evaluation of a top-level form takes place, allowing
forms to be executed at compile time, load time, or both.  The
behavior of this construct may be more precisely understood in terms
of a model of how COMPILE-FILE processes forms in a file to be
compiled.
 
Successive forms are read from the file by the file compiler [rpg:
COMPILE-FILE] using READ. These top-level forms are normally processed
in what we call `not-compile-time' mode; in this mode, the file
compiler arranges for forms to be evaluated only at load time and not
at compile time.  There is one other mode, called `compile-time-too'
mode, in which forms are evaluated both at compile and load times.
 
[rpg: what is the file compiler? The thing that compile-file causes to
run?
 
Also, isn't this requirement that COMPILE-FILE use READ new? I don't
see why it's required. I suggest removing it.]
 
Processing of top-level forms in the file compiler works as follows:
 
* If the form is a macro call, it is expanded and the result is
  processed as a top-level form in the same processing mode
  (compile-time-too or not-compile-time).
 
* If the form is a PROGN form, each of its body forms is
  sequentially processed as top-level forms in the same processing
  mode.
 
* If the form is a LOCALLY, MACROLET, or SYMBOL-MACROLET,
  the file compiler makes the appropriate bindings and recursively
  processes the body forms as an implicit top-level PROGN with those 
  bindings in effect, in the same processing mode.  (Note that this
  implies that the lexical environment in which top-level forms are
  processed is not necessarily the null lexical environment.)
 
* If the form is an EVAL-WHEN form, it is handled according to
  the following table:
 
  :COMPILE-   :LOAD-      :EXECUTE   compile-time-too   Action
   TOPLEVEL    TOPLEVEL               mode

   Yes         Yes         --         --                 Process body in compile-time-too mode
   No          Yes         Yes        Yes                Process body in compile-time-too mode
   No          Yes         Yes        No                 Process body in not-compile-time mode
   No          Yes         No         --                 Process body in not-compile-time mode
   Yes         No          --         --                 Evaluate body
   No          No          Yes        Yes                Evaluate body
   No          No          Yes        No                 do nothing
   No          No          No         --                 do nothing
 
  "Process body" means to process the body (using the procedure 
  outlined in this subsection) as an implicit top-level PROGN.
  "Evaluate body" means to evaluate the body forms as an implicit
  PROGN in the dynamic execution context of the compiler and in the
  lexical environment in which the EVAL-WHEN appears.
 
* Otherwise, the form is a top-level form that is not one of the
  special cases.  If in compile-time-too mode, the compiler first
  evaluates the form and then performs normal compiler processing
  on it.  If in not-compile-time mode, only normal compiler
  processing is performed.  Any subforms are treated as non-top-level
  forms.
 
Note that top-level forms are processed in the order in which they
textually appear in the file, and that each top-level form read by the
compiler is processed before the next is read.  However, the order of
processing (including, in particular, macro expansion) of subforms
that are not top-level forms is unspecified.
 
EVAL-WHEN forms cause compile time evaluation only at top-level.  In
non-top-level locations, both the :COMPILE-TOPLEVEL and :LOAD-TOPLEVEL
situations are ignored and only the :EXECUTE situation is considered.
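
The processing model can be illustrated with the following sketch of
a file's contents.  The names DOUBLE-FORM, TWICE, USE-TWICE, and
NOT-TOP-LEVEL are hypothetical.

```lisp
;; Top level, all three situations: the body is processed in
;; compile-time-too mode, so DOUBLE-FORM is defined while the file is
;; being compiled and again when the compiled file is loaded.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (defun double-form (x) `(* 2 ,x)))

;; The expander of TWICE calls DOUBLE-FORM at compile time, which is
;; why DOUBLE-FORM needed the EVAL-WHEN above.
(defmacro twice (x) (double-form x))

(defun use-twice (n) (twice n))

;; Not at top level: :COMPILE-TOPLEVEL and :LOAD-TOPLEVEL are ignored
;; and only :EXECUTE is considered.
(defun not-top-level ()
  (list (eval-when (:compile-toplevel :load-toplevel) :ignored)
        (eval-when (:execute) :evaluated)))
```

Without the EVAL-WHEN around DOUBLE-FORM, the file could fail to
compile in a fresh image, because the helper would be defined only at
load time.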
 
The following macros make definitions that are typically used during
compilation and are defined to make those definitions available at
both compile time and run time when calls to those macros appear in a
file being compiled.  As with EVAL-WHEN, these compile time
side-effects happen only when the defining macros appear at top-level.
 
% The specific details of the compile time side effects should go under
% the description of the macro in chapters 6 & 7.
    DEFTYPE
    DEFMACRO
    DEFINE-MODIFY-MACRO
    DEFVAR
    DEFPARAMETER
    DEFCONSTANT
    DEFSETF
    DEFINE-SETF-METHOD
    DEFSTRUCT
    DEFINE-CONDITION
    DEFPACKAGE
    IN-PACKAGE
% These depend on the outcome of issue CLOS-MACRO-COMPILATION
    DEFCLASS
    DEFGENERIC
    DEFMETHOD
    DEFINE-METHOD-COMBINATION
% This depends on the outcome of issue PROCLAIM-ETC-IN-COMPILE-FILE
    DEFPROCLAIM
 
The compile time behavior of these macros can be understood as if
their expansions effectively include (EVAL-WHEN (:COMPILE-TOPLEVEL)
...) forms.  It is not required that the compile time definition be
made in the same manner as if the defining macro had been evaluated
directly.  In particular, the information stored by the defining
macros at compile time may or may not be available to the evaluator
(either during or after compilation), or during subsequent calls to
COMPILE or COMPILE-FILE.  If the definition must be visible during
compile time evaluation, it should be placed within an explicit
(EVAL-WHEN (:COMPILE-TOPLEVEL) ...) to ensure that it will be fully
defined at compile time.
 
   Wrong:  (defmacro foo (x) `(car ,x))
           (eval-when (:execute :compile-toplevel :load-toplevel)
             (print (foo '(a b c))))
 
   Right:  (eval-when (:execute :compile-toplevel :load-toplevel)
             (defmacro foo (x) `(car ,x))
             (print (foo '(a b c))))
 
 
 
Compiler/Loader Interface
=========================
 
% Reference: Issue QUOTE-SEMANTICS
 
The functions EVAL and COMPILE always ensure that constants referenced
within the resulting interpreted or compiled code objects are EQL to
the corresponding objects in the source code.  COMPILE-FILE, on the
other hand, must produce an output file which contains instructions
[rpg: to] tell the loader how to reconstruct the objects appearing in
the source code when the compiled file is loaded.  
 
[rpg: I prefer this, because the objects may not be *re*constructed since
they might not have been constructed in the first place. Also, ``instructions''
might never appear, only some collaboration need be implied:
 
COMPILE-FILE, on the other hand, must produce an output file which
when loaded with LOAD constructs the objects defined by the source
code.]
 
The EQL relationship is not well-defined in this case, since the
compiled file may be loaded into a different Lisp image than the one
that it was compiled in.  This section defines a notion of "similarity
as constants" which relates objects in the compile time
environment to the corresponding objects in the load time environment.
 
The constraints on constants described in this subsection apply only
to COMPILE-FILE; implementations are not permitted to copy or coalesce
constants appearing in code processed by EVAL or COMPILE.
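
The requirement on EVAL and COMPILE can be observed directly, as in
this sketch.  The variables *DATUM* and *GETTER* are hypothetical.

```lisp
;; A freshly consed list to use as a quoted constant.
(defparameter *datum* (list 1 2 3))

;; The list object itself is spliced into the source as a quoted
;; constant before the lambda expression is handed to COMPILE.
(defparameter *getter*
  (compile nil `(lambda () ',*datum*)))

;; Both tests are expected to be true: neither COMPILE nor EVAL may
;; copy the constant.
(defparameter *same-under-compile* (eq (funcall *getter*) *datum*))
(defparameter *same-under-eval* (eq (eval `(quote ,*datum*)) *datum*))
```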
 
 
Terminology
-----------
 
% Reference:  Issue CONSTANT-COMPILABLE-TYPES
 
The following terminology is used in this section.
 
The term "constant" refers to a quoted or self-evaluating constant
or an object that is a substructure of such a constant, not a named
(DEFCONSTANT) constant. [rpg: ``self-evaluating means....'']
 
The term "source code" is used to refer to the objects constructed
when COMPILE-FILE calls READ, and additional objects constructed by
macroexpansion during COMPILE-FILE.
 
[rpg: I think the source code is whatever the representation is in
whatever a file is. I think this use of READ as a semantic crutch is
unnecessary.]
 
The term "compiled code" is used to refer to objects constructed by 
LOAD.
 
[rpg: so a floating-point number constructed by LOAD is ``compiled
code''?]
 
The term "coalesce" is defined as follows.  Suppose A and B are two
objects used as quoted constants in the source code, and that A' and
B' are the corresponding objects in the compiled code.  If A' and B'
are EQL but A and B were not EQL, then we say that A and B have been
coalesced by the compiler.
 
[rpg: here is a first pass at changing this wording to avoid READ:
 
The term "coalesce" is defined as follows.  Suppose A and B are two
objects defined as quoted constants in the source code, and that A'
and B' are the corresponding objects in the compiled code.  If A' and
B' are EQL but A and B were not defined to be EQL, then we say that A
and B have been coalesced by the compiler.]
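
A sketch of what coalescing means for a file processed with
COMPILE-FILE.  FIRST-NAME and SECOND-NAME are hypothetical functions
assumed to appear in the same source file.

```lisp
;; In the source code these are two distinct string objects.
(defun first-name () "anna")
(defun second-name () "anna")

;; After COMPILE-FILE and LOAD, the two strings may or may not have
;; been coalesced, so a portable program must not depend on the value
;; of this EQ test either way.
(defun names-coalesced-p ()
  (eq (first-name) (second-name)))
```

Only the contents of the two strings can be relied upon; their
identity is up to the implementation.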
 
 
What may appear as a constant
-----------------------------
 
An object may be used as a quoted constant processed by COMPILE-FILE
if the compiler can guarantee that the resulting constant established
by loading the compiled file is "similar as a constant" to the
original.
 
The notion of "similarity as a constant" is not well-defined on all
data types.  Objects of these types may not portably appear as
constants in code processed with COMPILE-FILE.  Conforming
implementations are required to handle such objects either by having
the compiler and/or loader reconstruct an equivalent copy of the
object in some implementation-specific manner, or by having the
compiler signal an error.
 
For some aggregate data types, being similar as constants is defined
recursively.  We say that an object of these types has certain "basic
attributes", and to be similar as a constant to another object, the
values of the corresponding attributes of the two objects must also be
similar as constants.
 
This kind of definition has problems with any circular or "infinitely
recursive" object such as a list that is an element of itself.  We use
the idea of depth-limited comparison, and say that two objects are
similar as constants if they are similar at all finite levels.  This
idea is implicit in the definitions below, and applies in all the
places where attributes of two objects are required to be similar as
constants.
 
[rpg: Hm, this comment can be got around.]
 
% Reference:  issue CONSTANT-CIRCULAR-COMPILATION
 
Such circular objects may legitimately appear as constants to be
compiled.  More generally, if two constants appearing in the source code
for a single file processed with COMPILE-FILE are EQL, the corresponding
constants in the compiled code must also be EQL.
 
% Reference:  issue CONSTANT-COLLAPSING
 
However, the converse of this relationship need not be true; if two
objects are EQL in the compiled code, that does not always imply that
the corresponding objects in the source code were EQL.  This is
because COMPILE-FILE is permitted to coalesce constants appearing in
the source code if and only if they are similar as constants, except if
the objects involved are of type SYMBOL, PACKAGE, STRUCTURE, or
STANDARD-OBJECT.  Objects of these types are never coalesced.
 
 
Similarity as constants
-----------------------
 
Two objects are defined to be "similar as constants" if and only if
they are both of one of the [rpg: same type from the list of] types
listed below and satisfy the additional requirements listed for that
type.
 
Number
 
  Two numbers are similar as constants if they are of the same type
  and represent the same mathematical value.
  
Character
 
  Two characters are similar as constants if they both represent
  the same character.
 
% Note that this definition has to depend on the results of the
% Character Set proposals.  The intent is that this be compatible with
% how EQL is defined on characters.
 
Symbol
 
% Issue COMPILE-FILE-SYMBOL-HANDLING defines how the file compiler
%  and loader handle interned symbols.
 
  An uninterned symbol in the source code is similar as a constant
  to an uninterned symbol in the compiled code if their print names
  are similar as constants.
 
Package
 
  A package in the source code is similar as a constant to a package in
  the compiled code if their names are similar as constants.  Note that
  the loader finds the corresponding package object as if by calling
  FIND-PACKAGE with the package name as an argument.  An error is
  signalled if no package of that name exists at load time.
 
Random-state
 
  Let us say that two random-states are functionally equivalent if 
  applying RANDOM to them repeatedly always produces the same 
  pseudo-random numbers in the same order.  
  
  Two random-states are similar as constants if and only if copies of
  them made via MAKE-RANDOM-STATE are functionally equivalent.
 
  Note that a constant random-state object cannot be used as the "state"
  argument to the function RANDOM (because RANDOM modifies this data
  structure as a side effect).
 
Cons
 
  Two conses are similar as constants if the values of their respective
  CAR and CDR attributes are similar as constants.
 
Array
 
  Two arrays are similar as constants if the corresponding values of
  each of the following attributes are similar as constants:
 
  For 1-dimensional arrays:
  LENGTH, ARRAY-ELEMENT-TYPE, and ELT for all valid indices.
 
  For arrays of other dimensions:
  ARRAY-DIMENSIONS, ARRAY-ELEMENT-TYPE, AREF for all valid indices.
 
  In addition, if the array in the source code is a SIMPLE-ARRAY, then
  the corresponding array in the compiled code must also be a
  SIMPLE-ARRAY.  If the array in the source code is displaced, has a
  fill pointer, or is adjustable, the corresponding array in the
  compiled code is permitted to lack any or all of these qualities.
 
[rpg: hm]
 
Hash Table   
 
  Two hash tables are similar as constants if they meet the following
  three requirements:
 
  (1) They both have the same test (e.g., they are both EQL hash tables).
 
  (2) There is a unique one-to-one correspondence between the keys of
      the two tables, such that the corresponding keys are similar as
      constants.
 
  (3) For all keys, the values associated with two corresponding keys
      are similar as constants.
 
  If there is more than one possible one-to-one correspondence between
  the keys of the two tables, the results are unspecified.  A conforming
  program cannot use such a table as a constant.
 
[rpg: So, compilers can only be heuristic in such cases, no?] 
 
Pathname
 
  Two pathnames are similar as constants if all corresponding pathname
  components are similar as constants.
 
Stream, Readtable, Method
 
  Objects of these types are not supported in compiled constants.
 
Function
 
%  Issue CONSTANT-FUNCTION-COMPILATION specifies how the compiler and
%  loader handle constant functions.
 
Structure, Standard-object
 
% Reference: issue LOAD-OBJECTS
 
  Objects of type structure and standard-object may appear in compiled
  constants if there is an appropriate MAKE-LOAD-FORM method defined
  for that type.
 
  COMPILE-FILE calls MAKE-LOAD-FORM on any object that is referenced as
  a constant or as a self-evaluating form, if the object's metaclass is
  STANDARD-CLASS, STRUCTURE-CLASS, any user-defined metaclass (not a
  subclass of BUILT-IN-CLASS), or any of a possibly-empty
  implementation-defined list of other metaclasses.  COMPILE-FILE will
  only call MAKE-LOAD-FORM once for any given object (compared with EQ)
  within a single file.
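
  A minimal sketch of such a method, assuming a hypothetical class
  POINT.  The lambda list with an optional environment argument is an
  assumption about the final form of the LOAD-OBJECTS proposal.

```lisp
(defclass point ()
  ((x :initarg :x :reader point-x)
   (y :initarg :y :reader point-y)))

;; Returns a form that, when evaluated at load time, builds an
;; equivalent POINT.  This simple version assumes the slot values are
;; self-evaluating objects such as numbers.
(defmethod make-load-form ((p point) &optional environment)
  (declare (ignore environment))
  `(make-instance 'point :x ,(point-x p) :y ,(point-y p)))
```

  With this method defined, a POINT may appear as a constant in a file
  processed by COMPILE-FILE, and the loader constructs the object by
  evaluating the returned form.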
 
Condition
 
% This somehow got overlooked.  Are they handled under LOAD-OBJECTS?
 
[rpg: Yes, since they are instances of classes.]
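
The recursive definitions given in this subsection for numbers,
characters, uninterned symbols, conses, and arrays can be sketched as
a predicate.  SIMILAR-AS-CONSTANTS-P is a hypothetical function, not
part of any proposal.  It uses EQL as an approximation of "same type
and same mathematical value", treats interned symbols with EQ as a
simplifying assumption, does not cover the other types, and does not
terminate on circular objects.

```lisp
(defun similar-as-constants-p (a b)
  (cond ((and (numberp a) (numberp b)) (eql a b))
        ((and (characterp a) (characterp b)) (eql a b))
        ((and (symbolp a) (symbolp b))
         (if (or (symbol-package a) (symbol-package b))
             (eq a b)                   ; assumption for interned symbols
             (string= (symbol-name a) (symbol-name b))))
        ((and (consp a) (consp b))
         (and (similar-as-constants-p (car a) (car b))
              (similar-as-constants-p (cdr a) (cdr b))))
        ;; 1-dimensional arrays: LENGTH, element type, and elements.
        ((and (vectorp a) (vectorp b))
         (and (= (length a) (length b))
              (equal (array-element-type a) (array-element-type b))
              (every #'similar-as-constants-p a b)))
        ;; Other arrays: dimensions, element type, and elements.
        ((and (arrayp a) (arrayp b))
         (and (equal (array-dimensions a) (array-dimensions b))
              (equal (array-element-type a) (array-element-type b))
              (dotimes (i (array-total-size a) t)
                (unless (similar-as-constants-p (row-major-aref a i)
                                                (row-major-aref b i))
                  (return nil)))))
        (t nil)))
```

A depth-limited variant would be needed to realize the treatment of
circular objects described earlier.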
 
 
Compile Time Error Handling
===========================
 
% Reference:  Issue COMPILER-DIAGNOSTICS
% The STYLE-WARNING condition needs to be integrated into the section
%     describing the hierarchy of condition types.
 
Errors and warnings may be issued within COMPILE or COMPILE-FILE.
This includes both arbitrary errors which may occur due to
compile-time processing of (EVAL-WHEN (:COMPILE-TOPLEVEL) ...)  forms
or macro expansion, and conditions signalled by the compiler itself.
 
Conditions of type ERROR may be signalled by the compiler in
situations where the compilation cannot proceed without
intervention.
 
    Examples:
        file open errors
        syntax errors
 
Conditions of type WARNING may be signalled by the compiler in
situations where the standard explicitly states that a warning must,
should, or may be signalled, and in situations where the compiler can
determine that the code would, at run time, have undefined
consequences or cause an error to be signalled.
 
[rpg: But this is not to be construed as an escape clause that allows
an implementation to not warn when it is required. This attempts to
only talk about how the warning is issued, right?]
 
    Examples:
        violation of type declarations
        SETQ'ing or rebinding a constant defined with DEFCONSTANT
        calls to built-in Lisp functions with wrong number of arguments
          or malformed keyword argument lists
        referencing a variable declared IGNORE
        unrecognized declaration specifiers
 
The compiler is permitted to issue warnings about matters of
programming style as conditions of type STYLE-WARNING.  Although 
STYLE-WARNINGs -may- be signalled in these situations, no 
implementation is -required- to do so.  However, if an 
implementation does choose to signal a condition, that condition 
will be of type STYLE-WARNING and will be signalled by a call to 
the function WARN.
 
    Examples:
        redefinition of function with different argument list
        calls to function with wrong number of arguments
        unreferenced local variables not declared IGNORE
        declaration specifiers described in CLtL but ignored by
          the compiler
 
 
Both COMPILE and COMPILE-FILE are allowed to establish a default
condition handler.  If such a condition handler is established,
however, it must first resignal the condition to give any
user-established handlers a chance to handle it.  If all user error
handlers decline, the default handler may handle the condition in an
implementation-specific way; for example, it might turn errors into
warnings.
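
A user-established handler of the kind described above might look
like the following sketch.  COMPILE-COLLECTING-WARNINGS is a
hypothetical helper, and which warnings (if any) a given compiler
signals for a given form is implementation-dependent.

```lisp
(defun compile-collecting-warnings (lambda-expression)
  "Compile LAMBDA-EXPRESSION; return the function and a list of the
warning conditions signalled during compilation."
  (let ((seen '()))
    (handler-bind ((warning (lambda (condition)
                              ;; Record the condition, then handle it
                              ;; so that no default handler reports it.
                              (push condition seen)
                              (muffle-warning condition))))
      (values (compile nil lambda-expression)
              (nreverse seen)))))
```

Because the compiler's own default handler, if any, must give user
handlers the first chance, this handler sees each warning before any
implementation-specific treatment is applied.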
 
% Reference:  issue WITH-COMPILATION-UNIT
 
In some implementations, some kinds of warnings may be deferred until
"the end of compilation"; see WITH-COMPILATION-UNIT.
 
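As a sketch of why deferral is useful, consider two mutually
recursive functions compiled one after the other.  MY-EVEN-P and
MY-ODD-P are hypothetical names.  Whether an implementation actually
defers its undefined-function warnings here is up to that
implementation.

```lisp
;; Inside one compilation unit, a warning about MY-ODD-P being
;; undefined while MY-EVEN-P is compiled may be held back until the end
;; of the unit, by which time MY-ODD-P has been defined.
(with-compilation-unit ()
  (compile 'my-even-p
           '(lambda (n) (if (zerop n) t (my-odd-p (1- n)))))
  (compile 'my-odd-p
           '(lambda (n) (if (zerop n) nil (my-even-p (1- n))))))
```
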
-------