
Backend Background Info



As has become apparent in recent mail and phone calls, there seems to be a
need for some background info on the different data structures/concepts
used by the compiler backend.

To aid portability (among other reasons), the compiler does not convert
directly from source code to object code, but instead goes through two
internal representations along the way, IR1 and IR2.  Almost all of the
machine-dependent aspects of compilation are contained in IR2, so we will
emphasize that.  But before explaining how IR2 is used to generate code, we
need to introduce some terms and concepts:

A Storage Base (SB) corresponds to some storage resource.  Typical storage
bases are the different register files, the different stacks, and
constants.  Storage bases are either :finite, :unbounded, or :non-packed.
Finite storage bases are things like register files, which hold only a
fixed number of elements.  Unbounded storage bases are things like stacks,
which can grow.  Non-packed storage bases are for things like immediates
and constants that don't have to be packed into some resource.
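
As a rough illustration of the three kinds of storage base, here is a
minimal Python sketch.  The class names, the has_room helper, and the
example bases (including the register count of 16) are all made up for
this note and are not the compiler's actual data structures.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class SBKind(Enum):
    FINITE = "finite"          # fixed number of locations, e.g. a register file
    UNBOUNDED = "unbounded"    # can grow on demand, e.g. a stack
    NON_PACKED = "non-packed"  # nothing to pack, e.g. immediates and constants


@dataclass
class StorageBase:
    name: str                   # hypothetical name, for illustration only
    kind: SBKind
    size: Optional[int] = None  # only meaningful for FINITE bases

    def has_room(self, used: int) -> bool:
        """Return True if one more object fits, given `used` occupied locations."""
        if self.kind is SBKind.FINITE:
            return used < self.size
        return True  # unbounded and non-packed bases never fill up


registers = StorageBase("registers", SBKind.FINITE, size=16)
stack = StorageBase("control-stack", SBKind.UNBOUNDED)
constants = StorageBase("constant", SBKind.NON_PACKED)
```

Only the finite base can ever report that it is full, which is the case
where spilling comes into play.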

A Storage Class (SC) corresponds to a way or place an object can be
represented.  Each storage class uses some storage base, and can optionally
restrict the locations in that storage base that can be used.  Each storage
class has a list of alternate SCs and a list of constant SCs.  The list of
alternates is used when the compiler wants more locations in a particular
SC than that SC's SB can provide.  When the compiler runs out of locations
in the SB, it can spill one or more objects into any of the SC's alternate
SCs.  The list of constant SCs is just that: the list of SCs used to
represent any constants that would otherwise have been represented with
this SC (were it not for the fact that they are constant).
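
The spilling behaviour can be sketched roughly as follows.  This is a
simplified Python model, not the compiler's packing algorithm: the SC
names, the capacity field, and the allocate function are assumptions made
for illustration, and the sketch places the overflowing object directly
into an alternate SC rather than choosing which existing object to spill.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StorageClass:
    name: str
    capacity: Optional[int]  # usable locations in the underlying SB; None = unbounded
    alternate_scs: List["StorageClass"] = field(default_factory=list)
    constant_scs: List["StorageClass"] = field(default_factory=list)
    used: int = 0

    def try_allocate(self) -> bool:
        """Claim one location if any is left; report whether that succeeded."""
        if self.capacity is not None and self.used >= self.capacity:
            return False
        self.used += 1
        return True


def allocate(sc: StorageClass) -> StorageClass:
    """Return the SC that ends up holding the object, spilling if needed."""
    if sc.try_allocate():
        return sc
    for alt in sc.alternate_scs:  # the SB is full: fall back to the alternates
        if alt.try_allocate():
            return alt
    raise RuntimeError(f"no room in {sc.name} or its alternates")


# Hypothetical SCs: two registers, with an unbounded stack SC as alternate.
stack_sc = StorageClass("descriptor-stack", capacity=None)
reg_sc = StorageClass("descriptor-reg", capacity=2, alternate_scs=[stack_sc])

placements = [allocate(reg_sc).name for _ in range(3)]
```

With only two register locations, the third object lands in the stack SC.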

A Primitive Type is the compiler's idea of what the most basic name for a
particular object is.  Primitive types contain a list of allowed SCs.  This
means that any object of that primitive type can only be represented by
those SCs (or their alternate SCs and constant SCs).  This is how the
compiler ensures that GC disciplines will be followed.  By controlling the
allowed SCs for a primitive type, and controlling the allowed locations
and/or the SB for the SC, you can control exactly where and how objects of
a particular primitive type will be represented.
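
To make the relationship concrete, the following Python sketch computes
the full set of SCs an object of a given primitive type may end up in: the
allowed SCs plus their alternates and constant SCs.  All names here
(legal_scs, single-reg, single-stack, and so on) are hypothetical and
chosen only for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class SC:
    name: str
    alternates: List[str] = field(default_factory=list)  # names of alternate SCs
    constants: List[str] = field(default_factory=list)   # names of constant SCs


@dataclass
class PrimitiveType:
    name: str
    allowed_scs: List[str]


def legal_scs(ptype: PrimitiveType, scs: Dict[str, SC]) -> Set[str]:
    """Allowed SCs of the primitive type, plus their alternates and constant SCs."""
    legal: Set[str] = set()
    for sc_name in ptype.allowed_scs:
        sc = scs[sc_name]
        legal.add(sc_name)
        legal.update(sc.alternates)
        legal.update(sc.constants)
    return legal


# Hypothetical SC table for illustration.
scs = {
    "single-reg": SC("single-reg", alternates=["single-stack"],
                     constants=["constant"]),
    "single-stack": SC("single-stack"),
    "descriptor-reg": SC("descriptor-reg", alternates=["control-stack"]),
    "control-stack": SC("control-stack"),
    "constant": SC("constant"),
}
single_float = PrimitiveType("single-float", allowed_scs=["single-reg"])
where = legal_scs(single_float, scs)
```

In this model a raw single float can never be placed in descriptor-reg,
so GC never sees it among the descriptor objects.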

If an object can be represented in more than one way at different times,
there must be separate SCs for each representation.  Also, if the
representation does not contain the type of the object, that representation
must have its own SC; otherwise it would be impossible to tell the
different types of objects in that SC apart.  For example, all descriptor
representations could use the same SC, but 32-bit raw numbers and 32-bit
single floats would each need their own SC so that they could be
distinguished from each other and descriptor objects.  (In practice, there
will probably be more than one SC for descriptor objects to distinguish
those objects that GC must scavenge and those objects that GC can ignore.)

Temporary Names (TNs) are used by the compiler to represent a particular
value or object.  The compiler allocates TNs for every variable in the
source code in addition to any temporary value that it might need to
introduce into complex expressions.

The program is represented in IR2 as a set of basic blocks, each block
consisting of a sequence of Virtual OPerations (VOPs).  Each VOP
corresponds to a small sequence of code.  VOPs take TNs as arguments, use
TNs as temporaries, and produce TNs as results.  Actually, it would be more
correct to say that the code sequence that corresponds to the VOP takes
arguments from the locations specified by the argument TNs and puts results
in the locations specified by the result TNs, possibly using additional
locations specified by any temporary TNs.  This distinction is important
conceptually.  VOPs don't create or destroy TNs; they just extract info
from them.
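
The distinction can be sketched in Python as follows.  The TN and VOP
classes, the emit_add generator, and the pseudo-assembly syntax are all
invented for illustration; the point is only that the code generator reads
locations out of TNs it was handed and never creates TNs itself.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TN:
    name: str
    location: str  # filled in earlier by the compiler, e.g. a register name


@dataclass
class VOP:
    name: str
    args: List[TN]
    results: List[TN]
    temps: List[TN]


def emit_add(vop: VOP) -> List[str]:
    """Code sequence for a hypothetical add VOP: only reads TN locations."""
    x, y = vop.args
    (result,) = vop.results
    return [f"add {result.location}, {x.location}, {y.location}"]


x = TN("x", "r1")
y = TN("y", "r2")
total = TN("total", "r3")

# A basic block is just a sequence of VOPs.
block = [VOP("add", args=[x, y], results=[total], temps=[])]
code = [line for vop in block for line in emit_add(vop)]
```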

When a VOP is defined, the argument and result TNs can be restricted to
some particular SC or SCs.  It is very important to understand that this
restriction will not control whether or not the compiler tries to use the
VOP, but instead allows the VOP definition mechanism to automatically
coerce the argument or result from one representation to another.  For
example, a VOP can restrict an argument to a register SC, and the VOP
definition macro will arrange for any stack TNs to be loaded into a
temporary load-TN which is in the correct SC.
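
A rough Python model of that automatic loading is shown below.  The
load_args function, the scratch register names, and the pseudo-assembly
are assumptions made for this sketch; the real mechanism is generated by
the VOP definition machinery rather than written by hand like this.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TN:
    name: str
    sc: str        # storage class the TN ended up in
    location: str  # concrete location within that SC


def load_args(args: List[TN], required_sc: str,
              load_regs: List[str]) -> Tuple[List[str], List[str]]:
    """Return (load code, operand locations) for a VOP's arguments.

    Arguments already in `required_sc` are used in place; any other
    argument is first loaded into a scratch register (its load-TN).
    """
    code: List[str] = []
    operands: List[str] = []
    free = list(load_regs)
    for tn in args:
        if tn.sc == required_sc:
            operands.append(tn.location)
        else:
            reg = free.pop(0)  # pick the next scratch register as the load-TN
            code.append(f"load {reg}, {tn.location}")
            operands.append(reg)
    return code, operands


x = TN("x", "register", "r1")
y = TN("y", "stack", "sp[4]")  # this one was spilled to the stack
code, operands = load_args([x, y], "register", ["r9", "r10"])
code.append(f"add r3, {operands[0]}, {operands[1]}")
```

The VOP body itself only ever sees register operands; the extra load is
added on its behalf.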

Additionally, a primitive type can be listed for each argument or result.
In this case, the compiler will only use the VOP if the primitive types of
the TNs match those listed for the VOP.  Note the difference between
specifying a primitive type restriction and an SC restriction on a VOP.  The
primitive type restriction controls whether the VOP is used or not, while
the SC restriction just controls what automatic loading the VOP will do on
your behalf.

VOPs are either emitted by name, or are marked as translating a function
known to the compiler.  Primitive type restrictions are really only useful
for controlling whether a particular VOP is used to translate a particular
function.
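
Selection by primitive type can be sketched like this in Python.  The
VOPInfo record, the VOP names, and the select_vop function are
hypothetical; the sketch only shows that the primitive types of the
argument TNs decide whether a translating VOP is eligible at all.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class VOPInfo:
    name: str
    translates: str             # the known function this VOP translates
    arg_types: Tuple[str, ...]  # primitive type required for each argument


def select_vop(function: str, arg_types: Sequence[str],
               vops: List[VOPInfo]) -> Optional[VOPInfo]:
    """Return the first VOP translating `function` whose types all match."""
    for vop in vops:
        if vop.translates == function and vop.arg_types == tuple(arg_types):
            return vop
    return None  # no VOP applies to these primitive types


# Hypothetical translators for "+".
vops = [
    VOPInfo("add/fixnum", "+", ("fixnum", "fixnum")),
    VOPInfo("add/single-float", "+", ("single-float", "single-float")),
]

chosen = select_vop("+", ["single-float", "single-float"], vops)
unmatched = select_vop("+", ["fixnum", "single-float"], vops)
```

An SC restriction plays no part in this choice; it only matters after a
VOP has been selected.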


Ack, my brain is scrod.  Due to such complications as the war starting,
this is taking much longer than I had thought it would.  I'll try to add
more info when my brain is working again.

-William