
PRETTY-PRINT-INTERFACE, version 4



Version 3 (by Guy Steele Jr) supersedes version 2 and is changed from
version 1 as follows: adds a functional interface to supplement the
interface through FORMAT, and reflects comments by Barrett and
Pierson.

Version 4 (by Dick Waters) is changed from version 3 as follows: The
short summary is updated to reflect the functional interface.  The
functional interface is changed following suggestions made by Dave Moon.
The proposal is amended in a few minor ways to increase its
compatibility with variable-width fonts.  Additional discussion has been
added on the advantages of XP in handling circularity detection and
abbreviation, the interaction with CLOS, and the extended type specifier
CONS used by XP.

The document attached to version 1 has also been fully revised, but is
sent in a separate message due to mailer problems.
--Dick

Issue:		PRETTY-PRINT-INTERFACE

References:	Description of XP by Dick Waters (attached)
		*PRINT-PRETTY* (CLtL p. 371)
		WRITE (CLtL p. 382)
		PPRINT (CLtL p. 383)
		FORMAT (CLtL pp. 385-407)
		FORMAT ~T directive (CLtL pp. 398-399)
		FORMAT ~< directive (CLtL pp. 404-406)

Related issues: 

Category:	CLARIFICATION CHANGE ADDITION

Edit history:	Version 1, 24-Feb-89 by Steele
		Version 2, 15-Mar-89 by Steele and Waters
		Version 3, 15-Mar-89 by Steele
		Version 4, 22-Mar-89 by Waters

Problem description:

At present, Common Lisp provides no specification whatsoever of how
pretty-printing is to be accomplished, and no way for the user to control
it.  In particular, there is no protocol by which a user can write a
print-function for a structure, or a method for PRINT-OBJECT, that will
interact smoothly with the built-in pretty-printer in a portable manner.

Proposal (PRETTY-PRINT-INTERFACE:XP):

Adopt the interfaces and protocols of the XP pretty-printer by Dick Waters,
described in full in the attached 12-page document.  Here is a very brief
summary of the proposal.

New variables:	*PRINT-DISPATCH*
		*PRINT-RIGHT-MARGIN*
		*DEFAULT-RIGHT-MARGIN*
		*PRINT-MISER-WIDTH*
		*PRINT-LINES*
		*LAST-ABBREVIATED-PRINTING*

New functions:	COPY-PRINT-DISPATCH
		FILL-STYLE
		LINEAR-STYLE
		TABULAR-STYLE
		CONDITIONAL-NEWLINE
		LOGICAL-BLOCK-TAB
		LOGICAL-BLOCK-INDENT

New macros:	DEFINE-PRINT-DISPATCH
		WITHIN-LOGICAL-BLOCK
		LOGICAL-BLOCK-COUNT
		LOGICAL-BLOCK-POP

New FORMAT directives:	~W  ~_  ~I  ~:T  ~/name/  ~<...~:>

New # reader macro:  #"..."

The function WRITE is extended to accept additional keyword arguments
:DISPATCH, :RIGHT-MARGIN, :LINES, and :MISER-WIDTH corresponding to the
new variables *PRINT-DISPATCH*, *PRINT-RIGHT-MARGIN*, *PRINT-LINES*, and
*PRINT-MISER-WIDTH*.
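
As a rough illustration of the extended WRITE, the sketch below passes
:RIGHT-MARGIN and :LINES as keywords instead of binding the variables.
The helper SHOW-NARROW and the sample list are made up for this example;
only the keyword names come from the summary above, and :DISPATCH is
left out on purpose.

```lisp
;; Hypothetical helper (not part of the proposal): pretty-print OBJECT
;; into a string with a narrow right margin and an optional line limit.
(defun show-narrow (object &key (margin 20) (lines nil))
  (with-output-to-string (s)
    (write object :stream s :pretty t
                  :right-margin margin   ; like binding *PRINT-RIGHT-MARGIN*
                  :lines lines)))        ; like binding *PRINT-LINES*

;; Illustrative data only.
(defparameter *sample*
  '(alpha beta gamma delta epsilon zeta eta theta iota kappa))
```

Calling (show-narrow *sample*) should break the list across several
lines, while (show-narrow *sample* :lines 1) should cut the output off
after the first line.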


Examples:	See attached document.

Rationale:

There ought to be a good user interface to the pretty printer.
This is the only proposal for which there is a portable implementation
that has seen extensive use and is being made freely available.


Current practice:

XP son of PP son of GPRINT son of PRINT* is the latest in a line of pretty
printers that goes back 13 years.  All of these printers use essentially
the same basic algorithm and conceptual interface.  Further, except for
PRINT*, which was implemented solely to satisfy the author's personal
needs, each of these printers has had extensive use.  XP has been in
experimental use as the pretty printer in CMU Common Lisp for 6 months.  PP
has been the pretty printer in DEC Common Lisp for the past 3 years.  Prior
to three years ago, GPRINT was used for 2 years as the pretty printer in
DEC Common Lisp.  In addition, GPRINT has been the pretty printer in
various generations of Symbolics Lisp for upwards of 5 years.
(See Waters R.C., "User Format Control in a Lisp Prettyprinter", ACM TOPLAS,
5(4):513--531, October 1983.)


Cost to Implementors:

A fair amount of effort (perhaps a few man-weeks at most).
Source code for XP is available to all comers from Dick Waters, and
the system is documented in great detail:

Waters, Richard C., "XP: A Common Lisp Pretty Printing System",
Artificial Intelligence Laboratory Technical Memo 1102,
Massachusetts Institute of Technology, Cambridge MA, March 1989.


Cost to Users:  None (I think).  This is an upward-compatible extension.

Cost of non-adoption:  Continued inability for user print-functions
to interact with the pretty-printer in a useful and portable manner.


Performance impact:  XP is claimed to be quite fast.

Benefits:  User control of pretty-printing in a portable manner.

Aesthetics:

Using ~<...~:> may strike some as uncomfortably close in the syntactic
space of FORMAT directives to the existing ~<...~>.  However, it is very
unlikely that both of these directives (pretty-print logical block and
columnar justification, respectively) will be used in the same call to
FORMAT.  Previous versions of XP used ~!...~. instead of ~<...~:> but this
made FORMAT strings very difficult to read; it is preferable to have
a directive that looks like matching brackets of some sort.

Dan Pierson comments:  You might mention that some people will undoubtedly
find piling more hair on FORMAT ugly (of course these same people may
well find FORMAT in general ugly :-)).

Discussion:

Zetalisp used ~:T to mean pixelwise tabulation, so the use of ~:T
suggested here may be a problem.  If so, another suggestion for naming
this directive would be appropriate.

The ~/.../ directive is already in Zetalisp, and is not an idea new
to this proposal.  However, it should be noted that the proposal for
~/.../ here is simpler than, and incompatible with, the current Zetalisp
practice.

Guy Steele and Dick Waters strongly support this proposal.  (As an example,
Guy Steele has a portable simulator for Connection Machine Lisp, and would
like very much to have xappings and xectors pretty-print properly.)


Dan Pierson comments: You can add me to the list of strong supporters of
this proposal.  While the proposal is long and complex, it is supported by
a long history of usage in several different Lisp environments.  Unlike
some earlier members of this family, this version fits cleanly enough into
the rest of Common Lisp to warrant standardization.

The utility of *PRINT-LINES* becomes more obvious if it is pointed out
that Dick's pretty printers are implemented to print each line as it
is computed.  This means that a small value for *PRINT-LINES* saves
significant time as well as output medium space.  In fact, many people
find that a very pleasant REP loop is created by setting *PRINT-LINES*
to a value from 1-4, *PRINT-PRETTY* to T, and defining a short-name
function (say (PP*)) that funcalls *LAST-ABBREVIATED-PRINTING* with
abbreviation bound off.  This is almost as fast and compact as, and
MUCH more readable than, a non-pretty-printing REP loop.

The advantages of compiled format strings (format functions) should be
brought out as benefits in their own right.  The current proposal just
mentions them as a minor feature of XP.

At first this struck me as a very cute end run around the failure of
STREAM-INFO, then I realized that one of the problems with STREAM-INFO
may have been that it was a standard at the wrong level.  STREAM-INFO
permitted people to use XP, but not to count on it.  This proposal
makes it possible to write portable code whose new data structures and
language elements print correctly in whatever Common Lisp environment
they're run in.  [End of comments by Pierson]


Guy Steele has noted that, in some places where the initial document
says that circularity detection is handled correctly, this is
true a fortiori following the decision on PRINT-CIRCLE-STRUCTURE.
However, Waters notes that the vote on PRINT-CIRCLE-STRUCTURE said
nothing about the handling of *PRINT-LEVEL*.  Therefore, the fact that
XP handles *PRINT-LEVEL* correctly is an improvement.

In addition, PRINT-CIRCLE-STRUCTURE is also silent on what is supposed
to happen if a user program decomposes a list itself (e.g., with DOLIST
or ~{~}) rather than calling a print function.  Presumably *PRINT-CIRCLE*
etc. are not handled in this case.  In contrast, if one uses
WITHIN-LOGICAL-BLOCK or ~<~:>, then *PRINT-CIRCLE*, *PRINT-LEVEL*, and
*PRINT-LENGTH* are all automatically handled correctly.

For example, (format nil "-~1{~A ~A ~A ~A ~A ~}-" '#1=(1 #1# 2 . #1#))
produces "-1 #1=(1 #1# 2 . #1#) 2 1 #1=(1 #1# 2 . #1#) -"
even under PRINT-CIRCLE-STRUCTURE, and
(format nil "-~1{~A ~}-" '#1=(1 #1# 2 . #1#))
causes infinite looping.  However, in XP,
(format nil "-~:<~W ~W ~W ~W ~W~:>-" '#1=(1 #1# 2 . #1#))
produces "-#1=(1 #1# 2 . #1#)-".
This proves to be very useful when writing pretty printing functions for things.
Note also that ~<~:> supports *print-level* and *print-length* correctly.
All the same things can be said about the functional interface and using
WITHIN-LOGICAL-BLOCK rather than traversing a list yourself in some fashion.

All in all, Waters claims that PRINT-CIRCLE-STRUCTURE covers at most 1/4
of what XP does in support of *print-circle* and does not cover anything
of what XP does to support *print-level*, *print-length*, and
robustness in the face of malformed arguments.  These are vital
features for writing printing functions that really work right all the time.


It has been noted by Dave Moon that things would be much more elegant if
DEFINE-PRINT-DISPATCH could be expressed directly as a CLOS DEFMETHOD
for an appropriate generic function.  Dick Waters agrees with this.
However, DEFINE-PRINT-DISPATCH depends on type specifiers that are more
complex than the ones CLOS deals with and ones that do not have clear
subtype/supertype relationships, compensating for the latter problem by
supporting numerical priorities to disambiguate things.  (The defaulting
behavior is a key feature of the pretty printer.)  At the very least,
this means that DEFINE-PRINT-DISPATCH will not fit into CLOS in a simple way.

Given the problems, Moon suggests that "it does seem that right now it
might be best to keep a separate DEFINE-PRINT-DISPATCH macro, with the
idea that the expansion is implementation-dependent at the moment, but
might some day be changed to be defined to expand into DEFMETHOD.  I
haven't looked to see whether any syntactic changes would be appropriate
to make that transition smoother."

(Waters also worries that the overhead needed to locate the right CLOS
method would seriously degrade the pretty printer, because the printer
has to do this for every part of every object printed.  This dispatching is
currently done by very fast code that is tuned to take advantage of the
observed distribution of kinds of objects that have special pretty
printers attached to them.  Even with this special purpose code,
dispatching takes a significant part of the pretty printer's time.)


Dave Moon also comments that it is not good to have something that looks
like a type specifier (i.e., the extended form of the CONS type specifier
used by DEFINE-PRINT-DISPATCH) and yet is not a real type specifier.  He
suggests that we should either amend Common Lisp to accept the extended
form of the CONS type specifier, or stop having DEFINE-PRINT-DISPATCH
use it.  

Waters supports any course of action that retains the use of the
extended CONS type specifier in conjunction with DEFINE-PRINT-DISPATCH.
However, he notes that the trade-off is clear.  One could avoid the
complex CONS type specifier without any significant loss of
functionality by introducing a new macro DEFINE-LIST-PRINT-DISPATCH that
is identical to DEFINE-PRINT-DISPATCH except that it is relevant only to
conses and the type specifier applies to the CAR of the object to be
printed rather than to the object as a whole.  However, this appears to
him to be significantly less elegant than the current approach.

-------------------- detailed documentation --------------------

The full description is too large to fit in with everything else in this
message.  A fully correct version follows in a separate message.  The
stuff below summarizes all of the changes from the full description in
version 1.

                          Amendments

To a considerable extent, the design of the XP interface is completely
neutral about the issue of variable- versus fixed- width fonts.  In
particular, most of the discussion of how the formatting proceeds either
talks about absolute positions of zero or talks about something being
in the same horizontal position as something else.  These statements are
all font-independent.  (Further, although Waters' current implementation
does not support variable-width fonts, the algorithms used could be
extended to support them without radical changes.)

Nevertheless, there are 9 places where users specify explicit
non-zero lengths: the variables *PRINT-RIGHT-MARGIN*,
*DEFAULT-RIGHT-MARGIN*, and *PRINT-MISER-WIDTH*, the numeric
arguments to ~T, ~I, and ~/tabular-style/ and their associated functions
LOGICAL-BLOCK-TAB, LOGICAL-BLOCK-INDENT, and TABULAR-STYLE.

It is proposed that all of these lengths be in the same units, and that
this unit be ems (the length of an "m" in the font currently being used
to output characters to the relevant output stream at the moment that
the command is encountered or a variable is consulted).

It is further proposed that users and implementors be advised to set
things up so that explicit lengths do not have to be specified.  For
implementors, this means making streams smart enough that they know how
wide they are.  (This avoids the use of *PRINT-RIGHT-MARGIN* and
*DEFAULT-RIGHT-MARGIN* in most situations.)  For users, this means
relying on streams knowing their own widths (which is a good idea for
adaptability in any case) and using ~:I to specify indentations wherever
possible.  Further, it should be noted that since *PRINT-MISER-WIDTH* is
essentially heuristic in nature, it does not matter if its value is only
an approximate length and users will only need to change the
value of *PRINT-MISER-WIDTH* in unusual situations.  This leaves only
tabbing as an area where explicit lengths have to be specified on a
regular basis.  Fortunately, approximate lengths are often acceptable in
this situation as well.

                  Functional Interface  

The primary interface to operations for dynamically determining the
arrangement of output is provided through FORMAT.  This is done
because FORMAT strings are typically the most convenient way of
interacting with pretty printing.  However, these operations have
nothing inherently to do with FORMAT per se.  In particular, they can
also be accessed via the six functions and macros below.

WITHIN-LOGICAL-BLOCK (STREAM-SYMBOL LIST                     [Macro]
                      :PREFIX :PER-LINE-PREFIX :SUFFIX)
                      &BODY BODY

In the manner of ~<...~:>, this macro causes printing to be
grouped into a logical block.  The value NIL is always returned.

STREAM-SYMBOL must be a symbol.  If it is NIL, it is treated the same as
if it were *STANDARD-OUTPUT*.  If it is T, it is treated the same as if
it were *TERMINAL-IO*.  The run-time value of STREAM-SYMBOL must be a
stream.  The logical block is printed into this destination stream.

The BODY can contain any arbitrary Lisp forms.  Within the BODY,
STREAM-SYMBOL is bound to a special kind of stream that supports dynamic
decisions about the arrangement of output and then forwards the output
to the destination stream.  All the standard printing functions (e.g.,
WRITE, PRINC, TERPRI) can be used to print output into STREAM-SYMBOL.
All and only the output sent to STREAM-SYMBOL is treated as being in the
logical block.  (It is an error to send any output directly to the
underlying destination stream.)

The :SUFFIX, :PREFIX, and :PER-LINE-PREFIX must all be expressions that
(at run time) evaluate to strings.  :SUFFIX (which defaults to the null
string) specifies a suffix that is printed just after the logical block.
:PREFIX specifies a prefix to be printed before the beginning of the
logical block.  :PER-LINE-PREFIX specifies a prefix that is printed
before the block and at the beginning of each new line in the block.  It
is an error for :PREFIX and :PER-LINE-PREFIX to both be used. If neither
is used, a :PREFIX of the null string is assumed.

LIST is interpreted as being a list that BODY is responsible for
printing.  If LIST does not (at run time) evaluate to a list, it is
printed using WRITE.  If *PRINT-CIRCLE* is not NIL and LIST is a
circular reference to a cons, then an appropriate #n# marker is printed.
If *PRINT-LEVEL* is not NIL and the logical block is at a dynamic
nesting depth of greater than *PRINT-LEVEL* in logical blocks, # is
printed.  If any of the three conditions above occurs, the indicated
special output is printed on STREAM-SYMBOL and the BODY is skipped along
with the printing of the prefix and suffix.  (If the BODY is
not responsible for printing a list, then the first two tests above can
be turned off by supplying NIL for the LIST argument.)

CONDITIONAL-NEWLINE KIND &OPTIONAL (STREAM *STANDARD-OUTPUT*)    [Function]

CONDITIONAL-NEWLINE is the functional equivalent of ~_.  STREAM (which
defaults to *STANDARD-OUTPUT*) follows the standard conventions for
stream arguments to printing functions (i.e., NIL stands for
*STANDARD-OUTPUT* and T stands for *TERMINAL-IO*).  The KIND argument
specifies the style of conditional newline.  It must be one of :LINEAR,
:FILL, :MISER, or :MANDATORY.  If STREAM is a special stream bound by
WITHIN-LOGICAL-BLOCK, a conditional newline is sent to it.  Otherwise,
CONDITIONAL-NEWLINE has no effect.  The value NIL is always returned.
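
A minimal sketch of a logical block with conditional newlines follows.
So that it can be tried in a current implementation, it is written with
the ANSI Common Lisp operators PPRINT-LOGICAL-BLOCK and PPRINT-NEWLINE,
which are assumed here to stand in for WITHIN-LOGICAL-BLOCK and
CONDITIONAL-NEWLINE as described in this message.  The function
PRINT-POINT and its "<point x y>" notation are invented for illustration.

```lisp
;; Assumption: PPRINT-LOGICAL-BLOCK / PPRINT-NEWLINE play the roles of the
;; proposal's WITHIN-LOGICAL-BLOCK / CONDITIONAL-NEWLINE.
(defun print-point (x y &optional (stream *standard-output*))
  "Print X and Y as <point X Y>, breaking at the newlines only if needed."
  ;; The LIST argument is NIL because the body is not printing a list.
  (pprint-logical-block (stream nil :prefix "<" :suffix ">")
    (write-string "point" stream)
    (write-char #\Space stream)
    ;; :LINEAR newlines in a block break all together or not at all.
    (pprint-newline :linear stream)
    (write x :stream stream)
    (write-char #\Space stream)
    (pprint-newline :linear stream)
    (write y :stream stream)))
```

With *PRINT-PRETTY* true and a wide right margin the output stays on one
line; with a margin too narrow for the whole block, both conditional
newlines are taken.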

LOGICAL-BLOCK-INDENT RELATIVE-TO N &OPTIONAL (STREAM *STANDARD-OUTPUT*) [Function]

LOGICAL-BLOCK-INDENT is the functional equivalent of ~I.  STREAM (which
defaults to *STANDARD-OUTPUT*) follows the standard conventions for
stream arguments to printing functions.  N specifies the indentation in
ems.  If RELATIVE-TO is :BLOCK, this indentation is relative to the
start of the enclosing block (as for ~I).  If RELATIVE-TO is :CURRENT,
the indentation is relative to the current output position (as for ~:I).
It is an error for RELATIVE-TO to take on any other value.  If STREAM is
a special stream bound by WITHIN-LOGICAL-BLOCK, LOGICAL-BLOCK-INDENT
sets the indentation in the innermost enclosing logical block.
Otherwise, LOGICAL-BLOCK-INDENT has no effect.  The value NIL is always
returned.
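
The effect of indentation relative to the current position can be
sketched as below.  The example uses the ANSI Common Lisp function
PPRINT-INDENT, assumed here to correspond to LOGICAL-BLOCK-INDENT (it
takes the same RELATIVE-TO, N, and STREAM arguments), and the helper
PRINT-CALL is hypothetical.  The helper walks ARGS itself, so it does
not get the circularity or length handling discussed elsewhere in this
message.

```lisp
;; Assumption: PPRINT-INDENT stands in for the proposal's
;; LOGICAL-BLOCK-INDENT; PRINT-CALL is an invented helper.
(defun print-call (name args &optional (stream *standard-output*))
  "Print the string NAME and the list ARGS as (name arg1 arg2 ...)."
  (pprint-logical-block (stream nil :prefix "(" :suffix ")")
    (write-string name stream)
    (write-char #\Space stream)
    ;; As for ~:I: later lines of this block start at the current column.
    (pprint-indent :current 0 stream)
    (loop for (arg . more) on args
          do (write arg :stream stream)
             (when more
               (write-char #\Space stream)
               (pprint-newline :linear stream)))))
```

When the arguments do not fit on one line, each continuation line is
expected to line up under the first argument.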

LOGICAL-BLOCK-TAB KIND COLNUM COLINC &OPTIONAL (STREAM *STANDARD-OUTPUT*) [Function]

LOGICAL-BLOCK-TAB is the functional equivalent of ~T.  STREAM (which
defaults to *STANDARD-OUTPUT*) follows the standard conventions for
stream arguments to printing functions.  The arguments COLNUM and COLINC
correspond to the two numeric parameters to ~T and are in terms of ems.
The KIND argument specifies the style of tabbing.  It must be one of
:LINE (tab using ~T), :BLOCK (tab using ~:T), :LINE-RELATIVE (tab using
~@T), or :BLOCK-RELATIVE (tab using ~:@T).  If STREAM is a special
stream bound by WITHIN-LOGICAL-BLOCK, tabbing is performed.  Otherwise,
LOGICAL-BLOCK-TAB has no effect.  The value NIL is always returned.

LOGICAL-BLOCK-POP ARGS &OPTIONAL (STREAM *STANDARD-OUTPUT*)      [Macro]
LOGICAL-BLOCK-COUNT &OPTIONAL (STREAM *STANDARD-OUTPUT*)         [Macro]

LOGICAL-BLOCK-POP is identical to POP except that it supports
*PRINT-LENGTH* and *PRINT-CIRCLE*.  It is an error to use
LOGICAL-BLOCK-POP anywhere other than syntactically nested within a
call on WITHIN-LOGICAL-BLOCK.

ARGS must be a symbol or expression acceptable to POP.  STREAM (which
defaults to *STANDARD-OUTPUT*) follows the standard conventions for
stream arguments to printing functions.  If STREAM is a special stream
bound by WITHIN-LOGICAL-BLOCK, then LOGICAL-BLOCK-POP performs the
special operations described below.  Otherwise, LOGICAL-BLOCK-POP is
identical to POP.

Each time LOGICAL-BLOCK-POP is called, it performs three tests.  If
ARGS is not a cons, ". " is printed followed by ARGS.  If
*PRINT-LENGTH* is not NIL and LOGICAL-BLOCK-POP has already been called
*PRINT-LENGTH* times within the immediately containing logical block,
"..." is printed.  If *PRINT-CIRCLE* is not NIL, and ARGS is a circular
reference, then ". " is printed followed by an appropriate #n# marker.
If any of the three conditions above occurs, the special output is
printed on STREAM and the execution of the immediately containing
WITHIN-LOGICAL-BLOCK is terminated except for the printing of the
suffix.  Otherwise, LOGICAL-BLOCK-POP pops the top value off of ARGS
and returns this value.

LOGICAL-BLOCK-COUNT is identical to LOGICAL-BLOCK-POP except that it
does not take an ARGS argument, always returns NIL, and only performs
the second test discussed above.  It is useful when the components of a
non-list are being printed.
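
The abbreviation and dotted-tail behavior described above can be tried
with the sketch below.  It relies on the ANSI Common Lisp macros
PPRINT-POP and PPRINT-EXIT-IF-LIST-EXHAUSTED rather than on
LOGICAL-BLOCK-POP as specified in this message; note that PPRINT-POP
takes no arguments and pops from the list handed to the enclosing
logical block, so the correspondence is only approximate.  PRINT-ITEMS
and its square brackets are made up for illustration.

```lisp
;; Assumption: PPRINT-POP approximates the proposal's LOGICAL-BLOCK-POP.
(defun print-items (list &optional (stream *standard-output*))
  "Print LIST inside square brackets, filling each line."
  (pprint-logical-block (stream list :prefix "[" :suffix "]")
    (loop
      (pprint-exit-if-list-exhausted)       ; stop when nothing is left
      ;; PPRINT-POP checks *PRINT-LENGTH* and non-list tails for us.
      (write (pprint-pop) :stream stream)
      (pprint-exit-if-list-exhausted)
      (write-char #\Space stream)
      (pprint-newline :fill stream))))
```

With *PRINT-LENGTH* bound to 2, a longer list should end in "...", and a
dotted list should print its tail after ". " without any extra code in
the body.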

Using the functions above, TABULAR-STYLE could be defined as follows.

  (defun tabular-style (s list &optional (colon? T) atsign? (tabsize nil))
    (declare (ignore atsign?))
    ;; Treat a TABSIZE of NIL as the default column spacing of 16.
    (if (null tabsize) (setq tabsize 16))
    (within-logical-block (s list :prefix (if colon? "(" "")
                                  :suffix (if colon? ")" ""))
      (when list
        (loop (write (logical-block-pop list s) :stream s)
              (if (null list) (return nil))
              (write-char #\space s)
              ;; Tab as for ~:@T, in multiples of TABSIZE.
              (logical-block-tab :block-relative 0 tabsize s)
              (conditional-newline :fill s)))))

    
The function below prints a vector using #(...) notation.
    
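  (defun print-vector (v *standard-output*)
    ;; A stream symbol of NIL stands for *STANDARD-OUTPUT*; a LIST of NIL
    ;; turns off the list and circularity tests, since V is not a list.
    (within-logical-block (nil nil :prefix "#(" :suffix ")")
      (let ((end (length v)) (i 0))
        (when (plusp end)
          (loop (logical-block-count)     ; supports *PRINT-LENGTH*
                (write (aref v i))
                (if (= (incf i) end) (return nil))
                (write-char #\space)
                (conditional-newline :fill))))))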

FILL-STYLE STREAM LIST &OPTIONAL (COLON? T) ATSIGN?                  [Function]
LINEAR-STYLE STREAM LIST &OPTIONAL (COLON? T) ATSIGN?                [Function]
TABULAR-STYLE STREAM LIST &OPTIONAL (COLON? T) ATSIGN? (TABSIZE 16)  [Function]

The directives ~/fill-style/, ~/linear-style/, and ~/tabular-style/ are
supported by the three functions above.  These functions can also be
called directly by the user.  Each function prints parentheses around
the output if and only if COLON? (default T) is not NIL.  Each function
ignores its ATSIGN? argument and returns NIL.  (These arguments are
optional to facilitate the direct use of the three functions.)  Each
function handles abbreviation and circularity detection correctly, and
uses WRITE to print LIST when given a non-list argument.

The function LINEAR-STYLE prints a list either all on one line, or with
each element on a separate line.  The function FILL-STYLE prints a list
with as many elements as possible on each line.  The function
TABULAR-STYLE is the same as FILL-STYLE except that it prints the
elements so that they line up in columns.  This function takes an
additional argument TABSIZE (default 16) that specifies the column
spacing in ems.
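
The difference between the linear and fill styles can be seen with a
sketch like the one below.  It calls the ANSI Common Lisp functions
PPRINT-FILL and PPRINT-LINEAR, assumed here to behave like FILL-STYLE
and LINEAR-STYLE as described above; STYLE-DEMO is a hypothetical helper
and the margin values are arbitrary.

```lisp
;; Assumption: PPRINT-FILL / PPRINT-LINEAR stand in for the proposal's
;; FILL-STYLE / LINEAR-STYLE.  STYLE-DEMO is an invented helper.
(defun style-demo (list margin)
  "Return LIST printed in fill style and in linear style, as two strings."
  (let ((*print-pretty* t)
        (*print-right-margin* margin)
        (*print-miser-width* nil))
    (values (with-output-to-string (s) (pprint-fill s list))
            (with-output-to-string (s) (pprint-linear s list)))))
```

For a list that does not fit within the margin, the linear result should
place every element on its own line, while the fill result should pack
several elements onto each line and so use fewer lines.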

[End of attached document]