[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Issue: PATHNAME-SUBDIRECTORY-LIST (version 6)



This issue is on the agenda for the June X3J13 meeting.  KMP and I
have prepared a revised writeup which we think is ready for release.
I'd like to distribute this to X3J13 as soon as discussion, if any,
in the cleanup subcommittee is completed.

Issue:          PATHNAME-SUBDIRECTORY-LIST
References:     Pathnames (pp410-413), MAKE-PATHNAME (p416),
                PATHNAME-DIRECTORY (p417)
Related-issues: PATHNAME-COMPONENT-CASE, PATHNAME-COMPONENT-VALUE
Category:       CHANGE
Edit history:   18-Jun-87, Version 1 by Ghenis.pasa@Xerox.COM
                05-Jul-88, Version 2 by Pitman (major revision)
                28-Dec-88, Version 3 by Pitman (merge discussion)
                22-Mar-89, Version 4 by Moon (fix based on discussion)
                19-May-89, Version 5 by Moon (improve based on mail)
                21-May-89, Version 6 by Moon (final cleanups)

Problem Description:

  It is impossible to write portable code that can produce a pathname
  in a subdirectory of a hierarchical file system. This defeats much of
  the purpose of the pathname abstraction.

  According to CLtL, only a string is a portable value for the directory
  component of a pathname.  Thus in order to denote a subdirectory, the use
  of punctuation characters (such as dots, slashes, or backslashes) would
  be necessary. The very fact that such syntax varies from host to host
  means that although the representation might be "portable", the code
  using that representation is not portable.

  This problem is even worse for programs running on machines on a network
  that can retrieve files from multiple hosts, each using a different OS
  and thus different subdirectory punctuation.

  Related problems:

  - In some implementations "FOO.BAR" might denote the "BAR" subdirectory
    of "FOO", while in other implementations it would denote a top-level
    directory, because "." is not treated as punctuation. To be safe,
    portable programs must avoid all potential punctuation characters.

  - Even in implementations where "." is used for subdirectories,
    "FOO.BAR" may be recognized by some to mean the "BAR" subdirectory of
    "FOO" and by others to mean `a seven character directory name with "."
    as the fourth character.'

  - In fact, CLtL does not even say whether punctuation characters are
    part of the string. eg, is "foo" or "/foo" the directory component for
    a unix pathname "/foo/bar.lisp". Similarly, is "[FOO]" or "FOO" the
    directory component for a VMS pathname "[FOO]ME.LSP"?
    PATHNAME-COMPONENT-VALUE:SPECIFY says punctuation characters are not
    part of the string.

Proposal (PATHNAME-SUBDIRECTORY-LIST:NEW-REPRESENTATION)

  Remove the "structured" directory feature mentioned on CLtL p.412.
  
  Allow the value of a pathname's directory component to be a list.  The
  car of the list is one of the symbols :ABSOLUTE or :RELATIVE.
  Each remaining element of the list is a string or a symbol (see below).
  Each string names a single level of directory structure.  The strings
  should contain only the directory names themselves -- no punctuation
  characters.

  A list whose car is the symbol :ABSOLUTE represents a directory path
  starting from the root directory.  The list (:ABSOLUTE) represents
  the root directory.  The list (:ABSOLUTE "foo" "bar" "baz") represents
  the directory called "/foo/bar/baz" in Unix [except possibly for
  alphabetic case -- that is the subject of a separate issue].

  A list whose car is the symbol :RELATIVE represents a directory path
  starting from a default directory.  The list (:RELATIVE) has the same
  meaning as NIL and hence is not used.  The list (:RELATIVE "foo" "bar")
  represents the directory named "bar" in the directory named "foo" in the
  default directory.

  In place of a string, at any point in the list, symbols may occur to
  indicate special file notations. The following symbols have standard
  meanings.  Implementations are permitted to add additional objects of any
  non-string type if necessary to represent features of their file systems
  that cannot be represented with the standard strings and symbols.
  Supplying any non-string, including any of the symbols listed below, to a
  file system for which it does not make sense signals an error of type
  FILE-ERROR.  For example, Unix does not support :WILD-INFERIORS.

   :WILD           - Wildcard match of one level of directory structure.
   :WILD-INFERIORS - Wildcard match of any number of directory levels.
   :UP             - Go upward in directory structure (semantic).
   :BACK           - Go upward in directory structure (syntactic).

  "Syntactic" means that the action of :BACK depends only on the pathname
  and not on the contents of the file system.  "Semantic" means that the
  action of :UP depends on the contents of the file system; to resolve
  a pathname containing :UP to a pathname whose directory component
  contains only :ABSOLUTE and strings requires probing the file system.
  :UP differs from :BACK only in file systems that support multiple
  names for directories, perhaps via symbolic links.  For example,
  suppose that there is a directory
    (:ABSOLUTE "X" "Y" "Z")
  linked to 
    (:ABSOLUTE "A" "B" "C")
  and there also exist directories
    (:ABSOLUTE "A" "B" "Q")
    (:ABSOLUTE "X" "Y" "Q")
  then
    (:ABSOLUTE "X" "Y" "Z" :UP "Q")
  designates
    (:ABSOLUTE "A" "B" "Q")
  while
    (:ABSOLUTE "X" "Y" "Z" :BACK "Q")
  designates
    (:ABSOLUTE "X" "Y" "Q")

  If a string is used as the value of the :DIRECTORY argument to
  MAKE-PATHNAME, it should be the name of a toplevel directory and should
  not contain any punctuation characters.  Specifying a string, str, is
  equivalent to specifying the list (:ABSOLUTE str).  Specifying the symbol
  :WILD is equivalent to specifying the list (:ABSOLUTE :WILD-INFERIORS),
  or (:ABSOLUTE :WILD) in a file system that does not support :WILD-INFERIORS.

  The PATHNAME-DIRECTORY function always returns NIL, :UNSPECIFIC, or a
  list, never a string, never :WILD.

  In non-hierarchical file systems, the only valid list values for the
  directory component of a pathname are (:ABSOLUTE string) and
  (:ABSOLUTE :WILD).  :RELATIVE directories and the keywords
  :WILD-INFERIORS, :UP, and :BACK are not used in non-hierarchical file
  systems.

  Pathname merging treats a relative directory specially.  Let
  <pathname> and <defaults> be the first two arguments to
  MERGE-PATHNAMES.  If (PATHNAME-DIRECTORY <pathname>) is a list whose
  car is :RELATIVE, and (PATHNAME-DIRECTORY <defaults>) is a list, then
  the merged directory is the value of
    (APPEND (PATHNAME-DIRECTORY <defaults>)
            (CDR  ;remove :RELATIVE from the front
              (PATHNAME-DIRECTORY <pathname>)))
  except that if the resulting list contains a string or :WILD immediately
  followed by :BACK, both of them are removed.  This removal of redundant
  :BACKs is repeated as many times as possible.
  If (PATHNAME-DIRECTORY <defaults>) is not a list or
  (PATHNAME-DIRECTORY <pathname>) is not a list whose car is :RELATIVE, the
  merged directory is
    (OR (PATHNAME-DIRECTORY <pathname>) (PATHNAME-DIRECTORY <defaults>))

  A relative directory in the pathname argument to a function such as
  OPEN is merged with *DEFAULT-PATHNAME-DEFAULTS* before accessing the
  file system.

Test Cases/Examples:

  (PATHNAME-DIRECTORY (PARSE-NAMESTRING "[FOO.BAR]BAZ.LSP")) ;on VMS
  => (:ABSOLUTE "FOO" "BAR")

  (PATHNAME-DIRECTORY (PARSE-NAMESTRING "/foo/bar/baz.lisp")) ;on Unix
  => (:ABSOLUTE "foo" "bar")
  or (:ABSOLUTE "FOO" "BAR")
  If PATHNAME-COMPONENT-CASE:KEYWORD-ARGUMENT passes with a default of
  :COMMON, the value is the second one shown.

  (PATHNAME-DIRECTORY (PARSE-NAMESTRING "../baz.lisp")) ;on Unix
  => (:RELATIVE :UP)

  (PATHNAME-DIRECTORY (PARSE-NAMESTRING "/foo/bar/../mum/baz")) ;on Unix
  => (:ABSOLUTE "foo" "bar" :UP "mum")

  (PATHNAME-DIRECTORY (PARSE-NAMESTRING ">foo>**>bar>baz.lisp")) ;on LispM
  => (:ABSOLUTE "FOO" :WILD-INFERIORS "BAR")

  (PATHNAME-DIRECTORY (PARSE-NAMESTRING ">foo>*>bar>baz.lisp")) ;on LispM
  => (:ABSOLUTE "FOO" :WILD "BAR")

Rationale:

  This would allow programs to usefully deal with hierarchical file
  systems, which are by far the most common file system type.
  This would allow a system construction utility that organizes programs
  by subdirectories to be portable to all implementations that have
  hierarchical file systems.

  Discussion indicated that "Implementations are permitted to add
  additional objects of any non-string type if necessary to represent
  features of their file systems that cannot be represented with the
  standard strings and symbols" is a necessary escape hatch for things like
  home directories and fancy pattern matching.  Implementations should
  limit their use of this loophole and use the standard keyword symbols
  whenever that is possible.

Current Practice:

  Symbolics Genera implements something very similar to this. The main
  differences are:
   - In Genera, there is no :ABSOLUTE keyword at the head of the list.
     This has been shown to cause some problems in dealing with root
     directories. Genera represents the root directory by a keyword
     symbol (rather than a list) because the list representation 
     was not adequately general.
   - Genera has no separate concepts of :UP and :BACK.  Genera
     represents Unix ".." as :UP, but deals with :UP syntactically, not
     semantically.

Cost to Implementors:

  In principle, nothing about the implementation needs to change except
  the treatment of the directory component by MAKE-PATHNAME and
  PATHNAME-DIRECTORY. The internal representation can otherwise be left
  as-is if necessary.

  Implementations such as Genera that already have hierarchical directory
  handling will have to make an incompatible change to switch to what
  is proposed here.

  For implementations that choose to rationalize this representation
  throughout their internals and any other implementation-specific
  accessors, the cost will be necessarily higher.

Cost to Users:

  None for portable programs. This change is upward compatible with CLtL.
  Nonportable programs will have to be changed if they use implementation
  dependent hierarchical directory handling and the implementation
  removes support for that when it adds support for this proposal.

Cost of Non-Adoption:

  Serious portability problems would continue to occur. Programmers would be
  driven to the use of implementation-specific facilities because the need
  for this is frequently impossible to ignore.

Benefits:

  The serious costs of non-adoption would be avoided.

Aesthetics:

  This representation of hierarchical pathnames is easy to use and quite
  general. Users will probably see this as an improvement in the aesthetics.

Discussion:

  This issue was raised a while back but no one was fond of the particular
  proposal that was submitted. This is an attempt to revive the issue.

  The original proposal, to add a :SUBDIRECTORIES component to a
  pathname, was discarded because it imposed an unnatural distinction
  between a toplevel directory and its subdirectories. Pitman's guess is
  the the idea was to try to make it a compatible change, but since most
  programmers will probably want to change from implementation-specific
  primitives to portable ones anyway, that's probably not such a big
  deal. Also, there could have been some programs which thought the
  change was compatible and ended up ignoring important information (the
  :SUBDIRECTORIES component). Pitman thought it would be better if
  people just accepted the cost of an incompatible change in order to
  get something really pretty as a result.

  Moon doesn't like having both :UP and :BACK, but admits that some
  file systems do it one way and some do it the other.  He still thinks
  it would be simpler not to have both.

  To keep it simple, we chose not to add to this issue the functions
  DIRECTORY-PATHNAME-AS-FILE and PATHNAME-AS-DIRECTORY, which convert
  the name of a directory from or to the directory component of a file
  inferior to that directory.  This conversion is system-dependent, for
  example TOPS-20 appends a type field and Unix does not.  Also in some
  systems the root directory has a name and in others it doesn't.  Of
  course these functions signal an error in non-hierarchical file
  systems.  Examples (for Unix, assuming #P print syntax for pathnames):
   (directory-pathname-as-file #P"/usr/bin/sh") => #P"/usr/bin"
   (pathname-as-directory #P"/usr/bin") => #P"/usr/bin"/
  These functions have not been proposed because they are mainly useful
  in conjunction with additional functions for manipulating directories
  (creating, expunging, setting access control) that have not been made
  available in Common Lisp.