Date: Sunday, 15 May 1983 11:02-PDT
From: Brian Reid <reid%Shasta at SU-SCORE.ARPA>
To:   editor-people at SCORE
Re:   editor evaluations
ReSent-date: Mon 16 May 83 00:01:04-PDT
ReSent-from: J.Q. Johnson <JQJ@SU-SCORE.ARPA>
ReSent-to: Editor People: ;

I was a participant (technical EMACS) in the Roberts-Moran study, and
so was my sister (nontechnical Star).  Based on those two data points
I can only conclude that the entire study is fraudulent.  I got my
hands on a draft of the paper last fall, and exchanged quite a number
of messages with Moran about it.  He used magnificently circular
reasoning to discount my complaints.  It seems to be typical of some
"researchers" in the psychology/human factors area to put data above
reality, and to believe that a bunch of measurements taken to 3
significant figures, regardless of the meaning of what is being
measured, constitutes truth.  I suppose that in a sense he is
harmless, because he will just go off and design an editor based on
his idea of the truth, but it annoys me to have pseudo-fact being
paraded as demonstrated truth.

I normally use EMACS on a Concept-100 terminal running at 9600 baud
on a lightly-loaded VAX.  Like all serious EMACS users, I have a set
of custom key-bindings and custom text-oriented functions that I have
developed over the years, and a set of work habits and strategies
based on the response speed to which I am accustomed.

When I went to Xerox to participate in this study, I was required to
use an Alto running a DM-2500 simulator at about 1000 baud, connected
over an Ethernet link to a DEC-10 running a fairly old TENEX EMACS.
I complained bitterly at the time that (a) this setup was too slow,
less than 1/10th the speed at which I was accustomed to running, that
(b) the keystroke/response delays were short-term unpredictable
(Ethernet delays depend on other load on the Ethernet), that (c) the
Alto was in terrible adjustment (small fuzzy letters) and that (d)
the cursor was nearly invisible [the DMchat program that they were
running has two modes of operation, one with a fly-speck cursor and
the other with a bigger boldface cursor, and I was unable to get the
experimenter to let me fetch a version of DMchat that had a visible
cursor].  A final factor invalidating the data was that DMchat/Alto
shows 60 lines on the screen, so that whenever the display must be
completely repainted it takes a substantial fraction of a minute
(80*60/100 = 48 seconds) to redraw the entire screen.
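
In case the arithmetic is not obvious, here is the calculation spelled
out.  It is only a back-of-envelope sketch, the little function is my
own illustration, and it assumes roughly 10 bits per character on the
line, so about 100 characters per second at 1000 baud:

    # Rough full-repaint estimate for a dumb serial display.
    # Assumes ~10 bits per character on the line (so about baud/10
    # characters per second) and no redisplay optimization.
    def repaint_seconds(baud, columns=80, lines=60):
        chars_per_second = baud / 10.0
        return (columns * lines) / chars_per_second

    print(repaint_seconds(1000))   # ~48 seconds on the 1000-baud setup
    print(repaint_seconds(9600))   # ~5 seconds at my usual 9600 baud

The same repaint at my usual 9600 baud comes to about five seconds.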

Because of the intermittent and unpredictable delays in the Ethernet
connection I was unable to make proper use of cursor-positioning
commands:  I had to wait for the display to quiesce after each burst
of cursor commands, and I often found myself overshooting a word or a
line and needing to go back the other direction.  Because of the
nearly-invisible cursor I was unable to make proper use of search
commands for positioning the cursor; I often found myself issuing a
search and then spending 10 or 15 seconds hunting for the cursor on
the screen.  Because of the enormous screen and the slow baud-rate I
was unable to scroll through the file just looking for things, and
whenever a search went off the end of the current window there was a
30-second penalty while the screen was repainted.  I finally realized
the big-screen problem about halfway through the editing session and
split the screen into two windows and shrank the one I was using,
which helped a lot.

When I complained to Moran that the data he had taken on my EMACS
usage bore no relation to the way that EMACS is used in real life, he
responded with the smug circular argument that "the EMACS data
corresponded perfectly with the keystroke model, and therefore it
must be right." This "keystroke model" is a creation of Moran and
some of his colleagues, based on data that they have taken in
previous years.  Claiming that the keystroke model supports this data
is tantamount to claiming that their sloppy experimental technique
has not improved over the years, and that the measurements on which
they based their keystroke model were just as bad as the measurements
they are making today.
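
For those who have not seen it, the keystroke model predicts how long
an editing task should take by adding up fixed per-operator times.
The sketch below is only my reading of their published model; the
names and the operator times in it are the usual published values for
expert users, not anything measured in this study:

    # Sketch of a keystroke-level prediction, based on my reading of
    # the published model.  (As I recall, the full model has further
    # operators, including one for system-response waits.)
    OPERATOR_SECONDS = {
        "K": 0.2,   # press one key (skilled typist)
        "P": 1.1,   # point at a target with a pointing device
        "H": 0.4,   # move a hand between keyboard and device
        "M": 1.35,  # mental preparation before an action
    }

    def predicted_seconds(operators):
        # e.g. "MKKKK": think, then type a four-keystroke command
        return sum(OPERATOR_SECONDS[op] for op in operators)

    print(predicted_seconds("MKKKK"))   # 2.15 seconds

Fitting such a sum to measurements taken under the conditions I have
described says nothing about how the editor behaves at the speeds
people actually run it.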

I had reasons beyond mere disgust with their experimental
technique to be suspicious of their results.  I had spent a good part
of the previous summer doing BravoX editing at Xerox PARC, and had
measured the number of wall clock minutes that it took me to make
various changes.  I often time myself for Emacs edits in my office,
using my wristwatch rather than some fancy electronic measuring gear,
and my speed at editing with Emacs in my office was uniformly about
double my speed at editing with BravoX at Xerox, for similar kinds of
material.  I would be reluctant to claim even one significant figure
of precision in these measurements, since the material being edited
was not the same, and I included the time to take a sip of coffee, to
listen to conversation in the hall, and sometimes to read the
material that I was editing.  But in spite of all this noise it was
overwhelmingly obvious to me that Emacs let me edit much more quickly
than BravoX.

It is nice to have people taking real data.  Numbers are often good.
But numbers are not, of necessity, correct, and I think that the
community of people who design and evaluate text editors needs to be
extremely careful when reading the results of this particular study.
It won't make a whole lot of difference to posterity whether
Emacs is better or worse than editor X or Y, but it will certainly
matter if a large body of belief about text editors gets built up
from completely incorrect data.  This whole Roberts/Moran study
reminds me very much of the old saw about the drunk who was looking
for his lost wallet under the streetlamp, even though he didn't lose
it there, because the light was better.  I don't think anybody really
knows how to measure effectiveness of text editors as they are
actually used, so instead people find things that they do know how to
measure, and then concoct elaborate theories trying to tie these
measurements to the elusive unmeasurable truth.

Be careful of the great god whose name is Data; he is a slippery and
deceitful god.

	Brian Reid
	Stanford