Re: Dylan rather than CL -- why?



> 					      In addition, where the
>    target hardware can support it, it is nice to have at least some of
>    the development environment present, to allow end-user programming and
>    live system debugging.
> 
> Ah, but here you're pulling the whole environment back in.  What IS an
> important goal is to be able to do dump analysis.  But that doesn't
> necessarily require having the development environment loaded in the
> standard execution environment.

Certainly dump analysis is important, but debugging an active
production system can be even more valuable.

Consider the case of a system with an extensive and complicated (from
the developer's perspective!) user interface: when a user of this
system discovers an intermittent problem that appears only at
unpredictable intervals, crash dumps (even forced crash dumps) are
virtually useless most of the time.  What is really needed is the
capability for the user to call a help line saying: "It's happening
now!" and then have a developer "attach" to the system's debugging
environment.

I can say from experience that this is most useful; for my current
system project (a distributed financial trading system with ~150 users
on three continents using three synchronized databases and multiple
domain-specific compute-server processes) we incorporated a GDB
"client" into the system, allowing a developer to use GDB to
"attach" to the remotely executing system and do some basic
debugging.  This facility has been a big success; we have been able to
track down many problems that had been open for several months,
thereby improving service to our end-users.
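To make the idea concrete, here is a minimal sketch of one way to do
this on a modern Unix -- this is not our actual code, and the signal
choice and port number are my own assumptions: a signal handler forks
a child which execs gdbserver, attaching it back to the running
process, so a developer elsewhere can connect with GDB.

#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* On SIGUSR1, fork a child that attaches gdbserver to this process.
   fork() and exec are async-signal-safe; snprintf() strictly is not,
   but it serves for illustration. */
static void invite_debugger(int sig)
{
    (void)sig;
    pid_t target = getpid();            /* the process to be debugged */
    if (fork() == 0) {                  /* child: become the stub     */
        char pidstr[16];
        snprintf(pidstr, sizeof pidstr, "%d", (int)target);
        execlp("gdbserver", "gdbserver", "--attach", ":9999", pidstr,
               (char *)NULL);
        _exit(1);                       /* exec failed                */
    }
}

int main(void)
{
    signal(SIGUSR1, invite_debugger);   /* "It's happening now!" hook */
    for (;;)
        pause();                        /* stands in for real work    */
}

A developer would then run gdb on a copy of the executable and type
"target remote host:9999" to take control of the live process; whether
the attach is permitted depends on the operating system's ptrace
security policy.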

However, it would be even more useful if more development support
could be provided at this level: currently, given that the system is
built in C, we are pretty much limited to error detection; a fix
requires a recompile and a subsequent shutdown and redistribution of
executables.  If we had a development environment built into the
system, we would be able to make changes to the executing software
without any interruption to end-users' work.  Furthermore, if this
development environment were "linked" to the main development and code
control environment, we would be able to track and fold such changes
into the master source inventory, preventing versioning problems.  For
today's large-scale end-user oriented systems, this seems like a
rather important development!
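Even without a full in-process development environment, a C system can
approximate live change in limited ways.  As a sketch (the handler
name and signature here are invented for illustration), a process can
keep its replaceable logic behind a function pointer and load a
rebuilt shared object with dlopen() while it runs:

#include <dlfcn.h>      /* link with -ldl */
#include <stdio.h>

/* Hypothetical replaceable handler: recompute a quote. */
typedef double (*price_fn)(double);
static price_fn compute_price;          /* invoked by the event loop */

/* Swap in a fresh implementation from a newly built shared object.
   The old object stays mapped, so calls already in flight remain
   valid. */
static int reload_handler(const char *so_path)
{
    void *handle = dlopen(so_path, RTLD_NOW);
    if (!handle) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return -1;
    }
    price_fn fn = (price_fn)dlsym(handle, "compute_price");
    if (!fn) {
        fprintf(stderr, "dlsym: %s\n", dlerror());
        return -1;
    }
    compute_price = fn;                 /* single pointer store */
    return 0;
}

This is a far cry from changing a running Lisp or Dylan image, of
course: the granularity is a whole shared object, state migration is
the programmer's problem, and nothing ties the change back to source
control -- which is exactly the gap I am describing.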

To further the above argument with a rather controversial point,
consider the following: in my business, "time to market" (while we
don't sell our software, our end-users are our "market") is
everything.  Our business uses the software we develop to gain a
competitive edge; frequently, a difference of only a few days can mean
large differences in business revenue.  In this environment, software
testing is a difficult subject: on the one hand, we don't want
error-prone software; on the other hand, we can't afford (from a
time-to-market perspective) exhaustive testing.  Today, we do the best
we can.  However, if we were able to bring down the cost of fixing
certain software errors -- in particular user interface problems -- we
could then spend more time on exhaustive testing of the most critical
parts, and let more of the less critical errors slide by, to be fixed
when they appear in the production environment.  While this may sound
rather outrageous, in a time-constrained environment, you have little
choice.

Of course, the arguments of "use a better development
language/environment" (as opposed to C) come readily to mind as more
appropriate first-cut solutions to the problem than "permitting"
software errors; however, I think we still need to come to terms with
the fact that today's time-constrained business climate precludes
software proofs, which of course means that some quantity of errors
will always occur (in fact, it's not clear to me that even with
"proved" software, errors, particularly semantic errors, would be a
thing of the past).  As business software developers, we need to rank
potential errors in terms of severity, and then allocate our detection
and resolution capabilities according to risk.  Today, while not all
errors are equally severe, the cost of locating and fixing an error is
roughly constant regardless of severity; I feel that my developer's
life would be much improved if we could allocate our expensive
detection and resolution skills to the serious errors (errors with the
potential to have large-scale negative impact on the business), and
resolve the less serious errors at a lower cost (via on-line software
analysis and software change).
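For a toy illustration of what "allocate according to risk" means
numerically (the defect names and figures below are entirely made up),
rank each potential error by its expected loss -- likelihood times
business cost -- and spend the expensive up-front testing on the top
of the list:

#include <stdio.h>
#include <stdlib.h>

struct defect {
    const char *name;
    double p_per_month;     /* estimated probability of occurrence */
    double cost;            /* estimated business cost if it occurs */
};

static double risk(const struct defect *d)
{
    return d->p_per_month * d->cost;    /* expected loss per month */
}

static int by_risk_desc(const void *a, const void *b)
{
    double ra = risk(a), rb = risk(b);
    return (ra < rb) - (ra > rb);       /* largest risk first */
}

int main(void)
{
    struct defect defects[] = {
        { "double-booked trade",  0.05, 500000.0 },
        { "stale database sync",  0.20,  50000.0 },
        { "misaligned UI column", 0.90,    500.0 },
    };
    size_t n = sizeof defects / sizeof defects[0];

    qsort(defects, n, sizeof defects[0], by_risk_desc);
    for (size_t i = 0; i < n; i++)
        printf("%-22s expected loss %10.0f\n",
               defects[i].name, risk(&defects[i]));
    return 0;
}

On these made-up numbers, the UI defect, despite being by far the most
frequent, lands at the bottom of the list: exactly the kind of error
worth deferring to cheap on-line repair.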

Just some more thoughts; thanks to everyone for the replies to my
previous letter!

-frank