MUMPS/CACHÉ PRIMER.
DRAFT COPY ONLY.
(Procedure 151).
G. William Moore, MD, PhD.
Chief, Quality Assurance Section.
Chief, Autopsy Section.
Jules J. Berman, PhD, MD.
http://www.netautopsy.org/axsop/axsop151.htm
United States Government Work, uncopyrighted, public-domain,
DRAFT COPY ONLY. This document does not necessarily represent the views
or policies of any United States Government agency. This document is
provided "as is", without warranty of any kind, express or implied,
including but not limited to the warranties of merchantability,
fitness for a particular purpose and noninfringement. In no event
shall the authors be liable for any claim, damages or other liability,
whether in an action of contract, tort or otherwise, arising
from, out of or in connection with the document or the use or
other dealings made with the document.
PRINCIPLE OF THE TEST.
The Massachusetts-General-Hospital Utility Multi-Programming
System (MUMPS) is the computer language that underlies
all VistA® software used in the Veterans' Affairs medical
computing systems, formerly DHCP. Caché is a
proprietary version of MUMPS used by VistA®, which underlies
the Microsoft® Windows®-based
Computerized Patient Record System (CPRS).
Routine users of VistA®/CPRS need not be fluent in MUMPS;
but an understanding of MUMPS concepts shows the context in which
the modern systems developed. Efforts to replace MUMPS have so far failed,
partly due to a failure of many commercial systems to carry over powerful
MUMPS concepts, such as implicit-sort and leading-substring-match.
SPECIMEN REQUIRED.
Not applicable.
REAGENTS, INSTRUMENTATION.
Not applicable.
STEP-BY-STEP DESCRIPTION.
1. TABLE OF CONTENTS.
1. Table of Contents.
2. VistA® computer system.
3. History of MUMPS.
4. MUMPS multitasking.
5. Implicit sorting.
6. MUMPS partial substring/successor match.
7. Patient encounter in VistA®
8. Binary tree.
9. Persistent objects.
10. Leading substring/successor match.
11. Easy to write simple M program code.
...............
22. M variable-arrays.
23. M OPERATORS.
24. M commands.
25. M functions.
26. ANSI/ISO standard M limits.
27. Mglobal M device numbers.
2. INTRODUCTION.
The VistA® computer system,
formerly DHCP (Decentralized Hospital Computer Program), and the
Computerized Patient Record System (CPRS) employ
powerful software concepts present in modern medical informatics
systems. You don't need to be a programmer in order to understand
the important, pioneering concepts that underlie the hospital software
system. The major concepts are: MULTITASKING, IMPLICIT SORTING,
and PARTIAL SUBSTRING MATCH.
3. HISTORY OF MUMPS.
The Massachusetts-General-Hospital Utility Multi-Programming System
(MUMPS) was invented by G. Octo Barnett, MD, (a medical school classmate
at Harvard of Dr. Iseri!) and Robert H. Greenes, MD, at the
Harvard Medical School Laboratory of Medical Computing [refs].
The language has been renamed as M or M-technology,
but the insiders and old-timers still call it MUMPS. In those days,
the dominant computer language was IBM's FORTRAN
(FORmula TRANslation), which was suitable for numerical calculations,
but not well-suited for searching, sorting, listing,
and other manipulations that are common in medical informatics.
The dominant computing industry emphasized single-user computing and
significant security systems, to keep users OUT OF, rather than
inviting them INTO the shared database.
In Barnett's laboratory, there were limited
computing resources (in contrast to today, where everyone has a powerful
computer on his/her desktop). MUMPS was originally designed to support
MULTITASKING, i.e., the simultaneous use of one computer
by many users at the same time, using a shared database, or GLOBAL.
The pioneering features of MUMPS are: MULTITASKING, IMPLICIT SORTING,
and PARTIAL SUBSTRING MATCH.
MUMPS had a rough start in the Department of Veterans' Affairs (DVA).
A small group of rebellious computer programmers in the DVA broke away
from the dominant computing system of the time, called TRIMIS
(Triservices Medical Information System), which consumed over
a billion U. S. tax dollars, and never took care of a single patient.
This cumbersome dinosaur just never got off the ground.
Nowadays, MUMPS systems underlie all the DVA hospital software,
a system of 172 medical centers, with 85,000 federal workers
pounding away on MUMPS systems every day.
MUMPS systems are behind the largest commercial pathology laboratory
packages (CoPath, Sunquest), and are employed by The Johns Hopkins Hospital,
Baltimore, MD, the state hospital system of Finland, and several
large state hospital systems in Germany.
The VistA® computer system is written in M-Technology,
formerly MUMPS.
The special features of M-Technology that make it particularly
suitable as a programming language for patient records are:
4. MUMPS MULTITASKING.
In MUMPS, it has always been possible to do several jobs at once, and to
share data from one job that is running at the same time as another job.
The original purpose of this multitasking environment was to share data
and computing resources in a laboratory where computing resources
were expensive. Mainstream computing environments ignored
this sharing paradigm, because they were more concerned with
privacy and protection from data theft. The multitasking feature
of MUMPS has been taken over by Microsoft® Windows and
other modern operating systems, and in a larger sense, by the Internet.
5. IMPLICIT SORTING.
A MUMPS GLOBAL ARRAY takes the form, ^A(x,y,z,..),
where ^A() is the name of the array, x is the first argument,
y is the second argument, z is the third argument, etc.
A MUMPS global array may have dozens of arguments, and the arguments
may be either numbers or character-strings (enclosed in "").
When you read a global-data-array into a MUMPS program, the data are
implicitly sorted, primarily by the first argument,
secondarily by the second argument, etc.
MUMPS sorts numeric arguments first, in numeric order, and then
character-strings in alphabetic order
(so-called collation sequence, established by the ASCII/ISO ordering).
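The implicit sort can be imitated in Python. This is only a rough sketch of the behavior, not a description of how MUMPS stores its globals. The names GlobalArray and collation_key and the sample data are assumptions made for illustration, and the sketch treats every Python number as a numeric argument and every Python string as a character-string.

```python
# Python sketch (not MUMPS): imitate the implicit sorting of a global array.
# GlobalArray and collation_key are hypothetical names used for illustration.

def collation_key(argument):
    """Numbers sort first, in numeric order; strings follow, in code order."""
    if isinstance(argument, (int, float)):
        return (0, argument, "")
    return (1, 0, argument)


class GlobalArray:
    def __init__(self):
        self._nodes = {}  # maps a tuple of arguments to a value

    def set(self, *args_and_value):
        """set(x, y, ..., value) stores value under the arguments (x, y, ...)."""
        *args, value = args_and_value
        self._nodes[tuple(args)] = value

    def items(self):
        """Return (arguments, value) pairs sorted by 1st argument, then 2nd, etc."""
        return sorted(
            self._nodes.items(),
            key=lambda item: tuple(collation_key(a) for a in item[0]),
        )


a = GlobalArray()
a.set("SMITH", 3040420, "glucose")     # entered in arbitrary order
a.set("JONES", 3040421, "sodium")
a.set("JONES", 3040419, "potassium")
a.set(7, "numeric argument")

for args, value in a.items():
    print(args, value)
```

The loop prints the numeric argument first, then the two JONES records in date order, then SMITH, although no sort was requested by the caller.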
6. MUMPS PARTIAL SUBSTRING/SUCCESSOR MATCH.
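The MUMPS order function, $ORDER() or simply $O(), returns the
next argument in sorted order after the argument supplied,
whether or not the supplied argument is itself in the array.
If you have an array,
^A(),
and
^A("VETERAN")
is a valid argument for
^A(),
then
$O(^A("VETERAM")) = "VETERAN"
that is, the next-argument-in-order after
^A("VETERAM")
is
^A("VETERAN")
(provided that no other argument sorts between the two).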
Amazingly, this powerful feature of MUMPS has been virtually
ignored by the commercial computing software industry at large.
In the early days of MUMPS, the excuse for this neglect was
that it was computationally expensive. Nowadays, there is no excuse.
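The successor lookup can be approximated in Python with a binary search over a sorted list. This is a minimal sketch, assuming string arguments only; the function name order and the sample names are assumptions, not part of any MUMPS system.

```python
# Python sketch (not MUMPS): imitate $ORDER on one sorted level of arguments.
from bisect import bisect_right


def order(sorted_args, current):
    """Return the next argument strictly after `current`, or "" at the end.

    `current` need not be present in the list, which is what makes the
    successor match work. Strings only, for simplicity.
    """
    i = bisect_right(sorted_args, current)
    return sorted_args[i] if i < len(sorted_args) else ""


args = sorted(["VETERAN", "SMITH", "ZACHARY", "JONES"])
print(order(args, "VETERAM"))  # VETERAN
print(order(args, ""))         # JONES, the first argument
print(order(args, "ZACHARY"))  # empty string, the end of the list
```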
7. Each patient encounter is indexed primarily by the patient's identifiers,
and secondarily by the date/time when the specimen was obtained
from the patient. M-technology supports
approximate date/time data.
That is, a
Veterans Affairs Fileman Date
is a numeral with seven digits before the decimal-point,
and six digits after the decimal point, of the form,
cyymmdd.hhmmss
where
c
is the century digit (0=1700, 1=1800, 2=1900, 3=2000,...);
yy are the year-digits;
mm are the month-digits (before the decimal-point);
dd are the day-digits;
hh are the hour-digits;
mm are the minute-digits (after the decimal-point); and
ss are the second-digits.
Note:
Veterans Affairs Fileman does not have a Y2K problem, it has a Y2700 problem!
Since every U. S. Veteran was born after Y1700 (there were no septuagenarians
in the continental army!), every U. S. Veteran's birthdate
is representable as a Veterans Affairs Fileman Date.
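The date layout above can be decoded with a few lines of Python. This is a sketch under stated assumptions: the function name decode_fileman is hypothetical, the date is passed as a string, trailing zeros after the decimal point may be omitted, and a month or day of 00 is taken to mark an approximate date.

```python
# Python sketch: decode a Fileman date of the form cyymmdd.hhmmss.
# decode_fileman is a hypothetical helper, not part of Fileman itself.

def decode_fileman(fm):
    """Split a Fileman date (given as a string) into its parts.

    A month or day of 00 is returned as None (approximate date).
    """
    date_part, _, time_part = fm.partition(".")
    if len(date_part) != 7 or not date_part.isdigit():
        raise ValueError("expected seven digits before the decimal point")
    if time_part and not time_part.isdigit():
        raise ValueError("expected only digits after the decimal point")
    time_part = time_part.ljust(6, "0")[:6]   # restore omitted trailing zeros
    year = 1700 + int(date_part[:3])          # c=0 is the 1700s, c=3 the 2000s
    month = int(date_part[3:5]) or None
    day = int(date_part[5:7]) or None
    hour, minute, second = (int(time_part[i:i + 2]) for i in (0, 2, 4))
    return {"year": year, "month": month, "day": day,
            "hour": hour, "minute": minute, "second": second}


print(decode_fileman("3040420.1518"))  # 20 April 2004, 15:18:00
print(decode_fileman("2450000"))       # some time in 1945
print(decode_fileman("9991231")["year"])  # 2699, the last representable year
```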
8. Each patient-record in the M-Technology database is instantly
sorted when the record enters the computer. The sort is optimized
as a BINARY TREE, as shown below; the optimization runs
in the background, when the computer system has unused computing cycles.
                                   ________ Abraham
                                  |
                       ___________|
                      |           |
         Before Mzzz  |           |_______ Campbell
           ___________|
          |           |            ________ Jones
          |           |           |
          |           |___________|
          |                       |
  All     |                       |_______ Miller
  ________|
          |
          |                        ________ Norbert
          |                       |
          |                       |
          |            ___________|
          |           |           |
          |           |           |_______ Rogers
          |           |
          |___________|
           After N    |
                      |            ________ Smith
                      |           |
                      |           |
                      |___________|
                                  |
                                  |_______ Zachary
In this example, it requires only three decisions to find JONES
in this sorted decision tree, whereas a one-by-one search of the eight
names in arbitrary order, in the worst case, could require eight decisions.
This doesn't seem like much of a difference in a fast computer,
but consider the difference in a list of 5,000,000 veterans
in the VA system. Here we are speaking of a difference between
5 million decisions and 23 decisions (log2 of 5,000,000, rounded up)
on a sorted, binary decision tree. Multiply this by different
encounter-dates, different laboratory tests, etc., and you have
a noticeable difference in computer speed and efficiency.
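The arithmetic can be checked with a short Python sketch. It halves a sorted list until one candidate remains and counts the decisions taken. The function names are assumptions, and the halving search stands in for the decision tree drawn above; it is not meant as a description of how an M system stores its index.

```python
# Python sketch: count the yes/no decisions needed on a sorted list.
import math


def decisions_needed(n):
    """Decisions needed, at most, to single out one of n sorted names."""
    return math.ceil(math.log2(n))


def find_with_decisions(sorted_names, target):
    """Halve the candidate range until one name is left; count the decisions."""
    lo, hi = 0, len(sorted_names)      # candidates are sorted_names[lo:hi]
    decisions = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        decisions += 1
        if target < sorted_names[mid]:
            hi = mid                   # target is in the earlier half
        else:
            lo = mid                   # target is in the later half
    return sorted_names[lo] == target, decisions


names = ["ABRAHAM", "CAMPBELL", "JONES", "MILLER",
         "NORBERT", "ROGERS", "SMITH", "ZACHARY"]
print(find_with_decisions(names, "JONES"))  # (True, 3)
print(decisions_needed(5_000_000))          # 23
```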
Historically, the patient would enter the medical institution
from the emergency room, and enter several other divisions of the hospital
before his initial records and laboratory tests would catch up with him/her,
all because of slow sorting.
This feature, namely,
instant sorting and resorting in background,
is a unique feature of M-technology, and one of the main reasons
why other computer systems are unable to manage systems
as complex as the VA medical record system.
Remarkably, the executives that pay for patient record systems
don't seem to "get it", and continue to purchase systems
without the
instant sorting and resorting in background feature.
9. The sorted index remains available (i.e., doesn't have
to be reloaded) the next time one queries the index, so-called
PERSISTENT OBJECTS.
10. M-technology has a feature, unheard of in other commercial
programming environments, of
LEADING SUBSTRING MATCH.
To look up the record for VETERAN,JOHN Q, it suffices to
query for VETERAN,JO..., in case the patient's name is listed
in the system as
VETERAN,JONATHON or VETERAN,JOSEPH.
M-technology/VistA® also supports ALIASES,
or alternate names for the same person.
In fact, you don't even have to have the leading substring correct,
since M-technology supports
SUCCESSOR SUBSTRING MATCH. In the example, it suffices,
say, to look for VETERAN,JN.... The system looks down through
all the VETERAN,JNs, until it reaches
VETERAN,JOHN Q, VETERAN,JONATHON, VETERAN,JOSEPH, etc.
It is amazing how often entry clerks need to enter
slight variants of a patient's name in order to find the patient
on the system. It is a nightmare to untangle records that have been
entered hastily for the same patient under different names.
And it is unethical and unreasonable to deny care to a patient who
enters the emergency room, say, in extremis, without all his/her
proper identification paperwork. The VA is much better at
patient identification than some community hospitals,
where the computer system does not support these nuances.
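Both kinds of lookup can be sketched in Python on a sorted list of names. The helper names leading_matches and successors and the sample patients are assumptions for illustration; a real VistA lookup also handles aliases and other identifiers, which this sketch leaves out.

```python
# Python sketch: leading-substring match and successor match on sorted names.
from bisect import bisect_left


def leading_matches(sorted_names, prefix):
    """All names that begin with `prefix`: one search, then a short scan."""
    i = bisect_left(sorted_names, prefix)
    matches = []
    while i < len(sorted_names) and sorted_names[i].startswith(prefix):
        matches.append(sorted_names[i])
        i += 1
    return matches


def successors(sorted_names, start, count):
    """The next `count` names at or after `start`, even if none begins with it."""
    i = bisect_left(sorted_names, start)
    return sorted_names[i:i + count]


names = sorted(["VETERAN,JOSEPH", "VETERAN,JOHN Q", "SMITH,ANN",
                "VETERAN,JONATHON", "VETERAN,MARY"])

print(leading_matches(names, "VETERAN,JO"))
print(successors(names, "VETERAN,JN", 3))
```

Both calls list VETERAN,JOHN Q, VETERAN,JONATHON and VETERAN,JOSEPH, although no name in the list begins with VETERAN,JN.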
11. It is easy to learn to write simple program code
in M-technology. Software programming projects that require
a multiperson programming staff and months of effort
for composing a database in C or Oracle,
can be prototyped in a day by a skilled M-programmer.
11.1 Here is a sample MUMPS program
that sorts a list of patient names.
ROUTINE ^PATLIST.2A
PATLIST ;GWM,SORT PATIENT LIST,,;20APR04 3:18PM;;
ENTRY ; ENTER PATIENT NAMES.
 S ^P("CAMPBELL")="" ;
 S ^P("MILLER")="" ;
 S ^P("ABRAHAM")="" ;
 S ^P("NORBERT")="" ;
 S ^P("JONES")="" ;
 S ^P("ZACHARY")="" ;
 S ^P("ROGERS")="" ;
 S ^P("SMITH")="" ;
 ; INITIALIZE SUCCESSOR LIST ;
 S O=-999999 ;
 ; PRINT SUCCESSOR LIST ;
READ S O=$O(^P(O)) G:(O="") EXIT W " ",O G READ ;
EXIT H ;

*D ^PATLIST
 ABRAHAM CAMPBELL JONES MILLER NORBERT ROGERS SMITH ZACHARY
The commands are:
S - Set.
W - Write.
G - Go-to.
G: - Go-to with a condition (i.e., conditional go-to).
H - Halt.
A comment begins with ;
The beginning of the successor list is negative-infinity, here denoted as
O=-999999.
The end of the successor list is null-string, here denoted as
O="".
Note that, although the list was entered in arbitrary order,
the list is returned to the printer in alphabetical order
(so-called COLLATION SEQUENCE).
There is no specific SORT command in MUMPS.
Just enter the data, and the list is immediately sorted.
This is one of MUMPS's most powerful features.
12. M-technology is ignored by academic computer scientists (too easy
to learn); and hated by businesses (same reason).
13. HIGH-LEVEL COMPUTER LANGUAGES: REVIEW.
1. BASIC: Beginners All-purpose Symbolic Instruction Code. Easy
to learn, bundled free with your IBM-compatible operating system
(MS-DOS ver. 5.0 or greater), aggressively marketed by Microsoft.
2. VISUAL BASIC: Programming power of ordinary BASIC married to
an easy-to-use graphical-user-interface (GUI). Price: $100.
A fantastic value.
3. MUMPS: Recently renamed `M', to downplay its association with
acute parotitis. Expensive ($250, no discounts), poor graphics,
unless you pay extra. Very good for searching and sorting large
text files. It's easy to write in M, and even easier to write
sloppy code (`spaghetti code') that nobody else can understand.
International Organization for Standardization (ISO) standard no. 11756.
4. SmallTalk: The ultimate object-oriented computer language.
Hard to learn, but once you know it, extremely powerful.
If you learn by reading error messages, SmallTalk is not for you.
The error messages in SmallTalk are inscrutable.
5. C: The language that does everything. Very fast. Hard to learn.
Input/output and memory management features highly challenging.
6. COBOL: COmmon Business Oriented Language.
A great leap forward in the late 1950s, pioneered by the late
Commodore Grace Hopper, PhD, U.S. Navy. Still used in some businesses
with antique management information systems (MIS) departments.
7. Java: The language that does everything on the internet.
Very fast. Hard to learn. Input/output and memory management features
highly challenging.
8. FORTRAN: FORmula TRANSlation. Pioneering scientific computer
language used for writing number-crunching programs. Now virtually
overtaken by BASIC.
9. PASCAL: Named after the seventeenth century French mathematician
and philosopher. Computer language with an anal-retentive
personality disorder. Used mostly in university courses, to teach
discipline through torture to rebellious computer science students.
10. LISP: So-called `thinking man's language of artificial
intelligence'. Poor input-output capabilities. The programming
language is a sea of parentheses, which will drive you crazy.
11. Solder: Alloy of lead and tin, used for connecting wires on a
circuit board. Favorite programming language of the old-timers.
14. M LANGUAGE EMPTY STRING IN ARRAYS.
1. An empty string ("") represents the first element in the M (=MUMPS)
collating sequence.
2. All other values sequentially follow the empty string.
3. However, the empty string cannot be added to an array
(it is only used as the starting and ending subscript value
for the $O(rder) or $Q(uery) functions).
15. M (FORMERLY, MUMPS) AS A DEVELOPMENT LANGUAGE.
1. MUMPS: Massachusetts General Hospital Utility Multi-programming System.
2. Invented by Neil Pappalardo, Robert Greenes, and Curt Marble,
in G. Octo Barnett's laboratory, in the late 1960s.
3. Excellent language for utility programs, particularly involving
sorting and string manipulation.
4. Promoters of M claim that M requires only 8 to 12%
of the development time required by other programming languages.
16. M (FORMERLY, MUMPS) FOR INDEXING TEXT.
1. M has especially powerful capabilities for string manipulation,
sparse data arrays, and implicit sorting, all used heavily
by the indexing software in hospital information systems,
as well as the indexing software used by the `Lightning Hypertext'.
2. M also has a unique capability for modifying its own code
`on the fly', i.e., an M computer program can rewrite itself
during execution, based on data received or calculations.
3. M is not particularly good for making `brute force' repetitive
numerical calculations, such as predicting lunar orbits or making
quantum mechanics estimations.
17. M ROUTINE.
1. A `routine' is an M program module.
2. To create a software application in M, the programmer creates
and edits one or more routines.
3. One routine can call another routine, using a DO or GOTO command.
4. The programmer writes a routine in `source code'.
5. Source code is compiled by the system to obtain faster-running
`object code' in machine language.
18. M JOB OR PROCESS.
1. M was originally designed as a multitasking system, with multiple
`jobs' or `processes' running simultaneously.
2. Each user `owns' his/her own keyboard and monitor, and controls an
`active job' running on that keyboard-and-monitor.
3. Any number of additional `background jobs' may be running simultaneously.
4. Actually, the computer allots short time intervals to each job,
and executes a small part of each job in a round-robin-sequence.
5. Each job occupies a unique memory area, called a `partition',
and has its own, unique `process identification', contained
in the $JOB variable.
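The round-robin idea in item 4 can be sketched in Python with generators standing in for jobs. This is only an illustration of time-slicing; the names job and round_robin are assumptions, and a real M system schedules operating-system processes rather than generators.

```python
# Python sketch: round-robin execution of several jobs, one small piece each.
from collections import deque


def job(name, steps):
    """A hypothetical job that does `steps` small pieces of work."""
    for n in range(1, steps + 1):
        yield f"{name} step {n}"


def round_robin(jobs):
    """Run one piece of each job in turn until every job has finished."""
    queue = deque(jobs)
    log = []
    while queue:
        current = queue.popleft()
        try:
            log.append(next(current))   # give this job one time interval
        except StopIteration:
            continue                    # job finished; drop it from the queue
        queue.append(current)           # otherwise send it to the back
    return log


log = round_robin([job("A", 2), job("B", 3), job("C", 1)])
print(log)
```

The log interleaves the three jobs (A, B, C, A, B, B), so each user sees steady progress although only one piece of work runs at any instant.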
19. M LOCAL AND GLOBAL VARIABLES.
1. An M variable is a named item, for example, A(1)=2.
In this example, `1' is the `argument' and `2' is the `value'.
Arguments and values may be either numeric or character-strings.
2. M variables may be `local' or `global'.
3. A `local variable' is a named memory item, specific to the job
which created it. It disappears when the job terminates.
4. A `global variable' is a named database record, accessible to
all jobs running in the system.
5. A global variable may be seized by a LOCK command. Then any other
job seeking access to that global variable must wait until
the first user unlocks that global variable.
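The difference between local and global variables, and the effect of LOCK, can be imitated in Python with threads. This is a loose analogy: threads stand in for M jobs, a dictionary stands in for the shared database, and the names shared_global, global_lock and job are assumptions.

```python
# Python sketch: several jobs updating one shared (global) value under a lock.
import threading

shared_global = {"COUNT": 0}        # stands in for a global variable
global_lock = threading.Lock()      # stands in for LOCK on that global


def job(increments):
    local_total = 0                 # local variable: private to this job
    for _ in range(increments):
        with global_lock:           # other jobs wait here until it is released
            shared_global["COUNT"] += 1
        local_total += 1
    return local_total


threads = [threading.Thread(target=job, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(shared_global["COUNT"])       # 4000
```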
20. M DEVICES.
1. All sequential input/output is directed through `logical devices',
numbered from 0 through 255.
2. Devices include the user's keyboard (input device 0),
the video-monitor (output device 0), printer ports,
RS-232 serial ports, floppy-disk and hard-disk files, etc.
3. The OPEN and CLOSE commands turn a device on or off.
4. The USE command directs input/output to a particular device.
5. The READ and WRITE commands perform input and output through
the device which has most recently been USEd.
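The OPEN, USE, WRITE and CLOSE sequence can be sketched in Python. The class DeviceTable is a hypothetical stand-in, with in-memory text buffers in place of real keyboards, printers and files; it shows only the bookkeeping of a current device.

```python
# Python sketch: numbered logical devices with a "current" device.
import io


class DeviceTable:
    """Hypothetical stand-in for M's numbered logical devices."""

    def __init__(self):
        self._open = {}        # device number -> text buffer
        self._current = None   # device selected by the last USE

    def open(self, number):
        if not 0 <= number <= 255:
            raise ValueError("device numbers run from 0 through 255")
        self._open[number] = io.StringIO()

    def use(self, number):
        if number not in self._open:
            raise RuntimeError("a device must be OPENed before it is USEd")
        self._current = number

    def write(self, text):
        if self._current is None:
            raise RuntimeError("no device is in USE")
        self._open[self._current].write(text)

    def close(self, number):
        """Close the device and return everything written to it."""
        buffer = self._open.pop(number)
        if self._current == number:
            self._current = None
        return buffer.getvalue()


devices = DeviceTable()
devices.open(0)                 # the user's own terminal
devices.open(2)                 # a printer, say
devices.use(2)
devices.write("PATIENT REPORT")
devices.use(0)
devices.write("done")
printed = devices.close(2)
shown = devices.close(0)
print(printed, "/", shown)
```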
21. M (FORMERLY, MUMPS) AS A COMPUTER LANGUAGE.
1. M has only a handful of operators, commands, and functions,
from which highly sophisticated programs can be written.
2. If the programmer memorizes only 30 terms, he or she can begin
writing serviceable M applications.
3. M does not check for correct syntax of a statement
until the program actually tries to execute that statement.
4. This feature is handy for `quick and dirty' debugging and
for rapid prototyping, because one is not forced to write
an entirely syntactically correct program in order to test
early versions of the program.
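The effect of checking syntax only at execution time can be imitated in Python by running a program one line at a time. This is an analogy, not M behavior reproduced exactly; the function run_lines and the four-line sample program are assumptions.

```python
# Python sketch: syntax is checked only when a line is about to be executed.

def run_lines(lines):
    """Execute the lines in order; stop at the first line with a syntax error.

    Returns the variables set so far, the line numbers that ran, and the
    number of the offending line (or None if every line ran).
    """
    variables = {}
    executed = []
    for number, line in enumerate(lines, start=1):
        try:
            exec(line, variables)       # compile and run this one line
        except SyntaxError:
            return variables, executed, number
        executed.append(number)
    return variables, executed, None


program = [
    "x = 2",
    "y = x * 21",
    "z = = 5",      # deliberately malformed
    "w = 1",
]
variables, executed, bad_line = run_lines(program)
print(executed, bad_line, variables["y"])   # [1, 2] 3 42
```

The first two lines run and produce results before the malformed third line is ever examined.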
22. M VARIABLE-ARRAYS.
1. There is only one variable-type in M: the character-string.
2. A variable may be single or an array. An `array' has a `name',
1 or more `arguments', and a `value'. For example,
`A(1,2,3)=17' is 1 element from an array named `A()',
with arguments 1,2,3 and value 17.
3. M arrays have the form of a hierarchical tree. For example:
A(1)=2
A(1,2)=5
A(1,2,3)=17
A(1,2,6)=14
4. Unlike FORTRAN or BASIC, M does not require a DIMENSION statement
for preassigning storage space for a variable.
If you `SET A(1,2,3)=17', then A(1,2,3) exists;
if you `KILL A(1,2,3)', then A(1,2,3) ceases to exist.
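The tree-shaped, storage-on-demand array can be sketched in Python with a dictionary keyed by argument tuples. SparseArray is a hypothetical name; the sketch assumes, as in M, that KILL removes the named element together with every element beneath it in the tree.

```python
# Python sketch: a sparse, hierarchical array with SET and KILL.

class SparseArray:
    def __init__(self):
        self._nodes = {}                      # argument tuple -> value

    def set(self, args, value):
        """No DIMENSION needed: the element exists once it is set."""
        self._nodes[tuple(args)] = value

    def exists(self, args):
        return tuple(args) in self._nodes

    def kill(self, args):
        """Remove an element and every element beneath it in the tree."""
        prefix = tuple(args)
        doomed = [k for k in self._nodes if k[:len(prefix)] == prefix]
        for key in doomed:
            del self._nodes[key]


a = SparseArray()
a.set((1,), 2)
a.set((1, 2), 5)
a.set((1, 2, 3), 17)
a.set((1, 2, 6), 14)

a.kill((1, 2, 3))
print(a.exists((1, 2, 3)), a.exists((1, 2, 6)))   # False True

a.kill((1, 2))                                    # removes A(1,2) and A(1,2,6)
print(a.exists((1, 2, 6)), a.exists((1,)))        # False True
```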
23. M OPERATORS.
There are only a few operators, commands, and functions necessary
to write a functional M program. The operators are:
+ - plus                     ' - not
- - minus                    & - and
* - times                    ! - or
/ - divided by               : - if
\ - divided by,              ; - end-command-line
    truncate to integer
[ - contains
] - follows
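Three of the less familiar operators can be imitated in Python. The function names are assumptions. The sketch assumes that the truncating division drops the fractional part (so a negative quotient moves toward zero) and that "follows" compares strings in character-code order; it uses floating-point division, so it is not suitable for very large integers.

```python
# Python sketch: the M operators \ (integer divide), [ (contains), ] (follows).
import math


def int_divide(a, b):
    """Divide, then truncate the quotient to an integer."""
    return math.trunc(a / b)


def contains(a, b):
    """True if string a contains string b."""
    return b in a


def follows(a, b):
    """True if string a comes after string b in character-code order."""
    return a > b


print(int_divide(7, 2))                   # 3
print(int_divide(-7, 2))                  # -3 (Python's own // would give -4)
print(contains("VETERAN,JOHN", "JOHN"))   # True
print(follows("VETERAN", "VETERAM"))      # True
```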
24. M COMMANDS.
Important M commands include:
B - Break.                   O - Open.
C - Close.                   Q - Quit.
D - Do.                      R - Read.
E - Else.                    S - Set.
F - For.                     U - Use.
G - Go to.                   W - Write.
H - Halt.                    X - eXecute.
I - If.
K - Kill.
25. M FUNCTIONS.
Important M functions include:
$A - ASCII.                  $O - Order.
$C - Character.              $P - Piece.
$E - Extract.                $R - Random.
$F - Find.                   $S - Select.
$G - Get.                    $T - Timeout/Truth.
$H - Horolog (=clock).
$J - Justify.
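A few of the string functions can be approximated in Python to show what they return. These are sketches of the common forms only; the Python names are assumptions, positions count from 1 as in M, and optional arguments of the M functions are not all covered.

```python
# Python sketch: rough equivalents of $PIECE, $EXTRACT, $FIND, $ASCII, $JUSTIFY.

def piece(s, delim, n):
    """$PIECE: the n-th field of s, fields separated by delim, counting from 1."""
    parts = s.split(delim)
    return parts[n - 1] if 1 <= n <= len(parts) else ""


def extract(s, start, end=None):
    """$EXTRACT: characters start through end of s, counting from 1."""
    if end is None:
        end = start
    if end < start:
        return ""
    return s[max(start, 1) - 1:end]


def find(s, sub):
    """$FIND: the position just after the first occurrence of sub, or 0."""
    i = s.find(sub)
    return 0 if i < 0 else i + len(sub) + 1


def ascii_code(s):
    """$ASCII: the code of the first character, or -1 for the empty string."""
    return ord(s[0]) if s else -1


def justify(s, width):
    """$JUSTIFY: s right-justified in a field of the given width."""
    return str(s).rjust(width)


print(piece("VETERAN,JOHN Q", ",", 2))   # JOHN Q
print(extract("VETERAN", 1, 3))          # VET
print(find("VETERAN", "ER"))             # 6
print(ascii_code("A"))                   # 65
print(repr(justify(7, 3)))               # '  7'
```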
26. ANSI/ISO STANDARD M LIMITS.
1. ANSI = American National Standards Institute.
ISO = International Organization for Standardization.
MUMPS is an ANSI/ISO standardized computer language,
with published language standards.
2. A string may not exceed 255 characters. In some commercial products,
the string-limit may be much longer.
3. The length of the variable-name plus argument-string may not exceed
127 characters. For example, the following string has 58 characters:
ARRAY("the quick brown fox","jumped over","the lazy dogs")
4. Every number is a valid argument. Every string containing only ASCII
values between 32 and 126 is a valid argument. Numeric calculations
are supported between 10^-25 and 10^25. Numeric
precision is 12 decimal digits.
5. Variable names may be any length, but they must differ from one another
within the first 8 letters. Variable names must start with
% or an alphabetic letter (upper-case or lower-case).
6. Source code for each routine may not exceed 5 Kilobytes.
7. Maximum storage for local variables may not exceed 5 Kilobytes.
27. MGLOBAL M DEVICE NUMBERS.
1. For MGlobal M, the device numbers are as follows:
0. The user's own keyboard and monitor.
1. System console.
2. Parallel printer 1 (LPT1).
3. Parallel printer 2 (LPT2).
5. MS-DOS file access channel 1.
6. MS-DOS file access channel 2.
8. Serial port 1.
9. Serial port 2.
2. Different vendors have different device-number assignments,
or allow the user to define his/her own device-numbers.
28. MGLOBAL M PROGRAM EDIT COMMANDS.
1. These edit commands are specific for MGlobal M. These commands
can be used in programmer mode, i.e., in the ZE editor.
B ... Breaks the current line into 2 lines at the cursor.
E ... Moves the cursor to the end of the current line.
.I ... Inserts lines when the cursor is put at the beginning of the line
where you want the lines inserted.
.Q ... Quit without filing.
.F ... File (save) the program.
.R ... Remove line from program.
2. Other M vendors have other editing commands.
29. CONTROLLING M ROUTINES DURING EXECUTION.
1. Ctrl-S or Ctrl-B suspends the routine.
2. Ctrl-Q starts it up again.
3. Ctrl-C interrupts a routine so that you can't continue it again.
REFERENCES.
1. Berman JJ, Moore GW, et al.
The Lightning Hypertext of Disease.
http://www.pathinfo.com
2. Brown DG, Brown G, Goldstein M.
Introduction to CCS MUMPS.
COMP Computing, Inc., 1601 Westheimer,
Suite 201, Houston, Texas 77006, 1985:1-99.
3. DataTree, Inc.
DataTree MUMPS-PC System Overview, v. 4.2.
DataTree, Inc., 300 Fifth Ave,
Waltham, MA 02154, 1-617-890-1620, 1991:10-13.
4. DataTree, Inc.
DataTree MUMPS Language Reference, v. 4.2.
DataTree, Inc., 300 Fifth Ave,
Waltham, MA 02154, 1-617-890-1620, 1991:16-17.
5. Walters RF, Bowie J, Wilcox JC.
Mumps Primer. M Technology Association,
1738 Elton Road, Suite 205, Silver Spring, MD 20903-1725.
6. Kirsten W.
Von ANS MUMPS zu ISO/M.
epsilon Verlag, Darmstadt Hochheim. 1993:47-84.
7. Dvorak JC.
Inside track.
PC Magazine. 1991 May 28:83.
8.
Free MUMPS/Fileman, at URL:
http://www.hardhats.org
For a free, single-user demo version of VistA/MUMPS, go to the above website.
At the top of the home webpage, you will see:
Search | HOME | MUMPS | Fileman | .......
Click on MUMPS. Scroll down to:
Learn M and FM v21 for free.
Click on Download, and save the file, fm_ws.zip.
Use PKUNZIP.EXE, in order to UNZIP the ZIPped file,
fm_ws.zip.
Directory:
02/23/2005  11:03 AM                    .
02/23/2005  11:03 AM                    ..
01/26/1999  09:34 AM         10,240,000 FILEMAN.M
01/24/1999  02:29 AM            159,744 FM.EXE
02/23/2005  11:00 AM          3,210,667 fm_ws.zip
01/25/1999  03:13 PM          2,450,704 MSMWS002.DLL
02/01/1993  01:04 AM             28,959 PKUNZIP.EXE
01/26/1999  09:49 AM                287 README.TXT
               6 File(s)     16,090,361 bytes
               2 Dir(s)  22,820,491,264 bytes free
type readme.txt
The FM.EXE file runs the VA's File Manager program from the data stored
in the Fileman.m file. A limitation of this current edition, is that the
three files (msmws002.dll, fm.exe and Fileman.m) must be stored in the
following path:
C:\Program Files\Micronetics\MSMWS\Program\
AUTHOR AND EFFECTIVE DATE.
Date last revised: 1/5/2004, Dong H. Lee, MD.
Signature and date approved:
Chief, Pathology and Laboratory Medicine Service (113)
Chief, Anatomic Pathology Section
Supervisor, Histology and Cytology