MUMPS/CACHÉ PRIMER.
DRAFT COPY ONLY.
(Procedure 151).

G. William Moore, MD, PhD.
Chief, Quality Assurance Section.
Chief, Autopsy Section.
Jules J. Berman, PhD, MD.
http://www.netautopsy.org/axsop/axsop151.htm



United States Government Work, uncopyrighted, public-domain, DRAFT COPY ONLY. This document does not necessarily represent the views or policies of any United States Government agency. This document is provided "as is", without warranty of any kind, express or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose and noninfringement. In no event shall the authors be liable for any claim, damages or other liability, whether in an action of contract, tort or otherwise, arising from, out of or in connection with the document or the use or other dealings made with the document.



PRINCIPLE OF THE TEST.

      The Massachusetts General Hospital Utility Multi-Programming System (MUMPS) is the computer language that underlies all VistA® software used in the Veterans Affairs medical computing systems, formerly DHCP. Caché is a proprietary version of MUMPS used by VistA®, which underlies the Microsoft® Windows®-based Computerized Patient Record System (CPRS). Routine users of VistA®/CPRS need not be fluent in MUMPS, but an understanding of MUMPS concepts shows the context in which the modern systems developed. Efforts to replace MUMPS have so far failed, partly because many commercial systems do not carry over powerful MUMPS concepts such as implicit sort and leading-substring match.



SPECIMEN REQUIRED.


      Not applicable.



REAGENTS, INSTRUMENTATION.


      Not applicable.



STEP-BY-STEP DESCRIPTION.


1. TABLE OF CONTENTS.
1. Table of Contents.
2. VistA® computer system.
3. History of MUMPS.
4. MUMPS multitasking.
5. Implicit sorting.
6. MUMPS partial substring/successor match.
7. Patient encounter in VistA®.
8. Binary tree.
9. Persistent objects.
10. Leading substring/successor match.
11. Easy to write simple M program code.
...............
22. M variable-arrays.
23. M operators.
24. M commands.
25. M functions.
26. ANSI/ISO standard M limits.
27. MGlobal M device numbers.


2. INTRODUCTION. The VistA® computer system, formerly DHCP (Decentralized Hospital Computer Program), and the Computerized Patient Record System (CPRS) employ powerful software concepts present in modern medical informatics systems. You don't need to be a programmer to understand the important, pioneering concepts that underlie the hospital software system. The major concepts are: MULTITASKING, IMPLICIT SORTING, and PARTIAL SUBSTRING MATCH.

3. HISTORY OF MUMPS. The Massachusetts General Hospital Utility Multi-Programming System (MUMPS) was developed in the late 1960s in the laboratory of G. Octo Barnett, MD, (a medical school classmate at Harvard of Dr. Iseri!) by Neil Pappalardo, Robert H. Greenes, MD, and Curt Marble, at the Harvard Medical School Laboratory of Medical Computing [refs]. The language has been renamed as M or M-technology, but the insiders and old-timers still call it MUMPS. In those days, the dominant computer language was IBM's FORTRAN (FORmula TRANslation), which was suitable for numerical calculations, but not well-suited for searching, sorting, listing, and other manipulations that are common in medical informatics. The dominant computing industry emphasized single-user computing and significant security systems, to keep users OUT OF the shared database rather than inviting them INTO it. In Barnett's laboratory, there were limited computing resources (in contrast to today, where everyone has a powerful computer on his/her desktop). MUMPS was originally designed to support MULTITASKING, i.e., the simultaneous use of one computer by many users, using a shared database, or GLOBAL. The pioneering features of MUMPS are: MULTITASKING, IMPLICIT SORTING, and PARTIAL SUBSTRING MATCH.

MUMPS had a rough start in the Department of Veterans Affairs (DVA). A small group of rebellious computer programmers in the DVA broke away from the dominant computing system of the time, called TRIMIS (Tri-Service Medical Information System), which consumed over a billion U.S. tax dollars and never took care of a single patient. This cumbersome dinosaur never got off the ground.

Nowadays, MUMPS systems underlie all the DVA hospital software, a system of 172 medical centers, with 85,000 federal workers pounding away on MUMPS systems every day. MUMPS systems are behind the largest commercial pathology laboratory packages (CoPath, Sunquest), and are employed by The Johns Hopkins Hospital, Baltimore, MD, the state hospital system of Finland, and several large state hospital systems in Germany. The VistA® computer system is written in M-Technology, formerly MUMPS. The special features of M-Technology that make it particularly suitable as a programming language for patient records are:

4. MUMPS MULTITASKING. In MUMPS, it has always been possible to do several jobs at once, and to share data between jobs that are running at the same time. The original purpose of this multitasking environment was to share data and computing resources in a laboratory where computing resources were expensive. Mainstream computing environments ignored this sharing paradigm, because they were more concerned with privacy and protection from data theft. The multitasking feature of MUMPS has since been adopted by Microsoft® Windows® and other modern operating systems, and in a larger sense, by the Internet.

5. IMPLICIT SORTING. A MUMPS GLOBAL ARRAY takes the form ^A(x,y,z,...), where ^A() is the name of the array, x is the first argument, y is the second argument, z is the third argument, etc. A MUMPS global array may have dozens of arguments, and the arguments may be either numbers or character-strings (enclosed in "").
When you read a global-data-array into a MUMPS program, the data are implicitly sorted, primarily by the first argument, secondarily by the second argument, etc. MUMPS sorts character-strings in alphabetic order (the so-called collation sequence, established by the ASCII/ISO ordering); numeric arguments sort in numeric order, ahead of character-strings.
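
The ordering described above can be imitated in a few lines of Python, for readers without an M system at hand. This is only a sketch, and it assumes the standard M rule that numeric arguments collate ahead of character-strings (numbers in numeric order, strings by character code). The function name m_collation_key is hypothetical and is not part of M or VistA®.

```python
# Sketch of M-style collation (assumption: numbers first, then strings).
def m_collation_key(subscript):
    """Return a sort key: numbers in numeric order, then strings by character code."""
    if isinstance(subscript, (int, float)):
        return (0, subscript, "")
    return (1, 0, subscript)

# Arguments arrive in arbitrary order, as they would when records are filed.
subscripts = ["MILLER", 10, "ABRAHAM", 2, -5, "10A"]
ordered = sorted(subscripts, key=m_collation_key)
print(ordered)
```

In M itself no sort call is written at all; the explicit sorted() call here stands in for the work that the database does when each record is filed.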

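6. MUMPS PARTIAL SUBSTRING/SUCCESSOR MATCH. The MUMPS order function, $ORDER() or simply $O(), returns the next argument in sorted order after the one supplied, whether or not the supplied argument is itself present in the array. If you have an array, ^A(), and ^A("VETERAN") is a valid argument for ^A(), then $O(^A("VETERAM")) = "VETERAN"; that is, the next argument in order after ^A("VETERAM") is ^A("VETERAN"), provided that no other argument falls between the two.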
Amazingly, this powerful feature of MUMPS has been virtually ignored by the commercial computing software industry at large. In the early days of MUMPS, the excuse for this neglect was that it was computationally expensive. Nowadays, there is no excuse.
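
A minimal Python sketch of the successor lookup, assuming the arguments are held as a sorted list of character-strings. The function name order is a hypothetical stand-in for $ORDER(), and the SMITH and JONES entries are invented for illustration.

```python
from bisect import bisect_right

def order(sorted_subscripts, subscript):
    """Return the first entry strictly after `subscript`, or "" at the end of the list."""
    i = bisect_right(sorted_subscripts, subscript)
    return sorted_subscripts[i] if i < len(sorted_subscripts) else ""

names = sorted(["VETERAN", "SMITH", "JONES"])

print(order(names, "VETERAM"))  # the probe itself need not be in the list
print(order(names, ""))         # the empty string starts the walk
print(order(names, "VETERAN"))  # past the last entry, the empty string comes back
```

The binary search inside bisect_right is what keeps the lookup cheap: the cost grows with the logarithm of the list length, not with the length itself.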

7. PATIENT ENCOUNTER IN VistA®. Each patient encounter is indexed primarily by the patient's identifiers, and secondarily by the date/time when the specimen was obtained from the patient. M-technology supports approximate date/time data. A Veterans Affairs Fileman Date is a numeral with seven digits before the decimal point and six digits after the decimal point, of the form cyymmdd.hhmmss, where c is the century digit (0=1700, 1=1800, 2=1900, 3=2000,...); yy are the year digits; mm (before the decimal point) are the month digits; dd are the day digits; hh are the hour digits; mm (after the decimal point) are the minute digits; and ss are the second digits.
Note: Veterans Affairs Fileman does not have a Y2K problem, it has a Y2700 problem! Since every U.S. Veteran was born after Y1700 (there were no septuagenarians in the Continental Army!), every U.S. Veteran's birthdate is representable as a Veterans Affairs Fileman Date.
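
The date layout can be decoded mechanically. The following Python sketch assumes the date is supplied as text and that trailing zeros of the time part may have been dropped (as happens when the value is stored as a number), so the time is padded back to six digits. Approximate dates with a zero month or day are not handled. The function name fileman_to_datetime is hypothetical.

```python
from datetime import datetime

def fileman_to_datetime(fm):
    """Convert a Fileman date string of the form cyymmdd.hhmmss to a datetime."""
    date_part, _, time_part = fm.partition(".")
    if len(date_part) != 7 or not date_part.isdigit():
        raise ValueError("expected seven digits before the decimal point")
    year = 1700 + int(date_part[:3])      # century digit plus two year digits
    month = int(date_part[3:5])
    day = int(date_part[5:7])
    time_part = time_part.ljust(6, "0")   # restore any dropped trailing zeros
    hour = int(time_part[0:2])
    minute = int(time_part[2:4])
    second = int(time_part[4:6])
    return datetime(year, month, day, hour, minute, second)

print(fileman_to_datetime("3040420.1518"))  # 20 April 2004, 3:18 PM
```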

8. BINARY TREE. Each patient record in the M-Technology database is sorted into place the moment the record enters the computer. In the background, when the computer system has unused computing cycles, the sort is optimized as a BINARY TREE, as shown below.
                               ________ Abraham
                               |
                    ___________|
                    |          |
     Before Mzzz    |          |_______ Campbell
         ___________|
         |          |          ________ Jones
         |          |          |
         |          |__________|
         |                     |
 All     |                     |_______ Miller
 ________|                                         
         |                                        
         |                     ________ Norbert
         |                     |
         |                     |
         |          ___________|
         |          |          |
         |          |          |_______ Rogers
         |          |
         |__________|
     After N        |
                    |          ________ Smith
                    |          |
                    |          |
                    |__________|
                               |
                               |_______ Zachary
In this example, it requires only three decisions to find JONES in this sorted decision tree, whereas a sequential search of the eight names in arbitrary order could, in the worst case, require eight decisions. This doesn't seem like much of a difference in a fast computer, but consider the difference in a list of 5,000,000 veterans in the VA system. Here we are speaking of 5 million decisions against 23 decisions (log2 of 5,000,000 is about 22.3, rounded up to 23) on a sorted, binary decision tree. Multiply this by different encounter-dates, different laboratory tests, etc., and you have a noticeable difference in computer speed and efficiency.
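
The decision counts quoted above can be checked with a short Python sketch. It halves the sorted list at each step, as the diagram does, and counts the decisions taken. The function name binary_search is hypothetical, and the sketch says nothing about how an M system lays out its index internally.

```python
import math

def binary_search(sorted_names, target):
    """Return (found, decisions) for a halving search of a sorted list."""
    if not sorted_names:
        return False, 0
    low, high, decisions = 0, len(sorted_names), 0
    while high - low > 1:
        middle = (low + high) // 2
        decisions += 1
        if target < sorted_names[middle]:
            high = middle   # target lies in the lower half
        else:
            low = middle    # target lies in the upper half
    return sorted_names[low] == target, decisions

names = sorted(["ABRAHAM", "CAMPBELL", "JONES", "MILLER",
                "NORBERT", "ROGERS", "SMITH", "ZACHARY"])

print(binary_search(names, "JONES"))       # three decisions for eight names
print(math.ceil(math.log2(5_000_000)))     # decisions for five million names
```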

Historically, a patient would enter the medical institution through the emergency room, and pass through several other divisions of the hospital before his/her initial records and laboratory tests caught up with him/her, all because of slow sorting.

This feature, namely, instant sorting and resorting in background, is a distinctive feature of M-technology, and one of the main reasons why other computer systems are unable to manage systems as complex as the VA medical record system. Remarkably, the executives who pay for patient record systems don't seem to "get it", and continue to purchase systems without the instant sorting and resorting in background feature.

9. PERSISTENT OBJECTS. The sorted index remains available (i.e., it doesn't have to be reloaded) the next time one queries the index. Data stored in this way are so-called PERSISTENT OBJECTS.

10. LEADING SUBSTRING/SUCCESSOR MATCH. M-technology has a feature, unheard of in other commercial programming environments, of LEADING SUBSTRING MATCH. To look up the record for VETERAN,JOHN Q, it suffices to query for VETERAN,JO...; the same query also finds VETERAN,JONATHON or VETERAN,JOSEPH, in case the patient's name is listed in the system under one of those forms. M-technology/VistA® also supports ALIASES, or alternate names for the same person. In fact, you don't even have to have the leading substring correct, since M-technology supports SUCCESSOR SUBSTRING MATCH. In the example, it suffices, say, to look for VETERAN,JN.... The system looks down through all the VETERAN,JNs (if there are any), until it reaches VETERAN,JOHN Q, VETERAN,JONATHON, VETERAN,JOSEPH, etc.

It is amazing how often entry clerks need to enter slight variants of a patient's name in order to find the patient on the system. It is a nightmare to untangle records that have been entered hastily for the same patient under different names. And it is unethical and unreasonable to deny care to a patient who enters the emergency room, say, in extremis, without all his/her proper identification paperwork. The VA is much better at patient identification than some community hospitals, where the computer system does not support these nuances.
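
Both kinds of lookup can be sketched in Python over a sorted list of names. The function names leading_match and names_from are hypothetical, VETERAN,JAMES and VETERAN,MARY are invented entries added for illustration, and a real VistA® lookup involves considerably more (aliases, identifiers, prompts to the clerk).

```python
from bisect import bisect_left

def leading_match(sorted_names, prefix):
    """Return every name that begins with `prefix`."""
    i = bisect_left(sorted_names, prefix)
    matches = []
    while i < len(sorted_names) and sorted_names[i].startswith(prefix):
        matches.append(sorted_names[i])
        i += 1
    return matches

def names_from(sorted_names, start, limit=3):
    """Return up to `limit` names at or after `start` in sorted order."""
    i = bisect_left(sorted_names, start)
    return sorted_names[i:i + limit]

patients = sorted(["VETERAN,JOSEPH", "VETERAN,JAMES", "VETERAN,JOHN Q",
                   "VETERAN,MARY", "VETERAN,JONATHON"])

print(leading_match(patients, "VETERAN,JO"))  # leading substring match
print(names_from(patients, "VETERAN,JN"))     # successor match: no JN entries exist
```

Because the list is already in sorted order, both lookups start with a single binary search and then read forward; neither scans the whole list.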

11. EASY TO WRITE SIMPLE M PROGRAM CODE. It is easy to learn to write simple program code in M-technology. Software programming projects that require a multiperson programming staff and months of effort for composing a database in C or Oracle can be prototyped in a day by a skilled M-programmer.

11.1 Here is a sample MUMPS program that sorts a list of patient names.
ROUTINE ^PATLIST.2A
PATLIST  ;GWM,SORT PATIENT LIST,,;20APR04 3:18PM;;
ENTRY    ; ENTER PATIENT NAMES.
         S ^P("CAMPBELL")="" ;
         S ^P("MILLER")="" ;
         S ^P("ABRAHAM")="" ;
         S ^P("NORBERT")="" ;
         S ^P("JONES")="" ;
         S ^P("ZACHARY")="" ;
         S ^P("ROGERS")="" ;
         S ^P("SMITH")="" ;
         ; INITIALIZE SUCCESSOR LIST ;
         S O=-999999 ;
         ; PRINT SUCCESSOR LIST ;
READ     S O=$O(^P(O)) G:(O="") EXIT W " ",O G READ ;
EXIT     H  ;
                    
*D ^PATLIST
 ABRAHAM CAMPBELL JONES MILLER NORBERT ROGERS SMITH ZACHARY
The commands are: S (Set), W (Write), G (Go-to), G: (conditional Go-to), and H (Halt). A comment begins with a semicolon (;). The beginning of the successor list is negative-infinity, here denoted as O=-999999 (the empty string, O="", also serves as a starting value; see item 14). The end of the successor list is the null string, here denoted as O="".
Note that, although the list was entered in arbitrary order, the list is returned to the printer in alphabetical order (so-called COLLATION SEQUENCE). There is no specific SORT command in MUMPS. Just enter the data, and the list is immediately sorted. This is one of MUMPS's most powerful features.

12. M-technology is ignored by academic computer scientists (too easy to learn); and hated by businesses (same reason).

13. HIGH-LEVEL COMPUTER LANGUAGES: REVIEW.
1. BASIC: Beginner's All-purpose Symbolic Instruction Code. Easy to learn, bundled free with your IBM-compatible operating system (MS-DOS ver. 5.0 or greater), aggressively marketed by Microsoft.
2. VISUAL BASIC: Programming power of ordinary BASIC married to an easy-to-use graphical user interface (GUI). Price: $100. A fantastic value.
3. MUMPS: Recently renamed `M', to downplay its association with acute parotitis. Expensive ($250, no discounts), poor graphics, unless you pay extra. Very good for searching and sorting large text files. It's easy to write in M, and even easier to write sloppy code (`spaghetti code') that nobody else can understand. International Organization for Standardization (ISO) standard ISO/IEC 11756.
4. SmallTalk: The ultimate object-oriented computer language. Hard to learn, but once you know it, extremely powerful. If you learn by reading error messages, SmallTalk is not for you. The error messages in SmallTalk are inscrutable.
5. C: The language that does everything. Very fast. Hard to learn. Input/output and memory management features highly challenging.
6. COBOL: COmmon Business Oriented Language. A great leap forward in the late 1950s, pioneered by the late Rear Admiral Grace Hopper, PhD, U.S. Navy. Still used in some businesses with antique management information systems (MIS) departments.
7. Java: The language that does everything on the internet. Very fast. Hard to learn. Input/output and memory management features highly challenging.
8. FORTRAN: FORmula TRANSlation. Pioneering scientific computer language used for writing number-crunching programs. Now virtually overtaken by BASIC.
9. PASCAL: Named after the seventeenth century French mathematician and philosopher. Computer language with an anal-retentive personality disorder. Used mostly in university courses, to teach discipline through torture to rebellious computer science students.
10. LISP: So-called `thinking man's language of artificial intelligence'. Poor input-output capabilities. The programming language is a sea of parentheses, which will drive you crazy.
11. Solder: Alloy of lead and tin, used for connecting wires on a circuit board. Favorite programming language of the old-timers.

14. M LANGUAGE EMPTY STRING IN ARRAYS.
1. An empty string ("") represents the first element in the M (=MUMPS) collating sequence.
2. All other values sequentially follow the empty string.
3. However, the empty string cannot be added to an array. It is only used as the starting and ending argument value for the $O(rder) or $Q(uery) functions.

15. M (FORMERLY, MUMPS) AS A DEVELOPMENT LANGUAGE.
1. MUMPS: Massachusetts General Hospital Utility Multi-programming System.
2. Developed by Neil Pappalardo, Robert Greenes, and Curt Marble in G. Octo Barnett's laboratory in the late 1960s.
3. Excellent language for utility programs, particularly involving sorting and string manipulation.
4. Promoters of M claim that M requires only 8 to 12% of the development time required by other programming languages.

16. M (FORMERLY, MUMPS) FOR INDEXING TEXT.
1. M has especially powerful capabilities for string manipulation, sparse data arrays, and implicit sorting, all used heavily by the indexing software in hospital information systems, as well as the indexing software used by the `Lightning Hypertext'.
2. M also has a unique capability for modifying its own code `on the fly', i.e., an M computer program can rewrite itself during execution, based on data received or calculations.
3. M is not particularly good for making `brute force' repetitive numerical calculations, such as predicting lunar orbits or making quantum mechanics estimations.

17. M ROUTINE.
1. A `routine' is an M program module.
2. To create a software application in M, the programmer creates and edits one or more routines.
3. One routine can call another routine, using a DO or GOTO command.
4. The programmer writes a routine in `source code'.
5. Source code is compiled by the system to obtain faster-running `object code' in machine language.

18. M JOB OR PROCESS.
1. M was originally designed as a multitasking system, with multiple `jobs' or `processes' running simultaneously.
2. Each user `owns' his/her own keyboard and monitor, and controls an `active job' running on that keyboard-and-monitor.
3. Any number of additional `background jobs' may be running simultaneously.
4. Actually, the computer allots short time intervals to each job, and executes a small part of each job in a round-robin-sequence.
5. Each job occupies a unique memory area, called a `partition', and has its own, unique `process identification', contained in the $JOB variable.
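
The round-robin idea in item 4 can be sketched in Python with generators standing in for jobs. This is a cooperative toy in which each job hands control back voluntarily after one unit of work; an M system switches between jobs without their cooperation. The names job and round_robin are hypothetical.

```python
from collections import deque

def job(name, steps):
    """A toy job that yields control after each small unit of work."""
    for i in range(1, steps + 1):
        yield f"{name}:{i}"

def round_robin(jobs):
    """Give each job one time slice in turn until every job has finished."""
    queue = deque(jobs)
    trace = []
    while queue:
        current = queue.popleft()
        try:
            trace.append(next(current))  # run one slice of this job
            queue.append(current)        # then send it to the back of the line
        except StopIteration:
            pass                         # job finished; drop it from the queue
    return trace

trace = round_robin([job("A", 2), job("B", 3), job("C", 1)])
print(trace)
```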

19. M LOCAL AND GLOBAL VARIABLES.
1. An M variable is a named item, for example, A(1)=2. In this example, `1' is the `argument' and `2' is the `value'. Arguments and values may be either numeric or character-strings.
2. M variables may be `local' or `global'.
3. A `local variable' is a named memory item, specific to the job which created it. It disappears when the job terminates.
4. A `global variable' is a named database record, accessible to all jobs running in the system.
5. A global variable may be seized by a LOCK command. Then any other job that issues a LOCK on that global variable must wait until the first job unlocks it.
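
The effect of seizing a shared item before updating it can be sketched in Python, with threads standing in for M jobs and a dictionary standing in for a global variable. The names shared_global, global_lock, and update_job are hypothetical, and a thread lock is only an analogy for the M LOCK command.

```python
import threading

shared_global = {"COUNT": 0}     # stands in for a global variable
global_lock = threading.Lock()   # stands in for LOCK on that variable

def update_job(increments):
    """Add to the shared counter, seizing the lock around each update."""
    for _ in range(increments):
        with global_lock:        # other jobs wait here until the lock is released
            shared_global["COUNT"] += 1

jobs = [threading.Thread(target=update_job, args=(1000,)) for _ in range(4)]
for t in jobs:
    t.start()
for t in jobs:
    t.join()

print(shared_global["COUNT"])
```

Every job must take the lock for the protection to work; a job that updates the shared item without asking for the lock is not held back by it.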

20. M DEVICES.
1. All sequential input/output is directed through `logical devices', numbered from 0 through 255.
2. Devices include the user's keyboard (input device 0), the video-monitor (output device 0), printer ports, RS-232 serial ports, floppy-disk and hard-disk files, etc.
3. The OPEN and CLOSE commands turn a device on or off.
4. The USE command directs input/output to a particular device.
5. The READ and WRITE commands perform input and output through the device which has most recently been USEd.

21. M (FORMERLY, MUMPS) AS A COMPUTER LANGUAGE.
1. M has only a handful of operators, commands, and functions, from which highly sophisticated programs can be written.
2. If the programmer memorizes only 30 terms, he or she can begin writing serviceable M applications.
3. M does not check for correct syntax of a statement until the program actually tries to execute that statement.
4. This feature is handy for `quick and dirty' debugging and for rapid prototyping, because one is not forced to write an entirely syntactically correct program in order to test early versions of the program.

22. M VARIABLE-ARRAYS.
1. There is only one variable-type in M: the character-string.
2. A variable may be single or an array. An `array' has a `name', one or more `arguments', and a `value'. For example, `A(1,2,3)=17' is one element from an array named `A()', with arguments 1,2,3 and value 17.
3. M arrays have the form of a hierarchical tree. For example:
A(1)=2
A(1,2)=5
A(1,2,3)=17
A(1,2,6)=14
4. Unlike FORTRAN or BASIC, M does not require a DIMENSION statement for preassigning storage space for a variable. If you `SET A(1,2,3)=17', then A(1,2,3) exists; if you `KILL A(1,2,3)', then A(1,2,3) ceases to exist.
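
A sparse, hierarchical array of this kind can be imitated in Python with a dictionary keyed by tuples of arguments. The sketch assumes the standard M behavior that KILL removes a node together with every node beneath it; the names tree, m_set, and m_kill are hypothetical.

```python
tree = {}  # only the nodes that have been SET occupy any storage

def m_set(node, value):
    """Create or overwrite one node, e.g. m_set((1, 2, 3), 17)."""
    tree[tuple(node)] = value

def m_kill(node):
    """Remove the node and every descendant node beneath it."""
    prefix = tuple(node)
    for key in [k for k in tree if k[:len(prefix)] == prefix]:
        del tree[key]

m_set((1,), 2)
m_set((1, 2), 5)
m_set((1, 2, 3), 17)
m_set((1, 2, 6), 14)

m_kill((1, 2, 3))
print(sorted(tree))   # A(1,2,3) is gone; its siblings and parents remain

m_kill((1, 2))
print(sorted(tree))   # A(1,2) and everything below it are gone
```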

23. M OPERATORS.
There are only a few operators, commands, and functions necessary to write a functional M program. The operators are:
          + - plus                     ' - not
          - - minus                    & - and
          * - times                    ! - or
          / - divided by               : - if
          \ - divided by,              ; - end-command-line
               truncate to integer
                                       [ - contains
                                       ] - follows
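
A few of these operators behave differently from their counterparts in other languages. The Python sketch below assumes the standard M rule, not stated in the table, that expressions are evaluated strictly left to right with no operator precedence. The function m_eval is hypothetical and handles only unsigned numbers with + - * / and \ (no parentheses).

```python
import re

def m_eval(expr):
    """Evaluate a simple numeric expression strictly left to right."""
    tokens = re.findall(r"\d+(?:\.\d+)?|[-+*/\\]", expr)
    result = float(tokens[0])
    for op, operand in zip(tokens[1::2], tokens[2::2]):
        value = float(operand)
        if op == "+":
            result += value
        elif op == "-":
            result -= value
        elif op == "*":
            result *= value
        elif op == "/":
            result /= value
        elif op == "\\":
            result = float(int(result / value))  # divide, truncate to integer
    return result

print(m_eval("2+3*4"))        # left to right: (2+3)*4, not 2+(3*4)
print(m_eval("7\\2"))         # the M expression 7\2
print("ET" in "VETERAN")      # [ contains
print("VETERAN" > "VETERAM")  # ] follows
```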


24. M COMMANDS.
Important M commands include:
          B - Break.                    O - Open.
          C - Close.                    Q - Quit.
          D - Do.                       R - Read.
          E - Else.                     S - Set.
          F - For.                      U - Use.
          G - Go to.                    W - Write.
          H - Halt.                     X - eXecute.
          I - If.
          K - Kill.


25. M FUNCTIONS.
Important M functions include:
          $A - ASCII.                  $O - Order.
          $C - Character.              $P - Piece.
          $E - Extract.                $R - Random.
          $F - Find.                   $S - Select.
          $G - Get.                    $T - Test (timeout/truth).
          $H - Horolog (=clock).
          $J - Justify.
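
Two of the string functions are used constantly in patient-record work and are easy to imitate. The Python sketch below follows the usual behavior of $P(iece) and $E(xtract), both of which count from 1. The function names piece and extract are hypothetical, and edge cases such as an empty delimiter are not handled.

```python
def piece(string, delimiter, n=1):
    """Like $PIECE: the n-th delimiter-separated field, counting from 1."""
    parts = string.split(delimiter)
    return parts[n - 1] if 1 <= n <= len(parts) else ""

def extract(string, start=1, end=None):
    """Like $EXTRACT: characters start through end inclusive, counting from 1."""
    if end is None:
        end = start
    return string[max(start, 1) - 1:end]

name = "VETERAN,JOHN Q"
print(piece(name, ",", 1))   # family name
print(piece(name, ",", 2))   # given name and initial
print(extract(name, 1, 3))   # first three characters
```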


26. ANSI/ISO STANDARD M LIMITS.
1. ANSI = American National Standards Institute. ISO = International Organization for Standardization. MUMPS is an ANSI/ISO standardized computer language, with published language standards.
2. A string may not exceed 255 characters. In some commercial products, the string-limit may be much longer.
3. The length of the variable-name plus argument-string may not exceed 127 characters. For example, the following string has 58 characters: ARRAY("the quick brown fox","jumped over","the lazy dogs")
4. Every number is a valid argument. Every string containing only ASCII values between 32 and 126 is a valid argument. Numeric calculations are supported between 10^-25 and 10^25. Numeric precision is 12 decimal digits.
5. Variable names may be any length, but they must differ from one another within the first 8 letters. Variable names must start with % or an alphabetic letter (upper-case or lower-case).
6. Source code for each routine may not exceed 5 Kilobytes.
7. Maximum storage for local variables may not exceed 5 Kilobytes.

27. MGLOBAL M DEVICE NUMBERS.
1. For MGlobal M, the device numbers are as follows:
 0  ... The user's own keyboard and monitor.
 1  ... System console.
 2  ... Parallel printer 1 (LPT1).
 3  ... Parallel printer 2 (LPT2).
 5  ... MS-DOS file access channel 1.
 6  ... MS-DOS file access channel 2.
 8  ... Serial port 1.
 9  ... Serial port 2.
2. Different vendors have different device-number assignments, or allow the user to define his/her own device-numbers.

28. MGLOBAL M PROGRAM EDIT COMMANDS.
1. These edit commands are specific for MGlobal M. These commands can be used in programmer mode, i.e., in the ZE editor.
 B  ... Breaks the current line into 2 lines at the cursor.
 E  ... Moves the cursor to the end of the current line.
 .I ... Inserts lines when the cursor is put at the beginning of the line
            where you want the lines inserted.
 .Q ... Quit without filing.
 .F ... File (save) the program.
 .R ... Remove line from program.
2. Other M vendors have other editing commands.

29. CONTROLLING M ROUTINES DURING EXECUTION.
1. Ctrl-S or Ctrl-B suspends the routine.
2. Ctrl-Q starts it up again.
3. Ctrl-C interrupts a routine so that you can't continue it again.



REFERENCES.


1. Berman JJ, Moore GW, et al.
The Lightning Hypertext of Disease.
http://www.pathinfo.com

2. Brown DG, Brown G, Goldstein M.
Introduction to CCS MUMPS.
COMP Computing, Inc. 1601 Westheimer, Suite 201, Houston, Texas 77006, 1985;:1-99.

3. DataTree, Inc.
DataTree MUMPS-PC System Overview, v. 4.2.
DataTree, Inc., 300 Fifth Ave, Waltham, MA 02154 1-617-890-1620, 1991;:10-13.

4. DataTree, Inc.
DataTree MUMPS Language Reference, v. 4.2.
DataTree, Inc., 300 Fifth Ave, Waltham, MA 02154 1-617-890-1620, 1991;:16-17.

5. Walters RF, Bowie J, Wilcox JC.
Mumps Primer. M Technology Association,
1738 Elton Road, Suite 205, Silver Spring, MD 20903-1725.

6. Kirsten W.
Von ANS MUMPS zu ISO/M.
epsilon Verlag, Darmstadt Hochheim. 1993;:47-84.

7. Dvorak JC.
Inside track.
PC Magazine. 1991 May 28;:83.

8. Free MUMPS/Fileman, at URL:
http://www.hardhats.org
For a free, single-user demo version of VistA/MUMPS, go to the above website. At the top of the home webpage, you will see:
Search | HOME | MUMPS | Fileman | .......
Click on MUMPS. Scroll down to:
Learn M and FM v21 for free.
Click on Download, and save the file, fm_ws.zip. Use PKUNZIP.EXE, in order to UNZIP the ZIPped file, fm_ws.zip.
 Directory:
                  
 02/23/2005  11:03 AM              .
 02/23/2005  11:03 AM              ..
 01/26/1999  09:34 AM        10,240,000 FILEMAN.M
 01/24/1999  02:29 AM           159,744 FM.EXE
 02/23/2005  11:00 AM         3,210,667 fm_ws.zip
 01/25/1999  03:13 PM         2,450,704 MSMWS002.DLL
 02/01/1993  01:04 AM            28,959 PKUNZIP.EXE
 01/26/1999  09:49 AM               287 README.TXT
                6 File(s)     16,090,361 bytes
                2 Dir(s)  22,820,491,264 bytes free
                            
 type readme.txt
 The FM.EXE file runs the VA's File Manager program from the data stored
 in the Fileman.m file.  A limitation of this current edition, is that the
 three files (msmws002.dll, fm.exe and Fileman.m) must be stored in the
 following path:
                            
  C:\Program Files\Micronetics\MSMWS\Program\




AUTHOR AND EFFECTIVE DATE.


Date last revised: 1/5/2004, Dong H. Lee, MD.

Signature and date approved:

Chief, Pathology and Laboratory Medicine Service (113)

Chief, Anatomic Pathology Section

Supervisor, Histology and Cytology