The Johns Hopkins Autopsy aRchive
Information System (JHARIS).
PROCEDURE MANUAL.
DRAFT COPY ONLY.
6/8/2006.
http://www.netautopsy.org/jharispm.htm

G. William Moore, M.D., Ph.D.,
Grover M. Hutchins, M.D.



Program Source Code: http://www.netautopsy.org/jharis.cgi
Procedure Manual: http://www.netautopsy.org/jharispm.htm
JHAR Bibliography: http://www.netautopsy.org/jharpubl.htm


1. DISCLAIMER.



United States Government Work, uncopyrighted, public-domain, DRAFT COPY ONLY. This document does not necessarily represent the views or policies of any United States Government agency. This document is provided "as is", without warranty of any kind, express or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose and non-infringement. In no event shall the authors be liable for any claim, damages or other liability, whether in an action of contract, tort or otherwise, arising from, out of, or in connection with the document or the use or other dealings made with the document.

2. TABLE OF CONTENTS.



1. Disclaimer.
2. Table of Contents.
3. Introduction.
4. Getting Started.
5. Re-installation.
6. Autopsy Facesheet Format.
7. Output Report Format.
8. Moby Dick.
9. Barrier Words.
10. Perl Source Code.
11. MUMPS Source Code.
12. File Transfer Protocol (FTP).
13. References.
14. Glossary.

3. INTRODUCTION.



3.1. The Johns Hopkins Autopsy aRchive Information System (JHARIS) is a collection of computer programs that indexes autopsy facesheets for the first 50,000 autopsies listed in the files of The Johns Hopkins Autopsy Resource. This instruction manual and all associated computer programs described herein are in the public domain, and may be copied freely. Academic citation is desirable as a courtesy, but not required. The actual autopsy facesheet files, and the index to these files, are unavailable to the public.

The design of the system is very simple in principle. There are three datafiles: jharuprt.txt, i.e., the text-listing (upright file) of all autopsy facesheets; jharkeyy.txt, i.e., autopsy demographics; and jharindx.txt, i.e., the autopsy facesheets indexed by word. The upright text file, jharuprt.txt, and autopsy demographics, jharkeyy.txt, are supplied as raw data; and the index file, jharindx.txt, is generated by the MUMPS database system (vide infra).

To illustrate the mechanics of the computer programs, an example is employed using Herman Melville's Moby Dick, which is published and in the public domain. The materials used in the present manual are from Project Gutenberg. There is a parallel upright file, named mobyuprt.txt; and a parallel index file, named mobyindx.txt.

4. GETTING STARTED.



4.1. You should have a folder-icon, named c:\jharis, already installed on your c:-drive, that contains at least the following sixteen files:
1. jharis.cgi (Perl program to perform search).
2. jharispm.htm (this manual).
3. jharuprt.txt (upright autopsy facesheets).
4. jharindx.txt (autopsy facesheets indexed by word).
5. jharkeyy.txt (autopsy demographics).
6. jharrprt.htm (search output report file).
7. perl.exe. (Perl compiler).
8. perl100.dll. (Perl compiler).
9. perlglob.exe. (Perl compiler).
10. mumps.exe. (MUMPS compiler).
11. mumpscfg.txt. (MUMPS compiler).
12. mumpsglb.dat. (MUMPS compiler).
13. ws_ftp.exe (File_transfer_protocol program).
14. ws_ftp.ext (File_transfer_protocol program).
15. ws_ftp.hlp (File_transfer_protocol program).
16. ws_ftp.ini (File_transfer_protocol program, including password).
If one or more of these files are either missing or corrupted, then all files, except for files jharuprt.txt, jharkeyy.txt, jharindx.txt, and ws_ftp.ini, may be downloaded from the internet. If the confidential files, namely, jharuprt.txt (upright autopsy facesheets), jharkeyy.txt (autopsy demographics), jharindx.txt (autopsy facesheets indexed by word), or ws_ftp.ini (password to internet download), are missing, then these files must be reloaded from offline backup copies on CD-ROM.

You should have two additional icons on your desktop: c:\jharis\jharispm.htm, which displays this procedure manual; and c:\jharis\jharrprt.htm, which displays the search-report.

4.2. To execute a search, click on the folder-icon, JHARIS. A black screen with a white cursor will appear. Enter:
perl jharis.cgi
You will be prompted:
Enter up to twenty search-words:
Please enter search-word ==>
Please enter search-word ==>....

Enter search line distance ==>
List Excluded Cases? N ==>

Please enter EXCLUDED sex ==>
Please enter EXCLUDED race ==>
Please enter EXCLUDED lower age in years ==>
Please enter EXCLUDED upper age in years ==>
Please enter EXCLUDED lower autopsy number ==>
Please enter EXCLUDED upper autopsy number ==>
For example, you may use ADENOSQUAMOUS as a search term. The final report may be viewed and printed by clicking on the JHARRPRT icon.

The program works by first finding all index numbers that begin with the FIRST SEARCH WORD. If more than one search word is used, then it is advisable to enter the LEAST COMMON WORD as the first search-word, so that the program doesn't waste a lot of time and compute-cycles thrashing through undesired cases. For example, if one desires ADENOSQUAMOUS CARCINOMA, then ADENOSQUAMOUS (8 occurrences) should be the first search term, NOT CARCINOMA (5248 occurrences).

A search word may be a leading substring; for example, the search-word UTER captures all cases containing one-or-more occurrences of uterine, uteroabdominal, uterocolonic, uteroileal, uteropelvic, uteroperitoneal, uteroplacental, uteroplasty, uteropyelonephritis, uterorectal, uterosigmoidostomy, uteroureterostomy, uterovesical, uterus, as well as a few misspellings.
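The leading-substring rule can be illustrated with a minimal Perl sketch. It assumes index lines in the "### word case case ..." layout shown in Section 6.3. The subroutine name cases_for_prefix and the sample index lines are hypothetical and are not taken from jharis.cgi.

```perl
use strict;
use warnings;

# Hypothetical helper: collect the case numbers of every index word
# that begins with the given prefix (case-insensitive).
sub cases_for_prefix {
    my ($prefix, @index_lines) = @_;
    my %seen;
    for my $line (@index_lines) {
        # Expected layout: "### word case case ..."
        my ($marker, $word, @cases) = split ' ', $line;
        next unless defined $word && $marker eq '###';
        # Keep the word only if the prefix sits at position 0.
        next unless index(lc($word), lc($prefix)) == 0;
        $seen{$_} = 1 for @cases;
    }
    return sort { $a <=> $b } keys %seen;
}

# Made-up index lines, for illustration only.
my @sample_index = (
    '### parauterine 10003',
    '### uterine 10001 10007',
    '### uterus 10007 10042',
);
print join(' ', cases_for_prefix('UTER', @sample_index)), "\n";
```

In this sketch the word parauterine is not captured, because UTER occurs inside the word rather than at its start.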

The prompt Enter search line distance ==> allows the user to limit multiple-word searches. For example, a search for SCLEROSIS MULTIPLE yields 1818 cases, of which only 37 cases are relevant for additional review. The search may be limited by restricting the printout so that SCLEROSIS and MULTIPLE must occur on the same line, i.e., Enter search line distance ==> 0. For search terms at most n lines apart, Enter search line distance ==> n. The default search_line_distance is 999,999.
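The idea of a line distance can be sketched in a few lines of Perl. This is a simplified two-word illustration, not the routine used in jharis.cgi; the subroutine name min_line_distance and the sample facesheet lines are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: smallest distance, in lines, between any line that
# contains $word_a and any line that contains $word_b.
# Returns undef if either word is absent.
sub min_line_distance {
    my ($word_a, $word_b, @lines) = @_;
    my (@at_a, @at_b);
    for my $i (0 .. $#lines) {
        my $lc = lc $lines[$i];
        push @at_a, $i if index($lc, lc($word_a)) >= 0;
        push @at_b, $i if index($lc, lc($word_b)) >= 0;
    }
    return undef unless @at_a && @at_b;
    my $best;
    for my $i (@at_a) {
        for my $j (@at_b) {
            my $d = abs($i - $j);
            $best = $d if !defined $best || $d < $best;
        }
    }
    return $best;
}

# Made-up facesheet lines, for illustration only.
my @facesheet = (
    'Bronchopneumonia, bilateral.',
    'Multiple sclerosis, cerebrum and spinal cord.',
    'Arteriosclerosis, generalized.',
    'Multiple pulmonary emboli.',
);
my $distance = min_line_distance('sclerosis', 'multiple', @facesheet);
print "minimum line distance: $distance\n";
```

A case would pass a limit of n when the returned distance is at most n; a limit of 0 keeps only cases in which both words share a line.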

You may EXCLUDE sex (M or F); race (B or W); and any lower or upper range of age in years or autopsy number.

If one wishes to display the case-numbers for excluded cases, then one should answer the prompt as follows: List Excluded Cases? N ==> YES. The default is NO.
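The exclusion rules can be pictured with a short Perl sketch. Following the comparisons in the JHARIS listing in Section 10.5, a case is dropped when its age or autopsy number falls below the lower bound or above the upper bound, or when its sex or race equals the excluded value. The subroutine name exclusion_reasons and the hash keys are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the list of reasons a case is excluded.
# An empty list means the case is kept.
sub exclusion_reasons {
    my ($case, $filter) = @_;
    my @why;
    push @why, 'SEX'
        if defined $filter->{sex} && lc($filter->{sex}) eq lc($case->{sex});
    push @why, 'RACE'
        if defined $filter->{race} && lc($filter->{race}) eq lc($case->{race});
    push @why, 'LOWER AGE'
        if defined $filter->{lower_age} && $case->{age} < $filter->{lower_age};
    push @why, 'UPPER AGE'
        if defined $filter->{upper_age} && $case->{age} > $filter->{upper_age};
    push @why, 'LOWER AUTOPSY NUMBER'
        if defined $filter->{lower_aun} && $case->{aun} < $filter->{lower_aun};
    push @why, 'UPPER AUTOPSY NUMBER'
        if defined $filter->{upper_aun} && $case->{aun} > $filter->{upper_aun};
    return @why;
}

# Example: exclude females, and keep only ages 18 through 60.
my %case   = (aun => 12345, age => 64, race => 'W', sex => 'M');
my %filter = (sex => 'F', lower_age => 18, upper_age => 60);
my @reasons = exclusion_reasons(\%case, \%filter);
print @reasons ? "excluded: @reasons\n" : "kept\n";
```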

5. RE-INSTALLATION.



5.1. In the event of a disk-drive crash or other problem on your computer, you should be able to re-install JHARIS from either the back-up CD-ROM or else from an on-line source. If the on-line source is available:
 Click on the JHARIS icon on your desktop.
 The black MS-DOS screen will appear.
 Enter
                ws_ftp
 followed by <ENTER>
 The Session Profile Box will appear.
 Click on OK.
             
 A four-part dialog box should appear.
 Click on OK, upper right corner.
                   
 If there is a JHMI security box, just click on OK to get rid of it.
                   
 In the upper-right corner, roll down the box, and double-click on
                *****************
 In the upper-right corner, double-click on
                *****************
 In the lower-RIGHT corner, double-click on each of the filenames,
 to transfer them into your computer.
                   
 When you are finished, exit the dialog box by clicking X
 in the upper-right corner.
             
 Click back into the JHARIS icon, and enter
               perl jharis.cgi
 followed by <ENTER>
5.2. If you are starting completely from scratch, then you should set up a special workspace on your computer, named c:\jharis, available from the Microsoft® Desktop® as a Command Prompt icon (i.e., MS-DOS window). Proceed as follows:
  1.  Click on: Start (lower left corner of Microsoft® Desktop®).
  2.  Click on: All programs.
  3.  Click on: Accessories.
  4.  RIGHT-CLICK on Command Prompt.
  5.  Click on: Send to.
  6.  Click on: Desktop.
  7.  RIGHT-CLICK on Command Prompt.
  8.  Click on: Rename. Rename as jharis.
  9.  RIGHT-CLICK on JHARIS
  10.  Click on: Properties
  11.  Click on: Shortcut (tab at top of window).
  12.  Click on: Start in. Enter: c:\jharis
  13.  Click on: Apply (lower right corner of window).
  14.  Click on: OPTIONS (tab at top of window).
  15.  Cursor size: LARGE
  16.  Buffer size: 50
  17.  Number of Buffers: 4
  18.  Display options: FULL SCREEN
  19.  Edit options: X Quick Edit. X Insert.
  20.  Click on: Apply (lower right corner of window).
  21.  Click on: FONTS (tab at top of window).
  22.  Size: 36
  23.  Lucida console.
  24.  Click on: Apply (lower right corner of window).
  25.  Click on: LAYOUT (tab at top of window).
  26.  Screen buffer size: Width 80, Height 25.
  27.  Screen buffer window: Width 80, Height 25.
  28.  Click on: OK (lower left corner of window).
  29.  Click on: Apply (lower left corner of window).


6. AUTOPSY FACESHEET FORMAT.



6.1. There are three datafiles used by JHARIS: Upright autopsy facesheets are stored in file: c:\jharis\jharuprt.txt. Autopsy demographics are stored in file: c:\jharis\jharkeyy.txt. Autopsy facesheets indexed by word are stored in file: c:\jharis\jharindx.txt.

All lines of all three data files begin with CARRIAGERETURN LINEFEED (i.e., ASCII 13, ASCII 10: MS-DOS style, not Unix style). The first line of each autopsy upright facesheet begins with ##### followed by a 5-digit autopsy number, with leading zeros if necessary. For example, autopsy upright facesheet for autopsy number 02345:
#####02345
  A/D: 
  Acute pleuritis, right lung
  Acute and chronic inflammation, right lung, partially intrabronchial.
  Pulmonary edema, marked.
  Acute bronchitis and pneumonitis, bilateral lungs.
  Acute and chronic inflammation, diffuse, right kidney.
  Arteriosclerosis, generalized.
  Arteriosclerotic nephritis, left and right kidneys.
  Cholelithiasis, cholesterol stones.
  Colonic diverticulosis, moderate, descending and sigmoid colon.
  Acute emphysematous cellulitis, left arm. 
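As a rough illustration of this layout, the following Perl sketch splits upright text into individual cases at each ##### marker. The subroutine name parse_upright is hypothetical, and the second sample case number (02346) is invented; jharis.cgi itself reads the file in fixed-size chunks instead.

```perl
use strict;
use warnings;

# Hypothetical helper: split upright text into a hash of
# autopsy number => reference to the list of facesheet lines.
sub parse_upright {
    my ($text) = @_;
    my %cases;
    my $current;
    for my $line (split /\r?\n/, $text) {
        if ($line =~ /^#####(\d{5})/) {
            # A new case starts: remember its 5-digit autopsy number.
            $current = $1;
            $cases{$current} = [];
        }
        elsif (defined $current && $line =~ /\S/) {
            # Strip the indentation and trailing blanks of a facesheet line.
            (my $trimmed = $line) =~ s/^\s+|\s+$//g;
            push @{ $cases{$current} }, $trimmed;
        }
    }
    return %cases;
}

my $sample = "\r\n#####02345\r\n  A/D: \r\n  Acute pleuritis, right lung\r\n"
           . "\r\n#####02346\r\n  Cholelithiasis, cholesterol stones.\r\n";
my %cases = parse_upright($sample);
for my $aun (sort keys %cases) {
    printf "%s: %d line(s)\n", $aun, scalar @{ $cases{$aun} };
}
```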
6.2. Autopsy demographics are stored in file: c:\jharis\jharkeyy.txt. For example:

 54321^^1234567890123^12345^64^W^M^1966^1
is the demographic line for case 12345, who is a 64-year-old White Male.
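A minimal Perl sketch of reading such a line is given here. The field positions (3 for autopsy number, 4 for age, 5 for race, 6 for sex, counting from zero) are those read by the JHARIS listing in Section 10.5; the meaning of the remaining fields is not described in this manual, so the sketch ignores them. The subroutine name parse_demographics is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: extract the documented fields from one
# caret-delimited demographic line.
sub parse_demographics {
    my ($line) = @_;
    $line =~ s/^\s+|\s+$//g;          # trim surrounding blanks
    my @field = split /\^/, $line;    # the caret is the field separator
    return {
        aun  => $field[3] + 0,        # autopsy number
        age  => $field[4] + 0,        # age in years
        race => uc($field[5]),
        sex  => uc($field[6]),
    };
}

my $rec = parse_demographics(' 54321^^1234567890123^12345^64^W^M^1966^1');
print "case $rec->{aun}: age $rec->{age}, race $rec->{race}, sex $rec->{sex}\n";
```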

6.3. Autopsy facesheets indexed by word are stored in file: c:\jharis\jharindx.txt. For example:

### aaa 12345 12346 12360 12361
### aabc 67890
### aamc 12346
### abandoned 23456
### abandonment 34567
### abatement 45678
### abating 98765
### abbott 87654
### abbreviated 76543
### abdomen 12357 12349 65432 66445 67456 68467 69478...............
### abdominal 25432 35545 45656 55767 65878
### abdominis 54532
### abdomino 34532 36833 37934 38135 39236 ...............
### abdominopelvic 65432 
### abdominoperineal 54332 
### abdominoperitoneal 43732 
### abdominothoracic 46732 47732 
### abdominovesical 53833 63934 ...............
In this example, the word aaa appears in cases 12345 12346 12360 12361; the word abandoned appears in case 23456; the word abandonment appears in case 34567, etc.
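To make the word-to-case relationship concrete, the following Perl sketch builds index lines of this shape from a few facesheets. It is only an illustration: the real jharindx.txt is generated by the MUMPS database system, and this sketch makes no attempt to remove barrier words. The subroutine name build_index_lines and the sample facesheets are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: build "### word case case ..." lines from a hash of
# autopsy number => facesheet text.
sub build_index_lines {
    my (%facesheet) = @_;
    my %cases_for;
    for my $aun (keys %facesheet) {
        # Every run of letters counts as one word, lower-cased.
        for my $word (lc($facesheet{$aun}) =~ /[a-z]+/g) {
            $cases_for{$word}{$aun} = 1;
        }
    }
    my @lines;
    for my $word (sort keys %cases_for) {
        my @auns = sort keys %{ $cases_for{$word} };
        push @lines, "### $word @auns";
    }
    return @lines;
}

# Made-up facesheets, for illustration only.
my %demo = (
    '12345' => 'Abdominal aortic aneurysm.',
    '12346' => 'Aneurysm, abdominal aorta.',
);
print "$_\n" for build_index_lines(%demo);
```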

7. OUTPUT REPORT FORMAT.



7.1. The final output report is stored in file: c:\jharis\jharrprt.htm, and is written in HyperText Markup Language (HTML). HTML is the presentation language of the Internet. These HTML files may reside transparently anywhere in cyberspace, or be restricted to one's own personal computer.

It is easy to learn just enough HTML, from which one can build simple files for presentation. One becomes more fluent in HTML as one examines the source code for attractive files already available on the Internet. To see the source code for any HTML file displayed on your computer monitor, click on ...........

According to legend, the internet began as a telephone connection between the MIT Laboratory of Artificial Intelligence and the Pepsi vending machine in an adjoining building, so that the computer students didn't have to leave their desks if the Pepsi vending machine was empty. The Pepsi vending machine even included a thermocouple, so that the computer students would know that their Pepsi would be cold.

7.2. The enormous growth of the internet over the past decade is largely due to the WORLDWIDE WEB, an inexpensive means for exchanging information on the internet. Originally, the internet consisted of an uninterrupted telephone connection between the main computer and the smaller, client computer, and was limited by the number of telephone connections that the main computer could manage at the same time. This was an inefficient arrangement, since computers run much faster than

The worldwide web is designed like a chess master that plays many simultaneous games of chess with a room full of amateurs. The master spends a few moments at each amateur chessboard, then moves to the next amateur chessboard. On the worldwide web, each client computer is continuously connected to the main computer; but the main computer only processes information from a particular client momentarily, then moves on to the next client. When the client clicks on SUBMIT, then the main computer processes the information, sends a file back to the client, and then moves on to other clients. The currency of these transactions is HTML. The main computer sends an HTML file to the client; the client clicks on SUBMIT; then the main computer sends an appropriate response back to the client as another HTML file, etc. The purpose of a Perl program on the main computer is to receive a client SUBMIssion, and to build a return HTML file.

8. MOBY DICK.



8.1. Moby Dick is a nineteenth-century American classic, studied by most U. S. high school students. ................

8.2. Public domain document, broken into xxxx, numbered paragraphs. http://www.netautopsy.org/mobyuprt.txt

8.3. The book is approximately the size of 1000 JHMI autopsy facesheets.

9. BARRIER WORDS.



9.1. STOPWORDS or BARRIER WORDS are very frequent words, typically articles, prepositions, and auxiliary verbs, that are unimportant for indexing major concepts. The concept of very frequent but unimportant words was first introduced by biblical scholars in the 19th century, with the publication of James Strong's [1822-1894] Exhaustive Concordance of the Bible (1890). The concept of word frequencies was further developed by French telegraphic engineer Émile Baudot [1845-1903]; and by E. U. Condon (1928). Prof. George K. Zipf [1902-1950], a Harvard professor of philology, proposed that these high-frequency words satisfied the "Principle of Least Effort". These words are spoken often because it takes less effort to say short words. If words are listed in descending order by frequency, then the most frequent word has rank 1; the second-most frequent word has rank 2, etc. Zipf's First Law is the assertion that f is inversely proportional to r, where f is word frequency and r is word rank. For the first 50,000 autopsy facesheets of The Johns Hopkins Autopsy Resource, the word distribution is as follows:

[Figure: Zipf's First Law. 50,000 Autopsy Facesheets. The Johns Hopkins Autopsy Resource.]

According to Zipf's First Law, the log-log plot of the word distribution for 50,000 Autopsy Facesheets should be a straight line:

[Figure: Zipf's First Law. 50,000 Autopsy Facesheets. The Johns Hopkins Autopsy Resource.]

[Figure: Zipf Distribution. 50,000 Autopsy Facesheets. The Johns Hopkins Autopsy Resource.]
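The rank-frequency relation can be tabulated with a short Perl sketch. If f is inversely proportional to r, then the product r*f stays roughly constant, and log(f) plotted against log(r) falls on a line of slope -1. The subroutine name rank_frequency is hypothetical, and the toy sentence is far too short to exhibit the law; it only shows the bookkeeping.

```perl
use strict;
use warnings;

# Hypothetical helper: return [rank, word, frequency] triples,
# most frequent word first (ties broken alphabetically).
sub rank_frequency {
    my ($text) = @_;
    my %freq;
    $freq{$_}++ for lc($text) =~ /[a-z]+/g;
    my @words = sort { $freq{$b} <=> $freq{$a} || $a cmp $b } keys %freq;
    my $rank = 0;
    return map { [ ++$rank, $_, $freq{$_} ] } @words;
}

my $text = 'the whale and the sea and the ship';
for my $row (rank_frequency($text)) {
    my ($rank, $word, $f) = @$row;
    # Print the quantities that a log-log plot would use.
    printf "%2d  %-6s f=%d  r*f=%d  log(r)=%.2f  log(f)=%.2f\n",
        $rank, $word, $f, $rank * $f, log($rank), log($f);
}
```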




In 1988, Tersmette et al. employed the barrier word method for rapid identification of short noun phrases in free text (vide infra).

From: http://ii.nlm.nih.gov/MTI/barrier.shtml
"The barrier word method is a fast way of identifying short noun phrases in free text. The text is parsed into sentences, where a sentence is computed as a set of words beginning with a capital letter and delimited by terminating punctuation. A potential nominal phrase is computed as a sequence of words occurring between barrier words, which are derived from a set of stopwords including articles, prepositions, and verbs. For example, consider the text: The local anesthetic bupivacaine is cardiotoxic when accidentally injected into the circulation. The set of barrier words might be used to identify local anesthetic bupivacaine, cardiotoxic, and circulation as nominal phrases. While this method has been used for some time, the use of a very long list of barrier words (approximately 24,000) was found to be much more effective in identifying nominal phrases in text than the traditional shorter lists."
JHARIS Barrier Words:
http://www.netautopsy.org/jharisbw.htm
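The barrier word method described in the quotation can be sketched in Perl as follows. The six-word barrier list is a toy list chosen to fit the quoted example sentence; it is not the JHARIS barrier word list. The subroutine name noun_phrases is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: split a sentence into candidate noun phrases,
# using barrier words as the break points.
sub noun_phrases {
    my ($sentence, @barrier_words) = @_;
    my %is_barrier;
    $is_barrier{ lc($_) } = 1 for @barrier_words;
    my (@phrases, @current);
    for my $word (lc($sentence) =~ /[a-z]+/g) {
        if ($is_barrier{$word}) {
            # A barrier word closes the phrase collected so far.
            push @phrases, join(' ', @current) if @current;
            @current = ();
        }
        else {
            push @current, $word;
        }
    }
    push @phrases, join(' ', @current) if @current;
    return @phrases;
}

# Toy barrier list, for illustration only.
my @barriers = qw(the is when accidentally injected into);
my $sentence = 'The local anesthetic bupivacaine is cardiotoxic when '
             . 'accidentally injected into the circulation.';
print "$_\n" for noun_phrases($sentence, @barriers);
```

With this toy list the sketch prints the three nominal phrases named in the quotation: local anesthetic bupivacaine, cardiotoxic, and circulation.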

10. PERL SOURCE CODE.



10.1. One of the most versatile web resource-languages is Perl, which is widely available on the worldwide web cost-free. Perl has very powerful character-string manipulation features. As with learning English as a second language, it's easy to learn just enough Perl to handle a few elementary problems; in-depth understanding can take a lifetime. The JHARIS Perl source code uses only thirty commands. They are:
X=Y ... Active equal. This expression means: set X equal to the value of Y.
X==Y ... Conditional equal. This expression is TRUE iff X equals Y.
X>Y ... Conditional greater_than. This expression is TRUE iff X is greater than Y.
X<Y ... Conditional less_than. This expression is TRUE iff X is less than Y.
$X ... Variable named X.
@X ... Array named X.
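$X[0] ... First element in array @X. Perl arrays are indexed from zero; JHARIS stores its search words starting at $search[1].
$X[1] ... Second element in array @X, ....
print "Now is the time" ... Print "Now is the time" onto the computer screen.
print FILE "Now is the time" ... Print "Now is the time" into the file named FILE.
open(FILE,$X) ... Open the file whose name is stored in $X, using the filehandle FILE.
close(FILE) ... Close the file named FILE.
binmode FILE ... Treat the file FILE as raw bytes, with no end-of-line translation.
seek(FILE,N,0) ... Move to byte position N, counted from the start of the file FILE.
tell(FILE) ... Report the current byte position in the file FILE.
split(/P/,$X) ... Break the character-string $X into a list of pieces at each occurrence of the pattern P.
join(S,LIST) ... Concatenate the items of LIST into one character-string, separated by S.
substr($X,N,L) ... The substring of $X that starts at offset N (counting from zero) and has length L.
$line=<FILE> ... Read the next record from the file FILE into $line.
chop($X) ... Remove the last character from the character-string $X.
if(TRUEFALSE){___} ... If expression TRUEFALSE is true, then perform the operations ___.
while(TRUEFALSE){___} ... While expression TRUEFALSE is true, repeat the operations ___.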
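Several of these commands can be seen working together in a short sketch. The sample index line is taken from the format shown in Section 6.3; the variable names are chosen for this illustration only.

```perl
use strict;
use warnings;

my $line = "### abdominal 25432 35545 45656\n";
chop($line);                              # remove the trailing newline
my @piece  = split(/ /, $line);           # "###", "abdominal", "25432", ...
my $word   = $piece[1];                   # second element of the array
my $prefix = substr($word, 0, 4);         # first four characters: "abdo"

# Walk through the case numbers, which start at $piece[2].
my @cases;
my $i = 2;
while ($i < @piece) {
    push @cases, $piece[$i];
    $i++;
}

if ($prefix eq 'abdo') {
    print join(',', @cases), "\n";        # prints 25432,35545,45656
}
```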




10.2. Perl is nicely explained at URL:
http://virtual.park.uga.edu/humcomp/perl/perl5.html
You may download a cost-free copy of Perl at:
http://www.activestate.com/Products/ActivePerl/?_x=1
This website can also be found on google.com, search on: PERL DOWNLOAD. When you get to the ActiveState.com website, click on FREE DOWNLOAD. Follow the download and installation instructions. Under Microsoft® Windows®, I recommend the MSI version, which allows itself to be uninstalled if you turn out not to like the product. Follow the instructions on the installation wizard.

10.3. As an example, you may use the following sample Perl source code, named hllowrld.pl:
#!/usr/bin/perl
print "Content-type: text/html\n\n";
###
### PRINT PAGE HEADER.
print qq| <html><head><title>HELLO WORLD.</title></head><body>|;
print qq|\n <!-- Last modified: 6/2/2006, G. William Moore, MD, PhD.-->|;
print qq|\n <h2><center>HELLO WORLD.</center></h2>|;
###
### END JOB.
print qq|\n<br><hr> Last modified: 6/2/2006, G. William Moore, MD, PhD. |;
print qq|\n <br></body></html>\n\n |; exit;
In order to run this sample program, SELECT this text by dragging your mouse over the text while left-mouse-clicked. Then EDIT/COPY the text into your clipboard, and click on your NOTEPAD ICON. Now EDIT/PASTE the clipboard contents into the notepad file, and SAVE_AS... the file as hllowrld.pl. Click on START from your desktop. Click on RUN. Enter perl -w hllowrld.pl into the dialog box. In Windows, the resulting HTML page flashes quickly onto the screen. In MS-DOS, you can watch the screen at your leisure.

10.4. The Perl source code for JHARIS uses the following six general functionalities:
1. Housekeeping: start, end, comments.

2. Variables, arrays.

3. Arithmetic: ordinary, Boolean.

4. Files, including <STDIN>.

5. while(){} -loops, if(){} -statements.

6. Character-string functions: substr(,,) , split(,) , join(,) .
10.4.1. HOUSEKEEPING: START, END, COMMENTS.
 #!/usr/bin/perl
 print "Content-type: text/html\n\n";
 ### INSERT PROGRAM AND COMMENTS HERE.....
 exit;


10.5. Perl source code for JHARIS: The following three files: jharuprt.txt (upright autopsy facesheets), jharkeyy.txt (autopsy demographics), and jharindx.txt (autopsy facesheets indexed by word), are the data-input for execution for the Perl program, jharis.cgi
#!/usr/bin/perl
print "Content-type: text/html\n\n";
### 
### JOHNS HOPKINS AUTOPSY ARCHIVE INFORMATION SYSTEM.
### THIS DOCUMENT HAS NO OFFICIAL STATUS.
###                
### DISCLAIMER. United States Government Work,
### uncopyrighted, public-domain, DRAFT COPY ONLY.
### This document does not necessarily represent the views
### or policies of any United States Government agency.
### This document is provided "as is", without warranty of any kind,
### express or implied, including but not limited to the warranties
### of merchantability, fitness for a particular purpose and
### non-infringement. In no event shall the authors be liable
### for any  claim, damages or other liability, whether in an
### action of contract, tort or otherwise, arising from, out of,
### or in connection with the document or the use or other dealings
### made with the document.
###                       
### CONTROL CHARACTERS AND CONSTANTS.
 $carriagereturn=chr(13); $linefeed=chr(10); $bksp=chr(32);
 $crlf=join('',$carriagereturn,$linefeed);
 $crlfps="$crlf#"; $crlfpss="$crlf### ";
 $crlfpsfv="$crlf#####"; $psfv="#####"; $upar=chr(94);
###                   
### OPEN JHAR REPORT FILE.
 $nshowaut=0;
 $jharrprt=">jharrprt.htm"; open(JHARRPRT,$jharrprt);
### 
### PRINT HEADER.
 $prln="<html><head><title>JHARIS: Johns Hopkins Autopsy aRchive, 6/2/2006.</title></head><body>";
 print $prln; print JHARRPRT $prln; 
 $jharuprt="jharuprt.txt"; $sizeuprt= -s $jharuprt ;
 $prln="\n<!-- Last modified: 6/2/2006, G. William Moore, MD, PhD.-->";
 print $prln; print JHARRPRT $prln; 
 $prln="<h2><center>JHARIS: Johns Hopkins<br>Autopsy aRchive<br>Information System.";
 print $prln; print JHARRPRT $prln; 
 $prln="\n<br><a href=\"http://www.netautopsy.org/gwmpmbio.htm\">G. William Moore, MD, PhD.</a>";
 print $prln; print JHARRPRT $prln; 
 $prln="\n<br><a href=\"http://www.netautopsy.org/gmhpmbio.htm\">Grover M. Hutchins, MD.</a>";
 print $prln; print JHARRPRT $prln; 
 $prln="\n<br>6/2/2006. "; print $prln; print JHARRPRT $prln; 
 $prln="</center></h2>"; print $prln; print JHARRPRT $prln; 
 $prln="<br><br><big><b>Program Source Code: <a href=\"http://www.netautopsy.org/jharis.cgi\">
 http://www.netautopsy.org/jharis.cgi </a> ";
 print $prln; print JHARRPRT $prln; 
 $prln="<br>Procedure Manual: <a href=\"http://www.netautopsy.org/jharispm.htm\">
 http://www.netautopsy.org/jharispm.htm </a></b></big> ";
 print $prln; print JHARRPRT $prln; 
### 
### ASK FOR SEARCH WORDS.
 $nask=0; $nsearch=0; print "\n Enter up to twenty search words.";
 print "\n Enter the least common search word first:";
 while($nask<20){$nask++;  print "\n Please enter search word ==> ";
   $inputline=<STDIN>; chop($inputline); $linl=length($inputline);
   $lcin=lc($inputline); if($linl>2){$nsearch++; $search[$nsearch]=$lcin;};
   if($linl<3){$nask=99999;};};
### 
### IF NO SEARCH WORDS ENTERED, JOB TERMINATED.
 if($nsearch<1){$prln="\n NO SEARCH WORDS ENTERED. JOB TERMINATED.";
   print $prln; print JHARRPRT $prln; 
   $prln="\n<br><hr> Last modified: 6/2/2006, G. William Moore, MD, PhD.";
   print $prln; print JHARRPRT $prln; 
   $prln="\n <br></body></html>\n\n";
   print $prln; print JHARRPRT $prln; close(JHARRPRT); exit;};
### 
### ASK FOR SEARCH LINE DISTANCE.
 $linedistmax=1000000;
 if($nsearch>1){print "\n Enter search line distance ==> ";
   $inputline=<STDIN>; chop($inputline); $linl=length($inputline);
   if($linl>0){$linedistmax=$inputline-0+1;};};
 $ldm=$linedistmax-1;
### 
### LIST EXCLUDED CASES?
 $showsw=0; print "\n List excluded cases?  N ==> ";
 $inputline=<STDIN>; chop($inputline); $linl=length($inputline);
 $lcshow=lc($inputline); $sublcshow=substr($lcshow,0,1);
 if($linl>0){if($sublcshow eq "y"){$showsw=1;};};
 if($showsw>0){$prln="\n<br><b> Excluded cases listed. </b>";
   print $prln; print JHARRPRT $prln;};
 if($showsw<1){$prln="\n<br><b> Excluded cases not listed. </b>";
   print $prln; print JHARRPRT $prln;};
### 
### LIST SEARCH WORDS ENTERED.
 if($nsearch>0){$prln="\n<br><big> Search words entered: <b>";
   print $prln; print JHARRPRT $prln; $isearch=0;
   while($isearch<$nsearch){$isearch++; $ucsearch=uc($search[$isearch]);
     $prln=" $ucsearch "; print $prln; print JHARRPRT $prln;};
   $prln="</b></big>"; print $prln; print JHARRPRT $prln;
   if($nsearch>1){$prln="\n<br><b> Search line distance: $ldm </b>";
    print $prln; print JHARRPRT $prln;};};
### 
### ASK FOR EXCLUDED SEX.
 print "\n Please enter EXCLUDED sex ==> "; $inputline=<STDIN>;
 chop($inputline); $linl=length($inputline);
 $lcin=lc($inputline); $frlcin=substr($lcin,0,1);
 $exclsx="x"; if($frlcin eq "m"){$exclsx="m";};
 if($frlcin eq "f"){$exclsx="f";};
 if($exclsx eq "x"){$prln="\n<br><b> Both sexes are included. </b>";
   print $prln; print JHARRPRT $prln;};
 if($exclsx eq "m"){$prln="\n<br><b> Males are excluded. </b>";
   print $prln; print JHARRPRT $prln;};
 if($exclsx eq "f"){$prln="\n<br><b> Females are excluded. </b>";
   print $prln; print JHARRPRT $prln;};
### 
### ASK FOR EXCLUDED RACE.
 print "\n Please enter EXCLUDED race (W,B) ==> "; $inputline=<STDIN>;
 chop($inputline); $linl=length($inputline);
 $lcin=lc($inputline); $frlcin=substr($lcin,0,1);
 $exclrc="u"; if($frlcin eq "w"){$exclrc="w";};
 if($frlcin eq "b"){$exclrc="b";};
 if($exclrc eq "u"){$prln="\n<br><b> All races are included. </b>";
   print $prln; print JHARRPRT $prln;};
 if($exclrc eq "b"){$prln="\n<br><b> Blacks are excluded. </b>";
   print $prln; print JHARRPRT $prln;};
 if($exclrc eq "w"){$prln="\n<br><b> Whites are excluded. </b>";
   print $prln; print JHARRPRT $prln;};
### 
### ASK FOR EXCLUDED LOWER AGE.
 $lowerage=-1;
 print "\n Please enter EXCLUDED LOWER age in years ==> "; $inputline=<STDIN>;
 chop($inputline); $linl=length($inputline);
 if($linl>0){$lowerage=$inputline-0;};
 $prln="\n<br><b> Excluded lower age in years: $lowerage.</b>";
 print $prln; print JHARRPRT $prln;
### 
### ASK FOR EXCLUDED UPPER AGE.
 $upperage=200;
 print "\n Please enter EXCLUDED UPPER age in years ==> "; $inputline=<STDIN>;
 chop($inputline); $linl=length($inputline);
 if($linl>0){$upperage=$inputline-0;};
 $prln="\n<br><b> Excluded upper age in years: $upperage.</b>";
 print $prln; print JHARRPRT $prln;
### 
### ASK FOR EXCLUDED LOWER AUTOPSY NUMBER.
 $loweraun=-1;
 print "\n Please enter EXCLUDED LOWER autopsy number ==> ";
 $inputline=<STDIN>; chop($inputline); $linl=length($inputline);
 if($linl>0){$loweraun=$inputline-0;};
 $prln="\n<br><b> Excluded lower autopsy number: $loweraun.</b>";
 print $prln; print JHARRPRT $prln;
### 
### ASK FOR EXCLUDED UPPER AUTOPSY NUMBER.
 $upperaun=99999;
 print "\n Please enter EXCLUDED UPPER autopsy number ==> ";
 $inputline=<STDIN>; chop($inputline); $linl=length($inputline);
 if($linl>0){$upperaun=$inputline-0;};
 $prln="\n<br><b> Excluded upper autopsy number: $upperaun.</b>";
 print $prln; print JHARRPRT $prln;
### 
### INITIALIZE HIT ARRAY.
 $iaunhit=-1; while($iaunhit<50001){$iaunhit++;
   $aunhit[$iaunhit]=0; $thishit[$iaunhit]=0; $casepoint[$iaunhit]=0;};
### 
### LOAD DEMOGRAPHICS.
 print "\n Loading demographics....";
 $jharkeyy="jharkeyy.txt"; $sizejhar= -s $jharkeyy ;
 open(JHARKEYY,$jharkeyy); binmode JHARKEYY; seek(JHARKEYY,0,0);
 $/=$crlf; $idemo=0; $mychunk=<JHARKEYY>; chop($mychunk);
 while($idemo<59999){$idemo++; $mychunk=<JHARKEYY>; chop($mychunk);
 @uparspl=split(/\^/,$mychunk); $nuparspl=@uparspl; $aun=$uparspl[3]-0;
 if($aun<1){$idemo=999999;};
 if($aun>0){$age[$aun]=$uparspl[4]-0; $sex[$aun]=lc($uparspl[6]);
   $raceaun=lc($uparspl[5]); if($raceaun eq ""){$raceaun="u";};
   $race[$aun]=$raceaun;};};
 close(JHARKEYY);
### 
### OPEN JHARINDX FILE.
 $identifier=join('',$crlfpss,$search[1]); $searchchunk=$search[1];
 $jharindx="jharindx.txt"; $sizejhar= -s $jharindx ;
 open(JHARINDX,$jharindx); binmode JHARINDX; seek(JHARINDX,0,0);
 $/=$identifier; $mychunk=<JHARINDX>; $tellmy=tell(JHARINDX);
### 
### EXAMINE NEXT 300KB OF JHARINDX FILE FOR CASE NUMBERS.
 seek(JHARINDX,$tellmy,0); read(JHARINDX,$scalar,300000);
 @chkspl=split(/$crlfpss/,$scalar); $nchkspl=@chkspl; $ichkspl=-1;
 while($ichkspl<$nchkspl){$ichkspl++; $linebase=$chkspl[$ichkspl];
   @srchht=split(/$searchchunk/,$linebase); $nsrchht=@srchht;
   @linspl=split(/$bksp/,$linebase); $nlinspl=@linspl; $jlinspl=0;
   while($jlinspl<$nlinspl){$jlinspl++; $linepiece=$linspl[$jlinspl];
### 
### COLLECT NEXT AUTOPSY NUMBER.
     if($jlinspl<$nlinspl){$linenum=$linepiece-0;
       if(($ichkspl<1)||($nsrchht>1)){$aunhit[$linenum]=1;};};
### 
### STOP WHEN INDEX TERMS RUN OUT.
     if(($ichkspl>1)&&($nsrchht<2)){$ichkspl=2*$nchkspl;};};};
### 
### COUNT NUMBER OF HITS.
 $nrhit=0; $iaunhit=0;
 while($iaunhit<50001){$iaunhit++;
   if($aunhit[$iaunhit]>0){$nrhit++; $aunhit[$iaunhit]=$nrhit;
     $thishit[$nrhit]=$iaunhit;};};
### 
### CLOSE JHARINDX FILE.
 close(JHARINDX);
### 
### HEADER FOR CASE NUMBERS.
 $prln="\n<br><br><big><b>Raw Autopsy Numbers ($nrhit total): ";
 print $prln; print JHARRPRT $prln;
 $ithishit=0;
### 
### LIST CASE NUMBERS.
 while($ithishit<$nrhit){$ithishit++; $authishit=$thishit[$ithishit];
   $prln=" $authishit "; print $prln; print JHARRPRT $prln;};
 $prln="\n</b></big> "; print $prln; print JHARRPRT $prln; 
### 
### OPEN UPRIGHT FILE, $jharuprt="jharuprt.txt".
 $jharuprt="jharuprt.txt"; $sizeuprt= -s $jharuprt;
 open(JHARUPRT,$jharuprt); binmode JHARUPRT; seek(JHARUPRT,0,0);
 $iseekr=0; $multseeknr=5;
### 
### SKIP THROUGH UPRIGHT FILE IN CHUNKS OF 10KB.
 while($iseekr<4300){$iseekr++; seek(JHARUPRT,$multseeknr,0);
   read(JHARUPRT,$scalar,10150);
   @uparspl=split(/$crlfpsfv/,$scalar); $nuparspl=@uparspl; $iuparspl=0;
   while($iuparspl<($nuparspl-1)){$iuparspl++;
     $uparspli=$uparspl[$iuparspl]; $lnuparspli=length($uparspli);
     if($lnuparspli>10){$subi=substr($uparspli,0,5); $isub=$subi-0;
       if($isub>0){$casepoint[$isub]=$multseeknr;};};};
   $multseeknr=$multseeknr+10000;};
### 
### ASSEMBLE AUTOPSY CASE HEADER.
 seek(JHARUPRT,0,0); $irhit=0;
 while($irhit<$nrhit){$irhit++; $iaunhit=$thishit[$irhit];
   $aun=$iaunhit; $aunr=$iaunhit; $excludecase=0; $excldemo=0;
   $multseeknr=$casepoint[$iaunhit]; $paun=100000+$iaunhit;
   $qaun=substr($paun,1,5); $crlfpsfvqaun=join('',$crlfpsfv,$qaun);
   $ithishit=$irhit; $icrspl=0;
### 
### PRINT AUTOPSY NUMBER.
   $prln1="\n<br><br><big><b> $irhit. Autopsy: $crlfpsfvqaun, ";
   $ageaunr=$age[$aunr]; $sexaunr=$sex[$aunr]; $ucsexaunr=uc($sexaunr);
   $raceaunr=$race[$aunr]; $ucraceaunr=uc($raceaunr);
### 
### PRINT AUTOPSY AGE, RACE, SEX.
   $prln2= "Age: $ageaunr, Race: $ucraceaunr, Sex: $ucsexaunr.";
### 
### ERROR MESSAGES FOR EXCLUSION BY DEMOGRAPHICS.
### 
### ERROR MESSAGE FOR LOWER BOUND AGE.
   if($lowerage>$ageaunr){$prln3=" LOWER AGE EXCLUDED.";
     $excludecase=1; $excldemo=1;};
### 
### ERROR MESSAGE FOR UPPER BOUND AGE.
   if($upperage<$ageaunr){$prln4=" UPPER AGE EXCLUDED.";
     $excludecase=1; $excldemo=1;};
### 
### ERROR MESSAGE FOR LOWER BOUND AUTOPSY NUMBER.
   if($loweraun>$aunr){$prln5=" LOWER AUTOPSY NUMBER EXCLUDED.";
     $excludecase=1; $excldemo=1;};
### 
### ERROR MESSAGE FOR UPPER BOUND AUTOPSY NUMBER.
   if($upperaun<$aunr){$prln6=" UPPER AUTOPSY NUMBER EXCLUDED.";
     $excludecase=1; $excldemo=1;};
### 
### ERROR MESSAGE FOR EXCLUDED SEX.
   if($exclsx ne "x"){if($exclsx eq $sexaunr){$prln7=" SEX EXCLUDED.";
       $excludecase=1; $excldemo=1;};};
### 
### ERROR MESSAGE FOR EXCLUDED RACE.
   if($exclrc ne "u"){if($exclrc eq $raceaunr){$prln8=" RACE EXCLUDED.";
       $excludecase=1; $excldemo=1;};};
   $prln9="</b></big>";
### 
### SEEK HIT AUTOPSY CASE FROM UPRIGHT FILE.
   seek(JHARUPRT,$multseeknr,0); read(JHARUPRT,$scalar,15000);
   @xparspl=split(/$crlfpsfvqaun/,$scalar); $nxparspl=@xparspl; $ixparspl=0;
   $thatcase=$xparspl[1]; @thatspl=split(/$crlfpsfv/,$thatcase);
   $aurpt=$thatspl[0]; $lcaurpt=lc($aurpt);
### 
### HIT ON CASE NUMBER.
   @crspl=split(/$crlf/,$aurpt); $ncrspl=@crspl;
### 
### INITIALIZE LINE NUMBER INDEX.
   $icrspl=-1;
   while($icrspl<$ncrspl){$icrspl++; $isearchw=0;
     while($isearchw<$nsearch){$isearchw++;
       $linhit[$icrspl][$isearchw]=0;};};
### 
### ITERATE LINE-BY-LINE.
   $icrspl=0;
   while($icrspl<$ncrspl){$icrspl++; $crspli=$crspl[$icrspl];
     $xcrspli=$crspli; $lccrspli=lc($crspli); $isearch=0;
### 
### HIGHLIGHT SEARCH WORDS.
     while($isearch<$nsearch){$isearch++; $search1=$search[$isearch];
       @lnspl=split(/$search1/,$lccrspli); $nlnspl=@lnspl;
       if($nlnspl>1){$xcrspli="<big><b>$crspli</b></big>";
         $linhit[$icrspl][$isearch]=1;};};
### 
### SET UP PRINT LINE.
     $prlbl[$icrspl]="\n<br> $xcrspli ";};
### 
### DETECT MISSING WORDS.
   $exclmiss=0; $isearchw=0;
   while($isearchw<$nsearch){$isearchw++; $icrspl=0; $scrspl=0;
     while($icrspl<$ncrspl){$icrspl++;
       if($linhit[$icrspl][$isearchw]==1){$scrspl++;};};
     if($scrspl<1){$exclmiss=1;};};
   if($exclmiss>0){$prlna="<big><b> MISSING WORD EXCLUDED.</b></big>";
     $excludecase=1;};
### 
### CALCULATE LINE DISTANCES.
   $excldist=1; if($exclmiss<1){$linxz=0; $excldist=0;
   if($nsearch>1){$excldist=1; $isearchw=0;
     while($isearchw<$nsearch){$isearchw++; $linxi=999999; $icrspl=0;
       while($icrspl<$ncrspl){$icrspl++;
         if($linhit[$icrspl][$isearchw]==1){$jsearchw=$isearchw;
           if($jsearchw<$nsearch){
             while($jsearchw<$nsearch){$jsearchw++; $jcrspl=0;
               while($jcrspl<$ncrspl){$jcrspl++; 
                 if($linhit[$jcrspl][$jsearchw]==1){$dcrspl=$icrspl-$jcrspl;
                   if($dcrspl<0){$ecrspl=-$dcrspl; $dcrspl=$ecrspl;};
                   if($dcrspl<$linxi){$linxi=$dcrspl;};};};};
             if($linxi>$linxz){$linxz=$linxi;};};};};};
     if($linxz<$linedistmax){$excldist=0;};
     if($excldist>0){$prlnb="<big><b> LINE DISTANCE EXCLUDED.</b></big>";
       $excludecase=1;};};};
### 
### PRINT UNEXCLUDED AUTOPSY, LINE-BY-LINE.
  if($showsw>0){
    print $prln1; print JHARRPRT $prln1;
    print $prln2; print JHARRPRT $prln2;
    if($lowerage>$ageaunr){
      print $prln3; print JHARRPRT $prln3;};
    if($upperage<$ageaunr){$prln4=" UPPER AGE EXCLUDED.";
      print $prln4; print JHARRPRT $prln4;};
    if($loweraun>$aunr){$prln5=" LOWER AUTOPSY NUMBER EXCLUDED.";
      print $prln5; print JHARRPRT $prln5;};
    if($upperaun<$aunr){$prln6=" UPPER AUTOPSY NUMBER EXCLUDED.";
      print $prln6; print JHARRPRT $prln6;};
    if($exclsx ne "x"){if($exclsx eq $sexaunr){$prln7=" SEX EXCLUDED.";
        print $prln7; print JHARRPRT $prln7;};};
    if($exclrc ne "u"){if($exclrc eq $raceaunr){$prln8=" RACE EXCLUDED.";
        print $prln8; print JHARRPRT $prln8;};};
    print $prln9; print JHARRPRT $prln9;
    if($exclmiss>0){print $prlna; print JHARRPRT $prlna;};
    if($excldist>0){print $prlnb; print JHARRPRT $prlnb;};};
  if($excludecase<1){if($showsw<1){
      print $prln1; print JHARRPRT $prln1;
      print $prln2; print JHARRPRT $prln2;
      print $prln9; print JHARRPRT $prln9;};
### 
### NEXT INCLUDED AUTOPSY.  PRINT INCLUSION COUNT.
    $nshowaut++; $icrspl=0;
    $prln=" <big><b> INCLUSION COUNT: $nshowaut </b></big> ";
    print $prln; print JHARRPRT $prln;
### 
### PRINT TEXT OF INCLUDED AUTOPSY REPORT.
    while($icrspl<$ncrspl){$icrspl++; $crspli=$crspl[$icrspl];
      $prln=$prlbl[$icrspl]; print $prln; print JHARRPRT $prln;};};};
### 
### CLOSE JHARUPRT REPORT FILE.
 close(JHARUPRT);
### 
### END JOB.
 $prln="\n<br><hr> Last modified: 6/2/2006, G. William Moore, MD, PhD.";
 print $prln; print JHARRPRT $prln; 
 $prln="\n <br></body></html>\n\n";
 print $prln; print JHARRPRT $prln; 
 close(JHARRPRT); exit;


11. MUMPS SOURCE CODE.



11.1. MUMPS is a medically-oriented computer language that is particularly suitable for handling large lists. The software is inexpensive ($100 for a single-user license) compared with its competitors (Microsoft® Access® and Oracle®), each of which costs at least a thousand dollars.

MUMPS has two important features which have not been effectively copied by many of its competitors: IMPLICIT SORT and PERSISTENT COOKIES.

11.2. IMPLICIT SORT means that information read into a MUMPS database is sorted as soon as the data enter the database, i.e., in "real time". There is no separate sort operation in MUMPS. This feature of MUMPS is not widely appreciated by vendors of cheap software, for reasons that I have never understood. For software used in hospital laboratories, it means that, when new data are produced in the laboratory, there is no waiting time before the data are placed in the right position, and are thus available to their clinician-consumers in other parts of the medical institution.

For the present application, the implicit sort feature means that the entire upright autopsy facesheet file may be read into a MUMPS persistent cookie (vide infra), and immediately read out for use by the JHARIS search program.
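
The effect of implicit sort can be imitated in Perl, the language of the JHARIS search program. The following is only a minimal sketch under stated assumptions: the hash %global and the function order_next are hypothetical names, loosely modeled on a MUMPS global and the $ORDER function. Unlike MUMPS, which keeps the data in sorted position as they arrive, this sketch re-sorts the keys on every call.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a MUMPS global: a plain Perl hash.
my %global;

# Entries arrive in arbitrary order, as lines of a file might.
$global{$_} = 1 for qw(pneumonia carcinoma infarct abscess);

# order_next: return the key that follows $key in sorted order, or ''
# when there is none; passing '' returns the first key.
# (Loosely modeled on $ORDER; an assumption, not the MUMPS internals.)
sub order_next {
    my ($hash, $key) = @_;
    for my $k (sort keys %$hash) {
        return $k if $key eq '' || $k gt $key;
    }
    return '';
}

# Walk the "global" from first key to last, always in sorted order.
my @walk;
my $key = '';
while (($key = order_next(\%global, $key)) ne '') {
    push @walk, $key;
}
print join(' ', @walk), "\n";
```

The walk visits the keys alphabetically, regardless of the order in which they were stored.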

11.3. In internet parlance, a COOKIE is an information packet, deposited on your computer by an external internet source or program, that contains structured information provided by the user. Typically, when you provide information to a business through the internet, such as your name, address, billing information, etc., this information is stored in a cookie. A TRANSIENT COOKIE survives in your computer memory only so long as the connection with the business is active. A PERSISTENT COOKIE is retained on your computer indefinitely, and must be intentionally removed by the user. Currently, it is illegal for some U. S. Federal Government internet sites to deposit persistent cookies on your computer without your explicit permission.

MUMPS allows the user to deposit persistent cookies, called GLOBALS, representing sorted lists of information. The name of each global always begins with ^ (ASCII 94, shift_6 on the keyboard). The MUMPS source code uses only ...... commands. They are:
=
$
OPEN
CLOSE
USE
$EXTRACT
$TRANSLATE
GOTO
$ORDER
QUIT
WRITE
READ
SET




11.4. The MUMPS indexing software proceeds in two steps. The first MUMPS-program, named JHARINDX, collects text-lines from the upright autopsy file, jharuprt.txt. A new autopsy number, AUN, is detected on each line beginning with #####nnnnn, where nnnnn is the five-digit autopsy number, with leading zeros if necessary. Otherwise, the text-line is processed as follows:
the line is dropped to lower-case;
punctuation and digits are translated to blank spaces;
words shorter than 3 letters are discarded;
barrier words are discarded; and
each remaining word is assigned to global (persistent) array, ^JHARINDX(WORD,AUN).
When the program runs to completion, all the elements of global array ^JHARINDX are implicitly sorted primarily in order of WORD and secondarily in order of AUN.
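
The indexing steps above can be sketched in Perl. This is an illustration only, not the JHARINDX program itself: the function build_index, the hash %barrier, and the three barrier words in it are hypothetical, and the sketch blanks every character that is not a lower-case letter, which is slightly broader than the punctuation list used by the MUMPS program.

```perl
use strict;
use warnings;

# Hypothetical barrier-word list; the real list resides in ^JHARISBW.
my %barrier = map { $_ => 1 } qw(the and with);

# Build an in-memory index, $index{WORD}{AUN}, from upright-file lines.
sub build_index {
    my @lines = @_;
    my %index;
    my $aun = 0;
    for my $line (@lines) {
        # A line beginning with #####nnnnn starts a new autopsy.
        if ($line =~ /^#####(\d{5})/) { $aun = $1 + 0; next; }
        my $text = lc $line;              # drop to lower-case
        $text =~ tr/a-z/ /c;              # everything but letters becomes a blank
        for my $word (split ' ', $text) {
            next if length($word) < 3;    # discard words shorter than 3 letters
            next if $barrier{$word};      # discard barrier words
            $index{$word}{$aun} = 1;      # record WORD under this autopsy number
        }
    }
    return \%index;
}

# Usage: two invented facesheet fragments.
my $index = build_index(
    "#####00001",
    "Carcinoma of the lung, with metastases.",
    "#####00002",
    "Pneumonia; carcinoma of colon.",
);
for my $word (sort keys %$index) {
    my @auns = sort { $a <=> $b } keys %{ $index->{$word} };
    print "$word: @auns\n";
}
```

Because a Perl hash is unordered, the sketch sorts explicitly when printing; in MUMPS the global ^JHARINDX is already in sorted order.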

The second MUMPS-program, named JHAROUTP, writes the contents of global array ^JHARINDX into an output file, named jharindx.txt. Each output line begins with ### wwww..., where wwww... is a word, followed by the autopsy numbers containing that word, separated by blanks. These three files: jharuprt.txt (upright autopsy facesheets), jharkeyy.txt (autopsy demographics), and jharindx.txt (autopsy facesheets indexed by word), are the input data for the Perl program, jharis.cgi (where .cgi denotes Common Gateway Interface).
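
The layout of one jharindx.txt line, as described above, can be sketched in Perl. The functions format_index_line and parse_index_line are hypothetical names chosen for this illustration, and the word and autopsy numbers are invented; the actual parsing in jharis.cgi may differ.

```perl
use strict;
use warnings;

# Format one index line: "### word " followed by autopsy numbers,
# each followed by a blank.
sub format_index_line {
    my ($word, @auns) = @_;
    return join('', '### ', $word, ' ',
                map { $_ . ' ' } sort { $a <=> $b } @auns);
}

# Parse such a line back into the word and its list of autopsy numbers.
# Returns an empty list if the line does not begin with "### ".
sub parse_index_line {
    my ($line) = @_;
    return unless $line =~ /^### (\S+)\s*(.*)$/;
    my ($word, $rest) = ($1, $2);
    return ($word, split(' ', $rest));
}

# Usage: round-trip one invented index entry.
my $line = format_index_line('carcinoma', 12, 3, 47);
my ($word, @auns) = parse_index_line($line);
print "$line\n";
print "$word occurs in ", scalar(@auns), " autopsies\n";
```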

11.5. Herewith is the source code for two MUMPS programs, written in MGlobal® MUMPS. Program JHARINDX builds a persistent cookie, or MUMPS global array, named ^JHARINDX, consisting of each word in the upright autopsy file, jharuprt.txt, followed by every autopsy number containing that word. Barrier words, residing in global ^JHARISBW, are not indexed.
 JHARINDX ; GWMOORE - INPUT JHAR CASES;14MAY06 8:04AM
 ENTRY S BK=" ",UA="^",PSG="#" ;
  ; INITIALIZE PUNCTUATIONS, BLANKS.
  S PUNC="0123456789~`!@#$%^&*()_+-={}[]|\:;""'<,>.?/" ;
  S BLNK="                                           " ;
  ; INITIALIZE UPPERCASE, LOWERCASE LETTERS.
  S UC="ABCDEFGHIJKLMNOPQRSTUVWXYZ",LC="abcdefghijklmnopqrstuvwxyz" ;
  S UCPC=UC_PUNC,LCBK=LC_BLNK ;
  ; INITIALIZE FILENAME, AUTOPSY NUMBER.
  S FILENAME="jharuprt.txt",AUN=0 ;
  ; OPEN INPUT FILE, FILENAME="jharuprt.txt".
 OPENF C 5 O 5:("FN":FILENAME,"FA":0) U 5 S $ZDS=0 ;
  ; READ THE NEXT LINE FROM INPUT FILE, jharuprt.txt.
 READL U 5 R X U 0 S LINE=X S FSL=$E(LINE,1,1) ;
  ; IF THE LINE BEGINS WITH #, THEN UPDATE AUTOPSY NUMBER, AUN.
  I (FSL=PSG) S AUN=+$E(LINE,6,10) U 0 W !,AUN G READL ;
  ; OTHERWISE, DROP TO LOWERCASE, AND CHANGE PUNCTUATION TO BLANKS.
  S TRL=$TR(LINE,UCPC,LCBK) S LBK=$L(TRL,BK),IBK=0 ;
  ; EXAMINE EACH STRING BOUNDED BY BLANKS.
 IBK S IBK=IBK+1 G:(IBK>LBK) READL S PBK=$P(TRL,BK,IBK),LPB=$L(PBK) ;
  ; DISCARD WORDS LESS THAN 3 LETTERS, DISCARD BARRIER WORDS.
  G:(LPB<3) IBK G:$D(^JHARISBW(PBK)) IBK ;
  ; UPDATE INDEX GLOBAL, ^JHARINDX(PBK,AUN).
  U 0 W BK,PBK S ^JHARINDX(PBK,AUN)="" G IBK ;
  ; EXECUTION COMPLETE ;
 EXIT Q  ; 
11.6. Program JHAROUTP downloads the index global, ^JHARINDX, as an output file, named jharindx.txt.
 JHAROUTP ; GWMOORE - OUTPUT JHAR CASES;14MAY06 8:05PM
 ENTRY S BK=" ",UA="^",PSG="#",PSBK="### " ;
  ; INITIALIZE OUTPUT FILENAME, jharindx.txt.
  S FILENAME="jharindx.txt",READO=-999999 ;
  ; OPEN OUTPUT FILE, FILENAME="jharindx.txt".
 OPENF C 5 O 5:("FN":FILENAME,"FA":2) U 5 S $ZDS=0 ;
  ; READ NEXT INDEX WORD.
 READO S READO=$O(^JHARINDX(READO)) G:(READO="") EXIT S READP=-999999 ;
  U 5 W !,PSBK,READO,BK U 0 W !,PSBK,READO,BK ;
  ; READ NEXT AUTOPSY NUMBER FOR THAT INDEX WORD.
 READP S READP=$O(^JHARINDX(READO,READP)) G:(READP="") READO ;
  ; WRITE AUTOPSY NUMBER TO OUTPUT FILE, THEN READ NEXT AUTOPSY NUMBER.
  U 5 W READP,BK G READP ;
  ; EXECUTION COMPLETE ;
 EXIT Q  ;


12. FILE TRANSFER PROTOCOL.



12.1. File Transfer Protocol (FTP) is a method for transferring updated files from the JHAR internet site to the user site. The FTP-program used by JHARIS, namely, WS_FTP, is a copyrighted product, © 1994-1995, John A. Junod; but it may be used cost-free by individual users and U. S. Government agencies for non-commercial purposes.

13. REFERENCES.



1. Aitchison J.
Teach Yourself Linguistics. Fifth Edition.
Chicago: NTC/Contemporary Publishing Co. 2000. ISBN: 0844226688.

2. Manning CD, Schuetze H.
Foundations of Statistical Natural Language Processing.
Cambridge, MA: The MIT Press. ISBN: 0262133601. 2000.

3. Moore GW, Boitnott JK, Miller RE, Eggleston JC, Hutchins GM.
Integrated anatomic pathology reporting system using natural language diagnoses.
Modern Pathol 1988;1:44-50.

4. Moore GW, Miller RE, Hutchins GM.
Indexing by MeSH titles of natural language pathology phrases identified on first encounter using the Barrier Word Method.
In: Scherrer JR, Cote RA, Mandil SH, eds. Computerized Natural Medical Language Processing for Knowledge Representation. North-Holland. 1989;29-39.

5. Nelson SJ, Olson NE, Fuller L, Tuttle MS, Cole WG, Sherertz DD.
Identifying concepts in medical knowledge.
Medinfo. 1995;8:33-36.

6. Tersmette KWF, Scott AF, Moore GW, Matheson NW, Miller RE.
Barrier word method for detecting molecular biology multiple word terms.
Proc 12th Annu Symp Comput Appl Med Care. 1988;12:.

7. Wong RL, Gaynon P.
An automated parsing routine for diagnostic statements of surgical pathology reports.
Methods Inf Med. 1971 Jul;10(3):168-175.

8. Wong RL, Reno JD, Hain TC, Platt RC, Gaynon PS, Joseph DM.
Profile of a dictionary compiled from scanning over one million words of surgical pathology narrative text.
Comput Biomed Res. 1980 Aug;13(4):382-398.

9. Zipf GK.
Human Behavior and The Principle of Least Effort. An Introduction to Human Ecology.
Reading, MA: Addison-Wesley Press. 1949:19-55.

10. USNLM Barrier Words.
http://ii.nlm.nih.gov/MTI/barrier.shtml
Site last tested: 6/3/2006.

11. JHARIS Barrier Words.
http://www.netautopsy.org/jharisbw.htm
Site last tested: 6/3/2006.

12. U. S. Department of Veterans Affairs.
Guide to Directive 6601 (VA webpages).
Glossary for the worldwide web is quite comprehensive and well-written.

13. Worldwide Web Consortium.
http://www.w3c.org/
The inventors and standards organization for the Worldwide Web. The Worldwide Web is based upon the principle.......
Site last tested: 6/3/2006.

13. Strong J.
The Exhaustive Concordance of the Bible.
Nashville, TN: Holman Bible Publishers. Undated.
ISBN 0-87981-626-0. 1340 pages.

p. 4. Directions and Explanations. "Forty-seven unimportant words of very frequent occurrence...."
 a     as    for   him   is    not   out   that  them  to    us    with
 an    be    from  his   it    O     shall the   they  unto  was   ye
 and   but   he    I     me    of    shalt thee  thou  up    we    you
 are   by    her   in    my    our   she   their thy   upon  were  


14. Strong J.
A Concise Dictionary of the Words in the Hebrew Bible.
Nashville, TN: Holman Bible Publishers. Undated.
ISBN 0-87981-626-0. 128 pages.
8674 Hebrew words, not including grammatical variants.

15. Strong J.
A Concise Dictionary of the Words in the Greek New Testament.
Nashville, TN: Holman Bible Publishers. Undated.
ISBN 0-87981-626-0. 79 pages.
5624 Greek words, not including grammatical variants.

16. Explanation of Perl.
http://virtual.park.uga.edu/humcomp/perl/perl5.html
Site last tested: 6/3/2006.

17. Download cost-free Perl software.
http://www.activestate.com/Products/ActivePerl/?_x=1
Site last tested: 6/3/2006.

18. Zipf GK.
Prof. George K. Zipf [1902-1950].
Site last tested: 6/3/2006.

19. Zipf GK.
Relative frequency as a determinant of phonetic change. Doctoral Thesis.
Harvard Studies in Classical Philology. 1929;40:1-95.

20. Zipf GK.
The Psychobiology of Language.
Boston: Houghton Mifflin. 1935.

21. Zipf GK.
Human Behavior and The Principle of Least Effort. An Introduction to Human Ecology.
Addison-Wesley Press. 1949:19-55.

22. Tersmette KWF, Scott AF, Moore GW, Matheson NW, Miller RE.
Barrier word method for detecting molecular biology multiple word terms.
Proc Annu Symp Comput Appl Med Care. 1988;12:.

23. Fedorowicz J.
A Zipfian model of an automatic bibliographic system: An application to MEDLINE.
J Am Soc Info Sci 1982;33:223-232.

24. Giere W.
Foundations of clinical data automation in cooperative programs.
Proc 5th Ann Symp Comp Applic Med Care. 1981;5:1142-1148.

25. Zhang Q.
Easy entry of Chinese character set symbols.
Proc 5th Ann Symp Comp Appl Med Care. 1981;5:143-149.

26. Condon EU.
Statistics of vocabulary.
Science 1928;67:300.

27. Moore GW, Boitnott JK, Miller RE, Eggleston JC, Hutchins GM.
Integrated pathology reporting, indexing, and retrieval system using natural language diagnoses.
Mod Pathol. 1988 Jan;1(1):44-50.
PMID: 3070549.
PubMed Entry
Site last tested: 6/3/2006.

28. Kanter I, Kessler DA.
Markov processes: Linguistics and Zipf's law.
Phys. Rev. Lett. (Print). 1995 May 29;74(22):4559-4562.
PMID: 10058537
PubMed Entry
Site last tested: 6/3/2006.

29. Tsonis AA, Elsner JB, Tsonis PA.
Is DNA a language?
J Theor Biol. 1997 Jan 7;184(1):25-29.
PMID: 9039397.
PubMed Entry
Site last tested: 6/3/2006.

30. Konopka AK, Martindale C.
Noncoding DNA, Zipf's law, and language.
Science. 1995 May 12;268(5212):789.
PMID: 7754361.
PubMed Entry
Site last tested: 6/3/2006.

31. Nelson SJ, Cole WG, Tuttle MS, Olson NE, Sherertz DD.
Recognizing new medical knowledge computationally.
Proc Annu Symp Comput Appl Med Care. 1993;17:409-413.
PMID: 8130505.
PubMed Entry
Site last tested: 6/3/2006.

32. Nelson SJ, Olson NE, Fuller L, Tuttle MS, Cole WG, Sherertz DD.
Identifying concepts in medical knowledge.
Medinfo. 1995;8 Pt 1:33-36.
PMID: 8591188.
PubMed Entry
Site last tested: 6/3/2006.

33. Estoup JB.
Gammes Stenographiques.
Paris: 1916.

34. Mandelbrot B.
Structure formelle des textes et communication.
Word 1954: 10:1-27.
" ... bien que la formule de Zipf donne l'allure generale des courbes, elle en represente tres mal les details .... " [Although Zipf's formula gives the general shape of the curves, it represents the details very badly.]
" ... lorsque Zipf essayait de representer tout par cette loi, il essayait d'habiller tout le monde avec des vetements d'une seule taille .... " [... when Zipf tried to represent everything with this law, he was trying to dress everybody in clothes of a single size.]

35. Li W.
References on Zipf's Law.
http://www.nslij-genetics.org/wli/zipf/
Site last tested: 6/3/2006.

36. Baudot É.
Émile Baudot [1845-1903] French telegraphic engineer.
Inventor of Baudot codes, similar to ASCII and Morse code.
Site last tested: 6/3/2006.

37. Melville H.
Moby Dick
New York: Bantam Books. 1967. First printing: 1851.
ISBN 0-553-21311-3, 593 pages.
Edited and with an introduction by Walcutt CC.

38. Hart M.
Project Gutenberg.
http://www.gutenberg.org
Site last tested: 6/3/2006.

39. Moore GW.
Anatomic Pathology Natural Language Processing.
http://www.netautopsy.org/natlngpr.htm
http://www.netautopsy.org/natlngpr.ppt
Presented at: The Johns Hopkins Medical Institutions, Preclinical Teaching Building 206B, December 6, 2005, 9:00-10:30 AM.
Site last tested: 6/3/2006.

40. File Transfer Protocol (FTP)


14. GLOSSARY.





File Transfer Protocol (FTP). Method for transferring files (text, image, multimedia, etc.) to an internet web host. The program used by JHARIS is WS_FTP, which is cost-free for non-commercial and U. S. Government use.

For additional information, send queries to the JHARIS administrator, at: George.Moore4@va.gov.

Last Updated: 6/8/2006, G. William Moore, MD, PhD.