ANATOMIC PATHOLOGY
NATURAL LANGUAGE PROCESSING.
DRAFT COPY ONLY.
2/9/2006.

G. William Moore, MD, PhD.

Departments of Pathology,
Baltimore Veterans Affairs Medical Center,
University of Maryland Medical System,
The Johns Hopkins Medical Institutions.

http://www.netautopsy.org/natlngpr.htm
http://www.netautopsy.org/natlngpr.ppt


Presented at: Preclinical Teaching Building 206B, December 6, 2005, 9:00-10:30 AM, for the course: Data, Information, and Knowledge (ME 600.701), Division of Health Science Informatics, The Johns Hopkins Medical Institutions, Baltimore, MD 21287.

Send comments and correspondence to: George.Moore4@va.gov
See also:
http://www.netautopsy.org/gwmcv.htm
http://www.netautopsy.org/vhpsapsx.htm
http://www.netautopsy.org/apdmchap.htm
http://www.netautopsy.org/jharzipf.htm

United States Government Work, uncopyrighted, public-domain, DRAFT COPY ONLY. This document does not necessarily represent the views or policies of any United States Government agency. This document is provided "as is", without warranty of any kind, express or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose, and noninfringement. In no event shall the authors be liable for any claim, damages or other liability, whether in an action of contract, tort or otherwise, arising from, out of, or in connection with the document or the use or other dealings made with the document.

CHAPTER 0. TABLE OF CONTENTS.

Chapter 1. Introduction.
Chapter 2. Linguistic Science.
Chapter 3. Rule-based Systems.
Chapter 4. Generative Linguistics.
Chapter 5. Artificial Intelligence.
Chapter 6. Basic Concepts of Linguistics.
Chapter 7. Competence Grammar.
Chapter 8. Ambiguity of Language.
Chapter 9. Corpus Linguistics. Introduction.
Chapter 10. Zipf's Laws.
Chapter 11. Collocations.
Chapter 12. Concordances.
Chapter 13. Mathematical Foundations.
Chapter 14. Statistics.
Chapter 15. General Linguistics.
Chapter 16. Phrase Structure Grammar.
Chapter 17. Context-Free Grammar.
Chapter 18. Dependency Grammar.
Chapter 19. Corpus Linguistics: Sources.
Chapter 20. Words and Phrases.
Chapter 21. Syntax.
Chapter 22. JHAR/JHSP Corpus.
Chapter 23. Statistical Inventory.
Chapter 24. Future of NLP in medicine.
Chapter 25. NLP Problems in Anatomic Pathology.
Chapter 26. References.
Chapter 27. Mini-histories.
Chapter 28. Glossary.

SLIDES FOR PRESENTATION


Chapter 1. Introduction.

1.1. Why NLP in medicine?
1.1.1. Copious computerized natural language information (terabytes annually).
1.1.2. Storage and organization of information is chaotic.
1.1.3. Standards are too loose to be useful (HL7).
1.1.4. Anatomic pathologists use free-text language precisely, because they are consultants with no direct patient contact.
1.1.5. Synoptic diagnoses: for billing and regulatory purposes.
1.1.6. NLP vs synoptic: the war is on.

Variant forms, same diagnosis.


Colon adenocarcinoma metastatic to lung.
Colonic adenocarcinoma metastatic to lung.
Large bowel adenocarcinoma metastatic to lung.
Large intestine adenocarcinoma metastatic to lung.
Large intestinal adenocarcinoma metastatic to lung.
Colon's adenocarcinoma metastatic to lung.
Adenocarcinoma of colon with metastasis to lung.
Adenocarcinoma of colon with lung metastasis.
Adenocarcinoma of colon with pulmonary metastasis.

Colon adenocarcinoma, metastatic to lung.
Colonic adenocarcinoma, metastatic to lung.
Large bowel adenocarcinoma, metastatic to lung.
Large intestine adenocarcinoma, metastatic to lung.
Large intestinal adenocarcinoma, metastatic to lung.
Colon's adenocarcinoma, metastatic to lung.
Adenocarcinoma of colon, with metastasis to lung.
Adenocarcinoma of colon, with lung metastasis.
Adenocarcinoma of colon, with pulmonary metastasis.
IT IS UNREASONABLE TO DEMAND...


...that a busy physician navigate through a hierarchy of pick-lists in order to write his/her report, as long as the report is:
1. Spelled correctly;
2. Grammatically correct;
3. Complete; and
4. Unambiguous.

HOWEVER, WHAT ABOUT A REPORT LIKE THIS?
UNDERSTANDABLE, YET NUMEROUS MISSPELLINGS.

http://www.mrc-cbu.cam.ac.uk/~mattd/Cmabrigde/
This is really weird. Can you raed tihs? Olny srmat poelpe can. I cdnuolt blveiee taht I cluod aulaclty uesdnatnrd waht I was rdanieg. The phaonmneal pweor of the hmuan mnid, aoccdrnig to a rscheearch at Cmabrigde Uinervtisy, it deosn't mttaer in waht oredr the ltteers in a wrod are, the olny iprmoatnt tihng is taht the frist and lsat ltteer be in the rghit pclae. The rset can be a taotl mses and you can sitll raed it wouthit a porbelm. This is bcuseae the huamn mnid deos not raed ervey lteter by istlef, but the wrod as a wlohe. Amzanig huh? yaeh and I awlyas tghuhot slpeling was ipmorantt! if you can raed tihs psas it on !!
Physicians can read this, but not computers. Thanks to Drs. James DeLeo and Larry Brown and to Mrs. Liz Dunbar for showing this to me.
Basics of Natural Language Processing.


1.2. Questions addressed by NLP:
1.2.1. What you say: syntax.
1.2.2. What it means: semantics.
1.3. Scope of medical NLP: in principle, all medical texts, especially text involved in billing or quality assurance.
1.4. Zipf's Law: f ∝ 1/r, for f=word frequency and r=word rank.
1.5. Anecdotal NLP: early scientific linguistic literature.
1.6. Statistical NLP: reach for the low-hanging fruit.
1.7. Medical NLP: All significant fruit should hang low.

Reach for the low-hanging fruit.




Tantalus. Hans Holbein the Younger [1497-1543]. U. S. National Gallery of Art, Washington, DC, USA.

Reach for the low-hanging fruit.




Adam und Eva. Vertreibung aus dem Paradies. Der Sündenfall. Lukas Cranach the Elder [1472-1553]. Kunsthistorisches Museum, Wien, Österreich.

Chapter 2. Linguistic Science.

2.1. Characterize and explain linguistic observations.
2.1.1. Conversation.
2.1.2. Writing.
2.1.3. Childhood development.
2.2. NLP in medicine.
2.2.1. Medical dictations (speech recognition).
2.2.2. Medical handwriting (ugh!).
2.2.3. Paper printed texts, scanned into computer.
2.2.4. Electronic medical record: Veterans Affairs Computerized Medical Record System (VA-CPRS).

Veterans Affairs
Computerized Medical Record System
(CPRS).



Veterans Affairs
Enterprise Reference Terminology.
The Veterans Health Administration (VHA) branch of the Department of Veterans Affairs, arguably the largest integrated healthcare provider in the United States, has computerized virtually all clinical transactions, including physician orders and documentation. VHA has undertaken an Enterprise Reference Terminology (ERT), designed to provide a terminology development environment, terminology services, and maintenance services for the clinical and business content in the Health Data Repository (HDR) and other VHA applications. The goal is for the ERT to encompass all HDR domains by 2008.
How did the VA enforce compliance?

1. The VA is a U. S. military organization, with top-down management.
2. There is a U. S. federal mandate for record exchangeability among VA hospitals and clinics nationwide.
3. Implementation by January 1, 2001.
4. No exceptions, no discussion.
5. Employees: timid federal bureaucrats.

Tower of Babel.




Tower of Babel. Pieter Brueghel [1520-1569]. Museum Boymans-van Beuningen, Rotterdam, The Netherlands.

Ancient Alphabets.


Phoenician alphabet (1000 BC):


Hebrew alphabet:
א ב ג ד ה ו ז ח ט י כ ל מ נ ס ע פ צ ק ר ש ת


Greek alphabet:
Α Β Γ Δ Ε Ζ Η Θ Ι Κ Λ Μ Ν Ξ Ο Π Ρ Σ Τ Υ Φ Χ Ψ Ω


Roman alphabet:
A B C D E F G H I K L M N O P Q R S T V X Y Z


Chapter 3. Rule-based systems.
3.1. Grammars in ancient civilizations: translation.
3.1.1. Ancient Phoenician/Hebrew: Tower of Babel, Pentecost.
3.1.2. Ancient Greco-Roman.
3.1.2.1. Everyone WANTED to learn Latin and abandon their local tongue.
3.1.2.2. Exception: Masada.

3.1.3. Ancient China: Qin Shi-Huang (260-210 BC):
3.1.3.1. Everyone REQUIRED to adopt the imperial ideograms, or else.
3.1.3.2. Execution of 460 scholars. (The Ten Crimes of Qin.)

Qin Shi-Huang [260-210 BC].
First Emperor of China.




Aristotle [384-322 BC].




3.2. Aristotelian Logic.

3.2.1. All Greeks are mortal; Socrates is a Greek; ....
3.2.2. Flaws in Greek logic: no inclusive-or, no empty-set ("zero").
3.2.3. Formalized in Boolean logic: inclusive-or; algebraic expressions.
3.2.4. Application: Boolean searches in PubMed: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi
3.2.5. Disadvantage: all-or-none reasoning: no room for probability.
3.2.6. No tense, no real subjunctive.
3.2.7. No tolerance of inconsistency: ex falso quodlibet.
3.2.8. No doxastic (belief), ontological (reality), deontic (moral obligation), alethic (necessity and possibility), intentional, or other variant logics.
3.2.9. Fuzzy logic.
3.2.10. Paraconsistency, modal logic.
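
The Boolean search operators mentioned above map directly onto set operations. The following Python sketch illustrates this with a hypothetical toy index; the terms and document numbers are invented for illustration and have nothing to do with how PubMed is actually implemented.

```python
# Hypothetical toy index: search term -> set of document identifiers.
INDEX = {
    "colon": {1, 2, 3},
    "adenocarcinoma": {2, 3, 4},
    "lung": {3, 5},
}

# AND: documents containing both terms (set intersection).
both = INDEX["colon"] & INDEX["adenocarcinoma"]

# Inclusive OR: documents containing either term or both (set union).
either = INDEX["colon"] | INDEX["lung"]

# AND NOT: documents containing the first term but not the second (set difference).
colon_not_lung = INDEX["colon"] - INDEX["lung"]

print(sorted(both), sorted(either), sorted(colon_not_lung))
```

Note the all-or-none character: a document is either in a result set or not, with no room for a probability of relevance.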
George Boole [1815-1864].




Col. John Shaw Billings, MD. [1838-1913].



U. S. Civil War Surgeon.
Creator of Index Medicus.
Father of PubMed.

Approach to Mathematical Grammars.


3.3. "All grammars leak" (Sapir, 1921).
3.3.1. Retreat in despair.
3.3.2. Characterize statistical properties.
3.3.3. Common patterns in language use.

3.4. Early scientific foundations.

3.4.1. Rationalist approach (1960s-1970s).
3.4.2. Chomsky: innate language facility.
3.4.3. Poverty of stimulus (does not explain language competence).
3.4.4. Assume, characterize the rule-base.
3.4.5. Generative grammars.

Prof. Noam Chomsky [1928-].




Father of modern computational linguistics.

3.5. Empirical Approach.

3.5.1. Active in 1980s.
3.5.2. Assume basic cognitive ability; deny tabula rasa.
3.5.3. Baby has general operations:

3.5.3.1. Association.
3.5.3.2. Pattern recognition.
3.5.3.3. Generalization.

3.5.4. Rich sensory input.
3.5.5. Deduce parameters for general model.
3.5.6. Corpus Linguistics.
3.5.7. American Structuralists.

3.5.7.1. Harris (1950s): know-nothing computer program.
3.5.7.2. Harris's student, Sager (1980s): NYU Linguistic String Project.


Chapter 4. Generative Linguistics.

4.1. Chomsky: describe the innate language (I-language).
4.2. Indirect evidence: text: E-language.
4.3. Linguistic competence: "competence grammar". Property of the speaker.
4.4. Linguistic performance: memory lapses, distractions, etc.
4.5. Medicine:
4.5.1. linguistic competence in the medical writer.
4.5.2. models of medical reports: ADASP in pathology.
4.5.3. performance: slow typist, time constraints, idiosyncratic abbrs.
4.5.4. JCAHO: forbidden abbreviations.


Chapter 5. Artificial Intelligence.

5.1. Build small systems that behave intelligently.
5.2. Criticized as "toy problems".
5.3. Language engineering: devoid of general principles.
5.4. Prof. Marvin Minsky [1927-], Father of Artificial Intelligence.

Prof. Marvin Minsky [1927-].




Father of artificial intelligence.

Chapter 6. Basic Concepts.

6.1. Fundamental Questions.
6.1.1. What do people say/write?
6.1.2. What do these utterances/writings say about the world?

6.2. Grammaticality.
6.3. Conventionality.
6.4. Ambiguity.
6.5. Corpus Sources. Brown corpus, Gutenberg, OMIM, JHAR, JHSP.
6.6. Zipf's Laws.
6.7. Collocations.
6.8. Concordances.

Chapter 7. Competence Grammar.

7.1. Property of the rational speaker.
7.2. Grammaticality alone admits weird sentences: "Colorless green ideas sleep furiously."
7.3. Conventionality: the usual expression, even when others are possible or more sensible (e.g.: how do you do?)
7.4. Conventionality in medicine:
7.4.1. Malignant melanoma => melanosarcoma.
7.4.2. Hepatoma => hepatocellular carcinoma.
7.4.3. Hypernephroma => renal cell carcinoma.
7.4.4. Nutmeg liver => chronic passive congestion of liver.
7.4.5. Caseous necrosis => necrotic granuloma.

Statistical conventionality in medicine.
Conventional meanings for conventional phrases.

Chapter 8. Ambiguity of Language.

8.1. Our department|NP      is|H      training pathologists|VP.
8.2. Our department|NP      is|V      training pathologists|VP.
8.3. Our department|NP      is|V      training pathologists|NP.

Chapter 9. Corpus Linguistics.

9.1. Text-corpora: Brown corpus. One million words, tagged, representative of American English.
9.2. Text-corpora: Project Gutenberg. 17,000 uncopyrighted literary texts (Tom Sawyer, etc.)
9.3. Text-corpora: OMIM: Comprehensive list of medical conditions.
9.4. Word frequencies.
9.5. Zipf's First Law.

Chapter 10. Zipf's Laws.

10.1. Zipf's First Law.

10.1.1. f ∝ 1/r:
f = word-frequency,
r = word-frequency rank,
m = number of meanings per word.

10.1.2. There exists a k such that f × r = k.
10.1.3. Alternatively, log f = log k - log r.
10.1.4. English literature, Johns Hopkins Autopsy Resource, German, and Chinese.
10.2. Zipf's Second Law.
10.2.1. m ∝ √f
10.2.2. There exists a k such that k × f = m².
10.2.3. Corollary: m ∝ 1/√r
10.2.4. Highly dependent upon what qualifies as a "different meaning" for a word.
10.3. Zipf's Third Law.

10.3.1. f ∝ 1/wordlength:
10.3.2. There exists a k such that f × wordlength = k.

10.3.3. Highly dependent upon word-division conventions for a language. For example, "mitral valve stenosis" (English, 3 words) is Mitralklappenstenose (1 word) in German and 6 words in Japanese.

10.3.4. German, Turkish, and Finnish build very long words by compounding or agglutination. Examples: Donaudampfschifffahrtsgesellschaftskapitän (German: Danube steam shipping line company captain); Avrupalilastirilamiyanlardansiniz (Turkish: you are one of those who cannot be Europeanized).

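10.4. Mandelbrot (fractal guy): "...bien que la formule de Zipf donne l'allure générale des courbes, elle en représente très mal les détails...." (English: "...although Zipf's formula gives the general shape of the curves, it represents their details very poorly....")
10.4.1. f = P(r + ρ)^(-B), where P, B, and ρ are parameters.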
10.5. Similar observations made by Baudot (1870), Pareto (1896), Estoup (1916), and Condon (1928).
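
Zipf's First Law can be checked on any text by ranking the words by frequency and inspecting the product f × r. The following Python sketch shows one way to do this; the function name zipf_table and the toy sample are illustrative assumptions, not the analysis used for the Johns Hopkins Autopsy Resource.

```python
from collections import Counter
import re

def zipf_table(text):
    """Return (rank, word, frequency, frequency * rank) rows, most frequent first."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words)
    # Sort by descending frequency; break ties alphabetically so ranks are stable.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(r, w, f, f * r) for r, (w, f) in enumerate(ranked, start=1)]

# Hypothetical toy sample; a meaningful test needs a large corpus.
sample = ("adenocarcinoma of colon metastatic to lung . "
          "adenocarcinoma of colon with metastasis to lung . "
          "adenocarcinoma of lung .")
for rank, word, f, k in zipf_table(sample):
    print(f"{rank:>4} {word:<16} f={f:<3} f*r={k}")
```

If the law holds for a corpus, the last column stays roughly constant over a wide range of ranks; on a sample this small it will not.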

Mandelbrot's Formula.


1. f = P(r + ρ)^(-B).
2. With six parameters, you can draw an elephant.
3. With seven parameters, you can wag its tail.
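
Mandelbrot's formula, f = P(r + ρ)^(-B), is easy to evaluate directly. The Python sketch below uses made-up parameter values purely for illustration; it shows that setting ρ = 0 and B = 1 recovers Zipf's First Law, f = P/r.

```python
def mandelbrot_frequency(rank, P, rho, B):
    """Predicted frequency f = P * (rank + rho) ** (-B)."""
    return P * (rank + rho) ** (-B)

# Illustrative parameter values only. With rho = 0 and B = 1 the formula
# reduces to Zipf's First Law, so doubling the rank halves the frequency.
zipf_like = [mandelbrot_frequency(r, P=1000.0, rho=0.0, B=1.0) for r in (1, 2, 4)]
print(zipf_like)  # [1000.0, 500.0, 250.0]
```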

The Johns Hopkins Autopsy Resource.




Zipf's First Law:
50,000 JHH Autopsy Facesheets.






Zipf Distribution:
50,000 JHH Autopsy Facesheets.




Zipf's Law:
Chinese.




This Chinese ideogram (的) is variously translated as of, which, or the adjectival endings -ic or -ical. It is pronounced de. The word is a loan-word from English! Supposedly, it corresponds to the English suffix -tic, as in sclerotic, nephritic, and fibrotic.

Zipf's Law as a signature.


Ten professors of medicine at Goethe University Medical School, Frankfurt, Germany.
Zipf distribution of their computerized medical workups.
Medical students could identify the professor by his/her Zipf distribution.

Kenner's Corollary:
Zipf's Law of Messy Desks.


See: Kenner (2003).

Chapter 11. Collocations.


11.1. Multiple word sequence, perceived to have an existence beyond sum of parts.
11.2. Example: Johns Hopkins Surgical Pathology (JHSP).
11.3. Reuse of phrases: cliches, not requiring Chomskyan high-level competence.
11.4. Barrier word method vs frequency distribution filter.
11.5. Barrier words from JHSP:
 RANK	FREQUENCY   BARRIER WORD
   1      222,175   and
   2      196,153   of
   3      189,799   with
   4      107,039   for
   5      104,067   the
   6       82,104   note
   7       80,740   in
   8       78,549   right
   9       77,885   left
  10       70,923   is
  11       70,261   see
  12       67,917   are
  13       53,071   mild
  14       49,987   identified
  15       47,804   to
  16       41,467   consistent
  17       39,792   this
  18       30,352   present
  19       27,189   seen
  20       25,371   at
  21       25,097   there
  22       24,657   on
  23       24,284   or
  24       23,021   be
  25       21,243   associated
  26       19,515   was
  27       18,376   one
  28       16,122   but
  29       16,057   case
  30       16,057   from

11.6. Example of barrier word filter:
TERMINAL ILEUM , CECUM , APPENDIX and COLON ( RIGHT HEMICOLECTOMY ) ; MODERATELY DIFFERENTIATED COLONIC ADENOCARCINOMA , with extension through MUSCULARIS PROPRIA into PERICOLIC SOFT TISSUE , and with involvement of PERINEURAL SPACES . TUBULOVILLOUS ADENOMA and associated VASCULAR MALFORMATION in the TRANSVERSE COLON ; TUBULAR ADENOMA in the DESCENDING COLON . recent COLOSTOMY SITE with SUBMUCOSAL FIBROSIS and INFLAMED GRANULATION TISSUE in the SEROSA . multiple ADHESIONS and SEROSAL ABSCESSES with GRANULATION TISSUE , FOREIGN BODY GIANT CELLS , SCARRING , focal OSSIFICATION , and FAT NECROSIS . ISCHEMIC BOWEL DISEASE diffusely involving ILEAL MUCOSA , with focal TRANSMURAL NECROSIS and ACUTE INFLAMMATION .

BARRIER WORD FILTER, GERMAN:
CHRONISCHE SKLEROSIERENDE PANCREATITIS , vorwiegend im bereich des CAPUT PANCREATIS . PSEUDOCYSTE des PANCREASKOPFES . multiple KALKSPRITZERARTIGE FETTGEWEBSNEKROSEN des CAPUT PANCREATIS und des CORPUS PANCREATIS . CHRONISCHSKLEROSIERENDE EXTRAHEPATISCHE CHOLANGITIS . FETTLEBER . PARIETALTHROMBOSE der PFORTADER . SUBCAPSULAERER ABSZESS des rechten LEBERLAPPENS mit ANAEMISCHER NEKROSE der nachbarschaft . FLAECHENHAFTE PERITONEALVERWACHSUNGEN der LEBEROBERFLAECHE. STAUUNGSMILZ. FLAECHENHAFTE PERITONEALVERWACHSUNGEN der MILZKAPSEL. zustand nach nicht ganz frischer LAPAROTOMIE im bereich des rechten OBERBAUCHES mit QUERVERLAUFENDER ABDOMINALNAHT und anlage einer DRAINAGE der BURSA OMENTALIS . DILATATION des rechten HERZVENTRIKELS . schwere STENOSIERENDE CORONARARTERIENSKLEROSE . multiple PETECHIEN der HERZHINTERWAND , vorwiegend im bereich beider VORHOEFE . beginnende GALLERTATROPHIE des SUBEPICARDIALEN FETTGEWEBES . flaechenhafte PLEURAVERWACHSUNGEN beiderseits . PLEURASPITZENSCHWIELEN beiderseits . schweres LUNGENOEDEM . akute BLUTSTAUUNG der LUNGEN . INTIMALIPOIDOSE der PULMONALARTERIEN . ATELEKTASEN BASALER und PARAVERTEBRALER LUNGENABSCHNITTE . NARBENCARCINOM der spitze des linken LUNGENUNTERLAPPENS . mittelgradige allgemeine ARTERIOSKLEROSE . OEDEM der ARYEPIGLOTTISCHEN FALTEN . mehrere NEKROSEN der SCHLEIMHAUT von EPIGLOTTIS , LARYNX , und TRACHEA . GASTROMALACIA ACIDA . ULCUSNARBE des ANTRUM VENTRICULI. frische HAEMORRHAGISCHE MAGENSCHLEIMHAUTEROSION des ANTRUM VENTRICULI . teils BLUTIGER , teils HAEMATINHALTIGER DUENNDARMINHALT . TRUEBE SCHWELLUNG der NIEREN . RESTE RENCULAERER LAPPUNG . sogenannte KALKINFARKTE der NIERENPAPILLEN . FLECKFOERMIGE HARNBLASENSCHLEIMHAUTBLUTUNGEN . LIPOIDVERARMUNG der NEBENNIERENRINDE . fortgeschrittene AUTOLYSE .
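
A simplified version of the barrier word method can be sketched in Python as follows. This is only an approximation of the filter that produced the examples above: it uses just the thirty barrier words listed in 11.5 and treats every punctuation token as a barrier, whereas the real filter evidently uses a longer list and keeps some phrases (e.g., RIGHT HEMICOLECTOMY) intact. The function name barrier_phrases is an assumption.

```python
# The thirty barrier words from the JHSP table in 11.5.
BARRIER_WORDS = {
    "and", "of", "with", "for", "the", "note", "in", "right", "left", "is",
    "see", "are", "mild", "identified", "to", "consistent", "this", "present",
    "seen", "at", "there", "on", "or", "be", "associated", "was", "one", "but",
    "case", "from",
}

def barrier_phrases(text, barriers=BARRIER_WORDS):
    """Split space-separated tokens at barrier words and punctuation.

    Returns the runs of non-barrier words found in between, in lower case.
    """
    phrases, current = [], []
    for token in text.lower().split():
        if token in barriers or not token.isalpha():
            # A barrier ends the phrase collected so far (if any).
            if current:
                phrases.append(" ".join(current))
                current = []
        else:
            current.append(token)
    if current:
        phrases.append(" ".join(current))
    return phrases

line = "TUBULOVILLOUS ADENOMA and associated VASCULAR MALFORMATION in the TRANSVERSE COLON ."
print(barrier_phrases(line))
```

Counting the phrases that this yields over a whole corpus gives candidate collocations, which can then be ranked by frequency.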


Chapter 12. Concordances.

12.1. Bible concordances.
 is no balm in Gilead; [is there] no   physician there? why then is not the health of th
 them, They that be whole need not a   physician, but they that are sick. 13 But go ye a
  that are whole have no need of the   physician, but they that are sick: I came not to 
 ll surely say unto me this proverb,   Physician, heal thyself: whatsoever we have heard
 hem, They that are whole need not a   physician; but they that are sick. 32 I came not 
 in Hierapolis. 14 Luke, the beloved   physician, and Demas, greet you. 15 Salute the br

12.2. Keyword in Context (KWIC).
 epidermoid carcinoma , uterine cervix extending to   fundus , adnexa , bladder , rectum , and pelvic
 r . diverticula colon . surgical absence , uterine   fundus , and appendix . peritoneal adhesions . 
 iae . external cardiac massage . petechiae gastric   fundus .                                       
  cell nuclei . capillary microaneurysms left optic   fundus . history of traumatic lumbar puncture .
 eral renal pelves and trachea . surgical absence ,   fundus and corpus uteri , and subtotal absence 
  and intact healed end to side anastomosis between   fundus of stomach and proximal jejunum . hyperp
 ial necrosis aorta . surgical absence , body , and   fundus of uterus , appendix , and left sixth ri
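
A keyword-in-context listing like the one above takes only a few lines of code. The Python sketch below is a minimal illustration, assuming space-separated tokens as in the autopsy text; the function name kwic and the default context width are assumptions.

```python
def kwic(text, keyword, width=30):
    """Return one line per occurrence of keyword, with the keyword aligned
    in a fixed column and up to `width` characters of context on each side."""
    tokens = text.split()
    lines = []
    for i, token in enumerate(tokens):
        if token.lower().strip(".,;:") == keyword.lower():
            left = " ".join(tokens[:i])[-width:]
            right = " ".join(tokens[i + 1:])[:width]
            # Right-justify the left context so the keywords line up.
            lines.append(f"{left:>{width}}   {token}   {right}")
    return lines

sample = "petechiae gastric fundus . surgical absence , uterine fundus , and appendix ."
for line in kwic(sample, "fundus", width=20):
    print(line)
```

Sorting the resulting lines by their right-hand context reproduces the familiar concordance layout.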


Chapter 13. Mathematical Foundations.

13.1. Probability Theory. Sample space, S; event A ⊆ S; field F is the set of all events A ⊆ S. Probability, P(A), defined for every event A, such that 0 ≤ P(A) ≤ 1.
13.2. Axioms of probability.
13.2.1. P(S)=1;
13.2.2. P(Ø)=0; and
13.2.3. P(∪i Ai) = ∑i P(Ai) if (Ai ∩ Aj) = Ø for every i≠j.
13.3. Uniform distribution. All events are equally likely.
13.4. Conditional Probability. For events A, B, the conditional probability, P(A|B) of event A given event B is defined as: P(A|B) = P(A∩B)/P(B).
13.5. Probabilistic Independence. Events A and B are independent if P(A∩B) = P(A) × P(B).
13.6. Bayes' Law. The conditional probability, P(B|A), of event B given event A is given by: P(B|A) = (P(A|B)×P(B))/P(A). Bayes' Law is used when P(A|B) is relatively easy to calculate, but the more difficult P(B|A) is desired.
13.7. Random variable. Function X : S -> R, that maps a probability event space into the real line, R.
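
A worked example may make Bayes' Law more concrete. In the Python sketch below, B is "the patient has the disease" and A is "the test is positive"; all three input probabilities are invented for illustration. P(A) is obtained from the law of total probability, which is not stated above but follows from the axioms.

```python
def total_probability(p_a_given_b, p_b, p_a_given_not_b):
    """P(A) = P(A|B) * P(B) + P(A|not B) * P(not B)."""
    return p_a_given_b * p_b + p_a_given_not_b * (1.0 - p_b)

def bayes(p_a_given_b, p_b, p_a):
    """Bayes' Law: P(B|A) = P(A|B) * P(B) / P(A)."""
    return p_a_given_b * p_b / p_a

# Hypothetical numbers: 1% prevalence, 90% sensitivity, 5% false-positive rate.
p_positive = total_probability(p_a_given_b=0.9, p_b=0.01, p_a_given_not_b=0.05)
p_disease_given_positive = bayes(p_a_given_b=0.9, p_b=0.01, p_a=p_positive)
print(round(p_positive, 4), round(p_disease_given_positive, 3))
```

With these numbers, P(A|B) is easy to state (0.9), while the desired P(B|A) turns out to be only about 0.15.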

Chapter 14. Statistics.

14.1. Estimation: The average, or expected value, of a probabilistic process.
14.2. Hypothesis Testing: Assuming the NULL HYPOTHESIS is true, compute the probability, p, of an experimental outcome at least as extreme as the one observed; reject the null hypothesis if p falls below a chosen threshold, e.g., p < 0.05.
14.3. Expected Value: E(X) = ∑ x P(X=x).
14.4. Variance: Var(X) = E[(X - E(X))²].
14.5. Binomial distribution:
14.5.1. The fair coin toss (p=0.5); unfair coin toss (p≠0.5).
14.5.2. r successes, n trials: B(r,n,p) = (n!/((n-r)!r!)) × p^r × (1-p)^(n-r).

14.6. Normal (Gaussian) distribution:
14.6.1. N(x,μ,σ) = (1/(σ√(2π))) × e^(-0.5((x-μ)/σ)²).
14.6.2. Limit of binomial distribution for large n, and probability "close" to 1/2.
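
The relationship between the two distributions can be checked numerically. The Python sketch below uses the standard forms, B(r,n,p) = C(n,r) p^r (1-p)^(n-r) and the normal density with μ = np and σ = √(np(1-p)); the choice of n = 100 fair coin tosses is only an illustration.

```python
import math

def binomial_pmf(r, n, p):
    """Probability of exactly r successes in n independent trials."""
    return math.comb(n, r) * p**r * (1 - p)**(n - r)

def normal_pdf(x, mu, sigma):
    """Normal (Gaussian) density with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

n, p = 100, 0.5
mu = n * p                           # mean of the binomial
sigma = math.sqrt(n * p * (1 - p))   # standard deviation of the binomial
for r in (40, 50, 60):
    print(r, round(binomial_pmf(r, n, p), 4), round(normal_pdf(r, mu, sigma), 4))
```

For large n and p near 1/2, the two columns agree closely.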


Chapter 15. General Linguistics.

15.1. Parts of Speech, morphology.
15.2. Nouns, pronouns, cases, declensions.
15.3. Proper nouns: Dr. Smith, Ms. Barrett. Wilms. Grave's.
15.4. Adverbial nouns: home, west, tomorrow.
15.5. Determiners, adjectives.
15.6. Verbs: tenses, person.
15.7. Conjunction, complementizers.
15.8. Phrase Structure Grammar.
15.9. Context Free Grammar.
15.10. Generative Grammar.

Chapter 16. Phrase Structure Grammar.

16.1. All grammar can be reduced to a sequence of phrases.
16.2. Noun phrase.
16.3. Prepositional phrase.
16.4. Verb phrase.
16.5. Adjectival phrase.
16.6. Phrase Structure Grammar.
16.6.1. free word order (Latin, Russian).
16.6.2. dependency grammar (English).
16.7. Rewrite rules.
16.8. Backus-Naur form.

16.8.1. [] ==> [Nφ]
16.8.2. [Nφ] ==> [N]
16.8.3. [Nφ] ==> [AN]
16.8.4. [Nφ] ==> [NPN]
where:
[]=null-sentence.
Nφ=noun-phrase.
N=noun.
P=preposition.
A=adjective
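
The rewrite rules above can be run in reverse to test whether a sequence of part-of-speech tags forms a noun phrase. The Python sketch below is a toy recognizer under one assumption that the rules do not state explicitly: the N on the right-hand side of a rule may itself be a noun phrase, so that [AN] and [NPN] can be applied recursively. The names REDUCTIONS and reduces_to_noun_phrase are hypothetical.

```python
# Each rule [Nφ] ==> [AN], [Nφ] ==> [NPN] is applied backwards: the
# right-hand side is replaced by a single N until nothing more applies.
REDUCTIONS = [("AN", "N"), ("NPN", "N")]

def reduces_to_noun_phrase(tags):
    """True if a tag string such as 'AANPN' can be reduced to a single N."""
    changed = True
    while changed and tags != "N":
        changed = False
        for pattern, replacement in REDUCTIONS:
            if pattern in tags:
                tags = tags.replace(pattern, replacement, 1)
                changed = True
                break
    return tags == "N"

# N=noun, A=adjective, P=preposition, as in the legend above.
for tags, example in [("N", "hemangioma"),
                      ("AAN", "epidermal inclusion cyst"),
                      ("NPAN", "skin of left ear"),
                      ("NAPN", "carcinoma metastatic to lung")]:
    print(tags, example, reduces_to_noun_phrase(tags))
```

The last example fails: a postpositive adjective ("carcinoma metastatic") is not covered by these four rules, which is a small instance of "all grammars leak".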


Chapter 17. Context Free Grammar.

17.1. Rewrite rules depend solely upon internal structure. Examples of non-context-free grammars. Is peels a transitive or intransitive verb?
Grandma peels: potatoes?
scrofulitic?
ecdysiastic?
Medical:
Foot
Foot of hippocampus.
Fundus:
uterine.
ocular.
gastric.
German (verb separable-prefixes):
Hör mal! → Listen up!
Hör mal auf! → Desist!

17.2. Surrounding context is irrelevant.
17.3. Recursive phrase structure expressions.
17.4. Used in high-level computer languages, compilers, interpreters.

Chapter 18. Dependency Grammar.

18.1. Definition: dependency between words (arrows):
 The old man ate the rice slowly.
                       ______________
                       |            | 
                       ↓             | 
 The → old man --→  ate    the → rice       slowly.
  |__________↑         ↑                            |
                       |_________________________|
18.2. Arguments: noun phrases, e.g., the old man; verb phrases, e.g., ate the rice.

18.3. Adjuncts: adverbs, e.g., slowly

18.4. Useful for disambiguating multiple noun phrases:
red carpet movers.
nevus
blue nevus
cellular blue nevus

Chapter 19. Corpus Linguistics.

19.1. Corpus sources.

http://www.netautopsy.org Johns Hopkins Autopsy Resource.
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=OMIM Online Mendelian Inheritance in Man.
http://www.gutenberg.org Project Gutenberg.
http://www.ldc.upenn.edu Linguistic Data Consortium.
http://www.elra.info European Language Resources Association.
http://nora.hd.uib.no/icame.html International Computer Archive of Modern English.
http://ota.ahds.ac.uk Oxford Text Archive.
http://childes.psy.cmu.edu Child Language Data Exchange System.

19.2. Markup.
19.3. Word Frequencies.
19.4. Programming languages.
19.4.1. C/C++
19.4.2. SNOBOL: historical predecessor of string languages.
19.4.3. MUMPS: good sorting, U. S. taxpayer-sponsored monopoly.
19.4.4. Perl: Cost-free, universal on internet, good string commands.


Chapter 20. Words and Phrases.

20.1. Collocations.
20.2. Word-sense disambiguation.
20.3. Lexical acquisition.

Chapter 21. Syntax.
21.1. Markov Models.
Markov chain: chain of events, A1, A2, A3, ..., with a limited memory, classically, only a single step.

Markov (1913) originally developed Markov chains to examine the sequence of letters in Russian literature.

The probability of letter/word n depends only upon the previous k letters/words.
21.2. Hidden Markov Model (HMM): probabilistic function of a Markov process.
21.3. HMMs are the dominant model in speech recognition research.
21.4. HMMs used in part-of-speech tagging of a document.
21.5. Forward Hidden Markov Model algorithm.
21.6. Backward Hidden Markov Model algorithm.
21.7. Probabilistic Context-Free Grammars.
21.8. Probabilistic Parsing.
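
A first-order Markov chain over words can be estimated from bigram counts. The Python sketch below is a minimal illustration with a made-up token sequence; the function name transition_probabilities is an assumption, and hidden Markov models add a layer of unobserved states that is not shown here.

```python
from collections import Counter, defaultdict

def transition_probabilities(tokens):
    """Estimate P(next | current) from bigram counts (memory of one step)."""
    counts = defaultdict(Counter)
    for current, nxt in zip(tokens, tokens[1:]):
        counts[current][nxt] += 1
    return {
        current: {nxt: c / sum(following.values()) for nxt, c in following.items()}
        for current, following in counts.items()
    }

# Hypothetical toy sequence; real estimates need a large corpus.
probs = transition_probabilities("lymph node lymph nodes lymph node".split())
print(probs["lymph"])
```

The same function works on letters instead of words if it is given a list of characters, which is closer to Markov's original study.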

Chapter 22. Experience with JHSP/JHAR corpus.

22.1. Johns Hopkins Autopsy Resource (JHAR), posted 1995-2003.
22.2. Not publicly available now: HIPAA.
22.3. Requires Institutional Review Board (IRB) approval.
22.3.1. Why the project won't harm the patients.
22.3.2. Why the risk of harm is outweighed by presumed benefits.
22.4. Same for http://www.netautopsy.org/vhpsapsx.htm JHSP corpus.

Chapter 23. Statistical Inventory.

23.1. All Words: Zipf's Law.
23.2. Barrier Words: Zipf's Law.
23.3. Collocations: Zipf's Law.
23.4. Grammaticality: Zipf's Law.
23.5. BNF formulas: Zipf's Law.

23.2. Barrier Words: Zipf's Law.
RANK	FREQUENCY   BARRIER WORD
   1      222,175   and
   2      196,153   of
   3      189,799   with
   4      107,039   for
   5      104,067   the
   6       82,104   note
   7       80,740   in
   8       78,549   right
   9       77,885   left
  10       70,923   is
  11       70,261   see
  12       67,917   are
  13       53,071   mild
  14       49,987   identified
  15       47,804   to
  16       41,467   consistent
  17       39,792   this
  18       30,352   present
  19       27,189   seen
  20       25,371   at
  21       25,097   there
  22       24,657   on
  23       24,284   or
  24       23,021   be
  25       21,243   associated

23.3. Collocations: Zipf's Law.
RANK	FREQUENCY   COLLOCATION
   1       38,401   chronic inflammation
   2       20,328   lymph nodes
   3       18,428   diff quik
   4       16,104   soft tissue
   5       14,456   bone marrow
   6       13,104   non diagnostic
   7       13,021   diagnostic findings
   8       13,004   non diagnostic findings
   9       12,868   helicobacter pylori
  10       12,328   crypt distortion
  11       12,316   lymph node
  12       12,292   quik stain
  13       12,284   diff quik stain
  14       11,080   mild chronic
  15       10,229   epithelial changes
  16       10,004   fibroadipose tissue
  17        9,967   non specific
  18        9,052   left breast
  19        8,893   inflammatory disease
  20        8,741   gastroesophageal reflux

23.4. Grammaticality: Zipf's Law.
  RANK  FREQUENCY       SENTENCE-PATTERN   EXAMPLE 
     1    423,177                    [N]   hemangioma
     2    106,034                 [N[N]]   liver [needle]
     3     98,958                   [AN]   left foot
     4     85,908                  [N|V]   scar
     5     79,741                 [NN|V]   skin scar
     6     62,042                  [AAN]   epidermal inclusion cyst
     7     50,461                [AN[N]]   laryngeal mass [biopsy]
     8     41,958                  [NCN]   decidua and villi
     9     38,689                [A|NPN]   negative for actinomyces
    10     26,745               [N[NPN]]   cervix [biopsy at 9:00] 
    11     22,097                [N[NN]]   cervix [biopsy 9:00]
    12     21,704                 [NPAN]   skin of left ear
    13     21,102                   [NN]   ear lobe
    14     20,638                  [BAN]   non diagnostic findings
    15     16,864               [AAN[N]]   left chest wall [biopsy]
    16     13,674                 [AAAN]   left axillary soft tissue
    17     12,798              [NCAN[N]]   skin , left flank [biopsy]
    18     12,692                [ANCAN]   soft tissue , inguinal region
    19     12,596               [ANPAAN]   fibrous plaque from left carotid artery
    20     12,507   [N[N]ANCA|VANCA|NPN]   leg [ bka ] old thrombus and calcified atherosclerotic plaque , negative for osteomyelitis 

23.5. BNF formulas: Zipf's Law.
  RANK   FREQUENCY      BNF FORMULA   EXAMPLE
     1     689,478       [N] ==> []   [prostate]
     2     313,234      [AN] ==> []   [actinic keratosis]
     3     117,039     [AAN] ==> []   [hypertrophic actinic keratosis]
     4      86,762     [N|V] ==> []   [scar]
     5      80,127    [NN|V] ==> []   [skin scar]
     6      66,816     [NAN] ==> []   [skin soft tissue]
     7      60,129     [NCN] ==> []   [decidua and villi]
     8      55,728       [AN ==> [N   [actinic KERATOSIS
     9      52,777     [A|N] ==> []   [negative]
    10      47,375      [NN] ==> []   [granulation tissue]
    11      47,139       [A] ==> []   [void]
    12      42,661     [NPN] ==> []   [adenocarcinoma of colon]
    13      36,076    [AAAN] ==> []   [focal bowenoid actinic keratosis]
    14      31,946    [NPAN] ==> []   [skin with actinic keratosis]
    15      25,168     [BAN] ==> []   [focally invasive tumor]
    16      22,761    [NCAN] ==> []   [ulcer and acute inflammation]
    17      22,276     [ANN] ==> []   [exuberant granulation tissue]
    18      16,791       [NN ==> [N   [lung CARCINOMA
    19      15,577    [NAPN] ==> []   [carcinoma metastatic to lung]
    20      13,764     [NNN] ==> []   [liver gallbladder pancreas]

PHRASE STRUCTURE GRAMMAR, PARSING.
   [ adenocarcinoma     of   colon   metastatic   to   lung ]
   [        N            P       N       A         P     N  ]

PHRASE STRUCTURE GRAMMAR, UMLS CODES.
  [ ADENOCARCINOMA     OF       COLON   METASTATIC     TO          LUNG   ]
  [    C0001418     C0332285  C0009368   C0027627    C0332286    C0024109 ]

PHRASE STRUCTURE GRAMMAR, XML FORMAT.
  <code-section scheme="UMLS">
    <c type="morph" value="C0001418">adenocarcinoma
      <c type="topo" value="C0009368">colon
        <c type="morph" value="C0027627">metastatic
          <c type="topo" value="C0024109">lung
          </c>
        </c>
      </c>
    </c>
  </code-section>
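
Nested markup of this kind can be generated rather than typed by hand, which avoids unbalanced tags. The Python sketch below uses the standard xml.etree.ElementTree module and the UMLS codes from the slide above; nesting each term inside the previous one is an assumption that simply mirrors the layout shown, and the function name nested_code_section is hypothetical.

```python
import xml.etree.ElementTree as ET

# (type, UMLS code, word) for the content words of the example phrase.
CODED_TERMS = [
    ("morph", "C0001418", "adenocarcinoma"),
    ("topo", "C0009368", "colon"),
    ("morph", "C0027627", "metastatic"),
    ("topo", "C0024109", "lung"),
]

def nested_code_section(terms, scheme="UMLS"):
    """Build a <code-section> element in which each <c> contains the next."""
    root = ET.Element("code-section", scheme=scheme)
    parent = root
    for kind, code, word in terms:
        parent = ET.SubElement(parent, "c", type=kind, value=code)
        parent.text = word
    return root

print(ET.tostring(nested_code_section(CODED_TERMS), encoding="unicode"))
```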

A NOTE OF PESSIMISM.

"Linguistic theories ... do not cover varieties of exceptional expressions which practical machine translation systems have to handle. A machine translation system, which is still imperfect and will never be completed, is exposed to very crude tests when the system construction reaches a certain stage. At that stage of development, the system is given a comparatively simple sentence for translation, with structures that can be analyzed by a grammar given to the system. After completion, people other than those who developed the system are asked to translate a variety of texts such as newspaper articles, science magazines, patent documents, contract documents, and commercial letters. Because the documents have not been adequately tested at the development stage, users are disappointed by the poor translation results produced by the system. Many of the failures of the system come from the fact that the dictionary and the grammar are not sufficient to accept such unexpected input sentences."


Chapter 24. Conclusions: Future of NLP in medicine.

24.1. Terabytes of text information in medicine annually.
24.2. Raw materials for epidemiologic studies.
24.3. Competition: fast turnaround time versus tolerating a grammatical filter (e.g., the Microsoft® Word® email filter (ugh!)).
24.4. Acceptable phrase structure grammar rules: professional societies.
24.5. NLP reducible to synoptic reporting.
24.6. Physicians do not easily surrender control of their documents.
24.7. Prof. Siegel's (father of filmless radiology) Test: Who wins the first lawsuit?

Chapter 25. Problems for NLP in anatomic pathology.

25.1. Undetected associations between diseases, e.g., Mesothelioma-asbestos.
25.2. Does one "outgrow" cancer? Age-specific cancer incidences in an aging population.

Chapter 26. References.


Chapter 27. Mini-histories.


Chapter 28. Glossary.


CHAPTER 1.
INTRODUCTION.



1.1. Reasons for NLP in medicine.

A controversy is currently raging in anatomic pathology practice, and the fallout will eventually reach our colleagues in other medical specialties. Anatomic pathologists have always written their diagnostic reports in free text, either English or some other medically competent language (including Latin!). So far, my colleagues have successfully resisted the onslaught of data-miners and administrators who want us to write our diagnoses in standardized coding systems (CAP, 2005; Ackerman, 2005; Ackerman, 2004).

This controversy was a big topic at the most recent meeting of Advancing Practice, Instruction, and Innovation through Informatics (APIII, 2005). Standardized reporting is a requirement for hospitals accredited as certified cancer centers by the College of American Pathologists (CAP, 2005) or by the American College of Surgeons (ACS, 2005). The driving forces are billing (Maung, 2005; Hardhats, 2005) and regulation (JCAHO, 2005). When do two diagnostic reports deserve the same compensation, and what is the mix of cases for a particular medical institution? It is hopeless to tabulate records of this complexity manually. And, in my opinion, it is equally hopeless to expect pathologists and other physicians to compose their reports by making selections from pick-lists.

CHAPTER 2.
LINGUISTIC SCIENCE.



2.1. Characterize and explain linguistic observations.


CHAPTER 3.
RULE-BASED SYSTEMS.



3.1. Grammars in Ancient Civilizations.


CHAPTER 4. GENERATIVE LINGUISTICS.



4.1. Chomsky: describe the innate language (I-language).


CHAPTER 5. ARTIFICIAL INTELLIGENCE.



5.1. Build small systems that behave intelligently.


CHAPTER 6. BASIC CONCEPTS.



6.1. Fundamental questions.


CHAPTER 7. COMPETENCE GRAMMAR.



7.1. Property of the rational speaker.


CHAPTER 8. AMBIGUITY OF LANGUAGE.



8.1. Verbs, gerunds, gerundives.


CHAPTER 9. CORPUS LINGUISTICS: INTRODUCTION.



9.1. Text corpora: Brown corpus.


CHAPTER 10. ZIPF'S LAWS.



10.1. Zipf's First Law.


10.2. Zipf's Second Law.


10.3. Zipf's Third Law.


CHAPTER 11. COLLOCATIONS.



11.1. Definition: Multiple word sequence.


CHAPTER 12. CONCORDANCES.



12.1. Biblical.


CHAPTER 13. MATHEMATICAL FOUNDATIONS.



13.1. Probability Theory.


CHAPTER 14. STATISTICS.



14.1. Estimation.


CHAPTER 15. GENERAL LINGUISTICS.



15.1. Parts-of-speech, morphology.


CHAPTER 16. PHRASE STRUCTURE GRAMMAR.



16.1. Grammar reduced to a sequence of phrases.


CHAPTER 17. CONTEXT-FREE GRAMMAR.



17.1. Surrounding context is irrelevant.


CHAPTER 18. DEPENDENCY GRAMMAR.



18.1. Definition: dependency between words.


CHAPTER 19. CORPUS LINGUISTICS: SOURCES AND METHODS.



19.1. Johns Hopkins Autopsy Resource.


CHAPTER 20. WORDS AND PHRASES.



20.1. Collocations.


CHAPTER 21. SYNTAX.



21.1. Markov models.


CHAPTER 22. JHAR/JHSP CORPORA.



22.1. JHAR.


CHAPTER 23. STATISTICAL INVENTORY.



23.1. All words: Zipf's Law.


23.2. Barrier words: Zipf's Law.


23.3. Collocations: Zipf's Law.


23.4. Grammaticality: Zipf's Law.


23.5. BNF formulas: Zipf's Law.


CHAPTER 24. FUTURE OF NLP IN MEDICINE.



24.1. Terabytes of medical text annually.


CHAPTER 25. PROBLEMS FOR NLP IN PATHOLOGY.



25.1. Undetected associations.


CHAPTER 26.
REFERENCES.



PubMed.
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi

Ackerman AB.
Protocols for the reporting of cutaneous melanoma.
Am J Clin Pathol. 2004 Nov;122(5):815-7. No abstract available.
PMID: 15540388.

Ackerman AB.
Dermatologist not equal to dermatopathologist: no place in a profession for pretenders.
J Am Acad Dermatol. 2005 Oct;53(4):698-699.
PMID: 16198796.

Ackerman AB.
Garble that derives from lack of definition.
Am J Dermatopathol. 2005 Aug;27(4):369-370.
PMID: 16121068.

Ackerman AB.
The future of pathology as a discipline: none without a dictionary!
Cesk Patol. 2005 Jan;41(1):4-5.
PMID: 15816116.

Ackerman AB.
Reviewer conflicts of interest should be disclosed.
J Am Acad Dermatol. 2005 Mar;52(3 Pt 1):538; author reply 538; discussion 538-539.
PMID: 15761446.

Ackerman AB.
Decline of a discipline: abetment by journals.
J Cutan Pathol. 2005 Mar;32(3):254; author reply 254.
PMID: 15701091.

Kung JX, Ackerman AB.
Staging of melanoma: a critique of the most recent (2002) system proposed by the American Joint Committee on Cancer: part II.
Am J Dermatopathol. 2005 Apr;27(2):165-167.
PMID: 15798445.

Bakotic B, Ackerman AB.
Staging of melanoma: a critique in historical perspective: part I.
Am J Dermatopathol. 2005 Apr;27(2):160-164.
PMID: 15798444.

Dabbs DJ, Geisinger KR, Ruggiero F, Raab SS, Nalesnik M, Silverman JF; Association of Directors of Anatomic and Surgical Pathology.
Recommendations for the reporting of tissues removed as part of the surgical treatment of malignant liver tumors.
Hum Pathol. 2004 Nov;35(11):1315-1323.
PMID: 15668887.
ADASP Reporting protocol.

Wei JT, Miller EA, Woosley JT, Martin CF, Sandler RS.
Quality of colon carcinoma pathology reporting: a process of care study.
Cancer. 2004 Mar 15;100(6):1262-1267.
PMID: 15022295.
ADASP Reporting protocol.

Jaffe ES, Banks PM, Nathwani B, Said J, Swerdlow SH.
Recommendations for the reporting of lymphoid neoplasms: A report from the Association of Directors of Anatomic and Surgical Pathology.
Mod Pathol. 2004 Jan;17(1):131-135.
PMID: 14657953.
ADASP Reporting protocol.

Lawrence WD; Association of Directors of Anatomic and Surgical Pathology.
ADASP recommendations for processing and reporting of lymph node specimens submitted for evaluation of metastatic disease.
Virchows Arch. 2001 Nov;439(5):601-603. Review.
PMID: 11764377.
ADASP Reporting protocol.

Association of Directors of Anatomic and Surgical Pathology.
ADASP recommendations for processing and reporting lymph node specimens submitted for evaluation of metastatic disease.
Am J Surg Pathol. 2001 Jul;25(7):961-963.
PMID: 11420470.

ADASP Committee. The Association of Directors of Anatomic and Surgical Pathology.
ADASP recommendations for processing and reporting of lymph node specimens submitted for evaluation of metastatic disease.
Mod Pathol. 2001 Jun;14(6):629-632.
PMID: 11406667.
ADASP Reporting protocol.

Kishi K.
Comments regarding the American Association of Directors of Anatomic and Surgical Pathology (ADASP) recommendations for the reporting of urinary bladder specimens containing bladder neoplasms: comparison with the Japanese General Rule for Clinical and Pathological Studies on Bladder Cancer.
Pathol Int. 1997 May;47(5):332.
PMID: 9143031.
ADASP Reporting protocol.

Association of Directors of Anatomic and Surgical Pathology.
Recommendations for the reporting of resected large intestinal carcinomas. Association of Directors of Anatomic and Surgical Pathology.
Am J Clin Pathol. 1996 Jul;106(1):12-15.
PMID: 8701921.
ADASP Reporting protocol.

Association of Directors of Anatomic and Surgical Pathology.
Recommendations for the reporting of breast carcinoma. Association of Directors of Anatomic and Surgical Pathology.
Am J Clin Pathol. 1995 Dec;104(6):614-619.
PMID: 8526202.
ADASP Reporting protocol.

Simpson PR, Tschang TP.
ADASP recommendations: consultations in surgical pathology. Association of Directors of Anatomic and Surgical Pathology.
Hum Pathol. 1993 Dec;24(12):1382.
PMID: 8276389.
ADASP Reporting protocol.

Aitchison J.
Teach Yourself Linguistics. Fifth Edition.
Chicago: NTC/Contemporary Publishing Co. 2000.
ISBN: 0844226688.

Bengtsson S, Schneider W, Spencer WA, Pratt AW, Kastner VV, Reichertz P, Lamson BG, Anderson J.
The application of computer techniques in health care.
World Hosp. 1976;12(1):47-51.
PMID: 1024332.

Berman JJ, Moore GW.
Object-oriented controlled-vocabulary translator using TRANSOFT + HyperPAD.
Proc Annu Symp Comput Appl Med Care. 1991;15:973-975.
PMID: 1807773.

Berman JJ.
Tumor classification: molecular analysis meets Aristotle.
BMC Cancer. 2004 Mar 17;4:10.
PMID: 15113444.

Borst F, Lyman M, Nhan NT, Tick LJ, Sager N, Scherrer JR.
TEXTINFO: a tool for automatic determination of patient clinical profiles using text analysis.
Proc Annu Symp Comput Appl Med Care. 1991;:63-67.
PMID: 1807679.

Bundy A, ed.
Artificial Intelligence Techniques: A Comprehensive Catalogue. Fourth, Revised Edition.
Heidelberg: Springer Verlag. 1997;:.
ISBN: 3540593233.

Chi EC, Sager N, Tick LJ, Lyman MS.
Relational data base modelling of free-text medical narrative.
Med Inform (Lond). 1983 Jul-Sep;8(3):209-223.
PMID: 6600043.

Chomsky N.
Morphophonemics of Modern Hebrew.
Undergraduate Honors Essay. University of Pennsylvania. 1949;:. Cited in: Newmeyer FJ. Generative Linguistics. A Historical Perspective. London: Routledge. 1996;:.

Chomsky N.
Syntactic Structures.
The Hague: Mouton. 1957;:.

Chomsky N.
The development of grammar in child language: Formal discussion.
Monogr Soc Res Child Dev. 1964;29:35-39.
PMID: 14125365.

Chomsky N.
Aspects of the Theory of Syntax.
Cambridge, MA: MIT Press. 1965;:.

Chomsky N.
Language and Mind.
San Diego: Harcourt Brace Jovanovich. 1968.

Chomsky N.
Rules and Representations.
New York: Columbia University Press. 1980;:.

Chomsky N.
Knowledge of Language: Its Nature, Origin, and Use.
New York: Praeger. 1986;:.

Chomsky N.
The Minimalist Program.
Cambridge, MA: MIT Press. 1995;:.

Chomsky N.
Universals of human nature.
Psychother Psychosom. 2005;74(5):263-268.
PMID: 16088263.

Cios KJ, Moore GW.
Medical Data Mining and Knowledge Discovery: Overview.
Chapter 1. In: Cios KJ. Medical Data Mining and Knowledge Discovery. Berlin: Springer Verlag. 2000;1:1-16.
ISBN: 3-7908-1340-0, 502 pages.
Published within the series: "Studies in Fuzziness and Soft Computing", Physica-Verlag Heidelberg, a Springer-Verlag Company.

Condon EU.
Statistics of vocabulary.
Science 1928;67:300.

Craig J, Bevington W.
Designing with type. A basic course in typography. Fourth edition.
New York: Watson-Guptill Publications. 1999;:.
ISBN 0-8230-1347-2, 176 pages.
Chapter 1. Origins of the Alphabet. pp. 8-11.

Dunham GS, Pacak MG, Pratt AW.
Automatic indexing of pathology data.
J Am Soc Inf Sci. 1978 Mar;29(2):81-90.
PMID: 10318395.

Estoup JB.
Gammes Sténographiques. Fourth Edition.
Paris:. 1916;:.

Fedorowicz J.
A Zipfian model of an automatic bibliographic system: An application to MEDLINE.
J Am Soc Inf Sci. 1982;33:223-232.

Fitch WT, Hauser MD, Chomsky N.
The evolution of the language faculty: Clarifications and implications.
Cognition. 2005 Sep;97(2):179-210.
PMID: 16112662.

Giere W.
Foundations of clinical data automation in cooperative programs.
Proc 5th Ann Symp Comp Applic Med Care. 1981;5:1142-1148.

Graepel PH, Henson DE, Pratt AW.
Comments on the use of the Systematized Nomenclature of Pathology.
Methods Inf Med. 1975 Apr;14(2):72-75.
PMID: 1207468.

Description of VistA® Filemanager.
http://www.hardhats.org
Includes instructions for obtaining at-cost copies of the complete, public-domain system, through the Freedom of Information Act.

Harris Z.
Methods in Structural Linguistics.
Chicago: University of Chicago Press. 1951;:.

Hauser MD, Chomsky N, Fitch WT.
The faculty of language: what is it, who has it, and how did it evolve?
Science. 2002 Nov 22;298(5598):1569-1579. Review.
PMID: 12446899.

Hirschman L, Story G, Marsh E, Lyman M, Sager N.
An experiment in automated health care evaluation from narrative medical records.
Comput Biomed Res. 1981 Oct;14(5):447-463.
PMID: 7273723.

Huff D.
How to lie with statistics.
New York: W. W. Norton & Company. 1954;:.
ISBN 0-393-31072-8, 142 pages.

Hutchins WJ.
Machine Translation: Past, Present, Future.
Chichester/New York: Ellis Horwood/Wiley. 1986. Ellis Horwood Series in Computers and Their Applications. ASIN: 0135435218.

Hutchins GM, Berman JJ, Moore GW, Hanzlick R, the Autopsy Committee of the College of American Pathologists.
Practice Guidelines for Autopsy Pathology.
Arch Pathol Lab Med. 1999; 123:1085-1092.

Joseph DM, Wong RL.
Correction of misspellings and typographical errors in a free-text medical English information storage and retrieval system.
Methods Inf Med. 1979 Oct;18(4):228-234.

Justeson JS, Katz SM.
Technical terminology: some linguistic properties and an algorithm for identification in text.
Natural Language Engineering. 1995;1:9-27.

Wilson J.
The master critic - The late Hugh Kenner's theory of everything.
The Boston Globe. December 7, 2003. Available from Boston.com.
"When Hugh Kenner died on Nov. 24, a few weeks shy of his 81st birthday, the first problem for writers of obituaries and tributes was how to categorize him. ... He was himself a 'pattern recognizer,' as he described inventor Raymond Kurzweil in the December 1990 issue of the pioneering personal computer magazine Byte. ... This openness to experience, this confidence that the patterns he saw derived from some ultimate coherence, must have been owing in part to Kenner's faith, a subject about which he was reticent in his writing. ... [W]hile some of his coreligionists were wringing their hands about the implications of artificial intelligence -- and while MIT's Marvin Minsky was proclaiming that human beings are machines made out of meat -- Kenner was busy devising, with Joseph O'Rourke, a computer program called TRAVESTY, which manipulates a text to create odd effects of language. Later, with Charles Hartman, Kenner published a volume of computer-generated poetry, 'Sentences.'"

Kenner's Corollary: Article in Discover Magazine, circa 1985: The idea that a desk with an "archeologic ordering" of papers, i.e., chronological with the most recently used papers at the top of the pile, is a demonstration of Zipf's Law. That is, the papers that account for 90% of use typically appear in the top 10% of the pile.

Kucera H, Francis WN.
Computational Analysis of Present-Day American English.
Providence, RI: Brown University Press. 1967;:.

Laird CG.
The miracle of language.
Publisher: Fawcett Publications. 1965;:.
ASIN: B0007I1X2Y, 255 pages.

Lewis CI, Langford CH.
Symbolic Logic. Second Edition.
New York: Dover Publications, Inc. 1932.

Li W.
Zipf's Law Bibliography.
http://linkage.rockefeller.edu/wli/zipf/index_ru.html

Lyman M, Sager N, Tick L, Nhan N, Borst F, Scherrer JR.
The application of natural-language processing to healthcare quality assessment.
Med Decis Making. 1991 Oct-Dec;11(4 Suppl):S65-S68.
PMID: 1770852.

Mandelbrot B.
Structure formelle des textes et communication.
Word 1954;10:1-27.

Manning CD, Schütze H.
Foundations of Statistical Natural Language Processing.
Cambridge, MA: The MIT Press. 2000;:.
ISBN: 0262133601, 680 pages.
http://www-nlp.stanford.edu/fsnlp/intro/

Markov AA.
An example of statistical investigation in the text of Eugene Onyegin, illustrating coupling of tests in chains.
Proc Acad Sci St Petersburg. 1913;7:153-162.
Markov was a student of Tschebyscheff.

Masarie FE Jr, Miller RA, Bouhaddou O, Giuse NB, Warner HR.
An Interlingua for Electronic Interchange of Medical Information: Using Frames to Map Between Clinical Vocabularies.
Comp Biomed Res 1991; 24(4):379-400.

Maung RTA.
What is the best indicator to determine anatomic pathology workload? Canadian experience.
Am J Clin Pathol. 2005;123:45-55.

Upstate Medicare Division.
Sample CPT® Fee Schedule: Upstate Medicare Division, 2004 Fee Schedule.
http://www.umd.nycpic.com/2004_80000-89999.html
Accessed January 18, 2005.
From:
http://www.umd.nycpic.com/
Note: CPT® NUMBER and CPT® DESCRIPTOR are copyrighted products of the American Medical Association.

Minsky M, Hillis D, Rudisch G.
Artificial intelligence.
N Engl J Med. 1980 Jun 26;302(26):1482.
PMID: 7374720.

Moore GW, Miller RE, Hutchins GM, Riede UN, Polacsek RA.
Multilingual translation techniques in the analysis of narrative medical text.
Proc Annu Symp Comput Appl Med Care. 1985;9:. November 10-13, 1985, Baltimore, MD.

Moore GW, Miller RE, Hutchins GM.
Microcomputer translator for medical text: Theorem verification for Chapter Two of Zeman's Modal Logic.
Adv Math Comput Med. 7:1621-1633, 1986.

Moore GW, Riede UN, Polacsek RA, Miller RE, Hutchins GM.
Automated translation of German to English medical text.
Am J Med. 1986 Jul;81(1):103-111.
PMID: 3755289.

Moore GW, Riede UN, Polacsek RA, Miller RE, Hutchins GM.
Group theory approach to computer translation of medical German.
Methods Inf Med. 1986 Jul;25(3):176-182.
PMID: 3755498.

Moore GW, Polacsek RA, Erozan YS, de la Monte SM, Miller RE, Hutchins GM, Riede UN.
Multilingual translation techniques in the analysis of narrative medical text.
Comput Methods Programs Biomed. 1986 Mar;22(1):35-42.
PMID: 3634670.

Moore GW, Hutchins GM, Boitnott JK, Miller RE, Polacsek RA.
Word root translation of 45,564 autopsy reports into MeSH titles.
Proc Annu Symp Comput Appl Med Care. 1987;11:. Washington DC, November 1-4, 1987.

Moore GW, Boitnott JK, Miller RE, Eggleston JC, Hutchins GM.
Integrated anatomic pathology reporting system using natural language diagnoses.
Modern Pathol 1988;1:44-50.

Moore GW, Miller RE, Hutchins GM.
Indexing by MeSH titles of natural language pathology phrases identified on first encounter using the Barrier Word Method.
In: Scherrer JR, Cote RA, Mandil SH, eds. Computerized Natural Medical Language Processing for Knowledge Representation. North-Holland. 1989;:29-39.

Moore GW, Wakai I, Satomura Y, Giere W.
TRANSOFT: Medical translation expert system.
Artif Intell Med 1:149-157, 1989.

Moore GW.
TRANSOFT: Public-domain English-to-SNOMED computer translation shell, using the DVA File Manager. Abstract.
Mod Pathol. 4:123A, 1991.

Moore GW.
Medical Expert System User Interface. Editorial.
Artif Intell Med. 1991:15;.

Moore GW, Berman JJ, Hanzlick RL, Buchino JJ, Hutchins GM.
A prototype internet autopsy database: 1625 consecutive fetal and neonatal autopsy facesheets spanning twenty years.
Arch Pathol Lab Med. 1996;120:782-785.
http://www.medparse.com/protoiad.htm

Moore GW, Berman JJ.
Anatomic Pathology Data Mining.
Chapter 4. In: Cios KJ. Medical Data Mining and Knowledge Discovery. Berlin: Springer Verlag. 2000;4:61-107.
ISBN: 3-7908-1340-0, 502 pages.
Published within the series: "Studies in Fuzziness and Soft Computing", Physica-Verlag Heidelberg, a Springer-Verlag Company.
http://www.medparse.com/apdmchap.htm

Nagao M.
Machine Translation.
In: Shapiro SC, ed. Encyclopedia of Artificial Intelligence. Volume 2. M-Z. New York: Wiley-Interscience. 1992;2:898-902.
A nice quote from one of the leaders in the field, that captures the fruitlessness of open-ended programs for computer translation:
"Linguistic theories ... do not cover varieties of exceptional expressions which practical machine translation systems have to handle. A machine translation system, which is still imperfect and will never be completed, is exposed to very crude tests when the system construction reaches a certain stage. At that stage of development, the system is given a comparatively simple sentence for translation, with structures that can be analyzed by a grammar given to the system. After completion, people other than those who developed the system are asked to translate a variety of texts such as newspaper articles, science magazines, patent documents, contract documents, and commercial letters. Because the documents have not been adequately tested at the development stage, users are disappointed by the poor translation results produced by the system. Many of the failures of the system come from the fact that the dictionary and the grammar are not sufficient to accept such unexpected input sentences."


Naur P.
Report on the Algorithmic Language ALGOL 60.
Comm ACM, 1960 May; 3(5):299-314.

Nelson SJ, Olson NE, Fuller L, Tuttle MS, Cole WG, Sherertz DD.
Identifying concepts in medical knowledge.
Medinfo. 1995;8:33-36.

Newmeyer FJ.
Generative Linguistics. A Historical Perspective.
London: Routledge. 1996;:.

Pacak MG, Pratt AW.
Identification and transformation of terminal morphemes in medical English part II.
Methods Inf Med. 1978 Apr;17(2):95-100.
PMID: 661609.

Pareto V.
Cours d'économie politique.
Geneva: Droz. 1896;:. Lausanne and Paris: Rouge. 1897;:.
Pareto's Principle, a predecessor of Zipf's Law.

Pratt AW, Pacak M.
Identification and transformation of terminal morphemes in medical English.
Methods Inf Med. 1969 Apr;8(2):84-90.
PMID: 5819388.

Pratt AW.
Interactive data processing in the medical research institution.
Methods Inf Med Suppl. 1976;10:65-76.
PMID: 1078477.

Sager N, Bross ID, Story G, Bastedo P, Marsh E, Shedd D.
Automatic encoding of clinical narrative.
Comput Biol Med. 1982;12(1):43-56.
PMID: 7075165.

Sager N, Wong R.
Developing a database from free-text clinical data.
J Clin Comput. 1983;11(5-6):184-194.
PMID: 10278191.

Sager N, Lyman M, Tick LJ, Nhan NT, Bucknall CE.
Natural language processing of asthma discharge summaries for the monitoring of patient care.
Proc Annu Symp Comput Appl Med Care. 1993;:265-268.
PMID: 8130474.

Sager N, Lyman M, Bucknall C, Nhan N, Tick LJ.
Natural language processing and the representation of clinical data.
J Am Med Inform Assoc. 1994 Mar-Apr;1(2):142-160. Review.
PMID: 7719796.

Sager N, Lyman M, Nhan NT, Tick LJ.
Automatic encoding into SNOMED III: a preliminary investigation.
Proc Annu Symp Comput Appl Med Care. 1994;:230-234.
PMID: 7949925.

Sager N, Lyman M, Nhan NT, Tick LJ.
Medical language processing: applications to patient data representation and automatic encoding.
Methods Inf Med. 1995 Mar;34(1-2):140-146.
PMID: 9082123.

Salton G.
Automatic text analysis.
Science. 1970 Apr 17;168(929):335-343.
PMID: 5435890.

Salton G.
Experiments in automatic thesaurus construction for information retrieval.
In: Proceedings IFIP Congress, 1971;:43-49.

Salton G, ed.
The Smart Retrieval System - Experiments in Automatic Document Processing.
Englewood Cliffs, NJ: Prentice-Hall. 1971;:.

Salton G, McGill MJ.
Introduction to modern information retrieval.
New York: McGraw-Hill. 1983;:.

Salton G, Fox EA, Wu H.
Extended boolean information retrieval.
Communications of the ACM 1983;26:1022-1036.

Salton G, Buckley C, Fox EA.
Automatic query formulations in information retrieval.
J Am Soc Inf Sci. 1983 Jul;34(4):262-280.
PMID: 10299297.

Salton G.
Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer.
Reading, MA: Addison Wesley. 1989;:.

Salton G, Buckley C.
Global text matching for information retrieval.
Science. 1991;253:1012-1015.

Salton G, Allan J.
Selective text utilization and text traversal.
In: Proceedings of ACM Hypertext 93, New York.
New York: Association for Computing Machinery. 1993;:.

Salton G, Allan J, Buckley C, Singhal A.
Automatic analysis, theme generation and summarization of machine-readable texts.
Science 1994;264:1421-1426.

Sawyer R, Berman JJ, Borkowski A, Moore GW.
Elevated prostate-specific antigen levels in black men and white men.
Mod Pathol. 1996 Nov;9(11):1029-1032.
http://www.medparse.com/elevpsal.htm

Sorace JM, Berman JJ, Carnahan GE, Moore GW.
PRELOG: precedence logic inference software for blood donor deferral.
Proc Annu Symp Comput Appl Med Care. 1991;:976-977.
PMID: 1807774.

Suppes P.
Introduction to Logic.
New York: Van Nostrand. 1957;:.

Suppes P.
Probabilistic grammars for natural languages.
Synthese 1970;22:95-116.

Suppes P.
Axiomatic Set Theory.
New York: Dover Publications. 1972;:.
ISBN: 0486616304.

Suppes P.
Probabilistic Metaphysics.
Oxford: Blackwell. 1984;:.

Suppes P, Bottner M, Liang L.
Machine learning comprehension grammars for ten languages.
Computational Linguistics. 1996;22:329-350.

Taylor M, Saltz J, Nichols JH.
Design of an Integrated Clinical Data Warehouse.
J Assn Lab Automation. 2000. in press.

Tersmette KWF, Scott AF, Moore GW, Matheson NW, Miller RE.
Barrier word method for detecting molecular biology multiple word terms.
Proc Annu Symp Comput Appl Med Care. 1988;12:207-211. Washington DC, November 6-9, 1988.

Twain M.
Life on the Mississippi.
New York: Signet Classics, Reissue edition. 2001;:. (November 7, 2001). Twain M, Kaplan J.
ISBN: 0451528174, 359 pages. See:
http://en.wikipedia.org/wiki/Mark_Twain

Tymoczko T, ed.
New Directions in the Philosophy of Mathematics.
Princeton, NJ: Princeton University Press. 1998;:.

U. S. National Library of Medicine.
Unified Medical Language System.
http://www.nlm.nih.gov/research/umls/

U. S. National Library of Medicine.
UMLS Knowledge Sources. Eleventh Edition. Unified Medical Language System.
U. S. Department of Health and Human Services. National Institutes of Health. National Library of Medicine. 2000;:.

U. S. National Library of Medicine.
UMLS Knowledge Sources. Tenth Edition. Unified Medical Language System.
U. S. Department of Health and Human Services. National Institutes of Health. National Library of Medicine. 1999.

U. S. National Library of Medicine.
UMLS Knowledge Sources. Ninth Edition. Unified Medical Language System.
U. S. Department of Health and Human Services. National Institutes of Health. National Library of Medicine. 1998;:.

Wilbur WJ.
Overview of Books at NCBI.
http://www.ncbi.nlm.nih.gov:80/books/mboc/bookshelp/bookover.html#link

Wingert F.
[PAULA: program for evaluation of logical expressions. Plausibility-control and evaluation of optical mark reader forms]
Methods Inf Med. 1972 Apr;11(2):96-103.
PMID: 5026579.

Wingert F, Ries P.
[Pathology findings system]
Methods Inf Med. 1973 Jul;12(3):150-155. German.
PMID: 4729117.

Wingert F.
[Morphosyntactical analysis of compound word forms in medical language]
Methods Inf Med. 1977 Oct;16(4):248-255. German.
PMID: 337050.

Wingert F.
Morphologic analysis of compound words.
Methods Inf Med. 1985 Jul;24(3):155-162.
PMID: 4033445.

Wingert F.
Automated indexing based on SNOMED.
Methods Inf Med. 1985 Jan;24(1):27-34.
PMID: 3982279.

Wingert F.
An indexing system for SNOMED.
Methods Inf Med. 1986 Jan;25(1):22-30.
PMID: 3753739.

Wingert F.
Automated indexing of SNOMED statements into ICD.
Methods Inf Med. 1987 Jul;26(3):93-98.
PMID: 3670105.

Wingert F.
Medical linguistics: automated indexing into SNOMED.
Crit Rev Med Inform. 1988;1(4):333-403.
PMID: 3288353.

Wittgenstein L.
Philosophical Investigations [Philosophische Untersuchungen]. Third edition.
Oxford: Basil Blackwell. 1968;:.

Wong RL, Gaynon P.
An automated parsing routine for diagnostic statements of surgical pathology reports.
Methods Inf Med. 1971 Jul;10(3):168-175.

Wong RL, Reno JD, Hain TC, Platt RC, Gaynon PS, Joseph DM.
Profile of a dictionary compiled from scanning over one million words of surgical pathology narrative text.
Comput Biomed Res. 1980 Aug;13(4):382-398.

Yu CC-Y, Moore GW, Unschuld PU.
Romanized Chinese respelling rules for an English medical word list.
Proc Annu Symp Comput Appl Med Care. 1987;11:. Washington DC, November 1-4, 1987.

Zhang Q.
Easy entry of Chinese character set symbols.
Proc 5th Ann Symp Comp Appl Med 1981;5:143-149.

Zipf GK.
Relative frequency as a determinant of phonetic change.
Harvard Studies in Classical Philology 1929;40:1-95.

Zipf GK.
Selected Studies of the Principle of Relative Frequency in Language.
?1932.

Zipf GK.
The Psycho-Biology of Language.
Boston, MA: Houghton Mifflin. 1935;:.
Cambridge, MA: MIT Press. 1965;:.

Zipf GK.
National Unity and Disunity: The Nation As a Bio-Social Organism.
Bloomington, IN: Principia Press. 1941;:.

Zipf GK.
Human Behavior and The Principle of Least Effort. An Introduction to Human Ecology.
Reading, MA: Addison-Wesley Press. 1949;:19-55.

Campbell JR, Carpenter P, Sneiderman C, Cohn S, Chute CG, Warren J.
Phase II evaluation of clinical coding schemes: completeness, taxonomy, mapping, definitions, and clarity. CPRI Work Group on Codes and Structures.
J Am Med Inform Assoc. 1997 May-Jun;4(3):238-51.
http://www.pubmedcentral.gov/articlerender.fcgi?tool=pubmed&pubmedid=9147343

Chute CG, Cohn SP, Campbell KE, Oliver DE, Campbell JR.
The content coverage of clinical classifications. For The Computer-Based Patient Record Institute's Work Group on Codes & Structures.
J Am Med Inform Assoc. 1996 May-Jun;3(3):224-33.
PMID 8723613.
http://www.pubmedcentral.gov/articlerender.fcgi?tool=pubmed&pubmedid=8723613

Campbell JR, Payne TH.
A Comparison of Four Schemes for Codification of Problem Lists.
Proc SCAMC 1994, Washington, DC, p. 201-205
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=7949920&query_hl=4

Humphreys BL, McCray AT, Cheh ML.
Evaluating the coverage of controlled health data terminologies: report on the results of the NLM/AHCPR large scale vocabulary test.
J Am Med Inform Assoc. 1997 Nov-Dec;4(6):484-500.
http://www.pubmedcentral.gov/articlerender.fcgi?tool=pubmed&pubmedid=9391936

U. S. National Library of Medicine.
Papers covering UMLS/SNOMED/Read Codes in different domains:
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=pubmed&cmd=Display&dopt=pubmed_pubmed&from_uid=9147343

Apelon Resources:
http://www.apelon.com/literature/conferencepapers.htm

Evaluation of SNOMED coverage of VHA Terms.
http://www.apelon.com/literature/papers/FinalVA_SNOMEDPaper.pdf

Lingologix is a commercial tool that uses NLP to map to SNOMED CT, in clinical use at Mayo and Hopkins:
http://www.lingologix.com/

CHAPTER 27.
MINI-HISTORIES.



Genesis 11:1-9 [circa 4000 BC]. Tower of Babel. According to this story, all persons on earth once spoke a single language. The people attempted to build a tower reaching to heaven. Because of their arrogance, God punished them by confounding their languages, and their building project failed.
There are over 2000 distinct written languages on earth today.
In this story, different languages are viewed as a curse, a barrier to understanding.

Aristotle (Αριστοτελης) [384 BC - 322 BC]. Greek philosopher, who compiled an encyclopedia of all scientific and other human knowledge available at that time. Aristotle's Rule: for every positive y such that x > y, there exists an n > 0 such that n·y > x (for example, x = 10 and y = 3 give n = 4, since 4·3 = 12 > 10). Note that if y = 0, the rule doesn't work, because n·0 = 0 for every n. This and other pernicious properties of zero caused Aristotle to avoid the concept. Zero was rediscovered and developed almost a millennium later by Indian and Arabic mathematicians.
See: http://en.wikipedia.org/wiki/Aristotle

Rosetta Stone [196 BC]. The Rosetta Stone is a dark granite stone with writing in two languages, Egyptian and Greek, using three scripts: Hieroglyphic Egyptian, Demotic Egyptian, and Greek. Because Greek was well known, the stone was important to scholars for deciphering the hieroglyphs. Ptolemy V assumed the crown at age five, and was faced with the task of reclaiming lands lost to various invaders. As an attempt to reestablish legitimacy for Ptolemy, his priests issued a series of decrees, inscribed on stones and distributed throughout Egypt. The Rosetta Stone is the decree issued in the city of Memphis. The stone describes various taxes repealed by Ptolemy V, and instructs that his statues be erected in temples and that the decree be inscribed in the three scripts.

"Rosetta" is iconic for "translation", and some computerized translation systems have "Rosetta" as part of their name.
See: http://en.wikipedia.org/wiki/Rosetta_Stone

Qin Shi-Huang (秦始皇) [260 BC - 210 BC]. First emperor of China (Qin = Ch'in), only emperor of the Qin Dynasty, who unified the country administratively and linguistically, in part by burning all books which disagreed with his regime. The advantage of this linguistic unification is that a document written in one part of China can be read anywhere else in China (assuming that the readers are literate), even though the spoken languages (so-called dialects) are mutually unintelligible. Everyone was REQUIRED to adopt the imperial ideograms, or else. He is also reported to have executed 460 scholars. (The Ten Crimes of Qin.) See:
http://en.wikipedia.org/wiki/Qin_Shi_Huang

The subject of the rise of Emperor Qin, and the conflict of scholarship versus political unification, is treated in the movie Ying Xiong (2002) (Hero, starring Jet Li, Mandarin with English subtitles).
See: http://www.imdb.com/title/tt0299977/

Acts 2:1-15. [circa 35 AD] The Christian Pentecost miracle, where the Holy Spirit descends upon a group of disciples, and allows them to preach in many different languages. In contrast to the Tower of Babel, this Biblical reference is a positive reference to the multiple languages of the earth.

Masada. [72 AD] Site of an apparent mass suicide among first-century Jews, rather than be conquered and subjugated to the spiritual and linguistic demands of the Roman Empire. Chronicled by Flavius Josephus, a first-century Jewish historian, based upon eye-witness accounts. Masada (Hebrew: מצדה = fortress) was built by Herod the Great between 37 and 31 BC as a refuge for himself, in case his subjects should rise up against him. In 66 AD, a group of Jewish rebels took Masada from the Roman garrison, and used Masada as their base for raiding and harassing local settlements. In 72 AD, the Roman governor of Judaea, Lucius Flavius Silva, marched against Masada and eventually built a rampart against the western plateau, using thousands of tons of stones and beaten earth. Silva finally breached the wall of the fortress with a battering ram. When the Romans entered the fortress, they discovered that its defenders had set all the buildings ablaze and committed mass suicide, rather than face certain capture or defeat. See:
http://en.wikipedia.org/wiki/Masada

Gaius Suetonius Tranquillus: Lives of the Grammarians and Rhetoricians. "The science of grammar was in ancient times far from being in vogue at Rome; indeed, it was of little use in a rude state of society, when the people were engaged in constant wars, and had not much time to bestow on the cultivation of the liberal arts. At the outset, its pretensions were very slender, for the earliest men of learning, who were both poets and orators, may be considered as half-Greek: I speak of Livius and Ennius, who are acknowledged to have taught both languages as well at Rome as in foreign parts. But they only translated from the Greek, and if they composed anything of their own in Latin, it was only from what they had before read. For although there are those who say that this Ennius published two books, one on "Letters and Syllables," and the other on "Metres," Lucius Cotta has satisfactorily proved that they are not the works of the poet Ennius, but of another writer of the same name...."
Translation by Alexander Thompson, MD.
See: http://en.wikipedia.org/wiki/Suetonius
http://classicpersuasion.org/pw/cicero/suetoniusrhetor.htm

Rev. Thomas Bayes. British Anglican priest who developed the theory of conditional probability.
See: http://en.wikipedia.org/wiki/Bayes

Benjamin Disraeli, Earl of Beaconsfield (1804-1881). Conservative British Prime Minister during the Victorian Era. "There are lies, damn lies, and statistics." It is not an accident that statistics developed in Great Britain, and that the world's best statisticians still live and work there. Great Britain is an island nation, and has always made its national livelihood from maritime trade. Ships at sea, like dice at a gaming table, are subject to chance occurrences. In his career, Disraeli must have seen more than his share of deceptive statistics. See:
http://en.wikipedia.org/wiki/Benjamin_Disraeli

John Maynard Keynes (1883-1946). "In the long run, we're all dead." British economist, who developed concepts of national fiscal and monetary policy. Many economic theories distinguish between short-run and long-run processes, without really specifying how long is long-run. This quote is Keynes's ridicule of this particular paradox of academic economics. See:
http://en.wikipedia.org/wiki/John_Maynard_Keynes

Karl Pearson. Early twentieth century British statistician, who introduced the correlation coefficient, or Pearson's r. Father of E. S. Pearson, another twentieth century statistical giant. See:
http://en.wikipedia.org/wiki/Karl_Pearson

Andrey N. Kolmogorov. Great 20th c. Russian statistician and mathematician, who introduced many non-parametric methods in statistics, including the Kolmogorov-Smirnov test. See:
http://en.wikipedia.org/wiki/Kolmogorov

George Boole [1815-1864] British mathematician and philosopher. As the inventor of Boolean algebra, the basis of all modern computer arithmetic, Boole is regarded as one of the founders of the field of computer science, although computers did not exist in his day.
See: http://en.wikipedia.org/wiki/George_Boole

Col. John Shaw Billings [1838 - 1913]. U. S. surgeon and librarian, born in Indiana. In the Civil War, Billings was medical inspector of the Army of the Potomac. After the war, he directed the Surgeon General's Library in Washington, DC. The catalog entries greatly increased under his supervision by 1873, and soon thereafter, Billings began work on the Index Catalogue. Sixteen volumes appeared before his military retirement. In 1879, he initiated the Index Medicus, a monthly guide to current medical literature, which eventually became PubMed, curated by the U. S. National Library of Medicine. Dr. Billings designed plans for the construction of Johns Hopkins Hospital. His works include classic essays on hospital administration and training. Under his leadership (1864 - 1895), the National Library of Medicine became one of the greatest medical library systems in the world.

Émile Baudot. The Baudot code, a five-bit code invented by the Frenchman Émile Baudot in 1870, was used extensively in telegraph systems.

Ludwig Josef Johann Wittgenstein [1889 - 1951] was an Austrian philosopher, who contributed several ground-breaking works to modern philosophy, primarily on the foundations of logic and the philosophy of language. He is widely regarded as one of the most influential philosophers of the 20th century. See:
http://en.wikipedia.org/wiki/Ludwig_Wittgenstein

George Kingsley Zipf [1902-1950] was an American linguist and philologist, who studied the statistical properties of different languages. He is the eponym of Zipf's Law (actually, Zipf's First Law), which states that only a few words are used very often, whereas many or most words are used rarely, according to the formula:
f = k/r
where f is word-frequency, r is word-rank, and k is a constant. Zipf's work was treated harshly when it first appeared, perhaps somewhat justifiably because Zipf's claims were so grandiose: namely, an explanation for all linguistic usage in all major human languages. Also, Zipf's "principle of least effort" (i.e., speakers use a few words repeatedly, because they are linguistically lazy) has never been verified experimentally.

As recently as a few years ago, a humanities professor from a prestigious east-coast university made disparaging remarks to me about Zipf's work. (This wasn't a friendly conversation: I impugned the professor's abilities and discernment as a scientist.)

However, Zipf was right. His basic claim (i.e., Zipf's First Law) has been verified for many major languages, including English, German, and Chinese; as well as for specialized bodies of medical text, including The Frankfurt University Medical Consultation Database and The Johns Hopkins University Autopsy Facesheets. Major internet indexing systems (google.com, yahoo.com) apparently exploit Zipf's First Law, although their exact search algorithms are closely-guarded trade secrets.

Furthermore, as anyone knows who has studied a second language, all beginning (i.e., first-year) textbooks introduce fewer than a thousand words. Even though this is the vocabulary of a preschooler, it is the thousand most-used words in the language, and gets you a pretty good start on ordering dinner or checking into a hotel.

Zipf died at age 48, and did not live to see the incredible growth of interest in his work. See:
http://en.wikipedia.org/wiki/George_Kingsley_Zipf
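
The rank-frequency relation f = k/r in the Zipf entry above can be checked on any body of text by counting words, ranking them by frequency, and looking at the product r × f, which should stay roughly constant if the law holds. The following is only a minimal sketch: the function name rank_frequency, the tokenizing pattern, and the sample sentence are illustrative assumptions, and the sample is far too small to test the law.

```python
from collections import Counter
import re

def rank_frequency(text):
    """Return (rank, word, frequency, rank * frequency) rows, most frequent word first."""
    # Crude tokenizer (an assumption): lowercase runs of letters and apostrophes
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    # Sort by descending frequency; break ties alphabetically for stable output
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(r, w, f, r * f) for r, (w, f) in enumerate(ordered, start=1)]

# Hypothetical toy sample; a real check needs a large corpus
sample = "the heart and the lung and the liver and the heart of the matter"
for rank, word, freq, k in rank_frequency(sample):
    print(rank, word, freq, k)
```

On a large corpus, the last column is the estimate of the constant k at each rank; plotting log f against log r is the usual way to inspect the fit.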

Marvin Lee Minsky [1927-]. U. S. scientist in the field of artificial intelligence (AI), co-founder of the Laboratory of Artificial Intelligence at the Massachusetts Institute of Technology, and author of several texts on AI and philosophy. He served in the U.S. Navy in 1944-1945. He holds a BA in Mathematics from Harvard (1950) and a PhD in Mathematics from Princeton (1954). He has been on the faculty of the Massachusetts Institute of Technology since 1958. He is currently Toshiba Professor of Media Arts and Sciences and Professor of Electrical Engineering and Computer Science at the Massachusetts Institute of Technology.

Prof. Avram Noam Chomsky [1928-] is Institute Professor Emeritus of linguistics at the Massachusetts Institute of Technology. Chomsky developed the theory of generative grammar, regarded as the most significant contribution to the field of theoretical linguistics of the 20th century. Chomsky established the so-called Chomsky hierarchy, a classification of formal languages in terms of their generative power. Chomsky is also widely known for his political activism, and for his criticism of the foreign policy of the United States and other governments, particularly in the Vietnam War era. See:
http://en.wikipedia.org/wiki/Noam_Chomsky

Lotfi Asker Zadeh [1922-]. The so-called "Pope of Fuzzy Logic", whose 1965 paper introducing fuzzy set theory has been cited over 11,000 times in peer-reviewed journals of mathematics, computer science, or engineering.
See: http://en.wikipedia.org/wiki/Lotfi_Zadeh

William S. Gosset (Student). An employee of the Guinness Brewery in Dublin, Ireland, who wrote the ground-breaking papers in the British journal Biometrika about the Student t test. Gosset was a student of Karl Pearson, but because he did his work as a Guinness employee, he concealed his identity on account of his commercial ties. His papers were signed, simply, Student. The Guinness Book of World Records was written by the Guinness Brewery as an aid to settle arguments in British bars where Guinness products were served. See:
http://en.wikipedia.org/wiki/William_Sealey_Gossett

Sir Ronald A. Fisher. Greatest British statistician of the twentieth century. Sir Ronald corrected a small error in a formula for variance that had originally been promulgated by Karl Pearson. Fisher proved that the correct formula for the sample variance is: s^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2 / (n-1), not s^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2 / n, as Pearson had thought.
Sir Ronald was the scientist who demonstrated statistically that Mendel had probably fudged his data.
The F-test for the analysis of variance is named in honor of Fisher.
However, Sir Ronald sold out to the tobacco industry. When the news first emerged that tobacco use was bad for your health, Fisher defended the tobacco industry by asserting that the cause-effect relationship was not conclusively demonstrated. Fisher developed the concept of CONFOUNDING, in which he argued that tobacco users might have some other mysterious quality that caused them to develop tobacco-related illnesses, apart from the tobacco use. Fisher's prominence in the field of statistics helped the tobacco industry hide from its responsibilities for a number of years. Fisher's assertion was eventually rebutted by the fact that tobacco users who quit experienced a subsequent decrease in tobacco-related illnesses. See:
http://en.wikipedia.org/wiki/Ronald_Fisher
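
The two variance formulas contrasted in the Fisher entry differ only in the divisor, n - 1 versus n. The sketch below computes both for a small made-up sample and compares them with Python's standard statistics module, whose variance() divides by n - 1 and whose pvariance() divides by n. The function names and the data are illustrative assumptions, not taken from any source cited here.

```python
import statistics

def variance_n(xs):
    """Sum of squared deviations from the sample mean, divided by n."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)

def variance_n_minus_1(xs):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)

# Hypothetical sample: mean is 5, sum of squared deviations is 32
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

print(variance_n(data))             # divisor n = 8
print(variance_n_minus_1(data))     # divisor n - 1 = 7
print(statistics.pvariance(data))   # library version with divisor n
print(statistics.variance(data))    # library version with divisor n - 1
```

The n - 1 version is always the larger of the two, and the gap shrinks as n grows; dividing by n - 1 is what makes the estimate unbiased when the mean is itself estimated from the same sample.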

Huff D.
How to lie with statistics.
New York: W. W. Norton & Company. 1954.
ISBN 0-393-31072-8, 142 pages.
"In the space of one hundred seventy-six years, the Lower Mississippi has shortened itself two hundred and forty-two miles. That is an average of a trifle over one mile and a third per year. Therefore, any calm person, who is not blind or idiotic, can see that in the Old Oölitic Silurian Period, just a million years ago next November, the Lower Mississippi River was upward of one million three hundred thousand miles long, and stuck out over the Gulf of Mexico like a fishing-rod. And by the same token, any person can see that seven hundred and forty-two years from now, the Lower Mississippi will be only a mile and three-quarters long, and Cairo [Illinois] and New Orleans [Louisiana] will have joined their streets together, and be plodding comfortably along under a single mayor and a mutual board of aldermen. There is something fascinating about science. One gets such wholesale returns of conjecture out of such a trifling investment of fact."
Cited in: Huff D. How to lie with statistics. New York: W. W. Norton & Company. 1954. ISBN 0-393-31072-8, 142 pages. Page 142.
COMMENT. Mark Twain's classic book, Life on the Mississippi, is the first book in the world ever submitted by an author to a publisher as a typewritten manuscript, in 1883. The inventor of the typewriter was ... Howe, who was born on June 23, 18.. The (mechanical) typewriter was invented in 1868. Source: Garrison Keillor, Author's Corner, Maryland Public Radio, June 23, 2004.

CHAPTER 28.
GLOSSARY.



Estimation. The statistical procedure of determining the best value for a statistical parameter, given sample data.

Random variable. Function X : S -> R, that maps a probability event space into the real line, R.

Expected Value: The average value, E(X), of a random variable over the probability space: E(X) = ∑ x P(X=x).

Variance: The average squared deviation, Var(X), of a random variable from its expected value, over the probability space: Var(X) = E[(X - E(X))^2].
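
The definitions of expected value and variance above can be worked through for a small discrete example. The sketch below represents a random variable by a table of values and probabilities and uses exact fractions; the fair six-sided die and the function names are illustrative assumptions.

```python
from fractions import Fraction

def expected_value(pmf):
    """E(X) = sum over x of x * P(X = x), for a dict mapping value -> probability."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Var(X) = E[(X - E(X))^2], computed from the same value -> probability table."""
    mu = expected_value(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

# Hypothetical example: a fair die, each face with probability 1/6
die = {face: Fraction(1, 6) for face in range(1, 7)}

print(expected_value(die))  # 7/2
print(variance(die))        # 35/12
```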

Hypothesis testing.

Null hypothesis.

Alternative hypothesis.

Set theory: Zermelo-Fraenkel Set Theory (ZFST) is ordinary set theory.

Set: Undefined concepts of ZFST: is-a-member-of or belongs-to ; null-set or empty-set, Ø or {}.

Set: defined exactly by its members, arbitrary order.

Set-of-x not equal x: {x} ≠ x.

There are no repeat elements in a set.

Set-Roster (extensional, list) notation: set X = {heart, lung, liver, pancreas, ...}.

Set-builder (intensional) notation: O = {x|x is a major-body-organ}.

Set-subset: X ⊆ Y if and only if for every x ∈ X, x ∈ Y.

Set-equality: X = Y if and only if X ⊆ Y and Y ⊆ X.

Set-union: X ∪ Y is the set of all x such that x ∈ X or x ∈ Y or both.

Set-intersection: X ∩ Y is the set of all x such that x ∈ X and x ∈ Y.

Set-subtraction: X - Y is the set of all x such that x ∈ X and x ∉ Y.
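
The set-theory entries above map directly onto the built-in set type of Python, which can serve as a quick check of the definitions. This is a minimal sketch; the sets X and Y of organ names are illustrative assumptions in the spirit of the roster example.

```python
# Roster (extensional) notation: list the members explicitly
X = {"heart", "lung", "liver", "pancreas"}
Y = {"liver", "pancreas", "kidney"}

# A set is defined exactly by its members: order and repeats do not matter
print({"heart", "lung"} == {"lung", "heart", "heart"})   # True

# Subset and equality (equality is mutual inclusion)
print({"liver"} <= X)                                    # True
print(X == Y, X <= Y and Y <= X)                         # False False

# Union, intersection, and subtraction
print(sorted(X | Y))
print(sorted(X & Y))
print(sorted(X - Y))

# Builder (intensional) notation: select members by a property
long_names = {x for x in X | Y if len(x) > 5}
print(sorted(long_names))

# Set-of-x is not equal to x: the set containing X is not X itself
print(frozenset([frozenset(X)]) == frozenset(X))         # False
```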

CHAPTER 29.
ADDITIONAL READINGS.



Campbell JR, Carpenter P, Sneiderman C, Cohn S, Chute CG, Warren J.
Phase II evaluation of clinical coding schemes: completeness, taxonomy, mapping, definitions, and clarity. CPRI Work Group on Codes and Structures.
J Am Med Inform Assoc. 1997 May-Jun;4(3):238-251.
PMID: 9147343.

Humphreys BL, McCray AT, Cheh ML.
Evaluating the coverage of controlled health data terminologies: report on the results of the NLM/AHCPR large scale vocabulary test.
J Am Med Inform Assoc. 1997 Nov-Dec;4(6):484-500.
PMID: 9391936.

Chute CG, Cohn SP, Campbell KE, Oliver DE, Campbell JR.
The content coverage of clinical classifications. For The Computer-Based Patient Record Institute's Work Group on Codes & Structures.
J Am Med Inform Assoc. 1996 May-Jun;3(3):224-233.
PMID: 8723613.

Cimino JJ.
Review paper: coding systems in health care.
Methods Inf Med. 1996 Dec;35(4-5):273-284.
PMID: 9019091.

Campbell JR, Payne TH.
A comparison of four schemes for codification of problem lists.
Proc Annu Symp Comput Appl Med Care. 1994;:201-205.
PMID: 7949920.

Langlotz CP, Caldwell SA.
The completeness of existing lexicons for representing radiology report information.
J Digit Imaging. 2002;15 Suppl 1:201-5. Epub 2002 Mar 21.
PMID: 12105728.

Hales JW, Schoeffler KM, Kessler DP.
Extracting medical knowledge for a coded problem list vocabulary from the UMLS Knowledge Sources.
Proc AMIA Symp. 1998;:275-279.
PMID: 9929225.

Brown PJ, Warmington V, Laurence M, Prevost AT.
Randomised crossover trial comparing the performance of Clinical Terms Version 3 and Read Codes 5 byte set coding schemes in general practice.
BMJ. 2003 May 24;326(7399):1127.
PMID: 12763986.

Elkin PL, Ruggieri AP, Brown SH, Buntrock J, Bauer BA, Wahner-Roedler D, Litin SC, Beinborn J, Bailey KR, Bergstrom L.
A randomized controlled trial of the accuracy of clinical record retrieval using SNOMED-RT as compared with ICD9-CM.
Proc AMIA Symp. 2001;:159-163.
PMID: 11825173.

Mullins HC, Scanland PM, Collins D, Treece L, Petruzzi P Jr, Goodson A, Dickinson M.
The efficacy of SNOMED, Read Codes, and UMLS in coding ambulatory family practice clinical records.
Proc AMIA Annu Fall Symp. 1996;:135-139.
PMID: 8947643.

Bodenreider O, Burgun A, Botti G, Fieschi M, Le Beux P, Kohler F.
Evaluation of the Unified Medical Language System as a medical knowledge source.
J Am Med Inform Assoc. 1998 Jan-Feb;5(1):76-87.
PMID: 9452987.

Campbell KE, Musen MA.
Representation of clinical data using SNOMED III and conceptual graphs.
Proc Annu Symp Comput Appl Med Care. 1992;:354-358.
PMID: 1482897.

Campbell JR.
Semantic features of an enterprise interface terminology for SNOMED RT.
Medinfo. 2001;10(Pt 1):82-85.
PMID: 11604710.

Han SB, Kwak M, Kim S, Yoo S, Park H, Kijoo J, Kim J, Choi M, Choi J.
A comparative study on concept representation between the UMLS and the clinical terms in Korean medical records.
Medinfo. 2004;11(Pt 1):616-620.
PMID: 15360886.

O'Keefe KM, Sievert M, Mitchell JA.
Mendelian inheritance in man: diagnoses in the UMLS.
Proc Annu Symp Comput Appl Med Care. 1993;:735-739.
PMID: 8130573.

Humphreys BL, Hole WT, McCray AT, Fitzmaurice JM.
Planned NLM/AHCPR large-scale vocabulary test: using UMLS technology to determine the extent to which controlled vocabularies cover terminology needed for health care and public health.
J Am Med Inform Assoc. 1996 Jul-Aug;3(4):281-287.
PMID: 8816351.

Han SB, Choi J.
The comparative study on concept representation between the UMLS and the clinical terms in Korean medical records.
Int J Med Inform. 2005 Jan;74(1):67-76.
PMID: 15626637.

Wasserman H, Wang J.
An applied evaluation of SNOMED CT as a clinical vocabulary for the computerized diagnosis and problem list.
AMIA Annu Symp Proc. 2003;:699-703.
PMID: 14728263.

Berman JJ.
Nomenclature-based data retrieval without prior annotation: facilitating biomedical data integration with fast doublet matching.
In Silico Biol. 2005;5(3):313-22. Epub 2005 Apr 3.
PMID: 15984939.

Goldberg H, Goldsmith D, Law V, Keck K, Tuttle M, Safran C.
An evaluation of UMLS as a controlled terminology for the Problem List Toolkit.
Medinfo. 1998;9 Pt 1:609-612.
PMID: 10384527.

Vardy DA, Gill RP, Israeli A.
Coding medical information: classification versus nomenclature and implications to the Israeli medical system.
J Med Syst. 1998 Aug;22(4):203-210.
PMID: 9690178.

Klimczak JC, Hahn AW, Sievert ME, Petroski G, Hewett J.
Comparing clinical vocabularies using coding system fidelity.
Proc Annu Symp Comput Appl Med Care. 1995;:883-887.
PMID: 8563419.

Strang N, Cucherat M, Boissel JP.
Which coding system for therapeutic information in evidence-based medicine.
Comput Methods Programs Biomed. 2002 Apr;68(1):73-85.
PMID: 11886704.

Boxwala AA, Zeng QT, Chamberas A, Sato L, Dierks M.
Coverage of patient safety terms in the UMLS metathesaurus.
AMIA Annu Symp Proc. 2003;:110-114.
PMID: 14728144.

Ruggieri AP, Elkin P, Chute CG.
Representation by standard terminologies of health status concepts contained in two health status assessment instruments used in rheumatic disease management.
Proc AMIA Symp. 2000;:734-738.
PMID: 11079981.

Bodenreider O, Burgun A, Rindflesch TC.
Assessing the consistency of a biomedical terminology through lexical knowledge.
Int J Med Inform. 2002 Dec 4;67(1-3):85-95.
PMID: 12460634.

Chiang MF, Casper DS, Cimino JJ, Starren J.
Representation of ophthalmology concepts by electronic systems: adequacy of controlled medical terminologies.
Ophthalmology. 2005 Feb;112(2):175-183.
PMID: 15691548.

Brown PJ, O'Neil M, Price C.
Semantic definition of disorders in version 3 of the Read Codes.
Methods Inf Med. 1998 Nov;37(4-5):415-419.
PMID: 9865039.

Cimino JJ.
Use of the Unified Medical Language System in patient care at the Columbia-Presbyterian Medical Center.
Methods Inf Med. 1995 Mar;34(1-2):158-164.
PMID: 9082126.

Spackman KA, Campbell KE.
Compositional concept representation using SNOMED: towards further convergence of clinical terminologies.
Proc AMIA Symp. 1998;:740-744.
PMID: 9929317.

Kostoff RN, Block JA, Stump JA, Pfeil KM.
Information content in Medline record fields.
Int J Med Inform. 2004 Jun 30;73(6):515-27.
PMID: 15171980.

Lussier YA, Shagina L, Friedman C.
Automating SNOMED coding using medical language understanding: a feasibility study.
Proc AMIA Symp. 2001;:418-422.
PMID: 11825222.

Hausam RR, Hahn AW.
Representation of clinical problem assessment phrases in U. S. family practice using Read version 3.1 terms: a preliminary study.
Proc Annu Symp Comput Appl Med Care. 1995;:426-430.
PMID: 8563317.

Friedman C, Shagina L, Lussier Y, Hripcsak G.
Automated encoding of clinical documents based on natural language processing.
J Am Med Inform Assoc. 2004 Sep-Oct;11(5):392-402. Epub 2004 Jun 7.
PMID: 15187068.

van Mulligen EM.
UMLS-based access to CPR data. Unified Medical Language Systems.
Int J Med Inform. 1999 Feb-Mar;53(2-3):125-131.
PMID: 10193882.

Rosenberg KM, Coultas DB.
Acceptability of Unified Medical Language System terms as substitute for natural language general medicine clinic diagnoses.
Proc Annu Symp Comput Appl Med Care. 1994;:193-197.
PMID: 7949918.

Travers DA, Haas SW.
Evaluation of emergency medical text processor, a system for cleaning chief complaint text data.
Acad Emerg Med. 2004 Nov;11(11):1170-1176.
PMID: 15528581.

Folk LC, Hahn AW, Patrick TB, Allen GK, Smith AB, Wilcke JR.
Salvaging legacy data: mapping an obsolete medical nomenclature to a modern one.
Biomed Sci Instrum. 2002;38:405-10.
PMID: 12085640.

Peden AH.
An overview of coding and its relationship to standardized clinical terminology.
Top Health Inf Manage. 2000 Nov;21(2):1-9. Review.
PMID: 11143274.

Huang Y, Lowe HJ, Hersh WR.
A pilot study of contextual UMLS indexing to improve the precision of concept-based representation in XML-structured clinical radiology reports.
J Am Med Inform Assoc. 2003 Nov-Dec;10(6):580-587. Epub 2003 Aug 4.
PMID: 12925544.

Berman JJ.
A tool for sharing annotated research data: the "Category 0" UMLS (Unified Medical Language System) vocabularies.
BMC Med Inform Decis Mak. 2003 Jun 16;3:6. Epub 2003 Jun 16.
PMID: 12809560.

McCray AT, Divita G.
ASN.1: defining a grammar for the UMLS knowledge sources.
Proc Annu Symp Comput Appl Med Care. 1995;:868-872.
PMID: 8563416.

Miller G, Britt H.
Data collection and changing health care systems. 1. United Kingdom.
Med J Aust. 1993 Oct 4;159(7):471-476.
PMID: 8412921.

Cimino JJ, Min H, Perl Y.
Consistency across the hierarchies of the UMLS Semantic Network and Metathesaurus.
J Biomed Inform. 2003 Dec;36(6):450-61.
PMID: 14759818.

Mutalik PG, Deshpande A, Nadkarni PM.
Use of general-purpose negation detection to augment concept indexing of medical documents: a quantitative study using the UMLS.
J Am Med Inform Assoc. 2001 Nov-Dec;8(6):598-609.
PMID: 11687566.

Bidgood WD Jr.
The SNOMED DICOM microglossary: controlled terminology resource for data interchange in biomedical imaging.
Methods Inf Med. 1998 Nov;37(4-5):404-414.
PMID: 9865038.

Dolin RH, Mattison JE, Cohn S, Campbell KE, Wiesenthal AM, Hochhalter B, LaBerge D, Barsoum R, Shalaby J, Abilla A, Clements RJ, Correia CM, Esteva D, Fedack JM, Goldberg BJ, Gopalarao S, Hafeza E, Hendler P, Hernandez E, Kamangar R, Kahn RA, Kurtovich G, Lazzareschi G, Lee MH, Lee T, Levy D, Lukoff JY, Lundberg C, Madden MP, Ngo TL, Nguyen BT, Patel NP, Resneck J, Ross DE, Schwarz KM, Selhorst CC, Snyder A, Umarji MI, Vilner M, Zer-Chen R, Zingo C.
Kaiser Permanente's Convergent Medical Terminology.
Medinfo. 2004;11(Pt 1):346-350.
PMID: 15360832.

Brown PJ, Warmington V, Laurence M, Prevost AT.
A methodology for the functional comparison of coding schemes in primary care.
Inform Prim Care. 2003;11(3):145-148.
PMID: 14680537.

Elkin PL, Ruggieri A, Bergstrom L, Bauer BA, Lee M, Ogren PV, Chute CG.
A randomized controlled trial of concept based indexing of Web page content.
Proc AMIA Symp. 2000;:220-224.
PMID: 11079877.

Fung KW, Hole WT, Nelson SJ, Srinivasan S, Powell T, Roth L.
Integrating SNOMED CT into the UMLS: an exploration of different views of synonymy and quality of editing.
J Am Med Inform Assoc. 2005 Jul-Aug;12(4):486-94. Epub 2005 Mar 31.
PMID: 15802483.

Zeng Q, Cimino JJ.
Mapping medical vocabularies to the Unified Medical Language System.
Proc AMIA Annu Fall Symp. 1996;:105-109.
PMID: 8947637.

Elkin PL, Bailey KR, Chute CG.
A randomized controlled trial of automated term composition.
Proc AMIA Symp. 1998;:765-769.
PMID: 9929322.

Wang AY, Sable JH, Spackman KA.
The SNOMED clinical terms development process: refinement and analysis of content.
Proc AMIA Symp. 2002;:845-849.
PMID: 12463944.

Sinha U, Yaghmai A, Thompson L, Dai B, Taira RK, Dionisio JD, Kangarloo H.
Evaluation of SNOMED3.5 in representing concepts in chest radiology reports: integration of a SNOMED mapper with a radiology reporting workstation.
Proc AMIA Symp. 2000;:799-803.
PMID: 11079994.

O'Neil M, Payne C, Read J.
Read Codes Version 3: a user led terminology.
Methods Inf Med. 1995 Mar;34(1-2):187-192.
PMID: 9082130.

Kudla KM, Rallins MC.
SNOMED: a controlled vocabulary for computer-based patient records.
J AHIMA. 1998 May;69(5):40-44; quiz 45-46.
PMID: 10179248.

Gray J, Orr D, Majeed A.
Use of Read codes in diabetes management in a south London primary care group: implications for establishing disease registers.
BMJ. 2003 May 24;326(7399):1130.
PMID: 12763987.

Bodenreider O.
The Unified Medical Language System (UMLS): integrating biomedical terminology.
Nucleic Acids Res. 2004 Jan 1;32(Database issue):D267-270.
PMID: 14681409.

Burgun A, Botti G, Bodenreider O, Delamarre D, Leveque JM, Lukacs B, Mayeux D, Bremond M, Kohler F, Fieschi M, Le Beux P.
Methodology for using the UMLS as a background knowledge for the description of surgical procedures.
Int J Biomed Comput. 1996 Dec;43(3):189-202.
PMID: 9032008.

Bodenreider O, McCray AT.
From French vocabulary to the Unified Medical Language System: a preliminary study.
Medinfo. 1998;9 Pt 1:670-674.
PMID: 10384539.

Cohen PT, Henry SB.
Representing HIV clinical terminology with SNOMED.
Medinfo. 1995;8 Pt 1:172.
PMID: 8591146.

Cowie JM, Wanger KM, Cartwright A, Bailey H, Millar JA, Price S, Henry M.
A review of Clinical Terms Version 3 (Read Codes) for speech and language record keeping.
Int J Lang Commun Disord. 2001 Jan-Mar;36(1):117-26.
PMID: 11221428.

Cimino JJ.
Formal descriptions and adaptive mechanisms for changes in controlled medical vocabularies.
Methods Inf Med. 1996 Sep;35(3):202-210.
PMID: 8952304.

Lussier YA, Bourque M.
Comparing SNOMED and ICPC retrieval accuracies using relational database models.
Proc AMIA Annu Fall Symp. 1997;:514-518.
PMID: 9357679.

de Lusignan S.
Codes, classifications, terminologies and nomenclatures: definition, development and application in practice.
Inform Prim Care. 2005;13(1):65-70.
PMID: 15949178.

Petersson H, Nilsson G, Strender LE, Ahlfeldt H.
The connection between terms used in medical records and coding system: a study on Swedish primary health care data.
Med Inform Internet Med. 2001 Apr-Jun;26(2):87-99.
PMID: 11560294.

Zollo KA, Huff SM.
Automated mapping of observation codes using extensional definitions.
J Am Med Inform Assoc. 2000 Nov-Dec;7(6):586-592.
PMID: 11062232.

Lowe HJ, Antipov I, Hersh W, Smith CA, Mailhot M.
Automated semantic indexing of imaging reports to support retrieval of medical images in the multimedia electronic medical record.
Methods Inf Med. 1999 Dec;38(4-5):303-307.
PMID: 10805018.

Emelin IV, Levenson R, Perov YL, Rykov VV.
A Russian version of SNOMED-International.
Medinfo. 1995;8 Pt 1:173.
PMID: 8591147.

Boulos MN, Roudsari AV, Carson ER.
Towards a semantic medical Web: HealthCyberMap's tool for building an RDF metadata base of health information resources based on the Qualified Dublin Core Metadata Set.
Med Sci Monit. 2002 Jul;8(7):MT124-136.
PMID: 12118210.

Fenton SH.
Clinical vocabularies and terminologies: impact on the future of health information management.
Top Health Inf Manage. 2000 Nov;21(2):74-80.
PMID: 11143283.

van Mulligen EM.
UMLS-based access to CPR data.
Medinfo. 1998;9 Pt 1:166-170.
PMID: 10384441.

Pietrzyk PM.
Free text analysis.
Int J Biomed Comput. 1995 Apr;39(1):139-144.
PMID: 7601527.

Phantumvanit P, Monteil RA, Walsh TF, Miotti FA, Carlsson P, Doukoudakis A, Fox C, Harzer W.
4.2 Clinical records and global diagnostic codes.
Eur J Dent Educ. 2002;6 Suppl 3:138-146.
PMID: 12390270.

Levy B.
Evolving to clinical terminology.
J Healthc Inf Manag. 2004 Summer;18(3):37-43.
PMID: 15301416.

Mejino JL Jr, Rosse C.
The potential of the digital anatomist foundational model for assuring consistency in UMLS sources.
Proc AMIA Symp. 1998;:825-829.
PMID: 9929334.

Sager N, Lyman M, Nhan NT, Tick LJ.
Medical language processing: applications to patient data representation and automatic encoding.
Methods Inf Med. 1995 Mar;34(1-2):140-146.
PMID: 9082123.

Friedman C, Liu H, Shagina L, Johnson S, Hripcsak G.
Evaluating the UMLS as a source of lexical knowledge for medical language processing.
Proc AMIA Symp. 2001;:189-193.
PMID: 11825178.

Rose JS, Fisch BJ, Hogan WR, Levy B, Marshall P, Thomas DR, Kirkley D.
Common medical terminology comes of age, Part Two: Current code and terminology sets--strengths and weaknesses.
J Healthc Inf Manag. 2001 Fall;15(3):319-330.
PMID: 11642148.

Rector AL, Solomon WD, Nowlan WA, Rush TW, Zanstra PE, Claassen WM.
A Terminology Server for medical language and medical information systems.
Methods Inf Med. 1995 Mar;34(1-2):147-157.
PMID: 9082124.

Lussier YA, Rothwell DJ, Cote RA.
The SNOMED model: a knowledge source for the controlled terminology of the computerized patient record.
Methods Inf Med. 1998 Jun;37(2):161-164.
PMID: 9656658.

LeMier M, Cummings P, West TA.
Accuracy of external cause of injury codes reported in Washington State hospital discharge records.
Inj Prev. 2001 Dec;7(4):334-338.
PMID: 11770664.

Michael J, Mejino JL Jr, Rosse C.
The role of definitions in biomedical concept representation.
Proc AMIA Symp. 2001;:463-467.
PMID: 11825231.

Brennan PF, Aronson AR.
Towards linking patients and clinical information: detecting UMLS concepts in e-mail.
J Biomed Inform. 2003 Aug-Oct;36(4-5):334-341.
PMID: 14643729.

McDonald CJ, Martin DK, Overhage JM.
Standards for the electronic transfer of clinical data: progress and promises.
Top Health Rec Manage. 1991 Jun;11(4):1-16.
PMID: 10112033.

Klimczak JC, Hahn AW, Hausam RR, Sievert ME, Mitchell JA.
A system for browsing the SNOMED International vocabulary.
Biomed Sci Instrum. 1994;30:127-132.
PMID: 7948624.

Wang AY, Barrett JW, Bentley T, Markwell D, Price C, Spackman KA, Stearns MQ.
Mapping between SNOMED RT and Clinical terms version 3: a key component of the SNOMED CT development process.
Proc AMIA Symp. 2001;:741-745.
PMID: 11825284.

Ingenerf J, Reiner J, Seik B.
Standardized terminological services enabling semantic interoperability between distributed and heterogeneous systems.
Int J Med Inform. 2001 Dec;64(2-3):223-240.
PMID: 11734388.

Kim JM, Frosdick P.
Description of a drug hierarchy in a concept-based reference terminology.
Proc AMIA Symp. 2001;:314-318.
PMID: 11825202.

Elkin PL, Bailey KR, Ogren PV, Bauer BA, Chute CG.
A randomized double-blind controlled trial of automated term dissection.
Proc AMIA Symp. 1999;:62-66.
PMID: 10566321.

Caviedes JE, Cimino JJ.
Towards the development of a conceptual distance metric for the UMLS.
J Biomed Inform. 2004 Apr;37(2):77-85.
PMID: 15120654.

Chute CG, Cohn SP, Campbell JR.
A framework for comprehensive health terminology systems in the United States: development guidelines, criteria for selection, and public policy implications.
ANSI Healthcare Informatics Standards Board Vocabulary Working Group and the Computer-Based Patient Records Institute Working Group on Codes and Structures.
J Am Med Inform Assoc. 1998 Nov-Dec;5(6):503-510.
PMID: 9824798.

Chute CG, Elkin PL.
A clinically derived terminology: qualification to reduction.
Proc AMIA Annu Fall Symp. 1997;:570-574.
PMID: 9357690.

Browne AC, Divita G, Aronson AR, McCray AT.
UMLS language and vocabulary tools.
AMIA Annu Symp Proc. 2003;:798.
PMID: 14728303.

Stuart-Buttle CD, Read JD, Sanderson HF, Sutton YM.
A language of health in action: Read Codes, classifications and groupings.
Proc AMIA Annu Fall Symp. 1996;:75-79.
PMID: 8947631.

Nadkarni P, Chen R, Brandt C.
UMLS concept indexing for production databases: a feasibility study.
J Am Med Inform Assoc. 2001 Jan-Feb;8(1):80-91.
PMID: 11141514.

Dykes PC, Currie LM, Cimino JJ.
Adequacy of evolving national standardized terminologies for interdisciplinary coded concepts in an automated clinical pathway.
J Biomed Inform. 2003 Aug-Oct;36(4-5):313-325.

Elkin PL, Mohr DN, Tuttle MS, Cole WG, Atkin GE, Keck K, Fisk TB, Kaihoi BH, Lee KE, Higgins MC, Suermondt HJ, Olson N, Claus PL, Carpenter PC, Chute CG.
Standardized problem list generation, utilizing the Mayo canonical vocabulary embedded within the Unified Medical Language System.
Proc AMIA Annu Fall Symp. 1997;:500-504.
PMID: 9357676.

Onogi Y, Ohe K, Tanaka M, Nozoe A, Sasaki T, Sato M, Kikuchi Y, Shinohara T, Suzuki H, Kaihara S, Seyama Y.
Mapping Japanese medical terms to UMLS Metathesaurus.
Medinfo. 2004;11(Pt 1):406-410.
PMID: 15360844.

Hersh WR, Campbell EH, Evans DA, Brownlow ND.
Empirical, automated vocabulary discovery using large text corpora and advanced natural language processing tools.
Proc AMIA Annu Fall Symp. 1996;:159-163.
PMID: 8947648.

Warren JJ, Collins J, Sorrentino C, Campbell JR.
Just-in-time coding of the problem list in a clinical environment.
Proc AMIA Symp. 1998;:280-284.
PMID: 9929226.

Cantor MN, Sarkar IN, Gelman R, Hartel F, Bodenreider O, Lussier YA.
An evaluation of hybrid methods for matching biomedical terminologies: mapping the gene ontology to the UMLS.
Stud Health Technol Inform. 2003;95:62-67.
PMID: 14663964.

Zweigenbaum P, Grabar N.
Corpus-based associations provide additional morphological variants to medical terminologies.
AMIA Annu Symp Proc. 2003;:768-772.
PMID: 14728277.

Alexander S, Conner T, Slaughter T.
Overview of inpatient coding.
Am J Health Syst Pharm. 2003 Nov 1;60(21 Suppl 6):S11-14.
PMID: 14619128.

Rosse C, Mejino JL Jr.
A reference ontology for biomedical informatics: the Foundational Model of Anatomy.
J Biomed Inform. 2003 Dec;36(6):478-500.
PMID: 14759820.

Bodenreider O, Nelson SJ, Hole WT, Chang HF.
Beyond synonymy: exploiting the UMLS semantics in mapping vocabularies.
Proc AMIA Symp. 1998;:815-819.
PMID: 9929332.

Payne TH, Martin DR.
How useful is the UMLS metathesaurus in developing a controlled vocabulary for an automated problem list?
Proc Annu Symp Comput Appl Med Care. 1993;:705-709.
PMID: 8130567.

Stitt FW.
The problem-oriented medical synopsis: coding, indexing, and classification sub-model.
Proc Annu Symp Comput Appl Med Care. 1994;:964.
PMID: 7950069.

Franz P, Zaiss A, Schulz S, Hahn U, Klar R.
Automated coding of diagnoses--three methods compared.
Proc AMIA Symp. 2000;:250-254.
PMID: 11079883.

Spackman KA.
Integrating sources for a clinical reference terminology: experience linking SNOMED to LOINC and drug vocabularies.
Medinfo. 1998;9 Pt 1:600-603.
PMID: 10384525.

Baud RH, Lovis C, Rassinoux AM, Scherrer JR.
Alternative ways for knowledge collection, indexing and robust language retrieval.
Methods Inf Med. 1998 Nov;37(4-5):315-326.
PMID: 9865029.

Brown PJ, Odusanya L.
Does size matter?--Evaluation of value added content of two decades of successive coding schemes in secondary care.
Proc AMIA Symp. 2001;:71-75.
PMID: 11825157.

Johnson SB.
A semantic lexicon for medical language processing.
J Am Med Inform Assoc. 1999 May-Jun;6(3):205-218.
PMID: 10332654.

Miller PL, Frawley SJ, Wright L, Roderer NK, Powsner SM.
Lessons learned from a pilot implementation of the UMLS information sources map.
J Am Med Inform Assoc. 1995 Mar-Apr;2(2):102-115.
PMID: 7743314.

Burgun A, Bodenreider O.
Mapping the UMLS Semantic Network into general ontologies.
Proc AMIA Symp. 2001;:81-85.
PMID: 11833483.

Hasman A, de Bruijn LM, Arends JW.
Evaluation of a method that supports pathology report coding.
Methods Inf Med. 2001;40(4):293-297.
PMID: 11552341.

Dessena S, Mori AR, Galeazzi E.
Development of a cross-thesaurus with Internet-based refinement supported by UMLS.
Int J Med Inform. 1999 Jan;53(1):29-41.
PMID: 10075129.

Cimino JJ.
Auditing the Unified Medical Language System with semantic methods.
J Am Med Inform Assoc. 1998 Jan-Feb;5(1):41-51.
PMID: 9452984.

MacIntyre CR, Ackland MJ, Chandraraj EJ.
Accuracy of injury coding in Victorian hospital morbidity data.
Aust N Z J Public Health. 1997 Dec;21(7):779-783.
PMID: 9489199.

Schwartz RJ, Nightingale BS, Boisoneau D, Jacobs LM.
Accuracy of e-codes assigned to emergency department records.
Acad Emerg Med. 1995 Jul;2(7):615-620.
PMID: 8521208.

Brown SH, Miller RA, Camp HN, Giuse DA, Walker HK.
Empirical derivation of an electronic clinically useful problem statement system.
Ann Intern Med. 1999 Jul 20;131(2):117-126.
PMID: 10419428.

Cimino JJ.
Terminology tools: state of the art and practical lessons.
Methods Inf Med. 2001;40(4):298-306.
PMID: 11552342.

Lowe HJ, Antipov I, Hersh W, Smith CA.
Towards knowledge-based retrieval of medical images. The role of semantic indexing, image content representation and knowledge-based retrieval.
Proc AMIA Symp. 1998;:882-886.
PMID: 9929345.

Geissbuhler A, Miller RA.
Clinical application of the UMLS in a computerized order entry and decision-support system.
Proc AMIA Symp. 1998;:320-324.
PMID: 9929234.

Elkin PL, Brown SH.
Automated enhancement of description logic-defined terminologies to facilitate mapping to ICD9-CM.
J Biomed Inform. 2002 Oct-Dec;35(5-6):281-288.

Li J, Morlet N, Semmens J, Gavin A, Ng J; EPSWA Team.
Coding accuracy for endophthalmitis diagnosis and cataract procedures in Western Australia. The Endophthalmitis Population Study of Western Australia (EPSWA): second report.
Ophthalmic Epidemiol. 2003 Apr;10(2):133-145.
PMID: 12660861.

Hole WT, Srinivasan S.
Discovering missed synonymy in a large concept-oriented metathesaurus.
Proc AMIA Symp. 2000;:354-358.
PMID: 11079904.

Penz JF, Brown SH, Carter JS, Elkin PL, Nguyen VN, Sims SA, Lincoln MJ.
Evaluation of SNOMED coverage of Veterans Health Administration terms.
Medinfo. 2004;11(Pt 1):540-544.
PMID: 15360871.

Read JD, Sanderson HF, Drennan YM.
Terming, encoding, and grouping.
Medinfo. 1995;8 Pt 1:56-59.
PMID: 8591263.

Sager N, Lyman M, Nhan NT, Tick LJ.
Automatic encoding into SNOMED III: a preliminary investigation.
Proc Annu Symp Comput Appl Med Care. 1994;:230-234.
PMID: 7949925.

Spackman KA.
Normal forms for description logic expressions of clinical concepts in SNOMED RT.
Proc AMIA Symp. 2001;:627-631.
PMID: 11825261.

Happe A, Pouliquen B, Burgun A, Cuggia M, Le Beux P.
Automatic concept extraction from spoken medical reports.
Int J Med Inform. 2003 Jul;70(2-3):255-263.
PMID: 12909177.

Berman JJ.
Resources for comparing the speed and performance of medical autocoders.
BMC Med Inform Decis Mak. 2004 Jun 15;4:8.
PMID: 15198804.

Rothwell DJ.
SNOMED-based knowledge representation.
Methods Inf Med. 1995 Mar;34(1-2):209-213.
PMID: 9082133.

Dixon J, Sanderson C, Elliott P, Walls P, Jones J, Petticrew M.
Assessment of the reproducibility of clinical coding in routinely collected hospital activity data: a study in two hospitals.
J Public Health Med. 1998 Mar;20(1):63-69.
PMID: 9602451.

Masarie FE Jr, Miller RA, Bouhaddou O, Giuse NB, Warner HR.
An interlingua for electronic interchange of medical information: using frames to map between clinical vocabularies.
Comput Biomed Res. 1991 Aug;24(4):379-400.
PMID: 1889203.

Masys DR.
An evaluation of the source selection elements of the prototype UMLS Information Sources Map.
Proc Annu Symp Comput Appl Med Care. 1992;:295-298.
PMID: 1482883.

Choi J, Jenkins ML, Cimino JJ, White TM, Bakken S.
Toward semantic interoperability in home health care: formally representing OASIS items for integration into a concept-oriented terminology.
J Am Med Inform Assoc. 2005 Jul-Aug;12(4):410-417. Epub 2005 Mar 31.
PMID: 15802480.

Moore GW, Berman JJ.
Performance analysis of manual and automated systematized nomenclature of medicine (SNOMED) coding.
Am J Clin Pathol. 1994 Mar;101(3):253-256.
PMID: 8135178.

Moore GW, Berman JJ.
Automatic SNOMED coding.
Proc Annu Symp Comput Appl Med Care. 1994;:225-229.
PMID: 7949924.

Rosse C, Ben Said M, Eno KR, Brinkley JF.
Enhancements of anatomical information in UMLS knowledge sources.
Proc Annu Symp Comput Appl Med Care. 1995;:873-877.
PMID: 8563417.

Cimino JJ, Clayton PD, Hripcsak G, Johnson SB.
Knowledge-based approaches to the maintenance of a large controlled medical terminology.
J Am Med Inform Assoc. 1994 Jan-Feb;1(1):35-50.
PMID: 7719786.

McCray AT, Razi AM, Bangalore AK, Browne AC, Stavri PZ.
The UMLS Knowledge Source Server: a versatile Internet-based research tool.
Proc AMIA Annu Fall Symp. 1996;:164-168.
PMID: 8947649.

Murphy SN, Barnett GO.
Achieving automated narrative text interpretation using phrases in the electronic medical record.
Proc AMIA Annu Fall Symp. 1996;:532-536.
PMID: 8947723.

Zweigenbaum P, Baud R, Burgun A, Namer F, Jarrousse E, Grabar N, Ruch P, Le Duff F, Forget JF, Douyere M, Darmoni S.
UMLF: a unified medical lexicon for French.
Int J Med Inform. 2005 Mar;74(2-4):119-124.
PMID: 15694616.

Yarnall KS, Michener JL, Broadhead WE, Hammond WE, Tse CK.
Computer-prompted diagnostic codes.
J Fam Pract. 1995 Mar;40(3):257-262.
PMID: 7876783.

Rector AL, Glowinski AJ, Nowlan WA, Rossi-Mori A.
Medical-concept models and medical records: an approach based on GALEN and PEN&PAD.
J Am Med Inform Assoc. 1995 Jan-Feb;2(1):19-35.
PMID: 7895133.

Britt H, Meza RA, Del Mar C.
Methodology of morbidity and treatment data collection in general practice in Australia: a comparison of two methods.
Fam Pract. 1996 Oct;13(5):462-467.
PMID: 8902516.

Hole WT, Carlsen BA, Tuttle MS, Srinivasan S, Lipow SS, Olson NE, Sherertz DD, Humphreys BL.
Achieving "source transparency" in the UMLS Metathesaurus.
Medinfo. 2004;11(Pt 1):371-375.
PMID: 15360837.

Gu H, Perl Y, Elhanan G, Min H, Zhang L, Peng Y.
Auditing concept categorizations in the UMLS.
Artif Intell Med. 2004 May;31(1):29-44.
PMID: 15182845.

Chute CG, Yang Y, Evans DA.
Latent Semantic Indexing of medical diagnoses using UMLS semantic structures.
Proc Annu Symp Comput Appl Med Care. 1991;:185-189.
PMID: 1807584.

Thurin A, Carlsson M, Gill H, Wigertz O.
Arden syntax and GALEN terminology support: a powerful combination to represent medical knowledge.
Medinfo. 1995;8 Pt 1:110.
PMID: 8591131.

Fisher ES, Whaley FS, Krushat WM, Malenka DJ, Fleming C, Baron JA, Hsia DC.
The accuracy of Medicare's hospital claims data: progress has been made, but problems remain.
Am J Public Health. 1992 Feb;82(2):243-248.
PMID: 1739155.

Levesque Y, LeBlanc AR, Maksud M.
MD Concept: a model for integrating medical knowledge.
Proc Annu Symp Comput Appl Med Care. 1994;:252-256.
PMID: 7949929.

Brown PJ, Warmington V.
The Certainty-Agreement diagram: comparing the functionality of coding schemes in primary care clinical information systems.
AMIA Annu Symp Proc. 2003;:797.
PMID: 14728302.

Berman JJ, Moore GW, Donnelly WH, Massey JK, Craig B.
A SNOMED analysis of three years' accessioned cases (40,124) of a surgical pathology department: implications for pathology-based demographic studies.
Proc Annu Symp Comput Appl Med Care. 1994;:188-192.
PMID: 7949917.

Elkin PL, Brown SH, Bauer BA, Husser CS, Carruth W, Bergstrom LR, Wahner-Roedler DL.
A controlled trial of automated classification of negation from clinical notes.
BMC Med Inform Decis Mak. 2005 May 5;5(1):13.
PMID: 15876352.

Coonan KM.
Medical informatics standards applicable to emergency department information systems: making sense of the jumble.
Acad Emerg Med. 2004 Nov;11(11):1198-1205.
PMID: 15528585.

Henry SB, Holzemer WL.
Can SNOMED International represent patients' perceptions of health-related problems for the computer-based patient record?
Proc Annu Symp Comput Appl Med Care. 1994;:184-187.
PMID: 7949916.

Burgun A, Delamarre D, Botti G, Lukacs B, Mayeux D, Bremond M, Kohler F, Fieschi M, Le Beux P.
Designing a sub-set of the UMLS knowledge base applied to a clinical domain: methods and evaluation.
Proc Annu Symp Comput Appl Med Care. 1994;:968.
PMID: 7950072.

Oliver DE, Altman RB.
Extraction of SNOMED concepts from medical record texts.
Proc Annu Symp Comput Appl Med Care. 1994;:179-183.
PMID: 7949915.

Satomura Y, do Amaral MB.
Automated diagnostic indexing by natural language processing.
Med Inform (Lond). 1992 Jul-Sep;17(3):149-163.
PMID: 1405837.

Burgun A, Bodenreider O, Denier P, Delamarre D, Botti G, Lukacs B, Mayeux D, Bremond M, Kohler F, Fieschi M, et al.
Knowledge acquisition from the UMLS sources: application to the description of surgical procedures.
Medinfo. 1995;8 Pt 1:75-79.
PMID: 8591317.

Hohnloser JH, Kadlec P, Puerner F.
Coding clinical information: analysis of clinicians using computerized coding.
Methods Inf Med. 1996 Jun;35(2):104-107.
PMID: 8755382.

Cimino JJ.
Representation of clinical laboratory terminology in the Unified Medical Language System.
Proc Annu Symp Comput Appl Med Care. 1991;:199-203.
PMID: 1807587.

Greenes RA, McClure RC, Pattison-Gordon E, Sato L.
The findings--diagnosis continuum: implications for image descriptions and clinical databases.
Proc Annu Symp Comput Appl Med Care. 1992;:383-387.
PMID: 1482902.

Rossi CR, Alberti V, Mancino G, Flor L, Martello T, Poeta L, Lise M, Diana L, Puppini G.
Comparison between manual and automatic coding of medical record statistical cards at a university hospital.
Med Inform (Lond). 1993 Jan-Mar;18(1):53-59.
PMID: 8366692.

Tuttle MS, Nelson SJ.
The role of the UMLS in 'storing' and 'sharing' across systems.
Int J Biomed Comput. 1994 Jan;34(1-4):207-237.
PMID: 8125633.

Hohnloser JH, Purner F, Kadlec P.
Coding medical concepts: a controlled experiment with a computerised coding tool.
Int J Clin Monit Comput. 1995;12(3):141-145.
PMID: 8583167.

Kudla KM, Blakemore M.
SNOMED takes the next step.
J AHIMA. 2001 Jul-Aug;72(7):62, 64-68; quiz 69-70.
PMID: 15724371.

Barrie JL, Marsh DR.
Quality of data in the Manchester orthopaedic database.
BMJ. 1992 Jan 18;304(6820):159-162.
PMID: 1737162.

Schuyler PL, Hole WT, Tuttle MS, Sherertz DD.
The UMLS Metathesaurus: representing different views of biomedical concepts.
Bull Med Libr Assoc. 1993 Apr;81(2):217-222.
PMID: 8472007.

Huang Y, Lowe HJ, Klein D, Cucina RJ.
Improved identification of noun phrases in clinical radiology reports using a high-performance statistical natural language parser augmented with the UMLS specialist lexicon.
J Am Med Inform Assoc. 2005 May-Jun;12(3):275-285. Epub 2005 Jan 31.
PMID: 15684131.

Krall MA, Chin H, Dworkin L, Gabriel K, Wong R.
Improving clinician acceptance and use of computerized documentation of coded diagnosis.
Am J Manag Care. 1997 Apr;3(4):597-601.
PMID: 10169527.

Hishiki T, Ogasawara O, Tsuruoka Y, Okubo K.
Indexing anatomical concepts to OMIM Clinical Synopsis using the UMLS Metathesaurus.
In Silico Biol. 2004;4(1):31-54. Epub 2003 Dec 28.
PMID: 15089752.

Zelingher J, Rind DM, Caraballo E, Tuttle MS, Olson NE, Safran C.
Categorization of free-text problem lists: an effective method of capturing clinical data.
Proc Annu Symp Comput Appl Med Care. 1995;:416-420.
PMID: 8563314.

Bishop CW, Ewing PD.
Description and advantages of an index-driven medical knowledge base.
Medinfo. 1995;8 Pt 2:952.
PMID: 8591595.

Wang X, Quek HN, Cantor M, Kra P, Schultz A, Lussier YA.
Automating terminological networks to link heterogeneous biomedical databases.
Medinfo. 2004;11(Pt 1):555-559.
PMID: 15360874.

Bertaud V, Lasbleiz J, Mougin F, Marin F, Burgun A, Duvauferrier R.
Toward a unified representation of findings in clinical radiology.
Stud Health Technol Inform. 2005;116:671-676.
PMID: 16160335.

Farhan J, Al-Jummaa S, Alrajhi AA, Al-Rayes H, Al-Nasser A.
Documentation and coding of medical records in a tertiary care center: a pilot study.
Ann Saudi Med. 2005 Jan-Feb;25(1):46-9. Erratum in: Ann Saudi Med. 2005 May-Jun;25(3):269. Al-Rajhi, Abdulrahman [corrected to Alrajhi, Abdulrahman A].
PMID: 15822494.

Cantor MN, Lussier YA.
Putting data integration into practice: using biomedical terminologies to add structure to existing data sources.
AMIA Annu Symp Proc. 2003;:125-129.
PMID: 14728147.

Ricketts D, Hartley J, Harries W, Hitchin D.
Who should code orthopaedic inpatients? A comparison of junior hospital doctors and coding clerks.
Ann R Coll Surg Engl. 1993 Nov;75(6 Suppl):203-206.
PMID: 8017796.

Chute CG, Yang Y.
An evaluation of concept based latent semantic indexing for clinical information retrieval.
Proc Annu Symp Comput Appl Med Care. 1992;:639-643.
PMID: 1482949.

Zou Q, Chu WW, Morioka C, Leazer GH, Kangarloo H.
IndexFinder: a method of extracting key concepts from clinical texts for indexing.
AMIA Annu Symp Proc. 2003;:763-767.
PMID: 14728276.

Hohnloser JH, Purner F, Kadlec P.
Coding medical concepts: a controlled experiment with a computerized coding tool.
Med Inform (Lond). 1996 Jul-Sep;21(3):199-206.
PMID: 9062882.

Forman BH, Cimino JJ, Johnson SB, Sengupta S, Sideli R, Clayton P.
Applying a controlled medical terminology to a distributed, production clinical information system.
Proc Annu Symp Comput Appl Med Care. 1995;:421-425.
PMID: 8563316.

Shamoun D, Livesay L.
Organizing the animal hierarchy into a Linnean Taxonomy in SNOMED CT.
AMIA Annu Symp Proc. 2003;:1005.
PMID: 14728508.

Bowman S.
Coordinating SNOMED-CT and ICD-10.
J AHIMA. 2005 Jul-Aug;76(7):60-61.
PMID: 16097126.

Hohnloser JH, Kadlec P, Puerner F.
Experiments in coding clinical information: an analysis of clinicians using a computerized coding tool.
Comput Biomed Res. 1995 Oct;28(5):393-401.
PMID: 8612401.

Tse T, Soergel D.
Exploring medical expressions used by consumers and the media: an emerging view of consumer health vocabularies.
AMIA Annu Symp Proc. 2003;:674-678.
PMID: 14728258.

Miller RA, Gieszczykiewicz FM, Vries JK, Cooper GF.
CHARTLINE: providing bibliographic references relevant to patient charts using the UMLS Metathesaurus Knowledge Sources.
Proc Annu Symp Comput Appl Med Care. 1992;:86-90.

Pringle M, Ward P, Chilvers C.
Assessment of the completeness and accuracy of computer medical records in four practices committed to recording data on computer.
Br J Gen Pract. 1995 Oct;45(399):537-541.
PMID: 7492423.

Berman JJ, Moore GW.
SNOMED-encoded surgical pathology databases: a tool for epidemiologic investigation.
Mod Pathol. 1996 Sep;9(9):944-950.
PMID: 8878028.

Zweigenbaum P.
MENELAS: an access system for medical records using natural language.
Comput Methods Programs Biomed. 1994 Oct;45(1-2):117-120.
PMID: 7889741.

Carlsson M, Ahlfeldt H, Thurin A, Wigertz O.
Terminology support for development of sharable knowledge modules.
Med Inform (Lond). 1996 Jul-Sep;21(3):207-14.
PMID: 9062883.

Rothwell DJ, Cote RA.
Managing information with SNOMED: understanding the model.
Proc AMIA Annu Fall Symp. 1996;:80-3.
PMID: 8947632

Cimino JJ, Clayton PD.
Coping with changing controlled vocabularies.
Proc Annu Symp Comput Appl Med Care. 1994;:135-9.
PMID: 7949906

Schulz EB, Price C, Brown PJ.
Symbolic anatomic knowledge representation in the Read Codes version 3: structure and application.
J Am Med Inform Assoc. 1997 Jan-Feb;4(1):38-48.
PMID: 8988473

Sato L, McClure RC, Rouse RL, Schatz CA, Greenes RA.
Enhancing the Metathesaurus with clinically relevant concepts: anatomic representations.
Proc Annu Symp Comput Appl Med Care. 1992;:388-91.
PMID: 1482903

Humphreys BL, Lindberg DA, Hole WT.
Assessing and enhancing the value of the UMLS Knowledge Sources.
Proc Annu Symp Comput Appl Med Care. 1991;:78-82.
PMID: 1807711

Rossi Mori A.
Coding systems and controlled vocabularies for hospital information systems.
Int J Biomed Comput. 1995 Apr;39(1):93-8.
PMID: 7601548

Brigl B, Mieth M, Haux R, Gluck E.
The LBI-method for automated indexing of diagnoses by using SNOMED. Part 2. Evaluation.
Int J Biomed Comput. 1995 Feb;38(2):101-8.
PMID: 7729926

Ireland MC, Regan BG.
General practice medical records: is coding appropriate?
Medinfo. 1995;8 Pt 1:47-50.
PMID: 8591233

Bodenreider O, Burgun A.
Aligning knowledge sources in the UMLS: methods, quantitative results, and applications.
Medinfo. 2004;11(Pt 1):327-331.
PMID: 15360828.

Nelson SJ, Fuller LF, Erlbaum MS, Tuttle MS, Sherertz DD, Olson NE.
The semantic structure of the UMLS Metathesaurus.
Proc Annu Symp Comput Appl Med Care. 1992;:649-53.
PMID: 1482952

Tabaqchali MA, Venables CW.
The clinical terms project: its potential for computerised surgical audit.
Ann R Coll Surg Engl. 1995 May;77(3 Suppl):124-9.
PMID: 7574305

Kanter SL, Miller RA, Tan M, Schwartz J.
Using POSTDOC to recognize biomedical concepts in medical school curricular documents.
Bull Med Libr Assoc. 1994 Jul;82(3):283-7.
PMID: 7920338

Plovnick RM, Zeng QT.
Reformulation of consumer health queries with professional terminology: a pilot study.
J Med Internet Res. 2004 Sep 3;6(3):e27.
PMID: 15471753

Banks IC.
The application of Read Codes to anaesthesia.
Anaesthesia. 1994 Apr;49(4):324-7.
PMID: 8179142

Cimino JJ, Johnson SB, Hripcsak G, Hill CL, Clayton PD.
Managing vocabulary for a centralized clinical system.
Medinfo. 1995;8 Pt 1:117-20.
PMID: 8591133

James NK, Reid CD.
Plastic surgery audit codes: are the results reproducible?
Br J Plast Surg. 1991 Jan;44(1):62-4.
PMID: 1993243

Griffin TC, Hutter JJ, Johnson KK, Moscow JA.
A survey of clinical productivity and current procedural terminology (CPT) coding patterns of pediatric hematologist/oncologists.
Pediatr Blood Cancer. 2004 Aug;43(2):140-7.
PMID: 15236280

Do Amaral Marcio B, Satomura Y.
Associating semantic grammars with the SNOMED: processing medical language and representing clinical facts into a language-independent frame.
Medinfo. 1995;8 Pt 1:18-22.
PMID: 8591149

Chua RV, Cordell WH, Ernsting KL, Bock HC, Nyhuis AW.
Accuracy of bar codes versus handwriting for recording trauma resuscitation events.
Ann Emerg Med. 1993 Oct;22(10):1545-50.
PMID: 8214833

Chang A, Schyve PM, Croteau RJ, O'Leary DS, Loeb JM.
The JCAHO patient safety event taxonomy: a standardized terminology and classification schema for near misses and adverse events.
Int J Qual Health Care. 2005 Apr;17(2):95-105. Epub 2005 Feb 21.
PMID: 15723817

Zhang L, Hripcsak G, Perl Y, Halper M, Geller J.
An expert study evaluating the UMLS lexical metaschema.
Artif Intell Med. 2005 Jul;34(3):219-33.
PMID: 15996860

Cimino C, Barnett GO.
Analysis of physician questions in an ambulatory care setting.
Comput Biomed Res. 1992 Aug;25(4):366-73.
PMID: 1511597

Lloyd SS, Rissing JP.
Physician and coding errors in patient records.
JAMA. 1985 Sep 13;254(10):1330-6.
PMID: 3927014.

Zimmerman KL, Wilcke JR, Robertson JL, Feldman BF, Kaur T, Rees LR, Spackman KA.
SNOMED representation of explanatory knowledge in veterinary clinical pathology.
Vet Clin Pathol. 2005;34(1):7-16.
PMID: 15732011.

Fu LS, Huff S, Bouhaddou O, Bray B, Warner H.
Estimating frequency of disease findings from combined hospital databases: a UMLS project.
Proc Annu Symp Comput Appl Med Care. 1991;:373-377.
PMID: 1807625.

Happe A, Pouliquen B, Burgun A, Cuggia M, Le Beux P.
Combining voice recognition and automatic indexing of medical reports.
Stud Health Technol Inform. 2002;90:382-387.
PMID: 15460722.

McCray AT, Nelson SJ.
The representation of meaning in the UMLS.
Methods Inf Med. 1995 Mar;34(1-2):193-201.
PMID: 9082131.

Rothwell DJ, Cote RA, Cordeau JP, Boisvert MA.
Developing a standard data structure for medical language--the SNOMED proposal.
Proc Annu Symp Comput Appl Med Care. 1993;:695-699.
PMID: 8130565.

Stannard CF.
Clinical terms project: a coding system for clinicians.
Br J Hosp Med. 1994 Jun 15-Jul 12;52(1):46-8.
PMID: 7952765

Hohnloser JH, Puerner F, Soltanian H.
Improving clinician's coded data entry through the use of an electronic patient record system: 3.5 years experience with a semiautomatic browsing and encoding tool in clinical routine.
Comput Biomed Res. 1996 Feb;29(1):41-7.
PMID: 8689873

Chute CG, Yang Y.
An overview of statistical methods for the classification and retrieval of patient events.
Methods Inf Med. 1995 Mar;34(1-2):104-10.
PMID: 9082119

Bodenreider O.
Strength in numbers: exploring redundancy in hierarchical relations across biomedical terminologies.
AMIA Annu Symp Proc. 2003;:101-105.
PMID: 14728142

Burgun A, Botti G, Lukacs B, Mayeux D, Seka LP, Delamarre D, Bremond M, Kohler F, Fieschi M, Le Beux P.
A system that facilitates the orientation within procedure nomenclatures through a semantic approach.
Med Inform (Lond). 1994 Oct-Dec;19(4):297-310.
PMID: 7603121.

Kokotailo RA, Hill MD.
Coding of stroke and stroke risk factors using international classification of diseases, revisions 9 and 10.
Stroke. 2005 Aug;36(8):1776-81. Epub 2005 Jul 14.
PMID: 16020772

McCray AT, Srinivasan S, Browne AC.
Lexical methods for managing variation in biomedical terminologies.
Proc Annu Symp Comput Appl Med Care. 1994;:235-9.
PMID: 7949926

Eagon JC, Hurdle JF, Lincoln MJ.
Inter-rater reliability and review of the VA unresolved narratives.
Proc AMIA Annu Fall Symp. 1996;:130-4.
PMID: 8947642

Hohnloser JH, Puerner F, Soltanian H.
Improving coded data entry by an electronic patient record system.
Methods Inf Med. 1996 Jun;35(2):108-11.
PMID: 8755383

Evans DA, Cimino JJ, Hersh WR, Huff SM, Bell DS.
Toward a medical-concept representation language. The Canon Group.
J Am Med Inform Assoc. 1994 May-Jun;1(3):207-17.
PMID: 7719804

Mays E, Weida R, Dionne R, Laker M, White B, Liang C, Oles FJ.
Scalable and expressive medical terminologies.
Proc AMIA Annu Fall Symp. 1996;:259-263.
PMID: 8947668.

Saint-Yves IF.
The Read Clinical Classification.
Health Bull (Edinb). 1992 Nov;50(6):422-7.
PMID: 1483867

Rector AL, Nowlan WA.
The GALEN project.
Comput Methods Programs Biomed. 1994 Oct;45(1-2):75-8.
PMID: 7889770

Hogan WR, Wagner MM.
Free-text fields change the meaning of coded data.
Proc AMIA Annu Fall Symp. 1996;:517-21.
PMID: 8947720

Horan TC, Emori TG.
Definitions of key terms used in the NNIS System.
Am J Infect Control. 1997 Apr;25(2):112-6.
PMID: 9113287

Wagner MM, Cooper GF.
Evaluation of a Meta-1-based automatic indexing method for medical documents.
Comput Biomed Res. 1992 Aug;25(4):336-50.
PMID: 1511595

Silverstein SM, Miller PL, Cullen MR.
An information sources map for Occupational and Environmental Medicine: guidance to network-based information through domain-specific indexing.
Proc Annu Symp Comput Appl Med Care. 1993;:616-20.
PMID: 8130548

Currie LM, Mellino LV, Cimino JJ, Bakken S.
Development and representation of a fall-injury risk assessment instrument in a clinical information system.
Medinfo. 2004;11(Pt 1):721-5.
PMID: 15360907

Schadow G, McDonald CJ.
Extracting structured information from free text pathology reports.
AMIA Annu Symp Proc. 2003;:584-8.
PMID: 14728240

McCray AT, Razi A.
The UMLS Knowledge Source server.
Medinfo. 1995;8 Pt 1:144-7.
PMID: 8591140

Essin DJ, Lincoln TL.
Implementing a low-cost computer-based patient record: a controlled vocabulary reduces data base design complexity.
Proc Annu Symp Comput Appl Med Care. 1995;:431-435.
PMID: 8563318.

Darmoni SJ, Jarrousse E, Zweigenbaum P, Le Beux P, Namer F, Baud R, Joubert M, Vallee H, Cote RA, Buemi A, Bourigault D, Recource G, Jeanneau S, Rodrigues JM.
VUMeF: extending the French involvement in the UMLS Metathesaurus.
AMIA Annu Symp Proc. 2003;:824.
PMID: 14728329.

Scherpbier HJ, Abrams RS, Roth DH, Hail JJ.
A simple approach to physician entry of patient problem list.
Proc Annu Symp Comput Appl Med Care. 1994;:206-210.
PMID: 7949921.

Moore GW, Hutchins GM, Miller RE.
Strategies for searching medical natural language text. Distribution of words in the anatomic diagnoses of 7000 autopsy subjects.
Am J Pathol. 1984 Apr;115(1):36-41.
PMID: 6546837.

Cimino C, Barnett GO.
Analysis of physician questions in an ambulatory care setting.
Proc Annu Symp Comput Appl Med Care. 1991;:995-9.
PMID: 1807782

Brown SH, Elkin PL, Rosenbloom ST, Husser C, Bauer BA, Lincoln MJ, Carter J, Erlbaum M, Tuttle MS.
VA National Drug File Reference Terminology: a cross-institutional content coverage study.
Medinfo. 2004;11(Pt 1):477-81.
PMID: 15360858

Tuttle MS, Nelson SJ.
A poor precedent.
Methods Inf Med. 1996 Sep;35(3):211-7.
PMID: 8952305

Brewster D, Muir C, Crichton J.
Registration of non-melanoma skin cancers in Scotland--how accurate are site and morphology codes?
Clin Exp Dermatol. 1995 Sep;20(5):401-5.
PMID: 8593717

Berman JJ.
Doublet method for very fast autocoding.
BMC Med Inform Decis Mak. 2004 Sep 15;4:16.
PMID: 15369595

Miller ET, Wieckert KE, Fagan LM, Musen MA.
The development of a controlled medical terminology: identification, collaboration, and customization.
Medinfo. 1995;8 Pt 1:148-52.
PMID: 8591141

Evans DA, Rothwell DJ, Monarch IA, Lefferts RG, Cote RA.
Toward representations for medical concepts.
Med Decis Making. 1991 Oct-Dec;11(4 Suppl):S102-8.
PMID: 1770838

Campbell KE, Das AK, Musen MA.
A logical foundation for representation of clinical data.
J Am Med Inform Assoc. 1994 May-Jun;1(3):218-32.
PMID: 7719805

Moore GW, Berman JJ.
Object-oriented controlled-vocabulary translator using TRANSOFT + HyperPAD.
Proc Annu Symp Comput Appl Med Care. 1991;:973-5.
PMID: 1807773

Steinwachs DM, Mushlin AI.
The Johns Hopkins ambulatory-care coding scheme.
Health Serv Res. 1978 Spring;13(1):36-49.
PMID: 632104.

Pratt W, Yetisgen-Yildiz M.
A study of biomedical concept identification: MetaMap vs. people.
AMIA Annu Symp Proc. 2003;:529-533.
PMID: 14728229.

Rassinoux AM, Miller RA, Baud RH, Scherrer JR.
Modeling principles for QMR medical findings.
Proc AMIA Annu Fall Symp. 1996;:264-268.
PMID: 8947669.

Jachna JS, Powsner SM, Miller PL.
Augmenting GRATEFUL MED with the UMLS Metathesaurus: an initial evaluation.
Bull Med Libr Assoc. 1993 Jan;81(1):20-28.
PMID: 8428185.

Lamiell JM, Wojcik ZM, Isaacks J.
Computer auditing of surgical operative reports written in English.
Proc Annu Symp Comput Appl Med Care. 1993;:269-273.
PMID: 8130475.

Moorman PW, van Ginneken AM, Siersema PD, van der Lei J, van Bemmel JH.
Evaluation of reporting based on descriptional knowledge.
J Am Med Inform Assoc. 1995 Nov-Dec;2(6):365-73.
PMID: 8581552.

Campbell KE, Wieckert K, Fagan LM, Musen MA.
A computer-based tool for generation of progress notes.
Proc Annu Symp Comput Appl Med Care. 1993;:284-8.
PMID: 8130479.

Bishop CW.
Alternate approaches to a UMLS.
Med Decis Making. 1991 Oct-Dec;11(4 Suppl):S99-102.
PMID: 1770857.

Pole PM, Rector AL.
Mapping the GALEN CORE model to SNOMED-III: initial experiments.
Proc AMIA Annu Fall Symp. 1996;:100-104.
PMID: 8947636.

Campbell KE, Cohn SP, Chute CG, Rennels G, Shortliffe EH.
Galapagos: computer-based support for evolution of a convergent medical terminology.
Proc AMIA Annu Fall Symp. 1996;:269-273.
PMID: 8947670.

Ozbolt JG, Russo M, Stultz MP.
Validity and reliability of standard terms and codes for patient care data.
Proc Annu Symp Comput Appl Med Care. 1995;:37-41.
PMID: 8563304.

Rodrigues JM, Trombert Paviot B, Martin C, Vercherin P, Samuel O.
Co-ordination between clinical coding systems and pragmatic clinical terminologies based on a core open system: the role of ISO/TC215/WG3 and CEN/TC251/WG2 standardisation?
Stud Health Technol Inform. 2002;90:401-5.
PMID: 15460725

Goldberg LJ, Ceusters W, Eisner J, Smith B.
The Significance of SNODENT.
Stud Health Technol Inform. 2005;116:737-42.
PMID: 16160346

Wingert F.
Medical linguistics: automated indexing into SNOMED.
Crit Rev Med Inform. 1988;1(4):333-403.
PMID: 3288353

Bales ME, Kukafka R, Burkhardt A, Friedman C.
Qualitative assessment of the International Classification of Functioning, Disability, and Health with respect to the desiderata for controlled medical vocabularies.
Int J Med Inform. 2005 Aug 22; [Epub ahead of print]
PMID: 16122973.

Tuttle MS, Sperzel WD, Olson NE, Erlbaum MS, Suarez-Munist O, Sherertz DD, Nelson SJ, Fuller LF.
The homogenization of the Metathesaurus schema and distribution format.
Proc Annu Symp Comput Appl Med Care. 1992;:299-303.
PMID: 1482884

Lowry RB, Rocheleau J, Keillor L.
Comparison of existing classifications for coding congenital malformation and genetic syndromes.
Birth Defects Orig Artic Ser. 1977;13(3A):53-9.
PMID: 884243

Renner JH, Bauman EA.
Problem-specific coding systems.
J Fam Pract. 1975 Aug;2(4):279-81.
PMID: 1081121

Suarez-Munist ON, Tuttle MS, Olson NE, Erlbaum MS, Sherertz DD, Lipow SS, Cole WG, Keck KD, Davis AN.
MEME-II supports the cooperative management of terminology.
Proc AMIA Annu Fall Symp. 1996;:84-8.
PMID: 8947633

Shiffman RN.
A findings model for an ambulatory pediatric record: essential data, relational modeling, and vocabulary considerations.
Proc Annu Symp Comput Appl Med Care. 1995;:411-415.
PMID: 8563313.

James NK.
The Read clinical classification and its use in plastic surgery.
Ann R Coll Surg Engl. 1994 May;76(3):164-8.
PMID: 8017810

Ruan W, Buerkle T, Dudeck JW.
Mapping various information sources to a semantic network.
Medinfo. 2004;11(Pt 1):430-433.
PMID: 15360849.

Bernauer J, Franz M, Schoop D, Schoop M, Pretschner DP.
The compositional approach for representing medical concept systems.
Medinfo. 1995;8 Pt 1:70-74.
PMID: 8591303.

Polissar L, Feigl P, Lane WW, Glaefke G, Dahlberg S.
Accuracy of basic cancer patient data: results from an extensive recoding survey.
J Natl Cancer Inst. 1984 May;72(5):1007-1014.
PMID: 6585578.

Yeoh C, Davies H.
Clinical coding: completeness and accuracy when doctors take it on.
BMJ. 1993 Apr 10;306(6883):972.
PMID: 8490474.

Zhang L, Perl Y, Halper M, Geller J, Hripcsak G.
A lexical metaschema for the UMLS semantic network.
Artif Intell Med. 2005 Jan;33(1):41-59.
PMID: 15617981.

van der Lei J, Musen MA.
The separation of reviewing knowledge from medical knowledge.
Methods Inf Med. 1995 Mar;34(1-2):131-139.
PMID: 9082122.

Michel PA, Lovis C, Baud R.
LUCID: a semi-automated ICD-9 encoding system.
Medinfo. 1995;8 Pt 2:1656.
PMID: 8591529.

Green LA.
Read Codes: a tool for automated medical records.
J Fam Pract. 1992 May;34(5):633-4. No abstract available.
PMID: 1578216

Mehanni M, Loughman E, Allwright SP, Prichard J.
The hospital in-patient enquiry scheme: a study of data accuracy and capture.
Ir Med J. 1995 Jan-Feb;88(1):24-6.
PMID: 7737837

de Lusignan S, Valentin T, Chan T, Hague N, Wood O, van Vlymen J, Dhoul N.
Problems with primary care data quality: osteoporosis as an exemplar.
Inform Prim Care. 2004;12(3):147-56.
PMID: 15606987

O'Malley KJ, Cook KF, Price MD, Wildes KR, Hurdle JF, Ashton CM.
Measuring diagnoses: ICD code accuracy.
Health Serv Res. 2005 Oct;40(5 Pt 2):1620-39.
PMID: 16178999

Jonassen K, Saboe R.
The use of text encoding in the development of a terminology and knowledge system associated with the Norwegian version of the ICD-10.
Medinfo. 1995;8 Pt 1:51-5.
PMID: 8591246

Berman JJ.
Tumor taxonomy for the developmental lineage classification of neoplasms.
BMC Cancer. 2004 Nov 30;4:88.
PMID: 15571625

Zweigenbaum P, Baud R, Burgun A, Namer F, Jarrousse E, Grabar N, Ruch P, Le Duff F, Thirion B, Darmoni S.
UMLF: a Unified Medical Lexicon for French.
AMIA Annu Symp Proc. 2003;:1062.
PMID: 14728565

Pahor M, Chrischilles EA, Guralnik JM, Brown SL, Wallace RB, Carbonin P.
Drug data coding and analysis in epidemiologic studies.
Eur J Epidemiol. 1994 Aug;10(4):405-411.
PMID: 7843344

Li W, Tolson J, Horan TC.
Creating public health standard vocabularies: mapping a set of CDC's pathogen codes to SNOMED concepts.
AMIA Annu Symp Proc. 2003;:907.
PMID: 14728413

Gibby GL, Paulus DA, Sirota DJ, Treloar RW, Jackson KI, Gravenstein JS, van der Aa JJ.
Computerized pre-anesthetic evaluation results in additional abstracted comorbidity diagnoses.
J Clin Monit. 1997 Jan;13(1):35-41.
PMID: 9058251

Tuttle MS, Suarez-Munist ON, Olson NE, Sherertz DD, Sperzel WD, Erlbaum MS, Fuller LF, Hole WT, Nelson SJ, Cole WG, et al.
Merging terminologies.
Medinfo. 1995;8 Pt 1:162-6.
PMID: 8591144

Engelbrecht R, Ingenerf J, Reiner J.
Educational standards -- terminologies used.
Stud Health Technol Inform. 2004;109:95-113. Review.
PMID: 15718677

Marwede D, Fielding M.
The epistemological-ontological divide in clinical radiology.
Stud Health Technol Inform. 2005;116:749-54.
PMID: 16160348

Schulz S, Hahn U, Rogers J.
Semantic Clarification of the Representation of Procedures and Diseases in SNOMED(R) CT.
Stud Health Technol Inform. 2005;116:773-8.
PMID: 16160352

Brand DA, Krag MH, Hausman MR, Trainor KF, Akelman E, Rudicel SA, Southwick WO.
A patient registry for orthopedic surgery.
Clin Orthop Relat Res. 1990 Mar;(252):262-9.
PMID: 2302892

Rector AL, Nowlan WA, Glowinski A.
Goals for concept representation in the GALEN project.
Proc Annu Symp Comput Appl Med Care. 1993;:414-418.
PMID: 8130507

Major P, Kostrewski BJ, Anderson J.
Analysis of the semantic structures of medical reference languages: part 2. Analysis of the semantic power of MeSH, ICD and SNOMED.
Med Inform (Lond). 1978 Dec;3(4):269-81.
PMID: 370473

Cordes DO, Limer KL, McEntee K.
Data management for the International Registry of Reproductive Pathology using SNOMED coding and computerization.
Vet Pathol. 1981 May;18(3):342-50.
PMID: 7257079

Moehr JR, Kluge EH, Patel VL.
Advanced patient information systems and medical concept representation.
Medinfo. 1995;8 Pt 1:95-9.
PMID: 8591349

Campbell JR, Kallenberg GA, Sherrick RC.
The clinical utility of META: an analysis for hypertension.
Proc Annu Symp Comput Appl Med Care. 1992;:397-401.
PMID: 1482905

Anon.
A proposal for more informative abstracts of clinical articles. Ad Hoc Working Group for Critical Appraisal of the Medical Literature.
Ann Intern Med. 1987 Apr;106(4):598-604.
PMID: 3826959

Evans DA, Brownlow ND, Hersh WR, Campbell EM.
Automating concept identification in the electronic medical record: an experiment in extracting dosage information.
Proc AMIA Annu Fall Symp. 1996;:388-92.
PMID: 8947694

Bailey IR, Page KB, Jones RG, Payne RB, Little AJ.
Mnemonic coding system for clinical data entry into laboratory computers: its effect on quality and efficiency.
J Clin Pathol. 1991 Dec;44(12):1018-21.
PMID: 1791201

Campos-Outcalt DE.
Accuracy of ICD-9-CM codes in identifying reportable communicable diseases.
Qual Assur Util Rev. 1990 Aug;5(3):86-9.
PMID: 2136670

Foster EA, Stein A, Liberman D, Cooper C, Wolfe HJ.
A computer-assisted surgical pathology system.
Am J Clin Pathol. 1982 Sep;78(3):328-36.
PMID: 7113969

Feigl P, Polissar L, Lane WW, Guinee V.
Reliability of basic cancer patient data.
Stat Med. 1982 Jul-Sep;1(3):191-204.
PMID: 7187093

Jeanty C.
The computerized medical record in gastroenterology: part 2. Morphological (descriptive) data.
Med Inform (Lond). 1978 Dec;3(4):283-9.
PMID: 745473

Ceusters W, Smith B, Kumar A, Dhaen C.
Ontology-based error detection in SNOMED-CT.
Medinfo. 2004;11(Pt 1):482-486.
PMID: 15360859.

Symmons D, Sant S.
The language of rheumatology. I: Nomenclature and coding.
Ann Rheum Dis. 1996 Jan;55(1):4-6. No abstract available.
PMID: 8572732

Lu TH, Jen I, Chou YJ, Chang HJ.
Evaluating the comparability of different grouping schemes for mortality and morbidity.
Health Policy. 2005 Feb;71(2):151-9.
PMID: 15607378

Cornet R, Prins AK.
An architecture for standardized terminology services by wrapping and integration of existing applications.
AMIA Annu Symp Proc. 2003;:180-4.
PMID: 14728158

Lieberman MI, Ricciardi TN, Masarie FE, Spackman KA.
The use of SNOMED CT simplifies querying of a clinical data warehouse.
AMIA Annu Symp Proc. 2003;:910.
PMID: 14728416

Kannry JL, Wright L, Shifman M, Silverstein S, Miller PL.
Portability issues for a structured clinical vocabulary: mapping from Yale to the Columbia medical entities dictionary.
J Am Med Inform Assoc. 1996 Jan-Feb;3(1):66-78.
PMID: 8750391

Smith N, Wilson A, Weekes T.
Use of Read codes in development of a standard data set.
BMJ. 1995 Jul 29;311(7000):313-315.
PMID: 7633247.

Connolly JP, McGavock H, Wilson-Davis K.
Research methodology: Coding perceived morbidity in general practice-- an evaluation of the Read Classification and the International Classification of Primary Care (ICPC).
Pharmacoepidemiol Drug Saf. 1997 Sep;6(5):325-330.
PMID: 15073767.

Groves WE.
Storage and retrieval of coded patient diagnoses and data on a clinical laboratory computer system.
Comput Programs Biomed. 1980 Dec;12(2-3):225-29.
PMID: 7249600

Barrows RC Jr, Cimino JJ, Clayton PD.
Mapping clinically useful terminology to a controlled medical vocabulary.
Proc Annu Symp Comput Appl Med Care. 1994;:211-5.
PMID: 7949922

Surjan G, Balkanyi L.
Theoretical considerations on medical concept representation.
Med Inform (Lond). 1996 Jan-Mar;21(1):61-8.
PMID: 8871898

Cote RA, Rothwell DJ.
The classification-nomenclature issues in medicine: a return to natural language.
Med Inform (Lond). 1989 Jan-Mar;14(1):25-41.
PMID: 2725112

Hohnloser JH, Konig A, Fischer MR, Hertenstein B, Emmerich B.
Building a cytology report database: a computer-assisted system for documentation, evaluation and hospital-wide recall of haematological biopsy reports.
Med Inform (Lond). 1994 Jul-Sep;19(3):199-208.
PMID: 7707742.

Leck I, Birch JM, Marsden HB, Steward JK.
Methods of classifying and ascertaining children's tumours.
Br J Cancer. 1976 Jul;34(1):69-82.
PMID: 952716

Bridges-Webb C.
Classifying and coding morbidity in general practice: validity and reliability in an international trial.
J Fam Pract. 1986 Aug;23(2):147-150.
PMID: 3734719.

Brage S, Bentsen BG, Bjerkedal T, Nygard JF, Tellnes G.
ICPC as a standard classification in Norway.
Fam Pract. 1996 Aug;13(4):391-396.
PMID: 8872099.

Rector AL, Rogers J, Taweel A.
Models and inference methods for clinical systems: a principled approach.
Medinfo. 2004;11(Pt 1):79-83.
PMID: 15360779.

Dybkaer R.
An ontology on property for physical, chemical, and biological systems.
APMIS Suppl. 2004;(117):1-210. Review. Erratum in: APMIS Suppl. 2005 Feb;113(2):151.

Yang YM, Chute CG.
Hierarchical distribution analysis for computer-assisted classifications of patient records.
Med Decis Making. 1991 Oct-Dec;11(4 Suppl):S94-8.
PMID: 1770856

Dudrey EF, Watts MT.
A dBASE III surgical pathology reporting and encoding microcomputer system.
Am J Clin Pathol. 1990 Jan;93(1):91-7.
PMID: 2294706

Bernauer J, Gumrich K, Kutz S, Lindner P, Pretschner DP.
An interactive report generator for bone scan studies.
Proc Annu Symp Comput Appl Med Care. 1991;:858-60.
PMID: 1807729

Rocha RA, Huff SM.
Coupling vocabularies and data structures: lessons from LOINC.
Proc AMIA Annu Fall Symp. 1996;:90-4.
PMID: 8947634

Ma H, Rolka H, Mandl K, Buckeridge D, Fleischauer A, Pavlin J.
Implementation of laboratory order data in BioSense Early Event Detection and Situation Awareness System.
MMWR Morb Mortal Wkly Rep. 2005 Aug 26;54 Suppl:27-30.
PMID: 16177689

Swails WS, Samour PQ, Babineau TJ, Bistrian BR.
A proposed revision of current ICD-9-CM malnutrition code definitions.
J Am Diet Assoc. 1996 Apr;96(4):370-373.
PMID: 8598438.

Robinson PJ.
Version 3 of the Read Codes.
Nucl Med Commun. 1996 Feb;17(2):95-96.
PMID: 8778643.

Macey A, Kelly C, Brady O, Burke F.
Hand surgery coding and classification.
J Hand Surg [Br]. 1995 Oct;20(5):681-684.
PMID: 8543881.

Berman JJ, Henson DE.
Classifying the precancers: a metadata approach.
BMC Med Inform Decis Mak. 2003 Jun 20;3:8. Epub 2003 Jun 20.
PMID: 12818004.

Hogarth MA, Gertz M, Gorin F.
jTerm: an open source terminology server.
AMIA Annu Symp Proc. 2003;:861.
PMID: 14728366.

Tackley RM, Stuart-Taylor ME, Hurrell M.
Why do anaesthetists need codes?
Br J Anaesth. 1993 Oct;71(4):602-606.
PMID: 8260316.

Worth RM, Mytinger RE.
Medical insurance claims as a source of data for research: accuracy of diagnostic coding.
Hawaii Med J. 1996 Jan;55(1):9-11.
PMID: 8786232.

Hall PA, Lemoine NR.
Comparison of manual data coding errors in two hospitals.
J Clin Pathol. 1986 Jun;39(6):622-626.
PMID: 3722414.

De Moor G, Fiers T, Wieme R, Scott P.
The research in semantics behind the OpenLabs coding system.
Comput Methods Programs Biomed. 1996 Jul;50(2):169-185.
PMID: 8875023.

Bishop CW.
A name is not enough.
MD Comput. 1989 Jul-Aug;6(4):200-206.
PMID: 2779395.

van Ginneken AM, Liem EB, Moorman PW.
Integrating QMR with a computer-based patient record.
Proc Annu Symp Comput Appl Med Care. 1993;:98-102.
PMID: 8130602

Cimino JJ, Barnett GO.
Automated translation between medical terminologies using semantic definitions.
MD Comput. 1990 Mar-Apr;7(2):104-9. Erratum in: MD Comput 1990 Jul-Aug;7(4):268.
PMID: 2186251

Pakhomov SV, Buntrock JD, Chute CG.
Using compound codes for automatic classification of clinical diagnoses.
Medinfo. 2004;11(Pt 1):411-415.
PMID: 15360845.

Maletta GJ.
The concept of "reversible" dementia. How nonreliable terminology may impair effective treatment.
J Am Geriatr Soc. 1990 Feb;38(2):136-140.
PMID: 2299117.

Pisanelli DM, Rossi-Mori A.
Converting the representation of medical data: criteria to code the underlying cause of death.
Methods Inf Med. 1990 Jul;29(3):220-235.
PMID: 2215264.

Dale RF, Midwinter MJ.
Use of database management system by surgeons to produce operation notes.
Ann R Coll Surg Engl. 1996 Nov;78(6 Suppl):272-5.
PMID: 8944499

Gallo P, De Blasi V.
A computerized data bank of surgical pathology and cytopathology diagnoses. Structure and purposes.
Virchows Arch A Pathol Anat Histopathol. 1983;401(3):345-354.
PMID: 6415911

Johnson CE, Slotfedt MH.
Organizing patient data for use in clinical pharmacy practice and education.
Am J Hosp Pharm. 1976 Oct;33(10):1020-2.
PMID: 973631

Nash SK.
Nonsynonymous synonyms: correcting and improving SNOMED CT.
AMIA Annu Symp Proc. 2003;:949.
PMID: 14728454.

Walker DA, Thomson K.
Read clinical terms and child health.
Arch Dis Child. 1994 Sep;71(3):272-274.
PMID: 7979507.

Bishop CW, Ewing P.
Representing medical knowledge: reconciling the present or creating the future?
MD Comput. 1992 Jul-Aug;9(4):218-225.
PMID: 1508034.

Arts DG, Cornet R, De Jonge E, De Keizer NF.
Comparison of methods for evaluation of medical terminological systems.
AMIA Annu Symp Proc. 2003;:779.
PMID: 14728284

Rocha RA, Rocha BH, Huff SM.
Automated translation between medical vocabularies using a frame-based interlingua.
Proc Annu Symp Comput Appl Med Care. 1993;:690-4.
PMID: 8130564

Masarie FE Jr, Miller RA.
Medical Subject Headings and medical terminology: an analysis of terminology used in hospital charts.
Bull Med Libr Assoc. 1987 Apr;75(2):89-94.
PMID: 3297223

Szolovits P.
Adding a medical lexicon to an English Parser.
AMIA Annu Symp Proc. 2003;:639-43.
PMID: 14728251

Giannangelo K, Berkowitz L.
SNOMED CT helps drive EHR success.
J AHIMA. 2005 Apr;76(4):66-7. No abstract available.
PMID: 15871475

Miller GC, Britt H.
A new drug classification for computer systems: the ATC extension code.
Int J Biomed Comput. 1995 Oct;40(2):121-4.
PMID: 8847119

Wong KT, Chan KS.
A dBASE III system for managing 35 mm slides in pathology.
Malays J Pathol. 1990 Dec;12(2):101-6.
PMID: 2102964

Honkanen RJ, Monkkonen R.
Reliability of causal indicators for nonfatal injuries.
Scand J Soc Med. 1990 Dec;18(4):257-61.
PMID: 2127135

Groner GF, Hopwood MD, Palley NA, Sibley WL, Baker WR, Christopher TG, Thompson HK Jr.
An interactive data management and analysis system for clinical investigators.
J Lab Clin Med. 1978 Sep;92(3):325-40.
PMID: 681819

Rada R, Ghaoui C, Russell J, Taylor M.
Approaches to the construction of a medical informatics glossary and thesaurus.
Med Inform (Lond). 1993 Jan-Mar;18(1):69-78.
PMID: 8366694

Wilkins R.
Use of postal codes and addresses in the analysis of health data.
Health Rep. 1993;5(2):157-77. English, French.
PMID: 8292756

Dickey RA.
Practical tips on coding for diabetes care.
Endocr Pract. 1996 Nov-Dec;2(6):389-394.
PMID: 15251500.

Zhang L, Halper M, Perl Y, Geller J, Cimino JJ.
Relationship structures and semantic type assignments of the UMLS Enriched Semantic Network.
J Am Med Inform Assoc. 2005 Nov-Dec;12(6):657-66. Epub 2005 Jul 27.
PMID: 16049233

Fabry P, Baud R, Burgun A, Lovis C.
Amplification of Terminologia anatomica by French language terms using Latin terms matching algorithm: A prototype for other language.
Int J Med Inform. 2005 Sep 30; [Epub ahead of print].
PMID: 16203172.

Nelson NA, Barker DM, Van Peenen PF, Blanchard AG.
Determining exposure categories for a refinery retrospective cohort mortality study.
Am Ind Hyg Assoc J. 1985 Nov;46(11):653-657.
PMID: 4072909

Ozbolt J.
Reference terminology for therapeutic goals: a new approach.
AMIA Annu Symp Proc. 2003;:504-8.
PMID: 14728224

Panayiotou B.
Coding of clinical diagnoses. Persevere with Korner system.
BMJ. 1993 Jun 5;306(6891):1541. No abstract available.
PMID: 8518698

Beland MF, Bronson RT, Peacock WC.
Computer-based animal record system in a nonhuman primate colony.
Am J Vet Res. 1981 Aug;42(8):1456-9.
PMID: 7294483

Jasperse DM, Ahmed SW.
The Mid-Atlantic Oncology Program's comparison of two data collection methods.
Control Clin Trials. 1989 Sep;10(3):282-9.
PMID: 2676340

Haig A, Dozier M, Liu D, McKendree J, Roper T, Selai C.
METRO taxonomy - progress report on assessment.
Med Teach. 2005 Mar;27(2):155-7.
PMID: 16019337

Jefferson TO, Demicheli V, Macmillan AH.
Pilot study of the introduction of the J95 health data collection system.
J R Army Med Corps. 1996 Feb;142(1):25-29.
PMID: 8667325.

Shen RN, Band B, Bingham JS, FitzGerald M, Johnson A, Pattman RS, Brown P, Barlow D.
Genitourinary medicine and the Read Codes Clinical Terms Project--a medical language for the future. Genitourinary Medicine Specialty Working Group (GUM SWG)
Int J STD AIDS. 1994 Mar-Apr;5(2):90-92.
PMID: 8031924.

Sadeghi S, Barzi A, Smith JW.
Ontology Driven Construction of a Knowledgebase for Bayesian Decision Models Based on UMLS.
Stud Health Technol Inform. 2005;116:223-228.
PMID: 16160263.

Brown SH, Bauer BA, Wahner-Roedler DL, Elkin PL.
Coverage of oncology drug indication concepts and compositional semantics by SNOMED-CT.
AMIA Annu Symp Proc. 2003;:115-9.
PMID: 14728145

Woods JW, Sneiderman CA, Hameed K, Ackerman MJ, Hatton C.
Using UMLS metathesaurus concepts to describe medical images: dermatology vocabulary.
Comput Biol Med. 2006 Jan;36(1):89-100.
PMID: 16324910

Jeanty C.
The computerized medical record in gastroenterology: part 4. Health curriculum vitae.
Med Inform (Lond). 1978 Dec;3(4):299-303.
PMID: 745475

Sim I, Berlin A.
A framework for classifying decision support systems.
AMIA Annu Symp Proc. 2003;:599-603.
PMID: 14728243

Rinaldi RC, Steindler EM, Wilford BB, Goodwin D.
Clarification and standardization of substance abuse terminology.
JAMA. 1988 Jan 22-29;259(4):555-7.
PMID: 3275816

Ingenerf J.
Taxonomic vocabularies in medicine: the intention of usage determines different established structures.
Medinfo. 1995;8 Pt 1:136-9.
PMID: 8591138

Shapiro-Ilan DI, Fuxa JR, Lacey LA, Onstad DW, Kaya HK.
Definitions of pathogenicity and virulence in invertebrate pathology.
J Invertebr Pathol. 2005 Jan;88(1):1-7. Epub 2004 Dec 1.
PMID: 15707863

Sereno PC.
The logical basis of phylogenetic taxonomy.
Syst Biol. 2005 Aug;54(4):595-619.
PMID: 16109704

Wingert F.
Automated indexing based on SNOMED.
Methods Inf Med. 1985 Jan;24(1):27-34. No abstract available.
PMID: 3982279

Harkness P, Topham J.
Clinical coding in ENT surgery: the Read Codes and clinical terms project.
Clin Otolaryngol Allied Sci. 1995 Feb;20(1):3-4. No abstract available.
PMID: 7788930

Hieb BR.
A proposal for a national health care identifier.
Proc Annu Symp Comput Appl Med Care. 1994;:469-72.
PMID: 7949971

Wingert F.
Automated indexing of SNOMED statements into ICD.
Methods Inf Med. 1987 Jul;26(3):93-8. No abstract available.
PMID: 3670105

Wingert F.
An indexing system for SNOMED.
Methods Inf Med. 1986 Jan;25(1):22-30. No abstract available.
PMID: 3753739

Rossi Mori A, Galeazzi E, Consorti F.
An ontological perspective on surgical procedures.
Proc AMIA Annu Fall Symp. 1996;:115-9.
PMID: 8947639

Ceusters W, Smith B, Kumar A, Dhaen C.
Mistakes in medical ontologies: where do they come from and how can they be detected?
Stud Health Technol Inform. 2004;102:145-163.
PMID: 15853269

Payne C.
Developing a standard dataset for the NHS. Version 3 of read codes addresses many difficulties.
BMJ. 1995 Oct 7;311(7010):951. No abstract available.
PMID: 7580580

Dill LM, Bye BV, Williams CI.
The development of a new geographic coding system for the Continuous Work History Sample.
Soc Secur Bull. 1994 Winter;57(4):34-48.
PMID: 7761958

Smith SH, Kershaw C, Thomas IH, Botha JL.
PIS and DRGs: coding inaccuracies and their consequences for resource management.
J Public Health Med. 1991 Feb;13(1):40-1.
PMID: 1903040

Straub HR, Frei N, Mosimann H, Perger C, Ulrich A.
Simplified representation of concepts and relations on screen.
Stud Health Technol Inform. 2005;116:799-804.
PMID: 16160356

Richwine PW.
A study of MeSH and UMLS for subject searching in an online catalog.
Bull Med Libr Assoc. 1993 Apr;81(2):229-33. No abstract available.
PMID: 8472010

Earlam R.
Korner, nomenclature, and SNOMED.
Br Med J (Clin Res Ed). 1988 Mar 26;296(6626):903-5. No abstract available.
PMID: 3129068

Kelble KM, Kehm G, Roe MH, Todd JK.
Innovative index system for reporting microbiology laboratory results.
J Clin Microbiol. 1993 May;31(5):1290-2.
PMID: 8501231

Kashyap V, Ramakrishnan C, Rindflesch TC.
Toward (semi-)automatic generation of bio-medical ontologies.
AMIA Annu Symp Proc. 2003;:886.
PMID: 14728391

Swettenham KV, Nickols C, Berry CL.
Computer programs in histopathology record keeping.
J Clin Pathol. 1982 Jan;35(1):40-4.
PMID: 7061718

Gee SC, Page WF.
The use of comparability ratios to adjust hospital trend data.
Am J Public Health. 1985 Jan;75(1):81-2.
PMID: 3966607

Coles EC, Slavin G.
An evaluation of automatic coding of surgical pathology reports.
J Clin Pathol. 1976 Jul;29(7):621-5.
PMID: 977772

Bishop CW, Dombrowski T.
Coding: why and how.
MD Comput. 1990 Jul-Aug;7(4):210-5.
PMID: 2215122

Haig A, Ellaway R, Dozier M, Liu D, McKendree J.
METRO--the creation of a taxonomy for medical education.
Health Info Libr J. 2004 Dec;21(4):211-9.
PMID: 15606878

Gabrieli ER.
A new electronic medical nomenclature.
J Med Syst. 1989 Dec;13(6):355-73.
PMID: 2636970

Miller RW, van de Geijn J.
The use of a bar code scanner to improve the utility and flexibility of record and verify systems used in radiation therapy.
Med Phys. 1988 Jul-Aug;15(4):611-3.
PMID: 3211055

Riegodedios AJ, Ajene A, Malakooti MA, Gaydos JC, MacIntosh VH, Bohnker BK.
Comparing diagnostic coding and laboratory results.
Emerg Infect Dis. 2005 Jul;11(7):1151-3. No abstract available.
PMID: 16032796

Harber P, Crawford L, Liu K, Schacter L.
Working words: real-life lexicon of North American workers.
J Occup Environ Med. 2005 Aug;47(8):859-64.
PMID: 16093937

Kiuchi T, Ohashi Y, Sato H, Kaihara S.
Methodology for the construction of a disease nomenclature and classification system for clinical use.
Methods Inf Med. 1995 Dec;34(5):511-7.
PMID: 8713767

Smith B, Rosse C.
The role of foundational relations in the alignment of biomedical ontologies.
Medinfo. 2004;11(Pt 1):444-8.
PMID: 15360852

Gibbons PS, Pishotta FT, Stepto RC.
A system for reporting gynecologic procedures. A linguistic-logical approach.
J Reprod Med. 1983 Mar;28(3):201-5.
PMID: 6854551

Walker D, Misan GM.
The Australian Medicines Handbook and its controlling vocabularies.
MD Comput. 1997 Mar-Apr;14(2):107-113.
PMID: 9066246

Thomos N, Boulgouris NV, Strintzis MG.
Wireless image transmission using turbo codes and optimal unequal error protection.
IEEE Trans Image Process. 2005 Nov;14(11):1890-901.
PMID: 16279187

Banks IC, Tackley RM.
A standard set of terms for critical incident recording?
Br J Anaesth. 1994 Nov;73(5):703-8.
PMID: 7826806

White ME, Vellake E.
A coding scheme for veterinary clinical signs.
Cornell Vet. 1980 Apr;70(2):160-82.
PMID: 7408497

Caldwell SH, Popenoe R.
Perceptions and misperceptions of skin color.
Ann Intern Med. 1995 Apr 15;122(8):614-7.
PMID: 7887557

Devlies JH.
Terminology and coding systems of drugs--CDrugs.
Medinfo. 1995;8 Pt 1:105-9.
PMID: 8591130

Rector AL, Rogers JE, Zanstra PE, Van Der Haring E; OpenGALEN.
OpenGALEN: open source medical terminology and tools.
AMIA Annu Symp Proc. 2003;:982.
PMID: 14728486.

Bishop CW, Ewing PD.
Transferring knowledge from one system to another.
Proc Annu Symp Comput Appl Med Care. 1994;:967.
PMID: 7950071.

Harber P, Miller G, Smitherman J.
Work coding: beyond SIC and SOC, BOC and DOT.
J Occup Med. 1991 Dec;33(12):1274-1280.
PMID: 1800688.

Talbot RB, Mills EM.
SNOMED International for veterinary medicine.
J Am Vet Med Assoc. 1994 Nov 15;205(10):1445-1447.
PMID: 7698928.

Anonymous.
Complete listing of the new evaluation & management codes.
J Ark Med Soc. 1991 Dec;88(7):324-342.
PMID: 1838372.

Klatt EC, Noguchi TT.
Forensic diagnosis coding and retrieval by microcomputer.
Am J Forensic Med Pathol. 1986 Sep;7(3):196-200.
PMID: 3788907.

Cote RA.
Standardized coding of the medical problem list.
J Am Med Inform Assoc. 1995 Jan-Feb;2(1):68.
PMID: 7741924.

Kuperman G, Bates DW.
Standardized coding of the medical problem list.
J Am Med Inform Assoc. 1994 Sep-Oct;1(5):414-415.
PMID: 7850566

Bondy J, Lipscomb H, Guarini K, Glazner JE.
Methods for using narrative text from injury reports to identify factors contributing to construction injury.
Am J Ind Med. 2005 Nov;48(5):373-380.
PMID: 16254951

Price C, Bentley TE, Brown PJ, Schulz EB, O'Neil M.
Anatomical characterisation of surgical procedures in the Read Thesaurus.
Proc AMIA Annu Fall Symp. 1996;:110-114.
PMID: 8947638

Biesecker LG.
Mapping phenotypes to language: a proposal to organize and standardize the clinical descriptions of malformations.
Clin Genet. 2005 Oct;68(4):320-6.
PMID: 16143016

Spencer LM, Spencer GR.
A new classification of ophthalmic disorders with standardized ophthalmic abbreviations.
Ophthalmology. 1990 Mar;97(3):385-9.
PMID: 2336279

Fujimoto R.
Investigation of the index structure of Drugdoc and Ringdoc.
J Chem Inf Comput Sci. 1976 Nov;16(4):227-31.
PMID: 1002777

Moore H.
Getting the SNOMED ball rolling in Australia.
J AHIMA. 2004 Nov-Dec;75(10):6.
PMID: 15559833.

Gebbie KM.
Major classification systems in health care and their use.
ANA Publ. 1989 Jan;(NP-74):48-49.
PMID: 2929894.

Palotay JL.
SNOMED-SNOVET: an information system for comparative medicine.
Med Inform (Lond). 1983 Jan-Mar;8(1):17-21.
PMID: 6834930.

Smart D, Shaw C, Johnston CF, Halton DW, Buchanan KD.
Discussion paper: towards a systematic classification for regulatory peptides.
Regul Pept. 1993 Apr 8;44(3):305-9. Review.
PMID: 8484021

Klimczak JC, Hahn AW, Sievert M, Mitchell JA.
Getting around in a large nomenclature file: browsing SNOMED international.
Proc Annu Symp Comput Appl Med Care. 1994;:1023. No abstract available.
PMID: 7949860

Dickey RA.
Coding for endocrine services: using the new codes and the evocative/suppression testing protocols.
Endocr Pract. 1996 May-Jun;2(3):193-196.
PMID: 15251539.

Smith B, Ceusters W, Temmerman R.
Wusteria.
Stud Health Technol Inform. 2005;116:647-652.
PMID: 16160331.

Straubs V.
The mnemonic coding system of diseases.
J Fam Pract. 1976 Jun;3(3):324.
PMID: 993763.

Heja G, Surjan G.
Using n-gram method in the decomposition of compound medical diagnoses.
Stud Health Technol Inform. 2002;90:455-459.
PMID: 15460736.

McLay AL, Toner PG.
The classification of ultrastructural topography in the context of an ultrastructural diagnostic service.
Diagn Histopathol. 1981 Jul-Sep;4(3):219-22.
PMID: 7273993.

Brown WM.
On defining 'disease'.
J Med Philos. 1985 Nov;10(4):311-328.
PMID: 4067454.

Heja G, Surjan G, Lukacsy G, Pallinger P, Gergely M.
GALEN Based Formal Representation of ICD10.
Stud Health Technol Inform. 2005;116:707-712.
PMID: 16160341.

Stubbs DM.
Information content and clarity of radiologists' reports for chest radiography.
Acad Radiol. 1997 Apr;4(4):325.
PMID: 9110033.

Fleck A, Robinson R, Brown SS, Hobbs HR.
Definitions of some words and terms used in automated analysis. Prepared for the study group on automation and scientific and technical committee of the association of clinical biochemists.
Ann Clin Biochem. 1974 Nov;11(6):242-257.
PMID: 4460849.

Wolff S.
The use of morphosemantic regularities in the medical vocabulary for automatic lexical coding.
Methods Inf Med. 1984 Oct;23(4):195-203.
PMID: 6392820.

Owens H, Maxmen JS.
Mood and affect: a semantic confusion.
Am J Psychiatry. 1979 Jan;136(1):97-99.
PMID: 758838.

Dodd W.
Korner, nomenclature, and SNOMED.
Br Med J (Clin Res Ed). 1988 Apr 23;296(6630):1198-1199.
PMID: 3132268.

Kilbourne J, Williams T.
Unicode, UTF-8, ASCII, and SNOMED CT.
AMIA Annu Symp Proc. 2003;:892.
PMID: 14728397.

Muck M.
A serial read-out scheme for SQUID systems.
Clin Phys Physiol Meas. 1991;12 Suppl B:51-57.
PMID: 1807880.

Fenna D, Wartak J.
Entity-directed coding of medical nomenclature.
Methods Inf Med. 1984 Apr;23(2):82-86.
PMID: 6472121.

Milicevic A, Nikolic S, Trinajstic N.
Coding and ordering Kekule structures.
J Chem Inf Comput Sci. 2004 Mar-Apr;44(2):415-421.
PMID: 15032520.

Gordon BL.
Terminology and content of the medical record.
Comput Biomed Res. 1970 Oct 5;3(5):436-444.
PMID: 5500368.

Donnelly WH.
The systematized nomenclature for medicine (SNOMED): its application to paediatric pathology.
Med Inform (Lond). 1983 Jan-Mar;8(1):33-39.
PMID: 6834931.

Weeks R.
Computer based prescribing. Database is linked to Read codes.
BMJ. 1996 Feb 17;312(7028):446.
PMID: 8601140.

Mainland D.
Some research terms for beginners: definitions, comments, and examples--II.
Clin Pharmacol Ther. 1969 Nov-Dec;10(6):867-900.
PMID: 5349628.

Dickey RA.
Coding: the history of the recognition of endocrinology services.
Endocr Pract. 1996 Mar-Apr;2(2):110-115.
PMID: 15251552.

Healey T.
Towards a unified filing system.
Br Med J. 1970 Aug 1;3(5717):277-278.
PMID: 5448806.

McEvoy AJ.
Embryo definitions.
Nature. 1988 Nov 17;336(6196):198.
PMID: 3194006.

Margolis J.
Thoughts on definitions of disease.
J Med Philos. 1986 Aug;11(3):233-236.
PMID: 3794556.

Blake T, Smith DL.
Words, work, system, and medical records.
South Med J. 1973 Sep;66(9):971-972.
PMID: 4733589.

Daves ML.
Language of certainty.
AJR Am J Roentgenol. 1986 Jul;147(1):209-210.
PMID: 3487220.

Wilkerson JA, Hirschowitz BI.
A plea for clarity.
Gastrointest Endosc. 1981 Aug;27(3):192-194.
PMID: 7297832.

Dirckx JH.
Very close veins and superstitious fleabites. A glossary of lay medical terms.
Am J Dermatopathol. 1993 Dec;15(6):612-616.
PMID: 8311197.

Zucker A.
Operations and definitions: a brief analysis.
Br J Psychiatry. 1977 Jul;131:111-112.
PMID: 884409.

Jelliffe EF, Jelliffe DB.
Instant ultralogic: clarity or confusion?
Am J Clin Nutr. 1975 Dec;28(12):1348-1349.
PMID: 802995.

Kothari ML, Mehta LA, Kothari ML.
Towards semantic clarity in cancerology.
J Postgrad Med. 1971 Oct;17(4):145-160.
PMID: 5141467.

Judd DB.
Terms, definitions, and symbols in reflectometry.
J Opt Soc Am. 1967 Apr;57(4):445-452.
PMID: 6027819.

Haeckel R, Desmond Geary T, Burtis CA.
The term "random access" is inappropriate as a descriptor for clinical-analysis systems.
Clin Chem. 1988 Jul;34(7):1520.
PMID: 3390948.

Dolfman ML.
Toward operational definitions of health.
J Sch Health. 1974 Apr;44(4):206-209.
PMID: 4493813.

Miller MJ.
Viral taxonomy.
Clin Infect Dis. 1995 Aug;21(2):279-280.
PMID: 8562731.

Ioannides G.
Clear-cut definitions.
J Dermatol Surg Oncol. 1985 Mar;11(3):214.
PMID: 3973194.

Sugar O.
Accuracy of medical terminology.
Arch Neurol. 1985 Oct;42(10):932-933.
PMID: 4038100.

Kane SH.
Administrative significance of computerized medical studies.
Circ Res. 1962 Sep;11:647-649.
PMID: 13962189.

Kane SH.
Administrative significance of computerized medical studies.
Circ Res. 1962 Sep;11:647-649.
PMID: 14030582.

Issitt PD.
More on blood group terminology.
Immunohematol. 1988;4(1):17.
PMID: 15945923.

Berman JJ.
Automatic extraction of candidate nomenclature terms using the doublet method.
BMC Med Inform Decis Mak. 2005 Oct 18;5:35.
PMID: 16232314.

Patel AA, Kajdacsy-Balla A, Berman JJ, Bosland M, Datta MW, Dhir R, Gilbertson J, Melamed J, Orenstein J, Tai KF, Becich MJ.
The development of common data elements for a multi-institute prostate cancer tissue bank: the Cooperative Prostate Cancer Tissue Resource (CPCTR) experience.
BMC Cancer. 2005 Aug 21;5:108.
PMID: 16111498.

Berman JJ.
Nomenclature-based data retrieval without prior annotation: facilitating biomedical data integration with fast doublet matching.
In Silico Biol. 2005;5(3):313-322. Epub 2005 Apr 3.
PMID: 15984939.

Berman JJ, Bhatia K.
Biomedical data integration: using XML to link clinical and research data sets.
Expert Rev Mol Diagn. 2005 May;5(3):329-336.
PMID: 15934811.

Datta MW, Dhir R, Dobbin K, Bosland MC, Melamed J, Becich MJ, Orenstein JM, Kajdacsy-Balla AA, Patel A, Macias V, Berman JJ.
Prostate cancer in patients with screening serum prostate specific antigen values less than 4.0 ng/dl: results from the cooperative prostate cancer tissue resource.
J Urol. 2005 May;173(5):1546-1551.
PMID: 15821483.

Berman JJ.
Pathology data integration with eXtensible Markup Language.
Hum Pathol. 2005 Feb;36(2):139-145. Review.
PMID: 15754290.

Berman JJ.
Tumor taxonomy for the developmental lineage classification of neoplasms.
BMC Cancer. 2004 Nov 30;4:88.
PMID: 15571625.

Datta MW, Berman JJ, Dhir R.
Prostate cancer with low PSA levels.
N Engl J Med. 2004 Oct 21;351(17):1802-1803.
PMID: 15499672.

Berman JJ.
Doublet method for very fast autocoding.
BMC Med Inform Decis Mak. 2004 Sep 15;4:16.
PMID: 15369595.

Mitchell KJ, Becich MJ, Berman JJ, Chapman WW, Gilbertson J, Gupta D, Harrison J, Legowski E, Crowley RS.
Implementation and evaluation of a negation tagger in a pipeline-based system for information extraction from pathology reports.
Medinfo. 2004;11(Pt 1):663-667.
PMID: 15360896.

Melamed J, Datta MW, Becich MJ, Orenstein JM, Dhir R, Silver S, Fidelia-Lambert M, Kajdacsy-Balla A, Macias V, Patel A, Walden PD, Bosland MC, Berman JJ.
The cooperative prostate cancer tissue resource: a specimen and data resource for cancer researchers.
Clin Cancer Res. 2004 Jul 15;10(14):4614-4621.
PMID: 15269132.

Berman JJ.
Resources for comparing the speed and performance of medical autocoders.
BMC Med Inform Decis Mak. 2004 Jun 15;4:8.
PMID: 15198804.

Booker DL, Berman JJ.
Dangerous abbreviations.
Hum Pathol. 2004 May;35(5):529-531.
PMID: 15138924.

Berman JJ.
Tumor classification: molecular analysis meets Aristotle.
BMC Cancer. 2004 Mar 17;4:10.
PMID: 15113444.

Berman JJ, Datta M, Kajdacsy-Balla A, Melamed J, Orenstein J, Dobbin K, Patel A, Dhir R, Becich MJ.
The tissue microarray data exchange specification: implementation by the Cooperative Prostate Cancer Tissue Resource.
BMC Bioinformatics. 2004 Feb 27;5:19.
PMID: 15040818.

Berman JJ.
Zero-check: a zero-knowledge protocol for reconciling patient identities across institutions.
Arch Pathol Lab Med. 2004 Mar;128(3):344-346.
PMID: 14987147.

Berman JJ.
Pathology abbreviated: a long review of short terms.
Arch Pathol Lab Med. 2004 Mar;128(3):347-352.
PMID: 14987146.

Berman JJ.
Racing to share pathology data.
Am J Clin Pathol. 2004 Feb;121(2):169-171. Review.
PMID: 14983928.

Berman JJ, Henson DE.
The precancers: waiting for a classification.
Hum Pathol. 2003 Sep;34(9):833-834. Review.
PMID: 14562276.

Berman JJ, Henson DE.
Classifying the precancers: a metadata approach.
BMC Med Inform Decis Mak. 2003 Jun 20;3:8. Epub 2003 Jun 20.
PMID: 12818004.

Berman JJ.
A tool for sharing annotated research data: the "Category 0" UMLS (Unified Medical Language System) vocabularies.
BMC Med Inform Decis Mak. 2003 Jun 16;3:6. Epub 2003 Jun 16.
PMID: 12809560.

Berman JJ, Edgerton ME, Friedman BA.
The tissue microarray data exchange specification: a community-based, open source tool for sharing tissue microarray data.
BMC Med Inform Decis Mak. 2003 May 23;3:5. Epub 2003 May 23.
PMID: 12769826.

Berman JJ.
Concept-match medical data scrubbing. How pathology text can be used in research.
Arch Pathol Lab Med. 2003 Jun;127(6):680-686.
PMID: 12741890.

Berman JJ.
Threshold protocol for the exchange of confidential medical data.
BMC Med Res Methodol. 2002 Nov 11;2:12. Epub 2002 Nov 11.
PMID: 12425722.

Berman JJ.
Confidentiality issues for medical data miners.
Artif Intell Med. 2002 Sep-Oct;26(1-2):25-36.
PMID: 12234715.

Fletcher CD, Berman JJ, Corless C, Gorstein F, Lasota J, Longley BJ, Miettinen M, O'Leary TJ, Remotti H, Rubin BP, Shmookler B, Sobin LH, Weiss SW.
Diagnosis of gastrointestinal stromal tumors: A consensus approach.
Hum Pathol. 2002 May;33(5):459-465. Review.
PMID: 12094370.

O'Leary T, Berman JJ.
Gastrointestinal stromal tumors: answers and questions.
Hum Pathol. 2002 May;33(5):456-458. Review.
PMID: 12094369.

Fletcher CD, Berman JJ, Corless C, Gorstein F, Lasota J, Longley BJ, Miettinen M, O'Leary TJ, Remotti H, Rubin BP, Shmookler B, Sobin LH, Weiss SW.
Diagnosis of gastrointestinal stromal tumors: a consensus approach.
Int J Surg Pathol. 2002 Apr;10(2):81-89. Review.
PMID: 12075401.

Kogan SC, Ward JM, Anver MR, Berman JJ, Brayton C, Cardiff RD, Carter JS, de Coronado S, Downing JR, Fredrickson TN, Haines DC, Harris AW, Harris NL, Hiai H, Jaffe ES, MacLennan IC, Pandolfi PP, Pattengale PK, Perkins AS, Simpson RM, Tuttle MS, Wong JF, Morse HC 3rd; Hematopathology subcommittee of the Mouse Models of Human Cancers Consortium.
Bethesda proposals for classification of nonlymphoid hematopoietic neoplasms in mice.
Blood. 2002 Jul 1;100(1):238-245.
PMID: 12070033.

Hutchins GM, Berman JJ, Moore GW, Hanzlick R, Autopsy Committee of the College of American Pathologists.
Practice guidelines for autopsy pathology: autopsy reporting. Autopsy Committee of the College of American Pathologists.
Arch Pathol Lab Med. 1999 Nov;123(11):1085-1092.
PMID: 10539932.

Berman JJ, Moore GW, Hutchins GM.
U.S. Senate Bill 422: the Genetic Confidentiality and Nondiscrimination Act of 1997.
Diagn Mol Pathol. 1998 Aug;7(4):192-196. Review.
PMID: 9917128.

Berman JJ, Moore GW, Hutchins GM.
Internet autopsy database.
Hum Pathol. 1997 Apr;28(4):393-394.
PMID: 9104935.

Berman JJ, Moore GW.
SNOMED-encoded surgical pathology databases: a tool for epidemiologic investigation.
Mod Pathol. 1996 Sep;9(9):944-950.
PMID: 8878028.

Moore GW, Berman JJ, Hanzlick RL, Buchino JJ, Hutchins GM.
A prototype Internet autopsy database. 1625 consecutive fetal and neonatal autopsy facesheets spanning 20 years.
Arch Pathol Lab Med. 1996 Aug;120(8):782-785.
PMID: 8718907.

Berman JJ, Moore GW, Hutchins GM.
Maintaining patient confidentiality in the public domain Internet Autopsy Database (IAD).
Proc AMIA Annu Fall Symp. 1996;:328-332.
PMID: 8947682.

Berman JJ, Borkowski A, Rachocka H, Moore GW.
Impact of unfunded research in medicine, pathology, and surgery.
South Med J. 1995 Mar;88(3):295-299.
PMID: 7886525.

Berman JJ, Moore GW.
Image analysis software for the detection of preneoplastic and early neoplastic lesions.
Cancer Lett. 1994 Mar 15;77(2-3):103-109.
PMID: 8168056.

Moore GW, Berman JJ.
Performance analysis of manual and automated systematized nomenclature of medicine (SNOMED) coding.
Am J Clin Pathol. 1994 Mar;101(3):253-256.
PMID: 8135178.

Moore GW, Berman JJ.
Automatic SNOMED coding.
Proc Annu Symp Comput Appl Med Care. 1994;:225-229.
PMID: 7949924.

Berman JJ, Moore GW, Donnelly WH, Massey JK, Craig B.
A SNOMED analysis of three years' accessioned cases (40,124) of a surgical pathology department: implications for pathology-based demographic studies.
Proc Annu Symp Comput Appl Med Care. 1994;:188-192.
PMID: 7949917.

Seidman JD, Berman JJ.
Premalignant nonepithelial lesions: a biological classification.
Mod Pathol. 1993 Sep;6(5):544-554. Review.
PMID: 8248110.

Berman JJ, Moore GW, O'Neill TP, Liebelt AG, Saffiotti U.
Registry of Experimental Cancers of the National Cancer Institute. A database resource for cancer research.
Am J Pathol. 1993 Feb;142(2):351-352.
PMID: 8434635.

Berman JJ, Moore GW.
The role of cell death in the growth of preneoplastic lesions: a Monte Carlo simulation model.
Cell Prolif. 1992 Nov;25(6):549-557.
PMID: 1457604.

Sorace JM, Carnahan GE, Moore GW, Berman JJ.
Automated review of blood donor screening test patterns at a regional blood center.
Am J Clin Pathol. 1992 Sep;98(3):334-344.
PMID: 1529966.

Berman JJ, Moore GW.
Spontaneous regression of residual tumour burden: prediction by Monte Carlo simulation.
Anal Cell Pathol. 1992 Sep;4(5):359-368.
PMID: 1445794.

Borkowski A, Berman JJ, Moore GW.
Research by pathologists not funded by external grant agencies: a success story.
Mod Pathol. 1992 Sep;5(5):577-579.
PMID: 1344824.

Moore GW, Berman JJ.
Cell growth simulations predicting polyclonal origins for 'monoclonal' tumors.
Cancer Lett. 1991 Nov;60(2):113-119.
PMID: 1933835.

Sorace JM, Berman JJ, Carnahan GE, Moore GW.
PRELOG: precedence logic inference software for blood donor deferral.
Proc Annu Symp Comput Appl Med Care. 1991;:976-977.
PMID: 1807774.

Moore GW, Berman JJ.
Object-oriented controlled-vocabulary translator using TRANSOFT + HyperPAD.
Proc Annu Symp Comput Appl Med Care. 1991;:973-975.
PMID: 1807773.

Anon.
Clarity on the diagnosis line.
Ann Diagn Pathol. 2000 Apr;4(2):134.
PMID: 10760326.

Anon.
Domain 4: Clarity and presentation.
Z Arztl Fortbild Qualitatssich. 2005;99(8):511-512, 483-484. English, German.
PMID: 16294497.

Weed LL.
Medical records that guide and teach.
N Engl J Med. 1968;278:593-600.

Weed LL.
Medical records that guide and teach.
N Engl J Med. 1968;278:652-657.

Bayegan E, Tu S.
The helpful patient record system: problem oriented and knowledge based.
Proc AMIA Symp. 2002;:36-40.

Campbell JR, Payne TH.
A comparison of four schemes for codification of problem lists.
Proc Annu Symp Comput Appl Med Care. 1994;:201-205.

Campbell JR.
Strategies for problem list implementation in a complex clinical enterprise.
Proc AMIA Symp. 1998;:285-289.

Donaldson MS, Povar GJ.
Improving the master problem list: a case study in changing clinician behavior.
QRB Qual Rev Bull. 1985;11:327-333.

Elkin PL, Mohr DN, Tuttle MS, Cole WG, Atkin GE, Keck K, Fisk TB, Kaihoi BH, Lee KE, Higgins MC, Suermondt HJ, Olson N, Claus PL, Carpenter PC, Chute CG.
Standardized problem list generation, utilizing the Mayo canonical vocabulary embedded within the Unified Medical Language System.
Proc AMIA Annu Fall Symp. 1997;:500-504.

Goldberg H, Goldsmith D, Law V, Keck K, Tuttle M, Safran C.
An evaluation of UMLS as a controlled terminology for the Problem List Toolkit.
Medinfo. 1998;9(Pt 1):609-612.

Hales JW, Schoeffler KM, Kessler DP.
Extracting medical knowledge for a coded problem list vocabulary from the UMLS Knowledge Sources.
Proc AMIA Symp. 1998;:275-279.

Starmer J, Miller R, Brown S.
Development of a Structured Problem List Management System at Vanderbilt.
Proc AMIA Annu Fall Symp. 1998;:1083.

Institute of Medicine (U.S.).
Committee on Improving the Patient Record.
In: Dick RS, Steen EB, Detmer DE, eds. The computer-based patient record: an essential technology for health care. Rev edition.
Washington, DC: National Academy Press; 1997.

Scherpbier HJ, Abrams RS, Roth DH, Hail JJ.
A simple approach to physician entry of patient problem list.
Proc Annu Symp Comput Appl Med Care. 1994;:206-210.

Wasserman H, Wang J.
An Applied Evaluation of SNOMED CT as a Clinical Vocabulary for the Computerized Diagnosis and Problem List.
Proc AMIA Symp. 2003;:699-703.

U. S. National Library of Medicine.
Unified Medical Language System (UMLS).
http://umlsks.nlm.nih.gov/

Payne T, Martin DR.
How useful is the UMLS metathesaurus in developing a controlled vocabulary for an automated problem list?
Proc Annu Symp Comput Appl Med Care. 1993;:705-709.

Goldberg H, Hsu C, Law V, Safran C.
Validation of clinical problems using a UMLS-based semantic parser.
Proc AMIA Symp. 1998;:805-809.

Zelingher J, Rind DM, Caraballo E, Tuttle M, Olson N, Safran C.
Categorization of free-text problem lists: an effective method of capturing clinical data.
Proc Annu Symp Comput Appl Med Care. 1995;:416-420.

Wang SJ, Bates DW, Chueh HC, Karson AS, Maviglia SM, Greim JA, Frost JP, Kuperman GJ.
Automated coded ambulatory problem lists: evaluation of a vocabulary and a data entry tool.
Int J Med Inf. 2003;72:17-28.

Elkins JS, Friedman C, Boden-Albala B, Sacco RL, Hripcsak G.
Coding neuroradiology reports for the Northern Manhattan Stroke Study: a comparison of natural language processing and manual review.
Comput Biomed Res. 2000;33:1-10.

Tuttle MS, Olson NE, Keck KD, Cole WG, Erlbaum MS, Sherertz DD, Chute CG, Elkin PL, Atkin GE, Kaihoi BH, Safran C, Rind D, Law V.
Metaphrase: an aid to the clinical conceptualization and formalization of patient problems in healthcare enterprises.
Methods Inf Med. 1998;37:373-383.

Cooper GF, Miller R.
An experiment comparing lexical and statistical methods for extracting MeSH terms from clinical free text.
J Am Med Inform Assoc. 1998;5:62-75.

Aronson AR.
Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program.
Proc AMIA Symp. 2001;:17-21.

Nadkarni P, Chen R, Brandt C.
UMLS concept indexing for production databases: a feasibility study.
J Am Med Inform Assoc. 2001;8:80-91.

Brennan PF, Aronson AR.
Towards linking patients and clinical information: detecting UMLS concepts in e-mail.
J Biomed Inform. 2003;36:334-341.

Huang Y, Lowe H, Hersh W.
A pilot study of contextual UMLS indexing to improve the precision of concept-based representation in XML-structured clinical radiology reports.
J Am Med Inform Assoc. 2003;10:580-587.

Zou Q, Chu WW, Morioka C, Leazer GH, Kangarloo H.
IndexFinder: A Method of Extracting Key Concepts from Clinical Texts for Indexing.
Proc AMIA Symp. 2003;:763-767.

Aronson AR, Bodenreider O, Chang HF, Humphrey SM, Mork JG, Nelson SJ, Rindflesch TC, Wilbur WJ.
The NLM Indexing Initiative.
Proc AMIA Symp. 2000;:17-21.

McCray AT, Sponsler JL, Brylawski B, Browne AC.
The role of lexical knowledge in biomedical text understanding.
In: SCAMC 87. IEEE; 1987:103-107.

McCray AT.
Extending a natural language parser with UMLS knowledge.
Proc Annu Symp Comput Appl Med Care. 1991;:194-198.

Rindflesch TC, Tanabe L, Weinstein JN, Hunter L.
EDGAR: extraction of drugs, genes, and relations from the biomedical literature.
Pac Symp Biocomput. 2000;:517-528.

Pratt W, Yetisgen-Yildiz M.
A Study of Biomedical Concept Identification: MetaMap vs. People.
Proc AMIA Symp. 2003;:529-533.

Aronson AR.
Query expansion using the UMLS Metathesaurus.
Proc AMIA Symp. 1997;:485-489.

Pratt W, Wassermann H.
QueryCat: Automatic categorization of MEDLINE Queries.
In: Proc AMIA Symp. Los Angeles; 2000:655-659.

Weeber M, Klein H, Aronson AR, Mork JG, de Jong-van den Berg LT, Vos R.
Text-based discovery in biomedicine: the architecture of the DAD-system.
Proc AMIA Symp. 2000;:903-907.

Sneiderman CA, Rindflesch TC, Bean CA.
Identification of anatomical terminology in medical text.
Proc AMIA Symp. 1998;:428-432.

Rindflesch TC, Hunter L, Aronson AR.
Mining molecular binding terminology from biomedical text.
Proc AMIA Symp. 1999;:127-131.

Schadow G, McDonald C.
Extracting structured information from free text pathology reports.
Proc AMIA Symp. 2003;:584-588.

Chapman WW, Bridewell W, Hanbury P, Cooper GF, Buchanan BG.
A simple algorithm for identifying negated findings and diseases in discharge summaries.
J Biomed Inform. 2001;34:301-310.

Mutalik PG, Deshpande A, Nadkarni PM.
Use of general-purpose negation detection to augment concept indexing of medical documents: a quantitative study using the UMLS.
J Am Med Inform Assoc. 2001;8:598-609.

NegEx 2
http://web.cbmi.pitt.edu/chapman/NegEx.html
A nice listing of phrases from medical documents that signify negation.

Spyns P.
Natural language processing in medicine: an overview.
Methods Inf Med. 1996;35:285-301.

Chi E, Lyman M, Sager N, Friedman C.
Database of computer-structured narrative: methods of computing complex relations.
In: SCAMC 85. IEEE; 1985:221-226.

Sager N, Friedman C, Chi E.
The analysis and processing of clinical narrative.
In: Medinfo 86. Amsterdam (Holland); 1986:1101-1105.

Zingmond D, Lenert LA.
Monitoring free-text data using medical language processing.
Comput Biomed Res. 1993;26:467-481.

Friedman C, Alderson PO, Austin JH, Cimino JJ, Johnson SB.
A general natural-language text processor for clinical radiology.
J Am Med Inform Assoc. 1994;1:161-174.

Friedman C, Hripcsak G, Shagina L, Liu H.
Representing information in patient reports using natural language processing and the extensible markup language.
J Am Med Inform Assoc. 1999;6:76-87.

Hripcsak G, Friedman C, Alderson PO, DuMouchel W, Johnson SB, Clayton PD.
Unlocking clinical data from narrative reports: a study of natural language processing.
Ann Intern Med. 1995;122:681-688.

Hripcsak G, Kuperman GJ, Friedman C.
Extracting findings from narrative reports: software transferability and sources of physician disagreement.
Methods Inf Med. 1998;37:1-7.

Knirsch CA, Jain NL, Pablos-Mendez A, Friedman C, Hripcsak G.
Respiratory isolation of tuberculosis patients using clinical guidelines and an automated clinical decision support system.
Infect Control Hosp Epidemiol. 1998;19:94-100.

Jain NL, Friedman C.
Identification of findings suspicious for breast cancer based on natural language processing of mammogram reports.
Proc AMIA Annu Fall Symp. 1997;:829-833.

Friedman C, Knirsch C, Shagina L, Hripcsak G.
Automating a severity score guideline for community-acquired pneumonia employing medical language processing of discharge summaries.
Proc AMIA Symp. 1999;:256-260.

Xu H, Friedman C.
Facilitating Research in Pathology using Natural Language Processing.
Proc AMIA Symp. 2003;:1057.

Friedman C, Johnson SB, Forman B, Starren J.
Architectural requirements for a multipurpose natural language processor in the clinical environment.
Proc Annu Symp Comput Appl Med Care. 1995;:347-351.

Friedman C, Liu H, Shagina L.
A vocabulary development and visualization tool based on natural language processing and the mining of textual patient reports.
J Biomed Inform. 2003;36:189-201.

Friedman C, Shagina L, Lussier Y, Hripcsak G.
Automated Encoding of Clinical Documents Based on Natural Language Processing.
J Am Med Inform Assoc. 2004.

Ranum DL.
Knowledge-based understanding of radiology texts.
Comput Methods Programs Biomed. 1989;30:209-215.

Haug PJ, Ranum DL, Frederick PR.
Computerized extraction of coded findings from free-text radiologic reports. Work in progress.
Radiology. 1990;174:543-548.

Haug P, Koehler S, Lau LM, Wang P, Rocha R, Huff S.
A natural language understanding system combining syntactic and semantic techniques.
Proc Annu Symp Comput Appl Med Care. 1994;:247-251.

Haug PJ, Koehler S, Lau LM, Wang P, Rocha R, Huff SM.
Experience with a mixed semantic/syntactic parser.
Proc Annu Symp Comput Appl Med Care. 1995;:284-288.

Koehler SB.
SymText : a natural language understanding system for encoding free text medical data.
1998.

Christensen L, Haug P, Fiszman M.
MPLUS: a probabilistic medical language understanding system.
Proceedings of the Workshop on Natural Language Processing in the Biomedical Domain 2002;:29-36.

Nivre J.
On Statistical Methods in Natural Language Processing.
In: Bubenko J Jr, Wangler B, eds. Promote IT: Second Conference for the Promotion of Research in IT at New Universities and University Colleges in Sweden.
Skövde (Sweden): University of Skövde; 2002:684-694.

Fiszman M, Chapman WW, Aronsky D, Evans RS, Haug PJ.
Automatic detection of acute bacterial pneumonia from chest X-ray reports.
J Am Med Inform Assoc. 2000;7:593-604.

Fiszman M, Haug PJ.
Using medical language processing to support real-time evaluation of pneumonia guidelines.
Proc AMIA Symp. 2000;:235-239.

Haug PJ, Christensen L, Gundersen M, Clemons B, Koehler S, Bauer K.
A natural language parsing system for encoding admitting diagnoses.
Proc AMIA Annu Fall Symp. 1997;:814-818.

Fiszman M, Haug PJ, Frederick PR.
Automatic extraction of PIOPED interpretations from ventilation/perfusion lung scan reports.
Proc AMIA Symp. 1998;:860-864.

Meystre S, Haug PJ.
Medical problem and document model for natural language understanding.
Proc AMIA Symp. 2003;:455-459.

Dolin RH, Alschuler L, Beebe C, Biron PV, Boyer SL, Essin D, Kimber E, Lincoln T, Mattison JE.
The HL7 Clinical Document Architecture.
J Am Med Inform Assoc. 2001;8:552-569.

World Wide Web Consortium.
Extensible Markup Language (XML) 1.0 (Second Edition).
http://www.w3.org/TR/REC-xml

World Wide Web Consortium (W3C).
http://www.w3.org

Paterson G, Shepherd M, Wang X, Watters C, Zitner D.
Using the XML-based Clinical Document Architecture for Exchange of Structured Discharge Summaries.
Proceedings of the 35th Hawaii Int Conf on System Sciences. 2002.

Heitmann K, Schweiger R, Dudeck J.
Discharge and referral data exchange using global standards -- the SCIPHOX project in Germany.
Int J Med Inf. 2003;70:195-203.

Muller ML, Butta R, Prokosch HU.
Electronic discharge letters using the Clinical Document Architecture (CDA).
Stud Health Technol Inform. 2003;95:824-828.

Bludau H, Wolff A, Hochlehnert AJ.
Presenting XML-based medical discharge letters according to CDA.
Methods Inf Med. 2003;42:552-556.

Kleene SC.
Representation of events in nerve nets and finite automata.
In: Shannon C, McCarthy J, eds. Automata Studies. Princeton, NJ: Princeton University Press; 1956:3-41.

Thompson K.
Regular expression search algorithm.
Communications of the ACM. 1968;11:419-422.

Friedl JEF.
Mastering regular expressions.
Cambridge: O'Reilly; 1997.

Reichert JC, Glasgow M, Narus SP, Clayton PD.
Using LOINC to link an EMR to the pertinent paragraph in a structured reference knowledge base.
Proc AMIA Symp. 2002;:652-656.

Huff SM, Rocha RA, Bray BE, Warner HR, Haug PJ.
An event model of medical information representation.
J Am Med Inform Assoc. 1995;2:116-134.

International Organization for Standardization.
International Standard ISO/IEC 8824: specification of Abstract Syntax Notation One (ASN.1). Second edition.
Geneva, Switzerland: International Organization for Standardization; 1990.

Manning CD, Schütze H.
Foundations of Statistical Natural Language Processing. 6th printing.
Cambridge, Massachusetts, London, England: MIT Press; 2003.

Netica Application
http://www.norsys.com/netica.html

MetaMap Transfer (MMTx).
http://mmtx.nlm.nih.gov/

XSL Transformations (XSLT) Version 1.0
http://www.w3.org/TR/xslt

Institute of Medicine Committee on Quality of Health Care in America, Kohn LT, Corrigan JM, Donaldson MS.
To Err is Human: Building A Safer Health System.
Washington, DC: National Academy Press; 1999.

Wright LW.
Hierarchical Concept Indexing of Full-Text Documents in the Unified Medical Language System Information Sources Map.
Journal of the American Society for Information Science. 1998;50:514-523.

Pratt W, Wassermann H.
QueryCat: Automatic categorization of MEDLINE Queries.
In: Proc AMIA Symp. Los Angeles; 2000:655-659.

Weeber M, Klein H, Aronson AR, Mork JG, de Jong-van den Berg LT, Vos R.
Text-based discovery in biomedicine: the architecture of the DAD-system.
Proc AMIA Symp. 2000;:903-907.

Sneiderman CA, Rindflesch TC, Bean CA.
Identification of anatomical terminology in medical text.
Proc AMIA Symp. 1998;:428-432.

Rindflesch TC, Hunter L, Aronson AR.
Mining molecular binding terminology from biomedical text.
Proc AMIA Symp. 1999;:127-131.

Shadow G, McDonald C.
Extracting structured information from free text pathology reports.
Proc AMIA Symp 2003, 584-588.

Chapman WW, Bridewell W, Hanbury P, Cooper GF, Buchanan BG.
A simple algorithm for identifying negated findings and diseases in discharge summaries.
J Biomed Inform. 2001;34:301-310.

Mutalik PG, Deshpande A, Nadkarni PM.
Use of general-purpose negation detection to augment concept indexing of medical documents: a quantitative study using the UMLS.
J Am Med Inform Assoc. 2001;8:598-609.

NegEx 2
http://web.cbmi.pitt.edu/chapman/NegEx.html

Spyns P.
Natural language processing in medicine: an overview.
Methods Inf Med. 1996;35:285-301.

Chi E, Lyman M, Sager N, Friedman C.
Database of computer-structured narrative: methods of computing complex relations.
In: SCAMC 85. IEEE; 1985:221-226.

Sager N, Friedman C, Chi E.
The analysis and processing of clinical narrative.
In: Medinfo 86. Amsterdam (Holland); 1986:1101-1105.

Zingmond D, Lenert LA.
Monitoring free-text data using medical language processing.
Comput Biomed Res. 1993;26:467-481.

Friedman C, Alderson PO, Austin JH, Cimino JJ, Johnson SB.
A general natural-language text processor for clinical radiology.
J Am Med Inform Assoc. 1994;1:161-174.

Friedman C, Hripcsak G, Shagina L, Liu H.
Representing information in patient reports using natural language processing and the extensible markup language.
J Am Med Inform Assoc. 1999;6:76-87.

Hripcsak G, Friedman C, Alderson PO, DuMouchel W, Johnson SB, Clayton PD.
Unlocking clinical data from narrative reports: a study of natural language processing.
Ann Intern Med. 1995;122:681-688.

Hripcsak G, Kuperman GJ, Friedman C.
Extracting findings from narrative reports: software transferability and sources of physician disagreement.
Methods Inf Med. 1998;37:1-7.

Knirsch CA, Jain NL, Pablos-Mendez A, Friedman C, Hripcsak G.
Respiratory isolation of tuberculosis patients using clinical guidelines and an automated clinical decision support system.
Infect Control Hosp Epidemiol. 1998;19:94-100.

Jain NL, Friedman C.
Identification of findings suspicious for breast cancer based on natural language processing of mammogram reports.
Proc AMIA Annu Fall Symp. 1997;:829-833.

Friedman C, Knirsch C, Shagina L, Hripcsak G.
Automating a severity score guideline for community-acquired pneumonia employing medical language processing of discharge summaries.
Proc AMIA Symp. 1999;:256-260.

Xu H, Friedman C.
Facilitating Research in Pathology using Natural Language Processing.
Proc AMIA Symp. 2003;:1057.

Friedman C, Johnson SB, Forman B, Starren J.
Architectural requirements for a multipurpose natural language processor in the clinical environment.
Proc Annu Symp Comput Appl Med Care. 1995;:347-351.

Friedman C, Liu H, Shagina L.
A vocabulary development and visualization tool based on natural language processing and the mining of textual patient reports.
J Biomed Inform. 2003;36:189-201.

Friedman C, Shagina L, Lussier Y, Hripcsak G.
Automated Encoding of Clinical Documents Based on Natural Language Processing.
J Am Med Inform Assoc. 2004;:.

Ranum DL.
Knowledge-based understanding of radiology texts.
Comput Methods Programs Biomed. 1989;30:209-215.

Haug PJ, Ranum DL, Frederick PR.
Computerized extraction of coded findings from free-text radiologic reports. Work in progress.
Radiology. 1990;174:543-548.

Haug P, Koehler S, Lau LM, Wang P, Rocha R, Huff S.
A natural language understanding system combining syntactic and semantic techniques.
Proc Annu Symp Comput Appl Med Care. 1994;:247-251.

Haug PJ, Koehler S, Lau LM, Wang P, Rocha R, Huff SM.
Experience with a mixed semantic/syntactic parser.
Proc Annu Symp Comput Appl Med Care. 1995;:284-288.

Koehler SB.
SymText: a natural language understanding system for encoding free text medical data.
1998;:.

Christensen L, Haug P, Fiszman M.
MPLUS: a probabilistic medical language understanding system.
Proceedings of the Workshop on Natural Language Processing in the Biomedical Domain. 2002;:29-36.

Nivre J.
On Statistical Methods in Natural Language Processing.
In: Bubenko J, Benkt JW, eds. Promote IT Second Conference for the Promotion of Research in IT at New Universities and University Colleges in Sweden. Skövde (Sweden): University of Skövde; 2002:684-694.

Fiszman M, Chapman WW, Aronsky D, Evans RS, Haug PJ.
Automatic detection of acute bacterial pneumonia from chest X-ray reports.
J Am Med Inform Assoc. 2000;7:593-604.

Fiszman M, Haug PJ.
Using medical language processing to support real-time evaluation of pneumonia guidelines.
Proc AMIA Symp. 2000;:235-239.

Haug PJ, Christensen L, Gundersen M, Clemons B, Koehler S, Bauer K.
A natural language parsing system for encoding admitting diagnoses.
Proc AMIA Annu Fall Symp. 1997;:814-818.

Fiszman M, Haug PJ, Frederick PR.
Automatic extraction of PIOPED interpretations from ventilation/perfusion lung scan reports.
Proc AMIA Symp. 1998;:860-864.

Meystre S, Haug PJ.
Medical problem and document model for natural language understanding.
Proc AMIA Symp. 2003;:455-459.

Dolin RH, Alschuler L, Beebe C, Biron PV, Boyer SL, Essin D, Kimber E, Lincoln T, Mattison JE.
The HL7 Clinical Document Architecture.
J Am Med Inform Assoc. 2001;8:552-569.

Extensible Markup Language (XML) 1.0 (Second Edition)
http://www.w3.org/TR/REC-xml

Paterson G, Shepherd M, Wang X, Watters C, Zitner D.
Using the XML-based Clinical Document Architecture for Exchange of Structured Discharge Summaries.
Proceedings 35th Hawaii Int Conf on System Sciences. 2002.

Heitmann K, Schweiger R, Dudeck J.
Discharge and referral data exchange using global standards -- the SCIPHOX project in Germany.
Int J Med Inf. 2003;70:195-203.

Muller ML, Butta R, Prokosch HU.
Electronic discharge letters using the Clinical Document Architecture (CDA).
Stud Health Technol Inform. 2003;95:824-828.

Bludau H, Wolff A, Hochlehnert AJ.
Presenting XML-based medical discharge letters according to CDA.
Methods Inf Med. 2003;42:552-556.

Kleene SC.
Representation of events in nerve nets and finite automata.
In: Automata Studies. Edited by: Shannon C, McCarthy J. Princeton, NJ: Princeton University Press; 1956:3-41.

Thompson K.
Regular expression search algorithm.
Comm ACM. 1968;11:419-422.

Friedl JEF.
Mastering regular expressions.
Cambridge: O'Reilly. 1997;:.

Reichert JC, Glasgow M, Narus SP, Clayton PD.
Using LOINC to link an EMR to the pertinent paragraph in a structured reference knowledge base.
Proc AMIA Symp. 2002;:652-656.

Huff SM, Rocha RA, Bray BE, Warner HR, Haug PJ.
An event model of medical information representation.
J Am Med Inform Assoc. 1995;2:116-134.

International Organization for Standardization.
International Standard ISO/IEC 8824: specification of Abstract Syntax Notation One (ASN.1). Second edition.
Geneva, Switzerland: International Organization for Standardization; 1990;:.

Manning CD, Schütze H.
Foundations of Statistical Natural Language Processing. Sixth printing.
Cambridge, Massachusetts, London, England: MIT Press; 2003;:.

MetaMap Transfer (MMTx)
http://mmtx.nlm.nih.gov/

XSL Transformations (XSLT) Version 1.0
http://www.w3.org/TR/xslt

Institute of Medicine Committee on Quality of Health Care in America, Kohn LT, Corrigan JM, Donaldson MS.
To Err is Human: Building A Safer Health System.
National Academy Press, Washington, DC; 1999.

Aikins J, Brooks R, Clancey W, et al.
Natural Language Processing Systems. In: Barr A, Feigenbaum EA, The Handbook of Artificial Intelligence. Volume 1. 1981;1:283-321.
Stanford/Los Altos, CA: HeurisTech Press/William Kaufmann, Inc.

Allen JF.
Natural Language Understanding.
Redwood City, CA: Benjamin/Cummings. 1994;:.

Bobrow D.
Natural Language Input for a Computer Problem Solving System.
In: Minsky M, ed. Semantic Information Processing.
Cambridge, MA: MIT Press. 1968;:133-215.

Charniak E.
Statistical Language Learning.
Cambridge, MA: MIT Press. 1993;:.

Cohen P, Morgan J, Pollack M.
Intentions in Communication.
Cambridge, MA: MIT Press. 1990;:.

Grosz BJ, Pollack ME, Sidner CL.
Discourse.
In: Posner M, ed. Foundations of Cognitive Science. Cambridge, MA: MIT Press. 1999;:437-468.

Grosz BJ, Jones KS, Webber BL, ed.
Readings in Natural Language Processing.
San Mateo, CA: Morgan Kaufmann. 1986;:.

Mahesh K, Nirenburg S.
Knowledge-Based Systems for Natural Language.
In: Tucker AB, ed. The Computer Science and Engineering Handbook. 1997;:637-653.
Boca Raton, FL: CRC Press, Inc.

McKeown K, Swartout W.
Language Generation and Explanation.
In: Annual Review of Computer Science. Volume 2.
Palo Alto, CA: Annual Reviews. 1987;2:.

Patterson DW.
Natural Language Processing.
In: Patterson DW, ed. Introduction to Artificial Intelligence and Expert Systems. 1990;:227-270.
Englewood Cliffs, NJ: Prentice Hall.

Schank RC.
The Structure of Episodes in Memory.
In: Luger GF, ed. Computation and Intelligence: Collected Readings.
Menlo Park/Cambridge, MA: AAAI Press/The MIT Press. 1975;:236-259.

Weizenbaum J.
ELIZA--A Computer Program for the Study of Natural Language Communication Between Man and Machine.
Comm ACM. 1966;9(1):36-45.

Winograd T.
Understanding Natural Language.
New York: Academic Press. 1972;:.

Abney S.
Parsing by chunks.
In: Berwick RC, Abney SP, Tenny C, eds. Principle-Based Parsing.
Dordrecht: Kluwer Academic. 1991;:257-278.

Abney S.
Part-of-speech tagging and partial parsing.
In: Young S, Bloothooft G, eds. Corpus-Based Methods in Language and Speech Processing.
Dordrecht: Kluwer Academic. 1996;:118-136.

Abney S.
Statistical methods and linguistics.
In: Klavans JL, Resnik P, eds. The Balancing Act: Combining Symbolic and Statistical Approaches to Language.
Cambridge, MA: MIT Press. 1996;:1-26.

Abney SP.
Stochastic attribute-value grammars.
Computational Linguistics. 1997;23:597-618.

Ackley DH, Hinton GE, Sejnowski TJ.
A learning algorithm for Boltzmann machines.
Cognitive Science 1985;9:147-169.

Aho AV, Sethi R, Ullman JD.
Compilers: Principles, Techniques, and Tools.
Reading, MA: Addison-Wesley. 1986;:.

Allen J.
Natural Language Understanding.
Redwood City, CA: Benjamin Cummings. 1995;:.

Alshawi H, Buchsbaum AL, Xia F.
A comparison of head transducers and transfer for a limited domain translation application.
In: ACL 35/EACL 8. 1997;:360-365.

Alshawi H, Carter D.
Training and scaling preference functions for disambiguation.
Computational Linguistics 1994;20:635-648.

Anderson JR.
The architecture of cognition.
Cambridge, MA: Harvard University Press. 1983;:.

Anderson JR.
The adaptive character of thought.
Hillsdale, NJ: Lawrence Erlbaum. 1990;:.

Aone C, McKee D.
Acquiring predicate-argument mapping information from multilingual texts.
In: Boguraev B, Pustejovsky J, eds. Corpus Processing for Lexical Acquisition. 1995;:175-190.
Cambridge, MA: MIT Press.

Appelt DE, Hobbs JR, Bear J, Israel D, Tyson M. 1993. FASTUS: A finite-state processor for information extraction from real-world text.
In: Proc. of the 13th IJCAI. 1993;13:1172-1178. Chambery, France.

Apresjan, Jurij D. 1974. Regular polysemy. Linguistics 142:5-32.

Apte, Chidanand, Fred Damerau, and Sholom M. Weiss. 1994. Automated learning of decision rules for text categorization. ACM Transactions on Information Systems 12:233-251.

Argamon, Shlomo, Ido Dagan, and Yuval Krymolowski. 1998. A memory-based approach to learning shallow natural language patterns. In ACL 36/COLING 17, pp. 67-73.

Atwell, Eric. 1987. Constituent-likelihood grammar. In Roger Garside, Geoffrey Leech, and Geoffrey Sampson (eds.), The Computational Analysis of English: A Corpus-Based Approach. London: Longman.

Baayen, Harald, and Richard Sproat. 1996. Estimating lexical priors for low-frequency morphologically ambiguous forms. Computational Linguistics 22:155-166.

Bahl, Lalit R., Frederick Jelinek, and Robert L. Mercer. 1983. A maximum likelihood approach to continuous speech recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence PAMI-5:179-190. Reprinted in (Waibel and Lee 1990), pp. 308-319.

Bahl, Lalit R., and Robert L. Mercer. 1976. Part-of-speech assignment by a statistical decision algorithm. In International Symposium on Information Theory, Ronneby, Sweden.

Baker, James K. 1975. Stochastic modeling for automatic speech understanding. In D. Raj Reddy (ed.), Speech Recognition: Invited papers presented at the 1974 IEEE Symposium, pp. 521-541. New York: Academic Press. Reprinted in (Waibel and Lee 1990), pp. 297-307.

Baker, James K. 1979. Trainable grammars for speech recognition. In D. H. Klatt and J. J. Wolf (eds.), Speech Communication Papers for the 97th Meeting of the Acoustical Society of America, pp. 547-550.

Baldi, Pierre, and Søren Brunak. 1998. Bioinformatics: The Machine Learning Approach. Cambridge, MA: MIT Press.

Barnbrook, Geoff. 1996. Language and computers: a practical introduction to the computer analysis of language. Edinburgh: Edinburgh University Press.

Basili, Roberto, Maria Teresa Pazienza, and Paola Velardi. 1996. Integrating general-purpose and corpus-based verb classification. Computational Linguistics 22:559-568.

Basili, Roberto, Gianluca De Rossi, and Maria Teresa Pazienza. 1997. Inducing terminology for lexical acquisition. In EMNLP 2, pp. 125-133.

Baum, L. E., T. Petrie, G. Soules, and N. Weiss. 1970. A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. Annals of Mathematical Statistics 41:164-171.

Beeferman, Doug, Adam Berger, and John Lafferty. 1997. Text segmentation using exponential models. In EMNLP 2, pp. 35-46.

Bell, Timothy C., John G. Cleary, and Ian H. Witten. 1990. Text Compression. Englewood Cliffs, NJ: Prentice Hall.

Benello, Julian, Andrew W. Mackie, and James A. Anderson. 1989. Syntactic category disambiguation with neural networks. Computer Speech and Language 3:203-217.

Benson, Morton. 1989. The structure of the collocational dictionary. International Journal of Lexicography 2:1-14.

Benson, Morton, Evelyn Benson, and Robert Ilson. 1993. The BBI combinatory dictionary of English. Amsterdam: John Benjamins.

Berber Sardinha, A. P. 1997. Automatic Identification of Segments in Written Texts. PhD thesis, University of Liverpool.

Berger, Adam L., Stephen A. Della Pietra, and Vincent J. Della Pietra. 1996. A maximum entropy approach to natural language processing. Computational Linguistics 22:39-71.

Berry, Michael W. 1992. Large-scale sparse singular value computations. The International Journal of Supercomputer Applications 6:13-49.

Berry, Michael W., Susan T. Dumais, and Gavin W. O'Brien. 1995. Using linear algebra for intelligent information retrieval. SIAM Review 37:573-595.

Berry, Michael W., and Paul G. Young. 1995. Using latent semantic indexing for multilanguage information retrieval. Computers and the Humanities 29: 413-429.

Bever, Thomas G. 1970. The cognitive basis for linguistic structures. In J. R. Hayes (ed.), Cognition and the development of language. New York: Wiley.

Biber, Douglas. 1993. Representativeness in corpus design. Literary and Linguistic Computing 8:243-257.

Biber, Douglas, Susan Conrad, and Randi Reppen. 1998. Corpus Linguistics: Investigating Language Structure and Use. Cambridge: Cambridge University Press.

Black, Ezra. 1988. An experiment in computational discrimination of English word senses. IBM Journal of Research and Development 32:185-194.

Black E, Abney S, Flickinger D, Gdaniec C, Grishman R, Harrison P, Hindle D, Ingria R, Jelinek F, Klavans J, Liberman M, Marcus M, Roukos S, Santorini B, Strzalkowski T. 1991. A procedure for quantitatively comparing the syntactic coverage of English grammars. In: Proceedings, Speech and Natural Language Workshop, pp. 306-311, Pacific Grove, CA. DARPA.

Black, Ezra, Fred Jelinek, John Lafferty, David M. Magerman, Robert Mercer, and Salim Roukos. 1993. Towards history-based grammars: Using richer models for probabilistic parsing. In ACL 31, pp. 31-37. Also appears in the Proceedings of the DARPA Speech and Natural Language Workshop, Feb. 1992, pp. 134-139.

Bod R.
Enriching Linguistics with Statistics: Performance Models of Natural Language.
Amsterdam: PhD thesis, University of Amsterdam. 1995;:.

Bod R.
Data-oriented language processing: An overview.
Technical Report LP-96-13, Institute for Logic, Language and Computation, University of Amsterdam. 1996;:.

Bod R.
Beyond Grammar: An experience-based theory of language.
Stanford, CA: CSLI Publications. 1998;:.

Bod R, Kaplan R.
A probabilistic corpus-driven model for lexical-functional analysis.
In: ACL 36/COLING 17. 1998;:145-151.

Bod R, Kaplan R, Scha R, Sima'an K.
A data-oriented approach to lexical-functional grammar.
In: Computational Linguistics in the Netherlands 1996;:.
Eindhoven, The Netherlands.

Boguraev B, Briscoe T.
Computational Lexicography for Natural Language Processing.
London: Longman. 1989;:.

Boguraev B, Pustejovsky J.
Issues in text-based lexicon acquisition.
In: Boguraev B, Pustejovsky J, eds. Corpus Processing for Lexical Acquisition. 1995;:3-17.
Cambridge MA: MIT Press.

Boguraev BK.
The contribution of computational lexicography.
In: Bates M, Weischedel RM, eds. Challenges in natural language processing. 1993;:99-132.
Cambridge: Cambridge University Press.

Bonnema R.
Data-oriented semantics.
Master's thesis. Department of Computational Linguistics, University of Amsterdam. 1996;:.

Bonnema R, Bod R, Scha R.
A DOP model for semantic interpretation.
In: ACL 35/EACL 8. 1997;:159-167.

Bonzi, Susan, and Elizabeth D. Liddy. 1988. The use of anaphoric resolution for document description in information retrieval. In SIGIR '88, pp. 53-66.

Bookstein A, Swanson DR.
A decision theoretic foundation for indexing.
Journal of the American Society for Information Science 1975;26:45-50.

Booth TL.
Probabilistic representation of formal languages.
In: Tenth Annual IEEE Symposium on Switching and Automata Theory. 1969;10:74-81.

Booth, Taylor L., and Richard A. Thomson. 1973. Applying probability measures to abstract languages. IEEE Transactions on Computers C-22:442-450.

Borthwick, Andrew, John Sterling, Eugene Agichtein, and Ralph Grishman. 1998. Exploiting diverse knowledge sources via maximum entropy in named entity recognition. In WVLC 6, pp. 152-160.

Bourigault, Didier. 1993. An endogeneous corpus-based method for structural noun phrase disambiguation. In EACL 6, pp. 81-86.

Box GEP, Tiao GC.
Bayesian Inference in Statistical Analysis.
Reading, MA: Addison-Wesley. 1973;:.

Brants T. 1998. Estimating Hidden Markov Model Topologies. In: Jonathan Ginzburg, Zurab Khasidashvili, Carl Vogel, Jean-Jacques Levy, and Enric Vallduvi (eds.), The Tbilisi Symposium on Logic, Language and Computation: Selected Papers, pp. 163-176. Stanford, CA: CSLI Publications.

Brants, Thorsten, and Wojciech Skut. 1998. Automation of treebank annotation. In Proceedings of NeMLaP-98, Sydney, Australia.

Breiman, Leo. 1994. Bagging predictors. Technical Report 421, Department of Statistics, University of California at Berkeley.

Breiman, L., J. H. Friedman, R. A. Olshen, and C. J. Stone. 1984. Classification and Regression Trees. Belmont, CA: Wadsworth International Group.

Brent, Michael R. 1993. From grammar to lexicon: Unsupervised learning of lexical syntax. Computational Linguistics 19:243-262.

Brew, Chris. 1995. Stochastic HPSG. In EACL 7, pp. 83-89.

Brill, Eric. 1993a. Automatic grammar induction and parsing free text: A transformation-based approach. In ACL 31, pp. 259-265.

Brill, Eric. 1993b. A Corpus-Based Approach to Language Learning. PhD thesis, University of Pennsylvania.

Brill, Eric. 1993c. Transformation-based error-driven parsing. In Proceedings Third International Workshop on Parsing Technologies, Tilburg/Durbuy, The Netherlands/Belgium.

Brill, Eric. 1995a. Transformation-based error-driven learning and natural language processing: A case study in part-of-speech tagging. Computational Linguistics 21:543-565.

Brill, Eric. 1995b. Unsupervised learning of disambiguation rules for part of speech tagging. In WVLC 3, pp. 1-13.

Brill, Eric, David Magerman, Mitch Marcus, and Beatrice Santorini. 1990. Deducing linguistic structure from the statistics of large corpora. In Proceedings of the DARPA Speech and Natural Language Workshop, pp. 275-282, San Mateo CA. Morgan Kaufmann.

Brill, Eric, and Philip Resnik. 1994. A transformation-based approach to prepositional phrase attachment disambiguation. In COLING 15, pp. 1198-1204.

Briscoe, Ted, and John Carroll. 1993. Generalized probabilistic LR parsing of natural language (corpora) with unification-based methods. Computational Linguistics 19:25-59.

Britton JL, ed.
Collected Works of A. M. Turing: Pure Mathematics.
Amsterdam: North-Holland. 1992;:.

Brown, Peter F., John Cocke, Stephen A. Della Pietra, Vincent J. Della Pietra, Fredrick Jelinek, John D. Lafferty, Robert L. Mercer, and Paul S. Roossin. 1990. A statistical approach to machine translation. Computational Linguistics 16: 79-85.

Brown, Peter F., Stephen A. Della Pietra, Vincent J. Della Pietra, John D. Lafferty, and Robert L. Mercer. 1992a. Analysis, statistical transfer, and synthesis in machine translation. In Proceedings of the 4th International Conference on Theoretical and Methodological Issues in Machine Translation, pp. 83-100.

Brown, Peter F., Stephen A. Della Pietra, Vincent J. Della Pietra, Jennifer C. Lai, and Robert L. Mercer. 1992b. An estimate of an upper bound for the entropy of English. Computational Linguistics 18:31-40.

Brown, Peter F., Stephen A. Della Pietra, Vincent J. Della Pietra, and Robert L. Mercer. 1991a. A statistical approach to sense disambiguation in machine translation. In Proceedings of the DARPA Workshop on Speech and Natural Language Workshop, pp. 146-151.

Brown, Peter F., Stephen A. Della Pietra, Vincent J. Della Pietra, and Robert L. Mercer. 1991b. Word-sense disambiguation using statistical methods. In ACL 29, pp. 264-270.

Brown, Peter F., Stephen A. Della Pietra, Vincent J. Della Pietra, and Robert L. Mercer. 1993. The mathematics of statistical machine translation: Parameter estimation. Computational Linguistics 19:263-311.

Brown, Peter F., Vincent J. Della Pietra, Peter V. deSouza, Jenifer C. Lai, and Robert L. Mercer. 1992c. Class-based n-gram models of natural language. Computational Linguistics 18:467-479.

Brown, Peter F., Jennifer C. Lai, and Robert L. Mercer. 1991c. Aligning sentences in parallel corpora. In ACL 29, pp. 169-176.

Bruce, Rebecca, and Janyce Wiebe. 1994. Word-sense disambiguation using decomposable models. In ACL 32, pp. 139-145.

Bruce, Rebecca F., and Janyce M. Wiebe. 1999. Decomposable modeling in natural language processing. Computational Linguistics. to appear.

Brundage, Jennifer, Maren Kresse, Ulrike Schwall, and Angelika Storrer.
Multiword lexemes: A monolingual and contrastive typology for natural language processing and machine translation.
Technical Report 232, Institut für Wissensbasierte Systeme. Heidelberg: IBM Deutschland GmbH, Heidelberg. 1992;:.

Buckley C, Singhal A, Mitra M, Salton G.
New retrieval approaches using SMART: TREC 4.
In: Harman DK, ed. The Second Text REtrieval Conference (TREC-2). 1996;:25-48.

Buitelaar, Paul. 1998. CoreLex: Systematic Polysemy and Underspecification. PhD thesis, Brandeis University.

Burgess, Curt, and Kevin Lund. 1997. Modelling parsing constraints with high-dimensional context space. Language and Cognitive Processes 12:177-210.

Burke, Robin, Kristian Hammond, Vladimir Kulyukin, Steven Lytinen, Noriko Tomuro, and Scott Schoenberg. 1997. Question answering from frequently asked question files. AI Magazine 18:57-66.

Caraballo, Sharon A., and Eugene Charniak. 1998. New figures of merit for best-first probabilistic chart parsing. Computational Linguistics 24:275-298.

Cardie, Claire. 1997. Empirical methods in information extraction. AI Magazine 18:65-79.

Carletta, Jean. 1996. Assessing agreement on classification tasks: The kappa statistic. Computational Linguistics 22:249-254.

Carrasco, Rafael C., and Jose Oncina (eds.). 1994. Grammatical inference and applications: second international colloquium, ICGI-94. Berlin: Springer-Verlag.

Carroll, G., and E. Charniak. 1992. Two experiments on learning probabilistic dependency grammars from corpora. In Carl Weir, Stephen Abney, Ralph Grishman, and Ralph Weischedel (eds.), Working Notes of the Workshop Statistically-Based NLP Techniques, pp. 1-13. Menlo Park, CA: AAAI Press.

Carroll, John. 1994. Relating complexity to practical performance in parsing with wide-coverage unification grammars. In ACL 32, pp. 287-294.

Chang, Jason S., and Mathis H. Chen. 1997. An alignment method for noisy parallel corpora based on image processing techniques. In ACL 35/EACL 8, pp. 297-304.

Chanod, Jean-Pierre, and Pasi Tapanainen. 1995. Tagging French - comparing a statistical and a constraint-based method. In EACL 7, pp. 149-156.

Charniak, Eugene. 1993. Statistical Language Learning. Cambridge, MA: MIT Press.

Charniak, Eugene. 1996. Tree-bank grammars. In Proceedings of the Thirteenth National Conference on Artificial Intelligence (AAAI '96), pp. 1031-1036.

Charniak E.
Statistical parsing with a context-free grammar and word statistics.
In: Proceedings of the Fourteenth National Conference on Artificial Intelligence (AAAI '97). 1997;:598-603.

Charniak, Eugene.
Statistical techniques for natural language parsing.
AI Magazine. 1997;:33-43.

Charniak E, Hendrickson C, Jacobson N, Perkowitz M.
Equations for part-of-speech tagging.
In: Proceedings of the Eleventh National Conference on Artificial Intelligence. 1993;:784-789. Menlo Park, CA.

Cheeseman, Peter, James Kelly, Matthew Self, John Stutz, Will Taylor, and Don Freeman. 1988. AutoClass: A Bayesian classification system. In Proceedings of the Fifth International Conference on Machine Learning, pp. 54-64, San Francisco, CA. Morgan Kaufmann.

Chelba, Ciprian, and Frederick Jelinek. 1998. Exploiting syntactic structure for language modeling. In ACL 36/COLING 17, pp. 225-231.

Chen, Jen Nan, and Jason S. Chang. 1998. Topical clustering of MRD senses based on information retrieval techniques. Computational Linguistics 24:61-95.

Chen, Stanley F. 1993. Aligning sentences in bilingual corpora using lexical information. In ACL 31, pp. 9-16.

Chen, Stanley F. 1995. Bayesian grammar induction for language modeling. In ACL 33, pp. 228-235.

Chen, Stanley F., and Joshua Goodman. 1996. An empirical study of smoothing techniques for language modeling. In ACL 34, pp. 310-318.

Chen, Stanley F., and Joshua Goodman. 1998. An empirical study of smoothing techniques for language modeling. Technical Report TR-10-98, Center for Research in Computing Technology, Harvard University.

Chi, Zhiyi, and Stuart Geman. 1998. Estimation of probabilistic context-free grammars. Computational Linguistics 24:299-305.

Chitrao MV, Grishman R. 1990. Statistical parsing of messages. In: Proceedings of the DARPA Speech and Natural Language Workshop, Hidden Valley, PA. 1990;:263-266. Morgan Kaufmann.

Choueka Y.
Looking for needles in a haystack or locating interesting collocational expressions in large textual databases.
In: Proceedings of the RIAO. 1988;:43-38.

Choueka Y, Lusignan S.
Disambiguation by short contexts.
Computers and the Humanities 1985;19:147-158.

Church K, Gale W, Hanks P, Hindle D.
Using statistics in lexical analysis.
In: Zernik U, ed. Lexical Acquisition: Exploiting On-Line Resources to Build a Lexicon. 1991;:115-164.
Hillsdale, NJ: Lawrence Erlbaum.

Church K, Patil R.
Coping with syntactic ambiguity or how to put the block in the box on the table.
Computational Linguistics. 1982;8:139-149.

Church KW.
A stochastic parts program and noun phrase parser for unrestricted text.
In: ANLP 2. 1988;2:136-143.

Church KW.
Char-align: A program for aligning parallel texts at the character level.
In: ACL 31. 1993;31:1-8.

Church KW.
One term or two?
In: SIGIR '95. 1995;:310-318.

Church, Kenneth W., and William A. Gale. 1991a. A comparison of the enhanced Good-Turing and deleted estimation methods for estimating probabilities of English bigrams. Computer Speech and Language 5:19-54.

Church, Kenneth W., and William A. Gale. 1991b. Concordances for parallel text. In Proceedings of the Seventh Annual Conference of the UW Centre for the New OED and Text Research, pp. 40-62, Oxford.

Church, Kenneth W., and William A. Gale. 1995. Poisson mixtures. Natural Language Engineering 1:163-190.

Church, Kenneth Ward, and Patrick Hanks. 1989. Word association norms, mutual information and lexicography. In ACL 27, pp. 76-83.

Church, Kenneth Ward, and Mark Y. Liberman. 1991. A status report on the ACL/DCI. In Proceedings of the 7th Annual Conference of the UW Centre for New OED and Text Research: Using Corpora, pp. 84-91.

Church, Kenneth W., and Robert L. Mercer. 1993. Introduction to the special issue on computational linguistics using large corpora. Computational Linguistics 19:1-24.

Clark E, Clark H.
When nouns surface as verbs.
Language 1979;55:767-811.

Cleverdon, Cyril W., and J. Mills. 1963. The testing of index language devices. Aslib Proceedings 15:106-130. Reprinted in (Sparck Jones and Willett 1998).

Coates-Stephens, Sam. 1993. The analysis and acquisition of proper names for the understanding of free text. Computers and the Humanities 26:441-456.

Collins, Michael John. 1996. A new statistical parser based on bigram lexical dependencies. In ACL 34, pp. 184-191.

Collins, Michael John. 1997. Three generative, lexicalised models for statistical parsing. In ACL 35/EACL 8, pp. 16-23.

Collins, Michael John, and James Brooks. 1995. Prepositional phrase attachment through a backed-off model. In WVLC 3, pp. 27-38.

Copestake A, Briscoe T.
Semi-productive polysemy and sense extension.
J Semantics. 1995;12:15-68.

Cormen TH, Leiserson CE, Rivest RL.
Introduction to Algorithms.
Cambridge, MA: MIT Press. 1990;:.

Cottrell GW.
A Connectionist Approach to Word Sense Disambiguation.
London: Pitman. 1989;:.

Cover, Thomas M., and Joy A. Thomas. 1991. Elements of Information Theory. New York: John Wiley & Sons.

Cowart, Wayne. 1997. Experimental syntax: Applying objective methods to sentence judgments. Thousand Oaks, CA: Sage Publications.

Croft, W. B., and D. J. Harper. 1979. Using probabilistic models of document retrieval without relevance information. Journal of Documentation 35:285-295.

Crowley, Terry, John Lynch, Jeff Siegel, and Julie Piau. 1995. The Design of Language: An introduction to descriptive linguistics. Auckland: Longman Paul.

Crystal, David. 1987. The Cambridge Encyclopedia of Language. Cambridge, England: Cambridge University Press.

Cutting, Doug, Julian Kupiec, Jan Pedersen, and Penelope Sibun. 1991. A practical part-of-speech tagger. In ANLP 3, pp. 133-140.

Cutting, Douglas R., David R. Karger, and Jan O. Pedersen. 1993. Constant interaction-time scatter/gather browsing of very large document collections. In SIGIR '93, pp. 126-134.

Cutting, Douglas R., Jan O. Pedersen, David Karger, and John W. Tukey. 1992. Scatter/gather: A cluster-based approach to browsing large document collections. In SIGIR '92, pp. 318-329.

Daelemans, Walter, and Antal van den Bosch. 1996. Language-independent data-oriented grapheme-to-phoneme conversion. In J. Van Santen, R. Sproat, J. Olive, and J. Hirschberg (eds.), Progress in Speech Synthesis, pp. 77-90. New York: Springer Verlag.

Daelemans, Walter, Jakub Zavrel, Peter Berck, and Steven Gillis. 1996. MBT: A memory-based part of speech tagger generator. In WVLC 4, pp. 14-27.

Dagan, Ido, Kenneth Church, and William Gale. 1993. Robust bilingual word alignment for machine aided translation. In WVLC 1, pp. 1-8.

Dagan, Ido, and Alon Itai. 1994. Word sense disambiguation using a second language monolingual corpus. Computational Linguistics 20:563-596.

Dagan, Ido, Alon Itai, and Ulrike Schwall. 1991. Two languages are more informative than one. In ACL 29, pp. 130-137.

Dagan I, Karov Y, Roth D.
Mistake-driven learning in text categorization.
In: EMNLP 2. 1997;2:55-63.

Dagan, Ido, Lillian Lee, and Fernando Pereira. 1997b. Similarity-based methods for word sense disambiguation. In ACL 35/EACL 8, pp. 56-63.

Dagan, Ido, Fernando Pereira, and Lillian Lee. 1994. Similarity-based estimation of word cooccurrence probabilities. In ACL 32, pp. 272-278.

Damerau, Fred J. 1993. Generating and evaluating domain-oriented multi-word terms from texts. Information Processing & Management 29:433-447.

Darroch, J. N., and D. Ratcliff. 1972. Generalized iterative scaling for log-linear models. The Annals of Mathematical Statistics 43:1470-1480.

de Saussure, Ferdinand. 1962. Cours de linguistique generale. Paris: Payot.

Deerwester, Scott, Susan T. Dumais, George W. Furnas, Thomas K. Landauer, and Richard Harshman. 1990. Indexing by latent semantic analysis. Journal of the American Society for Information Science 41:391-407.

DeGroot, Morris H. 1975. Probability and Statistics. Reading, MA: Addison-Wesley.

Della Pietra, Stephen, Vincent Della Pietra, and John Lafferty. 1997. Inducing features of random fields. IEEE Transactions on Pattern Analysis and Machine Intelligence 19.

Demers, A. J. 1977. Generalized left corner parsing. In Proceedings of the Fourth Annual ACM Symposium on Principles of Programming Languages, pp. 170-181.

Dempster, A. P., N. M. Laird, and D. B. Rubin. 1977. Maximum likelihood from incomplete data via the EM algorithm. J. Royal Statistical Society Series B 39:1-38.

Dermatas, Evangelos, and George Kokkinakis. 1995. Automatic stochastic tagging of natural language texts. Computational Linguistics 21:137-164.

DeRose, Steven J. 1988. Grammatical category disambiguation by statistical optimization. Computational Linguistics 14:31-39.

Derouault, Anne-Marie, and Bernard Merialdo. 1986. Natural language modeling for phoneme-to-text transcription. IEEE Transactions on Pattern Analysis and Machine Intelligence 8:742-749.

Dietterich, Thomas G. 1998. Approximate statistical tests for comparing supervised classification learning algorithms. Neural Computation 10:1895-1924.

Dini, Luca, Vittorio Di Tomaso, and Frederique Segond. 1998. Error-driven word sense disambiguation. In ACL 36/COLING 17, pp. 320-324.

Dolan, William B. 1994. Word sense ambiguation: Clustering related senses. In COLING 15, pp. 712-716.

Dolin, Ron. 1998. Pharos: A Scalable Distributed Architecture for Locating Heterogeneous Information Sources. PhD thesis, University of California at Santa Barbara.

Domingos, Pedro, and Michael Pazzani. 1997. On the optimality of the simple Bayesian classifier under zero-one loss. Machine Learning 29:103-130.

Doran, C., D. Egedi, B. A. Hockey, B. Srinivas, and M. Zaidel. 1994. XTAG system - a wide coverage grammar for English. In COLING 15, pp. 922-928.

Dorr, Bonnie J., and Mari Broman Olsen. 1997. Deriving verbal and compositional lexical aspect for nlp applications. In ACL 35/EACL 8, pp. 151-158.

Dras, Mark, and Mike Johnson. 1996. Death and lightness: Using a demographic model to find support verbs. In Proceedings of the 5th International Conference on the Cognitive Science of Natural Language Processing, Dublin.

Duda, Richard O., and Peter E. Hart. 1973. Pattern classification and scene analysis. New York: Wiley.

Dumais, Susan T. 1995. Latent semantic indexing (LSI): TREC-3 report. In The Third Text REtrieval Conference (TREC 3), pp. 219-230.

Dunning, Ted. 1993. Accurate methods for the statistics of surprise and coincidence. Computational Linguistics 19:61-74.

Dunning, Ted. 1994. Statistical identification of language. Technical report, Computing Research Laboratory, New Mexico State University.

Durbin, Richard, Sean Eddy, Anders Krogh, and Graeme Mitchison. 1998. Biological sequence analysis: probabilistic models of proteins and nucleic acids. Cambridge: Cambridge University Press.

Eeg-Olofsson, M. 1985. A probability model for computer-aided word class determination. Literary and Linguistic Computing 5:25-30.

Egan, Dennis E., Joel R. Remde, Louis M. Gomez, Thomas K. Landauer, Jennifer Eberhardt, and Carol C. Lochbaum. 1989. Formative design-evaluation of superbook. ACM Transactions on Information Systems 7:30-57.

Eisner, Jason. 1996. Three new probabilistic models for dependency parsing: An exploration. In COLING 16, pp. 340-345.

Ellis, C. A. 1969. Probabilistic Languages and Automata. PhD thesis, University of Illinois. Report No. 355, Department of Computer Science.

Elman, Jeffrey L. 1990. Finding structure in time. Cognitive Science 14:179-211.

Elworthy, David. 1994. Does Baum-Welch re-estimation help taggers? In ANLP 4, pp. 53-58.

Estoup, J. B. 1916. Gammes Stenographiques, 4th edition. Paris.

Evans, David A., Kimberly Ginther-Webster, Mary Hart, Robert G. Lefferts, and Ira A. Monarch. 1991. Automatic indexing using selective NLP and first-order thesauri. In Proceedings of the RIAO, volume 2, pp. 624-643.

Evans, David A., and Chengxiang Zhai. 1996. Noun-phrase analysis in unrestricted text for information retrieval. In ACL 34, pp. 17-24.

Fagan, Joel L. 1987. Automatic phrase indexing for document retrieval: An examination of syntactic and non-syntactic methods. In SIGIR '87, pp. 91-101.

Fagan, Joel L. 1989. The effectiveness of a nonsyntactic approach to automatic phrase indexing for document retrieval. Journal of the American Society for Information Science 40:115-132.

Fano, Robert M. 1961. Transmission of information; a statistical theory of communications. New York: MIT Press.

Fillmore, Charles J., and B. T. S. Atkins. 1994. Starting where the dictionaries stop: The challenge of corpus lexicography. In B.T.S. Atkins and A. Zampolli (eds.), Computational Approaches to the Lexicon, pp. 349-393. Oxford: Oxford University Press.

Finch, Steven, and Nick Chater. 1994. Distributional bootstrapping: From word class to proto-sentence. In Proceedings of the Sixteenth Annual Conference of the Cognitive Science Society, pp. 301-306, Hillsdale, NJ. Lawrence Erlbaum.

Finch, Steven Paul. 1993. Finding Structure in Language. PhD thesis, University of Edinburgh.

Firth, J. R. 1957. A synopsis of linguistic theory 1930-1955. In Studies in Linguistic Analysis, pp. 1-32. Oxford: Philological Society. Reprinted in F. R. Palmer (ed), Selected Papers of J. R. Firth 1952-1959, London: Longman, 1968.

Fisher, R. A. 1922. On the mathematical foundations of theoretical statistics. Philosophical Transactions of the Royal Society 222:309-368.

Fontenelle, Thierry, Walter Bruls, Luc Thomas, Tom Vanallemeersch, and Jacques Jansen. 1994. DECIDE, MLAP-Project 93-19, deliverable D-1a: survey of collocation extraction tools. Technical report, University of Liege, Liege, Belgium.

Ford, Marilyn, Joan Bresnan, and Ronald M. Kaplan. 1982. A competence-based theory of syntactic closure. In Joan Bresnan (ed.), The Mental Representation of Grammatical Relations, pp. 727-796. Cambridge, MA: MIT Press.

Foster, G. F. 1991. Statistical lexical disambiguation. Master's thesis, School of Computer Science, McGill University.

Frakes, William B., and Ricardo Baeza-Yates (eds.). 1992. Information Retrieval. Englewood Cliffs, NJ: Prentice Hall.

Francis, W. Nelson, and Henry Kucera. 1964. Manual of information to accompany a standard corpus of present-day edited American English, for use with digital computers. Providence, RI: Dept of Linguistics, Brown University.

Francis, W. Nelson, and Henry Kucera. 1982. Frequency Analysis of English Usage: Lexicon and Grammar. Boston, MA: Houghton Mifflin.

Franz, Alexander. 1996. Automatic Ambiguity Resolution in Natural Language Processing, volume 1171 of Lecture Notes in Artificial Intelligence. Berlin: Springer Verlag.

Franz, Alexander. 1997. Independence assumptions considered harmful. In ACL 35/EACL 8, pp. 182-189.

Franz, Alexander Mark. 1995. A Statistical Approach to Syntactic Ambiguity Resolution. PhD thesis, CMU.

Frazier, Lyn. 1978. On Comprehending Sentences: Syntactic Parsing Strategies. PhD thesis, University of Connecticut.

Freedman, David, Robert Pisani, and Roger Purves. 1998. Statistics. New York: W. W. Norton. 3rd ed.

Friedl, Jeffrey E. F. 1997. Mastering Regular Expressions. Sebastopol, CA: O'Reilly & Associates.

Fu, King-Sun. 1974. Syntactic Methods in Pattern Recognition. London: Academic Press.

Fung, Pascale, and Kenneth W. Church. 1994. K-vec: A new approach for aligning parallel texts. In COLING 15, pp. 1096-1102.

Fung, Pascale, and Kathleen McKeown. 1994. Aligning noisy parallel corpora across language groups: Word pair feature matching by dynamic time warping. In Proceedings of the Association for Machine Translation in the Americas (AMTA-94), pp. 81-88.

Gale, William A., and Kenneth W. Church. 1990a. Estimation procedures for language context: Poor estimates of context are worse than none. In Proceedings in Computational Statistics (COMPSTAT 9), pp. 69-74.

Gale, William A., and Kenneth W. Church. 1990b. Poor estimates of context are worse than none. In Proceedings of the June 1990 DARPA Speech and Natural Language Workshop, pp. 283-287, Hidden Valley, PA.

Gale, William A., and Kenneth W. Church. 1991. A program for aligning sentences in bilingual corpora. In ACL 29, pp. 177-184.

Gale, William A., and Kenneth W. Church. 1993. A program for aligning sentences in bilingual corpora. Computational Linguistics 19:75-102.

Gale, William A., and Kenneth W. Church. 1994. What's wrong with adding one? In Nelleke Oostdijk and Pieter de Haan (eds.), Corpus-Based Research into Language: in honour of Jan Aarts. Amsterdam: Rodopi.

Gale, William A., Kenneth W. Church, and David Yarowsky. 1992a. Estimating upper and lower bounds on the performance of word-sense disambiguation programs. In ACL 30, pp. 249-256.

Gale, William A., Kenneth W. Church, and David Yarowsky. 1992b. A method for disambiguating word senses in a large corpus. Computers and the Humanities 26:415-439.

Gale, William A., Kenneth W. Church, and David Yarowsky. 1992c. A method for disambiguating word senses in a large corpus. Technical report, AT&T Bell Laboratories, Murray Hill, NJ.

Gale, William A., Kenneth W. Church, and David Yarowsky. 1992d. Using bilingual materials to develop word sense disambiguation methods. In Proceedings of the 4th International Conference on Theoretical and Methodological Issues in Machine Translation (TMI-92), pp. 101-112.

Gale, William A., Kenneth W. Church, and David Yarowsky. 1992e. Work on statistical methods for word sense disambiguation. In Robert Goldman, Peter Norvig, Eugene Charniak, and Bill Gale (eds.), Working Notes of the AAAI Fall Symposium on Probabilistic Approaches to Natural Language, pp. 54-60, Menlo Park, CA. AAAI Press.

Gale, William A., and Geoffrey Sampson. 1995. Good-Turing frequency estimation without tears. Journal of Quantitative Linguistics 2:217-237.

Gallager, Robert G. 1968. Information theory and reliable communication. New York: Wiley.

Garside, Roger. 1995. Grammatical tagging of the spoken part of the British National Corpus: a progress report. In Geoffrey N. Leech, Greg Myers, and Jenny Thomas (eds.), Spoken English on computer: transcription, mark-up, and application. Harlow, Essex: Longman.

Garside, Roger, and Fanny Leech. 1987. The UCREL probabilistic parsing system. In Roger Garside, Geoffrey Leech, and Geoffrey Sampson (eds.), The Computational Analysis of English: A Corpus-Based Approach, pp. 66-81. London: Longman.

Garside, Roger, Geoffrey Sampson, and Geoffrey Leech (eds.). 1987. The Computational analysis of English: a corpus-based approach. London: Longman.

Gaussier, Eric. 1998. Flow network models for word alignment and terminology extraction from bilingual corpora. In ACL 36/COLING 17, pp. 444-450.

Ge, Niyu, John Hale, and Eugene Charniak. 1998. A statistical approach to anaphora resolution. In WVLC 6, pp. 161-170.

Ghahramani, Zoubin. 1994. Solving inverse problems using an EM approach to density estimation. In Michael C. Mozer, Paul Smolensky, David S. Touretzky, and Andreas S. Weigend (eds.), Proceedings of the 1993 Connectionist Models Summer School, Hillsdale, NJ. Erlbaum Associates.

Gibson, Edward, and Neal J. Pearlmutter. 1994. A corpus-based analysis of psycholinguistic constraints on prepositional-phrase attachment. In Charles Clifton, Jr., Lyn Frazier, and Keith Rayner (eds.), Perspectives on Sentence Processing, pp. 181-198. Hillsdale, NJ: Lawrence Erlbaum.

Gold, E. Mark. 1967. Language identification in the limit. Information and Control 10:447-474.

Goldszmidt, Moises, and Mehran Sahami. 1998. A probabilistic approach to full-text document clustering. Technical Report SIDL-WP-1998-0091, Stanford Digital Library Project, Stanford, CA.

Golub, Gene H., and Charles F. van Loan. 1989. Matrix Computations. Baltimore: The Johns Hopkins University Press.

Good, I. J. 1953. The population frequencies of species and the estimation of population parameters. Biometrika 40:237-264.

Good, I. J. 1979. Studies in the history of probability and statistics. XXXVII. A. M. Turing's statistical work in World War II. Biometrika 66:393-396.

Goodman, Joshua. 1996. Parsing algorithms and metrics. In ACL 34, pp. 177-183.

Greenbaum, Sidney. 1993. The tagset for the International Corpus of English. In Eric Atwell and Clive Souter (eds.), Corpus-based Computational Linguistics, pp. 11-24. Amsterdam: Rodopi.

Greene, Barbara B., and Gerald M. Rubin. 1971. Automatic grammatical tagging of English. Technical report, Brown University, Providence, RI.

Grefenstette, Gregory. 1992a. Finding semantic similarity in raw text: the deese antonyms. In Robert Goldman, Peter Norvig, Eugene Charniak, and Bill Gale (eds.), Working Notes of the AAAI Fall Symposium on Probabilistic Approaches to Natural Language, pp. 61-65, Menlo Park, CA. AAAI Press.

Grefenstette, Gregory. 1992b. Use of syntactic context to produce term association lists for text retrieval. In SIGIR '92, pp. 89-97.

Grefenstette, Gregory. 1994. Explorations in Automatic Thesaurus Discovery. Boston: Kluwer Academic Press.

Grefenstette, Gregory. 1996. Evaluation techniques for automatic semantic extraction: Comparing syntactic and window-based approaches. In Branimir Boguraev and James Pustejovsky (eds.), Corpus Processing for Lexical Acquisition, chapter 11, pp. 205-216. Cambridge, MA: MIT Press.

Grefenstette, Gregory (ed.). 1998. Cross-language information retrieval. Boston, MA: Kluwer Academic Publishers.

Grefenstette, Gregory, and Pasi Tapanainen. 1994. What is a word, what is a sentence? Problems of tokenization. In Proceedings of the Third International Conference on Computational Lexicography (COMPLEX '94), pp. 79-87, Budapest. Available as Rank Xerox Research Centre technical report MLTT004.

Grenander, Ulf. 1967. Syntax-controlled probabilities. Technical report, Division of Applied Mathematics, Brown University.

Gunter, R., L. B. Levitin, B. Shapiro, and P. Wagner. 1996. Zipf's law and the effect of ranking on probability distributions. International Journal of Theoretical Physics 35:395-417.

Guthrie, Joe A., Louise Guthrie, Yorick Wilks, and Homa Aidinejad. 1991. Subject-dependent co-occurrence and word sense disambiguation. In ACL 29, pp. 146-152.

Guthrie, Louise, James Pustejovsky, Yorick Wilks, and Brian M. Slator. 1996. The role of lexicons in natural language processing. Communications of the ACM 39:63-72.

Halliday, M. A. K. 1966. Lexis as a linguistic level. In C. E. Bazell, J. C. Catford, M. A. K. Halliday, and R. H. Robins (eds.), In memory of J. R. Firth, pp. 148-162. London: Longmans.

Halliday, M. A. K. 1994. An introduction to functional grammar, 2nd edition. London: Edward Arnold.

Harman, D. K. (ed.). 1996. The Fourth Text REtrieval Conference (TREC-4). Washington DC: U.S. Department of Commerce.

Harman, D. K. (ed.). 1994. The Second Text REtrieval Conference (TREC-2). Washington DC: U.S. Department of Commerce. NIST Special Publication 500-215.

Harnad, Stevan (ed.). 1987. Categorical perception: the groundwork of cognition. Cambridge: Cambridge University Press.

Harris, B. 1988. Bi-text, a new concept in translation theory. Language Monthly 54.

Harris, T. E. 1963. The Theory of Branching Processes. Berlin: Springer.

Harris, Zellig. 1951. Methods in Structural Linguistics. Chicago: University of Chicago Press.

Harrison, Philip, Steven Abney, Ezra Black, Dan Flickinger, Ralph Grishman, Claudia Gdaniec, Donald Hindle, Robert Ingria, Mitch Marcus, Beatrice Santorini, and Tomek Strzalkowski. 1991. Natural Language Processing Systems Evaluation Workshop, Technical Report RL-TR-91-362. In Jeannette G. Neal and Sharon M. Walter (eds.), Evaluating Syntax Performance of Parser/Grammars of English, Rome Laboratory, Air Force Systems Command, Griffiss Air Force Base, NY 13441-5700.

Harter, Steve. 1975. A probabilistic approach to automatic keyword indexing: Part II. an algorithm for probabilistic indexing. Journal of the American Society for Information Science 26:280-289.

Haruno, Masahiko, and Takefumi Yamazaki. 1996. High-performance bilingual text alignment using statistical and dictionary information. In ACL 34, pp. 131-138.

Hatzivassiloglou, Vasileios, and Kathleen R. McKeown. 1993. Towards the automatic identification of adjectival scales: clustering adjectives according to meaning. In ACL 31, pp. 172-182.

Hawthorne, Mark. 1994. The computer in literary analysis: Using TACT with students. Computers and the Humanities 28:19-27.

Hearst, Marti, and Christian Plaunt. 1993. Subtopic structuring for full-length document access. In SIGIR '93, pp. 59-68.

Hearst, Marti A. 1991. Noun homograph disambiguation using local context in large text corpora. In Seventh Annual Conference of the UW Centre for the New OED and Text Research, pp. 1-22, Oxford.

Hearst, Marti A. 1992. Automatic acquisition of hyponyms from large text corpora. In COLING 14, pp. 539-545.

Hearst, Marti A. 1994. Context and Structure in Automated Full-Text Information Access. PhD thesis, University of California at Berkeley.

Hearst, Marti A. 1997. TextTiling: Segmenting text into multi-paragraph subtopic passages. Computational Linguistics 23:33-64.

Hearst, Marti A., and Hinrich Schütze. 1995. Customizing a lexicon to better suit a computational task. In Branimir Boguraev and James Pustejovsky (eds.), Corpus Processing for Lexical Acquisition, pp. 77-96. Cambridge, MA: MIT Press.

Henderson, James, and Peter Lane. 1998. A connectionist architecture for learning to parse. In ACL 36/COLING 17, pp. 531-537.

Hermjakob, Ulf, and Raymond J. Mooney. 1997. Learning parse and translation decisions from examples with rich context. In ACL 35/EACL 8, pp. 482-489.

Hertz, John A., Richard G. Palmer, and Anders S. Krogh. 1991. Introduction to the theory of neural computation. Redwood City, CA: Addison-Wesley.

Herwijnen, Eric van. 1994. Practical SGML, 2nd edition. Dordrecht: Kluwer Academic.

Hickey, Raymond. 1993. Lexa: Corpus processing software. Technical report, The Norwegian Computing Centre for the Humanities, Bergen.

Hindle, Donald. 1990. Noun classification from predicate argument structures. In ACL 28, pp. 268-275.

Hindle, Donald. 1994. A parser for text corpora. In B.T.S. Atkins and A. Zampolli (eds.), Computational Approaches to the Lexicon, pp. 103-151. Oxford: Oxford University Press.

Hindle, Donald, and Mats Rooth. 1993. Structural ambiguity and lexical relations. Computational Linguistics 19:103-120.

Hirst, Graeme. 1987. Semantic Interpretation and the Resolution of Ambiguity. Cambridge: Cambridge University Press.

Hodges, Julia, Shiyun Yie, Ray Reighart, and Lois Boggess. 1996. An automated system that assists in the generation of document indexes. Natural Language Engineering 2:137-160.

Holmes, V. M., L. Stowe, and L. Cupples. 1989. Lexical expectations in parsing complement-verb sentences. Journal of Memory and Language 28:668-689.

Honavar, Vasant, and Giora Slutzki (eds.). 1998. Grammatical inference: 4th international colloquium, ICGI-98. Berlin: Springer.

Hopcroft, John E., and Jeffrey D. Ullman. 1979. Introduction to automata theory, languages, and computation. Reading, MA: Addison-Wesley.

Hopper, Paul J., and Elizabeth Cross Traugott. 1993. Grammaticalization. Cambridge: Cambridge University Press.

Hornby, A. S. 1974. Oxford Advanced Learner's Dictionary of Current English. Oxford: Oxford University Press. Third Edition.

Horning, James Jay. 1969. A study of grammatical inference. PhD thesis, Stanford.

Huang, T., and King Sun Fu. 1971. On stochastic context-free languages. Information Sciences 3:201-224.

Huddleston, Rodney. 1984. Introduction to the Grammar of English. Cambridge: Cambridge University Press.

Hull, David. 1996. Stemming algorithms - A case study for detailed evaluation. Journal of the American Society for Information Science 47:70-84.

Hull, David. 1998. A practical approach to terminology alignment. In Didier Bourigault, Christian Jacquemin, and Marie-Claude L'Homme (eds.), Proceedings of Computerm '98, pp. 1-7, Montreal, Canada.

Hull, David, and Doug Oard (eds.). 1997. AAAI Symposium on Cross-Language Text and Speech Retrieval. Stanford, CA: AAAI Press.

Hull, David A., and Gregory Grefenstette. 1998. Querying across languages: A dictionary-based approach to multilingual information retrieval. In Karen Sparck Jones and Peter Willett (eds.), Readings in Information Retrieval. San Francisco: Morgan Kaufmann.

Hull, David A., Jan O. Pedersen, and Hinrich Schütze. 1996. Method combination for document filtering. In SIGIR '96, pp. 279-287.

Hutchins, S. E. 1970. Stochastic Sources for Context-free Languages. PhD thesis, University of California, San Diego.

Ide, Nancy, and Jean Veronis (eds.). 1995. The Text Encoding Initiative: Background and Context. Dordrecht: Kluwer Academic. Reprinted from Computers and the Humanities 29(1-3), 1995.

Ide, Nancy, and Jean Veronis. 1998. Introduction to the special issue on word sense disambiguation: The state of the art. Computational Linguistics 24:1-40.

Ide, Nancy, and Donald Walker. 1992. Introduction: Common methodologies in humanities computing and computational linguistics. Computers and the Humanities 26:327-330.

Inui, K., V. Sornlertlamvanich, H. Tanaka, and T. Tokunaga. 1997. A new formalization of probabilistic GLR parsing. In Proceedings of the Fifth International Workshop on Parsing Technologies (IWPT-97), pp. 123-134, MIT.

Isabelle, Pierre. 1987. Machine translation at the TAUM group. In Margaret King (ed.), Machine Translation Today: The State of the Art, pp. 247-277. Edinburgh: Edinburgh University Press.

Jacquemin, Christian. 1994. FASTR: A unification-based front-end to automatic indexing. In Proceedings of RIAO, pp. 34-47, Rockefeller University, New York.

Jacquemin, Christian, Judith L. Klavans, and Evelyne Tzoukermann. 1997. Expansion of multi-word terms for indexing and retrieval using morphology and syntax. In ACL 35/EACL 8, pp. 24-31.

Jain, Anil K., and Richard C. Dubes. 1988. Algorithms for Clustering Data. Englewood Cliffs, NJ: Prentice Hall.

Jeffreys, Harold. 1948. Theory of Probability. Oxford: Clarendon Press.

Jelinek, Frederick. 1969. Fast sequential decoding algorithm using a stack. IBM Journal of Research and Development pp. 675-685.

Jelinek, Frederick. 1976. Continuous speech recognition by statistical methods. Proceedings of the IEEE 64:532-556.

Jelinek, Frederick. 1985. Markov source modeling of text generation. In J. K. Skwirzynski (ed.), The Impact of Processing Techniques on Communications, volume E91 of NATO ASI series, pp. 569-598. Dordrecht: M. Nijhoff.

Jelinek, Fred. 1990. Self-organized language modeling for speech recognition. Printed in (Waibel and Lee 1990), pp. 450-506.

Jelinek, Frederick. 1997. Statistical Methods for Speech Recognition. Cambridge, MA: MIT Press.

Jelinek, Frederick, Lalit R. Bahl, and Robert L. Mercer. 1975. Design of a linguistic statistical decoder for the recognition of continuous speech. IEEE Transactions on Information Theory 21:250-256.

Jelinek, F., J. Lafferty, D. Magerman, R. Mercer, A. Ratnaparkhi, and S. Roukos. 1994. Decision tree parsing using a hidden derivation model. In Proceedings of the 1994 Human Language Technology Workshop, pp. 272-277. DARPA.

Jelinek, Fred, and John D. Lafferty. 1991. Computation of the probability of initial substring generation by stochastic context-free grammars. Computational Linguistics 17:315-324.

Jelinek, F., J. D. Lafferty, and R. L. Mercer. 1990. Basic methods of probabilistic context free grammars. Technical Report RC 16374 (#72684), IBM T. J. Watson Research Center.

Jelinek, F., J. D. Lafferty, and R. L. Mercer. 1992a. Basic methods of probabilistic context free grammars. In P. Laface and R. De Mori (eds.), Speech Recognition and Understanding: Recent Advances, Trends, and Applications, volume 75 of Series F: Computer and Systems Sciences. Springer Verlag.

Jelinek, Fred, and Robert Mercer. 1985. Probability distribution estimation from sparse data. IBM Technical Disclosure Bulletin 28:2591-2594.

Jelinek, Frederick, Robert L. Mercer, and Salim Roukos. 1992b. Principles of lexical language modeling for speech recognition. In Sadaoki Furui and M. Mohan Sondhi (eds.), Advances in Speech Signal Processing, pp. 651-699. New York: Marcel Dekker.

Jensen, Karen, George E. Heidorn, and Stephen D. Richardson (eds.). 1993. Natural language processing: The PLNLP approach. Boston: Kluwer Academic Publishers.

Johansson, Stig, G. N. Leech, and H. Goodluck. 1978. Manual of information to accompany the Lancaster-Oslo/Bergen Corpus of British English, for use with digital computers. Oslo: Dept of English, University of Oslo.

Johnson, Mark. 1998. The effect of alternative tree representations on tree bank grammars. In Proceedings of Joint Conference on New Methods in Language Processing and Computational Natural Language Learning (NeMLaP3/CoNLL98), pp. 39-48, Macquarie University.

Johnson, W. E. 1932. Probability: deductive and inductive problems. Mind 41:421-423.

Joos, Martin. 1936. Review of The Psycho-Biology of Language. Language 12:196-210.

Jorgensen, Julia. 1990. The psychological reality of word senses. Journal of Psycholinguistic Research 19:167-190.

Joshi, Aravind K. 1993. Tree-adjoining grammars. In R. E. Asher (ed.), The Encyclopedia of Language and Linguistics. Oxford: Pergamon Press.

Justeson, John S., and Slava M. Katz. 1991. Co-occurrences of antonymous adjectives and their contexts. Computational Linguistics 17:1-19.

Justeson, John S., and Slava M. Katz. 1995. Principled disambiguation: Discriminating adjective senses with modified nouns. Computational Linguistics 21:1-28.

Justeson, John S., and Slava M. Katz. 1995. Technical terminology: some linguistic properties and an algorithm for identification in text. Natural Language Engineering 1:9-27.

Kahneman, Daniel, Paul Slovic, and Amos Tversky (eds.). 1982. Judgment under uncertainty: heuristics and biases. Cambridge: Cambridge University Press.

Kan, Min-Yen, Judith L. Klavans, and Kathleen R. McKeown. 1998. Linear segmentation and segment significance. In WVLC 6, pp. 197-205.

Kaplan, Ronald M., and Joan Bresnan. 1982. Lexical-Functional Grammar: A formal system for grammatical representation. In Joan Bresnan (ed.), The Mental Representation of Grammatical Relations, pp. 173-281. Cambridge, MA: MIT Press.

Karlsson, Fred, Atro Voutilainen, Juha Heikkila, and Arto Anttila. 1995. Constraint Grammar: A Language-Independent System for Parsing Unrestricted Text. Berlin: Mouton de Gruyter.

Karov, Yael, and Shimon Edelman. 1998. Similarity-based word sense disambiguation. Computational Linguistics 24:41-59.

Karttunen, Lauri. 1986. Radical lexicalism. Technical Report 86-68, Center for the Study of Language and Information, Stanford CA.

Katz, Slava M. 1987. Estimation of probabilities from sparse data for the language model component of a speech recognizer. IEEE Transactions on Acoustics, Speech, and Signal Processing ASSP-35:400-401.

Katz, Slava M. 1996. Distribution of content words and phrases in text and language modelling. Natural Language Engineering 2:15-59.

Kaufman, Leonard, and Peter J. Rousseeuw. 1990. Finding groups in data. New York: Wiley.

Kaufmann, Stefan. 1998. Second-order cohesion: Using wordspace in text segmentation. Department of Linguistics, Stanford University.

Kay, Martin, and Martin Roscheisen. 1993. Text-translation alignment. Computational Linguistics 19:121-142.

Kehler, Andrew. 1997. Probabilistic coreference in information extraction. In EMNLP 2, pp. 163-173.

Kelly, Edward, and Phillip Stone. 1975. Computer Recognition of English Word Senses. Amsterdam: North-Holland.

Kempe, Andre. 1997. Finite state transducers approximating hidden markov models. In ACL 35/EACL 8, pp. 460-467.

Kennedy, Graeme. 1998. An Introduction to Corpus Linguistics. London: Longman.

Kent, Roland G. 1930. Review of Relative Frequency as a Determinant of Phonetic Change. Language 6:86-88.

Kilgarriff, Adam. 1993. Dictionary word sense distinctions: An enquiry into their nature. Computers and the Humanities 26:365-387.

Kilgarriff, Adam. 1997. "I don't believe in word senses". Computers and the Humanities 31:91-113.

Kilgarriff, Adam, and Tony Rose. 1998. Metrics for corpus similarity and homogeneity. Manuscript, ITRI, University of Brighton.

Kirkpatrick, S., C. D. Gelatt, and M. P. Vecchi. 1983. Optimization by simulated annealing. Science 220:671-680.

Klavans, Judith, and Min-Yen Kan. 1998. Role of verbs in document analysis. In ACL 36/COLING 17, pp. 680-686.

Klavans, Judith L., and Evelyne Tzoukermann. 1995. Dictionaries and corpora: Combining corpus and machine-readable dictionary data for building bilingual lexicons. Journal of Machine Translation 10.

Klein, Sheldon, and Robert F. Simmons. 1963. A computational approach to grammatical coding of English words. Journal of the Association for Computing Machinery 10:334-347.

Kneser, Reinhard, and Hermann Ney. 1995. Improved backing-off for m-gram language modeling. In Proceedings of the IEEE Conference on Acoustics, Speech and Signal Processing, volume 1, pp. 181-184.

Knight, Kevin. 1997. Automating knowledge acquisition for machine translation. AI Magazine 18:81-96.

Knight, Kevin, Ishwar Chander, Matthew Haines, Vasileios Hatzivassiloglou, Eduard Hovy, Masayo Iida, Steve Luk, Richard Whitney, and Kenji Yamada. 1995. Filling knowledge gaps in a broad-coverage MT system. In Proceedings of IJCAI 95.

Knight, Kevin, and Jonathan Graehl. 1997. Machine transliteration. In ACL 35/EACL 8, pp. 128-135.

Knight, Kevin, and Vasileios Hatzivassiloglou. 1995. Two-level, many-paths generation. In ACL 33, pp. 252-260.

Knill, Kate M., and Steve Young. 1997. Hidden markov models in speech and language processing. In Steve Young and Gerrit Bloothooft (eds.), Corpus-Based Methods in Language and Speech Processing, pp. 27-68. Dordrecht: Kluwer Academic.

Kohonen, Teuvo. 1997. Self-Organizing Maps. Berlin, Heidelberg, New York: Springer Verlag. Second Extended Edition.

Korfhage, Robert R. 1997. Information Storage and Retrieval. Berlin: John Wiley.

Krenn, Brigitte, and Christer Samuelsson. 1997. The linguist's guide to statistics. Manuscript, University of Saarbrücken.

Krovetz, Robert. 1991. Lexical acquisition and information retrieval. In Uri Zernik (ed.), Lexical Acquisition: Exploiting On-Line Resources to Build a Lexicon, pp. 45-64. Hillsdale, NJ: Lawrence Erlbaum.

Kruskal, Joseph B. 1964. Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika 29:1-27.

Kruskal, Joseph B. 1964. Nonmetric multidimensional scaling: A numerical method. Psychometrika 29:115-129.

Kucera, Henry, and W. Nelson Francis. 1967. Computational Analysis of Present-Day American English. Providence, RI: Brown University Press.

Kupiec, Julian. 1991. A trellis-based algorithm for estimating the parameters of a hidden stochastic context-free grammar. In Proceedings of the Speech and Natural Language Workshop, pp. 241-246. DARPA.

Kupiec, Julian. 1992a. An algorithm for estimating the parameters of unrestricted hidden stochastic context-free grammars. In COLING 14, pp. 387-393.

Kupiec, Julian. 1992b. Robust part-of-speech tagging using a Hidden Markov Model. Computer Speech and Language 6:225-242.

Kupiec, Julian. 1993a. An algorithm for finding noun phrase correspondences in bilingual corpora. In ACL 31, pp. 17-22.

Kupiec, Julian. 1993b. MURAX: A robust linguistic approach for question answering using an on-line encyclopedia. In SIGIR '93, pp. 181-190.

Kupiec, Julian, Jan Pedersen, and Francine Chen. 1995. A trainable document summarizer. In SIGIR '95, pp. 68-73.

Kwok, K. L., and M. Chan. 1998. Improving two-stage ad-hoc retrieval for short queries. In SIGIR '98, pp. 250-256.

Lafferty, John, Daniel Sleator, and Davy Temperley. 1992. Grammatical trigrams: A probabilistic model of link grammar. In Proceedings of the 1992 AAAI Fall Symposium on Probabilistic Approaches to Natural Language.

Lakoff G. 1987. Women, Fire, and Dangerous Things. Chicago, IL: University of Chicago Press.

Landauer, Thomas K., and Susan T. Dumais. 1997. A solution to Plato's problem: The latent semantic analysis theory of acquisition, induction and representation of knowledge. Psychological Review 104:211-240.

Langacker, Ronald W. 1987. Foundations of Cognitive Grammar, volume 1. Stanford, CA: Stanford University Press.

Langacker, Ronald W. 1991. Foundations of Cognitive Grammar, volume 2. Stanford, CA: Stanford University Press.

Laplace, PS marquis de. 1814. Essai philosophique sur les probabilités. Paris: Mme. Ve. Courcier.

Laplace, Pierre Simon marquis de. 1995. Philosophical Essay On Probabilities. New York: Springer-Verlag.

Lari, K., and S. J. Young. 1990. The estimation of stochastic context-free grammars using the inside-outside algorithm. Computer Speech and Language 4: 35-56.

Lari, K., and S. J. Young. 1991. Application of stochastic context free grammar using the inside-outside algorithm. Computer Speech and Language 5:237-257.

Lau, Raymond. 1994. Adaptive statistical language modelling. Master's thesis, Massachusetts Institute of Technology.

Lau, Ray, Ronald Rosenfeld, and Salim Roukos. 1993. Adaptive language modeling using the maximum entropy principle. In Proceedings of the Human Language Technology Workshop, pp. 108-113. ARPA.

Lauer, Mark. 1995a. Corpus statistics meet the noun compound: Some empirical results. In ACL 33, pp. 47-54.

Lauer, Mark. 1995b. Designing Statistical Language Learners: Experiments on Noun Compounds. PhD thesis, Macquarie University, Sydney, Australia.

Leacock, Claudia, Martin Chodorow, and George A. Miller. 1998. Using corpus statistics and Wordnet relations for sense identification. Computational Linguistics 24:147-165.

Lesk, Michael. 1986. Automatic sense disambiguation: How to tell a pine cone from an ice cream cone. In Proceedings of the 1986 SIGDOC Conference, pp. 24-26, New York. Association for Computing Machinery.

Lesk ME. 1969. Word-word association in document retrieval systems. American Documentation 20:27-38.

Levin, Beth. 1993. English Verb Classes and Alternations. Chicago: The University of Chicago Press.

Levine, John R., Tony Mason, and Doug Brown. 1992. Lex & Yacc, 2nd edition. Sebastopol, CA: O'Reilly & Associates.

Levinson, S. E., L. R. Rabiner, and M. M. Sondhi. 1983. An introduction to the application of the theory of probabilistic functions of a Markov process to automatic speech recognition. Bell System Technical Journal 62:1035-1074.

Lewis, David D. 1992. An evaluation of phrasal and clustered representations on a text categorization task. In SIGIR '92, pp. 37-50.

Lewis, David D., and Karen Sparck Jones. 1996. Natural language processing for information retrieval. Communications of the ACM 39:92-101.

Lewis, David D., and Marc Ringuette. 1994. A comparison of two learning algorithms for text categorization. In Proc. SDAIR 94, pp. 81-93, Las Vegas, NV.

Lewis, David D., Robert E. Schapire, James P. Callan, and Ron Papka. 1996. Training algorithms for linear text classifiers. In SIGIR '96, pp. 298-306.

Li, Hang, and Naoki Abe. 1995. Generalizing case frames using a thesaurus and the MDL principle. In Proceedings of Recent Advances in Natural Language Processing, pp. 239-248, Tzigov Chark, Bulgaria.

Li, Hang, and Naoki Abe. 1996. Learning dependencies between case frame slots. In COLING 16, pp. 10-15.

Li, Hang, and Naoki Abe. 1998. Word clustering and disambiguation based on co-occurrence data. In ACL 36/COLING 17, pp. 749-755.

Li W. Random texts exhibit Zipf's-law-like word frequency distribution. IEEE Transactions on Information Theory. 1992;38:1842-1845.

Lidstone, G. J. 1920. Note on the general case of the Bayes-Laplace formula for inductive or a priori probabilities. Transactions of the Faculty of Actuaries 8: 182-192.

Light, Marc. 1996. Morphological cues for lexical semantics. In ACL 34, pp. 25-31.

Littlestone, Nick. 1995. Comparing several linear-threshold learning algorithms on tasks involving superfluous attributes. In A. Prieditis (ed.), Proceedings of the 12th International Conference on Machine Learning, pp. 353-361, San Francisco, CA. Morgan Kaufmann.

Littman, Michael L., Susan T. Dumais, and Thomas K. Landauer. 1998a. Automatic cross-language information retrieval using latent semantic indexing. In Gregory Grefenstette (ed.), Cross Language Information Retrieval. Kluwer.

Littman, Michael L., Fan Jiang, and Greg A. Keim. 1998b. Learning a language-independent representation for terms from a partially aligned corpus. In Jude Shavlik (ed.), Proceedings of the Fifteenth International Conference on Machine Learning, pp. 314-322. Morgan Kaufmann.

Losee, Robert M. (ed.). 1998. Text Retrieval and Filtering. Boston, MA: Kluwer Academic Publishers.

Lovins, Julie Beth. 1968. Development of a stemming algorithm. Translation and Computational Linguistics 11:22-31.

Luhn, H. P. 1960. Keyword-in-context index for technical literature (KWIC index). American Documentation 11:288-295.

Lyons, John. 1968. Introduction to Theoretical Linguistics. Cambridge: Cambridge University Press.

MacDonald, M. A., N. J. Pearlmutter, and M. S. Seidenberg. 1994. The lexical nature of syntactic ambiguity resolution. Psychological Review 101:676-703.

MacKay, David J. C., and Linda C. Peto. 1990. Speech recognition using hidden Markov models. The Lincoln Laboratory Journal 3:41-62.

Magerman, David M. 1994. Natural language parsing as statistical pattern recognition. PhD thesis, Stanford University.

Magerman, David M. 1995. Statistical decision-tree models for parsing. In ACL 33, pp. 276-283.

Magerman, David M., and Mitchell P. Marcus. 1991. Pearl: A probabilistic chart parser. In EACL 4. Also published in the Proceedings of the 2nd International Workshop for Parsing Technologies.

Magerman, David M., and Carl Weir. 1992. Efficiency, robustness, and accuracy in Picky chart parsing. In ACL 30, pp. 40-47.

Mandelbrot, Benoit. 1954. Structure formelle des textes et communication. Word 10:1-27.

Mandelbrot, Benoit B. 1983. The Fractal Geometry of Nature. New York: W. H. Freeman.

Mani, Inderjeet, and T. Richard MacMillan. 1995. Identifying unknown proper names in newswire text. In Branimir Boguraev and James Pustejovsky (eds.), Corpus Processing for Lexical Acquisition, pp. 41-59. Cambridge, MA: MIT Press.

Manning, Christopher D. 1993. Automatic acquisition of a large subcategorization dictionary from corpora. In ACL 31, pp. 235-242.

Manning, Christopher D., and Bob Carpenter. 1997. Probabilistic parsing using left corner language models. In Proceedings of the Fifth International Workshop on Parsing Technologies (IWPT-97), pp. 147-158, MIT.

Marchand, Hans. 1969. Categories and types of present-day English word-formation. München: Beck.

Marcus, Mitchell, Grace Kim, Mary Ann Marcinkiewicz, Robert MacIntyre, Ann Bies, Mark Ferguson, Karen Katz, and Britta Schasberger. 1994. The Penn Treebank: Annotating predicate argument structure. In ARPA Human Language Technology Workshop, pp. 110-115.

Marcus, Mitchell P., Beatrice Santorini, and Mary Ann Marcinkiewicz. 1993. Building a large annotated corpus of English: The Penn treebank. Computational Linguistics 19:313-330.

Markov, Andrei A. 1913. An example of statistical investigation in the text of `Eugene Onyegin' illustrating coupling of `tests' in chains. In Proceedings of the Academy of Sciences, St. Petersburg, volume 7 of VI, pp. 153-162.

Marr, David. 1982. Vision: A Computational Investigation into the Human Representation and Processing of Visual Information. New York: W. H. Freeman.

Marshall, Ian. 1987. Tag selection using probabilistic methods. In Roger Garside, Geoffrey Sampson, and Geoffrey Leech (eds.), The Computational analysis of English: a corpus-based approach, pp. 42-65. London: Longman.

Martin, James. 1991. Representing and acquiring metaphor-based polysemy. In Uri Zernik (ed.), Lexical Acquisition: Exploiting On-Line Resources to Build a Lexicon, pp. 389-415. Hillsdale, NJ: Lawrence Erlbaum.

Martin, W. A., K. W. Church, and R. S. Patil. 1987. Preliminary analysis of a breadth-first parsing algorithm: Theoretical and experimental results. In Leonard Bolc (ed.), Natural Language Parsing Systems. Berlin: Springer Verlag. Also MIT LCS technical report TR-261.

Masand, Brij, Gordon Linoff, and David Waltz. 1992. Classifying news stories using memory based reasoning. In SIGIR '92, pp. 59-65.

Maxwell, III, John T. 1992. The problem with mutual information. Manuscript, Xerox Palo Alto Research Center, September 15, 1992.

McClelland, James L., David E. Rumelhart, and the PDP Research Group (eds.). 1986. Parallel Distributed Processing. Explorations in the Microstructure of Cognition. Volume 2: Psychological and Biological Models. Cambridge, MA: The MIT Press.

McCullagh, Peter, and John A. Nelder. 1989. Generalized Linear Models, 2nd edition, chapter 4, pp. 101-123. Chapman and Hall.

McDonald DD. Internal and external evidence in the identification and semantic categorization of proper names. In: Boguraev B, Pustejovsky J, eds. Corpus Processing for Lexical Acquisition. 1995;:21-39. Cambridge MA: MIT Press.

McEnery, Tony, and Andrew Wilson. 1996. Corpus Linguistics. Edinburgh: Edinburgh University Press.

McGrath, Sean. 1997. ParseMe.1st: SGML for Software Developers. Upper Saddle River, NJ: Prentice Hall PTR.

McMahon, John G., and Francis J. Smith. 1996. Improving statistical language model performance with automatically generated word hierarchies. Computational Linguistics 22:217-247.

McQueen, C.M. Sperberg, and Lou Burnard (eds.). 1994. Guidelines for Electronic Text Encoding and Interchange (TEI P3). Chicago, IL: ACH/ACL/ALLC (Association for Computers and the Humanities, Association for Computational Linguistics, Association for Literary and Linguistic Computing).

McRoy, Susan W. 1992. Using multiple knowledge sources for word sense disambiguation. Computational Linguistics 18:1-30.

Melamed, I. Dan. 1997a. A portable algorithm for mapping bitext correspondence. In ACL 35/EACL 8, pp. 305-312.

Melamed, I. Dan. 1997b. A word-to-word model of translational equivalence. In ACL 35/EACL 8, pp. 490-497.

Mel'cuk, Igor Aleksandrovich. 1988. Dependency Syntax: theory and practice. Albany: State University of New York.

Mercer, Robert L. 1993. Inflectional morphology needs to be authenticated by hand. In Working Notes of the AAAI Spring Symposium on Building Lexicons for Machine Translation, p. 99, Stanford, CA. AAAI Press.

Merialdo, Bernard. 1994. Tagging English text with a probabilistic model. Computational Linguistics 20:155-171.

Miclet, Laurent, and Colin de la Higuera (eds.). 1996. Grammatical inference: learning syntax from sentences: Third International Colloquium, ICGI-96. Berlin: Springer.

Miikkulainen, Risto (ed.). 1993. Subsymbolic Natural Language Processing. Cambridge MA: MIT Press.

Mikheev, Andrei. 1998. Feature lattices for maximum entropy modelling. In ACL 36, pp. 848-854.

Miller, George A., and Walter G. Charles. 1991. Contextual correlates of semantic similarity. Language and Cognitive Processes 6:1-28.

Miller S, Stallard D, Bobrow R, Schwartz R. 1996. A fully statistical approach to natural language interfaces. In: ACL 34, pp. 55-61.

Minsky, Marvin Lee, and Seymour Papert (eds.). 1969. Perceptrons: an introduction to computational geometry. Cambridge, MA: MIT Press. Partly reprinted in (Shavlik and Dietterich 1990).

Minsky, Marvin Lee, and Seymour Papert (eds.). 1988. Perceptrons: an introduction to computational geometry. Cambridge, MA: MIT Press. Expanded edition.

Mitchell, Tom M. 1980. The need for biases in learning generalizations. Technical Report CBM-TR-117, Department of Computer Science, Rutgers University. Reprinted in (Shavlik and Dietterich 1990), pp. 184-191.

Mitchell, Tom M. (ed.). 1997. Machine Learning. New York: McGraw-Hill.

Mitra, Mandar, Chris Buckley, Amit Singhal, and Claire Cardie. 1997. An analysis of statistical and syntactic phrases. In Proceedings of RIAO.

Moffat, Alistair, and Justin Zobel. 1998. Exploring the similarity space. ACM SIGIR Forum 32.

Mood, Alexander M., Franklin A. Graybill, and Duane C. Boes. 1974. Introduction to the theory of statistics. New York: McGraw-Hill. 3rd edition.

Mooney, Raymond J. 1996. Comparative experiments on disambiguating word senses: An illustration of the role of bias in machine learning. In EMNLP 1, pp. 82-91.

Moore, David S., and George P. McCabe. 1989. Introduction to the practice of statistics. New York: Freeman.

Morris, Jane, and Graeme Hirst. 1991. Lexical cohesion computed by thesaural relations as an indicator of the structure of text. Computational Linguistics 17: 21-48.

Mosteller, Frederick, and David L. Wallace. 1984. Applied Bayesian and Classical Inference - The Case of The Federalist Papers. Springer Series in Statistics. New York: Springer-Verlag.

Nagao, Makoto. 1984. A framework of a mechanical translation between Japanese and English by analogy principle. In Alick Elithorn and Ranan B. Banerji (eds.), Artificial and Human Intelligence, pp. 173-180. Edinburgh: North-Holland.

Neff, Mary S., Brigitte Blaser, Jean-Marc Lange, Hubert Lehmann, and Isabel Zapata Dominguez. 1993. Get it where you can: Acquiring and maintaining bilingual lexicons for machine translation. In Working Notes of the AAAI Spring Symposium on Building Lexicons for Machine Translation, p. 98, Stanford, CA. AAAI Press.

Nevill-Manning, Craig G., Ian H. Witten, and Gordon W. Paynter. 1997. Browsing in digital libraries: a phrase-based approach. In Proceedings of ACM Digital Libraries, pp. 230-236, Philadelphia, PA. Association for Computing Machinery.

Newmeyer, Frederick J. 1988. Linguistics: The Cambridge Survey. Cambridge, England: Cambridge University Press.

Ney, Hermann, and Ute Essen. 1993. Estimating `small' probabilities by leaving-one-out. In Eurospeech '93, volume 3, pp. 2239-2242. ESCA.

Ney, Hermann, Ute Essen, and Reinhard Kneser. 1994. On structuring probabilistic dependencies in stochastic language modeling. Computer Speech and Language 8:1-28.

Ney, Hermann, Sven Martin, and Frank Wessel. 1997. Statistical language modeling using leaving-one-out. In Steve Young and Gerrit Bloothooft (eds.), Corpus-Based Methods in Language and Speech Processing, pp. 174-207. Dordrecht: Kluwer Academic.

Ng, Hwee Tou, and John Zelle. 1997. Corpus-based approaches to semantic interpretation in natural language processing. AI Magazine 18:45-64.

Ng, Hwee Tou, and Hian Beng Lee. 1996. Integrating multiple knowledge sources to disambiguate word sense: An exemplar-based approach. In ACL 34, pp. 40-47.

Nie, Jian-Yun, Pierre Isabelle, Pierre Plamondon, and George Foster. 1998. Using a probabilistic translation model for cross-language information retrieval. In WVLC 6, pp. 18-27.

Niessen S, Vogel S, Ney H, Tillmann C. 1998. A DP based search algorithm for statistical machine translation. In: ACL 36/COLING 17, pp. 960-967.

Nunberg, Geoffrey. 1990. The Linguistics of Punctuation. Stanford, CA: CSLI Publications.

Nunberg, Geoff, and Annie Zaenen. 1992. Systematic polysemy in lexicology and lexicography. In Proceedings of Euralex II, Tampere, Finland.

Oaksford, M., and N. Chater. 1998. Rational Models of Cognition. Oxford, England: Oxford University Press.

Oard, Douglas W., and Nicholas DeClaris. 1996. Cognitive models for text filtering. Manuscript, University of Maryland, College Park.

Ostler, Nicholas, and B. T. S. Atkins. 1992. Predictable meaning shift: Some linguistic properties of lexical implication rules. In James Pustejovsky and Sabine Bergler (eds.), Lexical Semantics and Knowledge Representation: Proceedings of the 1st SIGLEX Workshop, pp. 76-87. Berlin: Springer Verlag.

Paik, Woojin, Elizabeth D. Liddy, Edmund Yu, and Mary McKenna. 1995. Categorizing and standardizing proper nouns for efficient information retrieval. In Branimir Boguraev and James Pustejovsky (eds.), Corpus Processing for Lexical Acquisition, pp. 61-73. Cambridge MA: MIT Press.

Palmer, David D., and Marti A. Hearst. 1994. Adaptive sentence boundary disambiguation. In ANLP 4, pp. 78-83.

Palmer, David D., and Marti A. Hearst. 1997. Adaptive multilingual sentence boundary disambiguation. Computational Linguistics 23:241-267.

Paul, Douglas B. 1990. Speech recognition using hidden Markov models. The Lincoln Laboratory Journal 3:41-62.

Pearlmutter, N., and M. MacDonald. 1992. Plausibility and syntactic ambiguity resolution. In Proceedings of the 14th Annual Conference of the Cognitive Society.

Pedersen, Ted. 1996. Fishing for exactness. In Proceedings of the South-Central SAS Users Group Conference, Austin TX.

Pedersen, Ted, and Rebecca Bruce. 1997. Distinguishing word senses in untagged text. In EMNLP 2, pp. 197-207.

Pereira, Fernando, and Yves Schabes. 1992. Inside-outside reestimation from partially bracketed corpora. In ACL 30, pp. 128-135.

Pereira, Fernando, Naftali Tishby, and Lillian Lee. 1993. Distributional clustering of English words. In ACL 31, pp. 183-190.

Pinker, Steven. 1994. The Language Instinct. New York: William Morrow.

Pollard, Carl, and Ivan A. Sag. 1994. Head-Driven Phrase Structure Grammar. Chicago, IL: University of Chicago Press.

Pook, Stuart L., and Jason Catlett. 1988. Making sense out of searching. In Information Online 88, pp. 148-157, Sydney. The Information Science Section of the Library Association of Australia.

Porter, M. F. 1980. An algorithm for suffix stripping. Program 14:130-137.

Poznanski, Victor, and Antonio Sanfilippo. 1995. Detecting dependencies between semantic verb subclasses and subcategorization frames in text corpora. In Branimir Boguraev and James Pustejovsky (eds.), Corpus Processing for Lexical Acquisition, pp. 175-190. Cambridge, MA: MIT Press.

Press, W. H., B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling. 1988. Numerical Recipes in C. Cambridge: Cambridge University Press.

Procter P, ed. 1978. Longman Dictionary of Contemporary English. Harlow, England: Longman Group.

Prokosch, E. 1933. Review of selected studies of the principle of relative frequency in language. Language 9:89-92.

Pustejovsky, James. 1991. The generative lexicon. Computational Linguistics 17:409-441.

Pustejovsky, James, Sabine Bergler, and Peter Anick. 1993. Lexical semantic techniques for corpus analysis. Computational Linguistics 19:331-358.

Qiu, Yonggang, and H.P. Frei. 1993. Concept based query expansion. In SIGIR '93, pp. 160-169.

Quinlan, J. R. 1986. Induction of decision trees. Machine Learning 1:81-106. Reprinted in (Shavlik and Dietterich 1990).

Quinlan, John Ross. 1993. C4.5: Programs for machine learning. San Mateo, CA: Morgan Kaufmann Publishers.

Quinlan, J. R. 1996. Bagging, boosting, and C4.5. In Proceedings of the Thirteenth National Conference on Artificial Intelligence (AAAI '96), pp. 725-730.

Quirk, Randolph, Sidney Greenbaum, Geoffrey Leech, and Jan Svartvik. 1985. A Comprehensive Grammar of the English Language. London: Longman.

Rabiner, Lawrence, and Biing-Hwang Juang. 1993. Fundamentals of Speech Recognition. Englewood Cliffs, NJ: PTR Prentice-Hall.

Rabiner, Lawrence R. 1989. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE 77:257-286. Reprinted in (Waibel and Lee 1990), pp. 267-296.

Ramsey, Fred L., and Daniel W. Schafer. 1997. The statistical sleuth: a course in methods of data analysis. Belmont, CA: Duxbury Press.

Ramshaw, Lance A., and Mitchell P. Marcus. 1994. Exploring the statistical derivation of transformational rule sequences for part-of-speech tagging. In The Balancing Act. Proceedings of the Workshop, pp. 86-95, Morristown NJ. Association of Computational Linguistics.

Rasmussen, Edie. 1992. Clustering algorithms. In William B. Frakes and Ricardo Baeza-Yates (eds.), Information Retrieval, pp. 419-442. Englewood Cliffs, NJ: Prentice Hall.

Ratnaparkhi, Adwait. 1996. A maximum entropy model for part-of-speech tagging. In EMNLP 1, pp. 133-142.

Ratnaparkhi, Adwait. 1997a. A linear observed time statistical parser based on maximum entropy models. In EMNLP 2, pp. 1-10.

Ratnaparkhi, Adwait. 1997b. A simple introduction to maximum entropy models for natural language processing. Technical Report IRCS Report 97-08, Institute for Research in Cognitive Science, Philadelphia, PA.

Ratnaparkhi, Adwait. 1998. Unsupervised statistical models for prepositional phrase attachment. In ACL 36/COLING 17, pp. 1079-1085.

Ratnaparkhi, Adwait, Jeff Reynar, and Salim Roukos. 1994. A maximum entropy model for prepositional phrase attachment. In Proceedings of the ARPA Workshop on Human Language Technology, pp. 250-255, Plainsboro, NJ.

Read, Timothy R. C., and Noel A. C. Cressie. 1988. Goodness-of-fit statistics for discrete multivariate data. New York: Springer Verlag.

Resnik, Philip. 1992. Probabilistic tree-adjoining grammar as a framework for statistical natural language processing. In COLING 14, pp. 418-425.

Resnik, Philip. 1996. Selectional constraints: an information-theoretic model and its computational realization. Cognition 61:127-159.

Resnik, Philip, and Marti Hearst. 1993. Structural ambiguity and conceptual relations. In WVLC 1, pp. 58-64.

Resnik, Philip, and David Yarowsky. 1998. A perspective on word sense disambiguation methods and their evaluation. In Proceedings of the SIGLEX workshop Tagging Text with Lexical Semantics, pp. 79-86, Washington, DC.

Resnik, Philip Stuart. 1993. Selection and Information: A Class-Based Approach to Lexical Relationships. PhD thesis, University of Pennsylvania.

Reynar, Jeffrey C., and Adwait Ratnaparkhi. 1997. A maximum entropy approach to identifying sentence boundaries. In ANLP 5, pp. 16-19.

Riley, Michael D. 1989. Some applications of tree-based modeling to speech and language indexing. In Proceedings of the DARPA Speech and Natural Language Workshop, pp. 339-352. Morgan Kaufmann.

Riloff, Ellen, and Jessica Shepherd. 1997. A corpus-based approach for building semantic lexicons. In EMNLP 2, pp. 117-124.

Ristad, Eric Sven. 1995. A natural law of succession. Technical Report CS-TR-495-95, Princeton University.

Ristad, Eric Sven. 1996. Maximum entropy modeling toolkit. Manuscript, Princeton University.

Ristad, Eric Sven, and Robert G. Thomas. 1997. Hierarchical non-emitting Markov models. In ACL 35/EACL 8, pp. 381-385.

Roark, Brian, and Eugene Charniak. 1998. Noun-phrase co-occurrence statistics for semi-automatic semantic lexicon construction. In ACL 36/COLING 17, pp. 1110-1116.

Robertson, S.E., and K. Sparck Jones. 1976. Relevance weighting of search terms. Journal of the American Society for Information Science 27:129-146.

Rocchio JJ. 1971. Relevance feedback in information retrieval. In: Salton G, ed. The Smart Retrieval System - Experiments in Automatic Document Processing, pp. 313-323. Englewood Cliffs, NJ: Prentice-Hall.

Roche, Emmanuel, and Yves Schabes. 1995. Deterministic part-of-speech tagging with finite-state transducers. Computational Linguistics 21:227-253.

Roche, Emmanuel, and Yves Schabes. 1997. Finite-State Language Processing. Cambridge, MA: MIT Press.

Roget, P. M. 1946. Roget's International Thesaurus. New York: Thomas Y. Crowell.

Rosenblatt, Frank (ed.). 1962. Principles of neurodynamics; perceptrons and the theory of brain mechanisms. Washington, DC: Spartan Books.

Rosenfeld, Ronald. 1994. Adaptive Statistical Language Modeling: A Maximum Entropy Approach. PhD thesis, CMU. Technical report CMU-CS-94-138.

Rosenfeld, Roni. 1996. A maximum entropy approach to adaptive statistical language modelling. Computer Speech and Language 10:187-228.

Rosenfeld, Ronald, and Xuedong Huang. 1992. Improvements in stochastic language modeling. In Proceedings of the DARPA Speech and Natural Language Workshop, pp. 107-111. Morgan Kaufmann.

Rosenkrantz, Stanley J., and Philip M. Lewis, II. 1970. Deterministic left corner parser. In IEEE Conference Record of the 11th Annual Symposium on Switching and Automata, pp. 139-152.

Ross, Ian C., and John W. Tukey. 1975. Introduction to these volumes. In John Wilder Tukey (ed.), Index to Statistics and Probability, pp. iv-x. Los Altos, CA: R & D Press.

Roth, Dan. 1998. Learning to resolve natural language ambiguities: A unified approach. In Proceedings of the Fifteenth National Conference on Artificial Intelligence, Menlo Park CA. AAAI Press.

Rumelhart, D. E., and J. L. McClelland. 1986. On learning the past tenses of English verbs. In James L. McClelland, David E. Rumelhart, and the PDP Research Group (eds.), Parallel Distributed Processing. Explorations in the Microstructure of Cognition. Volume 2: Psychological and Biological Models, pp. 216-271. Cambridge, MA: The MIT Press.

Rumelhart, David E., James L. McClelland, and the PDP research group (eds.). 1986. Parallel Distributed Processing. Explorations in the Microstructure of Cognition. Volume 1: Foundations. Cambridge, MA: The MIT Press.

Rumelhart, David E., and David Zipser. 1985. Feature discovery by competitive learning. Cognitive Science 9:75-112.

Russell, Stuart J., and Peter Norvig. 1995. Artificial Intelligence: A Modern Approach. Englewood Cliffs, NJ: Prentice Hall.

Sakakibara, Y., M. Brown, R. Hughey, I. S. Mian, K. Sjolander, R. C. Underwood, and D. Haussler. 1994. Stochastic context-free grammars for tRNA modeling. Nucleic Acids Research 22:5112-5120.

Salton, Gerard. 1971a. Experiments in automatic thesaurus construction for information retrieval. In Proceedings IFIP Congress, pp. 43-49.

Salton, Gerard (ed.). 1971b. The Smart Retrieval System - Experiments in Automatic Document Processing. Englewood Cliffs, NJ: Prentice-Hall.

Salton, Gerard. 1989. Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer. Reading, MA: Addison Wesley.

Salton, G., J. Allan, C. Buckley, and A. Singhal. 1994. Automatic analysis, theme generation and summarization of machine-readable texts. Science 264:1421-1426.

Salton, Gerard, and James Allan. 1993. Selective text utilization and text traversal. In Proceedings of ACM Hypertext 93, New York. Association for Computing Machinery.

Salton, Gerard, and Chris Buckley. 1991. Global text matching for information retrieval. Science 253:1012-1015.

Salton, Gerard, Edward A. Fox, and Harry Wu. 1983. Extended boolean information retrieval. Communications of the ACM 26:1022-1036.

Salton, Gerard, and Michael J. McGill. 1983. Introduction to modern information retrieval. New York: McGraw-Hill.

Salton, Gerard, and R. W. Thorpe. 1962. An approach to the segmentation problem in speech analysis and language translation. In Proceedings of the 1961 International Conference on Machine Translation of Languages and Applied Language Analysis, volume 2, pp. 703-724, London. Her Majesty's Stationery Office.

Sampson, Geoffrey. 1989. How fully does a machine-usable dictionary cover English text? Literary and Linguistic Computing 4:29-35.

Sampson, Geoffrey. 1995. English for the Computer. New York: Oxford University Press.

Sampson, Geoffrey. 1997. Educating Eve. London: Cassell.

Samuel, Ken, Sandra Carberry, and K. Vijay-Shanker. 1998. Dialogue act tagging with transformation-based learning. In ACL 36/COLING 17, pp. 1150-1156.

Samuelsson, Christer. 1993. Morphological tagging based entirely on Bayesian inference. In 9th Nordic Conference on Computational Linguistics, Stockholm University, Stockholm, Sweden.

Samuelsson, Christer. 1996. Handling sparse data by successive abstraction. In COLING 16, pp. 895-900.

Samuelsson, Christer, and Atro Voutilainen. 1997. Comparing a linguistic and a stochastic tagger. In ACL 35/EACL 8, pp. 246-253.

Sanderson, Mark, and C. J. van Rijsbergen. 1998. The impact on retrieval effectiveness of the skewed frequency distribution of a word's senses. ACM Transactions on Information Systems. To appear.

Sankoff, D. 1971. Branching processes with terminal types: applications to context-free grammars. Journal of Applied Probability 8:233-240.

Santorini, Beatrice. 1990. Part-of-speech tagging guidelines for the Penn treebank project. 3rd Revision, 2nd printing, Feb. 1995. University of Pennsylvania.

Sapir, Edward. 1921. Language: an introduction to the study of speech. New York: Harcourt Brace.

Sato, Satoshi. 1992. CTM: An example-based translation aid system. In COLING 14, pp. 1259-1263.

Saund, Eric. 1994. Unsupervised learning of mixtures of multiple causes in binary data. In J. Cowan, G. Tesauro, and J. Alspector (eds.), Advances in Neural Information Processing Systems 6. San Mateo, CA: Morgan Kaufmann Publishers.

Schabes, Yves. 1992. Stochastic lexicalized tree-adjoining grammars. In COLING 14, pp. 426-432.

Schabes, Yves, Anne Abeille, and Aravind Joshi. 1988. Parsing strategies with lexicalized grammars: Tree adjoining grammars. In COLING 12, pp. 578-583.

Schabes, Yves, Michal Roth, and Randy Osborne. 1993. Parsing the Wall Street Journal with the Inside-Outside algorithm. In EACL 6, pp. 341-347.

Schapire, Robert E., Yoram Singer, and Amit Singhal. 1998. Boosting and Rocchio applied to text filtering. In SIGIR '98, pp. 215-223.

Schmid H. 1994. Probabilistic part-of-speech tagging using decision trees. In: International Conference on New Methods in Language Processing, pp. 44-49, Manchester, England.

Schütze CT. 1996. The empirical base of linguistics: grammaticality judgments and linguistic methodology. Chicago, IL: University of Chicago Press.

Schütze, H. 1992a. Context space. In: Goldman R, Norvig P, Charniak E, Gale B, eds. Working Notes of the AAAI Fall Symposium on Probabilistic Approaches to Natural Language, pp. 113-120, Menlo Park, CA. AAAI Press.

Schütze, H. 1992b. Dimensions of meaning. In Proceedings of Supercomputing '92, pp. 787-796, Los Alamitos, CA. IEEE Computer Society Press.

Schütze H. 1993. Part-of-speech induction from scratch. In ACL 31, pp. 251-258.

Schütze H. 1995. Distributional part-of-speech tagging. In: EACL 7, pp. 141-148.

Schütze H. 1997. Ambiguity Resolution in Language Learning. Stanford, CA: CSLI Publications.

Schütze H. 1998. Automatic word sense discrimination. Computational Linguistics 24:97-124.

Schütze H, Hull DA, Pedersen JO. 1995. A comparison of classifiers and document representations for the routing problem. In: SIGIR '95, pp. 229-237.

Schütze H, Pedersen JO. 1995. Information retrieval based on word senses. In: Fourth Annual Symposium on Document Analysis and Information Retrieval, pp. 161-175, Las Vegas, NV.

Schütze H, Pedersen JO. 1997. A cooccurrence-based thesaurus and two applications to information retrieval. Information Processing & Management 33:307-318.

Schütze H, Singer Y. 1994. Part-of-speech tagging using a variable memory Markov model. In: ACL 32, pp. 181-187.

Shannon, Claude E. 1948. A mathematical theory of communication. Bell System Technical Journal 27:379-423, 623-656.

Shannon, Claude E. 1951. Prediction and entropy of printed English. Bell System Technical Journal 30:50-64.

Shavlik, Jude W., and Thomas G. Dietterich (eds.). 1990. Readings in Machine Learning. San Mateo, CA: Morgan Kaufmann.

Shemtov, Hadar. 1993. Text alignment in a tool for translating revised documents. In EACL 6, pp. 449-453.

Sheridan, Paraic, and Alan F. Smeaton. 1992. The application of morphosyntactic language processing to effective phrase matching. Information Processing & Management 28:349-370.

Sheridan, Paraic, Martin Wechsler, and Peter Schauble. 1997. Cross language speech retrieval: Establishing a baseline performance. In: SIGIR '97, pp. 99-108.

Shimohata, Sayori, Toshiyuko Sugio, and Junji Nagata. 1997. Retrieving collocations by co-occurrences and word order constraints. In ACL 35/EACL 8, pp. 476-481.

Siegel, Sidney, and N. John Castellan, Jr. 1988. Nonparametric Statistics for the Behavioral Sciences, 2nd edition. New York: McGraw Hill.

Silverstein, Craig, and Jan O. Pedersen. 1997. Almost-constant-time clustering of arbitrary corpus subsets. In SIGIR '97, pp. 60-66.

Sima'an, Khalil. 1996. Computational complexity of probabilistic disambiguation by means of tree-grammars. In COLING 16, pp. 1175-1180.

Sima'an, Khalil, Rens Bod, S. Krauwer, and Remko Scha. 1994. Efficient disambiguation by means of stochastic tree substitution grammars. In Proceedings International Conference on New Methods in Language Processing.

Simard, Michel, G. F. Foster, and P. Isabelle. 1992. Using cognates to align sentences in bilingual corpora. In Proceedings of the Fourth International Conference on Theoretical and Methodological Issues in Machine Translation (TMI-92), pp. 67-81.

Simard, Michel, and Pierre Plamondon. 1996. Bilingual sentence alignment: Balancing robustness and accuracy. In Proceedings of the First Conference of the Association for Machine Translation in the Americas (AMTA-96), pp. 135-144.

Sinclair, John (ed.). 1995. Collins COBUILD English dictionary. London: Harper Collins. New edition, completely revised.

Singhal, Amit, Gerard Salton, and Chris Buckley. 1996. Length normalization in degraded text collections. In Fifth Annual Symposium on Document Analysis and Information Retrieval, pp. 149-162, Las Vegas, NV.

Sipser, Michael. 1996. Introduction to the theory of computation. Boston, MA: PWS Publishing Company.

Siskind, Jeffrey Mark. 1996. A computational study of cross-situational techniques for learning word-to-meaning mappings. Cognition 61:39-91.

Skut, Wojciech, and Thorsten Brants. 1998. A maximum-entropy partial parser for unrestricted text. In WVLC 6, pp. 143-151.

Smadja, Frank. 1993. Retrieving collocations from text: Xtract. Computational Linguistics 19:143-177.

Smadja, Frank, Kathleen R. McKeown, and Vasileios Hatzivassiloglou. 1996. Translating collocations for bilingual lexicons: A statistical approach. Computational Linguistics 22:1-38.

Smadja, Frank A., and Kathleen R. McKeown. 1990. Automatically extracting and representing collocations for language generation. In ACL 28, pp. 252-259.

Smeaton, Alan F. 1992. Progress in the application of natural language processing to information retrieval tasks. The Computer Journal 35:268-278.

Smith, Tony C., and John G. Cleary. 1997. Probabilistic unification grammars. In: 1997 Australasian Natural Language Processing Summer Workshop. 1997;:25-32, Macquarie University.

Snedecor GW, Cochran WG. 1989. Statistical methods. Ames: Iowa State University Press. 8th edition.

Sparck Jones, Karen. 1972. A statistical interpretation of term specificity and its application in retrieval. Journal of Documentation 28:11-21.

Sparck Jones, Karen, and Peter Willett (eds.). 1998. Readings in Information Retrieval. San Francisco: Morgan Kaufmann.

Sproat, Richard William. 1992. Morphology and computation. Cambridge, MA: MIT Press.

Sproat, Richard W., Chilin Shih, William Gale, and Nancy Chang. 1996. A stochastic finite-state word-segmentation algorithm for Chinese. Computational Linguistics 22:377-404.

St. Laurent, Simon. 1998. XML: A Primer. Foster City, CA: MIS Press/IDG Books.

Stanfill, Craig, and David Waltz. 1986. Toward memory-based reasoning. Communications of the ACM 29:1213-1228.

Steier, Amy M., and Richard K. Belew. 1993. Exporting phrases: A statistical analysis of topical language. In R. Casey and B. Croft (eds.), Second Annual Symposium on Document Analysis and Information Retrieval, pp. 179-190, Las Vegas, NV.

Stolcke, Andreas. 1995. An efficient probabilistic context-free parsing algorithm that computes prefix probabilities. Computational Linguistics 21:165-202.

Stolcke, Andreas, and Stephen M. Omohundro. 1993. Hidden Markov model induction by Bayesian model merging. In S. J. Hanson, J. D. Cowan, and C. Lee Giles (eds.), Advances in Neural Information Processing Systems 5, pp. 11-18, San Mateo, CA. Morgan Kaufmann.

Stolcke, Andreas, and Stephen M. Omohundro. 1994a. Best-first model merging for hidden Markov model induction. Technical Report TR-94-003, International Computer Science Institute, University of California at Berkeley.

Stolcke, Andreas, and Stephen M. Omohundro. 1994b. Inducing probabilistic grammars by Bayesian model merging. In: Grammatical Inference and Applications: Proceedings of the Second International Colloquium on Grammatical Inference. Springer Verlag.

Stolcke, A., E. Shriberg, R. Bates, N. Coccaro, D. Jurafsky, R. Martin, M. Meteer, K. Ries, P. Taylor, and C. Van Ess-Dykema. 1998. Dialog act modeling for conversational speech. In Applying Machine Learning to Discourse Processing, pp. 98-105, Menlo Park, CA. AAAI Press.

Stolz, Walter S., Percy H. Tannenbaum, and Frederick V. Carstensen. 1965. A stochastic approach to the grammatical coding of English. Communications of the ACM 8:399-405.

Strang, Gilbert. 1988. Linear algebra and its applications, 3rd edition. San Diego: Harcourt, Brace, Jovanovich.

Strzalkowski, Tomek. 1995. Natural language information retrieval. Information Processing & Management 31:397-417.

Stubbs, Michael. 1996. Text and corpus analysis: computer-assisted studies of language and culture. Oxford: Blackwell.

Tabor, Whitney. 1994. Syntactic Innovation: A Connectionist Model. PhD thesis, Stanford.

Tague-Sutcliffe, Jean. 1992. The pragmatics of information retrieval experimentation, revisited. Information Processing & Management 28:467-490. Reprinted in (Sparck Jones and Willett 1998).

Talmy, Leonard. 1985. Lexicalization patterns: Semantic structure in lexical form. In Timothy Shopen (ed.), Language Typology and Syntactic Description III: Grammatical Categories and the Lexicon, pp. 57-149. Cambridge, MA: Cambridge University Press.

Tanenhaus, M. K., and J. C. Trueswell. 1995. Sentence comprehension. In J. Miller and P. Eimas (eds.), Handbook of Perception and Cognition, volume 11, pp. 217-262. San Diego: Academic Press.

Tesnière, Lucien. 1959. Éléments de Syntaxe Structurale. Paris: Librairie C. Klincksieck.

Tomita, Masaru (ed.). 1991. Generalized LR parsing. Boston: Kluwer Academic.

Towell, Geoffrey, and Ellen M. Voorhees. 1998. Disambiguating highly ambiguous words. Computational Linguistics 24:125-146.

Trask, Robert Lawrence. 1993. A dictionary of grammatical terms in linguistics. London: Routledge.

van Halteren, Hans, Jakub Zavrel, and Walter Daelemans. 1998. Improving data driven wordclass tagging by system combination. In ACL 36/COLING 17, pp. 491-497.

van Riemsdijk, Henk, and Edwin Williams. 1986. Introduction to the Theory of Grammar. Cambridge, MA: MIT Press.

van Rijsbergen, C. J. 1979. Information Retrieval. London: Butterworths. Second Edition.

Velardi, Paola, and Maria Teresa Pazienza. 1989. Computer aided interpretation of lexical cooccurrences. In ACL 27, pp. 185-192.

Viegas, Evelyne, Boyan Onyshkevych, Victor Raskin, and Sergei Nirenburg. 1996. From submit to submitted via submission: On lexical rules in large-scale lexicon acquisition. In ACL 34, pp. 32-39.

Viterbi, A. J. 1967. Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Transactions on Information Theory IT-13:260-269.

Vogel, Stephan, Hermann Ney, and Christoph Tillmann. 1996. HMM-based word alignment in statistical translation. In COLING 16, pp. 836-841.

Voutilainen, A. 1995. A syntax-based part of speech analyser. In EACL 7, pp. 157-164.

Waibel, Alex, and Kai-Fu Lee (eds.). 1990. Readings in Speech Recognition. San Mateo, CA: Morgan Kaufmann.

Walker, Donald E. 1987. Knowledge resource tools for accessing large text files. In Sergei Nirenburg (ed.), Machine Translation: Theoretical and methodological issues, pp. 247-261. Cambridge: Cambridge University Press.

Walker, Donald E., and Robert A. Amsler. 1986. The use of machine-readable dictionaries in sublanguage analysis. In Ralph Grishman and Richard Kittredge (eds.), Analyzing language in restricted domains: sublanguage description and processing, pp. 69-84. Hillsdale, NJ: Lawrence Erlbaum.

Walker, Marilyn A., Jeanne C. Fromer, and Shrikanth Narayanan. 1998. Learning optimal dialogue strategies: A case study of a spoken dialogue agent for email. In ACL 36/COLING 17, pp. 1345-1351.

Walker, Marilyn A., and Johanna D. Moore. 1997. Empirical studies in discourse. Computational Linguistics 23:1-12.

Wang, Ye-Yi, and Alex Waibel. 1997. Decoding algorithm in statistical machine translation. In ACL 35/EACL 8, pp. 366-372.

Wang, Ye-Yi, and Alex Waibel. 1998. Modeling with structures in statistical machine translation. In ACL 36/COLING 17, pp. 1357-1363.

Waterman, Scott A. 1995. Distinguished usage. In Branimir Boguraev and James Pustejovsky (eds.), Corpus Processing for Lexical Acquisition, pp. 143-172. Cambridge, MA: MIT Press.

Weaver W. 1955. Translation. In William N. Locke and A. Donald Booth (eds.), Machine Translation of Languages: Fourteen Essays. 1955;:15-23. New York: John Wiley & Sons.

Webster, Mort, and Mitch Marcus. 1989. Automatic acquisition of the lexical semantics of verbs from sentence frames. In ACL 27, pp. 177-184.

Weinberg, Sharon L., and Kenneth P. Goldberg. 1990. Statistics for the behavioral sciences. Cambridge: Cambridge University Press.

Weischedel, Ralph, Marie Meteer, Richard Schwartz, Lance Ramshaw, and Jeff Palmucci. 1993. Coping with ambiguity and unknown words through probabilistic models. Computational Linguistics 19:359-382.

Wiener, Erich, Jan Pedersen, and Andreas Weigend. 1995. A neural network approach to topic spotting. In Proc. SDAIR 95, pp. 317-332, Las Vegas, NV.

Wilks, Yorick, and Mark Stevenson. 1998. Word sense disambiguation using optimized combination of knowledge sources. In ACL 36/COLING 17, pp. 1398-1402.

Willett, Peter. 1988. Recent trends in hierarchic document clustering: A critical review. Information Processing & Management 24:577-597.

Willett, P., and V. Winterman. 1986. A comparison of some measures for the determination of inter-molecular structural similarity. Quantitative Structure-Activity Relationships 5:18-25.

Witten IH, Bell TC.
The zero-frequency problem: Estimating the probabilities of novel events in adaptive text compression.
IEEE Transactions on Information Theory. 1991;37:1085-1094.

Wittgenstein L.
Philosophical Investigations. [Philosophische Untersuchungen]. Third Edition.
Translated by G. E. M. Anscombe.
Oxford: Basil Blackwell. 1968;:.

Wong SKM, Yao YY.
An information-theoretic measure of term specificity.
J Am Soc Information Science. 1992;43:54-61.

Wood MM.
Categorial Grammars.
London: Routledge. 1993;:.

Woolf HB, ed.
Webster's New Collegiate Dictionary.
Springfield, MA: G. & C. Merriam Co. 1973;:.

Wu D.
Aligning a parallel English-Chinese corpus statistically with lexical criteria.
In: ACL 1994;32:80-87.

Wu D.
Grammarless extraction of phrasal examples from parallel texts.
In: Sixth International Conference on Theoretical and Methodological Issues in Machine Translation. 1995;:.

Wu D.
A polynomial-time algorithm for statistical machine translation.
In: ACL 1996;34:152-158.

Wu D, Wong H.
Machine translation with a stochastic grammatical channel.
In: ACL 36/COLING 17. 1998;17:1408-1415.

Yamamoto M, Church KW.
Using suffix arrays to compute term frequency and document frequency for all substrings in a corpus.
In: WVLC 1998;6:28-37.

Yang Y.
Expert network: Effective and efficient learning from human decisions in text categorization and retrieval.
In: SIGIR '94. 1994;:13-22.

Yang Y.
Noise reduction in a statistical approach to text categorization.
In: SIGIR '95. 1995;:256-263.

Yang Y.
An evaluation of statistical approaches to text categorization.
Information Retrieval. 1999;1:69-90.

Yarowsky D.
Word-sense disambiguation using statistical models of Roget's categories trained on large corpora.
In: COLING 1992;14:454-460.

Yarowsky D.
Decision lists for lexical ambiguity resolution: Application to accent restoration in Spanish and French.
In: ACL 1994;32:88-95.

Yarowsky D.
Unsupervised word sense disambiguation rivaling supervised methods.
In: ACL 1995;33:189-196.

Youmans G.
A new tool for discourse analysis: The vocabulary management profile.
Language. 1991;67:763-789.

Younger DH.
Recognition and parsing of context-free languages in time n^3.
Information and Control 1967;10:198-208.

Zavrel J, Daelemans W.
Memory-based learning: Using similarity for smoothing.
In: ACL 35/EACL 8. 1997;8:436-443.

Zavrel J, Daelemans W, Veenstra J.
Resolving PP attachment ambiguities with memory-based learning. In: Proceedings of the Workshop on Computational Natural Language Learning. 1997;:136-144.
Somerset, NJ: Association for Computational Linguistics.

Zernik U.
Introduction.
In: Lexical Acquisition: Exploiting On-Line Resources to Build a Lexicon. 1991;:1-26.
Hillsdale, NJ: Lawrence Erlbaum.

Zernik U.
Train1 vs. Train2: Tagging word sense in corpus.
In: Zernik U, ed. Lexical Acquisition: Exploiting On-Line Resources to Build a Lexicon. 1991;:91-112.
Hillsdale, NJ: Lawrence Erlbaum.

Zipf GK.
Relative frequency as a determinant of phonetic change.
Harvard Studies in Classical Philology 1929;40:1-95.

Zipf GK.
The Psycho-Biology of Language.
Boston, MA: Houghton Mifflin. 1935;:.

Zipf GK.
Human Behavior and the Principle of Least Effort.
Cambridge, MA: Addison-Wesley. 1949;:.

Email from the BCIG Book Club.
This is really weird. Can you raed tihs? Olny srmat poelpe can. I cdnuolt blveiee taht I cluod aulaclty uesdnatnrd waht I was rdanieg. The phaonmneal pweor of the hmuan mnid, aoccdrnig to a rscheearch at Cmabrigde Uinervtisy, it deosn't mttaer in waht oredr the ltteers in a wrod are, the olny iprmoatnt tihng is taht the frist and lsat ltteer be in the rghit pclae. The rset can be a taotl mses and you can sitll raed it wouthit a porbelm. This is bcuseae the huamn mnid deos not raed ervey lteter by istlef, but the wrod as a wlohe. Amzanig huh? yaeh and I awlyas tghuhot slpeling was ipmorantt! if you can raed tihs psas it on !!
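
The scrambling rule the email describes (first and last letter of each word fixed, interior letters in any order) can be reproduced with a short program. The following is a minimal sketch in Python, not the method used to produce the email above; the function names scramble_word and scramble_text, and the seed parameter, are assumptions introduced here for illustration.

```python
import random
import re


def scramble_word(word, rng):
    """Shuffle the interior letters of one word; first and last stay fixed."""
    if len(word) <= 3:
        # Words of 1-3 letters have at most one interior letter: nothing to shuffle.
        return word
    middle = list(word[1:-1])
    rng.shuffle(middle)
    return word[0] + "".join(middle) + word[-1]


def scramble_text(text, seed=0):
    """Apply scramble_word to every run of ASCII letters; leave all else untouched."""
    rng = random.Random(seed)  # seeded so that repeated runs give the same result
    return re.sub(r"[A-Za-z]+", lambda m: scramble_word(m.group(0), rng), text)


if __name__ == "__main__":
    print(scramble_text("The human mind does not read every letter by itself."))
```

Punctuation, digits, and spacing pass through unchanged because only runs of letters are matched. A shuffle may by chance leave a short word in its original order.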



Medical Malpractice.
NEJM 2004;350:283-292.


Medical Privacy.
NEJM 2004;350:1452-1453.

Last updated: 2/9/2006, by G. William Moore, MD, PhD.