RANK FREQUENCY BARRIER WORD
1 222,175 and
2 196,153 of
3 189,799 with
4 107,039 for
5 104,067 the
6 82,104 note
7 80,740 in
8 78,549 right
9 77,885 left
10 70,923 is
11 70,261 see
12 67,917 are
13 53,071 mild
14 49,987 identified
15 47,804 to
16 41,467 consistent
17 39,792 this
18 30,352 present
19 27,189 seen
20 25,371 at
21 25,097 there
22 24,657 on
23 24,284 or
24 23,021 be
25 21,243 associated
26 19,515 was
27 18,376 one
28 16,122 but
29 16,057 case
30 16,057 from
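Barrier words like the ones in this table break medical text into candidate terms. The runs of words between barrier words and punctuation are treated as phrases (Barrier Word Method; Tersmette et al., 1988; Moore et al., 1989). The Python sketch below only illustrates the idea and is not the published implementation. Its assumptions are:

- the barrier set is a hypothetical subset of the table (it deliberately leaves out right, left, mild, one and case);
- tokenization is a simple regular expression;
- the name candidate_phrases is invented for this example.

```python
import re

# Hypothetical barrier set: a subset of the highest-ranked words in the table
# above (right, left, mild, one and case are deliberately left out).
BARRIER_WORDS = {
    "and", "of", "with", "for", "the", "note", "in", "is", "see", "are",
    "identified", "to", "consistent", "this", "present", "seen", "at",
    "there", "on", "or", "be", "associated", "was", "but", "from",
}


def candidate_phrases(text, barriers=BARRIER_WORDS):
    """Split text at barrier words and punctuation; return the word runs in between."""
    # Lowercase, then tokenize into alphanumeric runs and single punctuation marks.
    tokens = re.findall(r"[a-z0-9]+|[^\sa-z0-9]", text.lower())
    phrases, current = [], []
    for tok in tokens:
        if tok in barriers or not tok.isalnum():
            # A barrier word or punctuation mark closes the current phrase.
            if current:
                phrases.append(" ".join(current))
            current = []
        else:
            current.append(tok)
    if current:
        phrases.append(" ".join(current))
    return phrases


print(candidate_phrases("Mild chronic inflammation consistent with helicobacter pylori gastritis."))
# ['mild chronic inflammation', 'helicobacter pylori gastritis']
```

Because "from" is a barrier word, "fibrous plaque from left carotid artery" splits into "fibrous plaque" and "left carotid artery".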
is no balm in Gilead; [is there] no physician there? why then is not the health of th
them, They that be whole need not a physician, but they that are sick. 13 But go ye a
that are whole have no need of the physician, but they that are sick: I came not to
ll surely say unto me this proverb, Physician, heal thyself: whatsoever we have heard
hem, They that are whole need not a physician; but they that are sick. 32 I came not
in Hierapolis. 14 Luke, the beloved physician, and Demas, greet you. 15 Salute the br
epidermoid carcinoma , uterine cervix extending to fundus , adnexa , bladder , rectum , and pelvic
r . diverticula colon . surgical absence , uterine fundus , and appendix . peritoneal adhesions .
iae . external cardiac massage . petechiae gastric fundus .
cell nuclei . capillary microaneurysms left optic fundus . history of traumatic lumbar puncture .
eral renal pelves and trachea . surgical absence , fundus and corpus uteri , and subtotal absence
and intact healed end to side anastomosis between fundus of stomach and proximal jejunum . hyperp
ial necrosis aorta . surgical absence , body , and fundus of uterus , appendix , and left sixth ri
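The lines above are keyword-in-context (KWIC) concordance lines. Every occurrence of a keyword ("physician" in the King James Bible, "fundus" in autopsy diagnoses) is printed with a fixed window of surrounding text. The following is a small Python sketch of how such lines can be produced. The window width and the function name kwic are arbitrary choices for illustration, and the original listings may have been generated differently.

```python
import re


def kwic(text, keyword, width=40):
    """Keyword-in-context lines: `width` characters of context on each side of each hit."""
    text = " ".join(text.split())  # collapse line breaks and runs of spaces
    lines = []
    for m in re.finditer(rf"\b{re.escape(keyword)}\b", text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-justify the left context so that every keyword starts in the same column.
        lines.append(left.rjust(width) + m.group(0) + right)
    return lines


verses = ("They that be whole need not a physician, but they that are sick. "
          "Luke, the beloved physician, and Demas, greet you.")
for line in kwic(verses, "physician", width=20):
    print(line)
```

Right-justifying the left context aligns the keyword in one column, which is what makes a concordance easy to scan.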
Chapter 21. Syntax.
21.1. Markov Models.
Markov chain: a chain of events, A1, A2,
A3, ..., with limited memory; classically,
only a single step.
Markov (1913)
originally applied Markov chains to the sequence of letters
in Russian literature (Pushkin's Eugene Onegin).
The probability of letter/word n depends only upon
the previous k letters/words.
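For k = 1, the transition probabilities can be estimated simply by counting adjacent word pairs in a corpus. The sketch below is a minimal maximum-likelihood version. It assumes a single whitespace-tokenized stream and applies no smoothing, so any unseen transition gets probability zero. The function names are hypothetical.

```python
from collections import Counter, defaultdict


def train_markov(tokens):
    """Maximum-likelihood first-order (k = 1) transitions P(w_n | w_{n-1}), no smoothing."""
    pair_counts = defaultdict(Counter)
    for prev, cur in zip(tokens, tokens[1:]):
        pair_counts[prev][cur] += 1
    model = {}
    for prev, nexts in pair_counts.items():
        total = sum(nexts.values())
        model[prev] = {cur: n / total for cur, n in nexts.items()}
    return model


def sequence_probability(model, tokens):
    """Probability of tokens[1:] given tokens[0]: product of one-step transitions."""
    p = 1.0
    for prev, cur in zip(tokens, tokens[1:]):
        p *= model.get(prev, {}).get(cur, 0.0)  # unseen transition -> 0
    return p


# Toy corpus treated as one token stream (so "ear" -> "skin" is also counted).
MODEL = train_markov("skin of left ear skin of right ear".split())
print(MODEL["of"])                                                  # {'left': 0.5, 'right': 0.5}
print(sequence_probability(MODEL, ["skin", "of", "left", "ear"]))  # 0.5
```

Given its first word, "skin of left ear" gets 1.0 × 0.5 × 1.0 = 0.5, because "of" is followed by "left" and "right" equally often.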
21.2. Hidden Markov Model (HMM): probabilistic function
of a Markov process.
21.3. HMMs are the dominant model in speech recognition research.
21.4. HMMs used in part-of-speech tagging of a document.
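In HMM tagging, the hidden states are parts of speech and the observed symbols are words. The Viterbi algorithm finds the single most probable tag sequence. The sketch below borrows the category letters of the sentence-pattern tables in Chapter 23 (A, N, P). Its probabilities are made up for illustration and are not estimated from the JHAR/JHSP corpora.

```python
def viterbi(words, states, start_p, trans_p, emit_p):
    """Most probable hidden state (tag) sequence for `words` under an HMM."""
    # best[t][s] = (probability of the best path ending in state s at word t, previous state)
    best = [{s: (start_p[s] * emit_p[s].get(words[0], 0.0), None) for s in states}]
    for t in range(1, len(words)):
        column = {}
        for s in states:
            column[s] = max(
                (best[t - 1][r][0] * trans_p[r][s] * emit_p[s].get(words[t], 0.0), r)
                for r in states
            )
        best.append(column)
    # Pick the best final state, then follow the back-pointers to the start.
    prob, state = max((best[-1][s][0], s) for s in states)
    path = [state]
    for t in range(len(words) - 1, 0, -1):
        state = best[t][state][1]
        path.append(state)
    return path[::-1], prob


# Hypothetical toy model: A = adjective, N = noun, P = preposition.
STATES = ["A", "N", "P"]
START = {"A": 0.4, "N": 0.5, "P": 0.1}
TRANS = {"A": {"A": 0.2, "N": 0.7, "P": 0.1},
         "N": {"A": 0.1, "N": 0.3, "P": 0.6},
         "P": {"A": 0.4, "N": 0.5, "P": 0.1}}
EMIT = {"A": {"left": 0.5, "metastatic": 0.5},
        "N": {"skin": 0.4, "ear": 0.3, "left": 0.1, "lung": 0.2},
        "P": {"of": 0.7, "to": 0.3}}

tags, prob = viterbi("skin of left ear".split(), STATES, START, TRANS, EMIT)
print(tags, prob)
```

With these numbers, the ambiguous word "left" (adjective or noun) is tagged A. The result is the pattern NPAN for "skin of left ear", with probability 0.5 × 0.4 × 0.6 × 0.7 × 0.4 × 0.5 × 0.7 × 0.3 = 0.003528.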
21.5. Forward Hidden Markov Model algorithm.
21.6. Backward Hidden Markov Model algorithm.
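The forward algorithm sums over all hidden paths from left to right, and the backward algorithm from right to left. Both give the same total probability of the observed words, which makes a handy consistency check. The following is a Python sketch with a hypothetical toy model (tags A, N, P with invented probabilities).

```python
def forward(words, states, start_p, trans_p, emit_p):
    """alpha[t][s] = P(words[0..t], state at t = s); returns (alpha, P(words))."""
    alpha = [{s: start_p[s] * emit_p[s].get(words[0], 0.0) for s in states}]
    for t in range(1, len(words)):
        alpha.append({
            s: sum(alpha[t - 1][r] * trans_p[r][s] for r in states)
               * emit_p[s].get(words[t], 0.0)
            for s in states
        })
    return alpha, sum(alpha[-1].values())


def backward(words, states, start_p, trans_p, emit_p):
    """beta[t][s] = P(words[t+1..] | state at t = s); returns (beta, P(words))."""
    n = len(words)
    beta = [None] * n
    beta[n - 1] = {s: 1.0 for s in states}
    for t in range(n - 2, -1, -1):
        beta[t] = {
            s: sum(trans_p[s][r] * emit_p[r].get(words[t + 1], 0.0) * beta[t + 1][r]
                   for r in states)
            for s in states
        }
    total = sum(start_p[s] * emit_p[s].get(words[0], 0.0) * beta[0][s] for s in states)
    return beta, total


# Hypothetical toy model: A = adjective, N = noun, P = preposition.
STATES = ["A", "N", "P"]
START = {"A": 0.4, "N": 0.5, "P": 0.1}
TRANS = {"A": {"A": 0.2, "N": 0.7, "P": 0.1},
         "N": {"A": 0.1, "N": 0.3, "P": 0.6},
         "P": {"A": 0.4, "N": 0.5, "P": 0.1}}
EMIT = {"A": {"left": 0.5, "metastatic": 0.5},
        "N": {"skin": 0.4, "ear": 0.3, "left": 0.1, "lung": 0.2},
        "P": {"of": 0.7, "to": 0.3}}

words = "skin of left ear".split()
alpha, p_forward = forward(words, STATES, START, TRANS, EMIT)
beta, p_backward = backward(words, STATES, START, TRANS, EMIT)
print(p_forward, p_backward)
```

For "skin of left ear", both functions return 0.003906. This total counts the reading of "left" as a noun as well as the reading as an adjective.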
21.7. Probabilistic Context-Free Grammars.
21.8. Probabilistic Parsing.
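A probabilistic context-free grammar attaches a probability to each rule. A parse tree's probability is the product of the probabilities of the rules it uses, and probabilistic parsing picks the most probable tree. The sketch below is a probabilistic (Viterbi) CKY parser over a tiny hypothetical grammar in Chomsky normal form. The rules and numbers are invented for illustration.

```python
from collections import defaultdict

# Hypothetical toy PCFG in Chomsky normal form; all probabilities are invented.
# For each left-hand side the probabilities of its rules sum to 1.
LEXICAL = {  # (lhs, word): probability
    ("NP", "adenocarcinoma"): 0.2, ("NP", "colon"): 0.2, ("NP", "lung"): 0.2,
    ("P", "of"): 0.5, ("P", "to"): 0.5, ("A", "metastatic"): 1.0,
}
BINARY = {  # (lhs, left child, right child): probability
    ("NP", "NP", "PP"): 0.2, ("NP", "NP", "AP"): 0.2,
    ("PP", "P", "NP"): 1.0, ("AP", "A", "PP"): 1.0,
}


def cky_viterbi(words, root="NP"):
    """Probabilistic CKY: (probability, tree) of the most probable parse rooted in `root`."""
    n = len(words)
    # chart[i][j][label] = (best probability, best subtree) covering words[i:j]
    chart = [[defaultdict(lambda: (0.0, None)) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        for (lhs, word), p in LEXICAL.items():
            if word == w:
                chart[i][i + 1][lhs] = (p, (lhs, w))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # split point
                for (lhs, left, right), p in BINARY.items():
                    prob = p * chart[i][k][left][0] * chart[k][j][right][0]
                    if prob > chart[i][j][lhs][0]:
                        tree = (lhs, chart[i][k][left][1], chart[k][j][right][1])
                        chart[i][j][lhs] = (prob, tree)
    return chart[0][n][root]


prob, tree = cky_viterbi("adenocarcinoma of colon".split())
print(prob, tree)
```

For "adenocarcinoma of colon", the only parse is [NP [NP adenocarcinoma] [PP [P of] [NP colon]]], with probability 0.2 × 0.2 × 0.5 × 0.2 = 0.004. In the longer phrase, "metastatic to lung" can attach to "colon" or to "adenocarcinoma of colon". With these made-up numbers both attachments receive the same probability, a reminder that a plain PCFG cannot settle such ambiguities by itself.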
Chapter 22. Experience with JHSP/JHAR corpus.
22.1. Johns Hopkins Autopsy Resource (JHAR), posted 1995-2003.
22.2. Not publicly available now: HIPAA.
22.3. Requires Institutional Review Board (IRB) approval.
22.3.1. Why the project won't harm the patients.
22.3.2. Why the risk of harm is outweighed by presumed benefits.
22.4. Same for JHSP corpus:
http://www.netautopsy.org/vhpsapsx.htm
Chapter 23. Statistical Inventory.
23.1. All Words: Zipf's Law.
23.2. Barrier Words: Zipf's Law.
23.3. Collocations: Zipf's Law.
23.4. Grammaticality: Zipf's Law.
23.5. BNF formulas: Zipf's Law.
23.2. Barrier Words: Zipf's Law.
RANK FREQUENCY BARRIER WORD
1 222,175 and
2 196,153 of
3 189,799 with
4 107,039 for
5 104,067 the
6 82,104 note
7 80,740 in
8 78,549 right
9 77,885 left
10 70,923 is
11 70,261 see
12 67,917 are
13 53,071 mild
14 49,987 identified
15 47,804 to
16 41,467 consistent
17 39,792 this
18 30,352 present
19 27,189 seen
20 25,371 at
21 25,097 there
22 24,657 on
23 24,284 or
24 23,021 be
25 21,243 associated
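Zipf's law predicts that frequency is roughly proportional to 1/rank. On log-log axes, that is a straight line with slope near -1. The following is a Python sketch of a least-squares check on the 25 frequencies in the table. Fitting a line to log frequency is only one informal way to examine a Zipf fit, and no particular slope is claimed here for this table.

```python
import math


def zipf_slope(frequencies):
    """Least-squares slope of log(frequency) against log(rank); Zipf's law predicts about -1."""
    ranked = sorted(frequencies, reverse=True)
    xs = [math.log(r) for r in range(1, len(ranked) + 1)]
    ys = [math.log(f) for f in ranked]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Frequencies of the 25 barrier words in the table above, in rank order.
BARRIER_FREQS = [222175, 196153, 189799, 107039, 104067, 82104, 80740, 78549,
                 77885, 70923, 70261, 67917, 53071, 49987, 47804, 41467, 39792,
                 30352, 27189, 25371, 25097, 24657, 24284, 23021, 21243]
slope = zipf_slope(BARRIER_FREQS)
print(round(slope, 3))
```

As a sanity check, perfectly Zipfian data (frequency = C/rank) gives a slope of exactly -1 up to rounding.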
23.3. Collocations: Zipf's Law.
RANK FREQUENCY COLLOCATION
1 38,401 chronic inflammation
2 20,328 lymph nodes
3 18,428 diff quik
4 16,104 soft tissue
5 14,456 bone marrow
6 13,104 non diagnostic
7 13,021 diagnostic findings
8 13,004 non diagnostic findings
9 12,868 helicobacter pylori
10 12,328 crypt distortion
11 12,316 lymph node
12 12,292 quik stain
13 12,284 diff quik stain
14 11,080 mild chronic
15 10,229 epithelial changes
16 10,004 fibroadipose tissue
17 9,967 non specific
18 9,052 left breast
19 8,893 inflammatory disease
20 8,741 gastroesophageal reflux
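The collocation table counts multiple-word sequences directly. That is why overlapping sequences such as "diff quik", "quik stain" and "diff quik stain" all appear with similar frequencies. The following is a Python sketch of raw n-gram counting. It assumes pre-tokenized, lowercased input and applies none of the part-of-speech filtering used by Justeson and Katz (1995).

```python
from collections import Counter


def ngram_counts(sentences, sizes=(2, 3)):
    """Count every contiguous word sequence of the given sizes in tokenized sentences."""
    counts = Counter()
    for tokens in sentences:
        for n in sizes:
            for i in range(len(tokens) - n + 1):
                counts[" ".join(tokens[i:i + n])] += 1
    return counts


DIAGNOSES = ["Diff quik stain", "Diff quik stain negative", "Lymph nodes negative"]
COUNTS = ngram_counts([d.lower().split() for d in DIAGNOSES])
print(COUNTS.most_common(3))
```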
23.4. Grammaticality: Zipf's Law.
RANK FREQUENCY SENTENCE-PATTERN EXAMPLE
1 423,177 [N] hemangioma
2 106,034 [N[N]] liver [needle]
3 98,958 [AN] left foot
4 85,908 [N|V] scar
5 79,741 [NN|V] skin scar
6 62,042 [AAN] epidermal inclusion cyst
7 50,461 [AN[N]] laryngeal mass [biopsy]
8 41,958 [NCN] decidua and villi
9 38,689 [A|NPN] negative for actinomyces
10 26,745 [N[NPN]] cervix [biopsy at 9:00]
11 22,097 [N[NN]] cervix [biopsy 9:00]
12 21,704 [NPAN] skin of left ear
13 21,102 [NN] ear lobe
14 20,638 [BAN] non diagnostic findings
15 16,864 [AAN[N]] left chest wall [biopsy]
16 13,674 [AAAN] left axillary soft tissue
17 12,798 [NCAN[N]] skin , left flank [biopsy]
18 12,692 [ANCAN] soft tissue , inguinal region
19 12,596 [ANPAAN] fibrous plaque from left carotid artery
20 12,507 [N[N]ANCA|VANCA|NPN] leg [ bka ] old thrombus and calcified atherosclerotic plaque , negative for osteomyelitis
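As the rows suggest, a sentence pattern replaces each word by its part-of-speech code, with three conventions:

- alternatives such as N|V are written for ambiguous words;
- bracketed material is kept as a nested pattern;
- commas and conjunctions are coded as C.

The following is a Python sketch with a tiny hypothetical lexicon, chosen only to reproduce a few rows of the table. Codes for words such as "chest" follow the table, not a general dictionary, and unknown words are marked "?".

```python
import re

# Hypothetical lexicon: word -> possible codes (A adjective, N noun, P preposition,
# V verb, C conjunction or comma). Not the lexicon used to build the table above.
LEXICON = {
    "left": ["A"], "chest": ["A"], "wall": ["N"], "biopsy": ["N"],
    "skin": ["N"], "of": ["P"], "ear": ["N"], "scar": ["N", "V"],
    "negative": ["A", "N"], "for": ["P"], "actinomyces": ["N"],
    "and": ["C"], ",": ["C"], "flank": ["N"],
}


def sentence_pattern(phrase):
    """Replace each word by its code(s), keeping square brackets as nesting."""
    out = []
    for tok in re.findall(r"\[|\]|,|[a-z0-9:]+", phrase.lower()):
        if tok in ("[", "]"):
            out.append(tok)
        else:
            out.append("|".join(LEXICON.get(tok, ["?"])))
    return "[" + "".join(out) + "]"


for phrase in ["left chest wall [biopsy]", "negative for actinomyces", "skin scar"]:
    print(sentence_pattern(phrase), phrase)
```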
23.5. BNF formulas: Zipf's Law.
RANK FREQUENCY BNF FORMULA EXAMPLE
1 689,478 [N] ==> [] [prostate]
2 313,234 [AN] ==> [] [actinic keratosis]
3 117,039 [AAN] ==> [] [hypertrophic actinic keratosis]
4 86,762 [N|V] ==> [] [scar]
5 80,127 [NN|V] ==> [] [skin scar]
6 66,816 [NAN] ==> [] [skin soft tissue]
7 60,129 [NCN] ==> [] [decidua and villi]
8 55,728 [AN] ==> [N] [actinic KERATOSIS]
9 52,777 [A|N] ==> [] [negative]
10 47,375 [NN] ==> [] [granulation tissue]
11 47,139 [A] ==> [] [void]
12 42,661 [NPN] ==> [] [adenocarcinoma of colon]
13 36,076 [AAAN] ==> [] [focal bowenoid actinic keratosis]
14 31,946 [NPAN] ==> [] [skin with actinic keratosis]
15 25,168 [BAN] ==> [] [focally invasive tumor]
16 22,761 [NCAN] ==> [] [ulcer and acute inflammation]
17 22,276 [ANN] ==> [] [exuberant granulation tissue]
18 16,791 [NN] ==> [N] [lung CARCINOMA]
19 15,577 [NAPN] ==> [] [carcinoma metastatic to lung]
20 13,764 [NNN] ==> [] [liver gallbladder pancreas]
PHRASE STRUCTURE GRAMMAR, PARSING.
[ adenocarcinoma of colon metastatic to lung ]
[ N P N A P N ]
PHRASE STRUCTURE GRAMMAR, UMLS CODES.
[ ADENOCARCINOMA OF COLON METASTATIC TO LUNG ]
[ C0001418 C0332285 C0009368 C0027627 C0332286 C0024109 ]
PHRASE STRUCTURE GRAMMAR, XML FORMAT.
<code-section scheme="UMLS">
<c type="morph" value="C0001418">adenocarcinoma
<c type="topo" value="C0009368">colon
<c type="morph" value="C0027627">metastatic
<c type="topo" value="C0024109">lung
</c>
</c>
</c>
</c>
</code-section>
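The XML listing nests each coded concept inside the previous one. The following is a Python sketch that uses the standard library's xml.etree.ElementTree to build that structure from the (word, type, code) triples of the example and to read the codes back. The root element is written as code-section, to match the listing's closing tag, and the attribute names come from the listing. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# (word, semantic type, UMLS code) triples from the example above;
# "of" and "to" are omitted, as in the XML listing.
CODED = [
    ("adenocarcinoma", "morph", "C0001418"),
    ("colon", "topo", "C0009368"),
    ("metastatic", "morph", "C0027627"),
    ("lung", "topo", "C0024109"),
]


def to_xml(coded):
    """Nest each <c> element inside the previous one, mirroring the listing."""
    root = ET.Element("code-section", scheme="UMLS")
    parent = root
    for word, kind, cui in coded:
        child = ET.SubElement(parent, "c", type=kind, value=cui)
        child.text = word
        parent = child
    return ET.tostring(root, encoding="unicode")


def codes_from_xml(xml_text):
    """Read the (word, type, code) triples back out of the XML, in document order."""
    root = ET.fromstring(xml_text)
    return [(c.text.strip(), c.get("type"), c.get("value")) for c in root.iter("c")]


xml_text = to_xml(CODED)
print(xml_text)
print(codes_from_xml(xml_text))
```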
A NOTE OF PESSIMISM.
"Linguistic theories ... do not cover varieties of
exceptional expressions which practical machine translation systems
have to handle. A machine translation system, which is still imperfect
and will never be completed, is exposed to very crude tests
when the system construction reaches a certain stage.
At that stage of development, the system is given
a comparatively simple sentence for translation,
with structures that can be analyzed by a grammar given to the system.
After completion, people other than those who developed the system
are asked to translate a variety of texts such as newspaper articles,
science magazines, patent documents, contract documents,
and commercial letters. Because the documents
have not been adequately tested at the development stage,
users are disappointed by the poor translation results
produced by the system. Many of the failures of the system
come from the fact that the dictionary and the grammar
are not sufficient to accept such unexpected input sentences." (Nagao, 1992)
Chapter 24. Conclusions: Future of NLP in medicine.
24.1. Terabytes of text information in medicine annually.
24.2. Raw materials for epidemiologic studies.
24.3. Competition: fast turnaround time versus tolerating
a grammatical filter (e.g., the Microsoft® Word® email filter; ugh!).
24.4. Acceptable phrase structure grammar rules: professional societies.
24.5. NLP reducible to synoptic reporting.
24.6. Physicians do not easily surrender control of their documents.
24.7. The test of Prof. Siegel (father of filmless radiology):
Who wins the first lawsuit?
Chapter 25. Problems for NLP in anatomic pathology.
25.1. Undetected associations between diseases,
e.g., Mesothelioma-asbestos.
25.2. Does one "outgrow" cancer?
Age-specific cancer incidences in an aging population.
Chapter 26. References.
Chapter 27. Mini-histories.
Chapter 28. Glossary.
CHAPTER 1.
INTRODUCTION.
1.1. Reasons for NLP in medicine.
There is currently a raging controversy
in anatomic pathology practice, and the fallout will eventually
reach our colleagues in other medical specialties. Anatomic pathologists
have always written their diagnostic reports in free text,
either English or some other medically competent language
(including Latin!). So far, my colleagues have successfully resisted
the onslaught of data-miners and administrators who want us to write
our diagnoses in standardized coding systems
(CAP, 2005;
Ackerman, 2005;
Ackerman, 2004).
This controversy was a major topic at the most recent meeting of
Advancing Practice, Instruction, and Innovation through Informatics
(APIII, 2005), and standardized reporting is a requirement
for hospitals accredited as certified cancer centers by the
College of American Pathologists (CAP,
2005) or by the American College of Surgeons
(ACS, 2005). The driving forces are
billing (Maung, 2005;
Hardhats, 2005) and regulation
(JCAHO, 2005). When do two diagnostic reports deserve
the same compensation, and what is the mix of cases
for a particular medical institution? It is hopeless to tabulate records
of this complexity manually. And, in my opinion, it is equally hopeless
to expect pathologists and other physicians to compose their reports
by making selections from pick-lists.
CHAPTER 2.
LINGUISTIC SCIENCE.
2.1. Characterize and explain linguistic observations.
CHAPTER 3.
RULE-BASED SYSTEMS.
3.1. Grammars in Ancient Civilizations.
CHAPTER 4.
GENERATIVE LINGUISTICS.
4.1.
Chomsky: describe the innate language (I-language).
CHAPTER 5.
ARTIFICIAL INTELLIGENCE.
5.1.
Build small systems that behave intelligently.
CHAPTER 6.
BASIC CONCEPTS.
6.1.
Fundamental questions.
CHAPTER 7.
COMPETENCE GRAMMAR.
7.1.
Property of the rational speaker.
CHAPTER 8.
AMBIGUITY OF LANGUAGE.
8.1.
Verbs, gerunds, gerundives.
CHAPTER 9.
CORPUS LINGUISTICS: INTRODUCTION.
9.1.
Text corpora: Brown corpus.
CHAPTER 10.
ZIPF'S LAWS.
10.1. Zipf's First Law.
10.2. Zipf's Second Law.
10.3. Zipf's Third Law.
CHAPTER 11.
COLLOCATIONS.
11.1.
Definition: Multiple word sequence.
CHAPTER 12.
CONCORDANCES.
12.1.
Biblical.
CHAPTER 13.
MATHEMATICAL FOUNDATIONS.
13.1.
Probability Theory.
CHAPTER 14.
STATISTICS.
14.1.
Estimation.
CHAPTER 15.
GENERAL LINGUISTICS.
15.1.
Parts-of-speech, morphology.
CHAPTER 16.
PHRASE STRUCTURE GRAMMAR.
16.1.
Grammar reduced to a sequence of phrases.
CHAPTER 17.
CONTEXT-FREE GRAMMAR.
17.1.
Surrounding context is irrelevant.
CHAPTER 18.
DEPENDENCY GRAMMAR.
18.1.
Definition: dependency between words.
CHAPTER 19.
CORPUS LINGUISTICS: SOURCES AND METHODS.
19.1.
Johns Hopkins Autopsy Resource.
CHAPTER 20.
WORDS AND PHRASES.
20.1.
Collocations.
CHAPTER 21.
SYNTAX.
21.1.
Markov models.
CHAPTER 22.
JHAR/JHSP CORPORA.
22.1.
JHAR.
CHAPTER 23.
STATISTICAL INVENTORY.
23.1. All words: Zipf's Law.
23.2. Barrier words: Zipf's Law.
23.3. Collocations: Zipf's Law.
23.4. Grammaticality: Zipf's Law.
23.5. BNF formulas: Zipf's Law.
CHAPTER 24.
FUTURE OF NLP IN MEDICINE.
24.1.
Terabytes of medical text annually.
CHAPTER 25.
PROBLEMS FOR NLP IN PATHOLOGY.
25.1.
Undetected associations.
CHAPTER 26.
REFERENCES.
PubMed.
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi
Ackerman AB.
Protocols for the reporting of cutaneous melanoma.
Am J Clin Pathol. 2004 Nov;122(5):815-817.
PMID: 15540388.
Ackerman AB.
Dermatologist not equal to dermatopathologist:
no place in a profession for pretenders.
J Am Acad Dermatol. 2005 Oct;53(4):698-699.
PMID: 16198796.
Ackerman AB.
Garble that derives from lack of definition.
Am J Dermatopathol. 2005 Aug;27(4):369-370.
PMID: 16121068.
Ackerman AB.
The future of pathology as a discipline: none without a dictionary!
Cesk Patol. 2005 Jan;41(1):4-5.
PMID: 15816116.
Ackerman AB.
Reviewer conflicts of interest should be disclosed.
J Am Acad Dermatol. 2005 Mar;52(3 Pt 1):538;
author reply 538; discussion 538-539.
PMID: 15761446.
Ackerman AB.
Decline of a discipline: abetment by journals.
J Cutan Pathol. 2005 Mar;32(3):254; author reply 254.
PMID: 15701091.
Kung JX, Ackerman AB.
Staging of melanoma: a critique of the most recent (2002) system
proposed by the American Joint Committee on Cancer: part II.
Am J Dermatopathol. 2005 Apr;27(2):165-167.
PMID: 15798445.
Bakotic B, Ackerman AB.
Staging of melanoma: a critique in historical perspective: part I.
Am J Dermatopathol. 2005 Apr;27(2):160-164.
PMID: 15798444.
Dabbs DJ, Geisinger KR, Ruggiero F, Raab SS, Nalesnik M,
Silverman JF; Association of Directors of Anatomic and Surgical Pathology.
Recommendations for the reporting of tissues removed as part
of the surgical treatment of malignant liver tumors.
Hum Pathol. 2004 Nov;35(11):1315-1323.
PMID: 15668887.
ADASP Reporting protocol.
Wei JT, Miller EA, Woosley JT, Martin CF, Sandler RS.
Quality of colon carcinoma pathology reporting: a process of care study.
Cancer. 2004 Mar 15;100(6):1262-1267.
PMID: 15022295.
ADASP Reporting protocol.
Jaffe ES, Banks PM, Nathwani B, Said J, Swerdlow SH.
Recommendations for the reporting of lymphoid neoplasms: A report from
the Association of Directors of Anatomic and Surgical Pathology.
Mod Pathol. 2004 Jan;17(1):131-135.
PMID: 14657953.
ADASP Reporting protocol.
Lawrence WD; Association of Directors of Anatomic
and Surgical Pathology.
ADASP recommendations for processing and reporting
of lymph node specimens submitted for evaluation of metastatic disease.
Virchows Arch. 2001 Nov;439(5):601-603. Review.
PMID: 11764377.
ADASP Reporting protocol.
Association of Directors of Anatomic and Surgical Pathology.
ADASP recommendations for processing and reporting lymph node specimens
submitted for evaluation of metastatic disease.
Am J Surg Pathol. 2001 Jul;25(7):961-963.
PMID: 11420470.
ADASP Committee. The Association of Directors
of Anatomic and Surgical Pathology.
ADASP recommendations for processing and reporting
of lymph node specimens submitted for evaluation of metastatic disease.
Mod Pathol. 2001 Jun;14(6):629-632.
PMID: 11406667.
ADASP Reporting protocol.
Kishi K.
Comments regarding the American Association of Directors of Anatomic
and Surgical Pathology (ADASP) recommendations for the reporting
of urinary bladder specimens containing bladder neoplasms: comparison
with the Japanese General Rule for Clinical and Pathological Studies
on Bladder Cancer.
Pathol Int. 1997 May;47(5):332.
PMID: 9143031.
ADASP Reporting protocol.
Association of Directors of Anatomic and Surgical Pathology.
Recommendations for the reporting
of resected large intestinal carcinomas.
Association of Directors of Anatomic and Surgical Pathology.
Am J Clin Pathol. 1996 Jul;106(1):12-15.
PMID: 8701921.
ADASP Reporting protocol.
Association of Directors of Anatomic and Surgical Pathology.
Recommendations for the reporting of breast carcinoma.
Association of Directors of Anatomic and Surgical Pathology.
Am J Clin Pathol. 1995 Dec;104(6):614-619.
PMID: 8526202.
ADASP Reporting protocol.
Simpson PR, Tschang TP.
ADASP recommendations: consultations in surgical pathology.
Association of Directors of Anatomic and Surgical Pathology.
Hum Pathol. 1993 Dec;24(12):1382.
PMID: 8276389.
ADASP Reporting protocol.
Aitchison J.
Teach Yourself Linguistics. Fifth Edition.
Chicago: NTC/Contemporary Publishing Co. 2000.
ISBN: 0844226688.
Bengtsson S, Schneider W, Spencer WA, Pratt AW,
Kastner VV, Reichertz P, Lamson BG, Anderson J.
The application of computer techniques in health care.
World Hosp. 1976;12(1):47-51.
PMID: 1024332.
Berman JJ, Moore GW.
Object-oriented controlled-vocabulary translator
using TRANSOFT + HyperPAD.
Proc Annu Symp Comput Appl Med Care. 1991;15:973-975.
PMID: 1807773.
Berman JJ.
Tumor classification: molecular analysis meets Aristotle.
BMC Cancer. 2004 Mar 17;4:10.
PMID: 15113444.
Borst F, Lyman M, Nhan NT, Tick LJ, Sager N, Scherrer JR.
TEXTINFO: a tool for automatic determination
of patient clinical profiles using text analysis.
Proc Annu Symp Comput Appl Med Care. 1991;:63-67.
PMID: 1807679.
Bundy A, ed.
Artificial Intelligence Techniques: A Comprehensive Catalogue.
Fourth, Revised Edition.
Heidelberg: Springer Verlag. 1997.
ISBN: 3540593233.
Chi EC, Sager N, Tick LJ, Lyman MS.
Relational data base modelling of free-text medical narrative.
Med Inform (Lond). 1983 Jul-Sep;8(3):209-223.
PMID: 6600043.
Chomsky N.
Morphophonemics of Modern Hebrew.
Undergraduate Honors Essay.
University of Pennsylvania. 1949.
Cited in: Newmeyer FJ. Generative Linguistics.
A Historical Perspective. London: Routledge. 1996.
Chomsky N.
Syntactic Structures.
The Hague: Mouton. 1957.
Chomsky N.
The development of grammar in child language: Formal discussion.
Monogr Soc Res Child Dev. 1964;29:35-39.
PMID: 14125365.
Chomsky N.
Aspects of the Theory of Syntax.
Cambridge, MA: MIT Press. 1965.
Chomsky N.
Language and Mind.
San Diego: Harcourt Brace Jovanovich. 1968.
Chomsky N.
Rules and Representations.
New York: Columbia University Press. 1980.
Chomsky N.
Knowledge of Language: Its Nature, Origin, and Use.
New York: Praeger. 1986.
Chomsky N.
The Minimalist Program.
Cambridge, MA: MIT Press. 1995.
Chomsky N.
Universals of human nature.
Psychother Psychosom. 2005;74(5):263-268.
PMID: 16088263.
Cios KJ, Moore GW.
Medical Data Mining and Knowledge Discovery: Overview.
Chapter 1. In: Cios KJ. Medical Data Mining and Knowledge Discovery.
Berlin: Springer Verlag. 2000;1:1-16.
ISBN: 3-7908-1340-0, 502 pages.
Published within the series: "Studies in Fuzziness and Soft Computing",
Physica-Verlag Heidelberg, a Springer-Verlag Company.
Condon EU.
Statistics of vocabulary.
Science 1928;67:300.
Craig J, Bevington W.
Designing with type. A basic course in typography. Fourth edition.
New York: Watson-Guptill Publications. 1999.
ISBN 0-8230-1347-2, 176 pages.
Chapter 1. Origins of the Alphabet. pp. 8-11.
Dunham GS, Pacak MG, Pratt AW.
Automatic indexing of pathology data.
J Am Soc Inf Sci. 1978 Mar;29(2):81-90.
PMID: 10318395.
Estoup JB.
Gammes Sténographiques. Fourth Edition.
Paris. 1916.
Fedorowicz J.
A Zipfian model of an automatic bibliographic system:
An application to MEDLINE.
J Am Soc Info Sci 1982;33:223-232.
Fitch WT, Hauser MD, Chomsky N.
The evolution of the language faculty: Clarifications and implications.
Cognition. 2005 Sep;97(2):179-210.
PMID: 16112662.
Giere W.
Foundations of clinical data automation in cooperative programs.
Proc 5th Ann Symp Comp Applic Med Care. 1981;5:1142-1148.
Graepel PH, Henson DE, Pratt AW.
Comments on the use of the Systematized Nomenclature of Pathology.
Methods Inf Med. 1975 Apr;14(2):72-75.
PMID: 1207468.
Description of VistA® Filemanager.
http://www.hardhats.org
Includes instructions for obtaining at-cost copies of the
complete, public-domain system, through the Freedom of Information Act.
Harris Z.
Methods in Structural Linguistics.
Chicago: University of Chicago Press. 1951.
Hauser MD, Chomsky N, Fitch WT.
The faculty of language: what is it, who has it, and how did it evolve?
Science. 2002 Nov 22;298(5598):1569-1579. Review.
PMID: 12446899.
Hirschman L, Story G, Marsh E, Lyman M, Sager N.
An experiment in automated health care evaluation
from narrative medical records.
Comput Biomed Res. 1981 Oct;14(5):447-463.
PMID: 7273723.
Huff D.
How to lie with statistics.
New York: W. W. Norton & Company. 1954.
ISBN 0-393-31072-8, 142 pages.
Hutchins WJ.
Machine Translation: Past, Present, Future.
Chichester/New York: Ellis Horwood/Wiley. 1986.
Ellis Horwood Series in Computers and Their Applications. ASIN: 0135435218.
Hutchins GM, Berman JJ, Moore GW, Hanzlick R,
the Autopsy Committee of the College of American Pathologists.
Practice Guidelines for Autopsy Pathology.
Arch Pathol Lab Med. 1999;123:1085-1092.
Joseph DM, Wong RL.
Correction of misspellings and typographical errors
in a free-text medical English
information storage and retrieval system.
Methods Inf Med. 1979 Oct;18(4):228-234.
Justeson JS, Katz SM.
Technical terminology: some linguistic properties
and an algorithm for identification in text.
Natural Language Engineering. 1995;1:9-27.
December 7, 2003: The master critic - The late Hugh Kenner's theory
of everything. By John Wilson. The Boston Globe / available from Boston.com.
"When Hugh Kenner died on Nov. 24, a few weeks shy of his 81st birthday,
the first problem for writers of obituaries and tributes was how to
categorize him. ... He was himself a 'pattern recognizer,' as he described
inventor Raymond Kurzweil in the December 1990 issue of the pioneering
personal computer magazine Byte. ... This openness to experience, this
confidence that the patterns he saw derived from some ultimate coherence,
must have been owing in part to Kenner's faith, a subject about which he was
reticent in his writing. ... [W]hile some of his coreligionists were wringing
their hands about the implications of artificial intelligence -- and while
MIT's Marvin Minsky was proclaiming that human beings are machines made out
of meat -- Kenner was busy devising, with Joseph O'Rourke, a computer program
called TRAVESTY, which manipulates a text to create odd effects of language.
Later, with Charles Hartman, Kenner published a volume of computer-generated
poetry, 'Sentences.'"
Kenner's Corollary: Article in Discover Magazine, circa 1985:
The idea that a desk with an "archeologic ordering"
of papers, i.e., chronological with most recently used papers
at the top of the pile, is a demonstration of Zipf's Law.
That is, the papers that account for 90% of uses typically appear
in the top 10% of the pile.
Kucera H, Francis WN.
Computational Analysis of Present-Day American English.
Providence, RI: Brown University Press. 1967.
Laird CG.
The miracle of language.
Fawcett Publications. 1965.
ASIN: B0007I1X2Y, 255 pages.
Lewis CI, Langford CH.
Symbolic Logic. Second Edition.
New York: Dover Publications, Inc. 1932.
Li W.
Zipf's Law Bibliography.
http://linkage.rockefeller.edu/wli/zipf/index_ru.html
Lyman M, Sager N, Tick L, Nhan N, Borst F, Scherrer JR.
The application of natural-language processing
to healthcare quality assessment.
Med Decis Making. 1991 Oct-Dec;11(4 Suppl):S65-S68.
PMID: 1770852.
Mandelbrot B.
Structure formelle des textes et communication.
Word 1954;10:1-27.
Manning CD, Schütze H.
Foundations of Statistical Natural Language Processing.
Cambridge, MA: The MIT Press. 2000.
ISBN: 0262133601, 680 pages.
http://www-nlp.stanford.edu/fsnlp/intro/
Markov AA.
An example of statistical investigation in the text
of Eugene Onegin, illustrating coupling of tests in chains.
Proc Acad Sci St Petersburg 1913;7:153-162.
Markov was a student of Tschebyscheff.
Masarie FE Jr, Miller RA, Bouhaddou O, Giuse NB, Warner HR.
An Interlingua for Electronic Interchange of Medical Information:
Using Frames to Map Between Clinical Vocabularies.
Comput Biomed Res. 1991;24(4):379-400.
Maung RTA.
What is the best indicator to determine anatomic pathology workload?
Canadian experience.
Am J Clin Pathol. 2005;123:45-55.
Upstate Medicare Division.
Sample CPT® Fee Schedule: Upstate Medicare Division, 2004 Fee Schedule.
http://www.umd.nycpic.com/2004_80000-89999.html
Accessed January 18, 2005.
From:
http://www.umd.nycpic.com/
Note: CPT® NUMBER and CPT® DESCRIPTOR are copyrighted products
of the American Medical Association.
Minsky M, Hillis D, Rudisch G.
Artificial intelligence.
N Engl J Med. 1980 Jun 26;302(26):1482.
PMID: 7374720.
Moore GW, Miller RE, Hutchins GM, Riede UN, Polacsek RA.
Multilingual translation techniques in the analysis
of narrative medical text.
Proc Annu Symp Comput Appl Med Care. 1985;9.
November 10-13, 1985, Baltimore, MD.
Moore GW, Miller RE, Hutchins GM.
Microcomputer translator for medical text:
Theorem verification for Chapter Two of Zeman's Modal Logic.
Adv Math Comput Med. 7:1621-1633, 1986.
Moore GW, Riede UN, Polacsek RA, Miller RE, Hutchins GM.
Automated translation of German to English medical text.
Am J Med. 1986 Jul;81(1):103-111.
PMID: 3755289.
Moore GW, Riede UN, Polacsek RA, Miller RE, Hutchins GM.
Group theory approach to computer translation of medical German.
Methods Inf Med. 1986 Jul;25(3):176-182.
PMID: 3755498.
Moore GW, Polacsek RA, Erozan YS,
de la Monte SM, Miller RE, Hutchins GM, Riede UN.
Multilingual translation techniques in the analysis
of narrative medical text.
Comput Methods Programs Biomed. 1986 Mar;22(1):35-42.
PMID: 3634670.
Moore GW, Hutchins GM, Boitnott JK, Miller RE, Polacsek RA.
Word root translation of 45,564 autopsy reports into MeSH titles.
Proc Annu Symp Comput Appl Med Care. 1987;11.
Washington DC, November 1-4, 1987.
Moore GW, Boitnott JK, Miller RE, Eggleston JC, Hutchins GM.
Integrated anatomic pathology reporting system
using natural language diagnoses.
Modern Pathol 1988;1:44-50.
Moore GW, Miller RE, Hutchins GM.
Indexing by MeSH titles of natural language pathology phrases
identified on first encounter using the Barrier Word Method.
In: Scherrer JR, Cote RA, Mandil SH, eds.
Computerized Natural Medical Language Processing
for Knowledge Representation. North-Holland. 1989;:29-39.
Moore GW, Wakai I, Satomura Y, Giere W.
TRANSOFT: Medical translation expert system.
Artif Intell Med 1:149-157, 1989.
Moore GW.
TRANSOFT: Public-domain English-to-SNOMED computer translation shell,
using the DVA File Manager. Abstract.
Mod Pathol. 4:123A, 1991.
Moore GW.
Medical Expert System User Interface. Editorial.
Artif Intell Med. 1991:15;.
Moore GW, Berman JJ, Hanzlick RL, Buchino JJ, Hutchins GM.
A prototype internet autopsy database:
1625 consecutive fetal and neonatal autopsy facesheets
spanning twenty years.
Arch Pathol Lab Med. 1996;120:782-785.
http://www.medparse.com/protoiad.htm
Moore GW, Berman JJ.
Anatomic Pathology Data Mining.
Chapter 4. In: Cios KJ.
Medical Data Mining and Knowledge Discovery.
Berlin: Springer Verlag. 2000;4:61-107.
ISBN: 3-7908-1340-0, 502 pages.
Published within the series: "Studies in Fuzziness and Soft Computing",
Physica-Verlag Heidelberg, a Springer-Verlag Company.
http://www.medparse.com/apdmchap.htm
Nagao M.
Machine Translation.
In: Shapiro SC, ed. Encyclopedia of Artificial Intelligence.
Volume 2. M-Z. New York: Wiley-Interscience. 1992;2:898-902.
A nice quote from one of the leaders in the field that captures
the fruitlessness of open-ended programs for computer translation:
"Linguistic theories ... do not cover varieties of
exceptional expressions which practical machine translation systems
have to handle. A machine translation system, which is still imperfect
and will never be completed, is exposed to very crude tests
when the system construction reaches a certain stage.
At that stage of development, the system is given
a comparatively simple sentence for translation,
with structures that can be analyzed by a grammar given to the system.
After completion, people other than those who developed the system
are asked to translate a variety of texts such as newspaper articles,
science magazines, patent documents, contract documents,
and commercial letters. Because the documents
have not been adequately tested at the development stage,
users are disappointed by the poor translation results
produced by the system. Many of the failures of the system
come from the fact that the dictionary and the grammar
are not sufficient to accept such unexpected input sentences."
Naur P.
Revised Report on the Algorithmic Language ALGOL 60.
Comm ACM. 1960 May;3(5):299-314.
Nelson SJ, Olson NE, Fuller L, Tuttle MS, Cole WG, Sherertz DD.
Identifying concepts in medical knowledge.
Medinfo. 1995;8:33-36.
Newmeyer FJ.
Generative Linguistics. A Historical Perspective.
London: Routledge. 1996.
Pacak MG, Pratt AW.
Identification and transformation of terminal morphemes
in medical English part II.
Methods Inf Med. 1978 Apr;17(2):95-100.
PMID: 661609.
Pareto V.
Cours d'économie politique.
Geneva: Droz. 1896.
Lausanne and Paris: Rouge. 1897.
Pareto's Principle, a predecessor of Zipf's Law.
Pratt AW, Pacak M.
Identification and transformation
of terminal morphemes in medical English.
Methods Inf Med. 1969 Apr;8(2):84-90.
PMID: 5819388.
Pratt AW.
Interactive data processing in the medical research institution.
Methods Inf Med Suppl. 1976;10:65-76.
PMID: 1078477.
Sager N, Bross ID, Story G, Bastedo P, Marsh E, Shedd D.
Automatic encoding of clinical narrative.
Comput Biol Med. 1982;12(1):43-56.
PMID: 7075165.
Sager N, Wong R.
Developing a database from free-text clinical data.
J Clin Comput. 1983;11(5-6):184-194.
PMID: 10278191.
Sager N, Lyman M, Tick LJ, Nhan NT, Bucknall CE.
Natural language processing of asthma discharge summaries
for the monitoring of patient care.
Proc Annu Symp Comput Appl Med Care. 1993;:265-268.
PMID: 8130474.
Sager N, Lyman M, Bucknall C, Nhan N, Tick LJ.
Natural language processing and the representation of clinical data.
J Am Med Inform Assoc. 1994 Mar-Apr;1(2):142-160. Review.
PMID: 7719796.
Sager N, Lyman M, Nhan NT, Tick LJ.
Automatic encoding into SNOMED III: a preliminary investigation.
Proc Annu Symp Comput Appl Med Care. 1994;:230-234.
PMID: 7949925.
Sager N, Lyman M, Nhan NT, Tick LJ.
Medical language processing: applications
to patient data representation and automatic encoding.
Methods Inf Med. 1995 Mar;34(1-2):140-146.
PMID: 9082123.
Salton G.
Automatic text analysis.
Science. 1970 Apr 17;168(929):335-343.
PMID: 5435890.
Salton G.
Experiments in automatic thesaurus construction
for information retrieval.
In: Proceedings IFIP Congress, 1971;:43-49.
Salton G, ed.
The Smart Retrieval System
- Experiments in Automatic Document Processing.
Englewood Cliffs, NJ: Prentice-Hall. 1971.
Salton G, McGill MJ.
Introduction to modern information retrieval.
New York: McGraw-Hill. 1983.
Salton G, Fox EA, Wu H.
Extended boolean information retrieval.
Communications of the ACM 1983;26:1022-1036.
Salton G, Buckley C, Fox EA.
Automatic query formulations in information retrieval.
J Am Soc Inf Sci. 1983 Jul;34(4):262-280.
PMID: 10299297.
Salton G.
Automatic Text Processing:
The Transformation, Analysis, and Retrieval of Information by Computer.
Reading, MA: Addison Wesley. 1989.
Salton G, Buckley C.
Global text matching for information retrieval.
Science. 1991;253:1012-1015.
Salton G, Allan J.
Selective text utilization and text traversal.
In: Proceedings of ACM Hypertext 93, New York.
New York: Association for Computing Machinery. 1993.
Salton G, Allan J, Buckley C, Singhal A.
Automatic analysis, theme generation
and summarization of machine-readable texts.
Science 1994;264:1421-1426.
Sawyer R, Berman JJ, Borkowski A, Moore GW.
Elevated prostate-specific antigen levels in black men and white men.
Mod Pathol. 1996 Nov;9(11):1029-1032.
http://www.medparse.com/elevpsal.htm
Sorace JM, Berman JJ, Carnahan GE, Moore GW.
PRELOG: precedence logic inference software for blood donor deferral.
Proc Annu Symp Comput Appl Med Care. 1991;:976-977.
PMID: 1807774.
Suppes P.
Introduction to Logic.
New York: Van Nostrand. 1957.
Suppes P.
Probabilistic grammars for natural languages.
Synthese 1970;22:95-116.
Suppes P.
Axiomatic Set Theory.
New York: Dover Publications. 1972.
ISBN: 0486616304.
Suppes P.
Probabilistic Metaphysics.
Oxford: Blackwell. 1984.
Suppes P, Bottner M, Liang L.
Machine learning comprehension grammars for ten languages.
Computational Linguistics. 1996;22:329-350.
Taylor M, Saltz J, Nichols JH.
Design of an Integrated Clinical Data Warehouse.
J Assn Lab Automation. 2000. in press.
Tersmette KWF, Scott AF, Moore GW, Matheson NW, Miller RE.
Barrier word method for detecting molecular biology multiple word terms.
Proc Annu Symp Comput Appl Med Care. 1988;12:207-211.
Washington DC, November 6-9, 1988.
Twain M.
Life on the Mississippi.
Reissue edition, with Kaplan J. New York: Signet Classics. 2001 Nov 7.
ISBN: 0451528174, 359 pages.
See:
http://en.wikipedia.org/wiki/Mark_Twain
Tymoczko T, ed.
New Directions in the Philosophy of Mathematics.
Princeton, NJ: Princeton University Press. 1998.
U. S. National Library of Medicine.
Unified Medical Language System.
http://www.nlm.nih.gov/research/umls/
U. S. National Library of Medicine.
UMLS Knowledge Sources. Eleventh Edition.
Unified Medical Language System.
U. S. Department of Health and Human Services.
National Institutes of Health.
National Library of Medicine. 2000.
U. S. National Library of Medicine.
UMLS Knowledge Sources. Tenth Edition.
Unified Medical Language System.
U. S. Department of Health and Human Services.
National Institutes of Health.
National Library of Medicine. 1999.
U. S. National Library of Medicine.
UMLS Knowledge Sources. Ninth Edition.
Unified Medical Language System.
U. S. Department of Health and Human Services.
National Institutes of Health.
National Library of Medicine. 1998.
Wilbur WJ.
Overview of Books at NCBI.
http://www.ncbi.nlm.nih.gov:80/books/mboc/bookshelp/bookover.html#link
Wingert F.
[PAULA: program for evaluation of logical expressions.
Plausibility-control and evaluation of optical mark reader forms]
Methods Inf Med. 1972 Apr;11(2):96-103.
PMID: 5026579.
Wingert F, Ries P.
[Pathology findings system]
Methods Inf Med. 1973 Jul;12(3):150-155. German.
PMID: 4729117.
Wingert F.
[Morphosyntactical analysis of compound word forms in medical language]
Methods Inf Med. 1977 Oct;16(4):248-255. German.
PMID: 337050.
Wingert F.
Morphologic analysis of compound words.
Methods Inf Med. 1985 Jul;24(3):155-162.
PMID: 4033445.
Wingert F.
Automated indexing based on SNOMED.
Methods Inf Med. 1985 Jan;24(1):27-34.
PMID: 3982279.
Wingert F.
An indexing system for SNOMED.
Methods Inf Med. 1986 Jan;25(1):22-30.
PMID: 3753739.
Wingert F.
Automated indexing of SNOMED statements into ICD.
Methods Inf Med. 1987 Jul;26(3):93-98.
PMID: 3670105.
Wingert F.
Medical linguistics: automated indexing into SNOMED.
Crit Rev Med Inform. 1988;1(4):333-403.
PMID: 3288353.
Wittgenstein L.
Philosophical Investigations [Philosophische Untersuchungen].
Third edition.
Oxford: Basil Blackwell. 1968.
Wong RL, Gaynon P.
An automated parsing routine for diagnostic statements
of surgical pathology reports.
Methods Inf Med. 1971 Jul;10(3):168-175.
Wong RL, Reno JD, Hain TC, Platt RC, Gaynon PS, Joseph DM.
Profile of a dictionary compiled from scanning
over one million words of surgical pathology narrative text.
Comput Biomed Res. 1980 Aug;13(4):382-398.
Yu CC-Y, Moore GW, Unschuld PU.
Romanized Chinese respelling rules for an English medical word list.
Proc Annu Symp Comput Appl Med Care. 1987;11.
Washington DC, November 1-4, 1987.
Zhang Q.
Easy entry of Chinese character set symbols.
Proc 5th Ann Symp Comp Appl Med 1981;5:143-149.
Zipf GK.
Relative frequency as a determinant of phonetic change.
Harvard Studies in Classical Philology 1929;40:1-95.
Zipf GK.
Selected Studies of the Principle of Relative Frequency in Language.
Cambridge, MA: Harvard University Press. 1932.
Zipf GK.
The Psycho-Biology of Language.
Boston, MA: Houghton Mifflin. 1935.
Reprinted: Cambridge, MA: MIT Press. 1965.
Zipf GK.
National Unity and Disunity: The Nation As a Bio-Social Organism.
Bloomington, IN: Principia Press. 1941.
Zipf GK.
Human Behavior and The Principle of Least Effort.
An Introduction to Human Ecology.
Reading, MA: Addison-Wesley Press. 1949;:19-55.
Campbell JR, Carpenter P, Sneiderman C, Cohn S, Chute CG, Warren J.
Phase II evaluation of clinical coding schemes:
completeness, taxonomy, mapping, definitions, and clarity.
CPRI Work Group on Codes and Structures.
J Am Med Inform Assoc. 1997 May-Jun;4(3):238-251.
http://www.pubmedcentral.gov/articlerender.fcgi?tool=pubmed&pubmedid=9147343
Chute CG, Cohn SP, Campbell KE, Oliver DE, Campbell JR.
The content coverage of clinical classifications.
For The Computer-Based Patient Record Institute's Work Group
on Codes & Structures.
J Am Med Inform Assoc. 1996 May-Jun;3(3):224-233.
PMID: 8723613.
http://www.pubmedcentral.gov/articlerender.fcgi?tool=pubmed&pubmedid=8723613
Campbell JR, Payne TH.
A Comparison of Four Schemes for Codification of Problem Lists.
Proc Annu Symp Comput Appl Med Care. 1994;:201-205. Washington, DC.
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=7949920&query_hl=4
Humphreys BL, McCray AT, Cheh ML.
Evaluating the coverage of controlled health data terminologies:
report on the results of the NLM/AHCPR large scale vocabulary test.
J Am Med Inform Assoc. 1997 Nov-Dec;4(6):484-500.
http://www.pubmedcentral.gov/articlerender.fcgi?tool=pubmed&pubmedid=9391936
U. S. National Library of Medicine.
Papers covering UMLS/SNOMED/Read Codes in different domains:
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=pubmed&cmd=Display&dopt=pubmed_pubmed&from_uid=9147343
Apelon Resources:
http://www.apelon.com/literature/conferencepapers.htm
Evaluation of SNOMED coverage of VHA Terms.
http://www.apelon.com/literature/papers/FinalVA_SNOMEDPaper.pdf
Lingologix is a commercial tool that uses
NLP to map terms to SNOMED CT; it is in clinical use at Mayo and Hopkins:
http://www.lingologix.com/
CHAPTER 27.
MINI-HISTORIES.
Genesis 11:1-9 [circa 4000 BC]. Tower of Babel.
According to this story, all persons on earth once spoke a single language.
The people attempted to build a tower reaching to heaven. Because of their
arrogance, God punished them by confounding their languages, and their
building project failed.
There are now over 2000 distinct written languages on earth.
In this story, different languages are viewed as a curse,
a barrier to understanding.
Aristotle
(Αριστοτελης)
[384 BC - 322 BC].
Greek philosopher, who compiled an encyclopedia of all scientific
and other human knowledge available at that time. Aristotle's Rule:
for any positive x and y with x > y, there exists an integer
n > 0 such that n·y > x. Note that if
y=0, the rule doesn't work. This and other pernicious properties
of zero caused Aristotle to avoid the concept. Zero was rediscovered
and developed almost a millennium later by Indian and Arabic mathematicians.
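The rule can be checked mechanically. The sketch below is a minimal Python illustration, assuming positive real x and y and a positive integer n; the function name `smallest_multiplier` is hypothetical. It returns the smallest n with n·y > x, and it refuses y = 0, the case the text says breaks the rule.

```python
import math

def smallest_multiplier(x: float, y: float) -> int:
    """Return the smallest positive integer n such that n * y > x.

    Assumes x > 0. For y > 0 such an n always exists; for y == 0 it never
    does, because n * 0 = 0 can never exceed a positive x.
    """
    if x <= 0:
        raise ValueError("x must be positive")
    if y <= 0:
        raise ValueError("no n exists: n * y never exceeds a positive x")
    n = math.floor(x / y) + 1
    # Guard against floating-point rounding in x / y
    while n * y <= x:
        n += 1
    while n > 1 and (n - 1) * y > x:
        n -= 1
    return n

print(smallest_multiplier(10, 3))  # 4, since 3 * 3 = 9 but 4 * 3 = 12
```

Calling `smallest_multiplier(10, 0)` raises `ValueError`, which is the code's version of "the rule doesn't work" for y = 0.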
See:
http://en.wikipedia.org/wiki/Aristotle
Rosetta Stone [196 BC]
The Rosetta Stone is a dark granite stone with writing in two languages,
Egyptian and Greek, using three scripts: Hieroglyphic Egyptian,
Demotic Egyptian, and Greek. Because Greek was well known,
the stone was important to scholars for deciphering the hieroglyphs.
Ptolemy V assumed the crown at age five, and was faced with the task
of reclaiming lands lost to various invaders. As an attempt
to reestablish legitimacy for Ptolemy, his priests issued a series
of decrees, inscribed on stones and distributed throughout Egypt.
The Rosetta stone is the decree issued in the city of Memphis.
The stone describes various taxes repealed by Ptolemy V, and instructs
that his statues be erected in temples and that the decree be inscribed in three scripts.
"Rosetta" has become a byword for "translation", and some
computerized translation systems have "Rosetta" as part of their name.
See:
http://en.wikipedia.org/wiki/Rosetta_Stone
Qin Shi-Huang (秦始皇) [260 BC - 210 BC]
First emperor of China (Qin = Ch'in) and founder of the Qin Dynasty,
who unified the country administratively and linguistically,
in part by burning all books which disagreed with his regime.
The advantage of this linguistic unification is that a document
written in one part of China can be read anywhere else in China
(assuming that the readers are literate), even though the spoken languages
(so-called dialects) are mutually unintelligible.
Everyone was REQUIRED to adopt the imperial ideograms, or else.
He is also reported to have executed 460 scholars (The Ten Crimes of Qin). See:
http://en.wikipedia.org/wiki/Qin_Shi_Huang
The subject of the rise of Emperor Qin, and the conflict
of scholarship versus political unification, is treated
in the movie Ying Xiong (2002) (Hero, starring Jet Li,
Mandarin with English subtitles).
See:
http://www.imdb.com/title/tt0299977/
Acts 2:1-15. [circa 35 AD] The Christian Pentecost miracle,
where the Holy Spirit descends upon a group of disciples,
and allows them to preach in many different languages. In contrast
to the Tower of Babel, this biblical passage presents the multiple
languages of the earth in a positive light.
Masada. [72 AD] Site of an apparent mass suicide
by first-century Jews who chose death rather than be conquered and subjugated
to the spiritual and linguistic demands of the Roman Empire.
Chronicled by Flavius Josephus, a first-century Jewish historian,
based upon eye-witness accounts. Masada
(Hebrew: מצדה = fortress)
was built by Herod the Great between 37 and 31 BC as a refuge for himself,
in case his subjects should rise up against him. In 66 AD,
a group of Jewish rebels seized Masada from the Roman garrison and
used it as their base for raiding and harassing local settlements.
In 72 AD, the Roman governor of Judaea, Lucius Flavius Silva,
marched against Masada and eventually built a rampart against
the western plateau, using thousands of tons of stones and beaten earth.
Silva finally breached the wall of the fortress with a battering ram.
When the Romans entered the fortress, they discovered that its defenders
had set all the buildings ablaze and committed mass suicide,
rather than face certain capture or defeat. See:
http://en.wikipedia.org/wiki/Masada
Gaius Suetonius Tranquillus:
Lives of the Grammarians and Rhetoricians.
"The science of grammar was in ancient times far from being in
vogue at Rome; indeed, it was of little use in a rude state of society,
when the people were engaged in constant wars, and had not much time to
bestow on the cultivation of the liberal arts. At the outset, its
pretensions were very slender, for the earliest men of learning, who were
both poets and orators, may be considered as half-Greek: I speak of
Livius and Ennius, who are acknowledged to have taught both
languages as well at Rome as in foreign parts. But they only
translated from the Greek, and if they composed anything
of their own in Latin, it was only from what they had before read.
For although there are those who say that this Ennius published two books,
one on "Letters and Syllables," and the other on "Metres," Lucius Cotta
has satisfactorily proved that they are not the works of the poet Ennius,
but of another writer of the same name...."
Translation by Alexander Thomson, MD.
See:
http://en.wikipedia.org/wiki/Suetonius
http://classicpersuasion.org/pw/cicero/suetoniusrhetor.htm
Rev. Thomas Bayes.
English Presbyterian minister whose work on conditional probability is the source of Bayes' theorem.
See:
http://en.wikipedia.org/wiki/Bayes
Benjamin Disraeli, Earl of Beaconsfield (1804-1881).
Conservative British Prime Minister during the Victorian Era.
"There are lies, damn lies, and statistics." It is not an accident
that statistics developed in Great Britain, and that the world's best
statisticians still live and work there. Great Britain is an island nation,
and has always made its national livelihood from maritime trade.
Ships at sea, like dice at a gaming table, are subject to chance
occurrences. In his career, Disraeli must have seen more than his share
of deceptive statistics.
See:
http://en.wikipedia.org/wiki/Benjamin_Disraeli
John Maynard Keynes (1883-1946).
"In the long run, we're all dead."
British economist, who developed concepts of national fiscal
and monetary policy. Many economic theories distinguish between
short-run and long-run processes, without really specifying how long
is long-run. This quote is Keynes's ridicule of this particular paradox
of academic economics.
See:
http://en.wikipedia.org/wiki/John_Maynard_Keynes
Karl Pearson.
Early twentieth century British statistician, who introduced the
correlation coefficient, or Pearson's r.
Father of E. S. Pearson, another twentieth century statistical giant.
See:
http://en.wikipedia.org/wiki/Karl_Pearson
Andrey N. Kolmogorov.
Great 20th-century Russian statistician and mathematician, who introduced many
non-parametric methods in statistics, including the Kolmogorov-Smirnov test.
See:
http://en.wikipedia.org/wiki/Kolmogorov
George Boole [1815-1864]
British mathematician and philosopher.
As the inventor of Boolean algebra, the basis
of all modern computer arithmetic, Boole is regarded
as one of the founders of the field of computer science,
although computers did not exist in his day.
See:
http://en.wikipedia.org/wiki/George_Boole
Col. John Shaw Billings [1838 - 1913].
U. S. surgeon and librarian, born in Indiana. In the Civil War,
Billings was medical inspector of the Army of the Potomac. After the war,
he directed the Surgeon General's Library in Washington, DC.
The library's catalog grew greatly under his supervision by 1873,
and soon thereafter, Billings began work on the Index Catalogue.
Sixteen volumes appeared before his military retirement. In 1879,
he initiated the Index Medicus, a monthly guide to current
medical literature, which eventually became PubMed, curated
by the U. S. National Library of Medicine. Dr. Billings designed
plans for the construction of Johns Hopkins Hospital. His works
include classic essays on hospital administration and training.
Under his leadership (1864 - 1895), the National Library of Medicine
became one of the greatest medical library systems in the world.
Émile Baudot
The Baudot code, a five-bit code invented by the Frenchman Émile Baudot
in 1870, was used extensively in telegraph systems.
Ludwig Josef Johann Wittgenstein [1889 - 1951]
was an Austrian philosopher, who contributed several ground-breaking works
to modern philosophy, primarily on the foundations of logic
and the philosophy of language. He is widely regarded
as one of the most influential philosophers of the 20th century.
See:
http://en.wikipedia.org/wiki/Ludwig_Wittgenstein
George Kingsley Zipf [1902-1950] was an American linguist
and philologist, who studied the statistical properties
of different languages. He is the eponym of Zipf's Law
(actually, Zipf's First Law), which states that only a few words
are used very often, whereas many or most words are used rarely,
according to the formula:
f = k/r
where f is word-frequency, r is word-rank, and
k is a constant. Zipf's work was treated harshly
when it first appeared, perhaps somewhat justifiably
because Zipf's claims were so grandiose: namely, an explanation
for all linguistic usage in all major human languages.
Also, Zipf's "principle of least effort" (i.e., speakers use a few words
repeatedly, because they are linguistically lazy) has never been verified
experimentally.
As recently as a few years ago, a humanities professor from a prestigious
east-coast university made disparaging remarks to me about Zipf's work.
(This wasn't a friendly conversation: I impugned the professor's
abilities and discernment as a scientist.)
However, Zipf was right. His basic claim (i.e., Zipf's First Law)
has been verified for many major languages, including
English, German, and Chinese, as well as for specialized bodies
of medical text, including the Frankfurt University
Medical Consultation Database and
the Johns Hopkins University Autopsy Facesheets.
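A minimal sketch of how the First Law is usually checked: rank the words by frequency and see whether f × r stays roughly constant. The tokenizer, the function names `rank_frequency` and `zipf_k`, and the toy text are illustrative assumptions, not the method used for the corpora named above. The toy text is constructed to obey f = 12/r exactly, whereas real word counts fit the law only approximately.

```python
import re
from collections import Counter

def rank_frequency(text):
    """Return (rank, word, frequency) triples, most frequent word first."""
    # Crude tokenizer (an assumption): lowercase runs of ASCII letters
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    # most_common() orders by descending count; ties keep first-seen order
    return [(rank, word, freq)
            for rank, (word, freq) in enumerate(counts.most_common(), start=1)]

def zipf_k(text):
    """Zipf's First Law, f = k/r, implies f * r should be roughly constant."""
    return [freq * rank for rank, _, freq in rank_frequency(text)]

# Toy text built so that f = 12/r holds exactly
toy = "a " * 12 + "b " * 6 + "c " * 4 + "d " * 3
for rank, word, freq in rank_frequency(toy):
    print(rank, word, freq, freq * rank)   # each line ends in 12
```

On a real corpus, the spread of the `zipf_k` values is a quick, informal measure of how closely the counts follow f = k/r.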
Major internet indexing systems (google.com, yahoo.com) apparently
exploit Zipf's First Law, although their exact search algorithms
are closely guarded trade secrets.
Furthermore, as anyone knows who has studied a second language,
beginning (i.e., first-year) textbooks typically introduce fewer than
a thousand words. Although this is the vocabulary of a preschooler,
these are roughly the thousand most-used words in the language, and they give you
a pretty good start on ordering dinner or checking into a hotel.
Zipf died at age 48, and did not live to see the
incredible growth of interest in his work. See:
http://en.wikipedia.org/wiki/George_Kingsley_Zipf
Marvin Lee Minsky [1927-]. U. S. scientist in the field
of artificial intelligence (AI), co-founder of the Laboratory
of Artificial Intelligence at the Massachusetts Institute of Technology,
and author of several texts on AI and philosophy. He served in the
U.S. Navy in 1944-1945. He holds a BA in Mathematics from Harvard (1950)
and a PhD in Mathematics from Princeton (1954). He has been on the faculty
of the Massachusetts Institute of Technology since 1958. He is currently
Toshiba Professor of Media Arts and Sciences and Professor
of Electrical Engineering and Computer Science at the
Massachusetts Institute of Technology.
Prof. Avram Noam Chomsky [1928-] is
Institute Professor Emeritus of linguistics at the
Massachusetts Institute of Technology. Chomsky developed
the theory of generative grammar, regarded as the most significant
contribution to the field of theoretical linguistics of the 20th century.
Chomsky established the so-called Chomsky hierarchy, a classification
of formal languages in terms of their generative power. Chomsky
is also widely known for his political activism, and for his criticism
of the foreign policy of the United States and other governments,
particularly in the Vietnam War era. See:
http://en.wikipedia.org/wiki/Noam_Chomsky
Lotfi Asker Zadeh [1921-].
The so-called "Pope of Fuzzy Logic", whose 1965 paper
introducing fuzzy set theory has been cited over 11,000
times in peer-reviewed journals of mathematics,
computer science, or engineering.
See:
http://en.wikipedia.org/wiki/Lotfi_Zadeh
William S. Gosset (Student).
An employee of the Guinness Brewery in Dublin, Ireland,
who wrote the ground-breaking papers in the British journal Biometrika
about the Student t test. Gosset was a student of Karl Pearson,
but because he did this work as an employee with commercial ties
to Guinness, he concealed his identity. His papers were signed, simply,
Student.
The Guinness Book of World Records
was created by the Guinness Brewery as an aid to settling arguments
in British bars where Guinness products were served.
See:
http://en.wikipedia.org/wiki/William_Sealey_Gossett
Sir Ronald A. Fisher.
Greatest British statistician of the twentieth century.
Sir Ronald corrected a small error in a formula for variance
that had originally been promulgated by Karl Pearson.
Fisher proved that the correct formula for the sample variance
is:
$s^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2 / (n-1)$, not
$s^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2 / n$, as Pearson had thought.
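The reason for dividing by n - 1 can be seen by brute force. The sketch below, in Python with exact fractions, averages each estimator over every possible sample of size n = 2 drawn with replacement from a small hypothetical population {1, 2, 3}. The n - 1 version reproduces the population variance exactly (it is unbiased), while dividing by n comes out low by the factor (n - 1)/n. The helper names are hypothetical; the `ddof` parameter name borrows NumPy's convention.

```python
from fractions import Fraction
from itertools import product

def sample_variance(xs, ddof=1):
    """Sum of squared deviations from the sample mean, divided by n - ddof."""
    n = len(xs)
    mean = Fraction(sum(xs), n)
    ss = sum((x - mean) ** 2 for x in xs)
    return ss / (n - ddof)

def average_over_all_samples(population, n, ddof):
    """Exact mean of the estimator over every size-n sample drawn with replacement."""
    samples = list(product(population, repeat=n))
    return sum(sample_variance(s, ddof) for s in samples) / len(samples)

population = [1, 2, 3]
mu = Fraction(sum(population), len(population))
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)

print(sigma2)                                           # 2/3
print(average_over_all_samples(population, 2, ddof=1))  # 2/3 (unbiased)
print(average_over_all_samples(population, 2, ddof=0))  # 1/3 (biased low)
```

Exact fractions are used so the equality with the population variance is exact rather than approximate.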
Sir Ronald was the scientist who demonstrated statistically
that Mendel had probably fudged his data.
The F-test for the analysis of variance is named in honor of Fisher.
However, Sir Ronald sold out to the tobacco industry.
When the news first emerged that tobacco use was bad for your health,
Fisher defended the tobacco industry by asserting that the
cause-effect relationship was not conclusively demonstrated.
Fisher developed the concept of CONFOUNDING,
and used it to argue that tobacco users might have some other
mysterious quality that caused them to develop tobacco-related
illnesses, apart from the tobacco use. Fisher's prominence
in the field of statistics helped the tobacco industry hide
from its responsibilities for a number of years.
Fisher's assertion was eventually rebutted by the finding
that tobacco users who quit experienced a subsequent decrease
in tobacco-related illnesses.
See:
http://en.wikipedia.org/wiki/Ronald_Fisher
Huff D.
How to lie with statistics.
New York: W. W. Norton & Company. 1954.
ISBN 0-393-31072-8, 142 pages.
"In the space of one hundred seventy-six years, the Lower Mississippi
has shortened itself two hundred and forty-two miles. That is an average
of a trifle over one mile and a third per year. Therefore, any calm person,
who is not blind or idiotic, can see that in the Old Oölitic
Silurian Period, just a million years ago next November,
the Lower Mississippi River was upward of one million three hundred thousand
miles long, and stuck out over the Gulf of Mexico like a fishing-rod.
And by the same token, any person can see that seven hundred and forty-two
years from now, the Lower Mississippi will be only a mile and three-quarters
long, and Cairo [Illinois] and New Orleans [Louisiana] will have joined
their streets together, and be plodding comfortably along
under a single mayor and a mutual board of aldermen. There is something
fascinating about science. One gets such wholesale returns of conjecture
out of such a trifling investment of fact."
Cited in: Huff D. How to lie with statistics. Page 142.
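Twain's arithmetic is easy to reproduce, which is the point of the parody: a straight-line extrapolation far beyond the 176 years actually observed. This Python sketch uses only the figures in the quotation; the present-day length it prints is what Twain's own numbers imply, not a measured value.

```python
# Figures quoted from Twain above
years_observed = 176
miles_shortened = 242

rate = miles_shortened / years_observed   # miles lost per year
print(rate)                               # 1.375, "a trifle over one mile and a third"

# Naive linear extrapolation one million years into the past
extra_length = rate * 1_000_000
print(extra_length)                       # 1375000.0 extra miles

# Present length implied by the claim that only 1.75 miles
# will remain 742 years from now
implied_length_now = 1.75 + rate * 742
print(implied_length_now)                 # 1022.0
```

Every step is valid arithmetic; the absurdity comes entirely from assuming the observed rate holds forever in both directions.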
COMMENT. Mark Twain's classic book, Life on the Mississippi,
is the first book in the world ever submitted by an author to a publisher
as a typewritten manuscript, in 1883. The inventor of the typewriter
was ... Howe, who was born on June 23, 18..
The (mechanical) typewriter was invented in 1868.
Source: Garrison Keillor, Author's Corner, Maryland Public Radio,
June 23, 2004.
CHAPTER 28.
GLOSSARY.
Estimation. The statistical procedure of determining
the best value for a statistical parameter, given sample data.
Random variable. A function $X : S \to \mathbb{R}$
that maps a sample space S into the real line, $\mathbb{R}$.
Expected Value: The average value, E(X),
of a random variable over the probability space; for a discrete random variable, $E(X) = \sum_x x \, P(X = x)$.
Variance: The average squared deviation, Var(X),
of a random variable from its expected value,
over the probability space: $\mathrm{Var}(X) = E[(X - E(X))^2]$.
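For a discrete random variable, both definitions are finite sums. This minimal Python sketch, assuming the distribution is supplied as a dictionary mapping each value x to P(X = x) (a representation chosen for illustration), computes them exactly for a fair six-sided die.

```python
from fractions import Fraction

def expected_value(pmf):
    """E(X) = sum over x of x * P(X = x), for a pmf given as {x: P(X = x)}."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Var(X) = E[(X - E(X))^2], computed directly from the definition."""
    mu = expected_value(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

die = {face: Fraction(1, 6) for face in range(1, 7)}
print(expected_value(die))  # 7/2
print(variance(die))        # 35/12
```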
Hypothesis testing.
Null hypothesis.
Alternative hypothesis.
Set theory: Zermelo-Fraenkel Set Theory (ZFST) is ordinary set theory.
Set: Undefined concepts of ZFST: is-a-member-of or belongs-to
∈; null-set or empty-set, Ø or {}.
Set: defined exactly by its members, in arbitrary order.
The set containing x is not equal to x: {x} ≠ x.
There are no repeat elements in a set.
Set-Roster (extensional, list) notation:
set X = {heart, lung, liver, pancreas, ...}.
Set-builder (intensional) notation:
O = {x | x is a major-body-organ}.
Set-subset: X ⊆ Y if and only if for every
x ∈ X,
x ∈ Y.
Set-equality: X = Y if and only if X ⊆ Y
and Y ⊆ X.
Set-union: X ∪ Y is the set of all x such that
x ∈ X or x ∈ Y or both.
Set-intersection: X ∩ Y is the set of all x
such that x ∈ X and x ∈ Y.
Set-subtraction: X - Y is the set of all x such that
x ∈ X and x ∉ Y.
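Python's built-in `set` type mirrors most of these definitions for finite sets, which makes a quick sketch possible. The organ names echo the roster example above; the variable names and the predicate used for set-builder notation are illustrative assumptions. Python sets are finite containers, not a model of the ZFST axioms.

```python
# Roster (extensional) notation: list the members explicitly
X = {"heart", "lung", "liver", "pancreas"}
Y = {"heart", "lung"}

# Set-builder (intensional) notation {x | P(x)}, written as a comprehension;
# here P(x) is the illustrative predicate "name starts with l"
Z = {x for x in X if x.startswith("l")}

# Members define the set: order does not matter and repeats collapse
print({"heart", "lung"} == {"lung", "heart", "lung"})   # True

# The set containing x differs from x, even when x is itself a set
x = frozenset({"heart"})
print({x} == x)                  # False

print(Y <= X)                    # subset: True
print(Y <= X and X <= Y)         # equality as mutual inclusion: False
print(sorted(Y | Z))             # union: ['heart', 'liver', 'lung']
print(sorted(X & Y))             # intersection: ['heart', 'lung']
print(sorted(X - Y))             # subtraction: ['liver', 'pancreas']
print("kidney" not in X)         # non-membership: True
```

The results are passed through `sorted()` only because Python does not guarantee an iteration order for sets, which matches the "arbitrary order" property above.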
CHAPTER 29.
ADDITIONAL READINGS.
Campbell JR, Carpenter P, Sneiderman C, Cohn S, Chute CG, Warren J.
Phase II evaluation of clinical coding schemes: completeness, taxonomy,
mapping, definitions, and clarity. CPRI Work Group on Codes and Structures.
J Am Med Inform Assoc. 1997 May-Jun;4(3):238-251.
PMID: 9147343.
Humphreys BL, McCray AT, Cheh ML.
Evaluating the coverage of controlled health data terminologies:
report on the results of the NLM/AHCPR large scale vocabulary test.
J Am Med Inform Assoc. 1997 Nov-Dec;4(6):484-500.
PMID: 9391936.
Chute CG, Cohn SP, Campbell KE, Oliver DE, Campbell JR.
The content coverage of clinical classifications.
For The Computer-Based Patient Record Institute's Work Group
on Codes & Structures.
J Am Med Inform Assoc. 1996 May-Jun;3(3):224-233.
PMID: 8723613.
Cimino JJ.
Review paper: coding systems in health care.
Methods Inf Med. 1996 Dec;35(4-5):273-284.
PMID: 9019091.
Campbell JR, Payne TH.
A comparison of four schemes for codification of problem lists.
Proc Annu Symp Comput Appl Med Care. 1994;:201-205.
PMID: 7949920.
Langlotz CP, Caldwell SA.
The completeness of existing lexicons for representing
radiology report information.
J Digit Imaging. 2002;15 Suppl 1:201-205. Epub 2002 Mar 21.
PMID: 12105728.
Hales JW, Schoeffler KM, Kessler DP.
Extracting medical knowledge for a coded problem list
vocabulary from the UMLS Knowledge Sources.
Proc AMIA Symp. 1998;:275-279.
PMID: 9929225.
Brown PJ, Warmington V, Laurence M, Prevost AT.
Randomised crossover trial comparing the performance of Clinical Terms
Version 3 and Read Codes 5 byte set coding schemes in general practice.
BMJ. 2003 May 24;326(7399):1127.
PMID: 12763986.
Elkin PL, Ruggieri AP, Brown SH, Buntrock J, Bauer BA,
Wahner-Roedler D, Litin SC, Beinborn J, Bailey KR, Bergstrom L.
A randomized controlled trial of the accuracy
of clinical record retrieval using SNOMED-RT as compared with ICD9-CM.
Proc AMIA Symp. 2001;:159-163.
PMID: 11825173.
Mullins HC, Scanland PM, Collins D, Treece L, Petruzzi P Jr,
Goodson A, Dickinson M.
The efficacy of SNOMED, Read Codes, and UMLS in coding
ambulatory family practice clinical records.
Proc AMIA Annu Fall Symp. 1996;:135-139.
PMID: 8947643.
Bodenreider O, Burgun A, Botti G, Fieschi M, Le Beux P, Kohler F.
Evaluation of the Unified Medical Language System
as a medical knowledge source.
J Am Med Inform Assoc. 1998 Jan-Feb;5(1):76-87.
PMID: 9452987.
Campbell KE, Musen MA.
Representation of clinical data using SNOMED III and conceptual graphs.
Proc Annu Symp Comput Appl Med Care. 1992;:354-358.
PMID: 1482897.
Campbell JR.
Semantic features of an enterprise interface terminology for SNOMED RT.
Medinfo. 2001;10(Pt 1):82-85.
PMID: 11604710.
Han SB, Kwak M, Kim S, Yoo S, Park H, Kijoo J, Kim J, Choi M, Choi J.
A comparative study on concept representation between the UMLS
and the clinical terms in Korean medical records.
Medinfo. 2004;11(Pt 1):616-620.
PMID: 15360886.
O'Keefe KM, Sievert M, Mitchell JA.
Mendelian inheritance in man: diagnoses in the UMLS.
Proc Annu Symp Comput Appl Med Care. 1993;:735-739.
PMID: 8130573.
Humphreys BL, Hole WT, McCray AT, Fitzmaurice JM.
Planned NLM/AHCPR large-scale vocabulary test:
using UMLS technology to determine the extent to which controlled
vocabularies cover terminology needed for health care and public health.
J Am Med Inform Assoc. 1996 Jul-Aug;3(4):281-287.
PMID: 8816351.
Han SB, Choi J.
The comparative study on concept representation between
the UMLS and the clinical terms in Korean medical records.
Int J Med Inform. 2005 Jan;74(1):67-76.
PMID: 15626637.
Wasserman H, Wang J.
An applied evaluation of SNOMED CT as a clinical vocabulary
for the computerized diagnosis and problem list.
AMIA Annu Symp Proc. 2003;:699-703.
PMID: 14728263.