Dermatopathology False Negative Terms
in Unified Medical Language System (UMLS).
Grace F. Kao, M.D. [1,2].
G. William Moore, MD, PhD. [1,3,4].
From: Pathology and Laboratory Medicine Service,
Veterans Affairs Maryland Health Care System, Baltimore, Maryland [1].
Division of Dermatology and Department of Medicine,
George Washington University School of Medicine, Washington, D.C. [2].
Department of Pathology, University of Maryland School of Medicine,
Baltimore, Maryland [3].
Department of Pathology, The Johns Hopkins Medical Institutions,
Baltimore, Maryland [4].
TABLE OF CONTENTS.
1. ABSTRACT.
2. INTRODUCTION.
3. DESIGN.
4. UNIFIED MEDICAL LANGUAGE SYSTEM.
5. REDUNDANT INDEXING OF SUBCONCEPTS.
6. BARRIER WORD METHOD.
7. BARRIER WORD METHOD: SAMPLE TEXT.
8. AMBIGUITIES IN UMLS.
9. RESULTS.
10. DISCUSSION.
11. REFERENCES.
12. ZIPF DISTRIBUTION OF UMLS CUIS.
13. FALSE NEGATIVE DERMATOPATHOLOGY CONCEPTS.
1. ABSTRACT.
Background: The Unified Medical Language System Metathesaurus
(UMLS-M) of the U.S. National Library of Medicine
is the most comprehensive, publicly-available list
of standardized medical terminology in the world.
Published false-negative rates for the UMLS-M
applied to general medical text are reported as one to two percent.
This study examines the false-negative rate of UMLS-M concepts
in publicly-available electronic dermatopathology text.
Design:
The entire collection of uncopyrighted dermatopathology image-legends
from the Armed Forces Institute of Pathology Electronic Fascicles
(AFIP-EF) on melanocytic and non-melanocytic tumors of the skin
was encoded into the UMLS, via a computer translation program
that parses and maps plain-text into UMLS terms
by the barrier word method, in which natural-language medical text
is processed as a sequence of MEDICALLY-SIGNIFICANT TERMS,
linked together with grammatical objects, or BARRIER WORDS.
A medically-significant term, present in the image-legend
but not captured by the program, was defined as a FALSE NEGATIVE.
Conversely, a term captured by the program but not present in the text
was defined as a FALSE POSITIVE.
By design, the program captured no false positives.
Ambiguous terms and compound terms containing subconcepts
were indexed redundantly.
Results:
There were 446 image-legends from the AFIP-EFs
on melanocytic and non-melanocytic tumors of the skin,
2,140 distinct words, and 8,334 UMLS concept-terms.
The legends averaged 18.7 UMLS concept-terms each,
ranging from three concept-terms in the least-indexed legend
to 57 concept-terms in the most-indexed legend.
There were 260 false-negative terms, for a false-negative rate of 3.1%.
False-negative UMLS concepts tended to be descriptive terms
in dermatopathology that characterize microscopic findings.
Conclusion:
The UMLS has a false-negative rate for dermatopathology concepts
of 3.1%, similar to the reported findings for general medical text.
This study supports the view that the UMLS
is a nearly-comprehensive metathesaurus for dermatopathology text.
Missing concepts could be suggested for incorporation
into future updates of the UMLS.
2. INTRODUCTION.
UNIFIED MEDICAL LANGUAGE SYSTEM METATHESAURUS
(UMLS-M) OF THE U.S. NATIONAL LIBRARY OF MEDICINE (USNLM).
MOST COMPREHENSIVE, PUBLICLY-AVAILABLE LIST
OF STANDARDIZED MEDICAL TERMINOLOGY IN THE WORLD.
PUBLISHED FALSE-NEGATIVE RATES FOR UMLS-M: 1-2%.
WHAT IS THE
FALSE-NEGATIVE RATE FOR PUBLICLY-AVAILABLE DERMATOPATHOLOGY TEXT?
3. DESIGN.
DERMATOPATHOLOGY IMAGE-LEGENDS:
ARMED FORCES INSTITUTE OF PATHOLOGY ELECTRONIC FASCICLES (AFIP-EF).
MELANOCYTIC AND NON-MELANOCYTIC TUMORS OF THE SKIN.
COMPUTER-ENCODED INTO UMLS,
WITH ENRICHED SYNONYM LIST.
FALSE NEGATIVE: MEDICALLY-SIGNIFICANT TERM, PRESENT IN THE IMAGE-LEGEND
BUT NOT CAPTURED BY ENCODING PROGRAM.
FALSE NEGATIVE: UMLS CONCEPT NOT PRESENT.
AMBIGUOUS TERMS AND COMPOUND TERMS CONTAINING SUBCONCEPTS
INDEXED REDUNDANTLY.
BY DESIGN, ENCODER CAPTURED NO FALSE POSITIVES.
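The false-negative definition above can be read as a set difference between the terms a reviewer marks as medically significant and the terms the encoder captures. The following is a minimal sketch under that reading; the function name and the example legend are hypothetical and are not taken from the authors' program.

```python
def false_negatives(significant_terms, captured_terms):
    """Return medically significant terms that the encoder did not capture.

    Both arguments are iterables of upper-case term strings. This is an
    illustrative reading of the definition, not the authors' code.
    """
    return sorted(set(significant_terms) - set(captured_terms))


# Hypothetical legend: a reviewer marks three terms, the encoder finds two.
significant = ["GRENZ ZONE", "DERMIS", "EPIDERMIS"]
captured = ["DERMIS", "EPIDERMIS"]

missed = false_negatives(significant, captured)
print(missed)
```

Because the encoder only emits terms it has matched in the text, the reverse difference (captured but not present) is empty by construction, which is the sense in which no false positives are captured.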
4. UNIFIED MEDICAL LANGUAGE SYSTEM (UMLS).
UNIFIED MEDICAL LANGUAGE SYSTEM (UMLS):
DEVELOPED BY U.S. NATIONAL LIBRARY OF MEDICINE (USNLM) IN 1986.
PURPOSE: AID DEVELOPMENT OF SYSTEMS
TO RETRIEVE ELECTRONIC BIOMEDICAL INFORMATION.
http://www.nlm.nih.gov/research/umls/
LAST UPDATED: March 19, 1999.
SIZE: 96,412,092 BYTES.
CONCEPT UNIQUE IDENTIFIERS (CUIs): 625,530, MAX=C0700344.
SYNONYMS: 1,362,823.
OVER 50 SOURCE-VOCABULARIES.
5. REDUNDANT INDEXING OF SUBCONCEPTS.
CELLULAR BLUE NEVUS REDUNDANTLY INDEXED AS:
CELLULAR BLUE NEVUS (C0334448).
BLUE NEVUS (C0206736).
CELL (C0007634).
BLUE (C0332584).
NEVUS (C0027960).
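One way to picture redundant indexing is to look up every contiguous sub-phrase of a compound term, longest first. The sketch below uses a toy lexicon holding only the five CUIs listed above; treating CELLULAR as a synonym of CELL is an assumption made here to reproduce the listed result, and the function name is hypothetical.

```python
# Toy lexicon built from the CUIs shown above. Mapping CELLULAR to the
# CELL concept is an assumed synonym entry, not a documented UMLS fact.
TOY_LEXICON = {
    "CELLULAR BLUE NEVUS": "C0334448",
    "BLUE NEVUS": "C0206736",
    "CELLULAR": "C0007634",
    "BLUE": "C0332584",
    "NEVUS": "C0027960",
}


def index_redundantly(term, lexicon):
    """Return (phrase, CUI) pairs for every sub-phrase found in the lexicon."""
    words = term.upper().split()
    hits = []
    # Try the longest sub-phrases first, then progressively shorter ones.
    for length in range(len(words), 0, -1):
        for start in range(len(words) - length + 1):
            phrase = " ".join(words[start:start + length])
            cui = lexicon.get(phrase)
            if cui is not None and (phrase, cui) not in hits:
                hits.append((phrase, cui))
    return hits


hits = index_redundantly("cellular blue nevus", TOY_LEXICON)
for phrase, cui in hits:
    print(f"{phrase} ({cui})")
```

Sub-phrases with no lexicon entry, such as CELLULAR BLUE, are simply skipped.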
6. BARRIER WORD METHOD.
NATURAL-LANGUAGE MEDICAL TEXT: SEQUENCE OF MEDICAL CONCEPTS
SEPARATED BY GRAMMATICAL OBJECTS.
THE GRAMMATICAL OBJECTS, OR BARRIER WORDS:
NUMERALS, PUNCTUATION, SINGLE LETTERS, ARTICLES, PREPOSITIONS,
AND COMMON VERBS AND MODIFIERS.
MEDICAL CONCEPTS, OR KEYWORDS:
ONE-WORD OR MULTIPLE-WORD TERMS CONSISTING OF MEDICALLY SIGNIFICANT
WORDS.
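The segmentation step can be sketched as follows: scan the text token by token, close the current keyword run whenever a barrier word is met, and emit each run as one candidate term. The barrier list here is a small illustrative set chosen to cover the sample legend in section 7; the authors' actual list is not given, and the function name is hypothetical.

```python
import re

# Illustrative barrier words only; the authors' full list is not shown.
BARRIER_WORDS = {
    "this", "is", "an", "because", "has", "from",
    "the", "into", "of", "that", "elsewhere",
}


def extract_keywords(text, barrier_words=BARRIER_WORDS):
    """Split text into runs of non-barrier words (candidate medical terms)."""
    # Words become one token each; every other non-space character
    # (punctuation, digits) becomes a single-character token.
    tokens = re.findall(r"[A-Za-z]+|[^A-Za-z\s]", text)
    terms, current = [], []
    for tok in tokens:
        is_barrier = (
            not tok.isalpha()            # numerals and punctuation
            or len(tok) == 1             # single letters
            or tok.lower() in barrier_words
        )
        if is_barrier:
            if current:
                terms.append(" ".join(current))
                current = []
        else:
            current.append(tok.upper())
    if current:
        terms.append(" ".join(current))
    return terms


sample = (
    "Lentiginous compound nevus. This lesion is an early compound nevus, "
    "because a nest has migrated from the epidermis into the dermis "
    "(lower right of c). Elsewhere, the histology is that of a simple lentigo."
)
keywords = extract_keywords(sample)
print(keywords)
```

Each extracted run would then be looked up in the vocabulary, whole and by sub-phrase, to assign concept codes.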
7. BARRIER WORD METHOD: SAMPLE TEXT.
LENTIGINOUS COMPOUND NEVUS . this LESION is an EARLY COMPOUND NEVUS ,
because a NEST has MIGRATED from the EPIDERMIS into the DERMIS
( lower right of c ) . elsewhere , the HISTOLOGY
is that of a SIMPLE LENTIGO .
barrier words in lower case.
KEYWORDS IN UPPER CASE.
| LEGEND NAME | UMLS CODE | UMLS NAME |
| LENTIGINOUS | C0023321 | Lentigo |
| COMPOUND NEVUS | C0259781 | Compound Nevus |
| LESION | C0012634 | Lesion |
| EARLY | C0205085 | Early |
| COMPOUND NEVUS | C0259781 | Compound Nevus |
| NEST | C0205234 | Focal |
| MIGRATED | C0232902 | Migration |
| EPIDERMIS | C0014520 | Epidermis |
| DERMIS | C0011646 | Dermis |
| LOWER | C0205104 | Inferior |
| RIGHT | C0205090 | Right |
| HISTOLOGY | C0019638 | Histologic |
| SIMPLE LENTIGO | C0302255 | Lentigo Simplex |
8. AMBIGUITIES IN UMLS.
ADNEXA WITHOUT NEARBY DISAMBIGUATING WORD:
SKIN ADNEXA (C0221943)
UTERINE ADNEXA (C0001575)
OCULAR ADNEXA (C0229243)
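A minimal sketch of how an ambiguous term might be handled: if a disambiguating word appears within a few words of ADNEXA, index only that sense; otherwise index all three senses redundantly. The window size, the choice of disambiguating words, and the function name are assumptions for illustration; only the three CUIs come from the list above.

```python
# The three senses listed above, keyed by an assumed disambiguating word.
ADNEXA_SENSES = {
    "SKIN": ("SKIN ADNEXA", "C0221943"),
    "UTERINE": ("UTERINE ADNEXA", "C0001575"),
    "OCULAR": ("OCULAR ADNEXA", "C0229243"),
}


def index_adnexa(words, window=3):
    """For each occurrence of ADNEXA, return the list of senses to index."""
    words = [w.upper() for w in words]
    results = []
    for i, word in enumerate(words):
        if word != "ADNEXA":
            continue
        # Look a few words to either side for a disambiguating cue.
        nearby = words[max(0, i - window): i + window + 1]
        senses = [ADNEXA_SENSES[cue] for cue in nearby if cue in ADNEXA_SENSES]
        # No cue nearby: fall back to indexing every sense.
        results.append(senses or list(ADNEXA_SENSES.values()))
    return results


print(index_adnexa("tumor involving the adnexa".split()))
print(index_adnexa("tumor of the skin adnexa".split()))
```

Indexing every sense keeps the encoder from missing the intended concept, at the cost of extra index entries for that legend.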
9. RESULTS.
446 IMAGE-LEGENDS FROM THE AFIP-EF
ON MELANOCYTIC AND NON-MELANOCYTIC TUMORS OF THE SKIN.
2,140 DISTINCT WORDS, AND 8,334 UMLS CONCEPTS.
AVERAGE 18.7 = 8,334/446 UMLS CONCEPTS PER LEGEND.
FREQUENCY RANGE: FROM THREE CONCEPTS IN THE LEAST-INDEXED LEGEND
TO 57 CONCEPTS IN THE MOST-INDEXED LEGEND.
260 FALSE-NEGATIVE TERMS; FALSE-NEGATIVE RATE, 3.1% = 260/8,334.
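The two summary figures can be reproduced directly from the reported counts. The average is the concept count divided by the legend count, and the false-negative rate is the false-negative count divided by the concept count. Variable names are illustrative.

```python
legends = 446             # image-legends
concept_terms = 8334      # UMLS concepts captured
false_negative_terms = 260

# Average concepts per legend: concepts divided by legends.
mean_per_legend = concept_terms / legends

# False-negative rate as reported: false negatives divided by concepts.
fn_rate = false_negative_terms / concept_terms

print(f"{mean_per_legend:.1f} concepts per legend")
print(f"{fn_rate:.1%} false-negative rate")
```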
10. DISCUSSION.
260 FALSE-NEGATIVE TERMS, FALSE-NEGATIVE RATE OF 3.1%.
FALSE-NEGATIVE UMLS CONCEPTS TENDED TO BE DESCRIPTIVE TERMS
IN DERMATOPATHOLOGY THAT CHARACTERIZE MICROSCOPIC FINDINGS.
SIMILAR TO THE REPORTED FINDINGS FOR GENERAL MEDICAL TEXT.
UMLS: NEARLY-COMPREHENSIVE METATHESAURUS FOR DERMATOPATHOLOGY TEXT.
MISSING CONCEPTS COULD BE SUGGESTED FOR INCORPORATION
INTO FUTURE UPDATES OF UMLS.
11. REFERENCES.
1. UMLS Knowledge Sources. 9th edition. 1998. DOCUMENTATION.
National Institutes of Health. National Library of Medicine.
Bethesda, Maryland 20854.
2. College of American Pathologists. Systematized Nomenclature
of Human and Veterinary Medicine (SNOMED International).
College of American Pathologists, Northfield, IL, 1993.
3. Berman JJ, Moore GW.
SNOMED-encoded surgical pathology databases:
A tool for epidemiologic investigation.
Mod Pathol. 1996 Sep;9(9):944-950.
4. Silverberg SG.
SNOMED-encoded surgical pathology databases:
's no big deal - or is it?
Mod Pathol. 1996 Sep;9(9):953-954.
5. Moore GW, Berman JJ.
Automatic SNOMED coding.
Proc Annu Symp Comput Appl Med Care. 1994;18:225-229.
6. Moore GW, Berman JJ.
Performance analysis of manual and automated
systematized nomenclature of medicine (SNOMED) coding.
Am J Clin Pathol. 1994 Mar;101(3):253-256.
7. Berman JJ, Moore GW.
Object-oriented controlled-vocabulary translator
using TRANSOFT + HyperPAD.
Proc Annu Symp Comput Appl Med Care. 1991;15:973-975.
8. Berman JJ, Moore GW, Donnelly WH, Massey JK, Craig B.
A SNOMED analysis of three years accessioned cases
(40,124) of a surgical pathology department:
implications for pathology-based demographic studies.
Proc Annu Symp Comput Appl Med Care. 1994;18:188-192.
9. Moore GW, Berman JJ, Hanzlick RL, Buchino JJ, Hutchins GM.
A prototype Internet autopsy database.
1625 consecutive fetal and neonatal
autopsy facesheets spanning 20 years.
Arch Pathol Lab Med. 1996 Aug;120(8):782-785.
10. Berman JJ, Moore GW, Hutchins GM.
Internet autopsy database.
Hum Pathol. 1997 Apr;28(4):393-394.
11. Moore GW, Miller RE, Hutchins GM. Indexing by MeSH titles
of natural language pathology phrases identified on first encounter
using the barrier word method. In: Scherrer JR,
Côté RA, Mandil SH, eds.
Computerized Natural Medical Language Processing for Knowledge
Representation. Amsterdam: North-Holland; pp 29-39, 1989.
12. Murphy GF, Elder DA.
Armed Forces Institute of Pathology Atlas
of Tumor Pathology. Non-Melanocytic Tumors of the Skin.
Electronic Fascicle version 2.0. Washington, D.C.
Armed Forces Institute of Pathology.
13. Elder DA, Murphy GF.
Armed Forces Institute of Pathology Atlas
of Tumor Pathology. Melanocytic Tumors of the Skin.
Electronic Fascicle version 2.0. Washington, D.C.
Armed Forces Institute of Pathology.
14. Ackerman AB.
Histologic Diagnosis of Inflammatory Skin Diseases.
A Method by Pattern Analysis.
Philadelphia: Lea & Febiger. 1978.
15. McKee PH.
Pathology of the Skin, with clinical correlations.
Philadelphia: J.B. Lippincott Co. 1989.
16. Ghatan HEY.
Dermatological Differential Diagnosis and Pearls.
New York: The Parthenon Publishing Group. 1994.
17. Mehregan AH.
Pinkus Guide to Dermatohistopathology. Fourth Edition.
Norwalk, CT: Appleton-Century-Crofts. 1986.
18. Hood AF, Kwan TH, Burnes DC, Mihm MC.
Primer of Dermatopathology.
Boston: Little, Brown and Company. 1984.
19. Lever WF, Schaumburg-Lever G.
Histopathology of the Skin. Seventh Edition.
Philadelphia: J.B.Lippincott Company. 1990.
20. Farmer ER, Hood AF.
Pathology of the Skin.
Norwalk, CT: Appleton & Lange. 1990.
12. ZIPF DISTRIBUTION OF UMLS CUIS.
| RANK | FREQUENCY | UMLS CODE | UMLS NAME |
| 1 | 198 | C0007634 | CELL |
| 2 | 147 | C0012634 | DISEASE |
| 3 | 132 | C0011646 | DERMIS |
| 4 | 117 | C0027651 | NEOPLASIA |
| 5 | 116 | C0441469 | PICTURE |
| 6 | 98 | C0332285 | ARISE FROM |
| 7 | 97 | C0007952 | CHARACTER |
| 8 | 83 | C0205234 | FOCAL |
| 9 | 81 | C0205165 | LESS |
| 10 | 75 | C0014520 | EPIDERMIS TISSUE |
| 11 | 71 | C0025202 | CUTANEOUS MELANOMA |
| 12 | 71 | C0205615 | DIFFERENTIATE |
| 13 | 68 | C0027960 | NEVUS |
| 14 | 68 | C0205124 | SUPERFICIAL |
| 15 | 66 | C0028259 | NODULE |
| 16 | 66 | C0334094 | PROLIFERATE |
| 17 | 64 | C0205182 | ATYPIA |
| 18 | 63 | C0205160 | NEGATIVE |
| 19 | 57 | C0205210 | CLINICAL |
| 20 | 55 | C0332448 | INFILTRATE |
| 21 | 53 | C0221928 | DERMAL |
| 22 | 52 | C0205286 | MATURE |
| 23 | 51 | C0004083 | ASSOCIATED |
| 24 | 51 | C0221920 | EPIDERMAL FEATURE |
| 25 | 51 | C0348078 | FORM |
| 26 | 47 | C0205397 | SEEN |
| 27 | 46 | C0205125 | DEEP |
| 28 | 46 | C0205250 | ELEVATED |
| 29 | 46 | C0018270 | GROWTH |
| 30 | 43 | C0205390 | PHASE |
| 31 | 41 | C0205183 | BENIGN |
| 32 | 41 | C0439739 | RETICULAR |
| 33 | 41 | C0205392 | SOME |
| 34 | 40 | C0205402 | PROMINENT |
| 35 | 39 | C0233426 | APPEARANCE |
| 36 | 39 | C0205103 | BETWEEN |
| 37 | 39 | C0005971 | BONE PLATE |
| 38 | 39 | C0439712 | PATTERN |
| 39 | 39 | C0333610 | PIGMENT |
| 40 | 39 | C0150312 | PRESENT |
| 41 | 39 | C0441633 | SCANNING |
| 42 | 38 | C0162597 | STROMAL CELL |
| 43 | 38 | C0221908 | EPITHELIAL FEATURE |
| 44 | 37 | C0205146 | AREA |
| 45 | 37 | C0009325 | COLLAGEN |
| 46 | 37 | C0205164 | GREAT |
| 47 | 37 | C0025201 | MELANOCYTE |
| 48 | 36 | C0502379 | DUCT |
| 49 | 36 | C0332258 | EXTEND |
| 50 | 36 | C0205431 | FORMED |
| 51 | 35 | C0205428 | AFFECTING |
| 52 | 33 | C0205085 | BEFORE |
| 53 | 32 | C0205112 | BASAL |
| 54 | 32 | C0205099 | CENTER |
| 55 | 32 | C0443228 | GREATER |
| 56 | 32 | C0205172 | MANY |
| 57 | 32 | C0442038 | RADIAL |
| 58 | 31 | C0205284 | BORDER |
| 59 | 30 | C0009085 | CLUSTER |
| 60 | 30 | C0006826 | CANCER |
| 61 | 30 | C0205419 | VARIANT |
| 62 | 29 | C0037267 | SKIN |
| 63 | 28 | C0013879 | ELEMENT |
| 64 | 28 | C0010709 | CYST |
| 65 | 28 | C0014609 | EPITHELIUM TISSUE |
| 66 | 28 | C0205091 | LEFT |
| 67 | 28 | C0205171 | SINGLE |
| 68 | 28 | C0035621 | RIGHT |
| 69 | 27 | C0205113 | CIRCULAR |
| 70 | 27 | C0010834 | CYTOPLASM |
| 71 | 27 | C0521125 | FOR |
| 72 | 27 | C0205312 | PAPILLARY |
| 73 | 27 | C0030705 | PATIENT |
| 74 | 26 | C0024264 | LYMPHOCYTE |
| 75 | 26 | C0439064 | NUMEROUS |
| 76 | 25 | C0205748 | DYSPLASTIC NEVUS |
| 77 | 25 | C0332261 | SPREAD |
| 78 | 25 | C0332516 | SYMMETRIC |
| 79 | 24 | C0007117 | BASAL CELL CARCINOMA |
| 80 | 24 | C0449432 | COMPONENT |
| 81 | 24 | C0016059 | DESMOPLASIA |
| 82 | 24 | C0011900 | DIAGNOSES |
| 83 | 24 | C0332184 | RARE |
| 84 | 24 | C0445247 | SAME |
| 85 | 24 | C0042591 | VESSEL |
| 86 | 23 | C0205105 | ABOVE |
| 87 | 23 | C0391854 | OVERLYING |
| 88 | 23 | C0332461 | PATCH |
| 89 | 23 | C0205375 | UNIFORM |
| 90 | 23 | C0332307 | TYPE |
| 91 | 22 | C0332183 | FREQUENT |
| 92 | 22 | C0205393 | MOST |
| 93 | 22 | C0521447 | NUCLEAR |
| 94 | 22 | C0332251 | PREDOMINANT |
| 95 | 21 | C0003737 | ARCHITECTURE |
| 96 | 21 | C0178499 | BASE |
| 97 | 21 | C0238767 | BILATERAL |
| 98 | 21 | C0006382 | HIS |
| 99 | 21 | C0332120 | EVIDENCE FOR |
| 100 | 21 | C0439682 | FOLLICULAR |
FREQUENCY DISTRIBUTION OF
100 MOST FREQUENT UMLS CONCEPTS
IN AFIP SKIN LEGEND-TEXTS.
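A Zipf distribution predicts that frequency falls off roughly as a power of rank, so the points lie near a straight line on log-log axes. The sketch below fits that line by ordinary least squares to a handful of rank and frequency pairs copied from the table above; the choice of sample rows and the function name are illustrative, and the fit is a rough check rather than a formal test.

```python
import math

# (rank, frequency) pairs taken from the table above.
SAMPLE_ROWS = [(1, 198), (2, 147), (4, 117), (10, 75), (20, 55), (50, 36), (100, 21)]


def loglog_slope(rows):
    """Least-squares slope of log10(frequency) against log10(rank)."""
    xs = [math.log10(rank) for rank, _ in rows]
    ys = [math.log10(freq) for _, freq in rows]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


slope = loglog_slope(SAMPLE_ROWS)
print(f"log-log slope: {slope:.2f}")
```

A slope near -1 would match the classical form of Zipf's law; a slope between -1 and 0 indicates a flatter decline in frequency with rank.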
13. FALSE NEGATIVE DERMATOPATHOLOGY CONCEPTS.
CONCEPTS FIRST ADDED IN 1999, ABSENT IN 1998 UMLS.
ECCRINE POROCARCINOMA (C0547063).
PERICYTE (C0598800).
SPINDLE CELL (C0682540).
TRUE FALSE NEGATIVE CONCEPTS.
ENDOPHYTIC (GROWTH BENEATH SURFACE).
EPIDERMAL COLLARETTE (OF PYOGENIC GRANULOMA).
EXOPHYTIC (GROWTH ABOVE SURFACE).
GLABROUS SKIN (NON-HAIRY SKIN: SOLES & PALMS).
GRENZ ZONE (BORDER ZONE BETWEEN EPIDERMIS AND DERMIS).
KAMINO BODY (= GLOBOID BODY, OF SPITZ NEVUS).
MELANOPHAGE (MACROPHAGE CONTAINING MELANIN).
PAGETOID SPREAD (IN MELANOMA, PAGET'S DISEASE).
PALISADING CELLS (IN BASAL CELL CARCINOMA, VEROCAY BODY).
PARAKERATOTIC SPIRE (= CHURCH SPIRE, PARAKERATOTIC MOUND, IN VERRUCA).
SQUAMOUS EDDY (OF IRRITATED SEBORRHEIC KERATOSIS).
STARRY SKY (IN RAPIDLY GROWING LYMPHOID TISSUE).
VEROCAY BODY (OF SCHWANNOMA).