Dermatopathology False-Negative Terms
in the Unified Medical Language System (UMLS).

Grace F. Kao, M.D. [1,2].
G. William Moore, M.D., Ph.D. [1,3,4].



      From: Pathology and Laboratory Medicine Service, Veterans Affairs Maryland Health Care System, Baltimore, Maryland [1]. Division of Dermatology and Department of Medicine, George Washington University School of Medicine, Washington, D.C. [2]. Department of Pathology, University of Maryland School of Medicine, Baltimore, Maryland [3]. Department of Pathology, The Johns Hopkins Medical Institutions, Baltimore, Maryland [4].

TABLE OF CONTENTS.


1. ABSTRACT.
2. INTRODUCTION.
3. DESIGN.
4. UNIFIED MEDICAL LANGUAGE SYSTEM.
5. REDUNDANT INDEXING OF SUBCONCEPTS.
6. BARRIER WORD METHOD.
7. BARRIER WORD METHOD: SAMPLE TEXT.
8. AMBIGUITIES IN UMLS.
9. RESULTS.
10. DISCUSSION.
11. REFERENCES.
12. ZIPF DISTRIBUTION OF UMLS CUIS.
13. FALSE-NEGATIVE DERMATOPATHOLOGY CONCEPTS.


1. ABSTRACT.



      Background: The Unified Medical Language System Metathesaurus (UMLS-M) of the U.S. National Library of Medicine is the most comprehensive publicly available list of standardized medical terminology in the world. Reported false-negative rates for the UMLS-M applied to general medical text are one to two percent. This study measures the false-negative rate of UMLS-M concepts in publicly available electronic dermatopathology text.

      Design: The entire collection of uncopyrighted dermatopathology image-legends from the Armed Forces Institute of Pathology Electronic Fascicles (AFIP-EF) on melanocytic and non-melanocytic tumors of the skin was encoded into the UMLS by a computer program that parses plain text and maps it to UMLS terms using the barrier word method. In this method, natural-language medical text is processed as a sequence of MEDICALLY-SIGNIFICANT TERMS separated by grammatical objects, or BARRIER WORDS. A medically significant term present in an image-legend but not captured by the program was defined as a FALSE NEGATIVE. Conversely, a term captured by the program but not present in the text was defined as a FALSE POSITIVE. By design, the program captured no false positives. Ambiguous terms and compound terms containing subconcepts were indexed redundantly.

      Results: The 446 image-legends from the AFIP-EFs on melanocytic and non-melanocytic tumors of the skin contained 2,140 distinct words and 8,334 UMLS concept-terms. Each legend yielded an average of 18.7 UMLS concept-terms (8,334/446), ranging from three concept-terms in the least-indexed legend to 57 in the most-indexed legend. There were 260 false-negative terms, for a false-negative rate of 3.1% (260/8,334). False-negative UMLS concepts tended to be descriptive dermatopathology terms that characterize microscopic findings.

      Conclusion: The UMLS has a false-negative rate of 3.1% for dermatopathology concepts, similar to the reported findings for general medical text. This study supports the view that the UMLS is a nearly comprehensive metathesaurus for dermatopathology text. Missing concepts could be suggested for incorporation into future updates of the UMLS.


2. INTRODUCTION.




  • UNIFIED MEDICAL LANGUAGE SYSTEM METATHESAURUS (UMLS-M) OF THE U.S. NATIONAL LIBRARY OF MEDICINE (USNLM).

  • MOST COMPREHENSIVE, PUBLICLY-AVAILABLE LIST OF STANDARDIZED MEDICAL TERMINOLOGY IN THE WORLD.

  • PUBLISHED FALSE-NEGATIVE RATES FOR UMLS-M: 1-2%.

  • WHAT IS THE FALSE-NEGATIVE RATE FOR PUBLICLY-AVAILABLE DERMATOPATHOLOGY TEXT?



3. DESIGN.




  • DERMATOPATHOLOGY IMAGE-LEGENDS: ARMED FORCES INSTITUTE OF PATHOLOGY ELECTRONIC FASCICLES (AFIP-EF).

  • MELANOCYTIC AND NON-MELANOCYTIC TUMORS OF THE SKIN.

  • COMPUTER-ENCODED INTO UMLS, WITH ENRICHED SYNONYM LIST.

  • FALSE NEGATIVE: MEDICALLY-SIGNIFICANT TERM, PRESENT IN THE IMAGE-LEGEND BUT NOT CAPTURED BY THE ENCODING PROGRAM.

  • FALSE NEGATIVE: UMLS CONCEPT NOT PRESENT.

  • AMBIGUOUS TERMS AND COMPOUND TERMS CONTAINING SUBCONCEPTS INDEXED REDUNDANTLY.

  • BY DESIGN, ENCODER CAPTURED NO FALSE POSITIVES.
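The false-negative and false-positive definitions above reduce to two set differences per image-legend. The sketch below is a minimal illustration, not the study's encoder. It assumes two hypothetical inputs, a reviewer-supplied set of medically significant terms and the set of surface terms the encoder matched, both lower-case and located in the legend by plain substring search.

```python
from __future__ import annotations


def classify_terms(legend_text: str, reviewed_terms: set[str], captured_terms: set[str]):
    """Split terms into false negatives and false positives for one image-legend.

    reviewed_terms: terms a reviewer judged medically significant (hypothetical input).
    captured_terms: surface terms the encoder matched to a UMLS concept (hypothetical input).
    """
    text = legend_text.lower()
    # False negative: medically significant and present in the legend, but not captured.
    false_negatives = {t for t in reviewed_terms if t in text and t not in captured_terms}
    # False positive: captured by the encoder but not present in the legend text.
    false_positives = {t for t in captured_terms if t not in text}
    return false_negatives, false_positives


legend = "Grenz zone above a dermal infiltrate of lymphocytes."
reviewed = {"grenz zone", "dermal", "infiltrate", "lymphocytes"}
captured = {"dermal", "infiltrate", "lymphocytes"}
fn, fp = classify_terms(legend, reviewed, captured)
print(sorted(fn), sorted(fp))  # ['grenz zone'] []
```

An encoder that only emits terms it actually found in the text yields an empty false-positive set, which is the sense in which the encoder captured no false positives by design.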



4. UNIFIED MEDICAL LANGUAGE SYSTEM (UMLS).




  • UNIFIED MEDICAL LANGUAGE SYSTEM (UMLS): DEVELOPED BY THE U.S. NATIONAL LIBRARY OF MEDICINE (USNLM) IN 1986.

  • PURPOSE: AID DEVELOPMENT OF SYSTEMS TO RETRIEVE ELECTRONIC BIOMEDICAL INFORMATION.

  • http://www.nlm.nih.gov/research/umls/

  • LAST UPDATED: March 19, 1999.

  • SIZE: 96,412,092 BYTES.

  • CONCEPT UNIQUE IDENTIFIERS (CUIs): 625,530; HIGHEST CUI: C0700344.

  • SYNONYMS: 1,362,823.

  • OVER 50 SOURCE-VOCABULARIES.



5. REDUNDANT INDEXING OF SUBCONCEPTS.



    CELLULAR BLUE NEVUS REDUNDANTLY INDEXED AS:


  • CELLULAR BLUE NEVUS (C0334448).

  • BLUE NEVUS (C0206736).

  • CELL (C0007634).

  • BLUE (C0332584).

  • NEVUS (C0027960).
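Redundant indexing can be pictured as looking up every contiguous sub-phrase of a term, plus a normalized form so that CELLULAR also yields CELL. This is a minimal sketch under stated assumptions: the lexicon holds only the five entries listed above, and the `NORMALIZE` map is a hypothetical stand-in for whatever morphological normalization the encoder actually used.

```python
# Toy lexicon: the five entries listed above (not the full UMLS).
LEXICON = {
    "cellular blue nevus": "C0334448",
    "blue nevus": "C0206736",
    "cell": "C0007634",
    "blue": "C0332584",
    "nevus": "C0027960",
}
# Hypothetical morphological normalization so that CELLULAR also yields CELL.
NORMALIZE = {"cellular": "cell"}


def subphrases(tokens):
    """Yield every contiguous sub-phrase, longest first."""
    n = len(tokens)
    for length in range(n, 0, -1):
        for start in range(n - length + 1):
            yield " ".join(tokens[start:start + length])


def index_redundantly(phrase):
    """Return every lexicon entry found in the phrase or in its normalized form."""
    tokens = phrase.lower().split()
    normalized = [NORMALIZE.get(t, t) for t in tokens]
    hits = {}
    for variant in (tokens, normalized):
        for sub in subphrases(variant):
            if sub in LEXICON and sub not in hits:
                hits[sub] = LEXICON[sub]
    return hits


for term, cui in index_redundantly("CELLULAR BLUE NEVUS").items():
    print(f"{term.upper()} ({cui})")
```

Enumerating all sub-phrases is quadratic in phrase length, which is negligible for the short keyword phrases found in image-legends.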



6. BARRIER WORD METHOD.




  • NATURAL-LANGUAGE MEDICAL TEXT: SEQUENCE OF MEDICAL CONCEPTS SEPARATED BY GRAMMATICAL OBJECTS.

  • THE GRAMMATICAL OBJECTS, OR BARRIER WORDS: NUMERALS, PUNCTUATION, SINGLE LETTERS, ARTICLES, PREPOSITIONS, AND COMMON VERBS AND MODIFIERS.

  • MEDICAL CONCEPTS, OR KEYWORDS: ONE-WORD OR MULTIPLE-WORD TERMS CONSISTING OF MEDICALLY SIGNIFICANT WORDS.
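The segmentation step of the barrier word method can be sketched as follows. Text is split at numerals, punctuation, single letters, and listed barrier words, and each maximal run of remaining words becomes one candidate keyword phrase. The short `BARRIER_WORDS` set is a hypothetical abbreviation; the study's actual barrier-word list is not reproduced here.

```python
import re

# Hypothetical, abbreviated barrier-word list (articles, prepositions, common verbs
# and modifiers); the list used in the study is not reproduced here.
BARRIER_WORDS = {
    "a", "an", "the", "this", "that", "of", "from", "into", "in", "on", "to",
    "with", "by", "is", "are", "was", "has", "have", "because", "elsewhere",
}

# Words, runs of digits, or single punctuation characters.
TOKEN_RE = re.compile(r"[A-Za-z]+|\d+|[^\sA-Za-z\d]")


def is_barrier(token):
    return (
        token.lower() in BARRIER_WORDS
        or token.isdigit()          # numerals
        or not token.isalnum()      # punctuation
        or len(token) == 1          # single letters
    )


def keyword_phrases(text):
    """Split text at barrier words; each maximal run of other words is one keyword phrase."""
    phrases, current = [], []
    for token in TOKEN_RE.findall(text):
        if is_barrier(token):
            if current:
                phrases.append(" ".join(current))
                current = []
        else:
            current.append(token.lower())
    if current:
        phrases.append(" ".join(current))
    return phrases


print(keyword_phrases(
    "This lesion is an early compound nevus, because a nest has migrated "
    "from the epidermis into the dermis."
))
```

Because barrier words are a small closed class, the method does not need a full grammar. Any phrase it isolates that has no UMLS match becomes a candidate false negative.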



7. BARRIER WORD METHOD: SAMPLE TEXT.


    LENTIGINOUS COMPOUND NEVUS . this LESION is an EARLY COMPOUND NEVUS , because a NEST has MIGRATED from the EPIDERMIS into the DERMIS ( lower right of c ) . elsewhere , the HISTOLOGY is that of a SIMPLE LENTIGO .


  • barrier words in lower case.

  • KEYWORDS IN UPPER CASE.



    LEGEND NAME     UMLS CODE  UMLS NAME
    LENTIGINOUS     C0023321   Lentigo
    COMPOUND NEVUS  C0259781   Compound Nevus
    LESION          C0012634   Lesion
    EARLY           C0205085   Early
    COMPOUND NEVUS  C0259781   Compound Nevus
    NEST            C0205234   Focal
    MIGRATED        C0232902   Migration
    EPIDERMIS       C0014520   Epidermis
    DERMIS          C0011646   Dermis
    LOWER           C0205104   Inferior
    RIGHT           C0205090   Right
    HISTOLOGY       C0019638   Histologic
    SIMPLE LENTIGO  C0302255   Lentigo Simplex
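The table shows one keyword phrase, LENTIGINOUS COMPOUND NEVUS, mapping to two UMLS entries. One simple way to reproduce that split is greedy longest-match lookup. The sketch below uses only a few rows from the table as its lexicon. It is a simpler variant than the redundant subconcept indexing described for this study, and the function name is hypothetical.

```python
# Toy lexicon built from rows of the table above; a real encoder would load the
# full UMLS Metathesaurus. Keys are lower-case legend words or phrases.
LEXICON = {
    "lentiginous": ("C0023321", "Lentigo"),
    "compound nevus": ("C0259781", "Compound Nevus"),
    "early": ("C0205085", "Early"),
    "simple lentigo": ("C0302255", "Lentigo Simplex"),
    "histology": ("C0019638", "Histologic"),
}
MAX_WORDS = max(len(key.split()) for key in LEXICON)


def longest_match(phrase):
    """Greedily map a keyword phrase to lexicon entries, longest span first.

    Returns (matches, unmatched); unmatched words are candidate false negatives.
    """
    words = phrase.lower().split()
    matches, unmatched = [], []
    i = 0
    while i < len(words):
        for n in range(min(MAX_WORDS, len(words) - i), 0, -1):
            key = " ".join(words[i:i + n])
            if key in LEXICON:
                code, name = LEXICON[key]
                matches.append((key.upper(), code, name))
                i += n
                break
        else:  # no span starting at word i is in the lexicon
            unmatched.append(words[i])
            i += 1
    return matches, unmatched


print(longest_match("LENTIGINOUS COMPOUND NEVUS"))
print(longest_match("EARLY COMPOUND NEVUS"))
```

Preferring the longest span keeps COMPOUND NEVUS together as one concept rather than two unrelated words.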






8. AMBIGUITIES IN UMLS.




  • ADNEXA, WITHOUT A NEARBY DISAMBIGUATING WORD, IS INDEXED REDUNDANTLY AS:


  • SKIN ADNEXA (C0221943)

  • UTERINE ADNEXA (C0001575)

  • OCULAR ADNEXA (C0229243)
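A nearby-word rule for ADNEXA might look like the sketch below. If a cue word for one sense appears within a small window, that sense is indexed; otherwise all three senses are indexed redundantly. The cue-word lists and the window size are hypothetical illustrations. The study does not publish its disambiguation rules.

```python
# Hypothetical cue words for each sense; the study does not publish
# disambiguation rules, and these lists are illustrative only.
ADNEXA_SENSES = {
    "C0221943": ("SKIN ADNEXA", {"skin", "cutaneous", "dermal", "eccrine", "sebaceous", "follicular"}),
    "C0001575": ("UTERINE ADNEXA", {"uterus", "uterine", "ovary", "ovarian", "fallopian"}),
    "C0229243": ("OCULAR ADNEXA", {"eye", "ocular", "eyelid", "orbit", "conjunctival", "lacrimal"}),
}


def index_adnexa(tokens, position, window=5):
    """Return the CUIs to index for the word 'adnexa' at tokens[position].

    A sense is chosen if one of its cue words occurs within `window` tokens;
    otherwise every sense is indexed, mirroring redundant indexing of ambiguous terms.
    """
    start = max(0, position - window)
    nearby = {t.lower() for t in tokens[start:position + window + 1]}
    senses = [cui for cui, (_, cues) in ADNEXA_SENSES.items() if nearby & cues]
    return senses or list(ADNEXA_SENSES)


print(index_adnexa("tumor of the skin adnexa".split(), 4))   # skin sense only
print(index_adnexa("tumor of adnexa".split(), 2))            # all three senses
```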



9. RESULTS.




  • 446 IMAGE-LEGENDS FROM THE AFIP-EF ON MELANOCYTIC AND NON-MELANOCYTIC TUMORS OF THE SKIN.

  • 2,140 DISTINCT WORDS, AND 8,334 UMLS CONCEPTS.

  • AVERAGE 18.7 = 8,334/446 UMLS CONCEPTS PER LEGEND.

  • FREQUENCY RANGE: FROM THREE CONCEPTS IN THE LEAST-INDEXED LEGEND TO 57 CONCEPTS IN THE MOST-INDEXED LEGEND.

  • 260 FALSE-NEGATIVE TERMS; FALSE-NEGATIVE RATE, 3.1% = 260/8,334.
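Both summary figures follow directly from the counts above. The average divides concepts by legends, and the false-negative rate uses the 8,334 captured concepts as its denominator, as stated in the results.

```python
LEGENDS = 446
CONCEPTS = 8_334
FALSE_NEGATIVES = 260

average_per_legend = CONCEPTS / LEGENDS            # concepts per legend
false_negative_rate = FALSE_NEGATIVES / CONCEPTS   # denominator used in this study

print(f"average: {average_per_legend:.1f} concepts per legend")   # average: 18.7 concepts per legend
print(f"false-negative rate: {false_negative_rate:.1%}")          # false-negative rate: 3.1%
```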



10. DISCUSSION.




  • 260 FALSE-NEGATIVE TERMS, FALSE-NEGATIVE RATE OF 3.1%.

  • FALSE-NEGATIVE UMLS CONCEPTS TENDED TO BE DESCRIPTIVE TERMS IN DERMATOPATHOLOGY THAT CHARACTERIZE MICROSCOPIC FINDINGS.

  • SIMILAR TO THE REPORTED FINDINGS FOR GENERAL MEDICAL TEXT.

  • UMLS: NEARLY-COMPREHENSIVE METATHESAURUS FOR DERMATOPATHOLOGY TEXT.

  • MISSING CONCEPTS COULD BE SUGGESTED FOR INCORPORATION INTO FUTURE UPDATES OF UMLS.



11. REFERENCES.



  • 1. UMLS Knowledge Sources. 9th edition. Documentation.
    National Institutes of Health, National Library of Medicine. Bethesda, Maryland 20854. 1998.

  • 2. College of American Pathologists.
    Systematized Nomenclature of Human and Veterinary Medicine (SNOMED International).
    Northfield, IL: College of American Pathologists. 1993.

  • 3. Berman JJ, Moore GW.
    SNOMED-encoded surgical pathology databases: A tool for epidemiologic investigation.
    Mod Pathol. 1996 Sep;9(9):944-950.

  • 4. Silverberg SG.
    SNOMED-encoded surgical pathology databases: It's no big deal - or is it?
    Mod Pathol. 1996 Sep;9(9):953-954.

  • 5. Moore GW, Berman JJ.
    Automatic SNOMED coding.
    Proc Annu Symp Comput Appl Med Care. 1994;18:225-229.

  • 6. Moore GW, Berman JJ.
    Performance analysis of manual and automated systematized nomenclature of medicine (SNOMED) coding.
    Am J Clin Pathol. 1994 Mar;101(3):253-256.

  • 7. Berman JJ, Moore GW.
    Object-oriented controlled-vocabulary translator using TRANSOFT + HyperPAD.
    Proc Annu Symp Comput Appl Med Care. 1991;15:973-975.

  • 8. Berman JJ, Moore GW, Donnelly WH, Massey JK, Craig B.
    A SNOMED analysis of three years' accessioned cases (40,124) of a surgical pathology department: implications for pathology-based demographic studies.
    Proc Annu Symp Comput Appl Med Care. 1994;18:188-192.

  • 9. Moore GW, Berman JJ, Hanzlick RL, Buchino JJ, Hutchins GM.
    A prototype Internet autopsy database. 1625 consecutive fetal and neonatal autopsy facesheets spanning 20 years.
    Arch Pathol Lab Med. 1996 Aug;120(8):782-785.

  • 10. Berman JJ, Moore GW, Hutchins GM.
    Internet autopsy database.
    Hum Pathol. 1997 Apr;28(4):393-394.

  • 11. Moore GW, Miller RE, Hutchins GM.
    Indexing by MeSH titles of natural language pathology phrases identified on first encounter using the barrier word method.
    In: Scherrer JR, Côté RA, Mandil SH, eds. Computerized Natural Medical Language Processing for Knowledge Representation.
    Amsterdam: North-Holland. 1989:29-39.

  • 12. Murphy GF, Elder DA.
    Armed Forces Institute of Pathology Atlas of Tumor Pathology. Non-Melanocytic Tumors of the Skin.
    Electronic Fascicle version 2.0. Washington, D.C. Armed Forces Institute of Pathology.

  • 13. Elder DA, Murphy GF.
    Armed Forces Institute of Pathology Atlas of Tumor Pathology. Melanocytic Tumors of the Skin.
    Electronic Fascicle version 2.0. Washington, D.C. Armed Forces Institute of Pathology.

  • 14. Ackerman AB.
    Histologic Diagnosis of Inflammatory Skin Diseases. A Method by Pattern Analysis.
    Philadelphia: Lea & Febiger. 1978.

  • 15. McKee PH.
    Pathology of the Skin, with clinical correlations.
    Philadelphia: J.B. Lippincott Co. 1989.

  • 16. Ghatan HEY.
    Dermatological Differential Diagnosis and Pearls.
    New York: The Parthenon Publishing Group. 1994.

  • 17. Mehregan AH.
    Pinkus Guide to Dermatohistopathology. Fourth Edition.
    Norwalk, CT: Appleton-Century-Crofts. 1986.

  • 18. Hood AF, Kwan TH, Burnes DC, Mihm MC.
    Primer of Dermatopathology.
    Boston: Little, Brown and Company. 1984.

  • 19. Lever WF, Schaumburg-Lever G.
    Histopathology of the Skin. Seventh Edition.
    Philadelphia: J.B. Lippincott Company. 1990.

  • 20. Farmer ER, Hood AF.
    Pathology of the Skin.
    Norwalk, CT: Appleton & Lange. 1990.



12. ZIPF DISTRIBUTION OF UMLS CUIS.



    RANK  FREQUENCY  UMLS CODE  UMLS NAME
       1        198  C0007634   CELL
       2        147  C0012634   DISEASE
       3        132  C0011646   DERMIS
       4        117  C0027651   NEOPLASIA
       5        116  C0441469   PICTURE
       6         98  C0332285   ARISE FROM
       7         97  C0007952   CHARACTER
       8         83  C0205234   FOCAL
       9         81  C0205165   LESS
      10         75  C0014520   EPIDERMIS TISSUE
      11         71  C0025202   CUTANEOUS MELANOMA
      12         71  C0205615   DIFFERENTIATE
      13         68  C0027960   NEVUS
      14         68  C0205124   SUPERFICIAL
      15         66  C0028259   NODULE
      16         66  C0334094   PROLIFERATE
      17         64  C0205182   ATYPIA
      18         63  C0205160   NEGATIVE
      19         57  C0205210   CLINICAL
      20         55  C0332448   INFILTRATE
      21         53  C0221928   DERMAL
      22         52  C0205286   MATURE
      23         51  C0004083   ASSOCIATED
      24         51  C0221920   EPIDERMAL FEATURE
      25         51  C0348078   FORM
      26         47  C0205397   SEEN
      27         46  C0205125   DEEP
      28         46  C0205250   ELEVATED
      29         46  C0018270   GROWTH
      30         43  C0205390   PHASE
      31         41  C0205183   BENIGN
      32         41  C0439739   RETICULAR
      33         41  C0205392   SOME
      34         40  C0205402   PROMINENT
      35         39  C0233426   APPEARANCE
      36         39  C0205103   BETWEEN
      37         39  C0005971   BONE PLATE
      38         39  C0439712   PATTERN
      39         39  C0333610   PIGMENT
      40         39  C0150312   PRESENT
      41         39  C0441633   SCANNING
      42         38  C0162597   STROMAL CELL
      43         38  C0221908   EPITHELIAL FEATURE
      44         37  C0205146   AREA
      45         37  C0009325   COLLAGEN
      46         37  C0205164   GREAT
      47         37  C0025201   MELANOCYTE
      48         36  C0502379   DUCT
      49         36  C0332258   EXTEND
      50         36  C0205431   FORMED
      51         35  C0205428   AFFECTING
      52         33  C0205085   BEFORE
      53         32  C0205112   BASAL
      54         32  C0205099   CENTER
      55         32  C0443228   GREATER
      56         32  C0205172   MANY
      57         32  C0442038   RADIAL
      58         31  C0205284   BORDER
      59         30  C0009085   CLUSTER
      60         30  C0006826   CANCER
      61         30  C0205419   VARIANT
      62         29  C0037267   SKIN
      63         28  C0013879   ELEMENT
      64         28  C0010709   CYST
      65         28  C0014609   EPITHELIUM TISSUE
      66         28  C0205091   LEFT
      67         28  C0205171   SINGLE
      68         28  C0035621   RIGHT
      69         27  C0205113   CIRCULAR
      70         27  C0010834   CYTOPLASM
      71         27  C0521125   FOR
      72         27  C0205312   PAPILLARY
      73         27  C0030705   PATIENT
      74         26  C0024264   LYMPHOCYTE
      75         26  C0439064   NUMEROUS
      76         25  C0205748   DYSPLASTIC NEVUS
      77         25  C0332261   SPREAD
      78         25  C0332516   SYMMETRIC
      79         24  C0007117   BASAL CELL CARCINOMA
      80         24  C0449432   COMPONENT
      81         24  C0016059   DESMOPLASIA
      82         24  C0011900   DIAGNOSES
      83         24  C0332184   RARE
      84         24  C0445247   SAME
      85         24  C0042591   VESSEL
      86         23  C0205105   ABOVE
      87         23  C0391854   OVERLYING
      88         23  C0332461   PATCH
      89         23  C0205375   UNIFORM
      90         23  C0332307   TYPE
      91         22  C0332183   FREQUENT
      92         22  C0205393   MOST
      93         22  C0521447   NUCLEAR
      94         22  C0332251   PREDOMINANT
      95         21  C0003737   ARCHITECTURE
      96         21  C0178499   BASE
      97         21  C0238767   BILATERAL
      98         21  C0006382   HIS
      99         21  C0332120   EVIDENCE FOR
     100         21  C0439682   FOLLICULAR

    FREQUENCY DISTRIBUTION OF THE 100 MOST FREQUENT UMLS CONCEPTS IN AFIP SKIN LEGEND-TEXTS.
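A common way to examine a Zipf-type rank-frequency distribution is to fit a straight line to log(frequency) against log(rank); Zipf's law predicts frequency proportional to rank raised to the power -s. The sketch below applies an ordinary least-squares fit to the first 20 ranks of the table. The choice of 20 ranks is an arbitrary assumption for illustration, and no particular value of s is claimed for this corpus.

```python
import math

# Frequencies for ranks 1-20, copied from the table above.
FREQUENCIES = [198, 147, 132, 117, 116, 98, 97, 83, 81, 75,
               71, 71, 68, 68, 66, 66, 64, 63, 57, 55]


def zipf_exponent(freqs):
    """Estimate s in f(r) = C * r**(-s) by least squares on log f versus log r."""
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return -sxy / sxx  # the fitted slope is -s


print(f"estimated exponent s = {zipf_exponent(FREQUENCIES):.2f}")
```

A least-squares fit in log-log space is easy to compute but weights the sparse high-rank tail heavily. Maximum-likelihood estimators are often preferred when the full distribution is available.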




13. FALSE-NEGATIVE DERMATOPATHOLOGY CONCEPTS.



    CONCEPTS FIRST ADDED IN 1999, ABSENT FROM THE 1998 UMLS.


  • ECCRINE POROCARCINOMA (C0547063).

  • PERICYTE (C0598800).

  • SPINDLE CELL (C0682540).


    TRUE FALSE-NEGATIVE CONCEPTS.


  • ENDOPHYTIC (GROWTH BENEATH SURFACE).

  • EPIDERMAL COLLARETTE (OF PYOGENIC GRANULOMA).

  • EXOPHYTIC (GROWTH ABOVE SURFACE).

  • GLABROUS SKIN (NON-HAIRY SKIN: SOLES & PALMS).

  • GRENZ ZONE (NARROW BAND OF UNINVOLVED PAPILLARY DERMIS SEPARATING THE EPIDERMIS FROM AN UNDERLYING DERMAL INFILTRATE).

  • KAMINO BODY (= GLOBOID BODY, OF SPITZ NEVUS).

  • MELANOPHAGE (MACROPHAGE CONTAINING PHAGOCYTOSED MELANIN).

  • PAGETOID SPREAD (IN MELANOMA, PAGET'S DISEASE).

  • PALISADING CELLS (IN BASAL CELL CARCINOMA, VEROCAY BODY).

  • PARAKERATOTIC SPIRE (= CHURCH SPIRE, PARAKERATOTIC MOUND, IN VERRUCA).

  • SQUAMOUS EDDY (OF IRRITATED SEBORRHEIC KERATOSIS).

  • STARRY SKY (IN RAPIDLY GROWING LYMPHOID TISSUE).

  • VEROCAY BODY (OF SCHWANNOMA).
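Separating concepts added in a later release from true false negatives is a lookup against two UMLS versions. The sketch below uses tiny hypothetical release snapshots built from a few concepts named in this section. A real check would query each full release, and the function and variable names are illustrative only.

```python
# Hypothetical release snapshots: concept name -> CUI. Only a few entries are
# shown; a real check would query each full UMLS release.
UMLS_1998 = {"nevus": "C0027960", "dermis": "C0011646"}
UMLS_1999 = {
    **UMLS_1998,
    "eccrine porocarcinoma": "C0547063",
    "pericyte": "C0598800",
    "spindle cell": "C0682540",
}


def partition_missing(terms, old_release, new_release):
    """Split terms absent from old_release into those added in new_release
    and those still absent (true false negatives)."""
    missing = [t for t in terms if t not in old_release]
    added = {t: new_release[t] for t in missing if t in new_release}
    still_missing = sorted(t for t in missing if t not in new_release)
    return added, still_missing


added, true_fn = partition_missing(
    ["pericyte", "spindle cell", "grenz zone", "kamino body", "verocay body", "nevus"],
    UMLS_1998,
    UMLS_1999,
)
print(added)
print(true_fn)
```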


