UNIFIED MEDICAL LANGUAGE SYSTEM METATHESAURUS (UMLS-M) OF THE U.S. NATIONAL LIBRARY OF MEDICINE (USNLM): THE MOST COMPREHENSIVE PUBLICLY AVAILABLE LIST OF STANDARDIZED MEDICAL TERMINOLOGY IN THE WORLD. WHAT IS ITS CONCORDANCE RATE FOR A GENERAL PATHOLOGY TEXT?
SINARD'S OUTLINES IN PATHOLOGY: 25 CHAPTERS; A POPULAR REVIEW TEXT FOR THE PATHOLOGY BOARDS THAT COVERS ALL MAJOR AREAS OF ANATOMIC PATHOLOGY. THE TEXT WAS COMPUTER-ENCODED INTO UMLS USING AN ENRICHED SYNONYM LIST. CONCORDANCE: A MEDICALLY SIGNIFICANT TERM PRESENT IN THE TEXTBOOK IS ALSO CAPTURED BY THE ENCODING PROGRAM. FALSE NEGATIVE: A MEDICALLY SIGNIFICANT TERM FOR WHICH NO UMLS CONCEPT IS PRESENT. AMBIGUOUS TERMS, AND COMPOUND TERMS CONTAINING SUBCONCEPTS, ARE INDEXED REDUNDANTLY.
UNIFIED MEDICAL LANGUAGE SYSTEM (UMLS): DEVELOPED BY THE U.S. NATIONAL LIBRARY OF MEDICINE (USNLM) IN 1986. PURPOSE: TO AID THE DEVELOPMENT OF SYSTEMS THAT RETRIEVE ELECTRONIC BIOMEDICAL INFORMATION. http://www.nlm.nih.gov/research/umls/ LAST UPDATED: JANUARY 1, 2000. METATHESAURUS SIZE: 113,699,627 BYTES. CONCEPT UNIQUE IDENTIFIERS (CUIs): 729,248 (MAXIMUM = C0813178; RETIRED = 83,930). SYNONYMS: 1,598,176 FROM OVER 50 SOURCE VOCABULARIES. OVER 20 PARTIAL TRANSLATIONS INTO FOREIGN LANGUAGES.
COMPOUND TERM: CELLULAR BLUE NEVUS (C0334448). SUBCONCEPTS ALSO INDEXED: BLUE NEVUS (C0206736), CELL (C0007634), BLUE (C0332584), NEVUS (C0027960).
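Redundant indexing of a compound term can be pictured as looking up every contiguous word run of the phrase in a term table. The sketch below is a minimal illustration, not the encoding program used here: the table holds only the five entries from the CELLULAR BLUE NEVUS example, and mapping the adjective "cellular" to CELL is a hypothetical lexical-variant entry.

```python
# Hypothetical mini-lexicon built from the CELLULAR BLUE NEVUS example.
# "cellular" -> CELL (C0007634) is an assumed lexical-variant entry.
LEXICON = {
    "cellular blue nevus": "C0334448",
    "blue nevus": "C0206736",
    "cellular": "C0007634",
    "blue": "C0332584",
    "nevus": "C0027960",
}


def index_redundantly(phrase: str) -> list[tuple[str, str]]:
    """Return every contiguous sub-phrase found in LEXICON, longest first."""
    words = phrase.lower().split()
    hits = []
    # Try spans from the full phrase down to single words.
    for length in range(len(words), 0, -1):
        for start in range(len(words) - length + 1):
            span = " ".join(words[start:start + length])
            if span in LEXICON:
                hits.append((span, LEXICON[span]))
    return hits


if __name__ == "__main__":
    for term, cui in index_redundantly("CELLULAR BLUE NEVUS"):
        print(term, cui)
```

Keeping every sub-match, rather than only the longest one, is what makes the indexing redundant: a search for NEVUS still retrieves a passage about cellular blue nevus.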
NATURAL-LANGUAGE MEDICAL TEXT: A SEQUENCE OF MEDICAL CONCEPTS SEPARATED BY GRAMMATICAL OBJECTS. GRAMMATICAL OBJECTS, OR BARRIER WORDS: NUMERALS, PUNCTUATION, SINGLE LETTERS, ARTICLES, PREPOSITIONS, AND COMMON VERBS AND MODIFIERS. MEDICAL CONCEPTS, OR KEYWORDS: ONE-WORD OR MULTIPLE-WORD TERMS CONSISTING OF MEDICALLY SIGNIFICANT WORDS.
EXAMPLE (BARRIER WORDS IN LOWER CASE, KEYWORDS IN UPPER CASE): LICHEN SIMPLEX CHRONICUS . CHRONIC FORM of any of above with IRRITATION and TRAUMA . EPIDERMIS undergoes a PSORIASIFORM THICKENING but with an increased THICKNESS of the GRANULAR LAYER . SCARRING and BROADENING of DERMAL PAPILLAE .
TEXT UMLS CUI(S)
LICHEN SIMPLEX CHRONICUS C0149922
CHRONIC FORM C0205179 C0376315
of C0456627
any C0205392*
of C0456627
above C0205103
with C0332287
IRRITATION C0441718
and C0332287*
TRAUMA C0548346
EPIDERMIS C0014518
undergoes
a C0205447*
PSORIASIFORM THICKENING C0033860* C0332527
but C0332287*
with C0332287
an C0205447*
increased C0205216
thickness C2005400
of C0456627
the C0205435*
GRANULAR LAYER C0205247 C0205274*
SCARRING C0036287
and C0332287*
broadening C0332464*
of C0456627
DERMAL PAPILLAE C0221927 C0205312*
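The barrier-word idea can be sketched as a tokenizer that closes the current keyword phrase whenever it meets a numeral, punctuation mark, single letter, or listed function word. This is a minimal sketch, not the original encoder: the barrier list below is a hypothetical one containing only the lower-case words of the LICHEN SIMPLEX CHRONICUS example, whereas a real list would be much longer.

```python
import re

# Hypothetical barrier-word list: only the articles, prepositions,
# conjunctions, verbs, and modifiers that occur in the example sentence.
BARRIER_WORDS = {
    "a", "an", "the", "of", "with", "and", "but",
    "any", "above", "undergoes", "increased",
}


def is_barrier(token: str) -> bool:
    """Numerals, punctuation, single letters, and listed words are barriers."""
    return (
        not token.isalpha()
        or len(token) == 1
        or token.lower() in BARRIER_WORDS
    )


def split_keywords(text: str) -> list[str]:
    """Split text into keyword phrases at barrier tokens."""
    # Letter runs, digit runs, and single punctuation marks become tokens.
    tokens = re.findall(r"[A-Za-z]+|\d+|[^\sA-Za-z\d]", text)
    phrases, current = [], []
    for token in tokens:
        if is_barrier(token):
            if current:
                phrases.append(" ".join(current))
                current = []
        else:
            current.append(token)
    if current:
        phrases.append(" ".join(current))
    return phrases


if __name__ == "__main__":
    sentence = (
        "LICHEN SIMPLEX CHRONICUS. CHRONIC FORM of any of above with "
        "IRRITATION and TRAUMA. EPIDERMIS undergoes a PSORIASIFORM THICKENING "
        "but with an increased THICKNESS of the GRANULAR LAYER. SCARRING and "
        "BROADENING of DERMAL PAPILLAE."
    )
    print(split_keywords(sentence))
```

Each resulting phrase is then a candidate for lookup against UMLS synonyms; the table above shows that the barrier words themselves were also assigned CUIs in this study.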
AMBIGUOUS TERM: ADNEXA WITHOUT A NEARBY DISAMBIGUATING WORD IS INDEXED TO ALL OF SKIN ADNEXA (C0221943), UTERINE ADNEXA (C0001575), AND OCULAR ADNEXA (C0229243).
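One way to picture context-based disambiguation is to look for a cue word within a few tokens of the ambiguous term and fall back to all senses when none is found. This is a hedged sketch only: the three CUIs come from the ADNEXA example, but the cue-word lists and the five-token window are hypothetical choices, not the method used in this study.

```python
# Hypothetical cue words for each sense of ADNEXA. The CUIs are the three
# listed above; the cue lists and the window size are assumptions.
ADNEXA_SENSES = {
    "C0221943": {"skin", "cutaneous", "dermal", "epidermal"},             # SKIN ADNEXA
    "C0001575": {"uterine", "uterus", "ovary", "ovarian", "fallopian"},   # UTERINE ADNEXA
    "C0229243": {"ocular", "eye", "eyelid", "orbit", "orbital"},          # OCULAR ADNEXA
}


def resolve_adnexa(tokens: list[str], index: int, window: int = 5) -> list[str]:
    """Return ADNEXA senses cued within +/- window tokens; else all senses."""
    lo = max(0, index - window)
    context = {t.lower() for t in tokens[lo:index + window + 1]}
    senses = [cui for cui, cues in ADNEXA_SENSES.items() if cues & context]
    # No disambiguating word nearby: index redundantly to every sense.
    return senses or list(ADNEXA_SENSES)


if __name__ == "__main__":
    print(resolve_adnexa("tumors of the skin adnexa".split(), 4))
    print(resolve_adnexa("benign tumors of adnexa".split(), 3))
```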
INPUT TEXT: 951 KB, 25 CHAPTERS, 120,677 WORDS, 11,240 DISTINCT WORDS. WORD FREQUENCIES RANGE FROM 4,037 OCCURRENCES OF 'OF' DOWN TO 4,512 WORDS THAT OCCUR ONLY ONCE. AVERAGE: 120,677 / 11,240 = 10.7 OCCURRENCES PER DISTINCT WORD. 3,520 DISTINCT COLLOCATIONS HAVE EXACT OR APPROXIMATE UMLS MATCHES. 77,498 WORDS (64.2%) EXACTLY MATCH A UMLS SYNONYM, 33,348 (27.6%) ARE ADDITIONAL APPROXIMATE MATCHES TO UMLS CUIs, AND THE REMAINING 9,831 (8.1%) ARE UNMATCHED.
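The word-frequency figures above (total words, distinct words, words occurring once, most frequent word, mean occurrences per distinct word) can be computed with a frequency counter. A minimal sketch follows; the tokenization rule (lower-cased runs of letters, digits, and apostrophes) is an assumption, since the original program's definition of a word is not given, so counts on the real text could differ slightly.

```python
import re
from collections import Counter


def corpus_stats(text: str) -> dict:
    """Word-frequency summary of the kind reported for the input text.

    Assumed tokenization: lower-cased runs of letters, digits, apostrophes.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    counts = Counter(words)
    if not counts:
        return {"words": 0, "distinct": 0, "hapax": 0, "top": None, "mean": 0.0}
    return {
        "words": len(words),                                        # total occurrences
        "distinct": len(counts),                                    # distinct words
        "hapax": sum(1 for n in counts.values() if n == 1),         # seen only once
        "top": counts.most_common(1)[0],                            # (word, count)
        "mean": len(words) / len(counts),                           # occurrences per distinct word
    }


if __name__ == "__main__":
    print(corpus_stats("The cell of the cell of the nucleus."))
```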
CONCORDANCE RATE: 90.9%. UNMATCHED CONCEPTS TENDED TO BE DESCRIPTIVE PATHOLOGY TERMS THAT CHARACTERIZE MICROSCOPIC FINDINGS. UMLS IS A HIGHLY INCLUSIVE CONCEPT SYSTEM FOR PATHOLOGY; HOWEVER, IT IS SYNONYM-POOR, AND MANY SYNONYMS MUST BE ADDED MANUALLY. UMLS: A NEARLY COMPREHENSIVE METATHESAURUS FOR PATHOLOGY TEXT.
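The match percentages in the input-text statistics can be re-derived from the raw counts. The short calculation below uses only the counts reported in this study; it also shows that exact and approximate matches together cover 110,846 of 120,677 word occurrences, about 91.9%.

```python
# Counts reported for the encoding run of the 25-chapter text.
TOTAL_WORDS = 120_677
EXACT = 77_498
APPROXIMATE = 33_348


def match_breakdown(total: int, exact: int, approximate: int) -> dict[str, tuple[int, float]]:
    """Return (count, percent of total, 1 decimal) for each match category."""
    unmatched = total - exact - approximate
    parts = {
        "exact": exact,
        "approximate": approximate,
        "exact + approximate": exact + approximate,
        "unmatched": unmatched,
    }
    return {label: (n, round(100 * n / total, 1)) for label, n in parts.items()}


if __name__ == "__main__":
    for label, (n, pct) in match_breakdown(TOTAL_WORDS, EXACT, APPROXIMATE).items():
        print(f"{label}: {n:,} ({pct}%)")
```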
LEXICAL VARIANTS: NUCLEI ==> CELL NUCLEUS.
OBVIOUS SYNONYMS: CLUSTER ==> AGGREGATE.
OBVIOUS MISSPELLINGS: WILM'S ==> WILMS'; BRONCHITS ==> BRONCHITIS.
OBVIOUS CONTRACTIONS: ADDISON ==> ADDISON'S DISEASE; CUSHING ==> CUSHING'S DISEASE; SQUAMOUS ==> SQUAMOUS CELL.
COMPOUNDS: WITHOUT ==> NEGATIVE-WITH.
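The enriched synonym list can be thought of as a rewrite table applied to a term before it is looked up in UMLS. A minimal sketch, assuming whole-term lookup on lower-cased strings: the entries are the examples listed above, while the function name and the whole-term (rather than word-by-word) matching are illustrative choices.

```python
# Enriched-synonym examples from the list above, lower-cased.
ENRICHED_SYNONYMS = {
    "nuclei": "cell nucleus",             # lexical variant
    "cluster": "aggregate",               # obvious synonym
    "wilm's": "wilms'",                   # obvious misspelling
    "bronchits": "bronchitis",            # obvious misspelling
    "addison": "addison's disease",       # obvious contraction
    "cushing": "cushing's disease",       # obvious contraction
    "squamous": "squamous cell",          # obvious contraction
    "without": "negative-with",           # compound
}


def normalize(term: str) -> str:
    """Rewrite a whole term via the enriched list; unknown terms pass through."""
    key = term.strip().lower()
    return ENRICHED_SYNONYMS.get(key, key)


if __name__ == "__main__":
    for t in ["Bronchits", "NUCLEI", "tumor"]:
        print(t, "->", normalize(t))
```

Matching whole terms rather than individual words avoids double rewrites: applied word by word, SQUAMOUS CELL would become SQUAMOUS CELL CELL.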
RANK FREQUENCY WORD UMLS CUI
1 3,950 of C0456627
2 2,591 in C0439203
3 2,387 and C0332287*
4 1,873 with C0332287
5 1,779 to C0332285*
6 1,562 the C0205435*
7 1,297 or C0332270*
8 1,256 cells C0007625
9 904 usually C0332183*
10 899 cell C0007634
11 847 may C0806904
12 711 be C0014121
13 682 by C0336807
14 681 most C0205381
15 604 are C0392148*
16 537 common C0205213
17 521 is C0441912
18 469 often C0332181
19 446 can C0808716
20 439 tumor C0027651
21 435 for C0521117
22 433 small C0700320
23 418 from C0332285*
24 406 disease C0012633
25 384 but C0332287*
26 383 carcinoma C0007095
27 369 not C0205160*
28 364 more C0205171
29 358 seen C0205395
30 344 tumors C0027651
31 334 large C0549176
32 333 type C0332307
33 322 aka C0332287*
34 321 have C0605770*
35 307 at C0332285*
36 298 on C0332285*
37 294 as C0003818
38 269 which C0043237*
39 267 no C0205160*
40 263 tissue C0040300
41 254 patients C0030704
42 248 malignant C0205282
43 245 present C0392743
44 242 associated C0004083*
45 240 also C0332287*
46 236 chronic C0205179
47 234 all C0444867
48 232 lesions C0221198
49 226 prognosis C0220901
50 222 age C0001774
RANK FREQUENCY WORD
1 48 cord
2 33 still
3 30 eventually
4 30 need
5 28 cords
6 27 palisading
7 27 particularly
8 26 must
9 25 plump
10 24 polygonal
11 24 represent
12 24 sharply
13 23 counterpart
14 23 germ
15 23 host
16 22 immunoblastic
17 22 remain
18 22 should
19 22 undergo
20 21 mantle
21 21 parenchyma
22 19 just
23 19 prone
24 19 unclear
25 18 villi
26 17 subendothelial
27 16 amino
28 16 arranged
29 16 background
30 16 excellent
31 16 intrahepatic
32 16 odontogenic
33 15 excess
34 15 glans
35 15 goblet
36 15 half
37 15 untreated
38 15 villous
39 14 independent
40 14 laden
41 14 outflow
42 14 subtypes
43 13 bundles
44 13 entity
45 13 extrahepatic
46 13 granulomatosis
47 13 intracytoplasmic
48 13 invariably
49 13 oncocytic
50 13 perineural
RANK FREQUENCY TERM UMLS CUI
1 479 of the C0332285*
2 244 in the C0332285*
3 196 associated with C0332281
4 101 due to C0678226
5 86 giant cells C0017526
6 82 plasma cells C0032112
7 78 smooth muscle C0026843
8 54 tumor cells C0431085
9 50 type i C0441729
10 49 autosomal dominant C0443147
11 49 clear cell C0229473
12 49 well differentiated C0205615
13 44 into the C0332285*
14 44 type ii C0441730
15 41 soft tissue C0225317
16 39 gi tract C0017189
17 38 germinal centers C0282491
18 38 low grade C0205080
19 37 absence of C0332197
20 37 autosomal recessive C0441748
21 35 connective tissue C0009780
22 35 squamous metaplasia C0025570
23 34 spindle cell C0682540
24 34 squamous cell C0221910
25 33 bile ducts C0005400
26 33 within the C0332285*
27 32 chronic inflammation C0021376
28 32 from the C0332285*
29 31 foci of C0205234
30 31 poor prognosis C0278252
31 31 rather than C0489693*
32 31 well defined C0442825
33 30 differential diagnosis C0220820
34 30 giant cell C0017526
35 28 basement membrane C0004799
36 28 good prognosis C0278250
37 28 high grade C0205082
38 28 renal failure C0035078
39 27 bone marrow C0005953
40 27 in situ C0444498
41 26 bile duct C0005400
42 26 lymph nodes C0154054
43 26 squamous cell carcinoma C0007137
44 25 rheumatoid arthritis C0003873
45 24 cell type C0449475
46 24 soft tissues C0225317
47 23 but also C0332287*
48 22 blood vessels C0005847
49 22 inflammatory cells C0440752
50 22 of these C0332285*
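The two-word terms in the last table can be approximated by counting adjacent word pairs. This is a sketch under stated assumptions, not the study's collocation procedure: it lower-cases the text, breaks it at common punctuation so that pairs do not span sentences, and ranks pairs with a frequency counter (ties keep first-seen order).

```python
import re
from collections import Counter


def top_bigrams(text: str, n: int = 5) -> list[tuple[str, int]]:
    """Rank adjacent word pairs by frequency within punctuation-delimited segments."""
    pairs = Counter()
    # Assumed segmentation: split at sentence and clause punctuation.
    for segment in re.split(r"[.,;:!?()]", text.lower()):
        words = re.findall(r"[a-z0-9']+", segment)
        pairs.update(f"{a} {b}" for a, b in zip(words, words[1:]))
    return pairs.most_common(n)


if __name__ == "__main__":
    print(top_bigrams("Giant cells and plasma cells. Giant cells are seen.", 3))
```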