UNIFIED MEDICAL LANGUAGE SYSTEM METATHESAURUS (UMLS-M) OF THE U.S. NATIONAL LIBRARY OF MEDICINE (USNLM): THE MOST COMPREHENSIVE PUBLICLY AVAILABLE LIST OF STANDARDIZED MEDICAL TERMINOLOGY IN THE WORLD. WHAT IS ITS CONCORDANCE RATE FOR CLASSICAL EMBRYOLOGY TEXTS?
G. L. STREETER: DEVELOPMENTAL HORIZONS IN HUMAN EMBRYOS.
HEUSER & CORNER: STAGE 9.
O'RAHILLY: STAGES 1-8.
TEXTS COMPUTER-ENCODED INTO UMLS CONCEPTS, USING AN ENRICHED SYNONYM LIST. CONCORDANCE: A MEDICALLY SIGNIFICANT TERM, PRESENT IN THE TEXTS, IS ALSO CAPTURED BY THE ENCODING PROGRAM. FALSE NEGATIVE: A MEDICALLY SIGNIFICANT TERM WITH NO MATCHING UMLS CONCEPT. AMBIGUOUS TERMS, AND COMPOUND TERMS CONTAINING SUBCONCEPTS, ARE INDEXED REDUNDANTLY. BY DESIGN, THE ENCODER CAPTURED NO FALSE POSITIVES.
UNIFIED MEDICAL LANGUAGE SYSTEM (UMLS): DEVELOPED BY THE U.S. NATIONAL LIBRARY OF MEDICINE (USNLM) IN 1986. PURPOSE: TO AID THE DEVELOPMENT OF SYSTEMS THAT RETRIEVE ELECTRONIC BIOMEDICAL INFORMATION. http://www.nlm.nih.gov/research/umls/ LAST UPDATED: JANUARY 1, 2000. METATHESAURUS SIZE: 113,699,627 BYTES. CONCEPT UNIQUE IDENTIFIERS (CUIs): 729,248 (MAX = C0813178; RETIRED = 83,930). SYNONYMS: 1,598,176, DRAWN FROM OVER 50 SOURCE VOCABULARIES. OVER 20 PARTIAL TRANSLATIONS INTO OTHER LANGUAGES.
EXAMPLE OF A COMPOUND TERM INDEXED REDUNDANTLY: CELLULAR BLUE NEVUS (C0334448). BLUE NEVUS (C0206736). CELL (C0007634). BLUE (C0332584). NEVUS (C0027960).
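A MINIMAL PYTHON SKETCH OF REDUNDANT INDEXING, ASSUMING THE ENCODER LOOKS UP EVERY CONTIGUOUS SUB-PHRASE OF A KEYWORD, LONGEST FIRST. THE FIVE CUIs ARE THE ONES LISTED ABOVE. THE `VARIANTS` TABLE (CELLULAR ==> CELL) IS A HYPOTHETICAL STAND-IN FOR UMLS LEXICAL NORMALIZATION, NOT THE POSTER'S ACTUAL METHOD; `index_redundantly` IS AN ILLUSTRATIVE NAME.

```python
# Toy lexicon: the five concepts the poster lists for CELLULAR BLUE NEVUS.
LEXICON = {
    "cellular blue nevus": "C0334448",
    "blue nevus": "C0206736",
    "cell": "C0007634",
    "blue": "C0332584",
    "nevus": "C0027960",
}

# Hypothetical variant table standing in for UMLS lexical normalization.
VARIANTS = {"cellular": "cell"}


def index_redundantly(phrase):
    """Return (matched text, CUI) for every contiguous sub-phrase found in LEXICON."""
    words = phrase.lower().split()
    n = len(words)
    hits = []
    for length in range(n, 0, -1):  # longest sub-phrases first
        for start in range(n - length + 1):
            span = words[start:start + length]
            key = " ".join(span)
            if key not in LEXICON and length == 1:
                # Single words may match through a lexical variant.
                key = VARIANTS.get(span[0], span[0])
            if key in LEXICON:
                hits.append((" ".join(span), LEXICON[key]))
    return hits


print(index_redundantly("CELLULAR BLUE NEVUS"))
```

THE WHOLE TERM AND EACH OF ITS SUBCONCEPTS ARE ALL EMITTED, WHICH IS WHAT "INDEXED REDUNDANTLY" MEANS HERE.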
NATURAL-LANGUAGE MEDICAL TEXT: A SEQUENCE OF MEDICAL CONCEPTS SEPARATED BY GRAMMATICAL OBJECTS. THE GRAMMATICAL OBJECTS, OR BARRIER WORDS, ARE NUMERALS, PUNCTUATION, SINGLE LETTERS, ARTICLES, PREPOSITIONS, AND COMMON VERBS AND MODIFIERS. THE MEDICAL CONCEPTS, OR KEYWORDS, ARE ONE-WORD OR MULTIPLE-WORD TERMS CONSISTING OF MEDICALLY SIGNIFICANT WORDS.
GUT TRACT and its DERIVATIVES . at this same time the PHARYNGEAL POUCHES , which heretofore have been relatively simple LATERAL EXPANSIONS of GUT EPITHELIUM intervening between the AORTIC ARCHES , are taking the form of SPECIALIZED STRUCTURES . one can RECOGNIZE the beginning TRANSFORMATION into an AUDITORY TUBE and TYMPANUM , also the PRIMORDIA of the THYMUS , LATERAL THYROID , and SUPERIOR and INFERIOR PARATHYROID GLANDS .
BARRIER WORDS IN LOWER CASE; KEYWORDS IN UPPER CASE.
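A MINIMAL PYTHON SKETCH OF BARRIER-WORD SEGMENTATION: ANY RUN OF NON-BARRIER TOKENS BETWEEN BARRIER TOKENS IS TREATED AS ONE KEYWORD PHRASE. THE ENCODER'S FULL BARRIER LIST IS NOT GIVEN, SO `BARRIER_WORDS` HERE IS ONLY THE SET OF LOWER-CASE WORDS IN THE SAMPLE PASSAGE ABOVE; THE TOKENIZER REGEX AND THE FUNCTION NAMES ARE ASSUMPTIONS.

```python
import re

# Barrier words taken from the lower-case words of the sample passage;
# the encoder's complete barrier list is not stated on the poster.
BARRIER_WORDS = {
    "also", "an", "and", "are", "at", "been", "beginning", "between", "can",
    "form", "have", "heretofore", "intervening", "into", "its", "of", "one",
    "relatively", "same", "simple", "taking", "the", "this", "time", "which",
}

# Words (with apostrophes), digit runs, or any single other non-space character.
TOKEN_RE = re.compile(r"[A-Za-z']+|\d+|[^\sA-Za-z\d]")


def is_barrier(token):
    # Numerals, single characters (letters or punctuation), and listed words.
    return token.isdigit() or len(token) == 1 or token.lower() in BARRIER_WORDS


def keyword_phrases(text):
    """Split text into keyword phrases separated by barrier tokens."""
    phrases, current = [], []
    for token in TOKEN_RE.findall(text):
        if is_barrier(token):
            if current:
                phrases.append(" ".join(current))
                current = []
        else:
            current.append(token)
    if current:
        phrases.append(" ".join(current))
    return phrases


print(keyword_phrases("one can RECOGNIZE the beginning TRANSFORMATION "
                      "into an AUDITORY TUBE and TYMPANUM"))
```

BECAUSE ONLY BARRIER WORDS ARE LISTED, ANY WORD NOT ON THE LIST (E.G. A NEW ANATOMICAL TERM) FALLS INTO A KEYWORD PHRASE BY DEFAULT.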
TEXT TERM UMLS CUI(S)
GUT TRACT C0699818 C0332208*
and C0332287*
its C0027344*
DERIVATIVES C0243070
at C0332285*
this C0205435*
same C0445243
time C0040213
the C0205435*
PHARYNGEAL POUCHES C0231067*
which C0043237*
heretofore C0332152*
have C0605770*
been C0392148*
relatively C0205345*
simple C0205347
LATERAL EXPANSIONS C0205091 C0205229*
of C0456627
GUT EPITHELIUM C0699818 C0014603
intervening C0205102
between C0205102
the C0205435*
AORTIC ARCHES C0442005
are C0392148*
taking
the C0205435*
form C0376315
of C0456627
SPECIALIZED STRUCTURES C0205548 C0678594*
one C0205429
can C0808716
RECOGNIZE C0524637*
the C0205435*
beginning C0439657
TRANSFORMATION C0040682
into C0332285
an C0205447*
AUDITORY TUBE C0439822 C0175730
and C0332287*
TYMPANUM C0242251
also C0332287*
the C0205435*
PRIMORDIA C0678727*
of C0456627
the C0205435*
THYMUS C0496916
LATERAL THYROID C0205091 C0795756
and C0332287*
SUPERIOR C0205103
and C0332287*
INFERIOR PARATHYROID GLANDS C0678975 C0030518 C0225352
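A SKETCH FOR READING ROWS OF THE TABLE ABOVE BACK INTO (TERM, CUIs) PAIRS, ASSUMING EVERY CUI IS "C" FOLLOWED BY SEVEN DIGITS, AS ALL CUIs SHOWN HERE ARE. THE POSTER DOES NOT STATE WHAT THE TRAILING ASTERISK MEANS, SO THE SKETCH ONLY RECORDS IT AS A FLAG. `parse_row` IS A HYPOTHETICAL HELPER.

```python
import re

# A CUI is "C" plus seven digits, optionally followed by the poster's asterisk.
CUI_RE = re.compile(r"C\d{7}\*?")


def parse_row(row):
    """Split 'TERM CUI CUI*' into (term, [(cui, starred), ...])."""
    tokens = row.split()
    cuis = []
    # CUIs trail the term, so peel them off from the right.
    while tokens and CUI_RE.fullmatch(tokens[-1]):
        tok = tokens.pop()
        cuis.append((tok.rstrip("*"), tok.endswith("*")))
    cuis.reverse()
    return " ".join(tokens), cuis


rows = ["GUT TRACT C0699818 C0332208*", "taking", "THYMUS C0496916"]
parsed = [parse_row(r) for r in rows]
unmatched = [term for term, cuis in parsed if not cuis]
print(parsed)
print(unmatched)
```

ROWS WITH AN EMPTY CUI LIST, SUCH AS "taking", ARE THE WORDS THE ENCODER LEFT UNMATCHED.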
AMBIGUOUS TERM: ADNEXA, WITH NO NEARBY DISAMBIGUATING WORD, IS INDEXED AS SKIN ADNEXA (C0221943), UTERINE ADNEXA (C0001575), AND OCULAR ADNEXA (C0229243).
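A MINIMAL PYTHON SKETCH OF CONTEXT-WINDOW DISAMBIGUATION FOR ADNEXA. THE THREE SENSES AND CUIs ARE FROM THE POSTER; USING "SKIN", "UTERINE" AND "OCULAR" AS CUE WORDS, AND A WINDOW OF THREE TOKENS ON EACH SIDE, ARE ASSUMPTIONS. WHEN NO CUE IS NEARBY, EVERY SENSE IS RETURNED, MIRRORING THE REDUNDANT INDEXING OF AMBIGUOUS TERMS.

```python
# Sense inventory from the poster; the cue words and window size are assumptions.
ADNEXA_SENSES = {
    "skin": ("SKIN ADNEXA", "C0221943"),
    "uterine": ("UTERINE ADNEXA", "C0001575"),
    "ocular": ("OCULAR ADNEXA", "C0229243"),
}


def encode_adnexa(tokens, index, window=3):
    """Return candidate senses for tokens[index] (assumed to be 'adnexa')."""
    lo = max(0, index - window)
    hi = index + window + 1
    context = {t.lower() for t in tokens[lo:hi]}
    senses = [ADNEXA_SENSES[cue] for cue in ADNEXA_SENSES if cue in context]
    # No nearby disambiguating word: index every sense redundantly.
    return senses or list(ADNEXA_SENSES.values())


print(encode_adnexa("the uterine adnexa were enlarged".split(), 2))
print(encode_adnexa("the adnexa were enlarged".split(), 1))
```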
INPUT TEXT: 1.26 MB; 110,314 WORDS; 9,087 DISTINCT WORDS. 5,323 (4.8%) MISSPELLINGS (OPTICAL CHARACTER RECOGNITION ERRORS). WORD FREQUENCIES RANGED FROM 10,394 OCCURRENCES OF 'THE' DOWN TO 4,776 WORDS OCCURRING ONLY ONCE. AVERAGE: 12.1 (= 110,314/9,087) OCCURRENCES PER DISTINCT WORD. 401 COLLOCATIONS WITH EXACT OR APPROXIMATE UMLS MATCHES. AMONG THE 104,991 CORRECTLY SPELLED WORDS: 48,758 (46.4%) EXACT MATCHES TO A UMLS SYNONYM; 46,250 (44.0%) ADDITIONAL APPROXIMATE MATCHES TO UMLS CUIs; 9.5% UNMATCHED CONCEPTS.
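THE PERCENTAGES ABOVE CAN BE RECHECKED FROM THE RAW COUNTS. THIS SKETCH ASSUMES, AS THE TEXT STATES, THAT THE MATCH RATES ARE TAKEN OVER CORRECTLY SPELLED WORDS (110,314 - 5,323 = 104,991) WHILE THE MISSPELLING RATE IS TAKEN OVER ALL WORDS. THE VARIABLE NAMES ARE ILLUSTRATIVE.

```python
# Counts as reported on the poster.
total_words = 110_314
distinct_words = 9_087
misspellings = 5_323
exact = 48_758
approximate = 46_250

correct = total_words - misspellings          # correctly spelled words
unmatched = correct - exact - approximate     # neither exact nor approximate


def pct(part, whole):
    return 100 * part / whole


stats = {
    "misspelled %": pct(misspellings, total_words),
    "exact %": pct(exact, correct),
    "approximate %": pct(approximate, correct),
    "unmatched %": pct(unmatched, correct),
    "occurrences per word": total_words / distinct_words,
}
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

THE EXACT, UNMATCHED, MISSPELLING AND PER-WORD FIGURES REPRODUCE THE POSTER'S ROUNDED VALUES; THE APPROXIMATE-MATCH SHARE WORKS OUT TO 44.05%, WHICH THE POSTER GIVES AS 44.0%.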
FALSE-NEGATIVE UMLS CONCEPTS TENDED TO BE DESCRIPTIVE EMBRYOLOGY TERMS THAT CHARACTERIZE MICROSCOPIC FINDINGS. CONCLUSION: UMLS IS A NEARLY COMPREHENSIVE METATHESAURUS FOR EMBRYOLOGY TEXT.
LEXICAL VARIANTS: NUCLEI ==> CELL NUCLEUS.
OBVIOUS SYNONYMS: CLUSTER ==> AGGREGATE.
OBVIOUS MISSPELLINGS: WILM'S ==> WILMS'; BRONCHITS ==> BRONCHITIS.
OBVIOUS CONTRACTIONS: ADDISON ==> ADDISON'S DISEASE; CUSHING ==> CUSHING'S DISEASE; SQUAMOUS ==> SQUAMOUS CELL.
COMPOUNDS: WITHOUT ==> NEGATIVE-WITH.
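A MINIMAL PYTHON SKETCH OF APPROXIMATE MATCHING BY CATEGORY: EXACT LOOKUP FIRST, THEN HAND-BUILT REWRITE TABLES FOR VARIANTS, SYNONYMS, CONTRACTIONS AND COMPOUNDS, THEN A STRING-SIMILARITY FALLBACK FOR MISSPELLINGS. THE TARGET STRINGS ARE THE POSTER'S EXAMPLES; USING PYTHON'S `difflib.get_close_matches` WITH A 0.8 CUTOFF FOR MISSPELLINGS IS A SWAPPED-IN TECHNIQUE, NOT THE ENCODER'S DOCUMENTED METHOD.

```python
import difflib

# Target terms from the poster's examples; a real run would use UMLS strings.
LEXICON = {"cell nucleus", "aggregate", "wilms'", "bronchitis",
           "addison's disease", "cushing's disease", "squamous cell",
           "negative-with"}

# Hypothetical rewrite tables, one per approximate-match category.
REWRITES = {
    "lexical variant": {"nuclei": "cell nucleus"},
    "synonym": {"cluster": "aggregate"},
    "contraction": {"addison": "addison's disease",
                    "cushing": "cushing's disease",
                    "squamous": "squamous cell"},
    "compound": {"without": "negative-with"},
}


def approximate_match(word, cutoff=0.8):
    """Return (lexicon term or None, match category)."""
    w = word.lower()
    if w in LEXICON:
        return w, "exact"
    for category, table in REWRITES.items():
        if w in table:
            return table[w], category
    # Misspellings: fall back to difflib string similarity (an assumption).
    close = difflib.get_close_matches(w, sorted(LEXICON), n=1, cutoff=cutoff)
    if close:
        return close[0], "misspelling"
    return None, "unmatched"


for word in ["NUCLEI", "CLUSTER", "BRONCHITS", "ADDISON", "WITHOUT"]:
    print(word, approximate_match(word))
```

TRYING THE EXPLICIT TABLES BEFORE THE SIMILARITY FALLBACK KEEPS A SHORT WORD SUCH AS ADDISON FROM BEING MISREAD AS A MISSPELLING.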
RANK FREQUENCY WORD UMLS CUI
1 10,394 the C0205435*
2 5,441 of C0456627
3 3,574 in C0439203
4 3,123 and C0332287*
5 1,982 to C0332285*
6 1,947 is C0441912
7 1,042 that C0205435*
8 959 are C0392148*
9 947 by C0336807
10 919 as C0003818
11 919 be C0014121
12 903 it C0027361*
13 756 this C0205435*
14 695 from C0332285*
15 602 which C0043237*
16 597 mm C0439266
17 560 with C0332287
18 545 at C0332285*
19 541 embryos C0013935
20 534 cells C0007625
21 533 no C0205160*
22 515 age C0001774
23 505 embryo C0013932
24 438 group C0441832
25 422 stage C0684248
26 417 one C0205429
27 404 an C0205447*
28 403 for C0521117
29 389 its C0027344*
30 388 or C0332270*
31 372 has C0605674
32 348 on C0332285*
33 339 not C0205160*
34 330 been C0392148*
35 302 these C0205392*
36 296 form C0376315
37 295 more C0205171
38 294 can C0808716
39 294 fig C0349932
40 288 human C0020102
41 287 have C0605770*
42 269 embryonic C0521444
43 269 plate C0005971
44 268 specimens C0370003*
45 247 figure C0441469*
46 240 into C0332285
47 238 their C0027361*
48 234 was C0392148*
49 233 primitive C0033153*
50 230 shown C0332265*
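A RANKING LIKE THE ONE ABOVE, AND THE INPUT-TEXT SUMMARY FIGURES (DISTINCT WORDS, WORDS OCCURRING ONCE, OCCURRENCES PER WORD), CAN BE PRODUCED WITH A SIMPLE FREQUENCY COUNT. THIS SKETCH USES `collections.Counter`; THE LETTERS-ONLY TOKENIZER AND THE SAMPLE SENTENCE ARE ASSUMPTIONS, NOT THE POSTER'S PIPELINE.

```python
import re
from collections import Counter


def word_stats(text):
    """Rank words by frequency and summarize the vocabulary."""
    words = re.findall(r"[a-z]+", text.lower())  # letters-only tokens (assumption)
    counts = Counter(words)
    return {
        "tokens": len(words),
        "distinct": len(counts),
        "hapaxes": sum(1 for c in counts.values() if c == 1),  # words seen once
        "per_word": len(words) / len(counts),
        "ranking": counts.most_common(),  # (word, frequency), most frequent first
    }


sample = "The pharyngeal pouches of the gut and the aortic arches of the embryo"
s = word_stats(sample)
for rank, (word, freq) in enumerate(s["ranking"][:3], start=1):
    print(rank, freq, word)
```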
RANK FREQUENCY WORD
1 101 stalk
2 52 way
3 48 anat
4 48 germ
5 46 pit
6 44 wash
7 43 prechordal
8 43 until
9 42 could
10 42 pairs
11 39 order
12 37 presumed
13 36 ones
14 36 taken
15 35 bars
16 34 cord
17 33 profile
18 32 come
19 32 shell
20 31 free
21 30 chordal
22 30 example
23 29 details
24 29 passage
25 29 polar
26 28 neuropore
27 28 sharply
28 27 consists
29 27 how
30 27 intercellular
31 27 lacunae
32 27 takes
33 27 tubal
34 26 epiblast
35 26 instead
36 25 detail
37 25 folds
38 25 just
39 25 meso
40 25 partly
41 24 gelatinous
42 24 manner
43 24 owing
44 23 cited
45 23 conspicuous
46 23 quite
47 22 field
48 22 particular
49 22 particularly
50 22 proper
RANK FREQUENCY TERM UMLS CUI
1 2,613 of the C0332285*
2 1,037 in the C0332285*
3 346 from the C0332285*
4 267 age group C0596048
5 147 has been C0392148*
6 141 of this C0332285*
7 127 for the C0521125*
8 123 yolk sac C0042893
9 110 embryonic disc C0231003
10 100 primitive streak C0033153
11 93 into the C0332285*
12 78 have been C0392148*
13 78 there is C0332287*
14 69 through the C0332273*
15 59 chorionic cavity C0230966
16 58 between the C0205103*
17 55 of these C0332285*
18 49 there are C0332287*
19 47 age groups C0027362
20 38 nervous system C0027763
21 36 neural tube C0231024
22 33 sinus venosus C0231084
23 32 the other C0205394*
24 30 central nervous system C0007679
25 29 chorionic villi C0008508
26 29 within the C0332285*
27 28 blood vessels C0005847
28 27 stage 5 C0441777
29 27 zona pellucida C0043519
30 26 referred to C0205543
31 24 in addition C0332287*
32 24 over the C0205136*
33 22 cloacal membrane C0231056
34 22 vascular system C0489903
35 20 along the C0205428*
36 19 due to C0678226
37 19 in addition to C0332287
38 18 site of C0449643
39 18 stage 2 C0441767
40 18 stage 3 C0441771
41 17 about the C0475806*
42 17 under the C0542339*
43 16 amniotic cavity C0230976
44 16 as well as C0332287*
45 16 rather than C0489693*
46 15 blood cells C0005773
47 14 chick embryo C0008046
48 14 germ cells C0017471
49 14 in vitro C0021135
50 14 of that C0332285*
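A MINIMAL PYTHON SKETCH OF COUNTING MULTI-WORD COLLOCATIONS: EVERY RUN OF TWO OR THREE CONSECUTIVE WORDS IS COUNTED, AND ONLY RUNS FOUND IN A LEXICON ARE KEPT. THE LEXICON IS A TOY SUBSET OF TERMS AND CUIs FROM THE TABLE ABOVE; UNLIKE THE POSTER'S ENCODER, THIS SKETCH DOES EXACT LOOKUPS ONLY, SO APPROXIMATE MATCHES SUCH AS "of the" ARE NOT FOUND.

```python
from collections import Counter

# A few multi-word terms and CUIs from the table above (toy lexicon).
LEXICON = {
    "yolk sac": "C0042893",
    "embryonic disc": "C0231003",
    "primitive streak": "C0033153",
    "central nervous system": "C0007679",
}


def ngram_counts(words, n):
    """Count every run of n consecutive words."""
    grams = zip(*(words[i:] for i in range(n)))
    return Counter(" ".join(g) for g in grams)


def matched_collocations(text, max_n=3):
    """Count 2..max_n word runs that appear in LEXICON."""
    words = text.lower().split()
    hits = Counter()
    for n in range(2, max_n + 1):
        for gram, count in ngram_counts(words, n).items():
            if gram in LEXICON:
                hits[gram] += count
    return hits


sample = "The yolk sac lies below the embryonic disc and the yolk sac grows"
for term, count in matched_collocations(sample).most_common():
    print(count, term, LEXICON[term])
```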