http://www.netautopsy.org/ .
Each autopsy facesheet received by the Internet Autopsy Database (IAD) consists of a demographic listing, followed by individual medical terms, representing body-sites, disease-names, morphologies, or modifiers. A typical header for demographic information appears as follows, where <CRLF> denotes carriage-return-line-feed (ASCII 13 then ASCII 10): <CRLF>
Age units are:
H = hours.
D = days.
W = weeks.
M = months (NOT minutes).
Y = years (assumed as default).
Race/ethnicity is, according to the U.S. Public Health Service, as follows:
B = Black, not of Hispanic origin.
W = White, not of Hispanic origin.
H = Hispanic.
A = Asian or Pacific Islander.
I = American Indian or Alaskan Native.
M = Multiracial or other.
Gender is:
F = female.
M = male.
U = undetermined.
Year of autopsy is the four-digit year, NOT abbreviated.
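The codes above can be checked mechanically when a header line is read. The following is a minimal sketch, not the IAD's own software. It assumes the caret-delimited layout seen in the sample header `###123456123456^67^W^M^1985^2^NONE` (identifier, age, race, gender, year, then further fields), and it assumes that a non-default age unit is written as a trailing letter on the age field (for example `3M`), which the text does not state. The names `Header` and `parse_header` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

AGE_UNITS = {"H": "hours", "D": "days", "W": "weeks", "M": "months", "Y": "years"}
RACES = {
    "B": "Black, not of Hispanic origin",
    "W": "White, not of Hispanic origin",
    "H": "Hispanic",
    "A": "Asian or Pacific Islander",
    "I": "American Indian or Alaskan Native",
    "M": "Multiracial or other",
}
GENDERS = {"F": "female", "M": "male", "U": "undetermined"}


@dataclass
class Header:
    record_id: str
    age: Optional[int]      # None when the age field is not numeric
    age_unit: str
    race: str
    gender: str
    year: int
    extra: List[str]        # remaining fields, kept uninterpreted


def parse_header(line: str) -> Header:
    """Parse one demographic header line (field order is an assumption)."""
    line = line.strip()
    if not line.startswith("###"):
        raise ValueError("header must start with ###")
    fields = line[3:].split("^")
    if len(fields) < 5:
        raise ValueError("expected at least id, age, race, gender, year")
    record_id, age_field, race, gender, year = fields[:5]

    # Assumed convention: an optional trailing unit letter; years by default.
    unit = "Y"
    if age_field and age_field[-1].upper() in AGE_UNITS:
        unit = age_field[-1].upper()
        age_field = age_field[:-1]
    age = int(age_field) if age_field.isdigit() else None

    if race not in RACES:
        raise ValueError(f"unknown race/ethnicity code: {race!r}")
    if gender not in GENDERS:
        raise ValueError(f"unknown gender code: {gender!r}")
    if len(year) != 4 or not year.isdigit():
        raise ValueError("year of autopsy must be four digits")

    return Header(record_id, age, AGE_UNITS[unit], RACES[race],
                  GENDERS[gender], int(year), fields[5:])


sample = parse_header("###123456123456^67^W^M^1985^2^NONE")
infant = parse_header("###1^3M^B^F^1990")
```

Fields after the year are returned untouched in `extra`, because their meaning is not spelled out at this point in the text.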
The clinical history portion of the facesheet begins with the text:
<CRLF>CLINICAL HISTORY:<CRLF>
The anatomical diagnosis portion of the facesheet begins with the text:
<CRLF>ANATOMICAL DIAGNOSIS:<CRLF>
The cause of death portion of the facesheet begins with the text:
<CRLF>CAUSE OF DEATH:<CRLF>
Submitted text should be terse, without syntactical complexities, and should end with an unambiguous sentence-terminator. That is, the sentence-terminator should not appear anywhere on the facesheet except at the end of a sentence.
period-space-space
period-carriagereturn-linefeed
semicolon-space
semicolon-carriagereturn-linefeed
IAD records are rendered anonymous so that neither the investigator, the IAD database administrator, nor the contributing institution alone can trace the identity of patients included in the IAD. First, the contributing institution strips or encodes patient identifiers in its submitted records, so that the IAD database administrator cannot know the identity of the patient. The IAD database administrator then provides a new, encoded identifier for the IAD. The resulting record is anonymous to the institution that contributed the autopsy, as well as to the IAD database administrator and to anyone retrieving the autopsy record from the IAD web page. Anyone desiring further information, glass slides or tissue blocks from a particular case e-mails the IAD database administrator, identifying the (doubly encoded) IAD autopsy record of interest and his/her research objective. The database administrator then decodes the published record number and restores the contributor record code provided by the contributing institution. The database administrator then forwards the institutionally coded record to the institution. At this point, the institution may decide to do nothing, or to establish a collaboration, with or without divulging the patient's identity, according to its own internal procedures.
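The four sentence terminators listed above (period-space-space, period-carriagereturn-linefeed, semicolon-space, semicolon-carriagereturn-linefeed) lend themselves to a simple regular expression. The sketch below is illustrative only and is not taken from the IAD translator; the function name `split_sentences` is hypothetical.

```python
import re

# Period followed by two spaces or CRLF; semicolon followed by one space or CRLF.
TERMINATOR = re.compile(r"\.  |\.\r\n|; |;\r\n")


def split_sentences(text: str) -> list:
    """Split facesheet text at the four terminators, dropping empty pieces."""
    return [piece.strip() for piece in TERMINATOR.split(text) if piece.strip()]


history = "Hypertension.  Massive cardiomegaly.\r\nHeart failure.\r\n"
sentences = split_sentences(history)
```

A period followed by a single space is deliberately not treated as a terminator, which is why the facesheet rules require that terminators appear only at the end of a sentence.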
by
with
showing
through
demonstrating
consistent with
no => negative
in situ => insitu
in vitro => invitro
21 trisomy => twentyonetrisomy
Third, the translator converts all letters in a sentence to lower case; removes all punctuation, numerals, 1-letter and 2-letter words; and removes all stop words, namely, articles, prepositions, conjunctions, common modifiers, and other low-information words [6]. These three steps leave behind a residual free text, which can more readily be converted into SNOMED-compatible terms.
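The substitutions and the third (reduction) step can be sketched together, since the substitutions must run before numerals and short words are stripped (otherwise "21 trisomy" and "no" would be lost). This is a minimal illustration under stated assumptions: the stop-word set below is a small hypothetical sample, not the list cited as [6], and the function names are invented for the example.

```python
import re

# Substitution pairs taken from the list above.
SUBSTITUTIONS = [
    ("in situ", "insitu"),
    ("in vitro", "invitro"),
    ("21 trisomy", "twentyonetrisomy"),
    ("no", "negative"),
]

# Hypothetical sample only; the real stop-word list is not reproduced here.
STOP_WORDS = {"the", "and", "with", "for", "from", "into"}


def apply_substitutions(sentence: str) -> str:
    """Replace each phrase as a whole word or phrase, ignoring case."""
    for old, new in SUBSTITUTIONS:
        pattern = r"\b" + re.escape(old) + r"\b"
        sentence = re.sub(pattern, new, sentence, flags=re.IGNORECASE)
    return sentence


def reduce_sentence(sentence: str) -> str:
    """Lower-case, strip punctuation and numerals, drop short and stop words."""
    text = apply_substitutions(sentence).lower()
    text = re.sub(r"[^a-z\s]", " ", text)
    words = [w for w in text.split() if len(w) > 2 and w not in STOP_WORDS]
    return " ".join(words)


residual = reduce_sentence("Carcinoma in situ of the cervix, 2 foci.")
```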
<CRLF>
###123456123456^67^W^M^1985^2^NONE<CRLF>
CLINICAL HISTORY:<CRLF>
Hypertension.  Massive cardiomegaly. <CRLF>
Heart failure. <CRLF>
ANATOMICAL DIAGNOSIS: <CRLF>
Hypertrophy and dilatation, left ventricular myocardium. <CRLF>
Generalized atherosclerosis, severe. <CRLF>
Abdominal visceral congestion. <CRLF>
Pulmonary congestion. <CRLF>
Pulmonic artery atherosclerosis. <CRLF>
Focal pulmonary emphysema. <CRLF>
Bronchopneumonia. <CRLF>
Gallstones. <CRLF>
Benign hyperplasia, prostate. <CRLF>
Adenomatous polyp, rectum. <CRLF>
Diverticula, colon. <CRLF>
###54321^67^W^M^1985^2^NONE^ Hypertensive disease, NOS^ ..... Massive^ Cardiomegaly^ ..... Heart failure, NOS^ ..... Hypertrophy, NOS^ Dilatation, NOS^ Left^ Ventricle, NOS^ Myocardium, NOS^ ..... Generalized^ Atherosclerosis, NOS^ Severe^ ..... Abdominal viscera, NOS^ Congestion, NOS^ ..... Pulmonary congestion, NOS^ ..... Pulmonary artery, NOS^ Atherosclerosis, NOS^ ..... Focal^ Pulmonary emphysema, NOS^ ..... Bronchopneumonia, NOS^ ..... Biliary calculus, NOS^ ..... Benign^ Hyperplasia of prostate, NOS^ ..... Adenomatous polyp, NOS^ Rectum, NOS^ ..... Diverticulum, NOS^ Colon, NOS^ .....
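A translated record of this kind can be read back into a header and a list of sentences, each sentence being a list of SNOMED-compatible terms. The sketch below assumes, from the sample alone, that terms end with a caret, that sentences are separated by five dots, and that the header occupies the first seven caret-delimited fields. The function name `parse_record` and the `header_fields` parameter are hypothetical.

```python
def parse_record(record: str, header_fields: int = 7):
    """Split a translated record into (header, sentences of terms)."""
    header = []
    sentences = []
    for index, chunk in enumerate(record.split(".....")):
        terms = [term.strip() for term in chunk.split("^") if term.strip()]
        if index == 0:
            # The first chunk carries the header before the first sentence.
            header, terms = terms[:header_fields], terms[header_fields:]
        if terms:
            sentences.append(terms)
    return header, sentences


excerpt = ("###54321^67^W^M^1985^2^NONE^ Hypertensive disease, NOS^ ..... "
           "Massive^ Cardiomegaly^ ..... Heart failure, NOS^ .....")
header, sentences = parse_record(excerpt)
```

Splitting on the caret rather than on commas matters here, because many terms contain a comma themselves (for example "Heart failure, NOS").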
Size of database in bytes: 59,455,676
No. of cases: 49,351
No. of sentence terminators: 956,272
No. of SNOMED-compatible terms: 2,905,520
No. of unique SNOMED-compatible terms: 11,333
Number of patients in each decade:
0 - 9 years: 16,425
10 - 19 years: 1,839
20 - 29 years: 2,665
30 - 39 years: 3,833
40 - 49 years: 5,412
50 - 59 years: 6,411
60 - 69 years: 6,370
70 - 79 years: 4,219
80 - 89 years: 1,544
90 - 99 years: 181
> 99 years: 9
age unknown: 443
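As a consistency check, the age-band counts can be summed and compared with the reported number of cases. The short sketch below uses the figures given above and reads the second band as 10 to 19 years, which is an interpretation of the listing rather than something the text states.

```python
# Counts per age band, as reported above.
age_counts = {
    "0-9": 16425, "10-19": 1839, "20-29": 2665, "30-39": 3833,
    "40-49": 5412, "50-59": 6411, "60-69": 6370, "70-79": 4219,
    "80-89": 1544, "90-99": 181, ">99": 9, "unknown": 443,
}
reported_cases = 49351

total = sum(age_counts.values())
share_under_ten = round(100 * age_counts["0-9"] / total, 1)

print(total == reported_cases)   # True: the bands account for every case
print(share_under_ten)           # 33.3 (percent of cases under 10 years)
```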
1. Encode autopsy/patient identifiers at the contributing institution and again by the IAD database administrator, so that each autopsy appears with a doubly encoded identifier number that cannot be linked to a patient by the IAD database administrator, the contributing institution, or any user of the IAD.
2. Include autopsy data from a worldwide collection of institutions, and omit the names of the contributing institutions.
3. Identify patient location only as the first digit of the postal zip code (in the case of U.S. autopsies), or as the multiple-digit international telephone exchange in the case of contributions from foreign countries.
4. Use a large database (in excess of 40,000 cases).
5. Omit the exact dates of autopsy and exact ages of the patients autopsied (permitting only the age in years and the year of autopsy).
6. Omit all free text, restricting pathologic findings to a listing of SNOMED-compatible terms derived from the original autopsy facesheet.
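The double encoding in item 1 can be illustrated with two private lookup tables, one held by the contributing institution and one by the IAD database administrator. This is a sketch under assumptions: the text does not say how codes are generated, so random tokens from Python's `secrets` module stand in for whatever scheme is actually used, and the class name `Encoder` and the sample accession number are hypothetical.

```python
import secrets


class Encoder:
    """Holds a private two-way table between received and issued codes."""

    def __init__(self):
        self._forward = {}
        self._reverse = {}

    def encode(self, identifier: str) -> str:
        """Return the issued code for an identifier, creating one if needed."""
        if identifier not in self._forward:
            code = secrets.token_hex(6)
            while code in self._reverse:      # avoid reusing an issued code
                code = secrets.token_hex(6)
            self._forward[identifier] = code
            self._reverse[code] = identifier
        return self._forward[identifier]

    def decode(self, code: str) -> str:
        """Recover the identifier this party originally received."""
        return self._reverse[code]


institution = Encoder()       # table kept only by the contributor
administrator = Encoder()     # table kept only by the IAD administrator

accession = "hypothetical-accession-1985-0042"
institution_code = institution.encode(accession)
published_code = administrator.encode(institution_code)

# A follow-up request retraces the chain one holder at a time.
forwarded = administrator.decode(published_code)
recovered = institution.decode(forwarded)
```

Neither table alone links the published code to the patient: the administrator can only recover the institution's code, and the institution never sees the published code unless the administrator forwards a request.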