Send comments and correspondence to:
George.Moore4@va.gov
See also:
http://www.medparse.com/gwmcv.cv .............
SUMMARY OF ACADEMIC CONTRIBUTIONS:
MATHEMATICAL MODELS OF DISEASE PATHOGENESIS.
Contributions of over 180 papers published in peer-reviewed
journals, spanning an experience of over forty years
in the study of mathematical models of human disease.
The major subject headings are:
1. Separate.
2. Connect.
3. Estimate.
4. Swap.
5. Translate.
6. Harm.
7. Order.
and 8. Grow.
0. INTRODUCTION.
1. Separate.
In medical diagnosis and therapy, the first step is to
separate patients who need medical care
from those who do not. In the simplest case, if patients are measured
along a single, linear scale, then one may observe a histogram,
or distribution, of patients with respect to this measurement.
If this histogram is bimodal (i.e., has two peaks), then possibly
these peaks correspond to the absence (left) or presence (right)
of a medical condition.
[Figure: bimodal histogram.]
2. Connect.
The second step in medical care is to connect
one group of patients with another group. In the simplest case,
one may associate patients with many features in common; equivalently,
one may dissociate patients with few features in common.
3. Estimate.
Considerable medical information is collected at varying levels
of uncertainty, from which one must estimate
the true state of the patient's illness
.............
4. Swap.
For data collected under uncertainty, there will often be a certain
proportion of misclassifications, either false negatives or
false positives.
................
swap
.............
5. Translate.
Medical histories, physical findings, and anatomic pathology reports
are written in free text, and will remain so for the foreseeable future.
Therefore, we must translate
this essential information about a patient into a form that can be reviewed
for quality assurance and other administrative purposes.
6. Harm.
Medical ethics begins with Hippocrates' famous dictum: First do no
harm. In science, one pursues truth at all costs.
In patient care, the pursuit of truth in diagnosis must be tempered
by the harm that it causes to obtain the information. This fact is recognized
in all civilized societies by the profusion of regulations and laws
that prevent patient harm, or punish a health-provider who causes
patient harm.
7. Order.
.............
order
.............
8. Grow.
.............
grow
.............
1. Separate.
MINIMUM SQUARES RATIO METHOD.
In a bimodal histogram (i.e., histogram with two peaks), the question arises
whether two coherent clusters of data are present in the histogram:
[Figure: bimodal histogram divided by a dotted line into left and right halves.]
Coherence is understood here as low variance or
low sum-of-squares. That is, the sum-of-squares of the histogram
to the left of the dotted line plus the sum-of-squares of the histogram
to the right of the dotted line should be small relative to the
sum-of-squares of the entire histogram.
The MINIMUM SQUARES RATIO METHOD examines data that have been
separated at every possible dividing point along the x-axis
(horizontal-axis), and tests whether the resulting clusters are
significantly coherent. The TOTAL SUM OF SQUARES,
or SECOND MOMENT, denoted TSS, for the histogram as a whole,
is given by the expression:
$$TSS = \sum_{i=1}^{n} (x_i - X)^2$$
where the ith histogram-block has value $x_i$, and
$X$ is the grand mean of all the n points, i.e.,
$$X = \frac{1}{n} \sum_{i=1}^{n} x_i.$$
If one divides the histogram into a LEFT HALF and a
RIGHT HALF (dotted line), then the LEFT SUM OF SQUARES
is given by the expression:
$$LSS = \sum_{j=1}^{n_L} (Lx_j - LX)^2$$
and the RIGHT SUM OF SQUARES is given by the expression:
$$RSS = \sum_{k=1}^{n_R} (Rx_k - RX)^2$$
where the $Lx_j$ are the $n_L$ left-sided histogram-blocks;
the $Rx_k$ are the $n_R$ right-sided histogram-blocks, with $n_L + n_R = n$;
$LX$ is the left-sided mean, i.e.,
$LX = \frac{1}{n_L} \sum_{j=1}^{n_L} Lx_j$;
and $RX$ is the right-sided mean, i.e.,
$RX = \frac{1}{n_R} \sum_{k=1}^{n_R} Rx_k$.
The computer algorithm examines all possible left-right dividers,
and returns the divider with the MINIMUM SQUARES RATIO, MSR, defined
as the minimum of (LSS+RSS)/TSS over all possible left-right dividers.
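The search over dividers described above can be sketched in Python as follows. This is a minimal illustration, not the published Visual Basic or Perl program. It assumes that the $x_i$ are individual observations ordered along the x-axis; the function names `sum_of_squares` and `minimum_squares_ratio` are hypothetical.

```python
from typing import Sequence, Tuple


def sum_of_squares(values: Sequence[float]) -> float:
    """Sum of squared deviations of the values from their own mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def minimum_squares_ratio(data: Sequence[float]) -> Tuple[float, int]:
    """Return (MSR, k), where k points lie to the left of the best divider."""
    xs = sorted(data)  # dividers run along the x-axis, so order the points
    tss = sum_of_squares(xs)
    if tss == 0:
        raise ValueError("TSS = 0: all values are equal, so the ratio is undefined")
    best_ratio, best_k = float("inf"), 0
    for k in range(1, len(xs)):  # k = size of the left half; both halves non-empty
        ratio = (sum_of_squares(xs[:k]) + sum_of_squares(xs[k:])) / tss
        if ratio < best_ratio:
            best_ratio, best_k = ratio, k
    return best_ratio, best_k


# Assumed toy data: two tight clusters, so the best divider falls between them.
msr, k = minimum_squares_ratio([1, 2, 3, 11, 12, 13])
print(f"MSR = {msr:.4f} with {k} points in the left half")
```

For the toy data, TSS = 154 and the best divider leaves LSS = RSS = 2, so the ratio is 4/154. The loop recomputes each sum of squares from scratch for clarity; running sums would be faster for large samples.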
A SIGNIFICANCE TEST for a given histogram of sample size n
is obtained by comparing the calculated MSR for that histogram
against a distribution of random histograms of the same sample size.
The distribution of random histograms of sample size n
is obtained by MONTE-CARLO SIMULATION
(Cashwell and Everett, 1959;
Moore and Berman, 1991;
Berman and Moore, 1992).
The standard distribution for comparison may be a normal distribution,
or any other suitable distribution. For example, a histogram of sample size
n=100 is compared to Monte Carlo simulations drawn repeatedly,
in samples of size n=100, from a normal (Gaussian) distribution.
The significance level p=0.05 corresponds to the lowest 5%
of minimum-squares-ratios among the samples drawn.
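A Monte Carlo comparison of this kind might be sketched as follows. The sketch assumes a standard normal reference distribution (the ratio does not change under shifting or rescaling of the data), a fixed random seed, and hypothetical names `msr` and `monte_carlo_p_value`; the number of trials is an arbitrary choice, not a value from the cited papers.

```python
import random
from typing import Sequence


def msr(data: Sequence[float]) -> float:
    """Minimum of (LSS+RSS)/TSS over all dividers, using running sums."""
    xs = sorted(data)
    n = len(xs)
    total = sum(xs)
    total_sq = sum(x * x for x in xs)
    tss = total_sq - total * total / n
    if tss <= 0:
        raise ValueError("TSS = 0: all values are equal")
    best = float("inf")
    left = left_sq = 0.0
    for k in range(1, n):  # k points on the left, n - k on the right
        left += xs[k - 1]
        left_sq += xs[k - 1] ** 2
        lss = left_sq - left * left / k
        right, right_sq = total - left, total_sq - left_sq
        rss = right_sq - right * right / (n - k)
        best = min(best, (lss + rss) / tss)
    return best


def monte_carlo_p_value(data: Sequence[float], trials: int = 500, seed: int = 0) -> float:
    """Fraction of random normal samples whose MSR is at most the observed MSR."""
    rng = random.Random(seed)
    observed = msr(data)
    n = len(data)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if msr(sample) <= observed:
            hits += 1
    return hits / trials


# Assumed toy data: two widely separated clusters of 20 points each.
bimodal = [0.0] * 20 + [10.0] * 20
print(monte_carlo_p_value(bimodal))
```

A small returned fraction means that few random samples separate as cleanly as the observed histogram; the histogram would be called significantly bimodal at p=0.05 when the fraction is below 0.05.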
It can be proved by ordinary algebra that if TSS > 0,
i.e., if not all histogram-blocks have the same value, then
0 ≤ squares ratio ≤ 1.
THEOREM.
Total sum-of-squares, TSS = 0, if and only if
all xi are equal.
PROOF: IF. If all $x_i$ are equal, then by definition of the grand mean,
$X = \frac{1}{n} \sum_{i=1}^{n} x_i = x_1 = x_2 = \cdots = x_i = \cdots = x_n$.
Then $TSS = \sum_{i} (x_i - X)^2 = \sum_{i} (x_i - x_i)^2 = 0$.
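PROOF: ONLY IF.
Each term $(x_i - X)^2$ is non-negative, so if TSS = 0 then every term
is zero. Hence $x_i = X$ for every i, and all $x_i$ are equal.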
THEOREM.
The squares ratio lies between 0 and 1, i.e.,
$0 \le (LSS+RSS)/TSS \le 1$,
and is strictly less than 1 if at least two $x_i$
are unequal.
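One way to see the bounds is the standard decomposition of the total sum of squares, sketched here under the assumption that $n_L$ and $n_R$ denote the numbers of left-sided and right-sided histogram-blocks (notation introduced for this sketch):

```latex
\begin{aligned}
\sum_{j=1}^{n_L} (Lx_j - X)^2 &= \sum_{j=1}^{n_L} (Lx_j - LX)^2 + n_L (LX - X)^2, \\
\sum_{k=1}^{n_R} (Rx_k - X)^2 &= \sum_{k=1}^{n_R} (Rx_k - RX)^2 + n_R (RX - X)^2, \\
TSS &= LSS + RSS + n_L (LX - X)^2 + n_R (RX - X)^2 .
\end{aligned}
```

The cross terms vanish because deviations from a mean sum to zero. Every term on the right is non-negative, so $0 \le LSS + RSS \le TSS$; dividing by $TSS > 0$ gives the stated bounds, with the ratio equal to 1 only when $LX = RX = X$.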
NOTE.
The minimum squares ratio for the uniform distribution is 1/2.
The minimum squares ratio for the normal distribution (large sample) is
approximately 0.38.
COMPUTER PROGRAMS.
A computer program for performing MSR was originally described in:
Albert S, Wolf PL, Pryjma I, Moore W.
Thymus development in high- and low-leukemic mice.
J Reticuloendothel Soc. 1965;2:218-237.
Albert S, Wolf PL, Loud AV, Pryjma I, Potter R, Moore W.
Spleen development in mice and high- and low-leukemic strains.
J Reticuloendothel Soc. 1966;3:176-201.
and later in:
Moore GW, Berman JJ, Sydnor DL.
Automated edge detection in image analysis:
distinguishing the nucleus from the cytoplasm
without a user's threshold estimate.
Am J Clin Pathol. 1994;102:539.
http://www.netautopsy.org/ascpedge.htm
Moore GW, Berman JJ, Moore GW, Brown LA.
Software for image segmentation and analysis in pathology (ISAP):
public domain image software and source code developed at the Baltimore
VA Medical Center.
Am J Clin Pathol. 1994;102:538-539.
http://www.netautopsy.org/ascpisap.htm
The program is a U.S. Government work, uncopyrighted and in the
public domain, available as
Microsoft®
Visual Basic® or Perl source code. See:
Microsoft® Visual Basic® version:
http://www.medparse.com/isapvisb.htm
Perl version:
http://www.medparse.com/isapver2.htm
For a histogram (univariate random variable), divide the distribution
at all possible points along the x-axis, and calculate the squares ratio
SR as:
$$SR = \frac{\sum_{i} (Lx_i - LX)^2 + \sum_{j} (Rx_j - RX)^2}{\sum_{i} (Lx_i - X)^2 + \sum_{j} (Rx_j - X)^2}$$
where the $Lx_i$ are left-sided histogram-blocks;
the $Rx_j$ are right-sided histogram-blocks;
$LX$ is the left-sided mean; $RX$
is the right-sided mean; and
$X$ is the grand mean.
The minimum squares ratio, MSR, determines
the best left-right separation of histogram-blocks. This method was used to
demonstrate the appearance of two populations in experimental studies
of murine leukemia.
2. Connect.
SET THEORY/GRAPH THEORY APPROACH
TO MOLECULAR EVOLUTION.
In evolutionary biology, a common ancestor for all animal species,
and perhaps for all living species, is inferred from their
shared genetic elements, or genes, which in turn give rise
to protein products. Therefore, species with a more recent
common ancestor can be expected to share more common genes
and gene-products than more distantly separated species.
Many of the same mathematical models used in evolutionary theory
have applications in the study of cell growth, differentiation,
and cancer (=unbounded cell growth).
In mathematical set theory, the set of elements shared by a pair of sets
is the INTERSECTION, denoted ∩, of the sets.
Ouchterlony immunodiffusion plates are used
to determine the amount of immunoglobulins not shared in the serum
for a pair of species.
A sparse matrix of such data-elements can be solved for
a ranking of species-common-ancestors
only if the Leontief matrix is non-singular.
Wassily Leontief was awarded a Nobel Prize in Economics in 1973,
for his work in sparse matrices of industrial output data,
showing how various segments of the industrial economy interact
with one another.
The idea is that immunoglobulin proteins may be regarded as members of a
mathematical set, and that the hierarchy (graph) of set-intersections
corresponds to molecular evolutionary distance between species.
This method was used to demonstrate the close molecular relationship
between humans and great ape species.
Moore GW.
A Mathematical Model for the Construction of Cladograms.
North Carolina State University.
Institute of Statistics. Mimeograph Series No. 731 (1971).
Ph.D. Dissertation.
Abstract and Full Text:
http://www.netautopsy.org/mathclad.htm
The method was also used to demonstrate the hierarchical distribution
of metastases in human cancers.
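The set-intersection idea in this section can be illustrated with a small Python sketch. The species labels and feature names below are invented placeholders, not immunological data, and `rank_pairs_by_shared` is a hypothetical helper; it only ranks pairs by the size of their intersection and does not attempt the full cladogram construction of the dissertation.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple


def rank_pairs_by_shared(
    features: Dict[str, FrozenSet[str]]
) -> List[Tuple[str, str, int]]:
    """List species pairs, most shared features (largest intersection) first."""
    pairs = []
    for a, b in combinations(sorted(features), 2):
        shared = features[a] & features[b]  # set intersection
        pairs.append((a, b, len(shared)))
    # sorted() is stable, so ties keep their alphabetical order
    return sorted(pairs, key=lambda pair: -pair[2])


# Hypothetical toy data: each species is a set of abstract protein markers.
toy = {
    "species_A": frozenset({"p1", "p2", "p3", "p4"}),
    "species_B": frozenset({"p1", "p2", "p3", "p5"}),
    "species_C": frozenset({"p1", "p6"}),
}
ranking = rank_pairs_by_shared(toy)
for a, b, count in ranking:
    print(a, b, count)
```

In this toy example, species_A and species_B share three markers and so would be grouped first, under the assumption that more shared elements indicate a more recent common ancestor.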
3. Estimate.
FORMALISM OF SUTTON'S LAW.
Sutton's Law, named after the notorious bank robber, Willie Sutton,
is the assertion that in the face of uncertainty, one should choose
the most likely alternative ("go where the money is").
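As a minimal sketch of the rule just stated, the choice of the most likely alternative can be written in Python as below. The diagnosis labels and likelihood values are invented placeholders, and `suttons_choice` is a hypothetical name; this shows only the selection step, not the full formalism.

```python
from typing import Dict


def suttons_choice(likelihoods: Dict[str, float]) -> str:
    """Return the alternative with the highest estimated likelihood."""
    if not likelihoods:
        raise ValueError("no alternatives to choose from")
    return max(likelihoods, key=likelihoods.get)


# Hypothetical estimates before and after additional data are collected.
initial = {"diagnosis_X": 0.6, "diagnosis_Y": 0.3, "diagnosis_Z": 0.1}
revised = {"diagnosis_X": 0.2, "diagnosis_Y": 0.7, "diagnosis_Z": 0.1}

print(suttons_choice(initial))
print(suttons_choice(revised))
```

The second call returns a different alternative from the first: the earlier choice is replaced when the estimates change, which is the situation that the formalism below is meant to accommodate.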
Mathematical logic is an appealing formalism
in pathology informatics, because of its superficial resemblance
to ordinary reasoning, as might be seen in pathology reports:
either X or Y is true; both X and Y are true;
if X then Y, etc. Even the syntax of logic is similar to that of
declarative sentences in natural language.
Logic has the additional advantage over natural language that logic
must be consistent: inconsistencies are readily detected by routine
computing methods.
In efforts to apply the classical mathematical logic of Aristotle and Boole,
one faces a paradox when an unlikely event occurs. That is, one makes
a diagnosis and offers therapy based upon incomplete data, which may
subsequently be overturned by additional data. In classical logic,
Aristotle's Law of Contradiction states that if there is
any contradiction in a mathematical system, i.e., a diagnosis
that is both true and false, then anything is true. (Latin:
Ex Falso Quod Libet; From a contradiction, whatever you please.)
Logically, there is a paradox when the most likely alternative
is contradicted by subsequent medical events, including autopsy findings.
The formalism of Sutton's Law generalizes classical (Aristotelian/Boolean)
symbolic logic by removing the Law of Contradiction
(Ex Falso Quod Libet; Latin: if contradiction, then anything goes).
It is akin to fuzzy symbolic logic.
The method has been used to describe congenital heart malformations,
organelle pathology, and gynecologic cytopathology screening.
4. Swap.
DESIGNER CONTINGENCY TABLES:
TOKEN SWAP TEST OF SIGNIFICANCE.
In comparing a new medical test, or HEURISTIC, against
an established GOLD-STANDARD, one may collect patient-observations
in a 2×2 CONTINGENCY TABLE (2×2CT), also known as a
MISCLASSIFICATION MATRIX or CONFUSION MATRIX.
This table or matrix (2 rows, 2 columns) has
NUMBERS OF PATIENTS listed in each row-column box, or
CELL, of the table (Cios, 2006).
In the following example:
| Gold Standard ↓ \ Heuristic → | No | Yes |
| No | 650 | 150 |
| Yes | 150 | 50 |
there are
650 patients in the upper-left cell;
150 patients in the upper-right cell;
150 patients in the lower-left cell;
and 50 patients in the lower-right cell,
a total of 1000 patients.
Patients in the upper-left cell and patients in the lower-right cell
in this 2×2CT represent agreement between the gold-standard
and the heuristic; patients in the upper-right cell represent
FALSE POSITIVE PATIENTS (gold-standard-no, heuristic-yes);
and patients in the lower-left cell represent FALSE NEGATIVE PATIENTS
(gold-standard-yes, heuristic-no). Classically, the NULL HYPOTHESIS
asserts that the gold-standard and heuristic are
STATISTICALLY INDEPENDENT of one another.
REJECTION OF THE NULL HYPOTHESIS suggests that the gold-standard
and heuristic are CORRELATED. We consider an ensemble of different
DESIGNER NULL HYPOTHESES and a novel
TOKEN SWAP MISCLASSIFICATION PARADIGM, which seems
more appropriate for medical reasoning.
PREVIOUS TEXT:
A 2×2 contingency table (2×2CT), also known as a
misclassification matrix or confusion matrix,
is a 2×2 rectangular table, whose cells
contain numbers of patients, or tokens.
The two rows (no versus yes) correspond to a gold standard,
or best possible knowledge with respect to a particular disease;
the two columns (no versus yes) correspond to a heuristic test
for that disease. In classical statistics, one employs either the
chi-square (χ²) contingency test
or the Fisher exact test. Both classical tests have a standard
null hypothesis (namely, that the gold standard is completely
independent of the heuristic values). In the token swap test
of significance, there is no set null hypothesis, and the user
may construct a designer null hypothesis to custom-fit
a particular medical application.
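For comparison, the classical chi-square test against the standard independence null hypothesis can be sketched in Python for the example table (650, 150, 150, 50). This illustrates only the classical test, without continuity correction; it is not an implementation of the token swap test, and the function name `chi_square_2x2` is hypothetical.

```python
import math
from typing import Sequence, Tuple


def chi_square_2x2(table: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Pearson chi-square statistic and p-value (1 degree of freedom)."""
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if gold standard and heuristic were independent
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Upper tail of chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Rows: gold standard (no, yes); columns: heuristic (no, yes).
example = [[650, 150], [150, 50]]
chi2, p_value = chi_square_2x2(example)
print(f"chi-square = {chi2:.5f}, p = {p_value:.4f}")
```

Under independence the expected counts are 640, 160, 160, and 40, so each cell differs from its expected count by 10 patients and the statistic is 3.90625.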
It may be clearer to regard
each cell as a CONTAINER, or SET, containing
PATIENTS (人) (人, "ren", is the Chinese ideogram
for person) or TOKENS corresponding to patients.
| . | No | Yes | Total |
| No | 人人人人人人人人 | 人人 | 10 |
| Yes | 人人人 | 人人人人人人 | 9 |
| Total | 11 | 8 | 19 |
This method was used to examine pain crisis in sickle cell disease.
5. Translate.
COMPUTER TRANSLATION OF PATHOLOGY REPORTS,
INCLUDING BARRIER WORD METHOD:
QUANTITATIVE NATURAL LANGUAGE PROCESSING.
MOORE'S THEORY OF ANATOMIC PATHOLOGY REPORTS
states that every well-formed anatomic pathology report
has an unambiguous (unique) mapping into a semantic model,
that encompasses all possible anatomic pathology reports.
The mapping is one-way: many well-formed anatomic pathology reports
may map into the same semantic-model-element.
The semantic model is a general hierarchy that includes body site,
diagnosis (neoplastic, non-neoplastic), differentiation, size,
invasion, margins, metastases, and any consultative
and/or notification information. If a well-formed
anatomic pathology report is defined as one having a unique semantic-model-element,
then, in theory, every well-formed anatomic pathology report
has an exact PARSING FORMULA. Subsets of a given parsing formula
may also be valid, but a superset parsing formula always supersedes
any of its subsets. This superset principle is the basis for
both dictionary lookup and the computer parsing algorithm.
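One possible reading of the superset principle is a longest-match dictionary lookup, in which a longer phrase supersedes any shorter phrase it contains. The Python sketch below follows that reading as an assumption; the dictionary entries, the output codes, and the function name `parse_longest_match` are all invented for illustration and are not taken from the published parser.

```python
from typing import Dict, List, Sequence, Tuple


def parse_longest_match(
    words: Sequence[str], dictionary: Dict[Tuple[str, ...], str]
) -> List[str]:
    """Scan left to right, always taking the longest phrase in the dictionary."""
    max_len = max((len(phrase) for phrase in dictionary), default=0)
    codes: List[str] = []
    i = 0
    while i < len(words):
        # Try the longest candidate first so a superset beats its subsets
        for size in range(min(max_len, len(words) - i), 0, -1):
            phrase = tuple(words[i:i + size])
            if phrase in dictionary:
                codes.append(dictionary[phrase])
                i += size
                break
        else:
            i += 1  # word not in the dictionary: skip it
    return codes


# Hypothetical dictionary: the three-word phrase is a superset of two entries.
toy_dictionary = {
    ("adenocarcinoma",): "DX:adenocarcinoma",
    ("colon",): "SITE:colon",
    ("adenocarcinoma", "of", "colon"): "DX+SITE:adenocarcinoma-of-colon",
}
codes = parse_longest_match("invasive adenocarcinoma of colon".split(), toy_dictionary)
print(codes)
```

Here the three-word entry is matched as a whole, so the shorter entries for "adenocarcinoma" and "colon" are not emitted separately.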
PREVIOUS TEXT.
One of the long-standing controversies in pathology informatics
is whether anatomic pathologists should write their diagnostic reports
in free-text (natural language), or should select diagnoses from a system
of pick-lists. Computer specialists and researchers have always preferred
pick-lists, because they are easier to organize and tabulate. The issue
has recently come to widespread attention because of controversial mandates
by the College of American Pathologists and the American College of Surgeons
for hospital accreditation
(Ackerman, 2004;
AJCC).
In these mandates, pathologists are required to issue SYNOPTIC REPORTS
on large specimens resected for cancer therapy. Current regulation
demands only that the required information be present in the
reports in some form. But the handwriting is on the wall: regulators
want more structured reports.
(Mene mene tekel upharsin: מנא מנא תקל ופרסין, Dan 5:25,
the proverbial handwriting on the wall.)
Quantitative natural language processing (QNLP)
is computer-translation, using quantitative properties
of a natural language (English). It is assumed that
any pathology report is unambiguous with respect to a
medical semantic model for pathology reports.
6. Harm.
FORMAL MEDICAL ETHICS
The scientific component of diagnostic medicine on the individual patient
consists of collecting data on the patient (history, physical examination,
laboratory tests, etc.), and inferring a diagnosis and indicated therapy.
At each step in the process, the patient must be persuaded that
the next step is necessary, and the patient must give consent.
Classical mathematical logic may be extended to include
additional operators for certainty ($), necessity (#),
and attempt (!). Medical investigations on the individual patient
should be PROACTIVE (do if you must) and HIPPOCRATIC
(first do no harm). That is, if a test or therapy
needs to be performed, then it should be attempted;
and if attempted, then the attempt should be justified.
Formally:
...............
7. Order.
ORDER-LOGIC FOR PATHOGENESIS.
ORDER-LOGIC is the paradigm: (1) that the
entire medical reasoning component of pathology informatics,
not including image-recognition, may be expressed in the form
of hierarchical tables; and (2) that these tables may be tested
for consistency. The first part of this program has been outlined
in various textbooks of anatomic pathology
(Sinard, 1996;
Haber et al, 2002). The second part is as follows.
Let the Hebrew letter aleph (ℵ)
represent a LOGICAL ORIGIN, or ULTIMATE PARENT,
in a system of hierarchical reasoning. Then every PARENT
has one-or-more CHILDREN, including possibly itself,
and every child has exactly one parent. In tabular form,
the first child of each parent is placed in the cell immediately-below
and immediately-right of the parent. Additional children, if present,
are placed under the first child, so as not to have intervening blank rows.
For example, in this order-logic table, ℵ
has two children, namely, A and B; parent A
has two children, namely, C and D; and parent B
has two children, namely, E and F:
| ℵ | . | . |
| . | A | . |
| . | . | C |
| . | . | D |
| . | B | . |
| . | . | E |
| . | . | F |
The interpretation of this table in classical logic is:
$$\cdots \wedge G_{-2} \wedge G_{-1} \wedge G_0 \Rightarrow C_1 \vee C_2 \vee \cdots$$
where $\wedge$ denotes logical-and; $\vee$
denotes logical-inclusive-or; $G_0$
denotes the parent;
$G_{-i}$ denotes the parent of $G_{-i+1}$;
and $C_i$ denotes the ith child of parent $G_0$.
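The reading of a table into such implications can be sketched in Python. The sketch assumes that each row holds exactly one label, that the column of the label gives its depth, and that the top-left cell holds the logical origin (written `aleph` here); the function name `order_logic_rules` and the ASCII connectives `&`, `|`, and `=>` are choices made for this illustration.

```python
from typing import Dict, List, Sequence, Tuple


def order_logic_rules(rows: Sequence[Sequence[str]]) -> List[str]:
    """Turn an order-logic table into 'ancestors & parent => child | child' rules."""
    path: List[str] = []  # path[d] is the most recent label seen at depth d
    children: Dict[Tuple[str, ...], List[str]] = {}
    for row in rows:
        # The single non-blank cell gives both the depth and the label
        depth, label = next((d, cell) for d, cell in enumerate(row) if cell != ".")
        path = path[:depth] + [label]
        if depth > 0:
            chain = tuple(path[:depth])  # all ancestors down to the parent
            children.setdefault(chain, []).append(label)
    return [
        " & ".join(chain) + " => " + " | ".join(kids)
        for chain, kids in children.items()
    ]


# The example table from the text, with "." marking blank cells.
example_table = [
    ["aleph", ".", "."],
    [".", "A", "."],
    [".", ".", "C"],
    [".", ".", "D"],
    [".", "B", "."],
    [".", ".", "E"],
    [".", ".", "F"],
]
rules = order_logic_rules(example_table)
for rule in rules:
    print(rule)
```

For the example table this yields three rules: the origin implies A or B; the origin and A imply C or D; and the origin and B imply E or F.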
An order-logic table satisfies a distributive property of logic:
THEOREM 1.
| ℵ | . | . |
| . | +A | . |
| . | . | +C |
| . | . | +D |
| . | +B | . |
| . | . | +C |
| . | . | +D |
is equivalent to
| ℵ | . | . |
| . | +C | . |
| . | . | +A |
| . | . | +B |
| . | +D | . |
| . | . | +A |
| . | . | +B |
THEOREM 1. PROOF.
The nandsets for the first table of Theorem 1 are:
{ℵ, -A, -B},
{ℵ, +A, -C, -D},
and {ℵ, +B, -C, -D},
which imply {ℵ, -C, -D}.
The nandsets for the second table of Theorem 1 are:
{ℵ, -C, -D},
{ℵ, +C, -A, -B},
and {ℵ, +D, -A, -B},
which imply {ℵ, -A, -B}.
Therefore, the two tables are equivalent.
A corollary is that
THEOREM 2.
| ℵ | . | . |
| . | +A | . |
| . | . | +B |
| . | . | -B |
| . | -A | . |
| . | . | +B |
| . | . | -B |
is vacuous.
THEOREM 2. PROOF.
The nandsets for the table of Theorem 2 are:
{ℵ, -A, +A},
{ℵ, +A, +B, -B},
and {ℵ, -A, +B, -B}.
Every nandset (=Quine's nullity) containing both
+X and -X is vacuous.
Therefore, the entire table is vacuous.
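Both theorems can be checked by brute force over truth assignments. The Python sketch below assumes that a nandset asserts that its literals are not all true at once, with +X read as "X is true" and -X as "X is false"; the names `satisfied` and `models`, and the label `aleph` for the logical origin, are choices made for this illustration.

```python
from itertools import product
from typing import List, Sequence, Set, Tuple

Literal = Tuple[str, bool]  # ("A", True) stands for +A, ("A", False) for -A


def satisfied(nandset: Sequence[Literal], assignment: dict) -> bool:
    """A nandset holds unless every one of its literals is true."""
    return not all(assignment[name] == sign for name, sign in nandset)


def models(nandsets: Sequence[Sequence[Literal]], names: Sequence[str]) -> Set[tuple]:
    """All truth assignments (as tuples ordered by names) satisfying every nandset."""
    result = set()
    for values in product([False, True], repeat=len(names)):
        assignment = dict(zip(names, values))
        if all(satisfied(ns, assignment) for ns in nandsets):
            result.add(values)
    return result


O = ("aleph", True)  # the logical origin appears positively in every nandset
first: List[List[Literal]] = [
    [O, ("A", False), ("B", False)],
    [O, ("A", True), ("C", False), ("D", False)],
    [O, ("B", True), ("C", False), ("D", False)],
]
second: List[List[Literal]] = [
    [O, ("C", False), ("D", False)],
    [O, ("C", True), ("A", False), ("B", False)],
    [O, ("D", True), ("A", False), ("B", False)],
]
names1 = ["aleph", "A", "B", "C", "D"]
theorem1_holds = models(first, names1) == models(second, names1)

vacuous_table: List[List[Literal]] = [
    [O, ("A", False), ("A", True)],
    [O, ("A", True), ("B", True), ("B", False)],
    [O, ("A", False), ("B", True), ("B", False)],
]
names2 = ["aleph", "A", "B"]
theorem2_holds = len(models(vacuous_table, names2)) == 2 ** len(names2)

print(theorem1_holds, theorem2_holds)
```

Under this reading, the two tables of Theorem 1 admit exactly the same assignments, and the table of Theorem 2 excludes no assignment at all, which is what "vacuous" is taken to mean here.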
8. Grow.
INFINITE PAPILLOMA.
Cancer is defined formally as unbounded growth of cells,
or more accurately, cell growth bounded by injury to surrounding
(invasive) and/or distant (metastatic) tissues. An infinite papilloma
is defined formally as unbounded cell growth into a defined area,
possibly infinite.
REFERENCES.
1. Ackerman AB.
Protocols for the reporting of cutaneous melanoma.
Am J Clin Pathol. 2004 Nov;122(5):815-817.
Comment in: Am J Clin Pathol. 2004 Nov;122(5):817-818.
Discussion 818-819.
PMID: 15540388.
2. Cios KJ.
Assessment of the Generated Data Model.
2006. In press.
3. Haber MH, Gattuso P, Spitz DJ, David O.
Differential Diagnosis in Surgical Pathology.
Amsterdam: Elsevier Science. 2002.
ISBN 0-7216-9053-X, 1150 pages.
4. Sinard JH.
Outlines in Pathology.
New York: W. B. Saunders Company.
A Harcourt Health Sciences Company. 1996.
ISBN 0-7216-6341-9, 229 pages.
5. American Joint Committee on Cancer.
AJCC Cancer Staging Manual. Sixth Edition.
New York: Springer. 2004.
ISBN 0-387-95271-3, 421 pages.
6. Court C.
GMC finds doctors not guilty in consent case.
British Medical Journal. 1995;311:1245-1246.
7. Cashwell ED, Everett CJ.
A Practical Manual on the Monte Carlo Method for Random Walk Problems.
New York: Pergamon Press. 1959.
8. Berman JJ, Moore GW.
The role of cell death in the growth of preneoplastic lesions:
a Monte Carlo simulation model.
Cell Prolif. 1992 Nov;25(6):549-557.
PMID: 1457604.
Full Text of Article:
http://www.netautopsy.org/celdeath.htm
9. Berman JJ, Moore GW.
Spontaneous regression of residual tumour burden:
prediction by Monte Carlo simulation.
Anal Cell Pathol. 1992 Sep;4(5):359-368.
PMID: 1445794.
Full Text of Article:
http://www.netautopsy.org/sponregr.htm
10. Moore GW, Berman JJ.
Cell growth simulations predicting polyclonal origins
for 'monoclonal' tumors.
Cancer Lett. 1991 Nov;60(2):113-119.
PMID: 1933835.
Full Text of Article:
http://www.netautopsy.org/monoclon.htm
Public-domain source code:
http://www.netautopsy.org/monoclon.htm#table1
11. Moore GW, Hutchins GM, Miller RE.
Token swap test of significance for serial medical data bases.
Am J Med. 1986 Feb;80(2):182-190.
PMID: 3511687.
12. Moore GW, Hutchins GM, Miller RE.
A new paradigm for hypothesis testing in medicine,
with examination of the Neyman Pearson condition.
Theor Med. 1986 Oct;7(3):269-282.
PMID: 3798393.
13. Moore GW, Riede UN, Sandritter W.
Application of Quine's nullities to a quantitative organelle pathology.
J Theor Biol. 1977 Apr 21;65(4):633-651.
PMID: 875397.
14. Seddon F, ed.
Aristotle & Lukasiewicz on the Principle of Contradiction.
Modern Logic. 1996.
ISBN 1884905048.
15. Wolenski J, ed.
Philosophical Logic in Poland.
Kluwer. 1994.
ISBN 0792322932.
16. Lukasiewicz J.
Elements of Mathematical Logic.
Warsaw: Panstwowe Wydawnictwo Naukowe. 1963.
Multi-valued logic was introduced in 1917 by Prof. Jan Lukasiewicz.
17. Lukasiewicz J.
Selected Works.
North-Holland Publishing Co. 1970.
ISBN 0720422523.
Last updated: 9/23/2005, by G. William Moore, MD, PhD.