From the Pathology and Laboratory Medicine Service,
Baltimore Veterans Affairs Maryland Health Care System;
and Departments of Pathology, University of Maryland Medical System
and The Johns Hopkins Medical Institutions, Baltimore, MD.
Originally Presented: Thursday, September 18, 2003, Biomedical Computing
Interest Group (BCIG), U. S. National Institutes of Health, Clinical Center,
1:00 to 3:00 PM.
See:
http://www.altum.com/bcig/events/seminars/2003/2003_09.htm
Please address correspondence to:
G. William Moore, MD, PhD. Chief, Quality Assurance Section, Anatomic Pathology.
Chief, Autopsy Section.
Pathology and Laboratory Medicine Service (113).
Baltimore Veterans Affairs Maryland Health Care System.
Baltimore, Maryland 21201-1524.
George.Moore4@va.gov
Last Updated: 5/1/2008, by G. William Moore, MD, PhD.
U. S. Government Work, uncopyrighted, presented at:
Becich MJ, Crowley R, course directors.
Advancing Practice, Instruction, and Innovation through Informatics.
Frontiers in Oncology and Pathology. Eighth Annual Conference.
Pittsburgh, PA: University of Pittsburgh Medical Center.
October 8-10, 2003. 2003;:.
http://apiii.upmc.edu
Moore GW, Brown LA, Burger RH, Hutchins GM, Miller RE.
Modal Logic Theory for Pathology Inference.
Arch Pathol Lab Med. 2004;128:.
SCREEN 1. DISCLAIMER.
United States Government Work, uncopyrighted, public-domain,
DRAFT COPY ONLY. This document does not necessarily represent the views
or policies of any United States Government agency. This document is provided
"as is", without warranty of any kind, express or implied, including but not
limited to the warranties of merchantability, fitness for a particular
purpose and non-infringement. In no event shall the authors be liable
for any claim, damages or other liability, whether in an action of contract,
tort or otherwise, arising from, out of, or in connection with the document
or the use or other dealings made with the document.
SCREEN 2. ABSTRACT.
Pathology studies the etiology and pathogenesis of disease. Anatomic
pathology is devoted to the study of the gross anatomy and microanatomy of diseased
organs, in order to render diagnoses and to acquire new knowledge
about disease biology. A major function of the anatomic pathologist
is to issue diagnostic reports on samples of diseased tissue. The aggregate
collection of these reports contains a wealth of information
related to almost every serious human disease.
Any data-mining program must incorporate the fundamental constraints
on data acquisition in routine medical practice. It may be unnecessary,
uneconomic, technically infeasible, or unethical to fill in all possible
data-items in a rectangular database. Clinical databases should therefore
formally account for missing values, patient consent,
patient risk, and provider alerts.
This report proposes a basic, mathematically consistent theory of clinicopathologic
inference. There are two types of propositions: data,
set D; and medical entities, set E. Data are
binary propositions (i.e., true/false); and medical entities
(or medical threats) are fuzzy propositions. Classically,
a fuzzy proposition assumes truth-values, v, along the closed
interval, [0,1], where v=0 is false and v=1 is true.
For convenience in the present formulation, propositions assume
certainty levels, $1, $2, $3, ...,
where certainty level $k corresponds to
fuzzy value (1 − 2⁻ᵏ); so that certainty level
$1 corresponds to fuzzy value ½;
certainty level $2 corresponds to fuzzy value
¾; certainty level $3 corresponds to
fuzzy value ⅞, etc.
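The correspondence between certainty levels and fuzzy values can be illustrated with a short Python sketch. The function names `fuzzy_value` and `certainty_level` are hypothetical and chosen only for this illustration. Exact fractions are used so that ½, ¾, and ⅞ come out exactly.

```python
from fractions import Fraction
import math

def fuzzy_value(k: int) -> Fraction:
    """Fuzzy truth-value of certainty level $k, namely 1 - 2**(-k)."""
    if k < 1:
        raise ValueError("certainty levels start at $1")
    return 1 - Fraction(1, 2 ** k)

def certainty_level(v: float) -> int:
    """Largest k with fuzzy_value(k) <= v; returns 0 when v < 1/2.

    v = 1 (classical truth) is excluded: it would need an infinite level.
    """
    if not 0.0 <= v < 1.0:
        raise ValueError("v must lie in [0, 1)")
    # 1 - 2**(-k) <= v  <=>  2**(-k) >= 1 - v  <=>  k <= -log2(1 - v)
    return math.floor(-math.log2(1.0 - v))

print([str(fuzzy_value(k)) for k in (1, 2, 3)])  # ['1/2', '3/4', '7/8']
```

Because the mapping is strictly increasing, a higher certainty level always corresponds to a higher fuzzy value.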
There are three ethical operators in our model: certainty ($);
value (#); and payment (!); and nine rules of ethical data
collection, based upon the general principle that data should be
collected whenever a medical condition sufficiently threatens the patient
and the patient gives informed consent; and data should not
be collected if either condition fails, i.e., there is no significantly
threatening condition, or else the patient does not give informed consent.
What for? There is an emerging technology of software agents,
or spiders, that crawl through the worldwide web or other computer
resources, looking for cases needing follow-up and other medical anomalies.
The language for constructing and organizing these software agents
is RDF (Resource Description Framework). The use of these software agents should be
constrained by minimal ethical considerations, consisting
of fuzzy certainty, value, and payment for the relevant medical
entities. The basic framework includes: increasing certainty
of medical threats (Rules 1,2,3); Hippocratic principles
(first do no harm; treat if indicated) (Rules 4,5);
and ethical data collection (Rules 6,7,8,9). Ethical data collection
is the idea that there is an ontology for medical threats;
the physician may be concerned (vexed) enough to collect data
on a perceived medical threat; that data, once obtained,
are never lost; and the use of Sutton's Law (Zebra Rule)
to guide further threat assessment.
There are nine rules of relationship in this system:
(1) complementizer negation/homomorphism; (2) fuzzy asymmetry;
(3) crisp data; (4) Hippocrates-first; (5) Hippocrates-reverse;
(6) ontology; (7) vexation; (8) ethical data collection;
and (9) Schrödinger's cat.
The theory employs modal/fuzzy/multivalued logic operators
of know-whether/certainty ($), value-to-know-whether (#),
and pay-to-know-whether (!). There is an atom set
of distinct, atomic propositions (atoms, A), each of which has
a definite true-false status. Quantitative, interval, ranked, and categorical
data are interpreted as collections of true-false statements.
Each atomic proposition, a, is either a datum (complaints,
history, physical findings, laboratory values, statements of consent, etc.);
or a medical entity (cancer, inflammation, necrosis, etc.).
No datum is an entity and no entity is a datum. The negation of a datum
is a datum; and the negation of an entity is an entity. To each atom,
a, there exists known-to-the-k a, denoted
$ᵏa, for every positive integer, k, below a fixed maximum,
M (i.e., k<M); and additionally for each datum, d, there exists
value-to-know-d, denoted #d; and pay-to-know-d,
denoted !d. A datum, d, is Hippocratic-first (do-no-harm)
if and only if (not-#d implies not-!d),
i.e., don't-value-d implies don't-pay-d;
and Hippocratic-reverse if and only if ((not-$d and #d)
implies !d), i.e., don't-know-d and value-to-know-d
implies pay-to-know-d. A medical entity, e, may be ontologic
(exists) and vexative (worrisome) based upon previously collected data.
The theory is mathematically consistent; and satisfies Occam's Razor, namely,
that no entities are known without data. The Hippocratic-first,
Hippocratic-reverse, ontologic, and vexative properties are consistent
if data are entered consensually, consecutively, and consistently,
i.e., no datum is entered after its negation has been entered. The computer
algorithm for solving this system terminates in polynomial time.
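The two Hippocratic properties are plain propositional constraints on the three operators for a single datum, so they can be checked directly. The following Python sketch assumes a hypothetical `DatumStatus` record that holds the truth values of $d, #d, and !d. It is an illustration, not the authors' proof program.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DatumStatus:
    """Truth values of the three operators for one datum d."""
    known: bool  # $d: it is known whether d
    value: bool  # #d: it is of value to know whether d
    pay: bool    # !d: payment is made to know whether d

def hippocratic_first(s: DatumStatus) -> bool:
    """Do no harm: not-#d implies not-!d."""
    return s.value or not s.pay

def hippocratic_reverse(s: DatumStatus) -> bool:
    """(not-$d and #d) implies !d."""
    return s.known or not s.value or s.pay

# Enumerate all eight operator settings and keep those satisfying both rules.
consistent = [
    (k, v, p)
    for k in (False, True)
    for v in (False, True)
    for p in (False, True)
    if hippocratic_first(DatumStatus(k, v, p)) and hippocratic_reverse(DatumStatus(k, v, p))
]
```

The excluded settings are of two kinds. One is paying for a datum that nobody values. The other is failing to pay for an unknown datum that is valued.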
This report introduces a mathematical system for managing medical concepts
and data. Modal/fuzzy/multivalued logic operators expand the purview
of classical symbolic logic, to accommodate technical, economic,
and consent-based constraints on clinicopathologic data collection.
The theory supports such medical concepts as: do-no-harm; treat-if-valuable;
disease ontologies; worrisome findings; and levels of certainty.
The theory is completely general, and permits definitions of patient injury
that include possible death, morbidity, inconvenience, financial constraints,
or loss-of-privacy; and definitions of value-to-know that may differ among
observers (patient, physician, insurer, national health policy, research
protocol). Mathematical theories can serve to organize medical knowledge
and patient data, and improve the scheduling and effectiveness
of data collection and surveillance in large clinicopathologic data systems.
RATIONALE.
Natural science may be regarded as the pursuit of truth, based upon
observations, or data. Scientific, or evidence-based, medicine
involves the collection and organization of data for the relief of human
suffering. While pure science seeks only the truth, medical science has
two significant, additional constraints: value and payment.
Value is the benefit to the patient of obtaining a particular fact
or applying a particular therapy. Payment is the aggregate expense
to the patient of obtaining this fact or therapy, in inconvenience,
money, pain, and/or risk of morbidity and mortality.
The present model for medical analysis recognizes two classes of binary
(yes/no) logic propositions: data, D, and medical entities, E.
Each datum, d∈D, such as serum prostate specific antigen above
a particular upper bound, has a payment, !d, for its collection
that must be justified by a corresponding value, #d.
A datum is either entirely certain, +$d,
or entirely uncertain, -$d.
By contrast, a medical entity, e∈E, such as "prostate cancer",
is a theoretical construct supported by observations, that is never entirely
certain. Increasing levels of certainty for a threatening medical entity,
such as cancer, might justify progressively invasive data collection.
For example, a sixty-year-old man who has not had a serum prostate specific
antigen or digital rectal examination performed for over a decade might
justify performing one or both of these tests (mildly invasive). A positive
result on either test raises the suspicion for prostate cancer, and might
justify a (more invasive) prostate biopsy. However, the suspicion for
prostate cancer in an asymptomatic sixty-year-old man with no other relevant
findings does not justify an immediate prostate biopsy.
Why construct a mathematical formalism for a few, very ordinary ideas
in medicine? First, because many of the folk-ideas of medicine
(Sutton's Law, Zebra Rule, Hippocrates' Rules, Value, Payment,
St. Peter's Rule, etc.) are not well-formalized. As late as the 1980s,
there was no formal definition for intention (not intension)
(see Searle[]). Yet, clinical medicine involves the intention of the patient,
the intention of the physician, as well as that of third-party-payers,
health policy makers, etc. Despite all the advances in astrophysics,
cosmology, and evolutionary biology, there is still no satisfactory definition
of free will (see Wilson[], Hawking[]); yet free will (or at least
the perception of free will) is a major feature of medical care
and medical ethics.
Medical care records are rapidly becoming computerized and, alas,
all too slowly, becoming standardized. The U. S. Veterans Affairs medical
centers are leaders in this effort. At the Baltimore VA Maryland Health Care System
(VAMHCS), nearly all records have been computerized since 2000, including
ethics records, such as patient consent and patient competence-to-consent.
Quality assurance processes within the institution depend upon these records
to assure the institution's compliance with high standards of care. Although the goal of
fully-automated quality assurance processes is still elusive, we can foresee
the day when formal computer systems survey large collections of records,
to monitor compliance with optimal standards of care. The trouble is:
computer programs, by themselves, have no judgment or ethics. We physicians
need to formulate the basic principles of judgment and ethics, in order
to survey electronic medical records for possible anomalies in these
standards.
Why bother with mathematical consistency? So that, when a computer program
surveys these records, barring programming errors, one can be certain
that one doesn't have a statement that is both true and false at the same
time (the definition of mathematical inconsistency). It's not enough
to "try out a few examples". One must verify that the actual basis for the
calculations is consistent.
MODAL LOGIC:
PROSTATE CANCER EXAMPLE.
1. Modal logic is an expanded form of classical logic,
in which Aristotle's (384-322 BC) Law of Excluded Middle
is conditionally/partially suspended.
2. The term refers to the subjunctive mood (Latin:
modus subjunctivus) of classical grammar.
3. In classical logic, a proposition, p, is either true
or false. In modal logic, proposition, p, is either
necessarily true, denoted □p; necessarily false,
denoted □~p; or possibly true,
denoted ◇p, where:
◇p = ~□~p.
□p = ~◇~p.
4. Plato (428-347 BC, Greek philosopher) and Avicenna
(980-1037, Persian physician and mathematician) were early contributors.
5. Modern contributors: Jan Łukasiewicz (1878-1956, Polish logician),
C. I. Lewis (1883-1964, American logician), Sadegh-Zadeh, and Zadeh.
6. Deontic modal logic:
6.1. Deontic Necessity: it is mandatory to do p.
6.2. Deontic Possibility: it is permitted to do p.
6.3. Deontic Passive Negation: p is not mandatory to do.
There are also temporal modal logic (time), doxastic modal logic (belief),
and others.
7. Prostate cancer example. Let p=prostate cancer. Then:
7.1. A 60-year-old man who hasn't seen a doctor for ten years:
□p unless □²~p;
7.2. Serum prostate specific antigen is positive:
□²p unless
□³~p;
7.3. Needle biopsy of the prostate is negative:
□³~p unless
□⁴p, etc.
At each step, uncertainty about a threatening medical condition, namely,
prostate cancer, justifies gathering additional data: □p
justifies drawing serum prostate specific antigen;
□²p justifies performing prostate biopsy, etc.
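One way to read "□ᵏp unless □ᵏ⁺¹~p" operationally is that a conclusion at a higher certainty level overrides every conclusion at a lower level. The Python sketch below encodes that reading for the prostate cancer example. The `Assertion` record and the `current_conclusion` helper are hypothetical names, and treating "unless" as override-by-higher-level is an assumption of this sketch.

```python
from typing import NamedTuple

class Assertion(NamedTuple):
    level: int      # k in □ᵏ
    holds: bool     # True for □ᵏp, False for □ᵏ~p
    evidence: str   # the finding that supports this step

def current_conclusion(assertions: list[Assertion]) -> Assertion:
    """Return the assertion at the highest certainty level.

    Assumption: '□ᵏp unless □ᵏ⁺¹~p' is read as 'a higher-level assertion
    overrides every lower-level one'.
    """
    if not assertions:
        raise ValueError("no data collected: no entity is asserted")
    return max(assertions, key=lambda a: a.level)

# p = prostate cancer
steps = [
    Assertion(1, True, "60-year-old man, no medical visit in ten years"),
    Assertion(2, True, "serum prostate specific antigen positive"),
    Assertion(3, False, "needle biopsy of the prostate negative"),
]

for n in range(1, len(steps) + 1):
    c = current_conclusion(steps[:n])
    print(f"after step {n}: {'□' * c.level}{'' if c.holds else '~'}p")
```

The printed conclusions use the repeated-box notation (□p, □□p, □□□~p) from the questions that follow.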
FREQUENTLY ASKED QUESTIONS.
Question 1. Modal logic has been around, in one form
or another, for over a century (Łukasiewicz). What is so special about
the present version?
Answer. The present version of modal logic attempts to explain
a stepwise approach to medical diagnosis, in which every data-collection step
on a patient gets one closer to diagnostic certainty. For a diagnosis,
p, one may know the diagnosis as necessarily p, denoted
□p, necessarily necessarily p, denoted
□□p, necessarily necessarily necessarily p, denoted
□□□p, etc. In this formulation,
one never achieves diagnostic certainty. This formulation corresponds
to the medical reality that a medical diagnosis is never certain, but rather,
certain to a degree that one is ethically entitled to take another step,
such as running additional tests or beginning treatment. Even some autopsy diagnoses
are not necessarily certain: there are autopsy blocks that are processed
by newer methods (such as DNA analysis) not available at the original
autopsy, which yield additional diagnoses. Example: DNA analysis of autopsy
blocks in victims of the 1918 worldwide influenza pandemic.
Question 2.
Why is it that the masters of Modal Logic (Łukasiewicz, Lewis,
Zadeh, Zeman, Snyder) missed this particular variation of modal logic?
Answer.
Perhaps because the present formulation has an infinite regress
of necessarilies, □□□□..., for which
the early inventors of modal logic did not have a suitable
philosophical analogy. Furthermore, the present formulation does not have
a meaningful symmetry between possibly, ◇,
and necessarily, □, which makes the present formulation
philosophically unesthetic (vide infra). I imagine that the previous workers
in this field either didn't stumble upon the present formulation;
or if they did, did not consider it worthy of further investigation.
Question 3.
What are some of the pitfalls and problems with this formulation?
Answer. Because of the homomorphism rule, there is no useful
meaning in the present formulation for possibly possibly p,
even though there is a meaning for necessarily necessarily p
that is distinct from that of necessarily p.
0. Seven general theorems are stated and proved in this
mathematical model, along with associated lemmas and corollaries. There is
a live proof program in the manuscript, in which simple
examples and theorems may be tested. The reader is invited to try out
his/her own examples. The live proof program has been tested on 200 theorems
from Zeman's Modal Logic (see Appendix H).
1. There is a relationship between modal logic
(necessarily, possibly) and fuzzy set theory, such that greater
fuzzy membership implies higher levels of modal-certainty.
2. Ethical data collection (Rule 8) leads to consistent
entity inferences.
3. An empty system is consistent, and implies no entities;
for stepwise data collection, fewer data imply fewer entities
(Occam's Razor, William of Ockham, 1285-1349, English logician
and Franciscan friar, Latinized: Occam).
4. In-between theorem. Analogous to the between relation
in Euclidean geometry. If you have sufficient data to imply
necessarilyᵏ entity, then you have sufficient data to imply
necessarilyᵏ⁻¹ entity.
5. Resource Description Framework (RDF): general syntax
for writing computer-parsable ordered triples, that export meaning among
databases on the semantic worldwide web, by binding a described datum
to a specified subject. Internet web-crawler programs can interrogate
multiple RDF documents, and draw inferences from these ordered triples.
In this model, RDF classes form a strict monoparental (single-parent) hierarchy;
such an RDF-class hierarchy is mathematically consistent.
6. RDF Theorems:
Theorem §6.1. Consistency of RDF classes.
Theorem §6.2. Identity. Class p implies p.
Theorem §6.3. Or-expansion. If p implies q,
then p implies q or q or q or q....
Theorem §6.4. Telescoping.
Theorem §6.5. Contextualization.
Theorem §6.6. Intercalation.
Theorem §6.7. Retirement.
7. Token Cube / Neyman-Pearson Condition (Jerzy Neyman,
1894-1981, Polish-American statistician; Egon S. Pearson, 1895-1980,
British statistician). An extension of classical contingency table analysis
that compensates for metaknowledge in a contingency table,
and that handles division by zero (zerodivide) in the chi-square (χ²)
test of contingency tables.
The essential argument of the Neyman-Pearson Condition is that, for a fixed
sample size, greater power (1 − β) forces a greater Type I error rate (α).
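The strict monoparental class hierarchy described in item 5 can be checked mechanically. The Python sketch below treats RDF triples as plain (subject, predicate, object) string tuples rather than using any RDF library. The `ex:` class names are hypothetical; only `rdfs:subClassOf` is a real RDF Schema property. RDF Schema itself permits multiple superclasses, so the single-parent restriction tested here is the constraint of this model, not of RDF.

```python
def is_monoparental_hierarchy(triples):
    """Check that rdfs:subClassOf triples form a strict single-parent hierarchy:
    no class has two different parents, and no class is its own ancestor."""
    parent = {}
    for subject, predicate, obj in triples:
        if predicate != "rdfs:subClassOf":
            continue                      # other triples do not affect the hierarchy
        if parent.get(subject, obj) != obj:
            return False                  # a second, different parent
        parent[subject] = obj
    for start in parent:                  # follow each parent chain upward
        seen = set()
        node = start
        while node in parent:
            if node in seen:
                return False              # cycle: a class would be its own ancestor
            seen.add(node)
            node = parent[node]
    return True

tree = [
    ("ex:ProstateAdenocarcinoma", "rdfs:subClassOf", "ex:Adenocarcinoma"),
    ("ex:Adenocarcinoma", "rdfs:subClassOf", "ex:Carcinoma"),
    ("ex:Carcinoma", "rdfs:subClassOf", "ex:Neoplasm"),
    ("ex:ProstateAdenocarcinoma", "rdfs:label", "prostate adenocarcinoma"),
]
```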
1. Rule 1. Complementizers: Absorb negation; homomorphic
in logical-and. A complementizer is a grammatical
element, such as that, whether, which, who, where, when, how,...,
in a sentence, that connects an independent (main) clause
to a dependent clause. For example:
it is said that Homer was blind
where it is said is the main clause; Homer was blind
is the dependent clause (Homer, 8th century BC, Greek poet);
and that is the complementizer. In this sentence,
the complementizer, that, is negation-sensitive,
that is, it is said that Homer was blind is not the same as
it is said that Homer was not blind. By contrast,
the complementizer, whether, is negation-insensitive,
that is, it is said whether Homer was blind is the same as
it is said whether Homer was not blind.
The present mathematical model has three negation-insensitive
complementizers, namely:
$: it is certain/known whether
#: it is of value to know whether
!: payment to know whether
These complementizers, $, #, !, absorb negation.
That is, for a proposition p: $~p=$p; #~p=#p; and !~p=!p.
These complementizers are homomorphic in logical-and.
That is, for propositions p, q:
$(p&q)=$p&$q; #(p&q)=#p&#q; and
!(p&q)=!p&!q.
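Rule 1 can be made concrete with a small evaluator for "it is known whether" ($). In this Python sketch, a state of knowledge is simply the set of atoms whose truth value has been determined. That representation is an illustrative assumption, not the authors' implementation. The `Lit` and `And` classes and the atom names are hypothetical.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Lit:
    """An atom or its negation."""
    atom: str
    negated: bool = False

@dataclass(frozen=True)
class And:
    """Conjunction of two propositions."""
    left: "Prop"
    right: "Prop"

Prop = Union[Lit, And]

def knows_whether(prop: Prop, known_atoms: frozenset) -> bool:
    """$prop: negation-insensitive and homomorphic in logical-and."""
    if isinstance(prop, Lit):
        # Polarity is ignored, so $~p = $p (the complementizer absorbs negation).
        return prop.atom in known_atoms
    if isinstance(prop, And):
        # $(p&q) = $p & $q.
        return knows_whether(prop.left, known_atoms) and knows_whether(prop.right, known_atoms)
    raise TypeError(f"not a proposition: {prop!r}")

known = frozenset({"psa_elevated"})
p = Lit("psa_elevated")
q = Lit("biopsy_positive")
```

The operators # and ! can follow the same pattern, each with its own set of valued or paid-for atoms.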
2. Rule 2. Fuzzy Asymmetry.
Fuzzy set theory (Zadeh, 1965)
is a generalization of classical/crisp set theory, that represents
different levels of certainty for the same concept. Element p
has partial membership in set P, denoted
p μ_v P, where v assumes any value
along the closed interval, v ∈ [0,1]. Fuzzy membership is not
probability. Despite its quirky name, fuzzy set theory is serious mathematics.
Fuzzy set theory has an asymmetry property:
If p μ_v P, and v>w,
then p μ_w P. Classical set theory is the special case
of fuzzy set theory, in which either v=0 or v=1.
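This sketch reads p μ_v P as "p belongs to P with a membership degree of at least v" (the α-cut reading). That reading is an assumption made here. Under it, the asymmetry property says that α-cuts are nested. The following Python check uses hypothetical lesion names and membership degrees.

```python
def alpha_cut(membership: dict[str, float], v: float) -> set[str]:
    """Elements p with p μ_v P, i.e. membership degree in P of at least v."""
    if not 0.0 <= v <= 1.0:
        raise ValueError("v must lie in the closed interval [0, 1]")
    return {p for p, degree in membership.items() if degree >= v}

# Hypothetical fuzzy set P = "suspicious for carcinoma".
P = {"lesion_A": 0.9, "lesion_B": 0.6, "lesion_C": 0.2}

levels = [0.0, 0.25, 0.5, 0.75, 1.0]
# Asymmetry: if v > w, every element of the v-cut is also in the w-cut.
nested = all(alpha_cut(P, v) <= alpha_cut(P, w) for v in levels for w in levels if v > w)
```

With crisp degrees (only 0 or 1), every cut above 0 returns the same classical set.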
3. Rule 3. Crisp Data. In our mathematical model,
data are crisp/classical and entities are fuzzy.
4. Rule 4. Hippocrates-first. Hippocrates (460-370 BC,
Greek physician, father of medicine) is famous for the medical dictum:
first do no harm, often given in the form of Galen's
(129-200, Greco-Roman physician) Latin translation: primum nón
nocére.
5. Rule 5. Hippocrates-reverse. A converse doctrine, also
formulated by Hippocrates, that one must offer treatment to the patient
if a treatment is available: treat if you can.
6. Rule 6. Ontology (Smith, 1996), originally a Platonic description of essential reality
(Plato, 428-347 BC, Greek philosopher; Greek: οντως
= ontós = real, actual; λογος =
logos = word, study), is a description of the core beliefs for a field
of study, in this case, ethical clinical medicine. The central idea
in our model is that a collection of data, Δ, implies
an entity, e, at a certainty level k, commensurate with
the extent and quality of data given.
7. Rule 7. Vexation (Latin: vexari: to worry)
corresponds to the worry list that every physician carries around
in his/her mind, regarding patients requiring additional tests, therapy,
or follow-up. In our mathematical model, entity e at certainty level
k implies value-to-know (#d) an additional datum, d.
8. Rule 8. Ethical Data Collection.
In our mathematical model, a datum, d, is collected ethically
if and only if one of the following holds:
1. the datum is never collected (no payment is made);
2. payment is made and the datum is true (+d and +$d and +!d);
3. payment is made and the datum is false (-d and +$d and +!d); or
4. payment is made and the attempt fails (+!d only).
Each step at which payment is made (+!d) must be justified by value
(+#d) in the previous step.
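Rule 8 can be read as a check over the history of a single datum. The Python sketch below makes several assumptions that the rule does not spell out:

- Each `Step`, a hypothetical record, holds the cumulative status of d after one data-collection step.
- A new payment is justified only by value at the immediately preceding step.
- A known datum must stay known, following the earlier statement that data, once obtained, are never lost.

```python
from typing import NamedTuple, Optional, Sequence

class Step(NamedTuple):
    value: bool            # +#d: it is of value to know whether d
    pay: bool              # +!d: payment has been made to learn d
    known: bool            # +$d: it is known whether d
    truth: Optional[bool]  # +d / -d, meaningful only when known

def permitted_status(s: Step) -> bool:
    """One of the four outcomes Rule 8 allows."""
    if not s.pay:
        return not s.known              # 1. never collected
    if s.known:
        return s.truth is not None      # 2./3. paid; d found true or false
    return True                         # 4. paid; the attempt failed

def collection_is_ethical(steps: Sequence[Step]) -> bool:
    for i, s in enumerate(steps):
        prev = steps[i - 1] if i > 0 else None
        if not permitted_status(s):
            return False
        if s.pay and (prev is None or not prev.pay):
            # A new payment must be justified by value at the previous step.
            if prev is None or not prev.value:
                return False
        if prev is not None and prev.known and not s.known:
            return False                # data, once obtained, are never lost
    return True
```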
9. Rule 9. Schrödinger's cat
(Erwin Schrödinger, 1887-1961, Nobel Prize Physics, 1933)
is a thought experiment involving a cat sealed in a box. According to quantum mechanical theory,
a probabilistic event, such as a radioactive decay, doesn't
have a consequence (i.e., the cat neither lives nor dies) until the event
is observed. In our model, an entity is not certain (Rule 6, Ontology)
at a particular certainty-level until all higher certainty levels
are (provisionally)
excluded. In contrast to Schrödinger's cat, which involves a single
physical event in which the cat lives or dies, in our model, there is a
stepwise process of data collection, and corresponding cat's box
openings or Schrödinger openings at each step, where the cat
may die and then come back to life in subsequent data collection steps.
Also known as: Sutton's Law (Willie Sutton, 1901-1980, American Bank Robber,
"Slick Willie"); Zebra Rule; Black Swan; Albino crow; etc.
10. Rule 10. Neyman-Pearson Condition. (Jerzy Neyman,
1894-1981, Polish-American statistician; Egon S. Pearson, 1895-1980,
British statistician). The Neyman-Pearson Condition is the condition
that, when performing a hypothesis test between two point hypotheses
H₀: θ=θ₀ and H₁: θ=θ₁, the likelihood-ratio test that rejects
H₀ in favor of H₁ when
Λ(x) = L(θ₀|x) / L(θ₁|x) < η, where P(Λ(X)<η | H₀) = α,
is the most powerful test of size α. Here
L(θ₀|x) / L(θ₁|x)
is the likelihood ratio (or, more generally, any statistical test
inequality comparison); η is the threshold that defines the so-called
critical region for the test, {x : Λ(x) < η}; and α is the significance
level for Type I (false positive) error.
The essential argument of the Neyman-Pearson Condition is that, for a fixed
sample size, greater power (1 − β) forces a greater Type I error rate (α).
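The trade-off between power and Type I error can be illustrated with the textbook case of two point hypotheses about a normal mean with known standard deviation. In that case the likelihood-ratio test reduces to a one-sided threshold on the sample mean. This example was chosen for illustration and does not come from the original. The Python sketch below uses `statistics.NormalDist` from the standard library, and the parameter values are arbitrary.

```python
from statistics import NormalDist

def np_power(alpha: float, mu1: float, sigma: float, n: int) -> float:
    """Power of the most powerful size-alpha test of H0: mu = 0 versus
    H1: mu = mu1 (mu1 > 0), from n i.i.d. Normal(mu, sigma**2) observations.

    The likelihood ratio L(0|x) / L(mu1|x) decreases as the sample mean
    increases, so 'ratio < eta' is the same as 'sample mean > critical'.
    """
    se = sigma / n ** 0.5                            # standard error of the mean
    critical = se * NormalDist().inv_cdf(1 - alpha)  # P(mean > critical | H0) = alpha
    return 1 - NormalDist(mu1, se).cdf(critical)     # P(reject H0 | H1)

# With the sample size fixed, raising alpha raises power (1 - beta).
for alpha in (0.01, 0.05, 0.10):
    print(f"alpha={alpha:.2f}  power={np_power(alpha, mu1=0.5, sigma=1.0, n=20):.3f}")
```

If mu1 equals 0, the two hypotheses coincide and the power collapses to α itself.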
1. Data-mining in Anatomic Pathology: use of public data
for drawing medical conclusions (Moore et al,
2001).
2. Constraints: patient privacy, missing values.
3. Data-mining program for pathology: incorporate
ethical/technical constraints of routine medical practice.
4. At a fully-computerized medical institution,
such as the Baltimore VA Maryland Health Care System,
pathology data are used for
quality assurance of clinical services.
5. Completing a rectangular database: it may be unnecessary,
uneconomic, technically infeasible, or unethical to collect
all possible data for all possible data-cells in the table.
6. Formal considerations for missing values, patient consent,
patient risk, and provider alerts.
7. Set theory definitions of atoms, data, and
medical entities [7,8,9,10].
Male. Caucasian. 1.91 m. 95.5 kg.
b. 8/27/1908. d. 1/22/1973.
Occupation: U.S. Congressman, U.S. Senator, U.S. President.
Status post: Appendectomy.
Status post: Cholecystectomy.
History of: Renal Calculi.
Myocardial Infarct, 1955.
Myocardial Infarct, April, 1972.
Myocardial Infarct, January 22, 1973.
Marked Generalized Atherosclerosis.
U. S. National Library of Medicine Unified Medical Language System:
(USNLM, 2004).
Male. Caucasian. 1.91 m. 95.5 kg. {C0024554}.
b. 8/27/1908. d. 1/22/1973. {C0021132}.
Occupation: U.S. Congressman, U.S. Senator, U.S. President. {C0032382}.
Status post: Appendectomy. {C0003611}.
Status post: Cholecystectomy. {C0008320}.
History of: Renal Calculi. {C0022650}.
Myocardial Infarct, 1955. {C0027051}.
Myocardial Infarct, April, 1972. {C0027051}.
Myocardial Infarct, January 22, 1973. {C0027051}.
Marked Generalized Atherosclerosis. {C0205082,C0205046,C0205246}.
Privacy: Does the patient have a positive syphilis test?
1. U. S. Health Insurance Portability and Accountability Act. 1996.
(HIPAA, Kennedy-Kassebaum Bill, H.R. 3103 of 104th U. S. Congress).
2. Regulates all individually identifiable medical records
in the USA.
3. Final Rule in force since April 14, 2003.
4. Large fines for non-compliance: civil penalties of up to $25,000 per calendar year
for violations of an identical requirement; more for intentional
disclosures or disclosures involving commercial gain.
5. Some research studies involving statistics
require individual data.
6. For public research databases, no patient medical record
may be individually identifiable.
1. Some research studies involving statistics
require individual patient data.
2. Published, grouped data may not contain all the detail
necessary to evaluate the statistical analysis methods. Therefore,
it would be valuable if individual data were published on the internet,
so that the statistical analysis methods could be verified
by the public at large.
3. Strong Privacy: The patient him/herself cannot identify
his/her own medical record. Therefore, there must be at least c
exact duplicates in the published record, where c is the
conspiracy threshold. That is, a conspiracy of c
patients could get together and demonstrate that their records,
as a group, have been exposed/published on the internet.
4. Weak Privacy: The public part of a patient's record
cannot be uniquely identified. Therefore, there must be at least c
exact duplicates in the public variables of the published record.
5. Dangers of Weak Privacy: embarrassment to the patient,
even if logically unfounded; sense by the patient that his/her records
are public, even if they are not; if one private part is accidentally
disclosed, then the remainder of the record is exposed. (See: "syphilis"
example, Screen 8.)
6. Detail must be blurred just enough so that one patient
can be mistaken for c other patients.
7. It is a bad idea statistically, as well as fraudulent
and confusing, to create additional, phantom patients. I'm not sure
that we currently have the statistical apparatus to manage even controlled,
intentional fraud. (But see:
Berman (2007)).
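The weak- and strong-privacy conditions amount to requiring that every combination of published values occur at least c times. This is essentially the k-anonymity criterion with k = c. The following minimal Python sketch uses hypothetical field names and records.

```python
from collections import Counter
from typing import Iterable, Mapping, Sequence

def has_c_duplicates(records: Sequence[Mapping[str, object]],
                     fields: Iterable[str], c: int) -> bool:
    """True if every combination of values in `fields` occurs in at least c records."""
    fields = tuple(fields)
    counts = Counter(tuple(r[f] for f in fields) for r in records)
    return all(n >= c for n in counts.values())

records = [  # hypothetical published records
    {"birth_decade": 1900, "sex": "M", "syphilis_test": "neg"},
    {"birth_decade": 1900, "sex": "M", "syphilis_test": "pos"},
    {"birth_decade": 1910, "sex": "F", "syphilis_test": "neg"},
    {"birth_decade": 1910, "sex": "F", "syphilis_test": "neg"},
]
public_fields = ("birth_decade", "sex")
all_fields = ("birth_decade", "sex", "syphilis_test")

weak = has_c_duplicates(records, public_fields, c=2)   # public part blurred enough
strong = has_c_duplicates(records, all_fields, c=2)    # whole record blurred enough
```

Here weak privacy holds but strong privacy does not. The two 1900/M records cannot be told apart on their public fields, yet each full record is unique. Disclosing the private field would therefore expose both records, which is the situation in the "syphilis" example.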
1. Pain crisis in sickle cell disease is an episode of
poorly-localized abdominal pain, that requires major pain medications
for relief. There are no characteristic morphologic features corresponding
to pain crisis in sickle cell disease.
2. Can pain crisis in sickle cell disease be recognized
statistically at autopsy? Is it a cause of death?
4. 71 autopsied cases of sickle cell disease
in the autopsy files of The Johns Hopkins Medical Institutions
with adequate clinical histories. 9/20 (45%) patients died in pain,
death unexplained at autopsy; 4/51 (8%) patients died without pain,
death unexplained at autopsy.
5. Is there a significant correlation between unexplained death
and pain crisis?
6. No-explanation-at-autopsy is the gold-standard, Φ;
and pain-crisis is the new hypothesis, Ψ being investigated.
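The sickle cell question can be answered with the two classical tests described on the following screens. The Python sketch below uses only the standard library. Cell labels follow the 2×2 notation used there, with Φ = death unexplained at autopsy and Ψ = died in pain crisis:

- a: Φ false and Ψ false
- b: Φ true and Ψ false
- c: Φ false and Ψ true
- d: Φ and Ψ both true

The χ² statistic is computed without a continuity correction. The two-sided Fisher p-value sums the probabilities of all tables with the same margins that are no more probable than the observed table, which is one common convention.

```python
from math import comb

def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square (no continuity correction) for a 2x2 table."""
    v, w, x, y = a + b, c + d, a + c, b + d   # row and column totals
    z = v + w
    return z * (a * d - b * c) ** 2 / (v * w * x * y)

def fisher_exact_2x2(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value with the margins held fixed."""
    w, x, y = c + d, a + c, b + d
    z = a + b + c + d

    def prob(k: int) -> float:
        # Hypergeometric probability that cell d equals k.
        return comb(y, k) * comb(x, w - k) / comb(z, w)

    observed = prob(d)
    lo, hi = max(0, w - x), min(w, y)
    return sum(p for k in range(lo, hi + 1) if (p := prob(k)) <= observed * (1 + 1e-9))

# 71 autopsies: 9/20 pain deaths and 4/51 non-pain deaths unexplained.
a, b, c, d = 47, 4, 11, 9
chi2 = chi_square_2x2(a, b, c, d)
p_fisher = fisher_exact_2x2(a, b, c, d)
```

For these counts χ² is about 13.3, far above the one-degree-of-freedom critical value of about 3.84 at α = 0.05. The Fisher p-value is likewise well below 0.01.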
1. Contingency table analysis
(Screen 11,
above) is a powerful method for comparing frequency data in patients
with two different data-sources, Φ and Ψ
(Pearson, 1904;
Upton and Cook, 2006)
(Karl Pearson, 1857-1936, British statistician).
2. The simplest contingency table is a rectangular table of binary
(false/true) observations on patients, with two rows, two columns, and
2×2=4 cells. Columns correspond to an existing biomedical test,
Φ (death unexplained at autopsy); and rows correspond
to a newer test, Ψ (pain crisis), as follows:
_____________
True: | c | d |
Ψ |_____|_____|
False: | a | b |
|_____|_____|
False True
Φ
3. In this contingency table, cell a represents the set of patients
where both test Φ and test Ψ are false
(true negatives, TN); cell b represents the set of patients
where test Φ is true and test Ψ is false
(false negatives, FN); cell c represents the set of patients
where test Φ is false and test Ψ is true
(false positives, FP); and cell d represents the set of patients
where both test Φ and test Ψ are true
(true positives, TP).
That is, the lower-left and upper-right cells form the true diagonal
of this table; and the upper-left and lower-right cells form the
error diagonal.
4. We may calculate marginal totals, w, v, x, y;
and a grand total, z, for this table,
where v=a+b, w=c+d, x=a+c, y=b+d,
and z=v+w=x+y=a+b+c+d.
_____________
True: | c | d | w
Ψ |_____|_____|
False: | a | b | v
|_____|_____|
x y z
False True
Φ
5. In classical statistics, test Φ compared to test Ψ
is evaluated by the chi-square test, χ², or by the
Fisher exact test (Ronald A. Fisher, 1890-1962, British statistician),
based upon the chi-square or hypergeometric distributions, respectively.
In the null hypothesis (the statistical straw man), it is assumed
that tests Φ and Ψ
are statistically independent.
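In the notation above, the uncorrected Pearson χ² statistic for a 2×2 table has a standard closed form. The Fisher exact test is built from the hypergeometric probability of each table that shares the observed margins:

```latex
\chi^2 = \frac{z\,(ad - bc)^2}{v\,w\,x\,y},
\qquad
P(d = k) = \frac{\binom{y}{k}\binom{x}{w-k}}{\binom{z}{w}}
```

Under the null hypothesis of independence, χ² has approximately one degree of freedom, with a critical value of about 3.84 at α = 0.05. The denominator vanishes whenever any marginal total is zero. This is the zerodivide case in the chi-square test.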
1. In classical contingency table analysis, there is a 2×2
rectangular table, in which test Φ (columns) represents
the definitive but costly test for a medical entity (e.g., prostate biopsy);
and test Ψ (rows) represents a newer, less costly, less painful
test for the same medical entity (e.g., serum prostate specific antigen).
Suppose that we have data for both these tests on 10,000 patients,
and the contingency table is as follows:
2. Suppose further that we have adjusted the new test, Ψ,
such that we are willing to accept a 200:10 = 20:1 ratio
of false positives to false negatives, as shown. That is,
a false-negative is much more dangerous to the patient than a false-positive,
since a false-negative means that the patient
is not followed up until the next regular screening interval;
whereas a false-positive only requires the more expensive test,
Φ, but at least doesn't lose the patient to follow-up.
3. Suppose that we are already convinced that tests Φ
and Ψ are highly correlated (i.e., not independent),
so that the classical χ²
and Fisher exact tests (Ronald A. Fisher,
1890-1962, British statistician) are not useful at this point.
4. Finally, we know that the medical entity, prostate cancer, affects
much less than half the population sampled, so that (a+c)>(b+d)
and c>b. Whence we may conclude that the cell totals satisfy:
a>c>d>b.
(Proof:......).
5. Furthermore, if we know that the actual frequency of the disease
in the general population is <190
(here, 100/10,000), then we would set c/a>1%
(Proof:......).
6. In the token swap test, we set the null hypothesis at b=0.
Then the null hypothesis becomes:
_________________
True: | c-b | d+b | w
Ψ |_______|_______|
False: | a+b | 0 | v
|_______|_______|
x y z
False True
Φ
7. None of the null hypothesis cell totals are negative
(Proof: because c>b).
The marginal totals are preserved, and in particular, the ratio of
Φ-positives to Ψ-positives is preserved.
The token swap algorithm then addresses the question whether
b is unacceptably large, based upon its distance from zero.
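The null-hypothesis table of the token swap test can be generated from any observed table, and its margins can then be checked. The cell counts in this Python sketch are hypothetical. They were chosen only to match the stated constraints (10,000 patients, 200 false positives, 10 false negatives, 100 diseased, a > c > d > b) and are not data from the original study.

```python
def token_swap_null(a: int, b: int, c: int, d: int):
    """Null table with cell b moved to 0; returns (Ψ-true row, Ψ-false row)."""
    if c < b:
        raise ValueError("requires c >= b so that cell c - b is non-negative")
    return (c - b, d + b), (a + b, 0)

def margins(top, bottom):
    """Row totals (w, v) and column totals (x, y) for a table given as two rows."""
    (c, d), (a, b) = top, bottom
    return c + d, a + b, a + c, b + d

observed = ((200, 90), (9700, 10))   # hypothetical ((c, d), (a, b))
null = token_swap_null(a=9700, b=10, c=200, d=90)
assert margins(*null) == margins(*observed)   # margins are preserved
```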
1. Many scenarios in medicine are more complex than established test
Φ versus new test Ψ, in determining the presence
of medical entity e. Some patients are in higher risk groups than
other patients, and one is more suspicious of a false negative or false
positive, based upon this ancillary, risk-biased information.
2. Therefore, we propose a third logical variable,
test Ω, as a gold standard that encapsulates
everything that we know about each patient.
The apparatus for managing this heterogeneous test Ω
information is given by the medical model below.
3. Suppose that we have a three-dimensional contingency cube,
where test Φ is the horizontal axis, test Ψ is the
vertical axis, and test Ω is the depth axis:
4. There are eight cells (subcubes) in a contingency cube,
a, b, c, d, e, f, g, and h, with cells a, b, c, d in the
Ω-front plane, as before, and the corresponding
cells e, f, g, h, respectively, in the Ω-back plane:
   Cell   Φ   Ψ   Ω   Diagonal
   a      F   F   F   True
   b      T   F   F   Favor Ψ
   c      F   T   F   Favor Φ
   d      T   T   F   Error
   e      F   F   T   Error
   f      T   F   T   Favor Φ
   g      F   T   T   Favor Ψ
   h      T   T   T   True
5. There are four diagonals. In the true diagonal, ah,
all three tests, Φ, Ψ, and Ω,
agree, i.e., the three tests are either all false (cell a)
or all true (cell h). In the error diagonal, de,
test Φ and test Ψ agree with each other but both disagree
with the gold standard, test Ω. In addition,
there is a favor Φ diagonal, cf, in which test
Φ agrees with the gold standard, test Ω, but test Ψ
disagrees with it; and a favor Ψ diagonal, bg, in which test Ψ
agrees with the gold standard, test Ω, but test Φ disagrees
with it.
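The cell table and its four diagonals can be checked mechanically. The Python sketch below assumes only the coding in the table (F = False, T = True) and the rule stated above: a cell lies on the true diagonal when both tests agree with Ω, on the error diagonal when both disagree, and otherwise on the diagonal of whichever test agrees with Ω. The function name is hypothetical.

```python
from itertools import product


def diagonal(phi: bool, psi: bool, omega: bool) -> str:
    """Classify a cube cell by agreement of tests Φ and Ψ with gold standard Ω."""
    if phi == omega and psi == omega:
        return "True"
    if phi != omega and psi != omega:
        return "Error"
    return "Favor Φ" if phi == omega else "Favor Ψ"


# Enumerate cells in the order of the table: Φ varies fastest, then Ψ, then Ω.
cells = {}
for name, (omega, psi, phi) in zip("abcdefgh", product([False, True], repeat=3)):
    cells[name] = diagonal(phi, psi, omega)

for name, label in cells.items():
    print(name, label)
```

The printed labels reproduce the Diagonal column of the cell table, pairing a with h, d with e, c with f, and b with g.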
TOKEN SWAP CUBE: PLANAR PROJECTIONS.
Three-dimensional swap from
Ψ to Φ: b → c and g → f.
Three-dimensional swap from
Φ to Ψ: c → b and f → g.
Collapse/project the cube into three
margin-neutral token squares:
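A quick check, sketched in Python under the cell coordinates of the cell table, shows what the paired three-dimensional swap does: moving one token b → c together with one token g → f leaves every one-dimensional marginal total (Φ, Ψ, Ω) unchanged and leaves the Φ-Ψ face unchanged, while the Φ-Ω and Ψ-Ω projections do change. The cell counts and function names are hypothetical.

```python
# Cell coordinates (Φ, Ψ, Ω) from the cell table; True = positive test.
CELLS = {
    "a": (False, False, False), "b": (True, False, False),
    "c": (False, True, False),  "d": (True, True, False),
    "e": (False, False, True),  "f": (True, False, True),
    "g": (False, True, True),   "h": (True, True, True),
}
AXES = {"Φ": 0, "Ψ": 1, "Ω": 2}


def swap_psi_to_phi(counts: dict) -> dict:
    """One three-dimensional swap from Ψ to Φ: b -> c and g -> f."""
    new = dict(counts)
    new["b"] -= 1
    new["c"] += 1
    new["g"] -= 1
    new["f"] += 1
    return new


def project(counts: dict, axes: str) -> dict:
    """Collapse the cube onto the named axes, e.g. "Φ" or "ΦΩ"."""
    idx = [AXES[ax] for ax in axes]
    out = {}
    for name, n in counts.items():
        key = tuple(CELLS[name][i] for i in idx)
        out[key] = out.get(key, 0) + n
    return out


before = {"a": 40, "b": 6, "c": 3, "d": 1, "e": 2, "f": 5, "g": 4, "h": 30}  # hypothetical
after = swap_psi_to_phi(before)

for axes in ["Φ", "Ψ", "Ω", "ΦΨ", "ΦΩ", "ΨΩ"]:
    print(axes, "unchanged" if project(before, axes) == project(after, axes) else "changed")
```

Performing only half of the swap (b → c alone) would alter the Φ and Ψ marginal totals, which is why the two moves are paired.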
1. A 2×2 contingency table (2×2CT) is a rectangular table with two rows
and two columns [95,96,97,98].
2. Columns represent an existing gold standard, g; and
rows represent a hypothesis, h.
3. The cells and marginal totals are labeled:
   Φ→ Ψ↓     -      +     Total
   -         c      d      v
   +         a      b      w
   Total     x      y      z
4. In the above example, the
explanation-at-autopsy is the gold standard, Φ; and pain-crisis
is the hypothesis, Ψ, being investigated.
5. In a simple example, consider a BALANCED 2×2CT
in which there are 100 patients in all, of which 90 patients
are gold standard negative, Φ-, and 10 patients are
gold standard positive, Φ+. Further, suppose that 50 patients
are hypothesis negative, Ψ-, and 50 patients are
hypothesis positive, Ψ+, as follows:
BALANCED
   Φ→ Ψ↓    Φ-     Φ+    Total
   Ψ+       45      5      50
   Ψ-       45      5      50
   TOTAL    90     10     100
6. In this example, gold-standard ±
is uncorrelated with hypothesis ±.
The individual data cells in the table
contain tokens that represent individual patients, characterized by
nothing more than their Φ±Ψ± status.
In the example, the observed cell totals are:
Φ-Ψ- = 45 tokens; Φ-Ψ+ = 45 tokens;
Φ+Ψ- = 5 tokens;
Φ+Ψ+ = 5 tokens. The marginal totals are: Φ- = 90;
Φ+ = 10; Ψ- = 50; Ψ+ = 50.
The grand total, z, is 100.
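7. The BALANCED/EXPECTED CELL TOTALS are
obtained as the product of the corresponding row and column marginal
totals, divided by the grand total; for example, the expected
Φ-Ψ+ total is (90 × 50)/100 = 45.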
8. Classical statistical analyses of a 2×2CT
are afforded by the CHISQUARE TEST (CST) and the
FISHER EXACT TEST (FXT), based upon statistical sampling assumptions
(Ronald A. Fisher, 1890-1962, British statistician).
9. The TOKEN SWAP TEST (TST) is a statistical-type
significance test that measures the likelihood of
MISCLASSIFICATIONS in a 2×2CT.
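The expected cell totals of item 7 can be computed directly. In this Python sketch (function name hypothetical), exact fractions avoid rounding until the end. Rounding to whole tokens is our assumption, but it reproduces the EXPECTED table of the 71-patient token swap example later in this document.

```python
from fractions import Fraction


def expected_table(row_totals, col_totals):
    """Expected cell totals: (row total x column total) / grand total."""
    z = sum(row_totals)
    if z != sum(col_totals):
        raise ValueError("row and column totals must share a grand total")
    return [[Fraction(r * c, z) for c in col_totals] for r in row_totals]


# BALANCED example: rows Ψ+, Ψ-; columns Φ-, Φ+.
balanced = expected_table([50, 50], [90, 10])
print(balanced == [[45, 5], [45, 5]])                # True

# 71-patient example: rows YES, NO; columns NO, YES.
exp71 = expected_table([20, 51], [58, 13])
print([[round(x) for x in row] for row in exp71])   # [[16, 4], [42, 9]]
```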
1. Now consider an UNBALANCED 2×2CT, with the
SAME MARGINAL TOTALS as above.
The least-unbalanced example has only a single token misclassified:
UNBALANCED: BALANCED+1
            Φ-     Φ+    TOTAL
   Ψ+       46     44      90
   Ψ-        4      6      10
   TOTAL    50     50     100
The second-least-unbalanced example has two tokens misclassified:
UNBALANCED: BALANCED+2
            Φ-     Φ+    TOTAL
   Ψ-       47     43      90
   Ψ+        3      7      10
   TOTAL    50     50     100
... and so forth.
2. How unbalanced can the observed data-cells be before
we suspect that there is a genuine relationship between the gold standard,
g, and the hypothesis, h? That is, how unbalanced can
the observed data-cells be before one rejects the null hypothesis?
3. The CHISQUARE TEST (CST) and FISHER EXACT TEST (FXT)
are based upon statistical sampling assumptions
(Ronald A. Fisher, 1890-1962, British statistician).
4. The TOKEN SWAP TEST
does not depend upon the usual statistical assumptions
of repeated, random sampling from a source population.
1. TOKEN SWAP SIGNIFICANCE EXAMPLE.
In the following example, it requires five TOKEN SWAPS to transform
the expected into the observed contingency table:
EXPECTED
            NO    YES   TOTAL
   YES      16      4      20
   NO       42      9      51
   TOTAL    58     13      71
⇒⇒⇒
EXPECTED+1
            NO    YES   TOTAL
   YES      15      5      20
   NO       43      8      51
   TOTAL    58     13      71
⇒⇒⇒
EXPECTED+2
            NO    YES   TOTAL
   YES      14      6      20
   NO       44      7      51
   TOTAL    58     13      71
⇒⇒⇒
EXPECTED+3
            NO    YES   TOTAL
   YES      13      7      20
   NO       45      6      51
   TOTAL    58     13      71
⇒⇒⇒
EXPECTED+4
            NO    YES   TOTAL
   YES      12      8      20
   NO       46      5      51
   TOTAL    58     13      71
⇒⇒⇒
EXPECTED+5 = OBSERVED
            NO    YES   TOTAL
   YES      11      9      20
   NO       47      4      51
   TOTAL    58     13      71
2. In the zeroth token-swap, the chances that the EXPECTED-to-EXPECTED+1
swaps could have taken place AT RANDOM are:
\[ \frac{9\times 16}{(9\times 16)+(4\times 42)} = \frac{144}{312} \approx 0.46, \]
that is, the number of possible EXPECTED-to-EXPECTED+1 swaps,
divided by (the number of possible EXPECTED-to-EXPECTED+1 swaps
plus the number of possible EXPECTED-to-EXPECTED-1 swaps),
without altering the marginal totals.
3. In the zeroth token-swap, the chances that the EXPECTED-to-EXPECTED-1
swaps could have taken place AT RANDOM are:
\[ \frac{4\times 42}{(9\times 16)+(4\times 42)} = \frac{168}{312} \approx 0.54. \]
4. In the first right token-swap, the chances that
the EXPECTED+1-to-EXPECTED+2 swaps could have taken place AT RANDOM are:
\[ \frac{8\times 15}{(8\times 15)+(5\times 43)} = \frac{120}{335} \approx 0.36. \]
5. In the first left token-swap, the chances that
the EXPECTED+1-to-EXPECTED swaps could have taken place AT RANDOM are:
\[ \frac{5\times 43}{(8\times 15)+(5\times 43)} = \frac{215}{335} \approx 0.64, \]
and so forth.
6. When the EXPECTED table has been swapped up to the OBSERVED table,
without altering the marginal totals, and the proportion of such swap
sequences is less than 5%, then the result is significant.
7. If the result is not significant, then we say that the observed
2×2CT is NOT SO DIFFERENT from the expected
2×2CT: occasional misclassifications
by a medical observer could account for the differences.
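The swap probabilities in items 2 through 5 can be chained into a random walk over tables with fixed marginal totals. The Python sketch below is one reading of the procedure, not necessarily the authors' implementation: it assumes each swap direction is chosen with probability proportional to the number of token pairs available (as in the fractions above), indexes each table by its YES/YES cell, and takes the proportion of five-swap paths that reach OBSERVED as the quantity compared with 5%. Function names are hypothetical.

```python
from fractions import Fraction


def cells(t, row_yes, row_no, col_yes):
    """Cells of a 2x2 table with fixed margins, indexed by t = YES/YES count.

    Layout as in the example: rows YES/NO, columns NO/YES.
    Returns (yes_no, yes_yes, no_no, no_yes).
    """
    no_yes = col_yes - t
    return row_yes - t, t, row_no - no_yes, no_yes


def swap_probabilities(t, row_yes, row_no, col_yes):
    """Chances that a random margin-preserving swap moves to t+1 or to t-1."""
    yes_no, yes_yes, no_no, no_yes = cells(t, row_yes, row_no, col_yes)
    right = yes_no * no_yes   # e.g. 16 x 9 at EXPECTED
    left = yes_yes * no_no    # e.g. 4 x 42 at EXPECTED
    if right + left == 0:
        return Fraction(0), Fraction(0)
    return Fraction(right, right + left), Fraction(left, right + left)


def swap_distribution(start, steps, row_yes, row_no, col_yes):
    """Proportion of swap paths ending at each YES/YES count after `steps` swaps."""
    dist = {start: Fraction(1)}
    for _ in range(steps):
        nxt = {}
        for t, p in dist.items():
            p_right, p_left = swap_probabilities(t, row_yes, row_no, col_yes)
            if p_right == 0 and p_left == 0:
                nxt[t] = nxt.get(t, Fraction(0)) + p   # no swap is possible
                continue
            if p_right:
                nxt[t + 1] = nxt.get(t + 1, Fraction(0)) + p * p_right
            if p_left:
                nxt[t - 1] = nxt.get(t - 1, Fraction(0)) + p * p_left
        dist = nxt
    return dist


# EXPECTED has YES/YES = 4; OBSERVED (EXPECTED+5) has YES/YES = 9.
print(swap_probabilities(4, 20, 51, 13))   # (Fraction(6, 13), Fraction(7, 13))
dist = swap_distribution(4, 5, 20, 51, 13)
p_observed = dist[9]
print(float(p_observed) < 0.05)            # True
```

Under this reading, only about 0.12% of five-swap paths reach the OBSERVED table, well below 5%, so the result would be called significant.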
1. In statistics, the Neyman-Pearson Condition
(Jerzy Neyman, 1894-1981, Polish-American statistician;
Egon S. Pearson, 1895-1980, British statistician) is the condition
that, when performing a hypothesis test between two point hypotheses
\(H_0: \theta=\theta_0\) and \(H_1: \theta=\theta_1\), the likelihood-ratio
test that rejects \(H_0\) in favor of \(H_1\) when
\[ \Lambda(x) = \frac{L(\theta_0 \mid x)}{L(\theta_1 \mid x)} < \eta,
\quad\text{where}\quad P\big(\Lambda(X) < \eta \mid H_0\big) = \alpha, \]
is the most powerful test of size \(\alpha\) for the threshold \(\eta\).
Here \(\Lambda(x)\) is the likelihood ratio (or, more generally, any statistical
test inequality comparison); \(\eta\) is the threshold that defines the so-called
critical region for the test, and \(\alpha\) is the significance
level for Type I (false positive) Error.
If the test is most powerful for all \(\theta_1 \in \Theta_1\), then it is said to be
uniformly most powerful (UMP).
The essential argument of the Neyman-Pearson Condition is that
greater power (=(1-β)) forces greater Type I Error (=α).
2. In practice, the likelihood ratio itself is often not used directly in the test.
Instead, one examines the ratio to see how the key statistic in it is related
to the size of the ratio (i.e., whether a large statistic corresponds
to a small ratio or to a large one).
3. Neyman J, Pearson E.
On the Problem of the Most Efficient Tests of Statistical Hypotheses.
Philosophical Transactions of the Royal Society of London.
Series A, Containing Papers of a Mathematical or Physical Character.
1933;231:289-337.
1. The Neyman-Pearson Condition involves the notion of
confidence intervals, which reverse the traditional notion
of hypothesis testing. In traditional hypothesis testing with
a symmetric random variable, such as the normal distribution
with population mean, μ, and population standard deviation,
σ, we determine the probability that a sample mean,
X, lies within a fixed interval, say,
μ ± ησ, about the population mean,
μ, i.e., the probability that
\(X \in [\mu-\eta\sigma,\ \mu+\eta\sigma]\),
or \(\mu-\eta\sigma < X < \mu+\eta\sigma\):
Figure 3485.
2. In many cases, however, we don't really care what proportion
(probability) of values of X falls within this interval.
Rather, we may have a good sense of the value of the population standard
deviation, σ, but a poor sense of the value of the population
mean, μ. Furthermore, we may wish to estimate
the value of μ, based upon our knowledge of
X and σ.
3. Let us reverse the question to its algebraic equivalent, namely, whether
μ lies in the interval, say, ±ησ,
about X, i.e., \(\mu \in [X-\eta\sigma,\ X+\eta\sigma]\)
or \(X-\eta\sigma < \mu < X+\eta\sigma\):
Figure 3486.
Proof that (1): \(X-\eta\sigma < \mu < X+\eta\sigma\) is equivalent to
(2): \(\mu-\eta\sigma < X < \mu+\eta\sigma\).
Expression (1) consists of expressions
(1a): \(X-\eta\sigma < \mu\)
and (1b): \(\mu < X+\eta\sigma\).
Add ησ to expression (1a) and -ησ
to expression (1b), to obtain:
\(X < \mu+\eta\sigma\) and
\(\mu-\eta\sigma < X\), which together yield
(2). Each step is reversible, so (2) likewise yields (1). Q.E.D.
4. This reversal may seem like a peculiar probabilistic formulation,
since X is subject to random fluctuations, whereas
the population mean, μ, is fixed. Neyman and Pearson proposed
the following interpretation in their theory of confidence intervals.
The probability value, α, represents the probability that the
random interval, X ± ησ,
will fail to bracket μ, as shown in Figure 3463:
Figure 3463.
Here, we show 20 trials, each of size N,
in which 1/20 (probability 5%) of the trial confidence bars
fail to contain the population mean, μ.
5. Of course, the population standard deviation, σ, is
typically not known, but may be estimated by the sample standard
deviation, S; dividing by √N gives
S/√N, the sample standard error for trial-size
N. This sample standard error may vary from trial to trial,
so that the error bars have different sizes, corresponding to different values
of S, as shown in Figure 3464:
Figure 3464.
6. The parameter, η, is then taken from the Student t distribution
with (N-1) degrees of freedom. The Neyman-Pearson condition
asserts that....
7. The token swap test is a non-statistical test, in which there is
no assumption of sampling; rather, probabilities are calculated from
data internal to the contingency table itself. For this interpretation
of the Neyman-Pearson condition, we must demonstrate that, for a given
initial hypothesis, in which the marginal and grand totals are fixed
and specified, a greater value for η corresponds to
a smaller value for α.
8. The essential argument of the Neyman-Pearson Condition
is that greater power (=(1-β)) forces greater
Type I Error (=α). For example,
in a Gaussian distribution with two hypotheses, θ0
(null hypothesis) and θ1
(alternative hypothesis), the Type I error is designated
as α and the Type II error is designated as β:
Figure 3477.
The power, (1-β), of the hypothesis test increases at the
expense of increasing the Type I error:
Figure 3478.
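The confidence-interval interpretation in items 4 and 5 can be illustrated by simulation. This Python sketch assumes normally distributed data with a known population standard deviation, and takes σ in the interval X ± ησ to be the standard deviation of the sample mean (σ_pop/√N); η = 1.96 is the usual normal quantile for 95% coverage. The parameter values and function name are hypothetical.

```python
import math
import random


def coverage(mu, sigma_pop, n, eta, trials, seed=1):
    """Fraction of trials whose interval X ± η·σ brackets the population mean μ.

    Here σ is taken to be the standard deviation of the sample mean,
    sigma_pop / sqrt(n), an assumption made for this illustration.
    """
    rng = random.Random(seed)
    sigma = sigma_pop / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        x_bar = sum(rng.gauss(mu, sigma_pop) for _ in range(n)) / n
        if x_bar - eta * sigma < mu < x_bar + eta * sigma:
            hits += 1
    return hits / trials


cov = coverage(mu=100.0, sigma_pop=15.0, n=10, eta=1.96, trials=20000)
print(round(cov, 3))   # close to 0.95: about 1 interval in 20 misses μ
```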
1. The easiest way to understand the Neyman-Pearson Condition
is to consider two curves:
Figure 3477.
Figure 3489.
The left curve, θ0, corresponds to the
null hypothesis; and the right curve, θ1,
corresponds to the alternative hypothesis.
2. A vertical line, η, is drawn between the two curves.
3. The shaded area ///// under the left curve,
θ0, that lies right of line η,
represents Type I Error = α error = false positives,
assuming that the null hypothesis is true.
4. The shaded area \\\\\ under the right curve,
θ1, that lies left of line η,
represents Type II Error = β error = false negatives,
assuming that the alternative hypothesis is true.
5. The power of a statistical test with respect to the
alternative hypothesis is denoted, (1 - β).
6. If one increases the power of the test against the alternative hypothesis,
this is done at the expense of increasing the α error under the
null hypothesis.
7. The Neyman-Pearson Condition is the property that
the critical region of the test is chosen
to maximize the power against θ1, for a given
θ0 and a given α error.
8. In the token swap test, the bell-shaped curves
are replaced with discrete histograms:
Figure 3479.
The red line shown here is the η-line. The left histogram
is predominantly the null hypothesis; and the right histogram
is predominantly the alternative hypothesis.
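The trade-off between α and power described in items 1 through 6 can be computed for two normal curves. This Python sketch assumes unit-variance curves centered at hypothetical values θ0 = 0 and θ1 = 2, with the null hypothesis rejected when the statistic lies right of η; the function names are hypothetical.

```python
import math


def upper_tail(x, mean, sd=1.0):
    """P(X > x) for a normal random variable with the given mean and sd."""
    return 0.5 * math.erfc((x - mean) / (sd * math.sqrt(2.0)))


def alpha_and_power(eta, theta0, theta1, sd=1.0):
    """α: area of the θ0 curve right of η; power 1-β: area of the θ1 curve right of η."""
    alpha = upper_tail(eta, theta0, sd)
    power = upper_tail(eta, theta1, sd)   # 1 - β, where β is the θ1 area left of η
    return alpha, power


# Hypothetical curves: θ0 = 0, θ1 = 2, unit standard deviation.
for eta in (2.0, 1.5, 1.0):
    alpha, power = alpha_and_power(eta, 0.0, 2.0)
    print(f"eta={eta}: alpha={alpha:.3f}, power={power:.3f}")
```

Moving η left from 2.0 to 1.0 raises the power from 0.50 to about 0.84, but α rises from about 0.02 to about 0.16.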
SCREEN 21. CONTINGENCY TABLE ANALYSIS:
PROOF OF THE NEYMAN-PEARSON CONDITION.
The essential argument of the Neyman-Pearson Condition is that
greater power (=(1-β)) forces greater Type I Error (=α).
Lemma 1. In a 2×2 contingency table with
given marginal totals, the frequency of cell d determines
the frequencies of the other cell totals, a, b, and c.
Proof. Consider any value of d, with the marginal totals
v, w, x, and y given.
Then b=y-d, c=w-d, and a=x-c.
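Lemma 1 amounts to three subtractions. The Python sketch below names each marginal total by the two cells it sums, rather than by the letters v, w, x, y, to keep the layout explicit; the function name and sample counts are hypothetical.

```python
def cells_from_d(d, row_cd, col_ac, col_bd):
    """Recover cells a, b, c from cell d and the fixed marginal totals.

    row_cd: total of the row containing c and d;
    col_ac: total of the column containing a and c;
    col_bd: total of the column containing b and d.
    """
    b = col_bd - d
    c = row_cd - d
    a = col_ac - c
    return a, b, c


# Round trip on a hypothetical table a=30, b=10, c=25, d=35.
a, b, c, d = 30, 10, 25, 35
print(cells_from_d(d, row_cd=c + d, col_ac=a + c, col_bd=b + d))   # (30, 10, 25)
```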
Lemma 2. In a 2×2 contingency table,
let \(F_{k,j}\), for \(0 \le F_{k,j} \le 1\),
represent the proportion of tokens at frequency j
in cell d after k swaps; for D, the expected value of
cell d, let \(F_{0,D}=1\) and \(F_{0,j}=0\) for \(j \ne D\). Then:
(1) \(F_{k,j}=0\) for \(j<D-k\) and \(j>D+k\).
(2) \(F_{k,D-k}>0\) and \(F_{k,D+k}>0\).
(3) \(F_{k+1,D-k-1} < F_{k,D-k}\) and \(F_{k+1,D+k+1} < F_{k,D+k}\).
Proof. Part (1). Let k=1. Then:
\[ F_{1,D+1} = F_{0,D}\times\frac{CB}{AD+CB} + F_{0,D+2}\times\cdots,
\quad\text{where } F_{0,D+2}=0; \]
and
\[ F_{1,D-1} = F_{0,D}\times\frac{AD}{AD+CB} + F_{0,D-2}\times\cdots,
\quad\text{where } F_{0,D-2}=0. \]
By definition, \(F_{1,j} = F_{0,j-1}\times\cdots + F_{0,j+1}\times\cdots\).
For \(j<D-k\), \(F_{0,j-1}=0\) (since \(j-1<D-2\))
and \(F_{0,j+1}=0\) (since \(j+1<D\)).
For \(j>D+k\), \(F_{0,j-1}=0\) (since \(j-1>D\))
and \(F_{0,j+1}=0\) (since \(j+1>D+2\)).
Let the lemma be true for k. Then.....
Proof. Part (2). Let k=1. Then:
Proof. Part (3). Let k=1. Then:
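Parts (1) through (3) of Lemma 2 can be checked numerically on the 71-patient token swap example. This Python sketch is an illustration, not a proof: it assumes the swap probabilities used in that example, tracks the YES/YES cell in place of cell d, and checks only k up to 4, since further left the walk reaches the edge of the table (a cell count of zero). The names are hypothetical.

```python
from fractions import Fraction

ROW_YES, ROW_NO, COL_YES = 20, 51, 13   # margins of the 71-patient example
D = 4                                  # expected YES/YES count


def step_probs(t):
    """(right, left) swap probabilities at YES/YES count t, margins fixed."""
    yes_no, no_yes = ROW_YES - t, COL_YES - t
    no_no = ROW_NO - no_yes
    right, left = yes_no * no_yes, t * no_no
    total = right + left
    return Fraction(right, total), Fraction(left, total)


def distributions(max_k):
    """F[k][j]: proportion of tokens at count j after k swaps, from F[0][D] = 1."""
    F = [{D: Fraction(1)}]
    for _ in range(max_k):
        nxt = {}
        for j, p in F[-1].items():
            p_right, p_left = step_probs(j)
            if p_right:
                nxt[j + 1] = nxt.get(j + 1, Fraction(0)) + p * p_right
            if p_left:
                nxt[j - 1] = nxt.get(j - 1, Fraction(0)) + p * p_left
        F.append(nxt)
    return F


F = distributions(4)
for k in range(5):
    assert all(D - k <= j <= D + k for j in F[k])            # part (1)
    assert F[k][D - k] > 0 and F[k][D + k] > 0               # part (2)
for k in range(4):
    assert F[k + 1][D - k - 1] < F[k][D - k]                 # part (3), left end
    assert F[k + 1][D + k + 1] < F[k][D + k]                 # part (3), right end
print("Lemma 2, parts (1)-(3), hold for k <= 4 in this example")
```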
Theorem 1.
The token swap test satisfies the Neyman-Pearson Condition.
Proof.
...........
1. Atomic statements of the medical model are propositions,
i.e., statements that are either true, false,
or uncertain. The negation of a proposition is also a proposition;
the double-negation of a proposition equals the original proposition,
i.e., --p=+p. We recognize two mutually exclusive sets
of propositions: data, set D; and medical entities,
set E. The negation of every datum is a datum, i.e.,
+d ∈ D implies -d ∈ D; and the negation of every
medical entity is a medical entity, i.e., +e ∈ E implies
-e ∈ E.
2. A datum is understood as a fixed event, with a fixed
date/time and a localization on the patient, for example, a serum
potassium of 2.6 mEq/L on January 1, 2007, at 8:00 AM; or a 0.5 cm pearly
papule biopsied from the left nasal ala on January 1, 2007, at 8:00 AM.
3. A medical entity is an inferred truth, such as heart
failure or basal cell carcinoma. A datum is either absolutely true,
absolutely false, or absolutely uncertain. A medical entity is
fuzzily true or fuzzily false, based upon inferences drawn
from a data vector, Δ = {+d₁, +d₂,
..., +dₙ}, available at a particular time.
4. The relationship of medical entities to data is specified
by an ontology (Rule 6) of accepted core beliefs in medicine.
For example, a pearly papule and a confirmatory pathology report
from the biopsy implies basal cell carcinoma, say, at a fuzzy level of
7/8 (or a certainty level of 3, see below).
5. Not every pearly papule of the nose is examined by a physician;
and the physician does not biopsy every pearly papule that he/she examines.
The patient must be worried enough about the papule to schedule
a doctor's appointment; and the physician must be worried enough
about the papule to justify a diagnostic biopsy. Rule 7 is the
Vexative Rule (Latin: vexari = to worry), which provides
justifications for obtaining particular data. It is assumed that every datum
obtained exacts some payment, however small, in injury, pain, money,
inconvenience, or risk of morbidity or mortality to the patient.
6. Rule 8, or Sutton's Law (go where the money is) (Willie
Sutton, 1901-1980, American Bank Robber, nicknamed "Slick Willie") is the
rule of jumping to conclusions based upon incomplete data
(Brewka, 1997), also known as the
Zebra Rule (if you hear hoofbeats in the street, think of horses
not zebras). Medical reasoning inevitably involves decisions
under uncertainty. One collects limited data, from which one must draw
an initial conclusion. One has a complementary/converse ethical mandate
(Rule 5) to treat a threatening disease condition if there is compelling
(but not absolute) evidence for it. On the other hand, one has the ethical
mandate (Rule 4, first do no harm) not to collect unnecessary
data that might harm the patient physically, mentally, or financially.
Therefore, there will be instances in which one initially jumps to the
most likely conclusion, only to have it shown wrong by data obtained
subsequently.
In Petersdorf and Beeson's
(1961) original paper on Sutton's
Law (Fevers of Unexplained Origin),
these events are clinical findings
suggesting one infectious agent that are later superseded by culture
results. In medical slang, these unexpected reversals are called
zebras (Groopman, 2007).
(Willie Sutton, 1901-1980, American Bank Robber; the original "Slick Willie":
nickname for U. S. President Bill Clinton).
1. In classical propositional logic, these infrequent reversals of usual
conclusions (which, cumulatively, occur rather often in medical practice)
result in a mathematical inconsistency, i.e., a proposition that is
both true and false, a mathematical abomination. This inconsistency may be
avoided by requiring that conclusions be interpreted as medical entities,
that are fuzzily true, but may be overturned by subsequent data.
2. The companion concept for overturning a plausible conclusion
based upon subsequent data collection is Schrödinger's Cat:
"... There is a famous thought experiment called Schrödinger's cat.
A cat is placed in a sealed box. There is a gun pointing at it, and it will
go off if a radioactive nucleus decays. The probability of this happening
is fifty percent. (Today no one would dare propose such a thing, even purely
as a thought experiment, but in Schrödinger's time they had not heard
of animal liberation.)
"If one opens the box, one will find the cat either dead or alive. But
before the box is opened, the quantum state of the cat will be a mixture
of the dead cat state with a state in which the cat is alive.
This some philosophers of science find very hard to accept. The cat
can't be half shot and half not-shot, they claim, any more than one
can be half pregnant. Their difficulty arises because they are
implicitly using a classical concept of reality. In this view, an
object has not just a single history but all possible histories. In
most cases, the probability of having a particular history will cancel
out with the probability of having a very slightly different history;
but in certain cases, the probabilities of neighboring histories
reinforce each other. It is one of these reinforced histories
that we observe as the history of the object.
"In the case of Schrödinger's cat, there are two histories that are
reinforced. In one the cat is shot, while in the other it remains
alive. In quantum theory both possibilities can exist together. But
some philosophers get themselves tied in knots because they implicitly
assume that the cat can only have one history."
From:
Hawking S.
Black Holes and Baby Universes and Other Essays.
New York: Bantam Books. 1993;:. Pages 44-45.
ISBN 0-553-37411-7, 182 pages.
3. In our formulation, as with the boxed Schrödinger's Cat,
no medical entity is ever absolutely certain [other than possibly
in the mind of God, because God presumably works with a larger data vector
than we mortals can ever know. Or, medical entities are perhaps also
uncertain even in the mind of God, and the certainty model itself has been
imposed upon God by arrogant humans. In any event, Schrödinger's Cat
always has an encore in our formulation.]
4. In Schrödinger's Cat, one irrevocably determines the life-status
of the cat when the cat's box is opened. In our mathematical model, one
determines the status of medical entities when one applies Sutton's Law,
i.e., jumps to the most likely conclusion, given the data
on hand. In our mathematical model, this Schrödinger Opening
of the cat's box unleashes an ethical mandate (Rule 7, Vexative) to collect
additional data. In Schrödinger's formulation, the cat's box is opened
exactly once. In our mathematical model, the cat's box is opened once;
vexative data are collected; the cat's box is closed (i.e., Sutton's Law
is suspended again); the cat's box is opened again; additional vexative data
are collected; the cat's box is closed again, ....
SCREEN 24. SUMMARY OF RULES:
SET THEORY FORMULATION.
0. The logic in this report is based upon classical logic,
with the following three complementizers: payment (!);
value (#); and knowledge/certainty ($). That is, the
harm/payment created by achieving higher levels of knowledge/certainty
must be balanced by the value in obtaining that knowledge/certainty.
1. Rule 1.
Complementizers: Absorb negation,
homomorphic in logical-and.
That is, complementizer-negative equals complementizer-positive:
negative-negative-p equals p; know-negative-p equals know-p;
pay-negative-p equals pay-p; value-negative-p equals value-p.
Homomorphic in logical-and...........
3. Rule 3. Data are crisp.
You either know a datum or not.
Nandset definition: {+$d,-$∞d}.
4. Rule 4.
Hippocrates-first (Hippocrates, 460-370 BC, Greek physician,
father of medicine). That is, payment-datum implies value-datum.
(Contrapositively: no-value-datum implies no-payment-datum.)
Nandset definition: {-#d,+!d}
5. Rule 5.
Hippocrates-reverse. Treat if you can.
Not-know-datum and value-datum implies harm-datum.
Nandset definition: {-$d,+#d,-!d}.
6. Rule 6. Ontology. If you know
certain entities and data, then this generates the knowledge/certainty
of an additional entity. For example, if this patient has an elevated
serum prostate-specific antigen, then you become more certain
that the patient has prostate cancer.
Nandset definition:
\(\{+\$_k\Delta, \Delta, \ldots, -e, -\$_{k+1}e\}\)
and \(\{+\$_k\Delta, \Delta, \ldots, -\$_k e\}\).
7. Rule 7. Vexative. If you know certain
entities and data, then this generates value for an additional datum.
That is, you become vexed by your ignorance of that additional
datum. For example, if you know that an elderly male patient has not had
a serum prostate-specific antigen test in the past five years, you become vexed
regarding that missing datum.
Nandset definition: \(\{+\$_k e, e, -\$d, -\#d, -\$_{k+1}e\}\).
8. Rule 8. Ethical Data Registration.
For each datum, there is
a data-collection step, J, at which the datum is collected
and is true; or the datum is collected and is false; or the datum
collection attempt fails and the datum is unknown. Otherwise, the datum
is never attempted and never collected. That is, for d ∈ D,
there exists at most one J,
1 < J < H, at which
(8.1.1) +$d, +d, +!d are true; or else
(8.1.2) +$d, -d, +!d are true; or else
(8.1.3) -$d, +!d are true.
(8.2) Otherwise, for every J, 1 < J < H,
none of -$d, +$d, -#d, +#d, -!d, +!d is entered into
\(S_J - S_{J-1}\).
The nandsets for Rule 8 are: (8.1.1) {-$d}, {-d}, {-!d} ∈
\(S_J - S_{J-1}\);
or else (8.1.2) {-$d}, {+d}, {-!d} ∈
\(S_J - S_{J-1}\);
or else (8.1.3) {+$d}, {-!d} ∈
\(S_J - S_{J-1}\).
(8.2) Otherwise,
{+$d}, {-$d}, {+#d}, {-#d}, {+!d}, {-!d} ∉
\(S_J - S_{J-1}\).
9. Rule 9.
Schrödinger's Rule. At data-collection-step J,
we create a set, \(O_J\), the SCHRÖDINGER OPENING.
The nandset for \(-\$_k\omega\), namely,
\(\{+\$_k\omega\}\), is placed in \(O_J\)
if and only if the nandset for \(+\$_k\omega\), namely,
\(\{-\$_k\omega\}\), is NOT a member of
the logical consequences of the data-collection-step,
denoted ∫ (for logical "summation").
That is, anything that is uncertain at data-collection-step J
is declared uncertain in \(O_J\).
If the cat's life is uncertain at data-collection-step J,
then it is declared uncertain in \(O_J\).
However, the cat may spring alive again at data-collection-step (J+1).
Watch closely: the reasoning is a little tricky.
Rule 9, Schrödinger's Rule:
It is true that \(-\$_k\omega\) for \(O_J\)
if and only if \(+\$_k\omega\)
is not a logical consequence, denoted ∫ (for logical
"summation"), of \(S_J\).
The nandset for Rule 9 is:
\(\{+\$_k\omega\} \in O_J\)
if and only if
\(\{-\$_k\omega\} \notin \int S_J\),
where ∫ represents logical consequences
(for logical "summation").
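The nandset notation can be made concrete with a small checker. In this Python sketch we assume, from the name and from Rules 4 and 5, that a nandset is a set of signed literals that may not all hold at once; literals are written as strings such as "-#d" (no value for datum d) or "+!d" (payment for d). The representation and function names are hypothetical, and certainty levels are not modeled.

```python
RULE_4 = frozenset({"-#d", "+!d"})          # payment-datum implies value-datum
RULE_5 = frozenset({"-$d", "+#d", "-!d"})   # not-know and value implies payment


def violated(nandset, facts):
    """A nandset is violated when every literal in it is asserted."""
    return nandset <= facts


def consistent(facts, nandsets):
    """True if the asserted literals violate none of the nandsets."""
    return not any(violated(n, facts) for n in nandsets)


rules = [RULE_4, RULE_5]
print(consistent({"+!d", "+#d", "+$d"}, rules))   # True: paid, valued, known
print(consistent({"+!d", "-#d"}, rules))          # False: payment without value
print(consistent({"-$d", "+#d", "-!d"}, rules))   # False: valued but unknown and unpaid
```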
1. There are two mutually exclusive classes of propositions:
data, D and medical entities, E.
2. There are nine rules of relationship among these
propositions.
3. Each rule corresponds to one or more nandsets.
(Screen 24).
4. Nandsets are: green (quarantined),
yellow (conditional),
or red (absolute).
5. Proof consists of constructing a quarantine
for a claimed theorem, and showing that the nine rules do not violate
the quarantine.
6. Proof Example: The empty dataset is consistent.
7. Proof Example: Occam's Razor is satisfied for medical
entities in the empty dataset
(Occam, William of Ockham, 1285-1349, English logician and Franciscan friar).
A 60-year-old male patient makes an appointment and visits a physician
for the first time in the past ten years. Since the patient makes
the appointment, we assume that the physician has permission (+#d1),
and obtains the patient's age and sex, i.e., +$d1, +d1.
Prostate example, Step 1. Live Proof:
d1 = Patient is a 60-year-old male, malesixty.
d2 = Perform PSA test, psapositive.
d3 = Perform prostatectomy, prostatectomyca.
e = Has prostate carcinoma, hasprca.