MODAL LOGIC THEORY
FOR PATHOLOGY INFERENCE.
DRAFT COPY ONLY.
5/1/2008.
G. William Moore, MD, PhD.
Lawrence A. Brown, MD.
Robert H. Burger, MD, MPA.
Grover M. Hutchins, MD.
Robert E. Miller, MD.
http://www.netautopsy.org/modlthry.htm

See also: http://www.netautopsy.org/ordrlogc.htm: Order Logic.
http://www.netautopsy.org/mucordfh.htm: RDF model for Mucosal Surface Tumors.
http://www.netautopsy.org/mucoarch.htm: Notes on Mucosal Surface Tumors.
http://www.netautopsy.org/zemanch2.htm: Zeman's Modal Logic, Chapter 2.
http://www.netautopsy.org/apdmchap.htm: Anatomic Pathology Data Mining.
http://www.netautopsy.org/mucoprpl.htm: Perl Theorem Prover Script.
http://www.netautopsy.org/toknswpl.htm: Perl Token Swap Script.

From the Pathology and Laboratory Medicine Service, Baltimore Veterans Affairs Maryland Health Care System; and Departments of Pathology, University of Maryland Medical System and The Johns Hopkins Medical Institutions, Baltimore, MD.
Originally Presented: Thursday, September 18, 2003, Biomedical Computing Interest Group (BCIG), U. S. National Institutes of Health, Clinical Center, 1:00 to 3:00 PM. See: http://www.altum.com/bcig/events/seminars/2003/2003_09.htm



Please address correspondence to:
G. William Moore, MD, PhD.
Chief, Quality Assurance Section, Anatomic Pathology.
Chief, Autopsy Section.
Pathology and Laboratory Medicine Service (113).
Baltimore Veterans Affairs Maryland Health Care System.
Baltimore, Maryland 21201-1524.
George.Moore4@va.gov
Last Updated: 5/1/2008, by G. William Moore, MD, PhD.

U. S. Government Work, uncopyrighted, presented at:
Becich MJ, Crowley R, course directors.
Advancing Practice, Instruction, and Innovation through Informatics. Frontiers in Oncology and Pathology. Eighth Annual Conference.
Pittsburgh, PA: University of Pittsburgh Medical Center. October 8-10, 2003. 2003;:.
http://apiii.upmc.edu
Moore GW, Brown LA, Burger RH, Hutchins GM, Miller RE.
Modal Logic Theory for Pathology Inference.
Arch Pathol Lab Med. 2004;128:.

SCREEN 1. DISCLAIMER.



United States Government Work, uncopyrighted, public-domain, DRAFT COPY ONLY. This document does not necessarily represent the views or policies of any United States Government agency. This document is provided "as is", without warranty of any kind, express or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose and non-infringement. In no event shall the authors be liable for any claim, damages or other liability, whether in an action of contract, tort or otherwise, arising from, out of, or in connection with the document or the use or other dealings made with the document.

SCREEN 2. ABSTRACT.

Pathology studies the etiology and pathogenesis of disease. Anatomic pathology is devoted to the gross anatomy and microanatomy of diseased organs, for rendering diagnoses, and for acquiring new knowledge about disease biology. A major function of the anatomic pathologist is to issue diagnostic reports on samples from diseased tissue. The aggregate collection of these reports contains a wealth of information related to almost every serious human disease.

Any data-mining program must incorporate the fundamental constraints on data acquisition in routine medical practice. It may be unnecessary, uneconomic, technically unfeasible, or unethical to fill in all possible data-items in a rectangular database. Existing clinical databases should include formal considerations for missing values, patient consent, patient risk, and provider alerts. This report proposes a basic theory of clinicopathologic inference.

This report proposes a mathematically consistent theory of clinicopathologic inference. There are two types of propositions: data, set D; and medical entities, set E. Data are binary propositions (i.e., true/false); and medical entities (or medical threats) are fuzzy propositions. Classically, a fuzzy proposition assumes truth-values, v, along the closed interval, [0,1], where v=0 is false and v=1 is true. For convenience in the present formulation, propositions assume certainty levels, $1, $2, $3, ..., where certainty level $k corresponds to fuzzy value (1 - 2^(-k)); so that certainty level $1 corresponds to fuzzy value ½; certainty level $2 corresponds to fuzzy value ¾; certainty level $3 corresponds to fuzzy value ⅞, etc.
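The mapping from certainty level to fuzzy value can be sketched in a few lines of Python. This is only an illustration of the formula 1 - 2^(-k) stated above; the function name fuzzy_value is hypothetical and is not part of the authors' Perl scripts.

```python
from fractions import Fraction


def fuzzy_value(k: int) -> Fraction:
    """Fuzzy truth-value for certainty level $k, computed as 1 - 2**(-k)."""
    if k < 1:
        raise ValueError("certainty levels start at $1")
    return 1 - Fraction(1, 2 ** k)


# Certainty levels $1..$4 as exact fractions.
for k in range(1, 5):
    print(f"${k} -> {fuzzy_value(k)}")
```

Exact fractions are used so that the values ½, ¾, ⅞ appear without rounding; the value approaches, but never reaches, 1 as k grows.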

There are three ethical operators in our model: certainty ($); value (#); and payment (!); and nine rules of ethical data collection, based upon the general principle that data should be collected whenever a medical condition sufficiently threatens the patient and the patient gives informed consent; and data should not be collected if either condition fails, i.e., there is no significantly threatening condition, or else the patient does not give informed consent.

What for? There is an emerging technology of software agents, or spiders, that crawl through the World Wide Web or other computer resources, looking for cases needing followup, and other medical anomalies. The language for constructing and organizing these software agents is RDF (Resource Description Framework). The use of these software agents should be constrained by minimal ethical considerations, consisting of fuzzy certainty, value, and payment for the relevant medical entities. The basic framework includes: increasing certainty of medical threats (Rules 1,2,3); Hippocratic principles (first do no harm; treat if indicated) (Rules 4,5); and ethical data collection (Rules 6,7,8,9). Ethical data collection is the idea that there is an ontology for medical threats; the physician may be concerned (vexed) enough to collect data on a perceived medical threat; that data, once obtained, are never lost; and the use of Sutton's Law (Zebra Rule) to guide further threat assessment.

There are nine rules of relationship in this system: (1) complementizer negation/homomorphism; (2) fuzzy asymmetry; (3) crisp data; (4) Hippocrates-first; (5) Hippocrates-reverse; (6) ontology; (7) vexation; (8) ethical data collection; and (9) Schrödinger's cat.

The theory employs modal/fuzzy/multivalued logic operators of know-whether/certainty ($), value-to-know-whether (#), and pay-to-know-whether (!). There is an atomset of distinct, atomic propositions (atoms, A), each of which has a definite true-false status. Quantitative, interval, ranked, and categorical data are interpreted as collections of true-false statements. Each atomic proposition, a, is either a datum (complaints, history, physical findings, laboratory values, statements of consent, etc.); or a medical entity (cancer, inflammation, necrosis, etc.). No datum is an entity and no entity is a datum. The negation of a datum is a datum; and the negation of an entity is an entity. To each atom, a, there exists known-to-the-k-a, denoted $ka, for every integer, k, up to a maximum, M>k; and additionally for each datum, d, there exists value-to-know-d, denoted #d; and pay-to-know-d, denoted !d. A datum, d, is Hippocratic-first (do-no-harm) if and only if (not-#d implies not-!d), i.e., don't-value-d implies don't-pay-d; and Hippocratic-reverse if and only if ((not-$d and #d) implies !d), i.e., don't-know-d and value-to-know-d implies pay-to-know-d. A medical entity e, may be ontologic (exists) and vexative (worrisome) based upon previously collected data.
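The two Hippocratic conditions defined above are ordinary propositional implications, so they can be checked by enumeration. The sketch below is an assumption-laden illustration: the Boolean arguments know_d, value_d, and pay_d stand for $d, #d, and !d at a single moment, and the function names are hypothetical.

```python
from itertools import product


def hippocratic_first(value_d: bool, pay_d: bool) -> bool:
    """Do-no-harm: not-#d implies not-!d."""
    return value_d or not pay_d


def hippocratic_reverse(know_d: bool, value_d: bool, pay_d: bool) -> bool:
    """(not-$d and #d) implies !d."""
    return not (not know_d and value_d) or pay_d


# Enumerate all eight ($d, #d, !d) combinations and keep those
# that satisfy both Hippocratic conditions.
allowed = [
    (know_d, value_d, pay_d)
    for know_d, value_d, pay_d in product([False, True], repeat=3)
    if hippocratic_first(value_d, pay_d)
    and hippocratic_reverse(know_d, value_d, pay_d)
]
for state in allowed:
    print(state)
```

Under this reading, the excluded combinations are those that pay for a datum without value, and those that leave a valued, unknown datum unpaid.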

The theory is mathematically consistent; and satisfies Occam's Razor, namely, that no entities are known without data. The Hippocratic-first, Hippocratic-reverse, ontologic, and vexative properties are consistent if data are entered consensually, consecutively, and consistently, i.e., no datum is entered after its negation has been entered. The computer algorithm for solving this system concludes within polynomial time.

This report introduces a mathematical system for managing medical concepts and data. Modal/fuzzy/multivalued logic operators expand the purview of classical symbolic logic, to accommodate technical, economic, and consent-based constraints on clinicopathologic data collection. The theory supports such medical concepts as: do-no-harm; treat-if-valuable; disease ontologies; worrisome findings; and levels of certainty.

The theory is completely general, and permits definitions of patient injury that include possible death, morbidity, inconvenience, financial constraints, or loss-of-privacy; and definitions of value-to-know that may differ among observers (patient, physician, insurer, national health policy, research protocol). Mathematical theories can serve to organize medical knowledge and patient data, and improve the scheduling and effectiveness of data collection and surveillance in large clinicopathologic data systems.


RATIONALE.



Natural science may be regarded as the pursuit of truth, based upon observations, or data. Scientific, or evidence-based medicine, involves the collection and organization of data for the relief of human suffering. While pure science seeks only the truth, medical science has two significant, additional constraints: value and payment. Value is the benefit to the patient of obtaining a particular fact or applying a particular therapy. Payment is the aggregate expense to the patient of obtaining this fact or therapy, in inconvenience, money, pain, and/or risk of morbidity and mortality.

The present model for medical analysis recognizes two classes of binary (yes/no) logic propositions: data, D, and medical entities, E. Each datum, d∈D, such as serum prostatic specific antigen above a particular upper bound, has a payment, !d, that must be justified by a corresponding value, #d, that justifies the payment for collection. A datum is either entirely certain, +$d, or entirely uncertain, -$d.

By contrast, a medical entity, e∈E, such as "prostate cancer", is a theoretical construct supported by observations, that is never entirely certain. Increasing levels of certainty for a threatening medical entity, such as cancer, might justify progressively invasive data collection. For example, a sixty-year-old man who has not had a serum prostatic specific antigen or digital rectal examination performed for over a decade might justify performing one or both of these tests (mildly invasive). A positive result on either test raises the suspicion for prostate cancer, and might justify a (more invasive) prostate biopsy. However, the suspicion for prostate cancer in an asymptomatic sixty-year-old man with no other relevant findings, does not justify an immediate prostate biopsy.

Why construct a mathematical formalism for a few, very ordinary ideas in medicine? First, because a lot of the folk-ideas of medicine (Sutton's Law, Zebra Rule, Hippocrates' Rules, Value, Payment, St. Peter's Rule, etc.) are not well-formalized. As late as the 1980s, there was no formal definition for intention (not intension) (see Searle[]). Yet, clinical medicine involves the intention of the patient, the intention of the physician, as well as that of third-party-payers, health policy makers, etc. Despite all the advances in astrophysics, cosmology, and evolutionary biology, there is still no decent definition of free will (see Wilson[], Hawking[]); yet free will (or at least the perception of free will) is a major feature of medical care and medical ethics.

Medical care records are rapidly becoming computerized, and, alas all too slowly, becoming standardized. The U. S. Veterans Affairs medical centers are a leader. At the Baltimore VA Maryland Health Care System (VAMHCS), nearly all records have been computerized since 2000, including ethics records, such as patient consent and patient competence-to-consent. Quality assurance processes within the institution depend upon these records, to assure compliance of the institution to high standards of care. Although the goal of fully-automated quality assurance processes is still elusive, we can foresee the day when formal computer systems survey large collections of records, to monitor compliance with optimal standards of care. The trouble is: computer programs, by themselves, have no judgment or ethics. We physicians need to formulate the basic principles of judgment and ethics, in order to survey electronic medical records for possible anomalies in these standards.

Why bother with mathematical consistency? So that, when a computer program surveys these records, barring programming errors, one can be certain that one doesn't have a statement that is both true and false at the same time (the definition of mathematical inconsistency). It's not enough to "try out a few examples". One must verify that the actual basis for the calculations is consistent.



MODAL LOGIC:
PROSTATE CANCER EXAMPLE.



1. Modal logic is an expanded form of classical logic, in which Aristotle's (384-322 BC) Law of Excluded Middle is conditionally/partially suspended.

2. The term refers to subjunctive mood (Latin: modus subjunctivus) in classical grammar.

3. In classical logic, a proposition, p, is either true or false. In modal logic, proposition, p, is either necessarily true, denoted □p; necessarily false, denoted □~p; or possibly true, denoted ◇p, where:
◇p = ~□~p.
□p = ~◇~p.


4. Plato (428-347 BC, Greek philosopher) and Avicenna (980-1037, Persian physician and mathematician) were early contributors.

5. Modern contributors: Jan Łukasiewicz (1878-1956, Polish logician), C. I. Lewis (1883-1964, American logician), Sadegh-zadeh, and Zadeh.

6. Deontic modal logic:
6.1. Deontic Necessity: it is mandatory to do p.

6.2. Deontic Possibility: it is permitted to do p.

6.3. Deontic Passive Negation: p is not mandatory to do.
There is also: temporal modal logic (time), doxastic modal logic (belief), ....

7. Prostate cancer example. Let p=prostate cancer. Then:
7.1. A 60-year-old man who hasn't seen a doctor for ten years: □p unless □²~p,
7.2. Serum prostate specific antigen is positive: □²p unless □³~p,
7.3. Needle biopsy of the prostate is negative: □³~p unless □⁴p, etc.
Here □²p abbreviates □□p, □³p abbreviates □□□p, and so on. At each step, uncertainty about a threatening medical condition, namely, prostate cancer, justifies gathering additional data: □p justifies drawing serum prostate specific antigen; □²p justifies performing prostate biopsy, etc.
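The stepwise pattern in the prostate cancer example, where a claim at level k stands unless the opposite claim is established at level k+1, can be written out mechanically. The sketch below only renders the notation, writing □□p for necessarily necessarily p; the function name modal and the step list are hypothetical illustrations, not the authors' code.

```python
def modal(k: int, negated: bool = False) -> str:
    """Render necessarily^k p (or necessarily^k not-p) as a string of boxes."""
    return "□" * k + ("~p" if negated else "p")


# (finding, certainty level of the current claim, claim is about not-p)
steps = [
    ("60-year-old man, no doctor visit for ten years", 1, False),
    ("serum prostate specific antigen positive", 2, False),
    ("needle biopsy of the prostate negative", 3, True),
]

for finding, k, negated in steps:
    claim = modal(k, negated)
    rebuttal = modal(k + 1, not negated)  # the opposite claim, one level up
    print(f"{finding}: {claim} unless {rebuttal}")
```

Each printed line has the form used in items 7.1 to 7.3: the current claim at level k, followed by the opposite claim at level k+1 that would overturn it.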


FREQUENTLY ASKED QUESTIONS.



Question 1. Modal logic has been around, in one form or another, for over a century (Łukasiewicz). What is so special about the present version?
Answer. The present version of modal logic attempts to explain a stepwise approach to medical diagnosis, in which every data-collection step on a patient gets one closer to diagnostic certainty. For a diagnosis, p, one may know the diagnosis as necessarily p, denoted □p, necessarily necessarily p, denoted □□p, necessarily necessarily necessarily p, denoted □□□p, etc. In this formulation, one never achieves diagnostic certainty. This formulation corresponds to the medical reality that a medical diagnosis is never certain, but rather, certain to a degree that one is ethically entitled to take another step, such as run additional tests or begin treatment. Even some autopsy diagnoses are not necessarily certain: there are autopsy blocks that are processed by newer methods (such as DNA analysis) not available at the original autopsy, which yield additional diagnoses. Example: DNA analysis of autopsy blocks in victims from the 1918-1919 worldwide influenza pandemic.

Question 2. Why is it that the masters of Modal Logic (Łukasiewicz, Lewis, Zadeh, Zeman, Snyder) missed this particular variation of modal logic?
Answer. Perhaps because the present formulation has an infinite regress of necessarilies, □□□□..., for which the early inventors of modal logic did not have a suitable philosophical analogy. Furthermore, the present formulation does not have a meaningful symmetry between possibly, ◇, and necessarily, □, which makes the present formulation philosophically unesthetic (vide infra). I imagine that the previous workers in this field either didn't stumble upon the present formulation; or if they did, did not consider it worthy of further investigation.

Question 3. What are some of the pitfalls and problems with this formulation?
Answer. Because of the homomorphism rule, there is no useful meaning in the present formulation for possibly possibly p, even though there is a meaning for necessarily necessarily p that is distinct from that of necessarily p.

SCREEN 3. TABLE OF CONTENTS.


1. Disclaimer.
2. Abstract, Rationale.
3. Table of Contents.
4. Sketch of Mathematical Model.
5. Word Model: Outline.
6. Introduction.
7. Hypothetical Autopsy Report.
8. UMLS-Encoded Hypothetical Autopsy Report.
9. Health Insurance Portability and Accountability Act.
10. Privacy and Clinicopathologic Research.
11. Autopsy Example: Sickle Cell Crisis.
12. Contingency Table: Basic Definition.
13. Contingency Table: Hypothetical Example.
14. Contingency Table: Three-Dimensional Table.
15. Contingency Table: Problems with Classical Analysis.
16. Contingency Table: Balanced Table.
17. Contingency Table: Unbalanced Table.
18. Token Swap Test: Misclassification Paradigm.
19. Contingency Table: Neyman-Pearson Condition.
20. Confidence Regions: Neyman-Pearson Condition.
21. Contingency Table: Proof of Neyman-Pearson.
22. The Argument.
23. Schrödinger's Cat.
24. Summary of Rules: Set Theory Model.
25. Method of Proof: Illustrated Table.
26-36. Method of Proof: Automated Theorem Prover.
37. Method of Proof: Live Proof, Corollary 1.
38. Method of Proof: Data, Medical Entities.
39. Method of Proof: Nine Rules of Relationship.
40. Method of Proof: Nand, Nandsets.
41. Method of Proof: Green, Yellow, Red Nandset.
42. Method of Proof: Non-Violation of Quarantine.
43. Method of Proof: Empty Data Set is Consistent.
44. Method of Proof: Occam's Razor.
45. Zermelo-Fraenkel Set Theory.
46. Zermelo-Fraenkel Set Theory Operations.
47. No Paradox of Self Reference.
48. Basic Concepts of the Theory.
49. Dicitur Homerum Caecum Esse.
50. Modal/Fuzzy/Multivalued Logic/Complementizers.
51. Modal/fuzzy logic: St Peter's Rule.
52. Sutton's Law.
53. Basic Definitions.
54. Rule 1. Complementizers: negation, homomorphism.
55. Rule 2. Knowledge-Fuzzy.
56. Rule 3. Data is crisp.
57. Rule 4. Hippocratic-first.
58. Rule 5. Hippocratic-reverse.
59. Rule 6. Ontologic.
60. Rule 7. Vexative.
61. Rule 8. Ethical-dative.
62. Rule 9. Schrödinger/Sutton Covers.
63. Theorem 1a. Consistency before Data-collection.
64. Theorem 1a. Style of Proof.
65. Theorem 1b,c. Occam's Razor.
66. Computational Complexity. NP-complete. TSP.
67. Theorem 10.
68. Token Swap Method Revisited.
69. Loose Ends.
70. Summary. 1.
71. Summary. 2.
72-81. Mathematical Appendix.
82. Perl Source Code.
83. References.

SCREEN 4. SKETCH OF MATHEMATICAL MODEL.




0. Seven general theorems are stated and proved in this mathematical model, along with associated lemmas and corollaries. There is a live proof program in the manuscript, in which simple examples and theorems may be tested. The reader is invited to try out his/her own examples. The live proof program has been tested on 200 theorems from Zeman's Modal Logic. See Appendix H:
http://www.netautopsy.org/mucoarch.htm
1. There is a relationship between modal logic (necessarily, possibly) and fuzzy set theory, such that greater fuzzy membership implies higher levels of modal-certainty.

2. Ethical data collection (Rule 8) leads to consistent entity inferences.

3. An empty system is consistent, and implies no entities; for stepwise data collection, fewer data imply fewer entities (Occam's Razor, William of Ockham, 1285-1349, English logician and Franciscan friar, Latinized: Occam).

4. In-between theorem. Analogous to between in Euclidean geometry. If you have sufficient data to imply necessarily^k entity, then you have sufficient data to imply necessarily^(k-1) entity.

5. Resource Description Framework (RDF): general syntax for writing computer-parsable ordered triples, that export meaning among databases on the semantic worldwide web, by binding a described datum to a specified subject. Internet web-crawler programs can interrogate multiple RDF documents, and draw inferences from these ordered triples. RDF-classes: Strict monoparental hierarchy; An RDF-class hierarchy is mathematically consistent.

6. RDF Theorems:
Theorem §6.1. Consistency of RDF classes.
Theorem §6.2. Identity. Class p implies p.
Theorem §6.3. Or-expansion. If p implies q, then p implies q or q or q or q....
Theorem §6.4. Telescoping.
Theorem §6.5. Contextualization.
Theorem §6.6. Intercalation.
Theorem §6.7. Retirement.

7. Token Cube / Neyman-Pearson Condition (Jerzy Neyman, 1894-1981, Polish-American statistician; Egon S. Pearson, 1895-1980, British statistician). Extension of classical contingency table analysis, which compensates for metaknowledge in a contingency table; and deals with division by zero in the chi-square (χ²) test of contingency table analysis. The essential argument of the Neyman-Pearson Condition is that greater power (=(1-β)) forces greater Type I Error (=α).
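Two of the simpler RDF theorems listed in item 6, Identity (§6.2) and Or-expansion (§6.3), can be checked by brute-force truth table if class membership is read as a plain true/false proposition. That propositional reading is an assumption of this sketch, and the helper names implies and is_tautology are hypothetical; this is not the Perl theorem prover referenced in the manuscript.

```python
from itertools import product


def implies(a: bool, b: bool) -> bool:
    """Material implication: a implies b."""
    return (not a) or b


def is_tautology(formula, n_vars: int) -> bool:
    """True if formula holds under every true/false assignment."""
    return all(formula(*vals) for vals in product([False, True], repeat=n_vars))


# Theorem 6.2, Identity: p implies p.
identity = lambda p: implies(p, p)

# Theorem 6.3, Or-expansion (one expansion step):
# if p implies q, then p implies (q or q).
or_expansion = lambda p, q: implies(implies(p, q), implies(p, q or q))

# For contrast, a formula that is not a theorem: p implies q.
not_a_theorem = lambda p, q: implies(p, q)

print(is_tautology(identity, 1))
print(is_tautology(or_expansion, 2))
print(is_tautology(not_a_theorem, 2))
```

Exhaustive enumeration is exponential in the number of variables, so it is suitable only for small examples such as these.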

SCREEN 5. WORD MODEL: OUTLINE.




1. Rule 1. Complementizers: Absorb negation; homomorphic in logical-and. A complementizer is a grammatical element, such as "that", "whether", "which", "who", "where", "when", "how", ..., in a sentence, that connects an independent (main) clause to a dependent clause. For example:
"it is said that Homer was blind"
where "it is said" is the main clause; "Homer was blind" is the dependent clause (Homer, 8th century BC, Greek poet); and "that" is the complementizer. In this sentence, the complementizer, "that", is negation-sensitive, that is, "it is said that Homer was blind" is not the same as "it is said that Homer was not blind". By contrast, the complementizer, "whether", is negation-insensitive, that is, "it is said whether Homer was blind" is the same as "it is said whether Homer was not blind".

The present mathematical model has three negation-insensitive complementizers, namely:
$: it is certain/known whether
#: it is of value to know whether
!: payment to know whether
These complementizers, $, #, !, absorb negation. That is, for propositions p, q:
$p = $+p = $-p; #p = #+p = #-p; and !p = !+p = !-p.
These complementizers are homomorphic in logical-and. That is, for propositions p, q:
$(p&q)=$p&$q; #(p&q)=#p&#q; and !(p&q)=!p&!q.
2. Rule 2. Fuzzy Asymmetry. Fuzzy set theory (Zadeh, 1965) is a generalization of classical/crisp set theory, that represents different levels of certainty for the same concept. Element p has partial membership in set P, denoted vP, where v assumes any value along closed interval, v ∈ [0,1]. Fuzzy is not probability. Despite its quirky name, fuzzy is serious mathematics. Fuzzy set theory has an asymmetry property: If vP, and v>w, then wP. Classical set theory is the special case of fuzzy set theory, in which either v=0 or v=1.

3. Rule 3. Crisp Data. In our mathematical model, data are crisp/classical and entities are fuzzy.

4. Rule 4. Hippocrates-first. Hippocrates (460-370 BC, Greek physician, father of medicine) is famous for the medical dictum: first do no harm, often given in the form of Galen's (129-200, Greco-Roman physician) Latin translation: primum nón nocére.

5. Rule 5. Hippocrates-reverse. A converse doctrine, also formulated by Hippocrates, that one must offer treatment to the patient if one is available: treat if you can.

6. Rule 6. Ontology (Platonic description of essential reality (Smith, 1996); Plato, 424-348 BC, Greek philosopher; Greek: οντως = ontós = real, actual; λογος = logos = word, study); is a description of the core beliefs for a field of study, in this case, ethical clinical medicine. The central idea in our model is that a collection of data, Δ, implies an entity, e, at a certainty level k, commensurate with the extent and quality of data given.

7. Rule 7. Vexation (Latin: vexari: to worry) corresponds to the worry list that every physician carries around in his/her mind, regarding patients requiring additional tests, therapy, or followup. In our mathematical model, entity e at certainty level k implies value-to-know the additional datum, d.

8. Rule 8. Ethical Data Collection. In our mathematical model, a datum, d, is collected ethically if and only if:
1. the datum is never collected;
2. payment is made and the datum is true (+d and +$d and +!d);
3. payment is made and the datum is false (-d and +$d and +!d); or
4. payment is made and the attempt fails (+!d only).
Each step at which payment is made (+!d), must be justified by value, (+#d), in the previous step.

9. Rule 9. Schrödinger's cat (Erwin Schrödinger, 1887-1961, Nobel Prize Physics, 1933) is a disappearing cat in a box. According to quantum mechanical theory, a probabilistic event, such as a radioactive decay, doesn't have a consequence (i.e., the cat neither lives nor dies) until the event is observed. In our model, an entity is not certain (Rule 6, Ontology) at a particular certainty-level until all higher certainty levels are (provisionally) excluded. In contrast to Schrödinger's cat, which involves a single physical event in which the cat lives or dies, in our model, there is a stepwise process of data collection, and corresponding cat's box openings or Schrödinger openings at each step, where the cat may die and then come back to life in subsequent data collection steps. Also known as: Sutton's Law (Willie Sutton, 1901-1980, American Bank Robber, "Slick Willie"); Zebra Rule; Black Swan; Albino crow; etc.

10. Rule 10. Neyman-Pearson Condition (Jerzy Neyman, 1894-1981, Polish-American statistician; Egon S. Pearson, 1895-1980, British statistician). The Neyman-Pearson Condition is the condition that when performing a hypothesis test between two point hypotheses H0: θ=θ0 and H1: θ=θ1, then the likelihood-ratio test that rejects H0 in favor of H1 when
Λ(x) = (L(θ0|x) / L(θ1|x)) < η, where P(Λ(X)<η|H0)=α,
is the most powerful test of size α for a threshold η, where (L(θ0|x) / L(θ1|x)) is the likelihood ratio (or more generally, any statistical test inequality comparison); η designates the so-called critical region for the test; and α is the significance level for Type I (false positive) Error. The essential argument of the Neyman-Pearson Condition is that greater power (=(1-β)) forces greater Type I Error (=α).
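Rule 8 (Ethical Data Collection), listed above, enumerates four admissible outcomes for a datum and requires that any payment be justified by value in the previous step. The following sketch encodes one collection step as a small record and checks it against those four cases. The record layout (Step, valued_before, paid, known, value) is hypothetical and reflects one reading of the rule, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Step:
    valued_before: bool    # +#d held in the previous step
    paid: bool             # +!d: payment is made at this step
    known: bool            # +$d: the datum became known
    value: Optional[bool]  # +d (True) or -d (False) when known, else None


def is_ethical(step: Step) -> bool:
    """Check one collection step against the four cases of Rule 8."""
    if not step.paid:
        # Case 1: never collected, so nothing may be known.
        return not step.known and step.value is None
    if not step.valued_before:
        # Payment must be justified by value in the previous step.
        return False
    if step.known:
        # Cases 2 and 3: payment made, datum found true or false.
        return step.value is not None
    # Case 4: payment made and the attempt fails (+!d only).
    return step.value is None


print(is_ethical(Step(valued_before=True, paid=True, known=True, value=True)))
print(is_ethical(Step(valued_before=False, paid=True, known=True, value=True)))
```

In this reading, a datum that is known without any payment, or paid for without prior value, is flagged as unethically collected.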

SCREEN 6. INTRODUCTION.


1. Data-mining in Anatomic Pathology: use of public data for drawing medical conclusions (Moore et al, 2001).

2. Constraints: patient privacy, missing values.

3. Data-mining program for pathology: incorporate ethical/technical constraints of routine medical practice.

4. At a fully-computerized medical institution, such as the Baltimore VA Maryland Health Care System, pathology data are used for quality assurance of clinical services.

5. Completing a rectangular database: may be unnecessary, uneconomic, technically unfeasible, or unethical, to collect all possible data for all possible data-cells in the table.

6. Formal considerations for missing values, patient consent, patient risk, and provider alerts.

7. Set theory definitions of atoms, data, and medical entities [7,8,9,10].

8. Fuzzy/multivalued concepts: (Zadeh, 1965).

9. Modal/fuzzy complementizers: know-whether ($), value-to-know-whether (#), and pay-to-know-whether (!) [Moore et al, 1980].
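The know-whether complementizer ($) in item 9 is required to absorb negation and to be homomorphic in logical-and. One deliberately simple model with both properties, offered here as an assumption rather than as the authors' semantics, treats a formula as known-whether exactly when every atom it mentions has a known status. Formulas are nested tuples, and all names below are hypothetical.

```python
from itertools import combinations


def atoms(f):
    """Set of atom names occurring in formula f."""
    if f[0] == "atom":
        return {f[1]}
    return set().union(*(atoms(sub) for sub in f[1:]))


def know_whether(f, known_atoms):
    """$f in this toy model: every atom of f has a known true/false status."""
    return atoms(f) <= known_atoms


p, q = ("atom", "p"), ("atom", "q")

# Every possible knowledge state over the atoms p and q.
all_states = [set(c) for r in range(3) for c in combinations(["p", "q"], r)]

for state in all_states:
    # Absorbs negation: $p = $-p.
    assert know_whether(("not", p), state) == know_whether(p, state)
    # Homomorphic in logical-and: $(p&q) = $p & $q.
    assert know_whether(("and", p, q), state) == (
        know_whether(p, state) and know_whether(q, state)
    )

print(f"both properties hold in all {len(all_states)} knowledge states")
```

The same check applies unchanged to value-to-know-whether (#) and pay-to-know-whether (!) if each is modeled by its own set of atoms.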

SCREEN 7. HYPOTHETICAL AUTOPSY REPORT.


Male. Caucasian. 1.91 m. 95.5 kg.
b. 8/27/1908. d. 1/22/1973.
Occupation: U.S. Congressman, U.S. Senator, U.S. President.
Status post: Appendectomy.
Status post: Cholecystectomy.
History of: Renal Calculi.
Myocardial Infarct, 1955.
Myocardial Infarct, April, 1972.
Myocardial Infarct, January 22, 1973.
Marked Generalized Atherosclerosis.

Who is this person? [DeGregorio, 1997]

SCREEN 8. UMLS-ENCODED
HYPOTHETICAL AUTOPSY REPORT



U. S. National Library of Medicine Unified Medical Language System: (USNLM, 2004).

Male. Caucasian. 1.91 m. 95.5 kg. {C0024554}.
b. 8/27/1908. d. 1/22/1973. {C0021132}.
Occupation: U.S. Congressman, U.S. Senator, U.S. President. {C0032382}.
Status post: Appendectomy. {C0003611}.
Status post: Cholecystectomy. {C0008320}.
History of: Renal Calculi. {C0022650}.
Myocardial Infarct, 1955. {C0027051}.
Myocardial Infarct, April, 1972. {C0027051}.
Myocardial Infarct, January 22, 1973. {C0027051}.
Marked Generalized Atherosclerosis. {C0205082,C0205046,C0205246}.

Privacy: Does the patient have a positive syphilis test?

Summary/Set theory definition: {{C0024554}, {C0021132}, {C0032382}, {C0003611}, {C0008320}, {C0022650}, {C0027051}, {C0027051}, {C0027051}, {C0205082,C0205046,C0205246}}.

SCREEN 9. HIPAA:
HEALTH INSURANCE
PORTABILITY AND ACCOUNTABILITY ACT.




1. U. S. Health Insurance Portability and Accountability Act. 1996. (HIPAA, Kennedy-Kassebaum Bill, H.R. 3103 of 104th U. S. Congress).

2. Regulates all individually identifiable medical records in the USA.

3. Final Rule in force since April 14, 2003.

4. Huge fines for non-compliance: $25,000 for each record disclosed unintentionally; more for intentional disclosures or disclosures involving commercial gain.

5. Some research studies involving statistics require individual data.

6. For public research databases, no patient medical record may be individually identifiable.

7. U. S. Code of Federal Regulations. 1995.

SCREEN 10. PRIVACY AND
CLINICOPATHOLOGIC RESEARCH. (Moore et al, 2001)




1. Some research studies involving statistics require individual patient data.

2. Published, grouped data may not contain all the detail necessary to evaluate the statistical analysis methods. Therefore, it would be valuable if individual data were published on the internet, so that the statistical analysis methods could be verified by the public at large.

3. Strong Privacy: The patient him/herself cannot identify his/her own medical record. Therefore, there must be at least c exact duplicates in the published record, where c is the conspiracy threshold. That is, a conspiracy of c patients could get together and demonstrate that their records, as a group, have been exposed/published on the internet.

4. Weak Privacy: The public part of a patient's record cannot be uniquely identified. Therefore, there must be c exact duplicates in the public variables of the published record.

5. Dangers of Weak Privacy: embarrassment to the patient, even if logically unfounded; sense by the patient that his/her records are public, even if they are not; if one private part is accidentally disclosed, then the remainder of the record is exposed. (See: "syphilis" example, Screen 8.)

6. Detail must be blurred just enough so that one patient can be mistaken for c other patients.

7. It is a bad idea statistically, as well as fraudulent and confusing, to create additional, phantom patients. I'm not sure that we currently have the statistical apparatus to manage even controlled, intentional fraud. (But see: Berman (2007)).
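The weak-privacy condition in item 4 can be tested mechanically by counting how often each combination of public values occurs. The sketch below assumes records are dictionaries and that the threshold means "at least c records share the same public values, counting the record itself"; whether the intended threshold is c or c+1 is a reading choice. The field names and sample records are invented for illustration only.

```python
from collections import Counter


def satisfies_weak_privacy(records, public_fields, c):
    """True if every combination of public values occurs in at least c records."""
    counts = Counter(tuple(rec[f] for f in public_fields) for rec in records)
    return all(n >= c for n in counts.values())


# Hypothetical records; dates are already blurred to a decade of birth.
records = [
    {"sex": "M", "birth_decade": 1900, "syphilis_test": "negative"},
    {"sex": "M", "birth_decade": 1900, "syphilis_test": "positive"},
    {"sex": "F", "birth_decade": 1910, "syphilis_test": "negative"},
]
public = ["sex", "birth_decade"]

print(satisfies_weak_privacy(records[:2], public, c=2))
print(satisfies_weak_privacy(records, public, c=2))
```

The first call passes because both records share the same public part; the second fails because the third record's public part is unique, so further blurring would be needed before publication.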

SCREEN 11. AUTOPSY EXAMPLE:
SICKLE CELL CRISIS.




1. Pain crisis in sickle cell disease is an episode of poorly-localized abdominal pain, that requires major pain medications for relief. There are no characteristic morphologic features corresponding to pain crisis in sickle cell disease.

2. Can pain crisis in sickle cell disease be recognized statistically at autopsy? Is it a cause of death?

3. Parfrey NA, Moore GW, Hutchins GM.
Is pain crisis a cause of death in sickle cell disease?
Am J Clin Pathol. 1985 Aug;84(2):209-212.

4. 71 autopsied cases of sickle cell disease in the autopsy files of The Johns Hopkins Medical Institutions with adequate clinical histories. 9/20 (45%) patients died in pain, death unexplained at autopsy; 4/51(8%) patients died without pain, death unexplained at autopsy.

5. Is there a significant correlation between unexplained death and pain crisis?



Φ, Unexplained: →
Ψ, Pain Crisis: ↓

          No    Yes   TOTAL
Yes       11      9      20
No        47      4      51
TOTAL     58     13      71




6. No-explanation-at-autopsy is the gold-standard, Φ; and pain-crisis is the new hypothesis, Ψ being investigated.

7. Try out your own values.

SCREEN 12. CONTINGENCY TABLE ANALYSIS:
BASIC DEFINITION.





1. Contingency table analysis (Screen 11, above) is a powerful method for comparing frequency data in patients with two different data-sources, Φ and Ψ (Pearson, 1904; Upton and Cook, 2006) (Karl Pearson, 1857-1936, British statistician).

2. The simplest contingency table is a rectangular table of binary (false/true) observations on patients, with two rows, two columns, and 2×2=4 cells. Columns correspond to an existing biomedical test, Φ (death explained at autopsy); and rows correspond to a newer test, Ψ (pain crisis), as follows:

                        _____________
                 True:  |  c  |  d  |
             Ψ          |_____|_____|
                 False: |  a  |  b  |
                        |_____|_____|
                         False True
                              Φ


Classical (Two-dimensional) Contingency Table. Fig. 3490.



3. In this contingency table, cell a represents the set of patients where both test Φ and test Ψ are false (true negatives, TN); cell b represents the set of patients where test Φ is true and test Ψ is false (false negatives, FN); cell c represents the set of patients where test Φ is false and test Ψ is true (false positives, FP); and cell d represents the set of patients where both test Φ and test Ψ are true (true positives, TP).

                        _____________
                 True:  |  FP |  TP |
             Ψ          |_____|_____|
                 False: |  TN |  FN |
                        |_____|_____|
                         False True
                              Φ
That is, the lower-left and upper-right cells form the true diagonal of this table; and the upper-left and lower-right cells form the error diagonal.

4. We may calculate marginal totals, w, v, x, y; and a grand total, z, for this table, where v=a+b, w=c+d, x=a+c, y=b+d, and z=v+w=x+y=a+b+c+d.

                        _____________
                 True:  |  c  |  d  |  w
             Ψ          |_____|_____|
                 False: |  a  |  b  |  v
                        |_____|_____|  
                           x     y     z
                         False True
                              Φ
5. In classical statistics, test Φ compared to test Ψ is evaluated by the chi-square test, χ², or by the Fisher exact test (Ronald A. Fisher, 1890-1962, British statistician), based upon the chi-square (squared-normal) or hypergeometric distributions, respectively. In the null hypothesis (the statistical straw man), it is assumed that tests Φ and Ψ are statistically independent.
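
As an illustration of the two classical tests, here is a minimal Python sketch using the cell names a, b, c, d and the marginal totals v, w, x, y, z defined above. The function names are assumptions made for this example; the chi-square statistic is the uncorrected Pearson form, and the Fisher p-value is one-sided (cell d at least as large as observed). The numbers are the sickle cell counts from Screen 11.

```python
from math import comb

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction)."""
    v, w = a + b, c + d          # row totals: Ψ false, Ψ true
    x, y = a + c, b + d          # column totals: Φ false, Φ true
    z = v + w                    # grand total
    return z * (a * d - b * c) ** 2 / (v * w * x * y)

def fisher_exact_one_sided(a, b, c, d):
    """P(cell d >= observed d), with all marginal totals held fixed.

    Uses the hypergeometric distribution: of z patients, w are Ψ-true,
    and y of the z are Φ-true.
    """
    w, y, z = c + d, b + d, a + b + c + d
    tail = sum(comb(w, k) * comb(z - w, y - k)
               for k in range(d, min(w, y) + 1))
    return tail / comb(z, y)

# Screen 11 data: a=47 (no pain, explained), b=4, c=11, d=9.
print(chi_square_2x2(47, 4, 11, 9))          # about 13.26
print(fisher_exact_one_sided(47, 4, 11, 9))  # a small tail probability
```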

SCREEN 13. CONTINGENCY TABLE ANALYSIS:
HYPOTHETICAL EXAMPLE.





1. In classical contingency table analysis, there is a 2×2 rectangular table, in which test Φ (columns) represents the definitive but costly test for a medical entity (e.g., prostate biopsy); and test Ψ (rows) represents a newer, less costly, less painful test for the same medical entity (e.g., serum prostate specific antigen). Suppose that we have data for both these tests on 10,000 patients, and the contingency table is as follows:

                        __________________
                 True:  | 200=c  |  90=d |    290
             Ψ          |________|_______|
                 False: | 9700=a |  10=b |   9710
                        |________|_______|
                          9900     100    10,000
                            False True
                                 Φ
2. Suppose further that we have adjusted the new test, Ψ, such that we are willing to accept a 200:10 = 20:1 ratio of false_positives:false_negatives, as shown. That is, a false-negative is much more dangerous to the patient than a false-positive, since a false-negative means that the patient is not followed-up until the next regular screening interval; whereas a false-positive only requires the more expensive test, Φ, but at least doesn't lose the patient to follow-up.

3. Suppose that we are already convinced that tests Φ and Ψ are highly correlated (i.e., not independent), so that the classical χ2 and Fisher exact tests (Ronald A. Fisher, 1890-1962, British statistician) are not useful at this point.

4. Finally, we know that the medical entity, prostate cancer, affects much less than half the population sampled, so that (a+c)>(b+d) and c>b. Whence we may conclude that the cell totals satisfy: a>c>d>b. (Proof:......).

5. Furthermore, if we know that the actual frequency of the disease in the general population is <190 (here, 100/10,000), then we would set c/a>1% (Proof:......).

6. In the token swap test, we set the null hypothesis at b=0. Then the null hypothesis becomes:

                        _________________
                 True:  |  c-b  |  d+b  |    w
             Ψ          |_______|_______|
                 False: |  a+b  |   0   |    v
                        |_______|_______|
                             x       y       z
                            False True
                                 Φ
7. None of the null hypothesis cell totals are negative (Proof: because c>b). The marginal totals are preserved, and in particular, the ratio of Φ-positives to Ψ-positives is preserved. The token swap algorithm then addresses the question whether b is unacceptably large, based upon its distance from zero.

Null hypothesis:

                        _________________
                 True:  |  190  |  100  |    290
             Ψ          |_______|_______|
                 False: | 9710  |   0   |   9710
                        |_______|_______|
                          9900     100    10,000
                            False True
                                 Φ
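
The construction of the null-hypothesis table in items 6 and 7 can be sketched in a few lines of Python. The function name token_swap_null is an assumption made for this example; the rule itself (set b to 0 and move b tokens so that every marginal total is unchanged) is the one stated above.

```python
def token_swap_null(a, b, c, d):
    """Return the null-hypothesis cells (a+b, 0, c-b, d+b).

    Cell b is emptied, and b tokens are shifted among the other cells
    so that the row and column totals stay the same.
    """
    if c < b:
        raise ValueError("requires c >= b so that no cell becomes negative")
    return a + b, 0, c - b, d + b

a, b, c, d = 9700, 10, 200, 90
a0, b0, c0, d0 = token_swap_null(a, b, c, d)
print(a0, b0, c0, d0)                       # 9710 0 190 100
print(a0 + b0, c0 + d0, a0 + c0, b0 + d0)   # margins: 9710 290 9900 100
```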

SCREEN 14. CONTINGENCY TABLE ANALYSIS:
THREE-DIMENSIONAL TABLE (CONTINGENCY CUBE).





1. Many scenarios in medicine are more complex than established test Φ versus new test Ψ, in determining the presence of medical entity e. Some patients are in higher risk groups than other patients, and one is more suspicious of a false negative or false positive, based upon this ancillary, risk-biased information.

2. Therefore, we propose a third logical variable, test Ω, as a gold standard that encapsulates everything that we know about each patient. The apparatus for managing this heterogeneous test Ω information is given by the medical model below.

3. Suppose that we have a three-dimensional contingency cube, where test Φ is the horizontal axis, test Ψ is the vertical axis, and test Ω is the depth axis:

Three-dimensional Contingency Table (Figure 3480).



4. There are eight cells (subcubes) in a contingency cube: a, b, c, d, e, f, g, h: with cells a, b, c, d in the Ω-front plane, as before; and corresponding cells e, f, g, h, respectively, in the Ω-back plane.

    Cell    Φ   Ψ   Ω    Diagonal:
       a    F   F   F    True.
       b    T   F   F    Favor Ψ.
       c    F   T   F    Favor Φ.
       d    T   T   F    Error.
       e    F   F   T    Error.
       f    T   F   T    Favor Φ.
       g    F   T   T    Favor Ψ.
       h    T   T   T    True.
5. There are four diagonals. In the true diagonal, ah, all three tests, Φ, Ψ, and Ω, agree, i.e., all three tests are either all false (cell a) or all true (cell h). In the error diagonal, de, both test Φ and test Ψ disagree with the gold standard, test Ω. In addition, there is a favor Φ diagonal, cf, in which test Φ agrees with the gold standard, test Ω, but test Ψ disagrees; and a favor Ψ diagonal, bg, in which test Ψ agrees with the gold standard, test Ω, but test Φ disagrees.

TOKEN SWAP CUBE : PLANAR PROJECTIONS.



Three-dimensional swap from Ψ to Φ: b → c and g → f.

Three-dimensional swap from Φ to Ψ: c → b and f → g.

Collapse/project the cube into three margin-neutral token squares:

                        _________________
                 True:  |  c+g  |  d+h  |
             Ψ          |_______|_______|
                 False: |  a+e  |  b+f  |
                        |_______|_______|
                         False   True
                              Φ

                        _________________
                 True:  |  e+g  |  f+h  |
             Ω          |_______|_______|
                 False: |  a+c  |  b+d  |
                        |_______|_______|
                         False   True
                              Φ

                        _________________
                 True:  |  e+f  |  g+h  |
             Ω          |_______|_______|
                 False: |  a+b  |  c+d  |
                        |_______|_______|
                         False   True
                              Ψ
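
The three planar projections above can be reproduced by summing the cube over one axis at a time. The following Python sketch is illustrative only: the names CELLS and project, and the made-up cell counts, are assumptions; the cell-to-truth-value assignment is taken from the table in item 4.

```python
F, T = False, True

# Cell names and their (Φ, Ψ, Ω) truth values, as tabulated in item 4.
CELLS = {
    "a": (F, F, F), "b": (T, F, F), "c": (F, T, F), "d": (T, T, F),
    "e": (F, F, T), "f": (T, F, T), "g": (F, T, T), "h": (T, T, T),
}

def project(counts, i, j):
    """Collapse the cube onto axes i and j (0=Φ, 1=Ψ, 2=Ω) by summing
    the cell counts over the remaining axis."""
    square = {}
    for name, key in CELLS.items():
        k = (key[i], key[j])
        square[k] = square.get(k, 0) + counts[name]
    return square

counts = dict(a=1, b=2, c=3, d=4, e=5, f=6, g=7, h=8)  # made-up counts
phi_psi = project(counts, 0, 1)     # keys are (Φ, Ψ)
phi_omega = project(counts, 0, 2)   # keys are (Φ, Ω)
psi_omega = project(counts, 1, 2)   # keys are (Ψ, Ω)
print(phi_psi[(F, T)], phi_psi[(T, T)])      # c+g = 10, d+h = 12
print(phi_omega[(F, T)], phi_omega[(T, T)])  # e+g = 12, f+h = 14
print(psi_omega[(F, T)], psi_omega[(T, T)])  # e+f = 11, g+h = 15
```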

SCREEN 15. CONTINGENCY TABLE ANALYSIS:
PROBLEMS WITH CLASSICAL TESTS.





Classical contingency table analysis has several problems in biomedical applications:
1. Classical contingency table analysis assumes statistical independence between methods Φ and Ψ.

2. Expected values for cell totals must be non-zero and not close to zero.

3. There is no way to include/accommodate ancillary information, that might be known about patients in the study.

4. There is no distinction between knowable errors, based upon ancillary information; and unknowable errors.

5. Every classification has an irreducible number of unknowable errors.

6. Classical statistics has no accommodation for missing values.

SCREEN 16. CONTINGENCY TABLE ANALYSIS:
BALANCED TABLE.





1. A contingency table is a rectangular table, with two rows and two columns [95,96,97,98].

2. Columns represent an existing gold standard, g; and rows represent a hypothesis, h.

3.
         Φ→
  Ψ↓      -      +    Total
   +      c      d      w
   -      a      b      v
  Total   x      y      z


4. In the above example, the explanation-at-autopsy is the gold-standard = Φ; and pain-crisis is the hypothesis = Ψ being investigated.

5. In a simple example, consider a BALANCED 2×2CT in which there are 100 patients, all told, of which 90 patients are gold standard negative, Φ- and 10 patients are gold standard positive, Φ+. Further, suppose that 50 patients are hypothesis negative, Ψ- and 50 patients are hypothesis positive, Ψ+, as follows:
BALANCED

Φ→
Ψ↓
          Φ-     Φ+   Total
Ψ+        45      5      50
Ψ-        45      5      50
TOTAL     90     10     100


6. In this example, gold-standard ± is uncorrelated to hypothesis ±. The individual data cells in the table contain tokens, that represent individual patients, characterized by nothing more than their Φ±Ψ± status. In the example, the observed cell totals are: Φ-Ψ- = 45 tokens; Φ-Ψ+ = 45 tokens; Φ+Ψ- = 5 tokens; Φ+Ψ+ = 5 tokens. The marginal totals are: Φ- = 90; Φ+ = 10; Ψ- = 50; Ψ+ = 50. The grand total, z, is 100.

7. The BALANCED/EXPECTED CELL TOTALS are obtained as cross-products of the marginal totals, as follows:
Expected Φ-Ψ- = (Φ-×Ψ-)/z = 90×50/100 = 45;
Expected Φ-Ψ+ = (Φ-×Ψ+)/z = 90×50/100 = 45;
Expected Φ+Ψ- = (Φ+×Ψ-)/z = 10×50/100 = 5;
Expected Φ+Ψ+ = (Φ+×Ψ+)/z = 10×50/100 = 5.
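
The cross-product rule can be written as a short Python sketch. The function name expected_cells and the list-of-lists layout (rows Ψ, columns Φ) are assumptions made for this example.

```python
def expected_cells(row_totals, col_totals):
    """Expected cell totals under independence: row total × column total / z."""
    z = sum(row_totals)
    if z != sum(col_totals):
        raise ValueError("row and column totals must share one grand total")
    return [[r * c / z for c in col_totals] for r in row_totals]

# Balanced example: rows Ψ+ = 50, Ψ- = 50; columns Φ- = 90, Φ+ = 10.
print(expected_cells([50, 50], [90, 10]))   # [[45.0, 5.0], [45.0, 5.0]]

# Sickle cell margins from Screen 11: rows 20, 51; columns 58, 13.
sickle = expected_cells([20, 51], [58, 13])
print([[round(v) for v in row] for row in sickle])   # [[16, 4], [42, 9]]
```

Rounded to whole tokens, the second result agrees with the EXPECTED table used in Screen 18.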


8. Classical statistical analyses of a (2×2CT) are afforded by the CHISQUARE TEST (CST) and FISHER EXACT TEST (FXT), based upon statistical sampling assumptions (Ronald A. Fisher, 1890-1962, British statistician).

9. The TOKEN SWAP TEST (TST) is a statistical-type significance test, that measures the likelihood of MISCLASSIFICATIONS in a 2×2CT.

SCREEN 17. CONTINGENCY TABLE ANALYSIS:
UNBALANCED TABLE.






1. Now consider an UNBALANCED 2×2CT, with the SAME MARGINAL TOTALS as above.
The least-unbalanced example has only a single token misclassified:

UNBALANCED: BALANCED+1

          Φ-     Φ+   TOTAL
Ψ+        44      6      50
Ψ-        46      4      50
TOTAL     90     10     100

The second-least-unbalanced example has two tokens misclassified:

UNBALANCED: BALANCED+2

          Φ-     Φ+   TOTAL
Ψ+        43      7      50
Ψ-        47      3      50
TOTAL     90     10     100

... and so forth.

2. How unbalanced can the observed data-cells be, before we suspect that there is a genuine relationship between the gold-standard g, and the hypothesis, h? That is, how unbalanced can the observed data-cells be, before one rejects the null hypothesis?

3. The CHISQUARE TEST (CST) and FISHER EXACT TEST (FXT) are based upon statistical sampling assumptions (Ronald A. Fisher, 1890-1962, British statistician).

4. The TOKEN SWAP TEST does not depend upon the usual statistical assumptions of repeated, random sampling from a source population.

SCREEN 18. TOKEN SWAP TEST: MISCLASSIFICATION PARADIGM.



1. TOKEN SWAP SIGNIFICANCE EXAMPLE. In the following example, it requires five TOKEN SWAPS to transform the expected into the observed contingency table:
EXPECTED

          NO    YES   TOTAL
YES       16      4      20
NO        42      9      51
TOTAL     58     13      71
⇒⇒⇒
EXPECTED+1

          NO    YES   TOTAL
YES       15      5      20
NO        43      8      51
TOTAL     58     13      71
⇒⇒⇒
EXPECTED+2

          NO    YES   TOTAL
YES       14      6      20
NO        44      7      51
TOTAL     58     13      71
⇒⇒⇒
EXPECTED+3

          NO    YES   TOTAL
YES       13      7      20
NO        45      6      51
TOTAL     58     13      71
⇒⇒⇒
EXPECTED+4

          NO    YES   TOTAL
YES       12      8      20
NO        46      5      51
TOTAL     58     13      71
⇒⇒⇒
EXPECTED+5
=OBSERVED

          NO    YES   TOTAL
YES       11      9      20
NO        47      4      51
TOTAL     58     13      71


2. In the zeroth token-swap, the chances that the EXPECTED-to-EXPECTED+1 swaps could have taken place AT RANDOM are:
         (9×16)
   _________________________
   (9×16)+(4×42)
that is, the number of possible EXPECTED-to-EXPECTED+1 swaps, divided by (the number of possible EXPECTED-to-EXPECTED+1 swaps plus the number of possible EXPECTED-to-EXPECTED-1 swaps), without altering the marginal totals.

3. In the zeroth token-swap, the chances that the EXPECTED-to-EXPECTED-1 swaps could have taken place AT RANDOM are:
         (4×42)
   _________________________
   (9×16)+(4×42)


4. In the first right token-swap, the chances that the EXPECTED+1-to-EXPECTED+2 swaps could have taken place AT RANDOM are:
         (8×15)
   _________________________
   (8×15)+(5×43)


5. In the first left token-swap, the chances that the EXPECTED+1-to-EXPECTED swaps could have taken place AT RANDOM are:
         (5×43)
   _________________________
   (8×15)+(5×43)
and so forth.

6. When the EXPECTED has swapped up to the OBSERVED table, without altering the marginal totals, and the proportion of such swaps is less than 5%, then the result is significant.

7. If the result is not significant, then we say that the observed 2×2CT is NOT SO DIFFERENT from the expected 2×2CT, that occasional misclassifications by a medical observer could account for the differences.
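
The swap probabilities in items 2 through 5 follow one pattern, which can be sketched in Python. This is an illustration under stated assumptions, not the authors' Perl script: the function names are hypothetical, the cells use the a, b, c, d layout of Screen 12 (here a=42, b=9, c=16, d=4 for the EXPECTED table), and the text does not spell out how the proportions are combined into the 5% criterion of item 6, so the sketch stops at the distribution of cell d after k random swaps.

```python
from fractions import Fraction

def step_probabilities(a, b, c, d):
    """Return (p_up, p_down) for one random margin-preserving swap.

    'up' raises cell d by one, using one token from b and one from c;
    'down' lowers cell d by one, using one token from a and one from d.
    """
    up, down = b * c, a * d
    total = up + down
    if total == 0:
        return Fraction(0), Fraction(0)
    return Fraction(up, total), Fraction(down, total)

def distribution_after(a, b, c, d, k):
    """Distribution of the value of cell d after k random swaps."""
    dist = {d: Fraction(1)}
    for _ in range(k):
        nxt = {}
        for j, p in dist.items():
            s = j - d   # net number of upward swaps made so far
            p_up, p_down = step_probabilities(a + s, b - s, c - s, j)
            if p_up:
                nxt[j + 1] = nxt.get(j + 1, 0) + p * p_up
            if p_down:
                nxt[j - 1] = nxt.get(j - 1, 0) + p * p_down
            if not p_up and not p_down:   # no swap possible: stay put
                nxt[j] = nxt.get(j, 0) + p
        dist = nxt
    return dist

print(step_probabilities(42, 9, 16, 4))   # 144/312 and 168/312, reduced
print(step_probabilities(43, 8, 15, 5))   # 120/335 and 215/335, reduced
print(distribution_after(42, 9, 16, 4, 5).get(9))  # chance of d=9 after 5 swaps
```

Exact fractions are used so that the proportions always sum to one without rounding error.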

SCREEN 19. CONTINGENCY TABLE ANALYSIS.
NEYMAN-PEARSON CONDITION.





1. In statistics, the Neyman-Pearson Condition (Jerzy Neyman, 1894-1981, Polish-American statistician; Egon S. Pearson, 1895-1980, British statistician) is the condition that when performing a hypothesis test between two point hypotheses H0: θ=θ0 and H1: θ=θ1, then the likelihood-ratio test that rejects H0 in favor of H1 when
Λ(x) = (L(θ0|x) / L(θ1|x)) < η, where P(Λ(X)<η|H0)=α
is the most powerful test of size α for a threshold η, where (L(θ0|x) / L(θ1|x)) is the likelihood ratio (or more generally, any statistical test inequality comparison); η designates the so-called critical region for the test, and α is the significance level for Type I (false positive) Error.

If the test is most powerful for all θ1 ∈ Θ1, then it is said to be uniformly most powerful (UMP). The essential argument of the Neyman-Pearson Condition is that greater power (=(1-β)) forces greater Type I Error (=α).

2. In practice, the likelihood ratio itself is not actually used in the test. Instead one computes the ratio to see how the key statistic in it is related to the size of the ratio (i.e. whether a large statistic corresponds to a small ratio or to a large one).

3. Neyman J, Pearson E.
On the Problem of the Most Efficient Tests of Statistical Hypotheses.
Philosophical Transactions of the Royal Society of London. Series A, Containing Papers of a Mathematical or Physical Character. 1933;231:289-337.

4. cnx.org: Neyman-Pearson criterion:
http://cnx.org/content/m11548/latest/

SCREEN 20. CONFIDENCE REGIONS:
NEYMAN-PEARSON CONDITION.





1. The Neyman-Pearson Condition involves the notion of confidence intervals, which reverse the traditional notion of hypothesis testing. In traditional hypothesis testing with a symmetric random variable, such as the normal distribution with population mean, μ, and population standard deviation, σ, we determine the probability that a sample mean, X, lies within a fixed interval, say, μ ± ησ, about the population mean, μ, i.e., the probability that X∈[μ-ησ,μ+ησ], or μ-ησ < X < μ+ησ:
Figure 3485.

2. In many cases, however, we don't really care about which proportion (probability) of values of X fall within this interval. Rather, we may have a good sense of the value of the population standard deviation, σ, but a poor sense regarding that of the population mean, μ. Furthermore, we may wish to estimate the value for μ, based upon our knowledge of X and σ.

3. Let us reverse the question to its algebraic equivalent, namely, whether μ lies in the interval, say, ±ησ, about X, i.e., μ ∈ [X -ησ, X +ησ] or X -ησ < μ < X +ησ:
Figure 3486.

Proof that (1): X -ησ < μ < X +ησ is equivalent to (2): μ-ησ < X < μ+ησ.
Expression (1) consists of expressions (1a): X -ησ < μ and (1b): μ < X +ησ. Add ησ to both sides of expression (1a) and -ησ to both sides of expression (1b), to obtain: X < μ+ησ and μ-ησ < X, which together yield (2). Q.E.D.




4. This reversal may seem like a peculiar probabilistic formulation, since X is subject to random fluctuations, whereas the population mean, μ, is fixed. Neyman and Pearson proposed the following interpretation in their theory of confidence intervals. The probability value, (1-α), represents the probability that the random interval, X ± ησ, will bracket μ, as shown in Figure 3463:
Figure 3463.

Here, we show 20 trials, each of size N, where 1/20 (probability 5%) of the trial confidence bars fall outside the desired population mean, μ.

5. Of course, the population standard deviation, σ, is typically not known, but may be estimated as the sample standard deviation, S, divided by √N, where S/√N is the sample standard error, for trial-size, N. This sample standard error may vary from trial-to-trial, where the error bars are different sizes, corresponding to different values for S, as shown in Figure 3464:
Figure 3464.

6. The parameter, η, satisfies the Student t distribution for (N-1) degrees of freedom. The Neyman-Pearson condition asserts that....

7. The token swap test is a non-statistical test, in which there is no assumption of sampling; rather, probabilities are calculated from data internal to the contingency table itself. For this interpretation of the Neyman-Pearson condition, we must demonstrate that, for a given initial hypothesis, in which the marginal and grand totals are fixed and specified, a greater value for η, corresponds to a smaller value for α.

8. The essential argument of the Neyman-Pearson Condition is that greater power (=(1-β)) forces greater Type I Error (=α). For example, in a Gaussian distribution with two hypotheses, θ0 (null hypothesis) and θ1 (alternative hypothesis), the Type I error is designated as α and the Type II error is designated as β:
Figure 3477.
The power, (1-β), of the hypothesis test increases, at the expense of increasing the Type I error:
Figure 3478.
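
The repeated-trial interpretation in item 4 (about 1 trial in 20 missing μ at the 5% level) can be checked by simulation. The Python sketch below is an illustration under stated assumptions: the function name coverage is hypothetical, the population is taken to be normal with known σ, the interval is built from the standard error σ/√N, and η = 1.96 is the usual two-sided 5% normal value.

```python
import random
from math import sqrt

def coverage(mu, sigma, n, eta, trials, seed=0):
    """Fraction of trials whose interval, X ± η·σ/√N, brackets μ."""
    rng = random.Random(seed)        # fixed seed for a repeatable run
    se = sigma / sqrt(n)             # standard error of the sample mean
    hits = 0
    for _ in range(trials):
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if xbar - eta * se < mu < xbar + eta * se:
            hits += 1
    return hits / trials

# Roughly 95% of intervals should bracket μ; roughly 5% should miss it.
print(coverage(mu=0.0, sigma=1.0, n=25, eta=1.96, trials=4000))
```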

SCREEN 20A. CONFIDENCE REGIONS:
NEYMAN-PEARSON CONDITION.





1. The easiest way to understand the Neyman-Pearson Condition is to consider two curves:
Figure 3477.               Figure 3489.
The left curve, θ0, corresponds to the null hypothesis; and the right curve, θ1, corresponds to the alternative hypothesis.

2. A vertical line, η, is drawn between the two curves.

3. The shaded area ///// under the left curve, θ0, that lies right of line η, represents Type I Error = α error = false positives, assuming that the null hypothesis is true.

4. The shaded area \\\\\ under the right curve, θ1, that lies left of line η, represents Type II Error = β error = false negatives, assuming that the alternative hypothesis is true.

5. The power of a statistical test with respect to the alternative hypothesis is denoted, (1 - β).

6. If one increases the power of the alternative hypothesis, this is done at the expense of increasing the α error of the null hypothesis.

7. The Neyman-Pearson Condition is the property that hypotheses θ0 and θ1 are chosen to maximize the power of θ1, for a given θ0 and a given α error.

8. In the token swap test, the bell-shaped curves are replaced with discrete histograms:
Figure 3479.
The red line shown here is the η-line. The left histogram is predominantly the null hypothesis; and the right histogram is predominantly the alternative hypothesis.
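
For the two bell-shaped curves of items 1 through 6, the shaded areas can be computed directly. The following Python sketch assumes two normal curves with a common standard deviation; the function name error_rates and the numerical values are made up for illustration. It shows that moving the line η to the left raises the power (1-β) and, at the same time, raises the α error.

```python
from statistics import NormalDist

def error_rates(theta0, theta1, sigma, eta):
    """Return (alpha, beta, power) for a threshold eta between two normal curves.

    alpha: area under the θ0 curve to the right of η (Type I error).
    beta:  area under the θ1 curve to the left of η (Type II error).
    """
    alpha = 1 - NormalDist(theta0, sigma).cdf(eta)
    beta = NormalDist(theta1, sigma).cdf(eta)
    return alpha, beta, 1 - beta

# Hypothetical curves centered at 0 and 3, with unit standard deviation.
for eta in (2.0, 1.5, 1.0):
    alpha, beta, power = error_rates(0.0, 3.0, 1.0, eta)
    print(f"eta={eta}: alpha={alpha:.4f}, beta={beta:.4f}, power={power:.4f}")
```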

SCREEN 21. CONTINGENCY TABLE ANALYSIS:
PROOF OF THE NEYMAN-PEARSON CONDITION.





The essential argument of the Neyman-Pearson Condition is that greater power (=(1-β)) forces greater Type I Error (=α).

Lemma 1. In a 2×2 contingency table with given marginal totals, the frequency of cell d determines the frequencies of the other cell totals, a, b, and c.
Proof. Consider any value of d, where v, w, x, and y are determined. Then b=y-d, c=w-d, and a=x-c.
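
Lemma 1 translates directly into code. In this minimal Python sketch the function name cells_from_d is an assumption; the three assignments are exactly the steps of the proof, and the margins are those of the sickle cell example (w=20, x=58, y=13).

```python
def cells_from_d(d, w, x, y):
    """Recover cells a, b, c from cell d and the marginal totals.

    w = c + d (Ψ-true row), x = a + c (Φ-false column),
    y = b + d (Φ-true column).
    """
    b = y - d
    c = w - d
    a = x - c
    return a, b, c, d

print(cells_from_d(9, w=20, x=58, y=13))   # (47, 4, 11, 9), the observed table
print(cells_from_d(4, w=20, x=58, y=13))   # (42, 9, 16, 4), the expected table
```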

Lemma 2. In a 2×2 contingency table, let F_{k,j}, for 0 ≤ F_{k,j} ≤ 1, represent the proportion of tokens at frequency j in cell d after k swaps; for D, the expected value of cell d, let F_{0,D} = 1, and F_{0,j} = 0 for j ≠ D. Then:
(1) F_{k,j} = 0 for j < (D-k) and j > (D+k).

(2) F_{k,D-k} > 0 and F_{k,D+k} > 0.

(3) F_{k+1,D-k-1} < F_{k,D-k} and F_{k+1,D+k+1} < F_{k,D+k}.
Proof. Part (1). Let k=1. Then:
F_{1,D+1} = [F_{0,D}×(CB/(AD+CB)) + F_{0,D+2}×...] where F_{0,D+2} = 0;
and F_{1,D-1} = [F_{0,D}×(AD/(AD+CB)) + F_{0,D-2}×...] where F_{0,D-2} = 0.
By definition, F_{1,j} = [F_{0,j-1}×... + F_{0,j+1}×...].
For j < (D-k), then F_{0,j-1} = F_{0,<D-2} = 0 and F_{0,j+1} = F_{0,<D} = 0.
For j > (D+k), then F_{0,j-1} = F_{0,>D} = 0 and F_{0,j+1} = F_{0,>D+2} = 0.

Let the lemma be true for k. Then.....

Proof. Part (2). Let k=1. Then:

Proof. Part (3). Let k=1. Then:

Theorem 1. The token swap test satisfies the Neyman-Pearson Condition.
Proof. ...........

SCREEN 22. THE ARGUMENT.



1. Atomic statements of the medical model are propositions, i.e., statements that are either true, false, or uncertain. The negation of a proposition is also a proposition; the double-negation of a proposition equals the original proposition, i.e., --p=+p. We recognize two mutually exclusive sets of propositions: data, set D; and medical entities, set E. The negation of every datum is a datum, i.e., +d ∈ D implies -d ∈ D; and the negation of every medical entity is a medical entity, i.e., +e ∈ E implies -e ∈ E.

2. A datum is understood as a fixed event, with a fixed date/time and a localization on the patient, as for example, a serum potassium of 2.6 mEq/L on January 1, 2007, at 8:00 AM; or a 0.5 cm pearly papule biopsied from the left nasal ala on January 1, 2007, at 8:00 AM.

3. A medical entity is an inferred truth, such as heart failure or basal cell carcinoma. A datum is either absolutely true, absolutely false, or absolutely uncertain. A medical entity is fuzzily true or fuzzily false, based upon inferences drawn from a data vector, Δ = {+d1, +d2, ... +dn}, available at a particular time.

4. The relationship of medical entities to data is specified by an ontology (Rule 6) of accepted core beliefs in medicine. For example, a pearly papule and a confirmatory pathology report from the biopsy implies basal cell carcinoma, say, at a fuzzy level of 7/8 (or a certainty level of 3, see below).

5. Not every pearly papule of the nose is examined by a physician; and the physician does not biopsy every pearly papule that he/she examines. The patient must be worried enough about the papule to schedule a doctor's appointment; and the physician must be worried enough about the papule to justify a diagnostic biopsy. Rule 7 is the Vexative Rule (Latin: vexari = to worry), that provides justifications for obtaining particular data. It is assumed that every datum obtained has some payment, however small, in injury, pain, money, inconvenience, or risk of morbidity or mortality to the patient.

6. Rule 8, or Sutton's Law (go where the money is) (Willie Sutton, 1901-1980, American Bank Robber, nicknamed "Slick Willie") is the rule of jumping to conclusions based upon incomplete data (Brewka, 1997), also known as the Zebra Rule (if you hear hoofbeats in the street, think of horses not zebras). Medical reasoning inevitably involves decisions under uncertainty. One collects limited data, from which one must draw an initial conclusion. One has a complementary/converse ethical mandate (Rule 5) to treat a threatening disease condition if there is compelling (but not absolute) evidence for it. On the other hand, one has the ethical mandate (Rule 4, first do no harm) not to collect unnecessary data, that might harm the patient physically, mentally, or financially. Therefore, there will be instances in which one initially jumps to the most likely but wrong conclusion, based upon data that are obtained subsequently.

In Petersdorf and Beeson's (1961) original paper on Sutton's Law, namely, (Fevers of Unexplained Origin), these events are clinical findings suggesting one infectious agent that are superseded by subsequent culture results. In medical slang, these unexpected reversals are called zebras (Groopman, 2007). (Willie Sutton, 1901-1980, American Bank Robber; the original "Slick Willie": nickname for U. S. President Bill Clinton).

SCREEN 23. SCHRÖDINGER'S CAT.



1. In classical propositional logic, these infrequent reversals of usual conclusions (which, cumulatively, occur rather often in medical practice) result in a mathematical inconsistency, i.e., a proposition that is both true and false, a mathematical abomination. This inconsistency may be avoided by requiring that conclusions be interpreted as medical entities, that are fuzzily true, but may be overturned by subsequent data.

2. The companion concept for overturning a plausible conclusion based upon subsequent data collection is Schrödinger's Cat:
"... There is a famous thought experiment called Schrödinger's cat. A cat is placed in a sealed box. There is a gun pointing at it, and it will go off if a radioactive nucleus decays. The probability of this happening is fifty percent. (Today no one would dare propose such a thing, even purely as a thought experiment, but in Schrödinger's time they had not heard of animal liberation.)

"If one opens the box, one will find the cat either dead or alive. But before the box is opened, the quantum state of the cat will be a mixture of the dead cat state with a state in which the cat is alive. This some philosophers of science find very hard to accept. The cat can't be half shot and half not-shot, they claim, any more than one can be half pregnant. Their difficulty arises because they are implicitly using a classical concept of reality. In this view, an object has not just a single history but all possible histories. In most cases, the probability of having a particular history will cancel out with the probability of having a very slightly different history; but in certain cases, the probabilities of neighboring histories reinforce each other. It is one of these reinforced histories that we observe as the history of the object.

"In the case of Schrödinger's cat, there are two histories that are reinforced. In one the cat is shot, while in the other it remains alive. In quantum theory both possibilities can exist together. But some philosophers get themselves tied in knots because they implicitly assume that the cat can only have one history."
From:
Hawking S.
Black Holes and Baby Universes and Other Essays.
New York: Bantam Books. 1993;:. Pages 44-45.
ISBN 0-553-37411-7, 182 pages.


3. In our formulation, as with the boxed Schrödinger's Cat, no medical entity is ever absolutely certain [other than possibly in the mind of God, because God presumably works with a larger data vector than we mortals can ever know. Or, medical entities are perhaps also uncertain even in the mind of God, and the certainty model itself has been imposed upon God by arrogant humans. In any event, Schrödinger's Cat always has an encore in our formulation.]

4. In Schrödinger's Cat, one irrevocably determines the life-status of the cat when the cat's box is opened. In our mathematical model, one determines the status of medical entities when you apply Sutton's Law, i.e., jump to the most likely conclusion, given the data that you have on hand. In our mathematical model, this Schrödinger Opening of the cat's box unleashes an ethical mandate (Rule 7, Vexative) to collect additional data. In Schrödinger's formulation, the cat's box is opened exactly once. In our mathematical model, the cat's box is opened once; vexative data are collected; the cat's box is closed (i.e., Sutton's Law is suspended again); the cat's box is opened again; additional vexative data are collected; the cat's box is closed again, ....

SCREEN 24. SUMMARY OF RULES:
SET THEORY FORMULATION.





0. The logic in this report is based upon classical logic, with the following three complementizers: payment (!); value(#); and knowledge/certainty($). That is, the harm/payment created by achieving higher levels of knowledge/certainty must be balanced by the value in obtaining that knowledge/certainty.

1. Rule 1. Complementizers: Absorb negation, homomorphic in logical-and. = complementizer-positive. That is: negative-negative-p equals p; know-negative-p equals know-p; pay-negative-p equals pay-p; value-negative-p equals value-p. Homomorphic in logical-and...........

2. Rule 2. Fuzzy asymmetry. More-certain implies less-certain. Certain_{k+1} p implies certain_k p.
Nandset definition: {-$_k p, +$_{k+1} p}.

3. Rule 3. Data are crisp. You either know a datum or not.
Nandset definition: {+$d,-$d}.

4. Rule 4. Hippocrates-first (Hippocrates, 460-370 BC, Greek physician, father of medicine). That is, payment-datum implies value-datum. (Contrapositively: no-value-datum implies no-payment-datum.)
Nandset definition: {-#d,+!d}

5. Rule 5. Hippocrates-reverse. Treat if you can. Not-know-datum and value-datum implies harm-datum.
Nandset definition: {-$d,+#d,-!d}.

6. Rule 6. Ontology. If you know certain entities and data, then this generates the knowledge/certainty of an additional entity. For example, if this patient has an elevated serum-prostatic-specific-antigen, then you become more certain that the patient has prostate cancer.
Nandset definition: {+$_k Δ, Δ, .., -e, -$_{k+1} e} and {+$_k Δ, Δ, .., -$_k e}.

7. Rule 7. Vexative. If you know certain entities and data, then this generates value for an additional datum. That is, you become vexed by your ignorance of that additional datum. For example, if you know that an elderly male patient has not had a serum-prostatic-specific-antigen in the past five years, you become vexed regarding that missing-datum.
Nandset definition: {+$_k e, e, -$d, -#d, -$_{k+1} e}.

8. Rule 8. Ethical Data Registration. For each datum, there is a data-collection step, J, at which the datum is collected and is true; or the datum is collected and is false; or the datum collection attempt fails and the datum is unknown. Otherwise, the datum is never attempted and never collected. That is, for d ∈ D, there exists at most one J, 1 < J < H, at which (8.1.1) +$d, +d, +!d are true; or else (8.1.2) +$d, -d, +!d are true; or else (8.1.3) -$d, +!d are true. (8.2) Otherwise, for every J, 1 < J < H, -$d, +$d, -#d, +#d, -!d, +!d are all not entered into (S_J - S_{J-1}). The nandsets for Rule 8 are: (8.1.1) {-$d}, {-d}, {-!d} ∈ (S_J - S_{J-1}); or else (8.1.2) {-$d}, {+d}, {-!d} ∈ (S_J - S_{J-1}); or else (8.1.3) {+$d}, {-!d} ∈ (S_J - S_{J-1}). (8.2) Otherwise, {+$d}, {-$d}, {+#d}, {-#d}, {+!d}, {-!d} ~∈ (S_J - S_{J-1}).



9. Rule 9. Schrödinger's Rule. At data-collection-step J, we create a set, O_J, the SCHRÖDINGER OPENING. The nandset for -$^kω, namely, {+$^kω}, is placed in O_J if and only if the nandset for +$^kω, namely, {-$^kω}, is NOT a member of the logical consequences of the data-collection-step, denoted ∫S_J (∫ for logical "summation"). That is, anything that is uncertain at data-collection-step J is declared uncertain in O_J. If the cat's life is uncertain at data-collection-step J, then it is declared uncertain in O_J. However, the cat may spring alive again at data-collection-step (J+1). Watch closely: the reasoning is a little tricky.

Rule 9, Schrödinger's Rule: It is true that -$^kω for O_J if and only if +$^kω is not a logical consequence of S_J. The nandset for Rule 9 is: {+$^kω} ∈ O_J if and only if {-$^kω} ~∈ ∫S_J, where ∫ represents logical consequences (for logical "summation").
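
As a minimal sketch of Rule 9, assume that literals are plain strings such as "+$e", that a nandset is a frozenset of literals, and that the logical consequences of the data-collection step have already been computed and are supplied as a set of nandsets. The function name schrodinger_opening and the list of candidate propositions are illustrative assumptions; they are not taken from the authors' theorem prover.

```python
def schrodinger_opening(consequences, candidates):
    """Build the opening for one data-collection step.

    consequences: set of frozensets standing in for the logical
        consequences of the step (assumed to be precomputed).
    candidates: unsigned modal propositions such as "$e" or "$$e".

    The nandset {+p} (which asserts -p) enters the opening exactly when
    the nandset {-p} (which asserts +p) is NOT among the consequences.
    """
    opening = set()
    for prop in candidates:
        if frozenset({"-" + prop}) not in consequences:
            opening.add(frozenset({"+" + prop}))
    return opening


# Hypothetical step in which only +$e has been established.
consequences = {frozenset({"-$e"})}
opening = schrodinger_opening(consequences, ["$e", "$$e", "$$$e"])
print(sorted(next(iter(n)) for n in opening))
```

Under these assumptions the opening receives {+$$e} and {+$$$e}, that is, -$$e and -$$$e are declared, while +$e is left alone. This resembles the pattern +$e, -$$e, -$$$e that appears in Step 1 of the prostate example. A later step with a larger set of consequences would produce a smaller opening, which is the sense in which the cat may spring alive again.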

SCREEN 25. METHOD OF PROOF:
ILLUSTRATED TABLES.





1. There are two mutually exclusive classes of propositions: data, D, and medical entities, E.

2. There are nine rules of relationship among these propositions.

3. Each rule corresponds to one or more nandsets. (Screen 24).

4. Nandsets are: green (quarantined), yellow (conditional), or red (absolute).

5. Proof consists of constructing a quarantine for a claimed theorem, and showing that the nine rules do not violate the quarantine.

6. Proof Example: The empty dataset is consistent.

7. Proof Example: Occam's Razor is satisfied for medical entities in the empty dataset (Occam, William of Ockham, 1285-1349, English logician and Franciscan friar).
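
Items 3 and 6 can be illustrated with a minimal sketch. It assumes that a nandset is a set of signed literals that may not all hold at once, so that a set of asserted literals is consistent when it contains no nandset as a subset. The function names, and the instantiation of Rules 3 to 5 for a single datum, are illustrative assumptions. The green/yellow/red quarantine bookkeeping is not modelled here.

```python
def rules_for_datum(d):
    """Nandsets of Rules 3, 4 and 5 instantiated for one datum name."""
    return [
        frozenset({f"+${d}", f"-${d}"}),            # Rule 3: data are crisp
        frozenset({f"-#{d}", f"+!{d}"}),            # Rule 4: Hippocrates-first
        frozenset({f"-${d}", f"+#{d}", f"-!{d}"}),  # Rule 5: Hippocrates-reverse
    ]


def violated(nandsets, literals):
    """Return every nandset whose members are all asserted."""
    asserted = set(literals)
    return [n for n in nandsets if n <= asserted]


rules = rules_for_datum("d1")

# The empty dataset asserts nothing, so no nandset can be contained in it.
print(violated(rules, []))

# The four d1 literals from Step 1 of the prostate example.
print(violated(rules, ["+#d1", "+$d1", "+!d1", "+d1"]))

# Asserting -#d1 together with +!d1 breaks Rule 4.
print(violated(rules, ["-#d1", "+!d1"]))
```

Using frozensets keeps each nandset hashable and makes the violation test a plain subset comparison. A full prover would also have to derive consequences across rules, which this sketch does not attempt.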


Illustrated tables:
Data Empty (Fig. 3403, stru3403.xls).
Data Positive (Fig. 3404, stru3404.xls).
Data Negative (Fig. 3405, stru3405.xls).
Data Failure (Fig. 3406, stru3406.xls).




SCREEN 26. METHOD OF PROOF:
AUTOMATED THEOREM PROVER.
SIMPLE PROSTATE MODEL: STEP 1.
60 YEAR OLD MALE.



A 60-year-old male patient makes an appointment and visits a physician for the first time in ten years. Since the patient made the appointment, we assume that the physician has permission (+#d1) and obtains the patient's age and sex, i.e., +$d1, +d1.
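
The literals entered when a datum is collected follow the cases of Rule 8. A minimal sketch, assuming string literals and using the outcome labels "positive", "negative", "failure" and "empty" borrowed from the figure captions of Screen 25; the function name register and the mapping of labels onto cases (8.1.1), (8.1.2), (8.1.3) and (8.2) are assumptions made for illustration.

```python
def register(datum, outcome):
    """Literals entered for one datum at its data-collection step."""
    if outcome == "positive":   # case 8.1.1: collected and true
        return [f"+${datum}", f"+{datum}", f"+!{datum}"]
    if outcome == "negative":   # case 8.1.2: collected and false
        return [f"+${datum}", f"-{datum}", f"+!{datum}"]
    if outcome == "failure":    # case 8.1.3: attempt failed, datum unknown
        return [f"-${datum}", f"+!{datum}"]
    if outcome == "empty":      # case 8.2: never attempted, nothing entered
        return []
    raise ValueError(f"unknown outcome: {outcome!r}")


# Age and sex are obtained and found true for this patient.
print(register("d1", "positive"))
```

The permission literal +#d1 is not produced by this function, because the text above treats permission as an assumption that follows from the patient making the appointment.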


Prostate example, Step 1. Live Proof:

d1 = Patient is a 60-year-old male, malesixty.
d2 = Perform PSA test, psapositive.
d3 = Perform prostatectomy, prostatectomyca.
e = Has prostate carcinoma, hasprca.

Solution: +#d1 , +$d1 , +!d1 , +d1 , -$d2 , -$d3 , -$$e , -$$$e , +e , +$e.

Prostate example, Step 1. Live Proof:
Restated with intuitive notation:

Solution: +#malesixty , +$malesixty , +!malesixty , +malesixty , -$psapositive , -$prostatectomyca , -$$hasprca , -$$$hasprca , +hasprca , +$hasprca.
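
The restatement above is a token substitution. A minimal sketch, assuming the solution is a comma-separated string and that each literal is a sign, an optional run of #, $ or ! characters, and a proposition name. NAMES and restate are hypothetical and are not the Perl token swap script referenced by the authors.

```python
import re

# Compact proposition names mapped to the intuitive names used in the text.
NAMES = {
    "d1": "malesixty",
    "d2": "psapositive",
    "d3": "prostatectomyca",
    "e": "hasprca",
}

# A literal: sign, optional modal prefix, then the proposition name.
LITERAL = re.compile(r"^([+-][#$!]*)(\w+)$")


def restate(solution, names=NAMES):
    """Rewrite each literal of a solution with its intuitive name."""
    out = []
    for token in solution.rstrip(".").split(","):
        token = token.strip()
        match = LITERAL.match(token)
        if match is None:
            raise ValueError(f"unrecognised literal: {token!r}")
        prefix, prop = match.groups()
        out.append(prefix + names.get(prop, prop))
    return " , ".join(out) + "."


STEP1 = "+#d1 , +$d1 , +!d1 , +d1 , -$d2 , -$d3 , -$$e , -$$$e , +e , +$e."
print(restate(STEP1))
```

Names that are missing from the mapping are passed through unchanged, so a partially named model still produces readable output.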

SCREEN 27. METHOD OF PROOF:
AUTOMATED THEOREM PROVER.
SIMPLE PROSTATE MODEL: STEP 2.
60 YEAR OLD MALE, ELEVATED PSA.





Prostate example, Step 2. Live Proof:

d1 = Patient is a 60-year-old male, malesixty.
d2 = Perform PSA test, psapositive.
d3 = Perform prostatectomy, prostatectomyca.
e = Has prostate carcinoma, hasprca.

Solution: +#d1 , +!d1 , +$d1 , +d1 , +#d2 , +$d2 , +!d2 , +d2 , -$d3 , -$$$e , -$$$$e , +$e , +e , +$$e.

Prostate example, Step 2. Live Proof:
Restated with intuitive notation:

Solution: +#malesixty , +!malesixty , +$malesixty , +malesixty , +#psapositive , +$psapositive , +!psapositive , +psapositive , -$prostatectomyca , -$$$hasprca , -$$$$hasprca , +$hasprca , +hasprca , +$$hasprca.
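
Comparing Steps 1 and 2, the longest asserted chain of $ operators on the entity grows from +$e to +$$e once the PSA result is positive, which appears to be how the rise in certainty described in Rule 6 shows up in the prover output. A minimal sketch that reads this level off a solution string, assuming the comma-separated format shown above; the function name certainty_level is hypothetical.

```python
import re


def certainty_level(solution, prop):
    """Largest k for which '+', k copies of '$', then prop is asserted.

    Returns None when no positive literal for prop appears at all.
    """
    pattern = re.compile(r"^\+(\$*)" + re.escape(prop) + r"$")
    levels = []
    for token in solution.rstrip(".").split(","):
        match = pattern.match(token.strip())
        if match:
            levels.append(len(match.group(1)))
    return max(levels, default=None)


STEP1 = "+#d1 , +$d1 , +!d1 , +d1 , -$d2 , -$d3 , -$$e , -$$$e , +e , +$e."
STEP2 = ("+#d1 , +!d1 , +$d1 , +d1 , +#d2 , +$d2 , +!d2 , +d2 , -$d3 , "
         "-$$$e , -$$$$e , +$e , +e , +$$e.")

print(certainty_level(STEP1, "e"), certainty_level(STEP2, "e"))
```

The function only inspects positive literals; the negative literals such as -$$$e mark the levels that remain open and are ignored here.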

SCREEN 28. METHOD OF PROOF:
AUTOMATED THEOREM PROVER.
SIMPLE PROSTATE MODEL: STEP 2a.
60 YEAR OLD MALE, PSA NOT ELEVATED.





Prostate example, Step 2a. Live Proof:

d1 = Patient is a 60-year-old male, malesixty.
d2 = Perform PSA test, psapositive.
d3 = Perform prostatectomy, prostatectomyca.
e = Has prostate carcinoma, hasprca.

Solution: +#d1 , +!d1 , +$d1 , +d1 , +#d2 , +$d2 , +!d2 , -d2 , -$d3 , -$$$e , -$$$$e , +$e.

Prostate example, Step 2a. Live Proof:
Restated with intuitive notation:

Solution: +#malesixty , +!malesixty , +$malesixty , +malesixty , +#psapositive , +$psapositive , +!psapositive , -psapositive , -$prostatectomyca , -$$$hasprca , -$$$$hasprca , +$hasprca.

SCREEN 29. METHOD OF PROOF:
AUTOMATED THEOREM PROVER.
SIMPLE PROSTATE MODEL: STEP 2b.
60 YEAR OLD MALE, PSA NOT ELEVATED.





Prostate example, Step 2b. Live Proof:

d1 = Patient is a 60-year-old male, malesixty.
d2 = Perform PSA test, psapositive.
d3 = Perform prostatectomy, prostatectomyca.
e = Has prostate carcinoma, hasprca.

Solution: +#d1 , +!d1 , +$d1 , +d1 , +#d2 , +$d2 , +!d2 , -d2 , -$d3 , -$$$e , -$$$$e , +$e , -e , +$$e.

Prostate example, Step 2b. Live Proof:
Restated with intuitive notation: