ARTIFICIAL INTELLIGENCE
IN ANATOMIC PATHOLOGY.
© 2001-2002, G. William Moore, MD, PhD.
http://www.netautopsy.org/whatisai.htm
DRAFT VERSION, February, 2002.

Chief, Quality Improvement Section, Anatomic Pathology
Pathology & Lab Medicine Service (113)
Veterans Affairs Medical Center
Baltimore, MD 21201-1524

Clinical Associate Professor,
Department of Pathology,
University of Maryland School of Medicine.
Baltimore, MD 21201.

Visiting Assistant Professor,
Department of Pathology
The Johns Hopkins Medical Institutions.
Baltimore, MD 21287.

email: gwmoore@gwmoore.org
email: webmaster@netautopsy.org
Publications of G. William Moore, MD, PhD.




TABLE OF CONTENTS.


1. WHAT IS ARTIFICIAL INTELLIGENCE?
2. CENTRAL CONTROVERSY: ARTIFICIAL INTELLIGENCE VERSUS CURVE-FITTING.
3. HUMAN PATHOLOGY AND ARTIFICIAL INTELLIGENCE.
4. ARTIFICIAL INTELLIGENCE AND THE INTERNET.
5. AI APPLICATIONS: PROGRAMMING LANGUAGES.
6. AI APPLICATIONS: SYMBOLIC LOGIC. SET THEORY. INFERENCE.
7. AI APPLICATIONS: COMPUTATIONAL LINGUISTICS.
8. AI APPLICATIONS: CALCULUS.
9. AI APPLICATIONS: PROBABILITY AND STATISTICS.
10. AI APPLICATIONS: COMPUTATIONAL COMPLEXITY.
11. AI APPLICATIONS: NUMBER THEORY.
12. ARTIFICIAL INTELLIGENCE ONTOLOGIES.
13. TOPICS IN ARTIFICIAL INTELLIGENCE.
14. ARTIFICIAL INTELLIGENCE IN MEDICINE.
15. AI AND ANATOMIC PATHOLOGY.
16. ARTIFICIAL INTELLIGENCE ROBOTICS.
17. DISCUSSION.
18. REFERENCES.



1. INTRODUCTION: WHAT IS
ARTIFICIAL INTELLIGENCE?


GO TO NEXT PAGE.
GO TO TABLE OF CONTENTS.

      In popular culture, ARTIFICIAL INTELLIGENCE (AI) is a property of advanced computers, which imitate the high-order cognitive functions of humans. Questions associated with this idea include robotics and the issue of whether computers can effectively imitate high-order human behavior. DREYFUS has written a series of books, starting with WHAT COMPUTERS CAN'T DO, which challenge the idea that machines can ever imitate high-order human cognitive behavior.

      Technically, ARTIFICIAL INTELLIGENCE is a collection of mathematical and computing methods for predicting complex real-world processes, such as: the behavior of complicated games, like chess; the behavior of experts, who might predict future real-estate values, render medical diagnoses, or translate foreign languages; the behavior of complex biological systems; or ordinary human behavior, such as perceiving objects visually or auditorily. The fundamental method of artificial intelligence is the RULEBASE.

      In the early 1960s, PROF. MARVIN MINSKY, at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY (MIT), assigned this technical definition to AI, which caught the imagination of the public, and perhaps not coincidentally, many granting agencies in the U. S. Federal Government. Much of the early excitement around AI results from this dubious association of the popular and technical definitions of AI. AI has also been exciting to the academic community, which since the days of LEIBNIZ, the seventeenth century German philosopher who co-invented CALCULUS, sought to understand and build the RATIOCINATOR UNIVERSALIS, or universal reasoning machine, today known as SYMBOLIC LOGIC. The mathematical form of symbolic logic expressions superficially resembles those of an AI rulebase.

      The fundamental method of artificial intelligence, the RULEBASE, consists of PRODUCTION RULES, that usually take the form: IF X IS TRUE, THEN Y IS TRUE; or IF X IS TRUE, THEN PRODUCE Y. The rules in a rulebase are derived from general knowledge, specialized judgments made by experts, and error-testing on previous versions of an existing AI system.



2. CENTRAL CONTROVERSY OF
ARTIFICIAL INTELLIGENCE:
AI VERSUS CURVE-FITTING.


GO TO NEXT PAGE.
GO TO PREVIOUS PAGE.
GO TO TABLE OF CONTENTS.

      The chief competitor for predicting the behavior of complex real-world processes is NUMERICAL ANALYSIS. In numerical analysis, the mathematical and computer methods do NOT presuppose general or pre-existing knowledge about the real-world process, but instead rely on TRAINING SETS. Popular methods of numerical analysis include: REGRESSION ANALYSIS and NEUROCOMPUTING.

      REGRESSION ANALYSIS is a mathematical method in which INPUT-VALUES, x, w, v, u, t,...... are placed in a formula, or REGRESSION, that typically has the mathematical form:
y = a + bx + cw + dv + eu + ft + ......
where y is an OUTPUT-VALUE, and a, b, c, .... are coefficients. The values of the coefficients are determined by data analysis, most commonly by the LEAST SQUARES METHOD. In a typical regression, there may be a variety of output-values, and some of the input values may be squared, cubed, etc.

      The simplest form of regression analysis is LINEAR REGRESSION ANALYSIS, in which the output-value, y, equals the input-value, x, times a LINEAR COEFFICIENT, b, plus a CONSTANT COEFFICIENT, a, as follows:
y = a + bx.


      NEUROCOMPUTING is the study of ARTIFICIAL NEURAL NETS. A NEURAL NET (not to be confused with biological neurons, which bear only a superficial resemblance to computer nets) is a set of connected nodes, or NEURONS, consisting of INPUT-NODES, OUTPUT-NODES, and INTERNAL (HIDDEN) NODES. The nodes and connections assumed numeric values, just as the coefficients a, b, c, d,.... in classical regression analysis. The advance of neurocomputing systems over classical regression analysis is that classical regression models employ only addition, subtraction, multiplication, division, and analytic mathematical functions, such as sine, cosine, tangent, exponential, logarithm, etc. Neurocomputing models contain all these functions, plus logical operators such as NOT, AND, OR, IMPLIES,....

      The goal of a neurocomputing system is to TRAIN a neural net optimally, i.e., to assign numerical values to nodes and connections, in order to predict known outputs for a given set of inputs (TRAINING SET); then to apply this training information to as-yet-unexamined inputs (TEST SET). The scope, complexity, and relative weighting of the connections, and the application of these models to practical computing problems, form much of the intellectual content of neurocomputing.

      The fundamental, philosophical problem with regression analysis, neurocomputing, and other methods of numerical analysis, is that they represent CURVE-FITTING METHODS, that do not incorporate any of our pre-existing knowledge about a particular process. In AI, by contrast, our existing knowledge (or existing misconceptions, depending upon your perspective) is formalized in the rulebase.

      Supporters of neurocomputing point out that their methods are not biased by any of our pre-existing prejudices about a field. Therefore, neurocomputing is more open to creativity. As a practical matter, this assertion by the neurocomputing advocates is false, because neurocomputing algorithsm must choose what to use as inputs and outputs, and how to select this training set. These choices impose a de facto bias into conclusions drawn from neural nets.

      DEBATE BETWEEN AI AND NEUROCOMPUTING. A neural net is a set of nodes and connections that store experimental knowledge obtained by training on task examples. Artificial intelligence is a set of apriori deductive rulebases. If an artificial intelligence system yields the wrong answer for a particular dataset, then the human software architect must isolate the faulty rule and replace it. This repair process may require substantial human insight. By contrast, a faulty neurocomputing system might need nothing more than additional nodes, connections, or training-set examples, requiring no particular insight to add to the system. Thus, artificial intelligence is forever dependent upon knowledgeable human authors and troubleshooters. It is a common experience that an artificial intelligence system dies soon after its creator retires or moves to another institution.

      Perhaps there is a possible borderland between neurocomputing and artificial intelligence: a system grounded in a deductive rulebase but fine-tuned by training on focused task examples. Or perhaps either the mathematical complexity or the professional antagonism between the two fields is too great.



3. HUMAN PATHOLOGY AND
ARTIFICIAL INTELLIGENCE.


GO TO NEXT PAGE.
GO TO PREVIOUS PAGE.
GO TO TABLE OF CONTENTS.

      PATHOLOGY is the study of disease. ANATOMIC PATHOLOGY is that area of pathology that studies the gross anatomy and microanatomy of diseased organs, in order to render specific diagnoses. Surgical pathologists issue diagnostic reports on tissue samples taken from suspected lesions. Individual SURGICAL PATHOLOGY REPORTS are needed to guide treatment of the individual patient. In the USA, an estimated 40 million surgical pathology reports are issued each year, an average of one per seven patients. The aggregate collection of free-text reports contains information related to almost every serious human disease. A major goal of DATA MINING in anatomic pathology is to extract research value from collections of surgical pathology reports.

      A surgical pathology report is divided into FIELDS, including: date/time; quantities (number of specimen-cups, weights, measurements, etc.); name identifiers (patient, surgeon, pathologist); location-codes (department, ward); billing-codes (procedure, difficulty of procedure, payer, etc.); diagnosis-codes (source of specimen, difficulty of diagnosis, etc.). and medical free-text (clinical history, pathologic diagnosis).

      DATE/TIME is subject to obvious constraints: specimen is obtained from the patient after the patient's date of birth; specimen is obtained from the patient before being received in the laboratory; specimen is received in the laboratory before receiving a diagnosis, etc.

      QUANTITIES are typically not well-controlled in a surgical pathology report. The most important control is that the number of specimens, the number of gross-descriptions, and the number of microscopic-diagnoses should match.

      NAME-IDENTIFIERS should match a master-list of names in the hospital computer system. In most cases, a patient-name will have a name-entry consisting of demographics (gender, date-of-birth), address, billing information, next-of-kin, etc. Each pathologist must be credentialled for the pathology department and institution issuing the report, and have a method of contact (phone number, beeper number, etc.) in case of a delayed or otherwise controversial report. Each surgeon must be credentialled for the and institution issuing the report, and have a method of contact. In many pathology laboratories, a specimen is not acceptable by the computer system if the appropriate name-identifiers are not matched.

      CODES, including location-codes, billing-codes, and diagnosis-codes, should match codes in a master-list.

      MY RESEARCH INTEREST IN ARTIFICIAL INTELLIGENCE. involves COMPUTER TRANSLATION, either between English and artificial medical languages, such as the SYSTEMATIZED NOMENCLATURE OF HUMAN AND VETERINARY MEDICINE (SNOMED III); MEDICAL SUBJECT HEADINGS (MeSH); and the UNIFIED MEDICAL LANGUAGE SYSTEM (UMLS); or between English and natural languages, such as: GERMAN, TURKISH, and JAPANESE. For example, consider the following production rule: IF a noun follows a preposition in English, THEN that noun precedes the preposition in Turkish and Japanese.

      COMPUTATIONAL LINGUISTICS is one of the major, active areas of study in AI, and there is a major National Cancer Institute (NCI) funding initiative, the SHARED PATHOLOGY INFORMATICS NETWORK (SPIN), one of whose aims is the translation of natural language surgical pathology reports into a standardized, international language, such as the EXTENDED MARKUP LANGUAGE (XML).



4. ARTIFICIAL INTELLIGENCE ON THE INTERNET.


GO TO NEXT PAGE.
GO TO PREVIOUS PAGE.
GO TO TABLE OF CONTENTS.

      If you embark on a hobby or career involving AI, you should learn at least one programming language. You should write a few programs for your home computer, and a few programs that can be shared on the Internet and run by other Internet users. Don't just READ ABOUT PROGRAMMING, write a few programs! Reading about programming is like attending swimming lectures.

      Here is how the Internet got started. INTRODUCTION TO THE INTERNET:
http://www.medparse.com/whatnett.htm


      There is no specific computer language for Artificial Intelligence. PERL is the best computer language for beginners, because you can learn a little bit of PERL in a few hours, but PERL has a lot of depth. INTRODUCTION TO PERL (PRACTICAL EXTRACTION AND REPORTING LANGUAGE):
http://www.medparse.com/whatperl.htm
PERL is not necessarily the best language for AI, but it is the cheapest (cost-free), and the most widely available for different systems, including MS-DOS and LINUX. There are both personal-computer and Internet versions of PERL, so that you can prototype a PERL program on your home computer, and then post it publicly on the Internet. Because PERL is cost-free, many computer programmers have contributed to the advancement of PERL, and the result is a very powerful computer language, with many of its serious bugs and deficiencies taken care of. You should have at least PERL VERSION FIVE for AI application programming. On the other hand, because PERL is free, and Bill Gates cannot monopolize it, and there is no Microsoft (R) version of PERL.



5. TOOLS OF ARTIFICIAL INTELLIGENCE:
PROGRAMMING LANGUAGES.


GO TO NEXT PAGE.
GO TO PREVIOUS PAGE.
GO TO TABLE OF CONTENTS.

      If you embark on a hobby or career involving AI, you should learn at least one programming language. You should write a few programs for your home computer, and a few programs that can be shared on the Internet and run by other Internet users. If your program is interesting enough, you will get email critiques and suggestions, which can be very powerful stimuli for upgrading your knowledge of AI. There is no specific best computer language for Artificial Intelligence. I recommend PERL, because it is easiest and cheapest.

      PROGRAMMING FOR AI: PERL, JAVA, C++, PYTHON. There is no specific computer language for Artificial Intelligence. Your best bet is: whatever you can get at a price that you can afford. PERL is the best computer language for beginners, because you can learn a little bit of PERL in a few hours, but PERL has a lot of depth. JAVA, C++, and PYTHON require a serious committment to learning about object-oriented programming. It will take you days just to write your first program that says HELLO in JAVA, C++, or PYTHON.

      Historically, a considerable amount of AI programming was performed in: FORTRAN and COBOL. The only reason for using these decrepit languages was that nothing else was available. They are not well-suited for good AI programming. LISP is a computer language specifically designed for AI programming, but it has two major drawbacks: (1) LISP takes over control of your entire computer (i.e., it doesn't share resources with other computer programs); and (2) LISP has poor input/output features, so that it is difficult to share LISP outputs with other investigators. It is surprising how many vendors of computer software systems neglect one of the basic desires of computer users: the desire to share results.

      PROGRAMMING TO FIRST ERROR. A fast but sloppy method for writing elementary computer programs, particularly if the program source code is not likely to be examined by others, and is likely not to be used very many times, even by its own author. Long ago, nearly all programming languages were COMPILED PROGRAMMING LANGUAGES, in which the entire program had to be syntactically correct before the computer system would even execute the first statement in the program. Now, many programming languages (MUMPS, PERL) are INTERPRETED PROGRAMMING LANGUAGES, in which the computer system executes all valid statements in the program, until an error condition is encountered. In an interpreted programming language, one may write a new program a few statements at a time, an incredible convenience. Programming to first error is not recommended for computer courses, but I betcha that the class instructor does the same thing.

      PROGRAM CANNIBALISM. Another fast but sloppy method for writing elementary computer programs. Many computer programming tasks are repetitive, and after a while, you remember that you have done something like this before. Try to remember the name of that old program! Then you can cannibalize it for a few lines of source code, that you would otherwise have to think through and debug all over again. Eventually you will develop personal naming conventions that designate your old programs that are particularly suitable for cannibalism. Program cannibalism is not recommended for formal computer courses, but I betcha that the class instructor does the same thing.

      COMPILED VERSUS INTERPRETED COMPUTER PROGRAMMING LANGUAGES. In a COMPILED PROGRAMMING LANGUAGE, the entire program must be syntactically correct before the computer system will even execute the first statement in the program. In an INTERPRETED PROGRAMMING LANGUAGE, the computer system executes all valid statements in the program, until an error is encountered. The program stops, and the user is given some sense of where the error occurred, and why it stopped where it did. If the error is obvious, then one repairs the error and writes a few more lines of computer source code. If the error is not obvious, then one prints out the value of some key variables, right before the program terminates execution. Interpreted programming languages allow a more informal style of programming, but in the minds of some computer scientists, may lead to habits of sloppiness, that will be detrimental to a programmer's career in the future.

      ANOTHER WORD OF CAUTION: If you are writing a computer program for yourself, then all manner of sloppiness is permissible. However, many professional programmers operate as a part of a team, in which some other programmer on the team may have to dig into your source code when you are sleeping or out-of-town. For this reason alone, you need to develop a clear programming style.

      ALGORITHM is a stepwise process for converting a numerical input into a solution. The Latin word, ALGORITHMUS, is a corruption of the name of the eminent tenth-century Arabic mathematician, AL-KARIZMI. For example, the stepwise methods for MULTIPLICATION and LONG DIVISION are algorithms. When you realize that the Europeans employed Roman numerals until the fourteenth century (Arabic numerals were declared illegal in Florence in 1300), it is not difficult to realize why the Arabic mathematical culture advanced far ahead of the European culture in those days.

      PROCEDURAL VERSUS DECLARATIVE COMPUTER LANGUAGES. A PROCEDURAL COMPUTER LANGUAGE consists of a sequence of commands (PROCEDURES). The computer simply performs the commands in sequence, until it either reaches the end of the program, or else encounters an error (such as a command to divide by zero (Seife, 2000)). There is no checking by the computer system to determine whether one step in a sequence makes sense with respect to prior or subsequent steps.

      A DECLARATIVE COMPUTER LANGUAGE consists of a set of statements, DECLARATIONS, in no particular order. The declarations are assertions of truth, and order is irrelevant to the computer system, although it may help other programmers who are reading your computer code. If the logic instructions are provided in a different sequence, the resulting output should not be different. For a declarative computer language, such as PROLOG, the computer system goes to considerable length to verify that all the declarations are consistent with one another.

      An OBJECT-ORIENTED COMPUTER LANGUAGE consists of objects in relation to one another. These objects have PROPERTIES, ANCESTORS, and DESCENDANTS. If two different objects have common properties, then the property must only be specified once by the computer programmer, and all other instances of that property are INHERITED. This feature of object-oriented computer programming make it more efficient for some large programming projects.

      HEURISTIC ALGORITHM. A DETERMINISTIC ALGORITHM must produce a correct solution if the steps of the algorithm are faithfully performed. In many cases, no such perfect algorithm is known, so that one must resort to a HEURISTIC ALGORITHM, which seems like it should get the correct result, or close to the correct result, based upon experiments with a few examples, but not based upon mathematical proof.

      LOGIC PROGRAMMING. Typical computer programming involves a stepwise performance of instructions, or commands, supplied by the computer programmer. If the instructions are provided in a different sequence, the resulting output may be different. In LOGIC PROGRAMMING, the computer program consists of a series of logic instructions that are all TRUE, but are performed in no particular sequence with respect to one another. If the logic instructions are provided in a different sequence, the resulting output should not be different.

      ASSOCIATIVE COMPUTER MEMORY. In traditional computer memory, the computer system locates a particular item in its memory by a numeric location, such as ITEM(0), ITEM(1), ITEM(2), ITEM(3).... If the OFFSET for the list, ITEM, is, say, memory location 123456, then:
ITEM(0) is at location 123456;
ITEM(1) is at location 123457;
ITEM(2) is at location 123458;
ITEM(3) is at location 123459.....


      In associative computer memory, the computer system locates a particular item in its memory by the information in that location, such as ITEM(TOM), ITEM(DICK), ITEM(HARRY). In HOSPITAL INFORMATION SYSTEMS (HISs), such as used in the VA hospitals, the major programming language is MUMPS. MUMPS (MASSACHUSETTS UTILITY MULTI PROGRAMMING SYSTEM) employs an associative computer memory. Another computer programming language with limited associative computer memory capabilities is PERL (PRACTICAL EXTRACTION AND REPORTING LANGUAGE.

      AUGMENTED TRANSITION NETWORK is a mathematical representation of languages. An ARROW denotes a TRANSITION, which points from one part-of-speech in a sentence to the next one. For example, in an English sentence, PREPOSITION ==> NOUN is an allowable transition.

      BACKWARDS SEARCH; BACKWARD CHAINING. This is an AI technique in which you start from the end (GOAL STATE), and you work backward to the beginning (INITIAL STATE).

      MUMPS (Massachusetts-General-Hospital Utility Multi Programming System), also known as simply M, but still called MUMPS by old-timers. MUMPS is an interpreted computer language, developed in the late 1960s, for sharing computing and data resources between laboratories. Many of the important features in MUMPS have been overtaken by other computer languages, but some of the features are still unique to MUMPS or to very expensive database systems.

     
COMPUTERIZED GARBAGE COLLECTION. When you stop using a particular memory location in your computer, what actually happens is that all the pointers to that location are rerouted, so that the location is no longer used. This unused memory location becomes GARBAGE. When an entire block of memory (say, a thousand locations) becomes unused, then the block of memory is REPATRIATED, and reused for other purposes. An active computer program can entirely fill the memory with garbage in a few hours. The bookkeeping associated with unpointering and repatriating memory locations is considerable but not very glamorous, and is often assigned to inexperienced, entry-level programmers. Hospital information systems (HISs), which involve rapid movements of patients in, out, and around the institution, with continually updated status reports, etc., are always challenging their system's garbage collection algorithm. It is not difficult to do a lousy job with this detail. Many of the best garbage collection methods are trade secrets. As a result, many snazzy AI computer systems that work well in demonstration mode with small data sets become impossibly burdened with uncollected garbage in real-world applications. This has been the fate of many AI programs of yesteryear, and continues to be a problem even today.

      INTRODUCTION TO PERL (PRACTICAL EXTRACTION AND REPORTING LANGUAGE):
http://www.medparse.com/whatperl.htm
PERL is not necessarily the best language for AI, but it is the cheapest (cost-free), and the most widely available for different systems, including MS-DOS and LINUX. Because it is free, many computer programmers have contributed to the advancement of PERL, and the result is a very powerful computer language, with many of its serious bugs and deficiencies taken care of. You should have at least PERL VERSION FIVE for AI application programming.



6. TOOLS OF ARTIFICIAL INTELLIGENCE:
SYMBOLIC LOGIC. SET THEORY. INFERENCE.


GO TO NEXT PAGE.
GO TO PREVIOUS PAGE.
GO TO TABLE OF CONTENTS.

      AI: RELATIONSHIP TO SET THEORY, SYMBOLIC LOGIC. There is a superficial resemblance between AI production rules, the subset relation in set theory, and the IMPLIES operation in symbolic logic. This resemblance is useful for suggesting production rules to an AI architect, that otherwise may have been overlooked. However, set theory is largely devoted to the properties of calculus, topology, measure theory, and transfinite counting. The crossover between set theory and AI production rules is superficial at best. Likewise, symbolic logic is largely devoted to philosophical definitions of truth, which bear little immediate relation to the questions addressed in AI production rules.

       MATHEMATICAL LOGIC has its roots in Aristotle and in the European Middle Ages, but assumed its modern form in the hands of George Boole (17), who is rightfully memorialized in the term BOOLEAN SEARCHES used for searching the computerized medical literature.

       The MATHEMATICAL THEORY OF SETS was invented in the late nineteenth century, based upon concepts from mathematical logic, as a means for addressing some pressing philosophical issues in mathematics, such as the meaning of limits in calculus, and different classes of infinity. It has been suggested that these mathematical disciplines have potential serious applications in medicine, but so far the published examples (including in this report) are fairly trivial.

       A MATHEMATICAL SET, S, is any collection of objects for which it can be said that a particular object, s, either BELONGS TO S, (s is a member of S), denoted s C S, or DOES NOT BELONG TO S, (s is not a member of S), denoted s ~C S. One cannot simply collect a set of just anything. In particular, the SET OF ALL SETS is a famous paradox, whose recognition by a young Bertrand Russell nearly demolished the celebrated magnum opus and career of Prof. Gottlob Frege. THERE CANNOT BE A SET OF ALL SETS.

       A set may be characterized either as a listing of its actual members within curly brackets, {}, so-called roster notation, or by the method of its creation. For example, {2,3,5,7,11,13,17,19} and the SET OF ALL PRIMES LESS THAN 20 are the same set.

       The most important set of all is the NULL SET or EMPTY SET, i.e., the set that contains no members, usually denoted Ø, or {}.

       There are two important things to remember about a mathematical set (32,33):
A SET IS CHARACTERIZED entirely in terms of its members, i.e., EXTENSIONALLY. A set may NOT be uniquely characterized INTENSIONALLY, i.e., by the manner of its creation. If two sets are created differently, but end up with the same membership, then they are the same set. For example, the set of humans living on the moon in 1850 and the set of Chevrolets built during 1850 are the same set, namely, the null set, Ø.

A SET IS DIFFERENT FROM WHAT IT CONTAINS. that is, s and {s} are different. This property of sets leads to the Russell-Frege paradox: does the set of all sets belong to itself or not? (Is Epimenides a liar or not? (see Appendix C)). The Russell-Frege paradox can be resolved by defining two types of sets: ordinary sets and classes. This double definition involves a lot of extra mathematical bookkeeping.
Smith believes that this classical formulation for sets is fundamentally flawed for describing ontologies, and has proposed using an alternate formulation, known as MEREOLOGY.

       There are six commonly used concepts in set theory:
1. Set membership, denoted C . We say that s is a member of S, denoted C S; or s is not a member of S, denoted ~C S;

2. The empty set, Ø, the set containing no members. There exists no s such that s C Ø.

3. Set Union, U. The set A U B is the set of all members that belong either to set A or to set B or to both. Set union is analogous to inclusive-or (IOR) in first order propositional logic.

4. Set Intersection, /\. The set A /\ B is the set of all members that belong both to set A and to set B. Set intersection is analogous to logical AND in first order propositional logic.

5. Set Subtraction, -. The set A - B is the set of all members that belong to set A but NOT to set B. Set intersection is analogous to logical NOT in first order propositional logic.

6. Subset. A C B. We say that the set A C B, if the set of all members that belong to set A also belong to set B. Set intersection is analogous to IMPLIES in first order propositional logic.
      DEFAULT LOGIC. Production rules of the form: If A is true and B cannot be proved, then C is true. For example:
1. All crows can fly.
2. Jack is a crow.
3. What if Jack has a broken wing?
Obviously, statement #1 is not true all the time, but in many cases, one must REASON UNDER UNCERTAINTY, and some conclusion, even one that is occasionally wrong, is better than no conclusion. In default, or non-monotonic logic, one must have a contingency plan for rescuing the system when it encounters such predictable but unusual errors. In classical logic, the entire system collapses, which is usually not an acceptable result.

      NON-MONOTONIC LOGIC. Same as default logic.

      POLISH LOGIC NOTATION was invented by LUKASIEWICZ, an early twentieth century Polish mathematician. It is known among American speakers as Polish logic, since the name Lukasiewicz does not roll easily off the American tongue. Also known as PARENTHESIS-FREE LOGIC NOTATION, because the mathematical content of each logical combination may be specified entirely by the order of logic-terms and operators, without parentheses. For example, in ordinary arithmetic notation, multiplication (×) supersedes addition (+), so that
3+5×7 = 3+(5×7) = 38.
whereas
(3+5)×7 = 56.
In parenthesis-free notation, the operator is placed BEFORE NOT BETWEEN the two terms that it modifies. Therefore:
+3 × 5 7 = 38.
× +3 5 7 = 56.
In either case, no parentheses are needed. In fact, in Polish logic notation, an arbitrarily complex expression can always be expressed without parentheses. This notation is a great convenience in certain proofs and demonstrations in logic. The notation seems a little bit bizarre and unintuitive at first, but one can eventually become comfortable with it.

      MODAL LOGIC. Three-valued logic. Each variable may take the value: TRUE, FALSE, or POSSIBLY-TRUE. A fourth variable-value, NECESSARILY-TRUE, is defined as:
x is NECESSARILY-TRUE if and only if x is NOT-NECESSARILY-NOT-TRUE


      FUZZY LOGIC.

      MULTI-VALUED LOGIC was invented by LUKASIEWICZ, the early twentieth century Polish mathematician, who also invented parenthesis-free logic notation, known among American speakers as Polish logic notation.

      A BOOLEAN COMBINATION is a combination of logic statements (i.e., statements with a true/false value), with combining operators such as AND, OR, NOT, etc. For example: "Jack is a crow AND Jack has a broken wing." The name commemorates GEORGE BOOLE, a nineteenth century Irish logician and philosopher, who first understood clearly that the dominant or-operator in logic must be INCLUSIVE-OR, not EXCLUSIVE-OR, as believed by Leibniz two centuries earlier. Boole's magnum opus is THE LAWS OF THOUGHT.

      A BOOLEAN SEARCH is a search of documents, based upon combinations of search-terms connected by AND, OR, NOT, etc.



7. TOOLS OF ARTIFICIAL INTELLIGENCE:
COMPUTATIONAL LINGUISTICS.


GO TO NEXT PAGE.
GO TO PREVIOUS PAGE.
GO TO TABLE OF CONTENTS.

      COMPUTATIONAL LINGISTICS was invented as a modern intellectual discipline with the appearance of Noam Chomsky's thesis on the linguistics of Hebrew [Chomsky, 1949]. The central paradigm of computational linguistics is that all human languages may be represented as an interconnected collection of production rules [Bundy, 1997], in so-called Backus Naur Form (BNF), originally used by computer scientists to describe computer languages, such as ALGOL [Naur, 1960]. The development of this field over the past half-century has been dominated by Chomsky's intellectual leadership and academic trends at large universities with computational linguistics departments. The mainstream computational linguistics literature devotes an inordinate amount of attention to interpretation of peculiar natural-language sentences, that would appear strange even to a native speaker and incomprehensible to a non-native speaker (Newmeyer, 1996). By contrast, scant attention has been paid to the quantitative and statistical behavior of technically precise sentences, say, in surgical pathology free-text, which employ a restricted vocabulary, and are intended to be unambiguous in written form, even when read by an adversary (i.e., the proverbial plaintiff's pathologist in a malpractice action). The negative impact of these general trends in computational linguistics has been summarized by Nagao [1992], who has devoted his career to machine translation of narratives in electrical engineering from Japanese-to-English.

      In computational linguistics, the SOURCE LANGUAGE is the language from which the translation is being made; the TARGET LANGUAGE is the language into which the translation is being made; In many medical AI applications, the target language is a coding language, such as SNOMED or UMLS.

      BARRIER WORDS, or STOP WORDS, are commonly-occurring, typically short (up to five letters) words in English, that have low information content in medicine. They include articles, conjunctions, interrogatives, complementizers, interrogatives, prepositions, pronouns, auxiliary verbs, etc. Punctuation marks are operationally defined as barrier words. Most high-frequency Zipf words are barrier words. Barrier words typically express grammatical relationships, as in "adenocarcinoma of..." or "metastastic to...."

      The BARRIER WORD METHOD exploits the fact that barrier words (low-information words) serve as separators, or BARRIERS between words in a multiple-word medical or technical term. Multiple-word terms, or COLLOCATIONS, in medical or technical free-text (e.g., TUBULAR ADENOMA , PROSTATIC ADENOCARCINOMA ), are typically bounded on either side by barrier words. Therefore, a collocation may be discovered at first encounter by obtaining all word-sequences bounded on either side by barrier words. In the following example (U.S. Bill of Rights), the barrier words are shown in lower case, and the COLLOCATIONS are shown in upper case:


      AMENDMENT 1 .       CONGRESS shall make no LAW respecting an ESTABLISHMENT of RELIGION, or PROHIBITING the FREE EXERCISE thereof; or ABRIDGING the FREEDOM of SPEECH, or of the PRESS; or the RIGHT of the PEOPLE peaceably to ASSEMBLE, and to PETITION the GOVERNMENT for a REDRESS of GRIEVANCES.

      AMENDMENT 2 .       A well REGULATED MILITIA , being NECESSARY to the SECURITY of a FREE STATE , the RIGHT of the PEOPLE to keep and bear ARMS , shall not be INFRINGED.

      AMENDMENT 3 .       No SOLDIER shall, in TIME of PEACE be QUARTERED in any HOUSE , without the CONSENT of the OWNER, nor in TIME of WAR , but in a MANNER to be PRESCRIBED by LAW .

      AMENDMENT 4 .       the RIGHT of the PEOPLE to be SECURE in their PERSONS , HOUSES , PAPERS , and EFFECTS , against UNREASONABLE SEARCHES and SEIZURES , shall not be VIOLATED , and no WARRANTS shall ISSUE , but upon PROBABLE CAUSE , supported by OATH or AFFIRMATION , and particularly describing the PLACE to be searched , and the PERSONS or things to be seized .

      AMENDMENT 5 .       no PERSON shall be held to ANSWER for a CAPITAL , or otherwise INFAMOUS CRIME , unless on a PRESENTMENT or INDICTMENT of a GRAND JURY , except in cases arising in the LAND or NAVAL FORCES , or in the MILITIA , when in ACTUAL SERVICE in time of WAR or PUBLIC DANGER ; nor shall any PERSON be SUBJECT for the same OFFENCE to be twice put in JEOPARDY of LIFE or LIMB ; nor shall be COMPELLED in any CRIMINAL CASE to be a WITNESS against himself , nor be deprived of LIFE , LIBERTY , or PROPERTY , without DUE PROCESS of LAW ; nor shall PRIVATE PROPERTY be taken for PUBLIC USE , without JUST COMPENSATION .

      AMENDMENT 6 .       In all CRIMINAL PROSECUTIONS , the ACCUSED shall ENJOY the RIGHT to a SPEEDY and PUBLIC TRIAL , by an IMPARTIAL JURY of the STATE and DISTRICT wherein the CRIME shall have been COMMITTED , which DISTRICT shall have been previously ascertained by LAW , and to be informed of the NATURE and CAUSE of the ACCUSATION ; to be confronted with the WITNESSES against him ; to have COMPULSORY PROCESS for obtaining WITNESSES in his FAVOR , and to have the ASSISTANCE of COUNSEL for his DEFENCE .

      AMENDMENT 7 .       in SUITS at COMMON LAW , where the VALUE in CONTROVERSY shall exceed twenty DOLLARS , the RIGHT of TRIAL by JURY shall be PRESERVED, and no FACT tried by a JURY , shall be otherwise REEXAMINED in any COURT of the UNITED STATES , than according to the RULES of the COMMON LAW .

      AMENDMENT 8 .       EXCESSIVE BAIL shall not be REQUIRED , nor EXCESSIVE FINES imposed , nor CRUEL and UNUSUAL PUNISHMENTS inflicted .

      AMENDMENT 9 .       the ENUMERATION in the CONSTITUTION , of CERTAIN RIGHTS , shall not be CONSTRUED to DENY or DISPARAGE others RETAINED by the PEOPLE .

      AMENDMENT 10 .       the POWERS not DELEGATED to the UNITED STATES by the CONSTITUTION , nor PROHIBITED by it to the STATES , are RESERVED to the STATES respectively, or to the PEOPLE .
The following legal concepts (collocations), harvested by the barrier word method, are familiar to every adult U. S. citizen with a reasonable knowledge of our national values:
FREE EXERCISE
REGULATED MILITIA
FREE STATE
UNREASONABLE SEARCHES
PROBABLE CAUSE
INFAMOUS CRIME
GRAND JURY
NAVAL FORCES
ACTUAL SERVICE
PUBLIC DANGER
CRIMINAL CASE
DUE PROCESS
PRIVATE PROPERTY
PUBLIC USE
JUST COMPENSATION
CRIMINAL PROSECUTIONS
PUBLIC TRIAL
IMPARTIAL JURY
COMPULSORY PROCESS
UNITED STATES
COMMON LAW
EXCESSIVE BAIL
EXCESSIVE FINES
UNUSUAL PUNISHMENTS
CERTAIN RIGHTS
UNITED STATES


      CONTEXT FREE GRAMMAR. A formal grammar, used in AI computer translation programs, in which each assertion in the grammar is valid, regardless of any other grammatical assertion. Natural languages are not context-free grammars, but most computer languages, such as PERL, MUMPS, C, BASIC, etc., are. The context-free formalism allows simple parsers to be built for computer programming languages.

      CONTEXT DEPENDENT GRAMMAR. A formal grammar, used in AI computer translation programs that attempt to interpret human natural languages, in which any assertion in the grammar might be invalid, depending upon other assertions in the grammar. Even with this loosened constraint, most context dependent grammars are not models for human natural languages, because human natural languages are often grammatically inconsistent. The MOORE ET AL PARSER is a context-dependent grammar, under the constraint that dependencies in the grammar are HIERARCHICAL.

      NATURAL LANGUAGE is a language spoken by educated (human) speakers in a particular language domain, such as English, French, Turkish, or Chinese. While most natural languages are described by approximate grammars, these grammars typically have exceptions in actual usage, and are often context-dependent and highly complicated if described accurately. Inevitably, natural languages have ambiguities. Because of these features, natural languages are not appropriate tools for writing computer programs. On the other hand, in part because of these features, natural languages may serve as rich sources of creativity and discovery in linguistics.

      PROGRAMMING LANGUAGE is a language used by trained (human) technicians to communicate instructions for execution by a computer program. Typical computer languages are context-free, declarative, and compiled (q.v.). A list of words in descending order of word-frequency is called a ZIPF DISTRIBUTION. The most-frequent word has rank one; the second-most-frequent word has rank two, etc. According to Zipf's Law, rank, r, is inversely proportional to frequency, f (Zipf, 1949). Although Zipf's Law is inexact in detail (Mandelbrot, 1954; Moore et al, 1988), it encapsulates the general truth that a few hundred high-frequency words account for over half the word-occurrences in a large text-corpus. These high-frequency words are typically articles, conjunctions, complementizers, interrogatives, prepositions, pronouns, auxiliary verbs, etc., and serve as boundaries, or "barriers," on either side of medically-significant words, or "keywords" (Tersmette et al, 1988; Moore et al, 1989; Nelson et al, 1995; Wilbur, 2000). The Zipf Distribution for a large surgical pathology database is shown at URL:
http://www.netautopsy.org/vhpsapsx.htm
The German-language Zipf Distribution for the GOETHE UNIVERSITY AUTOPSY REGISTER is shown at URL:
http://www.medparse.com/guarzipf.htm
ZIPF'S LAW formalizes the observation that the high-frequency words in a large free-text document, or text-corpus, are extremely common (Zipf, 1949; Mandelbrot, 1954; Fedorowicz, 1982; Giere, 1981; Zhang, 1981; Moore et al, 1988; Moore et al, 1989). The frequency of words in a free-text document may be ranked, with rank=1 for the most frequent word, rank=2 for the second-most frequent word, etc. According to Zipf's Law, word-rank is inversely proportional to word-frequency. That is, for some constant, k, r = k/f, for r=rank and f=frequency. Thus in a large text-corpus (over a million words), approximately one hundred barrier words (ranks 1 through 100) account for over half the word-occurrences in the document.

      TRANSFER LANGUAGE is an idea from computational linguistics. In this paradigm, the SOURCE LANGUAGE is first translated into an intermediate, or transfer language, from which it is translated into the TARGET LANGUAGE. This idea was popular in the 1960s, in an era in which computerized translators were envisioned for multilingual groups, such as the European community, with about n=15 languages. In theory, there would only need to be a computer translator INTO and OUT OF each natural language and the transfer language. In practice, the transfer languages were so conceptually difficult that it was impractical to find persons who could maintain the dictionaries and grammars in and out of the transfer language.



8. TOOLS OF ARTIFICIAL INTELLIGENCE:
CALCULUS.


GO TO NEXT PAGE.
GO TO PREVIOUS PAGE.
GO TO TABLE OF CONTENTS.

      INTEGRAL AND DIFFERENTIAL CALCULUS, typically known simply as CALCULUS, is a set of mathematical methods used to approximate values of slopes and areas under curves (mathematical functions), such as parabolas, circles, ellipses, etc. Co-invented in the seventeenth century by an English physicist, SIR ISAAC NEWTON, and a German philosopher, GOTTFRIED LEIBNIZ, calculus has largely been applied to physics and engineering problems (Courant et al, 1996). However, calculus is also useful tool for a range of curve-fitting and statistics problems, and for this reason should be studied by biomedical specialists as well. In the sample problem shown herein, calculus is used to estimate the second derivative for a series of measurements taken from the vertebral column, and from this the curvature (kyphosis, lordosis, etc.).

      DIFFERENTIAL CALCULUS comprises the mathematical methods used to approximate the values of SLOPES of mathematical functions. Originally known as FLUXIONS, the study of differential calculus was already well-developed before the time of Newton and Leibniz. Differential calculus is, by far, the less difficult of the two great branches of calculus, and is employed in many statistical applications.

      INTEGRAL CALCULUS comprises the mathematical methods used to approximate the values of AREAS under mathematical functions. Originally known as QUADRATURES, the study of integral calculus preceded Newton and Leibniz. The great contribution of Newton and Leibniz was the clear understanding that the two operations are the inverse of one another. This understanding allowed formulas to be exchanged between these two great branches of calculus, and thus enriched our knowledge of both. The present document covers possible biomedical applications of elementary calculus.

      The basic ideas of elementary calculus apply to a CURVE, or MATHEMATICAL FUNCTION, lying on the xy plane. We employ the ordinary Cartesian plane of analytic geometry, with the angles, distances, etc., obeying the usual rules of Euclidean geometry. More advanced ideas in calculus involve multiple dimensions, and different rules of angles and distances.

      A MATHEMATICAL FUNCTION, is defined as a relationship between x and y on the xy plane, in which every x corresponds to ONE AND ONLY ONE VALUE OF y. A LIMIT in calculus is the destination-value for a sequence of values. A DERIVATIVE, or DIFFERENTIAL, for a mathematical function (curve) in calculus is the instantaneous slope of that curve. An INTEGRAL, or ANTIDERIVATIVE, for a mathematical function in calculus is the area under that function. The FUNDAMENTAL THEOREM OF INTEGRAL AND DIFFERENTIAL CALCULUS is the assertion derivative and integral are inverse processes. In this minitutorial, we shall prove the fundamental theorem for so-called well-behaved functions (continuous and everywhere differentiable).

      A LIMIT in calculus is the destination-value for a sequence of values. We write:
Limx -> c f(x) = L.
to denote the
L = LIMIT OF f(x) AS x APPROACHES c.
This means that the function, f(x), gets closer-and-closer to L, as x gets closer-and-closer to c. As long as x-getting-closer-and-closer-to-c doesn't involve dividing by zero, then typically L = f(c). The limit process gets much tricker if a division by zero (or division by a very small number) is involved in calculating L. Unfortunately, taking a derivative involves just such a process.

      In an extension of the original definition of limits, we can speak of f(x) getting closer-and-closer to L as x gallops off to infinity:
Limx -> infinity f(x) = L.
denotes the
L = LIMIT OF f(x) AS x APPROACHES INFINITY.


      The vagaries of DIVISION BY ZERO befuddled our eighteenth century predecessors, and eventually led to the WEIERSTRASS DEFINITION OF LIMIT, in which the expression:
L = LIMIT OF f(x) AS x APPROACHES c
means that for every epsilon>0, and for every |x-c|<epsilon, there exists a delta>0 such that |f(x)-L|<delta. The analogous Weierstrass definition for x approaching infinity is:
Limx -> infinity f(x) = LIMIT OF f(x) AS x APPROACHES INFINITY.
means that for every N>0, and for every x>N, there exists a delta>0 such that |f(x)-L|<delta. Proofs in calculus are then reduced tofinding a formula for calculating delta, given a value of epsilon or N.

      The ancient Greek mathematician, ARCHIMEDES, came very close to the idea of limits in his book entitled, THE SAND RECKONER, which includes an attempt to calculate the circumference of a circle, namely, pi × diameter, using a series of N-sided, regular polygons, as N gets very large (Archimedes, 1939).

      DIVISION-BY-ZERO. Everybody knows that you shouldn't divide by zero, but do you know why? The answer is that DIVISION IS DEFINED IN TERMS OF MULTIPLICATION. That is, when you write x = z / y, then you are actually asking what x is it, for which
x × y = z.
For y=0, the answer is in two parts: IF y=0 AND z~=0, THEN THERE IS NO SUCH x. On the other hand, IF y=0 AND z=0, THEN EVERY POSSIBLE x SATISFIES THE MULTIPLICATION (Seife, 2000). Thus, z/0 is NONEXISTENT for z~=0; whereas 0/0 is ANYTHING. Unfortunately, taking a derivative in elementary calculus involves getting perilously close to division by zero (vide infra). This single fact is why reasoning in calculus is fundamentally more difficult than reasoning in algebra. In calculus, you must ALWAYS PAY CLOSE ATTENTION!

      There are several fool's proofs in the popular mathematics literature, purporting to demonstrate that 1 = 2 or that 2+2 = 5. In both cases, the proof works by setting x=0, and distracting the reader not to pay attention when one divides by x (Asimov, Realm of Numbers; Singh, Fermat's Enigma).

      A DERIVATIVE, or DIFFERENTIAL for a mathematical function (curve) in calculus is the instantaneous slope of that curve. If the slope is going upward, then the derivative is positive; if the slope is going downward, then the derivative is negative; if the slope is exactly horizontal, then the derivative is zero. For a function, f(x), its FIRST DERIVATIVE may be defined as
Limh->0(f(x+h)-f(x))/h = L.
Think of a curve, f(x), and nudge x forward by a small amount, h. Then (f(x+h)-f(x)) is the amount that f() moves upward, while x advances by h. As h approaches zero, the (f(x+h)-f(x))/h becomes the slope of f(x).

      Trouble is, obtaining L by substituting 0 for h would involve dividing by zero, and yielding the answer 0/0. The way to avoid this paradox is to get h out of the denominator before completing the calculation. For example, let f(x) = x2. Then:
L = Limh->0((x+h)2-x2)/h.
or:
L = Limh->0(x2+2xh+h2-x2)/h.
or:
L = Limh->0(2x+h).
Now, let h=0 and complete the calculation, so that L=2x.

      In general, the first derivative, L, for xk is kxk-1



9. TOOLS OF ARTIFICIAL INTELLIGENCE:
PROBABILITY AND STATISTICS.


GO TO NEXT PAGE.
GO TO PREVIOUS PAGE.
GO TO TABLE OF CONTENTS.

      STATISTICS is a class of methodologies, which can be used to test and verify the results of AI investigations. The two pillars of statistics are: STATISTICAL ESTIMATION and STATISTICAL HYPOTHESIS TESTING. The major topics in elementary statistics are:
SAMPLE EXPERIMENT.
ROLE OF STATISTICS IN EVALUATING RESEARCH RESULTS.
PROBLEM OF VARIABILITY.
POPULATION VERSUS SAMPLES.
HYPOTHESIS TESTING.
DECISION THEORY.
ANALYSIS OF VARIANCE.
CONTINGENCY TABLES.
CORRELATION.
MULTIVARIATE ANALYSIS.


      SAMPLE EXPERIMENT. There is a need for statistics in biomedicine when you have performed a well-designed experiment, and the RESULTS ARE VARIABLE. Borrowing upon unfortunate metaphors from physics, this variability was traditionally attributed to EXPERIMENTAL ERROR, when the measuring instruments were often good enough to prevent this, and the real error was an inadequate understanding of the underlying biological process. In any event, biostatistics has come to be used as a tool for dealing with the variance between results predicted by theory and actual observations in biological systems.

      ROLE OF STATISTICS IN EVALUATING RESEARCH RESULTS. Generally speaking, statistics in biomedicine evaluates two features of repetitive observations with variability: CENTRAL TENDENCY and DISPERSION. Another major role of statistics in exploiting variable data is: HYPOTHESIS TESTING. Is the variability small enough that one can still draw scientific conclusions?

      Before you move too far into statistics software: LOOK AT YOUR DATA. Graph out your results (many nice computer systems are available for this), and think about what your data are telling you.

      PROBLEM OF VARIABILITY.

      POPULATION VERSUS SAMPLES. In statistics, a POPULATION is everything there is; whereas SAMPLES are repetitive drawings from the population. The purpose of statistics is to infer properties of the population from repetitively drawn samples.

      HYPOTHESIS TESTING is the determination of likelihood that a particular statement is true. Generally, hypothesis-testing arguments are framed in terms of whether a particular variable assumes a value of zero or not, such as whether the difference of two means is zero (STUDENT t-TEST), or whether the slope of a linear regression is zero (PEARSON'S CORRELATION COEFFICIENT).

      ESTIMATION.

      DECISION THEORY.

      ANALYSIS OF VARIANCE.

      CONTINGENCY TABLES.

      CORRELATION.

      MULTIVARIATE ANALYSIS.



10. APPLICATIONS OF ARTIFICIAL INTELLIGENCE. COMPUTATIONAL COMPLEXITY.


GO TO NEXT PAGE.
GO TO PREVIOUS PAGE.
GO TO TABLE OF CONTENTS.

      COMPUTATIONAL COMPLEXITY is one of the hot areas of research in theoretical computer science, and impacts upon the performance of nearly all practical, large-scale computer systems, including most serious AI systems. Computational complexity affects scheduling systems, travel planning systems, package design, distribution and routing of shipped goods, chess-playing and other gaming systems, computer circuit design, hospital inforamtion systems, financial management systems, and many other fields. The only areas not impacted by computational complexity considerations are problems so small that they have no significant computing requirements, or problems so large that they have no practical solution, and are discussed only on a theoretical level.

      Despite all the advances in computer speed over the past several decades, there are still some computational problems beyond the reach of today's or even tomorrow's computers. There are, in fact, some problems that can NEVER be solved! (So-called Gödel problems, mostly of interest to theoretical mathematicians.)

      Many AI computing algorithms can be viewed as a set of operations applied to a LIST OF n OBJECTS. When the list is short (i.e., n is small), the method of computation doesn't matter very much. For example, anyone can sort a list of 100 social security numbers in a few minutes, and it is easy to write a computer program that does the same thing. However, to sort the 300,000,000 social security numbers in the USA, different computational methods can have different computational times in many orders of magnitude, the difference between a few minutes and days, months, or years. Failure to grasp this problem has led to the failure of many laboratory information systems that worked well in demonstration form on small data sets, and the subsequent dismissal of pathologists, laboratory managers, computing center directors, etc.

      An AI algorithm may involve SEEKING one out of a list of n objects in the computer memory; SORTING a list of n objects; PACKING n objects into the smallest packing carton; CONNECTING n objects in a computer circuit with the least amount of wire; ROUTING a person, airplane, train, or truck, among n stations, etc. Any bright high school student with an interest in mathematics could develop a PLAN for solving such problems. The difficulty is whether the plan is efficient, i.e., whether there is a different plan that could solve the same problem in significantly fewer steps. A system that requires n steps to solve a problem with n objects has a LINEAR COMPUTING COMPLEXITY. In the worst case, the seeking problem is linear, beacuse one might have to go all the way to the end of the list of n objects in order to locate the desired object.

      Actually, if the n-list is PRESORTED, then a system can be designed to locate any object in log2n (log-n-to-the-base-two) steps We say that x = log2n when 2x=n, so that
log22 = 1;
log24 = 2;
log28 = 3;
log216 = 4;
log232 = 5....
For SEEKING from a list of 32 objects, one is talking about the difference between n=32 steps (linear) and log232 = 5 steps (logarithmic). For seeking from a list of 300,000,000 social security numbers, from a list of 32 objects, one is talking about the difference between n=300,000,000 steps (linear) and log2536,870,912 = 29 steps (logarithmic). Obviously, for large, practical problems, computational complexity methods can assume great importance.

      Computer scientists classify computational complexity algorithms LOG, LINEAR, LOG-LINEAR, QUADRATIC, POLYNOMIAL, EXPONENTIAL, SUPER-EXPONENTIAL, INSOLUBLE. That is:
(1) LOG, soluble in LOGn steps. For example, if you wish to FETCH a single record from a SORTED LIST of n records, then this can be accomplished in LOGn steps. The easiest way to understand this concept is to consider LOG2n steps, i.e., logarithm-to-the-base-2. Now suppose that you have a sorted list of 16 names, and you are looking for HARRY. In the first step, split the list in half. Harry belongs to the first half-list.
 Abner
 Bill
 Charley
 David
 Edward
 Frank
 George
 Harry
_________________
 Ike
 John
 Karl
 Larry
 Milton
 Norman
 Oliver
 Patrick
In the second step, split the remaining list in half. Harry belongs to the second half-list.
 Abner
 Bill
 Charley
 David
_________________
 Edward
 Frank
 George
 Harry
In the third step, split the remaining list in half. Harry belongs to the second half-list.
 Edward
 Frank
_________________
 George
 Harry
In the fourth step, split the remaining list in half. Harry belongs to the second half-list.
 George
_________________
 Harry
 
That is, LOG216 = 4 steps, compared to 16 steps for an unsorted list.

      If you think that it's silly to go to all this mathematical sophistication to fetch from sorted lists, consider this: the U. S. Social Security Administration has a list of some 300,000,000 social security numbers, that it must, in principle, be able to fetch. Since 228=268,435,456 and 229=536,870,912, we are talking about the difference between 29 steps and 300,000,000 steps. Suppose that the computer software can perform 1000 steps per second. Then we are talking about the difference between 0.03 seconds and 300,000 seconds = 5,000 minutes = 83 hours = 3.5 days.

      Likewise, a clinical pathology laboratory in a moderate-sized community hospital (250 beds) performs about 1,000,000 laboratory tests annually, not to mention a total patient census of, perhaps, 300,000, plus inpatient-admission-dates, encounter-dates, outpatient-visit-dates,..... Since 210=1,048,576, we are talking about the difference between 10 steps and 1,000,000 steps, 0.01 seconds and 1,000 seconds= 16 minutes. Perhaps not much, but these resources add up quickly, and there is no excuse for wasting resources when good solutions are already known.

(2) LINEAR,
soluble in n steps. For example, if you wish to FETCH a single record from an UNSORTED LIST of n records, then this can be accomplished in n steps. In the worst case, the algorithm must examine every single record before it hits the desired record.

(3) LOG-LINEAR,
soluble in nLOGn steps. For example, if you wish to SORT a LIST of n records, all the quality algorithms (QUICKSORT, HEAPSORT, etc.) employ a variant of the list-division method described above.

(3) QUADRATIC,
soluble in n2 steps.

(4) POLYNOMIAL,
soluble in nk steps, where k is a fixed integer, hopefully less than 4 or 5.

(5) NP-COMPLETE,
mathematically uncertain, but at best polynomial and at worst exponential. There are many interesting problems in AI involving decision trees and routing diagrams, that fall into this category.

(6) EXPONENTIAL,
soluble in 2n steps.

(7) SUPER-EXPONENTIAL,
soluble only after greater than 2n steps. Includes Pressburger algebra.

(8) INSOLUBLE,
not soluble in an infinite number of steps. Includes such Gödel-unprovable statements as: the consistency of ordinary arithmetic; the Generalized Continuum Hypothesis; and the Axiom of Choice.
Unfortunately, there are many interesting problems in AI, involving decision trees and routing diagrams, that are NP Complete. Although the status of the NP Complete problem is currently uncertain, despite the depradations of numerous brilliant and determined mathematicians over the past several decades, even if it is solved and turns out to be exponential, then this solution offers only despair to hopeful AIers.

      If, as I believe, the status of the NP Complete problem will remain either uncertain or solved in the pejorative direction in the foreseeable future, the mandate to the AI community is the same: large computational problems must be reduced in magnitude, based in part upon reasonable assumptions from the specific fields that generated the problems.

      NON-POLYNOMIAL COMPLETE (NP COMPLETE) problems have an unknown computational complexity. See:
COMPUTATIONAL COMPLEXITY. That is, an NPC problem is at least polynomial-complete, and at most exponential-complete. These problems form the most interesting class of computational complexity problems in AI. This problem-class is not known to REQUIRE an exponential algorithm, but there are no KNOWN polynomial algorithms for them. Furthermore, if you know an algorithm for solving one of the NPC problem, then this algorithm is MATHEMATICALLY CONVERTIBLE into solutions for all other NPC problems. If you solve one, then you get solutions for all others for free! Problem is, many of the finest mathematicians of the twentieth century have tried and failed at the NPC problem

      There are many interesting problems in AI, including scheduling systems, travel planning systems, package design, distribution and routing of shipped goods, chess-playing and other gaming systems, computer circuit design, hospital information systems, financial management systems, decision trees, and routing diagrams, that are NPC.

      Now, Sir Andrew Wiles, the Princeton/Cambridge mathematics professor, who recently solved FERMAT'S LAST THEOREM (Singh, 1997), an open problem in mathematics for the past 350 years, has applied himself to the NPC problem. Good luck, Sir Andrew!

      As a parting thought, suppose that Sir Andrew finds a practical, polynomial solution to the NPC problem. Most of the world's electronic financial transactions are currently based upon ENCRYPTION ALGORITHMS, that depend upon the NONEXISTENCE of a polynomial solution to the NPC problem. That is, factorization according to the SIEVE OF ERATOSTHENES is an NPC problem. If a polynomial solution were suddenly found, then new encryption software would have to be written, and in the meantime, worldwide electronic financial transactions would come to a halt.



11. APPLICATIONS OF ARTIFICIAL INTELLIGENCE. NUMBER THEORY.


GO TO NEXT PAGE.
GO TO PREVIOUS PAGE.
GO TO TABLE OF CONTENTS.

      EUCLID'S FACTORIZATION, is a method known to the ancient Greek mathematician most recognized for his ELEMENTS OF PLANE GEOMETRY. Euclid stated and proved the assertion that every whole number consists of a UNIQUE PRODUCT OF PRIME NUMBERS.

      MOD ARITHMETIC (also: MODULO ARITHMETIC or MODULUS ARITHMETIC) is sometimes known as CLOCK ARITHMETIC, because the hour hand on an analog 12-hour clock is a MOD 12 DEVICE. That is, the hour hand cycles every 12 hours. (Strictly speaking, a MOD 12 DEVICE should cycle from 0 to 11.) Therefore:
mod120 = 0
mod121 = 1
mod122 = 2
mod123 = 3 .........
mod1212 = 0
mod1213 = 1
mod1214 = 2
mod1215 = 3


      CHINESE REMAINDER THEOREM.

      NUMBER THEORY.



12. ARTIFICIAL INTELLIGENCE ONTOLOGIES.


GO TO NEXT PAGE.
GO TO PREVIOUS PAGE.
GO TO TABLE OF CONTENTS.

      An ONTOLOGY is a (Platonic) description of essential reality, i.e., what actually is, as opposed to what one can see (observation, accident), or what one can know (epistemiology) (Smith, 2001). The term ontology was coined by two German philosophers, Göckel and Lorhard, in 1613, and first appeared in English in 1721. Quine (19xx) views ontology as the metaphysical commitments or presuppositions embodied in the different natural sciences. For example, the belief that a cancer can metastasize would be an ONTOLOGICAL COMMITMENT. In the philosophy and practice of science, ontology goes under various names: essence, reality, Mind of God, nature, gold standard, or mathiverse (Stewart, 2000). In medical informatics, ontology has come to mean a structured list of concepts, typically prepared by an expert or panel of experts.

      With the ease of posting structured lists on the Internet, and with EXTENDED MARKUP LANGUAGE (XML) as an emerging standard for such lists, it is likely that the next decade will witness an explosion of public medical ontologies, both amateur and professional.

      The importance of ontologies has been recognized by the U. S. Defense Advanced Research Projects Agency (DARPA), the original sponsor of the Internet, which has proposed guidelines for a formal ontology AGENT MARKUP LANGUAGE, that employs the ONTOLOGY INFERENCE LAYER.

      4. A simple ontology is illustrated by the observation at autopsy that CHRONIC-PASSIVE-CONGESTION-LIVER often accompanies HEART HYPERTROPHY. In an approximate sense, HH causes CPCL. Thus, in a large case series, you would expect to find that most cases have neither; a moderate number of cases have both; Some cases have HH but don't have CPCL yet; and very few cases should have CPCL without also having HH. Such statistics could be collected from public medical data sources, in order to build ontologies.

      6. These correlations (2x2 CONTINGENCY TABLES), could be edited for redundancy and nonsense correlations.

      7. As necessary, a collection of such 2x2 contingency tables could be ORDERED BY IMPORTANCE, based upon frequencies.



13. TOPICS IN ARTIFICIAL INTELLIGENCE.


GO TO NEXT PAGE.
GO TO PREVIOUS PAGE.
GO TO TABLE OF CONTENTS.

      CLASSES OF AI INVESTIGATION.
Inference and Reasoning.
Knowledge Representation.
Logic Programming.
Natural Language
Programming Languages.
Game Theory.
Pattern recognition, Image processing.
Robotics.
Search.
Theorem Proving.
Vision.


      INFERENCE AND REASONING. INFERENCE is the process of drawing conclusions from given data, by computer. Most INFERENCE ENGINES work by combining two appropriate formulas, such as: "A IMPLIES B"; and "B IMPLIES C". By the MODUS PONENS rule of inference, we may conclude (infer) that "A IMPLIES C";

      KNOWLEDGE REPRESENTATION is a mathematical notation for representing specific knowledge in medicine, typically qualitative information. For example:
MR._JONES IS_A MALE.

MR._JONES HAS_AN ENLARGED_LIVER.

ENLARGED_LIVER HAS_DIFFERENTIAL_DIAGNOSIS METASTATIC_TUMOR, HEPATOCELLULAR_CARCINOMA, AMYLOIDOSIS,....


      GAME THEORY is the theory of playing games. A suitable game must have UNAMBIGUOUS RULES, an END-POINT that specifies when the game is over, and a scoring system to determine WINNER, LOSER, AND TIE-GAME. The simplest game structure is a ZERO-SUM-GAME, in which the number of points gained by the winner equals the number of points lost by the loser, so that the sum of points at the end-point is ZERO. In a zero-sum-game, the optimal strategy is for the would-be winner to take points away from the loser.

      In a NEGATIVE-SUM-GAME, both sides lose, and the optimal strategy is for the would-be winner to lose fewer points than the loser. A negative-sum-game-winner goes by the name of PYRRHIC VICTORY, after the ancient Greek general, PYRRHUS, who won a great military battle only after losing most of his army (Livius, 193x). Many pacifists contend that most wars are negative-sum-games.

      In a POSITIVE-SUM-GAME, both sides win, and the optimal strategy is for the would-be winner to win more points for the loser. In biology, the name for this relationship is SYMBIOSIS.

      PATTERN RECOGNITION, IMAGE PROCESSING. Computerized PATTERN RECOGNITION is the ability to locate areas on an image, associated with informative scene events. While AI machines are better at recognizing colors and light intensities than humans, humans are particularly adept at recognizing edges, a fact that has limited the value of AI machines in reading radiographs (xrays) or pathology slides.

      THEOREM PROVING is a class of AI programs that attempt to prove mathematical theorems. These AI programs have the same problems that any human mathematician has (e.g., computational complexity, Gödel unprovability), as well as a lack of imagination, which human mathematicians have. Theorem proving programs have largely been used to verify proofs for theorems that are already believed to be true.

      VISION. is the process of assembling small units of light and color in an image or scene, and drawing conclusions about the content of the image, such as "is a traffic stoplight" or "is an apple".

      BAYES' THEOREM, invented by eighteenth century Anglican clergyman, THE REV. THEODORE BAYES, provides a formula for estimating the probability of compound events, based upon the probability of individual, contributing events. For example, the probability that a tobacco smoker will die of lung cancer, given probabilities of being a tobacco smoker and the general probability of dying from lung cancer. Bayes' formula was once held to be an important method for structuring medical information in medical expert systems. However, it has generally fallen into desuetude, partly because of the following flaws:
Required probabilistic independence of component probabilities. In many cases, there is a medical relationship between different components (parts) of a system of diseases.

Required knowledge of component probabilities. The probabilities for most diseases are unknown.

Requirement that all possible disease-categories be known in advance. There is no mechanism for DETECTING OR ADDING A NEW DISEASE, if the existing list of diseases appears unlikely.


      HIGH-ORDER PATTERN RECOGNITION is a term generally applied to GESTALT PERCEPTIONS, which seem easy for humans to perform, but challenging at least for today's AI machines. AI machines are adept at performing repetitive, methodical calculations, but cannot perceive the "big picture" in a way that humans can. Since many operations demanded of AI machines are of a Gestalt nature, such as recognizing a traffic signal or a lymphocyte, whose individual steps are not understood (if they exist at all), this limitation of AI machines is a significant disadvantage.

      REVERSE BACKUS NAUR FORM PARSING. Computer translation may be regarded as a branch of artificial intelligence [Bundy, 1997]. The fundamental operation of artificial intelligence is the production rule, X ==> Y, read, "X produces Y." Artificial intelligence production rules are akin to if-then statements in symbolic logic ( X ==> Y, read, "if X then Y") [Lewis and Langford, 1959; Suppes, 1957; Tymoczko, 1998]; and to set-inclusion statements in mathematical set theory (Y c X, read, "X includes Y") [Suppes, 1972; Tymoczko, 1998]; BNF is a common notation in computational linguistics [Aitchison, 2000; Naur, 1960], initially developed for designing computer languages, such as ALGOL, but eventually employed for parsing well-controlled free-text English, and for translating this parsed text into structures with common data elements. For example:
  1. [] ==> [Nx]  
  2. [Nx] ==> [N]  
  3. [Nx] ==> [AN] 
  4. [Nx] ==> [NPN]  
where []=null-sentence, Nx=noun-phrase, N=noun, P=preposition, A=adjective, etc. In this simple BNF model, the null-sentence, [], points to a noun-phrase (Nx) (expression 1); and the noun-phrase, in turn, points to one of three choices: noun-only (N) (expression 2); adjective-noun (AN) (expression 3); or noun-preposition-noun (NPN) (expression 4). The term to the left of ==> is called the "argument" the term to the right of ==> is called the "value." This simplified BNF supports surgical pathology diagnoses such as "hemangioma" (=N, expression 2); "actinic keratosis" (=AN, expression 3); or "adenocarcinoma of colon" (=NPN, expression 4). Perhaps a majority of surgical pathology diagnoses, particularly in a pathology practice with predominantly biopsy specimens, can be parsed with such a simple BNF model. However, many surgical pathology reports, especially in academic institutions with large resection specimens, are much more complex than would be suggested by this simple model. The encoder Perl script supports a BNF grammar of arbitrary size and complexity.

      In REVERSE BACKUS-NAUR-FORM PARSING, the parser begins with the more complex expression (the value, right of the arrow ==> ), and works backward to the simpler expression (the argument, left of the arrow ==> ), until it reaches the null-sentence. If the backward-parse fails to reach back to the null-sentence, then the sentence is presumed to be ungrammatical.

      STATISTICAL ESTIMATION is the determination of the BEST VALUE for a repeatedly observed quantity, and the VARIABILITY about that best value. Typically, the best value, or MEASURE OF CENTRAL TENDENCY, is the ARITHMETIC MEAN for a series of repeated observations. Other measures of central tendency include: MEDIAN, MODE, HARMONIC MEAN, GEOMETRIC MEAN.

      Similarly, the variability, or MEASURE OF DISPERSION is typically the STANDARD DEVIATION for a series of repeated observations. Other measures of dispersion include: RANGE, STANDARD ERROR, PERCENTILE.

      STATISTICAL HYPOTHESIS TESTING is the determination of likelihood that a particular statement is true. Generally, hypothesis-testing arguments are framed in terms of whether a particular variable assumes a value of zero or not, such as whether the difference of two means is zero (STUDENT t-TEST), or whether the slope of a linear regression is zero (PEARSON'S CORRELATION COEFFICIENT).

      EDGE DEFINITION is the ability to locate areas on an image, associated with informative scene events. While AI machines are better at recognizing colors and light intensities than humans, humans are particularly adept at recognizing edges, a fact that has limited the value of AI machines in reading radiographs (xrays) or pathology slides.

      GÖDEL'S THEOREM is the demonstration that, in consistent mathematical systems at least as complex as Euclidean geometry, ordinary arithmetic, or ordinary (Zermelo-Fraenkel) set theory, there are TRUE STATEMENTS THAT CANNOT BE PROVED. Perhaps the most significant such statement is: "this system is consistent", which is unprovable. Two other, significant statements are: THE AXIOM OF CHOICE; and THE GENERALIZED CONTINUUM HYPOTHESIS. The importance of Gödel's theorem in AI is the recognition that there are certain true, mathematical statements that are unprovable. This places an absolute upper limit on the effectiveness of certain AI programs, such as theorem-proving programs. There is also a huge class of problems (NP-complete problems) which are soluble, but for which practical solution methods are not known, despite the intense activity of the mathematical community over the past several decades.

      GÖDELIZATION, GÖDEL ENUMERATION is the ingenious method, developed by Gödel in order to prove his famous theorem, for expressing all possible well-formed statements in mathematics as whole numbers. See
GÖDEL'S THEOREM.

      First, all the atomic concepts of mathematics are numbered arbitrarily (say, 1=not, 2=and, 3=x, 4=y, 5=inclusive-or, 6=exclusive-or, ....) in a lookup table. The list of such concepts is short (hundreds), and available from Whitehead and Russell's Principia Mathematica. Second, one has a list of all the PRIME NUMBERS, as follows:
2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, .....
Then, for example, the mathematical statement "not x and y" is Gödelized as:
2not×3x×5and×7y
That is:
21×33×52×74 = 3,241,350
This plan for Gödelization was introduced in 1927 at a talk attended by Prof. John von Neumann, who used elements of this design for the von Neumann computer (ENIAC).

      BINARY TREES. BALANCED TREES. A BINARY TREE is a graph-theory tree, beginning at a single (most ancestral) point, the ROOT, and branching into exactly TWO DESCENDANTS at each branchpoint, until it reaches the final descendants, or LEAVES of the tree. We say that each branchpoint SUPPORTS its two descendants, and each descendant COVERS its parent branchpoint. In a well-constructed binary tree, or BALANCED TREE, you can reach any leaf on the tree, starting at the root, after a small number of BINARY (YES/NO) DECISIONS. For example, the first decision (at the root) can be used to separate all surnames starting from A to M from all surnames starting from N to Z. Likewise, the second decision can be used to separate all surnames starting from A-G from all surnames starting from H-M, etc. After one decision, one can locate two leaves; after two decisions, one can locate four leaves,...; and after n decisions, one can locate 2n leaves,...;

      PRIME NUMBER. A prime number is a whole number greater than one that is divisible without remainder only by itself and one. The first few prime numbers are:
2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, .....
It has been suggested, by nineteenth century German mathematician Ludwig Kroneker, that prime numbers are the
FUNDAMENTAL BUILDING BLOCKS of all mathematics. In the motion picture, CONTACT, starring Jodie Foster, the artificial language used to make contact with extra-terrestrial intelligence was built up from prime numbers. Euclid's FUNDAMENTAL THEOREM OF NUMBER THEORY states that every whole number is a unique, ordered product of powers of prime numbers. Prime numbers form the basis for most modern ENCRYPTION ALGORITHMS used in electronic financial systems.

      SIEVE OF ERATOSTHENES is a method for determining whether a whole number is a PRIME NUMBER, originally introduced by the ancient Greek mathematician, ERATOSTHENES. A prime number, p, is a whole number greater than one, that is divisible without remainder only by itself and one. For a given number, N, one examines each of the known prime numbers, p, less than sqrt(N). If p divides N without a remainder, then N is not prime. Essentially all prime-number-testers are variants on this Sieve of Eratosthenes, and can only be solved in NPC computational effort. It is a MATHEMATICAL CONJECTURE, but not known necessarily to be mathematically true, that it takes considerably more computing resources to determine the prime factors of a given number, N, than it takes to multiply the two prime factors of N.

      EUCLID'S FUNDAMENTAL THEOREM OF NUMBER THEORY. states that every whole number is a unique, ordered product of powers of prime numbers.

      FUNDAMENTAL THEOREM OF ALGEBRA.

      The FUNDAMENTAL THEOREM OF INTEGRAL AND DIFFERENTIAL CALCULUS is the assertion that the calculus derivative and calculus integral are inverse processes.

      ENCRYPTION ALGORITHMS. Prime numbers form the basis for most modern public-private encryption algorithms used in electronic financial systems. The most attractive encryption algorithm, for a number of reasons, is the PUBLIC/PRIVATE ALGORITHM, also known as Diffie's algorithm and the RSA algorithm (Schneier, 1997). In this algorithm, the RECEIVER of an encrypted message is the only person who knows the PRIVATE KEY, whereas the PUBLIC KEY is distributed to the general public, via the Internet. The private key is two, large prime numbers, and the public key is the the product of these two, large prime numbers, which cannot be factored in any reasonable period of time.

      A TURING MACHINE is a hypothetical computer, used by mathematicians to explore the theoretical properties of certain types of complex calculations. The concept is named to honor its creator, ALAN B. TURING, the famous British mathematician who helped decrypt the German ENIGMA MACHINE, used to pass messages among the German high command during World War II.

      In a Turing machine, there is a CENTRAL PROCESSING UNIT (CPU), which performs calculations, and a (infinite) READ/WRITE TAPE, which collects or erases data, and which moves, either forward or backward, against the CPU. All numbering is performed in 0=no,1=yes format. Each time that the tape crosses the CPU, the tape may change its numeric status according to information in the CPU. Turing machines are used to determine whether certain calculations are possible (such as, the consistency of ordinary arithmetic, which was determined as unprovable by Gödel). It has been shown that Gödel's enumeration and the numbering of certain features in the Turing machine are the same.

     
In a VON NEUMANN MACHINE. there is a single CENTRAL PROCESSING UNIT (CPU), which performs calculations, and a read/write tape. Only one operation can be performed at a single moment, namely, at the moment in which the CPU attendee weighs the available information and releases the information. A VON NEUMANN MACHINE is the same as a SERIAL MACHINE, because all the operations are performed serially.

      In a PARALLEL MACHINE, there are multiple CENTRAL PROCESSING UNITS (CPUs), each of which performs calculations, in the manner of a SERIAL MACHINE. In theory, a parallel machine should run many-fold faster than a serial machine, but results have been disappointing, because of the amount of coordination required among the different CPUs, especially if one stops and needs repairs while the other CPUs are running. Early excitement in this file (late 1980s) has faded.

     
INFINITY. Literally: Not Finite. The ancient Greeks abhorred infinity, and therefore its companion concept, ZERO. You don't need to HAVE INFINITY in order to STUDY INFINITY. Therefore, many important proofs in mathematics involving infinity, are studied by showing that a particular result CANNOT occur after a finite number of steps, etc. In coarse terms, this is the gist of Gödel's proof.

      The ABACUS is a computing device, consisting of rows of stones (Latin: CALCULUS=STONE), invented by the ancient Babylonians (Iraqis), and carried eastward to India (and eventually China) by Alexander the Great in the third century B.C.E. The abacus is still used in the Far East by merchants who do not have reliable calculators.

      The concept of ZERO had its beginnings as a PLACE-HOLDER in an abacus, i.e., a position where there was no stone. This method flourished in the mediaeval Islamic culture, but languished in the dark-ages western European culture. Along with the concept of zero, the Islamic mathematical culture introduced ARABIC NUMERALS. Have you ever tried to multiply two reasonably large ROMAN NUMERALS by one another, say, 89 × 94 = 8,366 (i.e., LXXXIX × CMXIV = MMMMMMMMCCCLXVI)? The superiority of the Arabic numeral system is immediately apparent.

      The abacus was re-introduced to western Europe in the book, LIBER ABACI (BOOK OF THE ABACUS), written by LEONARDO OF PISA (FIBONACCI) in 1202. Fibonacci is credited with re-introducing ancient Greek mathematics back to the Europeans, and educating the Europeans about algebra, after both study-areas had been abandoned to the Islamic culture of the time.

      CRYPTOGRAPHY, PUBLIC-PRIVATE. Prime numbers form the basis for most modern public-private encryption algorithms used in electronic financial systems. The most attractive encryption algorithm, for a number of reasons, is the PUBLIC/PRIVATE ALGORITHM, also known as Diffie's algorithm and the RSA algorithm (Schneier, 1997). In this algorithm, the RECEIVER of an encrypted message is the only person who knows the PRIVATE KEY, whereas the PUBLIC KEY is distributed to the general public, via the Internet. The private key is two, large prime numbers, and the public key is the the product of these two, large prime numbers, which cannot be factored in any reasonable period of time.

      A CONJECTURE. is mathematical statement, typically of the form: IF X, THEN Y, in which, if X is true, then it seems that Y is true. In a conjecture, the theorem may be tested by thousands of examples, but there is no proof that IF X, THEN Y, is always true. Many of the great theorems of mathematics, including the famous FERMAT'S LAST THEOREM, recently proved by Sir Andrew Wiles, began its life as a conjecture.

      A THEOREM is mathematical statement, typically of the form: IF X, THEN Y, in which, if X is true, then it can be proven that Y is true. The proof, in the style of EUCLID, is a series of mathematical assertions, in which each step is the consequence of one or more prior steps. The first step is X and the last step is Y.

      A PROOF is given for a theorem of the form: IF X, THEN Y, in which, if X is true, then it can be proven that Y is true. The proof, in the style of EUCLID, is a series of mathematical assertions, in which each step is the consequence of one or more prior steps. The first step is X and the last step is Y.

      For a proposed THEOREM, IF X, THEN Y, a COUNTEREXAMPLE is a demonstration in which, X is true but Y is false.

      INTELLIGENCE has been defined by one of my colleagues, Dr. Lawrence A. Brown, MD, as the ability to clearly imagine oneself in a place and/or time that one has never been.

      SKOLEMIZATION is a method for reducing expressions in symbolic logic into much simpler terms, not involving logical quantifiers (for-every, there exists, ....).



14. ARTIFICIAL INTELLIGENCE IN MEDICINE.


GO TO NEXT PAGE.
GO TO PREVIOUS PAGE.
GO TO TABLE OF CONTENTS.

      The main applications of ARTIFICIAL INTELLIGENCE IN MEDICINE are so-called MEDICAL EXPERT SYSTEMS. These medical advisory systems, have been studied for years, and are largely a flop. They have no serious presence in the practice of medicine. The fax machine and the internet have had much more practical influence in medicine over the past two decades than AI. It's not that AI computers can't issue reasonable medical advice; but rather that physicians don't need that kind of help from computers. A much more pressing task is to organize and present all the relevant records for a particular patient to the physician; the rest is less of a problem.

      HIPPOCRATES: FIRST DO NO HARM.

      HIPPOCRATES: PATIENT PRIVACY.

      SUTTON'S LAW. Willie Sutton was a notorious bank robber from the 1930s, who allegedly answered the question, "Willie, why do you always rob banks?" with the answer: "Because that's where the money is." (Sutton denies this in his autobiography.) Sutton's Law is commonly discussed in medical education, and has approximately the same meaning as horses not zebras.

      One of the aphorisms of medical wisdom passed along from medical teachers to their students is the saying: "When you hear hoofbeats in the street, think of HORSES NOT ZEBRAS."

      DIFFERENTIAL DIAGNOSIS.



15. AI AND ANATOMIC PATHOLOGY.


GO TO NEXT PAGE.
GO TO PREVIOUS PAGE.
GO TO TABLE OF CONTENTS.

      Each year in the USA, anatomic pathologists render diagnoses on some 40 million specimens of human tissue (Moore and Berman, 2000). Many of these diagnoses are made in university-based educational programs, where the responsible (attending) pathologist uses the signout process to educate younger (resident) pathologists. Each specimen arrives at the pathologist's microscope with identifying paperwork and one or more glass slides, which are examined by the pathologist. The paperwork includes the patient's name, age, sex, bodysite from which the specimen was obtained, and a few words of clinical history provided by the submitting physician. No pathologist makes an official diagnosis without proper paperwork.

      A common didactic exercise is to present the paperwork to the resident pathologist without the slide; or the slide without the paperwork. In either case, an experienced resident can make the correct diagnosis in about nine of ten cases. A colleague once told me: the last 10% is why pathologists are paid such exorbitant salaries!

      For example, slide without history. The appearance of masses of relatively small cells, with a blue, stippled nucleus, inconspicuous or absent nucleoli, and a thin rim of cytoplasm, with the cells sometimes smashed and distorted in the preparation process, goes by the slang term of SMALL BLUE CELL TUMOR among pathologists, and has the following DIFFERENTIAL DIAGNOSIS (Haber et al, 2001):
SMALL BLUE CELL TUMOR:
Small cell carcinoma of lung.
Merkel Cell carcinoma of skin.
Small cell adenocarcinoma of prostate.
Small cell carcinoma of larynx.
Medulloblastoma.
Esthesioneuroblastoma.
Retinoblastoma.
Most of these diagnoses are absolutely specific to a certain bodysite or clinical history, but they cannot be distinguished by microscopic findings alone. For example, retinoblastoma always occurs in the eye of children; medulloblastoma always occurs in the brainstem of children; Merkel Cell carcinoma of skin begins in the skin, but may metastasize to other locations in the body. If a surgeon submits a small blue cell tumor to the pathologist for diagnosis, the surgeon doesn't want the 25-choice grocery list of possibilities. The surgeon wants the single choice that is most consistent with the clinical findings.

      Alternative example, history without slide. Slowly enlarging, rubbery mass in the right thumb metacarpal-phalangeal joint, painless, subcutaneous, in a 50 year old woman. Response: unless there are other, salient findings:
PAINLESS, SUBCUTANEOUS METACARPAL-PHALANGEAL JOINT MASS.
Ganglion cyst.




16. ARTIFICIAL INTELLIGENCE ROBOTICS.


GO TO NEXT PAGE.
GO TO PREVIOUS PAGE.
GO TO TABLE OF CONTENTS.

      The concept of a ROBOT, or machine-based humanoid, was first introduced by early twentieth century Czech science fiction writer KAREL CAPEK, in his classic science fiction story, ROSSUM'S UNIVERSAL ROBOTS. Robot is the Czech word for WORKER. Arguably, the inspiration for robots was the mediaeval Jewish concept of GOLEMS, or humanoid workers.

      The late DR. ISAAC ASIMOV formulated the THREE LAWS OF ROBOTICS in a series of science fiction stories and books published early in his career:
1. A robot shall never harm a human being, nor through his inaction, allow a human being to come to harm.

2. A robot shall always obey an order given by a human being, except as it conflicts with #1.

3. A robot shall attempt to protect himself, except as it conflicts with #1 or #2.
Later in his life, Asimov found many problems with these laws, perhaps the most serious being the conflict between #2 and #3. Asimov envisioned highly complex robots with hundreds of years of experience. If a malicious human told a robot to harm itself, the robot would be obliged to do so, or else be in conflict with #2.

      CAN A ROBOT EVER BE SENTIENT? The answer depends upon what you mean by sentient. One can define the problem out of existence by insisting upon sentience as a property restricted to carbon-based life forms, in which case, robots, at least those built from silicon-based computer chips, can never meet the standard of sentience. Similar arguments have been used, in the past, to define human rights out-of-existence for different ethnic, gender, or age groups (STAMPP). The issue is addressed in an episode of STAR TREK: THE NEXT GENERATION, in which there is a directive to disassemble (and thus potentially destroy) the apparently sentient Lieutenant Commander Data, a humanoid robot. There are enormous theological dimensions to this question (G.K.Chesterton; C. S. Lewis).

      If one has a wider definition of sentience, that includes all entities with a certain range of perceptions and emotions, then I would say, yes, robots may someday be sentient. However, with techology currently available, perception (vision, hearing) is poor and there is no agreed-upon list of emotional responses that one would build into a sentient robot. If processing power is greatly increased, and if we gain a deeper understanding of perceptual and emotional processes, then, yes, I believe that robots may someday be sentient.

      WHAT IS NEEDED FOR A ROBOT TO BE SENTIENT? At the very least, the following:
1. More computer processing power. Computers today cannot even capture and digest images in hours, that humans can work with in a matter of seconds. Example: microscopic slides in human anatomic pathology.

2. Some computer problems are insoluble, no matter what. For example: trisecting the angle with straightedge and compass (Euclid). finding a universal formula for quintic polynomials and above (Galois). the consistency of ordinary arithmetic (Gödel). The Axiom of Choice and the Generalized Continuum Hypothesis in ordinary (Zermelo-Fraenkel) set theory (Gödel, Cohen).

3. Some of these computer-insoluble problems may be irrelevant in the future. Who cares if you can't trisect an angle with straightedge and compass, when you can get as close to a trisection as you want, using calculus and limit theory? Same with finding the roots of quintic polynomials.


      ROBOTS: PARADOX OF SELF-REFERENCE. Another interesting paradox among robots that must obey Asimov's three laws of robotics is akin to the paradox of self-reference, known to the ancient Greek philosophers, Plato and Aristotle (so-called PARADOX OF EPIMENIDES THE CRETAN). In one of the later Foundation novels, Asimov has an older man, a history professor, fall in love with a young woman, who might in fact be a robot. The older man's ego is boosted by the fact that he has attracted the interest of a much younger woman. If the woman is actually robot, then she is would obliged to feign interest in the older man by the Second Law of Robotics. If the woman is a robot and is asked whether she is a robot, she must lie and say that she is not, or else she would harm the man's feelings, and in so doing, violate the Second Law of Robotics.



17. DISCUSSION.


GO TO NEXT PAGE.
GO TO PREVIOUS PAGE.
GO TO TABLE OF CONTENTS.

      To Ernst Rutherford, the early twentieth century Nobelist in physics, is attributed the statement: "There are two kinds of scientists: physicists and stamp-collectors." Rutherford refers to the perception that physicists have mathematical concepts, equations, and quantities, whereas the biologists of that day were largely naturalists, who assembled collections of preserved organisms for museum displays. While it may be that physics at that time was more advanced than biology, biologists had nascent theories of evolution, natural selection, embryogenesis, and genetics, which have since proved themselves significant beyond scientific quibbling. The accompanying mathematics in biology is equally respectable: probability, statistics, cluster analysis, epidemiology, symbolic logic, computational linguistics. Rutherford could not have foreseen these developments, and his disdain for biomedicine as lacking in scientific respectability now seems quaint.

      The point of this anecdote is not just to say, yes, anatomic pathology is now on the verge of scientific respectability, but rather to point out the advantages of infusing mathematics into anatomic pathology, and to jump-start the process. Anatomic pathology is a highly practical, professional discipline, upon which 40 million Americans depend each year for accurate information regarding their very lives. A great deal of intellectual effort is spent in making more complete and accurate diagnoses, and in making certain that the requestors act upon these diagnoses in a timely manner. Quality assurance is increasingly important, but to automate this activity, the computer system must recognize anomalous events when they occur, and notify the appropriate persons.

      AI and mathematics are part of the solution to this quest. Anatomic pathology diagnosis can be roughly divided into two parts: PATTERN RECOGNITION and MEDICAL CONTEXT. Pattern recognition of tissue images is an important area of research, but practical benefits seem at least decades in the future. Nobody even talks seriously about computers making diagnoses from images anymore; we are still in the era of getting accurate, complete images sent from one laboratory to the next, so that humans can make the diagnoses.

      On the other hand, some assessment of medical context sufficient for fully computerized quality assurance activities is on the horizon. Medical records are gathered and maintained electronically in many larger, U. S. medical institutions. There are serious efforts to assess the organization and usability of these records in actual clinical encounters (Murff, 2001). Methods for parsing and automatically encoding surgical pathology reports are nationally funded and in progress (Berman, 2001). In this report, we make the argument that the main ontological framework of surgical pathology may be stored as a set of differential diagnoses, which in turn, may be instantiated as AI rulebases and subset relations in set theory. Such differential diagnoses are already published in tabular form (Haber et al, 2001). Algorithms for QA programs are in development, and appear to be NPComplete for datasets currently tried.



18. REFERENCES.


GO TO PREVIOUS PAGE.
GO TO TABLE OF CONTENTS.

      1. Aleksandr I, Morton H.
An Introduction to Neural Computing. Second Edition.
London: International Thomson Computer Press. 1995.
ISBN 1-85032-167-1, 284 pages.

      2. Bundy A, ed.
Artificial Intelligence Techniques. A comprehensive catalogue. Fourth, revised edition.
Berlin: Springer Verlag. 1997
ISBN 3-540-59323-3, 141 pages.

      3. Moore GW, Miller RE, Hutchins GM.
Microcomputer translator for medical text: Theorem verification for Chapter Two of Zeman's Modal Logic.
Adv Math Comput Med. 7:1621-1633, 1986.

      4. Moore GW, Riede UN, Polacsek RA, Miller RE, Hutchins GM.
Automated translation of German to English medical text.
Am J Med. 1986 Jul;81(1):103-111.

      5. Moore GW, Hutchins GM, Miller RE.
Examination of disease names using non-Abelian symbolic logic.
Methods Inf Med. 1986 Apr;25(2):109-115.

      6. Moore GW, Riede UN, Polacsek RA, Miller RE, Hutchins GM.
Group theory approach to computer translation of medical German.
Methods Inf Med. 1986 Jul;25(3):176-182.

      7. Moore GW, Miller RE, Hutchins GM, Erozan YS, de la Monte SM, Riede UN, Polacsek RA.
Multilingual translation techniques in the analysis of narrative medical text.

      8. Offerhaus GJA, Tersmette AC, Moore GW, Hershey J, Polacsek RA.
Dutch respelling rules for English and German medical word lists.
Methods Inf Med. 1987 Jul;26(3):99-103.

      9. Moore GW, Miller RE, Hutchins GM.
Determining cause of death in 45,564 autopsy reports.
Theor Med. 1988 Jun;9(2):179-186.

      10. Moore GW, Boitnott JK, Miller RE, Eggleston JC, Hutchins GM.
Integrated pathology reporting, indexing, and retrieval system using natural language diagnoses.
Mod Pathol. 1988 Jan;1(1):44-50.

      11. Moore GW, Wakai I, Satomura Y, Giere W.
TRANSOFT: Medical translation expert system.
Artif Intell Med 1:149-157, 1989.

      12. Moore GW, Berman JJ.
Performance analysis of manual and automated systematized nomenclature of medicine (SNOMED) coding.
Am J Clin Pathol. 1994 Mar;101(3):253-256.

      13. Berman JJ, Moore GW, Donnelly WH, Massey, JK, Craig B.
SNOMED Analysis of 40,124 Surgical Pathology Cases.
Am J Clin Pathol 102:539-540, 1994

      14. Berman JJ, Moore GW.
SNOMED-encoded surgical pathology databases: A tool for epidemiologic investigation.
Mod Pathol. 1996 Sep;9(9):944-950.

      15. Moore GW, Polacsek RA, Casanova MF, Erozan YS, Hershey J, Miller RE, Hutchins GM.
Multilingual respelling rules for an English medical word list.
MEDINFO-86. (R Salamon, B Blum, M Jorgensen, eds.) Elsevier Science Publishers B.V. (North Holland), 1986. Proc. Fifth World Congress on Medical Informatics, October 26-30, 1986, Washington, D.C., pp 1106-1110.

      16. Yu CC-Y, Moore GW, Unschuld PU.
Romanized Chinese respelling rules for an English medical word list.
Proc Annu Symp Comput Appl Med Care. 1987;11:xxx-xxx. Washington DC, November 1-4, 1987.

      17. Moore GW, Hutchins GM, Boitnott JK, Miller RE, Polacsek RA.
Word root translation of 45,564 autopsy reports into MeSH titles.
Proc Annu Symp Comput Appl Med Care. 1987;11: Washington DC, November 1-4, 1987.

      18. Moore GW, Miller RE, Hutchins GM.
Indexing by MeSH titles of natural language pathology phrases identified on first encounter using the barrier word method.
In, Scherrer JR, Côté RA, and Mandil SH, eds., Computerized Natural Medical Language Processing for Knowledge Representation. North-Holland, Amsterdam, 1989.

      19. Tersmette KWF, Scott AF, Moore GW, Matheson NW, Miller RE.
Barrier word method for detecting molecular biology multiple word terms.
Proc Annu Symp Comput Appl Med Care. 1988;12: Washington DC, November 6-9, 1988.

      20. Moore GW, Berman JJ.
Automatic SNOMED coding.
Proc Annu Symp Comput Appl Med Care. 1994;18:225-229.

      21. Berman JJ, Moore GW, Donnelly WH, Massey JK, Craig B.
A SNOMED analysis of three years accessioned cases (40,124) of a surgical pathology department: implications for pathology- based demographic studies.
J Amer Med Informatics Assn (JAMIA), Symposium Supplement, 1994, and Proceedings of the 18th Annual Symposium on Computer Applications in Medical Care 18:188-192, 1994.

      21. Dreyfus HL.
What Computers Still Can't Do.
Paperback, 429 pages (October 30, 1992)
Cambridge, MA: MIT Press;
ISBN: 0262540673, 429 pages.

      22.
Stampp KM.
The Peculiar Institution : Slavery in the Ante-Bellum South.
Paperback Reissue edition (December 1989)
New York: Vintage Books.
ISBN: 0679723072.

      23.
Sutton W, Linn E.
Where the money was.
Out of print.

      24.
Asimov I.
I, Robot.
Mass Market Paperback, 272 pages, Reprint edition (July, 1994). New York: Bantam Books.
ISBN: 0553294385, 272 pages.

      25. Asimov I.
The Robots of Dawn.
Mass Market Paperback Reprint edition (April, 1994). New York: Spectra.
ISBN: 0553299492.

      26. Asimov I.
The Complete Robot.
Paperback, 512 pages (1983). New York: Acacia Press, Inc.
ISBN: 0586057242, 512 pages.

      27.
Seife C.
Zero. The Biography of a Dangerous Idea.
London: Penguin Books. 2000.
ISBN: 0-670-88457-X, 248 pages.
This book includes an account of the execution of Hippasus of Metapontum, a member of the Pythagorean cult, who had dared to reveal the existence of irrational numbers to persons outside the cult.

      28.
Stewart I.
Flatterland. Like Flatland. Only More So.
Cambridge, MA: Perseus Publishing. 2001.
ISBN 0-7382-0442-0, 301 pages.
This book is a sequel of
Edwin A. Abbott's FLATLAND, published in 1884, and cited in Stephen Hawking's A BRIEF HISTORY OF TIME.

      29. Casti JL, DePauli W.
Gödel. A Life of Logic.
Cambridge, MA: Perseus Publishing. 2000.
ISBN 0-7382-0274-6, 210 pages.

      30. Aleksandr I, Morton H.
An Introduction to Neural Computing. Second Edition.
London: International Thomson Computer Press. 1995.
ISBN 1-85032-167-1, 284 pages.

      31. Scarborough D, Sternberg S.
Methods, Models, and Conceptual Issues. An Invitation to Cognitive Science. Volume 4.
Cambridge, MA: MIT Press. 1998.
ISBN 0-262-65946-0, 950 pages.

      32. Changeux J-P, Connes A.
Conversations on Mind, Matter, and Mathematics
Ed & Transl: DeBevoise MB. Princeton, NJ: Princeton University Press. 1995.
ISBN 0-691-08759-8, 260 pages.

      33. Hockey S.
A Guide to Computer Applications in the Humanities.
Chapter 8. Sound Patterns. pp.168-188.
Baltimore: The Johns Hopkins Univ Press. 1980.
ISBN 0-8018-2891-0, 248 pages.
Cited: Ott W. Metrical Analysis of Latin Hexameter by Computer. Revue 4:7-24, 1966.
Cited: Greenberg NA. Scansion Purement Automatique de l'Hexamère Dactylique. Revue 1967;3:1-25.

      34. Woodger JH.
The Axiomatic Method in Biology.
Out of Print.
Classic book by the grandfather of biomathematics and bioinformatics.

  &