BILL MOORE'S
INTERNET MINI-TUTORIAL.
DRAFT COPY ONLY.
1. HISTORY OF THE INTERNET.
The INTERNET is a worldwide system of computers, linked together
by telephone lines and other high-speed connections. The Internet has become
the ultimate vanity publisher. Nearly everyone with an idea to promote
can write a pamphlet or book, and post it for all to read. Publication
and distribution are virtually free.
The Internet started about three decades ago as a secure network
for a few top-of-the-line computers, sponsored by the
U. S. Department of Defense,
DEFENSE ADVANCED RESEARCH PROJECTS AGENCY (DARPA).
As more universities and defense contractors became hooked into this system,
the network became more public and less secure for defense secrets.
Now the Internet is completely public.
Historically, the Internet (originally, ARPANET) was constructed
as a redundant communication network that would keep communications
in the USA intact in the event of an enemy attack. It works great.
I was in a U. S. Government installation during the attack on the Pentagon
and the New York World Trade Center, on September 11, 2001.
Cellular telephone and long-distance voice telephone traffic were briefly
interrupted, but local telephone service and the Internet never flickered.
Many big computers and many, many small computers comprise the Internet.
The big computers typically function as SERVERS,
and the small computers usually function as CLIENTS.
If you are reading this mini-tutorial from a home computer,
then your computer is probably a client computer.
For your client computer to connect to the Internet,
you are probably paying between $10 and $25 per month to an
INTERNET SERVICE PROVIDER (ISP), which is your server.
Popular ISPs in the Baltimore-Washington area include
America Online (AOL), Microsoft Network (MSN), Earthlink,
and Starpower (formerly, Erols). There are also offerings from
COST-FREE SERVERS, if you don't mind filling up
your computer screen with annoying advertisements while you work.
However, even these free servers have a price if you use
a significant amount of server resources.
And good luck getting technical service
if you aren't paying anybody anything.
2. SERVICES OFFERED ON THE INTERNET.
Your Internet Service Provider (ISP) offers a range of services,
and you typically pay on a sliding scale, depending upon what you get.
Popular services include:
1. Electronic mail (email).
2. Chat rooms.
3. Worldwide web (www).
4. Personal home page.
5. Common Gateway Interface (CGI).
6. UNIX Operating System.
7. Programming in Perl, C, Java.
8. File Transfer Protocol (FTP).
ELECTRONIC MAIL (email) is like ordinary paper mail,
except that you post your letter from your (client) computer,
and send it to another client computer, by way of servers
acting as intermediaries. If things are running smoothly,
an email letter may reach its destination in a few hours
(sometimes in seconds!). However, there are many ways for email
to go astray, and your server may or may not inform you
that your letter never reached its destination.
Sometimes you will be notified that an email has failed to reach
its destination by the MAILER DAEMON, but this function
is not entirely reliable. Almost all ISPs provide email services.
Some email services are offered cost-free, if you don't mind
your screen filling up with a lot of advertisements,
and having a lot of vendors know some of your personal information.
CHAT ROOMS, also known as
MULTI-USER CONVERSATION CHANNELS (MUCCs), are like a floating
poker game, except that the players are in different locations,
connected live on the Internet. People can come in and out of a chat room,
and they may identify themselves by their real names, or by pseudonyms
that they have chosen, in order to reflect a character that they wish
to role-play. Chat rooms are typically devoted to a particular topic.
The WORLDWIDE WEB (WWW) is a collection
of individual WEBSITES, each of which contains an initial document,
or HOME PAGE, that in turn is linked to many subsidiary documents.
The power of the web derives from the fact that:
(1) The documents may be richly formatted, using various typesetting
commands, as well as multimedia segments, such as images, cinematic video
clips, audio clips, etc.; and
(2) Linkages may be made to related documents anywhere else on the
Internet.
For example, a request for the GOETHE UNIVERSITY AUTOPSY REGISTER, at URL:
http://www.netautopsy.org/apep01gu.htm
is launched from your computer at home (wherever that is);
moves to The Johns Hopkins University School of Medicine
in Baltimore, MD, USA; obtains the relevant autopsy reports
from files stored in Frankfurt, Germany; and sends the result
back to your home computer.
Technically, the great advantage of the web is that your Internet server
sends you a desired web page, and then ignores you until your computer
requests another web-page. Your client computer sends a request to a server,
and the server responds by sending back a webpage.
Although your telephone line is in-use the whole time, and your server
is always passively paying attention in case you send another request,
the server is only working actively for you in the few moments
when it is accepting your request and returning the webpage.
The time which you spend staring at the webpage on your home computer
does not consume any significant resources from your server.
In effect, this is like a roomful of amateur chess players playing
simultaneously with a grand master chess player. The grand master goes
from board to board, spending an instant with each amateur player.
This is an extremely cost-effective way for (fast) servers to interact
with (slow) clients. In older server-client systems, the server paid
complete attention to a single client for an entire session,
a much less efficient arrangement.
For an additional five or ten dollars per month, your
Internet Service Provider (ISP) will lease you a few megabytes of space,
into which you can deposit your own, PERSONAL HOME PAGE.
One megabyte = 1 MB = 1 million letters; for comparison,
the Holy Bible (King James Version) is 4.2 MB. Anybody in the world
can visit your personal home page on the worldwide web,
as long as they know how to find it.
Various Internet sites, some cost-free, some with a monthly charge,
will link your home page to various indexing systems on the Internet.
You can deposit your job resume, your philosophy of life,
your hobbies, and anything else, as long as it is not
copyrighted by somebody else.
Some of the more popular Internet indexing sites, such as:
http://www.google.com
http://www.yahoo.com
http://www.dogpile.com
will index your home page, whether you ask them to or not. Your best bet
for being findable on an Internet index is to put some distinctive words
on your page. For example, the GOETHE UNIVERSITY AUTOPSY REGISTER
is indexed by google.com under keywords GOETHE AUTOPSY.
Similarly, the JOHNS HOPKINS UNIVERSITY AUTOPSY RESOURCE
is indexed by google.com under keywords HOPKINS AUTOPSY.
Automated indexing by these WEB CRAWLERS usually takes a few days
after the page is posted.
BEWARE: Some Internet service providers, including this one,
have censorship conditions, and will shut down your website
if you include objectionable material on your web pages.
The COMMON GATEWAY INTERFACE (CGI) is a method by which a server
runs a program at the request of a client, and returns the result
to the client as a webpage. The most popular operating system in the
common gateway interface is UNIX. The most popular programming languages
in the common gateway interface are PERL, Java, and C.
An OPERATING SYSTEM is the software environment
in which you control the fundamental operations of your computer system,
such as copying files, sending files to your disk drive or printer, etc.
The operating system on your computer is probably either Microsoft
Windows 3.x (obsolete), Microsoft Windows 95, Microsoft Windows 98,
Microsoft Windows NT, Microsoft Windows XP, or Macintosh System 7.
UNIX is the most popular operating system in the
common gateway interface for the Internet. Historically, UNIX had been
an expensive (and difficult-to-learn) operating system, used on
large computers located in university computer departments. UNIX is
now available as an inexpensive operating system on small computers,
but it has never really caught on, largely because it is difficult to use
for the average non-computer-programmer.
LINUX is a version of UNIX, developed by Linus Torvalds
(thus the name, Linux), as a publicly available version of UNIX.
RED HAT LINUX is the most popular version for small computers,
but it has never really caught on. It takes a lot of patience
and concentration to use UNIX. By contrast, MS-Windows can be used
by almost anybody after a little bit of training.
The FILE TRANSFER PROTOCOL (FTP) is a service for transferring
large documents across the Internet.
To use FTP, you need an INTERNET SERVICE PROVIDER DIALUP ACCOUNT
and an FTP program. Many serviceable FTP programs are available
as freeware on the Internet. We recommend: WS_FTP LIMITED EDITION
as an FTP program. To get a free copy, see (54).
4. PROGRAMMING LANGUAGES.
A programming language is the language in which you may set up complex
calculations for your computer to perform, such as recognizing passwords,
accepting credit-card information, calculating statistics, etc. The most
popular programming languages in the Common Gateway Interface of the Internet
are: Practical Extraction and Report Language (PERL), Java, and C.
Programming languages are typically resident in the server-computer,
in the UNIX operating system, and the results are transmitted
to the client computer in the form of a webpage.
Some programming languages may be run in the client-computer,
and may be run on your Internet browser, even if the dialup line
is disconnected. JAVA and C are very powerful,
but subject to security breaches except in the hands of the experts.
It's difficult to learn even a little Java or C. JavaScript is secure
but almost useless, because it can't access large files,
for security reasons.
It is easy to learn a little PERL, and you can start writing your own
simple programs in one day. However, PERL has a great deal of depth,
and you can do many sophisticated operations in PERL, after sufficient study.
You can download complete instruction manuals and serviceable interpreters
(i.e., computer language translators) for PERL from the Internet.
PERL is virtually the single-handed creation of one person,
LARRY WALL. A comprehensive tutorial is offered by [76].
The programming language C (developed at Bell Laboratories, Inc.,
Murray Hill, NJ, as the successor to an earlier language named B)
is quite difficult, and takes several months of study even to write
and understand a simple C program. However, you can do almost anything,
and do it efficiently, in C.
Similarly for Java.
An INTERNET BROWSER is a software system, which resides
on your (client) computer, and conducts the dialogue between your computer
and your Internet Service Provider (ISP). Popular Internet browsers include:
NETSCAPE and MICROSOFT INTERNET EXPLORER.
Many ISP-vendors, such as America Online and Microsoft Network,
provide proprietary software when they first sell you a service contract.
Other ISPs, such as STARPOWER, simply give you a registered copy
of a major browser, such as Netscape.
For Netscape and Microsoft Internet Explorer, operating on Microsoft Windows,
the operating instructions are fairly self-explanatory. In order to print
the webpage which is currently displayed on your screen, simply go to
the File option on the TOOLBAR in the upper left-hand corner,
and select Print. If you have displayed a webpage on your browser,
and you wish to print it later when you have time,
then go to the File option and select Save As.
If you have created your own webpage file on your hard disk drive,
and you wish to see what it will look like on the web,
then go to the File option and select Open.
This simple trick allows you to completely debug and clean up
your webpage before you show it to the world.
DIALUP SOFTWARE establishes a telephone connection
between your home computer and an ISP server.
For many ISPs, the dialup software comes as part of the
software provided when they first sell you a service contract.
Some cable television companies offer a CABLE INTERNET CONNECTION,
which is much faster than a voice telephone line, and is on all the time.
For this convenience, you pay an additional monthly fee, typically
about $20 more. In return, your telephone line is no longer tied
up with Internet traffic, which might be an advantage to some users.
The two most popular Internet browsers are fairly serviceable systems
for simple typesetting, printing, and testing new web pages.
Both browsers are in Version 5 or better.
In these systems, it is not necessary to have an active dialup in order to
display a page on the browser. Simply go to the File option
in the upper left-hand corner, and select Open. If you have displayed
a webpage on your browser, and you wish to print it later when you have time,
then go to the File option in the upper left-hand corner,
and select Save As. This trick can also be used in order
to save email messages while your dialup is active, then print them out later
when the telephone is disconnected.
This is particularly valuable if your organization
only owns a single telephone line, and you do not
want your computer to interfere with telephone voice communications.
In the AMERICAN STANDARD CODE FOR INFORMATION INTERCHANGE (ASCII),
so-called seven-bit ASCII is a system of characters corresponding to
the numbers between 0 and 127, which is almost universally standardized.
For example, A=65, B=66,..., Z=90, and a=97, b=98,..., z=122.
The INTERNATIONAL ORGANIZATION FOR STANDARDIZATION (ISO) uses the same
numbering system. Seven-bit ASCII is also known as VANILLA ASCII,
because it is plain, ordinary ASCII, with no frills,
no special alphabets, etc.
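The numbering above can be checked on any computer that has
a programming language handy. Here is a minimal sketch in Python
(an assumption on my part; Python is not otherwise used in this tutorial).
The built-in functions ord and chr convert between a character
and its code number. The helper name is_seven_bit is made up
for illustration only.

```python
# Print the code numbers at the ends of each alphabet range.
for letter in "ABZabz":
    print(letter, "=", ord(letter))


def is_seven_bit(text):
    """Return True if every character falls in the range 0 to 127."""
    # is_seven_bit is a hypothetical helper name, not a standard function.
    return all(ord(character) < 128 for character in text)


print(is_seven_bit("Plain vanilla text"))
print(is_seven_bit("Fran\u00e7ais"))
```

The first test reports True. The second reports False, because
the c-cedilla in the French word has a code number above 127.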
So-called EIGHT-BIT ASCII,
which is much less standardized than 7-bit ASCII, is the system of characters
corresponding to the numbers between 0 and 255. Different national language
groups assign their special characters to the numbers between 128 and 255,
but there are not enough slots for all the different alphabets
(French accents, Spanish tildes, German umlauts, Greek, Cyrillic,
Chinese, Japanese, and Korean alphabets). There is no universal agreement
on how to assign the characters in eight-bit ASCII. In email transmissions,
some server-intermediaries strip the eighth (high-order) bit of 8-bit ASCII,
resulting in gibberish. I have a Chinese-born colleague here in the USA,
who writes emails to her (Chinese) friends back home in English, because
the transmission of Chinese characters is so chaotic and unreliable.
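To see how losing one bit turns a letter into gibberish, here is
a rough sketch in Python (again an assumption; any language would do).
It assumes the sender used the Latin-1 assignment of the numbers
128 to 255, which is only one of the many competing assignments.
The function name strip_high_bit is made up for this example,
and it imitates an intermediary that keeps only seven bits per character.

```python
def strip_high_bit(data):
    """Clear the eighth bit of every byte, as a seven-bit relay would."""
    # strip_high_bit is a hypothetical name used only for this sketch.
    return bytes(byte & 0x7F for byte in data)


original = "caf\u00e9"                    # "cafe" with an acute accent on the e
sent = original.encode("latin-1")         # Latin-1 is an assumed encoding
received = strip_high_bit(sent).decode("ascii")
print(received)
```

The accented e has the number 233 in Latin-1. With the eighth bit
removed, it becomes 105, which is the letter i, so the word arrives as cafi.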
A WORD-PROCESSOR FILE is an electronic image,
held in computer memory, or on a floppy disk.
A word-processor file consists of TEXT and MARKUP.
Markup is all the typesetting commands and special characters
in addition to text.
Alas, WORD-PROCESSORS ARE NOT STANDARDIZED.
When word-processors first became popular, different private
companies introduced different, incompatible formats for their
word-processor texts, in order to lock in their customer-base.
The result is a virtual tower of Babel in word-processors.
The closest thing to standardization is an option on most
word-processors to output their texts in seven-bit ASCII.
TO MAKE A SIMPLE (VANILLA) ASCII FILE
in Microsoft Windows 95 or Microsoft Windows 98,
click on START, then click on Programs,
then click on Accessories, then click on Notepad.
Notepad is also available on Windows 3.x.
In Notepad, a simple ASCII file has a name ending with .txt.
FILES ON THE WORLDWIDE WEB ARE NEARLY STANDARDIZED.
The nearly-universal markup language of the worldwide web
is HyperText Markup Language (HTML),
and almost all Internet browsers,
on almost all computers (IBM or MAC) can properly display an HTML file.
Big computer software companies are continually adding new features
to HTML, in an effort to win over customers, and then make these
customers dependent on proprietary HTML features.
So far, this trick hasn't worked, because there are a lot of little guys
out there, who wish to serve a big audience, and have no motivation
to restrict their pages to customers of a particular company.
For very fancy documents, requiring a lot of mathematical symbols
or other graphics, the ADOBE Portable Document Format (PDF)
is the emerging standard.
ADOBE programs for creating these files are proprietary and expensive.
However, ADOBE readers are cost-free and widely available.
The big problem with ADOBE is that you have to open
up and print an ADOBE file before you know what's in it.
At least in HTML, you can glance at the first few pages,
and move on if you don't like what you see.
MAKING A SIMPLE HTML FILE.
Make any vanilla (seven-bit) ASCII text-file,
using Windows Notepad, or some other suitable word-processor.
Then insert it into the following boiler-plate:
<html><head><title>
PLACE YOUR TITLE HERE</title></head>
<body>PLACE YOUR MAIN TEXT HERE
<br><br><br></body></html>
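If you have many text files to convert, the insertion can be automated.
The following is a minimal sketch in Python (an assumption; the same
thing could be done in Perl). The function name make_page is made up
for illustration. The one real subtlety is that the characters
<, > and & have special meaning in HTML, so the sketch passes
the text through the standard html.escape function first.

```python
import html


def make_page(title, text):
    """Wrap plain text in the boiler-plate shown above."""
    # make_page is a hypothetical helper, not part of any library.
    # html.escape protects characters such as <, > and & in the text.
    lines = html.escape(text).split("\n")
    # Each line break in the text becomes a forced <br> in the page.
    body = "<br>\n".join(lines)
    return (
        "<html><head><title>\n"
        + html.escape(title)
        + "</title></head>\n<body>"
        + body
        + "\n<br><br><br></body></html>\n"
    )


page = make_page("My Hobbies", "Chess & checkers.\nGenealogy.")
print(page)
```

The printed result can be saved from Notepad under a name ending
with .htm, and then opened in your browser.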
PROGRAMMING LANGUAGES are languages in which you may set up
complex calculations for your computer to perform, such as
recognizing passwords, accepting credit-card information,
calculating statistics, etc. The most popular programming languages
in the common gateway interface of the Internet are
PRACTICAL EXTRACTION AND REPORT LANGUAGE (PERL),
JAVA, and C.
HYPERTEXT MARKUP LANGUAGE (HTML) IS NOT A PROGRAMMING LANGUAGE.
For all intents and purposes, HTML is a typesetting language.
On the worldwide web, HTML issues the formatting instructions
for text information (also: image, audio, and cinematic video)
that appears on the web page.
If you want programming on the worldwide web,
then you have two choices: programming on the server;
or programming on the client.
JAVA is a C-like browser language.
Java is typically run in the client-computer,
and may be run on your Internet browser,
even if the dialup line is disconnected.
Java is quite difficult, and it takes
several months of study even to write
and understand a simple Java program.
However, you can do almost anything, and do it efficiently, in Java.
If you have a Netscape or Internet Explorer browser,
then you can typically run Java programs within it.
5. INTERNET SPELLING CORRECTOR.
If you have or can get programmer access to your website, then one of
the things you should have is an Internet misspelling-corrector (IMSC).
Genealogists and archivists use these programs for finding matches
to slightly misspelled surnames and place-names [61]. Jewish and Mormon
scholars are particularly strong in this area. Jewish researchers
are attempting to reconstruct the entirety of Jewry lost
in the Holocaust [62].
Many survivors contributing to this effort remember their relatives' names
and city of birth in Central Europe only by sound, and may misspell them.
Mormons are interested in tracing the ancestors of all their faithful
believers, so that they may retroactively baptize them [63].
The reason why your website needs a misspelling-corrector is for users
who remember words better by sound than by spelling. For example,
in the U. S. Government standard system, my Maryland driver's license number
begins with M600, as would that of any Marylander named More, Maur, Mare, etc.
On the other hand, computer databases are unforgiving about
even minor misspellings.
In databases, as in dictionaries, a record must be listed
under only a single name, although sophisticated systems allow for aliases.
If you've read Chaucer in the original, you know that the 14th century
English weren't very consistent with spelling, but that all changed
when dictionaries came into wide use in the 18th century.
It would be a shame for a person to go to a website,
and not find a topic because of a minor misspelling.
The Internet is loaded with free misspelling-correctors [61],
but the optimal solution is to write your own, because: (i) you can optimize
the program performance to your own set of names;
(ii) you can bypass the advertisements that are
an inevitable part of free Internet misspelling-correctors; and
(iii) you can integrate the program seamlessly into the rest of your website.
The original method for making misspelling corrections is SOUNDEX,
invented by Robert C. Russell of Pittsburgh, Pennsylvania,
and issued patent number 1,261,167 on April 2, 1918 [61,62].
In this system, the letters of the Roman alphabet are
divided into eight categories, as follows:
RUSSELL SOUNDEX TABLE.
0. Vowels: a, e, i, o, u, y.
1. Labials and labio-dentals: b, f, p, v.
2. Gutturals and sibilants: c, g, k, q, s, z.
3. Dental-mutes: d, t.
4. Palatal-fricative: l.
5. Labio-nasal: m.
6. Dental-nasal or lingual-nasal: n.
7. Dental-fricative: r.
In SOUNDEX, all consecutive vowels are collapsed to the first vowel
(AE, AI, AU, etc., all become A); and all repeat letters
are collapsed to one (SS becomes S, TT becomes T, etc.).
A few other rules apply for plurals and other peculiarities of English,
such as GH in rough, cough, and bough.
After Russell's patent expired, SOUNDEX spawned a generation of imitators.
One of the most valuable is the U. S. Government version, which is,
of course, free to all. Until one of the sophisticated versions
of U. S. SOUNDEX was put into place, death certificates
(often written in the spidery handwriting of some county clerk)
were not immediately and correctly matched to social security records,
and the U. S. taxpayers spent $2 billion annually on social security checks
to deceased persons. I attended a conference in the 1970s,
in which a government official lamented that a U. S. birth certificate
has an average of seven credible matches to death certificates.
In the U. S. SOUNDEX system, letters M and N are combined;
and vowels are omitted unless at the initial letter of the word.
The SOUNDEX code consists of the first letter of the name,
followed by three digits. These three digits are determined
by dropping the letters a, e, h, i, o, u, w and y,
and assigning digits to the remaining letters of the name,
according to the table below. There are two additional rules:
(i) if two or more consecutive letters have the same code,
then they are coded as one letter; (ii) if there are too few
letters to make up three digits, then the remaining digits
are set to zero.
U. S. SOUNDEX TABLE.
0. a, e, h, i, o, u, w, y.
1. b, f, p, v.
2. c, g, j, k, q, s, x, z.
3. d, t.
4. l.
5. m, n.
6. r.
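The rules and table above are short enough to try out directly.
Here is a minimal sketch in Python (an assumption; a Perl version
would follow the same steps). It is one reading of the rules
as stated here, not the official government program. In particular,
it assumes that "consecutive" means adjacent in the original spelling,
so that a vowel between two letters of the same group keeps them separate,
and it treats h and w the same as the vowels. The names SOUNDEX_GROUPS,
LETTER_TO_DIGIT, and soundex are made up for this example.

```python
# Code table copied from the U. S. SOUNDEX TABLE; group 0 letters are dropped.
SOUNDEX_GROUPS = {
    "0": "aehiouwy",
    "1": "bfpv",
    "2": "cgjkqsxz",
    "3": "dt",
    "4": "l",
    "5": "mn",
    "6": "r",
}
LETTER_TO_DIGIT = {
    letter: digit
    for digit, letters in SOUNDEX_GROUPS.items()
    for letter in letters
}


def soundex(name):
    """Return the first letter of the name followed by three digits."""
    # Ignore anything that is not one of the 26 letters (spaces, hyphens).
    letters = [ch for ch in name.lower() if ch in LETTER_TO_DIGIT]
    if not letters:
        return ""
    digits = []
    previous = LETTER_TO_DIGIT[letters[0]]
    for ch in letters[1:]:
        digit = LETTER_TO_DIGIT[ch]
        # Rule (i): adjacent letters with the same code count once.
        # Group 0 letters are dropped and never contribute a digit.
        if digit != previous and digit != "0":
            digits.append(digit)
        previous = digit
    # Rule (ii): pad with zeros, then keep exactly three digits.
    return letters[0].upper() + "".join(digits + ["0", "0", "0"])[:3]


for name in ["Moore", "More", "Maur", "Mare"]:
    print(name, soundex(name))
```

All four spellings come out as M600, which is how a misspelled
inquiry can still be matched to the right record.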
Obviously, SOUNDEX benefits from customization, based upon the
contents of the name file. The Ashkenazy Jewish SOUNDEX [62]
consolidates spellings based upon features of Hebrew, Yiddish,
Germanic, and Slavic languages.
There are published methods for English surnames, Hispanic surnames,
and surnames of inmates in the New York State prison system [65],
which take advantage of the special features of these name-spaces.
Setting up a customized Internet site has the tremendous advantage
that you can see what people are asking for (i.e., you keep a record
of users' inquiries), and you can modify the behavior of the program
accordingly.
The one thing that paper books do not have is the ability
to respond to queries. What if I wanted to ask the author
how he felt about a particular paragraph?
Yes, yes, I hear all the discussions from the various pundits.
But the pundits don't even agree with each other.
Which one agrees with the author?
If the author had lived today,
he probably would have constructed an Internet HTML dialog box
with the responses to Frequently Asked Questions (FAQs).
I suppose there are some questions that we twenty-first-centurians
could ask that the author could not have anticipated, but probably not many.
The more I read about the ancients, the more I am impressed
with the breadth of their thinking. For example, Hippocrates named
and used concepts of BRONCHITIS, NEPHRITIS, and HEPATITIS, although
the understanding of these diseases has matured since those days [67].
Euclid proved a simple, mathematical theorem (i.e., the infinity
of prime numbers) that underlies nearly all the modern
electronic security systems [68,69,70,81].
If Euclid were wrong, then the world banking system would collapse
for a few days while we scramble for substitutes. But don't worry, Euclid
was almost certainly right. I would like to build a website that talks back.
However, one of the disconcerting behaviors of many of these resources
is their unforgivingness toward misspellings.
I recently used a zipcode checker in which you had to write PO BOX,
but P. O. BOX would not work, and there was no guidance in the error message
as to how to correct such a mistake.
There is also a certain thrill of building an application in cyberspace
where the participants are physically separated. I have now done this
with correspondents in Germany, Turkey, and Japan, and the
Johns Hopkins Autopsy Resource (JHAR), which I administer, has
a seamless linkage to the Frankfurt University School of Medicine
Autopsy Register. All these linkages have been crafted by amateur talent,
working for free, without a support staff. It's a lot of fun.
6. WEBSITE LANGUAGES.
There are two levels of languages for a website:
a MARKUP LANGUAGE and a PROGRAMMING LANGUAGE.
A markup language is the language that presents the webpage
in different fonts, colors, locations on the page, position of images, etc.
Far and away the most popular markup language on the Internet
is HTML (Hypertext Markup Language) [72,73]. Microsoft WORD files have the
option of being saved as HTML files, which can then be ported
directly to the Internet. It is easy to learn a little HTML,
just as it is easy for foreigners to learn a little English.
Witness the fact that just about every Tom, Dick, and Harry on earth
has his own homepage. The main drawback with a webpage written
in a markup language is that it just sits there. Markup languages
have essentially no capability to collect or respond to user-supplied data.
A programming language, also known as a
SCRIPTING LANGUAGE,
is the language that accepts user data
(name, email address, responses to boilerplate questions, etc.).
The best programming language for amateurs like myself is Perl [74,75,76],
because it is easy to learn a little Perl. Two other leading
Internet programming languages, Java and C, are slicker and faster than Perl,
but they take several months of effort just to learn to write a program
that says: Greetings from Bill Moore. Perl is easy and relatively forgiving
as programming languages go.
HTML is the lingua franca of the Internet.
You can also view an HTML file locally (within your own computer),
to see what it looks like before you display it to the world on your website.
You simply click on your BROWSER icon, click on the FILE command
(TOOLBAR, upper left corner, of Microsoft INTERNET EXPLORER or NETSCAPE),
and click to OPEN a local file. You can also click on VIEW, then
click on DOCUMENT SOURCE from the TOOLBAR. You can purchase
any number of HTML books from Borders or other quality bookstores [72,73],
but most people learn sophisticated features of HTML by seeing a website
they like, and looking at the DOCUMENT SOURCE to see exactly how it was done.
In church they taught you not to steal, but this is exactly how business
is conducted on the worldwide web. Obviously, don't steal the exact code
(usually a copyright infringement, unless it's a U. S. government site),
but it's OK to steal the idea. This feature of HTML alone is a major
reason why the worldwide web has spread globally like wildfire.
Another important property of HTML is that it works on IBM PCs,
Macs, and Unix-based servers. It is the ultimate, worldwide
U. S. First Amendment right to say what you please in public.
I like to work with raw HTML on a text-editor, for the same reason
that some auto drivers prefer a stick shift to an automatic: better control.
Click on START,
then click on PROGRAMS, then click on ACCESSORIES, then click on NOTEPAD.
The Microsoft NOTEPAD text editor is great for any file up to 64 kilobytes,
but it chokes on files above this size. When you're ready to write a file
above 64 KB, we can talk about alternatives. For now, NOTEPAD will suffice.
The simplest possible HTML program is two commands:
<html>
</html>
where <html> starts the file, and </html> ends the file.
Write this to a NOTEPAD file (or COPY and PASTE it), and OPEN it
on your browser. You will see a blank page, because this HTML page
has no content.
Typically, an HTML command begins with <...>
and ends with </...> . The entire file is typically divided
into a HEAD and a BODY. The TITLE of the file is part of the HEAD,
and the BODY contains the text of the document.
A forced <BR>eak to a new line does not have an end command.
You now have enough information to build a text file in HTML.
<html>
<head>
<title>
Place Title Here.
</title>
</head>
<body>
Place Text Here.
<br>
Place next line of text here.
</body>
</html>
Again, write this to a NOTEPAD file (or COPY and PASTE it), and OPEN it
on your browser. You will see a page that reads:
Place Text Here.
Place next line of text here.
The next step you should learn in HTML is how to link your webpage
to other pages on the Internet. For example, if you wish to link
to the popular webpage index, google.com, you type:
http://www.google.com/
into the address line of your web browser.
To display the same linkage in a webpage, enter:
<html>
<head>
<title>
Link to google.com
</title>
</head>
<body>
Link to google.com:
<br>
<a href="http://www.google.com/"> http://www.google.com/ </a>
</body>
</html>
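A browser, or one of the web crawlers mentioned earlier, finds the
linkage by reading the href part of the <a> command. As a rough
sketch of that step, here is a short Python program (an assumption;
Python is used here only for illustration) that pulls the links
out of the page above, using the standard html.parser module.
The class name LinkCollector and the variable PAGE are made up
for this example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> command in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # The parser hands over the tag name and a list of (name, value) pairs.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# PAGE is the example webpage from the text, as one string.
PAGE = """<html>
<head>
<title>
Link to google.com
</title>
</head>
<body>
Link to google.com:
<br>
<a href="http://www.google.com/"> http://www.google.com/ </a>
</body>
</html>"""

collector = LinkCollector()
collector.feed(PAGE)
print(collector.links)
```

The program prints a list holding the single address
http://www.google.com/, which is the only linkage on the page.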
Now let's talk about Perl. Perl is a programming language that accepts
user data from somewhere on the Internet, and does something with it.
Remember, on the Internet, somewhere is anywhere.
I have an HTML page in Baltimore that launches a program in
Frankfurt, Germany, and then returns the results back to the Baltimore site.
The fundamental paradigm is simple:
You launch a program with a valid HTML file,
and the program returns another valid HTML file.
You can go back and forth like this forever.
For the time being, we can practice on one of my websites.
I will give you the exact name and password after I get them all set up.
To run a Perl program at your website, the website needs
to have a COMMON GATEWAY INTERFACE (cgi-bin), that includes
Perl at least version 5. Such a website costs about twice
as much as a website without a cgi-bin, but without a cgi-bin,
you have no way to collect information from users.
If you are interested, you should look into this.
If you have no foreseeable access to a cgi-bin with Perl,
then you may not wish to continue reading this letter.
Learning Perl without a place to practice it is like learning to swim from lectures.
You really have to jump in to get it right.
There are additional security issues with a cgi-bin, since you would then
have the power to broadcast unwanted files and email (so-called SPAM)
to user sites. Somebody has to trust you enough to give you this access,
and in my experience, unsophisticated but powerful people will throw up
a blizzard of smokescreens and baffles before they let you go ahead.
You've been around church politics long enough to know how that goes.
Here is an HTML program that launches a GREETINGS program in Perl
from one of the websites that I control:
<html>
<head>
<title>
Greetings.
</title>
</head>
<body>
Greetings from Bill Moore.
<br>
<a href="http://www.medparse.com/cgi-bin/greeting.cgi">
http://www.medparse.com/cgi-bin/greeting.cgi </a>
</body>
</html>
Here is the Perl program that is launched:
#!/usr/bin/perl
print "Content-type: text/html\n\n";
### PERL5 script to say greetings.
### Last Modified 12/30/2002.
###
### PRINT HEADER.
###
print "<html><head><title> Greetings.
</title></head><body>";
###
### PRINT GREETINGS.
###
print " Greetings from Bill Moore. ";
###
### PRINT TRAILER.
###
print "<br><br><br><hr>";
print "Last Updated: December 30, 2002, G. William Moore, MD, PhD.";
print "<br><br></body></html>"; exit;
Each Perl program that runs from a cgi-bin begins with
#!/usr/bin/perl
print "Content-type: text/html\n\n";
and ends with
exit;
If you have a Perl interpreter that works locally on your own computer,
then you can try out the above program yourself.
Just COPY the program from this email, PASTE it into NOTEPAD,
and SAVE it with a suitable name such as greeting.cgi
Click on START, then click on RUN,
then enter
perl greeting.cgi
in the dialog box.
Here is the HTML file that the Perl program produces:
<html><head><title> Greetings.
</title></head><body>
Greetings from Bill Moore.
<br><br><br><hr>
Last Updated: December 30, 2002, G. William Moore, MD, PhD.
<br><br></body></html>
Here is what the file looks like on your browser:
Greetings from Bill Moore.
Last Updated: December 30, 2002, G. William Moore, MD, PhD.
If you don't have a local Perl interpreter, then you can get one at:
http://www.downloadsafari.com
Put the word PERL in the search dialog box.
Select the file named PERL..500402. It may be on the second
or third page. Click on DOWNLOAD NOW. As far as I can tell,
this file has no cost and no registration requirement
that would let them load up your email with junk mail.
Now you must UNZIP the perl...zip file using WINZIP.
If you don't have WINZIP, then get an evaluation copy at:
http://www.winzip.com
Follow the instructions.
The unzipped Perl files will end up in subdirectory
c:\unzipped\perl5...\perl5...\perl\bin
To get started, the only files you really need from this subdirectory are:
perl*.* and *.dll. Now enter:
perl greeting.cgi
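Before you go further, it may help to confirm that the interpreter you just unpacked is Perl 5 or later, since that is what a cgi-bin needs. Here is a minimal sketch of such a check. The file name version.pl is only a suggestion, and the wording of the messages is mine:

```perl
#!/usr/bin/perl
### PERL script to report the version of the interpreter.
### Save as version.pl (a made-up name) and enter: perl version.pl
###
### The built-in variable $] holds the version as a number,
### for example 5.004 for Perl 5.004.
###
print "This is Perl version $]\n";
if ($] >= 5) {
    print "This is new enough for the examples in this tutorial.\n";
} else {
    print "This is older than Perl 5. Look for a newer copy.\n";
}
```

If the first line of the output shows a number of 5 or more, you are ready for the next step.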
Finally, here's how the user gives information to a Perl program,
using the FORM command:
<html><head><title> Greetings.
</title></head><body>
Greetings from Bill Moore.
Enter your name here:
<br><form name="sender" method="get"
action="http://www.medparse.com/cgi-bin/greeting.cgi">
<input type="text" name="tx" size=30 maxlength=40 value="">
<input type="submit" name="bx" value="SUBMIT"></form>
<br> <br><br><br><hr>
<br> Last Updated: December 30, 2002, G. William Moore, MD, PhD.
<br> <br><br></body></html>
COPY and PASTE it and OPEN it on your BROWSER.
Enter your name, and click on the submit button.
The address line at the top of your browser will read:
http://www.medparse.com/cgi-bin/greeting.cgi?tx=YOUR+NAME&bx=SUBMIT
The Perl program that sends the answer is:
#!/usr/bin/perl
print "Content-type: text/html\n\n";
### PERL5 script to say greetings.
### Last Modified 12/30/2002.
###
### PRINT HEADER.
###
print "<html><head><title> Greetings. </title></head><body>";
###
### COLLECT INPUT STRING.
###
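### The web server places everything after the "?" into QUERY_STRING,
### for example: tx=YOUR+NAME&bx=SUBMIT
$inputstring=($ENV{'QUERY_STRING'});
### Keep whatever follows "tx=" ...
($inputleft,$inputright)=split(/tx=/,$inputstring,2);
### ... and cut it off at the next "&".
($outputleft,$outputright)=split(/\&/,$inputright,2);
$nameout=$outputleft;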
###
### PRINT GREETINGS.
###
print " Greetings from Bill Moore. ";
if($nameout ne ""){
print "<br> Greetings to $nameout. ";};
###
### PRINT TRAILER.
###
print "<br><br><br><hr>";
print "Last Updated: December 30, 2002, G. William Moore, MD, PhD.";
print "<br><br></body></html>"; exit;
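The program above prints the value of tx exactly as the browser sent it. A name typed as YOUR NAME comes back as YOUR+NAME, and any HTML that a user types into the box is passed straight through to the next page. Here is a minimal sketch, not part of the original greeting.cgi, of one way to decode the query string and make the name safe to print. The subroutine names parse_query and escape_html are made up for this illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;
###
### SPLIT A QUERY STRING INTO DECODED NAME-VALUE PAIRS.
### (parse_query is a hypothetical helper name.)
###
sub parse_query {
    my ($query) = @_;
    my %field;
    foreach my $pair (split(/&/, defined $query ? $query : "")) {
        my ($name, $value) = split(/=/, $pair, 2);
        next unless defined $name && $name ne "";
        $value = "" unless defined $value;
        foreach ($name, $value) {
            tr/+/ /;                               # "+" stands for a blank
            s/%([0-9A-Fa-f]{2})/chr(hex($1))/eg;   # "%41" stands for "A"
        }
        $field{$name} = $value;
    }
    return %field;
}
###
### REPLACE CHARACTERS THAT HAVE A SPECIAL MEANING IN HTML.
### (escape_html is a hypothetical helper name.)
###
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}
###
### PRINT GREETINGS.
###
my %field = parse_query($ENV{'QUERY_STRING'});
my $nameout = defined $field{'tx'} ? escape_html($field{'tx'}) : "";
print "Content-type: text/html\n\n";
print "<html><head><title> Greetings. </title></head><body>";
print " Greetings from Bill Moore. ";
if ($nameout ne "") {
    print "<br> Greetings to $nameout. ";
}
print "<br><br></body></html>\n";
```

With the query string tx=YOUR+NAME&bx=SUBMIT, parse_query gives the field tx the value YOUR NAME and the field bx the value SUBMIT. The pairs are split apart before they are decoded, so an ampersand that the user typed as part of the name (sent by the browser as %26) is not mistaken for a separator.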
7. REFERENCES.
1.
U. S. Defense Advanced Research Projects Agency (DARPA).
http://www.darpa.mil
2.
U. S. Defense Advanced Research Projects Agency (DARPA).
Agent Markup Language.
http://www.daml.org
3.
U. S. Defense Advanced Research Projects Agency (DARPA).
Ontology Inference Layer.
http://www.ontoknowledge.org/oil
4.
Simpson A. HTML Publishing Bible, Windows 95 Edition.
Foster City, CA: IDG Books Worldwide, Inc., 1996.
International Data Group Company, 919 East Hillsdale Blvd,
Suite 400, Foster City, CA 94404. 1-415-655-3200.
5.
Till D. Teach yourself PERL 5 in 21 days, Second Edition.
Indianapolis: SAMS Publishing, 1996.
SAMS Publishing, 201 West 103rd Street, Indianapolis, IN 46290.
1-800-428-5331.
11.
U. S. Code of Federal Regulations. 1995. 45 CFR Subtitle A
(10-1-95 Edition), part 46.101 (b) (4).
U. S. Department of Health and Human Services. Office of the Secretary.
The complete Common Rule document (45CFR46), at URL:
http://ohrp.osophs.dhhs.gov/humansubjects/guidance/45cfr46.htm
12.
U. S. Code of Federal Regulations. 1999. 45 CFR Parts 160 - 164.
Standards for Privacy of Individually Identifiable Health Information;
Proposed Rule.
Department of Health and Human Services. Office of the Secretary.
Fed Regist. 1999 Nov 3;64(212):59917-59966.
http://aspe.hhs.gov/admnsimp/
13.
National Cancer Institute's Confidentiality Brochure, at URL:
http://www-cdp.ims.nci.nih.gov/policy.html
14.
Moore GW, Berman JJ.
Anatomic Pathology Data Mining.
In: Cios KJ, ed.
Medical Data Mining and Knowledge Discovery.
2001. XVIII, 502 pp. 98 figs., 98 tabs. Hardcover.
ISBN: 3-7908-1340-0.
Copyright Springer-Verlag: Berlin/Heidelberg 1999.
15.
Sweeney L.
Computational Disclosure Control: A Primer on Data Privacy Protection.
PhD Thesis. Massachusetts Institute of Technology. Spring, 2001. Draft.
http://www.swiss.ai.mit.edu/classes/6.805/articles/privacy/sweeney-thesis-draft.pdf
16.
Sweeney L.
Privacy and medical-records research.
N Engl J Med. 1998 Apr 9;338(15):1077.
PMID: 9537887; UI: 98181820.
17.
Sweeney L.
Guaranteeing anonymity when sharing medical data, the Datafly System.
Proc AMIA Annu Fall Symp. 1997:51-55.
PMID: 9357587; UI: 98020458.
18.
Gödel K.
Über formal unentscheidbare Sätze der Principia Mathematica
und verwandter Systeme. I.
Monatsh Math u Physik. 1931; 38: 173-198.
Translation:
Gödel K. On formally undecidable propositions of Principia Mathematica
and related systems. New York: Basic Books. 1962.
19.
Nagel E, Newman JR.
Gödel's Proof.
New York: New York University Press. 1958.
ISBN 0-8147-0325-9.
10.
Hofstadter DR.
Gödel, Escher, Bach. An Eternal Golden Braid.
New York: Basic Books. 1979.
ISBN 0-465-02656-7, 777 pages.
11.
Casti JL, DePauli W.
Gödel. A Life of Logic.
Cambridge, MA: Perseus Publishing. 2000.
ISBN 0-7382-0274-6, 210 pages.
12.
Smith B.
Mereotopology: A Theory of Parts and Boundaries.
Data and Knowledge Engineering. 1996;20:287-303.
13.
Quine WVO.
Ontological Relativity and Other Essays.
New York: Columbia University Press. 1969.
14.
Stewart I.
Flatterland. Like Flatland. Only More So.
Cambridge, MA: Perseus Publishing. 2001.
ISBN 0-7382-0442-0, 301 pages.
15.
U. S. Defense Advanced Research Projects Agency (DARPA).
Agent Markup Language.
http://www.daml.org
16.
U. S. Defense Advanced Research Projects Agency (DARPA).
Ontology Inference Layer.
http://www.ontoknowledge.org/oil
17.
Boole G.
An Investigation of the Laws of Thought.
On which are founded the Mathematical Theories of Logic and Probabilities.
New York: Dover Publications, Inc. 1954.
18.
Lewis CI, Langford CH.
Symbolic Logic. Second Edition.
New York: Dover Publications, Inc. 1932.
19.
Borkowski L.
Formale Logik.
Muenchen: C. H. Beck.
20.
Zeman J.
Modal Logic: The Lewis Modal Systems.
Oxford at the Clarendon Press. 1972.
21.
Haack S.
Deviant Logic, Fuzzy Logic: Beyond the Formalism.
Paperback - 292 pages (November 1996).
Chicago: University of Chicago Press, 1996.
ISBN: 0226311341.
22.
Nguyen HT, Walker EA.
A First Course in Fuzzy Logic.
Hardcover, 300 pages, 2 edition, July 1999.
New York: CRC Press.
ISBN: 0849316596.
23.
Quine WVO, Ullian JS.
The Web of Belief. Second Edition.
Paperback 2nd edition. February 1, 1978.
McGraw-Hill Higher Education.
ISBN: 0075536099.
24.
Beeson P.
Sutton's Law Paper.
Am J Med. 1960.
25.
Sutton WF, with Linn E.
Where the money was.
Out of print.
26.
Moore GW, Hutchins GM, Bulkley BH.
Certainty levels in the nullity method of symbolic logic:
application to the pathogenesis of congenital heart malformations.
J Theor Biol. 1979 Jan 7;76(1):53-81.
27.
Moore GW, Hutchins GM.
Effort and demand logic in medical decision making.
Metamedicine 1980;1:277-304.
28.
Cios KJ, Moore GW. 2000.
Medical Data Mining and Knowledge Discovery: An Overview.
In: Cios KJ, ed.
Medical Data Mining and Knowledge Discovery.
2001. XVIII, 502 pp. 98 figs., 98 tabs. Hardcover.
ISBN: 3-7908-1340-0.
Copyright Springer-Verlag: Berlin/Heidelberg 1999.
29.
College of American Pathologists.
Surgical Pathology Case Summaries.
http://www.cap.org
30.
Collaborative Prostate Cancer Tissue Resource.
http://www.prostatehealth.org/cpctr
31.
Suppes P.
Introduction to Logic.
New York: Van Nostrand. 1957.
32.
Bernays P.
Axiomatic Set Theory.
New York: Dover Publications. 1968.
33.
Suppes P.
Axiomatic Set Theory.
New York: Dover Publications. 1972.
ISBN 0486616304.
34.
Davis M.
Computability and Unsolvability.
New York: Dover Publications, Inc. 1958.
ISBN 0-486-61471-9, 248 pages.
35.
Andrews GE.
Number Theory.
New York: Dover Publications, Inc. 1971.
ISBN 0-486-68252-8, 259 pages.
36.
Schneier B.
Applied Cryptography, Second Edition.
Protocols, Algorithms, and Source Code in C.
New York: John Wiley & Sons, 1996.
37.
Tarjan RE.
Data Structures and Network Algorithms.
CBMS-NSF Regional Conference Series in Applied Mathematics.
Paperback, December, 1983.
New York: Society for Industrial & Applied Mathematics.
ISBN: 0898711878.
38.
World Wide Web Consortium.
http://www.w3.org
39.
Light R.
Presenting XML.
Sams.net Publishing. 1997.
40.
U.S. National Library of Medicine.
Unified Medical Language System.
http://www.nlm.nih.gov/research/umls/
41.
U. S. National Library of Medicine.
UMLS Knowledge Sources. Twelfth Edition.
Unified Medical Language System.
U. S. Department of Health and Human Services.
National Institutes of Health.
National Library of Medicine. 2001.
42.
Vigorita VJ, Moore GW, Hutchins GM.
Absence of correlation between coronary arterial atherosclerosis
and severity or duration of diabetes mellitus of adult onset.
Am J Cardiol. 1980;46:535-542.
43.
Moore GW, Hutchins GM.
Consistency versus completeness in medical decision making:
Application to 155 patients autopsied after
coronary artery bypass graft surgery.
Proc 6th Annu Symp Comput Appl Med Care. 1982;6:805-811.
44.
Moore GW, Brown LA, Miller RE.
Set Theory Definition and Algorithm for Medical De-Identification.
Arch Pathol Lab Med. 2001; in press.
http://www.netautopsy.org/apep00st.htm
45.
Johns Hopkins Autopsy Resource.
http://www.netautopsy.org/
46.
Moore GW, Berman JJ, Hanzlick RL, Buchino JJ, Hutchins GM. 1996.
A prototype Internet autopsy database.
1625 consecutive fetal and neonatal autopsy facesheets spanning 20 years.
Arch Pathol Lab Med. 1996;120:782-785.
47.
Hornung J.
Kritik der Signifikanztests.
Metamed 1977;1:325-345.
48.
Moore GW, Hutchins GM.
The persistent importance of autopsies.
Mayo Clin Proc. 2000 Jun;75(6):557-8.
50.
Shared Pathology Informatics Network.
http://grants.nih.gov/grants/guide/rfa-files/RFA-CA-01-006.html
51.
Nelson SJ, Cole WG, Tuttle MS, Olson NE, Sherertz DD.
Recognizing new medical knowledge computationally.
Proc Annu Symp Comput Appl Med Care. 1993;17:409-413.
52.
Moore GW, Polacsek RA, Erozan YS,
de la Monte SM, Miller RE, Hutchins GM, Riede UN.
Multilingual translation techniques in the analysis
of narrative medical text.
Comput Methods Programs Biomed. 1986 Mar;22(1):35-42.
53.
Chomsky N.
Aspects of the Theory of Syntax.
Cambridge, MA: The MIT Press. 1965.
54.
http://www.zdnet.com/
OODLES OF FREE SOFTWARE!
For a great, cost-free File Transfer Protocol (FTP) program,
I recommend: WS_FTP.
Click on: DOWNLOAD OUR 50 FREE PROGRAMS
Enter: FTP Click on: SEARCH options
Click on: WS_FTP LIMITED EDITION.
61.
SOUNDEX resources.
http://www.google.com/
Enter SOUNDEX in the search box, and hit ENTER.
62.
Mokotoff G.
Soundexing and Genealogy.
http://www.avotaynu.com/soundex.html
63.
Mormon Soundex.
http://freepages.genealogy.rootsweb.com/~btphelps/bom/
Click on THE SOUNDEX MACHINE.
Cited at this website: Free Brochure.
This essay is based on "Using the Census Soundex,"
General Information Leaflet 55 (Washington, DC:
National Archives and Records Administration, 1995), a
free brochure available from inquire@nara.gov (include
your name, postal address, and "GIL 55 please").
64.
Alighieri D.
The Divine Comedy, I. Inferno. Part 2.
Charles Singleton (Translator). Paperback: 712 pages.
Princeton NJ: Princeton Univ Press.
ISBN: 0691018952.
commentary edition (February 1, 1990).
65.
Taft RL.
Name Search Techniques.
Bureau of Systems Development.
New York State Identification and Intelligence System.
Albany, New York, 1984.
66.
Caldwell T.
Dear and Glorious Physician.
Buccaneer Books; ISBN: 1568492421; December, 1996.
A fictionalized account of the life of Saint Luke.
67.
Hippocrates.
Hippocrates. Volume I.
Jones WHS, transl. Loeb Classical Library.
Cambridge, MA: Harvard University Press. 1923.
ISBN 0-674-99162-1, 361 pages.
Includes Hippocrates' Oath, with explanatory notes.
68.
Euclid.
Greek Mathematics.
Goold GP, ed. Thomas I transl.
Loeb Classical Library. #335.
Cambridge, MA: Harvard University Press. 1939.
ISBN 0-674-99369-1, 511 pages.
69.
Levy S.
Crypto: How the Code Rebels Beat the Government
-- Saving Privacy in the Digital Age.
New York: Viking Press. January 4, 2001.
ISBN: 0670859508, 356 pages.
70.
Schneier B.
Applied Cryptography, Second Edition.
Protocols, Algorithms, and Source Code in C.
New York: John Wiley & Sons. 1996.
71.
Rivest RL.
R. L. RIVEST'S CRYPTOGRAPHY AND SECURITY PAGE.
http://theory.lcs.mit.edu/~rivest/crypto-security.html
Prof. Rivest is the R in the RSA public-private
cryptography algorithm, one of the intellectual
masterpieces of this century.
72.
Lemay L, Tyler D.
Teach Yourself Web Publishing with HTML 4 in 21 days.
Indianapolis, IN: Sams. A division of Macmillan Computer Publishing.
201 West 103rd St, Indianapolis, IN 46290.
October, 1998.
ISBN: 0-672-31345-6.
73.
Simpson A.
HTML Publishing Bible, Windows 95 Edition.
Foster City, CA: IDG Books Worldwide, Inc. 1996.
74.
Till D.
Teach Yourself Perl 5 in 21 days. Second edition.
Indianapolis, IN: Sams Publishing. 1996.
75.
Orwant J, Hietaniemi J, Macdonald J.
Mastering Algorithms with Perl.
Cambridge: O'Reilly. 1999.
ISBN 1-56592-398-7, 684 pages.
76.
Berman JJ.
Perl for Pathologists.
http://www.pathinfo.com/
Scroll to the bottom of the page.
Click on PERL for Pathologists.
This is a fantastically simple, straightforward introduction to Perl,
written by one of my colleagues.