A here document
refers to a way of embedding one document, formatted in one way) into another enclosing document, often formatted in a different way, where the two document formats would clash. The term originated (I believe) with the BourneShell
, which uses two left-angle-brackets followed by a sequence of characters, followed by the content, followed by a line containing nothing but the sequence of characters, i.e.
This program rearranges the foobars installed on your computer
system. To use, invoke with the following options: <blah blah blah>
Lots of other programming languages--most notably scripting languages and markup formats, have here documents. PerlLanguage
uses a syntax very similar to BourneShell
uses three double quotes to begin and end here documents.
uses CDATA sections.
Within a here document, the formatting of the enclosing (host) language is generally disabled; other than searching for the end tag (and possibly an EscapeSequence?
s may or may not be provided to:
- Quote the end tag, if it must be part of the enclosed text
- Drop back into the host language (kind of like the comma in CommonLisp--though quoted and QuasiQuote?d blocks in Lisp are not HereDocuments as they must be well-formed EssExpressions.
If escape sequences are not provided; then it is not possible to quote any text containing the end tag. Some languages (BourneShell
, Perl) allow you to specify the end tag; so that an end tag which isn't
in the quoted document can be chosen. (In PythonLanguage
, the three double quotes can be quoted if need be; with XML CDATA sections you are screwed...I think).
Some languages (embedding files in mail and Usenet posts and HTTP posts and BourneShell
) allow you to specify not merely an end *character*, but an end *string*.
This makes it possible to put *anything* in the enclosed text without having to escape it (with one exception mentioned at QuineProgram
). As long as the enclosed text is less than 17 KB long, you're guaranteed that there is at least one 3-letter "word" that does not occur anywhere in it. As long as the enclosed text is less than 1 terabyte long, you're guaranteed that there is at least one 9-letter "word" that does not occur anywhere in it.
(I saw one ClientHttpRequest?
implementation pick a random 64 bit number in radix-36 every time, hoping that particular string of 12 letters and digits doesn't occur in the text).
There is more explanation on WikiPedia
See also EmbeddedDocument