muforth/README

This file is part of muFORTH: http://pages.nimblemachines.com/muforth

Copyright 2002-2008 David Frech. All rights reserved, and all wrongs
reversed. (See the file COPYRIGHT for details.)


muFORTH README
==============

Why muforth?
^^^^^^^^^^^^
Why write another Forth when there are so many around? Because
everyone else's Forth is wrong. ;-)

First of all, muforth is most emphatically _not_ ANS-compatible. It
would be silly to write another when there are perfectly good ones
available. I wanted instead to build a Forth with specific features,
several of which are shamelessly stolen from Chuck Moore's colorforth
(http://colorforth.com).

I think I started with the question, "What would colorforth look like
without color?" and went from there. I wanted a substrate for experiments,
a good metacompiler, and something small and simple.

Why the name?
^^^^^^^^^^^^^
From startup.mu4:

  The idea is to move as much code as possible -out- of the Forth kernel.
  Hence the name: "mu" is the Greek letter often used in engineering to
  represent "micro". I had called it "nu" Forth, because it was new, but I
  like this nu Greek letter better.

Why another Forth?
^^^^^^^^^^^^^^^^^^
In other words, why not keep using dforth, an earlier Linux Forth I wrote
that I have used successfully on several projects?

dforth has some qualities that I now construe as defects. dforth

  * is written entirely in x86 assembler;

  * traps into the Linux kernel directly, eschewing the C library;

  * is implemented via indirect-threaded code (ITC), an elegant technique,
    but I was interested in exploring alternatives;

  * is a very big kernel, with lots of task-specific code.

I wanted to go the other way. My desiderata included the following:

  * a kernel, written in C, that is the smallest possible kernel capable of
    compiling Forth colon definitions (ie, it is self-bootstrapping);

  * a simple native-code compiler;

  * a tail-recursive implementation;

  * a new parser, tokenizer, and interpreter;

  * colorforth's terseness but implemented without color and without
    reinventing the OS.

Map of the territory
^^^^^^^^^^^^^^^^^^^^
The following sections are not really in any order. Some lead into the next
one. Some are just the thing I thought of next as I was writing. In any
case, if you get bored reading one section, just skip to the next one. It's
bound to be more interesting!

Written in C
^^^^^^^^^^^^
muforth is written in C. This was a hard decision, and involved some
difficult tradeoffs.

When implementing dforth, I initially wanted to write it in C; but I was
also committed to doing an ITC Forth, and doing that in C I found to be
very clumsy. I bit the bullet and wrote in assembler, which was quite
liberating. The only real downside was the frustration of using m4 as a
macro processor - I needed more than the C preprocessor could give me, but
I grew to _hate_ m4. I found using it to be a case of programming by
trial-and-error, rather than something on which I could bring knowledge and
experience to bear.

This time around I was willing to give on a few things in order to be able
to write in C.

Of course, muforth is far from architecture-neutral, as it contains a
native code compiler that compiles little chunks of x86 machine code. This
would need to be changed to run on other architectures. I tried to keep it
simple, so this shouldn't be hard. A few hours' work, perhaps, once you
understand the structure of muforth.

There are several places where writing in C involved compromise or
cleverness; I'll try to talk about them in their structural context. The
main concerns:

  * calling C from Forth;

  * length-prefixed Forth-style strings vs zero-terminated C strings;

  * code generation inefficiencies forced on me by gcc (no register
    globals)

Since I wanted to be able to write, in C, several utility routines that I
could call from Forth, I decided to write much of the kernel of muforth using
a Forth calling convention. I wrote such "shareable" words as functions
with void arguments returning void, and they "manually" get parameters
from, and push their results onto, the Forth stack - which is simply a C
array of ints.

Any routine that uses the Forth stack (rather than the C stack) in this way
has its name prefixed by "mu_". By putting an entry for a "mu_" function in
one of the arrays that fill the initial dictionary chains for .forth. and
.compiler. it is possible to call the function as a word from Forth.
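
Here's a rough C sketch of what that convention looks like. The names and
details below are illustrative, not the actual kernel source:

  /* Sketch only: the Forth data stack as a C array of ints ("cells"),
     with sp pointing at the top item. This sketch assumes the stack
     grows downward. */
  typedef int cell;
  static cell stack[64];
  static cell *sp = stack + 64;

  #define TOP      (sp[0])
  #define PUSH(x)  (*--sp = (x))
  #define POP      (*sp++)

  /* A "mu_" word: void arguments, void return; parameters and results
     travel on the Forth stack. */
  void mu_plus(void)
  {
      cell n = POP;
      TOP += n;             /* ( a b - a+b ) */
  }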

By doing things this way I was able to write, only once and in C, words
that otherwise I would have to rewrite in Forth. I wanted to avoid such an
error-prone and ugly approach; the cost that I paid was to be forced to
generate crappy native code, since the C compiler dictated my use of
registers, and in particular refused to allow me to put important global
variables in registers.

Another side effect of this decision is that any code that calls into the C
library needs to act like a glue function between the two stacks. There are
some examples of this in tty.c and time.c. It's pretty simple - a routine
simply moves values between the two stacks as needed. Since the return
stack is shared, and since there is no "execution engine" that needs to be
fired up on re-entry to Forth - unlike a threaded Forth, which requires
this - the only "stack convention" to worry about is making sure the right
values get moved between the two stacks.
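
As a hedged illustration - the real glue words in tty.c and time.c differ
in names and detail - a wrapper around the C library's write() might look
like this:

  #include <unistd.h>

  typedef int cell;
  extern cell *sp;    /* the Forth data stack pointer, as sketched above */

  /* Hypothetical glue word, ( addr len fd - count) in stack-comment
     terms: pop the C arguments off the Forth stack, call into libc, and
     push the result back. (On the 32-bit x86 this README assumes, a
     cell is big enough to hold a pointer.) */
  void mu_write(void)
  {
      int   fd   = sp[0];
      int   len  = sp[1];
      char *addr = (char *) sp[2];

      sp += 2;                        /* net effect: three in, one out */
      sp[0] = write(fd, addr, len);
  }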

The only place in the code that needs to "worry" about calling conventions
is the native compiler code. It needs to know, in particular, which
registers the C compiler treats as caller-saved, and which callee-saved, so
it can generate code that won't clobber any of its caller's values, and,
likewise, that its caller won't clobber. I determined this largely through
a trial-and-error process of writing useless and bizarre C code that
consumed lots of registers and made lots of calls, but didn't do anything,
and then looking at the assembler output generated by the C compiler.
Because C and Forth can call each other "seamlessly", my code had to play
nicely with C's ideas about register usage.

Oddities and idiosyncrasies
^^^^^^^^^^^^^^^^^^^^^^^^^^^
muforth has many differences from more conventional (old-school?) Forths.
As I said, it's emphatically _not_ ANS-compatible. Instead I tried to
choose and use the best ideas I could find. Many of them are, perhaps not
surprisingly, from two Forths written by Chuck Moore: cmFORTH (for the
Novix chip) and colorforth (for the x86 PC).

Though I've chosen to be intentionally incompatible, I haven't done so to
be difficult or contrarian, but instead because being compatible would have
forced me to give up much of what I wanted to accomplish.

Ideas from cmFORTH
^^^^^^^^^^^^^^^^^^
cmFORTH's main innovation, in my view, was to get rid of IMMEDIATE words.
As any Forth old-timer knows, IMMEDIATE words are executed at compile time.
But they are also executed at "interpret" time, which causes no end of
trouble and leads to horrendously strained games being played with the
names of words.

Instead of IMMEDIATE words, cmFORTH has an entirely separate set of words
in the dictionary. There are two sets - I call them "chains" - forth and
compiler. The compiler words play the role of IMMEDIATE words in cmFORTH
and muforth.

In fact, the same idea exists in colorforth, but Chuck changed the names.
There is a lot more explanation of this in the section on the dictionary.

Since you need a way to force the compilation, rather than execution, of
compiler words, Chuck defined the word \ that searches in the compiler
chain for a token, compiles it if found, and complains if not found.

I call this word \c (compile from compiler chain); and I define \ to be
more like ANS's POSTPONE. There is much more explanation of this in the
section on the funny compiler word called \ .

Ideas from colorforth
^^^^^^^^^^^^^^^^^^^^^
In colorforth the color of a word specifies what the interpreter should do
with it. Red means define a new word; the token in red is the name of the
new word. Yellow means execute; green means compile. There are some other
colors as well, but for our discussion these are the most interesting.

In order to be able to calculate constants inside a colon definition, you
need only switch to yellow, do the calculation, and then switch back to
green. At the yellow-to-green transition the compiler generates a literal
with the value of the calculation done by the yellow words.

How was this done "traditionally"? Like this:

  : blog  [ 25 80 * ] literal + ;

The word "literal" is a compiler word, which gets executed at compile time;
it compiles a literal from the value on the top of the stack - in this
case, the result of 25 80 *.

But that "literal" is ugly. In colorforth it goes away. I wanted to make it
go away in muforth as well. So what did I do? I made "]" _always_ generate
a literal. If you need to jump out of the compiler (using "["), do
something, and then jump back, you can use "-]". It doesn't create a
literal.

Our example above becomes

  : blog  [ 25 80 * ] + ;

which is much nicer. A few neat examples, from startup.mu4:

  : 0   [ dup dup xor ] ;
  : -1  [ 0 invert ] ;
  : 1   [ -1 negate ] ;

A side effect is that literal is no longer a compiler word. Instead of
being used inside normal Forth words to compile literals, it is instead
used inside compiler words, but is itself a normal Forth word. In both
cases it runs at compile time, but the mechanism has changed - it now finds
itself inside compiler words instead of being one itself.

Since there isn't really any compiler "state" in colorforth - each word
tells the interpreter, by its color, how to treat it - there is no need to
bracket definitions with ":" and ";". A red word marks the start of a new
definition; nothing really marks the end. In fact, it's possible, in
colorforth, to write a word whose execution "falls thru" into the next
word. This is considered bad style by some, but assembler programmers over
the years have used it to great advantage. colorforth gives you this
option. (Heh heh. So does muforth.)

If we don't need to mark the start and end of a word with ":" and ";", why
is there a word called ";" in colorforth? What does it do? It exits from,
or returns from, the current word, but without in any way marking it as
"done". It is like EXIT in more traditional Forths.

I coveted these qualities for muforth, but had the constraint that my
compiler _does_ have state, and does switch back and forth between
interpreting and compiling. So I need ";" to end my words, like it always
has. So I needed two new words:

 * to exit a word "prematurely";
 * to end a word, but "fall thru" to the next word.

The first I called "^" to capture the idea of jumping up out of the word.
The second I already had. It's called "[", and is used, as explained above,
to "temporarily" exit the compiler so that we can do a calculation or
something. Great! That's all we need. Since we do not smudge or hide newly
defined words, the only action that needs to be taken to exit the compiler
is to switch from compile to interpret state. This is exactly what [ does:
it stops the compiler, but doesn't compile a "return" (or convert a
tail-call to a jump).

Then there is the usual ; which does both these things. In fact, it simply
calls one ( ^ ) and then the other ( [ ). Its definition is basically:

  compiler
  : ;  \ ^  \ [  ;

where \ is used to force the compilation of a compiler word.

colorforth has the oddball feature that "if" leaves the top of stack
intact. This can be useful or annoying. I thought it would be interesting
to have _both_ options - destructive and non- - so I defined "=if" and
"=while" in addition to "if" and "while". "if" and "while" consume the top
of stack; "=if" and "=while" leave it alone.

And, just as in colorforth, there is no "else" in muforth! You can always
get by without it, and often the code factoring improves. For example, you
can rewrite

  <test> if  a  else  b  then  c ;

as

  <test> if  a c ^  then  b c ;

colorforth doesn't "smudge" (hide) new definitions until they are complete
- and neither does muforth. This makes writing recursive definitions easy,
but makes redefinition harder (you have to rename the old function first).

colorforth lacks the looping words begin, while, until, and repeat. (It
retains for and next.) In colorforth, to write a loop, you simply "call" a
word from within itself. For this to work, colorforth has to be
tail-recursive. So muforth is as well. I liked the idea of being able to
write iterations as syntactically recursive procedures, as in Lisp;
however, I haven't used it much. ;-) See the section on the joy of
tail-recursion for more details.

Numbers
^^^^^^^
As part of my effort to strip everything out of the kernel of muforth that
is not necessary for bootstrapping - keeping only what is necessary to make
colon definitions - I stripped number parsing, number formatting, and most
i/o out of the kernel.

There aren't even any constants defined in muforth! I calculate them,
starting with 0, in startup.mu4. From there I get some basic constants I
need for deciding if a character is a digit, and at that point I can write
the number parsing code.

The number parsing code has some flaws, and isn't that different from what
you'd find in F83 or an ANS Forth. The main thing I added was radix
prefixes. Even though you're in, say, hex, you can enter a number in any of
4 radices by prefixing it with a single character:

  "  -> hex
  '  -> octal
  #  -> decimal
  %  -> binary

This can be a great headache saver.

Forgetting
^^^^^^^^^^
muforth is like an elephant; it never forgets. ;-)

muforth does not define the common Forth word "forget", nor does it define
"empty", as colorforth does. In the presence of deferred words, any kind of
"forget" is error-prone. I find it better to quit and reload. This has
never bothered me, or felt limiting in any way. And it certainly makes
memory allocation easier. ;-)

Strings
^^^^^^^
I struggled with how to define and use strings in muforth. I _really like_
length-prefixed strings, but using them with C is a pain. After several
false starts I arrived at a simple structure for strings that solves not
only the problem of C compatibility, but a host of other problems related
to having strings "inline" with code - the traditional Forth way.

Since muforth is intended to run on modern machines with lots of caches, it
seemed important to separate code and data. This affects not only strings
but also create/does>. I'll talk about create/does> in a later section.

Except in a few places - throw comes to mind - a string in muforth is
represented on the stack by a tuple: (start, length). start points to the
first character; length is the count of characters. The string may or may
not be zero-terminated; no muforth code depends on the presence of such
termination.

But what about saving strings for later use? Now we have to have a
structure for strings in memory, and we'd like it to be C compatible, since
maybe the muforth word we're calling is going to call into the C library.
Maybe we're opening a file by name.

My conclusion was to compile the tuple

  (length, string, termination, padding).

length is a cell-sized count, so very large compiled strings are possible.
string is an array of the characters. termination is a single null
character - always present in compiled strings! And padding gets us to a
cell boundary.

Note that it's possible to "compile" a string but not allot the space for
it. This is useful for temporary strings that need to be zero-terminated,
such as filenames to be passed down into the C library. Unfortunately it's
easy to clobber the string thusly compiled. I used to have "ld" use a
compiled but not allotted string; now it allots it too, so even "temporary"
filename strings use up data space. Oh well. ^C, up-arrow, return. ;-)

Unlike traditional Forths, I compile the string into "data" space, not code
space. Then I compile a literal that at runtime pushes the address, not of
length, but of string. This way I can easily pass it to C code, and I can
just as easily convert it to a canonical muforth (start, length) string by
executing count:

  : count  ( z" - a u)  dup  cell- @ ;

z" here is a mnemonic for zero-terminated, length-prefixed, compiled
string.
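
Sketched as a C declaration - again, illustrative rather than the kernel's
actual source - a compiled string looks roughly like this:

  /* A compiled string. The literal compiled into a word pushes the
     address of chars[], not of length, so C code sees an ordinary
     zero-terminated string, and count only has to step back one cell
     to recover the length. */
  typedef unsigned int cell;
  struct compiled_string
  {
      cell length;      /* cell-sized count                            */
      char chars[];     /* the characters, a terminating null, then    */
                        /* padding out to a cell boundary              */
  };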

One of my subgoals with muforth was to have only one kind of literal.
Forths traditionally have had several different kinds of string literals,
all of them inline (thereby possibly causing cache problems by mixing code
with data) and therefore requiring strange and convoluted code to "jump
over" the inline string in order to continue execution. By converting
string literals to ordinary literals, lots of problems go away, and the use
of strings now becomes neatly postfix as well. If I want a literaled
compiled string to push (start, length) I simply follow the literal with a
call to count. Look for "string," in startup.mu4 and you'll see how it all
works.

Note also that this kind of literal can be used for any arbitrary data
structure, not just strings; it defines a _universal literal_. What is a
clump of data but a (start, length) tuple? You can compile any kind of data
this way, make a literal of it as a counted string, and then call code that
knows how to pull bits out of it and do something with them.

Digression about tokenizing, strings and suffixes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Before digging into the interesting details about how muforth's tokenizer
works I want to make a digression that I hope will shed some light on the
odd data structure that I use to capture the state of the tokenizer. I
should say that these ideas came _after_ I had settled on the structure and
written the code - long after. I wanted to find a "formal" (or semi-formal
;-) explanation of the data structure and why it is well-suited to
tokenizing. While walking today, I think I found it.

Think about what a tokenizer does. It looks at a string, a character at a
time, classifying each character as a delimiter or token character. Its job
is to (possibly) skip leading delimiters, collect a sequence of token
characters, and (possibly) consume the delimiter following the last token
character. It returns a (start, length) string representing the token. Then
it somehow "remembers" which characters it has seen, so that the next time
it gets called it will skip over them, rather than starting over at the
start of the string.

It peels a character off the front, does something with it, and sets aside
the rest. At the lowest level, then, a tokenizer looks at successively
shorter suffixes of its input string. This is the main insight that leads
to our data structure. But before we get there, let's look at what is
commonly done.

Almost all Forths represent the string to be tokenized as a (start, length)
tuple, and the count of characters already consumed by the tokenizer as an
offset, commonly called >in. When >in is zero, we have consumed none of the
string; when >in == length, we have consumed the entire string. If we think
of >in as the length of the _prefix_ of the string already consumed, then
(length - >in) is the length of the suffix _yet_ to be consumed.
(Invariant: len(prefix) + len(suffix) == length.)

We should note that using (start, length) to represent successive suffixes
is clumsy. Here are a few such successive suffixes:

  (start, length)          the whole string
  (start + 1, length - 1)  .. minus the first character
  (start + 2, length - 2)  .. minus the first two characters
  ...
  (start + length, 0)      the empty suffix

For each successive suffix, as >in goes from zero to length, it is added to
start, and subtracted from length.

If we were looking at successive prefixes this would be a great way to
represent things:

  (start, length)          the whole string
  (start, length - 1)      .. minus the last character
  (start, length - 2)      .. minus the last two characters
  ...
  (start, 0)               the empty prefix

But that's _not_ what a tokenizer does, so why use this representation?
(I've heard that at least one Algol 60 compiler read its input backwards,
which somehow made parsing easier and more elegant. In that situation some
variant of this successive prefixes method might nicely apply.)

Why not instead put our anchor point at the _end_, which is the same for
all suffixes of the original string? Then we would represent the start of
the suffix as an offset from the end, like this (setting end = start +
length):

  (end, -length)          the whole string
  (end, -length + 1)      .. minus the first character
  (end, -length + 2)      .. minus the first two characters
  ...
  (end, 0)               the empty suffix

Note that only the second part of the tuple is changing. This should be
a clue that this representation is a better match for suffixing than
(start, length). In fact, if we compare this representation of suffixes
with using (start, length) to represent successive _prefixes_, we see that
they are _duals_ of each other. In both cases only the second part of the
tuple changes; the anchor remains the same. The second part goes from
either length to zero (prefix) or -length to zero (suffix).

Practically this means that there is less work to do on each call to the
tokenizer, since we already know which part of the text we've seen (the
prefix that we've "thrown away"), and less work to do for each character,
since we are only changing (and checking for exhaustion) one part of our
data structure.

With that introduction, we are ready for...

Tokenizing the input
^^^^^^^^^^^^^^^^^^^^
This is one of the cool innovations in muforth.

In muforth I define a notional string type, called "text", that is better
suited than normal (start, length) strings to tokenizing. I say "notional"
because it never shows up in the language, only in the C implementation. A
text is something a bit odd. I'll explain.

I discovered, in writing the tokenizer for dforth, that using a string
(start, length) and an offset (past what we have seen already) was
clumsy and slow. This is what Forths have traditionally done. The state of
the interpreter is captured by two variables: source and >in. source is a
double variable containing a pointer to the first character, and a length -
a normal string. >in is an offset _past_ the characters that we've looked
at already, in previous calls to the tokenizer.

On every call to the tokenizer you first have to offset the pointer forward
(by >in), past the text you've seen already, and the count backwards (also
by >in), since it counts characters yet to be seen. You're always
offsetting two things. Even when consuming a single character during
scanning you have to increment the pointer, and decrement the count. Two
operations. Clumsy.

Instead, I thought, why not run everything backwards?

If a string is represented by the tuple

  (start, length)

then a text describing the same sequence of bytes in memory could be
represented by the tuple

  (end, -length), where end = address + length.

The type "text" consists of a pointer just past the _end_ of the text, and
a negative offset from the end back to the start. When we start tokenizing
a chunk of text, we set a global variable called first to "point" to the
first character. But first is really a negative offset from the end of the
text.

In this scenario, the current state of the tokenizer is captured in two
variables: source (a _text_), and first, an offset running from -length up
to 0.

On each call to the tokenizer, first already points _past_ the text
we've seen already, so we're ready to start looking for a new token. Now
might perhaps be a good time to mention a little secret: muforth has _two_
tokenizers. One - token - assumes that tokens are separated by whitespace,
and skips _leading_ whitespace. The other - parse - takes a delimiter
character and scans for a _trailing_ delimiter, but assumes that the new
token starts at first - it does _not_ try to skip any leading delimiters.

These two routines do not share any code, even though they are
substantially the same. There simply isn't a good way to tease them
apart, but they are short enough that I'm not worried about it. ;-)

Ok, back to tokenizing. Let's assume we've called token, and we're going to
skip leading whitespace. first "points" to the first character that we
haven't seen yet. So, while the character at first is whitespace and first
< 0 (not at the end of the text), increment first. Now either we've run out
of text, or we've found a non-whitespace character. Either way, this is the
first character of our token.

We leave first pointing to it, set last = first, and then start scanning
with last, incrementing last as long as it points to a non-whitespace
character and remains < 0. When we reach the end of the token, which
happens either if we find a trailing delimiter, or if we run out of text to
scan, last is left pointing just _past_ the last character of the token.

Another way of thinking about this is that we either run out of input text,
or we "consume" a single trailing delimeter character. We capture the
presence or absence of the trailing delimiter in a variable called
trailing, which we set to 1 when we start to look for the end of a token.
If the input runs out before we find a delimiter we set trailing to 0.

We've now "bracketed" a token. first points to its first character; last
points just beyond its last character. (These "pointers" are actually both
negative offsets from the end of the source text.) The (start, length)
string tuple for the token is then

  (end + first, last - first)

We consume the token and trailing delimiter (if there was one) from the
input stream by setting

  first = last + trailing.

Now we're ready to look for the next token.

Of course, the astute reader will be wondering about the various boundary
cases when the text runs out. By being careful about trailing, we ensure
that first is never left > 0. At the end of parsing a token it will be <=
0. If no characters were found between first and last, which can only
happen if first = 0, possibly _after_ skipping leading whitespace (think
about it), then last - first will be 0 as well. At end of text we _always_
return the token

  (end, 0)

and the interpreter code is smart enough to recognize this zero-length token
as the end.
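
Putting all of that together, here is a hedged C sketch of token. The real
code in the kernel differs in names and detail, but the shape is the same:

  #include <ctype.h>

  /* Sketch only. end points one past the last character of the source
     text; first and last are offsets from end, running from -length up
     to 0, where 0 means "out of text". */
  static const char *end;
  static long first;

  struct string { const char *start; long length; };

  struct string token(void)
  {
      long last, trailing;
      struct string tok;

      /* skip leading whitespace */
      while (first < 0 && isspace((unsigned char) end[first]))
          first++;

      /* scan for a trailing delimiter, or run out of text */
      trailing = 0;
      for (last = first; last < 0; last++)
          if (isspace((unsigned char) end[last])) { trailing = 1; break; }

      tok.start  = end + first;
      tok.length = last - first;   /* zero length at the end of the text */
      first = last + trailing;     /* consume token, trailing delimiter */
      return tok;
  }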

So how do we interpret text typed by the user or loaded from a file?

In the first case, simply call interpret with the tuple (start, length)
that describes the buffer holding the input text that the user typed in.
(It's called "typing" in startup.mu4.)

For files, we simply mmap them, and then pass the (buffer, length) tuple to
interpret! This makes muforth's interpreter code look much more like a
BLOCK-based Forth - which it isn't - than the file-based Forth that it is.

Nested loading of files is possible because the file loading code actually
calls evaluate (rather than interpret directly), which saves the current
source and first, calls interpret on the new file's mmap'ed text, then
restores the old source and first, and keeps going from there.

Since nested interpretation of the typing buffer isn't possible, it is
interpreted directly, with no call to evaluate (see quit in startup.mu4).

The dictionary
^^^^^^^^^^^^^^
The dictionary in muforth is very simple. It is a set of intertwined linked
lists of words. Each list represents what I call a chain, and consists of
related words. The head of each chain is the word most recently defined on
that chain.

The dictionary space is linearly allocated. Each entry consists of a tuple:

  (link, code, count, name, padding)

  * a link pointer to the previous word on the same chain;
  * a pointer to the machine code associated with the word;
  * a 1 byte count of characters in the name;
  * the bytes of the name;
  * padding to a 4-byte boundary.
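
A hedged C sketch of one entry - the kernel's actual declaration differs in
detail, but this is the shape:

  /* Sketch of a dictionary entry; names and exact types are illustrative. */
  struct dict_entry
  {
      struct dict_entry *link;    /* previous word on the same chain    */
      void (*code)(void);         /* the machine code for this word     */
      unsigned char length;       /* 1-byte count of characters in name */
      char name[];                /* the name itself, then padding out  */
  };                              /* to a 4-byte boundary               */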

The dictionary can in theory contain an arbitrary number of intertwined
chains. muforth starts up with only two chains defined; their names and
functions I shamelessly stole from Chuck Moore, but this time from cmFORTH.
They are called "forth" and "compiler". (He uses the same idea in
colorforth, but calls the sets of words "forth" and "macro", and they are
arrays rather than lists.)

The forth chain consists of "normal" words: words that should be executed
when mentioned outside a colon definition, and compiled when mentioned
inside one.

The compiler chain fills the role that so-called IMMEDIATE words fill in
more traditional Forths, but without their problems. Because it is a
completely separate chain that is _not_ searched outside of colon
definitions, it is possible to have a word with the same name in both
chains with no confusion. A perfect example is ." . Outside a definition it
simply scans for a trailing " delimiter and then echoes (types) the string.
Inside a definition it compiles the string into "data" space, and compiles
code to type it out at runtime. This is intuitively what we expect.

Inside a colon definition the interpreter searches the compiler chain, and
if it finds the token there it executes it rather than compiling it. In
other words, words on the compiler chain execute at compile time, and tend
to be used to build control structures like if/then and begin/while/repeat.
Because of this, we often call them "compiling words".

But that's not the end of the story. If the token is _not_ found on the
compiler chain, we look on the forth chain, and if found there it is
compiled.

Ok, so how do we specify which chain a word gets compiled into, and how do
we specify "search orders" like the above?

A variable, called current, points to the "current" chain - the one that
new definitions go into. By convention chains have funny names: forth and
compiler are really called .forth. and .compiler. . I chose those names to
indicate their chainness. (The dots on the ends are supposed to look like
"links".) When you mention a chain by name it simply pushes its address.
It's basically a constant.

So, also by convention, we define the words "forth" and "compiler". These do
more than simply name the chain; they make it the "current" chain - new
words get compiled into it.

How does this work? Simple!

  : definitions  ( chain)  current ! ;
  : forth        .forth. definitions ;
  : compiler  .compiler. definitions ;

Any time we define a new chain, .foo. , we also define "foo" as above.

Then our code can look like this:

  forth
  [words compiled into .forth. chain]
  compiler
  [words compiled into .compiler. chain]

Now we can specify what chain words get compiled _into_, but what about
specifying which chains get searched, and when, and what we do with the
words that we find? I'll first explain the basic dictionary search
mechanism, but to fully answer this question I have to talk about the
structure of the interpreter - another cool muforth innovation.

find is the workhorse word that searches the dictionary. You supply it a
token (start, length), and the pointer to the head of a dictionary chain
(such as .forth.). find returns true and a pointer to the word's code if it
found it, or false and the (start, length) tuple you originally gave it if
not. This way it's easy to specify a _sequence_ of searches of chains,
using the same token over and over.
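
Here is a hedged C sketch of that search, reusing the entry layout sketched
earlier; note that the real find also hands back the unconsumed token on
failure, which this sketch glosses over:

  #include <string.h>

  /* Search one chain for a token; return the word's code pointer if
     found, NULL if not, so the caller can try another chain with the
     same (start, length) token. Sketch only. */
  void (*find(const char *start, unsigned length,
              struct dict_entry *chain))(void)
  {
      for (; chain != NULL; chain = chain->link)
          if (chain->length == length
              && memcmp(chain->name, start, length) == 0)
              return chain->code;
      return NULL;
  }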

Which brings us to...

The interpreter
^^^^^^^^^^^^^^^
Like everything in muforth, I wanted the interpreter to be dead simple,
but incredibly flexible. Traditionally, Forth interpreters have relied on
lots of gross hacks and "tricks" that not only make them hard to
understand, but hard to use as well.

The interpreter basically looks like this:

  : interpret  ( start length)
    >text source 2!  first !
    begin  token  dup while  consume ?stack  repeat ;

>text converts (start, length) - the string we want to interpret - into
(end, -length) & -length, which go into source and first, respectively.
(This Forth code is a bit hypothetical; this really happens in C in
interpret.c.)

The loop is simple. Get a token. If its length is 0, we're done. Otherwise,
consume the token and then check the stack for over- & underflow.
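
For the curious, here is a hedged C sketch of that loop, reusing the token()
sketch from the tokenizing section; consume() and check_stack() stand in for
whatever the kernel really calls:

  void consume(struct string tok);    /* dispatch on the current mode   */
  void check_stack(void);             /* catch over- and underflow      */

  void interpret(const char *start, long length)
  {
      end   = start + length;   /* source, suffix-style: (end, -length) */
      first = -length;

      for (;;)
      {
          struct string tok = token();
          if (tok.length == 0) break;      /* (end, 0): out of text */
          consume(tok);
          check_stack();
      }
  }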

Wow. That's it?

The magic all happens in consume. Depending on the state of the
interpreter, consume does wildly different things with the token.

It may not surprise Forth old-timers, but muforth has a variable called
state. Its use is _much_ different than the traditional one, though. My
state points to a two-cell data structure. The first cell is a pointer to
the code to _consume_ a token; the second is a pointer to code to display
an informative prompt, so if you're doing something at the keyboard, you'll
know what state the interpreter is in.

consume is then defined as:

  : consume  state @  @execute ;

I call this (consume, prompt) tuple a _mode_. muforth has only two modes
built-in. Can you guess what they are? Interpret and compile, of course!
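
In C, a mode is nothing more than a pair of function pointers. A sketch:

  /* Sketch of a mode: how to consume a token, and how to prompt. */
  struct imode
  {
      void (*consume)(void);   /* do something with the token just read */
      void (*prompt)(void);    /* show the user what state we're in     */
  };

  static struct imode *state;  /* the interpreter's current mode        */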

Here is the consume function for interpret mode. It has a funny name.
Because [ leaves the colon compiler and ] enters it, the interpret-mode
consume function is called _[ and its compiler dual is _] .

  : _[   ( interpret one token)
       .forth. find  if  execute ^ then  number ;

Look in .forth. for the token. If found, execute it. If not, try to convert
it as a number. If that fails, number will complain. Note that there is
_no_ mention of .compiler. ! Words in .compiler. are invisible to interpret
mode.

The colon compiler's consume function is this:

  : _]   ( compile one token)
    .compiler. find  if  execute ^ then
       .forth. find  if  compile, ^ then  number, ;

Search .compiler. for the token. If found, execute it (it is a _compiling
word_). If not, search .forth. . If found there, compile it. If not found,
try to convert it as a number. If that fails, number, will complain. If it
succeeds, it will compile a literal (hence the name "number,".)

Super-simple, right?

Again, Forth old-timers will laugh at how simple this is compared to what
it replaces: ONLY/ALSO, arrays of weird nibble-encoded dictionary chain
indices, etc.

I think the fundamental difference between this approach and the
traditional ones is that the traditional approaches allow for - in fact,
_demand_ - the run-time creation of search orders. In my humble opinion,
that's simply the wrong time to be creating a search order! When you figure
out what you're trying to do - what your interpreter _mode_ is supposed to
accomplish - you're going to arrive at a search order that is _fixed and
constant_ for that interpreter mode.

I should add that the traditional approaches only specify a search order;
you still have to have immediate words and other gross hacks in order to
have compiling words work. My way specifies not only the search order, but
exactly what to do when a token is found in a particular chain. And the
"encoding" for it is very simple and easy to read. In fact, it's the
simplest thing that could possibly work: straight Forth code!

What makes this _really_ nifty is that it's trivial to create more modes
for the interpreter. startup.mu4 creates a mode for conditional
interpretation, by creating a new dictionary chain called .conditional. and
creating a new interpreter mode. It's a bit hairy, mostly because it allows
_nested_ conditionals, but it's not much code and it's a nice approach: use
the interpreter to scan for tokens for you! That's what it's for!

The simplicity and extensibility of this approach wins big for
metacompilation as well, which, when I finish working on muforth's
"universal metacompiler", I'd like to write about.

The funny compiler word called \
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ANS Forth has a word called POSTPONE that is horrendously complicated but
unfortunately actually useful.

Before I talk about that I want to mention the idea of forcing
interpretation or compilation from a different dictionary chain than the
current interpreter mode would normally choose. An easy example is that
we're writing a compiling word that does the same thing that another
compiling word does, but then does a bit of its own stuff as well. Without
some kind of "escape" mechanism there is no way to compile a call to the
other compiling word, since any mention of it inside a colon definition
will result in its _execution_ rather than its compilation. What to do?

muforth defines several such escape words.

  \f forces compilation of the following token from the .forth. chain
  \c forces compilation from the .compiler. chain

But if you look through startup.mu4, you'll see very few references to
either of these words. Mostly you'll see references to a funny word called
\ .

\ is my name for POSTPONE, courtesy of Chuck Moore. (It was part of
cmFORTH.)

\ is like \c, but it does more than that.

First it looks for the token on .compiler. and compiles it if found. (This
is what \c does, but it stops there - in fact it complains if the token
wasn't found on .compiler. .)

If instead it's on the .forth. chain, \ compiles code into the word that
we're currently compiling, that, when later _executed_, will compile a call
to the word we found in .forth. .

It's kind of a postpone, no matter what. Let's say we're defining the word
foo, like this:

  : foo  \ bar ;

If bar is a .compiler. word, postpone _its_ execution until _foo_
executes. If bar is a .forth. word, postpone its _compilation_ until foo
executes. That's probably the best way to think about it.

With that in mind, let's think about...

Structure compiling words
^^^^^^^^^^^^^^^^^^^^^^^^^
One way to write a loop is like this:

   begin ... <test> while ... repeat

begin marks the beginning. <test> leaves a flag on the stack, which while
consumes, falling through if true and jumping _past_ repeat if false.
repeat then jumps to the code following begin.

So what gets compiled looks something like this, using labels (like
"beginning:") as jump destinations:

  beginning:
    ...
    <test>
    branch if zero to end
    ...
    branch to beginning
  end:

begin compiles nothing, but marks the top of the loop by pushing the
address of beginning. while compiles the branch if zero. repeat compiles
the branch to beginning, the address pushed on the stack by begin.

Let's compare this to the code produced by if/then.

  <test> if ... then

This compiles

    <test>
    branch if zero to end
    ...
  end:

Hmm. This looks just like the code that while is supposed to compile. Do if
and while do exactly the same thing? Not quite. But before we look at the
difference, let's think about what repeat has to do. It has to compile a
backwards branch to beginning, and _then_ it has to fix up the address of
the forward branch that while compiled so that it branches to end.

So what about while? After compiling the conditional branch, the stack has
two pointers on it: beginning, and a reference to the forward branch that
repeat will fix up. But they're in the wrong order! repeat needs to see
beginning first, then the forward branch fix up address. So we need to swap
the addresses on the stack. While in theory repeat could do this, in
practice that's the wrong place, because nothing prevents us, with properly
defined words, from doing

  begin ... while  ... until ... then

which is actually a _very_ useful construct. Unless while does the swap,
until will break, because like repeat it expects the address of the
beginning of the loop on the top of the stack.

All of this explains why while and repeat are defined thus:

  : while   ( dest - src dest)   \ if  swap ;
  : repeat  ( src dest -)     \ again  \ then ;

In fact, repeat's definition as "\ again \ then" suggests using "until ...
then" in our own code.

Of course, I haven't given any good reason for using \ over \c to compile
references in compiling words to other compiling words. Oh well.

The joy of tail recursion
^^^^^^^^^^^^^^^^^^^^^^^^^
Tail-recursion was a motivation for writing muforth. What the heck is it?
you ask. Fair enough.

If I write

  : foo  a b c foo ;

What will happen when foo executes? On a normal Forth, the last reference
to foo will compile a call to foo, which, when executed, will push the
return stack. (Actually, in most Forths, in the presence of SMUDGE, the
reference to foo inside foo will fail unless we somehow tell the compiler
that foo is "recursive".)

After foo executes for a while, the return stack will overflow, and the
program will crash.

But let's look closer at foo. Is there any reason to push the stack when we
call foo from itself? What's it going to do when it returns? It's going to
return to its caller! So why return to ourselves if we're going to
immediately return to our caller? Why indeed!

The call to foo is in what Scheme people call "tail position". Any call in
tail position has nothing terribly interesting to do when it returns - it's
going to return directly to its caller. So what was discovered is that
calls in tail position can be converted to jumps. This conversion is called
"tail-call conversion", and it's very simple, but very powerful.

What happens when we do this? Suddenly foo becomes a word that contains an
infinite loop, but one that consumes zero stack space. We've defined a word
that is syntactically recursive, but procedurally iterative. This is a very
powerful idea, but one that's been with us since the dawn of Lisp, way back
when only FORTRAN existed.

Using tail-recursion obviates the need for any looping constructs. muforth
has them, but really it doesn't need them.

A last question worth asking is, when can we do this conversion, and when
can't we? At any exit from a word, either via ^ or ; we can do the
conversion. There is one case where we have to be careful. If the byte
after our converted tail-call is a jump destination, we have to compile a
return instruction there.

In the following code:

  : foo  if a b c then  foo ;

"call foo" gets converted to "jump foo". What about:

  : foo  if a b c foo then ;

Here the conversion happens too, but _also_ a return instruction gets
compiled after the "jump foo". Strange but true!

Simple and tail-recursive native compiler
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Since muforth text is compiled into native code, we need to define a
compiler. Compilers are usually big and complicated, and take a long time
to write. This is because most languages are big and complicated, and also
because the compiler writers are interested in the best runtime performance
they can get.

Forth is simple, and with muforth I'm really more interested in simplicity,
ease of use, and ease of understanding than in raw performance. The code
that the compiler generates frankly sucks, but it's really simple.
Amazingly even sucky code can run fast enough to be perfectly usable and
useful!

So, how does it work?

Like most Forths, muforth spends most of its time generating sequences of
calls to words. Threaded Forths do the same thing: they compile sequences
of "execution tokens" (to use the FORTH-83 and ANS term) which are
essentially calls with the call opcode stripped off. So, at the lowest
level, almost everything that gets compiled in muforth is a call.

The compiler "remembers" the address of the last call instruction that it
compiled, and so when we're about to compile a "return" to exit from the
word, if the last instruction compiled was a call (and therefore it's in
tail position), it gets converted to a jump.

That's pretty simple. But what else is there?

One of the issues with writing muforth was that gcc didn't let me define a
global register variable to use for the stack pointer or the top of stack,
both of which, in the absence of gcc limitations, I wanted. I can't think
of a way to generate even _reasonably_ good code unless I have those two
global registers. But I can't have them. Pooh.

So, the stack is simply a C array (of cell_t's), and the stack pointer (sp)
is a normal C global variable that lives in memory, that points to a
cell_t. TOP is just sp[0].

If I am generating code that does anything with the stack, I first have to
get sp into a register, indirect off of it, maybe modify it, and then write
it back. So there are a bunch of routines in the compiler C code that
compile these various operations.

There is also code to generate four different kinds of branch instructions,
the combinatorial product of (unconditional branch, branch if top == 0) and
(pop the top, do not pop the top), and also the goo underlying for/next.

Since getting at the return stack (the true hardware stack) is impossible
without low-level support, there are several routines that compile various
pushes, pops, and copies that involve the return stack.

That's about it! For the x86 it's about 300 lines of code.

All of the smarts for compiling control structures (if/then and friends) and
new data structure types (create/does>) are defined in Forth in startup.mu4.

Towards a new create/does>
^^^^^^^^^^^^^^^^^^^^^^^^^^
First, let's talk a bit about "old" create/does>, what it does, and how it
(used to) work.

Basically, create creates a new word in the dictionary, and does> causes
this new word, when later executed, to execute a particular piece of code.

create and does> are used to define "defining words" - words that build new
kinds of data structures. The defining word contains code to create the
data structure, and code to be executed by all the "child words" that this
defining word is used to define.

So a word called "definer" could be written:

  : definer  create  <build the data structure>
              does>  <do something with data structure>  ;

And used:

   definer foo

The code to build the data structure is executed when the definer
_executes_ - this is also the _compile_ time of foo. The code to do
something with the data is executed when _foo_ executes. So we can create
new structures and new behaviors. But how do we link them? In other words,
how to tell the behavior code - which lives in definer and is thus _common_
to all words that definer defines - where to find the data of each child
word, such as foo?

What used to happen is that the child word, foo, would look like this when
compiled:

  foo:
    call definer_code
    <foo's data>

When foo is executed the call to definer_code has the side effect of
_pushing_ the address of foo's data onto the return stack. (The use of
"call" here is confusing. call is being used for its behavior "push address
of following byte and then jump", which is what a call does. But normally
the "address of following byte" is construed as a return address. Here it
is used as a data address, which is popped and never used as a return
address.)

definer_code can then pop the data address off the return stack, and push
it onto the data stack, where the definer_code can use it.

But there are a couple of subtleties here. One is that definer_code is
Forth, so in addition to moving the address of foo's data onto the correct
stack, an entry into the Forth interpreter (for threaded Forths), including
a push of the return stack, has to occur as well. Another is that the
body of foo now mixes code with data, something that I am trying to avoid
with muforth.

I'm going to avoid explaining the subtleties of how threaded Forths have
traditionally accomplished this. It's one of the most strange and wonderful
aspects of Forth, but it's also subtle and complicated, and the way muforth
does it is quite a bit simpler. Maybe when muforth's way really makes a lot
of sense to you you'll be ready to conquer the "old" way.

In summary then, when the child word executes two things have to happen:

  * the address of its data is pushed onto the data stack;
  * a call is made to the shared code in the defining word.

When I started thinking about clean but clever ways to push an address of
data, and then jump to a common piece of code, but without mixing code and
data in the child word, I realized that it was impossible. The best I could
do was to load a register with a constant, and then jump to the common
code.

But this, it turns out, is all you need! The constant can be a constant,
with intrinsic value of its own, or it can be the address of the child
word's data, if it has any.

What this means in usage is that whereas the old-style does> "knew" what
address to push, my new-style does> has to be passed a constant value to
compile into the child. If the constant should be an address of a data
area, then call ram to get an address, and pass it to does>. Sometimes the
stack manipulation is a bit clumsy, but it's a good system in general.

It wasn't necessary to do things this way. I could have kept the old
behavior, that does> compiles code to push the address of the next free
byte of ram - in other words, that the "constant" it pushes is always an
address. This makes the defining words look a little cleaner, but it is
less efficient when you actually want to compile a constant into the data
space (you have to comma it into ram when you define the word and then
fetch it out again whenever you want to use it), and it doesn't allow for
the address being anywhere but the next available byte of ram. On ROMed
systems you sometimes want does> to push a ROM address instead. This new
way lets you choose which you want, by simply supplying the proper constant
to does>.

Inside the guts of create/does>
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
With that introduction, let's look at the gory implementation details. The
code examples here are very x86-specific since at present that is the only
implementation. But it's the ideas that I'm interested in conveying, so
hopefully this will not be a deterrent.

First of all, "does>" changes from compiling an _address_ to compiling a
_constant_ - which could, of course, be an address. does> splits the code
to push a constant into two parts: a load part and a push part. The load
part is the code:

  movl $constant,%edx

The push part is the call [or jump!] to push-literal. This is the same code
that "literal" generates; we're splitting it into two parts to save code
space.

Here's how it works. Let's say we're writing an assembler and need words
for addressing modes. Like many create/does> words, these consist of a
_constant_ and some code. The old way to do this would go something like:

  : md  create ,  does> @  a b c ;

This is a clumsy way to work with constants - compiling them into the word
only to fetch them out again - and is inefficient when compiled to native
code. My modernized way looks like this:

  : md  create  does>  a b c ;
  octal 004 md deferred

does> by default creates a constant, which gets pushed when the child word
- "deferred" in this example - is executed. The actual code that is
compiled looks like this:

  md: call create         ; define-time
      call ;does          ; define-time
  md_does:
      call push_literal   ; run-time
      call a              ; run-time
      call b              ; run-time
      jmp  c              ; run-time

  deferred:
      movl $004, %edx
      jmp md_does

Wait a minute! How did that ";does" thing get in there? Well, does> is
actually a _compiler_ word. It compiles "call ;does" followed by "call
push_literal" into the word being defined - "md" in this case.

When md is called - to create a child word, or, equivalently [for OO types]
an _instance_ - md create's the new word and then ;does compiles the load
constant and the jump to md's run-time code.

If we hadn't split the push-constant part, the code in each child word -
and there could be lots of them! - would have to have an extra call:

  md: call create         ; define-time
      call ;does          ; define-time
  md_does:
      call a              ; run-time
      call b              ; run-time
      jmp  c              ; run-time

  deferred:
      movl $004, %edx
      call push_literal
      jmp md_does

What we have essentially done is move the "call push_literal", since it is
common to every child word, out of the child and into the parent.

With that long-winded intro, we can now get into some of the odd subtleties
of this brave new world.

The first subtlety is that while does> and constant - as defined below -
both create some type of constant, they do it slightly differently.
constant creates a word and writes code to push a constant - both the
load and push halves of that action - and then forces a _tail-call
conversion_ of the call to push_literal, changing it to a _jump_ and
thereby ending the word. It can do nothing more than push its constant.

This ends up being syntactic sugar:

  4 constant cell

and

  : cell  4 ;

produce exactly the same code; namely:

  cell:
    movl $4,%edx
    jmp  push_literal

does> is different; it leaves open - in fact encourages! - doing something
exciting with the constant that has just been pushed. (Remember that this
constant could be the address of some data structure.) It splits the
constant-pushing task into two parts, as described above, putting the load
into the child and the push into the parent, but leaves the call to
push_literal as a _call_ that is expected to be followed, as in "md" above,
with code that processes the constant.

To drive home this subtle difference, assume that we defined constant thus:

  : constant  create does> ;
  4 constant cell

Now what do we have?

  constant:
    call create
    call ;does
  constant_does:
    jmp  push_literal

  cell:
    movl $4,%edx
    jmp  constant_does

This is slightly less efficient; it compiles extra code into the parent
word - the run-time code "jmp push_literal" - and to execute this constant
we do a jump to a jump, which is both silly and slow.

Note: the trailing ; in our example definition of "constant" above changed
the "call push_literal" to "jmp push_literal". Sneaky!

The other important subtlety is that these constants - esp. when they are
the address of some data structure - can be in the way when compiling said
data structure. The constant that you want compiled into the child word has
to be on the stack when does> [really ;does] executes. It's a bit clumsy,
but you have to do something like this:

  : clumsy  create  ram  push  , ,  pop  does>  a b c ;
  thing1 thing2 clumsy example

I suppose you could write that like this too, with consequent changes in
how the data is "fetched" and used in a, b, and c:

  : less-clumsy  create  literal  does>  a' b' c' ;
  thing1 thing2 less-clumsy example

In this case we compile _two_ literals, the second one being "split in two"
in the canonical way:

  less_clumsy:
    call create
    call literal    ; compiles code to load & push a literal
    call ;does
  less_clumsy_does:
    call push_literal
    call aprime
    call bprime
    jmp  cprime

  example:
    movl $thing2,%edx
    call push_literal
    movl $thing1,%edx
    jmp  less_clumsy_does

With this new kind of create/does>, you choose how to do it.

Having understood create/does> you've tackled one of the sublime mysteries of
Forth. The other sublime mystery is our next subject...

The structure and interpretation of metacompilers
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
* getting the right words - dict structure, escapes \o etc
* time phases
* compiling numbers
* finish with example of create/does>
* be clear about inside == compiling words, outside == mostly defining words

A metacompiler is what most people outside of Forth call a cross-compiler:
a compiler that runs on one machine (the "host") and compiles code for
another (the "target"). We do not here discuss an exciting variation called
the Canadian Cross, wherein we compile a cross compiler on one machine (the
"compile host"), that executes on a second machine (the "host"), to compile
code for a third machine (the "target"). While exciting, there isn't at the
moment the need or application for this when using muforth.

What makes metacompilers hard to write and understand? I think it's a
combination of three things, which interrelate in complicated ways:

  * the time phases of metacompilation;
  * "when words collide"; or, getting the word you asked for;
  * the two sides of defining words

I'll talk about them in turn, but they are intertwingled, so it won't be an
entirely linear discussion.

Time phases of metacompilation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Let's start with the idea of metacompiler time phases; this should lead
naturally to discussion of the other aspects.

This aspect of metacompilation is rather ill-defined, and different folks
have pioneered different approaches over the years. Unfortunately (as I'm
not much of a fan of Perl or its philosophy) there's more than one way to
do it. One chooses ease over clumsiness, perhaps. But in my experience
"ease" often translates into ambiguity and confusion. My discussion here
will focus on an approach that chooses explicitness and verbosity over
"ease".

Roughly speaking, there are three time phases of metacompilation:

  (1) compiling the metacompiler code;
  (2) executing the metacompiler code, compiling the target code;
  (3) executing the target code.

(1) and (2) occur on the host machine; (3) occurs on the target machine.
(If we are doing tethered debug, (3) needs a little help from the host
machine, but mostly the execution occurs on the target.)

We can break this down still further.

In (1) we are compiling code that will later execute on the host to read a
text stream of "target Forth code" and build a memory image of the compiled
target code. We'll likely need all of the following:

  * an assembler for the target architecture;
  * defining words (: variable code etc.) for the target;
  * compiling words (; if then begin while repeat etc.) for the target;
  * a list of target words as we define them;
  * assorted and sundry utility words for managing memory spaces, etc.

In the past I have overlapped defining this infrastructure with
metacompiling the target code, in order to keep things together that seem
to "belong" together. This is a nicety that complicates the system. We will
not be discussing such a system here; rather, we will try to "firewall" the
phases from each other as much as possible, in the interest of modularity.

Looking at the list above we immediately notice that we will be redefining
almost all the basic infrastructure words with metacompiler versions, and
we will be creating target versions of most of the .forth. chain. Creating
separate chains for these new words will help dramatically to prevent
confusion and ambiguity.

In both phases (1) and (2) being able to specify exactly _which_ version of
a word we want becomes critical. That brings us naturally to our second big
metacompiler subject...

When words collide; or, how to keep aliasing from driving you crazy
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
You have to be unambiguously clear about which version of "dup" you mean
when you call it by name. This is the idea of "when words collide": you
type "dup", but because of the design of the system you get the _wrong_ dup
(the host's instead of the target's, the metacompiler's rather than the
host's, etc.) and _something_ gets compiled or executed, and a while later
everything comes crashing down, but you have no idea why.

This kind of error is a Royal Pain to find and fix. It is much nicer to force
disambiguation of "which dup" up front, rather than pay later. The muforth
way is to create several dictionary chains, one for each "context" in which
a word might appear, and then to create several interpreter modes, each one
corresponding roughly to a phase or task in the metacompiler.

John Rible and Bill Muench wrote an elegant metacompiler in which they
pioneered the use of the terms "outside" and "inside" to mean "metacompiler
words that occur outside of colon definitions" (mostly defining words) and
"metacompiler words occuring inside colon definitions" (mostly compiling
words). I really like the descriptiveness of these terms and to honor their
inventors have named the dictionary chains in muforth's metacompiler that
serve similar purposes .outside. and .inside. . They are just like .forth.
and .compiler. on the host, but have the advantage of having names distinct
from, and cooler than, the host names.

  * .outside. contains target _defining_ words:   : variable code
  * .inside. contains target _compiling_ words:   ; if then begin while repeat

In addition to these, I define .assembler. to hold the target assembler
words, and .target. to hold the words that we're - finally! - defining on
the target.
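
To make the split concrete, here is a hypothetical skeleton, with the word
bodies elided using the same angle-bracket convention that appears later in
this document (the chain-switching words "outside", "inside", and
"assembler" also show up further on):

  outside
  : variable   <create a variable in the target image> ;
  : constant   <create a constant in the target image> ;

  inside
  : if     <compile a conditional branch on the target> ;
  : then   <resolve the branch just compiled> ;

  assembler
  : mov    <assemble a mov instruction into the target image> ;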

By putting the words into different chains like this it becomes much easier
to know what you are getting when you say "dup". We define an interpreter
mode for each phase of metacompilation, and we try to define each mode to
minimize ambiguity - the collision of words, or accidentally getting
something other than what was called for - even at the cost of increased
verbosity.

Instead of building into our interpreter modes automatic searches of all
the chains that _might_ contain the word we want - thus dramatically
increasing the likelihood of word collision - we define a prefix "escape"
mechanism for executing or compiling a word from a chain _other than_ the
one that would be "normal" for our interpreter mode. A perfect example is
the use of \ and \c to force compilation of .compiler. words.

For every chain in the metacompiler we define an escape, thus:

  \f  compile from .forth.
  \c  compile from .compiler.
  \o  compile from .outside.
  \i  compile from .inside.
  \a  compile from .assembler.
  \t  compile from .target.  (occasionally useful if .target. words are
                              defined using create/does>)
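
How might such an escape look? Here is a hypothetical sketch in the spirit
of the mode code below: "token" is assumed to parse the next word from the
input stream, "find" and "compile," behave as in the mode sketches, and
"complain" stands in for whatever error word you prefer:

  compiler
  : \o   token  .outside. find  if  compile, ^ then  complain ;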

As a concrete example, assume we are writing an .outside. word that happens
to depend on words from both .forth. and .outside. . Since it would be
folly (I think - I may address this question further later on) to define an
interpreter mode that searches .outside. then .forth. (or vice versa), we
instead use the regular Forth compiler to define .outside. words - which
means that _every_ use of an .outside. word within the definition must be
prefixed with \o . This might seem clumsy and verbose, but it has the
advantage of being unambiguous.
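
For instance, a hypothetical sketch (\o create and \o allot are assumed to
already exist in .outside., as in the array example near the end of this
document; "cell", "cells", and "buffer" are just for illustration):

  outside
  : cell     4 ;                      ( host .forth. words only)
  : cells    \o cell  * ;             ( our own "cell" must be escaped)
  : buffer   \o create  \o cells  \o allot ;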

A key point to remember is this: while _defining_ all the metacompiler
words we are running the normal muforth interpreter and compiler: only
.forth. and .compiler. are being searched. If we want to reference an
.outside. word that we just defined, we have to be explicit about it:
eg, "\o here" .

We can now sketch the possible "consume" functions for various interpreter
modes. The .outside. words "remote", "compile," and "number," respectively
do remote execution on the target, compilation of a procedure call for the
target, and compilation of a literal for the target. (I've left off the
prompts for clarity.)

-:  .assembler. find  if  execute ^ then
      .outside. find  if  execute ^ then  number ;
mode meta-assembler

-:  .outside. find  if  execute ^ then
     .target. find  if  \o remote ^ then  number ;
mode meta-outside

-:  .inside. find  if  execute ^ then
    .target. find  if  \o compile, ^ then  \o number, ;
mode meta-inside

-:  .target. find  if  \o remote ^ then  number ;
mode target

Things to note:

  * None of these modes helps directly with the task of building the
    metacompiler itself. If we need to refer to an .assembler. word, an
    .outside. word, or an .inside. word, we have to use an escape (\a, \o,
    or \i).

  * meta-assembler mode does not search .target. . If we want to find the
    target address of a variable, we will have to define a special "escape"
    function to do this. In the past I have had the meta-assembler mode do
    this automatically but I think this is error-prone.

  * The assembler will need to be able to do basic arithmetic (calculating
    offsets, shifts, masks, etc.). The best way to make this possible is to
    put aliases for a handful of host .forth. math words into .assembler. .
    This must be done judiciously. Another option would be \f escapes. (See
    the sketch following this list.)

  * Target mode does not search for any "debugging" words that execute on
    the host. These are necessary for painless use of the system. Probably
    an .interaction. chain is necessary, to be _very judiciously_ filled
    with synonyms (aliases) of host words that are useful for changing the
    radix, printing the stack, dumping memory, etc.

  * Only meta-inside does special processing of numbers. I'm not sure that
    this is right, but I think it is. Where numbers are converted from host to
    target form - say, in the assembler when it takes an offset from the
    stack to compile into an instruction - target-specific code can be
    executed. But when we mention a number by "name", we first need to push
    it onto the stack. It seems like the host's "number" does a fine job of
    this. (If our target were a 64-bit machine and our host a 32-bit machine,
    this might not be quite as true.)
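
Here is what the math-word aliasing mentioned in the arithmetic note might
look like. A hypothetical sketch: rather than a dedicated aliasing
mechanism, it simply wraps a few host .forth. words in colon definitions,
with \f making the source of each word explicit:

  assembler
  : +     \f + ;
  : -     \f - ;
  : and   \f and ;
  : or    \f or ;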

Once we have written all the parts of our metacompiler, the only thing left
to do to start it up is to change the interpreter mode to meta-outside.

In order to be able to switch among the modes defined for the metacompiler
we'll probably want something like:

  outside
  : -]   ( start the meta colon compiler)  meta-inside ;
  :  ]   \o literal  \o -] ;
  : :    \o name  \o -] ;

  inside
  : [    ( switch to outside)  meta-outside ;
  : ^    <compile return on target> ;
  : ;    \i ^  \i [ ;

These are analogous to the code in .forth. and .compiler. .

If "code" and "end-code" bracket an assembler definition, we might define
something like this:

  outside
  : code  \o name  meta-assembler ;

  assembler
  : end-code  meta-outside ;

The Janus effect; or, the two faces of defining words
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
A subtle and terribly confusing aspect of metacompilers is the definition
of defining words, like : and variable. The trouble stems from their dual
nature: they have two faces, one facing the host Forth, and one facing the
target.

All defining words are, by definition, create/does> words. Normally what
happens between create and does> is that we write code to build a data
structure, and between does> and ; we write code that is executed to do
something with the data. The difficulty is that, in a metacompiler, the
words between create and does> execute _on the host_, and the words between
does> and ; execute _on the target_.

We are already confused enough by the aliasing of word names that
naturally happens in a metacompiler; now we have to deal with words
(defining words) that draw from two (or more) different sets of words, and
have several distinct time phases built into them as well! Yow!

This potential for utter confusion led me to move away from my past efforts
at "ease" and "naturalness" in the writing of defining words, towards a
more explicit way that requires the author to specify the source of all
non-.forth. words.

The best way to illustrate is with an example, followed by a discussion of
each token and its execution behavior.

Let's define a target defining word for arrays. Like all target defining
words, it goes into .outside. .

  outside
  : array  \o create  \o ram  swap  \o cells  \o allot
           \o does>

Of course, here my beautiful system breaks down: I need a special does> for
defining words.

The end
^^^^^^^
I hope this helped to explain how muforth works and why I think it's cool.
I hope you'll think it's cool too, and use it to do interesting things!