Cambridge Encyclopedia :: Cambridge Encyclopedia Vol. 26
 

FORTH - Overview, Programmer's perspective, Structure of the language, Programming, Code examples

A compact computer programming language originally written for astronomers in the USA, which has advantages for use in control applications using small computer systems. The name is derived from fourth, as in ‘fourth generation language’.

Portions of the summary below have been contributed by Wikipedia.
Moore
Typing discipline: Typeless
Major implementations: Forth, Inc., gForth
Dialects: colorForth, FCode
Influenced by: Burroughs B5000, Lisp, APL
Influenced: PostScript, Factor

Forth is a programming language and programming environment, initially developed by Charles H.

A procedural, stack-oriented and reflective programming language without type checking, Forth features both interactive execution of commands (making it suitable as a shell for systems that lack a more formal operating system) and the ability to compile sequences of commands for later execution.

Forth is so named because "[t]he file holding the interpreter was labeled FORTH, for 4th (next) generation software - but the operating system restricted file names to 5 characters."

Overview

Forth offers a standalone programming environment consisting of a stack-oriented, interactive, incremental interpreter and compiler. A Forth system consists of words (the term used for Forth subroutines); new words are defined in terms of old words, and there is no distinction made between the words that define the Forth language and those that the programmer creates. A typical Forth package consists of a pre-compiled kernel of the core words, which the programmer uses to define new words for the application. The completed application can be saved as an image, with the new words already compiled. Generally programmers extend the initial core with words that are useful to the types of applications that they write, and save this as their working foundation.

Forth uses separate stacks for storage of subroutine parameters and subroutine activation records. The parameter or data stack (commonly referred to as the stack) is used to pass data to words and to store the results the words return. The linkage or return stack (commonly referred to as the rstack) is used to store return addresses when words are nested (the equivalent of a subroutine call), and store local variables. There are standard words to move data between the stacks, and to load and store variables on the stack.

The logical structure of Forth resembles a virtual machine.

Forth became very popular in the 1980s because it was well suited to the small microcomputers of that time, as it is compact and portable.

Forth is a simple yet extensible language;

Programmer's perspective

Forth relies heavily on explicit use of a data stack and reverse Polish notation (RPN or postfix notation), commonly used in calculators from Hewlett-Packard. Extending the compiler only requires writing a new word, instead of modifying a grammar and changing the underlying implementation.

The word * multiplies the two numbers on the top of the stack and replaces them with their product.

The word + adds it to the previous product.

Even Forth's structural features are stack-based.

This code defines a new word (again, 'word' is the term used for a subroutine) called FLOOR5 using the following commands: DUP duplicates the number on the stack; The text in parentheses is a comment, advising that this word expects a number on the stack and will return a possibly changed number. The interpreter reads a line of input from the user input device, which is then parsed for a word using spaces as a delimiter; When the interpreter finds a word, it tries to look the word up in the dictionary. If the word is found, the interpreter executes the code associated with the word, and then returns to parse what is left of the input stream. If the word isn't found, the word is assumed to be a number, and an attempt is made to convert it into a number and push it on the stack; Otherwise, if both the lookup and number conversion fails, the interpreter prints the word followed by an error message indicating the word is not recognised, flushes the input stream, and waits for new user input.

Compiler

The definition of a new word is started with the word : (colon) and ends with the word ;

will compile the word X.

Assembler

Most Forth systems include a specialized assembler that produces executable words.

Operating system, files and multitasking

Classic Forth systems traditionally use no operating system nor file system. The word BLOCK is employed to translate the number of a 1K-sized block of disk space into the address of a buffer containing the data, which is managed automatically by the Forth system.

Multitasking, most commonly cooperative round-robin scheduling is normally available (although multitasking words and support are not covered by the ANSI Forth Standard). The word PAUSE is used to save the current task's execution context, to locate the next task, and restore its execution context.

By contrast, some Forth systems run under a host operating system such as Microsoft Windows, Linux or a version of Unix and use the host operating system's file system for source and data files; the ANSI Forth Standard describes the words used for I/O. Typically they have a larger and different set of words from the stand-alone Forth's PAUSE word for task creation, suspension, destruction and modification of priority.

Self compilation and cross compilation

A full-featured Forth system with all source code will compile itself, a technique commonly called meta-compilation by Forth programmers (although the term doesn't exactly match meta-compilation as it is normally defined). The usual method is to redefine the handful of words that place compiled bits into memory. The compiler's words use specially-named versions of fetch and store that can be redirected to a buffer area in memory. Such compilers define words to access both the target computer's memory, and the host (compiling) computer's memory. For embedded systems, the code may instead be written to another computer, a technique known as cross compilation, over a serial port or even a single TTL bit, while keeping the word names and other non-executing parts of the dictionary in the original compiling computer. The minimum definitions for such a forth compiler are the words that fetch and store a byte, and the word that commands a forth word to be executed.

Structure of the language

The basic data structure of Forth is the "dictionary" which maps "words" to executable code or named data structures. The dictionary is laid out in memory as a linked list with the links proceeding from the latest (most recently) defined word to oldest, until a sentinel, usually a NULL pointer, is found.

A defined word generally consists of head and body with the head consisting of the name field (NF) and the link field (LF) and body consisting of the code field (CF) and the parameter field (PF). Described as a structure, a dictionary entry might look this way:

University of Phoenix structure byte: flag \ 3bit flags + length of word's name char-array: name \ name's runtime length isn't known at compile time address: previous \ link field, backward ptr to previous word address: codeword \ ptr to the code to execute this word any-array: parameterfield \ unknown length of data, words, or opcodes end-structure forthword

The name field starts with a prefix giving the length of the word's name (typically up to 32 bytes), and several bits for flags. The character representation of the word's name then follows the prefix.

The link field contains a pointer to the previously defined word.

The code field pointer will be either the address of the word which will execute the code or data in the parameter field or the beginning of machine code that the processor will execute directly. For colon defined words, the code field pointer points to the word that will save the current Forth instruction pointer (IP) on the return stack, and load the IP with the new address from which to continue execution of words.

Structure of the compiler

The compiler itself consists of Forth words visible to the system, not a monolithic program. This allows a programmer to change the compiler's words for special purposes.

The "compile time" flag in the name field is set for words with "compile time" behavior. Most simple words execute the same code whether they are typed on a command line, or embedded in code. When compiling these, the compiler simply places code or a threaded pointer to the word.

Compile-time words are actually executed by the compiler. The classic examples of compile-time words are the control structures such as IF and WHILE. All of Forth's control structures, and almost all of its compiler are implemented as compile-time words.

Compilation state and interpretation state

The word : (colon) takes a name as a parameter, creates a dictionary entry (a colon definition) and enters compilation state. The interpreter continues to read space-delimited words from the user input device. If a word is found, the interpreter executes the compilation semantics associated with the word, instead of the interpretation semantics. The default compilation semantics of a word are to append its interpretation semantics to the current definition.

The word ; It is an example of a word whose compilation semantics differ from the default. (semi-colon) and several other words are undefined in ANS Forth.

The interpreter state can be changed manually with the words [ (left-bracket) and ] (right-bracket) which enter interpretation state or compilation state, respectively. These words can be used with the word LITERAL to calculate a value during a compilation and to insert the calculated value into the current colon definition.

In ANS Forth, the current state of the interpreter can be read from the flag STATE which contains the value true when in compilation state and false otherwise.

Immediate words

The word IMMEDIATE marks the most recent colon definition as an immediate word, effectively replacing its compilation semantics with its interpretation semantics. Immediate words are always executed, not compiled, in either state. is an example of an immediate word. In ANS Forth, the word POSTPONE takes a name as a parameter and appends the compilation semantics of the named word to the current definition even if the word was marked immediate. Forth-83 defined separate words COMPILE and [COMPILE] to force the compilation of non-immediate and immediate words, respectively.

Unnamed words and execution tokens

In ANS Forth, unnamed words can be defined with the word :NONAME which compiles the following words up to the next ; The word EXECUTE takes an execution token from the data stack and performs the associated semantics. The word COMPILE, (compile-comma) takes an execution token from the data stack and appends the associated semantics to the current definition.

The word ' (tick) takes the name of a word as a parameter and returns the execution token associated with that word on the data stack.

Parsing words and comments

The words : (colon), POSTPONE, ' (tick) and :NONAME are examples of parsing words that take their arguments from the user input device instead of the data stack. Another example is the word ( (paren) which reads and ignores the following words upto and including the next right parenthesis and is used to place comments in a colon definition. Similarly, the word \ (backslash) is used for comments that continue to the end of the current line.

Structure of code

In most Forth systems, the body of a code definition consists of either machine language, or some form of threaded code. The fastest modern Forths use subroutine threading, insert simple words as macros, and perform peephole optimization or other optimizing strategies to make the code smaller and faster.

Data objects

When a word is a variable or other data object, the CF points to the runtime code associated with the defining word that created it. A defining word has a characteristic "defining behavior" (creating a dictionary entry plus possibly allocating and initializing data space) and also specifies the behavior of an instance of the class of words constructed by this defining word.

Forth also provides a facility by which a programmer can define new application-specific defining words, specifying both a custom defining behavior and instance behavior.

Data objects defined by these and similar words are global in scope. typically such data objects are used to contain data which is used by a number of words or tasks (in a multitasked implementation).

Forth does not enforce consistency of data type usage;

Programming

Words written in Forth are compiled into an executable form. The classical "indirect threaded" implementations compile lists of addresses of words to be executed in turn; many modern systems generate actual machine code (including calls to some external words and code for others expanded in place).

The tool-box approach is one of the reasons that Forth is so difficult to master.

Code examples

Hello world

For an explanation of the tradition of programming "Hello World", see Hello world program.

One possible implementation:

: HELLO ( -- ) CR ." HELLO

The word CR causes the following output to be displayed on a new line. The parsing word ." The space character separating the word ." as a Forth word.

A standard Forth system is also an interpreter, and the same output can be obtained by typing the following code fragment into the Forth console:

CR .( Hello, world!)

.( (dot-paren) is an immediate word that parses a parenthesis-delimited string and displays it. As with the word ."

The word CR comes before the text to print.

Mixing compilation state and interpretation state

Here is the definition of a word EMIT-Q which when executed emits the single character Q:

: EMIT-Q 81 ( the ASCII value for the character 'Q' ) EMIT ; The word EMIT takes a value from the data stack and displays the corresponding character.

The following redefinition of EMIT-Q uses the words [ (left-bracket), ] (right-bracket), CHAR and LITERAL to temporarily switch to interpreter state, calculate the ASCII value of the Q character, return to compilation state and append the calculated value to the current colon definition:

: EMIT-Q [ CHAR Q ] LITERAL EMIT ;

The parsing word CHAR takes a space-delimited word as parameter and places the value of its first character on the data stack. The word [CHAR] is an immediate version of CHAR.

Both CHAR and [CHAR] are predefined in ANS Forth. IMMEDIATE

Tutorials

Introducción a Forth by Javier Gil (PDF, spanish) Programming Forth by Stephen Pelc, MPE (draft version, on-line) Simple Forth by Leo Wong Gforth tutorial A Beginner's Guide to Forth by Julian Noble (on-line tutorial) Moving Forth, by Brad Rodriguez. A series on writing Forth kernels

History

Forth - The Early Years by Chuck Moore The Evolution of Forth by C.
FORTRAN - Language features, Variants of Fortran, Criticisms and rebuttals, Code examples, FORTRAN jokes [next] [back] Fortaleza - History, Tourism, Cultural features, Education, Sports

User Comments Add a comment…