Coding Is Just Data Entry

...because we all know, ever since Tom Kilburn taught us, that code and data are more or less interchangeable. So when you are typing in a program, what you are really typing is data. It just happens to be data on "how to do" something. Whether it is stated declaratively, imperatively, or functionally is just a matter of style; all these forms can be looked at as data. When you are debugging, you are checking to see if you entered the correct "data" in the correct form.

That means that programming languages are just different formats for this "how to do it" kind of data.

There are two common goals that programming languages seek to achieve: [1] compression, and [2] automatic error detection and correction. The two goals are somewhat opposed.

Compression first means creating ways to eliminate repetition - ways such as macros, functions, and contexts such as Pascal's "WITH record DO" construct. Then it means libraries, which find commonalities between programs and allow them to be compressed out. (You can call functions without defining them, because they are defined once in the library.) Compression is what motivated C's menagerie of operators, and C++'s ability to overload them. Even when somebody creates a library that never existed before, what they are doing is making it possible for you to write programs that do more, without having to write more.
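As a minimal sketch of the compression idea (the function name and numbers are invented for illustration): a computation that would otherwise be repeated at every use site is written OnceAndOnlyOnce as a function, and each call "expands" to it.

```c
/* Instead of scattering "area = w * h" logic through the program,
 * compress the repeated pattern into one definition: */
static int rect_area(int w, int h) {
    return w * h;
}

/* Each call site is now one line of "data" referring to the
 * single definition above: */
static int total_area(void) {
    return rect_area(3, 4) + rect_area(5, 6);  /* 12 + 30 = 42 */
}
```

A library is the same move made across programs rather than within one.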

Automatic error detection means allowing compilers to detect errors. The most common way to do this is by type-checking; objects, which allow you to create your own types, extend this further. The types of function parameters and variables are specified in a declaration and the compiler checks every call. The types of interfaces are specified, and the compiler checks to make sure every object implements the interface if it needs to. Type mismatch errors are detected.
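A tiny sketch of the declaration-checking point (the function is hypothetical): the declaration states a type contract once, and the compiler checks every call against it.

```c
/* The declaration is the "contract" the compiler enforces: */
double average(double a, double b);

double average(double a, double b) {
    return (a + b) / 2.0;
}

/* average("two", "three");   <- a type mismatch the compiler
 *                               would reject at compile time */
```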

Such things as labels for gotos are motivated by error correction; humans make errors when calculating the addresses of goto targets. Labels make such human calculation unnecessary. Structured programming, which hides the gotos, is designed to correct the kinds of errors where humans make inappropriate gotos.

Compression does not come at the cost of more errors; on the contrary, humans have a lot of trouble repeating things accurately. Being able to say something OnceAndOnlyOnce therefore reduces the chance of error.

Error detection, though, usually strictly increases verbosity. You have to declare things before you use them. You have to say things more than once so the compiler, or interpreter, or the compiled code at runtime can check the consistency of these things. Sometimes, in some languages, the repetition can become very annoying; try declaring something inside a C++ NameSpace and defining it outside. But the compiler becomes able to detect errors in your "data." This can be valuable, especially if the compiler can tell what the error is. It saves you from having to debug a machine crash later. Sometimes in C++ I have had the experience of having a program finally compile without errors - and then it runs without errors. All the bugs, then, were caught by the compiler.

A third goal that some programming languages seek to achieve is keeping related things together. That allows humans to detect errors. [Added later: When programmers move an abstract concept into a programming language, they thereby enable manual checking of their designs (which contain such abstract concepts) against the code (which, after the addition, contains them too, so it's an easy check).]

But ultimately, writing code is just data entry, and it may help sometimes to think of it as such, especially if you're writing a large program, or developing a programming language. -- EdwardKiser

[Note: this is only a metaphor, or an analogy.] -- EdwardKiser

Not all analogies are useful, insightful, or even sensible. I'm not attacking the notion based on a literal reading. Coding is data entry, but to call it "just" data entry is seriously misleading.

If coding were "just" data entry, it could be automated. ;)

What we cannot automate is the obtaining of the data. The only reason humans have to enter it is that they are the ones who have it in the first place; the computer does not.

I disagree with this somewhat. Writing a term paper involves hitting keys on a keyboard to generate a document, but it is NOT "just data entry" - there is a vast difference between writing a term paper at the keyboard and typing up a paper that was written longhand. The difference, of course, is that when typing up the longhand notes you don't have to do any composing; the content exists, and you're just moving it to another medium.

But I have never seen (and never expect to see) a programming project where the only bugs were typos. When I sit down to write a program I DON'T know exactly what I will write. If I have done a good job with the design, then I may know all the way down to the list of methods, but as I delve in I'll find that I have to change some, and I certainly don't have every line of code already fully formed in my mind before I start typing it. I remember the C64 magazines I used to get where they'd print entire Basic programs (complete with big DATA statements which entered machine-language subroutines byte by byte). Entering THOSE was data entry, and it required no thought, no micro-analysis (okay... do I use a String or a StringBuffer here?), and no testing (well... it worked or it didn't, but typos aren't the same as bugs).

-- MichaelChermside

Hmmm. The way I was looking at it was, if you have to think while you code, then maybe what you're really doing is finishing off the design while you code. And maybe what you're doing is figuring out "how to phrase" the ideas that are already complete in your head. I think what you're talking about is a combination of design and code - but an enjoyable and profitable one. It is not always possible. I have written programs on paper and typed them in; I have also written systems of macros (or objects) that make the remaining code look practically data-driven. In the former case, you might say I was doing the coding on paper. In the latter case, maybe I was doing it in my head...?

Should this page have been called CodingIsLikeDataEntry?

But data entry is generally rote transcription, whereas CodingIsCreative - more like authoring poetry than it is like dictation.

Coding is kinda like the "data entry" done with a Computer Aided Design (CAD) system to design a house: You're doing design of a house. (...which is really a "program" that tells other people how to build the house! ;-)

You're right, up to a point. If you look at code as data, you have to recognize that it is an especially complicated form of data. Not like a mere list of records. It has nested structures. It has room for creativity. But I still think coding is ninety percent perspiration. The ten percent inspiration is design. -- EdwardKiser

If code and data are the same, why have different words?

To denote the two different respects in which you are looking at it.

I suspect a fundamental asymmetry should be pointed out: Code is data, data isn't necessarily code.

It is when a program or a piece of hardware makes decisions based on it. Then it becomes an instruction, instructing the program to make that decision.

In this sentence, the only bits that have any claim to being called "code" are the formatting characters (the TwoSingleQuotes). The rest is "just data" in a very concrete sense: there isn't an existing program which makes decisions based on the "I" at the beginning of the sentence being an "I" rather than a "Z". So I will readily agree that a) all code is data, and b) some data becomes code; but I assert that most data doesn't.

How does the web browser choose which characters to use out of its fonts?

The dividing line between code and data is often fuzzier than you may think: The "invoice type" character may direct your program to process the invoice in different ways, making this piece of data a minimal program that is "interpreted" by your application program. Also, terms of payment is often specified in a way that must be more-or-less "interpreted" by an application program.
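The "invoice type" idea above can be sketched as a one-character program interpreted by the application. The type codes here ('C' for credit memo, 'D' for debit) are hypothetical, invented for illustration:

```c
/* A single character of "data" that directs processing - in effect,
 * a minimal program interpreted by the application: */
static long apply_invoice_type(char type, long amount) {
    switch (type) {
    case 'C': return -amount;   /* credit memo reverses the charge */
    case 'D': return amount;    /* ordinary debit */
    default:  return amount;    /* unknown codes pass through */
    }
}
```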

Yup. Sounds fine to me. Fuzzy it is. It's not so much a dividing line as an "arrow," an "entropy metaphor" : it's generally much easier for code to make sense as data than for data to make sense as code.

The premise of this page is reminiscent of the notion that a Rodin sculpture is just a slab of marble from which all the bits which weren't the sculpture have been removed.

Compression of data isn't the same as eliminating duplication in code. Error checking and parity bits aren't homologues of variable typing.

Just IMHO, of course.

Here's a peek into some of the experience behind this notion. Suppose you wanted to write a calculator program with 1000 functions. The core of the thing is user input; the user input probably gets driven into a switch statement or something like that. That could be poetic. But each case in the switch statement is, or calls, one of the thousand functions. Suppose the functions are relatively simple in your mind. They are not similar enough that you could write a macro, but you still sense a certain sameness between them. You still have to type them all in. The computer cannot figure them out. The program will not work until you have typed in at least the functions you want to use, but you're ambitious and you want to use them all. When this happens, and you have a thousand similar but not-too-similar functions to type in, then you can certainly come to believe that CodingIsJustDataEntry.
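A condensed sketch of that calculator's shape, with three made-up functions standing in for the thousand:

```c
/* Three of the "thousand" similar-but-not-too-similar functions: */
static double fn_square(double x) { return x * x; }
static double fn_negate(double x) { return -x; }
static double fn_twice(double x)  { return x + x; }

/* The user-input core drives a switch; writing out each case is
 * where the feeling of data entry sets in. */
static double dispatch(int key, double x) {
    switch (key) {
    case 1: return fn_square(x);
    case 2: return fn_negate(x);
    case 3: return fn_twice(x);
    /* ...cases 4 through 1000... */
    default: return x;
    }
}
```

Each case is a row in a table the computer cannot fill in for you.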

You can also believe it when you hand-draw a state machine for a lexical analyzer or LL parser on a piece of paper, and then you have to convert it into a giant switch statement with a case for every state, and each case will have a switch for every input. The fun part was the drawing of the state machine, not the typing of the nested switch statements. DesignIsCreative. CodingIsJustDataEntry.
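A toy version of that transcription (the machine is invented: two states, recognizing a nonempty run of digits) shows the nested-switch form - an outer switch per state, an inner branch per input class:

```c
enum state { START, IN_NUMBER, REJECT };

/* A hand-drawn two-state machine transcribed, state by state and
 * input by input, into nested switches: */
static int is_number(const char *s) {
    enum state st = START;
    for (; *s; s++) {
        int digit = (*s >= '0' && *s <= '9');
        switch (st) {
        case START:
            switch (digit) {
            case 1:  st = IN_NUMBER; break;
            default: st = REJECT;    break;
            }
            break;
        case IN_NUMBER:
            switch (digit) {
            case 1:  break;           /* stay in IN_NUMBER */
            default: st = REJECT;     break;
            }
            break;
        case REJECT:
            break;                    /* dead state */
        }
    }
    return st == IN_NUMBER;
}
```

The drawing decided everything; the typing merely records it.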

I was looking at the code for MAME (see MameEmulator) when I came up with this idea. Specifically, it has an M6809 emulator, which consists of a large switch statement that has a case for every byte that might appear in the M6809's instruction stream. Obviously the programmer was just entering the defined behavior of the M6809 in the form of a C program. It doesn't look like much poetry was involved. Nevertheless, the emulator works. I don't think programmers would actually want to maintain a "poetic" M6809 emulator.

Incidentally, I wonder why they didn't have an array of function pointers and use the instruction bytes as the array index?

To allow the compiler to use tail-call optimizations (or to obviate them). To eliminate the overhead of creating and destroying that many extra stack frames. Of course, using a switch statement takes far longer to compile, but I don't think they care about that.
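For comparison, the function-pointer-table alternative asked about above might look like this sketch. The three opcodes and the toy CPU struct are invented; a real M6809 table would have 256 entries:

```c
/* A toy CPU with one register, for illustration only: */
typedef struct { int a; } cpu;

static void op_nop(cpu *c)  { (void)c; }
static void op_inca(cpu *c) { c->a++; }
static void op_deca(cpu *c) { c->a--; }

/* The instruction byte indexes directly into the table: */
static void (*const optable[3])(cpu *) = { op_nop, op_inca, op_deca };

static void step(cpu *c, unsigned char opcode) {
    optable[opcode](c);  /* one indirect call replaces the big switch */
}
```

The trade-off is an indirect call per instruction versus whatever jump table the compiler builds from the switch.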

H'm. But does programming have to be that way? If the 1000 functions are that similar then I bet I can write a handful of functions to perform all of them. If I'm translating my lexical analyzer to C then I'll use lex and yacc rather than write it myself. And I wouldn't dream of writing a processor emulator as a monstrous switch statement; there are more elegant ways to do it (or perhaps not, depending on the sparse-ness of the opcodes; but then there's less that's "rote" about it). I think that any programming that constitutes monotonous data entry can be done a different and better way.

LittleLanguages are how programmers defeat the kind of mind-numbing repetition talked about in the previous few paragraphs.

But the problem with LittleLanguages is that you have to implement them, unless you can find one ready-made that happens to meet your specific need. In order to decide whether to implement a LittleLanguage, you have to know how long it will take to implement, and weigh that against how long it will take for you to do its job manually. Sometimes you can make a LittleLanguage pretty quickly, but then it typically is little more than a set of macros. The more power you give it, the more it begins to resemble a full-blown language. Sometimes it is actually easier to do things yourself than to explain to the machine how to do them.

A more "poetic" M6809 emulator would probably approach the problem more as the M6809 CPU does - by masking portions of the op code, dividing the instruction set into categories that have similar processing. But such an implementation, while better satisfying the OnceAndOnlyOnce rule, would probably be slower. - Except in hardware. ;-)

One is also reminded of the book GoedelEscherBach: An Eternal Golden Braid.
Don't despair; ProgrammingIsMoreThanCoding.

But if 'Coding' is what’s left of Programming once all the thought has been subtracted or partitioned off, isn't 'Coding' just a synonym for 'Typing'? TypingIsJustDataEntry?!
If CodingIsJustDataEntry then maybe DataEntryIsProgramming?. Or maybe both are DataProcessing?
Recently, I came up with a somewhat bromidic way of explaining to a person with little knowledge of programming my feelings on the proposition "programming is just typing a lot": "Architecture is just drawing pictures." -- DanielKnapp
See Also: DataStructureCentricViewDiscussion, TableOrientedProgramming
