Simply put, behavior in a system which is not defined by the requirements or specification of the system. In other words, invoking UndefinedBehavior
is a BadThing
, and almost(?) always is a programming error.
The original definition here had a second requirement, that the behaviour "should not be defined by anyone else". This is incorrect; behaviour that is undefined according to one standard or specification may be defined by another. For example, PosixStandard defines behaviour for a C implementation that is undefined by AnsiCee.
It is also common for standards to allow implementations to define behavior that is not defined by the standard.
"Nevertheless, there is something very troubling here. Your program's behavior is undefined
-- you have no way of knowing what
will happen...That means compilers may generate code to do whatever they like: reformat your disk, send suggestive email to your boss, fax source code to your competitors, whatever." -- Scott Meyers, "Effective C++" (EffectiveCeePlusPlus
"The `#pragma' command is specified in the ANSI standard to have an arbitrary implementation-defined effect. In the GNU C preprocessor, `#pragma' first attempts to run the game `rogue'; if that fails, it tries to run the game `hack'; if that fails, it tries to run GNU Emacs displaying the Tower of Hanoi; if that fails, it reports a fatal error. In any case, preprocessing does not continue." -- Manual for the GNU C preprocessor for GNU CC 1.34.
I'd like a compiler that lets me write Hello world as:
#pragma set_language Haskell
module Main where main = putStrLn "Hello, world!"
The very name of UndefinedBehavior
strikes me as a relic of informal operational style definitions that languages like C/C++ "enjoyed". In reality the behavior could be perfectly well defined and it's not the case that the multiplication of 2 integers in C can format the hard drive. One would just need to pick another paradigm, for example Dijkstra's predicate transformers and the behavior would be perfectly well defined. It's true that the now very well defined meaning of a statement like:
a= b*c ;
can be objected on the grounds that it's underspecified, and the weakness of the predicate transformer impairs the construction of correct programs. --CostinCozianu
Tony Hoare explicitly contradicts you (about operational definitions and such), please read his (easy, short) paper that I referenced below, and if you think he is wrong, please explain why he is wrong.
Meanwhile, you inserted your discussion high on the page, it would fit better in the discussion section further down. Note that there's a structure here: the first part of this page is definitional followed by various quotes, before getting into personal opinions.
Non-sense. Have you bother to read what I said ? You did not, so you throw this non-argument.
I guess I phrased it badly. I want for you and I to find an agreement here (possibly an agreement to disagree, at worst, but if so, one where we agree on what the disagreement is), not just to leave it as a ThreadMode argument.
I think that everything you said is, in every particular, at least a reasonable thing to say, whether absolute truth or not.
I think that almost everything you said is something that I agree with, and also that the paper cited below by Tony Hoare agrees with.
There is, however, merely one small part of what you said, that I have a problem with -- and yes, I re-read what you said slowly and carefully, since you encouraged me to -- and perhaps I am still misunderstanding, and if so, by all means please help me understand; I have no ill will here, I'm trying to be careful and analytical.
Notwithstanding that, I have a remaining issue, where you said "The very name of UndefinedBehavior strikes me as a relic of informal operational style definitions". The thing is, in the paper I mentioned (that is the sort of thing that I rather thought you would enjoy, actually), he raises the issues of operational definitions (which, in at least the extreme cases, naturally he has always highly detested) and UndefinedBehavior (as something which he thinks has its place) in the same breath.
Maybe I am still badly misunderstanding, but it's not due to ill-will.
Let me try to anticipate your position. Regardless of perfectly good points that Hoare may have made about places where UndefinedBehavior may make sense, perhaps you are pointing out that that does not mean that UndefinedBehavior is always either desirable or inevitable?
I am perfectly willing to agree that UndefinedBehavior is not the general ideal, and that it may even be hypothetically possible to define languages that have none such, but my understanding is that the latter is only a goal, not a reality, to this day, for precisely the reasons that Hoare briefly mentions. And that I would restate as issues with infinite regress of embedding/defining systems, and with the infamous SymbolGroundingProblem?.
Can we proceed in a constructive direction? -- Thanks, Doug
Actually it's really simple. Take a strcpy implementation
For certain cases, this has what is called "undefined behavior", whereas in a safe language it would have "defined" behavior in terms of an exception being thrown. But the distinction of "defined" versus "undefined" is justified if and only if
you regard the meaning of that block as a function
from the states of the program before entering the block to the states of the program after the block. But this is acceptable only if you live with the prejudice that programming languages (and C is an excellent example of that mentality) can only be specifed operationally (the behavior of an executing machine).
Whereas, for example, if you want to see the meaning of the statement as a binary relation
on the set o states, or as a predicate transformer, then the meaning of the statement is perfectly well defined and it's not different in essence than any other "well defined" statement. In this case if dest is null, the relation takes any state before the block to all possible states after the block, or in terms of predicate transformer the correspondence takes the predicate on the previous state simply to the predicate true. And since Dijkstra's claim that determinism is at best a distraction and we ought to think non-deterministically, appears to be more and more correct with the ubiquity of multi-threading and distributed programming, it is natural to drop the idea of viewing semantics in terms of an abstract machine and what is the "behavior" (i.e. what's the function
from the previous state to the next state). And we might drop the misleading terminology as well, while we're at it.
I'm not sure that replacing an operational specification with a functional specification helps--partial functions can and do exist, after all. Virtually every implementation of strcpy that I've seen on a modern system will behave sensibly in the presence of null pointers; where strcpy() has trouble is when it gets passed pointers to something which is neither NULL nor a null-terminated string (see NonNullTerminatedString, WildPointer). Unlike NULL, which is trivial to detect; these cases cases are very difficult to detect and impossible to prevent given the other design constraints and architectural decisions present in C (the presence of PointerArithmetic, for instance, makes any sort of reasoning about pointer types impossible. Which is why, outside cases like traversing arrays, pointer arithmetic itself is undefined; you do it and all bets are off).
In one sense, the behavior of strcpy is "defined" regardless--it iterates through memory, copying bytes, until something stops it. The effect on a running system of such behavior are, of course, impossible to predict. Hence, "undefined".
As Doug has pointed out elsewhere, on some systems undefined behavior does get defined, in which case it may be taken advantage of (at a cost of portability). UndefinedBehavior is, in a sense, in the eyes of the beholder--or the specification writer.
Please read again. The behavior of strcpy is "undefined only if you think in terms of operational semantics, or in your terms "the behavior is impossible to predict",. for example if the pointer is null (no, it doesn't necessarily iteratate through the memory). But you want to constrain the notion of "predicting behavior" to the defininition a function
(or partial function, leaving points like the null case as "undefined") on states of the program, taking the previous state to the next state. But this is the wrong state of mind, at least according to EwDijkstra
and many others (including TonyHoare
). The definition should be viewed as a relation, a predicate transformer, or something equally suitable, such that
you can no longer say that the above is "undefined". A relation
on states is perfectly well defined as a mathematical object, even if it takes the initial state to all possible states, whereas a function
would be "undefined" in such a case. -- CC
Ok. And then how does your point interact with TonyHoare
's point quoted below, e.g. when he says "the most important attribute of a formal definition of semantics should be to leave certain aspects of the language carefully undefined."? -- Doug
The honest answer is that I don't know but I do not think that it matters. If I was to speculate, he was probably addressing a specific audience in a certain age of computer science for which "defined" means the function from the old state to the new state, and anything less than that would have been "undefined". --Costin
You're assuming he changed his mind over the years and that I'm asking you to guess what he might have meant in 1964. That is not the case.
That paper is from 2001, and he's talking about his interest in formal semantics over the years, and he gives every indication that he believes today what he said in 1964, and furthermore he explains why he believes that.
So, given that he means today exactly what he said: "leave certain aspects of the language carefully undefined", how do you reconcile the fact that you are disagreeing with him? -- Doug
An example of UndefinedBehavior
- "Division by zero is described as undefined behavior rather than as setting errno to EDOM. The program can just as easily check for a zero divisor before a division as for an error code afterwards,and the adopted scheme reduces the burden on the function."
How do standards formally define UndefinedBehavior
? The C99 standard says:
#1 undefined behavior
: behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements.
#2 NOTE Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).
#3 EXAMPLE An example of undefined behavior is the behavior on integer overflow.
#1 In this International Standard, "shall" is to be interpreted as a requirement on an implementation or on a program; conversely, "shall not" is to be interpreted as a prohibition.
#2 If a "shall" or "shall not" requirement that appears outside of a constraint is violated, the behavior is undefined. Undefined behavior is otherwise indicated in this International Standard by the words "undefined behavior" or by the omission of any explicit definition of behavior. There is no difference in emphasis among these three; they all describe "behavior that is undefined".
#3 A program that is correct in all other aspects, operating on correct data, containing unspecified behavior shall be a correct program and act in accordance with 188.8.131.52 [which describes how the externally visible side effects of the program must correspond to the behaviour of the C abstract machine].
Some aspects of these definitions are problematic:
- The terms "nonportable", "erroneous", and "correct" are not defined. Their 'common sense' meanings don't fit: "erroneous" and "correct" are relative to a program specification, and the behavior of a CeeLanguage implementation for a given program shouldn't be dependent on the program specification (in fact can't be, since the specification is not an input to the implementation). The usual meaning of "nonportable" is "doesn't work on a wide enough range of platforms".
- The rule that UndefinedBehavior is the default when the standard "[omits] any explicit definition of behavior" is a seriously bad idea. It results in cases where the standard is simply vague being interpreted quite subjectively: one group of people will consider the behaviour to be undefined because it's not defined well enough; another group will try to interpret it in the way that they consider most reasonable. Not specifying behaviour is also a really poor way of expressing situations where the implementation is intended to be able to behave nondeterministically -- those are precisely the situations in which a programmer needs to know what the extent of the possible nondeterminism is. If you're developing a language standard, don't use this rule.
- Paragraph 4 #3 is vaguely worded ("in all other aspects??"), but is important because without it nondeterministic programs are essentially second-class. Many people misinterpret Standard C as only imposing requirements on an implementation for "strictly conforming" programs, which are always deterministic.
- The standard doesn't say precisely when undefined behavior is allowed for a program that may or may not use an undefined construct depending on its input. For instance, is the implementation required to operate correctly up to the sequence point or last externally visible side effect before the undefined behavior actually occurs, or can it fail as soon as undefined behavior becomes "inevitable"? An earlier C99 draft said "The implementation must successfully translate a given program unless a syntax error is detected, a constraint is violated, or it can determine that every possible [emphasis added] execution of that program would result in undefined behavior.", but this wording was removed.
People are sometimes unhappy (sometimes to the point of moral outrage) with the existence of UndefinedBehavior
; why does it exist?
The famous formalist Tony Hoare was an early and influential advocate of UndefinedBehavior
. Here's what he said in a 2001 paper "Assertions: a personal perspective": http://research.microsoft.com/%7Ethoare/6Jun_assertions_personal_perspective.htm
- In 1964, a conference took place in Vienna on Formal Language Description Languages for Computer Programming... I did not make a presentation at the Vienna conference, but I did make one comment: I thought that the most important attribute of a formal definition of semantics should be to leave certain aspects of the language carefully undefined. As a result, each implementation would have carefully circumscribed freedom to make efficient choices in the interests of its users and in the light of the characteristics of a particular machine architecture. I was very encouraged that this comment was applauded.
- I drew the analogy with the foundations of the various branches of mathematics, like projective geometry or group theory; each branch is in effect defined by the set of axioms that are used without further justification in all proofs of the theorems of that branch. The axioms are written in the common notations of mathematics, but they also contain a number of undefined terms, like lines and points in projective geometry, or units and products in group theory; these constitute the conceptual framework of that branch.
- The pure mathematician deliberately gives no explicit meaning to the undefined terms appearing in the axioms, theorems and proofs. It is the task of the applied mathematician and the experimental scientist to find in the real world a possible meaning for the terms, and check by carefully designed experiment that this meaning satisfies the axioms. The engineer is even allowed to take the axioms as a specification which must be met in the design of a product, for example, the compiler for a programming language. Then all the theorems for that branch of pure mathematics can be validly applied to the product, or to the relevant real-world domain. And surprisingly often, the more abstract approach of the pure mathematician is rewarded by the discovery that there are many different applications of the same axiom set.
- By analogy, there could be many different implementations of the axiom set which defines a standard programming language. That was exactly the carefully circumscribed freedom that I wanted for the compiler writer, who has to take normal engineer's responsibility that the implementation satisfies the axioms, as well as efficiently running its users' programs.
C9X Rationale document seems to be in agreement with Hoare's position; it says:
"The terms unspecified behavior, undefined behavior, and implementation-defined behavior are used to categorize the result of writing programs whose properties the Standard does not, or cannot, completely describe."
"The goal of adopting this categorization is to allow a certain variety among implementations which permits quality of implementation to be an active force in the marketplace as well as to allow certain popular extensions, without removing the cachet of conformance to the Standard. Informative Annex K of the Standard catalogs those behaviors which fall into one of these three categories."
"Unspecified behavior gives the implementor some latitude in translating programs. This latitude does not extend as far as failing to translate the program. Undefined behavior gives the implementor license not to catch certain program errors that are difficult to diagnose. It also identifies areas of possible conforming language extension: the implementor may augment the language by providing a definition of the officially undefined behavior."
"Implementation-defined behavior gives an implementor the freedom to choose the appropriate approach, but requires that this choice be explained to the user. Behaviors designated as implementation-defined are generally those in which a user could make meaningful coding decisions based on the implementation's definition. Implementors should bear in mind this criterion when deciding how extensive an implementation definition ought to be. As with unspecified behavior, simply failing to translate the source containing the implementation-defined behavior is not an adequate response."
This does not explain why UndefinedBehavior exists. Many languages either eliminate undefined behavior entirely or limit it to "unsafe" constructs that are used infrequently. The argument normally used to explain the extent of undefined behavior in C is based on performance. It is not clear that this argument is supported by actual performance measurements, rather than an ideological position that giving the implementation "more freedom" is necessarily a good thing.
It does not explain well enough for a skeptic, true, but such standards are not aimed at armchair skeptics, they are developed by and aimed more at companies that are compiler developers, and who fight tooth and nail against ideology that they believe will hurt them in the marketplace.
In that vein, it is simply incorrect that "It is not clear that this argument is supported by actual performance measurements". Very wrong. The example near top of page from the C99 standard was that behavior on integer overflow was undefined, and that is a classic example of something where programs in general
would otherwise suffer severe performance hits on at least some CPUs that some compiler developers support, and this is overwhelmingly supported by a vast amount of quantitative analysis over decades.
And note that, contrary to reporting-with-alarm claims about the scariness of UndefinedBehavior
, an integer overflow in C will simply result in a value that may (or may not) be different than the programmer expected, i.e. a bug in that program, not
necessarily an instant damage to the overall system state or even the program state...and in fact, it's not uncommon for programmers to allow ints to overflow on purpose, or to not particularly care that they might in theory because they won't in practice, etc.
There may be some areas of C99 that are undefined where it is not as overwhelmingly clear as with that example, but in general, areas that are explicitly undefined are thus due to fierce push-back from actual compiler developers.
One good metric of a system (especially a programming language) is its ability to detect and diagnose undefined behavior.
What would that mean? If the language requires behaviour to be detected and diagnosed, then it is not undefined. This may be a good metric of programming language
(Such a system should do something useful for the developer, such as throw an exception or abort the program). Trouble is, many types of undefined behavior are expensive to detect at runtime, and/or impossible to detect at compile time (HaltingProblem
The HaltingProblem is a red herring. Many languages have few or no cases of undefined behaviour (e.g. JavaLanguage, HaskellLanguage), and the HaltingProblem is simply not a constraint on this. The only thing that is really difficult to make absolutely deterministic is concurrency (not impossible; see DeclarativeConcurrency as described in ConceptsTechniquesAndModelsOfComputerProgramming). Even then, in a well-designed language the nondeterminism introduced by concurrency should cause behaviour that is only unspecified, not undefined.
Which is why the behavior is often left undefined; it would be burdensome to require implementations to handle the behavior in a particular way.
This is an opinion often expressed to justify the truly ridiculous degree to which the CeeLanguage standard allows UndefinedBehavior, but it has little empirical support.
You clearly have not developed C compilers. It's got plenty of empirical support, as described above. You're merely doing armchair ideology.
What is the Problem With Undefined Behavior?
There are a lot of situations where it is perfectly reasonable to ignore the situation primarily because it is unlikely to occur. The risk associated with the problem is not worth the cost of preventing it. If something within the flow prevents a zero divisor, than it is perfectly acceptable to have the result of a divide by zero being an undefined operation. Over-specification is merely a waste of resources.
UndefinedBehavior in general doesn't just affect the result of any particular program construct; it puts the system in an unstable state from which no recovery is necessarily possible. In practice operating systems (and operating system standards such as POSIX) limit the extent of undefined behaviour to satisfy their security model; this limitation is essential for the attitude expressed above to be even vaguely reasonable under any circumstances. In practice security flaws due to UndefinedBehavior are common even if the OS security model is maintained. -- DavidSarahHopwood
In the worst
case it might "put the system in an unstable state from which no recovery is necessarily possible", definitely not
in the "general case", and not even in the typical or frequent cases -- e.g. considering the above example that integer overflow is undefined under C99, which usually causes only rare and minor bugs, and considering that damage from bugs is typically limited even in theory, not just "in practice". The rest of your paragraph I agree with.
I ususally find, though, "undefined behavior" is a fact of life that one must live with. In risk management terms, this is acceptance of the risk. There is always a trade-off to determine how fully one will define a system before starting on it. Often, it is sufficient to define the behavior within a determined set of operating characteristics; effort on defining the behavior outside of the specified operating range is of arguable value. Adding to the difficulty is that most undefined behavior falls into the category of unknown undefined behavior, people are not omniscient and are usually unaware of what has not been defined. We deal with undefined behavior-like situations constantly, having to deal with them in programming is no different.
I think there are two classes of undefined behavior. There is weakly undefined behavior (which might cause odd evaluation of an expression) and there is strongly undefined behavior (which in the absence of memory protection really could format your harddisk).
i = i++;
strcpy((char *)strcpy, "Slowly I turned, step by step...");
In the case that you don't have memory protection to abort the program, try to imagine what the second version might do.
In practice you are correct: there are two classes of undefined behavior. But this is only because you can trust your compiler writers to not be outright malicious. There is little more than good sense and laziness preventing compiler writers from saying: "Hey! This action is undefined! That means I can do anything I want, right?" I'd hate to discover by accident what 'undefined behavior' means to an InterCal compiler.
You should perhaps consult the famous article on undefined behaviours: http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html
. In short, compiler is in its full right to assume that the undefined behaviour doesn't happen, so the compiler might draw some assumptions about i (e.g., that it's always positive) and optimize out some runtime checks. The cited article has enough of real-world examples.
Most discussion on this topic revolves around CeeCeePlusPlus
. Remember, this started out as an extremely lightweight compiler--so lightweight, in fact, that the lint program was introduced about ten years after the compiler was written to do static code analysis. Sure, today we've got our Gnu compilers with seven levels of warning and javac backed by a huge corporation from the get-go, but back then it was pretty much DennisRitchie
bootstrapping his own compiler. In effect, "undefined behavior" means that compiler developers don't have to write special code to handle or even detect various edge cases. Back then, this was a blessing. Nowadays, not so much. That being said, that term does specifically mean that it is permissible to detect all such behavior and respond by producing a compiler error at compile time or crashing at runtime, to keep developers from wandering too close to the edge.