I'm touching this to draw it to the attention of those performing the 2006 spring cleaning.
Here are some helpful RulesForUsingHyphensAndDashes, excerpted from
"The Trouble With EM 'n EN (and Other Shady Characters)" article by Peter K. Sheerin
In the excerpted text, en dash and em dash have been replaced by (UniCode) U+2013 "–" and "--", respectively, and non-ascii quotes and apostrophes have been replaced by ascii ones.
Hyphens are Not Dashes
Stop! Go back and re-read the subhead above--at least 2–3 times--then let it sink in before continuing.
The sentence above illustrates the proper use of the hyphen and the two main types of dashes. They are not the same, and must not be confused with each other. In some fancy fonts the difference is more than just the width--hyphens have a distinct serif. If you don't know the rules already, let's review them. First, though, a definition:
" is a unit of measurement defined as the point size of the font--12 point type uses a 12 point em. An "en
" is one-half of an "em".
Though some of the finer points in the rules are complex, their basic applications are clear-cut and their misuse easily identifiable. First, neither an em dash nor an en dash should be confused with the hyphen (-), which is used to join compound words together.
The correct use of em and en
The em dash
(— U+2014) is used to indicate a sudden break in thought ("I was thinking about writing a--what time did you say the movie started?"), a parenthetical statement that deserves more attention than parentheses indicate, or instead of a colon or semicolon to link clauses. It is also used to indicate an open range, such as from a given date with no end yet (as in "Peter Sheerin [1969--] authored this document."), or vague dates (as a stand-in for the last two digits of a four-digit year).
Two adjacent em dashes (a 2-em dash) are used to indicate missing letters in a word ("I just don't f----ing care about 3.0 browsers").
Three adjacent em dashes (a 3-em dash) are used to substitute for the author's name when a repeated series of works are presented in a bibliography, as well as to indicate an entire missing word in the text.
The en dash
(– U+2013) is used to indicate a range of just about anything with numbers, including dates, numbers, game scores, and pages in any sort of document.
It is also used instead of the word "to" or a hyphen to indicate a connection between things, including geographic references (like the Mason–Dixon Line) and routes (such as the New York–Boston commuter train).
It is used to hyphenate compounds of compounds, where at least one pair is already hyphenated (as in "Netscape 6.1 is an Open-Source–based browser."). The Chicago Manual of style also states that it should be used "Where one of the components of a compound adjective contains more than one word," instead of a hyphen (as in "Netscape 6.1 is an Open Source–based browser"). Both of these rules are for clarity in indicating exactly what is being modified by the compound.
Other sources also specify the use of an en dash when referring to joint authors, as in the "Bose–Einstein" paper. Some also prefer it to a hyphen when text is set in all capital letters.
Some typographers prefer to use an en dash surrounded by full spaces instead of an em dash. Others prefer to insert hair spaces on either side of the em dash, but this is problematic with some web browsers (see the section on spaces for more detail).
hyphen you can insert with the key next to the zero on your keyboard is an ambiguous character suffering from an identity crisis. It can't decide if it's a hyphen, a minus, or an en dash--in fact, the Unicode specification describes it as "hyphen-minus" and defines very specific replacements for each of its personalities.
Use it if you need to insert a hyphen, but never for a minus (− U+2212) or a dash, since it does not have the correct width for either, or the vertical position for the latter. Compare:
- 1+4-2=3 (hyphen-minus)
- 1+4−2=3 (UniCode minus sign)
There are exceptions to this rule--particularly, if you are writing or editing a C2 Wiki page, you should never
non-ASCII characters, which sadly leaves us with no choice but to use the "hyphen-minus" for all vertically centered lines. Please use a space-surrounded double-hyphen instead of an em dash and a double-hyphen without surrounding spaces instead of an en dash.
- I don't believe these rules are set in stone for C2. As of May 2006, the server encodes pages in UtfEight, so any UniCode characters should be fair game.
The soft hyphen
(­ U+00AD a.k.a. "soft hyphen", "discretionary hyphen" or "optional hyphen"
) is to be used for one purpose only--to indicate where a word may be broken at the end of a line. Otherwise, it is to remain invisible and not affect the appearance of the word.
Some browsers display it no matter where it falls, but this is not the correct behavior. Others in the past have recommended against its use because its behavior was not well-defined, but the HTML 4.01 spec makes its use and behavior clear and unambiguous.
Three other hyphen characters exist in UniCode
, but are unfortunately not defined in the HTML entity set (although they should be):
- U+2011: the non‑breaking hyphen (‑ not in HTML) does just what its name implies.
- U+2010: the hyphen character (‐ not in HTML) is meant to be used in place of the hyphen‐minus when a hyphen is exactly the desired character.
- U+2027: the hy‧phen‧a‧tion point (‧ not in HTML) is that bullet-like character you find in some dictionaries to separate syllables. That is its only use, but if you're creating an online dictionary, using it will make your entries look more professional.
character entity references
- In general, inferences made from Unicode character names should be treated with care, because these names cannot be changed (http://www.unicode.org/standard/stability_policy.html) and so may be inaccurate. In this case the name is accurate, though; the UnicodeStandard? says:
- Because of its prevalence in legacy encodings, U+002D HYPHEN-MINUS is the most common of the dash characters used to represent a hyphen. It has ambiguous semantic value and is rendered with an average width. U+2010 HYPHEN represents the hyphen as found in words such as "left‐to‐right." It is rendered with a narrow width. When typesetting text, U+2010 HYPHEN is preferred over U+002D HYPHEN-MINUS. [...]
- U+2013 EN DASH is used to indicate a range of values, such as 1973–1984. It should be distinguished from the U+2212 MINUS SIGN, which is an arithmetic operator; however, typographers have typically used U+2013 EN DASH in typesetting to represent the minus sign. For general compatibility in interpreting formulas, U+002D HYPHEN-MINUS, U+2012 MINUS SIGN, and U+2212 FIGURE DASH should each be taken as indicating a minus sign, as in "x = a ‒ b."
- U+2014 EM DASH is used to make a break—like this—in the flow of a sentence. (Some typographers prefer to use U+2013 EN DASH set off with spaces – like this – to make the same kind of break.) This kind of dash is commonly represented with a typewriter as a double-hyphen.
Many people prefer character entity references over numbers.
The official list (http://www.w3.org/TR/html4/sgml/entities.html
- – == – == en dash, U+2013
- — == — == em dash, U+2014
- − == − == minus sign, U+2212
- ­ == ­ == soft hyphen = discretionary hyphen, U+00AD
for a few more popular entities.
(Some browsers seem to break all the em-dashes in the previous text. I tried to fix them... given the limitations of "Using Hyphens and Dashes in plain US-ASCII text".)
The damage done to the characters is actually pretty funny, and also interesting. It worked (on my browser) when I first put it up (posted from Netscape V7.1 on WinXP). Somewhere along the way, the chars in question got turned into "?"'s. It tempts me to play with recognizing and converting the chars, like tab replacement, on my own wiki. It certainly shows how non-uniform the browser world is.
) changed the above text to use only ASCII characters. It should now work in every browser.
Some non-ascii en dash characters since reinserted.
Even more UtfEight
dashes were inserted, now that C2 is reporting the UtfEight
encoding on its pages.
(moved from AnalRetentive
From the editorial guide at h2g2 (http://www.bbc.co.uk/dna/h2g2/alabaster/SubEditors-Style
I'm not naming any names but someone here at h2g2 is extremely fond of hyphenation in all the right places. The compound adjective is just one of the many grammatical areas that you can use hyphens. Here are some examples:
- 'A well-kept garden' (but 'the garden was well kept')
- 'Quick-witted response' (but 'the response was quick witted')
- 'Beady-eyed neighbour' (but 'the neighour was beady eyed')
- 'Broad-shouldered workers' (but 'the workers were broad shouldered')
- 'Tight-lipped confidante' (but 'the confidante was tight lipped')
- 'Madonna-like pose' (but 'the pose was like The Madonna's')
- 'Italian-style, full-bodied coffee' (but 'the coffee was Italian styled and full bodied')
- And so on...
Hyphens should also go after some prefixes, as in 'Sub-editor', 'co-operative', 'pre-1999', 'post-reformist', 'neo-classical'. Stuff like that. Words starting 're' should not be hyphenated unless the first letter after 're' is a vowel, so it's 're-use' and 're-educate' but 'remastered' and 'rekindled'. Alternatively, words starting 're' should not be hyphenated except before stems beginning with 'e' ("reuse" but "re-educate"), or to disambiguate between established words and more active prefix use: "react" vs "re-act", "reform" vs "re-form", "resign" vs "re-sign" or "relay" vs "re-lay", for example.
There are loads of other hyphen rules which I won't go into here, but do be aware of using too many hyphens, as in 'an-oh-so-efficient-manner', because really long hyphenated phrases don't wrap around on the screen. Put the phrase in quotes instead, and remove the hyphens. So it would be 'alternative rock meets classical music', rather than alternative-rock-meets-classical-music.
See also: http://www2.ncsu.edu/ncsu/grammar/Hyphen3.html
Why is all this stuff about hyphens a sign of anal-retentiveness?
English is a fairly complex language, but many of those features aid the subtlety and rhythm of what you're trying to express. Misuse those features and you make it a ham-handed, blunt language. That's no fun.
U shud let ppl right how-ever thay want. Lotz a stoopidd rools jest git n th' waye of comyunicayshun.
Ys. nglsh cn b usblae eevn wthi ltos fo erorrs, ti hsa exclnt redndcy
Once again, reductio ad absurdum
is the last refuge of those who cling to the apron strings of prescriptive grammarianism.
Am I the only person who finds it unbelievably AnalRetentive
to worry about things like this? I'm quite happy using my - key to represent a hyphen, em-dash, en-dash, arithmetic minus symbol, indication of a lengthened vowel in Romaji, or anything else. It's just a horizontal line, really.
Programmers are AnalRetentive about one set of things, typographers something else, musicians something else again. If it doesn't matter to you, that's fine. I think the point is that it does matter for some, and it's painless for everyone else.
The point that designers (and other people) make about such typographic matters is that these things have a subtle difference on how readable text is, so getting this thing right is a matter of communication etiquette. What's at stake is not how easy it is for you to write it, but how easy it is for others to read it.
Used to be that nobody worried about this since people didn't produce that much text that didn't go through the hands of the few (designers, editors, type-setters) who did worry about such things. Things are different today, of course, what with zines and blogs and everything else. More freedom, more rules to be conscious of. Funny, that.
At any rate, the utility of this particular wiki page is questionable, since a) most of its content is an extended quote from another site, ... and c) this wiki doesn't seem to let you enter HTML entities anyway. -- francis
- I think the party line with non-ASCII characters (en dash, em dash, curly quotes, accents, etc.) is that there's no guarantee that they'll be sent or received correctly. I'd guess off the top of my head that the text encoding that the c2 Wiki uses is different by default than the text encoding that Netscape 6/Win XP uses. Or something like that. These issues can be pretty gnarly to work out, which is why you'd rather use entities if possible. -- francis
I always consider claims about "readability" to be suspect - but I suppose that's because as a programmer, I'm used to staring at reams of monospaced type. Basically, I just don't see that the length of a horizontal line matters much to the reader when trying to interpret the text. When I was writing things out by hand for high school and university assignments, I certainly never paid any attention to the length of the dashes I drew.
I can't speak for everybody, of course, but I do notice when dashes are used incorrectly, and it does slow me down a tiny bit. Just like I notice when people misuse colons, semicolons, quotation marks, bold text, italic text, the subjunctive tense, etc., etc. I never paid attention to these things when I was writing high school assignments either, but then, the audience was much smaller. Maybe it's the same as the difference between having lunch with your friend and speaking in front of a group of 100 people -- you're more likely to prepare your comments if you're speaking in front of a group.
English, like every other language for human beings, is complex and subtle. Far more complex and subtle than computer languages. These rules don't exist to make grammarians feel smart: They exist because having them in the language makes the language richer and more expressive. Of course, there are always tensions, and languages aren't fixed conventions at any rate. Some English speakers may completely have no awareness, conscious or subconscious, of any difference between dashes, just like some English speakers don't seem to notice when billboards use "quotation marks" for "emphasis". So it may depend on who you imagine your audience to be. Personally, I like to err on the side of more subtlety and more texture, but that's just me. -- francis
["Subjunctive tense"? Surely some mistake: the subjunctive mood has many tenses. By the way, I like to double-space after stops and colons, which was the "correct" thing to do in mono-spaced, typewritten text (typesetters have en-spaces, em-spaces and multiples thereof, of course). MS-Word and this Wiki conspire against me, however.]
Basically, I just don't see that the length of a horizontal line matters much to the reader when trying to interpret the text.
I coincidentally ran across a surprizingly clear example in this morning's Boston Globe, in the weather section. The expected low temperature for Thursday is the range of minus 6 to plus 1 degrees F. How shall that be represented?
With the same symbol representing "minus" and "interval" (never mind the ironic absence of the degree symbol), what do we get?
-6-1 <degrees> F
-6 - 1 <degrees> F
- 6 - 1 <degrees> F
For comparison, in UniCode:
−6 – 1 °F
Even though the difference between an en-dash and a minus sign is subtle, the rendering on this morning's newspaper page was certainly more readable than ANY of those three choices. This is why God gave us typesetting. When the topic is readability, I fear that we tread dangerously close to the line of arrogance when we, as programmers, begin scorning the accumulated knowledge of many generations of typographers as "trivial" or "anal retentive". This might
be an area where they know more than we.
Rechecking the story, and it's even funnier -- though I don't know which side it supports. So it must be good data. The range is actually minus six to minus one. The typesetter (I'm guessing an automated one) represented it as:
leading to the WRONG conclusion, namely, that the temperature range was from minus six to plus one. Perhaps it started as:
and then some text macro replaced the doubled hyphen with an <em-dash>, leading to just the wrong answer. The entry should probably have been
<minus sign>6<space><en-dash><space><minus sign>1
Or perhaps: -6 to -1
Using Hyphens and Dashes in plain US-ASCII text
Sometimes the em dash (—) and en dash (–) entities are unavailable -- for example, when editing text on wiki that don't do HTML ( WhyDoesntWikiDoHtml
), or to make web pages that are understandable even on ancient web browsers.
We are forced to substitute ordinary ASCII "-" for en and em dashes.
Given these forces
- typographical em dash (—) looks longer than en dash (–), and
- typographical dashes of both flavors have little or no apparent space around them,
some people (see Proofing Guidelines http://www.pgdp.net/c/faq/document.php#em_dashes)
- substitute "-" for en dash (–), and
- substitute "--" for em dash (—), and
- don't add whitespace (newlines or blanks) around them
to try to duplicate the appearance of printed text as closely as possible.
It's OK if a hyphenated word is word-wrapped as if it were a single word.
However, it looks odd when the separate words "connected" by the em dash (—) are word-wrapped as a single word. Text is easier to read and breaks better in different windows widths if spaces surround dashes.
- always substitute " -- " (double dash with spaces) for em dash (—), or
- always substitute " - " (single dash with spaces) for em dash (—)
Some people (at least here on WardsWiki
- ed) seem to be standardizing on a complicated combination:
- use single dash " - " in text to mark parenthetical remarks
- double dash before signatures (where used).
One might argue that for new work in a new medium there is little to be lost and much to be gained by using the single ASCII dash for sub-clausing (where desired).
(moved from ElizabethWiethoff
...[D]espite active discussion where I defended using double hyphen to represent em-dash, some stubborn gnome commonly combs through pages and changes all double-hyphens (except those in signatures) to single-hyphens, which I find annoying, since my practice most certainly is not incorrect.) -- DougMerritt
I notice 184.108.40.206 is fond of changing double hyphens to single hyphens. So is 220.127.116.11. Might be the same person.
It's not a big deal to me, but I think (non signatory) double hyphens fool the SignatureSurvey
script. -- JoeWeaver
(OK to move this "hyphens" section to RulesForUsingHyphensAndDashes? Is there active discussion anywhere else that needs to be refactored to that page?) -- DavidCary
Great idea. Go ahead and move this hyphen discussion. BTW, it's not exactly "active"; it's been sitting here for almost two months. Oh, I'm not aware of hyphen discussions elsewhere at c2. -- Eliz
Is there a page for "trying to make readable approximations to symbols that aren't available in US-ASCII, not even in ISO 8859-1", in general, not just dashes?)
(moved from MoreAboutHyphensAndDashes?
Hyphen, en-dash, em-dash, and minus are each different characters. In type-written text, a long-standing convention has been to use a doubled hyphen to indicate an em-dash to the typesetter.
(moved from WikiEditingCustoms
Debate about hyphens, en-dashes and em-dashes
In an earlier version of WikiEditingCustoms
, one of the tips specifically suggested using a single "-" to represent en-dashes and em-dashes as well as hyphens. Much argument ensued. Since the suggestion was unrepresentative of predominant existing practice and controversial, I've removed it from the list; the ensuing discussion is now below here. -- GarethMcCaughan
(signed for accountability) in WikiGnome
- Q: Why avoid use of two hyphens elsewhere?
- A: Each hyphen has the same meaning, so the second is redundant. However, the double hyphen has often been used for annotating lists, and may be considered acceptable for that purpose. See also MoreAboutHyphensAndDashes? -- doubled hyphens are surely preferable to eliminating the distinction between em-dash, en-dash, and hyphen altogether.
It may well be the case that double hyphens are sometimes used to indicate an em dash to typesetters, but Wiki does not recognize them specially at the moment
, nor does it support unicode. Hence, we're stuck with the fact that we can't enter codes that work well in browsers not running under MS Windows. The choice is simply between a rather short line and a poor simulation (two or three consecutive short lines) of a longer line. The simplest course is to use the single short line most of the time.
You're missing the point. Some of us have been using double minus in pure-ascii systems as a simulation of em-dash for decades. We don't expect Wiki to recognize them! But we also see absolutely no reason for them to be forbidden. In fact, especially not, since Wiki doesn't mind them.
I understand that, but that leaves you without a simulation (other than hyphen) for an en-dash. Moreover, even those (including myself) raised on ascii are not necessarily familiar with the precise rules as to when an en-dash is correct, rather than em-dash (but see RulesForUsingHyphensAndDashes)
. At best, double hyphen is a simulation of an em-dash. Isn't it simpler to use a shorter version of what is being simulated, given that what is typed is not going to be converted to a true dash?
Ummm...excuse me, but that seems to boil down to your reasoning for why you don't want to use either a real em-dash or a simulated one. I fail to see that you have addressed the subject of why others should not use simulated em-dash.
[Different voice] I think this is a case of "Things should be as simple as possible, but not more so." A double hyphen does no harm and indicates to a text processor that might be smarter than Wiki where an em-dash belongs. It is true that the en-dash is not differentiated from the hyphen, but that is a lesser problem. The fact that this Wiki doesn't recognize and do anything with the double-hyphen today doesn't mean that Wiki won't do so in the future. Some of us may post-process this text in our own environments. The use of a double-hyphen is not a "simulation" of an em-dash -- it is, instead, a representation of it.
[Voice before last :-) ] Well said.
I wouldn't object to a real em-dash or en-dash, but the choice here is between different representations of them. I am asserting the simplest (and quickest) representation is best. I accept that Wiki might recognize double hyphen as em-dash in the future, but I think it's unlikely.
You are asserting that typing two characters rather than one is a big enough deal that "simple and quick" should be brought to bear? Seriously? (1) if you really think so, why are you typing all these extra characters on an argument that is ultimately pointless? (2) that again merely argues for why you would only want to type a single minus rather than two, and still has nothing whatsoever with why you think that Wiki style should forbid the rest of us from typing two, as we are accustomed to doing, and as much of the ASCII world believes to be good style.
Appearance counts as well. Double hyphen is understood by many (from the "ASCII world" at least) to represent an em dash, but it has a poor appearance. In my opinion, use of an em dash not enclosed in spaces also has a poor appearance. In many webpages explaining such use, the pages themselves break this rule. Wiki is aiming to be simple to use; following the punctuation representation methods of a particular group of users is not, as far as I am aware, one of its design principles. If the "ASCII world" custom was to use three hyphens to represent an em dash, and two hyphens to represent an en dash, the custom would have greater merit. As it is, it is a clumsy half-solution, and one that I suspect most Wiki readers (as distinct from editors) are not accustomed to.
I disagree with your opinion about the aesthetics; I think they looked great even on old vt100s, and they look better these days where many of the systems I use blend two minuses together into one apparent long dash. And I personally always surround those two minuses with spaces. Your arguments continue to seem to boil down to simply "I don't want to do it that way". Fine. Don't. But there's still no apparent reason to request other people not to.
It sounds like you're just not used to this way of doing things, and so you automatically think it should be forbidden. Which reminds me of a great quote: "Caesar: Pardon him Theodotus; he is a barbarian, and thinks that the customs of his tribe and island are the laws of nature." (Caesar and Cleopatra, Act II George Bernard Shaw). Heh. :-)
When using a manual typewriter, I have sometimes created a dash by typing two hyphens, with slight adjustment of the spacing so that they join. If there were ways of creating em and en dashes which would be displayed correctly by nearly all browsers, I would use them. In most cases, two hyphens are shown as such, and may be split at a line end (in the editing box or in the rendered page). In those cases, the appearance is poor in comparison with a true dash. A single hyphen is not ideal in every respect, but is the simplest method. Shaw was having fun - Caesar knew the law of Rome was a serious matter, and not the law of nature.
Ultimately, it doesn't matter either way; this is like the old rule that one should put two spaces between a period and the start of the next sentence, or that one could not append a comma to the penultimate item in a list. Stylistic rules change with time (and geography and local custom), and neither the ones I just cited nor the thing we're arguing about is really very important. Few readers will notice one way or the other, fewer still will care if they do notice. Considering that, and considering that we simply seem to disagree on a minor aesthetic matter, let's just agree to differ.
I never said it matters in that sense. I originally said "a single hyphen normally suffices", not that a repeated hyphen was forbidden. As to whether use of the doubled hyphen is preferable to eliminating the distinction between em-dash, en-dash, and hyphen, we can agree to differ - it depends on the relative valuations of the stylistic goals. However, one needs to be aware of possible problems if pasting in text including a dash-like character copied (to the clipboard) from some other place.
I am of course aware of problems with incompatible character sets, but are you saying that pasting a double-minus can cause problems? How?
[No - just pasting text containing an en-dash or em-dash.]
[[Different Different Voice]] To First Voice: You're losing. You keep falling back from one weak argument to another. I had created a nice list of your arguments, but they were lost to the aether of the Web. Anyway, here is my convention for distinguishing between a hyphen, an en-dash, and an em-dash: A hyphen is a single hyphen-character that is surrounded by no whitespace, e.g. "hyphenated-word." An en-dash is a single hyphen-character surrounded by whitespace on either side, e.g. "July 7 - July 10." An em-dash is a double hyphen-character regardless of whitespace (although I prefer to have spaces on either side), e.g. "Stop -- don't go further." I doubt that this convention is explicitly stated anywhere, but I think it is reasonably well accepted.
Truth is that the true em-dash is pretty well redundant. [[Different Different Voice Again]] What are you getting at -- do you mean that it can be losslessly replaced by semicolons, parentheses, or other punctuation? Although at times I've been inclined to agree with that sentiment, I think that it exists in our typographical repertoire for a reason, and it provides a different feeling than a semicolon or other replacement punctuation would. An em-dash seems to indicate flighty reasoning or at least rapid-fire thought on the part of the writer; a sense of immediateness, a sense of "I just thought of this, and it doesn't fit very well with the sentence I've already started, but I'm going to interject it anyway." On the other hand, a semicolon almost always gives a sense of being a well-thought-out addition to a sentence, with no way to indicate a "returning" to the original subject (i.e. semicolons can't be used to create parenthetical statements). Proper parenthetical statements enclosed in parentheses typically are connoted with being incidental or trivial, while an em-dash parenthetical can often contain the main point of a sentence.
I meant that an em-dash is redundant because an en-dash will do instead.
[[DDVA]] Well, here's something that I just found that is quite informative and, I should think, pretty authoritative, especially when it comes to the Web. http://www.alistapart.com/articles/emen/
Even that acknowledges that some prefer to use an en dash surrounded by full spaces instead of an em dash.
Some discussion of this topic can be found on WikiEditingCustoms