Yoknapatawpha Crossing

Hyphenation and justification on the web

There were a couple of recent (reckoned in academic time) articles on best practices for web typography which seem, to me, to miss an important point. And so here we are, as I fulfill the ancient role of Offended Nerd responding to Someone Who Is Wrong on the Internet.

This article deals specifically with two questions regarding hyphenation and justification on the web:

  1. Can you?
  2. Should you?

The Web Typography Gurus have answers to these questions: Yes! and, Of course! But these reflect a simplistic and biased view of the situation. The answers should be: Only with serious tradeoffs, and, Maybe.

Background

Here is the part where I try to provide a common ground for discussion. If you already know anything about typography, you should skip to the next section. The only points I make here that I haven’t seen emphasized elsewhere are:

  1. The decision to justify text is purely aesthetic and does not confer material benefit to the reader, and
  2. Justification and hyphenation are separate processes which relate and feed back to each other when used together, but are not irretrievably intertwined.

If you look at whatever book you have nearest to hand, it will probably have justified text. That is, if you look at one whole page at a time, without reading the words on the page, you will see that the text fills a rectangle. The left edges of every line are flush, and the right edges of every line are flush. This effect is produced by adjusting the spacing between words in a line, and sometimes even between the individual letters. When a line has lots of characters, the typesetter squashes the spaces a bit, and when there are fewer characters they stretch the spaces out. This squash-and-stretch process is known as justification; text presented in this way is justified. No matter what.
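To make the squash-and-stretch idea concrete, here is a minimal sketch in Python. It assumes monospaced text, where spaces can only be added in whole-character steps; a real typesetter adjusts spacing in much finer increments and can also shrink spaces below their natural width. The function name `justify_line` is made up for this illustration.

```python
def justify_line(words, width):
    """Pad the gaps between words so the line is exactly `width` characters.

    Monospace-only sketch: the extra space is shared out as evenly as
    possible, with any remainder going to the leftmost gaps.
    """
    if len(words) == 1:
        # A single word has no gaps to stretch, so just pad on the right.
        return words[0].ljust(width)
    gaps = len(words) - 1
    total_spaces = width - sum(len(w) for w in words)
    if total_spaces < gaps:
        raise ValueError("words do not fit in the requested width")
    base, extra = divmod(total_spaces, gaps)
    pieces = []
    for i, word in enumerate(words[:-1]):
        # The first `extra` gaps receive one additional space each.
        pieces.append(word + " " * (base + (1 if i < extra else 0)))
    pieces.append(words[-1])
    return "".join(pieces)


if __name__ == "__main__":
    print(repr(justify_line("the quick brown fox".split(), 24)))
```

Even this toy version shows the basic tradeoff: the fewer words on a line, the more space each gap has to absorb.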

Text does not have to be justified. You have probably been abused by Word at some point in your life, and so you know that there are also buttons for left-align (where everything lines up on the left of the page, but the right edge of the text is ragged), right-align (the opposite), and center. If you were writing in another language, you might top-align, staircase-align, or something else equally un-American and subversive. In Western writing, justification is an extremely common and traditional way to present text. Gutenberg justified his Bible, although the practice of justification predates him (as you may verify for yourself by finding pictures of basically any illuminated manuscript from before Gutenberg’s time).

The decision to justify text is entirely aesthetic and subjective (more below). I imagine many, or even most, book designers choose to justify text based largely on tradition. Even in the absence of tradition, a designer might choose to justify text in a book for very many good aesthetic reasons, one of which is that it introduces some tension into a page of text. Forcing text to fill out a regular large-scale frame creates a nice image on the page, and also necessitates the adjustment of spacing within each line, breaking up what might otherwise be a monotonous repetition of precisely the same spacing word after word.

Justification can lead to absurd adjustments of inter-word spacing, to the extent that the text becomes more difficult to read (and this is an objective and quantifiable effect; again more below). To combat this, typographers also use hyphenation, the practice of breaking one word across two lines. When hyphenation and justification are used together, there is a virtuous feedback between the two things. But it is very important to remember that they are two distinct processes. You can justify without hyphens, and you can hyphenate without justification, and you can use them both at the same time but not together if you want to make something that really looks bad. The best typesetting systems (which include talented and experienced people as well as certain computer programs) allow these two processes to feed back to each other, so that you may hyphenate a word to achieve better line spacing five lines later, or move a word up one line to avoid a hyphen three lines previously. Donald Knuth, the designer of a typesetting system known as TeX, has written extensively and entertainingly about these issues.

As with justification, the use of a hyphen-like character to fill space did not originate with Gutenberg. Hyphens, fleurons, extended letters, and other decorative devices that helped fill out space were in widespread use by scribes for many years prior to Gutenberg.

The interesting bit

Others suggest that justification is the best, the most professional, or the most readable/legible way to present text. This is not true.

To see that hyphenated-and-justified text is not necessarily “the best,” you only have to ask the person peddling this proposition what being the best means. Then you will hear that being the best means that it’s the best-looking, the most professional, or the most legible way to present text. As for being the best-looking, this may be true in the judgement of some. But this is transparently a subjective claim, and reasonable people have great differences of opinion. Leading us to the next point.

To see that H-and-J is not necessarily the most professional, you only have to look at how professionals choose to present their work. Off the top of my head, I don’t know of any examples of mass-market paperbacks that have been presented using anything other than H-and-J. But I can come up with many examples of books produced by discerning professionals who deeply value the way their words are presented to the world that eschew H-and-J, namely: The Visual Display of Quantitative Information, Graphic Design Referenced, Designing with Type. In addition, Grid Systems uses ragged-right for marginal notes; indeed Knuth discusses at some length the futility of trying to set justified text in a narrow block. And, because I know you’ve been waiting for me to bring down the hammer, Robert Bringhurst’s The Elements of Typographic Style (pp. 27-28) recommends ragged right when using sans-serif or monospaced fonts, or just whenever the situation demands it.

Boom! We are done. There is no retaliation in the face of Bringhurst.

The point is that although you may hear that hyphenated-and-justified text is what the professionals always use, the claim is belied by what you see when you watch the professionals at work.

As for the third claim, that H-and-J makes for the most readable or legible text by some sort of empirical measure, this is also not necessarily true, and to see this you only have to acquaint yourself with some facts. Specifically, Zachrisson finds no appreciable difference in the reading speed or comprehension of subjects when given ragged-right (evenly spaced) versus hyphenated-and-justified text. In fact, his experiments showed that the least-proficient readers had an easier time with ragged-right text, and he refers to a master’s thesis by S.P. Powers which found that subjects read ragged-right text faster than H-and-J text. Granted, I am quoting results here from just one book; although I had trouble tracking down other sources of specific information on the web, I imagine that a lot more research has been done in this field, that the questions involved are very complicated, etc. But the fact remains: a blanket statement like “hyphenated-and-justified text is more legible” is unconvincing without strong empirical evidence.

So what’s the answer to the aesthetic question, “Should I hyphenate and justify text?” The web typography gurus I’ve been reading want you to believe the answer is “Yes, always yes,” and I’ve endeavored to convince you otherwise, that the answer is “Maybe. Different situations demand different solutions.” The facts of the matter indicate that this decision is purely subjective, that neither style of typesetting is better or worse than the other in any sort of measurable way. And that’s the end of the important part of this article.

Specific comments

But I’m still talking! Subservient to the aesthetic question comes the technical question, “Is it possible for me to hyphenate and justify text in the medium I’m using?” And, in this magical future, if you are publishing on the web the answer is now yes. But the solutions aren’t (yet) worth the price you pay.

Specifically, you can tell a web browser to justify just by flipping a CSS switch, and Fink points out the Hyphenator.js library as a solution for getting good hyphenation. “Good” apparently means “hyphenation the same way that TeX does it,” and that is an unfortunately misleading way of framing the whole issue, because, as I pointed out at the beginning, hyphenation and justification are two different things, and Hyphenator.js tackles only one side of the equation. Giving a web browser five hundred million additional potential line breaks does no good if the browser doesn’t use the options it had in the first place intelligently, and that is where the problem lies. Web browsers today generally do not have good justification algorithms; at the time of this writing it was very easy to see this by going to the Hyphenator.js example page and just checking it out for yourself in Firefox, Safari, Chrome, etc. They all did sort of funny things with the word spacings; some browsers didn’t even space words evenly across the line. Without a good justification engine, it doesn’t matter how many potential line breaks you provide to a web browser; it will still produce unappealing text.
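To illustrate why extra break points alone don’t settle anything, here is a rough sketch of what a hyphenation pass contributes. My understanding is that libraries of this kind work by inserting invisible soft hyphens (U+00AD) at permissible break points; the tiny `BREAKS` table below is hand-written for illustration and stands in for real hyphenation patterns, and the function names are hypothetical.

```python
SOFT_HYPHEN = "\u00ad"  # invisible unless a line is actually broken there

# Hypothetical, hand-written break table; a real library derives break
# points from language-specific hyphenation patterns instead.
BREAKS = {
    "hyphenation": ["hy", "phen", "ation"],
    "justification": ["jus", "ti", "fi", "ca", "tion"],
}


def soften(word):
    """Return `word` with soft hyphens at its known break points."""
    parts = BREAKS.get(word.lower())
    if parts is None:
        return word  # unknown word: offer no extra break points
    pieces, start = [], 0
    for part in parts:
        # Slice the original word so its capitalization is preserved.
        pieces.append(word[start:start + len(part)])
        start += len(part)
    return SOFT_HYPHEN.join(pieces)


def add_soft_hyphens(text):
    """Annotate every known word in a space-separated string."""
    return " ".join(soften(word) for word in text.split(" "))


if __name__ == "__main__":
    print(repr(add_soft_hyphens("Hyphenation and justification differ")))
```

Notice that nothing here decides where any line ends. The output is the same text with more places where a break is allowed, and choosing among them is still entirely up to whatever does the justification.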

The way to address this issue is not with better hyphenation libraries; it’s with better justification algorithms. And in fact, there is already an implementation of the TeX line-breaking algorithm in JavaScript, and an example of this algorithm used in combination with Hyphenator.js. It’s clear immediately that this produces very nice results, but this program works by using the HTML5 canvas element to precisely position and draw each word, which means that copy/paste from any page rendered with Typeset is broken, and that more generally this approach Breaks the Web. The fact that Hyphenator.js and Typeset exist is great, and they’re certainly important steps in bringing better justified text to the web. But using them involves serious tradeoffs: copy/paste, text search, and a meaningful DOM are, you know, sort of important.
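For a sense of what a better line-breaking algorithm buys you, here is a much-simplified sketch of the “total fit” idea behind TeX’s approach, next to the greedy first-fit strategy. It is not the real algorithm: it assumes monospaced text, ignores hyphenation and stretchable spaces, assumes every word fits within the line width, and uses squared leftover space per line as a stand-in for TeX’s notion of badness. All function names are made up for this illustration.

```python
INF = float("inf")


def line_cost(words, width, is_last):
    """Squared leftover space on a line; the last line is free."""
    length = sum(len(w) for w in words) + len(words) - 1
    if length > width:
        return INF
    return 0 if is_last else (width - length) ** 2


def greedy_breaks(words, width):
    """First-fit: put each word on the current line if it still fits."""
    lines, current = [], []
    for word in words:
        if current and len(" ".join(current + [word])) > width:
            lines.append(current)
            current = [word]
        else:
            current.append(word)
    if current:
        lines.append(current)
    return lines


def optimal_breaks(words, width):
    """Choose breaks minimizing the total cost over the whole paragraph."""
    n = len(words)
    best = [0] + [INF] * n  # best[j]: minimum cost of setting words[:j]
    prev = [0] * (n + 1)    # prev[j]: start index of the line ending at j
    for j in range(1, n + 1):
        for i in range(j - 1, -1, -1):
            cost = line_cost(words[i:j], width, is_last=(j == n))
            if cost == INF:
                break  # the line only gets longer as i decreases
            if best[i] + cost < best[j]:
                best[j], prev[j] = best[i] + cost, i
    lines, j = [], n
    while j > 0:
        lines.append(words[prev[j]:j])
        j = prev[j]
    return lines[::-1]


def total_cost(lines, width):
    """Sum of line costs for a finished paragraph."""
    last = len(lines) - 1
    return sum(line_cost(l, width, k == last) for k, l in enumerate(lines))


if __name__ == "__main__":
    words = "aaa bb cc ddddd".split()
    for breaker in (greedy_breaks, optimal_breaks):
        lines = breaker(words, 6)
        print(breaker.__name__, lines, total_cost(lines, 6))
```

On this small input the greedy breaker fills the first line completely and then strands “cc” on a line by itself, while the total-fit version accepts a looser first line so that the second line comes out better. That is the same kind of look-ahead described earlier, where a choice on one line is made for the sake of lines further down.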

Eventually these issues will pass. One day web browsers will provide industry-leading H-and-J algorithms, high-quality mathematical typesetting, and ponies. But it hasn’t happened yet, and you should not make your users suffer now in anticipation of the future.

Finally, let me close by saying that referencing Wikipedia as a primary source will only lead to embarrassment.

References

Postscript

This whole topic is … ramified. Although I’ve tried to give a balanced argument based on the facts as I understand them, I’ve also turned up a whole lot of other material that I haven’t even had time to go through yet. In case you’re interested, here’s the list so far:

Written by Daniel Grady

March 27, 2011 at 16:32
