Distributed Proofreaders: What your work could look like if it was paginated…

The Dance of Death created by Distributed Proofreaders for Project Gutenberg, with the browser in Full Screen mode.

I’m indebted to Juliet Sutherland for commenting on this blog, and especially for pointing me towards the work being done on books for Project Gutenberg by Distributed Proofreaders.
As Juliet points out, their aims are not exactly the same as mine. They are trying to “make the best of the Web as it is today”, while I and some others argue that the way to make a lot of things better on the Web is to expand what you can do on it.

Readers of this blog will know I’m a proponent of being able to do adaptive pagination on the Web – paginated content whose layout adapts to fit the screen on which it’s being viewed.
People like Joe Clark (actually, I don’t think there are any other people like Joe) would have you think that I’m a heretic, who wants to “turn the Web into outmoded print layouts.”
That’s not what I’m advocating at all; what we have on the Web today should stay – but there are better alternatives for many types of “reading” content, and those should be added to what we already have, thus expanding the range of possibilities.
Then people can make up their minds what they prefer, instead of having to work within silly, outmoded constraints like “Web pages always have to scroll in a bottomless window.”
As I’ve said elsewhere, that particular constraint exists only because the software engineers building the first widely used Web browser, NCSA Mosaic, took the easy shortcut of displaying Web content in a bottomless scrolling window in order to avoid the harder layout problem of pagination. It’s become part of the fabric of the Web, and it’s high time it was questioned.
Distributed Proofreaders has done a great job on books – working within the constraints of the Web today. Juliet pointed me to “The Dance of Death”, a 16th-century book by Holbein, with fantastic woodcuts, as an example of how their work can adapt to different screens.
Here are some thoughts on opening up the book and examining it and its markup, and some ideas on where I’d like to see this type of content be able to go in the future.
None of this should be construed as a criticism of what DP has done (although I do have minor niggles like the use of “inches” and ‘feet’ marks instead of proper typographer’s quotes). This DP book has been proofread and set with great care and thought, and if it’s a typical example, then DP is to be congratulated on a job well done.
Since DP doesn’t specify typefaces or fonts in the books it does, at first all the text appeared in Times New Roman – the default font in most browsers for pages that don’t specify fonts.
Now, TNR was a great print face in its day. It’s not very nice on screen, because it has a small x-height. And it really does look old and tired. Part of that is no doubt because it has been the default font for documents in word-processing software for decades. It has, not to put too fine a point on it, been beaten to death. But that aside, it does look “old-fashioned” – and not in a nice way…
The first thing I did was change my default font in Internet Explorer 8 to Cambria. Instant improvement! Cambria is the best serif face for reading on screen, no question, and this book looks great in it. (I’d choose Calibri as my favorite sans serif.) Another way to do this would be to switch CSS stylesheets (which Internet Explorer now supports, since Beta 2 of IE8).
Incidentally, Juliet had the gripe that many of the problems DP faces were the result of trying to get pages to work with Internet Explorer. I hope that will become a “gripe of the past” now that IE8 has shipped using Web-standards rendering by default.
Next thing was to play around with browser width, by re-sizing my browser window. As Juliet points out, the layout adapts very well to changes in browser width, which would equate to different screen sizes.
However, on a larger, modern laptop display, there’s so much unused screen either side of the browser window that the “book” gets lost. There’s too much distraction either side – even if it’s only a large area of unused white screen.
And while the content adapts to width very well, it’s still a less-than-optimal scrolling read…

The Dance of Death in a narrower browser window – still a long way short of the immersive experience of reading a printed book.
Since Juliet sent me this link yesterday, I’ve been experimenting with different layouts to create an improved, paginated version of this book, just to show what it could look like. But this morning an email arrived from my colleague Mike Duggan in Ireland, with a really nice paginated layout.

Mike’s a visual guy, a great typographer, and tends to do his mockups in Windows Paint. So this is just a .jpg. But it’s easy to see how it could be created on the Web, using a CSS stylesheet plus multicolumn and hyphenation Javascripts.

Doesn’t it look much better than most content of this type we see on the Web? With a layout like this, in Full Screen mode, you could truly have an immersive book-reading experience.
Mike Duggan’s mockup of a paginated layout for The Dance of Death
What do we need to be able to do this on the Web? Well, the biggest obstacle is that today you’d have to paginate it manually, which is not only time-consuming, but means you have to decide upfront on a fixed size – and that’s terrible.

With AJAX-style client-side JavaScript, you could get the window size from the browser, and use that to calculate not only the optimum number of columns, but also the depth of the columns into which the content would flow.
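A minimal sketch of that calculation, in TypeScript, might look like the following. The function name, the 400px ideal column width, the 32px gutter and the 24px line-height are all assumptions made for illustration; in a real page the width and height would come from window.innerWidth and window.innerHeight.

```typescript
// Hypothetical helper: every name and default value here is an illustrative assumption.
interface ColumnLayout {
  columns: number;        // how many columns fit across the window
  columnWidthPx: number;  // resulting width of each column
  linesPerColumn: number; // column depth, in whole lines of body text
}

function layoutForWindow(
  widthPx: number,
  heightPx: number,
  idealColumnPx: number = 400, // assumed comfortable measure for body text
  gapPx: number = 32,          // assumed gutter between columns
  lineHeightPx: number = 24    // assumed body-text line-height
): ColumnLayout {
  // As many ideal-width columns as fit, but always at least one.
  const columns = Math.max(1, Math.floor((widthPx + gapPx) / (idealColumnPx + gapPx)));
  // Share the width left over after the gutters equally between the columns.
  const columnWidthPx = (widthPx - (columns - 1) * gapPx) / columns;
  // Depth is a whole number of lines, so every column ends on a full line.
  const linesPerColumn = Math.max(1, Math.floor(heightPx / lineHeightPx));
  return { columns, columnWidthPx, linesPerColumn };
}

// Example: a 1280 x 720 window, using the assumed defaults.
const layout = layoutForWindow(1280, 720);
```

Re-running the same function whenever the window is resized is what would make the layout adaptive rather than fixed.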

You really don’t want to be dependent on Javascripts for multi-column and hyphenation. I’ve talked about the problems of a DOM-based multicolumn .js elsewhere in this blog. Those functions are much better done with the layout and composition engines of the browsers – which are much more sophisticated.
You’d like to be able to just create a Master page, then hand off the actual setting to the browser – whatever browser the reader is using – and have it create as many “new pages” as it needs to place all the content.
You’d want graphics to be scaled to fit a grid determined by the AJAX calculation. That grid would be based on multiples of body-text line-height – and would change if the reader, for example, wanted or needed to read in larger type.
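One way to picture the grid idea is a small TypeScript sketch. The function name and its parameters are hypothetical: it scales a graphic to the column width, then trims its height down to a whole number of text lines while keeping the aspect ratio.

```typescript
// Hypothetical helper: names and parameters are assumptions for illustration.
interface FittedImage {
  widthPx: number;
  heightPx: number;
  lines: number; // how many lines of body text the image occupies
}

function fitImageToGrid(
  naturalWidthPx: number,
  naturalHeightPx: number,
  columnWidthPx: number,
  lineHeightPx: number
): FittedImage {
  // Height the image would have if it simply filled the column width.
  const scaledHeight = (naturalHeightPx * columnWidthPx) / naturalWidthPx;
  // Round down to a whole number of lines so the text below stays on the grid.
  const lines = Math.max(1, Math.floor(scaledHeight / lineHeightPx));
  const heightPx = lines * lineHeightPx;
  // Recompute the width from the snapped height to preserve the aspect ratio.
  const widthPx = (naturalWidthPx * heightPx) / naturalHeightPx;
  return { widthPx, heightPx, lines };
}

// The same woodcut on a 24px grid, and again after the reader enlarges the type to a 30px grid.
const normalType = fitImageToGrid(800, 600, 400, 24);
const largeType = fitImageToGrid(800, 600, 400, 30);
```

Because the line-height is an input, a reader who chooses larger type gets a different grid and the graphic is refitted to it.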
There would be common elements for every page. You’d need to increment page numbers. The “virtual pages” created would need to be temporarily stored in a cache somewhere so the reader could navigate the “book”.
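As a rough TypeScript sketch of those three needs (common elements, incrementing page numbers and a cache of virtual pages), consider the following. It assumes the text has already been broken into lines, which is the hard part a browser's own composition engine would do; the names VirtualPage, paginate and createBook are hypothetical.

```typescript
// Hypothetical types and functions, for illustration only.
interface VirtualPage {
  pageNumber: number;  // 1-based, incremented for each page created
  runningHead: string; // an element common to every page
  lines: string[];     // the slice of content set on this page
}

function paginate(lines: string[], linesPerPage: number, runningHead: string): VirtualPage[] {
  if (linesPerPage < 1) {
    throw new RangeError("linesPerPage must be at least 1");
  }
  const pages: VirtualPage[] = [];
  // Create as many pages as are needed to place all the content.
  for (let start = 0; start < lines.length; start += linesPerPage) {
    pages.push({
      pageNumber: pages.length + 1,
      runningHead,
      lines: lines.slice(start, start + linesPerPage),
    });
  }
  return pages;
}

// Cache the pages per page depth, so paging back and forth does not re-paginate.
function createBook(lines: string[], runningHead: string) {
  const cache = new Map<number, VirtualPage[]>();
  return {
    pages(linesPerPage: number): VirtualPage[] {
      let pages = cache.get(linesPerPage);
      if (pages === undefined) {
        pages = paginate(lines, linesPerPage, runningHead);
        cache.set(linesPerPage, pages);
      }
      return pages;
    },
  };
}

// Seven placeholder lines set three to a page.
const demoLines = Array.from({ length: 7 }, (_, i) => `line ${i + 1}`);
const book = createBook(demoLines, "The Dance of Death");
const demoPages = book.pages(3);
```

Keying the cache on page depth means a resize or a change of type size produces a fresh set of pages, while returning to an earlier size reuses the stored ones.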
There are all kinds of details like this which would need to be thought through. And there would certainly be HTML and CSS standards developments which could help.
There are issues like the one pointed out by Richard Fink, of how you index and reference in a book which has different page numbers for different readers. (My suggestion for this is to take a leaf out of the Bible, and refer to passages by Chapter and Verse. Amazon’s Kindle uses “Location numbers”, which works but is pretty ugly).
That’s why this kind of innovation shouldn’t be done in any one browser. It needs to be a collaboration involving them all. It has to become an extension to existing Web standards.
For example, the CSS3 standard for multiple columns allows you to specify columns either as an integer number, in which case the column width floats, or as a column width, in which case the number of columns floats. It would be nice to be able to specify upper and lower limits for column width, so you’d get smooth reflow as window size changed.
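To make the difference concrete, here is a TypeScript sketch of the two existing modes as I understand the multi-column sizing rule, followed by the suggested limits. The third function is a hypothetical extension, not anything CSS defines, and all the names are illustrative.

```typescript
interface Columns {
  count: number;
  widthPx: number;
}

// Width left for each of `count` columns once the gaps between them are taken out.
function widthFor(availablePx: number, gapPx: number, count: number): number {
  return Math.max(0, (availablePx + gapPx) / count - gapPx);
}

// Fixed count: the column width floats.
function columnsFromCount(availablePx: number, gapPx: number, count: number): Columns {
  return { count, widthPx: widthFor(availablePx, gapPx, count) };
}

// Fixed (ideal) width: the number of columns floats.
function columnsFromWidth(availablePx: number, gapPx: number, idealPx: number): Columns {
  const count = Math.max(1, Math.floor((availablePx + gapPx) / (idealPx + gapPx)));
  return { count, widthPx: widthFor(availablePx, gapPx, count) };
}

// Hypothetical extension (NOT part of CSS): keep the width between minPx and maxPx.
// When no count satisfies both limits, this sketch honors the minimum and lets
// the columns run wider than maxPx.
function columnsWithinLimits(
  availablePx: number,
  gapPx: number,
  minPx: number,
  maxPx: number
): Columns {
  const fewest = Math.max(1, Math.ceil((availablePx + gapPx) / (maxPx + gapPx)));
  const most = Math.max(1, Math.floor((availablePx + gapPx) / (minPx + gapPx)));
  const count = Math.min(fewest, most);
  return { count, widthPx: widthFor(availablePx, gapPx, count) };
}

// A 900px window with 32px gaps and columns limited to 300-450px.
const limited = columnsWithinLimits(900, 32, 300, 450);
```

With both limits available, the column count changes only when a limit is reached, and the width varies inside the band in between.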
I know my own personal ideal (and that of Mike Duggan and another expert colleague, Geraldine Wade) is to use my browser in Full Screen mode. But that may not be what everyone wants, and they should be free to choose what they prefer. If our way works better for them, that’s what they’ll end up using.
There’s a lot of work needed to make this happen. But as far back as 1984, I saw an unknown software application – running at that time only on the Apple Macintosh – which would let you create a Master Page grid for any size of page, then allow you to autoflow content into it. The application would automatically create as many new pages as it needed to place all your content.
That little application was called PageMaker, and it created an entirely new market for software, fonts, printers etc. called DeskTop Publishing (it was the era of the capital letter in the middle of words).
Is anyone really trying to tell me that the Web can’t be made capable of doing what was easily possible a quarter of a century ago? We need it to become a publishing platform capable of the highest-quality layout and typography people can imagine.
I have an OnScreen Reading discussion group over on FaceBook where anyone is welcome to get together and talk about all of this. Ultimately, though, I’d like to see this discussion take place under the auspices of the W3C or the CSS working group. Maybe it’s already happening, and I just don’t know about it…
Word of warning: I’m not interested in flame wars over on FaceBook. People do need to be able to argue their point of view, if it’s done in the spirit of moving forward – but not as adversaries, scoring points. I’m keeping myself as the only admin on that group, so I can throw off anyone I judge is getting out of hand.

18 thoughts on “Distributed Proofreaders: What your work could look like if it was paginated…”

  1. Joe Clark

    Yes, I really am telling you that. Nobody’s buying what you’re selling, and it isn’t because your way is better. Your way is from another medium and isn’t Web-native. PageMaker was used to create printed documents, a model you will apparently defend to the death. But, to paraphrase the Klingon proverb, is today really a good day to die, Bill?

  2. Bill Hill

    Actually I think Sitting Bull said “Today is a good day to die!” before Little Big Horn – long before the Klingons. And he did OK…

    What’s “Web-native”, anyway? Sixteen years ago, many things we take for granted today were not “Web-native”.

    Why does CSS3 allow you to define multiple (equal) columns – and specify the weight and color of the rules you can put between them? Outmoded method from another medium? Surely that’s not “Web-native”, either?

    Really, the phrase “Web-native” is absurd.

  3. Richard Fink

    GOING NATIVE

    While it’s certainly true that the indigenous inhabitants of the web (web-natives) did not possess CSS or cross-browser javascript libraries, you surely must admit that these simple hunter-gatherers, using only such crude tools as could be readily fashioned – tables and font tags, mainly – showed astonishing ingenuity in moving the display of information from a cave wall model to networked CRT screen.

    And really, people like Eric Meyer and other standardistas who rail about the lack of layout capabilities within CSS should get on an ice floe and just drift away. Along with Jeff Bezos of Amazon who, unlike Bill, is actually selling something (the Kindle) modeled on, ugh, printed documents.

    Disgusting.

    If it was good enough for grandpa, it’s good enough for me. Don’t we already have everything we need? Where will it all end?

    Why do we have to be subjected to progressive drivel like this:

    Wanted Layout System
    CSS3 Feedback: Layout

    I’m totally with Joe on this, it’s time to say “Stop” to the spirit of unfettered inquiry and go native. I’m ready to march. What else shouldn’t we be doing, Joe?

  4. Richard Fink

    To get back on topic:

    What you are describing is one heckuva smart browser.

    The problem we’ve got today – and it needs to be addressed with today’s browsers, not tomorrow’s – is the incredible amount of knowledge that exists only in print.

    And so far, there is no readily available and practical means of adapting that material, en masse, to the browser as a medium.

    (And it’s the browser that’s the medium, not some amorphous thing called “the web”.)

    A few days ago I was reading a book titled Visual Explanations by Edward Tufte and trying to imagine how on earth I would go about adapting it to present effectively within a browser window given the state of technology today.

    Tough one. But I would settle for a decently type-set HTML version of “Pride And Prejudice” sans any illustrations for now but we don’t even have that. (In PDF, yeah.)

    And then again, maybe we should just let it all sit in the libraries and go get a Margarita. What the hell, tomorrow’s another day.

  5. Bill Hill

    @ Richard

    You’re absolutely right, we need a smarter, more typographically-aware browser.

    What we have today grew organically from very humble formatting beginnings.

    I remember the first Web browser I saw. I think it was in 1993, but I could be mistaken. Times New Roman for all text, goodnight, that’s all she wrote…

    We’ve come a long way since then. But when you get a world type expert like Hermann Zapf demonstrating how no-one, even in print – using hot-metal, or computer typesetting – has yet achieved the level of setting quality achieved by Johannes Gutenberg, you realize we still have a long way to go.

    (Gutenberg used many times more ligatures – letter combinations – than we have available today in order to achieve what he did in the 42-line Bible in about the year 1455.)

    But you know, it’s not all about the browser – there’s already a proposal for grid-based layout that has been presented to CSS, for example. Multi-column and drop caps have been defined (not very powerfully in my view, and in the view of others to whom I’ve spoken).

    HTML and CSS have yet to bring in real typographic experts like Robert Bringhurst, whose book “The Elements of Typographic Style” is pretty much the Bible for type folks. If you don’t have that book, you should. It’s in my list of recommended books on this blog. Just click to buy it on Amazon.

    The caching I spoke of will be a requirement anyway – if we ever want to be able to access content on the Web, but read it offline.

    There are millions of office documents produced every day. It’s time we stopped printing those out.

    There are millions of books – in hundreds of languages – and the information in them needs to survive.

  6. bowerbird

    bill-

    i’ve done a lot of work on this. my e-mail address is bowerbird@aol.com if you want to chat… i’d prefer to dialog publicly, but facebook gives me the creeps…

    -bowerbird

    p.s. distributed proofreaders does a good deed, but it also has a _lousy_ workflow which essentially wastes a good deal of the time and energy that its dedicated volunteers donate to it, and is hostile when that’s proven.

  7. Richard Fink

    Re: Bringhurst

    Don’t really see the need for “experts” on this. To paraphrase Duke Ellington, “If it looks good it IS good.”

    And we’ve got Richard Rutter and Steve Marshall doing a bang-up job of translating those basic principles into browser-specific examples on The Elements of Typographic Style Applied To The Web.

    And besides, the crux of the issue (IMHO) is in Chapter 9 (9.4.1 on page 190 in my softcover copy) which contrasts 5 different versions of the same paragraph, same typeface, but with 5 different variations of justification using a variety of techniques.

    Until we’ve got THAT kind of control in the browser, there’s no sense in getting too picky.

    BTW – Jon Tan has a nice roundup of basic paragraphing (with dropcaps, too) here:

    12 Examples Of Paragraph Typography

  8. Richard Fink

    @bill

    I know it’s a bit of a pain, but if you reference a previous post – like the one by Juliet Sutherland – including the permalink URI would be greatly appreciated.

    @bowerbird

    I would love to see the work you’ve done. Presuming that you mean type-setting books for display within a browser.

    Any chance of your posting some links to your work?

    @Juliet Sutherland

    I would like to know, more specifically, what problems proof-readers are having with IE as opposed to other browsers.

    @bill, again

    Chrome 2 Beta has a full-screen mode (F11). Leaving Safari as the odd man out.

    @all

    Having downloaded some text-only stuff from Distributed Proofreaders, and from Project Gutenberg as well, I find the text formatting inconsistent.

    For example, one book I downloaded had each line of text formatted as an individual paragraph (that is, a line of text followed by carriage return/line feed – showing up as a pilcrow in my text editor).

    Are there any rules or standards in effect, or is everybody winging it? Any other folks experiencing the same thing?

  9. Richard Fink

    @Bill

    Chrome ver (is what my ‘About’ box says)

    Downloaded it labeled as Chrome 2 Beta originally.

    (It might have updated behind my back since then – haven’t checked my settings. Google’s not bashful about stuff like that, for sure.)

    On Windows XP.

  10. Julien Couvreur

    This is a great discussion. I appreciate the vision. I may not subscribe to it 100% yet, but definitely appreciate the stretch of imagination, kicking old habits and conventions. It is certainly the case that the web is what we make of it, there is no such thing as “web native”.

    Re-thinking the graphical design of content on the web is timely. There is a definite shift in usability recently: focusing on the user’s content and tasks, removing administrative junk (as Aza Raskin puts it), making the display and interaction efficient and suited to the user’s brain.

  11. Richard Fink

    PRINT-TO-SCREEN CONVERSION

    I’ve skimmed through some of Distributed Proofreaders’ Formatting Guidelines and am absolutely fascinated. Must look more closely at a later date.

    Basically, it’s a spec of sorts, a “standard”, for conversion.

    FULL SCREEN VS “CHROMELESS BROWSER”

    It’s a Windows-centric thing, but are you aware of HTAs (HTML Applications)?

    If you’re not, imagine Internet Explorer without its chrome (no built-in menus, just a window which can be any size and can even be borderless). Plus, it acts like a regular application – with both the ability to pull documents and data from the web AND read-write to the local file system.

    HTAs can be packaged up and installed just like applications, too.

    So, what you have is very much like what you get with, say, the NYTimes Reader, but it uses standards-based markup instead of proprietary WPF stuff that requires a specialized back-end CMS.

    I’m thinking it might provide a nice platform as an e-reader for the kind of stuff we’re talking here about creating.

    The content would still look OK in a standard browser (any browser, not necessarily IE), but the user gets more perks by using the HTA app and the developer can use his/her regular palette of tools, HTML, JavaScript, CSS, etc…

    And one major perk – like the NYTimes Reader – is that the user can download content, and then not necessarily need an Internet connection to view it.

    I’ve worked with HTAs a lot, real handy.

  12. Richard Fink

    FULL SCREEN VS “CHROMELESS BROWSER” PART II

    ADOBE AIR – I’ve been looking closer at this. Seems it’s very similar to HTA but has the advantage of being cross-platform. Windows. Mac. Linux.

    “Its strength is that web developers can use existing web technologies, (X)HTML, CSS, Javascript, Flash and PDF to create applications that run on the desktop. AIR can work in tandem with your existing web sites and web applications and can be built by web developers using the skills they already have. In the cases where AIR will be working in tandem with an existing web site you have much of the code that you need developed already.”

    In other words, if you want to read the book online, fine. Open any browser and go to it. But then, for offline reading, or just for some of the added advantages that can be had by running a full-blown application, there’s a desktop client available, too.

    Installing the application imbues trust, giving more control over the appropriate window size – among other things – to the author.

    As an example – there’s an AIR app called FaceDesk by Robert Nyman that opens FaceBook in a chromeless window.

    AIR uses WebKit as the browser component.

    [Historical note: The idea of a web page as desktop app isn’t at all new. IBM developed (but dropped) something similar very early on in the Web’s existence.

    Maybe its time has finally come. I’ve also read about Microsoft’s efforts with Gazelle, which goes one step further and turns the browser into an OS, of sorts.]

    It never hurts to scout the horizon…

  13. Bill Hill

    @Richard:

    Does the fact it’s a proprietary format not put you off?

    I want to be able to do this in completely Web-standards HTML and CSS, although I accept that AJAX will almost certainly also be needed.

  14. bowerbird

    richard-

    yes, i’ll post links as dialog continues…

    and i agree with you on the offline apps. they don’t even need to be like air/flash. any program that can download content from the net could be used for e-books. (it should also be able to upload content too – like comments, annotations, etc.)

    of course, those e-books should also be viewable through any web-browser, so people who can’t download and run an offline app can still access the e-book, but it’s ok (in my mind) if that’s seen as an only-if-absolutely-necessary option, whereas the offline-app experience is significantly richer in its presentation…

    -bowerbird

  15. Richard Fink

    I may not be explaining this clearly enough. I think you’re confusing being OS-specific with being non-standards based.

    Nothing illuminates like a good example so I did some digging today into my archives. I’ll grab a couple of pages of your Mabinogion book or something, package it with an HTA, and post a link to download a working sample. Then you’ll see much more easily what I’m talking about – which is how you can create a “Chromeless Browser” that doesn’t necessarily depend upon a server connection and doesn’t have to operate in a security sandbox that prevents unfettered communication with the OS filing system.

    Rendering standards-compliant pages (which, I agree, is what we should be creating) within such a chromeless “browser/reader”, let’s call it, overcomes some significant limitations. It’s not an either/or thing. It’s more of a “progressive enhancement” that you should keep in mind as you go forward. In this arrangement, the local machine substitutes for the server, and you can side-step some nasty limitations posed by “regular” browsers.

    The thought was triggered by our exchange over Full-Screen mode and the need to free up screen real-estate so that there’s enough room for a full page to display well enough to create an immersive reading experience.

    When I look at the comp by your colleague Mike Duggan of the “Dance Of Death”, I see it as damn-near irreducible without loss of effect.

    The minute you chop it up – reflow the text, re-do the proportions and, especially, the relationship between the text and the graphic – a lot is lost.

    Now, I have a perspective on this that most people don’t in that I’ve done desktop support and I used to teach newbie computer users and I’ve seen firsthand what we’re up against. The average user is a complete and total creature of habit. Asking them to do anything outside the norm of what they usually do is futile. And the next time they come back to your site, they won’t remember, anyway. (No condescension here, most people don’t live, eat, and breathe this stuff like you and I do.) Yes, you could have a pop-up that says, “Best viewed Full-Screen – Press F11” but I guarantee you, many people will just ignore it and if the content flows below the boundaries of the window, they’ll accept it as normal and scroll, as usual! As inane as that seems.

    On a wide-screen laptop, after you deduct space for the title bar, the standard menu bars, the damned toolbars (MSN, Yahoo, Google, whatever), what you’re left with is a narrow sliver. (We are not being helped here, either, by the trend towards “landscape”, wide-screen displays while, for the most part, we are trying to present layouts that were conceived originally in “portrait”.)

    The deeper you go into this, the more you will find regular browsers wanting.

    For example:

    For security reasons, you cannot specify the top/left position and overall height and width of the browser window using javascript. The methods do exist – moveTo() and resizeTo() – but they are de-activated except for child pop-up windows.

    Further, there is no javascript property to tell you whether the browser is even running Full-Screen. (WPF has it, but not IE.)

    All browsers are OS specific. Internet Explorer is for Windows. Safari is for Mac, with a special build for Windows. FireFox has to create separate builds for Windows, Mac, and Linux. This is true even though these browsers are standards aware and should display standards-compliant web pages in pretty much the same way with little variation. Accommodating different OS’s is where it seems Adobe AIR might fit in.

    Now, Adobe’s eReader, Microsoft’s eReader – the stuff with the DRM, now THAT’s proprietary. The file formats are proprietary – and that, I agree, we should avoid like the devil, dancin’ or not.

    Clear, confusing? – I’ll post my example, soon.

    @bowerbird

    If you show me yours, I’ll show you mine! Bill’s already seen some very quick and dirty preliminary stuff I’ve done.

    Thanks – looking forward to seeing your work.

  16. bowerbird

    richard-

    i have a bunch of demo programs, for various stages of the workflow. what in particular would you like to see? and where can we talk – publicly – in a place that will be more accommodating than these comment boxes? i would prefer a wiki, so we can build something that cumulates, instead of disappears.

    i’m at bowerbird at aol dot com…

    -bowerbird

  17. Richard Fink

    @bowerbird

    Will email you. Yeah, these comment boxes are murder. Never mind readability – maybe we should be focusing on typeability!

    @all

    In the Wall Street Journal online:

    How The eBook Will Change The Way We Read And Write

    Author Steven Johnson makes some good points.

