I've been having some fun lately doing volunteer proofreading for Project Gutenberg through Distributed Proofreaders.
Project Gutenberg makes public-domain text available online as e-books. There are at least 19,000 books available as of this writing. If there's an old book you've been meaning to read, and you suspect it's in the public domain, it's always worthwhile checking the online catalog to see if it's available. (For example, I just searched for The Confessions of Saint Augustine, Dostoyevsky's Crime and Punishment, Whitman's poetry volume Leaves of Grass, Charlotte Bronte's Jane Eyre, and Orthodoxy by G. K. Chesterton and found all of them. Two that I searched for, Newton's Opticks and Andre Gide's La symphonie pastorale, aren't there yet.)
Distributed Proofreaders, aka DP, has an in-house wiki with all the information you might want about it. The article "Getting Started" describes its mission:
We rescue out-of-copyright (and frequently out-of-print) books and convert them to "electronic texts" that can be read by people on most computers or on hand-held devices such as PDAs or e-text readers. The books we choose to work on are ordinary books from all over the world; some are classics, some are not. Every one of them was interesting enough that someone — or many someones, like you! — invested a great deal of time scanning, checking and double-checking the text and illustrations before sending it out to the world via Project Gutenberg.
It's "Distributed" because the work of proofreading is spread out amongst our volunteers: anyone can do as much or as little as they please. This site provides a web-based method of easing the work associated with accurately proofreading and formatting Public Domain books. By breaking the work into individual pages many proofreaders can be working on the same book at the same time. This significantly speeds up the proofreading/e-book creation process.
Having worked on four or five different projects now, I highly recommend DP as a volunteer hobby.
All the annoying bits have been automated. After a volunteer logs on, he sees a list of links to projects in various stages. Links to projects on which he's unqualified to work are automatically deactivated until he meets the requirements (e.g., minimum numbers of pages completed, passing of proofreading quizzes). Some projects are expressly reserved for beginners.
After the volunteer clicks on a suitable project, a multiframe window opens. One frame shows an image of a page scanned from the book; another holds the corresponding ASCII text. A third contains dropdown menus, buttons, and links.
The volunteer checks and corrects the ASCII text, which starts as the raw output of optical character recognition (OCR) software. When the page is done, a single click saves it, sends it to the next "round," and retrieves another.
Each page passes through several rounds before the whole project is submitted to Project Gutenberg. A beginner can work only in round "P1," the first round of proofreading, in which he endeavors to catch the OCR software's mistakes (called "scannos" in an analogous if imprecise formation from "typos"). As a volunteer accumulates experience, he can move to round "P2" and then to the formatting rounds "F1" and "F2," where italics, paragraph indents, and other fine-tuning are added. There are also post-processing rounds, and some special rounds, too, e.g., for material that can't be processed with the OCR software and has to be typed in. Seasoned volunteers can serve as project managers.
And, of course, anyone can provide content -- all you need is an old book (check first to make sure it's not copyrighted!) and a scanner.
Some of the material is technical in nature and needs to be typeset with LaTeX, which is an area of special expertise of mine. I think I will enjoy working on that, later on. But alas, I am still a beginner, and need to accumulate more pages!
Meanwhile, I'm having fun doing a page here and a page there while I surf or blog. I've worked on a couple of children's books, a literature textbook in French, and a volume of poetry. Incidentally, you don't have to speak a language to proofread it in the earliest rounds -- I could work on books in Tagalog or Hmong if I wanted to -- because all you need to do is make sure the string of ASCII characters matches the text in the scanned image.
It strikes me as a worthwhile volunteer project for older kids and teens, especially the ones who might be interested in careers in information technology, editing, or whatever "library science" will be called in coming years. Such work could also be incorporated into a foreign-language curriculum.