Scanning books into e-books gets cheaper, but accessing them remains the problem
Google's efforts with the six-year-old Google Books project have yielded 15 million scanned books, a new cross-platform e-bookstore, and a temporary copyright shield that lets Google sell "orphaned" works*. But the task of scanning every book cannot be left solely to Google and its partners.
Earlier this year, Google estimated that 129,864,880 books were in existence. At Google's current pace of 1,000 pages per hour per scanner in use, it could take over 40 years to scan that many books. What's more, Google is only counting the media it defines as a "book," and not the countless other paper media that make a library such a valuable resource.
Library scanning projects are not just about books and things that fall under the purview of publishers, said Nick Warnock, president of book digitization solution company Atiz. "It's census records, old real estate transactions, deed registries...I can't even begin to tell you how many of these exist. All of this stuff has been written down for hundreds of years and could potentially be left behind."
Most works of this type are in the possession of small community libraries and private collectors, who may not have the reach or capital to become a partner in the Google Books project and let Google do their scanning for them.
While it is completely possible to independently digitize your own library, for many there are two huge factors that prohibit it: the cost of the project, and its potential copyright pitfalls.
"At current funding levels, I think a lot of public libraries would claim that they have a hard enough time keeping up with the tasks and needs of today much less a major initiative for tomorrow," said Aaron Krebeck, IT Manager of Charles County (MD) Public Libraries.
Since funds for digitization projects can be so light, it's important to find a cost-effective, yet high-quality and OCR-friendly way to scan and encode this rare content. Fortunately, book scans can be grabbed with consumer DSLR cameras.
"The three most ambitious book scanning projects --The Internet Archive, Google Books, and Project Gutenberg-- are all scanning with digital cameras," Said Warnock. "However, the setups that Google uses cost between $30,000 and $200,000, and something like that is not going to fit in the budget of a rural library."
Warnock hopes these are the people to whom Atiz can appeal. Its BookDrive scanners can capture 700 pages an hour with two Canon DSLR cameras. They cost around $10,000 with the associated software, and do not incur a maintenance charge since all of the sensitive components are off-the-shelf.
Libraries can go even cheaper and build a sophisticated book scanner for around $300 that can scan about 400 pages an hour. The scans can then be finished with a single piece of free software, which sadly has an uncertain future because it has been maintained by a single developer.
Fortunately, Atiz also sells its book scanning software, which syncs dual Canon camera setups of any type, and as the cost of image sensors drops, so too will the cost of any of these solutions.
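To put those throughput figures in perspective, here is a rough back-of-the-envelope sketch in Python. The pages-per-hour and price figures are the ones quoted above; the collection size (2,000 volumes at 300 pages each) and the function names are purely hypothetical, chosen for illustration, and the estimate ignores setup time, page turning delays, and post-processing.

```python
# Throughput and approximate price quoted in the article for two setups.
RIGS = {
    "Atiz scanner": {"pages_per_hour": 700, "cost_usd": 10_000},
    "DIY scanner": {"pages_per_hour": 400, "cost_usd": 300},
}


def scan_hours(total_pages, pages_per_hour):
    """Hours of continuous scanning needed for total_pages."""
    return total_pages / pages_per_hour


def estimate(books, pages_per_book):
    """Return scanning hours per rig, rounded to one decimal place."""
    total_pages = books * pages_per_book
    return {
        name: round(scan_hours(total_pages, rig["pages_per_hour"]), 1)
        for name, rig in RIGS.items()
    }


if __name__ == "__main__":
    # Hypothetical collection: 2,000 volumes averaging 300 pages each.
    for name, hours in estimate(2000, 300).items():
        cost = RIGS[name]["cost_usd"]
        print(f"{name}: about {hours} hours of scanning, rig cost ~${cost:,}")
```

Under those assumed numbers, the cheaper rig needs roughly 1,500 hours of operator time against about 857 for the Atiz unit, so the real trade-off for a small library is staff hours versus up-front cost.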
"You can do quick, cheap scans for peace of mind," Warnock said. "But if you are thinking about preservation for 200 to 300 years, you're going to shoot in RAW format and convert it to multi-page TIFF."
Ultimately, though, this is only half of the problem. The more aggravating issue for library administrators lies in U.S. copyright law.
"The library bible for copyright is Title 17 Section 108 of the US Code," said Krebeck, "Unfortunately this was written when mutton chops were de rigeur and probably last updated when Xerox units were the size of tractor trailers."
"These days it seems like the only people who pay any attention to copyright law are librarians and lawyers," Krebeck continued. "Since we are a relatively easy target, we have to play by the rules. Unfortunately, it's really hard to tell exactly what we're allowed to store digitally and even harder to tell what we're allowed to make available digitally. So we err on the side of caution."
The scope of Charles County's digitization efforts, therefore, has been limited to the sort of regional history that is still largely missing from projects like Google Books.
"For example, We digitized a donated collection of photographs depicting life in the county in the first half of the 20th century," Krebeck said. "It was free and clear of any copyright concerns and local history is always something that can generate a fair amount of interest. So it met that magic blend of legally allowable, relatively easy to do (because of the small collection size), and worth the investment of staff time. It doesn't hurt that these are the kinds of digitization projects that are most likely to get grant funding."
The content Krebeck and his staff scanned can be found through The Enoch Pratt Free Library's Maryland Digital Cultural Heritage project, which is funded by a grant from the Library Services and Technology Act.
So for independent historians, rare book collectors, and small town librarians, the task of book digitization promises to only get easier. Unfortunately, for those of us who want easy access to much of this digitized content, it appears to be a different story altogether.
*Books that are out-of-print, but still under active copyright with no clear copyright holder. For example, a book published after 1923 whose publishing house either went out of business or was acquired by another publisher.