Scanning books into e-books gets cheaper, but accessing them remains the problem

By Tim Conneally
Published 15 years ago

Google's efforts with the six-year-old Google Books project have yielded 15 million scanned books, a new cross-platform e-bookstore, and a temporary copyright shield that lets Google sell "orphaned" works*, but the task of scanning every book cannot be left solely to Google and its partners.

Earlier this year, Google estimated that 129,864,880 books were in existence. At Google's current pace of 1,000 pages per hour per scanner in use, it could take over 40 years to scan that many books. What's more, Google is only counting the media it defines as a "book," not the countless other paper media that make a library such a valuable resource.

Library scanning projects are not just about books and other material that falls under the purview of publishers, said Nick Warnock, president of book-digitization company Atiz. "It's census records, old real estate transactions, deed registries...I can't even begin to tell you how many of these exist. All of this stuff has been written down for hundreds of years and could potentially be left behind."

Most works of this type are in the possession of small community libraries and private collectors, who may lack the reach or capital to become partners in the Google Books project and let Google do their scanning for them.

While it is entirely possible to digitize your own library independently, two major obstacles stand in the way for many: the cost of the project and its potential copyright pitfalls.

"At current funding levels, I think a lot of public libraries would claim that they have a hard enough time keeping up with the tasks and needs of today much less a major initiative for tomorrow," said Aaron Krebeck, IT Manager of Charles County (MD) Public Libraries.

Since funding for digitization projects can be so scarce, it's important to find a cost-effective, yet high-quality and OCR-friendly way to scan and encode this rare content. Fortunately, book scans can be captured with consumer DSLR cameras.

"The three most ambitious book scanning projects (the Internet Archive, Google Books, and Project Gutenberg) are all scanning with digital cameras," said Warnock. "However, the setups that Google uses cost between $30,000 and $200,000, and something like that is not going to fit in the budget of a rural library."

Warnock hopes these are the people to whom Atiz can appeal. Its BookDrive scanners can capture 700 pages an hour with two Canon DSLR cameras. They cost around $10,000 with the associated software and do not incur a maintenance charge, since all of the sensitive components are off-the-shelf.

Libraries can go even cheaper and build a sophisticated book scanner for around $300 that can scan about 400 pages an hour. The scans can then be finished with a single piece of free software, which sadly has an uncertain future because it has been maintained by just one developer.
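To put those throughput figures side by side, here is a rough Python sketch that estimates how many working days each setup would need for a collection. Only the 700 and 400 pages-per-hour rates come from the figures above; the collection size and the eight-hour working day are hypothetical assumptions for illustration.

```python
import math

# Throughput figures quoted above (pages per hour).
# The collection size and shift length used below are hypothetical.
RATES = {"Atiz BookDrive (~$10,000)": 700, "DIY scanner (~$300)": 400}


def scan_days(pages: int, pages_per_hour: int, hours_per_day: float = 8.0) -> int:
    """Return the whole number of working days needed to scan `pages`, rounding up."""
    hours = pages / pages_per_hour
    return math.ceil(hours / hours_per_day)


if __name__ == "__main__":
    collection_pages = 100_000  # hypothetical: e.g. 400 volumes of ~250 pages each
    for name, rate in RATES.items():
        print(f"{name}: {scan_days(collection_pages, rate)} working days")
```

Under these assumptions the cheaper rig roughly doubles the calendar time rather than multiplying it by thirty, which is why the low-cost route can make sense for a small library.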

Fortunately, Atiz also sells its book-scanning software, which syncs any dual Canon camera setup, and as image sensors get cheaper, so will all of these solutions.

"You can do quick, cheap scans for peace of mind," Warnock said. "But if you are thinking about preservation for 200 to 300 years, you're going to shoot in RAW format and convert it to multi-page TIFF."

Ultimately, though, this is only half of the problem. The more aggravating issue for library administrators lies in U.S. Copyright Law.

"The library bible for copyright is Title 17 Section 108 of the US Code," said Krebeck. "Unfortunately this was written when mutton chops were de rigueur and probably last updated when Xerox units were the size of tractor trailers."

"These days it seems like the only people who pay any attention to copyright law are librarians and lawyers," Krebeck continued. "Since we are a relatively easy target, we have to play by the rules. Unfortunately, it's really hard to tell exactly what we're allowed to store digitally and even harder to tell what we're allowed to make available digitally. So we err on the side of caution."

The scope of Charles County's digitization efforts, therefore, has been limited to the sort of regional history that is still largely missing from projects like Google Books.

"For example, we digitized a donated collection of photographs depicting life in the county in the first half of the 20th century," Krebeck said. "It was free and clear of any copyright concerns and local history is always something that can generate a fair amount of interest. So it met that magic blend of legally allowable, relatively easy to do (because of the small collection size), and worth the investment of staff time. It doesn't hurt that these are the kinds of digitization projects that are most likely to get grant funding."

The content Krebeck and his staff scanned can be found through the Enoch Pratt Free Library's Maryland Digital Cultural Heritage project, which is funded by a grant under the Library Services and Technology Act.

So for independent historians, rare book collectors, and small-town librarians, book digitization is only getting easier. Unfortunately, for those of us who want easy access to much of this digitized content, it appears to be a different story altogether.

*Books that are out of print but still under active copyright with no clear copyright holder. For example, a book published after 1923 whose publishing house either went out of business or was acquired by another publisher.
