Academic libraries pave a new path away from Google

By Angela Gunn
Published 17 years ago

What's bigger than Google? The vision of librarians, according to the academic institutions banding together to create HathiTrust -- a "universal library" built in part on Google's scanning efforts.

HathiTrust (pronounced haw-TEE -- it's the Hindi word for elephant, that animal that famously lives long and never forgets) launched Tuesday. It's a project of the member universities of the Committee on Institutional Cooperation (CIC) and the University of California system. The CIC has been working with Google since last year to digitize books held in libraries at member schools; the UC system signed on with Google in 2006, and the University of Michigan's MBooks (now folded into HathiTrust) has been underway since the school announced its affiliation with the Google Books Library Project at that project's launch in 2004.

In fact, HathiTrust's initial store of content will be built on digital copies of the scans Google made for the Google Books Library Project. In the course of that endeavor, each partner library received a copy of whatever material they offered to the system for, as they say, "ingestion." Those copies were free for schools to use in any appropriate way. HathiTrust is thus able to start life with 78 terabytes (738.8 million pages, 1,713 tons, 25 miles) of content; about 20% of that is available to the general public.

If Google's done it already, why build HathiTrust? Google's library project is, as Google puts it, "an enhanced card catalog" for everybody's use. But librarians familiar with the formation of HathiTrust have expressed concern that the growing collection of full texts lacks the kind of professional curation that a research-class archive requires -- and that universities, not companies, are the better long-term caretakers of information and scholarship. In addition, some of the materials each university holds were never appropriate for Google scanning; those materials will eventually be digitized as needed and introduced directly to the HathiTrust system, with no version necessarily delivered to Google.

The entire HathiTrust project is licensed under a Creative Commons agreement, with provision made for the approximately 80% of material that still falls under copyright restrictions.

The project has been in the works since March, when the CIC reached an agreement on what was then known as the Shared Digital Repository. A number of information professionals at schools such as the University of Wisconsin and Indiana University spent the summer building mirror sites, figuring out large-scale search options, and (in one school's case) untangling certain problems with Google's previous scans of some of the library's holdings.

At present the partners are preparing for the two-step process of moving their content into the HathiTrust system. First, each institution must prepare accurate bibliographic records to provide metadata for its "digital objects." Once that's done, the content itself can come in -- via Google, if the school prefers, or by non-Google mechanisms currently under development. For now, that content will consist of page-image files plus associated optical character recognition files and metafiles.

HathiTrust is available for searching by the interested public, though there's no grand unified search interface yet. The University of Michigan and the University of Chicago both currently offer search.
