gImageReader extracts text from images, PDFs, more

By Mike Williams
Published 9 years ago


Extracting text from a PDF can be very easy. Just select a section and copy it to the clipboard, or maybe -- in Adobe Reader -- click File > Save As Other > Text to save the entire document.

This all works just fine until you come across a PDF that consists entirely of scanned images, with no text layer to select. And that’s when you need something a little more powerful.

gImageReader is an open source front end for the Tesseract OCR engine, and can extract text from PDFs, from image files, or from pages acquired directly from your scanner. If that's not enough, it also accepts images from the clipboard, or can capture one by taking a screenshot.

A one-click "Autodetect layout" option will hopefully detect all the text regions within the source. The reliability of this can be anything from "amazing" to "useless", depending on the image, but you can delete or reorder the regions as necessary. Or you might select a block manually by clicking and dragging with the mouse.

If the task is a simple one -- just a paragraph or two of high quality text -- you could just right-click a region and select "Recognize to clipboard". gImageReader grabs whatever text it can from the image and copies it to the clipboard, ready for immediate reuse elsewhere.

Longer blocks can be sent to an "Output" pane for cleaning up. There’s nothing too advanced -- search and replace, stripping line breaks, a chance for manual editing -- but it might be helpful, and when you’re done the results can be saved as a TXT file.
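To give a feel for what that kind of cleanup involves, here is a minimal Python sketch of stripping line breaks and running a search and replace over OCR output. It is not gImageReader's own code; the function names `strip_line_breaks` and `search_replace` are hypothetical, and the rule for rejoining hyphenated words is an assumption that will also merge genuinely hyphenated terms that happen to fall at a line end.

```python
import re


def strip_line_breaks(text: str) -> str:
    """Join hard-wrapped OCR lines into paragraphs.

    Blank lines are treated as paragraph breaks and preserved.
    """
    paragraphs = re.split(r"\n\s*\n", text.strip())
    cleaned = []
    for para in paragraphs:
        # Rejoin words split across a line break: "docu-\nment" -> "document".
        # Assumption: a trailing hyphen always marks a wrapped word.
        para = re.sub(r"(\w)-\n(\w)", r"\1\2", para)
        # Collapse the remaining line breaks and whitespace runs into one space.
        para = re.sub(r"\s+", " ", para).strip()
        if para:
            cleaned.append(para)
    return "\n\n".join(cleaned)


def search_replace(text: str, replacements: dict) -> str:
    """Apply literal search-and-replace pairs in order."""
    for old, new in replacements.items():
        text = text.replace(old, new)
    return text


if __name__ == "__main__":
    raw = "Extracting text from a docu-\nment can be\nvery easy.\n\nSecond para-\ngraph here."
    tidy = strip_line_breaks(raw)
    # Fix a typical OCR misread ("Tbe" for "The") if it appears.
    print(search_replace(tidy, {"Tbe": "The"}))
```

Splitting on blank lines first keeps the paragraph structure intact, so only the hard wraps inside each paragraph are removed.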

gImageReader’s interface is a little awkward in places, but once you've figured it out it’s easy enough to use, and the Tesseract engine can be very accurate. The program is available now for Windows (XP and later) and Linux.
