A Novel Method for Image to Text Extraction Using Tesseract-OCR

Sayan Kumar Garai, University of Engineering & Management (UEM), Department of CSE, Kolkata, India
Ojaswita Paul, University of Engineering & Management (UEM), Department of CSE, Kolkata, India
Upayan Dey, University of Engineering & Management (UEM), Department of CSE, Kolkata, India
Sayan Ghoshal, University of Engineering & Management (UEM), Department of CSE, Kolkata, India
Neepa Biswas, University of Engineering & Management (UEM), Department of CSE, Kolkata, India
Dr. Sandip Mondal, University of Engineering & Management (UEM), Department of CSE, Kolkata, India

Abstract

Text extraction can play a vital role in detecting valuable information in a selected image. The process involves text detection, localization, marking, tracking, extraction, enhancement, and finally recognition. Detecting text characters is difficult because they vary in size, style, font, orientation, alignment, contrast, and color, and may appear on textured backgrounds. There is a growing demand for information detection, indexing, and retrieval from various multimedia documents, and several methods have been developed to extract text from images. This article proposes a novel method for image-to-text extraction. In this paper, we present a multiresolution morphology-based text segmentation process suitable for images containing various types of non-text elements, such as drawings, pictures, and halftones. The Python library OpenCV is used for image processing, and Tesseract is used for text extraction. The Python Imaging Library (PIL) handles opening and manipulating images in many formats in Python. We are also testing an application intended to produce correct output in every language.
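The abstract does not spell out the segmentation algorithm, so the following is only a simplified, pure-Python sketch of the general multiresolution-morphology idea (in the spirit of Bloomberg's threshold-reduction approach), not the authors' implementation. A binary image is reduced twice with a 2×2 threshold reduction, opened with a 3×3 square so that only large solid regions (pictures, halftones) survive, and expanded back to full resolution to form a non-text mask; whatever ink lies outside that mask is treated as text. The function names (`reduce2`, `expand2`, `segment`) and the defaults `levels=2`, `threshold=4`, `radius=1` are hypothetical choices for illustration. In an OpenCV-based pipeline the same steps would more likely use `cv2.resize` and `cv2.morphologyEx`, with the resulting text mask passed on to Tesseract.

```python
"""Simplified multiresolution morphological text/non-text segmentation.

Hypothetical sketch: images are binary grids (lists of lists),
1 = ink, 0 = background. Not the paper's actual method.
"""
from typing import List, Tuple

Grid = List[List[int]]


def reduce2(img: Grid, threshold: int) -> Grid:
    """2x2 threshold reduction: an output pixel is ON when at least
    `threshold` (1-4) pixels of its 2x2 input block are ON."""
    h, w = len(img), len(img[0])
    out = []
    for r in range(0, h, 2):
        row = []
        for c in range(0, w, 2):
            count = sum(
                img[rr][cc]
                for rr in range(r, min(r + 2, h))
                for cc in range(c, min(c + 2, w))
            )
            row.append(1 if count >= threshold else 0)
        out.append(row)
    return out


def expand2(img: Grid, h: int, w: int) -> Grid:
    """Replicate each pixel into a 2x2 block, cropped to h x w."""
    return [[img[r // 2][c // 2] for c in range(w)] for r in range(h)]


def _window(img: Grid, r: int, c: int, radius: int) -> List[int]:
    """In-bounds pixels of the square window centred on (r, c);
    out-of-bounds positions are simply ignored."""
    h, w = len(img), len(img[0])
    return [
        img[rr][cc]
        for rr in range(max(0, r - radius), min(h, r + radius + 1))
        for cc in range(max(0, c - radius), min(w, c + radius + 1))
    ]


def erode(img: Grid, radius: int) -> Grid:
    """Binary erosion with a (2*radius+1)-wide square element."""
    return [[int(all(_window(img, r, c, radius))) for c in range(len(img[0]))]
            for r in range(len(img))]


def dilate(img: Grid, radius: int) -> Grid:
    """Binary dilation with a (2*radius+1)-wide square element."""
    return [[int(any(_window(img, r, c, radius))) for c in range(len(img[0]))]
            for r in range(len(img))]


def segment(img: Grid, levels: int = 2, threshold: int = 4,
            radius: int = 1) -> Tuple[Grid, Grid]:
    """Return (text, non_text) masks for a binary image."""
    shapes = []
    low = img
    for _ in range(levels):
        shapes.append((len(low), len(low[0])))
        low = reduce2(low, threshold)  # thin strokes vanish here
    # Opening at low resolution keeps only large solid regions.
    mask = dilate(erode(low, radius), radius)
    for h, w in reversed(shapes):
        mask = expand2(mask, h, w)  # back to the original size
    non_text = [[p & m for p, m in zip(pr, mr)] for pr, mr in zip(img, mask)]
    text = [[p & (1 - m) for p, m in zip(pr, mr)] for pr, mr in zip(img, mask)]
    return text, non_text
```

For example, on a 32×32 page containing a solid 16×16 block in one corner and a one-pixel-high horizontal stroke elsewhere, the block ends up in `non_text` and the stroke in `text`. Ignoring out-of-bounds pixels keeps solid regions touching the page edge from being eroded away; a fuller implementation would typically add a seed-fill step so that the mask follows region boundaries more closely than the blocky expansion does.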

Recommended Citation

Garai, Sayan Kumar; Paul, Ojaswita; Dey, Upayan; Ghoshal, Sayan; Biswas, Neepa; and Mondal, Dr. Sandip (2024) "A Novel Method for Image to Text Extraction Using Tesseract-OCR," American Journal of Electronics & Communication (AJEC): Vol. 3: Iss. 2, Article 2.
Available at: https://research.smartsociety.org/ajec/vol3/iss2/2

Included in

Electrical and Electronics Commons
