Optical character recognition
People often ask what optical character recognition actually is. The term covers a number of nuances and subtleties, which this article discusses. The website img2txt.com offers a free optical character recognition service that converts pictures to text.
Optical character recognition (OCR) is the mechanical or electronic translation of images of printed, typewritten, or handwritten text into a sequence of character codes that can be displayed and edited in a text editor. The technology is often used to convert books and other documents into electronic form, to publish text on web pages, and to automate business accounting systems.
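To make the phrase "sequence of codes" concrete, here is a minimal sketch in Python. It assumes the recognized text is stored as Unicode code points and saved as UTF-8, which is common today but is not stated in this article; the sample string is a hypothetical recognition result, not the output of any particular OCR program.

```python
# Hypothetical result of recognizing one scanned line (assumed sample).
recognized = "OCR 1929"

# Each recognized character is represented by a numeric code
# (here, its Unicode code point).
codes = [ord(ch) for ch in recognized]
print(codes)

# The same codes map back to text that an editor can display and edit.
restored = "".join(chr(c) for c in codes)
print(restored)

# On disk the codes are serialized as bytes; UTF-8 is assumed here.
encoded = recognized.encode("utf-8")
print(len(encoded), "bytes")
```

Once the text exists in this form rather than as pixels, it can be searched, edited, and stored far more compactly than the original image.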
The main benefit of OCR is that the recognized text can be edited, searched for a specific word or key phrase, stored in a more compact form, and printed or displayed without loss of quality. The text can also be analyzed, reformatted, or converted to speech. The last of these is known as "speech synthesis".
Optical character recognition systems require calibration to work with particular fonts. Early systems had to be programmed with an image of each character, and a program could work with only one font at a time. More recently, "intelligent" systems that recognize several fonts at once with high accuracy have become popular. Some systems can also restore the original formatting of the text, including images and columns. The recognized copies are then stored in specially organized electronic archives of paper books.
A bit of history
The history of optical character recognition goes back to 1929, when the first patent for the technology was obtained. Newer methods were patented in the following years, and by 1953 the first machines for optical character recognition had appeared.
The development of a machine that could recognize text written in any font began in 1974. Work on the product was completed two years later.
The first commercially successful program able to recognize the Cyrillic alphabet was AutoR, released by the Russian company OKRUS. The program came into use in 1992.
Modern optical character recognition technology
Today, printed Latin characters can be recognized accurately when high-resolution images are available. Under these conditions, characters are recognized with an accuracy of about 99 percent. The most accurate results are achieved when a person reviews and corrects the output.
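The article does not say how the 99 percent figure is measured. One common way to compute a character-level accuracy, shown here as a sketch under that assumption, is to count the edits (insertions, deletions, substitutions) needed to turn the recognized text into the correct text and divide by the length of the correct text. The function names and the sample strings below are illustrative only.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions that turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute, or keep a match
            ))
        prev = curr
    return prev[-1]


def character_accuracy(truth: str, recognized: str) -> float:
    """Share of the reference text that was recognized correctly."""
    if not truth:
        return 1.0 if not recognized else 0.0
    errors = edit_distance(truth, recognized)
    return max(0.0, 1.0 - errors / len(truth))


# Assumed sample: two typical confusions ("c" read as "o", "o" read as "0").
truth = "optical character recognition"
recognized = "optical charaoter recogniti0n"
print(f"{character_accuracy(truth, recognized):.3f}")
```

With this measure, an accuracy of 99 percent still means roughly one wrong character in every hundred, which is why human correction remains useful for long documents.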
Active research continues today on the recognition of printed text, standard handwriting, and several other types of writing.
There are online and offline character recognition methods. Offline methods work with a static image of the text, while online methods can take into account the movement of the pen at the time of writing. For example, with an online method it is easy to determine in which direction a line was drawn.
On-the-fly text recognition methods have also become popular recently. Their main feature is that the order, speed, and direction of the individual segments of the input lines are always known. In addition, users can be limited to certain forms of writing. However, these methods cannot be applied in software that recognizes text which has already been written, so the recognition of handwritten "printed" text remains an open problem.
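The extra information available to an online or on-the-fly method can be sketched as follows. This is an illustration only, not how any particular product works: the `PenPoint` type, the function names, and the sample coordinates are all assumptions. Each stroke is a list of timestamped pen positions, and the order of the strokes is simply the order in which they were captured.

```python
import math
from dataclasses import dataclass


@dataclass
class PenPoint:
    x: float
    y: float
    t: float  # time in seconds since writing started


def horizontal_direction(stroke: list[PenPoint]) -> str:
    """Direction of a stroke, judged from its first and last points."""
    dx = stroke[-1].x - stroke[0].x
    if dx > 0:
        return "left-to-right"
    if dx < 0:
        return "right-to-left"
    return "none"


def average_speed(stroke: list[PenPoint]) -> float:
    """Path length divided by elapsed time (units per second)."""
    length = sum(
        math.hypot(b.x - a.x, b.y - a.y) for a, b in zip(stroke, stroke[1:])
    )
    duration = stroke[-1].t - stroke[0].t
    return length / duration if duration > 0 else 0.0


# Two assumed strokes, listed in the order they were written.
strokes = [
    [PenPoint(0, 0, 0.0), PenPoint(3, 4, 0.1), PenPoint(6, 8, 0.2)],
    [PenPoint(10, 0, 0.3), PenPoint(4, 0, 0.5)],
]
for index, stroke in enumerate(strokes, start=1):
    print(index, horizontal_direction(stroke), round(average_speed(stroke), 1))
```

An offline method sees only the finished image, in which the second stroke looks the same whether it was drawn from right to left or from left to right. That is why these cues are unavailable when working from a scanned page.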
Handwriting recognition has also been actively studied in recent years, but high accuracy has not yet been achieved. For this reason, intelligent systems such as artificial neural networks are usually used to tackle the harder problems in this area.