Language is a vital part of human connection

How Does Google Translate Pictures? Real-time photo translation function explained

Google’s camera-based instant translation feature is built on technology from Word Lens, an augmented reality translation application from Quest Visual that Google acquired in 2014. Word Lens used the built-in camera on smartphones and similar devices to quickly scan and identify foreign text, then translated the words and displayed the result in another language on the device’s screen.

When you point the camera at text written in another language, such as a sign, it is translated into your language in near real time, with high accuracy.

Real-time photo translation function

Photo or image translation is machine translation (MT) applied to images: OCR (optical character recognition) is first used to extract any recognizable text from the image, and the extracted text is then translated into the target language.
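That two-stage flow can be sketched as a tiny pipeline. The `ocr` and `translate` functions below are hypothetical stand-ins, not real APIs; a production system would call an actual OCR engine and MT service.

```python
def ocr(image):
    # Stand-in: a real OCR engine would detect and recognize text regions.
    return image.get("text", "")

def translate(text, target_lang):
    # Stand-in: a real MT system would translate arbitrary strings.
    dictionary = {("Ausfahrt", "en"): "Exit"}
    return dictionary.get((text, target_lang), text)

def translate_photo(image, target_lang):
    # OCR first, then machine translation of the extracted text.
    return translate(ocr(image), target_lang)

photo = {"text": "Ausfahrt"}
print(translate_photo(photo, "en"))  # Exit
```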


OCR (optical character recognition) refers to the process by which electronic devices (such as scanners or digital cameras) examine characters printed on paper and use character recognition methods to convert their shapes into computer text — in other words, analyzing and processing image files to obtain text and layout information. How to use debugging and auxiliary information to improve recognition accuracy is the central topic of OCR research. The main indicators used to measure the performance of an OCR system are the rejection rate, the false recognition rate, recognition speed, user-interface friendliness, product stability, ease of use, and feasibility.
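The first two indicators are simple ratios. As an illustration only (the exact denominators vary between systems; these formulas are an assumption): rejection rate is the fraction of characters the engine refused to decide on, and false recognition rate is the fraction it recognized incorrectly.

```python
def ocr_metrics(total, rejected, wrong):
    # rejection rate: characters the engine declined to classify / total
    rejection_rate = rejected / total
    # false recognition rate: characters classified incorrectly / total
    false_recognition_rate = wrong / total
    return rejection_rate, false_recognition_rate

rej, err = ocr_metrics(total=1000, rejected=20, wrong=5)
print(f"rejection: {rej:.1%}, false recognition: {err:.1%}")
# rejection: 2.0%, false recognition: 0.5%
```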

The core – OCR


The purpose of developing an OCR text recognition system is very simple: convert the image so that its graphics are preserved while any tables, the data in those tables, and the text in the image are all converted into computer text. This reduces the storage required for image data, makes the recognized characters reusable and analyzable, and of course saves the manpower and time of keyboard entry.

From image to result output, the process goes through image input, image preprocessing, text feature extraction, and comparison and recognition, followed by manual correction of misrecognized characters before the result is output.
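The stages above can be sketched as a chain of functions. Each stage here is a toy placeholder (the real implementations are described in the sections that follow); only the chaining pattern is the point.

```python
def preprocess(image):
    # Placeholder for binarization, denoising, tilt correction, etc.
    return image.strip()

def extract_features(image):
    # Placeholder: a real system extracts stroke/shape features.
    return list(image)

def recognize(features):
    # Placeholder for comparison against trained character models.
    return "".join(features).upper()

def manual_correction(text):
    # Placeholder: a human (or post-processor) fixes obvious mistakes.
    return text.replace("0", "O")

def ocr_pipeline(image):
    for stage in (preprocess, extract_features, recognize, manual_correction):
        image = stage(image)
    return image

print(ocr_pipeline("  w0rd  "))  # WORD
```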

Structure of OCR software

OCR software is mainly composed of the following parts.

Image processing

1. Preprocessing – Image input

Different image formats use different storage layouts and compression methods, so the software must first read the image in. Preprocessing then mainly includes binarization, noise removal, and tilt correction.

2. Binarization

Most pictures taken by a camera are color images, which contain a huge amount of information. For recognition purposes, the content can be divided simply into foreground and background. To let the computer recognize text faster and better, the color image is first processed so that it contains only foreground and background information: foreground pixels can simply be defined as black and background pixels as white. The result is a binary image.
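A minimal binarization sketch: grayscale pixel values (0–255) are split into black foreground (0) and white background (255) by a fixed threshold. This is an illustration only; real systems usually pick the threshold adaptively (e.g. Otsu's method) rather than hard-coding it.

```python
def binarize(gray, threshold=128):
    # Dark pixels become foreground (0), light pixels background (255).
    return [[0 if px < threshold else 255 for px in row] for row in gray]

image = [[12, 200],
         [90, 255]]
print(binarize(image))  # [[0, 255], [0, 255]]
```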

3. Noise removal


What counts as noise can be defined differently for different documents; removing it according to its characteristics is called noise removal (denoising).
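One common denoising technique for scanned images is the median filter: each pixel is replaced by the median of its 3×3 neighbourhood, which wipes out isolated salt-and-pepper specks while preserving edges. A sketch that ignores the image border:

```python
def median_filter(img):
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = sorted(img[y + dy][x + dx]
                            for dy in (-1, 0, 1) for dx in (-1, 0, 1))
            out[y][x] = window[4]  # median of the 9 neighbourhood values
    return out

# A lone bright speck (255) in a dark field is removed:
noisy = [[0] * 3 for _ in range(3)]
noisy[1][1] = 255
print(median_filter(noisy)[1][1])  # 0
```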

4. Tilt correction

Because ordinary users tend to be casual when photographing documents, the resulting pictures are inevitably skewed, so the text recognition software must correct the tilt.
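A common skew-estimation sketch (one of several possible approaches, assumed here rather than taken from the article): try a range of candidate angles, rotate the ink-pixel coordinates, and keep the angle whose horizontal projection is most concentrated — when text lines are level, the ink piles up into a few rows.

```python
import math

def projection_score(points, angle_deg):
    # Rotate each ink pixel and histogram its row coordinate.
    a = math.radians(angle_deg)
    rows = {}
    for x, y in points:
        r = round(-x * math.sin(a) + y * math.cos(a))
        rows[r] = rows.get(r, 0) + 1
    # Sum of squared row counts: maximal when ink concentrates in few rows.
    return sum(c * c for c in rows.values())

def estimate_skew(points, candidates=range(-10, 11)):
    return max(candidates, key=lambda a: projection_score(points, a))

# Ink pixels along a text line tilted by 5 degrees:
tilted = [(x, round(x * math.tan(math.radians(5)))) for x in range(60)]
print(estimate_skew(tilted))  # 5
```

Once the angle is known, the image is rotated by its negative to level the text.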

5. Layout analysis

The process of dividing a document picture into paragraphs and lines is called layout analysis. Due to the diversity and complexity of real documents, there is no single fixed, optimal segmentation model.
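The simplest layout-analysis step can be sketched as horizontal projection: a binary image is cut into text lines wherever a run of rows contains no ink. Input here is a list of rows with 1 = ink, 0 = background.

```python
def split_lines(img):
    lines, current = [], []
    for y, row in enumerate(img):
        if any(row):
            current.append(y)        # this row contains ink
        elif current:
            lines.append((current[0], current[-1]))  # blank row ends a line
            current = []
    if current:
        lines.append((current[0], current[-1]))
    return lines  # list of (first_row, last_row) per text line

img = [
    [0, 0, 0],
    [1, 1, 0],
    [0, 1, 0],
    [0, 0, 0],
    [1, 0, 1],
]
print(split_lines(img))  # [(1, 2), (4, 4)]
```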

6. Character cutting

Due to the limitations of photographing conditions, characters are often stuck together or broken, which greatly limits the performance of the recognition system; the recognition software therefore needs a character cutting (segmentation) function.
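The baseline cutting technique mirrors line splitting, but works on columns: a column with no ink separates two characters. This is exactly the method that fails on touching or broken glyphs, which is why real systems add merging and splitting heuristics on top.

```python
def split_chars(line):
    w = len(line[0])
    # Vertical projection: does each column contain any ink?
    ink = [any(row[x] for row in line) for x in range(w)]
    chars, start = [], None
    for x, has_ink in enumerate(ink):
        if has_ink and start is None:
            start = x                      # a character begins
        elif not has_ink and start is not None:
            chars.append((start, x - 1))   # a blank column ends it
            start = None
    if start is not None:
        chars.append((start, w - 1))
    return chars  # list of (first_col, last_col) per character

line = [
    [1, 0, 1, 1],
    [1, 0, 0, 1],
]
print(split_chars(line))  # [(0, 0), (2, 3)]
```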

7. Character recognition

Character recognition has been researched for a very long time. Early systems used template matching; later approaches are mainly based on feature extraction. Displacement of the text, varying stroke thickness, broken strokes, adhesion, rotation, and other factors greatly increase the difficulty of feature extraction.
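Template matching, the earlier technique, can be sketched in a few lines: compare the glyph against stored templates pixel by pixel and pick the best match. The 3×3 templates below are invented purely for illustration.

```python
TEMPLATES = {
    "I": [[0, 1, 0],
          [0, 1, 0],
          [0, 1, 0]],
    "L": [[1, 0, 0],
          [1, 0, 0],
          [1, 1, 1]],
}

def match_score(glyph, template):
    # Count the pixels where glyph and template agree.
    return sum(g == t
               for grow, trow in zip(glyph, template)
               for g, t in zip(grow, trow))

def recognize(glyph):
    return max(TEMPLATES, key=lambda ch: match_score(glyph, TEMPLATES[ch]))

noisy_L = [[1, 0, 0],
           [1, 0, 0],
           [1, 1, 0]]  # one pixel missing from the "L" template
print(recognize(noisy_L))  # L
```

The weaknesses the paragraph lists are visible even here: shift the glyph by one pixel and the score collapses, which is why feature-based methods took over.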

8. Layout recovery

People expect the recognized text to keep the arrangement of the original document picture — paragraphs, positions, and order all unchanged — when output to Word documents, PDF documents, and so on. This process is called layout restoration.
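A minimal layout-restoration sketch: if each recognized fragment carries its bounding-box position, sorting by vertical then horizontal position and grouping nearby rows reconstructs the reading order. The field names (`text`, `top`, `left`) and the tolerance are assumptions for illustration.

```python
def restore_layout(fragments, line_tolerance=5):
    # Sort top-to-bottom, then left-to-right.
    fragments = sorted(fragments, key=lambda f: (f["top"], f["left"]))
    lines, current, current_top = [], [], None
    for f in fragments:
        if current and abs(f["top"] - current_top) > line_tolerance:
            lines.append(" ".join(w["text"] for w in current))
            current = []
        if not current:
            current_top = f["top"]
        current.append(f)
    if current:
        lines.append(" ".join(w["text"] for w in current))
    return "\n".join(lines)

frags = [
    {"text": "world", "top": 2, "left": 60},
    {"text": "Hello", "top": 0, "left": 10},
    {"text": "line2", "top": 40, "left": 10},
]
print(restore_layout(frags))  # Hello world / line2 (two lines)
```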

9. Post-processing and proofreading

Correcting the recognition results according to the specific language context is called post-processing.
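One simple post-processing sketch: snap each recognized word to the closest entry in a vocabulary. Here Python's standard-library `difflib` stands in for what a real system would do with a language model or lexicon; the vocabulary is invented for the example.

```python
import difflib

VOCAB = ["language", "translate", "camera", "recognition"]

def correct(word):
    # Pick the closest vocabulary word above a similarity cutoff,
    # otherwise leave the recognized word unchanged.
    matches = difflib.get_close_matches(word.lower(), VOCAB, n=1, cutoff=0.6)
    return matches[0] if matches else word

print(correct("trans1ate"))  # translate  ("1" misread for "l")
print(correct("camena"))     # camera     ("n" misread for "r")
```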
