When reading manga, you sometimes need to quickly OCR a portion of the screen to look up new words and add sentences to Anki. To do so, you're going to use an optical character recognition (OCR) program and a few helper tools.
Install the following dependencies:
$ sudo pacman -S --needed sxiv maim tesseract xclip imagemagick
- sxiv is an excellent image viewer. For this setup you can replace it with any image viewer, but sxiv is what I use.
- tesseract is the OCR engine. It is open source, widely used, and fairly accurate.
- maim is a screenshot utility that can capture a selected part of the screen.
- xclip is a tool for copying text to the clipboard.
- imagemagick is a command-line image editor. It's going to come in handy for editing the screenshots before Tesseract analyzes them.
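After installing, you can confirm that every tool is actually on your PATH with a small POSIX shell check. This is a sketch: the function name missing_tools is hypothetical, and it assumes ImageMagick provides either the magick command (version 7) or convert (version 6).

```shell
#!/bin/sh
# Sketch: print the name of every dependency that is not on $PATH.
# Assumption: ImageMagick installs `magick` (v7) or `convert` (v6).

missing_tools() {
    for cmd in sxiv maim tesseract xclip; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
    # imagemagick is a package name, so check for either of its commands.
    if ! command -v magick >/dev/null 2>&1 &&
        ! command -v convert >/dev/null 2>&1; then
        echo imagemagick
    fi
}

missing=$(missing_tools | tr '\n' ' ')
if [ -z "$missing" ]; then
    echo "All dependencies found."
else
    echo "Missing: $missing"
fi
```

If anything is reported as missing, re-run the pacman command above.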
By default, Tesseract is not very good at detecting Japanese characters, but OCR quality can be improved with custom trained data. Download Capture2Text. We won't need the program itself because it's garbage, but its trained data files are going to be useful.
Extract the contents of the tessdata folder to ~/.local/share/capture2text_tessdata:
$ unzip -j Capture2Text_v*_64bit.zip 'Capture2Text/tessdata/*' -d ~/.local/share/capture2text_tessdata
Alternatively, download just the Capture2Text Japanese files from here.
Contents of the ZIP archive.
You don't need to install any Tesseract data files from your distro's repositories; the ones in the Capture2Text archive are way better.
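To check that the extraction worked, a helper like the one below lists the Japanese trained data files (jpn*.traineddata) in the target folder. It is a sketch: list_japanese_models and the TESSDATA_DIR variable are hypothetical names, and the exact file names depend on what your copy of the archive contains.

```shell
#!/bin/sh
# Sketch: list jpn*.traineddata files in a tessdata directory.
# TESSDATA_DIR is a hypothetical variable defaulting to the folder used above.
TESSDATA_DIR="${TESSDATA_DIR:-$HOME/.local/share/capture2text_tessdata}"

list_japanese_models() {
    found=1
    for f in "$1"/jpn*.traineddata; do
        [ -e "$f" ] || continue      # the glob matched nothing
        name=${f##*/}                # strip the directory part
        echo "${name%.traineddata}"  # strip the extension
        found=0
    done
    return "$found"                  # 0 if at least one file was found
}

if ! list_japanese_models "$TESSDATA_DIR"; then
    echo "No jpn*.traineddata files in $TESSDATA_DIR" >&2
fi
```

Tesseract itself can also report which languages it finds there: tesseract --tessdata-dir ~/.local/share/capture2text_tessdata --list-langs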
Save the OCR script as ~/.local/bin/maimocr.
Make the file executable:
$ chmod +x ~/.local/bin/maimocr
~/.local/bin should be in your PATH.
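A quick way to verify this is to look for an exact ~/.local/bin entry in the colon-separated PATH. The sketch below does that with a plain case pattern; in_path is a hypothetical helper name, and ~/.profile is only an assumption about which file your login shell reads.

```shell
#!/bin/sh
# Sketch: succeed if directory $1 is an exact entry of $PATH.
in_path() {
    # Wrapping PATH in colons lets one pattern match the first, middle,
    # and last entries while rejecting partial matches.
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

if in_path "$HOME/.local/bin"; then
    echo "~/.local/bin is already in PATH"
else
    # Assumption: your login shell reads ~/.profile; adjust for your shell.
    echo 'Add this line to your shell profile, e.g. ~/.profile:'
    echo 'export PATH="$HOME/.local/bin:$PATH"'
fi
```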
Bind this script to any key in your DE, WM, sxhkd, xbindkeysrc, etc. Here's an example for i3wm:
bindsym $mod+o exec --no-startup-id maimocr
The script is very simple, so I hope you can understand it without explanation. When run, it asks you to select an area with Japanese text and tries to OCR it. The resulting text is saved to the system clipboard. Use it in combination with Yomichan Search to quickly look up Japanese words in real time.
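A minimal sketch of a script matching this description is shown below. It is not necessarily the author's maimocr: it assumes ImageMagick 7 (the magick command; use convert on version 6) and a jpn.traineddata file in the folder extracted earlier, and the TESSDATA_DIR and OCR_LANG variables are hypothetical. The preprocessing (grayscale plus a 4x upscale) is one reasonable choice, not a tuned setting.

```shell
#!/bin/sh
# maimocr: sketch of a select -> preprocess -> OCR -> clipboard pipeline.
# Assumptions: ImageMagick 7 (`magick`) and Capture2Text's trained data
# extracted to ~/.local/share/capture2text_tessdata.
TESSDATA_DIR="${TESSDATA_DIR:-$HOME/.local/share/capture2text_tessdata}"
OCR_LANG="${OCR_LANG:-jpn}"  # hypothetical override for other models

maimocr() {
    # 1. Let the user select a region; maim writes a PNG to stdout.
    # 2. Remove colour (-modulate 100,0) and upscale 4x, since Tesseract
    #    does poorly on small, low-resolution text.
    # 3. OCR the image read from stdin and print the text to stdout.
    # 4. Copy the text to the clipboard selection.
    maim --select --hidecursor \
        | magick - -modulate 100,0 -resize 400% png:- \
        | tesseract --tessdata-dir "$TESSDATA_DIR" -l "$OCR_LANG" stdin stdout \
        | xclip -selection clipboard
}

# maim and xclip both need a running X session.
if [ -n "${DISPLAY:-}" ]; then
    maimocr
else
    echo "maimocr: no X display found (maim and xclip need X11)" >&2
fi
```

With a keybinding like the i3 example above, no arguments are needed. For vertical text you could try OCR_LANG=jpn_vert, provided that file exists in your tessdata folder.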
Yomichan should already be installed. To open Yomichan Search, open your web browser and press Alt+Insert.
If you notice that the script fails to OCR certain images, try to zoom in or find a scan with a better resolution. Tesseract works poorly at low resolutions.
Note: As an alternative, you can install KanjiTomo, but it's quite big and forces you to use a Japanese-English dictionary instead of a Japanese-Japanese one.