Mining from movies and TV-shows

June 05, 2021 — Tatsumoto Ren

Sentence mining is the process of picking sentences from your immersion and making Anki cards. Each sentence has one unknown piece of information, which is referred to as target word.

To mine sentences from movies and TV-shows you are going to need the mpv video player, and a plugin for mpv called Mpvacious.

MPV

mpv is the best video player for language learners. It is fast, minimal, extensible and compatible with all video formats. mpv is available on most distributions of GNU/Linux. To install it on Arch Linux, execute the following command:

$ sudo pacman -S mpv

Make sure you have version 0.33.0 or later. If such a version is not available on your distribution, see here for different installation options.

MPV configuration

Start mpv at least once to create the config folder which will be located at ~/.config/mpv/.

mpv reads its configuration from the following directories/files:

~/.config/mpv/mpv.conf. The main config file. Contains settings related to the player itself.
~/.config/mpv/input.conf. Key bindings. Apart from the player's key bindings, it can also contain key bindings of third-party user-scripts.
~/.config/mpv/scripts. Contains user-scrips. User scripts are small programs that extend the player's functionality. Usually each user-script is installed in its own folder.
~/.config/mpv/script-opts. Config files used by the installed user-scripts. Some user-scripts don't have configuration files.

Note that not all the directories or files listed above are created automatically.

To configure mpv, create and open ~/.config/mpv/mpv.conf. This step is not strictly necessary.

Example configuration file

# Define language preferences
alang=ja,jp,jpn,japanese,en,eng,english,English,enUS,en-US
slang=ja,jp,jpn,japanese,en,eng,english,English,enUS,en-US
# mpv will resume where you left off when you reopen a media file
save-position-on-quit=yes
# Automatically use external subtitle files that contain the name of the media file
sub-auto=fuzzy
subs-with-matching-audio=yes
# Additional folders for storing subtitle files.
# You can drop all subs in the "subs" folder to keep files organized.
sub-file-paths=ass:srt:sub:subs:subtitles
# Subtitle font. Make sure the font you select is installed
# and contains all characters that are included in your subtitle files.
# https://archlinux.org/packages/extra/any/noto-fonts-cjk/
sub-font="Noto Sans CJK JP Regular"
# Change subtitle font size.
sub-font-size=40
# Uncomment this if you want mpv to override styles from SubStation Alpha (.ssa/.ass) subtitles
# sub-ass-override=force
# Improves audio when the playback speed is changed.
af-add=scaletempo2
# Screenshots
screenshot-directory="~/Pictures/Screenshots/"
screenshot-template="%F_%wHh%wMm%wSs%wTms"
screenshot-format=jpg
screenshot-jpeg-quality=90
screenshot-high-bit-depth=yes

Note: To tell mpv to show subtitles in the font you selected in the config, press u. If you want to make the setting permanent, add sub-ass-override=force to mpv.conf.

~/.config/mpv/input.conf contains key bindings. For all available bindings, see /usr/share/doc/mpv/input.conf.

Example input.conf

# Increase / decrease subtitle font size
# Works only when sub-ass-override=force
# https://www.reddit.com/r/mpv/comments/dg5yzj/trouble_decreasing_subtitles_size/
/ add sub-scale +0.1
? add sub-scale -0.1
# Cycle video aspect ratios; "-1" is the container aspect
A cycle-values video-aspect "16:9" "16:10" "4:3" "2.35:1" "-1"
# Vim-like seeking
l seek  5
h seek -5
j seek -60
k seek  60
# Cycle between subtitle files
K cycle sub
J cycle sub down
# Skip to previous/next subtitle line (disabled - use Mpvacioius)
#H no-osd sub-seek -1
#L no-osd sub-seek 1
# Search sub-text on Jisho.org
# https://github.com/mpv-player/mpv/issues/4695#issuecomment-609876072
Ctrl+j run "/bin/sh" "-c" "xdg-open 'https://jisho.org/search?keyword=${sub-text}'"
# skip to next/previous file
> playlist-next
< playlist-prev
# Add/subtract 50 ms delay from subs
Z add sub-delay +0.05
z add sub-delay -0.05
# Adjust timing to previous/next subtitle
X sub-step 1
x sub-step -1
# Toggle OSD visibility
V script-binding visibility

Obtain content

Always download what you watch. This way it's easier to work with the files, make Anki cards, take screenshots, cut clips, make condensed audio, and so on. Don't use "services" that spy on you such as Netflix, Hulu, VRV, Funimation or Crunchyroll.

Attempts to create flashcards from streamed video files often fail. As a workaround, the user can record audio and take a screenshot manually, but it is not convenient at all.

Explore Immersion material to find Japanese content and Download tools for a ways to download it.

Obtain subtitles

To learn Japanese from visual content such as anime or jdramas, you need Japanese subtitles. Subtitles can be found on the Resources page. Don't forget that if you have English subtitles enabled, at best you're going to improve your English, but your Japanese is not going to get any better.

I find that it's easier to download all the subtitles for all the shows and have them saved locally. This may require a few GiB of disk space, but it will save you the hassle of having to visit subtitle catalogs every time you need to find subtitles. For example, if you download the mirror of kitsunekko, you can use lf to browse it, and fzf to search it. Watch a demonstration on YouTube .

In the video, I'm using a shell alias to enter the directory with Japanese anime subtitles. I'm also using keyboard shortcuts to call fzf and lf. Shortcuts and aliases can be configured in the .zshrc file.

# Change directory
alias subs='cd -- "/path/to/directory"'
# Ctrl+O to launch lf file manager.
bindkey -s '^o' '^ulf\n'
# Ctrl+F to search directory with fzf.
bindkey -s '^f' '^ucd -- "$(dirname -- "$(fzf)")"\n'

To create your own local mirror of kitsunekko, use kitsunekko-tools. This simple program downloads all files from kitsunekko and saves them in a specified directory on your hard drive. You need roughly 8 GiB on your disk to store a copy of the entire subtitle catalogue.

If you can't find target language subtitles for a video, you can generate them using pyTranscriber. You're going to get cursed subtitles, but it's better than nothing. I have used it a few times for Japanese.

If you want to synchronize the subtitles, see Retiming subtitles.

SUP subtitles

Sometimes the only subtitle files available for a particular video are image-based (.sup or .sub). Often this is the case with subtitles for movies. They don't contain any text and by themselves are not very useful. SUP files can be converted to plain text (.srt) using OCR (Optical Character Recognition) programs such as subtitleedit.

User-scripts for MPV

mpv is highly configurable and extensible, allowing users to customize its interface and functionality. One way to do this is through user-scripts, which are scripts that can modify mpv's behavior and add new features. User-scripts are written in languages such as Lua or JavaScript.

You can find a list of user-scripts for mpv here.

Mpvacious

Mpvacious is a user-script for the mpv video player that allows you to make Anki cards while watching movies and TV shows in your target language.

Mpvacious can update Anki cards you add with Rikaitan by filling the audio and screenshot fields. Watch a demonstration on YouTube .

The user-script has a long list of features and configuration options. For details explore the documentation on GitHub. Below I'm going to write about how I use it.

Requirements

Mpvacious needs AnkiConnect to work. If you're using Xorg, xclip is required to copy text to the clipboard. wl-copy is required on Wayland.

Installation

Mpvacious can be installed with git:

$ git clone 'https://github.com/Ajatt-Tools/mpvacious.git' ~/.config/mpv/scripts/subs2srs

Configuration

Download the example config file and save it to ~/.config/mpv/script-opts/subs2srs.conf. If you use the Note Type from the recommended mining deck, you don't need to change any settings at all. If you don't, make sure to adjust at least the following variables:

sentence_field
audio_field
image_field

Usage

Make sure Anki is running.
Open a video in mpv.
Press Ctrl+t to activate clipboard autocopy.
Open GoldenDict or open your web browser and activate Rikaitan Search.
Enable GoldendDict's "Scan Popup" or enable clipboard monitor in Rikaitan Search.
When there is a word you want to mine, create a new card.
Go back to the mpv window and press Ctrl+m. If you want to grab a sentence that spans multiple lines, press a to open the advanced menu. Then mark the lines you want to appear on the card by pressing c and moving the position with Shift+h and Shift+l. Finish by pressing m.
The Anki Browser window should appear with the card updated.

See How to for more detailed explanations.

Creating cards without pausing

You can save sentences for later without adding definitions. Since you don't have to pause to read the dictionary, this approach is less disruptive. Press Ctrl+n to add the current subtitle line to Anki as a new card. When you finish watching, open the Anki Browser and type added:1 to display cards added today. Finally, add dictionary definitions to the cards.

If you want to create a card from multiple adjacent subtitle lines, do it using the advanced menu as described in the previous section.

Updating

To update Mpvacious, run the command below:

$ cd ~/.config/mpv/scripts/subs2srs && git pull

Should I use a "text hooker" page?

No.

Though you often see such recommendations, I think it unnecessarily complicates your setup. GoldenDict or Rikaitan Search can do the same thing simpler.

Tags: guide, mpv