Inspecting gradient magnitudes in context is a powerful way to
see whether recurrent units rely on short-term or long-term contextual
understanding.
This connectivity visualization shows
how strongly previous input characters influence the current target character
in an autocomplete problem.
For example, when predicting the word "grammar", the GRU RNN
initially relies on long-term
memorization, but as more characters
become available it switches to short-term memorization.
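
The idea behind such a connectivity measure can be sketched in a few lines: take the gradient of the logit for the target character with respect to each earlier input character's embedding, and use its magnitude as the strength of influence. The snippet below is a minimal illustration of this, assuming a toy character-level GRU in PyTorch (the `CharGRU` model, ASCII vocabulary, and untrained weights are placeholders, not the model used for the visualization above).

```python
import torch
import torch.nn as nn

class CharGRU(nn.Module):
    """Toy character-level GRU language model (placeholder architecture)."""
    def __init__(self, vocab_size, hidden_size=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.gru = nn.GRU(hidden_size, hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, vocab_size)

def connectivity(model, input_ids, t, target_id):
    """Gradient magnitude of the logit for `target_id` at step `t`
    with respect to each input character's embedding."""
    emb = model.embed(input_ids)      # (1, time, hidden)
    emb.retain_grad()                 # keep gradients on this non-leaf tensor
    hidden, _ = model.gru(emb)
    logits = model.out(hidden)
    logits[0, t, target_id].backward()
    # One magnitude per input position: L2 norm over the embedding dimension
    return emb.grad[0].norm(dim=-1)

# Usage sketch (ASCII codes stand in for a real vocabulary; the model is untrained):
model = CharGRU(vocab_size=128)
text = "fix the gramm"               # predicting the next character
input_ids = torch.tensor([[ord(c) for c in text]])
scores = connectivity(model, input_ids, t=len(text) - 1, target_id=ord("a"))
print(scores)  # higher values = stronger influence of that input character
```

Plotting these per-character magnitudes for each prediction step gives the kind of connectivity visualization shown here.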