Inspecting the magnitude of the gradient of the output with respect to each input character can be a powerful tool for seeing when recurrent units rely on short-term versus long-term contextual understanding.

This connectivity visualization shows how strongly previous input characters influence the prediction of the current target character in an autocomplete problem. For example, when predicting the word "grammar", the GRU RNN initially relies on long-term memorization, but as more characters become available it switches to short-term memorization.
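
To make the idea concrete, here is a minimal sketch of how such a connectivity measure could be computed, assuming a PyTorch character-level model. The `CharGRU` class, the `connectivity` helper, and the character-to-id mapping are hypothetical stand-ins for illustration, not the actual implementation behind the visualization. The key step is taking the gradient of the target character's logit with respect to the input embeddings, then reducing each timestep's gradient to a single magnitude:

```python
import torch
import torch.nn as nn

class CharGRU(nn.Module):
    """Hypothetical character-level GRU autocomplete model."""
    def __init__(self, vocab_size, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, x_embedded):
        # Takes embeddings (not ids) so we can differentiate w.r.t. the input.
        h, _ = self.gru(x_embedded)
        return self.out(h)  # logits at every timestep: (batch, T, vocab)

def connectivity(model, char_ids, target_pos, target_id):
    """Gradient magnitude of the target character's logit with respect to
    each input embedding -- one scalar per input timestep."""
    emb = model.embed(char_ids.unsqueeze(0))    # (1, T, embed_dim)
    emb.retain_grad()                           # keep grads on a non-leaf tensor
    logits = model(emb)                         # (1, T, vocab)
    logits[0, target_pos, target_id].backward() # d(target logit)/d(embeddings)
    return emb.grad[0].norm(dim=-1)             # (T,) L2 norm per character

# Example: how strongly does each character of "the gramm" influence
# the prediction of the next "a"? (ASCII codes used as toy character ids.)
model = CharGRU(vocab_size=128)
context = torch.tensor([ord(c) for c in "the gramm"])
conn = connectivity(model, context, target_pos=len(context) - 1,
                    target_id=ord("a"))
```

A large gradient magnitude at an early timestep indicates long-term memorization (the prediction still depends on a distant character), while magnitudes concentrated near the target indicate short-term memorization.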