This paper proposes a procedure for training a scene text recognition model with a robust learned surrogate of edit distance. The method borrows from self-paced learning and filters out training examples that are hard for the surrogate. Filtering is done by judging the quality of the surrogate's approximation with a ramp function, which keeps the procedure end-to-end trainable. Following the literature, experiments conducted in post-tuni...
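The sketch below shows one way to read the filtering step. It is a minimal Python sketch, not the authors' code. It makes several assumptions:

- Each example's surrogate output is compared with the exact edit distance.
- The ramp is piecewise linear, with hypothetical parameters `tau` (tolerated error) and `width` (fall-off range).
- The surviving examples' surrogate values are averaged using the ramp weights.

All function names are illustrative. In an actual training loop the weights would presumably be treated as constants, and gradients would flow only through the surrogate values.

```python
def levenshtein(a: str, b: str) -> int:
    # Standard dynamic-programming edit distance (unit-cost insert/delete/substitute).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def ramp_weight(error: float, tau: float, width: float) -> float:
    # Hypothetical ramp: weight 1 while error <= tau,
    # decreasing linearly to 0 at error = tau + width.
    return max(0.0, min(1.0, 1.0 - (error - tau) / width))


def filtered_surrogate_loss(preds, targets, surrogate_values, tau=0.5, width=1.0):
    # Assumed scheme: examples where the surrogate badly misestimates the true
    # edit distance get weight 0, i.e. they are filtered out of the batch loss.
    weights = []
    for p, t, s in zip(preds, targets, surrogate_values):
        err = abs(s - levenshtein(p, t))
        weights.append(ramp_weight(err, tau, width))
    total = sum(weights)
    if total == 0:
        return 0.0, weights
    loss = sum(w * s for w, s in zip(weights, surrogate_values)) / total
    return loss, weights


if __name__ == "__main__":
    # The second example's surrogate (3.0) is far from the true distance (1),
    # so under these hypothetical settings it is dropped from the loss.
    loss, weights = filtered_surrogate_loss(
        ["hello", "wor1d"], ["hello", "world"], [0.1, 3.0]
    )
    print(loss, weights)
```

A piecewise-linear ramp, rather than a hard threshold, is one plausible reason the paper can claim end-to-end training. Examples in the transition band still receive partial weight instead of being abruptly dropped.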