Few-shot action recognition, i.e., recognizing new classes given only a few examples, benefits from incorporating temporal information. Prior work either encodes such information in the representation itself and learns classifiers at test time, or obtains frame-level features and performs pairwise matching. We first evaluate a number of matching-based approaches using spatio-temporal backbones, compar...