This paper addresses the problem of object recognition given a set of images as input (e.g., multiple camera sources and video frames). Convolutional neural network (CNN)-based frameworks do not exploit these sets effectively: they process each pattern as observed and do not capture the underlying feature distribution, because they do not consider the variance within the set. To address this issue, we propose Grassmannian learning mutual subsp...