While transformers have begun to dominate many tasks in vision, applying them to large images remains computationally difficult. A major reason is that self-attention scales quadratically with the number of tokens, which, in turn, scales quadratically with the image size. On larger images (e.g., 1080p), over 60% of the network's total computation is spent solely on creating and applying attention matrices. We take a step toward solving this issue by introducing ...
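To make the scaling concrete, consider the standard scaled dot-product attention used in vision transformers (this is the generic formulation, not the method introduced here): for $N$ tokens of dimension $d$, attention materializes an $N \times N$ matrix, so cost grows quadratically in $N$; since $N$ grows with image area, doubling the image side length quadruples $N$ and increases the attention cost roughly sixteenfold.
\[
  \mathrm{Attn}(Q, K, V) \;=\; \operatorname{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d}}\right) V,
  \qquad Q, K, V \in \mathbb{R}^{N \times d},
\]
\[
  \text{time: } \mathcal{O}(N^{2} d), \qquad \text{memory: } \mathcal{O}(N^{2}) \text{ for the attention matrix alone.}
\]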