Classical multiple instance learning (MIL) methods often rest on the assumption that instances are independent and identically distributed (i.i.d.), hence neglecting potentially rich contextual information beyond individual entities. On the other hand, Transformers with global self-attention modules have been proposed to model the interdependencies among all instances. However, in this paper we question: Is r...