Abstract

This paper analyzes three formal models of Transformer encoders that differ in the form of their self-attention mechanism: unique hard attention (UHAT); generalized unique hard attention (GUHAT), which generalizes UHAT; and averaging hard attention (AHAT). We show that UHAT and GUHAT Transformers, viewed as string acceptors, can only recognize formal languages in the complexity class AC0, the class of languages recognizable by families of Boolean circuits of constant depth and polynomial size.
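
One standard way to state the AC0 membership condition, assuming the usual non-uniform circuit-family conventions (the precise formulation used later in the paper may differ in details), is:
\[
L \in \mathrm{AC}^0
\;\iff\;
\exists\, (C_n)_{n \ge 0}:\;
\mathrm{depth}(C_n) = O(1),\;
\mathrm{size}(C_n) = n^{O(1)},\;
\forall x \;\bigl( x \in L \iff C_{|x|}(x) = 1 \bigr),
\]
where each $C_n$ is a Boolean circuit over unbounded fan-in AND/OR gates and NOT gates, taking inputs of length $n$.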