This paper is concerned with producing high-level text reports and explanations of human activity in video from a single, static camera. The motivation is to enable surveillance analysts to maintain situational awareness despite large volumes of data. The scenario we focus on is urban surveillance, where the imaged person appears at medium or low resolution. The final output is text descrip...