Accurate and efficient video classification demands the fusion of multimodal information and the use of intermediate representations. Combining the two ideas into the same framework, we propose a probabilistic approach for video classification using intermediate semantic representations derived from the multi-modal features. Based on a class of bipartite undirected graphical models named harmon...