This paper presents a data-driven cluster sampling framework for parsing scene images into generic regions (such as the sky, mountain and water) and objects (such as cows, horses and cars). We adopt generative models for both generic regions and objects, thus their likelihood probabilities are comparable and are learned under a common information projection principle. The inference algorithm fo...