With the wide use of visual sensors in Internet Things (IoT) past decades, huge amounts images are captured people's daily lives, which poses challenges to traditional deep-learning-based image retrieval frameworks. Most such frameworks need a large amount annotated training data, expensive. Moreover, machines still lack human intelligence, as illustrated by fact that they pay less attention in...