End-to-end neural TTS has shown improved performance in speech style transfer. However, the improvement is still limited by the available training data in both target styles and speakers. Additionally, degenerated performance is observed when the trained model tries to transfer speech from a new speaker with an unknown, arbitrary style. In this paper, we propose an approach to seen and unseen style transfer on disjoint, multi-style datasets, i.e., datasets...