We propose a new approach to increase inference performance in environments that require a specific sequence of actions in order to be solved. This is, for example, the case in maze environments, where ideally an optimal path is determined. Instead of learning a policy for a single step, we want to learn a policy that can predict n actions in advance. Our proposed method, called policy horizon regression (PHR), uses knowledge of the environment sampled by A2C to learn an n-dimensional vector...