Deep learning frameworks optimize the computation graphs and intra-operator computations to boost inference performance on GPUs, while inter-operator parallelism is usually ignored. In this paper, a unified framework, AutoGraph, proposed obtain highly optimized in favor of parallel executions GPU kernels. A novel dynamic programming algorithm, combined with backtracking search, adopted explore ...