Abstract

We develop new theoretical results on matrix perturbation to shed light on the impact of architecture on the performance of a deep network. In particular, we explain analytically what deep learning practitioners have long observed empirically: the parameters of some architectures (e.g., residual networks, ResNets, and densely connected networks, DenseNets) are easier to optimize than others (e.g., convolutional networks, ConvNets). Building on our earlie...