In the neck of a two-stage object detection network, feature fusion is generally carried out in either a top-down or a bottom-up manner. However, two types of imbalance may exist: a model imbalance, and a gradient imbalance in the region of interest (RoI) extraction layer, both arising from changes in object scale. The deeper the network is, the more abstract the learned features are; that is to say, more semantic information can be extracted. … extracted image background, s...
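To make the two fusion directions concrete, here is a minimal sketch in Python. It is not the method of this work. It assumes single-channel feature maps stored as nested lists, adjacent pyramid levels that differ by a factor of 2 in resolution, and identity lateral connections (real necks such as FPN use 1x1 and 3x3 convolutions here). Nearest-neighbour upsampling and 2x2 max pooling are swapped in as the resampling operators. All function names are hypothetical. Top-down fusion, as in FPN, pushes semantic information from coarse levels to fine ones. Bottom-up fusion, as in PANet's path augmentation, pushes localisation detail from fine levels to coarse ones.

```python
from typing import List

# Hypothetical simplification: one channel, stored row-major as nested lists.
FeatureMap = List[List[float]]


def upsample2x(fmap: FeatureMap) -> FeatureMap:
    """Nearest-neighbour 2x upsampling: each cell becomes a 2x2 block."""
    out: FeatureMap = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def downsample2x(fmap: FeatureMap) -> FeatureMap:
    """2x2 max pooling with stride 2 (assumes even height and width)."""
    return [
        [max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
         for j in range(0, len(fmap[0]), 2)]
        for i in range(0, len(fmap), 2)
    ]


def add(a: FeatureMap, b: FeatureMap) -> FeatureMap:
    """Element-wise sum of two maps with the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def top_down(levels: List[FeatureMap]) -> List[FeatureMap]:
    """levels[0] is the finest (highest-resolution) map, levels[-1] the coarsest.
    Each finer level is summed with the upsampled, already-fused coarser level."""
    fused = [levels[-1]]
    for fmap in reversed(levels[:-1]):
        fused.insert(0, add(fmap, upsample2x(fused[0])))
    return fused


def bottom_up(levels: List[FeatureMap]) -> List[FeatureMap]:
    """Each coarser level is summed with the downsampled, already-fused finer level."""
    fused = [levels[0]]
    for fmap in levels[1:]:
        fused.append(add(fmap, downsample2x(fused[-1])))
    return fused
```

With a 4x4 map of ones, a 2x2 map of twos and a 1x1 map `[[10]]`, `top_down` returns a finest level filled with 13 (1 + 2 + 10), while `bottom_up` returns a coarsest level of `[[13]]`. The sums are the same, but they accumulate at opposite ends of the pyramid, which is why the choice of direction affects which scales receive the richer features.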