A scalable algorithm for solving compact banded linear systems on distributed memory architectures is presented. The proposed method factorizes the original system into a two-level hierarchy and solves it using parallel cyclic reduction in both distributed and shared memory. This approach has a smaller communication footprint across partitions than conventional algorithms involving data transposes or re-partitio...
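The sketch below illustrates the parallel cyclic reduction (PCR) kernel named above for the simplest compact banded case, a tridiagonal system. It rests on several assumptions. It emulates a single partition in a serial loop. It assumes a diagonally dominant matrix, so no pivoting is needed. It does not reproduce the two-level distributed/shared-memory hierarchy of the proposed method. The function name `pcr_solve` is hypothetical.

```python
from typing import List


def pcr_solve(a: List[float], b: List[float], c: List[float], d: List[float]) -> List[float]:
    """Solve a tridiagonal system by parallel cyclic reduction (serial emulation).

    Row i reads a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i].
    a[0] and c[n-1] are treated as zero. Assumes no zero pivots arise
    (e.g. a diagonally dominant matrix); this is an illustrative sketch only.
    """
    n = len(b)
    a, b, c, d = list(a), list(b), list(c), list(d)
    a[0] = 0.0
    c[n - 1] = 0.0
    s = 1  # current coupling stride: row i couples x[i-s], x[i], x[i+s]
    while s < n:
        na, nb, nc, nd = [0.0] * n, [0.0] * n, [0.0] * n, [0.0] * n
        # Every row update below is independent of the others, which is
        # what lets PCR assign one thread (or lane) per row in a parallel code.
        for i in range(n):
            lo, hi = i - s, i + s
            # Multipliers that eliminate x[i-s] and x[i+s] from row i
            alpha = -a[i] / b[lo] if lo >= 0 else 0.0
            gamma = -c[i] / b[hi] if hi < n else 0.0
            # New couplings now reach x[i-2s] and x[i+2s]
            na[i] = alpha * a[lo] if lo >= 0 else 0.0
            nc[i] = gamma * c[hi] if hi < n else 0.0
            nb[i] = (b[i]
                     + (alpha * c[lo] if lo >= 0 else 0.0)
                     + (gamma * a[hi] if hi < n else 0.0))
            nd[i] = (d[i]
                     + (alpha * d[lo] if lo >= 0 else 0.0)
                     + (gamma * d[hi] if hi < n else 0.0))
        a, b, c, d = na, nb, nc, nd
        s *= 2
    # Once the stride reaches n, every row is decoupled: b[i] * x[i] = d[i]
    return [d[i] / b[i] for i in range(n)]
```

Each pass doubles the coupling stride, so every row is decoupled after ceil(log2 n) passes. This gives PCR its short critical path, at the cost of O(n log n) total work compared with the O(n) work of sequential Thomas elimination.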