We demonstrate that parallel deterministic sample sort for many-core GPUs (GPU Bucket Sort) is not only considerably faster than the best comparison-based sorting algorithm for GPUs (Thrust Merge [Satish et al., Proc. IPDPS 2009]) but also as fast as randomized sample sort for GPUs (GPU Sample Sort [Leischner et al., Proc. IPDPS 2010]). However, deterministic sample sort has the advantage that ...