Traditional architectural approaches for increasing microprocessor performance rely on large, complex, highly speculative out-of-order cores to extract Instruction-Level Parallelism (ILP) from single-threaded applications. To achieve high performance, these designs employ a myriad of speculative techniques, ranging from branch prediction to load-latency prediction and memory-...