In recent years, many private corporations and government organizations have digitized corpora of legacy paper documents. Often, these organizations hope to use the digital representations to turn costly manual tasks associated with paper archives into less costly computer-assisted tasks. The most common approach to automated information extraction is inverted indexi...