In this paper we analyze a variety of standard techniques used in corpus linguistics and examine a number of issues relating to their incorrect usage, the computational infeasibility of various approaches, and the inevitable sampling errors. We motivate much of the paper with applications from information retrieval, but the techniques we discuss and introduce are far more widely applicable, and some a...