OBJECTIVE
Identify the lexical content of a large corpus of ordinary medical records to assess the feasibility of large-scale natural language processing.
METHODS
A corpus of 560 megabytes of medical record text from an academic medical center was broken into individual words and compared with the words in six medical vocabularies, a common word list, and a database of patient names. Unrecogn...
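The word-matching step in METHODS can be sketched as follows. This is a minimal illustration, not the authors' procedure. The regular-expression tokenizer, the lookup priority (vocabularies first, then the common word list, then patient names), the choice to count distinct word types rather than occurrences, and the names `tokenize`, `classify_tokens` and `vocab_a` are all hypothetical. In-memory sets stand in for the six vocabularies, the common word list, and the name database.

```python
import re
from collections import Counter

# Hypothetical tokenizer: lowercase alphabetic runs, allowing internal ' or -.
TOKEN_RE = re.compile(r"[a-z]+(?:['-][a-z]+)*")


def tokenize(text):
    """Break raw record text into lowercase word tokens."""
    return TOKEN_RE.findall(text.lower())


def classify_tokens(text, vocabularies, common_words, names):
    """Count distinct words by the first source that recognizes them.

    vocabularies: dict mapping a vocabulary label -> set of lowercase words
    common_words, names: sets of lowercase words
    The lookup priority used here is an assumption for illustration.
    """
    counts = Counter()
    unrecognized = set()
    for word in set(tokenize(text)):  # distinct word types only
        for label, vocab in vocabularies.items():
            if word in vocab:
                counts[label] += 1
                break  # stop at the first matching vocabulary
        else:  # runs only if no vocabulary matched
            if word in common_words:
                counts["common"] += 1
            elif word in names:
                counts["name"] += 1
            else:
                counts["unrecognized"] += 1
                unrecognized.add(word)
    return counts, unrecognized


if __name__ == "__main__":
    text = "Pt Smith has acute myocarditis; denies chest pain."
    vocabularies = {"vocab_a": {"myocarditis", "acute", "chest", "pain"}}
    counts, unknown = classify_tokens(
        text, vocabularies, {"has", "denies", "chest"}, {"smith"}
    )
    print(dict(counts), sorted(unknown))
```

In the usage example, "chest" appears in both the toy vocabulary and the common word list but is credited only to the vocabulary because of the assumed priority. The abbreviation "pt" falls through every list and is reported as unrecognized.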