Pretraining large neural language models, such as BERT, has led to impressive gains on many natural language processing (NLP) tasks. However, most pretraining efforts focus on general-domain corpora, such as newswire and Web text. A prevailing assumption is that even domain-specific pretraining can benefit by starting from general-domain language models. In this paper, we challenge this assumption by showing that for domains with abundant unlabeled text, such as biomedic...