Déjà vu - A study of duplicate citations in Medline
Abstract
MOTIVATION Duplicate publication impacts the quality of the scientific corpus, has been difficult to detect, and studies thus far have been limited in scope and size. Using text similarity searches, we were able to identify signatures of duplicate citations among a body of abstracts. RESULTS A sample of 62,213 Medline citations was examined, and a database of manually verified duplicate citations was created to study author publication behavior. We found that 0.04% of the citations with no shared authors were highly similar and are thus potential cases of plagiarism, while 1.35% of those with shared authors were sufficiently similar to be considered duplicates. Extrapolating from this sample, these rates would correspond to 3,500 and 117,500 duplicate citations in total, respectively. AVAILABILITY eTBLAST, an automated citation matching tool, and Déjà vu, the duplicate citation database, are freely available at http://invention.swmed.edu/ and http://spore.swmed.edu/dejavu
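The abstract does not describe how eTBLAST scores similarity, so the following is only a minimal sketch of the general idea of text-similarity screening for duplicate citations. It assumes a simple word-trigram Jaccard overlap, which is a swapped-in technique and not eTBLAST's algorithm, and a hypothetical cut-off of 0.5. The names `word_ngrams`, `jaccard_similarity`, and `flag_pair` are illustrative only.

```python
import re


def word_ngrams(text: str, n: int = 3) -> set:
    """Return the set of lowercase word n-grams (as tuples) in text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    # Texts shorter than n words yield an empty set.
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard_similarity(a: str, b: str, n: int = 3) -> float:
    """Jaccard overlap of the two texts' word n-gram sets, from 0.0 to 1.0."""
    grams_a, grams_b = word_ngrams(a, n), word_ngrams(b, n)
    union = grams_a | grams_b
    if not union:
        return 0.0
    return len(grams_a & grams_b) / len(union)


def flag_pair(a: str, b: str, threshold: float = 0.5) -> bool:
    """Flag a pair of abstracts for manual review.

    The 0.5 threshold is a hypothetical value, not one taken from the study.
    """
    return jaccard_similarity(a, b) >= threshold
```

For example, `jaccard_similarity(text, text)` returns 1.0, while two abstracts that share no three-word sequence score 0.0. In keeping with the study's manual verification step, a flagged pair is only a candidate; deciding whether it is a genuine duplicate still requires a human reader.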
Similar articles
Comparing Medline citations using modified N-grams
OBJECTIVE We aim to identify duplicate pairs of Medline citations, particularly when the documents are not identical but contain similar information. MATERIALS AND METHODS Duplicate pairs of citations are identified by comparing word n-grams in pairs of documents. N-grams are modified using two approaches which take account of the fact that the document may have been altered. These are: (1) d...
Déjà vu: a database of highly similar citations in the scientific literature
In the scientific research community, plagiarism and covert multiple publications of the same data are considered unacceptable because they undermine the public confidence in the scientific integrity. Yet, little has been done to help authors and editors to identify highly similar citations, which sometimes may represent cases of unethical duplication. For this reason, we have made available Dé...
Identifying duplicate publications: primum non nocere.
Errami and Garner published in the January 24, 2008 issue of Nature an article entitled “A Tale of Two Citations” that deals with the timely subject of publication of duplicate papers (1 ). With the increase in the number of scientific journals and the ongoing pressure to publish in them, inappropriate and unethical practices such as plagiarism, unauthorized “cosubmission” of a paper to two or ...
Identifying duplicate content using statistically improbable phrases
MOTIVATION Document similarity metrics such as PubMed's 'Find related articles' feature, which have been primarily used to identify studies with similar topics, can now also be used to detect duplicated or potentially plagiarized papers within literature reference databases. However, the CPU-intensive nature of document comparison has limited MEDLINE text similarity studies to the comparison of...
Copeptin response to clinical maximal exercise tests.
has been renamed “Update,” as some users expressed concern over the word “Duplicate.” With these changes in mind, we would like to emphasize that the ultimate purpose of the Déjà Vu database is to maintain the integrity of biomedical literature, a goal that can be achieved only by a thorough and accurate interpretation of the information contained within. We therefore extend to both the editors...
Journal title:
- Bioinformatics
Volume 24, Issue 2
Publication date: 2008