Data sets for author name disambiguation: an empirical analysis and a new resource
Similar resources
Author Name Disambiguation for PubMed
Log analysis shows that PubMed users frequently use author names in queries for retrieving scientific literature. However, author name ambiguity may lead to irrelevant retrieval results. To improve the PubMed user experience with author name queries, we designed an author name disambiguation system consisting of similarity estimation and agglomerative clustering. A machine-learning method was e...
Author Name Disambiguation Using a New Categorical Distribution Similarity
Author name ambiguity has been a long-standing problem which impairs the accuracy of publication retrieval and bibliometric methods. Most of the existing disambiguation methods are built on similarity measures, e.g., “Jaccard Coefficient”, between two sets of papers to be disambiguated, each set represented by a set of categorical features, e.g., coauthors and published venues. Such measures pe...
Merging error analysis of name disambiguation based on author similarity
Falsely identifying different authors as one is called merging error in the name disambiguation of coauthorship networks. Research on the measurement and distribution of merging errors helps to collect high quality coauthorship networks. In the aspect of measurement, we provide a Bayesian model to measure the errors through author similarity. We illustratively use the model and coauthor similar...
Reducing Fragmentation in Incremental Author Name Disambiguation
Author name ambiguity is a hard problem that occurs when several authors publish articles under the same name or when the same author publishes articles under different names. Traditionally, automatic disambiguation methods process the author names of all citation records in a repository. Aiming at efficiency, incremental methods disambiguate author names only when new citation records are inse...
Scaling Author Name Disambiguation with CNF Blocking
An author name disambiguation (AND) algorithm identifies a unique author entity record from all similar or same publication records in scholarly or similar databases. Typically, a clustering method is used that requires calculation of similarities between each possible record pair. However, the total number of pairs grows quadratically with the size of the author database making such clustering...
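The abstracts above mention comparing two sets of papers with a set-similarity measure such as the Jaccard coefficient over categorical features (coauthors, venues), and note that all-pairs comparison grows quadratically with the number of records. The following is a minimal illustrative sketch under stated assumptions, not the method of any of the listed papers: the record layout, the helper names `jaccard` and `record_similarity`, and the equal weighting of features are all hypothetical.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard coefficient: size of the intersection divided by size of the union.

    Assumption: defined here as 0.0 when both sets are empty.
    """
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def record_similarity(r1: dict, r2: dict, features=("coauthors", "venues")) -> float:
    """Average Jaccard score over the listed categorical features.

    Hypothetical choice: every feature is weighted equally.
    """
    scores = [jaccard(set(r1.get(f, ())), set(r2.get(f, ()))) for f in features]
    return sum(scores) / len(scores)


# Hypothetical publication records sharing an ambiguous author name.
records = [
    {"id": "p1", "coauthors": {"A. Smith", "B. Lee"}, "venues": {"Scientometrics"}},
    {"id": "p2", "coauthors": {"B. Lee", "C. Wong"}, "venues": {"Scientometrics"}},
    {"id": "p3", "coauthors": {"D. Patel"}, "venues": {"JCDL"}},
]

# All-pairs comparison: n records yield n*(n-1)/2 pairs, the quadratic
# growth that blocking methods try to avoid.
for r1, r2 in combinations(records, 2):
    print(r1["id"], r2["id"], round(record_similarity(r1, r2), 3))
```

With these toy records, p1 and p2 share one of three coauthors and their only venue, so their score is (1/3 + 1) / 2 ≈ 0.667, while both pairs involving p3 score 0.0. A clustering step would then merge records whose scores exceed some threshold.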
Journal
Journal title: Scientometrics
Year: 2017
ISSN: 0138-9130, 1588-2861
DOI: 10.1007/s11192-017-2363-5