Large Scale Subject Category Classification of Scholarly Papers With Deep Attentive Neural Networks
Abstract
Subject categories of scholarly papers generally refer to the knowledge domain(s) to which the papers belong, examples being computer science or physics. Subject category classification is a prerequisite for bibliometric studies, organizing scientific publications for domain knowledge extraction, and facilitating faceted searches for digital library search engines. Unfortunately, many academic papers do not have such information as part of their metadata. Most existing methods for solving this task focus on unsupervised learning that often relies on citation networks. However, a complete list of papers citing the current paper may not be readily available. In particular, new papers that have few or no citations cannot be classified using such methods. Here, we propose a deep attentive neural network (DANN) that classifies scholarly papers using only their abstracts. The network is trained using nine million abstracts from Web of Science (WoS). We also use the WoS schema that covers 104 subject categories. The proposed network consists of two bi-directional recurrent neural networks followed by an attention layer. We compare our model against baselines by varying the architecture and text representation. Our best model achieves a micro-F1 measure of 0.76, with F1 of individual subject categories ranging from 0.50 to 0.95. The results showed the importance of retraining word embedding models to maximize the vocabulary overlap, and the effectiveness of the attention mechanism. The combination of word vectors with TFIDF outperforms character- and sentence-level embedding models. We discuss imbalanced samples and overlapping categories and suggest possible strategies for mitigation. We also determine the subject category distribution in CiteSeerX by classifying a random sample of one million academic papers.
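The abstract reports a single micro-F1 (0.76) alongside a wide range of per-category F1 values (0.50 to 0.95). The sketch below illustrates how those two kinds of figures relate. It is a minimal illustration, not the authors' evaluation code, and it assumes single-label predictions. The function names and the toy category labels ("CS", "Physics", "Math") are hypothetical. Micro-averaging pools true positives, false positives, and false negatives across all categories before computing F1, so frequent categories dominate the score, while per-category F1 exposes weak, rare categories.

```python
from collections import Counter


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; defined as 0.0 when there are no true positives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def per_class_and_micro_f1(
    gold: list[str], pred: list[str]
) -> tuple[dict[str, float], float]:
    """Per-category F1 plus micro-F1, assuming one label per paper."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # category p was predicted but is wrong
            fn[g] += 1  # true category g was missed
    labels = sorted(set(gold) | set(pred))
    per_class = {c: f1(tp[c], fp[c], fn[c]) for c in labels}
    # Micro-averaging: pool counts over all categories, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return per_class, micro


# Toy usage with hypothetical labels (not data from the paper).
gold = ["CS", "CS", "CS", "CS", "Physics", "Math"]
pred = ["CS", "CS", "CS", "Physics", "Physics", "CS"]
per_class, micro = per_class_and_micro_f1(gold, pred)
# per-category F1: CS 0.75, Physics ~0.667, Math 0.0; micro-F1 ~0.667
print(per_class, micro)
```

In the single-label setting, micro-F1 reduces to accuracy. Each error adds exactly one false positive and one false negative, so pooled precision and recall are both equal to the fraction of correct predictions.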
Similar resources
Large-Scale YouTube-8M Video Understanding with Deep Neural Networks
The video classification problem has been studied for many years. The success of Convolutional Neural Networks (CNN) in image recognition tasks gives a powerful incentive for researchers to create more advanced video classification approaches. As video has temporal content, Long Short-Term Memory (LSTM) networks become a handy tool, allowing long-term temporal clues to be modeled. Both approaches need a large...
Large-Scale Text Classification with Recurrent Neural Networks
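Document classification has long been studied in natural language processing. Going beyond previous work based on convolutional neural networks, we performed document classification with recurrent neural networks, using Long Short-Term Memory (LSTM) networks and Gated Recurrent Units (GRU), which are known to perform best among recurrent networks. In the experiments, classification accuracy ranked in the order Multinomial Naive Bayes classifier, SVM, LSTM, CNN, GRU. From this, we conjecture that text document classification is closer to a problem of extracting and classifying document features than to one of modeling sequences. In addition, GRU...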
Skin Lesion Classification Using Deep Multi-scale Convolutional Neural Networks
Melanoma is a malignant tumour originating from melanocytes, the skin cells responsible for the production of melanin. The American Cancer Society estimates that in the United States alone, for 2017, more than 87,000 new melanoma cases will be diagnosed and around 9,300 persons are expected to die [1]. Skin melanoma lesions are very challenging to visually diagnose due to their similarity in vis...
Deep Fisher Networks for Large-Scale Image Classification
As massively parallel computations have become broadly available with modern GPUs, deep architectures trained on very large datasets have risen in popularity. Discriminatively trained convolutional neural networks, in particular, were recently shown to yield state-of-the-art performance in challenging image classification benchmarks such as ImageNet. However, elements of these architectures are...
Journal
Journal title: Frontiers in Research Metrics and Analytics
Year: 2021
ISSN: 2504-0537
DOI: https://doi.org/10.3389/frma.2020.600382