Finding Names in Trove: Named Entity Recognition for Australian Historical Newspapers
نویسندگان
چکیده
Historical newspapers are an important resource in humanities research, providing the source materials about people and places in historical context. The Trove collection in the National Library of Australia holds a large collection of digitised newspapers dating back to 1803. This paper reports on some work to apply named-entity recognition (NER) to data from Trove with the aim of supplying useful data to Humanities researchers using the HuNI Virtual Laboratory. We present an evaluation of the Stanford NER system on this data and discuss the issues raised when applying NER to the 155 million articles in the Trove archive. We then present some analysis of the results including a version published as Linked Data and an exploration of clustering the mentions of certain names in the archive to try to identify individuals.
منابع مشابه
تشخیص اسامی اشخاص با استفاده از تزریق کلمههای نامزد اسم در میدانهای تصادفی شرطی برای زبان عربی
Named Entity Recognition and Extraction are very important tasks for discovering proper names including persons, locations, date, and time, inside electronic textual resources. Accurate named entity recognition system is an essential utility to resolve fundamental problems in question answering systems, summary extraction, information retrieval and extraction, machine translation, video interpr...
متن کاملNamed Entity Recognition in Persian Text using Deep Learning
Named entities recognition is a fundamental task in the field of natural language processing. It is also known as a subset of information extraction. The process of recognizing named entities aims at finding proper nouns in the text and classifying them into predetermined classes such as names of people, organizations, and places. In this paper, we propose a named entity recognizer which benefi...
متن کاملImprovement of Chemical Named Entity Recognition through Sentence-based Random Under-sampling and Classifier Combination
Chemical Named Entity Recognition (NER) is the basic step for consequent information extraction tasks such as named entity resolution, drug-drug interaction discovery, extraction of the names of the molecules and their properties. Improvement in the performance of such systems may affects the quality of the subsequent tasks. Chemical text from which data for named entity recognition is extracte...
متن کاملبهبود شناسایی موجودیتهای نامدار فارسی با استفاده از کسره اضافه
Named entity recognition is a process in which the people’s names, name of places (cities, countries, seas, etc.) and organizations (public and private companies, international institutions, etc.), date, currency and percentages in a text are identified. Named entity recognition plays an important role in many NLP tasks such as semantic role labeling, question answering, summarization, machine ...
متن کاملLarge-scale refinement of digital historic newspapers with named entity recognition
Within the Europeana Newspapers project (www.europeana-newspapers.eu), full-text will be produced for over 10 million pages of digitised historical newspapers by applying Optical Character Recognition (OCR) and Optical Layout Recognition (OLR). In order to further increase the usability of the full-text, Named Entity Recognition (NER) is also applied to materials in Dutch, German and French lan...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2015