Internet as Corpus - Automatic Construction of a Swedish News Corpus
Abstract
This paper describes the automatic construction of a corpus of short Swedish news texts from the Internet, its applications, and its possible future uses. The corpus is intended for research on Information Retrieval, Information Extraction, Named Entity Recognition and Multi-Text Summarization. It was built with an Internet agent, the so-called newsAgent, which downloads Swedish news text from various sources. A small part of the corpus has been manually tagged with keywords and named entities. The newsAgent also serves as a workbench in the application Nyhetsguiden, where it processes the abundant flow of news texts and delivers them to users in a customized format.
Similar resources
Development of a Swedish Corpus for Evaluating Summarizers and other IR-tools
We present the construction of a Swedish corpus aimed at research on Information Retrieval, Information Extraction, Named Entity Recognition and Multi-Text Summarization. We also present the results of evaluating our Swedish text summarizer SweSum with this corpus. The corpus has been constructed by using Internet agents downloading Swedish newspaper text from various sources. A sma...
Prediction of intonation patterns of accented words in a corpus of read Swedish news
This paper describes an initial attempt at the construction of a data-driven model of Swedish intonation. The study is mainly concerned with model building and prediction of the intonation patterns of accented words in a corpus of read news in Swedish. Extraction of pitch information is achieved by performing a stylization of the pitch contours. The information is used to build a model for the ...
Prediction of intonation patterns of accented words in a corpus of read Swedish news through pitch contour stylization
This paper describes an initial attempt at the construction of a data-driven model of Swedish intonation. The study is mainly concerned with model-building and prediction of the intonation patterns of accented words in a corpus of read news in Swedish. Extraction of pitch information is achieved by performing a stylization of the pitch contours. The information is used to build a model for the ...
A Comparative Analysis of Institutional Identities in a Corpus of English and Persian News Interviews
Institutional identity as a concept in CDA is a field of study that deals with the identities that individuals in institutions obtain, one that merits deep research attention. News interviews as institutional instances can be analyzed based on the impersonal structures because interviewees see themselves as part of the institution and they may not take responsibility when they encounter problem...