classecol: Classifiers to understand public opinions of nature

Authors

Abstract

Ecology has become more transdisciplinary to better understand our environment. For example, ecosystem services reflect health, economic and cultural values (Kareiva et al., 2011), journals societies want study human relationships with nature (Gaston 2019; Society for Conservation Biology working groups, 2020). This shift brought the dimension of into focus, but human–nature largely falls outside traditional expertise an ecologist or conservationist, who may be unfamiliar available methods data. Social media could help us relationships. Historically, surveys (or other qualitative approaches) have assessed perceptions, often providing a detailed understanding person's thoughts. does not offer such detail, is cost-effective, less time-intensive offers enormous amounts information (Fox In 2020, social widely used in most countries, approximately half world's population (and increasing) being active users (Clement, captures many data types (e.g. text, photos, videos, sound interaction networks people) spatial representation temporal time series that allow holistic analyses (Toivonen 2019). recent years, use, diversity uses, analysis across environmental sciences rapidly increased (Ghermandi & Sinclair, been develop species distribution models (August 2020), measure aesthetic recreational (Graham Eigenbrod, Van Zanten 2016), track illegal wildlife trade (Di Minin 2018) determine role nature-based tourism (Hausmann 2017). The abundance availability on these platforms—many now 15 years old, open door research. Analyses revolutionise human—nature relationship how it impacts environment, this requires new improved tools There are approaches ‘mine’ opinions gain insights from text (Aggarwal Zhai, 2013). sentiment aims emotion classifying text's language use as negative, neutral positive (Liu, can done machine learning approaches, readily accessible approach interested ecologists conservationists would lexicon-based analysis. 
Lexicon-based assign scores words calculate average score passage, if negative used, will labelled negative. Overall effective describing sentiment, meaning unclear (Aldayel Magdy, Mohammad return two messages ‘It sad Pangolin vanishing’ ‘Pangolins bad’ (both language), failing recognise only second message indicates dislike pangolins. Furthermore, some lexicons, names ‘shark’) which bias results we (Lennox Stance alternative 2013; Liu, 2020; Srivastava Sahami, 2009), targeted towards assessing about topics specific questions. recognize pangolins example above, method time-consuming large training datasets alongside complex models. generality stance low. model was built detect fondness pangolins, limited species. So whilst gets far closer (relative analysis) user's opinion, useful, also need derived broad array themes, answer general pertinent With massive growth analysis, especially studies using look at people's perceptions 2019), there great To meet demand, present classecol cleaning, processing classification tool support public big setting. avoids interpretation issues specificity identify relevant texts, describe their type user produced text. provides proof concept guide encourage further development ecology, hope groups developing classifiers consider uploading them package—becoming formal contributors (see package vignette). classecol's 10 trained tested Twitter data, fall within three topics: Prior ten collection, developed base each following eight steps: (a) Defined protocol criteria must category What characteristics distinguish pro- against-hunting?). (b) Ensured accurately consistently protocol. (c) Seven individuals classified 1,100 texts topic (tweets hunting nature, provided descriptions bio) creating dataset 7,700 per topic. (d) Built six including multinomial logistic regression, vector machines, naïve Bayes, random forest, K nearest neighbour four-layer neural network. A regression then merge outputs generating ensemble classifier. 
(e) Tested performance identified cases misclassification refine criteria. (f) Corrected misclassified refined (g) Finalised (h) different cleaning options raw very clean text—see Table S1) maximised precision recall defined below). These steps Supporting Information: Developing classifiers. final protocol, categories four bio (one added during reclassification steps) Hunting Nature Bio We report F-score (Zhang Zhang, 2009) accuracy classifier, overall classifier (average weighted by proportional category). Accuracy measured independent sample, is, F = 1 perfect classification. had high (0.87–0.97) accuracies (Figure 1), except Irrelevant, where lower (0.64–0.72) driven low (0.54–0.61). Nearly Irrelevant were assigned wrong category. classifiers, ranged 0.82 0.92, moderate all Pro-nature (negative phrasing) Against-nature ‘full’ model. (0.67) (0.4), probably because represented 1.1% classifications. coverage make unreliable, explain why model, despite good Given finding, removed trimmed recommend over full Finally, models, 0.79 0.87, categories. All characterised Figures S6–S8. collection analyses, any research project involving opinion should legal ethical requirements—see Data rights ethics Information. functions groups: first group includes five value anyone natural processing. function comprehensive options, conversion common emoticons, abbreviations, slang environment-related hashtags readable valence detects presence terms alter, reverse amplify meaning. contract performs word stemming lemmatisation reduce term complexity consulting becomes consult). lang_eng non-English terms. senti_matrix pulls together 11 popular one function, produce matrix sentence. conjunction, assess you remove before running function. Our important component classecol. processed through Python backend, thus require downloading installing (we version 3.6). automatically R addeR::py_download (Johnson, 2021a). load_classecol downloads module dependencies. 
links backend needs run every environment loaded; modules downloaded once. hun_class, nat_class bio_class perform classifications hunting, respectively. clean(level “simple”) hun_class “full”)for nat_class, no required bio_class. indicators, well record vignette https://github.com/GitTFJ/classecol). contain multiple valuable scenarios 1). relevance identifies whether irrelevant classifies against-hunting runs both stance. Similarly, pro-nature phrasing combines both. low-accuracy category, caution. bio_class, person not, expert persons experts adds additional ‘Nature organisation’ Classifiers hierarchically followed stance) rather than combined computational little impact accuracies, stacked. explore public's USA, lang_eng, members bio_class(type “full”) hun_class(type “full”). When manually sample your so determined. suite processing, assist academics policy-makers exploring dimensions theme, in-turn value, extends beyond fields ecology conservation, scientists, geographers scientists evidence achieved inspire future (methods code openly available). Admittedly, costs supervised like lengthy datasets, laborious compile, mentioned earlier, lack generality. Whilst designed its accuracy) unknown. cautiously non-Twitter always (by human), tested. Despite hundreds scarcity comparison testing means representativeness remains unknown, error-prone. when human-classified expect opposing stances, tweets, Against-hunting tweets primarily scores, Pro-hunting scores. However, between overlap 2). Sentiment unable stances (lexicon-based polarity, infer meaning). ensure robustly sciences, pivotal frameworks developed. Big culturomics ecological conservation already reliant work science. Transdisciplinary key harnessing data's potential, careful testing. scrutiny onto next include potential classecol, knowledge publicly yet explored, growing community. authors thank R.C. reviewers feedback, M.G. name. T.F.J. 
thanks NERC (Natural Environment Research Council) Centre Doctoral Training studentship (J71566E), M.G.-S. Royal (IE160539) funding. reviewed M.G.-S.; T.F.J., L.D., G.D., T.F., B.M.H., H.K. N.P. datasets; prepared manuscript draft. critically manuscript. peer review history article https://publons.com/publon/10.1111/2041-210X.13596. conditions prevent sharing Code https://github.com/GitTFJ/classecol_dev located https://github.com/GitTFJ/classecol. static 0.4.0 archived Zenodo 2021b). Please note: publisher responsible content functionality supporting supplied authors. Any queries (other missing content) directed corresponding author article.
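
The lexicon-based scoring described in the abstract above (assign scores to words, average them over a passage, label the passage by the sign of the average) can be illustrated with a minimal Python sketch. This is not classecol code: the word scores in `TOY_LEXICON`, the function name `lexicon_sentiment`, and the choice to average only over words found in the lexicon are all assumptions made for illustration, and the two test sentences paraphrase the pangolin messages quoted in the abstract.

```python
import string

# Hypothetical toy lexicon; the scores are illustrative and do not come
# from any published sentiment lexicon.
TOY_LEXICON = {
    "sad": -1.0,
    "vanishing": -1.0,
    "bad": -1.0,
    "good": 1.0,
    "love": 1.0,
}


def lexicon_sentiment(text, lexicon=TOY_LEXICON):
    """Return (mean score, label) for a passage of text.

    Assumption: the mean is taken over the words found in the lexicon;
    a passage with no scored words is labelled neutral.
    """
    # Lowercase each word and strip surrounding punctuation.
    words = [w.strip(string.punctuation).lower() for w in text.split()]
    scores = [lexicon[w] for w in words if w in lexicon]
    if not scores:
        return 0.0, "neutral"
    mean = sum(scores) / len(scores)
    if mean < 0:
        label = "negative"
    elif mean > 0:
        label = "positive"
    else:
        label = "neutral"
    return mean, label


if __name__ == "__main__":
    for message in (
        "It is sad that the pangolin is vanishing",
        "Pangolins are bad",
    ):
        print(message, "->", lexicon_sentiment(message))
```

Both messages receive a negative label even though only the second expresses dislike of pangolins, which is the limitation that motivates stance classifiers in the abstract.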

Similar resources

The nature of public-private partnership

To explain public-private partnership, we must answer the following questions: what kind of legal contract is a public-private partnership? Is it governed by private law or by administrative law? And which outstanding features distinguish it from other contracts? To find the answer, we explain the concept of partnership, the definition of governmental contract...

Towards Simple, Easy-to-Understand, yet Accurate Classifiers

We design a method for weighting linear support vector machine classifiers or random hyperplanes, to obtain classifiers whose accuracy is comparable to the accuracy of a non-linear support vector machine classifier, and whose results can be readily visualized. We conduct a simulation study to examine how our weighted linear classifiers behave in the presence of known structure. The results show...

System Dynamics to Understand Public Information Technology

Introduction: Information technology development and implementation have been recognized as forms of organizational change (Doherty & King, 2003; Orlikowski, 2000). Public-sector organizations are interested in this process of change because of the expected benefits of using IT, such as cost savings, improved service quality, increased accountability, and public participation (Gil-Garcia & Helb...

Using N-Grams To Understand the Nature of Summaries

Although single-document summarization is a well-studied task, the nature of multi-document summarization is only beginning to be studied in detail. While close attention has been paid to what technologies are necessary when moving from single- to multi-document summarization, the properties of human-written multi-document summaries have not been quantified. In this paper, we empirically character...

To Understand Nature - Computer Modelling between Genetics and Evolution

We have presented the basic knowledge on the structure of molecules coding the genetic information, mechanisms of transfer of this information from DNA to proteins and phenomena connected with replication of DNA. In particular, we have described the differences of mutational pressure connected with replication of the leading and lagging DNA strands. We have shown how the asymmetric replication ...

Journal

Journal title: Methods in Ecology and Evolution

Year: 2021

ISSN: ['2041-210X']

DOI: https://doi.org/10.1111/2041-210x.13596