Natural Language Processing for Large Scale Analysis of Eczema and Psoriasis Social Media Comments

Abstract

Social media tools are widely used by dermatologic patients. Eczema and psoriasis, two of the most common inflammatory skin diseases, are well represented on the social media site Reddit. We used natural language processing to examine comments in the subreddits r/psoriasis and r/eczema (combined user base >187,000), tracking commenters’ interest level and sentiments related to treatments for psoriasis and eczema, as well as discussion of adverse drug reactions. All comments from 2014-2020 from r/eczema (n=196,571) and r/psoriasis (n=123,144) were retrieved and processed using natural language processing tools. In r/eczema, comment volume for antibacterial therapies, lifestyle changes, and prednisone decreased from 2014-2020, while phototherapy comment volume remained stable and dupilumab comment volume increased. In r/psoriasis, comment volume for newer therapeutics (including newer biologics and apremilast) increased after FDA approval, while comment volume for older therapies such as etanercept, adalimumab, and methotrexate decreased over time. Sentiment scores tended to decrease in the years following FDA approval. Among psoriasis treatments, calcipotriene and branded calcipotriene/betamethasone foam had the highest sentiment scores and apremilast had the lowest overall sentiment score. These analyses also identified changes in patient sentiment toward eczema and psoriasis treatments, suggesting an area for additional research.

Psoriasis and eczema are inflammatory skin diseases that can substantially impact patients’ quality of life (Falissard et al., 2020; Moberg et al., 2009). A wide range of topical, systemic, and nonpharmacologic therapeutic interventions are available for each of these conditions, but treatment regimens for chronic conditions can be time-consuming, difficult, or unpleasant. Studies show that noncompliance is common among patients with these conditions (Murage et al., 2018; Patel et al., 2017; Patel and Feldman, 2017). Patient counseling and frequency of visits correlate with compliance improvement (Heaton et al., 2013), but patients frequently consult information sources outside of the clinic, such as searches, social media, and health forums (Sunkureddi et al., 2018; Wu et al., 2020). Data from these sources informs clinicians about patient perceptions of disease and can identify knowledge gaps and contribute to more effective disease management (Sunkureddi et al., 2018).

Natural language processing (NLP) techniques have been used to study large quantities of unstructured survey response data, including assessing adult patients’ perception of their atopic dermatitis (Falissard et al., 2020). While patient perceptions have historically been researched primarily through survey studies, similar data are now available from thousands of communities on Reddit and a variety of forums. NLP methods can efficiently and effectively process these data and illuminate key insights into patient perception and disease management. As Buntinx-Krieg et al. reported, Reddit is a viable source of patient reports on disease (Buntinx-Krieg et al., 2017). Reddit is the 6th most popular website in the United States and is divided up into forums called subreddits. Several [...] are examined in the present study. NLP tools can be used to categorize text in this medium (Couto, 2019; Okon et al., 2020). The present study uses validated NLP tools to better understand these comments, with a focus on analysis over time and on the patient-reported adverse drug reactions discussed. This study demonstrates the use of an NLP methodology that can be applied to [...] topics in order to [...] community [...] relevant topics.

Reddit is an open platform with 430 million monthly active users. It hosts discussion of politics, sports, health, and other topics. Comments from each subreddit from 2013 [...] were obtained using the Pushshift API (Baumgartner et al., 2020). Each subreddit was analyzed individually using Python 3 (Van Rossum and Drake, 2009). The datasets were filtered by removal of empty or whitespace comments (comments containing no characters other than whitespace), comments containing the string “I am a bot” (automated messages), and deleted or removed comments (comments deleted by the user or removed by Reddit’s moderators after posting and no longer visible to the public). Comments containing only non-Unicode characters (such as some emojis and images) were also removed.

The post-filtering comments were preprocessed using NLP techniques. The text was made lowercase and punctuation was removed. One-letter words such as “I”, “a”, and isolated letters that were not part of a larger word were removed. Next, the text was tokenized, converting each comment into a list of words. Stop words were removed using the union of Gensim’s stop word list and the Natural Language Toolkit’s (NLTK) stop word list, along with “use” and “like” (Bird et al., 2009; Řehůřek and Sojka, 2010). Stop words do not carry much meaning and create noise in the data. In r/eczema and r/psoriasis, the title words “eczema” and “psoriasis”, respectively, were treated as stop words, based on the assumption that almost all comments would relate to the title condition regardless of whether the word was written in a particular comment. After stop words were removed, the remaining words were lemmatized using NLTK’s WordNet Lemmatizer (Bird et al., 2009). Lemmatization removes words’ inflectional endings and reduces words down to their infinitive or dictionary form.

A bag of words (BOW) model was created using a Gensim dictionary (Řehůřek and Sojka, 2010). The BOW model counts the number of occurrences of each word in a document. This way of representing text is easy to implement, but it does not capture the order of words. For example, “The president is good and will not fail” and “The president is not good and will fail” are both the same in the BOW model.
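The preprocessing and bag of words steps described above can be illustrated with a minimal sketch. It uses only the Python standard library rather than NLTK or Gensim, omits lemmatization, and relies on a tiny placeholder stop word list; the names `STOP_WORDS`, `preprocess`, and `bag_of_words` are hypothetical and are not taken from the study's code.

```python
import string
from collections import Counter

# Placeholder stop word list for illustration only. The study describes using
# the union of the Gensim and NLTK lists plus "use" and "like".
STOP_WORDS = {"the", "is", "and", "of", "to", "in", "use", "like"}


def preprocess(comment, extra_stop_words=()):
    """Lowercase, strip ASCII punctuation, tokenize on whitespace, then drop
    one-letter tokens and stop words. Lemmatization is not performed here."""
    text = comment.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = text.split()
    stop = STOP_WORDS | set(extra_stop_words)
    return [t for t in tokens if len(t) > 1 and t not in stop]


def bag_of_words(tokens):
    """Count occurrences of each token; word order is discarded."""
    return Counter(tokens)


if __name__ == "__main__":
    a = bag_of_words(preprocess("The president is good and will not fail"))
    b = bag_of_words(preprocess("The president is not good and will fail"))
    # Both sentences contain the same words, so their bags compare equal.
    print(a == b)
    # Subreddit title words can be passed in as additional stop words.
    print(preprocess("My eczema flares in winter", extra_stop_words=["eczema"]))
```

The two example sentences have opposite meanings but identical word counts, which is the limitation of the BOW representation noted in the text.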
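After the BOW was created, words that appeared fewer than 15 times or in more than 50% of documents were removed, as words with extremely high or low frequencies [...]. Following preprocessing and filtering, the 1000 words with the highest document frequency (number of documents that contain the word at least once) were reviewed manually and reduced to 100 words per subreddit based on relevance. Samples of comments were reviewed for authorship (e.g., first-person versus family member).

Regular expressions (RE) were used to count the number of comments that contained each of the selected words within the corresponding subreddit. RE searches were set to be case insensitive. In addition to being searched individually, words were searched in groups. Words such as the generic name, brand name, or description of the same medication were grouped together for analysis. For example, Dupixent and Dupilumab were combined into a group and the total number of comments containing Dupixent and/or Dupilumab was counted. In addition, searches were completed for each word individually. Matplotlib was used to create plots for each subreddit, showing the percentage of comments containing a specified word or group in each year during the study period (Hunter, 2007). Percentage was calculated by dividing the number of comments containing the RE pattern by the total number of comments in the same time periods.

Valence Aware Dictionary and sEntiment Reasoning (VADER) was used to extract the sentiment of comments mentioning each drug (Hutto and Gilbert, 2014). VADER is a sentiment analyzer which determines sentiment on a -1 to +1 scale, with -1 being the most negative sentiment possible, +1 the most positive, and 0 a completely neutral sentiment. It is based on a lexicon and on rules, including rules for negation. The sentiment of comments mentioning specific drugs was tracked over time, and the median sentiment scores of drugs used to treat the same condition were compared. In this study, sentiment scores ranged from -0.9981 to 0.9997.

Adverse drug reactions (ADRs) were identified by searching for comments containing a drug name and at least one term reflecting a known ADR of that drug. Sentiment scores were compared between groups using boxplots and Kruskal-Wallis rank sum tests. Confidence intervals for percentages were calculated as Wilson score intervals. Two-sided p-values < 0.05 were considered statistically significant. Analyses were done using Python (version 3.7) and R (version 4.1.2).

Datasets generated during the current study are available from the corresponding author on reasonable request.

In total, there were 123,144 comments in r/psoriasis and 196,571 comments in r/eczema. There were 24,759 unique commenters in r/eczema and 14,015 in r/psoriasis. The number of commenters steadily rose from 1091 in 2014 to 11,389 in 2020 in r/eczema, and from 1039 in 2014 to 5939 in 2020 in r/psoriasis.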
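The keyword counting described in the methods (case-insensitive RE search, grouping of generic and brand names, and percentage of comments per year) can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's code: the function name `keyword_share_by_year` and the `(year, text)` input layout are hypothetical.

```python
import re
from collections import defaultdict


def keyword_share_by_year(comments, terms):
    """Return {year: percentage of that year's comments matching any term}.

    comments: iterable of (year, text) pairs (assumed layout).
    terms: words counted together as one group, e.g. a generic and brand name.
    """
    if not terms:
        raise ValueError("at least one search term is required")
    # One alternation pattern for the whole group, searched case-insensitively.
    pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)
    totals = defaultdict(int)
    hits = defaultdict(int)
    for year, text in comments:
        totals[year] += 1
        if pattern.search(text):
            # A comment is counted once even if it contains several terms.
            hits[year] += 1
    return {year: 100.0 * hits[year] / totals[year] for year in sorted(totals)}


if __name__ == "__main__":
    sample = [
        (2017, "Started Dupixent last week"),
        (2017, "Bleach baths help"),
        (2018, "dupilumab cleared my skin"),
        (2018, "no change yet"),
    ]
    print(keyword_share_by_year(sample, ["dupixent", "dupilumab"]))
```

Because the pattern matches letters anywhere in the text rather than whole words, a search for "pain" also counts a comment containing "painful". Wrapping each term in `\b` word boundaries would restrict matches to whole words, at the cost of missing inflected forms.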
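The number of comments, commenters, and comments per commenter are shown in Table 1a and Table 1b.

Table 1a. Number of comments and commenters in r/eczema.

Year    # Comments    # Unique Commenters    # Comments Per Commenter
2014    4969          972                    5.11
2015    8894          1599                   5.56
2016    7858          1825                   4.31
2017    20807         3555                   5.85
2018    32725         5088                   6.43
2019    50802         7937                   6.40
2020    70516         11387                  6.19

Table 1b. Number of comments and commenters in r/psoriasis.

Year    # Comments    # Unique Commenters    # Comments Per Commenter
2014    5120          1038                   4.93
2015    7012          1418                   4.94
2016    8378          1569                   5.34
2017    11000         2149                   5.17
2018    20707         3190                   6.49
2019    32852         4631                   7.09
2020    33929         5916                   5.74

Proportion of comments containing keywords in r/eczema: Total comment volume grew from an initial 6665 in r/eczema and 5120 in r/psoriasis, increasing to 70,516 in r/eczema and 33,929 in r/psoriasis. The percentage of comments containing selected terms is shown over time (Figure 1). Terms related to bacteria (“staph” OR “infection”) and antibacterial therapies (“cider” OR “vinegar”, “bleach”, “antibiotic”) showed a trend toward lower comment volume. Prednisone and cyclosporine also trended downward over the study period (Figure 1b). Interest in dupilumab peaked in 2018, when 4% of comments contained “dupilumab” or “Dupixent”. Phototherapy remained proportionally stable. The dashed lines demonstrate discussion of the topic prior to approval [...] 2016. [...] 2017 (its approval), [...] introduced [...] 3% [...] 2018. Use of its brand name, “Dupixent”, became [...] used. Less notably, comments containing “steroid” went from 12.8% to 7.9%, and comments containing “moisturizer” or the alternative spelling “moisturiser” went from 5.1% to 3.5%. Probiotics saw an increase in 2015 and 2016; with that exception, comment volume was consistent for diet and stress terms (“stress”, “gluten”, “milk”, “diet”) (Figure 1c).

Volume of comments containing keywords in r/psoriasis: For each biologic, [...] determined [...] equivalent [...] to calculate the percentage. Medications approved during the study period, such as ixekizumab, guselkumab, and secukinumab, increased proportionately, coinciding with their approval (Figure 2a). Drugs such as etanercept and adalimumab demonstrated [...]. Despite [...] over time, [...] continued to have a higher comment volume than any other [...] in every year examined, [...] Ustekinumab [...] 2019 [...] biologics dropped [...]. Comment volume for “calcipotriol” and “Enstilar” has [...], whereas that for tacrolimus remains [...] (Figure 2b). Comments containing “steroid” went from 4.1% to 5.3%. Apremilast reached a peak of greater than 2% of comments and subsequently decreased (Figure 2c). Methotrexate [...] 2016 [...] 2.3% [...] slight drop [...].

To determine whether changing comment volume regarding a drug might be a simple reflection of usage in the population, prescription sales of [...] medications, [...] cyclosporine, [...] etanercept [were examined]. The number of prescriptions of prednisone, [...] for 2014-2019 was obtained from clincalc.com. Review did not show a correlation between prescriptions and conversation volume for the four medications examined. Notably, these medications are used for multiple medical conditions, and it is therefore not possible to attribute prescriptions solely to the indications examined here. Moreover, many factors other than prevalence of use may drive commentors’ interest, such as a newly approved drug, side effects, and usage.

To determine whether comments were accurately sorted into subreddits, [...] r/eczema.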
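A search for “Dupixent” or “dupilumab”, which refer to an eczema treatment, found 6346 comments in r/eczema, while the terms were mentioned in only 18 comments in r/psoriasis. Likewise, “Humira” or “adalimumab”, which refers to a psoriasis treatment, was seen in 8201 comments in r/psoriasis and 41 comments in r/eczema.

A manual review of 30 randomly selected comments from each subreddit was conducted to determine whether comments originated from users reporting personal experiences. From [...], 24 out of 30 comments (80%, 95% confidence interval 63%-91%) were describing personal experience. From [...], 22 out of 30 comments (73%, 95% confidence interval 56%-86%) were describing personal experience.

Median sentiment for comments containing “Dupixent” or “dupilumab” was 0.59 before [...] and 0.42 [...] (Figure 3a). Median sentiment for “Otezla” or “apremilast” went from 0.82 to 0.34 (Figure 3b). Median sentiment for “Taltz” or “ixekizumab” gradually decreased from 0.48 to 0.35 in 2019, then jumped back to 0.46 (Figure 3c). [...] “calcipotriene” [...] 0.80, [...] (Figure 4). [...] 1021 [...] guselkumab [...] exception [...] Enstillar [...] 161 [...] analyzed.

Potential ADRs were identified by performing RE searches for a drug name together with reaction terms for side effects (conjunctivitis with dupilumab use; nausea and diarrhea with apremilast use) listed in the package insert. [...] performed, ten [...] (Table 2). [...] self-reports of an ADR. 167 comments were found containing “Otezla” or “apremilast” and either “nause”, “vomit”, “throw up”, or “threw up”; “nause” was queried because it matches both nausea and nauseous. 73 comments contained Otezla or apremilast and diarrhea. 82 comments contained Dupixent or dupilumab and conjunctivitis.

Table 2. Potential Adverse Drug Reactions detected with RE: number of comments containing the Regular Expression patterns.

Subreddit      Drug Name                Reaction                                    # Comments
r/psoriasis    Otezla OR apremilast     nause OR vomit OR throw up OR threw up      167
r/psoriasis    Otezla OR apremilast     diarrhea                                    73
r/eczema       Dupixent OR dupilumab    conjunctivitis                              82

[...] quantity of comments on non-prescription treatments (Figure 5). “Aloe vera” [...] 0.2% [...] 0.32% [...] 0.24% in 2017, then stayed around 0.3% for the rest of the study period. “Tea tree oil” was relatively stable, with a maximum of 0.28% in 2015 and a minimum of 0.15% in 2019. “Turmeric” [...] 0.02% [...] 0.22% [...] 0.10% [...].

Average sentiment for comments containing apple cider vinegar or ACV was 0.47 in 766 comments, and for “bleach bath” was 0.40 in 1,876 comments. By comparison, average sentiment was 0.36 in 2,730 comments [...] and [...] for dupilumab/dupixent in 5,941 comments. [...] commonly [...] calendula, [...]. Out of 480 comments containing “apple cider vinegar”, 28 described a mild adverse effect such as irritation, [...] described an actual burn or lasting symptoms such as pain. 102 comments containing “calendula” were reviewed. Only [...] stinging, [...] events [...] observed.

Comments containing terms related to patient experience (itch, pain, sleep) were also counted. Words such as “painful” and “sleepy” are counted by this search, as the RE just looks for the letters, not the whole word. “Itch” appeared in 8.39% and 3.82% of comments, “pain” in 3.19% and 4.27%, and “sleep” in 2.72% and 1.18% [...].

This study provides insight into patient discussion of treatment options and self-reported side effects. The results suggest that users who [...] are knowledgeable [...] anticipated [...]. In most cases, increases and decreases in comment volume appear to parallel the degree to which treatments are favored by dermatologists and consensus [...]. Keyword volume for biologics and oral [...] approved [...] tends to rise rapidly and then plateau. In contrast, [...] ago, [...] become available, forum participants [...] acquire [...] lose [...]. Nonetheless, despite [...], indicating sustained [...] established [...] recent years, [...] lifestyle, [...] approaches.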
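The Wilson score intervals reported above for the manual review of 30 comments can be checked with a short calculation. The sketch below implements the standard Wilson score formula without continuity correction; the function name `wilson_interval` is hypothetical, and whether the study used exactly this variant is an assumption.

```python
import math


def wilson_interval(successes, n, z=1.959964):
    """Wilson score interval for a binomial proportion.

    successes: number of comments with the property of interest.
    n: number of comments reviewed.
    z: normal quantile (default corresponds to a two-sided 95% interval).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


if __name__ == "__main__":
    low, high = wilson_interval(22, 30)
    print(f"{low:.1%} to {high:.1%}")
```

For 22 of 30 comments this gives an interval of roughly 56% to 86%, in line with the figures quoted in the text. Unlike the simpler normal-approximation interval, the Wilson interval stays within 0 to 1 and behaves reasonably for small samples such as n = 30.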
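[...] reflect a true relative reduction [...] due to slightly different [...], possibly as the first agent for this indication. [...] moderate [...] 1% [...] very few comments exclusively referred to [...], likely due to the fact that [...] was already approved. It appears that [...] management recommendations [...] expected [...] understanding [...] recommended [...] explore [...] treatments.

Sentiment across drugs tends to be high [...], followed by [...] in subsequent years. Prior to approval of a drug, comments are focused on anticipation of perceived benefits, which may diminish with usage, even in cases where there is improvement over previous options. [...] they [...] increasingly [...] against [...] drugs, which may hamper enthusiasm. [...] topical [...] intermediate [...] injected [...].

Personal [...] potential [...] reaction. [...] revealed numerous potential ADRs. [...] isotretinoin [...] hundreds of comments mentioning mood. [...] diarrhea, [...] conjunctivitis [...] 68 [...] (though this is likely an underestimate given the lay [...] condition). [...] self-report [...] valuable [...] known, [...] surveillance [...] descriptions [...] help characterize [...] generally [...] big [...] rare [...] previously [...] clinical studies. Exploring [...] identification [...] aggregated [...] reported outcome measures offers [...] case.

The findings are limited and may not be generalizable to all individuals with these conditions. Individuals commenting may be more invested in disease management, not representing the population average. Additionally, users must have the technical [...] to participate in the forum. Although NLP allows [...], the unstandardized nature of comments has inherent limitations, such as uncaptured alternate spellings and word choices [...] transition [...] certain [...]. The [...] approach helped mitigate this limitation by identifying [...] terms. Another limitation is a distinct lack of transparent [...], limiting the ability to directly compare [...] volumes [...] metric [...] individual [...]. Any [...] relies on [...], with loss of the finer subtleties and meanings [...] examination. The BOW model captures words, but does not provide the full sentence or comment, thereby limiting interpretation of the intended [...]. A portion of comments, [...] random sample, [...] validate [...]. Future studies could combine [...] represent [...] obtain more detailed information.

This paper successfully [...] could [...] disease, [...]. Examination of [...] uniquely allows the physician or researcher [...] the patient’s own [...] without the restrictions of structured surveys or the bias of wishing to please the clinician or researcher. [...] interventions. Tools [...] highly [...] language, [...] trending [...] communicate with patients, paving the way for enhanced care in the future.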
Conceptualization: JAC, VEN; Data Curation: JAC; Formal Analysis: [...] GZ, [...]; Funding Acquisition: N/A; Investigation: [...]; Methodology: [...]; Project Administration: [...]; Resources: [...] GZ; Software: [...]; Supervision: [...]; Validation: [...]; Writing - Original Draft: [...]; Writing - Review & Editing: [...] VEN.

Primary data can be accessed at: https://github.com/JackCummins493/eczema_psoriasis.

IRB Approval: IRB approval was waived by the Mass General Brigham Institutional Review Board following a determination of Not Human Subjects Research (ID #514).

Journal

Journal title: JID Innovations

Year: 2023

ISSN: 2667-0267

DOI: https://doi.org/10.1016/j.xjidi.2023.100210