“It’s only words, and words are all I have”: Using Latent Text Analysis to Analyze Topics in Philippine Supreme Court Decisions

Author

  • Dominic J. Nardi
Abstract

In this paper, I employ a Latent Dirichlet Allocation (LDA) model to classify 20,227 judicial decisions issued by the Philippine Supreme Court from 1996 to 2012. I begin by introducing the Philippine Supreme Court, its jurisdiction, and significant controversies during this period. Next, I explain the problems that would arise from hand-coding these decisions. I then explain the LDA methodology, as well as the process for converting the decisions into data. After running the model, I present the resulting 36 topics and discuss potential substantive interpretations. Finally, I use the results to analyze how the Supreme Court’s docket has changed over time, paying particular attention to the identity of the chief justice. Ultimately, the results suggest that latent topic models could be used to determine whether a court is responsive to and engaged with broader political disputes.

As a field, comparative judicial politics relies heavily on our ability to classify and categorize judicial opinions by topic. Scholars often test inferences about judicial behavior on a subset of cases, such as “constitutional” or “civil rights” law. The central question underpinning the discipline’s entire research agenda is whether judges decide cases according to their policy preferences in certain issue areas. For example, in American judicial politics, attitudinalists frequently use the Supreme Court Database “issue” codes to predict the votes of liberal and conservative judges (Segal and Cover, 1989; Segal and Spaeth, 1996). In the comparative courts context, numerous models of judicial voting behavior include “issue” dummy variables to control for the ways in which the type of case influences judicial voting (Carroll and Tiede, 2011; Vanberg, 2001; Carrubba and Zorn, 2010). However, the reliability of such “issue” variables has come under increasing criticism on both substantive and methodological grounds.
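
The abstract mentions converting the decisions into data before fitting the model but does not spell out the pipeline. A minimal Python sketch of one common approach, a bag-of-words representation, follows. The stopword list, the `min_df` threshold, the function names, and the three sample sentences are all hypothetical illustrations, not the author’s actual preprocessing or data.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"the", "of", "and", "a", "to", "in", "is", "that", "for", "on", "by", "was"}


def tokenize(text: str) -> list[str]:
    """Lowercase, keep alphabetic runs, and drop stopwords and tokens of 1-2 letters."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS and len(t) > 2]


def build_vocab(tokenized_docs: list[list[str]], min_df: int = 2) -> dict[str, int]:
    """Assign an integer id to every word that appears in at least `min_df` documents."""
    doc_freq = Counter()
    for tokens in tokenized_docs:
        doc_freq.update(set(tokens))  # count each word once per document
    kept = sorted(w for w, df in doc_freq.items() if df >= min_df)
    return {w: i for i, w in enumerate(kept)}


def to_word_ids(tokenized_docs: list[list[str]], vocab: dict[str, int]) -> list[list[int]]:
    """Represent each document as a list of word ids, dropping out-of-vocabulary words."""
    return [[vocab[t] for t in tokens if t in vocab] for tokens in tokenized_docs]


# Three invented sentences standing in for decision texts (hypothetical).
decisions = [
    "The petitioner was convicted of illegal possession of firearms by the trial court.",
    "The petition for certiorari assails the conviction for estafa.",
    "The land registration court erred in granting the petition.",
]
tokenized = [tokenize(text) for text in decisions]
vocab = build_vocab(tokenized, min_df=2)
bow = to_word_ids(tokenized, vocab)
print(vocab)  # {'court': 0, 'petition': 1}
print(bow)    # [[0], [1], [0, 1]]
```

Dropping words that occur in only a few documents shrinks the vocabulary and removes idiosyncratic terms such as party names; for a corpus of 20,227 decisions the threshold would likely be set higher than 2.
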
Supervised hand-coding introduces the risk of bias and coding error. Even more importantly, the discipline still lacks a consensus on how exactly to categorize cases. Legal scholars tend to group cases by the law or legal provision at issue (e.g., the 14th Amendment; § 201 of the Uniform Commercial Code), while political scientists focus on fields of public policy (e.g., civil rights; contracts). Moreover, hand-coding often forces each case into a single “issue” category, overlooking the fact that many cases address multiple issues (Shapiro, 2009). There is also a risk of confirmation bias, as scholars may categorize cases in line with their own theoretical worldview rather than that of the judges they are supposedly studying (Harvey and Woodruff, 2011). In response to such methodological challenges, political scientists have begun to use unsupervised machine learning methods to uncover the latent structure in collections of textual documents (Grimmer, 2010; Gerrish and Blei, 2011; Rice, 2012). Latent text analysis identifies topic clusters from word frequencies in documents, so the topics are generated from the texts themselves rather than imposed by coder discretion. The Latent Dirichlet Allocation (LDA) model uses a Bayesian approach to estimate the posterior probability that a particular document falls within a particular topic cluster (Blei et al., 2003; Blei and McAuliffe, 2007). In addition to avoiding the time and resources required to hand-code documents, LDA minimizes coder discretion and can assign multiple topics to a single case.
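
To make the Bayesian machinery concrete, here is a minimal sketch of LDA fitted by collapsed Gibbs sampling, one common inference method for the model; Blei et al. (2003) fit it with variational inference, and the estimation procedure used in this paper is not stated here. Each document is a list of integer word ids. The function name `lda_gibbs`, the hyperparameter values for `alpha` and `beta`, and the toy corpus are assumptions for illustration. The returned `theta` gives each document a probability distribution over topics, which is how LDA can assign multiple topics to a single case.

```python
import random


def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative sketch, not optimized).

    docs: list of documents, each a list of integer word ids in [0, vocab_size).
    Returns (theta, phi): per-document topic proportions and per-topic word
    distributions, estimated from the final sample's counts.
    """
    rng = random.Random(seed)
    n_dk = [[0] * n_topics for _ in docs]               # tokens of topic k in doc d
    n_kw = [[0] * vocab_size for _ in range(n_topics)]  # times word w is assigned topic k
    n_k = [0] * n_topics                                # total tokens assigned topic k
    z = []                                              # current topic of every token

    # Random initialization of topic assignments.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    topics = range(n_topics)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Full conditional: p(z=j | rest) ∝ (n_dj + alpha) * (n_jw + beta) / (n_j + V*beta)
                weights = [
                    (n_dk[d][j] + alpha) * (n_kw[j][w] + beta) / (n_k[j] + vocab_size * beta)
                    for j in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                # Record the newly sampled assignment.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    theta = [[(n_dk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in topics]
             for d, doc in enumerate(docs)]
    phi = [[(n_kw[k][w] + beta) / (n_k[k] + vocab_size * beta) for w in range(vocab_size)]
           for k in topics]
    return theta, phi


# Hypothetical toy corpus: words 0-2 and words 3-5 never appear in the same document.
docs = [[0, 1, 2, 0, 1, 2], [1, 2, 0, 0, 2, 1], [3, 4, 5, 3, 4, 5], [4, 5, 3, 5, 4, 3]]
theta, phi = lda_gibbs(docs, n_topics=2, vocab_size=6)
for d, row in enumerate(theta):
    print(d, [round(p, 2) for p in row])  # each row is a distribution over both topics
```

Because documents 0-1 and 2-3 draw on disjoint vocabularies, each pair will typically concentrate on a different topic after sampling; the 36 topics reported in the paper would correspond to `n_topics=36` on the full corpus.
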

Similar resources

Persian text recognition using an n-gram language model and grammatical refinement

Abstract Text recognition has been one of the growing research topics in recent years. Many of these researches have focused on recognition of letters and sub-words as a basis for identifying larger text structures such as words, phrases and sentences. This thesis presents a new method in which the recognized sub-words are combined in order to provide meaningful words and sentences in Farsi tex...

A new method for automatic indexing and keyword extraction for information retrieval and text clustering

Persian words in writing with a diverse and cover all modes of grammatical words with the recruitment of a series of specific rules because it is impossible to extract keywords automatically from Persian texts difficult and complex. This thesis has attempted to use linguistic information and thesaurus, keywords Mnatry be provided. Using the symbol system is structured network can be keywords, i...

Latent Dirichlet Allocation for Text, Images, and Music

Latent Dirichlet Allocation (LDA) is an unsupervised, statistical approach to document modeling that discovers latent semantic topics in large collections of text documents. LDA posits that words carry strong semantic information, and documents discussing similar topics will use a similar group of words. Latent topics are thus discovered by identifying groups of words in the corpus that frequen...

The Supreme Court and public policy.

REVISED SYLLABUS (2/6/14) This course examines major Supreme Court decisions in light of constitutional doctrine and the public policy controversies at the time they were handed down. It then places them in broader historical context. The overarching thesis of this course is that the Supreme Court is a major player in just about every major public policy in the United States. Its decisions regu...

Latent Dirichlet Markov Allocation for Sentiment Analysis

In recent years probabilistic topic models have gained tremendous attention in data mining and natural language processing research areas. In the field of information retrieval for text mining, a variety of probabilistic topic models have been used to analyse content of documents. A topic model is a generative model for documents, it specifies a probabilistic procedure by which documents can be...

Publication date: 2012