Legal Documents Clustering using Latent Dirichlet Allocation
Abstract
At present, the availability of a large number of legal judgments in digital form creates opportunities and challenges for both the legal community and information technology researchers. This development calls for assistance in organizing, analyzing, retrieving, and presenting this content in a helpful and distributed manner. We propose an approach that clusters legal judgments based on the topics obtained from Latent Dirichlet Allocation (LDA), using a similarity measure between topics and documents. The resulting topic-based clustering model is capable of grouping legal judgments into different clusters effectively. To the best of our knowledge, this is the first approach to clustering Indian legal judgments using the LDA topic model.
General Terms: Document Clustering, Similarity Measure.
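The abstract does not specify which similarity measure is used, so the following is only a minimal sketch under stated assumptions. It assumes an LDA model has already been fitted and has produced topic-word distributions (the hypothetical `topics` list below, with invented words and weights). It then uses cosine similarity between each document's term-frequency vector and each topic's word distribution, and assigns the document to its most similar topic. The helper names `cosine` and `cluster_by_topic` and the toy data are illustrative and are not taken from the paper.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as {term: weight} dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty vector is treated as dissimilar to everything
    return dot / (norm_u * norm_v)

def cluster_by_topic(docs, topics):
    """Assign each tokenized document to the topic with the most similar word distribution.

    docs: list of token lists (one per judgment)
    topics: list of {word: probability} dicts, assumed to come from a fitted LDA model
    Returns {topic_index: [document_index, ...]}.
    """
    clusters = {k: [] for k in range(len(topics))}
    for doc_id, tokens in enumerate(docs):
        tf = Counter(tokens)  # raw term-frequency vector of the document
        scores = [cosine(tf, phi) for phi in topics]
        # Pick the most similar topic; ties go to the lowest topic index.
        best = max(range(len(topics)), key=lambda k: scores[k])
        clusters[best].append(doc_id)
    return clusters

# Hypothetical topic-word distributions (invented for illustration only).
topics = [
    {"property": 0.4, "land": 0.3, "lease": 0.3},
    {"accused": 0.4, "bail": 0.3, "evidence": 0.3},
]
docs = [
    ["land", "lease", "property", "court"],
    ["accused", "bail", "court", "evidence"],
    ["property", "land", "tenant"],
]
print(cluster_by_topic(docs, topics))  # {0: [0, 2], 1: [1]}
```

Raw counts keep the sketch short; a fuller pipeline might instead weight terms (for example with TF-IDF) or compare LDA document-topic proportions directly, and which variant the paper uses is not stated here.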
Similar sources
Document Representation Methods for Clustering Bilingual Documents
Globalization places people in a multilingual environment. There is a growing number of users to access and share information in several languages for public or private purpose. In order to deliver relevant information in different languages, efficient multilingual documents management is worthy of study. Generally, classification and clustering are two typical methods for documents management....
Document Clustering and Visualization with Latent Dirichlet Allocation and Self-Organizing Maps
Clustering and visualization of large text document collections aids in browsing, navigation, and information retrieval. We present a document clustering and visualization method based on Latent Dirichlet Allocation and self-organizing maps (LDA-SOM). LDA-SOM clusters documents based on topical content and renders clusters in an intuitive two-dimensional format. Document topics are inferred usin...
Topic Models For Feature Selection in Document Clustering
We investigate the idea of using a topic model such as the popular Latent Dirichlet Allocation model as a feature selection step for unsupervised document clustering, where documents are clustered using the proportion of the various topics that are present in each document. One concern with using “vanilla” LDA as a feature selection method for input to a clustering algorithm is that the Dirichl...
SMART Electronic Legal Discovery Via Topic Modeling
Electronic discovery is an interesting subproblem of information retrieval in which one identifies documents that are potentially relevant to issues and facts of a legal case from an electronically stored document collection (a corpus). In this paper, we consider representing documents in a topic space using the well-known topic models such as latent Dirichlet allocation and latent semantic in...
Single Document Keyphrase Extraction Using Sentence Clustering and Latent Dirichlet Allocation
This paper describes the design of a system for extracting keyphrases from a single document. The principle of the algorithm is to cluster sentences of the documents in order to highlight parts of text that are semantically related. The clusters of sentences, that reflect the themes of the document, are then analyzed to find the main topics of the text. Finally, the most important words, or grou...
Publication date: 2012