Generating Original Structure in Regulatory Documents

Authors

  • Steven Orla Kimbrough
  • Thomas Y. Lee
  • Balaji Padmanabhan
  • Yinghui Yang
Abstract

As technology and society continue to evolve, the corpus of government policies and procedures has more than kept pace. The U.S. Federal Tax Code today runs to over 2.8 million words, or 6,000 pages. There are more than 20,000 cross-references, both within the code itself and to external regulations. Navigating this sea of information is a daunting task for an IRS expert, let alone for a well-intentioned taxpayer or a policy-maker seeking to eliminate redundancies, internal inconsistencies, or loopholes. Likewise, as the new Department of Homeland Security takes shape, the need to manage the 300,000 words and 650 pages of the Immigration and Nationality Act becomes urgent.

While tools for tasks such as compliance checking or query answering have long held promise, automated reasoning, however intelligent, needs something to reason upon: a formalized knowledge base of some kind. AI in support of legal reasoning is no exception. It has been an enduring challenge to find ways of obtaining sufficiently structured documents. In other domains people may be the primary targets of knowledge engineering; in the policy realm, much of the requisite knowledge resides in legal and regulatory documents of various sorts.

Compromising severely in favor of brevity over accuracy, we may say that there have been three main approaches to extracting formalized knowledge from documents of legal and regulatory interest.

(1) Manually symbolize a relevant corpus. The approach here is to pick an appropriate formal representation language and manually (perhaps with computerized support) symbolize the essential information from the relevant collection of documents, typically statutes, administrative rules, and legal cases.

(2) Accept minimally structured documents. Under this approach, the documents in the chosen corpus are formalized only in a very weak sense, e.g., by creating inverted files of their terms and limiting inference to what Information Retrieval techniques can produce.

(3) Automatically structure a relevant corpus. Under this approach, pattern-finding programs extract structure from documents in a target corpus and create derived documents, such as in XML, which present their structures more transparently.

The three approaches (and their combinations) have complementary strengths and weaknesses. Manual symbolization affords the best prospect for deep and detailed automated inferencing and information recovery, yet it is the most labor-intensive option and presents serious problems of maintenance. Accepting minimally structured documents is the least expensive alternative and the least powerful in terms of potential to support inferencing. Automated structuring lies more or less between the other two approaches. The three may be thought of as defining an operating curve that trades off cost against inferential acuity: lower cost comes with weaker inference.

Are there any ways to shift the operating frontier rather than limiting ourselves to seeking the best compromise point for a given application? There are such ways. Our aim in this project is to explore one such family: change how documents are originally created, writing them in such a way that, from the initial draft, they have the requisite structure to support automated inferencing. Our objective is to automatically create documents that are equivalent in their expressiveness to documents that are manually coded ex post, as in (1) above. Alternatively, how might we generate semistructured documents that afford more complex reasoning than the automated structuring approaches, as in (3) above?

This project is therefore about a fourth approach to formalizing the knowledge in legal and regulatory documents: create the documents in a structured form rather than attempting to impose structure later. The intuition is to create these documents using formal sublanguages so that structure emerges as a by-product of the normal policy-making process. More specifically, in restricted domains of import to systems of law and government, it may be feasible to draft policies and regulations using formal, special-purpose (artificial) sublanguages and vocabularies. We proceed with a brief introduction to artificial sublanguages and then turn to their potential in drafting fully structured and semistructured documents that support automated reasoning.
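To make the fourth approach concrete, the following is a minimal sketch, not the authors' system. It assumes a hypothetical, very small controlled sublanguage in which a provision is drafted as a record with a subject, a modality, an action, and an optional condition. The class name `Rule`, its fields, the allowed modalities, and the sample provision are all illustrative assumptions. The point is only that a single structured draft can yield both the readable text and an XML form for automated reasoning, with no after-the-fact extraction step.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET

# Hypothetical vocabulary of the sublanguage (an assumption for illustration).
ALLOWED_MODALITIES = {"must", "may", "must not"}


@dataclass(frozen=True)
class Rule:
    """One provision drafted in a hypothetical controlled sublanguage."""
    rule_id: str
    subject: str
    modality: str
    action: str
    condition: str = ""

    def __post_init__(self) -> None:
        # Reject drafts that step outside the restricted vocabulary.
        if self.modality not in ALLOWED_MODALITIES:
            raise ValueError(f"unknown modality: {self.modality!r}")

    def to_text(self) -> str:
        """Render the provision as an ordinary English sentence."""
        sentence = f"{self.subject} {self.modality} {self.action}"
        if self.condition:
            sentence += f" if {self.condition}"
        return f"{self.rule_id}. {sentence[0].upper()}{sentence[1:]}."

    def to_xml(self) -> str:
        """Render the same provision as XML for downstream reasoning tools."""
        node = ET.Element("rule", id=self.rule_id, modality=self.modality)
        ET.SubElement(node, "subject").text = self.subject
        ET.SubElement(node, "action").text = self.action
        if self.condition:
            ET.SubElement(node, "condition").text = self.condition
        return ET.tostring(node, encoding="unicode")


# A made-up provision, drafted once in structured form.
rule = Rule("R1", "an applicant", "must", "file the form",
            "the applicant is a nonresident")
print(rule.to_text())
print(rule.to_xml())
```

Because the text and the XML are both derived from the same record, they cannot drift apart, which is the maintenance problem noted for manual symbolization in (1).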


Similar resources

Generating paper texture of historical documents using statistical moments

This paper presents a scheme for generating paper texture of historical documents. A new entropy based segmentation algorithm is used to decompose the image of documents into the image of the paper background and the printing of the document. Statistical analysis allows filling in the gaps from the printing, yielding a blank sheet of paper with similar texture to the original document.


Generating XML structure using examples and constraints

This paper presents a framework for automatically generating structural XML documents. The user provides a target DTD and an example of an XML document, called a Generate-XML-ByExample Document, or a GxBE document, for short. GxBE documents use a natural declarative syntax, which includes XPath expressions and the function count. Using GxBE documents, users can express important global and loca...


Visual Definition of Virtual Documents for the World-Wide Web

Trying to support the presentation of large amounts of heterogeneous data on the World-Wide Web normally results in relocating and restructuring the original data. Our approach avoids these disadvantages by generating metadata imposing an arbitrary logical structure on existing and new data. This paper proposes a new high-level visual language as a user-friendly means to control the process of ...


The Study of Effective Factors with Emphasis on Training for Employees' Empowerment in Center for Medical Documents in Social Security Organization

In an organization, human resources are known as a valuable and lasting capital. In order to get the most out of these resources, employees' empowerment appeared in the management literature. Empowerment refers to the delegation of authority in order to lay appropriate ground for self-motivation and self-efficacy among employees. The main aim of this study was to explore the effective factors, e...


Taking Advantage of Out-of-Corpus Information for Citation Network Clustering

In this paper we explore the use of several popular clustering and graph partitioning algorithms as a method of generating clusters of related scientific documents and suggest a simple graph augmentation technique for taking advantage of external information. We show that by hallucinating nodes for scientific documents that are cited but not present in the original dataset, we can improve perfo...



Publication date: 2003