Deciphering Undersegmented Ancient Scripts Using Phonetic Prior

نویسندگان

چکیده

Most undeciphered lost languages exhibit two characteristics that pose significant decipherment challenges: (1) the scripts are not fully segmented into words; (2) closest known language is determined. We propose a model handles both of these challenges by building on rich linguistic constraints reflecting consistent patterns in historical sound change. capture natural phonological geometry learning character embeddings based International Phonetic Alphabet (IPA). The resulting generative framework jointly models word segmentation and cognate alignment, informed constraints. evaluate deciphered (Gothic, Ugaritic) an one (Iberian). experiments show incorporating phonetic leads to clear gains. Additionally, we measure for closeness which correctly identifies related Gothic Ugaritic. For Iberian, method does strong evidence supporting Basque as language, concurring with favored position current scholarship. 1

برای دانلود باید عضویت طلایی داشته باشید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Deciphering ancient rapid radiations.

A deeper phylogenetic understanding of ancient patterns of diversification would contribute to solving many problems in evolutionary biology, yet many of these phylogenies remain poorly resolved. Ancient rapid radiations pose a major challenge for phylogenetic analysis for two main reasons. First, the pattern to be deciphered, the order of divergence among lineages, tends to be supported by sma...

متن کامل

A Computational Approach To Deciphering Unknown Scripts

We propose and evaluate computational techniques for deciphering unknown scripts. We focus on the case in which an unfamiliar script encodes a known language. The decipherment of a brief document or inscription is driven by data about the spoken language. We consider which scripts are easy or hard to decipher, how much data is required, and whether the techniques are robust against language cha...

متن کامل

A Computational Phonetic Model for Indian Language Scripts

In spite of South Asia being one of the richest areas in terms of linguistic diversity, South Asian languages have a lot in common. For example, most of the major Indian languages use scripts which are derived from the ancient Brahmi script, have more or less the same arrangement of alphabet, are highly phonetic in nature and are very well organised. We have used this fact to build a computatio...

متن کامل

Human-Robot Coordination Using Scripts

This paper describes an extension of scripts, which have been used to control sequences of robot behavior, to facilitate human-robot coordination. The script mechanism permits the human to both conduct expected, complementary activities with the robot and to intervene opportunistically taking direct control. Scripts address the six major issues associated with human-robot coordination. They all...

متن کامل

Using Scripts for Reactive Planning

We describe a model-based representation for real-time planning in technical processes. This means a complete model of causal and temporal dependencies is developed for an application. The main difference to other planners is that scripts, an event-oriented representation formalism is used and that goals and actions are part of the knowledge base in order to reason about old goals and already s...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

ژورنال

عنوان ژورنال: Transactions of the Association for Computational Linguistics

سال: 2021

ISSN: ['2307-387X']

DOI: https://doi.org/10.1162/tacl_a_00354