Text Prediction for Translators
Abstract
Demand for the services of translators is on the increase, and consequently so is the demand for tools to help them improve their productivity. This thesis proposes a novel tool intended to give a translator interactive access to the most powerful translation technology available: a machine translation system. The main new idea is to use the target text being produced as the medium of interaction with the computer. In contrast to previous approaches, this is natural and flexible, placing the translator in full control of the translation process, but giving the tool scope to contribute when it can usefully do so. A simple version of this idea is a system that tries to predict target text in real time as a translator types. This can aid by speeding typing and suggesting ideas, but it can also hinder by distracting the translator, as previous studies have demonstrated. I present a new method for text prediction that aims explicitly at maximizing a translator's productivity according to a model of user characteristics. Simulations show that this approach has the potential to improve the productivity of an average translator by over 10%.

The core of the text prediction method presented here is the statistical model used to estimate the probability of upcoming text. This must be as accurate as possible, but also efficient enough to support real-time searching. I describe new models based on the technique of maximum entropy that are specifically designed to balance accuracy and efficiency for the prediction application. These outperform equivalent baseline models used in prior work by about 50% according to an empirical measure of predictive accuracy, with no sacrifice in efficiency.

keywords: machine-assisted translation, text prediction, statistical translation models

TABLE OF CONTENTS

List of Figures
List of Tables

Chapter 1: Introduction
  1.1 TransType
  1.2 The Prediction Task
  1.3 Making Predictions
  1.4 Organization

Chapter 2: Target-Text Mediated Interactive Machine Translation
  2.1 Status and Prospects for Machine Translation
  2.2 Machine-Assisted Translation
  2.3 Target-Text Mediated IMT
  2.4 TransType

Chapter 3: An Efficient Translation Model
  3.1 Introduction
  3.2 Models
  3.3 Feature Selection
  3.4 Experiments
  3.5 Discussion
  3.6 Conclusion

Chapter 4: An Improved MEMD Translation Model
  4.1 Introduction
  4.2 Models
  4.3 Results
  4.4 Conclusion and Future Work

Chapter 5: Prediction for Translators
  5.1 Overview
  5.2 Estimating Prefix Probabilities
  5.3 User Model
  5.4 Search
  5.5 Experiments
  5.6 Conclusion and Future Work

Chapter 6: Conclusion and Future Work
  6.1 Contributions
  6.2 Future Work

References

Appendix A: Prediction Examples

Appendix B: Statistical Machine Translation
  B.1 The Noisy Channel
  B.2 Alignments
  B.3 Training
  B.4 Models 1, 2, and HMM
  B.5 Models 3, 4, and 5

Appendix C: Feature Selection in a French MEMD Language Model
  C.1 Introduction
  C.2 Maximum Entropy/Minimum Divergence Model
  C.3 Feature Selection
  C.4 Training
  C.5 Experiments
  C.6 Conclusion

LIST OF FIGURES

1.1 Screen dump of the TransType prototype. The source text is shown in the top half of the screen, and the target text is typed in the bottom half, with suggestions given by the menu at the cursor position.
1.2 Example of a prediction for English to French translation. s is the source sentence, h is the part of its translation that has already been typed, x is what the translator wants to type, and x̂ is a prediction. The surrounding contexts S and H from which these sentences are drawn are omitted.
3.1 Algorithm for IBM1 gains. freq_s(s) gives the number of times s occurs in s.
3.2 MEMD performance versus number of features for various feature-selection methods.
3.3 Performance of the linear model versus number of IBM1 parameters.
4.1 Example of an English to French translation generated by the MEMD model of chapter 3.
4.2 MEMD2B partition search path, beginning at the point (10, 10). Arrows out of each point show the configurations tested at each iteration.
4.3 Validation corpus perplexities for various MEMD2B models. Each connected line in this graph corresponds to a vertical column of search points in figure 4.2.
5.1 Example of a prediction for English to French translation. s is the source sentence, h is the part of its translation that has already been typed, x is what the translator wants to type, and x̂ is a prediction. The surrounding contexts S and H are omitted.
5.2 Agreement between model and empirical probabilities. Points on the observed curve represent the proportion of times ŵ = argmax_w p(w|h, s) was correct among all ŵ to which the model assigned a probability in the interval surrounding the point on the x axis. The total rel freq curve gives the number of predictions in each interval over the total number of predictions made.
5.3 Probability that a prediction will be accepted versus its gain.
5.4 Time to read and accept proposals versus their length.
5.5 Time to read and reject proposals versus their length.
B.1 One possible alignment between the English text on the left and the French text on the right (for French to English translation). French words connected to the null word are shown without lines.
C.1 Performance of MI and gain feature selection methods. Each point represents the performance over the test corpus of a MEMD model using a feature set of the given size.

LIST OF TABLES

3.1 Corpus segmentation. The held-out segment was used to train interpolation coefficients for the trigram and the combining weights for the overall linear model; the train segment was used for all other training tasks.
3.2 Top 10 word pairs for each feature-selection method.
3.3 Comparison of model performances.
4.1 Corpus segmentation. The train segment was the main training corpus; the held-out 1 segment was used for combining weights for the trigram and the overall linear model; and the held-out 2 segment was used for the MEMD2B partition search.
4.2 Model performances. Linear interpolation is designated with a + sign, and the MEMD2B position parameters are given as m×n, where m and n are the numbers of position partitions and word-pair partitions respectively.
5.1 Approximate times in seconds to generate predictions of maximum word sequence length M, on a 1.2GHz processor, for MEMD and 3G+IBM models.
5.2 Rational simulation, using the MEMD2B model. Numbers give estimated percent reductions in keystrokes.
5.3 RationalReader simulation, using the MEMD2B model. Numbers give estimated percent reductions in keystrokes.
5.4 Realistic simulations, using the MEMD2B model (top table) and the 3G+IBM2 model (bottom table). Numbers give estimated percent reductions in keystrokes.
5.5 Realistic simulations using the MEMD2B model, with corrected probability estimates. Numbers give estimated percent reductions in keystrokes.
5.6 Number of proposals by length, for the Realistic simulation with the MEMD2B model and M = 5.
C.1 Performance of Printz's algorithm on a set of trigger features selected randomly from among frequent words, over a 390k-word training corpus. Features are ordered by decreasing true gain, calculated by training, for each feature, a single-feature model with a reference trigram distribution, to get the optimum weight; the table also gives the approximated weight and the approximated gain.
C.2 Corpus division; note that the four segments shown are contiguous and in chronological order.
C.3 Perplexities of the trigram reference distribution p and its components on different segments of the corpus. The rise in perplexity with "distance" from block A reflects the chronological nature of the Hansard. The perplexities of B, C, and test for the empirical models are not shown because they are infinite.
C.4 Top ten trigger pairs for MI and gain ranking methods. (The angle brackets are French quotation marks, rendered incorrectly by LaTeX.)
C.5 Average and standard deviation of perplexity over files in the test corpus, for top MI- and gain-ranked feature sets of the given sizes. The marked column reflects the differences in perplexities between each MI model and the corresponding gain model.
C.6 Average and standard deviation of perplexity over files in the test corpus, for each model listed. The numbers beside the cache models indicate the window length L, and the numbers beside the MEMD models indicate the number of features. The marked column reflects the differences in perplexities between each model and the reference trigram.

ACKNOWLEDGMENTS

Thanks to Marc Dymetman for supplying the roots of many of these ideas long ago, Pierre Isabelle for creating the wonderfully utopic research atmosphere in which they flourished, Elliott Macklovitch for many insights and inspirations, and Guy Lapalme for the incredible energy, perseverance, and dedication with which he stuck to the task of keeping this thesis on the rails. Thanks also to Philippe Langlais for being a fantastic collaborator on the TransType project, and to all other members of the RALI team for their support. Finally, thanks to sundry giants. If I have not seen as far as some, it is not because I have not occasionally managed to make it onto the shoulders of the odd giant to enjoy the view for a few moments before losing my balance and tumbling down again.
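The prediction task described in the figure captions above can be illustrated with a toy next-word predictor: given a source sentence s and the target prefix h typed so far, choose ŵ = argmax_w p(w|h, s). The sketch below is not the thesis implementation; the lexicon, bigram table, vocabulary, and probabilities are all invented for illustration, and the simple linear mixture of a language-model score and an IBM1-style lexical score only loosely mirrors the 3G+IBM linear models mentioned in the tables.

```python
# Toy sketch (invented data): pick the next target word given the source
# sentence and the translation prefix typed so far.

# Hypothetical IBM1-style lexical table: p(target_word | source_word).
LEX = {
    "left": {"gauche": 0.7, "parti": 0.2},
    "the": {"la": 0.4, "le": 0.4},
    "he": {"il": 0.8},
}

# Hypothetical bigram continuation model for the target language.
BIGRAM = {
    ("est",): {"parti": 0.4, "gauche": 0.05},
}

def lexical_score(w, source_words):
    # IBM1-style: average translation probability over the source words.
    return sum(LEX.get(s, {}).get(w, 1e-6) for s in source_words) / len(source_words)

def predict_next(source_words, prefix, vocab, weight=0.5):
    # Linear combination of language-model and translation scores.
    context = tuple(prefix[-1:])
    def score(w):
        lm = BIGRAM.get(context, {}).get(w, 1e-6)
        return weight * lm + (1 - weight) * lexical_score(w, source_words)
    return max(vocab, key=score)

# Source "he left", prefix "il est": the lexical evidence for "gauche" is
# outweighed by the bigram evidence for "parti".
print(predict_next(["he", "left"], ["il", "est"], ["parti", "gauche", "la"]))  # → parti
```

In the thesis the analogous search is over multi-word continuations and must run in real time, which is why model efficiency is weighted so heavily.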
Similar Articles
User-Friendly Text Prediction For Translators
Text prediction is a form of interactive machine translation that is well suited to skilled translators. In principle it can assist in the production of a target text with minimal disruption to a translator’s normal routine. However, recent evaluations of a prototype prediction system showed that it significantly decreased the productivity of most translators who used it. In this paper, we anal...
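The productivity effects discussed in this abstract are commonly quantified with a keystroke-savings simulation: a simulated user accepts a proposal whenever it matches the upcoming text, and otherwise types one character. Below is a minimal sketch of that measure; the completion table stands in for a real prediction model, and the example text and the one-keystroke acceptance cost are assumptions for illustration.

```python
# Toy keystroke-savings simulation (invented data and proposer).

def simulate(target, propose):
    """Return percent reduction in keystrokes versus typing every character."""
    typed = 0
    pos = 0
    while pos < len(target):
        proposal = propose(target[:pos])
        if proposal and target.startswith(proposal, pos):
            pos += len(proposal)
            typed += 1          # one keystroke to accept the proposal
        else:
            pos += 1
            typed += 1          # type one character by hand
    return 100.0 * (len(target) - typed) / len(target)

# Hypothetical proposer: completes the current word from a fixed table.
COMPLETIONS = {"bonj": "our", "mon": "de"}

def propose(prefix):
    for key, completion in COMPLETIONS.items():
        if prefix.endswith(key):
            return completion
    return ""

print(round(simulate("bonjour monde", propose), 1))  # → 23.1
```

Real evaluations, like the ones cited here, must also account for the time spent reading and deciding on proposals, which is why measured productivity can drop even when keystrokes are saved.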
TransType: Text Prediction for Translators
Text prediction is a novel form of interactive machine translation that is well suited to skilled translators. It has the potential to assist in several ways: speeding typing, suggesting possible translations, and averting translator errors. However, recent evaluations of a prototype prediction system showed that predictions can also distract and hinder translators if made indiscriminately. We ...
Analysis and Prediction of Unalignable Words in Parallel Text
Professional human translators usually do not employ the concept of word alignments, producing translations ‘sense-forsense’ instead of ‘word-for-word’. This suggests that unalignable words may be prevalent in the parallel text used for machine translation (MT). We analyze this phenomenon in-depth for Chinese-English translation. We further propose a simple and effective method to improve autom...
Uncertainty and Uncertainty Management in EFL Translators
This study tried to examine EFL translators’ uncertainty and uncertainty management strategies through employing think-aloud procedures. The participants of this study were some MA and BA translators selected from several universities in Iran. To this aim, a proficiency test was firstly administered among the volunteers. Then, think-aloud protocol and retrospective interview were used to collect...
Strategies Available for Translating Persian Epic Poetry: A Case of Shahnameh
This study tried to find the strategies applied in three English translations of the Battle of Rostam and Esfandiyar. To this aim, the source text (ST) was analyzed verse by verse with each verse being compared with its English translations to determine what procedures the translators had used to render the source text. Subsequently, the frequency of usage for each procedure was measured ...
Detecting Cross-Lingual Plagiarism Using Simulated Word Embeddings
Cross-lingual plagiarism (CLP) occurs when texts written in one language are translated into a different language and used without acknowledging the original sources. One of the most common methods for detecting CLP requires online machine translators (such as Google or Microsoft translate) which are not always available, and given that plagiarism detection typically involves large document com...