Database-Driven Mathematical Character Recognition

Authors

  • Alan P. Sexton
  • Volker Sorge
Abstract

We present an approach for recognising mathematical texts using an extensive LaTeX symbol database and a novel recognition algorithm. The process consists essentially of three steps: recognising the individual characters in a mathematical text by relating them to glyphs in the database of symbols, analysing the recognised glyphs to determine the closest corresponding LaTeX symbol, and reassembling the text by putting the appropriate LaTeX commands at their corresponding positions of the original text inside a LaTeX picture environment. The recogniser itself is based on a novel variation on the application of geometric moment invariants. The working system is implemented in Java.
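
The three steps in the abstract can be illustrated with a small sketch. It is not the authors' recogniser: the abstract does not describe their novel variation, so the code below falls back on the first two classical Hu moment invariants, and it is written in Python rather than the Java of the working system. The toy glyph database, the function names (`hu_features`, `closest_symbol`, `to_picture`) and the coordinates are all assumptions made for illustration.

```python
import math

def raw_moment(img, p, q):
    """Geometric moment m_pq of a binary image given as rows of 0/1."""
    return sum((x ** p) * (y ** q) * v
               for y, row in enumerate(img)
               for x, v in enumerate(row))

def hu_features(img):
    """First two classical Hu invariants (an assumed stand-in for the
    paper's own moment-based features, which the abstract does not give)."""
    m00 = raw_moment(img, 0, 0)
    cx = raw_moment(img, 1, 0) / m00   # centroid gives translation invariance
    cy = raw_moment(img, 0, 1) / m00

    def eta(p, q):
        # Central moment, normalised by area for scale invariance.
        mu = sum(((x - cx) ** p) * ((y - cy) ** q) * v
                 for y, row in enumerate(img)
                 for x, v in enumerate(row))
        return mu / m00 ** (1 + (p + q) / 2)

    e20, e02, e11 = eta(2, 0), eta(0, 2), eta(1, 1)
    return (e20 + e02, (e20 - e02) ** 2 + 4 * e11 ** 2)

# Hypothetical toy database: LaTeX command -> reference glyph bitmap.
DATABASE = {
    "-": [[1, 1, 1, 1, 1]],
    "+": [[0, 1, 0], [1, 1, 1], [0, 1, 0]],
    r"\blacksquare": [[1, 1, 1], [1, 1, 1], [1, 1, 1]],  # needs amssymb
}

def closest_symbol(glyph, database):
    """Return the command whose reference glyph is nearest in feature space."""
    f = hu_features(glyph)

    def distance(cmd):
        g = hu_features(database[cmd])
        return math.hypot(f[0] - g[0], f[1] - g[1])

    return min(database, key=distance)

def to_picture(placed, width, height):
    """Emit a LaTeX picture environment from (x, y, command) triples."""
    lines = [r"\begin{picture}(%d,%d)" % (width, height)]
    for x, y, cmd in placed:
        lines.append(r"\put(%d,%d){$%s$}" % (x, y, cmd))
    lines.append(r"\end{picture}")
    return "\n".join(lines)

# A thicker, longer bar and a shifted plus sign, as if cut from a scanned page.
bar = [[1] * 10, [1] * 10]
plus = [[0, 0, 0, 0, 0],
        [0, 0, 0, 1, 0],
        [0, 0, 1, 1, 1],
        [0, 0, 0, 1, 0]]
placed = [(10, 20, closest_symbol(bar, DATABASE)),
          (30, 20, closest_symbol(plus, DATABASE))]
print(to_picture(placed, 100, 50))
```

Because these invariants are also insensitive to rotation, a horizontal and a vertical bar would receive identical features, which is one reason a practical recogniser needs more than this sketch provides.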

Similar resources

Towards a Parser for Mathematical Formula Recognition

For the transfer of mathematical knowledge from paper to electronic form, the reliable automatic analysis and understanding of mathematical texts is crucial. A robust system for this task needs to combine low level character recognition with higher level structural analysis of mathematical formulas. We present progress towards this goal by extending a database-driven optical character recogniti...

A Database of Glyphs for OCR of Mathematical Documents

Automatic document analysis tools for mathematical texts are necessary to enlarge the pool of mathematical knowledge available in electronic form. However, development of such tools is currently hindered by the weakness of optical character recognition systems in dealing with the large range of mathematical symbols and the often subtle but important distinctions in font usage in mathematical te...

Extraction of Logical Structure from Articles in Mathematics

We propose a mathematical knowledge browser which helps people to read mathematical documents. With the browser, printed mathematical documents can be scanned and recognized by OCR (Optical Character Recognition). Then the meta-information (e.g. title, author) and the logical structure (e.g. section, theorem) of the documents are automatically extracted. The purpose of this paper is to show the ex...

Trie-Lexicon-Driven On-line Handwritten Japanese Disease Name Recognition

This paper describes a lexicon-driven approach to on-line handwritten Japanese disease name recognition using a time-synchronous method. A trie lexicon is constructed from a disease name database containing 21,713 disease name phrases. It expands the search space using a time-synchronous method and applies a beam search strategy to search the segmentation candidate lattice constructed based on prim...

Combining Prediction and Recognition to Improve On-Line Mathematical Character Recognition

This paper describes methods to increase the accuracy of mathematical handwriting analysis by using context information. Our approach is based on the assumption that likely expression continuations can be derived from a database of mathematical expressions and can then be used to rank the candidates of isolated symbol recognition. We present how predicted continuations for an expression are de...

Publication date: 2005