Understanding document images (e.g., invoices) is a core but challenging task, since it requires complex functions such as reading text and a holistic understanding of the document. Current Visual Document Understanding (VDU) methods outsource text reading to off-the-shelf Optical Character Recognition (OCR) engines and focus on the understanding task using the OCR outputs. Although such OCR-based approaches have shown promising performance, they suffer from 1...