X-Align++: cross-modal cross-view alignment for Bird’s-eye-view segmentation

Authors

Abstract

The bird’s-eye-view (BEV) grid is a typical representation for perceiving road components, e.g., drivable area, in autonomous driving. Most existing approaches rely on cameras only to perform segmentation in BEV space, which is fundamentally constrained by the absence of reliable depth information. The latest works leverage both camera and LiDAR modalities but suboptimally fuse their features using simple, concatenation-based mechanisms. In this paper, we address these problems by enhancing the alignment of the unimodal features in order to aid feature fusion, as well as the alignment between the cameras’ perspective view (PV) and BEV representations. We propose X-Align, a novel end-to-end cross-modal and cross-view learning framework for BEV segmentation consisting of the following components: (i) a Cross-Modal Feature Alignment (X-FA) loss, (ii) an attention-based Cross-Modal Feature Fusion (X-FF) module to align multi-modal BEV features implicitly, and (iii) an auxiliary PV segmentation branch with Cross-View Segmentation Alignment (X-SA) losses to improve the PV-to-BEV transformation. We evaluate our proposed method across two commonly used benchmark datasets, i.e., nuScenes and KITTI-360. Notably, X-Align significantly outperforms the state-of-the-art by 3 absolute mIoU points on nuScenes. We also provide extensive ablation studies to demonstrate the effectiveness of the individual components.
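To make the idea of a cross-modal feature alignment loss more concrete, the sketch below penalizes disagreement between camera and LiDAR features at each BEV grid cell. The abstract does not state the exact form of the X-FA loss, so the cosine-similarity formulation, the function names `cosine_similarity` and `alignment_loss`, and the flat list-of-cells layout are all assumptions made for illustration only.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector, eps: float = 1e-8) -> float:
    """Cosine similarity between two feature vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # eps guards against division by zero for all-zero feature vectors.
    return dot / max(norm_a * norm_b, eps)


def alignment_loss(camera_bev: Sequence[Vector], lidar_bev: Sequence[Vector]) -> float:
    """Mean of (1 - cosine similarity) over BEV grid cells.

    Assumption: each argument is a flat list of per-cell feature vectors,
    with both modalities already projected onto the same BEV grid.
    The loss is 0 when features point the same way and 2 when opposite.
    """
    if not camera_bev or len(camera_bev) != len(lidar_bev):
        raise ValueError("both inputs must cover the same non-empty BEV grid")
    total = sum(1.0 - cosine_similarity(c, l) for c, l in zip(camera_bev, lidar_bev))
    return total / len(camera_bev)
```

In a training setup, a term like this would typically be added to the segmentation loss with a weighting factor, so that it nudges the two unimodal feature maps toward agreement before they are fused.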


Similar resources

Birds' eye view: a decade of perspectives.

One of Jan Drake’s happy innovations as editor of other significant events (i.e., the first issue of Genetics in 1916). These have fulfilled the retrospective function. Genetics was to initiate the publication of an essay, entitled Perspectives, at the beginning of each issue. The perspective and prospective functions have been achieved by articles that have integrated classic studies His idea wa...


A Birds Eye View on System Identification

System identification is concerned with obtaining good models from data, i.e. with data driven modeling. In this contribution the aim is to explain and discuss ideas, general approaches and theories underlying identification of linear systems. Identification of linear systems is a nonlinear problem and is prototypical also for many parts of identification of nonlinear systems.


Cross-view Graph Embedding

Recently, more and more approaches are emerging to solve the cross-view matching problem where reference samples and query samples are from different views. In this paper, inspired by Graph Embedding, we propose a unified framework for these cross-view methods called Cross-view Graph Embedding. The proposed framework can not only reformulate most traditional cross-view methods (e.g., CCA, PLS a...


What is binocular vision for? A birds' eye view.

It is proposed that with the possible exception of owls, binocularity in birds does not have a higher order function that results in the perception of solidity and relative depth. Rather, binocularity is a consequence of the requirement of having a portion of the visual field that looks in the direction of travel; hence, each eye must have a contralateral projection that gives rise to binocular...



Journal

Journal title: Journal of Machine Vision and Applications

Year: 2023

ISSN: 1432-1769, 0932-8092

DOI: https://doi.org/10.1007/s00138-023-01400-7