CV4Code: Sourcecode Understanding via Visual Code Representations

Abstract

We present CV4Code (https://github.com/jpmorganchase/cv4code), a compact and effective computer vision method for sourcecode understanding. Our method leverages the contextual and structural information available from the code snippet by treating each snippet as a two-dimensional image, which naturally encodes the context and retains the underlying structural information through an explicit spatial representation. To codify snippets as images, we propose an ASCII codepoint-based image representation that facilitates fast generation of sourcecode images and eliminates redundancy in the encoding that would arise from an RGB pixel representation. Furthermore, as sourcecode is treated as images, neither lexical analysis (tokenisation) nor syntax tree parsing is required, which makes the proposed method agnostic to any particular programming language and lightweight from the application pipeline point of view. CV4Code can even featurise syntactically incorrect code, which is not possible with methods that depend on the Abstract Syntax Tree (AST). We demonstrate the effectiveness of CV4Code by learning Convolutional and Transformer networks to predict the functional task, i.e. the problem it solves, of the source code directly from its two-dimensional representation, and by using an embedding from its latent space to derive a similarity score of two code snippets in a retrieval setup. Experimental results show that our approach achieves state-of-the-art performance in comparison to other methods with the same task and data configurations. For the first time we show the benefits of treating sourcecode understanding as a form of image processing task.
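The ASCII codepoint-based image idea described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `code_to_ascii_image`, the grid size, the padding value, and the handling of non-ASCII characters are all assumptions made for the example, since the abstract does not specify them.

```python
from typing import List


def code_to_ascii_image(
    source: str, height: int = 8, width: int = 16, pad: int = 0
) -> List[List[int]]:
    """Map a code snippet to a fixed-size 2D grid of ASCII codepoints.

    Each row is one line of code and each column one character position.
    The grid size, the padding value, and mapping non-ASCII characters to
    the padding value are illustrative assumptions, not paper settings.
    """
    image = [[pad] * width for _ in range(height)]
    # Lines and characters beyond the grid are simply cropped.
    for row, line in enumerate(source.splitlines()[:height]):
        for col, ch in enumerate(line[:width]):
            code = ord(ch)
            image[row][col] = code if code < 128 else pad
    return image


# A syntactically incorrect snippet still yields a valid grid,
# because no tokeniser or parser is involved.
snippet = "def f(x):\n  return x +\n"
img = code_to_ascii_image(snippet, height=3, width=10)
```

Each cell holds a single integer codepoint rather than rendered RGB pixels, which is one way to read the abstract's point about avoiding redundancy in the encoding. A grid like this could then be fed to a convolutional or Transformer model as a single-channel input.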

Similar resources

Improving Visual Representations of Code

This paper describes the work carried out by the Visualisation Research Group in the Centre of Software Maintenance at the University of Durham. For obtaining a high-level understanding of the code systems, graphical representations are more useful than purely textual representations. However, graphical representations still have a tendency to provide the maintainer with too much ...

Understanding mid-level representations in visual processing.

It is clear that early visual processing provides an image-based representation of the visual scene: Neurons in Striate cortex (V1) encode nothing about the meaning of a scene, but they do provide a great deal of information about the image features within it. The mechanisms of these "low-level" visual processes are relatively well understood. We can construct plausible models for how neurons, ...

Interpreting Deep Visual Representations via Network Dissection

The success of recent deep convolutional neural networks (CNNs) depends on learning hidden representations that can summarize the important factors of variation behind the data. However, CNNs are often criticized as being black boxes that lack interpretability, since they have millions of unexplained model parameters. In this work, we describe Network Dissection, a method that interprets networks b...

Visual Tracking via Boolean Map Representations

In this paper, we present a simple yet effective Boolean map based representation that exploits connectivity cues for visual tracking. We describe a target object with histogram of oriented gradients and raw color features, of which each one is characterized by a set of Boolean maps generated by uniformly thresholding their values. The Boolean maps effectively encode multi-scale connectivity cu...

Improving Word Representations via Global Visual Context

Visually grounded semantics is a very important aspect in word representation, largely due to its potential to improve many NLP tasks such as information retrieval, text classification and analysis. We present a new distributed word learning framework which 1) learns word embeddings that better capture the visually grounded semantics by unifying local document context and global visual context,...

Journal

Journal title: Lecture Notes in Computer Science

Year: 2023

ISSN: 1611-3349, 0302-9743

DOI: https://doi.org/10.1007/978-3-031-26284-5_18