Optimizing the Cyclic Jacobi Algorithm of Singular Value Decomposition for DLX Architecture

Authors

  • Jun Ma
  • Yun-Nan Chang
Abstract

This project focused on speeding up the cyclic Jacobi algorithm for the Singular Value Decomposition on the DLX architecture. Using the performance monitor, we show that over 96% of the execution time is spent in a subroutine that calculates Jacobi rotations. To speed up this subroutine, improvements were made on two fronts. From the algorithm point of view, we first implemented fast Jacobi matrix multiplications, which brought our program to a more mature level and formed the structural basis for further improvements. We then implemented the fast Jacobi rotation, which requires half as many floating-point operations as the standard Jacobi rotation, and achieved a 1.2 times speedup. From the compiler point of view, we optimized the assembly code by reducing the integer operations, which yielded a 1.7 times speedup, and explored instruction-level parallelism through loop unrolling, rescheduling, and register renaming, which gave a further 1.2 times speed improvement. Cache and memory traces were also performed. The results showed that cache misses do not play an important role in the performance of the algorithm. Along with the algorithm optimization, we also modified and enhanced the DLXsim simulator by giving its "stop info" command the ability to calculate and display the cycle counts for a particular code section, which makes DLXsim more convenient and practical for profiling a program. Bugs were found and corrected in the DLXsim simulator and the DLXcc compiler. An optimized DLXcc compiler and limitations of the DLX architecture are also addressed.
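For readers unfamiliar with the method, the rotation step that dominates the run time can be illustrated with a minimal sketch. It assumes the one-sided (Hestenes) variant of the cyclic Jacobi SVD and uses the standard rotation, not the fast rotation described in the abstract; the abstract does not say which variant the authors used. The function name `jacobi_singular_values` and its parameters are hypothetical, and the code is plain Python rather than DLX assembly.

```python
import math


def jacobi_singular_values(a, tol=1e-12, max_sweeps=50):
    """Singular values of a (a list of rows) by one-sided cyclic Jacobi.

    Illustrative sketch only: standard rotations, no fast-rotation scaling.
    """
    m, n = len(a), len(a[0])
    # Store the matrix column-wise so each rotation updates two lists.
    cols = [[a[i][j] for i in range(m)] for j in range(n)]
    for _ in range(max_sweeps):
        rotated = False
        # One cyclic sweep: visit every column pair (p, q) in a fixed order.
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = sum(x * x for x in cols[p])
                beta = sum(y * y for y in cols[q])
                gamma = sum(x * y for x, y in zip(cols[p], cols[q]))
                if abs(gamma) <= tol * math.sqrt(alpha * beta):
                    continue  # columns are already orthogonal enough
                rotated = True
                # Rotation angle that makes the two columns orthogonal,
                # choosing the smaller root for the tangent t.
                zeta = (beta - alpha) / (2.0 * gamma)
                t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                # Apply the rotation to the column pair: this inner loop is
                # where the floating-point work is concentrated.
                for i in range(m):
                    x, y = cols[p][i], cols[q][i]
                    cols[p][i] = c * x - s * y
                    cols[q][i] = s * x + c * y
        if not rotated:
            break  # a full sweep made no rotation, so the method has converged
    # The singular values are the norms of the orthogonalized columns.
    return sorted((math.sqrt(sum(x * x for x in col)) for col in cols), reverse=True)


singular_values = jacobi_singular_values([[1.0, 1.0], [0.0, 1.0]])
```

The innermost loop applies one rotation to a pair of columns. Work of this kind is what techniques such as loop unrolling and register renaming target, although the authors' actual DLX code is not shown in the abstract.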


Similar resources

On Jacobi-like Algorithms for Computing the Ordinary Singular Value Decomposition

The increasing interest in using the OSVD in the real-time DSP domain necessitates an efficient computation of the OSVD. Special interest has been given to Jacobi-like algorithms, which is also the case in this paper. After a description of the basic orthogonal transformations, algorithms for computing the OSVD are classified and briefly described. Various rotation schemes for Jacobi-like algorith...


A Sort-Jacobi Algorithm on Semisimple Lie Algebras

A structure preserving Sort-Jacobi algorithm for computing eigenvalues or singular values is presented. The proposed method applies to an arbitrary semisimple Lie algebra on its (−1)-eigenspace of the Cartan involution. Local quadratic convergence for arbitrary cyclic schemes is shown for the regular case. The proposed method is independent of the representation of the underlying Lie algebra an...


Graph Clustering by Hierarchical Singular Value Decomposition with Selectable Range for Number of Clusters Members

Graphs have many applications in real-world problems. When dealing with huge volumes of data, analyzing the data is difficult or sometimes impossible. In big data problems, clustering is a useful tool for data analysis. Singular value decomposition (SVD) is one of the best algorithms for clustering graphs, but we have no way to select the number of clusters and the number of members ...


A Dimensionless Parameter Approach based on Singular Value Decomposition and Evolutionary Algorithm for Prediction of Carbamazepine Particles Size

The particle size control of drug is one of the most important factors affecting the efficiency of the nano-drug production in confined liquid impinging jets. In the present research, for this investigation the confined liquid impinging jet was used to produce nanoparticles of Carbamazepine. The effects of several parameters such as concentration, solution and anti-solvent flow rate and solvent...


Dynamic Ordering for the Parallel One-sided Block-Jacobi SVD Algorithm

The serial Jacobi algorithm (either one-sided or two-sided) for the computation of a singular value decomposition (SVD) of a general matrix has excellent numerical properties and parallelization potential, but it is considered to be the slowest method for computing the SVD. Even its parallelization with some parallel cyclic (static) ordering of subproblems does not lead to much improvement when...




Publication date: 1996