Indiana University Bloomington

Luddy School of Informatics, Computing, and Engineering

Technical Report TR705:
A Framework for Optimizing Function Call Sequences in MATLAB, or, Inter-procedural Optimization without Inter-procedural Analysis

Arun Chauhan, Chun-Yu Shei
(Aug 2014), 10 pages
Modern processors are getting harder to program. At the same time, wider availability of high-level dynamic languages is enabling relatively novice users to write sophisticated applications. In this paper, we argue that memory bandwidth-related problems on modern multi-core processors are exacerbated in the context of high-level languages. Compilers can help alleviate these problems, but lack a robust framework to perform the kind of inter-procedural analysis that is required to solve them for high-level dynamic languages. We identify some specific issues related to memory accesses that arise in optimizing applications in high-level programming systems, such as MATLAB. We demonstrate that the severity of the memory bottleneck in such languages forces us to reconsider several traditional compiler optimizations and, in some cases, to perform transformations that are the exact opposite of what "conventional wisdom" dictates. We propose a theoretical framework to solve several of these related problems. The framework relies on simplifying sequences of function calls based on separately measured "savings" functions. It focuses on the specific problem of rewriting function call sequences, rather than attempting to be a completely general rewriting system, as many past systems for describing compiler optimizations have tried to be. As a result, we are able to devise efficient algorithms to implement the framework. This seemingly simple framework turns out to be powerful enough to be applicable to a variety of inter-procedural problems without paying the price of live inter-procedural analysis. These problems include library function selection, procedure specialization, "de-vectorization" (converting vector statements to parallelizable loops), computation partitioning for heterogeneous platforms, and grouping operations based on data formats and distributions.
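To make the core idea concrete, the following is a minimal sketch, not the authors' implementation, of rewriting a function-call sequence using separately measured "savings" values. The rule patterns, replacement names, and savings numbers are illustrative assumptions; in the paper's setting the savings would be measured offline per library function.

```python
def rewrite(calls, rules):
    """Greedily apply the highest-savings applicable rule until none fits.

    calls: list of function names, e.g. ["transpose", "transpose"]
    rules: list of (pattern, replacement, savings) tuples
    """
    changed = True
    while changed:
        changed = False
        # Consider the most profitable rules first.
        for pattern, replacement, savings in sorted(rules, key=lambda r: -r[2]):
            n = len(pattern)
            for i in range(len(calls) - n + 1):
                if tuple(calls[i:i + n]) == pattern:
                    # Splice the cheaper replacement sequence into the call list.
                    calls = calls[:i] + list(replacement) + calls[i + n:]
                    changed = True
                    break
            if changed:
                break
    return calls

# Hypothetical rule set; savings values stand in for offline measurements.
RULES = [
    (("transpose", "transpose"), (), 2.0),                 # A'' == A
    (("multiply", "add"), ("fused_multiply_add",), 1.5),   # fuse two passes into one
]

print(rewrite(["transpose", "transpose", "multiply", "add"], RULES))
```

A greedy strategy is used here purely for brevity; the point is that the rewriting decision consults only per-rule savings data, requiring no live inter-procedural analysis of the program being optimized.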

Available as: