ML for ML: Learning Cost Semantics by Experiment
Authors
Abstract
It is an open problem in static resource bound analysis to connect high-level resource bounds with the actual execution time and memory usage of compiled machine code. This paper proposes to use machine learning to derive a cost model for a high-level source language that approximates the execution cost of compiled programs on a specific hardware platform. The proposed technique starts by fixing a cost semantics for the source language in which certain constants are unknown. To learn the constants for a specific hardware platform, a machine learning algorithm measures the resource cost of a set of training programs and compares the cost with the prediction of the cost semantics. The quality of the learned cost model is evaluated by comparing the model with the measured cost on a set of independent control programs. The technique has been implemented for a subset of OCaml using Inria’s OCaml compiler on Intel x86-64 and ARM 64-bit v8-A platforms. The resources considered in the implementation are heap allocations and execution time. The training programs are deliberately simple, handwritten micro-benchmarks, and the control programs are taken from the standard library, an online OCaml tutorial, and local OCaml projects. Different machine learning techniques are applied, including (weighted) linear regression and (weighted) robust regression. To model the execution time of programs with garbage collection (GC), the system combines the models for heap allocations and for execution without GC, which are derived first. Experiments indicate that the derived cost semantics for the number of heap allocations is accurate on both hardware platforms. On the control programs for the x86-64 architecture, the error of the cost semantics for execution time with and without GC is about 19.80% and 13.04%, respectively. The derived cost semantics are combined with RAML, a state-of-the-art system for automatically deriving resource bounds for OCaml programs.
Using these semantics, RAML is for the first time able to make predictions about the actual worst-case execution time.
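The learning step can be read as a regression problem: if the cost semantics assigns one unknown constant to each kind of evaluation step, the predicted cost of a run is the dot product of the step counts and the constants, so fitting the constants to measured costs is a (weighted) least-squares problem. The Python sketch that follows is not the authors' implementation. Several of its details are assumptions made for illustration: the feature vectors (hypothetical per-construct counts), the choice of weights 1/y² to emphasize relative error, the Huber-based iteratively reweighted least squares used for the robust variant together with its hypothetical `delta` threshold, and the mean relative error used as the control-program metric.

```python
# A minimal sketch, NOT the paper's implementation: learn the unknown constants
# of a linear cost semantics from measured costs of training programs.
# Assumption (hypothetical): each run is summarized by a vector of counts, one
# per construct with an unknown cost constant, and the predicted cost is the
# dot product of those counts with the constants.
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def solve(a: Matrix, b: Sequence[float]) -> List[float]:
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def predict(counts: Sequence[float], constants: Sequence[float]) -> float:
    """Cost predicted by the (linear) cost semantics for one run."""
    return sum(n * c for n, c in zip(counts, constants))


def fit_constants(counts: Matrix, measured: Sequence[float],
                  weights: Sequence[float]) -> List[float]:
    """Weighted least squares: minimize sum_i w_i * (predict(x_i, c) - y_i)^2
    by solving the normal equations (X^T W X) c = X^T W y."""
    k = len(counts[0])
    xtwx = [[sum(w * x[i] * x[j] for x, w in zip(counts, weights)) for j in range(k)]
            for i in range(k)]
    xtwy = [sum(w * x[i] * y for x, y, w in zip(counts, measured, weights))
            for i in range(k)]
    return solve(xtwx, xtwy)


def fit_robust(counts: Matrix, measured: Sequence[float],
               base_weights: Sequence[float], delta: float = 1.0,
               iters: int = 20) -> List[float]:
    """Robust (Huber) regression by iteratively reweighted least squares.
    Residuals larger than the hypothetical threshold `delta` are down-weighted
    by delta / |r|; the Huber weights are multiplied by the base weights."""
    c = fit_constants(counts, measured, base_weights)
    for _ in range(iters):
        huber = []
        for x, y in zip(counts, measured):
            r = abs(predict(x, c) - y)
            huber.append(1.0 if r <= delta else delta / r)
        c = fit_constants(counts, measured,
                          [b * h for b, h in zip(base_weights, huber)])
    return c


def mean_relative_error(counts: Matrix, measured: Sequence[float],
                        constants: Sequence[float]) -> float:
    """Average of |predicted - measured| / measured; assumes measured costs > 0."""
    errs = [abs(predict(x, constants) - y) / y for x, y in zip(counts, measured)]
    return sum(errs) / len(errs)


if __name__ == "__main__":
    # Hypothetical training data: counts of (applications, allocations, matches)
    # with synthetic "measured" costs generated from made-up constants.
    train_x = [[10, 0, 0], [0, 10, 0], [0, 0, 10], [5, 5, 5]]
    train_y = [20.0, 50.0, 10.0, 40.0]
    # Weights 1/y^2 make the fit minimize squared relative error.
    consts = fit_constants(train_x, train_y, [1 / y ** 2 for y in train_y])
    control_x = [[3, 1, 4], [1, 1, 1]]
    control_y = [15.0, 8.0]
    print("learned constants:", consts)
    print("control error:", mean_relative_error(control_x, control_y, consts))
```

With exact synthetic measurements the weighted fit recovers the made-up constants. Real timings are noisy, which is where a robust variant helps: a single outlier run can pull an ordinary least-squares fit far off, whereas the Huber weights shrink its influence.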