Laurelin: Java-native ROOT I/O for Apache Spark

Abstract

Apache Spark[1] is one of the predominant frameworks in the big data space, providing a fully-functional query processing engine, vendor support for hardware accelerators, and performant integrations with scientific computing libraries. One difficulty in adopting conventional big data frameworks to HEP workflows is the lack of support for the ROOT file format in these frameworks. Laurelin[6] implements ROOT I/O as a pure Java library, with no bindings to the C++ ROOT[2] implementation, and is readily installable via standard packaging tools. It provides an interface enabling Spark to read (and soon write) ROOT TTrees, so users can process them without a pre-processing phase converting to an intermediate format.

Similar resources

Flare: Native Compilation for Heterogeneous Workloads in Apache Spark

The need for modern data analytics to combine relational, procedural, and map-reduce-style functional processing is widely recognized. State-of-the-art systems like Spark have added SQL front-ends and relational query optimization, which promise an increase in expressiveness and performance. But how good are these extensions at extracting high performance from modern hardware platforms? While S...

Approximate Stream Analytics in Apache Flink and Apache Spark Streaming

Approximate computing aims for efficient execution of workflows where an approximate output is sufficient instead of the exact output. The idea behind approximate computing is to compute over a representative sample instead of the entire input dataset. Thus, approximate computing — based on the chosen sample size — can make a systematic trade-off between the output accuracy and computation effi...

MLlib: Machine Learning in Apache Spark

Apache Spark is a popular open-source platform for large-scale data processing that is well-suited for iterative machine learning tasks. In this paper we present MLlib, Spark’s open-source distributed machine learning library. MLlib provides efficient functionality for a wide range of learning settings and includes several underlying statistical, optimization, and linear algebra primitives. Shi...

Optimizing ROOT IO For Analysis

The ROOT I/O (RIO) subsystem is foundational to most HEP experiments: it provides a file format, a set of APIs/semantics, and a reference implementation in C++. It is often found at the base of an experiment’s framework and is used to serialize the experiment’s data; in the case of an LHC experiment, this may be hundreds of petabytes of files! Individual physicists will further use RIO to perfor...

Journal

Journal title: EPJ Web of Conferences

Year: 2021

ISSN: 2101-6275, 2100-014X

DOI: https://doi.org/10.1051/epjconf/202125102072