S3QLRDF: distributed SPARQL query processing using Apache Spark—a comparative performance study

Abstract

The proliferation of semantic data in the form of Resource Description Framework (RDF) triples demands efficient, scalable, and distributed storage along with a highly available, fault-tolerant parallel processing strategy. There are three open issues with distributed RDF data management systems that are not well addressed altogether in existing work. The first is querying efficiency; the second is that solutions optimized for certain types of query patterns do not necessarily work well for all types; the third is concerned with reducing pre-processing cost. More precisely, the rapid growth of RDF data raises the need for an efficient partitioning strategy over distributed data management systems to improve SPARQL (SPARQL Protocol and RDF Query Language) query performance, regardless of the query's pattern shape, with minimized pre-processing time. In distributed RDF systems, both the data and the query processing are distributed. On the other hand, SPARQL workloads are dynamic and structurally diverse and can have different degrees of complexity. A complex SPARQL query over a large RDF graph requires combining a lot of pieces of data through join operations. Therefore, designing an efficient data-partitioning schema to minimize data transfer is a fundamental challenge in distributed systems. In this context, we propose a new relational partitioning schema called Property Table Partitioning (PTP) for RDF data, which further partitions the existing property table into multiple tables based on distinct properties (comprising all subjects with non-null values for those properties) in order to minimize the input size and the number of join operations of a query. This paper proposes a distributed RDF data management system called S3QLRDF, which is built on top of Spark and utilizes Spark SQL to execute SPARQL queries over the PTP schema. The experimental analysis with respect to preprocessing costs and query performance, using synthetic and real datasets, shows that S3QLRDF outperforms state-of-the-art distributed RDF management systems.
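The partitioning idea in the abstract can be illustrated with a small sketch. This is not the S3QLRDF implementation: it uses plain Python dictionaries rather than Spark SQL, the triples are invented, and the function names (`build_property_table`, `partition_by_property`, `star_query`) are hypothetical. It also assumes that each per-property table keeps the full property-table row of every subject with a non-null value for that property, a layout the abstract does not spell out.

```python
from collections import defaultdict

# Toy RDF triples (subject, property, object); values are invented for illustration.
TRIPLES = [
    ("alice", "name", "Alice"),
    ("alice", "age", "30"),
    ("alice", "knows", "bob"),
    ("bob", "name", "Bob"),
    ("bob", "age", "25"),
    ("carol", "name", "Carol"),
]


def build_property_table(triples):
    """Build one row per subject; a missing key plays the role of a null column."""
    table = defaultdict(dict)
    for subject, prop, obj in triples:
        table[subject][prop] = obj  # assumption: single-valued properties
    return dict(table)


def partition_by_property(property_table):
    """For each distinct property, keep only the rows that have a value for it."""
    partitions = defaultdict(dict)
    for subject, row in property_table.items():
        for prop in row:
            partitions[prop][subject] = row
    return dict(partitions)


def star_query(partitions, props):
    """Answer a star-shaped pattern (one subject, several properties) without a join."""
    if any(p not in partitions for p in props):
        return {}
    # Scan the smallest partition among the requested properties to cut input size.
    smallest = min(props, key=lambda p: len(partitions[p]))
    return {
        subject: {p: row[p] for p in props}
        for subject, row in partitions[smallest].items()
        if all(p in row for p in props)
    }


property_table = build_property_table(TRIPLES)
partitions = partition_by_property(property_table)
result = star_query(partitions, ["name", "knows"])
print(result)
```

Under these assumptions, the query over `name` and `knows` scans only the one-row `knows` partition instead of the whole property table, which is the kind of input-size and join reduction the abstract describes.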

Similar resources

SPARQL query processing with Apache Spark

The number and size of linked open data graphs keep growing at a fast pace and confront semantic RDF services with problems characterized as Big Data. Distributed query processing is one of them and needs to be addressed efficiently, with execution guaranteeing scalability, high availability, and fault tolerance. RDF data management systems requiring these properties are rarely built from sc...

Incremental SPARQL Query Processing

The number of linked data sources available on the Web is growing at a rapid rate. Moreover, users are showing an interest in any framework that allows them to obtain answers to a formulated query by accessing heterogeneous data sources, without needing to specify explicitly which sources should answer the query. Our proposal focuses on that interest, and its goal is to build a system capable of ans...

Predicting SPARQL Query Performance

We address the problem of predicting SPARQL query performance. We use machine learning techniques to learn SPARQL query performance from previously executed queries. We show how to model SPARQL queries as feature vectors, and use k-nearest neighbors regression and Support Vector Machine with the nu-SVR kernel to accurately (R value of 0.98526) predict SPARQL query execution time. 1 Query Perfo...

Flexible query processing for SPARQL

Flexible querying techniques can enhance users’ access to complex, heterogeneous datasets in settings such as Linked Data, where the user may not always know how a query should be formulated in order to retrieve the desired answers. This paper presents query processing algorithms for a fragment of SPARQL 1.1 incorporating regular path queries (property path queries), extended with quer...

SFRDF+: Join Plans for SPARQL Processing in Apache Flink

Journal

Journal title: Distributed and Parallel Databases

Year: 2023

ISSN: 0926-8782, 1573-7578

DOI: https://doi.org/10.1007/s10619-023-07422-4