Towards a distributed, scalable and real-time RDF Stream Processing engine
نویسنده
چکیده
Due to the growing need to timely process and derive valuable information and knowledge from data produced in the Semantic Web, RDF stream processing (RSP) has emerged as an important research domain. Of course, modern RSP have to address the volume and velocity characteristics encountered in the Big Data era. This comes at the price of designing high throughput, low latency, fault tolerant, highly available and scalable engines. The cost of implementing such systems from scratch is very high and usually one prefers to program components on top of a framework that possesses these properties, e.g., Apache Hadoop or Apache Spark. The research conducting in this PhD adopts this approach and aims to create a production-ready RSP engine which will be based on domain standards, e.g., Apache Kafka and Spark Streaming. In a nutshell, the engine aims to i) address basic event modeling to guarantee the completeness of input data in window operators, ii) process real-time RDF stream in a distributed manner efficient RDF stream handling is required; iii) support and extend common continuous SPARQL syntax easy-to-use, adapt to the industrial needs and iv) support reasoning services at both the data preparation and query processing levels.
منابع مشابه
Scalable Linked Data Stream Processing via Network-Aware Workload Scheduling
In order to cope with the ever-increasing data volume, distributed stream processing systems have been proposed. To ensure scalability most distributed systems partition the data and distribute the workload among multiple machines. This approach does, however, raise the question how the data and the workload should be partitioned and distributed. A uniform scheduling strategy—a uniform distribu...
متن کاملNetwork-Aware Workload Scheduling for Scalable Linked Data Stream Processing
In order to cope with the ever-increasing data volume, distributed stream processing systems have been proposed. To ensure scalability most distributed systems partition the data and distribute the workload among multiple machines. This approach does, however, raise the question how the data and the workload should be partitioned and distributed. A uniform scheduling strategy—a uniform distribu...
متن کاملWAVES: Big Data Platform for Real-time RDF Stream Processing
Processing data as they arrive has recently gained momentum to mine continuous, high-volume and unbounded sequence of data streams. Due to the heterogeneity and the multi-modality of this data, RDF is widely used to provide a unified metadata layer in streaming context. In response to this ever-increasing demand, a number of systems and languages were produced, aiming at RDF stream processing (...
متن کاملStrider: A Hybrid Adaptive Distributed RDF Stream Processing Engine
Real-time processing of data streams emanating from sensors is becoming a common task in Internet of Things scenarios. The key implementation goal consists in efficiently handling massive incoming data streams and supporting advanced data analytics services like anomaly detection. In an on-going, industrial project, a 24/7 available stream processing engine usually faces dynamically changing da...
متن کاملTowards Efficient Processing of RDF Data Streams - Short Paper
In the last years, there has been an increase in the amount of real-time data generated. Sensors attached to things are transforming how we interact with our environment. Extracting meaningful information from these streams of data is essential for some application areas and requires processing systems that scale to varying conditions in data sources, complex queries, and system failures. This ...
متن کامل