Building Data-Intensive Grid Applications with Globus Toolkit - An Evaluation Based on Web Crawling
Authors
Abstract
There is a current trend toward building resource-intensive applications not on large dedicated computing centers but on the resources of computer systems distributed across the internet. Grid middleware provides a framework for accessing these resources. This paper evaluates a specific grid middleware, Globus Toolkit, for data-intensive applications. As a test case, we have designed and implemented a service-based distributed web crawler on top of this middleware. A web crawler is a complex application consisting of many nodes, and it places significantly higher demands on the administrative flexibility of grid middleware than grid applications that allocate the computing power of grid nodes. We have observed that some components of Globus Toolkit are flexible enough to provide the control functionality a web crawler needs, while others are not. For the latter, we propose possible extensions. Since we expect this combination of characteristics to occur in many other grid applications as well, our study is of broader interest beyond web crawling.
Similar resources
A Distributed Data Storage Architecture for Event Processing by Using the Globus Grid Toolkit
In this paper we discuss a Grid-based Event Processing System (GEPS). Data-intensive problems exist broadly in many scientific computational areas; their needs for super storage and computing capacities are usually difficult to satisfy fully. Meanwhile, the Globus Toolkit has become the de facto standard for building high-performance distributed computing environments. Event processing and f...
The GSI Plug-In for gSOAP: Building Cross-Grid Interoperable Secure Grid Services
Increasingly, grid computing is becoming the paradigm of choice for building large-scale complex scientific applications. These applications are characterized as being computationally and/or data intensive, requiring computational power and storage resources well beyond the capability of a single computer. Grid environments provide distributed, geographically spread computing and storage resour...
DynaSched: a dynamic Web service scheduling and deployment framework for data-intensive Grid workflows
Grid computing boosts productivity by maximizing resource utilization and simplifying access to resources which are shared among virtual organizations. Recently, the Grid and Web Service communities have established a set of common interests and requirements. The latest version of the Globus Toolkit implements the Web Service Resource Framework (WSRF) specifications which have been formulated t...
Überlegungen zur Entwicklung komplexer Grid-Anwendungen mit Globus Toolkit
Distributed applications with high resource demands can nowadays be built with the help of grid techniques, without the use of large computing centers. Instead, one uses free resources on computers distributed across the internet. Grid middleware is a foundation for this. The main interest of this work is the analysis of the most widely used middleware for grid applications, the ...
Semantic Meta Data (SMD): An Approach to Next Generation Knowledge Centric Web / Grid Services
As Web Services have matured, they have been substantially leveraged within the academic, research and business communities. The Grid is an emerging platform to support on-demand “virtual organizations” for coordinated resource sharing and problem solving on a global scale. Web/Grid services, metadata and semantics are becoming increasingly important for service sharing and effective reuse. In th...