In computational grid technology, storage and computing elements are usually regarded as shared resources. In a scavenging scenario, where the main goal is throughput maximization, resource brokers play the central role of matchmaking demand with supply in the grid-wide resource market [1].

Observed data can have strategic relevance, especially when instruments and sensors are geographically distributed and produce large volumes of data per unit time that can be analyzed, classified, mined and reduced in place, saving storage and networking resources [2]. In some applications, fast, effective and efficient data availability is a key factor, hence the importance of the content network approach [10]: in many cases data have to be delivered to different institutions for processing, as in the weather forecasting and high energy physics fields.

The Virtual Organization based grid computing approach permits straightforward management of issues related to sharing permissions, data security and privacy, leveraging standard and mature technologies such as digital certificates and the Grid Security Infrastructure (GSI) implementation provided by the Globus Toolkit 4 (GT4) [3,4]. Most grid middleware implementations treat instruments as data sources not much different from storage: data become available only after some kind of post-processing operation. Instruments can be considered off-line, and only processed data are usually published through common storage facilities such as the Replica Location Service, Reliable File Transfer and GridFTP. In this context data distribution involves little dynamic adaptation, because load balancing is not supported, while high-performance file transfers, data replication and delocalization can very frequently drive machines into an overload state.
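The off-line, store-then-publish pattern can be summarized by the following minimal sketch, given only as an illustration of the conventional pipeline; all class and method names are hypothetical placeholders and do not correspond to any specific middleware API.

```java
// Illustrative sketch of the conventional off-line pattern: data are acquired,
// post-processed and only then staged to a storage element and registered,
// e.g. via a GridFTP transfer and a replica catalogue entry.
// All names below are hypothetical, not a real middleware API.
import java.nio.file.Path;
import java.nio.file.Paths;

public class OfflinePublisher {

    static Path acquireFromInstrument() {            // raw data dumped locally
        return Paths.get("/tmp/raw-observation.dat");
    }

    static Path postProcess(Path raw) {              // reduction, quality control, format conversion
        return Paths.get("/tmp/processed-observation.nc");
    }

    static void stageAndRegister(Path processed) {   // e.g. GridFTP transfer plus replica registration
        System.out.println("Staging " + processed + " to the storage element");
    }

    public static void main(String[] args) {
        Path raw = acquireFromInstrument();
        Path processed = postProcess(raw);
        stageAndRegister(processed);                  // the instrument stays off-line throughout
    }
}
```

Because consumers only see the final files on the storage elements, any increase in demand is absorbed by the storage layer alone, which explains the overload behavior described above.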

This is especially likely when the interest of both scientists and field operators increases, as happens in environmental data acquisition during extreme weather events such as hurricanes, floods and tsunamis. Without a grid computing based content distribution network approach to instrument data acquisition, security is weakly enforced and limited to a few aspects, whereas the inhomogeneous and geographically distributed nature of the grid could be exploited to deploy instruments and sensor networks for any kind of research or production purpose [5].

The challenge of integrating instruments into the grid environment is strategically relevant because using the instruments themselves more efficiently and effectively, thanks to better interaction with other kinds of computing elements, reduces overall overhead and maximizes throughput. As already done for data distribution, in the case of data acquisition the content distribution approach is also a winning strategic choice, accelerating the data dissemination process.

The use of grid technology to control instruments and retrieve acquired data implies the need for a standard interface methodology across different kinds of instrument hardware. We focus on a service framework that decouples instrument-specific control issues, such as custom software APIs or interfaces running on embedded PCs, intelligent data loggers and proxy machines, from logical interfaces providing a common, high-level way of interaction based on both the push and the pull schema, leveraging WSRF notifications. The implementation pursues the primary goal of tight integration with our previously developed grid tools: the Job Flow Scheduler Service (JFSS) [7], the GrADS Data Service [10], and the Resource Broker Service (RBS) [9]. In a previous paper [10] we described the architecture and design of our GrADS Data Service (GDDS), focusing on its ability to provide a completely transparent interface to the GDS implementation [11,12] thanks to the same-binary approach, exposing all DODS and OpenDAP protocol features [13] via standard WSRF GT4 web service operation providers while referring to data by an EPR (End Point Reference), as required by the WSRF approach. In our GDDS implementation the service automatically recognizes metadata and publishes them on the local index service, also providing an ontology-based interface. Thanks to the automatic mapping between the native GT4 index service and the ClassAd resource representation performed by the RBS, users query for data through our Resource Broker service using ClassAds, exactly as for computational or storage needs.

In this paper, the outcome of a truly multidisciplinary and interdisciplinary effort by computer scientists, oceanographers, meteorologists and environmental scientists, we describe our research experience in sharing environmental data acquisition instruments through grid computing technology applied in a content network fashion. To implement the described behavior, we developed a web service framework fully integrated in the Globus Toolkit 4.
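To illustrate the decoupling between instrument-specific drivers and the logical push/pull interface, a minimal Java sketch is given below; the type and method names (InstrumentPort, Observation, ObservationListener) are hypothetical and only mirror the interaction schema, not the actual Instrument Service port type. In the WSRF implementation the pull schema corresponds to querying resource properties and the push schema to a WS-Notification subscription on the instrument resource EPR.

```java
import java.util.Date;

/** A single acquired sample, independent of the underlying hardware (illustrative type). */
class Observation {
    final String instrumentId;   // logical name published in the index service
    final Date timestamp;        // acquisition time
    final double[] values;       // decoded sensor readings

    Observation(String instrumentId, Date timestamp, double[] values) {
        this.instrumentId = instrumentId;
        this.timestamp = timestamp;
        this.values = values;
    }
}

/** Callback used by the push (notification-style) interaction schema. */
interface ObservationListener {
    void observationAvailable(Observation obs);
}

/**
 * Hypothetical logical instrument interface hiding hardware-specific drivers
 * (vendor APIs, embedded PCs, intelligent data loggers, proxy hosts). In the
 * WSRF implementation the pull schema maps to a resource property query and
 * the push schema to a WS-Notification subscription on the instrument EPR.
 */
interface InstrumentPort {
    Observation getLatestObservation();            // pull schema
    void subscribe(ObservationListener listener);  // push schema
    String getMetadataDocument();                  // metadata published to the index service
}
```

Since the corresponding resource properties are published in the GT4 index service and mapped to ClassAds by the RBS, a user can then select an instrument with the same kind of Requirements expression used for computational or storage resources, for instance matching on an attribute such as SensorClass == "WeatherStation" (attribute name purely illustrative).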

Our previously developed set of grid tools, the JFS, RB and GDD services, are the key components of the data acquisition and distribution infrastructure we set up by implementing the Instrument Service. In Section 2 we introduce the instruments we exposed on the grid. Section 3 shows how each instrument is virtualized using the proposed framework and how resource brokering works with the new grid elements, using both the native and the ClassAd instrument resource representation in an automatic and customizable fashion. Section 4 describes how instruments, sensors, data and metadata are made discoverable and selectable thanks to the Resource Broker integration implementing the content distribution network approach. Finally, a grid application dedicated to weather forecast model quality control and validation is presented in Section 5, while, as usual, the last section is devoted to conclusions and future work.
