2008-09-01: Response to AIP II CFP

Title: DataFed response to AIP II CFP
Date: 2008/9/1
Report Formats: Word

Cover page

Response to the GEOSS Architecture Implementation Pilot (AIP-2) Call for Participation


Washington University Offerings for AIP-2:

Catalog Component, Data Access, Processing and Orchestration Services and GEOSS Architecture Validation

Submitted by

Rudolf Husar, Erin Robinson, Kari Hoijarvi, Stefan Falke

Washington University, St. Louis, MO

September 1, 2008

Overview

Recent developments offer outstanding opportunities to fulfill the information needs for atmospheric sciences and air quality management. The ‘terabytes’ of data from surface and remote sensors can now be stored, processed and delivered in near-real time and the instantaneous ‘horizontal’ diffusion of information via the Internet now permits, in principle, the delivery of the right information to the right people at the right place and time. Standardized computer-computer communication languages and Service-Oriented Architectures (SOA) now facilitate the flexible processing of raw data into high-grade ‘actionable’ knowledge. Last but not least, the World Wide Web has opened the way to generous sharing of data and tools leading to faster knowledge creation through collaborative analysis in real and virtual workgroups.

Nevertheless, atmospheric scientists and air quality managers face significant hurdles. Because of the “data deluge”, the production of Earth observations and model output is rapidly outpacing the rate at which these data are assimilated and metabolized into actionable knowledge that can produce societal benefits. As a consequence, Earth Observations (EO) are under-utilized in making societal decisions.

A remedy is anticipated from the Global Earth Observation System of Systems (GEOSS). A unique contribution of GEOSS is the adoption and promotion of the system of systems (SoS) approach to integrating the multiplicity of autonomous Earth observations and models. An SoS [1] consists of autonomous constituents that are managed independently and evolve independently, and an SoS composed of such constituents acquires emergent behavior. Furthermore, in an SoS no stakeholder has complete insight and understanding, central control is limited, distributed control is essential, and the stakeholders must be involved throughout the life of the SoS. Thus, the governance of an SoS [12] needs to incorporate the multiple legitimate perspectives of the key stakeholders, including [1]:

  • users, i.e. the people who benefit from the system,
  • developers who construct the system,
  • acquirers who contract and pay for the system,
  • testers who evaluate the system for suitability,
  • sustainers who keep the system up to date,
  • trainers who ensure that the users know how to use it and
  • researchers who provide the next generation of ideas.

Given these unusual attributes, GEOSS is an ambitious, untried and somewhat risky undertaking. It requires rethinking much of the traditional systems approach that was applied to the design of individual Earth observing systems. Clearly, the SoS approach poses considerable challenges for the architecture, implementation, maintenance, governance and overall functionality of the information systems contributing to and drawing upon GEOSS. These substantial challenges can only be matched and exceeded by the major societal benefits that GEOSS may produce.

The federated data system, DataFed, developed by our CAPITA Research Group at Washington University, St. Louis, is a user-driven contribution to the emerging architecture of GEOSS. The implementation details and the various applications of DataFed are reported elsewhere [2]-[4]. Since several goals and needs of DataFed broadly coincide with those of GEOSS, a significant fraction of the DataFed effort was invested toward connecting and co-evolving with the GEOSS program as described in [11].

Since 2005, the connections to the GEOSS process have included work with the Architecture and Data Committee (ADC) and the User Interface Committee (UIC) of GEOSS. The work with the ADC involves linking DataFed as a provider/user to the GEOSS Common Infrastructure (GCI), including participation in interoperability experiments, architectural studies on system of systems, and demonstrations of loosely coupled applications using Service-Oriented Architecture. More recent work with the UIC has involved refining the user requirements for air quality. DataFed is a Decision Support System (DSS) for several air quality management programs, and it serves as a testbed for developing use patterns, defining user classes and studying communication mechanisms between data users and producers. An additional link to the UIC is the participation of the CAPITA Group in the organization of an air quality Community of Practice (CoP) of GEOSS.

Proposed Contributions

Societal Benefit Area Alignment and Support

The Washington University CAPITA Group will continue its participation in the development of the AIP-II Air Quality Scenario. In particular our group will pursue the development and elaboration of user requirements. This activity will be in accordance with the guidelines of the GEOSS User Interface Committee (UIC). These contributions to the UIC will extend the activities and plans laid out at the UIC/ADC Workshop on User Validation of GEOSS Architecture Using an Air-Quality Scenario, Toronto, Canada, May 5, 2008.

The contributions of the Washington University Group to the AIP II Air Quality Scenario will not be direct. Rather, our efforts in this area will be channeled through the community forum of the Earth Science Information Partners (ESIP) Air Quality Cluster [9]. Our Group actively supports the communal activities of the Air Quality Cluster, both as contributors and as facilitators, including its participation in AIP II. The proposed contributions are described in the ESIP Air Quality Cluster's response to AIP II, which is a companion document.

Component and Service Contributions

For AIP-2, the federated data system DataFed offers standards-based data access services to an array of distributed datasets, an array of processing services, a view-based workflow orchestration service, and a data catalog component. These services are applicable to multiple societal benefit areas including effects on human health and ecosystems, air quality management, disaster management and climate change. In accordance with the request in the CFP, these services are persistent “operational, research and technical exemplars”.

Data Access Services and Wrappers. DataFed provides standards-based access to a large array of air quality datasets through either OGC WMS or WCS protocols. Such standards-based access applies equally to raw data and to data processed through service orchestration that produces spatial and temporal data views. DataFed includes surface and satellite observation data as well as model output in point, gridded and image formats. The particular service provided by DataFed is wrapping. Wrappers provide a uniform interface to heterogeneous data by compensating for physical access and syntactic differences. Each wrapper has two sides. One side faces the heterogeneous data source and requires custom programming: it incorporates the physical server location, performs the space-time subsetting, executes format translations, etc. The other side faces outward toward the Internet cloud and presents the uniform interface to the heterogeneous data, i.e. it turns data into machine-consumable services.
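As an illustration of what such standards-based access looks like from the client side, the following minimal sketch assembles a WMS 1.1.1 GetMap request for one layer, bounding box and time. The endpoint `http://example.org/wms` and the layer name `pm25_surface` are hypothetical placeholders, not actual DataFed addresses or dataset names.

```python
from urllib.parse import urlencode


def build_getmap_url(endpoint, layer, bbox, time, width=512, height=256):
    """Build a WMS 1.1.1 GetMap URL for one layer, bounding box and time.

    bbox is (west, south, east, north) in degrees (EPSG:4326).
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",          # empty value requests the server's default style
        "SRS": "EPSG:4326",
        "BBOX": ",".join(str(v) for v in bbox),
        "TIME": time,          # ISO 8601 date or date-time
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
    }
    return endpoint + "?" + urlencode(params)


# Hypothetical endpoint and layer name, used only for illustration.
url = build_getmap_url(
    "http://example.org/wms",
    "pm25_surface",
    (-125.0, 24.0, -66.0, 50.0),
    "2008-09-01",
)
print(url)
```

Because the spatial and temporal subsetting is expressed entirely in the query parameters, the same client code could address any wrapped dataset by changing only the layer name.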

Experience over the past four years has shown that the placement of lightweight wrapper and adapter components between network nodes is desirable for all network links, not only for legacy connections. They allow non-intrusive modification of service connections in response to environmental changes, e.g. an update of an interface standard. The result of this ‘wrapping’ process is an array of homogeneous, virtual datasets that can be queried by spatial and temporal attributes and processed into higher-grade data products.
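The two-sided wrapper idea can be sketched in a few lines. This is an illustrative toy under stated assumptions, not DataFed's implementation: the class names, the record layout and the two source formats (a CSV text and a list of tuples) are all hypothetical. Each wrapper hides one source's syntax behind a common `records()` method, so a single `query` function can subset any wrapped source by space and time.

```python
import csv
import io
from datetime import date


class CsvSourceWrapper:
    """Wraps CSV text with columns lat,lon,day,value (hypothetical layout)."""

    def __init__(self, text):
        self._text = text

    def records(self):
        for row in csv.DictReader(io.StringIO(self._text)):
            yield {
                "lat": float(row["lat"]),
                "lon": float(row["lon"]),
                "time": date.fromisoformat(row["day"]),
                "value": float(row["value"]),
            }


class TupleSourceWrapper:
    """Wraps a source delivering (lon, lat, 'YYYYMMDD', value) tuples."""

    def __init__(self, rows):
        self._rows = rows

    def records(self):
        for lon, lat, stamp, value in self._rows:
            day = date(int(stamp[:4]), int(stamp[4:6]), int(stamp[6:8]))
            yield {"lat": lat, "lon": lon, "time": day, "value": value}


def query(wrapper, bbox, start, end):
    """Uniform space-time subsetting that works on any wrapper."""
    west, south, east, north = bbox
    return [
        r for r in wrapper.records()
        if west <= r["lon"] <= east
        and south <= r["lat"] <= north
        and start <= r["time"] <= end
    ]


# Made-up sample data in two different source formats.
csv_text = "lat,lon,day,value\n38.6,-90.2,2008-09-01,12.5\n51.5,-0.1,2008-09-01,8.0\n"
tuples = [(-90.3, 38.7, "20080901", 14.0), (-90.3, 38.7, "20080915", 9.0)]

bbox = (-125.0, 24.0, -66.0, 50.0)
start, end = date(2008, 9, 1), date(2008, 9, 7)
csv_hits = query(CsvSourceWrapper(csv_text), bbox, start, end)
tuple_hits = query(TupleSourceWrapper(tuples), bbox, start, end)
```

If a source changes its format, only its wrapper needs to be updated; `query` and everything downstream stay untouched, which is the non-intrusive modification described above.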

The data are being federated in a collaborative effort with the providers or custodians of the datasets, such as GIOVANNI, NAAPS, VIEWS, AirNow, and the NOAA Hazard Mapping System. Datasets available through OGC WCS and WMS include air pollution concentrations from surface monitoring networks, satellite imagery providing surface reflectance or column densities of pollution indicators, numerical model output, and air pollution emissions data. The collaboration with the providers benefits the data federation by assuring proper data access, links to additional metadata maintained by the custodians, and occasional messages regarding data flow problems. Conversely, users of the federated data can provide feedback to the custodians. At this time, a formal, structured description of the data provenance has not been developed in DataFed. Consequently, the data user is not in a position to trace the data flow and the processing steps for the offered federated data.

In DataFed, the orchestration of processing services is performed by a custom-designed workflow engine using SOAP/WSDL web service interfaces. The workflow engine is designed for chaining DataFed services as well as other, external web services. Likewise, DataFed’s services are available to, and have been integrated with, other organizations’ workflow software. The workflow engine is unique in the sense that the service flows generate data views that can be controlled and embedded directly into application software. The Service-Oriented Architecture (SOA) of DataFed is used to build data views by connecting the web service components (e.g. services for data access, transformation, fusion, rendering, etc.) in Lego-like assembly. The generic web tools created in this fashion include browsers for spatial-temporal exploration, multi-view consoles, animators, multi-layer overlays, etc. As an example, third-party thin-client web applications have been created for air emissions data in NEISGEI, where JavaScript was used to dynamically modify the settings in data views, thereby allowing the application user to control the space-time-parameter analysis. DataFed was the underlying workflow engine that connected distributed data access and processing web services to the user controls in the web application. We anticipate the development of custom web applications for addressing user needs in the AIP air quality scenarios.
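The Lego-like assembly of a data view can be illustrated with a minimal sketch. It is not the DataFed workflow engine: plain in-process Python functions stand in for the SOAP/WSDL services, and the `DataView` class, the step functions and the sample values are all hypothetical. The point shown is that a view is a chain of services plus a set of settings, and that changing a setting and re-running the chain yields a new view.

```python
from statistics import mean

# Made-up (day, value) samples standing in for a remote data access service.
SAMPLE = {"pm25": [(1, 10.0), (2, 14.0), (3, 30.0)]}


def access(_, settings):
    """Data access step: fetch all records for the selected parameter."""
    return SAMPLE[settings["parameter"]]


def subset(records, settings):
    """Transformation step: keep records inside the selected day range."""
    first, last = settings["days"]
    return [(d, v) for d, v in records if first <= d <= last]


def average(records, settings):
    """Aggregation step: reduce the records to their mean value."""
    return mean(v for _, v in records)


def render(value, settings):
    """Rendering step: turn the result into a displayable string."""
    return f"{settings['parameter']} mean = {value:.1f}"


class DataView:
    """A chain of service steps driven by user-controllable settings."""

    def __init__(self, steps, **settings):
        self.steps = steps
        self.settings = settings

    def run(self):
        data = None
        for step in self.steps:
            data = step(data, self.settings)
        return data


view = DataView([access, subset, average, render], parameter="pm25", days=(1, 2))
first = view.run()              # "pm25 mean = 12.0"

view.settings["days"] = (1, 3)  # a client control changes one setting
second = view.run()             # "pm25 mean = 18.0"
```

In this sketch, modifying `view.settings` plays the role that the JavaScript controls play in the thin-client applications described above.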

Data Catalog Component. DataFed has its own catalog where data can be registered for standards-based access for processing, visualization and exploration. DataFed has been registered as a catalog-serving component in the GEOSS registry. For this reason, the datasets are made available for harvesting by the GEOSS Clearinghouse. In AIP-II, the DataFed catalog can be used in the initial phases of the Pilot. However, it is anticipated that the DataFed catalog will be phased out in favor of a broader air quality community catalog, which is to be developed, maintained and governed by the ESIP Air Quality Cluster. Additional information regarding this community catalog can be found in the companion response to AIP-II by the ESIP AQ Cluster.

Architecture and Interoperability Arrangement (Standards) Development

The Washington University CAPITA Group offers its continued participation in the design, development and testing of the GEOSS architecture as it pertains to both the User Interface (UIC) and Architecture and Data (ADC) Committees of GEOSS. Following the GEOSS framework, the role of GEOSS is to facilitate universal access to EO data as a public good. This is to be accomplished by the GEOSS Core Architecture, which serves as a broker between service providers and service users. As we see it, through this mediation, the GEOSS core infrastructure acquires the characteristics of a 'value network' [8].


Fig. 1a. GEOSS framework. Fig. 1b. Decision system pattern for air quality

For the UIC, the schematic on the right (Fig. 1b) indicates that the actors participating in air quality decision support systems include data managers, data processing technical analysts, and ‘informers’ who prepare the technical information for the decision makers. These classes of actors are necessary for most air quality decision support systems, including international policy making regarding hemispheric transport of pollutants, regulatory decisions as part of routine air quality management, and DSS for informing the public through real-time data delivery and forecasting. It is worth highlighting that the key users of air quality decision systems are technical analysts, and the information system needs to be tailored primarily to their needs. Also, much of the communication along the value chain in the DSS takes place between the human participants through reports and verbal communication rather than through computer-computer interactions.

For the ADC, the schematic below (Fig. 2) indicates a possible architectural design that integrates a typical Air Quality application with the GEOSS Common Infrastructure. Special emphasis is placed on three connector components: the Air Quality Community Catalog, the Air Quality Community Portal and the Workspaces. Beyond doubt, this architectural design is a naive and untested representation of a future system of systems architecture. However, it constitutes a beginning from which a community-developed and tested architecture can evolve. It is hoped that the development of the SoS architecture will occur through an open and participatory forum of an Air Quality Community of Practice.


Fig. 2. AQ Community/GEOSS Architecture

Description of Responding Organization

From Wikipedia: Washington University in St. Louis is a private, coeducational, non-sectarian research university located in St. Louis, Missouri. The University was co-founded in 1853 and offered its first four-year Bachelor of Arts degree in 1859. The University includes seven graduate and undergraduate schools, encompassing a broad range of academic fields. In the 2007 U.S. News & World Report rankings, its undergraduate program was ranked 12th in the US. Washington University has an active Engineering program, which includes Geospatial Environmental Research and numerous collaborations on geospatial interoperability as part of OGC, GEOSS and bilateral arrangements. Washington University has been a member of OGC since 2003.

Interoperability demonstrations and pilot studies are an integral part of the GEOSS development process, and these require the active participation of its autonomous members. DataFed has participated in these developments of the System of Systems as a ‘dual citizen’: it has its own internal and custom architectural elements to meet its particular objectives, but at the same time it implements standard architectural elements to allow connecting and sharing with other systems. Beyond coexistence, it strives to cooperate, co-evolve and ultimately merge with other data federations. To that end, the developers of DataFed have actively participated in a wide range of interoperability studies.

GALEON (Geo-interface to Atmosphere, Land, Earth, Ocean netCDF) is an OGC Interoperability Experiment [5] to support open access to atmospheric and oceanographic modeling and simulation outputs. This is an active and productive group working on the nuts and bolts of Earth science data modeling and interoperability. The GEOSS Services Network (GSN) [6] is a persistent network of publicly accessible OGC services for demonstration and research regarding interoperability arrangements in GEOSS. GSN is the basis for demonstrations in the GEOSS Workshop series [7]. The DataFed group has actively participated in the Beijing and Denver [7] workshops and organized the interoperability experiment for the Barcelona workshop [8].

The ESIP Air Quality Cluster [9] is an activity within the Federation of Earth Science Information Partners, ESIP [10]. It connects air quality data consumers with the providers of those data. The AQ Cluster aims to (1) bring people and ideas together, (2) facilitate the flow of Earth science data to air quality management and (3) provide a forum for individual AQ projects. The DataFed group is active in evolving the OGC WCS specification for use with air quality data. A specific goal is to include in OGC WCS the point coverages arising from surface-based monitoring networks.

References:

[1] United States Air Force Scientific Advisory Board (2005); Report on System-of-Systems Engineering for Air Force Capability Development; Public Release SAB-TR-05-04
[2] R. Husar and R. Poirot, “DataFed and FASTNET: Tools for agile air quality analysis,” EM, Air and Waste Management Association, September 2005, pp. 39-41.
[3] R.B. Husar, S.R. Falke and K. Hoijarvi, “Interoperability of Web Service-Based Data Access and Processing: Experience Using the DataFed System,” ESTO Meeting, 2006, Paper A6P2.
[4] R.B. Husar and K. Hoijarvi, “DataFed: Mediated web services for distributed air quality data access and processing,” IEEE International Geoscience and Remote Sensing Symposium (IGARSS 2007), pp. 4016-4020, 23-28 July 2007.
[5] OGC, GALEON http://www.opengeospatial.org/initiatives/?iid=173
[6] OGC GEOSS Services Network (GSN), http://www.ogcnetwork.net/node/56
[7] The User and the GEOSS Architecture V – Denver http://www.ogcnetwork.net/node/137
[8] The User and the GEOSS Architecture XIV – Barcelona, http://www.ogcnetwork.net/node/265
[9] ESIP Air Quality Cluster, http://wiki.esipfed.org/index.php/Air_Quality_Cluster
[10] Earth Science Information Partners, ESIP, http://www.esipfed.org/
[11] R.B. Husar, K. Höijärvi, E.M. Robinson, S.R. Falke and G. Percivall: DataFed: An Architecture for Federating Atmospheric Data for GEOSS. IEEE Systems Journal, GEOSS Issue, In Press
[12] E. Morris et al.: System-of-Systems Governance: New Patterns of Thought http://www.sei.cmu.edu/pub/documents/06.reports/pdf/06tn036.pdf

Word version: http://datafedwiki.wustl.edu/images/9/96/GEOSS_AIP-2_CFP_Response_DataFed.doc