NASA ACCESS09: Tools and Methods for Air Quality Data Access and Discovery Services

Tools and Methods for Finding and Accessing Air Quality Data (moved from ESIP)

Objectives and Expected Significance

The purpose of this proposal is to design, develop and implement new and efficient tools and methods for the discovery of, and seamless access to, Air Quality-related datasets. The proposal is in response to the solicitation: Advancing Collaborative Connections for Earth System Science (ACCESS) 2009. In particular, it focuses on providing "means for users to discover and use services being made available by NASA, other Federal agencies, academia, the private sector, and others".

In the U.S., virtually all the important Earth Science datasets are now publicly accessible through the Internet. Typically, data providers place the numeric datasets onto a web server along with the associated 'Readme' files, which contain descriptive information about the data, access instructions and other information believed to be of use to the user. Over the past decades, there has also been a steady evolution of data directories where individual datasets can be registered, properly labeled and searched by potential users. Registering a dataset in a directory is now aided by forms and controlled vocabularies, making it easy for the provider to advertise a data product.

The outstanding example of a long-term cataloging effort is NASA's Global Change Master Directory (GCMD), a rich resource pool of registered Earth observations that are potentially useful for many different applications. For example, a search of GCMD for "atmosphere" shows about 8000 entries for datasets and 1000 entries for services. Even an air quality parameter such as "aerosol" returns about 1000 datasets and 500 services. The directories of other agencies and nations probably contain at least this many entries for datasets and services. Recently, Scheffee (2008) compiled a list of datasets relevant to air quality. Such a vast amount of content is overwhelming to the user because the relevance, quality and accessibility of the data are highly variable among the directory entries. Thus the user is faced with an immense burden of combing through, assessing and ultimately deciding which datasets to choose for detailed exploration.

Assuming that relevant data have been identified, the user is next faced with the problem of accessing the numeric data. Most datasets are only available as granules, i.e. files on a server, which require human interaction to download. Procedures such as OpenSearch are becoming useful extensions to data directories since they allow the identification of data granules based on space-time constraints for each dataset. After the granules have been downloaded, the user faces the tedious task of reading the data files and interfacing them with the client's data processing or visualization software. A small fraction of the GCMD datasets is accessible through a service interface, most commonly through the flexible OpenDAP protocol, for which adapters are available for several desktop and web applications (??). Data access through the international standard protocols (OGC WMS, WCS, WFS) is extremely rare (e.g. NASA Giovanni, NOAA HMS).

In GCMD, user support for finding a dataset is provided primarily through a controlled vocabulary of keywords along with free-text search of the DIF-formatted standard metadata.

Providers of Earth Observations can easily publish their data through directories, and they only have to do it once. On the other hand, finding, accessing and using any given dataset takes a much larger effort, and that effort is repeated individually by every user of the dataset. Collectively, the users of a dataset need to expend 100 or 1000 times more effort than its publisher. As pointed out by Harlan Cleveland (198X), in an information-rich environment most of the burden of accessing the proper information is carried by the user, not the producer. According to Taylor (??), this type of information system is classified as provider-content driven, and it is also aided by Internet technology. In the next section we describe the characteristics of a user-driven information system, which guide the design of the proposed Air Quality ... for finding and accessing data.


The outcome of the proposed work will significantly reduce the burden on the user in finding and accessing data relevant to the understanding and management of air quality and atmospheric composition. This subject area is the scientific and research domain of the proposing team and the IT technologies will be directly applicable to the furthering research... of this infrastructure is for this domain. The user-oriented information infrastructure will make

The AQ data user benefits because it becomes easier to find data suitable for their application and to access and use those data in creating either scientific or actionable knowledge for societal decision-making. The providers of Earth Observations will also benefit since their data products will be re-used more easily, enhancing the relevance and importance of their products. Finally, the Earth science community at large will benefit because the tools and methods developed for improved data discovery and access are applicable to all Earth science domains. The broad applicability of this proposed work will be pursued through active participation in, and compatibility with, the GEOSS Common Infrastructure.


There are major impediments to seamless and effective data usage encountered by both data providers and users. The impediments from the user's point of view are succinctly stated in the report by NAS (1998). In short: the user cannot find the data; if she can find them, she cannot access them; if she can access them, she does not know how good they are; if she finds the data good, she cannot merge them with other data. The data provider faces a similar set of hurdles: the provider cannot find the users; if she can find users, she does not know how to seamlessly deliver the data; if she can deliver, she does not know how to make the data more valuable to the users. This project intends to provide support to overcome the first two hurdles: finding the user/provider and accessing/delivering the desired data.



The proposed work has three distinguishing characteristics. First, the design is centered on the user of available data. Secondly, the proposed tools and methods are to offer means for finding and accessing data. Thirdly, the discovery and access facilities are to be applicable to the domain of Air Quality and Atmospheric Composition. However, the tools, methods and infrastructure should also be applicable to other domains of science and application.

Problem: GCMD has Earth observations from all of NASA, and the user carries the burden of filtering out the datasets that are applicable to them. Solution: help the AQ user by filtering/subsetting GCMD.

Should include a one-pager on how the AQ user gets data now and how that would be improved with the new system.


Recent developments offer outstanding opportunities to fulfill the information needs of the Earth Sciences and to support many societal benefit areas. The satellite sensing revolution of the 1990s now yields near-real-time observations of many Earth System parameters. The data from surface-based monitoring networks now routinely provide detailed characterisation of atmospheric and surface parameters. The ‘terabytes’ of data from these surface and remote sensors can now be stored, processed and delivered in near-real time, and the instantaneous ‘horizontal’ diffusion of information via the Internet now permits, in principle, the delivery of the right information to the right people at the right place and time. Standardized computer-computer communication languages and the emerging Service-Oriented information systems now facilitate the flexible processing of raw data into high-grade scientific or ‘actionable’ knowledge. Last but not least, the World Wide Web has opened the way to generous sharing of data and tools, leading to faster knowledge creation through collaborative analysis in real and virtual workgroups.

Nevertheless, Earth scientists and societal decision makers face significant hurdles. The production of Earth observations and models is rapidly outpacing the rate at which these observations are assimilated and metabolized into actionable knowledge that can produce societal benefits. The “data deluge” problem is especially acute for atmospheric scientists interested in the use of satellite observations. As a consequence, Earth Observations (EO) are under-utilized in atmospheric science and in societal decision-making.

Climate change and atmospheric processes are inherently complex. The numerous relevant data range from detailed surface-based chemical measurements to extensive satellite remote sensing, and the integration of these requires the use of sophisticated models.



Problem: Need to find interesting stuff?

  • provider can identify AQ metadata. Users also provide AQ datasets.
  • Filter, Aggregate, Fuse metadata



Provider and User Oriented Designs
  • Providers offer their wares ... to reach the maximum number of users in many applications

Image:GEOSS Fanin Fanout.png
Image:GEOSSUIC Diagram.png
Image:PublishFindBind.png
Image:ExtendedArch.png

Project Description

The proposal is written from the Air Quality user's perspective. The system design is focused on users: AQ users should not have to do the filtering themselves.


  • (completeness) machine harvest searches catalogs to ensure complete id/collection?? of resource pool (geoss clearinghouse?)
    • identify metadata catalogs - EPA, NOAA, NASA, ...

Characteristics of user-oriented system:

  • filter the available datasets through machine mechanisms in order to provide a subset of AQ-relevant datasets
    • e.g. Frost-y coarse filter - identify good candidate aq; particular dataset would be subject to standards based access service as well as additional aq specific metadata so that they could be queried and browsed in the high resolution query.
  • (precision) Query is sharp - more specific at community catalog level in parameter space (as well as space, time) - includes instrument, method, ...; Metadata needs to be at the resolution of the dataset - parameter level.
    • Specify the content you are requesting at high level of specificity in space, time, parameter, who did it, who used it, (standard data access interface is implemented for the user)
  • user judgement.
    • Need to be able to browse and compare to decide if you want it.
    • user statistics:
      • classes of users - gov't, university, ...
    • data statistics:
      • Pageviews - returnable parameter for each data layer that can be sorted by.
      • # of users that have used this
      • what is the data used for - value and where? help make data more suitable for that usage
  • Access data seamlessly through WCS

This project focuses on AQ user needs. We intend to create an AQ-relevant library that allows AQ analysts to discover and access data. It is different because it:

  • does not give whole earth observations
  • Provides access to data directly

Identification of AQ User Needs

The primary characteristic of the proposed information system is that its design and implementation are centered on the user.

There is currently a GEO Task US-09-01a "to establish a GEO process for identifying critical Earth observation priorities common to many GEOSS societal benefit areas, involving scientific and technical experts, taking account of socio-economic factors, and building on the results of existing systems’ requirements development processes." The proposing team is leading the air quality and health sub-workgroup, which is identifying the Earth observation priorities critical to Air Quality and Health.

The outcome of the AQ and Health workgroup for the GEO Task will inform this proposal design by identifying valuable datasets and identifying places where AQ users need information and data, but cannot access it.

Filter Directories and Harvest AQ-Relevant Data

The AQ User Needs will determine which datasets are initially targeted for inclusion in the AQ Data ??. As relevant data needs are identified, the community will filter directories such as GCMD and the GEOSS Clearinghouse to fulfill those needs. Another group of initial datasets for inclusion in the AQ Data ?? consists of current data systems that support AQ management, such as EPA AQS, VIEWS, ... Current data systems that support air quality management fall into several categories:

Dedicated Data Systems: Dedicated data systems are integrated systems where the air quality monitoring, data archiving and applications are all integrated into a closed (stove pipe) application that serves very specific regulatory or other air quality management purposes. For instance, EPA's Air Quality System (AQS) acquires data from the mandated regulatory network, performs the regulatory data analysis for ambient air compliance and reports on national air pollution patterns and trends. The data from AQS are also accessible as periodically updated zip files for each criteria pollutant, and summary data can be downloaded for individual monitoring sites.


  • High value data

Harvest:
This proposal will provide mechanisms to pull and push air quality metadata to the user

  • pull by attracting data providers, through forces:
    • "Would you like your content to be seen by AQ Users?"
    • immediate reward for providing content through standard means is the ability to use tools, displayers, processing services and view data in multiple formats.
    • user feedback

Dataset Wrapping

For the datasets that are to be incorporated into the AQ Data ??, they need to go through a wrapping process.

Datasets found through the AQ Data ?? must have standard OGC WCS and WMS interfaces so that AQ users can directly access the subset of data that they want to analyse. Once standard interfaces have been added, the datasets can also be converted into other user formats such as netCDF and KML.

Furthermore, the datasets need to have additional AQ-relevant metadata added to metadata that the provider gives in order for the AQ user to easily find the data. This additional metadata allows for sharp queries to be given in the parameter space, time, and physical space.


Access

  • Provide tools for creating standard data access of numeric data
  • Connect users to the 'cloud' for additional related data (webcams, blogs, papers,...)
  • Use community to 'invite' data providers

Data Browser/Finder

Once a dataset has been wrapped, the AQ user can search for it through a faceted search interface that reacts to each step of the user's query. This dynamic interface ensures that the final query will return results. It also allows users to navigate by the facets most familiar to them first and to make decisions about less familiar facets as the choices are narrowed.
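The faceted search behaviour described above can be illustrated with a minimal sketch. The catalog records, facet names and function names below are hypothetical and are not taken from the existing AQ Data Finder. The sketch only shows the mechanism: after each selection the remaining facet values are recomputed from the current result set, so the user is never offered a choice that leads to an empty result.

```python
from collections import Counter

# Hypothetical catalog records; the facet names and values are illustrative only.
CATALOG = [
    {"id": "layer-1", "parameter": "PM2.5", "platform": "surface", "provider": "EPA"},
    {"id": "layer-2", "parameter": "AOT", "platform": "satellite", "provider": "NASA"},
    {"id": "layer-3", "parameter": "PM2.5", "platform": "model", "provider": "NOAA"},
    {"id": "layer-4", "parameter": "ozone", "platform": "surface", "provider": "EPA"},
]
FACETS = ("parameter", "platform", "provider")


def apply_filters(records, selected):
    """Keep only the records that match every facet value chosen so far."""
    return [r for r in records if all(r.get(f) == v for f, v in selected.items())]


def facet_counts(records, selected):
    """Count the values still available in each facet not yet constrained.

    Only values present in the current result set are offered, so any
    further choice is guaranteed to return at least one record.
    """
    return {
        facet: dict(Counter(r[facet] for r in records))
        for facet in FACETS
        if facet not in selected
    }


def search(selected):
    """One step of the faceted query: filtered results plus remaining choices."""
    results = apply_filters(CATALOG, selected)
    return results, facet_counts(results, selected)


results, choices = search({"parameter": "PM2.5"})
print([r["id"] for r in results])  # ids of the layers matching the selection
print(choices)                     # platform and provider values still available
```

A real implementation would presumably compute these counts inside the catalog's search engine rather than in application code; the in-memory list is used here only to keep the sketch self-contained.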

The results returned will carry additional metadata to aid decision-making, such as the number of users that have used a dataset or links to places where it is used, so that the AQ user is provided with some additional context.

Because the AQ datasets have been wrapped with a standard data access interface (WCS/WMS) the user can browse data layers, compare them and further explore the data, ultimately choosing the most appropriate data for their application.

We envision that a user interested in finding air quality-relevant data would follow a multi-step procedure. The initial stage is data discovery, which would begin at a master catalog such as the GEOSS Clearinghouse or GCMD.




Publish-Find-Bind Architecture

The proposed data system will have high value because it starts by wrapping datasets that are of value to the AQ user community. The data access and AQ metadata wrapping allows the data to be published. The user can then find the data through sharp queries posed in terms that are relevant to the user. Ultimately, the user's client applications can bind, in a loosely-coupled manner, to the datasets that are most applicable.


homogenize What few things need to be the same...

Technical Approach and Methodology

In the proposed data discovery system, the unit of data is a 'layer', i.e. a single measured parameter obtained by a given instrument. A layer can represent data from a surface monitor, a column measurement from a satellite (e.g. AOT) or a modeled parameter. Data layers originating from the same instrument are grouped into datasets (e.g. MODIS). For each data layer and dataset there is an ISO 19115 metadata record that describes its characteristics. The content of the metadata is then exposed to a harvesting (or query) service, which collects the metadata from distributed providers into the master catalog.
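As a rough illustration of the layer, dataset and harvesting relationship described above, the following sketch models layers and datasets as simple records and collects them from several providers into one catalog. All class, field and provider names are hypothetical, and the plain dictionary only stands in for a real ISO 19115 record.

```python
from dataclasses import dataclass, field


@dataclass
class Layer:
    """One measured or modeled parameter from a given instrument."""
    name: str
    parameter: str
    instrument: str
    metadata: dict = field(default_factory=dict)  # stand-in for an ISO 19115 record


@dataclass
class Dataset:
    """Layers that originate from the same instrument."""
    instrument: str
    layers: list = field(default_factory=list)


def harvest(providers):
    """Collect layers from distributed providers into one master catalog.

    `providers` maps a provider name to the list of Layer records it exposes.
    The catalog groups the layers into datasets keyed by instrument.
    """
    catalog = {}
    for provider, layers in providers.items():
        for layer in layers:
            layer.metadata.setdefault("provider", provider)
            if layer.instrument not in catalog:
                catalog[layer.instrument] = Dataset(layer.instrument)
            catalog[layer.instrument].layers.append(layer)
    return catalog


# Hypothetical providers and layers, for illustration only.
providers = {
    "provider_a": [Layer("modis_aot", "AOT", "MODIS")],
    "provider_b": [
        Layer("modis_fire", "fire pixels", "MODIS"),
        Layer("site_pm25", "PM2.5", "surface monitor"),
    ],
}
catalog = harvest(providers)
print(sorted(catalog))  # instruments found across all providers
```

Keying the catalog by instrument mirrors the grouping rule stated in the text; a production harvester would more likely key on a persistent dataset identifier.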

The user accesses the master catalog, navigates and explores the master catalog using advanced faceted search service, data browsing and rich metadata resources.

Focus on impediments!!! Why can't we Publish, Find, Bind??? Invest energy into generic tools to aid Publish, Find, Bind.

  • All data are to be accessible through WMS WCS; data browser, exploration is part of the data discovery process
  • Metadata are GEO - compatible ISO and unstructured
  • Rich metadata collected from providers and users as well as from usage statistics..
  • Metadata is the messaging connection and glue that connects data providers, users and mediators


Project can interact with community resources of ESIP WGs, GEO WGs, ....:

  • Many thousands of diverse AQ data layers, tagged (classified). An ideal resource for a semantic use case. (Semantic Group)
  • Since data will pass through many hands, an ideal use case for the design and testing of provenance (Greg)
  • Connecting data providers with users through workspaces ... (Stefan)
  • EPA, NOAA and other data are already


• The technical approach and methodology to be employed in conducting the proposed research, including a description of any hardware proposed to be built in order to carry out the research, as well as any special facilities of the proposing organization(s) and/or capabilities of the Proposer(s) that would be used for carrying out the work. (Note: ref. also Section 2.3.10(a) concerning the description of critical existing equipment needed for carrying out the proposed research and the Instructions for the Budget Justification in Section 2.3.10 for further discussion of costing details needed for proposals involving significant hardware, software, and/or ground systems development, and, as may be allowed by an NRA, proposals for flight instruments);


User Oriented Approach

The value of data cannot be measured prior to its use. Until then the data have only a potential value; the actual value is given by the users. So the better we understand the environment in which the output of the system is used, the better we will be able to estimate its potential value and design the system. Value-added activities in an information system are those processes that enhance or otherwise strengthen the potential utility of the system. The values that result from value-added activities (23) fall into six categories: 1. Ease of use 2. Noise reduction 3. Quality 4. Adaptability 5. Time savings 6. Cost savings.

Architectural Approach:

The architectural basis of the proposed work is Service Oriented Architecture (SOA) for the publishing, finding, binding to and delivery of data as services. The critical aspect of SOA is the loose coupling between service providers and service users. Loose coupling is accomplished through plug-and-play connectivity facilitated by standards-based data access service protocols. SOA is the only architecture that we are aware of that allows both loose, dynamic connection and seamless flow of data between a rich set of provider resources and a diverse array of users. Service providers register services in a suitable service registry, and the users discover the desired service and access the data. The result is a dynamic binding mechanism for the construction of loosely-coupled work-flow applications.

Service orientation has been accepted as the desired way of delivering Earth Observation (EO) data products. However, formal standards-based data-as-service offerings have been slow to appear in the delivery of data products at NASA and other Agencies. While offering images through the OGC WMS standard interface is becoming common for many Federal Agency data products, there is currently no effective way for users to find those services, which are advertised on and dispersed across many web pages.


Engineering and Implementation:

The three major components of this SOA-based project are: (1) facilities for the publication of data services, (2) facilities to find data services and (3) facilities to access data services. Everything is a service. Mashups to other systems. Connectivity-workflow. Each of these components will be supported by a set of tools and methods.

Designed for lateral connectivity and expansion.

Data as Service
  • Wrappers, reusable tools and methods (wrapper classes for: SeqImage/Seq File, SQL, netCDF) into WCS and WMS
    • Unidata (netCDF), CF (CF working group; Naming); GALEON (WCS-netCDF) building on top of...
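
The wrapper classes listed above could share a single query interface, as in the sketch below. This is an assumption about how such wrappers might be organized, not a description of the DataFed code: the class names are hypothetical, and the in-memory table and grid only stand in for SQL and netCDF sources.

```python
from abc import ABC, abstractmethod


class DataWrapper(ABC):
    """Uniform query interface placed in front of a heterogeneous data source."""

    @abstractmethod
    def query(self, bbox, time_range):
        """Return (lon, lat, time, value) records inside
        bbox=(west, south, east, north) and time_range=(start, end)."""


class TableWrapper(DataWrapper):
    """Wraps station records held as rows (a stand-in for an SQL-backed source)."""

    def __init__(self, rows):
        self.rows = rows  # each row: (lon, lat, iso_time, value)

    def query(self, bbox, time_range):
        west, south, east, north = bbox
        start, end = time_range
        return [
            row for row in self.rows
            if west <= row[0] <= east and south <= row[1] <= north
            and start <= row[2] <= end  # same-format ISO 8601 strings sort chronologically
        ]


class GridWrapper(DataWrapper):
    """Wraps grids held as {iso_time: {(lon, lat): value}} (a stand-in for netCDF)."""

    def __init__(self, grids):
        self.grids = grids

    def query(self, bbox, time_range):
        west, south, east, north = bbox
        start, end = time_range
        return [
            (lon, lat, t, value)
            for t, cells in sorted(self.grids.items()) if start <= t <= end
            for (lon, lat), value in sorted(cells.items())
            if west <= lon <= east and south <= lat <= north
        ]


# The same spatial and temporal query works against both kinds of source.
sources = [
    TableWrapper([(-90.2, 38.6, "2009-06-01", 12.5), (-118.2, 34.0, "2009-06-01", 20.1)]),
    GridWrapper({"2009-06-01": {(-90.0, 38.0): 0.21, (-120.0, 34.0): 0.35}}),
]
bbox, when = (-100.0, 30.0, -80.0, 45.0), ("2009-06-01", "2009-06-30")
for source in sources:
    print(source.query(bbox, when))
```

A WCS or WMS front end would then only need to translate request parameters into `query` calls, independent of how each source stores its data.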
Metadata as Service
  • Metadata not service: readme file
  • Metadata as service: Capability


  • WCS/WMS GetCapabilities Conventions that allow metadata reuse
  • WCS Capabilities expanded -> WMS (combination of WCS and Render) (build metadata in 2 steps: WCS first, then augment with WMS fields)
  • WMS GetCapabilities > ISO Maker Tool publish metadata
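
The idea of reusing GetCapabilities content as discovery metadata can be sketched as follows. The sample document is a made-up, heavily abbreviated WMS 1.1.1 capabilities response, and the output is a plain dictionary per layer rather than a real ISO 19115 record. The function name is hypothetical and does not refer to the actual ISO Maker Tool.

```python
import xml.etree.ElementTree as ET

# Made-up, abbreviated WMS 1.1.1 capabilities document for illustration.
SAMPLE_CAPABILITIES = """<WMT_MS_Capabilities version="1.1.1">
  <Service><Name>OGC:WMS</Name><Title>Example AQ server</Title></Service>
  <Capability>
    <Layer>
      <Title>All layers</Title>
      <Layer queryable="1">
        <Name>pm25_daily</Name>
        <Title>Daily PM2.5</Title>
        <LatLonBoundingBox minx="-125" miny="24" maxx="-66" maxy="50"/>
      </Layer>
    </Layer>
  </Capability>
</WMT_MS_Capabilities>"""


def layers_to_records(capabilities_xml):
    """Extract one simple metadata record per named layer."""
    root = ET.fromstring(capabilities_xml)
    records = []
    for layer in root.iter("Layer"):
        name = layer.findtext("Name")
        if name is None:
            continue  # container layers without a Name cannot be requested
        box = layer.find("LatLonBoundingBox")
        bbox = None
        if box is not None:
            bbox = tuple(float(box.get(k)) for k in ("minx", "miny", "maxx", "maxy"))
        records.append({
            "identifier": name,
            "title": layer.findtext("Title", default=""),
            "abstract": layer.findtext("Abstract", default=""),
            "bbox": bbox,
        })
    return records


for record in layers_to_records(SAMPLE_CAPABILITIES):
    print(record)
```

Fields that the capabilities document does not carry, such as AQ-specific classification, would still have to be added in a second step, in line with the two-step approach noted in the list above.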
Data Discovery
  • Semantic Mediation - Repackaging/homogenizing metadata - added value comes from incorporating user actions back into the semantic relationships.
  • Metadata system for publishing and finding content has to be jointly developed between data providers and users.
  • Generic catalog systems - metadata collection of not only what provider has done but also tracking what users need
    • Collecting and Enhancing Metadata from observing Users
  • Communication along the value chain, in both direction;
  • Metadata the glue and the message
  • Market approach; many providers; many users; many products
  • Faceted search
    • user is happy
  • Search by usage data

Description on how users will discover and use services provided by NASA, other Agencies, academia..

  • Detail on discovery services
  • System components for persistent availability of these services
    • machine-to-machine interface
    • GUI interface

Classes of Users

  • by value chain
  • by level of experience
  • by ...
Data Access and Usage

Provider Oriented Catalog:

The first requirement of interoperability is a common data model. The general data model for air quality data is that of a multi-dimensional data cube, with dimensions (X,Y,Z,T) in physical coordinates. Such a data model can be represented through Views, which are slices through the data cube organized by latitude, longitude, elevation and time. The sub-cubes can be one-dimensional, e.g. a time series at a specific location, or 3-4 dimensional, depending on the view. These slices are shown below in Fig. 1.

Figure 1. Multi-dimensional Data Model and Data Views

This is an abstract data model in the sense that the system responds to queries that are addressed to this data model. Implicit in the use of abstract data models is that all the data are accessed through a well-defined interface rather than as physical files. In other words, the goal is to turn data into a service. The physical data storage and management is an implementation issue that is of no concern to the data user.
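
To make the cube-and-views model concrete, the sketch below builds a tiny four-dimensional cube from nested lists and extracts map, time and elevation views from it. The axis values are invented, the cell values simply encode their own indices, and the function names are hypothetical. An operational system would use an array library or a database rather than nested lists.

```python
# Axis coordinates of a small, made-up 4-D cube indexed as cube[t][z][y][x].
TIMES = ["2009-06-01", "2009-06-02"]
ELEVATIONS = [0, 1000]  # meters
LATS = [30.0, 40.0]
LONS = [-100.0, -90.0, -80.0]

# Each cell value encodes its own indices so that slices are easy to check.
cube = [[[[1000 * t + 100 * z + 10 * y + x
           for x in range(len(LONS))]
          for y in range(len(LATS))]
         for z in range(len(ELEVATIONS))]
        for t in range(len(TIMES))]


def map_view(time, elevation):
    """2-D slice (lat by lon) at a fixed time and elevation."""
    return cube[TIMES.index(time)][ELEVATIONS.index(elevation)]


def time_view(lat, lon, elevation):
    """1-D slice: time series at one location and elevation."""
    z, y, x = ELEVATIONS.index(elevation), LATS.index(lat), LONS.index(lon)
    return [cube[t][z][y][x] for t in range(len(TIMES))]


def elevation_view(time, lat, lon):
    """1-D slice: vertical profile at one location and time."""
    t, y, x = TIMES.index(time), LATS.index(lat), LONS.index(lon)
    return [cube[t][z][y][x] for z in range(len(ELEVATIONS))]


print(map_view("2009-06-02", 0))
print(time_view(40.0, -90.0, 1000))
print(elevation_view("2009-06-01", 30.0, -80.0))
```

The caller never sees how the cube is stored, only the view functions, which is the sense in which the data model is abstract.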


The federated datasets can be queried, by simply specifying a latitude-longitude window for spatial views, time range for time views, etc. This universal access is accomplished by ‘wrapping’ the heterogeneous data (Fig. 3), a process that turns data access into a standardized web service, callable through well-defined Internet protocols [2].

Figure 3. Data Access Protocols and Adapters. The electric adapter is a good analogue of the DataFed software adapters.

The result of this ‘wrapping’ process is an array of homogeneous, virtual datasets that can be queried by spatial and temporal attributes and processed into higher-grade data products. The rich structure and semantics of Earth Science data mean that any given dataset can be accessed through multiple protocols. In general, each client and server is capable of communicating through a subset of protocols. Thus, loose coupling between data access and processing services involves choices and negotiations. The main topics of client-server negotiation are the selection of a shared data access protocol and the choice of returned data format.
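
The negotiation described above can be pictured with a small sketch. It assumes, purely for illustration, that the client ranks protocols and formats by preference and that the server advertises the formats it can return per protocol. The function and the capability dictionaries are hypothetical.

```python
def negotiate(client, server):
    """Pick the first protocol and format, in the client's order of preference,
    that the server also supports. Returns None when nothing is shared."""
    for protocol in client["protocols"]:
        server_formats = server["protocols"].get(protocol)
        if server_formats is None:
            continue  # the server does not speak this protocol
        for fmt in client["formats"]:
            if fmt in server_formats:
                return protocol, fmt
    return None


# Hypothetical capabilities of one client and one server.
client = {"protocols": ["WCS", "WMS"], "formats": ["netCDF", "CSV", "image/png"]}
server = {"protocols": {"WMS": ["image/png", "image/jpeg"], "WCS": ["CSV", "GeoTIFF"]}}
print(negotiate(client, server))
```

In practice the server side of this table would be read from the service's capabilities document rather than hard-coded.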

We have also created a WCS test server to deliver a wide variety of point, grid and image coverage data. Our goal was to evaluate the WCS protocol for accessing coverages of different types arising from a variety of Earth observation and modeling systems. It was demonstrated that, for air quality applications, WCS is a well-suited protocol for point/station (Fig. 6), image and gridded data (Fig. 7). Fig. 8 shows the WCS queries for Map, Time, and Elevation views of a 4-dimensional dataset.


Figure 6. WCS Query for Point Data Type

Figure 7. Universal WCS Data Query for Grid, Image, and Point Data Types

Figure 8. WCS Queries for Map, Time, and Elevation Views

The strength of WCS is in the simplicity and universality of the BBOX, TIME data query. This vital query feature is shared with WMS and WFS queries, which makes the OGC protocols compatible with the abstract multi-dimensional data model (and vice versa). It is clear, however, that there is considerable work to be done on extending the DescribeCoverage schema to accommodate these coverage types. Also, the data types returned need to be extended (see Fig. 9) to accommodate these additional coverage types. The discussion of possible WCS extensions is given in a separate report to the GALEON IE group.
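
As an illustration of the BBOX, TIME query, the sketch below assembles a WCS 1.0.0 GetCoverage request in key-value form. The endpoint, coverage name and format string are placeholders rather than real DataFed values, and the TIME value is passed through unchanged because the range syntax a server accepts may vary. Elevation handling is left out.

```python
from urllib.parse import urlencode


def get_coverage_url(endpoint, coverage, bbox, time, fmt, width=None, height=None):
    """Build a WCS 1.0.0 GetCoverage URL from a bounding box and a time value.

    bbox is (west, south, east, north) in EPSG:4326 degrees.
    """
    params = {
        "SERVICE": "WCS",
        "VERSION": "1.0.0",
        "REQUEST": "GetCoverage",
        "COVERAGE": coverage,
        "CRS": "EPSG:4326",
        "BBOX": ",".join(str(v) for v in bbox),
        "TIME": time,
        "FORMAT": fmt,
    }
    if width and height:
        # Grid size of the returned coverage, when the caller specifies one.
        params.update(WIDTH=str(width), HEIGHT=str(height))
    # Keep commas, colons and slashes readable instead of percent-encoding them.
    return endpoint + "?" + urlencode(params, safe=",:/")


# Map view: a spatial window at one time instant (all values are placeholders).
print(get_coverage_url("http://example.org/wcs", "pm25", (-125, 24, -66, 50),
                       "2009-06-01", "NetCDF", width=360, height=180))
```

Because the same BBOX and TIME keys appear in WMS GetMap requests, a similar builder could serve both protocols with only the request-specific parameters changed.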

In DataFed, we have adopted the OGC [11] WMS and WCS protocols as the "convergence" protocols [3] for the standards-based access for all datasets.

Fig. 5a. Key data types: sequential images, multidimensional grids and station-point data. Fig. 5a. Schematics of OGC standard protocols, WMS and WCS. 

OGC WCS is particularly applicable for representing space-time-varying phenomena in the Fluid Earth Sciences, atmosphere and oceans. OGC WCS version 1.1 is limited to grids, or “simple” coverages, with homogeneous range sets, but future revisions of the standard are anticipated to include support for a broader set of coverages, including point coverages. Attractive features of these services are that (1) they can be executed using the simple, universal HTTP GET/POST Internet protocol; and (2) the services are described by formal XML documents (“GetCapabilities”, “DescribeCoverage”), and the output formats can be advertised in those service documents.



  • All data from a provider (subset fall data)
  • Metadata only (Standard protocol)
  • Provider metadata (meta-meta-meta data)

Technology Approach:

The project will rely on three key mature and widely used standard protocols for interoperability: (1) OGC WMS, WCS and WFS web services for accessing data; (2) ISO 19115 for geographic metadata; and (3) RSS/Atom and HTTP for inter-service data transfer. Furthermore, the software under development is implemented using three key maturing 'intellectual technologies': (1) tagging for flexible, user-extensible structuring and annotation of diverse metadata; (2) faceted search technology for navigation through multidimensional data discovery; and (3) an Ajax-based dynamic user interface for data discovery and exploration.

Finder offers the service of finding datasets through faceted search, info about users, ... However, it links to a service called DataSpaces that provides information about the particular resource. This is just one place that DataSpaces is linked/embedded/contributed to. Providers could also link to DataSpaces/Finder for their resources, i.e. query Finder for all datasets by Giovanni, and Finder returns a list of datasets with links to the DataSpaces pages that more fully describe each dataset. Amazon parallel: an author can query for books written by them and get a feed list of the books to embed in their site. The links to Amazon are to the description pages (see blog).

Finder services:

  • Displaying things you may be interested in based on your previous searches
  • Providing category search -> that incorporates facets as you get more specific
  • changing the facets as your search deepens
  • Provides user information/analytics about searches to dataspaces

Dataspaces services

  • provides aggregate metadata coming from many sources

Extensions of Past Work and Impact of this Work

• The perceived impact of the proposed work to the state of knowledge in the field and, if the proposal is offered as a direct successor to an existing NASA award, how the proposed work is expected to build on and otherwise extend previous accomplishments supported by NASA;

Relevance to NASA Programs

Relevance to NRA Objectives

• The relevance of the proposed work to past, present, and/or future NASA programs and interests or to the specific objectives given in the NRA;

Proposals must address how effective the total life-cycle cost of development and maintenance is, and whether the design is appropriate to usage for the targeted users.

  • value formula
  • each user defines value of data, net value is value defined - costs of finding/accessing/judge...
  • if providers can alleviate some of the burden by improving the finding, accessing,
  • if you think you only have one user ... then keep it
  • if you think you may have many users then system is appropriate.

Is it worth adding a WCS wrapper and AQ metadata?
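
The value formula outlined in the notes above can be written down as a small sketch. The break-even test is one possible reading of the "one user" versus "many users" notes, not a formula from the proposal, and all names, units and numbers are hypothetical.

```python
def net_value(value_to_user, finding_cost, access_cost, judging_cost):
    """Net value of a dataset to one user: the value the user assigns to it
    minus the user's own costs of finding, accessing and judging it."""
    return value_to_user - (finding_cost + access_cost + judging_cost)


def wrapping_pays_off(n_users, saving_per_user, wrapping_cost):
    """Assumed break-even test: the provider's one-time cost of adding a WCS
    wrapper and AQ metadata is justified when the total effort saved across
    all users exceeds it."""
    return n_users * saving_per_user > wrapping_cost


# Illustrative numbers only, in arbitrary effort units.
print(net_value(10.0, 3.0, 4.0, 1.0))
print(wrapping_pays_off(n_users=1, saving_per_user=5.0, wrapping_cost=40.0))
print(wrapping_pays_off(n_users=50, saving_per_user=5.0, wrapping_cost=40.0))
```

Under this reading the wrapping cost is paid once while the saving recurs for every user, which is why the answer depends mainly on the expected number of users.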

General Work Plan

Data Finding

  • Use Scheffee dataset list as example of user-filtered datasets as candidates for formal access in AQ ...

Connect data hubs together

Data Access

Data Summit... listing hubs and then Connection to data hubs. DataFed - data access. EMAP/NILU EPA AQS VIEWS


  • including anticipated key milestones for accomplishments,
  • the management structure for the proposal personnel,

Management Approach:

Responsibilities

Provider, Users, Mediators (all contribute)

  • Identify candidate AQ datasets
    • machine harvest
    • user needs
    • Provider push
    • mediator flow
  • Decide on classification system

Librarians/Operators... - us

  • publish metadata for structuring/finding datasets that is consistent across AQ and with everything else (across SBAs)
  • provide AQ classification/additional metadata for each dataset
  • provide tools methods to turn data into service
  • prepare data for access

Providers

  • publish metadata discovery, access, intrinsic (how collected, sensor)
  • prepare their own data for access

Mediators

  • prepare data for access

Users

  • publish feedback and pagehits
  • if other groups don't prepare data, user may prepare data for access

The major Core group Community - Find-bind ... was developed and tested during AIP2 by the community Collaborative,

Semantic people Provenance Workflow

  • any substantial collaboration(s) and/or use of consultant(s) that is(are) proposed to complete the investigation;
  • a description of the expected contribution to the proposed effort by the PI and each person as identified in one of the additional categories in Section 1.4.2, regardless of whether or not they derive support from the proposed budget.

[edit] Management

Community approach:

  • ESIP AQ Workgroup with links to
    • GEO CoP -> Linking multi-region (global), multi SBA
  • Agency (Air Quality Information Partnership (AQIP))
    • EPA
    • NASA
    • NOAA
    • DOE ....
  • ESIP
    • Semantic Cluster
      • Offer: a rich highly textured data needing semantics
      • Needed: Semantics of the data descriptions and finding
    • Web Services and Orchestration Cluster
      • Offer: A rich array of WCS data access services
      • Different workflow & orchestration clients
    • Meetings
      • Winter
      • Summer
    • Telecons
    • OGC WCS netCDF
      • Stefano
      • Ben
      • Max Cugliano
  • Other Related Proposals/Projects
    • Show our CC proposal to AQWG
      • Ask them if they have a way to use this CC as a testbed
      • Add a paragraph to the proposal to indicate the way theirs fits in

[edit] Data Sharing Plan

• To facilitate data sharing where appropriate, as part of their technical proposal, the Proposer shall provide a data-sharing plan and shall provide evidence (if any) of any past data-sharing practices.

The Scientific/Technical/Management Section may contain illustrations and figures that amplify and demonstrate key points of the proposal (including milestone schedules, as appropriate). However, they must be of an easily viewed size and have self-contained captions that do not contain critical information not provided elsewhere in the proposal.

[edit] Misc.

Current Version of AQ Data Finder





[edit] Publishing (metadata -> WAF)

The data discovery through the clearinghouse is aided by ISO metadata for Geospatial data which is prepared for each dataset.

The primary purpose of the metadata is to facilitate finding and accessing the data, which addresses the first two hurdles that users face. The air quality specific metadata, such as sampling platform, data domain and measured parameters, need to be defined by air quality users. The hurdles of data quality and multi-sensor data integration are topics of future efforts.

The metadata is prepared by transforming and augmenting OGC GetCapabilities into ISO metadata records. The GetCapabilities document provides initial metadata for an ISO 19115 metadata record; the tools and methods provided then extend this record with AQ-specific metadata. The ISO record is validated and saved into the AQ community catalog. The community catalog is registered as a component in the GEOSS Component and Service Registry (CSR). The GEOSS Clearinghouses query the GEOSS CSR for catalogs and then periodically harvest the catalogs for their metadata records, which completes the metadata publishing process.
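The transformation step can be sketched roughly as follows. This is a minimal illustration, not the project's actual tooling: the trimmed, namespace-free capabilities document, the flat dictionary standing in for an ISO 19115 record, and the names `capabilities_to_record`, `validate` and `REQUIRED_FIELDS` are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed GetCapabilities response (real documents
# are namespaced and much larger).
CAPABILITIES_XML = """
<WMS_Capabilities version="1.3.0">
  <Service>
    <Title>Example AQ Service</Title>
    <Abstract>Surface PM2.5 observations</Abstract>
  </Service>
  <Capability>
    <Layer>
      <Name>pm25</Name>
      <Title>PM2.5 mass concentration</Title>
    </Layer>
  </Capability>
</WMS_Capabilities>
"""

# Assumed set of fields a record needs before it is saved to the catalog.
REQUIRED_FIELDS = ("title", "abstract", "layers", "platform", "parameter")


def capabilities_to_record(xml_text, aq_metadata):
    """Seed a record from GetCapabilities, then add AQ-specific metadata."""
    root = ET.fromstring(xml_text.strip())
    record = {
        "title": root.findtext("Service/Title"),
        "abstract": root.findtext("Service/Abstract"),
        "layers": [layer.findtext("Name") for layer in root.iter("Layer")],
    }
    record.update(aq_metadata)  # augmentation step
    return record


def validate(record):
    """Return the required fields that are missing or empty."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


record = capabilities_to_record(
    CAPABILITIES_XML, {"platform": "surface network", "parameter": "pm25"}
)
print(record["title"], validate(record))
```

A record that passes `validate` would then be serialized to ISO XML and saved to the community catalog; that serialization is left out of this sketch.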

[edit] Finding (...)

The finding of air quality data is accomplished in two stages: a coarse filter generic to all Earth Observations through the clearinghouse and then a high resolution filter specific to air quality in the AQ Catalog Browser.

The GEOSS clearinghouse provides a coarse filter using generic discovery metadata to find Earth Observations. In addition, the clearinghouse exposes a search API which enables machine queries. The returned records can be searched further using the entire metadata record originally submitted, allowing a more refined search specific to a particular community. The AQ Community has built a catalog browser interface to the clearinghouse which enables this two-step process. After the initial coarse filter, the returned records are browsed using a customized, faceted search interface built to search the extended AQ metadata and find AQ data using specific filters such as sampling platform and data structure. Finding the right data is further enhanced by the user's ability to immediately view data as WMS through multiple clients provided by ESRI, Compusult and others. This proposal will allow query results to be embedded in another web page. It will also link to multiple available WMS viewers for browsing the layers available in the catalog.
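The two-stage process can be illustrated with a small in-memory sketch. The records, field names and function names below are hypothetical; a real implementation would query the clearinghouse search API for the first stage instead of filtering a local list.

```python
from collections import Counter

# Hypothetical metadata records as they might come back from a catalog.
RECORDS = [
    {"id": "a", "keywords": ["atmosphere", "aerosol"],
     "platform": "satellite", "structure": "grid"},
    {"id": "b", "keywords": ["atmosphere", "ozone"],
     "platform": "surface network", "structure": "point"},
    {"id": "c", "keywords": ["ocean"],
     "platform": "satellite", "structure": "grid"},
]


def coarse_filter(records, keyword):
    """Stage 1: generic discovery filter on keywords."""
    return [r for r in records if keyword in r["keywords"]]


def facet_filter(records, **facets):
    """Stage 2: keep records whose AQ-specific fields match every facet."""
    return [r for r in records
            if all(r.get(name) == value for name, value in facets.items())]


def facet_counts(records, facet):
    """Count records per facet value, as a faceted browser would display."""
    return Counter(r[facet] for r in records)


atmosphere = coarse_filter(RECORDS, "atmosphere")
print(facet_counts(atmosphere, "platform"))
print([r["id"] for r in facet_filter(atmosphere, platform="satellite")])
```

Keeping the stages separate mirrors the division of labor in the text: the first stage only needs generic discovery metadata, while the second relies on the extended AQ fields.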

Additionally, the finding process will be augmented through web-based analytics that monitor user activity of the AQ catalog. These analytics will highlight where users come from spatially and virtually (i.e. by search engine, link...) as well as what datasets, spatial and temporal domain and parameters are most viewed or what datasets were viewed together. This feedback will help to hone the catalog to provide more useful information to users such as more datasets in a certain domain or tips for what others who viewed this dataset also viewed. The metrics will also help providers by identifying who some of the users of the data are.
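As a rough sketch of the "viewed together" analytics, the snippet below counts how often two datasets appear in the same user session. The session format (one list of dataset names per visit) and the function names are assumptions for illustration; an analytics service would supply the real data.

```python
from collections import Counter
from itertools import combinations


def co_view_counts(sessions):
    """Count, for every pair of datasets, the sessions that viewed both."""
    counts = Counter()
    for viewed in sessions:
        # De-duplicate and sort so each pair has one canonical key.
        for pair in combinations(sorted(set(viewed)), 2):
            counts[pair] += 1
    return counts


def also_viewed(counts, dataset, top=3):
    """Datasets most often viewed in the same session as `dataset`."""
    related = Counter()
    for (first, second), n in counts.items():
        if first == dataset:
            related[second] += n
        elif second == dataset:
            related[first] += n
    return [name for name, _ in related.most_common(top)]


# Hypothetical sessions; dataset names are placeholders.
sessions = [["aqs", "views"], ["aqs", "views", "modis"], ["modis", "aqs"]]
counts = co_view_counts(sessions)
print(also_viewed(counts, "views"))
```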


[edit] Binding

Once the data are accessible through standard service protocols and discoverable through the clearinghouse they can be incorporated and browsed in any client application including the ESRI and Compusult GEO Portals.

The registered datasets are also directly accessible to air quality specific, work-flow based clients which can perform value-adding data processing and analysis.

The loose coupling between the growing data pool in GEOSS and workflow-based air quality client software shows the benefits of the Service Oriented Architecture to the Air Quality and Health Societal Benefit Area.

Add new browser and client


[edit] Web Services

A Web Service is a URL-addressable resource that returns requested data, e.g. current weather or the map for a neighborhood. Web Services use standard web protocols (HTTP, XML, SOAP, WSDL) that allow computer-to-computer communication, regardless of language or platform. Web Services are reusable components, like ‘LEGO blocks’, that allow agile development of richer applications with less effort. Visionaries (e.g. Berners-Lee, the ‘father’ of the World Wide Web) argue that Web services can transform the web from a medium for viewing and downloading into one for distributed data/knowledge exchange and computing.
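To make "URL addressable" concrete, the sketch below assembles an OGC WMS 1.1.1 GetMap request as a plain URL. The endpoint `http://example.org/wms` and the layer name `pm25` are placeholders, not real services; only the query parameter names come from the WMS standard.

```python
from urllib.parse import urlencode


def build_getmap_url(endpoint, layer, bbox, width=512, height=256):
    """Return a WMS 1.1.1 GetMap URL for one layer and bounding box.

    bbox is (min_lon, min_lat, max_lon, max_lat) in EPSG:4326.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",
        "SRS": "EPSG:4326",
        "BBOX": ",".join(str(value) for value in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
    }
    return endpoint + "?" + urlencode(params)


# Placeholder endpoint and layer; a real client would fetch this URL.
print(build_getmap_url("http://example.org/wms", "pm25", (-125, 24, -66, 50)))
```

Because the whole request is encoded in the URL, any client that can issue an HTTP GET can reuse the service without knowing how the data are stored.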

Enabling protocols of the Web Services architecture:

  • Connect: Extensible Markup Language (XML) is the universal data format that makes data and metadata sharing possible.
  • Communicate: Simple Object Access Protocol (SOAP) is a W3C protocol for data communication, e.g. making and responding to requests.
  • Describe: Web Service Description Language (WSDL) describes the functions, parameters and the returned results from a service.
  • Discover: Universal Description, Discovery and Integration (UDDI) is a broad OASIS effort for locating and understanding web services.

Service Oriented Architecture (SOA) provides methods for systems development and integration where systems package functionality as interoperable services. SOA allows different applications to exchange data with one another. Service-orientation aims at a loose coupling of services with operating systems, programming languages and other technologies that underlie applications. These services communicate with each other by passing data from one service to another, or by coordinating an activity between two or more services. SOA can be seen as a continuum, from older concepts of distributed computing and modular programming through to current practices of mashups and Cloud Computing.


Image:20090505 AIP2 ADC UICSlide3.PNG
There are numerous Earth Observations that are available and in principle useful for air quality applications such as informing the public and enforcing AQ standards. However, connecting a user to the right observations or models is accompanied by an array of hurdles.

The GEOSS Common Infrastructure allows the reuse of observations and models for multiple purposes

Even in the narrow application of Wildfire smoke, observations and models can be reused.
Image:20090505 AIP2 ADC UICSlide4.PNG

Image:20090505 AIP2 ADC UICSlide5.PNG
The ADC and UIC are both participating stakeholders in the functioning of the GEOSS information system that overcomes these hurdles. The UIC is in position to formulate questions and the ADC can provide infrastructure that delivers the answers.

Image:20090505 AIP2 ADC UICSlide6.PNG
The data reuse is possible through the service oriented architecture of GEOSS.

  • Service providers register services in the GEOSS Clearinghouse.
  • Users discover the needed service and access the data.

The result is a dynamic binding mechanism for the construction of loosely-coupled work-flow applications.
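The publish-find-bind pattern can be sketched in a few lines. This is an in-memory toy, not the GEOSS registry interface: the `Registry` class, the `bind` helper and the use of a Python callable as a stand-in for a remote service endpoint are all assumptions for illustration.

```python
class Registry:
    """Toy service registry: providers publish, users find."""

    def __init__(self):
        self._services = []

    def publish(self, name, keywords, endpoint):
        """Register a service description with its (callable) endpoint."""
        self._services.append(
            {"name": name, "keywords": set(keywords), "endpoint": endpoint}
        )

    def find(self, keyword):
        """Return all registered services tagged with the keyword."""
        return [s for s in self._services if keyword in s["keywords"]]


def bind(service, request):
    """Invoke the discovered service; the client knows only its description."""
    return service["endpoint"](request)


registry = Registry()
# Hypothetical provider: returns a canned value instead of real data.
registry.publish("pm25-surface", ["aerosol", "pm25"],
                 lambda request: {"parameter": "pm25", "region": request})

matches = registry.find("pm25")
print(bind(matches[0], "north-america"))
```

The client never refers to the provider directly, only to what it found in the registry, which is what keeps the coupling loose.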

Image:20090505 AIP2 ADC UICSlide7.PNG


The finding of air quality data is accomplished in two stages.

  • the data are filtered through the generic discovery mechanism of the clearinghouse
  • then air quality specific filters such as sampling platform and data structure are applied

Image:20090505 AIP2 ADC UICSlide9.PNG
Once the data are accessible through standard service protocols and discoverable through the clearinghouse they can be incorporated and browsed in any application including the ESRI and Compusult GEO Portals.

Image:20090505 AIP2 ADC UICSlide10.PNG
Image:20090505 AIP2 ADC UICSlide11.PNG

[edit] The Network

  • Fan-In, Fan-Out
  • (so is GCI) not central
  • holarchy, data up into the pool through the aggregator network and down the disaggregator/filter network

Image:ScaleFreeNetwork3.png

  • Data distributed through Scale-free aggregation network. Metadata contributed along the line of usage. Homogenized and shared.


This proposal...application of the GEOSS concepts in the federated data system, DataFed. The proposal focuses on the SOA aspects of publish-find-bind. ...a contribution to the emerging architecture of GEOSS. It is recognized that it represents just one of the many configurations that are consistent with the loosely defined concept of GEOSS.


The implementation details and the various applications of DataFed are reported elsewhere [4]-[6].



Data Value Chain Stages: Acquisition - Mediation - Application

  • Acquisition: Data from Sensor -> CalVal -> Data exposed
  • Mediation: Accessible/Reusable -> Leverageable
  • Application: Processed -> Leveraged Synergy -> Productivity



User Oriented Catalog:

  • The right data to the user at the right time (the right subset of all data)
  • Seamlessly accessible (Standard protocol)
  • Complete Metadata (meta-meta-meta data)


  • Has to handle derived data (Raw-processed Pyramid --- less along the value chain-network)

[edit] Performance Measurement and Feedback

[edit] Metadata from Providers

Active contributions:

  • Provide discovery, access information
  • Providers can also provide information about how users behave once they are at their site

Analytics contributions:

  • Providers could expose monitoring data about usage on their site in order to provide information about who uses the data, where, when...
  • Mediators could aggregate monitoring data from multiple providers of the same data

[edit] Metadata from Users

Analytics:

  • Google Analytics/Google Sitemap - provides feedback and helps market process by improving "shopping" experience for users - creates values to both users and producers.
  • Amazon - collects data on user actions in order to help the next user navigate to books of interest. collects data on text in the book in order to relate books together
  • Recently Viewed widget - tracks the last things that the user viewed
  • The 'What users do next' would allow a capture of
    • other datasets viewed at the same time
    • tools that are used with this dataset - based on the analytics monitoring tool/data combinations you could provide information on "this data is most commonly used with this tool"


User active contributions
Tags:

  • Users can tag based on which project the data is used in, event, ...
  • Users could log in and select favorites or tag in their own way; this would allow navigation of data by users
    • The benefit gained by logging in, is that the catalog would be personalized
    • The benefit others gain is an additional relationship of datasets that only a human could know.
    • Logging in would also allow the identification of 'data experts' - who are people that have contributed a lot about this dataset
  • Dataset popularity could be shown like in delicious - x# of people have tagged this.
  • Feeds of particular tag query can be fed to different sources
  • This extends the "what links here" that could be embedded in the dataspace page. (youtube and sitemap both track what links here)
  • Amapedia - Amazon "DataSpaces" - lets users give structured tags "facts" that allow additional navigation in novel ways.
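A minimal sketch of how user tags could drive both the popularity count and tag-based navigation is shown below. The `(user, dataset, tag)` tuple format and the function name `tag_index` are assumptions; the user, dataset and tag values are placeholders.

```python
from collections import defaultdict


def tag_index(taggings):
    """Build popularity and navigation indexes from (user, dataset, tag) tuples.

    Returns (popularity, datasets_by_tag), where popularity maps each dataset
    to the number of distinct users who tagged it.
    """
    users_by_dataset = defaultdict(set)
    datasets_by_tag = defaultdict(set)
    for user, dataset, tag in taggings:
        users_by_dataset[dataset].add(user)
        datasets_by_tag[tag].add(dataset)
    popularity = {d: len(users) for d, users in users_by_dataset.items()}
    return popularity, datasets_by_tag


# Hypothetical taggings.
taggings = [
    ("ann", "aqs", "smoke-2008"),
    ("bob", "aqs", "pm25"),
    ("ann", "modis", "smoke-2008"),
]
popularity, by_tag = tag_index(taggings)
print(popularity["aqs"], sorted(by_tag["smoke-2008"]))
```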

Reviews

  • Users can offer "reviews" of the data - feedback on problems, advertise where they use it, questions about the data
  • Users can offer help docs, papers or other information they have on the data (also adding to the expertise of a particular contributor)

[edit] Metadata from Mediators

This is what I think we could include in the catalog now:

  • Most popular queries (Each Query has a unique url, so it shows up in the page views)
  • Most often queried next (can show navigation between query urls, so can see what the next step normally is.)
  • People that queried this often went to this dataset. (Navigation summary also includes dataset pages, so by separating datasets and queries we can have both what people queried next and datasets viewed from this query)
  • Once you navigate to dataset, then you can include people that viewed this data often wanted WMS, WCS, GE and -> what did they do next.
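The "queried next" and "what did they do next" items above amount to counting page-to-page transitions. A minimal sketch, assuming each session is available as an ordered list of page URLs (the URLs and function names here are hypothetical):

```python
from collections import Counter, defaultdict


def next_step_counts(sessions):
    """Count how often each page is followed directly by each other page."""
    transitions = defaultdict(Counter)
    for pages in sessions:
        for current, following in zip(pages, pages[1:]):
            transitions[current][following] += 1
    return transitions


def most_common_next(transitions, page):
    """Return the page most often visited right after `page`, or None."""
    counts = transitions.get(page)
    return counts.most_common(1)[0][0] if counts else None


# Hypothetical sessions: a query URL followed by dataset and access pages.
sessions = [
    ["/q?param=pm25", "/dataset/aqs", "/dataset/aqs/wms"],
    ["/q?param=pm25", "/dataset/aqs"],
    ["/q?param=pm25", "/dataset/views"],
]
transitions = next_step_counts(sessions)
print(most_common_next(transitions, "/q?param=pm25"))
```

Since each query has a unique URL, the same counts cover query-to-query, query-to-dataset and dataset-to-service navigation; separating them is a matter of filtering on the URL pattern.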

  • Alexa Amazon Web Information Service provides analytics about another site. This is good for us because we could access analytics about distributed data access and show them uniformly in the catalog (Site Overview pages, Traffic Detail pages and Related Links pages)
    • Alexa also example of site pulling in information about one URL from multiple sources. (Amazon example)
  • perform collaborative filtering based upon data collected from more than one [data provider]
  • Avinash Kaushik - Evangelist for Google Analytics
  • Key Performance Indicators -
    • Where do people come from? search, other links - % of visits across all traffic source
    • Bounce rate - Came to one page and left immediately (Good for us b/c it helps direct people to the right information/right place/right time)
      • Combining where people come from and how many people from that source bounce lets you know if you are targeting the right people...
      • Wrong audience
      • Wrong Landing page
    • Visitor loyalty - # that come in a given duration
    • Recency of visit - do you retain people over time
  • What are the key outcomes you want people to do (i.e. subscribe to feed, click on data link, click on metadata link...)
  • What is the top content on site?
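Combining traffic source with bounce rate, as suggested in the indicators above, could look roughly like this. The visit format, a `(source, pages_viewed)` pair per visit, is an assumption; a bounce is taken to mean a visit with exactly one page view.

```python
from collections import defaultdict


def bounce_rate_by_source(visits):
    """Fraction of single-page visits per traffic source.

    visits is an iterable of (source, pages_viewed) pairs.
    """
    totals = defaultdict(int)
    bounces = defaultdict(int)
    for source, pages_viewed in visits:
        totals[source] += 1
        if pages_viewed == 1:
            bounces[source] += 1
    return {source: bounces[source] / totals[source] for source in totals}


# Hypothetical visits.
visits = [("search", 1), ("search", 3), ("link", 1), ("link", 1)]
print(bounce_rate_by_source(visits))
```

A source with a high bounce rate points to either the wrong audience or the wrong landing page, which is the distinction drawn in the list above.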


[edit] links

ESIP


DataFed wiki

NASA Existing Component - Links

  • Atmosphere Data Reference Sheet - Datasets identified to be relevant to atmospheric research.
  • Giovanni - provides a simple and intuitive way to visualize, analyze, and access vast amounts of Earth science remote sensing data without having to download the data. GIOVANNI metadata briefly describes the parameters
  • parameter information pages - provide short descriptions of important geophysical parameters; information about the satellites and sensors which acquire data relevant to these parameters; links to GES DAAC datasets which contain these parameters; and external data source links where data or information relevant to these parameters can be found.
  • Mirador - new search and order Web interface that employs the Google mini appliance for metadata keyword searches. Other features include quick response, data file hit estimator, Gazetteer (geographic search by feature name capability), and event search
  • Frosty
  • GCMD - Has selected Air Quality datasets; provides lots of discovery metadata (keywords, citation, etc.) but lacks standard data access.
  • A-Train Data Depot - to process, archive, allow access to, visualize, analyze and correlate distributed atmospheric measurements from A-Train instruments.
  • Atmospheric Composition Data and Information Service Center - is a portal to the Atmospheric Composition (AC) specific, user driven, multi-sensor, on-line, easy access archive and distribution system employing data analysis and visualization, data mining, and other user requested techniques for the better science data usage.
  • WIST - Warehouse Inventory Search Tool. This search-and-order tool is the primary access point to 2,100 EOSDIS and other Earth science data sets
  • FIND - The FIND Web-based system enables users to locate data and information held by members of the Federation (DAACs are Type 1 ESIPs.) FIND incorporates EOSDIS data available from the DAAC Alliance data centers as well as data from other Federation members, including government agencies, universities, nonprofit organizations, and businesses.
  • SESDI Semantically-Enabled Science Data Integration | Vision - ACCESS Project, Peter Fox - will demonstrate how ontologies implemented within existing distributed technology frameworks will provide essential, re-useable, and robust, support for an evolution to science measurement processing systems (or frameworks) as well as for data and information systems (or framework) support for NASA Science Focus Areas and Applications.


[edit] REQUIRED CONSTITUENT PARTS OF A PROPOSAL

(in order of assembly, each part followed by its page limit)

  • Proposal Cover Page: No page limit when generated by electronic proposal system
  • Proposal Summary (abstract): 4,000 characters, included in Proposal Cover Page
  • Table of Contents: 1
  • Scientific/Technical/Management Section: 15*
  • References and Citations: As needed
  • Biographical Sketches for:
    • the Principal Investigator(s): 2 (per PI)
    • each Co-Investigator: 1
  • Current and Pending Support: As needed
  • Statements of Commitment and Letters of Support: As needed
  • Budget Justification: Narrative and Details (including Proposing Organization Budget, itemized lists detailing expenses within major budget categories, and detailed subcontract/subaward budgets)
    • Budget Narrative: As needed (including Summary of Proposal Personnel 1 and Work Effort and Facilities and Equipment) 2
    • Budget Details: As needed
  • Special Notifications and/or Certifications: As needed

[edit] ACCESS proposals must address the following additional factors

  • The ACCESS program, while focusing on IT deployment, is centered on serving the research and applied science communities, therefore, proposal teams must include both information technology and Earth science experts.
  • Proposals should be tied directly to an Earth science research issue(s) or investigations with a clear objective and work plan for technology execution and deployment.
  • Proposals should clearly identify the Earth science focus area and/or the science application to be served by the technical work proposed.
  • The period of award for these projects is two years. While in some instances an additional year work may be requested to be proposed, proposal work plans and deliverables described must be limited to two years.
  • Work plans must include the current state of practice/application for the tool or service proposed and identify the improvements or augmentation that will result from the two year ACCESS award.
  • If the proposal is leveraging or extending past work funded by the ACCESS program, the relevancy to this ACCESS solicitation should be made clear.
  • Proposals submitted in response to this solicitation must provide an operations concept for continuance of the tools and services developed for the ACCESS program (see below).
  • Proposers should review the Earth Science Data Rights and Related Issues document (http://nasascience.nasa.gov/earth-science/earth-science-data-centers/data-and-information-policy/) and where applicable include plans for the future state of their information system software rights.
  • Proposals must address how effective the total life-cycle cost of development and maintenance is, and whether the design is appropriate to usage for the targeted users.