2007-07-18: DataSpace Evolution

From Datafedwiki

Datasets are easy to catalog in DataFed and elsewhere, but those catalogs were not easily modifiable by users, and their content was not reusable in other places. The structured catalog also left no room for user feedback or for other harvestable "unstructured" resources such as papers, pictures, or links.

Originally we moved just the dataset descriptions to the wiki and tagged these pages with Dataset tags. When we needed to sort the datasets by type, platform, and so on, we tagged each page with the value of each structured key-value pair (e.g. Satellite for Sample Platform = Satellite). The pages could then be filtered by combinations of tags using the Dynamic Page List extension. Losing the key was a problem: a dataset in the domain Aerosol is not necessarily related to an arbitrary page that someone happened to tag Aerosol, yet there was no way to tell the two apart. As a hack, we extended the tag to Category:DDom:Aerosol. The prefix is readable only by humans and the computer cannot interpret it at all, but it did distinguish the DDom Aerosol tag from a random Aerosol tag.
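To make the trade-off concrete, here is a minimal Python sketch of the tag filtering described above. The page names, tag sets, and the `Platform:` prefix are hypothetical, and the `pages_with_all` helper only mimics the tag intersection a Dynamic Page List query performs; it is not DPL's implementation.

```python
# Flat tags: the key (Domain, Platform) is dropped and only the value survives.
flat_tagged = {
    "Dataset_A": {"Dataset", "Aerosol", "Satellite"},
    "Dataset_B": {"Dataset", "Met", "Surface"},
    "Workshop_Notes": {"Aerosol"},  # unrelated page that happens to use the same tag
}

# Prefixed tags: the key is folded into the tag text, as in Category:DDom:Aerosol.
prefixed_tagged = {
    "Dataset_A": {"Dataset", "DDom:Aerosol", "Platform:Satellite"},
    "Dataset_B": {"Dataset", "DDom:Met", "Platform:Surface"},
    "Workshop_Notes": {"Aerosol"},
}


def pages_with_all(tagged, *tags):
    """Return the pages that carry every requested tag (a set intersection)."""
    wanted = set(tags)
    return sorted(page for page, page_tags in tagged.items() if wanted <= page_tags)


print(pages_with_all(flat_tagged, "Aerosol"))           # ['Dataset_A', 'Workshop_Notes']
print(pages_with_all(prefixed_tagged, "DDom:Aerosol"))  # ['Dataset_A']
print(pages_with_all(prefixed_tagged, "Aerosol"))       # ['Workshop_Notes']
```

The prefixed tag separates the dataset from the unrelated page, but the dataset no longer matches a plain Aerosol query.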

At the same time we evolved the use of a template as a way to standardize and structure dataset content. The template gave us a faux form in which the content and the layout were separate. This was a first step toward a scalable system: the dataset page layout could be changed without manually updating every page. Even with all of this structuring of categories and pages, the data entered in either place was not easily reusable. Tagging with DDom:Aerosol did not allow other kinds of queries to be asked, and worse, it permanently removed those pages from the plain Aerosol tag, so a search for every wiki page remotely connected to Aerosol would miss them. The template looks like key-value pairs, but the values entered in it cannot be reused, even within the same page. We worked around this by creating a short version of the template for the compact catalog and re-entering all of the parameters a second time. The problem with this fix was that, because the two copies were disconnected, changes made on a dataset page did not propagate to the catalog; likewise, when a new dataset was added, the user had to remember to add it to the compact catalog as well. We knew this fix was temporary and could not be left in place.
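The update anomaly can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the wiki's template machinery: the dataset parameters, both layouts, and the use of `string.Template` are hypothetical stand-ins for a MediaWiki template and its hand-copied compact-catalog version.

```python
from string import Template

# Hypothetical parameters entered once on a dataset page.
dataset_page = {"title": "Example AOT", "domain": "Aerosol", "platform": "Satellite"}

# Two layouts over the same parameters: the full page and a compact catalog row.
# Changing FULL would change every rendered page without touching the parameters.
FULL = Template("== $title ==\nDomain: $domain\nPlatform: $platform")
COMPACT = Template("| $title || $domain || $platform")

# Old workaround: the compact catalog keeps its own hand-copied parameters.
compact_catalog_copy = dict(dataset_page)

# Someone later edits the dataset page...
dataset_page["platform"] = "Surface"

page_text = FULL.substitute(dataset_page)             # reflects the edit
stale_row = COMPACT.substitute(compact_catalog_copy)  # still says Satellite
fresh_row = COMPACT.substitute(dataset_page)          # rendered from the single source

print(page_text)
print(stale_row)  # | Example AOT || Aerosol || Satellite
print(fresh_row)  # | Example AOT || Aerosol || Surface
```

Only the row rendered from the single source reflects the edit; the hand-copied row silently goes stale.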

Semantic MediaWiki is the latest "big step" in our DataSpace evolution. The enhanced system supports attributes and relations; an attribute is a characteristic of the object. The structured key-value pairs became triples (object, predicate, value). General tags could now be classified (Aerosol, Met, ... are Domains), and parts of a page could be identified (e.g. Dataset Title or Dataset Description). The value of Semantic MediaWiki is that these attributes can be reused through queries. Queries also build on what we already knew about using multiple templates to structure the same material; the key difference is that Semantic MediaWiki fills the template automatically with information drawn from the dataset pages, so the update anomaly we encountered before no longer occurs.
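A rough Python sketch of the idea: annotations become (object, predicate, value) triples, and a query selects pages by property and fills catalog rows from those same triples, so there is no second copy to fall out of date. The `Triple` type, the `ask` helper, and the sample data are hypothetical; the code mimics the role of a Semantic MediaWiki query and is neither its implementation nor its syntax. It also assumes each page has at most one value per property.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # the wiki page (object)
    predicate: str  # the property, e.g. Domain
    value: str


# Hypothetical annotations harvested from two dataset pages.
TRIPLES = [
    Triple("Dataset_A", "Title", "Example AOT"),
    Triple("Dataset_A", "Domain", "Aerosol"),
    Triple("Dataset_A", "Platform", "Satellite"),
    Triple("Dataset_B", "Title", "Example Temperature"),
    Triple("Dataset_B", "Domain", "Met"),
    Triple("Dataset_B", "Platform", "Surface"),
]


def ask(triples, conditions, printouts):
    """Return one row per page whose properties match every (predicate, value)
    pair in `conditions`, carrying the properties named in `printouts`."""
    # Group triples by page; assumes one value per property per page.
    pages = {}
    for t in triples:
        pages.setdefault(t.subject, {})[t.predicate] = t.value
    rows = []
    for page in sorted(pages):
        props = pages[page]
        if all(props.get(k) == v for k, v in conditions.items()):
            rows.append({"page": page, **{k: props.get(k, "") for k in printouts}})
    return rows


# A compact Aerosol catalog, filled straight from the triples.
catalog = ask(TRIPLES, {"Domain": "Aerosol"}, ["Title", "Platform"])
for row in catalog:
    print(f"| {row['page']} || {row['Title']} || {row['Platform']}")
```

Because each row is computed when the query runs, editing the Platform triple for Dataset_A changes the catalog on the next run, with nothing to re-enter by hand.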

Semantic MediaWiki also produces an RDF feed, which can be reformatted through an XSLT transformation and reused in other catalogs, as well as incorporated as HTML into KML files that reference a particular dataset.
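The RDF-to-KML path might look roughly like the following Python sketch. Two assumptions should be stated plainly: it substitutes `xml.etree.ElementTree` for an XSLT stylesheet (the Python standard library has no XSLT processor), and the property namespace and RDF layout are made-up examples, not Semantic MediaWiki's actual export vocabulary. Each dataset becomes a KML `Placemark` whose `description` carries an HTML table of its properties; coordinates are omitted.

```python
import xml.etree.ElementTree as ET
from html import escape

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PROP_NS = "http://example.org/property/"  # hypothetical property namespace
KML_NS = "http://www.opengis.net/kml/2.2"

# Serialize KML elements without a namespace prefix.
ET.register_namespace("", KML_NS)

# Hypothetical RDF/XML describing one dataset page.
SAMPLE_RDF = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:prop="{PROP_NS}">
  <rdf:Description rdf:about="http://example.org/wiki/Dataset_A">
    <prop:Title>Example AOT</prop:Title>
    <prop:Domain>Aerosol</prop:Domain>
    <prop:Platform>Satellite</prop:Platform>
  </rdf:Description>
</rdf:RDF>"""


def rdf_to_kml(rdf_text):
    """Turn each rdf:Description into a KML Placemark with an HTML description."""
    root = ET.fromstring(rdf_text)
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        # Strip the namespace from each property tag: "{...}Title" -> "Title".
        fields = {child.tag.split("}", 1)[1]: (child.text or "") for child in desc}
        placemark = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
        ET.SubElement(placemark, f"{{{KML_NS}}}name").text = fields.get("Title", "")
        rows = "".join(
            f"<tr><td>{escape(k)}</td><td>{escape(v)}</td></tr>" for k, v in fields.items()
        )
        # ElementTree escapes this HTML when writing; KML viewers unescape it.
        ET.SubElement(placemark, f"{{{KML_NS}}}description").text = f"<table>{rows}</table>"
    return ET.tostring(kml, encoding="unicode")


print(rdf_to_kml(SAMPLE_RDF))
```

With a real XSLT processor the same mapping would live in a stylesheet, which keeps the transformation editable without changing program code.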
