1 Introduction
The Geoconnex project is about providing technical infrastructure and guidance to create an open, community-contribution model for a knowledge graph linking hydrologic features in the United States, published in accordance with Spatial Data on the Web best practices as an implementation of Internet of Water principles. The development of geoconnex.us takes place on GitHub. See here for the system of repositories.
Geoconnex will allow data users to answer questions like: “What datasets are available about the portions of Colorado River upstream of Hoover Dam within Nevada and Utah regarding variables discharge and total suspended solids with measurements taken at least daily with coverage between 2002 and 2007?” and be returned metadata for all relevant datasets from all participating organizations, including federal, state, private, and NGO organizations.
See https://geoconnex.us/demo for a mockup of the data discovery and access workflows that https://geoconnex.us aspires to enable.
Geoconnex rests on data providers publishing metadata to the system. Thus, Geoconnex involves the publication of Web Resources, which include structured, embedded metadata that describe water datasets and the real-world environmental features (e.g. rivers, wells, dams, catchments) or the cataloging features (e.g. government jurisdiction areas, statistical summary reporting areas) that they are relevant to. This document provides guidance, including general principles as well as specific templates, for data providers on how to structure this metadata using the JSON-LD format.
Related materials, presentations, and publications
National Hydrography Infrastructure and Geoconnex
New Mexico Water Data Initiative including geoconnex.us
Roundtable presentation including geoconnex.us
Second Environmental Linked Features Interoperability Experiment
ESIP Sessions on Structured Data in the Web slides
1.1 Basic Information Model
The model used to organize information in the Geoconnex system is shown in Figure 1.
Data providers refer to specific systems that publish water-related datasets on the web. Many times a provider will simply be the data dissemination arm of an organization, such as the Reclamation Information Sharing Environment (RISE) of the US Bureau of Reclamation. Some organizations may have multiple data providers, such as US Geological Survey, which administers the National Water Information System as well as the National Groundwater Monitoring Network, among others. Some data providers are aggregators of other organizations’ data, such as the Hydrologic Information System of CUAHSI.
Datasets refer to specific collections of data that are published by data providers. In the context of Geoconnex, a single dataset generally refers to one that is collected from, or summarizable to, a specific spatial location on earth, as part of a specific activity. For example, a dataset would be the stage, discharge and water quality sensor data coming from a single stream gage, but not the collection of all stream gage readings from all stream gages operated by a given organization. A dataset could also be the time-series of a statistical summary of water use at the county level.
Locations are specific locations on earth that datasets are collected from or about, such as stream gages, groundwater wells, and dams. In the case of data that is reported at a summary unit such as a state, county, or hydrologic unit code (HUC), these can also be considered Locations. Conceptually, multiple datasets from multiple providers can be about the same Location, as might occur when a USGS streamgage and a state DEQ water quality sampling site are both located at a specific bridge.
Hydrologic features are elements of the water system that are related to locations. For example, a point may be on a river, which is within a watershed, and whose flow influences an aquifer. Each of these are distinct, identifiable features which many Locations are hydrologically related to, and which a user of a given dataset might also want to use.
Cataloging features are areas on earth that commonly group datasets. They are a superset of summary features such as HUCs, counties and states. For example, a state-level dataset summarizing average annual surface water availability would not have states as a cataloging feature. However, a streamgage is within a state, county, HUC, congressional district, etc., and may be tagged with these features in metadata, and thus be filtered alongside other streamgages within the same state.
This Geoconnex guidance concerns how to explicitly publish metadata that describes Datasets and how they are related to each of the other elements of the information model.
1.2 JSON-LD Primer
JSON-LD is a version of JSON, the popular data exchange format used by web APIs, to express linked data. Linked Data is an approach to data publication that allows data from various sources to be easily integrated. JSON-LD accomplishes this by mapping terms from a source data system to a machine-readable definition of that term available on the web, allowing different attribute names from different data sources to be consistently interpreted together. Commonly, JSON-LD is embedded within websites, allowing search engines and applications to parse the information available from web addresses (URLs). For an in-depth exploration and multimedia resources, refer to the JSON-LD official site and its learning section. JSON-LD documents should be embedded in the HTML of websites using script headers. A brief overview of the JSON-LD format follows below.
Below is an example JSON-LD document as embedded in a <script> element within the <head> or <body> section of an HTML page, with an explanation of its major elements.
<script type="application/ld+json">
{
  "@context": {
    "@vocab": "https://schema.org/",
    "ex": "https://example.com/schema/",
    "locType": "https://www.opengis.net/def/schema/hy_features/hyf/HY_HydroLocationType"
  },
  "@id": "https://example.com/well/1234",
  "@type": "Place",
  "name": "Well 1234",
  "description": "Well at 1234 Place St., USA",
  "locType": "well",
  "subjectOf": {
    "@id": "https://datasystem.org/dataset1",
    "@type": "Dataset",
    "name": "Well Locations Dataset",
    "ex:recordCount": 500
  }
}
</script>
<script type="application/ld+json">, </script>
These are fixed HTML tags that tell machines to interpret everything between them as JSON-LD.
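How a machine might locate these tags can be sketched with Python's standard library. This is a minimal illustration, not a description of the Geoconnex harvester: the class name JsonLdExtractor, the helper extract_jsonld, and the sample page are all hypothetical.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed content of every application/ld+json script element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # script content may arrive in several chunks, so buffer it
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.documents.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


def extract_jsonld(html_text):
    """Return a list of JSON-LD documents embedded in an HTML string."""
    parser = JsonLdExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.documents


# Hypothetical page used only for illustration
SAMPLE_PAGE = """<html><head>
<script type="application/ld+json">
{"@context": {"@vocab": "https://schema.org/"},
 "@id": "https://example.com/well/1234",
 "@type": "Place",
 "name": "Well 1234"}
</script>
</head><body><p>Well 1234</p></body></html>"""

docs = extract_jsonld(SAMPLE_PAGE)
print(docs[0]["@id"])
```

Anything outside the script element (here the paragraph in the body) is ignored, which is why the human-readable page and the embedded metadata can coexist in one web resource.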
@context
The @context keyword in JSON-LD sets the stage for interpreting the data by mapping terms to IRIs (Internationalized Resource Identifiers). By doing so, properties and values are clearly defined and identified. Our example context has three entries:
- @vocab: Sets the default document vocabulary to https://schema.org/, which is a standard vocabulary for web-based structured data. This means that, in general, attributes in the document will be assumed to have https://schema.org/ as a prefix, so JSON-LD parsers will map name to https://schema.org/name.
- ex: This is a custom context prefix representing https://example.com/schema/, signifying extensions or custom data definitions specific to our website. The prefix can be used on other attributes so that JSON-LD parsers do the appropriate mapping. Thus, ex:recordCount will be parsed as https://example.com/schema/recordCount.
- locType: This is a custom direct attribute mapping, specifying that this attribute exactly matches the concept identified by the HTTP identifier https://www.opengis.net/def/schema/hy_features/hyf/HY_HydroLocationType. Using this direct mapping approach allows data publishers to map their arbitrary terminology to any publicly accessible and well-identified standard term.
@id
The @id keyword furnishes a uniform resource identifier (URI) for subjects in the JSON-LD document, enabling the subjects to be interconnected with data elsewhere. In this example:
- Well 1234 has the identifier https://example.com/well/1234.
- The dataset that is about it, “Well Locations Dataset”, has the unique identifier https://datasystem.org/dataset1.
@type
The @type keyword stipulates the type or nature of the subject or node in the JSON-LD. It aids in discerning the entity being depicted. In the given context:
- Well 1234 is specified as a “Place” from the schema.org vocabulary (https://schema.org/Place).
- Well Locations Dataset’s type is a “Dataset” from the schema.org vocabulary (https://schema.org/Dataset).
Nodes
Nodes represent entities in JSON-LD, with each entity having properties associated with it. In the example:
- The main node is Well 1234, possessing properties like “name”, “description”, “locType”, and “subjectOf”.
- The subjectOf property is itself a node representing a dataset that is about Well 1234. Apart from the “name” property, the dataset also has a property called “ex:recordCount” (using the ex: prefix from @context) indicating the number of rows in the dataset.
This extension showcases the flexibility and strength of JSON-LD, where you can seamlessly integrate standard vocabulary with custom definitions, ensuring rich and well-structured interconnected data representations. Below, you can see how JSON-LD tools would parse and standardize the JSON-LD in the example.
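The term mapping described in this primer can be approximated in a few lines of Python. This is a deliberately simplified sketch, assuming a single flat @context object; it is not a conforming JSON-LD processor (a library such as PyLD implements the full expansion algorithm). The function names expand_term and expand_node are hypothetical.

```python
def expand_term(term, context):
    """Resolve one key or type name to a full IRI using a flat @context dict."""
    if term.startswith("@"):              # JSON-LD keywords stay as they are
        return term
    if term in context:                   # direct term mapping, e.g. locType
        return context[term]
    prefix, sep, suffix = term.partition(":")
    if sep and prefix in context:         # prefixed name, e.g. ex:recordCount
        return context[prefix] + suffix
    if sep:                               # treated as an already absolute IRI
        return term
    return context.get("@vocab", "") + term   # fall back to the default vocabulary


def expand_node(node, context):
    """Return a copy of a node with its keys and @type values expanded."""
    expanded = {}
    for key, value in node.items():
        if key == "@context":
            continue
        if key == "@type":
            types = value if isinstance(value, list) else [value]
            value = [expand_term(t, context) for t in types]
        elif isinstance(value, dict):
            value = expand_node(value, context)
        expanded[expand_term(key, context)] = value
    return expanded


# Same content as the primer example, with schema.org types given as bare terms
document = {
    "@context": {
        "@vocab": "https://schema.org/",
        "ex": "https://example.com/schema/",
        "locType": "https://www.opengis.net/def/schema/hy_features/hyf/HY_HydroLocationType",
    },
    "@id": "https://example.com/well/1234",
    "@type": "Place",
    "name": "Well 1234",
    "locType": "well",
    "subjectOf": {
        "@id": "https://datasystem.org/dataset1",
        "@type": "Dataset",
        "ex:recordCount": 500,
    },
}

expanded = expand_node(document, document["@context"])
for key in expanded:
    print(key)
```

After expansion every property name is a full IRI, so two providers that use different local attribute names but map them to the same IRI are describing the same property.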
1.3 Geoconnex JSON-LD elements
A Geoconnex JSON-LD document should be embedded in a human-readable website that is about either a Location or a Dataset. Documents about Locations should ideally include references to relevant Hydrologic Features, Cataloging Features, and Datasets. Documents about Datasets must include references to one or more relevant Reference Monitoring Locations or Hydrologic Features or Cataloging Features, or declare their spatial coverage.
1.3.1 Context
Geoconnex JSON-LD documents can have varying contexts. However, there are several vocabularies other than schema.org that may be useful, depending on the type of location and dataset being described and the level of specificity for which metadata is produced by the data provider. The example context below can serve as a general-purpose starting point, although simpler contexts may be sufficient for many documents:
"@context": {
  "@vocab": "https://schema.org/",
  "xsd": "http://www.w3.org/2001/XMLSchema#",
  "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
  "dc": "http://purl.org/dc/terms/",
  "dcat": "http://www.w3.org/ns/dcat#",
  "freq": "http://purl.org/cld/freq/",
  "qudt": "http://qudt.org/schema/qudt/",
  "qudt-units": "http://qudt.org/vocab/unit/",
  "qudt-quantkinds": "http://qudt.org/vocab/quantitykind/",
  "gsp": "http://www.opengis.net/ont/geosparql#",
  "locType": "http://vocabulary.odm2.org/sitetype/",
  "odm2var": "http://vocabulary.odm2.org/variablename/",
  "odm2varType": "http://vocabulary.odm2.org/variabletype/",
  "hyf": "https://www.opengis.net/def/schema/hy_features/hyf/",
  "skos": "http://www.w3.org/2004/02/skos/core#",
  "ssn": "http://www.w3.org/ns/ssn/",
  "ssn-system": "http://www.w3.org/ns/ssn/systems/"
}
- @vocab specifies schema as the default vocabulary from https://schema.org
- xsd is a general web-enabled data types vocabulary (e.g., text vs. number vs. datetime)
- rdfs is a general vocabulary for basic relationships
- dc is the Dublin Core vocabulary for general information metadata attributes
- dcat is the Data Catalog (DCAT) Vocabulary, a vocabulary for dataset metadata attributes
- freq is the Dublin Core Collection Frequency Vocabulary, a vocabulary for dataset temporal resolution and update frequency
- qudt-units provides standard identifiers for units (e.g. cubic feet per second)
- qudt-quantkinds provides ids for general phenomena (e.g. Volume flow rate) which may be measured in various units
- gsp provides ids for spatial relationships (e.g. intersects)
- odm2var is a supplement to qudt-quantkinds, and includes ids for many variables relevant to water science and management (e.g. turbidity)
- odm2varType is a supplement to odm2var that includes ids for large groupings of variables (e.g. Water Quality)
- hyf provides ids for surface water hydrology concepts (e.g. streams)
- skos provides general properties for relating different concepts (e.g. broader, narrower, exact Match)
- ssn and ssn-system provide ids for aspects of observations and measurement (e.g. measurement methods)
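Because a document may use only part of this context, it is easy to use a prefix that was never declared. The following Python sketch shows one way to catch that before publishing. It assumes the @context is a single JSON object, checks only property names and @type values (not prefixed names inside ordinary string values), and uses a hypothetical function name, undeclared_prefixes.

```python
def undeclared_prefixes(document):
    """List prefixes used in keys or @type values that the @context does not declare."""
    context = document.get("@context", {})
    declared = {key for key in context if not key.startswith("@")}
    missing = set()

    def check(name):
        prefix, sep, rest = name.partition(":")
        # names like http://... are absolute IRIs, not prefixed names
        if sep and not rest.startswith("//") and prefix not in declared:
            missing.add(prefix)

    def walk(node):
        if isinstance(node, dict):
            for key, value in node.items():
                if key == "@context":
                    continue
                if key == "@type":
                    for type_name in (value if isinstance(value, list) else [value]):
                        check(type_name)
                elif not key.startswith("@"):
                    check(key)
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(document)
    return sorted(missing)


# Hypothetical document that forgets to declare two of the prefixes it uses
draft = {
    "@context": {
        "@vocab": "https://schema.org/",
        "hyf": "https://www.opengis.net/def/schema/hy_features/hyf/",
    },
    "@type": ["hyf:HY_HydroLocation", "schema:Place"],
    "name": "Example location",
    "gsp:hasGeometry": {"@type": "http://www.opengis.net/ont/sf#Point"},
}

print(undeclared_prefixes(draft))
```

Here the check reports gsp and schema, since neither appears in the draft's context; adding the matching entries from the example context above, or using bare schema.org terms under @vocab, would resolve both.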
1.3.2 Reference Features
Embedding links to URIs of Reference Features is the best way to ensure that your data can be related to other data providers’ data. URIs for reference features are available from the Geoconnex reference feature server. Reference features can be one of three types:
- Monitoring Locations which are common locations that many organizations might have data about such as a streamgage station e.g. https://geoconnex.us/ref/gages/1143822
- Hydrologic Features which are common specific features of the hydrologic landscape that many organizations have data about. These could include confluence points, aquifers, stream segments and river mainstems and named tributaries, e.g. https://geoconnex.us/ref/mainstems/29559.
- Cataloging Features which are larger area units that are commonly used to group and filter data, such as HUCs1, states2, counties3, PLSS grids, public agency operating districts, etc.
2 Building Geoconnex Web Resources, Step-by-Step
This section provides step-by-step guidance to build Geoconnex Web Resources, each of which should be an HTML webpage with a unique URL in which a JSON-LD document is embedded (see Section 1.2). See Section 2.2 for completed example documents to skip the step-by-step.
2.1 Location or Dataset oriented?
Depending on what kind of resource (i.e. location or dataset) and the level of metadata you have available to publish, you can use different elements of the @context or use Reference Features in various ways. Below we will work through creating a JSON-LD document depending on your situation.
There are two basic patterns to think about:
- Location-oriented webpages that include a catalog of parameters and periods of record for which there is data about the location. This pattern may be suitable where data can be accessed separately for each location and possibly for each parameter for each location. This is typical of streamgages, monitoring wells, water diversions, reservoirs, regulated effluent discharge locations, etc. where there is an ongoing monitoring or modeling program that includes data collection or generation for multiple parameters. The Monitor My Watershed Site pages published by the Stroud Center are an example of this pattern. At this page, one finds a variety of information about a specific location, such as that location’s identifier and name and a map of where it is. In addition there is information about which continuous sensor and field water quality sample data are available about the location, and links to download these data.
- Dataset-oriented webpages that tag which locations are relevant to the dataset described at a given page. This pattern may be suitable for static datasets where data was collected or modeled for a consistent set of parameters for a pre-specified research question and time period across one or more locations, and where it would not make sense to publish separate metadata for the parts of the dataset that are relevant to each individual feature and parameter. This is typical of datasets created for, and published in association with, scientific and regulatory studies. This dataset record published on CUAHSI’s Hydroshare platform is an example, where there is a “Related Geospatial Features” section that explicitly identifies several features that the dataset has data about.
In some cases, it is possible to set up a web architecture that implements both patterns. For example, the Wyoming State Engineer’s Office Web Portal conceptualizes a time series for a specific parameter at a specific location as a dataset. Thus, webpages exist for both Locations and Datasets, and they link to each other where relevant. In this case, it is only necessary to implement Geoconnex embedded JSON-LD at either the Location or Dataset level, although both could be done as well.
Having chosen one of the patterns, proceed to location-oriented or dataset-oriented guidance to start building a JSON-LD document.
2.1.1 Location-oriented
The purpose of the location-oriented page is to give enough information about the location and the data available about that location that a water data user would be able to quickly determine whether and how to download the data after reading. We will use the USGS Monitoring Location 08282300 as an example for the type of content to put in location-oriented Geoconnex landing page web resources and how to map that content to embedded JSON-LD documents.
This location-oriented web resource includes this type of information:
- “I am the same thing as Geoconnex Reference Gage 1018463”5
- “My unique USGS ID is 08282300”
- “My name is Rio Brazos at Fishtail Road NR Tierra Amarilla, NM”
- “Data about me is provided by the USGS Water Data for the Nation”
- “I am a hydrometric station”6
- “My lat/long is 36.738 -106.471”
- “I am on the Rio Brazos”7
- “There is data about me for the parameter Discharge from June 6, 2014 to the present at a 15 minute time resolution. This data is generated from in-situ observation, in particular using USGS discharge measurement methods. You can download it here using the USGS Instantaneous Values REST Web Service in the RDB format. You can also download it here using the SensorThings API standard in JSON or CSV formats.”8
- “There is data about me for the parameter Gage Height from June 6, 2014 to the present at a 15 minute time resolution. This data is generated from in-situ observation, in particular using USGS stage measurement methods. You can download it here from the USGS Instantaneous Values REST Web Service in the RDB format. You can also download it here using the SensorThings API standard in JSON or CSV formats.”
2.1.1.1 JSON-LD
Here we will build the equivalent JSON-LD content step-by-step. The steps are:
- Identifiers and provenance
- Spatial geometry and hydrologic references
- Datasets
These culminate in the complete example.
2.1.1.1.1 Identifiers and provenance
A first group of information helps identify the location and its provenance.
- “I am a hydrometric station”10
- “I am the same thing as Geoconnex Reference Gage 1018463”11
- “My unique USGS ID is 08282300”
- “My name is Rio Brazos at Fishtail Road NR Tierra Amarilla, NM”
- “Data about me is provided by the USGS Water Data for the Nation”
{
  "@context": {
    "@vocab": "https://schema.org/",
    "hyf": "https://www.opengis.net/def/schema/hy_features/hyf/",
    "locType": "http://vocabulary.odm2.org/sitetype/"
  },
  "@id": "https://geoconnex.us/usgs/monitoring-location/08282300",
  "@type": [
    "Place",
    "hyf:HY_HydrometricFeature",
    "hyf:HY_HydroLocation",
    "locType:stream"
  ],
  "hyf:HydroLocationType": "hydrometric station",
  "sameAs": {"@id": "https://geoconnex.us/ref/gages/1018463"},
  "identifier": {
    "@type": "PropertyValue",
    "propertyID": "USGS site number",
    "value": "08282300"
  },
  "name": "Rio Brazos at Fishtail Road NR Tierra Amarilla, NM",
  "description": "Stream/River Site",
  "provider": {
    "url": "https://waterdata.usgs.gov",
    "@type": "GovernmentOrganization",
    "name": "U.S. Geological Survey Water Data for the Nation"
  }
}
Here we construct the JSON-LD document by adding a context which includes the https://schema.org/ vocabulary, as well as the https://www.opengis.net/def/schema/hy_features/hyf/ vocabulary which defines specific concepts in surface hydrology, and the ODM2 sitetype vocabulary which defines types of water data collection locations.
- The @id element, https://geoconnex.us/usgs/monitoring-location/08282300 in this case, is a persistent geoconnex URI. See here for how to create these. It is optional if the “same thing” geoconnex URI (the sameAs element described below) is provided, in which case this could just be the URL of the web resource for the location, or omitted.
- The @type element here specifies that https://geoconnex.us/usgs/monitoring-location/08282300 is a Place (i.e. a generic place on earth), a Hydrometric Feature (i.e. a data collection station) and a HydroLocation (i.e. a specific location that could in principle define a catchment). The locType further specifies the type of location using the ODM2 sitetype vocabulary http://vocabulary.odm2.org/sitetype/, which expresses the location type in terms of the feature of interest (e.g. a stream, a groundwater system). If the location is meant to represent a general location about which non-hydrologic data is being provided, as might be the case with a data provider publishing data about dams, levees, culverts, bridges, etc. but not associated water data, then locType and hyf:HY_HydrometricFeature can be omitted.
- The hyf:HydroLocationType can be used to identify the type of site with greater specificity and customization by using text values, rather than identifiers, from any codelist, preferably the HY_Features HydroLocationType codelist. It can be useful to describe something like a dam, weir, culvert, bridge, etc.
- The sameAs element is optional if the @id element is included as a persistent geoconnex URI. However, wherever possible, it should be populated with a Geoconnex Reference Feature URI. If all data providers tag their own location metadata with these, it becomes much easier for users of the Geoconnex system to find data collected by other providers about the same location. Reference features of all sorts are available to browse in a web map at https://geoconnex.us/iow/map, to access via API at https://reference.geoconnex.us/collections, or to download in bulk as GeoPackage files from HydroShare. If your location does not appear to be represented in a reference location, please consider contributing your location. You can start this process by submitting an issue at the geoconnex.us GitHub repository. In this case sameAs is a persistent geoconnex URI for a “Reference Gage”. Reference Gages is an open source, continuously updated set of all known surface water monitoring locations with data being collected by all known organizations. It is managed on GitHub at https://github.com/internetofwater/ref_gages
- The identifier element specifies the ID scheme name (propertyID) for the location in the data source and the ID itself (value).
- The name (required) and description (optional) elements are self-explanatory and can follow the conventions of the data provider.
- The provider element describes the data provider, which is generally conceptualized in Geoconnex as being a data system available on the web. Note that under provider, in addition to an identifying name, there is a url, if available, for the website of the providing data system, and a @type, which is most likely a subtype of https://schema.org/Organization, which includes GovernmentOrganization, NGO, ResearchOrganization, EducationalOrganization, and Corporation, among others.
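The required and optional rules in this list can be checked mechanically before a page is published. The Python sketch below is one possible reading of them, not an official Geoconnex validator: the function name identity_problems is hypothetical, and recognizing a persistent Geoconnex URI by the https://geoconnex.us/ prefix is an assumption made for illustration.

```python
# Assumption: persistent Geoconnex URIs are recognized by this prefix
GEOCONNEX_PREFIX = "https://geoconnex.us/"


def identity_problems(doc):
    """Return a list of human-readable problems with a location's identity block."""
    problems = []
    if not doc.get("name"):
        problems.append("name is required")

    same_as = doc.get("sameAs") or {}
    same_as_uri = same_as.get("@id", "") if isinstance(same_as, dict) else str(same_as)
    has_persistent_id = str(doc.get("@id", "")).startswith(GEOCONNEX_PREFIX)
    has_reference = same_as_uri.startswith(GEOCONNEX_PREFIX + "ref/")
    # either element may be omitted, but not both
    if not (has_persistent_id or has_reference):
        problems.append("provide a persistent Geoconnex @id or a sameAs reference feature URI")

    provider = doc.get("provider") or {}
    if not provider.get("name"):
        problems.append("provider should carry an identifying name")
    return problems


location = {
    "@id": "https://geoconnex.us/usgs/monitoring-location/08282300",
    "sameAs": {"@id": "https://geoconnex.us/ref/gages/1018463"},
    "name": "Rio Brazos at Fishtail Road NR Tierra Amarilla, NM",
    "provider": {"name": "U.S. Geological Survey Water Data for the Nation"},
}

print(identity_problems(location))
print(identity_problems({"description": "Stream/River Site"}))
```

The first call reports no problems; the second, which carries only a description, reports all three.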
2.1.1.1.2 Spatial geometry and hydrologic references
The second group of information provides specific location and spatial context:
- “My lat/long is 36.738 -106.471”
- “I am on the Rio Brazos”12
Adding this information to the bottom of the JSON-LD document:
{
  "@context": {
    "@vocab": "https://schema.org/",
    "hyf": "https://www.opengis.net/def/schema/hy_features/hyf/",
    "locType": "http://vocabulary.odm2.org/sitetype/",
    "gsp": "http://www.opengis.net/ont/geosparql#"
  },
  "@id": "https://geoconnex.us/usgs/monitoring-location/08282300",
  "@type": [
    "Place",
    "hyf:HY_HydrometricFeature",
    "hyf:HY_HydroLocation",
    "locType:stream"
  ],
  "hyf:HydroLocationType": "hydrometric station",
  "sameAs": {"@id": "https://geoconnex.us/ref/gages/1018463"},
  "identifier": {
    "@type": "PropertyValue",
    "propertyID": "USGS site number",
    "value": "08282300"
  },
  "name": "Rio Brazos at Fishtail Road NR Tierra Amarilla, NM",
  "description": "Stream/River Site",
  "provider": {
    "url": "https://waterdata.usgs.gov",
    "@type": "GovernmentOrganization",
    "name": "U.S. Geological Survey Water Data for the Nation"
  },
  "geo": {
    "@type": "GeoCoordinates",
    "longitude": -106.4707722,
    "latitude": 36.7379333
  },
  "gsp:hasGeometry": {
    "@type": "http://www.opengis.net/ont/sf#Point",
    "gsp:asWKT": {
      "@type": "http://www.opengis.net/ont/geosparql#wktLiteral",
      "@value": "POINT (-106.4707722 36.7379333)"
    },
    "gsp:crs": {
      "@id": "http://www.opengis.net/def/crs/OGC/1.3/CRS84"
    }
  },
  "hyf:referencedPosition": {
    "hyf:HY_IndirectPosition": {
      "hyf:linearElement": {
        "@id": "https://geoconnex.us/ref/mainstems/1611418"
      }
    }
  }
}
We have added a context element gsp and three blocks: geo, gsp:hasGeometry, and hyf:referencedPosition.
- gsp is the GeoSPARQL ontology used to standardize the representation of spatial data and relationships in knowledge graphs like the Geoconnex system.
- geo is the schema.org standard for representing spatial data. It is what is used by search engines like Google and Bing to place webpages on a map. While useful, it does not have a standard way of representing multipoint, multipolyline, or multipolygon features, or a way to specify coordinate reference systems or projections, and so we need to also provide a GeoSPARQL version of the geometry. In this case, we are simply providing a point with a longitude and latitude using the schema.org GeoCoordinates type. It is also possible to represent lines and polygons.
- gsp:hasGeometry is the GeoSPARQL version of geometry, with which we can embed WKT representations of geometry in structured metadata in the @value element, and declare the coordinate reference system or projection in the gsp:crs element by using identifiers from the OGC register of reference systems, in this case http://www.opengis.net/def/crs/OGC/1.3/CRS84 for the familiar WGS 84 system with coordinates given in longitude, latitude order.
- hyf:referencedPosition uses the HY_Features model to declare that this location is located on a specific river, in this case the Rio Brazos in New Mexico as identified in the Reference Mainstems dataset, which is available via API at https://reference.geoconnex.us/collections/mainstems and managed on GitHub at https://github.com/internetofwater/ref_rivers. All surface water locations should include this type of element.
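Since the same point has to be written twice, once for geo and once as WKT, generating both from one pair of coordinates avoids the common mistake of swapping longitude and latitude. The Python sketch below illustrates this; the function name point_geometry is hypothetical, and it assumes a context in which @vocab is https://schema.org/ and the gsp prefix is declared.

```python
import json


def point_geometry(longitude, latitude):
    """Build the geo and gsp:hasGeometry blocks for a single point."""
    if not -180 <= longitude <= 180:
        raise ValueError(f"longitude out of range: {longitude}")
    if not -90 <= latitude <= 90:
        raise ValueError(f"latitude out of range: {latitude}")
    return {
        "geo": {
            "@type": "GeoCoordinates",
            "longitude": longitude,
            "latitude": latitude,
        },
        "gsp:hasGeometry": {
            "@type": "http://www.opengis.net/ont/sf#Point",
            "gsp:asWKT": {
                "@type": "http://www.opengis.net/ont/geosparql#wktLiteral",
                # WKT lists x (longitude) first, then y (latitude)
                "@value": f"POINT ({longitude} {latitude})",
            },
            "gsp:crs": {"@id": "http://www.opengis.net/def/crs/OGC/1.3/CRS84"},
        },
    }


blocks = point_geometry(-106.4707722, 36.7379333)
print(json.dumps(blocks, indent=2))
```

The range checks catch some swapped inputs (for this gage, passing 36.7379333 as longitude and -106.4707722 as latitude raises an error), though not every swap, since many values are valid in both positions.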
Groundwater monitoring locations may use the hyf:referencedPosition element if data providers wish their wells to be associated with specific streams. However, groundwater sample and monitoring locations such as wells can also be referenced to hydrogeologic unit or aquifer identifiers where available using this pattern, instead of using the hyf:referencedPosition pattern:
USGS Principal Aquifers and Secondary Hydrogeologic Unit URIs are available from https://reference.geoconnex.us/collections
If reference URIs are not available for the groundwater unit you’d like to reference, but an ID does exist in a dataset that is available online, you may use this pattern:
"http://www.w3.org/ns/sosa/isSampleOf": {
  "@type": "GW_HydrogeoUnit",
  "name": "name of the aquifer",
  "identifier": {
    "@type": "PropertyValue",
    "propertyID": "Source aquifer dataset id field name",
    "value": "aq-id-1234"
  },
  "subjectOf": {
    "@type": "Dataset",
    "url": "url where dataset that describes or includes the aquifer can be accessed"
  }
}
2.1.1.1.3 Datasets
Now that we have described our location’s provenance, geospatial geometry, and association with any reference features, we describe the data that can be accessed about that location. The simplest, most minimal way to do this is to add a block like this to the bottom of the JSON-LD document we have created so far:
"subjectOf": {
  "@type": "Dataset",
  "name": "Discharge data from USGS-08282300",
  "description": "Discharge data from USGS-08282300 at Rio Brazos at Fishtail Road NR Tierra Amarilla, NM",
  "url": "https://waterdata.usgs.gov/monitoring-location/08282300/#parameterCode=00060&period=P7D"
}
Here, we simply declare that the location we have been working with is the subjectOf a Dataset with a name, description, and URL where information about the dataset can be found.
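A provider generating many location pages might assemble this minimal block in code. The Python sketch below shows one way to do so; the helper names minimal_dataset and attach_datasets are hypothetical, and emitting an array when a location has several datasets is an assumption based on JSON-LD allowing arrays as property values, not something prescribed by this guidance.

```python
import json


def minimal_dataset(name, description, url):
    """Build the smallest useful Dataset node: a name, a description and a URL."""
    return {"@type": "Dataset", "name": name, "description": description, "url": url}


def attach_datasets(location_doc, datasets):
    """Return a copy of the location document with subjectOf filled in."""
    doc = dict(location_doc)
    # a single dataset is written as one object, several as an array
    doc["subjectOf"] = datasets[0] if len(datasets) == 1 else list(datasets)
    return doc


location = {
    "@id": "https://geoconnex.us/usgs/monitoring-location/08282300",
    "name": "Rio Brazos at Fishtail Road NR Tierra Amarilla, NM",
}

discharge = minimal_dataset(
    "Discharge data from USGS-08282300",
    "Discharge data from USGS-08282300 at Rio Brazos at Fishtail Road NR Tierra Amarilla, NM",
    "https://waterdata.usgs.gov/monitoring-location/08282300/#parameterCode=00060&period=P7D",
)

page_metadata = attach_datasets(location, [discharge])
print(json.dumps(page_metadata, indent=2))
```

The original location dictionary is left unchanged, so the same base document can be reused when building pages for different sets of datasets.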
However, to enable data users (and search engines) to filter for your data using more standardized names for variables and by temporal coverage and resolution, to determine whether they want to use that data based on the methods used (such as whether it is observed or modeled/forecasted data), and possibly to preview actual data values, it is useful to include much more detailed metadata. In general, following the Science-on-Schema.org Guidelines is recommended. We implement this guidance, with some extension, for the USGS Monitoring Location example. The numbered notes after the code translate and explain each element in order:
{
  "subjectOf": {
    "@type": "Dataset",
    "name": "Discharge data from USGS Monitoring Location 08282300",
    "description": "Discharge data from USGS Streamgage at Rio Brazos at Fishtail Road NR Tierra Amarilla, NM",
    "license": "https://spdx.org/licenses/CC-BY-4.0",
    "isAccessibleForFree": true,
    "variableMeasured": {
      "@type": "PropertyValue",
      "name": "discharge",
      "description": "Discharge in cubic feet per second",
      "propertyID": "https://www.wikidata.org/wiki/Q8737769",
      "url": "https://en.wikipedia.org/wiki/Discharge_(hydrology)",
      "unitText": "cubic feet per second",
      "qudt:hasQuantityKind": "qudt-quantkinds:VolumeFlowRate",
      "unitCode": "qudt-units:FT3-PER-SEC",
      "measurementTechnique": "observation",
      "measurementMethod": {
        "name": "Discharge Measurements at Gaging Stations",
        "publisher": "U.S. Geological Survey",
        "url": "https://doi.org/10.3133/tm3A8"
      }
    },
    "temporalCoverage": "2014-06-30/..",
    "dc:accrualPeriodicity": "freq:daily",
    "dcat:temporalResolution": {"@value": "PT15M", "@type": "xsd:duration"},
    "distribution": [
      {
        "@type": "DataDownload",
        "name": "USGS Instantaneous Values Service",
        "contentUrl": "https://waterservices.usgs.gov/nwis/iv/?sites=08282300&parameterCd=00060&format=rdb",
        "encodingFormat": ["text/tab-separated-values"],
        "dc:conformsTo": "https://pubs.usgs.gov/of/2003/ofr03123/6.4rdb_format.pdf"
      },
      {
        "@type": "DataDownload",
        "name": "USGS SensorThings API",
        "contentUrl": "https://labs.waterdata.usgs.gov/sta/v1.1/Datastreams('0adb31f7852e4e1c9a778a85076ac0cf')?$expand=Thing,Observations",
        "encodingFormat": ["application/json"],
        "dc:conformsTo": "https://labs.waterdata.usgs.gov/docs/sensorthings/index.html"
      }
    ]
  }
}
1. This node (we are continuing from the above JSON-LD document, so the USGS Monitoring Location) is subjectOf the node that follows.
2. This node is a Dataset (https://schema.org/Dataset).
3. The dataset’s name and description.
4. The dataset’s license, which is most easily populated by a URI for the license appropriate for your data. Federal agencies, many state agencies, and academics use open licenses such as those provided by opendatacommons.org and creativecommons.org. URIs for licenses are available from https://spdx.org/licenses/
5. Either true or false depending on whether the dataset is available for free.
6. The dataset includes information on a variable (in schema.org called variableMeasured). Multiple variableMeasured can be specified for datasets by using arrays, which is useful for datasets that must be downloaded in bulk and that include multiple variables of interest. In general it is clearer to specify a “dataset” per variableMeasured if the data has different temporal coverage per variable, or can be downloaded on a per-variable basis.
7. PropertyValue is a generic type to extend schema.org properties and should be used as a rule on variableMeasured nodes.
8. propertyID should be a URI where there is a machine-readable resource that defines what the variable is. In this case, we are using a Wikidata link to the concept of stream discharge. In general, a good source for URIs is the ODM2 variable vocabulary.
9. Here url points to a human-readable resource describing the variable. In this case, we are using a Wikipedia link to the concept of stream discharge.
10. Here we use the units as written in the data source.
11. While name and propertyID specify the variable as being “discharge” in this case, since multiple data sources might use different words and identifiers for their variables, it can be useful to reference a more general category of variables that we can use to group variables across sources. We can use identifiers for QuantityKinds from QUDT, which we reference with the qudt-quantkinds prefix as described in the @context in Section 1.3.1.
12. While unitText above specifies the units, since multiple data sources might use different words for the same unit, to improve interoperability we can use identifiers for units provided by QUDT, which we reference with the qudt-units vocabulary prefix as described in the @context in Section 1.3.1. If units from QUDT are unavailable, first check if unitText can be filled with a term name from http://vocabulary.odm2.org/units/.
13. measurementTechnique is meant to be a highly general account of the data generating procedure, primarily to distinguish between observed and modeled data. It is highly recommended for this to be model or observation, or if more specificity is required, to restrict these values to the ODM2 methodType vocabulary.
14. measurementMethod specifies the method used to generate the data to as great a degree of specificity as possible. Ideally it could be a persistent identifier that directs to a machine-readable web resource that unambiguously describes that method. This would look something like this: "measurementMethod": {"@id": "https://www.nemi.gov/methods/method_summary/4680/"}. In lieu of that, a name, description and URL to a human-readable web resource like an explanatory webpage, technical report, standards document, or academic article would be appropriate, as in this example for USGS discharge measurement.
15. temporalCoverage refers to the first and last time for which data is available. It can be specified using the ISO 8601 interval format (YYYY-MM-DD/YYYY-MM-DD), with the start date first and the end date after the /. It can also include time, like so: YYYY-MM-DDTHH:MM:SS/YYYY-MM-DDTHH:MM:SS. If the dataset has no end date, as there is an active monitoring program, then this can be indicated like so: YYYY-MM-DD/..
16. dc:accrualPeriodicity refers to the update schedule of the published dataset. The value of this can be from the Dublin Core frequency vocabulary (here in @context as freq:
).dcat:temporalResolution
refers to the minimum intended time spacing between observations in the case of regular time series data. The value should be an xsd duration encoded string e.g. “PT15M” for 15-minute, “P1D” for Daily, “PT1H” for Hourly, “P7D” for Weekly, “P1M” for Monthly, “P1Y” for Annual.freq:
or be specified using ISO duration code. - 17
-
distribution
provides a way to structure information about data access points. This can range in complexity from a specification of a URL and format to specifications for how to interact with an API. In this example, a URL, format (encodingFormat
populated by MIME type).conformsTo
is optional and should be a document that helps interpret the data structure. This could be a link to a data dictionary in the case of simple tabular data, documentation of a data model for a complex database, or an API specification document for an API endpoint. - 18
-
Multiple
distributions
can be specified using nested JSON arrays.
This translates roughly to:
There is the following information about me: a Dataset for the variable (variableMeasured) Discharge. It has values from June 6, 2014 to the present at a 15 minute time resolution, updated/published daily, in units of cubic feet per second, generated by location observation, and generated in particular using USGS discharge measurement methods.
You can download it:
- in the RDB format
- You can also download it here, using the USGS SensorThings API implementation, in JSON
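The temporalCoverage patterns described in the annotations above can be checked mechanically before a document is published. The following is a minimal Python sketch, not part of any Geoconnex tooling: the function name parse_temporal_coverage is hypothetical, and it only handles the date and date-time forms shown above, with ".." marking an open-ended interval.

```python
from datetime import datetime
from typing import Optional, Tuple


def parse_temporal_coverage(value: str) -> Tuple[datetime, Optional[datetime]]:
    """Parse a 'start/end' interval string into (start, end).

    end is None when the interval is open-ended ('..').
    This helper is illustrative only and is not a Geoconnex API.
    """
    start_text, separator, end_text = value.partition("/")
    if not separator:
        raise ValueError(f"expected 'start/end', got {value!r}")
    # fromisoformat accepts both YYYY-MM-DD and YYYY-MM-DDTHH:MM:SS.
    start = datetime.fromisoformat(start_text)
    end = None if end_text == ".." else datetime.fromisoformat(end_text)
    if end is not None and end < start:
        raise ValueError("the end of the interval precedes its start")
    return start, end


if __name__ == "__main__":
    print(parse_temporal_coverage("2014-06-30/.."))
    print(parse_temporal_coverage("2002-01-01T00:00:00/2020-12-31T23:59:59"))
```

A check like this catches reversed intervals and malformed dates early, which matters because harvesters generally treat temporalCoverage as an opaque string.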
2.1.2 Dataset-oriented
The purpose of the dataset-oriented page is to give enough information about the available data, and about the area, locations, or features it is relevant to, that a water data user can quickly determine whether and how to download the data. As an example of the type of content to put in dataset-oriented Geoconnex landing page web resources, and of how to map that content to embedded JSON-LD documents, we will use this data resource about water utility treated water demand that has been published at HydroShare.
This dataset-oriented web resource includes this type of information:
“This is my URI (which is a DOI-URL): https://doi.org/10.4211/hs.4cf2a4298eca418f980201c1c5505299” 13
“This is my permanent identifier, which is a DOI” 14
“This is my URL: https://www.hydroshare.org/resource/4cf2a4298eca418f980201c1c5505299” 15
“My creator is Kyle Onda”
“I am provided by HydroShare”
“My spatial coverage is the bounding box "35.5463 -79.1235 36.0520 -78.3765"” 16
“I have data between January 1, 2002 and December 31, 2020” 17
“My data is at a 1 month time step frequency” 18
“I am about the following features” 19
“I have the following variables”:
- Monthly water demand, measured in units of averaged millions of gallons per day
- Historic mean monthly water demand over the period of record, measured in units of millions of gallons per day
- The monthly water demand divided by historic mean monthly water demand, as a percent
“You can download me here on HydroShare as a zipped csv file”
“I am accessible for free subject to this license”
2.1.2.1 JSON-LD
Much is similar to the Datasets guidance for location-oriented web resources, so here we focus on the differences. Note that HydroShare automatically embeds JSON-LD. The JSON-LD examples below vary somewhat from HydroShare’s default content to illustrate optional elements that would be useful for Geoconnex that are not currently implemented in HydroShare.
2.1.2.1.1 Identifiers, provenance, license, and distribution.
For basic identifying and descriptive information, science-on-schema.org has appropriate guidance. In this case, note that a specific file download URL has been provided rather than an API endpoint, and that dc:conformsTo points to a data dictionary that is supplied at the same web resource.
{
"@context": {
"@vocab": "https://schema.org/",
"rdfs": "http://www.w3.org/2000/01/rdf-schema#",
"dc": "http://purl.org/dc/terms/",
"qudt": "http://qudt.org/schema/qudt/",
"qudt-units": "http://qudt.org/vocab/unit/",
"qudt-quantkinds": "http://qudt.org/vocab/quantitykind/",
"gsp": "http://www.opengis.net/ont/geosparql#",
"locType": "http://vocabulary.odm2.org/sitetype/",
"odm2var":"http://vocabulary.odm2.org/variablename/",
"odm2varType": "http://vocabulary.odm2.org/variabletype/",
"hyf": "https://www.opengis.net/def/schema/hy_features/hyf/",
"skos": "https://www.opengis.net/def/schema/hy_features/hyf/HY_HydroLocationType",
"ssn": "http://www.w3.org/ns/ssn/",
"ssn-system": "http://www.w3.org/ns/ssn/systems/"
},
"@type": "Dataset",
"@id": "https://doi.org/10.4211/hs.4cf2a4298eca418f980201c1c5505299",
"provider": {
"url": "https://hydroshare.org",
"@type": "ResearchOrganization",
"name": "HydroShare"
},
"creator": {
"@type": "Person",
"affiliation": {
"@type": "Organization",
"name": "Internet of Water;Center for Geospatial Solutions;Lincoln Institute of Land Policy"
},
"email": "konda@lincolninst.edu",
"name": "Kyle Onda",
"url": "https://www.hydroshare.org/user/4850/"
},
"identifier": "doi:hs.4cf2a4298eca418f980201c1c5505299",
"name": "Geoconnex Dataset Example: Data for Triangle Water Supply Dashboard",
"description": "This is a dataset meant as an example of dataset-level schema.org markup for https://geoconnex.us. It uses as an example data from the NC Triangle Water Supply Dashboard, which collects and visualizes finished water deliveries by water utilities to their service areas in the North Carolina Research Triangle area",
"url": "https://www.hydroshare.org/resource/4cf2a4298eca418f980201c1c5505299",
"keywords": ["water demand", "water supply", "geoconnex"],
"license": "https://creativecommons.org/licenses/by/4.0/",
"isAccessibleForFree": true,
"distribution": [
{
"@type": "DataDownload",
"name": "HydroShare file URL",
"contentUrl": "https://www.hydroshare.org/resource/4cf2a4298eca418f980201c1c5505299/data/contents/demand_over_time.csv",
"encodingFormat": ["text/csv"],
"dc:conformsTo": "https://www.hydroshare.org/resource/4cf2a4298eca418f980201c1c5505299/data/contents/dataDictionary.xlsx"
},
...
2.1.2.1.2 Variables and Methods
Again, this follows the dataset guidance. In the example below, multiple variableMeasured nodes are specified using a nested array. Other differences to point out:
- The unit of “million gallons per day” is not available from the QUDT units vocabulary. It is in the ODM2 units codelist, so we populate unitCode with the URL listed there.
- The measurementMethod nodes for both variables, which are simply different aggregation statistics for the same variable, do not have known web resources or specific identifiers available, and so use description to clarify the method.
...,
"variableMeasured": [
{
"@type": "PropertyValue",
"name": "water demand",
"description": "treated water delivered to distribution system",
"propertyID": "http://vocabulary.odm2.org/variablename/waterUsePublicSupply/",
"url": "http://vocabulary.odm2.org/variablename/waterUsePublicSupply/",
"unitText": "million gallons per day",
"qudt:hasQuantityKind": "qudt-quantkinds:VolumeFlowRate",
"unitCode": "http://his.cuahsi.org/mastercvreg/edit_cv11.aspx?tbl=Units&id=1125579048",
"measurementTechnique": "observation",
"measurementMethod": {
"name":"water meter",
"description": "metered bulk value, accumulated over one month",
"url": "https://www.wikidata.org/wiki/Q268503"
}
},
{
"@type": "PropertyValue",
"name": "water demand (monthly average)",
"description": "average monthly treated water delivered to distribution system",
"propertyID": "http://vocabulary.odm2.org/variablename/waterUsePublicSupply/",
"url": "http://vocabulary.odm2.org/variablename/waterUsePublicSupply/",
"unitText": "million gallons per day",
"qudt:hasQuantityKind": "qudt-quantkinds:VolumeFlowRate",
"unitCode": "http://his.cuahsi.org/mastercvreg/edit_cv11.aspx?tbl=Units&id=1125579048",
"measurementTechnique": "observation",
"measurementMethod": {
"name":"water meter",
"description": "metered bulk value, average accumulated over each month for multiple years",
"url": "https://www.wikidata.org/wiki/Q268503"
}
}
],
"temporalCoverage": "2002-01-01/2020-12-31",
"ssn-system:frequency": {
"value": "1",
"unitCode": "qudt-units:Month"
},
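A time step such as the one-month frequency above can also be written as an xsd:duration string of the kind used for dcat:temporalResolution in the location-oriented guidance. The sketch below is one possible way to generate such strings in Python; the function names to_xsd_duration and temporal_resolution_node are hypothetical, and only the units mentioned in this guidance are covered.

```python
# Designators allowed after "P" (date part) and after "PT" (time part).
_DATE_DESIGNATORS = {"year": "Y", "month": "M", "day": "D"}
_TIME_DESIGNATORS = {"hour": "H", "minute": "M", "second": "S"}


def to_xsd_duration(count: int, unit: str) -> str:
    """Build an xsd:duration string such as 'PT15M' or 'P1M'.

    Illustrative helper only; not part of any Geoconnex library.
    """
    if count <= 0:
        raise ValueError("count must be a positive integer")
    unit = unit.strip().lower()
    if unit.endswith("s"):
        unit = unit[:-1]  # accept plural forms such as "minutes"
    if unit == "week":
        # xsd:duration has no week designator, so weeks are written as days.
        return f"P{count * 7}D"
    if unit in _DATE_DESIGNATORS:
        return f"P{count}{_DATE_DESIGNATORS[unit]}"
    if unit in _TIME_DESIGNATORS:
        return f"PT{count}{_TIME_DESIGNATORS[unit]}"
    raise ValueError(f"unsupported unit: {unit!r}")


def temporal_resolution_node(count: int, unit: str) -> dict:
    """Wrap the duration in the typed-value form used in the examples."""
    return {"@value": to_xsd_duration(count, unit), "@type": "xsd:duration"}


if __name__ == "__main__":
    print(temporal_resolution_node(1, "month"))
    print(temporal_resolution_node(15, "minutes"))
```

Note that "M" means months in the date part and minutes in the time part, which is why the "T" separator cannot be dropped.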
2.1.2.1.3 Geoconnex Reference Feature Links and Spatial Coverage
Unlike the location-based example, where a location is explicitly the subjectOf the dataset, here the dataset must be described as being about certain features. If the dataset is not explicitly about any discrete features, as with raster datasets, then a spatialCoverage should be specified.
Using the about construction, a single geoconnex URI or an array of multiple URIs can be specified. In the example below, multiple are used. Note the nesting of nodes within the array, so that each URI is given with the @id keyword and is of @type Place. In this example, URIs from the geoconnex reference features set for Public Water Systems are used.
...,
"about": [
{
"@id": "https://geoconnex.us/ref/pws/NC0332010",
"@type": "Place"
},
{
"@id": "https://geoconnex.us/ref/pws/NC0368010",
"@type": "Place"
},
{
"@id": "https://geoconnex.us/ref/pws/NC0392010",
"@type": "Place"
},
{
"@id": "https://geoconnex.us/ref/pws/NC0392020",
"@type": "Place"
},
{
"@id": "https://geoconnex.us/ref/pws/NC0392045",
"@type": "Place"
}
],
...
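When a dataset is about many features, the about array can be generated from a list of identifiers rather than typed by hand. The following Python sketch assumes the identifiers are public water system IDs that resolve under https://geoconnex.us/ref/pws/, as in the example above; the function name about_nodes is hypothetical.

```python
import json
from typing import Dict, Iterable, List

# Base URI of the Public Water Systems reference features used above.
PWS_BASE = "https://geoconnex.us/ref/pws/"


def about_nodes(pws_ids: Iterable[str]) -> List[Dict[str, str]]:
    """Return one {"@id", "@type"} node per public water system ID."""
    return [{"@id": PWS_BASE + pws_id, "@type": "Place"} for pws_id in pws_ids]


if __name__ == "__main__":
    ids = ["NC0332010", "NC0368010", "NC0392010", "NC0392020", "NC0392045"]
    # Print the fragment in the same shape as the JSON-LD example above.
    print(json.dumps({"about": about_nodes(ids)}, indent=2))
```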
To assist in finding reference features, https://reference.geoconnex.us allows queries following the OGC-API Features API standard and the CQL Common Query Language standard.
For example, to find the Geoconnex URI for the Raleigh public water system (PWS), we can construct the URL in steps:
- CQL filter API endpoint for the PWS feature collection: https://reference.geoconnex.us/collections/pws/items
- filter for name field pws_name: https://reference.geoconnex.us/collections/pws/items?filter=pws_name
- filter for a name that includes “Raleigh”: https://reference.geoconnex.us/collections/pws/items?filter=pws_name ILIKE '%Raleigh%'
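The filter URL above contains spaces, quotes, and percent signs, which need to be percent-encoded when the request is made from code rather than pasted into a browser. The sketch below builds the encoded URL with the Python standard library. The function name pws_name_query is hypothetical, and doubling single quotes inside the search text is an assumption based on common CQL text conventions; no request is sent.

```python
from urllib.parse import quote, urlencode

# Items endpoint of the PWS feature collection named in the text above.
ITEMS_URL = "https://reference.geoconnex.us/collections/pws/items"


def pws_name_query(fragment: str) -> str:
    """Return a URL that filters PWS features whose name contains fragment."""
    # Assumption: a single quote inside a CQL string literal is escaped by doubling it.
    escaped = fragment.replace("'", "''")
    cql = f"pws_name ILIKE '%{escaped}%'"
    # quote_via=quote encodes spaces as %20 instead of '+'.
    return f"{ITEMS_URL}?{urlencode({'filter': cql}, quote_via=quote)}"


if __name__ == "__main__":
    print(pws_name_query("Raleigh"))
```

The resulting URL can be passed to any HTTP client; the response is expected to be a feature collection whose features carry the Geoconnex URIs to place in the about array.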
Sometimes it is impossible to use feature URIs because the relevant specific features are not available from https://reference.geoconnex.us/collections. If so, feel free to submit an issue to the geoconnex.us GitHub repository requesting a reference feature set.
Sometimes it is impractical to list all applicable reference features, whether or not they are in https://reference.geoconnex.us or another source. This is common for comprehensive datasets that cover an entire reference dataset or another dataset like a hydrofabric, such as datasets summarizing values to U.S. Counties, or the National Water Model generating values for all NHDPlusV2 COMID flowlines. In this case it is best to declare that the Dataset isBasedOn the source geospatial fabric. For example, if the example dataset were about all public water systems instead of just the 5 listed, then instead of about, we should specify isBasedOn with an identifier, name, description, and any URLs for other resources that describe the source fabric and how to interpret it:
...,
"isBasedOn": {
"@id": "https://www.hydroshare.org/resource/9ebc0a0b43b843b9835830ffffdd971e/",
"name": "U.S. Community Water Systems Service Boundaries, v4.0.0",
"description": "This is a layer of water service boundaries for 45,973 community water systems that deliver tap water to 307.7 million people in the US.",
"url": "https://github.com/SimpleLab-Inc/wsb"
},
...
Sometimes there are no particular features that a dataset is explicitly about. This is common with remote sensing raster data. In this case, it is best to specify a spatialCoverage polygon using WKT-encoded geometry:
"spatialCoverage": {
"@type": "Place",
"gsp:hasGeometry": {
"@type": "http://www.opengis.net/ont/sf#MultiPolygon",
"gsp:asWKT": {
"@type": "http://www.opengis.net/ont/geosparql#wktLiteral",
"@value": "MULTIPOLYGON (((-85.67957299999999 32.799514, -85.679637 32.822002999999995, -85.67199699999999 32.822063, -85.66421 32.821711, -85.647989 32.82224, -85.627966 32.822331, -85.627781 32.800716, -85.627496 32.778602, -85.635931 32.778656999999995, -85.645034 32.778146, -85.653352 32.778481, -85.67933699999999 32.778239, -85.67936399999999 32.784064, -85.679808 32.792068, -85.67957299999999 32.799514)))"
}
}
}
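If only a bounding box is known, a spatialCoverage node of the same shape can be derived from it. The Python sketch below assumes the box is given as "south west north east" (latitude before longitude, as in the bounding box string shown earlier for this dataset) and writes WKT in longitude-latitude order, as in the example above. The function names are hypothetical.

```python
def box_to_wkt(box: str) -> str:
    """Convert a 'south west north east' box string to a WKT POLYGON."""
    south, west, north, east = (float(part) for part in box.split())
    # Counter-clockwise exterior ring, closed by repeating the first vertex.
    ring = [(west, south), (east, south), (east, north), (west, north), (west, south)]
    coordinates = ", ".join(f"{lon} {lat}" for lon, lat in ring)
    return f"POLYGON (({coordinates}))"


def spatial_coverage_node(box: str) -> dict:
    """Wrap the WKT in the GeoSPARQL structure used in the example above."""
    return {
        "@type": "Place",
        "gsp:hasGeometry": {
            "@type": "http://www.opengis.net/ont/sf#Polygon",
            "gsp:asWKT": {
                "@type": "http://www.opengis.net/ont/geosparql#wktLiteral",
                "@value": box_to_wkt(box),
            },
        },
    }


if __name__ == "__main__":
    print(spatial_coverage_node("35.5463 -79.1235 36.0520 -78.3765"))
```

Getting the axis order right is the main pitfall here: swapping latitude and longitude silently places the polygon in the wrong part of the world.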
2.2 Complete Examples
Below are complete examples for the general JSON-LD document types depending on the location or dataset orientation and data type.
They are viewable together below, or available for download:
2.2.1 Location-oriented
{
"@context": {
"@vocab": "https://schema.org/",
"xsd": "https://www.w3.org/TR/xmlschema-2/#",
"rdfs": "http://www.w3.org/2000/01/rdf-schema#",
"dc": "http://purl.org/dc/terms/",
"dcat": "https://www.w3.org/ns/dcat#",
"freq": "http://purl.org/cld/freq/",
"qudt": "http://qudt.org/schema/qudt/",
"qudt-units": "http://qudt.org/vocab/unit/",
"qudt-quantkinds": "http://qudt.org/vocab/quantitykind/",
"gsp": "http://www.opengis.net/ont/geosparql#",
"locType": "http://vocabulary.odm2.org/sitetype/",
"odm2var":"http://vocabulary.odm2.org/variablename/",
"odm2varType": "http://vocabulary.odm2.org/variabletype/",
"hyf": "https://www.opengis.net/def/schema/hy_features/hyf/",
"skos": "https://www.opengis.net/def/schema/hy_features/hyf/HY_HydroLocationType",
"ssn": "http://www.w3.org/ns/ssn/",
"ssn-system": "http://www.w3.org/ns/ssn/systems/"
},
"@id": "https://geoconnex.us/usgs/monitoring-location/08282300",
"@type": [
"hyf:HY_HydrometricFeature",
"hyf:HY_HydroLocation",
"locType:stream"
],
"hyf:HydroLocationType": "hydrometric station",
"sameAs": {
"@id": "https://geoconnex.us/ref/gages/1018463"
},
"identifier": {
"@type": "PropertyValue",
"propertyID": "USGS site number",
"value": "08282300"
},
"name": "Rio Brazos at Fishtail Road NR Tierra Amarilla, NM",
"description": "Stream/River Site",
"provider": {
"url": "https://waterdata.usgs.gov",
"@type": "GovernmentOrganization",
"name": "U.S. Geological Survey Water Data for the Nation"
},
"geo": {
"@type": "GeoCoordinates",
"longitude": -106.4707722,
"latitude": 36.7379333
},
"gsp:hasGeometry": {
"@type": "http://www.opengis.net/ont/sf#Point",
"gsp:asWKT": {
"@type": "http://www.opengis.net/ont/geosparql#wktLiteral",
"@value": "POINT (-106.4707722 36.7379333)"
},
"gsp:crs": {
"@id": "http://www.opengis.net/def/crs/OGC/1.3/CRS84"
}
},
"hyf:referencedPosition": {
"hyf:HY_IndirectPosition": {
"hyf:linearElement": {
"@id": "https://geoconnex.us/ref/mainstems/1611418"
}
}
},
"subjectOf": {
"@type": "Dataset",
"name": "Discharge data from USGS Monitoring Location 08282300",
"description": "Discharge data from USGS Streamgage at Rio Brazos at Fishtail Road NR Tierra Amarilla, NM",
"provider": {
"url": "https://waterdata.usgs.gov",
"@type": "GovernmentOrganization",
"name": "U.S. Geological Survey Water Data for the Nation"
},
"url": "https://waterdata.usgs.gov/monitoring-location/08282300/#parameterCode=00060",
"variableMeasured": {
"@type": "PropertyValue",
"name": "discharge",
"description": "Discharge in cubic feet per second",
"propertyID": "https://www.wikidata.org/wiki/Q8737769",
"url": "https://en.wikipedia.org/wiki/Discharge_(hydrology)",
"unitText": "cubic feet per second",
"qudt:hasQuantityKind": "qudt-quantkinds:VolumeFlowRate",
"unitCode": "qudt-units:FT3-PER-SEC",
"measurementTechnique": "observation",
"measurementMethod": {
"name": "Discharge Measurements at Gaging Stations",
"publisher": "U.S. Geological Survey",
"url": "https://doi.org/10.3133/tm3A8"
}
},
"temporalCoverage": "2014-06-30/..",
"dc:accrualPeriodicity": "freq:daily",
"dcat:temporalResolution": {"@value":"PT15M","@type":"xsd:duration"},
"distribution": [
{
"@type": "DataDownload",
"name": "USGS Instantaneous Values Service",
"contentUrl": "https://waterservices.usgs.gov/nwis/iv/?sites=08282300&parameterCd=00060&format=rdb",
"encodingFormat": [
"text/tab-separated-values"
],
"dc:conformsTo": "https://pubs.usgs.gov/of/2003/ofr03123/6.4rdb_format.pdf"
},
{
"@type": "DataDownload",
"name": "USGS SensorThings API",
"contentUrl": "https://labs.waterdata.usgs.gov/sta/v1.1/Datastreams('0adb31f7852e4e1c9a778a85076ac0cf')?$expand=Thing,Observations",
"encodingFormat": [
"application/json"
],
"dc:conformsTo": "https://labs.waterdata.usgs.gov/docs/sensorthings/index.html"
}
]
}
}
2.2.2 Dataset-oriented
{
"@context": {
"@vocab": "https://schema.org/",
"xsd": "https://www.w3.org/TR/xmlschema-2/#",
"rdfs": "http://www.w3.org/2000/01/rdf-schema#",
"dc": "http://purl.org/dc/terms/",
"dcat": "https://www.w3.org/ns/dcat#",
"freq": "http://purl.org/cld/freq/",
"qudt": "http://qudt.org/schema/qudt/",
"qudt-units": "http://qudt.org/vocab/unit/",
"qudt-quantkinds": "http://qudt.org/vocab/quantitykind/",
"gsp": "http://www.opengis.net/ont/geosparql#",
"locType": "http://vocabulary.odm2.org/sitetype/",
"odm2var":"http://vocabulary.odm2.org/variablename/",
"odm2varType": "http://vocabulary.odm2.org/variabletype/",
"hyf": "https://www.opengis.net/def/schema/hy_features/hyf/",
"skos": "https://www.opengis.net/def/schema/hy_features/hyf/HY_HydroLocationType",
"ssn": "http://www.w3.org/ns/ssn/",
"ssn-system": "http://www.w3.org/ns/ssn/systems/"
},
"@type": "Dataset",
"@id": "https://doi.org/10.4211/hs.4cf2a4298eca418f980201c1c5505299",
"identifier": "doi:hs.4cf2a4298eca418f980201c1c5505299",
"name": "Geoconnex Dataset Example: Data for Triangle Water Supply Dashboard",
"description": "This is a dataset meant as an example of dataset-level schema.org markup for https://geoconnex.us. It uses as an example data from the NC Triangle Water Supply Dashboard, which collects and visualizes finished water deliveries by water utilities to their service areas in the North Carolina Research Triangle area",
"url": "https://www.hydroshare.org/resource/4cf2a4298eca418f980201c1c5505299",
"provider": {
"url": "https://hydroshare.org",
"@type": "ResearchOrganization",
"name": "HydroShare"
},
"creator": {
"@type": "Person",
"affiliation": {
"@type": "Organization",
"name": "Internet of Water;Center for Geospatial Solutions;Lincoln Institute of Land Policy"
},
"email": "konda@lincolninst.edu",
"name": "Kyle Onda",
"url": "https://www.hydroshare.org/user/4850/"
},
"keywords": [
"water demand",
"water supply",
"geoconnex"
],
"license": "https://creativecommons.org/licenses/by/4.0/",
"isAccessibleForFree": true,
"distribution": {
"@type": "DataDownload",
"name": "HydroShare file URL",
"contentUrl": "https://www.hydroshare.org/resource/4cf2a4298eca418f980201c1c5505299/data/contents/demand_over_time.csv",
"encodingFormat": [
"text/csv"
],
"dc:conformsTo": "https://www.hydroshare.org/resource/4cf2a4298eca418f980201c1c5505299/data/contents/dataDictionary.xlsx"
},
"variableMeasured": [
{
"@type": "PropertyValue",
"name": "water demand",
"description": "treated water delivered to distribution system",
"propertyID": "http://vocabulary.odm2.org/variablename/waterUsePublicSupply/",
"url": "http://vocabulary.odm2.org/variablename/waterUsePublicSupply/",
"unitText": "million gallons per day",
"qudt:hasQuantityKind": "qudt-quantkinds:VolumeFlowRate",
"unitCode": "http://his.cuahsi.org/mastercvreg/edit_cv11.aspx?tbl=Units&id=1125579048",
"measurementTechnique": "observation",
"measurementMethod": {
"name": "water meter",
"description": "metered bulk value, accumulated over one month",
"url": "https://www.wikidata.org/wiki/Q268503"
}
},
{
"@type": "PropertyValue",
"name": "water demand (monthly average)",
"description": "average monthly treated water delivered to distribution system",
"propertyID": "http://vocabulary.odm2.org/variablename/waterUsePublicSupply/",
"url": "http://vocabulary.odm2.org/variablename/waterUsePublicSupply/",
"unitText": "million gallons per day",
"qudt:hasQuantityKind": "qudt-quantkinds:VolumeFlowRate",
"unitCode": "http://his.cuahsi.org/mastercvreg/edit_cv11.aspx?tbl=Units&id=1125579048",
"measurementTechnique": "observation",
"measurementMethod": {
"name": "water meter",
"description": "metered bulk value, average accumulated over each month for multiple years",
"url": "https://www.wikidata.org/wiki/Q268503"
}
}
],
"temporalCoverage": "2002-01-01/2020-12-31",
"dc:accrualPeriodicity": "freq:daily",
"dcat:temporalResolution": {"@value":"P1M","@type":"xsd:duration"},
"about": [
{
"@id": "https://geoconnex.us/ref/pws/NC0332010",
"@type": "Place"
},
{
"@id": "https://geoconnex.us/ref/pws/NC0368010",
"@type": "Place"
},
{
"@id": "https://geoconnex.us/ref/pws/NC0392010",
"@type": "Place"
},
{
"@id": "https://geoconnex.us/ref/pws/NC0392020",
"@type": "Place"
},
{
"@id": "https://geoconnex.us/ref/pws/NC0392045",
"@type": "Place"
}
]
}
2.3 Appendices
2.3.1 Appendix 1: Codelists
2.3.1.1 measurementTechnique
The measurementTechnique property is meant to provide a way for data providers to “tag” their data with a general sense of how it was created, to help distinguish between aspects such as observed vs. modeled data, or in-situ vs. remote-sensed data. It is a supplement to the measurementMethod property, which should identify the specific method used and provide a link to documentation specific enough to replicate the method. Multiple measurementTechniques can thus be specified. Terms can be chosen from the codelist below, which is derived from the USGS Thesaurus and the ODM2 methodType vocabulary.
code | definition |
---|---|
observation | This term is meant to group data generating procedures that occur primarily to directly measure phenomena. Examples include ground-based sensors like streamgages and weather stations, but also discrete water quality samples, habitat assessments, ecological surveys, and surveys of individuals, households, and organizations, as well as remote sensing. However, this category can include datasets that use procedures for gap filling missing data (e.g. streamgage data with sensor malfunction period data estimated from time series models) |
model | This term refers to data that are generated rather than observed. It groups data generating procedures that generate data for hypothetical states at discrete locations, such as (but not limited to): |
field methods | Research procedures and instrumental means to measure, collect data and samples, and observe in the natural areas where the materials, phenomena, structures, or species being studied occur. |
remote sensing | Acquiring information about a natural feature or phenomenon, such as the Earth’s surface, without actually being in contact with it. Typically carried out with airborne or spaceborne sensors or cameras. |
estimation | A method for creating results by estimation or professional judgement. |
derivation | A method for creating results by deriving them from other results. Datasets in this category may be generated from algorithms or human processes that combine heterogeneous source data into latent or derived variables (e.g. composite indexes such as health risk scores or regulatory categorizations such as “in compliance”), or spatially aggregate data from smaller geographic units to larger ones (e.g. Census area-based reporting), as long as the data is representing the phenomena of interest at the time and place it actually occurred and was measured. |
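Because measurementTechnique is meant to draw on a shared codelist, values can be checked against it before publishing. The following is a minimal Python sketch; the set simply transcribes the codes in the table above, and the function name invalid_techniques is hypothetical.

```python
from typing import List, Sequence, Union

# Codes transcribed from the measurementTechnique codelist table above.
MEASUREMENT_TECHNIQUES = {
    "observation",
    "model",
    "field methods",
    "remote sensing",
    "estimation",
    "derivation",
}


def invalid_techniques(value: Union[str, Sequence[str]]) -> List[str]:
    """Return the terms that are not in the codelist (empty list if all are valid).

    Accepts either a single string or a list of strings, since multiple
    measurementTechniques can be specified.
    """
    terms = [value] if isinstance(value, str) else list(value)
    return [term for term in terms if term not in MEASUREMENT_TECHNIQUES]


if __name__ == "__main__":
    print(invalid_techniques(["observation", "remote sensing"]))
    print(invalid_techniques("modelled"))
```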
Footnotes
1. https://geoconnex.us/ref/hu04/0308
2. https://geoconnex.us/ref/states/48
3. https://geoconnex.us/ref/counties/37003
4. This is ideally a persistent geoconnex URI. See here for how to create these. It is optional if the “same thing” geoconnex URI in the next bullet is provided, in which case this could just be the URL of the web resource for the location, or omitted.
5. Where possible, it will be useful to tag your organization’s locations with pre-existing identifiers for reference locations, since many organizations collect data at the same location.
6. This ideally would come from a codelist so that data providers use consistent terminology.
7. Note that ideally this would be a geoconnex URI for a river mainstem, in this case https://geoconnex.us/ref/mainstems/1611418
8. This is towards the ‘more detailed’ end of the spectrum. If data is not available via API, it is still good to include links to data file downloads or web apps that provide access to the data.
9. This is ideally a persistent geoconnex URI. See here for how to create these. It is optional if the “same thing” geoconnex URI in the next bullet is provided, in which case this could just be the URL of the web resource for the location, or omitted.
10. This ideally would come from a codelist so that data providers use consistent terminology.
11. Where possible, it will be useful to tag your organization’s locations with pre-existing identifiers for reference locations, since many organizations collect data at the same location.
12. Note that ideally this would be a geoconnex URI for a river mainstem, in this case https://geoconnex.us/ref/mainstems/1611418
13. If a permanent identifier like a DOI is available.
14. For identifiers that are not HTTP URLs.
15. The actual URL where the resource can be found.
16. Spatial coverage refers to the maximum areal extent that the data is about. For Geoconnex purposes, this is not necessary if the “about” element with links to Geoconnex Reference Features is used.
17. This refers to the first and last time for which data is available. It can be specified using the ISO 8601 interval format (YYYY-MM-DD/YYYY-MM-DD), with the start date first and the end date after the /. It can also include time, like so: YYYY-MM-DDTHH:MM:SS/YYYY-MM-DDTHH:MM:SS. If the dataset has no end date because there is an active monitoring program, this can be indicated like so: YYYY-MM-DD/..
18. This refers to the minimum intended time spacing between observations in the case of regular time series data.
19. These should be geoconnex reference feature URIs. If the locations the dataset is about are not within https://reference.geoconnex.us/collections, then consider creating location-based resources and minting geoconnex identifiers. If the dataset is extensive over a vector feature spatial fabric, like all Census Tracts or HUC12s or NHD Catchments, then this can be a reference to a single reference fabric dataset rather than an array of identifiers for every single feature. If the dataset is extensive over an area but has no particular tie to a particular reference feature set, like a raster dataset, then this can be omitted.