1 Introduction

The Geoconnex project is about providing technical infrastructure and guidance to create an open, community-contribution model for a knowledge graph linking hydrologic features in the United States, published in accordance with Spatial Data on the Web best practices as an implementation of Internet of Water principles. The development of geoconnex.us takes place on GitHub. See here for the system of repositories.

Geoconnex will allow data users to answer questions like: “What datasets are available about the portions of Colorado River upstream of Hoover Dam within Nevada and Utah regarding variables discharge and total suspended solids with measurements taken at least daily with coverage between 2002 and 2007?” and be returned metadata for all relevant datasets from all participating organizations, including federal, state, private, and NGO organizations.

See https://geoconnex.us/demo for a mockup of data discovery and access workflows that https://geoconnex.us aspires to enable.

Geoconnex rests on data providers publishing metadata to the system. Thus, Geoconnex involves the publication of Web Resources, which include structured, embedded metadata that describe water datasets and the real-world environmental features (e.g. rivers, wells, dams, catchments) or the cataloging features (e.g. government jurisdiction areas, statistical summary reporting areas) that they are relevant to. This document provides guidance, including general principles as well as specific templates, for data providers on how to structure this metadata using the JSON-LD format.

Related materials, presentations, and publications

National Hydrography Infrastructure and Geoconnex

New Mexico Water Data Initiative including geoconnex.us

Roundtable presentation including geoconnex.us

Second Environmental Linked Features Interoperability Experiment

ESIP Sessions on Structured Data in the Web slides

1.1 Basic Information Model

The model used to organize information in the Geoconnex system is shown in Figure 1.

Figure 1: Basic information model for resources in geoconnex

Data providers refer to specific systems that publish water-related datasets on the web. Many times a provider will simply be the data dissemination arm of an organization, such as the Reclamation Information Sharing Environment (RISE) of the US Bureau of Reclamation. Some organizations may have multiple data providers, such as US Geological Survey, which administers the National Water Information System as well as the National Groundwater Monitoring Network, among others. Some data providers are aggregators of other organizations’ data, such as the Hydrologic Information System of CUAHSI.
Datasets refer to specific collections of data that are published by data providers. In the context of Geoconnex, a single dataset generally refers to one that is collected from, or summarizable to, a specific spatial location on earth, as part of a specific activity. For example, a dataset would be the stage, discharge and water quality sensor data coming from a single stream gage, but not the collection of all stream gage readings from all stream gages operated by a given organization. A dataset could also be the time-series of a statistical summary of water use at the county level.
Locations are specific locations on earth that datasets are collected from or about, such as stream gages, groundwater wells, and dams. In the case of data that is reported at a summary unit such as a state, county, or hydrologic unit code (HUC), these can also be considered Locations. Conceptually, multiple datasets from multiple providers can be about the same Location, as might occur when a USGS streamgage and a state DEQ water quality sampling site are both located at a specific bridge.
Hydrologic features are elements of the water system that are related to locations. For example, a point may be on a river, which is within a watershed, and whose flow influences an aquifer. Each of these are distinct, identifiable features which many Locations are hydrologically related to, and which a user of a given dataset might also want to use.
Cataloging features are areas on earth that commonly group datasets. They are a superset of summary features such as HUCs, counties and states. For example, a state-level dataset summarizing average annual surface water availability would have states as a cataloging feature. However, a streamgage is also within a state, county, HUC, congressional district, etc. and may be tagged with these features in metadata, and thus be filtered alongside other streamgages within the same state.

This Geoconnex guidance concerns how to explicitly publish metadata that describes Datasets and how they are related to each of the other elements of the information model.
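
The information model above can be pictured as a few linked record types. The sketch below is only an illustration in Python: the class names, field names, and example.com URIs are assumptions made for this example and are not part of any Geoconnex software.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Feature:
    """A hydrologic or cataloging feature identified by a URI."""
    uri: str
    name: str


@dataclass
class Location:
    """A place on earth that datasets are collected from or about."""
    uri: str
    name: str
    hydrologic_features: List[Feature] = field(default_factory=list)
    cataloging_features: List[Feature] = field(default_factory=list)


@dataclass
class Dataset:
    """A collection of data published by one provider about one location."""
    name: str
    provider: str
    location: Location


def datasets_at(location_uri, datasets):
    """Return every dataset, from any provider, about the given location."""
    return [d for d in datasets if d.location.uri == location_uri]


# Hypothetical example: two providers publish data about the same bridge site.
river = Feature("https://example.com/river/1", "Example River")
bridge = Location("https://example.com/location/bridge-1", "Bridge 1",
                  hydrologic_features=[river])
catalog = [
    Dataset("Discharge at Bridge 1", "Agency A", bridge),
    Dataset("Water quality at Bridge 1", "Agency B", bridge),
]
shared = datasets_at(bridge.uri, catalog)
```

Because both datasets point at the same Location URI, a user who finds one can discover the other, which is the linking behavior the rest of this guidance aims to enable through metadata.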

1.2 JSON-LD Primer

JSON-LD is a JSON-based format for expressing linked data, building on JSON, the popular data exchange format used by web APIs. Linked Data is an approach to data publication that allows data from various sources to be easily integrated. JSON-LD accomplishes this by mapping terms from a source data system to a machine-readable definition of that term available on the web, allowing different attribute names from different data sources to be consistently interpreted together. Commonly, JSON-LD is embedded within websites, allowing search engines and applications to parse the information available from web addresses (URLs). For an in-depth exploration and multimedia resources, refer to the JSON-LD official site and its learning section. JSON-LD documents should be embedded in the HTML of websites using <script> tags. A brief overview of the JSON-LD format follows below.

Below is an example JSON-LD document as embedded in a <script> element within the <head> or <body> section of an HTML page, with an explanation of its major elements.

<script type="application/ld+json">
{
  "@context": {
    "@vocab": "https://schema.org/",
    "ex": "https://example.com/schema/",
    "locType": "https://www.opengis.net/def/schema/hy_features/hyf/HY_HydroLocationType"
  },
  "@id": "https://example.com/well/1234",
  "@type": "Place",
  "name": "Well 1234",
  "description": "Well at 1234 Place St., USA",
  "locType": "well",
  "subjectOf": {
    "@id": "https://datasystem.org/dataset1",
    "@type": "Dataset",
    "name": "Well Locations Dataset",
    "ex:recordCount": 500
  }
}
</script>

<script type="application/ld+json">, </script> These are fixed HTML tags that must be written exactly as shown. They tell machines to interpret everything between them as JSON-LD.

@context The @context keyword in JSON-LD sets the stage for interpreting the data by mapping terms to IRIs (Internationalized Resource Identifiers). By doing so, properties and values are clearly defined and identified. The context in our example has three entries:

@vocab: Sets the default document vocabulary to https://schema.org/, which is a standard vocabulary for web-based structured data. This means that in general, attributes in the document will be assumed to have https://schema.org/ as a prefix, so JSON-LD parsers will map name to https://schema.org/name
ex: This is a custom context prefix representing https://example.com/schema/, signifying extensions or custom data definitions specific to our website. The prefix can be used on other attributes so that JSON-LD parsers do the appropriate mapping. Thus, ex:recordCount will be parsed as https://example.com/schema/recordCount.
locType: This is a custom direct attribute mapping, specifying that this attribute exactly matches the concept identified by the HTTP identifier https://www.opengis.net/def/schema/hy_features/hyf/HY_HydroLocationType. Using this direct mapping approach allows data publishers to map their arbitrary terminology to any publicly accessible and well-identified standard term.

@id The @id keyword furnishes a uniform resource identifier (URI) for subjects in the JSON-LD document, enabling the subjects to be interconnected with data elsewhere. In this example:

Well 1234 has the identifier https://example.com/well/1234.
The dataset that is about it, “Well Locations Dataset”, has the unique identifier https://datasystem.org/dataset1.

@type The @type keyword stipulates the type or nature of the subject or node in the JSON-LD. It aids in discerning the entity being depicted. In the given context:

Well 1234 is specified as a “Place” from the schema.org vocabulary (https://schema.org/Place).
Well Locations Dataset’s type is a “Dataset” from the schema.org vocabulary (https://schema.org/Dataset).

Nodes Nodes represent entities in JSON-LD, with each entity having properties associated with it. In the example:

The main node is Well 1234, possessing properties like “name”, “description”, “locType”, and “subjectOf”.
The value of the subjectOf property is itself a node, representing a dataset that is about Well 1234. Apart from the “name” property, the dataset also has a property called “ex:recordCount” (using the ex: prefix from @context) indicating the number of rows in the dataset. This extension showcases the flexibility and strength of JSON-LD, where you can seamlessly integrate standard vocabulary with custom definitions, ensuring rich and well-structured interconnected data representations.
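
To make the term mapping described above concrete, the following Python sketch expands the property names and types of the example document into full IRIs. It is a deliberately simplified illustration using only the standard library, not the full JSON-LD expansion algorithm: it assumes a flat @context containing only @vocab, prefixes, and direct term mappings, and the function names expand_term and expand_node are hypothetical.

```python
KEYWORDS = {"@id", "@type", "@value"}


def expand_term(term, context):
    """Map a term or prefixed name to a full IRI using a flat @context."""
    if term in KEYWORDS or "://" in term:
        return term  # keywords and absolute IRIs pass through unchanged
    if term in context:
        return context[term]  # direct term mapping, e.g. locType
    prefix, sep, suffix = term.partition(":")
    if sep:
        # Prefixed name, e.g. ex:recordCount; an unknown prefix is left as-is.
        return context[prefix] + suffix if prefix in context else term
    return context.get("@vocab", "") + term  # fall back to the default vocabulary


def expand_node(node, context):
    """Expand property names and @type values of a node, recursively."""
    expanded = {}
    for key, value in node.items():
        if key == "@context":
            continue
        if key == "@type":
            types = value if isinstance(value, list) else [value]
            types = [expand_term(t, context) for t in types]
            value = types if isinstance(value, list) else types[0]
        elif isinstance(value, dict):
            value = expand_node(value, context)
        expanded[expand_term(key, context)] = value
    return expanded


document = {
    "@context": {
        "@vocab": "https://schema.org/",
        "ex": "https://example.com/schema/",
        "locType": "https://www.opengis.net/def/schema/hy_features/hyf/HY_HydroLocationType",
    },
    "@id": "https://example.com/well/1234",
    "@type": "Place",
    "name": "Well 1234",
    "locType": "well",
    "subjectOf": {
        "@id": "https://datasystem.org/dataset1",
        "@type": "Dataset",
        "name": "Well Locations Dataset",
        "ex:recordCount": 500,
    },
}
expanded = expand_node(document, document["@context"])
```

After expansion, name becomes https://schema.org/name, locType becomes the HY_HydroLocationType IRI, and ex:recordCount becomes https://example.com/schema/recordCount. A conforming JSON-LD processor does considerably more (for example, wrapping values in arrays and value objects), so a production system should rely on one rather than on this sketch.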

1.3 Geoconnex JSON-LD elements

A Geoconnex JSON-LD document should be embedded in a human-readable website that is about either a Location or a Dataset. Documents about Locations should ideally include references to relevant Hydrologic Features, Cataloging Features, and Datasets. Documents about Datasets must include references to one or more relevant Reference Monitoring Locations or Hydrologic Features or Cataloging Features, or declare their spatial coverage.

1.3.1 Context

Geoconnex JSON-LD documents can have varying contexts. However, there are several vocabularies other than schema.org that may be useful, depending on the type of location and dataset being described and the level of specificity for which metadata is produced by the data provider. The example context below can serve as a general-purpose starting point, although simpler contexts may be sufficient for many documents:

  "@context": {
    "@vocab": "https://schema.org/",
    "xsd": "https://www.w3.org/TR/xmlschema-2/#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "dc": "http://purl.org/dc/terms/",
    "dcat": "https://www.w3.org/ns/dcat#",
    "freq": "http://purl.org/cld/freq/",
    "qudt": "http://qudt.org/schema/qudt/",
    "qudt-units": "http://qudt.org/vocab/unit/",
    "qudt-quantkinds": "http://qudt.org/vocab/quantitykind/",
    "gsp": "http://www.opengis.net/ont/geosparql#",
    "locType": "http://vocabulary.odm2.org/sitetype/",
    "odm2var": "http://vocabulary.odm2.org/variablename/",
    "odm2varType": "http://vocabulary.odm2.org/variabletype/",
    "hyf": "https://www.opengis.net/def/schema/hy_features/hyf/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "ssn": "http://www.w3.org/ns/ssn/",
    "ssn-system": "http://www.w3.org/ns/ssn/systems/"
  }

@vocab specifies schema.org (https://schema.org/) as the default vocabulary
xsd is a general web-enabled data types vocabulary (e.g., text vs number vs. datetime)
rdfs is a general vocabulary for basic relationships
dc is the Dublin Core vocabulary for general information metadata attributes
dcat is the Data Catalog (DCAT) Vocabulary, a vocabulary for dataset metadata attributes
freq is the Dublin Core Collection Frequency Vocabulary, a vocabulary for dataset temporal resolution and update frequency
qudt-units provides standard identifiers for units (e.g. cubic feet per second)
qudt-quantkinds provides ids for general phenomena (e.g. Volume flow rate) which may be measured in various units
gsp provides ids for spatial relationships (e.g. intersects)
odm2var is a supplement to qudt-quantkinds, and includes ids for many variables relevant to water science and management (e.g. turbidity)
odm2varType is a supplement to odm2var that includes ids for large groupings of variables (e.g. Water Quality)
hyf provides ids for surface water hydrology concepts (e.g. streams)
skos provides general properties for relating different concepts (e.g. broader, narrower, exactMatch)
ssn and ssn-system provide ids for aspects of observations and measurement (e.g. measurement methods)
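
Because a prefix only works if it is declared in @context, it can be worth checking a document before publishing it. The sketch below is one possible check in plain Python, assuming a flat, inline @context; the helper names prefix_of and undeclared_prefixes are hypothetical. It inspects property names and @type values only, since ordinary string values are not expanded by JSON-LD processors.

```python
def prefix_of(name):
    """Return the prefix of a compact IRI such as 'gsp:asWKT', else None."""
    if name.startswith("@") or "://" in name or ":" not in name:
        return None
    return name.split(":", 1)[0]


def undeclared_prefixes(doc):
    """List prefixes used in keys or @type values but missing from @context."""
    declared = set(doc.get("@context", {}))
    used = set()

    def walk(node):
        if isinstance(node, list):
            for item in node:
                walk(item)
        elif isinstance(node, dict):
            for key, value in node.items():
                if key == "@context":
                    continue
                names = [key]
                if key == "@type":
                    names += value if isinstance(value, list) else [value]
                used.update(p for p in map(prefix_of, names) if p)
                walk(value)

    walk(doc)
    return sorted(used - declared)


# Hypothetical document that uses gsp: without declaring it.
example = {
    "@context": {
        "@vocab": "https://schema.org/",
        "hyf": "https://www.opengis.net/def/schema/hy_features/hyf/",
    },
    "@type": ["hyf:HY_HydroLocation"],
    "name": "Example site",
    "gsp:hasGeometry": {"gsp:asWKT": "POINT (0 0)"},
}
missing = undeclared_prefixes(example)
```

Here missing contains gsp, signalling that "gsp": "http://www.opengis.net/ont/geosparql#" should be added to the context.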

1.3.2 Reference Features

Embedding links to URIs of Reference Features is the best way to ensure that your data can be related to other data providers’ data. URIs for reference features are available from the Geoconnex reference feature server. Reference features can be one of three types:

Monitoring Locations which are common locations that many organizations might have data about such as a streamgage station e.g. https://geoconnex.us/ref/gages/1143822
Hydrologic Features which are common specific features of the hydrologic landscape that many organizations have data about. These could include confluence points, aquifers, stream segments and river mainstems and named tributaries, e.g. https://geoconnex.us/ref/mainstems/29559.
Cataloging Features which are larger area units that are commonly used to group and filter data, such as HUCs ¹, states ², counties ³, PLSS grids, public agency operating districts, etc.
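
Reference feature URIs like the ones above follow a regular shape, so code that handles them can pull out the feature collection and the feature's identifier. The Python sketch below assumes the https://geoconnex.us/ref/<collection>/<id> pattern seen in the examples in this document; the function name parse_reference_uri is hypothetical.

```python
from urllib.parse import urlparse


def parse_reference_uri(uri):
    """Split a URI like https://geoconnex.us/ref/gages/1143822 into parts."""
    parsed = urlparse(uri)
    parts = parsed.path.strip("/").split("/")
    if parsed.netloc != "geoconnex.us" or len(parts) != 3 or parts[0] != "ref":
        raise ValueError(f"not a Geoconnex reference feature URI: {uri}")
    _, collection, feature_id = parts
    return collection, feature_id


gage = parse_reference_uri("https://geoconnex.us/ref/gages/1143822")
mainstem = parse_reference_uri("https://geoconnex.us/ref/mainstems/29559")
```

A check like this can help distinguish reference features from provider-specific URIs when deciding what to place in sameAs or in hydrologic references.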

2 Building Geoconnex Web Resources, Step-by-Step

This section provides step-by-step guidance to build Geoconnex Web Resources, each of which should be an HTML webpage with a unique URL within which is embedded a JSON-LD document (see Section 1.2). See Section 2.2 for completed example documents to skip the step-by-step.
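
As a rough illustration of what "an HTML webpage with an embedded JSON-LD document" means in practice, the Python sketch below writes a JSON-LD document into a minimal page and then reads it back the way a harvester might. It uses only the standard library; the function and class names (embed_jsonld, JsonLdExtractor) and the page layout are assumptions for this example, not a Geoconnex requirement.

```python
import json
from html import escape
from html.parser import HTMLParser


def embed_jsonld(title, doc):
    """Return a minimal HTML page with doc embedded as a JSON-LD script."""
    # Escape "</" so the JSON text can never close the script element early.
    payload = json.dumps(doc, indent=2).replace("</", "<\\/")
    return (
        "<!DOCTYPE html>\n<html>\n<head>\n"
        f"<title>{escape(title)}</title>\n"
        f'<script type="application/ld+json">\n{payload}\n</script>\n'
        "</head>\n<body></body>\n</html>\n"
    )


class JsonLdExtractor(HTMLParser):
    """Collect every JSON-LD document embedded in an HTML page."""

    def __init__(self):
        super().__init__()
        self.documents = []
        self._in_jsonld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.documents.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


well = {
    "@context": {"@vocab": "https://schema.org/"},
    "@id": "https://example.com/well/1234",
    "@type": "Place",
    "name": "Well 1234",
}
page = embed_jsonld("Well 1234", well)
extractor = JsonLdExtractor()
extractor.feed(page)
```

After feed() returns, extractor.documents holds the same dictionary that was embedded, which is a quick way to confirm that a generated page is machine-readable.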

2.1 Location or Dataset oriented?

Depending on what kind of resource (i.e. location or dataset) and the level of metadata you have available to publish, you can use different elements of the @context or use Reference Features in various ways. Below we will work through creating a JSON-LD document depending on your situation.

There are two basic patterns to think about:

Location-oriented webpages that include a catalog of parameters and periods of record for which there is data about the location. This pattern may be suitable where data can be accessed separately for each location and possibly for each parameter for each location. This is typical of streamgages, monitoring wells, water diversions, reservoirs, regulated effluent discharge locations, etc. where there is an ongoing monitoring or modeling program that includes data collection or generation for multiple parameters. The Monitor My Watershed Site pages published by the Stroud Center are an example of this pattern. At this page, one finds a variety of information about a specific location, such as that location’s identifier and name and a map of where it is. In addition there is information about which continuous sensor and field water quality sample data are available about the location, and links to download these data.
Dataset-oriented webpages that tag which locations are relevant to the dataset described at a given page. This pattern may be suitable for static datasets where data was collected or modeled for a consistent set of parameters for a pre-specified research question and time period across one or more locations, and where it would not make sense to publish separate metadata for the parts of the dataset that are relevant to each individual feature and parameter. This is typical of datasets created for, and published in association with, scientific and regulatory studies. This dataset record published on CUAHSI’s Hydroshare platform is an example, where there is a “Related Geospatial Features” section that explicitly identifies several features that the dataset has data about.

In some cases, it is possible to set up a web architecture that implements both patterns. For example, the Wyoming State Engineer’s Office Web Portal conceptualizes a time series for a specific parameter at a specific location as a dataset. Thus, webpages exist for both Locations and Datasets, and they link to each other where relevant. In this case, it is only necessary to implement Geoconnex embedded JSON-LD at either the Location or Dataset level, although both could be done as well.

Having chosen one of the patterns, proceed to location-oriented or dataset-oriented guidance to start building a JSON-LD document.

2.1.1 Location-oriented

The purpose of the location-oriented page is to give enough information about the location and the data available about that location that a water data user would be able to quickly determine whether and how to download the data after reading. We will use the USGS Monitoring Location 08282300 as an example for the type of content to put in location-oriented Geoconnex landing page web resources and how to map that content to embedded JSON-LD documents.

This location-oriented web resource includes this type of information:

“This is my HTTP identifier”⁴
“I am the same thing as Geoconnex Reference Gage 1018463”⁵
“My unique USGS ID is 08282300”
“My name is Rio Brazos at Fishtail Road NR Tierra Amarilla, NM”
“Data about me is provided by the USGS Water Data for the Nation”
“I am a hydrometric station⁶”
“My lat/long is 36.738 -106.471”
“I am on the Rio Brazos”⁷
“There is data about me for the parameter Discharge from June 6, 2014 to the present at a 15 minute time resolution. This data is generated from in-situ observation, in particular using USGS discharge measurement methods. You can download it here using the USGS Instantaneous Values REST Web Service in the RDB format. You can also download it here using the SensorThings API standard in JSON or CSV formats.”⁸
“There is data about me for the parameter Gage Height from June 6, 2014 to the present at a 15 minute time resolution. This data is generated from in-situ observation, in particular using USGS stage measurement methods. You can download it here from the USGS Instantaneous Values REST Web Service in the RDB format. You can also download it here using the SensorThings API standard in JSON or CSV formats.”

2.1.1.1 JSON-LD

Here we will build the equivalent JSON-LD content step-by-step. The steps are:

Identifiers and provenance
Spatial geometry and hydrologic references
Datasets

These culminate in the complete example.

2.1.1.1.1 Identifiers and provenance

A first group of information helps identify the location and its provenance.

“This is my HTTP identifier”⁹
“I am a hydrometric station¹⁰”
“I am the same thing as Geoconnex Reference Gage 1018463”¹¹
“My unique USGS ID is 08282300”
“My name is Rio Brazos at Fishtail Road NR Tierra Amarilla, NM”
“Data about me is provided by the USGS Water Data for the Nation”

{
  "@context": {
    "@vocab":"https://schema.org/",
    "hyf": "https://www.opengis.net/def/schema/hy_features/hyf/",
    "locType": "http://vocabulary.odm2.org/sitetype/"
  },
  "@id": "https://geoconnex.us/usgs/monitoring-location/08282300",
  "@type": [
    "hyf:HY_HydrometricFeature",
    "hyf:HY_HydroLocation",
    "locType:stream"
  ],
  "hyf:HydroLocationType": "hydrometric station",
  "sameAs": {"@id":"https://geoconnex.us/ref/gages/1018463"},
  "identifier": {
    "@type": "PropertyValue",
    "propertyID": "USGS site number",
    "value":  "08282300"
  },
  "name": "Rio Brazos at Fishtail Road NR Tierra Amarilla, NM",
  "description": "Stream/River Site",
  "provider": {
    "url": "https://waterdata.usgs.gov",
    "@type": "GovernmentOrganization",
    "name": "U.S. Geological Survey Water Data for the Nation"
  }
}

Here we construct the JSON-LD document by adding a context which includes the https://schema.org/ vocabulary, as well as the https://www.opengis.net/def/schema/hy_features/hyf/ vocabulary which defines specific concepts in surface hydrology, and the ODM2 sitetype vocabulary which defines types of water data collection locations.

The @id element of https://geoconnex.us/usgs/monitoring-location/08282300 in this case is a persistent geoconnex URI. See here for how to create these. It is optional if the “same thing” geoconnex URI (the sameAs element described below) is provided, in which case this could just be the URL of the web resource for the location, or omitted.
The @type element here specifies that https://geoconnex.us/usgs/monitoring-location/08282300 is a Hydrometric Feature (i.e. a data collection station) and a HydroLocation (i.e. a specific location that could in principle define a catchment). The locType entry further specifies the type of location using the ODM2 sitetype vocabulary http://vocabulary.odm2.org/sitetype/, which expresses the location type in terms of the feature of interest (e.g. a stream, a groundwater system). If the location is meant to represent a more general location about which non-hydrologic data is being provided, as might be the case with a data provider publishing data about dams, levees, culverts, bridges, etc. but not associated water data, then locType and hyf:HY_HydrometricFeature can be omitted.
The hyf:HydroLocationType can be used to identify the type of site with greater specificity and customization. Its values are text labels rather than identifiers and may come from any codelist, but preferably from the HY_Features HydroLocationType codelist. It can be useful to describe something like a dam, weir, culvert, bridge, etc.
The sameAs element is optional if the @id element is included as a persistent geoconnex URI. However, wherever possible, it should be populated with a Geoconnex Reference Feature URI. If all data providers tag their own location metadata with these, it becomes much easier for users of the Geoconnex system to find data collected by other providers about the same location. Reference features of all sorts are available to browse in a web map at https://geoconnex.us/iow/map, to access via API at https://reference.geoconnex.us/collections, or to download in bulk as GeoPackage files from HydroShare. If your location does not appear to be represented in a reference location, please consider contributing your location. You can start this process by submitting an issue at the geoconnex.us GitHub repository. In this case sameAs is a persistent geoconnex URI for a “Reference Gage”. Reference Gages is an open source, continuously updated set of all known surface water monitoring locations with data being collected by all known organizations. It is managed on GitHub at https://github.com/internetofwater/ref_gages.
The identifier element specifies the ID scheme name (propertyID) for the location in the data source and the ID itself (value).
The name (required) and description (optional) elements are self-explanatory and can follow the conventions of the data provider.
The provider element describes the data provider, which is generally conceptualized in Geoconnex as being a data system available on the web. Note that under provider, in addition to an identifying name, there is a url if available for the website of the providing data system, and a @type, which is most likely a sub type of https://schema.org/Organization, which includes GovernmentOrganization, NGO, ResearchOrganization, EducationalOrganization, and Corporation, among others.
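
Providers with many locations will usually generate this block from a database rather than write it by hand. The Python sketch below shows one way that might look; the function name location_node and its parameters are hypothetical, and the @type list simply mirrors the stream gage example above, so it would need adjusting for other kinds of sites.

```python
import json


def location_node(uri, id_scheme, site_id, name, provider, reference_uri=None):
    """Assemble the identifier and provenance part of a location document."""
    node = {
        "@context": {
            "@vocab": "https://schema.org/",
            "hyf": "https://www.opengis.net/def/schema/hy_features/hyf/",
            "locType": "http://vocabulary.odm2.org/sitetype/",
        },
        "@id": uri,
        "@type": [
            "hyf:HY_HydrometricFeature",
            "hyf:HY_HydroLocation",
            "locType:stream",
        ],
        "hyf:HydroLocationType": "hydrometric station",
        "identifier": {
            "@type": "PropertyValue",
            "propertyID": id_scheme,
            "value": str(site_id),  # keep IDs as text so leading zeros survive
        },
        "name": name,
        "provider": provider,
    }
    if reference_uri:
        # Only add sameAs when a matching reference feature is known.
        node["sameAs"] = {"@id": reference_uri}
    return node


usgs = {
    "@type": "GovernmentOrganization",
    "name": "U.S. Geological Survey Water Data for the Nation",
    "url": "https://waterdata.usgs.gov",
}
node = location_node(
    "https://geoconnex.us/usgs/monitoring-location/08282300",
    "USGS site number",
    "08282300",
    "Rio Brazos at Fishtail Road NR Tierra Amarilla, NM",
    usgs,
    reference_uri="https://geoconnex.us/ref/gages/1018463",
)
text = json.dumps(node, indent=2)
```

Passing the site number as a string matters here: treating 08282300 as a number would drop the leading zero and break the match with the provider's own identifier.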

2.1.1.1.2 Spatial geometry and hydrologic references

The second group of information provides specific location and spatial context:

“My lat/long is 36.738 -106.471”
“I am on the Rio Brazos”¹²

Adding this information to the bottom of JSON-LD document:

{
  "@context": {
    "@vocab":"https://schema.org/",
    "hyf": "https://www.opengis.net/def/schema/hy_features/hyf/",
    "locType": "http://vocabulary.odm2.org/sitetype/",
    "gsp": "http://www.opengis.net/ont/geosparql#"
  },
  "@id": "https://geoconnex.us/usgs/monitoring-location/08282300",
  "@type": [
    "hyf:HY_HydrometricFeature",
    "hyf:HY_HydroLocation",
    "locType:stream"
  ],
  "hyf:HydroLocationType": "hydrometric station",
  "sameAs": {"@id":"https://geoconnex.us/ref/gages/1018463"},
  "identifier": {
    "@type": "PropertyValue",
    "propertyID": "USGS site number",
    "value":  "08282300"
  },
  "name": "Rio Brazos at Fishtail Road NR Tierra Amarilla, NM",
  "description": "Stream/River Site",
  "provider": {
    "url": "https://waterdata.usgs.gov",
    "@type": "GovernmentOrganization",
    "name": "U.S. Geological Survey Water Data for the Nation"
  },
  "geo": {
    "@type": "GeoCoordinates",
    "longitude": -106.4707722,
    "latitude": 36.7379333
  },
  "gsp:hasGeometry": {
    "@type": "http://www.opengis.net/ont/sf#Point",
    "gsp:asWKT": {
      "@type": "http://www.opengis.net/ont/geosparql#wktLiteral",
      "@value": "POINT (-106.4707722 36.7379333)"
    },
    "gsp:crs": {
      "@id": "http://www.opengis.net/def/crs/OGC/1.3/CRS84"
    }
  },
  "hyf:referencedPosition": {
    "hyf:HY_IndirectPosition": {
      "hyf:linearElement": {
        "@id": "https://geoconnex.us/ref/mainstems/1611418"
      }
    }
  }
}

We have added a context element gsp and three blocks: geo, gsp:hasGeometry, and hyf:referencedPosition.

gsp is the GeoSPARQL ontology used to standardize the representation of spatial data and relationships in knowledge graphs like the Geoconnex system
geo is the schema.org standard for representing spatial data. It is what is used by search engines like Google and Bing to place webpages on a map. While useful, it does not have a standard way of representing multipoint, multipolyline, or multipolygon features, or a way to specify coordinate reference systems or projections, and so we need to also provide a GeoSPARQL version of the geometry. In this case, we are simply providing a point with a longitude and latitude via a GeoCoordinates node. It is also possible to represent lines and polygons.
gsp:hasGeometry is the GeoSPARQL version of geometry, with which we can embed WKT representations of geometry in structured metadata in the @value element, and declare the coordinate reference system or projection in the gsp:crs element by using identifiers from the OGC register of reference systems, in this case using http://www.opengis.net/def/crs/OGC/1.3/CRS84 for the familiar WGS 84 system with coordinates in longitude, latitude order.
hyf:referencedPosition uses the HY_Features model to declare that this location is located on a specific river, in this case the Rio Brazos in New Mexico as identified in the Reference Mainstems dataset, which is available via API at https://reference.geoconnex.us/collections/mainstems and managed on GitHub at https://github.com/internetofwater/ref_rivers. All surface water locations should include this type of element.
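
Since the same coordinates appear twice (once in geo and once as WKT), generating both from a single pair of numbers avoids mismatches. The Python sketch below is one way to do that, assuming point locations in longitude, latitude order under CRS84; the function name spatial_blocks is hypothetical, and the geo type is written as the plain term GeoCoordinates, which resolves through @vocab.

```python
def spatial_blocks(longitude, latitude, mainstem_uri=None):
    """Build geo, gsp:hasGeometry and (optionally) hyf:referencedPosition."""
    if not (-180 <= longitude <= 180 and -90 <= latitude <= 90):
        # A latitude outside +/-90 usually means the values were swapped.
        raise ValueError("expected longitude in [-180, 180], latitude in [-90, 90]")
    blocks = {
        "geo": {
            "@type": "GeoCoordinates",
            "longitude": longitude,
            "latitude": latitude,
        },
        "gsp:hasGeometry": {
            "@type": "http://www.opengis.net/ont/sf#Point",
            "gsp:asWKT": {
                "@type": "http://www.opengis.net/ont/geosparql#wktLiteral",
                # WKT points are written as "x y", i.e. longitude then latitude.
                "@value": f"POINT ({longitude} {latitude})",
            },
            "gsp:crs": {"@id": "http://www.opengis.net/def/crs/OGC/1.3/CRS84"},
        },
    }
    if mainstem_uri:
        blocks["hyf:referencedPosition"] = {
            "hyf:HY_IndirectPosition": {
                "hyf:linearElement": {"@id": mainstem_uri}
            }
        }
    return blocks


spatial = spatial_blocks(
    -106.4707722,
    36.7379333,
    mainstem_uri="https://geoconnex.us/ref/mainstems/1611418",
)
```

The returned dictionary can be merged into the location document with dict.update(), provided the document's @context declares the gsp and hyf prefixes.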

What about groundwater?

Groundwater monitoring locations may use the hyf:referencedPosition element if data providers wish their wells to be associated with specific streams. However, groundwater sample and monitoring locations such as wells can also be referenced to hydrogeologic unit or aquifer identifiers, where available, using this pattern instead of the hyf:referencedPosition pattern:

"http://www.w3.org/ns/sosa/isSampleOf": {
  "@id": "https://geoconnex.us/ref/sec_hydrg_reg/S26"
}

USGS Principal Aquifers and Secondary Hydrogeologic Unit URIs are available from https://reference.geoconnex.us/collections

If reference URIs are not available for the groundwater unit you’d like to reference, but an ID does exist in a dataset that is available online, you may use this pattern:

"http://www.w3.org/ns/sosa/isSampleOf": {
  "@type": "GW_HydrogeoUnit",
  "name": "name of the aquifer",
  "identifier": {
    "@type": "PropertyValue",
    "propertyID": "Source aquifer dataset id field name",
    "value":  "aq-id-1234"
  },
  "subjectOf": {
    "@type": "Dataset",
    "url": "url where dataset that describes or includes the aquifer can be accessed"
  }
}

2.1.1.1.3 Datasets

Now that we have described our location’s provenance, geospatial geometry, and association with any reference features, we can describe the data that can be accessed about that location. The simplest, most minimal way to do this is to add a block like this, which would be added to the bottom of the JSON-LD document we have created so far:

"subjectOf": {
  "@type": "Dataset",
  "name": "Discharge data from USGS-08282300",
  "description": "Discharge data from USGS-08282300 at Rio Brazos at Fishtail Road NR Tierra Amarilla, NM",
  "url": "https://waterdata.usgs.gov/monitoring-location/08282300/#parameterCode=00060&period=P7D"
}

Here, we simply declare that the location we have been working with is subjectOf a Dataset with a name, description, and URL where information about the dataset can be found.
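
A location often has more than one dataset (for example, discharge and gage height). In JSON-LD a property can hold an array of nodes, so additional datasets can be appended under the same subjectOf key. The Python sketch below illustrates this; the function name add_dataset and the example.com URLs are hypothetical.

```python
def add_dataset(location_doc, name, description, url):
    """Attach a minimal Dataset node to a location document via subjectOf."""
    dataset = {
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
    }
    existing = location_doc.get("subjectOf")
    if existing is None:
        location_doc["subjectOf"] = dataset  # first dataset: a single node
    elif isinstance(existing, list):
        existing.append(dataset)  # already an array: extend it
    else:
        location_doc["subjectOf"] = [existing, dataset]  # promote to an array
    return location_doc


location = {"@id": "https://example.com/site/1", "name": "Example site"}
add_dataset(location, "Discharge data", "Discharge at Example site",
            "https://example.com/site/1/discharge")
add_dataset(location, "Gage height data", "Gage height at Example site",
            "https://example.com/site/1/gage-height")
```

After the second call, subjectOf is a two-element array, one entry per dataset.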

However, it will be useful to include much more detailed metadata. This enables data users (and search engines) to filter for your data using more standardized names for variables and by temporal coverage and resolution, to decide whether they want to use the data based on the methods used (such as whether it is observed or modeled/forecasted data), and possibly to preview actual data values. In general, following Science-on-Schema.org Guidelines is recommended. We implement this guidance, with some extension, for the USGS Monitoring Location example. The numbered notes that follow the code translate and explain its elements in order:

{
  "subjectOf": {
    "@type": "Dataset",
    "name": "Discharge data from USGS Monitoring Location 08282300",
    "description": "Discharge data from USGS Streamgage at Rio Brazos at Fishtail Road NR Tierra Amarilla, NM",
    "license": "https://spdx.org/licenses/CC-BY-4.0",
    "isAccessibleForFree": true,
    "variableMeasured": {
      "@type": "PropertyValue",
      "name": "discharge",
      "description": "Discharge in cubic feet per second",
      "propertyID": "https://www.wikidata.org/wiki/Q8737769",
      "url": "https://en.wikipedia.org/wiki/Discharge_(hydrology)",
      "unitText": "cubic feet per second",
      "qudt:hasQuantityKind": "qudt-quantkinds:VolumeFlowRate",
      "unitCode": "qudt-units:FT3-PER-SEC",
      "measurementTechnique": "observation",
      "measurementMethod": {
        "name": "Discharge Measurements at Gaging Stations",
        "publisher": "U.S. Geological Survey",
        "url": "https://doi.org/10.3133/tm3A8"
      }
    },
    "temporalCoverage": "2014-06-30/..",
    "dc:accrualPeriodicity": "freq:daily",
    "dcat:temporalResolution": {"@value": "PT15M", "@type": "xsd:duration"},
    "distribution": [
      {
        "@type": "DataDownload",
        "name": "USGS Instantaneous Values Service",
        "contentUrl": "https://waterservices.usgs.gov/nwis/iv/?sites=08282300&parameterCd=00060&format=rdb",
        "encodingFormat": ["text/tab-separated-values"],
        "dc:conformsTo": "https://pubs.usgs.gov/of/2003/ofr03123/6.4rdb_format.pdf"
      },
      {
        "@type": "DataDownload",
        "name": "USGS SensorThings API",
        "contentUrl": "https://labs.waterdata.usgs.gov/sta/v1.1/Datastreams('0adb31f7852e4e1c9a778a85076ac0cf')?$expand=Thing,Observations",
        "encodingFormat": ["application/json"],
        "dc:conformsTo": "https://labs.waterdata.usgs.gov/docs/sensorthings/index.html"
      }
    ]
  }
}

1: This node (we are continuing from the above JSON-LD document, so the USGS Monitoring Location) is subjectOf the node that follows.
2: This node is a Dataset (https://schema.org/Dataset).
3: The dataset’s name and description
4: The dataset’s license, which is most easily populated by a URI for the license appropriate for your data. Federal agencies, many state agencies, and academics use open licenses such as those provided by opendatacommons.org and creativecommons.org. URIs for licenses are available from https://spdx.org/licenses/
5: Either true or false depending on if the dataset is available for free.
6: The dataset includes information on a variable (in schema.org called variableMeasured). Multiple variableMeasured entries can be specified for a dataset by using a JSON array, which is useful for datasets that must be downloaded in bulk and that include multiple variables of interest. In general it is clearer to specify a separate “dataset” per variableMeasured if the data has different temporal coverage per variable, or can be downloaded on a per-variable basis.
7: PropertyValue is a generic type used to extend schema.org properties. As a rule, it should be used on variableMeasured nodes.
8: propertyID should be a URI at which a machine-readable resource defines what the variable is. In this case, we are using a Wikidata link to the concept of stream discharge. In general, a good source for URIs is the ODM2 variable vocabulary.
9: Here url points to a human-readable resource describing the variable. In this case, we are using a Wikipedia link to the concept of stream discharge.
10: Here we use the units as written in the data source.
11: While name and propertyID specifies the variable as being “discharge” in this case, since multiple data sources might use different words and identifiers for their variables, it can be useful to reference a more general category of variables that we can ue to group variables across sources. We can use identifiers for QuantityKinds from QUDT, which we reference with the qudt-quantkinds for the prefix as described in the @context in Section 1.3.1.
12: While unitText above specifies the units, since multiple data sources might use different words for the same unit, to improve interoperability we can use identifiers for units provided by QUDT, which we reference with the qudt-units vocabulary prefix as described in the @context in Section 1.3.1. If units from QUDT are unavailable, first check if unitText can be filled with a term from name from http://vocabulary.odm2.org/units/.
13: measurementTechnique is meant to be a highly general account of the data generating procedure, and primarily to distinguish between observed and modeled data. It is highly recommended for this to be model or observation, or if more specificity is required, to restrict these values to the ODM2 methodType vocabulary.
14: measurementMethod specifies the method used to generate the data to as great a degree of specificity as possible. Ideally it could a persistent identifier that directs to a machine-readable web resource that unambiguously describes that method. This would look something like this: "measurementMethod": {"@id": "https://www.nemi.gov/methods/method_summary/4680/"} In lieu of that, a name, description and URL to human-readable web resource like an explanatory webpage, technical report, standards document, or academic article would be appropriate, as in this example for USGS discharge measurement.
15: temporal coverage refers to the first and last time for which data is available. It can be specified using ISO 8061 interval format (YYYY-MM-DD/YYYY-MM-DD, with the start date first and the end date after the / . It can also include time like so YYYY-MM-DDTHH:MM:SS/YYYY-MM-DDTHH:MM:SS . If the dataset has no end date, as there is an active monitoring program, then this can be indicated like so YYYY-MM-DD/.. .
16: dc:accrualPeriodicty refers to the update schedule of the published dataset. The value of this can be from the Dublin Core frequency vocabulary (here in @context as freq:). dcat:temporalResolution refers to the minimum intended time spacing between observations in the case of regular time series data. The value should be an xsd duration encoded string e.g. “PT15M” for 15-minute, “P1D” for Daily, “PT1H” for Hourly, “P7D” for Weekly, “P1M” for Monthly, “P1Y” for Annual. freq: or be specified using ISO duration code.
17: distribution provides a way to structure information about data access points. This can range in complexity from a specification of a URL and format to specifications for how to interact with an API. In this example, a URL, format (encodingFormat populated by MIME type). conformsTo is optional and should be a document that helps interpret the data structure. This could be a link to a data dictionary in the case of simple tabular data, documentation of a data model for a complex database, or an API specification document for an API endpoint.
18: Multiple distributions can be specified using nested JSON arrays.
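
As a rough illustration of annotations 15 and 16, the following Python sketch shows one way a consumer might read a temporalCoverage string and how a provider might build the dcat:temporalResolution node. The function names are hypothetical and are not part of any Geoconnex tooling; the sketch assumes the interval uses the plain date or date-time forms shown above.

```python
from datetime import datetime
from typing import Optional, Tuple


def parse_temporal_coverage(value: str) -> Tuple[datetime, Optional[datetime]]:
    """Split an ISO 8601 interval into (start, end); end is None when open-ended."""
    start_text, sep, end_text = value.partition("/")
    if not sep:
        raise ValueError(f"expected 'start/end', got {value!r}")
    start = datetime.fromisoformat(start_text.strip())
    # ".." marks an interval with no end date (an active monitoring program).
    end_text = end_text.strip()
    end = None if end_text == ".." else datetime.fromisoformat(end_text)
    if end is not None and end < start:
        raise ValueError("end of interval precedes its start")
    return start, end


def temporal_resolution(duration: str) -> dict:
    """Build the typed literal used for dcat:temporalResolution (e.g. 'PT15M')."""
    return {"@value": duration, "@type": "xsd:duration"}


if __name__ == "__main__":
    print(parse_temporal_coverage("2014-06-30/.."))
    print(temporal_resolution("PT15M"))
```

Returning None for an open interval keeps the "still being collected" case explicit instead of substituting the current date.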

This translates roughly to

There is the following information about me: a Dataset
- for the variable (variableMeasured) Discharge
  - It has values from June 30, 2014 to the present
  - at a 15 minute time resolution
  - updated/published daily
  - in units of cubic feet per second
  - generated by observation
  - generated in particular using USGS discharge measurement methods.
  - You can download it:
    - here
      - Using the USGS Instantaneous Values REST Web Service
      - in the RDB format
    - You can also download it here
      - Using the USGS SensorThings API implementation
      - in JSON

2.1.2 Dataset-oriented

The purpose of the dataset-oriented page is to describe the available data, and the area, locations, or features it is relevant to, in enough detail that a water data user can quickly determine whether and how to download the data. As an example, we will use this data resource about water utility treated water demand, published at HydroShare, to show the type of content to put in dataset-oriented Geoconnex landing page web resources and how to map that content to embedded JSON-LD documents.

Note

Scroll up and down to view elements of the example landing page

This dataset-oriented web resource includes this type of information

“This is my URI (which is a DOI-URL): https://doi.org/10.4211/hs.4cf2a4298eca418f980201c1c5505299”¹³
“This is my permanent identifier, which is a DOI”: ¹⁴
“This is my URL https://www.hydroshare.org/resource/4cf2a4298eca418f980201c1c5505299”¹⁵
“My creator is ”
“I am provided by HydroShare”
“My spatial coverage is the bounding box "35.5463 -79.1235 36.0520 -78.3765"”¹⁶
“I have data between January 1, 2002 and December 31, 2020” ¹⁷
“My data is at a 1 month time step frequency” ¹⁸
“I am about the following features”:¹⁹.
“I have the following variables”:
- Monthly Water demand measured in units of averaged millions of gallons per day
- Historic Mean monthly water demand over the period of record measured in units of millions of gallons per day
- The monthly water demand divided by historic mean monthly water demand, as a percent
“You can download me here on HydroShare as a zipped csv file”
“I am accessible for free subject to this license.”

2.1.2.1 JSON-LD

Much is similar to the Datasets guidance for location-oriented web resources, so here we focus on the differences. Note that HydroShare automatically embeds JSON-LD. The JSON-LD examples below vary somewhat from HydroShare’s default content to illustrate optional elements that would be useful for Geoconnex that are not currently implemented in HydroShare.

2.1.2.1.1 Identifiers, provenance, license, and distribution.

For basic identifying and descriptive information, science-on-schema.org has appropriate guidance. In this case, note that a specific file download URL has been provided rather than an API endpoint, and that dc:conformsTo points to a data dictionary that is supplied at the same web resource.

{
  "@context": {
    "@vocab": "https://schema.org/", 
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "dc": "http://purl.org/dc/terms/",
    "qudt": "http://qudt.org/schema/qudt/",
    "qudt-units": "http://qudt.org/vocab/unit/",
    "qudt-quantkinds": "http://qudt.org/vocab/quantitykind/",
    "gsp": "http://www.opengis.net/ont/geosparql#",
    "locType": "http://vocabulary.odm2.org/sitetype",
    "odm2var":"http://vocabulary.odm2.org/variablename/",
    "odm2varType": "http://vocabulary.odm2.org/variabletype/",
    "hyf": "https://www.opengis.net/def/schema/hy_features/hyf/",
    "skos": "https://www.opengis.net/def/schema/hy_features/hyf/HY_HydroLocationType",
    "ssn": "http://www.w3.org/ns/ssn/",
    "ssn-system":  "http://www.w3.org/ns/ssn/systems/"
  },
  "@type": "Dataset",
  "@id": "https://doi.org/10.4211/hs.4cf2a4298eca418f980201c1c5505299",
   "provider": {
    "url": "https://hydroshare.org",
    "@type": "ResearchOrganization",
    "name": "HydroShare"
  },
    "creator": {
    "@type": "Person",
    "affiliation": {
                    "@type": "Organization",
                    "name": "Internet of Water;Center for Geospatial Solutions;Lincoln Institute of Land Policy"
                  },
                  "email": "konda@lincolninst.edu",
                  "name": "Kyle Onda",
                  "url": "https://www.hydroshare.org/user/4850/"
            },
  "identifier": "doi:hs.4cf2a4298eca418f980201c1c5505299",
  "name": "Geoconnex Dataset Example: Data for Triangle Water Supply Dashboard",
  "description": "This is a dataset meant as an example of dataset-level schema.org markup for https://geoconnex.us. It uses as an example data from the NC Triangle Water Supply Dashboard, which collects and visualizes finished water deliveries by water utilities to their service areas in the North Carolina Research Triangle area",
  "url": "https://www.hydroshare.org/resource/4cf2a4298eca418f980201c1c5505299",
  "keywords": ["water demand", "water supply", "geoconnex"],
  "license": "https://creativecommons.org/licenses/by/4.0/",
  "isAccessibleForFree": "true",
     "distribution": [                                                                                       
     {
      "@type": "DataDownload", 
      "name": "HydroShare file URL", 
      "contentUrl": "https://www.hydroshare.org/resource/4cf2a4298eca418f980201c1c5505299/data/contents/demand_over_time.csv",    
      "encodingFormat": ["text/csv"],                                                        
      "dc:conformsTo": "https://www.hydroshare.org/resource/4cf2a4298eca418f980201c1c5505299/data/contents/dataDictionary.xlsx"   
      },
      ...

2.1.2.1.2 Variables and Methods

Again, this follows the dataset guidance for location-oriented web resources. In the example below, multiple variableMeasured are specified using a nested array. Other differences to point out:

- The unit of “million gallons per day” is not available from the QUDT units vocabulary. It is in the ODM2 units codelist, so we populate unitCode with the URL listed there.
- The measurementMethod for the two variables, which are simply different aggregation statistics for the same variable, has no known web resource or specific identifier available, so description is used to clarify the method.

...,
   "variableMeasured": [
   {
   "@type": "PropertyValue",
   "name": "water demand",
   "description": "treated water delivered to distribution system",
   "propertyID": "http://vocabulary.odm2.org/variablename/waterUsePublicSupply/",
   "url": "http://vocabulary.odm2.org/variablename/waterUsePublicSupply/",
   "unitText": "million gallons per day",
   "qudt:hasQuantityKind": "qudt-quantkinds:VolumeFlowRate",
   "unitCode": "http://his.cuahsi.org/mastercvreg/edit_cv11.aspx?tbl=Units&id=1125579048",
   "measurementTechnique": "observation",
   "measurementMethod": {
     "name":"water meter",
     "description": "metered bulk value, accumulated over one month",
     "url": "https://www.wikidata.org/wiki/Q268503"                                                                   
     }                                                                                                        
    }, 
    {
   "@type": "PropertyValue",
   "name": "water demand (monthly average)",
   "description": "average monthly treated water delivered to distribution system",
   "propertyID": "http://vocabulary.odm2.org/variablename/waterUsePublicSupply/",
   "url": "http://vocabulary.odm2.org/variablename/waterUsePublicSupply/",
   "unitText": "million gallons per day",
   "qudt:hasQuantityKind": "qudt-quantkinds:VolumeFlowRate",
   "unitCode": "http://his.cuahsi.org/mastercvreg/edit_cv11.aspx?tbl=Units&id=1125579048",
   "measurementTechnique": "observation",
   "measurementMethod": {
     "name":"water meter",
     "description": "metered bulk value, average accumulated over each month for multiple years",
     "url": "https://www.wikidata.org/wiki/Q268503"
     }
    }
    ],
   "temporalCoverage": "2002-01-01/2020-12-31",                                                                     
   "ssn-system:frequency": {                                                                                
     "value": "1",                                                                                        
     "unitCode": "qudt-units:Month"                                                                         
   },

2.1.2.1.3 Geoconnex Reference Feature Links and Spatial Coverage

Unlike the location-based example, where a location is explicitly the subjectOf the dataset, here the dataset must be described as being about certain features. If the dataset is not explicitly about any discrete features, as is often the case for raster datasets, then a spatialCoverage should be specified instead.

Using the about construction, a single geoconnex URI or an array of multiple can be constructed. In the below example, multiple are used. Note the nesting of nodes within the array so that each URI has an @id keyword and is @type Place. In this example, URIs from the geoconnex reference features set for Public Water Systems are used.

...,
"about": [
      {
        "@id": "https://geoconnex.us/ref/pws/NC0332010",
        "@type": "Place"
      },
    
      {
        "@id": "https://geoconnex.us/ref/pws/NC0368010",
        "@type": "Place"
      },
    
      {
        "@id": "https://geoconnex.us/ref/pws/NC0392010",
        "@type": "Place"
      },
    
      {
        "@id": "https://geoconnex.us/ref/pws/NC0392020",
        "@type": "Place"
      },
      
      {
        "@id": "https://geoconnex.us/ref/pws/NC0392045",
        "@type": "Place"
      }
    ],
    ...
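
When the list of features is long, the about array can be generated rather than typed by hand. The sketch below is only an illustration: the helper name about_nodes is hypothetical, and it assumes you already hold the Geoconnex reference feature URIs (here, the five PWS URIs from the example above).

```python
import json
from typing import Iterable, List


def about_nodes(uris: Iterable[str]) -> List[dict]:
    """Wrap each feature URI in a node with an @id and @type Place."""
    return [{"@id": uri, "@type": "Place"} for uri in uris]


# The PWS identifiers used in the example above.
PWS_BASE = "https://geoconnex.us/ref/pws/"
pws_ids = ["NC0332010", "NC0368010", "NC0392010", "NC0392020", "NC0392045"]

about = about_nodes(PWS_BASE + pws_id for pws_id in pws_ids)

if __name__ == "__main__":
    # Prints a fragment that can be merged into the Dataset node.
    print(json.dumps({"about": about}, indent=2))
```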

To assist in finding reference features, https://reference.geoconnex.us allows queries following the OGC API - Features standard and the Common Query Language (CQL) standard.

For example, to find the Geoconnex URI for the Raleigh public water system (PWS), we can construct the URL:

- Start from the items endpoint for the PWS feature collection, which accepts CQL filters: https://reference.geoconnex.us/collections/pws/items
- Add a filter on the name field pws_name: https://reference.geoconnex.us/collections/pws/items?filter=pws_name
- Filter for a name that includes “Raleigh”: https://reference.geoconnex.us/collections/pws/items?filter=pws_name ILIKE '%Raleigh%'
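
The same URL can be assembled in code, which also takes care of percent-encoding the spaces, quotes, and % wildcards. This is a minimal sketch: reference_feature_query is a hypothetical helper name, and doubling single quotes inside the search text is an assumption about CQL text escaping rather than something stated in this guidance.

```python
from urllib.parse import quote, urlencode

BASE = "https://reference.geoconnex.us/collections"


def reference_feature_query(collection: str, field: str, text: str) -> str:
    """Return an items URL whose CQL filter matches `text` anywhere in `field`."""
    # Assumption: a single quote inside a CQL string literal is escaped by doubling it.
    escaped = text.replace("'", "''")
    cql = f"{field} ILIKE '%{escaped}%'"
    # quote_via=quote encodes spaces as %20 instead of '+'.
    query = urlencode({"filter": cql}, quote_via=quote)
    return f"{BASE}/{collection}/items?{query}"


if __name__ == "__main__":
    print(reference_feature_query("pws", "pws_name", "Raleigh"))
```

The printed URL can be opened in a browser or requested with any HTTP client; the feature identifiers in the response are the candidate Geoconnex URIs.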

Sometimes it is impossible to use feature URIs because the relevant specific features are not available from https://reference.geoconnex.us/collections. If so, feel free to submit an issue to the geoconnex.us GitHub repository requesting a reference feature set.

Sometimes it is impractical to list all applicable reference features, whether or not they are in https://reference.geoconnex.us or another source. This is common for comprehensive datasets that cover an entire reference dataset or another dataset like a hydrofabric, such as datasets summarizing values to U.S. Counties, or the National Water Model generating values for all NHDPlusV2 COMID flowlines. In this case it is best to declare that the Dataset isBasedOn the source geospatial fabric. For example, if the example dataset were about all public water systems instead of just the 5 listed, then instead of about we should use isBasedOn and specify an identifier, name, description, and any URLs for other resources that describe the source fabric and how to interpret it:

...,
"isBasedOn": {
  "@id": "https://www.hydroshare.org/resource/9ebc0a0b43b843b9835830ffffdd971e/",
  "name": "U.S. Community Water Systems Service Boundaries, v4.0.0",
  "description": "This is a layer of water service boundaries for 45,973 community water systems that deliver tap water to 307.7 million people in the US.",
  "url": "https://github.com/SimpleLab-Inc/wsb"
},
...

Sometimes there are no particular features that a dataset is explicitly about. This is common with remote sensing raster data. In this case, it is best to specify a spatialCoverage polygon using WKT encoded geometry:

  "spatialCoverage": {
    "@type": "Place",
    "gsp:hasGeometry": {
      "@type": "http://www.opengis.net/ont/sf#MultiPolygon",
      "gsp:asWKT": {
        "@type": "http://www.opengis.net/ont/geosparql#wktLiteral",
        "@value": "MULTIPOLYGON (((-85.67957299999999 32.799514, -85.679637 32.822002999999995, -85.67199699999999 32.822063, -85.66421 32.821711, -85.647989 32.82224, -85.627966 32.822331, -85.627781 32.800716, -85.627496 32.778602, -85.635931 32.778656999999995, -85.645034 32.778146, -85.653352 32.778481, -85.67933699999999 32.778239, -85.67936399999999 32.784064, -85.679808 32.792068, -85.67957299999999 32.799514)))"
      }
    }
  }
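
If only a bounding box is known, such as the one listed for the example dataset earlier in this section, a spatialCoverage node of the same shape can be derived from it. The sketch below is illustrative only: the function name is hypothetical, it assumes the box is given as south, west, north, east in decimal degrees, and it writes WKT coordinates in longitude-latitude order to match the example above.

```python
def bbox_to_spatial_coverage(south: float, west: float, north: float, east: float) -> dict:
    """Build a spatialCoverage node holding a rectangular WKT polygon."""
    if south > north or west > east:
        raise ValueError("expected south <= north and west <= east")
    # A WKT ring must be closed, so the first corner is repeated at the end.
    ring = [(west, south), (east, south), (east, north), (west, north), (west, south)]
    coords = ", ".join(f"{lon} {lat}" for lon, lat in ring)
    return {
        "@type": "Place",
        "gsp:hasGeometry": {
            "@type": "http://www.opengis.net/ont/sf#Polygon",
            "gsp:asWKT": {
                "@type": "http://www.opengis.net/ont/geosparql#wktLiteral",
                "@value": f"POLYGON (({coords}))",
            },
        },
    }


if __name__ == "__main__":
    # Bounding box of the example dataset: "35.5463 -79.1235 36.0520 -78.3765".
    print(bbox_to_spatial_coverage(35.5463, -79.1235, 36.0520, -78.3765))
```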

2.2 Complete Examples

Below are complete examples for the general JSON-LD document types depending on the location or dataset orientation and data type.

They are viewable together below, or available for download:

2.2.1 Location-oriented

{
  "@context": {
    "@vocab": "https://schema.org/", 
    "xsd": "https://www.w3.org/TR/xmlschema-2/#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "dc": "http://purl.org/dc/terms/",
    "dcat": "https://www.w3.org/ns/dcat#",
    "freq": "http://purl.org/cld/freq/",
    "qudt": "http://qudt.org/schema/qudt/",
    "qudt-units": "http://qudt.org/vocab/unit/",
    "qudt-quantkinds": "http://qudt.org/vocab/quantitykind/",
    "gsp": "http://www.opengis.net/ont/geosparql#",
    "locType": "http://vocabulary.odm2.org/sitetype",
    "odm2var":"http://vocabulary.odm2.org/variablename/",
    "odm2varType": "http://vocabulary.odm2.org/variabletype/",
    "hyf": "https://www.opengis.net/def/schema/hy_features/hyf/",
    "skos": "https://www.opengis.net/def/schema/hy_features/hyf/HY_HydroLocationType",
    "ssn": "http://www.w3.org/ns/ssn/",
    "ssn-system":  "http://www.w3.org/ns/ssn/systems/"
  },
  "@id": "https://geoconnex.us/usgs/monitoring-location/08282300",
  "@type": [
    "hyf:HY_HydrometricFeature",
    "hyf:HY_HydroLocation",
    "locType:stream"
  ],
  "hyf:HydroLocationType": "hydrometric station",
  "sameAs": {
    "@id": "https://geoconnex.us/ref/gages/1018463"
  },
  "identifier": {
    "@type": "PropertyValue",
    "propertyID": "USGS site number",
    "value": "08282300"
  },
  "name": "Rio Brazos at Fishtail Road NR Tierra Amarilla, NM",
  "description": "Stream/River Site",
  "provider": {
    "url": "https://waterdata.usgs.gov",
    "@type": "GovernmentOrganization",
    "name": "U.S. Geological Survey Water Data for the Nation"
  },
  "geo": {
    "@type": "GeoCoordinates",
    "longitude": -106.4707722,
    "latitude": 36.7379333
  },
  "gsp:hasGeometry": {
    "@type": "http://www.opengis.net/ont/sf#Point",
    "gsp:asWKT": {
      "@type": "http://www.opengis.net/ont/geosparql#wktLiteral",
      "@value": "POINT (-106.4707722 36.7379333)"
    },
    "gsp:crs": {
      "@id": "http://www.opengis.net/def/crs/OGC/1.3/CRS84"
    }
  },
  "hyf:referencedPosition": {
    "hyf:HY_IndirectPosition": {
      "hyf:linearElement": {
        "@id": "https://geoconnex.us/ref/mainstems/1611418"
      }
    }
  },
  "subjectOf": {
    "@type": "Dataset",
    "name": "Discharge data from USGS Monitoring Location 08282300",
    "description": "Discharge data from USGS Streamgage at Rio Brazos at Fishtail Road NR Tierra Amarilla, NM",
    "provider": {
      "url": "https://waterdata.usgs.gov",
      "@type": "GovernmentOrganization",
      "name": "U.S. Geological Survey Water Data for the Nation"
    },
    "url": "https://waterdata.usgs.gov/monitoring-location/08282300/#parameterCode=00060",
    "variableMeasured": {
      "@type": "PropertyValue",
      "name": "discharge",
      "description": "Discharge in cubic feet per second",
      "propertyID": "https://www.wikidata.org/wiki/Q8737769",
      "url": "https://en.wikipedia.org/wiki/Discharge_(hydrology)",
      "unitText": "cubic feet per second",
      "qudt:hasQuantityKind": "qudt-quantkinds:VolumeFlowRate",
      "unitCode": "qudt-units:FT3-PER-SEC",
      "measurementTechnique": "observation",
      "measurementMethod": {
        "name": "Discharge Measurements at Gaging Stations",
        "publisher": "U.S. Geological Survey",
        "url": "https://doi.org/10.3133/tm3A8"
      }
    },
    "temporalCoverage": "2014-06-30/..",
    "dc:accrualPeriodicity": "freq:daily",                                                               
    "dcat:temporalResolution": {"@value":"PT15M","@type":"xsd:duration"},  
    "distribution": [
      {
        "@type": "DataDownload",
        "name": "USGS Instantaneous Values Service",
        "contentUrl": "https://waterservices.usgs.gov/nwis/iv/?sites=08282300&parameterCd=00060&format=rdb",
        "encodingFormat": [
          "text/tab-separated-values"
        ],
        "dc:conformsTo": "https://pubs.usgs.gov/of/2003/ofr03123/6.4rdb_format.pdf"
      },
      {
        "@type": "DataDownload",
        "name": "USGS SensorThings API",
        "contentUrl": "https://labs.waterdata.usgs.gov/sta/v1.1/Datastreams('0adb31f7852e4e1c9a778a85076ac0cf')?$expand=Thing,Observations",
        "encodingFormat": [
          "application/json"
        ],
        "dc:conformsTo": "https://labs.waterdata.usgs.gov/docs/sensorthings/index.html"
      }
    ]
  }
}

2.2.2 Dataset-oriented

{
  "@context": {
    "@vocab": "https://schema.org/", 
    "xsd": "https://www.w3.org/TR/xmlschema-2/#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "dc": "http://purl.org/dc/terms/",
    "dcat": "https://www.w3.org/ns/dcat#",
    "freq": "http://purl.org/cld/freq/",
    "qudt": "http://qudt.org/schema/qudt/",
    "qudt-units": "http://qudt.org/vocab/unit/",
    "qudt-quantkinds": "http://qudt.org/vocab/quantitykind/",
    "gsp": "http://www.opengis.net/ont/geosparql#",
    "locType": "http://vocabulary.odm2.org/sitetype",
    "odm2var":"http://vocabulary.odm2.org/variablename/",
    "odm2varType": "http://vocabulary.odm2.org/variabletype/",
    "hyf": "https://www.opengis.net/def/schema/hy_features/hyf/",
    "skos": "https://www.opengis.net/def/schema/hy_features/hyf/HY_HydroLocationType",
    "ssn": "http://www.w3.org/ns/ssn/",
    "ssn-system":  "http://www.w3.org/ns/ssn/systems/"
  },
  "@type": "Dataset",
  "@id": "https://doi.org/10.4211/hs.4cf2a4298eca418f980201c1c5505299",
  "identifier": "doi:hs.4cf2a4298eca418f980201c1c5505299",
  "name": "Geoconnex Dataset Example: Data for Triangle Water Supply Dashboard",
  "description": "This is a dataset meant as an example of dataset-level schema.org markup for https://geoconnex.us. It uses as an example data from the NC Triangle Water Supply Dashboard, which collects and visualizes finished water deliveries by water utilities to their service areas in the North Carolina Research Triangle area",
  "url": "https://www.hydroshare.org/resource/4cf2a4298eca418f980201c1c5505299",
  "provider": {
    "url": "https://hydroshare.org",
    "@type": "ResearchOrganization",
    "name": "HydroShare"
  },
  "creator": {
    "@type": "Person",
    "affiliation": {
      "@type": "Organization",
      "name": "Internet of Water;Center for Geospatial Solutions;Lincoln Institute of Land Policy"
    },
    "email": "konda@lincolninst.edu",
    "name": "Kyle Onda",
    "url": "https://www.hydroshare.org/user/4850/"
  },
  "keywords": [
    "water demand",
    "water supply",
    "geoconnex"
  ],
  "license": "https://creativecommons.org/licenses/by/4.0/",
  "isAccessibleForFree": "true",
  "distribution": {
    "@type": "DataDownload",
    "name": "HydroShare file URL",
    "contentUrl": "https://www.hydroshare.org/resource/4cf2a4298eca418f980201c1c5505299/data/contents/demand_over_time.csv",
    "encodingFormat": [
      "text/csv"
    ],
    "dc:conformsTo": "https://www.hydroshare.org/resource/4cf2a4298eca418f980201c1c5505299/data/contents/dataDictionary.xlsx"
  },
  "variableMeasured": [
    {
      "@type": "PropertyValue",
      "name": "water demand",
      "description": "treated water delivered to distribution system",
      "propertyID": "http://vocabulary.odm2.org/variablename/waterUsePublicSupply/",
      "url": "http://vocabulary.odm2.org/variablename/waterUsePublicSupply/",
      "unitText": "million gallons per day",
      "qudt:hasQuantityKind": "qudt-quantkinds:VolumeFlowRate",
      "unitCode": "http://his.cuahsi.org/mastercvreg/edit_cv11.aspx?tbl=Units&id=1125579048",
      "measurementTechnique": "observation",
      "measurementMethod": {
        "name": "water meter",
        "description": "metered bulk value, accumulated over one month",
        "url": "https://www.wikidata.org/wiki/Q268503"
      }
    },
    {
      "@type": "PropertyValue",
      "name": "water demand (monthly average)",
      "description": "average monthly treated water delivered to distribution system",
      "propertyID": "http://vocabulary.odm2.org/variablename/waterUsePublicSupply/",
      "url": "http://vocabulary.odm2.org/variablename/waterUsePublicSupply/",
      "unitText": "million gallons per day",
      "qudt:hasQuantityKind": "qudt-quantkinds:VolumeFlowRate",
      "unitCode": "http://his.cuahsi.org/mastercvreg/edit_cv11.aspx?tbl=Units&id=1125579048",
      "measurementTechnique": "observation",
      "measurementMethod": {
        "name": "water meter",
        "description": "metered bulk value, average accumulated over each month for multiple years",
        "url": "https://www.wikidata.org/wiki/Q268503"
      }
    }
  ],
  "temporalCoverage": "2002-01-01/2020-12-31",
  "dc:accrualPeriodicity": "freq:daily",
  "dcat:temporalResolution": {"@value":"P1M","@type":"xsd:duration"},
  "about": [
    {
      "@id": "https://geoconnex.us/ref/pws/NC0332010",
      "@type": "Place"
    },
    {
      "@id": "https://geoconnex.us/ref/pws/NC0368010",
      "@type": "Place"
    },
    {
      "@id": "https://geoconnex.us/ref/pws/NC0392010",
      "@type": "Place"
    },
    {
      "@id": "https://geoconnex.us/ref/pws/NC0392020",
      "@type": "Place"
    },
    {
      "@id": "https://geoconnex.us/ref/pws/NC0392045",
      "@type": "Place"
    }
  ]
}
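
Before embedding a document like the ones above in a landing page, it can help to confirm that it parses as JSON and that no key is repeated within an object, since a repeated key (for example two url entries) is silently collapsed by most parsers. The following is a minimal sketch using only the Python standard library; load_jsonld is a hypothetical helper name, and the check covers JSON syntax only, not JSON-LD semantics.

```python
import json
from typing import Any, List, Tuple


def _reject_duplicate_keys(pairs: List[Tuple[str, Any]]) -> dict:
    """Called by json.loads for every object; fails if a key appears twice."""
    seen = set()
    for key, _ in pairs:
        if key in seen:
            raise ValueError(f"duplicate key in JSON object: {key!r}")
        seen.add(key)
    return dict(pairs)


def load_jsonld(text: str) -> dict:
    """Parse a JSON-LD document, raising ValueError on syntax errors or repeated keys."""
    return json.loads(text, object_pairs_hook=_reject_duplicate_keys)


if __name__ == "__main__":
    doc = load_jsonld('{"@type": "Dataset", "name": "example"}')
    print(doc["@type"])
```

Missing commas and trailing commas are reported by json.loads itself as json.JSONDecodeError, which is a subclass of ValueError, so a single except ValueError clause covers both kinds of problem.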

2.3 Appendices

2.3.1 Appendix 1: Codelists

2.3.1.1 measurementTechnique

The measurementTechnique property is meant to provide a way for data providers to “tag” their data with a general sense of how it was created, to help distinguish between aspects such as observed vs. modeled data, or in-situ vs. remote-sensed data. It is a supplement to the measurementMethod property, which should identify the specific method used and provide a link to documentation specific enough to replicate the method. Because these aspects can overlap, multiple measurementTechniques can be specified. Terms can be chosen from the codelist below, which is derived from the USGS Thesaurus and the ODM2 methodType vocabulary.

code	definition
observation	This term is meant to group data generating procedures that occur primarily to directly measure phenomena. Examples include ground-based sensors like streamgages and weather stations, but also discrete water quality samples, habitat assessments, ecological surveys, and surveys of individuals, households, and organizations, as well as remote sensing. This category can also include datasets that use procedures for gap filling missing data (e.g. streamgage data with sensor malfunction period data estimated from time series models).
model	This term refers to data that are generated rather than observed. It groups data generating procedures that generate data for hypothetical states at discrete locations, such as (but not limited to): the future (e.g. river stage forecasts like the gage location-based forecasts from the NOAA Advanced Hydrologic Prediction System); counterfactuals (e.g. hydrologic models under varying assumptions about dam removal or reservoir operations); and the unobserved past and present at the feature of interest (e.g. water quality models for parameters based on climate and upstream effluent discharge data).
field methods	Research procedures and instrumental means to measure, collect data and samples, and observe in the natural areas where the materials, phenomena, structures, or species being studied occur.
remote sensing	Acquiring information about a natural feature or phenomenon, such as the Earth’s surface, without actually being in contact with it. Typically carried out with airborne or spaceborne sensors or cameras.
estimation	A method for creating results by estimation or professional judgement.
derivation	A method for creating results by deriving them from other results. Datasets in this category may be generated from algorithms or human processes that combine heterogeneous source data into latent or derived variables (e.g. composite indexes such as health risk scores or regulatory categorizations such as “in compliance”), or spatially aggregate data from smaller geographic units to larger ones (e.g. Census area-based reporting), as long as the data is representing the phenomena of interest at the time and place it actually occurred and was measured.
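
A provider could check measurementTechnique values against this codelist before publishing. The sketch below is one possible approach, not an official validator: the function name is hypothetical, and it assumes variableMeasured and measurementTechnique may each be either a single value or an array, as described in this guidance.

```python
from typing import List

# Codes copied from the codelist table above.
MEASUREMENT_TECHNIQUES = {
    "observation",
    "model",
    "field methods",
    "remote sensing",
    "estimation",
    "derivation",
}


def unknown_techniques(dataset: dict) -> List[str]:
    """Return measurementTechnique values in a Dataset node that are not in the codelist."""
    variables = dataset.get("variableMeasured", [])
    if isinstance(variables, dict):  # a single variable given without an array
        variables = [variables]
    unknown = []
    for variable in variables:
        techniques = variable.get("measurementTechnique", [])
        if isinstance(techniques, str):  # a single technique given without an array
            techniques = [techniques]
        unknown.extend(t for t in techniques if t not in MEASUREMENT_TECHNIQUES)
    return unknown


if __name__ == "__main__":
    example = {"variableMeasured": [{"measurementTechnique": ["model", "guesswork"]}]}
    print(unknown_techniques(example))
```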

Footnotes

https://geoconnex.us/ref/hu04/0308↩︎
https://geoconnex.us/ref/states/48↩︎
https://geoconnex.us/ref/counties/37003↩︎
This is ideally a persistent geoconnex URI. See here for how to create these. It is optional if the “same thing” geoconnex URI in the next bullet is provided, in which case this could just be the URL of the web resource for the location, or omitted.↩︎
Where possible, it will be useful to tag your organization’s locations with pre-existing identifiers for reference locations, since many organizations collect data at the same location.↩︎
This ideally would come from a codelist so that data providers use consistent terminology↩︎
Note that ideally this would be a geoconnex URI for a river mainstem, in this case https://geoconnex.us/ref/mainstems/1611418 ↩︎
This is towards the ‘more detailed’ end of the spectrum. If data is not available via API, it is still good to include links to data file downloads or web apps that provide access to the data↩︎
This is ideally a persistent geoconnex URI. See here for how to create these. It is optional if the “same thing” geoconnex URI in the next bullet is provided, in which case this could just be the URL of the web resource for the location, or omitted.↩︎
This ideally would come from a codelist so that data providers use consistent terminology↩︎
Where possible, it will be useful to tag your organization’s locations with pre-existing identifiers for reference locations, since many organizations collect data at the same location.↩︎
Note that ideally this would be a geoconnex URI for a river mainstem, in this case https://geoconnex.us/ref/mainstems/1611418 ↩︎
If a permanent identifier like a DOI is available↩︎
for identifiers that are not HTTP URLs↩︎
The actual URL where the resource↩︎
Spatial coverage refers to the maximum areal extent that the data is about. For Geoconnex purposes, this is not necessary if “about” elements with links to Geoconnex Reference Features are used↩︎
refers to the first and last time for which data is available. It can be specified using the ISO 8601 interval format (YYYY-MM-DD/YYYY-MM-DD, with the start date first and the end date after the /). It can also include times, like so: YYYY-MM-DDTHH:MM:SS/YYYY-MM-DDTHH:MM:SS. If the dataset has no end date because there is an active monitoring program, this can be indicated like so: YYYY-MM-DD/..↩︎
refers to the minimum intended time spacing between observations in the case of regular time series data↩︎
These should be geoconnex reference feature URIs. If the locations the dataset is about is not within https://reference.geoconnex.us/collections, then consider creating location-based resources and minting geoconnex identifiers. If the dataset is extensive over a vector feature spatial fabric, like all Census Tracts or HUC12s or NHD Catchments, then this can be a reference to a single reference fabric dataset rather than an array of identifiers for every single feature. If the dataset is extensive over an area but has no particular tie to a particular reference feature set, like a raster dataset, then this can be omitted.↩︎
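
The interval notation described in the temporal coverage footnote above can be checked before metadata is published. The following is a minimal Python sketch, not part of any Geoconnex tooling: the function name `parse_temporal_coverage` is hypothetical, and it assumes only the forms named in the footnote (dates, or date-times without a UTC offset, with `..` marking an open end).

```python
from datetime import datetime
from typing import Optional, Tuple


def parse_temporal_coverage(value: str) -> Tuple[datetime, Optional[datetime]]:
    """Split a 'start/end' interval into datetimes; end is None when open-ended.

    Hypothetical helper for illustration only.
    """
    start_text, separator, end_text = value.partition("/")
    if not separator:
        raise ValueError("expected 'start/end' separated by '/'")
    # fromisoformat accepts both YYYY-MM-DD and YYYY-MM-DDTHH:MM:SS
    start = datetime.fromisoformat(start_text)
    # '..' marks an ongoing monitoring program with no end date
    end = None if end_text == ".." else datetime.fromisoformat(end_text)
    if end is not None and end < start:
        raise ValueError("the start of the interval must not be after its end")
    return start, end


closed = parse_temporal_coverage("2002-01-01/2007-12-31")
ongoing = parse_temporal_coverage("2014-06-06T00:00:00/..")
```

A check like this would catch the most common slips, such as a reversed start and end or a missing `/`, before the string is written into a JSON-LD document.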

--- title: "Geoconnex Structured Metadata Guidance" title-block-banner: '#181868' number-sections: true website: search: type: overlay format: html: toc: true toc-depth: 7 toc-expand: 6 toc-title: Table of Contents toc-location: left code-tools: true code-overflow: wrap code-line-numbers: true code-annotations: hover # embed-resources: true anchor-sections: true link-external-newwindow: true comments: # utterances: # repo: internetofwater/geoconnex-guidance hypothesis: true editor: visual --- ::: {align="center"} <img src="images/geoconnex-logo.png" alt="Logo" width="600"/> ::: # Introduction {#sec-introduction} The Geoconnex project is about providing technical infrastructure and guidance to create an open, community-contribution model for a knowledge graph linking hydrologic features in the United States, published in accordance with [Spatial Data on the Web best practices](https://www.w3.org/TR/sdw-bp/) as an implementation of [Internet of Water](https://github.com/opengeospatial/SELFIE/blob/master/docs/demo/internet_of_water.md) principles. The development of geoconnex.us takes place on GitHub. See [here](https://github.com/internetofwater/about.geoconnex.us) for the system of repositories. Geoconnex will allow data users to answer questions like: "What datasets are available about the portions of [**Colorado River**](https://geoconnex.us/ref/mainstems/29559 "This is an HTTP identifier in the geoconnex system to unambiguously denote the Colorado River") *upstream* of [**Hoover Dam**](https://geoconnex.us/ref/dams/1080095 "This an HTTP identifier in the Geoconnex system to unambiguously identify the Hoover Dam, in particular its location along the Colorado River") *within* **Nevada** and **Utah** regarding *variables* **discharge** and **total suspended solids** with measurements taken at least **daily** with *coverage* **between 2002 and 2007**?" 
and be returned metadata for all relevant datasets from all participating organizations, including federal, state, private, and non-governmental organizations.

See <https://geoconnex.us/demo> for a mockup of data discovery and access workflows that `https://geoconnex.us` aspires to enable.

Geoconnex rests on data providers publishing metadata to the system. Thus, Geoconnex involves the publication of Web Resources, which include structured, embedded metadata that describe water datasets and the real-world environmental features (e.g. rivers, wells, dams, catchments) or the cataloging features (e.g. government jurisdiction areas, statistical summary reporting areas) that they are relevant to. This document provides guidance, including general principles as well as specific templates, for data providers on how to structure this metadata using the JSON-LD format.

**Related materials, presentations, and publications**

[National Hydrography Infrastructure and Geoconnex](https://drive.google.com/file/d/1J0NKYOq3pGjQXr58FKO8sd7uHpGA8kNB/view?usp=sharing)

[New Mexico Water Data Initiative including geoconnex.us](https://docs.google.com/presentation/d/1yuNpBbQPcmb_Nw8DXiuNTazAjIM8UF7o/edit?usp=sharing&ouid=102421334323378854304&rtpof=true&sd=true)

[Roundtable presentation including geoconnex.us](https://www.westernstateswater.org/wp-content/uploads/2020/06/CO_Roundable_IoW.pdf)

[Second Environmental Linked Features Interoperability Experiment](https://github.com/opengeospatial/SELFIE)

[ESIP Sessions on Structured Data in the Web](https://2020esipsummermeeting.sched.com/event/cIvv/structured-data-on-the-web-putting-best-practice-to-work) [slides](https://docs.google.com/presentation/d/1LSXHz2_Y7hrkGZPC_sNoJWl8AIujI8AAWktl9amIR4E/edit#slide=id.g8250495469_1_30)

## Basic Information Model {#sec-infomodel}

The model used to organize information in the Geoconnex system is shown in @fig-info-model.
![Basic information model for resources in geoconnex](images/screenshot.png){#fig-info-model} - **Data providers** refer to specific systems that publish water-related **datasets** on the web. Many times a provider will simply be the data dissemination arm of an organization, such as the [Reclamation Information Sharing Environment (RISE)](https://data.usbr.gov) of the US Bureau of Reclamation. Some organizations may have multiple data providers, such as US Geological Survey, which administers the [National Water Information System](https://waterdata.usgs.gov) as well as the [National Groundwater Monitoring Network](https://cida.usgs.gov/ngwmn/), among others. Some data providers are aggregators of other organizations' data, such as the [Hydrologic Information System](https://data.cuahsi.org) of CUAHSI. - **Datasets** refer to specific collections of data that are published by data providers. In the context of Geoconnex, a single dataset generally refers to one that is collected from, or summarizable to, a specific spatial **location** on earth, as part of a specific activity. For example, a dataset would be the stage, discharge and water quality sensor data coming from a single stream gage, but not the collection of all stream gage readings from all stream gages operated by a given organization. A dataset could also be the time-series of a statistical summary of water use at the county level. - **Locations** are specific locations on earth that datasets are collected from or about, such as stream gages, groundwater wells, and dams. In the case of data that is reported at a summary unit such as a state, county, or hydrologic unit code (HUC), these can also be considered Locations. Conceptually, multiple datasets from multiple providers can be about the same Location, as might occur when a USGS streamgage and a state DEQ water quality sampling site are both located at a specific bridge. 
- **Hydrologic features** are elements of the water system that are related to locations. For example, a point may be on a river, which is within a watershed, and whose flow influences an aquifer. Each of these is a distinct, identifiable feature to which many Locations are hydrologically related, and which a user of a given dataset might also want to use.

- **Cataloging features** are areas on earth that commonly group datasets. They are a superset of summary features such as HUCs, counties and states. For example, a state-level dataset summarizing average annual surface water availability would not have states as a cataloging feature, because the states are themselves the Locations the data is about. However, a streamgage is within a state, county, HUC, congressional district, etc., and may be tagged with these features in metadata, and thus be filtered alongside other streamgages within the same state.

This Geoconnex guidance concerns how to explicitly publish metadata that describes Datasets and how they are related to each of the other elements of the information model.

## JSON-LD Primer {#sec-primer}

JSON-LD is a version of JSON, the popular data exchange format used by web APIs, to express linked data. Linked Data is an approach to data publication that allows data from various sources to be easily integrated. JSON-LD accomplishes this by mapping terms from a source data system to a machine-readable definition of that term available on the web, allowing different attribute names from different data sources to be consistently interpreted together. Commonly, JSON-LD is embedded within websites, allowing search engines and applications to parse the information available from web addresses (URLs). For an in-depth exploration and multimedia resources, refer to the [JSON-LD official site](https://json-ld.org) and its [learning section](https://json-ld.org/learn.html).

JSON-LD documents should be embedded in the HTML of websites using `<script>` elements. A brief overview of the JSON-LD format follows below.
Below is an example JSON-LD document as embedded in a `<script>` element within the `<head>` or `<body>` section of an HTML page, with an explanation of its major elements.

``` json
<script type="application/ld+json">
{
  "@context": {
    "@vocab": "https://schema.org/",
    "schema": "https://schema.org/",
    "ex": "https://example.com/schema/",
    "locType": "https://www.opengis.net/def/schema/hy_features/hyf/HY_HydroLocationType"
  },
  "@id": "https://example.com/well/1234",
  "@type": "schema:Place",
  "name": "Well 1234",
  "description": "Well at 1234 Place St., USA",
  "locType": "well",
  "subjectOf": {
    "@id": "https://datasystem.org/dataset1",
    "@type": "schema:Dataset",
    "name": "Well Locations Dataset",
    "ex:recordCount": 500
  }
}
</script>
```

**`<script type="application/ld+json">`, `</script>`**

These are fixed HTML tags that tell machines to interpret everything between them as JSON-LD.

**`@context`**

The `@context` keyword in JSON-LD sets the stage for interpreting the data by mapping terms to IRIs (Internationalized Resource Identifiers). By doing so, properties and values are clearly defined and identified. The example context has four entries:

- `@vocab`: Sets the default document vocabulary to `https://schema.org/`, which is a standard vocabulary for web-based structured data. This means that in general, attributes in the document will be assumed to have `https://schema.org/` as a prefix, so JSON-LD parsers will map `name` to <https://schema.org/name>

- `schema`: This declares `schema:` as an explicit prefix for `https://schema.org/`, so that prefixed values such as `schema:Place` are parsed as <https://schema.org/Place>.

- `ex`: This is a custom context prefix representing `https://example.com/schema/`, signifying specific extensions or custom data definitions specific to our website. The prefix can be used on other attributes so that JSON-LD parsers do the appropriate mapping. Thus, `ex:recordCount` will be parsed as `https://example.com/schema/recordCount`.
- `locType`: This is a custom direct attribute mapping, specifying that this attribute exactly matches the concept identified by the HTTP identifier <https://www.opengis.net/def/schema/hy_features/hyf/HY_HydroLocationType>. Using this direct mapping approach allows data publishers to map their arbitrary terminology to any publicly accessible and well-identified standard term.

**`@id`**

The `@id` keyword furnishes a uniform resource identifier (URI) for subjects in the JSON-LD document, enabling the subjects to be interconnected with data elsewhere. In this example:

- Well 1234 has the identifier `https://example.com/well/1234`.

- The dataset that is about it, "Well Locations Dataset", has the unique identifier `https://datasystem.org/dataset1`.

**`@type`**

The `@type` keyword stipulates the type or nature of the subject or node in the JSON-LD. It aids in discerning the entity being depicted. In the given context:

- Well 1234 is specified as a "Place" from the schema.org vocabulary (`schema:Place`).

- Well Locations Dataset's type is a "Dataset" from the schema.org vocabulary (`schema:Dataset`).

**Nodes**

Nodes represent entities in JSON-LD, with each entity having properties associated with it. In the example:

- The main node is Well 1234, possessing properties like "name", "description", "locType", and "subjectOf".

- The value of the `subjectOf` property is itself a node representing a dataset that is about Well 1234. Apart from the "name" property, the dataset also has a property called "ex:recordCount" (using the `ex:` prefix from `@context`) indicating the number of rows in the dataset.

This extension showcases the flexibility and strength of JSON-LD, where you can seamlessly integrate standard vocabulary with custom definitions, ensuring rich and well-structured interconnected data representations. Below, you can see how JSON-LD tools would parse and standardize the JSON-LD in the example.
```{=html}
<iframe width="780" height="500" src="https://tinyurl.com/29qaectm" title="JSON-LD playground"></iframe>
```

## Geoconnex JSON-LD elements {#sec-jsonldelem}

A Geoconnex JSON-LD document should be embedded in a human-readable website that is about either a **Location** or a **Dataset**. Documents about **Locations** should ideally include references to relevant **Hydrologic Features**, **Cataloging Features**, and **Datasets**. Documents about **Datasets** *must* include references to one or more relevant reference **Monitoring Locations**, **Hydrologic Features**, or **Cataloging Features**, or declare their spatial coverage.

### Context {#sec-context}

Geoconnex JSON-LD documents can have varying contexts. However, there are several vocabularies other than `schema.org` that may be useful, depending on the type of location and dataset being described and the level of specificity for which metadata is produced by the data provider. The example context below can serve as a general-purpose starting point, although simpler contexts may be sufficient for many documents:

``` json
"@context": {
  "@vocab": "https://schema.org/",
  "xsd": "http://www.w3.org/2001/XMLSchema#",
  "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
  "dc": "http://purl.org/dc/terms/",
  "dcat": "http://www.w3.org/ns/dcat#",
  "freq": "http://purl.org/cld/freq/",
  "qudt": "http://qudt.org/schema/qudt/",
  "qudt-units": "http://qudt.org/vocab/unit/",
  "qudt-quantkinds": "http://qudt.org/vocab/quantitykind/",
  "gsp": "http://www.opengis.net/ont/geosparql#",
  "locType": "http://vocabulary.odm2.org/sitetype/",
  "odm2var": "http://vocabulary.odm2.org/variablename/",
  "odm2varType": "http://vocabulary.odm2.org/variabletype/",
  "hyf": "https://www.opengis.net/def/schema/hy_features/hyf/",
  "skos": "http://www.w3.org/2004/02/skos/core#",
  "ssn": "http://www.w3.org/ns/ssn/",
  "ssn-system": "http://www.w3.org/ns/ssn/systems/"
}
```

- `@vocab` specifies [`schema`](https://schema.org/) as
the default vocabulary from https://schema.org - [`xsd`](https://www.w3.org/TR/xmlschema-2/) is a general web-enabled data types vocabulary (e.g., text vs number vs. datetime) - [`rdfs`](https://www.w3.org/TR/rdf12-schema/) is a general vocabulary for basic relationships - [`dc`](https://www.dublincore.org/specifications/dublin-core/dcmi-terms/#) is the [Dublin Core](https://www.dublincore.org/specifications/dublin-core/dcmi-terms/) vocabulary for general information metadata attributes - [`dcat`](https://www.w3.org/ns/dcat#) is the [Data Catalog (DCAT) Vocabulary](https://www.w3.org/TR/vocab-dcat-3), a vocabulary for dataset metadata attributes - [`freq`](http://purl.org/cld/freq/) is the [Dublin Core Collection Frequency Vocabulary](https://www.dublincore.org/specifications/dublin-core/collection-description/frequency/), a vocabulary for dataset temporal resolution and update frequency - [`qudt-units`](https://www.qudt.org/doc/DOC_VOCAB-UNITS.html) provides standard identifiers for units (e.g. [cubic feet per second](https://qudt.org/vocab/unit/FT3-PER-SEC)) - [`qudt-quantkinds`](https://www.qudt.org/doc/DOC_VOCAB-QUANTITY-KINDS.html) provides ids for general phenomena (e.g. [Volume flow rate](https://qudt.org/vocab/quantitykind/VolumeFlowRate)) which may be measured in various units - [`gsp`](http://defs.opengis.net/vocprez/object?uri=http://www.opengis.net/def/function/geosparql) provides ids for spatial relationships (e.g. intersects) - [`odm2var`](http://vocabulary.odm2.org/variablename) is a supplement to `qudt-quantkinds`, and includes ids for many variables relevant to water science and management (e.g. [turbidity](http://vocabulary.odm2.org/variablename/turbidity/)) - [`odm2varType`](http://vocabulary.odm2.org/variabletype/) is a supplement to `odm2var` that includes ids for large groupings of variables (e.g. 
[Water Quality](http://vocabulary.odm2.org/variabletype/WaterQuality/))

- [`hyf`](https://www.opengis.net/def/schema/hy_features/hyf/) provides ids for surface water hydrology concepts (e.g. [streams](https://defs.opengis.net/vocprez/object?uri=https%3A//www.opengis.net/def/schema/hy_features/hyf/HY_River))

- [`skos`](https://www.w3.org/TR/swbp-skos-core-spec/) provides general properties for relating different concepts (e.g. broader, [narrower](https://www.w3.org/2009/08/skos-reference/skos.html#narrower), exactMatch)

- [`ssn`](https://www.w3.org/TR/vocab-ssn/) and `ssn-system` provide ids for aspects of observations and measurement (e.g. measurement methods)

### Reference Features {#sec-ref}

Embedding links to the URIs of Reference Features is the best way to ensure that your data can be related to other data providers' data. URIs for reference features are available from [the Geoconnex reference feature server](https://reference.geoconnex.us/collections). Reference features can be one of three types:

- **Monitoring Locations**, which are common locations that many organizations might have data about, such as a streamgage station, e.g. <https://geoconnex.us/ref/gages/1143822>

- **Hydrologic Features**, which are common specific features of the hydrologic landscape that many organizations have data about. These could include confluence points, aquifers, stream segments, river mainstems, and named tributaries, e.g. <https://geoconnex.us/ref/mainstems/29559>.

- **Cataloging Features**, which are larger area units that are commonly used to group and filter data, such as [HUCs](https://geoconnex.us/ref/hu04/0308)[^1], [states](https://geoconnex.us/ref/states/48)[^2], [counties](https://geoconnex.us/ref/counties/37003)[^3], PLSS grids, public agency operating districts, etc.
[^1]: https://geoconnex.us/ref/hu04/0308

[^2]: https://geoconnex.us/ref/states/48

[^3]: https://geoconnex.us/ref/counties/37003

# Building Geoconnex Web Resources, Step-by-Step {#sec-step-by-step}

This section provides step-by-step guidance to build Geoconnex Web Resources, each of which should be an HTML webpage with a unique URL in which a JSON-LD document is embedded (see @sec-primer). See @sec-complete-examples for completed example documents to skip the step-by-step.

## Location or Dataset oriented?

Depending on what kind of resource (i.e. location or dataset) and the level of metadata you have available to publish, you can use different elements of the `@context` or use Reference Features in various ways. Below we will work through creating a JSON-LD document depending on your situation. There are two basic patterns to think about:

1. `Location-oriented` webpages that include a catalog of parameters and periods of record for which there is data about the location. This pattern may be suitable where data can be accessed separately for each location and possibly for each parameter for each location. This is typical of streamgages, monitoring wells, water diversions, reservoirs, regulated effluent discharge locations, etc. where there is an ongoing monitoring or modeling program that includes data collection or generation for multiple parameters. The Monitor My Watershed Site pages published by the [Stroud Center](https://stroudcenter.org) are an example of this pattern. At [this page](https://monitormywatershed.org/sites/RH_MD/), one finds a variety of information about a specific location, such as that location's identifier and name and a map of where it is. In addition, there is information about which continuous sensor and field water quality sample data are available about the location, and links to download these data.

2. `Dataset-oriented` webpages that tag which locations are relevant to the dataset described at a given page.
This pattern may be suitable for static datasets where data was collected or modeled for a consistent set of parameters for a pre-specified research question and time period across one or more locations, and where it would not make sense to publish separate metadata for the parts of the dataset that are relevant to each individual feature and parameter. This is typical of datasets created for, and published in association with, scientific and regulatory studies. [This dataset record](https://www.hydroshare.org/resource/11dd1840fe6a48abb9a33380ecaa6e1d/) published on [CUAHSI](https://cuahsi.org)'s [Hydroshare](https://hydroshare.org) platform is an example, where there is a "Related Geospatial Features" section that explicitly identifies several features that the dataset has data about.

In some cases, it is possible to set up a web architecture that implements both patterns. For example, the [Wyoming State Engineer's Office Web Portal](https://seoflow.wyo.gov) conceptualizes a time series for a specific parameter at a specific location as a dataset. Thus, webpages exist for both [Locations](https://seoflow.wyo.gov/Data/Location/Summary/Location/06280300/Interval/Latest) and [Datasets](https://seoflow.wyo.gov/Data/DataSet/Summary/Location/06280300/DataSet/Discharge/Discharge/Interval/Latest), and they link to each other where relevant. In this case, it is only necessary to implement Geoconnex embedded JSON-LD at either the Location or Dataset level, although both could be done as well.

Having chosen one of the patterns, proceed to [location-oriented](@sec-loc) or [dataset-oriented](@sec-data) guidance to start building a JSON-LD document.

### Location-oriented {#sec-loc}

The purpose of the location-oriented page is to give enough information about the location and the data available about that location that a water data user would be able to quickly determine whether and how to download the data after reading.
We will use the USGS Monitoring Location [08282300](https://geoconnex.us/usgs/monitoring-location/08282300) as an example for the type of content to put in location-oriented Geoconnex landing page web resources and how to map that content to embedded JSON-LD documents.

::: callout-note
Scroll up and down to view elements of the example landing page
:::

```{=html}
<iframe width="780" height="500" src="https://waterdata.usgs.gov/monitoring-location/08282300" title="USGS Example"></iframe>
```

This location-oriented web resource includes this type of information:

- "[This is my HTTP identifier](https://geoconnex.us/usgs/monitoring-location/08282300)"[^4]

- "I am the same thing as [Geoconnex Reference Gage 1018463](https://geoconnex.us/ref/gages/1018463)"[^5]

- "My unique USGS ID is `08282300`"

- "My name is `Rio Brazos at Fishtail Road NR Tierra Amarilla, NM`"

- "Data about me is provided by the `USGS Water Data for the Nation`"

- "I am a `hydrometric station`[^6]"

- "My lat/long is `36.738 -106.471`"

- "I am on the [Rio Brazos](https://geoconnex.us/ref/mainstems/1611418)"[^7]

- "There is data about me for the parameter `Discharge` from June 6, 2014 to the present at a 15 minute time resolution. This data is generated from `in-situ observation`, in particular using [USGS discharge measurement methods](https://pubs.usgs.gov/publication/tm3A8). You can download it [here](https://waterservices.usgs.gov/nwis/iv/?sites=08282300&parameterCd=00060&startDT=2023-08-13T03:08:21.313-06:00&endDT=2023-08-20T03:08:21.313-06:00&siteStatus=all&format=rdb) using the [USGS Instantaneous Values REST Web Service](https://waterservices.usgs.gov/rest/IV-Test-Tool.html) in the [RDB format](https://waterdata.usgs.gov/nwis/?tab_delimited_format_info).
You can also download it [here](https://labs.waterdata.usgs.gov/sta/v1.1/Datastreams('0adb31f7852e4e1c9a778a85076ac0cf')?$expand=Thing,Observations) using the [SensorThings API standard](https://docs.ogc.org/is/15-078r6/15-078r6.html) in `JSON` or `CSV` formats."[^8]

- "There is data about me for the parameter `Gage Height` from June 6, 2014 to the present at a 15 minute time resolution. This data is generated from `in-situ observation`, in particular using [USGS stage measurement methods](https://pubs.usgs.gov/publication/tm3A7). You can download it [here](https://waterservices.usgs.gov/nwis/iv/?sites=08282300&parameterCd=00065&startDT=2023-08-13T03:08:21.313-06:00&endDT=2023-08-20T03:08:21.313-06:00&siteStatus=all&format=rdb) from the [USGS Instantaneous Values REST Web Service](https://waterservices.usgs.gov/rest/IV-Test-Tool.html) in the [RDB format](https://waterdata.usgs.gov/nwis/?tab_delimited_format_info). You can also download it [here](https://labs.waterdata.usgs.gov/sta/v1.1/Datastreams('ba774169b3e542cdb9c02e8d705b4d0f')?$expand=Thing,Observations) using the [SensorThings API standard](https://docs.ogc.org/is/15-078r6/15-078r6.html) in `JSON` or `CSV` formats."

[^4]: This is ideally a persistent geoconnex URI. See [here](https://github.com/internetofwater/geoconnex.us/blob/master/CONTRIBUTING.md) for how to create these. It is optional if the "same thing" geoconnex URI in the next bullet is provided, in which case this could just be the URL of the web resource for the location, or omitted.

[^5]: Where possible, it will be useful to tag your organization's locations with pre-existing identifiers for reference locations, since many organizations collect data at the same location.
[^6]: This ideally would come from a [codelist](https://docs.ogc.org/is/14-111r6/14-111r6.html#annexB_1) so that data providers use consistent terminology

[^7]: Note that ideally this would be a geoconnex URI for a river mainstem, in this case <https://geoconnex.us/ref/mainstems/1611418>

[^8]: This is towards the 'more detailed' end of the spectrum. If data is not available via API, it is still good to include links to data file downloads or web apps that provide access to the data

#### JSON-LD

Here we will build the equivalent JSON-LD content step-by-step. The steps are:

1. [Identifiers and provenance](#sec-ident)
2. [Spatial geometry and hydrologic references](#sec-spatial)
3. [Datasets](#sec-loc-data)

These culminate in the [complete example]().

##### Identifiers and provenance {#sec-ident}

A first group of information helps identify the location and its provenance.

- "[This is my HTTP identifier](https://geoconnex.us/usgs/monitoring-location/08282300)"[^9]

- "I am a `hydrometric station`[^10]"

- "I am the same thing as [Geoconnex Reference Gage 1018463](https://geoconnex.us/ref/gages/1018463)"[^11]

- "My unique USGS ID is `08282300`"

- "My name is `Rio Brazos at Fishtail Road NR Tierra Amarilla, NM`"

- "Data about me is provided by the `USGS Water Data for the Nation`"

[^9]: This is ideally a persistent geoconnex URI. See [here](https://github.com/internetofwater/geoconnex.us/blob/master/CONTRIBUTING.md) for how to create these. It is optional if the "same thing" geoconnex URI in the next bullet is provided, in which case this could just be the URL of the web resource for the location, or omitted.

[^10]: This ideally would come from a [codelist](https://docs.ogc.org/is/14-111r6/14-111r6.html#annexB_1) so that data providers use consistent terminology

[^11]: Where possible, it will be useful to tag your organization's locations with pre-existing identifiers for reference locations, since many organizations collect data at the same location.
``` json
{
  "@context": {
    "@vocab": "https://schema.org/",
    "hyf": "https://www.opengis.net/def/schema/hy_features/hyf/",
    "locType": "http://vocabulary.odm2.org/sitetype/"
  },
  "@id": "https://geoconnex.us/usgs/monitoring-location/08282300",
  "@type": [
    "hyf:HY_HydrometricFeature",
    "hyf:HY_HydroLocation",
    "locType:stream"
  ],
  "hyf:HydroLocationType": "hydrometric station",
  "sameAs": {"@id": "https://geoconnex.us/ref/gages/1018463"},
  "identifier": {
    "@type": "PropertyValue",
    "propertyID": "USGS site number",
    "value": "08282300"
  },
  "name": "Rio Brazos at Fishtail Road NR Tierra Amarilla, NM",
  "description": "Stream/River Site",
  "provider": {
    "url": "https://waterdata.usgs.gov",
    "@type": "GovernmentOrganization",
    "name": "U.S. Geological Survey Water Data for the Nation"
  }
}
```

Here we construct the JSON-LD document by adding a context which includes the <https://schema.org/> vocabulary, as well as the <https://www.opengis.net/def/schema/hy_features/hyf/> vocabulary which defines specific concepts in surface hydrology, and the ODM2 [sitetype vocabulary](http://vocabulary.odm2.org/sitetype/) which defines types of water data collection locations.

- The `@id` element of <https://geoconnex.us/usgs/monitoring-location/08282300> in this case is a persistent geoconnex URI. See [here](https://github.com/internetofwater/geoconnex.us/blob/master/CONTRIBUTING.md) for how to create these. It is optional if the "same thing" geoconnex URI in the `sameAs` element is provided, in which case this could just be the URL of the web resource for the location, or omitted.

- The `@type` element here specifies that <https://geoconnex.us/usgs/monitoring-location/08282300> is a [Place](https://schema.org/Place) (i.e. a generic place on earth), a [Hydrometric Feature](https://www.opengis.net/def/schema/hy_features/hyf/HY_HydrometricFeature) (i.e. a data collection station) and a [HydroLocation](https://www.opengis.net/def/schema/hy_features/hyf/HY_HydroLocation) (i.e.
a specific location that could in principle define a catchment). The `locType` further specifies the type of location using the ODM2 sitetype vocabulary <http://vocabulary.odm2.org/sitetype/>, which expresses the location type in terms of the feature of interest (e.g. a stream, a groundwater system). If the location is instead meant to represent a general location about which non-hydrologic data is being provided, as might be the case with a data provider publishing data about dams, levees, culverts, bridges, etc. but not associated water data, then `locType` and `hyf:HY_HydrometricFeature` can be omitted.

- The `hyf:HydroLocationType` can be used to identify the type of site with greater specificity and customization. It takes text values rather than identifiers; these can come from any codelist, but preferably the [HY_Features HydroLocationType codelist](https://docs.ogc.org/is/14-111r6/14-111r6.html#annexB_1). It can be useful to describe something like a dam, weir, culvert, bridge, etc.

- The `sameAs` element is optional if the `@id` element is included as a persistent geoconnex URI. However, wherever possible, it should be populated with a Geoconnex Reference Feature URI. If all data providers tag their own location metadata with these, it becomes much easier for users of the Geoconnex system to find data collected by other providers about the same location. Reference features of all sorts are available to browse in a web map at <https://geoconnex.us/iow/map>, to access via API at <https://reference.geoconnex.us/collections>, or to download in bulk as GeoPackage files from [HydroShare](https://www.hydroshare.org/resource/3cc04df349cd45f38e1637305c98529c/). If your location does not appear to be represented by a reference location, please consider contributing your location.
You can start this process by [submitting an issue at the geoconnex.us GitHub repository](https://github.com/internetofwater/geoconnex.us/issues/new?assignees=&labels=&projects=&template=general.md&title=%5Bgeneral%5D). In this case `sameAs` is a persistent geoconnex URI for a "Reference Gage". Reference Gages is an open source, continuously updated set of all known surface water monitoring locations with data being collected by all known organizations. It is managed on GitHub at <https://github.com/internetofwater/ref_gages> - The `identifier` element specifies the ID scheme name (`propertyID`) for the location in the data source and the ID itself (`value`) - The `name` (required) and `description` (optional) elements are self-explanatory and can follow the conventions of the data provider. - The `provider` element describes the data provider, which is generally conceptualized in Geoconnex as being a data system available on the web. Note that under `provider`, in addition to an identifying `name`, there is a `url` if available for the website of the providing data system, and a `@type`, which is most likely a sub type of <https://schema.org/Organization>, which includes [GovernmentOrganization](https://schema.org/GovernmentOrganization), [NGO](https://schema.org/NGO), [ResearchOrganization](https://schema.org/ResearchOrganization), [EducationalOrganization](https://schema.org/EducationalOrganization), and [Corporation](https://schema.org/Corporation), among others. 
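
Providers who generate landing pages from a database may prefer to assemble these elements programmatically. The following is a minimal Python sketch of one way to do so, not a prescribed Geoconnex implementation; the function name `build_location_jsonld` and its parameters are hypothetical, and the types and `propertyID` are hard-coded to mirror the USGS streamgage example above.

```python
import json


def build_location_jsonld(uri, site_id, name, ref_gage_uri, provider_name, provider_url):
    """Assemble the identifier and provenance elements for one location.

    Hypothetical helper; the fixed values mirror the streamgage example.
    """
    return {
        "@context": {
            "@vocab": "https://schema.org/",
            "hyf": "https://www.opengis.net/def/schema/hy_features/hyf/",
            "locType": "http://vocabulary.odm2.org/sitetype/",
        },
        "@id": uri,
        "@type": ["hyf:HY_HydrometricFeature", "hyf:HY_HydroLocation", "locType:stream"],
        "hyf:HydroLocationType": "hydrometric station",
        # Link to the shared reference feature so other providers' data can be matched
        "sameAs": {"@id": ref_gage_uri},
        "identifier": {
            "@type": "PropertyValue",
            "propertyID": "USGS site number",
            "value": site_id,
        },
        "name": name,
        "provider": {
            "@type": "GovernmentOrganization",
            "name": provider_name,
            "url": provider_url,
        },
    }


doc = build_location_jsonld(
    uri="https://geoconnex.us/usgs/monitoring-location/08282300",
    site_id="08282300",
    name="Rio Brazos at Fishtail Road NR Tierra Amarilla, NM",
    ref_gage_uri="https://geoconnex.us/ref/gages/1018463",
    provider_name="U.S. Geological Survey Water Data for the Nation",
    provider_url="https://waterdata.usgs.gov",
)
print(json.dumps(doc, indent=2))
```

Note that the site identifier is passed as a string: treating `08282300` as a number would silently drop the leading zero.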
##### Spatial geometry and hydrologic references {#sec-spatial}

The second group of information provides specific location and spatial context:

- "My lat/long is `36.738 -106.471`"
- "I am on the [Rio Brazos](https://geoconnex.us/ref/mainstems/1611418)"[^12]

[^12]: Note that ideally this would be a geoconnex URI for a river mainstem, in this case <https://geoconnex.us/ref/mainstems/1611418>

Adding this information to the bottom of the JSON-LD document gives:

``` json
{
  "@context": {
    "@vocab": "https://schema.org/",
    "hyf": "https://www.opengis.net/def/schema/hy_features/hyf/",
    "gsp": "http://www.opengis.net/ont/geosparql#",
    "locType": "http://vocabulary.odm2.org/sitetype/"
  },
  "@id": "https://geoconnex.us/usgs/monitoring-location/08282300",
  "@type": [
    "hyf:HY_HydrometricFeature",
    "hyf:HY_HydroLocation",
    "locType:stream"
  ],
  "hyf:HydroLocationType": "hydrometric station",
  "sameAs": {"@id": "https://geoconnex.us/ref/gages/1018463"},
  "identifier": {
    "@type": "PropertyValue",
    "propertyID": "USGS site number",
    "value": "08282300"
  },
  "name": "Rio Brazos at Fishtail Road NR Tierra Amarilla, NM",
  "description": "Stream/River Site",
  "provider": {
    "url": "https://waterdata.usgs.gov",
    "@type": "GovernmentOrganization",
    "name": "U.S. Geological Survey Water Data for the Nation"
  },
  "geo": {
    "@type": "GeoCoordinates",
    "longitude": -106.4707722,
    "latitude": 36.7379333
  },
  "gsp:hasGeometry": {
    "@type": "http://www.opengis.net/ont/sf#Point",
    "gsp:asWKT": {
      "@type": "http://www.opengis.net/ont/geosparql#wktLiteral",
      "@value": "POINT (-106.4707722 36.7379333)"
    },
    "gsp:crs": {
      "@id": "http://www.opengis.net/def/crs/OGC/1.3/CRS84"
    }
  },
  "hyf:referencedPosition": {
    "hyf:HY_IndirectPosition": {
      "hyf:linearElement": {
        "@id": "https://geoconnex.us/ref/mainstems/1611418"
      }
    }
  }
}
```

We have added a context element `gsp` and three blocks: `geo`, `gsp:hasGeometry`, and `hyf:referencedPosition`.
- `gsp` is the [GeoSPARQL](https://www.ogc.org/standard/geosparql/) ontology used to standardize the representation of spatial data and relationships in knowledge graphs like the Geoconnex system.
- `geo` is the `schema.org` [standard for representing spatial data](https://schema.org/geo). It is what is used by search engines like Google and Bing to place webpages on a map. While useful, it does not have a standard way of representing multipoint, multipolyline, or multipolygon features, or a way to specify coordinate reference systems or projections, and so we also need to provide a GeoSPARQL version of the geometry. In this case, we are simply providing a point with a longitude and latitude using the [GeoCoordinates](https://schema.org/GeoCoordinates) type. It is also possible to represent [lines](https://schema.org/line) and [polygons](https://schema.org/polygon).
- `gsp:hasGeometry` is the GeoSPARQL version of geometry, with which we can embed [WKT](https://en.wikipedia.org/wiki/Well-known_text_representation_of_geometry) representations of geometry in structured metadata in the `@value` element, and declare the coordinate reference system or projection in the `gsp:crs` element using a URI from the OGC register of reference systems. In this case we use <http://www.opengis.net/def/crs/OGC/1.3/CRS84>, the familiar WGS 84 system with coordinates in longitude-latitude order, which matches the order of the values in the WKT string. Codes from the [EPSG register](http://www.opengis.net/def/crs/EPSG/0/), such as <http://www.opengis.net/def/crs/EPSG/0/4326>, can be used in the same way.
- `hyf:referencedPosition` uses the [HY_Features](https://www.opengis.net/def/schema/hy_features/hyf/) model to declare that this location is located on a specific river, in this case the [Rio Brazos in New Mexico](https://geoconnex.us/ref/mainstems/1611418) as identified in the Reference Mainstems dataset, which is available via API at <https://reference.geoconnex.us/collections/mainstems> and managed on GitHub at <https://github.com/internetofwater/ref_rivers>. All surface water locations should include this type of element.
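Because the same coordinates appear twice, once in `geo` and once as WKT in `gsp:hasGeometry`, it can help to generate both blocks from a single longitude/latitude pair so they cannot drift apart. The Python sketch below shows one way this might be done; `point_geometry_blocks` is a hypothetical helper name, and the sketch assumes WGS 84 coordinates in longitude-latitude order, matching the CRS84 URI used in the example above.

```python
import json

# CRS URI taken from the example document above (longitude-latitude order).
CRS84 = "http://www.opengis.net/def/crs/OGC/1.3/CRS84"


def point_geometry_blocks(longitude: float, latitude: float) -> dict:
    """Return matching `geo` and `gsp:hasGeometry` blocks for one point.

    Hypothetical helper for illustration only.
    """
    if not -180.0 <= longitude <= 180.0:
        raise ValueError(f"longitude out of range: {longitude}")
    if not -90.0 <= latitude <= 90.0:
        raise ValueError(f"latitude out of range: {latitude}")
    return {
        "geo": {
            # Resolved against the schema.org "@vocab" of the document.
            "@type": "GeoCoordinates",
            "longitude": longitude,
            "latitude": latitude,
        },
        "gsp:hasGeometry": {
            "@type": "http://www.opengis.net/ont/sf#Point",
            "gsp:asWKT": {
                "@type": "http://www.opengis.net/ont/geosparql#wktLiteral",
                # WKT lists x (longitude) before y (latitude).
                "@value": f"POINT ({longitude} {latitude})",
            },
            "gsp:crs": {"@id": CRS84},
        },
    }


blocks = point_geometry_blocks(-106.4707722, 36.7379333)
print(json.dumps(blocks, indent=2))
```

The returned dictionary can be merged into the location document with `doc.update(blocks)`. The range checks are a cheap guard against the most common mistake with point data, which is swapping latitude and longitude.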
::: callout-note
###### What about groundwater?

Groundwater monitoring locations may use the `hyf:referencedPosition` element if data providers wish their wells to be associated with specific streams. However, groundwater sample and monitoring locations such as wells can also be referenced to hydrogeologic unit or aquifer identifiers, where available, using this pattern instead of the `hyf:referencedPosition` pattern:

``` json
"http://www.w3.org/ns/sosa/isSampleOf": {
  "@id": "https://geoconnex.us/ref/sec_hydrg_reg/S26"
}
```

USGS Principal Aquifers and Secondary Hydrogeologic Unit URIs are available from <https://reference.geoconnex.us/collections>.

If reference URIs are not available for the groundwater unit you'd like to reference, but an ID does exist in a dataset that is available online, you may use this pattern:

``` json
"http://www.w3.org/ns/sosa/isSampleOf": {
  "@type": "GW_HydrogeoUnit",
  "name": "name of the aquifer",
  "identifier": {
    "@type": "PropertyValue",
    "propertyID": "Source aquifer dataset id field name",
    "value": "aq-id-1234"
  },
  "subjectOf": {
    "@type": "Dataset",
    "url": "url where dataset that describes or includes the aquifer can be accessed"
  }
}
```
:::

##### Datasets {#sec-loc-data}

Now that we have described our location's provenance, geospatial geometry, and association with any reference features, we now describe the data that can be accessed about that location.
The simplest, most minimal way to do this is to add a block like this, which would be added to the bottom of the JSON-LD document we have created so far:

``` json
"subjectOf": {
  "@type": "Dataset",
  "name": "Discharge data from USGS-08282300",
  "description": "Discharge data from USGS-08282300 at Rio Brazos at Fishtail Road NR Tierra Amarilla, NM",
  "url": "https://waterdata.usgs.gov/monitoring-location/08282300/#parameterCode=00060&period=P7D"
}
```

Here, we simply declare that the location we have been working with is `subjectOf` a `Dataset` with a name, description, and URL where information about the dataset can be found. However, more detailed metadata lets data users (and search engines) filter for your data using standardized variable names and by temporal coverage and resolution, decide whether they want to use the data based on the methods used (such as whether it is observed or modeled/forecasted data), and possibly preview actual data values. In general, following [Science-on-Schema.org Guidelines](https://github.com/ESIPFed/science-on-schema.org/blob/master/guides/Dataset.md) is recommended. We implement this guidance, with some extension, for the USGS Monitoring Location example.
Hover over the code annotation bubbles on the right for translation and explanation:

``` json
{
  "subjectOf": { // <1>
    "@type": "Dataset", // <2>
    "name": "Discharge data from USGS Monitoring Location 08282300", // <3>
    "description": "Discharge data from USGS Streamgage at Rio Brazos at Fishtail Road NR Tierra Amarilla, NM", // <3>
    "license": "https://spdx.org/licenses/CC-BY-4.0", // <4>
    "isAccessibleForFree": "true", // <5>
    "variableMeasured": { // <6>
      "@type": "PropertyValue", // <7>
      "name": "discharge", // <7>
      "description": "Discharge in cubic feet per second", // <7>
      "propertyID": "https://www.wikidata.org/wiki/Q8737769", // <8>
      "url": "https://en.wikipedia.org/wiki/Discharge_(hydrology)", // <9>
      "unitText": "cubic feet per second", // <10>
      "qudt:hasQuantityKind": "qudt-quantkinds:VolumeFlowRate", // <11>
      "unitCode": "qudt-units:FT3-PER-SEC", // <12>
      "measurementTechnique": "observation", // <13>
      "measurementMethod": { // <14>
        "name": "Discharge Measurements at Gaging Stations", // <14>
        "publisher": "U.S. Geological Survey", // <14>
        "url": "https://doi.org/10.3133/tm3A8" // <14>
      } // <14>
    }, // <14>
    "temporalCoverage": "2014-06-30/..", // <15>
    "dc:accrualPeriodicity": "freq:daily", // <16>
    "dcat:temporalResolution": {"@value": "PT15M", "@type": "xsd:duration"}, // <16>
    "distribution": [ // <17>
      {
        "@type": "DataDownload", // <17>
        "name": "USGS Instantaneous Values Service", // <17>
        "contentUrl": "https://waterservices.usgs.gov/nwis/iv/?sites=08282300&parameterCd=00060&format=rdb", // <17>
        "encodingFormat": ["text/tab-separated-values"], // <17>
        "dc:conformsTo": "https://pubs.usgs.gov/of/2003/ofr03123/6.4rdb_format.pdf" // <17>
      },
      {
        "@type": "DataDownload", // <18>
        "name": "USGS SensorThings API", // <18>
        "contentUrl": "https://labs.waterdata.usgs.gov/sta/v1.1/Datastreams('0adb31f7852e4e1c9a778a85076ac0cf')?$expand=Thing,Observations", // <18>
        "encodingFormat": ["application/json"], // <18>
        "dc:conformsTo": "https://labs.waterdata.usgs.gov/docs/sensorthings/index.html" // <18>
      }
    ]
  }
}
```

1. This node (we are continuing from the above JSON-LD document, so this is the USGS Monitoring Location) is `subjectOf` the node that follows.
2. This node is a `Dataset` (<https://schema.org/Dataset>).
3. The dataset's name and description.
4. The dataset's license, which is most easily populated by a URI for the license appropriate for your data. Federal agencies, many state agencies, and academics use open licenses such as those provided by [opendatacommons.org](https://opendatacommons.org/licenses/) and [creativecommons.org](https://creativecommons.org/licenses). URIs for licenses are available from <https://spdx.org/licenses/>.
5. Either `true` or `false`, depending on whether the dataset is available for free.
6. The dataset includes information on a variable (in schema.org called [variableMeasured](https://schema.org/variableMeasured)).
Multiple `variableMeasured` can be specified for a dataset by using [arrays](https://www.w3.org/TR/json-ld11/#example-135-indexing-language-tagged-strings-and-set), which is useful for datasets that must be downloaded in bulk and that include multiple variables of interest. In general it is clearer to specify one "dataset" per `variableMeasured` if the data has different temporal coverage per variable, or can be downloaded on a per-variable basis.
7. `PropertyValue` is a generic type used to extend schema.org properties, and as a rule it should be used on `variableMeasured` nodes.
8. `propertyID` should be a URI that resolves to a machine-readable resource defining what the variable is. In this case, we are using a Wikidata link to the concept of stream discharge. In general, a good source for URIs is the [ODM2 variable vocabulary](http://vocabulary.odm2.org/variablename/).
9. Here `url` points to a human-readable resource describing the variable. In this case, we are using a Wikipedia link to the concept of stream discharge.
10. Here we use the units as written in the data source.
11. While `name` and `propertyID` specify the variable as being "discharge" in this case, multiple data sources might use different words and identifiers for their variables, so it can be useful to reference a more general category that we can use to group variables across sources. We can use identifiers for [QuantityKinds](https://qudt.org/schema/qudt/QuantityKind) from QUDT, which we reference with the `qudt-quantkinds` prefix as described in the `@context` in @sec-context.
12. While `unitText` above specifies the units, multiple data sources might use different words for the same unit. To improve interoperability we can use identifiers for units provided by QUDT, which we reference with the `qudt-units` vocabulary prefix as described in the `@context` in @sec-context.
If units from QUDT are unavailable, first check whether `unitText` can be filled with a term name from <http://vocabulary.odm2.org/units/>.
13. `measurementTechnique` is meant to be a highly general account of the data-generating procedure, primarily to distinguish between observed and modeled data. It is highly recommended that this be `model` or `observation` or, if more specificity is required, a value from the ODM2 [methodType](http://vocabulary.odm2.org/methodtype/) vocabulary.
14. `measurementMethod` specifies the method used to generate the data with as much specificity as possible. Ideally it would be a persistent identifier that resolves to a machine-readable web resource that unambiguously describes the method, which would look something like this: `"measurementMethod": {"@id": "https://www.nemi.gov/methods/method_summary/4680/"}`. In lieu of that, a name, description, and URL to a human-readable web resource such as an explanatory webpage, technical report, standards document, or academic article would be appropriate, as in this example for USGS discharge measurement.
15. `temporalCoverage` refers to the first and last time for which data is available. It can be specified using [ISO 8601 interval format](https://en.wikipedia.org/wiki/ISO_8601#Time_intervals) (`YYYY-MM-DD/YYYY-MM-DD`, with the start date first and the end date after the `/`). It can also include times, like so: `YYYY-MM-DDTHH:MM:SS/YYYY-MM-DDTHH:MM:SS`. If the dataset has no end date, because there is an active monitoring program, this can be indicated like so: `YYYY-MM-DD/..`.
16. `dc:accrualPeriodicity` refers to the update schedule of the published dataset. The value can be taken from the [Dublin Core frequency vocabulary](https://www.dublincore.org/specifications/dublin-core/collection-description/frequency/) (here in `@context` as `freq:`).
`dcat:temporalResolution` refers to the minimum intended time spacing between observations in the case of regular time series data. The value should be an [xsd duration encoded string](http://www.datypic.com/sc/xsd/t-xsd_duration.html), e.g. "PT15M" for 15-minute, "PT1H" for hourly, "P1D" for daily, "P7D" for weekly, "P1M" for monthly, "P1Y" for annual.
17. `distribution` provides a way to structure information about data access points. This can range in complexity from a URL and format to a specification of how to interact with an API. In this example, each distribution gives a URL (`contentUrl`) and a format (`encodingFormat`, populated by [MIME type](https://www.iana.org/assignments/media-types/media-types.xhtml)). `dc:conformsTo` is optional and should point to a document that helps interpret the data structure. This could be a link to a data dictionary in the case of simple tabular data, documentation of a data model for a complex database, or an API specification document for an API endpoint.
18. Multiple distributions can be specified using a JSON array.

This translates roughly to:

- There is the following information about me: a `Dataset`
- for the variable (`variableMeasured`) `Discharge`
- It has values from `June 30, 2014` to the `present`
- at a `15` `minute` time resolution
- updated/published daily
- in units of `cubic feet per second`
- generated by `observation`
- generated in particular using [USGS discharge measurement methods](https://pubs.usgs.gov/publication/tm3A8).
- You can download it: - [here](https://waterservices.usgs.gov/nwis/iv/?sites=08282300&paramete) - Using the [USGS Instantaneous Values REST Web Service](https://waterservices.usgs.gov/rest/IV-Service.html) - in the [RDB format](https://waterdata.usgs.gov/nwis/?tab_delimited_format_info) - You can also download it [here](https://labs.waterdata.usgs.gov/sta/v1.1/Datastreams('0adb31f7852e4e1c9a778a85076ac0cf')?$expand=Thing,Observations) - Using the [USGS SensorThings API implementation](https://labs.waterdata.usgs.gov/docs/sensorthings/index.html) - in JSON ### Dataset-oriented {#sec-data} The purpose of the dataset-oriented page is to give enough information about the data available and the area, locations, or features that it is relevant to that a water data user would be able to quickly determine whether and how to download the data after reading. We will use this [data resource about water utility treated water demand that has been published at HydroShare](https://www.hydroshare.org/resource/4cf2a4298eca418f980201c1c5505299/) as an example for the type of content to put in dataset-oriented Geoconnex landing page web resources and how to map that content to embedded JSON-LD documents. 
::: callout-note
Scroll up and down to view elements of the example landing page
:::

```{=html}
<iframe width="780" height="500" src="https://www.hydroshare.org/resource/4cf2a4298eca418f980201c1c5505299/" title="Hydroshare Example"></iframe>
```

This dataset-oriented web resource includes this type of information:

- "This is my URI (which is a DOI-URL): <https://doi.org/10.4211/hs.4cf2a4298eca418f980201c1c5505299>"[^13]
- "This is my permanent identifier, which is a DOI": [^14]
- "This is my URL <https://www.hydroshare.org/resource/4cf2a4298eca418f980201c1c5505299>"[^15]
- "My creator is `<name>`"
- "I am provided by HydroShare"
- "My spatial coverage is the bounding box `"35.5463 -79.1235 36.0520 -78.3765"`"[^16]
- "I have data between January 1, 2002 and December 31, 2020"[^17]
- "My data is at a `1` `month` time step frequency"[^18]
- "I am about the following features":[^19]
    - [Raleigh Public Water System](https://geoconnex.us/ref/pws/NC0392010)
    - [Cary Public Water System](https://geoconnex.us/ref/pws/NC0392020)
    - [Durham Public Water System](https://geoconnex.us/ref/pws/NC0332010)
    - [Apex Public Water System](https://geoconnex.us/ref/pws/NC0392045)
    - [Orange Water and Sewer Authority](https://geoconnex.us/ref/pws/NC0368010)
- "I have the following variables":
    - Monthly water demand, measured in units of averaged millions of gallons per day
    - Historic mean monthly water demand over the period of record, measured in units of millions of gallons per day
    - The monthly water demand divided by historic mean monthly water demand, as a percent
- "You can download me [here](https://www.hydroshare.org/hsapi/resource/4cf2a4298eca418f980201c1c5505299/) on HydroShare as a zipped csv file"
- "I am accessible for free subject to this [license](http://creativecommons.org/licenses/by/4.0/)."
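The temporal coverage statement in the list above corresponds to a `temporalCoverage` string such as `2002-01-01/2020-12-31`. As a minimal sketch of how a validation script or data consumer might read such a value, the Python function below splits the interval and treats `..` as an open end. `parse_temporal_coverage` is a hypothetical helper name, and it handles only the date and date-time patterns described in this guide, not the full ISO 8601 interval syntax.

```python
from datetime import date, datetime
from typing import Optional, Tuple, Union

Instant = Union[date, datetime]


def _parse_instant(text: str) -> Instant:
    """Parse YYYY-MM-DD or YYYY-MM-DDTHH:MM:SS."""
    if "T" in text:
        return datetime.fromisoformat(text)
    return date.fromisoformat(text)


def parse_temporal_coverage(value: str) -> Tuple[Instant, Optional[Instant]]:
    """Split a temporalCoverage interval into (start, end).

    The end is None when the interval is open-ended ("start/..").
    Hypothetical helper for illustration only.
    """
    start_text, separator, end_text = value.partition("/")
    if not separator:
        raise ValueError(f"expected 'start/end', got {value!r}")
    start = _parse_instant(start_text)
    end = None if end_text == ".." else _parse_instant(end_text)
    # Only compare when both ends use the same pattern (date or date-time).
    if end is not None and type(end) is type(start) and end < start:
        raise ValueError(f"end precedes start in {value!r}")
    return start, end


if __name__ == "__main__":
    print(parse_temporal_coverage("2002-01-01/2020-12-31"))
    print(parse_temporal_coverage("2014-06-30/.."))
```

A check like this is cheap to run before publishing and catches swapped or malformed dates, which would otherwise silently break filtering by time period.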
[^13]: Applies if a permanent identifier like a DOI is available.

[^14]: For identifiers that are not HTTP URLs.

[^15]: The actual URL where the resource can be accessed.

[^16]: Spatial coverage refers to the maximum areal extent that the data is about. For Geoconnex purposes, this is not necessary if `about` elements with links to Geoconnex Reference Features are used.

[^17]: Refers to the first and last time for which data is available. It can be specified using [ISO 8601 interval format](https://en.wikipedia.org/wiki/ISO_8601#Time_intervals) (`YYYY-MM-DD/YYYY-MM-DD`, with the start date first and the end date after the `/`). It can also include times, like so: `YYYY-MM-DDTHH:MM:SS/YYYY-MM-DDTHH:MM:SS`. If the dataset has no end date, because there is an active monitoring program, this can be indicated like so: `YYYY-MM-DD/..`

[^18]: Refers to the minimum intended time spacing between observations in the case of regular time series data.

[^19]: These should be geoconnex reference feature URIs. If the locations the dataset is about are not within <https://reference.geoconnex.us/collections>, then consider [creating location-based resources and minting geoconnex identifiers](https://github.com/internetofwater/geoconnex.us/blob/master/CONTRIBUTING.md). If the dataset is extensive over a vector feature spatial fabric, like all Census Tracts or HUC12s or NHD Catchments, then this can be a reference to a single reference fabric dataset rather than an array of identifiers for every single feature. If the dataset is extensive over an area but has no particular tie to a particular reference feature set, like a raster dataset, then this can be omitted.

#### JSON-LD

Much is similar to the [Datasets guidance for location-oriented web resources](#sec-loc-data), so here we focus on the differences. Note that HydroShare automatically embeds JSON-LD.
The JSON-LD examples below vary somewhat from HydroShare's default content to illustrate optional elements that would be useful for Geoconnex but are not currently implemented in HydroShare.

##### Identifiers, provenance, license, and distribution

For basic identifying and descriptive information, [science-on-schema.org has appropriate guidance](https://github.com/ESIPFed/science-on-schema.org/blob/master/examples/dataset/minimal.jsonld). In this case, note that a specific file download URL has been provided rather than an API endpoint, and that `dc:conformsTo` points to a data dictionary that is supplied at the same web resource.

``` json
{
  "@context": {
    "@vocab": "https://schema.org/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "dc": "http://purl.org/dc/terms/",
    "qudt": "http://qudt.org/schema/qudt/",
    "qudt-units": "http://qudt.org/vocab/unit/",
    "qudt-quantkinds": "http://qudt.org/vocab/quantitykind/",
    "gsp": "http://www.opengis.net/ont/geosparql#",
    "locType": "http://vocabulary.odm2.org/sitetype/",
    "odm2var": "http://vocabulary.odm2.org/variablename/",
    "odm2varType": "http://vocabulary.odm2.org/variabletype/",
    "hyf": "https://www.opengis.net/def/schema/hy_features/hyf/",
    "skos": "https://www.opengis.net/def/schema/hy_features/hyf/HY_HydroLocationType",
    "ssn": "http://www.w3.org/ns/ssn/",
    "ssn-system": "http://www.w3.org/ns/ssn/systems/"
  },
  "@type": "Dataset",
  "@id": "https://doi.org/10.4211/hs.4cf2a4298eca418f980201c1c5505299",
  "provider": {
    "url": "https://hydroshare.org",
    "@type": "ResearchOrganization",
    "name": "HydroShare"
  },
  "creator": {
    "@type": "Person",
    "affiliation": {
      "@type": "Organization",
      "name": "Internet of Water;Center for Geospatial Solutions;Lincoln Institute of Land Policy"
    },
    "email": "konda@lincolninst.edu",
    "name": "Kyle Onda",
    "url": "https://www.hydroshare.org/user/4850/"
  },
  "identifier": "doi:hs.4cf2a4298eca418f980201c1c5505299",
  "name": "Geoconnex Dataset Example: Data for Triangle Water Supply Dashboard",
  "description": "This is a dataset meant as an example of dataset-level schema.org markup for https://geoconnex.us. It uses as an example data from the NC Triangle Water Supply Dashboard, which collects and visualizes finished water deliveries by water utilities to their service areas in the North Carolina Research Triangle area",
  "url": "https://www.hydroshare.org/resource/4cf2a4298eca418f980201c1c5505299",
  "keywords": ["water demand", "water supply", "geoconnex"],
  "license": "https://creativecommons.org/licenses/by/4.0/",
  "isAccessibleForFree": "true",
  "distribution": [
    {
      "@type": "DataDownload",
      "name": "HydroShare file URL",
      "contentUrl": "https://www.hydroshare.org/resource/4cf2a4298eca418f980201c1c5505299/data/contents/demand_over_time.csv",
      "encodingFormat": ["text/csv"],
      "dc:conformsTo": "https://www.hydroshare.org/resource/4cf2a4298eca418f980201c1c5505299/data/contents/dataDictionary.xlsx"
    },
    ...
```

##### Variables and Methods

Again, this follows the [Datasets guidance for location-oriented web resources](#sec-loc-data). In the example below, multiple `variableMeasured` are specified using an array. Other differences to point out:

- The unit of "million gallons per day" is not available from the QUDT units vocabulary. It is in the [ODM2 units codelist](http://vocabulary.odm2.org/units/), so we populate `unitCode` with the URL listed there.
- The `measurementMethod` values for both variables, which are simply different aggregation statistics for the same variable, do not have known web resources or specific identifiers available, and so use `description` to clarify the method.
``` json
...,
"variableMeasured": [
  {
    "@type": "PropertyValue",
    "name": "water demand",
    "description": "treated water delivered to distribution system",
    "propertyID": "http://vocabulary.odm2.org/variablename/waterUsePublicSupply/",
    "url": "http://vocabulary.odm2.org/variablename/waterUsePublicSupply/",
    "unitText": "million gallons per day",
    "qudt:hasQuantityKind": "qudt-quantkinds:VolumeFlowRate",
    "unitCode": "http://his.cuahsi.org/mastercvreg/edit_cv11.aspx?tbl=Units&id=1125579048",
    "measurementTechnique": "observation",
    "measurementMethod": {
      "name": "water meter",
      "description": "metered bulk value, accumulated over one month",
      "url": "https://www.wikidata.org/wiki/Q268503"
    }
  },
  {
    "@type": "PropertyValue",
    "name": "water demand (monthly average)",
    "description": "average monthly treated water delivered to distribution system",
    "propertyID": "http://vocabulary.odm2.org/variablename/waterUsePublicSupply/",
    "url": "http://vocabulary.odm2.org/variablename/waterUsePublicSupply/",
    "unitText": "million gallons per day",
    "qudt:hasQuantityKind": "qudt-quantkinds:VolumeFlowRate",
    "unitCode": "http://his.cuahsi.org/mastercvreg/edit_cv11.aspx?tbl=Units&id=1125579048",
    "measurementTechnique": "observation",
    "measurementMethod": {
      "name": "water meter",
      "description": "metered bulk value, average accumulated over each month for multiple years",
      "url": "https://www.wikidata.org/wiki/Q268503"
    }
  }
],
"temporalCoverage": "2002-01-01/2020-12-31",
"ssn-system:frequency": {
  "value": "1",
  "unitCode": "qudt-units:Month"
},
```

##### Geoconnex Reference Feature Links and Spatial Coverage

Unlike the location-based example, where a location is explicitly the `subjectOf` the dataset, here the dataset must be described as being `about` certain features. If the dataset is not explicitly about any discrete features, as with raster datasets, then a spatial coverage should be specified. Using the `about` construction, a single geoconnex URI or an array of multiple URIs can be supplied.
In the example below, multiple URIs are used. Note the nesting of nodes within the array so that each URI has an `@id` keyword and is `@type` `Place`. In this example, URIs from the geoconnex [reference features set for Public Water Systems](https://reference.geoconnex.us/collections/pws) are used.

``` json
...,
"about": [
  {
    "@id": "https://geoconnex.us/ref/pws/NC0332010",
    "@type": "Place"
  },
  {
    "@id": "https://geoconnex.us/ref/pws/NC0368010",
    "@type": "Place"
  },
  {
    "@id": "https://geoconnex.us/ref/pws/NC0392010",
    "@type": "Place"
  },
  {
    "@id": "https://geoconnex.us/ref/pws/NC0392020",
    "@type": "Place"
  },
  {
    "@id": "https://geoconnex.us/ref/pws/NC0392045",
    "@type": "Place"
  }
],
...
```

To assist in finding reference features, <https://reference.geoconnex.us> allows queries following the [OGC-API Features](https://ogcapi.ogc.org/features/) API standard and the [Common Query Language (CQL) standard](https://portal.ogc.org/files/96288). For example, to find the Geoconnex URI for the Raleigh public water system (PWS), we can construct the URL step by step:

- the items endpoint for the PWS feature collection: <https://reference.geoconnex.us/collections/pws/items>
- a filter on the name field `pws_name`: <https://reference.geoconnex.us/collections/pws/items?filter=pws_name>
- a filter for a name that includes "Raleigh", that is `pws_name ILIKE '%Raleigh%'` with the spaces and `%` signs URL-encoded: <https://reference.geoconnex.us/collections/pws/items?filter=pws_name%20ILIKE%20'%25Raleigh%25'>

Sometimes it is impossible to use feature URIs because the relevant specific features are not available from <https://reference.geoconnex.us/collections>. If so, feel free to [submit an issue to the geoconnex.us GitHub repository](https://github.com/internetofwater/geoconnex.us/issues/new/choose) requesting a reference feature set.

Sometimes it is impractical to list all applicable reference features, whether they are in <https://reference.geoconnex.us> or another source.
This is common for comprehensive datasets that are about an entire reference dataset or another dataset like a hydrofabric, such as datasets summarizing values to U.S. Counties, or the National Water Model generating values for all NHDPlusV2 COMID flowlines. In this case it is best to declare that the Dataset [isBasedOn](https://schema.org/isBasedOn) the source geospatial fabric. For example, if the example dataset were about all public water systems instead of just the 5 listed, then instead of `about` we should specify an identifier, name, description, and any URLs for other resources that describe the source fabric and how to interpret it:

``` json
...,
"isBasedOn": {
  "@id": "https://www.hydroshare.org/resource/9ebc0a0b43b843b9835830ffffdd971e/",
  "name": "U.S. Community Water Systems Service Boundaries, v4.0.0",
  "description": "This is a layer of water service boundaries for 45,973 community water systems that deliver tap water to 307.7 million people in the US.",
  "url": "https://github.com/SimpleLab-Inc/wsb"
},
...
```

Sometimes there are no particular features that a dataset is explicitly about. This is common with remote sensing raster data.
In this case, it is best to specify a `spatialCoverage` polygon using WKT encoded geometry: ``` json "spatialCoverage": { "@type": "Place", "gsp:hasGeometry": { "@type": "http://www.opengis.net/ont/sf#MultiPolygon", "gsp:asWKT": { "@type": "http://www.opengis.net/ont/geosparql#wktLiteral", "@value": "MULTIPOLYGON (((-85.67957299999999 32.799514, -85.679637 32.822002999999995, -85.67199699999999 32.822063, -85.66421 32.821711, -85.647989 32.82224, -85.627966 32.822331, -85.627781 32.800716, -85.627496 32.778602, -85.635931 32.778656999999995, -85.645034 32.778146, -85.653352 32.778481, -85.67933699999999 32.778239, -85.67936399999999 32.784064, -85.679808 32.792068, -85.67957299999999 32.799514)))" } } } ``` ## Complete Examples {#sec-complete-examples} Below are complete examples for the general JSON-LD document types depending on the location or dataset orientation and data type. They are viewable together below, or available for download: - [location-oriented example](https://raw.githubusercontent.com/internetofwater/geoconnex-guidance/main/examples/location-complete.jsonld) - [dataset-oriented example](https://raw.githubusercontent.com/internetofwater/geoconnex-guidance/main/examples/dataaset-complete.jsonld) ### Location-oriented {#sec-loc-complete-example} ``` json { "@context": { "@vocab": "https://schema.org/", "xsd": "https://www.w3.org/TR/xmlschema-2/#", "rdfs": "http://www.w3.org/2000/01/rdf-schema#", "dc": "http://purl.org/dc/terms/", "dcat": "https://www.w3.org/ns/dcat#", "freq": "http://purl.org/cld/freq/", "qudt": "http://qudt.org/schema/qudt/", "qudt-units": "http://qudt.org/vocab/unit/", "qudt-quantkinds": "http://qudt.org/vocab/quantitykind/", "gsp": "http://www.opengis.net/ont/geosparql#", "locType": "http://vocabulary.odm2.org/sitetype", "odm2var":"http://vocabulary.odm2.org/variablename/", "odm2varType": "http://vocabulary.odm2.org/variabletype/", "hyf": "https://www.opengis.net/def/schema/hy_features/hyf/", "skos": 
      "https://www.opengis.net/def/schema/hy_features/hyf/HY_HydroLocationType",
    "ssn": "http://www.w3.org/ns/ssn/",
    "ssn-system": "http://www.w3.org/ns/ssn/systems/"
  },
  "@id": "https://geoconnex.us/usgs/monitoring-location/08282300",
  "@type": [
    "hyf:HY_HydrometricFeature",
    "hyf:HY_HydroLocation",
    "locType:stream"
  ],
  "hyf:HydroLocationType": "hydrometric station",
  "sameAs": {
    "@id": "https://geoconnex.us/ref/gages/1018463"
  },
  "identifier": {
    "@type": "PropertyValue",
    "propertyID": "USGS site number",
    "value": "08282300"
  },
  "name": "Rio Brazos at Fishtail Road NR Tierra Amarilla, NM",
  "description": "Stream/River Site",
  "provider": {
    "url": "https://waterdata.usgs.gov",
    "@type": "GovernmentOrganization",
    "name": "U.S. Geological Survey Water Data for the Nation"
  },
  "geo": {
    "@type": "schema:GeoCoordinates",
    "longitude": -106.4707722,
    "latitude": 36.7379333
  },
  "gsp:hasGeometry": {
    "@type": "http://www.opengis.net/ont/sf#Point",
    "gsp:asWKT": {
      "@type": "http://www.opengis.net/ont/geosparql#wktLiteral",
      "@value": "POINT (-106.4707722 36.7379333)"
    },
    "gsp:crs": {
      "@id": "http://www.opengis.net/def/crs/OGC/1.3/CRS84"
    }
  },
  "hyf:referencedPosition": {
    "hyf:HY_IndirectPosition": {
      "hyf:linearElement": {
        "@id": "https://geoconnex.us/ref/mainstems/1611418"
      }
    }
  },
  "subjectOf": {
    "@type": "Dataset",
    "name": "Discharge data from USGS Monitoring Location 08282300",
    "description": "Discharge data from USGS Streamgage at Rio Brazos at Fishtail Road NR Tierra Amarilla, NM",
    "provider": {
      "url": "https://waterdata.usgs.gov",
      "@type": "GovernmentOrganization",
      "name": "U.S. Geological Survey Water Data for the Nation"
    },
    "url": "https://waterdata.usgs.gov/monitoring-location/08282300/#parameterCode=00060",
    "variableMeasured": {
      "@type": "PropertyValue",
      "name": "discharge",
      "description": "Discharge in cubic feet per second",
      "propertyID": "https://www.wikidata.org/wiki/Q8737769",
      "url": "https://en.wikipedia.org/wiki/Discharge_(hydrology)",
      "unitText": "cubic feet per second",
      "qudt:hasQuantityKind": "qudt-quantkinds:VolumeFlowRate",
      "unitCode": "qudt-units:FT3-PER-SEC",
      "measurementTechnique": "observation",
      "measurementMethod": {
        "name": "Discharge Measurements at Gaging Stations",
        "publisher": "U.S. Geological Survey",
        "url": "https://doi.org/10.3133/tm3A8"
      }
    },
    "temporalCoverage": "2014-06-30/..",
    "dc:accrualPeriodicity": "freq:daily",
    "dcat:temporalResolution": {
      "@value": "PT15M",
      "@type": "xsd:duration"
    },
    "distribution": [
      {
        "@type": "DataDownload",
        "name": "USGS Instantaneous Values Service",
        "contentUrl": "https://waterservices.usgs.gov/nwis/iv/?sites=08282300&parameterCd=00060&format=rdb",
        "encodingFormat": [
          "text/tab-separated-values"
        ],
        "dc:conformsTo": "https://pubs.usgs.gov/of/2003/ofr03123/6.4rdb_format.pdf"
      },
      {
        "@type": "DataDownload",
        "name": "USGS SensorThings API",
        "contentUrl": "https://labs.waterdata.usgs.gov/sta/v1.1/Datastreams('0adb31f7852e4e1c9a778a85076ac0cf')?$expand=Thing,Observations",
        "encodingFormat": [
          "application/json"
        ],
        "dc:conformsTo": "https://labs.waterdata.usgs.gov/docs/sensorthings/index.html"
      }
    ]
  }
}
```

### Dataset-oriented {#sec-data-complete-example}

``` json
{
  "@context": {
    "@vocab": "https://schema.org/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "dc": "http://purl.org/dc/terms/",
    "dcat": "http://www.w3.org/ns/dcat#",
    "freq": "http://purl.org/cld/freq/",
    "qudt": "http://qudt.org/schema/qudt/",
    "qudt-units": "http://qudt.org/vocab/unit/",
    "qudt-quantkinds": "http://qudt.org/vocab/quantitykind/",
    "gsp": "http://www.opengis.net/ont/geosparql#",
    "locType": "http://vocabulary.odm2.org/sitetype/",
    "odm2var": "http://vocabulary.odm2.org/variablename/",
    "odm2varType": "http://vocabulary.odm2.org/variabletype/",
    "hyf": "https://www.opengis.net/def/schema/hy_features/hyf/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "ssn": "http://www.w3.org/ns/ssn/",
    "ssn-system": "http://www.w3.org/ns/ssn/systems/"
  },
  "@type": "Dataset",
  "@id": "https://doi.org/10.4211/hs.4cf2a4298eca418f980201c1c5505299",
  "identifier": "doi:hs.4cf2a4298eca418f980201c1c5505299",
  "name": "Geoconnex Dataset Example: Data for Triangle Water Supply Dashboard",
  "description": "This is a dataset meant as an example of dataset-level schema.org markup for https://geoconnex.us. It uses as an example data from the NC Triangle Water Supply Dashboard, which collects and visualizes finished water deliveries by water utilities to their service areas in the North Carolina Research Triangle area",
  "url": "https://www.hydroshare.org/resource/4cf2a4298eca418f980201c1c5505299",
  "provider": {
    "url": "https://hydroshare.org",
    "@type": "ResearchOrganization",
    "name": "HydroShare"
  },
  "creator": {
    "@type": "Person",
    "affiliation": {
      "@type": "Organization",
      "name": "Internet of Water;Center for Geospatial Solutions;Lincoln Institute of Land Policy"
    },
    "email": "konda@lincolninst.edu",
    "name": "Kyle Onda",
    "url": "https://www.hydroshare.org/user/4850/"
  },
  "keywords": [
    "water demand",
    "water supply",
    "geoconnex"
  ],
  "license": "https://creativecommons.org/licenses/by/4.0/",
  "isAccessibleForFree": true,
  "distribution": {
    "@type": "DataDownload",
    "name": "HydroShare file URL",
    "contentUrl": "https://www.hydroshare.org/resource/4cf2a4298eca418f980201c1c5505299/data/contents/demand_over_time.csv",
    "encodingFormat": [
      "text/csv"
    ],
    "dc:conformsTo": "https://www.hydroshare.org/resource/4cf2a4298eca418f980201c1c5505299/data/contents/dataDictionary.xlsx"
  },
  "variableMeasured": [
    {
      "@type": "PropertyValue",
      "name": "water demand",
      "description": "treated water delivered to distribution system",
      "propertyID": "http://vocabulary.odm2.org/variablename/waterUsePublicSupply/",
      "url": "http://vocabulary.odm2.org/variablename/waterUsePublicSupply/",
      "unitText": "million gallons per day",
      "qudt:hasQuantityKind": "qudt-quantkinds:VolumeFlowRate",
      "unitCode": "http://his.cuahsi.org/mastercvreg/edit_cv11.aspx?tbl=Units&id=1125579048",
      "measurementTechnique": "observation",
      "measurementMethod": {
        "name": "water meter",
        "description": "metered bulk value, accumulated over one month",
        "url": "https://www.wikidata.org/wiki/Q268503"
      }
    },
    {
      "@type": "PropertyValue",
      "name": "water demand (monthly average)",
      "description": "average monthly treated water delivered to distribution system",
      "propertyID": "http://vocabulary.odm2.org/variablename/waterUsePublicSupply/",
      "url": "http://vocabulary.odm2.org/variablename/waterUsePublicSupply/",
      "unitText": "million gallons per day",
      "qudt:hasQuantityKind": "qudt-quantkinds:VolumeFlowRate",
      "unitCode": "http://his.cuahsi.org/mastercvreg/edit_cv11.aspx?tbl=Units&id=1125579048",
      "measurementTechnique": "observation",
      "measurementMethod": {
        "name": "water meter",
        "description": "metered bulk value, average accumulated over each month for multiple years",
        "url": "https://www.wikidata.org/wiki/Q268503"
      }
    }
  ],
  "temporalCoverage": "2002-01-01/2020-12-31",
  "dc:accrualPeriodicity": "freq:daily",
  "dcat:temporalResolution": {
    "@value": "PT15M",
    "@type": "xsd:duration"
  },
  "about": [
    {
      "@id": "https://geoconnex.us/ref/pws/NC0332010",
      "@type": "Place"
    },
    {
      "@id": "https://geoconnex.us/ref/pws/NC0368010",
      "@type": "Place"
    },
    {
      "@id": "https://geoconnex.us/ref/pws/NC0392010",
      "@type": "Place"
    },
    {
      "@id": "https://geoconnex.us/ref/pws/NC0392020",
      "@type": "Place"
    },
    {
      "@id": "https://geoconnex.us/ref/pws/NC0392045",
      "@type": "Place"
    }
  ]
}
```

## Appendices {#sec-appendices}

### Appendix 1: Codelists {#sec-codelists}

#### measurementTechnique {#sec-measurementTechnique}

The `measurementTechnique` property gives data providers a way to "tag" their data with a general sense of how it was created, helping to distinguish, for example, observed from modeled data or in-situ from remotely sensed data. It supplements the `measurementMethod` property, which should identify the specific method used and link to documentation detailed enough to replicate that method. Multiple `measurementTechnique` values can thus be specified. Choose terms from the codelist below, which is derived from the [USGS Thesaurus](https://apps.usgs.gov/thesaurus/term-simple.php?thcode=2&code=734) and the [ODM2 methodType vocabulary](http://vocabulary.odm2.org/methodtype/).

+----------------+--------------------------------------------------------------------------------------------+
| code           | definition                                                                                 |
+================+============================================================================================+
| observation    | This term groups data-generating procedures whose primary purpose is to measure phenomena  |
|                | directly. Examples include ground-based sensors like streamgages and weather stations, but |
|                | also discrete water quality samples, habitat assessments, ecological surveys, and surveys  |
|                | of individuals, households, and organizations, as well as remote sensing. This category    |
|                | can also include datasets that use gap-filling procedures for missing data (e.g.           |
|                | streamgage data in which values for sensor malfunction periods are estimated from time     |
|                | series models).                                                                            |
+----------------+--------------------------------------------------------------------------------------------+
| model          | This term refers to data that are generated rather than observed. It groups                |
|                | data-generating procedures that produce data for **hypothetical states** at discrete       |
|                | locations, such as (but not limited to):                                                   |
|                |                                                                                            |
|                | - **the future** (e.g. river stage forecasts like the gage location-based forecasts from   |
|                |   the NOAA                                                                                 |
|                |   [Advanced Hydrologic Prediction System](https://water.weather.gov/ahps/forecasts.php))   |
|                |                                                                                            |
|                | - **counterfactuals** (e.g. hydrologic models under varying assumptions about dam removal  |
|                |   or reservoir operations)                                                                 |
|                |                                                                                            |
|                | - **the unobserved past and present at the feature of interest** (e.g. water quality       |
|                |   models for parameters based on climate and upstream effluent discharge data)             |
+----------------+--------------------------------------------------------------------------------------------+
| field methods  | Research procedures and instrumental means to measure, collect data and samples, and       |
|                | observe in the natural areas where the materials, phenomena, structures, or species being  |
|                | studied occur.                                                                             |
+----------------+--------------------------------------------------------------------------------------------+
| remote sensing | Acquiring information about a natural feature or phenomenon, such as the Earth's surface,  |
|                | without actually being in contact with it. Typically carried out with airborne or          |
|                | spaceborne sensors or cameras.                                                             |
+----------------+--------------------------------------------------------------------------------------------+
| estimation     | A method for creating results by estimation or professional judgement.                     |
+----------------+--------------------------------------------------------------------------------------------+
| derivation     | A method for creating results by deriving them from other results. Datasets in this        |
|                | category may be generated by algorithms or human processes that combine heterogeneous      |
|                | source data into latent or derived variables (e.g. composite indexes such as health risk   |
|                | scores or regulatory categorizations such as "in compliance"), or spatially aggregate      |
|                | data from smaller geographic units to larger ones (e.g. Census area-based reporting), as   |
|                | long as the data represent the phenomena of interest at the time and place they actually   |
|                | occurred and were measured.                                                                |
+----------------+--------------------------------------------------------------------------------------------+
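
As a minimal sketch of how more than one term from the codelist above might be combined with a `measurementMethod`, the fragment below tags a single variable with both `observation` and `remote sensing`. The variable name, the method name, and the `https://example.org/...` URL are hypothetical placeholders chosen for illustration; they do not come from a real dataset or from Geoconnex guidance.

``` json
{
  "@context": {
    "@vocab": "https://schema.org/"
  },
  "@type": "Dataset",
  "name": "Hypothetical example dataset (illustration only)",
  "variableMeasured": {
    "@type": "PropertyValue",
    "name": "water surface elevation",
    "description": "Hypothetical variable, for illustration only",
    "measurementTechnique": [
      "observation",
      "remote sensing"
    ],
    "measurementMethod": {
      "name": "Hypothetical satellite altimetry method",
      "url": "https://example.org/methods/altimetry"
    }
  }
}
```

Here the array carries the general tags from the codelist, while `measurementMethod` remains the place for the specific, replicable method and its documentation link.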