Namespace Registration for Metadata Identifiers (META)

Namespace ID: META

Version: 1

Date: 2022-11-14

Registrant:

   Name: Juha Hakala
   E-mail: juha.hakala&helsinki.fi
   Affiliation: Senior adviser, The National Library of Finland
   Address: P.O.Box 15, 00014 Helsinki University, Finland.
   Web URL: https://www.kansalliskirjasto.fi/en/

Background:

According to ISO 5127, metadata is data about other data, documents or records that describes their content, context, structure, data format, provenance and/or rights attached to them. Metadata elements and their machine readable codes are specified in (cataloguing) formats. Libraries have been using a metadata format (Machine Readable Cataloguing, MARC) since the late 1960s, and starting in the 1990s, museums, archives and many other organizations have developed their own metadata formats. There are also metadata formats for administrative metadata (rights, long-term preservation).

As a result of this format proliferation and the increased availability of high-quality structured metadata as open linked data, it is important to enable human and machine users to make sense of metadata. This requires at least a basic understanding of metadata elements in these formats, such as Dublin Core Title or MARC 21 tag 245, Title statement.

Metadata format specifications are usually available on the Web for free. For instance, the Library of Congress has published MARC 21 formats at https://www.loc.gov/marc/. The National Library of Finland maintains translations of these formats at . A list of MARC 21 translations (n = 20) is available at .

From users' point of view, using MARC 21 documentation is not as easy as it should be. Information may have been published in a form (e.g. a PDF document) which is not sufficiently machine understandable. And even if information about each metadata element is available in HTML, there are no explicit links between translations.
There is no connection between the page describing Title statement in English , the page describing it in Finnish and the Swedish version .

Most metadata format communities have chosen to use location dependent HTTP URLs as identifiers for their metadata formats and their elements. The only exception is the Dublin Core Metadata Initiative, which chose Persistent URLs (PURLs) as identifiers. For instance, DC metadata element Creator has two Persistent URLs: http://purl.org/dc/terms/creator in the /terms/ namespace, and http://purl.org/dc/elements/1.1/creator in the /elements/1.1/ namespace, because these two creators have different semantics.

NOTE: These PURLs identify creator-related metadata elements in the Dublin Core format, not the creators themselves. Creators have other identifiers, such as ORCiDs and ISNIs, which may be expressed as URIs (e.g. ).

The French translation of DCMI Metadata Terms uses the same PURLs, but they resolve to English texts. In order to access French element descriptions, it is necessary to use their URLs, such as or creator in the Terms namespace. Translations to e.g. Czech, Japanese and Italian are similar in this respect: the identifier of the element is the PURL of the English version (which is OK), but the PURL does not enable the user to retrieve documentation about the element in the appropriate language.

From users' point of view it is confusing that in MARC 21 and all other formats which rely on HTTP URLs, each translation of the format has its own set of identifiers. In Dublin Core the same identifier is used for several translations, but the identifier provides linking only to the version in English. Therefore users who want to understand the metadata they see may have difficulties finding this information in an appropriate language.
NOTE: Some XML-based metadata formats have XML namespaces:

   AudioMD: http://www.loc.gov/audioMD/
   MARCXML: http://www.loc.gov/MARC21/slim
   PREMIS: http://www.loc.gov/premis/rdf/v3/
   VideoMD: http://www.loc.gov/videoMD/

Unfortunately namespaces for AudioMD, MARCXML and VideoMD are not resolvable using e.g. HTTP, and functionality supported by the PREMIS namespace is insufficient - links to individual tags such as http://www.loc.gov/premis/rdf/v3/Copyright do not provide useful results, since they all take the client only to the beginning of the file containing all element descriptions. When implemented in this manner links to descriptions of metadata elements do not help human and machine users of metadata to understand the metadata provided.

Purpose:

Provide a uniform basis and a tool for identification of elements in metadata formats.

Provide functional improvements to the current URL-based identifiers of metadata elements.

Benefits:

Linking to alternative versions (e.g. full / concise, human readable / machine readable) of element descriptions with one URN.

Automatic selection of the appropriate language (see below). For instance, a bibliographic record containing a link to the URI explaining MARC tag 245 in Finnish is only useful for users who understand Finnish, because the HTTP server holding these pages cannot redirect Swedish / English speaking users to MARC tag 245 pages on other HTTP servers that would be appropriate for them.

Having a URN namespace dedicated for metadata elements may improve co-ordination on how identifiers for metadata elements are created and what they resolve to. It may also encourage the organizations maintaining metadata formats to provide documentation separately for each metadata element and in a form more suitable to the Web than e.g. PDF.

A resolution service for URN:META identifiers can direct users to information in their preferred languages if the resolution method communicates their language preferences to the resolver.
If a URN:META identifier is assigned to the Library of Congress description of MARC 245 tag, it is possible for a resolution service to direct a user to the descriptions of this tag in multiple languages, depending on the language settings of the user's Web client. This functionality can be implemented by supporting the HTTP Accept-Language header in the URN resolver. A prerequisite for this functionality is that all element URLs from translated versions are harvested to the resolver's URN – URL mapping table. Since these URLs should be stable, keeping the links up to date is feasible. If the network protocol used does not support language negotiation, the required functionality may also be implemented with the URN R-component.

NOTE: A URN should not be assigned if an element already has a well-managed persistent identifier such as a DOI.

Syntax:

The Namespace-Specific String (NSS) consists of three parts:

   o a prefix consisting of a code identifying the metadata format and optional sub-namespace code(s) separated by colon(s);

   o a hyphen (-) as the delimiting character; and,

   o a string assigned under the auspices of the format maintenance agency. These strings may be constructed according to the local preferences as long as they are aligned with the requirements of RFC 3986 and RFC 8141.

Format maintenance agency is the organization maintaining the original version of the format, such as Dublin Core Metadata Initiative for Dublin Core, or the Library of Congress for MARC 21 formats. A maintenance agency shall specify the NSS syntax for its formats, and it may outsource the URN assignment and maintenance of URN resolver to a third party, such as an organization maintaining a translation of the format.

The following formal definition uses ABNF [RFC5234].

   meta-nss = prefix "-" meta-string
   prefix = format-code *( ":" sub-namespace )
      ; The entire prefix is case insensitive.
   format-code = 1*(ALPHA / DIGIT)
      ; As assigned by the National Library of Finland
      ; (identifies the metadata format and the maintenance
      ; agency to which the branch is delegated).
   sub-namespace = 1*(ALPHA / DIGIT)
      ; As assigned by the respective format maintenance
      ; agency.
   meta-string = path-rootless
      ; The "path-rootless" rule is defined in RFC 3986.
      ; Syntax requirements specified in RFC 8141 MUST be
      ; taken into account.

The meta-string is case-sensitive unless specified as case-insensitive by the maintenance agency.

The following metadata format codes SHALL be used:

Descriptive metadata

   Code      Format(s)    URL
   BF        BIBFRAME     http://www.loc.gov/bibframe/
   danMARC2  DANMARC2     http://www.kat-format.dk/danMARC2/
   DW        Darwin Core  https://dwc.tdwg.org/terms/
   DC        Dublin Core  https://www.dublincore.org/specifications/dublin-core/
   DDI       DDI          https://ddialliance.org/explore-documentation
   EAD       EAD          https://www.loc.gov/ead/
   FINMARC   FINMARC      https://www.kiwi.fi/display/Marc21/FINMARC
   IMARC     INTERMARC    https://www.bnf.fr/fr/intermarc-bibliographique-de-diffusion
   LIDO      LIDO         http://network.icom.museum/cidoc/working-groups/lido/
   MARC      MARC 21      https://www.loc.gov/marc/marcdocz.html
   MARCXML   MARCXML      http://www.loc.gov/standards/marcxml/
   MIX       MIX          http://www.loc.gov/standards/mix/
   MODS      MODS         http://www.loc.gov/standards/mods/
   ONIX      ONIX         https://www.editeur.org/8/ONIX/
   UKMARC    UKMARC       https://www.webarchive.org.uk/wayback/archive/20160000000000/http://www.bl.uk/bibliographic/ukmarc.html
   UNIMARC   UNIMARC      https://www.ifla.org/unimarc

Administrative metadata

   Code      Format       URL
   PREMIS    PREMIS       http://www.loc.gov/standards/premis/
   TEXTMD    textMD       https://www.loc.gov/standards/textMD/
   AUDIOMD   audioMD      https://www.loc.gov/standards/amdvmd/
   VIDEOMD   videoMD      https://www.loc.gov/standards/amdvmd/

Cataloguing rules

   Code      Rules        URL
   ISBD      ISBD         http://iflastandards.info/ns/isbd/elements/
   RDA       RDA          https://www.rdaregistry.info

One code may cover an entire family of formats (e.g. MARC Authority, Bibliographic and Holdings formats).
Sub-namespaces may be used to differentiate formats within these format families if necessary. Since all prefixes with the same format-code are delegated to the same maintenance agency, such families are perforce maintained by the same agency.

National translations of metadata standards and cataloguing rules shall use the codes and URNs of the original specifications. Thus the Finnish translation of MARC 21 shall use the prefix MARC of MARC 21 and URNs assigned to the elements of the English version of the format. If resolution is done via HTTP, the URN resolver can use HTTP language negotiation to direct the client to the correct language version. Other network protocols may support similar functionality in the future.

National variants of metadata formats (e.g. historical FINMARC format, which was based on equally outdated USMARC) shall have their own format codes, since their tags and semantics may differ from the original ones. For instance, UKMARC tag 245 is not the same as USMARC tag 245, since in the former subtitle had its own tag, 248, whereas in the latter, subtitle was included in tag 245.

Metadata application profiles such as Darwin Core, an extension of Dublin Core intended for sharing of information about biological diversity, may have codes of their own if they a) extend the base format substantially and b) are well documented and stable.

The structure (if any) of the meta-string is determined by the authority for the prefix. Within the meta-string, it is recommended that a hyphen is used for separating different sections of the identifier from one another in order to improve the human readability of the string. Maintenance agencies SHOULD NOT use in meta-strings characters requiring percent-encoding.

Registering format codes:

New codes will be added by the National Library of Finland on request. Requests should be sent to meta-request&helsinki.fi. The list of registered format codes will be maintained as a part of this document.
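As an illustration of the Syntax section, the following Python sketch splits a URN:META identifier into its format-code, sub-namespaces and meta-string, and compares two identifiers. It is a minimal sketch, not part of this registration: the function names are hypothetical, the check of the meta-string against "path-rootless" and RFC 8141 is reduced to "non-empty and not starting with a slash", R-, Q- and F-components are not handled, and the meta-string is assumed to be case-sensitive (the default stated above).

```python
import re

URN_LEAD = "urn:meta:"
# prefix = format-code *( ":" sub-namespace ), each part 1*(ALPHA / DIGIT)
PREFIX_RE = re.compile(r"[A-Za-z0-9]+(?::[A-Za-z0-9]+)*")


def parse_meta_urn(urn):
    """Return (format_code, sub_namespaces, meta_string) for a URN:META.

    Hypothetical helper. The prefix is lower-cased because it is case
    insensitive; the meta-string is returned unchanged.
    """
    if urn[:len(URN_LEAD)].lower() != URN_LEAD:
        raise ValueError("not a URN:META identifier: %r" % urn)
    nss = urn[len(URN_LEAD):]
    # The prefix cannot contain a hyphen, so the first hyphen is the delimiter.
    prefix, hyphen, meta_string = nss.partition("-")
    if not hyphen or not PREFIX_RE.fullmatch(prefix):
        raise ValueError("malformed prefix in %r" % urn)
    # Simplified stand-in for the path-rootless rule (assumption).
    if not meta_string or meta_string.startswith("/"):
        raise ValueError("malformed meta-string in %r" % urn)
    format_code, *sub_namespaces = prefix.lower().split(":")
    return format_code, tuple(sub_namespaces), meta_string


def meta_urns_equivalent(a, b):
    """Compare two URN:META identifiers: prefix case-insensitively,
    meta-string case-sensitively."""
    return parse_meta_urn(a) == parse_meta_urn(b)
```

With these assumptions, urn:meta:MARC-bd245 and urn:meta:marc-bd245 compare as equivalent, while urn:meta:marc-bd245 and urn:meta:marc-BD245 do not. Because the sketch follows the ABNF literally, a sub-namespace containing any character other than a letter or digit is rejected.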
NOTE: Formats included above are the ones most commonly used by libraries, archives, museums and in publishing. It is anticipated that the need for adding new formats will not be frequent.

Rules for lexical equivalence:

Whereas the prefix is case insensitive, meta-strings MAY be case sensitive at the preference of the assigning authority; parsers therefore SHALL treat these as case sensitive, and any case mapping needed to introduce case insensitivity is the responsibility of the relevant resolution system. Case insensitivity of the prefix must be taken into account when URN:META identifiers are compared.

META assignment:

National and international metadata format maintenance agencies may use URN META when they want to assign persistent identifiers for the metadata elements and tags of their formats, and provide URN-based access to machine or human readable descriptions of these metadata elements. For the time being these descriptions are unstructured text on Web pages.

The URN assigned to the element shall not change even if the description of the element is changed. URNs assigned to deleted elements shall not be re-used.

Metadata format maintenance agencies shall have procedures in place to make sure that the assigned URNs are unique and persistent. Since the number of metadata elements in formats is relatively low (at most a few hundred), such procedures can be simple (e.g. a URN can be based on the name of the element).

Security and Privacy:

URN:META identifiers do not have any known security or privacy issues. They are intended to have publicly-known meanings and do not refer to specific individuals, groups, or organizations.

Interoperability:

URN:META identifiers do not have any known interoperability related issues.

Resolution:

General URNs in the URN:META namespace MUST be resolvable. Each registrant of a format-code MUST register a base URL (ending in "/") for a resolution service for URN:METAs that have that format-code.
A registrant may provide additional base URLs for prefixes composed of that format-code and one or more following sub-namespaces. The base URL for resolving a particular URN:META is the base URL for the longest registered prefix which is an initial part of the URN's prefix.

A URN:META is resolved by composing a URL by concatenating the base URL with the URN:META and fetching that URL using the normal HTTP/HTTPS GET method. The retrieved resource SHOULD describe the identified metadata element and MAY provide or reference further information about it. Resolution services MAY respond to GET requests with a redirection response whose Location header field is a URL of a preexisting description of the element. If information about the element is available in multiple languages, a resolution service SHOULD use the HTTP Accept-Language header to select a URL of a resource in the user's requested language.

A URN-to-URL resolution service, from the URN to the URL of the page describing the identified metadata element, MUST be supported. Other resolution services, such as a link to the appropriate cataloguing rule, may be provided if appropriate.

Example 1

Namespace URN:META:MARC contains URNs which identify the tags of MARC 21 metadata formats in various languages. Unlike current HTTP URLs, all language versions share the same identifier.

In this example, URNs are expressed as HTTP URIs which use a (non-existent) URN resolver located at http://example.com/. It is assumed that the language setting of the client is English, in which case (and by default) URNs beginning with http://example.com/urn:meta:marc- will be resolved to MARC 21 format specific pages in the directories

   https://www.loc.gov/marc/bibliographic/
   https://www.loc.gov/marc/authority/
   https://www.loc.gov/marc/holdings/

on the Library of Congress site.
Case 1

URN of the MARC Bibliographic format tag 245 (Title Statement)

Assuming that the registered resolver base URL for urn:meta:marc is http://example.com/,

   urn:meta:marc-bd245

would be resolved by fetching

   http://example.com/urn:meta:marc-bd245

which (absent an Accept-Language header in the GET request) would redirect either to

   https://www.loc.gov/marc/bibliographic/bd245.html

(full description of the tag) or to

   https://www.loc.gov/marc/bibliographic/concise/bd245.html

(concise description of the tag) depending on the service requested. If full description is set as a default, concise description can be made available via an R-component based service request.

If the GET request contained "Accept-Language: fi" (preferring Finnish), it would always redirect to

   https://marc21.kansalliskirjasto.fi/bib/20X-24X.htm#245

since there is only one Finnish translation, based on the concise version.

If the GET request contained "Accept-Language: sv" (preferring Swedish), it would always redirect to

   http://www.kb.se/katalogisering/Formathandboken/Bibliografiska-formatet/210-249/#245

since there is only one Swedish translation, based on the concise version.

Since file names of pages describing tags in MARC 21 formats have the same syntax (MARC Bibliographic uses bdxxx.html, where xxx is the tag number), no URN – URL mapping table is required in the resolver. Target URLs can be generated from URNs programmatically. Since the target URLs have been stable for decades, the need to modify programmatic mapping should be very infrequent.
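The resolution steps described above (longest registered prefix, concatenation of base URL and URN, programmatic target generation, language preference) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not a reference resolver: all function names and the registry dictionary are hypothetical, only the "bd" and "ad" file name patterns shown in this document are mapped, and the Accept-Language handling is a simplified reading of the header that ignores wildcards and region subtags.

```python
import re

# Directory names taken from the Library of Congress URLs in this document.
MARC_DIRS = {"bd": "bibliographic", "ad": "authority"}


def resolver_base_url(prefix, registry):
    """Pick the base URL of the longest registered prefix that is an
    initial part of the URN's prefix. `registry` (hypothetical) maps
    lower-case prefixes such as "dc" or "dc:terms" to base URLs."""
    parts = prefix.lower().split(":")
    for n in range(len(parts), 0, -1):
        candidate = ":".join(parts[:n])
        if candidate in registry:
            return registry[candidate]
    raise LookupError("no resolver registered for %r" % prefix)


def resolution_url(urn, registry):
    """Concatenate the base URL with the URN:META itself."""
    nss = urn[len("urn:meta:"):]
    prefix = nss.partition("-")[0]
    return resolver_base_url(prefix, registry) + urn


def marc_default_target(meta_string):
    """Generate the default (English, full description) target URL for a
    meta-string such as "bd245" without a mapping table."""
    match = re.fullmatch(r"(bd|ad)([0-9]{3})", meta_string)
    if match is None:
        raise ValueError("unsupported MARC meta-string: %r" % meta_string)
    kind, tag = match.groups()
    return "https://www.loc.gov/marc/%s/%s%s.html" % (MARC_DIRS[kind], kind, tag)


def preferred_language(accept_language, available, default="en"):
    """Return the best available primary language for an Accept-Language
    header value, or `default` if none matches (simplified parsing)."""
    ranked = []
    for position, item in enumerate(accept_language.split(",")):
        tag, _, params = item.strip().partition(";")
        quality = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        primary = tag.strip().lower().split("-")[0]
        ranked.append((-quality, position, primary))
    for negative_quality, _, primary in sorted(ranked):
        if negative_quality < 0 and primary in available:
            return primary
    return default
```

For example, with the registry {"marc": "http://example.com/"}, resolution_url("urn:meta:marc-bd245", registry) yields http://example.com/urn:meta:marc-bd245, and marc_default_target("bd245") yields https://www.loc.gov/marc/bibliographic/bd245.html. Targets for translated versions, such as the Finnish and Swedish pages above, do not follow this file name pattern and would still need a mapping table.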
Case 2

URN of the MARC Authority format tag 100

Assuming that the registered resolver base URL for urn:meta:marc is http://example.com/,

   urn:meta:marc-ad100

would be resolved by fetching

   http://example.com/urn:meta:marc-ad100

which (absent an Accept-Language header in the GET request) would redirect to

   https://www.loc.gov/marc/authority/ad100.html

If the request contained "Accept-Language: fi" (preferring Finnish), it would redirect to

   https://marc21.kansalliskirjasto.fi/aukt/1XX.htm#100

If the request contained "Accept-Language: sv" (preferring Swedish), it would redirect to

   http://www.kb.se/katalogisering/Formathandboken/Auktoritetsformatet/1XX/#100

Example 2

URN of the Dublin Core "terms" namespace property Title and the Dublin Core "elements/1.1" namespace element Title.

Assuming that the registered resolver base URL for urn:meta:dc is http://example.com/,

   urn:meta:dc:terms-title

would be resolved by fetching

   http://example.com/urn:meta:dc:terms-title

which (absent an Accept-Language header in the GET request) would redirect to

   http://purl.org/dc/terms/title

Assuming that the registered resolver base URL for urn:meta:dc is http://example.com/,

   urn:meta:dc:elements1.1-title

would be resolved by fetching

   http://example.com/urn:meta:dc:elements1.1-title

which (absent an Accept-Language header in the GET request) would redirect to

   http://purl.org/dc/elements/1.1/title

Persistence:

Persistence of URN:META resolution services depends on the persistence of metadata formats. Metadata about deprecated MARC formats such as USMARC, UKMARC or FINMARC may no longer be available at all, and even if it is, its form and content may not be suitable for URN resolution. See for instance UKMARC documentation at

   https://www.webarchive.org.uk/wayback/archive/20160107133726/http://www.bl.uk/bibliographic/ukmarc.html

or FINMARC documentation at

   https://www.kiwi.fi/display/Marc21/FINMARC

Additional documentation / information: None