Re-implementing the N2T ARK Resolver
Archival Resource Keys (ARKs) are a flavor of persistent identifier, like DOIs,
URNs, and Handles, with the benefits of being free, flexible about what metadata
gets attached, and natively able to resolve to web pages. Name-to-Thing (N2T)
implements a resolver for a variety of ARKs, so this blog post is about how that
resolver can be re-implemented with the `curies` Python package.
In a lot of ways, ARKs look and act like CURIEs. For example,
`ark:/53355/cl010277627` could be interpreted as having the prefix `ark` and the
local unique identifier `/53355/cl010277627`. The segment between the first two
slashes of each ARK corresponds to the provider. In this example, `53355`
corresponds to the Louvre museum in Paris, France, and `cl010277627` is the local
unique identifier corresponding to the Vénus de Milo statue.
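To make this first reading concrete, here is a small standard-library sketch that splits an ARK on its first colon, the way a CURIE would be split, and then reads the provider code from between the first two slashes. The helper name `naive_split` is hypothetical; it is not part of N2T or `curies`.

```python
def naive_split(ark: str) -> tuple[str, str, str]:
    """Split an ARK CURIE-style into (prefix, local unique identifier, provider code).

    Hypothetical helper for illustration only.
    """
    # Split on the first colon, like a CURIE: "ark" and "/53355/cl010277627"
    prefix, _, local_id = ark.partition(":")
    # "/53355/cl010277627".split("/") gives ["", "53355", "cl010277627"]
    provider = local_id.split("/")[1]
    return prefix, local_id, provider


print(naive_split("ark:/53355/cl010277627"))
# ('ark', '/53355/cl010277627', '53355')
```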
However, I might have just committed ARK blasphemy. In N2T, it appears that the
ARK prefix and provider code stay grouped together in the front half, like
`ark:/53355/`, and then the back half, `cl010277627`, represents the local unique
identifier. This is very similar to the two-layer identifiers in DOIs and the
arbitrarily many layers in OIDs.
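The N2T-style reading can be sketched the same way. Again, `n2t_split` is a hypothetical helper, and it assumes every ARK follows the `ark:/<provider code>/<rest>` layout:

```python
def n2t_split(ark: str) -> tuple[str, str]:
    """Split an ARK N2T-style into (front half, back half).

    Hypothetical helper; assumes the ``ark:/<provider code>/<rest>`` layout.
    """
    if not ark.startswith("ark:/"):
        raise ValueError(f"not an ark:/ identifier: {ark}")
    # What remains after "ark:/" is "<provider code>/<local unique identifier>"
    provider, _, local_id = ark.removeprefix("ark:/").partition("/")
    # The front half keeps the scheme and the provider code together
    return f"ark:/{provider}/", local_id


print(n2t_split("ark:/53355/cl010277627"))
# ('ark:/53355/', 'cl010277627')
```

Partitioning on the first slash only means that a local unique identifier that itself contains slashes stays in one piece.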
The point is, if we can interpret ARKs enough like CURIEs, we can use the
`curies` package to implement a resolver. The first step is to download the N2T
data from https://n2t.net/e/n2t_full_prefixes.yaml. Then we can parse out the
ARKs (there are other things in N2T we'll disregard) with the following code:
```python
import pystow
import yaml

URL = "https://n2t.net/e/n2t_full_prefixes.yaml"
PROTOCOLS = {"https://", "http://", "ftp://"}


def get_prefix_map():
    """Get the prefix map from N2T, not including redundant ``ark:/`` in prefixes."""
    # Download once, cache locally with pystow, then parse the YAML
    with pystow.ensure_open("n2t", url=URL) as file:
        records = yaml.safe_load(file)
    prefix_map = {}
    for key, record in records.items():
        uri_prefix = record.get("redirect")
        # Keep only ARK records whose redirect is a web URL ending in a single $id
        if (
            not uri_prefix
            or all(not uri_prefix.startswith(protocol) for protocol in PROTOCOLS)
            or uri_prefix.count("$id") != 1
            or not uri_prefix.endswith("$id")
            or not key.startswith("ark:/")
        ):
            continue
        # str.removeprefix() and str.removesuffix() need Python 3.9+
        key = key.removeprefix("ark:/")
        prefix_map[key] = uri_prefix.removesuffix("$id") + "/" + key + "/"
    return prefix_map
```
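To see why each condition in that filter exists, here is a hypothetical helper that mirrors the same checks but reports a reason instead of silently skipping. The records it is run on are made up for illustration; they are not real N2T entries.

```python
from typing import Optional

PROTOCOLS = ("https://", "http://", "ftp://")


def rejection_reason(key: str, redirect: Optional[str]) -> Optional[str]:
    """Say why a record would be skipped by the filter, or return None if it is kept.

    Hypothetical helper mirroring the conditions in get_prefix_map().
    """
    if not redirect:
        return "no redirect"
    if not redirect.startswith(PROTOCOLS):  # str.startswith accepts a tuple
        return "not an http(s) or ftp URL"
    if redirect.count("$id") != 1:
        return "needs exactly one $id"
    if not redirect.endswith("$id"):
        # A URI prefix can only have the local unique identifier appended to it
        return "$id is not at the end"
    if not key.startswith("ark:/"):
        return "not an ark:/ key"
    return None


# Made-up records for illustration, not real N2T entries
for key, redirect in [
    ("ark:/99999", "https://example.org/ark:$id"),
    ("ark:/99998", "https://example.org/$id/view"),
    ("doi", "https://doi.org/$id"),
]:
    print(key, "->", rejection_reason(key, redirect) or "kept")
# ark:/99999 -> kept
# ark:/99998 -> $id is not at the end
# doi -> not an ark:/ key
```

The `$id`-at-the-end rule is the important one: a prefix map can only express "URI prefix plus local unique identifier", so any template that puts text after `$id` can't be represented.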
This prefix map removes `ark:/` from the beginning of the prefixes in N2T and
also folds the provider code into the URI prefix. That way, each URI prefix only
needs the local unique identifier within its provider appended to it, rather
than the rest of the entire ARK.
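Here is that string arithmetic for a single made-up record. The sketch assumes that, in an N2T redirect template, `$id` stands in for everything after `ark:`; under that assumption, filling in the template and appending the local unique identifier to the new URI prefix land on the same URL.

```python
# Made-up redirect template; assumes $id stands for everything after "ark:"
redirect = "https://example.org/ark:$id"
provider = "99999"
local_id = "abc123"

# Template style: substitute the whole tail of the ARK, provider code included
template_url = redirect.replace("$id", "/" + provider + "/" + local_id)

# Prefix-map style: fold the provider code into the URI prefix once,
# so resolving only needs the local unique identifier appended
uri_prefix = redirect.removesuffix("$id") + "/" + provider + "/"
prefix_url = uri_prefix + local_id

print(uri_prefix)  # https://example.org/ark:/99999/
print(template_url == prefix_url)  # True
```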
Once we have a prefix map, we can make a `curies.Converter` and a Flask web
application for resolving in a few lines:
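```python
from curies import Converter, get_flask_app


def get_app():
    """Get an ARK resolver app, noting that it uses a non-standard delimiter and URL prefix."""
    # get_prefix_map() is the function defined above
    prefix_map = get_prefix_map()
    # ARKs separate the provider code from the local unique identifier with "/", not ":"
    converter = Converter.from_prefix_map(prefix_map, delimiter="/")
    # Mount the resolver under /ark: so URLs look like /ark:/53355/cl010277627
    app = get_flask_app(converter, blueprint_kwargs=dict(url_prefix="/ark:"))
    return app
```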
The two tricks here are:

- We want to remove the redundant `ark:/`, then interpret the ARK provider code
  as the prefix and the rest as the local unique identifier. However, we still
  want URLs for our resolver to include the `ark:/` prefix. Luckily, a Flask
  blueprint can be given a `url_prefix` that gets prepended to all of its
  routes, which we pass through `blueprint_kwargs`.
- Unlike CURIEs, which use a colon `:` as the delimiter between the prefix and
  the local unique identifier, ARKs use a slash `/`. We can set this with the
  converter's `delimiter` argument.
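The following plain-Python sketch mimics what those two settings accomplish together: strip the URL prefix, split on the first slash, look up the provider code, and append the local unique identifier. It is not how `curies` or Flask actually implement routing, and the prefix map entry is hypothetical.

```python
from typing import Optional

# Hypothetical prefix map in the shape get_prefix_map() produces
PREFIX_MAP = {"99999": "https://example.org/ark:/99999/"}
URL_PREFIX = "/ark:"  # plays the role of the blueprint's url_prefix
DELIMITER = "/"  # ARKs separate provider code and local ID with a slash


def resolve_path(path: str) -> Optional[str]:
    """Map a request path like /ark:/99999/abc123 to a redirect URL, or None."""
    if not path.startswith(URL_PREFIX + "/"):
        return None
    # Drop "/ark:/" to leave "<provider code>/<local unique identifier>"
    curie = path[len(URL_PREFIX) + 1 :]
    prefix, sep, local_id = curie.partition(DELIMITER)
    if not sep or not local_id or prefix not in PREFIX_MAP:
        return None
    return PREFIX_MAP[prefix] + local_id


print(resolve_path("/ark:/99999/abc123"))
# https://example.org/ark:/99999/abc123
```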
Now, all we need to do is instantiate the app and serve it like any other Flask app, for example with Gunicorn, Uvicorn, or Flask's built-in development server (from Werkzeug). Navigating to http://localhost:5000/ark:/53355/cl010277627 redirects to https://collections.louvre.fr/ark:/53355/cl010277627, where you get some nice art from the Louvre. In general, while this server is running, you can append any ARK to http://localhost:5000/ to resolve it, as long as its provider code made it through the filter in `get_prefix_map()`.
All of this code is on GitHub and can be run with the following:
```shell
git clone https://github.com/cthoyt/n2t-ark-resolver
cd n2t-ark-resolver
python -m pip install -r requirements.txt
python wsgi.py
```
Update: since posting this, I have heard from John Kunze that the ARK format is
currently being updated to look more like URNs, so ARKs will no longer have the
slash in `ark:/`. If/when that happens, only a few bits of string
pre-processing in this script need to be updated to keep everything running.
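As one hypothetical way to handle that transition, both forms could be normalized before splitting. The helper name is made up, and the exact new syntax is an assumption based on the note above (that the newer form simply drops the slash after `ark:`).

```python
def normalize_ark(ark: str) -> str:
    """Return the part of an ARK after its scheme, accepting ``ark:/`` and ``ark:``.

    Hypothetical helper; assumes the newer form just drops the slash after ``ark:``.
    """
    # Check the longer, older form first so its slash is removed too
    if ark.startswith("ark:/"):
        return ark.removeprefix("ark:/")
    if ark.startswith("ark:"):
        return ark.removeprefix("ark:")
    raise ValueError(f"not an ARK: {ark}")


print(normalize_ark("ark:/53355/cl010277627"))  # 53355/cl010277627
print(normalize_ark("ark:53355/cl010277627"))  # 53355/cl010277627
```

The same normalization would replace the hard-coded `key.startswith("ark:/")` and `key.removeprefix("ark:/")` calls in `get_prefix_map()`.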