Linked Library Data: Making It Happen

Shlomo Sanders, Chief Technology Officer, Ex Libris

This post was originally published at the Ex Libris Developer Network Tech Blog

This article focuses on a practical way of making large amounts of library data from disparate library sources available as easily consumable linked data. The solution includes both discovery and access via URI of the underlying metadata. The focus is on use by non-library applications that do not need the intricacies and richness normally managed by libraries.

Libraries rely heavily on integrated systems for managing and delivering library services. These systems encompass a wide range of services and have multiple components empowered by a central catalog, whose records are usually in MARC 21 format. Years of effort have gone into making such records ever richer, but at the same time they have become more complex.

The Ex Libris Alma® unified management system employs a simplified record format for internal operations but must still interact with the library world and vendors that use MARC formats and occasionally Dublin Core metadata. To complicate the situation, many libraries maintain information in a variety of locations outside their primary catalog, such as in institutional repositories of theses or digital collections that are often based on Dublin Core.

To simplify access by patrons, many institutions provide a discovery system that offers a unified view of all their institutional data, whether it resides in the primary catalog or one of the institutional repositories. Ex Libris Primo®, for example, aggregates multiple sources into a common discoverable repository.

The non-library world, on the other hand, does not care which system the data resides in and cannot process MARC—and probably never will be able to. Linked-data standards hold the promise of making a two-way interchange of data possible between library systems and non-library systems on an as-needed and real-time basis.

Approaches to making library data accessible

There are two approaches to making library data accessible to the world: one is to draw information from each source system separately, and the other is to provide information in the form of linked data generated by a unified discovery system (such as Primo). Although it may lack some of the depth of description found in the source systems, a discovery system is a far simpler and more uniform conduit to the world.

We refer to a discovery system as unified if it combines data from multiple sources of library information, such as a library’s catalog and an institutional repository. A discovery system can make the following forms of information available as linked data: titles; URIs referring to authoritative authors and subjects, publication locations, and languages; publishers; descriptions; and availability information to help users access and borrow materials.

The richness of the available metadata depends wholly on the data that the discovery interface displays to its users—which usually includes the most important information that non-library users are likely to require. This information is also what non-library applications need in order to make use of a library’s descriptive data. An additional advantage of the discovery-system approach is that such a system is designed to be accessible by both people and computers in the world at large, and not just by local institutional users.

Basic linked-data services

Ex Libris is now working on making RESTful APIs available. Grounded in linked-data principles, these APIs will make end-to-end support possible for applications that are based on linked data. As a rule, Alma APIs are RESTful and may be configured to return JSON. The existing RESTful APIs use URIs to point to other objects maintained inside Alma. Alma has now added URI support for BIB records, including URIs to external objects such as language, subject, author, and some identifiers. Note that an author may have multiple URIs, for example one pointing to the Library of Congress and another to VIAF. Further down the road, Alma will be able to “publish” BIBs using the same standard and including the same URIs.
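To illustrate the point about multiple author URIs, here is a minimal Python sketch of how a consuming application might group an author's URIs by the authority that issued them. The function name, the host-to-label table, and the placeholder identifiers are assumptions made for this example; they do not reflect how Alma itself stores or returns these URIs.

```python
from urllib.parse import urlparse

# Host names of the two authorities mentioned in the text.
# The mapping itself is an assumption made for this sketch.
AUTHORITY_HOSTS = {
    "id.loc.gov": "Library of Congress",
    "viaf.org": "VIAF",
}


def group_by_authority(uris):
    """Group author URIs by issuing authority, based on the URI host name."""
    grouped = {}
    for uri in uris:
        host = urlparse(uri).hostname or ""
        if host.startswith("www."):
            host = host[4:]  # treat www.viaf.org the same as viaf.org
        label = AUTHORITY_HOSTS.get(host, "other")
        grouped.setdefault(label, []).append(uri)
    return grouped


# Placeholder identifiers, not real authority records.
author_uris = [
    "http://id.loc.gov/authorities/names/n00000000",
    "http://viaf.org/viaf/000000000",
]

for authority, uris in group_by_authority(author_uris).items():
    print(authority, uris)
```

An application could use such a grouping to choose whichever authority it already understands and ignore the rest.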

In Primo, a RESTful search API returns a list of search results, including URIs and basic metadata in JSON-LD format. Given this basic metadata, an application does not have to make a second call to display a simple list of results. The URIs give access to the full metadata, which is also in JSON-LD format. Ex Libris has already released a beta version of URI support in Primo; a JSON-LD context is described at https://github.com/ExLibrisGroup/primo.PNX-context, which is accessible to all.
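As a minimal sketch of how a non-library application might consume such a response, the Python snippet below parses a small JSON-LD-style document and builds a simple result list without making a second call. The sample payload, its field names (title, creator), the @graph wrapper, and the example.org URIs are assumptions made for illustration; they are not the actual Primo response format, which is governed by the published context.

```python
import json

# Hypothetical search response; the real Primo JSON-LD shape may differ.
SAMPLE_RESPONSE = """
{
  "@context": "https://example.org/context.json",
  "@graph": [
    {"@id": "https://example.org/record/1",
     "title": "Linked Data Basics",
     "creator": "A. Author"},
    {"@id": "https://example.org/record/2",
     "title": "Cataloging Today"}
  ]
}
"""


def list_results(raw):
    """Return (uri, title, creator) tuples from a JSON-LD-style search response."""
    doc = json.loads(raw)
    results = []
    for item in doc.get("@graph", []):
        uri = item.get("@id")
        if uri is None:
            continue  # skip nodes that have no URI to dereference
        results.append((uri, item.get("title", "(untitled)"), item.get("creator")))
    return results


# Display a simple result list using only the basic metadata.
for uri, title, creator in list_results(SAMPLE_RESPONSE):
    byline = f" by {creator}" if creator else ""
    print(f"{title}{byline} <{uri}>")
```

Each URI in the list is what the application would dereference later, and only when it needs the full metadata for a particular record.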

With this search API and URIs that return full metadata, more than two billion metadata records that reside in over a thousand institutions using Primo worldwide and in the Ex Libris Primo Central Index are now available. The Primo Central Index enables discovery of over a billion articles, e-books, and other types of content from a multitude of vendors. However, not all metadata in Primo Central is available via the URI because of vendor-imposed restrictions. The Primo URI provides access to metadata that a library has not defined as search restricted. Similarly, Primo Central URIs give access to metadata on which vendors have not imposed copyright restrictions. All records that are defined as open access (for example, institutional repositories that universities upload to Primo Central) are available through the Primo URI, as well as vendor metadata (in keeping with institutional licensing policies).

In its present form, the linked data returned by Primo contains a minimal number of links (URIs) to metadata records. As the service matures, Ex Libris plans to add links to more records, such as locations, author authority records, subject authority records, and records found in Primo that are either duplicates or additional editions that Primo has automatically identified as such.

Making linked data richer

One can easily envision end-to-end support for URIs in the Alma and Primo metadata ingest and cataloging processes. The option to incorporate such URIs would then be available in discovery services and in the linked data provided by Primo. Indeed, for authoritative URIs to achieve a high degree of accuracy, the metadata maintenance module (that is, cataloging) must take linked data into account and make persistent keys or URIs available for downstream use.

Ex Libris Alma and Primo SaaS deployments live in highly scalable multitenant environments. These SaaS environments proxy incoming RESTful API calls through an API gateway that serves a dual purpose. First, by providing a Try It Now button, the gateway enables any developer to obtain easy access to documentation and an API test harness, thereby dramatically reducing the time to first “hello world”. The second purpose of the gateway is to act as a run-time proxy so that unusual scenarios will not inadvertently lead to a denial of service. The proxy also ensures that an incoming URI will be automatically routed to the correct repository, thus facilitating the work of developers and keeping persistent URIs persistent despite the operational needs of live systems.
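The routing role of such a proxy can be sketched in a few lines of Python. This is only an illustration of the general idea of mapping a stable public URI to whichever backend currently holds the record; the routing table, the path prefixes, the host names, and the resolve function are all hypothetical and say nothing about how the Ex Libris gateway is actually implemented.

```python
from urllib.parse import urlparse

# Hypothetical routing table: public path prefix -> internal repository base URL.
ROUTES = {
    "/inst-a/": "https://backend-1.internal.example",
    "/inst-b/": "https://backend-2.internal.example",
}


def resolve(public_uri, routes=ROUTES):
    """Map a persistent public URI to the backend that currently holds the record."""
    path = urlparse(public_uri).path
    # Check longer prefixes first so the most specific route wins.
    for prefix in sorted(routes, key=len, reverse=True):
        if path.startswith(prefix):
            return routes[prefix] + path
    raise LookupError(f"no repository registered for {public_uri}")


public = "https://data.example.org/inst-a/record/42"
print(resolve(public))

# If the repository moves for operational reasons, only the table changes;
# the public URI that clients hold stays the same.
moved = dict(ROUTES, **{"/inst-a/": "https://backend-9.internal.example"})
print(resolve(public, moved))
```

Because clients only ever see the public URI, the backend can be relocated without breaking links that have already been published.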

Conclusions

Right now, while the world is just beginning to generate and use linked data, leveraging library discovery systems to help advance the growth of linked data seems to be the more pragmatic solution. In one fell swoop, Ex Libris is making library data available as linked data from Alma and Primo, with a consistent JSON-LD context. Furthermore, because the products are SaaS and have frequent update cycles, all users of Alma and Primo SaaS linked data will benefit immediately as the linked-data support deepens.

As a result of our research and consultation into linked data for libraries, we have produced a new discussion paper: Putting Linked Data at the Service of Libraries.

December 9, 2015
