Discovery Architecture for Cultural Heritage: Layered Retrieval, Institutional Authority, and the Limits of Keyword Search

Edward Monk

doi:10.5281/zenodo.20418999

Abstract

Cultural heritage discovery systems were designed for a retrieval environment that no longer fully describes the people who use them. The vocabulary of the cataloger and the vocabulary of the searcher are not the same, and no refinement of keyword search has been able to close that gap. This paper argues that the answer lies in a layered discovery architecture that combines keyword retrieval, semantic similarity, and AI-assisted query mediation, while preserving the authoritative institutional record as the only thing the end user ever sees. The central architectural principle is that AI belongs at the representation layer, not the interpretation layer. The system helps users find records. It does not generate, rewrite, or interpret them. LinkedCulture, an open-source prototype built across eight cultural heritage institutions and more than two hundred thousand records, demonstrates this architecture in operation and documents three observations from its deployment: that layered retrieval surfaces records keyword search alone misses, that neither retrieval mode dominates unconditionally across query types, and that a shared representational space appears to mediate vocabulary inconsistency across institutional boundaries in ways that single-institution search cannot replicate. The implications for how cultural heritage institutions approach discovery infrastructure, retrieval evaluation, and the appropriate role of AI in their systems are discussed.

1. The gap between collections and discovery

A researcher interested in the history of labor organizing searches a major archive's public interface. The collection holds hundreds of relevant items: photographs, pamphlets, correspondence, and objects of daily working life. All of them are cataloged, digitized, and publicly accessible. The researcher searches the terms they know. They find little, or they find the wrong things, or they find nothing at all. They leave without the objects that would have mattered most to their work, and without knowing those objects exist.

This is not an unusual situation. It is a persistent condition of cultural heritage discovery.

The problem is not the institution. Cultural metadata systems were built for stewardship and internal cataloging, not for conceptual public discovery. A cataloger describing a photograph in 1974 used the vocabulary available to them: object type, period, donor information, collection-specific taxonomy, the controlled language of their discipline at that moment. That vocabulary was precise, internally consistent, and appropriate for its purpose. It was never designed to anticipate the language a researcher, a student, or a curious member of the public would use fifty years later when searching from a browser. The vocabulary of the cataloger and the vocabulary of the searcher were built by different people, for different purposes, at different moments in time.

This is not simply a data quality problem. It is a structural condition of how cultural knowledge gets recorded and how people go looking for it. Metadata improvement alone cannot fully close that gap, because the gap exists not where the metadata is wrong, but where it is entirely correct and still insufficient for the discovery purpose it was never designed to serve.

This condition is not unique to any single institution or collection type. It affects museums, libraries, archives, and historical societies alike, whether their collections are large or small, digitized for decades or brought online last year. A researcher who does not share the institution's vocabulary cannot find what the institution holds, regardless of how carefully that material was described.

The field has not ignored this problem. Controlled vocabularies, authority files, linked data initiatives, and aggregation projects have each attempted to address portions of it [4], and each has made genuine contributions to interoperability and consistency. This paper does not dismiss that work. It argues that those efforts, valuable as they are, address the problem at the metadata layer rather than the discovery layer, and that closing the gap between cultural knowledge and the people who need it requires architectural thinking about how retrieval itself operates.

What makes this problem particularly difficult is that it is invisible to the institution. The collection is accessible. The search interface works. Objects are returned in response to queries. The failure, when it happens, looks like an absence of relevant material rather than a failure of the discovery system. The institution rarely knows what the user did not find.

This paper is concerned with that invisible failure, and with the architectural question it raises: what would a discovery system need to do differently to close the gap between how cultural knowledge is recorded and how people actually go looking for it? The answer, this paper argues, is not to replace keyword search, but to build a layered discovery architecture above and alongside it, one whose machinery remains invisible to the user, and whose only visible output is the authoritative institutional record.

2. Why keyword search, including its fuzzy variants, is insufficient alone

Keyword search works. That is worth stating plainly, because this paper is not an argument against it. When a researcher knows the term an institution used, keyword search returns the right records quickly and reliably. It is precise, predictable, and transparent in a way that matters enormously for institutional trust. A cataloger, an archivist, or a researcher with deep subject expertise can use it effectively. For known-item retrieval, it remains the appropriate tool.

The problem emerges at the boundary of that expertise. For the researcher who does not already know the institution's vocabulary, keyword search does not fail gradually. It fails completely. A search that does not share a single term with the relevant metadata returns nothing, and nothing looks identical to an empty collection. The user cannot tell the difference between a collection that does not hold what they are looking for and a collection that holds it under a name they have never encountered.

The field recognized this problem and responded with more flexible matching approaches: systems that tolerate spelling variation, account for typographic error, and return results for terms that closely resemble the query even when they do not match exactly. These are genuine improvements for a specific class of failure. A researcher who misspells a term, uses a British spelling where the catalog uses an American one, or transposes two letters in a proper name benefits directly from that flexibility.

But the gap that matters most is not a spelling gap. It is a meaning gap.

A keyword search for "symbols of liberty" illustrates a failure mode that is in some ways harder to detect than returning nothing at all. Results are returned. The search appears to work. But because the word "symbols" appears across thousands of unrelated catalog records, the result set is semantically incoherent: objects that share a term but not a meaning, assembled by the matching logic without regard for what the researcher was actually looking for.

The opposite failure is silent. A ceramic vessel cataloged with full accuracy, including object type, period, region, and material, will not surface for a researcher searching for objects associated with ritual or domestic life. The metadata is exemplary. The semantic distance between the cataloger's descriptive frame and the researcher's conceptual frame is not a quality problem. It is a vocabulary problem. Likewise, a researcher exploring themes of migration will not find objects cataloged under the geographic and period terminology of a specific collection's internal taxonomy.

These failures take different forms, sometimes an incoherent result set, sometimes an empty one, but they share the same structural cause: the vocabulary used to describe the object and the vocabulary the researcher brings to the search do not overlap, and no amount of surface-level matching flexibility changes that.
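The difference between a spelling gap and a meaning gap can be made concrete with a minimal sketch. It uses character-level similarity from Python's standard library as a stand-in for fuzzy matching in general. The function name, the 0.8 threshold, and the example terms are illustrative assumptions, not a description of any particular institution's search system.

```python
from difflib import SequenceMatcher


def fuzzy_match(query_term: str, catalog_term: str, threshold: float = 0.8) -> bool:
    """Return True when two terms are near-identical as character strings.

    The threshold is an arbitrary illustrative value, not a recommendation.
    """
    ratio = SequenceMatcher(None, query_term.lower(), catalog_term.lower()).ratio()
    return ratio >= threshold


# A spelling gap: the same word in British and American spelling.
spelling_gap_bridged = fuzzy_match("jewellery", "jewelry")

# A meaning gap: conceptually related terms that share almost no characters.
meaning_gap_bridged = fuzzy_match("grief", "memorial")

print(spelling_gap_bridged, meaning_gap_bridged)
```

Under these assumptions the spelling variant clears the threshold and the conceptually related pair does not. Lowering the threshold far enough to admit the second pair would also admit large numbers of unrelated terms, which is why tuning surface-level matching cannot substitute for a different retrieval signal.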

This is the boundary at which keyword search, in any of its forms, reaches the limit of what it was designed to do. It was designed to match terms. When the terms do not match, it has nothing to work with. The sophistication of the matching algorithm is beside the point when the query and the metadata occupy entirely different vocabularies.

It is also worth noting what this means for the institutions themselves. Keyword search creates an invisible asymmetry between users who already possess subject expertise and users who are arriving at a collection without it. The experienced researcher who knows the period terminology, the collection-specific controlled vocabulary, or the donor language finds the collection accessible and well-organized. The student, the curious member of the public, or the researcher approaching the subject from an adjacent discipline encounters a wall they cannot see. The collection appears not to hold what they are looking for. In most cases, they will not return.

This asymmetry is not a design failure on anyone's part. Keyword search emerged within retrieval environments where users were more likely to share institutional or disciplinary vocabulary with the catalog itself. In that environment, it works exactly as intended. The environment has changed. The public audiences for cultural heritage collections are broader, more diverse, and less likely to arrive with institutional vocabulary than at any previous point in the history of these collections. The retrieval model has not kept pace with that change.

3. The user's burden: finding the collections before you can search them

The failure described in the previous section assumes the researcher has already arrived at the right collection. That assumption deserves examination, because for most users, finding the right collection is itself a substantial and largely unacknowledged problem.

Consider what a researcher must do before they can search. They must first know that a relevant collection exists. This requires subject knowledge that most members of the public do not have and cannot reasonably be expected to acquire. They must then locate the collection's public interface, which may be a standalone website, a subdomain of an institutional site, a third-party aggregator, or some combination of all three. They must learn how that particular institution has organized its search: what filters are available, what controlled vocabulary is in use, what the collection's particular strengths and gaps are. None of this knowledge transfers to the next institution. A researcher who has become fluent in navigating one collection's interface and vocabulary begins again from zero at the next.

For a researcher working within a single well-defined subject area at a single institution, this burden is manageable. For anyone working across time periods, disciplines, or geographic regions, it is considerable. A researcher interested in the material culture of the Atlantic world, or the history of a particular craft tradition, or the iconography of a specific religious movement, may need to navigate dozens of separate institutional interfaces before they can be confident they have searched the relevant collections. Many will stop long before that point, not because the material does not exist, but because the cost of finding it is too high.

This is a discovery problem that precedes the retrieval problem entirely. It does not matter how good an institution's search interface is if the researcher does not know the institution exists. Retrieval and navigation are separate problems, and the field has historically concentrated on the first.

The navigation burden falls unevenly. A specialist researcher with institutional affiliations, subject expertise, and professional networks has access to knowledge about collections that a graduate student, an independent scholar, or a curious member of the public does not. The person who would benefit most from discovering an object in a collection they have never heard of is precisely the person least equipped to find it. That asymmetry is not a consequence of any individual institution's choices. It is a structural property of a discovery landscape organized around institutional silos rather than around the questions researchers actually bring to collections.

No discovery layer operating across the breadth of cultural heritage institutions has achieved the kind of unified public access that the scale and diversity of these collections would seem to demand. Meaningful aggregation efforts exist and have made genuine contributions. But aggregation of metadata is not the same as a shared discovery layer, and the navigation burden for researchers working across collections remains largely intact.

4. The infrastructure gap: why collections still do not talk to each other

The problems described in the preceding sections share a common structural root. They are not primarily failures of individual institutions. They are consequences of a discovery landscape that evolved institution by institution, collection by collection, without a shared layer connecting them. Each institution built its own interface, its own search conventions, its own metadata standards, and its own relationship with its public. The result is a fragmented ecosystem in which the connections between collections, connections that would make each collection more meaningful in relation to the others, are largely invisible to the people who need them most.

This fragmentation is not inevitable. Coordinated cultural heritage infrastructure developed in other parts of the world has demonstrated that cross-institutional discovery at scale is achievable. A researcher can search across the holdings of thousands of institutions through a single interface, encounter consistent search conventions, and surface relevant material without knowing in advance which institution holds it. The technical and organizational problems involved are substantial, but they have been solved there. Mustering the will to solve them elsewhere, along with the institutional coordination required to act on it, is the harder problem.

In the United States, the most significant effort to address this fragmentation at scale has been the Digital Public Library of America, launched in 2013 [2]. The DPLA aggregates metadata from hundreds of contributing institutions and presents it through a unified public interface, making it possible for a researcher to search across collections that would otherwise require separate visits to separate websites. That is a genuine and meaningful contribution to access. It represents exactly the kind of institutional coordination the field needs more of.

But the DPLA's architecture, like most aggregation efforts, addresses the navigation layer without changing the retrieval layer. A researcher using the DPLA still encounters the vocabulary problem described in Section 2. Aggregation multiplies access to the problem without resolving it.

This is the distinction that matters most for the argument this paper is making. The infrastructure problem and the retrieval problem are not the same problem. Solving the first, bringing collections into a shared discovery space, is necessary but not sufficient. Solving the second, making retrieval within that space responsive to conceptual proximity rather than only to terminology, requires a different kind of architectural thinking. The two problems must be addressed together, at the same layer of the system, or the gains from infrastructure investment will remain partial.

The question is not whether shared discovery infrastructure is worth building. It clearly is. The question is what kind of retrieval should operate within it. A shared discovery layer that still relies entirely on keyword matching is a better version of the current situation, but it is not a fundamentally different one. The researcher who does not share the cataloger's vocabulary remains effectively excluded, now from a larger and better-organized system that still cannot find what they are looking for on their behalf.

5. A layered discovery architecture: what it is, and what it is not

The preceding sections have established a layered problem. Discovery fails before retrieval begins, because users cannot find relevant collections without specialist knowledge. Retrieval fails at the vocabulary boundary, because keyword matching cannot bridge the gap between the cataloger's language and the researcher's language. Infrastructure efforts have made meaningful progress on the first problem without addressing the second. The question now is what a retrieval approach that addresses the second problem actually looks like, and what it requires of the systems and institutions that adopt it.

The core architectural shift is this: rather than matching the characters in a query against the characters in a metadata record, a retrieval system can encode both the query and the record as numerical representations in a shared space, where proximity can reflect conceptual association rather than textual identity. A query for "objects associated with grief" and a record described as a "memorial textile, West Africa, 19th century" may share no words at all, but in an appropriately trained representational space, they may be proximate. The retrieval system surfaces the record not because the terms matched, but because the representational relationships between the query and the record's descriptive language associate them within the shared space.
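A minimal sketch of proximity-based retrieval follows. The three-dimensional vectors are hand-made for illustration and are not the output of any embedding model; a real system would obtain vectors with hundreds of dimensions from a trained model. The second record (a printed survey map) is a hypothetical contrast case that does not come from the LinkedCulture index.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors chosen by hand (an assumption for illustration only).
QUERY_VECTOR = [0.9, 0.1, 0.2]  # "objects associated with grief"
RECORD_VECTORS = {
    "memorial textile, West Africa, 19th century": [0.8, 0.3, 0.1],
    "survey map, printed, 1880 (hypothetical record)": [0.1, 0.2, 0.9],
}


def rank_by_proximity(query_vector, record_vectors):
    """Return record descriptions ordered from most to least proximate."""
    scored = [
        (cosine_similarity(query_vector, vector), description)
        for description, vector in record_vectors.items()
    ]
    return [description for _, description in sorted(scored, reverse=True)]


for description in rank_by_proximity(QUERY_VECTOR, RECORD_VECTORS):
    print(description)
```

With these toy values the memorial textile ranks first even though its description shares no word with the query. The ranking depends entirely on where the vectors sit, which is why the quality of the representational space, rather than the similarity arithmetic, determines what such a system can find.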

This is AI at the representation layer, not the interpretation layer. The model that constructs the representational space is a machine learning system trained on large bodies of text [3]. It encodes relationships that emerge from patterns in that training data, including associations between concepts, periods, materials, and cultural contexts that do not share explicit lexical connections. What it produces is not an interpretation of an object's meaning. It is a representation of the associative relationships encoded in the object's descriptive metadata, situated within a space that can be queried by conceptual proximity rather than by term matching alone.

It is important to be precise about what this architecture does not do, because the distinction matters for institutional trust.

It does not generate answers. The system does not produce summaries, explanations, or descriptions of objects. It surfaces records. What the researcher sees is the cataloged record the institution created, unaltered and unmediated.

It does not rewrite metadata. The descriptive language the institution used remains exactly as the cataloger wrote it. The retrieval system works with that language as input. It does not modify, supplement, or override it.

It does not claim curatorial authority. The associative relationships the system encodes are retrieval signals, not interpretive statements. When the system surfaces a memorial textile in response to a query about grief, it is not asserting that the object represents grief. It is identifying a proximity in representational space that makes the record a plausible candidate for the researcher's attention. The interpretation remains entirely with the researcher and the institution.

It does not replace keyword search. Keyword search remains in the retrieval stack. For known-item retrieval, for searches where the researcher already possesses the relevant vocabulary, keyword matching is faster, more transparent, and more precise than similarity-based retrieval. A layered architecture uses both, routing queries through whichever combination of retrieval signals is most likely to surface relevant records.
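One way to make the boundary described above hard to cross by accident is to let the retrieval layer return only record identifiers, and to read what the user sees from the institutional store. The sketch below assumes this design for illustration; the class, function, and field names are hypothetical and do not describe LinkedCulture's actual code.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class InstitutionalRecord:
    """The cataloged record as the institution wrote it (hypothetical fields)."""
    record_id: str
    institution: str
    description: str


def present_results(ranked_ids, record_store):
    """Map ranked identifiers back to stored records without touching them.

    The retrieval layer decides the order only; it never supplies display text.
    """
    return [record_store[rid] for rid in ranked_ids if rid in record_store]


# Hypothetical store with a single record.
store = {
    "r1": InstitutionalRecord(
        record_id="r1",
        institution="Museum A",
        description="memorial textile, West Africa, 19th century",
    )
}

results = present_results(["r1", "not-in-store"], store)
print(results[0].description)

# Any attempt to rewrite the record fails loudly.
try:
    results[0].description = "a textile about grief"
except FrozenInstanceError:
    print("record is read-only")
```

The design choice here is that interpretation has no channel through which to enter the result: the ranking can be wrong, but the displayed text can only be the institution's own.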

This layered approach is not a new concept in retrieval systems generally. Combining multiple retrieval signals, including keyword matching, semantic similarity, and other relevance signals, is a well-established practice in document retrieval and commercial search [1]. What is less established is how these techniques behave when applied to cultural heritage metadata specifically, where the data properties, the retrieval objectives, and the institutional requirements differ substantially from the environments in which these systems were originally developed and optimized.

Cultural heritage metadata is sparse in ways that commercial document collections are not. It is inconsistent across institutions in ways that reflect genuine differences in cataloging philosophy and historical practice. It is multilingual, and the linguistic boundaries between collections do not correspond to the conceptual boundaries between the objects they describe. It carries authority structures and provenance obligations that have no equivalent in product retrieval or web search. And the retrieval objective itself is different: the goal is not relevance to a commercial intent or a navigational query, but the surfacing of objects that a researcher may not have known to look for, from collections they may not have known existed.

6. LinkedCulture: observations from a working system

LinkedCulture is an open-source semantic search prototype built across the public collections of eight cultural heritage institutions, five based in the United States and three in Europe. The index spans more than two hundred thousand records drawn from publicly accessible collection APIs and encoded as vector representations using an open-source embedding model [3]. No proprietary infrastructure is involved; the system is built entirely from open-source components.

The architecture implements the layered retrieval model described in the preceding section. Keyword matching and semantic similarity retrieval operate in parallel, and their relative contribution to the result set is governed by reciprocal rank fusion (RRF) [1], a retrieval-ranking technique that combines independently ranked result lists into a single merged ranking. In LinkedCulture the balance between the two lists can be adjusted in real time. A researcher can weight retrieval entirely toward keyword matching, entirely toward semantic similarity, or to any point along the continuum between them. That adjustability is not merely a convenience feature. It makes the behavioral difference between the two retrieval modes directly observable, and it allows the researcher to adapt the system to the nature of their query rather than adapting their query to the limitations of the system.
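The fusion step can be sketched as follows. In its usual form, RRF gives each record a score of 1 / (k + rank) for every list in which it appears and sums those scores. The single `semantic_weight` parameter below is one plausible way to expose an adjustable balance and is an assumption made for illustration; the text above does not specify how LinkedCulture implements its weighting. The constant k = 60 is a value commonly used with RRF, and the record identifiers are hypothetical.

```python
def weighted_rrf(keyword_ranking, semantic_ranking, semantic_weight=0.5, k=60):
    """Merge two ranked lists of record identifiers into one ranking.

    semantic_weight = 0.0 uses keyword results only, 1.0 uses semantic
    results only, and values in between blend the two lists.
    """
    if not 0.0 <= semantic_weight <= 1.0:
        raise ValueError("semantic_weight must be between 0.0 and 1.0")

    weighted_lists = (
        (1.0 - semantic_weight, keyword_ranking),
        (semantic_weight, semantic_ranking),
    )
    scores = {}
    for weight, ranking in weighted_lists:
        if weight == 0.0:
            continue  # a list with zero weight contributes no candidates
        for rank, record_id in enumerate(ranking, start=1):
            scores[record_id] = scores.get(record_id, 0.0) + weight / (k + rank)

    # Highest score first; ties are broken by identifier for determinism.
    return sorted(scores, key=lambda rid: (-scores[rid], rid))


keyword_hits = ["A", "B", "C"]   # hypothetical record identifiers
semantic_hits = ["C", "D", "A"]

print(weighted_rrf(keyword_hits, semantic_hits, semantic_weight=0.0))
print(weighted_rrf(keyword_hits, semantic_hits, semantic_weight=0.8))
print(weighted_rrf(keyword_hits, semantic_hits, semantic_weight=1.0))
```

Because the fusion operates on ranks rather than raw scores, the two retrieval modes do not need comparable scoring scales, which is part of what makes the technique practical for combining keyword and vector retrieval.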

Two queries illustrate the observed behavior clearly.

A search for "symbols of liberty" run against keyword retrieval alone returns results, but the result set is incoherent. The word "symbols" appears as a generic descriptive term across thousands of unrelated catalog records, and the matching logic assembles them without regard for conceptual relevance. The results share a term but not a theme. When the same query runs against semantic retrieval, the result set shifts. Objects conceptually associated with liberty surface: representations of the Statue of Liberty, works entitled liberty, objects whose descriptive metadata associates them with freedom, national identity, and related conceptual territory. Many of these records do not contain the word "symbols" at all. The retrieval is operating on associative proximity in the representational space, not on term frequency in the catalog.

A search for "migration and exile" produces a different and in some ways more instructive observation. Keyword retrieval is not simply insufficient here. It is actively misleading, returning records that contain those terms in unrelated contexts while failing to surface objects whose descriptive metadata encodes the experience of displacement, movement, and loss without using those words. Semantic retrieval surfaces a broader and more conceptually coherent set of candidates. But the most useful result set is neither the keyword result nor the semantic result alone. It is the hybrid, where both retrieval signals contribute and their relative weight is adjusted to suit the query. A researcher with specific terminology can weight toward keyword precision. A researcher exploring a theme they cannot fully articulate can weight toward semantic breadth. The system adapts to the researcher rather than requiring the researcher to adapt to the system.

These two examples illustrate a behavioral pattern observed consistently across the index. Queries that use generic or thematic language tend to benefit from semantic weighting. Queries that use specific institutional or disciplinary terminology tend to benefit from keyword weighting. Neither retrieval mode dominates unconditionally. The appropriate balance depends on the nature of the query, the characteristics of the underlying metadata, and the researcher's own degree of familiarity with the collection vocabulary. A system that offers only one mode forecloses options that a layered architecture keeps open.

A third observation concerns behavior that was not designed into the system but emerged from operating the index across multiple institutions simultaneously. When records from eight collections with different cataloging conventions, different controlled vocabularies, and different institutional histories are encoded into a shared representational space, the space itself appears to mediate a degree of the vocabulary inconsistency between them. Objects that are conceptually proximate but described in institutionally different language find proximity to each other in the shared space that no single institution's catalog could surface independently. A query for "maritime migration" run against keyword retrieval returns few images of vessels on water, because the catalog records that hold those images rarely use that phrase. The same query run against semantic retrieval surfaces them consistently across multiple institutions, drawing on associative proximity between the query and the descriptive language of objects depicting sea crossings, ocean travel, and displacement by water. The records originate from different collections with different cataloging traditions. The shared representational space surfaces a connection between them that no single institution's catalog could have made visible.

Hybrid retrieval does not eliminate retrieval error. Reciprocal rank fusion [1] merges ranked candidate lists from both retrieval modes, but it has no mechanism for evaluating the quality of those lists before combining them. A weak keyword result set, one that contains a small number of marginally relevant records alongside many that are not, contributes its full contents to the merged ranking. Because RRF in its standard form treats both lists as equally trustworthy inputs regardless of how they were produced, the fusion process can elevate weakly relevant or irrelevant keyword candidates higher in the ranking than either retrieval mode independently would justify. In cultural heritage environments, where metadata is frequently sparse and thematic queries often produce low-confidence keyword candidates, this is a consistent rather than occasional problem. The adjustable weighting mechanism in LinkedCulture allows a researcher to reduce the contribution of a poorly performing retrieval mode in real time, but determining the appropriate weighting for a given query, collection, and metadata density remains an open problem.
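The failure mode is easy to reproduce with unweighted RRF and a deliberately weak keyword list. In the sketch below, all identifiers are hypothetical, the keyword list is assumed to contain only noise, and k = 60 is the commonly used constant.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Unweighted RRF: every input list is trusted equally."""
    scores = {}
    for ranking in rankings:
        for rank, record_id in enumerate(ranking, start=1):
            scores[record_id] = scores.get(record_id, 0.0) + 1.0 / (k + rank)
    # Highest score first; ties are broken by identifier for determinism.
    return sorted(scores, key=lambda rid: (-scores[rid], rid))


# Assumed scenario: the keyword list holds only irrelevant term matches,
# while the semantic list holds four relevant candidates in order.
keyword_hits = ["kw-noise-1", "kw-noise-2"]
semantic_hits = ["sem-1", "sem-2", "sem-3", "sem-4"]

fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
print(fused)
```

A record's fused score depends only on its rank within its own list, so the top keyword hit receives the same score as the top semantic hit, however poor the keyword list is. The noise records therefore interleave with the best semantic candidates and sit above the lower-ranked ones, which is the behavior that motivates making the weighting adjustable.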

This is an observation, not a claim of systematic performance. The representational space does not resolve all vocabulary differences, and it introduces associative noise of its own. Records surface as candidates on the basis of proximity that do not always serve the researcher's intent. The system requires evaluation and refinement over time. But the cross-institutional associative behavior is real, consistently observed, and represents a property of the shared representational space that has direct implications for how multi-institution discovery infrastructure might be designed and evaluated.

7. What this means for the field

The observations described in the preceding section are not, individually, surprising to anyone familiar with the retrieval literature. Layered retrieval systems that combine keyword and semantic signals are well documented. The behavior of vector representations across heterogeneous text collections is an active area of research. What is less documented is how these techniques behave specifically in cultural heritage environments, and what the implications of that behavior are for institutions whose obligations extend beyond retrieval performance to questions of authority, provenance, and public trust.

The argument this paper has built rests on a single architectural claim: that the most important boundary in a cultural heritage discovery system is not the boundary between keyword and semantic retrieval, but the boundary between the retrieval layer and the interpretation layer. Everything that helps a researcher find a record belongs on one side of that boundary. Everything that constitutes the institutional meaning of that record belongs on the other. Keeping those two things separate is not a technical preference. It is an institutional requirement.

This matters for how the field thinks about adopting AI-assisted retrieval. The anxiety most commonly expressed by museum and library professionals about AI in their systems is not about retrieval performance. It is about authority. Who is responsible for what the system says about an object? Whose interpretation is being presented to the public? A layered discovery architecture that observes the boundary described in this paper answers those questions cleanly. The system says nothing about objects. It surfaces them. What they mean, in what context, with what authority, remains entirely with the institution and the cataloger.

That distinction also changes how the field should evaluate these systems. Retrieval performance in cultural heritage environments cannot be assessed using the same metrics applied in commercial search. There is no ground truth for relevance in exploratory cultural discovery. A researcher searching for objects associated with exile may find something unexpected and valuable that no relevance metric would have predicted. The appropriate evaluation framework is one that asks whether the system expanded the range of discoverable objects without misrepresenting any of them, and whether it did so while preserving the integrity of the institutional record. Those are different questions from the ones commercial retrieval evaluation asks, and they require different methods.
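The two questions in that framework can be given a rough operational form. The sketch below is one possible starting point under stated assumptions, not an established evaluation method: it counts records that layered retrieval surfaces in its top results and that keyword retrieval alone does not, and it checks that every displayed record matches the authoritative text. The function names, the cutoff, and the sample data are hypothetical.

```python
def newly_discoverable(keyword_results, layered_results, top_n=10):
    """Records in the layered top-n that are absent from the keyword top-n.

    This measures expanded reach only. Whether the added records are
    useful still requires human judgment.
    """
    baseline = set(keyword_results[:top_n])
    return [rid for rid in layered_results[:top_n] if rid not in baseline]


def records_unaltered(displayed, authoritative):
    """True when every displayed description equals the institutional text."""
    return all(
        authoritative.get(record_id) == text
        for record_id, text in displayed.items()
    )


# Hypothetical sample data.
keyword_results = ["r1", "r2"]
layered_results = ["r3", "r1", "r4"]
authoritative = {"r1": "pamphlet, 1912", "r3": "photograph, dock workers"}
displayed = {"r1": "pamphlet, 1912", "r3": "photograph, dock workers"}

print(newly_discoverable(keyword_results, layered_results))
print(records_unaltered(displayed, authoritative))
```

Neither check says anything about relevance. They only make the two properties the paragraph above names, expanded reach and record integrity, observable enough to track over time.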

The field is at an early stage of this work. LinkedCulture is one demonstration, built on open infrastructure, operating at a scale that is meaningful but not exhaustive. Its observations are a starting point, not a conclusion. What they suggest is that layered retrieval infrastructure for cultural heritage is not only technically feasible but behaviorally interesting in ways that merit serious investigation. The cross-institutional associative behavior observed in the shared representational space is particularly worth pursuing, because it points toward a model of discovery that does not require vocabulary alignment between institutions as a precondition for cross-institutional search. If that property holds at larger scale and across more heterogeneous collections, it has significant implications for how shared cultural heritage infrastructure might be designed.

The institutions that engage seriously with these questions now will have more influence over how these systems develop than those that wait.

8. Conclusion and invitation

Cultural heritage discovery systems were not designed to fail. They were designed for a retrieval environment that no longer fully describes the people who use them. The vocabulary of the cataloger and the vocabulary of the searcher have diverged over time, across disciplines, and across languages, and no refinement of keyword search has been able to close that gap. The result is a persistent and largely invisible failure mode in which objects exist in publicly accessible collections and remain practically undiscoverable to the researchers, students, and members of the public who need them.

The argument this paper has made is not that AI solves this problem. It is that a layered discovery architecture, one that combines keyword retrieval, semantic similarity, and AI-assisted query mediation while preserving the authoritative institutional record as the only thing the end user ever sees, addresses it more completely than any single retrieval approach can alone. AI belongs at the representation layer, not the interpretation layer. The machinery is invisible. The record is what surfaces.
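The combination step can be sketched briefly. The example below assumes reciprocal rank fusion [1] as the method for merging the keyword and semantic rankings. That choice is an assumption made for illustration, and the function name, the constant `k = 60`, and the record identifiers are hypothetical rather than taken from the LinkedCulture code. The fused output is a list of record identifiers only, so whatever is displayed is still the unmodified institutional record.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of record ids into one ranked list.

    Each record scores 1 / (k + rank) for every list it appears in,
    with rank counted from 1. The value k = 60 is an assumed default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, record_id in enumerate(ranking, start=1):
            scores[record_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by record id for a stable order.
    return sorted(scores, key=lambda rid: (-scores[rid], rid))


# Illustrative rankings only; these identifiers are invented.
keyword_ranking = ["r1", "r2", "r3"]
semantic_ranking = ["r4", "r2", "r1"]

fused = reciprocal_rank_fusion([keyword_ranking, semantic_ranking])
print(fused)
```

Records that appear in both rankings rise to the top, while records found by only one mode are kept rather than discarded, which is the behavior a layered approach depends on.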

LinkedCulture is an open-source demonstration of this architecture, built across eight institutions and more than two hundred thousand records using fully open infrastructure. Its observations are preliminary, and its scale is modest relative to the breadth of existing cultural heritage holdings. But it demonstrates that the approach is operational, that its cross-institutional behavior is consistent with the architectural principles this paper describes, and that the capacity of the shared representational space to surface conceptual associations across institutional vocabulary boundaries warrants serious further investigation.

Two threads from this paper will be developed in subsequent work. The behavior of the embedding model and the pipeline architecture that produces the shared representational space are the subject of the second paper in this series. The emergent semantic cluster structure observed within the combined index, groupings that no cataloger defined but that the representational space surfaces consistently, is the subject of the third. Together these papers aim to provide a methodological account of layered discovery infrastructure for cultural heritage that is detailed enough to be evaluated, reproduced, and improved upon.

Institutions interested in exploring this approach, contributing collections to the index, or engaging with the research questions this work raises are invited to make contact. LinkedCulture remains an active research prototype, and additional technical documentation will accompany subsequent papers in this series. The system and the observations described here can be examined through the live LinkedCulture demonstration platform [6].

The observations presented in this paper are qualitative and exploratory rather than benchmark-driven. No formal relevance study or comparative retrieval evaluation was conducted as part of this work. The purpose of the prototype is architectural observation: to examine how layered retrieval behaves across heterogeneous cultural heritage collections, rather than to optimize retrieval performance against a fixed benchmark.

The collections exist. The records are there. The work is to build the infrastructure that makes them findable by the people who need them, without asking those people to first become experts in the vocabularies of the institutions that hold them.

References

[1] Cormack, Gordon V., Charles L.A. Clarke, and Stefan Buettcher. 2009. “Reciprocal Rank Fusion Outperforms Condorcet and Individual Rank Learning Methods.” SIGIR ’09: Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval, 758–759. https://doi.org/10.1145/1571941.1572114
[2] Digital Public Library of America. 2013. About DPLA. Accessed 2025. https://dp.la/about
[3] Nussbaum, Zach, John X. Morris, Brandon Duderstadt, and Andriy Mulyar. 2024. “Nomic Embed: Training a Reproducible Long Context Text Embedder.” arXiv:2402.01613. https://arxiv.org/abs/2402.01613
[4] Schreiber, Guus, Alia Amin, Lora Aroyo, Mark van Assem, Victor de Boer, Lynda Hardman, Michiel Hildebrand, Borys Omelayenko, Jacco van Ossenbruggen, Anna Tordai, Jan Wielemaker, and Bob Wielinga. 2008. “Semantic Annotation and Search of Cultural-Heritage Collections: The MultimediaN E-Culture Demonstrator.” Web Semantics 6 (4): 243–249. https://doi.org/10.1016/j.websem.2008.08.001
[5] Monk, Edward. 2026. Discovery Architecture for Cultural Heritage: Layered Retrieval, Institutional Authority, and the Limits of Keyword Search. Working paper. Zenodo. https://doi.org/10.5281/zenodo.20418999
[6] LinkedCulture Demonstration Platform. 2026. Accessed 2026. https://linkedculture.org

How to cite

Monk, Edward. 2026. “Discovery Architecture for Cultural Heritage: Layered Retrieval, Institutional Authority, and the Limits of Keyword Search.” LinkedCulture Research Paper Series No. 1. Zenodo. https://doi.org/10.5281/zenodo.20418999

This is the HTML edition. The archival, citable version of record is the Zenodo deposit (DOI 10.5281/zenodo.20418999).