Live - MVP
LinkedCulture
Unified Cultural Search
Published paper
Discovery architecture for cultural heritage
Layered retrieval, institutional authority, and the limits of keyword search
Zenodo · May 27, 2026 · CC BY 4.0
Central claim
AI belongs at the representation layer, not the interpretation layer.
LinkedCulture helps users find authoritative museum records. It does not generate, rewrite, or interpret those records for them.
- Scope: 8 cultural heritage institutions
- Scale: 200k+ open-access records
- Architecture: keyword + semantic + query mediation
Abstract
Cultural heritage discovery systems were designed for a retrieval environment that no longer fully describes the people who use them. The vocabulary of the cataloger and the vocabulary of the searcher are not the same, and no refinement of keyword search has been able to close that gap. This paper argues that the answer lies in a layered discovery architecture that combines keyword retrieval, semantic similarity, and AI-assisted query mediation, while preserving the authoritative institutional record as the only thing the end user ever sees. The central architectural principle is that AI belongs at the representation layer, not the interpretation layer. The system helps users find records. It does not generate, rewrite, or interpret them. LinkedCulture, an open-source prototype built across eight cultural heritage institutions and more than two hundred thousand records, demonstrates this architecture in operation and documents three observations from its deployment: that layered retrieval surfaces records keyword search alone misses, that neither retrieval mode dominates unconditionally across query types, and that a shared representational space appears to mediate vocabulary inconsistency across institutional boundaries in ways that single-institution search cannot replicate. The implications for how cultural heritage institutions approach discovery infrastructure, retrieval evaluation, and the appropriate role of AI in their systems are discussed.
Live prototype
Try the hybrid search described in the paper.
The hybrid interface combines keyword retrieval with semantic similarity and lets users tune the balance between exact search and exploratory discovery. It is the clearest public demonstration of LinkedCulture's layered retrieval architecture.
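The tunable balance between exact search and exploratory discovery can be sketched as a weighted fusion of two score lists. This is a minimal sketch, not LinkedCulture's actual scoring. Min-max normalization, linear fusion, the `alpha` weight, the function names and the record IDs are all illustrative assumptions.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale one retriever's scores to [0, 1] so the two modes are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {rid: 1.0 for rid in scores}  # a single score (or all ties) counts as a full match
    return {rid: (s - lo) / (hi - lo) for rid, s in scores.items()}


def hybrid_rank(
    keyword: dict[str, float], semantic: dict[str, float], alpha: float
) -> list[tuple[str, float]]:
    """Linear fusion: alpha=1.0 is pure keyword search, alpha=0.0 is pure semantic search."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    kw, sem = min_max(keyword), min_max(semantic)
    # Union of both result sets: a record missing from one retriever scores 0 there.
    fused = {
        rid: alpha * kw.get(rid, 0.0) + (1.0 - alpha) * sem.get(rid, 0.0)
        for rid in kw.keys() | sem.keys()
    }
    # Highest fused score first; ties broken by record ID for a stable order.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


keyword_scores = {"met:1": 12.0, "aic:7": 3.0}                     # e.g. BM25-style scores
semantic_scores = {"aic:7": 0.82, "rijks:3": 0.79, "met:1": 0.40}  # e.g. cosine similarities

for alpha in (1.0, 0.5, 0.0):
    print(alpha, [rid for rid, _ in hybrid_rank(keyword_scores, semantic_scores, alpha)])
```

Moving `alpha` toward 0.0 lets `rijks:3`, a record keyword search never returned, rise to second place. This is the layered behavior the paper describes, reduced to a single slider.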
LinkedCulture Project Overview
A layered discovery prototype for open-access cultural heritage records across major museum collections. LinkedCulture combines keyword retrieval, semantic similarity, multilingual search, topic clustering, and shareable visual interfaces while preserving the institutional record as the authoritative object shown to users. The architecture places AI at the representation layer, not the interpretation layer: AI helps records become findable, but does not rewrite or replace museum metadata. It is built on Ollama embeddings and the Qdrant vector database, with a custom ingestion pipeline that can be rerun against updated collections and extended to additional institutions.
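A rerunnable ingestion step that keeps the institutional record authoritative might look like the sketch below. The pipeline's real design is not described here. The ID scheme, the SHA-256 change detection, the field names and `placeholder_embed` are all hypothetical. A plain dict stands in for the vector database, and `embed` stands in for an embedding-model call.

```python
import hashlib
import json
from typing import Callable

Vector = list[float]


def record_key(record: dict) -> str:
    # Hypothetical point ID: institution code plus that institution's own record ID.
    return f"{record['institution']}:{record['record_id']}"


def content_hash(record: dict) -> str:
    # Canonical JSON (sorted keys) so field order does not change the fingerprint.
    canonical = json.dumps(record, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def ingest(
    records: list[dict], index: dict[str, dict], embed: Callable[[str], Vector]
) -> dict[str, int]:
    """Upsert new or changed records and skip unchanged ones, so reruns are cheap."""
    stats = {"added": 0, "updated": 0, "unchanged": 0}
    for record in records:
        key = record_key(record)
        digest = content_hash(record)
        existing = index.get(key)
        if existing is not None and existing["hash"] == digest:
            stats["unchanged"] += 1
            continue
        # Only the vector is derived by a model; the payload is a copy of the source record.
        text = " ".join(str(record.get(field, "")) for field in ("title", "description"))
        index[key] = {"hash": digest, "vector": embed(text), "payload": dict(record)}
        stats["added" if existing is None else "updated"] += 1
    return stats


def placeholder_embed(text: str) -> Vector:
    # Hypothetical stand-in for a real embedding call; returns a trivial 1-d vector.
    return [float(len(text))]


index: dict[str, dict] = {}
batch = [{"institution": "met", "record_id": "1", "title": "Amphora", "description": "Terracotta"}]
print(ingest(batch, index, placeholder_embed))  # first run: record added
print(ingest(batch, index, placeholder_embed))  # rerun with nothing changed: skipped
batch[0]["description"] = "Terracotta, black-figure"
print(ingest(batch, index, placeholder_embed))  # source record edited upstream: updated
```

The model only contributes the vector. The payload returned to users is the institution's own metadata, copied unchanged.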

LinkedCulture Topics
Extends the same index into unsupervised discovery: record embeddings are grouped into clusters, each cluster is labeled using its object descriptions, and the result is promoted to a draft topic for editing. The result is a set of thematic groupings that emerge purely from the geometry of the embedding space: objects that land near each other semantically, regardless of institution, date, or catalog category. Topics are a way of reading what the model already knows about the collection. To date, the system has identified over 1,939 clusters across 92 published topics; every draft topic is reviewed before it is published. LinkedCulture Topics are then folded back into the semantic search results to enrich discovery further.
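The cluster-then-label idea can be illustrated with a small sketch. The source names neither the clustering algorithm nor the labeling method. Here plain k-means with deterministic farthest-point seeding and a most-frequent-terms label are swapped in as stand-ins. The 2-d vectors are toy values, not real embeddings, and all function names are hypothetical.

```python
import re
from collections import Counter

STOPWORDS = {"a", "an", "and", "the", "of", "with", "in", "on", "for"}


def dist2(a: list[float], b: list[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors: list[list[float]], k: int, iters: int = 20) -> list[int]:
    """Plain k-means with deterministic farthest-point initialization."""
    if not 1 <= k <= len(vectors):
        raise ValueError("k must be between 1 and the number of vectors")
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        # Next seed: the vector farthest from its nearest existing centroid.
        far = max(vectors, key=lambda v: min(dist2(v, c) for c in centroids))
        centroids.append(list(far))
    assign = [0] * len(vectors)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: dist2(v, centroids[c])) for v in vectors]
        for c in range(k):
            members = [v for v, a in zip(vectors, assign) if a == c]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def draft_label(descriptions: list[str], n_terms: int = 2) -> str:
    """Draft label from the most frequent non-stopword terms in member descriptions."""
    counts = Counter(
        word
        for text in descriptions
        for word in re.findall(r"[a-z]+", text.lower())
        if word not in STOPWORDS
    )
    return " / ".join(word for word, _ in counts.most_common(n_terms))


def draft_topics(descriptions: list[str], vectors: list[list[float]], k: int) -> dict[int, str]:
    assign = kmeans(vectors, k)
    return {
        c: draft_label([d for d, a in zip(descriptions, assign) if a == c])
        for c in sorted(set(assign))
    }


descriptions = [
    "Blue and white porcelain vase",
    "Porcelain tea bowl with blue glaze",
    "Bronze ritual vessel",
    "Bronze mirror with ritual motifs",
]
# Toy 2-d vectors standing in for real record embeddings.
vectors = [[0.90, 0.10], [0.85, 0.15], [0.10, 0.90], [0.15, 0.95]]
print(draft_topics(descriptions, vectors, k=2))
```

Labels produced this way are only drafts. That matches the editorial review step the project applies before a topic is published.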
LinkedCulture Multilingual
Multilingual semantic search across the LinkedCulture collections. A French-language interface for French-speaking users who explore the collections by concept and meaning. It uses a multilingual embedding index to search across linguistic and institutional boundaries.
Who It Serves
Museums, archives, libraries, researchers, digital humanities teams, and cultural heritage organizations exploring keyword and semantic discovery across collections.
Pilot Fit
A useful pilot would index one collection or a small cross-institutional set, then evaluate search quality against real research, education, or public discovery tasks.
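One way to score such a pilot is to collect, for each real task, the records a domain expert judges relevant, then compare retrieval modes on them. Recall@k and mean reciprocal rank are standard IR metrics chosen here as an assumption; the paper's own evaluation protocol may differ. The task IDs and record IDs are invented toy data.

```python
def recall_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Share of the judged-relevant records that appear in the top k results."""
    return len(set(ranked[:k]) & relevant) / len(relevant)


def reciprocal_rank(ranked: list[str], relevant: set[str]) -> float:
    """1/rank of the first relevant record, or 0.0 if none was returned."""
    for position, record_id in enumerate(ranked, start=1):
        if record_id in relevant:
            return 1.0 / position
    return 0.0


def evaluate(
    runs: dict[str, dict[str, list[str]]], qrels: dict[str, set[str]], k: int = 10
) -> dict[str, dict[str, float]]:
    """runs[mode][task] -> ranked record IDs; qrels[task] -> IDs judged relevant."""
    if not qrels or any(not rel for rel in qrels.values()):
        raise ValueError("every task needs at least one relevant record")
    report = {}
    for mode, results in runs.items():
        recalls = [recall_at_k(results.get(t, []), rel, k) for t, rel in qrels.items()]
        rrs = [reciprocal_rank(results.get(t, []), rel) for t, rel in qrels.items()]
        report[mode] = {f"recall@{k}": sum(recalls) / len(recalls), "mrr": sum(rrs) / len(rrs)}
    return report


qrels = {"task-1": {"a", "b"}, "task-2": {"c"}}
runs = {
    "keyword": {"task-1": ["a", "x", "y"], "task-2": ["x", "y", "z"]},
    "hybrid": {"task-1": ["b", "a", "x"], "task-2": ["x", "c", "y"]},
}
print(evaluate(runs, qrels, k=3))
```

Since the paper observes that neither retrieval mode dominates across query types, a pilot should keep the per-task scores alongside these averages.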
Shared Process
From fragmented inputs to usable outputs.
- Ingest open-access cultural metadata
- Generate metadata embeddings
- Index in a vector database
- Hybrid search: keyword and semantic search by concept, material, or meaning
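The four steps above can be walked through end to end in a toy sketch. The project uses Ollama embeddings and Qdrant. Neither is called here. `toy_embed` (hashed character trigrams) is a hypothetical stand-in that captures spelling overlap only, not meaning. `InMemoryIndex` and `keyword_hits` are likewise hypothetical names, not the Ollama or Qdrant APIs.

```python
import math
import zlib

DIM = 256  # hypothetical vector size for the toy embedding


def toy_embed(text: str) -> list[float]:
    """Stand-in for an embedding model: hashed character trigrams, L2-normalized.

    This captures spelling overlap only, not meaning; a real deployment would
    call a semantic embedding model here instead.
    """
    vec = [0.0] * DIM
    padded = f"  {text.lower()} "
    for i in range(len(padded) - 2):
        # crc32 is deterministic across runs, unlike Python's salted hash().
        vec[zlib.crc32(padded[i : i + 3].encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


class InMemoryIndex:
    """Brute-force cosine search over unit vectors; a vector database does this at scale."""

    def __init__(self) -> None:
        self.points: dict[str, tuple[list[float], dict]] = {}

    def upsert(self, point_id: str, vector: list[float], payload: dict) -> None:
        self.points[point_id] = (vector, payload)

    def search(self, query: list[float], limit: int = 3) -> list[tuple[str, float]]:
        # Vectors are unit length, so the dot product equals cosine similarity.
        scored = [
            (pid, sum(q * v for q, v in zip(query, vec)))
            for pid, (vec, _) in self.points.items()
        ]
        scored.sort(key=lambda item: (-item[1], item[0]))
        return scored[:limit]


def keyword_hits(records: dict[str, dict], query: str) -> list[str]:
    """Exact token match on titles: the baseline the vector layer is added to."""
    terms = set(query.lower().split())
    return [rid for rid, rec in records.items() if terms & set(rec["title"].lower().split())]


# 1. Ingest (toy records standing in for harvested open-access metadata)
records = {
    "met:1": {"title": "Porcelains from Jingdezhen"},
    "aic:2": {"title": "Bronze ritual vessel"},
}
# 2-3. Embed each record and index it, keeping the untouched record as payload
index = InMemoryIndex()
for rid, rec in records.items():
    index.upsert(rid, toy_embed(rec["title"]), rec)

# 4. Search both ways
query = "porcelain"
print(keyword_hits(records, query))             # no title contains the exact token "porcelain"
print(index.search(toy_embed(query), limit=1))  # nearest vector by cosine similarity
```

The keyword layer misses "Porcelains" because it matches whole tokens. The vector layer still ranks that record first. A real semantic model extends the same mechanism from spelling to meaning.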