Projects

Live - MVP

LinkedCulture

Unified Cultural Search

Published paper

Discovery architecture for cultural heritage

Layered retrieval, institutional authority, and the limits of keyword search

Zenodo · May 27, 2026 · CC BY 4.0

Central claim

AI belongs at the representation layer, not the interpretation layer.

LinkedCulture helps users find authoritative museum records. It does not generate, rewrite, or interpret those records for them.

Scope: 8 cultural heritage institutions
Scale: 200k+ open-access records
Architecture: Keyword + semantic + query mediation

Abstract

Cultural heritage discovery systems were designed for a retrieval environment that no longer fully describes the people who use them. The vocabulary of the cataloger and the vocabulary of the searcher are not the same, and no refinement of keyword search has been able to close that gap. This paper argues that the answer lies in a layered discovery architecture that combines keyword retrieval, semantic similarity, and AI-assisted query mediation, while preserving the authoritative institutional record as the only thing the end user ever sees. The central architectural principle is that AI belongs at the representation layer, not the interpretation layer. The system helps users find records. It does not generate, rewrite, or interpret them. LinkedCulture, an open-source prototype built across eight cultural heritage institutions and more than two hundred thousand records, demonstrates this architecture in operation and documents three observations from its deployment: that layered retrieval surfaces records keyword search alone misses, that neither retrieval mode dominates unconditionally across query types, and that a shared representational space appears to mediate vocabulary inconsistency across institutional boundaries in ways that single-institution search cannot replicate. The implications for how cultural heritage institutions approach discovery infrastructure, retrieval evaluation, and the appropriate role of AI in their systems are discussed.

Live prototype

Try the hybrid search described in the paper.

The hybrid interface combines keyword retrieval with semantic similarity and lets users tune the balance between exact search and exploratory discovery. It is the clearest public demonstration of LinkedCulture's layered retrieval architecture.
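One way to picture the tunable balance is as a weighted blend of two scores. The sketch below is illustrative only: the `alpha` parameter, the term-overlap keyword score, the toy two-dimensional vectors, and the sample records are all assumptions made for this example, and the prototype's actual scoring formula is not described here.

```python
import math

# Hypothetical records with hand-made 2-D vectors standing in for real embeddings.
RECORDS = [
    {"id": "a", "title": "Blue glazed ceramic bowl", "vector": [1.0, 0.0]},
    {"id": "b", "title": "Porcelain dish with cobalt decoration", "vector": [0.9, 0.2]},
    {"id": "c", "title": "Bronze ceremonial bell", "vector": [0.0, 1.0]},
]

def keyword_score(query, text):
    """Fraction of query terms that appear verbatim in the record text."""
    terms = query.lower().split()
    words = set(text.lower().split())
    return sum(t in words for t in terms) / len(terms) if terms else 0.0

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def hybrid_search(query, query_vector, alpha):
    """Rank record ids; alpha=1.0 is pure keyword, alpha=0.0 is pure semantic."""
    scored = []
    for record in RECORDS:
        kw = keyword_score(query, record["title"])
        sem = cosine(query_vector, record["vector"])
        scored.append((alpha * kw + (1 - alpha) * sem, record["id"]))
    return [record_id for _, record_id in sorted(scored, reverse=True)]

exact = hybrid_search("ceramic bowl", [1.0, 0.05], alpha=1.0)
exploratory = hybrid_search("ceramic bowl", [1.0, 0.05], alpha=0.0)
```

With `alpha=1.0`, record `b` earns no score because it shares no term with the query. With `alpha=0.0`, it ranks second because its vector sits close to the query vector. That gap is the vocabulary mismatch the slider is meant to bridge.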

Published research · Semantic search · Keyword search · Embeddings · Vector DB · Clustering · Open access collections

LinkedCulture Project Overview

LinkedCulture is a layered discovery prototype for open-access cultural heritage records across major museum collections. It combines keyword retrieval, semantic similarity, multilingual search, topic clustering, and shareable visual interfaces while preserving the institutional record as the authoritative object shown to users. The architecture places AI at the representation layer, not the interpretation layer: AI helps records become findable, but does not rewrite or replace museum metadata. The prototype is built on Ollama embeddings and Qdrant, with a custom ingestion pipeline that can be rerun against updated collections and extended to additional institutions.
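The representation-layer principle can be sketched as a data-structure constraint: vectors live beside the record, and search hands back the record object untouched. This is a minimal illustration under assumptions, not the prototype's code. The class names, fields, and in-memory dictionaries are hypothetical stand-ins for the real embedding model and vector database.

```python
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)
class InstitutionalRecord:
    """Authoritative metadata as supplied by the institution (hypothetical fields)."""
    record_id: str
    institution: str
    title: str
    description: str

class RepresentationIndex:
    """Keeps vectors separate from records so retrieval never alters metadata."""

    def __init__(self):
        self._records = {}
        self._vectors = {}

    def add(self, record, vector):
        # The vector is a derived representation; the record is stored as-is.
        self._records[record.record_id] = record
        self._vectors[record.record_id] = list(vector)

    def nearest(self, query_vector, limit=5):
        """Return the original record objects, ranked by dot product."""
        def score(record_id):
            return sum(q * v for q, v in zip(query_vector, self._vectors[record_id]))
        ranked = sorted(self._vectors, key=score, reverse=True)
        return [self._records[record_id] for record_id in ranked[:limit]]

bell = InstitutionalRecord("m1:42", "Museum One", "Bronze bell", "Cast bronze bell.")
bowl = InstitutionalRecord("m2:7", "Museum Two", "Ceramic bowl", "Glazed earthenware bowl.")
index = RepresentationIndex()
index.add(bell, [1.0, 0.0])
index.add(bowl, [0.0, 1.0])
top_hit = index.nearest([0.9, 0.1], limit=1)[0]
```

Because the dataclass is frozen, any attempt to assign to a field such as `top_hit.title` raises `FrozenInstanceError`, which makes the "find, do not rewrite" rule a property of the types rather than a convention.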

[Image: LinkedCulture Hybrid Search interface showing keyword and semantic discovery controls and results]
Hybrid keyword and semantic retrieval across open-access museum records

LinkedCulture Topics

LinkedCulture Topics extends the same index into unsupervised discovery. Each cluster is labeled using the object descriptions and promoted to a draft topic for editing. The result is a set of thematic groupings that emerge purely from the geometry of the embedding space: objects that land near each other semantically, regardless of institution, date, or catalog category. Topics are a way of reading what the model already knows about the collection. To date, the system has identified 1,939 clusters across 92 published topics, each reviewed before publication. Topics are then folded back into the semantic search results to enrich discovery further.
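The cluster-then-label flow can be sketched in a few lines. The clustering algorithm used by the prototype is not stated, so this example swaps in a simple greedy threshold method as an assumption. The toy vectors, the `threshold` value, the stopword list, and the sample objects are likewise invented for illustration.

```python
import math
from collections import Counter

STOPWORDS = {"a", "an", "the", "of", "with", "and", "in"}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def greedy_clusters(items, threshold=0.9):
    """Assign each item to the first cluster whose centroid is similar enough."""
    clusters = []
    for item in items:
        for cluster in clusters:
            if cosine(item["vector"], cluster["centroid"]) >= threshold:
                cluster["members"].append(item)
                n = len(cluster["members"])
                # Update the running mean so the centroid reflects all members.
                cluster["centroid"] = [
                    (c * (n - 1) + x) / n
                    for c, x in zip(cluster["centroid"], item["vector"])
                ]
                break
        else:
            clusters.append({"centroid": list(item["vector"]), "members": [item]})
    return clusters

def draft_label(cluster, top_n=2):
    """Propose a draft label from the most frequent description terms."""
    counts = Counter(
        word
        for member in cluster["members"]
        for word in member["description"].lower().split()
        if word not in STOPWORDS
    )
    return " / ".join(word for word, _ in counts.most_common(top_n))

ITEMS = [
    {"institution": "Museum A", "description": "bronze bell", "vector": [0.1, 0.95]},
    {"institution": "Museum B", "description": "bronze ritual bell", "vector": [0.15, 0.9]},
    {"institution": "Museum A", "description": "ceramic bowl", "vector": [0.95, 0.1]},
    {"institution": "Museum C", "description": "glazed ceramic bowl", "vector": [0.9, 0.2]},
]
clusters = greedy_clusters(ITEMS)
labels = [draft_label(cluster) for cluster in clusters]
```

Each resulting cluster mixes objects from different institutions, and the label is only a draft built from the descriptions, which matches the editorial review step described above.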

LinkedCulture Multilingual

Multilingual semantic search across the LinkedCulture collections. A French-language interface for French-speaking users exploring the collections by concept and meaning. It uses a multilingual embedding index to search across linguistic and institutional boundaries.

Who It Serves

Museums, archives, libraries, researchers, digital humanities teams, and cultural heritage organizations exploring keyword and semantic discovery across collections.

Pilot Fit

A useful pilot would index one collection or a small cross-institutional set, then evaluate search quality against real research, education, or public discovery tasks.
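A pilot evaluation of this kind could start with something as simple as recall at k over a small set of judged tasks. The sketch below assumes hypothetical task names, record ids, and relevance judgments. None of the numbers are results from LinkedCulture; they exist only to show the calculation.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Share of the relevant records that appear in the top k results."""
    if not relevant_ids:
        return 0.0
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)

def mean_recall_at_k(runs, judgments, k):
    """Average recall at k across every judged task."""
    scores = [recall_at_k(runs[task], judgments[task], k) for task in judgments]
    return sum(scores) / len(scores)

# Invented judgments: which record ids a reviewer marked relevant per task.
JUDGMENTS = {
    "blue pottery": {"r1", "r2"},
    "ritual bell": {"r3"},
}
# Invented ranked outputs from two retrieval modes for the same tasks.
KEYWORD_RUNS = {
    "blue pottery": ["r1", "r9", "r8"],
    "ritual bell": ["r3", "r7", "r6"],
}
HYBRID_RUNS = {
    "blue pottery": ["r1", "r2", "r9"],
    "ritual bell": ["r3", "r7", "r6"],
}

keyword_recall = mean_recall_at_k(KEYWORD_RUNS, JUDGMENTS, k=3)
hybrid_recall = mean_recall_at_k(HYBRID_RUNS, JUDGMENTS, k=3)
```

Reporting the metric per task as well as on average helps show where each retrieval mode wins, since neither mode is expected to dominate across all query types.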

Shared Process

From fragmented inputs to usable outputs.

01. Ingest open-access cultural metadata
02. Generate metadata embeddings
03. Index in a vector database
04. Serve hybrid search: keyword and semantic retrieval by concept, material, or meaning
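The four steps can be sketched end to end with standard-library stand-ins. Everything here is an assumption for illustration. The hashed bag-of-words `embed` function replaces a real embedding model and captures no semantics, a plain dictionary replaces the vector database, and the field names and the `must_contain` filter are hypothetical.

```python
import math
import zlib

def ingest(raw_rows):
    """Step 01: normalize raw rows into records keyed by a stable id."""
    for row in raw_rows:
        yield {
            "id": f'{row["institution"]}:{row["object_id"]}',
            "text": f'{row["title"]}. {row.get("medium", "")}'.strip(),
            "source": row,  # the institutional metadata, carried through unchanged
        }

def embed(text, dim=32):
    """Step 02: toy hashed bag-of-words vector, a stand-in for a real model."""
    vector = [0.0] * dim
    for token in text.lower().replace(".", " ").split():
        vector[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else vector

def build_index(records, index=None):
    """Step 03: upsert by id, so a rerun replaces entries instead of duplicating."""
    index = {} if index is None else index
    for record in records:
        index[record["id"]] = {"vector": embed(record["text"]), "record": record}
    return index

def search(index, query, must_contain=None, limit=3):
    """Step 04: rank by vector similarity, optionally requiring an exact term."""
    query_vector = embed(query)
    hits = []
    for entry in index.values():
        text = entry["record"]["text"].lower()
        if must_contain and must_contain.lower() not in text:
            continue
        score = sum(q * v for q, v in zip(query_vector, entry["vector"]))
        hits.append((score, entry["record"]["id"]))
    hits.sort(reverse=True)
    return [record_id for _, record_id in hits[:limit]]

ROWS = [
    {"institution": "museum-a", "object_id": 1, "title": "Bronze bell", "medium": "bronze"},
    {"institution": "museum-b", "object_id": 2, "title": "Ceramic bowl", "medium": "earthenware"},
]
index = build_index(ingest(ROWS))
index = build_index(ingest(ROWS), index)  # rerunning ingestion is idempotent
filtered = search(index, "bronze bell", must_contain="bronze")
```

Keying the index on a stable record id is the design choice that lets ingestion be rerun against an updated collection or extended with rows from another institution without rebuilding from scratch.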