Ten years ago, the Harvard Art Museums, a teaching and research museum on the campus of Harvard University, started using multiple computer vision (CV) services to tag and describe its collections.
The initial goal was to improve search and discovery of the collections in both internal and external systems by augmenting curatorial written descriptions with machine-generated metadata.
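As a rough illustration of the kind of augmentation involved, here is a minimal sketch, not the museum's actual pipeline; the tag format, confidence threshold, and record fields are all hypothetical, of folding machine-generated tags into a catalog record so a search index can use them alongside curatorial text:

```python
# Minimal sketch: merge machine-generated tags into a catalog record for search.
# The tag structure, threshold, and field names below are hypothetical.

def augment_record(record: dict, cv_tags: list[dict], min_confidence: float = 0.6) -> dict:
    """Append high-confidence machine tags alongside the curatorial description."""
    machine_terms = [
        t["label"] for t in cv_tags
        if t.get("confidence", 0.0) >= min_confidence
    ]
    augmented = dict(record)
    # Keep curatorial text untouched; machine terms live in a separate field
    # so search can weight (or ignore) them independently.
    augmented["machine_tags"] = sorted(set(machine_terms))
    return augmented

record = {"object_id": 12345, "description": "Color field painting, oil on canvas."}
cv_tags = [
    {"label": "painting", "confidence": 0.97},
    {"label": "sunset", "confidence": 0.41},  # low-confidence guess, filtered out
]
print(augment_record(record, cv_tags))
# {'object_id': 12345, 'description': '...', 'machine_tags': ['painting']}
```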
During early tests it was apparent that CV showed a lot of promise for describing representational art in ways our catalogers didn't have time to do, but it quickly stumbled when presented with more abstract imagery. While assessing those stumbles, we started asking questions, including:
- What transpires when humans and machines look at art?
- How much does accuracy matter when describing material that is evaluated subjectively?
- How can we use the inconsistencies of AI to serve our university and public audiences in new ways?
Ten years later, we’ve fully embraced the inconsistency of CV and, now, of modern large language models (LLMs).
Jeff Steward will cover:
- The museum’s image and data pipeline.
- Seven CV and AI services used to describe images of art.
- Several ongoing R&D projects exploring new modes of storytelling and engagement for university and public audiences.