Liliana Melgar, Jaap Blom, Eva Baaren, Marijn Koolen, Roeland Ordelman

A conceptual model for the annotation of audiovisual heritage in a media studies context

Paper Presentations II: Analysis and discovery models for audiovisual materials

To support annotation tasks by media scholars and other scholars who use audiovisual media, a conceptual annotation model is being designed in the framework of the CLARIAH project (Common Lab Research Infrastructure for the Arts and Humanities). This model will serve as the basis for the development of a general workspace service for this group of scholars. This paper introduces the main conceptual dimensions involved in the design of this model.

Annotating has been identified as one of the scholarly primitives (Unsworth, 2000). Scholars usually make annotations in the context of their natural tasks (i.e., research or teaching). Previous research has found that scholars usually create their own set of semantic categories (Westman, 2009), which correspond to their own research questions. However, since current information systems are focused on retrieval, scholars are forced to create these categories using external applications. Ideally, scholars would instead be supported in annotating for research purposes while searching or browsing, as a form of information interaction integrated into information access systems. They would also be able to correct data or enrich metadata in ways that could become part of the memory institutions’ metadata.

There are four ways of looking at annotations, which have had or could have value for media studies: (1) an information professional approach (i.e., manual cataloging and indexing), (2) an automatic approach (i.e., through algorithms that index the content of media objects based on their low-level features), (3) a novice user approach (i.e., social tagging and crowdsourcing), and (4) a domain expert (or scholarly) approach. Separate applications, methods, and standards have been developed within each tradition. A point often overlooked by the previous approaches is that the experts (e.g., media scholars or cultural historians) are already active annotators (Walkowski & Barker, 2014) of all kinds of media, because this is an integral part of their research.

Different models on how media objects should be annotated and/or represented have been proposed in the framework of the annotating perspectives mentioned above. Those models have evolved separately into, for example, the W3C “open annotation data model” (Sanderson et al., 2013), tagging ontologies (Lohmann et al., 2011), or semantic models such as the Panofsky/Shatford matrix (Armitage & Enser, 1997), among others. The model that we propose includes some of their elements, for example: (1) expertise level (e.g., task expertise, domain expertise), (2) media types, (3) granularity level, (4) conceptual level (from work to item), (5) purpose of the annotation (e.g., retrieval, interpretation, economic, fun), (6) research stage in which the annotation is created (e.g., corpus creation, analysis), (7) annotating task (e.g., bookmarking, commenting, and transtextual relations between media, such as: based on, advertised in), (8) annotation type (i.e., closed vs. natural language representations), (9) level of abstraction (from ofness to aboutness), and (10) level of structure (from highly controlled to loose control).
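As a rough illustration of how the ten dimensions listed above could be carried alongside a single annotation, the following Python sketch groups them into one record. It is not the CLARIAH model itself: the class and field names, the use of enumerations, and the example values ("video", "segment", "item", and so on) are assumptions made only for this illustration, and only the dimension names and the examples in parentheses come from the text.

```python
from dataclasses import dataclass, fields
from enum import Enum


class Purpose(Enum):
    """Dimension 5: purposes named in the text."""
    RETRIEVAL = "retrieval"
    INTERPRETATION = "interpretation"
    ECONOMIC = "economic"
    FUN = "fun"


class ResearchStage(Enum):
    """Dimension 6: only the two stages given as examples in the text."""
    CORPUS_CREATION = "corpus creation"
    ANALYSIS = "analysis"


class AnnotationType(Enum):
    """Dimension 8: closed vs. natural language representations."""
    CLOSED = "closed"
    NATURAL_LANGUAGE = "natural language"


@dataclass(frozen=True)
class AnnotationContext:
    """Hypothetical record with one field per dimension of the model."""
    expertise_level: str       # (1) e.g., task expertise, domain expertise
    media_type: str            # (2) value vocabulary is an assumption
    granularity_level: str     # (3) value vocabulary is an assumption
    conceptual_level: str      # (4) from work to item
    purpose: Purpose           # (5)
    research_stage: ResearchStage  # (6)
    annotating_task: str       # (7) e.g., bookmarking, commenting
    annotation_type: AnnotationType  # (8)
    abstraction_level: str     # (9) from ofness to aboutness
    structure_level: str       # (10) from highly controlled to loose control


def dimension_names() -> list[str]:
    """Return the names of the dimensions in declaration order."""
    return [f.name for f in fields(AnnotationContext)]


def describe(ctx: AnnotationContext) -> str:
    """Summarise an annotation context in one line."""
    return (
        f"{ctx.annotating_task} on {ctx.media_type} ({ctx.granularity_level}), "
        f"purpose: {ctx.purpose.value}, stage: {ctx.research_stage.value}"
    )


# Illustrative instance; every value here is an assumed example.
example = AnnotationContext(
    expertise_level="domain expertise",
    media_type="video",
    granularity_level="segment",
    conceptual_level="item",
    purpose=Purpose.INTERPRETATION,
    research_stage=ResearchStage.ANALYSIS,
    annotating_task="commenting",
    annotation_type=AnnotationType.NATURAL_LANGUAGE,
    abstraction_level="aboutness",
    structure_level="loose control",
)

print(describe(example))
```

Keeping the dimensions as plain fields rather than a fixed class hierarchy reflects the stated aim of a generic, technology-independent model; a concrete system might well replace the free-text fields with controlled vocabularies.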

The attempt to create integrative models for annotating interactions is not new. This effort aligns with previous research in this direction by Agosti and Ferro (2007), Agosti et al. (2007), Lanagan and Smeaton (2012), Haslhofer et al. (2009), and Ruvane (2005), as well as with the DARIAH network’s effort to enable interoperability across annotation methods, tools, and/or datasets in the humanities (Interoperable Annotations for the Arts and Humanities). In the crowdsourcing sector, the work by Geisler et al. (2010) also provides a broader framework for incorporating user annotations of film content.

The model is presented as a relational graph, a highly informal semantic network which provides a high-level view of the main dimensions involved in media annotation. The main benefit of the proposed model is that it is generic and technology-independent enough to be integrated across different audiovisual heritage information systems, while also providing a clear conceptual framework that supports the different use cases identified in the domain of media studies. In this way, scholars can use annotations while interacting with these systems and in each stage of their research.


Agosti, M., & Ferro, N. (2007). A Formal Model of Annotations of Digital Content. ACM Trans. Inf. Syst., 26(1).

Armitage, L. H., & Enser, P. G. B. (1997). Analysis of user need in image archives. Journal of Information Science, 23(4), 287–299.

Geisler, G., Willard, G., & Whitworth, E. (2010). Crowdsourcing the indexing of film and television media. In Proceedings of the 73rd ASIS&T Annual Meeting on Navigating Streams in an Information Ecosystem (p. 82:1–82:10). Silver Spring, MD, USA: American Society for Information Science. Retrieved from

Haslhofer, B., Jochum, W., King, R., Sadilek, C., & Schellner, K. (2009). The LEMO annotation framework: weaving multimedia annotations with the web. International Journal on Digital Libraries, 10(1), 15–32.

“Interoperable Annotations for the Arts and Humanities”, Colloque, Calenda, Publié le lundi 02 décembre 2013,

Lanagan, J., & Smeaton, A. F. (2012). Video digital libraries: contributive and decentralised. International Journal on Digital Libraries, 12(4), 159–178.

Lohmann, S., Díaz, P., & Aedo, I. (2011). MUTO: the modular unified tagging ontology (pp. 95–104). Presented at the 7th International Conference on Semantic Systems, New York, NY, USA: ACM.

Ruvane, M. B. (2005). Annotation as process: A vital information seeking activity in historical geographic research. Proceedings of the American Society for Information Science and Technology, 42(1), n/a-n/a.

Sanderson, R., Ciccarese, P., & Van de Sompel, H. (2013). Designing the W3C open annotation data model. In Proceedings of the 5th Annual ACM Web Science Conference (pp. 366–375). New York, NY, USA: ACM.

Unsworth, J. (2000). Scholarly Primitives: what methods do humanities researchers have in common, and how might our tools reflect this? “Humanities Computing: formal methods, experimental practice” symposium, May 13, 2000, London, UK

Walkowski, N.-O., & Barker, E. T. E. (2014). Digital humanists are motivated annotators. Presented at the Digital Humanities 2014, Lausanne, Switzerland. Retrieved from

Westman, S. (2009). Image users’ needs and searching behaviour. In A. Göker & J. Davies (Eds.), Information Retrieval: Searching in the 21st Century; Human Information Retrieval (pp. 63–83). Chichester, UK: John Wiley & Sons, Ltd. Retrieved from http://doi/10.1002/9780470033647