TY - GEN
T1 - A multimodal complexity comprehension-time framework for automated presentation synthesis
AU - Sridharan, Harini
AU - Mani, Ankur
AU - Sundaram, Hari
N1 - Copyright 2011 Elsevier B.V., All rights reserved.
PY - 2005
Y1 - 2005
AB - In this paper, we present a joint multimodal (audio, visual, and text) framework that maps the informational complexity of media elements to comprehension time. The problem is important for interactive multimodal presentations. We propose that the joint comprehension time is a function of the media's Kolmogorov complexity. For audio and images, the complexity is estimated using a lossless universal coding scheme; the text complexity is derived by analyzing sentence structure. For all three channels, we conduct user studies to map media complexity to comprehension time. For estimating the joint comprehension time, we assume channel independence, which yields a conservative estimate. The comprehension times for the visual channels (text and images) are deemed additive, and the joint time is then the maximum of the visual and auditory comprehension times. The user studies indicate that the model works very well compared with fixed-time multimodal presentations.
UR - http://www.scopus.com/inward/record.url?scp=33750573039&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33750573039&partnerID=8YFLogxK
U2 - 10.1109/ICME.2005.1521603
DO - 10.1109/ICME.2005.1521603
M3 - Conference contribution
AN - SCOPUS:33750573039
SN - 0780393325
SN - 9780780393325
T3 - IEEE International Conference on Multimedia and Expo, ICME 2005
SP - 1042
EP - 1045
BT - IEEE International Conference on Multimedia and Expo, ICME 2005
T2 - IEEE International Conference on Multimedia and Expo, ICME 2005
Y2 - 6 July 2005 through 8 July 2005
ER -