This work studies the audibility of background music in TV programs. Background music is frequently inserted in TV programs at low volume, and its audibility is often compromised as a result, which can lead to copyright disputes. A review of the state of the art shows that the field is still at an early stage. The problem is addressed using the three levels of audibility suggested by the WIPO (World Intellectual Property Organization): Audible, Barely audible and Inaudible. The effort focuses on finding the thresholds between these levels from objective sound descriptors supported by subjective evaluation tests. A set of controlled listening tests was prepared to find the threshold between inaudible and audible. The samples were prepared by mixing music and voice tracks at different levels, using realistic material, under controlled conditions as similar as possible to the television room of an average home. An analysis of the results reveals that the difference in integrated loudness between voice and music is the most decisive factor in audibility, although the type of music also has some influence. To take this influence into account, various indicators related to the momentary loudness of the signal were tested, and a highly correlated statistic was finally obtained. Linear regression then yielded an expression dependent on both parameters, which provides a very stable final estimator with a mean error of about 0.5 dB with respect to the jury's mean for the sound material tested. This result can serve as a basis for developing a recommendation in this field. For broadcast analysis, where voice and music are already mixed, recent voice-music separation techniques based on deep learning make it possible to resynthesize both isolated tracks at the destination and then apply the proposed algorithm.
http://www.aes.org/e-lib/browse.cfm?elib=21722
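The regression step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the form of the fitted expression, so the code below assumes a plain linear model, y = b0 + b1*d + b2*s, where d stands for the voice-music integrated loudness difference and s for the momentary-loudness statistic. All function names, coefficients and sample values are hypothetical placeholders chosen only to make the example run; none of them come from the paper.

```python
def solve_linear(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_estimator(loudness_diff, momentary_stat, jury_mean):
    """Least-squares fit of y = b0 + b1*d + b2*s via the normal equations.

    The linear form is an assumption; the abstract only says the
    expression depends on both parameters.
    """
    rows = [(1.0, d, s) for d, s in zip(loudness_diff, momentary_stat)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, jury_mean)) for i in range(3)]
    return solve_linear(xtx, xty)


def predict(coeffs, d, s):
    """Evaluate the assumed linear model for one sample."""
    b0, b1, b2 = coeffs
    return b0 + b1 * d + b2 * s


def mean_abs_error(coeffs, loudness_diff, momentary_stat, jury_mean):
    """Mean absolute deviation between the estimator and the jury's mean."""
    errors = [
        abs(predict(coeffs, d, s) - y)
        for d, s, y in zip(loudness_diff, momentary_stat, jury_mean)
    ]
    return sum(errors) / len(errors)


# Synthetic placeholder data, NOT measurements from the paper.
TRUE_COEFFS = (2.0, 0.5, -0.25)  # made-up values used to generate targets
diff = [0.0, 5.0, 10.0, 15.0, 20.0, 25.0]  # hypothetical loudness differences
stat = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0]      # hypothetical momentary statistic
target = [predict(TRUE_COEFFS, d, s) for d, s in zip(diff, stat)]

coeffs = fit_estimator(diff, stat, target)
print("fitted coefficients:", coeffs)
print("mean absolute error:", mean_abs_error(coeffs, diff, stat, target))
```

With real listening-test data, `target` would hold the jury's mean response per sample, and the mean absolute error would play the role of the roughly 0.5 dB figure reported in the abstract. Computing the two predictors themselves (integrated and momentary loudness) is outside this sketch.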