This was part of Statistical and Computational Challenges in Probabilistic Scientific Machine Learning (SciML)
Semantic Information Pursuit
Rene Vidal, University of Pennsylvania
Monday, June 9, 2025
Abstract: In 1948, Shannon published a famous paper that laid the foundations of information theory and led to a revolution in communication technologies. Critical to Shannon’s ideas was the notion that a signal can be represented in terms of “bits,” and that the information content of the signal can be measured by the minimum expected number of bits. However, while such a notion of information is well suited to tasks such as signal compression and reconstruction, it is not directly applicable to modern AI applications involving images and text, because bits do not depend on the “semantic content” of the signal, such as words in a document or objects in an image. In this talk, I will present a new measure of semantic information content called “semantic entropy,” defined as the minimum expected number of semantic queries about the data whose answers are sufficient for solving a given task (e.g., classification). I will also present an information-theoretic framework called “information pursuit” for deciding which queries to ask and in which order, which requires a probabilistic generative model relating data and queries to the task. Applications to interpretable AI and medical diagnosis will also be presented.
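To make the query-selection idea concrete, here is a minimal sketch of a greedy information-pursuit loop on a toy problem. All names, the tiny dataset, and the binary queries are hypothetical illustrations, not the speaker's actual method or data: each item has a class label and known answers to a few semantic queries, and at each step we ask the query whose answer has the highest mutual information with the label under the current (uniform) posterior, stopping once the label is determined.

```python
import math
from collections import Counter

# Hypothetical toy dataset: each "image" has a class label and binary
# answers to three semantic queries (e.g., "has wheels?", "has wings?",
# "is large?"). Purely illustrative.
DATA = [
    ("car",   (1, 0, 0)),
    ("truck", (1, 0, 1)),
    ("plane", (1, 1, 0)),
    ("bird",  (0, 1, 0)),
]
N_QUERIES = 3

def entropy(labels):
    """Shannon entropy (in bits) of an empirical label distribution."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def info_gain(candidates, q):
    """Mutual information I(label; answer to query q) under a uniform
    posterior over the remaining candidate items."""
    h = entropy([y for y, _ in candidates])
    cond = 0.0
    for a in (0, 1):
        split = [y for y, ans in candidates if ans[q] == a]
        if split:
            cond += len(split) / len(candidates) * entropy(split)
    return h - cond

def information_pursuit(true_answers):
    """Greedily ask the most informative query, condition on its answer,
    and repeat until the class label is fully determined."""
    candidates = list(DATA)
    asked = []
    while entropy([y for y, _ in candidates]) > 0:
        q = max(range(N_QUERIES), key=lambda q: info_gain(candidates, q))
        a = true_answers[q]
        asked.append((q, a))
        candidates = [(y, ans) for y, ans in candidates if ans[q] == a]
    return asked, candidates[0][0]

# Classify the item whose true answers are (1, 0, 1): it is identified
# in two queries, fewer than the three needed by a fixed question order.
queries, label = information_pursuit((1, 0, 1))
```

The expected number of queries this greedy loop uses is exactly the quantity the abstract's "semantic entropy" bounds from below; the real framework replaces the uniform posterior over a finite list with a learned probabilistic generative model relating data, queries, and the task.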