Authors: Ruobin Gong (Rutgers University), Joseph B. Kadane (CMU), Mark J. Schervish (CMU), Teddy Seidenfeld (CMU), Rafael B. Stern (University of São Carlos)
A familiar defense of Personalist or Subjective Bayesian theory is that, under a variety of sufficient conditions, asymptotically—with increasing shared evidence—almost surely, each non-extreme, countably additive Bayesian opinion, when updated by conditionalization, converges to certainty that is veridical about the truth/falsity of hypotheses of interest. Then, with probability 1 over possible evidential histories, personal probabilities track the truth. In this note we examine varieties of failures of these asymptotics. In an extreme case, conditional probabilities are deceptive when they converge to certainty for a false hypothesis. We establish that proposals for so-called “modest” credences, offered by Elga (2016) and by Nielsen and Stewart (2019) in response to a concern about Bayesian orgulity raised by Belot (2013), instead support deceptive credences. We argue that deceptive credences are not modest, but for a reason different than Belot adduces.
How to Cite: Gong, R., Kadane, J., Schervish, M., Seidenfeld, T. & Stern, R. (2021) “Deceptive Credences”, Ergo an Open Access Journal of Philosophy. 7(0). doi: https://doi.org/10.3998/ergo.1125
In this note we continue an old discussion of some familiar results about the asymptotics of Bayesian updating (aka conditionalization^{1}) using countably additive^{2} credences. One such result (due to Doob 1953, with details reported in Section 2) asserts that, for each hypothesis of interest $H$, with the exception of a probability 0 “null” set of data sequences, the Bayesian agent’s posterior probabilities converge to the truth value of $H$. Almost surely, the posterior credences converge to the value 1 if $H$ is true, and to 0 if $H$ is false. So, with probability 1, this Bayesian agent’s asymptotic conditional credences are veridical: they track the truth of each hypothesis under investigation. This feature of Bayesian learning is often alluded to in a justification of Bayesian methodology, e.g., Lindley (2006: ch. 11) and Savage (1972: §3.6): Bayesian learning affords sound asymptotics for scientific inference.
In Section 3, we explore the asymptotic behavior of conditional probabilities when these desirable asymptotics fail and credences are not veridical. We identify and illustrate five varieties of such failures, in increasing severity. An extreme variety occurs when conditional probabilities approach certainty for a false hypothesis. We call these extreme cases episodes of deceptive credences, as the agent is not able to discriminate between becoming certain of a truth and becoming certain of a falsehood.^{3} Result 1 establishes a sufficient condition for credences to be deceptive. In Appendix A, we discuss four other, less extreme varieties when conditional probabilities are not veridical.
In Section 4 we apply our findings to a recent exchange prompted by Belot’s (2013) charge that familiar results about the asymptotics of Bayesian updating display orgulity: an epistemic immodesty about the power of Bayesian reasoning. In rebuttal, Elga (2016) argues that orgulity is avoided with some merely finitely additive credences for which the conclusion of Doob’s theorem is false. Nielsen and Stewart (2019) offer a synthesis of these two perspectives where some finitely additive credences display what they call (understood as a technical term) reasonable modesty, which avoids the specifics of Belot’s objection. Our analysis in Section 4 shows that these applications of finite additivity support deceptive credences. We argue that it is at least problematic to call deceptive credences “modest” in the ordinary sense of the word ‘modest’ when deception has positive probability.
For ease of exposition, we use a continuing example throughout this note. Consider a Borel space of possible events based on the set of denumerable sequences of binary outcomes from flips of a coin of unknown bias using a mechanism of unknown dynamics. The sample space consists of denumerable sequences of 0s (tails) and 1s (heads). The nested data available to the Bayesian investigator are the growing initial histories of length $n$, ${h}_{n}$, arising from one denumerable sequence of flips, which corresponds to the unknown state. The class of hypotheses of interest are the elements of the Borel space generated by such histories.
For example, an hypothesis of interest $H$ might be that, with the exception of some finite initial history, the observed relative frequency of 1s remains greater than 0.5, regardless whether or not there is a well-defined limit of relative frequency for heads. Doob’s result, which we review below, asserts that for the Bayesian agent with countably additive credences $\text{P}$ over this Borel space, with the exception of a P-null set of possible sequences, her/his conditional probabilities, $\text{P(}H\text{\hspace{0.17em}}|\text{\hspace{0.17em}}{h}_{n}\text{)}$ converge to the truth value of $H$.
Consider the following, strong-law (countably additive) version of the Bayesian asymptotic approach to certainty, which applies to the continuing example of denumerable sequences of 0s and 1s.^{4} The assumptions for the result that we highlight below involve the measurable space, the hypothesis of interest, and the learning rule.
The measurable space $<\mathbf{\text{X}},\mathbf{\text{B}}>$. Let ${X}_{i}\ (i=1,2,\ldots)$ be a denumerable sequence of sets, each equipped with an associated, atomic $\sigma$-field ${\mathcal{B}}_{i}$, where if ${x}_{i}\in {X}_{i}$ then $\{{x}_{i}\}\in {\mathcal{B}}_{i}$. That is, the elements of ${X}_{i}$ are the atoms of ${\mathcal{B}}_{i}$. ${X}_{i}$ is the state-space and ${\mathcal{B}}_{i}$ is the set of the measurable events for the i^{th} experiment. Form the infinite Cartesian product $\mathbf{\text{X}}={X}_{1}\times {X}_{2}\times \ldots$ of all sequences $x=({x}_{1},{x}_{2},\ldots)$, where ${x}_{i}\in {X}_{i}$. The $\sigma$-field $\mathbf{\text{B}}$ is generated by the measurable rectangles from $\mathbf{\text{X}}$: the sets of the form $A={A}_{1}\times {A}_{2}\times \ldots$ where ${A}_{i}\in {\mathcal{B}}_{i}$ and ${A}_{i}={X}_{i}$ for all but finitely many values of $i$. $\mathbf{\text{B}}$ is the smallest $\sigma$-field containing each of the individual ${\mathcal{B}}_{i}$. As $\{{x}_{i}\}\in {\mathcal{B}}_{i}$ for each ${x}_{i}\in {X}_{i}$, $\mathbf{\text{B}}$ is also atomic, with the sequences $x$ as its atoms.
Each hypothesis of interest $H$ is an element of $\mathbf{\text{B}}$. That is, in what follows, the result about asymptotic certainty applies to an hypothesis $H$ provided that it is “identifiable” with respect to the $\sigma $-field, $\mathbf{\text{B}}$, generated by finite sequences of observations.^{5} These finite sequences constitute the observed data.
We are concerned, in particular, with tracking the nested histories of the initial $n$ experimental outcomes. That is, for $x=({x}_{1},{x}_{2},\ldots)\in \mathbf{\text{X}}$, let ${h}_{n}(x)=({x}_{1},\ldots,{x}_{n})$ be the first $n$ terms of $x$.
The probability assumptions. Let $\text{P}$ be a countably additive probability over the measurable space $<\mathbf{\text{X}},\mathbf{\text{B}}>$, and assume there exist well-defined conditional probability distributions over hypotheses $H\in \mathbf{\text{B}}$, given the histories ${h}_{n}:\text{P}(H|{h}_{n})$, $n=1,\text{\hspace{0.17em}}.\text{\hspace{0.17em}}.\text{\hspace{0.17em}}.$.
The learning rule for the Bayesian agent: Consider an agent whose initial (“prior”) joint credences are represented by the measure space $<\mathbf{\text{X}},\mathbf{\text{B}},\text{P}>$. Let ${\text{P}}^{n}$ be this agent’s (“posterior”) credences over $<\mathbf{\text{X}},\mathbf{\text{B}}>$ having learned the history ${h}_{n}$.
Bayes’ Rule for updating credences requires that ${\text{P}}^{n}(H)=\text{P}(H|{h}_{n})$.
The result in question, which is a substitution instance of Doob’s (1953: T.7.4.1), is as follows:
For $H\in \mathbf{\text{B}}$, let ${\mathbf{\text{I}}}_{H}:\mathbf{\text{X}}\to \{0,1\}$ be the indicator for $H$: ${\mathbf{\text{I}}}_{H}(x)=1$ if $x\in H$ and ${\mathbf{\text{I}}}_{H}(x)=0$ if $x\notin H$. The indicator function for $H$ identifies the truth value of $H$.
Asymptotic Bayesian Certainty: For each $H\in \mathbf{\text{B}}$, $\text{P}(\{x:\lim_{n\to \infty}\text{P}(H|{h}_{n}(x))={\mathbf{\text{I}}}_{H}(x)\})=1$.
In words, subject to the conditions above, the agent’s credences satisfy asymptotic certainty about the truth value of the hypothesis $H$. For each measurable hypothesis $H$, and with respect to a set ${S}_{H}$ of infinite sequences $x$ that has “prior” probability 1, for each $x$ in ${S}_{H}$, her/his sequence of “posterior” opinions about $H$, $\text{P}(H|{h}_{n}(x))$, converges to probability 1 or 0, respectively, about the truth or falsity of $H$.
To summarize: For each $x$ in ${S}_{H}$, as $n\to \infty $, the sequence of conditional probabilities, $\text{P}(\text{H}|{h}_{n}(x))$, asymptotically correctly identifies the truth of $H$ or of ${H}^{c}$ by converging to 1 for the true hypothesis in this pair. In this sense, asymptotically, the Bayesian agent learns whether $H$ or ${H}^{c}$ obtains.
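The convergence described above can be made concrete with a small numerical sketch. It assumes, purely for illustration, an exchangeable credence whose de Finetti mixing prior on the coin's bias $\theta$ is uniform, and it takes $H$ to be the hypothesis that the limiting relative frequency of 1s exceeds 0.5. Under those assumptions $\text{P}(H|{h}_{n})$ equals the posterior probability that $\theta>0.5$, which for a history with `ones` 1s in `n` flips is the binomial sum computed below. The function names and the two test sequences are hypothetical and are not taken from the article.

```python
from math import comb


def posterior_bias_above_half(ones: int, n: int) -> float:
    """P(theta > 1/2 | `ones` 1s in n flips) under a uniform prior on theta.

    The posterior is Beta(ones + 1, n - ones + 1). For integer parameters its
    CDF is a binomial tail, which gives
    P(theta > 1/2 | data) = P(Bin(n + 1, 1/2) <= ones).
    """
    numerator = sum(comb(n + 1, j) for j in range(ones + 1))
    return numerator / 2 ** (n + 1)


def posterior_path(flips):
    """Posterior probability of H after each successive flip."""
    ones = 0
    path = []
    for n, flip in enumerate(flips, start=1):
        ones += flip
        path.append(posterior_bias_above_half(ones, n))
    return path


# Two illustrative histories: relative frequency of 1s tends to 2/3 and to 1/3.
mostly_heads = [1, 1, 0] * 100
mostly_tails = [0, 0, 1] * 100

heads_path = posterior_path(mostly_heads)
tails_path = posterior_path(mostly_tails)
print(heads_path[0], heads_path[-1])  # starts at 0.75, ends close to 1
print(tails_path[0], tails_path[-1])  # starts at 0.25, ends close to 0
```

In both histories the posterior for $H$ moves toward the indicator of $H$, which is the behavior the result guarantees on a set of prior probability 1. The sketch says nothing about the null set of sequences where the guarantee fails.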
Definition: Call an element $x$ of $\text{X}$ a veridical state if $\text{P}(\text{H}|{h}_{n}(\text{x}))$ converges to ${\mathbf{\text{I}}}_{H}(x)$.^{6}
In other words, the non-veridical states constitute the failure set for Doob’s result.
Next, we examine details of conditional probabilities given elements of the failure set, even when the agent’s credences are countably additive and the other assumptions in Doob’s result obtain. Specifically, consider the countably additive Bayesian agent’s conditional probabilities, $\text{P}(H|{h}_{n})$, in sequences of histories that are generated by points $x$ in the failure set, ${S}_{H}^{c}$—the complement to the distinguished set of veridical states. It is important, we think, to distinguish different varieties of non-veridical states within the failure set.
At the opposite pole from the veridical states (the states in ${S}_{H}$, whose conditional probabilities converge to the truth about $H$) are states whose histories create conditional probabilities that converge to certainty about the false hypothesis in the pair $\{H,{H}^{c}\}$.
Define $x$ as a deceptive state for hypothesis $H$ if $\text{P}(H|{h}_{n}(x))$ converges to $1-{\mathbf{\text{I}}}_{H}(x)$.
For deceptive states, the agent’s sequence of posterior probabilities also creates asymptotic certainty. This sense of certainty is introspectively indistinguishable to the investigator from the asymptotic certainty created by veridical states, where asymptotic certainty identifies the truth. Thus, to the extent that veridical states provide a defense of Bayesian learning—the observed histories ${h}_{n}(x)$ move the agent’s subjective “prior” for $H$ towards certainty in the truth value of $H$—deceptive states move the agent’s subjective credences towards certainty for a falsehood. Thus, for the very reasons that states in ${S}_{H}$ underwrite a Bayesian account of Bayesian learning of $H$, deceptive states frustrate such a claim about $H$. Then, Doob’s result serves a Bayesian’s need provided that the Bayesian agent is satisfied that, with probability 1, the actual state is veridical rather than deceptive with respect to the hypothesis of interest.
When the failure set for an hypothesis $H$ is deceptive, then the investigator’s credences about $H$ converge to 0 or to 1 for all possible data sequences. But this convergence is logically independent of the truth of $H$ since the investigator is unable to distinguish veridical from non-veridical data histories.
Less problematic than being deceptive, but nonetheless still challenging for a Bayesian account of objectivity, is a non-deceptive state $x$ where for each $\epsilon >0$, infinitely often
$|\text{P}(H|{h}_{n}(x))-(1-{\mathbf{\text{I}}}_{H}(x))|<\epsilon \qquad (1)$
Then, with respect to hypothesis $H$, infinitely often $x$ induces non-veridical conditional probabilities that mimic those from a deceptive state.
Definition: Call a state $x$ that satisfies Equation (1) intermittently deceptive for hypothesis $H$.
Definition: Consider a non-veridical state where, for each $\epsilon >0$, infinitely often $\text{|P}(H|{h}_{n}(\text{x}))-{\mathbf{\text{I}}}_{H}(x)|<\epsilon $. Call such a state intermittently veridical for hypothesis $H$.
Within the failure set for an hypothesis, the following partition of non-veridical states appears to us as increasingly problematic for a defense of Bayesian methodology, in the sense that seeks asymptotic credal certainty about the truth value of the hypothesis driven by Bayesian learning. In this list, we prioritize avoiding deception over obtaining veridicality:^{7}
(A) states that are intermittently veridical but not intermittently deceptive;
(B) states that are neither intermittently veridical nor intermittently deceptive;
(C) states that are both intermittently veridical and intermittently deceptive^{8};
(D) states that are intermittently deceptive but not intermittently veridical;
(E) states that are deceptive.
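The five categories are defined by asymptotic behavior, so no finite record of posteriors can settle which one a state belongs to. Still, a finite-horizon proxy may help fix ideas. The sketch below is a heuristic under stated assumptions, not a decision procedure from the article: it treats "every value in the observed tail is within `eps` of the target" as a stand-in for convergence, and "some value in the tail is within `eps` of the target" as a stand-in for "infinitely often". The function name, the tolerance, and the example tails are hypothetical.

```python
def classify_tail(posteriors, truth, eps=0.01):
    """Heuristically label a finite tail of posteriors P(H | h_n).

    `truth` is the indicator of H (1 if H is true, 0 if false).
    Returns "veridical" or one of the category labels "A" to "E".
    This is only a finite-sample proxy for the asymptotic definitions.
    """
    near_truth = [abs(p - truth) < eps for p in posteriors]
    near_falsehood = [abs(p - (1 - truth)) < eps for p in posteriors]

    if all(near_truth):
        return "veridical"
    if all(near_falsehood):
        return "E"  # proxy for a deceptive state

    visits_truth = any(near_truth)          # proxy: intermittently veridical
    visits_falsehood = any(near_falsehood)  # proxy: intermittently deceptive
    if visits_truth and not visits_falsehood:
        return "A"
    if not visits_truth and not visits_falsehood:
        return "B"
    if visits_truth and visits_falsehood:
        return "C"
    return "D"


# Illustrative tails, all with H true (truth = 1).
print(classify_tail([0.999, 0.9995, 0.9999], 1))  # veridical
print(classify_tail([0.5, 0.999, 0.5], 1))        # A
print(classify_tail([0.5, 0.4, 0.6], 1))          # B
print(classify_tail([0.999, 0.001, 0.999], 1))    # C
print(classify_tail([0.001, 0.5, 0.001], 1))      # D
print(classify_tail([0.001, 0.0005, 0.0001], 1))  # E
```

The ordering of the labels mirrors the list: the proxy checks for deception before it rewards veridicality, in line with the stated priority on avoiding deception.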
We find it helpful to illustrate these categories within the continuing example of sequences of binary outcomes. Consider the set of denumerable, binary sequences: $\mathbf{\text{X}}=\{x:{\mathbf{\text{N}}}^{+}\to \{0,1\}\}$. That is, in terms of the structural assumptions in Doob’s result, ${X}_{i}=\{0,1\}$; each ${\mathcal{B}}_{i}$ is the 4-element algebra $\{\varnothing ,\{0\},\{1\},\{0,1\}\}$, for $i=1,2,\ldots$; and the inclusive $\sigma$-field $\mathbf{\text{B}}$ is the Borel $\sigma$-algebra generated by the product of the ${\mathcal{B}}_{i}$.
First, if $H$ is defined by finitely many coordinates of $x$ (a finite dimensional rectangular event) then ${\text{P}}^{n}(H)$ converges to the indicator function for $H$, ${\mathbf{\text{I}}}_{H}$, after only finitely many observations. Then ${S}_{H}=\mathbf{\text{X}}$ and all states are veridical. That is, there is no sequence where the conditional probabilities ${\text{P}}^{n}(H)$ fail to converge to ${\mathbf{\text{I}}}_{H}$. Moreover, this situation obtains regardless whether $\text{P}$ is countably or merely finitely additive, provided solely that $\text{P(}E|{h}_{n})$ is a conditional probability that satisfies the following propriety condition: $\text{P(}B|A)=1$ whenever $\varnothing \ne A\subseteq B$.
Next, consider an hypothesis that is logically independent of each finite dimensional rectangular event, an hypothesis that is an element of the tail $\sigma$-sub-field of $\mathcal{B}$. For instance, note that each sequence $x$ has a well-defined lim inf $L(x)$ and lim sup $U(x)$ of the relative frequency for the digit 1. For $0\le l\le u\le 1$, let $<l,u>=\{x:L(x)=l,\ U(x)=u\}$. The collection $\{<l,u>:0\le l\le u\le 1\}$ of all such sets is a partition of $\mathbf{\text{X}}$ into $\mathbf{\text{B}}$-measurable events, each of which has cardinality of the continuum. Figure 1, below, graphs these points in the isosceles right triangle with corners $<0,0>$, $<1,1>$ and $<0,1>$.
Let $H$ be the subset of $\mathbf{\text{X}}$ of sequences with a well-defined limit of relative frequency for the digit 1. In Figure 1, $H$ corresponds to the set of ordered pairs $<l,u>$ with $l=u$, the (solid blue) line of points along the main diagonal.
For a countably additive personal probability that satisfies de Finetti’s (1937) condition of exchangeability, this subset $H$ of $\mathbf{\text{X}}$ has personal “prior” probability 1, $\text{P}(H)=1$. Also, assume for convenience that this probability $\text{P}$ is not extreme within the class of exchangeable probabilities: $0<\text{P}(\{1\})<1$. Then for each sequence $x$ in $\mathbf{\text{X}}$, $\text{P}({h}_{n}(x))>0$, and trivially, also $\text{P}(H|{h}_{n}(x))=1$. For the result on asymptotic Bayesian certainty, then ${S}_{H}=H$. However, on the complementary set, for $x\in {S}_{H}^{c}$ the conditional probabilities satisfy: $\text{P}(H|{h}_{n}(x))=1$; hence, each $x\in {S}_{H}^{c}$ is deceptive: category $\left(\text{E}\right)$. Moreover, under these conditions, when a state is not veridical then it is deceptive: the posterior probability converges to $1-{\mathbf{\text{I}}}_{H}(x)$.
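A concrete member of the failure set in this example is any sequence whose relative frequency of 1s never settles. The sketch below builds one such sequence from alternating blocks of 1s and 0s whose lengths double, and tracks its running relative frequency, which keeps swinging between roughly 2/3 and 1/3. The block construction and the function names are illustrative assumptions, not taken from the article. For an agent as described above, $\text{P}(H|{h}_{n}(x))=1$ at every stage along this sequence even though $x\notin H$, so $x$ is a deceptive state.

```python
def oscillating_sequence(num_blocks: int) -> list[int]:
    """Blocks of 1s and 0s with doubling lengths: 1, 00, 1111, 00000000, ..."""
    flips = []
    for k in range(num_blocks):
        symbol = 1 - (k % 2)            # even-numbered blocks are 1s, odd are 0s
        flips.extend([symbol] * 2 ** k)
    return flips


def running_frequency(flips):
    """Relative frequency of 1s after each flip."""
    ones = 0
    freqs = []
    for n, flip in enumerate(flips, start=1):
        ones += flip
        freqs.append(ones / n)
    return freqs


x = oscillating_sequence(16)
freqs = running_frequency(x)

# Zero-based index of the last flip in block k (blocks 0..k hold 2**(k+1) - 1 flips).
block_ends = [2 ** (k + 1) - 2 for k in range(16)]
for k in (12, 13, 14, 15):
    print(k, round(freqs[block_ends[k]], 4))
```

The printed frequencies alternate between values near 2/3 (after a block of 1s) and near 1/3 (after a block of 0s), so this sequence lies off the main diagonal $l=u$ and hence in ${H}^{c}$.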
Definition: Call a failure set ${S}_{H}^{c}$ deceptive if each state in the failure set is deceptive for $H$.
Also, in this case we say that the associated credence for $\text{H}$ is deceptive.
We summarize this elementary finding as follows:
Result 1 Suppose that the credence function treats each possible initial history ${h}_{n}$ as not “null”: $\text{P}({h}_{n}(x))>0$. Then for each hypothesis $H(\ne \mathrm{\Omega})$ for which $\text{P}(H)=1$, the failure set for $H$ is non-empty and deceptive.
Moreover, if the space is uncountable, so that there is an uncountable partition of the space each of whose elements is an uncountable set, as depicted in Figure 1, then we have the following as well:
Corollary For each finitely additive probability $\text{P}$ on a space of denumerable sequences of (logically independent) random variables, where each initial history ${h}_{n}$ is not “null,” there exists an hypothesis $H$, with $\text{P}(H)=1$, whose failure set ${S}_{H}^{c}$ is an uncountable set, and that failure set is deceptive.
The non-veridical states, $x\in {S}_{H}^{c}$, can populate each of the other four categories, (A)–(D). We discuss these in Appendix A.
Next, we apply these findings to a recent debate about what Belot (2013) alleges is mandatory Bayesian orgulity. We understand Belot’s meaning as follows. For a Bayesian agent who satisfies, e.g., the conditions for Doob’s result, the set of samples where the desired asymptotic certainty fails for an hypothesis $H$ (the so-called “failure set” for $H$) has probability 0. Nonetheless, this failure set may be a “large” or “typical” event when considered from a topological perspective. Specifically, the failure set may be comeager with respect to a privileged product topology for the measurable space of data sequences. As we understand Belot’s criticism, such a Bayesian suffers orgulity because she/he is obliged by the mathematics of Bayesian learning to assign probability 0 to the possible evidence where the desired asymptotic result fails, even when this failure set is comeager.
In a (2016) reply to Belot’s analysis, A. Elga focuses on the premise of countable additivity in Doob’s result. Countable additivity is required in neither Savage’s (1972) nor de Finetti’s (1974) theories of Bayesian coherence. Elga gives an example of a merely finitely additive (and not countably additive) probability over denumerable binary sequences and a particular hypothesis $H$ where with positive probability (in fact, with probability 1) the investigator’s posterior probability fails to converge to the indicator function for $H$. So, not all finitely additive coherent Bayesians display orgulity.
M. Nielsen and R. Stewart (2019) extend the debate by explicating what they understand to be Belot’s rival account of reasonable modesty of Bayesian conditional probabilities. They offer a reconciliation of Elga’s rebuttal and Belot’s topological perspective. For Nielsen and Stewart, a credence function is modest for an hypothesis $H$ provided that it gives (unconditional) positive probability to the failure set for the convergence of posterior probabilities to the indicator function for $H$. By this account, each credence in the class of countably additive credences is immodest over all hypotheses that are the subject of the asymptotic convergence result but have non-empty failure sets. Since requiring modesty for all such hypotheses is too strong of a condition even for (merely) finitely additive credences—as per the Corollary to Result 1, above—Nielsen and Stewart propose a standard of reasonable modesty. This condition requires modesty solely for failure sets that are typical in the topological sense, for some privileged topology.
With their Propositions 1 and 2, Nielsen and Stewart point out that there exists a class of merely finitely additive credences (with cardinality of the continuum) such that each credence function in this class assigns unconditional positive probability (even probability 1) to each comeager set. Then, such a credence displays reasonable modesty for each failure set that is “typical.”
Below, we show that the reasonably modest credences that Nielsen and Stewart point to with their Proposition 1, nonetheless, mandate deceptive failure sets for specific hypotheses. And as we explain (in Appendix B), Nielsen and Stewart’s Proposition 2 provides reasonably modest credences in their technical sense at the price of making it impossible to learn about hypotheses that concern unobserved parameters, in all familiar statistical models.
First we argue that this sense of “modesty” is mistaken when deception is not a null event, regardless whether the modesty is reasonable or not. When the investigator’s credences are merely finitely additive, with respect to a particular hypothesis the failure set for Doob’s result may have positive prior probability, as is well known.^{9} In such cases, the investigator’s credences are called modest according to Nielsen and Stewart. Suppose, further, that such a modest credence also has a deceptive failure set. Then, each state is either veridical or deceptive. But the investigator behaves just as though asymptotic certainty tracks the truth. That is, the fact that the set of deceptive states (for a particular hypothesis) has positive probability—$\text{P}\left({S}_{H}^{c}\right)>0$ rather than $\text{P}\left({S}_{H}^{c}\right)=0$—the fact that the investigator’s credence is “modest,” is irrelevant to the investigator’s decision making. Here is why.
Let $H$ be an hypothesis, and suppose that each state is either veridical for $H$ or deceptive for $H$. Then, for each state $x$, the sequence $\{\text{P}(H|{h}_{n}(x)):n=1,2,\text{\hspace{0.17em}}.\text{\hspace{0.17em}}.\text{\hspace{0.17em}}.\}$ converges to 1 if and only if either $x$ is veridical and in $H$, or if $x$ is deceptive and in ${H}^{\text{c}}$. And $\{\text{P}(H|{h}_{n}(x)):n=1,2,\text{\hspace{0.17em}}.\text{\hspace{0.17em}}.\text{\hspace{0.17em}}.\}$ converges to 0 if and only if $x$ is veridical and in ${H}^{\text{c}}$, or if $x$ is deceptive and in $H$. Hence, the investigator becomes asymptotically certain about the truth of $H$ no matter what data are observed. This analysis holds regardless of what prior probability the investigator assigns to $H$ and regardless how probable is the failure set. The modesty of $\text{P}$ for $H$, namely that $\text{P}\left({S}_{H}^{c}\right)>0$, is irrelevant to this conclusion. And so too, it is irrelevant to this conclusion whether the modesty of $\text{P}$ for $H$ is reasonable or not. It is irrelevant whether ${S}_{H}^{c}$ is a comeager set or not.
To put this analysis in behavioral terms, suppose the Bayesian investigator faces a sequence of decisions. These decisions might be practical, with cardinal utilities that reflect economic or legal, or ethical consequences. Or, these decisions might be cognitive with epistemically motivated utilities, e.g., for desiring true hypotheses over false ones, or for desiring more informative over less informative hypotheses. Or, these might form a mixed sequence of decisions, with some practical and some cognitive. Suppose each decision in this sequence rides on the probability for one specific hypothesis $H$ and, regarding the corresponding sequence of Bayesian conditional probabilities for $H$ that parallel these decisions, the investigator’s credence is deceptive for $H$. Then, asymptotically, the investigator’s sequence of decisions will be determined by the asymptotic certainty—the conditional credence for $H$ of 0 or 1—that surely results, no matter which sequence of observations obtains. But if also the investigator has a positive unconditional probability for deception, this “modesty” plays no role in her/his sequence of decisions. The “modesty” reported by her/his unconditional probability of deception, $\text{P}\left({S}_{H}^{c}\right)>0$, be it a large or a small positive probability, is irrelevant to the sequence of decisions that she/he makes. When a failure set is both deceptive and non-null, the Bayesian investigator ignores this in her/his decision making, treating all certainties alike. Just as if $\text{P}\left({S}_{H}^{c}\right)=0$. We do not agree, then, that the investigator’s credences are modest for hypothesis $H$ when the failure set is deceptive and $\text{P}\left({S}_{H}^{c}\right)>0$.
One example in which the conditions of this analysis hold was given by Elga (2016) and is an instance of our continuing example about binary sequences. In Elga’s example, $H$ is the hypothesis that the binary sequence satisfies $L(x)=U(x)=.9$. In his example the failure set ${S}_{H}^{c}$ is deceptive with probability 1, i.e., $\text{P}(\{x:x\text{ is deceptive for }H\})=1$.^{10}
A large class of examples of this kind arises by using Proposition 1 of Nielsen and Stewart. Here is how Proposition 1 applies to the continuing example of the Borel space, $\mathcal{B}$, of binary sequences on $\{0,1\}$. Let ${\text{P}}_{1}$ be a non-extreme, exchangeable countably additive probability. That is, in addition to ${\text{P}}_{1}$ being exchangeable, ${\text{P}}_{1}({h}_{n})>0$ for each $n=1,2,\ldots$ and for each of the ${2}^{n}$ possible initial histories ${h}_{n}$. By Doob’s result, ${\text{P}}_{1}$ is not modest (in Nielsen and Stewart’s sense) because, for each hypothesis $H$, its failure set is ${\text{P}}_{1}$-null: ${\text{P}}_{1}\left({S}_{H}^{c}\right)=0$. Let ${\text{P}}_{2}$ be a finitely additive, 0–1 (“ultrafilter”) probability with the property that if $E$ is a comeager set in $\mathcal{B}$, then ${\text{P}}_{2}(E)=1$.^{11} Fix $0<y<1$ and define $\text{P}=y{\text{P}}_{1}+(1-y){\text{P}}_{2}$, the $y:(1-y)$ mixture of these two probabilities.
Nielsen and Stewart’s Proposition 1 establishes that $\text{P}$ is reasonably modest, since for each hypothesis $H$, if the failure set ${S}_{H}^{c}$ is comeager, then $\text{P}\left({S}_{H}^{c}\right)>0$. However, as we show next, Proposition 1 creates reasonably modest credences that, in the Continuing Example, have failure sets for specific hypotheses that have positive probability, are comeager, and are deceptive.
Result 2 In the continuing example, let $H$ be the hypothesis that the binary sequence belongs to the set of maximally chaotic relative frequencies, corresponding to the (red) point $<0,1>$ in Figure 1. This is the set of sequences with $\liminf\,(\text{rel. freq. of “1”})=0$ and $\limsup\,(\text{rel. freq. of “1”})=1$. Then the failure set for $H$ under $\text{P}$, ${S}_{H}^{c}$, has positive probability, $\text{P}\left({S}_{H}^{c}\right)=(1-y)>0$, is comeager, and is deceptive.
Proof: Because both ${\text{P}}_{1}(H)=0$ and for each history ${h}_{n}$, ${\text{P}}_{1}({h}_{n})>0$, then ${\text{P}}_{1}(H|{h}_{n})=0$.
Under ${\text{P}}_{2}$ there is a distinguished binary sequence ${x}_{{P}_{2}}$ in the following sense. The finite initial histories form a binary branching tree: for each $n$ there are ${2}^{n}$ distinct histories ${h}_{n}$. Because ${\text{P}}_{2}$ is an “ultrafilter” distribution, for each $n$ and for each possible finite initial history ${h}_{n}$ of length $n$, ${\text{P}}_{2}({h}_{n})=0$ or ${\text{P}}_{2}({h}_{n})=1$. So, there is one and only one sequence ${x}_{{P}_{2}}$ where, for each $n$, ${\text{P}}_{2}\left({h}_{n}\left({x}_{{P}_{2}}\right)\right)=1$.^{12} That is, for each sequence ${x}^{\prime}\ne {x}_{{P}_{2}}$ there exists an $m$ such that for all $n>m$, ${\text{P}}_{2}({h}_{n}({x}^{\prime}))=0$. Thus, for each ${x}^{\prime}\ne {x}_{{P}_{2}}$ there exists an $m$ such that for all $n>m$, $\text{P}(H|{h}_{n}({x}^{\prime}))={\text{P}}_{1}(H|{h}_{n}({x}^{\prime}))=0$.^{13}
Specifically, the failure set ${S}_{H}^{c}$ is either the set $H-\left\{{x}_{{P}_{2}}\right\}$ (if the sequence ${x}_{{P}_{2}}$ belongs to $H$), or it is the set $H\cup \left\{{x}_{{P}_{2}}\right\}$ (if the sequence ${x}_{{P}_{2}}$ belongs to ${H}^{\text{c}}$). In either case, the failure set ${S}_{H}^{c}$ is deceptive for $H$. According to Cisewski et al. (2018) $H$ is a comeager set. Evidently then, ${S}_{H}^{c}$ is a comeager set where $\text{P}({S}_{H}^{c})=(1-y){\text{P}}_{2}({S}_{H}^{c})=(1-y){\text{P}}_{2}(H)=(1-y)>0$.^{14} QED
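The posterior values used in the proof can be checked by direct computation. The sketch below assumes, for illustration only, that ${\text{P}}_{1}$ is the exchangeable probability with a uniform mixing prior, so that a specific history with `ones` 1s in `n` flips has ${\text{P}}_{1}$-probability $1/((n+1)\binom{n}{\text{ones}})$. It also uses two facts from the construction: ${\text{P}}_{1}(H)=0$, and ${\text{P}}_{2}(H\cap {h}_{n})$ equals 1 when ${h}_{n}$ is an initial segment of the distinguished sequence and 0 otherwise. The function names and the boolean flag marking the distinguished path are hypothetical devices, since ${\text{P}}_{2}$ itself cannot be written down explicitly.

```python
from fractions import Fraction
from math import comb


def p1_history(ones: int, n: int) -> Fraction:
    """P1 of one specific history with `ones` 1s in n flips (uniform mixing prior)."""
    return Fraction(1, (n + 1) * comb(n, ones))


def mixture_posterior(y: Fraction, ones: int, n: int, on_distinguished_path: bool) -> Fraction:
    """P(H | h_n) for P = y*P1 + (1 - y)*P2, with H the <0,1> hypothesis.

    P1(H) = 0, so only P2 contributes to the numerator. P2 gives the history
    (and its intersection with H) probability 1 on the distinguished path
    and probability 0 off it.
    """
    p2_history = 1 if on_distinguished_path else 0
    numerator = (1 - y) * p2_history
    denominator = y * p1_history(ones, n) + (1 - y) * p2_history
    return numerator / denominator


y = Fraction(1, 2)
# Off the distinguished path the posterior for H is exactly 0.
print(mixture_posterior(y, ones=10, n=20, on_distinguished_path=False))
# On the distinguished path it climbs toward 1 as P1(h_n) shrinks.
for n in (1, 5, 20):
    print(n, float(mixture_posterior(y, ones=n // 2, n=n, on_distinguished_path=True)))
```

Off the distinguished path the posterior is 0 whether or not the sequence belongs to $H$, which is why every member of $H$ other than ${x}_{{P}_{2}}$ is deceptive.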
We emphasize that certainty with deception is indistinguishable from certainty that is veridical. In the context of Result 2, the investigator can tell when the observed history ${h}_{n}$ differs from the history that would be observed in the one distinguished sequence, ${h}_{n}\left({x}_{{P}_{2}}\right)$. But that recognition provides no basis for altering the certainty, $\text{P}(H|{h}_{n})=0$, that results once the observed history departs from the distinguished one, once ${h}_{n}\ne {h}_{n}\left({x}_{{P}_{2}}\right)$. Regardless the magnitude of the (unconditional) probability of deception, $\text{P}\left({S}_{H}^{c}\right)$, the investigator cannot identify when certainty is deceptive rather than when it is veridical. Her/his conditional credence function, $\text{P}(\text{\hspace{0.17em}}\cdot |{h}_{n})$, already takes into account the total evidence available. Certainty is certainty, full stop.
We have argued above that a credence $\text{P}$ is not epistemically modest where there is an hypothesis $H$ that has a deceptive failure set ${S}_{H}^{c}$ that is not P-null. Then, in the continuing example, each probability $\text{P}$ created according to Proposition 1 fails this test of epistemic modesty.
In summary, it is our view that having a positive probability over non-veridical states is not sufficient for creating an epistemically modest credence because categories (D) or (E) may have positive prior probability as well. Indeed, in the continuing example, each probability $\text{P}$ created according to Proposition 1 fails this test of epistemic modesty.
When the failure set for an hypothesis is deceptive and not null, that is in conflict with an attitude of epistemic modesty about learning that hypothesis.
Regarding the asymptotics of Bayesian certainties, e.g., Doob’s result, neither of Nielsen and Stewart’s concepts, modesty or reasonable modesty, distinguishes deceptive from other varieties of failure sets. According to Result 2, in the continuing example each credence $\text{P}$ that satisfies Nielsen and Stewart’s Proposition 1 admits an hypothesis whose failure set is P-non-null, comeager, and deceptive.
We thank two anonymous referees for their constructive feedback. Research for this paper was supported by NSF grant DMS-1916002.
Belot, Gordon (2013). Bayesian Orgulity. Philosophy of Science, 80(4), 483–503.
Cisewski, Jessica, Kadane, J. B., Schervish, M. J., Seidenfeld, T., and Stern, R. B. (2018). Standards for Modest Bayesian Credences. Philosophy of Science, 85(1), 53–78.
de Finetti, Bruno (1937). Foresight: Its Logical Laws, Its Subjective Sources. In Kyburg, Henry E. Jr. and Smokler, Howard E. (Eds.), Studies in Subjective Probability (1964). John Wiley.
de Finetti, Bruno (1974). Theory of Probability (Vol. 1). John Wiley.
Doob, Joseph L. (1953). Stochastic Processes. John Wiley.
Elga, Adam (2016). Bayesian Humility. Philosophy of Science, 83(3), 305–23.
James, William (1896). The Will to Believe. The New World, 5, 327–47. Reprinted in his (1962) Essays on Faith and Morals. The World.
Kadane, Joseph B., Schervish, M. J., and Seidenfeld, T. (1996). Reasoning to a Foregone Conclusion. Journal of the American Statistical Association, 91(435), 1228–36.
Lindley, Dennis V. (2006). Understanding Uncertainty. John Wiley.
Nielsen, Michael, and Stewart, R. (2019). Obligation, Permission, and Bayesian Orgulity. Ergo, 6(3), 58–70.
Savage, Leonard J. (1972). The Foundations of Statistics (2nd rev. ed.). Dover.
Schervish, Mark J. and Seidenfeld, T. (1990). An Approach to Certainty and Consensus with Increasing Evidence. Journal of Statistical Planning and Inference, 25(3), 401–14.
Weatherson, Brian (2015). For Bayesians, Rational Modesty Requires Imprecision. Ergo, 2(20), 529–45.
Here, we discuss and illustrate categories (A)–(D) of failure sets using the continuing example. Restrict the exchangeable “prior” probability $\text{P}$ so that, in terms of de Finetti’s Representation Theorem, the “mixing prior” for the Bernoulli parameter is smooth, e.g., let it be the uniform $U[0,1]$. Choose $0<c<d<1$ and consider the hypothesis $H=\{x:c\le L(x)\le U(x)\le d\}$. So, with the “uniform” prior, $\text{P}(H)=d-c$; so, $1>\text{P}(H)>0$.
The set of veridical states for this credence and hypothesis includes each sequence where,
either $c<L(x)\le U(x)<d$, in which case $H$ obtains and $\lim_{n\to \infty}\text{P}(H|{h}_{n})=1$;
or, either $U(x)<c$ or $L(x)>d$, in which case ${H}^{\text{c}}$ obtains and $\lim_{n\to \infty}\text{P}(H|{h}_{n})=0$.^{15}
The non-veridical states (the failure set) ${S}_{H}^{c}$, the set of sequences where $\text{P}(H|{h}_{n}(x))$ does not converge to the indicator ${\mathbf{\text{I}}}_{H}(x)$, include states $x$ such that $L(x)<c<U(x)$ or $L(x)<d<U(x)$. For such a state $x$, $\text{P}(H|{h}_{n}(x))$ fails to converge and
$\liminf_{n\to \infty}\text{P}(H|{h}_{n}(x))=0$ and $\limsup_{n\to \infty}\text{P}(H|{h}_{n}(x))=1$.
Then $x$ is both intermittently veridical and intermittently deceptive for $H$—category (C).
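The oscillation in category (C) can be checked numerically on a finite prefix. The following is only an illustrative sketch, not part of the original argument: it assumes the uniform mixing prior, under which $\text{P}(H|{h}_{n})$ is the mass that a Beta posterior assigns to $[c,d]$, and it builds one particular sequence whose relative frequency swings between roughly 0.3 and 0.5 with $c=0.4$ and $d=0.6$. The function names and the chosen constants are hypothetical.

```python
import math

def beta_cdf_int(x, a, b):
    """CDF at x of a Beta(a, b) law with positive integer a, b.

    Uses the identity I_x(a, b) = P(Binomial(a + b - 1, x) >= a),
    evaluated in log space to avoid overflow.
    """
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    m = a + b - 1
    log_x, log_1mx = math.log(x), math.log1p(-x)
    total = 0.0
    for j in range(a, m + 1):
        log_term = (math.lgamma(m + 1) - math.lgamma(j + 1)
                    - math.lgamma(m - j + 1)
                    + j * log_x + (m - j) * log_1mx)
        total += math.exp(log_term)
    return min(total, 1.0)

def posterior_H(ones, n, c, d):
    """P(H | h_n) under the uniform prior (an assumption of this sketch).

    After `ones` 1s in n trials the posterior for the Bernoulli
    parameter is Beta(ones + 1, n - ones + 1); H gets its mass on [c, d].
    """
    a, b = ones + 1, n - ones + 1
    return max(0.0, beta_cdf_int(d, a, b) - beta_cdf_int(c, a, b))

def oscillating_turning_points(low, high, cycles):
    """Build a binary sequence whose relative frequency of 1s alternates
    between >= high (after a run of 1s) and <= low (after a run of 0s).

    Returns a list of (label, ones, n) recorded at each turning point.
    """
    ones, n = 0, 0
    points = []
    for _ in range(cycles):
        while n == 0 or ones / n < high:   # run of 1s
            ones += 1
            n += 1
        points.append(("high", ones, n))
        while ones / n > low:              # run of 0s
            n += 1
        points.append(("low", ones, n))
    return points

c, d = 0.4, 0.6
for label, ones, n in oscillating_turning_points(0.3, 0.5, 7):
    print(label, n, round(ones / n, 3), posterior_H(ones, n, c, d))
```

On this prefix the posterior for $H$ is pushed toward 1 at the "high" turning points and toward 0 at the "low" ones, which is the finite-sample counterpart of the $\liminf$ and $\limsup$ behavior described above. A finite computation cannot, of course, establish the limiting claims themselves.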
In order to illustrate the other three categories of non-veridical states, (A), (B), and (D), the following adaptation of the previous construction suffices. Depending upon which category is to be displayed, consider a state $x$ such that the likelihood ratio $\text{P}({h}_{n}(x)|H)/\text{P}({h}_{n}(x)|{H}^{c})$ oscillates with suitably chosen bounds, in order to have the sequence of posterior odds,
We illustrate category (A) using the same hypothesis $H=\{x:c\le L(x)\le U(x)\le d\}$ and credence as above. For a non-veridical state in category (A), consider a sequence $x$ such that both:
$c<U(x)<d$. Then $x$ is intermittently veridical as, infinitely often, the relative frequency of ‘1’ falls strictly between $c$ and $d$, and
$L(x)=c$ but there exists $0<\rho<\infty$, where for only finitely many values of $n$,
$\text{P}({h}_{n}(x)|H)/\text{P}({h}_{n}(x)|{H}^{\text{c}})<\rho$, so that $x$ is not intermittently deceptive;
and infinitely often $\text{P}({h}_{n}(x)|H)/\text{P}({h}_{n}(x)|{H}^{\text{c}})=\rho$, so that $x$ is not veridical.
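The role of the bound $\rho$ can be made explicit by writing Bayes’ theorem in odds form. This is a sketch under the uniform mixing prior used above, so that $\text{P}(H)=d-c$; the shorthand ${\pi}_{0}$ for the prior odds is introduced here only for illustration and does not appear elsewhere in the text.

$$\frac{\text{P}(H|{h}_{n}(x))}{\text{P}({H}^{\text{c}}|{h}_{n}(x))}=\frac{\text{P}({h}_{n}(x)|H)}{\text{P}({h}_{n}(x)|{H}^{\text{c}})}\cdot {\pi}_{0},\qquad {\pi}_{0}=\frac{\text{P}(H)}{\text{P}({H}^{\text{c}})}=\frac{d-c}{1-(d-c)}.$$

Whenever the likelihood ratio equals $\rho$, the posterior probability is

$$\text{P}(H|{h}_{n}(x))=\frac{\rho\,{\pi}_{0}}{1+\rho\,{\pi}_{0}},$$

which lies strictly between 0 and 1. Since $H$ obtains at such a state $x$, returning to this value infinitely often prevents convergence to 1, while a likelihood ratio that falls below $\rho$ only finitely often keeps the posterior eventually bounded away from 0.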
In this appendix we consider Nielsen and Stewart’s Proposition 2, and related approaches for creating a reasonably modest credence, $\text{P}\text{'}$. We adapt Proposition 2 to the continuing example of the Borel space of denumerable binary sequences. Consider a finitely additive probability $\text{P}\text{'}$ on the space of binary sequences in accord with Nielsen and Stewart’s Proposition 2, where
(i) $\text{P}\text{'}({h}_{n})>0$ for each possible finite initial history;
and (ii) $\text{P}'(E)=1$, whenever $E$ is comeager.
Nielsen and Stewart’s Proposition 2 asserts that, however $\text{P}'$ is defined on the field of finite initial histories, which we denote by $\mathcal{A}$, with $\mathcal{A}\subset \mathcal{B}$, $\text{P}'$ may be extended to a finitely additive probability that is extreme with respect to the field of comeager and meager sets in $\mathcal{B}$. For example, if ${\text{P}}_{1}$ is a countably additive probability on $\mathcal{B}$, then $\text{P}'$ might agree with ${\text{P}}_{1}$ on $\mathcal{A}$, while $\text{P}'(E)=1$ if $E$ is a comeager set. Then, $\text{P}'$ is reasonably modest in the technical sense used by Nielsen and Stewart since, whenever a failure set ${S}_{H}^{c}$ is comeager, $\text{P}'\left({S}_{H}^{c}\right)=1$.
We do not know whether the conclusion of Result 2 extends also to the reasonably modest credences $\text{P}\text{'}$ created according to the technique of Proposition 2. For instance, we do not know, for a general $\text{P}\text{'}$, when an hypothesis $H$ has a deceptive failure set ${S}_{H}^{c}$ with ${\text{P}}^{\prime}\left({S}_{H}^{c}\right)>0$. Evidently, we are unwilling to grant that a credence satisfying Proposition 2 is epistemically modest about learning an hypothesis $H$ merely because ${\text{P}}^{\prime}\left({S}_{H}^{c}\right)>0$ whenever ${S}_{H}^{c}$ is a comeager set.
However, there is a second issue that tells against the technique of Proposition 2 for creating reasonable modesty. In Proposition 1, probability values from the immodest countably additive credence ${\text{P}}_{1}$ for events in the tail field of $\mathcal{B}$ are relevant to the values that the reasonably modest credence $\text{P}$ gives these events. And, as ${\text{P}}_{1}$ is countably additive, the ${\text{P}}_{1}$ probability values for tail events are approximated by ${\text{P}}_{1}$ values in $\mathcal{A}$. In short, under the method used in Proposition 1, ${\text{P}}_{1}$ probability values for events in $\mathcal{A}$ constrain the reasonably modest values of $\text{P}\left({S}_{H}^{c}\right)$. However, in Proposition 2 the ${\text{P}}_{1}$ values in $\mathcal{A}$ are not relevant to the $\text{P}'$-values for events in the tail field. In Proposition 2, the $\text{P}'$ probability values are stipulated to be extreme for comeager sets, regardless of how the $\text{P}'$-credences are assigned to the elements of the observable $\mathcal{A}$. The upshot is that with $\text{P}'$ credences the investigator is incapable of learning about comeager sets based on Bayesian learning from finite initial histories.
With respect to the continuing example, Cisewski et al. (2018) establish that the set of sequences corresponding to the one point $\langle 0,1\rangle$ in Figure 1 is comeager. Thus, in order to assign a prior probability 1 to each comeager set, this agent is required to hold an extreme credence that the sequence has maximally chaotic relative frequencies: $\text{P}'\{x:x\in \langle 0,1\rangle \}=1$.
As above, let the hypothesis of interest be $H=\{x:x\in \langle 0,1\rangle \}$: the hypothesis that the sequence has maximally chaotic relative frequencies. Then Result 1 obtains as $\text{P}'(H)=1$ and $\text{P}'(H|{h}_{n})=1$ for each $n=1,2,\ldots$. No matter what the agent observes, her/his posterior credence about $H$ remains extreme. With credence $\text{P}'$, the failure set for $H$ is the meager set (hence a $\text{P}'$-null set) of continuum many states corresponding to each point in Figure 1 other than the corner $\langle 0,1\rangle$. Each point in the failure set for $H$ is deceptive: the failure set ${S}_{H}^{c}$ is deceptive!^{16} On what basis do Nielsen and Stewart dismiss the deceptiveness of ${S}_{H}^{c}$ as irrelevant to the question whether $\text{P}'$ is an appropriate credence for investigating statistical properties of binary sequences? We speculate their answer is, solely, that the failure set ${S}_{H}^{c}$ is meager.
Propositions 1 and 2 do not exhaust the varieties of finitely additive probabilities that assign positive probability to each comeager set in $\mathcal{B}$. For instance, one may recombine the techniques from these two Propositions as follows.
Let ${\text{P}}_{1}$ be an (immodest) countably additive probability on $\mathcal{B}$ that assigns positive probability to each finite initial history. Let ${\text{P}}_{2}$ be a finitely additive probability defined on $\mathcal{B}$ obtained by the technique of Proposition 2, but where ${\text{P}}_{1}$ and ${\text{P}}_{2}$ agree on $\mathcal{A}$. So, ${\text{P}}_{2}(H)=1$, for the hypothesis $H$ that the sequence is maximally chaotic. Then, in the spirit of Proposition 1, define ${\text{P}}_{3}$ as a (non-trivial) convex combination of ${\text{P}}_{1}$ and ${\text{P}}_{2}$: let $0<y<1$ and define ${\text{P}}_{3}=y{\text{P}}_{1}+(1-y){\text{P}}_{2}$. Then ${\text{P}}_{3}$ avoids the difficulty displayed by the probability $\text{P}'$ of Proposition 2, discussed above, namely ${\text{P}}_{3}(H)=1-y<1$. There is no prior certainty under ${\text{P}}_{3}$ that the sequence is maximally chaotic.
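The stated value of ${\text{P}}_{3}(H)$ can be unpacked in one line. The step below is a sketch that assumes ${\text{P}}_{1}(H)=0$, which the value $1-y$ requires; this holds, for instance, when ${\text{P}}_{1}$ is exchangeable, since relative frequencies then converge with ${\text{P}}_{1}$-probability 1.

$${\text{P}}_{3}(H)=y\,{\text{P}}_{1}(H)+(1-y)\,{\text{P}}_{2}(H)=y\cdot 0+(1-y)\cdot 1=1-y.$$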
But ${\text{P}}_{3}$ has its own difficulties. Here are two. First, the Corollary applies to ${\text{P}}_{3}$ with the hypothesis $\tilde{H}$: that the sequence is either maximally chaotic or has a well-defined limit of relative frequency. In Figure 1, $\tilde{H}$ corresponds to the sequences either in the set corresponding to the point $\langle 0,1\rangle$ or in the set of points with well-defined limits of relative frequency, where $L(x)=U(x)$. The ${\text{P}}_{3}$ failure set for $\tilde{H}$ is uncountable and deceptive, though meager. Second, ${\text{P}}_{3}$ makes all observations irrelevant for learning about the hypothesis $H$: the sequence is maximally chaotic. This follows because