Article
Authors: Christian Tarsney (University of Texas at Austin), Teruji Thomas (University of Oxford)
Is the overall value of a world just the sum of values contributed by each value-bearing entity in that world? Additively separable axiologies (like total utilitarianism, prioritarianism, and critical level views) say 'yes', but non-additive axiologies (like average utilitarianism, rank-discounted utilitarianism, and variable value views) say 'no'. This distinction appears to be practically important: among other things, additive axiologies generally assign great importance to large changes in population size, and therefore tend to strongly prioritize the long-term survival of humanity over the interests of the present generation. Non-additive axiologies, on the other hand, need not assign great importance to large changes in population size. We show, however, that when there is a large enough 'background population' unaffected by our choices, a wide range of non-additive axiologies converge in their implications with additive axiologies—for instance, average utilitarianism converges with critical-level utilitarianism and various egalitarian theories converge with prioritarianism. We further argue that real-world background populations may be large enough to make these limit results practically significant. This means that arguments from the scale of potential future populations for the astronomical importance of avoiding existential catastrophe, and other arguments in practical ethics that seem to presuppose additive separability, may succeed in practice whether or not we accept additive separability as a basic axiological principle.
How to Cite: Tarsney, C. & Thomas, T. (2024) “Non-Additive Axiologies in Large Worlds”, Ergo an Open Access Journal of Philosophy. 11(0). doi: https://doi.org/10.3998/ergo.5714
Is the overall value of a possible world just the sum of values contributed by individual value-bearing entities in that world? This question represents a central dividing line in axiology, between axiologies that are additively separable (hereafter usually abbreviated ‘additive’) and those that are not. Additive axiologies allow the value of a world to be represented as a sum of values independently contributed by each value-bearing entity in that world, while non-additive axiologies do not. Total utilitarianism, for example, claims that the value of a world is simply the sum of the welfare of every welfare subject in that world, and is therefore additive. On the other hand, average utilitarianism, which identifies the value of a world with the average welfare of all welfare subjects, is non-additive.
As these examples suggest, we will assume the context of welfarist population axiology, meaning that we take the ‘value bearers’ to be the lives of welfare subjects, and assume that ‘value’ is a function of their welfare—although, unsurprisingly, our formal results will not depend on this interpretation.
Prima facie, the question of additive separability appears to carry considerable practical significance. In particular, according to any additive axiology, the value contributed to the world by all future people depends linearly on how many such people there will be. This means that additive axiologies are likely to assign very great importance to existential catastrophes (human extinction or other events that would seriously curtail humanity’s future prospects), since these events will generally correspond to very large reductions in future population size (Bostrom 2003; 2013). On an additive axiology, the sheer number of people whose existence is at stake strongly suggests that we should be willing to pay very high costs (e.g., in terms of the welfare of the present generation) for the sake of avoiding existential catastrophe. In contrast, many non-additive axiologies—particularly average utilitarianism and various kindred views—are not sensitive in the same way to population size, and may therefore regard the question of humanity’s long-term survival as having much more limited significance in comparison with the welfare of the present generation.
As a stylized illustration: suppose that there are $n$ existing people, all with welfare 1. We can either (a) leave things unchanged, (b) improve the welfare of all the existing people from 1 to 2, or (c) create some number $m$ of new people with welfare 1.5. Total utilitarianism, of course, tells us to choose (c), as long as $m$ is sufficiently large. But average utilitarianism—while agreeing that (c) is better than (a) and that the larger $m$ is, the better—nonetheless prefers (b) to (c) no matter how astronomically large $m$ may be. Now, additive axiologies can disagree with total utilitarianism here if they claim that adding people with welfare 1.5 makes the world worse instead of better; but the broader point is that they will almost always claim that the difference in value between (a) and (c) becomes astronomically large (whether positive or negative) as $m$ increases—bigger, for example, than the difference in value between (a) and (b). Non-additive axiologies, on the other hand, need not regard (c) as making a big difference to the value of the world, regardless of $m$. Again, average utilitarianism agrees with total utilitarianism that (c) is an improvement over (a), but regards it as a smaller improvement than (b), even when it affects vastly more individuals.
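To make the contrast concrete, here is a small numerical sketch (ours, not the authors'); the welfare levels follow the example above, and the values of $n$ and $m$ are purely illustrative.

```python
# Illustrative only: n existing people at welfare 1; option (b) raises them to 2;
# option (c) adds m new people at welfare 1.5.
n = 10**9
m = 10**12

def total_util(pop):            # pop: list of (welfare level, count) pairs
    return sum(w * c for w, c in pop)

def average_util(pop):
    return total_util(pop) / sum(c for _, c in pop)

A = [(1, n)]                    # (a) leave things unchanged
B = [(2, n)]                    # (b) improve everyone from 1 to 2
C = [(1, n), (1.5, m)]          # (c) add m people at welfare 1.5

# Total utilitarianism: (c)'s value grows without bound in m and eventually dwarfs (b)'s.
assert total_util(C) > total_util(B) > total_util(A)
# Average utilitarianism: (c) beats (a), but can never beat (b), however large m is.
assert average_util(B) > average_util(C) > average_util(A)
assert average_util(C) < 1.5    # bounded above by 1.5 for every m
```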
Thus, additive separability seems to play a crucial role with respect to arguably the most important practical question in population ethics: the relative importance of (i) ensuring the long-term survival of our civilization and its ability to support a very large number of future individuals with lives worth living vs. (ii) improving the welfare of the present population.
In this paper, however, we show that under certain circumstances a wide range of non-additive axiologies ‘converge’ with additive ones: that is, they have the same practical implications as certain additive axiologies to which they correspond. This convergence between additive and non-additive axiologies has a number of interesting consequences, but perhaps the most important is that non-additive axiologies can inherit the linear sensitivity of their additive counterparts to changes in population size. This makes arguments for the overwhelming importance of avoiding existential catastrophe based on the potentially astronomical scale of the far future less reliant on the controversial assumption of additive separability. It thereby increases the robustness of the practical case for the overwhelming importance of avoiding existential catastrophe and diminishes the practical importance of additive separability as an abstract axiological principle.
Our starting place is the observation that, according to non-additive axiologies, which of two outcomes is better can depend on the welfare of the people unaffected by the choice between them. That is, suppose we are comparing two populations $X$ and $Y$.1 And suppose that, besides $X$ and $Y$, there is some ‘background population’ $Z$ that would exist either way. ($Z$ might include, for instance, past human or non-human welfare subjects on Earth, faraway aliens, or present/future welfare subjects who are simply unaffected by our present choice.) Non-additive axiologies allow that whether $X$-and-$Z$ is better than $Y$-and-$Z$ can depend on facts about $Z$.2
With this in mind, our argument has two steps. First, we prove several results to the effect that, if the background population is sufficiently large, then non-additive axiologies converge with additive ones. For example, average utilitarianism converges with critical-level utilitarianism, and various egalitarian theories converge with prioritarianism. Second, we argue that the background populations in real-world choice situations are large—at a minimum, orders of magnitude larger than the present and near-future human population, and plausibly orders of magnitude larger than the entire population of our future light cone. This provides some prima facie reason to believe that non-additive axiologies of the types we survey will agree closely with their additive counterparts in practice. More specifically, we argue that real-world background populations are large enough to substantially increase the importance that average utilitarianism assigns to avoiding existential catastrophe.
The paper proceeds as follows: Section 2 introduces some formal concepts and notation. Section 3 formally defines additive separability and describes some important classes of additive axiologies. Sections 4–5 survey several classes of non-additive axiologies and show that they become additive in the large-background-population limit. Section 6 argues that real-world background populations are large, and briefly considers what their welfare distributions might look like. Section 7 illustrates the implications of the preceding arguments by examining how realistic background populations affect the importance of avoiding existential catastrophe according to average utilitarianism. Section 8 considers (without endorsing) two ways in which our results might be taken as arguments against the non-additive views to which they apply. Section 9 is the conclusion.
All of the axiologies we will consider evaluate worlds based only on the number of welfare subjects at each level of lifetime welfare. We will consider only worlds containing a finite total number of welfare subjects. We will also set aside worlds that contain no welfare subjects, simply because some population axiologies, like average utilitarianism, do not evaluate such empty worlds.
Thus, for formal purposes, a population is a function $X$ from the set $W$ of all possible welfare levels to the set of all non-negative integers, specifying the number of welfare subjects at each level; we require it to be finitely supported, and not everywhere equal to zero.3 Despite this formalism, we’ll say that a welfare level $w$ occurs in a population $X$ if $X(w) > 0$. An axiology $\mathcal{A}$ is a strict partial order $\succ_{\mathcal{A}}$ on the set of all populations, with ‘$X \succ_{\mathcal{A}} Y$’ meaning that population $X$ is better than population $Y$ according to $\mathcal{A}$.4
Almost all the axiologies we will consider in this paper are defined in terms of a value function $V$, which represents the axiology’s ranking of worlds in the sense that $X \succ Y$ if and only if $V(X) > V(Y)$.5 When an axiology is defined in this way, it is natural (though not obligatory) to think of $V$ as encoding not only the ‘ordinal’ facts about which populations are better than which others, but also the ‘cardinal’ facts about how much better they are. We will state our results in both ordinal and cardinal terms. The cardinal facts may be especially important when evaluating populations in the face of uncertainty, an issue we will mainly set aside until Section 7.2.
To illustrate this formalism, the size of a population $X$, denoted $|X|$, is simply the total number of welfare subjects:

$|X| = \sum_{w \in W} X(w).$
Similarly, the total welfare is

$\mathrm{Tot}(X) = \sum_{w \in W} w\, X(w).$
Of course, the definition of $\mathrm{Tot}$ only makes sense on the assumption that we can add together welfare levels, and in this connection we generally assume that $W$ is given to us as a set of real numbers. (In common terminology, we assume that welfare is ‘measurable on a ratio scale’.) With that in mind, the average welfare

$\bar{X} = \dfrac{\mathrm{Tot}(X)}{|X|}$

is also well-defined.
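The formalism is easy to make computational. The following minimal sketch (ours) represents a population as a list of (welfare level, count) pairs and implements the three quantities just defined; later illustrations in this style reuse the same representation.

```python
def size(X):
    """Total number of welfare subjects."""
    return sum(n for _, n in X)

def total(X):
    """Total welfare: each level weighted by the number of subjects at it."""
    return sum(w * n for w, n in X)

def average(X):
    """Average welfare; undefined (division by zero) for empty worlds."""
    return total(X) / size(X)

X = [(1, 3), (5, 1)]            # three subjects at welfare 1, one at welfare 5
assert size(X) == 4 and total(X) == 8 and average(X) == 2.0
```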
We can now give a precise definition of additive separability.
If $X$ and $Y$ are populations, then let $X + Y$ be the population obtained by adding together the number of welfare subjects at each welfare level in $X$ and $Y$. That is, $(X + Y)(w) = X(w) + Y(w)$ for all $w \in W$. An axiology is separable if, for any populations $X$, $Y$, and $Z$,

$X + Z \succ Y + Z \iff X \succ Y.$

This means that in comparing $X + Z$ and $Y + Z$, one can ignore the shared sub-population $Z$. Separability is entailed by the following more concrete condition:
Additivity
An axiology is additively separable (or additive for short) iff it can be represented by a value function of the form

$V(X) = \sum_{w \in W} f(w)\, X(w)$

with $f$ a real-valued function on welfare levels. Thus the value of $X$ is given by transforming the welfare of each welfare subject by the function $f$ and then adding up the results.
In the following discussion, we will sometimes want to focus on the distinction between additive and non-additive axiologies, and sometimes on the distinction between separable and non-separable axiologies. While an axiology can be separable but non-additive, none of the views that we focus on will have this feature. So for our purposes, the additive/non-additive and separable/non-separable distinctions are more or less extensionally equivalent.6
We will consider three categories of additive axiologies in this paper, which we now introduce in order of increasing generality. First, there is total utilitarianism, which identifies the value of a population with its total welfare.7
Total Utilitarianism

$V_{TU}(X) = \mathrm{Tot}(X) = \sum_{w \in W} w\, X(w)$
An arguable drawback of TU is that it implies the so-called ‘Repugnant Conclusion’ (Parfit 1984), that for any two positive welfare levels $w_1 > w_2$, for any population in which everyone has welfare $w_1$, there is a better, larger population in which everyone has welfare $w_2$. The desire to avoid the Repugnant Conclusion is one motivation for the next class of additive axiologies, critical-level theories.8
Critical-Level Utilitarianism

$V_{CL}(X) = \sum_{w \in W} (w - c)\, X(w) = \mathrm{Tot}(X) - c\,|X|$

for some constant $c$ (representing the ‘critical level’ of welfare above which adding an individual to the population constitutes an improvement), generally but not necessarily taken to be positive.
We sometimes write ‘CL$_c$’ rather than merely ‘CL’ to emphasize the dependence on the critical level. TU is a special case of CL, namely, the case with $c = 0$. But as long as $c$ is positive, CL avoids the Repugnant Conclusion since adding lives with very low positive welfare makes things worse rather than better.9
Another arguable drawback of both TU and CL is that they give no priority to the less well off—that is, they assign the same marginal value to a given improvement in someone’s welfare, regardless of how well off they were to begin with. We might intuit, however, that a one-unit improvement in the welfare of a very badly off individual has greater moral value than the same welfare improvement for someone who is already very well off. This intuition is captured by prioritarian theories.10
Prioritarianism

$V_{PR}(X) = \sum_{w \in W} f(w)\, X(w)$

for some function $f$ (the ‘priority weighting’ function) that is concave and strictly increasing.
TU is the special case of PR with $f(w) = w$, and CL$_c$ is the special case with $f(w) = w - c$.11 Note also that our definition of the prioritarian family of axiologies is very close to our definition of additive separability, just adding the conditions that $f$ is concave and strictly increasing.
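To see how these additive axiologies differ only in their weighting functions, here is a short sketch (ours; the critical level and the prioritarian weighting are illustrative choices), which also checks separability numerically: adding a shared background population never flips an additive comparison.

```python
import math

def additive_value(X, f):
    """V(X) = sum over (welfare, count) pairs of f(welfare) * count."""
    return sum(f(w) * n for w, n in X)

f_TU = lambda w: w                     # total utilitarianism
f_CL = lambda w: w - 1.0               # critical-level utilitarianism with c = 1 (illustrative)
f_PR = lambda w: 1 - math.exp(-w)      # one concave, strictly increasing weighting (illustrative)

X, Y, Z = [(2, 5)], [(3, 3)], [(1, 100)]   # Z plays the role of a shared background
for f in (f_TU, f_CL, f_PR):
    assert (additive_value(X, f) > additive_value(Y, f)) == \
           (additive_value(X + Z, f) > additive_value(Y + Z, f))
```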
In this section and the next, we consider a variety of non-additive axiologies, and show that each one gives the same verdicts as some additive axiology when there is a large enough background population. In this sense, non-additive axiologies ‘converge’ with additive ones. In this section, we show that average utilitarianism and related views converge with CL, where the critical level is the average welfare of the background population. In the next section, we show that various non-additive egalitarian views converge with PR.
First, though, let us make the notion of ‘convergence’ more precise. Informally, we say that one axiology, $\mathcal{A}$, converges with another, $\mathcal{A}'$, if the verdicts of $\mathcal{A}$ approximate the verdicts of $\mathcal{A}'$ to arbitrary precision, as the size of the background population increases. In spelling this out, we will restrict attention to background populations of a given type, for example, all those having a certain average level of welfare. Here is the basic formal definition.
Ordinal Convergence
Axiology $\mathcal{A}$ converges ordinally with $\mathcal{A}'$ relative to background populations of type $T$ if and only if, for any populations $X$ and $Y$, if $Z$ is a sufficiently large population of type $T$, then

$X + Z \succ_{\mathcal{A}'} Y + Z \;\Longrightarrow\; X + Z \succ_{\mathcal{A}} Y + Z.$
Of course, if $\mathcal{A}'$ is additive, the last implication is equivalent to

$X \succ_{\mathcal{A}'} Y \;\Longrightarrow\; X + Z \succ_{\mathcal{A}} Y + Z.$
We can, in other words, compare $X + Z$ and $Y + Z$ with respect to $\mathcal{A}$ by comparing $X$ and $Y$ with respect to $\mathcal{A}'$—if we know that $Z$ is a sufficiently large population of the right type.
Note two ways in which this notion of convergence is fairly weak. First, what it means for $Z$ to be ‘sufficiently large’ can depend on $X$ and $Y$. Second, the displayed implications need not be biconditionals; thus, when $\mathcal{A}'$ does not have a strict preference between $X + Z$ and $Y + Z$ (e.g., when it is indifferent between them), convergence with $\mathcal{A}'$ does not imply anything about how $\mathcal{A}$ ranks those two populations.12 Because of this, every axiology converges with the trivial axiology according to which no population is better than any other. Of course, such a result is uninformative, and we are only interested in convergence with more discriminating axiologies. Specifically, we will only ever consider axiologies that satisfy the Pareto principle (which we discuss in Section 5.1).
Ordinal convergence is ‘ordinal’ because it only concerns the way in which the two axiologies rank populations. As we noted in Section 2, one could interpret the value function used to define an axiology as conveying ‘cardinal’ information about the relative values of different populations. There is, correspondingly, a different notion of convergence that we will call cardinal convergence. Specifically, if $V$ is the value function for $\mathcal{A}$, then one could interpret ratios like

$\dfrac{V(X_1) - V(X_2)}{V(X_3) - V(X_4)}$

as measuring how much better $X_1$ is than $X_2$, compared to how much better $X_3$ is than $X_4$.13 Cardinal convergence occurs when two axiologies agree about these cardinal facts to arbitrary precision as the background population becomes large.
Cardinal Convergence
Axiology $\mathcal{A}$ (with value function $V$) converges cardinally with $\mathcal{A}'$ (with value function $V'$) relative to background populations of type $T$ if and only if, for any four populations $X_1, X_2, X_3, X_4$ and any margin of error $\epsilon > 0$, if $Z$ is a sufficiently large population of type $T$, then

$\left|\, \dfrac{V(X_1 + Z) - V(X_2 + Z)}{V(X_3 + Z) - V(X_4 + Z)} \;-\; \dfrac{V'(X_1 + Z) - V'(X_2 + Z)}{V'(X_3 + Z) - V'(X_4 + Z)} \,\right| \;\leq\; \epsilon$
(assuming the denominators are non-zero).
As we have defined it, cardinal convergence does not quite imply ordinal convergence. Thus our main results will assert both ordinal and cardinal convergence. And we will often just speak of ‘convergence’ to cover both kinds.
Average utilitarianism identifies the value of a population with its average welfare level.14
Average Utilitarianism

$V_{AU}(X) = \bar{X} = \dfrac{\mathrm{Tot}(X)}{|X|}$
Our first result describes the behavior of AU as the size of the background population tends to infinity.
Theorem 1. Average utilitarianism converges ordinally and cardinally to CL$_c$, relative to background populations with average welfare $c$. In fact, for any populations $X$ and $Y$ and any background population $Z$ with $\bar{Z} = c$, if $V_{CL_c}(X) > V_{CL_c}(Y)$ and

$|Z| \;>\; \dfrac{|X|\,|Y|\,\bigl(\bar{Y} - \bar{X}\bigr)}{V_{CL_c}(X) - V_{CL_c}(Y)},$

then $X + Z \succ_{AU} Y + Z$.
Proofs of all theorems are given in the appendix.
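A quick numerical check (ours) of the pattern Theorem 1 describes: once the background population is large enough, average utilitarianism's ranking of X + Z versus Y + Z matches CL with critical level c's ranking of X versus Y. All numbers are illustrative.

```python
def total(X): return sum(w * n for w, n in X)
def size(X): return sum(n for _, n in X)
def average(X): return total(X) / size(X)

def cl_value(X, c):
    return sum((w - c) * n for w, n in X)

X = [(1.0, 1000)]        # many lives at modest positive welfare
Y = [(5.0, 100)]         # fewer lives at high welfare
c = 0.0                  # average welfare of the background population

assert cl_value(X, c) > cl_value(Y, c)       # CL_c prefers X
assert average(Y) > average(X)               # with no background, AU prefers Y
for size_Z in (10, 100, 1000, 10**5):
    Z = [(c, size_Z)]
    print(size_Z, average(X + Z) > average(Y + Z))
# Prints False, False, True, True: the flip occurs once |Z| exceeds 800,
# the Theorem 1 threshold for these particular populations.
```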
Discussion of the normative implications of this and other results is deferred to the second half of the paper (§§6–9).
Some philosophers have sought an intermediate position between total and average utilitarianism, acknowledging that increasing the size of a population (without changing its average welfare) can count as an improvement, but holding that additional lives have diminishing marginal value. The most widely discussed version of this approach is the variable value view.15 It is useful to distinguish two types of this view, the second more general than the first.
Variable Value I

$V_{VV1}(X) = \bar{X}\, g(|X|)$

where $g$ is a non-zero function that is weakly increasing, concave, and bounded above.
Recall that the total welfare of a population is equal to $\bar{X}\,|X|$; roughly speaking, VV1 says that changes in the second factor, the size of $X$, are less important when $|X|$ is already large. The next view also gives varying marginal importance to average welfare:
Variable Value II

$V_{VV2}(X) = h(\bar{X})\, g(|X|)$

where $h$ is differentiable and strictly increasing, and $g$ is a non-zero function that is weakly increasing, concave, and bounded above.
Sloganistically, variable value views can be ‘totalist for small populations’ (where $g$ may be nearly linear), but must become ‘averagist for large populations’ (as $g$ approaches its upper bound). It is therefore not entirely surprising that, in the large-background-population limit, VV1 and VV2 display the same behavior as AU, converging with a critical-level view with the critical level given by the average welfare of the background population.
Theorem 2. Variable value views converge ordinally and cardinally to CL$_c$, relative to background populations with average welfare $c$.
For the broad class of variable value views, we cannot give the sort of threshold for $|Z|$ that we gave for AU, above which the ranking of $X + Z$ and $Y + Z$ must agree with the ranking given by CL$_c$. For instance, because $g$ can be any non-zero function that is weakly increasing, concave, and bounded above, variable value views can remain in arbitrarily close agreement with totalism for arbitrarily large populations, so if TU prefers one population to another, there will always be some variable value theory that agrees. In the case of VV1, we can say that if both TU and AU prefer $X + Z$ to $Y + Z$, then all VV1 views will as well (see Proposition 1 in appendix B), and so whenever TU and CL$_c$ have the same strict preference between $X$ and $Y$, the threshold given in Theorem 1 holds for VV1 as well. For VV2, we cannot even say this much.16
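The same kind of numerical check (ours) works for one concrete variable value function; the bounded, concave choice g(n) = n/(n + 10) is purely illustrative, and the foreground populations are the same as in the previous sketch.

```python
def size(X): return sum(n for _, n in X)
def total(X): return sum(w * n for w, n in X)
def average(X): return total(X) / size(X)

def vv1_value(X, k=10):
    """average(X) * g(|X|) with the illustrative g(n) = n / (n + k)."""
    return average(X) * size(X) / (size(X) + k)

def cl_value(X, c):
    return sum((w - c) * n for w, n in X)

X = [(1.0, 1000)]        # many lives at modest positive welfare
Y = [(5.0, 100)]         # fewer lives at high welfare
c = 0.0                  # average welfare of the background population

assert cl_value(X, c) > cl_value(Y, c)
for size_Z in (10, 100, 1000, 10**5):
    Z = [(c, size_Z)]
    print(size_Z, vv1_value(X + Z) > vv1_value(Y + Z))   # False, False, True, True
```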
A second category of non-additive axiologies is motivated by egalitarian considerations. Does adding an individual to a population, or increasing the welfare of an existing individual, increase or decrease equality? The answer depends on the welfare of other individuals in the population, so it is easy to see why concern with equality might motivate separability violations.
Egalitarian views have been widely discussed in the context of distributive justice for fixed populations, but relatively little has been said about egalitarianism in a variable-population context. We are therefore somewhat in the dark as to which egalitarian views are most plausible in that context. But we will consider a few possibilities that seem especially promising, trying to consider each fork of two major choice points for variable-population egalitarianism.
The most important choice point is between (i) ‘two-factor’/‘pluralistic’ egalitarian views, which treat the value of a population as the sum of two (or more) terms, one of which is a measure of inequality, and (ii) ‘rank-discounting’ views, which give less weight to the welfare of individuals who are better off relative to the rest of the population. These two categories of views are extensionally equivalent in the fixed-population context, but come apart in the variable-population context (Kowalczyk 2020: ch. 3).
Among two-factor egalitarian theories, there is another important choice point between ‘totalist’ and ‘averagist’ views.
Totalist Two-Factor Egalitarianism

$V(X) = \mathrm{Tot}(X) - |X|\, I(X)$

where $I(X)$ is some measure of inequality in $X$.
Averagist Two-Factor Egalitarianism

$V(X) = \bar{X} - I(X)$

where $I(X)$ is some measure of inequality in $X$.17
Here, in each case, the second term of the value function can be thought of as a penalty representing the badness of inequality. Such a penalty could have any number of forms, but for the purposes of illustration we stipulate that $I(X)$ depends only on the distribution of $X$, where this can be understood formally as the function giving the proportion of the population in $X$ having each welfare level. The degree of inequality is indeed plausibly a matter of the distribution in this sense, and the badness of inequality is then plausibly a function of the degree of inequality and the size of the population. The more substantial assumption is that the badness of inequality either scales linearly with the size of the population (for the totalist version of the view) or does not depend on population size (for the averagist version).
Now, we want to know what these theories do as $|Z| \to \infty$. In the last section, we had to hold one feature of $Z$ constant as $|Z| \to \infty$, namely, its average welfare $\bar{Z}$. Egalitarian theories, however, are potentially sensitive to the whole distribution of welfare levels in the population, and so to obtain limit results it is useful to hold fixed the whole distribution of welfare in the background population, that is, the distribution $\mathcal{D}$ of $Z$.
We’ll state the general result, explain some of the terminology it uses, and then give some examples.
Theorem 3. Suppose $V$ is a value function of the form $V(X) = \mathrm{Tot}(X) - |X|\,I(X)$, or else $V(X) = \bar{X} - I(X)$, where $I$ is a differentiable function of the distribution of $X$. Then the axiology represented by $V$ converges ordinally and cardinally with an additive axiology, relative to background populations with any fixed distribution $\mathcal{D}$; specifically, it converges with the additive axiology whose weighting function $f(w)$ is, roughly, the limiting marginal contribution of one additional individual at welfare level $w$ to a large population with distribution $\mathcal{D}$ (suitably normalized).18
If the Pareto principle holds with respect to $V$, then $f$ is weakly increasing, and if Pigou-Dalton transfers are weak improvements, then $f$ is concave.
A few points in the theorem require further explanation. Informally, the function $I$ is differentiable if $I(X)$ varies smoothly with the distribution of $X$; we will give the formal definition when it comes to the proof (see Remark 1 in the appendix), but at any rate all proposed measures of inequality that we’re aware of are differentiable, including the two we discuss below. The Pareto principle holds that increasing anyone’s welfare increases the value of the population. This principle clearly holds for prioritarian views (because the priority-weighting $f$ is strictly increasing), but it need not in principle hold for egalitarian views: conceptually, increasing someone’s wellbeing might contribute so much to inequality as to be on net a bad thing. Still, the Pareto principle is generally held to be a desideratum for egalitarian views. Finally, a Pigou-Dalton transfer is a total-preserving transfer of welfare from a better-off person to a worse-off person that keeps the first person better-off than the second. The condition that Pigou-Dalton transfers are at least weak improvements (they do not make things worse) is often understood as a minimal requirement for egalitarianism.
To illustrate Theorem 3, let’s consider two more specific families of egalitarian axiologies that instantiate the schemata of totalist and averagist two-factor egalitarianism respectively.
For the first, we’ll use a measure of inequality based on the mean absolute difference (MD) of welfare, defined for any population $X$ as follows:

$\mathrm{MD}(X) = \dfrac{1}{|X|^2} \sum_{i,j} |w_i - w_j|,$

where the sum runs over all pairs of individuals in $X$ and $w_i$ is the welfare of individual $i$. $\mathrm{MD}(X)$ represents the average welfare inequality between any two individuals in $X$; $|X|\,\mathrm{MD}(X)$, which scales with population size, can be understood as taking each individual’s average welfare inequality with the members of $X$ (including herself), then summing across individuals. Consider, then, the following totalist two-factor view:
Mean Absolute Difference Total Egalitarianism

$V_{MDT}(X) = \mathrm{Tot}(X) - \alpha\,|X|\,\mathrm{MD}(X)$

where $\alpha$ is a constant that determines the relative importance of inequality.19
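As a concrete sketch (ours; alpha and the populations are illustrative, and the value function follows the totalist two-factor form defined above):

```python
def people(X):
    """List each individual's welfare separately."""
    return [w for w, n in X for _ in range(n)]

def mean_abs_diff(X):
    ws = people(X)
    return sum(abs(a - b) for a in ws for b in ws) / len(ws) ** 2

def mdt_value(X, alpha=0.5):
    ws = people(X)
    return sum(ws) - alpha * len(ws) * mean_abs_diff(X)

equal   = [(2, 4)]            # four lives at welfare 2
unequal = [(1, 2), (3, 2)]    # same total welfare, spread out
assert mean_abs_diff(equal) == 0
assert mdt_value(equal) > mdt_value(unequal)   # the inequality penalty bites
```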
Second, consider the following averagist two-factor view, which identifies overall value with a quasi-arithmetic mean of welfare:20
Quasi-Arithmetic Average Egalitarianism

$V_{QAA}(X) = g^{-1}\!\left(\dfrac{1}{|X|} \sum_{i} g(w_i)\right)$

for some strictly increasing, concave function $g$.
Implicitly, the measure of inequality in QAA is $I(X) = \bar{X} - V_{QAA}(X)$, which one can show is a positive function, weakly decreasing under Pigou-Dalton transfers. In the limiting case where $g$ is linear, QAA coincides with AU.
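A corresponding sketch (ours) of a quasi-arithmetic average, using the illustrative concave, strictly increasing transform g(w) = 1 - exp(-w):

```python
import math

g     = lambda w: 1 - math.exp(-w)
g_inv = lambda y: -math.log(1 - y)

def people(X):
    return [w for w, n in X for _ in range(n)]

def qaa_value(X):
    ws = people(X)
    return g_inv(sum(g(w) for w in ws) / len(ws))

equal, unequal = [(2, 2)], [(1, 1), (3, 1)]    # same average welfare (2), different spread
assert qaa_value(unequal) < qaa_value(equal)   # inequality lowers the quasi-arithmetic mean
```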
Theorem 4. MDT converges ordinally and cardinally to PR, relative to background populations with a given distribution $\mathcal{D}$. Specifically, MDT converges with the prioritarian axiology whose weighting function is

$f(w) = w - \alpha\,\bigl(2\,d_Z(w) - \mathrm{MD}(Z)\bigr).$

Here $d_Z(w)$ is the average distance between $w$ and the welfare levels occurring in $Z$.
Theorem 5. QAA converges ordinally and cardinally to PR, relative to background populations with a given distribution $\mathcal{D}$. Specifically, QAA converges with the prioritarian axiology whose weighting function is

$f(w) = g(w) - \bar{g}_Z,$

where $\bar{g}_Z$ is the average value of $g$ over the welfare levels occurring in $Z$.
Another family of population axiologies that is often taken to reflect egalitarian motivations is rank-discounted utilitarianism (RDU). The essential idea of rank-discounting is to give different weights to marginal changes in the welfare of different individuals, not based on their absolute welfare level (as prioritarianism does), but rather based on their welfare rank within the population. One potential motivation for RDU over two-factor views is that, because we are simply applying different positive weights to the marginal welfare of each individual, we clearly avoid any charge of ‘leveling down’: unlike on two-factor views, there is nothing even pro tanto good about reducing the welfare of a better-off individual—it is simply less bad than reducing the welfare of a worse-off individual.21
Versions of rank-discounted utilitarianism have been discussed and advocated under various names in both philosophy and economics, for example, by Asheim and Zuber (2014) and Buchak (2017). In these contexts, the RDU value function is generally taken to have the following form:

$V_{RDU}(X) = \sum_{k=1}^{|X|} g(k)\, w_k \qquad (2)$

where $w_k$ denotes the welfare of the $k$th worst off welfare subject in $X$, and $g$ is a positive but weakly decreasing function.22
However, these discussions often assume a context of fixed population size, and there are different ways one might extend the formula when the size is not fixed.
We will consider the most obvious approach, simply taking equation (2) as a definition regardless of the size of X.23 A view of this type, explicitly designed for a variable-population context, is set out in Asheim and Zuber (2014). Simplifying slightly to set aside features irrelevant for our purposes, their view is as follows:
Geometric Rank-Discounted Utilitarianism

$V_{GRD}(X) = \sum_{k=1}^{|X|} \beta^{k}\, w_k$

for some $\beta \in (0, 1)$.
Here, the rank-weighting function is $g(k) = \beta^{k}$. In general, since $g$ is assumed to be weakly decreasing and positive, $g(k)$ must asymptotically approach some limit $\eta \geq 0$ as $k$ increases. For GRD, $\eta = 0$. But a simpler situation arises when $\eta > 0$ (so that $g$ is bounded away from zero):
Bounded Rank-Discounted Utilitarianism

$V_{BRD}(X) = \sum_{k=1}^{|X|} g(k)\, w_k$

for some weakly decreasing, positive function $g$ that is eventually convex24 with asymptote $\eta > 0$.
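Computationally, these rank-discounting views share the form of equation (2); only the rank-weighting function differs. A sketch (ours; beta and the bounded weighting are illustrative instances of the two schemas):

```python
def rdu_value(X, g):
    """Equation (2): weight the k-th worst-off individual's welfare by g(k)."""
    ordered = sorted(w for w, n in X for _ in range(n))          # worst-off first
    return sum(g(k) * w for k, w in enumerate(ordered, start=1))

beta = 0.9
g_grd = lambda k: beta ** k                # geometric rank-discounting (asymptote 0)
g_brd = lambda k: 0.5 + 0.5 * beta ** k    # a bounded weighting (asymptote 0.5 > 0)

X = [(1, 3), (5, 2)]
print(rdu_value(X, g_grd), rdu_value(X, g_brd))
```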
We will state formal results about both BRD and GRD in Appendix A; they involve a slightly more restricted notion of convergence than we have considered so far. The case of BRD is relatively simple: it converges with total utilitarianism. This is because, when the background population is very large, each life in the foreground population with welfare level $w$ contributes approximately $\eta\, w$ to the overall value of the population (at least assuming that $w$ is higher than some level in the background population). So the overall contribution of the foreground population is approximately equal to its total welfare times $\eta$.
When, as in GRD, the asymptote of the weighting function is at $\eta = 0$, the situation is subtler and appears to depend on the exact rate at which $g$ decays. We will consider only GRD, as it is the best-motivated example in the literature. Uniquely among the axiologies we consider, GRD does not converge with an additive, Paretian axiology on any interesting range of populations. Roughly speaking, this is because, as the background population gets larger, the weight given to the best-off individual in the foreground population becomes arbitrarily small relative to the weight given to the worst-off—smaller than the relative weight given by any particular additive, Paretian axiology.
Nonetheless, it turns out that GRD does converge with a separable, Paretian axiology, which we call critical-level leximin. This is an extreme form of prioritarianism in which infinite priority is always given to the less well-off. We’ll explain this carefully in Appendix A, but perhaps the most important take-away is that (because critical-level leximin is so extreme) GRD leads to some very strange and counterintuitive results when the background population is sufficiently large.
For example, tiny benefits to worse-off individuals will often be preferred over astronomical benefits to even slightly better-off individuals; moreover, adding an individual to the population with anything less than the maximum welfare level in the background population will often make things worse overall.25 In fact, GRD implies what we might call the ‘Snobbish Conclusion’:
Snobbish Conclusion
In some circumstances, given a very high welfare level $w$ just slightly below the best in the background population, and an even higher welfare level $w^{+}$ greater than any in the background population, adding even one life at $w$ makes things so much worse that it cannot be compensated by any number of lives at $w^{+}$.
This seems crazy to us. We could just about understand the Snobbish Conclusion in the context of an anti-natalist view, according to which adding lives invariably has negative value; but, according to GRD, there are many possible background populations (for instance, any in which the highest welfare level is less than $w$) to which the addition described above would constitute an improvement. We could also understand the view that adding good lives can make things worse if it lowers average welfare or increases inequality (e.g., as measured by mean absolute difference or standard deviation). But, again, that’s not what’s going on here. Instead, GRD implies that adding excellent lives makes things worse if the number of even slightly better lives already in existence happens to be sufficiently great, regardless of the other facts about the distribution. In some cases, it makes things so much worse that it cannot be compensated by adding any number of even better lives.
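The Snobbish Conclusion is easy to exhibit numerically. In the sketch below (ours; beta, the welfare levels, and the background size are all illustrative), adding one life at welfare 99.5 to a background of 500 lives at welfare 100 lowers the GRD value by about 0.45, while the total value obtainable from any number of added lives at welfare 200 is bounded by the geometric tail, on the order of 10^-20.

```python
beta = 0.9

def grd_value(welfares):
    ordered = sorted(welfares)                              # worst-off first
    return sum(beta ** k * w for k, w in enumerate(ordered, start=1))

N = 500
background = [100.0] * N

loss = grd_value(background) - grd_value(background + [99.5])
print(loss)                                   # about 0.45: the added life makes things worse

# Adding any number of lives at welfare 200 (better than everyone) contributes at most
# the geometric tail starting at rank N + 2, which cannot offset that loss.
max_gain = 200 * beta ** (N + 2) / (1 - beta)
print(max_gain)                               # about 2e-20
assert loss > max_gain
```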
To sum up, many forms of egalitarianism, including many forms of rank-discounted utilitarianism, converge with interesting additive axiologies. Geometric Rank-Discounted Utilitarianism provides one counterexample, although it does converge with an interesting separable axiology. Moreover, our general methodology of thinking about large background populations draws out some features that make GRD seem especially implausible.
In the rest of the paper, we explore the implications of the preceding results, and especially their practical implications for morally significant real-world choices. To apply our limit results in this way, there are two basic things one would like to know, which we investigate in this section.
First, one would like to know that the real-world background population is large enough that non-additive axiologies of the types we investigate give the same verdicts as the additive axiologies with which they converge. This is the topic of Section 6.1. The background population will be large if there are many welfare subjects whose lives are unaffected by our choices (although it may also be larger than the number of unaffected welfare subjects, as we explain in a moment). Many readers will already grant that the background population is extremely large, given the enormous number of welfare subjects in Earth’s past, to say nothing of life elsewhere. However, we think it is nonetheless useful to develop some numerical estimates. After all, what counts as ‘large enough’ in the mathematical sense required to apply the limit results will depend on the specific axiology and the choice situation in question; being enormous by ordinary standards need not suffice. Indeed, while the background population will obviously be large compared to the foreground population in many ordinary or toy cases, this is much less obvious in other cases, where (for example) the future of life on Earth is at stake. The estimates developed in this section will allow us, in Section 7, to reach firmer conclusions about a stylized but basically realistic case of that type. Moreover, as we’ll explain in Sections 6.1.1 and 6.1.2, there are some subtle ways in which advocates of non-additive axiologies might try to limit the size of the background population. (Mightn’t non-human animals count less than humans? Shouldn’t we simply set aside the past?) To evaluate these moves, it will again be useful to have some actual numerical estimates and the justifications for them in mind.
Second, we have seen that which additive axiology is relevant depends on the average welfare of the background population, and perhaps on its entire welfare distribution. Thus one would like to know something about this distribution. Here, unfortunately, it is very difficult to go beyond speculation, but we will still make some tentative remarks that will guide our discussion, as well as, we hope, providing a starting point for future research.
We have so far been informal about the distinction between ‘background’ and ‘foreground’ populations, but it will now be helpful to make these notions more precise. If we are interested in evaluating populations $X_1, \ldots, X_n$, the population $Z$ that can be treated as background is defined by $Z(w) = \min_i X_i(w)$ for each welfare level $w$. That is, the background population consists of the minimum feasible number of welfare subjects at each welfare level. For this $Z$ and for each $X_i$, there is then a population $Y_i$ such that $X_i = Y_i + Z$. A choice between $X_1, \ldots, X_n$ can therefore be understood as a choice between the foreground populations $Y_1, \ldots, Y_n$, in the presence of background population $Z$.
As we noted above, welfare subjects will contribute to the size of the background population if they are unaffected by the choice at hand. However, it is important to realize that the size of the background population can exceed the number of unaffected individuals. This is because the background population depends on the number of welfare subjects guaranteed to exist at each level, not on their identities. As a result, for instance, future welfare subjects might contribute to the background population even if their identities are entirely dependent on our present choices (as argued by Parfit 1984: ch. 16, among others).
Having said that, in this section we will focus mainly on welfare subjects whose lives are entirely outside of our causal future, and thus would count as background for any choice we could realistically face. We will return to the possibility of affectable individuals contributing to the background population at the end of Section 6.1.2.
We will make two claims about the size of the background populations that are relevant to real-world choices, with different degrees of confidence.
First, with high confidence, these populations are much larger (at least multiple orders of magnitude) than the present human population. Concretely, while there are fewer than $10^{10}$ humans alive today, we conservatively estimate that there have been at least $10^{17}$ welfare subjects in Earth’s past, with estimates of $10^{20}$ or more being plausible. Informally, this suggests that our limit results should at least be relevant when comparing options that only affect present and near-future humans (though a background population of this size can also substantially affect the evaluation of choices affecting the far future, as we will see in §7).
Second, with much lower confidence, real-world background populations may well be much larger (again, by multiple orders of magnitude) than the entire population in our future light cone. If this is right then our limit results are likely to be relevant to essentially any real-world choice.
Let’s start by establishing the first claim, which only requires us to consider past welfare subjects on Earth. Estimates of the number of human beings who have ever lived are on the order of $10^{11}$ (Kaneda & Haub 2018), already an order of magnitude larger than the present human population. But past welfare subjects include a vast number of non-human animals, and especially wild animals over many millions of years. There are today at least $10^{11}$ wild mammals; for vertebrates in general, the number is far higher, with a conservative lower bound of $10^{13}$ (dominated by fish).26 Prehistoric wild animal populations were presumably similarly large or larger, given the significant decline in wild animal populations as a result of human encroachment.27 Inferring the total number of animals from the number alive at a given time requires assumptions about mortality rates. We will use a very conservative estimate of 0.1 deaths per individual per year in wild animal populations (roughly corresponding to a life expectancy of 10 years). The actual rates are almost certainly much higher for most species (especially given high infant mortality), implying larger total past populations. Being extremely conservative, then, we find that there have been at least $6 \times 10^{17}$ mammals since the extinction of the dinosaurs 66 million years ago.28 This gives our basic lower bound for the size of the background population. If we less conservatively allow that all vertebrates are welfare subjects, then a similar calculation gives a lower bound of $5 \times 10^{20}$ individuals over the last 500 million years. And of course some invertebrates may be welfare subjects too.
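For transparency, here is the back-of-the-envelope arithmetic (ours) behind these figures, using only the inputs stated above; all of the inputs are deliberately conservative.

```python
mortality = 0.1                       # deaths per individual per year (very conservative)

mammals_today = 1e11                  # standing wild mammal population (lower bound)
years_since_dinosaurs = 66e6
past_mammals = mammals_today * mortality * years_since_dinosaurs
print(f"{past_mammals:.1e}")          # 6.6e+17

vertebrates_today = 1e13              # standing wild vertebrate population (lower bound)
years_of_vertebrates = 500e6
past_vertebrates = vertebrates_today * mortality * years_of_vertebrates
print(f"{past_vertebrates:.1e}")      # 5.0e+20
```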
While these background populations are large compared to the present population, they may not be large compared to the entire affectable future population. If our civilization survives for a very long time, the number of future individuals might be truly astronomical, and actions that affect the long-term future (for instance by causing or preventing existential catastrophes) might affect this entire population. If we can sustain just the size of the present human population until the Earth becomes uninhabitable, this would yield a future population size on the order of $10^{17}$, even ignoring non-humans.29 This is roughly on a par with our lower-bound estimate of the number of past animals on Earth (though still much smaller than the more generous estimate that includes all vertebrates). Less conservatively, if humanity someday settles the cosmos and creates digital minds on a mass scale, far larger populations become possible—for instance, Bostrom (2013) estimates that such an interstellar civilization could support $10^{54}$ subjective life-years of human-like experience, perhaps in the form of $10^{52}$ lives with a subjective duration of 100 years each. Any plausible estimate of past animals on Earth will pale in comparison with these numbers.
There may nevertheless be unaffectable background populations even larger than these astronomical potential future populations. The crucial point is that the universe as a whole appears to be at least 100 times larger, and perhaps vastly larger, than the accessible universe (the portion of the universe that it is possible in principle for us to reach).30 So, if life arises independently in many places, we would expect at least 99% of it to be outside the accessible universe and thus necessarily part of the background population. Similarly, if the universe contains many spacefaring civilizations, at least 99% of them should be inaccessible. However large the population of future human-originating civilization, this background population (consisting of many similar civilizations) will be orders of magnitude larger. But, of course, the hypothesis of extraterrestrial civilizations is entirely speculative, and deserves significantly less confidence than our lower-bound estimate of ~$10^{17}$ for the size of the background population.31 We next consider two common objections to this lower-bound estimate.
Our basic estimate involves a large number of small and relatively simple animals. Several readers have suggested that, although such simple animals are still welfare subjects, perhaps they should receive less weight when we calculate the ‘size’ of a population for axiological purposes: perhaps, when evaluating outcomes, a typical mouse should effectively count as only (say) one fiftieth of a welfare subject, given its cognitive simplicity. In fact, a view along these lines is developed by Kagan (2019: see especially §4.5.). This way of accounting could, in principle, dramatically reduce the size of the background population.32
We have three responses to this suggestion. First, of course, one might lodge straightforward ethical objections to assigning different weights to different animals, since it seems to contradict the ideals of impartiality and equal consideration that are often seen as central to ethics in general and axiology in particular. Second, it seems that any plausible way of assigning weights is likely to leave a background population several orders of magnitude larger than the present human population. Let us take mice as representative of the background mammalian population. Adopting Kagan’s suggestion of an axiological weight of 1/50 for mice (2019: 109) would only lower our estimate of the background population to ~$10^{16}$. Alternatively, it might be plausible to adopt weights that are proportional to cortical neuron count or lifespan.33 But even weighting by both cortical neuron count and lifespan would only cut our lower-bound estimate of the size of the background population down to ~$10^{13}$, three orders of magnitude larger than the present human population. And this, of course, still only counts mammals since the extinction of the dinosaurs, ignoring all other animals.
Perhaps, however, there is some other rationale on which one would assign even tinier weights to practically all non-human animals. This brings us to our third response: even if we entirely ignore non-humans we may still find that background populations are large relative to foreground populations in most present-day choice situations. Past humans outnumber present humans by more than an order of magnitude, as we saw above. And, as we’ll argue at the end of §6.1.2, it seems plausible that the large majority even of the present and near-future human population is approximately background in most choice situations.
Here is another way in which proponents of non-additive axiologies might limit the size of the background population, at least for practical purposes. They could claim that, when it comes to decision making, we should apply our axiology to the population of welfare subjects who exist in our causal future (presumably, our future light cone), rather than to the universe as a whole. Such a causal domain restriction (Bostrom 2011) would simply exclude the kinds of large background populations we have considered so far. It could also be seen as a somewhat principled way for proponents of non-additive views to explain the common intuition that facts about the welfare levels of ancient humans simply can’t be practically relevant today; or to mitigate the difficulty of applying non-additive views given our deep uncertainties about life outside the accessible universe.
We have three replies to this suggestion. First, to adopt a causal domain restriction is to abandon a central and deeply appealing feature of consequentialism, namely, the idea that we have reason to make the world a better place, from an impartial and universal point of view. That some act would make the world a better place, full stop, is a straightforward and compelling reason to do it. It is much harder to explain why the fact that an act would make your future light cone a better place (e.g., by maximizing the average welfare of its population), while making the world as a whole worse, should count in its favor.34
Second, the combination of a causal domain restriction with a non-separable axiology can generate counterintuitive inconsistencies between agents (and agent-stages) located at different times and places, with resulting inefficiencies. As a simple example, suppose that $A$ and $B$ are both agents who evaluate their options using causal-domain-restricted average utilitarianism, and that $t_1 < t_2 < t_3$. At $t_1$, $A$ must choose between a population of one individual with welfare 0 who will live from $t_1$ to $t_2$ (population $O_1$) or a population of one individual with welfare –1 who will live from $t_2$ to $t_3$ (population $O_2$). At $t_2$, $B$ must choose between a population of three individuals with welfare 5 (population $O_3$) or a population of one individual with welfare 6 (population $O_4$), both of which will live from $t_2$ to $t_3$. If $A$ chooses $O_1$, then $B$ will choose $O_4$ (yielding an average welfare of 6 in $B$’s future light cone), but if $A$ chooses $O_2$, then $B$ will choose $O_3$ (since $O_2 + O_3$ yields average welfare 3.5 in $B$’s future light cone, while $O_2 + O_4$ yields only 2.5). Since $A$ prefers $O_2 + O_3$ to $O_1 + O_4$ (which yield averages of 3.5 and 3 respectively in $A$’s future light cone), $A$ will choose $O_2$. Thus we get $O_2 + O_3$, even though $O_1 + O_3$ would have been better from both $A$’s and $B$’s perspectives. That two agents who accept exactly the same normative theory and have exactly the same, perfect information can find themselves in such pointless squabbles is surely an unwelcome feature of that normative theory, though we leave it to the reader to decide just how unwelcome.35
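The arithmetic of the example can be checked directly; the sketch below (ours) just recomputes the light-cone-restricted averages.

```python
def avg(people):
    return sum(people) / len(people)

O1 = [0.0]          # A's first option: lives entirely before B's choice point
O2 = [-1.0]         # A's second option: lives after B's choice point
O3 = [5.0] * 3      # B's first option
O4 = [6.0]          # B's second option

# B maximizes average welfare within B's future light cone.
print(max(avg(O3), avg(O4)))                 # 6.0 -> if A chose O1, B picks O4
print(max(avg(O2 + O3), avg(O2 + O4)))       # 3.5 -> if A chose O2, B picks O3

# A, foreseeing B's responses, compares the resulting worlds in A's own light cone.
print(avg(O1 + O4), avg(O2 + O3))            # 3.0 vs. 3.5 -> A picks O2, yielding O2 + O3
print(avg(O1 + O3))                          # 3.75: better than 3.5, yet never realized
```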
Third, a causal domain restriction might not be enough to avoid the limit behaviors described in §§4–5, if there are large populations inside our future light cones that are background (at least, to a good approximation) with respect to most real-world choice situations. For instance, it seems likely that most choices we face will have little effect on wild animal populations over the next 100 years. More precisely, our choices might determine the identities of wild animals born in the next century (in the standard ways in which our choices are generally supposed to be identity-affecting with respect to most of the future population), while having little if any effect on the number of individuals at each welfare level in that population. And this alone would supply quite a large background population—conservatively, $10^{12}$ mammals and $10^{14}$ vertebrates. Indeed, it is plausible that with respect to most choices (even comparatively major, impactful choices), the vast majority of the present and near-future human population can be treated as background. For instance, if we are choosing between spending $1 million on anti-malarial bednets or on efforts to mitigate long-term existential risks, even the intervention that more directly impacts the near future (bednets) may have only a comparatively tiny effect on the number of individuals at each welfare level in the present- and near-future human population, so that most of that population can be treated as background.36
What about the distribution of welfare in the background population? Anything we say about this will of course be enormously speculative. However, since it is—according to non-additive views!—an important topic, it seems worth making a few brief remarks.
With respect to average welfare in the background population, two hypotheses seem particularly plausible.
Hypothesis 1
The background population consists mainly of small animals (whether terrestrial or extraterrestrial). Most of these animals have short natural lifespans, with high rates of infant mortality, so the average welfare level of the background population is likely close to zero. If the capacity for welfare scales with brain size or something similar, this would reinforce the same conclusion. Moreover, it seems plausible that average welfare in these populations will be negative, at least on a hedonic view of welfare (Horta 2010; Ng 1995). These assumptions imply, for instance, that AU and the variable value views converge with a version of CL with a slightly negative critical level.
Hypothesis 2
The background population consists mainly of the members of advanced alien civilizations, with astronomically large population sizes driven by space settlement or other technological advances. Under this hypothesis, given the limits of our present knowledge, all bets are off: average welfare in the background population could be very high (Ord 2020: 235–39), very low (Sotala & Gloor 2017), or anything in between.
With respect to the distribution of welfare more generally, we have even less to say. There is clearly a wide range of welfare levels in the background population, leading to significant inequality within specific groups.37 However, it could still turn out that the background population as a whole is dominated by welfare subjects who lead fairly similar lives—for example, by small animals who almost always experience lifetime welfare close to 0, or by members of a highly egalitarian alien civilization. This would lead to a low level of inequality, at least by standard measures.
If, as we have just argued, real-world background populations are indeed large relative to foreground populations, this provides some prima facie reason to believe that our limit results are practically significant: many plausible non-additive views will agree closely with their additive counterparts. So, even if we don’t accept additivity as a fundamental axiological principle, it may nevertheless be a useful heuristic for real-world decision-making purposes, and arguments in practical ethics that rely on separability assumptions may still succeed in practice.
In this section we give a concrete illustration of this point. As we suggested in §1, perhaps the most important practical question at stake in debates over additive separability is the relative importance of
(i) ensuring the existence of a large future population; versus
(ii) improving the welfare of the present generation.
For example, what sacrifice by the present generation would be worth it to forestall an ‘existential catastrophe’ that drastically reduces future population sizes?38 On additive views, the amount of present welfare we should be willing to sacrifice to ensure the existence of a future population scales linearly with the size of that population.39 Thus, insofar as future populations would be astronomically larger than the present human population, it would be worth very large sacrifices on the part of the present generation to ensure their existence. But non-additive views need not endorse this sort of reasoning—in particular, AU and other similar views do not.
We therefore present a deeper analysis of how real-world background populations affect the relative importance of these two objectives according to AU. We focus on AU to keep the discussion manageable, and because it exhibits the central relevant feature of insensitivity to population size, without the essentially orthogonal feature of inequality aversion.40 Moreover, we will assume that the future generations that would exist if we avoid existential catastrophe would have higher-than-average welfare; in this case, AU assigns positive value to avoiding existential catastrophe. But most of what we say also applies, mutatis mutandis, to the disvalue of avoiding existential catastrophe on the opposite assumption that the potential future population would have lower-than-average welfare.
Given Theorem 1, our basic conclusion will be unsurprising: if the background population is indeed as large as we have suggested in Section 6, then even AU gives great importance to existential catastrophe. However, there are a number of subtleties that we think are worth drawing out. In particular, we will take into account two points that complicate the application of our theorems. First, we can at best hope to affect the probability of existential catastrophe—a topic our theorems say nothing about. And, second, our more conservative estimates of the background population suggest that it may be much smaller than the size of the affected future population, making it less clear that AU will give verdicts similar to the additive axiology CL.
The results of our analysis are summarized in Section 7.5.
We almost never face a choice between a certainty of catastrophe and a certainty of non-catastrophe. So, we suggest, the best way to understand the tradeoff between (i) and (ii) is in terms of the sacrifice the current generation might make in order to reduce existential risk, that is, the probability of existential catastrophe.41
To make this precise, let $Z$ denote a background population that includes, as usual, any past welfare subjects, as well as any present or future ones who will be unaffected by the choice. Let $F$ denote the future population that will exist only if we avoid existential catastrophe. Let $C$ denote the current generation; more specifically, let $C_x$ be a version of the current generation with average welfare $x$ (and fixed size $|C|$). The following risky prospect then represents a probability $p$ of existential catastrophe:

$P = \begin{cases} C_x + Z & \text{with probability } p\\ C_x + F + Z & \text{with probability } 1 - p.\end{cases}$
From this baseline we can consider reducing the probability of catastrophe by an infinitesimal amount $\mathrm{d}p$ while also decreasing the average welfare of the current generation by $\mathrm{d}x$ to obtain a new prospect

$P' = \begin{cases} C_{x - \mathrm{d}x} + Z & \text{with probability } p - \mathrm{d}p\\ C_{x - \mathrm{d}x} + F + Z & \text{with probability } 1 - p + \mathrm{d}p.\end{cases}$
Suppose we do this in such a way that $P$ and $P'$ are equally good prospects. Then the ratio $\mathrm{d}x/\mathrm{d}p$ is an ‘exchange rate’ telling us how to weigh small changes in the welfare of the current generation against small changes in the probability of catastrophe. The higher the exchange rate, the greater the sacrifice that would be compensated by a marginal reduction in risk. So, formally, our question becomes:
Question 1. How important is existential catastrophe as measured by the exchange rate $\mathrm{d}x/\mathrm{d}p$? In particular, how does it depend on the relative sizes of $|C|$, $|F|$, and $|Z|$?
Unfortunately, one cannot read the answer to Question 1 directly off of our limit results, which, after all, say nothing about probabilities. The plan for the rest of the section is to explain why our limit results are nonetheless relevant, and then to give a concrete analysis using the estimates for population sizes that we developed in Section 6.
To address Question 1, we must adopt some rule for evaluating risky prospects like $P$. When—as in all our examples—an axiology is defined using a value function, the most obvious rule is to rank prospects by their expected value. We will assume that this is the appropriate rule both for AU and for its limiting axiology CL. Let us call the extended theories EAU and ECL, respectively, where the ‘E’ stands for expected.42
What justifies this assumption? When it comes to critical level views, there are foundational arguments supporting the use of expected value (see, e.g., Blackorby et al. 2005). For AU we have less to go on, but maximizing expected average welfare seems like a natural default, and there is no alternative for AU (or for other non-additive axiologies) that has achieved anything like widespread acceptance. Moreover, the use of expected value is closely connected to the idea that the value function represents cardinal facts about value. If $V(X) - V(Y)$ is a measure of how much better it is to get $X$ instead of $Y$, we should expect $p\,(V(X) - V(Y))$ to be at least a rough measure of how much better it is to get $X$ instead of $Y$ with probability $p$. The use of expected value is a systematic development of this idea.
This connection explains why our results about cardinal convergence are, after all, relevant to Question 1. We can interpret and as involving three scenarios: (1) with probability , the catastrophe happens no matter which option is chosen; (2) with probability , choosing would successfully prevent the catastrophe; (3) with probability , the catastrophe fails to happen no matter what. In each of these scenarios, and lead to different outcomes, and expected value theory effectively tells us to weigh up how much better or worse the outcome of would be than the outcome of in each scenario, using the probabilities as weights. When the background population is extremely large, Theorem 1 says that and will agree to high precision about the relative sizes of these differences in value. They will thus tend to agree about which option has higher expected value. And in particular, they will answer Question 1 in approximately the same way.
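To make the scenario-by-scenario weighing concrete, here is the decomposition in placeholder notation of our own (not the paper's symbols): write L for the original prospect and L′ for the modified one, B for the background population, C_l for the current generation at average welfare l, Z for the future population, p and p − ε for the two catastrophe probabilities, and δ for the welfare sacrifice. Then

\[
\mathbb{E}V(L') - \mathbb{E}V(L)
  = (p-\varepsilon)\bigl[V(B \cup C_{l-\delta}) - V(B \cup C_{l})\bigr]
  + \varepsilon\bigl[V(B \cup C_{l-\delta} \cup Z) - V(B \cup C_{l})\bigr]
  + (1-p)\bigl[V(B \cup C_{l-\delta} \cup Z) - V(B \cup C_{l} \cup Z)\bigr].
\]

Each bracket is the difference in value between the outcomes of the two options within one scenario, weighted by that scenario's probability; when cardinal convergence holds and the background is large, the two axiologies approximately agree on the ratios of these brackets, and hence on the sign of the whole expression.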
Here is the general theoretical lesson, stated somewhat informally. Suppose that axiology converges cardinally with relative to background populations of type . For any two prospects, each involving finitely many possible outcomes, if there is certain to be a sufficiently large background population of type , then and will agree about which prospect has higher expected value.
With this set-up in hand, we now compare the answers to Question 1 given by and by . We will give a general qualitative analysis of how the answers depend on the population sizes, illustrated numerically using some of our estimates from Section 6.
Let’s first consider the case of . Again, we will assume that the prospect is evaluated using its expected value . By definition,
We can think of this expected value as a function of and , while holding all other parameters fixed. Then the exchange rate is given by a ratio of derivatives:
(For example, if the expected value decreases rapidly as increases, but increases slowly as increases, then existential catastrophe is relatively important.) As is easy to deduce,
As one would anticipate, this quantity is positive (so some sacrifice is warranted) only if is above the critical level , and the importance of existential catastrophe scales linearly with .
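To spell out the computation behind this claim, in placeholder notation of our own (b, c, z for the sizes of the background, current, and future populations, \bar{w}_Z for the future population's average welfare, c^{*} for the critical level), the expected value of the prospect is linear in both p and l:

\[
\mathbb{E}V_{\mathrm{CL}} = V_{\mathrm{CL}}(B) + c\,(l - c^{*}) + (1-p)\,z\,(\bar{w}_Z - c^{*}),
\qquad\text{so}\qquad
\frac{-\,\partial\mathbb{E}V_{\mathrm{CL}}/\partial p}{\partial\mathbb{E}V_{\mathrm{CL}}/\partial l}
 = \frac{z\,(\bar{w}_Z - c^{*})}{c}.
\]

This is positive just in case \bar{w}_Z exceeds the critical level, and it scales linearly with z.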
By contrast, using the value function instead of , one finds
This expression is unattractive, but informative, and it simplifies greatly if we make further assumptions about population sizes. Consider the following three cases:
Case 1: . In this case, with a small background population, is approximately .43 So it is roughly independent of the population sizes, and (in particular) does not scale with . Moreover, in this approximation, reducing existential risk is worthwhile only if the future population would have higher average welfare than the current generation's.
Case 2: . In this case, where is intermediate in size between and , is approximately . In one way, this approximation agrees with : it is worth reducing existential risk insofar as . Moreover, will tend to be very large, proportional to . So, in this regime, existential catastrophe may be very important, but its importance is still insensitive to the size of .
Case 3: . In this case, where the background population is much larger than any of the potential foreground populations, is approximately . This is exactly the value we found using . In particular, for both and , the importance of existential catastrophe scales with in this regime.
The most basic qualitative point to take away from this analysis is that increases without bound as we increase both and . The fact that possible future and actual background populations are both extremely large suggests that is likely to be large (thus favoring existential risk minimization) for a robust range of the other parameters.
We now illustrate the preceding qualitative points using plausible numerical estimates of the various population sizes. The results are summarized in Table 1.
For the sizes of the foreground populations, we will suppose that and . The former is a realistic estimate of the size of the present and near-future human population; the latter is a rough estimate of the potential size of the future human-originating population, supposing that we maintain current population sizes for as long as possible on Earth (see §6.1).
For the size of the background population, we will consider three values, which correspond exactly to Cases 1–3 above. First, just for illustration, we consider (the case of no background population). Second, more realistically, , a rounding-down of our most conservative estimate of the number of past mammals, weighted by lifespan and cortical neuron count, from §6.1.1. Finally, we consider a somewhat less conservative estimate of . This last value corresponds to our lower bound for the number of vertebrates in Earth's past; alternatively, it could correspond to a background population dominated by 1000 alien civilizations, each of the scale that our civilization will achieve if we avoid existential catastrophe.
In terms of average welfare, we have less to go on, but the specific values are also less important for our present purposes. We will assume (plausible for the case where consists mainly of wild animals, somewhat less plausible for the case where it consists mainly of advanced civilizations). If the current generation has positive average welfare, we can then choose units so that . For simplicity, let us also suppose (ensuring that reducing existential risk will be worth some sacrifice in all three cases). Finally, we take .
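Before turning to the table, here is a minimal numerical sketch of this set-up. All names and parameter values below are our own placeholders, chosen only to have roughly the magnitudes just described (not the paper's exact estimates); the exchange rate is computed by finite differences on the expected value, with the critical level of the CL-style view set equal to the background's average welfare.

```python
# A numerical sketch of the exchange rate r = -(dEV/dp)/(dEV/dl) for
# expectational average utilitarianism (EAU) and expectational critical-level
# utilitarianism (ECL). All parameter values are illustrative placeholders.

def v_au(*groups):
    """Average welfare; each group is a (size, average_welfare) pair."""
    size = sum(n for n, _ in groups)
    return sum(n * w for n, w in groups) / size

def v_cl(critical_level, *groups):
    """Critical-level total: sum of (welfare - critical level) over individuals."""
    return sum(n * (w - critical_level) for n, w in groups)

def exchange_rate(value, p, b, wb, c, l, z, wz, eps=1e-6):
    """Finite-difference estimate of -(dEV/dp)/(dEV/dl)."""
    def ev(p_, l_):
        outcome_no_cat = value((b, wb), (c, l_), (z, wz))  # catastrophe avoided
        outcome_cat = value((b, wb), (c, l_))              # catastrophe occurs
        return p_ * outcome_cat + (1 - p_) * outcome_no_cat
    dEV_dp = (ev(p + eps, l) - ev(p - eps, l)) / (2 * eps)
    dEV_dl = (ev(p, l + eps) - ev(p, l - eps)) / (2 * eps)
    return -dEV_dp / dEV_dl

c, l = 1e10, 1.0     # current generation: size, average welfare
z, wz = 1e14, 2.0    # potential future population: size, average welfare
wb, p = 0.0, 0.5     # background average welfare, baseline catastrophe risk

# The critical level for ECL is set to the background average welfare wb.
for b in (0.0, 1e13, 1e17):  # background sizes roughly matching Cases 1-3
    r_au = exchange_rate(v_au, p, b, wb, c, l, z, wz)
    r_cl = exchange_rate(lambda *g: v_cl(wb, *g), p, b, wb, c, l, z, wz)
    print(f"b = {b:.0e}:  r_EAU = {r_au:.4g}   r_ECL = {r_cl:.4g}")
```

With these placeholder numbers, the two views' exchange rates differ by several orders of magnitude when there is no background population, but agree closely once the background dwarfs both foreground populations, mirroring Cases 1–3.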
Table 1 gives the values of according to and , under these assumptions, for all three background population sizes. Comparing the values in the third and fourth columns, we see that in this example, with three- or four-order-of-magnitude differences in the population sizes of , and , the approximations used in the last subsection are accurate to at least the third significant figure. In particular, in the third case, where agrees with to the third significant figure—preferring even very small reductions in the probability of existential catastrophe over a fairly substantial increase in the welfare of the current generation.
Table 1: The importance of avoiding existential catastrophe, as measured by , according to or with different background sizes. The other parameters are , and . We give just enough significant figures to show disagreement with the approximations developed in Cases 1–3, and stated in the fourth column for comparison.
Background size | Axiology | Exchange rate | Approximation
0 | | 1.9999996 |
Any | | | –
We have used the standard value functions defining and to analyze the expected value of reducing existential risk. This analysis yields the following conclusions: (1) When the background population is small or non-existent, the importance of avoiding existential catastrophe according to is approximately independent of population size, depending only on the average welfares of the potential foreground populations. It is therefore unlikely to be astronomically large. (2) When—as suggested by our most conservative estimates— is much larger than the current generation but still much smaller than the potential future population, the importance of avoiding existential catastrophe according to approximately scales with , and may therefore be extremely large, while still falling well short of its importance according to . (3) Finally, if the background population is much larger even than the potential future population (as it would be, for instance, if it includes many advanced civilizations elsewhere in the universe), agrees closely with about the importance of avoiding existential catastrophe, treating it as approximately linear in .
Our primary goal in this paper has been to explore the implications of non-additive axiologies in the context of large background populations. We don’t see our results primarily as a reason to reject the views to which they apply, but rather simply as suggesting that their practical implications are more similar to those of additive views than one might have thought. However, there are at least two ways in which our results might be taken to support objections to those views, which we briefly explore in this section.
The Repugnant Conclusion, recall, is the conclusion (implied by TU among other axiologies) that for any two positive welfare levels , for any population in which everyone has welfare , there is a better population in which everyone has welfare . Avoidance of the Repugnant Conclusion is often seen as a significant desideratum in population axiology. But additive axiologies, as we have defined them, can avoid the Repugnant Conclusion only at the cost of implying the Strong Sadistic Conclusion: for any negative welfare level , for any population in which everyone has welfare , there is a worse population in which everyone has positive welfare (Arrhenius 2000).44
One of the motivations for population axiologies with an ‘averagist’ flavor (like and ) is to avoid these unappealing consequences of additivity. These views do not imply either the Repugnant Conclusion or the Strong Sadistic Conclusion. But a straightforward implication of our results is that they imply both of the following, closely related conclusions.
Repugnant Addition. For any positive welfare levels and any population in which everyone has welfare , there is a population in which everyone has welfare and a population such that .
Strong Sadistic Addition. For any negative welfare level and any population in which everyone has welfare , there is a population in which everyone has positive welfare and a population such that .
Informally, where the Repugnant Conclusion says that for any imaginable utopia, there’s a better population in which everyone’s life is barely worth living, Repugnant Addition says that it’s sometimes better to add the latter population to a preexisting population. And likewise, where the Strong Sadistic Conclusion says that for any imaginable dystopia, there’s a worse population in which everyone’s life is worth living (if only barely), Strong Sadistic Addition says that it’s sometimes worse to add the latter population to a preexisting population.
A non-additive view will imply both of these conclusions if it can converge with an additive view with either a positive or a negative critical level. This includes , and all natural versions of two-factor egalitarianism.45 AU yields Repugnant Addition, for instance, when there is a large background population with average welfare 0, and yields Strong Sadistic Addition when there is a large background population with positive average welfare.46 We find these implications nearly as counterintuitive as the original Repugnant and Strong Sadistic Conclusions, though your mileage may vary. And non-additive views that imply both Repugnant Addition and Strong Sadistic Addition are, in one respect, worse off than any additive view, which will only imply one of the Repugnant and Strong Sadistic Conclusions.
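To see how a large, roughly neutral background population generates Repugnant Addition under AU, here is a toy computation (all numbers are made up purely for illustration):

```python
def avg(*groups):
    """Average welfare of a combined population; groups are (size, welfare) pairs."""
    return sum(n * w for n, w in groups) / sum(n for n, _ in groups)

background = (1e17, 0.0)    # huge unaffected population, average welfare ~0
utopia     = (1e10, 100.0)  # everyone at a very high welfare level
drab       = (1e18, 0.01)   # vastly more people, lives barely worth living

print(avg(background, utopia))  # about 1e-05
print(avg(background, drab))    # about 9e-03: AU prefers adding the drab population
```

A parallel computation with a positive-welfare background (so that a vast barely-positive addition drags the average down by more than a small dystopian addition would) yields Strong Sadistic Addition.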
These observations are not particularly original. Spears and Budolfson (2021) point out the difficulty of avoiding Repugnant Addition, for a broader range of axiologies than we have considered in this paper. And Franz and Spears (2020) show that, under modest assumptions, any view that rejects Mere Addition (including AU and similar views) will imply a weaker version of what we have called Strong Sadistic Addition. But our results provide a systematic and illuminating explanation for the difficulty of avoiding these unpalatable conclusions.
Our results also show that agents whose choices are guided by non-additive axiologies are vulnerable to a particular kind of exploitation. Suppose, for instance, that we in the Milky Way are all average utilitarians, while the inhabitants of the Andromeda Galaxy are all total utilitarians. And suppose that, the distance between the galaxies being what it is, we can communicate with each other but cannot otherwise interact. Being total utilitarians, the Andromedans would prefer that we act in ways that maximize total welfare in the Milky Way. To bring that about, they might create an astronomical number of welfare subjects with welfare very close to zero—for instance, very small, short-lived animals with mostly bland experiences—and send us evidence that they have done so. We in the Milky Way would then make all our choices under the awareness of a large background population whose average welfare is close to zero. The Andromedans could thus ‘force’ us to behave like de facto total utilitarians, doing the work of total utilitarianism on their behalf.
Agents who accept additive (or, more generally, separable) axiologies, on the other hand, are immune from this sort of exploitation. Thus non-additivists are at a practical disadvantage in strategic interactions with additivists. The incentive for others to exploit them in this way also makes non-additive views potentially self-defeating: adopting and acting on such a view can incentivize other agents to act in ways that make things worse by its lights. For instance, the existence of average utilitarians incentivizes total utilitarians to add individuals with welfare 0 to the population, which makes things worse from the average utilitarian’s perspective if the average welfare of the preexisting population is positive.
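The arithmetic behind that last point is simple dilution: in placeholder notation, if the preexisting population has size n and average welfare \bar{w} > 0, then adding m individuals at welfare 0 lowers the average, since

\[
\frac{n\,\bar{w} + m\cdot 0}{n+m} \;=\; \frac{n}{n+m}\,\bar{w} \;<\; \bar{w}
\qquad\text{whenever } m>0 .
\]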
We ourselves do not see this vulnerability as a particularly weighty reason to reject non-additive views—after all, nearly every agent is vulnerable to some forms of exploitation. (On the vulnerabilities of total utilitarians, for instance, see Gustafsson 2022a.) But it is certainly an unwelcome feature, and others may see it as a more severe drawback.
We have shown that, in the presence of large enough background populations, a range of non-additive axiologies asymptotically agree with some counterpart additive axiology (either critical-level or, more broadly, prioritarian). And we have argued that the real-world background population is large enough to make these limit results practically relevant. These facts may have important practical implications for tradeoffs between avoiding existential catastrophe and benefiting the current generation: they suggest that AU and kindred axiologies should, in practice, strongly prioritize existential catastrophe avoidance in virtue of the astronomical size of the potential future population, just as additive axiologies seem to do. Thus, arguments for the overwhelming practical importance of avoiding existential catastrophe may not depend on additive separability.
We have left many questions unanswered that might be valuable topics of future research: (1) a more careful characterization of the size and welfare distribution of real-world background populations; (2) how to extend our limit results to the context of risk/uncertainty, including uncertainty about features of the background population; (3) the behavior of a wider range of non-additive axiologies (e.g., incomplete, intransitive, or person-affecting) in the large-background-population limit; and (4) exploring more generally the question of how large the background population needs to be for the limit results to ‘kick in’, for a wider range of axiologies and choice situations than we considered in §7.
For helpful discussion and/or feedback on drafts of this paper, we are grateful to Tomi Francis, Hilary Greaves, Kacper Kowalczyk, Toby Newberry, Toby Ord, Itzhak Rasooly, Dean Spears, Orri Stefánsson, Philip Trammell, and audiences at the University of Oxford, Jagiellonian University, Kansas State University, and the Massachusetts Institute of Technology.
In this appendix, we present two results about rank-discounted utilitarianism that are explained informally in Section 5.2. In stating the results, we will need to restrict the foreground populations under consideration.
Ordinal Convergence on S
Axiology converges ordinally with , relative to background populations of type , on a set of populations , if and only if, for any populations and in , if is a sufficiently large population of type , then
Similarly, we have an obvious notion of cardinal convergence on , where the four populations occurring in the definition of cardinal convergence are restricted to be elements of .
Having fixed a background distribution , say that a population is moderate with respect to if the lowest welfare level in is no lower than the lowest welfare level in . In other words, for any with , there is some with and . Then we can state the following result:
Theorem 6. converges ordinally and cardinally to relative to background populations with a given distribution , on the set of populations that are moderate with respect to .
Now we turn to . The limiting axiology will be critical-level leximin, defined by the following conditions:
Critical-Level Leximin
If and have the same size, then if and only if and the least such that is such that .
If and differ only in that has additional individuals at welfare level , then and are equally good.47
Although is not additively separable in the narrow sense defined in §2, which requires an assignment of real numbers to each individual, one can check that it is separable, and indeed one can show that it is additively separable in a more general sense, if we allow the contributory value of an individual’s welfare to be represented by a vector rather than a single real number.48
To state the theorem, fix a set of welfare levels. Say that a population is supported on if and only if for all . And say that is covered by a distribution if and only if there is a welfare level in between any two elements of , a welfare level in below every element of , and a welfare level in above every element of .
Theorem 7. Let be any set of welfare levels, and a distribution that covers . converges ordinally with relative to background populations with distribution , on the set of populations that are supported on ; the critical level is the highest welfare level occurring in .
Here, note, we only consider ordinal convergence, since is not defined using a real-valued value function.
Recall that is the set of welfare levels, and consists of all non-zero, finitely supported functions . By a type of population we mean a set that contains populations of arbitrarily large size: for all there exists with .
The following result, while elementary, indicates our general method. Let us say that a function is additive if for all . The value functions we have given for additive axiologies are all of this kind.
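For instance, critical-level utilitarianism with critical level c^{*} has (in placeholder notation) the value function u(X) = \sum_{w} X(w)\,(w - c^{*}), which is additive in this sense:

\[
u(X+Y) \;=\; \sum_{w}\bigl(X(w)+Y(w)\bigr)(w - c^{*}) \;=\; u(X) + u(Y).
\]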
Lemma 1. Suppose given and a positive function . Define
as ranges over populations of some type . If is an additive function, then the axiology with value function converges ordinally and cardinally to the axiology with value function , relative to background populations of type .
Proof. Let be two populations, and let be a background population of type . To prove ordinal convergence, suppose . Then, by additivity of . Moreover,
Therefore, if is large enough, we must have , as ordinal convergence requires.
For cardinal convergence, consider four populations . We have
as long as the denominator . On the other hand, for any given ,
by additivity. Therefore
as cardinal convergence requires.
Theorem 1. Average utilitarianism converges ordinally and cardinally to , relative to background populations with average welfare . In fact, for any populations , if and
then
Proof. In this case, a brief calculation shows
Setting we find , in the notation of Lemma 1. That lemma then yields the first statement.
We now verify the more precise second statement directly. Suppose , that (1) holds, and that . We have to show . Using (3), that desired conclusion is equivalent to
Cross-multiplying, this is equivalent to
or, rearranging,
Given that , the desired conclusion (4) follows from (1).
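Theorem 1 can be illustrated numerically with made-up sizes and welfare levels: take a background of n people at welfare 1, so that the limiting critical level is 1, and two foreground populations whose AU ranking in isolation disagrees with their CL ranking.

```python
def au(*groups):      # average utilitarianism; groups are (size, welfare) pairs
    return sum(n * w for n, w in groups) / sum(n for n, _ in groups)

def cl(c, *groups):   # critical-level total with critical level c
    return sum(n * (w - c) for n, w in groups)

X = (10, 2.0)  # 10 people at welfare 2: CL_1 value 10, average welfare 2
Y = (2, 4.0)   # 2 people at welfare 4:  CL_1 value 6,  average welfare 4

print(cl(1.0, X) > cl(1.0, Y))   # True:  CL_1 prefers X
print(au(X) > au(Y))             # False: AU alone prefers Y
for n in (10, 100, 10_000, 1_000_000):
    B = (n, 1.0)                   # background population at welfare 1
    print(n, au(B, X) > au(B, Y))  # agrees with CL_1 once n exceeds 10
```

In isolation AU prefers the small, high-average population Y; with even a modest background at the critical level, it comes to agree with CL's preference for X.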
Theorem 2. Variable value views converge ordinally and cardinally to , relative to background populations with average welfare .
Proof. Suppose the variable value view has a value function of the form . Then
We now apply two lemmas, proved below.
Lemma 2. We have .
Lemma 3. We have .
Since , and approaches some upper bound as , we find
as ranges over populations with . Let . Then we have found
The result now follows from Lemma 1.
Proof of Lemma 2. Let be the result of rounding up to the nearest integer. By increasingness and concavity of , we have49
Cross-multiplying,
Since and both tend to a common limit as , we find that the right-hand side tends to 0 in that limit. Therefore the expression in the middle also tends to 0.
Proof of Lemma 3. First, if then and , so the result is trivial in that case. Otherwise, since tends toward as , we have (by the definition of the derivative)
We have, from (3),
Inserting this into the preceding formula, we find
Since , we obtain the desired result.
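For a similar toy check of Theorem 2, take one concrete variable value function of our own choosing (not one from the paper): V = (average welfare) × g(population size) with g(n) = n/(n + 1000), which is increasing, strictly concave, and bounded above by 1.

```python
def vv(*groups):      # variable value view: average welfare times g(size)
    size = sum(n for n, _ in groups)
    average = sum(n * w for n, w in groups) / size
    return average * size / (size + 1000)   # g(n) = n / (n + 1000)

def cl(c, *groups):   # critical-level total with critical level c
    return sum(n * (w - c) for n, w in groups)

X = (20, 1.2)  # CL_1 value 4;  total welfare 24
Y = (2, 4.0)   # CL_1 value 6;  total welfare 8

print(cl(1.0, X) > cl(1.0, Y))   # False: CL_1 prefers Y
print(vv(X) > vv(Y))             # True:  alone, the view behaves roughly 'totalist'
for n in (1_000, 100_000, 10_000_000):
    B = (n, 1.0)                   # background population with average welfare 1
    print(n, vv(B, X) > vv(B, Y))  # False for large n: agreement with CL_1
```

As with AU, the view initially disagrees with CL but falls into line once the background is large enough relative to g's scale parameter.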
Proposition 1. For any populations and , if and , then .
Proof. Since is non-zero, non-negative, and concave, we must have for all (it is possible, however, that ). It follows that has the same sign as . So if , it is automatic that . (The condition that excludes the case where .) Thus it remains to consider the cases when and are both positive or both negative. Let us treat the case where both are positive; the case where both are negative is similar, with careful attention to signs.
Suppose . Since is weakly increasing and , we find . In other words, , as required.
Suppose instead . We have
Both terms on the right are weakly decreasing in (the first because is concave). Therefore . This yields
Since , we conclude that .
Theorem 3. Suppose is a value function of the form , or else , where is a differentiable function of the distribution of . Then the axiology represented by converges ordinally and cardinally with an additive axiology, relative to background populations with any fixed distribution ; specifically, it converges with the additive axiology with weighting function given by
If the Pareto principle holds with respect to , then is weakly increasing, and if Pigou-Dalton transfers are weak improvements, then is concave.
Remark 1. Before proving Theorem 3, we should explain the requirement that ‘ is a differentiable function of the distribution of ’. It has two parts. First, let be the set of finitely-supported, non-zero functions . Let be the subset of distributions, that is, those functions that sum to 1. The first part of the requirement is that there is a function such that . In that sense, is just a function of the distribution of . Another way to put this is that can be extended to a function on all of that is scale-invariant, that is, for all reals and all . The second part of the requirement is that , so extended, is differentiable, in the following sense:50 for all , the limit
exists and is linear as a function of . In effect, is the best linear approximation of . In practice we only need to be differentiable at the background distribution .
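As a sanity check on this notion of differentiability (again in placeholder notation of our own), consider AU itself. Its scale-invariant extension is V(X) = \sum_{w} X(w)\,w \big/ \sum_{w} X(w), and at a distribution D with mean \bar{w}_D the limit above evaluates to

\[
\lim_{\epsilon\to 0}\frac{V(D+\epsilon X)-V(D)}{\epsilon}
  \;=\; \sum_{w} X(w)\,\bigl(w - \bar{w}_D\bigr),
\]

which is linear in X and is exactly a critical-level value function with critical level \bar{w}_D, in keeping with Theorem 1.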
Proof. Let range over background populations with the given distribution . Thus is of the form for some .
Define , in the case of TU-based egalitarianism, and in the case of AU-based egalitarianism. Noting that value functions of the assumed form can be evaluated not only on but on the larger set (see Remark 1), we have
We can then see that (as defined in Lemma 1) is the directional derivative of at :
Given that is differentiable as in Remark 1, this function is linear in and therefore represents an additive axiology . More specifically, for each welfare level let be a population with one person at level . We then have
as claimed in the statement of the theorem. In particular, for totalist egalitarianism, we find that
Similarly, for averagist egalitarianism,
Now, suppose differs from in that one person is better off, say with welfare instead of . If the Pareto principle holds with respect to , then for all ; by convergence, we cannot have . It follows that ; thus is weakly increasing. By the same logic, Pigou-Dalton transfers do not make things worse with respect to , and it follows that is concave.
Theorem 4. converges ordinally and cardinally to , relative to background populations with a given distribution . Specifically, converges with , the prioritarian axiology whose weighting function is
Here is the average distance between and the welfare levels occurring in .
Proof. Define . Then . It is easy to check that and therefore
In particular, is differentiable and Theorem 3 applies. We know from equation (5) in the proof of Theorem 3 that MDT converges with the additive axiology with weighting function
Theorem 5. converges ordinally and cardinally to , relative to background populations with a given distribution . Specifically, converges with , the prioritarian axiology whose weighting function is
Proof. Theorem 3 applies, with . (We omit the proof that this is differentiable.) We have, then, convergence with prioritarianism with a priority weighting function
Since the background distribution is fixed, this differs from the stated priority weighting function only by a positive scalar (i.e., the denominator), which does not affect which axiology the value function represents.
Theorem 6. converges ordinally and cardinally to relative to background populations with a given distribution , on the set of populations that are moderate with respect to .
Proof. Suppose that the weighting function has a horizontal asymptote at . As in Lemma 1 it suffices to show that , as ranges over populations with distribution , and on the assumption that is moderate with respect to .
Write for the number of people in with welfare at most , and similarly . Separating out contributions from and contributions from , we have
The assumption that is moderate means that, in those cases where , so that the first inner sum is non-trivial, we also have . Therefore each summand in the first double-sum tends to . The first double sum then converges to . It remains to show that the second double sum converges to 0. Call the summand in that double sum .
Since there are finitely many for which (i.e., for which the inner sum is non-trivial), it suffices to show that, for each such , the inner sum converges to 0. If , then the inner sum is identically zero, so we can assume . We can also assume that is large enough that is convex in the relevant range; then
Moreover, the number of terms in the inner sum, , is proportional to . It remains to apply the following elementary lemma with and .
Lemma 4. If is an eventually convex function decreasing to a finite limit, then as .
This is just a small variation on Lemma 2, and we omit the proof.
Theorem 7. Let be any set of welfare levels, and a distribution that covers . converges ordinally with relative to background populations with distribution , on the set of populations that are supported on ; the critical level is the highest welfare level occurring in .
Proof. Suppose and are supported on , and . Let be a population with distribution , so for some . We have to show that for all large enough.
Let and be populations of equal size, obtained from and by adding people at the critical level . By the second condition characterizing is just as good as , and just as good as . Therefore, the assumption that implies that . According to the first condition characterizing , we have for the first such that . Now, since covers , no welfare level occurring in or is higher than . Thus , and it follows that . For brevity define .
Let be the next welfare level occurring in above . If there is no such welfare level, then define .
We can decompose (and similarly other populations) as , where only involves welfare levels in the interval , only involves welfare levels in , and only involves those in . Note that , because of the way was chosen. We can therefore write
The populations on the right are written in rank-order; that is, every welfare level in is below every welfare level in , and so on. This makes it easy to apply the value function :
A similar expression holds for in place of . Note that because of the way was chosen. Combining expressions for and , and dividing by a common factor, we find
where the remainder is given by
Our goal is to show that the right-hand side of (6) is positive when is sufficiently large, for if it is positive then and thus .
To simplify (6), we use the standard fact that . Since , and similarly for , we find
Substituting this into (6) and rearranging, we find
To conclude that for all large enough, it suffices to show
For the first of these conditions, note that , by the way was chosen; therefore .
For the second, we claim that : that is, some welfare level in occurs in . There are two cases. First, if , some welfare level in occurs in , because covers . Otherwise, but . Then , and occurs in . Having proved the claim, let be the lowest welfare level occurring in . Since has distribution , is also the lowest welfare level in . Then
Finally, we will have as long as , since the second, complicated factor in the definition of is bounded as . And since , it suffices that , as we already showed.
Adler, Matthew (2009). Future Generations: A Prioritarian View. George Washington Law Review, 77 (5–6), 1478–520.
Adler, Matthew (2011). Well-Being and Fair Distribution: Beyond Cost-Benefit Analysis. Oxford University Press.
Arneson, Richard J. (2000). Luck Egalitarianism and Prioritarianism. Ethics, 110 (2), 339–49. http://doi.org/10.1086/233272
Arntzenius, Frank (2014). Utilitarianism, Decision Theory and Eternity. Philosophical Perspectives, 28 (1), 31–58.
Arrhenius, Gustaf (2000). An Impossibility Theorem for Welfarist Axiologies. Economics and Philosophy, 16 (2), 247–66. http://doi.org/10.1017/s0266267100000249
Asheim, Geir B. and Stéphane Zuber (2014). Escaping the Repugnant Conclusion: Rank-Discounted Utilitarianism with Variable Population. Theoretical Economics, 9 (3), 629–50.
Bar-On, Yinon M., Rob Phillips, and Ron Milo (2018). The Biomass Distribution on Earth. Proceedings of the National Academy of Sciences, 115 (25), 6506–11.
Bentham, Jeremy (1789). An Introduction to the Principles of Morals and Legislation. T. Payne and Son.
Blackorby, Charles, Walter Bossert, and David Donaldson (1997). Critical-Level Utilitarianism and the Population-Ethics Dilemma. Economics and Philosophy, 13 (2), 197–230. http://doi.org/10.1017/S026626710000448X
Blackorby, Charles, Walter Bossert, and David J. Donaldson (2005). Population Issues in Social Choice Theory, Welfare Economics, and Ethics. Cambridge University Press.
Bostrom, Nick (2003). Astronomical Waste: The Opportunity Cost of Delayed Technological Development. Utilitas, 15 (3), 308–14.
Bostrom, Nick (2011). Infinite Ethics. Analysis and Metaphysics, 10, 9–59.
Bostrom, Nick (2013). Existential Risk Prevention as Global Priority. Global Policy, 4 (1), 15–31.
Broad, C. D. (1914). The Doctrine of Consequences in Ethics. International Journal of Ethics, 24 (3), 293–320.
Broome, John (1997). Is Incommensurability Vagueness? In Ruth Chang (Ed.), Incommensurability, Incomparability and Practical Reason (67–89). Harvard University Press.
Browning, Heather (2020). If I Could Talk to the Animals: Measuring Animal Welfare (Doctoral dissertation). Australian National University.
Buchak, Lara (2017). Taking Risks Behind the Veil of Ignorance. Ethics, 127 (3), 610–44. http://doi.org/10.1086/690070
Budolfson, Mark and Dean Spears (2022). Does the Repugnant Conclusion Have Important Implications for Axiology or for Public Policy? In Gustaf Arrhenius, Krister Bykvist, and Tim Campbell (Eds.), Oxford Handbook of Population Ethics (350–68). Oxford University Press.
Carlson, Erik (1995). Consequentialism Reconsidered. Kluwer.
Crisp, Roger (2003). Equality, Priority, and Compassion. Ethics, 113 (4), 745–63. http://doi.org/10.1086/373954
de Lazari-Radek, Katarzyna and Peter Singer (2014). The Point of View of the Universe: Sidgwick and Contemporary Ethics. Oxford University Press.
Fishburn, Peter C. (1970). Intransitive Indifference in Preference Theory: A Survey. Operations Research, 18 (2), 207–28.
Fleurbaey, Marc (2010). Assessing Risky Social Situations. Journal of Political Economy, 118 (4), 649–80.
Frankfurt, Harry (1987). Equality as a Moral Ideal. Ethics, 98 (1), 21–43. http://doi.org/10.1086/292913
Franz, Nathan and Dean Spears (2020). Mere Addition is Equivalent to Avoiding the Sadistic Conclusion in All Plausible Variable-Population Social Orderings. Economics Letters, 196, 109547.
Greene, Brian (2004). The Fabric of the Cosmos: Space, Time and the Texture of Reality. Random House.
Gustafsson, Johan E. (2020). Population Axiology and the Possibility of a Fourth Category of Absolute Value. Economics and Philosophy, 36 (1), 81–110.
Gustafsson, Johan E. (2022a). Bentham’s Mugging. Utilitas, 34 (4), 386–91.
Gustafsson, Johan E. (2022b). Our Intuitive Grasp of the Repugnant Conclusion. In Gustaf Arrhenius, Krister Bykvist, and Tim Campbell (Eds.), The Oxford Handbook of Population Ethics (371–89). Oxford University Press.
Hardin, Garrett (1968). The Tragedy of the Commons. Science, 162 (3859), 1243–48.
Harsanyi, John C. (1977). Morality and the Theory of Rational Behavior. Social Research, 44 (4), 623–56.
Helliwell, John F., Richard Layard, and Jeffrey D. Sachs (2019). World Happiness Report 2019. Sustainable Development Solutions Network.
Horta, Oscar (2010). Debunking the Idyllic View of Natural Processes: Population Dynamics and Suffering in the Wild. Telos: Revista Iberoamericana de Estudios Utilitaristas, 17 (1), 73–90.
Hudson, James L. (1987). The Diminishing Marginal Value of Happy People. Philosophical Studies, 51 (1), 123–37. http://doi.org/10.1007/BF00353967
Hurka, Thomas (1982a). Average Utilitarianisms. Analysis, 42 (2), 65–69. http://doi.org/10.1093/analys/42.2.65
Hurka, Thomas (1982b). More Average Utilitarianisms. Analysis, 42 (3), 115–19. http://doi.org/10.1093/analys/42.3.115a
Hurka, Thomas (1983). Value and Population Size. Ethics, 93 (3), 496–507. http://doi.org/10.1086/292462
Hutcheson, Francis (1738). An Inquiry into the Original of our Ideas of Beauty and Virtue, In Two Treatises (4th ed.). D. Midwinter, A. Bettesworth, and C. Hitch, J. and J. Pemberton, R. Ware, C. Rivington, F. Clay, A. Ward, J. and P. Knap. (Original work published 1725)
Kagan, Shelly (2019). How to Count Animals, More or Less. Oxford University Press.
Kaneda, Toshiko and Carl Haub. How Many People Have Ever Lived on Earth? Population Reference Bureau. Retrieved from https://www.prb.org/howmanypeoplehaveeverlivedonearth/
Knobe, Joshua, Ken D. Olum, and Alexander Vilenkin (2006). Philosophical Implications of Inflationary Cosmology. The British Journal for the Philosophy of Science, 57 (1), 47–67.
Kowalczyk, Kacper (2020). Persons, Populations, and Value (Doctoral dissertation). University of Oxford.
McCarthy, David (2015). Distributive Equality. Mind, 124 (496), 1045–109. http://doi.org/10.1093/mind/fzv028
McCarthy, David, Kalle Mikkola, and Teruji Thomas (2020). Utilitarianism With and Without Expected Utility. Journal of Mathematical Economics, 87, 77–113.
Mill, John Stuart (1863). Utilitarianism. Parker, Son, and Bourne.
Ng, Yew-Kwang (1989). What Should We Do About Future Generations? Impossibility of Parfit’s Theory X. Economics and Philosophy, 5 (2), 235–53. http://doi.org/10.1017/s0266267100002406
Ng, Yew-Kwang (1995). Towards Welfare Biology: Evolutionary Economics of Animal Consciousness and Suffering. Biology and Philosophy, 10 (3), 255–85. http://doi.org/10.1007/BF00852469
Norwood, F. Bailey and Jayson L. Lusk (2011). Compassion, by the Pound: The Economics of Farm Animal Welfare. Oxford University Press.
Ord, Toby (2020). The Precipice: Existential Risk and the Future of Humanity. Bloomsbury Publishing.
Ord, Toby (2021). The Edges of Our Universe. arXiv: https://arxiv.org/abs/2104.01191v2 [gr-qc]
Parfit, Derek (1984). Reasons and Persons. Oxford University Press.
Parfit, Derek (1997). Equality and Priority. Ratio, 10 (3), 202–21. http://doi.org/10.1111/1467-9329.00041
Pressman, Michael (2015). A Defence of Average Utilitarianism. Utilitas, 27 (4), 389–424. http://doi.org/10.1017/s0953820815000072
Rabinowicz, Wlodzimierz (1989). Act-Utilitarian Prisoner’s Dilemmas. Theoria, 55 (1), 1–44. http://doi.org/10.1111/j.1755-2567.1989.tb00720.x
Roth, Gerhard and Ursula Dicke (2005). Evolution of the Brain and Intelligence. Trends in Cognitive Sciences, 9 (5), 250–57.
Shulman, Carl (2014). Population Ethics and Inaccessible Populations. Unpublished manuscript.
Sidgwick, Henry (1907). The Methods of Ethics (7th ed.). Macmillan and Company. (Original work published 1874)
Smil, Vaclav (2013). Harvesting the Biosphere: What We Have Taken from Nature. The MIT Press.
Sotala, Kaj and Lukas Gloor (2017). Superintelligence as a Cause or Cure for Risks of Astronomical Suffering. Informatica, 41 (4), 389–400.
Spears, Dean and Mark Budolfson (2021). Repugnant Conclusions. Social Choice and Welfare, 57 (3), 567–88.
Thomas, Teruji (2022). Separability and Population Ethics. In Gustaf Arrhenius, Krister Bykvist, Tim Campbell, and Elizabeth Finneron-Burns (Eds.), The Oxford Handbook of Population Ethics (271–95). Oxford University Press.
Tomasik, Brian. How Many Wild Animals Are There? Retrieved from https://reducing-suffering.org/how-many-wild-animals-are-there/
Vardanyan, Mihran, Roberto Trotta, and Joseph Silk (2009). How Flat Can You Get? A Model Comparison Perspective on the Curvature of the Universe. Monthly Notices of the Royal Astronomical Society, 397 (1), 431–44.
Vardanyan, Mihran, Roberto Trotta, and Joseph Silk (2011). Applications of Bayesian Model Averaging to the Curvature and Size of the Universe. Monthly Notices of the Royal Astronomical Society: Letters, 413 (1), L91–L95.
Weirich, Paul (1983). Utility Tempered with Equality. Noûs, 17 (3), 423–39. http://doi.org/10.2307/2215258