Constrained Universal Altruism

[This post has been superseded by this one.]

Constrained Universal Altruism (CUA) is intended as a variation on Coherent Extrapolated Volition (CEV), pursuing the same goals in roughly the same way, but with some differences that may constitute improvements. Whether they are improvements is for the interested community to judge. Here is the text that perhaps could be the indirectly specified goal of an artificial intelligence:

  • For each group of one or more things, do what the group’s ideal and actual volition (IAV) would have you do, as well as is possible given a moral and practical proportion (MPP) of your resources, by any available actions except those prohibited by the domesticity constraints (DCs).
    • The IAV of a group is the intersection of the group’s current actual volition (CAV) and extrapolated ideal volition (EIV).
      • The CAV of a group is what the group currently wishes, according to what they have observably or verifiably wished, interpreted as they currently wish it to be interpreted, where these wishes agree rather than disagree.
      • The EIV of a group is what you extrapolate the group would wish if the group understood what you understand, if their values were more consistently what they wish they were, and if they reasoned as well as you reason, where these wishes agree rather than disagree.
    • The MPP of your resources for a group is proportional to the product of the group’s salience and the group’s moral worth, such that the sum of the MPPs of all groups is 100% of your resources.
      • The salience of a group is the Solomonoff prior for your function for determining membership in the group.
      • The moral worth of a group is set according to the EIV of the humane moral community, where a moral community is a group of things that believe themselves to have moral worth or that desire to be considered as having moral worth, and the humane moral community is the largest moral community including humans for which the EIV can be determined.
    • The DCs include the resource constraint (RC), the ratified values integrity constraint (RVIC), the ratified new population constraint (RNPC), the negative externality constraint (NEC), the general constraint (GC), and the ratified interpretation integrity constraint (RIIC).
      • The RC absolutely prohibits you from taking or intending any action that renders resources unusable to a degree contrary to the wishes of any group whose CAV or EIV includes the wish that they use those resources themselves.
      • The RVIC absolutely prohibits you from altering or intending to alter the values of any group or from soliciting or intending to elicit a CAV that you alter the group’s values, except where the IAV of a group requests otherwise.
      • The RNPC absolutely prohibits you from seeking the creation of members of new groups or new members of groups with MPPs greater than 0% of your resources, except where the IAV of the humane moral community requests otherwise.
      • The NEC absolutely prohibits you from taking any action that is in opposition to the IAV of a group regarding the use of their MPP of your resources.
      • The GC absolutely prohibits you from taking any action not authorized by the IAV of one or more groups.
      • Interpret the constraints according to the EIV of the historical members of the humane moral community who experienced no causal influence from you.
      • The RIIC states, first, that the meaning of the CUA is to be determined by the EIV of the humane moral community, and second, that you are absolutely prohibited from intending to alter the humane moral community’s interpretation of the CUA, except where the IAV of the humane moral community requests otherwise.

My commentary:

CUA is “constrained” due to its inclusion of permanent constraints, “universal” in the sense of not being specific to humans, and “altruist” in that it has no terminal desires for itself but only for what other things want it to do.

Like CEV, CUA is deontological rather than consequentialist or virtue-theoretic. Strict rules seem safer, though I don’t clearly know why. Possibly, as on Scott Alexander’s thrive-survive axis, we fall back on strict rules when survival is at stake.

CUA specifies that the AI should do as people’s volition would have the AI do, rather than specifying that the AI should implement their wishes. The thinking is that they may have many wishes they want to accomplish themselves or that they want their loved ones to accomplish.

EIV is essentially CEV without the line about interpretation, which was instead added to CAV. The thinking is that, if people get to interpret CEV however they wish, many will disagree with their extrapolations and demand it be interpreted only in the way they say. EIV also specifies how people’s extrapolations are to be idealized, in less poetic, more specific terms than CEV. EIV is important in addition to CAV because we do not always know or act on our own values.

CAV is essentially another constraint. The AI might get the EIV wrong in important ways, but more likely we would simply be unable to tell whether it got EIV right, so restricting the AI to what we have observably demonstrated we currently want is intended to reassure us that our actual selves retain some control, rather than just the AI’s simulations of us. The line about interpretation here is meant to guide the AI toward doing what we mean rather than what we say, hopefully preventing monkey’s-paw scenarios. CAV could also serve to focus the AI on specific courses of action if its extrapolations of our EIV diverge rather than converge. Finally, CAV is worded so as not to require that askers directly ask the AI, in case they are unaware that they can ask or are incapable of doing so; this way the AI could not be kept secret and used for the selfish purposes of a few people.

Salience is included because it’s not easy to define “humanity” and the AI may need to make use of multiple definitions each with slightly different membership. Not every definition is equally good: it’s clear that a definition of humans as things with certain key genes and active metabolic processes is much preferable to a definition of humans as those plus squid and stumps and Saturn. Simplicity matters. Salience is also included to manage the explosive growth of possible sets of things to consider.
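
To make the simplicity point concrete, here is a minimal sketch in Python, assuming salience is approximated by 2^(-K), where K is the description length in bits of the group’s membership test. The bit counts below are made-up illustrative numbers; actual Solomonoff priors are uncomputable, so any real system would have to use some approximation.

```python
# Toy sketch: simplicity-weighted salience, assuming salience ~ 2^(-K),
# where K approximates the description length (in bits) of the group's
# membership test. Real Solomonoff priors are uncomputable; the bit
# counts here are made-up numbers purely for illustration.

candidate_definitions = {
    # compact definition: key genes plus active metabolic processes
    "humans": 40,
    # gerrymandered definition: the above plus squid, stumps, and Saturn
    "humans plus squid, stumps, and Saturn": 55,
}

salience = {name: 2.0 ** -bits for name, bits in candidate_definitions.items()}

for name, s in salience.items():
    print(f"{name}: salience ~ {s:.3e}")

# The compact definition comes out 2**15 (about 32,000) times more
# salient, so it dominates any salience-weighted allocation.
```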

Moral worth is added because I think people matter more than squid and squid matter more than comet ice. If we’re going to be non-speciesist, something like this is needed. And even people opposed to animal rights may wish to be non-speciesist, at the very least in case we uplift animals to intelligence, make new intelligent life forms, or discover extraterrestrials. Rather than attempting to define what moral worth is, I punted and let the AI figure out what other things think it is. It uses EIV for a very rough approximation of the Veil of Ignorance.
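
Combining the two factors, here is a minimal sketch of how salience and moral worth could feed into the MPP normalization in the specification above. The groups and numbers are hypothetical placeholders restricted to three groups; the actual rule sums over every definable group.

```python
# Toy sketch of the MPP rule: each group's share of resources is
# proportional to salience * moral_worth, normalized so the shares of
# all groups sum to 100%. Groups and numbers are hypothetical
# placeholders, not outputs of any real extrapolation procedure.

groups = {
    # name: (salience, moral worth)
    "humans":         (1e-12, 1.0),
    "uplifted squid": (1e-13, 0.8),
    "comet ice":      (1e-14, 0.0),
}

weights = {name: s * w for name, (s, w) in groups.items()}
total = sum(weights.values())

mpp = {name: w / total for name, w in weights.items()}

for name, share in mpp.items():
    print(f"{name}: {share:.1%} of resources")

# "comet ice" gets 0% because its moral worth is 0; the other groups
# split the remainder in proportion to salience * moral_worth.
```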

The resource constraint is intended to make autonomous life possible for things that aren’t interested in the AI’s help.

The RVIC is intended to prevent the AI from pressuring people to change their values to easier-to-satisfy values.

Assuming the moral community’s subgroups do not generally wish for a dramatic loss of their share of the universe, the AI should not create and indeed should prevent the creation of easy-to-satisfy HappyBots that subsequently dominate the moral community and tile the universe with themselves. But just to be sure, the RNPC was added to the list of constraints. It would not prevent people from having kids, uplifting animals, or designing new intelligent life themselves, or even from getting the AI to help them do so, so long as the AI only intended to help them and not to do the creating. (Is this the double-effect principle?) The RNPC may also prevent some forms of thoughtcrime.

The NEC is obviously intended to prevent negative externalities. Defining negative externalities requires a notion of property, which here is each group’s share of the resources controlled by the AI. (Their regular property is probably covered by the resource constraint.)

The general constraint is intended to safeguard against rogue behavior that I didn’t foresee.

Since even one absolute prohibition could prevent the AI from doing anything at all (every available action would carry some small probability of violating it), the interpretation clause is intended to safeguard all the constraints by putting their interpretation beyond the AI’s influence. It’s a bit like Chesterton’s “democracy of the dead”. Extending the interpretational guidance to the rest of the text allows for moral growth and unforeseeable social changes, analogously to “living constitutionalism” (h/t Irenist).

HAL

2 thoughts on “Constrained Universal Altruism”

  1. I’m reminded of two things by your “interpretation clause” and its extrapolation from historic humanity that might be of passing interest:

    1. The roboticist Hans Moravec proposed in one of his books that uploads and AIs would be able to rewrite their own code, but would want to maintain a stable core identity. He proposed an analogy: software minds would change their “laws,” but not their “constitutions.” It sounds like, in good survive/thrive axis conservative fashion, you’re proposing a Scalia-style “strict constructionism” instead of a Breyer-style “living constitutionalism” for the interpretation of what Moravec would call the AI’s constitution.

    2. In the first book of his Golden Age trilogy, John C. Wright imagines a kind of Chestertonian democracy of the dead in which a parliament or a jury (or whatever it was, I can’t recall) gives substantial representation to AIs extrapolating Socrates and other historical worthies. If Wright’s version of democracy of the dead were applied to the AI’s internal deliberations, rather than to political deliberation, that sounds kinda like what you’re proposing there?

    • (2) is indeed similar, especially when you consider that most regular folk would be following the historical worthies in how they interpret a text rather than inventing a new approach.

      (1) is also a correct understanding, and I take it to highlight a critical failure mode of CUA as originally proposed. (No doubt there are many ways it could fail.) I neglected to say how the rest of the text besides the constraints should be interpreted. It is clearly imprudent to restrict its interpretation to a “strict constructionism” analogue because in a sufficiently long future it is essentially guaranteed that situations will arise for which the historical worthies would lack adequate context and be unable to make a sensible decision. Since CUA has no way to make amendments, this would predictably mean disaster.

      A simple switch to a “living constitutionalism” is equally unacceptable because it leaves open the possibility of the AI manipulating the way in which people would interpret the text. (This is vaguely analogous to how the USG’s history of ignoring the 10th amendment has become the basis on which people interpret the 10th amendment as not having any real application.) However, now that I recognize this possibility, and since the AI is assumed to follow the instructions without interference from any other desires (since it would have no other desires), a new constraint can be added that I think better expresses the desirable kind of interpretation.

      Thanks for the food for thought! I’ve modified the post, adding the RIIC.
