The Iterated Prisoner’s Dilemma and The Evolution of Cooperation


Let's say 2 people have 2 options, and which option they each pick determines how much of something they will each get from one another. The numbers represent something that they want, like money. Or points! Everybody likes points; they want as many points as they can get.
What makes the prisoner's dilemma the prisoner's dilemma is really the ordering of the numbers: the payout for defecting against a cooperator is bigger than the payout for mutual cooperation, which is bigger than the payout for mutual defection, which is bigger than the payout for cooperating against a defector. In a pattern like this. They can pick A or B, but they have no control over whether their opponent will pick A or B.
So when faced with their choices, they see that option B always gets them more. And it's the same for the other player: going option B is always better. But because of the way it's set up, both going option B is the worst for the group, and really not one of the better situations for the individual. Going option A is worse for the individual but better for the other person, and best for the group, so we might call it sharing or cooperating or whatever, depending on the situation. It looks like working together to get more together. Going option B is always better for the individual but worse for the other person, so it looks like defecting, cheating, or betraying, depending on the situation. They're aiming for personal gain.
If they’re only going to play once, a player will always get more by defecting.
But if they are going to interact with somebody multiple times; if they play once, then they
play again and again, and we add up their scores, the strategy changes.
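Concretely, here is a minimal sketch of the one-shot game in Python, using the per-round payoffs of 3, 1, 5, and 0 that are implied by the tournament totals quoted later in the video (the dictionary and function names are mine):

```python
# One-shot payoffs: (my points, their points).
PAYOFF = {
    ("C", "C"): (3, 3),  # both cooperate (option A)
    ("C", "D"): (0, 5),  # I cooperate, they defect
    ("D", "C"): (5, 0),  # I defect, they cooperate
    ("D", "D"): (1, 1),  # both defect (option B)
}

def my_points(me, them):
    return PAYOFF[(me, them)][0]

# Whatever the opponent does, defecting pays me strictly more...
for them in ("C", "D"):
    assert my_points("D", them) > my_points("C", them)

# ...and yet mutual defection is worse for both than mutual cooperation.
assert my_points("D", "D") < my_points("C", "C")
```

The dilemma pattern is exactly those two assertions holding at once: defection dominates each single round, but mutual defection is worse than mutual cooperation.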
In one-off games, defecting gives a higher payout, no matter what the other person is doing.
And with multiple games, if the opponent cooperates and defects at random, or follows a set pattern,
always defecting still gives the best payout. Try always cooperating with them? No. Start
off cooperating then defect? No. Try to line up cooperation and defection? Doesn’t matter.
Always defecting is better. Because here, like in a one-off game, defecting
has no consequences, and it always gives a higher payout.
But what if a player's pick changes depending on what the other player did? For example: this player starts off cooperating and always cooperates, unless its opponent defects. Then it switches to defecting and defects no matter what. You know, it sort of gets pissed off. We'll call it GRUDGER. Any strategy that started off cooperating with GRUDGER, or always cooperated with it, would have gotten a higher score than ALWAYS DEFECT.
Because here defecting can have consequences. With multiple games there is an opportunity
to influence the other player for future games. ALWAYS DEFECT isn’t the best strategy anymore.
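To see how defection picks up consequences once a strategy reacts, here is a hedged sketch (the function names and the 3/1/5/0 payoffs are mine, inferred from the tournament totals quoted later) that plays GRUDGER against ALWAYS DEFECT and against a pure cooperator over 200 rounds:

```python
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def grudger(my_hist, their_hist):
    # Cooperate until the opponent ever defects, then defect forever.
    return "D" if "D" in their_hist else "C"

def always_defect(my_hist, their_hist):
    return "D"

def always_cooperate(my_hist, their_hist):
    return "C"

def play(strat_a, strat_b, rounds=200):
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        a, b = strat_a(hist_a, hist_b), strat_b(hist_b, hist_a)
        pa, pb = PAYOFF[(a, b)]
        hist_a.append(a); hist_b.append(b)
        score_a += pa; score_b += pb
    return score_a, score_b

# Against GRUDGER, a cooperator earns far more than ALWAYS DEFECT does:
print(play(always_cooperate, grudger))  # (600, 600)
print(play(always_defect, grudger))     # (204, 199)
```

ALWAYS DEFECT grabs one temptation payout and then gets stonewalled for the remaining 199 rounds.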
What is? In 1980 Robert Axelrod held a tournament where
anyone could submit a strategy. Each strategy did 200 rounds against each other strategy.
There were 14 strategies submitted, plus the strategy 50/50 RANDOM.
These were the payoffs, so if two strategies cooperated with each other for all 200 rounds, they would each get 600. If they both defected, they would both get 200. If one cooperated and the other defected the whole time, one would get 0 and the other 1000, the highest and lowest possible scores. If they went back and forth, like this, they would each get 500.
And these are the averaged results of the tournament. The winner was a strategy called TIT FOR TAT.
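Those totals can be checked with quick arithmetic (the per-round payoffs of 3, 1, 5, and 0 are implied by the totals themselves):

```python
ROUNDS = 200

# Per-round payoffs implied by the quoted totals:
R, P, T, S = 3, 1, 5, 0  # reward, punishment, temptation, sucker's payout

assert R * ROUNDS == 600                       # mutual cooperation all game
assert P * ROUNDS == 200                       # mutual defection all game
assert (T * ROUNDS, S * ROUNDS) == (1000, 0)   # one-sided the whole time
# Back and forth: each side gets the temptation in half the rounds
# and the sucker's payout in the other half.
assert T * (ROUNDS // 2) + S * (ROUNDS // 2) == 500
```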
TIT FOR TAT cooperates on the first round and from then on it just copies what the other
person did last round.
Why did it win? It's a simple strategy, but one thing it does is reciprocate quickly against defectors. Any strategy that tries to take advantage of TIT FOR TAT gets instantly punished and put into a bad situation. So if the other strategy keeps defecting, or even if it tries to go back to cooperating, it will have gained less than it would have if it had just kept cooperating with TIT FOR TAT. And if TIT FOR TAT didn't punish, TIT FOR TAT would have been worse off.
So we might say TIT FOR TAT is "retaliating": it punishes defection. Which is good, because it can prevent some losses, and it can disincentivize an opponent from defecting.
Another thing is, TIT FOR TAT is never the first to defect. Players would want to be in this situation as much as they can. There is a temptation to defect when the other person is cooperating, but any responsive opponent would quickly defect back, and they would both end up here. It's risky to defect. An easy solution is to ignore that temptation and just try to maximize long-term mutual cooperation: start off cooperating, and then never defect unless you need to punish someone.
It can end up giving better gains, especially with opponents that do it too.
A strategy that never defects first kind of looks like it's being nice, so we could say TIT FOR TAT is a nice strategy. And that seemed to be a good trait to have in this tournament: the top 8 strategies were nice, and the bottom 7 were not.
The least successful nice strategy was GRUDGER. It was never the first to defect, but once the other player defected it never cooperated again, no matter what. Which doesn't really give a great payout. TIT FOR TAT allows cooperation again if the other person wants to cooperate, so they can both cooperate going forwards. The amount of punishment GRUDGER gives hurts the punisher alongside the punished.
OK, so we could say TIT FOR TAT is forgiving: it gives a quick punishment, then allows for mutual cooperation again. And since it's just copying, if the other strategy keeps defecting, so does TIT FOR TAT.
These seemed to be the traits that made tit for tat so good in this tournament.
It's nice: it's not tempted by this risky option. It's retaliating: it disincentivizes being taken advantage of. And it's forgiving: it will allow getting back to cooperation.
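TIT FOR TAT itself needs almost no code. A minimal sketch (the helper names are mine; the 3/1/5/0 payoffs are the tournament's):

```python
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def tit_for_tat(my_hist, their_hist):
    # Cooperate on the first round, then copy the opponent's last move.
    return their_hist[-1] if their_hist else "C"

def always_defect(my_hist, their_hist):
    return "D"

def play(strat_a, strat_b, rounds=200):
    ha, hb, sa, sb = [], [], 0, 0
    for _ in range(rounds):
        a, b = strat_a(ha, hb), strat_b(hb, ha)
        pa, pb = PAYOFF[(a, b)]
        ha.append(a); hb.append(b)
        sa += pa; sb += pb
    return sa, sb

print(play(tit_for_tat, tit_for_tat))    # (600, 600): full mutual cooperation
print(play(tit_for_tat, always_defect))  # (199, 204): it loses the pairing,
                                         # but only by the one opening round
```

Note that the copying rule is the entire strategy; everything the video attributes to it (nice, retaliating, forgiving) falls out of that one line.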
Because TIT FOR TAT is just copying, it can't ever beat an opponent. It can either tie, or lose. Really it just depends on whether the opponent defects in the very last round, where TIT FOR TAT can't reciprocate.
The opposite is true for ALWAYS DEFECT: it can only tie, or win if the opponent ever tries cooperating. But that doesn't matter. Which strategy wins the tournament is about how many points it got in total, not its relative score against any given opponent.
But TIT FOR TAT can run into problems.
JOSS is a strategy that's basically TIT FOR TAT, but sometimes it tries defecting. Against regular TIT FOR TAT, they would go: hey, you cheated. Hey, YOU cheated. Hey, you cheated. Back and forth, a sort of defection echo. Then, when JOSS tries defecting again, it becomes all defection.
There were a few strategies that could have won this tournament if they had been entered. One was called FORGIVING TIT FOR TAT, or TIT FOR TWO TATS. This strategy requires 2 defections in a row before it retaliates. It would have prevented the echo effects that hurt regular TIT FOR TAT, and won the tournament.
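The echo is easy to reproduce. In this sketch, a deterministic stand-in of my own for JOSS (the real JOSS defected at random roughly 10% of the time) sneaks defections at rounds 5 and 12, against both plain TIT FOR TAT and TIT FOR TWO TATS:

```python
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def tit_for_tat(my_hist, their_hist):
    return their_hist[-1] if their_hist else "C"

def tit_for_two_tats(my_hist, their_hist):
    # Retaliate only after two opponent defections in a row.
    return "D" if their_hist[-2:] == ["D", "D"] else "C"

def joss_like(my_hist, their_hist):
    # Tit for tat, but sneak a defection at fixed rounds 5 and 12.
    if len(my_hist) in (5, 12):
        return "D"
    return their_hist[-1] if their_hist else "C"

def play(strat_a, strat_b, rounds=20):
    ha, hb, sa, sb = [], [], 0, 0
    for _ in range(rounds):
        a, b = strat_a(ha, hb), strat_b(hb, ha)
        pa, pb = PAYOFF[(a, b)]
        ha.append(a); hb.append(b)
        sa += pa; sb += pb
    return sa, sb, ha, hb

# Against TIT FOR TAT: one sneak starts a D/C echo, the second locks in
# mutual defection for the rest of the match.
print(play(tit_for_tat, joss_like)[:2])       # (38, 43)
# Against TIT FOR TWO TATS: both sneaks are absorbed, cooperation resumes.
print(play(tit_for_two_tats, joss_like)[:2])  # (54, 64)
```

Out of a possible 60 points over 20 mutually cooperative rounds, the echo drags both players down near 40, while the forgiving pairing stays close to full cooperation.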
It gained more by preventing scenarios like this than it lost by letting itself be taken advantage of once in a while.
Most strategies that tried to improve on TIT FOR TAT tried to do so by being less nice, trying to find a way to capitalize on defection. Instead, the opposite was the case: not even punishing every defection ended up being better in the long run. At least in this environment.
Later, Axelrod held a second tournament. This time there were a lot more entries, and they didn't do a set 200 rounds, so that nobody would know when the interaction would end. See the footnotes below.
In this tournament, even though FORGIVING TIT FOR TAT was entered, regular TIT FOR TAT won again. FORGIVING TIT FOR TAT didn't win this time, because people knew about it.
A strategy called TESTER starts out cooperating, but tries defecting like this to see how the other player reacts. If the opponent punishes, it cooperates to apologize and prevent echo defections, then just plays TIT FOR TAT for the rest of the time. So this is what it would look like against TIT FOR TAT. But against easygoing strategies like FORGIVING TIT FOR TAT, it can learn that it's able to take advantage of them. FORGIVING TIT FOR TAT proved too forgiving. At least in this context.
Let's change the rules a bit. Let's say we're in a reproduction situation. These points aren't just points; they represent resources that could be used for reproduction. If a strategy gets lots of points, like TIT FOR TAT, it will reproduce more, and we'll put more copies of it into the next generation, the next tournament. If it gets fewer points, like 50/50 RANDOM, we'll put fewer copies into the next generation.
TIT FOR TAT and other successful strategies reproduced well and followed upward arcs like these. The not-so-successful strategies followed downward trends and went extinct. Exploitative strategies like HARRINGTON did well at the start, but as their victims went extinct, their populations declined as well. The really successful strategies were ones that could work well with other successful strategies: basically, nice or otherwise cooperative strategies. They supported one another and were able to continue to reproduce.
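The generational setup can be sketched as a simple replicator loop. This is my own miniature version with just three strategies, not Axelrod's actual ecological analysis; the pairwise scores are per 10-round match, computed from the 3/1/5/0 payoffs:

```python
# What the row strategy earns against the column strategy over 10 rounds.
SCORES = {
    "ALL_C": {"ALL_C": 30, "ALL_D": 0,  "TFT": 30},
    "ALL_D": {"ALL_C": 50, "ALL_D": 10, "TFT": 14},
    "TFT":   {"ALL_C": 30, "ALL_D": 9,  "TFT": 30},
}

shares = {"ALL_C": 1 / 3, "ALL_D": 1 / 3, "TFT": 1 / 3}

for _ in range(200):
    # Each strategy's fitness is its average score against the population.
    fitness = {s: sum(shares[t] * SCORES[s][t] for t in shares)
               for s in shares}
    mean = sum(shares[s] * fitness[s] for s in shares)
    # Next generation: shares grow in proportion to fitness.
    shares = {s: shares[s] * fitness[s] / mean for s in shares}

print({s: round(x, 3) for s, x in shares.items()})
# ALL_D rises at first by exploiting the cooperators, then starves as
# they dwindle; TFT ends with by far the largest share, with a remnant
# of pure cooperators surviving in its shadow.
```

This mirrors the HARRINGTON story above: the exploiter's early boom is exactly what destroys its food supply.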
OK, now let's imagine another situation like this, but it's a world of ALWAYS DEFECTors. It's a cruel world, with otherwise the same rules. Can a "nice" mutation establish itself? Can something like TIT FOR TAT invade a group of ALWAYS DEFECTors? If it was just one individual, maybe not so well. As a nice strategy, it's constantly getting taken advantage of by the native defectors. The natives get better scores with one another than TIT FOR TAT gets with them. TIT FOR TAT has nobody to cooperate with, and just comes away as the worst reproducer.
But if there were a couple of TIT FOR TATs, then they could gain more from one another than they lose to the defectors, like they do in the tournaments. And eventually they would end up taking over.
And once established, it would be really hard for a non-nice strategy to invade TIT FOR
TAT. Because TIT FOR TAT is retaliating, any non-nice strategy is going to get a lower
score with a TIT FOR TAT than TIT FOR TATs get with other TIT FOR TATs. Here the non-nice
strategies would get the lowest scores and be the worst reproducers.
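How big does the cluster of TIT FOR TATs need to be? A back-of-the-envelope sketch, treating the population as large and using the 200-round match scores that follow from the payoffs above (TIT FOR TAT earns 600 against itself and 199 against ALWAYS DEFECT; ALWAYS DEFECT earns 204 against TIT FOR TAT and 200 against itself):

```python
# 200-round match scores between the two strategies.
TFT_VS_TFT, TFT_VS_ALLD = 600, 199
ALLD_VS_TFT, ALLD_VS_ALLD = 204, 200

def tft_payout(p):
    # Average score of a TIT FOR TAT when a fraction p of everyone is TFT.
    return p * TFT_VS_TFT + (1 - p) * TFT_VS_ALLD

def alld_payout(p):
    return p * ALLD_VS_TFT + (1 - p) * ALLD_VS_ALLD

# A lone TIT FOR TAT (p close to 0) is the worst reproducer...
assert tft_payout(0.0) < alld_payout(0.0)
# ...but even a 1% cluster already out-earns the native defectors.
assert tft_payout(0.01) > alld_payout(0.01)
```

Solving `199 + 401p > 200 + 4p` gives a break-even fraction of `1/397`, about a quarter of a percent: a strikingly small foothold.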
Anyway you can play around with these models all day.
Like: what if there were random mistakes? Sometimes nice strategies accidentally defect, or look like they defect. Then there may be lots of defection-echo problems, and variations on FORGIVING TIT FOR TAT would dominate.
Or what if the players were able to learn and change their strategy? Then you might see, for example, cooperation spreading through a bunch of defectors as they learn they can get more from it.
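The random-mistakes variation is easy to demo. In this sketch the "mistake" is a single forced defection at a round of my choosing: one slip sends two TIT FOR TATs into a permanent echo, while two TIT FOR TWO TATS shrug it off:

```python
def tit_for_tat(my_hist, their_hist):
    return their_hist[-1] if their_hist else "C"

def tit_for_two_tats(my_hist, their_hist):
    return "D" if their_hist[-2:] == ["D", "D"] else "C"

def play_with_mistake(strategy, rounds=20, mistake_round=5):
    # Both players use the same strategy; player A slips once.
    ha, hb = [], []
    for r in range(rounds):
        a = strategy(ha, hb)
        b = strategy(hb, ha)
        if r == mistake_round:
            a = "D"  # player A's hand slips
        ha.append(a); hb.append(b)
    return ha, hb

ha, hb = play_with_mistake(tit_for_tat)
print(ha[6:10], hb[6:10])  # ['C','D','C','D'] vs ['D','C','D','C']: echo forever
ha, hb = play_with_mistake(tit_for_two_tats)
print(ha[6:10], hb[6:10])  # all 'C': the slip is absorbed in one round
```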
And so on. But the point is: for purely self-interested players, like reproducing cells, there is more to be gained by being cooperative, being nice and forgiving, if also retaliating.
And this would be beside an "inclusive fitness, help them because they carry the same genes" sort of thing.
TIT FOR TAT does quite well in these model reproduction situations: it can invade other strategies, and it's difficult for other strategies to invade it. But the way TIT FOR TAT works, it's not factoring in a larger reproduction game or how much it's gaining. It has no foresight and almost no memory. It just reacts to specific situations.
So IF situations LIKE these were some part of a cell's history; if any cells that survived the gauntlet of time to still exist today did so at least partially in prisoner's-dilemma-like situations;
Then, like TIT FOR TAT, they don't need to think about themselves or reproduction to be reproductively successful. They could just learn, or have an instinct, to be kind, to forgive, to feel cheated and want to retaliate. They could even just go "I'm going to reciprocate whatever they do, bur bur bur". Those actions are where the reproductive success comes from. They don't necessarily have to be nice only as part of some sort of selfish plan or selfish viewpoint.
Video's over now. Oh, one more thing: ThisPlace was brought to you today by the letter G. For 10% off your first order of the letter G, enter promo code thisplace at checkout.

About the Author: Sam Caldwell

100 Comments

  1. Using tit for tat in school in rock paper scissors is so fun it just destroys them, as long as we play like 10+ games

  2. This is an interesting way to explain why socialism fails. It punishes the consensual co-op relationships with forced defection. The more socialism is pushed into the economy the faster it kills said economy.

  3. So, in theory, if we're going for the highest chance to win, isn't always defecting the best strategy? Because on the first round, if the opponent co-operates, we get more points, and even if they defect later, we still end up with more points.

  4. what if you gave the strategies foresight like tit for tat can see what the player will do next and reciprocate that it would retaliate faster and cooperate faster of course foresight is not always easy to get in real life

  5. How about one that always starts off nasty but when the other shows kindness, it gives it back in the next round so it is always either drawing or winning.

  6. 1:00
    No if they play once and both defect the individual would still be worse off than if they both cooperated.
    1:15
    That is not at random. This uses cooperated in a plurality of situations which is the one situation in which defecting is worth it…

  7. "The evolution of trust game"
    Step1: go there
    Step 2: be surprised about how stolen this video is
    Step 3: keep playing because it is fun to do.

  8. There are 2 Nash equilibria for the prisoner dilemma. Watch my video for an explanation!

    https://youtu.be/8QFsu–iC3Q

  9. This dumps a good part of religious and moral philosophy. Thanks for making it as simple as possible but not simpler.

  10. Wouldn't something that beat Tit for Tat be exactly like it, except that it would defect on the last turn?

  11. yes but all this ignores a crucial fact. This is not a game where score counts, this is going to prison. The difference between not going to prison at all and going for one year is many many many times larger than the difference between going 1 year vs 2 years. and the cost of cooperating will always be unacceptably high. The only route to an acceptable outcome (zero prison time) is screwing the other person and never getting screwed back. The risk of possibly receiving a slightly worse outcome overall is worth the opportunity to play for the only good outcome.

    Consider this variation. You play the game every month. If you both cooperate you each get $3 if you both rat out the other you both get $1, if you rat out the other and the other cooperates you get $5 and the other person gets nothing. Oh and it costs $5 a month to feed your family, anything less and you get to watch them starve to death. Cooperating Is only useful to you if the other person's prosperity costs you nothing and is of benefit to you.

  12. This is not how life works. This experiment is a game of clinical psychology. It's played in a controlled clinical environment. This is not science but a game to confirm Axelrod's belief that cooperators win over competitors. In real life, competitors (and manipulators) win because they manipulate the environment.

  13. Nicky Case has cool simulations for the prisoner's dilemma https://ncase.me/trust/. Shows when each strategy comes into its own based on repeated interactions, mistakes, etc…

  14. humans are selfish, we will always take advantage of each other for personal gains. Cooperation for cooperation's sake doesn't exist. everyone has an ulterior motive. this is also why systems like anarcho-capitalism work and communism doesn't. communism relies on the goodness of peoples hearts but forgets that humans are corrupt and crave power and influence. anarcho-capitalism takes advantage of that, by establishing a competitive environment where competition is key, but cooperation often takes place. "you scratch my back i'll scratch yours" type of system. anarcho-capitalism is not every man for himself it's every man for themselves with the help and support of others. because again, humans are competitive creatures but tend to cooperate for personal gain.

  15. I remember hearing of this in a "Social Darwinism" context. People always assume that the most complex ones are the best adapted.
    But this is an actual test where one of the least complex tactics (just copy the other side's last move) won. It was way less complex than any other strategy entered (except maybe "always play nice") but still won out.

  16. I don't know if you realize it but that's PROOF OF WHY EMOTIONS EXIST. They are body manipulators to make you follow a certain strategy to reproduce better.

  17. So this is how humans developed their innate sense of justice. Nothing intellectual, just a survival adaptation. Fascinating.

  18. hmm, I'm interested in what would happen if you created a more dynamic Dilemma. like if you have 3 or 5 possible options instead of just 2.

  19. example of my strategy, o=coop and x=vs
    my strategy is the opposite of tit for tat
    me: o o x x o o x x =tie
    me: 0 3 5 1 0 3 5 1
    op: x o o x x o o =tie
    op: 5 3 0 1 5 3 0 1

  20. There is a very good interactive game called The Evolution of Trust which shows basically all of this stuff but in a game. It's really cool!

  21. Couldn't tit for tat be improved by also ending with a defect, even if the opponent cooperated last round?

  22. I like how this video kinda explains why many animals have some kind of moral (many experiments have shown that humans aren't the only animals with a sense of morality)

  23. You know, it's always nice to have your worldview — that cooperation is better than competition — validated statistically.

  24. Dude, my professor showed this to us in class and then I found the rest of your stuff on this channel, def. earned my sub.

  25. i would like to see what happend when in last one round i will betray… Then optimal situation would be to betray in last 2 rounds if betray in last one would be normal… how would look like stable population?

  26. One variant of the generational game to look at:
    https://codegolf.stackexchange.com/q/122118/47990
    A little backstory: because the number of iterations was known (200) there started a chain of backstabbing in the final rounds (it got silly) and N tits for a tat (defect N times every time its defected against, which got up to N=54). The tweak that this particular sim had was that every generation each bot would start with 10% of the points it had accumulated during the previous generation (as well as being more populous in the pool).

    The winning bot (mine) couldn't win against one of its competitors in a straight up pairing (namely the T4T and Backstab variants; could only tie), so it feeds itself to defector variants in order to boost the defector population which CAN out-perform T4T and Backstab (by starting with a point lead from the previous round and then resulting in a low-point draw). During later generations Perfect Gentleman returns and out-competes the survivors.

    Literally wins 95% of the simulations with 100% victory. The remaining 5% is when every PG in the first generation pairs off against a defector and goes extinct.

  27. This all assumes defecting is only 5/3's more points than cooperating. What happens when the reward for defecting is vastly higher, like say it was worth 10 points?

  28. I highly recommend everyone playing "The Evolution of Trust" free online flash game by Nicky Case. It's basically an interactive version of this video that's easier to understand and where you can change some rules to see how it changes which strategies survive.

  29. tit-for-tat plus the-one-who-laughs-last-laughs-best. Regular tit-for-tat + last round always defect. Against regular tit-for-tat they will always win. Against themselves they would do worse than only tit-for-tat. Meaning that there is a place for stabbing someone in the back in a society with both versions, especially when this backstabbing version is limited.

    Guess what we're seeing in business, politics and the like? The BEST things, power and positions go to backstabbers. And they're always a small group.

  30. What about the most successful strategy so far discovered? Win-Stay, Lose-Switch, also known as Pavlov?

    This strategy behaves similarly to Tit-for-Tat, but CAN exploit players who are too nice. It has its own downfalls, most notably that it will perform more poorly against defectors, but against many other strategies, it's quite effective.

    The strategy goes like this: Start by cooperating. If your opponent cooperates, do what you did last turn. If your opponent defects, do the opposite of what you did in the last round.

    In scenarios where the strategies sometimes make mistakes, Pavlov really excels. Against Tit-for-Tat, it will mostly just cooperate, but if one of them ends up accidentally defecting, Pavlov will forgive, and they'll go back to cooperating with one another. Two Tit-For-Tats will end up in a punishment loop. Against Cooperators or Forgiving Tit-For-Tat, Pavlov can exploit their forgiving nature to get extra points. In the case of Cooperators, if either side defects, Pavlov will switch from Cooperating to Defecting, and then continue defecting. In the case of Forgiving Tit-For-Tat, Pavlov can end up exploiting the forgiveness for extra points any time either of them randomly defects. Against other Pavlovs, Pavlov also does well, because the two strategies will forgive one another after a mutual defection and then go on cooperating. Grudger would cause it problems in the event that one of them defected accidentally, in which case Grudger would exploit the forgiveness of Pavlov for extra points.

    In populations where Pavlov is not alone against Defectors, Pavlov can actually do extremely well. While it won't be as effective as Grudger or Tit-For-Tat at defending against Defectors, it will be able to get lots of points from interactions with nice strategies, and can even exploit strategies that are slow to retaliate. Even though Pavlov matches up worse against Grudger than Tit-For-Tat, Grudger will tend to get lower scores against any strategy that retaliates, and will thus tend to lose out in the long run.

    I can think of a few other strategies that could be specialized to defeat Pavlov, but the ideas I have will always result in lower total scores than Pavlov vs Pavlov.
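The win-stay, lose-shift rule described in this comment can be sketched directly (my implementation of the commenter's description, with a forced slip at a round of my choosing):

```python
def pavlov(my_hist, their_hist):
    if not my_hist:
        return "C"
    if their_hist[-1] == "C":
        return my_hist[-1]                     # win: stay with last move
    return "D" if my_hist[-1] == "C" else "C"  # lose: switch moves

def always_cooperate(my_hist, their_hist):
    return "C"

def run(strat_a, strat_b, rounds=12, mistake_round=4):
    ha, hb = [], []
    for r in range(rounds):
        a, b = strat_a(ha, hb), strat_b(hb, ha)
        if r == mistake_round:
            a = "D"  # an accidental defection by player A
        ha.append(a); hb.append(b)
    return ha, hb

# Two Pavlovs: one slip causes a single round of mutual defection,
# then both switch back to cooperating.
ha, hb = run(pavlov, pavlov)
print(ha[4:8], hb[4:8])  # ['D','D','C','C'] vs ['C','D','C','C']

# Against a pure cooperator, the slip is "rewarded", so Pavlov keeps
# defecting: the exploitation of overly nice strategies described above.
ha, hb = run(pavlov, always_cooperate)
print(ha[5:])            # all 'D' from the slip onward
```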

  31. Don't forget about 'Detective'

    First 4 rounds: cooperate, cheat, cooperate, cooperate
    If cheat, play 'Tit for tat'
    If never cheat, play 'Always cheat'

  32. it seems obvious cooperation would always beat the alternative. The key I took away is: don't hold a grudge, an eye for an eye (or tit for tat), and then forgive.
