Antoine Comeau's blog

Tuesday, 29 July 2025

Game Theory: Bonus Points Exam Question

Here's a question I came up with that you can ask your students if you are feeling ruthless.

The Question

(Bonus Question) How many bonus points do you want? Only those who choose the least popular answer will be awarded their point(s). In case of equality, the lower number prevails.

a) 1

b) 2

c) 3

Discussion

Let's start by analyzing a simpler version where we remove option c) and we assume there are only three people taking the exam.

Let p denote the probability that a student will pick choice a) and let q denote the probability that a student will pick choice b). Since the rules are the same for every student, we may assume every student will use the same p and q.

We now take the point of view of a single student trying to gain an advantage. If they choose a), their expected value will be q^2. If they choose b), their expected value will be 2p^2.

The goal of the students is to aim for q^2=2p^2, because this way for any student it doesn't matter what choice they make. Using q=1-p, we solve the quadratic to get p=sqrt(2)-1 ≈ 0.41421.

Again from the point of view of a single student, if the other students are using these p and q values, then their expected value is the same for any strategy they might choose. So they may as well use these p and q values to participate in the defensive strategy.

This gives an expected value of 6 - 4*sqrt(2) ≈ 0.34315 for everyone.

But what if the students want to cooperate to maximize the total expected value of everyone. Then, they want to aim for choice b) to be the least popular answer with the maximum number of people possible making this choice (only one in the case of three people). This implies that most people will have to make a sacrifice and choose choice a) for the greater good. Let's now compute what p and q should be in this setting.

The total expected value we aim to maximize is given by 3pq^2+6p^2q. We get this expression by summing over all the possible scenarios their probability of happening times the total number of points the students are getting. In this case, we get p = 1/sqrt(3) ≈ 0.57735. The total expected value of the whole class is then 2/sqrt(3) ≈ 1.1547. In contrast, the non-cooperative strategy gives 18-12*sqrt(2) ≈ 1.0294.

Interestingly, in the cooperative strategy, the individual expected value is 2*sqrt(3)/9 ≈ 0.3849 before rolling the dice which is greater than 0.34315. After rolling the dice, if a student ends up having to pick a), their expected value drops to (4-2*sqrt(3))/3 ≈ 0.17863, and if they have to pick b), their expected value go to 2/3 ≈ 0.66667.

Now, somebody that needs to pick a) might decide to pick b) instead for individualistic gain, since it will increase their expected value if everyone else is cooperating. This is why this strategy is only viable if there is trust between the students. Even if a few students decide to be selfish, the cooperative strategy can still be overall better than the non-cooperative strategy. Funnily enough, if everyone decides to be selfish and always pick b) while thinking other people are going to cooperate then their expected value drops to zero! This means there is a breaking point where if there are too many people trying to profit from the collective, then they all would be better under the non-cooperative strategy. A reason the students might want to cooperate could be if they get a gift if their average beats another group of students for example.

Of course, the ideal way to cooperate would be if they are allowed to talk and all agree on who are going to make the sacrifice before taking the exam, then the collective expected value would be 2 and a single person would get the bonus points.

As the number of students increases to infinity, p tends to 1/2. To see this, let p=1/2-epsilon for some fixed epsilon>0. Then, once the number of student is large enough, choice a) will become the least popular answer with probability close to 1. If p=1/2+epsilon, choice b) will become the last popular answer with probability close to 1. Since an individual student isn't supposed to gain any advantage, the choice of p must be in 1/2-epsilon and 1/2+epsilon. The expected value for a student will then tend to 3/4 since the chance that a) or b) is the least popular answer will get close to 1/2 for each. Please note this is not a proof, but merely a heuristic. Also note that p equals exactly 1/2 is never an answer for any number of students, since then someone choosing b) 100% of the time will get expected value close to 1 if everyone else is using p=1/2 (when the number of student is large enough, one's vote becomes almost meaningless).

Let's move on by adding choice c) to the options and we consider a class of five students. We denote by r the probability of a student picking choice c). Notice that r=1-p-q.

The idea to solve the non-cooperative setting is essentially the same. A student picking choice a) has expected value q^4+4q^3r+6q^2r^2+4qr^3+r^4. For choice b), 2*(4p^3r+6p^2r^2) and for choice c), 3*(6p^2q^2).

We get two solutions for this set of equations from WolframAlpha. First is p = 1; q = 0; r = 0. Other one is p ≈ 0.315955; q ≈ 0.349065; r ≈ 0.33498. In the first solution, everyone loses, so it does not matter what choice you make. The other options gives an expected value greater than zero (≈0.22), so it makes sense for everyone to go for this solution instead of the trivial one.

In conclusion, as we increase the number of choices, the number of variables in the polynomials increases. As we increase the number of students, the degree of the polynomials increases.

Monday, 26 January 2015

Fast generation of all permutations

This current algorithm only works for odd numbers of elements. Here's the code in Java:

 public static void main(String[] args)  
 {  
      int n=11;  
      int a,b,c,i,tmp;  
      int end=(int)Math.floor(n/2);  
      int[][] pos = new int[end+1][2];  
      int[] perm = new int[n];  
   
      for(i=0;i<n;i++) perm[i]=i;  
        
      while(true)  
      {  
           //this is where you can use the permutations (perm)  
           i=0;  
           c=n;  
             
           while(pos[i][1]==c-2 && pos[i][0]==c-1)  
           {  
                pos[i][0]=0;  
                pos[i][1]=0;  
                i++;  
                c-=2;  
           }  
             
           if(i==end) System.exit(0);  
             
           a=(pos[i][0]+1)%c+i;  
           b=pos[i][0]+i;  
             
           tmp=perm[b];  
           perm[b]=perm[a];  
           perm[a]=tmp;  
             
           if(pos[i][0]==c-1)  
           {  
                pos[i][0]=0;  
                pos[i][1]++;  
           }  
           else  
           {  
                pos[i][0]++;  
           }  
      }  
 }

It works by swapping the elements i and i+1 mod n, adding 1 to i after every swap. After doing this process n*(n-1) times, you loop back to the first permutation. This is where you apply the recursive step: ignore the first and last elements of the array and start the same process. After one swap on this smaller array you get back to the whole array and re-do the n*(n-1) swaps. Then you return to the smaller array and apply the next swap. After (n-2)*(n-3) swaps to the truncated array it will also loop back to its original order, this is why the algorithm is recursive, you must ignore the first and last elements of the smaller array to get a smaller one and apply the algorithm again.

Here's an example using 5 elements, keep in mind it only goes two levels deep in recursion.

0 1 2 3 4
1 0 2 3 4
1 2 0 3 4
1 2 3 0 4
1 2 3 4 0
0 2 3 4 1
2 0 3 4 1
2 3 0 4 1 ... etc.

You keep doing that till you loop to 0 1 2 3 4, here's the permutation before:

4 1 2 3 0

You'd permute 4 and 0, but instead you go down one level in recursion: you ignore 4 and 0, you permute 1 and 2.

4 2 1 3 0

Then you return to the whole array and do the process again:

2 4 1 3 0
2 1 4 3 0
2 1 3 4 0
... etc

And when you get to 0 2 1 3 4 the next step would be to swap 0 and 4 to get back to 4 2 1 3 0, instead of doing this we return to the smaller array of three elements. But now it's the second time we're here so we permute these elements to get:

0 2 3 1 4

And you go back to the full array and do those 19 steps again. In short, these are the positions you swap each time after you do the 19 steps (the 20th swap would get you to a permutation you already generated):

(...19 swaps...)
X X X X X
(...19 swaps...)
X X X X X
(...19 swaps...)
X X X X X
(...19 swaps...)
X X X X X
(...19 swaps...)
X X X X X

It's always the same process that you apply, ignoring two elements for each level of recursion you are in.

I will work on the algorithm later to optimize it and fix the issue for even numbers of elements. I will also work on a proof that it generates distinct permutations.

Thursday, 10 July 2014

Mapping between permutations and natural numbers

This algorithm is composed of two functions, one mapping a natural number between 0 and n!-1 to a unique permutation, and the other its inverse. Both are in O(n), you can find the code here: http://pastebin.com/0EU5v3dd.

See here (Joren's answer) for a good explanation on Lehmer's code which my algorithm is based on. The algorithm he presents is in O(n^2) because it needs to find the nth element in an array filled with holes. This means it needs to iterate over the elements in the list until it finds the nth non-empty element.

My idea was to keep a list without holes, which means finding the nth remaining element in one operation. To do this, every time you remove an element from the array, you swap the last element of the array in the position of the one you just removed. We can thus keep a list without holes doing only a few operations each time we remove an element: elems[ind]=elems[n-i-1].

You can work out the inverse function yourself it's pretty fun.

 public static int[] perm(int n, int k)  
 {  
      int i, ind, m=k;  
      int[] permuted = new int[n];  
      int[] elems = new int[n];  
      for(i=0;i<n;i++) elems[i]=i;  
      for(i=0;i<n;i++)  
      {  
           ind=m%(n-i);  
           m=m/(n-i);  
           permuted[i]=elems[ind];  
           elems[ind]=elems[n-i-1];  
      }  
      return permuted;  
 }  
 public static int inv(int[] perm)  
 {  
      int i, k=0, m=1;  
      int n=perm.length;  
      int[] pos = new int[n];  
      int[] elems = new int[n];  
      for(i=0;i<n;i++) {pos[i]=i; elems[i]=i;}  
      for(i=0;i<n-1;i++)  
      {  
           k+=m*pos[perm[i]];  
           m=m*(n-i);  
           pos[elems[n-i-1]]=pos[perm[i]];  
           elems[pos[perm[i]]]=elems[n-i-1];  
      }  
      return k;  
 }