Tuesday, 29 July 2025

Game Theory: Bonus Points Exam Question

Here's a question I came up with that you can ask your students if you are feeling ruthless.

 

The Question

(Bonus Question) How many bonus points do you want? Only those who choose the least popular answer will be awarded their point(s). In case of equality, the lower number prevails.

a) 1

b) 2

c) 3 

 

Discussion

Let's start by analyzing a simpler version where we remove option c) and we assume there are only three people taking the exam.

Let p denote the probability that a student will pick choice a) and let q denote the probability that a student will pick choice b). Since the rules are the same for every student, we may assume every student will use the same p and q.

We now take the point of view of a single student trying to gain an advantage. If they choose a), their expected value will be q^2, since a) is the least popular answer only when both other students pick b). If they choose b), their expected value will be 2p^2, since they need both other students to pick a) and the reward is two points.

The goal of the students is to aim for q^2=2p^2, because then it doesn't matter to any student which choice they make. Using q=1-p, this becomes (1-p)^2=2p^2, so 1-p=sqrt(2)*p and p=1/(1+sqrt(2))=sqrt(2)-1 ≈ 0.41421.

Again from the point of view of a single student, if the other students are using these p and q values, then their expected value is the same for any strategy they might choose. So they may as well use these p and q values to participate in the defensive strategy.

This gives an expected value of 6 - 4*sqrt(2) ≈ 0.34315 for everyone.
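Here is a minimal Python sketch to double-check the three-student numbers by brute force. It enumerates what the other two students might do and adds up the points. The helper name expected_value is made up for illustration, and the sketch assumes (as the formulas q^2 and 2p^2 do) that an option nobody picks counts as the least popular one, so nobody scores in that case.

```python
from itertools import product
from math import sqrt

POINTS = {"a": 1, "b": 2}

def expected_value(my_choice, p, n_others=2):
    """Expected points for one student who picks my_choice while each
    other student independently picks a) with probability p."""
    prob = {"a": p, "b": 1 - p}
    total = 0.0
    for others in product("ab", repeat=n_others):
        votes = {"a": 0, "b": 0}
        votes[my_choice] += 1
        for choice in others:
            votes[choice] += 1
        # min() returns the first minimum, so a tie goes to a), the lower number.
        # Assumption: an option with zero votes still counts as least popular.
        least = min("ab", key=lambda c: votes[c])
        if least == my_choice:
            weight = 1.0
            for choice in others:
                weight *= prob[choice]
            total += weight * POINTS[my_choice]
    return total

p_star = sqrt(2) - 1
ev_a = expected_value("a", p_star)
ev_b = expected_value("b", p_star)
print(p_star, ev_a, ev_b)
```

Both printed expected values should agree with each other and with 6 - 4*sqrt(2), which is what makes a student indifferent between the two choices.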

But what if the students want to cooperate to maximize the total expected value of everyone? Then they want choice b) to be the least popular answer, with as many people as possible making this choice (only one in the case of three people). This implies that most people will have to make a sacrifice and pick a) for the greater good. Let's now compute what p and q should be in this setting.

The total expected value we aim to maximize is 3pq^2+6p^2q. We get this expression by summing, over all possible scenarios, the probability of the scenario times the total number of points the students get in it: exactly one student picks a) with probability 3pq^2 (worth 1 point in total), and exactly one student picks b) with probability 3p^2q (worth 2 points). With q=1-p this simplifies to 3p-3p^3, whose derivative 3-9p^2 vanishes at p = 1/sqrt(3) ≈ 0.57735. The total expected value of the whole class is then 2/sqrt(3) ≈ 1.1547. In contrast, the non-cooperative strategy gives 18-12*sqrt(2) ≈ 1.0294.
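If you don't feel like doing the calculus, a coarse grid search gets you to the same place. This is only a sketch: the function name total_expected_value and the grid resolution are arbitrary choices for illustration.

```python
from math import sqrt

def total_expected_value(p):
    """Total expected points of a three-student class when everyone
    picks a) with probability p and b) with probability q = 1 - p."""
    q = 1 - p
    return 3 * p * q**2 + 6 * p**2 * q

# Try p = 0, 0.00001, 0.00002, ..., 1 and keep the best one.
grid = [i / 100000 for i in range(100001)]
best_p = max(grid, key=total_expected_value)
best_total = total_expected_value(best_p)

print(best_p, best_total)            # grid search
print(1 / sqrt(3), 2 / sqrt(3))      # closed form for comparison
print(total_expected_value(sqrt(2) - 1))  # non-cooperative total
```

The last line evaluates the same total at the non-cooperative p, so the two strategies can be compared directly.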

Interestingly, in the cooperative strategy, the individual expected value before rolling the dice is 2*sqrt(3)/9 ≈ 0.3849, which is greater than 0.34315. After rolling the dice, if a student ends up having to pick a), their expected value drops to (4-2*sqrt(3))/3 ≈ 0.17863, and if they have to pick b), it goes up to 2/3 ≈ 0.66667.

Now, somebody who needs to pick a) might decide to pick b) instead for individualistic gain, since it will increase their expected value if everyone else is cooperating. This is why this strategy is only viable if there is trust between the students. Even if a few students decide to be selfish, the cooperative strategy can still be better overall than the non-cooperative strategy. Funnily enough, if everyone decides to be selfish and always picks b) while thinking other people are going to cooperate, then everyone's expected value drops to zero! This means there is a breaking point: if too many people try to profit from the collective, they would all be better off under the non-cooperative strategy. One reason the students might want to cooperate is if, for example, they get a gift when their average beats that of another group of students.

Of course, the ideal way to cooperate would be for the students to talk before the exam and agree on who will make the sacrifice. Then a single person would get the bonus points and the collective expected value would be 2.
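To get a feel for that breaking point with three students, here is a small sketch that mixes cooperators and selfish students and enumerates every outcome. The function name and the way selfish students are modeled (always picking b) while the rest keep using p = 1/sqrt(3)) are assumptions made for illustration. As with the formulas earlier, an option with zero votes is assumed to count as the least popular one.

```python
from itertools import product
from math import sqrt

POINTS = {"a": 1, "b": 2}

def per_student_expected_values(n_selfish, n_students=3, p=1 / sqrt(3)):
    """Expected points of each student when the first n_selfish students
    always pick b) and the remaining ones pick a) with probability p."""
    selfish = {"a": 0.0, "b": 1.0}
    cooperative = {"a": p, "b": 1 - p}
    probs = [selfish] * n_selfish + [cooperative] * (n_students - n_selfish)
    expected = [0.0] * n_students
    for picks in product("ab", repeat=n_students):
        weight = 1.0
        for student, pick in enumerate(picks):
            weight *= probs[student][pick]
        votes = {c: picks.count(c) for c in "ab"}
        least = min("ab", key=lambda c: votes[c])  # ties go to a)
        for student, pick in enumerate(picks):
            if pick == least:
                expected[student] += weight * POINTS[pick]
    return expected

for n_selfish in range(4):
    values = per_student_expected_values(n_selfish)
    print(n_selfish, [round(v, 4) for v in values], round(sum(values), 4))
```

Each printed row shows the number of selfish students, the expected value of every student, and the class total, which can be compared with the non-cooperative total of 18-12*sqrt(2).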

As the number of students increases to infinity, p tends to 1/2. To see this, let p=1/2-epsilon for some fixed epsilon>0. Then, once the number of students is large enough, choice a) will be the least popular answer with probability close to 1. If p=1/2+epsilon, choice b) will be the least popular answer with probability close to 1. Since an individual student isn't supposed to gain any advantage, p must lie between 1/2-epsilon and 1/2+epsilon. For the same reason, the two chances cannot split evenly in the limit: a student is indifferent only if the chance that a) is the least popular answer is twice the chance that b) is, which gives 2/3 and 1/3. The expected value for a student then tends to 2/3. Please note this is not a proof, but merely a heuristic. Also note that p equal to exactly 1/2 is never an answer for any number of students, since someone choosing b) 100% of the time would then get an expected value close to 1 if everyone else is using p=1/2 (when the number of students is large enough, one's vote becomes almost meaningless).
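For the two-choice game with any number of students, the indifference point can also be found numerically. The sketch below uses plain bisection on p. The function names are made up, and it assumes the same conventions as the three-student formulas: ties go to a), and an option with zero votes counts as the least popular one.

```python
from math import comb, sqrt

def expected_values(n, p):
    """Return (EV of picking a, EV of picking b) for one of n students
    when each of the other n - 1 students picks a) with probability p."""
    q = 1 - p
    m = n - 1
    pmf = [comb(m, k) * p**k * q**(m - k) for k in range(m + 1)]
    # I pick a): a) has k + 1 votes, b) has m - k. Ties go to a).
    ev_a = sum(pmf[k] for k in range(m + 1) if k + 1 <= m - k)
    # I pick b): b) has m - k + 1 votes, a) has k. b) must be strictly smaller.
    ev_b = 2 * sum(pmf[k] for k in range(m + 1) if m - k + 1 < k)
    return ev_a, ev_b

def equilibrium(n, iterations=60):
    """Bisection on p until picking a) and picking b) are worth the same."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        ev_a, ev_b = expected_values(n, mid)
        if ev_a > ev_b:
            lo = mid  # a) is still the better pick, so p has to go up
        else:
            hi = mid
    p = (lo + hi) / 2
    return p, expected_values(n, p)[0]

for n in (3, 11, 101):
    p, value = equilibrium(n)
    print(n, round(p, 5), round(value, 5))
```

For n = 3 this reproduces p = sqrt(2)-1, and trying larger n lets you watch how p and the expected value move as the class grows.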

Let's move on by adding choice c) back to the options and considering a class of five students. We denote by r the probability of a student picking choice c). Notice that r=1-p-q.

The idea to solve the non-cooperative setting is essentially the same. A student picking choice a) has expected value q^4+4q^3r+6q^2r^2+4qr^3+r^4, which is (q+r)^4, the probability that none of the other four students picks a). A student picking choice b) has expected value 2*(4p^3r+6p^2r^2), and a student picking choice c) has expected value 3*(6p^2q^2). As before, we look for p, q and r that make these three values equal.

WolframAlpha gives two solutions for this set of equations. The first is p = 1; q = 0; r = 0. The other is p ≈ 0.315955; q ≈ 0.349065; r ≈ 0.33498. In the first solution, everyone loses, so it does not matter what choice you make. The other solution gives an expected value greater than zero (≈0.22), so it makes sense for everyone to go for it instead of the trivial one.
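A quick way to sanity-check these numbers is to plug them back into the three expressions. The sketch below does only that: it does not solve the system, and the function name expected_values is just for illustration.

```python
def expected_values(p, q, r):
    """Expected points for a student picking a), b) or c) in the
    five-student game, using the three expressions from the text."""
    ev_a = q**4 + 4 * q**3 * r + 6 * q**2 * r**2 + 4 * q * r**3 + r**4
    ev_b = 2 * (4 * p**3 * r + 6 * p**2 * r**2)
    ev_c = 3 * (6 * p**2 * q**2)
    return ev_a, ev_b, ev_c

# Non-trivial solution reported above (rounded values).
p, q, r = 0.315955, 0.349065, 0.33498
ev_a, ev_b, ev_c = expected_values(p, q, r)
print(p + q + r)
print(ev_a, ev_b, ev_c)

# Trivial solution: everyone picks a).
trivial = expected_values(1.0, 0.0, 0.0)
print(trivial)
```

The three values for the non-trivial solution should agree up to the rounding of p, q and r, and the trivial solution should give zero across the board.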

In conclusion, as we increase the number of choices, the number of variables in the polynomials increases. As we increase the number of students, the degree of the polynomials increases.
