The first game is three-person Russian roulette: three people roll a die to decide who is player A, B and C. First player A tries to shoot one of his opponents; if player B is still alive, he tries to shoot one of his opponents; then it is player C’s turn if he is still alive; after that it is player A’s turn again, and so on, until only one person is left. We assume that all shots are either lethal or a miss, and we require all players to do their best at each shot: you are not allowed to miss on purpose. If all three contestants have a 100% chance of hitting on each shot, player A will kill one of B and C, and the survivor will kill player A. If A chooses his victim randomly (to be precise: with uniform distribution), then players B and C each have a 50% chance of winning, and player A will die. However, if player B has only a 99% chance of succeeding at a shot – and his opponents know that – then player A will shoot player C, to get a 1% chance of surviving. Then player B has a 99% chance of winning. Similarly, if player C has a 99% chance of succeeding on each shot, and A and B have 100%, then player C will win with probability 99%. Finally, consider the case where player A has a 99% chance of hitting and the two others have 100%. If player A misses his first shot, he will be in the same situation as if he were player C, so from there he will win with probability 99%. If he hits with his first shot, the other player will hit him. Thus player A has a 1% · 99% = 0.99% chance of surviving. So if you have a 99% chance of hitting on each shot, both your opponents have a 100% chance, everyone knows this, and the roles are assigned uniformly at random, then you will survive with probability (0.99% + 99% + 99%)/3 = 66.33%.
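The survival chances above can be checked with a few lines of arithmetic (a quick sketch of the numbers from the text; the role names follow the post):

```python
p = 0.99                 # your hit probability; both opponents never miss

win_as_B = p             # A shoots C first; you then win iff you hit A
win_as_C = p             # A shoots B first; you then win iff you hit A
win_as_A = (1 - p) * p   # you must miss your first shot, then you are in C's spot

average = (win_as_A + win_as_B + win_as_C) / 3  # roles assigned by the die roll
print(round(100 * average, 2))  # 66.33
```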
This game has another “paradox”: if you are player A, you hope to miss your first shot. If we change the rules to allow a player to miss on purpose (and, to avoid a stalemate, let’s say that everyone dies if there are three misses in a row), the player with a 99% chance of succeeding at each shot will survive with probability 99%, while each perfect shooter will only survive with a probability of 0.5%.
It might not be surprising that the weakest player can have an advantage in three-person games: the two strongest compete against each other, and then the weakest player can attack the survivor. I think the game from the last post was more surprising. There, two players play chicken, but one of the players, let’s call him player B, cannot turn the steering wheel. If player B could turn the steering wheel, there would be two Nash equilibria, one where player A wins the most and one where player B wins the most. But when player B cannot swerve, the Nash equilibrium where A wins the most disappears, and we end up in the Nash equilibrium where B wins the most.
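To make the equilibrium claim concrete, here is a small sketch that enumerates the pure Nash equilibria of a chicken-style payoff table (the payoff numbers are my own assumption, chosen only to give the game the chicken structure):

```python
# payoffs[(a, b)] = (A's payoff, B's payoff); assumed chicken-like numbers
payoffs = {
    ("swerve",   "swerve"):   (0, 0),
    ("straight", "swerve"):   (1, -1),
    ("swerve",   "straight"): (-1, 1),
    ("straight", "straight"): (-10, -10),
}

def pure_nash(a_moves, b_moves):
    """Pure-strategy Nash equilibria: no player gains by deviating alone."""
    eqs = []
    for a in a_moves:
        for b in b_moves:
            ua, ub = payoffs[(a, b)]
            if all(payoffs[(a2, b)][0] <= ua for a2 in a_moves) and \
               all(payoffs[(a, b2)][1] <= ub for b2 in b_moves):
                eqs.append((a, b))
    return eqs

moves = ["swerve", "straight"]
print(pure_nash(moves, moves))        # two equilibria: one favours A, one favours B
print(pure_nash(moves, ["straight"])) # B cannot swerve: only B's favourite remains
```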
I find the next game even more surprising. This game also has two players, A and B, with A being stronger than B, but now both A and B can “do the same to the environment” and player A can choose to use his strength against player B. The game is played by two pigs. They are trained separately to press a panel in one end of their sty to get food in a feeding bowl in the other end of the sty. We then put both pigs in the sty together. We assume that the dominant pig, A, can push B away from the feeding bowl, but he cannot hurt B. If B presses the panel, A will be closer to the food bowl, and B is not strong enough to push him away, so B does not have any reason to press the panel. On the other hand, if A presses the panel, B will eat some of the food, but A can push him away. If they get enough food for each press on the panel, there will be food left for A, so he will start running back and forth between the panel and the food bowl, while B will be standing close to the food bowl all the time. If they do not get too much food for each press on the panel, B will get more food than A.
————————————————————————————————————————
I remember that I have heard about the three-person Russian roulette before, but I cannot find any references now (added later: a reader pointed out that this game was mentioned here in the quiz show QI). The game with the two pigs is described in an article by Baldwin and Meese (but it is older). They tried this experiment, but in a box of length 2.8 m, so the dominant pig got the most food. I do not know if there are experiments showing that a dominant animal would do the panel pressing if it gets less food than its opponent.
Baldwin, B. A. & Meese, G. B. 1979. Social behaviour in pigs studied by means of operant conditioning. Animal Behaviour, Vol. 27, Part 3, pp. 947–957.
The “Dollar Auction game” is a very simple game: an auctioneer wants to sell one dollar to the highest bidder, but there is one unusual rule in this auction: both the highest and the second-highest bidder have to pay their bid, but only the highest bidder will get the dollar. All bids have to be in multiples of one cent. What would you do in this game?
Let’s see what happens if you play this game with a lot of people. It only costs 1 cent to make the first bid, and it could earn you 1 dollar, so probably someone will make that bid. But then 2 cents for a dollar is also a good deal, so someone else bids 2 cents. Then someone bids 3 cents and so on. Now, let’s say that Alice has bid 98 cents and Bob has just bid 99 cents. If Alice stops here, Bob will get the dollar for 99 cents, so he earns 1 cent, but Alice will have to pay 98 cents. To avoid this, Alice bids 1 dollar, and if Bob stops here, Alice gets the dollar for one dollar, so she doesn’t lose anything. However, Bob doesn’t want to stop, because he would then lose 99 cents, so instead he bids $1.01, hoping that Alice stops and that he will only lose one cent. So Alice and Bob will continue bidding for a while, until one of them gives up. I have never tried this game, but there are claims that someone paid $200 for one dollar, or even $3000 for $100 [MK], so you shouldn’t try this at home… but perhaps you should try it somewhere else as the auctioneer!
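The escalation logic can be sketched in a few lines: at every stage the trailing bidder compares the sure loss from quitting with the best case from overbidding by one cent (a toy model with myopic bidders and an arbitrary stopping cap, not a full game-theoretic analysis):

```python
DOLLAR = 100  # value of the prize, in cents

def escalate(cap=300):
    """Myopic escalation: the trailing bidder overbids by one cent as long
    as the best case of winning beats the sure loss of quitting."""
    bids = [1, 2]                      # the two opening bids, in cents
    while True:
        hi, lo = bids[-1], bids[-2]    # leader's bid, trailer's bid
        # quitting loses `lo`; overbidding wins at best DOLLAR - (hi + 1)
        if DOLLAR - (hi + 1) <= -lo or hi + 1 > cap:
            break
        bids.append(hi + 1)
    return bids

bids = escalate()
print(bids[-1])   # 300: bidding sails far past the dollar's value
```

With consecutive bids the overbidding condition is always satisfied, so the only thing that stops the loop here is the artificial cap.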
What should you do if someone starts a Dollar Auction? One strategy would be not to bid at all, but that is too boring. Another strategy is to explain the problem to everyone and then bid 1 cent, hoping that no one else bids… or at least, hoping that the probability that someone else bids is less than 99%. A third strategy is to bid 99 cents before anyone else bids. This way, no one has a reason to overbid you. There are two problems with this strategy: even in the best case, you can only earn 1 cent, and if someone in the crowd really hates you, he can bid 1 dollar just to make you lose the 99 cents!
A more interesting strategy would be to bid 1 cent and promise that you will not let anyone else get the dollar for less than $1.02. If the others really believed you, they would not bid. But why should they believe you? If Alice bids 99 cents after you have bid 1 cent, it would probably be best for you to just break your promise. Alice would then earn one cent, and you are the only one who loses, so no one will be mad at you for breaking your promise. This leads us to a counterintuitive strategy. Make a deal with Bob:
“If someone else gets the dollar for less than $1.02, I have to pay you $3”*
After making this deal, you bid one cent. If someone, say Alice, bids 99 cents, it will be better for you to bid higher than to give Bob the $3. Alice knows this, so she will not try to bid higher.
So the strategy is to promise to give away some money under certain conditions. Intuitively, you would think that this is a bad idea because you are restricting yourself. However, in some games it is best to make a “voluntary but irreversible sacrifice of freedom of choice”.** If you play chicken (a game where two drivers drive their cars towards each other; if you swerve you lose, and if neither of you swerves you probably get killed or injured, so that also counts as losing), you are almost sure to win if you, before the game starts, take off your steering wheel. Your opponent knows that you cannot swerve, so he will have to swerve. However, it is important to tell your opponent that you cannot swerve, otherwise it might end in disaster, as in the film Dr. Strangelove! [SK]
——————————————————————————————————————–
Puzzle: How many “essentially different” games can you find, where it is best to be the weakest/less capable player?
I know that the statement is a bit weak, but I didn’t want to make it too precise. From the above we can find one example: if you play chicken, it will be an advantage not to be able to move your arms (and not to be able to turn the steering wheel in any other way), as long as your opponent knows that you cannot swerve. I consider many games to be “essentially” the same as this game, although I am not able to define the class of games that I consider to be essentially the same. I have two other essentially different games where it is an advantage to be the weakest/less capable player, and I will post them next week.
* Actually, this is not a good deal, because it would be possible to use it against you to blackmail you. Furthermore, Bob should know that you would never let anyone else get the dollar for less than $1.02, so he would never earn the $3. A better deal would be the following (I hope!): “If someone else gets the dollar for less than $1.02, or if I make any other agreement during this game, or pay or receive money during this game or as a consequence of this game, I have to pay you $3. If you make any other deals during the game, you have to pay me $10. You get 10 cents for accepting this deal.”
**This phrase is from the Nobel prize winner Thomas Schelling [TS]. Steven Pinker gives other examples of such games in [SP, pp. 408–411].
[MK] Murnighan, J. Keith. “A Very Extreme Case of the Dollar Auction.” Journal of Management Education 26, pp. 56–69, 2002.
[TS] T. C. Schelling, The strategy of conflict, Harvard University Press, 1980.
[SK] S. Kubrick, Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb [film], Columbia Pictures, 1964.
[SP] S. Pinker, How the mind works, Norton, 1997.
However, this is a very simplified model of an infinite world. Presumably, the infinitely many rooms would be arranged in $\mathbb{R}^3$. So let’s say there is a room for each point in $\mathbb{Z}^3$, that you can only send money to one of your 26 neighbors, and that it takes one minute to send the money. Then it is no longer possible for your wealth to grow exponentially fast. Even if everyone cooperated and tried to make the person in $(0,0,0)$ rich, he could have at most $(2t+1)^3$ dollars after $t$ minutes (if everyone starts with one dollar). This is only polynomial, but still pretty good for the person in $(0,0,0)$. But this protocol is unfair: we want everyone to earn at least some amount during the first $t$ minutes. However, we can easily see that the persons in rooms in $[-n,n]^3$ will in total have at most $(2(n+t)+1)^3$ dollars after $t$ minutes, and $\frac{(2(n+t)+1)^3}{(2n+1)^3}$ goes to $1$ as $n$ goes to infinity. This shows that we cannot use the infinite world to make everyone richer by an epsilon in finite time. (If you know about amenability, you may notice that the sequence of sets $[-n,n]^3$ is a Følner sequence, so the comparison to the Banach–Tarski paradox goes deeper than just “doubling something using some infinite trick that is not possible in the real world”.)
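Here is a small sketch of the grid protocol (a finite box stands in for the infinite grid, and the box size and number of minutes are my own choices): everyone sends his dollar one diagonal step closer to the origin each minute, and the origin's wealth grows like (2t+1)^3, i.e. only polynomially:

```python
from collections import defaultdict

R, T = 6, 3   # box "radius" and number of minutes; T <= R keeps the edge away

def sign(v):
    return (v > 0) - (v < 0)

# Everyone in the box [-R, R]^3 starts with one dollar.
money = {(x, y, z): 1
         for x in range(-R, R + 1)
         for y in range(-R, R + 1)
         for z in range(-R, R + 1)}

for _ in range(T):
    new = defaultdict(int)
    for (x, y, z), m in money.items():
        if (x, y, z) == (0, 0, 0):
            new[(0, 0, 0)] += m  # the origin keeps what it has
        else:
            # one of the 26 neighbours, one step closer to the origin
            new[(x - sign(x), y - sign(y), z - sign(z))] += m
    money = new

print(money[(0, 0, 0)])  # (2*T + 1)**3 = 343
```

After t minutes the origin has collected exactly the dollars that started within Chebyshev distance t, hence the cubic growth.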
So, let’s say you are a God and you want to construct a world, with the following restrictions:
Your objective is to make everyone’s wealth grow exponentially fast.
One way to do this is to give the world the structure of the Cayley graph of the free group $F_2$ with two generators $a$ and $b$ (if all the rooms have the same positive volume and the distance between neighboring rooms has to be bounded, it is not possible to embed this in $\mathbb{R}^3$, but you are God, so you don’t care about $\mathbb{R}^3$). Now each person has 4 neighbors. E.g. $e$ has $a$, $a^{-1}$, $b$ and $b^{-1}$, and $a$ has $e$, $a^2$, $ab$ and $ab^{-1}$. The protocol is that everyone except $e$ should each minute send all his money to the neighbor that is closer to $e$. This way everyone (besides $e$) will receive money from 3 persons, so at time $t$ they will have $3^t$ dollars (if everyone starts with one dollar). The person at $e$ will receive money from 4 persons and won’t pay anyone, so he will become rich even faster.
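A finite truncation of this protocol is easy to simulate (a sketch: reduced words over a, b and their inverses, written A, B, stand for group elements, and dropping the last letter is the step towards e):

```python
from collections import defaultdict

INV = {'a': 'A', 'A': 'a', 'b': 'B', 'B': 'b'}  # A denotes a's inverse, etc.

R, T = 6, 3   # simulate all reduced words of length <= R for T minutes

words, frontier = [''], ['']
for _ in range(R):
    frontier = [w + l for w in frontier for l in 'aAbB'
                if not w or l != INV[w[-1]]]
    words += frontier
money = {w: 1 for w in words}   # everyone starts with one dollar

# Each minute everyone except the identity e (the empty word) sends all
# his money to the neighbour closer to e; e keeps everything it gets.
for _ in range(T):
    new = defaultdict(int)
    for w, m in money.items():
        new[w[:-1] if w else ''] += m
    money = new

print(money['a'], money[''])   # 3**T = 27 and 2*3**T - 1 = 53
```

Away from the truncation boundary each non-identity person's wealth is multiplied by 3 every minute, matching the 3^t in the text; the identity does even better.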
So you are a God, you have just created this fantastic world with infinitely many people and you are just about to announce your strategy for how everyone can become rich in no time without working, when you discover that you have made a terrible mistake: you forgot to label the doors. In fact, there is no way to distinguish any two rooms, so now the inhabitants of your world can’t know what direction they should send their money.
If only you could tell one person that he was the chosen one (i.e. $e$), he could then tell his neighbors, and they would give him money each minute. They would then tell their neighbors, and so on (remember, no one would lie, since they know they will get rich even if they are honest). This way it would take some time for people far away from the chosen one before they start getting richer, but when they do, their wealth will grow exponentially.
Unfortunately, you have no way to communicate with only one person: you can only use the loudspeakers. You did, however, remember to give them a die each. Is there a protocol that would make everyone’s wealth grow exponentially with probability 1?
Edit: I asked this question here on MathOverflow. The answer is no. It turns out to be a special case of the “Mass Transport Principle”. It can be found on page 283 in this book, which is available online (page 293 in the pdf file).
Abstract:
This thesis is about the $P$ vs. $NP$ problem and why it is so difficult to solve. We consider two different techniques from complexity theory, see how they can be used to separate complexity classes, and why they do not seem to be strong enough to prove $P \neq NP$. The first technique is diagonalization, which we will use to prove the time hierarchy theorem. We will then see why a result of Baker, Gill and Solovay seems to indicate that this technique cannot prove $P \neq NP$. We will see how the circuit model can be used to prove lower bounds, and see a result by Razborov and Rudich, which indicates that this approach is also too weak to prove $P \neq NP$. Finally, we will see that provability and computability are closely related concepts, and show some independence results.
Not abstract:
I wish I had had enough time to include algebrization.
A function could have a graph that is dense in the plane, but still be ‘nice’ on a large part of the domain. E.g. there exist functions that are $0$ at all irrational numbers, but still have a graph that is dense in the plane. Inspired by this, here is a list of things that would make a function wild even in a measure-theoretic sense. Again, the list is ordered, and again all these statements are true for any non-linear solution to Cauchy’s functional equation.
I will prove the last one.
Proof: Assume for contradiction that there is a non-linear function $f$ satisfying Cauchy’s functional equation, a non-empty open set $U$ and a measurable set $B$ with positive measure, such that $f(B) \cap U = \emptyset$. Any open set contains an open interval, so without loss of generality, we can assume that $U$ is an open interval. We assumed that $\lambda(B) > 0$, where $\lambda(S)$ denotes the measure of a set $S$, and since measures are countably additive, there must be an interval $I$ of length $1$ with $\lambda(B \cap I) > 0$, so without loss of generality, we assume that $B$ is contained in an interval of length $1$. To reach a contradiction, I will show that under these assumptions, there exist two sequences of sets, $(A_n)$ and $(B_n)$, of subsets of $B$ satisfying:
Since and for we have for . We assumed that is contained in an interval of length , so is contained in an interval of length . Furthermore, so using the pigeonhole principle we can find an interval of length such that . We see that so I choose and .
I will now use the existence of these two sequences to reach a contradiction. For each let be a number such that and let and be numbers such that . We know that the graph of is dense in the plane, so we can find some with . Now the sequences and satisfy the above five requirements for and , and furthermore and the lower and upper bounds of will tend to minus infinity and plus infinity, respectively. Now for all there is some such that for all . But , so the sequence of indicator functions converges pointwise to the zero function. It is dominated by , which has integral , so the dominated convergence theorem tells us that tends to as . But this contradicts . QED.
Assuming the axiom of choice we can find a discontinuous solution $f$ to Cauchy’s functional equation, and if we let $U$ in the above be the set of positive real numbers, we see that any measurable set $A$ with $f(A) \cap (0,\infty) = \emptyset$ must have measure $0$. But similarly, taking $U$ to be the set of negative real numbers, any measurable set $A$ with $f(A) \cap (-\infty,0) = \emptyset$ must also have measure $0$. Thus, letting $S = \{x \in \mathbb{R} : f(x) > 0\}$, we have found a set such that neither the set nor its complement contains a measurable subset with positive measure.
We have seen that the graph of a non-linear solution is in many ways ‘spread out all over the plane’. But there are some ways to interpret ‘spread out all over the plane’ for which this is not true for all solutions. E.g. let $(b_i)_{i \in I}$ be a Hamel basis and fix an index $i_0 \in I$. Now the function $f\left(\sum_i q_i b_i\right) = q_{i_0}$ is a solution to Cauchy’s functional equation, but $f$ is always rational. However, there exist surjective solutions, so in some ways some solution functions are ‘wilder’ than others. As usual, I will give a list of wild properties a function can have, and as usual the list is ordered such that any of the properties implies the one above. Unlike for the other lists, there are some solution functions that do not satisfy any of the properties and some that satisfy all of them. Moreover, for any two properties on the list, there exist solutions that satisfy the upper of the two, but not the lower one.
First I will prove that there is a solution satisfying the third property. The proofs of the existence of solutions satisfying the last two properties are similar, and I will sketch them afterwards.
Proof: We begin by choosing a Hamel basis $B$ and well-ordering $B$. That is, we find an ordering on $B$ such that any subset of $B$ has a least element. The existence of such an ordering (on any set) is equivalent to the axiom of choice. The set $B$ is a subset of $\mathbb{R}$, so the cardinality of this set is not greater than the cardinality $\mathfrak{c}$ of $\mathbb{R}$. Assume for contradiction that $|B| < \mathfrak{c}$. Using the rules for calculation with cardinalities we know that $|B \times B| = |B|$ and more generally $|(\mathbb{Q} \times B)^n| = |B|$. Since any element of $\mathbb{R}$ is a linear combination of the $b$’s over $\mathbb{Q}$, for any real number $x$ there is an $n$ and $(q_1, b_1, \ldots, q_n, b_n) \in (\mathbb{Q} \times B)^n$ such that $x = q_1 b_1 + \cdots + q_n b_n$. Hence $\mathfrak{c} = |\mathbb{R}| \leq \left|\bigcup_n (\mathbb{Q} \times B)^n\right| = \aleph_0 \cdot |B| = |B| < \mathfrak{c}$, which is a contradiction. To reach this contradiction, we assumed that $|B| < \mathfrak{c}$, so $|B| = \mathfrak{c}$.
Now, let’s see how many continuous functions there are. A continuous function is uniquely determined by its values on the rational numbers, so $|C| \leq |\mathbb{R}^{\mathbb{Q}}| = \mathfrak{c}^{\aleph_0} = \mathfrak{c}$, where $C$ is the set of continuous functions. On the other hand, the constant functions are continuous, and there are $\mathfrak{c}$ of them, so $|C| \geq \mathfrak{c}$. Hence $|C| = \mathfrak{c} = |B|$, so we can index the set of continuous functions by the set $B$, so that $C = \{g_b\}_{b \in B}$. We can now define $f(b) = g_b(b)$ to make sure that the equation $f(x) = g_b(x)$ has a solution. Now $f$ is given by $f\left(\sum_b q_b b\right) = \sum_b q_b g_b(b)$.
QED.
If we want the set of solutions to $f(x) = g(x)$ to be dense in $\mathbb{R}$ for every continuous $g$, it is a bit more complicated. The idea is that, instead of using the set $B$ to index the set of continuous functions, we use it to index the set of pairs of a continuous function and an open interval. Unfortunately, we cannot be sure that $b$ is in the open interval corresponding to $b$. But we know that for each $b$ there is a rational $q_b$ such that $q_b b$ is in the open interval corresponding to $b$, and we define $f(b) = g_b(q_b b)/q_b$, so that $f(q_b b) = g_b(q_b b)$.
If we want to show that there exist solution functions with the last property, it is much more complicated: here we need transfinite induction, because we need to choose the elements of the Hamel basis one at a time. We know that the set of (Borel-)measurable sets has the same cardinality as $\mathbb{R}$; thus the set of functions that can be defined as a continuous function restricted to a measurable set with positive measure has cardinality $\mathfrak{c}$. Now we index this set $\{g_\alpha\}$ by a set of ordinals. Using the axiom of choice, we can well-order this set, and we can even choose the ordering such that the set of predecessors of $\alpha$ has cardinality less than $\mathfrak{c}$ for all $\alpha$. Now for each $\alpha$ we choose $x_\alpha$ such that $x_\alpha$ is in the domain of $g_\alpha$ and $f(x_\alpha) = g_\alpha(x_\alpha)$, and such that the $x_\alpha$’s are linearly independent over $\mathbb{Q}$.
To show that this is possible, I only need to show that when we have chosen $x_\beta$ for all $\beta < \alpha$, we can choose $x_\alpha$ such that $x_\alpha$ is linearly independent of the $x_\beta$’s over $\mathbb{Q}$ and in the domain of $g_\alpha$. The rest follows by transfinite induction. We know that measurable sets with positive measure are uncountable, so if we assume the continuum hypothesis (a statement independent of ZFC: it states that no set can have a cardinality strictly between $\aleph_0$ and $\mathfrak{c}$), any measurable set with positive measure has the same cardinality as $\mathbb{R}$. (It is still true without the continuum hypothesis, but it is more difficult to prove. See [BS].) We know that $|\{x_\beta : \beta < \alpha\}| < \mathfrak{c}$, so the cardinality of the linear span over $\mathbb{Q}$ of this set is also less than $\mathfrak{c}$, since you cannot reach $\mathfrak{c}$ by taking countable unions and finite products of sets of smaller cardinalities. (In general, still assuming the axiom of choice, you cannot get a set of cardinality $\mathfrak{c}$ by taking finite products of sets with smaller cardinality, or by taking a countable union of sets with smaller cardinality.) Since the domain of $g_\alpha$ has the same cardinality as $\mathbb{R}$, we can choose an element in the domain of $g_\alpha$ that is not in the linear span of $\{x_\beta : \beta < \alpha\}$.
By transfinite induction, we have now chosen $x_\alpha$’s such that they are linearly independent over $\mathbb{Q}$. However, we cannot be sure that they span all of $\mathbb{R}$. So we end by using the axiom of choice once again to extend the set $\{x_\alpha\}$ to a Hamel basis, and we set $f$ to be zero on the rest of the basis elements. This gives us a solution to Cauchy’s functional equation with a graph that intersects any continuous function on any measurable set with positive measure.
[BS]: James M. Briggs and Thomas Schaffter. Measure and Cardinality. The American Mathematical Monthly, Vol. 86, No. 10, pp. 852-855.
[EH]: Ernst Hansen. Measure Theory. Department of Mathematical Sciences University of Copenhagen. 2009.
[MO]: mathoverflow: Do sets with positive lebesgue measure have same cardinality as R?
Cauchy’s functional equation, $f(x+y) = f(x) + f(y)$, looks very simple, and it has a class of simple solutions, $f(x) = cx$, but there are many other and more interesting solutions. In these notes, I will show you what some of these “wild” solutions look like, and I will use them to prove that there exists a set $S \subseteq \mathbb{R}$ such that neither $S$ nor its complement contains a measurable subset with positive measure. Section 1 is about Cauchy’s functional equation on the rational numbers; in section 2 I show that there are some wild solutions on $\mathbb{R}$, and in section 3 I will show that their graphs are dense in $\mathbb{R}^2$. In section 4 I’ll show that these functions are ugly from a measure-theoretic point of view, and in section 5 I’ll show that some of these functions are wilder than others. E.g., I will prove that there is a solution to Cauchy’s functional equation that intersects any continuous function from $\mathbb{R}$ to $\mathbb{R}$.
First, we consider the equation over the rational numbers. That is, $f : \mathbb{Q} \to \mathbb{Q}$ with $f(x+y) = f(x) + f(y)$ for all $x, y \in \mathbb{Q}$. By setting $x = y = 0$ we get $f(0) = 2f(0)$ and thus $f(0) = 0$. Let’s set $c = f(1)$. If $x = y = 1$ we get $f(2) = f(1) + f(1) = 2c$. By definition of $c$, we have $f(n) = cn$ for $n = 1$, so by induction, $f(n) = cn$ for all $n \in \mathbb{N}$. More generally, we can prove that for $n \in \mathbb{N}$ and $x \in \mathbb{Q}$ we have $f(nx) = nf(x)$: it is clearly true for $n = 1$, and if it is true for $n$ we get $f((n+1)x) = f(nx) + f(x) = nf(x) + f(x) = (n+1)f(x)$. Let $q$ be a positive rational number, and write it as $q = m/n$, where $m, n \in \mathbb{N}$. Now, $mf(x) = f(mx) = f\left(n \cdot \tfrac{m}{n} x\right) = n f\left(\tfrac{m}{n} x\right)$. Dividing by $n$ we get $f\left(\tfrac{m}{n} x\right) = \tfrac{m}{n} f(x)$. Furthermore, $0 = f(0) = f(x + (-x)) = f(x) + f(-x)$, so $f(-x) = -f(x)$. Putting it all together, we have $f(qx) = qf(x)$ for all $q \in \mathbb{Q}$, and in particular $f(q) = cq$. It is easy to verify that $f(x) = cx$ is a solution for the general equation on $\mathbb{Q}$.
Now consider Cauchy’s functional equation on the real numbers, $f(x+y) = f(x) + f(y)$ for $f : \mathbb{R} \to \mathbb{R}$. The proof from the last section tells us that $f(q) = cq$ for all rational numbers $q$, where $c = f(1)$, and using the same idea, we can prove that $f(qx) = qf(x)$ for all $q \in \mathbb{Q}$ and $x \in \mathbb{R}$. But this does not imply that $f(x) = cx$ for all real numbers $x$. However, if we assume that $f$ is continuous, we can show that $f(x) = cx$ for all $x$: we simply choose a sequence $(q_n)$ of rational numbers that converges to $x$. By continuity we get $f(x) = \lim_n f(q_n) = \lim_n c q_n = cx$. But it is much more fun if we do not have any assumptions on $f$! Using the axiom of choice we can find non-continuous solutions. The idea is: a priori we only know that $f(0) = 0$. Now we choose some value for $f(1)$, e.g. $f(1) = 1$. This determines $f$ on all the rational numbers, $f(q) = q$ for $q \in \mathbb{Q}$, but the value of $f$ is not determined at any irrational number. So we make another choice, let’s say $f(\sqrt{2}) = 3$. Now the functional equation tells us that $f(q_1 + q_2\sqrt{2}) = q_1 + 3q_2$ for all $q_1, q_2 \in \mathbb{Q}$. But for numbers not of this form, we cannot determine the value of $f$. So we simply continue by choosing more and more values of the function. Unfortunately, we have to make infinitely many choices, so we need the axiom of choice. In the rest of these notes, I will assume the axiom of choice. To formalize the above, we consider the set of real numbers as a vector space over $\mathbb{Q}$, in much the same way as you can consider $\mathbb{C}$ to be a two-dimensional vector space over $\mathbb{R}$. An important difference is that when we consider $\mathbb{R}$ to be a vector space over $\mathbb{Q}$, it is infinite-dimensional: it even has uncountably many dimensions. We now use the axiom of choice to choose a basis $(b_i)_{i \in I}$ (a so-called Hamel basis) and we choose some coefficients $(c_i)_{i \in I}$. This defines a linear map from this vector space to itself: $f\left(\sum_i q_i b_i\right) = \sum_i q_i c_i$, where the $q_i$’s are rational numbers, and only finitely many of them are non-zero. I called this function ‘linear’, so it sounds like it is a nice function. But it is not! It is only linear when we consider $\mathbb{R}$ as a vector space over $\mathbb{Q}$ and forget about the rest of the structure on $\mathbb{R}$. This function is only linear in the usual sense on $\mathbb{R}$ if $c_i/b_i$ is the same for all $i$.
All functions of this form are solutions to Cauchy’s functional equation, and conversely, all solutions to Cauchy’s functional equation are of this form.
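The partial solution $f(q_1 + q_2\sqrt{2}) = q_1 + 3q_2$ from above can be played with exactly, by representing $q_1 + q_2\sqrt{2}$ as a pair of rationals (a small sketch):

```python
from fractions import Fraction as F

# (q1, q2) represents the real number q1 + q2*sqrt(2), with q1, q2 rational.
def f(v):
    q1, q2 = v
    return q1 + 3 * q2          # the choices f(1) = 1 and f(sqrt(2)) = 3

def add(v, w):
    return (v[0] + w[0], v[1] + w[1])

x = (F(1, 2), F(2))             # 1/2 + 2*sqrt(2)
y = (F(3), F(-1, 3))            # 3 - sqrt(2)/3

print(f(add(x, y)) == f(x) + f(y))   # True: additivity holds
print(f((F(0), F(1))))               # 3, although sqrt(2) is about 1.414
```

So the function is additive and $\mathbb{Q}$-linear on this two-dimensional piece, yet not of the form $cx$.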
A function can be more or less wild/ugly/pathological. Here is a list of possible definitions of what makes a function wild. The list is ordered such that any of the properties implies the one above.
All of these statements are true for any non-linear solution to Cauchy’s functional equation. I will show that the last one of these is true. Proof: Let $f$ be a non-linear solution. If $(x_1, f(x_1))$ and $(x_2, f(x_2))$ are points in the graph of $f$, and $q$ is a rational number, we see that the points $(x_1 + x_2, f(x_1) + f(x_2))$ and $(qx_1, qf(x_1))$ are both in the graph too. In words, any linear combination over $\mathbb{Q}$ of points in the graph is also in the graph. Since $f$ is non-linear, we can find real numbers $x_1$ and $x_2$, both non-zero, such that $\frac{f(x_1)}{x_1} \neq \frac{f(x_2)}{x_2}$. Now the two vectors $(x_1, f(x_1))$ and $(x_2, f(x_2))$ are linearly independent (over $\mathbb{R}$), so they span the plane. That is, any point $(x, y)$ can be written as $r_1(x_1, f(x_1)) + r_2(x_2, f(x_2))$ for some real $r_1, r_2$. Let $(q_{1,n})$ and $(q_{2,n})$ be sequences of rational numbers with $q_{1,n} \to r_1$ and $q_{2,n} \to r_2$. Now $q_{1,n}(x_1, f(x_1)) + q_{2,n}(x_2, f(x_2))$ is a sequence of points in the graph converging to $(x, y)$, so the graph is dense in $\mathbb{R}^2$.
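The proof is constructive enough to run: with the choices $f(1) = 1$ and $f(\sqrt{2}) = 3$ from section 2, we can approximate any target point by a rational combination of the graph points $(1, 1)$ and $(\sqrt{2}, 3)$ (a sketch; the target $(\pi, 0)$ is an arbitrary choice of mine):

```python
from fractions import Fraction
import math

s2 = math.sqrt(2)
tx, ty = math.pi, 0.0    # an arbitrary target point in the plane

# Solve r1*(1, 1) + r2*(s2, 3) = (tx, ty) over the reals...
r2 = (ty - tx) / (3 - s2)
r1 = ty - 3 * r2

# ...then round r1, r2 to nearby rationals; the combination stays on the graph.
q1 = Fraction(r1).limit_denominator(10**6)
q2 = Fraction(r2).limit_denominator(10**6)

px = float(q1) + float(q2) * s2   # x-coordinate of the graph point
py = float(q1 + 3 * q2)           # y-coordinate, i.e. f at that x

print(abs(px - tx) + abs(py - ty) < 1e-5)  # True: a graph point near (pi, 0)
```

Better rational approximations of $r_1$ and $r_2$ give graph points as close to the target as you like, which is exactly the density argument.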
The answer is written in white. Highlight it to read it.
No, x doesn’t have to be a global minimum. Consider the function f(x,y) = (e^y + e^{-y^2})(-2x^3 + 3x^2) - e^{-y^2} (I don’t know how to make LaTeX white, so I wrote it in plain text instead, sorry). We see that (e^y + e^{-y^2}) is positive, so df/dx = 0 only if x = 0 or x = 1. For x = 1 the function is e^y, which doesn’t have any stationary points. For x = 0 the function is -e^{-y^2}, and (0,0) is a stationary point. The point is a local minimum, since for x < 1 we have f(x,y) >= f(0,y) >= f(0,0) = -1, but it is not a global minimum, since f(2,0) = -9. Now the next question is: what if f is a polynomial? I don’t know the answer.
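The claims about this counterexample are easy to verify numerically (a quick sketch; the finite-difference step is arbitrary):

```python
import math

def f(x, y):
    return (math.exp(y) + math.exp(-y**2)) * (-2 * x**3 + 3 * x**2) - math.exp(-y**2)

h = 1e-6  # step for a central finite-difference gradient at (0, 0)
dfdx = (f(h, 0) - f(-h, 0)) / (2 * h)
dfdy = (f(0, h) - f(0, -h)) / (2 * h)

print(abs(dfdx) < 1e-9, abs(dfdy) < 1e-9)  # True True: (0, 0) is stationary
print(f(0, 0), f(2, 0))                    # -1.0 -9.0: not a global minimum
```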
As you can see in the above link, Tao started a mini-polymath project about problem 5. So here you can see how a group of mathematicians worked together to solve that problem (and see the problem statement). I decided to ask some of the Danish contestants for a description of how they solved the problem, so that future contestants can see how a single person thinks. Here is Anders Eller Thomsen’s description. Edit: And here is Mathias Bæk Tejs Knudsen’s.
When solving this problem, it helps to find a stronger statement that is true for all n, because that gives you a stronger induction hypothesis, so that you can make the induction step. But sometimes this is not enough, and you have to “invent” (or should I say discover?) other cases. Let me give an example of this:
Problem 5, day 1, IMC 1999: Suppose that $2n$ points of an $n \times n$ grid are marked. Show that for some $k > 1$ one can select $2k$ distinct marked points, say $a_1, \ldots, a_{2k}$, such that $a_{2i-1}$ and $a_{2i}$ are in the same row, and $a_{2i}$ and $a_{2i+1}$ are in the same column, indices taken mod $2k$.
We have $2n$ points and $n$ rows and $n$ columns, so on average there are $2$ points in each row and in each column. If we knew that there were at least two points in each row and column, the problem would be easy: we could just start at one point, go to a point in the same row, then to a point in the same column, and so on. Continue to do this (you can, because there are at least two points in each row and column) until you hit a point you have visited before, and you have a loop (if you both began and ended with a “row move” or with a “column move”, you can just “jump over” the first point). But unfortunately, there could be rows or columns with only one point or no points at all. If there is both a row and a column with at most one point each, we can delete this row and column, and we have a problem with an $(n-1) \times (n-1)$ grid and at least $2(n-1)$ marked points. This gives us hope that we can prove the statement by induction. But what if all rows contain 2 marked points, but some column only contains 1? If we delete a column, we would get an $n \times (n-1)$ grid with $2n-1$ marked points, so this suggests that we should try to prove something stronger:
Stronger statement: Let $n + m$ points of an $n \times m$ grid be marked. Now for some $k > 1$ you can select $2k$ distinct marked points, $a_1, \ldots, a_{2k}$, such that $a_{2i-1}$ and $a_{2i}$ are in the same row, and $a_{2i}$ and $a_{2i+1}$ are in the same column, indices taken mod $2k$.
Proof: By induction on $n$ and $m$. The statement is vacuously true if $n = 1$ or $m = 1$, because you can’t mark $n + m$ distinct points in such a grid. If we have two points in every row and column, we can just use the walk above; if not, we delete a row or column containing at most one point, and thereby reduce the problem to a smaller one.
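The walk in the proof can be made concrete by viewing each marked cell as an edge between its row and its column in a bipartite graph; when every row and column contains at least two points, the walk below always closes a loop (a sketch, with an example marking of my own):

```python
from collections import defaultdict

def find_loop(points):
    """Walk row -> cell -> column -> cell ..., always leaving a vertex by a
    different cell than the one we arrived by, until a vertex repeats."""
    adj = defaultdict(list)
    for r, c in points:
        adj[("row", r)].append((r, c))
        adj[("col", c)].append((r, c))
    v = ("row", points[0][0])
    visited, cells, last = [v], [], None
    while True:
        cell = next(p for p in adj[v] if p != last)  # needs degree >= 2
        last = cell
        v = ("col", cell[1]) if v[0] == "row" else ("row", cell[0])
        cells.append(cell)
        if v in visited:                 # the walk closed a loop
            return cells[visited.index(v):]
        visited.append(v)

# every row and every column of this 3x3 marking contains two points
marks = [(0, 0), (0, 1), (1, 1), (1, 2), (2, 2), (2, 0)]
loop = find_loop(marks)
print(loop)   # six cells; consecutive cells alternately share a row/column
```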
Here is another IMC problem, you can try to solve:
Problem 5, day 1, IMC 2004: Let $S$ be a set of $\binom{2n}{n} + 1$ real numbers, where $n$ is a positive integer. Prove that there exists a monotone sequence $x_1, x_2, \ldots, x_{n+1}$ in $S$ such that $|x_{i+1} - x_1| \geq 2|x_i - x_1|$ for all $i = 2, \ldots, n$.
Update: Here are two MathOverflow questions about this subject, where you can find more examples.
Problem: Let , , and be real square matrices of the same size, and suppose that is invertible. Prove that if then .
It is possible to prove this in three lines, but I don’t think that anyone would learn anything about problem solving from just seeing the proof. Instead I want to describe how I solved it. In order to make it easier to read, I decided to write the description in the present tense and to include the reader, so instead of “Then I tried to” I write “Now we try to”. I solved this problem about two months ago, so I probably had lots of thoughts that I have forgotten about by now.
At first this problem seems a bit confusing, so we’ll try to get a better understanding of it. It is a problem from a contest, so unlike when you’re doing research, you don’t have to worry about the possibility that the statement could be false. The assumption is that $A$, $B$, and $C$ are real square matrices, so the first question is to decide whether this is important. What if $A$, $B$, and $C$ are complex matrices? Or more generally: what if $A$, $B$, and $C$ are elements of a ring? (If you don’t know what a ring is, don’t worry. All I’m asking is: is it possible to prove the statement by only using algebraic rules, such as distributivity?). It is easy to state the problem for elements of a ring, and it seems unlikely that it should be false for rings, given that it is true for real matrices, so let’s try to work on this conjecture:
Conjecture: Let , , and be elements of a ring , and suppose that is invertible. Now implies .
In other words, you shouldn’t think of the matrices as arrays of real numbers, but only of their algebraic properties. The problem is still a bit difficult to get your head around, so we’ll try to “turn off” some of the difficulties. The three most obvious difficulties are the three matrices themselves, so let’s see what happens if we let one of them be the identity. We get three new problems:
Proving one or more of these won’t solve the original problem, but it might give an idea of how to solve it. (You could also set two of the three matrices equal to the identity, but the resulting statements are too trivial to be interesting.) Let’s look at the first of the three statements:
That is, if then and commute. How can we prove that two elements of a ring commute? Well, if their product is the identity, we know that they are each other’s inverses, and therefore commute (Edit: This is not true in a general ring, but it is true for matrices. See my comment). Unfortunately the right-hand side is , and if we assume the statement is trivial. So instead we assume to be invertible and multiply by . Remember that right now we are not trying to give a formal proof of anything, but only to get some idea of what a proof might look like, so we are free to add this “niceness” assumption about . Assume also that . Now we get:
That is, is the inverse of , so they commute and we have:
We have now proved the statement in the case where and is invertible. Now it’s natural to try the same idea in the -case. Without any further assumptions we get:
As we wanted. The final case is equivalent to showing that if then and commute. We can’t just use the same trick as before, because we don’t know that and commute. But we have learned one important lesson from the two other cases: In order to show that two matrices commute, it is useful to find two matrices that are each other’s inverse, and use that they commute. So we add the identity on both sides and move everything else to the left hand side, and look for a factorization:
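The displayed formulas are missing from this copy of the post, but the add-the-identity-and-factor trick can be illustrated on a classical identity of this kind (an assumed example, not necessarily the one used in the original): if $AB = A + B$, then $AB - A - B + I = I$, which factors as $(A - I)(B - I) = I$; so $A - I$ and $B - I$ are each other’s inverses, hence commute, which forces $AB = BA$. A quick numerical sanity check:

```python
def mm(X, Y):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def madd(X, Y):
    return [[X[i][j] + Y[i][j] for j in range(2)] for i in range(2)]

# Choosing B = I + (A - I)^{-1} guarantees (A - I)(B - I) = I, so AB = A + B.
A = [[2.0, 1.0], [0.0, 3.0]]
B = [[2.0, -0.5], [0.0, 1.5]]

assert mm(A, B) == madd(A, B)   # the hypothesis AB = A + B holds
assert mm(A, B) == mm(B, A)     # ...and indeed A and B commute
```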
Now that we have solved the three simpler problems, we can look back at the original conjecture.
There are two ways you can use a proof of a simpler case to prove a more general theorem: The first is to use the fact that the simpler case is true. If we make the assumption that is invertible, this technique is actually useful. We get:
Where we use that with and as and . Now we have proved the statement under the assumption that is invertible, but if we don’t have this assumption, we have to do something else. (We could use a limit argument, like Gowers did in a comment to my last post, but I didn’t think of this trick when I solved the problem).
I mentioned that there are two ways of using a proof of a special case to prove a more general theorem, but I only gave one. The other is to modify your proof to cover the more general theorem. So let’s look back at the proof that , and try to modify it to show that . Now the proof almost writes itself:
QED.