In this post I will explain how I solved the following problem:
Problem: Let , , and be real square matrices of the same size, and suppose that is invertible. Prove that if then .
It is possible to prove this in three lines, but I don’t think that anyone would learn anything about problem solving from just seeing the proof. Instead I want to describe how I solved it. In order to make it easier to read, I decided to write the description in present tense and to include the reader, so instead of “Then I tried to” I write “Now we try to”. I solved this problem about two months ago, so I probably had a lots of thoughts that I have forgotten about now.
At first this problem seems a bit confusing, so we’ll try get a better understanding of the problem. It is a problem from a contest, so unlike when you’re doing research, you don’t have to worry about the possibility that the statement could be false. The assumption is that , , and are real square matrices, so the first problem is to decide if this is important. What if , , and are complex matrices? Or more generally: What if , , and are elements in a ring? (If you don’t know what a ring is, don’t worry. All I’m asking is: Is it possible to prove that statement, by only using algebraic rules like ?). It is easy to state the problem for elements in a ring, and it seems unlikely that it should be false for rings, given that it is true for real matrices, so let’s try to work on this conjecture:
Conjecture: Let , , and be elements of a ring , and suppose that is invertible. Now implies .
In other words, you shouldn’t think of the matrices as sets of real numbers arranged in arrays, but only of their algebraic properties. The problem is still a bit difficult to get your head around, so we’ll try to “turn off” some of the difficulties. The three most obvious difficulties are , and , so let’s see what happens if we let one of them be the identity. We get three new problems:
Proving one or more of these won’t solve the original problem, but it might give an idea of how to solve it. (You could also set two of the three matrices to be equal to , but the resulting statements are too trivial to be interesting). Let’s look at the first of the three statements:
That is, if then and commute. How can we prove that two elements in a ring commute? Well, if their product is the identity, we know that they are each others inverses, and therefore commute (Edit: This is not true in a general ring, but it is true for matrices. See my comment). Unfortunately the right hand side is , and if we assume the statement is trivial. So instead we assume to be invertible and multiply by . Remember that right now we are not trying to give a formal proof of anything, but only to get some ideas of what a proof might look like, so we are free to add this “niceness” –assumptions about . Assume also that . Now we get:
That is, is the inverse of , so they commute and we have:
We have now proved the statement in the case where and is invertible. Now it’s natural to try the same idea in the -case. Without any further assumptions we get:
As we wanted. The final case is equivalent to showing that if then and commute. We can’t just use the same trick as before, because we don’t know that and commute. But we have learned one important lesson from the two other cases: In order to show that two matrices commute, it is useful to find two matrices that are each other’s inverse, and use that they commute. So we add the identity on both sides and move everything else to the left hand side, and look for a factorization:
Now that we have solved the three simpler problems, we can look back at the original conjecture.
There are two ways you can use a proof of a simpler case to prove a more general theorem: The first is to use the fact that the simpler case is true. If we make the assumption that is invertible, this technique is actually useful. We get:
Where we use that with and as and . Now we have proved the statement under the assumption that is invertible, but if we don’t have this assumption, we have to do something else. (We could use a limit argument, like Gowers did in a comment to my last post, but I didn’t think of this trick when I solved the problem).
I mentioned that there are two ways of using a proof of a special case to prove a more general theorem, but I only gave one. The other is one to modify your proof to cover the more general theorem. So let’s look back at the proof that , and try to modify the proof to showing that . Now the proof almost writes itself: