Worse is better

Posted May 28, 2012 by Sune Kristian Jakobsen
Categories: Uncategorized

Tags: ,

As I promised in my last post, I will give two more examples of games where it is better to be the weaker or less capable player.

The first game is three-person Russian roulette: Three people roll a die to decide who is player A, B and C. First player A tries to shoot one of his opponents; if player B is still alive, he tries to shoot one of his opponents; then it is player C’s turn if he is still alive; after that it is player A’s turn again, and so on, until only one person is left. We assume that every shot is either lethal or a miss, and we require all players to do their best at each shot: you are not allowed to miss on purpose. If all three contestants have a 100% chance of hitting on each shot, player A will kill one of B and C, and the survivor will kill player A. If A chooses his victim randomly (to be precise: with uniform distribution), then players B and C each have a 50% chance of winning, and player A will die. However, if player B has only a 99% chance of succeeding at a shot – and his opponents know that – then player A will shoot player C, to get a 1% chance of surviving. Player B then has a 99% chance of winning. Similarly, if player C has a 99% chance of succeeding on each shot, and A and B have 100%, then player C will win with probability 99% (now player A shoots player B, since shooting C would leave him facing B’s certain shot). Finally, consider the case where player A has a 99% chance of hitting and the two others have 100%. If player A misses his first shot, he will be in the same situation as if he were player C, so he will win with probability 99%. If he hits with his first shot, the surviving perfect shooter will kill him. Thus player A has a 1%\cdot 99%=0.99% chance of surviving. So if you have a 99% chance of hitting on each shot, both your opponents have a 100% chance, everyone knows this, and the roles are assigned by the die, then you will survive with probability (99%+99%+0.99%)/3 = 66.33%.
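Here is a minimal Monte Carlo sketch in Python (the 99% accuracy, the role assignment by dice, and the strategies are exactly the ones derived above; the code is only an illustration, not part of the original argument):

```python
import random

def weak_survives(role, p=0.99):
    """One play of the triel with the strategies described in the text.
    Returns True if the shooter with accuracy p survives."""
    if role == 'B':                 # A (perfect) shoots C, then B shoots at A
        return random.random() < p  # B survives iff his shot at A hits
    if role == 'C':                 # A (perfect) shoots B, then C shoots at A
        return random.random() < p
    # role == 'A': the weak shooter fires first at a perfect opponent
    if random.random() < p:         # A hits, so the remaining perfect shooter kills A
        return False
    # A missed; B kills C, and then A gets one shot at B
    return random.random() < p

trials = 200_000
for role in 'ABC':
    est = sum(weak_survives(role) for _ in range(trials)) / trials
    print(role, est)
# Averaging the three cases (the dice make each role equally likely)
# gives roughly (0.0099 + 0.99 + 0.99) / 3 = 0.6633.
```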

This game has another “paradox”. If you are player A, you hope to miss your first shot. If we change the rules to allow a player to miss on purpose (and, to avoid a stalemate, let’s say that everyone dies if there are three misses in a row), the player with a 99% chance of succeeding at each shot will survive with probability 99%, while each perfect shooter will only survive with a probability of 0.5%.

It might not be surprising that the weakest player can have an advantage in three-person games: the two strongest compete against each other, and then the weakest player can attack the survivor. I think the game from the last post was more surprising. There, two players play chicken, but one of the players, let’s call him player B, cannot turn the steering wheel. If player B could turn the steering wheel, there would be two Nash equilibria, one where player A wins the most and one where player B wins the most. But when player B cannot swerve, the Nash equilibrium where A wins the most disappears, and we end up in the Nash equilibrium where B wins the most.
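To make the chicken example concrete, here is a small sketch with made-up payoff numbers (the post does not give any): a brute-force search for pure-strategy Nash equilibria, first when both players can swerve and then when B cannot.

```python
from itertools import product

# payoffs[(a, b)] = (payoff to A, payoff to B); the numbers are hypothetical
payoffs = {
    ('swerve',   'swerve'):   (0, 0),
    ('swerve',   'straight'): (-1, 1),
    ('straight', 'swerve'):   (1, -1),
    ('straight', 'straight'): (-10, -10),
}

def pure_nash(a_moves, b_moves):
    eqs = []
    for a, b in product(a_moves, b_moves):
        ua, ub = payoffs[(a, b)]
        a_best = all(payoffs[(a2, b)][0] <= ua for a2 in a_moves)
        b_best = all(payoffs[(a, b2)][1] <= ub for b2 in b_moves)
        if a_best and b_best:
            eqs.append((a, b))
    return eqs

print(pure_nash(['swerve', 'straight'], ['swerve', 'straight']))
# -> [('swerve', 'straight'), ('straight', 'swerve')]   two equilibria
print(pure_nash(['swerve', 'straight'], ['straight']))
# -> [('swerve', 'straight')]   only the equilibrium where B wins the most survives
```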

I find the next game even more surprising. This game also has two players, A and B, with A being stronger than B, but now both A and B can “do the same to the environment” and player A can choose to use his strength against player B. The game is played by two pigs. They are trained separately to press a panel in one end of their sty to get food in a feeding bowl in the other end of the sty. We then put both pigs in the sty together. We assume that the dominant pig, A, can push B away from the feeding bowl, but he cannot hurt B. If B presses the panel, A will be closer to the food bowl, and B is not strong enough to push him away, so B does not have any reason to press the panel. On the other hand, if A presses the panel, B will eat some of the food, but A can push him away. If they get enough food for each press on the panel, there will be food left for A, so he will start running back and forth between the panel and the food bowl, while B will be standing close to the food bowl all the time. If they do not get too much food for each press on the panel, B will get more food than A.
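A toy payoff calculation may make this concrete. All the numbers below are invented parameters, not data from the experiment: we simply assume that B, waiting at the bowl, eats a fixed head start before A arrives and pushes him away.

```python
def payoff_if_A_presses(food_per_press, head_start, presses=10):
    """B waits at the bowl and eats `head_start` units before A arrives and
    pushes him away; A gets whatever is left (walking effort is ignored)."""
    b_share = min(head_start, food_per_press)
    a_share = food_per_press - b_share
    return presses * a_share, presses * b_share

for food in (2, 4, 8):
    a, b = payoff_if_A_presses(food_per_press=food, head_start=1.5)
    print(f"food per press = {food}: A gets {a}, B gets {b}")
```

With little food per press, the subordinate pig B ends up with more food than A, even though A does all the work; with plenty of food per press, A still comes out ahead.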

————————————————————————————————————————

I remember having heard about the three-person Russian roulette before, but I cannot find any references now (added later: a reader pointed out that this game was mentioned here in the quiz show QI). The game with the two pigs is described in an article by Baldwin and Meese (but it is older). They tried this experiment, but the box was only 2.8 m long, so the dominant pig got the most food. I do not know if there are experiments that show that a dominant animal would do the panel pressing if it gets less food than its opponent.

Baldwin, B. A. & Meese, G. B. 1979. Social behaviour in pigs studied by means of operant conditioning. Animal Behaviour, Vol. 27, Part 3, pp. 947–957.

How to get one dollar for only a few cents

Posted May 24, 2012 by Sune Kristian Jakobsen
Categories: Uncategorized

Tags: , ,

Most of this post is a translation of my article in Famøs (a student journal at the University of Copenhagen). If you have already read that article, you might want to jump to the puzzle.

The ”Dollar Auction game” is a very simple game: An auctioneer wants to sell one dollar to the highest bidder, but there is one unusual rule in this auction: Both the highest and the second highest bidder have to pay their bid, but only the highest bidder will get the dollar. All bids have to be in multiples of one cent. What would you do in this game?

Let’s see what happens if you play this game with a lot of people. It only costs 1 cent to make the first bid, and it could earn you 1 dollar, so probably someone will make that bid. But then 2 cents for a dollar is also a good deal, so someone else bids 2 cents. Then someone bids 3 cents and so on. Now, let’s say that Alice has bid 98 cents and Bob has just bid 99 cents. If Alice stops here, Bob will get the dollar for 99 cents, so he earns 1 cent, but Alice will have to pay 98 cents. To avoid this, Alice bids 1 dollar, and if Bob stops here, Alice gets the dollar for one dollar, so she doesn’t lose anything. However, Bob doesn’t want to stop, because he would then lose 99 cents, so instead he bids $1.01, hoping that Alice stops and that he will only lose one cent. So Alice and Bob will continue bidding for a while, until one of them gives up. I have never tried this game, but there are claims that someone paid $200 for one dollar, or even $3000 for $100, so you shouldn’t try this at home… but perhaps you should try it somewhere else as the auctioneer!
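The escalation can be summarised in a few lines of code (a sketch of the myopic reasoning above; amounts are in cents and the prize is 100):

```python
PRIZE = 100

def loss_if_quit(my_bid):
    return my_bid                      # second-highest bidder pays and gets nothing

def loss_if_raise_and_win(other_bid):
    return other_bid + 1 - PRIZE       # pay other_bid + 1, receive the prize

for other_bid in (99, 100, 150, 300):
    my_bid = other_bid - 1             # say we are trailing by one cent
    print(other_bid, loss_if_quit(my_bid), loss_if_raise_and_win(other_bid))
# Quitting loses my_bid cents; raising (if the opponent then quits) always
# loses 98 cents less, so at every stage raising looks like the better option.
```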

What should you do if someone started a Dollar Auction? One strategy would be not to bid at all, but that is too boring. Another strategy is to explain the problem to everyone and then bid 1 cent, hoping that no one else bids… or at least, hoping that the probability that someone else bids is less than 99%. A third strategy is to bid 99 cents before anyone else bids. This way, no one has a reason to overbid you. There are two problems with this strategy: even in the best case, you can only earn 1 cent, and if someone in the crowd really hates you, he can bid 1 dollar just to make you lose the 99 cents!

A more interesting strategy would be to bid 1 cent and promise that you will not let anyone else get the dollar for less than $1.02. If all the others really believed you, they would not bid. But why should they believe you? If Alice bids 99 cents after you have bid 1 cent, it would probably be best for you to just break your promise. Alice would then earn one cent, and you are the only one who loses, so no one will be mad at you for breaking your promise. This leads us to a counterintuitive strategy. Make a deal with Bob:

“If someone else gets the dollar for less than $1.02, I have to pay you $3”*

After making this deal, you bid one cent. If someone, say Alice, bids 99 cents, it will be better for you to bid higher than to give Bob the $3. Alice knows this, so she will not try to outbid you.

So the strategy is to promise to give away some money under certain conditions. Intuitively, you would think that this is a bad idea, because you are restricting yourself. However, in some games it is best to make a “voluntary but irreversible sacrifice of freedom of choice”.** If you play chicken (a game where two drivers drive their cars towards each other; if you swerve you lose, and if neither of you swerves you probably get killed or injured, so that also counts as losing), you are almost sure to win if you, before the game starts, take off your steering wheel. Your opponent knows that you cannot swerve, so he will have to swerve. However, it is important to tell your opponent that you cannot swerve, otherwise it might end in disaster, as in the film Dr. Strangelove! [SL]

——————————————————————————————————————–

Puzzle: How many “essentially different” games can you find, where it is best to be the weakest/less capable player?

I know that the statement is a bit vague, but I didn’t want to make it too precise. From the above we can find one example: if you play chicken, it is an advantage not to be able to move your arms (and not to be able to turn the steering wheel in any other way), as long as your opponent knows that you cannot swerve. I consider many games to be “essentially” the same as this game, although I am not able to define the class of games that I consider to be essentially the same. I have two other essentially different games where it is an advantage to be the weakest/less capable, and I will post them next week.

* Actually, this is not a good deal, because it would be possible to use it to blackmail you. Furthermore, Bob should know that you would never let anyone else get the dollar for less than $1.02, so he would never earn the $3. A better deal would be the following (I hope!): “If someone else gets the dollar for less than $1.02, or if I make any other agreement during this game, or pay or receive money during this game or as a consequence of this game, I have to pay you $3. If you make any other deals during the game, you have to pay me $10. You get 10 cents for accepting this deal.”

**This phrase is from the Nobel prize winner Thomas Schelling [TS]. Steven Pinker gives other examples of such games in [SP, p. 408-411].

[MK] Murnighan, J. Keith. “A Very Extreme Case of the Dollar Auction.” Journal of Management Education 26, pp. 56–69, 2002.

[TS] T. C. Schelling, The strategy of conflict, Harvard University Press, 1980.

[SL] S. Kubrick, Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb [film], Columbia Pictures, 1964.

[SP] S. Pinker, How the mind works, Norton, 1997.

Economy in an infinite world

Posted September 24, 2011 by Sune Kristian Jakobsen
Categories: Uncategorized

Tags: , , ,

Suppose we have a full Hilbert’s hotel … actually, let’s make it a “Hilbert world” instead. This world consists of one building that has infinitely many rooms, numbered 0, 1, 2, \dots. In each room lives one person, and he is going to live there for his entire infinite life. In the beginning each person has 1$, but then they figure out a clever way to get richer. Each minute the person in room n gives all his money to the person in room \lfloor n/2 \rfloor, so the person in room n receives the money from rooms 2n and 2n+1. After one minute they will each have 2$, and after t minutes they will each have 2^t dollars. (This is similar to the Banach-Tarski paradox, where you can “use infinity” to turn one ball into two balls.) Of course, money would lose its value in such a world, but the inhabitants of such a world could also get an infinite amount of utility by sending food, cake, bubble wrap, whatever, instead of sending money.
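Here is a quick sanity check of the doubling claim (a sketch; it just unfolds the rule that room n receives from rooms 2n and 2n+1):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def money(room, t):
    if t == 0:
        return 1                      # everyone starts with one dollar
    # room `room` receives everything held a minute ago by rooms 2*room and 2*room+1
    return money(2 * room, t - 1) + money(2 * room + 1, t - 1)

for t in range(6):
    print(t, [money(n, t) for n in range(5)])   # every room holds 2**t dollars
```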

However, this is a very simplified model of an infinite world. Presumably, the infinitely many rooms would sit in \mathbb{R}^3. So let’s say there is a room for each point in \mathbb{Z}^3, and that you can only send money to one of your 26 neighbors, and it takes one minute to send the money. Then it is no longer possible for your wealth to grow exponentially fast. Even if everyone cooperated and tried to make the person in (0,0,0) rich, he could have at most (2t+1)^3 dollars after t minutes. This is only polynomial, but still pretty good for the person in (0,0,0). But this protocol is unfair: we want everyone to earn at least some amount \epsilon during the first t minutes. However, we can easily see that the persons in the rooms in \{-n,-(n-1),\dots , n-1,n\}^3 will in total have at most (2n+2t+1)^3 dollars after t minutes, and \frac{(2n+2t+1)^3}{(2n+1)^3} goes to 1 as n goes to infinity. This shows that we cannot use the infinite world to make everyone richer by an epsilon in finite time. (If you know about amenability, you may notice that the sequence of sets \{-n,-(n-1),\dots , n-1,n\}^3 is a Følner sequence, so the comparison to the Banach-Tarski paradox goes deeper than just “doubling something using some infinite trick that is not possible in the real world”.)
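A quick numerical look at this bound (nothing more than evaluating the ratio above):

```python
# Total money available to the cube {-n, ..., n}^3 after t minutes, divided by
# the number of its rooms: as n grows the average tends to 1, so the average
# inhabitant cannot get richer by any fixed epsilon in a fixed time t.
def average_bound(n, t):
    return (2 * n + 2 * t + 1) ** 3 / (2 * n + 1) ** 3

t = 10
for n in (10, 100, 1000, 10000, 100000):
    print(n, average_bound(n, t))
```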

So, let’s say you are a God and you want to construct a world, with the following restrictions:

  • You first build the world and afterwards you can’t change anything.
  • Each person starts in a room (and doesn’t move), each person has a finite number of neighbors, and he can send any amount of information and money to his neighbors. Each transaction takes one minute.
  • Each person starts with 1$, and all he cares about is money.
  • Everyone cooperates as long as he knows that he will get rich by doing so (even if he could become rich faster by cheating).
  • When the world starts, you can explain a protocol to everyone in the world using loudspeakers (everyone will get the same message).

Your objective is to make everyone’s wealth grow exponentially fast.

One way to do this is to give the world the structure of the Cayley graph of the free group on two generators (if all the rooms have the same positive volume and the distance between neighboring rooms has to be bounded, it is not possible to embed this in \mathbb{R}^3, but you are God, so you don’t care about \mathbb{R}^3). Now each person has 4 neighbors. E.g. aba^{-1} has aaba^{-1}, baba^{-1}, a^{-1}aba^{-1}=ba^{-1}, and b^{-1}aba^{-1}, and \epsilon has a, b, a^{-1} and b^{-1}. The protocol is that everyone except \epsilon should each minute send all his money to the neighbor that is closer to \epsilon. This way everyone (besides \epsilon) will receive money from 3 persons, so at time t they will have 3^t dollars. The person at \epsilon will receive money from 4 persons and won’t pay anyone, so he will become rich even faster.
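A small check of the 3^t claim (a sketch exploiting the symmetry of the tree: every vertex other than \epsilon has exactly three neighbors farther from \epsilon, and by symmetry they all hold the same amount):

```python
def money_non_identity(t):
    if t == 0:
        return 1
    return 3 * money_non_identity(t - 1)   # receive from the three outer neighbors

def money_identity(t):
    if t == 0:
        return 1
    # epsilon keeps its own money and receives from its four neighbors
    return money_identity(t - 1) + 4 * money_non_identity(t - 1)

for t in range(6):
    print(t, money_non_identity(t), money_identity(t))   # 3**t, and even more at epsilon
```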

So you are a God, you have just created this fantastic world with infinitely many people, and you are just about to announce your strategy for how everyone can become rich in no time without working, when you discover that you have made a terrible mistake: you forgot to label the doors. In fact, there is no way to distinguish any two rooms, so now the inhabitants of your world can’t know in which direction they should send their money.
If only you could tell one person that he was the chosen one (=\epsilon), he could then tell his neighbors, and they would give him money each minute. They would then tell their neighbors, and so on (remember, no one would lie, since they know they will get rich even if they are honest). This way it would take some time for people far away from the chosen one before they start getting richer, but when they do, their wealth will grow exponentially.

Unfortunately, you have no way to communicate with only one person: you can only use the loudspeakers. You did, however, remember to give them a die each. Is there a protocol that would make everyone’s wealth grow exponentially with probability 1?

Edit: I asked this question here on MathOverflow. The answer is no. It turns out to be a special case of the “Mass Transport Principle”. It can be found on page 283 in this book, which is available online (page 293 in the pdf file).

My bachelor thesis: Barriers to proving P != NP

Posted June 11, 2011 by Sune Kristian Jakobsen
Categories: Uncategorized

Tags: , , ,

I have just finished my bachelor thesis, Barriers to proving P != NP. I hope that some of you will find it interesting.

Abstract:

This thesis is about the \text{P} \stackrel{?}{=}\text{NP} problem and why it is so difficult to solve. We consider two different techniques from complexity theory, see how they can be used to separate complexity classes, and why they do not seem to be strong enough to prove \text{P}\neq \text{NP}. The first technique is diagonalization, which we will use to prove the time hierarchy theorem. We will then see why a result of Baker, Gill and Solovay seems to indicate that this technique cannot prove \text{P}\neq \text{NP}. We will see how the circuit model can be used to prove PARITY\notin AC^0 and see a result by Razborov and Rudich, which indicates that this approach is also too weak to prove \text{P}\neq\text{NP}. Finally, we will see that provability and computability are closely related concepts and show some independence results.

Not abstract:

I wish I had had enough time to include algebrization.

Cauchy’s functional equation II

Posted December 21, 2010 by Sune Kristian Jakobsen
Categories: Uncategorized

Tags: ,

In this post I will finish what I began in my previous post. I have collected the two posts in a pdf-file, which might be a bit easier to read. Again you are welcome to comment on both math and language.

4: More wild facts about non-linear solutions

A function can have a graph that is dense in the plane, but still be ‘nice’ on a large part of the domain. E.g. there exist functions that are 0 at all irrational numbers, but still have a graph that is dense in the plane. Inspired by this, here is a list of things that would make a function wild even in a measure-theoretic sense. Again, the list is ordered such that each property implies the one above it, and again all these statements are true for any non-linear solution to Cauchy’s functional equation.

  • f is not measurable. (When I write measurable function and measurable set, I always mean Borel-measurable.)
  • |f| is not dominated by any measurable function.
  • If A is a measurable set such that f is bounded on A, then A has measure 0.
  • If A is a measurable set such that f is bounded above on A, then A has measure 0.
  • If B\subset \mathbb{R} is open and non-empty and A\subset \mathbb{R} is measurable and f(A)\cap B= \emptyset, then A has measure zero.

I will prove the last one.

Proof: Assume for contradiction that there is a non-linear function f satisfying Cauchy’s functional equation, a non-empty open set B\subset \mathbb{R}, and a measurable set A\subset \mathbb{R} with positive measure, such that f(A)\cap B=\emptyset. Any open set contains an open interval, so without loss of generality, we can assume that B is an open interval. We assumed that m(A)>0, where m(X) denotes the measure of a set X\subset \mathbb{R}, and since measures are countably additive, there must be an interval I of length 1 with m(A\cap I)>0, so without loss of generality, we assume that A is contained in an interval of length 1. To reach a contradiction, I will show that under these assumptions, there exist two sequences, A_n and B_n, of subsets of \mathbb{R} satisfying:

  • Each A_n is a subset of an interval of length 1.
  • m(A_n)\geq m(A)>0 for each n.
  • Each B_n is an open interval.
  • The length of B_n tends to infinity as n\to\infty.
  • f(A_n)\cap B_n=\emptyset.

Since f(A)\cap B = \emptyset and f(nx)=nf(x) for n\in\mathbb{N} we have f(nA)\cap nB = \emptyset for n\in\mathbb{N}. We assumed that A is contained in an interval of length 1, so nA is contained in an interval of length n. Furthermore, m(nA)=n\cdot m(A) so using the pigeonhole principle we can find an interval I_n of length 1 such that m(nA\cap I_n)\geq m(A). We see that f(nA\cap I_n)\cap nB =\emptyset so I choose A_n=nA\cap I_n and B_n=nB.

 

I will now use the existence of A_n and B_n to reach a contradiction. For each n, let a_n be a number such that A_n\subset [a_n,a_n+1], and let b_n and c_n be numbers such that B_n=(b_n,c_n). We know that the graph of f is dense in the plane, so we can find some x_n\in (a_n-1,a_n) with \frac{3b_n+c_n}{4}<f(x_n)<\frac{b_n+3c_n}{4}. Now the sequences C_n=A_n-x_n and D_n=B_n-f(x_n) satisfy the above five requirements for A_n and B_n, and furthermore C_n\subset [0,2] and the lower and upper endpoints of D_n tend to minus infinity resp. plus infinity. Hence for every y\in \mathbb{R} there is some N\in \mathbb{N} such that y\in D_n for all n\geq N; in particular, for every x, f(x)\in D_n for all large n. But f(C_n)\cap D_n=\emptyset, so x\notin C_n for all large n, and the sequence of indicator functions 1_{C_n} converges pointwise to the 0-function. It is dominated by 1_{[0,2]}, which has integral 2, so the dominated convergence theorem tells us that \int 1_{C_n}=m(C_n) tends to 0 as n\to\infty. But this contradicts m(C_n)\geq m(A)>0. QED.

 

Assuming the axiom of choice we can find a discontinuous solution to Cauchy’s functional equation, and if we let B in the above be the set of positive real numbers, we see that any measurable set A with

A\subset \{x|f(x)\leq 0\}

must have measure 0. But similarly, any measurable set A with

A\subset \mathbb{R}\setminus \{x|f(x)\leq 0\}=\{x|f(x)> 0\}\subset \{x|f(x)\geq 0\}

must also have measure 0. Thus we have found a set such that neither the set nor its complement contains a measurable subset with positive measure.

5: Extra wild functions

We have seen that the graph of a non-linear solution is in many ways ‘spread out all over the plane’. But there are some ways to interpret ‘spread out all over the plane’ for which this is not true for all solutions. E.g. let (x_i)_{i\in I}, x_i\in \mathbb{R} be a Hamel basis. Now the function

f\left(\sum_{i\in I}q_ix_i\right)=\sum_{i\in I}q_i, \quad q_i\in \mathbb{Q}, |\{i\in I|q_i\neq 0\}|<\infty

is a solution to Cauchy’s functional equation, but f(x) is always rational. However, there exist surjective solutions, so in some ways, some solution functions are ‘wilder’ than others. As usual, I will give a list of wild properties a function can have, and as usual the list is ordered such that each property implies the one above it. Unlike for the other lists, there are some solution functions that do not satisfy any of the properties and some that satisfy all of them. Moreover, for any two properties on the list, there exist solutions that satisfy the upper of the two but not the lower one.

  • f is surjective.
  • The graph of f intersects any line in the plane.
  • For any continuous function c:\mathbb{R}\to \mathbb{R} there is an x\in \mathbb{R} with f(x)=c(x).
  • For any continuous function c:\mathbb{R}\to \mathbb{R} the set \{x\in\mathbb{R}|f(x)=c(x)\} is dense in \mathbb{R}.
  • For any continuous function c:\mathbb{R}\to \mathbb{R}, any measurable set A\subset \mathbb{R} with \{x\in A|f(x)= c(x)\}=\emptyset has measure 0.

First I will prove that there is a solution f satisfying the third property. The proofs of the existence of solutions satisfying the last two properties are similar, and I will sketch them afterwards.

 

Proof: We begin by choosing a Hamel basis (x_i)_{i\in I} and well-ordering I. That is, we find an ordering on I such that any subset of I has a least element. The existence of such an ordering (on any set) is equivalent to the axiom of choice. The set \{x_i|i\in I\} is a subset of \mathbb{R}, so the cardinality of this set is not greater than the cardinality of \mathbb{R}. Since x_i\neq x_j for i\neq j, we get that |I|\leq |\mathbb{R}|. Assume for contradiction that |I|<|\mathbb{R}|. Using the rules for calculating with cardinalities we know that |I\times \mathbb{Q}|=\max(|I|,|\mathbb{Q}|) and more generally |(I\times \mathbb{Q})^n|=\max(|I|,|\mathbb{Q}|,\dots,|I|,|\mathbb{Q}|)=\max(|I|,|\mathbb{Q}|). Since any element of \mathbb{R} is a linear combination of the x_i‘s over \mathbb{Q}, for any real number y there is an n\in \mathbb{N} and ((i_1,q_1),\dots, (i_n,q_n)), i_k\in I, q_k\in\mathbb{Q}, such that

y=\sum_{k=1}^nq_kx_{i_k}

Hence

|\mathbb{R}|\leq \left|\bigcup_{n=1}^{\infty}(I\times \mathbb{Q})^n\right|=|\mathbb{N}|\cdot \max(|I|,|\mathbb{Q}|)=\max(|\mathbb{N}|,|I|,|\mathbb{Q}|)<|\mathbb{R}|

Since this contradiction was reached from the assumption that |I|<|\mathbb{R}|, we conclude that |I|=|\mathbb{R}|.

 

Now, let’s see how many continuous functions c:\mathbb{R}\to\mathbb{R} there are. A continuous function is uniquely determined by its values on the rational numbers, so |C|\leq |\mathbb{R}^{\mathbb{Q}}|=|\mathbb{R}|, where C is the set of continuous functions. On the other hand, the constant functions are continuous, and there are |\mathbb{R}| of them, so |C|\geq |\mathbb{R}|. Hence |C|=|\mathbb{R}|=|I|, so we can index the set of continuous functions by the set I, so that C=\{c_i|i\in I\}. We can now define f(x_i)=\lambda_i=c_i(x_i) to make sure that the equation f(x)=c_i(x) has a solution. Now f is given by

f\left(\sum_{i\in I}q_ix_i\right)=\sum_{i\in I}q_i\lambda_i \quad q_i\in \mathbb{Q}, |\{i\in I|q_i\neq 0\}|<\infty.

QED.

 

If we want the set of solutions to f(x)=c(x) to be dense in \mathbb{R}, it is a bit more complicated. The idea is that instead of using the set I to index the set of continuous functions, we use it to index the set of continuous functions times the set of open intervals. Unfortunately we cannot be sure that x_i is in the open interval corresponding to i. Instead we start by defining f(1)=0. Now we know that for each i there is a q_i\in\mathbb{Q} such that q_i+x_i is in the open interval corresponding to i, and we define f(x_i)=c_i(q_i+x_i), so that f(q_i+x_i)=q_if(1)+f(x_i)=c_i(q_i+x_i).

 

If we want to show that there exist solution functions with the last property, it is much more complicated: here we need transfinite induction, because we need to choose the elements of the Hamel basis one at a time. We know that the set of (Borel-)measurable sets has the same cardinality as \mathbb{R}, thus the set of functions c:A\to \mathbb{R} that can be obtained as a continuous function \mathbb{R}\to\mathbb{R} restricted to a measurable set A with positive measure has cardinality |\mathbb{R}\times \mathbb{R}|=|\mathbb{R}|. Now we index this set by a set I. Using the axiom of choice, we can well-order I, and we can even choose the ordering such that |\{j\in I|j<i\}|<|I|=|\mathbb{R}| for all i\in I. Now for each i\in I we choose x_i and \lambda_i such that x_i is in the domain of c_i, \lambda_i=c_i(x_i) (so that f(x_i)=c_i(x_i)), and the x_i‘s are linearly independent over \mathbb{Q}.

 

To show that this is possible, I only need to show that when we have chosen x_j for all j < i, we can choose x_i such that x_i is linearly independent of \{x_j|j<i\} over \mathbb{Q} and in the domain of c_i. The rest follows by transfinite induction. We know that measurable sets with positive measure are uncountable, so if we assume the continuum hypothesis (a statement independent of ZFC: it states that no set has cardinality strictly between |\mathbb{N}| and |\mathbb{R}|), any measurable set with positive measure has the same cardinality as \mathbb{R}. (It is still true without the continuum hypothesis, but it is more difficult to prove. See [BS].) We know that |\{x_j|j<i\}|<|\mathbb{R}|, so the cardinality of the linear span over \mathbb{Q} of this set is also less than |\mathbb{R}|, since you cannot reach |\mathbb{R}| by taking countable unions and finite products of sets of smaller cardinality. (In general, still assuming the axiom of choice, you cannot get a set with some infinite cardinality \kappa by taking finite products of sets with smaller cardinality, or by taking a union of \mu<\kappa sets with smaller cardinality.) Since the domain of c_i has the same cardinality as \mathbb{R}, we can choose an element x_i in the domain of c_i and not in the linear span of \{x_j|j<i\}.

 

By transfinite induction, we have now chosen x_i‘s that are linearly independent over \mathbb{Q}. However, we cannot be sure that they span all of \mathbb{R}. So we end by using the axiom of choice once again to extend the set \{x_i|i\in I\} to a Hamel basis, and we set the rest of the \lambda‘s to be zero. This gives us a solution to Cauchy’s functional equation whose graph intersects the graph of any continuous function restricted to any measurable set with positive measure.

References

[BS]: James M. Briggs and Thomas Schaffter. Measure and Cardinality. The American Mathematical Monthly, Vol. 86, No. 10, pp. 852-855.

[EH]: Ernst Hansen. Measure Theory. Department of Mathematical Sciences University of Copenhagen. 2009.

[MO]: mathoverflow: Do sets with positive lebesgue measure have same cardinality as R?

Cauchy’s functional equation I

Posted December 20, 2010 by Sune Kristian Jakobsen
Categories: Uncategorized

Tags: ,

Next semester, I’m going to write my undergraduate thesis about the P\neq NP problem, and right now I’m trying to decide whether I should write it in Danish or in English. So I decided to translate a few pages about Cauchy’s functional equation that I wrote in Danish last year. Today I’ll post the first half, and I will post the rest in a few days (update: It’s here. Here you can also find the notes in pdf.). If you have anything to say about the mathematics in this post or about my English, please post a comment.

Cauchy’s functional equation

Cauchy’s functional equation, f(x+y)=f(x)+f(y), f:\mathbb{R}\to\mathbb{R}, looks very simple, and it has a class of simple solutions, f(x)=\lambda x, but there are many other and more interesting solutions. In these notes, I will show you what some of these “wild” solutions look like, and I will use them to prove that there exists a set A\subset \mathbb{R} such that neither A nor \mathbb{R}\setminus A contains a measurable subset with positive measure. Section 1 is about Cauchy’s functional equation on the rational numbers, in section 2 I show that there are some wild solutions on \mathbb{R}, and in section 3 I will show that their graphs are dense in \mathbb{R}^2. In section 4 I’ll show that these functions are ugly from a measure-theoretic point of view, and in section 5, I’ll show that some of these functions are wilder than others. E.g., I will prove that there is a solution to Cauchy’s functional equation whose graph intersects the graph of any continuous function from \mathbb{R} to \mathbb{R}.

1: The simple solutions

First, we consider the equation over the rational numbers, that is,

f(x+y)=f(x)+f(y),\quad f:\mathbb{Q}\to\mathbb{Q}.

By setting x=y=0 we get f(0)=2f(0) and thus f(0)=0. Let’s set \lambda = f(1). If f(n)=\lambda n we get

f(n+1)=f(n)+f(1)=\lambda n+\lambda=\lambda (n+1).

By definition of \lambda, we have f(n)=\lambda n for n=1, so by induction f(n)=\lambda n for all n\in\mathbb{N}. More generally, we can prove that for x\in \mathbb{Q} and n\in\mathbb{N} we have f(nx)=nf(x): it is clearly true for n=1, and if it is true for n we get

f((n+1)x)=f(nx+x)= f(nx)+f(x)=nf(x)+f(x)=(n+1)f(x).

Let x be a positive rational number, and write it as x=\frac{n}{m}, where n,m\in \mathbb{N}. Now

mf(x)=mf(n/m)=f(n)=nf(1)=\lambda n.

Dividing by m we get f(x)=f(n/m)=\lambda(n/m)=\lambda x. Furthermore, 0=f(0)=f(x+(-x))=f(x)+f(-x), so f(x)=-f(-x). Putting it all together, we have f(x)=\lambda x for all x\in \mathbb{Q}. It is easy to verify that f(x)=\lambda x is a solution to the general equation on \mathbb{R}.

2: Existence of wild solutions

Now consider Cauchy’s functional equation on the real numbers,

f(x+y)=f(x)+f(y),\quad f:\mathbb{R}\to\mathbb{R}.

The proof from the last section tells us that f(q)=qf(1) for all rational numbers q, and using the same idea, we can prove that f(qx)=qf(x) for all q\in \mathbb{Q} and x\in \mathbb{R}. But this does not imply that f(x)=xf(1) for all real numbers. However, if we assume that f is continuous, we can show that f(x)=\lambda x for all x\in \mathbb{R}: we simply choose a sequence (x_i)_{i\in \mathbb{N}} of rational numbers that converges to x. By continuity we get

f(x)=f\left(\lim_{i\to\infty}x_i\right)=\lim_{i\to\infty}f(x_i)=\lim_{i\to\infty}\lambda x_i=\lambda x.

But it is much more fun if we do not have any assumptions on f! Using the axiom of choice we can find non-continuous solutions. The idea is: a priori we only know that f(0)=0. Now we choose some value for f, e.g. f(1)=3. This determines f on all the rational numbers, f(q)=3q for q\in \mathbb{Q}, but the value of f is not determined at any irrational number. So we make another choice, let’s say f(\sqrt{2})=\pi. Now the functional equation tells us that f(q_1+q_2\sqrt{2})=3q_1+\pi q_2 for all q_1,q_2\in \mathbb{Q}. But for numbers x not of this form, we cannot determine the value of f(x). So we simply continue by choosing more and more values of the function. Unfortunately, we have to make infinitely many choices, so we need the axiom of choice. In the rest of these notes, I will assume the axiom of choice.

To formalize the above, we consider the set of real numbers as a vector space over \mathbb{Q}, in much the same way as you can consider \mathbb{C} to be a two-dimensional vector space over \mathbb{R}. An important difference is that when we consider \mathbb{R} to be a vector space over \mathbb{Q}, it is infinite dimensional: it even has uncountably many dimensions. We now use the axiom of choice to choose a basis (x_i)_{i\in I}, x_i\in \mathbb{R} (a so-called Hamel basis), and we choose some coefficients (\lambda_i)_{i\in I},\lambda_i\in \mathbb{R}. This defines a linear map from this vector space to itself:

f\left(\sum_{i\in I}q_ix_i\right)=\sum_{i\in I}q_i\lambda_i,

where the q_i‘s are rational numbers, and only finitely many of them are non-zero. I called this function ‘linear’, so it sounds like it is a nice function. But it is not! It is only linear when we consider \mathbb{R} as a vector space over \mathbb{Q} and forget about the rest of the structure on \mathbb{R}. This function is only linear in the usual sense on \mathbb{R} if \frac{\lambda_i}{x_i} is the same for all i. All functions of this form are solutions to Cauchy’s functional equation, and conversely, all solutions to Cauchy’s functional equation are of this form.
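As a tiny numerical illustration of these choices (a sketch only; the full construction of course needs the axiom of choice), here is the partial solution determined by f(1)=3 and f(\sqrt{2})=\pi on the numbers q_1+q_2\sqrt{2}:

```python
import math
from fractions import Fraction

def f(q1, q2):
    """f(q1 + q2*sqrt(2)) for rational q1, q2, given f(1) = 3 and f(sqrt(2)) = pi."""
    return 3 * q1 + math.pi * q2

x = (Fraction(1, 2), Fraction(3, 1))    # represents 1/2 + 3*sqrt(2)
y = (Fraction(-2, 1), Fraction(1, 4))   # represents  -2 + (1/4)*sqrt(2)
print(f(*x) + f(*y))                        # f(x) + f(y)
print(f(x[0] + y[0], x[1] + y[1]))          # f(x + y): the same number (up to rounding)
print(f(1, 0) / 1, f(0, 1) / math.sqrt(2))  # 3.0 vs 2.22...: no single slope lambda
```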

3: What do I mean by “wild”?

A function can be more or less wild/ugly/pathological. Here is a list of possible definitions of what makes a function f wild. The list is ordered such that each property implies the one above it.

  • f is discontinuous.
  • f is discontinuous at every point of \mathbb{R}.
  • f is unbounded on any open interval.
  • f is unbounded above on any open interval.
  • the graph of f, defined by \{(x,f(x))|x\in\mathbb{R}\}\subset \mathbb{R}^2, is dense in \mathbb{R}^2, that is, any point in \mathbb{R}^2 is the limit of a sequence of points in the graph of f.

All of these statements are true for any non-linear solution to Cauchy’s functional equation. I will show that the last one of these is true.

Proof: Let f be a non-linear solution. If (x_1,y_1) and (x_2,y_2) are points in the graph of f, and q is a rational number, we see that the points (x_1,y_1)+(x_2,y_2)=(x_1+x_2,y_1+y_2) and q(x_1,y_1)=(qx_1,qy_1) are both in the graph too. In words, any linear combination over \mathbb{Q} of points in the graph is also in the graph. Since f is non-linear, we can find real numbers x_1 and x_2, both non-zero, such that \frac{f(x_1)}{x_1}\neq \frac{f(x_2)}{x_2}. Now the two vectors (x_1,f(x_1)) and (x_2,f(x_2)) are linearly independent (over \mathbb{R}), so they span the plane. That is, any point (x,y) can be written as a(x_1,f(x_1))+b(x_2,f(x_2)) for some a,b\in \mathbb{R}. Let q_i and r_i be sequences of rational numbers with \lim_{i\to \infty}q_i=a and \lim_{i\to\infty}r_i=b. Now q_i(x_1,f(x_1))+r_i(x_2,f(x_2)) is a sequence of points in the graph converging to (x,y), so the graph is dense in \mathbb{R}^2.
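The proof can be turned into a small numerical experiment (a sketch, reusing the partial solution with f(1)=3 and f(\sqrt{2})=\pi from above): rational combinations of the graph points (1,3) and (\sqrt{2},\pi) approach any target point in the plane.

```python
import math
from fractions import Fraction

SQRT2, PI = math.sqrt(2), math.pi

def approximate(target_x, target_y, max_denominator):
    # solve a*(1, 3) + b*(sqrt(2), pi) = (target_x, target_y) over the reals
    det = PI - 3 * SQRT2
    a = (PI * target_x - SQRT2 * target_y) / det
    b = (target_y - 3 * target_x) / det
    # replace the real coefficients by nearby rationals
    q = Fraction(a).limit_denominator(max_denominator)
    r = Fraction(b).limit_denominator(max_denominator)
    return (q + r * SQRT2, 3 * q + PI * r)    # a point on the graph of f

for d in (10, 1000, 100000):
    print(approximate(math.e, -5.0, d))       # approaches the target (e, -5)
```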

Functions with exactly one stationary point

Posted August 9, 2010 by Sune Kristian Jakobsen
Categories: Uncategorized

Let f:\mathbb{R}^n\rightarrow \mathbb{R} be a continuously differentiable function with exactly one stationary point x, and suppose that this point is a local minimum. If n=1 it is easy to see that x must be a global minimum, but what if n\geq 2?

The answer is written in white. Highlight it to read it.

No, x doesn’t have to be a global minimum. Consider the function f(x,y)=(e^y+e^{-y^2})(-2x^3+3x^2)-e^{-y^2} (I don’t know how to make latex white, so I wrote it in plain text instead, sorry). We see that (e^y+e^{-y^2}) is positive, so df/dx=0 only if x=0 or x=1. For x=1 the function is e^y and doesn’t have any stationary points. For x=0 the function is -e^{-y^2}, and (0,0) is a stationary point. The point is a local minimum, since for x < 1 we have f(x,y)>=f(0,y)>=f(0,0)=-1, but it is not a global minimum since f(2,0)=-9. Now the next question is: what if f is a polynomial? I don’t know the answer.
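Here is a quick symbolic check of this example (a sketch using sympy; it only verifies the computations in the answer above):

```python
import sympy as sp

x, y = sp.symbols('x y', real=True)
f = (sp.exp(y) + sp.exp(-y**2)) * (-2*x**3 + 3*x**2) - sp.exp(-y**2)

fx, fy = sp.diff(f, x), sp.diff(f, y)
print(fx.subs({x: 0, y: 0}), fy.subs({x: 0, y: 0}))   # both 0: (0, 0) is stationary
print(f.subs({x: 0, y: 0}))                           # -1, the local minimum value
print(f.subs({x: 2, y: 0}))                           # -9, so (0, 0) is not a global minimum
```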

IMO 2010

Posted July 23, 2010 by Sune Kristian Jakobsen
Categories: Uncategorized

Tags: ,

This year was a great year for the Danish IMO team: we (I say “we” because I was deputy leader for the team) got our first gold medal, and as a team we beat our previous record, 77 points, by 13 points. In particular I was happy to see that we, together with China and the US, were the best countries on problem 5 (a problem that Tao thought was more challenging and interesting than problem 6). Denmark is usually in the second half of the country list, and even in a good year like this, we only beat 54% of the teams, so this is really remarkable. One reason that we got that many points on problem 5 is that it was a problem in combinatorics, and Denmark is usually good at combinatorics. Another reason is that problem 4 was a geometry problem, and Danes usually hate geometry! (Remember that you get problems 1, 2, and 3 on day 1 and problems 4, 5, and 6 on day 2.)

As you can see in the above link, Tao started a mini-polymath project about problem 5. So here you can see how a group of mathematicians worked together to solve that problem (and see the problem statement). I decided to ask some of the Danish contestants for a description of how they solved the problem, so that future contestants can see how a single person thinks. Here is Anders Eller Thomsen’s description. Edit: And here is Mathias Bæk Tejs Knudsen’s.

Prove a more general theorem

Posted July 20, 2010 by Sune Kristian Jakobsen
Categories: Problem solving

Tags: , ,

Often when you try to prove something, it is easier to prove a stronger statement. Many of the examples of this are statements that are proven by induction. E.g. define the sequence of functions f_n:\mathbb{R}\to\mathbb{R} by f_1(x)=x^2-1 and f_n(x)=f_{n-1}(f_1(x)) for n\geq 2. Prove that f_n has n+1 distinct real roots (inspired by problem 2, day 1, IMC 2004).
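A quick computational check of the claim for small n (a sketch using sympy; it counts distinct real roots of the composed polynomials):

```python
import sympy as sp

x = sp.symbols('x')
f1 = x**2 - 1

fn = f1
for n in range(1, 5):
    distinct_real_roots = set(sp.real_roots(sp.Poly(fn, x)))
    print(f"f_{n}: degree {sp.degree(fn, x)}, "
          f"{len(distinct_real_roots)} distinct real roots (claim: {n + 1})")
    fn = fn.subs(x, f1)   # f_{n+1}(x) = f_n(f_1(x))
```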

When solving this problem, it helps to find a stronger statement, that is true for all n, because that gives you a stronger induction hypothesis, so that you can make the induction step. But sometimes, this is not enough, and you have to “invent” (or should I say discover?) other cases. Let me give an example of this:

Problem 5, day 1, IMC 1999: Suppose that 2n points of an n\times n grid are marked. Show that for some k > 1 one can select 2k distinct marked points, say a_1,...,a_{2k}, such that a_{2i-1} and a_{2i} are in the same row, a_{2i} and a_{2i+1} are in the same column, \forall i, indices taken mod 2k.

We have 2n points and n rows and n columns, so on average there are 2 points in each row and in each column. If we knew that there were at least two points in each row and column, the problem would be easy: we could just start at one point, go to a point in the same row, then to a point in the same column, and so on. Continue to do this (you can do that, because there are at least two points in each row and column) until you hit a point you have visited before, and you have a loop (if you both began and ended with a “row move” or with a “column move”, you can just “jump over” the first point). But unfortunately, there could be rows or columns with only one point, or no points at all. If there is both a row and a column with at most 1 point in each, we can delete this row and column, and we have a problem with an (n-1)\times (n-1) grid and at least 2(n-1) marked points. This gives us hope that we can prove the statement by induction. But what if all rows contain 2 marked points, but some columns contain only 1? If we delete such a column, we would get an n\times (n-1) grid with at least 2n-1 marked points, so this suggests that we should try to prove something stronger:

Stronger statement: Let n+m points of an n\times m grid be marked. Now for some k > 1 you can select 2k distinct marked points, a_1,...,a_{2k}, such that a_{2i-1} and a_{2i} are in the same row, a_{2i} and a_{2i+1} are in the same column, \forall i, indices taken mod 2k.

Proof: By induction on n and m. The statement is true if n=1 or if m=1, because you can’t mark m+1 points in a 1\times m grid. If we have two points in every row and column, we can just use the above argument; if not, we delete a row or column with at most one point, and thereby reduce the problem to a smaller one.

Here is another IMC problem, you can try to solve:

Problem 5, day 1, IMC 2004: Let S be a set of \displaystyle { 2n \choose n } + 1 real numbers, where n is a positive integer. Prove that there exists a monotone sequence \{a_i\}_{1\leq i \leq n+2} \subset S such that
|a_{i+1} - a_1| \geq 2 |a_i - a_1|, \quad \text{for all } i=2,3,\ldots, n+1

Update: Here are two MathOverflow questions about this subject, where you can find more examples

A solution to IMC09 problem 2

Posted May 7, 2010 by Sune Kristian Jakobsen
Categories: Problem solving

Tags: , ,

In this post I will explain how I solved the following problem:

Problem: Let A, B, and C be real square matrices of the same size, and suppose that A is invertible. Prove that if (A-B)C=BA^{-1} then C(A-B)=A^{-1} B.

It is possible to prove this in three lines, but I don’t think that anyone would learn anything about problem solving from just seeing the proof. Instead I want to describe how I solved it. In order to make it easier to read, I decided to write the description in the present tense and to include the reader, so instead of “Then I tried to” I write “Now we try to”. I solved this problem about two months ago, so I probably had a lot of thoughts that I have forgotten about by now.

At first this problem seems a bit confusing, so we’ll try to get a better understanding of it. It is a problem from a contest, so unlike when you’re doing research, you don’t have to worry about the possibility that the statement could be false. The assumption is that A, B, and C are real square matrices, so the first problem is to decide whether this is important. What if A, B, and C are complex matrices? Or more generally: what if A, B, and C are elements of a ring? (If you don’t know what a ring is, don’t worry. All I’m asking is: is it possible to prove the statement using only algebraic rules like (A+B)C=AC+BC?) It is easy to state the problem for elements of a ring, and it seems unlikely that it should be false for rings, given that it is true for real matrices, so let’s try to work on this conjecture:

Conjecture: Let A, B, and C be elements of a ring R, and suppose that A is invertible. Now (A-B)C=BA^{-1} implies C(A-B)=A^{-1}B.

In other words, you shouldn’t think of the matrices as sets of real numbers arranged in arrays, but only of their algebraic properties. The problem is still a bit difficult to get your head around, so we’ll try to “turn off” some of the difficulties. The three most obvious difficulties are A, B and C, so let’s see what happens if we let one of them be the identity. We get three new problems:

  1. (I-B)C=B \Rightarrow C(I-B)=B
  2. (A-I)C=A^{-1} \Rightarrow C(A-I)=A^{-1}
  3. A-B=BA^{-1} \Rightarrow A-B=A^{-1}B

Proving one or more of these won’t solve the original problem, but it might give an idea of how to solve it. (You could also set two of the three matrices to be equal to I, but the resulting statements are too trivial to be interesting). Let’s look at the first of the three statements:

(I-B)C=B \Rightarrow C(I-B)=B

That is, if (I-B)C=B then I-B and C commute. How can we prove that two elements of a ring commute? Well, if their product is the identity, we know that they are each other’s inverses, and therefore commute (Edit: This is not true in a general ring, but it is true for matrices. See my comment). Unfortunately the right-hand side is B, and if we assume B=I the statement is trivial. So instead we assume B to be invertible and multiply by B^{-1}. Remember that right now we are not trying to give a formal proof of anything, but only to get some ideas of what a proof might look like, so we are free to add this “niceness” assumption about B. Assume also that (I-B)C=B. Now we get:

(I-B)CB^{-1}=BB^{-1}=I

That is, I-B is the inverse of CB^{-1}, so they commute and we have:

CB^{-1}(I-B) =I \Rightarrow C(B^{-1}-I)= I \Rightarrow C(I-B)B^{-1}=I \\ \Rightarrow C(I-B)=B

We have now proved the statement in the case where A=I and B is invertible. Now it’s natural to try the same idea in the B=I-case. Without any further assumptions we get:

(A-I)C=A^{-1}\Rightarrow (A-I)CA=I\Rightarrow CA(A-I)=I\Rightarrow \\ C(A^2-A)=I \Rightarrow C(A-I)A=I \Rightarrow  C(A-I)=A^{-1}

As we wanted. The final case is equivalent to showing that if A-B=BA^{-1} then B and A^{-1} commute. We can’t just use the same trick as before, because we don’t know that A and B commute. But we have learned one important lesson from the two other cases: In order to show that two matrices commute, it is useful to find two matrices that are each other’s inverse, and use that they commute. So we add the identity on both sides and move everything else to the left hand side, and look for a factorization:

A-B=BA^{-1}\Rightarrow A-B-BA^{-1}+I=I\Rightarrow \\ (A-B)(A^{-1}+I)=I\Rightarrow  (A^{-1}+I)(A-B)=I\Rightarrow \\I-A^{-1}B+A-B=I\Rightarrow A-B =A^{-1}B

Now that we have solved the three simpler problems, we can look back at the original conjecture.

(A-B)C=BA^{-1}\Rightarrow C(A-B)=A^{-1}B

There are two ways you can use a proof of a simpler case to prove a more general theorem: The first is to use the fact that the simpler case is true. If we make the assumption that C is invertible, this technique is actually useful. We get:

(A-B)C=BA^{-1}\Rightarrow AC-BC=BC(AC)^{-1}\Rightarrow \\AC-BC=(AC)^{-1}BC\Rightarrow  A-B=C^{-1}A^{-1}B\Rightarrow \\ C(A-B)=A^{-1}B

Where we use that A-B=BA^{-1} \Rightarrow A-B=A^{-1}B with AC and BC as A and B. Now we have proved the statement under the assumption that C is invertible, but if we don’t have this assumption, we have to do something else. (We could use a limit argument, like Gowers did in a comment to my last post, but I didn’t think of this trick when I solved the problem).

I mentioned that there are two ways of using a proof of a special case to prove a more general theorem, but I only gave one. The other one is to modify your proof to cover the more general theorem. So let’s look back at the proof that A-B=BA^{-1} \Rightarrow A-B=A^{-1}B, and try to modify it to show that (A-B)C=BA^{-1}\Rightarrow C(A-B)=A^{-1}B. Now the proof almost writes itself:

(A-B)C=BA^{-1}\Rightarrow (A-B)C-BA^{-1}+I=I\Rightarrow \\ (A-B)(A^{-1}+C)=I\Rightarrow (A^{-1}+C)(A-B)=I\Rightarrow \\ I-A^{-1}B+C(A-B)=I\Rightarrow C(A-B) =A^{-1}B

QED.
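As a final sanity check, the statement is easy to test numerically with random matrices (a sketch; C is constructed so that the hypothesis holds, which requires A-B to be invertible, so this only exercises the generic case):

```python
import numpy as np

rng = np.random.default_rng(0)
for _ in range(5):
    n = 4
    A = rng.normal(size=(n, n))
    B = rng.normal(size=(n, n))
    # choose C so that the hypothesis (A - B) C = B A^{-1} holds
    C = np.linalg.solve(A - B, B @ np.linalg.inv(A))
    lhs = C @ (A - B)
    rhs = np.linalg.inv(A) @ B
    print(np.allclose(lhs, rhs))   # True every time
```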