The purpose of today’s post is to make your head explode with simple arithmetic. If you’re reading this, I’ll bet that you feel pretty confident in your ability to add and subtract, and that you have an excellent grasp of a complicated concept like “three.”
So OK, let’s start out with a simple challenge: Explain to me what “three” means. Define it.
If you give me an answer like “one plus two,” then OK, smarty-pants – define “one,” “two,” and “plus.”
Try this for a bit, and look up some definitions, and you may realize that an idea which seems blindingly obvious is actually fairly hard to put into words. And that’s important, because if we want to do mathematics, or things that depend on it like “building bridges that don’t fall down,” then we generally want to be able to start by defining our basic concepts, and if we can’t even define
three, then what hope have we got of defining more complicated things like “differential equations?”
Today I’m going to walk you through the answer to this seemingly simple question. This was actually the product of decades of very hard work by very intelligent, albeit somewhat eccentric, individuals. (If you want to read about their story, I highly recommend the book
Logicomix by Apostolos Doxiadis and Christos Papadimitriou. Fair warning: the phrase “mad as a bag of clams” will apply.) Rather than giving you the fully technical answer (the book that ultimately put a lot of this on a sound footing,
Principia Mathematica, takes until page 83 of volume 2 to finally prove Proposition 110.643, “1+1=2,” a fact which they note is “occasionally useful”), I’ll try to boil it down to its simplest concepts – but what you’re going to get below is the real mathematics, not the kiddie version. By the time we’re done, you’ll know
why 1+1=2.
Sets and relationshipsThe first big insight that let us answer these questions was that, while it can be really messy to define “three,” one thing that we
can define is the notion of a set of items. I’ll skip the very technical definition of a set (see the notes below if you want it), but it basically consists of some simple notions: two sets are the same if every element of one is an element of the other, and vice-versa; there is an empty set, which has no elements; if you have a set, then any collection of elements from that set is also a set; so is the union of two sets, or the intersection of two sets, or the set containing just that set. Basically, a set is a careful way to describe the idea of “here are a bunch of objects, in no particular order.”
Another thing that we can define is relationships between things. (And since the only things we’ve defined so far are sets, we’ll talk about that a lot) You can probably think of lots of relationships between objects in daily life: “is the same as,” (a relationship between all sorts of things) “is less than,” (a relationship between numbers) “is the mother of,” (a relationship between people), “has just eaten.” (A relationship between me and my dinner) A relationship is a statement about a pair of objects which can be either true or false.
Now, there are some special kinds of relationships out there, and they have special names. A
reflexive relationship is a relationship that is always true about any object and itself. For example, “is the same as” is reflexive: every object is the same as itself. Ditto “is as big as” for anything that has a size. On the other hand, “is less than” isn’t reflexive: in fact,
no number is less than itself. Likewise, no person is her own mother. (Although this isn’t true for every family relationship:
I'm my own Grandpa)
A
symmetric relationship is one that always works both ways: if A has this relationship to B, then B always has this relationship to A. “Is the same as” is symmetric; if A is the same as B, then B is the same as A. Likewise “is married to,” among people. On the other hand, “is less than” isn’t symmetric: if A is less than B, then B isn’t less than A!
A
transitive relationship is one such that if A is related to B, and B is related to C, then A is related to C. Being equal to, or being less than, is transitive: if A < B, and B < C, then we immediately know that A < C, too. But “eats” isn’t transitive, as a relationship between animals: wolves eat deer, and deer eat grass, but wolves generally don’t eat grass.
Why did I just tell you about these three kinds of relationship? Because it turns out that something very magical happens when you find a relationship which is reflexive, symmetric, and transitive. Such a relationship is called an “equivalence relationship,” because “equals” is the most famous example. What’s magical about equivalence relationships is that you can use them to divide up all of the objects in the world. To see how, let’s take a concrete example: the relationship “is a blood relative of” among people. You can see that it’s an equivalence relationship, because everyone is a blood relative of themselves; if Alice is a blood relative of Bob, then Bob is one of Alice; and if Alice is Bob’s relative, and Bob is Christie’s, then Alice is Christie’s relative as well.
Pick any one person to begin with, say Alice. Our first person forms her own group.
Now pick a second person, Bob. If Bob is a blood relative of Alice, then he goes in her group; if he isn’t, then he starts his own group.
Pick a third person, Christie. If Christie is a blood relative of Alice, she goes in Alice’s group; if she’s a blood relative of Bob’s, she goes in his group; and if she’s neither, she starts her own group. There’s no possibility that she could be both, because if she were a blood relative of both Alice and Bob, then Alice and Bob would also be blood relatives, so they would have been in the same group!
Keep doing this until you’ve put all of the people into groups. What are these groups? Well, if you pick any two people in the same group, they’re blood-related, and if you pick any two people in different groups, they aren’t blood-related. You’ve just divided the entire world into families: every person is in exactly one group, and two people are related to each other if and only if they’re in the same group. This kind of division is called a
partition.Partitions are useful, because these groups – we call them “equivalence classes” sometimes – are now objects that we can talk about much like we talked about the individuals. For example, say there’s some other relationship – “has red hair” – which has the property that, if two people are blood related, then they have the same hair color. We don’t even require the reverse property, that people who aren’t blood related have different hair color; maybe
everyone in this world has red hair. This is a property of people, but we can use the partition to say that it’s also a property of families: after all, if two people are in the same family, they’re blood-related, and so if one has red hair, then the other does. That means that if anyone in a family has red hair,
everyone in that family does. So we can say that “the family has red hair:” a property of individuals is also a property of these groups.
We can give these families names, if we want, or we can just name them after any element we pick: “Alice’s family.”
Comparing apples and orangesNow, let’s do some real magic. Here is a relationship for you, a relationship between sets: “has as many items as.” Be careful! This sentence sounds simple, but I actually almost cheated you, because I never explained what “as many as” means. The definition we want is:
If we have two sets A and B, then A has as many items as B if there is a way to pair up every element of A with an element of B, so that none are left over on either side.
Makes sense, right? This is how you would (say) compare apples and oranges. Let’s define that pairing up a bit more closely: this means that I have some rule that, given any element of A, tells me which element of B it’s paired with. Being a pairing means that every element of A gets mapped to a
different element of B. Having nothing left over on the A side means that every element of A gets a mapping. Having nothing left over on the B side means that every element of B is mapped to by some element of A.
If we want to write this in math-speak, incidentally, we give this rule a name, say “f,” and we say that if x is in A, then f(x) (pronounced “f of x”) is the element of B that it’s paired with. We write “f: A → B” (“f maps A to B”) to say that every element of A gets mapped, and the result is always an element of B. “f is one-to-one” means that every element of A gets mapped to a different element of B; “f is onto” means that every element of B gets mapped onto.
I claim that this definition is actually what we normally think of as meaning “as many as.” For example, let’s say that I have two sets, namely a bucket of apples and a bucket of oranges. To make a pairing between them, I will line up the apples and oranges in two rows. “f” pairs an apple with the adjacent orange. “f is one-to-one” means that every apple is next to one orange, not two. If f weren’t one-to-one, then I might have put one apple next to a giant pile of oranges, which doesn’t really tell me anything about how many oranges I have. “f is onto” means that every orange is next to one apple; if it weren’t, then I might have a bunch of oranges left over, and again I have no idea how many oranges I have. But if f is one-to-one and onto, then every apple is next to exactly one orange, and that can only happen if I have exactly as many apples and oranges.
As it happens, “has as many items as” is an equivalence relationship. Let’s prove it.
Slightly harder part. If you slow down for these paragraphs until you understand them, the rest will seem a lot easier.First of all, it’s reflexive: for any set A, A has as many items as A does. This may seem idiotically obvious, but given that “three” turned out not to be so obvious, let’s be careful this time. What we’re really claiming is that for any set A, there’s a way to pair the items of A with the items of A which is one-to-one and onto. I choose the pairing “f(x) = x” – that is, I will pair each item of A with itself. (Note: this pairing is really not rocket science) It’s one-to-one, because if I pick two different elements of A, x and y, then f(x) = x and f(y) = y which are, therefore, different. It’s onto because if I pick any element x, then f(x) = x, which means that x was paired with – lo and behold! – itself. Therefore! I know that every set has as many items as it does.
Yes, that proof was a bit silly. OK, let’s up the ante: our relationship is symmetric. If A has as many elements as B, then B has as many elements as A. How do we know this? Well, if A has as many elements as B, then there is some pairing f: A → B which is one-to-one and onto. (By our definition of “has as many elements as”) So I will now define a new pairing g: B → A by the rule: if y is in B, then let g(y) be the element of A which f mapped onto y. (That is, g is the same pairing as f, only in the opposite order) That is, g(y) is the element of A such that f(g(y)) = y.
Is g even well-defined? Well, we know that f is onto, so every element of B was f of
something in A. And we know that f is one-to-one, so we know that this element of B was f of exactly
one thing in A. So it’s true that for every item in B, there is exactly one item in A that our f-rule paired with it, and that shall be our definition of g.
The statement “g is one-to-one” then means that g pairs every item of B with a different item of A – or a bit more explicitly, if x and y are two different elements of B, then g(x) and g(y) are two different elements of A. We want to prove that: if x ≠ y, then g(x) ≠ g(y). How do we know it? g(x) and g(y) are both elements of A, by definition, so we can apply the f-rule to them and get back elements of B. Now imagine that g wasn’t one-to-one: x and y are different, but g(x) = g(y). Since those two are the same, obviously f(g(x)) must be the same as f(g(y)). (If you apply the rule to the same thing twice, you get the same result) But we know what that combination is, because it’s just the definition of g! f(g(x)) is x, and f(g(y)) is y. That means that x and y are the same, which is a contradiction! Therefore we know that g
must be one-to-one.
The statement “g is onto” means that every item of A is g of something in B. I can even tell you what it is: if x is in A, then it’s g of f(x). How do I know this? Go back to the definition of g: “g(y) is the element of A such that f(g(y)) = y.” That means that g(f(x)) is the element of A such that f(g(f(x))) = f(x). (Got that? f of g of f of x – that is, start with some x, then apply rule f to get an element of B, then apply rule g to get an element of A, then apply rule f again to get another element of B) Now, what we have here is a statement that f of two different things are the same – but we know that f is one-to-one, which means that if f of one thing is the same as f of another, then the two things themselves must be the same. If we apply that to our messy equation above, that means that g(f(x)) = x – which is exactly what we wanted to prove! Every item x is g of something, namely f(x).
Whew. You can see that this one was a bit trickier.
The next proof is that “as many as” is transitive – that is, if A has as many elements as B, and B has as many elements as C, then A has as many elements as C. I’m going to apply the famous Math Teacher’s Cheat here and say that if you really want to understand this, you should sit down with a piece of paper and work it out yourself. You can use all of the same tricks that I just used above, and if you can prove it correctly, then you will really understand how equivalence works.
End of the hard part. Back to numbers! Let’s define three!So what we just proved is that “has as many items as” is an equivalence relationship. That means that we can use the partitioning trick that we worked out earlier with it, and divide all of the sets in the world up by how many things they have – that is, into “families” which all have the same number of items. A set of three apples and a set of three oranges would be in the same family, but they would be in a different family from a set of four armadillos or a set of twelve black holes.
And just like above, we can name these families. “Three” is the name of the family of all the possible sets that have as many items as, say, a set of three apples.
That’s the actual definition of three.
Wait! you say, having heard this story before. You just used three to define three! And I don’t think you ever actually defined any apples, just sets!
You clever fellow. Let’s be more careful.
We know that there is an empty set, a set with no elements at all. Let’s call it ∅. (That’s the standard symbol for it, and it’s pronounced “the empty set.”) This set must belong to some family, and we shall name that family Zero. (For reasons which will shortly become clear, I’ll give this set another name as well, “0.”)
Since that is a set, we also know that {∅} is a set. That means “the set which contains exactly one element, namely the empty set.” That is, this is a
set of other sets, and it has exactly one thing in it, which is the only other object we had lying around, namely ∅. Fine! It’s a set.
It can’t be in the same family as ∅ is, though, because there’s no way to pair them up. {∅} has exactly one element in it, namely ∅, but what do I pair it with? ∅ has
no elements in it, by definition. So the pairing can’t possibly be onto. So this set is in a new family, which we shall name One. And we’ll name this set “1,” since it’s the first element of One that we’ve met.
What other sets can we build? Well, we could make {{∅}}, namely the set that contains exactly one set, that being the set that contains the empty set. This set belongs in the same family as {∅}, though: the pairing is simply to pair {∅} with ∅. Both of those sets have one element, or rather, One element. This set doesn’t get a special name; it’s just {{∅}}, or {1}, to use the other name.
But we can also take the union of two sets, and this is where things start to look really absurd. {∅, {∅}} is also a set. That is, this is a set which has two elements, namely ∅ and {∅}. This one, you will quickly figure out, doesn’t go in either Zero or One, because (for example) if I paired up the ∅ of {∅, {∅}} with the ∅ of {∅}, then there would be nothing left over to pair the {∅} with. We have met our first element of Two: “2,” a.k.a. {∅, {∅}}, a.k.a. {0, 1}.
You can see where this is going to go. Our first element of Three will be the set {∅, {∅}, {∅, {∅}}}, “3,” a.k.a. {0, 1, 2}. And we’ll get Four, Five, and so on as well. We have just defined the counting numbers.
In fact, we also just figured out how to add one to any number, because each of our “numbers” is just the list of lesser numbers: 1 is {0}, 2 is {0, 1}, 3 is {0, 1, 2}, and so on. To add one to x, you simply take the union (combine the sets) of x with the set {x} itself: that is, to add one to 2, you just combine {0, 1} with {2} to get {0, 1, 2}, better known as 3.
And lo and behold, we have just proven that 1 + 2 = 3!
I want you to stop for a moment and feel very proud of yourself for having proven something known to small children. Knowing something is easy: knowing
why something is true is not so easy.
What we just did was this: We figured out what a set of things was. We figured out that there are relationships which can compare sets, and that special kinds of relationships – “equivalence relationships” – act a lot like “equals,” so you can divide up all of the sets in the world into families which share those properties. Then we realized that “has as many items as” is one of these equivalence relationships, and so you can divide up all the sets in the world into families which have the same number of elements: a family of sets with no elements, a family of sets with one element, a family of sets with two elements, and so on. And that’s what “two” actually
means: a set has two items if it has as many items as our “Standard Two.”
The next step – adding any pair of numbers – is pretty easy. (To add 3 and 2, just say that 2 = 1 + 1, and so 3 + 2 = 3 + 1 + 1, which we already understand) Beyond this comes a rather clever trick to define subtraction, and then multiplication (with rows of identical items), division (using the same trick as subtraction), fractions, and the most subtle and tricky of all, decimals. We can even use these methods to define “infinity” and talk about its behavior. But I’ll save those for another day, as we’ve only just stepped out from the rabbit hole, and even Alice waited before her second venture through the looking glass.
A final note: Less than an hour after I finished writing this, I received news of the death of Alexander Grothendieck, a brilliant mathematician, one of the founders of the field of algebraic geometry, and the inventor of the “clever trick” defining subtraction with which the next part of this would have to open. His mathematical work was at the foundation of a great deal of the modern cutting edge of physics, including a good deal of my own research in my physics days. Among his most important work was bringing category theory into the mainstream of mathematics; category theory essentially takes the ideas above, of kinds of mapping, equivalence, relations of relations, and so on, to their widest possible conclusion. I would like to dedicate this article to his memory, and hope that it may be the first step for some people to go and visit the great intellectual landscapes that he opened.––––––––––––––––––––
If you want more on a similar line, my older article "Counting to Infinity" is about how to use the same method to define what infinity really means. It's a bit lighter-weight than this one.
https://plus.google.com/+YonatanZunger/posts/BtHzUPFz4teIf you want to know the careful definition of a set, you should read
http://en.wikipedia.org/wiki/Zermelo%E2%80%93Fraenkel_set_theory . This was one of those subjects that people thought was “really obvious” until Bertrand Russell pointed out that if you did the “really obvious” thing, you could define “sets” like “the set of all sets which don’t contain themselves,” which quickly led to paradoxes and confusion. Zermelo and Fraenkel finally laid out a careful definition of sets which doesn’t lead to paradoxes, and in fact works. It turns out that you need seven rules to do so, and there are two other famous “optional rules” – optional in that set theory actually makes sense with or without them. These two optional rules are the “Axiom of Infinity” (that roughly says, “infinite sets exist”) and the “Axiom of Choice.”
The Axiom of Choice roughly says that for any set, there exists a way to pick an element from it, a statement which seems sort of obvious until you realize that for
infinite sets, it isn’t. For example, the rule “pick the smallest element” works great for “the set of positive integers” or “the set of real numbers between 0 and 1, inclusive,” but it doesn’t work very well for “the set of real numbers greater than zero,” because there
isn’t a smallest element: if you pick any number greater than zero, half that number is smaller than it is. In fact, it turns out that despite how obvious it seems, the Axiom of Choice has some really weird consequences, such as the Banach-Tarski Theorem: you can prove that there’s a way to slice up a solid sphere into five pieces, and then rearrange those pieces to get two spheres, each identical to the original! This requires infinitely fine cuts, so it isn’t (alas) applicable to gold bullion. The main use of the Banach-Tarski Theorem is to make mathematicians’ heads explode, and to demonstrate just how weird the consequences of the Axiom of Choice really are. (Once their heads have exploded, they can be rearranged into the brains of two identical mathematicians, which is useful for math departments on a tight staffing budget)
The image is "Three," by Grant Hutchinson (
https://www.flickr.com/photos/splorp/59231687), a photograph of graffiti by an unknown artist.