## Garbage in, garbage out

The other day, my friend told me that her math teacher’s favorite thing to say is: “If you take the average of a bunch of people who estimate the length of a cow, the outcome will be better than that of an expert.”  (This is false don’t believe this and walk away!)  I think that it’s implicit that the people can’t see the cow.  Being tired and considering that I have a 4 minute passing period at my high school, I believed it and moved on.  I mean after all, if you take the average of a million peoples’ guesses, the outcome should be more or less right, right?  Doesn’t the scientific method say that accuracy increases through repetition?

Wrong. (The statement about the cow, not the scientific method.)  As the title implies, the average of a bunch of garbage is still garbage.  What, do you expect all the guesses to be perfectly symmetric about the actual length of the cow?  I think my best resolution to this paradox (I think it’s a paradox maybe you aren’t so surprised) is that while accuracy does increase with more people, the accuracy increases more and more slowly and there’s a limit to how accurate you can get.  The problem is also pretty ill-defined, since after all, isn’t an expert defined to be someone who gets very very nearly the right answer?  You could call the first hit on Google for cow-length the expert answer, and maybe that helps convince you.  When you start asking questions like “how will the average of a bunch of amateur cow-guessers do” (pretend that they exist!) it gets fuzzy.

(ADDENDUM:  Another satisfying explanation is accuracy vs. precision.  People are easily swayed by many factors, and so while your results may be precise, they’re not accurate.  If I polled another million people in the same way, I would get a very close answer, but that doesn’t say anything about cows, it says something about people.  Thanks dad.)

By the way, the scientific method does work of course.  It’s not like you can ever say–well, let me just do this experiment really, really, really well and that’ll suffice.  No, you do your best in every experiment and taking the average definitely increases accuracy.

Ironically enough, this fact can be applied to the act of people guessing about this very problem.  If you ask a bunch of people what the answer is, they’ll probably say yes; especially if they’ve been swayed by public opinion.  If you ask an expert like Feynman, however, you can’t go wrong.  (By the way, Feynman’s example of this was when he was reviewing textbooks for the California Board of Education.  He found that people gave good reviews to a textbook that was blank, because people took the average of the reviews they saw and passed it on to other people.  A few good experts would be better in this scenario rather than polling tons of teachers and administrators. :D)

If you still don’t believe me (if there’s one thing to take away here it’s to always question a popular opinion) feel free to post your best argument for why it should be better in the comments section.  And even better, if you don’t believe me try doing the experiment yourself 😉

## The Locker Problem

There is a hallway of lockers with lockers numbered 1 through 100. There are also 100 students. Student 1 opens every locker, then student 2 closes every other locker, then student 3 opens or closes every 3rd locker (if it’s open then she closes it and if it’s closed then she opens it), and so on, where the nth student opens or closes every locker which is a multiple of n. (Does it matter in which order the students go down the hallway in?)

The problem: Which lockers are open after all the students walk down the hallway?

Think about it for a bit, and I’ll post a solution later.

## The First-digit Law (Benford’s Law)

If you go through a newspaper and write down all of the numbers used in the paper, among those numbers approximately 30% of them will start with a 1.

I challenge you to go count for yourself (maybe not the whole paper but part of it!) if you feel so inclined.  I’ve never done it and I’m not sure how much of it you need to count to observe this (since small samplings wont be accurate).  If you do count let us know what you find!

But what’s so special about 1?  Shouldn’t they all appear with the same frequency, and so 1 should appear 11% of the time?  That’s what makes this a paradox.

Furthermore, this applies to almost any real-world data set without restrictions on what the numbers can be.  As formulated, this is not a mathematical law because it is not stated precisely–what constitutes a real-world data set?  (The “approximately” isn’t precise either but can be made precise.)  Of course you could construct a data set where the first digits are 1 through 9 equally, but that would be cheating 😛  There is a way to make this mathematically precise, but it’s a bit out of the scope of this blog.

You’re probably still begging to know why this strange phenomenon even has a chance of happening.  Well, here are 2 facts that might convince you.  Firstly, you might be wondering what’s so special about 30%, and the answer is it is very close to $\log{2} \approx .30103$ (where $\log$ is the log base 10 as explained in the previous post).

1)  If you take a number, say 3 million (this is very loose), and look at the number of numbers less than it that start with 1, you’ll find that there are at least as many that start with 1 as any other number, at least.  In the case of 3 million, about 40% start with a 1 (1,000,000 numbers between a million and 2 million + 100,000 between 0 and a million), so it is very biased towards 1’s.

2)  The sequence of powers of 2, and many other sequences, have $\log{2}$ of its numbers beginning with 1, or approximately 30%.

Proof: (Medium-hard)  What does it mean for a number $n$ to start with a 1?  It means that the fractional part of $\log{n}$ (the fractional part of 2.3 is .3; you chop off the integer part) is less than $\log 2$, because for some integer $k$, $10^k < n < 2 \cdot 10^k$, which implies that $k < \log{n} < \log(2 \cdot 10^k) = \log{2} + \log{10^k} = \log{2} + k$ (by basic properties of log).  This is the same saying that the fractional part is less than $\log 2$; it is less than $\log 2$ bigger than some integer.  Furthermore, going back to the original question where $n$ is a power of 2, the fractional part of $\log{2^m} = m \log{2}$ is equally distributed about the interval [0,1], which is hard to define precisely but just imagine that if you take all multiples of $\log{2}$ and put a dot for where each fractional part lands, the space between 0 and 1 is uniformly marked up.  This shows that the probability that $\log{n}$ is less than $\log{2}$ where $n$ is a power of 2 is $\frac{\log{2}}{1} = \log{2}$!

But WAIT!  Awesomely enough, Benford’s law has a clear real-world application, which is to check if a set of data is authentic or not.  (Unless the data-forger is careful enough to do a logarithmic distribution for the first digits of the numbers!)  According to Wikipedia, this law was used to discover fraud in the 2009 Iranian elections, which is pretty cool!   I wonder if they used it in the show NUMB3RS… (which sadly stopped airing 😦 )

NOTE:  The Wikipedia article on this topic seems to be horribly inaccurate.  (For more advanced readers, the way they define a logarithmic distribution isn’t even possible since there is no such thing as a uniformly distributed set on the number line…)

## Properties of 2011

So, it’s the new year.  I challenged myself to think of as many cool mathematical properties about 2011 and this is what I came up with…

1.  2011 is a prime number!  To verify this, you have to check that none of the prime numbers less than $\sqrt{2011}$, which rounded down is 44, divide 2011.

2.  As my friends have pointed out, 2011 is a sexy prime, since it is a member of a pair of 2 primes 6 apart, which is (2011, 2017).  Though there seems to be no such term, it’s an octy prime as well since 2003 is prime as well.  After 2017, the next prime is 2027.

That so many primes occur in the 2000’s surprises me, but it’s actually not too surprising.  The Prime Number Theorem says that the “chance” that a number $n$ is prime (of course really a number is either prime or it’s not) is $\frac{1}{\ln{n}}$ ( $\ln{n}$ is the log base e, e being the constant more important than $\pi$ equal to around 2.718.   The log base b of n, or $\log_{b} n = x$ if $b^x=n$, so it is the opposite of taking $x$ to $b^x$.)  So the “chance” of a number around 2000 being prime is roughly $\frac{1}{\ln{2000}}=\frac{1}{7}$.  Once again I have underestimated how slowly the log function grows, because $\ln{2000}$ isn’t very big.

I guess you could say then that the probability of 2011 being a sexy prime is rare, but really a sexy prime is so contrived!!  (The Wikipedia article has failed to convince me of its usefulness, at least.)  (EDIT: Finding the probability of two numbers being sexy primes is actually harder than I thought because $n$ being prime and $n+6$ being prime aren’t independent events so you can’t multiply their probabilities!)

3.  2011 can be written as the sum of 3 squares: $39^2+21^2+7^2$.  Lagrange’s Four Square Theorem says that every number can be written as the sum of 4 squares in at least 1 way.  An extension of that (Jacobi’s Four Square Theorem) says that the number of ways a number can be written as the sum of 4 squares is 8 times the sum of its divisors for odd integers, and since 2011 is prime, this is easy to find–it’s just $8 \cdot (2011+1) = 16,096$ ways.  (You can find the sum of the divisors of a number in general as well by taking the sum of the divisors of each of the prime powers in its factorization and multiplying them.)  Don’t try listing all 16,096 ways 😛

Have a great 2011 🙂

## 2010 in review

Wow, I’m completely shocked by these results.  I thought only my friends and family were reading but I guess I’m wrong!  Thanks to all my readers and people who publicized this!!

The stats helper monkeys at WordPress.com mulled over how this blog did in 2010, and here’s a high level summary of its overall blog health: The Blog-Health-o-Meter™ reads This blog is on fire!.

## Crunchy numbers A Boeing 747-400 passenger jet can hold 416 passengers. This blog was viewed about 1,500 times in 2010. That’s about 4 full 747s.

In 2010, there were 5 new posts, not bad for the first year! There were 13 pictures uploaded, taking up a total of 96kb. That’s about a picture per month.

The busiest day of the year was August 31st with 152 views. The most popular post that day was What is math anyway?.

## Attractions in 2010

These are the posts and pages that got the most views in 2010.

1

What is math anyway? August 2010

2

To Infinity and Beyond August 2010

3

The Beauty of Math: Fractals October 2010
1 comment

4

How to Count…Better September 2010
2 comments and 1 Like on WordPress.com,

5

Resources for Middle Schoolers August 2010

## The Beauty of Math: Fractals

Fractals are pictures that are constructed using mathematics that are also quite beautiful.  There are many many fractals in math but I will only describe a few.  Amazingly, it seems that fractals are all around us in the physical world as well, coming up in nature.

Take an equilateral triangle.  Break it into 4 equilateral triangles and shade in the middle one.  Repeat this process with the 3 equilateral triangles left.  Keep going forever.  What do you get? (Image taken from http://math.bu.edu/DYSYS/chaos-game/node2.html)

This strange shape, called the Sierpinski Triangle, is one of many different fractals.  A fractal is a shape created through this process of doing something over and over again.  Another way to state this is to say that fractals have self-similarity; in other words that you can zoom in on part of the fractal and it is identical to the whole.  For example, one fourth of the Sierpinski triangle is identical to the entire thing!

Another fractal called the Koch snowflake is created just out of simple equilateral triangles as well, this time starting with an equilateral triangle and adding triangles with 1/3 the side length of the original to each side and repeating forever.  The resulting fractal looks like a snowflake! For an awesome animation of the construction of the Koch snowflake, see this page on Wikipedia.

Fractals are absolutely gorgeous mathematical objects.  The prettiest fractal of all in my opinion is the Mandelbrot Set, which is constructed using advanced math: These fractals are not just made up for the sake of looking pretty; they actually arise from different places in mathematics.  This is visual evidence for why math is beautiful.

Fractals come up in the study of nature as well.  The structure of leaves, DNA, and even the stars in the universe have been observed to be fractals with the self-similar property!  Do you see it in the fern below?  (Generating using a computer but it certainly resembles an everyday fern): Fractals are all around us.

## How to Count…Better

This post is an introduction to combinatorics, which is how to count without actually counting.  For example, you have 12 pairs of shoes, 16 shirts, and 8 pairs of pants in your closet.  How many different outfits can you make out of these, assuming that you have to wear a pair of shoes, a shirt, and a pair of pants (and assuming everything matches :)).

Listing out all of the combinations would be tedious!  Better is to use the following method.  There are 12 choices for shoes and 16 shirts.  So there are $12 \cdot 16$ shirt-pant combinations!  If this is unclear, think about drawing a grid with 12 slots on one side and 16 on the other.  Each box represents a different combination, and there are $12 \cdot 16$ boxes.  This method generalizes to more than 2 things; there are $12 \cdot 16 \cdot 8 = 1536$ shoe-shirt-pant combinations.  Don’t try listing that out 😛

Now suppose that there are 20 people in a class, and you need to pick a class president, vice president, and secretary.  How many ways are there to do this?  This is an application of our method.  It doesn’t matter, but say you pick the president first.  There are 20 choices for the president.  Now, the vice president can be anyone except the person who you just designated as president, so there are 19 choices.  Lastly, there are 18 choices for secretary.  Following our method, we get that there are $20 \cdot 19 \cdot 18 = 6840$ different combinations of people for the three positions.

Now I’ll change the problem a bit.  There are 20 people in the class and you need to choose 3 class representatives from them.  This may seem the same as the previous problem, except that in this case if Alison, Bob, and Charlotte are representatives that’s the same as Bob, Alison, and Charlotte being representatives, whereas in the previous problem order mattered, since Alison being president and Bob being vice president was different from Bob being president and Alison being vice president.  Order is a very important factor to consider in combinatorics.  So what do we do?  Well, we can still use the above method, but keep in mind that we counted all arrangements of the students Alison, Bob, and Charlie as different.

How many arrangements are there?  Well, there’s 1.  A,B,C  2.  A,C,B  3.  B,A,C  4.  B,C,A,  5.  C,A,B  6.  C,B,A.  So 6 arrangements.  Therefore, if we take our answer from before and divide by 6, that will correct for our over-counting.  So there are $\frac{6840}{6}=1140$ ways to choose 3 representatives!

Being able to count things in this way is essential for calculating the probability of events happening, which is extremely useful in playing games and analyzing real-world events.  Lastly, if you like this and want to see more, this and much more is covered in the Art of Problem Solving’s Introduction to Counting and Probability textbook.  See the resource page on the left side for a link to the Art of Problem Solving website and more.

## The 3n+1 conjecture

Hi everyone!  First of all, note that there’s now a button to share a post with all of your friends by email, facebook, or twitter.  If you do share that would be great!

In a typical middle or high school math class it may be hard to tell, but math is constantly evolving.  There are many important problems in various fields which we don’t know the answer to.

Here’s one good problem that is easy to state, but that has stumped the best mathematicians in the world since 1937!  It is called the Collatz Conjecture (a conjecture is an unsolved problem) after the guy who came up the problem, Lothar Collatz.

Pick a number, any (positive whole) number.  If your number is even, divide it by 2.  Otherwise if it’s odd, do three times your number and add 1.  Do this procedure over and over, if your current number is even divide by 2, if odd multiply by 3 and add 1.  The conjecture is that eventually, no matter what your number is, you will get the number 1.

For example, for the number 6, the steps are as follows: $6 \rightarrow 3 \rightarrow 10 \rightarrow 5 \rightarrow 16 \rightarrow 8 \rightarrow 4 \rightarrow 2 \rightarrow 1$.

Try it out for yourself!

Due to the power of computers, we know that every number up to $10^{18}$ satisfies the 3n+1 conjecture.  That’s 1,000,000,000,000,000,000.  While this is enough evidence to convince most people, it will only be proven when it shown for every single whole number (see my first post for why we prove things).

Also see my friend’s article about “The Art of Math” which came out recently in an online magazine called KidSpirit.  It covers other nice proofs, some of which I may write about in the future too 🙂  http://kidspiritonline.com/2010/09/the-art-of-mathematics/

## To Infinity and Beyond

As I mentioned in my previous post, prime numbers are numbers which have exactly 2 divisors: themselves and 1.  This means that 1 itself isn’t prime, which is very important!  Prime numbers are fundamental in number theory, which is the study of positive integers and divisibility.  This is because of unique factorization, which is the property that every number can be written in exactly one way (where the order of the divisors doesn’t matter) as the product of prime numbers where primes can be used more than once (this fact requires careful proof but we’ll assume for now).  So prime numbers are the building blocks of all numbers.

A very important result that there are an infinite number of primes was proved over 2000 years ago by the Greek mathematician Euclid (although there’s some debate about what he actually proved).  I will prove this in two different ways here.

Proof #1:  This proof is a , which is where you start by assuming the opposite of what you want to prove and show that the assumption is impossible by finding an impossible consequence.

Assume to the contrary that there are actually a finite number of primes, which we can let be $p_1, p_2, p_3, \dots, p_n$ (the subscripts indicate that not all of the p’s are the same and that there are n of them).  By unique factorization, this means that every number can be written as a product of these primes.  However, consider the number $p_1 \cdot p_2 \cdot p_3 \dots \cdot p_n + 1$, taking the product of all of the primes and then adding 1.  This number is not divisible by any of the primes $p_1, p_2, p_3 \dots, p_n$.  Therefore it has no prime factorization, which is a contradiction.  So, since if there were a finite number of primes that would be a problem, we have shown that there are an infinite number of primes!

Proof #2:  While proof by contradiction is extremely useful and is often the only good way to prove things, in this case there is a direct way (without contradiction) to prove the infinitude of primes as well, if you don’t like the contradiction way.  It’s more like what Euclid said.  It is:

For any finite set of primes $p_1, p_2, \dots, p_n$, the number $p_1 \cdot p_2 \cdot \dots \cdot p_n + 1$ is divisible by another prime $p_{n+1}$ which is different from those n primes.  Therefore, we can make any set of primes bigger and bigger, so there are an infinite number of them.

And now for some nerdy humor, a quote by Noga Alon, a famous Israeli professor:  “I was interviewed on the Israeli radio for five minutes and I said that more than 2000 years ago, Euclid proved that there are infinitely many primes.  Immediately the host interrupted me and asked, ‘Are there still infinitely many primes?’ ” 🙂

## What is math anyway?

Since I’m trying to write a blog about math, I should probably explain what math is first!

Math, roughly speaking, is a vast collection of knowledge that is rigorously established. What does that mean? Proofs. Proofs are at the heart of math.

Mathematicians go crazy with proofs. In fact, they insist on making everything rigorous and basing things off of very basic assumptions called axioms. Even the simple, seemingly obvious fact that $0 \cdot n = 0$ for any number n has proof using simpler axioms about our number system.

So why do we bother proofs? Often the first step towards proving something is noticing that a certain fact is true in a lot of cases. This is a pattern. But, patterns are not enough to show a general result. For example, in the early 1800’s, the mathematicians Legendre and Gauss noticed that the expression $n^2-n+41$ is a prime number for n from 1 to 40. (A prime number is a number that is only divisible by 1 and itself, so 3 is a prime number because it cannot be broken up into more factors but 6 isn’t a prime number because it is $2 \cdot 3$.) It seems like with so much evidence, it should be true for all positive counting numbers, and that’s what they thought too. However, in the very next case, $n=41$, it fails to be prime. This is easy to see because $41^2-41+41=41^2$, which is not prime. Silly Legendre and Gauss!  Furthermore, often a proof gives us important insights that can then be applied to other problems.  The why is crucial.

Making guesses is how math evolves.  Something that is not proven but is believed to be true is called a conjecture. There are many, many important problems with the status of conjecture right at this moment. That is what current mathematicians keep themselves busy with. For example, one conjecture that hasn’t been resolved is whether or not there are an infinite number of positive counting numbers for that same expression $n^2-n+41$ where the result is prime (so even though it isn’t true for all positive counting numbers, it could still be true for an infinite number of them).  This is an open question for almost all quadratic equations (an expression with a squared term as its highest power), in fact.

In most middle and high school math classes, proofs are often left out or only partially stated. I think this is a mistake, since proofs are so key to math. Everything has a proof. So, the next time your teacher tells you something is true, try to find a proof!