## Garbage in, garbage out

The other day, my friend told me that her math teacher’s favorite thing to say is: “If you take the average of a bunch of people who estimate the length of a cow, the outcome will be better than that of an expert.” (This is false; don’t believe it and walk away!) I think it’s implicit that the people can’t see the cow. Being tired, and considering that I have a 4-minute passing period at my high school, I believed it and moved on. I mean, after all, if you take the average of a million people’s guesses, the outcome should be more or less right, right? Doesn’t the scientific method say that accuracy increases through repetition?

Wrong. (The statement about the cow, not the scientific method.) As the title implies, the average of a bunch of garbage is still garbage. What, do you expect all the guesses to be perfectly symmetric about the actual length of the cow? I think my best resolution to this paradox (I think it’s a paradox; maybe you aren’t so surprised) is that while accuracy does increase with more people, it increases more and more slowly, and there’s a limit to how accurate you can get: whatever bias the guessers share never averages out. The problem is also pretty ill-defined, since after all, isn’t an expert defined to be someone who gets very, very nearly the right answer? You could call the first hit on Google for cow length the expert answer, and maybe that helps convince you. When you start asking questions like “how will the average of a bunch of amateur cow-guessers do” (pretend that they exist!), it gets fuzzy.

(ADDENDUM: Another satisfying explanation is accuracy vs. precision. People are easily swayed by many factors, so while your results may be precise, they’re not necessarily accurate. If I polled another million people in the same way, I would get an answer very close to the first one, but that doesn’t say anything about cows; it says something about people. Thanks, Dad.)
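To see the accuracy-versus-precision point in numbers, here is a minimal simulation sketch. Everything in it is made up for illustration: the "true" cow length, the size of the shared bias, and the spread of the guesses are all assumptions, and real guessers certainly aren't neat bell curves.

```python
import random
import statistics

TRUE_LENGTH_M = 2.4  # assumed "true" cow length in metres (illustrative only)


def crowd_average(n_guessers, shared_bias, spread, rng):
    """Average n guesses that scatter around TRUE_LENGTH_M + shared_bias."""
    guesses = [
        rng.gauss(TRUE_LENGTH_M + shared_bias, spread)
        for _ in range(n_guessers)
    ]
    return statistics.fmean(guesses)


rng = random.Random(0)  # fixed seed so repeated runs give the same numbers
for n in (10, 1_000, 100_000):
    avg = crowd_average(n, shared_bias=0.5, spread=1.0, rng=rng)
    # The error of the average settles near the shared bias, not near zero.
    print(n, "guessers, error of the average:", round(avg - TRUE_LENGTH_M, 3))
```

With more guessers the average gets more and more stable (precise), but it stabilizes around the biased value, not the true one. Setting `shared_bias=0` gives the case where averaging really does home in on the truth.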

By the way, the scientific method does work of course.  It’s not like you can ever say–well, let me just do this experiment really, really, really well and that’ll suffice.  No, you do your best in every experiment and taking the average definitely increases accuracy.

Ironically enough, this fact can be applied to the act of people guessing about this very problem. If you ask a bunch of people whether the cow claim is true, they’ll probably say yes, especially if they’ve been swayed by public opinion. If you ask an expert like Feynman, however, you can’t go wrong. (By the way, Feynman’s example of this was when he was reviewing textbooks for the California Board of Education. He found that people gave good reviews to a textbook that was blank, because people took the average of the reviews they saw and passed it on to other people. A few good experts would be better in this scenario than polling tons of teachers and administrators. :D)

If you still don’t believe me (if there’s one thing to take away here, it’s to always question a popular opinion), feel free to post your best argument for why the crowd’s average should be better in the comments section. And even better, if you don’t believe me, try doing the experiment yourself 😉

## The Locker Problem

There is a hallway of lockers numbered 1 through 100, all closed to start. There are also 100 students. Student 1 opens every locker, then student 2 closes every other locker, then student 3 opens or closes every 3rd locker (if it’s open then she closes it and if it’s closed then she opens it), and so on, where the nth student opens or closes every locker which is a multiple of n. (Does it matter in which order the students go down the hallway?)

The problem: Which lockers are open after all the students walk down the hallway?

Think about it for a bit, and I’ll post a solution later.
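If you would rather check your answer by brute force (spoiler warning: running this prints the answer), here is a small simulation sketch. The function name `open_lockers` and its parameters are my own choices, and it assumes every locker starts out closed.

```python
def open_lockers(n_lockers=100, students=None):
    """Simulate the hallway and return the numbers of the lockers left open.

    `students` is the order in which students walk down the hallway;
    by default it is 1, 2, ..., n_lockers.
    """
    if students is None:
        students = range(1, n_lockers + 1)
    is_open = [False] * (n_lockers + 1)  # index 0 is unused; all start closed
    for s in students:
        # Student s toggles every locker whose number is a multiple of s.
        for locker in range(s, n_lockers + 1, s):
            is_open[locker] = not is_open[locker]
    return [i for i in range(1, n_lockers + 1) if is_open[i]]


if __name__ == "__main__":
    print(open_lockers())
    # Same students, reversed order: compare with the line above.
    print(open_lockers(students=range(100, 0, -1)))
```

Passing a different `students` order is a quick way to experiment with the question in parentheses above.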

## The First-digit Law (Benford’s Law)

If you go through a newspaper and write down all of the numbers used in the paper, among those numbers approximately 30% of them will start with a 1.

I challenge you to go count for yourself (maybe not the whole paper but part of it!) if you feel so inclined. I’ve never done it and I’m not sure how much of it you need to count to observe this (since small samples won’t be accurate). If you do count, let us know what you find!

But what’s so special about 1? Shouldn’t all nine possible first digits appear with the same frequency, so that 1 appears about 11% of the time? That’s what makes this a paradox.

Furthermore, this applies to almost any real-world data set without restrictions on what the numbers can be.  As formulated, this is not a mathematical law because it is not stated precisely–what constitutes a real-world data set?  (The “approximately” isn’t precise either but can be made precise.)  Of course you could construct a data set where the first digits are 1 through 9 equally, but that would be cheating 😛  There is a way to make this mathematically precise, but it’s a bit out of the scope of this blog.

You’re probably still begging to know why this strange phenomenon even has a chance of happening. Well, here are 2 facts that might convince you. First, though, you might be wondering what’s so special about 30%, and the answer is that it is very close to $\log{2} \approx 0.30103$ (where $\log$ is the log base 10 as explained in the previous post).

1)  If you take a number, say 3 million (this is very loose), and count the numbers less than it that start with 1, you’ll find that at least as many start with 1 as with any other digit. In the case of 3 million, about 37% start with a 1 (1,000,000 numbers between a million and 2 million + 111,111 between 0 and a million), so it is very biased towards 1’s.

2)  In the sequence of powers of 2, and in many other sequences, a fraction $\log{2}$ of the terms, or approximately 30%, begin with 1.

Proof: (Medium-hard)  What does it mean for a number $n$ to start with a 1?  It means that the fractional part of $\log{n}$ (the fractional part of 2.3 is .3; you chop off the integer part) is less than $\log 2$. Here’s why: $n$ starts with a 1 exactly when, for some integer $k$, $10^k \le n < 2 \cdot 10^k$, which implies that $k \le \log{n} < \log(2 \cdot 10^k) = \log{2} + \log{10^k} = \log{2} + k$ (by basic properties of log).  This is the same as saying that the fractional part is less than $\log 2$; $\log n$ is less than $\log 2$ bigger than some integer.  Furthermore, going back to the original question where $n$ is a power of 2, the fractional part of $\log{2^m} = m \log{2}$ is evenly distributed over the interval [0,1] (this relies on $\log 2$ being irrational), which is hard to define precisely, but just imagine that if you take all multiples of $\log{2}$ and put a dot where each fractional part lands, the space between 0 and 1 is uniformly marked up.  This shows that the probability that the fractional part of $\log{n}$ is less than $\log{2}$, where $n$ is a power of 2, is $\frac{\log{2}}{1} = \log{2}$!
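Both facts are easy to check numerically. The sketch below is just a sanity check, not a proof. The function names are mine, and the cutoff of 1,000 powers of 2 is an arbitrary choice.

```python
import math


def leading_digit(n):
    """First decimal digit of a positive integer."""
    return int(str(n)[0])


def count_below_starting_with_1(limit):
    """How many integers n with 1 <= n < limit start with the digit 1 (fact 1)."""
    count = 0
    block_start = 1  # numbers in [block_start, 2 * block_start) start with 1
    while block_start < limit:
        block_end = min(2 * block_start, limit)  # exclusive upper end
        count += block_end - block_start
        block_start *= 10
    return count


def fraction_of_powers_of_two_starting_with_1(how_many):
    """Fraction of 2**1, ..., 2**how_many whose first digit is 1 (fact 2)."""
    hits = sum(1 for m in range(1, how_many + 1) if leading_digit(2**m) == 1)
    return hits / how_many


if __name__ == "__main__":
    ones = count_below_starting_with_1(3_000_000)
    print("below 3 million:", ones, "numbers start with 1")
    print("powers of 2:", fraction_of_powers_of_two_starting_with_1(1000))
    print("log10(2):   ", math.log10(2))
```

Try changing 3 million to other limits: the share of numbers starting with 1 swings quite a bit depending on where you stop, which is why fact 1 is "very loose."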

But WAIT!  Awesomely enough, Benford’s law has a clear real-world application, which is to check if a set of data is authentic or not.  (Unless the data-forger is careful enough to do a logarithmic distribution for the first digits of the numbers!)  According to Wikipedia, this law was used to discover fraud in the 2009 Iranian elections, which is pretty cool!   I wonder if they used it in the show NUMB3RS… (which sadly stopped airing 😦 )
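Here is a rough sketch of what such a check could look like. It relies on the usual statement of Benford’s law, which this post doesn’t spell out: a first digit $d$ shows up with frequency $\log(1 + 1/d)$ (for $d = 1$ that’s the $\log 2$ from above). The function names and the "largest gap" measure are my own simplifications; real fraud analysis uses proper statistical tests and much more care.

```python
import math
from collections import Counter


def benford_expected():
    """Expected first-digit frequencies log10(1 + 1/d) for d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit_frequencies(values):
    """Observed first-digit frequencies of a collection of nonzero integers."""
    digits = [int(str(abs(v))[0]) for v in values if v != 0]
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}


def max_deviation(values):
    """Largest gap between observed and expected frequency over the 9 digits."""
    expected = benford_expected()
    observed = first_digit_frequencies(values)
    return max(abs(observed[d] - expected[d]) for d in range(1, 10))


if __name__ == "__main__":
    print("powers of 2:", max_deviation([2**m for m in range(1, 1001)]))
    print("100..999:   ", max_deviation(range(100, 1000)))
```

The powers of 2 hug the expected frequencies closely, while the numbers 100 through 999 (every first digit equally often) stick out, which is the kind of mismatch a check like this looks for.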

NOTE:  The Wikipedia article on this topic seems to be horribly inaccurate.  (For more advanced readers, the way they define a logarithmic distribution isn’t even possible, since there is no such thing as a uniform distribution over the entire number line…)

## Properties of 2011

So, it’s the new year.  I challenged myself to think of as many cool mathematical properties of 2011 as I could, and this is what I came up with…

1.  2011 is a prime number!  To verify this, you have to check that none of the prime numbers less than $\sqrt{2011}$, which rounded down is 44, divides 2011.

2.  As my friends have pointed out, 2011 is a sexy prime, since it is a member of a pair of 2 primes 6 apart, which is (2011, 2017).  Though there seems to be no such term, it’s an octy prime as well since 2003 is prime as well.  After 2017, the next prime is 2027.

That so many primes occur in the 2000’s surprised me at first, but it’s actually not too surprising.  The Prime Number Theorem says that the “chance” that a number $n$ is prime (of course really a number is either prime or it’s not) is about $\frac{1}{\ln{n}}$ ($\ln{n}$ is the log base e, e being the constant more important than $\pi$ equal to around 2.718.   The log base b of n, or $\log_{b} n = x$ if $b^x=n$, so it is the opposite of taking $x$ to $b^x$.)  So the “chance” of a number around 2000 being prime is roughly $\frac{1}{\ln{2000}} \approx \frac{1}{7.6}$.  Once again I have underestimated how slowly the log function grows, because $\ln{2000}$ isn’t very big.

I guess you could say then that the probability of 2011 being a sexy prime is low, but really a sexy prime is so contrived!!  (The Wikipedia article has failed to convince me of its usefulness, at least.)  (EDIT: Finding the probability of two numbers being sexy primes is actually harder than I thought, because $n$ being prime and $n+6$ being prime aren’t independent events, so you can’t multiply their probabilities!)

3.  2011 can be written as the sum of 3 squares: $39^2+21^2+7^2$.  Lagrange’s Four Square Theorem says that every positive integer can be written as the sum of 4 squares (0 is allowed as one of the squares) in at least 1 way.  An extension of that (Jacobi’s Four Square Theorem) says that for odd integers, the number of ways a number can be written as the sum of 4 squares is 8 times the sum of its divisors, where different orderings and different signs of the numbers being squared count as different ways.  Since 2011 is prime, this is easy to find: it’s just $8 \cdot (2011+1) = 16,096$ ways.  (You can find the sum of the divisors of a number in general as well by taking the sum of the divisors of each of the prime powers in its factorization and multiplying them.)  Don’t try listing all 16,096 ways 😛
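None of this needs to be taken on faith. Below is a quick sketch that re-checks the three properties by brute force. The function names are mine, the primality test divides by every number up to the square root (not just the primes, which is simpler to write), and the four-square counter treats different orders and signs as different ways, which is my understanding of how Jacobi’s theorem counts.

```python
import math


def is_prime(n):
    """Trial division by every integer from 2 up to floor(sqrt(n))."""
    if n < 2:
        return False
    for d in range(2, math.isqrt(n) + 1):
        if n % d == 0:
            return False
    return True


def count_four_square_representations(n):
    """Count integer 4-tuples (a, b, c, d), order and signs included,
    with a^2 + b^2 + c^2 + d^2 == n."""
    bound = math.isqrt(n)
    count = 0
    for a in range(-bound, bound + 1):
        for b in range(-bound, bound + 1):
            rest_ab = n - a * a - b * b
            if rest_ab < 0:
                continue
            for c in range(-bound, bound + 1):
                rest = rest_ab - c * c
                if rest < 0:
                    continue
                d = math.isqrt(rest)
                if d * d == rest:
                    count += 1 if d == 0 else 2  # d and -d are both solutions
    return count


if __name__ == "__main__":
    print("primes among 2003..2027:", [k for k in range(2003, 2028) if is_prime(k)])
    print("1 / ln(2000) =", 1 / math.log(2000))
    print("39^2 + 21^2 + 7^2 =", 39**2 + 21**2 + 7**2)
    print("four-square count for 2011:", count_four_square_representations(2011))
```

The last line takes a moment since it tries roughly 700,000 combinations, but it beats listing them by hand.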

Have a great 2011 🙂

## 2010 in review

Wow, I’m completely shocked by these results.  I thought only my friends and family were reading but I guess I’m wrong!  Thanks to all my readers and people who publicized this!!

The stats helper monkeys at WordPress.com mulled over how this blog did in 2010, and here’s a high level summary of its overall blog health:

The Blog-Health-o-Meter™ reads “This blog is on fire!”

## Crunchy numbers

A Boeing 747-400 passenger jet can hold 416 passengers. This blog was viewed about 1,500 times in 2010. That’s about 4 full 747s.

In 2010, there were 5 new posts, not bad for the first year! There were 13 pictures uploaded, taking up a total of 96kb. That’s about a picture per month.

The busiest day of the year was August 31st with 152 views. The most popular post that day was “What is math anyway?”

## Attractions in 2010

These are the posts and pages that got the most views in 2010.

1.  What is math anyway? (August 2010)

2.  To Infinity and Beyond (August 2010)

3.  The Beauty of Math: Fractals (October 2010), 1 comment

4.  How to Count…Better (September 2010), 2 comments and 1 Like on WordPress.com

5.  Resources for Middle Schoolers (August 2010)

## The Beauty of Math: Fractals

Fractals are pictures constructed using mathematics, and they are also quite beautiful.  There are many, many fractals in math, but I will only describe a few.  Amazingly, fractals seem to be all around us in the physical world as well, coming up in nature.

Take an equilateral triangle.  Break it into 4 equilateral triangles and shade in the middle one.  Repeat this process with the 3 equilateral triangles left.  Keep going forever.  What do you get?

(Image taken from http://math.bu.edu/DYSYS/chaos-game/node2.html)

This strange shape, called the Sierpinski Triangle, is one of many different fractals.  A fractal is a shape created through this process of doing something over and over again.  Another way to state this is to say that fractals have self-similarity; in other words, you can zoom in on part of the fractal and it looks identical to the whole.  For example, each of the three corner pieces of the Sierpinski triangle (each one fits in one fourth of the original triangle) is a half-size copy of the entire thing!
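The self-similarity is easy to see if you build the triangle out of text. The sketch below uses a slightly different recipe from the "shade the middle" one: it builds each stage from three copies of the previous stage (one on top, two side by side underneath), which gives the same pattern. The function name `sierpinski_rows` is my own.

```python
def sierpinski_rows(order):
    """Return the rows of a text Sierpinski triangle after `order` steps."""
    rows = ["*"]
    for step in range(order):
        pad = " " * (2**step)
        top = [pad + row + pad for row in rows]   # one copy, centered on top
        bottom = [row + " " + row for row in rows]  # two copies side by side
        rows = top + bottom
    return rows


if __name__ == "__main__":
    print("\n".join(sierpinski_rows(4)))
```

Each step triples the number of stars while doubling the height, so the picture after 4 steps has $3^4 = 81$ stars in 16 rows.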

Another fractal, called the Koch snowflake, is also created out of simple equilateral triangles: start with an equilateral triangle, add a triangle with 1/3 the side length onto the middle third of each side, and repeat forever on every side of the new shape.  The resulting fractal looks like a snowflake!

For an awesome animation of the construction of the Koch snowflake, see this page on Wikipedia.
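One way to get a feel for the construction is to keep track of the sides. Every step replaces each side with 4 sides that are each 1/3 as long (the bump takes the place of the middle third). Here is a small sketch of that bookkeeping; the function name and the starting side length of 1 are my own choices.

```python
from fractions import Fraction


def koch_snowflake_stats(steps, side_length=1):
    """Return (number of sides, length of each side, perimeter) after `steps` steps."""
    sides = 3  # the starting equilateral triangle
    length = Fraction(side_length)
    for _ in range(steps):
        sides *= 4   # each side becomes 4 shorter sides
        length /= 3  # each new side is a third as long
    return sides, length, sides * length


if __name__ == "__main__":
    for n in range(6):
        sides, length, perimeter = koch_snowflake_stats(n)
        print(n, sides, length, float(perimeter))
```

The perimeter gets multiplied by 4/3 at every step, so it keeps growing without limit even though the snowflake never gets bigger than the page.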

Fractals are absolutely gorgeous mathematical objects.  The prettiest fractal of all in my opinion is the Mandelbrot Set, which is constructed using advanced math:

These fractals are not just made up for the sake of looking pretty; they actually arise from different places in mathematics.  This is visual evidence for why math is beautiful.

Fractals come up in the study of nature as well.  The structure of leaves, DNA, and even the stars in the universe have been observed to be fractals with the self-similar property!  Do you see it in the fern below?  (Generated using a computer, but it certainly resembles an everyday fern):

Fractals are all around us.

## How to Count…Better

This post is an introduction to combinatorics, which is how to count without actually counting.  For example, you have 12 pairs of shoes, 16 shirts, and 8 pairs of pants in your closet.  How many different outfits can you make out of these, assuming that you have to wear a pair of shoes, a shirt, and a pair of pants?  (And assuming everything matches :))

Listing out all of the combinations would be tedious!  Better is to use the following method.  There are 12 choices for shoes and 16 for shirts.  So there are $12 \cdot 16$ shoe-shirt combinations!  If this is unclear, think about drawing a grid with 12 slots on one side and 16 on the other.  Each box represents a different combination, and there are $12 \cdot 16$ boxes.  This method generalizes to more than 2 things; there are $12 \cdot 16 \cdot 8 = 1536$ shoe-shirt-pant combinations.  Don’t try listing that out 😛
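Of course, a computer doesn’t mind the tedium. Here is a small sketch that really does list every outfit and counts them, as a check on the multiplication. The item labels and the function name `all_outfits` are made up for illustration.

```python
from itertools import product


def all_outfits(n_shoes, n_shirts, n_pants):
    """List every (shoes, shirt, pants) outfit as a tuple of labels."""
    shoes = [f"shoes {i}" for i in range(1, n_shoes + 1)]
    shirts = [f"shirt {i}" for i in range(1, n_shirts + 1)]
    pants = [f"pants {i}" for i in range(1, n_pants + 1)]
    # product() pairs every choice with every other choice, like the grid.
    return list(product(shoes, shirts, pants))


if __name__ == "__main__":
    outfits = all_outfits(12, 16, 8)
    print(len(outfits), "outfits; the first one is", outfits[0])
```

The length of the list matches $12 \cdot 16 \cdot 8$, which is the whole point: multiplying gets you the count without making the list.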

Now suppose that there are 20 people in a class, and you need to pick a class president, vice president, and secretary.  How many ways are there to do this?  This is an application of our method.  It doesn’t matter, but say you pick the president first.  There are 20 choices for the president.  Now, the vice president can be anyone except the person who you just designated as president, so there are 19 choices.  Lastly, there are 18 choices for secretary.  Following our method, we get that there are $20 \cdot 19 \cdot 18 = 6840$ different combinations of people for the three positions.

Now I’ll change the problem a bit.  There are 20 people in the class and you need to choose 3 class representatives from them.  This may seem the same as the previous problem, except that in this case if Alison, Bob, and Charlotte are representatives, that’s the same as Bob, Alison, and Charlotte being representatives, whereas in the previous problem order mattered, since Alison being president and Bob being vice president was different from Bob being president and Alison being vice president.  Order is a very important factor to consider in combinatorics.  So what do we do?  Well, we can still use the above method, but keep in mind that we counted all arrangements of the students Alison, Bob, and Charlotte as different.

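How many arrangements are there?  Well, there’s 1.  A,B,C  2.  A,C,B  3.  B,A,C  4.  B,C,A  5.  C,A,B  6.  C,B,A.  So 6 arrangements (which is $3 \cdot 2 \cdot 1$, by the same method as before: 3 choices for who goes first, 2 for second, 1 for last).  Every group of 3 representatives got counted 6 times, so if we take our answer from before and divide by 6, that will correct for our over-counting.  So there are $\frac{6840}{6}=1140$ ways to choose 3 representatives!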
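Both class problems can be double-checked by having a computer list the possibilities. This is just a sketch: students are numbered 0 through 19 for convenience, and the variable names are mine.

```python
import math
from itertools import combinations, permutations

students = range(20)

# Order matters: (president, vice president, secretary).
officer_slates = sum(1 for _ in permutations(students, 3))

# Order doesn't matter: a group of 3 representatives.
rep_groups = sum(1 for _ in combinations(students, 3))

if __name__ == "__main__":
    print("officer slates:", officer_slates, "=", 20 * 19 * 18)
    print("rep groups:    ", rep_groups, "=", officer_slates // math.factorial(3))
    # Python also has these counts built in (version 3.8 and later).
    print(math.perm(20, 3), math.comb(20, 3))
```

Dividing the ordered count by $3 \cdot 2 \cdot 1$ gives the unordered count, just like the over-counting argument says.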

Being able to count things in this way is essential for calculating the probability of events happening, which is extremely useful in playing games and analyzing real-world events.  Lastly, if you like this and want to see more, this and much more is covered in the Art of Problem Solving’s Introduction to Counting and Probability textbook.  See the resource page on the left side for a link to the Art of Problem Solving website and more.