Archive for January, 2011

Garbage in, garbage out

The other day, my friend told me that her math teacher’s favorite thing to say is: “If you take the average of a bunch of people who estimate the length of a cow, the outcome will be better than that of an expert.”  (This is false. Don’t believe it; walk away!)  I think it’s implicit that the people can’t see the cow.  Being tired, and considering that I have a 4-minute passing period at my high school, I believed it and moved on.  I mean, after all, if you take the average of a million people’s guesses, the outcome should be more or less right, right?  Doesn’t the scientific method say that accuracy increases through repetition?

Wrong. (The statement about the cow, not the scientific method.)  As the title implies, the average of a bunch of garbage is still garbage.  What, do you expect all the guesses to be perfectly symmetric about the actual length of the cow?  I think my best resolution to this paradox (I think it’s a paradox; maybe you aren’t so surprised) is that while accuracy does increase with more people, it increases more and more slowly, and there’s a limit to how accurate you can get.  The problem is also pretty ill-defined, since after all, isn’t an expert defined to be someone who gets very, very nearly the right answer?  You could call the first hit on Google for cow-length the expert answer, and maybe that helps convince you.  When you start asking questions like “how will the average of a bunch of amateur cow-guessers do” (pretend that they exist!) it gets fuzzy.

(ADDENDUM:  Another satisfying explanation is accuracy vs. precision.  People are easily swayed by many factors, and so while your results may be precise, they’re not accurate.  If I polled another million people in the same way, I would get an answer very close to the first one, but that doesn’t say anything about cows; it says something about people.  Thanks dad.)
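Here is a quick way to see the accuracy vs. precision point in a simulation.  Everything in it is made up for illustration: the cow's "true" length, the idea that everyone shares the same bias (say, they all picture a cow that is half a meter too long), and the bell-curve spread of the guesses.  It is only a sketch under those assumptions, not a model of real cow-guessers.

```python
import random
import statistics

def crowd_average(true_length, bias, spread, n_people, seed):
    """Average of n_people guesses that all share the same systematic bias.

    Each guess is drawn from a normal distribution centered at
    true_length + bias (an assumption made purely for illustration).
    """
    rng = random.Random(seed)
    guesses = [rng.gauss(true_length + bias, spread) for _ in range(n_people)]
    return statistics.fmean(guesses)

# Hypothetical numbers, chosen only for the demonstration.
TRUE_LENGTH = 2.4   # meters
BIAS = 0.5          # everyone tends to overestimate by this much
SPREAD = 1.0        # how much individual guesses scatter

for n in (10, 1_000, 100_000):
    avg = crowd_average(TRUE_LENGTH, BIAS, SPREAD, n, seed=n)
    # The error stops shrinking once it reaches the shared bias.
    print(n, "people: average", round(avg, 3), "error", round(avg - TRUE_LENGTH, 3))
```

With more people the averages agree with each other more and more closely (precision), but under these assumptions they settle near the biased value rather than the true one (accuracy).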

By the way, the scientific method does work of course.  It’s not like you can ever say–well, let me just do this experiment really, really, really well and that’ll suffice.  No, you do your best in every experiment and taking the average definitely increases accuracy.

Ironically enough, this fact can be applied to the act of people guessing about this very problem.  If you ask a bunch of people whether the statement about the cow is true, they’ll probably say yes, especially if they’ve been swayed by public opinion.  If you ask an expert like Feynman, however, you can’t go wrong.  (By the way, Feynman’s example of this was when he was reviewing textbooks for the California Board of Education.  He found that people gave good reviews to a textbook that was blank, because people took the average of the reviews they saw and passed it on to other people.  A few good experts would be better in this scenario than polling tons of teachers and administrators. :D)

If you still don’t believe me (if there’s one thing to take away here it’s to always question a popular opinion) feel free to post your best argument for why it should be better in the comments section.  And even better, if you don’t believe me try doing the experiment yourself 😉


The Locker Problem

There is a hallway of lockers numbered 1 through 100, all closed to start. There are also 100 students. Student 1 opens every locker, then student 2 closes every other locker, then student 3 opens or closes every 3rd locker (if it’s open then she closes it and if it’s closed then she opens it), and so on, where the nth student opens or closes every locker which is a multiple of n. (Does it matter in which order the students go down the hallway?)

The problem: Which lockers are open after all the students walk down the hallway?

Think about it for a bit, and I’ll post a solution later.
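If you would rather check your answer by brute force (after thinking about it first!), here is a minimal sketch that just acts out the hallway.  The function name and its `students` argument are my own inventions; passing a different order of students lets you experiment with the question in parentheses above.

```python
def open_lockers(n_lockers=100, students=None):
    """Act out the hallway and return the numbers of the lockers left open.

    students is the order in which the students walk (default: 1, 2, ..., n).
    All lockers start closed; student s toggles every multiple of s.
    """
    if students is None:
        students = range(1, n_lockers + 1)
    is_open = [False] * (n_lockers + 1)  # index 0 is unused
    for s in students:
        for locker in range(s, n_lockers + 1, s):
            is_open[locker] = not is_open[locker]
    return [k for k in range(1, n_lockers + 1) if is_open[k]]

if __name__ == "__main__":
    print(open_lockers(100))
```

Try it on a small hallway first, like `open_lockers(10)`, and see if you can spot the pattern before running all 100.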

The First-digit Law (Benford’s Law)

My dad told me about this really cool phenomenon that I almost didn’t believe the other day…

If you go through a newspaper and write down all of the numbers used in the paper, among those numbers approximately 30% of them will start with a 1.

I challenge you to go count for yourself (maybe not the whole paper but part of it!) if you feel so inclined.  I’ve never done it and I’m not sure how much of it you need to count to observe this (since small samples won’t be accurate).  If you do count, let us know what you find!

But what’s so special about 1?  Shouldn’t they all appear with the same frequency, and so 1 should appear 11% of the time?  That’s what makes this a paradox.

Furthermore, this applies to almost any real-world data set without restrictions on what the numbers can be.  As formulated, this is not a mathematical law because it is not stated precisely–what constitutes a real-world data set?  (The “approximately” isn’t precise either but can be made precise.)  Of course you could construct a data set where the first digits are 1 through 9 equally, but that would be cheating 😛  There is a way to make this mathematically precise, but it’s a bit out of the scope of this blog.

You’re probably still begging to know why this strange phenomenon even has a chance of happening.  Well, here are 2 facts that might convince you.  But first, you might be wondering what’s so special about 30%, and the answer is that it is very close to \log{2} \approx .30103 (where \log is the log base 10 as explained in the previous post).

1)  If you take a number, say 3 million (this is very loose), and look at the numbers less than it, you’ll find that at least as many start with 1 as with any other digit.  In the case of 3 million, about 37% start with a 1 (1,000,000 numbers between a million and 2 million + about 111,000 below a million), so it is very biased towards 1’s.
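The count in fact 1) is easy to check by brute force.  This is just a sketch (the function name is mine), and it takes a second or two to run because it really does look at every number up to 3 million.

```python
from collections import Counter

def first_digit_counts(limit):
    """Count how many of the numbers 1..limit start with each digit 1-9."""
    return Counter(int(str(n)[0]) for n in range(1, limit + 1))

LIMIT = 3_000_000
counts = first_digit_counts(LIMIT)
for digit in range(1, 10):
    # Print the digit, how many numbers start with it, and the share.
    print(digit, counts[digit], f"{counts[digit] / LIMIT:.1%}")
```

Changing `LIMIT` shows how loose this fact is: the share of leading 1's swings quite a bit depending on where you stop counting.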

2)  In the sequence of powers of 2, and in many other sequences, a fraction \log{2} of the terms begin with 1, or approximately 30%.

Proof: (Medium-hard)  What does it mean for a number n to start with a 1?  It means that the fractional part of \log{n} (the fractional part of 2.3 is .3; you chop off the integer part) is less than \log 2.  This is because for some integer k,  10^k \le n < 2 \cdot 10^k, which implies that k \le \log{n} < \log(2 \cdot 10^k) = \log{2} + \log{10^k} = \log{2} + k (by basic properties of log).  This is the same as saying that the fractional part is less than \log 2: \log{n} exceeds some integer by less than \log 2.  Furthermore, going back to the original question where n is a power of 2, the fractional part of \log{2^m} = m \log{2} is equally distributed over the interval [0,1) (this works because \log{2} is irrational).  That is hard to define precisely, but just imagine that if you take all multiples of \log{2} and put a dot where each fractional part lands, the space between 0 and 1 is uniformly marked up.  This shows that the probability that the fractional part of \log{n} is less than \log{2}, where n is a power of 2, is \frac{\log{2}}{1} = \log{2}!
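You can also just watch fact 2) happen.  The sketch below (my own function name, nothing official) looks at the first digit of 2^1, 2^2, ..., 2^m and compares the share that start with 1 to \log{2}.  Python's integers can be as large as you like, so the huge powers are not a problem.

```python
import math

def share_of_powers_of_two_starting_with(digit, n_terms):
    """Fraction of 2**1, 2**2, ..., 2**n_terms whose first digit is `digit`."""
    target = str(digit)
    hits = sum(1 for m in range(1, n_terms + 1) if str(2 ** m)[0] == target)
    return hits / n_terms

for n_terms in (10, 100, 1000):
    share = share_of_powers_of_two_starting_with(1, n_terms)
    # Compare the observed share with log base 10 of 2.
    print(n_terms, share, "vs", round(math.log10(2), 5))
```

The same function works for the other digits; the proof above suggests what share to expect for each of them if you adapt the argument.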

But WAIT!  Awesomely enough, Benford’s law has a clear real-world application, which is to check if a set of data is authentic or not.  (Unless the data-forger is careful enough to do a logarithmic distribution for the first digits of the numbers!)  According to Wikipedia, this law was used to discover fraud in the 2009 Iranian elections, which is pretty cool!   I wonder if they used it in the show NUMB3RS… (which sadly stopped airing 😦 )

NOTE:  The Wikipedia article on this topic seems to be horribly inaccurate.  (For more advanced readers, the way they define a logarithmic distribution isn’t even possible since there is no such thing as a uniformly distributed set on the number line…)

Thanks to my dad for telling me about this and for presenting it so well that all I had to do was essentially recreate what he said.

Properties of 2011

So, it’s the new year.  I challenged myself to think of as many cool mathematical properties of 2011 as I could, and this is what I came up with…

1.  2011 is a prime number!  To verify this, you have to check that none of the prime numbers less than \sqrt{2011}, which rounded down is 44, divide 2011.
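The check described in property 1 can be written out in a few lines.  This is only a sketch of that idea with helper names of my own; it is not meant to be a fast primality test.

```python
import math

def small_primes(limit):
    """All primes up to limit, found by checking each candidate directly."""
    return [p for p in range(2, limit + 1)
            if all(p % q for q in range(2, math.isqrt(p) + 1))]

def is_prime_by_trial_division(n):
    """True if no prime up to the square root of n (rounded down) divides n."""
    if n < 2:
        return False
    return all(n % p for p in small_primes(math.isqrt(n)))

print(math.isqrt(2011))                  # the square root of 2011, rounded down
print(small_primes(math.isqrt(2011)))    # the primes you have to try
print(is_prime_by_trial_division(2011))
```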

2.  As my friends have pointed out, 2011 is a sexy prime, since it is a member of a pair of primes 6 apart, which is (2011, 2017).  Though there seems to be no such term, it’s an octy prime as well, since 2003 (which is 8 less) is also prime.  After 2017, the next prime is 2027.

That so many primes occur in the 2000’s surprises me, but it’s actually not too surprising.  The Prime Number Theorem says that the “chance” that a number n is prime (of course really a number is either prime or it’s not) is about \frac{1}{\ln{n}} (\ln{n} is the log base e, e being the constant more important than \pi equal to around 2.718.   The log base b of n, or \log_{b} n = x if b^x=n, so it is the opposite of taking x to b^x.)  So the “chance” of a number around 2000 being prime is roughly \frac{1}{\ln{2000}} \approx \frac{1}{7.6}.  Once again I have underestimated how slowly the log function grows, because \ln{2000} isn’t very big.
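To see how the 1/\ln{n} estimate holds up near 2011, here is a small sketch that lists the primes in a window and compares the count with the estimate.  The window from 2000 to 2100 is my own arbitrary choice, and the Prime Number Theorem is a statement about the long run, so a window this small is only a rough sanity check.

```python
import math

def is_prime(n):
    """Plain trial division; fine for numbers this small."""
    if n < 2:
        return False
    return all(n % d for d in range(2, math.isqrt(n) + 1))

LOW, HIGH = 2000, 2100   # arbitrary window around 2011
primes = [n for n in range(LOW, HIGH) if is_prime(n)]

# Estimate: (how many numbers in the window) times 1/ln(a typical number in it).
estimate = (HIGH - LOW) / math.log((LOW + HIGH) / 2)

print(primes)
print("actual count:", len(primes), "estimate:", round(estimate, 1))
```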

I guess you could say then that the probability of 2011 being a sexy prime is low, but really a sexy prime is so contrived!!  (The Wikipedia article has failed to convince me of its usefulness, at least.)  (EDIT: Finding the probability of two numbers being sexy primes is actually harder than I thought, because n being prime and n+6 being prime aren’t independent events, so you can’t multiply their probabilities!)

3.  2011 can be written as the sum of 3 squares: 39^2+21^2+7^2.  Lagrange’s Four Square Theorem says that every positive integer can be written as the sum of 4 squares (0 is allowed as one of the squares) in at least 1 way.  An extension of that (Jacobi’s Four Square Theorem) says that for odd integers, the number of ways a number can be written as the sum of 4 squares is 8 times the sum of its divisors, where different orders and signs of the 4 numbers being squared count as different ways.  Since 2011 is prime, this is easy to find: it’s just 8 \cdot (2011+1) = 16,096 ways.  (You can find the sum of the divisors of a number in general as well by taking the sum of the divisors of each of the prime powers in its factorization and multiplying them.)  Don’t try listing all 16,096 ways 😛
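You shouldn't list the ways by hand, but a computer can count them.  The sketch below is my own approach, not anything from the theorems themselves: it first counts the ways to write each number as a sum of 2 squares, then pairs those up to get sums of 4 squares.  Orders and signs count as different ways, and 0 is allowed.

```python
import math

def count_four_square_ways(n):
    """Number of integer tuples (a, b, c, d) with a^2 + b^2 + c^2 + d^2 = n."""
    root = math.isqrt(n)
    # two_square_ways[m] = number of integer pairs (a, b) with a^2 + b^2 = m
    two_square_ways = [0] * (n + 1)
    for a in range(-root, root + 1):
        for b in range(-root, root + 1):
            total = a * a + b * b
            if total <= n:
                two_square_ways[total] += 1
    # Split n as m + (n - m): the first pair sums to m, the second to n - m.
    return sum(two_square_ways[m] * two_square_ways[n - m] for m in range(n + 1))

print(39**2 + 21**2 + 7**2)                            # the 3-square sum above
print(count_four_square_ways(2011), 8 * (2011 + 1))    # brute force vs. the formula
```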

Have a great 2011 🙂

2010 in review

Wow, I’m completely shocked by these results.  I thought only my friends and family were reading but I guess I’m wrong!  Thanks to all my readers and people who publicized this!!

The stats helper monkeys mulled over how this blog did in 2010, and here’s a high level summary of its overall blog health:

Healthy blog!

The Blog-Health-o-Meter™ reads This blog is on fire!.

Crunchy numbers

A Boeing 747-400 passenger jet can hold 416 passengers. This blog was viewed about 1,500 times in 2010. That’s about 4 full 747s.

In 2010, there were 5 new posts, not bad for the first year! There were 13 pictures uploaded, taking up a total of 96kb. That’s about a picture per month.

The busiest day of the year was August 31st with 152 views. The most popular post that day was What is math anyway?.

Attractions in 2010

These are the posts and pages that got the most views in 2010.


What is math anyway? August 2010


To Infinity and Beyond August 2010


The Beauty of Math: Fractals October 2010


How to Count…Better September 2010


Resources for Middle Schoolers August 2010
