Last Thursday, Steven J. Miller, associate professor of mathematics, kicked off the 2015 faculty lecture series with his lecture entitled “Why the IRS cares about the Riemann Zeta Function and Number Theory (and why you should too!).” This lecture was the first of five faculty lectures, the next four of which will take place on consecutive Thursdays starting Feb. 26.
The focal point of Miller’s talk was Benford’s law. “My goal today is to explain to you what is Benford’s law,” Miller said. “Where does it occur? Why does it occur? Why do we care about its occurrence? And what can we use this for?” The motivating question for his lecture was: For a nice data set, what percentage of the leading digits are ones? A leading digit is the first digit in a number; for example, the leading digit of 12 is one. The natural guess, Miller said, is 10 percent, though this guess is immediately corrected to about 11 percent, or one in nine, upon considering that zero is never a leading digit.
Benford’s law shows, however, that this is not what happens in many naturally occurring data sets. In fact, approximately 30 percent of the leading digits in such a data set are ones. This holds for data sets such as the Fibonacci numbers, the most common iPhone passcodes and the distances of stars from Earth.
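The 30 percent figure can be checked on the Fibonacci numbers with a few lines of Python. This is only an illustrative sketch, not something shown in the lecture: the function names are made up for this example, and the comparison column uses the standard statement of Benford’s law, that digit d leads with probability log base 10 of (1 + 1/d).

```python
import math
from collections import Counter


def leading_digit(n: int) -> int:
    """Return the first decimal digit of a positive integer."""
    return int(str(n)[0])


def fibonacci(count: int):
    """Yield the first `count` Fibonacci numbers, starting 1, 1, 2, ..."""
    a, b = 1, 1
    for _ in range(count):
        yield a
        a, b = b, a + b


def leading_digit_shares(values):
    """Map each digit 1-9 to the fraction of values that start with it."""
    counts = Counter(leading_digit(v) for v in values)
    total = sum(counts.values())
    return {d: counts[d] / total for d in range(1, 10)}


def benford_share(d: int) -> float:
    """Share predicted by Benford's law for leading digit d."""
    return math.log10(1 + 1 / d)


# Sample size of 1,000 is an arbitrary choice for illustration.
shares = leading_digit_shares(fibonacci(1000))
for d in range(1, 10):
    print(d, round(shares[d], 3), round(benford_share(d), 3))
```

The first column of output is the digit, the second is the observed share among the first 1,000 Fibonacci numbers, and the third is the share Benford’s law predicts.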
Miller went on to discuss examples where Benford’s law applies. The examples were numerous and ranged from recurrence relations to special functions such as factorials to hydrology and financial data. He also mentioned numerous applications of Benford’s law, such as analyzing round-off errors and detecting tax and image fraud. “It may not be fraud,” Miller said. “It may just be the system is not reporting values properly. So this could be data integrity. There may be nothing malevolent going on but you may have values being recorded improperly, and Benford’s law becomes a very nice way to check this.”
Miller went on to state Benford’s law mathematically. He then gave a brief overview of modular arithmetic, explaining to the audience how time and clocks are a good example. He asked his son what time it would be four hours after 10 a.m. and his son correctly answered 2 p.m. “We say two numbers are the same mod c if their difference is a multiple of c,” Miller said. For example, clocks operate in modulo 12 arithmetic, so 10 plus four is equal to 14 normally, but 14 is congruent to two mod 12 because 14 minus two equals 12, which is a multiple of 12. Miller then gave the audience the machinery to determine whether two numbers have the same leading digits. The key observation is that two positive numbers have the same leading digits if and only if their base-10 logarithms are congruent mod one, which turns the problem into one about logarithms.
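Both ideas in this part of the talk, clock arithmetic and reading leading digits off a logarithm mod one, can be sketched in Python. The helper names below are hypothetical, and the tolerance used to compare floating-point values is an assumption made for the example; numbers that sit exactly on a power of 10 may need more careful handling.

```python
import math

# Clock arithmetic: four hours after 10 o'clock on a 12-hour clock.
print((10 + 4) % 12)  # 14 and 2 differ by 12, a multiple of 12


def mantissa_log(x: float) -> float:
    """Return log10(x) mod 1, the fractional part of the log, for x > 0."""
    return math.log10(x) % 1.0


def same_leading_digits(x: float, y: float, tol: float = 1e-9) -> bool:
    """True if x and y share the same digits up to a shift of the decimal point.

    The tolerance is an assumption of this sketch, needed because
    floating-point logs are not exact.
    """
    return abs(mantissa_log(x) - mantissa_log(y)) < tol


def leading_digit_from_log(x: float) -> int:
    """Recover the first digit of x from log10(x) mod 1 alone."""
    return int(10 ** mantissa_log(x))


print(same_leading_digits(314, 3.14))   # logs differ by exactly 2
print(same_leading_digits(314, 27))     # different digits
print(leading_digit_from_log(0.00314))  # first digit, read from the log
```

Because 314 is 3.14 times 100, their logs differ by the integer two, so the two numbers are congruent mod one and share the same leading digits.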
“The key ingredient in a lot of the proofs is this notion of equidistribution,” Miller said.
“We say a sequence of numbers is equidistributed modulo one if the fraction of the time it lands in the interval is the relative length of that interval. So let’s say all of our numbers, the only possible values they take are between zero and one. Then what does it mean for my sequence to fall equidistributed in the interval, to fall uniformly in the interval? It means the fraction of the time I land in the subinterval (a,b) is just the length of the interval b minus a.”
An important theorem builds on this definition: if a number is irrational, then its integer multiples are equidistributed mod one. For example, the square root of pi is irrational, and looking at n times the square root of pi for n less than or equal to 10,000, Miller showed that “very quickly, this settles into equidistributed behavior.”
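The square root of pi example is easy to try numerically. The sketch below is an illustration rather than a reconstruction of Miller’s slides: the function name and the choice of test intervals are assumptions. It counts how often n times the square root of pi, taken mod one, lands in a subinterval and compares that share with the interval’s length.

```python
import math


def fraction_in_interval(alpha: float, n_max: int, a: float, b: float) -> float:
    """Fraction of n = 1..n_max for which (n * alpha) mod 1 lies in [a, b)."""
    hits = sum(1 for n in range(1, n_max + 1) if a <= (n * alpha) % 1.0 < b)
    return hits / n_max


alpha = math.sqrt(math.pi)

# The intervals are arbitrary choices for illustration.
for a, b in [(0.0, 0.1), (0.25, 0.75), (0.9, 1.0)]:
    observed = fraction_in_interval(alpha, 10_000, a, b)
    print((a, b), "observed", round(observed, 4), "length", round(b - a, 4))
```

If the sequence is equidistributed, each observed share should be close to the length of the corresponding interval.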
“It turns out your initial data set is Benford, it has this digit bias, if and only if this transform data set [via logarithms] falls evenly in the interval zero to one when we look at just the fractional parts,” Miller said.
For example, the sequence two to the n is Benford because the log base 10 of two to the n is equal to n times the log base 10 of two. The log base 10 of two is irrational, so these multiples fall uniformly mod one, as proven earlier in the talk. Miller went on to sketch the proof that the Fibonacci numbers are indeed Benford, and in fact, most linear recurrence relations are as well.
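The argument for powers of two can be checked in Python by counting leading digits two ways: directly from the integers, and from where n times the log base 10 of two lands mod one. This is a rough sketch with made-up function names, and it relies on the standard fact that a number has leading digit d exactly when its log mod one lies between log base 10 of d and log base 10 of d + 1. Floating-point rounding can misplace a value that sits exactly on a boundary, such as 2, 4 or 8, so the two counts may differ by a hair.

```python
import math

LOG2 = math.log10(2)


def share_via_digits(d: int, n_max: int) -> float:
    """Fraction of 2**n, n = 1..n_max, whose first decimal digit is d."""
    hits = sum(1 for n in range(1, n_max + 1) if str(2 ** n)[0] == str(d))
    return hits / n_max


def share_via_logs(d: int, n_max: int) -> float:
    """Fraction of n = 1..n_max with (n * log10 2) mod 1 in [log10 d, log10(d+1))."""
    lo, hi = math.log10(d), math.log10(d + 1)
    hits = sum(1 for n in range(1, n_max + 1) if lo <= (n * LOG2) % 1.0 < hi)
    return hits / n_max


# n_max = 2000 is an arbitrary sample size for illustration.
for d in range(1, 10):
    print(
        d,
        round(share_via_digits(d, 2000), 3),
        round(share_via_logs(d, 2000), 3),
        round(math.log10(1 + 1 / d), 3),  # length of the interval for digit d
    )
```

The last column is the length of the interval assigned to each digit, which is the share Benford’s law predicts when the logs are equidistributed mod one.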
To end his talk, Miller gave the audience a concrete example of how the IRS used Benford’s law to discover fraud. “An audit of a bank revealed a huge spike of numbers starting with first digits … 48 and 49,” he said, compared with what Benford’s law predicts. A bank officer had been having his friends get credit cards and run up balances just below $5,000, and the officer would then write off the debts.
“The bank had an internal limit of 5000 dollars. It wasn’t worth their time to investigate a stolen credit card with fifteen dollars in charges,” Miller said. “The mistake a lot of the banks were making was they had a fixed line … What you should really do is you should say we investigate everything above 60,000 and then we randomly investigate some subset of things below 60,000.” Benford’s law allowed the bank to notice this unusual pattern in the credit card charges and helped it catch the offender.