One word for graduates: statistics (translation)
I took the liberty of translating an interesting article from The New York Times.
After graduating from Harvard with a degree in archeology and anthropology, Carrie Grimes studied Mayan settlement patterns, mapping the places where artifacts were found. But she found herself drawn to what she calls “all these mathematical and computer things” that were part of her work.
“People think of archeology as what Indiana Jones did, but in fact, most of the work is data analysis,” Carrie says.
Now Miss Grimes is engaged in a “dig” of a different kind. She works at Google, where she applies statistical analysis to huge amounts of data to find ways to improve the company's search engine.
Miss Grimes is an Internet-generation statistician, one of many who are changing the image of a profession that used to be considered a haven for dull number crunchers. Statisticians are now in growing demand.
“I continue to argue that statistics will be the most attractive profession in the next ten years,” said Hal Varian, chief economist at Google. “And I'm not joking!”
The growing status of statisticians, who can earn $125,000 a year at leading companies right after receiving a doctorate, is a consequence of the explosive growth in data volumes. Computing and the Internet are creating new opportunities for data analysis: sensor data, security camera recordings, correspondence on social networks and much more. The growth of digital data is not expected to slow in the foreseeable future; its volume will increase fivefold by 2012, according to a study by IDC.
Data is just the raw material from which knowledge is extracted. “We are fast moving to a world where everything is measured and recorded,” says Erik Brynjolfsson, an economist and director of the MIT Center for Digital Business. “But the challenge remains the ability of people to use, analyze and extract something meaningful from the data.”
A new generation of statisticians is vigorously tackling this problem. They use powerful computers and sophisticated mathematical models to search for meaningful patterns in large stores of data. The applications are extremely diverse: from improving Internet search and online advertising to cancer treatment and optimizing food delivery.
Even the recently concluded Netflix contest, which offered one million dollars to whoever could significantly improve the company's system for recommending films to users, was a contest of modern statistical methods.
But despite all this, statisticians are only a small part of the many experts who use statistics to analyze data. Computational and numerical skills matter more than it might seem. As a result, new specialists in data analysis come from fields such as economics, computer science and mathematics.
Data analysis specialists are in high demand at the White House today. “Clean, reliable data is the first step towards coordinating our long-term economic policies and key policy priorities,” said Peter Orszag, director of the Office of Management and Budget, in a speech in May. Later that day, Mr. Orszag admitted on his blog that the importance of statistics, the subject of his talk, was “close to my (I must admit, pedantic) heart.”
IBM, seeing an opportunity in data analysis, created its “Business Intelligence and Optimization Services” unit in April. The unit will draw on more than 200 mathematicians, statisticians and other analysts in the company's research laboratories, but this is not enough. IBM plans to hire or retrain 4,000 more analysts across the company.
Another sign of growing activity in this field: approximately 6,400 people are attending a professional statistical conference in Washington this week, up from about 5,400 in past years, according to the American Statistical Association. The participants, men and women, young and already graying, looked like any other crowd of tourists in the capital. But their enthusiastic conversations were about randomness, parameters, regression and clustering. The growth of data is raising the profile of a profession that has traditionally handled less visible and less profitable work, such as setting rates for life insurance.
Miss Grimes, 32, earned a degree in statistics from Stanford in 2003 and joined Google the same year. She is now one of many statisticians in a group of 250 data analysts. The group uses statistical modeling to help make the search engine better.
For example, Miss Grimes has worked on an algorithm that tunes the search robot (the crawler that scans web pages to keep the search index up to date). The model increased the likelihood that the robot checks frequently updated pages more often and rarely updated pages less often.
The goal, according to Miss Grimes, is to get a small gain in computing efficiency. “Increasing efficiency by a percent or two can have a huge effect if the operation is repeated millions and billions of times, as it is at Google,” adds Carrie.
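The idea of visiting frequently updated pages more often can be illustrated with a minimal sketch. This is not Google's algorithm, which the article does not describe. It assumes a hypothetical crawl history (how many past visits found each page changed), a fixed budget of visits, and a simple rule that splits the budget in proportion to each page's estimated change rate. All names and URLs are made up for illustration.

```python
def estimate_change_rate(changes_seen, visits):
    """Smoothed share of past visits on which the page had changed.

    Adding 1 to the numerator and 2 to the denominator keeps the
    estimate away from exactly 0 or 1 when the history is short.
    """
    return (changes_seen + 1) / (visits + 2)


def allocate_visits(history, budget):
    """Split a fixed number of crawler visits across pages.

    history maps a URL to (changes_seen, visits); the result maps
    each URL to its number of visits in the next crawl cycle.
    """
    rates = {url: estimate_change_rate(c, v) for url, (c, v) in history.items()}
    total = sum(rates.values())
    shares = {url: budget * rate / total for url, rate in rates.items()}

    # Round down first, then give the leftover visits to the pages
    # with the largest fractional remainders so the budget is used up.
    plan = {url: int(share) for url, share in shares.items()}
    leftover = budget - sum(plan.values())
    by_remainder = sorted(shares, key=lambda u: shares[u] - plan[u], reverse=True)
    for url in by_remainder[:leftover]:
        plan[url] += 1
    return plan


# Hypothetical history: (times the page had changed, times it was visited).
history = {
    "news.example/front": (48, 50),
    "blog.example/post": (10, 50),
    "docs.example/archive": (1, 50),
}
plan = allocate_visits(history, budget=100)
print(plan)
```

In this toy setup the front page, which changed on almost every visit, receives most of the budget, while the archive page is checked only occasionally. A real crawler would also have to weigh page importance and the cost of serving stale results, which this sketch ignores.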
The sheer amount of data on the Web is opening up a new world for research. Traditionally, the social sciences have studied behavior through interviews and surveys. “But the Web provides this great opportunity to watch millions of people behave,” says Jon Kleinberg, a social network researcher at Cornell.
For example, in a study just published, Kleinberg and two of his colleagues monitored the flow of ideas on the Web. They followed 1.6 million news sites and blogs during the 2008 presidential campaign, using algorithms that searched for and tracked phrases related to the news.
The researchers at Cornell found that, in general, traditional media lead and blogs follow, usually with a delay of two and a half hours. But a handful of blogs were the quickest to pick up quotes that later became widespread.
The huge sources of data on the Web, experts warn, carry their own dangers. Their sheer volume can simply “crush” statistical models. Researchers also warn that a strong correlation in the data does not always mean a causal relationship.
For example, in the late 1940s, before the invention of the polio vaccine, health experts in America noticed that cases of the disease rose along with the consumption of ice cream and soft drinks, according to David Alan Grier, a historian and statistician from George Washington University. Removing such treats from the menu was even recommended as part of an anti-polio diet. Later it turned out that polio outbreaks occurred more often in the hot months of summer, when people ate more ice cream, so the link was only an association.
While the “explosion” of data magnifies long-standing problems in statistics, it also opens up new frontiers.
“The key is to let computers do what they are good at, which is to look for something that seems odd from a mathematical point of view in these datasets,” says Daniel Gruhl, an IBM researcher whose recent work focuses on analyzing medical data to improve quality of service. “And that leaves people to do what they do best, which is to interpret these anomalies.”
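The ice cream story can be reproduced with a small simulation. The sketch below uses entirely synthetic numbers, not historical data. It assumes that summer heat drives both ice cream consumption and polio cases, and that the two have no direct effect on each other. It then compares the raw correlation with the partial correlation after accounting for temperature. The variable names and noise levels are arbitrary choices for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def partial_corr(xs, ys, zs):
    """Correlation of xs and ys after removing the linear effect of zs."""
    r_xy = pearson(xs, ys)
    r_xz = pearson(xs, zs)
    r_yz = pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


random.seed(0)  # fixed seed so the run is repeatable

# Synthetic "heat" level per week, between 0 (cold) and 1 (hot).
temperature = [random.random() for _ in range(2000)]
# Both quantities depend on temperature plus independent noise;
# neither one influences the other.
ice_cream = [t + random.gauss(0, 0.1) for t in temperature]
polio = [t + random.gauss(0, 0.1) for t in temperature]

raw = pearson(ice_cream, polio)
adjusted = partial_corr(ice_cream, polio, temperature)
print(f"raw correlation: {raw:.2f}")
print(f"after accounting for temperature: {adjusted:.2f}")
```

The raw correlation comes out strong even though ice cream has no effect on polio in the simulation. Once temperature is taken into account, the correlation drops to nearly zero. This is the pattern one would expect when a hidden common cause is behind an observed association.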