Google Engineer Uses Porn to Predict Unemployment
MOUNTAIN VIEW, Calif. – Porn is in the eye of the beholder. In one former Google data engineer’s eye, explicit adult content online — or, rather, the number of searches for such content — is a bellwether for the state of the U.S. economy.
According to Seth Stephens-Davidowitz, former Google employee and author of the new book Everybody Lies, the search giant’s data can predict all sorts of things. For example, one researcher at the company designed a piece of software that used flu-related searches to create a predictive map of outbreaks well before the U.S. Centers for Disease Control released its official statement. The Google-search-based map proved as accurate as the CDC’s.
Stephens-Davidowitz and Google’s chief economist, Hal Varian, wanted to know what a similar process would indicate about the U.S. economy. Coincidentally, at about the same time, Google software developers released Google Correlate, which is now part of Google Trends. The tool takes a data series a researcher supplies and finds the search terms whose popularity over time tracks it most closely, in effect crowdsourcing hypotheses about what moves in step with the thing being studied.
Because economists are famously preoccupied with jobless statistics, Stephens-Davidowitz and Varian set out to discover what Google Correlate could tell them about the unemployment rate.
They compared search and unemployment data from the years 2004 through 2011, and found an interesting trend: When unemployment is up, so are searches for porn.
“This may seem strange at first blush, but unemployed people presumably have a lot of time on their hands,” Stephens-Davidowitz wrote. “Many are stuck at home, alone and bored.”
He may be onto something there, as another search term highly correlated with unemployment was “spider solitaire.”
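For readers curious about the mechanics, the kind of comparison described above can be sketched in a few lines of Python. This is only an illustration, not the method Stephens-Davidowitz and Varian used or the way Google Correlate works internally: it assumes a plain Pearson correlation, the helper names pearson and rank_terms are hypothetical, and every number in it is made up for demonstration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(var_x * var_y)


def rank_terms(target, term_series):
    """Rank search terms by how closely each one tracks the target series."""
    scores = {term: pearson(series, target) for term, series in term_series.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up illustrative numbers, NOT real unemployment or search data.
unemployment = [5.0, 6.1, 8.3, 9.6, 9.1, 8.5]
searches = {
    "term_a": [40, 47, 60, 71, 66, 62],  # rises and falls with the target
    "term_b": [70, 66, 52, 45, 49, 51],  # moves in the opposite direction
}

for term, r in rank_terms(unemployment, searches):
    print(f"{term}: r = {r:.2f}")
```

A coefficient near 1 means a term's popularity rises and falls with the jobless rate, and a coefficient near -1 means it moves the opposite way. Either result shows only that two series move together, not that one causes the other.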
The gist of Stephens-Davidowitz’s book is not that porn and unemployment go hand in hand, although that’s an interesting tangential notion. The book actually looks at data mining and how it may answer all sorts of vexing questions if researchers look beyond the obvious sources of information. Surveys alone do not provide reliable measures because, as the title of the book suggests, everybody lies, especially about sensitive or personal issues.
Big Data, on the other hand, has proved remarkably truthful.