Pick up a macroeconomic textbook, and you’ll read chapters of theory, mathematical equations, and models of rational behavior. Heavy on mathematics and social theory, such books are light on one essential thing: strong evidence that those theories, equations, and models correspond to the real world.
They seem to offer more attempts at grand theories of economics than empirical analyses of micro- and macroeconomic phenomena. In the literature, there has been relatively little interest in how this received theoretical knowledge works in practice, or in how we could improve the data and procedures we use to measure economic performance. The very classification of economics as a social science implies its distancing from “hard” sciences, such as physics or statistics. All of this raises an important question: is economics a study of how humans should behave, or is it an empirical study of what actually happens in human economic activity?
Most of the intellectual infrastructure (methods and processes) used to measure the economy was set up in the mid-20th century, in response to the Great Depression and World War II. Governments hold a monopoly on economic information, conducting vast surveys, censuses, and interviews -- in the United States, through agencies such as the U.S. Bureau of Labor Statistics and the U.S. Census Bureau. For example, the Census Bureau’s Monthly Advance Retail Trade Survey (MARTS), considered the bellwether for American consumer spending, is compiled each month from fifty-five hundred randomly selected businesses reporting their sales by mail, fax, or telephone. Many other useful indices and surveys, ranging from unemployment figures to population censuses (e.g., the American Community Survey, or ACS), similarly rely on large-scale, random-sample polling. The government’s mid-20th-century apparatus still works by selecting respondents, posing questions, and waiting for answers. The speed of collecting, processing, and publishing such information has not increased meaningfully, especially when compared to other data-reliant sectors, such as the near-instant reporting of securities’ prices on electronic financial markets. Moreover, falling response rates – that is, the percentage of contacted households and firms that actually answer surveys – profoundly distort the data collected. At the same time, the increasing digitization of the economy -- online payments, cheaper and more abundant sensors on the movement of economic goods, software tracking business activities -- has effectively given the private sector possession of the raw data that might provide an accurate, real-time measure of the economy. Data owners themselves could conceivably produce a granular, real-time view of economic conditions that traditional methods can only approximate slowly and at a lag.
Further, around the world, economic measurements based on government-monopolized methods and collection systems may not even be as accurate as data collected from corporate sources. In extreme cases, totalitarian regimes’ data are generally mistrusted by Western economists, since regime bureaucrats misrepresent figures or avoid publishing details for fear of missing anticipated targets. Few Western economists give complete credence to the details of official Chinese economic statistics, and the secretive interventions of Turkey’s central bank have recently raised doubts about that country’s official statistics. Even in liberal-democratic nations, outright economic misrepresentations can occur, as demonstrated by Japan’s recent admission that, for years, its government overstated construction orders and its health ministry misrepresented wage data for 2018.
The United States has thus far avoided such errors, but recent price inflation has sparked debate about the composition of common inflation indices, and the Census Bureau’s changes to employment microdata have raised questions about data granularity. In addition, routine revisions of earlier U.S. economic figures add to the effective lag between measurement and publication of definitive data. While data from real-time corporate sources may not be representative (they suffer from selection bias), statistical methods well tested in other fields, such as political polling, can compensate for such deficiencies.
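One such correction is post-stratification: reweighting a biased sample so that its composition matches known population shares (from a census, say). A minimal sketch of the idea, with entirely hypothetical numbers for an online sample that over-represents the young:

```python
# Post-stratification: reweight a biased sample so its demographic
# composition matches known population shares.
# All numbers below are hypothetical, for illustration only.

# Mean reported daily spending by age group in an online (biased) sample.
sample_means = {"18-34": 120.0, "35-54": 95.0, "55+": 60.0}

# Share of each group in the biased sample vs. the true population.
sample_shares = {"18-34": 0.60, "35-54": 0.30, "55+": 0.10}
population_shares = {"18-34": 0.30, "35-54": 0.35, "55+": 0.35}

# Naive estimate: weight each group by its share of the sample.
naive = sum(sample_means[g] * sample_shares[g] for g in sample_means)

# Post-stratified estimate: weight by the population shares instead.
poststratified = sum(sample_means[g] * population_shares[g]
                     for g in sample_means)

print(f"naive estimate:           {naive:.2f}")       # 106.50
print(f"post-stratified estimate: {poststratified:.2f}")  # 90.25
```

The naive average inherits the sample’s skew toward younger, higher-spending respondents; reweighting by population shares pulls the estimate back toward what a representative survey would find.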
Andrew Gelman, for instance, demonstrated that political polling of gamers through opt-in questions on the Xbox gaming console – surely an unrepresentative sample of voters – can be quite accurate, as well as cheaper and faster to administer than traditional polling methods. Applying similar methods to economics, in June 2020 Raj Chetty and a team of economists published daily spending statistics from private-sector sources, disaggregated by industry and consumer income group, presenting valuable conclusions on the new U.S. federal Paycheck Protection Program (PPP), which provided relief to small businesses under the CARES Act, and associated economic stimuli. Chetty’s conclusions appeared one month after the implementation of the stimulus and PPP programs and, most important, before the government had released aggregate figures on gross domestic product (GDP) for the first quarter of 2020, let alone disaggregated results evaluating particular policies. Other forward-minded economists have embraced this change, including John Friedman, Nathaniel Hendren, and Ezra Karger. They benefit from a tailwind of data appetite amid the COVID-19 pandemic and the associated social unease and even civil unrest.
The notion of using real-time, or ‘fast’, data to inform policy decisions is not new. In the early 1970s, Salvador Allende’s socialist government in Chile, keen to avoid food shortages and labor unrest, set up a network of telex machines (Project Cybersyn) to transmit details about factory production, inputs, and available transportation. While this example should not be taken as an endorsement of Allende’s policies, his government’s data-driven strategy alleviated one economic crisis in a country beset by many.
The adoption of real-time economic data in measuring and forecasting economic activity is perhaps not entirely surprising. What may be surprising, however, is that one needs no in-depth knowledge of economic theory to make predictions based on real-time economic data. Preparing such forecasts is closer to applied statistics and machine learning than to theoretical modeling. Consider the M-Competitions, for example. Hosted by Spyros Makridakis at the University of Nicosia, Cyprus, these contests have shown that statistical and machine-learning approaches forecast economic time series better than approaches rooted in theory or classical econometrics. The most recent competition demonstrated conclusively that machine learning paired with basic domain knowledge (of calendar holidays, for example) can out-predict domain-specific or theory-based models. The contest was not even close: the winner was a senior undergraduate at Kyung Hee University in South Korea with three years of forecasting experience, and the top thirty models almost all used similar machine-learning techniques.
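The flavor of calendar-aware forecasting that did well can be sketched in a few lines. This is a toy illustration on synthetic daily sales, with ordinary least squares standing in for the gradient-boosted trees that dominated the actual leaderboard; the holiday dates and effect sizes are invented:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic daily sales: baseline + weekend lift + holiday spike + noise.
n_days = 365
day_of_week = np.arange(n_days) % 7
is_weekend = (day_of_week >= 5).astype(float)
is_holiday = np.zeros(n_days)
is_holiday[[0, 185, 359]] = 1.0  # hypothetical holiday dates
sales = 100 + 25 * is_weekend + 60 * is_holiday + rng.normal(0, 5, n_days)

# Calendar features: one-hot day of week plus the holiday flag --
# no economic theory, just domain knowledge of the calendar.
X = np.column_stack([np.eye(7)[day_of_week], is_holiday])

# Fit by ordinary least squares, a deliberately simple stand-in
# for the gradient-boosted tree models top entries used.
coef, *_ = np.linalg.lstsq(X, sales, rcond=None)

# The holiday coefficient should recover the planted ~60-unit spike.
print(f"estimated holiday effect: {coef[7]:.1f}")
```

The point of the sketch is that the model knows nothing about consumer behavior; purely statistical fitting over calendar features recovers the structure in the series.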
Makridakis concludes that forecasting experts should focus on acquiring more contextual data and on measuring statistical uncertainty. Read this way, the M-Competitions’ results do not denigrate expertise in measuring and forecasting the economy; they suggest how to fine-tune the exercise.
Fast, cheap, “big” economic data and better statistical methods are not the only enablers of better economic decisions. Another essential element is enhanced computer technology to store, transform, and manipulate this data more effectively. “Modern data stack” software has made it easy and inexpensive to store, integrate, transform, and visualize data. “Cloud” infrastructure allows users to scale (by sharing) their computational resources based on use. The development of better software tools for researchers -- rather than for computer programmers -- has made manipulation and modeling available to a wider range of economists.
A recent development worthy of note is the advent of cryptocurrencies and blockchain technology for transactions. While cryptocurrencies are still in their infancy, being little more than a speculative asset and surely subject to the ire of governments fearing loss of control over, and hence democratic input into, monetary policy, the public nature of all transactions on the blockchain might upend the study of empirical economics. China’s digital currency, for example, will give its central bank an X-ray of the economy as never before. Cryptocurrency-based projects and organizations report operating metrics publicly on the web, often truly in real time – a startling speed compared to the quarterly reporting of U.S.-listed public companies. Should cryptocurrencies become tools for commercial transactions, we would find a treasure trove of real-time economic data available to researchers, investors, and the general public alike -- a far cry from three-month-old, 20th-century-era quarterly surveys.
Thanks for the insight!
I had the chance to hear this at Neudata a few weeks ago and appreciate the transcription/citations provided here.
You highlight that methods and processes for measuring the economy are outdated and suggest that high-frequency data from digital sources could be more accurate for economic measures. You note that falling response rates "distort the data collected" - how do you reconcile this view knowing that data collected by the private sector may also be biased towards segments of the population that are "online"? How long do you think it would take to demonstrate that traditional methods reported at a lag could be approximated by real-time data, or would the public sector have to collect both in parallel?