Why Credit Card Data still makes money
And, why commoditization of alternative data is a lazy argument
I recently came back from Battlefin Miami[1], a conference on datasets aimed at hedge funds. The conference is a trade show for data vendors pitching offerings to data buyers, predominantly hedge funds. One theme that came up was the perceived commoditization of consumer spending data[2]. I frequently hear data vendors, buyers, portfolio managers, and hedge fund LPs ask: if alternative data, and in particular consumer spending data, has become commoditized and widely available, why does anyone expect to generate alpha from it?
My tongue-in-cheek answer is that Compustat has been selling a database for decades and alpha is still generated from it. For the non-quants, Compustat is the standard in quantitative financial research[3]. When it first became available, mere access to the data via computers was an advantage. Today, the dataset is offered commercially at relatively accessible prices. It is the definition of commoditized, and yet quantitative funds buy it, use it, and a certain number generate consistent alpha from it. This apparent paradox is resolved when the infrastructure and talent needed to execute a strategy, in addition to mere access[4], are considered. This observation is not particularly insightful, but it is worth writing down because it is so frequently forgotten.
The infrastructure to execute alpha-generating strategies on top of consumer spending data remains a major barrier to entry. It is computationally expensive to run multiple experiments and backtests that incrementally improve forecasting ability. It is similarly expensive to update forecasts frequently. So, teams that have sufficient compute resources and the institutional patience to invest in infrastructure benefit from a barrier to entry that smaller firms relying on syndicate research do not. The incremental cost for syndicate providers to do this marginal work likely scales more slowly than the return large buy-side firms earn by investing in this capability[5]. Further, for at least some types of strategies, computational effort does not increase meaningfully with assets under management, so large firms can spend a smaller percentage of management fees on more compute. This is a very attractive property: if an actionable insight on a large-cap, liquid stock costs one million dollars to compute, it is far better to be regularly betting hundreds of millions on such insights than tens of millions. This amortization effect also works favorably when considering the costs of building internal tooling for processing alternative data.
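To make the amortization point concrete, here is a minimal back-of-the-envelope sketch. The one-million-dollar compute cost comes from the example above; the position sizes and the function name `breakeven_return` are hypothetical, chosen only for illustration. It computes the return a single position must earn just to pay for the compute behind it, ignoring every other cost.

```python
# Back-of-the-envelope amortization of a fixed compute cost over position size.
# COMPUTE_COST is the figure used in the text; POSITION_SIZES are hypothetical.

COMPUTE_COST = 1_000_000  # dollars to compute one actionable insight
POSITION_SIZES = [10_000_000, 50_000_000, 100_000_000, 500_000_000]  # assumed


def breakeven_return(compute_cost: float, position_size: float) -> float:
    """Return the fractional gain needed on the position to cover compute cost."""
    if position_size <= 0:
        raise ValueError("position_size must be positive")
    return compute_cost / position_size


for size in POSITION_SIZES:
    hurdle = breakeven_return(COMPUTE_COST, size)
    # Express the hurdle in basis points (1 bp = 0.01%).
    print(f"${size:>12,} position -> breakeven {hurdle * 10_000:,.0f} bps")
```

Under these assumptions, a firm betting tens of millions needs the insight to be worth several hundred basis points before it pays for itself, while a firm betting hundreds of millions needs only a small fraction of that. The fixed cost is the same; only the denominator changes.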
Talent that can generate alpha with consumer spending data also remains extremely rare. The bleeding edge of consumer spending research relies on having both a deep technical understanding of the structure of the data and a deep domain understanding of what matters to markets. For short-term strategies, properly accounting for domain-specific issues, such as the impact of sales taxes or revenue recognition, while building systems that are fast is the crux of outracing the competition. For longer-term strategies, the challenge is understanding which of a potentially long list of key investment questions, such as retention or hyperlocal competition, matter for which company, while keeping these calculations computationally feasible. I have written about this talent gap at length before[6].
Without pretense of a major insight, I predict infrastructure and talent will remain the key differentiators in executing on information advantages, and that those advantages will accrue to investment firms that invest in more sophisticated data science teams. Beyond consumer spending data at hedge funds, similar trends are playing out in adjacent industries like venture capital. A mere five years ago, I would often be met with blank stares when telling venture capital investors that I was focused on using data to find early-stage deals. Today, virtually every venture capital firm has at least one person sourcing deals using data. Clearly, mere feeds of LinkedIn data are no longer enough; it is about the people and infrastructure you build. Execution is hard and scale begets scale.
[2] I will refer to it as consumer spending data, but it is often referred to as credit card data despite representing all types of consumer spending, such as checks, debit cards, gift cards, etc., and not necessarily being limited to literal credit cards.
[3] The prevalence of Compustat, even in academia, can be quickly confirmed with Google Scholar.
[4] There is no denying that the proliferation of consumer spending data has had an impact on investor expectations and information as it pertains to consumer stocks, but it appears that this has merely changed expectations. I recommend Expectations Investing on this topic.
[5] For example, how quickly a syndicate data provider could recoup the cost of providing data one day faster through price increases likely compares unfavorably to how quickly a large institution could recoup the same cost given the immediate trading advantage.