Market research and targeted advertising require consumer data, specifically purchase and digital behavior records. Data can be collected actively or passively. Active collection requires consumers to take surveys, submit records, or make some other effort to contribute data. Consumers participate because they are required to by law, because they volunteer, or because they are paid. The collected data can be high quality, but obvious downsides include noncompliance, incomplete data, the time required of participants, and the effort required to recruit them[1]. These disadvantages lead to relatively small samples and infrequent collection.
Passive data collection does not require effort from the consumer; rather, data is collected continuously in the background by an app, bank, or service the consumer is using. Its comparative advantage over active collection is that the collected data is more complete and less biased, and that more consumers can be reached. Privacy laws and consumer opinion dictate that passive data collection needs consumer opt-in.
Consumer opt-in can be obtained in exchange for free products and services (“in-kind”) or for explicit payment (“data dividends”). The former is well understood: companies like Meta offer their products for free in exchange for collecting data from their user base that can be used by or sold to data users. The latter is less common but has been the frequent subject of speculation. Proponents of data dividends note that they combine the best attributes of active data collection (very explicit opt-in) with the best attributes of passive data collection (low effort for the consumer).
To my knowledge, no dataset funded by data dividends exists at a scale comparable to in-kind funded datasets. Despite limitations introduced by privacy laws, changes by corporations in response to media attention[2], multiple startup efforts, and emerging interest driven by the crypto community, data dividend datasets have not become common.
My hypothesis is that the unit economics of data dividends are structurally challenging and may preclude large-scale adoption. The cost of data dividends scales linearly with the number of participants, because every additional participant must be paid. In contrast, in-kind payments scale sub-linearly because the marginal cost of provisioning an application or software service is low and can decrease with scale. For instance, Meta’s cost of hosting a marginal user is very low, and that marginal cost decreases with each order of magnitude in scale.
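The scaling argument can be made concrete with a toy cost model. The sketch below is illustrative only: the $2 dividend, the base cost, and the 0.8 scaling exponent are assumptions chosen for the example, not figures from Meta or from any real dividend program.

```python
def dividend_cost(users: int, payment_per_user: float) -> float:
    """Total payout when every participant receives the same dividend (linear)."""
    return users * payment_per_user


def in_kind_cost(users: int, base_cost: float, exponent: float = 0.8) -> float:
    """Total provisioning cost under an assumed power law.

    An exponent below 1 makes the total cost sub-linear in the number of users,
    so the cost per user falls as the user base grows.
    """
    return base_cost * users ** exponent


if __name__ == "__main__":
    # Hypothetical inputs: $2 dividend per user, base cost of 5.0, exponent 0.8.
    for users in (10_000, 1_000_000, 100_000_000):
        div = dividend_cost(users, payment_per_user=2.0)
        kind = in_kind_cost(users, base_cost=5.0)
        print(
            f"{users:>11,} users: dividend ${div / users:.2f}/user, "
            f"in-kind ${kind / users:.4f}/user"
        )
```

Under these assumptions the dividend cost per user stays flat at every scale, while the in-kind cost per user keeps falling.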
The unfavorable marginal costs of data dividends could be overcome, of course, if revenue scaled faster, but prevailing data prices per consumer are too low. This is again best illustrated by Meta. Meta’s average revenue per user is approximately $40 per year, or about $3.33 per month[3]. Meta is just one example of an in-kind data buyer, but Meta’s monetization per user likely represents a best-case scenario[4].
In exchange for the data that generates this revenue, Meta users can use the family of Meta products (Facebook, Instagram, WhatsApp, Messenger) for free. The annual in-kind value of the Meta family of apps likely far exceeds $40[5]. This places data dividend programs in the unenviable position of having to compete against a difficult unit economic equation.
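The arithmetic behind this comparison can be sketched as follows. The roughly $40 of annual revenue per user comes from the text above; the `operating_share` parameter (the fraction of revenue consumed by running the program) is a hypothetical input introduced for illustration, not a reported figure.

```python
ANNUAL_REVENUE_PER_USER = 40.0  # approximate Meta figure cited in the text


def monthly_revenue(annual_revenue: float) -> float:
    """Convert annual revenue per user into a monthly figure."""
    return annual_revenue / 12


def max_monthly_dividend(annual_revenue: float, operating_share: float) -> float:
    """Largest monthly dividend that still leaves the program at break-even.

    operating_share is a hypothetical fraction (0 to 1) of revenue spent on
    running the program before anything is paid out to participants.
    """
    if not 0.0 <= operating_share <= 1.0:
        raise ValueError("operating_share must be between 0 and 1")
    return monthly_revenue(annual_revenue) * (1.0 - operating_share)


if __name__ == "__main__":
    print(f"Monthly revenue per user: ${monthly_revenue(ANNUAL_REVENUE_PER_USER):.2f}")
    for share in (0.0, 0.5, 0.8):  # hypothetical operating shares
        ceiling = max_monthly_dividend(ANNUAL_REVENUE_PER_USER, share)
        print(f"Operating share {share:.0%}: dividend ceiling ${ceiling:.2f}/month")
```

Even with no operating costs at all, the ceiling on the dividend is about $3.33 per month, and that is the amount that has to compete with the free use of a full suite of apps.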
A corollary challenge is adverse selection and redundant opt-in of participants. Willingness to accept a low price for one’s data is correlated with certain demographic and socioeconomic characteristics, which biases the resulting sample. For many (but not all) market research and advertising purposes, this set of users is less valuable. Also, consumers with this preference are likely to sign up for many such data dividend programs to maximize their earnings, so the total universe of users, even across data dividend programs, often has high overlap.
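The overlap effect can be illustrated with a small sketch. The program names and user IDs below are made up for illustration; the point is only that summing per-program sign-ups overstates the number of distinct consumers when the same people join several programs.

```python
def unique_reach(programs: dict[str, set[str]]) -> int:
    """Number of distinct users across all programs."""
    return len(set().union(*programs.values()))


def overlap_ratio(programs: dict[str, set[str]]) -> float:
    """Share of total sign-ups that are repeat appearances of an already-counted user."""
    total_signups = sum(len(users) for users in programs.values())
    if total_signups == 0:
        return 0.0
    return 1.0 - unique_reach(programs) / total_signups


if __name__ == "__main__":
    # Hypothetical sign-up lists for three hypothetical dividend programs.
    programs = {
        "program_a": {"u1", "u2", "u3"},
        "program_b": {"u2", "u3", "u4"},
        "program_c": {"u3", "u4", "u5"},
    }
    signups = sum(len(users) for users in programs.values())
    print(f"Total sign-ups: {signups}")
    print(f"Distinct users: {unique_reach(programs)}")
    print(f"Redundant share of sign-ups: {overlap_ratio(programs):.0%}")
```

In this made-up example, nine sign-ups correspond to only five distinct consumers, so a buyer combining the three datasets gains far less reach than the headline counts suggest.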
Finally, there is a lack of consensus among policy stakeholders that data dividends are a good solution. For instance, the Electronic Frontier Foundation (“EFF”), a digital rights and privacy advocacy group that is often in favor of stricter data privacy laws, actually opposes data dividend programs[6]. This is surprising at first glance, since the media-driven argument against passive data collection, namely that consumers are deceived, is clearly not relevant to data dividends. Instead, the EFF argues that data dividends are unfair on the basis that the consumer is deceived about the relative value of their data.
So, is the data dividend model hopeless?
Some promising ideas I have come across include:
Hybrid Value: It is conceivable that certain data dividend products also provide value in-kind, such that the dividend and the utility from the app together overcome the unit economic hurdles. Consumers would explicitly know they are selling their data (thereby accomplishing the stated goal of most dividend programs), but the app would also provide enough utility that the direct payments could be kept at an economical level. For instance, a dividend program may offer a gamified experience that is fun to participate in, in addition to payment for data. Where the line lies between in-kind offers and true dividend programs, I will leave to the reader.
Variable Upside: Novel data dividend models could offer variable dividends dependent on some third party process or user action. For example, a quant fund may offer a stake in results in exchange for contributing data. Similar profit-sharing programs were popularized with the crypto wave as Decentralized Autonomous Organizations (“DAOs”). Or, a dividend program may offer discounts for purchases consumers were going to make anyway in exchange for their data.
What am I missing?
I could be wrong. This time it could be different. It is possible that consumer preferences for privacy change so drastically that unit economics are no longer the incentive driver (rather, participation becomes an ethical statement). It is possible that privacy-oriented changes, such as the elimination of internet cookies, drive up the price of identifiable data. And it is possible that someone invents use cases for consumer data that are so lucrative that very high dividends are justified. If you have counterexamples, or if you are aware of a very large dataset driven by data dividends, please reach out.
There has been much discussion about how to compensate authors, journalists, artists, and other content creators for their work being used in AI. I have heard variations of the data dividend pitch to accomplish this. Proponents of such a solution would do well to consider the consumer data precedent.
[1] I have previously written a fuller accounting of the problems with government surveys:
[4] This is just a matter of opinion, but I would make the case that Meta is among the most lucrative examples of data monetization. Meta owns what is effectively an exclusive dataset, has extremely specialized ad targeting ability, and has extremely high penetration among advertisers (data buyers). Therefore, I think Meta’s revenue per user represents, more or less, the highest-value scenario for data sellers.
[5] Again, a matter of opinion and difficult to estimate exactly.
Great article and really insightful distinction on the linearity of reward vs marginal data user. I’d also add “marginal data contribution”, because each user also has a range of data they are actively contributing. In this realm, money management apps (Mint, TrueBill/Rocket) may have reached the largest scale of hybrid data sharing of any category. They combined a value-add experience (budget insights) with direct opt-in on a case-by-case basis (account connectors).
Great article. Are you aware of any quant funds offering a stake in results in exchange for contributing data?