Simple, Fast, and Transparent Data Sales
Principles for Data Monetization
The following does not represent and is not intended to be investment advice. I may own securities referenced below.
The best data vendors are simple, fast, and transparent. As a former buyer of data at a large hedge fund, I am frequently asked how a new data vendor should approach monetization. Here are some thoughts:
Data governance is an additional cost data consumers incur. The fewer restrictions on the use license, the larger the share of the consumer's total data spend the vendor captures.
Use across the organization, by multiple functions, indicates that a dataset has an operational (rather than just intelligence) use case. Ultimately, this is good for both the consumer and for the vendor.
If a consumer is receiving value from a dataset that vastly outpaces the price, license renewals can accommodate a renegotiation.
Align incentives on price:
Most data are priced annually. The other common alternative is to price by row in cases where there is a direct action or value tied to each row (for instance, for leadgen or contact data).
The most interesting viable alternative is around consumption-based pricing: the more the consumer uses the data, the more they pay. At the cost of some predictability, vendors can lower the barrier to adopting a new dataset.
The goal of pricing data, in general, should be to align incentives (i.e., both you and the customer benefit from more accurate, timelier, or more complete data). This is difficult in long-term, multi-year deals.
While it may make sense for data vendors to price discriminate based on the use case, as soon as price discrimination becomes complex, every sale is a custom negotiation.
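One way to picture the consumption-based model described above is a simple metered bill with a floor: the consumer pays per unit of usage, and the vendor keeps a minimum for predictability. This is a minimal sketch; the rate, the minimum, and the per-query metering unit are all hypothetical, not taken from the post.

```python
# Hypothetical consumption-based pricing: the consumer pays per query,
# with a monthly minimum so the vendor retains some revenue predictability.
# All rates and names here are illustrative.

def monthly_bill(queries: int, rate_per_query: float = 0.05,
                 monthly_minimum: float = 500.0) -> float:
    """Return the invoice amount for one month of metered usage."""
    metered = queries * rate_per_query
    return max(metered, monthly_minimum)

# A light month falls back to the minimum; a heavy month scales with use.
print(monthly_bill(2_000))   # metered 100.0 -> billed 500.0 (minimum applies)
print(monthly_bill(50_000))  # metered 2500.0 -> billed 2500.0
```

The floor is the trade-off the post names: the vendor gives up some upside predictability to lower the consumer's barrier to adoption, but does not let a trial-sized month round to zero.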
Data Catalog and Data Dictionary
Data consumers need a full outline of every dataset you can make available and what each dataset contains, ideally with minimal conversation.
Organizational complexity (having multiple specialists for different data content) significantly slows down and increases the overhead of the sale.
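A catalog that requires "minimal conversation" can be as simple as a machine-readable record per dataset with a column-level dictionary. The sketch below shows one possible shape; the dataset, fields, and descriptions are invented for illustration.

```python
# A minimal, machine-readable catalog entry: one record per dataset,
# with a column-level data dictionary. All names and fields are illustrative.
catalog = [
    {
        "dataset": "store_foot_traffic",
        "description": "Daily visit counts per retail location",
        "update_frequency": "daily",
        "history_start": "2019-01-01",
        "columns": {
            "store_id": "Stable identifier for the location (string)",
            "date": "Observation date, UTC (YYYY-MM-DD)",
            "visits": "Estimated unique daily visitors (integer)",
        },
    },
]

# A prospective buyer can answer basic scoping questions without a sales call:
for entry in catalog:
    print(entry["dataset"], "-", ", ".join(entry["columns"]))
```

Publishing this as a static file (or a page generated from it) lets one generalist salesperson cover the whole catalog, rather than routing each question to a content specialist.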
Bulk data transfer
There is an unfortunate trend towards building custom APIs. APIs require custom business logic (read: engineering time) to pull data. An API makes sense only when the underlying data structure is truly so complex that it is impossible to represent the data in tabular form in any meaningful way.
Usually, tabular structures for transferring data make sense. It is even better when those tabular structures can match the environment the customer will be using, further reducing development time.
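The contrast above can be made concrete: a flat tabular file loads with one generic call, while a custom API typically means authentication, pagination, and retry logic before the first row arrives. This sketch uses only the standard library; the file contents and column names are hypothetical (simulated in memory so it is self-contained).

```python
import csv
import io

# Bulk transfer: the vendor ships a flat, tabular file, and the consumer
# loads it with one generic call -- no custom business logic required.
# (Simulated here with an in-memory string; columns are illustrative.)
flat_file = io.StringIO(
    "store_id,date,visits\n"
    "S001,2024-01-02,143\n"
    "S002,2024-01-02,87\n"
)

rows = list(csv.DictReader(flat_file))
print(len(rows), rows[0]["visits"])  # 2 143
```

The same two lines work whether the file holds ten rows or ten million, which is exactly the "reducing development time" point: matching the consumer's tabular environment means zero integration code on their side.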
It is helpful to be able to instantly provision the data, especially when a consumer may be attempting to react to a real-time situation.
The most used and trusted Covid data providers were, by and large, the ones that had data available immediately and were instantly accessible.
Free Data Evaluation
If a dataset is not valuable at the row level, it is useful to give free data evaluations to prospective clients.
The value of temporal datasets comes especially from the recency of the data, so you give away very little by sharing older data, but you build trust that the data is what you say it is.
Every data buyer may have their own vendor diligence process, but the core data privacy and provenance issues are the same. It is helpful to have this material available ahead of time.
Documented data updates
The same questions always come up: how frequently the data is updated, on what lag it arrives, and at what granularity it comes.
Further documentation about up-time, availability, updates, and versioning is all helpful in this vein.
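Those recurring questions (frequency, lag, granularity) can be answered once, in a small machine-readable manifest shipped alongside each delivery. A sketch with entirely hypothetical values:

```python
from datetime import date, timedelta

# A small delivery manifest answering the recurring diligence questions:
# how often the data updates, on what lag, and at what granularity.
# All field names and values are illustrative.
manifest = {
    "dataset": "store_foot_traffic",
    "update_frequency": "daily",
    "reporting_lag_days": 2,   # data for day T arrives on day T+2
    "granularity": "store-day",
    "schema_version": "1.3",
}

def expected_latest_observation(today: date, manifest: dict) -> date:
    """Most recent observation date a consumer should expect to see."""
    return today - timedelta(days=manifest["reporting_lag_days"])

print(expected_latest_observation(date(2024, 1, 10), manifest))  # 2024-01-08
```

A manifest like this also lets the consumer monitor the feed automatically: if the latest observation falls behind the documented lag, their pipeline can alert before anyone emails the vendor.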
Snowflake has a great outline on how data-sharing works on their platform. As more organizations centralize data in the cloud, it makes sense to keep data where it is and simply permission new consumers to query it.
Alternative Data Council docs are particularly useful; while focusing on the RIA/Investment Management space, the issues are applicable more broadly.
Matt Turck, of FirstMark, has an early but still relevant blog post on this subject, particularly referring to the hedge fund industry.
Safegraph has a particularly noteworthy set of documentation.