Data businesses can be particularly valuable if they have moats that guard their product from replication[1]. Such moats can come from legitimate technological innovation or business partnerships. The data moats worth criticizing are those built on anticompetitive government relationships: regulatory capture[2]. There are two common types of unfair moats: those based on improperly proprietary identifiers and those based on outright unfair data access.
Proprietary identifiers, which function as data join keys[3], often stand in for what should be open industry standards. Regulatory, tax, and statistical agencies often require the private sector to submit data, and they publish aggregate statistics for commercial, academic, and personal use. Industries rely on agreed-upon standard identifiers for companies, investable securities, and other entities so that this data is easy to link and use. This presents an opportunity for regulatory capture: when a proprietary standard is mandated, its copyright holder can extract high fees.
One such example is CUSIP, an identifier used by financial market participants to identify companies and investable securities. The identifier is copyrighted and owned by the American Bankers Association, and the operating company, CUSIP Global Services, was recently bought by FactSet[4]. Both use and distribution of CUSIPs, even indirectly, require a license. This is problematic because regulatory agencies such as FINRA, the SEC, and the CFPB use the identifier in ostensibly public domain data releases. In effect, CUSIP benefits from a government mandate to do business with it[5].
Another example is the DUNS number, which was used as a primary entity identifier by several government agencies, including in the General Services Administration’s SAM database of government contractors. This regime required users and distributors of the data to license products from Dun & Bradstreet. While obtaining a DUNS number was ostensibly free, the arrangement gave Dun & Bradstreet a monopoly on the entity validation services it provided to the government. It also gave Dun & Bradstreet an unfair, effectively mandated advantage in collecting data on business entities that other data providers did not have. Several states and foreign governments still require the DUNS number, and this is advertised on the DUNS website[6].
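To make the “join key” idea concrete, here is a minimal Python sketch, assuming two toy datasets: a public release keyed by a security identifier and a user’s own reference table keyed by the same scheme. All identifier values, names, figures, and the `join_on_identifier` helper are hypothetical placeholders, not real CUSIPs, FIGIs, or any agency’s actual release format.

```python
from typing import Dict, List, Tuple

# Hypothetical public regulatory release: (identifier, reported value) rows.
# Identifiers are placeholders, not real CUSIPs or FIGIs.
regulatory_release: List[Tuple[str, float]] = [
    ("ID000001", 1_250_000.0),
    ("ID000002", 430_000.0),
    ("ID000003", 98_500.0),
]

# Hypothetical in-house reference data keyed by the same identifier scheme.
internal_reference: Dict[str, str] = {
    "ID000001": "Example Corp 4.5% 2030",
    "ID000002": "Sample Inc Common Stock",
}


def join_on_identifier(
    rows: List[Tuple[str, float]], reference: Dict[str, str]
) -> Tuple[List[Tuple[str, str, float]], List[str]]:
    """Inner-join release rows to reference data on the shared identifier.

    Rows whose identifier is missing from `reference` are returned separately,
    because they cannot be linked without a crosswalk to another scheme.
    """
    matched: List[Tuple[str, str, float]] = []
    unmatched: List[str] = []
    for identifier, value in rows:
        name = reference.get(identifier)
        if name is None:
            unmatched.append(identifier)
        else:
            matched.append((identifier, name, value))
    return matched, unmatched


matched, unmatched = join_on_identifier(regulatory_release, internal_reference)
for identifier, name, value in matched:
    print(f"{identifier}  {name:<28} {value:>12,.0f}")
print("unmatched:", unmatched)
```

Here `ID000003` lands in `unmatched` because the reference table has no entry for it. The sketch only shows the mechanics; the essay’s point is that whoever controls the identifier scheme effectively controls this join, so if users cannot freely store or redistribute the key, linking public releases to their own records roughly requires either a license or a crosswalk to some other scheme.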
A second regulatory capture model occurs when companies gain privileged access to government data that they then monetize. Bill of lading data is an example of such unevenly available data. Bills of lading are government forms collected when goods are imported into the United States by sea; they are roughly equivalent to shipping labels. The data is commercially valuable because it can be used to research supply chains. Although the data comes from the government (specifically, U.S. Customs and Border Protection), how and where to access it is not clearly documented. Instead, several commercial entities obtain the data and resell it, and most of their customers are likely unaware of its exact source. Freedom of Information Act requests for this data are apparently denied, yet certain companies are able to find the right contact and obtain the data for a fee[7]. A similar situation exists with United Kingdom gilt price data. UK gilt prices were previously calculated by the Debt Management Office (DMO). In 2016, the agency ran an RFP and accepted a proposal from FTSE/Tradeweb to take over calculating daily closing prices[8].
The mere fact that the government charges for certain data is not problematic. Government agencies incur real costs in procuring and analyzing data, and it makes sense to charge when the primary beneficiaries are only a subset of the private sector. Further, an independently reviewed RFP granting a private company the right to process and publish such data, as happened in the UK gilt case, is preferable to an entirely opaque process. However, I am skeptical that technocrats should ever select a single provider. Even if stakeholders say today that they prefer a single authoritative source, such ‘benevolent’ monopolies fail to anticipate changing circumstances and new stakeholders; for instance, the advent of AI/LLM users may well change the optimum[9].
There are reasons to be optimistic that at least some instances of both kinds of regulatory moat can be eroded. Numerous financial regulatory agencies recently proposed moving away from CUSIP[10] to FIGI[11] and are soliciting public comment[12]. Two years ago, SAM.gov announced that it would move away from the DUNS number to its own open standard[13]. Further still, recent litigation around CUSIP has raised the question of whether identifier numbers alone (as opposed to the identifier database as a whole) can be copyrighted at all[14][15]. There are also healthier precedents for cases where the government provides costly data to the private sector. For instance, the USPS began charging for its change-of-address database, and Fannie Mae and Freddie Mac charge for commercial use and redistribution of their data. In each of these cases, there are multiple competing vendors[16], a transparent license agreement for accessing the data, and transparent pricing. Any new data vendor can agree to the license and compete on data distribution and value-add.
I also want to point out a few distinct, potentially anticompetitive data licensing cases outside the scope of the above comments. First, there are data products where governments fall short on data integration, so commercial entities step in. This situation is problematic only when other businesses are denied the same raw data access. Competition is different from convenience: the mere requirement for high upfront capital expenditure does not make a market anticompetitive. For instance, CoreLogic, Black Knight Financial, and Attom sell mortgage deed data they gather from county governments. They bear the cost of standardizing the data and integrating with each county government. In theory, this market should be competitive. What would be problematic, however, is if certain counties released data only to certain vendors, or if counties lacked transparency about how competing vendors could participate (as in the bill of lading case). A second case, not to be overlooked, is that data vendors may engage in traditional anticompetitive practices, such as price collusion, that are not regulatory capture, strictly speaking. I do not cover such cases in this essay.
Cleaning, integrating, and distributing public domain data is a valuable commercial service that private sector data companies should be paid for, but there will always be a temptation to build anticompetitive moats. That’s lazy. Data companies should compete on value-add on top of public data rather than attempting to act as a tax on users. This serves the best interests of the private sector customer, the government, and, most importantly, the taxpayer.
Many of the companies on my list of data companies have enduring moats. Counterpoint Global Research (run by Michael Mauboussin) has a great list of wide-moat businesses, a surprising number of which are data businesses.
A quick summary of CUSIP from Wikipedia. CUSIP Global Services was recently bought by FactSet.
An example of what happens if you indirectly receive their data. That said, legal fights are emerging.
Worth reading the DUNS website on the GSA change.
You can read the full RFP review and decide for yourself whether the outcome is desirable.
A good summary of that proposal was issued by the FDIC. The full explanation of the joint rule, proposed under the Financial Data Transparency Act, and the methods for submitting public comments can be read here.
FIGI was originally developed by Bloomberg, but it has transitioned into an independent, open standard with permissive licensing. While any open source project carries risk when it is primarily developed by a single, well-resourced commercial developer, this is still the best open standard I know of. Other standards, such as LEI or PermID (operated by a Bloomberg competitor), are also viable.
I will leave it to the reader to decide whether the public comment submitted in response by the ABA and CUSIP Global Services sounds like it comes from someone who is definitely not benefiting from an unfair monopoly.
Tim Baker summarizes this well in his LinkedIn post.
It is worth noting that the EU took antitrust action against S&P, CUSIP Global Services’ previous owner, although that case concerned the specifics of issuing a related identifier, ISIN, rather than CUSIP itself.
For instance, here is every vendor with full access to USPS COA data. And here is the same from Fannie Mae, along with the standard data redistribution agreement.