Although I now work on a startup[1], I occasionally can’t help imagining what I would build with large language models (LLMs)[2] if I were at an investment manager.
Here are some rough ideas:
Excel Models: Most investment funds have analysts who primarily work in Excel, even if they also use CRM systems or data science tools. Data scientists and analysts feed information into a system that is ultimately summarized and consumed by the portfolio manager[3] in Excel.
There is good reason for this: Excel is highly flexible when it comes to manipulating data, and it is the lowest common denominator across firms. At its core, though, a modern Excel file (.xlsx) is essentially just a ZIP archive of XML files[4].
LLMs are capable of producing HTML[5], so it’s a small step to assume LLMs could produce Excel files. The details are key here: it’s not just about pasting data into a spreadsheet. The formulas and references need to work, and the formatting needs to look like it was created by a human. Ultimately, these files should be fully editable by the user, as if another analyst had made them. Investment firms are particularly well-suited to train these models, as they often have a large number of internal files that can be used for training. This seems like an obvious productivity boost that could then also incorporate external data.
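To make the "formulas need to work" point concrete, here is a minimal sketch of what such a generated file looks like underneath. It uses only the Python standard library to assemble a one-sheet workbook whose total is a live SUM formula rather than a pasted number. The function names (build_xlsx, read_formulas), the sheet name "Model", and the sample values are my own illustrative choices, and the package is the bare minimum of parts as I understand the format; I would treat it as a structural illustration, not a production writer.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
DOC_REL = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PKG_REL = "http://schemas.openxmlformats.org/package/2006/relationships"
CT = "application/vnd.openxmlformats-officedocument.spreadsheetml"


def build_xlsx(values):
    """Return .xlsx bytes: numbers in A1..An plus a live SUM formula underneath."""
    n = len(values)
    rows = [f'<row r="{i}"><c r="A{i}"><v>{v}</v></c></row>'
            for i, v in enumerate(values, start=1)]
    # Store the total as a formula, not a pasted number, so it recalculates.
    rows.append(f'<row r="{n + 1}"><c r="A{n + 1}"><f>SUM(A1:A{n})</f></c></row>')
    sheet_data = "".join(rows)
    parts = {
        "[Content_Types].xml": (
            '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
            '<Default Extension="rels" '
            'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
            '<Default Extension="xml" ContentType="application/xml"/>'
            f'<Override PartName="/xl/workbook.xml" ContentType="{CT}.sheet.main+xml"/>'
            f'<Override PartName="/xl/worksheets/sheet1.xml" ContentType="{CT}.worksheet+xml"/>'
            '</Types>'),
        "_rels/.rels": (
            f'<Relationships xmlns="{PKG_REL}">'
            f'<Relationship Id="rId1" Type="{DOC_REL}/officeDocument" Target="xl/workbook.xml"/>'
            '</Relationships>'),
        "xl/workbook.xml": (
            f'<workbook xmlns="{MAIN_NS}" xmlns:r="{DOC_REL}">'
            '<sheets><sheet name="Model" sheetId="1" r:id="rId1"/></sheets></workbook>'),
        "xl/_rels/workbook.xml.rels": (
            f'<Relationships xmlns="{PKG_REL}">'
            f'<Relationship Id="rId1" Type="{DOC_REL}/worksheet" Target="worksheets/sheet1.xml"/>'
            '</Relationships>'),
        "xl/worksheets/sheet1.xml": (
            f'<worksheet xmlns="{MAIN_NS}"><sheetData>{sheet_data}</sheetData></worksheet>'),
    }
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, xml_text in parts.items():
            zf.writestr(name, xml_text)
    return buf.getvalue()


def read_formulas(xlsx_bytes):
    """Map cell reference -> formula text for every formula cell on sheet 1."""
    with zipfile.ZipFile(io.BytesIO(xlsx_bytes)) as zf:
        root = ET.fromstring(zf.read("xl/worksheets/sheet1.xml"))
    formulas = {}
    for cell in root.iter(f"{{{MAIN_NS}}}c"):
        formula = cell.find(f"{{{MAIN_NS}}}f")
        if formula is not None:
            formulas[cell.get("r")] = formula.text
    return formulas


workbook = build_xlsx([120.5, 98.0, 143.25])  # made-up sample values
print(read_formulas(workbook))  # {'A4': 'SUM(A1:A3)'}
```

The same text-in, text-out shape is why this feels within reach for an LLM: the model only has to emit well-formed XML with correct cell references, and the spreadsheet application does the calculation when the file is opened.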
Email & Chat Meta-Analysis: Internal Slack/Teams channels, email, and likely some sort of formalized CRM system[6] are common across managers. The communication channels are essentially underutilized data sources that could populate the CRM.
The biggest challenge I’ve heard from data scientists at managers since leaving Coatue is that it is hard to convince users to use CRMs. How do you convince a group of GPs to log their data in Affinity? This becomes a non-issue if you can crawl communication logs and build the CRM for the company from scratch. No one has to change their behavior, but you still get perfect metadata.
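As a rough illustration of "crawl the logs and build the CRM", the sketch below walks raw email headers with Python's standard email module and derives, per external contact, how many touches there were, when the last one happened, and who internally owns the relationship. The fund.example domain, the addresses, and the record fields are all hypothetical, and a real system would also need to handle Slack/Teams exports, threading, and deduplication.

```python
from collections import defaultdict
from email import message_from_string
from email.utils import getaddresses, parsedate_to_datetime

INTERNAL_DOMAIN = "fund.example"  # hypothetical domain for the firm's own staff


def build_crm(raw_emails):
    """Derive one CRM-style record per external contact from raw email text."""
    crm = defaultdict(lambda: {"touches": 0, "last_contact": None, "owners": set()})
    for raw in raw_emails:
        msg = message_from_string(raw)
        sent = parsedate_to_datetime(msg["Date"])
        headers = msg.get_all("From", []) + msg.get_all("To", []) + msg.get_all("Cc", [])
        addresses = {addr.lower() for _, addr in getaddresses(headers) if addr}
        internal = {a for a in addresses if a.endswith("@" + INTERNAL_DOMAIN)}
        # Every external address on the message counts as one touch.
        for contact in addresses - internal:
            record = crm[contact]
            record["touches"] += 1
            record["owners"] |= internal
            if record["last_contact"] is None or sent > record["last_contact"]:
                record["last_contact"] = sent
    return dict(crm)


# Made-up messages; only the headers matter for this sketch.
emails = [
    "From: Ana <ana@fund.example>\nTo: ceo@acme.example\n"
    "Date: Mon, 06 Mar 2023 09:15:00 +0000\nSubject: Intro\n\nHi",
    "From: ceo@acme.example\nTo: Ben <ben@fund.example>\n"
    "Date: Thu, 09 Mar 2023 14:30:00 +0000\nSubject: Follow-up\n\nThanks",
    "From: ana@fund.example\nTo: ben@fund.example\n"
    "Date: Fri, 10 Mar 2023 08:00:00 +0000\nSubject: Internal\n\nNotes",
]
crm = build_crm(emails)
print(crm["ceo@acme.example"]["touches"])  # 2
```

Nobody had to log anything: the relationship owner, the contact frequency, and the recency all fall out of headers that already exist.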
Note-taking: Tools like Gong can record and transcribe Zoom calls[7]. It’s clear that the entire note-taking process can now be automated: record every Zoom meeting, automatically transcribe it, then summarize it. The portfolio manager receives a feed of the most important parts of each conversation to form a mosaic of information.
Bridgewater is famous for recording its meetings[8]. Perhaps five years ago this was ambitious, but at this point investment managers are throwing away immediately usable data by not doing so. Employees (investment analysts) can leave, so as an owner it makes sense to retain the intellectual property they created.
New Datasets: What percentage of CEOs are worried about inflation? In the past, answering this would require running a survey or painstakingly reading through transcripts. Now, a structured answer can fall out of unstructured data cheaply and at very large scale.
Companies like AlphaSense[9] or Tegus[10] have an opportunity to create structured datasets out of unstructured data today, but almost all investment managers have their own unique unstructured data that could produce structured datasets.
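The CEO inflation question can be sketched as a two-step pipeline: classify each transcript, then aggregate the flags into a percentage. In the sketch below the classifier is a plain keyword check standing in for an LLM call, so that the example runs without any external service; the company names and transcript snippets are invented, and the function names are my own.

```python
from typing import Callable, Dict, List


def keyword_classifier(text: str) -> bool:
    """Stand-in for an LLM call: does this passage express worry about inflation?"""
    lowered = text.lower()
    worry_words = ("worried", "concern", "pressure", "headwind")
    return "inflation" in lowered and any(word in lowered for word in worry_words)


def build_dataset(transcripts: Dict[str, str],
                  is_worried: Callable[[str], bool]) -> List[dict]:
    """Turn free text into one structured row per company."""
    return [{"company": company, "worried_about_inflation": is_worried(text)}
            for company, text in transcripts.items()]


def pct_worried(rows: List[dict]) -> float:
    """Share of companies flagged as worried, in percent."""
    return 100.0 * sum(row["worried_about_inflation"] for row in rows) / len(rows)


# Invented snippets standing in for earnings-call or expert-call transcripts.
TRANSCRIPTS = {
    "AcmeCo": "Inflation remains a concern for our input costs.",
    "Bolt": "Demand was strong and margins expanded.",
    "Cedar": "We are worried that wage inflation will persist.",
    "Dune": "We opened twelve new stores this quarter.",
}
rows = build_dataset(TRANSCRIPTS, keyword_classifier)
print(pct_worried(rows))  # 50.0
```

A keyword check would wrongly flag "we are not worried about inflation", which is exactly the gap an LLM classifier is meant to close; swapping it in only changes the is_worried argument, not the dataset logic.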
What I wouldn’t spend time on yet:
Arbitrary SQL: There is a range of tools promising to make SQL available to the average user[11]. I think this is valuable in constrained settings but won’t work on arbitrary datasets (particularly alternative data). Because the logic needed to make alternative data actually predict KPIs is complex and nonobvious, the problem won’t be the SQL translation but the business logic[12].
A query for ‘what is DoorDash’s market share’ is easy for an LLM to write, but SQL that runs without error won’t necessarily produce an accurate answer on raw data. The problem is with the prompter[13], not the technology.
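A toy example of the gap between valid SQL and a correct answer, using an in-memory SQLite database. All numbers are made up, and the panel_coverage table is a deliberately simplified stand-in for real panel normalization: it assumes we somehow know what fraction of each merchant's true sales the panel captures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE panel_spend (merchant TEXT, amount REAL);
INSERT INTO panel_spend VALUES ('DoorDash', 600.0), ('UberEats', 400.0);

-- Hypothetical: fraction of each merchant's true sales seen in the panel.
CREATE TABLE panel_coverage (merchant TEXT, coverage REAL);
INSERT INTO panel_coverage VALUES ('DoorDash', 0.50), ('UberEats', 0.25);
""")

# The query an LLM would plausibly write from the plain-English question.
NAIVE_SQL = """
SELECT SUM(CASE WHEN merchant = 'DoorDash' THEN amount ELSE 0 END) / SUM(amount)
FROM panel_spend
"""

# The same question after scaling each merchant's spend by panel coverage.
ADJUSTED_SQL = """
SELECT SUM(CASE WHEN s.merchant = 'DoorDash' THEN s.amount / c.coverage ELSE 0 END)
       / SUM(s.amount / c.coverage)
FROM panel_spend AS s
JOIN panel_coverage AS c ON c.merchant = s.merchant
"""

naive_share = conn.execute(NAIVE_SQL).fetchone()[0]
adjusted_share = conn.execute(ADJUSTED_SQL).fetchone()[0]
print(f"naive: {naive_share:.1%}, adjusted: {adjusted_share:.1%}")
# naive: 60.0%, adjusted: 42.9%
```

Both queries are syntactically fine and both run; only the second encodes the business logic, and nothing in the plain-English question tells the model that the coverage adjustment exists.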
I put a lot more thought into this following Sequoia’s Ascent Conference:
[3] Or investment committee, or GPs preparing for Monday meetings, etc. The terminology changes based on the asset class, but the concept is the same.
[9] Search for public filings: https://www.alpha-sense.com/
[10] Expert transcripts: https://www.tegus.com/
[11] OpenAI even uses this as an example use case: https://platform.openai.com/examples/default-sql-translate
[12] Those who have worked with alternative data are aware of examples of this business logic: panel normalization, de-biasing, fiscal quarter corrections, etc.