What I Learned Working at an AI Startup

In early 2022, I was ready to leave my role leading Stripe's fraud detection product, Radar, to pursue my own entrepreneurial aspirations. OpenAI had released GPT-3 in 2021. I had already been reading Gwern and playing around with the models. It was clear to me that the emergent capabilities of LLMs were going to open up a whole new class of products. I didn't want to miss it, so I left Stripe and made the leap.

Perhaps because I had spent the previous decade or so as a product manager, when I set off I was very focused on coding and a lot less focused on the business side of things. I made a number of decisions early: 1. I wasn't going to raise funding, 2. I was going to build on LLMs, and 3. I was going to make something that was good for people. I started with an automated newsletter I called Infeather. I had trained a model to score web content, and I scraped high-quality accounts on Twitter for daily sources. Then I made an app that extracted question-answer pairs from the content (using GPT-3) to power a daily quiz that helped people learn from what they read. It was in the education and news domain, which is to say it was very unlikely to make money. I had a few dozen subscribers, and I kept building features, but it wasn't really going anywhere.

Six months later I reconnected with Supernormal, an automated AI notetaker startup. They had wired up GPT-3 to meeting transcripts collected from Google Meet and were getting pretty good summaries and action items out of it. On the strength of this they had gained some traction and were raising a seed round. I felt I would learn more by being part of something bigger and working with others. I already knew Colin and Fabian, and Supernormal too: I had helped found the company when they started it in 2019, before I left for my role at Stripe.

I stayed at Supernormal for two and a half years as 'Head of AI,' which was a fancy title for being the person responsible for building the generative AI features in the notetaker. I hired small teams of ML engineers to help with this task. I don't usually write with bulleted lists, but here is a list of things I learned in my time at the startup:

- When you work in an area that explodes worldwide, differentiation matters even more. From the beginning we knew we would have competition from the meeting platforms (they provide their own notes) and from other companies. What we didn't realize were the implications of the entire tech world pivoting to LLMs. Supernormal used LLMs before ChatGPT was launched! In the beginning, this helped us. People became aware of AI and wanted to try new tools. Over time, however, many other groups built similar notetakers, and the business became commoditized. We decided not to choose a vertical early on (sales, project management, doctors), and in hindsight this made competition a lot tougher.
- How machine learning engineers contribute has greatly changed. GOFAI (good old-fashioned AI) engineers worked on hand-crafting features and model architectures to solve problems where the business value was high enough to justify optimizing. It took a certain scale of data, plus the data engineering to process and clean it, to move beyond heuristics as a baseline. With LLMs there has been a fundamental change, at least in what is newly possible. Instead of building or fine-tuning models and iterating over features, MLEs in product-driven orgs are now optimizing generative AI and other decision points provided by black-box model APIs. This shifts the required skillset from more traditional MLOps (deployment and inference) to prompt engineering, fine-tuning via API, and defining evals. Prompt engineering itself is something of an art, no MLEs required.
- Trying to keep up with everything happening is a fool's errand, but vision matters. The pace of AI news over the past two years has been blinding as the world's companies compete to win hearts and minds with more capable models. We used OpenAI for the majority of the time, but other providers caught up to rough equivalence or had some differentiation (like Claude 3.5 Sonnet's tone). Costs for good abstractive summarization (our core product of meeting notes) through LLM APIs plummeted over time from O($0.10) to O($0.01). Reasoning models launched, and agents became the next big thing for a long time. Trying to keep up with every new model, every new way to run inference, and so on was just overwhelming, and 90% of the news didn't seem to make much of a difference for a long time, until suddenly it did. What I feel is really important is to have a vision of where the product and capabilities are going, because sooner or later similar products will start providing those capabilities, and your own business can fall behind.
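
To make "defining evals" a little more concrete, here is a minimal sketch in Python of the simplest kind of eval one could write for a notetaker: check whether a summary mentions the phrases a human decided it must contain. Everything in it is hypothetical and for illustration only: the `EvalCase` type, the `phrase_recall` metric, the sample transcripts, and `fake_summarize`, which stands in for a call to a black-box model API. It is not how Supernormal's evals worked.

```python
from dataclasses import dataclass


@dataclass
class EvalCase:
    """One hypothetical eval case: a transcript plus phrases the notes must mention."""
    transcript: str
    required_phrases: list[str]


def phrase_recall(summary: str, required_phrases: list[str]) -> float:
    """Fraction of required phrases found in the summary (case-insensitive)."""
    if not required_phrases:
        return 1.0
    text = summary.lower()
    hits = sum(1 for phrase in required_phrases if phrase.lower() in text)
    return hits / len(required_phrases)


def run_eval(summarize, cases: list[EvalCase]) -> float:
    """Mean phrase recall of a `summarize(transcript) -> str` function over all cases."""
    if not cases:
        raise ValueError("need at least one eval case")
    scores = [phrase_recall(summarize(c.transcript), c.required_phrases) for c in cases]
    return sum(scores) / len(scores)


def fake_summarize(transcript: str) -> str:
    # Stand-in for a model API call (an assumption): keeps only the first sentence.
    return transcript.split(".")[0] + "."


# Made-up transcripts, not real meeting data.
CASES = [
    EvalCase(
        "Dana will send the budget by Friday. Lee asked about hiring.",
        ["budget", "Friday", "hiring"],
    ),
    EvalCase(
        "We agreed to ship the beta. No other decisions.",
        ["ship the beta"],
    ),
]

if __name__ == "__main__":
    print(f"mean phrase recall: {run_eval(fake_summarize, CASES):.2f}")
```

A check like this is cheap to run on every prompt change, but it only measures whether certain strings show up, not whether the notes are actually useful to the person who missed the meeting.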

This is a short list, and maybe after some reflection I'll have more lessons to add here. These might include: 1. product engineering can do a lot, but an embedded MLE on a product team can do more, 2. AI slop is not valuable, but providing the right AI at the right place can be, and 3. evals for LLMs are useful in some ways, but it is very difficult to get them to solve the core problem you want them for.