Nov. 13, 2025

How to Build a Legal AI Company: The 10-Year Journey from Professor to $25M ARR

At age 26, Ben Alaire became possibly the youngest tenure-track law professor in the University of Toronto's history. Fresh from clerking at the Supreme Court of Canada, he had the academic dream: teaching tax law and co-authoring the 1,500-page textbook that generations of Canadian law students would use. Then in 2012, while leading curriculum reform as associate dean, he asked himself an uncomfortable question: What will the legal profession look like in 2050?

The answer was unavoidable. Moore's law would keep doubling computing power. Legal information was entirely digital. AI researchers like Geoff Hinton were making breakthroughs just across campus. Ben could see it clearly—"there is a freight train with AI on the front of it coming directly for the law."

That realization led him to co-found BlueJ in 2015 with a seemingly impossible vision: build an AI system that could automate all of tax law research. Type any tax question, get a world-class legal memo in seconds with citations to primary sources.

Ten years later, nearly 3,400 law and accounting firms use BlueJ, the product has a Net Promoter Score (NPS) of 84, and annual recurring revenue (ARR) grew from $9 million at the end of 2024 to over $25 million by Q4 2025. The company recently raised a $122 million Series D.

But here's what makes Ben's story remarkable: for eight years, BlueJ was technically impressive but commercially struggling. They had supervised machine learning models that predicted court decisions with 90%+ accuracy. They had hundreds of customers. They grew to $5 million in ARR. Yet they didn't have true product-market fit—and Ben knew it.

The breakthrough came in 2023 when Ben made one of the gutsiest pivots in legal tech: abandoning years of work on predictive models to rebuild BlueJ entirely around large language models. That decision took his company from $2 million to $25 million ARR in less than two years.

Key Takeaways

  • Partial product-market fit is a dangerous trap that can waste years. BlueJ spent eight years with customers who loved specific features but couldn't sustain usage because the product didn't solve their complete workflow. According to SaaS retention research, inconsistent value delivery is the #1 predictor of churn. Ben's supervised ML models were technically brilliant—when users had the right type of case. But most tax questions fell outside those narrow use cases, creating the "try it, love it, forget it" cycle that prevented scalable growth.
  • The shift from supervised learning to LLMs was the difference between a feature and a platform. Retrieval augmented generation (RAG) made it possible to handle arbitrary tax questions by searching a curated corpus and synthesizing authoritative answers from what the search finds. This architectural change eliminated the need to build a separate model for each tax issue, turning BlueJ from a set of useful point tools into a comprehensive research platform that could replace traditional workflows entirely.
  • Time-to-value is the most powerful driver of word-of-mouth growth. When users could validate BlueJ's value in 20 seconds instead of hours, the social dynamics of referrals completely changed. It became socially safe to recommend the product because colleagues could verify it immediately, without a significant time investment. According to viral growth research, reducing time-to-value by 10x can increase viral coefficient by 3-5x. BlueJ's instant value demonstration now drives about 10 new firm signups a day.

Table of Contents

  1. The Vision That Started Everything
  2. V1: Predictive Models That Were Too Narrow
  3. The Partial Product-Market Fit Trap
  4. The LLM Breakthrough and Gutsy Pivot
  5. From $2M to $25M ARR in 18 Months
  6. Why Time-to-Value Changed Everything
  7. Competing with ChatGPT as a Vertical AI
  8. Frequently Asked Questions

The Vision That Started Everything

When Ben proposed reforming the University of Toronto's law school curriculum in 2012, a colleague warned him that the last major reform, in the 1970s, had created "hard feelings among faculty who never really got along afterward." The advice was clear: focus on your scholarship instead.

Ben took away a different lesson. "If it took 40 years for someone like me to show up and suggest we should do a deep dive into changing the curriculum, it may be another 40 years before someone shows up again."

That observation triggered a thought experiment: What would the legal profession look like in 2050? The answer seemed unavoidable. Computing power was doubling every couple of years. Legal information was already digital. AI researchers across campus were making breakthroughs.

"There is a freight train with AI on the front of it coming directly for the law," Ben realized.

The uncomfortable hypothetical haunted him: "What if I'm standing at the front of a law school classroom in 2040, thinking, oh man, I saw this coming in 2012, 2013, 2014, and I decided not to change my professional trajectory?"

His entry point was personal. Ben had co-authored multiple editions of "Canadian Income Tax Law," a 1,500-page textbook. "It's such a manual job to update it," he explains. New cases, legislative changes, regulatory amendments: the law evolved constantly, leaving each edition outdated almost immediately. "There's got to be a better way."

Ben recruited two brilliant co-founders from the U of T law faculty: Anthony Niblett (Harvard PhD in economics, former University of Chicago professor) and Albert Yoon (Stanford JD and PhD, former Northwestern professor). When Ben pitched starting an AI company for tax research, both responded: "We're on board, but you have to take the lead."

In 2015, they started BlueJ with a clear vision: build a system that could automate all tax law research. You'd type any tax question, and BlueJ would produce a world-class answer with citations to primary legal sources.

There was one problem: in 2015, that vision was "utterly science fiction."

V1: Predictive Models That Were Too Narrow

Since the grand vision wasn't technically feasible, BlueJ started with what they could build: supervised machine learning models that predicted how courts would resolve specific tax questions.

One classic example: determining whether a worker is an independent contractor or an employee for tax purposes. Courts look at the functional relationship between hirer and worker, often ignoring written agreements. If the court reclassifies the relationship, the deemed employer faces massive consequences: it should have been withholding income tax, Employment Insurance (EI) premiums, and Canada Pension Plan (CPP) contributions, and paying the employer's own share of those contributions.

BlueJ built models that predicted these classifications with better than 90% accuracy. They created similar models for other recurring tax issues, each performing remarkably well.
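To make the V1 approach concrete, here is a minimal sketch of one issue-specific supervised model: a logistic regression trained by gradient descent on hand-coded case factors. The feature names echo factors Canadian courts commonly weigh in this test (control, ownership of tools, chance of profit, risk of loss), but the 0-to-1 encoding, the six toy training cases, and every function name are hypothetical. The source does not describe BlueJ's actual features or model family.

```python
import math

# Hypothetical features per case, each scored 0.0-1.0 by an annotator:
# [hirer_control, hirer_owns_tools, worker_chance_of_profit, worker_risk_of_loss]
# Label 1 = court found an employee, 0 = independent contractor. Toy data only.
TRAINING_CASES = [
    ([0.9, 0.8, 0.1, 0.1], 1),
    ([0.8, 0.9, 0.2, 0.0], 1),
    ([0.7, 0.6, 0.3, 0.2], 1),
    ([0.2, 0.1, 0.8, 0.9], 0),
    ([0.1, 0.3, 0.9, 0.7], 0),
    ([0.3, 0.2, 0.7, 0.8], 0),
]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(cases, lr=0.5, epochs=2000):
    """Fit weights and a bias with batch gradient descent on log loss."""
    n_features = len(cases[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in cases:
            # Prediction error for this case: predicted probability minus label.
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j in range(n_features):
                grad_w[j] += error * x[j]
            grad_b += error
        weights = [w - lr * g / len(cases) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(cases)
    return weights, bias


def predict_employee_probability(model, features) -> float:
    """Probability that a court would classify the worker as an employee."""
    weights, bias = model
    return sigmoid(sum(w * xi for w, xi in zip(weights, features)) + bias)


model = train_logistic_regression(TRAINING_CASES)
print(predict_employee_probability(model, [0.85, 0.7, 0.2, 0.1]))
```

A model like this only knows the one issue it was trained on; answering a different tax question means collecting new cases and training another model.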

They attracted customers from law firms, accounting firms, and government. They raised multiple rounds. The company grew to over $5 million in ARR. Technically impressive. Investors encouraged.

"It was enough to keep us encouraged," Ben admits. "But we knew what the problem was."

The Partial Product-Market Fit Trap

Here's what was actually happening: Users would encounter a tax issue matching one of BlueJ's models. They'd try it, get excited—"this is really great." The same issue would come up again, they'd use it again, still thrilled.

Then they'd encounter a slightly different tax issue. They'd return to BlueJ hoping for help. But BlueJ wouldn't have a model for that problem. "Oh, that's too bad. I was hoping BlueJ would have something on this."

Another disappointing experience. Eventually: "This thing just doesn't cover enough of what I want." They'd forget about it and never log in again.

"There were some power users who knew exactly what BlueJ did," Ben explains. "They would come in with reasonable frequency and get tons of value. But it wasn't consistent enough for most users to really take off."

This is what Ben calls "partial product-market fit"—one of the most dangerous traps in SaaS. The product technically works. Customers pay for it. Some users love it. Revenue grows steadily. You can raise venture capital.

But something fundamental is missing. "The biggest limitation of V1 is that we only had these issue-by-issue models. They were really good at what they did, but it didn't satisfy the original vision—something that could handle any tax research question."

According to behavioral psychology research on habit formation, products delivering value intermittently struggle to achieve the frequency needed to become indispensable. BlueJ couldn't become a habit because tax professionals couldn't rely on it for their complete workflow.
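The structural limit Ben describes can be sketched as a registry of issue-specific models: a question whose issue has no registered model simply gets no answer. The issue labels, the stub predictors, and the `answer` and `coverage` helpers below are hypothetical illustrations, not BlueJ's code.

```python
from typing import Callable, Optional

# Hypothetical registry: one narrowly trained predictor per recurring tax issue.
# Each lambda is a stub standing in for a separately trained supervised model.
IssueModel = Callable[[str], str]

ISSUE_MODELS: dict[str, IssueModel] = {
    "worker_classification": lambda facts: "likely employee",
    "capital_vs_income_gain": lambda facts: "likely capital gain",
}


def answer(issue: str, facts: str) -> Optional[str]:
    """Route a question to its issue-specific model, or return None if none exists."""
    model = ISSUE_MODELS.get(issue)
    return model(facts) if model is not None else None


def coverage(issues: list[str]) -> float:
    """Fraction of incoming questions that some registered model can handle."""
    if not issues:
        return 0.0
    return sum(issue in ISSUE_MODELS for issue in issues) / len(issues)


# A hypothetical week of questions from one user: most fall outside the registry.
week = ["worker_classification", "transfer_pricing", "gst_place_of_supply",
        "worker_classification", "estate_freeze"]
print(answer("transfer_pricing", "facts of the case"))  # None: no model exists
print(coverage(week))  # 0.4
```

In this toy setup, raising coverage means training and maintaining one more model per issue, which is the scaling problem the later pivot set out to remove.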

The LLM Breakthrough and Gutsy Pivot

In late 2022, shortly before ChatGPT launched publicly, OpenAI released text-davinci-003 ("DaVinci 3"). Ben was experimenting with it in the Playground. "Oh, this is surprisingly good," he thought.

That observation triggered a realization: "Maybe there's something with these new large language models we can harness to get to this seemingly magical outcome."

The key innovation was retrieval augmented generation (RAG). Instead of building individual models for each tax issue, BlueJ could create a master corpus of relevant tax research materials. Using vector embeddings and intelligent chunking, they could run smart searches to find relevant materials, then use LLMs to synthesize authoritative answers.

"You just could not do that reliably prior to large language models having sufficient natural language understanding and synthesis capabilities," Ben explains.
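The retrieval half of that pipeline can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions: a bag-of-words count vector stands in for a learned embedding model, fixed-size word windows with overlap stand in for BlueJ's chunking strategy, and the two-document corpus is invented. The source does not describe BlueJ's actual embedding model, chunk sizes, or search algorithm.

```python
import math
import re
from collections import Counter

# Tiny stopword list so common function words don't dominate the stand-in vectors.
STOPWORDS = frozenset({"a", "an", "the", "is", "of", "or", "and", "to", "in",
                       "on", "as", "be", "was", "this", "while", "how"})


def chunk(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into windows of `size` words that overlap by `overlap` words."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]


def embed(text: str) -> Counter:
    """Stand-in embedding: term counts (a real system would use a neural encoder)."""
    return Counter(t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS)


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(corpus: dict[str, str]) -> list[tuple[str, str, Counter]]:
    """Chunk every source document and store (source_id, chunk_text, vector)."""
    return [(source_id, piece, embed(piece))
            for source_id, text in corpus.items()
            for piece in chunk(text)]


def search(index, query: str, k: int = 3) -> list[tuple[str, str]]:
    """Return the k chunks most similar to the query, keeping their source ids."""
    q = embed(query)
    ranked = sorted(index, key=lambda entry: cosine(q, entry[2]), reverse=True)
    return [(source_id, piece) for source_id, piece, _ in ranked[:k]]


# Invented placeholder documents; a real corpus would hold primary sources.
CORPUS = {
    "sample-case-A": "Sample case A: a worker was held to be an employee because the hirer "
                     "controlled the hours, supplied the tools, and bore the financial risk.",
    "sample-guide-B": "Sample guide B: a gain on selling capital property is taxed as a "
                      "capital gain, while business income is fully taxable.",
}

index = build_index(CORPUS)
for source_id, piece in search(index, "Is this worker an employee or a contractor?", k=1):
    print(source_id, piece)
```

Keeping the source id attached to every chunk is the design choice that later makes citation to primary sources possible; the LLM synthesis step would receive these retrieved chunks rather than the whole corpus.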

By early 2023, BlueJ faced difficult decisions. They had over $5 million ARR from predictive models. About 50 employees. Paying customers getting real value. Investors backing the supervised ML approach.

But Ben knew the current path wouldn't lead to breakout product-market fit.

"It took some courage and conviction," Ben reflects. The conviction: "Doing what we're doing is not going to scale properly. We're not going to get that breakout product-market fit we need."

Ben's plan: "We're going to put all our existing tax research tools into maintenance mode. We'll keep servicing the software, keep updating it, but no new feature development. We're not going to invest in building new models. People get what's in there."

Then the bold commitment: "We're going to take the first six months of 2023 and focus all our development efforts on building something that can answer any tax research question in U.S. federal income tax law."

Why U.S.? Market size, content availability through their TaxNotes relationship, and massive opportunity. ChatGPT had just demonstrated huge consumer adoption of LLM interfaces. "There has to be this version of ChatGPT, but specifically for tax research," Ben reasoned.

"We had the team, the balance sheet, the data prepared, the data science expertise. We were ready to go. We put all our chips on large language models and said, let's see if we can do what we set out to do originally in 2015."

From $2M to $25M ARR in 18 Months

By June 2023, BlueJ had their first prototype. It could answer U.S. federal tax questions, but Ben describes it as "a little gnarly, a little janky."

The problems: answers had issues about half the time, with occasional hallucinations; responses took 90 seconds; interactions were single-shot only (no follow-ups); and NPS hovered around 20, which Ben calls "merely OK, the bare threshold for a saleable SaaS product."

Despite these limitations, the product resonated. By year-end 2023, BlueJ had nearly $2 million in ARR from the new product, "a very successful launch" for something admittedly rough.

Throughout 2024, relentless iteration: upgrading from GPT-3.5 to GPT-4, improving retrieval algorithms, refining prompts, making it fully conversational, reducing response time to 15 seconds.
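One common way to move a RAG system from single-shot to conversational is to keep the message history and fold recent user turns into the retrieval query, so a follow-up such as "What records would they need?" still retrieves material about the original topic. The sketch below assumes that approach; the source does not say how BlueJ implemented follow-ups, and `Conversation`, `retrieval_query`, and the turn limit are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Conversation:
    """Hypothetical multi-turn state kept between questions."""
    messages: list[dict[str, str]] = field(default_factory=list)
    max_user_turns: int = 3  # recent user turns (including the new one) fed to retrieval

    def add(self, role: str, content: str) -> None:
        self.messages.append({"role": role, "content": content})

    def retrieval_query(self, follow_up: str) -> str:
        """Join recent user turns with the follow-up so retrieval keeps the original topic."""
        prior = [m["content"] for m in self.messages if m["role"] == "user"]
        keep = max(self.max_user_turns - 1, 0)
        recent = prior[-keep:] if keep else []
        return " ".join(recent + [follow_up])


chat = Conversation()
chat.add("user", "Can a sole proprietor deduct home office expenses?")
chat.add("assistant", "(answer with citations)")
print(chat.retrieval_query("What records would they need?"))
# Can a sole proprietor deduct home office expenses? What records would they need?
```

Many systems instead ask the LLM to rewrite a follow-up into a standalone question before searching; plain concatenation is shown here only because it needs no model call.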

By year-end 2024:

  • NPS climbed to around 70
  • ARR reached just shy of $9 million
  • Cash-flow positive from operations
  • Grew from fewer than 100 firms to roughly 1,200 firms

The momentum accelerated through 2025. By Q3:

  • NPS reached mid-80s (84 in trailing 30 days)
  • ARR reached the mid-$20 millions (roughly triple the year-end 2024 figure)
  • Customer count approached 3,400 firms
  • Adding about 10 new firms daily
  • Closed a $122 million Series D in July

According to SaaS benchmarking data, tripling revenue year-over-year at $10M+ ARR represents exceptional growth seen in <5% of B2B SaaS companies. BlueJ achieved it while maintaining positive unit economics.
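The growth figures in this section can be checked with simple arithmetic. The sketch below treats the article's "nearly $2 million" (end of 2023), "just shy of $9 million" (end of 2024), and "over $25 million" (Q4 2025) as round inputs, and assumes roughly 22 months between the first and last figure; both are approximations, not reported numbers.

```python
def growth_multiple(start_arr: float, end_arr: float) -> float:
    """How many times larger ending ARR is than starting ARR."""
    return end_arr / start_arr


def implied_monthly_growth(start_arr: float, end_arr: float, months: int) -> float:
    """Constant compound monthly rate that turns start_arr into end_arr over `months`."""
    return (end_arr / start_arr) ** (1 / months) - 1


# Round figures in $ millions (approximations of the article's numbers).
print(round(growth_multiple(9, 25), 2))             # 2.78, i.e. "roughly triple"
print(round(growth_multiple(2, 25), 1))             # 12.5
print(round(implied_monthly_growth(2, 25, 22), 3))  # 0.122, about 12% per month
```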

For Ben personally, the PMF moment came during a demo at the Canada Revenue Agency in early 2024. After his prepared examples, he opened the floor to the 150-person audience: "Does anyone have tax research questions?"

A front-row attendee raised his hand. "This is a tricky problem. It took us two weeks internally. I don't have my hopes up, but can you try?"

Ben typed the question into BlueJ, and the answer started generating. The questioner stood up, walked to the screen, and read it line by line.

"That's the answer we came up with."

It wasn't just similar. It was the same answer CRA experts had spent two weeks developing, and BlueJ produced it in roughly 20 seconds with all relevant sources cited.

"I remember driving back to my office," Ben recalls. "I turned up the radio and was pretty pumped. That was hugely successful—an ecologically valid test. We hit it out of the park."

Why Time-to-Value Changed Everything

Ben's insight about time-to-value reveals one of the most powerful growth frameworks:

"The conversion, trial conversion, word-of-mouth—it's so much fun to sell BlueJ now compared to V1. Just think about the social dynamics."

The low-friction referral: "If I'm confident I can recommend this to you and you can validate it for yourself at very low cost right away, it makes it socially far less risky for me to suggest it. You can validate extremely quickly. I'm not inviting you to spend days trying to figure something out where you might resent me."

The enthusiasm becomes natural: "Go try it, just try it, it's easy."

The asymmetric payoff: "If it works, awesome—you'll be pleased I shared this tip. If it doesn't work? It wasn't a huge time commitment to figure out it wasn't for you anyway."

The fundamental principle: "The consistency of value—I'm not asking you to make a huge investment in your time and the upside gets so asymmetric."

Compare this to complex products like Photoshop. "If I recommend Photoshop, I'd have to warn them it's great but will take a long time to learn. That makes word-of-mouth so much harder than if you know they're going to try it and love it right away."

This explains why BlueJ adds about 10 firms daily through word-of-mouth despite being in a traditionally conservative market like legal and accounting.
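Ben's asymmetric-payoff argument can be written as a toy expected-value model from the recommender's point of view. Everything here is a hypothetical illustration: the probability the product works for the colleague, the goodwill gained if it does, and a social cost that grows with the colleague's wasted time if it doesn't are invented parameters, not measurements.

```python
def referral_payoff(p_works: float, goodwill_if_works: float,
                    minutes_to_validate: float, cost_per_wasted_minute: float) -> float:
    """Expected social payoff of recommending a product (toy model).

    With probability p_works the recommender gains goodwill; otherwise they pay
    a social cost proportional to the time the colleague spent finding out.
    """
    wasted_cost = minutes_to_validate * cost_per_wasted_minute
    return p_works * goodwill_if_works - (1 - p_works) * wasted_cost


# Same odds and upside; only the time needed to validate changes.
quick = referral_payoff(0.6, 10.0, minutes_to_validate=1, cost_per_wasted_minute=0.1)
slow = referral_payoff(0.6, 10.0, minutes_to_validate=240, cost_per_wasted_minute=0.1)
print(quick, slow)  # quick is positive, slow is negative
```

Under these made-up numbers, the only thing that flips the recommendation from worthwhile to risky is validation time, which is the Photoshop comparison in miniature.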

Competing with ChatGPT as a Vertical AI

As LLMs become more powerful, every AI company faces the question: Why can't users just ask ChatGPT?

Ben's answer reveals how vertical AI defends against horizontal platforms:

Authoritative, curated content: "We have copies of all authoritative content necessary to produce great answers." Major legal publishers like Thomson Reuters, CCH, LexisNexis, and Bloomberg spent decades assembling comprehensive collections "safely guarded behind paywalls—it's not on the open web."

Horizontal tools like ChatGPT rely on whatever they find freely available, meaning answers come from variable-quality, often outdated web sources.

Currency and accuracy: "If you're relying on web documents from 2022 and 2023, sometimes the law hasn't changed. But more commonly, materials are out of date, anachronistic." Generic AI models "don't know the difference. They just produce confident-sounding answers."

BlueJ keeps its entire corpus meticulously up to date so that answers reflect current law.
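Keeping a corpus current is largely a data-management problem. One way to sketch it, under assumptions the source does not confirm: each document carries hypothetical `effective_from` and `superseded_on` dates, and retrieval only considers documents in force on the date the question concerns.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class SourceDoc:
    doc_id: str
    text: str
    effective_from: date
    superseded_on: Optional[date] = None  # None means still in force


def in_force(doc: SourceDoc, as_of: date) -> bool:
    """True if the document took effect on or before `as_of` and was not yet superseded."""
    started = doc.effective_from <= as_of
    not_replaced = doc.superseded_on is None or as_of < doc.superseded_on
    return started and not_replaced


def current_corpus(docs: list[SourceDoc], as_of: date) -> list[SourceDoc]:
    """Filter the corpus before retrieval so stale guidance is never searched."""
    return [d for d in docs if in_force(d, as_of)]


docs = [
    SourceDoc("guidance-v1", "Old rate table", date(2018, 1, 1), superseded_on=date(2023, 1, 1)),
    SourceDoc("guidance-v2", "Updated rate table", date(2023, 1, 1)),
]
print([d.doc_id for d in current_corpus(docs, date(2025, 6, 1))])  # ['guidance-v2']
print([d.doc_id for d in current_corpus(docs, date(2020, 6, 1))])  # ['guidance-v1']
```

Filtering by date rather than deleting old versions also lets a system answer questions about earlier tax years, where the older guidance is the correct one.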

Verifiable sources: Tax professionals need to verify sources for professional liability. "We make it easy for users to look at authoritative sources and validate where things came from."
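One way to support that kind of verification, sketched here as an assumption rather than a description of BlueJ's system, is to have the model cite retrieved passages by number and then check every citation against the passages actually supplied. The `[n]` marker format, `check_citations`, and `CitationReport` are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class CitationReport:
    linked: dict[int, str]        # citation number -> source it points to
    unsupported: list[int]        # citation numbers with no matching source
    uncited_sentences: list[str]  # sentences that cite nothing


def check_citations(answer: str, sources: list[str]) -> CitationReport:
    """Check [n]-style citations in a generated answer against the sources supplied."""
    cited = sorted({int(n) for n in re.findall(r"\[(\d+)\]", answer)})
    linked = {n: sources[n - 1] for n in cited if 1 <= n <= len(sources)}
    unsupported = [n for n in cited if n not in linked]
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", answer.strip())
    uncited = [s for s in sentences if s and not re.search(r"\[\d+\]", s)]
    return CitationReport(linked, unsupported, uncited)


sources = ["Source A (hypothetical)", "Source B (hypothetical)"]
report = check_citations(
    "Ordinary business expenses are deductible [1]. See also [3]. This is general guidance.",
    sources,
)
print(report.unsupported)        # [3]
print(report.uncited_sentences)  # ['This is general guidance.']
```

Linked citations can be rendered as clickable sources for the user, while unsupported numbers or uncited sentences can be flagged before the answer is shown.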

Purpose-built for discerning users: "Our users are very picky—tax professionals not content with ChatGPT. Their clients pay them significant money for their time. If they can accelerate research with tools that cut through the task like a hot knife through butter, they want that experience."

Ben's advice for founders: "Be ruthlessly honest with yourself about whether you have product-market fit. You're probably the easiest one to deceive. You can have happy ears listening to folks providing encouragement saying 'this is really great.'

"The real test is: are they using it aggressively every day, telling everybody else, and paying real money for it? That's when you know you've got product-market fit."


Frequently Asked Questions

What is legal AI and how does it differ from general AI tools?

Legal AI refers to artificial intelligence systems designed specifically for legal and tax research and built on authoritative legal content collections rather than general web data. Unlike ChatGPT, legal AI platforms like BlueJ maintain curated corpora of primary legal sources (statutes, case law, regulatory guidance) that sit behind professional paywalls, ensuring the currency and accuracy legal professionals require for client work.

How did BlueJ achieve product-market fit after 8 years?

BlueJ achieved product-market fit by pivoting from narrow supervised machine learning models to comprehensive LLM-powered research using retrieval augmented generation. The breakthrough came when they could answer any arbitrary tax question instantly with authoritative sources, eliminating the "try it, love it, forget it" cycle. Reducing response time from 90 to 15 seconds while achieving consistent accuracy created the time-to-value dynamic that drove explosive word-of-mouth growth.

What is partial product-market fit and why is it dangerous?

Partial product-market fit occurs when your product delivers value for specific use cases but doesn't solve users' complete workflow. It's dangerous because it creates the illusion of success—paying customers, positive feedback, steady revenue growth—while masking fundamental adoption barriers. BlueJ experienced this when predictive models worked brilliantly for specific questions but failed to sustain usage because most questions fell outside narrow capabilities.

How does time-to-value impact word-of-mouth growth?

Time-to-value dramatically affects word-of-mouth by changing the social dynamics of referrals. When someone can validate a product's value in seconds rather than hours, recommending it becomes socially safe: the time cost is minimal if it doesn't work, and the gratitude is significant if it does. This asymmetric payoff (low risk, high reward) encourages natural referrals. BlueJ's instant validation helps it add about 10 new firms a day through word-of-mouth.

Can vertical AI companies compete with ChatGPT?

Vertical AI companies compete successfully by offering authoritative curated content that general models can't access, ensuring the currency and accuracy professionals demand, providing verifiable sources for liability purposes, and optimizing the user experience for specific workflows. BlueJ draws on paywalled legal content collections that ChatGPT cannot access and keeps that content current so answers reflect current law, both critical differentiators for tax professionals who need defensible accuracy.

What technical architecture enables BlueJ to answer any tax question?

BlueJ uses retrieval augmented generation (RAG), combining intelligent search across curated tax materials with LLM synthesis capabilities. The system maintains a master corpus of authoritative materials with sophisticated vector embeddings and chunking, runs intelligent searches to find relevant content for each query, and uses large language models to synthesize materials into coherent answers with full citation to primary sources.
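The synthesis step can be sketched as assembling a prompt that numbers each retrieved passage and instructs the model to answer only from those passages, citing them by number. This is a minimal illustration: the instruction wording, the `build_grounded_prompt` name, and the passage format are hypothetical, and the actual LLM call is deliberately left out because the source does not say which API or prompt BlueJ uses.

```python
def build_grounded_prompt(question: str, passages: list[tuple[str, str]]) -> str:
    """Number each (source_id, text) passage and ask for an answer cited by number."""
    numbered = "\n\n".join(
        f"[{i}] ({source_id}) {text}" for i, (source_id, text) in enumerate(passages, start=1)
    )
    return (
        "Answer the tax research question using ONLY the numbered sources below.\n"
        "Cite every claim with its source number in brackets, e.g. [1].\n"
        "If the sources do not answer the question, say so instead of guessing.\n\n"
        f"Sources:\n{numbered}\n\nQuestion: {question}\nAnswer:"
    )


prompt = build_grounded_prompt(
    "Is a worker who sets their own hours an employee?",
    [("sample-case-A", "A worker was held to be an employee because the hirer controlled the hours.")],
)
print(prompt)
```

Numbering passages in the prompt is what lets each bracketed citation in the model's answer be traced back to a specific primary source.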

How long does it take to achieve product-market fit in legal tech?

Based on BlueJ's journey, achieving true product-market fit in legal tech can take significantly longer than in consumer or standard B2B software: in their case, eight years passed between founding and the pivot that unlocked explosive growth. Legal professionals are risk-averse, have high accuracy requirements, and need solutions that integrate into established workflows. The timeline depends heavily on technical feasibility and on reaching the consistency legal professionals demand before they change their behavior.


Want More Founder Stories Like This?

This article is based on an episode from The Product Market Fit Show, where host Pablo Srugo interviews successful founders about their journeys from zero to PMF and beyond.

Listen to the full conversation with Ben Alaire to hear more about the technical decisions behind RAG for legal research, competitive dynamics as horizontal AI improves, and why Ben thinks most legal AI startups will fail.