The Prioritization Game
On working at a startup, building under constraints, and waiting for the right moment.
For almost six months, we knew our product was incomplete. We knew exactly what was missing. We knew how much better it could be and we deliberately chose not to fix it. This is a story about why.
Act 1: The Early Days
We had to choose carefully what deserved attention and what had to wait.
We started Enter performing basic extractions and summarizations of legal documents. Users selected the desired workflow, uploaded documents for AI analysis, and then reviewed the generated outputs. Once review was done, they could download the final legal answer, produced by rendering the reviewed fields into a custom template.
Enter workflow — documents in, AI analysis, review, then download.
It was a straightforward process, but one intrinsically capable of saving a lot of time for our customers.
This workflow, launched in mid-2023, remained the company's most fundamental pillar for a long time, sustaining most of our clients' demands, generating value for our customers and over $10 million in ARR.
For lawyers who used to write everything manually, even reviewing AI-generated content felt magical. At first…
Over time, we realized something: the workflow was effective, but painfully manual. To verify a single field, users had to dig through folders, open PDFs, scroll endlessly, and cross-check information. And since they were working on dozens of cases in parallel, everything quickly became a mess. Very manual, very tedious, very chaotic.
We wanted to change that. Our vision was simple: everything in one place. We imagined a small "eye" icon next to each field. Click it, and the platform would open the exact document, at the exact region where the information could be verified.
Today, that sounds obvious. Back then, it was not. We were operating with limited resources. Every engineering hour mattered. We had to choose carefully what deserved attention and what had to wait. So we waited.
For months, we carried that idea in our heads. We felt the pain. We wanted to fix it. But scaling customers, stabilizing infrastructure, and shipping essentials came first. We had a lot to do, and it wasn't possible to do everything at once (at least not back then before Cursor 🙂).
We weren't building the product we dreamed about. We were building the product our customers needed. That was the first lesson of the prioritization game.
Act 2: The First Attempt
It meant we were building something they genuinely needed.
Eventually, the time came. We started small: split the interface in half and added a document viewer beside the AI results. Users could finally browse their documents without leaving the platform.
Looking back, that small change already contained something much bigger. We didn't have the words for it yet, but we were slowly moving toward the idea that later became EnterOS: a place where documents, models, decisions, and evidence would all live together, instead of being scattered across tools and folders. At the time, it was just a practical improvement. In hindsight, it was the first brick of a much larger system.
The next step was the "eye" button: AI-powered visual grounding. Our first idea was straightforward: for each field, we asked the language model to return the exact text snippet that supported its answer. Then, in the frontend PDF viewer, we searched for that snippet and highlighted it. In theory, it was perfect. In practice, it wasn't.
We already had dozens of complex prompts, and more were being created every week. Asking prompt writers to manually request sources for every field would not scale.
So we built an internal prompting syntax. Writers kept their freedom, but followed structural rules that allowed us to programmatically inject source requests and validate prompts at upload time. Every prompt in the system was processed to include requests for supporting transcriptions. Simple. Elegant.
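To make the idea concrete, here is a minimal sketch of how such programmatic injection could work. The `{{field:…}}` marker syntax, the function names, and the wording of the injected request are all assumptions for illustration, not Enter's actual internal syntax:

```python
import re

# Hypothetical field-marker syntax; the real internal syntax is not shown
# in this post, so this pattern is an illustrative assumption.
FIELD_PATTERN = re.compile(r"\{\{field:(\w+)\}\}")

SOURCE_REQUEST = (
    ' For this field, also return "{name}_source": the exact text snippet '
    "from the document that supports your answer."
)

def inject_source_requests(prompt: str) -> str:
    """Append a source request after every field marker in the prompt."""
    def add_request(match: re.Match) -> str:
        name = match.group(1)
        return match.group(0) + SOURCE_REQUEST.format(name=name)
    return FIELD_PATTERN.sub(add_request, prompt)

def validate_prompt(prompt: str) -> list[str]:
    """Return the field names found at upload time; an empty list means
    the prompt doesn't follow the structural rules and should be rejected."""
    return FIELD_PATTERN.findall(prompt)
```

The key point is that writers never ask for sources themselves; the system rewrites every prompt on upload, so coverage is guaranteed by construction.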
By mid-2024, models were good enough at extracting these snippets. The problem was finding them back in the original documents.
We sent OCR output to the models. OCR encodes layout into text using things like line breaks and spacing. PDFs store text differently: as blocks whose visual layout is reconstructed at render time.1
So even when the model returned the "correct" text, it often didn't match the viewer's representation. The feature worked beautifully for short transcriptions contained in a single line, but whenever the model needed a few more words to support an answer, the chance of an exact string match plummeted.
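The mismatch is easy to reproduce. In this tiny sketch (the strings are made-up illustrations, not real document text), the model quotes the OCR representation it was shown, which contains a line break the viewer's contiguous text doesn't have:

```python
# Illustrative made-up strings showing why exact matching broke.
ocr_text = "the lessee shall pay\nR$ 10,000 per month"    # OCR keeps layout as line breaks
viewer_text = "the lessee shall pay R$ 10,000 per month"  # PDF block text is contiguous

snippet = "shall pay\nR$ 10,000"  # the model quotes what it saw: the OCR text

print(snippet in ocr_text)     # True  — matches what the model was given
print(snippet in viewer_text)  # False — no exact match in the viewer's text
```

One stray line break is enough to make a multi-line snippet unfindable in the frontend.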
In our internal tests, only about 20% of fields had usable audit links. We were so amazed that it worked at all that we weren't bothered by this number.
Then we showed it to our first customer. They loved it. We released it. A few hours later, reality hit…
Users started complaining about every field that didn't have the button. And since that was about 80% of them, they complained desperately! They loved the feature so much that they hated it when it didn't work. And suddenly, 80% failure felt unacceptable.
Paradoxically, that was good news. It meant we were building something they genuinely needed. But more importantly, it confirmed that we had postponed it in our roadmap for exactly as long as possible. Not a day too late and not a day too early.
We had won the first round of the prioritization game.
Act 3: The Reset
Perfection is rarely the priority. Value is.
But we still had a problem. Complaints were intense. So we made a counterintuitive decision: we turned the feature off. We told the customer we'd fix it quickly.
I had spent the entire week building the first version… And guess what? It was Friday…
Good news! We had the whole weekend to figure out a way.
My first instinct was to improve string matching: smarter heuristics, better normalization, more rules. Then Mike, our CTO, suggested something different. "Let's step back," he said, arguing that in this case consistency mattered more than precision: users preferred reliable, slightly coarse results over fragile precision. So we redesigned it.
Instead of matching LLM-returned transcriptions, we mapped each document into numbered regions and asked the model to cite region numbers as sources. Again, a very simple idea: we split each document page into four horizontal chunks and always sent the model the document's content with explicit chunk markers written into the text.
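A minimal sketch of that preprocessing step, assuming the page text is split by line count and labeled with a `[CHUNK n]` marker (the marker format and chunking-by-lines are illustrative assumptions; the post doesn't specify how pages were divided):

```python
def chunk_page(page_text: str, n_chunks: int = 4) -> str:
    """Split a page's OCR text into n labeled horizontal chunks, so the
    model can cite a chunk number instead of quoting an exact snippet."""
    lines = page_text.splitlines()
    size = max(1, -(-len(lines) // n_chunks))  # ceiling division
    labeled = []
    for i in range(0, len(lines), size):
        chunk_id = i // size + 1
        labeled.append(f"[CHUNK {chunk_id}]")
        labeled.extend(lines[i : i + size])
    return "\n".join(labeled)
```

When the model answers "source: chunk 3", the frontend only has to highlight the third quarter of the page: coarse, but impossible to mismatch.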
We were relieved to have something users wouldn't complain about. But we didn't like it as much as the first version, because the audit button now always showed the same fixed-size blocks. Even when the exact origin of the information was clearly very short (e.g. a person's name or a monetary value), we would highlight an entire quarter of a page. And those were exactly the situations in which the transcription-based approach thrived and returned a precise result.
But the results were clear: a 90%+ hit rate. We relaunched, and users were happy.
As engineers, we're natural optimizers. We always want something more elegant, more refined. But at a fast-growing company, perfection is rarely the priority. Value is. So we stayed with the chunking approach and moved on.
Worth mentioning: this version was demoed by our co-founder and CEO to Sam Altman during a chat in mid-2024, and Sam remarked that OpenAI should invest in embedding document traceability into model outputs.
Act 4: The Breakthrough
The breakthrough didn't come from a grand plan, it came from growth.
So the day came when it was time for something more fine-grained. I would love to tell an inspiring story about how we got to the next level, but it was actually more casual.
As Enter scaled, we hired two brilliant junior engineers. One was visiting from France; the other was returning from a research internship in the US. Both needed scoped onboarding projects, so we gave them real problems.
One built a backend service to search inside documents using approximate string matching and return precise bounding boxes. The other rebuilt our transcription-based prompting system, integrating chunking as a fallback.
Now, matching happened server-side during the workflow execution, with more time, better algorithms, and more flexibility. The frontend no longer searched for text. It simply highlighted coordinates. Today, we use this hybrid approach.
When transcription matching works, we get precise sources. When it doesn't, chunking guarantees coverage. It's reliable. It's fast. It scales.
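A simplified sketch of that hybrid logic, assuming approximate matching via Python's standard-library `difflib` (the actual service's algorithm isn't described in the post, and the function names here are hypothetical):

```python
import difflib

def find_snippet(snippet: str, page_text: str, threshold: float = 0.8):
    """Slide a window over whitespace-normalized page text and return the
    best-matching character span, or None if nothing clears the threshold.
    A real service would then map the span to PDF bounding boxes."""
    haystack = " ".join(page_text.split())  # normalize whitespace
    target = " ".join(snippet.split())
    window = len(target)
    best_score, best_span = 0.0, None
    for start in range(max(1, len(haystack) - window + 1)):
        candidate = haystack[start : start + window]
        score = difflib.SequenceMatcher(None, target, candidate).ratio()
        if score > best_score:
            best_score, best_span = score, (start, start + window)
    return best_span if best_score >= threshold else None

def audit_region(snippet: str, page_text: str, fallback_chunk: int):
    """Hybrid audit: precise span when matching works, coarse chunk otherwise."""
    span = find_snippet(snippet, page_text)
    if span is not None:
        return {"kind": "precise", "span": span}
    return {"kind": "chunk", "chunk": fallback_chunk}
```

The brute-force scan is fine for a sketch; a production service would use a proper fuzzy-search library and run server-side during workflow execution, exactly so it can afford better algorithms than a frontend ever could.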
And it works.
Victory.
Our users are happy.
We are happy.
Closing: Learning to Wait
Looking back, the hardest part wasn't building this feature. It was knowing when not to. At every stage, there was something tempting to optimize early, something elegant to build prematurely. Resisting that urge is part of growing up as a company, and it takes discipline.
At Enter, prioritization isn't about ignoring good ideas. It's about protecting them until the moment they can actually succeed.
It's about building foundations first, listening carefully, and waiting for the moment when effort turns into real impact.
That's the game we're still learning to play.
And, slowly, we're getting better at it.
Footnotes
1. PDFs store text in blocks. A block's final visual appearance, such as how many lines its text spans, depends on factors like the block's coordinates and font sizes, but the text inside the block is stored contiguously, so PDF readers can read it as-is. OCR, on the other hand, tries to encode visual information inside the raw text, inserting special characters with spatial meaning, like line breaks and carriage returns.