No Toy Projects: Building a Production LLM Pipeline in My First 60 Days

My name is Matheus. I am a 21-year-old Computer Engineering student in my 7th semester at Unicamp. Before joining Enter's AI Fellowship, my world revolved around machine learning: building Retrieval-Augmented Generation (RAG) architectures, training XGBoost models for churn prediction, and competing in quantitative AI challenges.

I knew how to train and prompt AI. But Software Engineering (orchestrating asynchronous microservices, designing distributed workflows, managing AWS S3 buckets, and building React frontends) was entirely new to me.

When you join a company as a student or intern, you usually expect a "toy project" like an internal dashboard or a minor bug fix. Enter took a different approach. They handed me a massive, real-world scaling problem and trusted me to architect the solution from the ground up.

Here is how I spent my first 60 days writing RFCs, learning the hard truth about modern software engineering, and building a production LLM pipeline for our AI Deployment team.

The Problem: Scaling Legal AI

Enter's AI Deployment team relies on analyzing hundreds of legal cases to generate insights for large enterprises. For our specific judgment analysis workflow, the team had built a fantastic proof-of-concept using local Python scripts. It worked, but it didn't scale: it ran locally and required constant manual intervention.

My mission was to replace this local script with a highly concurrent, full-stack web feature embedded directly into our backoffice portal.

Architecting the Validation Flow

Before we can use an LLM to analyze a list of lawsuits, each identified by its National Council of Justice (CNJ) case number, we first need to verify them. Are all of the required documents in our database? Are they linked to the correct client? Do they have the actual decision documents (like a judgment or an appeal decision) attached, or is the case still ongoing?

I designed a dedicated backend router to handle this validation. Instead of making hundreds of sequential checks, the system orchestrates asynchronous HTTP calls to our internal document service.

The validation pipeline assigns each CNJ a status. It filters out numbers with an invalid_format, identifies cases that are not_in_any_customer, and flags cases that are in_target_docs_but_no_decision_docs (meaning we have the case files, but the judge has not made a decision yet). Only lawsuits that pass every check (passes_pipeline) move forward.
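To make the status logic concrete, here is a minimal sketch in Python. It is not the production router: the fetch_case helper, the FAKE_SERVICE data, and the order of the checks are my own assumptions for illustration, and the real system calls the internal document service over HTTP instead of reading a dictionary. The regular expression follows the standard CNJ numbering layout (NNNNNNN-DD.AAAA.J.TR.OOOO) and does not verify the check digits.

```python
import asyncio
import re
from enum import Enum
from typing import Dict, List, Optional

# CNJ unified numbering layout: NNNNNNN-DD.AAAA.J.TR.OOOO (check digits not verified here)
CNJ_PATTERN = re.compile(r"^\d{7}-\d{2}\.\d{4}\.\d\.\d{2}\.\d{4}$")


class Status(str, Enum):
    INVALID_FORMAT = "invalid_format"
    NOT_IN_ANY_CUSTOMER = "not_in_any_customer"
    NO_DECISION_DOCS = "in_target_docs_but_no_decision_docs"
    PASSES_PIPELINE = "passes_pipeline"


# Hypothetical stand-in for the internal document service (an HTTP call in practice).
FAKE_SERVICE = {
    "0000001-11.2023.8.26.0100": {"customer": "acme", "decision_docs": 2},
    "0000002-22.2023.8.26.0100": {"customer": "acme", "decision_docs": 0},
}


async def fetch_case(cnj: str) -> Optional[dict]:
    await asyncio.sleep(0)  # placeholder for network latency
    return FAKE_SERVICE.get(cnj)


async def validate(cnj: str) -> Status:
    # The cheap local check runs first so malformed numbers never hit the service.
    if not CNJ_PATTERN.match(cnj):
        return Status.INVALID_FORMAT
    case = await fetch_case(cnj)
    if case is None:
        return Status.NOT_IN_ANY_CUSTOMER
    if case["decision_docs"] == 0:
        return Status.NO_DECISION_DOCS
    return Status.PASSES_PIPELINE


async def validate_all(cnjs: List[str]) -> Dict[str, Status]:
    # All lookups are scheduled concurrently instead of one after another.
    statuses = await asyncio.gather(*(validate(cnj) for cnj in cnjs))
    return dict(zip(cnjs, statuses))
```

Calling asyncio.run(validate_all(cnjs)) returns a mapping from each CNJ to its status, so the caller can keep only the entries marked passes_pipeline.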

[Diagram: architecture of the validation flow]

Orchestrating the LLM Pipeline

Once the lawsuits are validated, the real heavy lifting begins. I could not process the documents from 150 legal cases inside a single HTTP request. A batch can easily contain thousands of pages of dense legal text, which translates to millions of tokens. Processing that volume takes several minutes and multiple LLM calls, far longer than a standard server timeout allows.

To solve this, I used Hatchet, one of our tools for managing asynchronous jobs, to orchestrate a background workflow. The architecture moves through four distinct phases: query_decisions (fetching the text), analyze (using an LLM to evaluate each case), compile (aggregating statistics), and synthesize (generating final insights).
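The four phases can be sketched as a chain of plain async functions. This deliberately does not use the Hatchet SDK: every function body below is a stub I made up for illustration, the LLM call is replaced by a word count, and the compile phase is named compile_stats only to avoid shadowing Python's built-in compile. The on_progress callback stands in for whatever records progress in the real system.

```python
import asyncio
from typing import Callable, Dict, List


async def query_decisions(cnjs: List[str]) -> Dict[str, str]:
    # Phase 1: fetch the decision text for each case (stubbed).
    return {cnj: f"decision text for {cnj}" for cnj in cnjs}


async def analyze(texts: Dict[str, str]) -> Dict[str, dict]:
    # Phase 2: evaluate each case; the LLM call is replaced by a word count.
    return {cnj: {"words": len(text.split())} for cnj, text in texts.items()}


async def compile_stats(analyses: Dict[str, dict]) -> dict:
    # Phase 3: aggregate statistics across all cases.
    return {
        "cases": len(analyses),
        "total_words": sum(a["words"] for a in analyses.values()),
    }


async def synthesize(stats: dict) -> str:
    # Phase 4: produce the final Markdown artifact.
    return f"# Insights\n\nAnalyzed {stats['cases']} cases."


async def run_pipeline(cnjs: List[str], on_progress: Callable[[str], None]) -> str:
    # Each phase consumes the previous phase's output and reports when it is done.
    texts = await query_decisions(cnjs)
    on_progress("query_decisions")
    analyses = await analyze(texts)
    on_progress("analyze")
    stats = await compile_stats(analyses)
    on_progress("compile")
    report = await synthesize(stats)
    on_progress("synthesize")
    return report
```

Because the pipeline runs as a background job, the HTTP request that starts it only needs to enqueue the work and return immediately.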

Because the analyze step is the most expensive, I used parallel execution (fan-out) with a semaphore capping concurrent requests to the LLM API. As each step completes, the backend saves the generated Markdown artifacts to an AWS S3 output bucket and updates a PostgreSQL database so the React frontend can poll for real-time progress.
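The fan-out with a concurrency cap can be sketched with asyncio.Semaphore. The limit of 5 and the call_llm stub are assumptions for illustration, not our production values, and the peak counter exists only to show that the cap holds.

```python
import asyncio
from typing import List, Tuple

MAX_CONCURRENT = 5  # assumed cap; the real limit depends on the LLM API's rate limits


async def call_llm(text: str) -> str:
    # Hypothetical stand-in for a real LLM request.
    await asyncio.sleep(0.01)
    return text.upper()


async def analyze_all(texts: List[str], limit: int = MAX_CONCURRENT) -> Tuple[List[str], int]:
    semaphore = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0  # highest number of simultaneous calls observed

    async def analyze_one(text: str) -> str:
        nonlocal in_flight, peak
        # At most `limit` coroutines get past this line at the same time.
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await call_llm(text)
            finally:
                in_flight -= 1

    # Fan out: one task per case; gather returns results in input order.
    results = await asyncio.gather(*(analyze_one(t) for t in texts))
    return list(results), peak
```

All tasks are created up front, but the semaphore makes the extra ones wait, so the LLM API never sees more than the configured number of requests at once.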

The Reality of Modern Software Engineering

Looking back, the hardest part of this project was not writing the code. With agentic systems and tools like Cursor, writing React components or FastAPI endpoints is incredibly fast. The barrier to typing code has vanished.

The real challenge, the actual engineering, starts when errors appear. AI can write a Python function, but it cannot figure out why your ArgoCD deployment is failing. It cannot navigate the nuances between environment databases, and it cannot create the S3 buckets and IAM permissions for you.

I vividly remember sitting down with my supervisor in front of an Excalidraw board to design the structure of the database tables. That was the moment I realized that software engineering is about system design. It is about understanding concepts like database transactions and isolation levels, and mechanics like async/await (shoutout to the FastAPI concurrent burgers guide, which finally made it click for me).

Real Impact

This week, I presented the first version of the final solution to more than 30 people from the product and AI deployment teams, including one of Enter’s co-founders. This happened during one of the company’s regular lunch sessions, a tradition where engineers share solutions in development with the broader team to gather feedback and exchange ideas. Seeing the team genuinely excited to use a tool I built from scratch, knowing it will save them countless hours of local processing, was the most rewarding experience of my AI Fellowship so far.

The resilience I built studying for Unicamp exams gave me the grit to learn these new tools, but Enter gave me the playground. If you want to bridge the gap between academic theory and building software that matters in the real world, you need an environment that trusts you to tackle the hard problems.

You need a place where going beyond Jupyter notebooks is part of the challenge from day one, a place that doesn’t limit you to toy projects.