Backend Engineering

How I Built Career AI Using FastAPI

Published on March 27, 2026 • 12 Min Read • Written by Bhuvanesh V

Introduction

When building Career AI, my objective was to create a system that could ingest a user's resume, extract and assess their skill profile, and output a highly personalized roadmap. While the frontend renders the roadmap nodes dynamically, the core engineering challenge lies in the backend orchestration.

Generating high-quality structured data from unstructured resumes requires complex prompt chaining, document processing, and vector indexing. Running that work inside a blocking, synchronous request handler ties up a server worker for the full duration of each LLM call, so response times degrade sharply under load. In this article, I detail how I designed the backend of Career AI using asynchronous FastAPI, LangChain, and Pydantic schema validation.

Why FastAPI?

For I/O-bound workloads like calling external LLM APIs, asynchronous programming is close to a necessity. Traditional frameworks such as Flask or Django (in their default synchronous mode) dedicate a worker thread or process to every in-flight request. When a user uploads a resume, the server reads the file, extracts the text, and waits for a response from the OpenAI API (which can take several seconds). During this wait, the worker is idle but blocked. If fifty users upload their resumes simultaneously, the server can exhaust its worker pool, causing new connection requests to queue up and time out.

FastAPI solves this by building on Python's native asyncio event loop. Instead of blocking, a coroutine yields control back to the event loop at each await, allowing the server to process other incoming connections while waiting for the AI response. Below is a simple comparison of the two models:

  • Thread-per-request Model (Flask): Each in-flight request holds a thread and its stack, so memory use grows with concurrency. Max throughput is limited by the size of the worker pool.
  • Asynchronous Event Loop (FastAPI): A waiting request costs only a small coroutine object. A single instance can hold thousands of concurrent I/O-bound connections.
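
The difference is easy to see in a small standard-library sketch. Here `fake_llm_call` is a hypothetical stand-in for the OpenAI request, with `asyncio.sleep` simulating network latency; it is an illustration, not code from Career AI. Fifty simulated uploads wait concurrently on a single thread, so the total time stays close to one call's latency rather than fifty.

```python
import asyncio
import time


async def fake_llm_call(request_id: int, latency: float = 0.2) -> str:
    # Hypothetical stand-in for an LLM API call; sleep simulates the network wait.
    await asyncio.sleep(latency)
    return f"profile-{request_id}"


async def handle_uploads(n: int) -> tuple[list[str], float]:
    start = time.perf_counter()
    # All n coroutines wait at the same time on one thread.
    results = await asyncio.gather(*(fake_llm_call(i) for i in range(n)))
    return list(results), time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = asyncio.run(handle_uploads(50))
    print(f"{len(results)} requests in {elapsed:.2f}s")
```

A thread-per-request server would need fifty idle threads to achieve the same overlap.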

The Resume Processing Pipeline

The backend pipeline consists of three core stages:

  1. File Ingestion and Text Extraction: Uploading the PDF binary, validating the size, and extracting the raw characters.
  2. Structured Parsing with LangChain: Passing the raw string to an LLM chain with a specific output schema format.
  3. Validation and Insertion: Validating the output structure using Pydantic and caching it in PostgreSQL.

1. File Ingestion

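In FastAPI, we handle uploads using the UploadFile parameter. It is backed by a spooled temporary file: small uploads stay in memory and larger ones spill to disk, so the framework does not have to hold the whole binary in RAM while receiving it. Note that calling read() without a size still loads the full payload into memory, which is why the handler enforces a size limit.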

import io

from fastapi import APIRouter, UploadFile, File, HTTPException
import pypdf

router = APIRouter()

MAX_FILE_SIZE = 5 * 1024 * 1024  # 5MB upload limit

@router.post("/assess/resume")
async def upload_resume(file: UploadFile = File(...)):
    # Read at most one byte past the limit so an oversized upload is never fully buffered
    content = await file.read(MAX_FILE_SIZE + 1)
    if len(content) > MAX_FILE_SIZE:
        # 413 Payload Too Large is the conventional status for oversized bodies
        raise HTTPException(status_code=413, detail="File size exceeds the 5MB limit.")

    # Parse from the bytes already read; file.file's cursor now sits at the end of the stream
    try:
        reader = pypdf.PdfReader(io.BytesIO(content))
        text_content = "".join(page.extract_text() or "" for page in reader.pages)
    except Exception:
        raise HTTPException(status_code=422, detail="Failed to parse PDF content.")

    # Await the AI parser (analyze_resume_text is defined in the next section)
    assessment = await analyze_resume_text(text_content)
    return assessment
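
One caveat with the endpoint above: PDF text extraction is synchronous work, so a large document occupies the event loop while it parses. A common remedy is to run the extraction in a worker thread. The sketch below shows the pattern with `asyncio.to_thread` (Python 3.9+). `extract_text_blocking` is a hypothetical stand-in for the pypdf loop, and the heartbeat task exists only to show that the loop keeps running in the meantime.

```python
import asyncio
import time


def extract_text_blocking(pdf_bytes: bytes) -> str:
    # Hypothetical stand-in for the synchronous pypdf extraction loop.
    time.sleep(0.2)  # simulates parsing time that would otherwise stall the loop
    return pdf_bytes.decode("utf-8", errors="ignore")


async def extract_text(pdf_bytes: bytes) -> str:
    # Run the blocking function in a worker thread so the event loop stays free.
    return await asyncio.to_thread(extract_text_blocking, pdf_bytes)


async def demo() -> tuple[str, int]:
    ticks = 0

    async def heartbeat() -> None:
        # Counts how often the event loop gets to run during extraction.
        nonlocal ticks
        while True:
            await asyncio.sleep(0.01)
            ticks += 1

    beat = asyncio.create_task(heartbeat())
    text = await extract_text(b"Python, FastAPI, PostgreSQL")
    beat.cancel()
    return text, ticks
```

Inside a FastAPI handler, `run_in_threadpool` from `fastapi.concurrency` serves the same purpose.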

2. Structured Parsing with LangChain

Once we have the raw text, we need to extract a structured schema containing skills, experience levels, and gaps. Sending resume text with a free-form prompt often yields inconsistently formatted output, such as stray Markdown. To get a consistent structure, we use LangChain's structured output support (with_structured_output) combined with Pydantic schemas.

First, we define our target schema model:

from pydantic import BaseModel, Field
from typing import List

class TechSkill(BaseModel):
    name: str = Field(description="Normalized name of the skill")
    years_experience: float = Field(description="Years of usage, estimated from project durations")
    assessment_level: str = Field(description="Classified level: Novice, Competent, or Expert")

class CandidateProfile(BaseModel):
    suggested_role: str = Field(description="The matching career path")
    extracted_skills: List[TechSkill] = Field(description="List of detected technical skills")
    educational_gaps: List[str] = Field(description="Concepts missing to achieve target role proficiency")

Next, we configure the LangChain pipeline to query the model and parse the output using the defined schema:

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

async def analyze_resume_text(text: str) -> CandidateProfile:
    # Initialize LLM with strict output structure support
    llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.0)
    structured_llm = llm.with_structured_output(CandidateProfile)
    
    prompt = ChatPromptTemplate.from_messages([
        ("system", "You are an expert technical recruiting auditor. Analyze the resume and structure the candidate profile."),
        ("user", "Candidate Resume Text:\n{resume_text}")
    ])
    
    # Run the chain asynchronously
    chain = prompt | structured_llm
    response = await chain.ainvoke({"resume_text": text})
    return response

Handling Generative Latency

Even with gpt-4o-mini, compiling a structured profile and generating a personalized roadmap can take several seconds. To keep the UI responsive, we implemented two primary techniques:

1. Async Background Tasks

When a user requests a highly detailed, 10-milestone roadmap, we decouple generation from the request-response lifecycle. FastAPI provides a built-in BackgroundTasks helper that runs a function after the response has been sent. The endpoint extracts the resume text, writes a "Pending" status record to the database, schedules a background task to generate the roadmap, and returns a 202 Accepted status immediately.

The frontend receives the response and displays an interactive progress animation while polling a status endpoint that reads the database record. Once the background task completes, the record is updated to "Ready", and the frontend renders the resulting roadmap.
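
The flow above can be sketched without any framework. In the snippet below, `JOBS` is an in-memory stand-in for the database status record, and `generate_roadmap`, `submit`, and `poll_until_ready` are hypothetical names chosen for illustration. In a real FastAPI endpoint, `background_tasks.add_task(...)` would schedule the work instead of `asyncio.create_task`.

```python
import asyncio
import uuid

# In-memory stand-in for the status table (assumption: the real app uses a database).
JOBS: dict[str, dict] = {}


async def generate_roadmap(job_id: str, resume_text: str) -> None:
    # Hypothetical slow generation step; sleep simulates the LLM latency.
    await asyncio.sleep(0.1)
    milestones = [f"Milestone {i}" for i in range(1, 11)]
    JOBS[job_id] = {"status": "Ready", "roadmap": milestones}


async def submit(resume_text: str) -> tuple[int, str, asyncio.Task]:
    # Record the job as pending, start the work, and return immediately.
    job_id = uuid.uuid4().hex
    JOBS[job_id] = {"status": "Pending", "roadmap": None}
    task = asyncio.create_task(generate_roadmap(job_id, resume_text))
    return 202, job_id, task


async def poll_until_ready(job_id: str, interval: float = 0.02) -> dict:
    # What the frontend does: ask for the status until it flips to "Ready".
    while JOBS[job_id]["status"] != "Ready":
        await asyncio.sleep(interval)
    return JOBS[job_id]


async def demo() -> tuple[int, str, dict]:
    status_code, job_id, task = await submit("resume text")
    first_status = JOBS[job_id]["status"]  # still "Pending" right after submit
    result = await poll_until_ready(job_id)
    await task
    return status_code, first_status, result
```

The caller gets its 202 before any generation has happened, which is the property that keeps the UI responsive.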

2. JSON Profile Caching

Querying the LLM for every upload wastes API credits when the backend has already processed the same resume.

To optimize this, we generate a SHA-256 hash of the extracted raw resume text. Before calling the OpenAI API, the backend queries the database for a matching hash. If a match exists, it retrieves the cached JSON profile directly from the database, reducing response latency from about 6 seconds to under 150 milliseconds. Note that an exact hash only matches identical extracted text, so the cache serves repeat uploads; resumes that are merely similar (e.g., two college graduates with basic Python and HTML experience) produce different hashes and still trigger a fresh LLM call.
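
The lookup described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: `PROFILE_CACHE` is an in-memory dict standing in for the database table, and `fake_llm_profile` is a hypothetical placeholder for the LangChain call. `STATS` only counts calls so the cache hit is visible.

```python
import asyncio
import hashlib
import json

# In-memory stand-in for the table keyed by SHA-256 hash (assumption).
PROFILE_CACHE: dict[str, str] = {}
STATS = {"llm_calls": 0}


def resume_hash(text: str) -> str:
    # Hash the extracted text exactly as received; any difference changes the key.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


async def fake_llm_profile(text: str) -> dict:
    # Hypothetical stand-in for the LLM chain; sleep simulates its latency.
    STATS["llm_calls"] += 1
    await asyncio.sleep(0.05)
    return {"suggested_role": "Backend Engineer", "extracted_skills": [], "educational_gaps": []}


async def get_profile(text: str) -> dict:
    key = resume_hash(text)
    cached = PROFILE_CACHE.get(key)
    if cached is not None:
        return json.loads(cached)  # cache hit: no LLM call
    profile = await fake_llm_profile(text)
    PROFILE_CACHE[key] = json.dumps(profile)  # store as JSON, as the database would
    return profile
```

Calling `get_profile` twice with the same text triggers one simulated LLM call; changing a single character triggers another.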

Database Schema & Storage

To manage users, resumes, and roadmap status, we use a relational database (PostgreSQL) with the following tables:

  • users: Manages user credentials and session tokens.
  • resumes: Stores file metadata, raw text content, and the SHA-256 hash.
  • roadmaps: Links users to their career paths, storing milestones as JSON arrays.
  • milestone_progress: Tracks completed milestones and user scores on linked quizzes.
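
As a rough illustration, the four tables might be declared as follows. The article names the tables but not their fields, so every column below is an assumption, and SQLite is used only so the sketch runs anywhere; in PostgreSQL the milestones column would more naturally be a JSONB field.

```python
import sqlite3

# Assumed columns: the article names the tables, not their fields.
SCHEMA = """
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    email TEXT NOT NULL UNIQUE,
    password_hash TEXT NOT NULL,
    session_token TEXT
);
CREATE TABLE resumes (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    filename TEXT NOT NULL,
    raw_text TEXT NOT NULL,
    sha256 TEXT NOT NULL
);
CREATE INDEX idx_resumes_sha256 ON resumes(sha256);
CREATE TABLE roadmaps (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    status TEXT NOT NULL DEFAULT 'Pending',
    milestones TEXT NOT NULL DEFAULT '[]'  -- JSON array stored as text
);
CREATE TABLE milestone_progress (
    roadmap_id INTEGER NOT NULL REFERENCES roadmaps(id),
    milestone_index INTEGER NOT NULL,
    completed INTEGER NOT NULL DEFAULT 0,
    quiz_score REAL,
    PRIMARY KEY (roadmap_id, milestone_index)
);
"""


def create_schema() -> sqlite3.Connection:
    # Build the tables in a throwaway in-memory database.
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def table_names(conn: sqlite3.Connection) -> list[str]:
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
    return [row[0] for row in rows]
```

The index on the hash column is what keeps the cache lookup fast as the resumes table grows.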

Conclusion

Building Career AI taught me that when designing AI-powered applications, the backend must be built to handle unpredictable latencies and format variations. FastAPI's native support for asynchronous programming, combined with LangChain's structured output and Pydantic validation, creates a robust foundation for handling LLM integrations at scale.