Day 08: AI Chatbot Project & API Architectures

Back to Journal Index

On Day 8, the curriculum shifted from simple backend routes to integrating generative intelligence. The goal was to build a complete web-based AI chatbot interface, passing user input from the HTML page through a Flask server, sending it to an LLM, and returning the response dynamically to render.

1. Local vs. API-Driven Inference

We explored two methods of connecting an AI model: running a local pipeline with TinyLlama loading all model weights directly into host memory, or calling remote inference endpoints via the Hugging Face API to run Qwen2.5-7B-Instruct. While a local pipeline allows offline execution, its download size is massive (2GB+) and tokens generate very slowly on standard student laptop CPUs.

2. Input Validation and Exception Boundaries

To make the chatbot robust, we focused on error boundaries: validating that user input is not empty before posting, handling model-generated prompt prefixes, and wrapping requests in try-except blocks to catch API timeouts or offline network drops. This returns user-friendly messages rather than crashing the Flask runtime.

3. The Developer Speed Gap: Torch Downloads vs. Fast Requests

The lab highlighted a stark difference in developer speed. While my classmates struggled to install transformers, torch, accelerate (over 2GB in dependencies) and faced memory allocation crashes on their slow dual-core laptops, I skipped the local download entirely. I set up an asynchronous requests loop calling the remote Qwen-7B endpoint. When Rashmi checked my screen, she was amazed. While others were still installing PyTorch, I was chatting with a zero-latency chatbot. She highly praised my choice, explaining to the class that engineering is about matching resource constraints, not loading massive dependencies when a simple remote request works in milliseconds.

Key LearningsDesigning request-response data flows between front-end inputs, back-end servers, and remote AI APIs.
Analyzing the architectural trade-offs of local pipelines (TinyLlama) vs. remote Inference APIs (Qwen).
Implementing input sanitization and exception handling (try-except) to prevent runtime crashes.
Configuring system prompts, max tokens, and temperature coefficients for generative response boundaries.

Tools & StackFlask
Hugging Face Inference API
Requests Library
Qwen2.5-7B-Instruct
Python 3.11

Challenges OvercomeManaging API connectivity limits and handling request connection timeout exceptions gracefully.
Filtering out model-generated prompt prefixes from the final chatbot response display.

Task to be PerformedSet up a Flask server routing POST request data fields from an HTML form.
Connect to a hosted AI model using a secure Hugging Face Authorization header.
Incorporate robust exception handling for empty user inputs and network connection drops.

Related Logs

Day 00

Onboarding & Exploration: Mapping the Core AI & Python Blueprint

Onboarding at Virtual Height, introducing the training program, providing an overview of the curriculum, and mapping the 2-week schedule.

May 29, 2026 • 4 min readRead Log ➔

Day 01

Day 01: Foundations of AI, Python Lab, and the Velocity of Domain Mastery

Onboarding under Senior AI Trainer Rashmi, mapping the formal pillars of AI from ML to DL, mastering cross-platform Python execution, and demonstrating custom LLM and reverse engineering portfolios to the cohort.

Jun 01, 2026 • 4 min readRead Log ➔

Day 08: The AI Chatbot Project & Architectural Pragmatism

1. Local vs. API-Driven Inference

2. Input Validation and Exception Boundaries

3. The Developer Speed Gap: Torch Downloads vs. Fast Requests

Key Learnings

Tools & Stack

Challenges Overcome

Task to be Performed

Founder Reflections

Related Logs

Onboarding & Exploration: Mapping the Core AI & Python Blueprint

Day 01: Foundations of AI, Python Lab, and the Velocity of Domain Mastery