On Day 8, the curriculum shifted from simple backend routes to integrating generative intelligence. The goal was to build a complete web-based AI chatbot interface, passing user input from the HTML page through a Flask server, sending it to an LLM, and returning the response dynamically to render.

1. Local vs. API-Driven Inference

We explored two methods of connecting an AI model: running a local pipeline with TinyLlama loading all model weights directly into host memory, or calling remote inference endpoints via the Hugging Face API to run Qwen2.5-7B-Instruct. While a local pipeline allows offline execution, its download size is massive (2GB+) and tokens generate very slowly on standard student laptop CPUs.

2. Input Validation and Exception Boundaries

To make the chatbot robust, we focused on error boundaries: validating that user input is not empty before posting, handling model-generated prompt prefixes, and wrapping requests in try-except blocks to catch API timeouts or offline network drops. This returns user-friendly messages rather than crashing the Flask runtime.

3. The Developer Speed Gap: Torch Downloads vs. Fast Requests

The lab highlighted a stark difference in developer speed. While my classmates struggled to install transformers, torch, accelerate (over 2GB in dependencies) and faced memory allocation crashes on their slow dual-core laptops, I skipped the local download entirely. I set up an asynchronous requests loop calling the remote Qwen-7B endpoint. When Rashmi checked my screen, she was amazed. While others were still installing PyTorch, I was chatting with a zero-latency chatbot. She highly praised my choice, explaining to the class that engineering is about matching resource constraints, not loading massive dependencies when a simple remote request works in milliseconds.

Key Learnings

  • Designing request-response data flows between front-end inputs, back-end servers, and remote AI APIs.
  • Analyzing the architectural trade-offs of local pipelines (TinyLlama) vs. remote Inference APIs (Qwen).
  • Implementing input sanitization and exception handling (try-except) to prevent runtime crashes.
  • Configuring system prompts, max tokens, and temperature coefficients for generative response boundaries.

Tools & Stack

  • Flask
  • Hugging Face Inference API
  • Requests Library
  • Qwen2.5-7B-Instruct
  • Python 3.11

Challenges Overcome

  • Managing API connectivity limits and handling request connection timeout exceptions gracefully.
  • Filtering out model-generated prompt prefixes from the final chatbot response display.

Task to be Performed

  • Set up a Flask server routing POST request data fields from an HTML form.
  • Connect to a hosted AI model using a secure Hugging Face Authorization header.
  • Incorporate robust exception handling for empty user inputs and network connection drops.

Related Logs

Day 00

Onboarding & Exploration: Mapping the Core AI & Python Blueprint

Onboarding at Virtual Height, introducing the training program, providing an overview of the curriculum, and mapping the 2-week schedule.

May 29, 20264 min readRead Log
Day 01

Day 01: Foundations of AI, Python Lab, and the Velocity of Domain Mastery

Onboarding under Senior AI Trainer Rashmi, mapping the formal pillars of AI from ML to DL, mastering cross-platform Python execution, and demonstrating custom LLM and reverse engineering portfolios to the cohort.

Jun 01, 20264 min readRead Log