INDEX06 PROJECTS

03 / BUILDS

PROJECTS

The things I've built — from SaaS platforms and NLP tools to systems programming, data visualization, and deep learning.

01

Wheelbase

Multi-tenant dealership management SaaS platform encompassing auction management, vehicle inventory tracking, reconditioning workflows, AI-powered operations assistance, collaborative document management, and programmatic marketing video generation. Built as a Turborepo monorepo with Bun workspaces containing 5 applications and 4 shared packages.

Full vehicle lifecycle management: VIN decoding → inventory intake → customizable status pipeline (Kanban + list views) → reconditioning workflow (stage definitions, holds, parts orders, work orders, inspections) → auction scheduling with runlist CSV import and per-car assessment wizards.

AI assistant that converts natural language into tenant-scoped SQL queries (SELECT-only), performs risk-assessed write operations (high-risk ops require explicit approval tokens), and personalizes responses via context documents (USER.md, DEALERSHIP.md, TEAM.md).

Real-time collaborative document editor (Vault app) with TipTap, Yjs + Supabase Realtime for multiplayer editing, version history, folder hierarchy, and template gallery.

Multi-tenant architecture with Row-Level Security — all queries scoped by tenant_id, with role-based access control (admin, manager, tech, detailer, read_only) and per-tenant customization of status pipelines, recon stages, and demand categories.

Technical Highlights

Monorepo: Turborepo + Bun workspaces orchestrating 5 apps and 4 shared packages (@wheelbase/ui, /utils, /types, /config) with subpath exports for tree-shakeable type imports.
Frontend (Main App): Next.js 16 App Router, React 19, TypeScript, tRPC for end-to-end type-safe API layer (20+ routers, 150+ procedures), TanStack Query for server state with optimistic updates, Zustand for global state, Shadcn/ui component library.
Go VIN Decoder: Gin HTTP framework with a self-contained VIN decoder backed by a ~2GB local SQLite database (NHTSA data, ~1.6M pattern rows, ~8.7M valid-character rows). Decoding pipeline: extract model year from position 10 with 30-year cycle logic, WMI lookup for manufacturer/make, multi-pass VDS pattern matching (positions 4–8) to decode body style, engine, drive type, and model. In-memory caches, custom pattern parser (no regex), check-digit validation, auto-correction for single-character errors, and ranked candidate resolution.
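The check-digit step follows the standard ISO 3779 / 49 CFR 565 scheme, so it can be sketched independently of the service. A minimal Python version of that one step (the actual decoder is written in Go):

```python
# VIN check-digit validation (ISO 3779 / 49 CFR 565): transliterate each
# character to a number, multiply by a positional weight, and the weighted
# sum mod 11 must equal the character at position 9 ('X' stands for 10).
# Letters I, O, and Q are never valid in a VIN and are absent from the map.
TRANSLIT = dict(zip("ABCDEFGHJKLMNPRSTUVWXYZ",
                    [1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 7, 9,
                     2, 3, 4, 5, 6, 7, 8, 9]))
TRANSLIT.update({str(d): d for d in range(10)})
WEIGHTS = [8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2]

def check_digit(vin: str) -> str:
    """Compute the expected check digit for a 17-character VIN."""
    total = sum(TRANSLIT[c] * w for c, w in zip(vin.upper(), WEIGHTS))
    r = total % 11
    return "X" if r == 10 else str(r)

def is_valid(vin: str) -> bool:
    """True when the VIN is 17 valid characters and position 9 matches."""
    vin = vin.upper()
    if len(vin) != 17 or any(c not in TRANSLIT for c in vin):
        return False
    return vin[8] == check_digit(vin)
```

Single-character auto-correction follows naturally from this: substitute each position with each valid character and keep candidates whose check digit validates.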
Go Runlist Upload: Streaming CSV processing with constant memory usage (csv.Reader row-by-row, not ReadAll), dynamic column-to-field mapping via ImportFlow configurations fetched from Supabase, client-side UUID generation for batching, 500-record batch inserts with atomic-like guarantees — if RunlistCar link insertion fails, the service auto-deletes the just-inserted cars to prevent orphaned records. Strict JSON structure enforcement (no omitempty) to satisfy PostgREST bulk insert requirements.
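The batching pattern is language-agnostic: stream rows, flush every N records, and undo the parent insert if the link step fails. A hedged Python sketch (the real service is Go; `insert_cars`, `insert_links`, and `delete_cars` stand in for the Supabase/PostgREST calls):

```python
import csv
import io
import uuid

BATCH_SIZE = 500  # matches the service's 500-record batches

def stream_batches(fileobj, batch_size=BATCH_SIZE):
    """Read CSV row-by-row (constant memory) and yield fixed-size batches,
    assigning a client-side UUID to each record for later linking."""
    reader = csv.DictReader(fileobj)
    batch = []
    for row in reader:
        batch.append({"id": str(uuid.uuid4()), **row})
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final partial batch

def upload(fileobj, insert_cars, insert_links, delete_cars,
           batch_size=BATCH_SIZE):
    """Insert each batch, then its runlist links; if linking fails,
    delete the just-inserted cars so no orphaned records remain."""
    for batch in stream_batches(fileobj, batch_size):
        insert_cars(batch)
        try:
            insert_links([r["id"] for r in batch])
        except Exception:
            delete_cars([r["id"] for r in batch])  # undo this batch
            raise
```

Only the current batch is ever rolled back, mirroring the "atomic-like" (per-batch, not per-file) guarantee described above.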
AI System: Natural language → DSL query builder with tenant-scoped execution, risk classification for write operations (low/high), approval token workflow for destructive actions, context document versioning with draft/publish, streaming chat via OpenRouter proxy, and full query/write telemetry.
Real-time Collaboration (Vault): TipTap editor with extensions (code blocks, tables, task lists, KaTeX math, Mermaid diagrams), Yjs CRDT for conflict-free concurrent editing, Supabase Realtime as transport layer, 2-second debounced auto-save, version history with restore.
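The 2-second debounced auto-save is a generic trailing-debounce pattern; a small Python sketch with `threading.Timer` (the real editor does this client-side in TypeScript):

```python
import threading

class Debouncer:
    """Coalesce a burst of calls into one trailing invocation after a quiet
    period — the same shape as the editor's 2-second debounced auto-save."""

    def __init__(self, delay, fn):
        self.delay = delay
        self.fn = fn
        self._timer = None
        self._lock = threading.Lock()

    def call(self, *args):
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()  # restart the quiet-period clock
            self._timer = threading.Timer(self.delay, self.fn, args)
            self._timer.start()
```

Each keystroke would call `call(doc_state)`; only the last state in a burst is persisted once typing pauses.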
Multi-Tenancy & Security: PostgreSQL Row-Level Security on all tables, tenant isolation via tenant_id scoping, role-based access control, tenant/dealership switching via cookies + API routes, custom feature preferences per tenant.
Database: 30+ PostgreSQL tables across inventory, auctions, reconditioning, workflow, AI, document vault, and demand insights domains — with audit trails (status history tables), JSONB fields, and Supabase Edge Functions.
Infrastructure: Docker Compose for multi-service deployment (frontend, landing, backend), multi-stage Go build with CGO for SQLite, MinIO object storage for the VIN database (~2GB, downloaded on container startup), Remotion for programmatic video generation.

Stack

Next.js 16 · React 19 · TypeScript · tRPC · TanStack Query · Zustand · Shadcn/ui · Tailwind CSS · Vite · TanStack Router · Hono · TipTap · Yjs · Go 1.25 · Gin · SQLite · Supabase · MinIO · Turborepo · Bun · Docker · Remotion 4.0

02

Grammario

A production-ready, full-stack linguistic analysis platform that helps language learners understand grammar through interactive visualizations and AI-powered explanations. Deployed and publicly accessible at grammario.ai.

Users enter a sentence in one of 5 supported languages (Italian, Spanish, German, Russian, Turkish) and receive deep grammatical analysis: tokenization, lemmatization, POS tagging, morphological analysis, and dependency parsing — all visualized as interactive syntax trees and linear dependency graphs. LLM-generated pedagogical insights explain why grammar works the way it does (e.g., "Why does this verb end in -a?"), providing rules, examples, and cultural nuance. Gamification layer (streaks, XP/levels, achievements, daily goals) and a spaced-repetition vocabulary system (SM-2 algorithm) to drive engagement.
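The SM-2 update rule behind the vocabulary system is public (from SuperMemo), so one review step can be sketched directly in Python:

```python
def sm2_review(quality, reps, interval, ease):
    """One SM-2 step. quality: 0-5 recall grade; reps: consecutive correct
    reviews; interval: days until next review; ease: easiness factor.
    Returns the updated (reps, interval, ease)."""
    if quality < 3:
        return 0, 1, ease  # lapse: restart the repetition sequence
    # easiness drifts with recall quality, floored at 1.3
    ease = max(1.3, ease + 0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02))
    reps += 1
    if reps == 1:
        interval = 1       # first successful review: see it again tomorrow
    elif reps == 2:
        interval = 6       # second: roughly a week out
    else:
        interval = round(interval * ease)  # then grow geometrically
    return reps, interval, ease
```

Starting from the canonical ease of 2.5, three perfect reviews schedule a card at 1, 6, then ~17 days out.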

Technical Highlights

Frontend: Next.js 16 (App Router, SSR, API routes), React 19, TypeScript, Tailwind CSS v4, ReactFlow for interactive node-graph visualizations, Dagre for automated tree layout, Zustand for client state with localStorage persistence, TanStack Query for server state and caching, Framer Motion for animations.
Backend: FastAPI (async Python 3.11), Stanford NLP (Stanza) pipelines for linguistic analysis with LRU-based model caching (up to 5 language models in memory with eviction), Pydantic v2 for schema validation.
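The bounded model cache with LRU eviction can be sketched with `collections.OrderedDict`; `loader` here is a stand-in for Stanza's pipeline constructor:

```python
from collections import OrderedDict

class ModelCache:
    """Keep up to `capacity` loaded models in memory; loading past capacity
    evicts the least recently used one — the pattern used to hold at most
    five Stanza language pipelines at a time."""

    def __init__(self, capacity=5, loader=None):
        self.capacity = capacity
        self.loader = loader
        self._models = OrderedDict()

    def get(self, lang):
        if lang in self._models:
            self._models.move_to_end(lang)  # mark as most recently used
            return self._models[lang]
        model = self.loader(lang)           # expensive load on a miss
        self._models[lang] = model
        if len(self._models) > self.capacity:
            self._models.popitem(last=False)  # evict the LRU entry
        return model
```

This keeps cold-start cost to one model load per language while bounding memory regardless of how many languages are requested.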
NLP Strategy Pattern: Language-family-specific processing — RomanceStrategy (clitics, multi-word token expansion), InflectionStrategy (case governance, verbal aspect), AgglutinativeStrategy (Turkish morpheme segmentation including vowel harmony, consonant softening, buffer consonants).
LLM Integration: Dual-provider setup (OpenRouter primary, OpenAI fallback) with response caching (100-entry manual cache), JSON-mode parsing, and language-specific prompt engineering for pedagogical output.
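The failover-plus-bounded-cache shape is simple to sketch in stdlib Python; `primary` and `fallback` below are hypothetical stand-ins for the OpenRouter and OpenAI calls:

```python
from collections import OrderedDict

class CachedLLM:
    """Try the primary provider, fall back on failure, and memoize
    responses in a FIFO cache capped at `max_entries` (100 in the
    deployed service)."""

    def __init__(self, primary, fallback, max_entries=100):
        self.primary = primary
        self.fallback = fallback
        self.max_entries = max_entries
        self._cache = OrderedDict()

    def complete(self, prompt):
        if prompt in self._cache:
            return self._cache[prompt]  # repeat prompt: no API call
        try:
            result = self.primary(prompt)
        except Exception:
            result = self.fallback(prompt)  # provider outage: fail over
        self._cache[prompt] = result
        if len(self._cache) > self.max_entries:
            self._cache.popitem(last=False)  # drop the oldest entry
        return result
```

Caching by prompt works here because grammar explanations for the same sentence are effectively deterministic content, so repeats cost nothing.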
Auth & Database: Supabase Auth (email/password + Google OAuth, PKCE flow), PostgreSQL with Row-Level Security policies for complete user data isolation, auto-triggers for profile creation and timestamp management.
Infrastructure: Dockerized (multi-stage production builds, non-root user), Docker Compose orchestration, Nginx reverse proxy with rate limiting (10 req/s), SSL via Let's Encrypt/Certbot, GitHub Actions CI/CD pipeline (test → build → push to GHCR → SSH deploy to DigitalOcean), frontend on Vercel.

Stack

Next.js 16 · React 19 · TypeScript · Tailwind CSS v4 · ReactFlow · Zustand · TanStack Query · Framer Motion · Radix UI · Axios · FastAPI · Python 3.11 · Stanza · OpenAI SDK · Pydantic v2 · Uvicorn · Supabase · JWT · Docker · Docker Compose · Nginx · Let's Encrypt · GitHub Actions · Vercel · DigitalOcean · Remotion

03

Global Terrorism Data Visualization Dashboard

A full-stack data visualization application that transforms 177,000+ records from the Global Terrorism Database (GTD, START consortium) into an interactive 3D globe interface with a comprehensive analytics dashboard.

Renders terrorism incidents (1970–2017) as geospatial points on an interactive 3D globe with hover tooltips showing location, attack type, and casualty details. Provides seven analytics views — Overview, Trends, Regions, Attack Types, Targets, Weapons, and Hotspots — each with detailed statistical breakdowns. Backend processes and serves the full 177K-record dataset through a REST API with data cleaning, NaN handling, type conversion, and performance-conscious sampling (5,000 points for smooth globe rendering).

Technical Highlights

Frontend: React, Globe.gl for WebGL-based 3D earth rendering with realistic textures and night-time lighting, Tailwind CSS, Axios for API communication, Vite as build tool.
Backend: FastAPI with Pandas for data manipulation and analysis across 177K+ records, Uvicorn ASGI server, CORS middleware, RESTful API design with 8 endpoints (health check, incidents with configurable limits, and 6 analytics endpoints).
Data Engineering: Built a data pipeline that cleans raw GTD data — handling missing coordinates, NaN imputation, type coercion for JSON serialization — and serves it through analytical aggregation endpoints (trends, regional breakdowns, hotspot ranking).
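Two of those steps — NaN handling for JSON and point sampling — can be sketched in plain Python (the real pipeline does this with Pandas; note that `json.dumps` would otherwise emit the non-standard token `NaN`):

```python
import math
import random

def sanitize(value):
    """Recursively replace NaN floats with None so the payload serializes
    to valid JSON; json.dumps emits the invalid token NaN by default."""
    if isinstance(value, float) and math.isnan(value):
        return None
    if isinstance(value, dict):
        return {k: sanitize(v) for k, v in value.items()}
    if isinstance(value, list):
        return [sanitize(v) for v in value]
    return value

def sample_points(points, cap=5000, seed=0):
    """Downsample to at most `cap` points for smooth globe rendering;
    a fixed seed keeps the sampled subset stable across requests."""
    if len(points) <= cap:
        return points
    return random.Random(seed).sample(points, cap)
```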
Performance: Implemented data sampling strategies to balance visualization density against rendering performance, lazy loading for on-demand data fetching, and backend caching.
Infrastructure: Dockerized with Docker Compose (backend + frontend + Nginx reverse proxy), single-image build option available.

Stack

React · Globe.gl (WebGL) · Tailwind CSS · Axios · Vite · FastAPI · Python · Pandas · Uvicorn · Docker · Docker Compose · Nginx

04

Teen Phone Addiction Prediction Dashboard

A full-stack machine learning project that predicts teen phone addiction levels from lifestyle and behavioral survey data, combining model training with experiment tracking, a real-time prediction API, and an interactive data exploration dashboard.

Trains a RandomForestRegressor to predict an Addiction_Level score (0–10 continuous) from survey features, tracked with a comprehensive metric suite (MSE, RMSE, R², MAE, MAPE, Max Error, Median AE). Serves predictions through a FastAPI /predict endpoint that automatically loads the latest registered model from MLflow. Provides a Streamlit dashboard for exploratory data analysis, feature importance visualization, and an interactive prediction playground where users can adjust inputs and see real-time model output.
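The core of that metric suite is standard; a stdlib Python sketch of four of the metrics (the project itself computes them via scikit-learn):

```python
import math

def regression_metrics(y_true, y_pred):
    """Compute MSE, RMSE, MAE, and R-squared for a continuous target,
    such as the 0-10 Addiction_Level score."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_t = sum(y_true) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    # R^2 = 1 - SS_res / SS_tot; undefined for a constant target
    r2 = 1.0 - (mse * n) / ss_tot if ss_tot else float("nan")
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae, "r2": r2}
```

A perfect prediction yields zero error and R² of 1; predicting the mean everywhere yields R² of 0, which is why R² is the headline metric alongside RMSE.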

Technical Highlights

ML Pipeline: scikit-learn RandomForestRegressor with MLflow experiment tracking — every training run logs parameters, metrics, and the serialized model artifact, enabling reproducible experiments and automatic model versioning.
Prediction API: FastAPI backend with Pydantic schema validation; dynamically loads the latest MLflow-registered model at startup so deployments always serve the most recent trained version without manual model path updates.
Dashboard: Streamlit app with data filtering/EDA, feature importance charts (Seaborn/Matplotlib), and a prediction playground that calls the FastAPI backend in real-time.
Infrastructure: Dockerized for reproducible deployment.

Stack

scikit-learn · MLflow · FastAPI · Pydantic · Uvicorn · Streamlit · Pandas · Seaborn · Matplotlib · Docker

05

Skin Cancer Classification CNN

A deep learning application for binary classification of skin lesions as benign or malignant, built on the HAM10000 dermatoscopy dataset (~10,000 images). Dual-framework implementations in both TensorFlow/Keras and PyTorch, MLflow experiment tracking, CoreML conversion for iOS/macOS deployment, and Docker containerization.

Trains convolutional neural networks to classify dermatoscopic skin lesion images into benign or malignant categories, mapping seven diagnostic types (akiec, bcc, mel, bkl, df, nv, vasc) into a binary label scheme. Provides multiple training scripts with CLI configuration for epoch count and optional CoreML model conversion for iOS/macOS deployment. Tracks experiments with MLflow, logging hyperparameters, per-epoch metrics (training accuracy, validation accuracy, loss), and model artifacts for reproducibility.

Technical Highlights

Data Pipeline: Ingests the HAM10000 dataset (~10,000 images) across two directories, joined with a CSV metadata file via Pandas merge on image_id. Addresses class imbalance through downsampling the majority class. Images resized (224×224 for PyTorch, 128×128 for TensorFlow), normalized using ImageNet statistics (mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]) or scaled to [0, 1]. Train/validation split at 80/20 ratio.
PyTorch CNN: Three convolutional blocks (Conv2d → ReLU → MaxPool2d) with progressively deeper filters (3→16→32→64 channels, 3×3 kernels, same padding), followed by a fully connected layer (64×28×28 → 256 units) with ReLU activation and a single sigmoid output neuron. Trained with Binary Cross-Entropy loss and Adam optimizer (lr=0.001).
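The 64×28×28 flatten dimension follows from simple shape arithmetic: same-padded 3×3 convolutions preserve the spatial size, and each 2×2 max-pool halves it. A quick check in plain Python:

```python
def fc_input_size(image_size=224, blocks=3, final_channels=64):
    """Trace the spatial size through `blocks` conv blocks: a same-padded
    conv keeps H and W, then MaxPool2d(kernel_size=2) halves them.
    Returns (flattened feature count, final spatial side length)."""
    side = image_size
    for _ in range(blocks):
        side //= 2  # one 2x2 max-pool per block
    return final_channels * side * side, side
```

For the 224×224 inputs above: 224 → 112 → 56 → 28, giving the 64×28×28 = 50,176 features that feed the 256-unit dense layer.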
TensorFlow/Keras CNN: Sequential model with three Conv2D + ReLU + MaxPooling2D blocks (32 filters, 3×3 kernels, valid padding), followed by Flatten, Dense(64) with ReLU, Dropout(0.5) for regularization, and a single sigmoid output unit. Includes ReduceLROnPlateau callback (factor=0.2, patience=5, min_lr=0.0001) for adaptive learning rate scheduling.
MLOps: MLflow experiment tracking under "Skin_Cancer_Detection" — logs hyperparameters, per-epoch metrics, and model artifacts. Model checkpointing saves the best model based on validation accuracy. Training progress logged to file via Python's logging module.
Deployment: CoreML conversion via coremltools for on-device inference on iOS 13+ and macOS — PyTorch path uses torch.jit.trace for model tracing, TensorFlow path converts directly from .h5 with RGB color layout and 1/255 scale normalization. GPU-accelerated training with automatic CUDA device detection and a CUDA version switcher utility. CLI interface via argparse with --epochs and --convert flags.
Model Artifacts: Multiple export formats — PyTorch state dictionaries (.pth), TensorFlow/Keras saved models (.h5), TensorFlow SavedModel directory, and CoreML packages (.mlpackage). Git LFS configured for large model files.

Stack

PyTorch · TensorFlow/Keras · scikit-learn · Pandas · NumPy · Pillow · MLflow · CoreML Tools · Docker · CUDA

06

procmon

Linux host telemetry and process-monitoring tool written in C++17 with ncurses. Real-time terminal dashboard powered directly by /proc, with process-level CPU and memory analytics, fast filtering, and suspicious-process tagging for defensive workflows.

Parses live telemetry from Linux kernel-exposed interfaces (/proc/stat, /proc/meminfo, /proc/<pid>/*) and renders a responsive low-level TUI with no heavy framework. Provides real-time host metrics (total CPU and memory utilization) alongside a full process table showing PID, state, CPU %, memory %, suspicious tag, and command. Interactive keyboard controls cover sorting (by CPU, memory, or PID) and live substring filtering on PID/command, all handled via immediate, non-blocking input.

Technical Highlights

Process Telemetry: Collects per-process data from /proc/<pid>/stat, status, cmdline, and comm. Computes CPU % normalized against elapsed process lifetime and logical CPU count, and memory % as RSS relative to host MemTotal.
Defensive Tagging: Heuristic-based suspicious process flags — TMP_EXEC for commands launched from temp/shared-memory paths, LOLBIN for living-off-the-land patterns, SPIKE for very high CPU consumers.
System Metrics: Enumerates numeric process directories under /proc, computes host CPU % using delta sampling of /proc/stat, and derives host memory % from /proc/meminfo.
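Delta sampling of /proc/stat is the standard way to get a host CPU percentage: take two snapshots of the aggregate jiffy counters and compute busy time over total time between them. A Python sketch of the arithmetic (the tool itself does this in C++):

```python
def parse_cpu_line(line):
    """Split the aggregate 'cpu' line of /proc/stat into its jiffy
    counters: user nice system idle iowait irq softirq steal ..."""
    return [int(x) for x in line.split()[1:]]

def cpu_percent(prev, curr):
    """Host CPU percentage between two samples: busy delta over total
    delta, where idle time is the idle + iowait fields (indices 3, 4)."""
    total = sum(curr) - sum(prev)
    idle = (curr[3] + curr[4]) - (prev[3] + prev[4])
    return 100.0 * (total - idle) / total if total else 0.0
```

Per-process CPU % works the same way, with the utime/stime fields of /proc/&lt;pid&gt;/stat in place of the aggregate counters.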
ncurses Dashboard: High-frequency terminal UI with interactive sort/filter operations and non-blocking keyboard input. Designed for low overhead and rapid scanning aligned with SOC/IR-style endpoint analysis.

Stack

C++17 · ncurses · Linux /proc · Make
END OF INDEX