# Reactiv-AI

Real-Time Emotion, Voice & Engagement Analysis – Entirely in the Browser

No backend. No API calls. No data leaves your device.

Neural network-powered face detection, emotion classification, voice analysis, and engagement scoring, running 100% client-side via WebGL-accelerated TensorFlow.js.
Features · Architecture · Quick Start · How It Works · Model Training · Project Structure · Tech Stack
## Features
| Capability | How It Works | Runs On |
|:---|:---|:---|
| Face Detection | MediaPipe FaceMesh (468 landmarks) via TensorFlow.js | WebGL 2 |
| Emotion Classification | Custom CNN trained on FER-2013 (7 emotions) | WebGL 2 |
| Voice Analysis | Web Audio API → RMS energy + speech detection | AudioContext |
| Engagement Scoring | Weighted fusion: emotion 40% + voice 30% + speech 30% | CPU |
| 100% Client-Side | Zero backend, zero network calls; your data never leaves the browser | – |
| Real-Time | requestAnimationFrame loop with temporal smoothing for stable output | – |
| Live Performance Metrics | FPS, inference latency, tensor count, memory usage | – |
### Detected Emotions

Angry · Disgust · Fear · Happy · Sad · Surprise · Neutral
## Architecture
```mermaid
flowchart LR
  subgraph B[Browser / Client-Side Only]
    subgraph V[Video Pipeline]
      Cam[Camera] --> FD[Face Detector]
      FD --> Crop[Face Crop and Preprocess]
      Crop --> EC[Emotion Classifier]
      EC --> TS[Temporal Smoother]
    end
    subgraph A[Audio Pipeline]
      Mic[Microphone] --> AE[Audio Engine]
      AE --> EE[Engagement Engine]
    end
    PM[Performance Monitor]
    UI[React UI]
    TS --> UI
    EE --> UI
    PM --> UI
  end
```
## Quick Start

```bash
# 1. Clone the repository
git clone https://github.com/your-username/TensorflowJS-ReactivAI.git
cd TensorflowJS-ReactivAI

# 2. Install dependencies
npm install

# 3. Start the dev server
npm run dev
```
Open http://localhost:4321/, grant camera and microphone access, and you're live.

If you've configured `base: '/TensorflowJS-ReactivAI/'` for GitHub Pages and the dev server shows a blank page, try http://localhost:4321/TensorflowJS-ReactivAI/ instead.
### Commands
| Command | Action |
|:---|:---|
| `npm install` | Install dependencies |
| `npm run dev` | Start dev server at localhost:4321 |
| `npm run build` | Production build → `./dist/` |
| `npm run preview` | Preview production build locally |
| `npm run test` | Run unit tests (Vitest) |
## How It Works

### 1. Face Detection – MediaPipe FaceMesh

The app uses MediaPipe FaceMesh through TensorFlow.js to detect faces and extract 468 facial landmarks in real time. Each video frame is snapshotted to an off-screen canvas and passed to the model, avoiding WebGL texture conflicts with React's DOM management.
```
Camera → HTMLVideoElement → Off-screen canvas snapshot → FaceMesh model → 468 keypoints + bounding box
```
- Bounding box computed from keypoint extremes with 10% padding
- Single-face mode for performance
- ~20–40 ms per detection on modern GPUs
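The padded bounding-box step above can be sketched in TypeScript. This is an illustrative reconstruction, not the actual code in `faceDetector.ts`; clamping to the frame bounds is an added assumption.

```typescript
interface Point { x: number; y: number; }
interface Box { x: number; y: number; width: number; height: number; }

// Compute a bounding box from FaceMesh keypoint extremes with 10% padding
// per side, clamped so the face crop never reads outside the video frame.
function boundingBoxFromKeypoints(
  keypoints: Point[],
  frameWidth: number,
  frameHeight: number,
  padRatio = 0.1,
): Box {
  const xs = keypoints.map((p) => p.x);
  const ys = keypoints.map((p) => p.y);
  const minX = Math.min(...xs), maxX = Math.max(...xs);
  const minY = Math.min(...ys), maxY = Math.max(...ys);
  const padX = (maxX - minX) * padRatio;
  const padY = (maxY - minY) * padRatio;
  const x = Math.max(0, minX - padX);
  const y = Math.max(0, minY - padY);
  return {
    x,
    y,
    width: Math.min(frameWidth, maxX + padX) - x,
    height: Math.min(frameHeight, maxY + padY) - y,
  };
}
```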
### 2. Emotion Classification – Custom CNN

A dedicated convolutional neural network classifies the detected face region into one of 7 emotions. The face is extracted, resized to 48×48 grayscale, normalized to [0, 1], and passed through the model.
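As a sketch of the preprocessing math (not the tensor-based code in `emotionClassifier.ts`): each RGB pixel is reduced to a single grayscale channel and scaled to [0, 1]. The Rec. 601 luminance weights below are an assumption about the exact conversion used.

```typescript
// Convert one pixel's 0–255 RGB channels to a normalized grayscale value
// in [0, 1], using Rec. 601 luminance weights (assumed, for illustration).
function toGrayNormalized(r: number, g: number, b: number): number {
  return (0.299 * r + 0.587 * g + 0.114 * b) / 255;
}

// Flatten ImageData-style RGBA bytes into the 48×48×1 input the CNN expects.
function preprocessFace(rgba: Uint8ClampedArray): Float32Array {
  const out = new Float32Array(rgba.length / 4);
  for (let i = 0; i < out.length; i++) {
    out[i] = toGrayNormalized(rgba[4 * i], rgba[4 * i + 1], rgba[4 * i + 2]);
  }
  return out;
}
```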
Model Architecture:

```
Input (48×48×1)
├─ Conv2D(32, 3×3) → BatchNorm → Conv2D(32, 3×3) → BatchNorm → MaxPool → Dropout(0.25)
├─ Conv2D(64, 3×3) → BatchNorm → Conv2D(64, 3×3) → BatchNorm → MaxPool → Dropout(0.25)
├─ Conv2D(128, 3×3) → BatchNorm → Conv2D(128, 3×3) → BatchNorm → MaxPool → Dropout(0.25)
├─ Flatten → Dense(256) → BatchNorm → Dropout(0.5)
├─ Dense(128) → BatchNorm → Dropout(0.3)
└─ Dense(7, softmax) → Probability distribution over 7 emotions
```
- 1.5M parameters / ~5.7 MB in browser
- Trained on FER-2013 (35,887 facial expression images)
- Augmented with flips, rotation, zoom, and translation
- Class-weighted loss to handle imbalanced classes (e.g., disgust has only 436 training samples)
- Temporal smoothing via exponential moving average eliminates jitter between frames
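The temporal smoothing mentioned above reduces to a short exponential moving average. A minimal sketch; the real `temporalSmoother.ts` may differ in how it handles the probability vector:

```typescript
// Exponential moving average over the 7-emotion probability vector.
// With alpha = 0.3, each frame contributes 30% and history contributes 70%.
class TemporalSmoother {
  private smoothed: number[] | null = null;

  constructor(private readonly alpha = 0.3) {}

  update(probs: number[]): number[] {
    if (this.smoothed === null) {
      this.smoothed = [...probs]; // first frame seeds the average
    } else {
      this.smoothed = this.smoothed.map(
        (prev, i) => this.alpha * probs[i] + (1 - this.alpha) * prev,
      );
    }
    return [...this.smoothed];
  }
}
```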
### 3. Voice Analysis – Web Audio API
Real-time audio analysis runs entirely in-browser using the Web Audio API:
```
Microphone → MediaStream → AudioContext → AnalyserNode → Float32 time-domain data → RMS energy
```
- RMS Energy: root mean square of the waveform; measures volume/intensity
- Speech Detection: threshold-based binary classification (speaking vs. silent)
- Speech Continuity: rolling 30-frame window tracking the ratio of speaking frames
- Echo cancellation, noise suppression, and auto-gain enabled at capture
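The RMS and speech-detection math above fits in a few lines. A sketch assuming a `Float32Array` buffer such as one filled by `AnalyserNode.getFloatTimeDomainData`; the 0.02 speaking threshold is an illustrative value, not necessarily what `audioEngine.ts` uses:

```typescript
// Root mean square of a time-domain buffer: a simple loudness measure.
function rmsEnergy(samples: Float32Array): number {
  let sumSquares = 0;
  for (let i = 0; i < samples.length; i++) {
    sumSquares += samples[i] * samples[i];
  }
  return Math.sqrt(sumSquares / samples.length);
}

// Threshold-based binary speech detection (threshold is an assumed value).
const SPEAKING_THRESHOLD = 0.02;
function isSpeaking(samples: Float32Array): boolean {
  return rmsEnergy(samples) > SPEAKING_THRESHOLD;
}
```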
### 4. Engagement Scoring – Multi-Signal Fusion

The engagement engine fuses all three signals into a 0–100 composite score:
$$\text{Score} = \Bigl(\underbrace{E_{\text{confidence}}}_{\text{Emotion}} \times 0.4 + \underbrace{V_{\text{energy}}}_{\text{Voice}} \times 0.3 + \underbrace{S_{\text{continuity}}}_{\text{Speech}} \times 0.3\Bigr) \times 100$$
| Component | Weight | Signal Source |
|:---|:---|:---|
| Emotion Confidence | 40% | Softmax confidence from CNN |
| Voice Energy | 30% | Normalized RMS from microphone |
| Speech Continuity | 30% | Rolling speaking ratio (30-frame window) |
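The weighted fusion in the table reduces to one expression. A sketch assuming all three inputs are already normalized to [0, 1]; how `engagementEngine.ts` normalizes the raw RMS is not shown here:

```typescript
// Fuse three [0, 1] signals into a 0–100 engagement score:
// emotion confidence 40%, voice energy 30%, speech continuity 30%.
function engagementScore(
  emotionConfidence: number,
  voiceEnergy: number,
  speechContinuity: number,
): number {
  const clamp01 = (v: number) => Math.min(1, Math.max(0, v));
  return (
    (clamp01(emotionConfidence) * 0.4 +
      clamp01(voiceEnergy) * 0.3 +
      clamp01(speechContinuity) * 0.3) * 100
  );
}
```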
### 5. Performance Monitoring
A built-in performance monitor tracks:
- FPS: frames processed per second
- Face Detection Latency: ms per face detection call
- Emotion Inference Latency: ms per CNN forward pass
- Tensor Count & Memory: active WebGL tensors and allocated bytes (leak detection)
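An FPS counter of the kind described can be sketched with a sliding one-second window of frame timestamps. Illustrative only; the actual `performanceMonitor.ts` also tracks latency and tensor/memory stats:

```typescript
// Rolling FPS: count frames whose timestamps fall within the last second.
class FpsCounter {
  private timestamps: number[] = [];

  // Call once per processed frame with performance.now()-style milliseconds;
  // returns the number of frames seen in the trailing one-second window.
  recordFrame(nowMs: number): number {
    this.timestamps.push(nowMs);
    const cutoff = nowMs - 1000;
    while (this.timestamps.length > 0 && this.timestamps[0] <= cutoff) {
      this.timestamps.shift(); // drop frames older than one second
    }
    return this.timestamps.length;
  }
}
```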
## Model Training

The emotion CNN is trained offline in Python and exported to TensorFlow.js format using a custom pure-Python converter (no `tensorflowjs` pip package required).

### Prerequisites

```bash
pip install -r scripts/requirements-python-training.txt
# → tensorflow==2.15.0, numpy==1.26.4, Pillow
```
### Train on FER-2013

```bash
# From an image directory (recommended: Kaggle's pre-split format)
python scripts/train-emotion-model.py \
  --data-dir /path/to/fer2013 \
  --epochs 50 \
  --batch-size 64

# From a CSV file
python scripts/train-emotion-model.py \
  --data /path/to/fer2013.csv \
  --epochs 50

# Quick pipeline test with synthetic data
python scripts/train-emotion-model.py --synthetic
```
The script automatically:

- Loads and normalizes images (48×48 grayscale, [0, 1])
- Computes per-class weights (capped at 3×) for imbalanced classes
- Trains with augmentation (flips, rotation, zoom, shift)
- Saves the Keras model to `artifacts/emotion_model.keras`
- Converts to TensorFlow.js at `public/models/emotion_model/`
### Custom Keras → TFJS Converter

The standard `tensorflowjs` Python package is notoriously difficult to install. We built a pure-Python converter that produces identical output:

```bash
python scripts/convert_keras_to_tfjs.py \
  --keras artifacts/emotion_model.keras \
  --out public/models/emotion_model
```
It handles:

- Keras 3 → TFJS-compatible topology stripping (`module`, `registered_name`, `build_config`)
- Weight name sanitization (removes `:0` suffixes)
- Single-shard binary packing (`group1-shard1of1.bin`)
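The weight-name sanitization is a simple string transform. A TypeScript sketch of the idea (the actual converter is Python, and the regex here is illustrative):

```typescript
// TensorFlow variable names carry an output-index suffix like ":0" that
// TFJS weight manifests omit; strip it if present.
function sanitizeWeightName(name: string): string {
  return name.replace(/:\d+$/, "");
}
```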
### A/B Testing Models

Switch models at runtime via a URL parameter:

```
http://localhost:4321/?model=emotion_model_py
```
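Such a switch can be read with `URLSearchParams`. A sketch of how the lookup might work; the function name and fallback behavior are assumptions, not the app's actual code:

```typescript
// Resolve which model directory to load, defaulting to the shipped model.
function resolveModelName(search: string, fallback = "emotion_model"): string {
  return new URLSearchParams(search).get("model") ?? fallback;
}
```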
## Project Structure

```
TensorflowJS-ReactivAI/
├── public/
│   └── models/
│       └── emotion_model/           # TFJS model files served as static assets
│           ├── model.json           # Model topology + weights manifest
│           └── group1-shard1of1.bin # Binary weight data (~5.7 MB)
│
├── src/
│   ├── pages/
│   │   ├── index.astro              # Main app page
│   │   └── debug.astro              # Face detection debugger
│   │
│   ├── components/
│   │   └── EmotionAnalyzer.tsx      # Main React component (camera, loop, UI)
│   │
│   ├── ml/                          # Machine learning modules
│   │   ├── tfSetup.ts               # WebGL 2 backend initialization
│   │   ├── faceDetector.ts          # MediaPipe FaceMesh wrapper
│   │   ├── emotionClassifier.ts     # CNN model loading + inference
│   │   └── temporalSmoother.ts      # EMA filter for stable predictions
│   │
│   ├── audio/
│   │   └── audioEngine.ts           # Web Audio API: RMS energy + speech detect
│   │
│   ├── scoring/
│   │   └── engagementEngine.ts      # Multi-signal engagement scoring
│   │
│   ├── monitoring/
│   │   └── performanceMonitor.ts    # FPS, latency, memory tracking
│   │
│   ├── utils/
│   │   ├── math.ts                  # Clamp, lerp, mean, std deviation
│   │   └── normalization.ts         # Pixel normalization, z-score, min-max
│   │
│   └── shims/                       # Browser shims for Node-only packages
│       ├── node-fetch.ts
│       └── whatwg-url.ts
│
├── scripts/                         # Offline training pipeline (Python)
│   ├── train-emotion-model.py       # Full training script (FER-2013 / synthetic)
│   ├── convert_keras_to_tfjs.py     # Pure-Python Keras → TFJS converter
│   ├── train-emotion-model.mjs      # Legacy JS training pipeline
│   ├── test_model.py                # Model diagnostic tests
│   ├── verify_conversion.py         # Bit-level weight verification
│   └── requirements-python-training.txt
│
├── types/                           # TypeScript declarations
├── astro.config.mjs                 # Astro config (static output, GitHub Pages)
├── tsconfig.json
├── vitest.config.ts
└── package.json
```
## Tech Stack

### Runtime (Browser)
| Technology | Role |
|:---|:---|
| TensorFlow.js 4.22 | Neural network inference (WebGL 2 GPU-accelerated) |
| MediaPipe FaceMesh | 468-point facial landmark detection |
| Web Audio API | Real-time microphone RMS energy & speech detection |
| WebRTC | Camera access via getUserMedia |
| React 19 | Reactive UI with real-time metric dashboards |
| Astro 5.17 | Static site framework (zero JS overhead for shell) |
| TypeScript 5.9 | Type safety across all modules |
### Training (Offline, Python)
| Technology | Role |
|:---|:---|
| TensorFlow / Keras 2.15 | CNN architecture, training, and augmentation |
| FER-2013 Dataset | 35,887 labeled facial expression images |
| Pillow | Fast image I/O (grayscale conversion, resizing) |
| Custom TFJS Converter | Pure-Python Keras → TFJS export (no `tensorflowjs` dep) |
| NumPy 1.26 | Numerical operations and data manipulation |
### Testing

| Technology | Role |
|:---|:---|
| Vitest 4 | Unit tests for math, normalization, scoring, smoothing |
## Key Design Decisions

### Why No Backend?

Privacy and portability. All neural network inference runs on the user's GPU via WebGL. Camera frames and microphone audio are processed locally; nothing is transmitted over the network. The entire app deploys as static files to GitHub Pages.
### Why a Custom TFJS Converter?

The official `tensorflowjs` Python package has heavy native dependencies and frequent installation failures. Our pure-Python converter (`convert_keras_to_tfjs.py`) produces bit-identical output using only `tensorflow` + `numpy`, verified by comparing all 22 weight tensors against the original Keras model.
### Why Temporal Smoothing?

Raw per-frame emotion predictions are noisy; a face might flicker between "Happy" and "Neutral" across consecutive frames. An exponential moving average (α = 0.3) stabilizes output while remaining responsive to genuine expression changes.
### Why Canvas Snapshot for Face Detection?

Passing an HTMLVideoElement directly to `estimateFaces()` fails when the video is rendered inside React's component tree: WebGL's `texImage2D` cannot reliably read pixels from a video whose layout is managed by CSS transforms and positioned containers. Snapshotting each frame to an off-screen canvas solves this reliably.
## License
This project is licensed under the Apache License 2.0. See LICENSE.
Built with neural networks, Web APIs, and zero backend dependencies.
All AI inference happens on your device. Your data stays yours.