Shaun Heffernan · calibration sheet, rev. 2026 · input: ideas · output: software

Shaun Heffernan

I'm a computer science student at the University of Florida building software that ships: machine learning systems, real-time pipelines, and full-stack apps.

the signal · stage 01 · acquisition
sec. 01

Selected work

Wake Word Detection via Transfer Learning

transfer learning · CNN · mel spectrograms · “Go Gators” · flagship

We had 360 labeled clips of “Go Gators” recorded against a variety of background noises, with the rest of the dataset hidden for testing. That is nowhere near enough to train a detector from scratch, so I tried transfer learning. Google Speech Commands, a dataset available through TensorFlow, contains many short command words, which makes it a close match for wake word detection. We trained a convolutional encoder on it, attached a classifier head, and fine-tuned the whole network on our recordings.

Each 3-second window becomes an 80-bin mel spectrogram. The mel scale approximates how humans perceive pitch by compressing high frequencies and giving more resolution to low ones, which suits speech. Three convolution blocks with filters doubling from 32 to 128 feed a dense layer of just 32 units, kept small so the final prediction rests on a few robust features instead of noise. That dense layer is surrounded by dropout layers at rate 0.5 because the training set is so small, and the final layer has a single sigmoid output: above 0.5 means wake word.
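To make the mel compression concrete, here is a minimal sketch of how the band edges of a mel filter bank are spaced. It assumes the common HTK-style mel formula and a 0 to 8 kHz range (the Nyquist limit at 16 kHz); the project's actual spectrogram layer and its parameters may differ, and `mel_band_edges` is a hypothetical helper, not a function from the project.

```python
import math

def hz_to_mel(f_hz):
    """HTK-style mel formula (an assumption; other variants exist)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_band_edges(n_mels, f_min_hz, f_max_hz):
    """Edge frequencies in Hz for n_mels triangular bands.

    The edges are evenly spaced on the mel axis, so they are unevenly
    spaced in Hz. n_mels bands need n_mels + 2 edge points.
    """
    mel_min, mel_max = hz_to_mel(f_min_hz), hz_to_mel(f_max_hz)
    step = (mel_max - mel_min) / (n_mels + 1)
    return [mel_to_hz(mel_min + i * step) for i in range(n_mels + 2)]

# 80 bins as in the text; the 0-8000 Hz range is assumed, not from the source.
edges = mel_band_edges(80, 0.0, 8000.0)
lowest_band_width = edges[2] - edges[0]     # Hz covered by the first band
highest_band_width = edges[-1] - edges[-3]  # Hz covered by the last band
print(f"lowest band:  {lowest_band_width:.1f} Hz wide")
print(f"highest band: {highest_band_width:.1f} Hz wide")
```

Under these assumptions the lowest band covers a few tens of hertz while the highest covers several hundred, which is the "compress the highs, expand the lows" behavior described above.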

The first problem I ran into was a sample-rate mismatch: the Google Speech Commands clips are 16 kHz, while our recorded data is 48 kHz. I originally upsampled the Google Speech Commands clips to 48 kHz, but the upsampling seemed to just add noise, and validation F1 stayed in the 0.60s. So I downsampled the wake word recordings to 16 kHz to match the Google dataset instead. Fine-tuning with the encoder weights unfrozen performed significantly better than keeping them frozen, reaching F1 0.899 on the held-out test set (Fig. 01b tells that story).
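As an illustration of the 48 kHz → 16 kHz step, here is a minimal pure-Python sketch of decimation by 3: low-pass filter first, then keep every third sample. The filter design (a 101-tap Hamming-windowed sinc with a 7 kHz cutoff) and the helper names are assumptions for this example; the project most likely used a library resampler, which this does not reproduce.

```python
import math

def lowpass_taps(cutoff_hz, sample_rate, num_taps=101):
    """Hamming-windowed sinc low-pass filter, normalized to unity gain at DC."""
    fc = cutoff_hz / sample_rate  # cutoff in cycles per sample
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]

def decimate(signal, factor, taps):
    """Apply the low-pass filter, keeping only every `factor`-th output sample."""
    half = len(taps) // 2
    out = []
    for center in range(0, len(signal), factor):
        acc = 0.0
        for k, tap in enumerate(taps):
            idx = center + k - half
            if 0 <= idx < len(signal):  # samples outside the clip count as zero
                acc += tap * signal[idx]
        out.append(acc)
    return out

def tone(freq_hz, sample_rate, seconds):
    """Unit-amplitude sine wave used as a test signal."""
    n = round(sample_rate * seconds)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]

def rms(samples):
    return math.sqrt(sum(s * s for s in samples) / len(samples))

SRC_RATE, DST_RATE = 48_000, 16_000
FACTOR = SRC_RATE // DST_RATE  # 3
taps = lowpass_taps(cutoff_hz=7_000, sample_rate=SRC_RATE)

# A 1 kHz tone sits inside the speech band; a 15 kHz tone is above the new
# 8 kHz Nyquist limit and must be removed before samples are dropped.
speech_band = decimate(tone(1_000, SRC_RATE, 0.1), FACTOR, taps)
above_nyquist = decimate(tone(15_000, SRC_RATE, 0.1), FACTOR, taps)

# Ignore the first and last 20 samples, where the filter runs off the clip.
print("1 kHz RMS after decimation: ", rms(speech_band[20:-20]))
print("15 kHz RMS after decimation:", rms(above_nyquist[20:-20]))
```

The filter is the important part: dropping samples without it would fold the 15 kHz tone down to 1 kHz, right on top of the speech band.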

Everything in Fig. 01 is the real network: the demo clips are real recordings with precomputed scores, and “Record 3 s” scores your own attempt with the exported ONNX model in your browser. Audio never leaves your device. Write-up in papers ↓.

Fig. 01 · wake-word detector · “Go Gators” · mode: precomputed


Real recordings, real inference: an encoder trained on Google Speech Commands with a classifier head, fine-tuned on “Go Gators”, scores each 3 s window. A sigmoid output above the red 0.5 threshold means wake word. That threshold can be tuned depending on whether false positives or false negatives are the costlier mistake. “Record 3 s” scores your own attempt with the exported ONNX network locally in your browser. Audio never leaves your device.
Fig. 01b · downsample, don't upsample · 48 kHz → 16 kHz
approach · result · outcome

The Google Speech Commands dataset is only available at 16 kHz and our recordings were 48 kHz, so one of them had to move. Upsampling the Google data to 48 kHz seemed to just add noise, and the model overfit. Instead we downsampled our training recordings from 48 kHz to 16 kHz. A 16 kHz signal only carries frequencies up to 8 kHz (the Nyquist limit), but that range still covers most of the energy in human speech, while 48 kHz audio is three times as dense for little gain. The gray spectrum shows there is almost nothing to the right of the red line to lose, and the mel spectrogram layer compresses those high frequencies anyway.

Measured spectrum of a real “Go Gators” training recording, not an illustration. Full write-up in the report (PDF).

AI Smart Glasses: Audio Subsystem

UF Real World Engineering · Parakeet STT · MLX · ReDimNet diarization · Python multiprocessing · ongoing

For dementia patients, the hardest part of a conversation is often the question “who am I talking to?” I'm on a UF Real World Engineering team building AI smart glasses that answer that in real time. The glasses stream camera frames and microphone audio over UDP to a Python backend where face recognition, voice activity detection, speech-to-text, and speaker diarization all run in parallel worker processes, and a coordinator fuses face and voice into one identity.

I built the core of the audio pipeline: 16 kHz audio arrives in 160 ms chunks and passes through Silero VAD, then Parakeet streaming transcription on Apple MLX, and each sentence gets one ReDimNet voice embedding, matched against known speakers by cosine similarity with a bias toward whoever spoke last. Benchmarked on the production path, transcription runs at a real-time factor (RTF) of 0.085, about 12× faster than real time. When an unknown voice lines up with a known face, the system registers it and learns the voice on its own.
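The matching step can be sketched roughly as follows. This is an illustrative stand-in, not the team's code: the function names, the size of the recency bonus (0.05), the acceptance threshold (0.5), and the toy 3-dimensional vectors are all assumptions, and real ReDimNet embeddings have far more dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def identify_speaker(embedding, known_voices, last_speaker=None,
                     recency_bonus=0.05, min_score=0.5):
    """Return (name, score) for the best match, or (None, score) if too weak.

    recency_bonus and min_score are hypothetical values chosen for this sketch.
    """
    best_name, best_score = None, float("-inf")
    for name, reference in known_voices.items():
        score = cosine_similarity(embedding, reference)
        if name == last_speaker:
            score += recency_bonus  # bias toward whoever spoke last
        if score > best_score:
            best_name, best_score = name, score
    if best_score < min_score:
        return None, best_score  # treat as an unknown voice
    return best_name, best_score

# Toy reference embeddings for two hypothetical enrolled speakers.
known_voices = {"alice": [1.0, 0.0, 0.0], "bob": [1.0, 0.45, 0.0]}
sentence_embedding = [1.0, 0.2, 0.0]

no_history = identify_speaker(sentence_embedding, known_voices)
bob_spoke_last = identify_speaker(sentence_embedding, known_voices, last_speaker="bob")
stranger = identify_speaker([0.0, 0.0, 1.0], known_voices, last_speaker="bob")
print(no_history[0], bob_spoke_last[0], stranger[0])
```

In this toy case the embedding is slightly closer to "alice", but the recency bonus tips an ambiguous sentence toward "bob" when he spoke last, while a clearly different voice still falls below the threshold and stays unknown.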

I also built the monitoring dashboard end to end: an aiohttp server that streams every worker's logs over a WebSocket into a live control panel with the identity database, per-person chat history, and a face lookup tool. On top of that, I wrote the LLM agent worker, which parses intent from speech, registers people when they introduce themselves, and logs every recognized sentence to a per-person history.

Fig. 02 is a real run of this pipeline, not a mockup, recorded with teammates. Team repo on GitHub ↗

Fig. 02 · live demo · face + voice ID · press play
The green box is the live face detector; its label flips from “Unknown (ID: 1)” to “Sean (ID: 1)” as the pipeline learns the new face and voice. The log below is the backend's verbatim output, printing in time with the video.

Mentor-Match: Stable Matching, Shipped

Gale–Shapley matching · MERN · Expo · full-stack

Pairing mentors with mentees is a two-sided problem: a good match needs both people to actually want it. Mentor-Match has mentors and mentees rank each other, then runs a Gale–Shapley-inspired matching round. Mentors propose, mentees hold on to their best offer, and a displaced mentor moves down their list until the whole pairing is stable: no mentor and mentee would both rather be paired with each other than with their assigned match.
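The round described above can be sketched with the textbook mentor-proposing Gale–Shapley procedure. This is a minimal sketch, not Mentor-Match's actual code (which is only described as Gale–Shapley inspired); it assumes equal numbers of mentors and mentees with complete preference lists, and all names are synthetic.

```python
def stable_match(mentor_prefs, mentee_prefs):
    """Mentor-proposing Gale-Shapley. Returns {mentor: mentee}."""
    # rank[mentee][mentor] = position of mentor in that mentee's list (0 = best)
    rank = {mentee: {mentor: i for i, mentor in enumerate(prefs)}
            for mentee, prefs in mentee_prefs.items()}
    next_choice = {mentor: 0 for mentor in mentor_prefs}  # next index to propose to
    held = {}                                             # mentee -> mentor held so far
    free = list(mentor_prefs)
    while free:
        mentor = free.pop()
        prefs = mentor_prefs[mentor]
        if next_choice[mentor] >= len(prefs):
            continue  # list exhausted; cannot happen with complete, equal-size lists
        mentee = prefs[next_choice[mentor]]
        next_choice[mentor] += 1
        current = held.get(mentee)
        if current is None:
            held[mentee] = mentor            # first offer is always held
        elif rank[mentee][mentor] < rank[mentee][current]:
            held[mentee] = mentor            # better offer: displace the old mentor
            free.append(current)
        else:
            free.append(mentor)              # rejected: try the next mentee later
    return {mentor: mentee for mentee, mentor in held.items()}

def is_stable(matching, mentor_prefs, mentee_prefs):
    """True if no mentor and mentee both prefer each other to their match."""
    mentor_of = {mentee: mentor for mentor, mentee in matching.items()}
    for mentor, prefs in mentor_prefs.items():
        for mentee in prefs:
            if mentee == matching[mentor]:
                break  # everyone further down is less preferred by this mentor
            their_list = mentee_prefs[mentee]
            if their_list.index(mentor) < their_list.index(mentor_of[mentee]):
                return False
    return True

mentor_prefs = {
    "Ana": ["Raj", "Sam", "Tia"],
    "Ben": ["Raj", "Tia", "Sam"],
    "Cy": ["Sam", "Raj", "Tia"],
}
mentee_prefs = {
    "Raj": ["Ben", "Ana", "Cy"],
    "Sam": ["Ana", "Cy", "Ben"],
    "Tia": ["Ana", "Ben", "Cy"],
}
matching = stable_match(mentor_prefs, mentee_prefs)
print(matching, is_stable(matching, mentor_prefs, mentee_prefs))
```

In this example Cy is held by Sam at first, then displaced when Ana proposes, and ends up with Tia after Raj also turns him down.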

It shipped as a full product: a MERN stack web app with an Expo mobile app on top. Against hand-curated pairings the matcher reached ~80% match accuracy.

Fig. 03 animates one matching round from the algorithm. Run it and watch a displacement happen live.

Fig. 03 · stable matching round · synthetic preferences
Mentors propose, mentees hold their best offer, and displaced mentors try again until the matching is stable. The animation plays one full round of the algorithm. Names and preferences here are synthetic.

BioClock: AIoT Smart Alarm

ESP32 · edge AI · motion + heart rate · Warren B. Nelms IoT Conference

The snooze button is a lie you tell yourself at 6 a.m. BioClock kills it: an ESP32 wearable whose on-device classifier reads motion and heart rate and only silences the alarm when it detects actual wakefulness.

Edge inference matters here: no cloud round-trip at the bedside, no raw biometrics leaving the device. Presented at the Warren B. Nelms IoT Conference. Drag the slider to feed the classifier.

Fig. 04 · wakefulness classifier · edge model, simulated


Motion + heart rate → 3-state on-device classifier. The alarm only stops when the model says awake, not when you hit snooze.

Predicting NYC Taxi Tips using Machine Learning

linear regression · LASSO λ = 0.01 · feature selection · NYC Open Data

Our dataset is the NYC 2023 taxi cab dataset. Each trip record captures pick-up and drop-off times, taxi zone locations, trip distances, itemized fares, rate types, and driver-reported passenger counts. We want to find which factors matter most in predicting tip amounts, to help taxi drivers maximize the tips they collect.

Upon inspection, the dataset records only credit card tips; cash tips are not logged at all. We can't train a model to predict a value we do not know, so the cash transactions had to be removed, roughly 20% of the nearly 10,000-trip dataset. I created a pre-tip total by summing all of the costs that make up the total amount before the tip, and dropped the total amount itself because it includes the tip and would leak the target.
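A minimal sketch of that cleaning step, under stated assumptions: the column names and the payment code (1 = credit card) follow the public TLC yellow-taxi schema as I understand it and may not match the report's exact columns, the list of charges summed into `pre_tip_total` is a guess at "all of the costs", and the two sample trips are made up.

```python
CREDIT_CARD = 1  # assumed payment_type code for card payments

# Assumed set of charges that make up the bill before the tip.
PRE_TIP_COLUMNS = (
    "fare_amount", "extra", "mta_tax", "tolls_amount",
    "improvement_surcharge", "congestion_surcharge", "airport_fee",
)

def prepare_trips(trips):
    """Keep card trips, add pre_tip_total, and drop the leaky total_amount."""
    prepared = []
    for trip in trips:
        if trip["payment_type"] != CREDIT_CARD:
            continue  # tips on non-card trips are not recorded, so the target is unknown
        row = {key: value for key, value in trip.items() if key != "total_amount"}
        row["pre_tip_total"] = sum(trip.get(col, 0.0) for col in PRE_TIP_COLUMNS)
        prepared.append(row)
    return prepared

# Two made-up trips: one paid by card, one paid in cash (code 2 assumed).
sample_trips = [
    {"payment_type": 1, "fare_amount": 20.0, "extra": 1.0, "mta_tax": 0.5,
     "tolls_amount": 0.0, "improvement_surcharge": 1.0,
     "congestion_surcharge": 2.5, "airport_fee": 0.0,
     "tip_amount": 5.0, "total_amount": 30.0},
    {"payment_type": 2, "fare_amount": 12.0, "extra": 0.0, "mta_tax": 0.5,
     "tolls_amount": 0.0, "improvement_surcharge": 1.0,
     "congestion_surcharge": 2.5, "airport_fee": 0.0,
     "tip_amount": 0.0, "total_amount": 16.0},
]
cleaned = prepare_trips(sample_trips)
print(cleaned)
```

Dropping `total_amount` matters because it equals the pre-tip total plus the tip, so a model given both could recover the target by subtraction.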

I trained linear regression with 10-fold cross-validation, and for the lasso a grid search over λ from 0.0001 to 1 chose λ = 0.01. The two models had very similar R², 0.6033 vs 0.6075, but lasso set the coefficients of 15 of 27 features to 0: ride duration, passenger count, and most of the pickup days and times. From a taxi driver's perspective, time of day isn't worth focusing on, despite the EDA suggesting otherwise. Drivers should look for long rides, airport rides, and routes that pass through tolls, because these are the best predictors of the tip amount.
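Why lasso produces exact zeros, where plain linear regression only produces small coefficients, comes down to its soft-thresholding update. The sketch below is a toy coordinate-descent lasso minimizing (1/2n)·‖y − Xw‖² + λ·‖w‖₁ with no intercept; the data, λ = 0.5, and function names are made up for illustration, and the report's actual fit presumably used a library implementation.

```python
def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; anything within [-lam, lam] becomes exactly 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0

def lasso_coordinate_descent(X, y, lam, n_iters=50):
    """Minimize (1/2n)*||y - Xw||^2 + lam*||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iters):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the residual that ignores j
            z = 0.0    # squared norm of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, lam) / (z / n)
    return w

# Toy data: the target depends strongly on feature 0 and only weakly on feature 1.
X = [[1, 1], [2, -1], [3, 1], [4, -1], [-1, 1], [-2, -1], [-3, 1], [-4, -1]]
y = [2 * a + 0.1 * b for a, b in X]

w_plain = lasso_coordinate_descent(X, y, lam=0.0)  # reduces to least squares
w_lasso = lasso_coordinate_descent(X, y, lam=0.5)
print("no penalty:", w_plain)
print("lasso:     ", w_lasso)
```

Without the penalty the weak feature keeps a small coefficient; with it, the weak feature's coefficient is set to exactly zero while the strong one is only shrunk slightly, which is the pruning behavior described above.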

Fig. 05 plots the real fitted coefficients from the report. Toggle LASSO and watch it prune. Write-up in papers ↓.

Fig. 05 · what predicts a taxi tip · NYC 2023 · real coefficients

model · cv R² (95% CI) · test R² · RMSE
Real fitted coefficients and metrics from the report, not a simulation. Toggle LASSO and watch it zero 15 of 27 features. Cash trips were removed before training because cash tips are never logged. RMSE is shown with its share of the mean tip.

appendix · also built

Appendix A · hackathon, 2nd place

PermitPal: LLM-Assisted Access Requests

Engineers ask for access in plain English and security teams answer in RBAC. PermitPal turns a request like “deploy billing to staging” into a least-privilege permission bundle, with the over-broad grants explicitly refused. The unusual part is the router: requests containing PII stay on a local Ollama model and never leave the building, while everything else takes the OpenAI API hot path for a ~10× latency cut.

Ollama · OpenAI API · LLM routing · RBAC
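The routing decision can be sketched as a simple check in front of the model call. This is a hypothetical illustration only: the regex patterns, the `route_request` function, and the backend labels are assumptions, not PermitPal's actual detector, and no model APIs are called here.

```python
import re

# Hypothetical PII patterns for illustration; a real detector would be broader.
PII_PATTERNS = [
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),         # email address
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),            # US SSN-style number
    re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),  # US phone number
]

def contains_pii(text):
    """True if any of the assumed PII patterns appears in the request."""
    return any(pattern.search(text) for pattern in PII_PATTERNS)

def route_request(text):
    """Pick a backend label: PII stays local, everything else takes the hosted path."""
    return "local_ollama" if contains_pii(text) else "openai_api"

requests = [
    "deploy billing to staging",
    "grant jane.doe@example.com read access to payroll",
]
routes = [route_request(text) for text in requests]
print(routes)
```

The design choice is that the check fails toward privacy: a request only leaves for the hosted API when nothing matched.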

Appendix B · course project

Chest X-Ray Disease Classification

A convolutional neural network trained on ChestMNIST: 112,120 X-ray scans labeled with 14 diseases, where one scan can carry several at once. The real lesson was class imbalance. Binary accuracy hit 0.9475 on the held-out test set, but mostly because predicting “no disease” is usually right; the AUC of 0.8235 is what shows the model actually learned features. Full write-up in papers ↓.

TensorFlow · CNN · ChestMNIST · Adam
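The accuracy trap is easy to reproduce on synthetic labels. The sketch below uses a single made-up binary label with 5% positives (not ChestMNIST's real class rates, and not the 14-label setup) and a "model" that never predicts disease; the helper names are illustrative.

```python
def accuracy(labels, predictions):
    """Fraction of predictions that equal the label."""
    return sum(l == p for l, p in zip(labels, predictions)) / len(labels)

def recall(labels, predictions):
    """Fraction of true positives that were actually found."""
    positives = sum(labels)
    hits = sum(1 for l, p in zip(labels, predictions) if l == 1 and p == 1)
    return hits / positives if positives else 0.0

def auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Synthetic imbalance: 50 diseased scans out of 1000.
labels = [1] * 50 + [0] * 950
always_negative = [0] * len(labels)   # hard predictions: "no disease" every time
constant_scores = [0.0] * len(labels)  # the same model expressed as scores

baseline_accuracy = accuracy(labels, always_negative)
baseline_recall = recall(labels, always_negative)
baseline_auc = auc(labels, constant_scores)
print(baseline_accuracy, baseline_recall, baseline_auc)
```

A model that has learned nothing still scores high accuracy here while finding no positives and sitting at chance-level AUC, which is why an AUC well above 0.5 is the more convincing number.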

the signal · stage 02 · regression analysis
sec. 02

Experience

Prudential Financial
SWE Intern · 2025

Automation & analytics at scale

Built internal automation and analytics dashboards operating over 480k+ records, turning manual reporting workflows into software the team actually uses.

UF Real World Engineering
Software Engineer · ongoing

Audio subsystem lead · AI smart glasses

Leading the real-time audio pipeline (project 02 above): streaming, diarization, transcription, and cross-modal identity for a device that helps dementia patients recognize the people around them.

University of Florida
B.S. Computer Science · May 2028

GPA 3.97

Coursework and research at the intersection of machine learning and signal processing.

the signal · stage 03 · neural attention
sec. 03

Papers

Paper · EEE4773 Final Project · UF · 2026

Wake Word Detection via Transfer Learning

L. Burchill, S. Heffernan, Y. Gelli

Abstract

This report is a culmination of experiments carried out to train a model to detect the presence of the wake words “Go Gators” in audio samples with a variety of background noises. There are 450 samples in total, with hidden, easy samples reserved for testing. Of the 450 total samples, 360 were available for training. Each researcher conducted their own experiments in pursuit of the final model, and their findings are covered in the experiments section. The implementation section focuses on the development of the final model submitted for testing.

Read paper (PDF) ↗

Paper · ML Course Project · UF · 2026

Predicting NYC Taxi Tips using Machine Learning

S. Heffernan

Abstract

This document shows an analysis of the NYC 2023 Taxi Cab data set. It includes exploratory data analysis and model creation and performance evaluation.

Read paper (PDF) ↗

Paper · Course Project · UF · 2026

Classifying Diseases from Chest X-Ray Scans using Convolutional Neural Networks

S. Heffernan

Abstract

This document shows an analysis of chest lung X-ray data. The goal is to train a convolutional neural network to classify specific diseases.

Read paper (PDF) ↗

Paper · Signal Processing · UF · 2026

Satellite Image Classification with Dimensionality Reduction: Detecting Cargo Ships

S. Heffernan

Abstract

We apply dimensionality reduction techniques to classify ship presence in 80×80 satellite images from San Francisco Bay. Principal Component Analysis (PCA) reduces the 19,200-dimensional RGB feature space to 103 components while retaining 90% variance. Comparison of random forest and logistic regression on raw features, PCA-reduced features, and manifold learning embeddings (LLE, ISOMAP) shows that random forest achieves best performance (95.4% accuracy, 0.90 F1) without dimensionality reduction, though PCA provides comparable results with significantly reduced computational complexity.

Read paper (PDF) ↗
the signal · stage 04 · documented
sec. 04

About

I'm Shaun, a computer science student at the University of Florida. I've trained a wake word model and shipped it running live in the browser, I lead the real-time audio pipeline on a team building AI smart glasses, and I've built full-stack apps from the database to the mobile front end.

The thread through everything I build is taking an idea from a paper or a class and making it run for real: in the browser, on a server, on hardware. Research-grade thinking, shipped as working software.

If you're building something interesting, I'd love to talk.

the signal · stage 05 · resolved. say hello.