Research Projects
This page brings together my current research and technical projects in one place. Each entry can be expanded inline, and longer projects stay inside a scrollable panel so you can browse quickly without losing your place.
PDEBench-Lang: Representation Effects in Neural Symbolic PDE Reasoning
This project studies whether the representation format of a partial differential equation changes how well a language model can reason about it. Instead of treating PDEs only as numerical objects, the project frames them as structured symbolic language and asks whether formats such as Postfix, LaTeX, Prefix, and natural language lead to different reasoning behavior.
The benchmark, PDEBench-Lang, is built around five canonical PDE families: Heat, Wave, Burgers, Laplace, and Advection. For each generated equation instance, the system converts the same PDE into four symbolic dialects and trains sequence-to-sequence models to predict:
- a structured reasoning chain
- the PDE family label
- a pruned symbolic operator subset for downstream solving
The core research question is whether representation choice affects symbolic pruning quality, family classification accuracy, and reasoning fidelity. To evaluate this, the project introduces a metric called Trash Score, which measures how often a model gives the correct family label while relying on structurally incorrect reasoning.
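One plausible reading of the Trash Score can be sketched in a few lines: among predictions whose family label is correct, the fraction whose reasoning chain is structurally wrong. The record fields below are hypothetical, not the project's actual schema.

```python
# Minimal Trash Score sketch; "family_correct" / "reasoning_correct"
# are assumed field names, not the project's real output format.
def trash_score(records):
    correct_label = [r for r in records if r["family_correct"]]
    if not correct_label:
        return 0.0
    trash = sum(1 for r in correct_label if not r["reasoning_correct"])
    return trash / len(correct_label)

preds = [
    {"family_correct": True,  "reasoning_correct": True},
    {"family_correct": True,  "reasoning_correct": False},
    {"family_correct": False, "reasoning_correct": False},
    {"family_correct": True,  "reasoning_correct": False},
]
score = trash_score(preds)  # 2 of the 3 correct labels rest on bad reasoning
```

A high Trash Score flags models that get the label right for the wrong structural reasons.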
An example PDE such as
[u_t = 0.5\,u_{xx}]
is represented in four different forms:
- Postfix: u t d 0.5 u x x d d * =
- Raw LaTeX: u_{t}=0.5\,u_{xx}
- Prefix: = d(u,t) * (0.5, d(d(u,x),x))
- Natural language: “The time derivative of u equals one-half the second spatial derivative of u.”
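The prefix dialect above can be read straight off an expression tree. A minimal illustrative sketch follows; the nested-tuple encoding and function name are hypothetical, not the project's actual converter, and the other dialects would use their own grammars.

```python
# Hypothetical tree encoding of u_t = 0.5 * u_xx:
# each node is (operator, arg1, arg2, ...), leaves are strings.
HEAT = ("=", ("d", "u", "t"), ("*", "0.5", ("d", ("d", "u", "x"), "x")))

def to_prefix(node):
    # emit the operator first, then each argument recursively
    if isinstance(node, str):
        return [node]
    op, *args = node
    tokens = [op]
    for arg in args:
        tokens += to_prefix(arg)
    return tokens

print(" ".join(to_prefix(HEAT)))  # = d u t * 0.5 d d u x x
```

Token for token this matches the Prefix dialect shown above, which is the sense in which the four formats encode the same symbolic structure.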
The modeling pipeline fine-tunes encoder-decoder language models to map a PDE representation into structured outputs that describe the underlying dynamics. Preliminary experiments with T5-small showed near-perfect scores on the initial natural-language dataset, which led to an important finding: the synthetic benchmark was still too templated and too easy. That result motivated the next research direction of adding richer phrasing variation, greater structural diversity, and more ambiguous cross-family cases.
My contribution to this team project focused on cross-dialect evaluation and benchmarking, helping compare representation formats and analyze how closely model reasoning aligned with the true symbolic structure of the equation.
Main components include:
- synthetic dataset generation across multiple PDE families
- conversion of each PDE into four symbolic dialects
- sequence-to-sequence fine-tuning for reasoning and operator prediction
- benchmarking of pruning quality, label accuracy, and reasoning fidelity
- analysis of representation alignment between symbolic format and LLM pretraining
GitHub repository:
https://github.com/RaghavKrishn/Nlp-group-final-project
Weather-Aware Travel Itinerary Optimization
This project develops a weather-aware itinerary optimization framework for tourist trip planning under real-world constraints. The work began as a single-day route optimizer for urban attractions and was later extended into a multi-day Gurobi model with hotel selection, traveler profiles, and overnight routing decisions.
Using Yelp business and review data, Open-Meteo weather signals, and geographic travel-time estimates, the system scores attractions by quality and popularity, predicts congestion-aware waiting time, and then solves for feasible routes that balance utility against travel and crowding costs.
The original attraction utility model is based on:
[U_i = \text{rating}_i \cdot \log(\text{review count}_i)]
and the waiting-time component is modeled as a weather- and calendar-dependent function (W_i(t)). In the single-day formulation, the itinerary is chosen with an orienteering-style objective:
[\max \sum_i U_i x_i
- \alpha \sum_i W_i(t_i)x_i
- \beta \sum_{i,j} d_{ij} y_{ij},]
subject to time-budget and route-feasibility constraints.
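The utility term in the objective above is easy to sketch directly. The ratings and review counts below are illustrative values, not entries from the Yelp dataset.

```python
import math

# U_i = rating_i * log(review_count_i): the log tempers the influence
# of very large review counts so quality still matters.
def attraction_utility(rating, review_count):
    return rating * math.log(review_count)

# A 4.0-star spot with 1000 reviews outscores a 5.0-star spot with 20:
print(attraction_utility(4.0, 1000) > attraction_utility(5.0, 20))  # True
```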
The later multi-day extension upgrades the problem into a TTDP/OPHS-style formulation. Attractions use normalized utility and waiting terms, while hotels become decision nodes across multiple outings:
[\tilde{u}_i = \frac{u_i - \min_j u_j}{\max_j u_j - \min_j u_j}, \qquad \tilde{w}_i = \frac{w_i - \min_j w_j}{\max_j w_j - \min_j w_j}.]
For each traveler profile, the Gurobi model selects attractions, orders visits across days, chooses hotels, and penalizes overnight switching:
[\max \; \sum_{d=1}^{K}\sum_{i \in A} \alpha_p \tilde{u}_i x_{id}
- \sum_{d=1}^{K}\sum_{i \in A} \beta_p \tilde{w}_i x_{id}
- \sum_{d=1}^{K}\sum_{h \in H} \eta_p r_h y_{hd}
- \sum_{d=1}^{K-1}\sum_{h,g \in H} \lambda_p \tau^{H}_{hg} z_{hgd}.]
This extension produces a richer itinerary system with relaxed, balanced, and explorer profiles, multi-day outing layers, and hotel-aware route visualization.
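The min-max normalization applied to the multi-day utility and waiting terms is a one-liner; a minimal sketch (assumes at least two distinct values, so the denominator is nonzero):

```python
# Min-max normalization, as in the formula for u~_i and w~_i above.
def minmax(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

print(minmax([2.0, 4.0, 6.0]))  # [0.0, 0.5, 1.0]
```

Normalizing both terms onto [0, 1] keeps the profile weights α_p and β_p comparable across attractions with very different raw scales.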
Key Features
- Utility modeling using Yelp business ratings and review counts
- Weather-aware congestion estimation
- Construction of a travel-time matrix using geographic coordinates
- Single-day route optimization for feasible sightseeing plans
- Multi-day Gurobi optimization with hotel selection and switching penalties
- Integration of machine learning predictions and operations research
- Interactive visualization of optimized tourist routes for both versions
Original Single-Day Interactive Map
The original prototype solves a single-day sightseeing problem and renders the optimized route in the HTML map below.
Multi-Day Gurobi Extension
I later extended the original system into a multi-day itinerary optimizer that supports hotel selection, overnight routing, and traveler-profile comparisons. That extension is saved as tourist_routes_map_gurobi.html in the images folder and is embedded below.
Technologies Used
- Python
- Optimization: Integer Programming, mixed-integer optimization, Gurobi
- Machine Learning: XGBoost and predictive congestion modeling
- Data Sources: Yelp Open Dataset, Open-Meteo Weather API, OpenStreetMap
- Visualization: Folium / Leaflet interactive maps
GitHub Repository
Analyzing Representation Transfer and Attention in Facial Expression Recognition
This project studies how representation transfer changes both accuracy and attention behavior in facial expression recognition (FER). By comparing CNNs trained from scratch, transfer-learning pipelines, and Vision Transformers on FER2013, the work asks not only which model performs best, but also which facial regions each model actually relies on when predicting emotion.
Beyond standard FER benchmarking, the project introduces a quantitative attention analysis framework and an attention-guided training objective to measure and improve interpretability. The main goal is to connect recognition performance with semantically meaningful visual evidence rather than treating explanation quality as an afterthought.
🔗 Project Repository:
https://github.com/RohitPoduval1/csci5527-project
Problem Motivation
Facial expression recognition models often achieve strong classification performance, but it remains unclear:
- Which facial regions models rely on
- How pretrained representations influence attention
- Whether models attend to semantically meaningful facial features
This project studies the relationship between:
- representation transfer
- model architecture
- attention behavior
- recognition performance
Key questions:
- Does pretraining improve emotion-specific representations?
- Do transformers attend more globally than CNNs?
- Can attention regularization improve interpretability?
Dataset
We use the FER2013 dataset, a widely used benchmark for facial expression recognition.
Dataset Statistics
- 35,887 images
- 7 emotion classes
Emotion categories:
- Angry
- Disgust
- Fear
- Happy
- Sad
- Surprise
- Neutral
Image Characteristics
- grayscale facial images
- resolution: 48 × 48
- noisy labels
- large variations in lighting, occlusion, and facial pose
Data Preprocessing
Image Processing
Input images are converted and resized to match pretrained model requirements.
48x48 grayscale
→ convert to RGB
→ resize to 224x224
Normalization
Images are normalized using ImageNet statistics:
[x' = \frac{x - \mu}{\sigma}]
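The full grayscale-to-RGB-to-224 pipeline above can be sketched in plain NumPy. A real pipeline would use torchvision transforms; the nearest-neighbour resize here is a stand-in for a proper interpolating resize.

```python
import numpy as np

# Standard ImageNet channel statistics (for inputs scaled to [0, 1]).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406])
IMAGENET_STD = np.array([0.229, 0.224, 0.225])

def preprocess(gray48):
    # gray48: (48, 48) array with values in [0, 1]
    rgb = np.repeat(gray48[..., None], 3, axis=-1)  # replicate gray channel
    idx = np.arange(224) * 48 // 224                # nearest-neighbour indices
    big = rgb[idx][:, idx]                          # 48x48 -> 224x224
    return (big - IMAGENET_MEAN) / IMAGENET_STD     # x' = (x - mu) / sigma

out = preprocess(np.zeros((48, 48)))
print(out.shape)  # (224, 224, 3)
```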
Data Augmentation
To improve generalization:
- random horizontal flip
- random rotation
- random crop
- color jitter
- Gaussian noise
Optional robustness techniques:
- label smoothing
- MixUp augmentation
Facial Landmark Detection
To analyze attention behavior, facial landmarks are detected using:
- MediaPipe
- dlib
The face is segmented into semantic regions:
- eyes
- eyebrows
- mouth
- nose
- face contour
- background
Binary masks are generated for each region, enabling quantitative measurement of attention distributions.
Model Architectures
We train four model types to study the impact of representation transfer.
Model 1 — CNN Baseline
A simple CNN trained from scratch.
Architecture:
Conv → ReLU → MaxPool
Conv → ReLU → MaxPool
Conv → ReLU → MaxPool
Fully Connected
Softmax
Purpose:
- establish baseline performance
- demonstrate limitations of training from scratch
Expected accuracy:
60–65%
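The baseline stack above might look as follows in PyTorch. The channel widths and the use of nn.Sequential are illustrative assumptions; softmax is folded into the cross-entropy loss, as is idiomatic.

```python
import torch
import torch.nn as nn

# Three Conv -> ReLU -> MaxPool stages over 48x48 grayscale input,
# then a fully connected head over 7 emotion classes.
baseline = nn.Sequential(
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),    # 48 -> 24
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 24 -> 12
    nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 12 -> 6
    nn.Flatten(),
    nn.Linear(128 * 6 * 6, 7),  # logits; softmax applied inside the loss
)

logits = baseline(torch.zeros(1, 1, 48, 48))
print(logits.shape)  # torch.Size([1, 7])
```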
Model 2 — CNN with ImageNet Transfer
We evaluate transfer learning using pretrained CNN backbones.
Example architectures:
- ResNet50
- EfficientNet
Training procedure:
- load pretrained network
- replace final classification layer (FC → 7 emotion classes)
- train classifier head
- fine-tune upper layers
Expected accuracy:
~70%
Model 3 — VGGFace Transfer
We test domain-specific transfer learning using models pretrained on face recognition.
Example backbone:
- VGGFace
Hypothesis:
Face-recognition pretraining may suppress expression features because it focuses on identity rather than emotion.
Model 4 — Vision Transformer
We compare CNNs with transformer-based vision models.
Example model:
timm.create_model("vit_base_patch16_224", pretrained=True)
Training Setup
Training configuration:
optimizer: Adam
batch size: 64
epochs: 30
learning rate: 1e-4
Classification loss:
[L = - \sum_i y_i \log(p_i)]
Regularization techniques:
- dropout
- weight decay
- label smoothing
Explainability Analysis
To understand model attention behavior, we apply explainability methods.
CNN Models
- Grad-CAM
- Guided Grad-CAM
These methods produce spatial heatmaps highlighting image regions influencing predictions.
Transformer Models
For Vision Transformers we analyze:
- attention rollout
- self-attention maps
Self-attention is computed as:
[\operatorname{Attention}(Q, K, V) = \operatorname{softmax}\left(\frac{QK^{T}}{\sqrt{d}}\right)V]
These visualizations allow direct comparison between CNN and transformer attention behavior.
Quantitative Attention Analysis (Novel Component)
Instead of relying solely on visual heatmaps, we introduce a quantitative attention metric.
For each facial region:
[\text{Attention}_{\text{region}} = \frac{\sum_{\text{pixels}} \text{Heatmap} \times \text{Mask}}{\sum \text{Heatmap}}]
Regions analyzed:
- eyes
- mouth
- eyebrows
- background
Example comparison:
| Model | Mouth | Eyes | Background |
|---|---|---|---|
| CNN baseline | 28% | 19% | 41% |
| CNN ImageNet | 35% | 27% | 25% |
| VGGFace | 12% | 40% | 26% |
| ViT | 30% | 30% | 15% |
This provides objective evaluation of model interpretability.
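The region metric above reduces to a masked sum over the heatmap. A minimal NumPy sketch; the toy mask below is a hypothetical mouth region, not output from the actual landmark pipeline.

```python
import numpy as np

# Share of total heatmap mass falling inside a binary region mask.
# heatmap and mask must have the same spatial shape.
def region_attention(heatmap, mask):
    return float((heatmap * mask).sum() / heatmap.sum())

heat = np.ones((4, 4))                 # uniform attention, for illustration
mouth = np.zeros((4, 4))
mouth[2:, 1:3] = 1                     # hypothetical 4-pixel mouth region
print(region_attention(heat, mouth))   # 0.25
```

Summing the metric over all region masks (eyes, mouth, eyebrows, background) yields the per-model attention distributions reported in the table.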
Attention-Guided Training (Novel Component)
We further propose an attention regularization loss encouraging models to focus on relevant facial regions.
Modified training objective:
[L = L_{cls} + \lambda L_{attention}]
Where
[L_{attention} = \sum_{background} Heatmap]
Purpose:
- penalize background attention
- encourage focus on meaningful facial features
This improves both interpretability and robustness.
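The combined objective can be sketched directly from the two formulas above; the λ value and toy arrays are illustrative assumptions.

```python
import numpy as np

# L = L_cls + lambda * L_attention, where L_attention sums heatmap
# mass over background pixels (penalizing off-face attention).
def attention_guided_loss(cls_loss, heatmap, background_mask, lam=0.1):
    l_attention = float((heatmap * background_mask).sum())
    return cls_loss + lam * l_attention

heat = np.full((2, 2), 0.25)
bg = np.array([[1, 0], [0, 0]])                 # one background pixel
total = attention_guided_loss(1.0, heat, bg)    # 1.0 + 0.1 * 0.25
```

In training, the heatmap would come from a differentiable attention map (e.g. Grad-CAM activations) so the penalty can propagate gradients.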
Optional Experiment — Multi-Scale Input
We also evaluate whether multi-scale inputs improve FER performance.
Architecture:
Face crop
+
Whole image
→ concatenated feature representation
→ classifier
Goal:
Capture both:
- local facial expression cues
- global facial context
Evaluation Metrics
Classification Metrics
- Accuracy
- Precision
- Recall
- F1 Score
- Confusion Matrix
Interpretability Metrics
- attention distribution across facial regions
- background attention ratio
Experimental Analysis
Domain Transfer
Key question:
Does face-recognition pretraining help or hinder emotion recognition?
Expected observations:
- VGGFace emphasizes identity-related features
- ImageNet pretrained models generalize better for emotion recognition
Architecture Differences
CNNs
- strong local feature extraction
Transformers
- global attention modeling
- holistic understanding of facial expressions
Attention-Guided Training
Hypothesis:
Encouraging attention on facial regions improves
- robustness
- interpretability
- classification performance
Results Summary
The experiments demonstrate:
- pretrained models significantly improve FER accuracy
- transformer architectures exhibit more global attention patterns
- quantitative attention metrics reveal meaningful differences between models
- attention-guided training improves interpretability and robustness
Key Contributions
- Quantitative attention analysis framework for FER models
- Systematic comparison of CNN, transfer learning, and transformers
- Attention-guided training objective improving interpretability
- Experimental analysis of representation transfer effects in emotion recognition
Skills & Technologies
- PyTorch
- Computer Vision
- Transfer Learning
- Vision Transformers
- Explainable AI (Grad-CAM, Attention Maps)
- Facial Landmark Detection
- Deep Learning Experiment Design
Repository
Full implementation and experiments are available here:
Matrix-Vector Trace Estimation with Hutch++
This project implements randomized trace estimation algorithms for large matrices using the matrix–vector query model.
In many large-scale problems, a matrix is too large to compute or store explicitly, but matrix–vector products can still be evaluated efficiently. This project explores how to estimate the trace of such matrices using randomized algorithms.
The implementation focuses on Hutch++, a modern stochastic trace estimator that improves the classical Hutchinson method by combining low-rank approximation and randomized probing, reducing the required number of matrix–vector queries.
The project also demonstrates an application to triangle counting in large graphs using the Wiki-Vote network dataset. Using the identity
[\text{Number of triangles} = \frac{1}{6}\operatorname{tr}(B^3)]
the algorithms estimate triangle counts without explicitly computing $B^3$, relying only on efficient sparse matrix–vector operations.
Main features include:
- Implementation of Hutch++ stochastic trace estimation
- Implementation of NA-Hutch++ (non-adaptive variant)
- Implementation of Gaussian-Hutch++
- Construction of matrix–vector oracle representations for large matrices
- Application to triangle counting in large-scale graph datasets
- Performance experiments comparing different estimators
This project demonstrates how randomized numerical linear algebra enables scalable analysis of large datasets where traditional matrix computations would be computationally expensive.
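The core Hutch++ routine fits in a short NumPy sketch. This follows the standard algorithm (split queries between a low-rank sketch and residual probing); variable names and the even three-way query split are illustrative, not the project's exact implementation.

```python
import numpy as np

# Hutch++ in the matrix-vector query model: a third of the queries build
# a sketch AS, its orthonormal range Q gives an exact part tr(Q^T A Q),
# and the remaining probes run Hutchinson on the residual.
def hutchpp(matvec, n, num_queries, seed=0):
    rng = np.random.default_rng(seed)
    k = num_queries // 3
    S = rng.choice([-1.0, 1.0], size=(n, k))  # sketch probes
    G = rng.choice([-1.0, 1.0], size=(n, k))  # residual probes
    AS = np.column_stack([matvec(S[:, j]) for j in range(k)])
    Q, _ = np.linalg.qr(AS)                   # orthonormal basis for range(AS)
    AQ = np.column_stack([matvec(Q[:, j]) for j in range(Q.shape[1])])
    exact_part = np.trace(Q.T @ AQ)           # tr(Q^T A Q), computed exactly
    G_perp = G - Q @ (Q.T @ G)                # project probes off range(Q)
    AG = np.column_stack([matvec(G_perp[:, j]) for j in range(k)])
    resid_part = np.trace(G_perp.T @ AG) / k  # Hutchinson on the residual
    return exact_part + resid_part

# On a low-rank PSD matrix, the estimate is essentially exact:
rng = np.random.default_rng(1)
B = rng.standard_normal((30, 5))
A = B @ B.T                                   # rank 5, well below k = 10
est = hutchpp(lambda v: A @ v, n=30, num_queries=30)
```

For triangle counting, `matvec` would apply the sparse adjacency matrix three times (v → B(B(Bv))) and the final count is the trace estimate divided by six.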
GitHub repository:
Algorithm pipeline:

```mermaid
flowchart LR
    A["Graph Dataset<br>Wiki-Vote"] --> B["Build Sparse<br>Adjacency Matrix B"]
    B --> C["Define Linear Operator<br>A = B^3"]
    C --> D["Matrix-Vector Oracle<br>(A @ v)"]
    D --> E["Hutch++ Sampling<br>(S, G)"]
    E --> F["Low-Rank Approximation<br>Q = orth(AS)"]
    F --> G["Exact Trace Part<br>tr(Qᵀ A Q)"]
    D --> H["Residual Estimation<br>with G"]
    H --> I["Combine Estimates"]
    I --> J["Trace Estimate<br>tr(B^3)"]
    J --> K["Triangles<br>= tr(B^3)/6"]
```
Randomized Matrix Sketching Algorithms
This project implements randomized numerical linear algebra algorithms for scalable machine learning and large-scale data analysis.
The goal is to reduce computational cost while preserving important matrix properties.
Algorithms implemented include:
- Leverage score sampling
- CountSketch
- Subspace embeddings
- Hutch++ trace estimation
Experiments evaluate the accuracy–efficiency trade-offs of sketching algorithms for:
- linear regression
- low-rank approximation
- matrix trace estimation
The implementations are written in Python and tested on high-dimensional synthetic and real datasets.
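As a flavor of these primitives, CountSketch is short enough to sketch here: each row of the input is hashed to one of m buckets with a random sign, giving an m × n sketching operator applied implicitly. Function and variable names are illustrative.

```python
import numpy as np

# CountSketch applied to the rows of A: bucket + sign hashing yields a
# sketch SA whose squared norms match A's in expectation.
def countsketch(A, m, seed=0):
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    buckets = rng.integers(0, m, size=n)           # hash row i -> bucket
    signs = rng.choice([-1.0, 1.0], size=n)        # random sign per row
    SA = np.zeros((m, A.shape[1]))
    for i in range(n):
        SA[buckets[i]] += signs[i] * A[i]
    return SA

A = np.random.default_rng(2).standard_normal((200, 4))
SA = countsketch(A, m=50)
print(SA.shape)  # (50, 4)
```

Because the sketch touches each row exactly once, it runs in time proportional to the number of nonzeros, which is what makes it attractive for the regression and low-rank experiments above.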
Magnetic Sensor Array Visualization
This project builds a real-time visualization system for magnetic sensor arrays used in magnetic pose estimation experiments.
The system processes streaming sensor data from an 85-sensor magnetic array (17×5) connected through a serial interface.
Main features include:
- Real-time magnetic field visualization
- Sensor array calibration
- Geomagnetic background compensation
- Data acquisition with Python and PyQt GUI
The visualization system helps researchers analyze magnetic field patterns and debug pose estimation algorithms.
GitHub Repository
Magnetic Pose Estimation Using Distributed Dipole Models
This project develops algorithms for estimating the pose of flexible permanent magnets using magnetic sensor arrays.
Instead of modeling a magnet as a single dipole, the system represents the magnet as multiple distributed dipoles, allowing reconstruction of bending and deformation.
Key components include:
- Distributed dipole magnetic modeling
- Nonlinear least-squares optimization
- Sensor array calibration
- Real-time magnetic field data processing
The algorithms are implemented in Python and designed to work with a 17×5 magnetic sensor array, enabling high-resolution magnetic field measurements.
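The forward model behind this approach is the standard point-dipole field, summed over the distributed dipoles; the fitting step then runs nonlinear least squares between these predictions and the 85 sensor readings. A minimal sketch of the field model (function names and the two-dipole example are illustrative):

```python
import numpy as np

MU0_OVER_4PI = 1e-7  # T*m/A

# Field of one point dipole with moment m (A*m^2) at displacement r (m):
# B(r) = (mu0 / 4pi) * (3 (m . r_hat) r_hat - m) / |r|^3
def dipole_field(moment, r):
    m = np.asarray(moment, dtype=float)
    r = np.asarray(r, dtype=float)
    d = np.linalg.norm(r)
    rhat = r / d
    return MU0_OVER_4PI * (3.0 * np.dot(m, rhat) * rhat - m) / d**3

# Distributed model: superpose one dipole per segment of the magnet.
def distributed_field(moments, positions, sensor_pos):
    return sum(dipole_field(m, np.asarray(sensor_pos) - np.asarray(p))
               for m, p in zip(moments, positions))

# On-axis field of a unit z-dipole at 1 m: B_z = 2e-7 T
print(dipole_field([0, 0, 1.0], [0, 0, 1.0]))
```

In the pose-estimation loop, the optimizer adjusts each segment's position and moment so the superposed field matches the array measurements, which is what lets the system recover bending and deformation.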
Applications include:
- Soft robotics
- Tactile sensing systems
- Intelligent materials
- Shape reconstruction of flexible magnetic structures
GitHub Repository
Wenzhounese Input Method and Language Technology
This project develops a digital input method for the Wenzhounese (溫州話) dialect using the Rime input method framework.
The system aims to support computational access and digital preservation of underrepresented languages.
Major components include:
- Design of a phonetic transcription system for Wenzhounese
- Rime input schema implementation
- Dictionary construction for dialectal vocabulary
- Mapping between phonetic input and Chinese characters
Future directions include exploring language models and LLM-based tools to improve typing efficiency and dialect processing.
The project contributes to the digital preservation of minority languages and dialects.
