I'm a graduate student at NYU pursuing a Master's in Computer Engineering, passionate about building efficient and scalable AI systems. I focus on LLM optimization, multimodal models, and open-source contributions, most recently to the 🤗 transformers library.
- Languages: Python (PyTorch, DeepSpeed, NumPy, Scikit-Learn, PySpark, TensorFlow), CUDA C++, C/C++, SQL
- Domains: LLMs, Vision-Language Models, Quantization, Distributed Training (DDP), Recommender Systems
- Tools: Docker, Slurm, Hugging Face Transformers, LangChain, Ollama, GCP, AWS, Spark, Airflow
**Quantization Techniques**
- SmoothQuant, Dynamic Quantization, Quantization-Aware Training (QAT)
- Frameworks: PyTorch FX, ONNX Runtime, Hugging Face Optimum
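
A minimal sketch of post-training dynamic quantization with PyTorch; the BERT checkpoint and the int8, Linear-only scope are illustrative choices, not a fixed recipe:

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Model choice is illustrative; any encoder with nn.Linear layers works.
model = AutoModel.from_pretrained("bert-base-uncased").eval()
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Post-training dynamic quantization: weights are stored as int8,
# activations are quantized on the fly at inference time.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

inputs = tokenizer("Dynamic quantization demo.", return_tensors="pt")
with torch.no_grad():
    out = quantized(**inputs)
print(out.last_hidden_state.shape)
```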
**Pruning Strategies**
- Filter/channel pruning, magnitude pruning, NetAdapt-style structured pruning
- Latency-aware model slimming via FLOPs/accuracy trade-offs
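
A minimal sketch of global magnitude pruning with `torch.nn.utils.prune`; the toy CNN and the 40% sparsity target are assumptions for illustration:

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Toy CNN; architecture and sparsity level are illustrative.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
    nn.Flatten(), nn.Linear(32 * 32 * 32, 10),
)

# Collect (module, "weight") pairs eligible for pruning.
to_prune = [(m, "weight") for m in model.modules()
            if isinstance(m, (nn.Conv2d, nn.Linear))]

# Global magnitude pruning: zero out the 40% of weights with the
# smallest absolute value across all listed layers.
prune.global_unstructured(
    to_prune, pruning_method=prune.L1Unstructured, amount=0.4
)

# Make the pruning permanent (folds the mask into the weight tensor).
for module, name in to_prune:
    prune.remove(module, name)

total = sum(p.numel() for p in model.parameters())
zeros = sum((p == 0).sum().item() for p in model.parameters())
print(f"sparsity: {zeros / total:.2%}")
```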
**Distributed Training**
- PyTorch Distributed Data Parallel (DDP), DeepSpeed
- Mixed precision (FP16), gradient accumulation, multi-node cluster scaling
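
A condensed sketch of the DDP + FP16 + gradient-accumulation training pattern, assuming a `torchrun` launch; the model, synthetic data, and accumulation factor are placeholders:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # Launched with: torchrun --nproc_per_node=4 train.py
    dist.init_process_group("nccl")
    rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(rank)

    # Placeholder model/data; swap in the real module and DataLoader.
    model = torch.nn.Linear(512, 512).cuda(rank)
    model = DDP(model, device_ids=[rank])
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    scaler = torch.cuda.amp.GradScaler()   # FP16 loss scaling
    accum_steps = 4                        # gradient accumulation factor

    for step in range(100):
        x = torch.randn(32, 512, device=rank)
        with torch.cuda.amp.autocast(dtype=torch.float16):
            loss = model(x).pow(2).mean() / accum_steps
        # Note: model.no_sync() could skip redundant all-reduces on
        # non-boundary steps; omitted here for brevity.
        scaler.scale(loss).backward()

        if (step + 1) % accum_steps == 0:
            scaler.step(optimizer)
            scaler.update()
            optimizer.zero_grad(set_to_none=True)

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```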
**Multimodal Systems**
- CLIP-like ViT-BERT architectures
- Vision-Language alignment, CLIPScore evaluation, knowledge distillation
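
A minimal sketch of CLIP-style contrastive alignment between a ViT image encoder and a BERT text encoder; the checkpoints, projection width, and temperature are illustrative:

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel

# Encoders and projection dim are illustrative choices.
vision = AutoModel.from_pretrained("google/vit-base-patch16-224-in21k")
text = AutoModel.from_pretrained("bert-base-uncased")
img_proj = torch.nn.Linear(vision.config.hidden_size, 256)
txt_proj = torch.nn.Linear(text.config.hidden_size, 256)
logit_scale = torch.nn.Parameter(torch.tensor(2.659))  # learnable temperature

def clip_loss(pixel_values, input_ids, attention_mask):
    # Pool each modality (CLS token) and project into a shared space.
    img = img_proj(vision(pixel_values=pixel_values).last_hidden_state[:, 0])
    txt = txt_proj(text(input_ids=input_ids,
                        attention_mask=attention_mask).last_hidden_state[:, 0])
    img = F.normalize(img, dim=-1)
    txt = F.normalize(txt, dim=-1)

    # Symmetric InfoNCE over in-batch negatives.
    logits = logit_scale.exp() * img @ txt.t()
    targets = torch.arange(logits.size(0))
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2
```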
🛠️ Comfortable implementing research papers from scratch and profiling performance with tools like Weights & Biases and PyTorch Profiler.
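
For instance, a typical PyTorch Profiler pass over a forward call looks like this (the module and input shapes are placeholders):

```python
import torch
from torch.profiler import profile, record_function, ProfilerActivity

model = torch.nn.Linear(1024, 1024)      # placeholder module
x = torch.randn(64, 1024)

activities = [ProfilerActivity.CPU]
if torch.cuda.is_available():
    activities.append(ProfilerActivity.CUDA)

with profile(activities=activities, record_shapes=True) as prof:
    with record_function("forward"):
        model(x)

# Top ops by self CPU time; swap the sort key for CUDA runs.
print(prof.key_averages().table(sort_by="self_cpu_time_total", row_limit=10))
```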
- **#38487: Enable `device_map="auto"` support for Dinov2**
  Enabled automatic device placement for Dinov2 by defining `_no_split_modules`, unlocking seamless inference across CPU and GPU.
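
  With `_no_split_modules` defined, loading Dinov2 with automatic placement looks roughly like this (requires `accelerate`; the checkpoint name is assumed to be the standard hub one):

  ```python
  import torch
  from transformers import AutoModel

  # device_map="auto" lets accelerate shard the model across available
  # GPUs/CPU; _no_split_modules tells it which blocks must stay together.
  model = AutoModel.from_pretrained(
      "facebook/dinov2-base",
      device_map="auto",
      torch_dtype=torch.float16,
  )
  print(model.hf_device_map)  # mapping of submodules to devices
  ```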
- **#38461: Add `GLPNImageProcessorFast`**
  Implemented a fast, TorchVision-based image processor variant for the GLPN model. Achieved functional parity with the original (max abs diff < 1e-7) and added complete tests.
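
  Assuming the fast processor is available for a GLPN checkpoint, it can be selected through the usual `use_fast` switch (checkpoint name is illustrative):

  ```python
  from PIL import Image
  import numpy as np
  from transformers import AutoImageProcessor

  # use_fast=True picks the TorchVision-backed fast processor
  # when one is available for the checkpoint.
  processor = AutoImageProcessor.from_pretrained("vinvino02/glpn-kitti", use_fast=True)

  image = Image.fromarray(np.zeros((480, 640, 3), dtype=np.uint8))  # dummy image
  inputs = processor(images=image, return_tensors="pt")
  print(type(processor).__name__, inputs["pixel_values"].shape)
  ```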
- **#38509: SparseVLM – Visual Token Sparsification for Efficient VLM Inference**
  Proposed support for SparseVLM: a training-free, plug-and-play method to prune redundant image tokens in VLMs like BLIP and Flamingo. It uses attention-guided token selection and recycling for up to 60% FLOPs reduction with minimal accuracy loss. Currently preparing an implementation compatible with 🤗 transformers.
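
  A toy sketch of the core idea, attention-guided selection of the top-k visual tokens; the scoring rule, keep ratio, and the helper name `prune_visual_tokens` are simplified stand-ins for the paper's selection-and-recycling scheme:

  ```python
  import torch

  def prune_visual_tokens(visual_tokens, text_to_image_attn, keep_ratio=0.4):
      """Keep the visual tokens that text queries attend to the most.

      visual_tokens:      (batch, num_img_tokens, dim)
      text_to_image_attn: (batch, num_heads, num_text_tokens, num_img_tokens)
      """
      # Score each image token by its total attention mass from text tokens.
      scores = text_to_image_attn.mean(dim=1).sum(dim=1)   # (batch, num_img_tokens)
      k = max(1, int(keep_ratio * visual_tokens.size(1)))
      top_idx = scores.topk(k, dim=-1).indices              # (batch, k)

      # Gather the selected tokens; downstream layers now see ~keep_ratio
      # of the original visual sequence, cutting attention/MLP FLOPs.
      idx = top_idx.unsqueeze(-1).expand(-1, -1, visual_tokens.size(-1))
      return visual_tokens.gather(1, idx), top_idx

  # Toy usage with random tensors.
  tokens = torch.randn(2, 256, 768)
  attn = torch.rand(2, 12, 32, 256)
  kept, idx = prune_visual_tokens(tokens, attn, keep_ratio=0.4)
  print(kept.shape)  # torch.Size([2, 102, 768])
  ```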
- Developed a ViT+BERT architecture for Visual Question Answering.
- Trained with QAT + DDP over 4×L4 GPUs for a 1.8× speed-up and 60% model compression.
- Reduced model size by 64% with dynamic quantization.
- Automated fine-tuning and benchmarking via Hugging Face tools.
I'm currently open to:
- Remote internships or research collaborations in LLM efficiency, model compression, or AI infrastructure
- Open-source projects focused on cutting-edge ML research
- 📧 Email: ac11274@nyu.edu
- 🌐 Portfolio: github.com/aryanchauhan31

Let's build something impactful together.