Year
2025
Tech & Technique
LLM Fine-Tuning, PyTorch, DPO/GRPO, LoRA, Qwen2.5, Formal Verification
Description
A step-by-step proof assistant for advanced mathematical reasoning that integrates small Large Language Models (LLMs) with formal verification techniques.
Key Achievements:
- Engineered DPO/GRPO fine-tuning pipelines using 4-bit LoRA for specialized models like Qwen2.5-MATH and Phi-mini-4k.
- Achieved a +14.7 percentage point increase on the MATH500 benchmark, significantly surpassing Gemini-2.0 on GSM8K.
- Developed custom data preparation and error-driven reward design methodologies to enhance model accuracy and reasoning capabilities.
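The fine-tuning pipeline itself isn't shown here; as a minimal sketch of the DPO objective that such pipelines optimize, here is the loss for a single preference pair in plain Python (a real pipeline would compute batched token log-probabilities in PyTorch, e.g. via a LoRA-adapted policy and a frozen reference model):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair.

    Each argument is the summed log-probability of the chosen or rejected
    completion under the policy being trained or the frozen reference model.
    """
    # Implicit reward margin: how much more the policy prefers the chosen
    # completion over the rejected one, relative to the reference model.
    logits = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(logits)), written stably as log1p(exp(-logits))
    return math.log1p(math.exp(-logits))
```

When the policy matches the reference model the margin is zero and the loss is log 2; as the policy learns to prefer the chosen completion, the loss falls toward zero.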
My Role
As the lead developer, I was responsible for the entire project lifecycle:
- ✅ Conceptualized and designed the core architecture integrating LLMs with formal verifiers.
- 🧠 Implemented and optimized the fine-tuning pipelines from scratch.
- 📊 Conducted rigorous benchmarking and performance analysis.
- 🔬 Designed the custom reward models for error-driven learning.
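The project doesn't spell out its reward models, but the "error-driven" idea pairs naturally with a formal verifier: credit each proof step the verifier accepts and penalize at the first rejected step. A hypothetical illustrative sketch (all names are assumptions, and `verify` stands in for any step-level checker such as a formal-verifier call):

```python
def stepwise_reward(steps, verify, error_penalty=1.0):
    """Hypothetical error-driven reward over a list of proof steps.

    `verify` is any callable mapping one proof step to True/False,
    e.g. a wrapper around a formal verification backend.
    """
    reward = 0.0
    for step in steps:
        if verify(step):
            # Partial credit for each step the verifier accepts.
            reward += 1.0 / len(steps)
        else:
            # Penalize at the first failing step and stop scoring.
            reward -= error_penalty
            break
    return reward
```

Shaping the reward by the position of the first error gives the policy a denser signal than a single pass/fail bit on the whole proof.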