Year
2025
Tech & Technique
LLM Fine-Tuning, PyTorch, DPO/GRPO, LoRA, Qwen2.5, Formal Verification
Description
A step-by-step proof assistant for advanced mathematical reasoning that integrates small Large Language Models (LLMs) with formal verification techniques.
Key Achievements:
- Engineered DPO/GRPO fine-tuning pipelines using 4-bit LoRA for specialized models like Qwen2.5-MATH and Phi-mini-4k.
- Achieved a +14.7 percentage point increase on the MATH500 benchmark, significantly surpassing Gemini-2.0 on GSM8K.
- Developed custom data preparation and error-driven reward design methodologies to enhance model accuracy and reasoning capabilities.
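The fine-tuning pipeline itself isn't shown here; as a minimal sketch of the DPO objective that such pipelines optimize, here is the loss for a single preference pair in plain Python (a real pipeline would compute batched token log-probabilities in PyTorch, e.g. via a LoRA-adapted policy and a frozen reference model):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair.

    Each argument is the summed log-probability of the chosen or rejected
    completion under the policy being trained or the frozen reference model.
    """
    # Implicit reward margin: how much more the policy prefers the chosen
    # completion over the rejected one, relative to the reference model.
    logits = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(logits)), written stably as log1p(exp(-logits))
    return math.log1p(math.exp(-logits))
```

When the policy matches the reference model the margin is zero and the loss is log 2; as the policy learns to prefer the chosen completion, the loss falls toward zero.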
My Role
As the lead developer, I was responsible for the entire project lifecycle:
- ✅ Conceptualized and designed the core architecture integrating LLMs with formal verifiers.
- 🧠 Implemented and optimized the fine-tuning pipelines from scratch.
- 📊 Conducted rigorous benchmarking and performance analysis.
- 🔬 Designed the custom reward models for error-driven learning.
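The project doesn't spell out its reward models, but the "error-driven" idea pairs naturally with a formal verifier: credit each proof step the verifier accepts and penalize at the first rejected step. A hypothetical illustrative sketch (all names are assumptions, and `verify` stands in for any step-level checker such as a formal-verifier call):

```python
def stepwise_reward(steps, verify, error_penalty=1.0):
    """Hypothetical error-driven reward over a list of proof steps.

    `verify` is any callable mapping one proof step to True/False,
    e.g. a wrapper around a formal verification backend.
    """
    reward = 0.0
    for step in steps:
        if verify(step):
            # Partial credit for each step the verifier accepts.
            reward += 1.0 / len(steps)
        else:
            # Penalize at the first failing step and stop scoring.
            reward -= error_penalty
            break
    return reward
```

Shaping the reward by the position of the first error gives the policy a denser signal than a single pass/fail bit on the whole proof.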