Back
Amazon ML Challenge 2024
Year
2024
Tech & Technique
Qwen-VL, CV, OCR, DocTr, PaddlePaddle, Llama 3.1, Prompt Engineering
Description
An intense national competition focused on automated document processing and information extraction. My solution achieved an All India Rank of 14 out of over 75,000 participants.
The system involved a multi-stage pipeline:
The system involved a multi-stage pipeline:
- Built a robust OCR pipeline using DocTr and PaddlePaddle for high-accuracy text extraction from various document layouts.
- Paired the OCR output with a prompt-tuned Llama 3.1 (7B) model to intelligently retrieve specific entity values.
- Leveraged Qwen-VL (2B/7B) for direct image-based entity extraction, combined with rule-based post-processing for validation.
My Role
As the sole developer, I designed, built, and optimized the entire solution:
- ✅ Engineered the end-to-end OCR and information extraction pipeline.
- 💡 Developed sophisticated prompt engineering strategies for the Llama 3.1 model.
- 🔧 Implemented rule-based validation logic to improve the accuracy of the final output.
- 🚀 Iterated rapidly on the solution to climb the competitive leaderboard.