Amazon ML Challenge 2024 - Qwen-VL, CV, OCR

Year

2024

Tech & Technique

Qwen-VL, CV, OCR, docTR, PaddlePaddle, Llama 3.1, Prompt Engineering

Description

An intense national competition focused on automated document processing and information extraction. My solution achieved an All India Rank of 14 out of over 75,000 participants.

The system involved a multi-stage pipeline:

Built an OCR pipeline using docTR and PaddlePaddle to extract text accurately from a variety of document layouts.
Fed the OCR output to a Llama 3.1 (8B) model with engineered prompts to retrieve specific entity values.
Used Qwen2-VL (2B/7B) for direct image-based entity extraction, combined with rule-based post-processing for validation.
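
The stages above could be wired together roughly as follows. This is a minimal sketch, not the competition code: the prompt wording, the function names (`build_prompt`, `extract_entity`), and the choice to try the OCR + LLM path first and fall back to the vision-language model are all assumptions made for illustration. The OCR engine and both models are passed in as plain callables so the sketch does not depend on any particular library API.

```python
from typing import Callable, Optional


def build_prompt(ocr_text: str, entity: str) -> str:
    """Build an instruction prompt for one entity (hypothetical wording)."""
    return (
        "You are given text extracted from an image by OCR.\n"
        f"OCR text:\n{ocr_text.strip()}\n\n"
        f"Return only the value of '{entity}'. "
        "If it is not present, return an empty string."
    )


def extract_entity(
    image_path: str,
    entity: str,
    run_ocr: Callable[[str], str],
    ask_llm: Callable[[str], str],
    ask_vlm: Callable[[str, str], str],
    validate: Callable[[str], Optional[str]],
) -> str:
    """Try OCR + LLM first, then fall back to the vision-language model.

    The fallback order is an assumption; the write-up only says the two
    paths were combined with rule-based validation.
    """
    ocr_text = run_ocr(image_path)
    if ocr_text.strip():
        answer = validate(ask_llm(build_prompt(ocr_text, entity)))
        if answer is not None:
            return answer

    # The OCR path produced nothing usable, so ask the VLM about the image itself.
    answer = validate(ask_vlm(image_path, entity))
    return answer if answer is not None else ""
```

Injecting the callables keeps each stage swappable, so a stub can stand in for a model while the validation rules are being tuned.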

My Role

As the sole developer, I designed, built, and optimized the entire solution:

✅ Engineered the end-to-end OCR and information extraction pipeline.
💡 Developed sophisticated prompt engineering strategies for the Llama 3.1 model.
🔧 Implemented rule-based validation logic to improve the accuracy of the final output.
🚀 Iterated rapidly on the solution to climb the competitive leaderboard.
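
To illustrate what rule-based validation of model output can look like, here is a small sketch. It assumes predictions are expected as a number followed by a unit; that format, the `UNIT_ALIASES` table, and the `normalize_prediction` name are hypothetical and not taken from the actual solution.

```python
import re
from typing import Optional

# Hypothetical alias table: maps spellings a model might emit to one canonical unit.
UNIT_ALIASES = {
    "g": "gram", "gram": "gram", "grams": "gram",
    "kg": "kilogram", "kilogram": "kilogram", "kilograms": "kilogram",
    "cm": "centimetre", "centimetre": "centimetre", "centimeter": "centimetre",
    "v": "volt", "volt": "volt", "volts": "volt",
}

# A non-negative decimal number, optional whitespace, then an alphabetic unit.
_VALUE_UNIT = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([a-zA-Z]+)\s*$")


def normalize_prediction(raw: str) -> Optional[str]:
    """Return '<float> <canonical unit>', or None if the text fails the rules."""
    match = _VALUE_UNIT.match(raw)
    if match is None:
        return None  # Extra words, missing unit, or malformed number.

    number, unit = match.groups()
    canonical = UNIT_ALIASES.get(unit.lower())
    if canonical is None:
        return None  # Unit is not in the allowed set.

    return f"{float(number)} {canonical}"
```

For example, `normalize_prediction("500 g")` returns `"500.0 gram"`, while a chatty answer such as `"about 5 g"` is rejected with `None`, so the caller can retry or leave the field empty.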