Search Results for author: Abdur Rahman

Found 4 papers, 1 paper with code

V-Zen: Efficient GUI Understanding and Precise Grounding With A Novel Multimodal LLM

no code implementations • 24 May 2024 • Abdur Rahman, Rajat Chawla, Muskaan Kumar, Arkajit Datta, Adarsh Jha, Mukunda NS, Ishaan Bhola

In the rapidly evolving landscape of AI research and application, Multimodal Large Language Models (MLLMs) have emerged as a transformative force, adept at interpreting and integrating information from diverse modalities such as text, images, and Graphical User Interfaces (GUIs).

UTRNet: High-Resolution Urdu Text Recognition In Printed Documents

1 code implementation • 27 Jun 2023 • Abdur Rahman, Arjun Ghosh, Chetan Arora

To address the limitations of previous works, which struggle to generalize to the intricacies of the Urdu script, and the lack of sufficient annotated real-world data, we introduce UTRSet-Real, a large-scale annotated real-world dataset comprising over 11,000 lines, and UTRSet-Synth, a synthetic dataset of 20,000 lines closely resembling real-world data. We also correct the ground truth of the existing IIITH dataset, making it a more reliable resource for future research.

Ranked #1 on Printed Text Recognition on UPTI