This project classified HTML pages for musical events using a GloVe model in Gensim. The goal was to identify event-related URLs on a given domain (e.g., ticket purchase links or event detail pages) and distinguish them from irrelevant URLs. The solution combined NLP-based similarity models with pattern extraction techniques to classify and discover event page URLs.
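The similarity step described above can be illustrated with a minimal, standard-library-only sketch. It is not the project's actual code: the tiny three-dimensional `TOY_VECTORS` table is an invented stand-in for real GloVe embeddings, and the reference terms, the 0.8 threshold, the helper names, and the choice to score tokens from the URL path (rather than page text) are all assumptions made for illustration.

```python
import math
import re

# Invented 3-dimensional vectors standing in for pretrained GloVe embeddings.
# A real system would look these up in a loaded embedding model instead.
TOY_VECTORS = {
    "tickets": [0.9, 0.1, 0.0],
    "event": [0.8, 0.3, 0.1],
    "concert": [0.7, 0.4, 0.0],
    "about": [0.0, 0.2, 0.9],
    "privacy": [0.1, 0.0, 0.9],
    "contact": [0.0, 0.3, 0.8],
}

# Hypothetical terms that describe what an event URL "looks like".
REFERENCE_TERMS = ["tickets", "event", "concert"]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_vector(tokens):
    """Average the vectors of all known tokens; None if no token is known."""
    vectors = [TOY_VECTORS[t] for t in tokens if t in TOY_VECTORS]
    if not vectors:
        return None
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def url_tokens(url):
    """Split the part of the URL after the host into lowercase alphabetic tokens."""
    path = re.sub(r"^[a-z]+://[^/]+", "", url.lower())
    return [token for token in re.split(r"[^a-z]+", path) if token]


def is_event_url(url, threshold=0.8):
    """Flag a URL as event-related when its tokens are close to the reference terms."""
    vector = mean_vector(url_tokens(url))
    if vector is None:
        return False
    return cosine(vector, mean_vector(REFERENCE_TERMS)) >= threshold
```

With these toy vectors, `is_event_url("https://example.com/concert/tickets")` is flagged as event-related, while `is_event_url("https://example.com/about/privacy")` is not. In practice the threshold would need tuning against labeled URLs.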
The solution also discovered image source links and ticket URLs and exposed the results through APIs. The model was served through Flask APIs, Dockerized, and deployed to production on AWS EC2.
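As a rough illustration of the discovery step, the sketch below collects image source links and likely ticket links from a page using only Python's standard library. The `TICKET_HINTS` keyword list, the `LinkCollector` class, and the `discover_links` function are hypothetical names chosen for this example; the project's actual extraction patterns are not described here, and in the deployed service a function like this would presumably sit behind one of the Flask endpoints.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical keywords that suggest a link leads to ticket purchase.
TICKET_HINTS = ("ticket", "buy")


class LinkCollector(HTMLParser):
    """Collects absolute image URLs and anchors that look like ticket links."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.image_urls = []
        self.ticket_urls = []
        self._current_href = None  # href of the <a> tag currently open, if any
        self._current_text = []    # text seen inside that <a> tag

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "img" and attributes.get("src"):
            self.image_urls.append(urljoin(self.base_url, attributes["src"]))
        elif tag == "a" and attributes.get("href"):
            self._current_href = urljoin(self.base_url, attributes["href"])
            self._current_text = []

    def handle_data(self, data):
        if self._current_href is not None:
            self._current_text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_href is not None:
            # Match hints against both the resolved URL and the link text.
            haystack = (self._current_href + " " + "".join(self._current_text)).lower()
            if any(hint in haystack for hint in TICKET_HINTS):
                self.ticket_urls.append(self._current_href)
            self._current_href = None


def discover_links(html, base_url):
    """Return image and ticket URLs found in an HTML string."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return {"image_urls": collector.image_urls, "ticket_urls": collector.ticket_urls}
```

Because the hints are matched against the full resolved URL, a domain whose name itself contains a hint word would mark every link as a ticket link; matching only the path and link text would be a reasonable refinement.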