CollataBot Search Engine
At Byte, I spearheaded the development of a search engine to catalog and explore all public libraries in the United States.
Starting with an incomplete library list from the IMLS (Institute of Museum and Library Services) that lacked URLs, I used Python's Pandas and Beautiful Soup libraries to spider for the missing data and aggregate it, producing a comprehensive CSV file of over 11,000 libraries and their URLs.
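The aggregation step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it uses the standard library's csv module in place of Pandas so it runs without dependencies, and the column names (name, url), the helper functions, and the sample rows are all assumptions rather than the real IMLS schema.

```python
import csv
import io


def fill_missing_urls(rows, found_urls):
    """Return copies of rows with empty 'url' fields filled from found_urls."""
    filled = []
    for row in rows:
        row = dict(row)  # copy so the caller's data is not mutated
        if not row.get("url"):
            # Fall back to an empty string when the spider found nothing.
            row["url"] = found_urls.get(row["name"], "")
        filled.append(row)
    return filled


def to_csv(rows, fieldnames=("name", "url")):
    """Serialize rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Illustrative data only; these are not real IMLS records.
libraries = [
    {"name": "Example Public Library", "url": ""},
    {"name": "Sample County Library", "url": "https://sample.example.org"},
]
# Hypothetical spider output: library name -> discovered URL.
discovered = {"Example Public Library": "https://example.org"}

print(to_csv(fill_missing_urls(libraries, discovered)))
```

Rows that already have a URL are left alone, so the spidered data only fills gaps in the source list.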
Building on this foundation, I developed a system to crawl the base URLs, gather all accessible paths, and scrape the content of each page. The scraped data was stored in a database, and I implemented a feature to send email notifications whenever content updates matched specified keywords.
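As a rough sketch of that pipeline, the example below collects same-site links from a page, checks the page text against a keyword list, and builds a notification email. It is an assumption-laden illustration rather than the system as built: it uses the standard library's html.parser instead of Beautiful Soup, skips fetching and database storage entirely, and every name in it (LinkAndTextParser, same_site_links, build_alert, the example.org addresses) is hypothetical.

```python
from email.message import EmailMessage
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkAndTextParser(HTMLParser):
    """Collect href values and text nodes from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        if data.strip():
            self.text_parts.append(data.strip())


def same_site_links(base_url, html):
    """Return sorted absolute links that stay on base_url's host."""
    parser = LinkAndTextParser()
    parser.feed(html)
    host = urlparse(base_url).netloc
    absolute = (urljoin(base_url, href) for href in parser.links)
    return sorted({url for url in absolute if urlparse(url).netloc == host})


def page_text(html):
    """Return the page's text nodes joined by single spaces."""
    parser = LinkAndTextParser()
    parser.feed(html)
    return " ".join(parser.text_parts)


def matched_keywords(text, keywords):
    """Return the keywords that appear in text, case-insensitively."""
    lowered = text.lower()
    return [kw for kw in keywords if kw.lower() in lowered]


def build_alert(url, matches, sender, recipient):
    """Build (but do not send) a notification email for a keyword match."""
    message = EmailMessage()
    message["Subject"] = f"Keyword match on {url}"
    message["From"] = sender
    message["To"] = recipient
    message.set_content("Matched keywords: " + ", ".join(matches))
    return message


# Illustrative page; a real crawler would fetch this over HTTP.
sample_html = (
    "<html><body><h1>Storytime moved to Friday</h1>"
    '<a href="/events">Events</a>'
    '<a href="https://other.example.com/">Partner</a></body></html>'
)
base = "https://library.example.org/"
links = same_site_links(base, sample_html)
matches = matched_keywords(page_text(sample_html), ["storytime", "closure"])
if matches:
    alert = build_alert(base, matches, "bot@example.org", "staff@example.org")
    print(alert["Subject"], "|", matches, "|", links)
```

In a running system the links returned by same_site_links would feed the crawl queue, and the built message could be handed to smtplib.SMTP.send_message for delivery.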
The project, dubbed 'CollataBot,' was a proof of concept. It was envisioned to include a landing page explaining its functionality and an analytics dashboard that would give libraries actionable insights into their websites' indexability, accessibility, and performance, helping them strengthen their digital presence.
Workplace:
Byte Studios
Team:
- Chris Barnett - Lead Designer and Developer
- Ryan Golner - Assisted in speeding up the search engine
- Michael Diedrick - Project Manager