CollataBot Search Engine

At Byte, I spearheaded the development of a search engine to catalog and explore all public libraries in the United States.

Starting with an incomplete IMLS library list that lacked URLs, I used Python's pandas and Beautiful Soup libraries to spider for the missing data and aggregate it, producing a comprehensive CSV file of over 11,000 libraries and their URLs.
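The URL backfill step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it uses Python's standard csv module in place of pandas so it runs without dependencies, and the column names (name, url) and the found_urls lookup are hypothetical rather than taken from the IMLS data.

```python
import csv
import io


def fill_missing_urls(rows, found_urls):
    """Return copies of rows with blank 'url' fields filled from found_urls.

    rows: list of dicts with hypothetical 'name' and 'url' keys.
    found_urls: dict mapping library name -> URL discovered by spidering.
    """
    filled = []
    for row in rows:
        row = dict(row)  # copy so the caller's data is not mutated
        if not row.get("url"):
            # Leave the field empty when no URL was found for this library.
            row["url"] = found_urls.get(row["name"], "")
        filled.append(row)
    return filled


def to_csv(rows, fieldnames=("name", "url")):
    """Serialize the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Existing URLs are left untouched, so the function can be re-run safely as more URLs are discovered.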

Building on this foundation, I developed a system to crawl the base URLs, gather all accessible paths, and scrape the content of each page. The scraped data was stored in a database, and I implemented a feature to send email notifications whenever content updates matched specified keywords.
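Two pieces of that pipeline, gathering same-site paths from a page and checking content against keywords, might look something like the sketch below. It is an assumption-laden illustration rather than the original implementation: it uses the standard library's html.parser instead of Beautiful Soup, and the names LinkCollector, same_site_links, and matched_keywords are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_site_links(base_url, html):
    """Return sorted absolute http(s) links on the same host as base_url."""
    parser = LinkCollector()
    parser.feed(html)
    host = urlparse(base_url).netloc
    links = set()
    for href in parser.hrefs:
        # Resolve relative links and drop #fragments so pages are not duplicated.
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        parsed = urlparse(absolute)
        if parsed.scheme in ("http", "https") and parsed.netloc == host:
            links.add(absolute)
    return sorted(links)


def matched_keywords(text, keywords):
    """Return the keywords that occur in text, ignoring case."""
    lowered = text.lower()
    return [kw for kw in keywords if kw.lower() in lowered]
```

A full crawler would also fetch each page, queue links it has not yet visited, store the scraped text, and send an email when matched_keywords returns a non-empty list; those steps are omitted here.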

The project, dubbed 'CollataBot,' was built as a proof of concept. The plan also called for a landing page explaining its functionality and an analytics dashboard that would give libraries actionable insights into their websites' indexability, accessibility, and performance, helping them improve their digital presence.

Workplace:

Byte Studios

Team:

  • Chris Barnett - Lead Designer and Developer
  • Ryan Golner - Assisted in speeding up the search engine
  • Michael Diedrick - Project Manager

Software:

PHP, JavaScript, HTML, CSS, SQL, Python, Sketch, Cron