Web Scraping to Find and Identify Websites
I enjoy tinkering around with weird things. This one is particularly fun.
I created a web scraper and gave it a single website address. It scraped the contents
of that site's landing page and identified all the links on it.
Whenever a link pointed to another website, that site was added to the list to be scraped next.
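To make the idea concrete, here is a rough sketch of that loop using only the Python standard library. It is not the scraper I actually ran, just a minimal illustration: the names crawl, extract_links, fetch and the max_sites cap are all made up for this example, and it only visits each site's landing page, as described above.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html, base_url):
    """Return every link on the page as an absolute URL."""
    parser = LinkExtractor()
    parser.feed(html)
    return [urljoin(base_url, href) for href in parser.links]


def fetch(url):
    """Download a page and return its text (hypothetical default fetcher)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(seed_url, fetch_page=fetch, max_sites=100):
    """Breadth-first walk over landing pages, starting from one seed URL.

    Returns the set of host names discovered. max_sites is an assumed
    safety cap so the sketch cannot run forever.
    """
    seen_sites = {urlparse(seed_url).netloc}
    queue = deque([seed_url])
    while queue:
        url = queue.popleft()
        try:
            html = fetch_page(url)
        except Exception:
            continue  # skip pages that cannot be downloaded
        for link in extract_links(html, url):
            parts = urlparse(link)
            if parts.scheme not in ("http", "https"):
                continue  # ignore mailto:, javascript:, and similar links
            site = parts.netloc
            if site and site not in seen_sites and len(seen_sites) < max_sites:
                seen_sites.add(site)
                # Queue the landing page of the newly found website
                queue.append(f"{parts.scheme}://{site}/")
    return seen_sites
```

Calling crawl("https://example.com/") would start from that one address and keep following links to new sites until the cap is reached. A real crawler would also want to respect robots.txt and pause between requests, which this sketch leaves out.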
The Results So Far
Total Websites Found: 207,674
Total Web Pages Found: 3,041,632
Next, I started analyzing the contents of the web pages to determine whether each website is family friendly or contains content that is probably not suitable for children:
Family Friendly Websites: 42,004
'Unsafe' Websites: 21,107
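For a feel of what this kind of content check can look like, here is a deliberately simple keyword-based sketch. It is an assumption for illustration only, not the analysis that produced the numbers above: the word list, the threshold, and the function names classify_page and classify_site are all hypothetical.

```python
import re

# Hypothetical word list, purely for illustration
UNSAFE_TERMS = {"casino", "gambling", "explicit"}


def classify_page(text, unsafe_terms=UNSAFE_TERMS, threshold=2):
    """Label a page 'unsafe' if it contains at least `threshold` flagged words."""
    words = re.findall(r"[a-z']+", text.lower())
    hits = sum(1 for word in words if word in unsafe_terms)
    return "unsafe" if hits >= threshold else "family_friendly"


def classify_site(page_texts, unsafe_terms=UNSAFE_TERMS, threshold=2):
    """Label a whole site 'unsafe' if any one of its pages is labeled unsafe."""
    for text in page_texts:
        if classify_page(text, unsafe_terms, threshold) == "unsafe":
            return "unsafe"
    return "family_friendly"
```

A plain word count like this will misjudge plenty of pages in both directions, so treat it as a starting point for experimenting rather than a reliable filter.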