Web crawlers are what search engines use to index websites. They literally crawl the internet, following links, reading pages, and reporting the content back to their operators; that's how Google manages to index the web.
You can (supposedly) stop web crawlers by placing a file called robots.txt at the root of your web server. Compliant crawlers will honor it, but nothing forces them to, so for crawlers that ignore it you can regularly review your web server logs and ban traffic from the offending IP addresses or address ranges.
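To see how a well-behaved crawler interprets robots.txt, here's a minimal sketch using Python's standard-library `urllib.robotparser`. The rules and the example.com URLs are made up for illustration:

```python
from urllib import robotparser

# A hypothetical robots.txt: everyone is barred from /private/,
# and a crawler calling itself "BadBot" is barred from the whole site.
RULES = """\
User-agent: *
Disallow: /private/

User-agent: BadBot
Disallow: /
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(RULES)

# A compliant crawler checks can_fetch() before requesting a URL.
print(rp.can_fetch("GoogleBot", "https://example.com/index.html"))  # True
print(rp.can_fetch("GoogleBot", "https://example.com/private/x"))   # False
print(rp.can_fetch("BadBot", "https://example.com/index.html"))     # False
```

The key point: the file is purely advisory. The parser only tells a crawler what it *should* skip; a crawler that never calls `can_fetch()` sees everything, which is why log review and IP bans are the fallback.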
If you mean worms, then firewalls can help stop those by blocking the network ports they use to spread.