4.3. How can I make Wget ignore the robots.txt file/no-follow attribute?

By default, Wget behaves like a well-mannered web spider: it obeys a site's robots.txt file and no-follow attributes.

If Wget's --debug output says something like 

Not following foo.bar because robots.txt forbids it

or 

no-follow in index.html

then this is the cause of your trouble. 

Wget lets you ignore robots.txt and no-follow attributes; however, think first about what you're doing and what the robots.txt file may be there to prevent. Some sites use robots.txt to stop automated fetching of parts of the site, but it is also used to keep automata from putting a heavy load on the server by following links to CGI scripts that need significant processing power. Ignoring robots.txt or no-follow can give site administrators migraines, so please be sure you know what you're doing before disabling them.

To ignore robots.txt and no-follow, use: 

wget -e robots=off http://your.site.here
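
If you do turn these checks off for a recursive fetch, you can still keep the load on the server modest. The following is only a sketch: the one-second wait and the 200k rate limit are arbitrary values chosen for illustration, and the URL is the same placeholder used above. It builds the command and prints it for review rather than running it.

```shell
# Placeholder target; substitute the site you actually want to fetch.
url="http://your.site.here"

# -r                 recurse into links
# -e robots=off      ignore robots.txt and no-follow
# --wait=1           pause one second between retrievals (illustrative value)
# --limit-rate=200k  cap the download rate (illustrative value)
cmd="wget -r -e robots=off --wait=1 --limit-rate=200k $url"

# Print the command so it can be checked before use.
echo "$cmd"
```

Once the printed command looks right, run it as shown. Since -e simply executes a wgetrc-style command, the same setting can be made permanent by putting the line "robots = off" in your wgetrc file, though that disables the checks for every run.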