Robots.txt vs. FOIA

Scripting News: Kicking Ass, the DNC weblog, on the robots.txt disabling of caches on White House pages about Iraq. Interesting point. Now would be an appropriate time to ask the Democrats whether they will have a different policy should a Democrat be elected to the White House in 2004.

It is also time to take a look at how robots.txt should be used on government sites. Should web crawlers respect robots.txt files on government sites? The information is covered by the Freedom of Information Act. Perhaps bloggers should file requests for the earlier copies of the documents, as they stood before the pages were changed.

By contrast, the Colorado State robots.txt file has comments in it explaining why each directory is excluded.
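For readers who have not looked inside one of these files, here is a rough sketch of a commented robots.txt and of how a well-behaved crawler would consult it. The file contents, the paths, and the "ExampleBot" agent name are invented for illustration; they are not taken from the Colorado or White House files. The check uses Python's standard urllib.robotparser module.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with explanatory comments. The paths and the
# reasons given are made up for this example, not copied from a real site.
ROBOTS_TXT = """\
# Search results are generated on demand and change constantly.
User-agent: *
Disallow: /search/
# Print-friendly duplicates of pages that are already indexed.
Disallow: /print/
"""


def allowed(path, agent="ExampleBot"):
    """Return True if the rules above let `agent` fetch `path`."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines; comment lines are skipped.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for path in ("/about/", "/search/reports", "/print/page.html"):
        print(path, "allowed" if allowed(path) else "disallowed")
```

Note that the comments are only there for human readers, and that nothing forces a crawler to run a check like this. Compliance is voluntary, which is why the question of whether robots should respect these files on government sites comes up at all.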

Should robots respect robots.txt files on government servers?




Keeping a continual archive of government sites ought to be tasked to the Library of Congress. These need to be complete archives, robots.txt be damned.

-Ross

What changes?

What documents have been changed? There is no evidence at all that any documents have been changed. This robots.txt change is likely a poor effort to deal with an out-of-control robot. Note that many of the disallowed directories do not even exist: they simply added /iraq/ to every existing directory. Most likely a robot was going through the site and appending /iraq/ to every directory.
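To make the pattern concrete, here is a small sketch of how a list of Disallow entries like that could be produced mechanically. The function name and the directory names are hypothetical, and this only illustrates the pattern described above; it is not a claim about how the actual file was generated.

```python
def iraq_disallow_lines(directories):
    """Build one Disallow line per directory, with /iraq/ appended.

    `directories` is an assumed list of top-level paths; the resulting
    entries are emitted whether or not the /iraq/ subdirectory exists.
    """
    lines = ["User-agent: *"]
    for directory in directories:
        # Normalize to a single leading slash and no trailing slash.
        base = ("/" + directory.strip("/")).rstrip("/")
        lines.append("Disallow: " + base + "/iraq/")
    return lines


if __name__ == "__main__":
    # Hypothetical directory names, chosen only for the example.
    print("\n".join(iraq_disallow_lines(["/news/", "kids", "/history"])))
```

A list built this way will contain plenty of paths that were never real directories, which is the oddity noted above.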

The changes

The site said at one time that the war with Iraq was over. Now it says that major combat operations are over.