Should sites like turnitin.com have to follow licensing provisions? For example, a site can have a license that says you may catalog and index the site as long as the index is made freely available to the public. Sites like turnitin.com, however, are using my copyrighted material to make a profit. I wonder whether their robot will honor robots.txt.
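For what it's worth, here is a rough sketch of what such a robots.txt could say, checked with Python's standard `urllib.robotparser`. The user-agent token `TurnitinBot`, the example URL, and the `is_allowed` helper are my assumptions for illustration, not anything confirmed by Turnitin. Keep in mind that robots.txt is voluntary: this only shows what the rules ask for, not whether a given crawler actually obeys them.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out one crawler, leave everyone else alone.
# "TurnitinBot" is an assumed user-agent token, used here for illustration only.
ROBOTS_TXT = """\
User-agent: TurnitinBot
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/essays/draft.html"  # placeholder URL
    for agent in ("TurnitinBot", "Googlebot"):
        print(agent, is_allowed(ROBOTS_TXT, agent, page))
```

A well-behaved crawler runs this same kind of check before fetching a page. Finding out whether a particular robot does so means watching the server logs.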

How are they making certain that archiving the information I publish on the web does not violate my copyright on it? Turnitin has a legal document explaining how their service does not infringe. As far as I can tell, however, it applies only to works submitted to their service and does not cover their robot crawling my website.

A related thought came from reading a comment on Dave Winer's Scripting News the other day. In that piece he suggests handling referrer spam with robots.txt, so that indexes such as Google won't crawl the referrer pages and the spammers get no benefit from having their links listed. I prefer to have my reporting software keep these referrers from showing up in the first place. That way the reporting pages keep their integrity and can still be indexed by sites like Google.
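To make the reporting-side approach concrete, here is a minimal sketch of filtering referrers against a blocklist before they reach the report. The domain names, `SPAM_DOMAINS`, and both function names are made up for illustration. A real reporting package would have its own configuration for this, so treat it as one possible shape rather than how any particular tool works.

```python
from urllib.parse import urlsplit

# Hypothetical blocklist; a real report would load this from its own config.
SPAM_DOMAINS = {"spam.example", "casino.example"}


def is_spam_referrer(referrer: str, blocked=SPAM_DOMAINS) -> bool:
    """True if the referrer's host is a blocked domain or one of its subdomains."""
    host = (urlsplit(referrer).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in blocked)


def clean_referrers(referrers):
    """Drop spam referrers so they never appear on the published report."""
    return [r for r in referrers if not is_spam_referrer(r)]


if __name__ == "__main__":
    seen = [
        "https://www.google.com/search?q=robots.txt",
        "http://www.spam.example/cheap-pills",
        "http://casino.example/",
    ]
    print(clean_referrers(seen))
```

Because the spam links never make it onto the page, there is nothing for a search engine to pick up, and the rest of the report stays open to crawlers.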