Google

Sunday, September 30, 2007

When is a file.rss not an actual .rss file?

When the header the server sends back is Content-Type: text/html; charset=utf-8.

Using the Live HTTP Headers Firefox add-on (https://addons.mozilla.org/en-US/firefox/addon/3829), I could see that Apache was serving my filename.rss files as ordinary HTML (text/html) rather than as RSS. To make Apache send the correct header for .rss files, I added the following to the .htaccess file:

# Set up the correct Content-Type header for .rss files
AddType application/rss+xml .rss
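Python's standard mimetypes module keeps a similar extension-to-type table, so the effect of the AddType line can be sketched outside Apache. This is only an analogy for checking the mapping locally: content_type_for is a hypothetical helper, and Apache itself does not use this module.

```python
import mimetypes

# Register .rss -> application/rss+xml. This mirrors (by analogy only)
# Apache's "AddType application/rss+xml .rss" directive.
mimetypes.add_type("application/rss+xml", ".rss")


def content_type_for(filename: str) -> str:
    """Hypothetical helper: guess the MIME type from the file extension,
    falling back to a generic binary type when the extension is unknown."""
    mime_type, _encoding = mimetypes.guess_type(filename)
    return mime_type or "application/octet-stream"


if __name__ == "__main__":
    print(content_type_for("filename.rss"))  # application/rss+xml
```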

Checking the response header again, Apache now sends back the correct one:

Content-Type: application/rss+xml
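To repeat this header check without the browser add-on, something like the following Python sketch could be used. The helper names are hypothetical, and the live check assumes the server answers a HEAD request with the same Content-Type it would send for a normal GET.

```python
import urllib.request


def media_type(content_type: str) -> str:
    """Drop parameters such as '; charset=utf-8' and normalise case."""
    return content_type.split(";", 1)[0].strip().lower()


def is_rss_content_type(content_type: str) -> bool:
    """True if the header value names the RSS media type."""
    return media_type(content_type) == "application/rss+xml"


def fetch_content_type(url: str) -> str:
    """Send a HEAD request and return the Content-Type header ('' if absent)."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.headers.get("Content-Type", "")


if __name__ == "__main__":
    # The two header values seen before and after the .htaccess change.
    for header in ("text/html; charset=utf-8", "application/rss+xml"):
        print(header, "->", is_rss_content_type(header))
```

For a live check, pass the full URL of one of your own feeds to fetch_content_type and hand the result to is_rss_content_type.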

My remaining problem: checking Google with the allinurl: and site: operators, I can see that roughly 5,000 of the RSS files are indexed. I am not sure whether I should be concerned, as I would rather search engine traffic go to the site itself than to the RSS feeds. Duplicate content could also become an issue.

I would have thought Google would know not to add an RSS feed to its main search results, since the files begin with:
<?xml version='1.0'?>
<rss version='2.0'>
<channel>
<title>blah...</title>
....
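As a rough illustration of why that opening is a strong hint, the sketch below identifies an RSS document from its root element alone. It is an assumption about the kind of check a crawler could make; it says nothing about how Google actually classifies pages, and rss_version is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def rss_version(document: str) -> Optional[str]:
    """Return the version attribute if the root element is <rss>, else None."""
    try:
        root = ET.fromstring(document)
    except ET.ParseError:
        return None  # not well-formed XML, so not an RSS feed
    if root.tag != "rss":
        return None
    return root.get("version")


# Minimal feed shaped like the one above (the declaration must come first).
SAMPLE = """<?xml version='1.0'?>
<rss version='2.0'>
<channel>
<title>blah...</title>
</channel>
</rss>"""

if __name__ == "__main__":
    print(rss_version(SAMPLE))  # 2.0
```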


So now I guess only time will tell whether more of my RSS files get indexed or not.

To find out how many files were indexed, I used the following search (replace sitename with your own domain):
allinurl:www.sitename.com site:www.sitename.com .rss
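If you run this check regularly, the search URL can be built programmatically. This is a hypothetical helper that assumes Google's search page takes the query in the q parameter; it only builds the URL and does not fetch or count results.

```python
from urllib.parse import urlencode


def rss_index_query_url(sitename: str) -> str:
    """Build a Google search URL for the allinurl:/site: query above."""
    query = f"allinurl:{sitename} site:{sitename} .rss"
    # urlencode escapes ':' as %3A and spaces as '+'.
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    print(rss_index_query_url("www.sitename.com"))
```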
