Arena Red » 14 Dec 2003 » Fun with HTTP Referrers
Fun with HTTP Referrers

Last week I added referrer support to Dynoweb, the software I've written to manage the content on this site. That means the referrer from each page request is saved in the database, both for the site in general and for each entry, and I can display each entry's referrers on that entry's permalink page.

Initially, I'm doing several things to filter and adjust the data before it goes in the database or gets displayed:

  • I found three markers that indicate the referrer is a webmail client. There's no point in showing referrers that can't be linked back to, so I don't record these.
  • I also filter out referrers that are from within my site.
  • When displaying the referrer, I look for Google, Yahoo, Netscape, and Feedster URLs and display them in a slightly nicer way.
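The filtering and display rules above could be sketched roughly as follows. This is a minimal illustration, not Dynoweb's actual code: the post doesn't name its three webmail markers or its own host, so `OWN_HOSTS` and `WEBMAIL_MARKERS` are hypothetical placeholders, and the query parameter assumed to carry the search terms for each engine (`q` for Google, `p` for Yahoo) is an assumption. Netscape and Feedster would need entries of their own.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical placeholders: the post names neither its own host nor the
# three webmail markers it actually checks.
OWN_HOSTS = {"www.example.com", "example.com"}
WEBMAIL_MARKERS = ("mail.yahoo.", "hotmail.", "webmail")

# Assumed mapping from a host fragment to the query parameter holding the
# search terms for that engine.
SEARCH_PARAMS = {
    "google.": "q",
    "yahoo.": "p",
}


def should_record(referrer):
    """Return False for empty, same-site, or webmail referrers."""
    if not referrer:
        return False
    host = urlsplit(referrer).hostname or ""  # urlsplit lowercases hostname
    if host in OWN_HOSTS:
        return False
    lowered = referrer.lower()
    return not any(marker in lowered for marker in WEBMAIL_MARKERS)


def display_text(referrer):
    """Show search engine referrers as 'Engine: terms', others verbatim."""
    parts = urlsplit(referrer)
    host = parts.hostname or ""
    for engine, param in SEARCH_PARAMS.items():
        if engine in host:
            # parse_qs decodes '+' and %XX escapes in the query string.
            terms = parse_qs(parts.query).get(param)
            if terms:
                return f"{engine.rstrip('.').capitalize()}: {terms[0]}"
    return referrer
```

For example, `display_text("http://www.google.com/search?q=ipod+crack")` yields `Google: ipod crack`, while a URL from an unlisted site is returned unchanged.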

After letting this update run for a week or so, I've seen some interesting, funny, and disappointing things in the referrers.

First of all, I'm not sure there's any point in displaying the search engine referrers, since I can still look at them directly in the database or generate a private page to display them. Some of the search terms that land on my site are kind of funny, though, which I'll point out below. Then again, filtering out every search engine referrer would require knowing what a URL from each of them looks like and adding it to the filtering mechanism.

One problem with grouping all the references from a single referring page arises when a site puts a varying element in the URL for no good reason. The prime example is a site that puts a session ID in the URL. Unless I do something special to deal with it (which is not worth doing), each user who clicks through from that page will show up as a unique referrer.
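If the session ID rides along as a query parameter, the "something special" could be as small as stripping known session parameters before storing the referrer. A minimal sketch, assuming the parameter names in `SESSION_PARAMS` (hypothetical; the post doesn't say what the offending site uses):

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical list of query parameters treated as session IDs; the post
# does not say which parameter the offending site actually uses.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid", "jsessionid"}


def normalize_referrer(referrer):
    """Drop session-ID query parameters so repeat visits group together.

    The host is lowercased, the remaining parameters keep their order,
    and any fragment is dropped.
    """
    parts = urlsplit(referrer)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name.lower() not in SESSION_PARAMS
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path, urlencode(kept), "")
    )
```

With this, `...view.php?t=42&sid=abc123` and `...view.php?t=42&sid=def456` both collapse to `...view.php?t=42`. A site that embeds the session ID in the path rather than the query string would need a different rule.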

I've already seen one referrer that I'd guess is "referrer spam". It's a top-level URL for a site with the kind of name you'd expect to see in a spam email (no point in linking to it here). It would be great if there were a shared service for filtering this crap out, like the efforts to block comment spam.

Interesting Searches

Certain web search strings are interesting or just plain funny.

It seems that having the referrers show up on an entry's page makes the entry rank even higher in the results for that search. At the least, it makes the search results page show the referrer more directly. It feels like some kind of feedback loop is going on.

Some searches for a particular artist or album hit my site because of the "Now Playing" section on the right side. If there were an easy way to exclude that section from search crawlers, it'd be nice, because there's really nothing useful for people to find there, and it changes frequently, so whatever their search found is usually gone by the time they arrive. Out of curiosity I followed the Google referrer for "Elfman So Lo mp3" back to the search results. Presumably my site matched because, when Google crawled one particular entry, I had recently listened to this excellent Danny Elfman disc. I wondered why anyone would land on my site from that search: was I really such an authority on this album simply by referencing it? I wasn't on the first page of results, but I checked the top-ranked site, which is bizarre: it appears to contain nothing but the same boilerplate page cloned for a vast number of specific CD albums, each page with identical text describing how to burn a CD. That is, each such page tells you how to burn that particular CD. It looks like a weird way to auto-generate seemingly different AdWords pages.

My description of the mechanism I use for getting my recently played iTunes list into the sidebar in near real-time fails to help the people who got there by searching Google for "iTunes password cracking utility", "ipod crack", or specifically "ipod crack music partition export mp3 file hardware device export".

My recap of a day on the track at Laguna Seca mentioned in passing a couple of folks there who had Miatas, along with the less-than-desired performance of the PCCB brakes on a Porsche GT-2, but that was no help to the person who got there by Googling for something that probably does not exist: "ceramic miata brakes". And the joking reference to my friend Kevin "cheating" by getting an extra day of practice, coupled with a note on the relative cost of a Spec Miata vs. the GT-2's PCCB brakes, won't help the person who searched for "spec miata cheating", whether they were trying to catch a cheater in Spec Miata series racing or hoping to do a little cheating themselves.

However, if you are looking for an example of writing a "C++ Singleton", then as of this writing I am your man, and both Google and Yahoo think so.

Top 10 of 1383 Referrers:
[1252] http://www.alleghanyeda.com
[1197] http://www.amateurvoetbal.net
[350] http://www.protzonbeer.com
[350] http://www.candiria.com
[349] http://www.sbj-broadcasting.com
[349] http://www.conecrusher.org
[349] http://www.edthompson.com
[349] http://www.axionfootwear.com
[262] http://food.95mb.com/dating-agency/index.html
[221] http://www.nawh.org
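A report like the one above amounts to counting each stored referrer and listing the most frequent. Since the top 10 counts alone add up to well over 1383, the 1383 is presumably the number of distinct referrers rather than total hits; the sketch below assumes that reading. `top_referrers` is a hypothetical helper working on a plain list of referrer URLs, not Dynoweb's actual database query.

```python
from collections import Counter


def top_referrers(referrers, n=10):
    """Build a 'Top n of m Referrers' report; m counts distinct URLs."""
    counts = Counter(referrers)
    lines = [f"Top {min(n, len(counts))} of {len(counts)} Referrers:"]
    # most_common sorts by hit count; ties keep first-seen order.
    for url, hits in counts.most_common(n):
        lines.append(f"[{hits}] {url}")
    return "\n".join(lines)
```

Running the referrers through a normalization step first (such as stripping session IDs) would keep one noisy site from splitting into many low-count entries.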