Chapter 18: Tracking Hits with a Database

The first array is for storing the names of the search engines we expect to be sending traffic to the page(s). Of course, you can add to this list. The second array is for the preferred language set in the visitor's browser. It's an associative array, or hash, which makes looking up the name of the language (English-US, for example) easy. The indexes are the language codes (en-us, etc.) that we'll get from the $HTTP_ACCEPT_LANGUAGE environment variable.

Next we'll get the visitor's IP address from the environment and the date from the system clock by using the built-in date() function.

The variable $HTTP_USER_AGENT holds the browser version and the user's operating system (Windows, etc.), separated by a semicolon (;), so we can use that fact to split them up. The stuff we want happens to be the 2nd and 3rd items in the resulting list. I've used ereg_replace() to get rid of the closing ) - just for neatness.
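If you're on a current PHP install, here's a minimal sketch of the steps above. It assumes the $_SERVER superglobal rather than the older $HTTP_* variables, and preg_replace() in place of ereg_replace(), since the ereg functions were removed in PHP 7. The array contents and the parse_user_agent() helper are made up for illustration; they are not taken from the original script.

```php
<?php
// Hypothetical lookup tables; extend both lists to suit your own traffic.
$engines = array('google', 'yahoo', 'msn', 'altavista');
$languages = array(
    'en-us' => 'English-US',
    'en-gb' => 'English-UK',
    'fr'    => 'French',
    'de'    => 'German',
);

// Split a user-agent string on ";" and return the 2nd and 3rd items
// (browser and operating system), minus any closing ")".
function parse_user_agent($agent)
{
    $parts   = explode(';', $agent);
    $browser = isset($parts[1]) ? trim($parts[1]) : 'Unknown';
    $os      = isset($parts[2]) ? trim($parts[2]) : 'Unknown';
    $os      = preg_replace('/\)/', '', $os);
    return array($browser, $os);
}

// On a live page these come from the web server; the fallbacks let the
// sketch run from the command line too.
$ip    = isset($_SERVER['REMOTE_ADDR']) ? $_SERVER['REMOTE_ADDR'] : '0.0.0.0';
$date  = date('Y-m-d H:i:s');
$agent = isset($_SERVER['HTTP_USER_AGENT'])
    ? $_SERVER['HTTP_USER_AGENT']
    : 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)';

list($browser, $os) = parse_user_agent($agent);
```

Keep in mind that the "2nd and 3rd items" trick fits user-agent strings laid out like the sample one. Many newer browsers arrange theirs differently, so expect some odd-looking values in those two fields.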

The variable $HTTP_REFERER holds the address of the page the user came from, so we can look in there to find out which search engine - and which keywords - were used. If this variable is empty, that usually means that the visitor wasn't referred from another site.

To find out which search engine was used - if any - we just loop through the array of (partial) names and look for a match with ereg(). We'll set $engine to "None or Unknown" if we don't get a match on anything in the array.

Next we'll want to know what keywords the visitor typed into the search engine to find this page. Very often, there will be a lot of items in the query string of the referring URL that we just don't care about. These various items will be in key/value pairs, separated by an &. So we split up the referring URL into a list, using the & as our split point. The keywords will be separated by + signs instead of spaces (spaces aren't allowed in a URL).

Now we want to look at the list of things we broke out of that big long URL. The one we want will have some + signs in it, so we search for a match on that. As soon as we find one, we can stop. That's what the break; statement does. When we find our match, we remove everything up to and including "q=". Then we replace the + signs with spaces. Then we remove any %22s, which represent double quotes. They may or may not be there, but there's no point cluttering our database with useless characters. What's left is the exact keywords the user typed in!

We get the visitor's preferred language from the $HTTP_ACCEPT_LANGUAGE variable. There could be more than one, separated by commas, but we'll just take the first one. If you want to capture all the languages the visitor's browser accepts, you can split on the commas and strip out any pieces that look like ";q=0.8". The values for q are usually in decreasing order, telling us the relative preference for each language.

I've mentioned $REQUEST_URI before; it's the path to the page - everything from the first / after the domain name.
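The referrer and language steps described above can be sketched roughly like this. The three function names are hypothetical, and stripos(), preg_replace() and explode() stand in for the ereg() and split() calls mentioned in the text, which no longer exist in PHP 7 and later.

```php
<?php
// Return the first (partial) engine name found in the referrer,
// or "None or Unknown" when nothing matches.
function find_engine($referer, $engines)
{
    foreach ($engines as $name) {
        if (stripos($referer, $name) !== false) {
            return $name;
        }
    }
    return 'None or Unknown';
}

// Pull the search phrase out of a referring URL, following the text:
// split on "&", take the first piece containing "+", drop everything
// up to and including "q=", then tidy up "+" and %22.
function find_keywords($referer)
{
    $keywords = '';
    foreach (explode('&', $referer) as $piece) {
        if (strpos($piece, '+') !== false) {
            $keywords = preg_replace('/^.*q=/', '', $piece);
            $keywords = str_replace('+', ' ', $keywords);
            $keywords = str_replace('%22', '', $keywords);
            break; // the first match is enough
        }
    }
    return $keywords;
}

// First language code in an Accept-Language header, lowercased.
function first_language($header)
{
    $codes = explode(',', $header);
    $first = explode(';', $codes[0]); // drop any ";q=0.8" weight
    return strtolower(trim($first[0]));
}

$referer = 'http://www.google.com/search?hl=en&q=php+hit+counter&btnG=Search';
$engine   = find_engine($referer, array('google', 'yahoo'));
$keywords = find_keywords($referer);
$language = first_language('en-us,en;q=0.8,fr;q=0.5');
```

One thing to watch: a one-word search has no + sign in it, so this approach records nothing for it. Reading the query string with PHP's parse_url() and parse_str() would be a sturdier alternative if that matters to you.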
The home page is an exception to the $REQUEST_URI rule, since no page name has to be specified as part of the URL. I chose to check for this condition and set $uri to "home page" so it would be easy to recognize in the database later on.

Cookies. Ya love 'em or ya hate 'em. Or maybe you don't even know what they are - but you've heard about them so much that you're ready to kill the Pillsbury Doughboy! Well, they can be quite useful - so you need to know about them. Let's start with what they are.

Here's the real deal. Cookies are small text files that some sites ask to store on your computer. They contain certain information that can be read by that site the next time you visit it. One of the useful things they do is make logging into membership sites a lot faster. The cookie might have a code that correlates to your username/password info so you don't have to type it in every time.

I'm using them for a different purpose here. When someone visits one of the pages I want to track, I write the date of the visit into the cookie. Later, when they visit again, I can tell if they're a new visitor or a returning one. I write the date of the return visit into the cookie. Comparing the two dates, I can tell how many days it's been since their last visit. Pretty harmless information, really.

Now here's what the cookie code in this script does exactly. First it reads the cookie values "visit1" and "visit2". Visit1 is stored in the normal date format of YYYY-MM-DD HH:MIN:SEC and visit2 is stored as the system time in seconds (seconds since Jan. 1, 1970). From these two pieces of info, I can calculate the time (in seconds) since the last visit, convert that to days and figure out if this is a new visitor (possibly a "unique" visitor). It only tells me if they've visited the site before - not if they've been to this specific page. Also, since cookies have an expiration date, I calculate a date 90 days from now.
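Here's a rough sketch of that cookie arithmetic. The visit_info() function and its array keys are hypothetical names, and the commented-out setcookie() calls assume that both cookies get rewritten on every visit; the original script may arrange this differently.

```php
<?php
// Work out visit details from the two cookie values described above.
// $visit1: date string of the earlier visit, or null if there is no cookie.
// $visit2: Unix timestamp of that visit, or null if there is no cookie.
// $now:    current Unix timestamp, passed in so the logic is easy to test.
function visit_info($visit1, $visit2, $now)
{
    $is_new = ($visit1 === null || $visit2 === null);
    $days   = $is_new ? 0 : (int) floor(($now - (int) $visit2) / 86400);

    return array(
        'is_new'     => $is_new,
        'last_visit' => $is_new ? null : $visit1,
        'days_since' => $days,
        'expires'    => $now + 90 * 86400, // cookie lifetime: 90 days
    );
}

$now    = time();
$visit1 = isset($_COOKIE['visit1']) ? $_COOKIE['visit1'] : null;
$visit2 = isset($_COOKIE['visit2']) ? $_COOKIE['visit2'] : null;
$info   = visit_info($visit1, $visit2, $now);

// Writing the cookies back must happen before any page output:
// setcookie('visit1', date('Y-m-d H:i:s', $now), $info['expires']);
// setcookie('visit2', (string) $now, $info['expires']);
```

Passing the current time in as a parameter, instead of calling time() inside the function, makes it easy to try the calculation with made-up dates.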
Bottom line is that if someone doesn't return within 90 days, the cookie crumbles...I mean, expires. After that, they won't have a cookie from my site and will appear to be a new first-time visitor. Finally, I set the new values into the cookie - along with the revised expiration date.

The date of last visit, date of this visit and the (calculated) number of days since the last visit are stored in the database. Whether it's a "new" visitor or not is also noted. Later on, I can look for all the database entries that say "user was new" and count how many "unique" visitors there were in, for example, the last 30 days. This is a good thing to know, since it cannot be reliably extracted from just the timestamp and the IP address. That's because many people's IP address is different each time they visit. Nearly all dialup users and a lot of cable and DSL users have many possible IP addresses.

The next chunk of code simply writes all the data I gathered into the database. You've seen that explained before, so I won't go over it again here. The point is that this collection of data can be used to determine total page views (for any or all pages), "unique" visitors, etc. - for any specified time period.

Whew! I'm glad that's over. But now you should be able to see what a handy tool these cookies can be. If you're interested in pursuing this further, try looking in a PHP manual and reading up on "session" cookies. By using that kind of cookie - which has a unique session ID - you can track a visitor's path through your site by following them from page to page.
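To give a feel for the session idea, here's a minimal sketch of recording a visitor's trail. The record_page() helper and the "path" key are invented for this example; on a live page you'd hand it $_SESSION after calling session_start(), as the comments show.

```php
<?php
// Append the current page to the visitor's trail and return the trail.
// $session is passed by reference so $_SESSION can be handed in directly.
function record_page(array &$session, $uri)
{
    if (!isset($session['path'])) {
        $session['path'] = array();
    }
    $session['path'][] = $uri;
    return $session['path'];
}

// On a live page, start the session before any output and then call:
//   session_start();
//   record_page($_SESSION, $_SERVER['REQUEST_URI']);
// Here a plain array stands in for $_SESSION.
$demo = array();
record_page($demo, '/index.php');
record_page($demo, '/order.php');
$trail = record_page($demo, '/thankyou.php');

echo implode(' > ', $trail), "\n"; // /index.php > /order.php > /thankyou.php
```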
From that page-by-page trail you can tell which visitors ended up on the thankyou page and how they got there. Now you know which "hits" actually turned into sales! Based on how they got to your site in the first place, you can tell what routes (search engines, AdWords ads, etc.) were most effective in terms of putting cash in your pocket. You can even see if someone reached your thankyou (or download) page without paying for your product. Hmmm....

Edit the values for $db_name, $db_host, $db_user and $db_pass as needed. Upload this script to the same folder where you keep your web pages.

Using what you learned from Chapter 14, you can make a script that will show you whatever results you want to see. Suppose you wanted to see the hits on page5.php for the month of May, 2004. You'd use a SELECT statement with a WHERE clause that filtered the data and returned only the results where the "date" field contained "May" and "2004" and the "uri" field contained "page5.php". That new script first displays a form where you'd select page name, month and year. When the button is clicked, it plugs those values into variables and creates the SELECT statement. Then it queries the database to get the results that match your search. Finally it displays the results in an HTML page, most likely in the form of tables.

But how? Actually that part is simple, too. First you make an HTML table and use the first row for your column headings. Create one row with examples of the data you want to display. Then replace the actual values with variables ($hits, $keywords, $search_engine, etc.). Then, while the script is pulling out results, display each result by adding a row to the table.

Use your imagination! You can make each keyword or keyword phrase an element of an array and store the number of hits on each keyword in the array, too. Then you can display hit counts per keyword. You can make another table to store user data (name, email, etc.) for a download page. Another table might be your actual sales data.
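The report script described above (form values into a SELECT, results into table rows) might be sketched along these lines. The table name "hits", the "engine" and "keywords" columns, and a "date" column stored as YYYY-MM-DD text are all assumptions, as are the two function names. The commented usage assumes a PDO connection in $db.

```php
<?php
// Build a parameterised SELECT for one page, month and year.
// Table and column names here are assumptions, not the real schema.
function build_report_query($page, $month, $year)
{
    $sql = 'SELECT date, uri, engine, keywords FROM hits '
         . 'WHERE uri LIKE ? AND date LIKE ? ORDER BY date';
    $params = array(
        '%' . $page . '%',
        sprintf('%04d-%02d-%%', $year, $month), // e.g. "2004-05-%"
    );
    return array($sql, $params);
}

// Turn result rows into an HTML table: one heading row, then one row
// per result, escaping every value on the way out.
function render_table(array $headings, array $rows)
{
    $html = "<table>\n<tr>";
    foreach ($headings as $h) {
        $html .= '<th>' . htmlspecialchars($h) . '</th>';
    }
    $html .= "</tr>\n";
    foreach ($rows as $row) {
        $html .= '<tr>';
        foreach ($row as $value) {
            $html .= '<td>' . htmlspecialchars((string) $value) . '</td>';
        }
        $html .= "</tr>\n";
    }
    return $html . "</table>\n";
}

// With a PDO connection in $db, the pieces combine like this:
//   list($sql, $params) = build_report_query('page5.php', 5, 2004);
//   $stmt = $db->prepare($sql);
//   $stmt->execute($params);
//   echo render_table(array('Date', 'Page', 'Engine', 'Keywords'),
//                     $stmt->fetchAll(PDO::FETCH_NUM));
```

The ? placeholders keep whatever was typed into the form out of the SQL text itself, which is a safer habit than pasting variables straight into the query string.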
Suppose you gave people a trial version of your product and you wanted to know what percentage of them eventually bought the paid version? Easy as pie, my friends. Use SELECT to get the records where name and email match in both the "free_downloads" table and the "customer" table. Count the number of matches; it only takes one line of code. Divide sales by downloads and you have your answer! PHP can do the math for you so you don't have to dig out your calculator.

The free script Website Toll Booth does some of this for you - but it keeps its records in a text file. I plan to write one very soon that uses a MySQL database and the method I just described to make the reports easier to read, more customizable and targeted to the exact question you want answered in a flash. If you have feature suggestions or an idea for a name to give the new software...feel free to email me about it.
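As a rough sketch of the trial-to-paid calculation mentioned above: the table names come from the text, but the "name" and "email" column names, the $match_sql variable and the conversion_rate() function are assumptions made for this example.

```php
<?php
// SQL for counting people who appear in both tables.
$match_sql = 'SELECT COUNT(*) FROM free_downloads f '
           . 'JOIN customer c ON c.name = f.name AND c.email = f.email';

// Percentage of trial downloaders who went on to buy, to one decimal.
function conversion_rate($buyers, $downloads)
{
    if ($downloads <= 0) {
        return 0.0; // avoid dividing by zero when nobody has downloaded yet
    }
    return round($buyers / $downloads * 100, 1);
}

// Made-up numbers: 12 matching buyers out of 150 trial downloads.
$rate = conversion_rate(12, 150);
```

You'd feed the count returned by the query in as $buyers and the total number of rows in "free_downloads" as $downloads.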
Copyright © 2004 Steve Humphrey