It is no secret that the Web experience has evolved to include a great deal of tracking by various (third-party) sites to classify each visitor. The most obvious reason for tracking and classification is to target ads to the appropriate audience. However, it is also apparent that tracking can be used by government agencies to detect perceived threats and by criminals and hackers to steal, stalk, and create havoc.
The bottom line is that if dozens or even hundreds of unknown sites are tracking you in the shadows of the Internet, then you have lost control of your privacy and maybe your security, too.
How thoroughly you can be identified depends on the information you share. A third-party site can know a great deal about you beyond the sites you visit and what you do there, particularly when that third party is also a site you have used directly at some point (e.g., google.com).
In that case, a site can build an extensive profile of you, including your background, associates, habits, likes and dislikes, and maybe even your intentions.
In response to pervasive, undesirable tracking on the Internet, a typical modern browser includes an option that asks sites not to track its user. If the Do Not Track (DNT) feature is enabled, every HTTP request the browser sends includes a header field (DNT: 1) asking the web server not to track the user.
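On the wire, DNT is nothing more than one extra request header. The sketch below uses Python's standard `urllib.request` to build (but not send) a request that carries `DNT: 1`, the same header Firefox adds when the preference is enabled. The helper name `build_dnt_request` and the User-Agent string are hypothetical.

```python
import urllib.request


def build_dnt_request(url: str, dnt_enabled: bool = True) -> urllib.request.Request:
    """Build a GET request that optionally carries the Do Not Track header.

    The request is only constructed here; nothing is sent over the network.
    """
    headers = {"User-Agent": "Mozilla/5.0 (example)"}  # hypothetical UA string
    if dnt_enabled:
        # "1" means the user does not want to be tracked.
        headers["DNT"] = "1"
    return urllib.request.Request(url, headers=headers, method="GET")


req = build_dnt_request("http://www.whitehouse.gov/")
# urllib normalizes header names with str.capitalize(), so "DNT" is stored as "Dnt".
print(req.get_header("Dnt"))  # prints 1
print(req.host)               # prints www.whitehouse.gov
```

Passing `req` to `urllib.request.urlopen` would transmit the header, but nothing obliges the server to act on it.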
A typical exchange of a request and response that includes the DNT option is shown below (note that content removed is denoted with an ellipsis '...').
An example request from the browser:

    GET / HTTP/1.1
    Host: www.whitehouse.gov
    User-Agent: Mozilla/5.0 ... Firefox/29.0
    ...
    DNT: 1
    ...

The response from the server:

    HTTP/1.0 200 OK
    Date: Wed, 04 Jun 2014 12:40:23 GMT
    Server: WSGIServer/0.1 Python/2.7.3
    ...
    Set-Cookie: webcookie=randomstring; expires=Wed, 03-Jun-2015 12:40:23 GMT; Max-Age=31449600; Path=/

    <!DOCTYPE html>
    <html>
    <head>
    ...
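The response deserves a closer look: the server answers a request carrying `DNT: 1` by setting a cookie. The sketch below uses Python's standard `http.cookies.SimpleCookie` to parse that `Set-Cookie` value and compute its lifetime. It only restates what the header already says: a Max-Age of 31449600 seconds, i.e., 364 days, consistent with the expires date.

```python
from http.cookies import SimpleCookie

# Set-Cookie value copied from the example response above.
SET_COOKIE = (
    "webcookie=randomstring; expires=Wed, 03-Jun-2015 12:40:23 GMT; "
    "Max-Age=31449600; Path=/"
)

cookie = SimpleCookie()
cookie.load(SET_COOKIE)
morsel = cookie["webcookie"]

# SimpleCookie stores attribute names in lower case.
max_age_seconds = int(morsel["max-age"])
lifetime_days = max_age_seconds / 86400  # 86400 seconds per day

print(morsel.value)    # randomstring
print(morsel["path"])  # /
print(lifetime_days)   # 364.0
```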
DNT may seem like a good solution, but testing shows it's about as effective as wearing a "don't rob" sign while counting cash in public.
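Because DNT is only a request, any effect it has depends on code the site operator chooses to write. As a hypothetical sketch (not how any of the sites tested here behave), a Python WSGI application sees the header as `HTTP_DNT` in its environ and could decline to set a tracking cookie when it equals "1". The app name, cookie, and test harness below are illustrative assumptions; `wsgiref.util.setup_testing_defaults` only fills in a minimal test environ.

```python
from wsgiref.util import setup_testing_defaults


def tracking_aware_app(environ, start_response):
    """Hypothetical WSGI app that sets a tracking cookie only when DNT is not "1"."""
    headers = [("Content-Type", "text/html; charset=utf-8")]
    # WSGI exposes the DNT request header under the key HTTP_DNT.
    if environ.get("HTTP_DNT") != "1":
        headers.append(("Set-Cookie", "webcookie=randomstring; Max-Age=31449600; Path=/"))
    start_response("200 OK", headers)
    return [b"<!DOCTYPE html><html><head></head><body></body></html>"]


def run(app, dnt=None):
    """Call a WSGI app with a minimal test environ and return its response headers."""
    environ = {}
    setup_testing_defaults(environ)
    if dnt is not None:
        environ["HTTP_DNT"] = dnt
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    b"".join(app(environ, start_response))  # consume the response body
    return captured["headers"]


print(any(name == "Set-Cookie" for name, _ in run(tracking_aware_app)))           # True
print(any(name == "Set-Cookie" for name, _ in run(tracking_aware_app, dnt="1")))  # False
```

The check costs one line; whether a site includes it is entirely up to the site.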
Third-Party Testing
Our testing consists of the following:
- Measure how many third-party sites track us by using the Firefox Lightbeam add-on while visiting seven sites in a fixed (repeatable) order.
- Visit the seven sites first with Don't Track (DNT) disabled and then again with DNT enabled, and record and compare the results. Use both the Lightbeam data and our local Firefox cookie database to determine the impact of the Don't Track option.
- If the Don't Track option proves NOT to be effective, use our Linux network controller as a content filter to block requests to third-party tracking sites.
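The blocking step in the last item comes down to deciding, for each requested hostname, whether it belongs to a known tracking domain. Below is a minimal Python sketch of that decision as a domain-suffix match. It is not the configuration of our Linux network controller, and the blocklist entries are hypothetical placeholders.

```python
from __future__ import annotations


def is_blocked(hostname: str, blocklist: set[str]) -> bool:
    """Return True if hostname or any of its parent domains is in the blocklist.

    With "tracker.example" blocked, "pixel.tracker.example" is blocked too,
    but "nottracker.example" is not (matching is per label, not per substring).
    """
    labels = hostname.lower().rstrip(".").split(".")
    # Try "a.b.example.com", then "b.example.com", "example.com", "com".
    for i in range(len(labels)):
        if ".".join(labels[i:]) in blocklist:
            return True
    return False


# Hypothetical blocklist entries; a real filter would load a maintained list.
BLOCKLIST = {"tracker.example", "ads.example"}

for host in ["pixel.tracker.example", "nottracker.example", "www.wsj.com"]:
    print(host, is_blocked(host, BLOCKLIST))
```

Matching whole labels rather than substrings avoids blocking unrelated domains that merely contain a tracker's name.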
The seven sites we chose to visit, along with the tracking results, are shown below in Tables 1 and 2. Table 1 summarizes the number of third-party sites reported by Lightbeam, and Table 2 summarizes the number of third-party cookies set in our browser. Each cell has the form independent/cumulative: the number of third parties detected when visiting that site on its own, followed by the running total after visiting the sites in order up to and including that one.
For example, visiting weather.com independently results in 43 third-party sites being detected, and visiting wsj.com independently results in connections to 75 third parties. However, only 90 third-party sites in total are reported after visiting weather.com and then wsj.com. The difference between the sum of the independent results and the cumulative result (43 + 75 - 90 = 28) reflects the extent of cross-site tracking: 28 third-party sites are shared between the two domains, and many of them are third-party tracking sites.
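The overlap calculation above is inclusion-exclusion: |A ∪ B| = |A| + |B| - |A ∩ B|, so |A ∩ B| = |A| + |B| - |A ∪ B|. A minimal Python sketch follows; it assumes each independent count is exactly the set of third parties seen in the cumulative run, which, given the trial-to-trial variation noted later, makes the result an estimate. The small sets are hypothetical.

```python
def shared_third_parties(independent_a: int, independent_b: int, cumulative: int) -> int:
    """Estimate how many third parties two sites share.

    From |A union B| = |A| + |B| - |A intersect B| it follows that
    |A intersect B| = |A| + |B| - |A union B|.
    """
    return independent_a + independent_b - cumulative


print(shared_third_parties(43, 75, 90))  # 28 (weather.com + wsj.com)

# The same identity with explicit (hypothetical) sets of third-party domains.
a = {"t1.example", "t2.example", "t3.example"}
b = {"t2.example", "t3.example", "t4.example"}
print(len(a & b) == len(a) + len(b) - len(a | b))  # True (2 == 3 + 3 - 4)
```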
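Table 1. Third-party sites reported by Lightbeam (independent/cumulative)

| Web site | No Preference | Don't Track |
|---|---|---|

Table 2. Third-party cookies set in the browser (independent/cumulative)

| Web site | No Preference | Don't Track |
|---|---|---|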
Also, for this particular example (wsj.com + weather.com), the Lightbeam graph is provided below and shows how third-party sites track users across both weather.com and wsj.com. Lightbeam is an amazing tool and conveys its data in both list and graph format. In this graph depiction, circles are the primary sites we visit (e.g., wsj.com), and triangles are the third-party sites that we unintentionally visit. Note that not all third-party sites are shown in the graphical display.
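The graph Lightbeam draws is essentially a bipartite structure: edges run from the primary sites (circles) to the third parties they trigger (triangles), and a third party connected to more than one circle can follow us across those sites. The Python sketch below builds that mapping from a list of edges and picks out the cross-site third parties. It does not read Lightbeam's own data format, and the edge list is hypothetical.

```python
from collections import defaultdict


def cross_site_trackers(connections):
    """Return third parties contacted from more than one first-party site.

    `connections` is an iterable of (first_party, third_party) pairs,
    i.e., the edges of a Lightbeam-style graph.
    """
    seen_on = defaultdict(set)
    for first_party, third_party in connections:
        seen_on[third_party].add(first_party)
    return {tp: sites for tp, sites in seen_on.items() if len(sites) > 1}


# Hypothetical edges; the third-party names are placeholders.
edges = [
    ("weather.com", "tracker-a.example"),
    ("weather.com", "cdn-b.example"),
    ("wsj.com", "tracker-a.example"),
    ("wsj.com", "analytics-c.example"),
]
for tp, sites in sorted(cross_site_trackers(edges).items()):
    print(tp, sorted(sites))  # tracker-a.example ['weather.com', 'wsj.com']
```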
Returning to Tables 1 and 2, we can see that setting the "Don't Track" preference had virtually no impact on the cumulative number of third-party sites tracking us or on the number of third-party cookies being set. For all we know, third-party sites may even be using the "Don't Track" preference to classify us further.
The bottom line is that after visiting seven sites, we have nearly 150 third-party sites potentially tracking us along with 250 third-party cookies regardless of our Don't Track preference.
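Counting third-party cookies from our local Firefox cookie database can be approximated by comparing each cookie's host with the sites we deliberately visited. The Python sketch below assumes a Firefox-style `moz_cookies` table with a `host` column and uses a simplified notion of "third party" (any host that is not a visited domain or a subdomain of one); the function name and demo rows are hypothetical, and the demo uses an in-memory table rather than a real profile.

```python
from __future__ import annotations

import sqlite3


def count_third_party_cookies(conn: sqlite3.Connection, first_party: set[str]) -> int:
    """Count cookies whose host is not one of the visited (first-party) domains.

    Assumes a Firefox-style moz_cookies table with a `host` column. Domain
    cookies may be stored with a leading dot (".example.com"), which is stripped.
    """
    count = 0
    for (host,) in conn.execute("SELECT host FROM moz_cookies"):
        host = host.lstrip(".").lower()
        # First-party if the host is a visited domain or a subdomain of one.
        if not any(host == d or host.endswith("." + d) for d in first_party):
            count += 1
    return count


# Demo on an in-memory table with the assumed schema and hypothetical rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE moz_cookies (id INTEGER PRIMARY KEY, host TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO moz_cookies (host, name) VALUES (?, ?)",
    [
        (".wsj.com", "session"),
        ("www.weather.com", "prefs"),
        (".tracker.example", "uid"),
        (".ads.example", "id"),
    ],
)
print(count_third_party_cookies(conn, {"wsj.com", "weather.com"}))  # 2
```

On a real profile it is safer to open a copy of `cookies.sqlite`, since Firefox keeps the file open while it runs.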
To further illustrate the results, Figures 2 and 3 below show the Lightbeam graphs at the end of visiting our seven sites with and without the "Don't Track" preference set.
And before we go on to discuss cookies and then filtering & blocking at the network level, it's important to point out a few things:
- Not all cookies are bad. They enable your content provider to do useful things like remember your state and that you have logged in (authenticated yourself).
- The general results provided here should be repeatable, but in repeated trials we almost never obtained exactly the same number of tracking sites or cookies. It appears that most sites and/or ad engines change their ads frequently.
- Both weather.com and maximintegrated.com repeatedly showed fewer third-party connections and cookies with Don't Track set.
- Some sites can exhibit wild swings from visit to visit. For example, visiting maxim.com triggered over 100 third-party connections during one independent visit.
- An independent visit implies that the browser's history was cleared of everything and the Lightbeam data was reset before visiting the site and collecting the data.
- There are other ways for a website to track its users, such as recording the visitor's IP address. However, the IP address is a weak identifier: in most homes and businesses, multiple browsers connect through the same IP address behind a firewall, and most home users will see their IP address change periodically.