Mind Chasers Inc.

Don't Track Don't Work?

Testing shows that Don't Track doesn't work. However, identifying tracking sites and blocking them does...

Background

It is no secret that the Web experience has evolved to include a great deal of tracking by various (third-party) sites to classify each visitor. The most obvious reason for tracking and classification is to target ads to the appropriate audience. However, it is also apparent that tracking can be used by government agencies to detect perceived threats and by criminals and hackers to steal, stalk, and create havoc.

The bottom line is that if dozens or even hundreds of unknown sites are tracking you in the shadows of the Internet, then you have lost control of your privacy and maybe your security, too.

The bread and butter of tracking is the use of cookies. These are small pieces of data that are written to your browser (e.g., Firefox, Chrome, Safari, etc.) by the sites you visit. A third-party site is one you unintentionally visit because the primary site embedded it, typically through an ad. And a third-party cookie is one that is written by the third-party site. Tracking occurs when a third-party site (or family of sites) is present across the multiple sites that you intentionally visit. A third-party site tracks you by identifying your unique cookie(s) as you travel the Internet via your browser.
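To make the mechanism concrete, here is a minimal sketch of a third-party server that is embedded on several primary sites. The class and method names are hypothetical and no real ad network is modeled; the point is only that one cookie value, sent back on every embedded request, is enough to link visits across sites.

```python
import uuid
from collections import defaultdict


class ThirdPartyTracker:
    """Toy model of a third-party server embedded on many primary sites."""

    def __init__(self):
        # cookie value -> list of primary sites where that cookie was seen
        self.history = defaultdict(list)

    def handle_request(self, primary_site, cookie=None):
        """Record the visit and return the cookie the browser should store."""
        if cookie is None:
            # First contact: mint a unique ID, as a Set-Cookie header would.
            cookie = uuid.uuid4().hex
        self.history[cookie].append(primary_site)
        return cookie


tracker = ThirdPartyTracker()
cookie = tracker.handle_request("weather.com")      # first visit sets the cookie
cookie = tracker.handle_request("wsj.com", cookie)  # the browser sends it back
print(tracker.history[cookie])
```

After the second request, the tracker's history for that one cookie lists both primary sites, even though the user never chose to visit the tracker itself.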

The extent of identification depends on the information you share. It's indeed possible that a third-party site can know a great deal about you in addition to the sites you visit and what you do while there. And sometimes that third party was at one point the primary site (e.g., google.com).

In this case, a site can build an extensive classification of you including your background, associates, habits, likes & dislikes, and maybe even your intentions.

In response to the state of the Internet and its condition of pervasive undesirable tracking, a typical modern browser includes an option to instruct sites to not track its particular user. If the Do Not Track (DNT) feature is enabled, each HTTP web request sent via the browser includes a field that directs the web server to not track its user.

A typical exchange of a request and response that includes the DNT option is shown below (note that content removed is denoted with an ellipsis '...').

An example Request from the browser:

GET / HTTP/1.1
Host: www.whitehouse.gov
User-Agent: Mozilla/5.0 ... Firefox/29.0
...
DNT:1
...

Response:

HTTP/1.0 200 OK
Date: Wed, 04 Jun 2014 12:40:23 GMT
Server: WSGIServer/0.1 Python/2.7.3
...
Set-Cookie:  webcookie=randomstring; expires=Wed, 03-Jun-2015 12:40:23 GMT; Max-Age=31449600; Path=/

	
<!DOCTYPE html>
<html>
<head>
...

DNT may seem like a good solution, but testing shows it's about as effective as wearing a "don't rob me" sign while counting cash in public.
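The exchange above can be checked mechanically. The following sketch, with hypothetical function names and abbreviated copies of the headers shown earlier, parses a raw request and response and reports whether a cookie was set even though the request carried DNT: 1. It makes no network calls and is an illustration rather than a complete HTTP parser.

```python
def parse_headers(message):
    """Return a list of (lowercased name, value) pairs from a raw HTTP message."""
    lines = message.strip().splitlines()
    headers = []
    for line in lines[1:]:          # skip the request or status line
        if not line.strip():
            break                   # a blank line ends the header block
        name, sep, value = line.partition(":")
        if sep:
            headers.append((name.strip().lower(), value.strip()))
    return headers


def cookie_set_despite_dnt(request, response):
    """True if the request sent DNT: 1 and the response still set a cookie."""
    dnt_on = ("dnt", "1") in parse_headers(request)
    sets_cookie = any(name == "set-cookie" for name, _ in parse_headers(response))
    return dnt_on and sets_cookie


request = "GET / HTTP/1.1\nHost: www.whitehouse.gov\nDNT:1\n"
response = (
    "HTTP/1.0 200 OK\n"
    "Date: Wed, 04 Jun 2014 12:40:23 GMT\n"
    "Set-Cookie: webcookie=randomstring; Max-Age=31449600; Path=/\n"
    "\n"
    "<!DOCTYPE html>\n"
)
print(cookie_set_despite_dnt(request, response))
```

Note that a Set-Cookie header alone does not prove tracking, since cookies also carry login and session state; the check only shows that the DNT field did not stop the cookie from being set.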

Third Party Testing

Our testing consists of the following:

  1. Identify the extent of sites tracking us by using the Firefox Lightbeam plugin as we visit seven sites in a particular (repeatable) order.
  2. First visit the seven sites with Don't Track (DNT) disabled, then visit them again with DNT enabled, and record & compare the results. Use both the Lightbeam data and our local Firefox cookie database to determine the impact of the Don't Track option.
  3. If the Don't Track option is found NOT to be effective, then utilize a proxy and content filter to block requests to third-party tracking sites.

The seven sites we choose to visit along with tracking results are shown below in Tables 1 and 2. Table 1 summarizes the data reported by Lightbeam, and Table 2 summarizes the number of third-party cookies set in our browser. Each column indicates the number of third parties detected as an independent result and also the cumulative result: independent/cumulative.

For example, visiting weather.com independently results in 43 third-party sites being detected. Visiting wsj.com independently results in connecting with 75 third parties. However, there are only a total of 90 third-party sites reported after visiting weather.com and then wsj.com. The difference between the sum of the independent results and the cumulative result (43+75-90) is correlated with the extent of third-party tracking: 28 sites are shared between the two domains, and many of them are third-party tracking sites.
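The arithmetic in this example is the inclusion-exclusion rule for two sets: the overlap equals the sum of the two sizes minus the size of the union. A short sketch follows; the function name is ours and the domain names in the set example are made-up placeholders, not domains from our data.

```python
def shared_third_parties(independent_a, independent_b, cumulative):
    """Overlap of two sets given their sizes and the size of their union."""
    return independent_a + independent_b - cumulative


# Counts reported for weather.com, wsj.com, and both visited in sequence.
print(shared_third_parties(43, 75, 90))

# The same identity on explicit (placeholder) sets of third-party domains.
site_a = {"ads.example", "cdn.example", "metrics.example"}
site_b = {"ads.example", "metrics.example", "video.example"}
overlap = shared_third_parties(len(site_a), len(site_b), len(site_a | site_b))
print(overlap == len(site_a & site_b))
```

Because the independent and cumulative counts come from separate browsing sessions, and ad content changes between sessions, the result is best read as an estimate of the overlap rather than an exact count.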

Table 1. Third Party Sites Detected by Lightbeam
#  Web site             No Preference  Don't Track
1  weather.com          43/43          30/30
2  wsj.com              75/90          67/100
3  cnn.com              26/97          27/109
4  maxim.com            14/112         37/123
5  maximintegrated.com  14/117         6/126
6  disney.com           17/125         16/131
7  foxnews.com          49/146         41/153

Table 2. Third Party Cookies
#  Web site             No Preference  Don't Track
1  weather.com          38/38          15/15
2  wsj.com              139/155        132/142
3  cnn.com              18/164         18/191
4  maxim.com            14/195         67/217
5  maximintegrated.com  31/208         7/222
6  disney.com           12/215         11/228
7  foxnews.com          58/257         35/267

Also, for this particular example (wsj.com + weather.com), the Lightbeam graph is provided below and shows how third-party sites track users across both weather.com and wsj.com. Lightbeam is an amazing tool and conveys its data in both list and graph format. In this graph depiction, circles are the primary sites we visit (e.g., wsj.com), and triangles are the third-party sites that we unintentionally visit. Note that not all third-party sites are shown in the graphical display.

Figure 1. Lightbeam Graph after visiting wsj.com and weather.com

Returning to Tables 1 and 2, we can see that setting the "Don't Track" preference had virtually no impact on the cumulative number of third-party sites tracking us or on the number of third-party cookies being set. For all we know, third-party sites are even using the "Don't Track" preference to classify us further.

The bottom line is that after visiting seven sites, we have roughly 150 third-party sites potentially tracking us along with more than 250 third-party cookies, regardless of our Don't Track preference.
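To put a number on "virtually no impact", the final cumulative totals from Tables 1 and 2 can be compared directly. The helper below is a hypothetical name of ours; the inputs are the foxnews.com cumulative values from the two tables.

```python
def relative_change(no_preference, dont_track):
    """Fractional change in a total when Don't Track is enabled."""
    return (dont_track - no_preference) / no_preference


sites = relative_change(146, 153)    # Table 1, final cumulative values
cookies = relative_change(257, 267)  # Table 2, final cumulative values
print(f"third-party sites:   {sites:+.1%}")
print(f"third-party cookies: {cookies:+.1%}")
```

In this particular run both totals were a few percent higher with Don't Track set, which is within the trial-to-trial variation we observed and certainly not a reduction.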

To further illustrate the results, Figures 2 and 3 below show the Lightbeam graphs at the end of visiting our seven sites with and without the "Don't Track" preference set.

Figure 2. Lightbeam Graph for Seven Sites with No Tracking Preference Set
Figure 3. Lightbeam Graph for Seven Sites with Don't Track Set (DNT=1)

And before we go on to discuss cookies and then filtering & blocking at the network level, it's important to point out a few things:

  • Not all cookies are bad. They enable your content provider to do useful things like remember your state and that you have logged in (authenticated yourself).
  • The results provided here should be repeatable in general, but repeated trials are extremely unlikely to yield exactly the same number of tracking sites or cookies. It appears that most sites and/or ad engines change their particular ads on a frequent basis.
  • Both weather.com and maximintegrated.com did exhibit repeatedly a lower number of third-party connections and cookies with Don't Track set.
  • Some sites can exhibit wild swings from visit to visit. For example, visiting maxim.com triggered over 100 third-party connections during one independent visit.
  • An independent visit implies that the browser's history was cleared of everything and the Lightbeam data was reset before visiting the site and collecting the data.
  • There are other ways for a web site to track its users, including tracking the IP address of the visitor and a signature conveyed in request headers (e.g., User-Agent).
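The last point deserves a brief illustration. The sketch below shows, under our own assumptions, how a server could derive a stable identifier from nothing but the visitor's IP address and User-Agent header, with no cookie involved. The function name, the choice of SHA-256, and the sample address are illustrative only; real fingerprinting schemes typically combine many more signals.

```python
import hashlib


def passive_fingerprint(ip_address, user_agent):
    """Derive a short, repeatable identifier from request metadata alone."""
    material = f"{ip_address}|{user_agent}".encode("utf-8")
    return hashlib.sha256(material).hexdigest()[:16]


# 203.0.113.7 is a documentation-only address; the User-Agent is abbreviated.
first = passive_fingerprint("203.0.113.7", "Mozilla/5.0 Firefox/29.0")
second = passive_fingerprint("203.0.113.7", "Mozilla/5.0 Firefox/29.0")
other = passive_fingerprint("203.0.113.7", "Mozilla/5.0 Chrome/35.0")
print(first == second, first == other)
```

The same inputs always produce the same identifier, so clearing cookies does nothing to reset it.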

Third Party Cookies

The previous section mainly focused on Lightbeam data. This section focuses on the cookies themselves. For convenience, we again show Table 2.

Table 2. Third Party Cookies
#  Web site             No Preference  Don't Track
1  weather.com          38/38          15/15
2  wsj.com              139/155        132/142
3  cnn.com              18/164         18/191
4  maxim.com            14/195         67/217
5  maximintegrated.com  31/208         7/222
6  disney.com           12/215         11/228
7  foxnews.com          58/257         35/267

The first number in each column is the number of third-party cookies set for an independent visit, and the second number is the cumulative number of third-party cookies set: [ independent / cumulative ]. The extent of tracking is correlated with the difference between the sum of the independent results and the cumulative result. For weather.com and wsj.com, the difference is 139+38-155=22. In other words, 22 cookies have been set that are shared between the two sites (with no tracking preference set).

The overall result is that regardless of our Don't Track preference, at the end of visiting our seven sites, we have over 250 third-party cookies set in our browser.

There are multiple ways to determine the cookies being set in the browser by third parties. For our study, we have chosen to work directly with the Mozilla cookie database: the moz_cookies table in the cookies.sqlite file. This file can be found within the ~/.mozilla/firefox folder on Linux and ~/Library/Application\ Support/Firefox/ on macOS. The file can be read using the sqlite3 command-line tool.

Before accessing the cookies.sqlite file, we quit Firefox and copy the cookies.sqlite file to our local working directory. The example below shows how to write out the third-party cookies to a csv file after visiting weather.com and wsj.com.

	$ sqlite3 cookies.sqlite
	
	SQLite version 3.7.9 2011-11-01 00:52:41
	...
	sqlite> .mode csv
	sqlite> .output 2_t.csv
	sqlite> select * from moz_cookies where baseDomain!='weather.com' and baseDomain!='wsj.com';

If we had only wanted to display the number of third-party cookies in the moz_cookies table, then we could have replaced select * with select count(*):

	$ sqlite3 2_t.sqlite
	
	SQLite version 3.7.9 2011-11-01 00:52:41
	...
	sqlite> select count(*) from moz_cookies where baseDomain!='weather.com' and baseDomain!='wsj.com';
	155
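The same count can be scripted. The sketch below uses Python's standard sqlite3 module and assumes the moz_cookies schema described in this article, in particular the baseDomain column, which may not exist in other Firefox versions. The function name is ours, and the demo runs against a tiny in-memory table with placeholder rows instead of a real cookies.sqlite file.

```python
import sqlite3


def count_third_party_cookies(conn, first_party_domains):
    """Count cookies whose baseDomain is not one of the sites we visited."""
    placeholders = ",".join("?" for _ in first_party_domains)
    query = (
        "SELECT COUNT(*) FROM moz_cookies "
        f"WHERE baseDomain NOT IN ({placeholders})"
    )
    return conn.execute(query, list(first_party_domains)).fetchone()[0]


# Stand-in for a copy of cookies.sqlite, with placeholder domains.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE moz_cookies (id INTEGER PRIMARY KEY, baseDomain TEXT, name TEXT)"
)
conn.executemany(
    "INSERT INTO moz_cookies (baseDomain, name) VALUES (?, ?)",
    [
        ("weather.com", "session"),
        ("wsj.com", "prefs"),
        ("ads.example", "uid"),
        ("ads.example", "seg"),
        ("metrics.example", "id"),
    ],
)
print(count_third_party_cookies(conn, ["weather.com", "wsj.com"]))
```

To run it on real data, replace the in-memory connection with sqlite3.connect() on a copy of cookies.sqlite made while Firefox is closed.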

The description (schema) of moz_cookies is provided below. For our study, we're mainly interested in the baseDomain column.

	.schema moz_cookies
	CREATE TABLE moz_cookies (
		id INTEGER PRIMARY KEY, 
		baseDomain TEXT, 
		appId INTEGER DEFAULT 0, 
		inBrowserElement INTEGER DEFAULT 0, 
		name TEXT, 
		value TEXT, 
		host TEXT, 
		path TEXT, 
		expiry INTEGER, 
		lastAccessed INTEGER, 
		creationTime INTEGER, 
		isSecure INTEGER, 
		isHttpOnly INTEGER, 
		CONSTRAINT moz_uniqueid UNIQUE (name, host, path, appId, inBrowserElement)
	);
	CREATE INDEX moz_basedomain ON moz_cookies (baseDomain, appId, inBrowserElement);

Below we process our cumulative Don't Track database and output a file of distinct third-party domains.

	sqlite> .output tracking_cookies1.txt
	sqlite> select distinct baseDomain from moz_cookies where baseDomain!='wsj.com'
	   ...> and baseDomain!='weather.com' and baseDomain!='cnn.com' and baseDomain!='maxim.com'
	   ...> and baseDomain!='maximintegrated.com' and baseDomain!='disney.com' and baseDomain!='foxnews.com';

The "tracking_cookies1.txt" file is provided below. Keep in mind that this is a list of each distinct domain that set at least one cookie in our browser while visiting our seven sites. There are 94 domains listed, and the majority of these third-party sites set multiple cookies in our browser.
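To see which third parties set multiple cookies, the distinct-domain query can be extended with a per-domain count. This is a sketch under the same assumption as before, namely that moz_cookies has a baseDomain column; the function name and the sample rows are placeholders and do not come from our results.

```python
import sqlite3


def cookies_per_third_party(conn, first_party_domains):
    """Return (baseDomain, cookie count) rows for third parties, busiest first."""
    placeholders = ",".join("?" for _ in first_party_domains)
    query = (
        "SELECT baseDomain, COUNT(*) AS n FROM moz_cookies "
        f"WHERE baseDomain NOT IN ({placeholders}) "
        "GROUP BY baseDomain ORDER BY n DESC, baseDomain"
    )
    return conn.execute(query, list(first_party_domains)).fetchall()


# Stand-in for a copy of cookies.sqlite, with placeholder domains.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE moz_cookies (id INTEGER PRIMARY KEY, baseDomain TEXT, name TEXT)"
)
conn.executemany(
    "INSERT INTO moz_cookies (baseDomain, name) VALUES (?, ?)",
    [
        ("cnn.com", "session"),
        ("ads.example", "uid"),
        ("ads.example", "seg"),
        ("metrics.example", "id"),
    ],
)
for domain, count in cookies_per_third_party(conn, ["cnn.com"]):
    print(domain, count)
```

The number of rows returned equals the number of distinct third-party domains, so the same query also reproduces the domain count.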

At this point you may be in disbelief that 94 sites that you unintentionally visited set (tracking) cookies in your browser after visiting only seven sites with Don't Track set.

There are solutions to this problem. You can choose to disable cookies entirely; however, you probably won't like the result. In some browsers, including Firefox, you can set a preference to not allow third parties to set cookies. However, keep in mind that these third-party sites can still track you by your IP address. Also, configuring an individual browser isn't a great solution when you frequently surf the Web from multiple machines and browsers.

Content Filtering to Counter Third Party Tracking

So far, we have shown that extensive third-party tracking occurs as we travel across the Internet regardless of how we set the Don't Track preference in our browser. Now armed with a list of domains that potentially act as third-party tracking sites, we turn to a content filter within our network to block third parties from tracking us.

By utilizing the combination of a proxy, content filter, and web server on our network controller, we can locally process and respond to undesired web requests to retrieve third-party content. These third-party requests typically originate from JavaScript that is (subsequently) linked within the original page.

The data flow is depicted in Figure 4. Our local network controller blocks undesired web requests utilizing its database and instructs the local web server to respond with a blocking message rather than forwarding the request onto the Internet. This prevents the third-party site from receiving the request and subsequently setting third-party cookies in our browser.

Figure 4. Content Filter Data Flow Example for Blocking
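The blocking decision in this data flow can be sketched in a few lines. The matching rule used here (block a host if it, or any parent domain, appears in the database) is an assumption on our part, and the function names, return values, and domains are hypothetical; the actual filter on our network controller is not shown.

```python
def is_blocked(host, blocked_domains):
    """True if the host or any of its parent domains is in the blocklist."""
    labels = host.lower().rstrip(".").split(".")
    for i in range(len(labels)):
        # Check "a.b.c", then "b.c", then "c".
        if ".".join(labels[i:]) in blocked_domains:
            return True
    return False


def route_request(host, blocked_domains):
    """Decide whether the proxy answers locally or forwards to the Internet."""
    if is_blocked(host, blocked_domains):
        return "local-block-page"  # served by the local web server
    return "forward"               # passed on to the real site


blocked = {"tracker.example", "ads.example"}  # placeholder database entries
print(route_request("pixel.tracker.example", blocked))
print(route_request("wsj.com", blocked))
```

Matching on whole labels rather than raw string suffixes avoids blocking unrelated hosts whose names merely end with the same characters.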

At this point, we will discontinue using Lightbeam to gather data and just rely on analyzing the cookie database. This is because Lightbeam won't adequately discern our local web server's response from a true third-party response.

For this test, we disable the Don't Track preference in Firefox.

The results using our content filtering database are shown below. We now have 27 third-party sites setting cookies in our browser. Note that a few of them are probably desirable (e.g., turner.com and dowjoneson.com).

As a final step, we will add the majority of this list to a new branch in our content filter database tree and re-visit the seven sites one final time.

The results are shown below. Only eight third parties are now setting cookies in our browser. Remember that we initially had 94 third-party sites setting cookies in our browser.

Final Thoughts

  • Testing shows that third-party tracking is indeed pervasive, and the Don't Track preference appears to show little to no positive effect.
  • Centralized content filtering is an effective way to either drastically reduce or eliminate third-party tracking. However, its success depends on maintaining a current database of third-party tracking sites.
  • Both the Lightbeam plugin and the cookie database are great collections of data to identify the third-party sites performing tracking.
  • Blocking third-party tracking sites does not negatively impact the browsing experience. In fact, it may be more pleasurable since ads are often distracting. Figure 5 shows a screen shot of wsj.com with centralized content filtering enabled.
  • Blocking requests to third-party sites can yield a more responsive experience since the browser doesn't need to make the dozens (if not hundreds) of additional third-party web requests for each page visited.
Figure 5. Screen Shot of wsj.com with Content Filtering Enabled
