5. Longitudinal financial websites. We collected privacy policies from 60 U.S. companies (banks, credit card companies, and randomly selected retailers, as mentioned above). For each company, we collected its privacy policy once a year from 1999 to 2005. The 2005 policies were collected directly from the companies' websites; the policies from 1999 to 2004 were collected from the Internet Archive.²²
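The paper does not say how the yearly archived copies were located. One plausible approach is to build one Wayback Machine lookup URL per year. The Wayback Machine accepts URLs of the form `https://web.archive.org/web/<timestamp>/<original-url>` and redirects a partial timestamp to the nearest capture. The sketch below assumes that pattern. The function name and the mid-year target date are hypothetical choices made for illustration.

```python
from datetime import date


def wayback_snapshot_urls(site, first_year=1999, last_year=2004, month=6, day=1):
    """Return one Wayback Machine lookup URL per year for `site`.

    A partial timestamp (YYYYMMDD) makes the Wayback Machine redirect to the
    capture closest to that date. The June 1 target date is an arbitrary,
    hypothetical choice, not the study's documented procedure.
    """
    urls = []
    for year in range(first_year, last_year + 1):
        stamp = date(year, month, day).strftime("%Y%m%d")
        urls.append(f"https://web.archive.org/web/{stamp}/{site}")
    return urls


for url in wayback_snapshot_urls("http://www.example.com/privacy.html"):
    print(url)
```

Each URL may resolve to a capture months away from the target date. A real collection would record the timestamp of the capture it actually received.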
6. P3P-enabled websites. We used a list of P3P-enabled websites discovered by the Privacy Finder²³ search engine. The search engine maintains a cache of every website returned in response to a user query. As of December 2006, Privacy Finder's cache contained over 150,000 websites, 9,408 of which were P3P-enabled.
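How Privacy Finder decides that a cached site is "P3P-enabled" is not described here. Under P3P 1.0, a site can advertise its policy reference file in three ways:

- at the well-known location `/w3c/p3p.xml`;
- through a `policyref` field in a `P3P` HTTP response header;
- through an HTML `<link rel="P3Pv1">` tag.

The sketch below checks all three. The `fetch` callable is a hypothetical stand-in for an HTTP client, and the regular expressions are simplifications.

```python
import re
from urllib.parse import urljoin

WELL_KNOWN_PATH = "/w3c/p3p.xml"  # well-known location defined by P3P 1.0
POLICYREF_RE = re.compile(r'policyref\s*=\s*"([^"]+)"', re.IGNORECASE)
# Simplification: assumes rel="P3Pv1" appears before href inside the <link> tag.
LINK_RE = re.compile(r'<link[^>]*rel\s*=\s*"P3Pv1"[^>]*href\s*=\s*"([^"]+)"',
                     re.IGNORECASE)


def reference_file_candidates(base_url, headers, html):
    """Collect every URL where the page may point to a P3P policy reference file."""
    candidates = [urljoin(base_url, WELL_KNOWN_PATH)]
    lowered = {name.lower(): value for name, value in headers.items()}
    header_match = POLICYREF_RE.search(lowered.get("p3p", ""))
    if header_match:
        candidates.append(urljoin(base_url, header_match.group(1)))
    link_match = LINK_RE.search(html)
    if link_match:
        candidates.append(urljoin(base_url, link_match.group(1)))
    return candidates


def is_p3p_enabled(base_url, headers, html, fetch):
    """True if any candidate URL returns a policy reference file.

    `fetch` (hypothetical) maps a URL to the response body, or None on failure.
    """
    for url in reference_file_candidates(base_url, headers, html):
        body = fetch(url)
        if body and "<POLICY-REFERENCES" in body:
            return True
    return False


# Usage with an in-memory stand-in for the web.
pages = {"http://shop.example/w3c/p3p.xml":
         "<META><POLICY-REFERENCES></POLICY-REFERENCES></META>"}
print(is_p3p_enabled("http://shop.example/", {}, "", pages.get))  # True
```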
7. Industry-segmented websites. We were able to assign categories to a portion of the websites in the Privacy Finder cache using information from the Yahoo Directory. Using a custom script, we categorized 16,919 of the sites in our cache (about 11% of the cache). Of these, 1,181 (7%) were P3P-enabled. Because this approach yielded a large number of categories, we analyzed only the most popular ones: “shopping,” “government,” “news and media,” “computers,” “banking,” “B2B” (business to business), “adult,” “blogs,” and “education.”
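The per-category analysis amounts to tallying, for each chosen category, how many categorized sites there are and how many of them are P3P-enabled. The sketch below shows that tally on hypothetical toy data; it is not the authors' custom script. It also checks the overall percentages against the reported counts. Because the cache holds *over* 150,000 sites, the 11% figure is an upper bound.

```python
from collections import Counter


def adoption_by_category(site_category, p3p_sites, categories):
    """Per chosen category: (sites categorized, P3P-enabled sites, share)."""
    totals = Counter(site_category.values())
    enabled = Counter(cat for site, cat in site_category.items()
                      if site in p3p_sites)
    # Categories with no categorized sites are skipped to avoid dividing by zero.
    return {cat: (totals[cat], enabled[cat], enabled[cat] / totals[cat])
            for cat in categories if totals[cat]}


# Hypothetical toy data standing in for the directory-derived labels.
site_category = {
    "a.example": "shopping", "b.example": "shopping",
    "c.example": "news and media", "d.example": "games",
}
p3p_sites = {"a.example"}
print(adoption_by_category(site_category, p3p_sites,
                           ["shopping", "news and media", "banking"]))

# Overall shares from the counts reported in the text.
print(f"{16_919 / 150_000:.1%}")  # 11.3%
print(f"{1_181 / 16_919:.1%}")    # 7.0%
```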
8. Typical search terms. We obtained a list of 19,999 unique search terms randomly sampled from a complete weekly log of search queries entered by AOL users in 2005. We received only the search queries themselves, with no information linking the queries to the users who entered them or linking multiple queries together. We consider these to be “typical” search queries. We chose this sample size because it is large enough to yield statistically significant, generalizable results.
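To get a sense of the precision a sample of 19,999 queries offers, one can compute the margin of error for an estimated proportion, such as the fraction of queries that return a P3P-enabled site. The paper does not state this calculation. The sketch assumes:

- a simple random sample;
- a log much larger than the sample, so no finite-population correction;
- the worst case p = 0.5;
- 95% confidence (z = 1.96).

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Half-width of a normal-approximation confidence interval for a proportion.

    p=0.5 is the worst case; z=1.96 corresponds to 95% confidence. No
    finite-population correction is applied (log assumed much larger than n).
    """
    return z * math.sqrt(p * (1 - p) / n)


print(f"n = 19,999: +/- {margin_of_error(19_999):.2%}")  # +/- 0.69%
```

Under these assumptions, any proportion estimated from the sample falls within about 0.7 percentage points of the population value at 95% confidence.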
9. Froogle search terms. We collected search terms from Google's Froogle service.²⁴ Froogle displays a list of 25 recently used search terms. Because Froogle is designed to show products for sale, these terms are generally indicative of e-commerce. Using a Perl script, we screen-scraped these search terms from Froogle, collecting 940 unique terms in this manner.
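Because the page showed only 25 recent terms at a time, collecting 940 unique terms implies scraping many snapshots and removing duplicates. The authors used Perl; the following is a Python sketch of the same idea. The `<a class="recent">` markup, the class names, and the case-insensitive deduplication are all hypothetical assumptions, since the actual page layout is not given.

```python
from html.parser import HTMLParser


class RecentTermsParser(HTMLParser):
    """Collect the text of <a class="recent"> links (hypothetical markup)."""

    def __init__(self):
        super().__init__()
        self.terms = []
        self._in_term = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs.
        if tag == "a" and ("class", "recent") in attrs:
            self._in_term = True

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_term = False

    def handle_data(self, data):
        if self._in_term and data.strip():
            self.terms.append(data.strip())


def collect_unique_terms(snapshots):
    """Merge terms from repeated page snapshots, keeping first-seen spelling."""
    seen = {}
    for html in snapshots:
        parser = RecentTermsParser()
        parser.feed(html)
        parser.close()
        for term in parser.terms:
            seen.setdefault(term.lower(), term)  # case-insensitive dedup (assumed)
    return list(seen.values())


snap1 = '<a class="recent">digital camera</a><a class="recent">ipod</a>'
snap2 = '<a class="recent">iPod</a><a class="recent">tent</a>'
print(collect_unique_terms([snap1, snap2]))
```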
3 Comparison of Popular Websites to Random Websites
We coded the privacy policies of the top 75 websites (“Popular”) and of a random sample of 100 websites (“Random”) into standard P3P format. We checked the policies against the privacy settings used by the Privacy Bird P3P user agent. We also analyzed the policies against sixty-two rule sets developed for a 2003 study.²⁵ We follow an approach similar to that study's and compare our results with it in the section “Analysis of Privacy Protections” on page 11.
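Checking coded policies against a collection of rule sets can be pictured as evaluating each P3P statement's purposes and recipients against a set of flagged values. The sketch below rests on a heavily simplified policy model. Each element carries only its P3P `required` attribute (`always`, `opt-in`, or `opt-out`, with `always` as the default). The classes `Statement` and `RuleSet`, the function `violations`, and the two example rule sets are hypothetical. They are not Privacy Bird's actual settings, not APPEL, and not the 2003 study's rule sets.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    """Simplified P3P STATEMENT: element name -> its "required" attribute."""
    purposes: dict = field(default_factory=dict)
    recipients: dict = field(default_factory=dict)


@dataclass(frozen=True)
class RuleSet:
    """Hypothetical rule set: flag these purposes/recipients unless the user
    is offered an opt-in or opt-out choice."""
    name: str
    purposes: frozenset = frozenset()
    recipients: frozenset = frozenset()


def violations(statements, rule):
    """List (statement index, element kind, element name) that trigger `rule`."""
    found = []
    for i, st in enumerate(statements):
        for name, required in st.purposes.items():
            if name in rule.purposes and required == "always":
                found.append((i, "purpose", name))
        for name, required in st.recipients.items():
            if name in rule.recipients and required == "always":
                found.append((i, "recipient", name))
    return found


# Example policy and two hypothetical rule sets.
policy = [
    Statement(purposes={"current": "always", "telemarketing": "opt-out"},
              recipients={"ours": "always"}),
    Statement(purposes={"contact": "always"},
              recipients={"unrelated": "always"}),
]
rules = [
    RuleSet("no-marketing", purposes=frozenset({"telemarketing", "contact"})),
    RuleSet("no-sharing", recipients=frozenset({"unrelated", "public"})),
]
for rule in rules:
    print(rule.name, violations(policy, rule))
```

Counting how many rule sets each policy triggers, and comparing those counts between the "Popular" and "Random" groups, is one way such a comparison could be summarized.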
3.1 Privacy Bird Settings