In addition, many websites implement P3P and publish their privacy policies directly in P3P form. We used our software to analyze the
P3P policies of approximately 8,000 P3P-enabled websites. Finally, in some cases we had both a website's own P3P policy and
our translation of its human-readable policy into P3P format. We were able to contrast the P3P they provided with the P3P we coded to see whether
their human-readable policies matched their P3P policies.
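As an illustration of this kind of comparison, the following is a minimal sketch and not the software used in this study. It assumes both policies are available as P3P 1.0 XML, and it flattens each policy into a set of PURPOSE, RECIPIENT, and RETENTION values plus DATA references, ignoring which STATEMENT each value belongs to. The function names policy_features and compare_policies are hypothetical.

```python
import xml.etree.ElementTree as ET

# XML namespace defined by the W3C P3P 1.0 specification.
P3P_NS = "{http://www.w3.org/2002/01/P3Pv1}"


def policy_features(xml_text):
    """Flatten a P3P policy into a set of (element, value) pairs.

    Simplifying assumption: values are pooled across all STATEMENT
    elements, so statement-level structure is not compared.
    """
    root = ET.fromstring(xml_text)
    features = set()
    for group in ("PURPOSE", "RECIPIENT", "RETENTION"):
        for parent in root.iter(P3P_NS + group):
            for child in parent:
                # Store "current" rather than "{namespace}current".
                features.add((group, child.tag.replace(P3P_NS, "")))
    for data in root.iter(P3P_NS + "DATA"):
        ref = data.get("ref")
        if ref:
            features.add(("DATA", ref))
    return features


def compare_policies(published_xml, coded_xml):
    """Report where a site's published P3P differs from the hand-coded P3P."""
    published = policy_features(published_xml)
    coded = policy_features(coded_xml)
    return {
        "only_in_published": published - coded,
        "only_in_coded": coded - published,
        "match": published == coded,
    }
```

A non-empty "only_in_coded" set would flag a practice described in the human-readable policy that is absent from the site's published P3P policy, and a non-empty "only_in_published" set would flag the reverse.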
2.2 Data Sets
We used or created the following data sets:
1. AOL search most-clicked domains. We obtained a list of the
30,000 most-clicked domains from America Online (AOL) search results collected during October 2005. This list
included the number of clicks made to each domain during that period. We created two data sets for our study based on this list:
a “Popular” list and a “Random” list, described below.
2. Most popular websites. We selected the most popular
websites from the AOL search data. To make our results comparable to other studies, such as the Milne and Culnan study,[20]
we further refined the list by removing all websites with a top-level domain other than .com,
as well as pornographic websites and websites targeted to children. Of the 75 websites on our Popular
list, 72 had human-readable privacy policies and 21 had both human-readable policies and P3P policies. See Appendix A for
the list of websites we used.
3. Random websites. We again used the AOL search data and
omitted sites with top-level domains other than .com, pornographic sites, and sites targeted to children. Of the top 12,000 most popular websites,
we selected 100 at random. We limited the sampling frame to the top 12,000 in order to be comparable to other studies. Of the 100
websites on our Random list, 78 had human-readable privacy policies and 9 had both human-readable policies and P3P
policies. See Appendix B for the list of websites we used.
4. Financial websites. We compiled a list of the top 10 U.S. banks
with a significant online presence from 1999 to 2005, which we extracted from the list of top 500 depository institutions (SIC
code 602).[21] We also generated a list of 30 randomly selected U.S. banks with a significant online presence. In addition, we
created a dataset of the top 10 credit card issuers. Finally, we selected 10 retail websites at random to serve as a control group.
5. Longitudinal financial websites. We collected privacy