Leaky Websites

This is my second blog post assignment for my Journalism course. As with the first, reposted here because “why not”.

The New York Times’ “Bits” blog published an article last Tuesday that really opened my eyes. The Center for Internet and Society at Stanford Law School released data on what information is passed between certain popular websites.

Long story short, logging in (or even trying and failing to log in) to a site can pass information about you to third parties. That information can be as innocuous (but still trackable) as a “unique identifier” generated by the site or as specific as your email address, username, and real name.

Somini Sengupta (author of the Bits blog post) says:

Take for instance these findings, released on Tuesday by computer scientists at Stanford University. If you type a wrong password into the Web site of The Wall Street Journal, it turns out that your e-mail address quietly slips out to seven unrelated Web sites. Sign on to NBC and, likewise, seven other companies can capture your e-mail address. Click on an ad on HomeDepot.com and your first name and user ID are instantly revealed to 13 other companies.

I did some digging of my own through the Microsoft® Excel® spreadsheet available from the Stanford Law School page (direct link to XLSX file) and found some more interesting examples.

For example, MSN.com leaks your birth year and birthdate to FBCDN.net (a domain owned by Facebook and used for content distribution). Facebook’s CDN can’t possibly need that information for anything but tracking. Take another case: Ask.com sends your username to Google Analytics, reCAPTCHA (owned by Google), ScorecardResearch (part of comScore, Inc.), Gigya (a company that “makes websites social”), Quantserve.com (used by Quantcast, an advertising network), IMRWorldwide.com (controlled by Nielsen), and LinkedIn.

Incredibly, The Huffington Post’s website sends your username to BlogCDN.com (another CDN), BuzzFeed (“Tracks the Web’s Obsessions in Real Time”), AdSonar (owned by Advertising.com; provides targeted text ads), ScorecardResearch, AOL.com (Huffington Post’s owner), FBCDN.net, aolcdn.com (AOL’s CDN), ATWOLA.com (stands for AOL Time Warner Online Advertising; tracks surfing habits), Facebook.com and Facebook.net, Google Analytics, IMRWorldwide.com, Quantserve.com, and HuffPost.com (used for delivering static content without cookies, ironically); your birthday to BuzzFeed and IMRWorldwide.com; and your birth year to Advertising.com and ATWOLA.com.

The point is, any information given to a website as part of the registration process or entered later while updating a profile allows third parties to do just that: profile you as a person through your behavior across countless sites. All this tracking is possible because sites that collaborate can trivially circumvent the “same origin policy” that governs data stored in browser cookies.

A standard feature of Web browsers is sending the address of the last page visited (the “referrer”) to the page being loaded. In the case of images, scripts, or other resources loaded within a page, the referrer is the page in which they are embedded. If the page displaying advertising has personal information embedded in its URL, that information is passed on to any sites whose assets are embedded in the page. This kind of information leakage can be accidental as well as deliberate. Encryption only partly helps: browsers typically withhold the referrer when a secure page (URL beginning with https://) requests something over an unencrypted http:// connection, but they still send it when the embedded resource is itself served over https://.
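
To make the referrer leak concrete, here is a minimal Python sketch of what a third party could pull out of the referrer it receives. Everything specific in it is an assumption for illustration: the page URL is made up, and the `SENSITIVE_KEYS` parameter names are hypothetical guesses at what a site might put in a query string, not names taken from the Stanford data.

```python
from urllib.parse import urlsplit, parse_qsl

# Hypothetical parameter names that might carry personal data.
# These are illustrative guesses, not taken from the Stanford spreadsheet.
SENSITIVE_KEYS = {"email", "username", "user", "first_name", "birth_year"}


def leaked_from_referrer(referrer_url):
    """Return the sensitive query parameters readable from a referrer URL."""
    query = urlsplit(referrer_url).query
    # parse_qsl splits "a=1&b=2" into pairs and decodes percent-escapes.
    return {
        key: value
        for key, value in parse_qsl(query)
        if key.lower() in SENSITIVE_KEYS
    }


# A made-up page URL that embeds profile data in its query string.
page_url = (
    "http://news.example.com/welcome"
    "?username=jdoe&email=jdoe%40example.org&section=tech"
)

# When this page loads an embedded ad or script, the browser sends
# page_url to the third party as the referrer.
print(leaked_from_referrer(page_url))
```

The third party needs no cooperation from the first site here; it only has to read a header the browser sends with every embedded request.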

Websites that want to share user information on purpose have other options. I had written up an example process, but suffice it to say that methods for deliberately sharing information and tracking users across domains, even in spite of user privacy choices like clearing cookies, are numerous.

When information is revealed in the URL, it’s not necessarily intentional. Back in May, Symantec discovered (The Daily Mail reports) that some applications on Facebook’s platform were potentially giving advertisers access to users’ accounts because the app URLs included access tokens, the bits of information older Facebook apps used to identify themselves and connect to users’ accounts. It was just an oversight.

dgw

I am an avid technology and software user, in addition to being reasonably well-versed in CSS, JavaScript, HTML, PHP, Python, and (though it still scares me) Perl. Aside from my technological tendencies, I am also a theatre technician, sound designer, violinist, singer, and actor.
