Leaky Websites
This is my second blog post assignment for my Journalism course. As with the first, reposted here because "why not".
The New York Times' "Bits" blog published an article last Tuesday that really opened my eyes. The Center for Internet and Society at Stanford Law School released data on what information is passed between certain popular websites.
Long story short, logging in (or even trying and failing to log in) to a site can pass information about you to third parties. That information can be as innocuous (but still trackable) as a "unique identifier" generated by the site or as specific as your email address, username, and real name.
Somini Sengupta (author of the Bits blog post) says:
Take for instance these findings, released on Tuesday by computer scientists at Stanford University. If you type a wrong password into the Web site of The Wall Street Journal, it turns out that your e-mail address quietly slips out to seven unrelated Web sites. Sign on to NBC and, likewise, seven other companies can capture your e-mail address. Click on an ad on HomeDepot.com and your first name and user ID are instantly revealed to 13 other companies.
I did some digging of my own through the Microsoft® Excel® spreadsheet available from the Stanford Law School page (direct link to XLSX file) and found some interesting examples.
For example, MSN.com leaks your birth year and birthdate to FBCDN.net (a domain owned by Facebook and used for content distribution). Facebook's CDN can't possibly need that information for anything but tracking. Take another case: Ask.com sends your username to Google Analytics, reCAPTCHA (owned by Google), ScorecardResearch (part of comScore, Inc.), Gigya (a company that "makes websites social"), Quantserve.com (used by Quantcast, an advertising network), IMRWorldwide.com (controlled by Nielsen), and LinkedIn.
Incredibly, The Huffington Post's website sends your username to BlogCDN.com (another CDN), BuzzFeed ("Tracks the Web's Obsessions in Real Time"), AdSonar (owned by Advertising.com; provides targeted text ads), ScorecardResearch, AOL.com (Huffington Post's owner), FBCDN.net, aolcdn.com (AOL's CDN), ATWOLA.com (stands for AOL Time Warner Online Advertising; tracks surfing habits), Facebook.com and Facebook.net, Google Analytics, IMRWorldwide.com, Quantserve.com, and HuffPost.com (used for delivering static content without cookies, ironically); your birthday to BuzzFeed and IMRWorldwide.com; and your birth year to Advertising.com and ATWOLA.com.
The point is, any information given to a website as part of the registration process or entered later while updating a profile allows third parties to do just that: profile you as a person through your behavior across countless sites. All this tracking is thanks to the triviality of circumventing the "same origin policy" of data stored in browser cookies through collaboration between sites.
A standard feature of Web browsers is sending the address of the last page visited (the "referrer") to the page being loaded. In the case of images, scripts, or other resources loaded within a page, the referrer is the page in which they are embedded. If the page displaying advertising has personal information embedded in its URL, that information is passed on to any sites whose assets are embedded in the page. This kind of information leakage can be accidental as well as deliberate. It is less likely on encrypted pages (URLs beginning with https://), because browsers typically omit the referrer when a secure page requests a resource over an unencrypted connection; resources fetched over HTTPS can still receive it.
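To make the referrer leak concrete, here is a minimal Python sketch of what an embedded third-party resource could learn from the referring URL. Everything in it is an assumption for illustration: the example.com-style URLs, the parameter names, and the helper functions `referer_for` and `leaked_fields` are all hypothetical, and the HTTPS rule is a simplified stand-in for real browser behavior.

```python
from urllib.parse import urlsplit, parse_qs


def referer_for(page_url, resource_url):
    """Return the Referer value a browser might send for an embedded resource.

    Simplified assumption: the referrer is omitted only when a secure
    (https) page requests a resource over plain http.
    """
    page_scheme = urlsplit(page_url).scheme
    resource_scheme = urlsplit(resource_url).scheme
    if page_scheme == "https" and resource_scheme == "http":
        return None
    return page_url


def leaked_fields(referer, sensitive=("email", "username", "name")):
    """Pick out the sensitive query parameters visible in a referrer URL."""
    if referer is None:
        return {}
    query = parse_qs(urlsplit(referer).query)
    return {key: values[0] for key, values in query.items() if key in sensitive}


# Hypothetical page whose URL carries the visitor's e-mail address,
# and a hypothetical tracking pixel embedded in that page.
page = "http://news.example/welcome?email=pat%40example.com&ref=home"
tracker = "http://ads.example/pixel.gif"

print(leaked_fields(referer_for(page, tracker)))
```

The third party never has to ask for anything: the address arrives with the ordinary request for the image, which is why this kind of leak can happen without anyone intending it.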
Websites that intentionally want to share user information can do so in other ways. I had written up an example process, but it is sufficient to say that methods for intentionally sharing information and tracking users across domains are numerous, and many of them work even in spite of user privacy choices like clearing cookies.
When information is revealed in the URL, it's not necessarily intentional. Back in May, Symantec discovered (The Daily Mail reports) that some applications on Facebook's platform were potentially giving advertisers access to users' accounts due to app URLs including access tokens, the bits of information older Facebook apps used to identify themselves and connect to users' accounts. It was just an oversight.
Google Books and the Book Industry
I wrote this for my Journalism class at college, but figured I might as well share it here too.
The New York Times ran a story Monday about a new lawsuit filed against HathiTrust, a partnership of universities and research libraries that maintains a digital book collection on its website.
Plaintiffs in the suit include three major authors' groups: the Authors Guild, the Australian Society of Authors, and the Québec Union of Writers. Eight individual authors are also party to the filing, among them Pat Cummings, Roxana Robinson, and T.J. Stiles.
The objections raised in the suit center around the HathiTrust collection itself. "[S]even million copyright-protected books" (according to Paul Aiken, executive director of the Authors Guild, as quoted by the NYT) are available without any consent from the authors. The Authors Guild and its fellow plaintiffs say that the collection violates copyright law.
HathiTrust's collection consists of books digitized by Google, Inc. as part of the Google Books project, which has been steadily scanning books from participating university libraries across the United States.
The Google Books project has been the subject of many lawsuits over the years since work on it was begun in 2002. A few examples will help provide context:
- 2005: The Authors Guild sues Google for "plain and brazen violation of copyright law" (archived press release from AG via Archive.org)
- 2009: French court halts Google Books in France: the ruling applies only to books published in France under copyright (Los Angeles Times article)
- 2010: Several professional photographers' organizations bring a class-action suit regarding the reproduction of copyrighted images within the books scanned by Google (Mashable.com article)
The Authors Guild has been involved with this issue before. This time, the fight has been brought to an organization with a bit less might than Google.
But never mind who sued whom, for what, and when. The issue is really quite simple, and most of the lawsuits against Google Books have had little to no merit.
United States copyright law (the law under which most Google Books lawsuits have been filed) contains a doctrine known as Fair Use. It was originally intended to protect commentary, critique, and parody of copyrighted works. However, the principles of Fair Use (Cornell University Law School Legal Information Institute) cover more than those cases:
- "the purpose and character of the use" — e.g. for commentary, critique, parody, scholarship, etc.
- "the nature of the copyrighted work" — published/unpublished, fact/fiction
- "the amount and substantiality of the portion used" — how much of the work was used, and how significant the used portion is to the work as a whole
- "the effect of the use upon the potential market" — if the use of that portion will negatively affect demand for or the value of the original work
(Thanks to Stanford University's Copyright & Fair Use information center for helping me refresh my own memory of these concepts.)
The way Google Books works is carefully designed to fit within existing copyright laws. Books in the public domain are fully accessible, with no restrictions. Copyrighted, in-print books allow whatever access the publisher has specified. For in-copyright books that do not have a publisher, Google restricts access to "snippets", which show just a few words surrounding the user's search term.
So: Whenever Google Books shows a significant portion of a book, it has permission from the publisher to do so. Without permission, Google Books displays tiny fractions of the full work in an immensely transformative manner.
Google Books falls well within Fair Use doctrine, at the very least. Displaying card catalog–type information about the book plus at most a sentence or so for each search result (I'll go down the Fair Use list):
- Is for scholarly reasons
- Uses published works
- Displays at most a few percent of the whole book
- May actually increase demand for the books featured in the results
(Parts of Lawrence Lessig's 2006 video discussion of Google Book Search came in handy for an overview of how Google Books works.)
So why are publishers and authors suing Google and HathiTrust?
As far as I can tell,[original research?] HathiTrust follows the same rules as Google Books. This makes sense, as the content is from the Google Books program.
HathiTrust's entire archive is intended for academic use. It's unclear why the various plaintiffs in this new lawsuit are suing for the removal of their books from the archive, rather than suing for better access controls. If the concern is that anyone can access the books (which they can), then restricting access to verified researchers would clear up the problem.
It's like big music, film, and television. The music industry figured out that it could simply adapt to the Internet and start offering content over the new medium, giving people an alternative to pirated copies shared through services like Napster, LimeWire, and BitTorrent. Film and television haven't yet figured that out, and I guess the book industry is still working on it too.
Finally: Google Voice Export Feature Released (sort of)
It took quite a while, more than two years since launching in March 2009, but Google Voice finally supports exporting!
I'd love to think my export format ideas post had something to do with the end product released yesterday, but I seriously doubt it.
Sort of...
Let's just say, Google Takeout isn't behaving very well. The test archive I created yesterday won't download, and I've tried both Google Chrome 13 and Mozilla Firefox 3.6. The feature isn't there yet, but I'm sure Google engineers are working on it.
I'm still happy...as soon as they make it actually work.
Polishing Minneapolis’ Wireless Civic Garden
I've done some playing around with the citywide Wi-Fi here in Minneapolis, and I must say that the range of information accessible through the Civic Garden feature (which allows even non-subscribers access to City-related sites) is impressive.
However, while I understand that the whitelist of "free" domains is limited to noncommercial properties, there are a few exceptions that should be made. Or at least, some resources should be hosted by the City or proxied for Civic Garden users.
Metro Transit's site
Visiting MetroTransit.org when online via the Civic Garden is a little weird. The home page is a lot longer than usual — actually, most pages are longer than usual — due to the absence of JavaScript libraries hosted at ajax.googleapis.com.
Because of the missing code, features that normally hide away in compact accordion stacks or appear when the mouse is moved over them are left in the open. One of them even steals focus when the page has loaded, making the view jump most of the way down the page. It took me a while to figure out why the page was scrolling by itself.
The navigation is broken for all but the top-level sections, because the missing code runs the drop-down menus that allow deeper browsing into the site. On the front page, a series of five images depicting the various Metro Transit services[1], normally an automatic slideshow with mouse interaction, expands to five panes stacked down the page, and the links embedded in them don't work.
On the right side of the page, a clutter of tools appears where there is normally a neat stack of expandable options. One of them is the culprit for the page-scrolling I observed, and it gets annoying after a few pageviews to have to scroll to the top of each new page loaded. (Somewhere in the page's code, a JavaScript snippet that doesn't rely on one of the missing libraries is placing the caret[2] in a text input near the bottom of the page, and most browsers automatically scroll to make such a "focused" element visible.)
Glancing at the page's source, I notice immediately which files must be the problem. Two <script> tags include the jQuery and jQuery UI libraries from Google's CDN. This practice usually improves speed, since the likelihood of the files already being cached by a visitor's browser is increasing as more and more sites start using these Google-hosted versions of popular JavaScript libraries instead of their own copies — but in this case, it's causing breakage for a subset of users. Google's service is not whitelisted as part of the Civic Garden.
Solutions
Two solutions present themselves, and they are both simple.
Ideally, Metro Transit would pass a request up the chain for ajax.googleapis.com to be whitelisted. Not only would doing so solve the problem for their site, but it would also allow any other Civic Garden website to take advantage of Google-hosted libraries without any further work from either Civic Garden administrators or individual site maintainers.
This first solution also has the potential to save bandwidth usage, since Google sends aggressive caching instructions along with the files hosted on its CDN. More Civic Garden sites using libraries hosted by Google would result in negligible increases in data transfer, because the same files would be downloaded once and then cached for use by any site requesting them. Saving bandwidth on the free Civic Garden would open up more of the pipe for paying subscribers — an outcome with which U.S. Internet would no doubt be pleased.
Alternatively, Metro Transit could add the core jQuery and jQuery UI files to the pre-existing /ClientScript/ directory, which I can see already contains plugins to those libraries, the Cufón library,[3] and a font file for Cufón to use, among other things.
This alternate solution is a good fallback if the higher powers in control of the whitelist refuse a request to allow access to ajax.googleapis.com. It only solves the problem for Metro Transit's website, but it would fix the issues discussed above.
A third, much more complicated, option is described below. Obviously, if it were applied to the Metro Transit problem, ajax.googleapis.com would be used where www.google.com is in those examples. While it would also work, it is unnecessarily complicated for the scope of the problem facing Metro Transit's website, and that is why I don't count it as a solution here.
Resolution
Some time ago I contacted Metro Transit using the feedback form on their site to notify them about the breakage and propose (in brief) my solutions. I received a response just a few days ago, with the welcome news that they will be fixing the problem in the next site update by hosting the JavaScript files themselves. Not the ideal solution, but definitely the easier of the two possibilities I could think of.
Way to go, Metro Transit! You've beaten me to the punch. Not that it's hard to do these days, what with my posting frequency and all...
The City's site
Located at www.ci.minneapolis.mn.us, the Official Website of the City of Minneapolis has a wealth of information on everything from regulations to recycling and more. It allows access to City Council agendas, a list of what can and cannot be left for the recycling program, and countless other unexciting but eminently useful bits of information.
The main problem with the City's site as viewed through the Civic Garden access is that it is impossible to search. Submitting a query through the search box at the top of any page leads the user to a page that says "Search the Minneapolis Web Site" above another (empty) search box. And pretty much ends there.
It's great to see that the City (or at least its Web developers) is embracing modern Web services like Google's Custom Search Engine, but all the resources required to fetch and display results come from www.google.com, a domain blocked when using Civic Garden access.
Not an Easy Problem
Solving this problem is a bit more difficult. Whitelisting www.google.com is out of the question, as that would also allow free access to many of Google's consumer-oriented services including its trademark search engine, calendar, feed reader, and so on.[4] Unfortunately there are no easy solutions here. Hosting the JavaScript files doesn't solve the problem because those files in turn load other files whose locations are embedded in the code.
Implementing some sort of proxy would seem to be a solution, but there's still the matter of hard-coded resource locations. Nothing returned by Google would request files via a City-controlled proxy, no matter how sophisticated the proxy.
There's also the matter of load. Obviously any solution involving the use of City hosting services should be restricted to those users who need it — that is, Civic Garden users — to avoid unnecessary load on the servers. But there might not be a way of separating the "needs" from the rest of the crowd in a way that would allow the server to send different pages to those who need them.
Best Idea Forward
Without knowing more about the network architecture, I can come up with only one possible solution.
The flow would go something like this:
- User loads search page, and browser requests resources from Google
- U.S. Internet network[5] receives and recognizes requests destined for www.google.com
- Network scans a list of allowed request patterns to www.google.com; such a list allows only the resources needed for Google Custom Search
- User's browser receives the needed resources
- Google's Custom Search code sends its requests to retrieve results, which are filtered through the same mechanism at the network level and allowed to return data to the user, completing the search
It's a rough description, but generally all that's needed is an extension of the domain-based filtering to enable filtering on request patterns — that is, the contents of the GET line in the request headers.
If the requested hostname matches www.google.com, that request is sent to a second filtering routine that performs pattern analysis (via regular expressions or what-have-you) on the requested path. /jsapi and /coop/cse/* can get through and return those resources to the user; /reader/view/ and /webhp?q=denied can't, and redirect to the subscription login page (the current behavior for all non–Civic Garden sites).
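The two-stage check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `FULLY_ALLOWED_HOSTS`, `ALLOWED_PATHS`, and `is_allowed` are hypothetical, the two path patterns are just the examples from the paragraph above rather than a complete list of what Google Custom Search requests, and a real implementation would live in the network equipment rather than in a script.

```python
import re

# Stage 1: hosts that the existing domain-based whitelist lets through entirely.
# (Hypothetical entry; the real Civic Garden whitelist is not public to me.)
FULLY_ALLOWED_HOSTS = {"www.ci.minneapolis.mn.us"}

# Stage 2: hosts that are only partly allowed, with the path patterns
# that may pass. These two patterns are the examples from the text only.
ALLOWED_PATHS = {
    "www.google.com": [
        re.compile(r"^/jsapi$"),
        re.compile(r"^/coop/cse/"),
    ],
}


def is_allowed(host, request_target):
    """Decide whether a request may pass the free-access filter."""
    if host in FULLY_ALLOWED_HOSTS:
        return True
    # Match on the path alone, ignoring any query string.
    path = request_target.split("?", 1)[0]
    patterns = ALLOWED_PATHS.get(host, [])
    return any(pattern.match(path) for pattern in patterns)


for target in ("/jsapi", "/coop/cse/brand?form=cse", "/reader/view/", "/webhp?q=denied"):
    verdict = "allow" if is_allowed("www.google.com", target) else "redirect to login"
    print(target, "->", verdict)
```

Keeping the per-host pattern list separate from the plain domain whitelist means the existing behavior for every other site stays untouched; only hosts that appear in the second table get the extra path check.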
Implementing this solution would require analysis of all the possible requests generated by Google Custom Search, though Google might have available (or be willing to provide) a reference of how Custom Search works. Once put in place the filtering expansion would enable any site in the Civic Garden to use the service and have it work for everyone, without changing anything else. It might also require changes to the network equipment that runs the citywide wireless service, but such upgrades would prove useful in short order as more City services were made available to Civic Garden users thanks to the accessibility of search. (See next section)
Other Applications
While the main problem with the City's site as accessed via the Civic Garden is the lack of search, there are other issues.
Forms, for instance, seem to mostly be hosted on external sites that are not included in the Garden whitelist. Much information is given about the services these forms can be used to obtain (such as snow emergency notifications by telephone or email), but filling out the forms is impossible.
A complete audit of all external resources called by the City's site (and in general, all Civic Garden sites) could provide a list of domain names and resource paths for whitelisting. The above-described filtering system could be extended with the contents of such a list so specific pages from commercial sites used on City properties could be made available, while still blocking effectively all commercial traffic from the Civic Garden.
Enabling access to third-party resources that are currently blocked, despite being included in Civic Garden properties, would provide an even greater return on the investments of time and (possibly) money in the upgrades of network hardware and firmware that would likely be necessary to support such a filtering system.
I emailed the City about this and was notified several days later that my message had been forwarded to their IT department. At that time I hadn't come up with this new filtering idea, so I've contacted them again with a link to this post. Maybe they'll read it, maybe not; but it's been a nice thought experiment.
[1] Which are: bus, light rail, Northstar commuter train, bicycle accommodations, and Rideshare (car or van pools).
[2] caret: the blinking line or box often used to enter text on a computer
[3] Cufón replaces specified text elements with graphics rendered dynamically by the browser to provide more control over typography than the current lowest-common-denominator browser-native technologies.
[4] I for one would love it if the City had Google services whitelisted so I could check my email and calendar from pretty much anywhere for free, but I can understand the need to block commercial sites on a publicly funded network.
[5] U.S. Internet is the local ISP that was awarded the contract to build and run the citywide wireless service.
Reflection Squared: On Clifford Stoll’s “High Tech Heretic”
The other day, I was browsing the computer shelves at a local Borders bookstore. I came across Cliff Stoll's acclaimed book, The Cuckoo's Egg. My dad's recommended the story to me in the past, and the premise was intriguing. After all, who wouldn't want to read a non-fiction account of cyber espionage that reads like a top fiction mystery? I picked up the book and proceeded to spend the next two hours engrossed, reading right through the soft muttering and louder tapping of the woman in the chair beside me.
Of course, the time to depart arrived and I had to stop. Still, I read about 25% of the book in one sitting. I replaced the book on the shelf, noting to look for it at the library and/or add it to my wish list. (Even if I wanted to buy it, I wasn't exactly in a position to do so.)
The next day, en route to the upstairs computer lab, I checked the public library catalog. The Cuckoo's Egg wasn't in stock, and was checked out until the 21st of April, but I noticed that one of Stoll's other books was: High Tech Heretic: Why Computers Don't Belong in the Classroom and Other Reflections by a Computer Contrarian. On impulse, I checked the book out.
What I found inside, later, was intriguing. My parents have been skeptical of computers for a while. Though my dad uses them for his business, and my mom is warming up to them after years of asking me why I find them so interesting,[1] there's still a big disconnect between us.[2] I've vaguely known the reasoning behind their conclusions for years, but High Tech Heretic has shed some light on the details, and not monitor glow.
Programmed Instruction
Despite my parents' computer skepticism, I took my entire high school education online. I believe it was a good experience, though not for the reasons one might expect. It's not that I necessarily learned more than I would have in a conventional school — though I probably did, since the online coursework better fit my learning style — but rather that I spent a good chunk of my "school" time correcting the course material. Lazy QA teams had left the text, quizzes, and tests riddled with little errors. Through my teachers, I sent corrections, and my correction work earned back more than a few points that were wrongfully denied me in nearly every course — though I never got so much as a "Thank you" from the course distributors. (A rare few courses were bereft of glitches. I treasured them, because I didn't have to keep second-guessing everything.)
What was interesting about some of the corrections, though, was that sometimes it was just a matter of input formats. Most of the graded tests were multiple-choice, but many of the in-text "Self-Check" quizzes featured free-text inputs. Such quizzes were graded by JavaScript code, to give students an idea of how well they understood the material. But some of them had vague or quirky requirements about how answers were entered, and some of the quirky expectations made by the programmers resulted in points lost by students.
Stoll addresses the issue on page 16, in reference to B. F. Skinner's experiments with programmed instruction in the 1950s. Skinner's approach was nothing new, really; it mimicked a popular learning method preached by many educators then and now: repeat a topic until the student demonstrates understanding. Skinner's machines rewarded students for correct answers with further exploration of the topic, while incorrect answers led to review.[3] However:
…programmed instruction flopped. The machine forced kids to regurgitate whatever answers the programmer wanted. There was no place for innovation, creativity, whimsy, or improvisation.
This sounds very familiar. Almost too familiar. The quizzes in my online coursework sometimes had bizarre expectations for what was to be typed into the text boxes. I once had a quiz (thankfully not graded) that balked at accepting a floating-point number (0.17 or something) with the leading zero; the expected input was .17 and too bad if you've been trained to put in the leading zero. The programmers were treating all text box inputs as strings, rather than parsing the values into numbers when appropriate. We all know that programmers are lazy, but certain kinds of laziness are inexcusable.
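For what it's worth, the fix is tiny. Here is a minimal sketch of a grader that parses the answer into a number before comparing, so that 0.17 and .17 are treated the same. The coursework graders were written in JavaScript and I never saw a proper specification for them, so this Python version, the function name `grade_numeric`, and the tolerance value are all my own assumptions for illustration.

```python
import math


def grade_numeric(answer_text, expected, tolerance=1e-9):
    """Return True if the typed answer is numerically equal to the expected value.

    The text is parsed as a number first, so "0.17", ".17", and " 0.170 "
    are all accepted for an expected value of 0.17.
    """
    try:
        value = float(answer_text.strip())
    except ValueError:
        # Not a number at all: mark it wrong rather than crash.
        return False
    return math.isclose(value, expected, rel_tol=0.0, abs_tol=tolerance)


# A naive string comparison rejects a perfectly correct answer...
print("0.17" == ".17")
# ...while the numeric comparison accepts both spellings.
print(grade_numeric("0.17", 0.17), grade_numeric(".17", 0.17))
```

Comparing with a small tolerance rather than exact equality also avoids penalizing students for floating-point rounding, which a string match can never get right.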
Skinner's ideas persisted, even into the years of my childhood. I had plenty of educational computer games in my youth, and maybe they did help teach me. Very little of what I know comes from conventional schooling — I know that much. Reading, writing, arithmetic, higher math, typing, (amateur) programming — all of it I learned outside the classroom. Reader Rabbit, Treasure Math Storm, and Edmark's Mighty Math software deserve more credit for my education than any school classroom I ever set foot in. Forgive me if it sounds like bragging, but I could read and write circles around most of my traditionally-educated friends all through my schooling. Kumon and my learning-friendly home environment can take the credit for my perfect score on the ACT's English section, not the school system.
Stoll also brings up computers in the classroom repeatedly. One great example is the replacement of science labs with computer programs. My local high school has a chemistry/physics lab, but an unscientific sample of the classes taught in the room shows much greater use of the computers for experimentation, rather than the lab equipment.
Learning the Tools, Not the Trades
Stoll also brings up the issue of learning how to use specific tools rather than the concepts underlying them. Chiefly discussed in the chapter "Calculating Against Calculators", the arguments focus on numerical fields; however, the thread is present practically from the beginning and applied to all subjects.
Through school, students are handed calculators in math class. They're trained to punch in the numbers and trust the calculator to come up with the right answer. Now, common sense dictates that one should always be able to estimate, so as to be able to catch errors in a calculation. In theory, students are taught to mentally check the calculator's results; in practice, assignments are turned in with answers stating that a radio tower is a fraction of a millimeter tall.
On page 85, the University of Illinois is used as an example. The school developed a calculus course centered on the Mathematica software. As such, the students learned how to integrate functions using Mathematica, rather than learning how to integrate. Students trained to use certain software programs for problem-solving often didn't know what to do when the electronic part of the equation (sorry) was removed.
In my math classes, I can remember very few times when I wasn't encouraged to use a calculator. A TI graphing calculator was a requirement for high school math classes, but I got through four years of online instruction with a photoelectrically-powered scientific calculator, used mostly for checking myself and dealing with nasty decimals. (I was fine graphing linear equations on graph paper, but I did cave in and download a software program to do the parabolic and asymptotic functions for me.)
Learning tools at the expense of the underlying concepts isn't just limited to math. From my own experience, as well as friends', I've seen courses teach how to use a particular software program to solve a problem, without explaining what the program does. Modern English course requirements for electronically-submitted papers just beg for students to rely on spell-checking software. Many of my fellow students routinely misspelled even the most common and simple words. I can't help but blame Microsoft Word; it's the de facto standard for word processing these days, and defaults to automatically correcting a huge list of common misspellings so sometimes the user doesn't even know he's made a mistake. That's a bad idea for software used in education.
Systems Design Philosophy
Perhaps one of the best points made in the book is taken from David Gelernter's thesis: "Technology's most important obligation is to get out of the way." This point, from page 139, illustrates the basic purpose of machinery: making life easier. Bad design and useless features remove the helpful aspect of technology and replace it with nuisance.
Ah, PowerPoint
Following chapters on, among other things, the wiring of libraries and the planned obsolescence of computer systems, an entire chapter is devoted to PowerPoint and its fellow presentation software products. I thought the best part of this chapter was the section discussing the use of presentations in schools.
With my online learning experience, I was thankfully spared most of the PowerPoint junk that has made its way into the school curriculum. However, I had teachers in the offline world as well, and a few of them used PowerPoint to disastrous effect.
One such teacher followed the model for meetings presented earlier in the chapter: Notes for the students, slides on the screen; the lectures consisted of reading the slides aloud, with zero additional information presented in the spoken words. I was always bored to tears in that class. It was ironic that the course title was "Public Speaking", since such a class should be teaching students how to keep an audience's attention instead of how to make the audience yawn.
Another teacher — this was in a public school — taught her AP U.S. Government course using PowerPoint. She read from the slides, often rushing through and/or skipping slides for time (no worries, the slides were available on her personal Web page for study at home). Her habit of putting paragraphs on the slides wasn't exactly prime PowerPoint use, but at least she added extra tidbits to her lectures that weren't in the textbook or on the screen.
I should also note that part of that Government class was a group presentation project, on which I got a good grade just by going up and reading a few of the several slides produced by my group while I was sick. That isn't a complaint (I like good grades just as much as the next guy), but I didn't really have any input whatsoever on the project save for a few grammatical corrections. (I won't get into how my classmates made it difficult for me to contribute, even though I was perfectly willing to do my share.[4])
I present these examples mainly to illustrate my own personal experience with the problems Cliff mentions on pages 182–183. (It's interesting that his main classroom example also involves a social studies teacher.) I'm sure educators would be quick to defend the growing use of PowerPoint in schools by citing technological familiarity for future job use, same as they would for school Internet connections (which are useful, but often inadequately restricted).
Dated Material?
I did have the thought throughout the book, however, that perhaps some of Stoll's opinions would be quite different if written today. In particular, page 189's assertion that professional editors and journalists just don't exist on the Internet is no longer true. That assertion is a fundamental point in several arguments following — arguments that would probably be different (if only slightly) if written from a 2010 perspective instead of a 1999 perspective.
Similarly, page 191 asserts that search engines don't understand concepts and ideas, only words. Today's indexing engines aren't perfect, but great strides have been made in machine understanding of language. Just look at services like Aardvark. (This is, of course, just a tiny subset of the possible examples I could have pulled from the book.)
Of course some things — unfortunately — never seem to change. I stupidly didn't note the location of it, but somewhere in the latter part of the book Stoll laments that search engines rely on correct spelling to find information. Spelling is a skill seldom taught or learned in today's world (it seems), and we rely more than ever on spell-checkers. Many services offer their own (see Gmail & Google Docs as examples) in the event that the user's browser doesn't have one already built in. Search engines have been trained to recognize our mistakes in queries (à la Google's classic "Did you mean?" lines) and sometimes I think they also detect mistakes in pages they index.
Overall
High-Tech Heretic contains a good many well-placed warnings, and I very much appreciate Stoll's opinions on the replacement of human and paper resources with technology. However, I hope that his later writings are better edited. This book has quite good spelling (good, since he brought up that issue) but the grammar is lacking in a few spots; I found a decent number of omitted or misplaced words.
Nitpicking aside, the message of the book is clear and appreciated. Technology has a place, and we shouldn't let it get out of the corner we've set aside for it.
Update (05/04): Corrected missing markup that caused most of the text to appear as a giant footnote. Proofreading failure on my part; sorry!
- She's begun asking me about websites and such: Hosting recommendations, platform suggestions, that sort of thing. It's kind of cool that she's interested now. [↩]
- I used to go to my dad with questions about the computer. Now, he comes to me with his questions and I use search engines to find answers for my own. [↩]
- I had several experiences with this type of learning, including both online (with Stanford's EPGY program) and off (with Kumon, a Japanese-originated curriculum in math and reading). [↩]
- Schools seem to use group projects a lot without teaching students how to collaborate, kind of like a lot of theatre classes tell the actors to project without getting into the mechanics of doing so. [↩]
tr.im: An Exercise in How Not to Run a Service
It recently came to my attention that tr.im has decided to stop accepting new URLs through its website, has asked developers to remove tr.im functionality from their applications, and plans to shut down the redirection service in a year or two. I went there to shorten an address on Tuesday but came upon this page instead:
Ever since discovering the service about two years ago, I have shortened almost every URL I post to Twitter, Facebook, and several other such sites through tr.im. That will have to stop, apparently, because those addresses will no longer work in the not-too-distant future. It is unfortunate that nothing can be done about the millions of tr.im links that have already been flung to all corners of the Web.
Apparently, the August 2009 announcement/scare (see Mashable's coverage) should have been taken more seriously—a lot more seriously. Following that little episode, the overwhelming response from users convinced Nambu Networks (tr.im's developer, whose main products are Twitter apps) to abort the planned shutdown. I, and a lot of other Internet users, thought all was well.1 Crisis seemed to have been averted. Now this.
Mashable, in the article from last August, stated optimistically that someone would probably buy the service before the planned hard shutdown sometime after December 31, 2009. Obviously that hasn't happened, or the service wouldn't be shutting down. But there has to be a better solution than pulling the plug, even if that doesn't happen until 2011 or 2012.
I can accept that Nambu administrators have had to deal with a lot of spam links being generated using their service, but it puzzles me that the spam would lead hosting providers to threaten termination of the site. After all, Nambu is not responsible for the links its users submit, nor for the content on the other end of its redirections (but that's far beyond my expertise).
However, I must wonder: Instead of just giving up, why not develop better spam-fighting algorithms? Digg, Reddit, even Facebook and Twitter (any site that accepts user-submitted links, really) all have countless spammers fighting to get their links in front of millions of users, and they all do a pretty good job of keeping the spam off their sites algorithmically, with no human intervention. I don't see bit.ly giving up its fight against spam, or is.gd, TinyURL, SnipURL, or any of the other established shortening services. They must have spam link submissions too, but they get by. None of the other shortening services I've come across in the past few years has ever threatened to disappear, for spam volume or any other reason, and I've looked at a lot of them. Yet tr.im has now done so twice in less than nine months, and it looks like this time may be for good.
A lot can happen between now and when Nambu decides to finally pull the plug on tr.im's redirection service, of course. Perhaps a buyer will surface. (Then again, offers were made in August, only to be turned down because Nambu didn't feel it could trust the potential buyers.) Perhaps Nambu will change its mind — again. Heck, I'd buy the service and run it myself if I had the funds. Anything's better than breaking millions of links across the Internet; shutting down a service like tr.im will even affect email archives, since shortened URLs make their way into emails all the time.
No matter what happens, I'm going to follow the old saying, "Fool me once, shame on you. Fool me twice, shame on me." I stayed loyal to tr.im and Nambu after they threatened to make my digital world fall apart last summer. I continued to use their awesome service because I loved it — the name, the interface, everything — and they've turned around and made the same threat, only stronger. I cannot possibly ignore this decision, what amounts to pulling out the knife they stabbed in all of their users' backs in August and driving it back in an inch away. It's absolutely infuriating. SnipURL (and snurl.com, sn.im, cl.lk, and snipr.com — the service maintains five different options), here I come.
tr.im, you've been a great example. Nambu, I sure as hell won't be buying any of your software products, ever. You better give some serious thought to giving us users a way to keep the redirections working, or at least a way to export the redirections we've created so we can go through and change or annotate whatever old content we can to keep the links from breaking, because that's the big reason I'm angry. If you simply shut down, you will be intentionally breaking a large percentage of the Web.
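For what it's worth, an export wouldn't have to be fancy to be useful. Here is a minimal sketch in Python of what I'd do with one. The export format (a plain mapping from short URL to destination) is purely my assumption, and the short codes and destinations below are made up for illustration.

```python
import re

# Hypothetical export: short URL -> original destination (all values invented).
EXPORT = {
    "http://tr.im/abc1": "http://example.com/a-long-article-address",
    "http://tr.im/xyz9": "http://example.org/another/page",
}

# Matches tr.im short links such as http://tr.im/abc1
SHORT_LINK = re.compile(r"http://tr\.im/\w+")


def annotate_links(text, export):
    """Append the real destination after every short link found in the export.

    Links that are missing from the export are left untouched.
    """
    def expand(match):
        short = match.group(0)
        target = export.get(short)
        return short if target is None else f"{short} [{target}]"

    return SHORT_LINK.sub(expand, text)


if __name__ == "__main__":
    old_post = "Great read: http://tr.im/abc1 (via a friend)"
    print(annotate_links(old_post, EXPORT))
```

Annotating rather than replacing keeps the original text intact, so an old post still makes sense even if the destination has moved as well.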
Is this the future of millions of tr.im URLs all over the Internet?
- It is, however, true that many users vowed to never again use tr.im after that episode. I wasn't one of them, but as it turned out that was a mistake. [↩]
reMAP: IMAP reConceptualized
Gabor Cselle, the founder of reMail, recently posted an idea for replacing the IMAP email protocol with something easier to work with. The proposed name? reMAP, short for reimagined Mail Access Protocol.
He calls for a RESTful design that among other things would globalize message identifiers (rather than changing them the instant a message is moved to a new folder), replace folders with labels (à la Gmail), require the server to handle email search indexes, and make conversations the basic unit of email (instead of individual messages). reMAP would also make handling MIME messages unnecessary; the client could simply call the server with a request for text or HTML message representations without having to deal with parsing the MIME format itself.
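To picture three of those ideas together (global identifiers, labels instead of folders, and conversations as the basic unit), here is a rough Python sketch of a data model. Every class, field, and method name is my own invention for illustration; nothing here comes from Gabor's post or from any actual reMAP specification.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    message_id: str  # global identifier: never changes when labels change
    sender: str
    text: str


@dataclass
class Conversation:
    conversation_id: str
    messages: list = field(default_factory=list)
    labels: set = field(default_factory=set)  # labels take the place of folders


class Mailbox:
    """A toy in-memory store where the conversation is the basic unit."""

    def __init__(self):
        self.conversations = {}

    def add(self, conversation):
        self.conversations[conversation.conversation_id] = conversation

    def relabel(self, conversation_id, remove, add):
        """The label-based equivalent of moving mail between folders."""
        labels = self.conversations[conversation_id].labels
        labels.discard(remove)
        labels.add(add)

    def with_label(self, label):
        return [c for c in self.conversations.values() if label in c.labels]


if __name__ == "__main__":
    box = Mailbox()
    box.add(Conversation("c1", [Message("m1", "alice@example.com", "Hello")], {"inbox"}))
    box.relabel("c1", remove="inbox", add="archive")
    print(box.with_label("archive")[0].messages[0].message_id)
```

The point of the sketch is that "moving" a conversation only edits its label set, so the identifiers a client has cached stay valid afterward.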
I personally am in agreement with his entire proposal. The experiences I've had with IMAP in the past have highlighted shortcomings in a standard that was drafted over 15 years ago. Email has changed a great deal since then, but IMAP has not been revised to accommodate the enhancements made by newer clients and services like Gmail.
If IMAP is to be improved, it's probably appropriate to just completely replace it with something new. If the new system can translate IMAP commands into the equivalent operations in its own protocol, that's even better, because then servers can be upgraded without worries of breaking compatibility with older clients or the need to run server applications for IMAP and reMAP side by side.
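As a very rough illustration of that translation idea, here is a Python sketch that maps a few simplified IMAP-style commands (tag already stripped) onto REST-style requests. The endpoint paths are entirely hypothetical, since I know of no published reMAP endpoint layout, and the sketch glosses over real problems such as mapping IMAP's per-mailbox UIDs onto global identifiers.

```python
from urllib.parse import quote


def translate(command):
    """Map one simplified IMAP-style command to a (method, path) pair.

    Only SELECT, SEARCH TEXT, and COPY are handled; the paths are invented.
    """
    verb, _, rest = command.partition(" ")
    verb = verb.upper()
    if verb == "SELECT":
        # Opening a mailbox becomes listing the conversations under a label.
        return ("GET", "/labels/" + quote(rest, safe="") + "/conversations")
    if verb == "SEARCH":
        key, _, term = rest.partition(" ")
        if key.upper() == "TEXT":
            # The server owns the search index, so this is a single request.
            return ("GET", "/search?q=" + quote(term.strip('"'), safe=""))
    if verb == "COPY":
        # Copying into a mailbox becomes adding a label.
        message_id, _, mailbox = rest.partition(" ")
        return (
            "PUT",
            "/messages/" + quote(message_id, safe="") + "/labels/" + quote(mailbox, safe=""),
        )
    raise ValueError("unsupported command: " + command)


if __name__ == "__main__":
    for line in ("SELECT INBOX", 'SEARCH TEXT "hello world"', "COPY 42 Archive"):
        print(line, "->", translate(line))
```

A real compatibility layer would need to cover far more of the protocol, but the shape would be similar: parse the old command, then issue the equivalent request in the new protocol.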
There's plenty of discussion going on at the original post and on Hacker News. If, however, you would like to say something here, please don't hesitate.
As a side note, I see that Gabor is using Blogger's FTP publishing option, which will be going away soon. I hope the link will still work when he has to move.
“Houdini” plugin for WordPress is no magician
I've seen some pretty absurd WordPress plugins show up in the Plugins dashboard widget on this site, but the recently-released "Houdini" takes the cake so far. It claims to prevent spammers from copying the contents of any post or page upon which the [houdini] shortcode is placed.
The fact is the internet is open can lead to theft especially to content stealing and plagiarism.
Until now, there was very little to discourage and deter this serious crime. Yes content theft and plagarism is a crime in some jurisdictions.
You cannot rely on others or the authorities to continue to police the internet as they do not have enough resources. You need to protect your content and deter this theft.
The basic form of content theft is to copy and paste your content to another medium.
Well Houdini, prevents this using a little known special algorithm that prevents copying by making the selected text that is targeted by the perps to be copied, to disappear! Yes disappear!!! The only way to recover is to reload the page in the web browser. If they try again, the content disappears again. As long as they keep trying to select and copy your content, the content will disappear before they can get a chance to execute the copy command!
After a few unsuccessful attempts, the theives will move on to a easier target.
Your safe!
So what can we glean from this PHK Corporation plugin's description, other than the fact that the author has poor English skills? We can most definitely conclude that phkcorp2005 has no understanding of how most copying of Internet content is carried out. As I and others have pointed out many times over in blog and forum posts, copying is usually not done by a person using a mouse to cut and paste, but rather by automated computer programs called scrapers. (For the uninitiated: See these two Wikipedia articles.)
What is left out of that messy, error-riddled description is the word "JavaScript". It is by no means the only word or phrase that should be inserted, but it is the most important. That fifth "paragraph" (the formatting is also very poor) should say "special JavaScript algorithm", which is synonymous in this case with "useless JavaScript algorithm". All it does is wait for the user to try to select text in the browser and clear the selection if any is made. Besides, any copy-protection scheme based upon JavaScript is inherently useless because it does nothing to stop the copying itself. There are tons of ways to get around it; disabling JavaScript is just one example (as mentioned below).
For example, take hatkirby's rant. I quote from that post the list of circumvention techniques below:
- Go old fashioned and turn off JavaScript. Yep, the script is rendered useless.
- More advanced content thieves likely don't just go around to random blogs and copy/paste off of them. They write screen scrapers, small programs that visit sites and download specific parts of the site. As these do not render pages and simply download from them, the script isn't even seen by the scraper.
- Due to the nature of the Internet, anyone, and I mean anyone, can see the source code of a website. It's done differently in different web browsers, but it's always pathetically easy and, as it simply shows HTML code instead of parsing anything, no scripts are run.
- RSS. Syndication feeds are normally viewed in feed readers with little to no JavaScript interpreter. Script bypassed.
- There's this cool little button on most keyboards that says "Print Screen". Even on the keyboards that don't have it, there's usually a key combination that achieves the same effect. It takes a picture of whatever's on the screen. No selection occurs and yet the thief has a copy of your article. They do, however, have to retype it, so this keeps the lazy thieves out.
That's just a smattering of ways to get around the JavaScript inserted by Houdini.
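To make the scraper point concrete, here is a minimal sketch in Python. It is my own illustration, not code from hatkirby's post or from any real scraper. It assumes the post body lives in a `<div class="entry">`, which is a made-up class name, and the sample page (including the `clearSelection` function it mentions) is hypothetical as well. The parser only reads the markup; it never runs the script.

```python
from html.parser import HTMLParser


class PostTextExtractor(HTMLParser):
    """Collects text inside <div class="entry">, skipping <script> bodies."""

    def __init__(self):
        super().__init__()
        self.depth = 0  # how many <div>s deep we are inside the entry div
        self.in_script = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True
        elif tag == "div":
            if self.depth > 0:
                self.depth += 1
            elif ("class", "entry") in attrs:
                self.depth = 1

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False
        elif tag == "div" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Script source arrives here as plain data; it is ignored, not executed.
        if self.depth > 0 and not self.in_script and data.strip():
            self.chunks.append(data.strip())


def extract_post_text(html):
    parser = PostTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical page with a selection-clearing script attached.
SAMPLE_PAGE = """
<html><body>
<script>document.onselectstart = function () { clearSelection(); };</script>
<div class="entry">
  <h5>This page is copy protected</h5>
  <p>My carefully written post.</p>
</div>
</body></html>
"""

if __name__ == "__main__":
    print(extract_post_text(SAMPLE_PAGE))
```

A real scraper would download the page over HTTP instead of using a string literal, but the outcome is the same: the text comes out, and the "protection" never gets a chance to run.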
In the face of all the arguments presented, the plugin's author has insisted that the purpose of Houdini is not to "prevent" copying, but to "deter" copying. I don't think that statement holds any weight whatsoever. It still depends upon the copying being performed in a JavaScript-enabled browser by a human.
There's also the matter of just how absurd copy-protection of any kind is on the Internet. Every single document or file anywhere on the Internet must be copied in order for the user-agent (usually a browser in the case of human interaction) to retrieve and display or otherwise make use of the content. This is why it's quite simple for any user to just view the source code of a page. It has to be copied in order to display the content.
Also mentioned in the first forum thread (the first one started, chronologically) is the ability of JavaScript to disable the browser's context menu and thus the "View source" option. That's just as useless as the selection-clearing code, and actually more so, because many modern browsers allow specific JavaScript capabilities (like removing or replacing the context menu) to be disabled as an alternative to disabling all JavaScript. The "View source" option is also present in other places, such as the browser toolbar's "View" or "Tools" menu, which JavaScript code cannot modify even in the most permissive environment.
Legitimate quoting must also be considered. There are a million and one reasons why someone might legitimately want to copy a few sentences of a blog post. Maybe they like it enough to post a quote to Twitter or Facebook, or perhaps they want to comment on it in a blog post of their own. Content theft is a big problem, but the old methods of periodically searching for and reporting content stolen from one's site are infinitely preferable to this plugin's ineffective method.
Finally, why require the use of a shortcode? Why not just add the script globally to all content pages and forget that stupid "This page is copy protected" header?
At most, Houdini has the ability to add a superfluous <h5> tag to the page and annoy legitimate users with an obnoxious script while doing absolutely nothing to thwart real content thieves. I wonder if WordPress Extend would consider removing this laughable plugin from the directory... Of course, we bloggers would then be denied this ripe opportunity to satirize this particular piece of code.
Why I’m Always Promoting Dropbox

If you've had much interaction with me regarding computers, no matter what the medium — Twitter, Facebook, email, dinner conversation, small talk during a gathering — I've probably mentioned a service called Dropbox. A few of you have already succumbed to my uncharacteristic marketing tone and signed up, but I thought I'd blog about it and perhaps get more people on board.
I'll start with the reasons I like the service, and then explain why, exactly, I'm doing this a little later.
The site bills itself as an online synchronization and backup solution. I use it mostly for the backup, but that will probably change in the future. After all, it was created by a couple guys who were tired of forgetting their flash drives. It's ironic to note that my current use of Dropbox is to back up my 8GB (soon to be 32GB) SanDisk Cruzer Micro, using a modification contributed by another user.
Dropbox is just plain fun to use, and it has a lot of cross-platform compatibility. It synchronizes files between computers running Windows, Mac OS, or Linux; keeps backup copies online (using Amazon's S3 service, not that I should get too technical); stores revisions when files are changed; and keeps deleted files in case of the inevitable "damn, I shouldn't have deleted that" moment.
The backups, revisions, and deleted files are accessible from any computer with an Internet connection. Files can be added, updated, deleted, and otherwise managed via the website, too, which is great for travel or forgotten files (presentations, school projects, whatever). There is also a mobile website for PDAs and a higher-end version optimized for Android- and iPhone OS-based devices, as well as an iPhone App (there's an app for that™) which of course also works on the iPod Touch.
When naughty Vista workstations have tried to corrupt irreplaceable recordings and other files, Dropbox has restored them (with a little direction from me). Last summer, I bent the connector on my flash drive pretty severely while working on a design project at a poorly arranged Emerson computer desk. It still works, and retracts; but after asking around a bit and hearing that the drive was now likely unreliable, I was motivated to upgrade from the old Dropbox U3 mod (which had trouble on all sorts of non-personal computers) to DropboxPortable (which has worked everywhere so far). (It still won't work at the local public libraries, though. But neither will anything else; they blanket-block EXEs.) If and when my drive decides to give up the ghost, I know Dropbox will be there to give me my drive back just as I had it, as soon as I replace the failed hardware.
I also back up my music collection in Dropbox, which is a great, perfectly legal way to make sure I don't lose any downloaded or ripped MP3s. As it turns out, it's also useful because the device I use as my MP3 player — a Roland Edirol R-09HR — happens to be very picky about file structure. If there's one bit out of place, I get an "Improper Song!" error and can't listen to that file. When this happens to a song that used to play, I've often been able to go back into the previous file versions for that MP3 and load a playable version onto my R-09HR. It's much easier than using a so-called "repair tool" on the file.
There's also the matter of deleted file recovery. I've used the deleted file recovery to reinstate everything from seldom-played music (deleted to free up space) and design research (I just messed up) to irreplaceable recordings from my R-09HR (corrupted by Vista).
The list of times Dropbox has come in handy and/or saved my bacon is endless. If it's saved me so many times in the space of one year, it can surely do you some good.
So do yourself and the great people at Dropbox ("the Dropboxers") a favor and give it a try. I'll bet you won't be disappointed.
Note: Signing up through the links in this post will net you an extra 250MB* of storage in addition to Dropbox's free 2GB plan. That extra storage will stay with you if you decide to upgrade your account. (Disclaimer: You'll also earn me an extra 250MB.*) I tried to work out something special with the Dropbox team via their now-defunct affiliate program, but they stopped the program just before I inquired, so I'm unfortunately rather limited in the benefits I can pass on. Too bad, really; I had in mind something rather spectacular.
* – Please note that you must install the Dropbox application on at least one computer before you or I will receive any additional storage.
Bringing Back Skribit
A while back, I tried out a service called Skribit on my Blogger site. Skribit's purpose is to help bloggers overcome writer's block. It places a suggestion form on the site where visitors can leave ideas for the blogger to use when he or she is out of his or her own ideas.
It never got much use by readers of the old site, so I initially didn't bother transferring it here; but now that Skribit has launched and (I see) done a lot of work on the experience for users leaving suggestions, I figured I might as well install the WordPress plugin and give it another go. It doesn't take up any space in my sidebar now, either: since I stopped paying attention to it, a new floating tab option has popped up. For now, it's over on the right-hand side of the page.
Go ahead, shoot me some ideas! I may have something of a backlog from the last few months, but I certainly don't have an endless list.






