Technobabbles
I try to sound like I know what I'm talking about. Don't be fooled.

26 May 2014 (0 comments)

Force Dropbox on Mac OS X to Respect Your Disk Space

Dropbox is one of those tools without which it's probably no longer possible for me to live. It just syncs files between my computers, and it makes those same files accessible through any Web browser. It's not like my love for it is any secret.

The Problem

But it so happens that over the last year or so, I've increasingly run into a pretty major issue: Dropbox on my MacBook Pro will happily fill its .dropbox.cache folder with gigabytes upon gigabytes of files, sometimes filling 50+ GB of disk space before the system finally chokes up and apps start crashing because, as Finder says, the hard drive has "Zero bytes free". The system will pop up warnings that "Your startup disk is almost full", but if the machine is unattended for several hours straight (overnight, during the day when I haven't taken it to class, etc.) those warnings don't help.

It seems the issue happens when a program makes lots of small updates to big files tracked by Dropbox. Other computers start accumulating several copies of each file in the .dropbox.cache folder, and they start eating up disk space pretty fast. Apparently the Dropbox client doesn't have any logic built into it to prune the cache based on some reasonable size constraint — at least, not on Mac OS X; I've never encountered this issue on a Windows machine. So it'll happily keep sticking previous versions of big changed files into its cache until the drive is completely full, then complain that it "Can't sync; not enough free disk space" — even though that's its own fault.

The Fix

Let's just say I got tired of waiting for the Dropboxers to fix this. The solution I came up with is simple, taking just one line in my user crontab:

*/20 * * * * find -E /Users/dgw/Dropbox/.dropbox.cache -type f -regex '.*/[0-9]{4}-[0-9]{2}-[0-9]{2}/.*' -cmin +60 -exec rm {} \;

If you're unfamiliar with crontab syntax, let me explain. Each line in a crontab file on OS X (and most other Unix-like systems) consists of six fields, separated by spaces or tabs: which minute to run the job; which hour to run the job; which day of the month to run the job; which month to run the job; which day of the week to run the job; and the command to run. Setting the first field to */20 and the next four fields to * has the effect of running the command every 20 minutes. The last field, the command itself, uses the find utility to locate any file (-type f) in Dropbox's cache (/Users/dgw/Dropbox/.dropbox.cache) that sits inside a folder named like a date, such as 2014-05-26 (-E turns on extended regular expressions for -regex '.*/[0-9]{4}-[0-9]{2}-[0-9]{2}/.*'), and whose status last changed more than an hour ago (-cmin +60), then deletes each file it finds with rm (-exec rm {} \;).[1]
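To see which paths that -regex pattern actually selects, here's a minimal sketch, assuming a POSIX sh with grep. It uses grep -Ex (extended regex, whole-line match) to mimic the way find's -regex has to match the entire path, not just part of it. The function name matches_cache_pattern and the sample paths are hypothetical; the date-named cache folders are inferred from the pattern itself, not from any Dropbox documentation.

```shell
#!/bin/sh
# The same extended regular expression used in the crontab line above.
pattern='.*/[0-9]{4}-[0-9]{2}-[0-9]{2}/.*'

# Hypothetical helper: succeeds only if the WHOLE path matches the pattern,
# like find's -regex (grep -x anchors the match to the entire line).
matches_cache_pattern() {
  printf '%s\n' "$1" | grep -Eqx "$pattern"
}

# Hypothetical sample paths, used only to illustrate what gets selected.
for p in \
  /Users/dgw/Dropbox/.dropbox.cache/2014-05-26/report.docx \
  /Users/dgw/Dropbox/.dropbox.cache/2014-05-26/sub/photo.jpg \
  /Users/dgw/Dropbox/.dropbox.cache/misc/notes.txt
do
  if matches_cache_pattern "$p"; then
    echo "match: $p"
  else
    echo "skip:  $p"
  fi
done
```

Run as-is, it reports the first two paths as matches (anything at any depth below a date-named folder) and skips the third.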

To set this up for yourself, do the following[2]:

  1. Open Terminal.app
  2. Run crontab -e
  3. Press i to enter vim's insert mode
  4. Type */20 and four * characters, each followed by Tab (if your crontab starts with a commented header line like #min, hour, mday, month, & wday, these five values should line up under its first five column headings)
  5. Copy and paste (using Cmd+V) the command portion of the line shown above (everything from find onward), replacing /Users/dgw/Dropbox with the path to your Dropbox folder[3]
  6. Hit Esc to exit vim's insert mode
  7. Type ZZ (two capital Zs) to save the file and quit vim
  8. vim will close and you will see the message "crontab: installing new crontab" in Terminal
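If you'd rather skip vim entirely, a crontab can also be piped in: crontab -l prints your current crontab, and crontab - installs whatever it reads on standard input. Here's a minimal sketch, assuming a POSIX sh. add_line_once is a hypothetical helper that appends a line only when it isn't already present, so running the script twice won't schedule the job twice.

```shell
#!/bin/sh
# Hypothetical helper: copy a crontab from stdin to stdout, appending $1
# at the end unless an identical line is already present.
add_line_once() {
  new_line=$1
  found=0
  # IFS= and -r keep leading whitespace and backslashes (such as the \; in
  # the job) intact; the -n test handles a last line with no newline.
  while IFS= read -r existing || [ -n "$existing" ]; do
    printf '%s\n' "$existing"
    if [ "$existing" = "$new_line" ]; then
      found=1
    fi
  done
  if [ "$found" -eq 0 ]; then
    printf '%s\n' "$new_line"
  fi
}

# Usage (commented out so the sketch is safe to run): load the full
# */20 line from a file, then round-trip the crontab. crontab -l exits
# with an error when no crontab exists yet, hence 2>/dev/null.
# job=$(cat dropbox-cron.txt)   # dropbox-cron.txt is a hypothetical file
# crontab -l 2>/dev/null | add_line_once "$job" | crontab -
```

Keeping the job in a file sidesteps shell quoting, since the find command itself contains single quotes. Afterward, crontab -l should show the new line.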

Now, every 20 minutes, your Mac will automatically delete any of Dropbox's cached files older than an hour. This will make it extremely hard (if not impossible) for Dropbox to single-handedly fill your hard drive to bursting.

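Note that the numeric values are just what works best for me. */20 just means every 20 minutes; you could make it */15 for every 15 minutes, 0 for on the hour, 30 for half-past every hour, etc. Strictly speaking, -cmin +60 matches files whose status (inode) last changed more than an hour ago, not files created more than an hour ago; for cache files that nothing touches after Dropbox adds them, that should roughly track when they were added to the cache. (OS X's find also has -Bmin, which tests a file's actual creation time, if you'd rather be exact.) The threshold could be -cmin +180 for files that are more than three hours old or -cmin +30 for files more than half an hour old, etc. I initially had -cmin +120 until I ran into a particularly "productive" day on Dropbox's part and had to cut down on its cache more aggressively.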
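Before changing that threshold, it can help to see what the job would delete without deleting anything. Here's a minimal dry-run sketch, assuming a POSIX sh and the same date-named cache folders the original regex targets. preview_cache_cleanup is a hypothetical helper name, and it swaps the -E/-regex pair for an equivalent -path glob, so the same script also runs with GNU find on Linux.

```shell
#!/bin/sh
# Hypothetical helper: list (but never delete) the files the cron job
# would remove. Usage: preview_cache_cleanup CACHE_DIR MINUTES
preview_cache_cleanup() {
  cache_dir=$1
  minutes=$2
  # In -path patterns '*' also matches '/', so this glob selects any file
  # somewhere below a folder named like 2014-05-26, mirroring
  # -regex '.*/[0-9]{4}-[0-9]{2}-[0-9]{2}/.*' without needing -E.
  find "$cache_dir" -type f \
    -path '*/[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]/*' \
    -cmin +"$minutes" -print
}

# Example (adjust the path to your own Dropbox folder):
# preview_cache_cleanup "$HOME/Dropbox/.dropbox.cache" 60
```

Swapping -print back to -exec rm {} \; turns the preview into the real cleanup; piping the preview through wc -l gives a quick count of how many files a given -cmin value would catch.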

The Caveats

The only potential downside I can think of is that, because you're deleting things from Dropbox's cache of changed and deleted file versions, restoring one of those previous versions or a deleted file will almost always mean having to download it again from Dropbox's servers. If you regularly edit or delete files, only to restore a previous version within a day or two, this might not be something you want to do unless Dropbox is unrelenting in filling up your hard drive with cache files.

Again, the ideal fix would be for the Dropboxers to code in some logic that keeps Dropbox's cache from filling your Mac's hard drive in the first place. But until and unless they do, this is the little hack around the problem that I'll be using.


Notes:
  1. Yes, these are standard Unix commands, and Linux has them too. I linked to the OS X manpages because this tip targets OS X. I don't believe there are any huge differences between the Mac and Linux implementations, with one exception that matters here: the -E flag belongs to the BSD find that ships with OS X, and GNU find on Linux rejects it (its equivalent is -regextype posix-extended, placed before -regex).
  2. I've had it in place for about six weeks and never had any problems, but <insert standard disclaimers about implementing this at your own risk>.
  3. If you don't know this path, it's easy to find: Right-click your Dropbox folder and select "Get Info". In the info dialog, the path is under "General", labeled "Where:", and it's probably something like /Users/yourusername/Dropbox unless you picked a custom Dropbox folder location.
9 Jun 2012 (10 comments)

Did Virgin Mobile USA cut an anti-Android deal with Apple?

Note: There are several embedded tweets in this post that may not appear properly on all devices. Tweets will definitely lack context (what they're "in reply to") on lower-end devices or browsers without JavaScript. Visit the post in a full browser on a PC for the best possible viewing experience. Hopefully WordPress 3.4 will include better tweet display mechanisms and I'll be able to remove this notice.

All right, so the big news is, Virgin Mobile USA will soon carry the Apple iPhone 4S. Which is to say, my pre-paid, Sprint-owned cellular telephone carrier may have cut a deal with Apple to make all their Android devices suddenly look unattractive.

Why do I think that? Oh, no reason, just the plan prices. My long-time Web contact Zoli Erdos asked Virgin Mobile's Twitter customer service account about them and got an interesting (but not entirely clear) answer:

Wait, "Auto top-up" just means letting them charge for monthly service automatically. I let them do that for my Motorola Triumph.[1] Can I get that discount, too? Zoli already got an answer to that question, too:

Huh? Yep, exclusively for iPhone customers, Virgin Mobile USA will take $5/month off of your service plan if you let them charge you automatically every month. Want Android instead? Sucks to be you, you get to pay more.

This story gets even better. I asked, specifically, if there was some kind of deal going on between Virgin Mobile USA and Apple. The answer was surprising, but I'm not entirely sure the responding CSR actually read and understood what I asked:

Let me get this straight. I asked if Virgin Mobile and Apple decided to make Android less appealing, and the answer was "Yes!"? With an exclamation mark?

Wat.

Needless to say, I've been less than amused by the changes to Virgin Mobile's policies over the last year. First they jacked up prices for new customers right before launching the Motorola Triumph in June 2011.[2][3] Virgin Mobile then started throttling 3G data after a 2.5 GB monthly usage threshold.[4] Then they ended grandfathered plan rates for users who upgrade their devices, meaning that if (when) I eventually upgrade away from the Triumph, my monthly fee will jump from $25/month to $35/month, just because I'm changing phones.[5]

What started as a great deal for cell phone service is still a good deal compared with contract carriers, to be sure, but the policy changes and new competitors like republic wireless entering the market make it much less sweet. ($19/month for unlimited everything? Tanj, republic, launch something newer than the LG Optimus already!) Ting and NET10 also offer lower-cost smartphone service compared to contract plans, but for my level of usage both are more expensive even than Virgin Mobile's current pricing.

I really don't like this iPhone policy. The one change over the past year that I was actually happy to see Virgin Mobile make was dropping their $10 monthly surcharge for BlackBerry devices. Though RIM and its BlackBerry devices are all but dead, it was nice to see Virgin Mobile start treating all smartphones equally, pricing-wise. Now, we're back to favoring one platform over the others, and I really don't like that. All smartphone platforms have roughly equal potential for using network capacity, so charging less for one of them makes absolutely no logical sense.

Whether or not there's some behind-the-scenes deal between Virgin Mobile — actually, let's be honest, if it exists the deal is with Sprint — and Apple that's responsible for this price discrepancy, it sure seems like a very anti-Android thing to do. Virgin Mobile, please treat all smartphone plans equally — no platform favoritism. It's the customer-friendly thing to do. Extend the $5/month "Auto top-up" discount to all Beyond Talk plan subscribers (you don't have to include grandfathered users, that's totally understandable) and maybe I won't jump ship to republic wireless as soon as they launch a more powerful device.


Notes:
  1. I've had it since December, but haven't felt the need to review it as I did the LG Optimus V. Pretty much all the bug reports and battery life problems are absolutely true. If I feel like a writing project, though, I'll do a full review of my own, just for completeness.
  2. Virgin Mobile has since remained unwilling to push Motorola to fix the software problems with said Triumph. Motorola, for its part, pretty much ignores/dismisses all bug reports. They keep offering "Factory Data Reset" as the solution to everything, and haven't said a peep about whether or not there will be a software update. As far as I'm concerned, Motorola's reputation as a phone maker is completely shot.
  3. Again, I should do a full review of this phone. It's been out almost a year. I also have a really, really ridiculous story about how I got mine. Plus, I need to rant about the whole "Motorola isn't supporting its devices" thing.
  4. At least there aren't any overage fees. It's slower, but it's still "unlimited".
  5. As I understand it, this new policy would also affect an emergency switch back to my LG Optimus V, if my Triumph fails someday. That's one of the major reasons that I don't like the policy change.
1 Jun 2012 (0 comments)

Some Joy in Seesmic Ping Land

Wednesday, Seesmic sent all Ping.fm users an email with "important information". Dated May 31 (Seesmic's timezone is well ahead of mine), the letter included some basic information that we all pretty much knew. But one sentence actually made me happy:

To further support development and upcoming features, we will offer Seesmic Ping as both a free and paid service.

(emphasis added). Back in February, I wrote "Why I Will Not Use Seesmic, Ever", a post expressing my dismay at the shutdown of Ping.fm and the apparent paid-only nature of Seesmic Ping. I begged the company to consider a "freemium" model and not make all users pay for the service. My post got the attention of a Seesmic employee, who commented, inviting me to share further feedback via email. I never emailed Yama — perhaps I used a feedback form instead, I don't recall — but anyway… I'm glad to see this announcement of tiered pricing and a free base service.

A March 14 blog post from Seesmic gives the pricing tiers:

There will be three plans for Seesmic Ping: a free plan, so that everyone gets a chance to enjoy it, a $4.99/month plan for the ones who want to get more, and a $49.99 for the ones who just want it all.

Before reading the blog post, I posted this suggestion in Seesmic's UserVoice feedback forum, asking that they maintain a free service tier. The response was pretty swift, and positive:

We will have a free version with limited accounts and posts per day. We’ll continue to add features and services which we’ll make paid. — jyamasaki, Seesmic admin, on UserVoice

Limiting the number of accounts makes sense. It's something HootSuite has done for a while now, requiring a paid plan to add more than five networks. Since I presently post to four networks I care about, and would make Google+ a fifth if Ping.fm supported it, I hope that Seesmic Ping's free plan service limit is also five accounts.

I'm leery, however, of the posts-per-day limit. It has the potential to be unreasonable and oppressive if set too low. Personally, I'd like to see a number between 25 and 50 as the daily posting limit, and enforcement in a rolling 24-hour window (no "resets at 00:00 GMT" or some such). I think it's legitimate for a personal user to average two posts an hour. Posting during a regular 14-hour day, a limit of 50 posts would allow a user to share their thoughts roughly every 17 minutes. For people I follow on Twitter who work office jobs, that seems like a common average sharing rate. (Sorry, no scientific study here, just guesstimating.)

All in all, I'm a lot less down on Seesmic Ping than I was three months ago. The final pricing & limitation details will ultimately set my opinion when they're released within the next few weeks, but for now I've rescinded my personal ban on using anything Seesmic makes in light of the "free plan" announcement.

If Seesmic closes Ping.fm before I can auto-publish my blog posts to Seesmic Ping with a WordPress plugin, though, I'll get mad again. Fortunately there's already an API for the new service in private testing.

27 Feb 2012 (4 comments)

My Google AdSense Account: Moved to Where It Belongs

Honestly, I'm no longer sure how it happened, but suffice it to say that a few years ago I did something stupid.

No, no, it was nothing like that. I just applied for Google AdSense a few days before my actual 18th birthday. That, of course, netted me a declined application, because I was obviously still too young to participate in AdSense — but I wasn't counting on it also killing my ability to reapply later. When I tried again to sign up for AdSense using my main Google Account, after I was old enough, I got nothing but errors.

When I emailed Google AdSense Support about the problem, they said I could just reapply, but would have to use a different email address — meaning a different Google Account — to do so. I eventually did so, after I created an alias or two at Gmail, but I never used the approved account. I wasn't sure if I even wanted to try ads on this site, and I also had a hang-up about the principle of having one service in an account separate from all the others.

I did try to change the login email address associated with my approved AdSense account. The only problem was, my account's login couldn't be changed, because it was associated with a Gmail account. I was less than pleased, but figured it was a problem I could solve later.

And so it was: Last week, just in time for February break at college, I discovered that my primary Google Account again had the ability to apply for AdSense. Maybe there's some kind of expiration on declined applications; I haven't read enough of Google's policies to figure that out. (Who has the time, especially as a full-time college student?) So I did it: I reapplied for AdSense on my primary account.

Google's systems noticed that an active AdSense account already claimed my Payee Name. But instead of telling me I couldn't complete my application, as I expected, it asked if I wanted to transfer the account. What did I say? Yes, of course!

I filled out a short form, got a bit of data from my approved AdSense account, agreed to forfeit my $0.00 of unpaid earnings in the old AdSense account,[1] submitted, and waited. In less than an hour, I got confirmations addressed to both my old and new email addresses that my account had been transferred. I logged into AdSense using my main Google Account, and it worked.

Technically, what Google did was close my old account and open a new one associated with my primary Gmail address. That's why unpaid earnings below the payment threshold didn't transfer. If I had generated ad code using the old account, I would have had to replace it. Not having to deal with that made a simple process even easier.

Thanks, Google. Every so often, you do something that makes me really happy. This was one of those things.

This change affects my website in a small way: I'm testing AdSense ads in the places supported by LightWord, the WordPress theme I use, whose development I have kind of taken over.[2] Since enabling the new ads (which only show on single posts) several days ago, I've seen exactly zero clicks. It'll be an interesting experiment to see if that changes.


Notes:
  1. Earnings below the payment threshold of US$10 are forfeit in transfers. Not that I ever used my old account, so it couldn't possibly have any earnings.
  2. It's not like anyone has really seen my changes. I haven't gotten around to officially forking the code and releasing my own version under a different name. Doing that sounds like a summer project, maybe, depending on how busy I am, as it will involve updating the theme code to meet all of the current Theme Review guidelines.
25 Feb 2012 (8 comments)

Why I Will Not Use Seesmic, Ever

Update (06/01): Seesmic eventually killed the green bar overlay. They announced a time-frame (by the middle of June) for closing Ping.fm, and also confirmed that the new Ping service will have a free service level. I commend this outcome, with reservations.

Update (03/03): This post garnered a response from a Seesmic employee, Yama, in the comments. From "figure out the best pricing model", I gather that pricing remains undecided, so I maintain my hope for a HootSuite-like freemium model. I'm also glad to hear that the green bar will be reviewed for possible improvements. Thank you, Yama; if I have more thoughts I will certainly email you.

Earlier this month, no doubt on or soon after February 6, 2012, I went to Ping.fm to find a green bar on top of the area where I usually clicked to log in and get on with posting things to my social networks. Seesmic, apparently, had other plans. They really wanted to make sure I heard about their new product, Seesmic Ping. They covered the login link with a green bar to make sure I'd notice it.

Ping.fm with a green bar advertising Seesmic Ping

The offending green bar on Ping.fm's homepage

All right, fine, I went to have a look. I didn't feel like signing up for the new service, though. Instead, I dug up the blog post announcing Seesmic Ping, from February 6. Near the end, there was a very telling paragraph[1]:

For Ping.fm users – With the release of Seesmic Ping, we’ll look to maintain Ping.fm for some time. In the meantime, we encourage you to sign up for a Seesmic Profile and give Seesmic Ping a ride through our mobile applications or the web.

I wasn't the only one made uneasy by those two sentences. "For some time" really doesn't mean "indefinitely", and it sure sounds like Seesmic will eventually kill Ping.fm entirely.

Axe Ping.fm?

Source images: Question mark, Axe, logo from Ping.fm website

I've had complaints about Ping.fm over the years, occasionally about performance. But most of them came from decisions made by Seesmic, explicitly or not, after they acquired Ping.fm. They were things like:

  • No new API keys for applications
  • Disabling API keys for applications like the Shorten2Ping WordPress plugin, instead of blocking the users who were spamming
  • No new services for years
  • Issues with existing services, like Jaiku (which Google later shut down completely about a month ago)[2]
  • Broken post-by-email[3]

Despite all the issues following the Seesmic acquisition, Ping.fm has remained solidly usable. But Seesmic has now announced a successor to Ping.fm, and what's more, they intend to charge for it (emphasis mine)[4]:

We’ll look to have more features and services when Seesmic Ping comes out of beta as a paid service.

No pricing came with the announcement, just a notice that the new service would eventually cost money. I know we've all been spoiled by free Web services, and the money has to come from somewhere, but somehow I have my doubts that Seesmic will take an approach that is consumer-friendly. HootSuite has a great pricing model: Features that consumers will use (a few profiles, with one user who can manage them) are free; business-level features (more profiles, multiple-user collaboration) cost money. I don't think Seesmic Ping will follow that structure; if I had to guess, everyone will have to pay for it.

I mean, really, Seesmic could have made the green bar push the entire page down, instead of floating it over the four tabs at the top. Look at what it covers:

The green bar superimposed on the normal Ping.fm look

With the green bar made partially transparent, we can see what it covers

It floated on top of the page for a reason, I'm sure. Putting it there made me click on it to make it go away (it didn't). Then I read it, and followed the link. No doubt I followed the expected sequence of actions precisely. And that irritates me, because the green bar should have just looked like this:

How the green bar should have looked, pushing down the rest of the page

This is just as informative, but less annoying.
It probably failed the In-Your-Face Test.

I imagine that the reasoning went something like, "If it doesn't cover the login link, users will ignore it. No, displacing the login link by 40 pixels isn't enough; it has to actually be inaccessible. We will force users to read this bar on every single page." Oh yeah, it pops up on every single page view. Home, login, Dashboard, settings, you-name-it — green bar ALL the pages… for lack of a better X all the Y idea.

There was also an email newsletter sent out on February 15, announcing Seesmic Ping, which I read after going through the whole "green bar" thing. It too addressed the future of Ping.fm… sort of[5]:

Like many of you, we appreciate the passion that Ping.fm brings, and made sure to carry over its core value of the simplicity in posting. With the launch of Seesmic Ping, we continue to enhance this service with reliability and robustness, while offering key features such as scheduling and the ability to post to multiple Twitter accounts and Facebook pages.
Eventually, Seesmic Ping will be a paid service. While in beta, Seesmic Ping is free to access. If you have any feedback, please tell us what you think: feedback.seesmic.com.

The email announcement carefully avoided any mention of shutting down Ping.fm. The original blog post never changed, though, so the plans are certainly still in place.


TweetStats' bar chart showing how I posted my tweets, as of 02:25 EST February 25, 2012

This state of affairs is really disappointing, because I've used Ping.fm as a staple of my online life for, literally, years. According to TweetStats, I've posted from Ping.fm more than I have from Twitter.com. (twhirl is still on top because I used to have it open all the time back in high school.) I post from the Web, from a third-party app on my Android phone, via SMS, and I used to use email posting from my mother's cell phone back before I had my own. In short, I use Ping.fm a lot. It still is the best option I've found on the market for cross-posting to different social networks.

When (not if) Ping.fm goes away, I'll probably end up switching to Hellotxt. Hellotxt has its own share of issues at the moment, including a lot of services that are disabled and a significant slowness to the site, but it's still the best alternative to Ping.fm. I can also just roll up my sleeves and build my own personal system, since all of the sites I use provide free API access, but I'd rather not take the time to do that. It would also load my (very) shared server and lack a lot of features like posting via SMS[6] and scheduled posting.[7] Could I implement them? Sure. Would I take the time? Questionable. Additional features also mean additional server load, and so on.

The point is, I have only one practical alternative — Hellotxt — because building my own is hard, time-consuming, and unlikely to happen any time soon. I dream that Seesmic will change plans and decide not to kill Ping.fm, but the reality is that it's almost certain to happen and the only question is when. Hopefully Hellotxt will have its issues worked out by then and will be ready to take over as king of the cross-posting niche. It would certainly serve Seesmic right if Ping never went anywhere, and that might be worth losing Ping.fm.

As for never using Seesmic, ever, well, let's just say I oppose the way they do things. I don't like it when a company buys another company, takes the ideas and technology from existing products, and then shuts down the old company's services. Google does that a lot, and those are the times when I come closest to hating Google. The difference is, Google almost always creates awesome things out of the remains of old companies and services. Seesmic hasn't really done anything but allow a useful product to stagnate, and now they're going to kill it at some unspecified future date, replacing it with something that can never be a true replacement. You can't replace a free service with a paid service; it doesn't work that way.

If Seesmic takes their pricing structure in the same direction as HootSuite, though, and they only charge for certain features, I might actually give Ping a try. I have a hard time imagining a situation that would make me actually like Seesmic as a company, though.


Notes:
  1. The paragraph was riddled with links to Seesmic.com, which I didn't copy. There was no point.
  2. Unlike other social networks that died, Jaiku had a dedicated following willing to preserve its contents, if not the functionality. Apparently, my "presences" are archived.
  3. Added later on publish date (23:20 or so) when I discovered that Shorten2Ping had failed to post this article via Ping.fm. My server's emails are working. The problem is with Ping.fm. Grr.
  4. Yes, I skipped copying another link to Seesmic.com. All occurrences of "Seesmic Ping" were linked except for one. I guess somebody missed it.
  5. And just like in the blog post, every occurrence of the phrase "Seesmic Ping" was linked to Seesmic.com. Talk about carpet-bombing links.
  6. If I'm not paying for Seesmic Ping, I'm certainly not shelling out for an SMS gateway to serve my one-user app.
  7. Ping.fm only has scheduled posting because HootSuite supports Ping.fm. It's not native. Hellotxt has native scheduling, but I haven't tested it yet.
30 Nov 2011 (0 comments)

Facebook News Apps Open Firehose of Pageviews

This is my fourth (and final) blog post assignment for my Journalism course. It's kind of an op-ed in its own right, though not something I was likely to bother writing about if not for the assignment.

Back in September, at its f8 conference, Facebook announced a new kind of app, with the ability to use "frictionless sharing" — basically a fancy way of saying that users' activity can be shared without users specifically clicking a "Share" button.

The first reaction to this announcement was lukewarm at best. As users began to notice just how much activity was being shared, they complained about both ends of the process. Some users were upset about how much of their activity was being shared (Spotify, in particular, started out by sharing every single track listened to); others felt overwhelmed by the new activity ticker in the upper right corner of their Facebook home pages (which was flooded by Spotify posts in the beginning).

News organizations jumped on board with their own apps for auto-sharing every article read by a Facebook user. Some, like Washington Post Social Reader, live entirely in Facebook, allowing (and encouraging) users to read articles without even leaving Facebook.com. Others, like Yahoo News, share activity from the organization's site via code that pushes activity to Facebook.

The Yahoo News model of frictionless sharing is actually more disturbing, because there's little to no indication to the user that sharing is taking place. Activity on Facebook can be reasonably expected to be shared, but activity on a third-party site seems outside Facebook's influence.

There are other considerations as well, around the meaning of sharing. As Farhad Manjoo wrote in a now-archived Slate article, "You experience a huge number of things every day, but you choose to tell your friends about only a fraction of them, because most of what you do isn’t worth mentioning." Nicki Porter, blogging for CopyPress.com (which provides content development services), made a very relevant point based on that: "If we only share about 10% of what we see online, we’re sharing the best 10%."

Philosophy and user opinion aside, the last two months have seen massive growth in news app usage. Poynter's Jeff Sonderman wrote this morning that news organizations are reaching millions of users through these new auto-sharing apps. In particular, Open Graph statistics released by Facebook yesterday show:

  • Yahoo News: 600% increase in traffic from Facebook; 10 million users connected, who read more articles than the average
  • The Independent: 1 million monthly active connected users; articles from the late 1990s going viral
  • The Guardian: 4 million users installed their app, more than half of them under age 24; averaging nearly 1 million extra pageviews per day
  • Washington Post: 3.5 million monthly active users of Social Reader app; 83% of readers under age 35

Facebook is helping news organizations with a box at the top of the homepage News Feed that shows a small selection of stories that friends have read recently.

The lesson from all this is that a platform like Facebook, which has over 800 million active users (as self-reported on its statistics page), can be a real boon to news outlets. Traffic equals eyeballs, and more eyeballs can generate more advertising revenue.

What's especially interesting to me is how similar the new sharing (which is officially part of the Open Graph API) seems to Beacon, a "mistake" (said Mark Zuckerberg, founder of Facebook) that launched in late 2007 but was shut down in 2009.

Beacon also shared user activity on third-party sites back to Facebook, at first without permission. The class-action lawsuit Lane v. Facebook, Inc. resulted in Beacon being modified to require user confirmation before any sharing occurred. Open Graph sharing as it is today resembles the original Beacon, sending data to Facebook and publishing activity without any user intervention or even consent.

I, for one, stubbornly refuse to install any of those new auto-sharing Facebook apps. (Fortunately, it's pretty easy to bypass the request for permissions that pops up when I click an article featured at the top of my own News Feed.) I agree with Farhad Manjoo and Nicki Porter: Sharing is about choice. If I want to share something, I'll take the three minutes to post it myself.

Filed under: opinion, school No Comments
9 Nov 2011 (2 comments)

Finding Sources for Interviews is Hard

This is my third blog post assignment for my Journalism class. I went for the reflection option this time instead of the news topic option because I had something to say about my experiences with the class in the last two weeks.

As I've worked to find people I can interview for my feature article, I've found that it can be really difficult to actually connect with even one person who can address the topic in question. Many people will simply ignore interview requests.

I'm sure part of the problem is my choice of subject. Not that many people know about Bitcoin, after all. What's more, privacy and anonymity are cornerstones of Bitcoin's design. That makes them part of the user culture…or maybe that just means Bitcoin attracts privacy fanatics.

In any case, I've successfully found only one source, an assistant professor of economics here on campus. I found him through the head of the economics department, and even that didn't happen in time for me to include anything he said in my first draft. (He's only on campus on Fridays, and I didn't get his name until the Saturday before the Thursday my draft was due.) I also couldn't take the time to properly write my first draft. It was probably the roughest piece of writing I've ever submitted to a teacher, graded or not. (Well, there were those bits of writing I did in elementary school, but I won't count those, because I don't count those years as part of my real education.)

On the social media front, I've had a nibble or two, but no real responses. I got a really good referral on Twitter from someone I interact with pretty often, who told me about a Bitcoin fanatic he knows, but this fanatic 1) has a private Twitter stream and 2) ignored my attempts to get in touch. What I said about privacy before definitely applies to this guy.

Actually, a follow-up message I sent to the economics department chair here at Brandeis, asking about another source within the department who might be available for an interview sooner (in time for my first draft), fell through the cracks. (I hope it fell through the cracks; the alternative is being ignored, and I don't like being ignored when I'm trying to do an assignment. No, Brandeis' email system doesn't lose messages. Google Apps is more reliable than that; I use it for my personal domain, so I have some experience there.) I guess that can't be blamed on the Bitcoin culture.

Having failed to find any more sources in the week since turning in my draft, I plan to launch something of a guerrilla campaign on Friday. (The rest of Wednesday and all of Thursday will be dedicated to finishing my Java programming assignment by the deadline and to studying for my Hebrew midterm on Friday morning.) My current campaign hit list includes the economics and computer science departments of several colleges, a few friends of mine who must either know about Bitcoin or know someone who does, and a couple of law firms with which I have connections. This last item is important, as I need to understand the legal environment surrounding Bitcoin as a competitor to the United States dollar (and to every other nation's currency).

May my campaign result in a deluge of responses. If it doesn't work, I guess I'll be asking my professor for help on or around Tuesday afternoon.

As an aside, Bitcoin is also hard to research. In looking for material online (for not much has been said about it in physical media), I followed many dead links. The system is somewhat unstable, as shown by what happened when the Mt. Gox exchange was compromised (a part of my research); the information resources about it are even more so.

Thanks to my source-finding campaign plans and my need for better research, I foresee that my weekend will be full of work for my journalism class. Well, the part of it that is not taken up by tech week for The Last Night of Ballyhoo, for which I am the sound designer.

Perhaps I should just say that I will be having a busy week(end).

Filed under: musings, opinion, school
17Oct/110

Leaky Websites

This is my second blog post assignment for my Journalism course. As with the first, reposted here because "why not".

The New York Times' "Bits" blog published an article last Tuesday that really opened my eyes. It covers data released by the Center for Internet and Society at Stanford Law School showing what information certain popular websites pass along to other sites.

Long story short, logging in (or even trying and failing to log in) to a site can pass information about you to third parties. That information can be as innocuous (but still trackable) as a "unique identifier" generated by the site or as specific as your email address, username, and real name.

Somini Sengupta (author of the Bits blog post) says:

Take for instance these findings, released on Tuesday by computer scientists at Stanford University. If you type a wrong password into the Web site of The Wall Street Journal, it turns out that your e-mail address quietly slips out to seven unrelated Web sites. Sign on to NBC and, likewise, seven other companies can capture your e-mail address. Click on an ad on HomeDepot.com and your first name and user ID are instantly revealed to 13 other companies.

I did some digging of my own through the Microsoft® Excel® spreadsheet available from the Stanford Law School page and found some interesting examples.

For example, MSN.com leaks your birth year and birthdate to FBCDN.net (a domain owned by Facebook and used for content distribution). Facebook's CDN can't possibly need that information for anything but tracking. Take another case: Ask.com sends your username to Google Analytics, reCAPTCHA (owned by Google), ScorecardResearch (part of comScore, Inc.), Gigya (a company that "makes websites social"), Quantserve.com (used by Quantcast, an advertising network), IMRWorldwide.com (controlled by Nielsen), and LinkedIn.

Incredibly, The Huffington Post's website sends your username to BlogCDN.com (another CDN), BuzzFeed ("Tracks the Web's Obsessions in Real Time"), AdSonar (owned by Advertising.com; provides targeted text ads), ScorecardResearch, AOL.com (Huffington Post's owner), FBCDN.net, aolcdn.com (AOL's CDN), ATWOLA.com (stands for AOL Time Warner Online Advertising; tracks surfing habits), Facebook.com and Facebook.net, Google Analytics, IMRWorldwide.com, Quantserve.com, and HuffPost.com (used for delivering static content without cookies, ironically); your birthday to BuzzFeed and IMRWorldwide.com; and your birth year to Advertising.com and ATWOLA.com.

The point is, any information given to a website as part of the registration process or entered later while updating a profile allows third parties to do just that: profile you as a person through your behavior across countless sites. All this tracking is possible because cooperating sites can trivially circumvent the "same origin policy" that is supposed to keep data stored in browser cookies confined to the site that set it.
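To make the profiling idea concrete, here is a minimal Python sketch of what a single third-party tracker could do with the requests it receives. It assumes the tracker sets one cookie on its own domain and is embedded on several unrelated sites; the log format, the `build_profiles` helper, and all domains and values are hypothetical, not taken from the Stanford data.

```python
from collections import defaultdict

# Hypothetical requests seen by ONE tracker embedded on several sites.
# "cookie_id" is the tracker's own cookie; "page" is the embedding page;
# "leaked" holds any personal data that page passed along.
tracker_log = [
    {"cookie_id": "abc123", "page": "http://news.example/politics"},
    {"cookie_id": "abc123", "page": "http://shop.example/cart",
     "leaked": {"username": "jdoe"}},
    {"cookie_id": "xyz789", "page": "http://news.example/sports"},
    {"cookie_id": "abc123", "page": "http://video.example/watch?v=42"},
]


def build_profiles(log):
    """Merge every request into one profile per tracker cookie."""
    profiles = defaultdict(lambda: {"pages": [], "identity": {}})
    for request in log:
        profile = profiles[request["cookie_id"]]
        profile["pages"].append(request["page"])
        # Data leaked by any one site gets attached to the whole browsing history.
        profile["identity"].update(request.get("leaked", {}))
    return dict(profiles)


for cookie_id, profile in build_profiles(tracker_log).items():
    print(cookie_id, profile["identity"], profile["pages"])
```

Only one of the three sites visited under cookie `abc123` leaked a username, yet the tracker can now attach that name to the visitor's activity on all three.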

A standard feature of Web browsers is sending the address of the last page visited (the "referrer", carried in an HTTP header that the standard spells "Referer") to the page being loaded. In the case of images, scripts, or other resources loaded within a page, the referrer is the page in which they are embedded. If a page displaying advertising has personal information embedded in its URL, that information is passed on to every site whose assets are embedded in the page. This kind of information leakage can be accidental as well as deliberate. It does not typically happen when an encrypted page (URL beginning with https://) loads resources over an unencrypted connection, as browsers do not send referrers from secure pages to insecure ones.
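The referrer mechanism can be sketched in a few lines of Python. This is a simplified model, not any browser's actual code: it assumes the browser sends the full page URL as the referrer, as browsers of the time did by default (current browsers trim cross-origin referrers to just the origin by default), and that a secure (https) page sends no referrer to an insecure (http) resource. The function names, URLs, and query parameters are hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def referer_for(page_url, resource_url):
    """Referrer a 2011-era browser would send when page_url loads resource_url.

    Simplified model: the full page URL is sent, except that a secure
    (https) page sends nothing to an insecure (http) resource.
    """
    if urlsplit(page_url).scheme == "https" and urlsplit(resource_url).scheme == "http":
        return None
    return page_url


def leaks(page_url, resources):
    """Map each third-party host to the query parameters it can read from the referrer."""
    page_host = urlsplit(page_url).hostname
    exposed = {}
    for resource in resources:
        host = urlsplit(resource).hostname
        if host == page_host:
            continue  # first-party resource: not a leak to a third party
        referer = referer_for(page_url, resource)
        if referer is not None:
            exposed[host] = parse_qs(urlsplit(referer).query)
    return exposed


page = "http://news.example/profile?user=jdoe&email=jdoe%40mail.example"
resources = [
    "http://news.example/logo.png",   # first party
    "http://ads.example/banner.js",   # hypothetical ad network
    "http://stats.example/pixel.gif", # hypothetical analytics pixel
]
print(leaks(page, resources))
```

Both third-party hosts end up with the username and email address from the page URL, while the first-party logo request is skipped. Switching the page to https:// and leaving the ad and pixel on http:// makes the result empty.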

Websites that intentionally want to share user information can do so in other ways. I had written out an explanation of an example process, but suffice it to say that methods for intentionally sharing information and tracking users across domains, even in spite of user privacy choices like clearing cookies, are numerous.

When information is revealed in the URL, it's not necessarily intentional. Back in May, Symantec discovered (as The Daily Mail reports) that some applications on Facebook's platform were potentially giving advertisers access to users' accounts because the apps' URLs included access tokens, the bits of information older Facebook apps used to identify themselves and connect to users' accounts. It was just an oversight.

14Sep/110

Google Books and the Book Industry

I wrote this for my Journalism class at college, but figured I might as well share it here too.

The New York Times ran a story Monday about a new lawsuit filed against HathiTrust, a partnership of universities and research libraries that maintains a digital book collection on its website.

Plaintiffs in the suit include three major authors' groups: the Authors Guild, the Australian Society of Authors, and the Québec Union of Writers. Eight individual authors are also party to the filing, among them Pat Cummings, Roxana Robinson, and T.J. Stiles.

The objections raised in the suit center around the HathiTrust collection itself. "[S]even million copyright-protected books" (according to Paul Aiken, executive director of the Authors Guild, as quoted by the NYT) are available without any consent from the authors. The Authors Guild and its fellow plaintiffs say that the collection violates copyright law.

HathiTrust's collection consists of books digitized by Google, Inc. as part of the Google Books project, which has been steadily scanning books from participating university libraries across the United States.

The Google Books project has been the subject of many lawsuits since work on it began in 2002. A few examples will help provide context:

  • 2005: The Authors Guild sues Google for "plain and brazen violation of copyright law" (archived press release from AG via Archive.org)
  • 2009: French court halts Google Books in France: the ruling applies only to books published in France under copyright (Los Angeles Times article)
  • 2010: Several professional photographers' organizations bring a class-action suit regarding the reproduction of copyrighted images within the books scanned by Google (Mashable.com article)

The Authors Guild has been involved with this issue before. This time, the fight has been brought to an organization with a bit less might than Google.

But never mind who sued whom, for what, and when. The issue is really quite simple, and most of the lawsuits against Google Books have had little to no merit.

United States copyright law (the law under which most Google Books lawsuits have been filed) contains a doctrine known as Fair Use. It was originally intended to protect commentary, critique, and parody of copyrighted works. However, courts decide whether a use is fair by weighing four broader factors (Cornell University Law School Legal Information Institute):

  1. "the purpose and character of the use" — e.g. for commentary, critique, parody, scholarship, etc.
  2. "the nature of the copyrighted work" — published/unpublished, fact/fiction
  3. "the amount and substantiality of the portion used" — how much of the work was used, and how significant the used portion is to the work as a whole
  4. "the effect of the use upon the potential market" — if the use of that portion will negatively affect demand for or the value of the original work

(Thanks to Stanford University's Copyright & Fair Use information center for helping me refresh my own memory of these concepts.)

Google Books is carefully designed to fit within existing copyright law. Books in the public domain are fully accessible, with no restrictions. Copyrighted, in-print books allow whatever access the publisher has specified. For in-copyright books that do not have a publisher, Google restricts access to "snippets", which show just a few words surrounding the user's search term.

So: Whenever Google Books shows a significant portion of a book, it has permission from the publisher to do so. Without permission, Google Books displays tiny fractions of the full work in an immensely transformative manner.
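For a rough feel of what "a few words surrounding the user's search term" means, here is a minimal Python sketch of snippet extraction. Google's real implementation isn't public; the `snippets` function, the single-word search, and the three-word `radius` on each side are assumptions chosen for illustration.

```python
import re


def snippets(text, term, radius=3):
    """Return up to `radius` words on each side of every match of a one-word term."""
    words = text.split()
    target = term.lower()
    results = []
    for i, word in enumerate(words):
        # Ignore case and punctuation attached to the word ("wisdom," matches "wisdom").
        if re.sub(r"\W", "", word).lower() == target:
            start = max(0, i - radius)
            end = min(len(words), i + radius + 1)
            results.append(" ".join(words[start:end]))
    return results


# A public-domain sentence standing in for a scanned book.
page = ("It was the best of times, it was the worst of times, "
        "it was the age of wisdom, it was the age of foolishness")
for snippet in snippets(page, "wisdom"):
    print("..." + snippet + "...")
```

Even across a whole book, each match reveals only a handful of words, which is why snippet display uses such a tiny fraction of the work.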

Google Books falls well within Fair Use doctrine, at the very least. Displaying card-catalog-type information about the book plus at most a sentence or so for each search result (I'll go down the Fair Use list):

  1. Is for scholarly reasons
  2. Uses published works
  3. Displays at most a few percent of the whole book
  4. May actually increase demand for the books featured in the results

(Parts of Lawrence Lessig's 2006 video discussion of Google Book Search came in handy for an overview of how Google Books works.)

So why are publishers and authors suing Google and HathiTrust?

As far as I can tell,[original research?] HathiTrust follows the same rules as Google Books. This makes sense, as the content is from the Google Books program.

HathiTrust's entire archive is intended for academic use. It's unclear why the various plaintiffs in this new lawsuit are suing for the removal of their books from the archive, rather than suing for better access controls. If the concern is that anyone can access the books (which they can), then restricting access to verified researchers would clear up the problem.

It's like the situation with big music, film, and television. The music industry figured out that it could simply adapt to the Internet and start offering content over the new medium, giving people an alternative to pirated copies shared through services like Napster, LimeWire, and BitTorrent. Film and television haven't yet figured that out, and I guess the book industry is still working on it too.

7Sep/110

Finally: Google Voice Export Feature Released (sort of)

It took quite a while (more than two years since its launch in March 2009), but Google Voice finally supports exporting!

I'd love to think my export format ideas post had something to do with the end product released yesterday, but I seriously doubt it.

Sort of…

Let's just say, Google Takeout isn't behaving very well. The test archive I created yesterday won't download, and I've tried both Google Chrome 13 and Mozilla Firefox 3.6. The feature isn't there yet, but I'm sure Google engineers are working on it.

I'm still happy…as soon as they make it actually work.