Theodore Pak

Hacker News Sidebar: An Extension for Google Chrome

20 March 2013

tldr: I’ve fixed up the Hacker News Sidebar Chrome extension. It shows comment threads from HN in a handy tab next to any other pages that you visit. Browse the source code on GitHub, and install the extension from the Chrome Store.

screenshot

Like many other web developers, I use Google Chrome as my primary web browser because it is incredibly fast, the UI gets out of the way, and the web development tools are superlative. That might change some day—Firefox is making a comeback with regard to all of these things, and I like using it as well, just not quite as much to make me switch back yet.

One surprisingly nice thing about Chrome is that it is really easy to develop extensions, particularly if you are a web developer. Generally all that’s required are a few JavaScript files and optionally some HTML and CSS and you could do anything from bookmark syncing, page syncing across devices, or elimination of those pesky tracking scripts. Admittedly there are limitations: you can only modify the browser UI in a few specific ways, like adding buttons or icons next to the Omnibox that pop out content, or items to the context menu.

Chrome extensions function much like an offline HTML5 app, running in the background as a hidden tab. Besides all the typical features of the DOM, like the XHR object for asynchronously communicating with webservers, they can access special Javascript APIs (under the chrome global object) for opening and manipulating tabs, history, windows, and bookmarks. They can also selectively insert Javascript into webpages, like the Greasemonkey scripts or Userscripts that are popular for enhancing the functionality of certain websites. To securely communicate between code running in the user’s browser tabs and your extension’s hidden background tab, you use a messaging API. All the special permissions your extension needs must be specified in a manifest.json file.

As you may have guessed by the links at the end of each post on this blog, I read HN quite a bit—enough so that I’d rather just direct any discussion to that forum, rather than try to police my own comments database. Meta-discussions on link-aggregating websites (e.g. HN, reddit) are often more interesting than the original content. Once upon a time I used this Chrome extension to reveal these HN threads automatically: it would pop out a side tab on any page that had been posted to HN. Whenever the orange tab happily emerged, a quick scan of the sidebar would show what HN thought and how many upvotes it had, sometimes significantly changing my impression of the original page.

Unfortunately, that extension has fallen into disrepair. HN has switched to HTTPS and now disallows framing of their site via the X-Frame-Options: DENY header ¹, causing the tab to be blank. Additionally, the Chrome extension packaging scheme has changed to tighten security policies, so barring some serious updates, this extension will stop working in September 2013. In order to get this extension running again, I had to rewrite much of it. Let’s break down the source code.

All Chrome extensions begin with a manifest.json file. The first half of this file says that I need the xhr_handler.js script to run in my background tab, and I need to insert hn.css, jquery.js, and script.js into all pages (as specified by matches). Note that through some scoping magic that Chrome does, these inserted scripts will not interfere with any global variables or functions created by their host pages, but they will still be able to affect the DOM. Finally, there is some metadata for the Chrome Store in the second half, along with permissions, which enumerates the domains that I want my background script to be able to access.

{
   "manifest_version": 2,
   "background": {"scripts": ["xhr_handler.js"]},
   "content_scripts": [ {
      "css": [ "hn.css" ],
      "js": [ "jquery.js", "script.js" ],
      "matches": [ "http://*/*", "https://*/*" ]
   } ],
   "icons": { "48": "icon-48.png",
             "128": "icon-128.png" },
   "description": "Hacker News integration for Chrome",
   "name": "Hacker News Sidebar",
   "permissions": [ "http://api.thriftdb.com/api.hnsearch.com/items/*", "https://news.ycombinator.com/*" ],
   "version": "1.0.8"
}

Let’s look at the script.js file next, which will be inserted into every page I visit. For the most part, this looks like a run of the mill jQuery script. The first atypical tidbit is this:

var port = chrome.extension.connect({}),
   callbacks = [];
// ...[snip]...

port.onMessage.addListener(function(msg) {
  callbacks[msg.id](msg.text);
  delete callbacks[msg.id];
});

function doXHR(params, callback) {
  params.id = callbacks.push(callback) - 1;
  port.postMessage(params);
}

This appears to be a wrapper for performing XHR (also called AJAX requests), but why all the bother when I would typically just use $.ajax? Well, despite their special scope, content scripts still execute with all the cross domain restrictions that the host page has; they can’t perform AJAX requests outside the host, port, and protocol the host page was served rom. That’s a problem because we need to pull outside content from both the HNSearch API and HN itself. So, to work around this, the code packages the parameters for the XHR into a message that is sent via the Message Passing API under the special chrome global object to the background script, xhr_handler.js, where it can executed with the correct permissions.

What happens there? Let’s see:

function xhrCall(url, port, id) {
  var xhr = new XMLHttpRequest();
  xhr.open("GET", url, true);
  xhr.onreadystatechange = function() {
    if (xhr.readyState == 4) {
      port.postMessage({id: id, text: xhr.responseText});
    }
  };
  xhr.send();
}

chrome.extension.onConnect.addListener(function(port){
  port.onMessage.addListener(function(request) {
    xhrCall(request.url, port, request.id);
  });
});

Pretty much all it does is actually perform the XHR request and send the responseText (the raw text content of the response) right back. Note the use on both ends of the port.postMessage method to send data and the addListener methods to create event handlers that receive data. This is very reminiscent of the HTML5 Web Workers API.

So now that we can perform XHR to the necessary outside sites, back in script.js, we query the HNSearch API filtering by the URL of the current page:

var exclude = /\.(xml|txt|jpg|png|avi|mp3|pdf|mpg)$/;
// ...[snip]...
var curPath = window.location.href;
if (exclude.test(curPath)) { return; }

var queryURL = "https://api.thriftdb.com/api.hnsearch.com/items/_search?filter[fields][url][]=" \
  + encodeURIComponent(curPath);
doXHR({'action': 'get', 'url': queryURL}, function(response) {
  // JSON.parse will not evaluate any malicious JavaScript embedded into JSON
  var data = JSON.parse(response);

  // No results, maybe it's too new
  if (data.results.length < 1) {
    doXHR({'action':'get','url': HN_BASE + "newest"}, function(response) {
      searchNewestHN(response);
    });
    return;
  }

  // If there is a result, create the orange tab and panel
  var foundItem = data.results[0].item;
  createPanel(HN_BASE + 'item?id=' + foundItem.id);
});

and if we find results, we kick off creation of the orange tab with createPanel; if not, we give the newest HN stories a try with searchNewestHN, in case this is a story that just made it to the front page. searchNewestHN works by parsing the HTML of the front page and finding any anchors that link to this page.

function searchNewestHN(html) {
  var titleAnchor = $('a[href=\'' + window.location.href.replace(/'/g, "\\'") + '\']', html),
    linkAnchor = titleAnchor.parent().parent().next().find('a').get(1);
  if (linkAnchor) {
    createPanel(HN_BASE + $(linkAnchor).attr('href'));
  }
}

If it does find it, it too calls createPanel. The createPanel function is pretty dry with a lot of dry DOM construction, but the fun part is when we actually need to put the HN comment thread into the DOM:

function createPanel(HNurl) {
  // ...[snip]...
  var HNembed = $("<div />").attr({'id' : 'HNembed'});
  var HNsite = $("<iframe />").attr({'id' : 'HNsite', 'src' : 'about: blank'});
  // ...[snip]...
  HNembed.append(HNsite);
  HNembed.hide();

  $('body').append(HNtab);
  $('body').append(HNembed);

  doXHR({'action': 'get', 'url': HNurl}, function(response) {
    var doc = HNsite.get(0).contentDocument;
    response = response.replace(/<head>/, '<head><base target="_blank" href="'+HN_BASE+'"/>');
    doc.open();
    doc.write(response);
    doc.close();
  });
}

This is not your typical use of an <iframe>: instead of setting the src attribute to the comment thread URL, which we can’t do because HN forbids framing, we have to leave it as about:blank, fetch the contents of the comment thread page via XHR, and then write to the <iframe>’s document object extracted with the contentDocument property. We can write the same HTML that HN serves, with one modification: by adding a <base> element to the <head> of this document, we can ensure that all the links within are followed relative to HN, and not the URL of the host site. The .open() and .write() calls here are indeed those yucky methods you have been taught to never use since DOM scripting became practical, and in general you really shouldn’t, but here it is fair game because the document we are writing to is completely empty.

So, after all of that lovely bit-pushing, we get a collapsible sidebar with an HN comment thread in it! To try it out yourself, install the extension from the Chrome Store, or clone the source code and load the directory into Chrome as an unpacked extension.

screenshot

One note on a privacy weakness of this extension, which you may have noticed if you were following the code above: the extension sends all the URLs that you visit to https://api.thriftdb.com/. Obviously, that isn’t going to be acceptable for some users, but there’s no way to look up Hacker News threads for all the URLs you visit without sending those URLs somewhere. The Hacker News’d extension, which does something similar but doesn’t add the comment thread to the browser window, makes a strange attempt to mitigate privacy concerns by MD5ing URLs, but then it requires a special server-side component that is constantly scraping the front page and shoving the MD5s into a MongoDB on Heroku, and besides that MongoDB being necessarily more incomplete than the official API, it doesn’t really solve the privacy problem anyway. So, if you want to use this on your primary browser you basically have to trust the ThriftDB folks with your browsing history (hey, at least it’s being sent over SSL).

Presumably, HN uses the relatively new X-Frame-Options header to prevent people from embedding HN comment threads directly into their own site, either because they don’t want them surrounded by other people’s content or because they want to block clickjacking schemes for tricking users into upvoting items. Unfortunately this also precludes the simplest method of accomplishing what the extension used to do—loading the comment thread into an <iframe> by setting its src attribute. I suppose I could have also tried to get around this restriction by modifying HTTP headers as they are received, but the way I did it seemed easier. ↩

Discuss this post on HN.

All Science is Anthropological at the Margins

28 February 2013

science 2

It is no secret that hard science folks (think Physics, Engineering, and Math majors) sometimes look down on the “softer” sciences for being less rigorous. There is a joke that a professor from my college once told, which I’m copping directly from a former classmate’s blog:

The library at the Princeton Institute of Advance Study was divided into two wings. One wing was for the sciences, which included all the updated journals in math, physics, etc. The other wing was for the humanities, which had the analog journals in history, literature, etc. Given that this was an advanced institute, these scholars spent a great many hours in the library, in their respective wings. One day, Kurt Gödel had enough, gathered a stack of journals, walked to the central librarian, and stated that these journals have been shelved incorrectly. The librarian looked puzzled because all the call numbers to the anthropology journals were correct. In response, Gödel supposedly shouted, “These surely cannot belong in the science wing!

As much as anthropology can’t get no respect, I’m starting to notice a funny trend in the classes I’m taking in grad and med school as compared to college. Introductory science and mathematics classes in college are very didactic: you learn principles and their applications. Although in introducing a topic professors talk a little about its backstory—e.g., how Rutherford’s experiment with gold foil led to the development of the Rutherford model of the atom, which became the Bohr model—the who and the why is secondary to grokking the setup, the principle demonstrated, and its logical consequences.

And that is in fact a wonderful thing about math and science, and a large part of why I liked it throughout grade school: rote memorization was de-emphasized to the point of remembering only a small set of principles, and the skill lay in combining and applying them. By contrast, history courses seemed to be full of details that had to be slowly committed to memory with little to tie them together except that things happened to turn out that way. Whereas knowing F = dp/dt could take you very far in solving a multitude of physics problems, knowing Moctezuma took power in 1440 would not help you guess how long he stayed in power, how he died, or much else about his life or Aztec history.

There is a tide shifting, however, in my grad school courses on the biomedical sciences. Whereas the principles taught in college are usually well established, once you reach the fringes of what is known (i.e., results from just a few years ago), there are enough potholes that the multiplicative benefits of combining prior knowledge breaks down. For example, here’s a slide from a developmental biology lecture I had this week:

non-canonical Wnt pathway

There are almost as many question marks here as named proteins! It turns out that this is the “non-canonical” Wnt pathway, which the lecturer happens to research in his own lab. That name already tells you a few things: that there is a “canonical” pathway (the one enshrined in Alberts’ MBOC, of course), that this one is less established, and that people don’t completely understand the relationship between the two yet. The fact that one pathway is “canonical” is a complete accident, since that one would have likely been considered differently had it been discovered later, or at the same time as the “non-canonical” pathway.

This is starting to sound as arbitrary as history, right? It is, and the reality is that in order to do research at the fringes of science, particularly in super sparse fields like the life sciences where there are a lot less researchers than things to study (there are possibly billions of molecular interactions that take place in the human body), you need an anthropological backstory of what you are studying. That includes: 1) who worked on it and what else they worked on, 2) why they started working on it, and 3) what their assumptions were. Not only is it helpful for understanding how things in the subfield were named—gene names have a tendency to get really fanciful¹—but it is crucial for properly evaluating the claims made and the way the evidence is presented.

Speaking of naming things: odd names pop up just as frequently in Anatomy and lately, Histology. For better or worse, many of the cells in the body were named by histologists who were working in the late 19th and early 20th century and could often see a lot more than they could understand (certainly at a molecular level). Also, what they saw depended on the stains they chose. The result is that in the 21st century, we still use names like “enterochromaffin cells”, “eosinophils”, and “basophils,” which have nothing to do with the function of the cells, but reflect the stains that revealed them on microscope slides.

So lately, science classes seem to be more and more dependent on the history and people behind them. A unit on a particular cellular transport mechanism, summarized as “this person argued for this controversial idea, the more established groups vehemently responded, and a great solution was overlooked for years because the author died before he was recognized” could equally describe a history of cultural movements in 19th century France. Is there something to be learned from this? Well, a new scientific hypothesis has much in common with a cultural movement: when only a few labs are willing to come up with supportive data, how the greater scientific community will respond depends quite a bit on the kind of the people behind it, how well the idea is marketed, and the responses of others with vested interests. (A 1% rule borrowed from web design is probably appropriate here: for every 100 people that hear about your great idea, 10 will actually understand it enough to care, and maybe 1 will do something about it.)

The beauty of science is that data trumps all, as my dad says. The tough news is that getting enough other people to hear your idea and generate data about it, to the point where it becomes a principle that stands on its own, is an anthropological problem. That’s why it takes two decades from an initial (amazing) result to get a Nobel Prize, assuming of course that it is even awarded to the right person. Until then, to make sense of brand-new science, you have to learn just as much about the people behind the ideas, because that is often the only way to comprehend and contextualize what they are trying to say.

A sample of actual gene names made up by goofy D. melanogaster researchers: Bride-of-Sevenless, Kevin and Barbie, Swiss Cheese, cheapdate, Mothers-Against-Dpp, Sugar Daddy, and John Wayne Bobbit. Must be all those ether fumes… ↩

Discuss this post on HN.

Pebble, a watch made for iPhone/Android

26 February 2013

I ran across this interesting gadget while stumbling upon a favorable review of it on another blog.

It’s a watch with an e-Ink display that is meant to be tethered via Bluetooth to an iPhone. Besides displaying the time, which it can obviously do in just about any format (analog/digital/words), it will discreetly display texts, caller ID data, and other alerts from your phone. When the phone is playing music, it can control playback.

Watches seem to currently occupy a tenuous and nostalgic place in our digital lives—unless you are wearing them for show, sport, or jewelry, a brick phone from 2001, functionally speaking, can outdo a Swiss chronometer in every regard except size. Sadly, that includes displaying an accurate time for the local timezone: GSM time has always been more consistent than whatever my radio-synchronized watch showed, and it required zero fiddling. Having been wristwatch-less for a while now, I’ve only missed it in two scenarios: 1) when, in the middle of a conversation, I needed to check the time and pulling out a phone seemed faux pas; and 2) trying to time something while my hands are occupied, e.g. taking a pulse or respiratory rate (who took all the clocks out of hospital exam rooms?)

So, the smart move by watchmakers is to have them piggyback off of a phone’s capabilities. However, the established brands have mostly ignored this potential, with only two lackluster products that I can find: Sony’s is only Android compatible, and G-Shock is still using the same LCD screens as ten years ago. Even car manufacturers like GM are ahead of the game on this one, dropping CD players from cars marketed to younger drivers and prioritizing smartphone integration. Because seriously, besides an FM radio, what can a car audio system offer that a phone with an iTunes library, Pandora, and Spotify couldn’t outdo?

Thankfully, it appears that Kickstarter projects have stepped in to fill the void. The Pebble makes a case for being permanently clasped to your wrist by offering a second interface to your phone during those times when yanking out the phone seems rude, inconvenient, or both. And when it’s not doing that, it is probably pretty good at telling the time, since it pulls that from your phone too. Unlike most geeky watches, it looks more nondescript than ridiculous, and the e-Ink display can display information all day long without sucking down battery life.

Particularly cool for the programming crowd is that they will release an SDK. Surely, hackers will find interesting uses for a 10k-pixel wrist display with a 4G data connection.

I can’t justify buying it yet, but I’ll be curious enough to follow this product type and see if it catches on. Other comparable items in this space from no-name brands have been cheap and trashy so far; by creating something that looks like a watch, designing it around seamless integration with popular smartphones, and promoting a dev environment for the geeks that might want to make apps, Pebble and its self-funded design team could win big.

Discuss this post on HN.

Ted Pak

Assistant Professor, Division of Infectious Diseases; University of California, Irvine

Hacker News Sidebar: An Extension for Google Chrome

All Science is Anthropological at the Margins

Pebble, a watch made for iPhone/Android