Why Blag

06 February 2013

Ah, the old existentialist topic that plagues all too many blogs, sometimes as a “frist post!” article, other times as a retrospective question from a blogger who wonders whether blogging has been sucking up too much time. I’ve read these posts many times, wondering whether any of their reasons apply to me. Currently, I’m a medical and graduate student, not an entrepreneur seeking publicity, nor a writer for a magazine, nor a full-time web developer sharing tips on design and content. Therefore, their opinions, while sensible, don’t seem to apply as much to my life. Dr. C. Titus Brown, a bioinformatician, perhaps hits closest to home in his take, where he draws from Paul Graham (an idol among young Valley founders) to outline his thoughts on blogging as a scientist. The punchline:

I write blog posts because I want to figure out why I think what I think.

This rings true for me. I can’t count the number of times I’ve stopped mid-sentence while writing, discovering that what I’m trying to express had a hidden problem. The process of writing forces you to think through the details, refining mental mush into crystallized logic. Best of all, you can refer back to written words whenever you want to recall exactly how you formed your arguments.

Furthermore, I want to write faster, and that requires practice. There are a few people I’ve met who have the enviable skill of banging out 10-sentence emails in under a minute. Email is not going anywhere for decades. As far as I could tell, all of my past PIs (bosses, for anybody who hasn’t worked in a lab) spent a lot of time handling email. Not to mention, physicians are now expected to write tens to hundreds of notes into medical records every day, without having time explicitly allotted for it. So, my future day-to-day will inevitably involve a lot of writing, and I might as well get ready for it.

On the proposition of blogging, researchers sometimes counter that they are already expected to do a lot of writing: grant proposals, abstracts, and publications, publications, publications. Time spent blogging is time spent not doing those things. Should scientists really only communicate via peer-reviewed journal articles? There are obvious drawbacks:

  • The lag time on publication is something on the order of months, even with early access.
  • Not every opinion or piece of data is worth the cost of submitting it to a “worthwhile” journal, but it may still interest other scientists.1
  • Most journals are locked up behind paywalls, despite publishing mostly publicly funded work.
  • Publication bias almost certainly exists in most journals, leading to an overrepresentation of positive results.2

On the other hand, there are equally concerning drawbacks to moving most scientific conversation into the blogosphere:

  • Blogs mostly operate as a huge decentralized network, with content appearing and disappearing at will and no central index (blog search engines exist, yes, but not specifically for content by scientists). Citations are much less useful without content archival.
  • Statements are made without any peer review. (The obvious counter-argument here is that the current peer-review culture of science has serious flaws, which is already believed by many.)
  • Most frightening, and a specific drawback of the current peer-review process: scientists might blog about opinions that come back to haunt them in the anonymous review process for a journal or grant.

Because this debate is playing out slowly within the current ecosystem of journals, reviewers, and funding sources, it might be a while before scientists can start putting blogs on R01 applications. I am heartened by the amount of attention Open Access is now getting, along with innovators in science publishing like Figshare. But that’s not really what motivates me to start a blog now, anyway.

What really tips the scales is that the web will be the publishing vehicle of choice for next-generation science, certainly for people in my field of interest (systems and computational biology). It’s the nature of the beast: models, databases, live visualizations, and tools cannot live exclusively within the hallowed confines of a PDF article. Biblical bioinformatics projects already reside entirely inside websites, and essentially only publish journal articles to maintain a current citation: the UCSC genome browser, UniProt, Entrez, Saccharomyces Genome Database, etc. While the information they contain will remain relevant for a long time, web technologies tend to move a lot faster than science. Sometimes cool things happen when you marry newer web technologies with the data on the venerable genomics sites; indeed, this was the premise for ChromoZoom, a genome browser that feels as slick as Google Maps.

How can you stay up to date with technology (particularly for the web) when it moves so much faster than science? It requires practice and experience. That means building and maintaining some sort of project, even one as lowly as a blog. Blog technology has evolved a lot over the past decade: homespun static websites gave way to the Livejournals and Xangas, which were replaced by the more customizable and self-hostable WordPress and Movable Type blogs and, most recently, by the wave of sleeker Twitter-inspired microblogs like Tumblr. As blogs started catering more content to link-sharing sites like Digg and Reddit, traffic became more “spiky,” and nothing was more embarrassing than losing 10,000 potential hits to a slow backend database. Therefore, another recent trend in blog engineering has been to serve as much content as possible from static files, with dynamic features like comments “outsourced” to services like Disqus, Facebook, and IntenseDebate. For HuffPost-tier sites, it is then easy to push static content to content-delivery networks like Akamai and Amazon, which serve it from high-performance datacenters while the bloggers focus on churning out more linkbait.

As much as Linux gearheads will groan at the premise that building a blog is at all interesting, there is actually a lot to learn by doing so. (There is a reason Rails and most other web frameworks continue to push screencasts showing how fast you can code up a blog.) The web will need to host blog-like content until the internet dies. Engineering innovations will follow this need, whether it’s nginx for super-fast static content hosting, jekyll for neatly transforming source content into web content, or unicorn for running Rack applications behind nginx. On that note, a post on the tools that power this site is forthcoming.3 I hadn’t worked with any of them before making this and I’m glad I took some time to learn them.
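To give a flavor of what “transforming source content into web content” means, here is a minimal, hypothetical Python sketch of the core idea behind a static generator. It is not Jekyll’s implementation (Jekyll is written in Ruby and uses YAML front matter with Liquid templates); it assumes a simplified front-matter block of `key: value` lines between `---` delimiters, and it treats blank-line-separated text as paragraphs as a stand-in for Markdown. The names `LAYOUT`, `parse_front_matter`, and `render_page` are made up for illustration.

```python
import html
from string import Template

# Hypothetical page layout; $title and $content are filled in per post.
LAYOUT = Template(
    "<html><head><title>$title</title></head>\n"
    "<body><h1>$title</h1>\n$content\n</body></html>\n"
)


def parse_front_matter(source: str) -> tuple[dict[str, str], str]:
    """Split a source file into a metadata dict and its body.

    Assumes a simplified front-matter block (not full YAML): the file starts
    with a '---' line, then 'key: value' lines, then another '---' line.
    """
    lines = source.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, source  # no front matter at all
    meta: dict[str, str] = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            # Everything after the closing delimiter is the post body.
            return meta, "\n".join(lines[i + 1:])
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    # No closing delimiter: treat the whole file as body.
    return {}, source


def render_page(source: str) -> str:
    """Turn one source file into a complete, static HTML page."""
    meta, body = parse_front_matter(source)
    # Blank-line-separated blocks become <p> elements (a stand-in for Markdown).
    paragraphs = [p.strip() for p in body.split("\n\n") if p.strip()]
    content = "\n".join(f"<p>{html.escape(p)}</p>" for p in paragraphs)
    title = html.escape(meta.get("title", "Untitled"))
    return LAYOUT.substitute(title=title, content=content)


if __name__ == "__main__":
    post = "---\ntitle: Why Blag\n---\nFirst paragraph.\n\nSecond paragraph."
    print(render_page(post))
```

Because the output is a plain HTML file, a server like nginx can hand it out directly, with no database sitting in the request path.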

So why blog, if you are a budding medical or graduate student? If you are at all interested in computational research, I say do it for three reasons: 1. it helps you refine your thoughts and improve your writing; 2. it promotes an alternative and open form of scientific dialogue; and 3. it keeps you current on web technology. That’s all for now, folks. Let me git commit -a && git push and get these bits on the web already!

Update: Soon after I wrote this, I ran across an interview with two social scientists on this very topic. Their takeaway: “Blogging is quite simply, one of the most important things that an academic should be doing right now.” Their reasons are quite different from mine, but the interview is informative nonetheless, and it is followed by a recent reply from a Discover Magazine writer.

  1. This is almost certainly true for computational work: new processes, tools, and scripts are exciting and interesting to other people on the web, and while they typically don’t merit publication, they can really help others in the field and are usually well suited to sharing over the web. I’ve probably learned a thousand random tricks from blog posts that help me each day.

  2. A lot of the contemporary discussion on this stems from a widely-cited article called Why Most Published Research Findings Are False by Dr. John P. A. Ioannidis.

  3. Short version: the three things I just mentioned, on a Linode 512.