Creating Animated GIFs with ImageMagick & ffmpeg

07 February 2013

I’ll start off this blog with something lightweight and relatively useless except for creating amusing internet gimmicks and perhaps the occasional animated diagram. Surprisingly, there are few good tools for making animated GIFs on a Mac or Linux desktop. On Windows, Paint Shop/Animation Shop used to be the go-to suite for this; now, the closest thing for Mac folks might be GIFBrewery, which has far fewer features and some mixed reviews. I haven’t plunked down $6 for that yet, so I’ve still been trekking about on the command line, which is certainly more tedious, but lets you control every step of the process and get exactly the results you want.


You won’t be able to jump into this without a little comfort on the command line, and the installation of ffmpeg (if you are working from a video source) and ImageMagick.

If you’re on a Mac and you already use MacPorts, it should be a simple sudo port install ImageMagick ffmpeg and a lot of thumb twiddling. If you don’t, try installing Homebrew, which is a little more lightweight, and run brew install imagemagick ffmpeg.

For Linuxes, I’ll leave you to figure out how to get those two packages from your package manager. Depending on how idealistic your distro is, it might require adding some non-free repositories or even compiling ffmpeg from scratch. You only really need ffmpeg if you have a video source file, though; everything else is done with ImageMagick.

The full workflow

First, you want to start with your source video or images. If it’s video, you can crop/trim it down to the relevant clip with QuickTime X or any more advanced video-editing suite. Then, it’s time to extract the images with ffmpeg. Open a terminal, change to the directory with your movie file, and, assuming in.mov is the filename (substitute your own):

$ mkdir decode
$ ffmpeg -i in.mov decode/%d.png

Now you’ll have a sequence of images in the decode folder that correspond to each frame of your movie. At this point, it’s a good idea to open this folder and preview the sequence of images. They are likely to be far too big to be appropriate for a GIF, so we’ll have to resize and crop them. We may also want to refine the range of frames that will be used in the GIF.
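Before resizing, it helps to know exactly how many frames you’re dealing with; a quick count is one command away (this assumes the decode/ folder from the previous step, and ImageMagick’s identify tool will additionally print each frame’s pixel dimensions if you need exact numbers):

```shell
# Count how many frames ffmpeg wrote into decode/
ls decode | wc -l
```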

Bash can expand a sequence of numbers using the {begin..end} notation, which expands into the full list of filenames before the command runs (unlike a glob, the files don’t even need to exist). We can use this to select the frames we want, resize and crop them with ImageMagick, and put the results into a new folder. In this example, we’ll take the first 12 frames, crop them to 405x720 starting 437 pixels from the left edge, and resize them to 30%.

$ mkdir resized
$ convert decode/{1..12}.png -crop 405x720+437+0 -resize 30% resized/%d.png
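If you’re curious what that brace expansion actually hands to convert, you can echo it first; this is plain Bash, nothing ImageMagick-specific:

```shell
# Bash expands {1..3} into separate words before the command runs,
# so convert receives each frame filename as its own argument.
echo decode/{1..3}.png
```

The expansion happens whether or not the files exist, which is why it’s worth previewing the list before feeding it to convert.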

Time to check the output images again; if they aren’t to your liking, try fiddling with the parameters in the previous command until they look better.

A typical thing you might need to do at this point is label some of the frames. For that, we can use the -annotate feature of ImageMagick.

Quick aside: It might so happen that you don’t have any civilised fonts (e.g. Impact) available for use by ImageMagick, which requires them to be in TTF format and specified by an XML configuration file. To fix that, you can follow this guide; the short version is: convert the font into TTF format, put it somewhere sane like /usr/local/share/fonts, and then write up a ~/.magick/type.xml that points to it:

<?xml version="1.0"?>
<typemap>
  <type format="ttf" name="Impact" fullname="Impact" family="Impact"
    glyphs="/usr/local/share/fonts/impact.ttf"/>
</typemap>

(Adjust the glyphs path to wherever you actually put the TTF file.)

Back to text annotation. Note that we set the text and Y coordinate (from the bottom of the image) at the start of this command to avoid repetition, since the text must be painted twice for best results (once for the outline, and once for the fill). To print it near the top of the image instead, change -gravity south to -gravity north. We can copy over any images that we don’t want labeled.

$ mkdir annotated 
$ TEXT="CAPTION" && YDIST="10" && convert resized/{5..8}.png -pointsize 24 \
  -font Impact -strokewidth 2 -stroke black -fill white -gravity south -annotate \
  "+0+$YDIST" "$TEXT" -stroke none -annotate "+0+$YDIST" "$TEXT" annotated/%d.png
$ cp resized/{0..4}.png resized/{9..11}.png annotated

One last thing we may want to do is play the frames in reverse at the end of the GIF, so it appears to play seamlessly when looped. For that, a simple cp inside a for loop will do, but note that we omit copying the first and last frames to avoid playing them twice. Set LAST to the number of the last image in your animation.

$ LAST=11 && for i in $(seq $((LAST-1))); do cp annotated/$i.png \
  annotated/$((LAST*2-i)).png; done
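To sanity-check that arithmetic before copying anything, you can print the index mapping instead of performing the copies (plain Bash, with LAST=11 as above):

```shell
# Each frame i (1..LAST-1) gets duplicated as frame LAST*2-i,
# so the sequence 1..10 maps onto 21..12 in reverse order.
LAST=11
for i in $(seq $((LAST-1))); do
  echo "$i -> $((LAST*2-i))"
done
```

Frames 0 and 11 stay out of the loop, so the two endpoints of the animation play only once per cycle when the GIF loops.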

Finally, it’s time to assemble this bad boy into a GIFfy little bundle of web-friendly glory.

$ convert -delay 6 -loop 0 annotated/{0..21}.png out.gif

out.gif will contain your masterpiece. Preview it in a few web browsers to get a sense for its flavor. N.B. on that argument to -delay, which is measured in hundredths of a second: using any value lower than 6 is liable to produce strange results in Internet Explorer. This has to do with legacy-compatible interpretations of what the maximum reasonable frame rate for a web browser animation is, explained in great detail elsewhere.

Why Blag

06 February 2013

Ah, the old existentialist topic that plagues all too many blogs, sometimes as a “frist post!” article, other times as a retrospective question from a blogger who wonders if it has been sucking up too much time. I’ve read these posts all too often, wondering if any of the reasons apply to me. Currently, I’m a medical and graduate student, not an entrepreneur seeking publicity, nor a writer for a magazine, nor a full-time web dev sharing tips on design and content. Therefore, their opinions, while sensible, don’t seem to apply as much to my life. Dr. C. Titus Brown, a bioinformatician, perhaps hits closest to home in his take, where he draws from Paul Graham (an idol among young Valley founders) to outline his thoughts on blogging as a scientist. The punchline:

I write blog posts because I want to figure out why I think what I think.

This rings true for me. I can’t count the number of times I’ve stopped mid-sentence while writing, discovering that what I’m trying to express had a hidden problem. The process of writing forces you to think through the details, refining mental mush into crystallized logic. Best of all, you can refer back to written words whenever you want to recall exactly how you formed your arguments.

Furthermore, I want to write faster, and that requires practice. There are a few people I’ve met who have the enviable skill of banging out 10-sentence emails in under a minute. Email is not going anywhere for decades. As far as I can tell, all of my past PIs (bosses, for anybody who hasn’t worked in a lab) spent a lot of time handling email. Not to mention, physicians are now expected to write tens to hundreds of notes into medical records every day, without having time explicitly allotted for it. So, my future day-to-day will inevitably involve a lot of writing, and I might as well get ready for it.

On the proposition of blogging, researchers sometimes counter that they are already expected to do a lot of writing: grant proposals, abstracts, and publications, publications, publications. Time spent blogging is time spent not doing those things. Should scientists really only communicate via peer-reviewed journal articles? There are obvious drawbacks:

  • The lag time on publication is something on the order of months, even with early access.
  • Not every opinion or piece of data merits publication in a “worthwhile” journal enough to justify the costs of submission, but it may still be interesting to other scientists.1
  • Most journals are locked up behind paywalls, despite publishing mostly publicly-funded work.
  • Publication bias almost certainly exists in most journals, leading to an overrepresentation of positive results.2

On the other hand, there are equally concerning drawbacks to moving most scientific conversation into the blogosphere:

  • Blogs mostly operate as a huge decentralized network, with content appearing and disappearing at will, and no central index (blog search engines exist, yes, but not specifically for content by scientists). Citations are much less useful without content archival.
  • Statements are made without any peer review. (The obvious counter-argument here is that the current peer-review culture of science has serious flaws, which is already believed by many.)
  • Most frightening, and a specific drawback of the current peer-review process: scientists might blog about opinions that come back to haunt them in the anonymous review process for a journal or grant.

Because this debate is playing out slowly within the current ecosystem of journals, reviewers, and funding sources, it might be a while before scientists can start putting blogs on R01 applications. I am heartened by the amount of attention Open Access is now getting, along with innovators in science publishing like Figshare. But that’s not really what motivates me to start a blog now, anyway.

What really tips the scales is that the web will be the publishing vehicle of choice for next-generation science, certainly for people in my field of interest (systems and computational biology). It’s the nature of the beast: models, databases, live visualizations, and tools cannot live exclusively within the hallowed confines of a PDF article. Biblical bioinformatics projects already reside entirely inside websites, and essentially only publish journal articles to maintain a current citation: the UCSC genome browser, UniProt, Entrez, Saccharomyces Genome Database, etc. While the information they contain will remain relevant for a long time, web technologies tend to move a lot faster than science. Sometimes cool things happen when you marry newer web technologies with the data on the venerable genomics sites; indeed, this was the premise for ChromoZoom, a genome browser that feels as slick as Google Maps.

How can you stay up to date with technology (particularly for the web) when it moves so much faster than science? It requires practice and experience. That means building and maintaining any sort of project, even if it is as lowly as a blog. Blog technology has evolved a lot over the past decade, with homespun static websites evolving into the LiveJournals and Xangas, replaced by the more customizable and self-hostable WordPress and Movable Type blogs, and most recently, the wave of sleeker Twitter-inspired microblogs like Tumblr. As blogs started catering more to link-listing sites like Digg and Reddit, traffic tended to get more “spiky,” and there was nothing more embarrassing than losing 10,000 potential hits to a slow backend database; therefore, another recent blog-engineering trend has been to serve as much content as possible from static files, with dynamic features like comments “outsourced” to services like Disqus, Facebook, and IntenseDebate. For HuffPost-tier sites, it is then easy to push static content to content-delivery networks like Akamai and Amazon so they can handle serving it from high-performance datacenters while the bloggers focus on churning out more linkbait.

As much as Linux gearheads will groan at the premise that building a blog is at all interesting, there is actually a lot to learn by doing so. (There is a reason Rails and most other web frameworks continue to push screencasts showing how fast you can code up a blog.) The web will need to host blog-like content until the internet dies. Engineering innovations will follow this need, whether it’s nginx for super-fast static content hosting, jekyll for neatly transforming source content into web content, or unicorn for running Rack applications behind nginx. On that note, a post on the tools that power this site is forthcoming.3 I hadn’t worked with any of them before making this and I’m glad I took some time to learn them.

So why blog, if you are a budding medical or graduate student? If you are at all interested in computational research, I say do it for three reasons: 1. it helps refine your thoughts, and improve your writing; 2. it promotes an alternative and open form of scientific dialogue; and 3. it keeps you current on web technology. That’s all for now folks—let me git commit -a && git push and get these bits on the web already!

Update: Soon after I wrote this, I ran across an interview with two social scientists on this very topic. Their takeaway: “Blogging is quite simply, one of the most important things that an academic should be doing right now.” Quite different reasons than mine, but informative nonetheless, and it is followed by a recent reply from a Discover Magazine writer.

  1. This is almost certainly true for computational work: new processes, tools, and scripts are exciting and interesting to other people on the web, and while none of these typically merit publication, they can really help others in the field and are usually well suited to sharing over the web. I’ve probably learned a thousand random tricks from blog posts that help me each day. 

  2. A lot of the contemporary discussion on this stems from a widely-cited article called Why Most Published Research Findings Are False by Dr. John P. A. Ioannidis. 

  3. Short version: the three things I just mentioned, on a Linode 512.