Anki-Slideshow: Publish Anki flashcards to the web

I’ve started putting all my medical school flashcards on this website using some Python and Ruby. Read how I did it below, or get the source code.


I’ve been using Anki to generate flashcards for just about every class in medical school since the second semester, and it’s been clutch for trying to memorize the insane amounts of material that we’re given. I’d say that Anki is almost perfect for medical students and the reasons why deserve a post all to themselves. This post, rather, is about something that I thought was missing from Anki—namely, publishing the content to the web—and how I wrote some code to fix it, all thanks to Anki being open source and extensible.

Most people think using open source software is an ideological consideration, but as a student, it can be just as much a practical benefit. Students are already reliant on so much software, from office programs to course management systems, much of it proprietary. It can be disastrous when those apps become unsupported, stop working on your new device, or worse, lose data. Imagine taking a semester’s worth of notes on a shiny new website or app that you tried because it was free or had some killer feature that you couldn’t resist. Then the company goes out of business, and the website shuts down. Or, your computer breaks, and the app doesn’t work on your new one. Maybe the company decides to disable support for exporting content unless you pay a huge fee. These situations are terrifying—especially if you have a test, much less a career licensing exam, right around the corner.

With Anki, not only is the desktop program free, but all notes are stored permanently on your own computer. Also, because it is open source, your data will always be retrievable, even if that means tinkering with a few Python libraries or firing up an SQLite console. It’s literally impossible for the developers to shut you out, and even if they tried, somebody could fork Anki and fix it. For these reasons, data in open source software is inherently much more future-proof than something like Microsoft Office. In case you didn’t know, Office still has no complete specifications for its legacy file formats, and probably never will. (And now it’s moving to the cloud with Office 365, where you will have even less control over preserving your content.)
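To make that concrete, here’s a minimal sketch of what that SQLite escape hatch looks like, done from Python’s standard sqlite3 module instead of a console. It assumes Anki 2’s on-disk schema, where each note’s fields live in the `flds` column of the `notes` table, joined by the 0x1f unit-separator character; `dump_note_fields` is a hypothetical helper name, not part of Anki.

```python
import sqlite3

def dump_note_fields(db_path):
    """Pull the raw fields of every note out of an Anki collection database.

    Anki 2 stores each note's fields as a single string in the `flds`
    column of the `notes` table, joined by the 0x1f separator, so
    splitting on that character recovers the individual fields.
    """
    con = sqlite3.connect(db_path)
    try:
        return [row[0].split("\x1f") for row in con.execute("SELECT flds FROM notes")]
    finally:
        con.close()
```

Even if every app and sync service disappeared tomorrow, those few lines would get your cards back out of `collection.anki2`.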

As it turns out, Anki is exceptionally customizable. Since the desktop app is written in Python with bindings to Qt for the GUI, not only is it cross-platform, but it supports writing plugins as Python scripts, called “addons”. That means: no compiling, everything lives in one text file, and live objects can be debugged in a console. Sweet! You can do everything from altering the GUI to enhancing the internal data structures or the scheduling algorithm. There are quite a few addons already available online.

That brings us to my problem: most kinds of content have a “natural” online destination for sharing: Flickr and Instagram for photos, YouTube for video, Slideshare for slide decks, and Scribd for office documents. Anki doesn’t really have anything in the same league. Yes, there’s AnkiWeb, but it’s designed to be used primarily as a sync backend. An account is required to do anything, it has fewer features than the desktop app, and the sharing functions are pretty limited. There are a few gems hidden in there for medical students, but in general it’s hard to find, share, and show off good Anki material.

And so, I was struck with the idea of creating a way to do exactly this. I spend most of my time in lecture writing flashcards, and I don’t want them locked up on my computer forever. It would be nice to have them right here on this blog, not only for me so I can watch them when I’m bored, but so that others might do the same. Sometimes it’s nice to passively browse through content. Why watch cat videos when you can watch Pathology flashcards? (Har har.)

The vision was to create a menu item within Anki that you click, and then everything gets synced to the web server, which plays the cards like a slideshow. The Anki component fits entirely in one short Python script. There are only five functions, the key one being exportCardsToWeb(). In short, the mw.col object, which represents the current collection of Anki notes, is dumped into a JSON file holding the decks and the cards within them, with each card rendered to HTML by the renderQA() method. Then, the whole mess is copied using rsync to a destination of your choice. Adding these functions into the Anki GUI is as easy as the following lines at the end of the script:


mw.form.menuTools.addSeparator()

action = QAction("Export Cards to Anki-Slideshow", mw)
mw.connect(action, SIGNAL("triggered()"), exportCardsToWeb)
mw.form.menuTools.addAction(action)

action = QAction("Change Sync Destination for Anki-Slideshow", mw)
mw.connect(action, SIGNAL("triggered()"), getSyncTarget)
mw.form.menuTools.addAction(action)

This throws an extra section with two items onto the end of the Tools menu of the main window. Clicking on these items runs the functions written at the beginning of the script.
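To make the data layout concrete, here is a rough standalone sketch of the JSON structure the web app consumes: a map of deck names to card ids, plus a map of card ids to rendered question/answer HTML. The helper name `build_payload` and the tuple input format are hypothetical; the real addon walks mw.col and renders each card with renderQA().

```python
import json

def build_payload(rendered):
    """Assemble the slideshow's JSON payload.

    `rendered` is assumed to look like:
        {deck_name: [(card_id, question_html, answer_html), ...]}
    The output has a "decks" map (deck name -> list of card ids) and a
    "cards" map (card id -> {"q": ..., "a": ...}), which is the shape
    the Sinatra app reads back out.
    """
    decks, cards = {}, {}
    for deck_name, deck_cards in rendered.items():
        decks[deck_name] = [str(cid) for cid, _, _ in deck_cards]
        for cid, q, a in deck_cards:
            cards[str(cid)] = {"q": q, "a": a}
    return json.dumps({"decks": decks, "cards": cards})
```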

It’s not that hard to serve data from a big JSON file with a web app. It’s a perfect use case for the lightweight Sinatra web framework. The entirety of the Ruby code written for this is 72 lines, with around 75 lines of HTML templating in ERB. There are more lines of JavaScript (~150, plus a few jQuery plugins) than server-side code. I wanted to serve this from a subdomain of this server and have it look decent both on its own and as an embeddable widget. There are only three routes defined:

get "/" do
  redirect "/" + AnkiSlideshow.decks.keys.sample
end

get "/:image.jpg" do
  content_type "image/jpeg"
  send_file File.join(AnkiSlideshow.media_dir, params[:image] + ".jpg")
end

get "/:deck" do
  @title = @deck_name = params[:deck]
  deck = AnkiSlideshow.decks[params[:deck]]
  pass unless deck
  random_card_id = (card_id = deck.sample) && card_id.to_s
  if random_card_id then @card =[random_card_id]
  else @card = {"q" => NO_CARDS_MESSAGE, "a" => NO_CARDS_MESSAGE}; end
  erb :card
end

In short, this means that if you go to the bare domain, you’re redirected to a random deck; if you fetch a URL ending in “.jpg”, it grabs the picture (if there is one) and serves it. If the URL is anything else, it looks for a deck with that name and serves a random card from it.

The rest of the hard work is done by some jQuery in the interface. Once it is time to fetch a new card, a request is sent via AJAX to the same /:deck URL route, which provides the full HTML for a new random card. To avoid a distracting transition, the new card is parsed out of the HTML and inserted in the current page. If the user changed the deck they want to view, we can even gracefully update the URL in the address bar using the HTML5 History API. A lot of this is made straightforward by the jQuery .load() method, as can be seen in the excerpt below:

$("#next-card").load(href + " #card > *", function() {
  $('#card').fadeOut(fadeTime, function() {
    if (window.history && window.history.pushState
        && window.location.href != href) {
      window.history.pushState(null, null, href);
    }
    $('#card').empty().show().append($('#next-card > *'));
    $('#next').val('Turn over').unbind('click').click(flipCard);
    $('#timer').trigger('callback', flipCard);
    $('#timer').trigger('start', changeTime[0]).removeClass('transition');
    $('#card .front').addClass('solid').fadeIn();
    $.scrollTo('#card .front', fadeTime, {offset: {axis: 'y',
        top: -switcherHeight}});
    if ($('body').is('.hover')) { $('#timer').trigger('stop'); }
  });
});

That $.scrollTo() call provides a nice animated transition to the reverse side of the card in case it is off the edge of the screen, using the very nice scrollTo jQuery plugin.

Since I preferred the web app to run more like a passive slideshow than an active learning tool like the Anki apps (requiring you to click through the cards according to right or wrong answers), the bulk of the remaining JavaScript is dedicated to making the little timer widget in the upper right corner, which automatically flips cards unless you turn it off or pause it by hovering over the card. You can see how other code has to interact with the widget via those events triggered on $('#timer') in the previous excerpt. The spinning wheel is drawn via <canvas> but it would probably look even better as SVG.

To embed the web app into another page, an <iframe/> element with a src attribute pointing to one of the decks can be used. For example, this is how I can embed my Cardiovascular deck.

<iframe src="" 
    width="480" height="400" frameborder="2"></iframe>

It produces the following, which I’ve highlighted with the frameborder="2" attribute so you can see where its edges are; the border can be turned off by setting frameborder="0".

And there you have it. Hopefully this inspires other projects that try to use Anki content in larger web apps, since it is a powerful tool that deserves more integration into so many other things. The code is on GitHub, and you can see the slideshow in action with my medical school decks on this site or as a standalone website. Some ideas I currently have for improving it are:

  • A better front page for it that explains the UI and deck content before jumping into a deck
  • Upgrading the timer drawing to SVG so it looks better on high-res screens
  • Adding tag support, so you can filter by subtopic and by lecture (I tag in Anki by lecture number)
  • Having the server remember your recent cards, so it tries not to repick them

The First Year At Icahn School of Medicine at Mount Sinai

17 July 2013

I was happy to be featured in a video produced by my medical school about life as a first year student here.

Since it was filmed throughout the entirety of first year starting from the white coat ceremony, and now I’m knee-deep in the second, it’s interesting to look back and remember snippets of how it all went down. It was shot in a hands-off style: besides coordinating when the crew would be where, and the “debriefing” interviews, we as students pretty much did our thing and the cameras rolled.

The video is sincere about the culture and mission of the school, with the essence of our experience as first-years conveyed quite honestly. There are a lot of special things about Mount Sinai that I’ve already discovered in the short time that I’ve been here. Most of all, what makes Sinai especially great are the students and faculty here, and I was happy to see the video let the students’ thoughts and personalities speak for themselves. Ultimately medical education is about the people around you, so you should never underestimate the impact of a happy and well-rounded student body.

Hopefully applicants see the video and get a more personal sense of what Sinai is like. If you’re applying to Sinai and are interested in anything that you saw or heard about, feel free to email me!

Predicting Influenza Virulence with Machine Learning


Machine learning (ML) excels at creating models of the interactions between many weak correlations that may elude lower-dimensional statistical analysis. An example of a network of such interactions is the multifactorial sequence properties that determine the phenotype of a virus, such as influenza, in a given host. Although ML on viral sequence features has been used to predict more effective antiretroviral combinations for HIV [1], identify genetic markers for host selectivity within families of viruses [2], and refine genotyping strategies for Hepatitis C virus [3], the use of ML algorithms to predict the pathogenicity, infectivity, transmissibility, and vaccination response of an uncharacterized influenza strain from viral genomic sequence is still in its infancy [4,5,6]. It is widely understood that these properties of influenza virus as they manifest within a specific host are complex polygenic traits, currently characterized as a collection of genetic mutations [7] isolated from wild strains whose individual effects on virulence are further characterized in animal studies, e.g., the N66S mutation in the proapoptotic PB1-F2 viral protein that increased the virulence of the 1918 Spanish Flu virus [8]. Multivariate analyses of the interactions between observed mutations are not commonly published, however. One such meta-analysis in 2009 used 69 genomic sequences of H5N1 avian influenza to create a Bayesian graphical model inferring their virulence in mammals and confirmed that virulence is directly influenced by mutations in at least four genes, with at least two mechanisms requiring particular mutation combinations [9].

The ability to predict changes in virulence properties in animal reservoirs of influenza, and the mutations likely to enable transmission to humans, would have a profound effect on our ability to take preventative measures against the outbreak of hypervirulent influenza like the strains causing pandemics in 1918-1919 and 1957-1958 with tens of millions of casualties, and more recently tens of thousands of deaths resulting from reemergence of an older strain in 1977 and triple reassortment in 2009 [7]. For example, the current vaccination strategy could be enhanced to generate immunity against not only previously known strains on the rise, but predicted future pandemic strains. The novel triple-reassortment strain that produced the 2009 H1N1 pandemic was identified too late to be included in that season’s trivalent vaccine [10], requiring rapid development of a second vaccine at an additional cost of $2 billion [11]. While some computational models of influenza virulence [12] and mutation [6] take a highly structural approach (e.g., based on antigen-receptor binding affinity), we propose that an ML algorithm modeling these phenomena should be constructed on phenotype-genotype correlation data for three reasons. Firstly, many mutations that are known to affect influenza virulence occur in the viral polymerase complex (PB1 and PB2) or the IFN antagonist (NS1) [7,9], and these are not captured by the cited models, which focus on the hemagglutinin protein (HA). Secondly, genotypic and phenotypic data on influenza isolates are being actively concentrated into several online public databases: GISAID, the Influenza Virus Resource, and the Influenza Research Database (IRD); from these, data should be pipelined to continually inform any systematic model of influenza so that it remains up to date. Thirdly, depending on the selection of the ML technique, a human-interpretable model created by the training process could facilitate biological interpretation of the results; e.g., an ML analysis of host selectivity in Picornaviridae showed, by mapping the most predictive AA k-mers back to annotated domains, that replicase motifs in the polymerases were most discriminative [2].

We propose that a decision-tree-based ML algorithm trained on phenotypic data from the IRD will be able to distinguish significant interactions between virulence factors and predict virulence caused by combinations of variants and segments that are currently uncharacterized. Firstly, the associated publication for each of the phenotype records will be filtered for articles that report pathogenicity data, and for each strain, we will extract data on lethality in the experimental animal cohort, severity of the disease, associated symptoms, and a timecourse of viral titers in various tissues. For example, Govorkova et al. report such data for four human isolates and nine avian isolates in ferrets [13], and other papers in the IRD report analogous results for other strains.

Table 1 from Govorkova et al., illustrating phenotypic data from animal experiments that will be used to train the ML algorithm.

Having loaded this data into a relational database similar to the design seen in Figure 2, we will relate it to sequence data downloaded from IRD or GenBank. Sequence data must be processed to extract genotypic features that will be used by the model. We may do this by simply using the sequencing-derived phenotype markers already annotated by IRD (typically AA substitutions), but we may also choose to use FLAN from the Influenza Virus Resource to generate a feature table, or align the sequences to a generalized reference assembly or Hidden Markov Model [14,15] and extract features according to our own criteria. This could include raw k-mers of AAs or nucleotides, SNPs from the nucleic acid sequence, the deleteriousness of these SNPs as predicted by PolyPhen [16] or similar, functional domains predicted within ORFs, codon usage bias within these domains, and more complex measures that attempt to capture the proximity of pairs of features.
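The simplest of these features, raw amino-acid k-mer counts, can be sketched in a few lines of Python; `kmer_features` is a hypothetical helper name, not part of any of the tools named above.

```python
from collections import Counter

def kmer_features(seq, k=3):
    """Count overlapping amino-acid k-mers in a protein sequence.

    Each distinct k-mer count becomes one dimension of the feature
    vector handed to the learning algorithm.
    """
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
```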

A relational data model for capturing pathogenic genotype-phenotype associations (apologies for cameraphone quality)

We will then extract (in a process analogous to denormalization of the sequence table in Figure 2) vectors of sequence features and phenotype features with which we can train a machine-learning algorithm. A decision-tree-based algorithm, such as probabilistic decision trees, random forests, or alternating decision trees, will be trained on these vectors to produce a model. The model can be tested by taking sequence data for an unknown strain, processing it via the same feature extraction pipeline used for the training set, and running the model on this vector of features. We can validate this model internally via ten-fold cross-validation and externally via analysis of strains that are not yet in the IRD’s phenotype database but for which experimental or epidemiological data in the literature strongly suggests a correct phenotype.
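The ten-fold split itself is simple to sketch (a hypothetical helper; in practice a library utility such as scikit-learn's KFold would do the same thing):

```python
def ten_fold_splits(n, k=10):
    """Yield (train, test) index lists for k-fold cross-validation.

    The n example indices are partitioned into k disjoint test folds;
    each fold is held out once while the remaining indices form the
    training set, so every example is tested exactly once.
    """
    folds = [list(range(i, n, k)) for i in range(k)]
    for test in folds:
        held_out = set(test)
        yield [j for j in range(n) if j not in held_out], test
```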

Assuming our model holds up to internal and external validation, the decision tree or ensemble of decision trees produced by ML can be examined to determine interesting combinations of features that we predict will produce significant changes in virulence. The biological significance of these combinations can potentially be experimentally verified in animal models. Furthermore, by starting with the phenotypic prediction for a particular virus sequence and varying small combinations of features at a time, we can predict the changes that would cause the greatest change in virulence for that virus. This can be performed on the entire GenBank influenza library to determine which sequenced strains are predicted to already be the most virulent in humans, and which would increase most in virulence after a small number of genetic alterations.
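At its core, that last idea amounts to flipping features one at a time and re-scoring. A toy sketch, assuming binary (0/1) features and a hypothetical trained `predict` function that maps a feature dict to a virulence score:

```python
def rank_single_changes(features, predict, top=5):
    """Rank single-feature flips by their predicted effect on virulence.

    Flips one binary feature at a time, re-scores the mutated feature
    vector with `predict`, and returns the `top` flips sorted by the
    largest predicted increase over the unmodified strain.
    """
    base = predict(features)
    deltas = []
    for name in features:
        mutated = dict(features)
        mutated[name] = 1 - mutated[name]
        deltas.append((predict(mutated) - base, name))
    deltas.sort(reverse=True)
    return deltas[:top]
```

Scanning pairs or triples of flips is the same loop over itertools.combinations, at a combinatorial increase in cost.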

We can then present our model to the internet via a web interface, allowing analysis of arbitrary influenza sequences using our pipeline, submission of new phenotypic data, and visualization of predictions for the GenBank influenza library. Sequence data is frequently the first informative data available for an emerging pathogen. Here, we hope to produce a model for influenza virulence based exclusively on sequence data that will produce for any given viral sequence, based on the most up-to-date experimental evidence curated in the IRD, 1) a likelihood of the unknown strain’s danger to humans and 2) the number and location of mutations that would most likely increase its virulence, allowing an assessment of whether it will evolve to be dangerous in the near future.

  1. Lengauer T. Bioinformatical Assistance of Selecting Anti-HIV Therapies: Where Do We Stand? Intervirology. 2012;55(2):108–112. PMID:22286878.

  2. Raj A, Dewar M, Palacios G, Rabadan R, Wiggins CH. Identifying Hosts of Families of Viruses: A Machine Learning Approach. PLoS ONE. 2011;6(12):e27631. doi:10.1371/journal.pone.0027631.

  3. Hraber P, Kuiken C, Waugh M, Geer S, Bruno WJ, Leitner T. Classification of hepatitis C virus and human immunodeficiency virus-1 sequences with the branching index. Journal of General Virology. 2008;89(9):2098–2107. PMID:18753218.

  4. Attaluri PK, Chen Z, Weerakoon AM, Lu G. Integrating Decision Tree and Hidden Markov Model (HMM) for Subtype Prediction of Human Influenza A Virus. In: Shi Y, Wang S, Peng Y, eds. Cutting-Edge Research Topics on Multiple Criteria Decision Making. Springer; 2009:52–58.

  5. Trtica-Majnaric L, Zekic-Susac M, Sarlija N, Vitale B. Prediction of influenza vaccination outcome by neural networks and logistic regression. Journal of Biomedical Informatics. 2010;43(5):774–781. PMID:20451660.

  6. Xia Z, Das P, Huynh T, Royyuru AK, Zhou R. Modeling mutations of influenza virus with IBM Blue Gene. IBM J Res & Dev. 2011;55(5):7:1–7:11. doi:10.1147/JRD.2011.2163276.

  7. Tscherne DM, García-Sastre A. Virulence determinants of pandemic influenza viruses. J Clin Invest. 2011;121(1):6–13. PMID:21206092.

  8. Conenello GM, Zamarin D, Perrone LA, Tumpey T, Palese P. A Single Mutation in the PB1-F2 of H5N1 (HK/97) and 1918 Influenza A Viruses Contributes to Increased Virulence. PLoS Pathog. 2007;3(10):e141.

  9. Lycett SJ, Ward MJ, Lewis FI, Poon AFY, Kosakovsky Pond SL, Brown AJL. Detection of Mammalian Virulence Determinants in Highly Pathogenic Avian Influenza H5N1 Viruses: Multivariate Analysis of Published Data. Journal of Virology. 2009;83(19):9901–9910. PMID:19625397.

  10. Lambert LC, Fauci AS. Influenza vaccines for the future. N Engl J Med. 2010;363(21):2036–2044. PMID:21083388.

  11. Nabel GJ, Fauci AS. Induction of unnatural immunity: prospects for a broadly protective universal influenza vaccine. Nat Med. 2010;16(12):1389–1391. PMID:21135852.

  12. Goh G, Dunker AK, Uversky VN. Protein intrinsic disorder and influenza virulence: the 1918 H1N1 and H5N1 viruses. Virol J. 2009;6(1):69. PMID:19493338.

  13. Govorkova EA, Rehg JE, Krauss S, et al. Lethality to Ferrets of H5N1 Influenza Viruses Isolated from Humans and Poultry in 2004. Journal of Virology. 2005;79(4):2191–2198. PMID:15681421.

  14. Kuiken C, Yoon H, Abfalterer W, Gaschen B, Lo C, Korber B. Viral Genome Analysis and Knowledge Management. In: Methods in Molecular Biology. Totowa, NJ: Humana Press; 2012:253–261.

  15. Hughey R, Krogh A. Hidden Markov models for sequence analysis: extension and analysis of the basic method. Comput Appl Biosci. 1996;12(2):95–107.

  16. Adzhubei IA, Schmidt S, Peshkin L, et al. A method and server for predicting damaging missense mutations. Nat Methods. 2010;7(4):248–249. PMID:20354512.