A Next Generation Sequencing (NGS) Approach to Influenza Vaccine Development

Figure 1: Seasonal flu vaccine effectiveness

Influenza is a seasonal disease. It fluctuates in a very typical manner, and probably has for centuries, and certainly has for the decades in which it has been monitored.

Influenza-like illness (ILI) refers to the clinical condition caused by various respiratory pathogens but is typically driven by the influenza virus. ILI rates rise above baseline between November and February and decrease by March and April in the Northern Hemisphere. Within that narrow window, the timing of peak infection varies as does the yearly mortality, both driven by changes in the circulating flu strains.

Seasonal vaccine effectiveness

Likewise, vaccine efficacy varies widely from year to year (see Figure 1). Subtle or significant changes in the influenza virus itself dictate what the efficacy will be for any vaccine. For example, in both the 2004 and the 2014 seasons, the efficacy of the vaccine was quite low due to shifts in the H3N2 circulating strains compared to the strain used in the vaccine. In 2004, the A/Fujian strain was chosen for the vaccine, but unrelated strains actually circulated that year, resulting in lower vaccine efficacy. In 2014, the H3N2 strain, A/Texas, was selected for the Northern Hemisphere vaccine. Again, the circulating H3N2 strains that year did not match A/Texas, so vaccine efficacy decreased.

Influenza structure

The influenza genome has eight gene segments. Segments four and six carry the hemagglutinin (HA) and neuraminidase (NA) genes, respectively. Hemagglutinin is responsible for entry of the virus into cells via sialic acid residues. Neuraminidase, acting in combination with hemagglutinin, is responsible for releasing the virus from infected cells. There are 18 known subtypes of hemagglutinin and 11 of neuraminidase. Because they are present on the surface of the virus, these are the most immunogenic antigens. Vaccine candidates are chosen based on the HA thought most likely to be prevalent in the next season.

Antigenic drift vs. shift

Two biological activities dictate how effective an influenza vaccine will be.

The first is antigenic drift, in which mutations in the hemagglutinin and neuraminidase genes result in small changes to those proteins. These changes accumulate as influenza goes through its normal cycle of infection among multiple species (humans, pigs, birds), making those antigens unrecognizable by antibodies from previous influenza exposures.

The second is antigenic shift, a more dramatic change in influenza A viruses that results in a new hemagglutinin, or a new combination of hemagglutinin and neuraminidase proteins, in a virus infecting humans. The best-known example of antigenic shift is the Spanish flu of 1918, against which there was little or no cross protection.

Although influenza changes from year to year, we still rely on the prediction of the strains that will be circulating in the upcoming year to select for vaccine production.

Because of the difficulty of this process, there is an effort to find alternative solutions to enhance the yearly efficacy of the vaccine.

Traditional vaccine production

For now, twice a year, in February for the upcoming Northern Hemisphere season, and in September for the upcoming Southern Hemisphere season, the strains to be used in the vaccines are selected based on the consensus of experts.

The HA and NA genes that are identified are inserted, either by reverse genetics or through reassortant technology, into viral backbone genes that are adapted for growth in eggs, typically those of the A/Puerto Rico/8/34 strain. That engineered virus becomes the candidate vaccine virus (CVV). The CVV is then made available by either the World Health Organization (WHO) or the Centers for Disease Control and Prevention (CDC) for use in either egg-based vaccine production or cell culture-based systems.

Next-generation vaccines

Driven by both the need to adapt to new strains every year and the need for more efficient production techniques, new-generation vaccines have either been approved and are on the market or are still under investigation in various phases of clinical trials.

Medicago is in phase 3 clinical trials with a VLP (virus-like particle) approach using tobacco plants to produce the VLP. The goal is efficient vaccine production without using eggs. VLPs are empty viral shells with the hemagglutinin exposed on the surface. Protein Sciences, now part of Sanofi Pasteur, has received approval for its influenza vaccine, which is a baculovirus-derived recombinant hemagglutinin produced in Sf9 cells.

Seqirus uses the CVV grown in mammalian cells, MDCK cells for example, rather than eggs.

Other approaches that account for the changes seen in the influenza virus and address the difficulties in vaccine production are also in development. These include combining a higher number of hemagglutinin epitopes in a single vaccine to get a broader spectrum of coverage, using a VLP-based delivery approach built on recombinant technology, and using empty virus that is replication deficient but can still express NA and HA.

A more universal approach that avoids the need to make a new vaccine every year would be ideal. Several companies are pursuing that goal. For these new approaches to be successful, knowing the detailed genetics of hemagglutinin and neuraminidase is essential. One approach is to use the more conserved HA stem region as the antigen in the vaccine to allow broader coverage across multiple flu strains.

Vaccine efficacy studies


Influenza vaccine efficacy studies entail monitoring ILI in vaccine trial participants to determine the efficacy of the vaccine in preventing influenza infection. Samples collected from trial participants with ILI are tested using molecular techniques such as PCR to identify the pathogen causing the ILI. If influenza is detected using these methods, the infecting influenza strain is isolated and expanded in cell culture.
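As a worked illustration of what such a trial estimates, efficacy is conventionally expressed as one minus the ratio of the attack rate in the vaccinated arm to the attack rate in the control arm. The sketch below assumes a simple two-arm design; the function name and the case counts are hypothetical and are not taken from any trial discussed here.

```python
def vaccine_efficacy(cases_vaccinated, n_vaccinated, cases_control, n_control):
    """Return efficacy as 1 - (attack rate vaccinated / attack rate control)."""
    if n_vaccinated <= 0 or n_control <= 0:
        raise ValueError("arm sizes must be positive")
    attack_vaccinated = cases_vaccinated / n_vaccinated
    attack_control = cases_control / n_control
    if attack_control == 0:
        raise ValueError("no cases in the control arm; efficacy is undefined")
    return 1.0 - attack_vaccinated / attack_control


# Hypothetical counts: 20 laboratory-confirmed cases among 1,000 vaccinated
# participants and 50 among 1,000 controls.
ve = vaccine_efficacy(20, 1000, 50, 1000)
print(f"Estimated efficacy: {ve:.0%}")
```

Only laboratory-confirmed influenza cases would count toward the numerators, which is why the pathogen behind each ILI episode has to be identified.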

Genetic characterization

The influenza isolates can be serotyped for strain identity, but can also undergo sequencing by Sanger or Next-Generation Sequencing (NGS) methods, which allow a more granular view of the hemagglutinin and neuraminidase genes, or even the entire genome.

NGS can provide particular insight, including characterization of mixed populations. Genome characterization illuminates viral evolution, thus providing insight into the efficacy of vaccines and therapeutics. It supports epidemiological studies and informs the development of future treatments.

Over 20,000 influenza genomes have been sequenced through the Influenza Genome Project. The Global Initiative on Sharing All Influenza Data (GISAID) promotes sharing of sequence, clinical and epidemiological data and comprises over 40,000 isolates.

Influenza genome structure

The influenza genome is very compact, and is contained in virus particles that are ~100 nm in diameter (see Figure 2). The genome is just over 13 kilobases, comprising eight RNA segments ranging from about 800 to 2,500 nucleotides. Each segment is packaged into a ribonucleoprotein complex containing the RNA segment, the nucleoprotein and the polymerase complex, both of which are encoded by the viral genome. Variations in the HA and NA genes determine the subtype and strain. In vaccine development, we target viruses that are forecast via surveillance to be circulating during the flu season. Tracking how influenza changes during surveillance and in response to challenges by vaccines or antivirals is also very important.
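To make the segment layout concrete, the sketch below tabulates the eight segments and sums their lengths. The lengths are approximate, commonly cited values for an influenza A virus such as A/Puerto Rico/8/34 and should be read as illustrative assumptions; exact lengths vary by strain.

```python
# Segment number -> (gene name, approximate length in nucleotides).
# Lengths are illustrative values for an influenza A genome, not exact
# figures for any particular isolate.
SEGMENTS = {
    1: ("PB2", 2341),
    2: ("PB1", 2341),
    3: ("PA", 2233),
    4: ("HA", 1778),
    5: ("NP", 1565),
    6: ("NA", 1413),
    7: ("M", 1027),
    8: ("NS", 890),
}

total_length = sum(length for _, length in SEGMENTS.values())
shortest = min(SEGMENTS.values(), key=lambda seg: seg[1])
longest = max(SEGMENTS.values(), key=lambda seg: seg[1])

print(f"Segments: {len(SEGMENTS)}")
print(f"Approximate genome length: {total_length} nt")
print(f"Shortest segment: {shortest[0]} ({shortest[1]} nt)")
print(f"Longest segment: {longest[0]} ({longest[1]} nt)")
```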

Figure 2: Influenza particle with eight ribonucleoprotein complexes

Comparing Sanger and next-generation sequencing

It is possible to sequence the entire viral genome of influenza via reverse transcription of all eight RNA segments. Gene-specific primers are used to amplify the viral genome, which can then be sequenced bi-directionally using Sanger technology. The genome structure introduces complications, however. For example, the NS segment is small, producing a single PCR product that is less than 1,000 bases, while the 2.3 to 2.5 kilobase PB2 segment needs to be split into several smaller amplicons to enable Sanger sequencing.
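The need to split long segments can be sketched as a simple tiling calculation. The maximum amplicon length (900 bases), the overlap between neighbouring amplicons (100 bases) and the segment lengths used below are assumptions chosen for illustration, not parameters of any specific protocol.

```python
def tile_amplicons(segment_length, max_amplicon=900, overlap=100):
    """Split a segment into overlapping amplicons.

    Returns a list of (start, end) pairs using 1-based inclusive coordinates,
    each no longer than max_amplicon.
    """
    if segment_length <= 0:
        raise ValueError("segment_length must be positive")
    if not 0 <= overlap < max_amplicon:
        raise ValueError("overlap must be non-negative and shorter than max_amplicon")
    amplicons = []
    start = 1
    while True:
        end = min(start + max_amplicon - 1, segment_length)
        amplicons.append((start, end))
        if end == segment_length:
            break
        # Step back by the overlap so adjacent amplicons share sequence.
        start = end - overlap + 1
    return amplicons


# Illustrative lengths: a short NS-like segment and a long PB2-like segment.
print(tile_amplicons(890))
print(tile_amplicons(2341))
```

Under these assumptions the short segment fits in one amplicon, while the long segment needs three, each requiring its own primer pair.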

Sanger sequencing is broadly available and well understood; however, this approach does not work well with mixed populations. With Sanger sequencing, every sequence is actually an ensemble of sequences: the readout is an amalgam of many individual PCR product sequences. If the isolate is mixed, it may not be possible to detect minority subtypes or strains. There is also the possibility of false negatives: if primers fail to anneal due to sequence drift, they will fail to amplify cDNA from that isolate. Due to these limitations with Sanger sequencing, NGS approaches are becoming the standard.
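The ensemble effect can be illustrated with a deliberately simplified model of a single genome position. Here the ensemble readout is assumed to report only the majority base, which is a simplification of a real Sanger trace; the 90/10 mixture and the function names are hypothetical.

```python
from collections import Counter


def ensemble_call(bases):
    """Majority base across all molecules (simplified ensemble readout)."""
    return Counter(bases).most_common(1)[0][0]


def per_molecule_frequencies(bases):
    """Fraction of molecules carrying each base when molecules are read individually."""
    counts = Counter(bases)
    total = len(bases)
    return {base: count / total for base, count in counts.items()}


# Hypothetical mixed isolate: 90 molecules carry A and 10 carry G at this site.
population = ["A"] * 90 + ["G"] * 10

print("Ensemble call:", ensemble_call(population))
print("Per-molecule frequencies:", per_molecule_frequencies(population))
```

In this toy model the ensemble call hides the G variant entirely, whereas counting individual molecules recovers it at 10 percent.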

NGS uses a single-tube reaction, taking advantage of the conserved sequences on the 5´ and 3´ ends of each viral RNA. PCR products are rather large, but by employing the Nextera Tagmentation method, all amplified fragments can be converted into a sequencing library simultaneously. During this process, a sample barcode is incorporated to allow assignment of each resulting sequence to a specific sample. Individual fragments are clonal, and thus each sequence represents an individual molecule. This in turn enables identification of multiple subtypes or strains within a single isolate with outstanding sensitivity.
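The role of the sample barcode can be sketched as a lookup that assigns each read to its sample. The barcode sequences, sample names and read sequences below are invented for illustration, and each read is modelled simply as a (barcode, sequence) pair rather than in any instrument's actual output format.

```python
from collections import defaultdict

# Hypothetical barcode-to-sample assignments.
BARCODE_TO_SAMPLE = {
    "ACGTACGT": "isolate_01",
    "TGCATGCA": "isolate_02",
}


def demultiplex(reads, barcode_to_sample):
    """Group read sequences by sample; unknown barcodes go to 'unassigned'."""
    by_sample = defaultdict(list)
    for barcode, sequence in reads:
        sample = barcode_to_sample.get(barcode, "unassigned")
        by_sample[sample].append(sequence)
    return dict(by_sample)


reads = [
    ("ACGTACGT", "ATGAAGGCAATACTAG"),
    ("TGCATGCA", "ATGAAGACTATCATTG"),
    ("ACGTACGT", "GGCAATACTAGTAGTT"),
    ("GGGGGGGG", "NNNNNNNNNNNNNNNN"),  # barcode not in the sample sheet
]

for sample, sequences in demultiplex(reads, BARCODE_TO_SAMPLE).items():
    print(sample, len(sequences))
```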


A custom bioinformatics pipeline assembles sequences from all viral segments and compares them against reference sequences for strain identification. The pipeline includes FASTQ processing and contig assembly; contigs are matched against existing sequences from GISAID and other sources. A scoring matrix determines the type and strain of the isolate, with top-scoring strains subjected to pairwise competitive alignment, which is then used to make the final determination. We have accurately identified 100 percent of validated strains with this method.
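The reference-matching step can be illustrated with a toy ranking. This is not the pipeline described above: the scoring matrix and competitive alignment are not specified here, so the sketch substitutes a simple k-mer Jaccard similarity as a stand-in. The reference names and the short sequences are invented for illustration.

```python
def kmers(sequence, k=5):
    """Return the set of all overlapping k-mers in a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity between two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_references(contig, references, k=5):
    """Rank reference sequences by k-mer similarity to the contig, best first."""
    contig_kmers = kmers(contig, k)
    scores = {
        name: jaccard(contig_kmers, kmers(seq, k))
        for name, seq in references.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Invented short sequences standing in for reference genomes.
references = {
    "ref_A": "ATGAAGGCAATACTAGTAGTTCTGCTATATACA",
    "ref_B": "ATGAAGACTATCATTGCTTTGAGCTACATTCTA",
}
# Hypothetical contig differing from ref_A at a single position.
contig = "ATGAAGGCAATACTAGTAGTTCTGCTGTATACA"

for name, score in rank_references(contig, references):
    print(f"{name}: {score:.2f}")
```

In a fuller workflow, the top-ranked candidates from a fast screen like this would then be compared by alignment before a final call is made.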

Table 1: Comparing molecular detection methods
Advantages of NGS
  • Sequencing the entire viral genome
  • Full sequence of HA and NA genes
  • Reduced chance of false negatives
  • Streamlined workflow: nonspecific amplification is filtered out bioinformatically
  • Very high throughput

Although the focus is on HA and NA genes, characterization of the entire viral genome allows detailed insight into strain evolution outside of those genes.

Vaccines are developed using specific strains. If a vaccinated patient develops flu symptoms, a sample can be collected and the infecting strain identified using NGS. Finding a strain identical to the vaccine strain in that patient is evidence of poor efficacy.

Similarly, when a different strain is observed, we gain insight into breakthrough strains or potential escape mechanisms that can inform future vaccine strategies. Thus, we identify genetic changes that occur in the face of viral evolution and selection by various challenges, including vaccines, and incorporate this knowledge into the analysis of efficacy, leading to development of new anti-viral therapies.

This article was created in collaboration with the sponsoring company and our sales and marketing team. The editorial team does not contribute.