DrugOps (Part 3): Transforming Sequence Analysis into an Automated Workflow

Fully automated antibody lead identification: an example of Workflow Automation that saves scientists’ time on redundant and error-prone steps.


This article is Part 3 of a five-part series on the fundamentals of DrugOps.

Workflow Automation in Drug Discovery

Search Google for the name of your favorite LIMS or ELN followed by the word "automation," and you're guaranteed to find results emphasizing "efficiency" in your "workflow," perhaps to "accelerate" your science. Sometimes there are more details: a system can read data in a specific file format instead of requiring manual entry, or perform a series of calculations at the press of a button. These can be valuable time-saving steps, but they're still individual steps.

Workflow Automation is one of the three core elements of DrugOps, and it requires that we find ways to automate entire workflows, not just individual steps. We achieve this by using software to orchestrate the steps in a particular workflow. To illustrate what we mean, this article describes automating the processing, analysis, and identification of lead antibodies in an antibody discovery workflow. We'll leave the nitty-gritty for another article and focus instead on the steps involved and what we gain by automating them.

Assume we've constructed a phage-displayed Fab library. Our goal is to automate as many steps as possible in our lead identification workflow, beginning with acquisition of sequences for a selected set of plated clones. From there we want to:

  • Associate the reads with each well of our sequenced plate
  • Analyze the reads to identify the displayed variable regions and corresponding germlines
  • Record screening assay data for the plated clones
  • Identify the unique clones that pass our screen for leads
  • Assemble a new plate of our lead clones
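
The first bullet, associating reads with wells, mostly comes down to parsing read IDs consistently. Below is a minimal Python sketch that assumes a hypothetical naming scheme of `<plate>_<well>_<chain>` (for example `P01_A01_HC`) on a 96-well plate. Your sequencing provider's scheme will differ, so treat the regular expression as a placeholder; `associate_reads` is an illustrative name, not part of any particular product.

```python
import re
from collections import defaultdict

# Hypothetical read-naming scheme: <plate>_<well>_<chain>, e.g. "P01_A01_HC".
# Sequencing providers use their own conventions; adjust the pattern to match yours.
READ_ID = re.compile(
    r"^(?P<plate>[A-Za-z0-9]+)_(?P<well>[A-H](?:0[1-9]|1[0-2]))_(?P<chain>HC|LC)$"
)


def associate_reads(reads: dict[str, str]) -> dict[tuple[str, str], dict[str, str]]:
    """Group reads by (plate, well) of a 96-well plate, keyed by chain ("HC" or "LC").

    Any read ID that doesn't fit the naming scheme raises ValueError, so a
    misnamed file is caught here instead of silently landing in the wrong well.
    """
    wells = defaultdict(dict)
    for read_id, sequence in reads.items():
        match = READ_ID.match(read_id)
        if match is None:
            raise ValueError(f"Unrecognized read ID: {read_id!r}")
        wells[(match["plate"], match["well"])][match["chain"]] = sequence
    return dict(wells)


# Toy example: both chains from well A01 and a heavy chain only from B02.
reads = {
    "P01_A01_HC": "CAGGTGCAGCTG",
    "P01_A01_LC": "GACATCCAGATG",
    "P01_B02_HC": "GAGGTGCAGCTG",
}
for (plate, well), chains in associate_reads(reads).items():
    print(plate, well, sorted(chains))
```

Rejecting unrecognized IDs outright, rather than skipping them, is a deliberate choice: a read that can't be placed in a well is exactly the kind of silent error that later steps can't recover from.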

A manual system for working through this process might look something like this:

  1. A scientist loads the sequencing result file(s) into a program for review. In the best case, this is a molecular biology program; in the worst case, it's Excel.
  2. The scientist identifies the Fab chain in each read. The program they're using might do this on their behalf; in Excel, however, the extraction is manual: the scientist locates key sequences 5' and 3' of the Fab chain and copies the sequence between them into a new cell.
  3. The scientist checks the quality of the read in the region identified in step #2 and culls sequences of insufficient quality (though in Excel, this may be impossible).
  4. For each Fab chain, the scientist identifies and annotates the variable domain. Again, in the best case, the software used in step #1 can do this on their behalf; in the worst case, they load the sequences into a different program or submit them on a website that can do this for them (in the very worst case, one sequence at a time).
  5. For each Fab chain identified, the scientist identifies the closest-matching germline sequence, ideally a step performed by the same software that identified and annotated the variable domains in step #4. If not, then the scientist again turns to a program or website that can do this for them.
  6. The heavy and light chains for each clone, and their respective annotations, are combined to provide the complete Fab expressed by the clone. Some programs can pair the chains by recognizing a common element within each read's ID (StackWave's Affinity does this), but the pairing is probably done in Excel.
  7. Screening data is associated with each of the clones. Again, there may be some programs that support this (Affinity does), but the scientist is probably copying the data into Excel again.
  8. Based on the properties of the desired therapeutic, filters are applied to the assay results to identify the most promising leads. If the program used in steps #6 and #7 supported lining up clone sequences and assay data, it should support filtering as well. Excel certainly does!
  9. From the set isolated in step #8, the scientist identifies the unique Fabs. Since highly enriched clones predominate, duplicate Fabs need to be removed from the final set before we proceed. If this can’t be done programmatically (for example, with an Excel macro), it becomes a tedious and time-consuming manual process.
  10. The set produced in step #9 represents our leads; now we must construct a source-to-destination mapping so that we can correctly pipette (or, if we're using a liquid handling instrument, instruct the machine to correctly pipette) from our initial set of plates into a new set that becomes the basis for downstream steps, such as reformatting and expression. This is likely done manually in a new Excel sheet by copying the plate and well information from the filtered set of clones and progressively assigning plate and well information for the destination plates.
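
Steps 2 and 3 are good candidates for the computer. The sketch below assumes exact-match flanking motifs and a per-base list of Phred quality scores (as you'd have after parsing a trace or FASTQ file elsewhere). It cuts out the region between the flanks and culls it when the mean quality of that region falls below a cutoff. The motifs, the `extract_chain` name, and the 30.0 cutoff are illustrative assumptions; exact string matching also won't tolerate sequencing errors inside the flanks, which a production tool would need to handle.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class ExtractedChain:
    sequence: str        # the region between the two flanking motifs
    mean_quality: float  # mean Phred score over that region only


def extract_chain(
    read: str,
    qualities: list[int],
    five_prime: str,
    three_prime: str,
    min_mean_quality: float = 30.0,  # hypothetical cutoff; tune for your data
) -> Optional[ExtractedChain]:
    """Cut out the region between two flanking motifs and cull it if quality is low.

    Returns None when either flank is missing, the region is empty, or the
    region's mean Phred quality falls below min_mean_quality.
    """
    if len(read) != len(qualities):
        raise ValueError("read and quality scores must be the same length")
    start = read.find(five_prime)
    if start == -1:
        return None  # 5' flank not found
    start += len(five_prime)
    end = read.find(three_prime, start)
    if end <= start:
        return None  # 3' flank not found (end == -1) or nothing between the flanks
    region_quality = float(mean(qualities[start:end]))
    if region_quality < min_mean_quality:
        return None  # insufficient quality in the region we actually care about
    return ExtractedChain(read[start:end], region_quality)


# Toy read with placeholder flanks "ACGT" (5') and "CCGG" (3').
read = "TTACGTGATTACACCGGTT"
print(extract_chain(read, [35] * len(read), five_prime="ACGT", three_prime="CCGG"))
```

Checking quality only over the extracted region, rather than the whole read, reflects step 3 as described: a read with poor ends but a clean Fab region is still usable.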

That's a lot of steps; frankly, it's impressive that you even read this far. Beyond making the process time-consuming, though, each step introduces the risk that the scientist performing it will make a mistake, one that may not be caught until a lot of time and money have been spent. As useful as Excel is, it exacerbates this risk: it won't tell you if you're off by a single cell, have misapplied a filter, or have copied data into the wrong row.
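
Keying every record on its plate and well, rather than on its row position, removes the off-by-one-cell class of error from steps 6 and 7. This minimal sketch assumes each input has already been reduced to a dictionary keyed by `(plate, well)` and that the assay produces a single numeric value per well (a hypothetical `signal` field); `assemble_clones` is an illustrative name.

```python
Key = tuple[str, str]  # (plate, well)


def assemble_clones(
    heavy: dict[Key, str],
    light: dict[Key, str],
    assay: dict[Key, float],
) -> dict[Key, dict]:
    """Join heavy chains, light chains, and one assay value per well on (plate, well).

    Rows are matched by key, never by position, and any well missing from one
    of the inputs raises ValueError instead of shifting everything by one row.
    """
    all_keys = set(heavy) | set(light) | set(assay)
    missing = {
        name: sorted(all_keys - set(table))
        for name, table in (("heavy", heavy), ("light", light), ("assay", assay))
        if all_keys - set(table)
    }
    if missing:
        raise ValueError(f"Wells missing from some inputs: {missing}")
    return {
        key: {"heavy": heavy[key], "light": light[key], "signal": assay[key]}
        for key in sorted(all_keys)
    }


# Toy example: the assay value for A02 was never entered.
heavy = {("P01", "A01"): "EVQL", ("P01", "A02"): "QVQL"}
light = {("P01", "A01"): "DIQM", ("P01", "A02"): "EIVL"}
assay = {("P01", "A01"): 1.8}
try:
    assemble_clones(heavy, light, assay)
except ValueError as err:
    print(err)
```

Whether a well missing from one input should be an error or simply dropped is a judgment call; failing loudly is the choice that most directly addresses the misaligned-row risk described above.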

What we'd like to do instead is have the computer perform as many steps as possible on the scientist's behalf, then have the scientist make whatever decisions are necessary to move the process into the next series of automated steps. This is "human-in-the-loop" automation; the computer does as much work as possible, then presents an actual person with the information they need to make a decision about next steps. To achieve this, we need software that will load our reads, check them for quality, extract the chains, annotate the variable regions, call the germlines, and then present the user with the results.
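
Filtering and deduplication (steps 8 and 9) illustrate the human-in-the-loop split: the scientist decides what a lead looks like, and the computer applies that decision consistently. This is a minimal sketch assuming clones are dictionaries with hypothetical `heavy`, `light`, and `signal` fields keyed by `(plate, well)`, and that two clones are duplicates only when both chain sequences match exactly; a different definition of "unique" would need a different key. The filter is passed in as a function so the criteria stay with the scientist.

```python
from typing import Callable

Key = tuple[str, str]  # (plate, well)
Clone = dict  # assumed shape: {"heavy": str, "light": str, "signal": float}


def unique_leads(clones: dict[Key, Clone], passes: Callable[[Clone], bool]) -> list[dict]:
    """Apply a scientist-supplied filter, then collapse clones that carry the same Fab.

    Each unique (heavy, light) pair is reported once, with the first passing well
    (in plate/well order) it was seen in and how many passing wells carried it.
    """
    leads: dict[tuple[str, str], dict] = {}
    for key in sorted(clones):
        clone = clones[key]
        if not passes(clone):
            continue  # filtered out by the scientist's criteria
        fab = (clone["heavy"], clone["light"])
        if fab in leads:
            leads[fab]["count"] += 1  # duplicate of an already-seen Fab
        else:
            leads[fab] = {"source": key, "heavy": fab[0], "light": fab[1], "count": 1}
    return list(leads.values())


clones = {
    ("P01", "A01"): {"heavy": "EVQL", "light": "DIQM", "signal": 1.8},
    ("P01", "A02"): {"heavy": "EVQL", "light": "DIQM", "signal": 1.5},
    ("P01", "A03"): {"heavy": "QVQL", "light": "EIVL", "signal": 0.2},
}
# The threshold is the human decision; 1.0 is an arbitrary placeholder.
for lead in unique_leads(clones, lambda c: c["signal"] >= 1.0):
    print(lead)
```

Keeping a count of how many passing wells carried each Fab preserves the enrichment information that would otherwise be lost when duplicates are collapsed.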

Applications exist that can perform these steps (we know, since we built one), but if purchasing such an application isn't feasible, we'll describe how to automate them on your own in another article. Either way, with only the automation described in the previous paragraph, we've reduced a process that required the ten manual steps above to merely the final three. That's a huge step forward in automating a complex process.
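
Step 10, the source-to-destination mapping, is also straightforward to script once the leads are known. The sketch below assumes 96-well destination plates filled row by row (A01 through A12, then B01, and so on) and hypothetical destination plate names `DEST01`, `DEST02`, and so on. The CSV columns are generic placeholders, since each liquid handler expects its own worklist format; `plate_map` and `to_csv` are illustrative names.

```python
import csv
import io
from itertools import product

ROWS = "ABCDEFGH"
COLS = range(1, 13)
# Destination wells in row-major order: A01, A02, ..., A12, B01, ...
WELLS_96 = [f"{r}{c:02d}" for r, c in product(ROWS, COLS)]


def plate_map(
    sources: list[tuple[str, str]], dest_prefix: str = "DEST"
) -> list[tuple[str, str, str, str]]:
    """Assign each source (plate, well) to the next free well of consecutive 96-well plates."""
    mapping = []
    for i, (src_plate, src_well) in enumerate(sources):
        plate_index, well_index = divmod(i, len(WELLS_96))
        dest_plate = f"{dest_prefix}{plate_index + 1:02d}"
        mapping.append((src_plate, src_well, dest_plate, WELLS_96[well_index]))
    return mapping


def to_csv(mapping: list[tuple[str, str, str, str]]) -> str:
    """Render the mapping as a generic CSV; real instruments need their own worklist format."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["source_plate", "source_well", "dest_plate", "dest_well"])
    writer.writerows(mapping)
    return buffer.getvalue()


# Toy example: two leads from different source plates.
print(to_csv(plate_map([("P01", "A01"), ("P03", "H12")])))
```

Filling plates column by column instead would only change how `WELLS_96` is built: swap the arguments to `product` and format each well name from the resulting (column, row) pair.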

Workflow automation makes processes more consistent and more reliable, and restores time that would otherwise be lost to a series of manual steps. This is time that can be put towards solving more useful, more interesting, or more complicated problems. It's also time that can be invested in reviewing progress, performance, or existing automated processes, and looking for ways to improve them. That kind of work is a key component of Rapid Feedback, the final element of DrugOps and the one we'll be discussing next.

If you'd like to continue reading, check out Part 4 of this series, Better Therapeutics through Rapid Feedback.
