Tutorial 1 - Canonical workflow

Using the demonstration data, work through a canonical workflow. This tutorial is also available as a video.

Create a new plate set by reformatting

Using the demonstration data, reformat Project 10, Plate Set 6 (ten 96 well plates) into 384 well assay plates with a replication of 1. Navigate into Project 10 and select plate set 6:

Select reformat to be presented with the reformat form. The top panel provides information about the source plate set, and the bottom panel requests information on the design of the destination plate set. I will use sample replicates of 1 and target replicates of 4, i.e. all 4 quadrants of the 384 well plate coated with the same target. This requires that I define the layout as 1S4T.

Pressing Submit brings you to the confirmation page where you have the opportunity to select the target layout. I will get a new Plate Set #8, assuming no earlier modifications were made to the demo data. Navigate into PS-8 and note that there are 3 plates, 30-32, with 32 only half filled. I am ready to load data.
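Why three plates? The well arithmetic can be sketched as follows (the helper function is illustrative, not part of LIMS*Nucleus):

```python
import math

def destination_plate_count(src_plates, src_format=96, dest_format=384,
                            sample_replicates=1):
    """Number of destination plates needed to hold every source well."""
    wells_needed = src_plates * src_format * sample_replicates
    return math.ceil(wells_needed / dest_format)

# Ten 96 well plates into 384 well plates with replication 1:
# 960 wells / 384 wells per plate rounds up to 3 plates,
# with the third plate only half filled.
print(destination_plate_count(10))  # 3
```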

Apply barcode IDs to plates, Accession IDs to samples

Applying barcode and accession IDs is optional. The purpose of these identifiers is to allow you to associate LIMS*Nucleus data with data in other systems. Start by selecting the Plate Set of interest, PS-8 in this case, and follow the directions for importing accession IDs and barcode IDs. Use the following data:

ID type File
accession ps8-accessions.txt
barcode ps8-barcodes.txt

Note that empty wells in the third plate do not have an entry in the accessions file. A template for loading the accession IDs can be generated by exporting the underlying plate set data. The columns of interest are “Order” and “Well”, which must be relabeled “plate” and “well”. Also include the accession as “accs.id”.
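Building that template can be sketched as follows, assuming a tab-delimited export with “Order” and “Well” columns (the export snippet and accession values below are hypothetical):

```python
import csv, io

# Illustrative sketch: relabel the exported "Order" and "Well" columns to the
# "plate" and "well" headers the accession import expects, and add "accs.id".
# The export contents and accession values are made up for this example.
export = "Order\tWell\n1\tA01\n1\tB01\n"
accessions = {("1", "A01"): "ACC-0001", ("1", "B01"): "ACC-0002"}

out = io.StringIO()
writer = csv.writer(out, delimiter="\t")
writer.writerow(["plate", "well", "accs.id"])
for row in csv.DictReader(io.StringIO(export), delimiter="\t"):
    writer.writerow([row["Order"], row["Well"],
                     accessions[(row["Order"], row["Well"])]])
print(out.getvalue())
```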

Load assay data.

Download the demonstration tab delimited data set plates384x3_1S4T.txt to your local drive. The plate layout looks like:

and the response values look like:

This might be an ELISA assay showing a majority of responses less than 0.5. There is good reproducibility among the controls and a good assay window. With the destination plate set highlighted, select Utilities/Import assay data:

Fill in the dialog, substituting the path name with the path to the file on your computer. Note that the import form provides a sample of what the import data should look like. Column order, spelling and capitalization are critical. After file selection the first few rows of data are displayed for confirmation:

I want hits automatically identified using a built-in algorithm, so I select the option mean +3SD in the algorithm dropdown: all responses greater than the mean of the background wells plus 3 standard deviation units are considered hits. Because I am auto-identifying hits, the hit list name and description fields become active, giving me the opportunity to name and describe the hit list. I did NOT select Top N as the selection algorithm, so the number of desired hits field remains inactive:
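The mean +3SD rule can be sketched as follows (a minimal illustration of the cutoff calculation, not the LIMS*Nucleus implementation; the data are made up):

```python
import statistics

def call_hits(responses, background, k=3):
    """Return indices of responses above mean(background) + k * SD(background)."""
    cutoff = statistics.mean(background) + k * statistics.stdev(background)
    return [i for i, r in enumerate(responses) if r > cutoff]

# Hypothetical background (negative control) wells and unknowns:
background = [0.10, 0.12, 0.09, 0.11, 0.10, 0.08]
responses = [0.11, 0.95, 0.13, 1.40, 0.10]
print(call_hits(responses, background))  # [1, 3]
```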

Click import data. Note that the import button changes to ‘Loading…’. Do not click it again. Depending on the size of the data, the import could take many seconds. Once complete, you will be presented with the list of plates for the current plate set. Scroll down to see associated hit lists - HL-7 being the just-created hit list:

View the Assay Run

Click on the assay run hyperlink AR-6 to view the data under default conditions: normalized, with a threshold of the mean of the negative controls plus 3 standard deviations. Controls are color coded:

Below the plot is a form for replotting under different conditions of normalization and thresholding. After replotting, the tools menu allows for viewing hits above the selected threshold and creating a hit list if desired.

View the Hit List

To view a hit list, select the menu item under the tools icon. In the hit list view you can inspect the hits and are given the opportunity to save the list. Enter a name and description, and select ‘Add New Hit List’ from the tools menu:

Upon successful creation of the hit list you are presented with its parent assay run. Scroll to the bottom to see the newly created hit list:

Rearray

At this point you are ready to rearray your hits into a new plate set for secondary assay or further processing. Note that HL-7 contains 162 samples. Click on the ‘HL-7’ hyperlink to view the hit list. Scroll to the bottom to see the availability of hits in various plate sets:

In our simplistic demonstration data set, it is obvious that the parental samples for our assay plate set are in plate set 6. In real-life processing, samples may be distributed across many plate sets. Use the type and count fields to help determine an appropriate source of samples for the rearray. You must establish type naming conventions for your plate sets; for example, archive or master/daughter/rearray plates contain samples, while assay plates are typically transient and not a source of material for rearraying. In the demo data PS-6 is labeled “master”, which identifies it as containing the parent samples.

Use the count field to confirm whether a particular plate set contains all the hits of interest. If no single plate set contains all hits, you may have to group plate sets.

Select the plate set that will serve as the rearray source and select “Rearray” using the tools button. This will initiate production of a new plate set containing hits only:

Provide the requested information and press “Submit”. You will then have the opportunity to select the layout of controls. The number of plates required is calculated automatically, and hits are distributed in numerical order by column into the plates. Since this is a rearray plate, and not an assay plate, the target is auto-assigned to DefaultQuadruplicates, which is inconsequential. To achieve the physical rearray in the lab, a worklist is generated and associated with the plate set. A plate set generated by a worklist is indicated by the worklist ID in the “worklist” column of the plate set view; only plate sets generated by a rearray have an integer in that column. The worklist is permanently stored in the database and can be recalled by selecting the plate set and choosing ‘Worklist’ under the tools icon.
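Distributing hits by column into destination plates can be sketched as follows (the transfer-record layout and function name are illustrative; the actual worklist columns may differ):

```python
def build_worklist(hits, dest_format=384, dest_rows=16):
    """Distribute hits into destination plates in numerical order by column.

    `hits` is a list of (source_plate, source_well) pairs. A 384 well plate
    has 16 rows (A-P), and filling proceeds down each column first, matching
    the fill-by-column convention.
    """
    worklist = []
    for n, (src_plate, src_well) in enumerate(hits):
        plate = n // dest_format + 1          # destination plate number
        idx = n % dest_format                 # position within that plate
        col, row = divmod(idx, dest_rows)     # fill down each column first
        dest_well = f"{chr(ord('A') + row)}{col + 1:02d}"
        worklist.append((src_plate, src_well, plate, dest_well))
    return worklist

wl = build_worklist([("P-18", "A01"), ("P-18", "C05"), ("P-19", "B11")])
print(wl)  # three transfers landing in A01, B01, C01 of destination plate 1
```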

The worklist will be displayed:

The worklist can be exported to e.g. Microsoft Excel or LibreOffice Calc, depending on which application your computer has associated with the .CSV extension:

This worklist can be used with a liquid handling robot, such as a Beckman Biomek, to perform the physical rearray.

Tutorial 2 is a repeat of the canonical workflow, this time using an assay plate layout that allows for duplicate samples.

Split plate sets

Only plate sets can be split. You can think of splitting a plate set as a regrouping of plates within a plate set. Navigate into the plate set of interest containing the plates to be grouped and highlight the plates. Select Utilities/Group from the menu bar:

A dialog will open. Fill in the name and description for the new plate set. The plates must be of the same format and layout, which will be indicated in the dialog box. Select a plate type and press OK.

Systems

A systems approach involves the integration of multiple independent commercial and custom software products working in unison towards a common goal. It provides flexibility by allowing individual components to be upgraded, or discarded and replaced, as requirements change.

Advantages

  • Flexible; can evolve as process evolves
  • Best of breed components can be used
  • Portability of knowledge (Spotfire, R, SQL)
  • Adaptable to containerization

Disadvantages

  • Components on different upgrade cycles
  • Components use different technologies with scattered expertise
  • Configuration challenges: missing libraries, auxiliary software
  • May depend on external network connectivity
  • User training can be challenging
  • Integration can be challenging

References

Microservices as innovation enablers
Best practices == common practices
Split the monolith
Trulia switches to “Islands”
A contrarian’s (with vested interests) view

Case study of a monolith implementation: Why Doctors Hate Their Computers, which discusses feature creep and the “Tar Pit”.

Proprietary IT gives big companies their edge.

Rob Brigham, Amazon AWS senior manager for product management: “Now, don’t get me wrong. It was architected in multiple tiers, and those tiers had many components in them. But they’re all very tightly coupled together, where they behaved like one big monolith. Now, a lot of startups, and even projects inside of big companies, start out this way. They take a monolith-first approach, because it’s very quick, to get moving quickly. But over time, as that project matures, as you add more developers on it, as it grows and the code base gets larger and the architecture gets more complex, that monolith is going to add overhead into your process, and that software development lifecycle is going to begin to slow down.”

When computational pipelines go ‘clank’


Target Layouts

Target layouts define the pattern of targets coated on assay plates. The available patterns are described on the replication page. Observe the patterns under the “Target Pattern” column and note that singlicates, duplicates, and quadruplicates are the only allowed options. Target duplicates are always in the same column, while sample duplicates are in the same row. Before setting up a layout pattern, targets must be imported as described on the targets page. Alternatively, you can use the built-in generic targets Target1, Target2, etc. Note that assigning targets is not required; it is available only to allow merging with target information held in other systems.

To set up a layout, navigate into the project of interest and select the menu item Utilities/Targets/Create Target Layout. Provide a name and description, and select the level of replication desired. The dropdowns will be enabled as needed:

Once the layout is saved, it is available for use during reformatting or plate set creation. Note that the layout will only appear as an option when appropriate selections have been made, e.g. when replication is singlicates:

Targets

For a definition of target see the layouts page. Targets are primarily used to annotate data and assist with merging LIMS*Nucleus data with data from other systems. Defining targets is optional and if not done, generic “Target1”, “Target2” labels will be used in output. Using targets requires three steps:

  1. Register targets individually or (as an administrator) import them in bulk.
  2. Define target layouts.
  3. Apply layouts to plate sets.

Defining layouts only makes sense when creating assay plate sets. Apply the target layout during the reformatting step.

There are two methods of importing targets:

Bulk import by an administrator

Under the admin menu item select “Bulk target import”. A file chooser dialog will appear. Choose an import file with the format described below:

project	target	description	accession
1	muCD71	Mouse transferrin receptor	FHD8SU29
1	huCD71	Human transferrin receptor	JDHSU789
1	cynoCD71	Monkey transferrin receptor	KSIOW8H3
1	BSA	Bovine serum albumin	KEUI87YH
2	Lysozyme	Lysozyme	KDJFG98D
2	GAPDH	Glyceraldehyde Phosphate Dehydrogenase	KFIIOD09
2	ICAM4	ICAM 4 integrin	KL0OIE7U
2	IL21R	IL21 receptor	KOI89IUY

Here is an example target import file: targets200.txt

Column header spelling, capitalization, and order are critical. Indicate in column one the project with which the target should be associated. Import will fail if the project id is not in the database. For targets that should be available to all projects, place NULL (no quotes) in the first column. Only administrators can designate a target's project id as NULL during bulk import. Note that currently there is no opportunity to update an accession later should it be blank upon import.
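Because the header must match exactly, a pre-import check can save a failed upload. A minimal sketch (the function name is illustrative; LIMS*Nucleus performs its own validation):

```python
# The required tab-delimited header, in exact spelling, case, and order.
REQUIRED_HEADER = ["project", "target", "description", "accession"]

def check_target_import(first_line):
    """Verify the first line of a target import file matches the required
    columns exactly; spelling, capitalization, and order are all critical."""
    cols = first_line.rstrip("\n").split("\t")
    return cols == REQUIRED_HEADER

print(check_target_import("project\ttarget\tdescription\taccession"))  # True
print(check_target_import("Project\tTarget\tDescription\tAccession"))  # False
```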

One-at-a-time import by users

Under the menu bar, Targets/Add New Target shows all targets. At the top, use the tool button to navigate to the add target page:

Fill in the form. Press Submit. The target is associated with the current project and is only available within that project. Once targets have been registered, they can be used in a target layout.

Terminology

Sample: Item in a well. Could be antibody, small molecule, virus, antisense oligo, expression construct, etc.
Target: Material coated on or in an assay plate. Substance of interest that will interact with samples.
Rearray: Select random samples (hits) across a plate and place them in a new plate.
Reformat: Combine plates of one format into a higher density plate e.g. collapse four 96 well plates into a 384 well plate
Group: Combine two or more plate sets into one plate set; combine a subset of plates from a plate set into a new plate set
Format: Number of wells in a plate e.g. 96, 384, 1536.
Hit: A sample that surpasses an assay threshold.
Source: Plates from which samples are drawn.
Destination: Plates into which samples are deposited.
Plate set order: The assigned order of plates within a plate set. Order is visible in the client.
Required data: Data required for LIMS*Nucleus to function e.g. plate layouts, assay types, well types.
Example data: Fake data that can be used to test LIMS*Nucleus functionality.

Provided install scripts

A variety of installation/configuration scripts for both the client and the PostgreSQL database server are provided as links on this web site or packaged with the LIMS*Nucleus client. Various scripts are described below. Scripts without hyperlinks are included in the install package.

Supplied Scripts

Name Description
install-limsn-ec2.sh Full installation on AWS including web server, database server, and application software
install-limsn-pack.sh Install LIMS*Nucleus client and optionally database using a Guix pack (easiest install)
lnpg.tar.xz archive of sql scripts for database configuration
install-pg-aws-ec2.sh Installation of the PostgreSQL database with LIMS*Nucleus tables, methods and example data. This script is called by install-limsn-ec2.sh. This script is only used to reinstall the database after manual deletion
install-pg-aws-rds.sh Install the database on an AWS RDS (Relational Database Service) PostgreSQL instance
start-limsn.sh Use to start the client application software. Run in detached mode so the terminal can be shut down.
init-limsn-pack.sh place $HOME on $PATH; modify $HOME/.bashrc; for use with Guix pack
init-limsn-channel.sh place $HOME on $PATH; modify $HOME/.bashrc; for use with channel installation
load-pg.sh load database by running all SQL scripts at command line
lnpg.sh run lnpg.scm passing necessary parameters to initialize database

Sequence evaluation

When processing sequences obtained from a vendor, it is useful to have an idea of how well the sequencing reactions worked, both in an absolute sense and relative to other recently obtained sequences in the same project. What follows is a primary-sequence-independent method of evaluating a collection of sequences (i.e. an order from an outside vendor).

The first step is to align sequences by nucleotide index (ignoring the actual sequence). Start by reading the sequences into a list. I use the list s.b to hold the back (reverse, but in the 5’ to 3’ orientation, as sequenced) sequences, and the list s.f to hold the forward (5’ to 3’) sequences:

rm(list=ls(all=TRUE))
library(seqinr)

working.dir <- "B:/<my-working-dir>/"

back.files <- list.files( paste(working.dir, "back/", sep="" ))
for.files <- list.files( paste(working.dir, "for/", sep="" ))

> back.files[1:20]
[1] "MBC20120428a-A1-PXMF1.seq" "MBC20120428a-A10-PXMF1.seq"
[3] "MBC20120428a-A11-PXMF1.seq" "MBC20120428a-A12-PXMF1.seq"
[5] "MBC20120428a-A2-PXMF1.seq" "MBC20120428a-A3-PXMF1.seq"
[7] "MBC20120428a-A4-PXMF1.seq" "MBC20120428a-A5-PXMF1.seq"
[9] "MBC20120428a-A6-PXMF1.seq" "MBC20120428a-A7-PXMF1.seq"
[11] "MBC20120428a-A8-PXMF1.seq" "MBC20120428a-A9-PXMF1.seq"
[13] "MBC20120428a-B1-PXMF1.seq" "MBC20120428a-B10-PXMF1.seq"
[15] "MBC20120428a-B11-PXMF1.seq" "MBC20120428a-B12-PXMF1.seq"
[17] "MBC20120428a-B2-PXMF1.seq" "MBC20120428a-B3-PXMF1.seq"
[19] "MBC20120428a-B4-PXMF1.seq" "MBC20120428a-B5-PXMF1.seq"
>

Next determine the number of files read and create a list of that length to hold the sequences. Then read them in and inspect a sequence:

s.b <- list()
length(s.b) <- length(back.files)
s.f <- list()
length(s.f) <- length(for.files)

for(i in 1:length(back.files)){
  s.b[[i]] <- read.fasta(paste( working.dir, "back/", back.files[[i]], sep=""))
}

for(i in 1:length(for.files)){
  s.f[[i]] <- read.fasta(paste( working.dir, "for/", for.files[[i]], sep=""))
}

> getSequence(s.b[[2]])[[1]][1:200]
[1] "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "n" "c" "n" "n" "n"
[19] "g" "t" "c" "c" "a" "c" "t" "g" "c" "g" "g" "c" "c" "g" "c" "c" "a" "t"
[37] "g" "g" "g" "a" "t" "g" "g" "a" "g" "c" "t" "g" "t" "a" "t" "c" "a" "t"
[55] "c" "c" "t" "c" "t" "t" "c" "t" "t" "g" "g" "t" "a" "g" "c" "a" "a" "c"
[73] "a" "g" "c" "t" "a" "c" "a" "g" "g" "c" "g" "c" "g" "c" "a" "c" "t" "c"
[91] "c" "g" "a" "t" "a" "t" "t" "g" "t" "g" "a" "t" "g" "a" "c" "t" "c" "a"
[109] "g" "t" "c" "t" "c" "c" "a" "c" "t" "c" "t" "c" "c" "c" "t" "g" "c" "c"
[127] "c" "g" "t" "c" "a" "c" "c" "c" "c" "t" "g" "g" "c" "g" "a" "g" "c" "c"
[145] "g" "g" "c" "c" "g" "c" "c" "a" "t" "c" "t" "c" "c" "t" "g" "c" "a" "g"
[163] "g" "t" "c" "t" "a" "g" "t" "c" "a" "g" "a" "g" "c" "c" "t" "c" "c" "t"
[181] "a" "c" "a" "t" "a" "a" "t" "g" "g" "a" "t" "a" "c" "a" "a" "c" "t" "a"
[199] "t" "a"


Note that ambiguities are indicated with an “n”. The sequence evaluation involves counting the number of ambiguities at each index position. The expectation is that the first 25 or so bases will have a large number of ambiguities, falling to near zero around position 50; this is the run length required to get the primer annealed and incorporating nucleotides. Next follow 800-1200 positions with a near-zero ambiguity count; exactly how many is a function of the sequencing quality. Towards the end of the run the ambiguities begin to rise as the polymerase loses energy. Finally the ambiguity count falls as the reads terminate.

Create a vector nbsum that will tally the count of ambiguities at a given index. Then process through each sequence and count, at each index, the number of ambiguities. The total count of ambiguities is entered into nbsum at the corresponding index position.

nbsum <- vector( mode="integer", length=2000)

for( i in 1:length(s.b)){
  for( j in 1:length( getSequence(s.b[[i]])[[1]])){
    if( getSequence(s.b[[i]])[[1]][j] == "n") nbsum[j] <- nbsum[j] + 1
  }
}

> nbsum[1:100]
[1] 167 168 168 168 168 166 163 164 160 153 149 142 131 150 135 125 120 111
[19] 99 93 79 80 59 51 61 52 48 38 26 20 17 17 22 20 18 14
[37] 16 15 13 11 14 16 23 21 13 12 6 7 5 6 9 5 3 3
[55] 1 4 3 1 1 3 3 3 2 0 1 0 0 0 0 1 1 1
[73] 5 5 12 14 21 25 24 28 29 31 21 20 8 8 7 10 4 2
[91] 2 3 5 1 3 0 3 1 1 0
>

x <- 1:1200
plot(nbsum[x])

Overlay the reverse reads in red.

nfsum <- vector( mode="integer", length=2000)

for( i in 1:length(s.f)){
  for( j in 1:length( getSequence(s.f[[i]])[[1]])){
    if( getSequence(s.f[[i]])[[1]][j] == "n") nfsum[j] <- nfsum[j] + 1
  }
}

points(nfsum[x], col="red")

I have created a shiny app that implements the above code. Download it here.

Installation

Edit your channels.scm file to include the labsolns channel.

Once edited:

$ guix pull
$ guix package -i seqeval
$ source $HOME/.guix-profile/etc/profile

# run the bash script
$ seqeval.sh

R / Shiny

Use R-Shiny to prototype algorithms and visualizations and extend LIMS*Nucleus. Below is a list of assay runs from Project 1. The assay run hyperlink transfers you to a Shiny dashboard that allows you to manipulate and visualize your data and generate a hit list.

ID Name Description
AR-1 assay_run1 PS-1 LYT-1;96;4in12
AR-2 assay_run2 PS-2 LYT-1;96;4in12
AR-3 assay_run3 PS-3 LYT-1;96;4in12

Simplifying Assumptions

  • Always use the 3 character well name A01, not A1.

  • Default to tab delimited text. In some cases comma or tab delimited will be offered as an option. Proprietary formats are avoided.

  • Plates are always filled by column. Well number is derived from the order of filling.

  • Reformatting is performed in the “Z” pattern. Quadrants are numbered in the Z pattern.

  • Plate sets contain plates of the same format and layout.

  • Always import a full plate of data, even if the plate isn’t full e.g. a data file for three 384 well plates should have 3*384=1152 rows even if the third plate isn’t full. Only control wells and unknown wells with samples will be processed.
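The Z-pattern reformat and the 3 character well names can be sketched together. This assumes the common interleaved quadrant mapping, where each 96 well row and column doubles into alternating 384 well rows and columns (the function is illustrative, not part of LIMS*Nucleus):

```python
def to_384(quadrant, well96):
    """Map a 96 well position into a 384 well plate using the "Z" quadrant
    pattern: 1 = top-left, 2 = top-right, 3 = bottom-left, 4 = bottom-right.
    Wells use the 3 character name, e.g. A01. Interleaved mapping assumed.
    """
    row96 = ord(well96[0]) - ord('A')      # A-H -> 0-7
    col96 = int(well96[1:]) - 1            # 01-12 -> 0-11
    row384 = 2 * row96 + (quadrant - 1) // 2
    col384 = 2 * col96 + (quadrant - 1) % 2
    return f"{chr(ord('A') + row384)}{col384 + 1:02d}"

print(to_384(1, "A01"))  # A01
print(to_384(2, "A01"))  # A02
print(to_384(3, "A01"))  # B01
print(to_384(4, "H12"))  # P24
```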