[WIP] Data platforms and lab automation

This page is still a work in progress, but here are some early thoughts on supporting data pipelines in automated labs…

Intro: Automated labs are chains of integrated robots, each doing its job within a larger laboratory system. Data is critical to monitoring how the lab is performing at all times.

Data Objective: The company needs to know when a component of the automated lab system is “sick” before it fails completely.

Background:

  • Process monitoring is the act of tracking and measuring the performance of an operational process to distinguish healthy (“ideal”) from degraded (“sick”) states.

  • Data needs: Monitoring the health of an automated lab requires quickly joining and transforming data across data sources. The format of the data may range from robotic logs to JSON blobs. Critical needs for scaling monitoring include data availability, data recency, and standardized data transformations.

  • Challenges:

    • Clinical lab operation is 24/7

    • Automated labs are early enough in development that most are one-of-a-kind systems with limited knowledge of performance on patient samples

    • Engineers need an automated digital monitoring process

    • Process monitoring requires control charts (a graph used to study how a process changes over time)
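To make the control-chart idea concrete, here is a minimal Shewhart-style sketch in Python. The metric (samples processed per hour), the baseline values, and the 3-sigma limits are all illustrative assumptions, not data from a real lab system:

```python
# Minimal Shewhart-style control chart check on a per-hour throughput metric.
# Baseline data and the k=3 sigma limit are illustrative assumptions.
from statistics import mean, stdev

def control_limits(baseline, k=3.0):
    """Center line and k-sigma control limits from a healthy baseline window."""
    center = mean(baseline)
    sigma = stdev(baseline)
    return center - k * sigma, center, center + k * sigma

def out_of_control(points, lcl, ucl):
    """Return (index, value) for points outside the control limits."""
    return [(i, x) for i, x in enumerate(points) if x < lcl or x > ucl]

baseline = [98, 101, 100, 99, 102, 100, 97, 101]  # samples/hour, healthy period
lcl, center, ucl = control_limits(baseline)
recent = [100, 99, 84, 101]  # 84 hints at a "sick" component before hard failure
print(out_of_control(recent, lcl, ucl))  # → [(2, 84)]
```

This is the simplest possible rule (a point beyond ±3σ); real process monitoring would layer on run rules, per-subgroup charts, and recomputed baselines.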

Users

  • Automation engineer: Responsible for telling the automated system when to do what action. Most often uses robotic logs to measure actions per unit time and error rates.

  • Data scientist: Responsible for determining health thresholds of the automated system. Uses robotic logs and lab info at the sample and batch level to triage high, medium and low performance. Responsible for ensuring any software developed is SDLC compliant and has required data snapshotting.

  • Lab Director: Responsible for using data to triage performance issues across people and machines to minimize sample failures.

  • Engineering Operations: Monitors performance of the lab in real time. On call if a failure arises.

  • Data Engineer: Responsible for provisioning data and infrastructure that will enable data scientists to self service process monitoring data and develop relevant tooling.
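As a concrete illustration of the automation engineer's view above, a robotic log can be reduced to the two metrics mentioned (actions per unit time and error counts). The log format and event names below are invented for illustration; real robotic logs will differ:

```python
# Reduce robotic log lines to actions-per-hour and error counts.
# The log schema (timestamp | robot | event) is a hypothetical example.
from collections import Counter
from datetime import datetime

log_lines = [
    "2024-01-01T00:00:00 | pipettor-1 | ASPIRATE",
    "2024-01-01T00:20:00 | pipettor-1 | DISPENSE",
    "2024-01-01T00:40:00 | pipettor-1 | ERROR:TIP_JAM",
    "2024-01-01T01:00:00 | pipettor-1 | ASPIRATE",
]

def summarize(lines):
    """Parse raw log lines and compute simple health metrics."""
    events = []
    for line in lines:
        ts, robot, event = (part.strip() for part in line.split("|"))
        events.append((datetime.fromisoformat(ts), robot, event))
    hours = (events[-1][0] - events[0][0]).total_seconds() / 3600 or 1
    errors = Counter(e for _, _, e in events if e.startswith("ERROR"))
    return {"actions_per_hour": len(events) / hours, "errors": dict(errors)}

print(summarize(log_lines))
```

Metrics like these are what would feed the control charts described under Background.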

Product Requirements [WIP]: There are many more product requirements to dig into and assumptions to clarify, but these are some examples of where I would start…

  • Robotic logs are read into a structured database

  • Robotic logs and lab data are refreshed at [time cadence]

  • Robotic logs can be joined with lab data at the [sample or batch] level

  • Data scientists and automation engineers have access to production data

  • Data scientists and engineers can define sample subgroups (e.g., 24-, 96-, or 384-well plates). Samples may belong to one or more subgroups

  • Data scientists and automation engineers can use scheduling and transformation tools [DBT? Airflow?] to write transformations and output to central schemas
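To make the join and subgroup requirements concrete, here is a small stdlib-only sketch. The field names (sample_id, plate_size, duration_s) and subgroup definitions are assumptions about what structured robotic logs and lab data might contain:

```python
# Join structured robotic-log rows with lab data at the sample level,
# then tag each sample with one or more subgroups (e.g. by plate size).
# All field names, values, and subgroup predicates are illustrative.

robot_rows = [
    {"sample_id": "S1", "station": "pipettor-1", "duration_s": 42},
    {"sample_id": "S2", "station": "pipettor-1", "duration_s": 55},
]
lab_rows = [
    {"sample_id": "S1", "batch": "B7", "plate_size": 96},
    {"sample_id": "S2", "batch": "B7", "plate_size": 384},
]

def join_on_sample(robot_rows, lab_rows):
    """Inner join of robotic-log rows with lab rows on sample_id."""
    lab_by_id = {row["sample_id"]: row for row in lab_rows}
    return [{**r, **lab_by_id[r["sample_id"]]} for r in robot_rows
            if r["sample_id"] in lab_by_id]

# Subgroup definitions: a sample may match more than one predicate.
subgroups = {
    "plate_96": lambda row: row["plate_size"] == 96,
    "slow_pipetting": lambda row: row["duration_s"] > 50,
}

joined = join_on_sample(robot_rows, lab_rows)
for row in joined:
    row["subgroups"] = [name for name, pred in subgroups.items() if pred(row)]

print([(r["sample_id"], r["subgroups"]) for r in joined])
# → [('S1', ['plate_96']), ('S2', ['slow_pipetting'])]
```

In practice this join would live as a scheduled transformation (e.g. a DBT model or Airflow task) writing to a central schema, rather than ad hoc Python.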
