Innovation vs Standardization: The data challenge in biotech

Statements I’ve overheard at biotech companies:

  • "I need to use all our patient data to train my [insert cool ML algorithm]"

  • "We can't use data before last year because the schema changed so the data is incomparable"

  • "The data you need is in 10+ spreadsheets. $coworker_on_vacation knows how to navigate them."

  • "I need access to that data with PHI. I need to present results tomorrow at our board meeting."

If you’ve said or heard any of these, you aren’t alone. R&D complexity and dataset sizes are growing exponentially. The pressure to get products to market faster demands faster data infrastructure. Meanwhile, strict federal and state laws govern patient data usage. It’s a challenging and exciting situation.

As a technical product manager for a data platform, my job is to help users get the data they need quickly and without breaking the law, so that statements like the ones above have quick solutions. When ideating on the right solution for a user group, I frequently run into the classic data tradeoff of innovation vs standardization.

  • [Innovation] R&D teams need the freedom to access and use data flexibly for product development

  • [Standardization] Regulatory, legal, and IT teams require strict data standards (e.g., secure and limited access, data usage restricted to low-risk cases, no data sharing)

The optimal solution is where standardization accelerates innovation.

When done well, standardization creates a foundation for highly scalable development. By establishing common data formats, access protocols, and governance frameworks, it lets researchers focus immediately on answering their questions rather than hunting for data or reconciling incompatible formats. Standardization enables more efficient collaboration, while well-implemented compliance requirements enable teams to move confidently and easily within regulatory boundaries. Different companies at different stages need different levels of standardization, and data experts can work with compliance and R&D leadership to assess those requirements.
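To make the idea of a common data format plus an access rule concrete, here is a minimal Python sketch. Everything in it is a hypothetical illustration rather than a description of any real platform: the `PatientRecord` fields, the `v1`/`v2` column names, and the `clinical` role are all assumptions. It shows rows from an old and a new schema being mapped into one shared record type, and a view function that hides the PHI field from everyone except one role.

```python
from dataclasses import dataclass, asdict
from typing import Any


# Hypothetical common record format; the field names are illustrative assumptions.
@dataclass(frozen=True)
class PatientRecord:
    patient_id: str
    name: str          # treated as PHI in this sketch
    birth_year: int
    biomarker: float


# Hypothetical mapping from each schema version's column names to the common fields.
FIELD_MAPS = {
    "v1": {"pid": "patient_id", "patient_name": "name",
           "yob": "birth_year", "marker_val": "biomarker"},
    "v2": {"patient_id": "patient_id", "full_name": "name",
           "birth_year": "birth_year", "biomarker": "biomarker"},
}

PHI_FIELDS = {"name"}


def to_common(row: dict[str, Any], schema_version: str) -> PatientRecord:
    """Rename a raw row's columns to the common fields and coerce the types."""
    mapping = FIELD_MAPS[schema_version]
    fields = {common: row[source] for source, common in mapping.items()}
    return PatientRecord(
        patient_id=str(fields["patient_id"]),
        name=str(fields["name"]),
        birth_year=int(fields["birth_year"]),
        biomarker=float(fields["biomarker"]),
    )


def view_for(record: PatientRecord, role: str) -> dict[str, Any]:
    """Return a dict view of the record; only the assumed 'clinical' role sees PHI."""
    data = asdict(record)
    if role != "clinical":
        for field_name in PHI_FIELDS:
            data[field_name] = "REDACTED"
    return data


# Rows from before and after a schema change become directly comparable.
old_row = {"pid": 17, "patient_name": "Jane Doe", "yob": "1980", "marker_val": "2.5"}
new_row = {"patient_id": "18", "full_name": "John Roe", "birth_year": 1975, "biomarker": 3.1}
records = [to_common(old_row, "v1"), to_common(new_row, "v2")]
research_view = view_for(records[0], "research")
```

Masking one field is only a stand-in for a real access policy. Actual de-identification and PHI handling involve far more than this, and that is part of the argument for leaning on established tooling.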

Build vs Buy

This brings us to the question of building versus buying data platforms and tools. Platforms require ongoing setup, maintenance, and upgrades as businesses evolve and requirements for innovation or standardization change. Companies often default to accepting the short-term costs of building custom data platforms rather than paying for tool licenses, largely because hiring people is somehow easier than navigating procurement processes (I’ll rant about that another time). The consequence? They end up with a custom data platform that requires a large maintenance team and depends heavily on the institutional knowledge of its original engineers.

I encourage teams to embrace proven industry standards and tools for core data standardization infrastructure instead of dismissing them because "our process is different". For critical needs like security, compliance, and governance, it’s better to purchase industry tools (like Snowflake) that come with standardization features than to build from scratch. This lets companies focus a technical team’s expertise on innovation that drives business value rather than on the basics of standardization. Everyone will be happier for it.
