
Kenny Workman
@kenbwork • 7,107 subscribers
CTO @latchbio, data infrastructure for biology
Videos

Agents are finally starting to work in biology. We've partnered with Anthropic and major biotech vendors (Vizgen, AtlasXOmics, Takara, 10x Genomics) to build a tool that lets scientists steer their own analysis with natural language, from raw spatial data to publication-quality figures. Our team believes this will soon be the standard way biologists interact with data.

Spatial biology agents look a bit different from coding products. They:
- are tailored to the molecular details of each kit type
- run in sandboxes on very large machines
- orchestrate data infrastructure (e.g., bioinformatics workflows) through tool calls
- build graphical analysis notebooks to communicate results

A detailed breakdown of engineering decisions, product philosophy, and concrete flows follows:
Kenny Workman • 113,915 views • 7 months ago

I've had many engineers ask me, in response to this post, why it's worth their time and effort to learn biology. Why should they be excited? We are poised for a revolution in biotech that will be uniquely enabled by computers. Convince yourself by digging into the examples I link:

- The tooling is getting better. Assays can measure a broad array of molecules at falling cost and increasing throughput. Look to ScaleBio, Curio Bioscience, and AtlasXOmics for inspiration. We are sequencing millions of single cells and building spatial maps of the molecular state of tumors. After two generations of "next-generation sequencing" and stagnating DNA read and write costs under Illumina's monopoly, this new wave of assays will have a profound impact on the iteration speed and scale of experimentation. The impact of compounding improvements in core tooling over a long enough period needs little explanation.
- Biotechs are using data to guide decisions and are incorporating domain-informed machine learning as a core part of the molecular design process. This pairs naturally with better tooling as a source of abundant, cheap data. Look to Recursion, Manifold Bio, Dyno Tx, and Asimov. Biology-informed architectures are also cross-pollinating with the recent explosion of new machine learning techniques. These models are starting to do useful things, like generating functional gene-editing proteins and entire prokaryotic organisms. Look to ESM, AlphaFold3, and RFdiffusion.
- New classes of therapies, such as genetic medicines and engineered immune cells, are having real success in the clinic. One-dose cures for cardiovascular disease (Verve for ASCVD), vaccines for cancer (Moderna with mRNA-4157), in vivo gene-editing proteins (CRISPR Tx, Beam, Ensoma), and metabolic disease treatments (Novartis, Eli Lilly with GLP-1/GIP modulators) are being dosed in real people right now and transforming lives.
- The AI craze is commoditizing accelerated hardware and fast storage devices like NVMe drives, improving developer frameworks for writing code against these devices, and maturing the systems tooling for moving large volumes of data between computers for distributed training. One happy accident of this bubble will be the reuse of these components to build a new systems stack for large-scale processing of molecular data. This will be essential for constructing a single-cell atlas of 1 billion cells and beyond. (For reference, the state of the art is ScaleBio's 2M-cell kit, QuantumScale, which already pushes the limits of the hardware and software we have today.)
- Language models might be the perfect tool for distilling the unstructured corpus of public data, literature, and methods sitting on the Internet into real biological insights for scientists asking questions in natural language. They will also let scientists install, configure, and run the slew of useful but poorly maintained academic computational tools to explore and hypothesize new biology on their own. Increasing the productivity of each scientist will go a long way toward reversing Eroom's law.
- The market has an appetite for applications of biotech beyond drug development: bacteria-driven lithium mining (maverick), cell agriculture (growing cows in vats), early signs of consumer biologics (Geltor), and biofoundries (Ginkgo). My guess is that some of the greatest minds of our generation will want to do more than perturb the human body with therapies.

I think it is also important to recognize that the need for computers and software is a secular trend in the progression of biotech, independent of the interests of Silicon Valley. Clusters of computers, the information they store, and the software that runs on them are precisely the technology this field needs as it transforms into a discipline of information management, reducing living things to well-characterized building blocks we can rebuild in our image.

Software companies, as locally hyper-efficient structures in the arc of capitalism with established methods and well-trodden rails for attracting resources and talent and for distributing products to an entire market, are the perfect place to incubate and disseminate these tools. There will probably be many very large computer companies in biology in the next century.
Kenny Workman • 113,534 views • 1 year ago

Full footage from our systems x biology reading group with researchers from Arc Institute and FutureHouse.

2:33 LData: Building a distributed filesystem on Postgres and S3 (LatchBio)
26:10 BINSEQ: High-performance binary formats for DNA sequences (Noam Teyssier, Arc Institute)
49:00 Data Flywheels: Reinforcement learning algorithms for scientific AI (James Braza, FutureHouse)
1:08:07 Scaling Deep Learning to 1B+ Single Cells (Abhinav Adduri, Arc Institute)
1:36:30 Shreya Shekha (Greylock); Closing

Biology is still a greenfield space for systems work. As molecular datasets continue to scale, engineering challenges will emerge at every layer of the stack, e.g., file systems, storage, and ML infrastructure.
Kenny Workman • 27,547 views • 4 months ago

DNA and RNA are not enough. Epigenetic state is necessary for both drug development and basic biology. Understanding ATAC-seq data requires interactive, genome-scale operations. We present a suite of interactive, publication-quality Python tools for viewing chromatin accessibility.
Kenny Workman • 15,915 views • 1 year ago

Data is the oil of modern biotech. New tools allow us to model living systems too complex for unaided human cognition. Latch is releasing new capabilities in data curation and delivery:
- a 30M-cell observational atlas spanning 150 diseases, 200 tissues, and 27 technologies
- a partnership with two expert labeling teams
- an agentic, human-in-the-loop framework for mass scRNA-seq curation
Kenny Workman • 10,082 views • 11 months ago