Bo Wang

@BoWang87 • 30,352 subscribers

Prof @UofT | Building first Virtual Cell @Xaira_Thera | Chief AI Scientist @UHN | AI & Bio & Healthcare | Inventor of scGPT, MedSAM, BioReason | Opinions my own

Shorts

Can’t believe they actually shipped it 😂 A $20, bottlecap-sized 🦞 in your pocket 🔥

1,611,156 views

New Nature paper today: Sony's Ace robot beats 3 of 5 elite table tennis players. Loses to professionals. Human players win points with faster-than-average shots (p<0.001 between won vs returned). Ace wins with ordinary shots. Same speed and spin profile whether it wins or loses the rally (p=0.88). It's playing a completely different sport than the humans are. Trained entirely in simulation. Zero sim-to-real tricks beyond good physics modeling and asymmetric actor-critic (critic sees ground truth, actor sees noisy sensors). Best part: after watching a point, 1992 Olympian Kinjiro Nakamura said: "I didn't think it was possible. But the fact that it was possible... means that there is a possibility that a human could do it too." Code: Paper:
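The asymmetric actor-critic setup mentioned in the post can be sketched roughly as follows. This is a toy illustration under stated assumptions, not Sony's code: the state features, noise level, and the linear actor and critic are all hypothetical. The only thing it shows is the information split, where the critic reads the simulator's ground-truth state and the actor reads a noisy observation of it.

```python
import random


def noisy_observation(true_state, noise_std, rng):
    """Actor input: the true state corrupted by Gaussian sensor noise."""
    return [x + rng.gauss(0.0, noise_std) for x in true_state]


def actor(observation, weights):
    """Toy linear policy (hypothetical): acts only on what the sensors report."""
    return sum(w * o for w, o in zip(weights, observation))


def critic(true_state, action, weights):
    """Toy value estimate (hypothetical): uses privileged ground-truth state,
    which is only available inside the simulator during training."""
    return sum(w * s for w, s in zip(weights, true_state)) - action ** 2


rng = random.Random(0)
true_state = [0.5, -1.2, 3.0]  # hypothetical ball features from the simulator
obs = noisy_observation(true_state, noise_std=0.05, rng=rng)

action = actor(obs, weights=[0.1, 0.2, 0.3])                 # sees noisy sensors
value = critic(true_state, action, weights=[1.0, 0.0, 0.5])  # sees ground truth
```

At deployment only the actor is needed, so the privileged state never has to exist on the real robot.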

99,893 views

Few people realize how fast AI is already revolutionizing surgery! Medivis's SurgicalAR platform for assisting neurosurgeons during surgery just received FDA clearance in Dec 2025. We keep debating whether AI will replace doctors. Meanwhile surgeons are literally seeing through patients with AR navigation in real time. The right question was never replacement. It was always: what does a surgeon become when they have superhuman perception? We will share a lot more projects in AI & surgery soon from University Health Network! Stay tuned.

41,660 views

🚀 The Segment Anything Model (SAM) has been upgraded to SAM2, featuring an efficient image encoder for segmenting images and videos. But does SAM2 outperform SAM1 in medical image and video segmentation? We're thrilled to present our paper "Segment Anything in Medical Images and Videos: Benchmark and Deployment"! We comprehensively benchmark SAM2 across 11 medical image modalities and videos. 📄 Paper: 💻 Code: **Highlights:** 1. SAM2 doesn’t always outperform SAM1 in 2D medical images, but excels in video segmentation, making it more accurate and efficient for 3D images, such as CT and MR scans. 2. MedSAM still outperforms SAM2 on most 2D modalities, but SAM2 surpasses MedSAM for 3D image segmentation in a slice-by-slice approach. 3. Segmentation performance varies with model size; sometimes the smallest model outperforms larger ones. 4. Fine-tuning SAM2 significantly boosts its performance for medical image segmentation. While SAM2 may struggle with challenging objects that have unclear boundaries or low contrast, it excels in generating good initial segmentation masks for common medical images and videos. However, the official interface doesn’t support medical data formats and has limitations on video length. To address this, we've developed a 3D Slicer Plugin and Gradio API for efficient 3D medical image and video segmentation. We invite you to try them out and provide feedback! 🔧 Deployment: - 3D Slicer Plugin: - Gradio API: (Note: Due to GPU limitations, the online API is available for only 12 hours and may be slow. We highly recommend deploying the Gradio API with your own computing resources: A big shoutout to Jun Ma (JunMa) who recently joined our UHN AI hub (UHN AI Hub) as Machine Learning Lead, and kudos to all co-authors: Sumin Kim, Feifei Li, Mohammed Baharoon (Mohammed Baharoon), Reza Asakereh, and Hongwei Lyu! This is true teamwork! Looking forward to collaborating with the community to advance 3D medical image and video segmentation foundation models! 
University Health Network U of T Department of Computer Science Department of Laboratory Medicine & Pathobiology Temerty Centre for AI in Medicine (T-CAIREM) Vector Institute #MedTech #AIinHealthcare #DeepLearning #MedicalImaging #SAM2 #MedSAM #AIResearch

178,419 views

New Cell paper from Bergles lab at Johns Hopkins just built the most comprehensive map of brain myelin ever made — every oligodendrocyte, across the entire mouse brain, across the lifespan. The scale: >10 million cells per brain, terabyte-scale 3D lightsheet volumes, registered to the Allen Brain Atlas across 417 regions from 2 months to 2+ years of age. The technical stack: Custom tissue clearing (CUBIC-L + SHIELD + uRIMS with 40% urea) to preserve endogenous fluorescence. 3D Mask R-CNN for instance segmentation — not just semantic, instance — so it can distinguish individual cells within dense clusters at scale via overlapping sliding windows. Vision Transformer to classify newly-formed vs. mature oligodendrocytes using soma morphology. All cross-referenced against Allen ISH transcriptomics and MICrONS serial EM. What they found: Oligodendrocyte density varies 10,000-fold across brain regions. Left-right hemispheres: r=0.99. Sex: no significant difference. Strain: matters. The brain never stops myelinating. New oligodendrocytes are still being generated in 2-year-old mice. Prefrontal cortex L6 shows the fastest rates of new myelination into old age — the circuits for executive function keep rewiring throughout life. After demyelination, L4 sensory cortex is the most resilient — oligodendrocytes survive at higher rates. The hippocampus loses nearly everything and barely recovers. Degree of injury doesn't predict rate of recovery. These are independent axes. The Alzheimer's result is the most surprising: Dense-core plaques dominate in cortex and hippocampus. Diffuse/small-core plaques dominate in white matter fiber tracts. Old assumption: diffuse plaques are "less toxic." The data says the opposite — small plaques in fiber tracts cause more myelin loss per plaque than dense-core plaques in gray matter. Plaque load and oligodendrocyte loss are essentially uncorrelated (ρ=0.22). The damage is plaque-type and location specific, not load-dependent. 
For MS and AD research: you can't read off white matter injury from gray matter plaque burden. The pathology in fiber tracts is running on different rules. Data: Paper:

24,611 views

🧬 We have many foundation models or language models for DNA, but can we control them? We introduce Ctrl-DNA: Controllable Cell-Type-Specific Regulatory DNA Design via Constrained RL, a reinforcement learning framework for controllable cis-regulatory sequence generation. Paper: Code: 🔬 What’s the challenge? Designing regulatory DNA that is both highly active in target cell types and inactive in others is essential for synthetic biology, gene therapy, and precision medicine. Yet controlling these trade-offs is challenging due to sparse, sequence-level rewards and biological constraints. 🔥 Why Ctrl-DNA? Ctrl-DNA fine-tunes pre-trained DNA language models using a value-model-free, Lagrangian-guided RL framework, enabling flexible and customizable constraint optimization. Users can define application-specific thresholds across cell types, balancing expression strength with specificity. ✅ Maximize target-cell expression ✅ Constrain off-target activity under user-defined thresholds ✅ Preserve cell-type-specific TF motif structure Benchmarked on human enhancer and promoter datasets, Ctrl-DNA consistently outperforms prior methods, achieving stronger specificity, higher fitness, and more biologically grounded sequence generation, all with direct control over regulatory trade-offs. Shoutout to the PhD students Xingyu Chen and Rex Ma for their amazing work leading this project!
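To make "Lagrangian-guided" constraint optimization concrete, here is a minimal sketch of a generic Lagrangian-relaxed reward with dual-ascent multiplier updates. It is an assumption-laden illustration, not the Ctrl-DNA implementation: the function names, the learning rate, and the example expression values are all hypothetical, and the paper's exact formulation may differ.

```python
def lagrangian_reward(target_expr, off_target_exprs, thresholds, lambdas):
    """Reward = target-cell expression minus weighted constraint violations.

    Each off-target cell type i has a user-defined threshold; the term
    (off_i - threshold_i) is positive only when the constraint is violated.
    """
    penalty = sum(
        lam * (off - thr)
        for lam, off, thr in zip(lambdas, off_target_exprs, thresholds)
    )
    return target_expr - penalty


def update_multipliers(lambdas, off_target_exprs, thresholds, lr=0.1):
    """Dual ascent: raise a multiplier while its constraint is violated,
    lower it otherwise, and never let it go below zero."""
    return [
        max(0.0, lam + lr * (off - thr))
        for lam, off, thr in zip(lambdas, off_target_exprs, thresholds)
    ]


# Hypothetical predicted activities for one generated sequence.
target = 0.9
off_targets = [0.6, 0.2]   # first off-target cell type exceeds its threshold
thresholds = [0.3, 0.3]

reward = lagrangian_reward(target, off_targets, thresholds, lambdas=[1.0, 1.0])
new_lambdas = update_multipliers([0.0, 0.0], off_targets, thresholds)
```

The multiplier for the violated constraint grows over training, so the policy is pushed harder on exactly the cell types where off-target activity is still too high.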

30,719 views

Videos

A new Nature paper from Johns Hopkins (by Prof. Lin Dingchang Lin ) just solved one of the hardest problems in biology: how do you record what every cell in a tissue experienced over time, not just what it looks like right now? The answer: GEMINI — Granularly Expanding Memory for Intracellular Narrative Integration. It works exactly like tree rings. Cells are genetically engineered to express a computationally designed protein assembly. As the assembly grows inside the cell, it captures cellular activity as fluorescent ring patterns — each ring a timestamp, each ring's properties encoding signal intensity. Look at a cross-section under a microscope and you can read the cell's history backward, with ~15-minute resolution. The key: cells build the recorder themselves. GEMINI doesn't interfere with normal function — it just quietly writes. What they demonstrated: In a full tumor xenograft, GEMINI captured every cancer cell's activity history across the entire tumor while it continued to grow normally. For the first time, researchers can look back and see how different regions of the same tumor responded differently to therapy over time — not snapshots, but film. In a mouse brain, GEMINI recorded neural activity dynamics without disrupting behavior, coordination, or memory. It could temporally resolve the history of a brain seizure. Why this matters: Every tool we have in biology gives you state — what the cell looks like now. Sequencing, imaging, proteomics — all snapshots. GEMINI gives you trajectory. It's the difference between a photograph and a video, applied to every cell in an organ simultaneously. The team is explicit that AI-based decoding tools will be central to reading GEMINI's output at whole-brain scale. This is the data layer that makes temporal single-cell atlases possible. Paper: Congratulations Dingchang Lin

Bo Wang

84,974 views • 3 months ago

Nature figured out distributed systems millions of years before we did. Meet the giant honeybee (Apis dorsata). No hive box, no protection—just thousands of bees exposed on a cliff face or tree branch. Their defense? A biological Mexican wave that makes predators freeze in confusion. This is shimmering. And the science behind it is wild. The Visual Picture a dark sheet of bees covering an open comb. Suddenly, a ripple of light flashes across the surface—hundreds of bees flipping their abdomens upward in perfect coordination, creating a wave that propagates in under a second. To a wasp or bird approaching for a meal, it's disorienting. The nest surface seems alive, unpredictable, dangerous. How It Actually Works Three distinct "agent types" coordinate this defense: 1. Bucket-Bridging Agents (75% of participants) The foot soldiers. These bees pass the signal neighbor-to-neighbor like a bucket brigade at a fire. They receive the cue from an adjacent bee, flip their abdomen, and pass it on. Velocity: ~0.32 m/s. Linear, reliable, slow. 2. Chain-Tail Agents (9%) The end of the line. These bees get activated but don't propagate the signal further. They're the wave's trailing edge. 3. Generator Agents (16%) Here's where it gets interesting. These bees flip their abdomens before the main wave reaches them. They create "daughter waves" that merge with the parental wave, accelerating the whole process by 41.5% to ~0.51 m/s. Without generators, shimmering would be too slow to matter. With them, the colony responds in real-time to a wasp's flight path. The "Special Agents" Hypothesis Early researchers assumed the bees closest to a predator would trigger the wave. Makes sense, right? Wrong. Experiments with tethered wasps revealed something stranger: shimmering starts at specific "trigger centers" clustered around the nest's mouth zone—where foragers enter and exit. These aren't random bees. They're specialized. 
The position of trigger cohorts doesn't match the predator's location. Instead, bees in these zones are primed to respond faster, possibly through age or experience. Think of them as sentinels—stationed strategically, not reactively. The Visual Trigger System Shimmering isn't automatic. Bees are selective about when to deploy it: • Contrast matters: Dark objects against bright backgrounds (like a hornet silhouetted against sky) trigger strong responses. Reverse the contrast—light object on dark—and nothing happens. • Size threshold: Objects smaller than ~4cm don't trigger shimmering. Below a certain visual angle (1.6–3.4 degrees), the threat isn't worth the energy. • Light dependence: Shimmering peaks in bright daylight. At dawn/dusk, the colony switches to other defenses. The visual system needs illumination to work. Why This Is Brilliant Shimmering solves multiple problems simultaneously: 1. Predator deterrence: Wasps see the wave and abort approach. The movement is unpredictable, hard to track, signals a coordinated colony. 2. Internal alarm: The wave propagates mechanoreceptive cues and Nasonov pheromone through the nest, alerting bees to prepare for escalation—mass stinging if the predator persists. 3. Energy efficiency: Not every threat triggers full defense. The visual filtering (size, contrast, light) prevents false alarms. 4. Speed through parallelism: Generator agents create saltatory (jumping) propagation that outpaces simple neighbor-to-neighbor transfer. The colony literally shortcuts information flow.

Bo Wang

72,063 views • 3 months ago

Welcome to the Lab of the Future! 🧬🤖 Excited to share LUMI-lab, out today in Cell — a self-driving platform that pairs an AI foundation model with a robotic lab to autonomously discover ionizable lipids (LNPs) for mRNA delivery. The core problem: Designing lipid nanoparticles (LNPs) is hard. The chemical space of ionizable lipids is vast, experimental cycles are slow, and — critically — historical LNP datasets are far too small to train a predictive model from scratch. Most AI approaches in this space hit a wall immediately: not enough data to learn from. Our solution: lab-in-the-loop foundation model learning. Instead of training on LNP data alone, LUMI starts as a transformer-based foundation model pretrained across broad chemical space, building rich molecular representations before it ever sees a single LNP experiment. Then it enters a closed loop with a robotic synthesis platform: predict → synthesize → assay → update. Each round of real wet-lab experiments fine-tunes the model, which then proposes smarter candidates for the next round. The lab isn't just validating AI predictions — it's actively teaching the model, continuously. What happened when we let it run: LUMI-lab autonomously synthesized and screened 1,700+ ionizable lipids in human bronchial epithelial cells. The top candidate — LUMI-6 — features a brominated lipid tail, a structural motif that had been largely overlooked in LNP design. LUMI found it without being told where to look. When formulated into LNPs and delivered intratracheally to mice, LUMI-6 achieved 20.3% gene editing efficiency in lung epithelial cells — a compelling result for one of the hardest-to-reach therapeutic targets, directly relevant to diseases like cystic fibrosis and alpha-1 antitrypsin deficiency. 
Why this matters beyond LNPs: This is a proof of concept for a broader thesis: that foundation model pretraining + active learning + robotic experimentation can overcome the data scarcity bottleneck that plagues AI-driven discovery in biology. You don't need a massive domain-specific dataset to start. You need a model that can generalize, a lab that can generate the right data, and a loop that connects them. Huge congratulations to first authors Yue Xu, Haotian Cui, and Kuan Pang, and to the entire Bowen Li team. Grateful to our collaborators at University Health Network and Leslie Dan Faculty of Pharmacy, and to Princess Margaret Cancer Centre Research. 📄 Paper:
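The predict → synthesize → assay → update loop described in this post can be sketched as a toy active-learning routine. Everything here is a stand-in chosen for illustration: candidates are plain integers, `assay` is a made-up scoring function in place of a wet-lab measurement, and the "model" is a nearest-neighbour lookup, not the LUMI foundation model.

```python
def assay(candidate):
    """Stand-in for the wet-lab measurement (hypothetical ground truth)."""
    return -(candidate - 7) ** 2


def predict(candidate, observed):
    """Toy surrogate model: reuse the score of the nearest measured candidate."""
    if not observed:
        return 0.0
    nearest = min(observed, key=lambda c: abs(c - candidate))
    return observed[nearest]


def closed_loop(pool, rounds, batch_size):
    observed = {}  # candidate -> measured score; this is the model's "training data"
    for _ in range(rounds):
        untested = [c for c in pool if c not in observed]
        if not untested:
            break
        # predict: rank untested candidates with the current model
        ranked = sorted(untested, key=lambda c: predict(c, observed), reverse=True)
        # synthesize + assay: run the top of the ranking through the "lab"
        for candidate in ranked[:batch_size]:
            observed[candidate] = assay(candidate)
        # update: the new measurements are already in `observed`, so the next
        # round's predictions reflect them
    best = max(observed, key=observed.get)
    return best, observed


best, observed = closed_loop(pool=list(range(10)), rounds=4, batch_size=2)
```

In a real system the update step would fine-tune the pretrained model on the new assay results; here it is reduced to growing the lookup table so the loop structure stays visible.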

Bo Wang

57,280 views • 3 months ago

🚀 The Segment Anything Model (SAM) has been upgraded to SAM2, featuring an efficient image encoder for segmenting images and videos. But does SAM2 outperform SAM1 in medical image and video segmentation? We're thrilled to present our paper "Segment Anything in Medical Images and Videos: Benchmark and Deployment"! We comprehensively benchmark SAM2 across 11 medical image modalities and videos. 📄 Paper: 💻 Code: **Highlights:** 1. SAM2 doesn’t always outperform SAM1 in 2D medical images, but excels in video segmentation, making it more accurate and efficient for 3D images, such as CT and MR scans. 2. MedSAM still outperforms SAM2 on most 2D modalities, but SAM2 surpasses MedSAM for 3D image segmentation in a slice-by-slice approach. 3. Segmentation performance varies with model size; sometimes the smallest model outperforms larger ones. 4. Fine-tuning SAM2 significantly boosts its performance for medical image segmentation. While SAM2 may struggle with challenging objects that have unclear boundaries or low contrast, it excels in generating good initial segmentation masks for common medical images and videos. However, the official interface doesn’t support medical data formats and has limitations on video length. To address this, we've developed a 3D Slicer Plugin and Gradio API for efficient 3D medical image and video segmentation. We invite you to try them out and provide feedback! 🔧 Deployment: - 3D Slicer Plugin: - Gradio API: (Note: Due to GPU limitations, the online API is available for only 12 hours and may be slow. We highly recommend deploying the Gradio API with your own computing resources: A big shoutout to Jun Ma (JunMa) who recently joined our UHN AI hub (UHN AI Hub) as Machine Learning Lead, and kudos to all co-authors: Sumin Kim, Feifei Li, Mohammed Baharoon (Mohammed Baharoon), Reza Asakereh, and Hongwei Lyu! This is true teamwork! Looking forward to collaborating with the community to advance 3D medical image and video segmentation foundation models! 
University Health Network U of T Department of Computer Science Department of Laboratory Medicine & Pathobiology Temerty Centre for AI in Medicine (T-CAIREM) Vector Institute #MedTech #AIinHealthcare #DeepLearning #MedicalImaging #SAM2 #MedSAM #AIResearch

Bo Wang

178,419 views • 1 year ago

🔬 Exciting News! Our manuscript, "scGPT: toward building a foundation model for single-cell multi-omics using generative AI" is now finally published in Nature Methods 🎉 !!! (Re-)Introducing scGPT: A transformative foundation model engineered for single-cell omics analysis. Developed through the analysis of over 33 million human cells, scGPT sets a new benchmark for application versatility, offering both fine-tuning and zero-shot capabilities. Since its preprint in May 2023, scGPT has significantly impacted the field, evidenced by 13K+ installations, 600+ GitHub stars 🌟, and 40+ citations before its official publication! scGPT has been validated by numerous benchmark studies as a leading foundation model in single-cell analysis. Its pre-trained embeddings extend its utility beyond single-cell studies, enhancing a variety of downstream tasks including protein enrichment and genetic perturbation predictions.
Some key updates lately:
--- Expanded zero-shot applications for efficient reference mapping and integration, now with CellXGene census integration.
--- Advanced perturbation analysis capabilities, including genome-scale perturb-seq data analysis and bulk sequencing data generalization.
--- Upgraded scGPT package, offering versatile model loading compatible with PyTorch and flash-attn, for both GPU and CPU.
--- Cloud-based scGPT applications for reference mapping, cell annotation, and gene regulatory network inference are available on
--- Integration with Hugging Face for easier model training.
Limitations: scGPT is an early foray into foundation models for single-cell omics, facing challenges like limited zero-shot learning in some tasks, pretraining constraints, data quality issues, and evaluation limitations. See our Supplementary Notes for details.
🚀 Future Work?
Short-Term Goals: 1. Releasing a Mouse Model for broader analysis. 2. Developing a comprehensive evaluation suite for foundation models in single-cell analysis. 3. Creating a foundation model for single-cell spatial omics. 4. Enhancing zero-shot capacity by integrating scGPT with RAG (e.g., knowledge graphs).
Long-Term Goals: 1. Expanding scGPT for comprehensive single-cell multi-omics analysis. 2. Developing an in-silico perturbation model for predicting genetic perturbation effects. 3. Merging scGPT with multi-modal genomic sequence models for a deeper understanding of cell biology.
📚 Access the paper on Nature Methods: 🔬 Preprint on bioRxiv: 💻 All our code/data/weights are open source: Wholehearted congratulations to all the authors, especially the two co-first authors, Haotian (Haotian Cui) and Chloe (ChloeXWang), who are really the emerging superstars in AI and biology! Vector Institute Peter Munk Cardiac Centre AI U of T Department of Computer Science Department of Laboratory Medicine & Pathobiology University Health Network University of Toronto #scGPT #GenerativeAI #AI4Science #Combio #opensource

Bo Wang

199,592 views • 2 years ago

🚀 We're thrilled to introduce Orthrus 🧬🐕—a groundbreaking mature RNA foundation model designed to push the boundaries of RNA property prediction! 🔬 What is Orthrus? Orthrus is a Mamba-based RNA foundation model, pre-trained using a novel self-supervised contrastive learning objective with biologically inspired augmentations. It optimizes the similarity between splicing isoforms and orthologous transcripts, capturing functional and evolutionary relationships to enhance mature RNA property prediction accuracy. 📑 Preprint: 💻 Code: 🌐 Project Page: 📦 Model Weights: 🧠 Why Orthrus? Decoding the RNA regulatory code is key to understanding biology, but traditional experimental approaches are slow and costly. Existing genomic foundation models rely on techniques like masked language modeling or next-token prediction, which aren't fully aligned with the complexities of genomic data—leading to suboptimal results. 🌟 Orthrus Highlights: - Biologically-Informed Contrastive Learning 🧪: A novel contrastive learning objective designed specifically for genomics, maximizing similarity between splicing isoforms and orthologous transcripts across species. - Extensive Pre-training 📊: Trained on splicing annotations from 10 species and orthologous alignments from 400+ mammalian species (Zoonomia Project), with a focus on sequences of high functional importance. - Superior Representations🏅: Orthrus outperforms existing genomic models on 5 mRNA property prediction tasks, often surpassing supervised methods with just a simple linear transformation. - Efficiency in Low-Data Settings📉: Orthrus excels in low-data regimes, achieving state-of-the-art results with as few as 45 labeled examples for fine-tuning on RNA half-life prediction. Shoutout to the amazing leading authors Phil (Phil Fradkin) and Ian (Ian Shi)! Also the work is impossible without an outstanding collaboration by Karina (Karin(a) Isaev), Brendan (Brendan Frey) , Quaid (Quaid Morris), Leo J. Lee! 
Vector Institute University Health Network U of T Department of Computer Science Temerty Centre for AI in Medicine (T-CAIREM) Department of Laboratory Medicine & Pathobiology
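The contrastive idea in the Orthrus post, pulling splicing isoforms and orthologous transcripts together in embedding space, can be illustrated with a generic InfoNCE-style loss. This is a sketch of the general technique only: the embeddings, the temperature, and the choice of InfoNCE are assumptions for illustration, and the objective actually used in Orthrus may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss: -log softmax of the positive pair's similarity.

    `positive` would be the embedding of a related transcript (for example an
    isoform or ortholog of the anchor); `negatives` are unrelated transcripts.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    top = max(logits)  # subtract the max for numerical stability
    log_denominator = top + math.log(sum(math.exp(l - top) for l in logits))
    return log_denominator - logits[0]


# Hypothetical 2-D embeddings, for illustration only.
anchor = [1.0, 0.0]
related = [0.9, 0.1]
unrelated = [[0.0, 1.0], [-1.0, 0.0]]

loss_when_related_is_positive = info_nce(anchor, related, unrelated)
loss_when_unrelated_is_positive = info_nce(anchor, unrelated[0], [related, unrelated[1]])
```

The loss is small when the related transcript is the closest embedding to the anchor and large when it is not, which is the pressure that shapes the representation during pre-training.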

Bo Wang

114,621 views • 1 year ago

🎉 The best way to start the week is to find out that our MedSAM is finally published today in Nature Communications! **Segment anything in medical images** Paper: arXiv: Data & Code: MedSAM is the first promptable foundation model for medical image segmentation. **Highlights**: ⭐ Before its formal publication, we have received 220 citations and 1400+ GitHub stars 🙏🙏❤️‍🔥❤️‍🔥❤️‍🔥 📊 We curated a large-scale medical image dataset with 1,570,263 image-mask pairs, covering 10 imaging modalities and over 30 cancer types. 🚀 Built on top of SAM (AI at Meta) with transfer learning, we have significantly enhanced its segmentation performance on medical images. 📈 Comprehensive evaluations on 86 internal validation tasks and 60 external validation tasks demonstrate better accuracy and robustness than modality-wise specialist models. **What is Next? --- Clinical Translation!!** 🍕 Our next goal is to make the model deployable on laptops (CPUs) or other edge devices without reliance on GPUs. We have distilled a lightweight model, LiteMedSAM, offering a speed boost of 10x while maintaining accuracy. Plus, we have integrated it into the 3D Slicer plugin, providing an efficient tool for medical image segmentation. 🌐 To further promote developments in this field, we organize a competition on #CVPR2026: Segment Anything in Medical Images on Laptop! An out-of-the-box baseline has been released to reduce the entry barriers. Welcome to join us to push the boundary further: 🙏 Massive thanks to AI at Meta for their open-source project SAM and many reviewers/users for their invaluable feedback. A huge shoutout to my postdoc Jun Ma (JunMa) for his leadership on this project!! UHN AI Hub Vector Institute Peter Munk Cardiac Centre AI Department of Laboratory Medicine & Pathobiology U of T Department of Computer Science University of Toronto University Health Network Brad Wouters 🇨🇦 Barry Rubin MD, PhD, FRCSC Shaf Keshavjee

Bo Wang

140,171 views • 2 years ago

🚀 Introducing scGPT-spatial! 🧬🌍 A game-changing spatial-omic foundation model, built on the powerful scGPT framework with MoE (mixture of experts) and continually pretrained on a massive 30 million spatial single-cell profiles!
🧠 What’s the challenge? Spatial transcriptomics is next-level complex: not only must we model single-cell/spot profiles, but we also need to capture intricate spatial relationships while handling diverse protocols (imaging-based vs. sequencing-based).
🔥 Why scGPT-spatial?
✨ A Spatial-omic Foundation Model with Continual Pretraining: built on scGPT’s robust initialization, it unlocks spatial context in tissues.
✨ SpatialHuman30M Dataset: the largest curated dataset, with 30M profiles from Visium, Visium HD, Xenium, and MERFISH across 821 slides.
✨ Revolutionary MoE Decoders: a cutting-edge Mixture of Experts (MoE) architecture for protocol-aware gene expression decoding.
✨ Spatially-Aware Training Strategy: a neighborhood-based masked reconstruction approach to capture complex cell-type colocalization.
✨ Multi-Modal & Multi-Slide Integration: seamless clustering and spatial domain identification across slides and modalities.
✨ Cell-Type Deconvolution & Gene Imputation: unlocks cross-resolution and cross-modality harmonization with fine-tuned embeddings.
📄 Read the preprint: 💻 Explore the code/weights:
#SpatialTranscriptomics #SingleCell #AIResearch #MachineLearning #SpatialData
Huge shoutout to the incredible PhD students Chloe (ChloeXWang) and Haotian (Haotian Cui) for leading this groundbreaking project! 🎉 Massive thanks to our amazing co-authors Andrew, Ronald, and Hani (Hani Goodarzi) from Arc Institute. This work wouldn't have been possible without you! 👏
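
For readers new to mixture-of-experts decoding, here is a minimal, generic sketch of top-k expert routing in plain Python. It is not scGPT-spatial's actual architecture: the toy experts, the gate scores, and the names `top_k_route` and `moe_decode` are all hypothetical, chosen only to show how a gate can weight a few experts and combine their outputs.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(gate_scores, k):
    """Pick the k highest-scoring experts and renormalise their weights to sum to 1."""
    probs = softmax(gate_scores)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in chosen)
    return {i: probs[i] / norm for i in chosen}


def moe_decode(x, experts, gate_scores, k=2):
    """Combine the outputs of the selected experts, weighted by the gate."""
    weights = top_k_route(gate_scores, k)
    return sum(w * experts[i](x) for i, w in weights.items())


# Hypothetical toy experts standing in for per-protocol decoders (assumption).
experts = [lambda x: 2.0 * x, lambda x: x + 1.0, lambda x: -x]
# Hypothetical gate scores; a real model would compute these from the input.
gate_scores = [2.0, 1.0, -1.0]

weights = top_k_route(gate_scores, k=2)
output = moe_decode(3.0, experts, gate_scores, k=2)
print(weights, output)
```

Only the two selected experts run for this input; the third is skipped entirely, which is what keeps MoE inference cheaper than running every expert.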

Bo Wang

58,976 views • 1 year ago

Alibaba just dropped Qwen3.5-397B-A17B and there's a lot to unpack. 397B params, 17B active per forward pass. Sparse MoE done right. But the real story isn't the size; it's the architecture choices.
The MoE Design
Most MoE models feel like bolt-ons. Qwen 3.5's sparse activation is native: only 4.3% of parameters fire per token. That's how you get trillion-parameter-class performance without trillion-parameter inference costs. The 0.8 RMB/million tokens pricing isn't subsidized; it's structurally earned.
Native Multimodal, Not Glued-On
This is a vision-language model from the ground up. Heterogeneous architecture: separate processing pipelines for text, image, and video that fuse early. Not a vision encoder slapped onto an LLM. The result: 90.8 on OmniDocBench, 79.0 on MMMU-Pro. Document understanding and visual reasoning without the usual brittleness.
The Context Window Reality
Qwen3.5-Plus (the hosted version) ships with 1M tokens by default. That's not a marketing number; they're actually positioning it for long-document workflows. With built-in adaptive tool use, it's clearly aimed at agentic automation, not just chat.
What Actually Impressed Me
• FP8 native pipeline: ~50% activation memory reduction
• Async RL framework for continuous refinement, with training and inference workloads separated
• 201 languages (up from 119), 250k vocab for better low-resource encoding
• Apache 2.0 license. Full weights on HuggingFace and ModelScope.
The Benchmark Context
76.4 on SWE-bench Verified puts it in the range where it can handle real debugging workflows. 72.9 on BFCL v4 for agentic tool use. 88.4 on GPQA Diamond. These aren't SOTA in isolation, but the breadth is unusual: strong across reasoning, coding, multimodal, and agentic tasks.
The Honest Caveat
I haven't stress-tested the 1M context for needle-in-haystack retrieval yet. And "native multimodal" claims need real-world torture testing: PDFs with tables, charts, mixed layouts. Benchmarks are benchmarks.
Bottom Line
This isn't just another model release. It's a bet on efficient scale: big model capabilities, small active compute, open weights. At 1/18th the cost of Gemini 3 Pro, it's going to force pricing conversations across the board.
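
The 4.3% figure follows directly from the two parameter counts quoted in the post. A quick check in Python, using only those numbers (the constant and function names are illustrative, not from any Qwen tooling):

```python
# Figures quoted in the post, in billions of parameters.
TOTAL_PARAMS_B = 397
ACTIVE_PARAMS_B = 17


def active_fraction(active: float, total: float) -> float:
    """Return the share of parameters used per token, as a fraction."""
    if total <= 0:
        raise ValueError("total must be positive")
    return active / total


fraction = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
print(f"Active share per token: {fraction:.1%}")  # prints "Active share per token: 4.3%"
```

This covers parameter count only; actual inference cost also depends on attention, routing overhead, and memory bandwidth, which the ratio does not capture.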

Bo Wang

13,221 views • 3 months ago