NeuRBF: A Neural Fields Representation with Adaptive Radial Basis Functions paper page: We present a novel type of neural fields that uses general radial bases for signal representation. State-of-the-art neural fields typically rely on grid-based representations for storing local neural features and N-dimensional linear kernels for interpolating features at continuous query points. The spatial positions of their neural features are fixed on grid nodes and cannot well adapt to target signals. Our method instead builds upon general radial bases with flexible kernel position and shape, which have higher spatial adaptivity and can more closely fit target signals. To further improve the channel-wise capacity of radial basis functions, we propose to compose them with multi-frequency sinusoid functions. This technique extends a radial basis to multiple Fourier radial bases of different frequency bands without requiring extra parameters, facilitating the representation of details. Moreover, by marrying adaptive radial bases with grid-based ones, our hybrid combination inherits both adaptivity and interpolation smoothness. We carefully designed weighting schemes to let radial bases adapt to different types of signals effectively. Our experiments on 2D image and 3D signed distance field representation demonstrate the higher accuracy and compactness of our method than prior arts. When applied to neural radiance field reconstruction, our method achieves state-of-the-art rendering quality, with small model size and comparable training speed.
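To make the idea of composing radial bases with multi-frequency sinusoids a little more concrete, here is a minimal NumPy sketch. It is not the paper's formulation: the Gaussian kernel, the function names `gaussian_rbf` and `fourier_bands`, the frequency values, and the per-band `weights` are all assumptions chosen for illustration. It only shows how one radial basis value can be expanded into several frequency bands using fixed sinusoids, so the expansion itself adds no learnable parameters.

```python
import numpy as np

def gaussian_rbf(x, centers, widths):
    """Gaussian radial basis values; x is (N, D), centers is (K, D), widths is (K,)."""
    sq_dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)  # (N, K)
    return np.exp(-sq_dist / (2.0 * widths[None, :] ** 2))

def fourier_bands(phi, freqs):
    """Compose each basis value with fixed sinusoids: (N, K) -> (N, K, M)."""
    return np.sin(phi[..., None] * freqs)

rng = np.random.default_rng(0)
centers = rng.uniform(0.0, 1.0, size=(8, 2))  # kernel positions (would be learned)
widths = np.full(8, 0.2)                      # kernel shapes (would be learned)
freqs = np.array([1.0, 2.0, 4.0])             # fixed frequency bands (illustrative values)
weights = rng.normal(size=(8, 3))             # hypothetical weight per kernel and band

x = rng.uniform(0.0, 1.0, size=(5, 2))        # continuous query points
phi = gaussian_rbf(x, centers, widths)        # (5, 8) basis responses
signal = (fourier_bands(phi, freqs) * weights).sum(axis=(1, 2))  # (5,) toy output
```

Unlike a grid, the `centers` and `widths` arrays here are free to move and reshape, which is the adaptivity the abstract refers to.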

AK

510,462 subscribers

194,469 views • 2 years ago • via X (Twitter)

Science & Technology · News & Politics · Education

Related videos

Human Hair Reconstruction with Strand-Aligned 3D Gaussians Contributions (cited): – We propose a new 3D line lifting scheme that uses a modified 3DGS reconstruction technique to lift 2D orientation maps into a 3D field while also providing refinement of the camera parameters; – We introduce a dual representation of hair strand polylines and 3D Gaussians to achieve differentiable rasterization of hair strands and leverage photometric constraints for strand-based hair reconstruction; – Based on these components, we propose a coarse-to-fine optimization method for prior-guided hair reconstruction that leverages both latent and explicit representations of the hairstyle.

MrNeRF

106,525 views • 1 year ago

3D Gaussian Splatting for Real-Time Radiance Field Rendering paper page: Radiance Field methods have recently revolutionized novel-view synthesis of scenes captured with multiple photos or videos. However, achieving high visual quality still requires neural networks that are costly to train and render, while recent faster methods inevitably trade off speed for quality. For unbounded and complete scenes (rather than isolated objects) and 1080p resolution rendering, no current method can achieve real-time display rates. We introduce three key elements that allow us to achieve state-of-the-art visual quality while maintaining competitive training times and importantly allow high-quality real-time (>= 30 fps) novel-view synthesis at 1080p resolution. First, starting from sparse points produced during camera calibration, we represent the scene with 3D Gaussians that preserve desirable properties of continuous volumetric radiance fields for scene optimization while avoiding unnecessary computation in empty space; Second, we perform interleaved optimization/density control of the 3D Gaussians, notably optimizing anisotropic covariance to achieve an accurate representation of the scene; Third, we develop a fast visibility-aware rendering algorithm that supports anisotropic splatting and both accelerates training and allows realtime rendering. We demonstrate state-of-the-art visual quality and real-time rendering on several established datasets.

AK

633,532 views • 3 years ago

Self-Calibrating Gaussian Splatting for Large Field of View Reconstruction Note: Check below for full video. Abstract (cited): "In this paper, we present a self-calibrating framework that jointly optimizes camera parameters, lens distortion, and 3D Gaussian representations, enabling accurate and efficient scene reconstruction. Our technique is particularly effective for high-quality scene reconstruction from large field-of-view (FOV) imagery taken with wide-angle lenses, allowing the scene to be modeled from a smaller number of images. We introduce a novel method for modeling complex lens distortions using a hybrid network that combines invertible residual networks with explicit grids. This design effectively regularizes the optimization process, achieving greater accuracy than conventional camera models. Additionally, we propose a cubemap-based resampling strategy to support large FOV images without sacrificing resolution or introducing distortion artifacts. Our method is compatible with the fast rasterization of Gaussian Splatting, adaptable to a wide variety of camera lens distortions, and demonstrates state-of-the-art performance on both synthetic and real-world datasets."

MrNeRF

17,206 views • 1 year ago

Wonderland: Navigating 3D Scenes from a Single Image Contributions: • First, we introduce a representation for controllable 3D generation by leveraging the generative priors from camera-guided video diffusion models. Unlike image models, video diffusion models are trained on extensive video datasets. This enables them to capture comprehensive spatial relationships within scenes across multiple views and embed a form of "3D awareness" in their latent space, which allows us to maintain 3D consistency in novel view synthesis. • Second, to achieve controllable novel view generation, we empower video models with precise control over specified camera motions. We introduce a novel dual-branch conditioning mechanism that effectively incorporates desired diverse camera trajectories into the video diffusion model. This enables expansion of a single image into a multi-view consistent capture of a 3D scene with precise pose control. • Third, to achieve efficient 3D reconstruction, we directly transform video latents into 3DGS. We propose a novel latent-based large reconstruction model (LaLRM) that lifts video latents to 3D in a feed-forward manner. With this design, during inference, our model directly predicts 3DGS from a single input image, effectively aligning the generation and reconstruction tasks—and bridging image space and 3D space—through the video latent space. Compared with reconstructing scenes from images, the video latent space offers a 256× spatial-temporal reduction while retaining essential and consistent 3D structural details. Such a high degree of compression is crucial, as it allows the LaLRM to handle a wider range of 3D scenes within the reconstruction framework, with the same memory constraints.

MrNeRF

52,849 views • 1 year ago

In the summer of 2023, I cold emailed Jensen Huang and asked to capture a NeRF of him at SIGGRAPH. He responded in about an hour and said yes. A radiance field is, in the simplest terms, akin to a 3D photograph. A moment in time, so completely reconstructed that you can move through it and see it from angles the original cameras never occupied. NeRFs were the original method. Gaussian splatting, which debuted at that same SIGGRAPH, has since become the dominant form of radiance field. I called my late friend James, who told me we needed to begin practicing immediately. We ran capture after capture for weeks until we consistently got the capture time down to ~30 seconds with one camera. Later, in a hallway at the LA Convention Center during SIGGRAPH, I captured the portrait you're seeing now, a full 360° gaussian splat of Jensen, rendered here as a 2D flythrough. Afterward, I continued the conversation with him and members of his team to make the case for radiance fields as a foundational representation for imaging. To my surprise, they listened. Three years later, NVIDIA has several works, including NuRec, fVDB, 3DGRUT, and gsplat all utilizing radiance fields. The landscape has evolved enough that the reasoning is obvious. Gaussian splatting has begun to ship across some of the world’s largest industries, including autonomous vehicles, AEC, geospatial, media and entertainment, robotics, e-commerce, hospitality. It’s become clear that lifelike 3D is here to stay. And yet I think we will look back and be disappointed by how late we started taking 3D portraits of the people around us, just like how we have sparse 2D photos of our grandparents and great grandparents. We have billions of photographs of the people we know and love, but almost no radiance fields of them. I'll be returning to SIGGRAPH in LA where this was initially captured three years ago, with the landscape looking significantly different. 
Radiance fields are more underdeployed than ever relative to what they can do. I'm excited for the future of imaging, and for 2D to transition into 3D. I have a few things up my sleeve that I think will make that case plainly.

Radiance Fields

17,663 views • 1 month ago

Segment Any 3D Gaussians paper page: Interactive 3D segmentation in radiance fields is an appealing task since its importance in 3D scene understanding and manipulation. However, existing methods face challenges in either achieving fine-grained, multi-granularity segmentation or contending with substantial computational overhead, inhibiting real-time interaction. In this paper, we introduce Segment Any 3D GAussians (SAGA), a novel 3D interactive segmentation approach that seamlessly blends a 2D segmentation foundation model with 3D Gaussian Splatting (3DGS), a recent breakthrough of radiance fields. SAGA efficiently embeds multi-granularity 2D segmentation results generated by the segmentation foundation model into 3D Gaussian point features through well-designed contrastive training. Evaluation on existing benchmarks demonstrates that SAGA can achieve competitive performance with state-of-the-art methods. Moreover, SAGA achieves multi-granularity segmentation and accommodates various prompts, including points, scribbles, and 2D masks. Notably, SAGA can finish the 3D segmentation within milliseconds, achieving nearly 1000x acceleration compared to previous SOTA.

AK

69,542 views • 2 years ago

Break-A-Scene: Extracting Multiple Concepts from a Single Image. We introduce the task of textual scene decomposition: given a single image of a scene that may contain several concepts, we aim to extract a distinct text token for each concept, enabling fine-grained control over the generated scenes. To this end, we propose augmenting the input image with masks that indicate the presence of target concepts. These masks can be provided by the user or generated automatically by a pre-trained segmentation model. We then present a novel two-phase customization process that optimizes a set of dedicated textual embeddings (handles), as well as the model weights, striking a delicate balance between accurately capturing the concepts and avoiding overfitting. We employ a masked diffusion loss to enable handles to generate their assigned concepts, complemented by a novel loss on cross-attention maps to prevent entanglement. We also introduce union-sampling, a training strategy aimed to improve the ability of combining multiple concepts in generated images. We use several automatic metrics to quantitatively compare our method against several baselines, and further affirm the results using a user study. Finally, we showcase several applications of our method. paper page:

AK

154,511 views • 3 years ago

DroneSplat: 3D Gaussian Splatting for Robust 3D Reconstruction from In-the-Wild Drone Imagery Abstract: Drones have become essential tools for reconstructing wild scenes due to their outstanding maneuverability. Recent advances in radiance field methods have achieved remarkable rendering quality, providing a new avenue for 3D reconstruction from drone imagery. However, dynamic distractors in wild environments challenge the static scene assumption in radiance fields, while limited view constraints hinder the accurate capture of underlying scene geometry. To address these challenges, we introduce DroneSplat, a novel framework designed for robust 3D reconstruction from in-the-wild drone imagery. Our method adaptively adjusts masking thresholds by integrating local-global segmentation heuristics with statistical approaches, enabling precise identification and elimination of dynamic distractors in static scenes. We enhance 3D Gaussian Splatting with multi-view stereo predictions and a voxel-guided optimization strategy, supporting high-quality rendering under limited view constraints. For comprehensive evaluation, we provide a drone-captured 3D reconstruction dataset encompassing both dynamic and static scenes. Extensive experiments demonstrate that DroneSplat outperforms both 3DGS and NeRF baselines in handling in-the-wild drone imagery.

MrNeRF

21,346 views • 1 year ago

Introducing Kaleido💮 from AI at Meta: a universal generative neural rendering engine for photorealistic, unified object and scene view synthesis. Kaleido is built on a simple but powerful design philosophy: 3D perception is a form of visual common sense. Following this idea, we formulate rendering purely as a sequence-to-sequence generation problem, successfully unifying neural rendering with the architecture principles behind modern language and video models. Unlike traditional neural rendering methods, Kaleido learns 3D purely in a data-driven way, without explicit 3D representations or structures. It acquires spatial understanding directly through large-scale video pretraining, then multi-view 3D data finetuning, inspired by how LLMs acquire textual common sense from large corpora before specialising in domains like coding. Through extensive ablations, we progressively modernised the architecture design and training strategies and tackled key scaling challenges in sequence-to-sequence generative rendering, arriving at a design that’s simple, versatile, and scalable. Kaleido significantly outperforms prior generative models in few-view settings, and remarkably is the first zero-shot generative method that matches InstantNGP-level rendering quality in multi-view settings. We view Kaleido also as an alternative step towards world modeling that flexibly spans a spectrum of “realities”: with many views, it faithfully reconstructs grounded reality; with fewer views, it imagines plausible unseen details. 🔗 Explore more results and paper:

Shikun Liu

22,332 views • 9 months ago

Gaussian Head Avatar: Ultra High-fidelity Head Avatar via Dynamic Gaussians paper page: Creating high-fidelity 3D head avatars has always been a research hotspot, but there remains a great challenge under lightweight sparse view setups. In this paper, we propose Gaussian Head Avatar represented by controllable 3D Gaussians for high-fidelity head avatar modeling. We optimize the neutral 3D Gaussians and a fully learned MLP-based deformation field to capture complex expressions. The two parts benefit each other, thereby our method can model fine-grained dynamic details while ensuring expression accuracy. Furthermore, we devise a well-designed geometry-guided initialization strategy based on implicit SDF and Deep Marching Tetrahedra for the stability and convergence of the training procedure. Experiments show our approach outperforms other state-of-the-art sparse-view methods, achieving ultra high-fidelity rendering quality at 2K resolution even under exaggerated expressions.

AK

65,847 views • 2 years ago

The Hidden Language of Diffusion Models paper page: We tackle the challenge of understanding concept representations in text-to-image models by decomposing an input text prompt into a small set of interpretable elements. This is achieved by learning a pseudo-token that is a sparse weighted combination of tokens from the model's vocabulary, with the objective of reconstructing the images generated for the given concept. Applied over the state-of-the-art Stable Diffusion model, this decomposition reveals non-trivial and surprising structures in the representations of concepts. For example, we find that some concepts such as "a president" or "a composer" are dominated by specific instances (e.g., "Obama", "Biden") and their interpolations. Other concepts, such as "happiness" combine associated terms that can be concrete ("family", "laughter") or abstract ("friendship", "emotion"). In addition to peering into the inner workings of Stable Diffusion, our method also enables applications such as single-image decomposition to tokens, bias detection and mitigation, and semantic image manipulation.

AK

41,746 views • 3 years ago

Depth Any Video with Scalable Synthetic Data. AI physicists and chemists continue to make strides in depth estimation from video. Check out this new paper featuring some impressive examples. See the thread for more details (unfortunately no code yet). Abstract: Video depth estimation has long been hindered by the scarcity of consistent and scalable ground truth data, leading to inconsistent and unreliable results. In this paper, we introduce Depth Any Video, a model that tackles the challenge through two key innovations. First, we develop a scalable synthetic data pipeline, capturing real-time video depth data from diverse game environments, yielding 40,000 video clips of 5-second duration, each with precise depth annotations. Second, we leverage the powerful priors of generative video diffusion models to handle real-world videos effectively, integrating advanced techniques such as rotary position encoding and flow matching to further enhance flexibility and efficiency. Unlike previous models, which are limited to fixed-length video sequences, our approach introduces a novel mixed-duration training strategy that handles videos of varying lengths and performs robustly across different frame rates, even on single frames. At inference, we propose a depth interpolation method that enables our model to infer high-resolution video depth across sequences of up to 150 frames. Our model outperforms all previous generative depth models in terms of spatial accuracy and temporal consistency.

MrNeRF

27,428 views • 1 year ago

[Discrete Fourier Transform] by Hand ✍️
In signal processing, the Discrete Fourier Transform (DFT) is no doubt the most important method. But the math involved is extremely complex, literally, involving a summation over a complex number term e^(-iwt). I developed this exercise to demonstrate that underneath such complexity, DFT is just a series of matrix multiplications you can calculate by hand. ✍️ Once you see that, it should not surprise you that a deep neural network, which is also a series of matrix multiplications, with activation functions in-between, can learn to perform DFT to process and analyze signals so effectively.
How does DFT work?
[1] Given
↳ Signals A, B, and C in the 🟧 frequency domain:
◦ A = cos(w) + 2cos(2w)
◦ B = cos(w) + cos(3w) + cos(4w)
◦ C = -cos(2w) + cos(3w)
◦ Each signal is a weighted sum of four cosine waves at frequencies 1w, 2w, 3w, and 4w.
◦ We will apply Inverse DFT to convert the signals to time domain representations, and then demonstrate DFT can convert back to their original frequency domain representations.
↳ Signal X in the 🟩 time domain. X is sampled at 10 time points 1t, 2t, …, 10t:
◦ X = [-2.5, -1.8, 3, -0.7, -1.0, -0.7, 3, -1.8, -2.5, 5]
◦ Suppose X is also a weighted sum of the same four cosine waves, but we don’t already know their weights. We will apply DFT to discover them.
[2] 🟧 Frequency Matrix (F)
↳ Write the coefficients of A, B, C as a matrix F. Each signal is a row. Each frequency is a column.
↳ A → [1, 2, 0, 0]
↳ B → [1, 0, 1, 1]
↳ C → [0, -1, 1, 0]
[3] Cosine → Discrete
↳ Sample from the continuous cosine waves at discrete time points 1t, 2t, 3t, to 10t.
[4] Cosine Matrix (W)
↳ Write the samples as a matrix. Each frequency is a row. Each time point is a column.
[5] Inverse DFT: 🟧 Frequency → 🟩 Time
↳ Multiply the frequency matrix F and the cosine matrix W.
↳ The meaning of this multiplication is to linearly combine the four cosine waves (rows in W) into time-domain signals (rows in T) using the weights specified in F.
↳ The result is matrix T, which holds signals A, B, C converted to the time domain. Each signal is a row. Each time point is a column.
[6] Transpose
↳ Transpose T, converting each signal’s time domain representation from a row to a column.
[7] DFT: 🟩 Time → 🟧 Frequency
↳ Multiply the cosine matrix W with the transpose of matrix T.
↳ The purpose of this multiplication is to take a dot-product between each time-domain signal (columns in the transpose of T) and each cosine wave (rows in W), which has the effect of projecting the signal onto a cosine wave to determine how much they are correlated. Zero means not correlated at all.
↳ The result is an intermediate version of the “recovered” frequency matrix where each column corresponds to a signal and each row corresponds to a frequency.
↳ Compared to the original frequency matrix F, this intermediate matrix has non-zero weights in the correct places, but scaled up by a factor of 5 (n/2, n=10). For example, signal A, originally [1,2,0,0], is recovered as [5,10,0,0].
[8] Scale
↳ Multiply each value by 2/n = 1/5 to scale down the intermediate matrix to match the magnitude of the original frequency matrix F.
[9] Transpose
↳ Transpose the recovered frequency matrix back to the same orientation as the original frequency matrix F.
↳ Like magic 🪄, the result is identical to the original F, which means DFT successfully recovered the frequency components of signals A, B, C.
[10] Apply DFT to X: 🟩 Time → 🟧 Frequency
↳ Now that we have some confidence in DFT’s ability to recover frequency components, we apply DFT to X’s time-domain representation by multiplying W with X.
↳ The result is an intermediate matrix.
[11] Scale
↳ Similarly, we scale down by a factor of 5 to obtain the recovered frequency components of X (a column).
[12] Transpose
↳ Similarly, we transpose the recovered column to a row to match the orientation of the frequency matrix.
↳ Using the coefficients [0,0,3,2], we can write the equation of X as 3cos(3w) + 2cos(4w).
Notes: I hope this by-hand exercise helps you understand the essence of DFT. But there are more technical details, such as:
• Sine: The complete DFT math also includes sine waves that follow a similar calculation process.
• Phase: Here, we assume all the cosine waves are aligned at the origin, namely, phase is 0. If a phase p is added, for example, cos(w+p), we will need to calculate the sine component and use their ratio to figure out what p is.
• Magnitude: If phase is not zero, the magnitude will need to be calculated by combining both cosine and sine terms.
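The twelve steps above can be replayed in a few lines of plain Python. This is a minimal sketch, not the author's worksheet: it assumes the base frequency is w = 2π/n with n = 10, so that the sample of frequency k at time point t is cos(2πkt/10), and the helper names `matmul` and `transpose` are made up for illustration.

```python
import math

N = 10                 # number of time points 1t .. 10t
FREQS = [1, 2, 3, 4]   # the four cosine frequencies 1w .. 4w

def matmul(a, b):
    """Plain nested-list matrix product (hypothetical helper)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    """Swap rows and columns of a nested-list matrix (hypothetical helper)."""
    return [list(col) for col in zip(*m)]

# [3]-[4] Cosine matrix W: one row per frequency, one column per time point.
# Assumption: w = 2*pi/N, so entry (k, t) is cos(2*pi*k*t/N).
W = [[math.cos(2 * math.pi * k * t / N) for t in range(1, N + 1)] for k in FREQS]

# [2] Frequency matrix F: rows are signals A, B, C; columns are frequencies.
F = [[1, 2, 0, 0],
     [1, 0, 1, 1],
     [0, -1, 1, 0]]

# [5] Inverse DFT: frequency -> time. T has one row per signal.
T = matmul(F, W)

# [6]-[9] DFT: multiply W by T transposed, scale by 2/N, transpose back.
intermediate = matmul(W, transpose(T))
recovered = transpose([[v * 2 / N for v in row] for row in intermediate])

# [10]-[12] Same steps for the unknown signal X (values rounded as in the post).
X = [-2.5, -1.8, 3, -0.7, -1.0, -0.7, 3, -1.8, -2.5, 5]
x_coeffs = [row[0] * 2 / N for row in matmul(W, transpose([X]))]

print(recovered)
print(x_coeffs)
```

Because X is given to one decimal place, `x_coeffs` comes out close to, rather than exactly equal to, the coefficients [0, 0, 3, 2] quoted in step [12].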

Tom Yeh

116,622 views • 2 years ago

We’re excited to introduce ShinkaEvolve: An open-source framework that evolves programs for scientific discovery with unprecedented sample-efficiency. Blog: Code: Like AlphaEvolve and its variants, our framework leverages LLMs to find state-of-the-art solutions to complex problems, but using orders of magnitude fewer resources! Many evolutionary AI systems are powerful but act like brute-force engines, burning thousands of samples to find good solutions. This makes discovery slow and expensive. We took inspiration from the efficiency of nature. ‘Shinka’ (進化) is Japanese for evolution, and we designed our system to be just as resourceful. On the classic circle packing optimization problem, ShinkaEvolve discovered a new state-of-the-art solution using only 150 samples. This is a big leap in efficiency compared to previous methods that required thousands of evaluations. We applied ShinkaEvolve to a diverse set of hard problems with real-world applications: 1/ AIME Math Reasoning: It evolved sophisticated agentic scaffolds that significantly outperform strong baselines, discovering an entire Pareto frontier of solutions trading performance for efficiency. 2/ Competitive Programming: On ALE-Bench (a benchmark for NP-Hard optimization problems), ShinkaEvolve took the best existing agent's solutions and improved them, turning a 5th place solution on one task into a 2nd place leaderboard rank in a competitive programming competition. 3/ LLM Training: We even turned ShinkaEvolve inward to improve LLMs themselves. It tackled the open challenge of designing load balancing losses for Mixture-of-Experts (MoE) models. It discovered a novel loss function that leads to better expert specialization and consistently improves model performance and perplexity. 
ShinkaEvolve achieves its remarkable sample-efficiency through three key innovations that work together: (1) an adaptive parent sampling strategy to balance exploration and exploitation, (2) novelty-based rejection filtering to avoid redundant work, and (3) a bandit-based LLM ensemble that dynamically picks the best model for the job. By making ShinkaEvolve open-source and highly sample-efficient, our goal is to democratize access to advanced, open-ended discovery tools. Our vision for ShinkaEvolve is to be an easy-to-use companion tool to help scientists and engineers with their daily work. We believe that building more efficient, nature-inspired systems is key to unlocking the future of AI-driven scientific research. We are excited to see what the community builds with it! Learn more in our technical report:
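To make the third idea above, a bandit that picks which LLM to query, a little more concrete, here is a small sketch. The post does not say which bandit algorithm ShinkaEvolve uses, so the classic UCB1 rule is used here purely as a stand-in; the class name, the model names, and the fixed reward values are all hypothetical.

```python
import math

class UCB1ModelPicker:
    """Pick among candidate models with the UCB1 rule (illustrative stand-in)."""

    def __init__(self, model_names):
        self.names = list(model_names)
        self.counts = {name: 0 for name in self.names}    # times each model was used
        self.totals = {name: 0.0 for name in self.names}  # summed reward per model

    def pick(self):
        # Try every model once before comparing scores.
        for name in self.names:
            if self.counts[name] == 0:
                return name
        total_pulls = sum(self.counts.values())

        def score(name):
            mean = self.totals[name] / self.counts[name]
            # Exploration bonus shrinks as a model is used more often.
            bonus = math.sqrt(2 * math.log(total_pulls) / self.counts[name])
            return mean + bonus

        return max(self.names, key=score)

    def update(self, name, reward):
        self.counts[name] += 1
        self.totals[name] += reward

# Hypothetical setup: each model yields a fixed "improvement" reward in [0, 1].
fake_reward = {"model_a": 0.2, "model_b": 0.8}
picker = UCB1ModelPicker(fake_reward)
for _ in range(200):
    chosen = picker.pick()
    picker.update(chosen, fake_reward[chosen])

print(picker.counts)
```

In this toy run the picker keeps sampling the weaker model occasionally but spends most of its budget on the stronger one, which is the exploration and exploitation trade-off the post refers to.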

Sakana AI

359,537 views • 10 months ago

🚨BREAKING: THE COUP IS OVER | WAGNER’S RETREATING This official statement from Prigozhin, the head of the Wagner group and the leader of this coup, says it all. I don't think anyone expected this: "They were going to dismantle PMC Wagner. We came out on 23 June to the March of Justice. In a day, we walked to nearly 200km away from Moscow. In this time, we did not spill a single drop of blood of our fighters. Now, the moment has come when blood may spill. That’s why, understanding the responsibility for spilling Russian blood on one of the sides, we are turning back our convoys and going back to field camps according to the plan." The President of Belarus, Lukashenko, has been in talks with Prigozhin all day and has taken credit for the peace agreement. Prigozhin accepted the terms of Lukashenko’s agreement and agreed to halt the movement of his forces and return to his bases. The agreement also guarantees security for fighters of PMC Wagner. It seems that the attempted coup has come to an end, and Prigozhin, along with his men, will return to their bases. There are reports of Wagner forces leaving not only Moscow Oblast but also Rostov. Russian media reports that criminal cases against Yevgeny Prigozhin have already been dropped and that Prigozhin and his forces will receive FULL IMMUNITY. Restrictions on the movement of vehicles have been lifted in the Voronezh region, which saw clashes earlier during the coup. MY THOUGHTS:
- I did not expect this would end peacefully with a deal, as both sides seemed at the point of no return
- I have no idea how Prigozhin and Putin can both operate in Russia with what just transpired, and I also have no idea what will happen with the war in Ukraine but I wouldn’t be surprised if we see a space deal reached.
- Today was another example of citizen journalism replacing mainstream media with UNBIASED and UNCENSORED live breaking news.
- I am fried, been awake for more than 30 hours, initially doing a space with former Pakistani Prime Minister Imran Khan before shifting to the Coup space which is at 21 hours and counting. Time for me to finally sleep!

Mario Nawfal

21,667,270 views • 3 years ago

We're excited to unveil NRN Agents, a rebrand that aligns our project identity with our token and strengthens our mission to power the future of AI-driven gaming. This mission requires collaboration, and starting this week, we will begin our expansion to become a multi-chain ecosystem. We are joining forces with leading gaming platforms and ecosystems to realize this vision. Stay tuned for more announcements to come. Why NRN Agents? NRN stands for NEURON, the fundamental unit of intelligence. Our AI agents function as the neural foundation of games, learning, adapting, and evolving within game worlds to deliver unparalleled engagement. NRN agent SDK enables advanced gaming agents powered by a proprietary machine learning infrastructure focused on behavioral learning. We've perfected the craft of gaming agent design, creating hyper-efficient agents that are performant and scalable, from casual to the most demanding games. Our SDK will seamlessly integrate into many platforms, tech stacks, and ecosystems – Any Game. Any Chain. More than just games, it's the path to AGI Gaming is our proving ground, but not our final destination. We're using games as a sandbox to accelerate the development of generalized intelligence, one that will create meaningful real-world impact. With the upcoming launch of [redacted] and a growing network of partners committed to the AGI vision, we're building an open-source innovation movement powered by an AI x gaming framework connected by $NRN. $NRN the token $NRN is a utility token that serves as the gateway to our growing ecosystem. It will power a diversified economy with multiple revenue streams and staking opportunities: Agent Deployment: NRN is the laboratory creating gaming agents that can be distributed through platforms and launchpads alike. The model is simple: More games integrate, more NRN agents get deployed, more monetization. Data Creation: NRN Reinforcement Learning (RL) enables token staking to create Data Capsules. 
Players contribute gameplay data into the Capsules, which are used to train RL agents and reward participants (players & stakers). AI Arena: $NRN also continues to power AI Arena's in-game economy, a cult favorite of competitive diehards that features a skill-based wagering system. To our community who have supported us since 2021: thank you for being part of our journey. The next chapter will be the most exciting yet!

NRN Agents

20,762 views • 1 year ago

Transformer by hand ✍️ ~ 6 steps walkthrough below
Open the hood of a transformer and the parts list is overwhelming: embeddings, positional encoding, attention weighting, self-attention, cross-attention, multi-head attention, layer norm, skip connections, softmax, linear, Nx, shifted right, query, key, value, masking. Which of those actually make the car run? Two of them. Attention weighting and the feed-forward network. Everything else is an enhancement to make it run faster and longer, which is how we got from a car to a truck, and to the word "large" in large language model. So I drew and calculated those two parts entirely by hand.
Goal: push five features through one transformer block, filling in every cell yourself.
1. Given
Five positions of input features, arriving from the previous block.
2. Attention matrix
Let us feed all five features to a query-key module (QK) and read back an attention weight matrix, A. The details of that module are a post of their own.
3. Attention weighting
We multiply the input features by A to get the attention weighted features, Z. Still five positions. The effect is to combine features *across positions*, horizontally: X1 becomes X1 + X2, X2 becomes X2 + X3, and so on.
4. First layer
Let us feed all five weighted features into the first layer of the FFN. Multiply by the weights and biases. This time the combining happens *across feature dimensions*, vertically, and each feature grows from 3 numbers to 4. Note that every position goes through the same weight matrix. That is what "position-wise" means.
5. ReLU
We cross out the negatives. They become zeros.
6. Second layer
Let us bring it back down: 4 dimensions to 3. The output feeds the next block, which has a completely separate set of parameters, and the whole thing runs again.
You have just calculated a transformer block by hand. ✍️
The takeaway: the two parts are doing two different jobs, and neither one alone is enough. Attention mixes *across positions*, so a feature can see its neighbours. The FFN mixes *across feature dimensions*, so each position can think about itself. Horizontal, then vertical. Then that pattern repeats N times, each block with its own separate set of weights. That is the Nx from the list up top, and that is what makes the transformer run.
💾 Save this post! #AIbyHand #Transformers #DeepLearning
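The six steps can also be traced in plain Python. This is only a sketch: every number below is made up for illustration (the post's actual worksheet values are not reproduced here), features are stored as columns so that there are 3 rows and 5 positions, and the attention matrix A is hard-coded rather than computed by a query-key module.

```python
def matmul(a, b):
    """Plain nested-list matrix product (hypothetical helper)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add_bias(m, bias):
    """Add bias[i] to every entry of row i, i.e. the same bias at every position."""
    return [[v + b for v in row] for row, b in zip(m, bias)]

def relu(m):
    """Cross out the negatives: they become zeros."""
    return [[max(0, v) for v in row] for row in m]

# 1. Given: 3 feature dimensions (rows) x 5 positions (columns). Made-up values.
X = [[1, 0, 2, 1, 0],
     [0, 1, 1, 0, 2],
     [2, 1, 0, 1, 1]]

# 2. Attention matrix A (5x5), assumed given. Column j mixes X_j with X_(j+1);
#    the last position has no right-hand neighbour and keeps only itself.
A = [[1 if i == j or i == j + 1 else 0 for j in range(5)] for i in range(5)]

# 3. Attention weighting: mixes across positions (horizontally).
Z = matmul(X, A)

# 4. First FFN layer: 3 -> 4 dimensions, same weights for every position.
W1 = [[1, 0, -1],
      [0, 1, 1],
      [-1, 1, 0],
      [1, -1, 1]]
b1 = [0, -1, 1, 0]
pre_activation = add_bias(matmul(W1, Z), b1)

# 5. ReLU.
H = relu(pre_activation)

# 6. Second FFN layer: 4 -> 3 dimensions.
W2 = [[1, 0, 1, 0],
      [0, 1, 0, -1],
      [1, 1, 0, 0]]
b2 = [0, 1, -1]
Y = add_bias(matmul(W2, H), b2)

for row in Y:
    print(row)
```

Multiplying on the right by A combines columns (positions), while multiplying on the left by W1 and W2 combines rows (feature dimensions), which is the horizontal-then-vertical pattern described in the takeaway.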

Tom Yeh

25,768 views • 11 days ago

Introducing ASAL: Automating the Search for Artificial Life with Foundation Models Artificial Life (ALife) research holds key insights that can transform and accelerate progress in AI. By speeding up ALife discovery with AI, we accelerate our understanding of emergence, evolution, and intelligence, core principles that can inspire the next generation of AI systems! We proudly collaborated with MIT, OpenAI, Swiss AI Lab IDSIA, and Ken Stanley on this exciting project. Full Paper (Website): Full Paper (arXiv): Code: In this work, we propose a new algorithm called Automated Search for Artificial Life (“ASAL”) to automate the discovery of artificial life using vision-language foundation models. Instead of tediously hand-designing every tiny rule of an ALife simulation, simply describe the space of simulations to search over, and ASAL will automatically discover the most interesting and open-ended artificial lifeforms! Because of the generality of foundation models, ASAL can discover new lifeforms across a diverse range of seminal ALife simulations, including Boids, Particle Life, Game of Life, Lenia, and Neural Cellular Automata. ASAL even discovered novel cellular automata rules that are more open-ended and expressive than the original Conway’s Game of Life. We believe this new paradigm may reignite ALife research by overcoming the bottleneck of manually designed simulations, thus advancing beyond the limits of human ingenuity.

Sakana AI

750,756 views • 1 year ago

As we prepare to launch several projects, we're eager to provide a general update to our community. We are steadily approaching our end goal, thanks to the daily progress we're making toward our vision. Achieving our objectives will bring about a significant transformation in cross-chain interoperability and the flow of liquidity within protocols. This will address crucial challenges and drive mass adoption. Our future-focused approach and effective team collaboration keep us moving forward in an organized manner. Let’s delve deeper into the state of development of our current products and upcoming projects.

Tao Bridge

We start with the Tao Bridge, which enables the #Bittensor community to unlock DeFi opportunities with their $TAO via a highly efficient blockchain like #MultiversX, known for its security, speed, and affordability. We deeply admire #Bittensor and believe a project like that is crucial for the future of not just the crypto space but also humanity, as it addresses the major challenges AI faces today: centralization and siloed, isolated work, which pose risks and hinder the technology's potential. We are committed to the vision of subnets and dynamic $TAO, convinced that this ecosystem is as groundbreaking as #Ethereum or #Bitcoin. We will continue to support #Bittensor wherever possible, and our bridge will also expand to other chains with Hatom V2.

The TAO Bridge, deployed on and accessible through will launch on the Mainnet in 14 days, on March 27th. You can follow the countdown on the lending page at Given that our main priorities are security and stability, this period will be primarily focused on quality assurance to ensure a flawless Mainnet launch. The launch will also introduce TAO Liquid Staking at along with the integration of both $wTAO and $swTAO on the lending page. This allows #Bittensor users to leverage liquid staking, employ short or long strategies, among other DeFi strategies, or simply access stablecoin liquidity while maintaining exposure to their $TAO.

Up to $1M will be distributed as additional incentives on top of the supply APYs at the launch of the $wTAO and $swTAO money markets, with $200K allocated for the first month specifically for bootstrapping. Initially, 70% of rewards will go to liquidity providers, and 30% to those using $HTM to boost their lending positions. This changes to a 50-50 split in the second month, and by the third month, all incentives are directed through the Booster. This approach encourages early participation and sustained engagement with $HTM.

Introducing $TAO to #MultiversX will result in the creation of Liquidity Pools (LPs) on both AshSwap 🔥 and xExchange ⚡. These LPs will be incentivized by both entities, and Hatom will distribute extra rewards at launch. The goal is to make #MultiversX a one-stop hub for $TAO holders. Once volumes stabilize, there are also plans to integrate it on AshPerp 🔥. Furthermore, with the release of $USH, users will have the ability to mint it while retaining exposure to their $TAO.

The TAO Bridge and TAO Liquid Staking smart contracts have been audited by Runtime Verification and @arda_project, while penetration testing and DevSecOps have been performed on our infrastructure by CertiK.

We're excited to announce our exclusive partnership with TAONEW, one of the top 5 validators on #Bittensor. TAONEW has been extremely helpful and supportive from day one. By sharing 50% of its service fee with its stakers, TAONEW enables Hatom to offer an optimized Staking APY to its users. Since our initial reference, #Bittensor has grown sevenfold, becoming the largest AI project in the crypto sphere. We reiterate our commitment to contribute to such technology and hope to address some of its current DeFi challenges.

Syfy

Moving forward, today marks a significant milestone, not only for our decentralized protocols but also for our development companies, which currently stand as the sole and primary contributors to Hatom Labs and Soul Labs. We’re excited to unveil Syfy, the evolved identity of Hatom Labs and Soul Labs, now serving as the parent entity for our burgeoning development companies. Organization is crucial for scalability, which is why Syfy was established to cultivate an environment where our teams can collaborate more seamlessly, enhancing our effectiveness and efficiency. At the same time, we remain committed to upholding the financial independence of each project, supported by its own community of funding contributors. Feel free to explore our website at for more information! Additionally, don't forget to follow Syfy and explore their Genesis article highlighted in their initial post:

Booster V2

The Booster V2 will introduce a range of new features and opportunities for $HTM holders:

Optimized Position Boosting: Previously, boosting was done individually for each money market, necessitating $HTM token distribution and periodic rebalancing due to price fluctuations. With Booster V2, the system now considers the overall position, eliminating the need for manual rebalancing.
Gas Fee Reduction: Booster V2 implements optimizations that result in reduced gas fees, making transactions more cost-effective for users.
Incorporation of Governance: Users staking $HTM tokens gain voting rights directly within the Booster, allowing them to participate in governance decisions while maintaining their staked positions. (Note: Only $HTM tokens are considered for governance; LP tokens are not included.)
Enhanced Boosting Mechanism: The Booster V2 enables LP Tokens to boost positions within the Booster, leveraging trading fees from swaps and farm incentives while boosting lending positions.
Smart Contract Completion: The Booster smart contract has been completed and audited by @arda_project, ensuring security and reliability.
Frontend Implementation: The frontend design for Booster V2 has been successfully implemented, providing users with an intuitive interface.
Collaboration with xExchange: Exploration is ongoing for collaboration with xExchange ⚡ to enable LP creation, farming, and meta-staking within the Booster.

Upon finalization of testing, we will launch the Booster V2 on the devnet to gather community feedback and begin preparations for the mainnet release.

Soul

Before delving into Soul Labs's developments, it's essential to summarize its core functionality briefly: Soul Labs seamlessly connects different lending protocols and blockchains, facilitating lending and borrowing across platforms like Aave, Compound Labs, and Hatom Labs, consolidating liquidity and users' borrowing capabilities. Utilizing LayerZero Labs and other messaging layers for cross-chain communication, Soul Labs bypasses asset bridging or synthetics, unlocking novel DeFi strategies and solidifying its position as the ultimate solution for cross-lending dilemmas. Soul V1 will be permissionless, holding censorship-resistant features, incorporating multiple redundancy mechanisms, and providing support for various DApps.

We're thrilled to announce that, following the launch of the Tao Bridge in 2-3 weeks, we will introduce the Soul Labs website. This platform has been meticulously crafted over 250 days to not only provide a comprehensive overview of our vision but also to offer an engaging and captivating experience that promises to be memorable. Regarding the app, significant progress has been made on the V1 protocol, including:

Smart Contract Development and Testing:
• Completion of the initial phase of smart contract development.
• Conducting advanced testing to ensure the system's robustness.
• Establishment of a fully functional proof of concept.
• Successful deployment and testing on the #Goerli (#Ethereum Testnet) and #Mumbai (#Polygon Testnet), leveraging LayerZero Labs for seamless operation.

Feature Enhancement and Protocol Optimization:
• Enhanced testing procedures to bolster system resilience.
• Integration of advanced features and significant code refactoring for optimization.
• Incorporation of various communication methods, including LayerZero Labs, Axelar (@axelar), Chainlink CCIP, and wormholecrypto, into the Soul Labs framework, enhancing its resilience and flexibility. This allows Soul Labs to maintain operation through alternative protocols if the primary one is temporarily paused.

Website Development and Documentation:
• Nearing the completion of the v1 app, with final touches being applied.
• The preparation of comprehensive V1 documentation and the Yellow Paper, available upon Soul Labs's public launch, offering detailed insights into the platform's infrastructure and capabilities.

USH

Recognizing the critical need for stable liquidity within the ecosystem, we have positioned ourselves at the forefront of providing a solution by introducing $USH, the first native, decentralized, and over-collateralized stablecoin on #MultiversX. As market conditions have improved, we have observed a growing demand for stablecoins in the ecosystem, evidenced by the utilization rate in the Lending Protocol spiking to over 90% several times in recent months. Therefore, our goal is to tackle the current challenges faced by users by creating a robust product that will not only help them hedge against market volatility but also open up better opportunities to trade the markets and generate yield.

We're happy to unveil the $USH website, now live with a sleek and intuitive user interface, designed for ease of use, which ensures that interacting with the protocol is straightforward and accessible for all. You can access it now through this link: For the technical side, we’re advancing steadily and we’ve accomplished the following milestones:

Lending Protocol Facilitator:
• Coded the first version to support multiple discount factors for different collaterals.
• Implemented tracking of borrowing effectiveness to enable earnings forecasting for the module and support minting processes.

Isolated Pools Facilitator:
• Coded the first version of Isolated Pools Facilitator.
• Use of $EGLD or $sEGLD as collateral, with positions stored always in $EGLD to benefit the protocol through Liquid Staking and lending interest.
• Virtual account implementation for converting $sEGLD earnings into $USH, functioning like liquidation where users deposit $USH for a higher amount of $HsEGLD.

Staking Module:
• Coded the first version of the Staking Module that allows users to stake and unstake without any restrictions.

We're currently focusing our efforts on the following tasks:
• Implementation of HTM Booster in the discount model in the Lending Protocol.
• Implementation of different depeg strategies and brainstorming further potential “soft” depeg mechanisms.
• Research and implementation of rewards model for Staking Module.
• Research and implementation of Boosted Vaults Facilitator.
• Review and stress-test the first version of the code.

Upon launch, $USH will be integrated into various protocols and AMMs across the ecosystem, further increasing both its utility and liquidity. The opportunities will be vast, enabling users to engage in a wide range of activities such as yield farming, staking, and arbitrage, all while leveraging a stable and reliable asset. Regarding the USH Airdrop campaign, it will continue until the official launch of $USH planned for late Q2-early Q3, rewarding all users who have actively participated in the initiative.

Hatom V2

It is clear by now that we are driven to build a more robust, interoperable, and secure DeFi space, removing the current barriers that hinder users' capabilities to seamlessly interact with different blockchains. Through Hatom V2, we will introduce Hatom's cross-chain architecture, designed from the ground up for interoperability. This approach will elevate the protocol to unprecedented levels, enabling its deployment across various blockchains and facilitating seamless connections between them through Soul. By enhancing interoperability, Hatom V2 aims to foster a more inclusive and accessible ecosystem. This expansion will not only broaden the protocol's reach but also significantly increase its flexibility and utility, allowing users to interact with a diverse range of assets and products across different chains.

We’re thrilled to share that we are currently crafting the V2 redesign of the Hatom webpage. Anticipate a jaw-dropping transformation that will truly astonish, blending cutting-edge design with an unparalleled user experience, elevating it to a dynamic, interactive hub, and making every interaction more engaging. Good things take time, but we are confident that the release of the V2 website will take place in the second quarter of this year and will officially mark the start of our journey into the cross-chain landscape. We are excited about the future and we truly believe that this will mark the beginning of a new era for Hatom.

It's crucial for us to develop rapidly without sacrificing the quality or the security of each product. We're strategically allocating resources to ensure smooth progress in every area of our work. As we push forward, we believe that the launch of Soul Labs will be the most important milestone due to its massive potential and disruptive technology. We would like to thank you all for the unwavering support you've shown over the past few months; it truly fuels our passion to push daily and make strides toward achieving our ambitious goals.

Hatom Labs

203,486 views • 2 years ago

Big news! Today we are announcing that Shopify is acquiring the Threads team. There are a million feelings and thoughts from this journey, but nothing more than gratitude to all of our users and customers for building with us, my colleagues (both current and former) for making all of this possible, our investors, especially Mike Vernal, Elad Gil, Avichal - Electric ϟ Capital, and Jessica Verrilli for their unwavering support and guidance, and all of our friends and family who put up with the late nights, canceled plans, and the general roller coaster that is startup life. To our customers and users who have been asking, here’s how we got here: The past several months have been some of the most interesting and intense of my life. It all started with the rise of Instagram Threads, which presented us with the opportunity to sell our domains. Around the same time, a handful of companies approached us, wondering if we would be open to an acquisition. When this happened in the past, we would politely decline. However, this time, things were different. We weren't that excited about the time it would take to invest in a rebrand, and with mind-warping technological advances now being a commodity, we were excited about joining a place where we could tinker at scale. Each company we chatted with was incredible. However, what ultimately led us to choose Shopify over others was their culture; two distinct things in particular: 1) Craft-obsessed. Their obsession with not just building the right thing, but also building it the right way is inspiring. Sacrifice shows priority, and hearing stories about some of the hard decisions they made to ensure that what they ship is robust, scalable, and trustworthy, even at the cost of short-term metric gains, really proved that their obsession with craft was much more than a feel-good slogan. Their discussions and decisions have me truly believing they're going to be around for 100 years. 2) For entrepreneurs, by entrepreneurs. 
Just about every product and engineering leader I met was an ex-founder who grinded for years to turn nothing into something. They all still had that air of resilience, obsession with the details at every part of the stack, and a compelling vision of the future for whatever they were working on. Threads leadership is also made up of ex-founders, so the entrepreneurial focus at Shopify made it clear that it would be the best environment for us to grow and thrive. Very excited for this next chapter with the team and grateful for all of the support we’ve received over the years from those who believed in us. As always, thanks for (th) reading.

Rousseau Kazi

46,761 views • 2 years ago