I trained a 12M-parameter LLM on my own ML framework using a Rust backend and CUDA kernels for flash attention, AdamW, and more. Wrote the full transformer architecture and BPE tokenizer from scratch.

The framework features:
- Custom CUDA kernels (Flash Attention, fused LayerNorm, fused GELU) for 3x... increased throughput
- Automatic WebGPU fallback for non-NVIDIA devices
- TypeScript API with Rust compute backend
- One npm install to get started, prebuilt binaries for every platform

Try out the model for yourself:

Built with Reese Chong. Check out the repos and blog if you want to learn more.

Shoutout to Modal for the compute credits allowing me to train on 2 A100 GPUs without going broke.

cc sunny madra Gavin

Aadi Kulshrestha

3,449 subscribers

813,943 views • 3 months ago • via X (Twitter)

Education, Science & Technology

Related videos

I implemented Google Research's TurboQuant as a CUDA-native compression engine on Blackwell B200. 5x KV cache compression on Qwen 2.5-1.5B, near-lossless attention scores, generating live from compressed memory.

5 custom cuTile CUDA kernels ft:
- fused attention (with QJL corrections)
- online softmax
- on-chip cache decompression
- pipelined TMA loads

Try it out:

s/o Bryce, the CUDA Colonel and the cuTile team at NVIDIA for lending me Blackwell GPU access :)

cc sunny madra Gavin

ani

809,610 views • 3 months ago
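The TurboQuant post above lists "online softmax" among its kernels. For readers unfamiliar with the term, the sketch below shows the general technique in plain Python: the maximum and the normalising sum are accumulated in a single streaming pass, and the running sum is rescaled whenever a larger maximum appears. This is a generic illustration under that assumption, not the author's cuTile kernel, and the function name `online_softmax` is made up for the example.

```python
import math


def online_softmax(xs):
    """Numerically stable softmax with a one-pass (max, sum) accumulation.

    Illustrative helper, not taken from any library.
    """
    running_max = float("-inf")
    running_sum = 0.0
    for x in xs:
        new_max = max(running_max, x)
        # Rescale the sum accumulated so far to the new maximum,
        # then add the contribution of the current element.
        running_sum = (running_sum * math.exp(running_max - new_max)
                       + math.exp(x - new_max))
        running_max = new_max
    # Second pass: normalise each element with the final max and sum.
    return [math.exp(x - running_max) / running_sum for x in xs]


if __name__ == "__main__":
    print(online_softmax([1.0, 2.0, 3.0]))
```

Because the maximum never has to be known in advance, the same recurrence can be applied block by block, which is what lets fused attention kernels avoid materialising the full score matrix.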

Check out mistral.rs, our #Rust-based open source inference engine allowing for fast #LLM serving for a variety of architectures including X-LoRA mixture-of-expert (MoE) models, Llama-3, Mistral/Mixtral, Gemma & many others.

Built on the Hugging Face #Candle framework for #Rust w/ custom CUDA kernels in the backend (as well as support for Metal, Apple Accelerate, and Intel MKL for CPU use), you can easily create an OpenAI-compatible REST API server or run via Python bindings.

Key features include:
✅ Prefix caching, continuous batching
✅ Flash Attention V2
✅ Device offloading
✅ GGUF or Hugging Face models
✅ 2, 3, 4, 5, 6 and 8 bit quantization
✅ X-LoRA MoE non-granular scalings for fast inference
✅ Grammar support
✅ LoRA support with weight merging
✅ LlamaIndex 🦙 integration
...and much more.

Incorporation into our GraphReasoning multi-agent modeling framework & LlamaIndex 🦙 allows you to combine in-context learning with adversarial agentic strategies, to dive deep into complex scientific analyses, such as to predict material behaviors, generate hypotheses, analyze papers and data, develop new research concepts, and much more.

Check out mistral.rs:

Join our Discord here:

Rust Trending Rust Language

Markus J. Buehler

73,581 views • 2 years ago

>build a 1.5B VLA on top of MiniCPM-V 4.6
>compete with π0.5 and Qwen-VLA
>use roughly half the visual-processing compute
>remember previous steps without blowing up inference cost
>pull the network plug
>let the robot keep tracking, fully local, fully onboard
>with CUDA Graph optimization and custom Triton fused kernels, MiniCPM-Robot inference throughput goes from 10 Hz → 33 Hz, and up to 36 Hz on the same NVIDIA H20
>release the models and framework as open source

so why does robotics still rely on bigger models and the cloud?

OpenBMB just dropped MiniCPM-Robot at WAIC

two models + one inference framework 🧵

arc.

48,232 views • 4 days ago

The latest MLX has a CUDA back-end! To get started: pip install "mlx[cuda]" With the same codebase you can develop locally, run your model on Apple silicon, or in the cloud on Nvidia GPUs. MLX is designed around Apple silicon - which has a unified memory architecture. It uses the same design with CUDA. So there's no need to move arrays around from CPU memory to GPU memory. Note, this is early days - some operations are missing and performance is still being optimized. But it's already quite fast for Transformer training, text generation, and more! Here's a demo using mlx-lm to generate text with Llama 3 8B (bf16) on an A100:

Awni Hannun

42,761 views • 1 year ago

Just launched #CES2026, the new open-source NVIDIA Nemotron Speech ASR model is here to solve latency drift and redundant compute. Its cache-aware streaming architecture eliminates the need for buffered inference, giving you stable, sub-100ms latency (24ms median T-T-F) and up to 3x more throughput on your GPU. 🤗 Read the technical blog with real-world results from Daily and Modal on Hugging Face:

NVIDIA AI Developer

138,518 views • 6 months ago

Congrats to the Kimi.ai team! This is awesome. Great to see this level of research coming from open-source frontier model labs. I liked the paper so much I built a Rust implementation of it ;) Full AttnRes + Block AttnRes with two-phase inference, built using Burn (tensor library and Deep Learning Framework, in Rust, by Tracel AI). Runs on CPU, CUDA, Metal, wgpu. Includes an interactive TUI that trains a model live and visualizes depth attention evolving from uniform to selective in real time. Repo link and more on what is implemented in the comments.

abdel

95,237 views • 4 months ago

Inkling, Thinking Machines' first open model, dropped today: 975B total / 41B active MoE, up to 1M context, reasoning natively over text, images, and audio.

Serving and RL support are already live: you can run and shape it on an open stack, starting now.

Day 0 support on SGLang and Miles (RadixArk) 👇
- Inkling's new architecture (ShortConv, attention with relative positional embedding, shared expert sink MoE) is natively implemented and deeply optimized, with prefill full CUDA graph and MXFP8 KV cache
- Full-parameter and LoRA RL in a customized Megatron backend, train-inference consistency via customized kernels, routing replay, and cross-runtime parameter synchronization
- DFlash speculative decoding from Modal for low-latency serving

Launching now, blog and cookbook in the comments ⬇️

LMSYS Org

143,172 views • 13 days ago

Ubuntu 26.04 LTS, codenamed #ResoluteRaccoon, is now available to download. 🦝 Resolute Raccoon builds on the resilience-focused improvements introduced in interim releases, with TPM-backed full-disk encryption, improved support for application permission prompting, Livepatch updates for Arm-based servers, and Rust-based utilities for enhanced memory safety. This release also brings native support for industry-leading AI/ML toolkits like NVIDIA CUDA and AMD ROCm, making Ubuntu 26.04 LTS the ideal platform for AI development and production workloads. Install now: Learn more about the release:

Ubuntu

652,410 views • 3 months ago

Introducing SubQ - a major breakthrough in LLM intelligence.

It is the first model built on a fully sub-quadratic sparse-attention architecture (SSA), and the first frontier model with a 12 million token context window, which is:
- 52x faster than FlashAttention at 1MM tokens
- Less than 5% the cost of Opus

Transformer-based LLMs waste compute by processing every possible relationship between words (standard attention). Only a small fraction actually matter. Subquadratic finds and focuses only on the ones that do.

That's nearly 1,000x less compute and a new way for LLMs to scale.

Alexander Whedon

13,199,594 views • 2 months ago

Finally, I'm done with the basics of backend engineering in the all-in-one resource to learn backend engineering that I'm working on.

This basic section includes:
- Internet
- HTTP
- Servers
- Web Dev fundamentals
- Fundamentals of Operating Systems
- A server-side language -> JavaScript | Go | Rust | Node
- A framework -> Express | Nestjs | Django | Laravel | Spring Boot

This is the boring part. Don't worry. We are getting into the fun part soon.

Still working on it, updating it, adding more, and making it better for you.

Follow me. Retweet. Comment "Backend." I will DM you the PDF version when it's out.

Enjoy the web version, and send in your feedback on how we can improve it for everyone.

Solomon Eseme

92,196 views • 3 years ago

Mixtral 8x7B Instruct with AWQ & Flash Attention 2 🔥

All in ~24GB GPU VRAM!

With the latest release of AutoAWQ - you can now run Mixtral 8x7B MoE with Flash Attention 2 for blazingly fast inference. All in < 10 lines of code.

The only real change except loading AWQ weights is to pass attn_implementation="flash_attention_2" over to the .from_pretrained call whilst loading the model.

Here's a full run through:

1. Install AutoAWQ and transformers

    pip install autoawq git+ com/huggingface/transformers.git

2. Initialise the tokeniser and the model

    from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

    model_id = "casperhansen/mixtral-instruct-awq"

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        low_cpu_mem_usage=True,
        device_map="cuda:0",
        attn_implementation="flash_attention_2",
    )

3. Initialise the TextStreamer

    streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

4. Tokenise the inputs

    # `text` is the prompt string; the post does not define it
    tokens = tokenizer(text, return_tensors="pt").input_ids.to("cuda:0")

5. Generate!

    generation_output = model.generate(tokens, streamer=streamer, max_new_tokens=512)

That's it! 🤗

Vaibhav (VB) Srivastav

128,901 views • 2 years ago

I added KV caching and INT8 KV quantization to our transformer inference, improving throughput by 35x. All of this was done from scratch in Rust + CUDA, on top of a homemade ML framework.

On a 4-token prompt with 252 generated tokens:
- Original: 0.76 tok/s
- KV cache fp32: 27.21 tok/s
- KV cache int8 (quantized): 27.29 tok/s

Try it out yourself here:

In practice:
- KV caching gave us about a 35x end-to-end speedup
- INT8 KV cache kept roughly the same speed as fp32 but cut KV cache memory by 3.78x

FP32 cache used 4.5 MB in this run while the INT8 cache used only 1.19 MB.

This simple change to inference created a huge impact on performance. To learn more about the KV cache and other optimizations like this, check out the blog at

Reese Chong

52,588 views • 3 months ago
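The figures in the KV cache post above can be checked with a few lines of arithmetic: 27.21 / 0.76 is roughly 35.8, and 4.5 MB / 1.19 MB is roughly 3.78. The sketch below reproduces those ratios and adds a generic symmetric INT8 quantiser to show why an INT8 cache can land slightly under the ideal 4x saving when each vector also stores a floating-point scale. The quantisation scheme, the one-scale-per-vector layout, and the names `quantize_int8`, `dequantize_int8`, and `ratio_with_scale` are assumptions for illustration; the post does not say how its cache is laid out.

```python
def quantize_int8(vec):
    """Symmetric per-vector INT8 quantisation (assumed scheme).

    Returns integer codes in [-127, 127] and one float scale.
    """
    max_abs = max((abs(x) for x in vec), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(x / scale))) for x in vec]
    return codes, scale


def dequantize_int8(codes, scale):
    """Map INT8 codes back to approximate float values."""
    return [c * scale for c in codes]


def ratio_with_scale(head_dim, scale_bytes=4):
    """FP32 bytes per vector divided by INT8 bytes plus one stored scale.

    The one-scale-per-vector layout is hypothetical.
    """
    return (4 * head_dim) / (head_dim + scale_bytes)


# Ratios computed from the numbers quoted in the post.
speedup = 27.21 / 0.76       # fp32 KV cache vs. no cache, tok/s
memory_ratio = 4.5 / 1.19    # FP32 cache MB vs. INT8 cache MB

if __name__ == "__main__":
    print(f"speedup: {speedup:.1f}x, memory ratio: {memory_ratio:.2f}x")
    codes, scale = quantize_int8([0.5, -1.0, 0.25])
    print(codes, dequantize_int8(codes, scale))
    print(f"ratio with a 4-byte scale at head_dim=64: {ratio_with_scale(64):.2f}x")
```

The near-identical throughput for fp32 and int8 in the post is consistent with the cache simply being small here (a few MB), so memory savings rather than speed are the visible effect of quantisation in this run.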

New course: Build and Train an LLM with JAX, built in partnership with Google and taught by Chris Achard.

JAX is the open-source library behind Google's Gemini, Veo, and other advanced models. This short course teaches you to build and train a 20-million parameter language model from scratch using JAX and its ecosystem of tools.

You'll implement a complete MiniGPT-style architecture from scratch, train it, and chat with your finished model through a graphical interface.

Skills you'll gain:
- Learn JAX's core primitives: automatic differentiation, JIT compilation, and vectorized execution
- Build a MiniGPT-style LLM using Flax/NNX, implementing embedding and transformer blocks
- Load a pretrained MiniGPT model and run inference through a chat interface

Come learn this important software layer for building LLMs!

Andrew Ng

192,696 views • 4 months ago

If you have compute hungry AI ML tasks, you can spin up an Intel Gaudi2 instance and watch it just rip. I wrote a quick Python script to use the facebook/detr-resnet-50 model (🙏🏻 Hugging Face) to detect and label objects in a video. It's interesting that no one is talking about Intel Gaudi vs NVIDIA A100 or H100. Gaudi2 isn't cheap at $10.42/hour but it feels like a viable alternative to fighting someone for an A100 or H100. Here's my code if you want to try it out:

Ryan Carson

41,328 views • 2 years ago

Cerebras just had the biggest IPO of the year. Founder Andrew Feldman says the 3 most important things he had to convince investors of while doing the roadshow were that demand for inference is going to 1,000,000x, the GPU isn't the only way to do compute, and that the CUDA moat is overstated. What he said: "Jensen said some time ago on Brad Gerstner's podcast that the demand for inference will grow by a 1,000,000x, and nobody believed him. And at the same time, you saw Sam Altman displaying real vision and going out and trying to lock up huge amounts of compute, memory, data centers, and power, because he saw it too." "[We tried] to share what that means — what exponential demand means. And that we're still so early, and yet the demand for AI compute is overwhelming." "The other thing is that there are lots of ways to do this. The GPU isn't the only way. You've got TPUs, Trainium, and us. There are lots of different ways to build a solution here." "And finally — the notion that CUDA is this grand lock-in is overplayed. Gemini 3, which is an excellent model, was trained on TPUs with no CUDA. The Anthropic models were trained on Trainium with no CUDA. Some of the best models, some of the most interesting things are being done without CUDA. And that lock-in might be overplayed." $CBRS

TBPN

36,474 views • 2 months ago

Excited to announce the new preview for Microsoft Foundry Agents 🎉!

You can now build, run, and deploy your agent using any model, any framework, any harness in the cloud 🧑‍💻 - check out the demo below

This is not just any cloud compute environment; it's an agent-optimized platform with:
🖥️ Persistent microVMs - securely scale up and down without losing context
🛠️ Built-in tools (1000+)
👀 Observability and evaluations
👷 Guardrails
🔐 Private networking
... and more

Jeff Hollan

157,752 views • 3 months ago

The freedom to build with any agent framework you want is analogous to the freedom of religion, and this is a core value for the Virtuals society.

Here are a few ways we are opening up support for every autonomous agentic framework out there:

Today:
- Enabling agent creators to specify which framework the agent runs on
- Release of Terminal API, which allows agent builders using other agent frameworks to stream their thoughts and activities live on their agent pages. For a guide on how to use this, refer to:
- Ability for agents running on any framework to publicly list their capabilities, a key enabler for the agent-to-agent marketplace

Soon:
- Integration into the agentic commerce standard and registry. Agents across frameworks being able to orchestrate and pay for tasks among themselves
- Multi-agent orchestration across frameworks. Imagine building an autonomous business with agents from different religions

Welcome to the Virtuals society.

Virtuals Protocol

216,105 views • 1 year ago

This is Arbigab, the most advanced off-the-shelf, profitable arbitrage trading bot for Polymarket & Kalshi.

Built by senior developers and trading veterans, this software empowers beginners to capture their share of the market!

To get the bot NOW or get more information, visit:

Below is a live trading session using our Rust trading engine for arbitrage on the BTC up/down 5-minute market. It's a sample of 2 iterations.

You can find this terminal on the website front page. It is connected to our production backend (with a 2-minute delay).

Gabagool 22 Trading Bot

18,526 views • 4 months ago

working theory for a new vibe code workflow:

1. create a proof of concept in v0
2. move to github and clone in a local repo
3. use as ref for agent planning
4. create a backend directory to test API plumbing
5. create a frontend directory to prototype iteratively

this way you can repeatedly rip apart the frontend for ideation without worrying about breaking changes to the backend.

i really like asking for an open ended "job to be done" (i.e. place a bet with a single tap) with 5-10 variations to feel. it's the modern day wireframing.

it's still UI slop but i can get ideas in my head out faster and reveal design shortcomings before polishing or worrying about making components functional.

and because the v0 prototype gets you to functional ai features and backend integration fastest, you have both a technical and conceptual reference to use in your agentic planning so your v1 is more predictable with minimal effort.

0xDesigner

14,912 views • 7 months ago

A sneak peek on the multiplayer mode I'm adding to my game for @levelsio's 2026 #vibejam I'm really happy with how this is turning out! Big shoutout to Colyseus ⚔️ for such a nice framework. I fetched the whole docs, created a colyseus-skill.md file and it's pretty much one shotting everything (still need some tweaks and testing, of course) I'm just not very happy with the single character model, it looks weird on multiplayer. I might add more if I get the time.

André → andreelias.dev

10,865 views • 3 months ago