Martian

@withmartian • 3,502 subscribers

Understanding Intelligence. Measurement. Explanation. Application. That's how we're tackling AI interpretability: the greatest scientific problem of our age.

Shorts

🚀 Introducing The LLM Inference Provider Leaderboard - a live-updated, unbiased eval of API inference products. Featuring: Abacus.AI, Anyscale, DeepInfra, Decart, Fireworks AI, Lepton AI, Together AI, Perplexity, Replicate, as well as OpenAI and Anthropic models. For each provider's Mixtral-8x7B and Llama-2-70B-Chat public endpoint, we benchmark cost, rate limit, P50 & P90 of throughput & TTFT, and average daily collections over time for long-term tracking. At Martian, we route each API request to the best LLM to reduce cost, reduce latency, and get the best performance, so finding the best providers is an important problem for us. We found a > 5x cost difference, > 6x throughput variation, and even larger rate limit discrepancies among providers! Choosing between different LLMs is only part of the equation -- the selection of inference endpoint is also crucial to get the best performance for your use case. See highlights of provider performance in 🧵👇

128,898 views

Videos

Announcing Ship: an endpoint with the highest intelligence per dollar of any frontier model. Today, Ship makes using Opus and GPT 5.6 Sol 50% cheaper, guaranteed by a quality SLA.

616,345 views • 1 day ago

Introducing Code Review Bench v0: The first independent code review benchmark. 200,000+ PRs. Unbiased. Fully OSS. Updated daily. Tool performance highlights 🧵👇 Featuring: Augment Code, baz, Claude, CodeRabbit, Cursor, Google Gemini, GitHub, Graphite, Greptile, Kilo, OpenAI Developers, Propel, Qodo

220,764 views • 4 months ago

The software factory is already here. We're seeing bots write code, bots review it, and humans reduced to dispatching the next tool in the chain. Using 500k+ PRs from Code Review Bench, we looked at one question: can the human leave the loop yet?

49,102 views • 3 months ago

How good is Claude Code Review really, and is it worth $25+ per review? We scraped every OSS repo on GitHub that's using it to figure out how devs actually use it. Here's how it stacks up against 22 other tools. Featuring: Augment Code, baz, CodeAnt AI (YC W24), CodeRabbit, Cognition, cubic, Cursor, Google Gemini, Greptile, Kilo, Kody from Kodus, Mesa, Qodo

49,312 views • 4 months ago

We've been tracking AI code review tools across OSS, and a new category is emerging. We're calling it "Deep Review": → Standard AI review: PR-level, fast, human in the loop → Deep Review: repo-wide context, runs autonomously in the background 🧵👇

40,442 views • 3 months ago

We’re open-sourcing a new tool to control how LLMs behave: k-steering. In just 10 lines of code, you can control multiple aspects of LLM behavior at the same time without any fine-tuning or prompt engineering. Here's how 👇

12,293 views • 3 months ago

Excited to make two announcements today! -- 1. We're partnering with Accenture to power their LLM Switchboard and >$1B in Gen AI Deployments. 2. We're launching Airlock, our LLM Compliance Automation tool.

11,056 views • 1 year ago