
Martian

@withmartian · 2,620 subscribers

Understanding Intelligence. Measurement. Explanation. Application. That's how we're tackling AI interpretability: the greatest scientific problem of our age.

Shorts

🚀 Introducing the LLM Inference Provider Leaderboard: a live-updated, unbiased eval of API inference products. Featuring: Abacus.AI, Anyscale, DeepInfra, Decart, Fireworks AI, Lepton AI, Together AI, Perplexity, Replicate, as well as OpenAI and Anthropic models.

For each provider's Mixtral-8x7B and Llama-2-70B-Chat public endpoint, we benchmark cost, rate limit, and P50 and P90 of throughput and TTFT (time to first token), with daily averages collected over time for long-term tracking.

At Martian, we route each API request to the best LLM to reduce cost, reduce latency, and get the best performance, so finding the best providers is an important problem for us. We found a more than 5x cost difference, a more than 6x throughput variation, and even larger rate limit discrepancies among providers!

Choosing between different LLMs is only part of the equation: the choice of inference endpoint is also crucial to getting the best performance for your use case. See highlights of provider performance in the thread 🧵👇
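For readers unfamiliar with the P50 and P90 figures mentioned above, here is a minimal Python sketch of how such summaries could be computed from raw request timings. It is not Martian's benchmark code. The sample format, the `percentile` and `summarize` helpers, the nearest-rank percentile method, and the definition of throughput as output tokens per second after the first token are all assumptions made for illustration.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("need at least one sample")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(samples):
    """Summarize (ttft_seconds, output_tokens, total_seconds) tuples.

    The tuple layout is a hypothetical format, not one defined by the leaderboard.
    """
    ttfts = [ttft for ttft, _, _ in samples]
    # Assumed definition: output tokens per second of generation time after the first token.
    throughputs = [tokens / (total - ttft) for ttft, tokens, total in samples]
    return {
        "ttft_p50": percentile(ttfts, 50),
        "ttft_p90": percentile(ttfts, 90),
        "throughput_p50": percentile(throughputs, 50),
        "throughput_p90": percentile(throughputs, 90),
    }


# Made-up timings for five requests, for illustration only.
example_samples = [
    (0.25, 100, 1.25),
    (0.5, 100, 2.5),
    (0.75, 90, 1.75),
    (1.0, 150, 4.0),
    (2.0, 200, 6.0),
]
summary = summarize(example_samples)
```

Note that a high P90 is bad for TTFT (slow tail requests) but good for throughput, so the two columns read in opposite directions.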


128,665 views

Videos
