MLX + TurboQuant = Local Super Power

Take a local (private) document or codebase, pre-fill the 256k KV cache (the context) with the document(s) and the system prompt, quantise, and run on Apple's MLX, and you have almost instantaneous, lossless document queries with total privacy. For a 75-page PDF (some...
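A rough sketch of why quantising the KV cache matters at this context length. The model shape below (32 layers, 8 KV heads, head dimension 128) and the 4-bit width are assumptions picked for illustration, not figures from the post, and "256k" is read as 256 × 1024 tokens. The estimate ignores the small overhead of quantisation scales.

```python
def kv_cache_bytes(tokens, layers, kv_heads, head_dim, bits_per_value):
    """Approximate KV cache size: one key and one value vector per layer, head and token."""
    values = 2 * layers * kv_heads * head_dim * tokens  # 2 = keys + values
    return values * bits_per_value // 8

# Hypothetical model shape, chosen only for illustration.
TOKENS = 256 * 1024
LAYERS, KV_HEADS, HEAD_DIM = 32, 8, 128

fp16 = kv_cache_bytes(TOKENS, LAYERS, KV_HEADS, HEAD_DIM, 16)
q4 = kv_cache_bytes(TOKENS, LAYERS, KV_HEADS, HEAD_DIM, 4)
print(f"fp16: {fp16 / 2**30:.1f} GiB, 4-bit: {q4 / 2**30:.1f} GiB")
# prints "fp16: 32.0 GiB, 4-bit: 8.0 GiB"
```

Under these assumptions a full 256k-token cache shrinks from 32 GiB to 8 GiB, which is the difference between not fitting and fitting in the unified memory of many Apple machines.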

85,589 views • 2 months ago • via X (Twitter)

Related videos

The same kinds of productivity gains we've seen in coding with AI agents are heading to the rest of knowledge work. This is the jump from having a chatbot to having an agent that goes off and does work for minutes or even hours, then comes back with a complete work output for you to review.

Here's an example of the new Box Agent filling out an RFP response from an existing knowledge base. This process would normally take hours and requires the full attention of the user doing the work. Now, you provide the Box Agent with the RFP questions, and it will go off, make a plan, extract all the relevant questions, read through existing source material to come up with answers, and then generate a new Word document as the final output. All while you're doing something else.

The key to this architecture is that the agent can use all of the same tools in the background that a user uses to get work done. The agent can search for documents, read entire files, run scripts and tools in the background, and even write code on the fly to automate tasks it hasn't seen before. And best of all, the Box Agent will (soon) work from the Box MCP and CLI so you can invoke it in any agentic system as a step in a process.

This kind of agent complexity would have been impossible even 6 months ago. Models consistently failed at tracking long-running tasks or using the right tools at the right moment for the task. But this is all now possible because of models like GPT-5.4, Opus 4.6, and Gemini 3, and is only getting better by the month.

Just as we moved from engineers writing code and using AI as an assistant to answer questions, in many areas of knowledge work (like legal, finance, consulting, sales, marketing, and more), when we have a problem we'll just kick off the AI agent to go work on it for us in the background.

Aaron Levie

24,618 views • 2 months ago

This Chinese developer runs 9 agents on Claude Code under a GPT-5.5 orchestrator, and they close 500 client tasks a month without a single assistant. His client work is closed without him, on a single laptop and only three subscriptions.

The entire system lives on one MacBook Pro M4 with 128 GB of memory, and the subscriptions to Claude Code and GPT-5.5 cost him approximately $300 a month. There is no CRM, no team, no office, only a terminal window with 9 parallel streams.

The orchestrator works with a simple system prompt:

«You are the orchestrator of a client inbox. Classify every incoming email into 4 categories: code, content, analysis, communication. Delegate to the corresponding worker agent. When the result is ready, check it for completeness, send it to the client on my behalf, and mark the task as closed. Do not ask clarifying questions.»

The orchestrator checks the inbox every 30 seconds, classifies fresh emails, and distributes them to 9 worker agents on Claude Code, each of which is responsible for its own class of tasks. Here is an example of how one of them closes a request to refactor a client's auth module:

«Task: refactor user-auth module
Broke the monolith into 3 files by responsibility
Added unit tests, coverage increased to 87%
Renamed 4 functions to camelCase according to the style guide
PR is ready for review, link below»

And so it goes, about 50 cycles a day. By noon 25 tasks are closed, by dinner 50, and by the end of the month 500. On average, it takes about 7 minutes from an email appearing in the inbox to the result being sent to the client. This is more than a live team of 6 developers, copywriters and analysts working 8 hours a day closes.

This is no longer an agency. This is a workstation where an orchestrator replaces a manager, and 9 worker agents replace the staff. The pipeline goes from inbox to closing 500 times a month without human participation at any step.
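A minimal sketch of the classify-and-delegate loop the post describes. Everything here is a stand-in: the keyword table replaces the orchestrator model, the lambda workers replace the Claude Code agents, and the 30-second inbox polling, completeness check, and email sending are left out.

```python
from collections import defaultdict

CATEGORIES = ("code", "content", "analysis", "communication")

# Hypothetical keyword rules standing in for the LLM classifier.
KEYWORDS = {
    "code": ("refactor", "bug", "module"),
    "content": ("article", "blog", "copy"),
    "analysis": ("report", "metrics", "forecast"),
}

def classify(email_text):
    """Return one of the four categories; 'communication' is the fallback."""
    text = email_text.lower()
    for category, words in KEYWORDS.items():
        if any(word in text for word in words):
            return category
    return "communication"

def dispatch(inbox, workers):
    """Route each email to the worker for its category and collect the results."""
    closed = defaultdict(list)
    for email in inbox:
        category = classify(email)
        closed[category].append(workers[category](email))
    return dict(closed)

# Stub workers: a real system would call an agent here.
workers = {c: (lambda email, c=c: f"[{c}] done: {email}") for c in CATEGORIES}

inbox = ["Please refactor user-auth module", "Can we move our call to Friday?"]
results = dispatch(inbox, workers)
print(results)
```

Keeping the classifier and the workers behind plain function interfaces means either one can be swapped for a model call without touching the routing logic.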

Blaze

29,917 views • 1 month ago

✨ I spent the last 48 hours making GPT-4 read the entire Solana validator codebase and write documentation, so doesn't have to.

Introducing an AI-powered chatbot trained on nothing but code that can answer deep technical questions. How it works 👇

But first... A huge shoutout to , Zahid Khawaja, and Sean. Their hard work made prototyping this thing a breeze. Without further ado...

Devs like to write code, not documentation. Tribal knowledge is lost when devs move on to other projects, leaving future devs to sort through mountains of code and figure out not just how it works, but why it works that way. This is all about to change.

GPT-4's ability to write code is stunning. It seems to understand something fundamental about writing software that previous models just didn't. This comprehension of the principles that drive the design behind a complex system carries over into its ability to document existing codebases in a truly impressive way.

With the enlarged context window(s), it's now feasible to feed GPT-4 entire files of code and ask it to write documentation about how the code works. Taking this as a starting point, the process looks something like this:

1. Download repo.
2. Depth-first traversal of repo contents, ignoring things like package-lock and binary files.
3. For each file, feed to GPT-4 and ask it to write documentation in markdown.
4. Save the output in a separate location as [outputRoot]/[inputFilepath][inputFilename].md
5. For each folder, we ask GPT-4 to write a summary of the folder, taking the newly generated documentation for all files in the folder and the summaries from each of its subfolders as context. Write this to the filesystem as markdown.

Now we have a filesystem that matches the structure of the input repo, but all files in the tree are markdown documentation of the corresponding code file. From here, we:

1. Load markdown documents into LangChain.
2. Embed all documents via OpenAI embeddings.
3. Store embeddings in Pinecone.
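The documentation pass (steps 2 to 4 of the first list) could look roughly like the sketch below. The ignore lists and the `write_docs` callback are assumptions: the post names only package-lock and binary files, and `write_docs` is a hypothetical stand-in for the GPT-4 call. The per-folder summaries of step 5 are omitted.

```python
import os
import tempfile

# Assumed ignore rules; the post only names package-lock and binary files.
IGNORED_NAMES = {"package-lock.json", ".git"}
BINARY_EXTENSIONS = {".png", ".jpg", ".so", ".bin"}

def iter_source_files(root, rel=""):
    """Depth-first walk yielding repo-relative paths of files worth documenting."""
    for name in sorted(os.listdir(os.path.join(root, rel))):
        if name in IGNORED_NAMES:
            continue
        rel_path = os.path.join(rel, name)
        if os.path.isdir(os.path.join(root, rel_path)):
            yield from iter_source_files(root, rel_path)  # descend before moving on
        elif os.path.splitext(name)[1] not in BINARY_EXTENSIONS:
            yield rel_path

def doc_path(output_root, rel_path):
    """Mirror the input path under output_root and append .md."""
    return os.path.join(output_root, rel_path + ".md")

def document_repo(root, output_root, write_docs):
    """Write one markdown file per source file; write_docs stands in for the model."""
    written = []
    for rel_path in iter_source_files(root):
        with open(os.path.join(root, rel_path), encoding="utf-8", errors="replace") as f:
            source = f.read()
        target = doc_path(output_root, rel_path)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        with open(target, "w", encoding="utf-8") as f:
            f.write(write_docs(rel_path, source))
        written.append(target)
    return written

# Tiny throwaway repo to exercise the walk.
with tempfile.TemporaryDirectory() as tmp:
    repo = os.path.join(tmp, "repo")
    os.makedirs(os.path.join(repo, "src"))
    samples = {"src/main.rs": "fn main() {}", "package-lock.json": "{}", "logo.png": ""}
    for rel, text in samples.items():
        with open(os.path.join(repo, rel), "w", encoding="utf-8") as f:
            f.write(text)
    written = document_repo(repo, os.path.join(tmp, "docs"), lambda path, src: f"# {path}\n")
    relative = [os.path.relpath(p, tmp) for p in written]
    print(relative)
```

Because the output tree mirrors the input tree, a later folder-summary step can find each folder's generated documentation by path alone.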
When a user sends a query:

1. Embed query.
2. Find k-nearest markdown files.
3. Feed to GPT-4 with a prompt asking to answer the query based on k-nearest markdown documents provided.

The craziest part of all this? GPT-4 actually wrote ~30% of the code.

The results are pretty good for 2 days of work. There is certainly room for improvement. Some items that are top of mind:

1. TolyGPT will occasionally hallucinate answers. It is especially bad with links to external sources, like GitHub. The base model seems to know a bit about Solana already, and sometimes this creeps in. Fine-tuning the prompt can solve some of this.
2. Context selection is difficult in a codebase this large. For example, sometimes it will pull in details about the Solana SDK when asked about transaction processing. The SDK files can seem relevant depending on the phrasing of the question. It may be worth breaking the documentation into subsystems to limit this.
3. Not all files fit into the 32k token window. As of now, there are 23 (out of ~1,100) files that cannot be documented in their entirety. Some of these files are very important to how Solana works.

Final thoughts:

1. GPT-4 is super powerful, and we're going to see a ton of tools that supercharge the entire software development lifecycle. This is not 12 months away. For the people that can afford it, these tools are here now. And they're only getting better. Act accordingly.
2. The price of inference has to come down for this to go mainstream. I spent about $300 prototyping this project, and the final crawl cost about the same. The high cost of GPT-4 will push developers to other, cheaper alternatives with similar performance. This is coming very soon.

If you have a large software project and you're interested in something like this for your codebase, fill out this form and we'll be in touch this week. Or just DM me :)
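The query-time steps from this post (embed the query, find the k nearest markdown files, build a prompt from them) can be sketched without any external service. This is not the author's code: a toy bag-of-words vector stands in for OpenAI embeddings, an in-memory dict stands in for Pinecone, and the file names and document texts are invented for illustration.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words embedding; a stand-in for a real embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def k_nearest(query, docs, k):
    """Return the names of the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(docs, key=lambda name: cosine(q, embed(docs[name])), reverse=True)
    return ranked[:k]

def build_prompt(query, docs, k=2):
    """Assemble the context block that would be sent to the chat model."""
    context = "\n\n".join(f"## {name}\n{docs[name]}" for name in k_nearest(query, docs, k))
    return f"Answer using only the documentation below.\n\n{context}\n\nQuestion: {query}"

# Invented documentation snippets, not real project docs.
docs = {
    "runtime/bank.rs.md": "the bank processes each transaction and updates account state",
    "sdk/pubkey.rs.md": "pubkey is a 32 byte public key type used across the sdk",
    "gossip/cluster_info.rs.md": "cluster info spreads validator contact data over gossip",
}

top = k_nearest("how is a transaction processed by the bank", docs, k=2)
print(top)
```

Even in this toy setup the SDK snippet ranks second purely on shared filler words, a small-scale version of the context-selection problem described in the post.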

Sam Hogan 🇺🇸

374,577 views • 3 years ago