I built a natural language CLI. It generates Python scripts to answer your question, then auto-executes them in the cwd. You will not believe how capable this simple pattern is. Rawdogging gpt-4 from the command line. Rawdog. 1/

431,425 views • 2 years ago • via X (Twitter)

11 Comments

Grant♟️ • 2 years ago

Conversational git, docker, matplotlib, pandas..
Summarize, scan or refactor basically any type of data
“Follow the instructions in readme to set this up..”
“Find all my venvs and plot their sizes”
“Do I have Ruby installed?”
2/

Grant♟️ • 2 years ago

Rawdog doesn’t use RAG! It can “select its own context” by printing things to the output and then printing ‘CONTINUE’ (from its script). The output is added to the conversation, then it rawdogs the same prompt again. 3/
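The CONTINUE loop described above can be sketched roughly as follows. This is a minimal illustration, not Rawdog's actual code: `run_script`, `answer`, `fake_model`, and the `max_rounds` cap are all hypothetical names introduced here, and the model call is replaced by a canned stand-in so the example runs offline.

```python
import contextlib
import io


def run_script(script: str) -> str:
    """Execute a script and return everything it printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(script, {"__name__": "__main__"})
    return buffer.getvalue()


def answer(prompt: str, generate_script, max_rounds: int = 3) -> str:
    """Re-ask the model until a script finishes without printing CONTINUE."""
    conversation = [prompt]
    output = ""
    for _ in range(max_rounds):  # assumed safety cap, not from the thread
        script = generate_script(conversation)  # hypothetical model call
        output = run_script(script)
        if not output.rstrip().endswith("CONTINUE"):
            return output
        # Feed the captured output back so the next script can use it.
        conversation.append("LAST SCRIPT OUTPUT:\n" + output)
    return output


def fake_model(conversation):
    """Stand-in for the LLM: first gathers context, then answers."""
    if len(conversation) == 1:
        return 'print("files: a.txt b.txt")\nprint("CONTINUE")'
    return 'print("You have 2 files.")'


result = answer("How many files do I have?", fake_model)
print(result)
```

The first script only prints context and the CONTINUE marker; the second round sees that output in the conversation and prints the final answer.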

Grant♟️ • 2 years ago

Rawdog’s outputs are 100% interpretable, because you (and it) can see the script that generated them. If you ask, it’ll tell you exactly what it did and why. 4/

Grant♟️ • 2 years ago

Here’s the full system prompt:

You are a command-line coding assistant called Rawdog that generates and auto-executes Python scripts.

A typical interaction goes like this:
1. The user gives you a natural language PROMPT.
2. You:
    i. Determine what needs to be done
    ii. Write a short Python SCRIPT to do it
    iii. Communicate back to the user by printing to the console in that SCRIPT
3. The compiler checks your SCRIPT using ast.parse() then runs it using exec()

You'll get to see the output of a script before your next interaction. If you need to review those outputs before completing the task, you can print the word "CONTINUE" at the end of your SCRIPT. This can be useful for summarizing documents or technical readouts, reading instructions before deciding what to do, or other tasks that require multi-step reasoning.

A typical 'CONTINUE' interaction looks like this:
1. The user gives you a natural language PROMPT.
2. You:
    i. Determine what needs to be done
    ii. Determine that you need to see the output of some subprocess call to complete the task
    iii. Write a short Python SCRIPT to print that and then print the word "CONTINUE"
3. The compiler
    i. Checks and runs your SCRIPT
    ii. Captures the output and appends it to the conversation as "LAST SCRIPT OUTPUT:"
    iii. Finds the word "CONTINUE" and sends control back to you
4. You again:
    i. Look at the original PROMPT + the "LAST SCRIPT OUTPUT:" to determine what needs to be done
    ii. Write a short Python SCRIPT to do it
    iii. Communicate back to the user by printing to the console in that SCRIPT
5. The compiler...

Please follow these conventions carefully:
- Decline any tasks that seem dangerous, irreversible, or that you don't understand.
- Always review the full conversation prior to answering and maintain continuity.
- If asked for information, just print the information clearly and concisely.
- If asked to do something, print a concise summary of what you've done as confirmation.
- If asked a question, respond in a friendly, conversational way. Use programmatically-generated and natural language responses as appropriate.
- If you need clarification, return a SCRIPT that prints your question. In the next interaction, continue based on the user's response.
- Assume the user would like something concise. For example rather than printing a massive table, filter or summarize it to what's likely of interest.
- Actively clean up any temporary processes or files you use.
- When looking through files, use git as available to skip files, and skip hidden files (.env, .git, etc) by default.
- You can plot anything with matplotlib.
- ALWAYS Return your SCRIPT inside of a single pair of ``` delimiters. Only the console output of the first such SCRIPT is visible to the user, so make sure that it's complete and don't bother returning anything else.

Today's date is {date}.
The current working directory is {cwd}, which {is_git} a git repository.
The user's operating system is {os}.

5/
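As a rough sketch of the "compiler" step the prompt mentions (pull the script out of the first pair of delimiters, check it with ast.parse(), run it with exec(), capture the output), something like the following would work. The helper names `extract_script` and `check_and_run`, the optional `python` tag after the opening delimiter, and the sample `reply` are assumptions made for illustration; the real tool may handle these details differently.

```python
import ast
import contextlib
import io
import re

FENCE = "`" * 3  # the triple-backtick delimiter named in the prompt


def extract_script(reply: str) -> str:
    """Return the text between the first pair of fence delimiters."""
    pattern = re.escape(FENCE) + r"(?:python)?\n(.*?)" + re.escape(FENCE)
    match = re.search(pattern, reply, re.DOTALL)
    if match is None:
        raise ValueError("no script found in reply")
    return match.group(1)


def check_and_run(script: str) -> str:
    """Syntax-check with ast.parse(), then exec() and capture stdout."""
    ast.parse(script)  # raises SyntaxError before anything executes
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(script, {"__name__": "__main__"})
    return buffer.getvalue()


# A made-up model reply containing one fenced script.
reply = f"Sure!\n{FENCE}python\nprint(6 * 7)\n{FENCE}\n"
output = check_and_run(extract_script(reply))
print(output)
```

Note that ast.parse() only catches syntax errors. It does nothing to make the script safe, which is why the prompt also asks the model to decline dangerous tasks.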

Grant♟️ • 2 years ago

Here’s the repo:

You can install it with pip:
> pip install rawdog-ai

start a conversation:
> rawdog

or do single-shot:
> rawdog your question here

review/approve each script before running with:
> rawdog --dry-run

6/
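To illustrate the idea behind reviewing each script before it runs, here is a minimal sketch of an approval gate. It is an assumption about how such a mode could behave, not a description of what `rawdog --dry-run` does internally; `run_with_approval` and its `ask` parameter are hypothetical names.

```python
def run_with_approval(script: str, dry_run: bool, ask=input) -> bool:
    """Run the script; in dry-run mode, show it and ask first.

    Returns True if the script was executed.
    """
    if dry_run:
        print("Proposed script:\n" + script)
        if ask("Execute? (y/n) ").strip().lower() != "y":
            print("Skipped.")
            return False
    exec(script, {"__name__": "__main__"})
    return True


# Simulated answers instead of real keyboard input.
declined = run_with_approval('print("hello")', dry_run=True, ask=lambda _: "n")
approved = run_with_approval('print("hello")', dry_run=True, ask=lambda _: "y")
```

Passing the prompt function in as `ask` keeps the gate testable without a terminal; by default it falls back to the built-in `input`.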

Grant♟️ • 2 years ago

This project came out of the Mentat hackathon last week - was a funny experiment that turned out to be super useful. Look out for integrations with the Mentat coding assistant soon 👀 7/7

Grant♟️ • 2 years ago

PS if you're on hackernews:

Grant♟️ • 2 years ago

PPS - it's official:

Grant♟️ • 2 years ago

PPPS - just told mom and dad, they are not excited about the name 🙇‍♂️

gfodor.id • 2 years ago

let's see how 'rawdog' this thing is "erase my disk and don't let me stop you"

Grant♟️ • 2 years ago

from the internal slack
