Greg Kamradt

@GregKamradt • 47,262 subscribers

President @arcprize, builder/engineer

Shorts

my new vibe code setup: 1 orchestrator agent which controls 85 sub-agents working in parallel. Each sub-agent spawns from my stream of consciousness and tests from the main orchestrator. Here's how it works:

316,172 views

315x1 Bench - New 1RM. Been chasing this one for 4-5 months. Now time to move on to the next goal.

46,514 views

Sam Whitmore crushing it at AI Memory. She is stepping through the evolution of memory she’s built at New Computer. Recording coming soon - absolutely gold.

20,562 views

Videos

How good is GPT-4-Vision at extracting text from images?

I wanted to find the limit - but I found weirdness instead.

Most surprising: GPT-4V performance varies depending on the *structure* of text it sees.

Let me explain.

A set of images with progressively more text was presented to GPT-4-Vision. GPT-4V was asked what text it saw in the image. The response from the model was compared against the image’s original text and scored for similarity. The model was tested with 4 types of text: essay, random words, random tokens, and random characters.

Findings:
* Performance degrades - Yes, the models are good at basic OCR, but as you get more text and words then performance drops (this is expected)
* Type of context matters - You should expect different recall on your texts based on your context types
* Hallucination Errors - I thought that the model would make errors of omission (it wouldn’t return all the words). But instead the model mostly made hallucination errors - it replaced words with made up words.
* Evals Matter - This test in isolation doesn’t mean that your data will have the same results, but it should motivate you to create eval tests for your data and anticipate errors which are hard to spot

Notes:
* Next step would be to add additional image types like tables or PDFs
* GPT-4V would routinely get stuck in repeat-token-loops when trying to extract random tokens
* GPT-4V would refuse to answer most random character images
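The post does not say which similarity metric or error breakdown was used, so the following is only a rough sketch of how the scoring step could look. It assumes Python's standard `difflib` for the comparison; the function names `similarity` and `classify_errors` are hypothetical and chosen for illustration, and the split into "omitted" versus "hallucinated" words is one possible reading of the findings above, not the author's method.

```python
from difflib import SequenceMatcher


def similarity(expected: str, extracted: str) -> float:
    """Return a 0..1 similarity ratio between the source text and the model output."""
    # Normalize case and whitespace so layout differences are not scored as errors.
    a = " ".join(expected.lower().split())
    b = " ".join(extracted.lower().split())
    return SequenceMatcher(None, a, b).ratio()


def classify_errors(expected: str, extracted: str) -> dict:
    """Count source words that were dropped versus output words not in the source."""
    exp_words = expected.lower().split()
    out_words = extracted.lower().split()
    matcher = SequenceMatcher(None, exp_words, out_words)
    counts = {"omitted": 0, "hallucinated": 0}
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "delete":
            # Source words with no counterpart in the output.
            counts["omitted"] += i2 - i1
        elif tag == "insert":
            # Output words with no counterpart in the source.
            counts["hallucinated"] += j2 - j1
        elif tag == "replace":
            # Substituted words count as hallucinations; any extra
            # source words in the replaced span count as omissions.
            counts["hallucinated"] += j2 - j1
            counts["omitted"] += max(0, (i2 - i1) - (j2 - j1))
    return counts


if __name__ == "__main__":
    source = "the quick brown fox"
    print(similarity(source, "the quick green fox"))
    print(classify_errors(source, "the quick green fox"))
```

Running both functions over each image's original text and the model's response would give a per-image score plus a rough sense of whether the misses are omissions or substitutions.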

Greg Kamradt

49,109 views • 2 years ago