AI SKILLS
Automating AI prompt optimization with Autoresearch and binary eval suites
A new workflow for optimizing AI performance combines Claude Code with the 'autoresearch' methodology pioneered by Andrej Karpathy. It lets users build systems in which AI prompts improve themselves through automated iteration and testing, replacing manual trial-and-error.
The process starts by defining binary 'yes/no' evaluation criteria (an 'eval suite') and letting an agent generate multiple outputs every few minutes. The system scores these outputs, mutates the prompt in search of better versions, and keeps the winner for the next cycle, as sketched below. The approach has shown strong results in real-world tests, such as reducing website load times from 1,100 ms to 67 ms over 67 automated experiments, at a fraction of the cost of manual engineering.
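The core loop is simple enough to sketch in a few lines of Python. This is a minimal illustration under my own assumptions, not the author's implementation: `generate_output` and `mutate_prompt` are hypothetical stand-ins for real model calls, and scoring is just the fraction of binary checks an output passes.

```python
import random

# Hypothetical stand-ins for real model calls; a production setup would
# invoke an LLM (e.g. via Claude Code) in both places.
def generate_output(prompt: str) -> str:
    return prompt  # pretend the model echoes the prompt back

def mutate_prompt(prompt: str) -> str:
    # The usual move is asking a model to rewrite the prompt;
    # here we just append a random tweak to keep the sketch runnable.
    return prompt + random.choice([" Be concise.", " Think step by step."])

def score(output: str, eval_suite) -> float:
    """Fraction of binary yes/no checks this output passes."""
    return sum(bool(check(output)) for check in eval_suite) / len(eval_suite)

def optimize(prompt: str, eval_suite, cycles: int = 50, samples: int = 4) -> str:
    """Hill-climb on the prompt: mutate, score candidates, keep the winner."""
    best, best_score = prompt, 0.0
    for _ in range(cycles):
        candidate = mutate_prompt(best)
        outputs = [generate_output(candidate) for _ in range(samples)]
        avg = sum(score(o, eval_suite) for o in outputs) / samples
        if avg > best_score:  # the winner carries over to the next cycle
            best, best_score = candidate, avg
    return best
```

In practice these cycles run unattended every few minutes; the engineering effort goes into the eval suite, not the loop itself.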
- Uses Andrej Karpathy's autoresearch methodology to automate prompt engineering.
- Relies on binary 'yes/no' evaluation criteria to avoid the noise of 1-7 rating scales (a minimal example follows this list).
- System generates outputs, evaluates them, and mutates the prompt to find 'winners'.
- Achieved a 97.5% success rate in diagram generation tasks within a few test runs.
- Each testing cycle can cost as little as $0.20 while producing near-perfect prompts.
- The method prevents models from 'gaming' the evaluation by keeping criteria simple and binary.
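To make the binary-evals point concrete, here is what a small eval suite might look like for a diagram-generation task. The specific checks are invented for this sketch; what matters is that each one returns a hard pass/fail, leaving no fuzzy middle ground for the model to exploit the way a 1-7 scale would.

```python
# A hypothetical eval suite for a Mermaid-diagram task: every check is a
# strict yes/no predicate, so there is no rating scale to game.
eval_suite = [
    lambda out: out.strip().startswith("graph"),  # valid Mermaid header?
    lambda out: "-->" in out,                     # contains at least one edge?
    lambda out: len(out.splitlines()) <= 40,      # within the size budget?
]

# Success rate is simply the fraction of outputs passing every check.
outputs = ["graph TD\nA --> B", "Here is your diagram!"]
passed = sum(all(check(o) for check in eval_suite) for o in outputs)
print(f"success rate: {passed / len(outputs):.1%}")  # -> 50.0%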
Why it matters
This approach shifts prompt engineering from a subjective art form to an automated engineering discipline. It allows developers to achieve extremely high reliability in AI outputs without the hundreds of hours usually required for manual prompt tuning.