Average success rate across all puzzles and seeds for baseline models and LLMs, broken down by puzzle category (CoT indicates the use of chain-of-thought prompting). Categories increase in difficulty from yellow to green to blue to purple. Credit: arXiv (2024). DOI: 10.48550/arxiv.2404.11730

Can artificial intelligence (AI) match human skills for finding obscure connections between words? Researchers at NYU Tandon School of Engineering turned to the daily Connections puzzle from The New York Times to find out. Connections gives players five attempts to group 16 words into four …