Performance of humans (purple), GPT-4 (dark blue), GPT-3.5 (light blue) and LLaMA2-70B (green) on the battery of theory of mind tests. a, Original test items for each test, showing the distribution of test scores for individual sessions and participants. b, Interquartile ranges of the average scores on the original published items (dark colors) and novel items (pale colors) across each test.

Credit: Nature Human Behaviour (2024). DOI: 10.1038/s41562-024-01882-z

An international team of psychologists and neurobiologists has found via experimentation that two types of LLMs are able to equal or outperform… Read More