There are other, better-defined intelligence tests, such as the Winograd Schema Challenge [1], which asks questions that require real-world knowledge and common sense to answer, for example:
"In the sentence 'The trophy would not fit in the suitcase because it was too big/small.' what does big/small refer to?"
But even these types of questions, which were still being described as "The Sentences Computers Can't Understand, But Humans Can" as late as 2020 [2], seem to be easy for LLMs to answer, with GPT-4 topping the leaderboard at near-human accuracy [3].
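It's easy to check this for yourself. Here is a minimal sketch, assuming the openai Python client and a GPT-4-class model (the model name and prompt wording are illustrative, not from the benchmark):

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    PROMPT = ("In the sentence 'The trophy would not fit in the suitcase "
              "because it was too {adj},' what does 'it' refer to? "
              "Answer with one word.")

    for adj in ("big", "small"):
        resp = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": PROMPT.format(adj=adj)}],
        )
        # A model with common-sense coreference should answer
        # 'trophy' for "big" and 'suitcase' for "small".
        print(adj, "->", resp.choices[0].message.content)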
So, assuming ChatGPT isn't an AGI (yet), how can we prove it to ourselves? At what linguistic tasks are humans still qualitatively better than LLMs?
------
[0] https://en.wikipedia.org/wiki/Eugene_Goostman

[1] https://en.wikipedia.org/wiki/Winograd_schema_challenge

[2] https://www.youtube.com/watch?v=m3vIEKWrP9Q

[3] https://paperswithcode.com/sota/common-sense-reasoning-on-winogrande