The Opinion piece on AI by Lionel, for the Times Higher Education magazine

The dialogue about AI must be cross-disciplinary

Advances in machine learning have been spectacular in the last five years. This form of artificial intelligence has led to significant progress in areas such as autonomous driving, automated text generation and machine translation (to and from multiple languages). Google Translate is the most obvious example of the last of these. Yet useful as it is, it still makes some fairly basic mistakes. For example, it correctly renders “the window that I have shut” into French as “la fenêtre que j'ai fermée”, but incorrectly translates “the key that I have found” as “la clé que j'ai trouvé”.

Anyone with a French A level will tell you that, when the auxiliary verb is avoir, the past participle must agree with the direct object if that object precedes the verb. “Clé” is feminine, so the extra ‘e’ is needed on the end of “trouvé”. Testing with similar examples gives a phrase translation accuracy of about 50 per cent, which isn’t great.

To someone like me, who has been working in machine learning (ML) for the past 30 years, this is not surprising. Translation is only as good as the data fed to the ML algorithm during the learning phase. Google Translate has no understanding of French grammar: it learns through brute repetition of exemplar sequences. Evidently there are not enough examples in Google’s training data of phrases with feminine nouns as objects preceding avoir for the correct translation to be given every time.

I have on my bookshelves a book I bought in the late 1980s, when I started experimenting with machine learning (“artificial neural networks” or “connectionism”, as the field was then called). In this book, Thinking Machines, the authors described the Chinese Room thought experiment, proposed by the philosopher John Searle in 1980. Strings of Chinese characters (“input questions”) are passed under the room’s door. By following the instructions from a computer program for correctly manipulating Chinese symbols, Searle, who does not speak Chinese, is able to send the appropriate sequence of Chinese characters back out under the door (“output answers”), thereby convincing observers outside the room that there is a Chinese speaker inside the room.

At the end of his thought experiment, Searle asks whether the computer program could be said to understand Chinese (“strong AI”) or whether it just simulated that ability (“weak AI”). As my experience with Google Translate reveals, such a question is still relevant now, even if today’s data-driven ML algorithms are entirely different from the symbol-manipulating programs of the early 1980s.

Within the ML community, the focus is almost entirely on building ever more impressive demonstrators, such as the work by DeepMind researchers on game-playing machines. In 2016, their AlphaGo ML algorithm was able to beat the world’s best Go player. AlphaGo Zero and AlphaZero then went beyond AlphaGo by generating their own training datasets, using a combination of deep neural networks, reinforcement learning and game-specific representations to achieve “super-human” performance. By playing millions of games against each other, two AlphaZero machines explored a huge space of possibilities and were able to make moves that a human player could not have foreseen. But AlphaZero has no more understanding of Go than Google Translate has of the French language or its grammar.

The most powerful ML model today, GPT-3, is used in hundreds of text-generating apps, such as chatbots, producing nearly 5 billion words a day. But does GPT-3 understand the text it automatically generates? There has been extraordinary progress in learning algorithms, computational hardware and size of training data, but are we any closer to building thinking machines than we were 30 years ago (whether we call these strong AI, artificial general intelligence or superintelligence)? What is demonstrated by the ability to learn how to translate languages, play intellectually demanding games or generate text automatically in response to prompts? The remarkable success of weak AI? Or the first hint of strong AI?

Such a debate should really be taking place within higher education, especially as in-person seminars and workshops resume. Emily Bender, a linguist from the University of Washington, last year updated the Chinese Room thought experiment with her “octopus test” to emphasise the importance of the link between form and meaning. Two people living alone on remote islands send each other text messages through an underwater cable. An octopus listens in on the pulses, then cuts off one of the islanders and attempts to impersonate them by tapping on the cable. What happens when one of the islanders sends a message with instructions for how to build a coconut catapult but also asks the other islander for suggestions on how to improve the design?

Dialogue around such deep questions with ML researchers in computer science departments has been minimal, however, because most of them are too busy trying to keep up with the big tech companies while training PhD students – who are soon absorbed into the ever-growing labs of those very same companies. In a world of chatbots and autonomous vehicles, fundamental questions about the limits of AI/ML need urgently to be revisited, with insights from multiple disciplines. ML researchers in academia should engage in a new dialogue with colleagues in philosophy, linguistics and cognitive science. Reuben College, Oxford’s newest college, intends to play its part in promoting these multidisciplinary exchanges.