From Turing to DeepSeek, Reinforcement Learning Rises to the Summit of AI

Andrew Barto and Richard Sutton are the recipients of the Turing Award for "developing the conceptual and algorithmic foundations of reinforcement learning," the Association for Computing Machinery (ACM) announced last week. Guiding learning with rewards has been practiced for millennia, and in the 20th century it became the basis for a large branch of psychological theory and experimentation.

The Turing Award, often referred to as the "Nobel Prize of computing," is named for Alan M. Turing, considered the father of theoretical computer science. In various forums between 1947 and 1950, Turing voiced his conviction that all mental operations are computable, along with his ideas for educating machines the way children learn: "The training of the human child depends largely on a system of rewards and punishments."

Barto and Sutton have led the effort to work out what they have called "a computational approach to learning from interaction," developing algorithms that learn effectively through trial and error and through "delayed reward": rewards that arrive not immediately, but only after a series of actions.
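The delayed-reward idea can be made concrete with a toy sketch (illustrative only, not drawn from Barto and Sutton's own experiments): a five-state chain in which the single reward arrives on reaching the last state, several steps after the early moves that made success possible. A tabular Q-learning agent, with hyperparameters chosen here purely for illustration, gradually propagates credit for that delayed reward back to the earlier states.

```python
import random

# Toy chain: states 0..4; the only reward, 1.0, arrives on reaching
# state 4, several steps after the early moves that led there.
# All hyperparameters below are illustrative, not from the literature.
N_STATES = 5
ACTIONS = (-1, +1)                  # step left or step right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
random.seed(0)

def greedy(s):
    """Pick the highest-valued action, breaking ties at random."""
    best = max(Q[(s, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if Q[(s, a)] == best])

for _ in range(500):                # training episodes
    s = 0
    while s != N_STATES - 1:
        # trial and error: occasionally explore a random action
        a = random.choice(ACTIONS) if random.random() < EPSILON else greedy(s)
        s2 = min(max(s + a, 0), N_STATES - 1)
        r = 1.0 if s2 == N_STATES - 1 else 0.0
        # Q-learning update: credit for the delayed reward propagates
        # backward through the chain over many episodes
        target = r + GAMMA * max(Q[(s2, b)] for b in ACTIONS)
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
        s = s2

# After training, stepping right is valued higher in every state,
# even though the early "right" moves were never directly rewarded.
policy = {s: greedy(s) for s in range(N_STATES - 1)}
print(policy)
```

The point of the sketch is the update rule: each state's value is nudged toward the reward plus the discounted value of the next state, so a reward earned at the end of an episode gradually raises the estimated value of the actions that led to it.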

Various approaches have been tried in the pursuit of "artificial intelligence" or, as I see it, in expanding what computers can do by adding new (and progressively improving) capabilities such as processing text or speech. Like other approaches, reinforcement learning met objections from scholars who favored the then-dominant methods. And like other approaches, such as deep learning (using artificial neural networks), which was long derided as "alchemy" by its detractors, it eventually grew to become mainstream.

The triumph of reinforcement learning is part of the broader victory of machine learning over old-fashioned, symbolic AI, a clash of paradigms that Terry Winograd summarized as educated trial and error versus perfected intellect. Over the years, three distinct approaches have been developed to help computers learn from examples: supervised learning, unsupervised learning, and reinforcement learning.

In reinforcement learning, "the learner is not told which actions to take, but instead must discover which actions yield the most reward by trying them," write Sutton and Barto in their textbook Reinforcement Learning: An Introduction (1998; 2nd edition 2018).
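A minimal illustration of this "discover by trying" idea, not taken from the textbook, is a three-armed bandit: the agent is never told which arm pays best, so it must estimate each arm's value from its own pulls. The payout probabilities, exploration rate, and pull count below are all invented for the sketch.

```python
import random

# Three slot-machine arms with hidden payout probabilities (made up
# for this sketch); the agent must discover the best arm by trying them.
random.seed(0)
probs = [0.1, 0.4, 0.9]          # hidden chance each arm pays 1.0
Q = [0.0] * 3                    # estimated value of each arm
counts = [0] * 3
EPSILON = 0.1                    # fraction of pulls spent exploring

for _ in range(2000):
    if random.random() < EPSILON:                 # explore a random arm
        a = random.randrange(3)
    else:                                         # exploit the best so far
        a = max(range(3), key=lambda i: Q[i])
    r = 1.0 if random.random() < probs[a] else 0.0
    counts[a] += 1
    Q[a] += (r - Q[a]) / counts[a]                # incremental sample average

best = max(range(3), key=lambda i: Q[i])
print(best)  # index of the arm the agent judges best
```

The agent receives only a reward signal, never a labeled "correct" action, which is exactly the distinction the textbook draws between reinforcement learning and supervised learning.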

In supervised learning, a computer program extrapolates from labeled examples in order to correctly handle examples not present in the training set. Sutton and Barto call it "an important kind of learning," but argue that "it is often impractical to obtain examples of desired behavior that are both correct and representative of all the situations in which the agent has to act. In uncharted territory, where one would expect learning to be most beneficial, an agent must be able to learn from its own experience."

Unsupervised learning, where examples are neither labeled nor categorized, excels at finding patterns and relationships among data elements. Uncovering structure in data is useful in many applications, but by itself it does not address reinforcement learning's goal of maximizing a reward signal. "We therefore consider reinforcement learning to be a third machine learning paradigm, alongside supervised learning and unsupervised learning and perhaps other paradigms," write Sutton and Barto.

In 1959, Arthur Samuel coined the term "machine learning," which he defined as "programming a digital computer to behave in a way which, if done by human beings or animals, would be described as involving the process of learning." Teaching a computer to play checkers, Samuel "anticipated most of the modern ideas in reinforcement learning," write Stuart Russell and Peter Norvig in Artificial Intelligence: A Modern Approach. Samuel, however, was limited by working with a computer "approximately 100 billion times less powerful" than today's graphics processing unit, or GPU.

The merging of artificial neural networks and GPU power, that is, deep learning, with the algorithms developed by Barto, Sutton, and others led to great advances in practical reinforcement learning applications over the past fifteen years. Google's DeepMind demonstrated the power of this combination with its AlphaGo program's victories over the best human Go players in 2016 and 2017. It was followed by AlphaZero, which learned to defeat world champions in three different games (chess, shogi, and Go) using only the rules of those games and policies it learned through extensive self-play.

Despite these successes, AI researchers still dismissed reinforcement learning as a practically applicable method. In his 2019 Turing Award lecture, while lamenting the abuse heaped on deep learning researchers like himself, Geoffrey Hinton said: "There are two types of learning algorithms, actually three, but the third type does not work very well. That one is called reinforcement learning. There is a wonderful reductio ad absurdum of reinforcement learning. It's called DeepMind."

In 2022, AI pioneer Andrew Ng argued, describing "the problem of reinforcement learning," that reinforcement learning algorithms developed in simulated environments often do not work in the real world. Three years later, DeepSeek's efficient approach to data and engineering resolved challenges related to reinforcement learning's data and computing power requirements. Responding to DeepSeek and another high-performing model that had similarly improved its "reasoning" with reinforcement learning, Ng wrote: "Less than three years ago, reinforcement learning seemed too finicky to be worth the trouble. Now it is a major direction in language modeling. Machine learning continues to be full of twists!"

Theory matters, and Barto and Sutton combined at least three distinct research threads into a "coherent perspective" on modern reinforcement learning. It is clever engineering, however, that moves modern computing forward in real-world environments and practical applications, creatively adapting a theoretical model and overcoming the challenges of implementation.

"The general problem of learning from interaction to achieve goals is still far from being solved, but our understanding of it has improved significantly," is how Sutton and Barto summarized their work in the 2018 edition of their book. Their modesty, and their grasp of how difficult it is to make computers "understand," "think," or "reason" like people, should guide the scholars who keep promising the advent of human-like artificial intelligence, or AGI, within a year or two.

Or even "superintelligence." AGI propagandists should read, or reread, Alan Turing, who observed more than three-quarters of a century ago: "If a machine is expected to be infallible, it cannot also be intelligent." Even more important was Turing's observation, completely ignored by today's enthusiasts, about the role of human culture and interaction in the development of human intelligence: "The isolated man does not develop any intellectual power. It is necessary for him to be immersed in an environment of other men, whose techniques he absorbs during the first twenty years of his life."
