The book AI Crash Course by Hadelin de Ponteves contains a toolkit of four different AI models: Thompson Sampling, Q-Learning, Deep Q-Learning and Deep Convolutional Q-Learning. It teaches the theory of these AI models and provides coding examples for solving industry cases based on these models.
InfoQ readers can find an excerpt of AI Crash Course on the publisher's website.
InfoQ interviewed Hadelin de Ponteves about using different AI models and how to develop AI skills.
InfoQ: Why did you write this book?
Hadelin de Ponteves: I wrote this book because I wanted my community of AI students to benefit from a complementary resource other than my online courses. For many years I've created online courses on Artificial Intelligence (AI), which have been very successful and have contributed to the AI community. However, something essential was missing. At one point, there were so many AI courses that most of my students asked me for guidance on how to take the courses. So instead of providing an order in which to take the courses, I decided to create an all-in-one full guide to AI as a book, which would include in a perfect structure all the best explanations and real-world practical activities from my courses. You see, my goal is to democratize AI and raise awareness among everyone of the fact that AI is an accessible technology that can make a difference for the better in this world. I am trying my best to spread knowledge around the world to get people prepared for the future jobs and opportunities of this 21st century. And I thought some people would learn AI much more efficiently from an all-in-one book they can take anywhere, rather than completing dozens of online courses that can be hard to navigate. That being said, this book is also a great additional resource for those people who do prefer, and take, online courses.
InfoQ: For whom is the book intended?
De Ponteves: This book is intended for people who are at beginner and intermediate levels of AI. Indeed, the first chapters explain all the foundations of AI theory, so that people get the right fundamentals, and then even a full chapter on Python basics is included for people who don’t know how to code. Then, the next chapters cover more advanced models in AI, and many practical activities, so that not only beginners but also intermediate level students can learn more and practice.
InfoQ: What AI models exist and how do they differ from each other?
De Ponteves: The book covers four different AI models: Thompson Sampling, Q-Learning, Deep Q-Learning and Deep Convolutional Q-Learning. They are very different on an algorithmic level: Deep Q-Learning is Q-Learning combined with Deep Learning. Deep Convolutional Q-Learning is Deep Q-Learning combined with a CNN (Convolutional Neural Network). And Thompson Sampling is very different from the others, in that it relies purely on basic statistics. But these four AI models are also very different in terms of the applications they have. Thompson Sampling is used to build a selling machine for online advertising. Q-Learning is used to build a logistics optimization system for process automation. Deep Q-Learning is used to build a self-driving car and also to solve a business problem. Deep Convolutional Q-Learning is used to build an AI that plays a video game.
InfoQ: How does reinforcement learning work?
De Ponteves: Reinforcement Learning works in the following way: firstly, an AI takes an observation (values, images, or any data) as input, and returns an action to perform as output (principle #1). Then, there is a reward system that helps the AI measure its performance over the iterations. The AI will learn through trial and error based on the reward it gets over time (principle #2). The input (state), the output (action), and the reward system define what we call an AI environment (principle #3). The AI interacts with this environment through a process called the Markov decision process (principle #4). Finally, in training mode (when the AI is training), the AI learns how to maximize its total reward by updating its parameters through the iterations, and in inference mode (when the AI is simply making predictions) the AI performs its actions over full episodes without updating any of its parameters – that is to say, without learning (principle #5).
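The five principles above can be sketched with tabular Q-Learning on a toy problem. This is only an illustrative sketch, not code from the book: the five-state corridor environment, the `step`, `train` and `run_greedy` helpers, and all hyperparameter values are assumptions chosen to keep the example small.

```python
import random

N_STATES = 5        # hypothetical corridor: states 0..4, the goal is state 4
ACTIONS = [0, 1]    # 0 = move left, 1 = move right

def step(state, action):
    """Environment: takes a state and an action, returns (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0   # reward system: +1 only when the goal is reached
    return next_state, reward, done

def train(n_episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Training mode: learn by trial and error, updating the Q-values."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(n_episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)            # explore
            else:
                best = max(q[state])                    # exploit, ties broken at random
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Q-Learning update: move Q(s, a) toward reward + gamma * max Q(s', .)
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

def run_greedy(q, max_steps=20):
    """Inference mode: play one episode without updating any parameter."""
    state, done, actions = 0, False, []
    while not done and len(actions) < max_steps:
        action = max(ACTIONS, key=lambda a: q[state][a])
        actions.append(action)
        state, _, done = step(state, action)
    return actions

q_table = train()
print(run_greedy(q_table))   # sequence of actions chosen from state 0
```

The same loop structure (observe, act, receive a reward, update) carries over to the deep variants; only the table is replaced by a neural network.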
InfoQ: In the book you explained how we can use Thompson Sampling to find out what marketing strategy provides the highest revenue. What does the algorithm that is used for this look like?
De Ponteves: The algorithm is rather simple, since it can be coded in fewer than 100 lines of code! So how does it work? Well, each marketing strategy is associated with its own Beta distribution. Each time the marketing strategy is successful, its Beta distribution is slightly shifted to the right, and each time the marketing strategy is not successful, its Beta distribution is slightly shifted to the left. Therefore over the rounds, the Beta distribution of the strategy with the highest conversion rate will be progressively shifted to the right, and the Beta distributions of the strategies with lower conversion rates will be progressively shifted to the left. And the key to this is that at every timestep we sample a random draw from each Beta distribution associated with each marketing strategy, and we select of course the strategy that has the highest of these random draws. Hence statistically, the strategy with the highest conversion rate will be selected more and more.
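The procedure described above can be sketched as follows. This is a minimal simulation, not the book's code: the function name `thompson_sampling`, the conversion rates in `RATES`, and the number of rounds are assumptions made for illustration, and each customer's response is simulated as a Bernoulli trial.

```python
import random

def thompson_sampling(conversion_rates, n_rounds, seed=0):
    """Simulate Thompson Sampling over strategies with the given (hidden) conversion rates."""
    rng = random.Random(seed)
    n = len(conversion_rates)
    successes = [0] * n    # times each strategy converted
    failures = [0] * n     # times each strategy did not convert
    selections = [0] * n   # times each strategy was chosen
    for _ in range(n_rounds):
        # One random draw per strategy from Beta(successes + 1, failures + 1)
        draws = [rng.betavariate(successes[i] + 1, failures[i] + 1) for i in range(n)]
        chosen = max(range(n), key=lambda i: draws[i])   # pick the highest draw
        selections[chosen] += 1
        # Simulated outcome: the chosen strategy converts with its true rate
        if rng.random() < conversion_rates[chosen]:
            successes[chosen] += 1   # shifts that Beta distribution to the right
        else:
            failures[chosen] += 1    # shifts that Beta distribution to the left
    return selections, successes, failures

# Hypothetical conversion rates for five strategies; index 3 is the best one.
RATES = [0.05, 0.13, 0.09, 0.30, 0.11]
selections, successes, failures = thompson_sampling(RATES, n_rounds=5000)
print(selections)   # the best strategy should dominate the selection counts
```

In a real campaign the conversion rates are unknown and the simulated outcome would be replaced by the observed customer response.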
InfoQ: What are the benefits that we can get from using an AI approach to determine the best strategy?
De Ponteves: There is only one benefit, other than the fact that you do indeed manage to find the best strategy, which is efficiency. That AI, compared to other algorithms, will be the one that finds that best strategy the fastest. For example, if you are dealing with 10 strategies and you want to figure out which one has the highest conversion rate, Thompson Sampling will figure it out in hundreds of rounds, whereas a classic approach would find it in thousands of rounds. And this is a huge difference because it saves time and costs. The way Thompson Sampling manages to figure it out so quickly lies in the fact that it makes an optimized probabilistic model to find the strategies with the highest conversion rates.
InfoQ: How does deep Q-learning differ from Q-learning?
De Ponteves: Deep Q-Learning is the combination of Q-Learning and Deep Learning. It’s like we are building a brain for the AI, and it is exactly that brain which will decide which predictions to make in the environment to reach the maximum reward and win. How is this brain built? It is built from artificial neurons put in multiple layers. Each neuron from one layer is connected to every neuron from the previous layer, and every layer has its own activation function, a function that decides how much each output signal should be blocked. The step in which this artificial brain works out the prediction is called forward-propagation and the step in which it learns is called back-propagation. The weights are then updated by gradient descent, of which there are three main types: batch gradient descent, stochastic gradient descent, and the best one, mini-batch gradient descent, which combines the advantages of both previous methods.
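The difference between batch, stochastic and mini-batch gradient descent comes down to how many samples are used for each weight update. The sketch below illustrates this on a single linear neuron rather than a deep network, to keep it short; the function `gradient_descent`, the toy data `XS` and `YS`, and the learning rate are assumptions for illustration only.

```python
import random

def gradient_descent(xs, ys, batch_size, lr=0.1, epochs=200, seed=0):
    """Fit y = w * x + b by minimizing the mean squared error.

    batch_size == len(xs)  -> batch gradient descent
    batch_size == 1        -> stochastic gradient descent
    anything in between    -> mini-batch gradient descent
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Forward pass: prediction error on each sample of the batch
            errors = [(w * xs[i] + b) - ys[i] for i in batch]
            # Backward pass: gradient of the mean squared error w.r.t. w and b
            grad_w = sum(2 * e * xs[i] for e, i in zip(errors, batch)) / len(batch)
            grad_b = sum(2 * e for e in errors) / len(batch)
            # Weight update
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b

# Toy data generated from y = 2x + 1 (no noise)
XS = [i / 10 for i in range(-10, 11)]
YS = [2 * x + 1 for x in XS]

for name, size in [("batch", len(XS)), ("stochastic", 1), ("mini-batch", 4)]:
    w, b = gradient_descent(XS, YS, batch_size=size)
    print(name, round(w, 3), round(b, 3))
```

In a deep network the gradients are computed layer by layer with back-propagation, but the batching logic stays the same.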
InfoQ: How can we recognize images using Convolutional Neural Networks?
De Ponteves: A Convolutional Neural Network is an advanced neural network that searches for certain features in images, allowing it to recognize what’s inside. For example, if it wants to recognize a dog or a cat, it will search for features like the nose shape, which is different in a dog than in a cat. Then on the technical side, a Convolutional Neural Network uses three main steps: convolution, where we search for features; max pooling, where we shrink the image in size; and flattening, where we flatten 2D images to a 1D vector so that we can input it into a neural network. Then forward-propagation through a fully-connected neural network is applied to predict what the image is representing.
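The three steps (convolution, max pooling, flattening) can be sketched in plain Python on a tiny image. The 6x6 image, the hand-written vertical-edge kernel, and the helper names `convolve2d`, `max_pool` and `flatten` are assumptions for illustration; in a real network the kernels are learned and a deep learning library does this work.

```python
def convolve2d(image, kernel):
    """Valid 'convolution' as used in CNNs (technically a cross-correlation)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def max_pool(feature_map, size=2):
    """Max pooling with a size x size window and a stride equal to the window size."""
    return [
        [
            max(feature_map[i + di][j + dj]
                for di in range(size) for dj in range(size))
            for j in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for i in range(0, len(feature_map) - size + 1, size)
    ]

def flatten(pooled):
    """Flatten a 2D map into a 1D vector, ready for a fully-connected network."""
    return [value for row in pooled for value in row]

# Hypothetical 6x6 image: dark left half (0), bright right half (1)
IMAGE = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
# Hand-written kernel that responds to dark-to-bright vertical edges
KERNEL = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]

feature_map = convolve2d(IMAGE, KERNEL)   # 4x4 map, each row is [0, 3, 3, 0]
pooled = max_pool(feature_map)            # 2x2 map: [[3, 3], [3, 3]]
vector = flatten(pooled)                  # [3, 3, 3, 3]
print(vector)
```

The feature map is strongest where the window straddles the edge, and pooling keeps that response while halving each dimension.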
InfoQ: How can we train AI systems that are based on Convolutional Neural Networks?
De Ponteves: We can train them exactly the same way as we would do with a classic AI system based on Deep Q-Learning. The only thing that changes is that we give the AI the ability to see images and even videos, thanks to the Convolutional Neural Network that it contains in front of its brain. So the only use of the convolutional neural network is for the AI to be able to observe real images, and then its brain (the classic system based on Deep Q-Learning) is trained the usual way, commonly with stochastic gradient descent and back-propagation.
InfoQ: What do you recommend to people who want to practice their AI skills?
De Ponteves: There are many ways for people to practice their AI skills. People can enter AI competitions like the ones on Kaggle, which contain problems that can be solved with deep reinforcement learning. People could build some new AIs like the ones we create in the book, such as the self-driving car. For example, people could build an AI that plays the game of Pong. Also, there's a great AI platform called OpenAI Gym where people can practice building AIs for many types of applications, including an AI that plays Atari games (Breakout, Pac-Man, Space Invaders, and so on), an AI that plays car racing, an AI that plays the game Doom, or an AI that trains a virtual robot to walk and run. I really recommend that people check out the OpenAI Gym website, which has all these fantastic applications they can work and practice on.
About the Book Author
Hadelin de Ponteves is the co-founder and CEO at BlueLife AI, which leverages the power of cutting-edge Artificial Intelligence to empower businesses to make massive profits by innovating, automating processes and maximizing efficiency. De Ponteves is also an online entrepreneur who has created 50+ top-rated educational e-courses for the world on topics such as Machine Learning, Deep Learning, Artificial Intelligence and Blockchain, which have reached 1M+ sales in 204 countries.