The following is an AI-generated summary and article based on a transcript of the video "Future Computers Will Be Radically Different (Analog Computing)". Due to the limitations of AI, please verify the accuracy of the content for yourself.
00:00 | - For hundreds of years, |
00:01 | analog computers were the most powerful computers on Earth, |
00:05 | predicting eclipses, tides, and guiding anti-aircraft guns. |
00:09 | Then, with the advent of solid-state transistors, |
00:12 | digital computers took off. |
00:14 | Now, virtually every computer we use is digital. |
00:18 | But today, a perfect storm of factors is setting the scene |
00:21 | for a resurgence of analog technology. |
00:24 | This is an analog computer, |
00:27 | and by connecting these wires in particular ways, |
00:30 | I can program it to solve a whole range |
00:32 | of differential equations. |
00:34 | For example, this setup allows me to simulate |
00:37 | a damped mass oscillating on a spring. |
00:40 | So on the oscilloscope, you can actually see the position |
00:43 | of the mass over time. |
00:45 | And I can vary the damping, |
00:48 | or the spring constant, |
00:51 | or the mass, and we can see how the amplitude |
00:54 | and duration of the oscillations change. |
00:57 | Now what makes this an analog computer |
01:00 | is that there are no zeros and ones in here. |
01:03 | Instead, there's actually a voltage that oscillates |
01:06 | up and down exactly like a mass on a spring. |
01:10 | The electrical circuitry is an analog |
01:12 | for the physical problem, |
01:14 | it just takes place much faster. |
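For reference, the damped mass-on-a-spring system simulated here obeys a standard second-order differential equation (the equation is not stated in the video; this is the usual textbook form):

$$ m\ddot{x} + c\dot{x} + kx = 0 $$

where $x(t)$ is the position of the mass, $m$ the mass, $c$ the damping coefficient, and $k$ the spring constant. Varying $m$, $c$, or $k$ is exactly what the knobs on the analog computer change, and the oscilloscope traces $x(t)$ as a voltage.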
01:16 | Now, if I change the electrical connections, |
01:19 | I can program this computer |
01:20 | to solve other differential equations, |
01:22 | like the Lorenz system, |
01:24 | which is a basic model of convection in the atmosphere. |
01:27 | Now the Lorenz system is famous because it was one |
01:29 | of the first discovered examples of chaos. |
01:32 | And here, you can see the Lorenz attractor |
01:35 | with its beautiful butterfly shape. |
01:38 | And on this analog computer, |
01:39 | I can change the parameters |
01:42 | and see their effects in real time. |
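For context, the Lorenz system mentioned here is the standard set of three coupled differential equations (not written out in the video):

$$ \dot{x} = \sigma(y - x), \qquad \dot{y} = x(\rho - z) - y, \qquad \dot{z} = xy - \beta z $$

The butterfly-shaped attractor appears for parameter values near $\sigma = 10$, $\rho = 28$, $\beta = 8/3$; turning a dial on the analog computer varies one of these parameters continuously while the trajectory is being traced.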
01:46 | So these examples illustrate some |
01:47 | of the advantages of analog computers. |
01:50 | They are incredibly powerful computing devices, |
01:53 | and they can complete a lot of computations fast. |
01:56 | Plus, they don't take much power to do it. |
02:01 | With a digital computer, |
02:02 | if you wanna add two eight-bit numbers, |
02:05 | you need around 50 transistors, |
02:08 | whereas with an analog computer, |
02:09 | you can add two currents, |
02:12 | just by connecting two wires. |
02:15 | With a digital computer to multiply two numbers, |
02:18 | you need on the order of 1,000 transistors |
02:20 | all switching zeros and ones, |
02:23 | whereas with an analog computer, |
02:24 | you can pass a current through a resistor, |
02:28 | and then the voltage across this resistor |
02:31 | will be I times R. |
02:34 | So effectively, |
02:35 | you have multiplied two numbers together. |
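To make that concrete, here is a worked example (the specific numbers are illustrative, not from the video): pass $I = 2\,\text{mA}$ through $R = 3\,\text{k}\Omega$ and the voltage across the resistor is

$$ V = IR = (2\,\text{mA})(3\,\text{k}\Omega) = 6\,\text{V}, $$

so the circuit has, in effect, computed $2 \times 3 = 6$. Likewise, joining wires carrying 2 mA and 3 mA yields a single wire carrying 5 mA, which is Kirchhoff's current law performing the addition $2 + 3 = 5$.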
02:40 | But analog computers also have their drawbacks. |
02:42 | For one thing, |
02:43 | they are not general-purpose computing devices. |
02:46 | I mean, you're not gonna run Microsoft Word on this thing. |
02:49 | And also, since the inputs and outputs are continuous, |
02:52 | I can't input exact values. |
02:55 | So if I try to repeat the same calculation, |
02:58 | I'm never going to get the exact same answer. |
03:01 | Plus, think about manufacturing analog computers. |
03:04 | There's always gonna be some variation |
03:06 | in the exact value of components, |
03:08 | like resistors or capacitors. |
03:10 | So as a general rule of thumb, |
03:12 | you can expect about a 1% error. |
03:15 | So when you think of analog computers, |
03:17 | you can think powerful, fast, and energy-efficient, |
03:20 | but also single-purpose, non-repeatable, and inexact. |
03:25 | And if those sound like deal-breakers, |
03:28 | it's because they probably are. |
03:30 | I think these are the major reasons |
03:31 | why analog computers fell out of favor |
03:33 | as soon as digital computers became viable. |
03:36 | Now, here's why analog computers may be making a comeback. |
03:41 | (computers beeping) |
03:43 | It all starts with artificial intelligence. |
03:46 | - [Narrator] A machine has been programmed to see |
03:48 | and to move objects. |
03:51 | - AI isn't new. |
03:52 | The term was coined back in 1956. |
03:55 | In 1958, Cornell University psychologist, |
03:58 | Frank Rosenblatt, built the perceptron, |
04:01 | designed to mimic how neurons fire in our brains. |
04:05 | So here's a basic model of how neurons in our brains work. |
04:08 | An individual neuron can either fire or not, |
04:12 | so its level of activation can be represented |
04:14 | as a one or a zero. |
04:16 | The input to one neuron |
04:18 | is the output from a bunch of other neurons, |
04:21 | but the strength of these connections |
04:22 | between neurons varies, |
04:24 | so each one can be given a different weight. |
04:27 | Some connections are excitatory, |
04:29 | so they have positive weights, |
04:30 | while others are inhibitory, |
04:32 | so they have negative weights. |
04:34 | And the way to figure out |
04:35 | whether a particular neuron fires, |
04:37 | is to take the activation of each input neuron |
04:40 | and multiply by its weight, |
04:42 | and then add these all together. |
04:44 | If their sum is greater than some number called the bias, |
04:47 | then the neuron fires, |
04:49 | but if it's less than that, the neuron doesn't fire. |
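Written compactly (the symbols are added here for clarity and are not used in the video): with input activations $x_1, \dots, x_n$, connection weights $w_1, \dots, w_n$, and bias $b$, the neuron's output is

$$ y = \begin{cases} 1 & \text{if } \sum_{i=1}^{n} w_i x_i > b, \\ 0 & \text{otherwise.} \end{cases} $$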
04:53 | As input, Rosenblatt's perceptron had 400 photocells |
04:57 | arranged in a square grid, |
04:59 | to capture a 20 by 20-pixel image. |
05:02 | You can think of each pixel as an input neuron, |
05:04 | with its activation being the brightness of the pixel. |
05:07 | Although strictly speaking, |
05:09 | the activation should be either zero or one, |
05:11 | we can let it take any value between zero and one. |
05:15 | All of these neurons are connected |
05:18 | to a single output neuron, |
05:20 | each via its own adjustable weight. |
05:23 | So to see if the output neuron will fire, |
05:25 | you multiply the activation of each neuron by its weight, |
05:28 | and add them together. |
05:30 | This is essentially a vector dot product. |
05:33 | If the answer is larger than the bias, the neuron fires, |
05:36 | and if not, it doesn't. |
05:38 | Now the goal of the perceptron |
05:40 | was to reliably distinguish between two images, |
05:43 | like a rectangle and a circle. |
05:45 | For example, |
05:46 | the output neuron could always fire |
05:48 | when presented with a circle, |
05:49 | but never when presented with a rectangle. |
05:52 | To achieve this, the perceptron had to be trained, |
05:55 | that is, shown a series of different circles |
05:58 | and rectangles, and have its weights adjusted accordingly. |
06:02 | We can visualize the weights as an image, |
06:05 | since there's a unique weight for each pixel of the image. |
06:09 | Initially, Rosenblatt set all the weights to zero. |
06:12 | If the perceptron's output is correct, |
06:14 | for example, here it's shown a rectangle |
06:16 | and the output neuron doesn't fire, |
06:19 | no change is made to the weights. |
06:21 | But if it's wrong, then the weights are adjusted. |
06:23 | The algorithm for updating the weights |
06:25 | is remarkably simple. |
06:27 | Here, the output neuron didn't fire when it was supposed to |
06:30 | because it was shown a circle. |
06:32 | So to modify the weights, |
06:33 | you simply add the input activations to the weights. |
06:38 | If the output neuron fires when it shouldn't, |
06:40 | like here, when shown a rectangle, |
06:42 | well, then you subtract the input activations |
06:45 | from the weights, and you keep doing this |
06:48 | until the perceptron correctly identifies |
06:50 | all the training images. |
06:52 | It was shown that this algorithm will always converge, |
06:55 | so long as it's possible to map the two categories |
06:58 | into distinct groups. |
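A minimal Python sketch of this update rule, assuming flattened images with pixel activations between 0 and 1 and labels of 1 (circle) or 0 (rectangle); the function name and the use of NumPy are choices made here, not Rosenblatt's implementation:

```python
import numpy as np

def train_perceptron(images, labels, epochs=100):
    """Rosenblatt-style perceptron training.

    images: shape (num_examples, num_pixels), activations in [0, 1]
    labels: shape (num_examples,), 1 = circle, 0 = rectangle
    """
    weights = np.zeros(images.shape[1])   # start with all weights at zero
    bias = 0.0                            # fixed threshold for this sketch

    for _ in range(epochs):
        mistakes = 0
        for x, target in zip(images, labels):
            fired = 1 if np.dot(weights, x) > bias else 0
            if fired == target:
                continue                  # correct output: leave the weights alone
            if target == 1:               # should have fired but didn't:
                weights += x              #   add the input activations to the weights
            else:                         # fired when it shouldn't have:
                weights -= x              #   subtract the input activations
            mistakes += 1
        if mistakes == 0:                 # every training image classified correctly
            break
    return weights
```

If the two categories are linearly separable, this loop is guaranteed to stop, which is the convergence result mentioned above.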
07:00 | (footsteps thumping) |
07:02 | The perceptron was capable of distinguishing |
07:04 | between different shapes, like rectangles and triangles, |
07:07 | or between different letters. |
07:09 | And according to Rosenblatt, |
07:10 | it could even tell the difference between cats and dogs. |
07:14 | He said the machine was capable |
07:15 | of what amounts to original thought, |
07:18 | and the media lapped it up. |
07:20 | The "New York Times" called the perceptron |
07:22 | "the embryo of an electronic computer |
07:25 | that the Navy expects will be able to walk, talk, |
07:28 | see, write, reproduce itself, |
07:30 | and be conscious of its existence." |
07:34 | - [Narrator] After training on lots of examples, |
07:36 | it's given new faces it has never seen, |
07:39 | and is able to successfully distinguish male from female. |
07:43 | It has learned. |
07:45 | - In reality, the perceptron was pretty limited |
07:47 | in what it could do. |
07:48 | It could not, in fact, tell apart dogs from cats. |
07:52 | This and other critiques were raised |
07:53 | in a book by MIT giants, Minsky and Papert, in 1969. |
07:58 | And that led to a bust period |
08:00 | for artificial neural networks and AI in general. |
08:03 | It's known as the first AI winter. |
08:06 | Rosenblatt did not survive this winter. |
08:09 | He drowned while sailing in Chesapeake Bay |
08:12 | on his 43rd birthday. |
08:14 | (mellow upbeat music) |
08:17 | - [Narrator] The NAV Lab is a road-worthy truck, |
08:19 | modified so that researchers or computers |
08:22 | can control the vehicle as occasion demands. |
08:25 | - [Derek] In the 1980s, there was an AI resurgence |
08:28 | when researchers at Carnegie Mellon created one |
08:30 | of the first self-driving cars. |
08:32 | The vehicle was steered |
08:33 | by an artificial neural network called ALVINN. |
08:36 | It was similar to the perceptron, |
08:37 | except it had a hidden layer of artificial neurons |
08:41 | between the input and output. |
08:43 | As input, ALVINN received 30 by 32-pixel images |
08:47 | of the road ahead. |
08:48 | Here, I'm showing them as 60 by 64 pixels. |
08:51 | But each of these input neurons was connected |
08:54 | via an adjustable weight to a hidden layer of four neurons. |
08:57 | These were each connected to 32 output neurons. |
09:01 | So to go from one layer of the network to the next, |
09:04 | you perform a matrix multiplication: |
09:06 | the input activation times the weights. |
09:10 | The output neuron with the greatest activation |
09:12 | determines the steering angle. |
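A rough sketch of that forward pass in Python, with shapes taken from the description above (30 by 32 inputs, 4 hidden neurons, 32 outputs); the sigmoid squashing function and the variable names are assumptions, not details from the original ALVINN code:

```python
import numpy as np

def steering_angle(image, w_hidden, w_output, angles):
    """image:    flattened 30x32 road image, shape (960,)
       w_hidden: input-to-hidden weights, shape (960, 4)
       w_output: hidden-to-output weights, shape (4, 32)
       angles:   steering angle represented by each output neuron, shape (32,)
    """
    hidden = 1.0 / (1.0 + np.exp(-(image @ w_hidden)))  # matrix multiply, then squash
    output = hidden @ w_output                          # second matrix multiply
    return angles[np.argmax(output)]                    # most active output neuron wins
```

Training adjusts w_hidden and w_output (via backpropagation, as noted just below) so that the winning output neuron matches the human driver's steering angle.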
09:15 | To train the neural net, |
09:16 | a human drove the vehicle, |
09:18 | providing the correct steering angle |
09:20 | for a given input image. |
09:22 | All the weights in the neural network were adjusted |
09:24 | through the training |
09:25 | so that ALVINN's output better matched that |
09:27 | of the human driver. |
09:30 | The method for adjusting the weights |
09:31 | is called backpropagation, |
09:33 | which I won't go into here, |
09:34 | but Welch Labs has a great series on this, |
09:37 | which I'll link to in the description. |
09:40 | Again, you can visualize the weights |
09:41 | for the four hidden neurons as images. |
09:44 | The weights are initially set to be random, |
09:46 | but as training progresses, |
09:48 | the computer learns to pick up on certain patterns. |
09:51 | You can see the road markings emerge in the weights. |
09:54 | Simultaneously, the output steering angle coalesces |
09:58 | onto the human steering angle. |
10:00 | The computer drove the vehicle at a top speed |
10:03 | of around one or two kilometers per hour. |
10:06 | It was limited by the speed |
10:07 | at which the computer could perform matrix multiplication. |
10:12 | Despite these advances, |
10:13 | artificial neural networks still struggled |
10:15 | with seemingly simple tasks, |
10:17 | like telling apart cats and dogs. |
10:19 | And no one knew whether hardware |
10:22 | or software was the weak link. |
10:24 | I mean, did we have a good model of intelligence, |
10:26 | and we just needed more computing power? |
10:28 | Or, did we have the wrong idea |
10:30 | about how to make intelligent systems altogether? |
10:33 | So artificial intelligence experienced another lull |
10:36 | in the 1990s. |
10:38 | By the mid 2000s, |
10:39 | most AI researchers were focused on improving algorithms. |
10:43 | But one researcher, Fei-Fei Li, |
10:45 | thought maybe there was a different problem. |
10:48 | Maybe these artificial neural networks |
10:50 | just needed more data to train on. |
10:52 | So she planned to map out the entire world of objects. |
10:56 | From 2006 to 2009, she created ImageNet, |
10:59 | a database of 1.2 million human-labeled images, |
11:02 | which at the time, |
11:03 | was the largest labeled image dataset ever constructed. |
11:06 | And from 2010 to 2017, |
11:08 | ImageNet ran an annual contest: |
11:10 | the ImageNet Large Scale Visual Recognition Challenge, |
11:13 | where software programs competed to correctly detect |
11:16 | and classify images. |
11:17 | Images were classified into 1,000 different categories, |
11:21 | including 90 different dog breeds. |
11:23 | A neural network competing in this competition |
11:25 | would have an output layer of 1,000 neurons, |
11:28 | each corresponding to a category of object |
11:30 | that could appear in the image. |
11:32 | If the image contains, say, a German shepherd, |
11:34 | then the output neuron corresponding to German shepherd |
11:37 | should have the highest activation. |
11:39 | Unsurprisingly, it turned out to be a tough challenge. |
11:43 | One way to judge the performance of an AI |
11:45 | is to see how often the five highest neuron activations |
11:48 | do not include the correct category. |
11:50 | This is the so-called top-5 error rate. |
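As an illustration of the metric (a hypothetical helper written here, not the competition's actual scoring code):

```python
import numpy as np

def top5_error_rate(scores, true_labels):
    """scores:      shape (num_images, 1000), one activation per category
       true_labels: shape (num_images,), index of the correct category
    """
    top5 = np.argsort(scores, axis=1)[:, -5:]        # five highest activations per image
    missed = [label not in row                       # correct category absent from the top five?
              for row, label in zip(top5, true_labels)]
    return float(np.mean(missed))                    # fraction of images missed
```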
11:53 | In 2010, the best performer had a top-5 error rate |
11:56 | of 28.2%, meaning that nearly 1/3 of the time, |
12:01 | the correct answer was not among its top five guesses. |
12:04 | In 2011, the error rate of the best performer was 25.8%, |
12:09 | a substantial improvement. |
12:11 | But the next year, |
12:12 | an artificial neural network |
12:13 | from the University of Toronto, called AlexNet, |
12:16 | blew away the competition |
12:17 | with a top-5 error rate of just 16.4%. |
12:22 | What set AlexNet apart was its size and depth. |
12:25 | The network consisted of eight layers, |
12:27 | and in total, 650,000 neurons. |
12:30 | To train AlexNet, |
12:31 | 60 million weights and biases had to be carefully adjusted |
12:35 | using the training database. |
12:37 | Because of all the big matrix multiplications, |
12:40 | processing a single image required 700 million |
12:43 | individual math operations. |
12:45 | So training was computationally intensive. |
12:48 | The team managed it by pioneering the use of GPUs, |
12:51 | graphics processing units, |
12:52 | which are traditionally used for driving display screens. |
12:56 | So they're specialized for fast parallel computations. |
13:00 | The AlexNet paper describing their research |
13:02 | is a blockbuster. |
13:04 | It's now been cited over 100,000 times, |
13:07 | and it identifies the scale of the neural network |
13:10 | as key to its success. |
13:12 | It takes a lot of computation to train and run the network, |
13:16 | but the improvement in performance is worth it. |
13:19 | With others following their lead, |
13:20 | the top-5 error rate |
13:22 | on the ImageNet competition plummeted |
13:23 | in the years that followed, down to 3.6% in 2015. |
13:28 | That is better than human performance. |
13:31 | The neural network that achieved this |
13:32 | had 100 layers of neurons. |
13:35 | So the future is clear: |
13:36 | We will see ever increasing demand |
13:38 | for ever larger neural networks. |
13:40 | And this is a problem for several reasons: |
13:43 | One is energy consumption. |
13:45 | Training a neural network requires an amount |
13:47 | of electricity similar to the yearly consumption |
13:49 | of three households. |
13:50 | Another issue is the so-called Von Neumann Bottleneck. |
13:54 | Virtually every modern digital computer |
13:55 | stores data in memory, |
13:57 | and then accesses it as needed over a bus. |
14:00 | When performing the huge matrix multiplications required |
14:02 | by deep neural networks, |
14:04 | most of the time and energy goes |
14:05 | into fetching those weight values rather |
14:07 | than actually doing the computation. |
14:10 | And finally, there are the limitations of Moore's Law. |
14:13 | For decades, the number of transistors |
14:14 | on a chip has been doubling approximately every two years, |
14:18 | but now the size of a transistor |
14:20 | is approaching the size of an atom. |
14:21 | So there are some fundamental physical challenges |
14:24 | to further miniaturization. |
14:26 | So this is the perfect storm for analog computers. |
14:30 | Digital computers are reaching their limits. |
14:32 | Meanwhile, neural networks are exploding in popularity, |
14:35 | and a lot of what they do boils down |
14:38 | to a single task: matrix multiplication. |
14:41 | Best of all, neural networks don't need the precision |
14:44 | of digital computers. |
14:45 | Whether the neural net is 96% or 98% confident |
14:48 | the image contains a chicken, |
14:50 | it doesn't really matter, it's still a chicken. |
14:52 | So slight variability in components |
14:54 | or conditions can be tolerated. |
14:57 | (upbeat rock music) |
14:58 | I went to an analog computing startup in Texas, |
15:01 | called Mythic AI. |
15:03 | Here, they're creating analog chips to run neural networks. |
15:06 | And they demonstrated several AI algorithms for me. |
15:10 | - Oh, there you go. |
15:11 | See, it's getting you. (Derek laughs) |
15:13 | Yeah. - That's fascinating. |
15:14 | - The biggest use case is augmented and virtual reality. |
15:17 | If your friend is in a different place, |
15:19 | they're at their house and you're at your house, |
15:20 | you can actually render each other in the virtual world. |
15:24 | So it needs to really quickly capture your pose, |
15:27 | and then render it in the VR world. |
15:29 | - So, hang on, is this for the metaverse thing? |
15:31 | - Yeah, this is a very metaverse application. |
15:35 | This is depth estimation from just a single webcam. |
15:38 | It's just taking this scene, |
15:39 | and then it's doing a heat map. |
15:41 | So if it's bright, it means it's close. |
15:43 | And if it's far away, it makes it black. |
15:45 | - [Derek] Now all these algorithms can be run |
15:47 | on digital computers, |
15:48 | but here, the matrix multiplication is actually taking place |
15:52 | in the analog domain. (light music) |
15:54 | To make this possible, |
15:55 | Mythic has repurposed digital flash storage cells. |
15:59 | Normally these are used as memory |
16:01 | to store either a one or a zero. |
16:03 | If you apply a large positive voltage to the control gate, |
16:07 | electrons tunnel up through an insulating barrier |
16:10 | and become trapped on the floating gate. |
16:12 | Remove the voltage, |
16:13 | and the electrons can remain on the floating gate |
16:15 | for decades, preventing the cell from conducting current. |
16:18 | And that's how you can store either a one or a zero. |
16:21 | You can read out the stored value |
16:22 | by applying a small voltage. |
16:25 | If there are electrons on the floating gate, |
16:26 | no current flows, so that's a zero. |
16:29 | If there aren't electrons, |
16:30 | then current does flow, and that's a one. |
16:33 | Now Mythic's idea is to use these cells |
16:36 | not as on/off switches, but as variable resistors. |
16:40 | They do this by putting a specific number of electrons |
16:42 | on each floating gate, instead of all or nothing. |
16:45 | The greater the number of electrons, |
16:47 | the higher the resistance of the channel. |
16:49 | When you later apply a small voltage, |
16:52 | the current that flows is equal to V over R. |
16:55 | But you can also think of this as voltage times conductance, |
16:59 | where conductance is just the reciprocal of resistance. |
17:02 | So a single flash cell can be used |
17:04 | to multiply two values together, voltage times conductance. |
17:09 | So to use this to run an artificial neural network, |
17:11 | well they first write all the weights to the flash cells |
17:14 | as each cell's conductance. |
17:16 | Then, they input the activation values |
17:19 | as the voltage on the cells. |
17:21 | And the resulting current is the product |
17:23 | of voltage times conductance, |
17:25 | which is activation times weight. |
17:28 | The cells are wired together in such a way |
17:30 | that the current from each multiplication adds together, |
17:34 | completing the matrix multiplication. |
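Numerically, this analog multiply-accumulate behaves like an ordinary matrix-vector product: each cell contributes a current $I = V \times G$, and currents that meet on a shared wire add. A toy model in Python (idealized, with names chosen here; real hardware must also handle signed weights, noise, and the analog-to-digital conversions discussed below):

```python
import numpy as np

def analog_matvec(activations, conductances):
    """activations:  input voltages applied to the cells, shape (n_inputs,)
       conductances: programmed cell conductances, i.e. the weights, shape (n_inputs, n_outputs)
    """
    currents = activations[:, None] * conductances  # each cell: I = V * G (one multiplication)
    return currents.sum(axis=0)                     # currents summed along each output wire
```

The result equals `activations @ conductances`: a matrix multiplication carried out by Ohm's law and Kirchhoff's current law rather than by transistor logic.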
17:36 | (light music) |
17:39 | - So this is our first product. |
17:40 | This can do 25 trillion math operations per second. |
17:45 | - [Derek] 25 trillion. |
17:47 | - Yep, 25 trillion math operations per second, |
17:49 | in this little chip here, |
17:50 | burning about three watts of power. |
17:52 | - [Derek] How does it compare to a digital chip? |
17:54 | - The newer digital systems can do anywhere |
17:57 | from 25 to 100 trillion operations per second, |
18:00 | but they are big, thousand-dollar systems |
18:02 | that are spitting out 50 to 100 watts of power. |
18:06 | - [Derek] Obviously this isn't |
18:07 | like an apples-to-apples comparison, right? |
18:09 | - No, it's not apples to apples. |
18:10 | I mean, training those algorithms, |
18:13 | you need big hardware like this. |
18:15 | You can just do all sorts of stuff on the GPU, |
18:17 | but if you specifically are doing AI workloads |
18:20 | and you wanna deploy 'em, you could use this instead. |
18:22 | You can imagine them in security cameras, |
18:25 | autonomous systems, |
18:26 | inspection equipment for manufacturing. |
18:29 | Every time they make a Frito-Lay chip, |
18:30 | they inspect it with a camera, |
18:32 | and the bad Fritos get blown off of the conveyor belt. |
18:36 | But they're using artificial intelligence |
18:37 | to spot which Fritos are good and bad. |
18:40 | - Some have proposed using analog circuitry |
18:42 | in smart home speakers, |
18:43 | solely to listen for the wake word, like Alexa or Siri. |
18:47 | They would use a lot less power and be able to quickly |
18:49 | and reliably turn on the digital circuitry of the device. |
18:53 | But you still have to deal with the challenges of analog. |
18:56 | - So for one of the popular networks, |
18:58 | there would be 50 sequences |
19:00 | of matrix multiplies that you're doing. |
19:02 | Now, if you did that entirely in the analog domain, |
19:05 | by the time it gets to the output, |
19:06 | it's just so distorted |
19:07 | that you don't have any result at all. |
19:10 | So you convert it from the analog domain, |
19:12 | back to the digital domain, |
19:14 | send it to the next processing block, |
19:15 | and then you convert it into the analog domain again. |
19:18 | And that allows you to preserve the signal. |
19:20 | - You know, when Rosenblatt was first setting |
19:22 | up his perceptron, |
19:23 | he used a digital IBM computer. |
19:26 | Finding it too slow, |
19:28 | he built a custom analog computer, |
19:30 | complete with variable resistors |
19:32 | and little motors to drive them. |
19:35 | Ultimately, his idea of neural networks |
19:37 | turned out to be right. |
19:39 | Maybe he was right about analog, too. |
19:43 | Now, I can't say whether analog computers will take |
19:46 | off the way digital did last century, |
19:48 | but they do seem to be better suited |
19:51 | to a lot of the tasks that we want computers |
19:53 | to perform today, |
19:55 | which is a little bit funny |
19:56 | because I always thought of digital |
19:58 | as the optimal way of processing information. |
20:01 | Everything from music to pictures, |
20:03 | to video has all gone digital in the last 50 years. |
20:07 | But maybe in a hundred years, |
20:09 | we will look back on digital, |
20:11 | not as the end point of information technology, |
20:15 | but as a starting point. |
20:17 | Our brains are digital |
20:19 | in that a neuron either fires or it doesn't, |
20:21 | but they're also analog |
20:24 | in that thinking takes place everywhere, all at once. |
20:28 | So maybe what we need |
20:30 | to achieve true artificial intelligence, |
20:32 | machines that think like us, is the power of analog. |
20:37 | (gentle music) |
20:42 | Hey, I learned a lot while making this video, |
20:44 | much of it by playing with an actual analog computer. |
20:47 | You know, trying things out for yourself |
20:48 | is really the best way to learn, |
20:50 | and you can do that with this video sponsor, Brilliant. |
20:53 | Brilliant is a website and app |
20:54 | that gets you thinking deeply |
20:56 | by engaging you in problem-solving. |
20:58 | They have a great course on neural networks, |
21:00 | where you can test how it works for yourself. |
21:02 | It gives you an excellent intuition |
21:04 | about how neural networks can recognize numbers and shapes, |
21:07 | and it also allows you to experience the importance |
21:09 | of good training data and hidden layers |
21:11 | to understand why more sophisticated |
21:14 | neural networks work better. |
21:15 | What I love about Brilliant |
21:16 | is it tests your knowledge as you go. |
21:19 | The lessons are highly interactive, |
21:20 | and they get progressively harder as you go on. |
21:23 | And if you get stuck, there are always helpful hints. |
21:26 | For viewers of this video, |
21:27 | Brilliant is offering the first 200 people |
21:29 | 20% off an annual premium subscription. |
21:32 | Just go to brilliant.org/veritasium. |
21:35 | I will put that link down in the description. |
21:37 | So I wanna thank Brilliant for supporting Veritasium, |
21:40 | and I wanna thank you for watching. |