Transcript of YouTube Video: Future Computers Will Be Radically Different (Analog Computing)

The following is a summary and article by AI based on a transcript of the video "Future Computers Will Be Radically Different (Analog Computing)". Because AI-generated content can contain errors, please check its accuracy carefully.

Video Transcript
00:00

- For hundreds of years,

00:01

analog computers were the most powerful computers on Earth,

00:05

predicting eclipses, tides, and guiding anti-aircraft guns.

00:09

Then, with the advent of solid-state transistors,

00:12

digital computers took off.

00:14

Now, virtually every computer we use is digital.

00:18

But today, a perfect storm of factors is setting the scene

00:21

for a resurgence of analog technology.

00:24

This is an analog computer,

00:27

and by connecting these wires in particular ways,

00:30

I can program it to solve a whole range

00:32

of differential equations.

00:34

For example, this setup allows me to simulate

00:37

a damped mass oscillating on a spring.

00:40

So on the oscilloscope, you can actually see the position

00:43

of the mass over time.

00:45

And I can vary the damping,

00:48

or the spring constant,

00:51

or the mass, and we can see how the amplitude

00:54

and duration of the oscillations change.

00:57

Now what makes this an analog computer

01:00

is that there are no zeros and ones in here.

01:03

Instead, there's actually a voltage that oscillates

01:06

up and down exactly like a mass on a spring.

01:10

The electrical circuitry is an analog

01:12

for the physical problem,

01:14

it just takes place much faster.
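
As a digital counterpart, here is a minimal Python sketch of the same equation the analog circuit embodies, m·x'' + c·x' + k·x = 0, stepped forward with a simple Euler integrator. The parameter values are arbitrary stand-ins, not the knob settings used in the video.

```python
# Minimal digital sketch of the equation the analog computer solves:
# m*x'' + c*x' + k*x = 0  (damped mass on a spring).
# Parameter values here are illustrative, not taken from the video.

m, c, k = 1.0, 0.5, 4.0      # mass, damping, spring constant
x, v = 1.0, 0.0              # initial position and velocity
dt = 0.001                   # time step

trace = []
for step in range(10_000):
    a = -(c * v + k * x) / m  # acceleration from Newton's second law
    v += a * dt               # semi-implicit Euler integration
    x += v * dt
    trace.append(x)           # like watching the voltage on the oscilloscope
```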

01:16

Now, if I change the electrical connections,

01:19

I can program this computer

01:20

to solve other differential equations,

01:22

like the Lorenz system,

01:24

which is a basic model of convection in the atmosphere.

01:27

Now the Lorenz system is famous because it was one

01:29

of the first discovered examples of chaos.

01:32

And here, you can see the Lorenz attractor

01:35

with its beautiful butterfly shape.

01:38

And on this analog computer,

01:39

I can change the parameters

01:42

and see their effects in real time.
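
For reference, the Lorenz system mentioned here can be stepped through digitally in a few lines. The classic parameters below (sigma = 10, rho = 28, beta = 8/3) are the standard ones that produce the butterfly-shaped attractor, not necessarily the exact settings dialed into the machine.

```python
# Sketch of the Lorenz system:
#   dx/dt = sigma*(y - x)
#   dy/dt = x*(rho - z) - y
#   dz/dt = x*y - beta*z

sigma, rho, beta = 10.0, 28.0, 8.0 / 3.0
x, y, z = 1.0, 1.0, 1.0
dt = 0.001

points = []
for _ in range(50_000):
    dx = sigma * (y - x)
    dy = x * (rho - z) - y
    dz = x * y - beta * z
    x, y, z = x + dx * dt, y + dy * dt, z + dz * dt
    points.append((x, y, z))  # plot x against z to see the butterfly
```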

01:46

So these examples illustrate some

01:47

of the advantages of analog computers.

01:50

They are incredibly powerful computing devices,

01:53

and they can complete a lot of computations fast.

01:56

Plus, they don't take much power to do it.

02:01

With a digital computer,

02:02

if you wanna add two eight-bit numbers,

02:05

you need around 50 transistors,

02:08

whereas with an analog computer,

02:09

you can add two currents,

02:12

just by connecting two wires.

02:15

With a digital computer to multiply two numbers,

02:18

you need on the order of 1,000 transistors

02:20

all switching zeros and ones,

02:23

whereas with an analog computer,

02:24

you can pass a current through a resistor,

02:28

and then the voltage across this resistor

02:31

will be I times R.

02:34

So effectively,

02:35

you have multiplied two numbers together.
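
A tiny sketch of those two analog tricks written out as ordinary arithmetic, with made-up component values: currents meeting at a node add (Kirchhoff's current law), and a current through a resistor produces a voltage equal to their product (Ohm's law).

```python
# The two analog tricks, written out digitally. Values are made up for illustration.

# Addition: currents that meet at a node simply sum (Kirchhoff's current law).
i1, i2 = 0.003, 0.002        # amps flowing into the same node
i_total = i1 + i2            # 0.005 A, no transistors required

# Multiplication: Ohm's law. Pass a current through a resistor and the
# voltage across it is I * R.
i, r = 0.002, 1_500.0        # 2 mA through 1.5 kOhm
v = i * r                    # 3.0 V, i.e. the product of the two numbers
```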

02:40

But analog computers also have their drawbacks.

02:42

For one thing,

02:43

they are not general-purpose computing devices.

02:46

I mean, you're not gonna run Microsoft Word on this thing.

02:49

And also, since the inputs and outputs are continuous,

02:52

I can't input exact values.

02:55

So if I try to repeat the same calculation,

02:58

I'm never going to get the exact same answer.

03:01

Plus, think about manufacturing analog computers.

03:04

There's always gonna be some variation

03:06

in the exact value of components,

03:08

like resistors or capacitors.

03:10

So as a general rule of thumb,

03:12

you can expect about a 1% error.

03:15

So when you think of analog computers,

03:17

you can think powerful, fast, and energy-efficient,

03:20

but also single-purpose, non-repeatable, and inexact.

03:25

And if those sound like deal-breakers,

03:28

it's because they probably are.

03:30

I think these are the major reasons

03:31

why analog computers fell out of favor

03:33

as soon as digital computers became viable.

03:36

Now, here's why analog computers may be making a comeback.

03:41

(computers beeping)

03:43

It all starts with artificial intelligence.

03:46

- [Narrator] A machine has been programmed to see

03:48

and to move objects.

03:51

- AI isn't new.

03:52

The term was coined back in 1956.

03:55

In 1958, Cornell University psychologist,

03:58

Frank Rosenblatt, built the perceptron,

04:01

designed to mimic how neurons fire in our brains.

04:05

So here's a basic model of how neurons in our brains work.

04:08

An individual neuron can either fire or not,

04:12

so its level of activation can be represented

04:14

as a one or a zero.

04:16

The input to one neuron

04:18

is the output from a bunch of other neurons,

04:21

but the strength of these connections

04:22

between neurons varies,

04:24

so each one can be given a different weight.

04:27

Some connections are excitatory,

04:29

so they have positive weights,

04:30

while others are inhibitory,

04:32

so they have negative weights.

04:34

And the way to figure out

04:35

whether a particular neuron fires,

04:37

is to take the activation of each input neuron

04:40

and multiply by its weight,

04:42

and then add these all together.

04:44

If their sum is greater than some number called the bias,

04:47

then the neuron fires,

04:49

but if it's less than that, the neuron doesn't fire.

04:53

As input, Rosenblatt's perceptron had 400 photocells

04:57

arranged in a square grid,

04:59

to capture a 20 by 20-pixel image.

05:02

You can think of each pixel as an input neuron,

05:04

with its activation being the brightness of the pixel.

05:07

Although strictly speaking,

05:09

the activation should be either zero or one,

05:11

we can let it take any value between zero and one.

05:15

All of these neurons are connected

05:18

to a single output neuron,

05:20

each via its own adjustable weight.

05:23

So to see if the output neuron will fire,

05:25

you multiply the activation of each neuron by its weight,

05:28

and add them together.

05:30

This is essentially a vector dot product.

05:33

If the answer is larger than the bias, the neuron fires,

05:36

and if not, it doesn't.
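
As a rough sketch of that firing rule in code, with random stand-in values for the image and the weights, since the video does not give concrete numbers:

```python
import numpy as np

# The perceptron's firing rule: a 20x20 image flattens to 400 input activations
# (brightness in [0, 1]), each multiplied by its own weight; the output neuron
# fires if the sum exceeds the bias.

rng = np.random.default_rng(0)
image = rng.random((20, 20))       # stand-in for the 400 photocell readings
weights = rng.normal(size=400)     # one adjustable weight per pixel
bias = 0.0

activation = image.reshape(400) @ weights   # the vector dot product
fires = activation > bias                   # True = output neuron fires
```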

05:38

Now the goal of the perceptron

05:40

was to reliably distinguish between two images,

05:43

like a rectangle and a circle.

05:45

For example,

05:46

the output neuron could always fire

05:48

when presented with a circle,

05:49

but never when presented with a rectangle.

05:52

To achieve this, the perceptron had to be trained,

05:55

that is, shown a series of different circles

05:58

and rectangles, and have its weights adjusted accordingly.

06:02

We can visualize the weights as an image,

06:05

since there's a unique weight for each pixel of the image.

06:09

Initially, Rosenblatt set all the weights to zero.

06:12

If the perceptron's output is correct,

06:14

for example, here it's shown a rectangle

06:16

and the output neuron doesn't fire,

06:19

no change is made to the weights.

06:21

But if it's wrong, then the weights are adjusted.

06:23

The algorithm for updating the weights

06:25

is remarkably simple.

06:27

Here, the output neuron didn't fire when it was supposed to

06:30

because it was shown a circle.

06:32

So to modify the weights,

06:33

you simply add the input activations to the weights.

06:38

If the output neuron fires when it shouldn't,

06:40

like here, when shown a rectangle,

06:42

well, then you subtract the input activations

06:45

from the weights, and you keep doing this

06:48

until the perceptron correctly identifies

06:50

all the training images.

06:52

It was shown that this algorithm will always converge,

06:55

so long as it's possible to map the two categories

06:58

into distinct groups, that is, as long as they are linearly separable.
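
Here is a minimal sketch of that training loop as described, on hypothetical flattened images and labels. The zero initialization and the add/subtract update rule follow the video; the epoch cap is just a safety limit.

```python
import numpy as np

# Rosenblatt's update rule as described: start from zero weights; if the neuron
# wrongly stays silent, add the input activations to the weights; if it wrongly
# fires, subtract them; repeat until every training image is classified correctly.

def train_perceptron(images, labels, bias=0.0, max_epochs=1000):
    # images: (n_samples, n_pixels) activations in [0, 1]
    # labels: 1 = should fire (e.g. circle), 0 = should not (e.g. rectangle)
    weights = np.zeros(images.shape[1])
    for _ in range(max_epochs):
        mistakes = 0
        for x, target in zip(images, labels):
            fired = (x @ weights) > bias
            if fired and target == 0:        # fired when it shouldn't have
                weights -= x
                mistakes += 1
            elif not fired and target == 1:  # stayed silent when it should fire
                weights += x
                mistakes += 1
        if mistakes == 0:                    # every training image correct
            break
    return weights
```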

07:00

(footsteps thumping)

07:02

The perceptron was capable of distinguishing

07:04

between different shapes, like rectangles and triangles,

07:07

or between different letters.

07:09

And according to Rosenblatt,

07:10

it could even tell the difference between cats and dogs.

07:14

He said the machine was capable

07:15

of what amounts to original thought,

07:18

and the media lapped it up.

07:20

The "New York Times" called the perceptron

07:22

"the embryo of an electronic computer

07:25

that the Navy expects will be able to walk, talk,

07:28

see, write, reproduce itself,

07:30

and be conscious of its existence."

07:34

- [Narrator] After training on lots of examples,

07:36

it's given new faces it has never seen,

07:39

and is able to successfully distinguish male from female.

07:43

It has learned.

07:45

- In reality, the perceptron was pretty limited

07:47

in what it could do.

07:48

It could not, in fact, tell apart dogs from cats.

07:52

This and other critiques were raised

07:53

in a book by MIT giants, Minsky and Papert, in 1969.

07:58

And that led to a bust period

08:00

for artificial neural networks and AI in general.

08:03

It's known as the first AI winter.

08:06

Rosenblatt did not survive this winter.

08:09

He drowned while sailing in Chesapeake Bay

08:12

on his 43rd birthday.

08:14

(mellow upbeat music)

08:17

- [Narrator] The NAV Lab is a road-worthy truck,

08:19

modified so that researchers or computers

08:22

can control the vehicle as occasion demands.

08:25

- [Derek] In the 1980s, there was an AI resurgence

08:28

when researchers at Carnegie Mellon created one

08:30

of the first self-driving cars.

08:32

The vehicle was steered

08:33

by an artificial neural network called ALVINN.

08:36

It was similar to the perceptron,

08:37

except it had a hidden layer of artificial neurons

08:41

between the input and output.

08:43

As input, ALVINN received 30 by 32-pixel images

08:47

of the road ahead.

08:48

Here, I'm showing them as 60 by 64 pixels.

08:51

But each of these input neurons was connected

08:54

via an adjustable weight to a hidden layer of four neurons.

08:57

These were each connected to 32 output neurons.

09:01

So to go from one layer of the network to the next,

09:04

you perform a matrix multiplication:

09:06

the input activation times the weights.

09:10

The output neuron with the greatest activation

09:12

determines the steering angle.
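
A hedged sketch of that forward pass, using the layer sizes from the video (30 by 32 inputs, 4 hidden neurons, 32 outputs). The random weights and the sigmoid squashing function are illustrative assumptions, not ALVINN's actual trained parameters.

```python
import numpy as np

# Forward pass with ALVINN-style layer sizes: a 30x32 input image, a hidden
# layer of 4 neurons, and 32 output neurons whose most active unit picks the
# steering angle. Weights are random stand-ins.

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
image = rng.random((30, 32))             # stand-in for the camera frame
w_hidden = rng.normal(size=(30 * 32, 4))
w_output = rng.normal(size=(4, 32))

hidden = sigmoid(image.reshape(-1) @ w_hidden)   # matrix multiply into 4 hidden units
outputs = sigmoid(hidden @ w_output)             # matrix multiply into 32 output units
steering_index = int(np.argmax(outputs))         # most active unit -> steering angle
```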

09:15

To train the neural net,

09:16

a human drove the vehicle,

09:18

providing the correct steering angle

09:20

for a given input image.

09:22

All the weights in the neural network were adjusted

09:24

through the training

09:25

so that ALVINN's output better matched that

09:27

of the human driver.

09:30

The method for adjusting the weights

09:31

is called backpropagation,

09:33

which I won't go into here,

09:34

but Welch Labs has a great series on this,

09:37

which I'll link to in the description.

09:40

Again, you can visualize the weights

09:41

for the four hidden neurons as images.

09:44

The weights are initially set to be random,

09:46

but as training progresses,

09:48

the computer learns to pick up on certain patterns.

09:51

You can see the road markings emerge in the weights.

09:54

Simultaneously, the output steering angle coalesces

09:58

onto the human steering angle.

10:00

The computer drove the vehicle at a top speed

10:03

of around one or two kilometers per hour.

10:06

It was limited by the speed

10:07

at which the computer could perform matrix multiplication.

10:12

Despite these advances,

10:13

artificial neural networks still struggled

10:15

with seemingly simple tasks,

10:17

like telling apart cats and dogs.

10:19

And no one knew whether hardware

10:22

or software was the weak link.

10:24

I mean, did we have a good model of intelligence,

10:26

and we just needed more computer power?

10:28

Or, did we have the wrong idea

10:30

about how to make intelligent systems altogether?

10:33

So artificial intelligence experienced another lull

10:36

in the 1990s.

10:38

By the mid 2000s,

10:39

most AI researchers were focused on improving algorithms.

10:43

But one researcher, Fei-Fei Li,

10:45

thought maybe there was a different problem.

10:48

Maybe these artificial neural networks

10:50

just needed more data to train on.

10:52

So she planned to map out the entire world of objects.

10:56

From 2006 to 2009, she created ImageNet,

10:59

a database of 1.2 million human-labeled images,

11:02

which at the time,

11:03

was the largest labeled image dataset ever constructed.

11:06

And from 2010 to 2017,

11:08

ImageNet ran an annual contest:

11:10

the ImageNet Large Scale Visual Recognition Challenge,

11:13

where software programs competed to correctly detect

11:16

and classify images.

11:17

Images were classified into 1,000 different categories,

11:21

including 90 different dog breeds.

11:23

A neural network competing in this competition

11:25

would have an output layer of 1,000 neurons,

11:28

each corresponding to a category of object

11:30

that could appear in the image.

11:32

If the image contains, say, a German shepherd,

11:34

then the output neuron corresponding to German shepherd

11:37

should have the highest activation.

11:39

Unsurprisingly, it turned out to be a tough challenge.

11:43

One way to judge the performance of an AI

11:45

is to see how often the five highest neuron activations

11:48

do not include the correct category.

11:50

This is the so-called top-5 error rate.
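
In code, the top-5 error rate might be computed like this; the scores and labels below are random stand-ins, not real ImageNet outputs.

```python
import numpy as np

# Top-5 error rate: the fraction of images for which the correct category is
# NOT among the five highest output activations.

def top5_error(scores, true_labels):
    # scores: (n_images, n_classes) activations; true_labels: (n_images,) class indices
    top5 = np.argsort(scores, axis=1)[:, -5:]            # indices of the 5 largest scores
    hits = (top5 == true_labels[:, None]).any(axis=1)    # correct class among top 5?
    return 1.0 - hits.mean()

rng = np.random.default_rng(0)
scores = rng.random((100, 1000))                         # random stand-in scores
labels = rng.integers(0, 1000, size=100)
print(f"top-5 error: {top5_error(scores, labels):.1%}")  # ~99.5% for random guessing
```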

11:53

In 2010, the best performer had a top-5 error rate

11:56

of 28.2%, meaning that nearly 1/3 of the time,

12:01

the correct answer was not among its top five guesses.

12:04

In 2011, the error rate of the best performer was 25.8%,

12:09

a substantial improvement.

12:11

But the next year,

12:12

an artificial neural network

12:13

from the University of Toronto, called AlexNet,

12:16

blew away the competition

12:17

with a top-5 error rate of just 16.4%.

12:22

What set AlexNet apart was its size and depth.

12:25

The network consisted of eight layers,

12:27

and in total, 500,000 neurons.

12:30

To train AlexNet,

12:31

60 million weights and biases had to be carefully adjusted

12:35

using the training database.

12:37

Because of all the big matrix multiplications,

12:40

processing a single image required 700 million

12:43

individual math operations.

12:45

So training was computationally intensive.

12:48

The team managed it by pioneering the use of GPUs,

12:51

graphical processing units,

12:52

which are traditionally used for driving display screens.

12:56

So they're specialized for fast parallel computations.

13:00

The AlexNet paper describing their research

13:02

is a blockbuster.

13:04

It's now been cited over 100,000 times,

13:07

and it identifies the scale of the neural network

13:10

as key to its success.

13:12

It takes a lot of computation to train and run the network,

13:16

but the improvement in performance is worth it.

13:19

With others following their lead,

13:20

the top-5 error rate

13:22

on the ImageNet competition plummeted

13:23

in the years that followed, down to 3.6% in 2015.

13:28

That is better than human performance.

13:31

The neural network that achieved this

13:32

had 100 layers of neurons.

13:35

So the future is clear:

13:36

We will see ever increasing demand

13:38

for ever larger neural networks.

13:40

And this is a problem for several reasons:

13:43

One is energy consumption.

13:45

Training a neural network requires an amount

13:47

of electricity similar to the yearly consumption

13:49

of three households.

13:50

Another issue is the so-called Von Neumann Bottleneck.

13:54

Virtually every modern digital computer

13:55

stores data in memory,

13:57

and then accesses it as needed over a bus.

14:00

When performing the huge matrix multiplications required

14:02

by deep neural networks,

14:04

most of the time and energy goes

14:05

into fetching those weight values rather

14:07

than actually doing the computation.

14:10

And finally, there are the limitations of Moore's Law.

14:13

For decades, the number of transistors

14:14

on a chip has been doubling approximately every two years,

14:18

but now the size of a transistor

14:20

is approaching the size of an atom.

14:21

So there are some fundamental physical challenges

14:24

to further miniaturization.

14:26

So this is the perfect storm for analog computers.

14:30

Digital computers are reaching their limits.

14:32

Meanwhile, neural networks are exploding in popularity,

14:35

and a lot of what they do boils down

14:38

to a single task: matrix multiplication.

14:41

Best of all, neural networks don't need the precision

14:44

of digital computers.

14:45

Whether the neural net is 96% or 98% confident

14:48

the image contains a chicken,

14:50

it doesn't really matter, it's still a chicken.

14:52

So slight variability in components

14:54

or conditions can be tolerated.

14:57

(upbeat rock music)

14:58

I went to an analog computing startup in Texas,

15:01

called Mythic AI.

15:03

Here, they're creating analog chips to run neural networks.

15:06

And they demonstrated several AI algorithms for me.

15:10

- Oh, there you go.

15:11

See, it's getting you. (Derek laughs)

15:13

Yeah. - That's fascinating.

15:14

- The biggest use case is augmented and virtual reality.

15:17

If your friend is in a different place,

15:19

they're at their house and you're at your house,

15:20

you can actually render each other in the virtual world.

15:24

So it needs to really quickly capture your pose,

15:27

and then render it in the VR world.

15:29

- So, hang on, is this for the metaverse thing?

15:31

- Yeah, this is a very metaverse application.

15:35

This is depth estimation from just a single webcam.

15:38

It's just taking this scene,

15:39

and then it's doing a heat map.

15:41

So if it's bright, it means it's close.

15:43

And if it's far away, it makes it black.

15:45

- [Derek] Now all these algorithms can be run

15:47

on digital computers,

15:48

but here, the matrix multiplication is actually taking place

15:52

in the analog domain. (light music)

15:54

To make this possible,

15:55

Mythic has repurposed digital flash storage cells.

15:59

Normally these are used as memory

16:01

to store either a one or a zero.

16:03

If you apply a large positive voltage to the control gate,

16:07

electrons tunnel up through an insulating barrier

16:10

and become trapped on the floating gate.

16:12

Remove the voltage,

16:13

and the electrons can remain on the floating gate

16:15

for decades, preventing the cell from conducting current.

16:18

And that's how you can store either a one or a zero.

16:21

You can read out the stored value

16:22

by applying a small voltage.

16:25

If there are electrons on the floating gate,

16:26

no current flows, so that's a zero.

16:29

If there aren't electrons,

16:30

then current does flow, and that's a one.

16:33

Now Mythic's idea is to use these cells

16:36

not as on/off switches, but as variable resistors.

16:40

They do this by putting a specific number of electrons

16:42

on each floating gate, instead of all or nothing.

16:45

The greater the number of electrons,

16:47

the higher the resistance of the channel.

16:49

When you later apply a small voltage,

16:52

the current that flows is equal to V over R.

16:55

But you can also think of this as voltage times conductance,

16:59

where conductance is just the reciprocal of resistance.

17:02

So a single flash cell can be used

17:04

to multiply two values together, voltage times conductance.

17:09

So to use this to run an artificial neural network,

17:11

well they first write all the weights to the flash cells

17:14

as each cell's conductance.

17:16

Then, they input the activation values

17:19

as the voltage on the cells.

17:21

And the resulting current is the product

17:23

of voltage times conductance,

17:25

which is activation times weight.

17:28

The cells are wired together in such a way

17:30

that the current from each multiplication adds together,

17:34

completing the matrix multiplication.
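
An idealized numeric model of that scheme, ignoring noise and circuit details: weights become conductances, activations become voltages, each cell contributes a current V times G, and the currents on each shared output wire sum into one entry of the matrix product. The array sizes and values here are arbitrary.

```python
import numpy as np

# Weights stored as conductances G, activations applied as voltages V, each cell
# passing a current I = V * G, and currents summing on shared output wires.

rng = np.random.default_rng(0)
voltages = rng.random(8)                 # input activations, as cell voltages
conductances = rng.random((8, 4))        # weight matrix, as programmed conductances

cell_currents = voltages[:, None] * conductances  # one multiplication per flash cell
column_currents = cell_currents.sum(axis=0)       # currents add on each output wire

# The summed currents equal the matrix product of activations and weights:
assert np.allclose(column_currents, voltages @ conductances)
```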

17:36

(light music)

17:39

- So this is our first product.

17:40

This can do 25 trillion math operations per second.

17:45

- [Derek] 25 trillion.

17:47

- Yep, 25 trillion math operations per second,

17:49

in this little chip here,

17:50

burning about three watts of power.

17:52

- [Derek] How does it compare to a digital chip?

17:54

- The newer digital systems can do anywhere

17:57

from 25 to 100 trillion operations per second,

18:00

but they are big, thousand-dollar systems

18:02

that are spitting out 50 to 100 watts of power.

18:06

- [Derek] Obviously this isn't

18:07

like an apples-to-apples comparison, right?

18:09

- No, it's not apples to apples.

18:10

I mean, training those algorithms,

18:13

you need big hardware like this.

18:15

You can just do all sorts of stuff on the GPU,

18:17

but if you specifically are doing AI workloads

18:20

and you wanna deploy 'em, you could use this instead.

18:22

You can imagine them in security cameras,

18:25

autonomous systems,

18:26

inspection equipment for manufacturing.

18:29

Every time they make a Frito-Lay chip,

18:30

they inspect it with a camera,

18:32

and the bad Fritos get blown off of the conveyor belt.

18:36

But they're using artificial intelligence

18:37

to spot which Fritos are good and bad.

18:40

- Some have proposed using analog circuitry

18:42

in smart home speakers,

18:43

solely to listen for the wake word, like Alexa or Siri.

18:47

They would use a lot less power and be able to quickly

18:49

and reliably turn on the digital circuitry of the device.

18:53

But you still have to deal with the challenges of analog.

18:56

- So for one of the popular networks,

18:58

there would be 50 sequences

19:00

of matrix multiplies that you're doing.

19:02

Now, if you did that entirely in the analog domain,

19:05

by the time it gets to the output,

19:06

it's just so distorted

19:07

that you don't have any result at all.

19:10

So you convert it from the analog domain,

19:12

back to the digital domain,

19:14

send it to the next processing block,

19:15

and then you convert it into the analog domain again.

19:18

And that allows you to preserve the signal.
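
A toy illustration of why that helps, under a purely made-up noise and quantization model (the real distortion mechanisms and bit depths are not specified here): each analog stage adds a little error, and re-digitizing after every stage keeps that error from compounding across 50 stages.

```python
import numpy as np

# Each "analog stage" adds a little random error. Chaining many stages purely in
# analog lets the error compound, while converting back to a digital grid after
# each stage keeps it bounded. Noise level and 8-bit grid are assumptions only.

rng = np.random.default_rng(0)

def analog_stage(x, noise=0.0005):
    return x + rng.normal(0.0, noise, size=x.shape)   # stand-in for one analog block

def requantize(x, levels=256):
    return np.round(x * (levels - 1)) / (levels - 1)  # snap back to an 8-bit digital value

signal = requantize(rng.random(1000))   # start from values already on the digital grid
pure_analog, mixed = signal.copy(), signal.copy()
for _ in range(50):                     # 50 matrix-multiply stages
    pure_analog = analog_stage(pure_analog)
    mixed = requantize(analog_stage(mixed))

print("drift, all-analog  :", np.abs(pure_analog - signal).mean())
print("drift, re-digitized:", np.abs(mixed - signal).mean())
```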

19:20

- You know, when Rosenblatt was first setting

19:22

up his perceptron,

19:23

he used a digital IBM computer.

19:26

Finding it too slow,

19:28

he built a custom analog computer,

19:30

complete with variable resistors

19:32

and little motors to drive them.

19:35

Ultimately, his idea of neural networks

19:37

turned out to be right.

19:39

Maybe he was right about analog, too.

19:43

Now, I can't say whether analog computers will take

19:46

off the way digital did last century,

19:48

but they do seem to be better suited

19:51

to a lot of the tasks that we want computers

19:53

to perform today,

19:55

which is a little bit funny

19:56

because I always thought of digital

19:58

as the optimal way of processing information.

20:01

Everything from music to pictures,

20:03

to video has all gone digital in the last 50 years.

20:07

But maybe in a hundred years,

20:09

we will look back on digital,

20:11

not as the end point of information technology,

20:15

but as a starting point.

20:17

Our brains are digital

20:19

in that a neuron either fires or it doesn't,

20:21

but they're also analog

20:24

in that thinking takes place everywhere, all at once.

20:28

So maybe what we need

20:30

to achieve true artificial intelligence,

20:32

machines that think like us, is the power of analog.

20:37

(gentle music)

20:42

Hey, I learned a lot while making this video,

20:44

much of it by playing with an actual analog computer.

20:47

You know, trying things out for yourself

20:48

is really the best way to learn,

20:50

and you can do that with this video sponsor, Brilliant.

20:53

Brilliant is a website and app

20:54

that gets you thinking deeply

20:56

by engaging you in problem-solving.

20:58

They have a great course on neural networks,

21:00

where you can test how it works for yourself.

21:02

It gives you an excellent intuition

21:04

about how neural networks can recognize numbers and shapes,

21:07

and it also allows you to experience the importance

21:09

of good training data and hidden layers

21:11

to understand why more sophisticated

21:14

neural networks work better.

21:15

What I love about Brilliant

21:16

is it tests your knowledge as you go.

21:19

The lessons are highly interactive,

21:20

and they get progressively harder as you go on.

21:23

And if you get stuck, there are always helpful hints.

21:26

For viewers of this video,

21:27

Brilliant is offering the first 200 people

21:29

20% off an annual premium subscription.

21:32

Just go to brilliant.org/veritasium.

21:35

I will put that link down in the description.

21:37

So I wanna thank Brilliant for supporting Veritasium,

21:40

and I wanna thank you for watching.