Article Derived From Transcript of YouTube Video: NVIDIA CEO Jensen Huang Keynote at COMPUTEX 2024

Transcript of YouTube Video: NVIDIA CEO Jensen Huang Keynote at COMPUTEX 2024

Welcome to our collection of transcripts of YouTube videos, where we provide detailed text versions of "NVIDIA CEO Jensen Huang Keynote at COMPUTEX 2024" content.

Article Derived From TranscriptVideo Transcript

Transcript Summary

NVIDIA CEO Jensen Huang's keynote address at COMPUTEX 2024 highlights the expanding role of the company beyond GPUs, emphasizing their contributions to the development of AI infrastructure. Huang discusses the upcoming ventures, including the production of self-driving cars with Mercedes and JLR fleets, as well as advancements in humanoid robotics enabled by cognitive capabilities and world understanding AI developments. He underscores Taiwan's integral role in manufacturing and how the future will see the creation of robots and computers with physical mobility. The speech also delves into the transformative impact of generative AI on industries, envisioning a shift where applications assemble teams of AI experts (Nims) to execute tasks, interact through digital human interfaces, and fundamentally change the way software is produced and utilized. This new paradigm, according to Huang, ushers in a restart of the computer industry and an industrial revolution where computers become factories for generating intelligence across sectors.

Detailed Transcript of YouTube Videos

ll right, let's get started. Good take, good take. Okay, you guys ready? Yeah, yeah. Everybody thinks we make GPUs. I didn't video is so much more than that. This whole keynote is going to be about that, okay. So we'll start at the top with examples of the use cases and then seeing it in action. That's kind of the flow. It's such a compelling story. I'm super nervous about this. I just got to get in the rhythm. We're two weeks away from it. You guys can really, really make it great. We should take a swing at it. Yeah, that's the plan. We need to get daily status on that animation. Can you mute it cuz I hear myself? Sorry, what's the drop date for all the videos? It needs to be done on the 28th. Did you get all that? Safe travels, everybody. Super excited to see everyone. See you guys soon. Okay, bye.

We're basically moving as fast as the world can absorb technology, so we've got to lead the problem ourselves. The spine, you just have to figure out a way to make it pop, you know what I'm saying? Yeah, you know what I'm saying. You want to, yeah, that kind of thing. Thank you. I'm super late. Let's go.

Please welcome to the stage, Nvidia founder and CEO, Jensen Huang.

I am very happy to be back. Thank you for letting us use your stadium. The last time I was here, I received a degree from N and I gave the "run, don't walk" speech. Today, we have a lot to cover, so I cannot walk. I must run. We have a lot to cover. I have many things to tell you. I'm very happy to be here in Taiwan. Taiwan is the home of our treasured partners. This is, in fact, where everything Nvidia does begins. Our partners and ourselves take it to the world.

Taiwan and our partnership has created the world's AI infrastructure. Today, I want to talk to you about several things: one, what is happening and the meaning of the work that we do together; what is generative AI, what is its impact on our industry and on every industry; a blueprint for how we will go forward and engage this incredible opportunity; and what's coming next - generative AI and its impact, our blueprint, and what comes next. These are really exciting times, a restart of our computer industry, an industry that you have forged, an industry that you have created, and now you're prepared for the next major journey.

But before we start, Nvidia lives at the intersection of computer graphics, simulations, and artificial intelligence. This is our soul. Everything that I show you today is simulation. It's math, it's science, it's computer science, it's amazing computer architecture, none of it's animated, and it's all homemade. This is Nvidia's soul, and we put it all into this virtual world we call Omniverse. Please enjoy.

I want to speak to you in Chinese, but I have so much to tell you. I have to think too hard to speak Chinese, so I have to speak to you in English. At the foundation of everything that you saw were two fundamental technologies: accelerated computing and artificial intelligence running inside the Omniverse. Those two technologies, those two fundamental forces of computing, are going to reshape the computer industry.

The computer industry is now some 60 years old. In a lot of ways, everything that we do today was invented the year after my birth in 1964. The IBM System 360 introduced central processing units, general-purpose computing, the separation of hardware and software through an operating system, multitasking, I/O subsystems, DMA, all kinds of technologies that we use today, architectural compatibility, backward compatibility, family compatibility - all of the things that we know today about computing, largely described in 1964.

Of course, the PC revolution democratized computing and put it in the hands and the houses of everybody. And then, in 2007, the iPhone introduced mobile computing and put the computer in our pockets. Ever since, everything is connected and running all the time through the mobile cloud.

In the last 60 years, we saw several - just a few, actually, two or three - major technology shifts in computing, where everything changed, and we're about to see that happen again. There are two fundamental things that are happening:

  1. The processor, the engine by which the computer industry runs, the central processing unit's performance scaling, has slowed tremendously. Yet, the amount of computation we have to do is still doubling very quickly, exponentially.
  2. If the processing requirements and the data that we need to process continue to scale exponentially, but performance does not, we will experience computation inflation. In fact, we're seeing that right now as we speak. The amount of data center power used all over the world is growing quite substantially. The cost of computing is growing. We are seeing computation inflation.

This, of course, cannot continue. The data is going to continue to increase exponentially, and CPU performance scaling will never return. There is a better way. For almost two decades now, we've been working on accelerated computing. CUDA augments a CPU, offloads, and accelerates the work that a specialized processor can do much, much better. In fact, the performance is so extraordinary that it is very clear now, as CPU scaling has slowed and even substantially stopped, we should accelerate everything.

I predict that every application that is processing-intensive will be accelerated, and surely every data center will be accelerated in the near future. Now, accelerated computing is very sensible. It's very common sense. If you take a look at an application, and here the 100T means 100 units of time - it could be 100 seconds, it could be 100 hours. In many cases, as you know, we're now working on artificial intelligence applications that run for 100 days.

The one T is code that requires sequential processing, where single-threaded CPUs are really quite essential - operating systems control logic, really essential to have one instruction executed after another. However, there are many algorithms - computer graphics is one - that you can operate completely in parallel. Computer graphics, image processing, physics simulations, combinatorial optimizations, graph processing, database processing, and of course, the very famous linear algebra of deep learning. There are many types of algorithms that are very conducive to acceleration through parallel processing.

So we invented an architecture to do that by adding the GPU to the CPU. The specialized processor can take something that takes a great deal of time and accelerate it down to something that is incredibly fast. And because the two processors can work side by side, they're both autonomous, and they're both separate and independent. That is, we could accelerate what used to take 100 units of time down to one unit of time. The speed up is incredible. It almost sounds unbelievable, but today, I'll demonstrate many examples for you. The benefit is quite extraordinary - a 100 times speed up, but you only increase the power by about a factor of three, and you increase the cost by only about 50%.

We do this all the time in the PC industry. We add a $500 GPU, a GeForce GPU, to a $1,000 PC, and the performance increases tremendously. We do this in a data center - a billion-dollar data center. We add $500 million worth of GPUs, and all of a sudden, it becomes an AI factory. This is happening all over the world today.

Well, the savings are quite extraordinary. You're getting 60 times performance per dollar. 100 times speed up, you only increase your power by 3x, 100 times speed up, you only increase your cost by 1.5x. The savings are incredible. The savings are measured in dollars. It is very clear that many companies spend hundreds of millions of dollars processing data in the cloud. If it was accelerated, it is not unexpected that you could save hundreds of millions of dollars.

Now, why is that? Well, the reason for that is very clear. We've been experiencing inflation for so long in general-purpose computing that we finally came to a point where we finally determined to accelerate. There's an enormous amount of captured loss that we can now regain, a great deal of captured retained waste that we can now relieve out of the system, and that will translate into savings - savings in money, savings in energy.

That's why you've heard me say, "The more you buy, the more you save," and now I've shown you the mathematics. It is not accurate, but it is correct. Okay, that's called CEO math. CEO math is not accurate, but it is correct. The more you buy, the more you save.

Accelerated computing does deliver extraordinary results, but it is not easy. Why is it that it saves so much money, but people haven't done it for so long? The reason for that is because it's incredibly hard. There is no such thing as a software that you can just run through a C compiler, and all of a sudden, that application runs 100 times faster. That is not even logical. If it was possible to do that, they would have just changed the CPU to do that.

You, in fact, have to rewrite the software. That's the hard part. The software has to be completely rewritten so that you could re-express the algorithms that were written on a CPU so that it could be accelerated, offloaded, accelerated, and run in parallel. That computer science exercise is insanely hard.

Well, we've made it easy for the world over the last 20 years, of course. The very famous CNN, the deep learning library that processes neural networks. We have a library for AI physics that you could use for fluid dynamics and many other applications where the neural network has to obey the laws of physics. We have a great new library called AIAL, which is a CUDA accelerated 5G radio, so that we can software-define and accelerate the telecommunications networks.

The way that we've software-defined the world's networking, the internet, and so the ability for us to accelerate that allows us to turn all of telecom into essentially the same type of platform, a computing platform, just like we have in the cloud. Kitho is a computational lithography platform that allows us to process the most computationally intensive parts of chip manufacturing, making the masks. TSMC is in the process of going to production with Kitho, saving enormous amounts of energy and more enormous amounts of money.

But the goal for TSMC is to accelerate their stack so that they're prepared for even further advances in algorithms and more computation for deeper and deeper narrow and narrow transistors. Parab is our gene sequencing library. It is the highest throughput library in the world for gene sequencing. COP is an incredible library for combinatorial optimization, route planning optimization, the traveling salesman problem.

Incredibly complicated, people - well, scientists have largely concluded that you needed a quantum computer to do that. We created an algorithm that runs on accelerated computing that runs lightning fast, 23 world records. We hold every single major world record today.

C Quantum is an emulation system for a quantum computer. If you want to design a quantum computer, you need a simulator to do so. If you want to design quantum algorithms, you need a quantum emulator to do so. How would you do that? How would you design these quantum computers, create these quantum algorithms if the quantum computer doesn't exist, while you use the fastest computer in the world that exists today, and we call it, of course, Nvidia CUDA. On that, we have an emulator that simulates quantum computers. It is used by several hundred thousand researchers around the world.

It is integrated into all the leading frameworks for quantum computing and is used in scientific supercomputing centers all over the world. CDF is an unbelievable library for data processing. Data processing consumes the vast majority of cloud spend today. All of it should be accelerated. QDF accelerates the major libraries used in the world, Spark. Many of you probably use Spark in your companies, pandas, a new one called Polar, and of course, networkx, which is a graph processing, database, um, library.

So these are just some examples. There are so many more. Each one of them had to be created so that we can enable the ecosystem to take advantage of accelerated computing. If we hadn't created CNN CUDA alone wouldn't have been able to be used by all of the deep learning scientists around the world because CUDA and the algorithms that are used in TensorFlow and PyTorch, the deep learning algorithms, the separation is too far apart. It's almost like trying to do computer graphics without OpenGL. It's almost like doing data processing without SQL. These domain-specific libraries are really the treasure of our company. We have 350 of them. These libraries is what it takes and what has made it possible for us to have such open so many markets. I'll show you some other examples today.

Well, just last week, Google announced that they've put CDF in the cloud and accelerated pandas. Pandas is the most popular data science library in the world. Many of you in here probably already use pandas. It's used by 10 million data scientists in the world, downloaded 170 million times each month. It is the Excel, the spreadsheet of data scientists. Well, with just one click, you can now use pandas in Collab, which is Google's cloud data centers platform, accelerated by QDF. The speed up is really incredible. Let's take a look.

That was a great demo, right? It didn't take long. When you accelerate data processing that fast, demos don't take long.

Okay, well, CUDA has now achieved what people call a tipping point, but it's even better than that. CUDA has now achieved a virtuous cycle. This rarely happens. If you look at history and all the computing architecture, computing platforms, in the case of microprocessors, CPUs, it has been here for 60 years. It has not been changed for 60 years at this level.

Creating a new platform is extremely hard because it's a chicken and egg problem. If there are no developers that use your platform, then of course, there will be no users. But if there are no users, there's no install base. If there's no install base, developers aren't interested in it. Developers want to write software for a large installed base, but a large installed base requires a lot of applications, so that users would create that install base.

This chicken or the egg problem has rarely been broken and has taken us now 20 years, one domain library after another, one acceleration library after another, and now we have 5 million developers around the world. We serve every single industry, from healthcare, financial services, of course, the computer industry, automotive industry, just about every major industry in the world, just about every field of science because there are so many customers for our architecture.

OEM and cloud service providers are interested in building our systems. System makers, amazing system makers like the ones here in Taiwan, are interested in building our systems, which then takes and offers more systems to the market, which of course creates greater opportunity for us, which allows us to increase our scale R&D, scale which speeds up the application even more.

Well, every single time we speed up the application, the cost of computing goes down. This is that slide I was showing you earlier. 100x speed up translates to 97-98% savings. So when we go from 100x speed up to 200x speed up to a thousand x speed up, the savings, the marginal cost of computing, continues to fall.

We believe that by reducing the cost of computing incredibly, the market, developers, scientists, inventors, will continue to discover new algorithms that consume more and more and more computing. So that one day, something happens, that a phase shift happens, that the marginal cost of computing is so low that a new way of using computers emerges.

In fact, that's what we're seeing now. Over the years, we have driven down the marginal cost of computing. In the last 10 years, in one particular algorithm, by a million times. Well, as a result, it is now very logical and very common sense to train large language models with all of the data on the internet. Nobody thinks twice about this idea that you could create a computer that could process so much data to write its own software.

The emergence of artificial intelligence was made possible because of this complete belief that if we made computing cheaper and cheaper and cheaper, somebody's going to find a great use.

Today, CUDA has achieved the virtual cycle. Installed bases are growing. Computing cost is coming down, which causes more developers to come up with more ideas, which drives more demand. Now, we're on the beginning of something very, very important.

But before I show you that, I want to show you what is not possible if not for the fact that we created CUDA. That we created the modern version of generative AI. What I'm about to show you would not be possible.

This is Earth 2. The idea that we would create a digital twin of the Earth, that we would go and simulate the Earth so that we could predict the future of our planet, to better avert disasters or better understand the impact of climate change, so that we can adapt better, so that we could change our habits.

Now, this digital twin of Earth is probably one of the most ambitious projects that the world's ever undertaken, and we're taking large steps every single year. I'll show you results every single year, but this year, we made some great breakthroughs. Let's take a look.

On Monday, the storm will veer north again and approach Taiwan. There are big uncertainties regarding its path. Different paths will have different levels of impact on Taiwan.

For NVIDIA Earth 2.

Someday in the near future, we will have continuous weather prediction at every square kilometer on the planet. You will always know what the climate's going to be. You will always know, and this will run continuously because we've trained the AI, and the AI requires so little energy. So this is just an incredible achievement. I hope you enjoyed it.

And very importantly, the truth is that was a Jensen AI that was not me. I wrote it, but an AI Jensen AI had to say it. That is a miracle. That is a miracle indeed.

However, in 2012, something very important happened because of our dedication to advancing CUDA, because of our dedication to continuously improve the performance and drive the cost down. Researchers discovered CUDA in 2012. That was Nvidia's first contact with AI. This was a very important day. We had the good wisdom to work with the scientists to make it possible for deep learning to happen, and AlexNet achieved, of course, a tremendous computer vision breakthrough.

But the great wisdom was to take a step back and understand what was the background, what is the foundation of deep learning, what is this long-term impact, what is its potential? And we realized that this technology has great potential to scale an algorithm that was invented and discovered decades ago.

All of a sudden, because of more data, larger networks, and very importantly, a lot more compute, all of a sudden, deep learning was able to achieve what no human algorithm was able to. Now, imagine if we were to scale up the architecture even more, larger networks, more data, and more compute, what could be possible?

So we dedicated ourselves to reinvent everything after 2012. We changed the architecture of our GPU to add tensor cores. We invented NVLink. That was 10 years ago. Now, CDNN, tensor RT, nickel, we bought Mellanox, tensor RTLM, the Triton inference server, and all of it came together on a brand new computer. Nobody understood it, nobody asked for it, nobody understood it, and in fact, I was certain nobody wanted to buy it.

So we announced it at GTC, and Open AI, a small company in San Francisco, saw it and they asked me to deliver one to them. I delivered the first DGX, the world's first AI supercomputer, to Open AI in 2016.

Well, after that, we continued to scale from one AI supercomputer, one AI appliance, we scaled it up to large supercomputers, even larger by 2017. The world discovered Transformers, so that we could train enormous amounts of data and recognize and learn patterns that are sequential over large spans of time.

It is now possible for us to train these large language models to understand and achieve a breakthrough in natural language understanding. We kept going after that. We built even larger ones, and then in November 2022, trained on thousands, tens of thousands of Nvidia GPUs, in a very large AI supercomputer, Open AI announced Chat GPT.

One million users after five days, one million after five days, a hundred million after two months, the fastest-growing application in history. And the reason for that is very simple. It is just so easy to use, and it was so magical to use, to be able to interact with a computer like it's human, instead of being clear about what you want, it's like the computer understands your meaning, it understands your intention.

Oh, I think here, it, um, asked the, the closest Night Market, night, as you know, the night market is very important to me. So when I was young, uh, I was, I think I was four and a half years old, I used to love going to the night market because I, I just love watching people and, and, uh, and so we went, my parents used to take us to the night market, uh, and, and, um, and I love, I love, uh, going, and one day, uh, my face, you guys might, might see that I have a large scar on my face. My face was cut because somebody was washing their knife, and I was a little kid, um, but my memories of the Night Market is, uh, so deep because of that, and I used to love, I, I just, I still love going to the night market, and I just need to tell you guys this, the Tona Night Market is, is really good because there's a lady, uh, she's been working there for 43 years. She's the fruit lady, and it's in the middle of the street, between the two, go find her, okay? She's really terrific.

I think it would be funny after this, all of you go to see her. She's doing better and better every year. Her cart has improved, and yeah, I just love watching her succeed. Anyways, uh, Chat GPT came along, and, um, and something is very important in this slide here. Let me show you something. This slide, okay, and this slide. The fundamental difference is this. Until Chat GPT revealed it to the world, AI was all about perception, natural language understanding, computer vision, speech recognition. It's all about perception and detection.

This was the first time the world saw a generative AI. It produced tokens, one token at a time, and those tokens were words. Some of the tokens, of course, could now be images or charts or tables, songs, words, speech, videos. Those tokens could be anything. They could be anything that you can learn the meaning of. It could be tokens of chemicals, tokens of proteins, genes. You saw earlier in Earth 2, we were generating tokens of the weather.

We can generate tokens for almost anything of value. We can generate steering wheel control for a car. We can generate articulation for a robotic arm. Everything that we can learn, we can now generate.

We have now arrived not at the AI era, but a generative AI era. But what's really important is this. This computer that started out as a supercomputer has now evolved into a data center, and it produces one thing. It produces tokens. It's an AI factory.

This AI factory is generating, creating, producing something of great value, a new commodity. In the late 1890s, Nicola Tesla invented an AC generator. We invented an AI generator. The AC generator generates electrons. Nvidia's AI generator generates tokens. Both of these things have large market opportunities. It's completely fungible in almost every industry, and that's why it's a new industrial revolution.

We have now a new factory producing a new commodity for every industry that is of extraordinary value, and the methodology for doing this is quite scalable, and the methodology for doing this is quite repeatable. Notice how quickly so many different AI models, generative AI models, are being invented, literally daily. Every single industry is now piling on for the very first time.

The IT industry, which is a $3 trillion IT industry, is about to create something that can directly serve a hundred trillion dollars of industry, no longer just an instrument for information storage or data processing but a factory for generating intelligence for every industry. This is going to be a manufacturing industry, not a manufacturing industry of computers, but using the computers in manufacturing. This has never happened before. It's quite an extraordinary thing.

What led started with accelerated computing led to AI, led to generative AI, and now an industrial revolution. Now, the impact to our industry is also quite significant. Of course, we could create a new commodity, a new product we call tokens for many industries, but the impact to ours is also quite profound.

For the very first time, as I was saying earlier, in 60 years, every single layer of computing has been changed. From CPUs, general-purpose computing, to accelerated GPU computing, where the computer needs instructions. Now, computers process LLMs, large language models, AI models.

Whereas the computing model of the past is retrieval-based - almost every time you touch your phone, some pre-recorded text or pre-recorded image or pre-recorded video is retrieved for you and recomposed based on a recommender system to present it to you based on your habits.

But in the future, your computer will generate as much as possible, retrieve only what's necessary. And the reason for that is because generated data requires less energy to go fetch information. Generated data is also more contextually relevant. It will encode knowledge. It will your understanding of you.

Instead of "get that information for me" or "get that file for me," you just say, "ask me for an answer." Instead of a tool, instead of your computer being a tool that we use, the computer will now generate skills. It performs tasks.

Instead of an industry that is producing software, which was a revolutionary idea in the early '90s - remember the idea that Microsoft created for packaging software, revolutionized the PC industry. Without packaged software, what would we use the PC to do? It drove this industry.

Now we have a new factory, a new computer, and what we will run on top of this is a new type of software, and we call it NIMS, Nvidia inference microservices. Now, what happens is the NIM runs inside this factory, and this NIM is a pre-trained model. It's an AI. Well, this AI is, of course, quite complex in itself, but the computing stack that runs AI is insanely complex.

When you go and use Chat GPT, underneath there, their stack is a whole bunch of software. Underneath that prompt is a ton of software, and it's incredibly complex because the models are large, billions to trillions of parameters. It doesn't run on just one computer. It runs on multiple computers. It has to distribute the workload across multiple GPUs, tensor parallelism, pipeline parallelism, data parallelism, all kinds of parallelism, expert parallelism, all kinds of parallelism, distributing the workload across multiple GPUs, processing it as fast as possible.

Because if you are in a factory, if you run a factory, your throughput directly correlates to your revenues. Your throughput directly correlates to the quality of service, and your throughput directly correlates to the number of people who can use your service.

We are now in a world where data center throughput utilization is vitally important. It was important in the past, but not vitally important. It was important in the past, but people don't measure it today. Every parameter is measured - start time, uptime, utilization, idle time, you name it because it's a factory. When something is a factory, its operations directly correlate to the financial performance of the company.

So we realize that this is incredibly complex for most companies to do, so what we did was we created this AI in a box, and the containers incredible months of software inside this container is CUDA, CNN, tensor RT, Triton for inference services. It is cloud-native so that you could auto-scale in a Kubernetes environment. It has management services and hooks so that you can monitor your AIs. It has common APIs, standard API, so that you could literally chat with this box.

You download this NIM, and you can talk to it, so long as you have CUDA on your computer, which is now, of course, everywhere. It's in every cloud, available from every computer maker. It is available in hundreds of millions of PCs.

When you download this, you have an AI, and you can chat with it like Chat GPT. All of the software is now integrated - 400 dependencies, all integrated into one. We tested this NIM, each one of these pre-trained models against all our entire install base that's in the cloud, all the different versions of Pascal and Ampere and Hoppers, and all kinds of different versions.

I even forget some NIMs incredible invention. This is one of my favorites, and of course, as you know, we now have the ability to create large language models and pre-trained models of all kinds, and we have all of these various versions, whether it's language-based or vision-based or imaging-based, or we have versions that are available for healthcare, digital biology.

We have versions that are digital humans that I'll talk to you about, and the way you use this, just come to AI.nvidia.com, and today we just posted up on Hugging Face, the Llama 3 NIM, fully optimized. It's available there for you to try, and you can even take it with you. It's available to you for free.

So you could run it in the cloud, run it in any cloud, you could download this container, put it into your own data center, and you could host it, make it available for your customers. We have, as I mentioned, all kinds of different domains, physics, some of it is for semantic retrieval called RAGs, vision, languages, all kinds of different languages.

And the way that you use it is connecting these microservices into large applications. One of the most important applications in the coming future, of course, is customer service agents. Customer service agents are necessary in just about every single industry. It represents trillions of dollars of customer service around the world.

Nurses are customer service agents, um, in some ways, some of them are non-prescription or non-diagnostics-based nurses. They are essentially customer service, customer service for retail, for uh, quick service foods, financial services, insurance - just tens and tens of millions of customer service can now be augmented by language models and augmented by AI.

So these boxes that you see are basically NIMs. Some of the NIMs are reasoning agents - given a task, figure out what the mission is, break it down into a plan. Some of the NIMs retrieve information. Some of the NIMs might go and do search. Some of the NIMs might use a tool like COP that I was talking about earlier. It could use a tool that, uh, could be running on SAP, and so it has to learn a particular language called ABAP. Maybe some NIMs have to do SQL queries.

So all of these NIMs are experts that are now assembled as a team. What's happening? The application layer has been changed. What used to be applications written with instructions are now applications that are assembling teams, assembling teams of AIs. Very few people know how to write programs. Almost everybody knows how to break down a problem and assemble teams.

I believe every company in the future will have a large collection of NIMs, and you would bring down the experts that you want, connect them into a team, and you don't even have to figure out exactly how to connect them. You just give the mission to an agent, to a NIM, to figure out who to break the task down and who to give it to, and they that a that central, the leader of the application, if you will, the leader of the team, would break down the task and give it to the various team members.

The team members would do their task, bring it back to the team leader, the team leader would reason about that and present an information back to you, just like humans.

This is in our near future. This is the way applications are going to look now.

Of course, we could interact with these large AI services with text prompts and speech prompts. However, there are many applications where we would like to interact with what is otherwise a human-like form. We call them digital humans. Nvidia has been working on digital human technology for some time. Let me show it to you.

And well before I do that, hang on a second. Before I do that, okay, digital humans have the potential of being a great interactive agent with you. They make much more engaging. They could be much more empathetic. And of course, um, we have to cross this incredible chasm, this uncanny chasm of realism, so that the digital humans would appear much more natural. This is, of course, our vision. This is a vision of where we love to go, uh, but let me show you where we are.

Great to be in Taiwan before I head out to the night market. Let's dive into some exciting frontiers of digital humans. Imagine a future where computers interact with us just like humans can.

Hi, my name is Sophie, and I am a digital human brand ambassador for Unique. This is the incredible reality of digital humans. Digital humans will revolutionize industries from customer service to advertising and gaming. The possibilities for digital humans are endless.

You, the scans you took of your current kitchen with your phone, they will be AI interior designers helping generate beautiful photorealistic suggestions and sourcing the materials and furniture. We have generated several design options for you to choose from. There'll also be AI customer service agents making the interaction more engaging and personalized.

Or digital healthcare workers who will check on patients, providing timely personalized care.

Um, I did forget to mention to the doctor that I am allergic to penicillin. Is it still okay to take the medications, the antibiotics you've been prescribed, ciprofloxacin and metronidazole?

Don't contain penicillin, so it's perfectly safe for you to take them.

And they'll even be AI brand ambassadors setting the next marketing and advertising trends.

Hi, I'm IMA, Japan's first virtual model. New breakthroughs in generative AI and computer graphics let digital humans see, understand, and interact with us in human-like ways.

From what I can see, it looks like you're in some kind of recording or production setup.

The foundation of digital humans are AI models built on multilingual speech recognition and synthesis and LLMs that understand and generate conversation. The AIs connect to another generative AI to dynamically animate a lifelike 3D mesh of a face.

And finally, AI models that reproduce lifelike appearances, enabling real-time path-traced subsurface scattering to simulate the way light penetrates the skin, scatters, and exits at various points, giving skin its soft and translucent appearance.

Nvidia ACE is a suite of digital human technologies packaged as easy-to-deploy, fully optimized microservices or NIMs. Developers can integrate ACE NIMs into their existing frameworks, engines, and digital human experiences.

Neoton SLM and LLM NIMs to understand our intent and orchestrate other models. Reva speech NIMs for interactive speech and translation. Audio to face and gesture NIMs for facial and body animation. And Omniverse RTX with DLSS for neuro rendering of skin and hair.

ACE NIMs run on Nvidia GDN, a global network of Nvidia accelerated infrastructure that delivers low-latency digital human processing to over 100 regions.

It's pretty incredible. While those ACE runs in a cloud, but it also runs on PCs. We had the good wisdom of including tensor core GPUs in all of RTX. So we've been shipping AI GPUs for some time, preparing ourselves for this day.

The reason for that is very simple. We always knew that in order to create a new computing platform, you need an install base first. Eventually, the application will come. If you don't create the install base, how could the application come? And so if you build it, they might not come, but if you build it, if you don't build it, they cannot come.

And so we installed every single RTX GPU with tensor core G, tensor core processing, and now we have 100 million GeForce RTX AI PCs in the world, and we're shipping 200.

And this copy Tex were featuring four new amazing laptops, all of them are able to run AI. Your future laptop, your future PC will become an AI. It'll be constantly helping you, assisting you in the background.

The PC will also run applications that are enhanced by AI, of course. All your photo editing, your writing, and your tools, all the things that you use will all be enhanced by AI. And your PC will also host applications with digital humans that are AIs.

So there are different ways that AIs will manifest themselves and become used in PCs, but PCs will become very important AI platforms. So where do we go from here?

I spoke earlier about the scaling of our data centers, and every single time we scaled, we found a new phase change. When we scaled from DGX into large AI supercomputers, we enabled Transformers to be able to train on enormously large data sets.

Well, what happened was, in the beginning, the data was human-supervised. It required human labeling to train AIs. Unfortunately, there's only so much you can human label. Transformers made it possible for unsupervised learning to happen.

Now, Transformers just look at an enormous amount of data or look at an enormous amount of video or look at an enormous amount of images, and it can learn from studying an enormous amount of data, find the patterns and relationships itself.

While the next generation of AI needs to be physically based. Most of the AIs today don't understand the laws of physics. It's not grounded in the physical world. In order for us to generate images, videos, 3D graphics, and many physics phenomena, we need AI that are physically based and understand the laws of physics.

Well, the way that you could do that is, of course, learning from video is one source. Another way is synthetic data, simulation data. And another way is using computers to learn with each other. This is really no different than using AlphaGo, having AlphaGo play itself, self-play, and between the two capabilities, playing each other for a very long period of time, they emerge even smarter.

So you're going to start to see this type of AI emerging. Well, if the AI data is synthetically generated and using reinforcement learning, it stands to reason that the rate of data generation will continue to advance, and every single time data generation grows, the amount of computation that we have to offer needs to grow with it.

We are to enter a phase where AIs can learn the laws of physics and understand and be grounded in physical world data. So we expect that models will continue to grow, and we need larger GPUs.

Blackwell was designed for this generation. This is Blackwell, and it has several very important technologies. One of course is just the size of the chip. We took two of the largest chips that is as large as you can make it at TSMC and connected two of them together with a 10 terabytes per second link between the world's most advanced CCs, connecting these two together.

We then put two of them on a computer node connected with a Gray CPU. The Gray CPU could be used for several things. In the training situation, it could be used for fast checkpoint and restart. In the case of inference and generation, it could be used for storing context memory so that the AI has memory and understands the context of the conversation you would like to have.

It's our second-generation Transformer engine. The Transformer engine allows us to adapt dynamically to a lower precision based on the precision and the range necessary for that layer of computation.

This is our second-generation GPU that has secure AI so that you could ask your service providers to protect your AI from being either stolen from theft or tampering.

This is our fifth-generation NVLink. NVLink allows us to connect multiple GPUs together, and I'll show you more of that in a second.

And this is also our first generation with a reliability and availability engine system. This RAS system allows us to test every single transistor, flip-flop, memory, on-chip memory, off-chip so that we can in the field determine whether a particular chip is failing.

The MTBF, the meantime between failure of a supercomputer with 10,000 GPUs, is measured in hours. The meantime between failure of a supercomputer with 100,000 GPUs is measured in minutes. So the ability for a supercomputer to run for a long period of time and train a model that could last for several months is practically impossible if we don't invent technologies to enhance its reliability.

Reliability would of course enhance its uptime, which directly affects the cost.

And then lastly, the decompression engine. Data processing is one of the most important things we have to do. We added a data compression engine, decompression engine, so that we can pull data out of storage 20 times faster than what's possible today.

Well, all of this represents Blackwell, and I think we have one here that's in production during GTC. I showed you Blackwell in a prototype state. The other side, this is why we practice, ladies and gentlemen, this is Blackwell.

Blackwell is in production. Incredible amounts of technology. This is our production board. This is the most complex, highest-performance computer the world's ever made.

This is the Gray CPU, and these are you could see each one of these Blackwells, two of them connected together. You see that it is the largest die, the largest chip the world makes, and then we connect two of them together with a 10-terabyte per second link.

Okay, and that makes the Blackwell computer, and the performance is incredible. Take a look at this.

So you see, our computational, the FLOPs, the AI FLOPs for each generation has increased by a thousand times in eight years. Moore's Law in eight years is something along the lines of, I don't know, maybe 40, 60, and in the last eight years, Moore's Law has gone a lot less.

So just to compare, even Moore's Law at its best of times compared to what Blackwell could do, so the amount of computations is incredible.

And whenever we bring the computation high, the thing that happens is the cost goes down. And I'll show you what we've done is we've increased throughput, computational capability, the energy used to train a GP4, 2 trillion parameter, 8 trillion tokens.

The amount of energy that is used has gone down by 350 times. Well, Pascal would have taken 1,000 gigawatt hours. 1,000 gigawatt hours means that it would take a gigawatt data center. The world doesn't have a gigawatt data center, but if you had a gigawatt data center, it would take a month.

If you had a 100-megawatt data center, it would take about a year. So nobody would, of course, um, create such a thing.

And that's the reason why these large language models, Chat GPT wasn't possible only eight years ago. By us driving down, increasing the performance, the energy efficiency while keeping and improving energy efficiency along the way, we've now taken with Blackwell what used to be 1,000 gigawatt hours to three, incredible advance.

Three gigawatt hours. If it's a, um, a 10,000 GPUs, for example, it would only take about 10 days or so. So the amount of advance in just eight years is incredible.

Well, this is for inference. This is for token generation. Our token generation performance has made it possible for us to drive the energy down by 45,000 times, 177,000 joules per token. That was Pascal. 177,000 joules is kind of like two light bulbs running for two days.

It would take two light bulbs running for two days, amounts of energy, 200 watts running for two days to generate one token of GPT.

It takes about three tokens to generate one word. And so the amount of energy used necessary for Pascal to generate GP4 and have a Chat GPT experience with you was practically impossible.

But now we only use 0.4 joules per token, and we can generate tokens at incredible rates and very little energy.

Okay, so Blackwell is just an enormous leap. Well, even so, it's not big enough. And so we have to build even larger machines. And so the way that we build it is called DGX.

So this is our Blackwell chips, and it goes into DGX systems. That's why we should practice. So this is a DGX Blackwell. This has this is air-cooled, has eight of these GPUs inside. Look at the size of the heat sinks on these GPUs, about 15 kilowatts, 15,000 watts, and completely air-cooled.

This version supports x86, and it goes into the infrastructure that we've been shipping Hoppers into. However, if you would like to have liquid cooling, we have a new system, and this new system is based on this board, and we call it MGX for modular.

And this modular system, you won't be able to see this. Can you see this? Can you see the...

Okay, I'll say, and so this is the MGX system. And here's the two, uh, Black, Blackwell boards. So this one node has four Blackwell chips. These four Blackwell chips are liquid-cooled.

Nine of them, nine of them, well, 72 of these, 72 of these GPUs, 72 of these GPUs are then connected together with a new NVLink. This is NVLink switch, fifth generation.

And the NVLink switch is a technology miracle. This is the most advanced switch the world's ever made. The data rate is insane, and these switches connect every single one of these Blackwells to each other.

So that we have one giant, 72 GPU Blackwell. The benefit of this is that in one domain, one GPU domain, this now looks like one GPU. This one GPU has 72 versus the last generation of eight, so we increased it by nine times. The amount of bandwidth we've increased by 18 times. The AI flops we've increased by 45 times.

And yet, the amount of power is only 10 times. This is 100 kilowatts, and that's for one. Now, of course, you could always connect more of these together, and I'll show you how to do that in a second.

But what's the miracle is this chip, this NVLink chip. People are starting to awaken to the importance of this NVLink chip as it connects all these different GPUs together because the large language models are so large, it doesn't fit on just one GPU, doesn't fit on one, just just one node.

It's going to take the entire rack of GPUs like this new DGX that I that I was just standing next to, to hold a large language model that are tens of trillions of parameters.

Large MVLink switch in itself is a technology miracle. It's 50 billion transistors, 74 ports at 400 Gbits each, four links cross-sectional bandwidth of 7.2 terabytes per second. But one of the important things is that it has mathematics inside the switch so that we can do reductions, which is really important in deep learning, right on the chip.

And so this is what this is what a DGX looks like now. And a lot of people ask us, you know, they say, and there's this, there's this confusion about what Nvidia does and how is it possible that Nvidia became so big building GPUs.

And so there's an impression that this is what a GPU looks like. Now, this is a GPU, this is one of the most advanced GPUs in the world, but this is a gamer GPU. But you and I know that this is what a GPU looks like. This is one GPU, ladies and gentlemen, DGX GPU.

You know, the back of this GPU is the NVLink spine. The NVLink spine is 5,000 wires, two miles, and it's right here. This is an NVLink spine, and it connects 70 two GPUs to each other.

This is an electrical mechanical miracle. The transceivers make it possible for us to drive the entire length in copper. And as a result, this switch, the EnV link switch, driving the EnV link spine in copper, makes it possible for us to save 20 kilowatts in one rack. 20 kilowatts could now be used for processing, just an incredible achievement.

So this is the MV links.

I went down today, and if you and even this is not big enough, even this is not big enough for AI factories. So we have to connect it all together with very high-speed networking.

Well, we have two types of networking. We have InfiniBand, which has been used in supercomputing and AI factories all over the world, and it is growing incredibly fast for us. However, not every data center can handle InfiniBand because they've already invested their ecosystem in Ethernet for too long.

And it does take some specialty and some expertise to manage InfiniBand switches and InfiniBand networks. And so what we've done is we've brought the capabilities of InfiniBand to the Ethernet architecture, which is incredibly hard.

The reason for that is Ethernet was designed for high average throughput because every single note, every single computer is connected to a different person on the internet, and most of the communications is the data center with somebody on the other side of the internet.

However, deep learning in AI factories, the GPUs are not communicating with people on the internet, mostly it's communicating with each other. They're communicating with each other because they're all they're collecting partial products, and they have to reduce it and then redistribute it.

Chunks of partial products, reduction, redistribution - that traffic is incredibly bursty, and it is not the average throughput that matters. It's the last arrival that matters because if you're reducing, collecting partial products from everybody, if I'm trying to take all of your so it's not the average throughput.

It's whoever gives me the answer last. Ethernet has no provision for that. And so there are several things that we have to create. We created an end-to-end architecture so that the the NIC and the switch can communicate.

We applied four different technologies to make this possible. Number one, Nvidia has the world's most advanced RDMA, and so now we have the ability to have a network-level RDMA for Ethernet that is incredibly great. Number two, we have congestion control. The switch does telemetry at all times, incredibly fast.

And whenever the uh, the uh, the GPUs or the the NXs are sending too much information, we can tell them to back off so that it doesn't create hotspots. Number three, adaptive routing. Ethernet needs to transmit and receive in order. We see congestions, or we see uh, ports that are not currently being used irrespective of the ordering, we will send it to the available ports, and BlueField on the other end reorders it so that it comes back in order.

That adaptive routing, incredibly powerful. And then lastly, noise isolation. There's more than one model being trained or something happening in the data center at all times, and their noise and their traffic could get into each other and causes jitter.

So when it's when the noise of one training model, one model training causes the last arrival to end up too late, it really slows down the training. Well, overall, remember you have you have built a $5 billion data center, and you're using this for training. If the utilization network utilization was 40% lower, and as a result, the training time was 20% longer.

The $5 billion data center is effectively like a $6 billion data center, so the cost is incredible. The cost impact is quite high. Ethernet with Spectrum-X basically allows us to improve the performance so much that the network is basically free. And so this is really quite an achievement.

We're very, we have a whole pipeline of Ethernet products behind us. This is Spectrum-X800. It is 51.2 terabits per second and 256 radix. The next one coming is 512 RIC, and that's one year from now, 512 RIC, and that's called Spectrum-X800 Ultra. And the one after that is X1600.

But the important idea is this X800 is designed for tens of thousands, tens of thousands of GPUs. X800 Ultra is designed for hundreds of thousands of GPUs, and X1600 is designed for millions of GPUs. The days of millions of GPU data centers are coming.

And the reason for that is, of course, we want to train much larger models, but very importantly, in the future, almost every interaction you have with the Internet or with a computer will likely have a generative AI running in the cloud somewhere, and that generative AI is working with you, interacting with you, generating videos or images or text or maybe a digital human.

So you're interacting with your computer almost all the time, and there's always a generative AI connected to that. Some of it is on-prem, some of it is on your device, and a lot of it could be in the cloud. These generative AIs will also do a lot of reasoning capability.

Instead of just one-shot answers, they might iterate on answers so that it improves the quality of the answer before they give it to you. And so the amount of generation we're going to do in the future is going to be extraordinary.

Let's take a look at all of this put together now. Tonight, this is our first nighttime keynote. I want to thank all of you for coming out tonight at 7 o'clock, and so what I'm about to show you has a new vibe. Okay, there's a new vibe. This is kind of the nighttime keynote vibe, so enjoy.

This is Blackwell, of course. It's the first generation of Nvidia platforms that was launched at the beginning at the right as the world knows, the generative AI era is here. Just as the world realized the importance of AI factories, just as the beginning of this new industrial revolution.

We have so much support, nearly every, every computer maker, every CSP, every GPU cloud sovereign clouds, even telecommunications companies, enterprises all over the world. The amount of success, the amount of adoption, the amount of enthusiasm for Blackwell is just really off the charts.

And I want to thank everybody for that. We're not stopping there. During this during the time of this incredible growth, we want to make sure that we continue to enhance performance, continue to drive down the cost of training, cost of inference, and continue to scale out AI capabilities for every company to embrace.

The further we the further performance we drive up, the greater the cost decline. Hopper platform, of course, was the most successful data center processor probably in history, and this is just an incredible, incredible success story. However, Blackwell is here.

And every single platform, as you'll notice, are several things. You got the CPU, you have the GPU, you have NVLink, you have the NIC, and you have the switch. The MVLink switch, the uh, connects all of the GPUs together as large of a domain as we can, and whatever we can do, we connect it with large, very large, and very high-speed switches.

Every single generation, as you'll see, is not just a GPU, but it's an entire platform. We build the entire platform. We integrate the entire platform into an AI factory supercomputer. However, then we disaggregate it and offer it to the world.

And the reason for that is because all of you could create interesting and innovative configurations and all kinds of different, uh, styles and to fit different, uh, data centers and different customers in different places, some of it for edge, some of it for Telco, and all of the different innovation are possible if it would we made the systems open and make it possible for you to innovate.

So we design it integrated but offer it to you disintegrated so that you could create modular systems. The Blackwell platform is here. Our company is on a one-year rhythm.

Our basic philosophy is very simple: one, build the entire data center scale, disaggregate it, and sell it to you in parts on a one-year rhythm, and we push everything to technology limits - whatever TSMC process technology, we push it to the absolute limits, whatever packaging technology, push it to the absolute limits, whatever memory technology, push it to the absolute limits, SerDes technology, optics technology, everything is pushed to the limit.

And then after that, do everything in such a way so that all of our software runs on this entire install base. Software inertia is the single most important thing in computers. It will, when a computer is backwards compatible and it's architecturally compatible with all the software that has already been created, your ability to go to market is so much faster, and so the velocity is incredible when we can take advantage of the entire installed base of software that's already been created.

Well, Blackwell is here. Next year is Blackwell Ultra, just as we had H100 and H200. You'll probably see some pretty exciting new generation from us for Blackwell Ultra, again, again, push to the limits, and the next-generation Spectrum switches I mentioned.

Well, this is the very first time that this next click has been made, and I'm not sure yet whether I'm going to regret this or not. We have code names in our company, and we try to keep them very secret, uh, often times, uh, most of the employees don't even know, but our next-generation platform is called Reuben.

The Reuben platform, um, I'm, I'm not going to spend much time on it. I know what's going to happen. You're going to take pictures of it, and you're going to go look at the fine prints, uh, and feel free to do that. So we have the Reuben platform, and one year later, we have the Reuben Ultra platform.

All of these chips that I'm showing you here are all in full development, 100% of them, and the rhythm is one year at the limits of technology, all 100% are architecturally compatible.

So this is basically what Nvidia is building, and all of the riches of software on top of it. So in a lot of ways, the last 12 years from that moment of ImageNet and us realizing that the future of computing was going to radically change to today is really exactly as I was holding up earlier, GeForce pre-2010, and Nvidia today.

The company has really transformed tremendously, and I want to thank all of our partners here for supporting us every step along the way.

This is the Nvidia Blackwell platform. Let me talk about what's next. The next wave of AI is physical AI, AI that understands the laws of physics, AI that can work among us, and so they have to understand the world model so that they understand how to interpret the world, how to perceive the world.

They have to, of course, have excellent cognitive capabilities, so they can understand us, understand what we ask, and perform the tasks in the future. Robotics is a much more pervasive idea. Of course, when I say robotics, there's a humanoid robotics that's usually the representation of that, but that's not at all true. Everything is going to be robotic.

All of the factories will be robotic. The factories will orchestrate robots, and those robots will be building products that are robotic, robots interacting with robots, building products that are robotic.

Well, in order for us to do that, we need to make some breakthroughs, and let me show you the video.

The era of robotics has arrived. One day, everything that moves will be autonomous. Researchers and companies around the world are developing robots powered by physical AI. Physical AIs are models that can understand instructions and autonomously perform complex tasks in the real world.

Multimodal LLMs are breakthroughs that enable robots to learn, perceive, and understand the world around them and plan how they'll act. From human demonstrations, robots can now learn the skills required to interact with the world using gross and fine motor skills.

One of the integral technologies for advancing robotics is reinforcement learning. Just as LLMs need RLH, or reinforcement learning from human feedback, to learn particular skills, generative physical AI can learn skills using reinforcement learning from physics feedback in a simulated world.

These simulation environments are where robots learn to make decisions by performing actions in a virtual world that obeys the laws of physics. In these robot gyms, a robot can learn to perform complex and dynamic tasks safely and quickly, refining their skills through millions of acts of trial and error.

We built Nvidia Omniverse as the operating system where physical AIs can be created. Omniverse is a development platform for virtual world simulation, combining real-time physically based rendering, physics simulation, and generative AI technologies.

In Omniverse, robots can learn how to be robots. They learn how to autonomously manipulate objects with precision, such as grasping and handling objects, or navigate environments autonomously, finding optimal paths while avoiding obstacles and hazards.

Learning in Omniverse minimizes the sim-to-real gap and maximizes the transfer of learned behavior. Building robots with generative physical AI requires three computers: Nvidia AI supercomputers to train the models, Nvidia Jetson Orin and next-generation Jetson Thor robotic supercomputers to run the models, and an Nvidia Omniverse where robots can learn and refine their skills in simulated worlds.

We build the platforms, acceleration libraries, and AI models needed by developers and companies and allow them to use any or all of the stacks that suit them best.

The next wave of AI is here. Robotics powered by physical AI will revolutionize industries.

This isn't the future. This is happening now.

There are several ways that we're going to serve the market. The first, we're going to create platforms for each type of robotic systems: one for robotic factories and warehouses, one for robots that manipulate things, one for robots that move, and one for robots that are humanoid.

So each one of these robotic platforms is like almost everything else we do: a computer, acceleration libraries, and pre-trained models. And we test everything, we train everything, and integrate everything inside Omniverse, where Omniverse is, as the video was saying, where robots learn how to be robots.

Of course, the ecosystem of robotic warehouses is really, really complex. It takes a lot of companies, a lot of tools, a lot of technology to build a modern warehouse, and warehouses are increasingly robotic. One of these days, they will be fully robotic.

So in each one of these ecosystems, we have SDKs and APIs that are connected into the software industry, SDKs and APIs connected into the edge AI industry, and companies, and then also, of course, systems that are designed for PLCs and robotic systems for the ODMs.

It's then integrated by integrators, ultimately building warehouses for customers. Here we have an example of KENMAC building a robotic warehouse for Giant Group.

And then here, now let's talk about factories. Factories have a completely different ecosystem, and Foxconn is building some of the world's most advanced factories. Their ecosystem, again, edge computers and robotics software for designing the factories, the workflows, programming the robots, and of course, PLC computers that orchestrate the digital factories and the AI factories.

We have SDKs that are connected into each one of these ecosystems as well. This is happening all over Taiwan. Foxconn has built is building digital twins of their factories. Delta is building digital twins of their factories. By the way, half is real, half is digital. Half is Omniverse.

Pegatron is building digital twins of their robotic factories. Wistron is building digital twins of their robotic factories. And this is really cool. This is a video of Foxconn's new factory. Let's take a look.

Demand for NVIDIA accelerated computing is skyrocketing as the world modernizes traditional data centers into generative AI factories.

Foxconn, the world's largest electronics manufacturer, is gearing up to meet this demand by building robotic factories with Nvidia Omniverse and AI.

Factory planners use Omniverse to integrate facility and equipment data from leading industry applications like Siemens Teamcenter and Autodesk Revit.

In the digital twin, they optimize floor layout and line configurations and locate optimal camera placements to monitor future operations with Nvidia Metropolis powered vision AI.

Virtual integration saves planners the enormous cost of physical change orders during construction. The Foxconn teams use the digital twin as the source of truth to communicate and validate accurate equipment layout.

The Omniverse digital twin is also the robot gym where Foxconn developers train and test Nvidia Isaac AI applications for robotic perception and manipulation and Metropolis AI applications for sensor fusion.

In Omniverse, Foxconn simulates two robot AIs before deploying runtimes to Jetson and computers on the assembly line. They simulate Isaac manipulator libraries and AI models for automated optical inspection for object identification, defect detection, and trajectory planning.

To transfer HGX systems to the test pods, they simulate Isaac perceptor-powered Fobot AMRs as they perceive and move about their environment with 3D mapping and reconstruction.

With Omniverse, Foxconn builds their robotic factories that orchestrate robots running on Nvidia Isaac to build Nvidia AI supercomputers, which in turn train Foxconn robots.

So a robotic factory is designed with three computers: train the AI on Nvidia AI, you have the robot running on the PLC systems for orchestrating the factories, and then of course, simulate everything inside Omniverse.

Well, the robotic arm and the robotic AMRs are also the same way, three computer systems. The difference is the two Omniverses will come together, so they'll share one virtual space. When they share one virtual space, that robotic arm will become inside the robotic factory, and again, three, three, three computers.

And we provide the computer, the acceleration layers, and pre-trained AI models. We've connected Nvidia manipulator and Nvidia Omniverse with Seimens, the world's leading industrial automation software and systems company. This is really a fantastic partnership, and they're working on factories all over the world.

Semantic PI AI now integrates Isaac manipulator, and semantic pick AI runs operates ABB, Yaskawa, FANUC, Universal Robotics, and Techman.

So Seimens is a fantastic integration. We have all kinds of other integrations. Let's take a look. ArcBest is integrating Isaac perceptor into Vox smart autonomy robots for enhanced object recognition and human motion tracking in material handling.

BYD Electronics is integrating Isaac manipulator and perceptor into their AI robots to enhance manufacturing efficiencies for global customers.

Ideal Works is building Isaac perceptor into their Iwos software for AI robots in factory logistics.

Intrinsic, an Alphabet company, is adopting Isaac manipulator into their FlowState platform to advance robot grasping.

Gideon is integrating Isaac perceptor into Treyl AI-powered forklifts to advance AI-enabled logistics.

Argo Robotics is adopting Isaac CTOR into the perception engine for advanced vision-based AMRs.

Solomon is using Isaac manipulator AI models in their Acuppiic 3D software for industrial manipulation.

Techman Robot is adopting Isaac Sim and manipulator into TM Flow, accelerating automated optical inspection.

Pteron Robotics is integrating Isaac manipulator into Polycope X for cobots, and Isaac perceptor into Mirr AMR.

Vention is integrating Isaac manipulator into Machine Logic for AI manipulation robots.

Robotics is here. Physical AI is here. This is not science fiction, and it's being used all over Taiwan, and it's really, really exciting.

And that's the factory, the robots inside, and of course, all the products going to be robotics. There are two very high-volume robotics products.

One of course is the self-driving car or cars that have a great deal of autonomous capability. Nvidia again builds the entire stack. Next year, we're going to go to production with the Mercedes fleet, and after that, in 2026, the JLR fleet.

We offer the full stack to the world, however, you're welcome to take whichever parts, whichever layer of our stack, just as the entire Drive stack is open.

The next high-volume robotics product that's going to be manufactured by robotic factories with robots inside will likely be humanoid robots, and this has made great progress in recent years, both in the cognitive capability because of foundation models and also the world understanding capability that we're in the process of developing.

I'm really excited about this area because, obviously, the easiest robot to adapt into the world are human robots because we built the world for us. We also have the vast, the most amount of data to train these robots than other types of robots because we have the same physique.

So the amount of training data we can provide through demonstration capabilities and video capabilities is going to be really great. And so we're going to see a lot of progress in this area.

Well, I think we have some robots that we'd like to welcome here. We go. About my size, and we have some friends to join us. So the future of robotics is here. The next wave of AI, and of course, you know, Taiwan builds computers with keyboards. You build computers for your pocket. You build computers for data centers.

In the future, you're going to build computers that walk and computers that roll around. And um, so these are all just computers, and as it turns out, the technology is very similar to the technology of building all of the other computers that you already build today.

So this is going to be a really extraordinary journey for us. Well, I want to thank, I want to thank, I want to thank, I have, I've made one last video. If you don't mind, uh, something that, uh, that we really enjoyed making, um, and if you let's run it.

Thank you, I love you guys. Thank you.

Thank you all for coming. Have a great comput.

Notes

That's all the content of the video transcript for the video: 'NVIDIA CEO Jensen Huang Keynote at COMPUTEX 2024'. We use AI to organize the content of the script and write a summary.

For more transcripts of YouTube videos on various topics, explore our website further.