Challenges To Building Level 5 Automotive Chips

It’s an exciting time in the automotive space, and this is especially true when it comes to all of the activity around autonomous driving and the path to achieving full Level 5 autonomy. The technology is complex, the ecosystem seems to get more complex by the day, and simulating autonomous systems safely makes this an extremely fascinating area from an engineering perspective. At the heart of achieving Level 5 autonomy are the electronic components that serve as the conductors to direct the orchestra of hardware and software.

How big is this opportunity? With continuing consolidation of electronic functions, Semico Research estimates the overall market for automotive chips will reach $73 billion by 2023, with 91.8 million units dedicated to AI for ADAS.

Semiconductor Engineering recently discussed these issues during a panel session at the Drive World Conference with Dean Drako, CEO at Drako Motors; David Fritz, senior autonomous vehicle SoC leader at Mentor, a Siemens Business; Rahul Gulati, principal engineer at Qualcomm; Burkhard Huhnke, vice president for automotive at Synopsys; and Bala Rajendran, Global CTO EDA at Dell. What follows are excerpts of that discussion.

(L-R): Bala Rajendran, Dean Drako, Ann Steffora Mutschler, Burkhard Huhnke, David Fritz, Rahul Gulati. Source: Semiconductor Engineering

SE: Level 5 autonomy is a really complicated engineering challenge that involves hardware, software, the system, and how to validate all of the elements together. As the automotive ecosystem moves in this direction, what do you see as some of the top issues?

Drako: As we all know, Level 5 autonomy is really hard. Really, really, really hard. It’s probably going to be a while before we get there. It’s maybe one of the biggest engineering problems we face in this decade, and maybe even the next decade. I break it down into three components. The first component in getting to Level 5 autonomy is reading the sensory input.
There is a lot of video, a lot of LiDAR, a lot of input, and you have to understand the environment. You have to navigate and make decisions about where the car should go, how to drive the car, and these kinds of decisions. Once the car has figured out some basic driving, which is really hard, the second component is even harder, which is dealing with the strange and crazy things that people do. For instance, take the unexpected behavior of the ball bouncing in front of the car, the child falling down, the car running the red light. All of these things that don’t fit the normal pattern are very dangerous, very unexpected, and can happen very quickly. You’ve got to handle all of these corner cases, in the code, in software. In fact, what gets very interesting is that the autonomous cars that are being driven around today are very slow and very cautious because that makes the problem a little easier. But in some situations, the decision is actually going to be to do the opposite of that, where you’re going to have to quickly do something in order to avoid an accident or avoid hurting somebody. It’s going to be a long time before we get to that level of software. The third element is that you have to have the mechanical and software components that the first two components give commands to, and that can then faithfully execute them. Here, you need a mechanical system and a software system that, given the decision made by the previous two components, can accurately move the car, adjust the car, drive the car, and stop the car in all kinds of adverse conditions: snow, ice, weather, rain, low visibility, high visibility, even potentially when some components of the car are not working.

Fritz: When I started looking into autonomous ICs, and what it was going to take, it was clear there were some major pieces that were missing. First of all, you have to have a way of handling the sensing and the perception, so Siemens went out and bought a company.
Then, you need the decision-making, the artificial intelligence, so Siemens bought Mentor Graphics. There’s another company to do actuation modeling with 1D physics, and they bought (LMS). They’ve spent over $10.4 billion over the last decade putting this together, so it’s not just something simple. It’s not something that just any company could do. First, if we’re SoC engineers and we’re just thinking about that SoC, we forget that, because the inputs to AVs are so complex and the decisions that have to be made are so complex, you have to take into account the environment and the actuation. It’s only then that you know if you’re getting correct operation and correct decisions are being made in the context of the whole vehicle. We can no longer just take the chips and say, ‘I used UVM, I fed some inputs in, I calculated the right outputs out.’ It doesn’t work anymore. That’s not how we need to do things anymore, particularly in the AV space. Constrained random testing, Monte Carlo simulation, that’s what we’re doing. That’s what Waymo’s doing. They recently drove 2 billion simulation miles. My question is, how many of those miles are just repeating getting to the point where you can do something different? Last thing here, consolidation of functionality is inevitable. We’ve seen it in every industry. If you have a large compute platform, when the competition starts, we start throwing other things into that, we start adding to that. It’s an economic function, and it’s happening. When you start thinking about the impact of these three things on the design of an AV IC, it should change the methodology that you’re using, not only for testing and building, but for the interaction between hardware and software. Where does AI actually fit in this process? How do we actually do inferencing in an effective way? All of those are things that we have to consider at the top level. And then once you have all that, how do you verify correct operation?
And you want to do it before you put the thing on the road, not after you put the thing on the road, and there are methodologies for doing that.

Gulati: I want to talk about the top three factors influencing an SoC architecture that can be used to implement an automated learning system. A lot of it is a factor of an overall system architecture. Do I have a performance SoC coupled with a typical virtual motion control unit, which is the highest safety integrity, ASIL-D? And do we allocate some safety requirements on the performance SoC? If we were to look at the top three stakeholder requirements, the absolute, non-negotiable set of requirements for any given SoC in an AV, it would be cybersecurity, functional safety, and performance per watt. If you start decomposing those requirements, they will lead to several sub-requirements. If we say we need to be ASIL-B, this comes with requirements for organizations to do safety-critical SoC development. Do you have the right processes in place? Do you have the right competence to be able to do that SoC architecture? This also comes with the need for a safety culture in the organization. Is the organization capable of performing all the lifecycle activities that are required of a safety-critical development? That’s a fundamental change for somebody who is not delivering into the automotive safety digital world, to someone who now wants to deliver or intends to deliver. From the functional safety aspect, there are sub-branches for which you have to meet the requirements. In cybersecurity, all safety will be off the table if a malicious attacker is able to get hold of the device. In fact, all of that functional safety will actually work against you. It will do the wrong thing very correctly. That’s what functional safety will allow all of these malicious actors to perform. As far as performance per watt, we need all the hardware accelerators, but how many TOPS (tera operations per second) per watt?
There are levels of expectations from OEM 1 to OEM 2, then OEM 3 says whatever you provide is insufficient because they don’t know what their requirements are, so we are learning as we go along. A last point is the need for standards. Standards typically reflect the state of the art, but that does not exist for these areas. Standards are being written as the technology is being introduced. With all these neural networks that everyone talks about, what may eventually go to production is not available today. It will be tomorrow’s net. That will be far more efficient in terms of power and in terms of performance, so let’s use that. Also, from a hardware architecture perspective, we have to be flexible enough to not tie ourselves to one specific implementation of a network. In terms of standards and guidelines and independent assessments, the functional safety standards point to the need for assessments and independent assessments for the highest instance, but we still need an overarching document.

Huhnke: I’m the evidence that it’s possible to come from a dusty old industry into the EDA industry. We never talked in the past, and maybe that was the problem. We should have. Maybe we should have talked about what’s possible in the design of integrated circuits and what the automotive requirements are to get both integrated and up to speed. If you look outside of this facility, you see all these prototypes driving around the area, and nobody worldwide is actually ready to run Level 4 and 5. Why? Because it’s still an approach that places the computer in the trunk. There’s one manufacturer that has jumped into the next phase to build parallel cores across more of that space. It’s a master and slave principle to the functional safety observation within.
What is really important is to move everything closer into the chip design, which is possible, and build in safety items, whatever is available, to ensure that automotive requirements first of all land in the chip design to reduce the cost, to reduce components, and enable comprehensive safety and security design. Coming from this old traditional world, the development plan of a car has five domain controllers. That takes 3 to 4.5 years from concept to start of production. The tape out time of a car is 4.5 years. Tape out time of a silicon chip is maybe 1.5 years. If you’re able to synchronize that perfectly, then you have a chance actually to reach, finally, the product on the road. In mass production, we’re talking about multi-thousands per vehicle. What’s required? We have to talk with our OEMs in the early phase of the concept. This is when the chips are being designed. And this is when you can design all the requirements into the chip, which includes security and safety. It’s important to note that we already have models to verify and debug everything. Let’s use those models to co-design hardware and software as early as possible. Shift-left tools: let’s use our virtual models. We have great relationships with every semiconductor company to build these centers of excellence to provide this model so that you can learn software testing even before hardware is ready, and so you can increase coverage, you can prepare the test cases, and you can launch very well-designed software before hardware is even available. This will be going successfully and then you can roll it out. This is the robustness of a process that is required in the most complex ecosystem in the world.

Rajendran: My position is quite simple. AI chips for autonomous driving are still in their infancy. That means that it’s actually an exciting time, and we could actually do things very differently.
The way we have been doing things in the past was dumb, and because I’ve been in the industry for almost 25 years, every time I see problems with EDA tools I wonder why they did this. There has to be a better way. Since we are innovating really fast, it seems every piece is moving, including design tools. Even the standards are not in place. So everything is evolving. For this to happen, we are completely in a Wild West world at this point. That’s why for us to move a little bit, we need to break some of our old ways of thinking in terms of what works and doesn’t work. We need to move a little bit faster. For any chip design project, the cheapest spot is basically the tools and infrastructure. They are the cheapest spot, but the most expensive part of what we’re actually working on. Yet companies think that they shouldn’t spend more than 2% to 5% of their total R&D budget on tools.

SE: To move this forward, there are so many different activities happening across the semiconductor industry and the automotive ecosystem. What should we be focusing on today?

Rajendran: We spend a lot of time on the inference. For the older AI, it was purely influenced by a lot of work that’s being done on the learning part, which is the global design teams. A model that’s being generated in California will not work in China or different parts of the world. But they all have to work in a very globally distributed design infrastructure, and the data that they’re putting across the huge infrastructure, the things that scale, have not been done before. That has to tie in with the current inference chip, the AI chip that you’re talking about. Both hardware and the software need to be accepting more of the co-design concept. Co-design has been around for a long time, but it’s mostly a theoretical experience.

Huhnke: From the holistic perspective, it comes down to functional safety, security and reliability. Those are the three high-value problems for the entire industry.
How do you design that into the integrated circuit? That requires a collaboration between the building blocks, and leads into the question of what standard blocks are supposed to be used. What are the customer-specific blocks that can be added to that? The AI aspect is a broad key word, but what does it mean? Actually, it’s just an accelerator to use vision, pictures and do embedded vision processing. But what is the quality and the accuracy and the requirements? Last but not least, we need to define very exactly what the failure in time rate is actually allowed to be in this domain of the semiconductor design.

Gulati: Traceability is another consideration, which is very unique to autonomous chips, because back in the day, at least, when we were designing chips for phones, there was not really a compelling reason to trace any chip all the way to product, as most of the lifespan is two to four years. When it comes to implementing for a car, it’s about a minimum of 15 to 20 years, so tracing that product all the way until regression, where exactly the bug got introduced, that’s a huge challenge.

Drako: Every semiconductor data sheet I’ve ever read always has this little disclaimer that this device is not suitable for use in life-saving equipment. Everyone’s probably read that somewhere on every semiconductor data sheet the industry has created. And now we can’t have that disclaimer because basically we’re going to be using the chips and the software in not a life-saving device, but potentially a life-threatening device. And so we’ve got to go basically retrace all of the design methodology and everything and bring it up to hospital-grade level kind of equipment, and that’s a huge task.

Audience Member: If you look at this from a functional safety standpoint, the standards specifically call it a safety element out of context development, meaning it’s not been developed in the context of a particular model.
It’s not for a Mercedes, or it’s not for a BMW or an Audi or a GM. So the semiconductor vendor has made assumptions at a system level, and it is the semiconductor vendor’s responsibility to document those assumptions of use for the system integrators. And as a system integrator, it’s my responsibility to read through all those assumptions of use, and make sure that I’ve established the validity of those assumptions. And if any of those gets invalidated, it means a system-level change and a broad impact to the end system. That’s where those assumptions of use have to be very explicitly stated by every semiconductor vendor. All of that has been changing over the last decade. Semiconductor vendors are now accepting that they are being used in safety-critical applications, and they are explicitly putting that out as part of the safety manuals. As we go forward, we’ll see more and more outside, better assumptions of use being documented in those safety manuals.

Fritz: What you all are saying is dead-on, but I see those as problems that are in the process of being solved. The bigger challenge, and what we really need to be thinking about, is why is it that we have so many casualties with AVs that are being prototyped now? Why is that happening and what do we have to do differently to stop that from happening? If you actually look at where we are in the process of the hype curve, as research says, we’re at that ‘trough of disillusionment,’ which is great because that means we’re to the point where we can actually start to deliver to expectations and then go on from there to revenue generated by everybody involved. That’s wonderful. But in the cell phone market you have to go through a certification process before you can sell that phone to get on the carrier’s network. We have nothing even close to an equivalent of that for an automobile. We need a process that’s in place that can be embraced.

Drako: What about crash testing?

Fritz: Crash testing is very different from Level 5 autonomy.

SE: There are some other industries that have already done a lot of the work in simulation and validation, namely, aerospace and defense. Are we borrowing enough from those industries to validate, verify, simulate, etc., so we are not reinventing the wheel?

Drako: Probably not.

Huhnke: Actually, aerospace and defense have been the driver for many years. Yes, military expectations and requirements are so high that the cost is also extremely high. Now we’re coming from a low-cost industry. The automotive industry has trained for many decades to deliver low-cost results. Now we have to approach each other and meet in the middle. We bring aerospace and defense and SAE together to overlap on the standards, because that’s exactly what’s required. Let’s learn from both sides. So it’s happening, but we can do much more.

Gulati: We do have ASIL-rated parts today that are being used in all these safety-critical systems, and wherever it is not ASIL-rated it is being coupled with an ASIL-rated part.

Drako: You should compare the price of a Ford truck to a Boeing.

Audience Member: Let me express the complete opposite to you because I worked for some time on the Google chauffeur car project, Waymo. I’m not speaking for them, but Google’s philosophy, which exists a little bit in Waymo, was to buy the least-reliable parts you can get your hands on, which really means buy the cheapest parts. As Google was building its server farms, instead of buying enterprise-grade, they bought the absolute cheapest part, expected every part to fail regularly, designed to fail operational design defense fully functional, not just operation. This is the philosophy of people who are not coming from the world you’re coming from.
There are many arguments about whether this is the right philosophy: not to try to get military-grade or life-saving-grade components, not to think of it from an SoC basis, but to think of the entire system’s reliability, including all the redundancies. Redundancy doesn’t mean two parallel cores in lock step. Redundancy means that if any one of these fails, something else will pick up.

Fritz: It’s really a collision of two very different worlds.

Drako: It is very interesting. The software development process until about 10 to 12 years ago was all about making the software and then testing it. I can assume that the computer is going to run correctly, and the disk drives are going to be there, that when I store something it’s going to get stored, and when I ask for it back, it’s going to come back. You wrote your software assuming that. In today’s world of software development, you no longer assume that. You assume that the network is a mishmash mess, and that anything might not work. So each little component is built very defensively, assuming that any of the network connections and connections between might disappear.

Gulati: It’s a systems engineering issue, essentially, where the emergent property may not be evident from the individual functions of the individual elements. That’s when we bring them together and see the emergence. And then, how we manage that emergence is what it’s all about from a systems engineering standpoint.

Fritz: Let’s look at where the automotive industry has been. First, it’s a very complex ecosystem. There are all kinds of partnerships. I spent a lot of time with OEMs and Tier 1s. The way the process has been for many years is, ‘Here are the major requirements, we decompose those and decompose and decompose until we get down to something that’s atomic. We try to model that. It’s primarily mechanical at that point with MATLAB. We push a button, we generate the C code, and put that on a 16-bit microcontroller on a PCB and call it done.
Hopefully, you’ve done all that exactly right, so when you reassemble it back up to the system and do the integration, life is good.’ But it turns out that it’s never good. You still have half the project ahead, trying to solve the integration problems in a hurry. You miss a model year, you’re out of business, so the pressure is intense. What we’re saying is if they treated that process, and used some of the things that we’ve learned in semiconductor over the last 20 years, some of those problems are now more manageable. The solutions are more practical. That’s why we’re seeing a lot of discussions between OEMs and the Tier 2s: the Intels, the Qualcomms, the Nvidias.

Huhnke: The good thing is that the interface is being built. Actually, we’re talking different languages. It takes you six months until you understand EDA, and vice versa, but then it’s moving. It has been a mechanical engineering-focused industry, and now it’s shifting the core competence into the hardware and software stack. We will see in 10 years, the VW Group and whoever is out there, they will still survive. It’s not about that they will get out of the business. What’s happening right now is that this competency shift requires experts, and this perturbation model from the smallest piece of SoC up to the vehicle integration.

Fritz: This is solid engineering that needs to be applied.