
Generative AI in MedTech: Quality, Risks, and the Autonomy Scale with Ashkon Rasooli

Written by Etienne Nichols | January 26, 2026

In this episode, host Etienne Nichols sits down with Ashkon Rasooli, founder of Ingenious Solutions and a specialist in Software as a Medical Device (SaMD). The conversation previews their upcoming session at MD&M West, focusing on the critical intersection of generative AI (GenAI) and quality assurance. While many AI applications exist in MedTech, GenAI presents unique challenges because it creates new data—text, code, or images—rather than simply classifying existing information.

Ashkon breaks down the specific failure modes unique to generative models, most notably "hallucinations." He explains how these outputs can appear legitimate while being factually incorrect, and explores the cascading levels of risk this poses. The discussion moves from simple credibility issues to severe safety concerns when AI-generated data is used in critical clinical decision-making without proper guardrails.

The episode concludes with a forward-looking perspective on how validation is shifting. Ashkon argues that because GenAI behavior is statistical rather than deterministic, traditional pre-market validation is no longer sufficient. Instead, a robust quality framework must include continuous post-market surveillance and real-time independent monitoring to ensure device safety and effectiveness over time.


Love this episode? Leave a review on iTunes!

Have suggestions or topics you’d like to hear about? Email us at podcast@greenlight.guru.

Key Timestamps

  • 01:45 - Introduction to MD&M West and the "AI Guy for SaMD," Ashkon Rasooli.
  • 04:12 - Defining Generative AI: How it differs from traditional machine learning and image recognition.
  • 06:30 - Hallucinations: Exploring failure modes where AI creates plausible but false data.
  • 08:50 - The Autonomy Scale: Applying AAMI TIR34971 to determine the level of human supervision required.
  • 12:15 - Regulatory Gaps: Why no generative AI medical devices have been cleared by the FDA yet.
  • 15:40 - Safety by Design: Using "independent verification agents" to monitor AI outputs in real-time.
  • 19:00 - The Shift to Post-Market Validation: Why 90% validation at launch requires 10% continuous monitoring.
  • 22:15 - Comparing AI to Laboratory Developed Tests (LDTs) and the role of the expert user.

Top takeaways from this episode

  • Right-Size Autonomy: Match the AI’s level of independence to the risk of the application. High-risk diagnostic tools should have lower autonomy (Level 1-2), while administrative tools can operate more freely.
  • Implement Redundancy: Use a "two is one" approach by employing an independent AI verification agent to check the primary model’s output against safety guidelines before it reaches the user.
  • Narrow the Scope: To reduce hallucinations, limit the AI's task breadth. A model asked to write a specific security requirement is more reliable than one asked to generate an entire Design History File (DHF).
  • Prioritize Detectability: Design UI/UX features that provide the sources or "basis" for an AI's answer, allowing human users to verify the data and catch errors more easily.
  • Continuous Surveillance: Accept that pre-market validation cannot cover all statistical outcomes; establish a post-market "watchtower" to monitor for performance shifts and user feedback trends.

References:

  • ISO 14971: The standard for the application of risk management to medical devices.
  • AAMI TIR34971: Guidance on the application of ISO 14971 to machine learning in medical devices.
  • IEC 62304: Medical device software lifecycle processes.
  • Etienne Nichols: LinkedIn Profile

MedTech 101: The Autonomy Scale

Think of the Autonomy Scale like the driver-assist features in a car.

  • Level 1 is like a backup camera: It gives you data, but you are still 100% in control of the steering and braking.
  • Level 5 is a fully self-driving car where you can sleep in the back seat.

In MedTech, most generative AI is currently aiming for Level 2 or 3, where the AI suggests a "route" (like a diagnosis or a draft report), but a human "driver" (the doctor or engineer) must keep their hands on the wheel and verify every turn.

Memorable quotes from this episode

"Hallucinations are just a very familiar form of failure modes... where the product creates sample data that doesn't actually align with reality." - Ashkon Rasooli

"Your validation plan isn't just going to be a number of activities you do that gate release to market; it is actually going to be those plus a number of activities you do after market release." - Ashkon Rasooli

Feedback Call-to-Action

We want to hear from you! How is your team implementing AI in your workflow? Do you have questions about the shifting regulatory landscape? Send your thoughts, reviews, or topic suggestions to podcast@greenlight.guru. We read every email and pride ourselves on providing personalized responses to our community of MedTech movers and shakers.

Sponsors

This episode is brought to you by Greenlight Guru. Whether you are navigating the complexities of generative AI or traditional hardware, Greenlight Guru offers the only specialized Quality Management System (QMS) and Electronic Data Capture (EDC) solutions designed specifically for the medical device industry. By integrating your quality processes with clinical data collection, Greenlight Guru helps you move from "check-the-box" compliance to true quality.

 

Transcript

Etienne Nichols: Hey, everyone. Welcome back to the Global Medical Device Podcast. My name is Etienne Nichols. I'm the host for today's episode and today we're going to be talking about an upcoming event, MD&M West.

That's not all we're going to be talking about. This isn't necessarily an infomercial for the event. We will talk about it, though, because the speaker I have with me today, Ashkon Rasooli, perhaps a voice you've heard on the podcast before, and myself will both be speaking at the event at MD&M West, coming up very quickly. At the time of this recording, it's next week, so you may be listening to this afterwards. But we're going to be previewing one session that seems particularly relevant, at least in my mind, to the medical device industry, and that is Generative AI and quality and the relationship between those two.

And I don't want to say too much about that necessarily, but I do want to say something about Ashkon. Ashkon Rasooli runs EnGenius Solutions. He's who I consider to be the AI guy for SaMD.

He works with digital health teams building Software as a Medical Device, helping get that software into devices. He's spent a lot of time where cool tech meets "prove that it works."

And his session is about what good looks like for Generative AI, especially the failure modes that are unique to Generative AI, like outputs that sound right that maybe aren't, and the different areas that users can get into when they're not sure what the risks are.

So, we're not necessarily going to be talking about how to clear one of these medical devices.

We will talk a little bit about what his session is and what it isn't. But this is hopefully going to be practical considerations for tools and workflows that we have already seen in the medical device industry.

So, if you're a quality leader, engineer, product person, if you're just trying to understand what's coming up into our industry, hopefully this will give you some frameworks to think about before you walk into the room at MD&M West or if it's over, you know, before you walk into your next conference room and talk about AI.

So how, how are you doing today, Ashkon?

Ashkon Rasooli: Doing great, and thank you very much for that intro.

Etienne Nichols: Well, good to be with you. I always enjoy our conversations.

I suppose we could just talk about your session.

That's the thing that I was most interested in, what problem you intend to be solving for those MedTech professionals that are able to come to that.

Ashkon Rasooli: Yeah, I think that'll be a good conversation starter for us, at least. This session is really focused on what quality looks like for a piece of software that uses Generative AI.

It is not necessarily limited to medical device applications. In fact, at this point no generative AI medical device applications have been cleared. But this applies across the board to any application of Generative AI.

We've got three high-level categories: use of AI at the point of care, AI in operations, like tools for design, development, manufacturing, and monitoring, and then AI in medical devices. But this will apply across the board to basically any sort of product that uses Generative AI.

Etienne Nichols: Okay, now I know someone out there listening, when you say none of those have been cleared yet for Generative AI, is going to have the objection and say, well, I know there are AI devices out there. But what's the difference between what we're seeing, what we call, or the FDA calls, AI/ML, artificial intelligence and machine learning, versus what a lot of people call Generative AI or GenAI?

What's the difference? What is GenAI?

Ashkon Rasooli: Yeah, I think it's useful to go back to the definition of generative AI. What do we consider to be Generative AI? It is a subset of your general machine learning or ML applications where the product is designed to create sample data that resembles the data it gets trained on.

So, consider like a classic machine learning application we had a decade ago, image recognition.

It was given a task of identifying objects in photos. It was given photos as training data, with the different objects pre-labeled. It was then trained on that. The narrow scope of the task was to identify the object.

That was the range of outputs that we expected from that product.

A product is considered Generative AI if the design is such that it generates sample data that mimics what it was trained on. So, you know, the most important, the most common one we're familiar with now, obviously, with the ChatGPTs and Claudes of the world that everybody's using, is text.

It is trained on text; it is expected to generate text. But really this applies across the board to many forms of artifacts or data. So, image generation, video generation. You know, one of the primary applications of Generative AI now is coding. So, it is trained on a bunch of code samples.

It is expected to generate code.

That is what we call Generative AI.

Etienne Nichols: Okay, and so it sounds like Generative AI, by definition, introduces the possibility of hallucination, because it actually is generating something new. And that's actually something I'm a little bit curious about.

How do you even prevent that hallucination? I know we're getting close, or at least hopefully closer than we have been in the past, but I don't know if you want to, I don't know, speak to that for a moment.

Ashkon Rasooli: No, absolutely. So, hallucination is one of the very common failure modes. I try to speak in very familiar terms to the medical device industry when I talk about failure modes, terms aligned to 14971 likelihoods, probabilities.

When you think about the quality assurance process as a whole, every step in the quality assurance process, particularly with a software product, a software development lifecycle is meant to mitigate failures out in the field, and they almost never eliminate the likelihood of failure.

And so, it is no different with Generative AI products.

Hallucinations are just a very familiar form of failure modes in this product. Right. It is where the product creates the sample data we're talking about, and that sample data doesn't actually align with reality.

Right.

And we've seen many, many examples of that. It makes up sources, it makes up studies, it makes up concepts that don't exist. It has answered the question of how to stack objects with "we're going to put a rock on an egg."

Because it doesn't have the fundamental understanding of that.

And there are a number of ways to prevent it. But I think what's important also is to think about the level of impact or risk that a hallucination could have.

You've got at the base level, hallucinations that are readily detected by a human and those will often impact the credibility of the product manufacturer.

They will not, however, impact safety because usually no decision is going to be made based on that hallucination. It is readily detected.

Where this gets riskier is when hallucinations look, you know, legit, nobody catches that this is in fact a hallucination, and they may in fact incorporate the data from that hallucination into a critical decision, into the decision-making process.

The next level up from that could be that not only is it hallucinating and providing incorrect, invalid data in its hallucination, it is also crossing a number of what should be guardrails or red lines, you know, ethically, morally, legally: things like disclosing private data, disclosing harmful information, violating trademarks, copyrights, things of that nature.

So, I think with every application, the risk analysis, which typically includes estimating the severity of a failure mode, should be done for what hallucinations are going to look like with the Generative AI product.

What could be done to prevent that?

There's a number of ways to mitigate it. Again, I want to go back to the point that quality assurance is about mitigating over and over again the likelihood of failure.

And so, what I'm about to say is not going to eliminate the risk of hallucinations. However, as for the ways to mitigate it: at the top level, workflow design is one thing.

How much autonomy do you design the product for?

This is done at the product management level by having more human supervision.

I always like to talk about this autonomy scale in this context, anywhere from 1 to 5.

It is noteworthy to talk about 34971, which is a standard that captures this autonomy scale that I'm referring to as well, where one is: the output coming from this product is just a piece of data, without which the job is still going to get done.

The decision is still going to be made.

It is just there as another data point.

And five is absolute full autonomy. And there are, you know, degrees in between. And so, you know, where you put your workflow.

I'd call that right-sizing the autonomy of the workflow; that is one thing you can do. Basically, match the autonomy level with the risk of the application.

But further down from that, there are a number of design decisions that can be made.

Things like adjusting the parameters that go into the model, things like temperatures which impact the randomness of content generated.

You know, higher temperatures are better for more creative applications.

At the same time, they increase the risk of hallucinations; lower temperatures are for less creative applications, with much more repeatable, much more predictable output. And so again, another lever that can be pulled is right-sizing those parameters.
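To make the temperature lever concrete, here is a minimal sketch of how it is typically exposed on an LLM API call. It uses the OpenAI Python SDK purely as an illustration; the model name and prompt are placeholders, and other providers expose an equivalent parameter.

```python
# Minimal sketch (illustrative only): lower temperature for repeatable,
# constrained output; higher temperature for more creative generation.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def draft_requirement(prompt: str, creative: bool = False) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",                    # placeholder model name
        messages=[{"role": "user", "content": prompt}],
        temperature=1.0 if creative else 0.2,   # the lever discussed above
    )
    return response.choices[0].message.content

print(draft_requirement(
    "Draft one cybersecurity requirement for a Bluetooth-enabled glucose meter."
))
```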

And there are a number of other things that can be done to kind of limit the likelihood of hallucinations. For example, just increase the detectability of the output: you could actually have inherent prompts in your product and application where the model also puts out the underlying sources for the output.

And again, these themselves could be hallucinated, but it will still be one more thing that the user who would otherwise not detect the hallucination could use to detect the hallucination.

So, things like sources, you know, links to where the data was pulled from.

Once a user sees the output and then sees the basis for the output, by looking at that basis they may be able to readily tell that this is basically a hallucinated output. It is invalid data.

You know, some other things that can be done include like AI detectors built in and all that. So, there's a number of ways to prevent hallucinations.
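One hedged sketch of the "show the basis" idea described here: the product's built-in prompt instructs the model to return its answer together with the sources or assumptions it relied on, so the UI can display them for human verification. The setup below assumes the same illustrative OpenAI SDK as above; as noted in the conversation, the cited basis can itself be hallucinated, so this raises detectability rather than guaranteeing correctness.

```python
# Sketch: ask the model to return an answer plus the basis for it as JSON,
# so the UI can show sources next to the output for the user to check.
import json
from openai import OpenAI

client = OpenAI()

SYSTEM = (
    "Answer the user's question. Respond as JSON with two keys: "
    "'answer' and 'basis' (a list of the sources or assumptions you relied on). "
    "If you are not confident, say so in 'basis'."
)

def answer_with_basis(question: str) -> dict:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "system", "content": SYSTEM},
                  {"role": "user", "content": question}],
        response_format={"type": "json_object"},
        temperature=0.2,
    )
    return json.loads(resp.choices[0].message.content)

result = answer_with_basis(
    "Which IEC standard covers medical device software lifecycle processes?"
)
print(result["answer"])
print("Basis given:", result["basis"])
```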

Etienne Nichols: Yeah, no, that makes sense. And you kind of went deep, and I appreciate that, going into the different risks that could potentially happen. And it makes me think of IVDs to a certain degree, and I don't know, maybe I'm way off base.

I think of specificity and sensitivity, those key measures, as what I typically think of as the dividing line between how a lot of medical devices work and how IVDs are when it comes to risk management.

You know, there's the unique aspects to them.

The sensitivity being that true positive rate, showing how well you can determine if someone has a disease, and the specificity showing how well it identifies people who do not have it, ruling out the disease, and, you know, the false positives and false negatives. I don't know if there's some form of that here.

I have not read 34971. I've seen it and heard a little bit about it. I think it's the application of risk management to machine learning.

So, I think that's definitely a good call out. I appreciate you doing that.

I mean, you mentioned something about Generative AI, and, well, I kind of keep going back, I guess, to the fact that there hasn't been one cleared yet or approved through the FDA.

I assume it will be an approval if we don't have a predicate, unless there's some way to do a predicate. Maybe that's another topic; I don't necessarily want to chase that rabbit right now. But what's the gap? Is it just that we don't have a true problem to solve with it yet, or are they working on it? Do you know of those who are working on a Generative AI device, or what's holding us back from having one out there so far?

Ashkon Rasooli: Yeah, that's the million-dollar question, honestly: what's holding us back? And I think it's a little bit of we don't know what we don't know, and therefore there's uncertainty around, from the viewpoint of a regulatory body like the FDA, what good enough looks like for a technology like this. Because what good enough looks like, even outside of the medical device industry, is still yet to be defined.

It is going to be heavily application dependent.

But what has happened is we've gone from classical software, where the behavior was very deterministic and you had clear pass-fail criteria and you could just have that, you know, true-false kind of assertion in your test cases, to, you know, classic machine learning, non-generative applications.

With those, the output became statistical.

You had a statistical performance.

However, you have a predefined range of outcomes. For example, I talked about the image processing application.

The output would have fallen on a range of outcomes of, you know, detecting certain number of objects or detecting the wrong object. And you could have kind of planned for that and statistically characterized it and set acceptance criteria around that.

Now with the Generative AI, that range is either extremely broad and it is not something that you can readily just look at and decide this is what acceptance criteria looks like.

And often we get surprised by the outputs that come out of the model.

You know, we've all heard the news and read articles of the cases where the product outputs text that is harmful, inappropriate, you know, violates copyrights, when the prompt wasn't even asking for that, you know, because something statistical happened there. And so ultimately the question becomes how do you even know the range of outcomes that are going to come out of this?

And then on top of that, build acceptance criteria.

Yeah, a very V&V-focused perspective on this.

But I think ultimately that is the question with, you know, clearing of the GenAI devices or approving the first example of the GenAI device.

Etienne Nichols: Well, let's get into "good." You mentioned V&V, and you got me thinking a little bit about good practices, or maybe not just good practices but required practices.

So, I know your talk is going to cover good machine learning practices.

And as it relates to this, can you give us the top five considerations that MedTech professionals need to think about?

Ashkon Rasooli: Yeah, you know, if I were to kind of go in order, for the first two I do want to go back to what we had already established as good software development practice: your standard quality assurance frameworks. If you're doing a medical device, 62304; if you're doing a tool, you know, the GxPs or the GAMP 5s or the CSA frameworks. Those do still apply and must not be ignored for GenAI products. Because what's going to happen is you're going to have a product that definitely has GenAI components.

It might even be the main component, but you're definitely going to have non GenAI components to that classic software. This could be the front end. The user uses any sort of onboarding, workflow, any sort of backend, you could have data lakes, data structures.

All of that is going to still rely on your classic traditional software quality assurance frameworks.

And the reason I say this is that the assumption that all of that operates is kind of a precursor and a prerequisite for everything else I'm going to say that is unique to GenAI.

So, you've got to keep that. In addition to the classic software QA frameworks, the standard machine learning considerations continue to apply, so you've got to keep those: things like transparency, making sure the data you use to train represents your actual application, that you have adequate representation and a lack of bias, that you monitor for ongoing shifts in your outputs in case reality starts shifting away from the data you're training on, things like explainability, and, you know, the standard security and privacy concerns of "am I training on data that I have consent to train on?"

These are just a subset of the classic concerns around all machine learning products.

They do continue to apply to Generative AI products because Generative AI again is just a subset of that.

Yeah, but if I were to kind of focus on things that apply specifically to Generative AI, and we've kind of touched on a couple of them already, one of them would be, early on, identifying the workflow and the labeling for that workflow.

You've got to optimize the level of autonomy against the level of risk of the decision that this product is going to be supporting.

A couple of things I've seen be useful: narrow the scope of use that the product could actually be used for, either through labeling, informing, and training the users, or just through product design. So, one good risk control I've seen here is limiting how you use the product.

If it's one of those products where you prompt and you get an output, which is typically what Generative AI products look like, have prompting guidelines. You see, there are a number of prompting methodologies out there that one can use, and different models behave differently on different prompt structures and architectures.

Things like a one-shot or a two-shot, where you give it examples of what you want and then allow it to learn from those examples as part of the prompt.

This is at the product management level.
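As an illustration of the one-shot/two-shot structure being described, here is a minimal sketch that builds a two-shot prompt in plain Python. The example features and requirement wording are hypothetical.

```python
# Sketch: a "two-shot" prompt embeds two worked examples so the model
# sees the expected structure before it gets the real task.
examples = [
    ("Infusion pump occlusion alarm",
     "The system shall issue an audible and visual alarm within 5 seconds of detecting line occlusion."),
    ("Login inactivity timeout",
     "The system shall lock the user session after 10 minutes of inactivity and require re-authentication."),
]

task = "Bluetooth pairing for a glucose meter"

prompt_lines = ["Write one verifiable software requirement for each feature, in 'shall' form."]
for feature, requirement in examples:
    prompt_lines.append(f"Feature: {feature}\nRequirement: {requirement}")
prompt_lines.append(f"Feature: {task}\nRequirement:")

prompt = "\n\n".join(prompt_lines)
print(prompt)  # this string would be sent to the model as the user message
```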

The, I guess fourth thing I would put on this list is kind of building safety and effectiveness by design into the product. We already kind of talked about model design control, the parameters and the verification.

Etienne Nichols: If I go back just one, when you're talking about workflow and labeling, what you're really talking about is informing the user who's about to prompt this that this is how you speak to this large language model, if it's that kind of input, just a required structure on how to talk to it. You're really talking about labeling. Or let me go back even one step further. When you said the scope of the use, it's not so much that we're building it so it can only do this thing.

There could be additional things that it's using, but really it's conveying that scope to the user. Am I hearing that right? So that the user knows I really can only use this for these 1, 2, 3, 4, 5 things.

If I get outside that, sure, I can start coloring outside the lines. But now I'm outside whatever it has been verified and validated as being capable of.

Is that, is that accurate?

Yeah.

Ashkon Rasooli: And you know, one example that kind of aligns with this paradigm from the non-machine-learning world: one of the questions we were trying to answer years ago was, these mobile apps are going to have medical device applications, but there's a lot of mobile hardware and mobile software out there.

How do we clear these? Right.

And you know, the FDA guidance for interoperability basically asked that whatever you pick to test, you disclose to the user. And it doesn't mean that you're going to have to test each and every single device out there that's just not feasible.

For example, all the different versions of Android, all the different versions of iOS, all the different versions of Linux products, right? But you're going to pick a sample and you're going to test on that, and you got to disclose that to the user.

And so, I look at it in the same fashion, in the sense that what could be done is a very broad use case, but what you've actually tested it for and are marketing it for is a niche in the middle, a much narrower scope.

So, you say, listen, I've tested it with the following prompt structure or the following use case. For example, you are much less likely to see huge, impactful failures and hallucinations if the scope of the task is narrow.

For example, if you tell the product to write an entire DHF for you for a device, you're almost guaranteed to have an unusable output, versus breaking that down into files and documents and sections within those documents. Versus, say, for security requirements around this product: given the following geographies, the following client base, the following architecture, what are the security frameworks you think I should be thinking of? What are the security assets or surfaces or threats I should be thinking of?

You are much more likely to get a useful answer out of that because you've narrowed down the scope of the task. And so, doing that, and then also disclosing that to the users and training them on it, will limit the amount of, you know, unintended misuse, let's call it, where you're using the product for that which it's not meant to perform well for.

Etienne Nichols: Yeah, foreseeable misuse, I think, is definitely applicable here. Okay, so you went on to safety by design. And I know ISO 14971 would probably have us do this in reverse order, but I think this is great. So, the next one, safety by design: what were some of the things you were saying about that? I kind of cut you off.

Ashkon Rasooli: Yeah. So, building guideline guardrails into the product, kind of as pre-prompts: essentially, the prompt that comes from a user gets augmented by prompts that are built into the product, and you make sure that those enforce certain guidelines. That is one way to improve the design.

But even that has a statistical chance of failing. To even further mitigate that, one of the useful implementations I've seen is to have an independent verification agent.

So, you've got your own standard AI model, and when it gives you an output, in real time you've got an agent that verifies, given the guidelines, that the output does not violate any of them. If it does, it basically stops that output from going out to the user. That, again, can statistically fail, but the likelihood of both of these statistically failing is far lower than just one of them failing. So that's another kind of inherent safeguard. I like to think of it as, you know, we used to have what we called the rubber duck coding method, where you had a duck and you'd explain your code to the duck, or, you know, pair programming methods where two people would be at the same station trying to code.

And the ultimate idea there is one individual can fail with some likelihood, but two individuals failing at the same time together is far lower in likelihood. And so, it's the same thing.

You know, these agents at the end of the day are kind of proxies for humans.

So, it's the same. Same idea that makes sense.
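For illustration, a minimal sketch of that two-layer pattern: a primary model drafts an answer, and a second, independent call checks the draft against the product's guidelines before anything reaches the user. The OpenAI SDK, model names, guideline text, and the PASS/REJECTED convention are all assumptions made for the sketch. The redundancy math is the point: if each layer independently misses a violation about 5% of the time, both missing it is roughly 0.25%.

```python
# Sketch of a primary model plus an independent verification agent.
# Illustrative only: models, guidelines, and the PASS/REJECTED protocol
# are hypothetical choices, not a prescribed implementation.
from openai import OpenAI

client = OpenAI()

GUIDELINES = (
    "Reject the draft if it discloses personal data, gives a diagnosis "
    "or treatment recommendation, or cites sources that are not provided."
)

def ask(model: str, system: str, user: str) -> str:
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "system", "content": system},
                  {"role": "user", "content": user}],
        temperature=0.2,
    )
    return resp.choices[0].message.content

def answer_with_guardrails(question: str) -> str:
    draft = ask("gpt-4o-mini", "You are a documentation assistant.", question)
    verdict = ask(
        "gpt-4o-mini",  # ideally a different model or vendor, for independence
        f"You are an independent reviewer. {GUIDELINES} "
        "Reply with exactly PASS or REJECTED.",
        f"Draft to review:\n{draft}",
    )
    if verdict.strip().upper().startswith("PASS"):
        return draft
    # The draft never reaches the user; log it for quality review instead.
    return "This answer was withheld by a safety check; please rephrase or narrow the question."

print(answer_with_guardrails("Summarize our complaint-handling SOP in two sentences."))
```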

Etienne Nichols: I think the military has a phrase where they say two is one and one is none. So, it's good to have a little bit of redundancy in your...

So what else is there? Did we cover five? I don't even remember. And it doesn't have to be five; it could be, you know, three or seven.

Ashkon Rasooli: Well, since you asked for five, I'll give you five. So no, I think we covered four. I think there is another one that I think I want to highlight here.

What we're seeing is just a general shift in terms of what validation looks like, from pre-market to more post-market. This is a trend that had been kind of floating around and thought about for general, non-machine-learning software.

But with machine learning it's becoming more and more a reality that there is only so much we can do pre-market.

And if we are to reap the benefits of this technology, we're going to have to give in to including continuous post-market surveillance in the validation suite.

And so, your validation plan isn't just going to be a number of activities you do that gate release to market; it is actually going to be those plus a number of activities you do after market release.

Yeah. So, you know, your product is 90% validated and it goes to market, and then it remains validated. You continue to monitor that your product is in this validated state through monitoring the output.

Again, that has been something where independent AI agents have been thrown into the mix on an ongoing basis, reporting back the rates of failure out in the field, monitoring every output, running it against guidelines and pass-fail criteria, and reporting back how often you are failing.

That is another area where user feedback becomes even more important.

If this is a medical device, it would be classic complaint handling and customer feedback. But for non-medical devices too, it is crucially important that you've got some sort of indication there, even something as simple as a yes-or-no "was this answer useful?", for a user to provide you quick feedback, and you can quickly monitor that for trends as well. That itself could be an indication that, due to a shift in external circumstances or something changing with your model training, you're seeing a shift in your performance.
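As a sketch of what that post-market "watchtower" could look like in code: log a pass/fail outcome for every monitored output (from an independent checker or a thumbs-up/down control), compute a rolling failure rate, and flag a potential performance shift for human review. The window size, threshold, and simulated data below are hypothetical.

```python
# Sketch: rolling post-market monitor for a GenAI feature.
# Each record is True (acceptable output / "yes, useful") or False.
from collections import deque

class OutputMonitor:
    def __init__(self, window: int = 500, alert_rate: float = 0.05):
        self.results = deque(maxlen=window)   # most recent N outcomes
        self.alert_rate = alert_rate          # acceptable failure rate

    def record(self, passed: bool) -> None:
        self.results.append(passed)

    def failure_rate(self) -> float:
        if not self.results:
            return 0.0
        return 1 - sum(self.results) / len(self.results)

    def needs_review(self) -> bool:
        # Flag a potential performance shift for human investigation;
        # in a medical device this would feed complaint handling / CAPA.
        return (len(self.results) == self.results.maxlen
                and self.failure_rate() > self.alert_rate)

monitor = OutputMonitor()
for outcome in [True] * 470 + [False] * 30:   # simulated field data
    monitor.record(outcome)
print(monitor.failure_rate(), monitor.needs_review())  # ~0.06, True
```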

Etienne Nichols: Yeah, yeah, that's a good point. You know, I'm curious, I'm just going to throw this out there and see what you think, see if you could react to this. When you're talking about that, I like to, you know, think a little bit laterally. I mentioned IVDs earlier. So now you talk about putting it out into the field, 90% validated or whatever percent you want to call it, and constantly kind of recirculating and coming back around and checking that.

And it made me think of LDTs, laboratory developed tests, not in a derogatory sense. You know, I have respect for both industries, medical devices and laboratory developed tests. I consider them to be, you know, kind of like a recipe that is out there, and then it's put together and utilized by a professional in the industry, a scientist of some sort, usually a pretty highly degreed person, but depending on their level of training and their wizardry, they can maybe alter things if they feel like it and want to.

You know, that's the real intelligence that's out there. Now we're about to put artificial intelligence out into the market, and we have to make sure that it's being consistent across all demographics, populations, whatever the case may be.

I feel like some of this is a problem that LDTs or maybe other ancillary or lateral industries have experienced. I wonder if there's some things we can learn from these other industries.

Ashkon Rasooli: Yeah, 100%. I think ultimately the idea with LDTs was that the entity carrying out the test is adequately trained to manage all the risks that are associated with that otherwise uncleared test.

Right.

And the analogy here for Generative AI products would be if you've limited your user base to a number of experts.

And that's one way to look at it. But I guess, what if those experts... I think what you're getting at is, what if those expert users are actually human-AI combinations?

So what you define as an expert user could be a slightly less expert user with an AI that is expert, and that agent, independent from the actual product that is doing the generating, helps the user, basically highlights things for them and says, these seem sus, you know, you should probably deep dive into this.

Etienne Nichols: Well, so let me actually, let me back up just one more just because I don't know if I was very clear in my analogy and it could be a bad analogy.

So, it's not going to hurt my feelings. Instead of thinking that you have the user who is a scientist with an LDT and then the user of an AI, I'm actually saying that for the Generative AI we put out into the market, the analogy would be the LDT plus the scientist who could slightly alter it.

And then the doctor is the one who uses either of these: they could use the Generative AI, or the scientist plus LDT, and there may be a slight variation across populations.

If this scientist, the LDT person, looks at PSA levels or whatever the case may be, this AI has, you know, sifted through the market and is changing things slightly.

The doctor has to recognize, has to be able to have some understanding of, when things are going to be altered. I don't know, it might not be a good analogy. I shouldn't be coming up with these things live.

But yeah, no, no, no, I could see that.

Ashkon Rasooli: I think what you're doing there is saying the actual test plus the scientist that carries out the test is analogous to the GenAI tool itself, plus, you know, maybe we'll call it another independent monitoring AI that goes with it. And that package is the same as the other package, because in that package the secondary, independent AI is actually making sure that this AI is being used correctly.

That's actually basically an extension of what I was saying: in addition to checking the output, you're able to have that independent monitor check how the product's being used and basically inform the user that the thing you're asking is kind of not within the guidelines.

In fact, you're seeing some of that currently with the commercial products. You know, if you try to get, you know, legal advice from ChatGPT or something like that, it'll start with, I'm not a lawyer. Right, right. It didn't, it didn't used to do that. They've kind of built that in there. So, there's a guideline engine there trying to enforce the guidelines.

Etienne Nichols: Yeah, yeah, that's a good point. So, well, let's talk a little bit about that then, using OpenAI or some other tool.

Maybe we're not talking about medical devices, but medical device companies use a lot of different tools, and really that's the primary market we're talking to today. That's the audience that's likely going to be listening to this conversation.

What does it look like at a high level, the verification and validation? What should we be requiring, and how do we know a tool works, when it comes to the tools that medical device companies use?

Ashkon Rasooli: Yeah, I mean, ultimately, in terms of, you know, technical quality assurance that goes with the technology and therefore, regardless of the application, what I spoke of earlier continues to apply as effective risk control measures, effective quality assurance activities.

Whether you have a medical device, a tool.

What does change, though, obviously, is the risk level.

So, the amount each one of these activities should be invested in is going to be changing because you're going to have different levels of risk.

So, while you would have, for example, the autonomy optimization of the workflow in both cases, you're probably going to have, statistically speaking, on average, higher-autonomy workflows for a tool than for a medical device. You know, if you've got a medical device that is making diagnostic decisions, or treatment decisions for that matter, obviously fully autonomous would mean fully autonomous, you know, mistreatment of a patient.

Whereas if all your product is doing is, for example, gathering supply chain data in manufacturing and helping diversify, then incorrect information there will really translate to an inefficient supply chain versus, you know, actual harm.

So, I think the same activities, but the levels are going to be different.

Etienne Nichols: Yeah.

Ashkon Rasooli: And you know, I'm trying to think of if there's anything.

I guess there is one thing that I could see being more tool-related among all the harms. You know, one of the failure modes or hazards of using GenAI is relying on it too much.

Over-reliance on AI could result in a loss of expertise in an organization, because what would previously have been employees of the company, senior experts, taking in data, analyzing it, and thereby building knowledge into the organization, is now going to be built into this AI tool.

And one classic irony of this is the fact that AI is going to get better and better, which means its failures are going to require more and more of an expertise to catch.

At the same time, the more you use AI, the less you're going to have experts to catch the issues with AI.

And so, there's going to have to be this optimal balance on stopping, you know, how do we gain the efficiencies without setting ourselves up for long term failure here?

By kind of losing tribal knowledge. The other way you lose expertise and tribal knowledge is people stop being engaged, because when you look at these Generative AI system workflows, the workload shifts from the actual, fun generation of content or work to mostly just verifying and validating.

And you know, I could later, you know, talk about this in software engineering context and coding specifically as an example. But in general, speaking of these workflows, the work becomes more boring and less engaging. Because what you're now looking at is just here's a blob of artifacts that have come out of the AI and I need you to find the issues with it.

And so, this becomes less engaging work than you know, creating your own.

And so, I think I see that be more a risk with tools that are deployed widely within a company versus a medical device.

Etienne Nichols: Yeah, that's a good point.

I guess the example I like to use is Google Maps. If you use Google Maps for everything you do, you won't know how to get around without it.

And it's interesting to look at the generations that have come before us. You know, I grew up in Tulsa, Oklahoma, and I actually installed draperies in probably over a thousand houses. That got me, you know, through high school, through college and stuff like that.

So, I was different. I knew that town, and I can go there today and use Google Maps and say, yeah, that's a pretty good route, but I actually know a better route. But there are very few places I could do that with. And I think it's going to be more and more like you said: that knowledge, that expertise, is going to become concentrated. The companies themselves may not even be able to see whether these tools are right. They're just going to trust the experts who are concentrated in one company who have delivered these tools to them, whether it's coding or wherever else.

Ashkon Rasooli: Yeah. And you know, in terms of AI, I think the other factor that contributes to this is that, when we talk about tribal knowledge, it's not just that we're transferring it to the AI tool. It is also that tribal knowledge includes a lot of information that the AI tool, by design, is not going to have: you know, the business-specific information that someone like an employee who's been there for 10, 15, 20 years is going to have. We're never going to think of all the things to give to the AI, right, and put the same importance and detail on them, and that kind of stuff.

And so, what's going to happen is that not only is there no one who has the information the AI has to catch the issues, but also there's no one that has that plus all the context around it, which we don't normally feed to the AI.

Etienne Nichols: You know, this, it's a, it's a problem that we've been plagued with and there's two sides to this problem. The tribal knowledge of, well, we'll no longer have the tribal people to have the tribal knowledge, you know, or whatever the case may be, but in the past that could walk right out the door with that person.

It's never been documented. So, I'm curious, because I don't know where it is, but there's got to be a sweet spot where we utilize these tools for the efficiency gains that we can get.

But we also still maintain that humanity and all the different context clues, emotional intelligence, every other aspect that we need as a company.

Ashkon Rasooli: Yeah, no, you're right. I think obviously loss of talent and tribal knowledge is an existing problem to a certain level; I think over-reliance on AI tools will actually exacerbate it.

Etienne Nichols: Yeah, yeah.

What about.

Because that's a conversation we can go on for hours. I'm sure you mentioned software developers and using that as an example, can you, can you give us some more unique color there?

Ashkon Rasooli: Yeah, you know, I was talking about how, you know, there's this paradoxical trade off here where you have a need for expertise to catch errors in the outputs of the AI, but at the same time you're giving most of the knowledge to the AI. And so like when you talk about like software development, you know, vibe coding is very hot now and you code with the AI.

But ultimately what these AI models are doing is prediction.

They're being trained on existing samples of code and they're predicting, you know, what the next line of code should be. And what they often do, which is also categorized as hallucination, is detect patterns in lines of code, where in reality these lines of code, while they follow patterns maybe 70%, 80% of the time, the other 20% of the time don't, and senior coders know that. So, you know, one very common use case in coding is you pull in libraries, you know, what we call software of unknown provenance, or SOUP.

And these libraries have a naming convention. It is also very possible that the naming convention is switched for one particular library.

Etienne Nichols: Yeah.

Ashkon Rasooli: When the AI is coding, it's going to just follow the general pattern of naming these libraries and it's going to try to pull them in.

A senior coder that has used that library before is going to be able to catch that because it's not going to look right; they've used it before. But your average coder is not going to be able to; in fact, it looks right, it looks like the rest of them. Now, this actually opens up the possibility of a hacker coming up with a library that is named in the way the AI would predict, has some of that functionality, but also has a bunch of backdoors and, you know, code injections in it.
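One hedged mitigation for that specific failure mode is to gate any AI-suggested dependency against an explicitly approved SOUP allowlist before it can enter a build. The package names, versions, and allowlist below are hypothetical; the point is that an unreviewed but plausible-looking package gets routed to a human instead of installed.

```python
# Sketch: gate AI-suggested dependencies against an approved SOUP allowlist.
# In practice the allowlist would come from a reviewed, version-controlled file;
# the package names and versions here are hypothetical.

APPROVED_SOUP = {
    "requests==2.32.3",
    "numpy==2.1.0",
}

def unapproved_dependencies(requested: list[str]) -> list[str]:
    """Return any requested 'name==version' pins not on the approved SOUP list."""
    return [pkg for pkg in requested if pkg not in APPROVED_SOUP]

# The second entry looks plausible but was never reviewed (or may not even exist).
ai_suggested = ["requests==2.32.3", "secure-utils-lib==0.0.1"]

flagged = unapproved_dependencies(ai_suggested)
if flagged:
    # Hand these to a human reviewer before they get anywhere near a build.
    print("Not on the approved SOUP list:", flagged)
```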

The more you rely on AI to generate your code for you, because the other part of this is that businesspeople are going to try to push more to be done with AI and less to be done by humans, and you're going to get, you know, thousands of lines of code generated by AI and one person to verify it all, the more you're going to see this slip through. Right. The more this is going to become a risk.

Right. And so, we're going to have to pull back at some point. There is going to be this point where we're going to be like, we've done too much reliance on these AI generated content.

We've got to pull it back a little and in fact enforce a certain level of human engagement.

This is something we had to deal with in, you know, software development before, with automated testing. In fact, you would never want your software testing and software V&V to be a hundred percent automated.

And the reason for that is the automated test cases only look for the things you think of checking at the time, and they do not look for things you forget.

And the reality is in the process of software development, people may introduce unanticipated outcomes, and it is very possible that the automated test isn't programmed to catch it. I'm gonna give you an example. I might have a button that is supposed to have an action, and the AI test is programmed to test for that functionality.

At the same time, I have a coder, you know, a front-end coder, coding that front-end button and, you know, messing up the looks of it; it looks completely weird and off-center and not available.

Your AI test or your automated test looking for the functionality will pass.

Yeah, it doesn't care. But yeah, right, because it was never programmed to look for the design of that button. But a human would immediately catch that issue, because they can't find the button.
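To make the button example concrete, here is a minimal sketch of a functional UI test using Playwright's Python API, with a hypothetical URL, selectors, and expected text. It passes as long as the click produces the expected result; nothing in it asserts anything about the button's position or appearance, which is exactly the class of defect a human check is kept around to catch.

```python
# Sketch: a functional test that only checks behavior, not appearance.
# URL, selectors, and expected text are hypothetical.
from playwright.sync_api import sync_playwright

def test_submit_button_triggers_confirmation():
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto("https://example.test/order")   # hypothetical app
        page.click("#submit-order")               # still passes if the button is off-center
        assert "Order received" in (page.text_content("#status") or "")
        browser.close()

# Nothing here fails if a CSS change pushes the button halfway off screen;
# only a human looking at the page catches that.
```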

And so, we had to deal with this with automated testing where we were like, we're enforcing a certain level of manual testing. We are not going to go a hundred percent.

I think you're going to see the same paradigm be carried over to AI use.

You are enforcing a certain level of human oversight, human knowledge, human expertise, work being done manually versus, you know, AI.

Etienne Nichols: Yeah, I think that's, that's a good example. I appreciate you sharing that.

So instead of just giving your whole talk here, who do you think should go to your session? What do you expect them to be able to walk away with? And maybe more importantly, what should they not expect to walk away with?

Just so we level set?

Ashkon Rasooli: Sure, sure. I think that's a really good question. So, in terms of who should go, I think anyone looking to use the technology of Generative AI in a product who would like to know how to do quality assurance in terms of safety and effectiveness would benefit from this.

It doesn't matter what sort of application we're talking about, whether it's, you know, a tool, whether it's at the point of care, it doesn't matter. I think anyone who's looking to integrate Generative AI into a product will benefit from this talk. At the minimum, they're going to walk away with nuggets of quality control ideas which will increase their chances of success. So, no harm there.

What I would explicitly call out that this is not covering is how to clear or approve a regulated medical device with Generative AI. That is an open topic.

The other thing that I want to highlight is there's a number of factors that one should think of when rolling out AI products largely to a culture, to a company.

And some of these factors are beyond safety and effectiveness. So, these include things like cost of running this product, what is my API cost, if I'm, for example, integrating into open APIs, that kind of stuff, what is this costing me? Is it actually saving me any money? Things like the amount of energy consumption that goes into running these models and training these models and, you know, the environmental impacts of these.

Just how do you do organizational adoption in terms of, like, culture? These are very important, very valid topics around AI tools.

This talk isn't about that. This talk is very focused on how do you ensure functionality, safety, efficacy.

Cool.

Etienne Nichols: Awesome. Well, I'm looking forward to it. Looking forward to seeing you in person and those of you listening, definitely reach out to Ashkon or myself. If you're heading to MD&M West or in a few more months, I think March.

Towards the end of March, we're going to be at LSI Life Sciences Intelligence in Dana Point, California. So, there's another opportunity to reach out to either Ashkon or myself. Talk more.

I'm sure we'll have more conversations there. So, any last things you'd like to share with the audience? Ashkon?

Ashkon Rasooli: No, I look forward to seeing everyone at the event and look forward to us meeting up. And thank you very much for the great conversation.

Etienne Nichols: Awesome.

All right, well, we'll let you all get back to it. Thanks so much for listening to the Global Medical Device Podcast. We will see you all next time. Take care.

Thanks for tuning in to the Global Medical Device Podcast. If you found value in today's conversation, please take a moment to rate, review and subscribe on your favorite podcast platform.

If you've got thoughts or questions, we'd love to hear from you. Email us at podcast@greenlight.guru.

Stay connected for more insights into the future of MedTech innovation. And if you're ready to take your product development to the next level, visit us at www.greenlight.guru. Until next time, keep innovating and improving the quality of life.

 

 

About the Global Medical Device Podcast:

The Global Medical Device Podcast powered by Greenlight Guru is where today's brightest minds in the medical device industry go to get their most useful and actionable insider knowledge, direct from some of the world's leading medical device experts and companies.

Like this episode? Subscribe today on iTunes or Spotify.