Human and Climate Drivers of Global Models with Machine Learning - Transcript

Water Resources Podcast – Yadu Pokhrel: Human and Climate Drivers of Global Models with Machine Learning

[00:00:22] Bridget Scanlon: I'm pleased to introduce Yadu Pokhrel to the podcast. Yadu is a professor at the Department of Civil and Environmental Engineering at Michigan State University. His research evaluates interactions among water, energy, and food systems within the context of climate extremes, particularly droughts and floods, and also considering human intervention.

And his work extends from local scales to global scales, and he has done a lot of research on global modeling, providing insights into different regions like the Amazon, the Mekong, and other areas. And so today I think we are going to talk about his recent work on the Colorado River Basin, where he considered climate and human drivers, and then research on model intercomparison projects and global modeling. A lot about global modeling. Thank you so much, Yadu, for joining me.

[00:01:25] Yadu Pokhrel: Thank you so much, Bridget, for having me on this podcast. It's a pleasure to discuss a number of things that you mentioned, including the water crisis in the Colorado, our modeling work, and a number of other things that we have been doing. So, I'm very excited.

[00:01:43] Bridget Scanlon: Right. And I've known Yadu for a number of years, and we're currently working on a Mississippi River Basin modeling analysis. There's a lot of interest these days, Yadu, as you mentioned, as the Colorado River Basin has been subjected to a multidecadal drought that has lasted since 2000, with reservoirs declining a lot, to maybe about 30% capacity right now for Lakes Powell and Mead.

So you have been using the Community Land Model to simulate flow in the Colorado. And maybe you can give us a little bit of background on the Community Land Model and then the application to the work that you're doing in the Colorado.

[00:02:24] Yadu Pokhrel: Absolutely Bridget. Yeah, this is a project, I would like to start by bringing a little bit of context. This is a project funded by the National Science Foundation with our team at Michigan State leading the project jointly with University of Colorado Boulder. And also we have five different water management entities, or I would say stakeholders in the region as a part of the project, including US Bureau of Reclamation, California Dept. of Water Resources, and others.

So in this project, we are primarily doing modeling work. And as you mentioned, we are using CLM, the Community Land Model, version five. The Community Land Model is basically a large-scale global model. Recently it has been used even at smaller scales, but it is a large-scale model that simulates hydrological processes with a fully process-based approach.

And it's a model that is coupled with its parent climate model, the Community Earth System Model 2 (CESM2). But in our work, we use the land version of the model, where we decouple the model from the climate system, provide atmospheric data as input, and run in land-only mode.

So, that is the hydrology. The Community Land Model has been advanced in many ways to simulate the biophysical and biogeochemical processes on land. I think it's one of the most advanced land surface models we have. But the other part of that is also simulating the human interventions.

And when we use this model for the Colorado, of course, natural hydrology is one part, but without looking into, or without considering the management aspects, it's not very meaningful to do hydrology in the Colorado, right? It's heavily dammed, one of the most intensively managed river basins in the world.

So we brought in this model and combined it with a river, flood, and reservoir routing model called CaMa-Flood. So we are using a combination of these two so that we can produce runoff from the Community Land Model, and then simulate the river flow processes, flood processes, and river-reservoir routing using the CaMa-Flood model. So this is one part.

The other aspect is irrigation. We may touch upon this later in other topics as well, but the model also has the capability to simulate irrigation water requirements. So in essence, it's an integration of advanced land surface hydrology, combined with irrigation capability, and then integrated with the river model that has a reservoir operation capability. So we are using this model to look into the historical aspect of the Colorado: how has the hydrology changed, not really going back hundreds of years, but over the past few decades? Can the model reproduce the hydrology of the past several decades? And then, can we use the model to simulate the future of the Colorado? That is another question that we're trying to answer.

[00:05:26] Bridget Scanlon: Right. And so the land model is open source. So when you say community, does that indicate that the community can advance the model and change it for their own use? Or can they contribute changes to the model, and then it gets updated? You said you are on version five now, so I guess each new version incorporates different updates and things like that.

So what does community really mean in a Community Land Model? And who is contributing to the advances in the CLM model?

[00:05:57] Yadu Pokhrel: That's a great question, Bridget. Yes, the Community Land Model. There is a bit of history behind where the name came from, but without going into that long history: this model was originally developed, and is now housed, maintained, and released, by the National Center for Atmospheric Research, NCAR, and NCAR receives funding from the National Science Foundation. So in some ways this initiative is supported by a US federal government agency. At NCAR there are scientists working to advance the model, but they also work with the broader community. NCAR is under UCAR, so they work with universities across the US and around the world.

So in terms of advancing the model, it is maintained at NCAR and they do the releases. The latest one is version five, and I think in the near future they will release version six. But in this process, they collaborate with the community, and if there is a new scheme or improvement coming from the community, and if it is compatible with the model development principles, then those advancements could go into the official version and NCAR could release it. So there are some technical things that we need to make sure will work out. But in general, the community can use the model and also contribute to the advancement of the model.

So that is where the word community comes from.

[00:07:18] Bridget Scanlon: Right, right. And you mentioned the Colorado River Basin is intensively managed. But for the flow at Lees Ferry, in between the upper Colorado River basin and the lower basin, they have estimated what the naturalized flow of the river would be by removing the effects of all of the diversions and return flows and all of that sort of thing.

So, that's one test of your model then, without any human intervention, if you can reproduce the natural flow at Lees Ferry. I think in your paper you mentioned that you try to reproduce this. And so how well can you reproduce the naturalized flows in Lees Ferry?
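
As a rough illustration of the naturalization idea described here, the Python sketch below adds net human depletions back to a gauged flow series. It is a minimal sketch under stated assumptions: the function name `naturalized_flow`, the choice of terms (diversions, return flows, reservoir storage change, reservoir evaporation), and all numbers are hypothetical and are not taken from the actual Lees Ferry data set.

```python
def naturalized_flow(gauged, diversions, return_flows, storage_change, reservoir_evap):
    """Estimate naturalized flow by adding net human depletions back to gauged flow.

    All inputs are monthly volumes in the same units (hypothetical values below).
    Assumed balance: natural = gauged + diversions - return flows
                               + increase in reservoir storage + reservoir evaporation.
    """
    return [
        q + d - r + ds + e
        for q, d, r, ds, e in zip(gauged, diversions, return_flows,
                                  storage_change, reservoir_evap)
    ]


# Invented three-month example, for illustration only.
gauged = [900.0, 1100.0, 1500.0]
diversions = [200.0, 250.0, 300.0]
return_flows = [50.0, 60.0, 80.0]
storage_change = [-100.0, 50.0, 200.0]   # negative means the reservoir drew down
reservoir_evap = [20.0, 30.0, 40.0]

natural = naturalized_flow(gauged, diversions, return_flows,
                           storage_change, reservoir_evap)
print(natural)
```

A model run without any human intervention can then be compared against a series like `natural` rather than against the gauged flow.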

[00:07:59] Yadu Pokhrel: That's a great question, and when we started this project, or when we wrote the proposal, we assumed that the Community Land Model would be able to, or it would be good in terms of reproducing the naturalized flow, before we start working on the human aspect, which is pretty challenging.

But we started running the model, and one big issue was that upstream of Lees Ferry the basin is snow dominated. That's pretty well known. So, when we started running the model, it struggled to simulate the natural stream flow at the Lees Ferry station because of complicated snow processes.

And then there are other processes like groundwater flow, subsurface flow, I would say. So the model struggled. So we had to go and look into model parameters because Community Land Model, fundamentally, this is a model that is designed to do large scale simulations, capturing large scale drivers and reasonably simulating stream flow at the outlet of big basins, right?

So when we go into the headwater catchments of the Colorado, the model may not hit the right parameters, or the physics may not work at that scale. We are working at five-kilometer grids, whereas the original model typically runs at 50 to 100 kilometers at the continental to global scale.

So we are going down to five kilometers. That's still not really high resolution in terms of management, but it is pretty high resolution in terms of large-scale model development. So the model struggled to reproduce the naturalized stream flow at the Lees Ferry station. And as you mentioned, in the Colorado we have these naturalized stream flows, developed by taking away all the human intervention aspects from the observed stream flows.

We don't have these data sets for all river basins around the world, so this is a really good place to test the model. And because the model struggled, we went to look into some processes. The initial thought was that maybe there are some missing processes that we can represent or advance, but we did not have much luck with that.

We have pretty good representation of evapotranspiration, runoff generation, and to certain degree even groundwater flow processes. But we couldn't improve the stream flow at Lees Ferry. And then we went to parameters. And big thing here is this is a process-based model, and we like to say this is a process based model.

The problem is when we need to go and play with some parameters. In these process-based models, all the parameters have some physical meaning. We can't randomly go and crank them up and down to try to get the stream flow right. Then we may be completely messing up the ET process, because that parameter has a certain meaning, right?

And in CLM they give certain bounds based on expert judgment. And then also it has 230 or 230 plus parameters. So the challenge was what do we do? We tried to play with the physics and we did our best. We couldn't make the model work. And then we started looking into parameters.

We have 230 parameters. So this is the main part of the paper by my student Ahmed Elkouk that appeared in Water Resources Research. This was done jointly with NCAR, the developers of CLM. What we did then is we started getting help from machine learning and artificial intelligence, AI/ML, because we have so many parameters.

The model is very intensive in terms of computational resources and time. So we started looking into different parameters, and we used some machine learning techniques to identify the parameters that are heavily influential in terms of runoff generation. We boiled that down from 230 to about 30 parameters, and eventually worked further to get to seven parameters that we call influential parameters for the runoff generation process.

And then once we understood these are the seven critical parameters, we started playing with those, but even that is very computationally intensive. So we started looking into emulators to avoid repeated simulations with CLM. By doing so, we were able to finally reproduce the naturalized stream flow, going back to your question.

Now, by doing all of that work, we have the CLM model that reproduces the naturalized stream flow at Lees Ferry in a reasonable way.
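
The workflow described in this answer (screen many parameters, keep the influential few, then calibrate through a cheap emulator instead of rerunning the full model) can be sketched in Python as follows. This is a toy illustration under loud assumptions: `expensive_model` is an invented stand-in for a CLM run, the correlation-based screening and the k-nearest-neighbour emulator are simple substitutes chosen for brevity, and none of this reproduces the machine learning methods used in the paper.

```python
import random


def expensive_model(p):
    """Toy stand-in for one costly land-model run; returns a runoff-like number.

    Hypothetical structure: only p[0] and p[3] matter much.
    """
    return 10.0 * p[0] - 6.0 * p[3] + 0.01 * sum(p)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


rng = random.Random(0)
N_PARAMS, N_RUNS = 6, 200

# Step 1: a limited set of "expensive" runs with randomly sampled parameters.
samples = [[rng.random() for _ in range(N_PARAMS)] for _ in range(N_RUNS)]
runoff = [expensive_model(p) for p in samples]

# Step 2: screen parameters by |correlation| with simulated runoff, keep the top two.
scores = [abs(pearson([s[j] for s in samples], runoff)) for j in range(N_PARAMS)]
influential = sorted(range(N_PARAMS), key=lambda j: scores[j], reverse=True)[:2]

# Step 3: a cheap emulator (k-nearest-neighbour average) over the influential subset.
train_x = [[s[j] for j in influential] for s in samples]


def emulate(x, k=5):
    """Predict runoff for reduced parameter vector x from the k closest training runs."""
    order = sorted(
        range(N_RUNS),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(x, train_x[i])),
    )
    return sum(runoff[i] for i in order[:k]) / k


# Step 4: calibrate against a target runoff using only emulator calls.
target = 4.0
candidates = [[rng.random() for _ in influential] for _ in range(500)]
best = min(candidates, key=lambda x: abs(emulate(x) - target))
best_error = abs(emulate(best) - target)
print(influential, best, best_error)
```

The design point is that step 4 makes hundreds of emulator calls but no additional calls to the expensive model; in practice the chosen parameter set would still be verified with a final full-model run.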

[00:12:25] Bridget Scanlon: Right, right. So, a lot of people talk about process-based modeling and then they talk about data-driven approaches, AI and ML approaches, and these are kind of two end members. But I think what you are describing then is that you are using AI and ML to optimize parameters and to help you so that you don't have the computational intensity of running CLM a zillion times.

So I think we are going to see more and more of this hybrid of maybe process-based modeling with the AI ML and having this to move forward because of the computational intensity, and also the uncertainty in these parameters. So, that's nice that you were able to get that.

And the snow simulations worked fine. Some of the parameters were related to snow or runoff, or these seven parameters that you've, it's sort of a sensitivity analysis you were describing, right? To figure out which were the most important, you said, influential parameters and were they mostly related to snow and runoff or?

[00:13:25] Yadu Pokhrel: Absolutely, yes. It's sort of sensitivity analysis. The first part, first step is look into sensitivity, but we have 230 parameters, so we had to play with those so many times to boil that down to seven parameters. And once we know that runoff is sensitive to these parameters, and of course, yes, some of the parameters were related to snow, some were related to even photosynthesis that impacts ET, and some were related to subsurface flow.

Three different types of processes. And once we have those seven parameters, once we get those by doing some sensitivity analysis, the next step is looking into, or I would say optimizing those to get the right stream flow. Even within those, what we found, an interesting thing, is a lot of times when we do parameter optimization, we just look at annual stream flow and then try to come up with the parameter that is valid for the entire year, for example.

But here, the hydrology during snow accumulation time, there may be one parameter that is heavily influential, and if we don't capture that, or if we don't have the right parameter for snow accumulation, the snow might be melting quickly, and we get a peak very early in spring or summer. So, that parameter is important during accumulation stage.

And then when you go to spring, late spring to summer, there may be ET related, photosynthesis related parameters. So it was not only just finding parameter and optimizing those for entire year, but looking at the seasonality of the influence of those parameters. That is another important thing that my student looked into.

And in the paper we have a nice chart showing when a particular parameter is influential, and when that influence goes down and then comes back up. And going back to your question, yes, many of those were related to snowmelt. And if we don't capture snowmelt accurately, there are two big problems.

One is early melt, let's say, and that will lead to a timing issue in the hydrograph. And also, even if we are able to capture the timing, quick melting may give you a very large peak at one time, and then later on you don't have so much storage left in the basin.

So that will impact the stream flow during the low-flow period. So some were snow related, some were ET related, and some were subsurface runoff related parameters.

[00:15:45] Bridget Scanlon: Right. So that was the naturalized flow, and then you went on then, so I mean, the snowmelt and everything is very important. Because the upper Colorado basin contributes about 70 to 80% of the flow in the lower Colorado basin, so it's the main driver.

And we often hear, what is the snow accumulation on April 1st in California, in the Sierras, or these sorts of things. What were the snow conditions like? Because that predicts what the water year is going to be like in the future. So then you had the naturalized flow and you had the good seasonal distribution, which is important because if you don't get the snowmelt right, then you might simulate drought conditions in the late summer that may not actually be happening.

Then you tried to bring in the reservoir storage and the reservoir management. And I think Sean Turner's paper from Oak Ridge, on inferred storage targets and release functions for the continental US, was the study that estimated reservoir operations, because it's very difficult to get information on how the operators are managing the reservoirs in order to simulate them appropriately.

Maybe you can describe that a little bit.

[00:16:53] Yadu Pokhrel: Right. And you mentioned an important point about what we just talked about. 85% of the runoff in the Colorado at Lees Ferry comes from only 15% of the basin. And that is heavily contributed by snow related process, snowfall, snow melt and so on. So I just wanted to close the part that we discussed.

It's really interesting hydrology, 85% coming from 15%, and if we don't get that right, then there's no way we get the hydrology right. So, yeah, going to your question about reservoir operation. This goes back to the early 2000s, 2001, 2002, and then 2006. In 2006, there was a paper published by Naota Hanasaki from Japan and colleagues, and they presented a large-scale, global-scale reservoir operation model in that paper.

And that led to a number of other models, either adapting that scheme or advancing it in some ways. During my PhD dissertation, I adopted the Hanasaki scheme and put it into a land surface model within the Japanese climate model. The model is called the MATSIRO land surface model.

I put that reservoir operation scheme into MATSIRO, but it was still in essence the Hanasaki scheme. What the Hanasaki scheme does is allow us to simulate reservoir operation at large scales, continental to global, for big reservoirs. The release scheme is based on two things.

One is hydropower, or, I would say, there are three: hydropower, irrigation, and other reservoirs. If it is hydropower, there's a simple function, because in hydropower reservoirs they try to release water more or less at a constant rate to optimize power throughout the year, right? That's why downstream of Lake Mead we see a pretty flat hydrograph, because they release more or less the same amount. There's a little bit of a peak there, but not so much. So, one, hydropower: more or less constant release, or release based on some function.

If it is an irrigation reservoir, primarily for irrigation, I would say. Then the release is based on downstream demand. We try to capture how much demand there is in the downstream, and then release water from the reservoir to meet those demands.

And the third category is other types of reservoirs could be flood control, navigation, recreation, and so on. For those types of reservoirs, there is a very simple scheme in the end.
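
The three release categories just described can be sketched as a purpose-based rule inside a simple storage balance. This Python sketch is illustrative only and is not the Hanasaki formulation: the function names, the rule used for the "other" category, and all numbers are assumptions made for the example.

```python
def target_release(purpose, mean_inflow, inflow, demand):
    """Desired monthly release under a generic, purpose-based rule (hypothetical)."""
    if purpose == "hydropower":
        return mean_inflow                  # more or less constant release
    if purpose == "irrigation":
        return demand                       # follow downstream demand
    return 0.5 * (inflow + mean_inflow)     # placeholder rule for other purposes


def simulate(purpose, inflows, demands, capacity, storage):
    """Route inflows through one reservoir; returns the release series."""
    mean_inflow = sum(inflows) / len(inflows)
    releases = []
    for q_in, dem in zip(inflows, demands):
        want = target_release(purpose, mean_inflow, q_in, dem)
        available = storage + q_in
        release = min(want, available)      # cannot release water that is not there
        storage = available - release
        if storage > capacity:              # spill anything above capacity
            release += storage - capacity
            storage = capacity
        releases.append(release)
    return releases


# Invented six-month example with a snowmelt-like inflow peak.
inflows = [10.0, 40.0, 120.0, 60.0, 20.0, 10.0]
demands = [0.0, 20.0, 60.0, 90.0, 60.0, 30.0]

hydro = simulate("hydropower", inflows, demands, capacity=200.0, storage=100.0)
irrig = simulate("irrigation", inflows, demands, capacity=200.0, storage=100.0)
print(hydro)
print(irrig)
```

In this toy case the hydropower release is flat while the irrigation release tracks demand, which mirrors the qualitative behavior described in the conversation.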

All of these are generic schemes. We use one scheme for all the reservoirs globally because there is no data available, even in the United States, publicly available data on how they operate reservoirs. So that's a big problem.

So in some cases, generally, even in the work that I did in 2012 in the Colorado, Columbia, and other basins, we can improve the flow relative to the natural version of the model, because in the natural version there are no reservoirs and there's a big peak. Once we put in even a simple reservoir scheme, the hydrograph becomes more or less flat and there's a big improvement.

But we are not capturing the actual operation. That is where this paper by Sean Turner et al. comes in. In that one, they provided data sets for the US reservoirs. These are not actual operation data sets, but they mimic the reservoir operation. And that data was really valuable.

And in the paper done by my student, he started with the generic scheme based on Hanasaki, my prior work published in 2012, and others as well, but we were not able to capture the seasonal dynamics of water release in the Colorado. So that was another challenge, like what I mentioned earlier, where we started with natural flow and had to go to machine learning.

Here, we started with the common approach of doing large-scale simulations for reservoir operation, but we couldn't capture the releases, and that is expected because these are generic schemes. There's nothing in them about how water is released from Lake Mead. So my student got the data from Sean Turner (2020). And then he improved the generic scheme in a way that now the release is based on this new data that mimics the actual operation of reservoirs for all the US reservoirs. And we were doing only the Colorado in this case.

And in doing so, we also looked into how much water is diverted upstream, which could be out-of-basin diversion or diversion for irrigation. We collected all the data from the Bureau of Reclamation, USGS, and other agencies, combined that with the data-based scheme, and removed all the water that is already taken away from the system.

And that led to significant improvements in the release from the major reservoirs. That is the kind of summary of the story, and I would be happy to further elaborate if you have other points to bring.

[00:21:55] Bridget Scanlon: Well, that's great! Bringing in these human interventions like reservoir management at the global scale, and then going much more detailed with the data that Sean Turner at Oak Ridge provides to improve it. Even if we don't have the data from the operators, it seems like he can look at the releases and then back-calculate how they're operating, I guess.

Yeah. Right. So, it seems that you are involved in global modeling, you've been involved in global modeling all your career, and one of the things, you know, and we rely heavily on these models, a lot of the agencies, UN and other agencies are relying on the output for these models to get an idea of the global water situation.

And that makes model intercomparisons very important. And there are a number of model intercomparison programs. And I know early on, when we were looking at GRACE satellite data, Himanshu Save, who is a GRACE scientist, said, "I'm tired of people telling me how inaccurate the GRACE satellite data are for water storage."

And he said, "I would like to see how well the global models do." And we did a comparison and we were surprised to find a lot of variability among the models, and between the models and GRACE. That study seemed to suggest that the models weren't capturing the extremes, and maybe they didn't have enough storage capacity to accommodate very high and very low flows, or storage.

But you've been involved in model intercomparison for many years now, so I think it's ISIMIP, the Inter-Sectoral Impact Model Intercomparison Project. Maybe you can describe that project and then maybe what other model intercomparison projects are out there, and how that has evolved.

[00:23:39] Yadu Pokhrel: Sure, Bridget. Yeah, that's a really great point. And this is something that I have been working on for a long time, as you mentioned. So, the ISIMIP project, funded by the European Union and with participation from global hydrological modelers around the world, has been there for more than a decade now. I think it started in 2012 or so. In 2011, there was another project within the European Union called EU WATCH, where they brought together a few models and did some comparisons. It's not really that it led to ISIMIP, but ISIMIP followed the EU WATCH initiative at that time.

So ISIMIP started around maybe 2012 plus or minus one year. And in this one, the idea was to bring global hydrological modelers around the world together, to do a comparison of how these models reproduce, initially, let's say just stream flow, in the very beginning it was only stream flow and it started with hydrology.

But also, as the name implies, it is intersectoral. So it's not only one sector. Water is one sector, and there are 18 or so sectors right now: the water sector, global and regional, the fire sector, lake sector, biodiversity sector, biome sector, agriculture sector, and so on. So there are 18 or so different sectors, but in the beginning it was more on climate and hydrology.

That's why you see climate showing up on the website and in early publications. So that is where this started. And the model that I advanced during my PhD is called MATSIRO.

So that model was part of it from the very beginning, and that's how I got into ISIMIP. And we started working on that and there were teams from different countries and Japan was actively engaged. And because I was working with the Japanese model, we started engagement and until now, the model is there, the ISIMIP community is expanding.

And from last year I have started contributing as one of the two coordinators of the global water sector with Simon Gosling in the UK. So that is the overview. And what we are doing or how they started is initially, we just took observed data sets, observation based global climate data sets, and ran a number of hydrological models.

Initially, it was a few models. That was around 2013, 2014, and it was called the ISIMIP Fast Track: just running some hydrological models and doing intercomparisons. Those intercomparisons provided a lot of insight into how these models do in terms of simulating stream flow, where the gaps are, and whether some of the models are much better than others.

And initially there was a tendency of a little bit of ranking the models, but then we started not doing that, because the community realized there is no absolutely best model. Because one model may reproduce stream flow in one region with very high efficiency, but then it may struggle to do in another region.
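
To make the "no single best model" point concrete, here is a small Python sketch that scores two models in two regions. It assumes that "efficiency" refers to something like the Nash-Sutcliffe efficiency, which the conversation does not actually name; the model names, region names, and all flow values are invented for illustration.

```python
def nse(sim, obs):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = sum(obs) / len(obs)
    num = sum((s - o) ** 2 for s, o in zip(sim, obs))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


# Hypothetical observed and simulated stream flow for two regions.
observed = {
    "region_A": [10.0, 30.0, 80.0, 40.0, 15.0],
    "region_B": [5.0, 6.0, 20.0, 9.0, 5.0],
}
simulated = {
    "model_x": {
        "region_A": [12.0, 28.0, 75.0, 42.0, 14.0],
        "region_B": [10.0, 10.0, 10.0, 10.0, 10.0],
    },
    "model_y": {
        "region_A": [30.0, 30.0, 40.0, 40.0, 30.0],
        "region_B": [5.0, 7.0, 19.0, 9.0, 6.0],
    },
}

# Score every model in every region, then pick the best model per region.
scores = {
    region: {m: nse(simulated[m][region], obs) for m in simulated}
    for region, obs in observed.items()
}
best = {region: max(s, key=s.get) for region, s in scores.items()}
print(scores)
print(best)
```

Because each model wins in a different region, a single global ranking would hide exactly the regional behavior that the intercomparison is meant to expose.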

So we started just looking at models without ranking and looking into different regions with less human impact or more human impact, and so on. And then it also expanded, the initiative expanded from hydrology climate to other areas. As I mentioned earlier, there are a number of other sectors, and right now they're talking to each other.

And one recent initiative in terms of these intersections is a paper that has been submitted on biodiversity, looking at hydrologic extremes and their impacts on biodiversity, different species, and so on. So, there are a lot of other things.

And lake sector is tightly connected to water, they're looking into global lakes under climate change, but these are related to or linked to hydrological models as well. So overall, the initiative is bringing together different models, different communities. And then we started by comparing the models to understand how these models do in terms of reproducing stream flow.

And then the next step was looking into future predictions. Of course, future predictions, we can't say whose model is good, because it's about the future. But then, by bringing different models, we can talk about the confidence or what is the spread among different models. And I'll stop by just, mentioning one paper that I did in 2021 on global terrestrial water storage and drought.

We brought together eight models from ISIMIP, with 85 different realizations, that is, a combination of eight models, three GCMs, and a couple of different scenarios. And we looked into global-scale terrestrial water storage change over the course of the 21st century, finding that all 85 realizations consistently say that in two thirds of the world, terrestrial water storage will decline by mid-century. And that would lead to extreme-to-exceptional droughts increasing significantly in places where terrestrial water storage is going to decline.
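
The ensemble logic described here (build realizations from combinations of hydrological models, GCMs, and scenarios, then ask where all of them agree on the sign of change) can be sketched in Python as below. Everything in it is hypothetical: the model, GCM, and scenario labels, the additive offsets, and the six grid cells are invented, and the sketch does not reproduce the 2021 study or its 85 realizations.

```python
from itertools import product

# Hypothetical per-cell baseline change in terrestrial water storage (arbitrary units).
base_change = [-6.0, -3.0, 0.5, 2.0, -4.0, -1.5]

# Invented additive offsets standing in for differences between ensemble members.
model_offset = {"hydro_model_1": 0.0, "hydro_model_2": 0.5}
gcm_offset = {"gcm_1": -0.5, "gcm_2": 0.0, "gcm_3": 0.5}
scenario_offset = {"low_emissions": 0.0, "high_emissions": -1.0}

# One realization per (hydrological model, GCM, scenario) combination.
ensemble = {
    (m, g, s): [b + model_offset[m] + gcm_offset[g] + scenario_offset[s]
                for b in base_change]
    for m, g, s in product(model_offset, gcm_offset, scenario_offset)
}

# A cell counts as a robust decline only if every realization shows a decrease.
n_cells = len(base_change)
robust_decline = [
    all(member[c] < 0.0 for member in ensemble.values()) for c in range(n_cells)
]
fraction_declining = sum(robust_decline) / n_cells
print(len(ensemble), robust_decline, fraction_declining)
```

In a real analysis each cell would also be weighted by its land area, and weaker agreement thresholds than unanimity are often reported alongside.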

That's one thing, and I'll stop here and I can elaborate, there is another paper, that you have been co-author on as well, accepted by Nature Water. So I can elaborate, but I'll stop here for now.

[00:29:02] Bridget Scanlon: Right. And so you mentioned that initially it was mostly streamflow, looking at observation data and driven by actual climate observations. And then you mentioned now looking at storage, driven by global climate models, right? GCMs. And so, when you go to the ISIMIP website, it says, "ISIMIP offers a framework for consistently projecting the impacts of climate change across affected sectors and spatial scales." So maybe you guys might want to update that.

[00:29:34] Yadu Pokhrel: (Laughs) Right. Yeah. It's, yeah, beyond that.

[00:29:37] Bridget Scanlon: A legacy, I think, yes.

[00:29:38] Yadu Pokhrel: That's the legacy. Yeah.

[00:29:39] Bridget Scanlon: Right, but let's first talk about how it has been used to look at climate change. So the other is a climate model intercomparison project, which is CMIP, CMIP6, CMIP5. And now when I went to that website, it's called the Coupled Model Intercomparison Project, I think, but maybe that was just one website.

So maybe you can describe how you do these future scenarios, what scenarios you pick, what different emission scenarios, and just a little bit of background for the listeners on that aspect.

[00:30:10] Yadu Pokhrel: Right. This is a great point, Bridget. Yeah, CMIP6, the climate model intercomparison project, does intercomparison among climate models. They look into future predictions and, again, look at the uncertainty and spread among different models, whether there are hot models consistently predicting higher temperatures, and all of that.

And those models also have hydrological models. And some of those hydrological models participate in ISIMIP in offline mode. For example, the Community Land Model is used in CESM, the Community Earth System Model. So it may appear there in the comparison, but we only use the land version of those models within ISIMIP.

So the climate intercomparison, it's mostly on climate, but not specifically on land hydrology. And now we may be using similar models, and there are a couple other things also I want to note, within ISIMIP we have three types of models.

Number one is land surface models. Those are the land component of climate models like CLM, MATSIRO and others.

And then number two is global hydrological models, like WaterGAP, H08, and PCR-GLOBWB. These models are not a part of climate models. They're developed independently of climate models and just take climate information as input. These we call global hydrological models.

And the third type of model is like LPJmL (Lund-Potsdam-Jena managed Land); these are vegetation dynamics models. They also do land hydrology in a detailed fashion.

So we have three types of models. And why I wanted to connect this is that some of these models are also in the climate models, but those are pretty coarse. ISIMIP does simulations at relatively fine resolution, half a degree, about 50 kilometers, whereas climate models may run at 200 kilometers, for example, right?

So we go to finer resolution, number one. And then we provide consistent input to all the models within the MIP. So the MIP provides the input data to all modelers, and they take the same input data, same parameters, and so on. So that is the consistency: we put in the same input, so we expect similar outputs, right?

And the third one is we also try to make sure models consistently represent human interventions. But some models don't have reservoir operation, and we don't eliminate those models. So there's some discrepancy there, and then we can't compare reservoir release and so on, but the land hydrology is still there.

So that is one good thing: we develop consistent data sets, give them to the modelers, and then we get consistent outputs to the extent possible. And then that becomes a part of the ISIMIP database, and the broader community can take those and do some analysis.

And I think your question was particularly about the climate model intercomparison project. So those are mostly on climate, and here we're talking about land hydrology, looking into streamflow initially. And also during this evolution, initially the MIP only required the modelers to output some fluxes, like ET, runoff, and stream flow.

But now they are required to output terrestrial water storage and all the components, for example, so there are a lot more details there. And all the modelers are producing those, and these data sets are available so that we can go beyond the stream flow. We can go to, also, detailed flood simulations in some cases, and all the terrestrial water storage components. And we can consistently compare all of these variables.

[00:33:32] Bridget Scanlon: That's great. So you've got these land surface models, and they were developed by the climate community. I guess NASA's GLDAS, the Global Land Data Assimilation System, or CLSM would be these types of models. And oftentimes they don't include any human intervention.

So sometimes they don't have irrigation, they don't have reservoir management and stuff like that. And then the global hydrologic models like WaterGap and stuff, they may be more with human intervention and have various water uses. Irrigation, power, electricity, and all of these different sectors.

So it's nice that you are comparing all of these models now and getting a feel for their reliability. In a lot of your research, Yadu, you have focused on adding human intervention into the model, so irrigation and reservoir management. I mean, irrigation is the elephant in the room: 70% of global water withdrawal, 90% of global water consumption based on Siebert’s early studies. So maybe you can describe how you did this with the model that you were using from Japan, how you added those human interventions. I mean, you've talked a little bit about the reservoir operations and irrigation.

How do these models simulate irrigation, and what level of detail?

[00:34:52] Yadu Pokhrel: Right. That's a great point, Bridget. Yes, it's the big elephant in the room. More than 75%, and on some continents more than 85%, of the water withdrawn for human consumption goes to irrigation. So if we are not able to capture that in the model, in many parts of the world we may be grossly missing the fundamental aspects of hydrology.

And in terms of modeling, you mentioned that within the MIP, models like WaterGap are more comprehensive in terms of human intervention, whereas the land surface models like CLM, MATSIRO, and others didn't even include any human interventions as of, let's say, eight years ago or so. Now they have some of the human components. But the human interventions first started getting incorporated into the global hydrological models like WaterGap, H08, and so on.

And in terms of developments, irrigation representation started with the hydrological models, in WaterGap and H08, as I said. And when I did my PhD work around 2009, 10, 11, none of the land surface models had irrigation at that time. So I started putting irrigation into the Japanese land surface model MATSIRO.

And on what we did in that one: there are two different philosophies in doing irrigation. One, like in the WaterGap model, is that you would compute potential evaporation for plants or crops, and then you would compute actual evaporation. If water is limited, then your actual evaporation is lower than the potential rate, and the model calculates the gap.

The idea is that actual evaporation would come close to potential evaporation if there were abundant water from irrigation, right? So that gap is the irrigation demand in some of these hydrological models. But land surface models don't really simulate evapotranspiration based on potential ET; they do it based on the energy balance.
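
A minimal sketch of the gap-based demand described above, assuming daily potential and actual ET are already available in millimeters. The function names are hypothetical and do not come from WaterGap or H08.

```python
from typing import Sequence


def irrigation_demand_pet_gap(pet_mm: float, aet_mm: float) -> float:
    """Irrigation demand (mm) as the shortfall of actual ET below potential ET."""
    # No demand when actual ET already meets or exceeds the potential rate.
    return max(pet_mm - aet_mm, 0.0)


def seasonal_demand(pet_series: Sequence[float], aet_series: Sequence[float]) -> float:
    """Sum the daily gaps over a period to get the total demand (mm)."""
    return sum(
        irrigation_demand_pet_gap(pet, aet)
        for pet, aet in zip(pet_series, aet_series)
    )


if __name__ == "__main__":
    pet = [5.0, 6.0, 4.0]  # illustrative daily potential ET, mm
    aet = [5.0, 4.5, 3.0]  # illustrative daily actual ET without irrigation, mm
    print(seasonal_demand(pet, aet))
```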

So in many land surface models there is no potential ET; it is not simulated at all. So what we did in land surface models is, instead of taking the difference between potential ET and actual ET, because the potential ET doesn't exist, we used the soil moisture deficit approach.

There's the actual soil moisture simulated by the model in dry conditions, in the absence of irrigation, and there's a threshold. If we had water, if we were able to irrigate, then we would bring the soil moisture close to field capacity, let's say. That gap becomes the irrigation water requirement.
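
The soil moisture deficit approach can be sketched as follows. This is an illustration under stated assumptions, not the MATSIRO or CLM code: soil moisture is volumetric (m3/m3), the target is a tunable fraction of field capacity, and the root-zone depth converts the deficit to a water depth. All names are hypothetical.

```python
def irrigation_demand_soil_deficit(
    theta: float,
    theta_field_capacity: float,
    root_zone_depth_mm: float,
    target_fraction: float = 1.0,
) -> float:
    """Irrigation requirement (mm) needed to raise soil moisture to a target."""
    # Target soil moisture, expressed as a fraction of field capacity.
    theta_target = target_fraction * theta_field_capacity
    # Only irrigate when the simulated soil moisture is below the target.
    deficit = max(theta_target - theta, 0.0)
    return deficit * root_zone_depth_mm


if __name__ == "__main__":
    # Same soil state, two different targets: the demand changes a lot.
    for fraction in (1.0, 0.8):
        demand = irrigation_demand_soil_deficit(0.20, 0.30, 1000.0, fraction)
        print(f"target = {fraction:.1f} x field capacity -> {demand:.0f} mm")
```

The choice of `target_fraction` drives the result, so the requirement is only as good as the threshold used.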

But in reality, irrigation is not that simple. Humans, farmers, use a lot of judgment: weather information, sensors, water availability and water savings, all of that. It's way more complicated. And in the past few years, there have been some discussions in the literature, sometimes even back and forth, about how much of those human activities are captured in the models. But I would say in the beginning, when we started, our goal was not to really capture everything that farmers do.

Again, in terms of the big picture, in heavily irrigated areas like the High Plains Aquifer or California's Central Valley, can we reproduce annual irrigation amounts? If the volume of water used for irrigation is more or less accurately captured, then when these land surface models are coupled with climate models we can link them, do climate simulations, and so on, right?

And also use these models to estimate future irrigation water requirements. So that is where this started, and today, I think in some models, they are putting in more details in terms of the timing of irrigation, using remote sensing to understand when the crop growth is happening, and limiting irrigation to that period based on leaf area index and all of that.

And if some models do very detailed simulations at smaller scales, they could even try to capture how farmers do irrigation. But in general, some of the large-scale models do irrigation based on PET minus actual ET, and models like the Community Land Model do it based on the soil moisture deficit.

And there are a lot of details in that deficit. The actual soil moisture is there, but there's a target. How do we set the target? Depending on this target, the gap changes, and we can have even twice the amount of irrigation water used. So in one of the studies we used SMAP, the Soil Moisture Active Passive satellite data on soil moisture, because SMAP nicely captures soil moisture in irrigated areas.

So we assimilated SMAP data into the Community Land Model and constrained the irrigation water withdrawal by setting the threshold based on SMAP. And that SMAP now captures what farmers might be doing, because depending on how farmers irrigate, soil moisture varies, right? So that allowed us to find that threshold.

And now we can use that threshold in other areas or for future prediction. So that is where things stand. And in 2023 there was a paper in Nature Reviews Earth and Environment led by Sonali McDermid. That came out of a big irrigation initiative globally. And we synthesized everything in terms of where irrigation modeling started, where things stand today, where the challenges are, and so on. So that's a really good paper, McDermid et al., 2023.

And that led to another initiative, and this is my last point about irrigation, which is called IrrMIP, the Irrigation Model Intercomparison Project. This project is about integrating the land surface models like CLM that have irrigation capability into their parent models, and then doing multiple simulations with different climate models where irrigation is incorporated, and comparing the impact of irrigation on heat extremes, in terms of change in air temperature and so on. And just two or three months ago, the first paper came out.

And we are still working on other parts of the IrrMIP project.

[00:41:01] Bridget Scanlon: Right, right. Well, that's great that there's a lot of emphasis. Then, are these models incorporating the exact crops that are growing, or are they just approximating what the crops are? Because I guess that would be a pretty important factor also.

[00:41:18] Yadu Pokhrel: Right. Yeah, that's another really important point. Many of these models when we started implementing irrigation, we didn't really think about crop. The model has a prescribed crop. And there's a seasonality of LAI, and we don't even know what that crop is. Among 13, 14 vegetation types, it's crop, and we just irrigate where there's crop.

So it was pretty crude. But now in CLM, this is integrated with crops. We have a crop model that simulates crop phenology: when the crop is planted, all of the growing stages, harvesting and all of that. And we only irrigate during the growing period. Also, in one of the studies we even went beyond the simple crop model that CLM has by bringing in a model called ABIM (Agent Based Irrigation Model), which is an agronomy-based crop model that simulates all the detail from seeding and germination onward. It also considers heat stress, water stress, nutrient stress and all of that. And depending on how crops are growing, the irrigation model now simulates irrigation water requirements. So this is a nice integration of crop and irrigation. And in fact, that is what we are using in our Mississippi project.

And we have been able to nicely capture irrigation requirements and, at the same time, crop yield, because yield is related to irrigation. So now we can use these models not only for water resources purposes, but also to look at crop yield, and we can link that to food security and other stuff.

So a lot of advancements have been made in terms of irrigation modeling. The only big gap, as has been discussed in the literature, is some of the details about how farmers do irrigation. It's very hard, like reservoir operation. How do they actually operate? We don't have irrigation information at the plot scale or farm scale.

Different farmers do different things and it's very difficult to bring that into the model. Otherwise, I would say a lot of progress has been made in capturing or modeling irrigation.

[00:43:19] Bridget Scanlon: Right, right. And I know Petra, with the WaterGap model, incorporated deficit irrigation in the High Plains, because a lot of times they don't irrigate the full amount between potential and actual ET. They do deficit irrigation, so they have drip systems and stuff like that. So they may only be irrigating about 70% of that or different things.

So yeah, I mean, we are continually evolving. And with the MIP and these models, in addition to using them for looking at climate impacts, I mean, we need models to test different things. We can test what the climate impact alone is by not including human intervention, or we can test whether humans amplify or dampen climate impacts.

And we do a lot of these tests with models, they're a tool. And even if the model mightn't be totally accurate or whatever, because we don't know, but they can give us the relative importance of different drivers and stuff.

So one of the things that models are oftentimes used for is water scarcity assessments, global water scarcity assessments. And so when you are using these models for that, I mean, it depends what you simulate for the water demand and what you simulate for the supplies. And then the difference becomes the deficit or the scarcity.

So, as you mentioned, we know irrigation is probably the dominant water demand, and then people also incorporate environmental flows. They estimate those, and they assume a pretty high percentage of the flow should be reserved for environmental flows.

And then on the supply side, it seems like these models, I get the impression they first use surface water, and then if surface water is not available, then they will use groundwater. And some of these models don't have groundwater. So maybe you can describe a little bit on this supply demand and how that might impact scarcity assessments.

[00:45:20] Yadu Pokhrel: Right, that's right, Bridget. Many of the models, at least when we started implementing human interventions, the idea was let's estimate the water demand within a grid cell, and then the model will look for water. If there's any surface water available, it would go and withdraw.

And in the beginning we didn't even consider environmental flows and it would entirely withdraw, which is not the case in reality. And later on some thresholds were prescribed, let's say 10%, 20%, up to 40% of environmental flows with some information that we have. And then if we deplete to that extent, then the model would go and pump groundwater.
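
The surface-water-first logic described above can be sketched for a single grid cell and time step. This is a simplified illustration with hypothetical names; it assumes a fixed fraction of surface water is reserved for environmental flows and that groundwater can cover whatever demand remains.

```python
def allocate_surface_first(
    demand: float,
    surface_available: float,
    env_flow_fraction: float = 0.3,
) -> tuple[float, float]:
    """Return (surface withdrawal, groundwater withdrawal) for one grid cell."""
    # Water that may be taken after reserving the environmental flow share.
    usable_surface = max(surface_available, 0.0) * (1.0 - env_flow_fraction)
    from_surface = min(demand, usable_surface)
    # Assumption: groundwater is not limited, so it meets the remaining demand.
    from_groundwater = demand - from_surface
    return from_surface, from_groundwater


if __name__ == "__main__":
    surface, ground = allocate_surface_first(demand=100.0, surface_available=100.0)
    print(f"surface: {surface:.1f}, groundwater: {ground:.1f}")
```

Setting `env_flow_fraction` to zero reproduces the earliest behavior described, where the model would withdraw all available surface water.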

That was the general philosophy when we started, but recently, at least for the United States, we have data from USGS about how much groundwater, how much of the water withdrawal comes from surface versus groundwater. And we have some information about environmental requirements in the US, so we're utilizing those datasets.

We tell the model don't just go and try to withdraw everything from surface water, but rather use USGS guidelines. And if USGS says, 60% comes from surface water, the model would try to go and withdraw 60% from surface water, but if there is no surface water, then it would go to groundwater anyway.
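
A sketch of the fraction-based variant, assuming the surface water share comes from an external dataset such as the USGS figures mentioned above. The function and parameter names are hypothetical, and groundwater is again assumed to be unlimited.

```python
def allocate_with_source_fraction(
    demand: float,
    surface_available: float,
    surface_fraction: float,
) -> tuple[float, float]:
    """Split demand by a prescribed surface share, falling back to groundwater."""
    # Try to take the prescribed share of demand from surface water.
    surface_target = demand * surface_fraction
    from_surface = min(surface_target, max(surface_available, 0.0))
    # Any surface shortfall is shifted to groundwater (assumed unlimited).
    from_groundwater = demand - from_surface
    return from_surface, from_groundwater


if __name__ == "__main__":
    # 60% of demand is meant to come from surface water.
    print(allocate_with_source_fraction(100.0, 200.0, 0.6))  # ample surface water
    print(allocate_with_source_fraction(100.0, 40.0, 0.6))   # surface water runs short
```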

But that is one improvement that we have made, and there are challenges in other parts of the world. That's one thing. And yeah, that is one part in terms of sort of supply versus demand. We first calculate the demand and then we go for supply. And in the end, depending on how much water is available, we develop these indices, the water scarcity index and so on, right?

So those are all there, and there are different kinds of indices that have been developed. And these are particularly useful in regions where there is water scarcity already happening. And the question is, how would that evolve into the future? And we simulate future demand, for example, irrigation, increasing irrigation demand under climate change.

And also changing supplies, and how would the deficit change in the future? That is one important aspect, and I think there are many different indices, but in the end, the general philosophy is looking into how the supply-demand balance would change in the future.
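
No specific index is named in the discussion above. As one illustration, the sketch below uses a simple demand-to-supply ratio and compares a present and a future case. The function name and all numbers are invented for the example.

```python
def scarcity_index(demand: float, supply: float) -> float:
    """Demand-to-supply ratio; higher values mean more water stress."""
    if supply <= 0.0:
        # With no supply, any positive demand is unbounded stress.
        return float("inf") if demand > 0.0 else 0.0
    return demand / supply


if __name__ == "__main__":
    # Hypothetical volumes in km3 per year for one basin.
    present = scarcity_index(demand=40.0, supply=100.0)
    future = scarcity_index(demand=55.0, supply=90.0)
    print(f"present: {present:.2f}, future: {future:.2f}")
```

In this toy case the index rises because demand increases while supply falls, which is the kind of change in the supply-demand balance the indices are meant to track.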

Yeah, I think, yeah, I don't know if I answered your question. I can add a few things, but if you have any other clarification-

[00:47:43] Bridget Scanlon: I mean, I guess, different models, how they compare when you develop these scarcity estimates. I think maybe it's important to consider a number of different models to incorporate uncertainty in those. And if you assume you're just relying on renewable water supplies, or you are pumping, like the High Plains when we pump non-renewable groundwater, those sorts of things.

So, a lot of factors.

[00:48:07] Yadu Pokhrel: Right. Yeah, that's a good point, yes. That brings up a very important point, actually. Some models do renewable versus non-renewable. Other models have simplified groundwater; they just withdraw. And some models even have an imaginary source from which they withdraw water without limit, because the model doesn't have an explicit groundwater scheme. So that's a very important point.

And that is where there is a big challenge, I think. There's a lot of other stuff that we are missing. For example, in heavily depleted systems, the economic aspects: can we continue to pump groundwater? How deep is it, like in the Ogallala Aquifer or in Northwest India?

So most of the models today, they simply assume there's unlimited water and farmers can easily withdraw water from groundwater sources. But some of the models separate that into renewable and non-renewable, so that we can talk about renewable first, and then if we are taking non-renewable groundwater, then we can talk about, what would happen in the future in terms of non-renewable groundwater use that leads to depletion and other things.
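
One simple way to express the renewable versus non-renewable split is sketched below. It assumes that withdrawal up to the recharge over the same period counts as renewable and that anything beyond it is non-renewable, contributing to depletion. This is an illustrative assumption with hypothetical names, not the scheme of any particular model.

```python
def split_groundwater(withdrawal: float, recharge: float) -> tuple[float, float]:
    """Return (renewable, non-renewable) parts of a groundwater withdrawal."""
    # Withdrawal covered by recharge is treated as renewable.
    renewable = min(withdrawal, max(recharge, 0.0))
    # The remainder draws down storage and is treated as non-renewable.
    nonrenewable = withdrawal - renewable
    return renewable, nonrenewable


if __name__ == "__main__":
    renewable, nonrenewable = split_groundwater(withdrawal=50.0, recharge=30.0)
    print(f"renewable: {renewable:.1f}, non-renewable: {nonrenewable:.1f}")
```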

But I don't think there's been much advancement in that area, because it's hard to tell when we are going to deplete a particular system, and also these models don't have the economic part integrated, so we don't know what the economic constraints would be. I remember Yoshi Wada and others, there were discussions about integrating economic models and then doing some realistic projections that would give us a realistic scarcity index, really. Because in some regions, right now the models assume there's unlimited water. But if there are big constraints in terms of getting water because the water table is too deep, then the scarcity index might be pretty different, right?

I think that is where we need to work a little bit more.

[00:49:53] Bridget Scanlon: Right, right. I don't think the farmers are going to be extracting water from the Moho because of the energy, the costs of extracting it and the food prices wouldn't accommodate it.

[00:50:04] Yadu Pokhrel: Yeah, and that is an important point, in India what is happening is they go and start pumping in one area, northwest India, and after some point they use a lot of diesel. These are diesel fired pumps. And, they start feeling, oh, we are using too much diesel and getting tiny bit of water, and they go and drill another one.

So there are millions of abandoned wells in South Asia because it's not feasible, or it's not worth spending even subsidized diesel from the government. Even that is not sufficient to get the amount of water they need, and they're going to nearby regions and so on.

And, as there has been a lot of discussion, I don't know how long this can continue, but farmers need to think about how much water they get by spending so much, right?

[00:50:47] Bridget Scanlon: Right. Right. Well thank you so much, Yadu. Our guest today is Yadu Pokhrel, he's a professor at Department of Civil and Environmental Engineering at Michigan State. I really appreciate your overview on global modeling, and linkages to climate, and also your efforts to incorporate human intervention, which is so important.

So good luck with your research, and thank you for joining us today.

