Updated 12 October, 2003
Climate Science: Development of
In order to develop a strategic approach to build a national capability of high-end modeling, climate science will be broken into three large elements: Modeling, Data, and Computational Systems. It is safe to say that all of these elements currently command substantial monetary resources and are in varying states of health. In terms of total cost, the Data Element is the largest of the three. This reflects the cost of the development, deployment, and operations of the instrumentation needed to collect the observations, as well as the cost of the data and information systems that provide data services. The charge for this document focuses on modeling and its related computing, and the observational system issues are limited to the model-data interface. Again, a comprehensive national climate service requires thorough integration of the modeling and data activities; therefore, the charge of this document may have an artificial boundary that ultimately needs to be breached.
Relative guidelines can be given for the budget of the Modeling and Computational Systems Elements. A single institution with the charge to deliver high-end modeling products to a spectrum of customers should expect its expenditure in the Computational Systems Element (see Section 4.3) to be equal to or greater than its investment in scientific personnel to support the Modeling Element. Depending on how the single institution organizes itself, the ratio of cost is on the order of 1:1 to 2:1 (Computational Systems/Modeling). There are tremendous uncertainties in the estimation of budgets for computational systems because of the policies that govern high performance computing in the U.S.
Further, there must be recognition of the need to build interfaces of the high-end modeling institute with external communities. These external communities include: the customer community, the users of the products of the high-end modeling institute; the diverse community of discovery-driven researchers, whose activities must contribute to and benefit from the high-end modeling enterprise; and other modeling centers.
4.1) Modeling
The Modeling Element refers to those activities involved in building, applying, and validating geophysical models. Historically, research activities in several agencies focused on models that represent the atmosphere, the oceans, the land-surface, and the cryosphere. Within these disciplines, models specific to particular problems were developed, for instance, numerical weather prediction models, stratospheric chemistry models, and long-term climate assessment models. Currently, there is a movement to view these discipline models as components of combined models that represent all of the important processes of the Earth system. Therefore, leading-edge activities today are focused on coupling atmospheric, oceanic, land-surface, cryospheric, and even thermospheric models. In addition to these geophysically based discipline divisions, there are application-based divisions in which computationally demanding comprehensive models have been built. For example, for the atmosphere there are global-scale, regional-scale, mesoscale, and cloud-resolving models. The primary reason these divisions were made originally was to develop tractable problems that allowed quantitative paths to be developed. This incremental approach allowed scientists to organize the complexity of environmental phenomena, focus on specific important processes, and adapt problems to available computational devices.
The broad and substantial U.S. investment in this quilt of modeling activities developed both scientific expertise and model algorithms that were and are pioneering. In fact, as other countries have developed modeling capabilities, they have been able to directly benefit from these pioneering efforts. In effect, the U.S. funded much of the initial discovery-driven research that allowed, globally, the more efficient development of comprehensive capabilities by other countries. The U.S. is now faced with the need to develop a similar capability, requiring a new sort of infrastructure to facilitate this integrated activity.
We believe that models built and maintained by U.S. scientists are in some cases competitive with those run anywhere. However, the U.S. capability is fragile, and the evolution of U.S. capabilities is lagging substantially. Increasingly, U.S. competitiveness in environmental modeling rests on the success of past investments and the well-respected role of U.S. scientists in the international community. A major part of this fragility is directly related to computational resources, where the number of numerical experiments run with a particular model is much smaller than that being run with similar models overseas. This reduces the validation and analysis of model results, compromising the scientific process on which the existing U.S. strengths lie.
While the historical approach to funding modeling activities did provide a wealth of scientific results and algorithms, simply increasing the funding within this approach is not a viable path forward. There is too much fragmentation in the current research activities, and the discipline- and agency-linked approach to modeling supports, and often increases, this fragmentation. The focus of effort must be raised out of the disciplines and turned towards the products that are expected from the models. It is the definition of the products and the management of the resources that allow decisions to be made about which development paths to follow.
Only with a systematic product-dependent approach can questions about the merits of investment in more resolution or more sophisticated physical parameterizations be weighed. Scientifically, increased resolution and improved physics both demand attention. However, the two are intimately related, with, for instance, increased resolution moving the development of physical parameterizations closer to first principles. Higher resolution with inadequate model physics will not lead to more credible results. Better physics in low-resolution models will still leave important questions unanswered, such as regional climate impacts. Only a comprehensive, systematic approach will suffice. Further, model development should also be linked to the Data Element, exploiting information in existing observations and directing the observational strategies to the most important new observations. Therefore a balanced approach is needed, and it is the development and nurturing of the expected products that provides the primary mechanism for guiding the balance. It is the job of the responsible managers to determine the balance while maintaining the integrity of the underlying science. As in any business, the responsible manager must consider the capabilities of other organizations, as their capabilities contribute to the definition of what the community feels is both credible and state-of-the-art.
4.2) Data
Climate observing systems have been addressed in a separate report, Adequacy of Climate Observing Systems (National Academy Press, 1999). That report follows from earlier findings that the Earth's climate observing system is "inadequate" and "deteriorating," and endorses a set of principles that need to be addressed in order to assure an adequate observing system. Parallel to the findings of the report, Capacity of U.S. Climate Modeling to Support Climate Change Assessment Activities (National Academy Press, 1998), there is the finding that the U.S. agencies do not have an integrated, systematic approach to provide the necessary climate observations. Therefore, many of the management issues that are discussed below are relevant to the Data Element of climate science. Indeed, the Modeling and Data Elements must be addressed in a systematic and integrated way.
The inadequacies of the data system come in the face of a number of apparent contradictions. In terms of quantity of data and parameters measured, there is more data than ever before. In addition, as more and more space-based instruments are launched the amount of data stands to overwhelm the capabilities of data systems, modeling systems, and the community to utilize the observations in a timely manner. As in the case of models (discussed in Section 4.1), this contradiction is embedded in an imbalance between discovery-driven and product-driven research. However, the details of the Data Element are different.
The largest amount of money under the USGCRP umbrella is targeted towards discovery-based research measurements from space. Many of these observations target identification or quantification of specific physical and chemical processes. The data used to establish baselines of key parameters in climate studies come, however, from the operational data collected primarily in support of weather forecasting. Again, the greatest bulk of the operational data come from satellites. In total, it is safe to say that, worldwide, these data are underutilized, and there have been a number of workshops focused specifically on this problem (e.g. Errico, 1999). These operational data have not been collected with accuracy and stability requirements suitable for climate research.
While the bulk of the operational data are from space-based platforms, the most critical climate data sets are still from traditional land- and balloon-based observational systems. Many of these observing systems are simple technology. The collection of high-quality observations from these systems requires a routine, careful process, with significant human oversight over both process and data evaluation. There is ever-present pressure to replace these observing systems with automated systems, or to force reliance on satellite instruments. Without careful planning, such changes threaten the fidelity of the climate observing system. In addition, given that oceanic and land-surface parameters are key climatic variables, there is the need to develop suitably based observing systems to provide the necessary foundation. Some focused capabilities have already been implemented, such as the Network for Detection of Stratospheric Change. Concepts for others have been developed, but require funding to build, deploy, and maintain them.
Ultimately, a critical subset of climate observations falls into the cracks between the requirements for operational weather prediction and the often high-technology observing systems designed for exploratory research. In the end, no Agency, no entity, has the responsibility, with aligned resources, to collect and provide stewardship of climate observations. The well-intentioned efforts of the Agency research programs to address the needs for climate observations fall short.
4.3) Computational Systems
The computational requirements for climate-science modeling have traditionally been one of a handful of problems that easily and necessarily consume the capabilities of the most capable of computers. Indeed, progress in Earth science is dependent on computational resources. Computers are used to provide simulations, which are used not only for prediction but also for experimentation to isolate and understand underlying physical and chemical processes. Computers lie at the core of data assimilation, a process where observations are melded with model simulations to provide accurate estimation of the state of the atmosphere, oceans, and land-surface. Ultimately, computers provide the tool that allows the integration of the complexity of physical, spatial, and temporal scales into comprehensive models that represent our best expression of the behavior of the climate system. While we often think of "the computer" as the key component of computational systems, software is what is actually built by the scientific community. The software connects the intellectual endeavors of the scientists with the enabling technology of the computer.
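The melding of observations with model simulations described above can be illustrated with a deliberately minimal sketch. It assumes a single scalar quantity, a model background value and an observation that each carry a known error variance, and a variance-weighted (optimal-interpolation style) update. The function name assimilate_scalar and all numerical values are hypothetical and chosen only for illustration; operational assimilation systems work with very large state vectors and are far more elaborate than this.

```python
def assimilate_scalar(background, background_var, observation, observation_var):
    """Blend one model background value with one observation.

    Illustrative assumption: both errors are unbiased and uncorrelated, so
    the analysis is the variance-weighted combination of the two values.
    Returns the analysis value and its error variance.
    """
    if background_var < 0 or observation_var < 0:
        raise ValueError("error variances must be non-negative")
    if background_var + observation_var == 0:
        raise ValueError("at least one error variance must be positive")

    # Weight given to the observation: larger when the model is less certain.
    weight = background_var / (background_var + observation_var)

    # Move the background toward the observation by that weight.
    analysis = background + weight * (observation - background)

    # The analysis is more certain than the background alone.
    analysis_var = (1.0 - weight) * background_var
    return analysis, analysis_var


# Hypothetical example: a model surface temperature of 288.0 K with error
# variance 4.0 K^2, and an observation of 290.0 K with error variance 1.0 K^2.
analysis, analysis_var = assimilate_scalar(288.0, 4.0, 290.0, 1.0)
print(f"analysis = {analysis:.2f} K, variance = {analysis_var:.2f} K^2")
```

In this sketch the more accurate observation pulls the estimate most of the way from the model value, and the resulting uncertainty is smaller than that of either input. The same weighting idea, applied to millions of variables at every analysis time, is one reason assimilation is computationally demanding.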
The software that comprises a comprehensive model of the atmosphere-ocean-land-surface system consists of many hundreds of thousands of lines of code. The analysis software in related data assimilation systems is of the same size. Many hundreds of functions and routines are involved. The routines change with scientific advancement, and there is a continual demand to improve the accuracy of, or eliminate, the many approximations that reside in all modeling and assimilation systems. There are no standard or unique ways to express even the relatively stable, non-controversial components of these models; therefore, the human diversity of the discovery-driven research community is reflected in the software systems that have been developed over the past twenty years or more.
From the 1980s through the mid-1990s, the interface with the computational platform was relatively stable. However, with the move in the U.S. away from shared-memory vector computers to distributed-memory commodity-based processors, the computational environment has become volatile. The components that comprise the high-performance hardware available to U.S. scientists are now dependent on the rapidly changing commercial market. How these components are collected together to provide high-performance computing platforms is often a research activity unto itself. Since the market for high-end computing is both small and specialized, developing platforms to support high-end applications is not a top corporate priority. The net result of these changes is that much attention needs to be placed on the interface of the application software with the computing environment in order to provide successful strategies for code runnability and performance.
From a computational perspective, software is of the highest priority. A hardware-centric approach is no longer useful and the notion that "one computer fits all" is naïve. Without a large and successful investment in software, purchasing U.S.-available, distributed-memory parallel machines will do little to advance the overall state of climate-science in the U.S. Since the software challenges are less daunting for the tightly integrated vector supercomputers provided by Japanese vendors (i.e. NEC, Fujitsu), their purchase would benefit the U.S. community immediately. In either case, however, an approach focused on sound principles of systems design and systems engineering is required. The planning for systems must extend beyond a focus on just the core models or assimilation algorithms to include adequate interfaces with data, aspects of semi-automated validation (quality assurance), and interfaces with customers. The successful development of a high-end climate-science-modeling capability will require the development of computing systems focused on this problem. The continued pursuit of technology-driven development of hardware capabilities, with the idea that climate-science applications will be able to benefit from these developments, provides, at best, an uncertain progression in the development of the needed climate-science capabilities.
Software and hardware systems must be developed with a definitive focus on the specific applications to be executed. The focus must move to the development of software systems that represent a state-of-the-art expression of the science and a flexible robust interface to an uncertain computational environment. The potential capabilities of supercomputing hardware are only realized with the development of successful supercomputing software. It is safe to conclude that the current dollar investment in software is inadequate, and that the challenges in software development are enormous with many intrinsic risks. The issues of software design and development will be discussed more fully in the next section.
4.4) Integration across Elements, Institutions, and Disciplines
The traditional evolution of the U.S.'s diverse modeling activities was highlighted in Section 4.1. As the individual disciplines move to incorporate algorithms from other disciplines, program managers and funding officials expect the research in these different areas to contribute to research in other areas. However, the contribution of one research area to another is far from optimal, and it is one of the challenges of the climate-science community to make these connectivities more effective.
The need for more effective integration goes beyond programmatic efficacy. The quality of the different sub-disciplines has developed to the point that many scientific challenges now lie in expansion out of the original discipline. This is quite clear in weather prediction and climate simulation where many of the physical processes are essentially the same, and where reconciliation of long-term (i.e. climate) behavior with short-term (i.e. weather) behavior confronts fundamental processes. Similarly, reconciliation of the behavior of parameterizations across spatial scales from cloud-resolving to global obviously improves the robustness of predictive models.
The development of multiple, comprehensive, high-end capabilities, for example a unified chemistry-climate model, from the existing specialized efforts spreads resources thin. For instance, the extension of a stratospheric chemistry model to include tropospheric chemistry and then further to climate simulation requires the inclusion of physical processes that are already central to the activities of the traditional climate community. Similarly, the extension of a tropospheric climate model to study stratospheric processes and to further include ozone photochemistry requires the inclusion of physical and chemical processes that are core to the stratospheric chemistry community. There is a natural tendency for these groups to look at their native model as the core tool and then either adapt algorithms from the other community or invent derivative algorithms that adapt to the particular model. This leads to models that are not of uniform quality across their entire suite of algorithms. Further, it consumes the time and energy of scientists in reworking what are often routine algorithms, and while it advances the completeness of the models, it often does not advance the scientific integrity of the field as a whole. The development of a unified infrastructure in which scientists from multiple institutions can contribute concurrently is an essential, and currently absent, ingredient in an effective high-end modeling capability.
Therefore, substantial integration is needed across disciplines, across institutions, and across the three Elements: Modeling, Data, and Computational Systems. In fact, the historical programmatic investment in each of these parts individually lies at the root of the fragmentation that currently permeates the field. Recently, there have been some successes as scientists and program managers recognize the fragmenting processes and try to overcome them. However, the sociological inertia in the current culture is difficult to overcome. Suitable levels of integration need to be pursued as strategic initiatives, but they must be kept well enough defined to better the possibility of success.